Question 4
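Hm, I get an error: AttributeError: 'NoneType' object has no attribute 'find'. BeautifulSoup's find() returns None when no element matches, so table is None and table.find('thead') fails. Let's check that the table was actually found before using it.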
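A minimal offline sketch of where this error comes from: find() returns None when nothing matches, and calling .find() on None raises exactly this AttributeError. The HTML string below is made up for illustration; it is not the workshop page.

```python
from bs4 import BeautifulSoup

# Made-up HTML: there is a table, but none with id="target-table"
html = "<table id='other'><tr><td>1</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", id="target-table")
print(table)  # prints None: nothing matched

message = None
try:
    table.find("thead")  # calling .find() on None
except AttributeError as err:
    message = str(err)  # keep the text; err is cleared after the except block
print(message)  # 'NoneType' object has no attribute 'find'
```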
import requests
from bs4 import BeautifulSoup

url = 'https://vkatsikaros.github.io/dataharvest24-www.github.io/'
response = requests.get(url)

if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find('table', id='target-table')  # Adjust this line based on your table's actual identifier
    if table is not None:
        headers = []
        for th in table.find('thead').find_all('th'):
            headers.append(th.text.strip())
        rows = []
        for tr in table.find('tbody').find_all('tr'):
            cells = [td.text.strip() for td in tr.find_all('td')]
            rows.append(cells)
        print("Headers:", headers)
        for row in rows:
            print("Row:", row)
    else:
        print("Table not found. Check the id or class name.")
else:
    print('Failed to retrieve the webpage. Status code:', response.status_code)
The diff:
 if response.status_code == 200:
     soup = BeautifulSoup(response.content, 'html.parser')
-    table = soup.find('table', id='target-table') # Use id or class as necessary
-
+    table = soup.find('table', id='target-table') # Adjust this line based on your table's actual identifier
+    if table is not None:
         headers = []
         for th in table.find('thead').find_all('th'):
             headers.append(th.text.strip())
@@ -21,5 +21,7 @@ if response.status_code == 200:
         print("Headers:", headers)
         for row in rows:
             print("Row:", row)
+    else:
+        print("Table not found. Check the id or class name.")
 else:
     print('Failed to retrieve the webpage. Status code:', response.status_code)
Output
Table not found. Check the id or class name.
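Since the lookup by id found nothing, one quick way to see what a page actually offers is to list every <table> with its id and class. A minimal sketch, run here on made-up sample HTML; describe_tables is a hypothetical helper name, and on the real page you would pass it response.content instead of the sample string.

```python
from bs4 import BeautifulSoup


def describe_tables(html):
    """Return (position, id, classes) for every <table> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    summary = []
    for position, table in enumerate(soup.find_all("table")):
        # .get() returns None when the attribute is missing;
        # "class" comes back as a list of class names
        summary.append((position, table.get("id"), table.get("class")))
    return summary


# Made-up sample HTML, not the workshop page
sample = """
<table id="prices"><tr><td>1</td></tr></table>
<table class="data wide"><tr><td>2</td></tr></table>
<table><tr><td>3</td></tr></table>
"""

for position, table_id, classes in describe_tables(sample):
    print(position, table_id, classes)
```

Tables that print None for both id and class are the ones that are hard to target directly.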
So no <table> on the page has id='target-table'. If we inspect the page, we notice there are a lot of <table> elements, and it’s not easy to find the exact one we want. What can we do?
Instead of searching the whole page for the element we want, let’s check whether it sits inside another element that is easier to locate: a kind of “divide and conquer” strategy. Let’s inspect the page and see if there is an element that contains the <table> we are interested in!
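A minimal sketch of the “divide and conquer” idea on made-up HTML: first locate a container that is easy to identify, then search for the <table> only inside it. The container’s tag and id (div, id="hypothetical-container") are hypothetical; on the real page you would use whatever element the inspection turns up. Reading rows with find_all('tr') instead of going through <thead> and <tbody> also avoids another NoneType error, because html.parser, unlike a browser, does not insert a <tbody> that is missing from the HTML source.

```python
from bs4 import BeautifulSoup

# Made-up HTML: several tables, only one inside the container we care about
html = """
<table><tr><th>Ignore</th></tr></table>
<div id="hypothetical-container">
  <table>
    <tr><th>Name</th><th>Value</th></tr>
    <tr><td>a</td><td>1</td></tr>
    <tr><td>b</td><td>2</td></tr>
  </table>
</div>
<table><tr><th>Ignore too</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Step 1: find the easy-to-locate container (hypothetical id)
container = soup.find("div", id="hypothetical-container")
# Step 2: search for the table only inside that container
table = container.find("table") if container is not None else None

headers, rows = [], []
if table is not None:
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip the header row, which has no <td> cells
            rows.append(cells)

print("Headers:", headers)
for row in rows:
    print("Row:", row)
```

The two lookups can also be written as one CSS selector, soup.select_one('#hypothetical-container table'), which returns None in the same way when nothing matches.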