How to Scrape a Table from a Website Using Python
To scrape a table from a website using Python, use the requests library to fetch the page content and BeautifulSoup to parse the HTML and extract the table data. You can then convert the extracted data into a list or a pandas DataFrame for easy use.
Syntax
Here is the basic syntax to scrape a table from a website:
- requests.get(url): Fetches the webpage content.
- BeautifulSoup(html, 'html.parser'): Parses the HTML content.
- soup.find('table'): Finds the first table element.
- Loop through table.find_all('tr') to get rows.
- Extract cell data from td or th tags.
```python
import requests
from bs4 import BeautifulSoup

url = 'http://example.com'  # replace with the page you want to scrape
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find the first <table> element on the page
table = soup.find('table')

# Print each row as a list of cleaned cell strings
for row in table.find_all('tr'):
    cells = row.find_all(['td', 'th'])
    data = [cell.text.strip() for cell in cells]
    print(data)
```
Example
This example fetches a sample webpage with a table, extracts the table data, and prints it as rows of text.
```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.w3schools.com/html/html_tables.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Select the table by its id attribute
table = soup.find('table', {'id': 'customers'})

for row in table.find_all('tr'):
    cells = row.find_all(['td', 'th'])
    data = [cell.text.strip() for cell in cells]
    print(data)
```
Output
['Company', 'Contact', 'Country']
['Alfreds Futterkiste', 'Maria Anders', 'Germany']
['Centro comercial Moctezuma', 'Francisco Chang', 'Mexico']
['Ernst Handel', 'Roland Mendel', 'Austria']
['Island Trading', 'Helen Bennett', 'UK']
['Laughing Bacchus Winecellars', 'Yoshi Tannamuri', 'Canada']
['Magazzini Alimentari Riuniti', 'Giovanni Rovelli', 'Italy']
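The loop above prints each row and then discards it. If you want to keep the data, one option is to collect the rows in a list and pair each data row with the header row. The sketch below is only an illustration: it parses a small inline HTML string instead of fetching a live page, and the helper names table_to_rows and rows_to_records are hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Small inline sample so the sketch runs without network access (assumed data).
SAMPLE_HTML = """
<table id="customers">
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr>
  <tr><td>Island Trading</td><td>Helen Bennett</td><td>UK</td></tr>
</table>
"""


def table_to_rows(table):
    """Return the table as a list of rows, each row a list of cell strings."""
    rows = []
    for tr in table.find_all('tr'):
        cells = tr.find_all(['td', 'th'])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


def rows_to_records(rows):
    """Treat the first row as headers and return one dict per data row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = table_to_rows(soup.find('table', {'id': 'customers'}))
records = rows_to_records(rows)
print(records[0]['Contact'])  # Maria Anders
```

This assumes the first row holds the headers and that the table has at least one row; a table without a header row would need different handling.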
Common Pitfalls
Common mistakes when scraping tables include:
- Not checking if the table exists before accessing it, causing errors.
- Ignoring that some tables have nested tags or missing cells.
- Not handling network errors when fetching the page.
- Extracting raw HTML instead of clean text.
Always verify the table's presence and clean the text data.
```python
import requests
from bs4 import BeautifulSoup

url = 'https://example.com'

try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx responses
except requests.RequestException as exc:
    # Covers connection errors, timeouts, and bad status codes
    print(f'Failed to retrieve the webpage: {exc}')
else:
    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find('table')
    if table is not None:
        for row in table.find_all('tr'):
            cells = row.find_all(['td', 'th'])
            data = [cell.text.strip() for cell in cells]
            print(data)
    else:
        print('No table found on the page.')
```
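The check above covers a missing table, but not the pitfall of nested tags or missing cells. One possible way to handle both is sketched here, assuming that short rows should simply be padded with empty strings: use get_text() with a separator so text from nested tags does not run together, then pad every row to the widest row. The helper name normalize_rows and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup

# Inline sample (assumed data): one cell has nested tags, one row is short.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Role</th><th>City</th></tr>
  <tr><td><b>Ada</b><br>Lovelace</td><td>Engineer</td><td>London</td></tr>
  <tr><td>Alan Turing</td><td>Mathematician</td></tr>
</table>
"""


def normalize_rows(table, fill=''):
    """Extract rows as lists of strings and pad short rows with `fill`."""
    rows = []
    for tr in table.find_all('tr'):
        cells = tr.find_all(['td', 'th'])
        # The ' ' separator keeps text from nested tags apart.
        rows.append([cell.get_text(' ', strip=True) for cell in cells])
    width = max((len(row) for row in rows), default=0)
    return [row + [fill] * (width - len(row)) for row in rows]


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = normalize_rows(soup.find('table'))
for row in rows:
    print(row)
```

Padding on the right is a simplification: it does not account for colspan or rowspan attributes, so tables with merged cells would need extra logic.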
Quick Reference
Summary tips for scraping tables:
- Use requests to get HTML content.
- Parse HTML with BeautifulSoup.
- Find the correct table by id, class, or position.
- Loop through tr rows and extract td or th cells.
- Clean cell text with strip().
- Handle missing tables and network errors gracefully.
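As a rough illustration of finding a table by id, class, or position, the sketch below runs the three lookups, plus a CSS selector, against a small inline HTML string. The ids, class names, and cell contents are invented for the example.

```python
from bs4 import BeautifulSoup

# Three tables (assumed data) that differ only in how they can be located.
SAMPLE_HTML = """
<table id="prices"><tr><td>first</td></tr></table>
<table class="data wide"><tr><td>second</td></tr></table>
<table><tr><td>third</td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# By id: matches the table whose id attribute is "prices".
by_id = soup.find('table', id='prices')

# By class: class_ matches any one of the element's classes.
by_class = soup.find('table', class_='data')

# By position: find_all returns tables in document order.
tables = soup.find_all('table')
by_position = tables[2] if len(tables) > 2 else None

# By CSS selector, as an alternative to find().
by_css = soup.select_one('table.wide')

for table in (by_id, by_class, by_position, by_css):
    print(table.get_text(strip=True) if table is not None else 'not found')
```

Selecting by id or class tends to survive page changes better than selecting by position, which breaks as soon as a table is added or removed earlier in the page.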
Key Takeaways
Use requests to fetch the webpage and BeautifulSoup to parse HTML for scraping tables.
Always check if the table exists before extracting data to avoid errors.
Extract text from td and th tags and clean it with strip() for neat results.
Handle network errors and missing elements gracefully in your code.
You can convert scraped table data into a pandas DataFrame for easier analysis.
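A minimal sketch of that conversion, assuming the rows have already been scraped into a list of lists whose first row holds the column headers (the sample rows here are hard-coded for illustration):

```python
import pandas as pd

# Rows in the shape produced by a scraping loop; the first row is the header.
rows = [
    ['Company', 'Contact', 'Country'],
    ['Alfreds Futterkiste', 'Maria Anders', 'Germany'],
    ['Island Trading', 'Helen Bennett', 'UK'],
]

# Split the header from the data rows and build the DataFrame.
header, *body = rows
df = pd.DataFrame(body, columns=header)

print(df.shape)  # (2, 3)
print(df[df['Country'] == 'UK']['Company'].tolist())  # ['Island Trading']
```

All values arrive as strings, so numeric columns would still need an explicit conversion, for example with pd.to_numeric.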