
How to Scrape a Table from a Website Using Python

To scrape a table from a website using Python, use the requests library to get the page content and BeautifulSoup to parse the HTML and extract the table data. Then, you can convert the extracted data into a list or a pandas DataFrame for easy use.

Syntax

Here is the basic syntax to scrape a table from a website:

  • requests.get(url): Fetches the webpage content.
  • BeautifulSoup(html, 'html.parser'): Parses the HTML content.
  • soup.find('table'): Finds the first table element.
  • Loop through table.find_all('tr') to get rows.
  • Extract cell data from td or th tags.
python
import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')

for row in table.find_all('tr'):
    cells = row.find_all(['td', 'th'])
    data = [cell.text.strip() for cell in cells]
    print(data)

Example

This example fetches a sample webpage with a table, extracts the table data, and prints it as rows of text.

python
import requests
from bs4 import BeautifulSoup

url = 'https://www.w3schools.com/html/html_tables.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'id': 'customers'})

for row in table.find_all('tr'):
    cells = row.find_all(['td', 'th'])
    data = [cell.text.strip() for cell in cells]
    print(data)
Output
['Company', 'Contact', 'Country']
['Alfreds Futterkiste', 'Maria Anders', 'Germany']
['Centro comercial Moctezuma', 'Francisco Chang', 'Mexico']
['Ernst Handel', 'Roland Mendel', 'Austria']
['Island Trading', 'Helen Bennett', 'UK']
['Laughing Bacchus Winecellars', 'Yoshi Tannamuri', 'Canada']
['Magazzini Alimentari Riuniti', 'Giovanni Rovelli', 'Italy']
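
The example prints each row as it goes. To keep the data for later use, the rows can be collected into a list instead. The sketch below is one possible way to do that; it parses a small inline HTML string (a hypothetical stand-in for response.text) so it runs without a network connection, and the names rows, header, and records are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for a fetched page (response.text).
html = """
<table id="customers">
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr>
  <tr><td>Island Trading</td><td>Helen Bennett</td><td>UK</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'id': 'customers'})

# Collect every row as a list of cleaned cell strings.
rows = [
    [cell.text.strip() for cell in row.find_all(['td', 'th'])]
    for row in table.find_all('tr')
]

# Treat the first row as the header and pair it with each data row.
header, *body = rows
records = [dict(zip(header, row)) for row in body]

print(records[0])
```

Each entry in records is a dictionary keyed by column name, which is often easier to work with than positional lists.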

Common Pitfalls

Common mistakes when scraping tables include:

  • Not checking if the table exists before accessing it, causing errors.
  • Ignoring that some tables have nested tags or missing cells.
  • Not handling network errors when fetching the page.
  • Extracting raw HTML instead of clean text.

Always verify the table's presence and clean the text data.

python
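import requests
from bs4 import BeautifulSoup

url = 'https://example.com'

try:
    # A timeout stops the request from hanging indefinitely.
    response = requests.get(url, timeout=10)
    # raise_for_status() turns 4xx/5xx responses into exceptions.
    response.raise_for_status()
except requests.RequestException as exc:
    # Covers connection errors, timeouts, and bad status codes.
    print(f'Failed to retrieve the webpage: {exc}')
else:
    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find('table')
    if table is not None:
        for row in table.find_all('tr'):
            cells = row.find_all(['td', 'th'])
            data = [cell.text.strip() for cell in cells]
            print(data)
    else:
        print('No table found on the page.')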
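
Nested tags and missing cells, the second pitfall in the list, can be handled along these lines. This is a minimal sketch on hypothetical inline HTML: get_text(' ', strip=True) gathers text from nested tags, and short rows are padded with empty strings. Padding with '' is an assumption here; another placeholder such as None may suit your data better.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one cell wraps its text in nested tags,
# and the last row is missing its third cell.
html = """
<table>
  <tr><th>Name</th><th>Role</th><th>City</th></tr>
  <tr><td><a href="#"><b>Ana</b> Ruiz</a></td><td>Editor</td><td>Lima</td></tr>
  <tr><td>Ben Ode</td><td>Writer</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')

rows = []
for row in table.find_all('tr'):
    cells = row.find_all(['td', 'th'])
    # Join the text of nested tags with a single space and trim whitespace.
    rows.append([cell.get_text(' ', strip=True) for cell in cells])

# Pad short rows so every row has the same number of columns.
width = max(len(row) for row in rows)
rows = [row + [''] * (width - len(row)) for row in rows]

for row in rows:
    print(row)
```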

Quick Reference

Summary tips for scraping tables:

  • Use requests to get HTML content.
  • Parse HTML with BeautifulSoup.
  • Find the correct table by id, class, or position.
  • Loop through tr rows and extract td or th cells.
  • Clean cell text with strip().
  • Handle missing tables and network errors gracefully.
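
When a page contains several tables, the "id, class, or position" tip might look like the sketch below. The HTML, the id prices, and the class summary are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page with three tables.
html = """
<table class="summary"><tr><td>first</td></tr></table>
<table id="prices"><tr><td>second</td></tr></table>
<table class="summary wide"><tr><td>third</td></tr></table>
"""

soup = BeautifulSoup(html, 'html.parser')

# By id: ids should be unique, so this is usually the most reliable choice.
by_id = soup.find('table', id='prices')

# By class: returns the first table whose class list includes "summary".
by_class = soup.find('table', class_='summary')

# By position: take the third table, guarding against a shorter list.
tables = soup.find_all('table')
by_position = tables[2] if len(tables) > 2 else None

print(by_id.td.text, by_class.td.text, by_position.td.text)
```

Selecting by position is the most fragile of the three, since it breaks when the page layout changes.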

Key Takeaways

Use requests to fetch the webpage and BeautifulSoup to parse HTML for scraping tables.
Always check if the table exists before extracting data to avoid errors.
Extract text from td and th tags and clean it with strip() for neat results.
Handle network errors and missing elements gracefully in your code.
You can convert scraped table data into a pandas DataFrame for easier analysis.
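
As a rough illustration of that last point, the sketch below builds a DataFrame from scraped rows. It assumes pandas is installed and that the first row of the table holds the column headers; the inline HTML is a hypothetical stand-in for a fetched page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for a fetched page (response.text).
html = """
<table>
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr>
  <tr><td>Ernst Handel</td><td>Roland Mendel</td><td>Austria</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')

# Extract each row as a list of cleaned cell strings.
rows = [
    [cell.text.strip() for cell in row.find_all(['td', 'th'])]
    for row in table.find_all('tr')
]

# Assumption: the first row is the header, the rest are data rows.
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)
```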