
Reading HTML tables in Data Analysis Python

Introduction

Reading HTML tables lets us pull data straight from web pages into Python, so we do not have to copy or retype it by hand. Typical situations include:

You want to get sports scores from a website table.
You need financial data shown in tables on a web page.
You want to collect weather data presented in HTML tables.
You are scraping product prices listed in a table on an online store.
You want to analyze election results published as tables on news sites.
Syntax
Data Analysis Python
import pandas as pd

# Read all tables from a webpage
tables = pd.read_html('URL')

# Access the first table
first_table = tables[0]

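pd.read_html() returns a list of DataFrames, one for each table found on the page. It needs an HTML parser installed, such as lxml, or beautifulsoup4 together with html5lib.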

You can select the table you want by its position in the list, starting at 0.
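To see the return value without relying on a live website, here is a minimal sketch that parses a small, made-up HTML snippet. The team names and scores are hypothetical, and the string is wrapped in StringIO because recent pandas versions warn when literal HTML is passed directly.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML snippet standing in for a downloaded page
html = """
<table>
  <tr><th>Team</th><th>Score</th></tr>
  <tr><td>Lions</td><td>3</td></tr>
  <tr><td>Tigers</td><td>1</td></tr>
</table>
"""

# read_html accepts a URL, a file path, or a file-like object such as StringIO
tables = pd.read_html(StringIO(html))

print(type(tables))   # the result is a plain Python list
print(len(tables))    # one entry per <table> element found
print(tables[0])      # each entry is a DataFrame
```

The same indexing applies to a real page: tables[0] is the first table, tables[1] the second, and so on.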

Examples
This code reads all tables from the URL and prints how many tables were found.
Data Analysis Python
import pandas as pd

url = 'https://example.com/data'
tables = pd.read_html(url)
print(len(tables))
This reads the first table from the webpage and shows the first 5 rows.
Data Analysis Python
import pandas as pd

url = 'https://example.com/data'
table = pd.read_html(url)[0]
print(table.head())
This keeps only the tables whose text matches 'Population'. The match argument accepts a string or a regular expression.
Data Analysis Python
import pandas as pd

url = 'https://example.com/data'
tables = pd.read_html(url, match='Population')
print(tables[0])
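When no table matches, pd.read_html() raises a ValueError instead of returning an empty list. The sketch below shows both cases on a made-up snippet. The city data is hypothetical, and 'Rainfall' is chosen only because it appears in neither table.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the first mentions "Population"
html = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
</table>
<table>
  <tr><th>City</th><th>Area</th></tr>
  <tr><td>Springfield</td><td>12</td></tr>
</table>
"""

# Only tables containing matching text are returned
matched = pd.read_html(StringIO(html), match="Population")
print(len(matched))
print(matched[0])

# A pattern that matches no table raises ValueError
try:
    pd.read_html(StringIO(html), match="Rainfall")
except ValueError as err:
    print("No table matched:", err)
```

Wrapping the call in try/except keeps a script from stopping when a page layout changes and the expected table disappears.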
Sample Program

This program reads the tables on the Wikipedia page listing countries by nominal GDP, selects the first one, and prints its first 5 rows. The first table on a page is not always the one you want, so check the output and try another index if needed.

Data Analysis Python
import pandas as pd

# URL with sample HTML tables
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)'

# Read all tables from the page
all_tables = pd.read_html(url)

# Select the first table (check that it is the GDP table; the index may differ)
gdp_table = all_tables[0]

# Show the first 5 rows
print(gdp_table.head())
Important Notes

Sometimes web pages have multiple tables; you may need to try different indexes to find the right one.
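Rather than guessing indexes one at a time, you can print a short summary of every table first. This is a minimal sketch on a made-up snippet; the table contents are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical page: a small layout table followed by the data table we want
html = """
<table>
  <tr><th>Menu</th></tr>
  <tr><td>Home</td></tr>
</table>
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Pen</td><td>1.5</td></tr>
  <tr><td>Notebook</td><td>4.0</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html))

# Print index, shape, and column names to spot the right table quickly
for i, table in enumerate(tables):
    print(i, table.shape, list(table.columns))
```

Once you know which index holds the data, select it with tables[index] as usual.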

Make sure you have an internet connection when reading tables from live URLs.

Some tables may require cleaning after reading to fix headers or data types.
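As an illustration of typical cleaning, the sketch below starts from a hand-built DataFrame that imitates what a scraped table might look like. The column names, country names, and values are all hypothetical.

```python
import pandas as pd

# Hypothetical result of pd.read_html(): messy headers and numbers stored as text
raw = pd.DataFrame({
    "Country [a]": ["Freedonia", "Borduria"],
    "GDP (US$ million)": ["1,200", "n/a"],
})

# Give the columns short, consistent names
clean = raw.rename(columns={
    "Country [a]": "country",
    "GDP (US$ million)": "gdp_usd_million",
})

# Remove thousands separators, then convert to numbers;
# values that cannot be converted become NaN
clean["gdp_usd_million"] = pd.to_numeric(
    clean["gdp_usd_million"].str.replace(",", "", regex=False),
    errors="coerce",
)

print(clean.dtypes)
print(clean)
```

Using errors="coerce" is one option among several; it keeps the rows but marks unreadable values as missing so you can decide later how to handle them.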

Summary

Use pd.read_html() to get tables from web pages easily.

The function returns a list of tables; pick the one you want by index.

This method helps automate data collection from websites without manual copying.