Reading HTML tables lets us pull data from web pages directly into Python, so we do not have to copy or type it in by hand.
Reading HTML tables in Data Analysis Python
Introduction
You want to get sports scores from a website table.
You need financial data shown in tables on a web page.
You want to collect weather data presented in HTML tables.
You are scraping product prices listed in a table on an online store.
You want to analyze election results published as tables on news sites.
Syntax
Data Analysis Python
import pandas as pd

# Read all tables from a webpage
tables = pd.read_html('URL')

# Access the first table
first_table = tables[0]
pd.read_html() returns a list of DataFrames, one for each table found on the page.
You can select the table you want by its position in the list, starting at 0.
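The list-and-index behaviour described above can be tried offline. The sketch below is only an illustration: the HTML string and its city and team values are made up, and it assumes an HTML parser that pandas can use (lxml, or beautifulsoup4 with html5lib) is installed. The HTML is wrapped in StringIO because recent pandas versions expect a URL, path, or file-like object rather than a raw HTML string.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML with two tables, standing in for a downloaded page.
html = """
<table>
  <tr><th>City</th><th>Temp</th></tr>
  <tr><td>Oslo</td><td>4</td></tr>
  <tr><td>Cairo</td><td>29</td></tr>
</table>
<table>
  <tr><th>Team</th><th>Wins</th><th>Losses</th></tr>
  <tr><td>Lions</td><td>10</td><td>2</td></tr>
</table>
"""

# read_html returns one DataFrame per <table> element, in page order.
tables = pd.read_html(StringIO(html))

# Print each table's position, shape, and column names to find the one you need.
for i, t in enumerate(tables):
    print(i, t.shape, list(t.columns))

# Pick the second table by its position in the list.
second_table = tables[1]
print(second_table)
```

Printing the shape and column names of every table is usually quicker than guessing indexes one at a time.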
Examples
This code reads all tables from the URL and prints how many tables were found.
Data Analysis Python
import pandas as pd

url = 'https://example.com/data'
tables = pd.read_html(url)
print(len(tables))
This reads the first table from the webpage and shows the first 5 rows.
Data Analysis Python
import pandas as pd

url = 'https://example.com/data'
table = pd.read_html(url)[0]
print(table.head())
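This reads only the tables whose text contains the word 'Population'. The match argument accepts a string or a regular expression, and the result is still a list, so you index into it as before.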
Data Analysis Python
import pandas as pd

url = 'https://example.com/data'
tables = pd.read_html(url, match='Population')
print(tables[0])
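To see the effect of match without a network connection, here is a small sketch on a made-up HTML string (the tables and their values are invented for illustration). It assumes lxml, or beautifulsoup4 with html5lib, is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the first mentions "Population".
html = """
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Iceland</td><td>380000</td></tr>
</table>
<table>
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>Iceland</td><td>Reykjavik</td></tr>
</table>
"""

# Without match, both tables come back.
all_tables = pd.read_html(StringIO(html))
print(len(all_tables))

# With match, only tables containing the given text are returned.
tables = pd.read_html(StringIO(html), match="Population")
print(len(tables))
print(tables[0])

# A different pattern selects the other table.
capital_tables = pd.read_html(StringIO(html), match="Capital")
print(list(capital_tables[0].columns))
```

Filtering by a word that appears in the table you want tends to be more robust than a fixed index, since the position of a table can change when the page is edited.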
Sample Program
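This program fetches the Wikipedia page listing countries by GDP. It reads all tables, selects the first one, and prints the first 5 rows so you can inspect the data. The first table on the page is not guaranteed to be the GDP list, so check the output and try another index if needed.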
Data Analysis Python
import pandas as pd

# URL with sample HTML tables
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)'

# Read all tables from the page
all_tables = pd.read_html(url)

# Select the first table (if it is not the GDP list, try another index)
gdp_table = all_tables[0]

# Show the first 5 rows
print(gdp_table.head())
Important Notes
Sometimes web pages have multiple tables; you may need to try different indexes to find the right one.
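Make sure you have an internet connection when reading tables from live URLs. pd.read_html() also needs an HTML parser installed: lxml, or beautifulsoup4 together with html5lib.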
Some tables may require cleaning after reading to fix headers or data types.
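As an illustration of the cleaning mentioned in the notes above, the sketch below fixes a misplaced header row and converts text to numbers. The DataFrame is built by hand as a stand-in for what pd.read_html() might return; the country names, column label, and values are invented, and real tables may need different steps.

```python
import pandas as pd

# Hypothetical stand-in for a scraped table: the real header ended up in row 0,
# and the numbers are strings with thousands separators and footnote markers.
raw = pd.DataFrame(
    [
        ["Country", "GDP (US$ million)"],
        ["Freedonia", "1,200[a]"],
        ["Borduria", "845"],
    ]
)

# Promote the first row to column names, then drop it from the data.
clean = raw.iloc[1:].reset_index(drop=True)
clean.columns = raw.iloc[0].tolist()

# Strip footnote markers such as [a] or [1], remove commas, and convert to numbers.
gdp_text = (
    clean["GDP (US$ million)"]
    .str.replace(r"\[.*?\]", "", regex=True)
    .str.replace(",", "", regex=False)
)
clean["GDP (US$ million)"] = pd.to_numeric(gdp_text)

print(clean)
print(clean.dtypes)
```

Checking dtypes after cleaning confirms that the numeric column is no longer stored as text, which matters before sorting or doing arithmetic on it.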
Summary
Use pd.read_html() to get tables from web pages easily.
The function returns a list of tables; pick the one you want by index.
This method helps automate data collection from websites without manual copying.