Reading HTML tables in Data Analysis Python - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of reading multiple HTML tables
Given the following HTML content with two tables, what is the output of reading all tables using pandas read_html?
Data Analysis Python
from io import StringIO
import pandas as pd
html = '''
<table>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Alice</td><td>30</td></tr>
<tr><td>Bob</td><td>25</td></tr>
</table>
<table>
<tr><th>City</th><th>Country</th></tr>
<tr><td>Paris</td><td>France</td></tr>
<tr><td>Berlin</td><td>Germany</td></tr>
</table>
'''
# Wrap the literal HTML in StringIO: recent pandas versions deprecate passing raw HTML strings
tables = pd.read_html(StringIO(html))
print(len(tables))
print(tables[0].iloc[1,0])
print(tables[1].iloc[0,1])
A
2
Bob
France
B
1
Alice
France
C
2
Alice
Germany
D
1
Bob
Germany
💡 Hint
Remember that read_html returns a list of DataFrames, one per table.
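To illustrate the hint, here is a minimal sketch using made-up tables (the fruit and planet data are illustrative and not part of the challenge). It assumes an HTML parser such as lxml is installed, and it wraps the string in `StringIO` because recent pandas versions deprecate passing literal HTML.

```python
from io import StringIO

import pandas as pd

# Illustrative HTML with two tables (hypothetical data, not from the challenge).
html = """
<table>
  <tr><th>Fruit</th><th>Count</th></tr>
  <tr><td>Apple</td><td>4</td></tr>
  <tr><td>Pear</td><td>7</td></tr>
</table>
<table>
  <tr><th>Planet</th><th>Moons</th></tr>
  <tr><td>Mars</td><td>2</td></tr>
</table>
"""

# read_html returns a plain Python list with one DataFrame per <table>.
tables = pd.read_html(StringIO(html))

print(type(tables).__name__)  # list
print(len(tables))            # 2
print(tables[0].iloc[1, 0])   # second data row, first column of table 0: Pear
print(tables[1].iloc[0, 1])   # first data row, second column of table 1: 2
```

Note that `iloc` positions count data rows only, so row 0 is the first row below the header.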
Data Output
intermediate
Number of rows in the first HTML table
After reading this HTML snippet with pandas read_html, what does len(tables[0]) print, that is, how many rows does the first DataFrame have?
Data Analysis Python
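from io import StringIO
import pandas as pd
html = '''
<table>
<tr><th>Product</th><th>Price</th></tr>
<tr><td>Pen</td><td>1.5</td></tr>
<tr><td>Notebook</td><td>3.0</td></tr>
<tr><td>Eraser</td><td>0.5</td></tr>
</table>
'''
# Wrap the literal HTML in StringIO: recent pandas versions deprecate passing raw HTML strings
tables = pd.read_html(StringIO(html))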
print(len(tables[0]))
A
4
B
2
C
3
D
1
💡 Hint
The header row is not counted as a data row in the DataFrame.
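A small sketch of the distinction between `<tr>` elements in the markup and rows in the resulting DataFrame, using hypothetical hardware data rather than the challenge table. It assumes an HTML parser such as lxml is available.

```python
from io import StringIO

import pandas as pd

# Three <tr> elements: one header row made of <th> cells and two data rows.
html = """
<table>
  <tr><th>Item</th><th>Weight</th></tr>
  <tr><td>Bolt</td><td>0.2</td></tr>
  <tr><td>Nut</td><td>0.1</td></tr>
</table>
"""

df = pd.read_html(StringIO(html))[0]

# The <th> row becomes the column labels, not a data row.
print(list(df.columns))  # ['Item', 'Weight']
print(len(df))           # 2
print(df.shape)          # (2, 2)
```

`len(df)` and `df.shape[0]` both count data rows only, so they come out one lower than the number of `<tr>` elements here.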
🔧 Debug
advanced
Error when reading malformed HTML table
What happens when pandas read_html tries to read this malformed HTML table, assuming a parser such as lxml is installed?
Data Analysis Python
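from io import StringIO
import pandas as pd
html = '''
<table>
<tr><th>Name<th>Age</tr>
<tr><td>John<td>22</tr>
</table>
'''
# Wrap the literal HTML in StringIO: recent pandas versions deprecate passing raw HTML strings
tables = pd.read_html(StringIO(html))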
A
ValueError: No tables found
B
ParserError: malformed HTML
C
ImportError: lxml not installed
D
No error; returns a list containing one DataFrame with one row
💡 Hint
pandas read_html delegates to lenient HTML parsers (lxml, or BeautifulSoup with html5lib) that can recover from some malformed markup, such as unclosed cell tags.
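The following sketch shows one way to check how a lenient parser treats unclosed cell tags. The tool data is hypothetical, and the exact recovery behavior depends on which parser is installed, so the code prints what it finds rather than assuming a result.

```python
from io import StringIO

import pandas as pd

# The <th> and <td> cells are never closed (hypothetical data).
html = """
<table>
<tr><th>Tool<th>Qty</tr>
<tr><td>Saw<td>3</tr>
</table>
"""

# With the default flavor, pandas tries lxml first and falls back to
# BeautifulSoup with html5lib if lxml cannot parse the input.
tables = pd.read_html(StringIO(html))
df = tables[0]

print(len(tables))  # number of tables recovered
print(df.shape)     # (data rows, columns) after recovery
print(df)
```

If a particular parser is required, the `flavor` argument of `read_html` can select it explicitly.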
🧠 Conceptual
advanced
Selecting specific table from multiple HTML tables
If an HTML page has 5 tables and you want to read only the third table using pandas read_html, which approach is correct?
A
tables = pd.read_html(url); df = tables[2]
B
df = pd.read_html(url, match=3)
C
df = pd.read_html(url, table=3)
D
df = pd.read_html(url, index=3)
💡 Hint
The read_html function returns a list of tables; you select by index.
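A short sketch of two ways to narrow down to one table, using three tiny hypothetical tables instead of a real URL. List indexing is zero-based, and the `match` parameter takes a string or regular expression that filters tables by their text; it does not take a position. An installed HTML parser such as lxml is assumed.

```python
from io import StringIO

import pandas as pd

# Three small tables with distinct header text (hypothetical data).
html = """
<table><tr><th>Alpha</th></tr><tr><td>1</td></tr></table>
<table><tr><th>Beta</th></tr><tr><td>2</td></tr></table>
<table><tr><th>Gamma</th></tr><tr><td>3</td></tr></table>
"""

# Option 1: read everything, then pick by zero-based list index.
tables = pd.read_html(StringIO(html))
third = tables[2]
print(list(third.columns))       # ['Gamma']

# Option 2: keep only tables whose text matches a string or regex.
# The result is still a list, even when a single table matches.
matched = pd.read_html(StringIO(html), match="Beta")
print(len(matched))              # 1
print(list(matched[0].columns))  # ['Beta']
```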
🚀 Application
expert
Extracting and combining data from multiple HTML tables
You have an HTML page with two tables: one with employee names and IDs, another with employee IDs and salaries. How do you combine these tables into one DataFrame with columns: Name, ID, Salary?
Data Analysis Python
from io import StringIO
import pandas as pd
html = '''
<table>
<tr><th>Name</th><th>ID</th></tr>
<tr><td>Alice</td><td>101</td></tr>
<tr><td>Bob</td><td>102</td></tr>
</table>
<table>
<tr><th>ID</th><th>Salary</th></tr>
<tr><td>101</td><td>70000</td></tr>
<tr><td>102</td><td>80000</td></tr>
</table>
'''
# Wrap the literal HTML in StringIO: recent pandas versions deprecate passing raw HTML strings
tables = pd.read_html(StringIO(html))
# What code combines the tables correctly?
A
df = tables[0].join(tables[1])
print(df)
B
df = pd.merge(tables[0], tables[1], on='ID')
print(df)
C
df = tables[0].concat(tables[1])
print(df)
D
df = tables[0].append(tables[1])
print(df)
💡 Hint
Use a merge on the common column to combine related data.
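As a sketch of merging on a shared key, the example below builds two small DataFrames by hand as stand-ins for parsed tables (the names `staff` and `pay` and their values are hypothetical). The second frame lists its rows in a different order to show that the pairing follows key values, not row position.

```python
import pandas as pd

# Stand-ins for two parsed tables (hypothetical data).
staff = pd.DataFrame({"Name": ["Carol", "Dan"], "ID": [201, 202]})
pay = pd.DataFrame({"ID": [202, 201], "Salary": [52000, 61000]})  # reversed order

# Inner merge on the shared key column; rows are paired by matching ID values.
combined = pd.merge(staff, pay, on="ID")

print(list(combined.columns))  # ['Name', 'ID', 'Salary']
carol_salary = combined.loc[combined["Name"] == "Carol", "Salary"].iloc[0]
print(carol_salary)            # 61000
```

Combining by position instead would pair Carol with the salary for ID 202, which is why a key-based merge is the safer choice for related tables.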