Challenge - 5 Problems
HTML Table Master
❓ Predict Output
Difficulty: intermediate
Output of reading multiple HTML tables
Given the following HTML content with two tables, what is the output of reading all tables with pandas read_html?
from io import StringIO

import pandas as pd

html = '''
<table>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Alice</td><td>30</td></tr>
<tr><td>Bob</td><td>25</td></tr>
</table>
<table>
<tr><th>City</th><th>Country</th></tr>
<tr><td>Paris</td><td>France</td></tr>
<tr><td>Berlin</td><td>Germany</td></tr>
</table>
'''

# pandas 2.1+ expects literal HTML to be wrapped in StringIO
tables = pd.read_html(StringIO(html))
print(len(tables))
print(tables[0].iloc[1, 0])
print(tables[1].iloc[0, 1])
💡 Hint
Remember that read_html returns a list of DataFrames, one per table.
Explanation
The read_html function reads every table and returns a list. There are two tables here, so len(tables) is 2. tables[0].iloc[1, 0] is the second data row, first column of the first table: 'Bob'. tables[1].iloc[0, 1] is the first data row, second column of the second table: 'France'.
❓ Data Output
Difficulty: intermediate
Number of rows in the first HTML table
Using pandas read_html on this HTML snippet, how many rows does the first table's DataFrame have? The HTML table itself has four <tr> rows, including the header.
from io import StringIO

import pandas as pd

html = '''
<table>
<tr><th>Product</th><th>Price</th></tr>
<tr><td>Pen</td><td>1.5</td></tr>
<tr><td>Notebook</td><td>3.0</td></tr>
<tr><td>Eraser</td><td>0.5</td></tr>
</table>
'''

# pandas 2.1+ expects literal HTML to be wrapped in StringIO
tables = pd.read_html(StringIO(html))
print(len(tables[0]))
💡 Hint
The header row is not counted as a data row in the DataFrame.
Explanation
The DataFrame contains only the data rows, so len(tables[0]) is 3, one per product. The header row becomes the column names and is not counted as a row.
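To check the row count directly, the sketch below reads the same snippet and inspects the DataFrame's shape and column names. It wraps the HTML in StringIO, which recent pandas versions expect for literal strings, and it assumes an HTML parser such as lxml is installed. The variable names are illustrative only.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Pen</td><td>1.5</td></tr>
  <tr><td>Notebook</td><td>3.0</td></tr>
  <tr><td>Eraser</td><td>0.5</td></tr>
</table>
"""

# read_html returns a list; the only table on the page is at index 0
df = pd.read_html(StringIO(html))[0]

# shape is (data rows, columns); the <th> row is not part of the data
n_rows, n_cols = df.shape
columns = list(df.columns)

print(n_rows, n_cols)  # 3 2
print(columns)         # ['Product', 'Price']
```

Here len(df) and df.shape[0] report the same number, so either can be used to count data rows.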
🔧 Debug
Difficulty: advanced
Error when reading malformed HTML table
What error, if any, does pandas read_html raise when reading this malformed HTML table?
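from io import StringIO

import pandas as pd

# Note: the closing </th> and </td> tags are missing on purpose
html = '''
<table>
<tr><th>Name<th>Age</tr>
<tr><td>John<td>22</tr>
</table>
'''

# pandas 2.1+ expects literal HTML to be wrapped in StringIO
tables = pd.read_html(StringIO(html))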
💡 Hint
pandas uses robust parsers that can handle some malformed HTML.
Explanation
No error is raised. Despite the missing closing tags, the underlying parser recovers the table structure, and read_html returns a DataFrame with one data row.
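A minimal sketch of this behaviour, assuming a lenient parser such as lxml (or BeautifulSoup with html5lib) is installed. The exact recovery is done by the parser, not by pandas itself, so results for more badly broken markup may differ between parsers.

```python
from io import StringIO

import pandas as pd

# Same malformed snippet: </th> and </td> are never closed
html = """
<table>
<tr><th>Name<th>Age</tr>
<tr><td>John<td>22</tr>
</table>
"""

tables = pd.read_html(StringIO(html))
df = tables[0]

# The parser closes the open cells implicitly, so the header
# and the single data row are both recovered.
print(df.shape)
print(list(df.columns))
print(df.iloc[0].tolist())
```

If read_html finds no table at all, it raises a ValueError, so wrapping the call in try/except ValueError is a reasonable precaution when the input quality is unknown.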
🧠 Conceptual
Difficulty: advanced
Selecting specific table from multiple HTML tables
If an HTML page has 5 tables and you want to read only the third table using pandas read_html, which approach is correct?
💡 Hint
The read_html function returns a list of tables; you select by index.
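Selecting by index can be sketched as follows. The five-table page is a hypothetical one generated in the code, with each table given a single column named T1 to T5 so the result is easy to check. The match argument shown at the end is an alternative not mentioned in the challenge: it keeps only tables whose text matches a regular expression.

```python
from io import StringIO

import pandas as pd

# Hypothetical page: five tiny tables, table k has one column "T{k}"
html = "".join(
    f"<table><tr><th>T{k}</th></tr><tr><td>{k}</td></tr></table>"
    for k in range(1, 6)
)

# read_html parses every table; list indices start at 0,
# so the third table is at index 2
tables = pd.read_html(StringIO(html))
third = tables[2]

# Alternative: filter by content instead of position
matched = pd.read_html(StringIO(html), match="T3")

print(len(tables))          # 5
print(list(third.columns))  # ['T3']
print(len(matched))         # 1
```

Indexing depends on the table order in the page, while match depends on the table's text, so match tends to be more robust when the page layout may change.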
Explanation
pandas read_html returns a list of DataFrames in document order. Lists are zero-indexed, so the third table is at index 2.
🚀 Application
Difficulty: expert
Extracting and combining data from multiple HTML tables
You have an HTML page with two tables: one with employee names and IDs, another with employee IDs and salaries. How do you combine these tables into one DataFrame with columns: Name, ID, Salary?
from io import StringIO

import pandas as pd

html = '''
<table>
<tr><th>Name</th><th>ID</th></tr>
<tr><td>Alice</td><td>101</td></tr>
<tr><td>Bob</td><td>102</td></tr>
</table>
<table>
<tr><th>ID</th><th>Salary</th></tr>
<tr><td>101</td><td>70000</td></tr>
<tr><td>102</td><td>80000</td></tr>
</table>
'''

# pandas 2.1+ expects literal HTML to be wrapped in StringIO
tables = pd.read_html(StringIO(html))
# What code combines the tables correctly?
💡 Hint
Use a merge on the common column to combine related data.
Explanation
The merge function joins two DataFrames on a common column, here 'ID', so each employee's name is matched with the salary that has the same ID. The other answer options do not join the rows on 'ID'.
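The merge on 'ID' can be sketched as below. This is one reasonable way to write it, not necessarily the exact answer option from the challenge. The names employees, salaries and combined are illustrative, and how="inner" is spelled out only for clarity since it is the default.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <tr><th>Name</th><th>ID</th></tr>
  <tr><td>Alice</td><td>101</td></tr>
  <tr><td>Bob</td><td>102</td></tr>
</table>
<table>
  <tr><th>ID</th><th>Salary</th></tr>
  <tr><td>101</td><td>70000</td></tr>
  <tr><td>102</td><td>80000</td></tr>
</table>
"""

# Two tables on the page, so the list unpacks into two DataFrames
employees, salaries = pd.read_html(StringIO(html))

# Match rows by the shared 'ID' column; rows without a partner
# in the other table would be dropped by an inner join
combined = employees.merge(salaries, on="ID", how="inner")

print(combined)
```

The result has the columns Name, ID and Salary, with one row per employee. If some IDs might be missing from one table, how="left" or how="outer" would keep those rows with NaN in the unmatched columns.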