Challenge - 5 Problems
Chunked Reading Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
Output of chunked sum aggregation
What is the output of this code that reads a CSV file in chunks and sums a column?
Pandas
import pandas as pd
from io import StringIO

data = '''id,value
1,10
2,20
3,30
4,40
5,50
'''
chunk_iter = pd.read_csv(StringIO(data), chunksize=2)
total = 0
for chunk in chunk_iter:
    total += chunk['value'].sum()
print(total)
💡 Hint
Sum all values in the 'value' column across all chunks.
✅ Explanation
The code reads the CSV in chunks of 2 rows each. The values are 10, 20, 30, 40, 50; summing them all gives 150.
❓ Predict Output
Intermediate · 2:00 remaining
Number of chunks read
How many chunks will be read from this CSV file with 7 rows and chunksize=3?
Pandas
import pandas as pd
from io import StringIO

data = '''id,value
1,5
2,10
3,15
4,20
5,25
6,30
7,35
'''
chunk_iter = pd.read_csv(StringIO(data), chunksize=3)
count = 0
for chunk in chunk_iter:
    count += 1
print(count)
💡 Hint
Divide total rows by chunk size and round up.
✅ Explanation
7 rows with chunksize=3 yields chunks of 3, 3, and 1 rows, so 3 chunks in total.
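The hint's rule ("divide total rows by chunk size and round up") can be checked directly. This is a small sketch, separate from the quiz code:

```python
import math

rows, chunksize = 7, 3
# number of chunks = ceil(rows / chunksize)
print(math.ceil(rows / chunksize))  # 3
# the same result using integer arithmetic only:
print((rows + chunksize - 1) // chunksize)  # 3
```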
🔧 Debug
Advanced · 2:00 remaining
Identify error in chunk processing code
What error does this code raise when reading a CSV in chunks and trying to access a non-existent column?
Pandas
import pandas as pd
from io import StringIO

data = '''id,value
1,100
2,200
'''
chunk_iter = pd.read_csv(StringIO(data), chunksize=1)
for chunk in chunk_iter:
    print(chunk['amount'].sum())
💡 Hint
Check if the column 'amount' exists in the data.
✅ Explanation
The column 'amount' does not exist in the DataFrame (only 'id' and 'value' do), so accessing it raises a KeyError.
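One defensive fix, sketched here as an assumption rather than the quiz's official answer, is to check `chunk.columns` before indexing, so a missing column is reported instead of crashing the loop:

```python
import pandas as pd
from io import StringIO

data = '''id,value
1,100
2,200
'''

for chunk in pd.read_csv(StringIO(data), chunksize=1):
    # guard against a missing column instead of letting KeyError escape
    if 'amount' in chunk.columns:
        print(chunk['amount'].sum())
    else:
        print("no 'amount' column; available:", list(chunk.columns))
```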
🚀 Application
Advanced · 2:00 remaining
Efficiently count rows with condition using chunks
You want to count how many rows in a large CSV have 'value' greater than 50 using chunked reading. Which code snippet correctly does this?
💡 Hint
Sum the boolean condition counts for each chunk.
✅ Explanation
Option C correctly sums the number of rows where 'value' > 50 across all chunks. The other options either overwrite the count on each iteration or use the wrong logic.
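The answer options themselves are not reproduced above, but the correct pattern they describe can be sketched as follows. The sample data here is illustrative, standing in for the large CSV the question assumes:

```python
import pandas as pd
from io import StringIO

# stand-in for the large CSV file (assumed data, not from the quiz)
data = '''id,value
1,30
2,60
3,70
4,40
5,90
'''

count = 0
for chunk in pd.read_csv(StringIO(data), chunksize=2):
    # (chunk['value'] > 50) is a boolean Series; .sum() counts the True rows
    count += (chunk['value'] > 50).sum()
print(count)  # 3
```

Accumulating with `+=` is the key point: each chunk contributes its own count, rather than replacing the running total.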
🧠 Conceptual
Expert · 2:00 remaining
Why use chunked reading for large files?
Which is the main reason to use chunked reading when processing large CSV files?
💡 Hint
Think about memory limits when files are very large.
✅ Explanation
Chunked reading loads small parts of the file sequentially, avoiding the high memory use of loading the entire file at once.
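The memory point can be made concrete with a streaming aggregation: only a running total and count are kept between chunks, never the full dataset. The inline data below is a tiny stand-in; with a real multi-gigabyte file you would pass a path, and each iteration would hold only `chunksize` rows in memory:

```python
import pandas as pd
from io import StringIO

# illustrative stand-in for a large file (assumed data)
data = '''id,value
1,1
2,2
3,3
'''

total, n = 0, 0
for chunk in pd.read_csv(StringIO(data), chunksize=1):
    total += chunk['value'].sum()  # per-chunk partial sum
    n += len(chunk)                # per-chunk row count
print(total / n)  # mean over all rows, computed one chunk at a time
```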