Data Analysis Python · data · ~10 mins

Chunked reading for large files in Data Analysis Python - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to read a large CSV file in chunks using pandas.

Data Analysis Python
import pandas as pd
chunks = pd.read_csv('large_file.csv', chunksize=[1])
for chunk in chunks:
    print(chunk.head())
Options:
A. 0
B. 1000
C. 10
D. -1
Common Mistakes
Using 0 or a negative number for chunksize raises a ValueError.
Not specifying chunksize reads the whole file at once.
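
To make the chunk behaviour concrete, here is a minimal sketch. It assumes a small in-memory CSV (the hypothetical `csv_text`) in place of a real `large_file.csv`, and shows that each chunk is a DataFrame of at most `chunksize` rows and that a chunksize of 0 is rejected.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "id,sales\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# With chunksize=4, read_csv returns an iterator of DataFrames
# that each hold at most 4 rows.
chunk_lengths = [
    len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4)
]
print(chunk_lengths)

# A chunksize below 1 is rejected by pandas with a ValueError.
try:
    pd.read_csv(io.StringIO(csv_text), chunksize=0)
    zero_rejected = False
except ValueError:
    zero_rejected = True
print(zero_rejected)
```

The last chunk is shorter than the others whenever the row count is not a multiple of the chunk size.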
2. Fill in the blank
medium

Complete the code to sum a column named 'sales' from each chunk.

Data Analysis Python
import pandas as pd
chunks = pd.read_csv('large_file.csv', chunksize=1000)
total_sales = 0
for chunk in chunks:
    total_sales += chunk['[1]'].sum()
print(total_sales)
Options:
A. sales
B. quantity
C. date
D. profit
Common Mistakes
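Using a wrong column name raises a KeyError.
Summing a non-numeric column such as 'date' does not give a numeric total and typically raises a TypeError when the result is added to total_sales.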
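
As a quick check on the running-total pattern, the sketch below uses a hypothetical five-row in-memory CSV and compares the chunked total with the total from reading everything at once. The data and the chunk size of 2 are assumptions chosen only for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data; a real file would be far larger.
csv_text = (
    "date,sales\n"
    "2024-01-01,100\n"
    "2024-01-02,250\n"
    "2024-01-03,50\n"
    "2024-01-04,75\n"
    "2024-01-05,25\n"
)

# Accumulate the per-chunk sums into one running total.
total_sales = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_sales += chunk["sales"].sum()

# Reading the whole file at once should give the same total.
full_total = pd.read_csv(io.StringIO(csv_text))["sales"].sum()
print(total_sales, full_total)
```

Because addition does not depend on how the rows are split, the chunked total matches the full-file total for any valid chunk size.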
3. Fill in the blank
hard

Fix the error in the code to filter rows where 'quantity' is greater than 10 in each chunk.

Data Analysis Python
import pandas as pd
chunks = pd.read_csv('large_file.csv', chunksize=500)
filtered_rows = []
for chunk in chunks:
    filtered = chunk[chunk['quantity'] [1] 10]
    filtered_rows.append(filtered)
result = pd.concat(filtered_rows)
print(result)
Options:
A. =>
B. ==
C. <=
D. >
Common Mistakes
Using '=>' causes a SyntaxError because it is not a valid Python operator.
Using '<=' or '==' filters wrong rows.
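
The filter-then-concatenate pattern can be sketched on a small example. The in-memory CSV and the `item` column are hypothetical; only the `quantity` column comes from the task. Note that a strict comparison excludes rows where quantity is exactly 10.

```python
import io
import pandas as pd

# Hypothetical sample data with one row exactly at the threshold.
csv_text = "item,quantity\na,5\nb,12\nc,10\nd,30\ne,11\n"

# Keep only the matching rows from each chunk, so the full file
# never has to sit in memory at once.
filtered_rows = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    filtered_rows.append(chunk[chunk["quantity"] > 10])

# Combine the per-chunk results into a single DataFrame.
result = pd.concat(filtered_rows, ignore_index=True)
print(result)
```

Passing `ignore_index=True` renumbers the rows of the combined result; leaving it out keeps the row labels from the original file.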
4. Fill in the blanks
hard

Fill both blanks to create a dictionary with product names as keys and total sales as values from chunks.

Data Analysis Python
import pandas as pd
chunks = pd.read_csv('large_file.csv', chunksize=1000)
sales_dict = {}
for chunk in chunks:
    for product, group in chunk.groupby([1]):
        sales_dict[product] = sales_dict.get(product, 0) + group[[2]].sum()
print(sales_dict)
Options:
A. 'product_name'
B. 'sales'
C. 'quantity'
D. 'date'
Common Mistakes
Grouping by wrong column causes incorrect results.
Summing wrong column leads to wrong totals.
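
One alternative to updating a dictionary inside a nested loop is to collect a partial grouped sum per chunk and combine the partials at the end. This is a sketch of that variant, not the exercise's own solution, and it assumes a hypothetical in-memory CSV in which the same product appears in several chunks.

```python
import io
import pandas as pd

# Hypothetical sample data; products repeat across chunk boundaries.
csv_text = (
    "product_name,sales\n"
    "apple,10\n"
    "banana,5\n"
    "apple,7\n"
    "cherry,3\n"
    "banana,20\n"
    "apple,1\n"
)

# Compute one partial total per product for each chunk.
partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    partials.append(chunk.groupby("product_name")["sales"].sum())

# Merge the partial totals by grouping on the index (the product name).
totals = pd.concat(partials).groupby(level=0).sum()
sales_dict = totals.to_dict()
print(sales_dict)
```

The partial results hold one row per product per chunk, so they stay small even when the source file is large.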
5. Fill in the blanks
hard

Fill all three blanks to read a large CSV, filter rows where 'region' is 'East', and count rows per 'category'.

Data Analysis Python
import pandas as pd
chunks = pd.read_csv('large_file.csv', chunksize=[1])
category_counts = {}
for chunk in chunks:
    filtered = chunk[chunk[[2]] == [3]]
    counts = filtered['category'].value_counts()
    for cat, count in counts.items():
        category_counts[cat] = category_counts.get(cat, 0) + count
print(category_counts)
Options:
A. 500
B. 'region'
C. 'East'
D. 'sales'
Common Mistakes
A very small chunk size adds per-chunk overhead and slows processing, while a very large one uses more memory.
Filtering on wrong column or value returns wrong data.
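
The per-chunk counting step can also be written with `collections.Counter`, which handles the "add to the existing count or start at zero" logic itself. The sketch below is one possible variant under assumed data: the in-memory CSV and the chunk size of 3 are hypothetical.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical sample data with several regions and categories.
csv_text = (
    "region,category\n"
    "East,toys\n"
    "West,toys\n"
    "East,books\n"
    "East,toys\n"
    "North,books\n"
    "East,books\n"
    "East,toys\n"
)

# Counter.update adds the counts from each chunk to the running totals.
category_counts = Counter()
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    east = chunk[chunk["region"] == "East"]
    category_counts.update(east["category"].value_counts().to_dict())

print(dict(category_counts))
```

A chunk with no 'East' rows contributes an empty count, so it leaves the totals unchanged.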