Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
Easy: Complete the code to read a large CSV file in chunks using pandas.
Data Analysis Python
import pandas as pd

chunks = pd.read_csv('large_file.csv', chunksize=[1])
for chunk in chunks:
    print(chunk.head())
Common Mistakes
Using 0 or a negative number for chunksize raises a ValueError.
Omitting chunksize reads the whole file into memory at once.
Explanation
The chunksize parameter specifies how many rows to read at a time. 1000 is a common chunk size for large files.
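As a rough illustration of what chunksize does, the sketch below reads a tiny in-memory CSV instead of 'large_file.csv'. The sample data and the chunk size of 4 are assumptions chosen only so that the split into chunks is visible.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'large_file.csv': 10 data rows built in memory.
csv_text = "id,sales\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

chunk_lengths = []
# chunksize=4 is used only so this small sample splits into several chunks.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Each chunk is an ordinary DataFrame holding at most 4 rows.
    chunk_lengths.append(len(chunk))

print(chunk_lengths)  # [4, 4, 2]
```

The last chunk is shorter because only the remaining rows are left to read.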
2. Fill in the blank
Medium: Complete the code to sum a column named 'sales' from each chunk.
Data Analysis Python
import pandas as pd

chunks = pd.read_csv('large_file.csv', chunksize=1000)
total_sales = 0
for chunk in chunks:
    total_sales += chunk['[1]'].sum()
print(total_sales)
Common Mistakes
Using the wrong column name raises a KeyError.
Summing a non-numeric column either raises an error or concatenates the values instead of producing a numeric total.
Explanation
We want to sum the 'sales' column from each chunk to get the total sales.
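One way to check a chunked sum is to compare it against a single full read on data small enough to fit in memory. This is a minimal sketch; the sample rows and the chunk size of 2 are assumptions, not part of the exercise.

```python
import io
import pandas as pd

# Hypothetical sample standing in for 'large_file.csv'.
csv_text = "product_name,sales\nA,100\nB,250\nA,50\nC,75\nB,25\n"

# Accumulate the 'sales' total one chunk at a time.
total_sales = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_sales += chunk["sales"].sum()

# Reading everything at once is only feasible here because the sample is tiny.
full_total = pd.read_csv(io.StringIO(csv_text))["sales"].sum()

print(total_sales, full_total)  # 500 500
```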
3. Fill in the blank
Hard: Fix the error in the code to filter rows where 'quantity' is greater than 10 in each chunk.
Data Analysis Python
import pandas as pd

chunks = pd.read_csv('large_file.csv', chunksize=500)
filtered_rows = []
for chunk in chunks:
    filtered = chunk[chunk['quantity'] [1] 10]
    filtered_rows.append(filtered)
result = pd.concat(filtered_rows)
print(result)
Common Mistakes
Using '=>' causes a syntax error; it is not a Python operator.
Using '<=' or '==' selects the wrong rows.
Explanation
The correct operator to filter rows where quantity is greater than 10 is '>'.
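The sketch below shows the '>' filter applied per chunk, with the boundary value 10 included in the sample to show that it is excluded. The data and the chunk size of 2 are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample; note the row with quantity exactly 10.
csv_text = "item,quantity\na,5\nb,12\nc,10\nd,30\ne,11\nf,2\n"

filtered_rows = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Boolean mask: keep only rows whose quantity is strictly greater than 10.
    filtered_rows.append(chunk[chunk["quantity"] > 10])

# Combine the per-chunk results into one DataFrame.
result = pd.concat(filtered_rows)
kept = result["quantity"].tolist()

print(kept)  # [12, 30, 11]
```

Only the filtered rows are held in memory, so this pattern works as long as the matching rows fit in memory even when the full file does not.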
4. Fill in the blank
Hard: Fill both blanks to create a dictionary with product names as keys and total sales as values from chunks.
Data Analysis Python
import pandas as pd

chunks = pd.read_csv('large_file.csv', chunksize=1000)
sales_dict = {}
for chunk in chunks:
    for product, group in chunk.groupby([1]):
        sales_dict[product] = sales_dict.get(product, 0) + group[[2]].sum()
print(sales_dict)
Common Mistakes
Grouping by the wrong column produces incorrect results.
Summing the wrong column produces wrong totals.
Explanation
We group by 'product_name' and sum the 'sales' column to accumulate total sales per product.
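The same accumulation can also be written by letting groupby compute each chunk's per-product subtotal first, then merging those subtotals into the dictionary. This is an alternative sketch rather than the exercise's expected answer, and the sample data and chunk size are assumptions.

```python
import io
import pandas as pd

# Hypothetical sample; products A and B appear in more than one chunk.
csv_text = "product_name,sales\nA,100\nB,250\nA,50\nC,75\nB,25\n"

sales_dict = {}
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Per-chunk subtotal for each product, as a Series indexed by product name.
    subtotals = chunk.groupby("product_name")["sales"].sum()
    for product, subtotal in subtotals.items():
        # Merge the chunk subtotal into the running total (int() drops the NumPy type).
        sales_dict[product] = sales_dict.get(product, 0) + int(subtotal)

print(sales_dict)  # {'A': 150, 'B': 275, 'C': 75}
```

The dict.get(product, 0) default is what lets a product that first appears in a later chunk start from zero.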
5. Fill in the blank
Hard: Fill all three blanks to read a large CSV, filter rows where 'region' is 'East', and count rows per 'category'.
Data Analysis Python
import pandas as pd

chunks = pd.read_csv('large_file.csv', chunksize=[1])
category_counts = {}
for chunk in chunks:
    filtered = chunk[chunk[[2]] == [3]]
    counts = filtered['category'].value_counts()
    for cat, count in counts.items():
        category_counts[cat] = category_counts.get(cat, 0) + count
print(category_counts)
Common Mistakes
Using a very small chunk size adds per-chunk overhead and slows processing; a very large one uses more memory.
Filtering on the wrong column or value returns the wrong data.
Explanation
We read in chunks of 500 rows, filter where 'region' equals 'East', then count categories.
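To see the filter-then-count pattern end to end, here is a minimal sketch on a small in-memory sample. The rows and the chunk size of 3 are hypothetical and stand in for 'large_file.csv' and the 500-row chunks.

```python
import io
import pandas as pd

# Hypothetical sample with a mix of regions and categories.
csv_text = (
    "region,category\n"
    "East,Toys\nWest,Toys\nEast,Books\nEast,Toys\n"
    "North,Books\nEast,Books\nEast,Games\n"
)

category_counts = {}
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    # Keep only the rows for the 'East' region.
    filtered = chunk[chunk["region"] == "East"]
    # Count rows per category within this chunk, then fold into the running totals.
    for cat, count in filtered["category"].value_counts().items():
        category_counts[cat] = category_counts.get(cat, 0) + int(count)

print(category_counts)
```

For this sample the totals come to 2 for Toys, 2 for Books, and 1 for Games; the rows from West and North are never counted.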