Challenge - 5 Problems
Large File Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate
Reading a large binary file with memory mapping
What is the shape of the numpy array data after running this code snippet? (NumPy)

```python
import numpy as np

filename = 'large_file.dat'
data = np.memmap(filename, dtype='float32', mode='r', shape=(1000, 1000))
print(data.shape)
```
💡 Hint
The shape parameter defines the array dimensions when using np.memmap.
✅ Explanation
np.memmap creates an array view of the file with the given shape. Here, shape is (1000, 1000), so the output shape is (1000, 1000).
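The explanation above can be verified at small scale. This is a minimal sketch assuming NumPy is installed; the file name and the (3, 4) shape are stand-ins for the quiz's 'large_file.dat' and (1000, 1000):

```python
import numpy as np
import tempfile, os

# Write a small binary file to stand in for 'large_file.dat' (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "large_file.dat")
np.arange(12, dtype="float32").tofile(path)

# Memory-map it read-only; the shape parameter reshapes the flat file on disk.
data = np.memmap(path, dtype="float32", mode="r", shape=(3, 4))
print(data.shape)         # (3, 4)
print(float(data[1, 2]))  # 6.0 -- elements are read from disk on demand
```

The array reports its shape immediately because memmap only maps the file; no element is read into RAM until it is indexed.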
❓ Multiple Choice
Intermediate
Effect of chunk reading on memory usage
Given a large CSV file too big to fit in memory, which code snippet correctly reads it in chunks and prints the total number of rows? (pandas)

```python
import pandas as pd

filename = 'large_data.csv'
chunk_size = 10000
row_count = 0
for chunk in pd.read_csv(filename, chunksize=chunk_size):
    row_count += len(chunk)
print(row_count)
```
💡 Hint
Reading in chunks allows processing parts of the file without loading all data at once.
✅ Explanation
The code sums the length of each chunk, resulting in the total number of rows in the file.
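The chunked row count can be demonstrated with a small in-memory CSV. This sketch assumes pandas is installed; the data and chunk size are stand-ins for the quiz's 'large_data.csv' and 10000:

```python
import pandas as pd
import io

# Small in-memory CSV standing in for 'large_data.csv' (hypothetical contents).
csv_text = "value\n" + "\n".join(str(i) for i in range(25))

row_count = 0
# chunksize=10 yields DataFrames of at most 10 rows each,
# so only one chunk is in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=10):
    row_count += len(chunk)

print(row_count)  # 25
```

The three chunks hold 10, 10, and 5 rows; summing their lengths recovers the full row count without ever materializing the whole file.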
🔧 Debug
Advanced
Fixing memory error when loading large numpy array
This code tries to load a large numpy array from a file but causes a MemoryError. Which option fixes the issue by using memory mapping? (NumPy)

```python
import numpy as np

data = np.load('large_array.npy')
print(data.sum())
```
💡 Hint
Use the mmap_mode parameter in np.load to avoid loading all data into memory.
✅ Explanation
Option A uses mmap_mode='r', which memory-maps the file, preventing the MemoryError by loading data on demand.
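The memory-mapped fix the hint describes can be sketched with a small stand-in array (the quiz's answer options are not reproduced here; the file name and sizes are hypothetical):

```python
import numpy as np
import tempfile, os

# Save a small array to stand in for 'large_array.npy'.
path = os.path.join(tempfile.mkdtemp(), "large_array.npy")
np.save(path, np.ones((1000, 100), dtype="float64"))

# mmap_mode='r' maps the file instead of reading it all into RAM;
# np.load returns a read-only np.memmap, and pages are faulted in
# lazily as the reduction touches them.
data = np.load(path, mmap_mode="r")
print(data.sum())  # 100000.0
```

The reduction still visits every element, but peak memory stays bounded by the OS page cache rather than the array size.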
❓ Multiple Choice
Advanced
Visualizing data read in chunks
You want to plot the sum of values in each chunk of a large CSV file. Which code snippet produces a line plot of chunk sums? (pandas)

```python
import pandas as pd
import matplotlib.pyplot as plt

filename = 'large_data.csv'
chunk_size = 5000
chunk_sums = []
for chunk in pd.read_csv(filename, chunksize=chunk_size):
    chunk_sums.append(chunk['value'].sum())

plt.plot(chunk_sums)
plt.xlabel('Chunk number')
plt.ylabel('Sum of values')
plt.title('Sum per chunk')
plt.show()
```
💡 Hint
Summing values per chunk and plotting them shows trends without loading full data.
✅ Explanation
The code sums the 'value' column in each chunk and plots these sums as a line graph.
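The per-chunk sums that feed the plot can be computed at small scale. This sketch assumes pandas and a numeric 'value' column; the plotting calls are left out so it focuses on the data the line plot would show:

```python
import pandas as pd
import io

# In-memory CSV with a 'value' column, standing in for 'large_data.csv'.
csv_text = "value\n" + "\n".join(str(i) for i in range(12))

chunk_sums = []
# Each chunk contributes one point to the eventual line plot.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=5):
    chunk_sums.append(int(chunk["value"].sum()))

print(chunk_sums)  # [10, 35, 21]
```

Passing chunk_sums to plt.plot, as in the snippet above, then draws one point per chunk against its index.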
🧠 Conceptual
Expert
Choosing the best method for large file processing
You have a 100GB CSV file and limited RAM (8GB). You want to compute the average of a numeric column efficiently. Which approach is best?
💡 Hint
Think about memory limits and efficient partial processing.
✅ Explanation
Reading in chunks avoids memory overload and allows incremental computation of the average.
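The incremental average the explanation describes can be sketched as a running sum and count over chunks. This assumes pandas and a numeric 'value' column; the column name and chunk size are stand-ins:

```python
import pandas as pd
import io

# Small stand-in for the 100GB CSV; the technique is identical at any size.
csv_text = "value\n" + "\n".join(str(i) for i in range(100))

total, count = 0.0, 0
# Accumulate sum and count per chunk; memory use is bounded by chunksize,
# never by the total file size.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=30):
    total += chunk["value"].sum()
    count += len(chunk)

print(total / count)  # 49.5
```

Because the mean decomposes into a global sum divided by a global count, each chunk can be discarded after its contribution is accumulated, which is what makes this approach fit in 8GB of RAM.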