NumPy · data · ~20 mins

Working with large files efficiently in NumPy - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️ Large File Mastery: get all five challenges correct to earn this badge!
Problem 1: Predict Output (intermediate)
Reading a large binary file with memory mapping
What is the shape of the numpy array data after running this code snippet?
import numpy as np
filename = 'large_file.dat'
data = np.memmap(filename, dtype='float32', mode='r', shape=(1000, 1000))
print(data.shape)
A) Raises ValueError
B) (1000000,)
C) (1000,)
D) (1000, 1000)
💡 Hint
The shape parameter defines the array dimensions when using np.memmap.
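As the hint says, `np.memmap` never reads the whole file into RAM; the `shape` argument alone fixes the array's dimensions. A minimal runnable sketch (the filename and sizes here are illustrative stand-ins for the problem's `large_file.dat`):

```python
import numpy as np

# Create a small binary file to memory-map (stand-in for 'large_file.dat').
arr = np.arange(12, dtype='float32')
arr.tofile('demo_file.dat')

# mode='r' maps the file read-only; the shape argument alone
# determines the dimensions of the resulting array.
data = np.memmap('demo_file.dat', dtype='float32', mode='r', shape=(3, 4))
print(data.shape)  # -> (3, 4)
```

Note that the file size must match `shape` times the item size (here 12 × 4 bytes), or `np.memmap` raises a `ValueError`.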
Problem 2: Data Output (intermediate)
Effect of chunk reading on memory usage
Given a large CSV file too big to fit in memory, which code snippet correctly reads it in chunks and prints the total number of rows?
import pandas as pd
filename = 'large_data.csv'
chunk_size = 10000
row_count = 0
for chunk in pd.read_csv(filename, chunksize=chunk_size):
    row_count += len(chunk)
print(row_count)
A) Raises a TypeError
B) Prints the total number of rows in the file
C) Prints the number of columns in the file
D) Prints the number of chunks read
💡 Hint
Reading in chunks allows processing parts of the file without loading all data at once.
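The pattern from the snippet can be tried end to end on a small stand-in file (the filename, column, and row counts below are illustrative, not from the original problem):

```python
import pandas as pd

# Build a small CSV to stand in for 'large_data.csv'.
pd.DataFrame({'value': range(25)}).to_csv('demo_data.csv', index=False)

chunk_size = 10  # each chunk is a DataFrame of at most chunk_size rows
row_count = 0
for chunk in pd.read_csv('demo_data.csv', chunksize=chunk_size):
    row_count += len(chunk)  # len() of a DataFrame is its row count
print(row_count)  # -> 25
```

Each iteration holds only one chunk in memory, so peak memory scales with `chunk_size`, not with the file size.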
Problem 3: 🔧 Debug (advanced)
Fixing memory error when loading large numpy array
This code tries to load a large numpy array from a file but causes a MemoryError. Which option fixes the issue by using memory mapping?
import numpy as np
data = np.load('large_array.npy')
print(data.sum())
A) data = np.load('large_array.npy', mmap_mode='r')
B) data = np.loadtxt('large_array.npy')
C) data = np.memmap('large_array.npy', dtype='float64', mode='r')
D) data = np.load('large_array.npy', allow_pickle=True)
💡 Hint
Use the mmap_mode parameter in np.load to avoid loading all data into memory.
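A small demonstration of the `mmap_mode` fix (the filename and array size are illustrative stand-ins for the problem's `large_array.npy`):

```python
import numpy as np

# Save a sample array (stand-in for 'large_array.npy').
np.save('demo_array.npy', np.ones((100, 100)))

# mmap_mode='r' maps the .npy file read-only instead of loading it
# into RAM; reductions like .sum() stream through the mapped data.
data = np.load('demo_array.npy', mmap_mode='r')
print(data.sum())  # -> 10000.0
```

Unlike the bare `np.memmap` in option C, `np.load` with `mmap_mode` reads the `.npy` header, so the dtype and shape are recovered automatically rather than supplied by hand.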
Problem 4: Visualization (advanced)
Visualizing data read in chunks
You want to plot the sum of values in each chunk of a large CSV file. Which code snippet produces a line plot of chunk sums?
import pandas as pd
import matplotlib.pyplot as plt
filename = 'large_data.csv'
chunk_size = 5000
chunk_sums = []
for chunk in pd.read_csv(filename, chunksize=chunk_size):
    chunk_sums.append(chunk['value'].sum())
plt.plot(chunk_sums)
plt.xlabel('Chunk number')
plt.ylabel('Sum of values')
plt.title('Sum per chunk')
plt.show()
A) Plots a line graph showing the sum of the 'value' column for each chunk
B) Plots a bar chart of the number of rows per chunk
C) Plots a scatter plot of 'value' vs index for the entire file
D) Raises KeyError because the 'value' column does not exist
💡 Hint
Summing values per chunk and plotting them shows trends without loading full data.
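The per-chunk aggregation can be checked without a plotting backend; here is a sketch that builds the `chunk_sums` list on a small stand-in CSV (the filename, column, and values are illustrative):

```python
import pandas as pd

# Small CSV stand-in for 'large_data.csv'; a 'value' column is assumed.
pd.DataFrame({'value': [1] * 12}).to_csv('demo_plot.csv', index=False)

chunk_sums = []
for chunk in pd.read_csv('demo_plot.csv', chunksize=5):
    chunk_sums.append(int(chunk['value'].sum()))  # one sum per chunk
print(chunk_sums)  # -> [5, 5, 2]

# Plotting is then a single call, e.g.:
# import matplotlib.pyplot as plt
# plt.plot(chunk_sums); plt.show()
```

Only the list of per-chunk sums (one float per chunk) is kept in memory, which is why this scales to files far larger than RAM.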
Problem 5: 🧠 Conceptual (expert)
Choosing the best method for large file processing
You have a 100GB CSV file and limited RAM (8GB). You want to compute the average of a numeric column efficiently. Which approach is best?
A) Convert the CSV to a NumPy memmap and compute the mean on the memmap array
B) Load the entire file with pandas.read_csv() and compute the mean directly
C) Use pandas.read_csv() with chunksize to process the file in parts and compute a running average
D) Use Python's built-in open() to read the file line by line, converting each value to float and summing
💡 Hint
Think about memory limits and efficient partial processing.
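The chunked running-average approach can be sketched as follows; the filename, column name `x`, and chunk size are illustrative assumptions, but the sum-and-count accumulation is the core of the technique:

```python
import pandas as pd

# Stand-in CSV; in practice the real file is far larger than RAM.
pd.DataFrame({'x': range(1, 101)}).to_csv('demo_avg.csv', index=False)

# Running mean: accumulate a sum and a count per chunk, never holding
# the whole column in memory at once.
total, count = 0.0, 0
for chunk in pd.read_csv('demo_avg.csv', chunksize=30):
    total += chunk['x'].sum()
    count += len(chunk)
print(total / count)  # -> 50.5
```

Accumulating `(sum, count)` rather than averaging each chunk's mean keeps the result exact even when the last chunk is shorter than the others.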