Consider this code that reads a large CSV file in chunks and counts total rows:
import pandas as pd
chunk_iter = pd.read_csv('large_data.csv', chunksize=1000)
total_rows = 0
for chunk in chunk_iter:
    total_rows += len(chunk)
print(total_rows)

If the CSV has 4523 rows, what will be printed?
Think about how chunksize controls the number of rows per chunk and how the loop sums all rows.
The code reads the CSV in chunks of at most 1000 rows (1000 + 1000 + 1000 + 1000 + 523) and sums the length of each chunk, so every row is counted exactly once. Since the file has 4523 rows, the printed total is 4523.
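To verify this without a real file, the same loop can be run against an in-memory CSV; the `StringIO` buffer and its row count are stand-ins matching the question's `large_data.csv`.

```python
import io

import pandas as pd

# Simulate 'large_data.csv': a header row plus 4523 data rows.
csv_text = "value\n" + "\n".join(str(i) for i in range(4523))

# chunksize=1000 yields chunks of 1000, 1000, 1000, 1000, and 523 rows.
total_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1000):
    total_rows += len(chunk)

print(total_rows)  # 4523
```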
Given this DataFrame with integer columns:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': np.random.randint(0, 1000, size=10000),
    'B': np.random.randint(0, 100000, size=10000)
})
df_optimized = df.copy()
df_optimized['A'] = pd.to_numeric(df_optimized['A'], downcast='unsigned')
df_optimized['B'] = pd.to_numeric(df_optimized['B'], downcast='unsigned')
print(df_optimized.memory_usage(deep=True).sum())

Which option best describes the output?
Downcasting reduces the size of numeric columns by using smaller data types.
Downcasting converts each column to the smallest unsigned integer type that can hold its values ('A' fits in uint16, 'B' in uint32), so the printed total is well below the footprint of the original int64 columns.
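The effect can be measured directly by comparing `memory_usage` totals before and after downcasting; the data here mirrors the question's setup.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': np.random.randint(0, 1000, size=10000),    # values fit in uint16
    'B': np.random.randint(0, 100000, size=10000),  # values fit in uint32
})

before = df.memory_usage(deep=True).sum()

optimized = df.copy()
optimized['A'] = pd.to_numeric(optimized['A'], downcast='unsigned')
optimized['B'] = pd.to_numeric(optimized['B'], downcast='unsigned')

after = optimized.memory_usage(deep=True).sum()

print(optimized.dtypes)   # A becomes uint16, B becomes uint32
print(before, after)      # the optimized total is substantially smaller
```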
Look at this code snippet:
import pandas as pd
df = pd.DataFrame({'value': range(1000000)})
filtered = df[df['value'] > 500000]
filtered.reset_index(inplace=True)
print(filtered.head())

What error will this code produce?
reset_index with inplace=True adds the old index as a column named 'index'.
The code filters rows and resets the index in place. This adds the old index as a new column named 'index'; no error occurs.
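A smaller version makes the behavior visible. The non-inplace assignment form is used here because, depending on the pandas version, calling `inplace=True` on a filtered slice can emit a `SettingWithCopyWarning`; the resulting columns are the same either way.

```python
import pandas as pd

df = pd.DataFrame({'value': range(10)})
filtered = df[df['value'] > 5]      # keeps original index labels 6..9

# reset_index moves the old index into a column named 'index';
# assigning the result is equivalent to inplace=True here.
filtered = filtered.reset_index()

print(filtered.columns.tolist())    # ['index', 'value']
print(filtered['index'].tolist())   # [6, 7, 8, 9]
```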
You have original and optimized DataFrames with memory usage values:
memory_original = 8000000
memory_optimized = 2000000
Which plot code will clearly show the reduction?
import matplotlib.pyplot as plt

memory_original = 8000000
memory_optimized = 2000000

plt.bar(['Original', 'Optimized'], [memory_original, memory_optimized])
plt.ylabel('Memory Usage (bytes)')
plt.title('Memory Usage Before and After Optimization')
plt.show()
Bar charts are good for comparing two values side by side.
The bar chart clearly shows the difference between original and optimized memory usage as two bars.
You have a dataset too large to fit into your computer's RAM. Which approach is best to analyze it efficiently?
Think about how to work with data that doesn't fit in memory.
Reading data in chunks allows processing parts of the dataset sequentially without loading all data at once, which is efficient for large datasets.
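A minimal sketch of the chunked approach, computing an aggregate (here, a mean) while holding only one chunk in memory at a time; the in-memory CSV and the column name 'x' are illustrative stand-ins for a file too large to load at once.

```python
import io

import pandas as pd

# Stand-in for a large on-disk file; in practice you would pass a path.
csv_text = "x\n" + "\n".join(str(i) for i in range(10000))

# Accumulate per-chunk partial results instead of loading everything.
total = 0
count = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2500):
    total += chunk['x'].sum()
    count += len(chunk)

mean = total / count
print(mean)  # 4999.5 (mean of 0..9999)
```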