Memory-Efficient Operations in Python Data Analysis - Time & Space Complexity
When working with large datasets, how fast our code runs matters a lot. Here, we look at how memory-efficient operations affect the time it takes to process data.
We want to know how the processing speed changes as the data size grows when using memory-friendly methods.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def process_data():
    # Read the large CSV in chunks of 1000 rows to limit memory use
    for chunk in pd.read_csv('large_file.csv', chunksize=1000):
        # Keep only rows whose 'value' column exceeds 10
        filtered = chunk[chunk['value'] > 10]
        # Process the filtered chunk (here: report its row count)
        print(filtered.shape[0])

# Process data in chunks
process_data()
```
This code reads a large CSV file in small parts, filters each part, and processes it to save memory.
Identify the loops, recursion, or array traversals that repeat:
- Primary operation: Looping over chunks of data read from the file.
- How many times: once per chunk; the number of chunks equals the total row count divided by the chunk size (rounded up).
As the total data size grows, the number of chunks grows proportionally.
| Input Size (rows) | Approx. Number of Chunks |
|---|---|
| 10,000 | 10 |
| 100,000 | 100 |
| 1,000,000 | 1000 |
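The chunk counts in the table follow directly from dividing total rows by the chunk size. A quick sketch, assuming the chunk size of 1000 from the snippet above:

```python
import math

def num_chunks(total_rows: int, chunk_size: int = 1000) -> int:
    """Number of chunks pandas will yield for a given row count."""
    return math.ceil(total_rows / chunk_size)

for rows in (10_000, 100_000, 1_000_000):
    print(rows, "->", num_chunks(rows))  # 10, 100, 1000 chunks respectively
```

Note the ceiling: a file with 1,001 rows needs 2 chunks, since the last partial chunk still has to be read.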
Pattern observation: The number of operations grows linearly with the input size because each row is processed once in some chunk.
Time Complexity: O(n)
This means the time to process data grows directly in proportion to the number of rows.
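One way to see the linear growth is to count row-level operations directly. In this sketch (pure Python, no pandas; the threshold of 10 mirrors the filter in the snippet), the number of rows examined always equals the input size, no matter how the data is split into chunks:

```python
def process_in_chunks(values, chunk_size):
    """Filter values > 10 chunk by chunk, counting every row examined."""
    rows_examined = 0
    kept = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        rows_examined += len(chunk)  # each row is touched exactly once
        kept.extend(v for v in chunk if v > 10)
    return rows_examined, kept

data = list(range(25))  # 25 rows of toy data
examined, kept = process_in_chunks(data, chunk_size=4)
print(examined)  # 25: every row is processed once, regardless of chunk size
```

Changing `chunk_size` changes how many loop iterations occur, but not the total rows examined, which is why the complexity stays O(n).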
[X] Wrong: "Reading data in chunks makes the process faster than reading all at once."
[OK] Correct: Chunking saves memory but does not reduce total processing time; it still reads and processes all rows once.
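To check this claim concretely, here is a small sketch using an in-memory CSV (via `io.StringIO`, so it runs without a real file): the chunked read filters exactly the same rows as loading everything at once, only with a smaller peak memory footprint.

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for 'large_file.csv'
csv_text = "value\n" + "\n".join(str(v) for v in range(30))

# Full load: all rows in memory at once
full = pd.read_csv(io.StringIO(csv_text))
full_count = int((full["value"] > 10).sum())

# Chunked load: same rows, same filter, lower peak memory
chunk_count = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=7):
    filtered = chunk[chunk["value"] > 10]
    chunk_count += len(filtered)

print(full_count, chunk_count)  # both 19: identical work, different memory profile
```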
Understanding how memory-efficient methods affect time helps you explain practical trade-offs clearly. This skill shows you can handle big data thoughtfully.
"What if we increased the chunk size to process more rows at once? How would the time complexity change?"