Jupyter Notebook Best Practices in Python Data Analysis - Time & Space Complexity
We want to understand how the time it takes to run code in a Jupyter Notebook grows as the notebook gets bigger or more complex.
How does adding more cells or data affect the time to run the notebook?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

data = pd.read_csv('large_data.csv')
for i in range(len(data)):
    row = data.iloc[i]
    # process each row
    print(row['value'])
```
This code reads a large CSV file and then processes each row one at a time with `iloc`, which incurs Python-level overhead on every iteration.
Identify the loops, recursion, or array traversals that repeat work.
- Primary operation: Looping through each row of the data.
- How many times: Once for every row in the dataset.
As the number of rows grows, the processing time grows in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 operations |
| 100 | 100 operations |
| 1000 | 1000 operations |
Pattern observation: The time grows directly with the number of rows. Double the rows, double the work.
Time Complexity: O(n)
This means the time to run the code grows in a straight line with the size of the data.
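We can verify this linear pattern directly by counting how many times the per-row work runs for different input sizes. The sketch below uses a synthetic DataFrame in place of `large_data.csv` (the file itself is assumed, not provided), and reproduces the counts from the table above.

```python
import pandas as pd

def count_row_operations(n: int) -> int:
    """Count how many times the per-row work executes for n rows."""
    data = pd.DataFrame({'value': range(n)})  # stand-in for large_data.csv
    ops = 0
    for i in range(len(data)):
        row = data.iloc[i]   # one operation per row
        _ = row['value']
        ops += 1
    return ops

for n in (10, 100, 1000):
    print(n, count_row_operations(n))  # operations grow one-for-one with rows
```

Doubling `n` doubles the count, which is exactly what O(n) growth predicts.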
[X] Wrong: "Running more cells or adding comments does not affect execution time."
[OK] Correct: While comments do not affect time, adding many cells with heavy computations or repeated data loading can increase total run time significantly.
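One common source of repeated work in notebooks is re-reading the same file in multiple cells. A minimal sketch of the fix, assuming a small in-memory CSV as a stand-in for the real file, is to load once and reuse the DataFrame:

```python
import io
import pandas as pd

CSV_TEXT = "value\n1\n2\n3\n"  # stand-in for large_data.csv

# Anti-pattern: every call repeats the full parse cost.
def load_every_time() -> pd.DataFrame:
    return pd.read_csv(io.StringIO(CSV_TEXT))

# Better: parse once, then reuse the cached DataFrame across cells.
_cache = {}
def load_once() -> pd.DataFrame:
    if 'data' not in _cache:
        _cache['data'] = pd.read_csv(io.StringIO(CSV_TEXT))
    return _cache['data']

a = load_once()
b = load_once()
print(a is b)  # the same object is reused, so the CSV is parsed only once
```

In a notebook, the same effect is usually achieved by reading the file in one top cell and referring to that variable everywhere else.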
Understanding how code execution time grows in notebooks helps you write efficient data analysis and explain your approach clearly in real projects or interviews.
"What if we replaced the loop with vectorized operations using pandas? How would the time complexity change?"