Time & Space Complexity in Python Data Analysis (Jupyter Notebook setup and usage)
We want to understand how the time it takes to run code in a Jupyter Notebook changes as we add more cells or data.
How does the notebook's performance grow when we do more work inside it?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 10  # Define n before using it
data = pd.DataFrame({'numbers': range(n)})

# Double each number one at a time with an explicit Python loop
result = []
for i in data['numbers']:
    result.append(i * 2)
```
This code creates a list of numbers from 0 to n-1 and doubles each number, storing the results.
- Primary operation: Looping through each number in the data.
- How many times: Exactly n times, once for each number.
As the number of items n grows, the total time spent doubling grows in proportion, because the loop body runs once per item.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 operations |
| 100 | 100 operations |
| 1000 | 1000 operations |
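The counts in the table can be verified directly by instrumenting the loop with a counter. This is a small sketch; the `count_ops` helper name is ours, not part of the original snippet:

```python
import pandas as pd

def count_ops(n):
    """Count how many doubling operations the loop performs for input size n."""
    data = pd.DataFrame({'numbers': range(n)})
    ops = 0
    result = []
    for i in data['numbers']:
        result.append(i * 2)
        ops += 1  # one operation per item
    return ops

for n in (10, 100, 1000):
    print(n, count_ops(n))
# → 10 10
# → 100 100
# → 1000 1000
```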
Pattern observation: The operations grow directly with the input size. Double the input means double the work.
Time Complexity: O(n)
This means the running time grows linearly with the number of items you process: doubling n roughly doubles the runtime.
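One way to see this linear growth in a notebook is to time the loop at a few input sizes. A rough sketch; absolute timings will vary by machine, but the ratio between sizes should be roughly 10x at each step:

```python
import time
import pandas as pd

def double_all(n):
    """The same doubling loop as above, wrapped in a function for timing."""
    data = pd.DataFrame({'numbers': range(n)})
    result = []
    for i in data['numbers']:
        result.append(i * 2)
    return result

for n in (10_000, 100_000, 1_000_000):
    start = time.perf_counter()
    double_all(n)
    elapsed = time.perf_counter() - start
    print(f"n={n:>9}: {elapsed:.4f} s")
```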
[X] Wrong: "Adding more cells or data in a Jupyter Notebook does not affect performance much."
[OK] Correct: Each cell that processes more data takes more time, so the total time grows with the amount of work done.
Understanding how your code's running time grows helps you write better data analysis scripts and shows you think about efficiency, a key skill in data science.
"What if we replaced the for-loop with a vectorized operation like `data['numbers'] * 2`, or with pandas' `apply` or `map`? How would the time complexity change?"
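As a sketch of one possible answer: a vectorized pandas expression still does O(n) work overall, but the per-element loop runs in optimized C rather than Python, so the constant factor is much smaller. By contrast, `apply` and `map` still invoke a Python function once per element, so in practice they behave more like the explicit loop than like true vectorization:

```python
import pandas as pd

n = 10
data = pd.DataFrame({'numbers': range(n)})

# Vectorized: one pandas/NumPy operation over the whole column.
# Still O(n) work, but the loop happens inside optimized C code.
vectorized = (data['numbers'] * 2).tolist()

# map: also O(n), but each element passes through a Python function call.
applied = data['numbers'].map(lambda x: x * 2).tolist()

print(vectorized)  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(applied)     # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Both versions produce the same result and have the same O(n) time complexity; the difference is in the constant factor, which is why vectorized operations are usually preferred for large DataFrames.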