First Data Analysis Walkthrough in Python: Time & Space Complexity
When we analyze data, we often run code that looks at many rows. Understanding how the time to run this code grows as the data grows helps us plan better.
We want to know: How does the work change when we have more data?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def analyze_data(df):
    # df is a DataFrame with a column 'numbers'
    total = 0
    for value in df['numbers']:
        total += value
    return total
```
This code adds up all the numbers in one column of a data table.
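To see the function in action, here is a small usage sketch; the DataFrame contents are invented for illustration:

```python
import pandas as pd

def analyze_data(df):
    # Sum every value in the 'numbers' column, one row at a time.
    total = 0
    for value in df['numbers']:
        total += value
    return total

# A tiny example DataFrame with three rows.
df = pd.DataFrame({'numbers': [1, 2, 3]})
print(analyze_data(df))  # → 6
```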
Identify the repeated operations: loops, recursion, and array traversals.
- Primary operation: Looping through each number in the column.
- How many times: Once for every row in the data.
As the number of rows grows, the time to add all numbers grows at the same pace.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 additions |
| 100 | 100 additions |
| 1000 | 1000 additions |
Pattern observation: Doubling the data doubles the work.
Time Complexity: O(n)
This means the time to finish grows directly with the number of rows.
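You can check this pattern empirically by counting the additions directly. In this sketch, the counting wrapper is ours (not part of the original function) and exists only to make the operation count visible:

```python
import pandas as pd

def count_additions(df):
    # Same loop as analyze_data, but also count how many additions run.
    total, ops = 0, 0
    for value in df['numbers']:
        total += value
        ops += 1
    return ops

for n in (10, 100, 1000):
    df = pd.DataFrame({'numbers': range(n)})
    print(n, count_additions(df))  # n rows → n additions
```

The printed counts match the table above: the number of additions equals the number of rows.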
[X] Wrong: "Adding numbers in a column takes the same time no matter how big the data is."
[OK] Correct: Each number must be looked at once, so more data means more work.
Understanding how your code's running time grows with data size shows that you grasp efficiency. That helps you write better data analysis code and explain your reasoning clearly.
"What if we used a built-in function like sum() instead of a loop? How would the time complexity change?"