What is Pandas - Complexity Analysis
We want to understand how the running time of a Pandas operation grows as the data size grows.
How does Pandas handle bigger data, and what costs come with it?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Build a small table of four people.
data = {'Name': ['Anna', 'Bob', 'Cara', 'Dan'],
        'Age': [23, 35, 45, 28],
        'City': ['NY', 'LA', 'NY', 'Chicago']}
df = pd.DataFrame(data)

# Average the 'Age' column.
result = df['Age'].mean()
```
This code creates a small table and calculates the average age.
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Going through each number in the 'Age' column to add them up.
- How many times: Once for each row in the data.
As the number of rows grows, the time to calculate the average grows too.
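That traversal can be sketched as a plain Python loop. This is a simplified stand-in for what `mean()` does internally (Pandas actually delegates to vectorized NumPy code, but the amount of work per row is the same):

```python
ages = [23, 35, 45, 28]

# One addition per row: n rows means n additions.
total = 0
for age in ages:
    total += age

average = total / len(ages)
print(average)  # 32.75
```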
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 additions |
| 100 | 100 additions |
| 1000 | 1000 additions |
Pattern observation: The work grows directly with the number of rows.
Time Complexity: O(n)
This means the time to calculate the average grows in a straight line as the data grows.
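You can observe this growth yourself with a rough timing sketch. The exact numbers depend on your machine, but as `n` grows tenfold, the elapsed time should grow roughly tenfold as well:

```python
import time
import pandas as pd

for n in (10_000, 100_000, 1_000_000):
    df = pd.DataFrame({'Age': range(n)})
    start = time.perf_counter()
    df['Age'].mean()
    elapsed = time.perf_counter() - start
    print(f"n={n:>9}: {elapsed:.6f} s")
```

Timings this small are noisy, so run each size several times before drawing conclusions.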
[X] Wrong: "Calculating the average takes the same time no matter how big the data is."
[OK] Correct: The calculation must look at each number once, so more data means more work.
Understanding how Pandas handles data size helps you explain your code choices clearly and confidently.
"What if we calculated the average of a filtered column instead? How would the time complexity change?"