Why the DataFrame Is the Core Data Structure in Python Data Analysis - Performance Analysis
We want to understand how the time to work with a DataFrame changes as the data grows.
How does the size of data affect operations on a DataFrame?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 10  # example size
data = {'A': range(n), 'B': range(n)}
df = pd.DataFrame(data)
result = df['A'] + df['B']  # element-wise addition of the two columns
```
This code creates a DataFrame with two columns and adds them element-wise.
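To see the element-wise behavior concretely, here is a small run of the same snippet (n = 4 is an arbitrary choice so the full result is easy to inspect):

```python
import pandas as pd

n = 4  # small size so every value of the result is visible
df = pd.DataFrame({'A': range(n), 'B': range(n)})

# Each row's 'A' value is added to the same row's 'B' value.
result = df['A'] + df['B']
print(result.tolist())  # → [0, 2, 4, 6]
```

Row 0 produces 0 + 0, row 1 produces 1 + 1, and so on: one addition per row.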
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Adding each pair of elements from two columns.
- How many times: Once for each row in the DataFrame (n times).
As the number of rows grows, the addition operation happens more times.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 additions |
| 100 | 100 additions |
| 1000 | 1000 additions |
Pattern observation: The number of operations grows directly with the number of rows.
Time Complexity: O(n)
This means the time to add columns grows in a straight line as the data size grows.
[X] Wrong: "Adding two columns is instant no matter how big the DataFrame is."
[OK] Correct: Each row must be processed, so more rows mean more work and more time.
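The vectorized addition hides the per-row work inside pandas' internals, but an explicit Python loop computing the same result makes the n-row traversal visible (a sketch for illustration only; the loop version is much slower in practice):

```python
import pandas as pd

n = 10
df = pd.DataFrame({'A': range(n), 'B': range(n)})

# The same sum spelled out row by row: one addition per row, n in total.
looped = [df['A'].iloc[i] + df['B'].iloc[i] for i in range(n)]

vectorized = (df['A'] + df['B']).tolist()
print(looped == vectorized)  # → True
```

Both versions touch every row once; vectorization changes the constant factor, not the O(n) complexity.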
Understanding how DataFrame operations scale helps you explain your code's efficiency clearly and confidently.
"What if we added a new column by combining three existing columns instead of two? How would the time complexity change?"