The append Equivalent with concat in pandas - Time & Space Complexity
We want to understand how the time needed to combine data grows when using pandas concat, the recommended replacement for DataFrame.append (which was deprecated and then removed in pandas 2.0).
How does the work change as the data size gets bigger?
Analyze the time complexity of this pandas code that combines two dataframes.
```python
import pandas as pd

n = 10  # example value for n
df1 = pd.DataFrame({'A': range(n)})       # n rows: 0 .. n-1
df2 = pd.DataFrame({'A': range(n, 2*n)})  # n rows: n .. 2n-1
result = pd.concat([df1, df2], ignore_index=True)  # 2n rows, fresh index
```
This code joins two dataframes with n rows each into one dataframe with 2n rows.
Look at what repeats as data grows.
- Primary operation: copying rows from both dataframes into a new combined dataframe.
- How many times: once for each row in both dataframes, so about 2n times.
As the number of rows n grows, the work to combine grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 operations (copying rows) |
| 100 | About 200 operations |
| 1000 | About 2000 operations |
Pattern observation: doubling the input doubles the work needed.
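One way to sanity-check the doubling pattern is to time the concat for increasing n. Exact timings depend on your machine, so treat the printed numbers as illustrative rather than definitive:

```python
import time
import pandas as pd

for n in (100_000, 200_000, 400_000):
    df1 = pd.DataFrame({'A': range(n)})
    df2 = pd.DataFrame({'A': range(n, 2 * n)})

    start = time.perf_counter()
    result = pd.concat([df1, df2], ignore_index=True)
    elapsed = time.perf_counter() - start

    # The combined frame always has 2n rows; elapsed should roughly
    # double each time n doubles, matching the O(n) analysis.
    print(f"n={n:>7}: {len(result)} rows in {elapsed:.4f}s")
```

Small runs are noisy (interpreter and allocator overhead dominate), so the linear trend shows up more clearly at larger n.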
Time Complexity: O(n)
This means the time to combine data grows linearly with the number of rows.
[X] Wrong: "Using concat is faster because it just links data without copying."
[OK] Correct: concat allocates a new dataframe and copies every row from its inputs into it, so time grows with the total data size.
Understanding how data combining scales helps you write efficient code and explain your choices clearly.
What if we combined many small dataframes in a loop using concat each time? How would the time complexity change?
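A sketch of the two approaches behind that question (the names pieces, combine_in_loop, and combine_once are illustrative): calling concat inside the loop re-copies everything accumulated so far on every iteration, roughly 1 + 2 + ... + k ≈ k²/2 row copies for k pieces, so it is O(k²). Collecting the pieces in a list and concatenating once copies each row exactly once, which stays O(k):

```python
import pandas as pd

pieces = [pd.DataFrame({'A': [i]}) for i in range(100)]  # many small frames

def combine_in_loop(frames):
    # O(k^2) total row copies: each concat rebuilds the accumulated result.
    result = frames[0]
    for frame in frames[1:]:
        result = pd.concat([result, frame], ignore_index=True)
    return result

def combine_once(frames):
    # O(k) total row copies: one concat copies every row exactly once.
    return pd.concat(frames, ignore_index=True)

slow = combine_in_loop(pieces)
fast = combine_once(pieces)
```

Both produce the same dataframe; only the amount of copying differs, which is why the pandas documentation recommends accumulating in a list and calling concat a single time.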