concat() for stacking DataFrames in Pandas - Time & Space Complexity
When stacking DataFrames with pandas `concat()`, it is important to understand how the running time grows as the data gets bigger.
Specifically, we want to know how the amount of work changes as we add more rows or columns.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Two DataFrames with 1000 rows each
df1 = pd.DataFrame({'A': range(1000), 'B': range(1000)})
df2 = pd.DataFrame({'A': range(1000, 2000), 'B': range(1000, 2000)})

# Stack vertically (axis=0) and rebuild a fresh 0-based index
result = pd.concat([df1, df2], axis=0, ignore_index=True)
```
This code stacks two DataFrames vertically, combining their rows into one larger DataFrame.
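As a quick sanity check on what the snippet produces, a minimal sketch (reusing the same DataFrames) confirms that every row from both inputs lands in the result and that `ignore_index=True` rebuilds the index from 0:

```python
import pandas as pd

df1 = pd.DataFrame({'A': range(1000), 'B': range(1000)})
df2 = pd.DataFrame({'A': range(1000, 2000), 'B': range(1000, 2000)})
result = pd.concat([df1, df2], axis=0, ignore_index=True)

# The stacked frame holds all 2000 rows from both inputs
print(result.shape)      # (2000, 2)

# ignore_index=True renumbers the rows 0..1999
print(result.index[-1])  # 1999
```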
Identify the operations that repeat: loops, recursion, or array traversals.
- Primary operation: Copying rows from each DataFrame into a new combined DataFrame.
- How many times: Once per row across both DataFrames, so the total number of rows combined.
As the number of rows increases, the time to copy and combine them grows roughly in direct proportion.
| Rows per DataFrame (n) | Approx. Operations |
|---|---|
| 10 | About 20 row copies |
| 100 | About 200 row copies |
| 1000 | About 2000 row copies |
Pattern observation: Doubling the input roughly doubles the work done.
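One way to see this pattern on your own machine is a rough timing sketch. The `time_concat` helper below is illustrative only (actual timings vary by hardware and pandas version; this is not a rigorous benchmark):

```python
import time
import pandas as pd

def time_concat(n):
    # Build two n-row DataFrames and time stacking them
    df1 = pd.DataFrame({'A': range(n), 'B': range(n)})
    df2 = pd.DataFrame({'A': range(n), 'B': range(n)})
    start = time.perf_counter()
    result = pd.concat([df1, df2], axis=0, ignore_index=True)
    elapsed = time.perf_counter() - start
    return len(result), elapsed

for n in (10_000, 20_000, 40_000):
    rows, secs = time_concat(n)
    # Doubling n should roughly double the elapsed time
    print(f"n={n:>6}: {rows} rows stacked in {secs:.6f}s")
```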
Time Complexity: O(n)
This means the time needed grows linearly with the total number of rows being stacked.
[X] Wrong: "Concatenating two DataFrames is a constant time operation regardless of size."
[OK] Correct: The operation must copy all rows to create a new DataFrame, so time grows with the total rows.
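The copying is easy to demonstrate: `concat()` builds a brand-new DataFrame, so changing an input afterwards does not affect the result. A minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
combined = pd.concat([df1, df2], ignore_index=True)

# The rows were copied into a new DataFrame, so a later edit
# to df1 does not leak into the combined result
df1.loc[0, 'A'] = 99
print(combined['A'].tolist())  # [1, 2, 3, 4]
```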
Understanding how concat() scales helps you reason about data processing speed and memory use in real projects.
"What if we stacked 10 DataFrames instead of 2? How would the time complexity change?"