concat() for Stacking DataFrames in Python Data Analysis - Time & Space Complexity
When stacking data tables with concat(), it is important to know how the running time grows as the data gets bigger.
We want to determine how the amount of work changes as more rows or columns are added.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 10  # Example value for n
df1 = pd.DataFrame({'A': range(n)})
df2 = pd.DataFrame({'A': range(n, 2*n)})
result = pd.concat([df1, df2], axis=0, ignore_index=True)
```
This code stacks two data tables one below the other, combining their rows into one table.
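To see the stacking concretely, here is a small sketch (using an illustrative n of 4, not a value from the snippet above) that prints the combined rows and confirms the new table holds every row from both inputs:

```python
import pandas as pd

n = 4  # small illustrative size
df1 = pd.DataFrame({'A': range(n)})
df2 = pd.DataFrame({'A': range(n, 2 * n)})

# Stack df2 below df1; ignore_index=True builds a fresh 0..2n-1 index.
result = pd.concat([df1, df2], axis=0, ignore_index=True)

print(result['A'].tolist())  # → [0, 1, 2, 3, 4, 5, 6, 7]
print(len(result))           # → 8, i.e. 2 * n rows in the combined table
```

Because ignore_index=True is passed, the result gets a fresh RangeIndex rather than repeating each input's original row labels.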
Identify the loops, recursion, or array traversals that repeat:
- Primary operation: Copying rows from each DataFrame into a new combined DataFrame.
- How many times: Each row from both tables is processed once, so about 2n times if each table has n rows.
As the number of rows grows, the work grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 row copies |
| 100 | About 200 row copies |
| 1000 | About 2000 row copies |
Pattern observation: Doubling the rows doubles the work needed.
Time Complexity: O(n)
This means the time to stack grows linearly with the total number of rows.
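A rough way to observe the linear trend empirically is to time concat() at a few doubling sizes (the absolute timings below will vary by machine; only the roughly-doubling pattern matters):

```python
import time
import pandas as pd

for n in (10_000, 20_000, 40_000):
    df1 = pd.DataFrame({'A': range(n)})
    df2 = pd.DataFrame({'A': range(n, 2 * n)})
    start = time.perf_counter()
    combined = pd.concat([df1, df2], axis=0, ignore_index=True)
    elapsed = time.perf_counter() - start
    # All 2n rows are copied into the new frame, so time grows with n.
    print(f"n={n:>6}: {len(combined)} rows copied in {elapsed * 1e3:.2f} ms")
```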
[X] Wrong: "concat() is instant no matter how big the data is."
[OK] Correct: The function must copy all rows to create the new table, so bigger data takes more time.
Understanding how concat() scales helps you explain data-merging choices clearly and shows you understand how data size affects performance.
"What if we stacked 10 DataFrames instead of 2? How would the time complexity change?"
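One way to explore this question: concat() accepts a list of any number of DataFrames, and a single call still copies each row once, so stacking k frames of n rows each is roughly O(k*n), still linear in the total number of rows. Repeatedly calling concat() inside a loop, by contrast, re-copies the accumulated result on every iteration. A sketch (k and n here are illustrative values, not from the original snippet):

```python
import pandas as pd

n, k = 5, 10  # 10 small DataFrames of n rows each
frames = [pd.DataFrame({'A': range(i * n, (i + 1) * n)}) for i in range(k)]

# One concat call: each of the k*n rows is copied once -> O(k*n).
stacked = pd.concat(frames, axis=0, ignore_index=True)
print(len(stacked))  # → 50, i.e. k * n rows

# Anti-pattern: concat inside a loop re-copies the growing result each
# iteration, doing roughly n + 2n + ... + kn row copies -> O(k^2 * n).
looped = frames[0]
for f in frames[1:]:
    looped = pd.concat([looped, f], ignore_index=True)
print(looped.equals(stacked))  # → True: same rows, built far less efficiently
```

This is why collecting frames in a list and calling concat() once is the usual recommendation over growing a DataFrame iteratively.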