
concat() for stacking DataFrames in Pandas - Time & Space Complexity

Time Complexity: concat() for stacking DataFrames
O(n)
Understanding Time Complexity

When stacking DataFrames using pandas concat(), it is important to understand how the time needed grows as the data gets bigger.

We want to know how the amount of work changes as we add more rows or columns.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

df1 = pd.DataFrame({'A': range(1000), 'B': range(1000)})
df2 = pd.DataFrame({'A': range(1000, 2000), 'B': range(1000, 2000)})

result = pd.concat([df1, df2], axis=0, ignore_index=True)

This code stacks two DataFrames vertically, combining their rows into one larger DataFrame.
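As a quick sanity check on the snippet above, the stacked result contains every row from both inputs, and `ignore_index=True` rebuilds the index from scratch:

```python
import pandas as pd

df1 = pd.DataFrame({'A': range(1000), 'B': range(1000)})
df2 = pd.DataFrame({'A': range(1000, 2000), 'B': range(1000, 2000)})

result = pd.concat([df1, df2], axis=0, ignore_index=True)

# All 2000 rows from both inputs end up in one new frame,
# and ignore_index=True relabels the rows 0..1999.
print(result.shape)      # (2000, 2)
print(result.index[-1])  # 1999
```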

Identify Repeating Operations

Identify the loops, recursive calls, or array traversals that do the repeated work.

  • Primary operation: Copying rows from each DataFrame into a new combined DataFrame.
  • How many times: Once for each row in both DataFrames, so total rows combined.
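The row copying is real, not just a relabeling: the combined frame holds its own data. A small illustrative check (toy frames invented for this example) shows that mutating the result leaves the inputs untouched:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'A': [4, 5, 6]})
result = pd.concat([df1, df2], ignore_index=True)

# Because concat() copies each input row into a new DataFrame,
# writing into the result does not change df1 or df2.
result.loc[0, 'A'] = 99
print(df1.loc[0, 'A'])  # still 1
```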
How Execution Grows With Input

As the number of rows increases, the time to copy and combine grows roughly in direct proportion.

Input size n (rows per DataFrame)    Approx. operations
10                                   About 20 row copies
100                                  About 200 row copies
1000                                 About 2000 row copies

Pattern observation: Doubling the input roughly doubles the work done.
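The rows in that table can be reproduced directly: for each input size n, stacking two n-row frames yields a frame of 2n rows, each of which had to be copied once.

```python
import pandas as pd

for n in (10, 100, 1000):
    left = pd.DataFrame({'A': range(n)})
    right = pd.DataFrame({'A': range(n, 2 * n)})
    combined = pd.concat([left, right], ignore_index=True)
    # Rows copied into the new frame == total input rows == 2n.
    print(n, len(combined))  # 10 20, 100 200, 1000 2000
```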

Final Time Complexity

Time Complexity: O(n)

This means the time needed grows linearly with the total number of rows being stacked.

Common Mistake

[X] Wrong: "Concatenating two DataFrames is a constant time operation regardless of size."

[OK] Correct: The operation must copy all rows to create a new DataFrame, so time grows with the total rows.

Interview Connect

Understanding how concat() scales helps you reason about data processing speed and memory use in real projects.

Self-Check

"What if we stacked 10 DataFrames instead of 2? How would the time complexity change?"