
concat() for stacking DataFrames in Data Analysis Python - Time & Space Complexity

Time Complexity: concat() for stacking DataFrames
O(n)
Understanding Time Complexity

When stacking data tables with concat(), it is important to know how the running time grows as the data gets bigger.

We want to find out how the amount of work changes when we add more rows or columns.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

n = 10  # Example value for n

df1 = pd.DataFrame({'A': range(n)})
df2 = pd.DataFrame({'A': range(n, 2*n)})

result = pd.concat([df1, df2], axis=0, ignore_index=True)

This code stacks two data tables one below the other, combining their rows into one table.

Identify Repeating Operations

Identify the loops, recursive calls, or array traversals that repeat.

  • Primary operation: Copying rows from each DataFrame into a new combined DataFrame.
  • How many times: Each row from both tables is processed once, so about 2*n times if each has n rows.
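A quick sketch to confirm the row count: every row from both inputs appears exactly once in the combined result, so concat() must copy 2*n rows in total.

```python
import pandas as pd

n = 10
df1 = pd.DataFrame({'A': range(n)})
df2 = pd.DataFrame({'A': range(n, 2 * n)})

result = pd.concat([df1, df2], axis=0, ignore_index=True)

# Each of the 2*n input rows is copied once into the new table.
print(len(result))  # 20 rows for n = 10
```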
How Execution Grows With Input

As the number of rows grows, the work grows roughly in direct proportion.

Input Size (n) | Approx. Operations
10             | About 20 row copies
100            | About 200 row copies
1000           | About 2000 row copies

Pattern observation: Doubling the rows doubles the work needed.

Final Time Complexity

Time Complexity: O(n)

This means the time to stack grows linearly with the total number of rows.

Common Mistake

[X] Wrong: "concat() is instant no matter how big the data is."

[OK] Correct: The function must copy all rows to create the new table, so bigger data takes more time.
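A small sketch illustrating the copy: by default, concat() builds an independent new table, so mutating the result does not touch the inputs (which is exactly the copying work the complexity accounts for).

```python
import pandas as pd

n = 1000
df1 = pd.DataFrame({'A': range(n)})
df2 = pd.DataFrame({'A': range(n, 2 * n)})
result = pd.concat([df1, df2], ignore_index=True)

# The result is a copy: changing it leaves the original tables intact.
result.loc[0, 'A'] = -1
print(df1.loc[0, 'A'])  # still 0 — df1 is unchanged
```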

Interview Connect

Understanding how concat() scales helps you explain data merging choices clearly and shows you know how data size affects performance.

Self-Check

"What if we stacked 10 DataFrames instead of 2? How would the time complexity change?"