apply() on rows (axis=1) in Pandas - Time & Space Complexity

Understanding Time Complexity

We want to understand how the time to run apply() on rows grows as the data gets bigger.

Specifically, how does the number of rows affect the work done?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

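```python
import pandas as pd

# Receives one row as a Series and returns the sum of its values
def sum_row(row):
    return row.sum()

# Create a DataFrame with n rows and 3 columns
n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n),
    'C': range(2*n, 3*n)
})

# axis=1 calls sum_row once for each row
result = df.apply(sum_row, axis=1)
```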

This code sums each row of the DataFrame using apply() with axis=1.
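To see what sum_row actually receives, the sketch below records a copy of each row passed in. The helper names inspect_row and captured are hypothetical, added only for inspection. A copy is stored because pandas may reuse a single Series object between calls and swap its data, so keeping the original reference could show the last row's values instead of the first.

```python
import pandas as pd

n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n),
    'C': range(2*n, 3*n)
})

captured = []  # hypothetical helper list, used only for inspection

def inspect_row(row):
    # pandas may reuse one Series object between calls, so store a copy
    captured.append(row.copy())
    return row.sum()

result = df.apply(inspect_row, axis=1)

first = captured[0]
print(type(first).__name__)      # Series
print(list(first.index))         # ['A', 'B', 'C']
print(first.tolist())            # [0, 1000, 2000]
print(result.head(3).tolist())   # [3000, 3003, 3006]
```

Each row arrives as a Series indexed by the column names, so row.sum() adds the 3 values of that row.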

Identify Repeating Operations

Identify the loops, recursion, and array traversals that repeat.

  • Primary operation: The function sum_row is called once for each row.
  • How many times: It runs exactly n times, once per row.
  • Inside each call, it sums 3 values (columns), which is a small fixed amount.
How Execution Grows With Input

As the number of rows grows, the total work grows in direct proportion: doubling the rows roughly doubles the work.

| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 sums of 3 numbers each |
| 100 | About 100 sums of 3 numbers each |
| 1000 | About 1000 sums of 3 numbers each |

Pattern observation: The work grows directly with the number of rows.
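The table can be reproduced by counting the work directly. This is a minimal sketch: count_work is a hypothetical helper that wraps sum_row with counters for how many times it is called and how many values it sums in total. It assumes a recent pandas version, where apply(axis=1) calls the function exactly once per row.

```python
import pandas as pd

def count_work(n):
    """Return (calls, values_summed) for df.apply(sum_row, axis=1) on n rows."""
    counts = {'calls': 0, 'values': 0}

    def sum_row(row):
        counts['calls'] += 1            # one call per row
        counts['values'] += len(row)    # one value per column in that row
        return row.sum()

    df = pd.DataFrame({
        'A': range(n),
        'B': range(n, 2*n),
        'C': range(2*n, 3*n)
    })
    df.apply(sum_row, axis=1)
    return counts['calls'], counts['values']

for n in (10, 100, 1000):
    calls, values = count_work(n)
    print(n, calls, values)
# 10 10 30
# 100 100 300
# 1000 1000 3000
```

Both counts scale by exactly 10 each time n does, which is the linear pattern in the table.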

Final Time Complexity

Time Complexity: O(n)

This means the time to run grows linearly as the number of rows increases.
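A rough way to observe linear growth on your own machine is to time the call at a few sizes. This sketch assumes the helper name time_apply (hypothetical) and takes the best of a few runs to reduce noise. Absolute times depend on hardware and pandas version, so no expected output is given; if growth is linear, the time per row should stay roughly constant as n doubles.

```python
import time
import pandas as pd

def sum_row(row):
    return row.sum()

def time_apply(n, repeats=3):
    """Best wall-clock time (seconds) of df.apply(sum_row, axis=1) on n rows."""
    df = pd.DataFrame({
        'A': range(n),
        'B': range(n, 2*n),
        'C': range(2*n, 3*n)
    })
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        df.apply(sum_row, axis=1)
        best = min(best, time.perf_counter() - start)
    return best

sizes = [5_000, 10_000, 20_000]
timings = {n: time_apply(n) for n in sizes}
for n in sizes:
    per_row = timings[n] / n * 1e6
    print(f"n={n:>6}: {timings[n]:.4f} s ({per_row:.2f} microseconds/row)")
```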

Common Mistake

[X] Wrong: "Since apply() works on rows and columns, it must be very slow and grow like n squared."

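[OK] Correct: Here, apply() calls the function once per row, and each call reads every value in that row once. With n rows and m columns the total work is about n × m, which is O(n·m). Because m is fixed at 3, this reduces to O(n). The growth would only look like n squared if the number of columns grew along with the number of rows.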
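The dependence on both rows and columns can be checked by counting how many cell values row.sum() reads in total. This is a sketch under stated assumptions: count_values_touched and the column names c0, c1, ... are hypothetical, chosen only to build frames with m columns.

```python
import pandas as pd

def count_values_touched(n, m):
    """Count how many cell values are passed to row.sum() across all apply() calls."""
    touched = 0

    def sum_row(row):
        nonlocal touched
        touched += len(row)  # row holds one value per column
        return row.sum()

    # Hypothetical frame with n rows and m columns named c0, c1, ...
    df = pd.DataFrame({f'c{j}': range(j * n, (j + 1) * n) for j in range(m)})
    df.apply(sum_row, axis=1)
    return touched

for n, m in [(1000, 3), (1000, 30), (2000, 3)]:
    print(n, m, count_values_touched(n, m))
# 1000 3 3000
# 1000 30 30000
# 2000 3 6000
```

Multiplying the columns by 10 multiplies the work by 10, and doubling the rows doubles it: the work is linear in each dimension separately, not squared.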

Interview Connect

Understanding how apply() scales helps you explain your code choices clearly and shows you know how data size affects performance.
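A related point that often comes up: the same row sums can be computed with the vectorized df.sum(axis=1). Both versions are O(n) for a fixed number of columns, but apply(axis=1) makes a Python-level function call for every row, while df.sum(axis=1) runs the per-row loop in compiled code, so its constant factor is typically much smaller. The sketch below only checks that the two give the same result; it makes no timing claim.

```python
import pandas as pd

def sum_row(row):
    return row.sum()

n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n),
    'C': range(2*n, 3*n)
})

# One Python-level call per row; pandas hands each row to sum_row as a Series
row_by_row = df.apply(sum_row, axis=1)

# One call for the whole frame; the per-row loop runs in compiled code
vectorized = df.sum(axis=1)

print((row_by_row == vectorized).all())  # True
```

Same Big-O, different constant: explaining that distinction is usually what an interviewer is looking for when apply() comes up.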

Self-Check

"What if the function inside apply() also loops over all columns instead of just summing? How would the time complexity change?"