apply() on rows (axis=1) in Pandas - Time & Space Complexity
We want to understand how the time to run apply() on rows grows as the data gets bigger.
Specifically, how does the number of rows affect the work done?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def sum_row(row):
    return row.sum()

# Create a DataFrame with n rows and 3 columns
n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n),
    'C': range(2*n, 3*n)
})

result = df.apply(sum_row, axis=1)
```
This code sums each row of the DataFrame using apply() with axis=1.
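As a quick sanity check (a minimal sketch, not part of the original snippet), the same per-row sums can also be produced by the vectorized `df.sum(axis=1)`, which does the work in a single pass instead of one Python-level function call per row:

```python
import pandas as pd

n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n),
    'C': range(2*n, 3*n)
})

# Row-wise apply: one Python function call per row
applied = df.apply(lambda row: row.sum(), axis=1)

# Vectorized equivalent: no per-row Python calls
vectorized = df.sum(axis=1)

print((applied == vectorized).all())  # True
```

Both produce the same Series; the vectorized form is typically much faster in practice, even though both scale linearly with the number of rows.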
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: the function `sum_row` is called once for each row.
- How many times: it runs exactly `n` times, once per row.
- Work per call: each call sums 3 values (one per column), a small fixed amount of work.
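This can be verified directly by counting calls with a small DataFrame (a toy sketch; the counter variable is not part of the original snippet):

```python
import pandas as pd

n = 5
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n),
    'C': range(2*n, 3*n)
})

calls = 0

def counting_sum(row):
    global calls
    calls += 1
    return row.sum()

result = df.apply(counting_sum, axis=1)
# calls == n on recent pandas versions; some older versions
# invoked the function an extra time on the first row.
print(calls)
```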
As the number of rows grows, the total work grows in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 sums of 3 numbers each |
| 100 | About 100 sums of 3 numbers each |
| 1000 | About 1000 sums of 3 numbers each |
Pattern observation: The work grows directly with the number of rows.
Time Complexity: O(n)
This means the time to run grows linearly as the number of rows increases.
[X] Wrong: "Since apply() works on rows and columns, it must be very slow and grow like n squared."
[OK] Correct: Here, apply() runs the function once per row only, so the growth depends mostly on the number of rows, not columns. The columns are fixed and small, so they don't cause squared growth.
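To make the contrast concrete, here is a toy operation count (a back-of-the-envelope sketch, assuming one operation per cell): with 3 fixed columns, doubling the rows doubles the work, whereas quadratic growth would quadruple it.

```python
# Toy operation count: n rows, a fixed number of columns
def ops(n, cols=3):
    return n * cols  # roughly one addition per cell

# Linear: doubling the rows doubles the work
print(ops(2000) / ops(1000))  # 2.0

# What quadratic (n-squared) growth would look like instead
print(2000**2 / 1000**2)  # 4.0
```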
Understanding how apply() scales helps you explain your code choices clearly and shows you know how data size affects performance.
"What if the function inside apply() also loops over all columns instead of just summing? How would the time complexity change?"
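One way to explore that follow-up question (a hypothetical sketch, not from the original snippet): if the applied function loops over all `m` columns explicitly, each of the `n` calls does `m` units of work, for roughly `n * m` operations in total. When `m` is fixed and small this is still O(n), but if the number of columns grows with the data, the complexity becomes O(n * m).

```python
import pandas as pd

n, m = 100, 4  # hypothetical sizes: n rows, m columns
df = pd.DataFrame({f'col{i}': range(i * n, (i + 1) * n) for i in range(m)})

def manual_sum(row):
    # Explicit loop over all m columns: m operations per call
    total = 0
    for col in row.index:
        total += row[col]
    return total

# n calls, each doing m operations: about n * m total work
result = df.apply(manual_sum, axis=1)
print(result.iloc[0])  # first row: 0 + 100 + 200 + 300 = 600
```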