apply() on columns in Pandas - Time & Space Complexity
We want to understand how the time needed to run apply() on DataFrame columns changes as the data grows.
Specifically, how does the work increase when we apply a function to each column?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd
import numpy as np

# 1000 rows x 10 columns of random floats
df = pd.DataFrame(np.random.rand(1000, 10))

# axis=0: the lambda receives one column (a Series) at a time
result = df.apply(lambda col: col.sum(), axis=0)
```
This code creates a DataFrame with 1000 rows and 10 columns, then sums each column using apply().
Identify the repeated work: the loops, recursion, or array traversals that run over and over.
- Primary operation: The function runs once per column, processing all rows in that column.
- How many times: It repeats for each of the columns (here, 10 times).
As the number of rows grows, the work to sum each column grows linearly: more rows mean more numbers to add per column.
| Input Size (n rows) | Approx. Operations |
|---|---|
| 10 | 10 columns x 10 rows = 100 operations |
| 100 | 10 columns x 100 rows = 1,000 operations |
| 1000 | 10 columns x 1000 rows = 10,000 operations |
Pattern observation: Operations grow directly with the number of rows. Doubling rows doubles work.
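One way to confirm this pattern empirically is to count how many values the applied function actually touches. The sketch below uses a counter-instrumented function (the `count_operations` helper is hypothetical, written just for this illustration):

```python
import pandas as pd
import numpy as np

def count_operations(n_rows, n_cols):
    """Count how many individual values apply() processes."""
    df = pd.DataFrame(np.random.rand(n_rows, n_cols))
    counter = {"ops": 0}

    def summing(col):
        counter["ops"] += len(col)  # one addition per row in this column
        return col.sum()

    df.apply(summing, axis=0)  # runs once per column
    return counter["ops"]

print(count_operations(10, 10))    # 100
print(count_operations(100, 10))   # 1,000
print(count_operations(1000, 10))  # 10,000
```

The counts match the table exactly: rows times columns.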
Time Complexity: O(n x m)
This means the time grows proportionally with the number of rows (n) times the number of columns (m).
[X] Wrong: "Applying a function on columns only depends on the number of columns, so it's O(m)."
[OK] Correct: Each column's function processes all rows, so the total work depends on both rows and columns, not just columns.
Understanding how data size affects function application helps you write efficient code and explain your choices clearly in real projects.
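One such choice: pandas' built-in reductions do the same O(n x m) work as the apply() version, but inside optimized C code rather than a per-column Python call. A minimal comparison sketch (results should match; relative speed varies by machine and data size):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(1000, 10))

# apply() invokes a Python function once per column
via_apply = df.apply(lambda col: col.sum(), axis=0)

# df.sum() performs the same column sums in vectorized code
via_builtin = df.sum(axis=0)

print(np.allclose(via_apply, via_builtin))  # True
```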
What if we changed axis=0 to axis=1 to apply the function on rows instead of columns? How would the time complexity change?
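As a starting point for that experiment, here is the axis=1 variant (a sketch; note that every value in the DataFrame is still visited, but the Python-level function is now invoked once per row, i.e. 1000 times here instead of 10):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(1000, 10))

# axis=1: the lambda receives one row (a Series of 10 values) at a time
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(row_sums.shape)  # (1000,)
```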