apply() on columns in Pandas - Time & Space Complexity
We want to understand how the time needed to run apply() on DataFrame columns changes as the data grows.
Specifically, how does the work increase when we apply a function to each column?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd
import numpy as np

# 1000 rows x 10 columns of random floats
df = pd.DataFrame(np.random.rand(1000, 10))

# axis=0: the lambda receives one column (a Series) at a time
result = df.apply(lambda col: col.sum(), axis=0)
```
This code creates a DataFrame with 1000 rows and 10 columns, then sums each column using apply().
Identify the repeated work: the loops, recursion, or array traversals that run over and over.
- Primary operation: The function runs once per column, processing all rows in that column.
- How many times: It repeats for each of the columns (here, 10 times).
As the number of rows grows, the work to sum each column grows linearly: more rows mean more numbers to add per column.
| Input Size (n rows) | Approx. Operations |
|---|---|
| 10 | 10 columns x 10 rows = 100 operations |
| 100 | 10 columns x 100 rows = 1,000 operations |
| 1000 | 10 columns x 1000 rows = 10,000 operations |
Pattern observation: Operations grow directly with the number of rows. Doubling rows doubles work.
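One way to confirm this pattern empirically is to count how many values the applied function actually touches. The sketch below uses a counter-instrumented function (the `count_operations` helper is hypothetical, written just for this illustration):

```python
import pandas as pd
import numpy as np

def count_operations(n_rows, n_cols):
    """Count how many individual values apply() processes."""
    df = pd.DataFrame(np.random.rand(n_rows, n_cols))
    counter = {"ops": 0}

    def summing(col):
        counter["ops"] += len(col)  # one addition per row in this column
        return col.sum()

    df.apply(summing, axis=0)  # runs once per column
    return counter["ops"]

print(count_operations(10, 10))    # 100
print(count_operations(100, 10))   # 1,000
print(count_operations(1000, 10))  # 10,000
```

The counts match the table exactly: rows times columns.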
Time Complexity: O(n x m)
This means the time grows proportionally with the number of rows (n) times the number of columns (m).
[X] Wrong: "Applying a function on columns only depends on the number of columns, so it's O(m)."
[OK] Correct: Each column's function processes all rows, so the total work depends on both rows and columns, not just columns.
Understanding how data size affects function application helps you write efficient code and explain your choices clearly in real projects.
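One such choice: pandas' built-in reductions do the same O(n x m) work as the apply() version, but inside optimized C code rather than a per-column Python call. A minimal comparison sketch (results should match; relative speed varies by machine and data size):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(1000, 10))

# apply() invokes a Python function once per column
via_apply = df.apply(lambda col: col.sum(), axis=0)

# df.sum() performs the same column sums in vectorized code
via_builtin = df.sum(axis=0)

print(np.allclose(via_apply, via_builtin))  # True
```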
What if we changed axis=0 to axis=1 to apply the function on rows instead of columns? How would the time complexity change?
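As a starting point for that experiment, here is the axis=1 variant (a sketch; note that every value in the DataFrame is still visited, but the Python-level function is now invoked once per row, i.e. 1000 times here instead of 10):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(1000, 10))

# axis=1: the lambda receives one row (a Series of 10 values) at a time
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(row_sums.shape)  # (1000,)
```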