
Filling missing values with fillna() in Pandas - Time & Space Complexity

Time Complexity of filling missing values with fillna(): O(n)
Understanding Time Complexity

We want to understand how the time needed to fill missing values changes as the data grows.

How does the work grow when we use fillna() on bigger data?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 3, None, 5],
    'B': [None, 2, None, 4, 5]
})

filled_df = df.fillna(0)

This code replaces all missing values in the DataFrame with zero.
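As a quick check of that behavior, the sketch below counts the missing cells before and after the call. The names missing_before and missing_after are introduced here for illustration only. It relies on the default behavior of fillna(), which returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 3, None, 5],
    'B': [None, 2, None, 4, 5]
})

# Count missing cells before filling (illustrative helper names)
missing_before = int(df.isna().sum().sum())

# By default fillna() returns a new DataFrame; df itself keeps its NaN values
filled_df = df.fillna(0)
missing_after = int(filled_df.isna().sum().sum())

print(missing_before)  # 4
print(missing_after)   # 0
```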

Identify Repeating Operations

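Identify the loops, recursion, or array traversals that repeat.

  • Primary operation: Checking each cell in the DataFrame to see whether it is missing and replacing it if so. Pandas does this with vectorized array operations rather than a Python-level loop, but every cell is still examined.
  • How many times: Once for every cell in the DataFrame (rows x columns).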
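The repeated operation can be modeled with an explicit loop. The sketch below is a pure-Python model for counting work, not how pandas implements fillna(); the function fill_missing_counting and its checks counter are hypothetical names used only for illustration.

```python
import math

def fill_missing_counting(rows, fill_value=0):
    """Hypothetical model of fillna: returns (filled_rows, number_of_checks)."""
    checks = 0
    filled = []
    for row in rows:
        new_row = []
        for value in row:
            checks += 1  # one missing-value check per cell
            is_missing = value is None or (
                isinstance(value, float) and math.isnan(value)
            )
            new_row.append(fill_value if is_missing else value)
        filled.append(new_row)
    return filled, checks

# Same data as the DataFrame above, written row by row
rows = [[1, None], [None, 2], [3, None], [None, 4], [5, 5]]
filled, checks = fill_missing_counting(rows)

print(filled)  # [[1, 0], [0, 2], [3, 0], [0, 4], [5, 5]]
print(checks)  # 10
```

The counter ends at 10, which is 5 rows x 2 columns: one check per cell, regardless of how many cells are actually missing.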
How Execution Grows With Input

As the number of rows or columns grows, the number of cells to check grows too.

Input Size (rows x columns)    Approx. Operations
10 x 2 = 20                    About 20 checks and replacements
100 x 2 = 200                  About 200 checks and replacements
1000 x 2 = 2000                About 2000 checks and replacements

Pattern observation: The work grows in direct proportion to the number of cells; doubling the number of cells roughly doubles the work.
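One way to observe this pattern is to time fillna() on frames of increasing size. The sketch below is a rough experiment, not a benchmark: time_fillna is a hypothetical helper, the 30% missing rate and the row counts are arbitrary choices, and the measured times depend on the machine and pandas version. Larger sizes are used because fixed call overhead tends to dominate on very small frames.

```python
import time

import numpy as np
import pandas as pd

def time_fillna(n_rows, n_cols=2, repeats=5):
    """Return the best wall-clock time in seconds of df.fillna(0) over a few runs."""
    rng = np.random.default_rng(0)
    data = rng.random((n_rows, n_cols))
    # Mark roughly 30% of the cells as missing (arbitrary choice)
    data[rng.random((n_rows, n_cols)) < 0.3] = np.nan
    df = pd.DataFrame(data)

    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        df.fillna(0)
        best = min(best, time.perf_counter() - start)
    return best

# Each step doubles the number of rows, and therefore the number of cells
sizes = [100_000, 200_000, 400_000]
timings = {n: time_fillna(n) for n in sizes}

for n, seconds in timings.items():
    print(f"{n:>9} rows x 2 cols: {seconds:.6f} s")
```

If the growth is linear, each printed time should be roughly twice the previous one, although individual runs can be noisy.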

Final Time Complexity

Time Complexity: O(n), where n is the total number of cells (rows x columns)

This means the time to fill missing values grows linearly with the total number of cells in the DataFrame.

Common Mistake

[X] Wrong: "fillna() only checks missing values, so it runs faster than looking at every cell."

[OK] Correct: The method must look at every cell to know if it is missing or not, so it still touches all data points.
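The point can be illustrated with isna(), which reports the missing status of each cell. This is only an illustration of why every cell has to be inspected; it is not a claim about which internal routine fillna() calls.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 3, None, 5],
    'B': [None, 2, None, 4, 5]
})

# The mask holds one True/False answer per cell, missing or not
mask = df.isna()

print(mask.shape)                  # (5, 2), the same shape as df
print(mask.size)                   # 10 cells examined
print(int(mask.to_numpy().sum()))  # 4 cells actually missing
```

Only 4 cells are missing, yet the mask needs 10 entries: the status of a cell is unknown until that cell has been looked at.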

Interview Connect

Understanding how data size affects operations like filling missing values helps you write efficient data cleaning steps in real projects.

Self-Check

"What if we used fillna() only on one column instead of the whole DataFrame? How would the time complexity change?"
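A starting point for exploring this question is to compare how many cells each call has to cover. The variable names below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 3, None, 5],
    'B': [None, 2, None, 4, 5]
})

# Fill a single column: the result is a Series covering only column A
filled_a = df['A'].fillna(0)

cells_one_column = df['A'].size  # number of rows
cells_whole_frame = df.size      # rows x columns

print(filled_a.tolist())   # [1.0, 0.0, 3.0, 0.0, 5.0]
print(cells_one_column)    # 5
print(cells_whole_frame)   # 10
```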