Counting missing values in Pandas - Time & Space Complexity
We want to know how long it takes to count missing values in a table as the table grows.
How does the time needed change when we have more rows or columns?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4],
    'C': [1, None, None, 4]
})

missing_count = df.isna().sum().sum()
```
This code counts all missing values in the whole table by first marking missing spots, then summing them up.
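The two-step "mark, then sum" process can be made visible by breaking the chained call apart. A minimal sketch using the same table (the variable names `mask`, `per_column`, and `total` are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4],
    'C': [1, None, None, 4]
})

# Step 1: mark missing cells -> a same-shaped DataFrame of booleans
mask = df.isna()

# Step 2: sum each column's True values -> per-column missing counts
per_column = mask.sum()  # A=1, B=1, C=2

# Step 3: sum those counts -> total missing values in the table
total = per_column.sum()
print(int(total))  # → 4
```

Each step touches every cell at most once, which is why the whole count stays proportional to the table's size.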
Identify the operations that repeat (loops, recursion, array traversals).
- Primary operation: Checking each cell to see if it is missing (NaN).
- How many times: Once for every cell in the table (rows x columns).
As the table gets bigger, the time to count missing values grows with the total number of cells.
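The "one check per cell" claim can be confirmed directly: `df.size` is exactly rows times columns, and `isna()` produces one boolean for each of those cells. A quick sketch with the same table:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4],
    'C': [1, None, None, 4]
})

rows, cols = df.shape        # (4, 3)
print(df.size)               # → 12
# isna() yields one boolean per cell, so the work tracks df.size
assert df.size == rows * cols
assert df.isna().shape == df.shape
```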
| Input Size (rows x columns) | Approx. Operations |
|---|---|
| 10 x 5 = 50 | About 50 checks |
| 100 x 5 = 500 | About 500 checks |
| 1000 x 5 = 5000 | About 5000 checks |
Pattern observation: The time grows directly with the number of cells; doubling either the number of rows or the number of columns roughly doubles the work.
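The linear trend from the table can be observed empirically. A rough timing sketch (sizes and the 10% missing rate are chosen purely for illustration; absolute times vary by machine, but each tenfold increase in rows should take roughly ten times longer):

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
for rows in (10_000, 100_000, 1_000_000):
    data = rng.random((rows, 5))
    data[rng.random((rows, 5)) < 0.1] = np.nan  # ~10% missing cells
    df = pd.DataFrame(data)

    start = time.perf_counter()
    count = df.isna().sum().sum()
    elapsed = time.perf_counter() - start
    print(f"{rows:>9} rows x 5 cols: {int(count)} missing in {elapsed:.4f}s")
```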
Time Complexity: O(n * m), where n is the number of rows and m is the number of columns.
This means the time needed grows proportionally with the total number of cells in the table. Note that space also matters here: `df.isna()` materializes a boolean table of the same shape, so the peak extra memory is O(n * m) as well.
[X] Wrong: "Counting missing values only looks at columns, so time depends just on the number of columns."
[OK] Correct: The code checks every cell, so both rows and columns affect the time equally.
Understanding how counting missing data scales helps you handle bigger datasets confidently and shows you can think about efficiency clearly.
"What if we only count missing values in one column instead of the whole table? How would the time complexity change?"
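As a sketch of an answer: selecting a single column first means `isna()` only checks that column's n cells, so the count becomes O(n), independent of how many other columns the table has.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4],
    'C': [1, None, None, 4]
})

# Only column A's 4 cells are checked: O(n), not O(n * m)
missing_in_A = df['A'].isna().sum()
print(missing_in_A)  # → 1
```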