Identifying Missing Values (isnull, isna) in Python Data Analysis - Time & Space Complexity
We want to understand how long it takes to find missing values in data.
How does the time grow when the data gets bigger?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

missing = data.isnull()
```
This code creates a small table and checks which values are missing.
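To make the "check each cell" idea concrete, here is the same toy DataFrame with its `isnull()` result printed: the output is a boolean mask of the same shape, with `True` wherever a value is missing.

```python
import pandas as pd

data = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

missing = data.isnull()
print(missing)
#        A      B
# 0  False   True
# 1  False  False
# 2   True  False
# 3  False  False

# Each True marks a missing cell; summing counts them per column.
print(missing.sum())
# A    1
# B    1
# dtype: int64
```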
Identify the repeated operations: any loops, recursion, or array traversals.
- Primary operation: Checking each cell in the table to see if it is missing.
- How many times: Once for every cell in the data.
As the table gets bigger, the number of checks grows with the number of cells.
| Input Size (n cells) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
Pattern observation: The time grows directly with the number of cells.
Time Complexity: O(n), where n is the number of cells (rows × columns).
This means the time to find missing values grows linearly with the size of the data. Space complexity is also O(n), because isnull() returns a new boolean DataFrame with the same shape as the input.
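One way to see the linear pattern yourself is a rough timing sketch. The absolute numbers will vary by machine and pandas version; only the growth trend matters. The 5% missing-value rate and the row counts below are arbitrary choices for illustration.

```python
import time
import numpy as np
import pandas as pd

for rows in (100_000, 200_000, 400_000):
    # Build a DataFrame with roughly 5% missing values in two columns.
    rng = np.random.default_rng(0)
    values = rng.random((rows, 2))
    values[rng.random((rows, 2)) < 0.05] = np.nan
    df = pd.DataFrame(values, columns=['A', 'B'])

    start = time.perf_counter()
    df.isnull()
    elapsed = time.perf_counter() - start
    print(f"{df.size:>9} cells: {elapsed:.5f}s")

# Expect each doubling of cells to roughly double the time (the O(n) pattern),
# though small inputs may be dominated by fixed overhead.
```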
[X] Wrong: "Checking for missing values is instant no matter the data size."
[OK] Correct: Each cell must be checked, so bigger data takes more time.
Knowing how missing value checks scale helps you understand data cleaning speed in real projects.
"What if we check missing values only in one column instead of the whole table? How would the time complexity change?"