Forward fill and backward fill in Data Analysis Python - Time & Space Complexity
We want to understand how the time to fill missing data grows as the data size increases.
How does the filling process scale when we have more data points?
Analyze the time complexity of the following code snippet.
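import pandas as pd

# One column with gaps; None is stored as NaN
df = pd.DataFrame({'A': [1, None, None, 4, None, 6]})
# fillna(method=...) is deprecated since pandas 2.1; ffill() and bfill() are the direct equivalents
df_filled_forward = df.ffill()
df_filled_backward = df.bfill()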
This code fills missing values in a column: forward fill carries the last known value forward into each gap, and backward fill pulls the next known value backward into each gap.
- Primary operation: Scanning the data column once to fill missing values.
- How many times: Each element is visited once in forward fill and once in backward fill.
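The single scan described above can be sketched in plain Python. This is only an illustration of the idea, not how pandas implements it internally (pandas runs compiled code), and the function names `forward_fill` and `backward_fill` are hypothetical helpers introduced for this example.

```python
def forward_fill(values):
    """Return a copy where each missing entry takes the most recent known value."""
    filled = []
    last_seen = None  # nothing known yet, so leading gaps stay missing
    for v in values:  # single pass: each element is visited exactly once
        if v is not None:
            last_seen = v
        filled.append(last_seen)
    return filled


def backward_fill(values):
    """Return a copy where each missing entry takes the next known value."""
    # A backward fill is a forward fill over the reversed sequence.
    return forward_fill(values[::-1])[::-1]


data = [1, None, None, 4, None, 6]
print(forward_fill(data))   # [1, 1, 1, 4, 4, 6]
print(backward_fill(data))  # [1, 4, 4, 4, 6, 6]
```

Each function touches every element a fixed number of times, which is where the linear growth comes from.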
As the number of rows increases, the time to fill missing values grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 steps forward and 10 steps backward |
| 100 | About 100 steps forward and 100 steps backward |
| 1000 | About 1000 steps forward and 1000 steps backward |
Pattern observation: The operations grow linearly with the number of data points.
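To check the pattern in the table, one option is to count how many elements a forward fill visits for each input size. The sketch below uses a plain-Python stand-in for the fill, and both `forward_fill_with_count` and `make_column` are hypothetical helpers; the test data (every third entry known) is an arbitrary assumption.

```python
def forward_fill_with_count(values):
    """Forward fill a list and report how many elements were visited."""
    filled, last_seen, visits = [], None, 0
    for v in values:
        visits += 1  # one step per element
        if v is not None:
            last_seen = v
        filled.append(last_seen)
    return filled, visits


def make_column(n):
    # Assumed test data: every third entry is known, the rest are missing.
    return [i if i % 3 == 0 else None for i in range(n)]


for n in (10, 100, 1000):
    _, visits = forward_fill_with_count(make_column(n))
    print(n, visits)  # the visit count equals n for each size
```

The count matches the input size exactly, regardless of how many values are missing, because the scan checks every element whether or not it needs filling.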
Time Complexity: O(n)
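This means the time to fill missing values grows linearly with data size: doubling the number of rows roughly doubles the work. Running both a forward fill and a backward fill takes about 2n steps, which is still O(n) because constant factors are dropped.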
[X] Wrong: "Forward fill or backward fill takes constant time regardless of data size."
[OK] Correct: Each data point must be checked and possibly updated, so the time grows with the data size rather than staying fixed.
Understanding how filling missing data scales helps you explain data cleaning steps clearly and shows you think about efficiency in real tasks.
"What if we filled missing values using a more complex method that looks at multiple columns? How would the time complexity change?"