How to Use ffill and bfill in pandas for Missing Data
In pandas, ffill (forward fill) fills missing values by copying the last valid value forward, while bfill (backward fill) fills them with the next valid value. Call DataFrame.ffill() or DataFrame.bfill() directly; the older spellings DataFrame.fillna(method='ffill') and DataFrame.fillna(method='bfill') do the same thing but have been deprecated since pandas 2.1.
Syntax
ffill and bfill are available as methods on pandas objects such as DataFrame and Series, and historically as values of the method argument of fillna().
- df.ffill() (older form: df.fillna(method='ffill')): fills missing values by propagating the last valid observation forward.
- df.bfill() (older form: df.fillna(method='bfill')): fills missing values with the next valid observation.
You can also pass the axis parameter to fill down each column (axis=0, the default) or across each row (axis=1).
python
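# Same result as df.fillna(method='ffill', axis=0), which is deprecated since pandas 2.1
df.ffill(axis=0)  # forward fill down each column (default axis)
df.bfill(axis=1)  # backward fill across each row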
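As a rough illustration of how axis changes the fill direction, the sketch below forward fills a small DataFrame both ways. The data and the variable names down and across are made up for this example.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values are assumptions for this sketch).
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, np.nan, 6.0],
    "C": [7.0, 8.0, np.nan],
})

# axis=0 (default): each column is filled from top to bottom.
down = df.ffill(axis=0)

# axis=1: each row is filled from left to right.
across = df.ffill(axis=1)

print(down)
print(across)
```

With axis=0, column B keeps its two leading NaNs because nothing valid sits above them. With axis=1, the NaN in row 0 of column B is filled from column A of the same row instead.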
Example
This example shows how to use ffill and bfill to fill missing values in a DataFrame.
python
import numpy as np
import pandas as pd

data = {
    'A': [1, np.nan, np.nan, 4],
    'B': [np.nan, 2, np.nan, 4],
    'C': [1, 2, 3, np.nan],
}
df = pd.DataFrame(data)

# Forward fill: carry the last valid value down each column.
# (df.fillna(method='ffill') is the older, deprecated spelling.)
ffill_df = df.ffill()

# Backward fill: pull the next valid value up each column.
bfill_df = df.bfill()

print('Original DataFrame:')
print(df)
print('\nAfter forward fill (ffill):')
print(ffill_df)
print('\nAfter backward fill (bfill):')
print(bfill_df)
Output
Original DataFrame:
     A    B    C
0  1.0  NaN  1.0
1  NaN  2.0  2.0
2  NaN  NaN  3.0
3  4.0  4.0  NaN

After forward fill (ffill):
     A    B    C
0  1.0  NaN  1.0
1  1.0  2.0  2.0
2  1.0  2.0  3.0
3  4.0  4.0  3.0

After backward fill (bfill):
     A    B    C
0  1.0  2.0  1.0
1  4.0  2.0  2.0
2  4.0  4.0  3.0
3  4.0  4.0  NaN
Common Pitfalls
Common mistakes when using ffill and bfill include:
- Not specifying the axis parameter correctly, which can lead to an unexpected fill direction.
- Using ffill or bfill on data with no valid value to fill from, which leaves the NaNs unchanged.
- Assuming these methods fill all missing values; they only fill where a valid value exists before (ffill) or after (bfill) the gap.
Example of a wrong approach and the correct fix:
python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, np.nan, 3]})

# Wrong: forward fill has no earlier valid value to copy, so the leading NaNs stay.
wrong_fill = df.ffill()

# Correct: backward fill pulls the later valid value up into the leading NaNs.
correct_fill = df.bfill()

print('Wrong fill (ffill):')
print(wrong_fill)
print('\nCorrect fill (bfill):')
print(correct_fill)
Output
Wrong fill (ffill):
     A
0  NaN
1  NaN
2  3.0

Correct fill (bfill):
     A
0  3.0
1  3.0
2  3.0
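When gaps can appear at both ends of a column, one common approach is to chain the two fills and then check what is left. The sketch below is a minimal illustration on made-up data, not a general recommendation; whether chaining is appropriate depends on what the values represent.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed for this sketch).
df = pd.DataFrame({
    "A": [np.nan, 2.0, np.nan],     # gaps at both ends
    "B": [np.nan, np.nan, np.nan],  # no valid value at all
})

# Forward fill first, then backward fill whatever is still missing.
filled = df.ffill().bfill()

# Count the NaNs that survived both passes, per column.
remaining = filled.isna().sum()

print(filled)
print(remaining)
```

Column A ends up fully filled with 2.0, while column B stays entirely NaN because there is no valid value in either direction. Counting the remaining NaNs after filling is a cheap way to catch that case.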
Quick Reference
| Call or option | Description | Axis / default |
|---|---|---|
| df.ffill() or fillna(method='ffill') | Fill missing values forward using the last valid value | axis=0 (down each column) |
| df.bfill() or fillna(method='bfill') | Fill missing values backward using the next valid value | axis=0 (down each column) |
| axis=0 | Fill down each column | Default |
| axis=1 | Fill across each row | Optional |
Key Takeaways
Use df.ffill() (or the deprecated fillna(method='ffill')) to fill missing values by carrying the last valid value forward.
Use df.bfill() (or the deprecated fillna(method='bfill')) to fill missing values with the next valid value.
Specify axis=0 to fill down columns or axis=1 to fill across rows.
ffill fills a gap only if a valid value exists before it; bfill fills a gap only if a valid value exists after it.
If no valid values exist in the fill direction, missing values remain unchanged.