How to Use Interpolate in pandas for Missing Data
Use DataFrame.interpolate() or Series.interpolate() in pandas to fill missing values by estimating them from neighboring known values. It supports methods such as linear, time, and polynomial interpolation to fill gaps in your data.
Syntax
The interpolate() method is called on a pandas DataFrame or Series to fill missing values. Key parameters include:
- method: Type of interpolation (e.g., 'linear', 'time', 'polynomial'). 'polynomial' also requires an order argument and depends on SciPy.
- axis: Axis to interpolate along (0 for index, 1 for columns).
- limit: Maximum number of consecutive NaNs to fill.
- inplace: Whether to modify the original object.
python
# limit_direction=None resolves to 'forward' for most methods; downcast is deprecated as of pandas 2.1
DataFrame.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, downcast=None, **kwargs)
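To make the limit and limit_direction parameters concrete, here is a minimal sketch. The Series values are made up for illustration, and the variable names are not from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a run of three consecutive NaNs in the middle
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# No limit: every NaN in the gap is filled
full = s.interpolate(method="linear")

# limit=1, default direction: only the first NaN after a valid value is filled
forward_one = s.interpolate(method="linear", limit=1)

# limit=1 in both directions: one NaN is filled from each side of the gap
both_one = s.interpolate(method="linear", limit=1, limit_direction="both")

print(full.tolist())
print(forward_one.tolist())
print(both_one.tolist())
```

Note that limit caps how many NaNs are filled in a run; it does not skip long gaps entirely, so a partially filled gap is the expected result.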
Example
This example shows how to use interpolate() to fill missing values linearly in a DataFrame.
python
import pandas as pd
import numpy as np

data = {'A': [1, np.nan, 3, np.nan, 5], 'B': [np.nan, 2, np.nan, 4, 5]}
df = pd.DataFrame(data)

# Interpolate missing values linearly
interpolated_df = df.interpolate()
print(interpolated_df)
Output
     A    B
0  1.0  NaN
1  2.0  2.0
2  3.0  3.0
3  4.0  4.0
4  5.0  5.0
The first value in column B stays NaN because there is no earlier valid value to interpolate from.
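The example above interpolates down each column (axis=0). As a minimal sketch of the other direction, assuming a hypothetical DataFrame whose columns q1 to q3 hold ordered measurements, axis=1 interpolates across each row instead:

```python
import numpy as np
import pandas as pd

# Hypothetical data: each row is one item, each column an ordered measurement
df = pd.DataFrame({
    "q1": [1.0, 10.0],
    "q2": [np.nan, np.nan],
    "q3": [3.0, 30.0],
})

# axis=0 (default): q2 has no valid values in its own column, so it stays NaN
col_wise = df.interpolate(axis=0)

# axis=1: each row is filled using that row's neighboring columns
row_wise = df.interpolate(axis=1)

print(col_wise)
print(row_wise)
```

Which axis is appropriate depends on whether neighboring rows or neighboring columns are the meaningful sequence in your data.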
Common Pitfalls
Common mistakes when using interpolate() include:
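- Using the default 'linear' method on an irregularly spaced time series. 'linear' ignores the index and treats the values as equally spaced; use method='time' to weight by the actual spacing.
- Expecting interpolation to fill every NaN. With the defaults, NaNs before the first valid value are left unfilled, while NaNs after the last valid value are filled with that last value. Use limit_direction='both' to fill leading NaNs as well, or limit_area='inside' to fill only between valid values.
- Forgetting to set axis correctly when working with DataFrames.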
Example of a wrong approach and the correct fix:
python
import pandas as pd
import numpy as np

# Irregularly spaced dates: the gaps between observations differ
idx = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-04', '2023-01-07', '2023-01-08'])
data = pd.Series([1, np.nan, np.nan, 4, 5], index=idx)

# Wrong: 'linear' ignores the index and treats the points as equally spaced
wrong = data.interpolate()

# Correct: method='time' weights by the actual time between observations
correct = data.interpolate(method='time')

print('Wrong interpolation:')
print(wrong)
print('\nCorrect interpolation:')
print(correct)
Output
Wrong interpolation:
2023-01-01    1.0
2023-01-02    2.0
2023-01-04    3.0
2023-01-07    4.0
2023-01-08    5.0
dtype: float64

Correct interpolation:
2023-01-01    1.0
2023-01-02    1.5
2023-01-04    2.5
2023-01-07    4.0
2023-01-08    5.0
dtype: float64
On an evenly spaced index such as pd.date_range('2023-01-01', periods=5), both methods return the same values, so the difference only shows up when the spacing is irregular.
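The pitfall about NaNs at the start or end of the data can be checked directly. This is a minimal sketch with made-up values, showing how limit_direction and limit_area change what happens at the edges:

```python
import numpy as np
import pandas as pd

# Hypothetical series with a NaN at each edge and one in the middle
s = pd.Series([np.nan, 2.0, np.nan, 4.0, np.nan])

# Defaults: the leading NaN stays, the trailing NaN is filled with the last valid value
default = s.interpolate()

# limit_direction='both': the leading NaN is also filled, using the first valid value
both = s.interpolate(limit_direction="both")

# limit_area='inside': only NaNs surrounded by valid values are filled
inside = s.interpolate(limit_area="inside")

print(default.tolist())
print(both.tolist())
print(inside.tolist())
```

If edge values should never be invented, limit_area='inside' is the stricter choice; the edges can then be handled separately.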
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| method | Interpolation technique ('linear', 'time', 'polynomial', etc.) | 'linear' |
| axis | Axis to interpolate along (0=index, 1=columns) | 0 |
| limit | Max number of consecutive NaNs to fill | None |
| inplace | Modify original object if True | False |
| limit_direction | Direction to fill ('forward', 'backward', 'both') | None (acts as 'forward' for most methods) |
Key Takeaways
Use pandas interpolate() to fill missing values smoothly between known data points.
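Choose the interpolation method that fits your data, such as 'time' for irregularly spaced time series.
By default, NaNs before the first valid value are not filled, while NaNs after the last valid value are filled with that value; use limit_direction or limit_area to change this.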
Set axis correctly when interpolating DataFrames to avoid unexpected results.
Use limit to control how many consecutive NaNs get filled.