
How to Use Interpolate in pandas for Missing Data

Use DataFrame.interpolate() or Series.interpolate() in pandas to fill missing values by estimating intermediate values. It supports methods like linear, time, and polynomial interpolation to smoothly fill gaps in your data.
📐

Syntax

The interpolate() method is called on a pandas DataFrame or Series to fill missing values. Key parameters include:

  • method: Type of interpolation (e.g., 'linear', 'time', 'polynomial').
  • axis: Axis to interpolate along (0 for index, 1 for columns).
  • limit: Maximum number of consecutive NaNs to fill.
  • inplace: Whether to modify the original object.
python
DataFrame.interpolate(method='linear', *, axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, downcast=None, **kwargs)
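To make the limit parameter concrete, here is a minimal sketch on a small Series. The variable names (s, filled_all, filled_one) are made up for illustration and the data is invented.

```python
import numpy as np
import pandas as pd

# Illustrative data: three consecutive NaNs between two known values
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Default: every NaN between the two known values is filled
filled_all = s.interpolate()

# limit=1: only the first NaN after a valid value is filled
filled_one = s.interpolate(limit=1)

print(filled_all.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(filled_one.tolist())  # [1.0, 2.0, nan, nan, 5.0]
```

With limit set, the remaining NaNs in a long gap are left in place, which can be useful when you do not want to invent values across large stretches of missing data.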
💻

Example

This example shows how to use interpolate() to fill missing values linearly in a DataFrame.

python
import pandas as pd
import numpy as np

data = {'A': [1, np.nan, 3, np.nan, 5],
        'B': [np.nan, 2, np.nan, 4, 5]}
df = pd.DataFrame(data)

# Interpolate missing values linearly
interpolated_df = df.interpolate()

print(interpolated_df)
Output
     A    B
0  1.0  NaN
1  2.0  2.0
2  3.0  3.0
3  4.0  4.0
4  5.0  5.0
⚠️

Common Pitfalls

Common mistakes when using interpolate() include:

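  • Using the default 'linear' method on an irregularly spaced time series. 'linear' ignores the index and treats the values as equally spaced, so use method='time' when the gaps between timestamps differ (on an evenly spaced index both methods give the same result).
  • Misjudging what happens to NaNs at the start or end of the data. With the default settings, leading NaNs stay NaN because there is no earlier value to interpolate from, while trailing NaNs are filled with the last valid value. Use limit_area='inside' to fill only gaps surrounded by known values.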
  • Forgetting to set axis correctly when working with DataFrames.

Example of a wrong approach and the correct fix:

python
# Pitfall: 'linear' ignores the index and treats the values as equally spaced
import pandas as pd
import numpy as np

# Irregularly spaced dates: the gaps between observations differ
idx = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-04', '2023-01-07', '2023-01-08'])
data = pd.Series([1, np.nan, np.nan, 7, 8], index=idx)

# Fills by position only, so the uneven spacing of the dates is ignored
wrong = data.interpolate()

# Correct: method='time' weights each estimate by the actual time elapsed
correct = data.interpolate(method='time')

print('Wrong interpolation:\n', wrong)
print('\nCorrect interpolation:\n', correct)
Output
Wrong interpolation:
 2023-01-01    1.0
2023-01-02    3.0
2023-01-04    5.0
2023-01-07    7.0
2023-01-08    8.0
dtype: float64

Correct interpolation:
 2023-01-01    1.0
2023-01-02    2.0
2023-01-04    4.0
2023-01-07    7.0
2023-01-08    8.0
dtype: float64
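How NaNs at the edges of the data are treated is easiest to see on a tiny Series. The sketch below uses invented data and made-up variable names (default, inside, both) to compare the default behavior with limit_area='inside' and limit_direction='both'.

```python
import numpy as np
import pandas as pd

# Illustrative data: one leading NaN, one interior NaN, one trailing NaN
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Default: the leading NaN is kept, the trailing NaN takes the last valid value
default = s.interpolate()

# limit_area='inside': only NaNs surrounded by valid values are filled
inside = s.interpolate(limit_area='inside')

# limit_direction='both': the leading NaN is filled as well
both = s.interpolate(limit_direction='both')

print(default.tolist())  # [nan, 1.0, 2.0, 3.0, 3.0]
print(inside.tolist())   # [nan, 1.0, 2.0, 3.0, nan]
print(both.tolist())     # [1.0, 1.0, 2.0, 3.0, 3.0]
```

Note that the edge values filled here are copies of the nearest valid value, not true extrapolations, so limit_area='inside' is often the safer choice when the edges matter.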
📊

Quick Reference

Parameter        | Description                                                     | Default
method           | Interpolation technique ('linear', 'time', 'polynomial', etc.)  | 'linear'
axis             | Axis to interpolate along (0=index, 1=columns)                  | 0
limit            | Max number of consecutive NaNs to fill                          | None
inplace          | Modify original object if True                                  | False
limit_direction  | Direction to fill ('forward', 'backward', 'both')               | None (acts as 'forward' for most methods)
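The effect of axis is easy to miss, so here is a small sketch with an invented DataFrame whose column names (q1, q2, q3) are purely illustrative. The middle column is entirely missing, so the two axis settings give different results.

```python
import numpy as np
import pandas as pd

# Illustrative data: each row is a record, the middle column is entirely missing
df = pd.DataFrame({
    'q1': [1.0, 10.0],
    'q2': [np.nan, np.nan],
    'q3': [3.0, 30.0],
})

# axis=0 (default): interpolate down each column; 'q2' has no known values to use
by_col = df.interpolate(axis=0)

# axis=1: interpolate across each row, using 'q1' and 'q3' as the known values
by_row = df.interpolate(axis=1)

print(by_col['q2'].tolist())  # [nan, nan]
print(by_row['q2'].tolist())  # [2.0, 20.0]
```

Which axis is right depends on which direction carries the ordering in your data: down the rows for a typical time series, across the columns when each row is a sequence of measurements.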

Key Takeaways

Use pandas interpolate() to fill missing values smoothly between known data points.
Choose the interpolation method that fits your data, like 'time' for irregularly spaced time series.
By default, leading NaNs are left unfilled and trailing NaNs are filled with the last valid value; use limit_area='inside' to fill only the gaps between existing values.
Set axis correctly when interpolating DataFrames to avoid unexpected results.
Use limit to control how many consecutive NaNs get filled.