How to Use Rolling in pandas for Moving Window Calculations
Use DataFrame.rolling(window) or Series.rolling(window) to create a rolling window object in pandas. Then apply aggregation functions like mean(), sum(), or std() to compute statistics over the moving window.
Syntax
The basic syntax for rolling in pandas is:
df.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
Where:
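- window: Size of the moving window: an integer number of rows, or a time offset such as '2D' for time-based windows.
- min_periods: Minimum number of observations in the window required to produce a value. Defaults to the window size for integer windows and to 1 for offset-based windows.
- center: If True, set each label at the center of its window instead of at the right edge.
- win_type: Weighting window type (e.g., 'boxcar', 'triang'). None weights all points equally; named types require SciPy.
- on: For a DataFrame, the datetime-like column to compute the window over instead of the index.
- axis: Axis to roll over (0 for rows, 1 for columns). Deprecated since pandas 2.1; for axis=1, transpose the DataFrame first.
- closed: Which endpoints of the window are included ('right', 'left', 'both', 'neither'). None behaves like 'right'.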
After creating the rolling object, apply aggregation like .mean(), .sum(), .std(), etc.
```python
df.rolling(window=3, min_periods=1).mean()
```
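The window can also be a time span rather than a row count. The following is a minimal sketch, assuming a hypothetical DataFrame whose datetimes live in a column named date (the data and column names are illustrative, not from the original example). Because the window is an offset, min_periods defaults to 1, so no leading NaN values appear.

```python
import pandas as pd

# Hypothetical data: irregularly spaced daily readings (names are illustrative).
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"]),
    "values": [10, 20, 30, 40],
})

# "2D" is a time offset: each window covers the 2 days ending at the row's date.
# The datetime column is passed through `on` because it is not the index.
result = df.rolling(window="2D", on="date").mean()

print(result)
```

The row for 2024-01-04 averages only itself, since the previous reading falls exactly on the open left edge of its 2-day window.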
Example
This example shows how to calculate the rolling mean and sum over a window of 3 rows on a simple DataFrame.
```python
import pandas as pd

data = {'values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calculate rolling mean with window size 3
rolling_mean = df['values'].rolling(window=3, min_periods=1).mean()

# Calculate rolling sum with window size 3
rolling_sum = df['values'].rolling(window=3, min_periods=1).sum()

print('Original DataFrame:')
print(df)
print('\nRolling Mean (window=3):')
print(rolling_mean)
print('\nRolling Sum (window=3):')
print(rolling_sum)
```
Output
```
Original DataFrame:
   values
0      10
1      20
2      30
3      40
4      50

Rolling Mean (window=3):
0    10.0
1    15.0
2    20.0
3    30.0
4    40.0
Name: values, dtype: float64

Rolling Sum (window=3):
0     10.0
1     30.0
2     60.0
3     90.0
4    120.0
Name: values, dtype: float64
```
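When several statistics are needed over the same window, one option is to build the rolling object once and pass a list of function names to agg(). This is a small sketch using the same data as the example; the variable name stats is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'values': [10, 20, 30, 40, 50]})

# One rolling object, several aggregations.
# The result is a DataFrame with one column per function name.
stats = df['values'].rolling(window=3, min_periods=1).agg(['mean', 'sum'])

print(stats)
```

The mean and sum columns should match the two separate results computed in the example.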
Common Pitfalls
Common mistakes when using rolling include:
- Not setting min_periods with an integer window, which leaves NaN values in the first window - 1 rows of the series.
- Using a window size larger than the data length with the default min_periods, causing all results to be NaN.
- For time-based rolling on a DataFrame whose datetimes are in a column rather than the index, forgetting to pass that column through the on parameter.
- Misunderstanding the center parameter, which changes which row each window's result is labeled with.
Example of a common mistake and fix:
```python
import pandas as pd

data = {'values': [10, 20, 30]}
df = pd.DataFrame(data)

# Wrong: window larger than data length, default min_periods=window
print(df['values'].rolling(window=5).mean())

# Right: set min_periods=1 to get results even if window not full
print(df['values'].rolling(window=5, min_periods=1).mean())
```
Output
```
0   NaN
1   NaN
2   NaN
Name: values, dtype: float64
0    10.0
1    15.0
2    20.0
Name: values, dtype: float64
```
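The effect of center on label alignment is easier to see side by side. A minimal sketch, assuming a small illustrative Series: with the default, each result is labeled at the last row of its window; with center=True, it is labeled at the middle row, so the NaN values move from the start to both ends.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Default: the result is labeled at the window's last row.
trailing = s.rolling(window=3).mean()

# center=True: the result is labeled at the window's middle row.
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({'trailing': trailing, 'centered': centered}))
```

Both columns contain the same three averages (20, 30, 40); only the rows they are attached to differ.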
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| window | Size of the moving window | Required |
| min_periods | Minimum observations to compute result | window size (1 for offset windows) |
| center | Set labels at center of window if True | False |
| win_type | Type of window (e.g., 'boxcar', 'triang') | None |
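| on | Datetime-like column to use instead of the index for time-based rolling | None |
| axis | Axis to roll over (0=rows, 1=columns); deprecated since pandas 2.1 | 0 |
| closed | Which endpoints of the window are included | None (behaves like 'right') |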
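The closed parameter is the least obvious entry in the table, so here is a short sketch of one use, assuming a recent pandas version (support for closed on integer windows was added in 1.2) and an illustrative Series: closed='left' drops the current row from each window, so every result is based only on earlier rows.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Default (closed=None, same as 'right'): the current row is included.
default_right = s.rolling(window=2).sum()

# closed='left': the window covers the two rows before the current one.
left_closed = s.rolling(window=2, closed='left').sum()

print(pd.DataFrame({'right': default_right, 'left': left_closed}))
```

With closed='left', the first two rows are NaN because fewer than two earlier rows exist for them.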
Key Takeaways
- Use df.rolling(window) to create a rolling window object for moving calculations.
- Set min_periods to control how many values are needed before a result is computed, which avoids leading NaNs.
- Apply aggregation functions like mean(), sum(), or std() on the rolling object to get results.
- For time-based rolling, use a datetime index, or pass the datetime column through the on parameter.
- Beware of window sizes larger than the data length, and remember that center changes label alignment.