
How to Use Rolling in pandas for Moving Window Calculations

Use DataFrame.rolling(window) or Series.rolling(window) to create a rolling window object in pandas. Then apply aggregation functions like mean(), sum(), or std() to compute statistics over the moving window.

Syntax

The basic syntax for rolling in pandas is:

  • df.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)

Where:

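  • window: Size of the moving window. An integer means a fixed number of rows; an offset string such as '2D' means a time span and requires a datetime-like index or an on column.
  • min_periods: Minimum number of observations in the window required to produce a value. Defaults to the window size for integer windows and to 1 for offset-based windows.
  • center: If True, label each result at the center of its window instead of at its right edge.
  • win_type: Type of weighted window (e.g., 'boxcar', 'triang'). None means all points are weighted equally; weighted windows require SciPy.
  • on: For a DataFrame, the label of a datetime-like column to use for windowing instead of the index.
  • axis: Axis to roll over (0 for rows, 1 for columns). Recent pandas releases deprecate this argument, so check the documentation for your version.
  • closed: Which side of the window is closed ('right', 'left', 'both', 'neither'). None behaves like 'right'.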

After creating the rolling object, apply aggregation like .mean(), .sum(), .std(), etc.

python
df.rolling(window=3, min_periods=1).mean()
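The effect of min_periods and center is easiest to see side by side. The following is a minimal sketch on a made-up Series (the variable names trailing and centered are illustrative only): with the default min_periods the first two rows are NaN, and with center=True the NaN values move to the two ends.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Default alignment: each result is labeled at the last row of its window.
# min_periods defaults to the window size, so the first two rows are NaN.
trailing = s.rolling(window=3).mean()

# center=True labels each result at the middle row of its window,
# so the incomplete windows (and their NaN values) sit at both ends.
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({"trailing": trailing, "centered": centered}))
```

The computed means are the same in both cases; only the row each mean is attached to changes.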

Example

This example shows how to calculate the rolling mean and sum over a window of 3 rows on a simple DataFrame.

python
import pandas as pd

data = {'values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calculate rolling mean with window size 3
rolling_mean = df['values'].rolling(window=3, min_periods=1).mean()

# Calculate rolling sum with window size 3
rolling_sum = df['values'].rolling(window=3, min_periods=1).sum()

print('Original DataFrame:')
print(df)
print('\nRolling Mean (window=3):')
print(rolling_mean)
print('\nRolling Sum (window=3):')
print(rolling_sum)
Output
Original DataFrame:
   values
0      10
1      20
2      30
3      40
4      50

Rolling Mean (window=3):
0    10.0
1    15.0
2    20.0
3    30.0
4    40.0
Name: values, dtype: float64

Rolling Sum (window=3):
0     10.0
1     30.0
2     60.0
3     90.0
4    120.0
Name: values, dtype: float64
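If several statistics are needed over the same window, one option is to pass a list of function names to agg() on the rolling object. This is a small sketch that reuses the data from the example above; the name stats is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"values": [10, 20, 30, 40, 50]})

# Compute the rolling mean and standard deviation over the same
# 3-row window in a single call; the result has one column per function.
stats = df["values"].rolling(window=3).agg(["mean", "std"])

print(stats)
```

Because min_periods is left at its default here, the first two rows of both columns are NaN.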

Common Pitfalls

Common mistakes when using rolling include:

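  • Leaving min_periods at its default with an integer window, which makes the first window - 1 results NaN.
  • Using a window size larger than the data length without lowering min_periods, which makes every result NaN.
  • For time-based rolling on a DataFrame whose datetimes are in a column rather than the index, forgetting to pass that column through the on parameter.
  • Misunderstanding the center parameter, which changes which row each result is labeled with, not the values inside the window.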

Example of a common mistake and fix:

python
import pandas as pd

data = {'values': [10, 20, 30]}
df = pd.DataFrame(data)

# Wrong: window larger than data length, default min_periods=window
print(df['values'].rolling(window=5).mean())

# Right: set min_periods=1 to get results even if window not full
print(df['values'].rolling(window=5, min_periods=1).mean())
Output
0   NaN
1   NaN
2   NaN
Name: values, dtype: float64
0    10.0
1    15.0
2    20.0
Name: values, dtype: float64
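The time-based pitfall can be sketched the same way. The example below uses invented dates and a hypothetical timestamp column: when the datetimes are in a column, on names that column; when they are in the index, no on argument is needed. The window '2D' covers the two days ending at each row.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-05", "2024-01-06"]
    ),
    "values": [10, 20, 30, 40, 50],
})

# Datetimes live in a column, so `on` tells rolling which column defines the window.
by_column = df.rolling(window="2D", on="timestamp").sum()

# Equivalent result with the datetimes moved into the index; `on` is not needed.
by_index = df.set_index("timestamp")["values"].rolling(window="2D").sum()

print(by_column)
print(by_index)
```

Note that the gap between January 3 and January 5 means the window for January 5 contains only one row, and no NaN appears because min_periods defaults to 1 for offset-based windows.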

Quick Reference

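| Parameter | Description | Default |
|---|---|---|
| window | Size of the moving window (rows or time offset) | Required |
| min_periods | Minimum observations to compute a result | Window size (1 for offset windows) |
| center | Set labels at center of window if True | False |
| win_type | Type of weighted window (e.g., 'boxcar', 'triang') | None |
| on | Datetime-like column for time-based rolling | None |
| axis | Axis to roll over (0=rows, 1=columns); deprecated in recent pandas | 0 |
| closed | Which side of window is closed | None (behaves like 'right') |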

Key Takeaways

  • Use df.rolling(window) to create a rolling window object for moving calculations.
  • Set min_periods to control how many values are needed before a result is computed, which avoids leading NaN values.
  • Apply aggregation functions like mean(), sum(), or std() on the rolling object to get results.
  • For time-based rolling, use a datetime-like index, or pass the datetime column through the on parameter.
  • Watch for windows larger than the data length, and remember that center only changes label alignment.