How to Use Rolling Window in pandas for Data Analysis
Use pandas.DataFrame.rolling(window) to create a rolling window object that slides over your data. Then apply functions like .mean(), .sum(), or custom functions to calculate statistics over each window.
Syntax
The basic syntax for using a rolling window in pandas is:
```python
df.rolling(window).function()
```
- df.rolling(window): Creates a rolling window object with the specified window size.
- window: Number of observations used for calculating each statistic.
- After creating the rolling object, apply aggregation functions like .mean(), .sum(), .std(), etc.
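Because the rolling object is separate from the aggregation, it can be created once and reused for several statistics. The following is a minimal sketch of that pattern; the Series `s`, the name `roller`, and the sample numbers are illustrative assumptions rather than part of the original example.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series([1, 2, 3, 4, 5, 6])

# Build the rolling object once and reuse it for several statistics
roller = s.rolling(window=3)

summary = pd.DataFrame({
    "mean": roller.mean(),
    "sum": roller.sum(),
    "std": roller.std(),  # sample standard deviation of each window
})
print(summary)
```

Each column holds NaN in its first two rows, since a window of 3 is not yet full there.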
Example
This example shows how to calculate the rolling mean over a window of 3 rows in a pandas DataFrame.
```python
import pandas as pd

data = {'values': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# Calculate rolling mean with window size 3
df['rolling_mean'] = df['values'].rolling(window=3).mean()
print(df)
```
Output
values rolling_mean
0 10 NaN
1 20 NaN
2 30 20.0
3 40 30.0
4 50 40.0
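To see where these numbers come from, the same result can be reproduced by hand: each row averages itself and the two rows before it. The sketch below does this with a plain Python loop and compares it with pandas. The variable names `manual` and `rolled` are assumptions made for illustration.

```python
import pandas as pd

values = [10, 20, 30, 40, 50]
window = 3

# Recompute the rolling mean manually, one row at a time
manual = []
for i in range(len(values)):
    if i + 1 < window:
        # Fewer than `window` rows seen so far: no result yet
        manual.append(float("nan"))
    else:
        chunk = values[i - window + 1 : i + 1]  # current row plus the two before it
        manual.append(sum(chunk) / window)

rolled = pd.Series(values).rolling(window=window).mean()

print(manual)
print(rolled.tolist())
```

For row 2, the window is [10, 20, 30], which gives 60 / 3 = 20.0, matching the rolling_mean column above.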
Common Pitfalls
- NaN values at the start: By default, the first window - 1 rows will have NaN because there is not enough data to fill the window.
- Window size too large: If the window is larger than the data length, all results will be NaN (with the default settings).
- Non-numeric data: Rolling aggregations are defined for numeric data; depending on the pandas version, non-numeric columns either raise an error or are dropped from the result.
```python
import pandas as pd

data = {'values': [10, 20, 30]}
df = pd.DataFrame(data)

# Wrong: window larger than data length
print(df['values'].rolling(window=5).mean())

# Right: window smaller or equal to data length
print(df['values'].rolling(window=2).mean())
```
Output
0 NaN
1 NaN
2 NaN
Name: values, dtype: float64
0 NaN
1 15.0
2 25.0
Name: values, dtype: float64
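If leading NaN values are not acceptable, the min_periods parameter of rolling() lowers the number of observations a window needs before it produces a result. The sketch below reuses the three-row data from this section; the variable name `partial` is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({"values": [10, 20, 30]})

# min_periods=1 lets a window produce a result as soon as it holds one value,
# even though the requested window size (5) exceeds the data length
partial = df["values"].rolling(window=5, min_periods=1).mean()
print(partial)
```

Here each row is averaged over however many rows are available so far, giving 10.0, 15.0, and 20.0. Note that these early values are based on fewer observations than the later ones would be, so they are not directly comparable to full-window results.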
Quick Reference
| Method | Description |
|---|---|
| .rolling(window) | Creates a rolling window object with the specified size |
| .mean() | Calculates mean over each rolling window |
| .sum() | Calculates sum over each rolling window |
| .std() | Calculates standard deviation over each rolling window |
| .apply(func) | Applies custom function to each rolling window |
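As a sketch of the .apply(func) row, the example below computes a rolling range (maximum minus minimum) with a custom function. The helper name `value_range` is hypothetical, and passing raw=True is an assumption made here so that each window arrives as a NumPy array.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

def value_range(window_values):
    """Hypothetical helper: spread between the largest and smallest value."""
    return window_values.max() - window_values.min()

# raw=True passes each window to the function as a NumPy array
rolling_range = s.rolling(window=3).apply(value_range, raw=True)
print(rolling_range)
```

The custom function must return a single number per window. Built-in aggregations such as .mean() are generally faster than .apply(), so a custom function is best reserved for statistics pandas does not already provide.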
Key Takeaways
- Use df.rolling(window) to create a rolling window over your data.
- Apply aggregation functions like .mean() or .sum() on the rolling object to get moving statistics.
- By default, the first window - 1 results are NaN because the window is not full yet.
- Ensure your window size is appropriate for your data length to avoid all-NaN results.
- Rolling aggregations are meant for numeric columns; select those columns before calling rolling().
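One way to handle a DataFrame that mixes text and numbers is to keep only the numeric columns before rolling. The following is a minimal sketch under that assumption; the column names `label` and `values` and the data are made up for illustration.

```python
import pandas as pd

# Hypothetical mixed-type data: one text column, one numeric column
df = pd.DataFrame({
    "label": ["a", "b", "c", "d"],
    "values": [10, 20, 30, 40],
})

# Keep only numeric columns, then apply the rolling mean
numeric = df.select_dtypes(include="number")
rolled = numeric.rolling(window=2).mean()
print(rolled)
```

The result contains only the values column, so the text column cannot interfere with the aggregation.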