
Why window functions matter in Pandas

Introduction

Window functions compute a value for each row from a set of related rows (a "window"), such as the previous few rows or the other rows in the same group. They let us calculate things like running totals or moving averages while keeping every original row.

When you want to see a running total of sales over days.
When you want to compare each row to the average of nearby rows.
When you want to rank items within groups, like top scores per class.
When you want to calculate differences between rows in a sequence.
When you want to keep all rows but add summary info about groups.
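
Several of these cases do not need rolling() at all. The following is a minimal sketch, assuming a small made-up DataFrame with hypothetical columns store and sales. It uses cumsum() for a running total, diff() for row-to-row differences, and groupby().transform() to attach a group summary to every row.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    'store': ['A', 'A', 'A', 'B', 'B'],
    'sales': [100, 150, 200, 80, 120],
})

by_store = df.groupby('store')['sales']

# Running (cumulative) total within each store
df['running_total'] = by_store.cumsum()

# Change from the previous row within each store (NaN on each store's first row)
df['change'] = by_store.diff()

# Store average repeated on every row of that store, so no rows are lost
df['store_avg'] = by_store.transform('mean')

print(df)
```

Every result has the same length as the original DataFrame, which is what distinguishes these operations from a plain groupby().sum() or groupby().mean().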
Syntax
Pandas
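# Template: window_size is an integer number of rows;
# replace function() with an aggregation such as sum(), mean(), or max()
df['new_column'] = df['value_column'].rolling(window_size).function()

# Group-wise functions such as rank() follow a similar pattern:
df['rank'] = df.groupby('group_column')['value_column'].rank()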

rolling(window_size) creates a moving window of window_size consecutive rows that, by default, ends at the current row.

Window functions often work with groupby() to handle groups separately.
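
One way to combine the two is to group first and then apply the rolling window inside each group. This is a sketch under the assumption of a made-up DataFrame with hypothetical columns team and score. Because groupby().rolling() returns a result indexed by both the group key and the original row index, the group level is dropped before assigning back.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    'team': ['red', 'red', 'red', 'blue', 'blue', 'blue'],
    'score': [10, 20, 30, 5, 15, 25],
})

# 2-row moving average computed separately for each team
rolled = df.groupby('team')['score'].rolling(2).mean()

# Drop the 'team' index level so the result aligns with df's original index
df['team_moving_avg'] = rolled.reset_index(level=0, drop=True)

print(df)
```

The first row of each team is NaN, which shows that a window never reaches across a team boundary.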

Examples
Calculate a rolling sum over the current row and the previous two rows.
Pandas
df['running_sum'] = df['sales'].rolling(3).sum()
Rank scores within each team so that the highest score gets rank 1.
Pandas
df['rank'] = df.groupby('team')['score'].rank(ascending=False)
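
Ties deserve a quick check. By default rank() uses method='average', so tied scores share the mean of the ranks they cover, while method='min' and method='dense' give whole-number ranks. The sketch below reuses the team and score column names from the example above; the values are made up.

```python
import pandas as pd

# Hypothetical data with a tie in the 'red' team
df = pd.DataFrame({
    'team': ['red', 'red', 'red', 'blue', 'blue'],
    'score': [90, 90, 70, 60, 80],
})

by_team = df.groupby('team')['score']

# Default: tied scores get the average of their ranks (1.5 and 1.5)
df['rank_avg'] = by_team.rank(ascending=False)

# 'min': tied scores both get rank 1, and the next score gets rank 3
df['rank_min'] = by_team.rank(ascending=False, method='min')

# 'dense': tied scores both get rank 1, and the next score gets rank 2
df['rank_dense'] = by_team.rank(ascending=False, method='dense')

print(df)
```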
Calculate a 5-day moving average of temperature, assuming one row per day.
Pandas
df['moving_avg'] = df['temperature'].rolling(window=5).mean()
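
Note that rolling(window=5) counts rows, not calendar days. When some days may be missing, a time-based window can be closer to what is intended. Here is a minimal sketch, assuming a Series with a DatetimeIndex and made-up temperatures. The offset string '2D' asks for a 2-day window ending at each timestamp.

```python
import pandas as pd

# Hypothetical readings; note that 2024-01-03 is missing
temps = pd.Series(
    [20.0, 22.0, 21.0, 25.0],
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04', '2024-01-05']),
)

# Time-based window: covers the 2 days ending at (and including) each timestamp
avg_2d = temps.rolling('2D').mean()

print(avg_2d)
```

The value for 2024-01-04 uses only that day's reading, because 2024-01-03 is missing and 2024-01-02 falls outside the 2-day window. A row-based rolling(2) would have averaged it with 2024-01-02 instead.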
Sample Program

This code calculates a rolling 3-day sales total using a window function. It keeps all rows and adds a new column, running_total, holding the sum of the current day's sales and the previous two days' sales.

Pandas
import pandas as pd

data = {'day': [1, 2, 3, 4, 5], 'sales': [100, 150, 200, 130, 170]}
df = pd.DataFrame(data)

# Rolling 3-day sales total: the current day plus the previous two days
df['running_total'] = df['sales'].rolling(3).sum()

print(df)
Output
   day  sales  running_total
0    1    100            NaN
1    2    150            NaN
2    3    200          450.0
3    4    130          480.0
4    5    170          500.0
Important Notes

Window functions do not reduce the number of rows; they add new info per row.

NaN values appear at the start while the window is not yet full: with rolling(n), the first n - 1 rows are NaN.
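
Where those leading NaN values are unwanted, rolling() accepts a min_periods argument that sets how many observations a window needs before it produces a result. The sketch below reuses the sales figures from the sample program:

```python
import pandas as pd

sales = pd.Series([100, 150, 200, 130, 170])

# Default: a result only once all 3 rows are available
strict = sales.rolling(3).sum()

# min_periods=1: partial windows at the start are summed as-is
partial = sales.rolling(3, min_periods=1).sum()

print(pd.DataFrame({'sales': sales, 'strict': strict, 'partial': partial}))
```

With min_periods=1, the first two results cover only one and two rows respectively, so they are not directly comparable to the full 3-row sums that follow.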

They are very useful for time series and grouped data analysis.

Summary

Window functions let you analyze data in small groups or moving windows.

They help calculate running totals, averages, ranks, and differences without losing rows.

They are powerful for understanding trends and comparisons in data.