How to Use Named Aggregation in pandas for GroupBy
Use named aggregation in pandas by passing keyword arguments to agg(), where each keyword is the new column name and each value is a tuple specifying the column to aggregate and the aggregation function. This lets you apply multiple aggregations with clear, custom output column names in a single groupby call.
Syntax
The syntax for named aggregation in pandas groupby is:
df.groupby('group_column').agg(
    new_col_name1=('original_col1', 'agg_func1'),
    new_col_name2=('original_col2', 'agg_func2'),
    ...
)
Here, each new_col_name is the name you want in the output, original_col is the column to aggregate, and agg_func is the aggregation function, such as 'sum', 'mean', or a custom function.
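The syntax description mentions that the aggregation function can be a custom function rather than a string. A minimal sketch of that case follows; the sample data and the helper name value_range are made up for this illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B"],
    "Points": [10, 15, 10, 20],
})


def value_range(series):
    """Return the spread between the largest and smallest value in a group."""
    return series.max() - series.min()


summary = df.groupby("Team").agg(
    Total_Points=("Points", "sum"),        # built-in aggregation given by name
    Points_Range=("Points", value_range),  # custom function, called once per group
)
print(summary)
```

The custom function receives the values of one column for one group as a Series and should return a single value.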
Example
This example groups data by the 'Team' column and calculates the total 'Points' and average 'Assists' for each team with custom output column names.
python
import pandas as pd

data = {
    'Team': ['A', 'A', 'B', 'B', 'C'],
    'Points': [10, 15, 10, 20, 30],
    'Assists': [5, 7, 8, 6, 9],
}
df = pd.DataFrame(data)

# Sum the points and average the assists for each team
result = df.groupby('Team').agg(
    Total_Points=('Points', 'sum'),
    Average_Assists=('Assists', 'mean'),
)
print(result)
Output
      Total_Points  Average_Assists
Team
A               25              6.0
B               30              7.0
C               30              9.0
Common Pitfalls
Common mistakes include:
- Using a list or dict without naming the output columns, which leads to unclear or multi-level column names.
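- Passing aggregation functions directly (for example agg('sum') or agg(['sum', 'mean'])) instead of (column, function) tuples, which is the older style and gives you no control over the output names.
- Mixing a positional function argument with named keyword aggregations in the same agg() call. Named aggregation applies only when every aggregation is passed as a keyword.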
Always use the tuple format (column, function) with a new column name as the key for clarity.
python
import pandas as pd

data = {'Team': ['A', 'A', 'B'], 'Points': [10, 15, 10]}
df = pd.DataFrame(data)

# Wrong: no named aggregation, unclear columns
wrong = df.groupby('Team').agg({'Points': ['sum', 'mean']})

# Right: named aggregation with clear column names
right = df.groupby('Team').agg(
    Total_Points=('Points', 'sum'),
    Average_Points=('Points', 'mean'),
)

print('Wrong aggregation output:')
print(wrong)
print('\nRight aggregation output:')
print(right)
Output
Wrong aggregation output:
     Points
        sum  mean
Team
A        25  12.5
B        10  10.0

Right aggregation output:
      Total_Points  Average_Points
Team
A               25            12.5
B               10            10.0
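The pitfall of passing a function without a tuple can be illustrated with a short sketch. It assumes a DataFrame groupby, where pandas is expected to reject a keyword whose value is a bare function name with a TypeError; the exact error message may vary between versions, so the code only checks the exception type.

```python
import pandas as pd

df = pd.DataFrame({"Team": ["A", "A", "B"], "Points": [10, 15, 10]})

try:
    # The keyword value is a bare function name, not a (column, function) tuple,
    # so pandas cannot tell which column to aggregate.
    df.groupby("Team").agg(Total_Points="sum")
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# Pairing the function with its source column in a tuple resolves the error.
fixed = df.groupby("Team").agg(Total_Points=("Points", "sum"))
print(fixed)
```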
Quick Reference
Tips for using named aggregation:
- Use agg() with keyword arguments where keys are new column names.
- Each value is a tuple: (column_to_aggregate, aggregation_function).
- Aggregation functions can be strings like 'sum', 'mean', or custom functions.
- Named aggregation requires pandas version 0.25.0 or later.
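The (column, function) tuples listed above can also be written with pd.NamedAgg, a named tuple that pandas provides with the fields column and aggfunc. This form is not covered in the text above; the sketch below reuses the data from the earlier example and is meant only to show that the two spellings are expected to produce the same result.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C"],
    "Points": [10, 15, 10, 20, 30],
    "Assists": [5, 7, 8, 6, 9],
})

# Plain tuples: (column_to_aggregate, aggregation_function)
with_tuples = df.groupby("Team").agg(
    Total_Points=("Points", "sum"),
    Average_Assists=("Assists", "mean"),
)

# The same aggregations with explicit field names via pd.NamedAgg
with_namedagg = df.groupby("Team").agg(
    Total_Points=pd.NamedAgg(column="Points", aggfunc="sum"),
    Average_Assists=pd.NamedAgg(column="Assists", aggfunc="mean"),
)

# Compare the two results
print(with_tuples.equals(with_namedagg))
```

The explicit field names can make longer agg() calls easier to read, at the cost of a little extra typing.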
Key Takeaways
Named aggregation lets you assign custom output column names in groupby aggregations.
Use the syntax: new_name = ('column', 'agg_func') inside agg() for clarity.
Avoid unnamed or multi-level columns by always naming your aggregations.
Works with pandas 0.25.0+ for clean, readable grouped summaries.