
How to Use Aggregate Function in pandas for Data Analysis

In pandas, use the aggregate() method to apply one or more summary functions like sum, mean, or max on DataFrame columns. It allows flexible aggregation by passing a single function, a list of functions, or a dictionary mapping columns to functions.

Syntax

The aggregate() method syntax is:

  • DataFrame.aggregate(func, axis=0, *args, **kwargs)

Where:

  • func: a function, a function name as a string (e.g. 'sum'), a list of these, or a dict mapping column labels to functions
  • axis: 0 or 'index' (default) applies func down each column; 1 or 'columns' applies it across each row
```python
df.aggregate(func, axis=0, *args, **kwargs)
```
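The sketch below shows each form that func can take. It uses a made-up two-column DataFrame, and the data, variable names, and lambda are illustrative only.

```python
import pandas as pd

# Made-up DataFrame with two numeric columns (illustrative data)
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# 1. Single function name: returns a Series indexed by column label
single = df.aggregate('sum')

# 2. List of functions: returns a DataFrame indexed by function name
multiple = df.aggregate(['min', 'max'])

# 3. Dict mapping column -> function(s): different aggregations per column;
#    combinations that were not requested show up as NaN
per_column = df.aggregate({'A': 'sum', 'B': ['min', 'max']})

# 4. A callable: here a hypothetical "range" (max - min) computed per column
with_callable = df.aggregate(lambda col: col.max() - col.min())

# agg() is an alias of aggregate(), so this matches `multiple`
alias = df.agg(['min', 'max'])

print(single, multiple, per_column, with_callable, sep='\n\n')
```

The dict form is the only one that lets each column receive its own functions. The other forms apply the same function(s) to every column.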

Example

This example uses aggregate() to compute the sum and mean of the numeric columns A and B. Column C holds strings, so the dictionary leaves it out.

```python
import pandas as pd

data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': ['x', 'y', 'z', 'w']}
df = pd.DataFrame(data)

# Aggregate sum and mean for columns A and B (C holds strings, so it is skipped)
result = df.aggregate({'A': ['sum', 'mean'], 'B': ['sum', 'mean']})
print(result)
```
Output
```
         A     B
sum   10.0  26.0
mean   2.5   6.5
```
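When every selected column gets the same functions, you can replace the dictionary in the example above with a plain list applied to a column subset. Here is a sketch that reuses the example's data; the variable names are illustrative.

```python
import pandas as pd

data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': ['x', 'y', 'z', 'w']}
df = pd.DataFrame(data)

# Dict form from the example: same functions for A and B, C is not mentioned
by_dict = df.aggregate({'A': ['sum', 'mean'], 'B': ['sum', 'mean']})

# Equivalent: select the numeric columns first, then pass a plain list
by_list = df[['A', 'B']].aggregate(['sum', 'mean'])

# Individual results can be read back with .loc[function, column]
print(by_list.loc['mean', 'A'])  # 2.5
```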

Common Pitfalls

Common mistakes include:

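  • Applying a numeric function such as mean to a DataFrame that contains string columns raises a TypeError in pandas 2.x. ('sum' is a special case: on strings it concatenates them instead of failing.)
  • agg() is an alias of aggregate(), so either name works; pick one and use it consistently.
  • Passing a single function or list when different columns need different functions; use a dictionary instead.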
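Here is a minimal sketch of the dictionary approach from the last point. It uses made-up data where each column gets a function suited to its dtype: a numeric summary for A, and a count of distinct values for the string column C.

```python
import pandas as pd

# Made-up data: one numeric column and one string column
df = pd.DataFrame({'A': [1, 2, 3, 4], 'C': ['x', 'y', 'z', 'x']})

# Each column gets a function that makes sense for its type
summary = df.aggregate({'A': 'mean', 'C': 'nunique'})
print(summary)
```

Because each column maps to a single function, the result is a Series indexed by column label.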

Example of wrong and right usage:

```python
import pandas as pd

data = {'A': [1, 2], 'B': ['x', 'y']}
df = pd.DataFrame(data)

# Wrong: applying a numeric function to a non-numeric column.
# ('sum' would not fail here; it concatenates the strings into 'xy'.)
try:
    print(df.aggregate('mean'))
except TypeError as e:
    print(f'Error: {e}')

# Right: specify functions only for numeric columns
print(df.aggregate({'A': 'sum'}))
```
Output

In pandas 2.x the first call prints an "Error: ..." line for the TypeError (the exact message varies between versions). The second call prints:
```
A    3
dtype: int64
```
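With many numeric columns, listing each one in a dictionary gets tedious. As an alternative (a swapped-in approach, not the one used above), you can filter columns by dtype with select_dtypes() before aggregating. The sketch below reuses the pitfall's data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

# Keep only numeric columns, then apply numeric functions to all of them
numeric = df.select_dtypes(include='number')
result = numeric.aggregate(['sum', 'mean'])
print(result)
```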

Quick Reference

| Usage | Description | Example |
|---|---|---|
| Single function | Apply one function to all columns | df.aggregate('sum') |
| Multiple functions | Apply a list of functions to all columns | df.aggregate(['sum', 'mean']) |
| Dict of functions | Apply different functions per column | df.aggregate({'A': 'sum', 'B': ['min', 'max']}) |
| Axis parameter | Aggregate across each row instead of down each column | df.aggregate('sum', axis=1) |
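A short sketch of the axis parameter, using made-up numeric data. With axis=1, each row is reduced instead of each column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# axis=1: one result per row (here, the row total)
row_totals = df.aggregate('sum', axis=1)
print(row_totals.tolist())  # [6, 8, 10, 12]

# A list with axis=1 gives one column per function, one row per original row
row_range = df.aggregate(['min', 'max'], axis=1)
print(row_range)
```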

Key Takeaways

  • Use aggregate() to apply one or more summary functions to DataFrame columns.
  • Pass a function, a list of functions, or a dictionary mapping columns to functions for flexible aggregation.
  • Set axis=1 to aggregate across each row instead of down each column.
  • On DataFrames with string columns, use a dictionary to target the numeric columns; numeric functions such as mean raise a TypeError on strings.
  • agg() is an alias of aggregate(), so the two are interchangeable.