How to Use Aggregate Function in pandas for Data Analysis
In pandas, use the aggregate() method to apply one or more summary functions, such as sum, mean, or max, to DataFrame columns. It accepts a single function, a list of functions, or a dictionary mapping columns to functions.
Syntax
The aggregate() method syntax is:
DataFrame.aggregate(func, axis=0, *args, **kwargs)
Where:
func: a function, a function name as a string, a list of functions, or a dict mapping column labels to functions
axis: 0 (default) applies the function to each column; 1 applies it to each row
python
df.aggregate(func, axis=0, *args, **kwargs)
Example
This example shows how to use aggregate() to get the sum and mean of numeric columns in a DataFrame.
python
import pandas as pd

data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': ['x', 'y', 'z', 'w']}
df = pd.DataFrame(data)

# Aggregate sum and mean for columns A and B
result = df.aggregate({'A': ['sum', 'mean'], 'B': ['sum', 'mean']})
print(result)
Output
         A     B
sum   10.0  26.0
mean   2.5   6.5
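When every selected column gets the same functions, the dictionary can be replaced by a plain list applied to a column subset. The sketch below reuses the example data; the variable names by_dict and by_list are illustrative only, and both calls should produce the same table.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [5, 6, 7, 8],
                   'C': ['x', 'y', 'z', 'w']})

# Dictionary form: name each column and its functions explicitly
by_dict = df.aggregate({'A': ['sum', 'mean'], 'B': ['sum', 'mean']})

# List form: select the numeric columns first, then apply one list to all of them
by_list = df[['A', 'B']].aggregate(['sum', 'mean'])

print(by_dict)
print(by_list)
```

The dictionary form is the better fit once columns need different functions; the list form is shorter when they do not.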
Common Pitfalls
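Common mistakes include:
- Applying a numeric-only function such as mean to every column of a DataFrame that contains non-numeric columns, which can raise a TypeError in recent pandas versions.
- Mixing agg() and aggregate() in the same codebase. The two are interchangeable, so pick one and use it consistently.
- Not using a dictionary when different columns need different functions.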
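The point about agg() and aggregate() being interchangeable can be checked with a quick comparison. This is a minimal sketch on made-up data; the names short_form, long_form, and same are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Run the same aggregation through both method names
short_form = df.agg(['min', 'max'])
long_form = df.aggregate(['min', 'max'])

# equals() compares values, index, columns, and dtypes
same = short_form.equals(long_form)
print(same)
```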
Example of wrong and right usage:
python
# Wrong: applying mean to a DataFrame that has a non-numeric column
import pandas as pd

data = {'A': [1, 2], 'B': ['x', 'y']}
df = pd.DataFrame(data)

try:
    # 'sum' would not fail here (it concatenates the strings in B),
    # but 'mean' cannot be computed for strings
    print(df.aggregate('mean'))
except Exception as e:
    print(f'Error: {e}')

# Right: specify functions only for numeric columns
print(df.aggregate({'A': 'sum'}))
Output
Error: ... (a TypeError on pandas 2.x; the exact message depends on the pandas version, and older versions may behave differently)
A    3
dtype: int64
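When a DataFrame has many columns, listing each numeric one in a dictionary gets tedious. One alternative, sketched here as an assumption rather than something the example above prescribes, is to filter by dtype with select_dtypes() before aggregating. The names numeric_only_df and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y']})

# Keep only the numeric columns, so string column B is excluded up front
numeric_only_df = df.select_dtypes(include='number')

# Numeric functions can now be applied to every remaining column
result = numeric_only_df.aggregate(['sum', 'mean'])
print(result)
```

This keeps the aggregation call short, at the cost of silently skipping any column that was expected to be numeric but was loaded as text.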
Quick Reference
| Usage | Description | Example |
|---|---|---|
| Single function | Apply one function to all columns | df.aggregate('sum') |
| Multiple functions | Apply list of functions to all columns | df.aggregate(['sum', 'mean']) |
| Dict of functions | Apply different functions per column | df.aggregate({'A': 'sum', 'B': ['min', 'max']}) |
| Axis parameter | Apply the function to each row instead of each column | df.aggregate('sum', axis=1) |
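The four usages in the table return differently shaped results, which is easiest to see by running them side by side. The following sketch uses a small all-numeric DataFrame invented for illustration; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Single function: one value per column, returned as a Series
single = df.aggregate('sum')

# List of functions: one row per function, returned as a DataFrame
multiple = df.aggregate(['sum', 'mean'])

# Dict of functions: combinations that were not requested come back as NaN
per_column = df.aggregate({'A': 'sum', 'B': ['min', 'max']})

# axis=1: one value per row, computed across the columns
per_row = df.aggregate('sum', axis=1)

for name, value in [('single', single), ('multiple', multiple),
                    ('per_column', per_column), ('per_row', per_row)]:
    print(name, type(value).__name__, value.shape)
```

Note that the dictionary form pads with NaN where a column did not request a given function, so the result may need filtering before further use.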
Key Takeaways
- Use aggregate() to apply one or more summary functions to DataFrame columns.
- Pass a function, a list of functions, or a dictionary mapping columns to functions for flexible aggregation.
- Specify axis=1 to apply the function to each row instead of each column.
- Restrict numeric-only functions such as mean to numeric columns, for example by naming those columns in a dictionary.
- agg() and aggregate() are interchangeable in pandas.