How to Use agg in pandas: Syntax and Examples

In pandas, agg (an alias of aggregate) is a method that applies one or more aggregation operations, such as sum, mean, or a custom function, to a DataFrame or Series. It summarizes data using the functions you specify, either for every column at once or column by column.
📐

Syntax

The agg method is available on both pandas DataFrames and Series. You can pass a single aggregation function (as a string name or a callable), a list of functions, or a dictionary mapping column names to functions.

  • df.agg(func): Apply a single function to all columns.
  • df.agg([func1, func2]): Apply multiple functions to all columns.
  • df.agg({'col1': func1, 'col2': func2}): Apply different functions to specific columns.
python
df.agg(func)
df.agg([func1, func2])
df.agg({'col1': func1, 'col2': func2})
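The three call forms above return differently shaped results, which is easy to miss when reading the syntax alone. The following is a minimal sketch on a small made-up DataFrame (the column names and values are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# Single function name: one value per column, returned as a Series
single = df.agg("sum")

# List of functions: one row per function, returned as a DataFrame
multiple = df.agg(["sum", "max"])

# Dictionary: only the listed columns, each with its own function (a Series)
per_column = df.agg({"A": "sum", "B": "max"})

print(type(single).__name__)
print(type(multiple).__name__)
print(per_column.to_dict())
```

Knowing whether you get a Series or a DataFrame back matters when you chain further operations on the result.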
💻

Example

This example shows how to use agg to calculate the sum and mean of numeric columns in a DataFrame, and how to apply different functions to different columns.

python
import pandas as pd

data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': ['x', 'y', 'z', 'w']}
df = pd.DataFrame(data)

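# Apply sum and mean to the numeric columns only; pandas 2.0 and later
# raise an error if the string column 'C' is included in the mean
aggregated_all = df[['A', 'B']].agg(['sum', 'mean'])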

# Apply sum to column 'A' and max to column 'B'
aggregated_cols = df.agg({'A': 'sum', 'B': 'max'})

aggregated_all, aggregated_cols
Output
(         A     B
 sum   10.0  26.0
 mean   2.5   6.5,
 A    10
 B     8
 dtype: int64)
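The same method is available on a single Series. A short sketch, assuming a made-up Series named A: a single function name yields a scalar, while a list of names yields a Series indexed by function name.

```python
import pandas as pd

# Hypothetical Series, used only for illustration
s = pd.Series([1, 2, 3, 4], name="A")

# One function name: the result is a single scalar value
total = s.agg("sum")

# Several function names: the result is a Series indexed by function name
summary = s.agg(["min", "max", "mean"])

print(total)
print(summary)
```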
⚠️

Common Pitfalls

Common mistakes when using agg include:

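  • Passing an aggregation that is not defined for a column's data type, such as mean on a string column. Note that sum does not fail on strings; it concatenates them.
  • Leaving out the brackets when passing several functions: df.agg('sum', 'mean') treats 'mean' as the axis argument and fails, so write df.agg(['sum', 'mean']). Conversely, wrapping a single function in a list changes the result from a Series to a one-row DataFrame.
  • Misspelling a column name in the dictionary, which raises an error (a KeyError in current pandas).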

Always check your data types and use functions appropriate for each column.

python
import pandas as pd

data = {'A': [1, 2, 3], 'B': ['x', 'y', 'z']}
df = pd.DataFrame(data)

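# Wrong: mean is not defined for the string column 'B'
# (sum would not fail here; it concatenates the strings into 'xyz')
error_message = None
try:
    df.agg({'A': 'sum', 'B': 'mean'})
except TypeError as e:
    # The exact wording of the message depends on the pandas version
    error_message = str(e)

# Right: aggregate only the numeric column
correct = df.agg({'A': 'sum'})

error_message, correct
Output
( <TypeError message: the strings in 'B' cannot be converted to numeric>,
 A    6
 dtype: int64 )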
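One way to avoid type mismatches is to select the numeric columns before aggregating. The sketch below uses select_dtypes for that and also shows how brackets change the shape of the result. The sample data is made up, and this is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical sample data with one numeric and one string column
df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

# Keep only the numeric columns so that functions like mean are valid
numeric = df.select_dtypes(include="number")
means = numeric.agg("mean")

# A bare function name returns a Series ...
as_series = numeric.agg("sum")

# ... while the same name inside a list returns a one-row DataFrame
as_frame = numeric.agg(["sum"])

print(means)
print(type(as_series).__name__, type(as_frame).__name__)
```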
📊

Quick Reference

| Usage | Description | Example |
| --- | --- | --- |
| Single function | Apply one aggregation to all columns | df.agg('mean') |
| Multiple functions | Apply several aggregations to all columns | df.agg(['sum', 'max']) |
| Dict by column | Apply different functions to specific columns | df.agg({'A': 'sum', 'B': 'max'}) |
| Custom function | Use your own function for aggregation | df.agg({'A': lambda x: x.max() - x.min()}) |
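The custom-function usage can also be written with a named function, which gives the result a readable label when mixed with built-in names in a list. A minimal sketch, where value_range is a hypothetical helper defined only for this example:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})


def value_range(column):
    """Return the spread between the largest and smallest value."""
    return column.max() - column.min()


# Use the custom function per column through a dictionary
ranges = df.agg({"A": value_range, "B": value_range})

# Mix a built-in name with the custom function; the row label for the
# custom function is taken from the function's name
mixed = df.agg(["sum", value_range])

print(ranges)
print(mixed)
```

A lambda works the same way, but its row label appears as `<lambda>`, so a named function is usually easier to read in the output.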

Key Takeaways

  • Use agg to apply one or more aggregation functions to DataFrame columns easily.
  • You can pass a single function, a list of functions, or a dictionary mapping columns to functions.
  • Ensure aggregation functions match the data type of each column to avoid errors.
  • agg works on both pandas DataFrames and Series for flexible summarization.
  • Custom functions can be used with agg for tailored aggregation logic.