How to Use groupby in pandas for Data Grouping and Aggregation
Use groupby() in pandas to split data into groups based on column values, then apply aggregation functions like sum() or mean() to each group. It helps summarize and analyze data by category efficiently.
Syntax
The basic syntax of groupby() is:
- df.groupby(by): Groups the DataFrame df by the column(s) specified in by. by can be a single column name, a list of column names, or a function.
- After grouping, you can apply aggregation functions like sum(), mean(), count(), etc.
python
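# Split df into groups that share the same value in 'column_name'
grouped = df.groupby('column_name')
# aggregation_function is a placeholder: use a real method such as sum() or mean()
result = grouped.aggregation_function()
Example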
This example shows how to group a DataFrame by the 'Category' column and calculate the sum of 'Sales' for each group.
python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B', 'C', 'A'],
        'Sales': [100, 200, 150, 300, 250, 50]}
df = pd.DataFrame(data)

# Group rows by 'Category', then sum 'Sales' within each group
grouped = df.groupby('Category')
sales_sum = grouped['Sales'].sum()
print(sales_sum)
Output
Category
A 300
B 500
C 250
Name: Sales, dtype: int64
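The by argument also accepts a list of column names or a function, as noted in the syntax section. The following is a minimal sketch of both forms; the 'Region' column and its values are hypothetical and added only for illustration.

```python
import pandas as pd

# Hypothetical data: 'Region' is an illustrative column, not part of the example above.
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'C', 'A'],
    'Region': ['East', 'East', 'West', 'West', 'East', 'East'],
    'Sales': [100, 200, 150, 300, 250, 50],
})

# Group by a list of columns: one result row per (Category, Region) pair.
by_pair = df.groupby(['Category', 'Region'])['Sales'].sum()
print(by_pair)

# Group by a function: it is called on each index label (0 to 5 here),
# so this splits the rows into even and odd index labels.
by_parity = df.groupby(lambda i: i % 2)['Sales'].sum()
print(by_parity)
```

Grouping by a list produces a result indexed by each combination of values, so a single entry can be read with a tuple such as by_pair.loc[('A', 'East')].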
Common Pitfalls
Common mistakes when using groupby() include:
- Forgetting to select a column before applying aggregation, so the function runs on every non-grouping column rather than just the one you want.
- Using aggregation functions without parentheses, e.g., sum instead of sum().
- Assuming groupby() returns a DataFrame directly; it returns a GroupBy object that needs aggregation.
Always apply an aggregation function after grouping to get meaningful results.
python
import pandas as pd

data = {'Category': ['A', 'B', 'A'], 'Sales': [100, 200, 150]}
df = pd.DataFrame(data)

# Wrong: missing aggregation function
# grouped = df.groupby('Category')
# print(grouped)  # This prints a GroupBy object, not grouped data

# Right: apply aggregation
grouped = df.groupby('Category')
sales_sum = grouped['Sales'].sum()
print(sales_sum)
Output
Category
A 250
B 200
Name: Sales, dtype: int64
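The other two pitfalls can be seen in a short sketch as well. It assumes a hypothetical extra 'Units' column so that the difference between aggregating everything and aggregating one selected column is visible.

```python
import pandas as pd

# Hypothetical data: 'Units' is an illustrative extra column.
df = pd.DataFrame({
    'Category': ['A', 'B', 'A'],
    'Sales': [100, 200, 150],
    'Units': [1, 2, 3],
})

grouped = df.groupby('Category')

# No column selected: every non-grouping column is aggregated, giving a DataFrame.
all_columns = grouped.sum()
print(type(all_columns).__name__)   # DataFrame
print(list(all_columns.columns))    # ['Sales', 'Units']

# Column selected first: only 'Sales' is aggregated, giving a Series.
sales_only = grouped['Sales'].sum()
print(type(sales_only).__name__)    # Series

# Missing parentheses: this is a bound method, not an aggregated result.
not_called = grouped['Sales'].sum
print(callable(not_called))         # True
```

Selecting the column before aggregating keeps the result limited to the values you actually need.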
Quick Reference
| Method | Description |
|---|---|
| groupby(by) | Group data by column(s) or function |
| sum() | Calculate sum of values in each group |
| mean() | Calculate mean of values in each group |
| count() | Count non-null values in each group |
| agg(func) | Apply one or more aggregation functions |
| size() | Get number of rows in each group, including rows with missing values |
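As a rough illustration of the last rows of the table, the sketch below applies several functions at once with agg() and contrasts count() with size(). The data is hypothetical and includes one missing Sales value on purpose.

```python
import pandas as pd

# Hypothetical data with one missing 'Sales' value to contrast count() and size().
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Sales': [100.0, 200.0, None, 300.0],
})

grouped = df.groupby('Category')['Sales']

# agg() with a list of function names returns one column per function.
summary = grouped.agg(['sum', 'mean', 'count'])
print(summary)

# size() counts rows per group, including the row with the missing value.
sizes = grouped.size()
print(sizes)
```

Here group A has two rows but only one non-null Sales value, so count reports 1 for A while size() reports 2.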
Key Takeaways
- Use groupby() to split data into groups based on column values.
- Always apply an aggregation function like sum() or mean() after grouping.
- You can group by one or multiple columns by passing a list to groupby().
- The result of groupby() is a GroupBy object, not a DataFrame, until aggregated.
- Common aggregation methods include sum(), mean(), count(), and agg().