
How to Use groupby in pandas for Data Grouping and Aggregation

Use groupby() in pandas to split data into groups based on column values, then apply aggregation functions like sum() or mean() on each group. It helps summarize and analyze data by categories efficiently.

Syntax

The basic syntax of groupby() is:

  • df.groupby(by): Groups the DataFrame df by the column(s) specified in by.
  • by can be a single column name, a list of column names, or a function.
  • After grouping, you can apply aggregation functions like sum(), mean(), count(), etc.
python
grouped = df.groupby('column_name')  # returns a GroupBy object, nothing is computed yet
result = grouped.sum()               # or mean(), count(), agg(...), etc.
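
The three forms of by listed above can be sketched as follows. This is a minimal illustration, not part of the original example: the Region column and all values are hypothetical sample data, and the lambda is just one possible grouping function.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Category': ['A', 'B', 'A', 'A', 'A'],
    'Sales': [100, 200, 150, 300, 50],
})

# by as a single column name
by_category = df.groupby('Category')['Sales'].sum()

# by as a list of column names: the result has a two-level index
by_region_category = df.groupby(['Region', 'Category'])['Sales'].sum()

# by as a function: it is called on each index label (0..4 here),
# so this puts even-numbered rows in group 0 and odd-numbered rows in group 1
by_parity = df.groupby(lambda i: i % 2)['Sales'].sum()

print(by_category)
print(by_region_category)
print(by_parity)
```

When grouping by a list, individual results can be looked up with a tuple key, for example by_region_category[('East', 'A')].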

Example

This example shows how to group a DataFrame by the 'Category' column and calculate the sum of 'Sales' for each group.

python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B', 'C', 'A'],
        'Sales': [100, 200, 150, 300, 250, 50]}
df = pd.DataFrame(data)

grouped = df.groupby('Category')
sales_sum = grouped['Sales'].sum()
print(sales_sum)
Output
Category
A    300
B    500
C    250
Name: Sales, dtype: int64

Common Pitfalls

Common mistakes when using groupby() include:

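  • Forgetting to select a column before aggregating. The aggregation then runs on every non-grouping column, and for non-numeric columns it can raise an error or give unwanted results, depending on the function and the pandas version.
  • Omitting the parentheses, e.g., writing sum instead of sum(). This returns the method itself rather than the aggregated values.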
  • Assuming groupby() returns a DataFrame directly; it returns a GroupBy object that needs aggregation.

Always apply an aggregation function after grouping to get meaningful results.

python
import pandas as pd

data = {'Category': ['A', 'B', 'A'], 'Sales': [100, 200, 150]}
df = pd.DataFrame(data)

# Wrong: missing aggregation function
# grouped = df.groupby('Category')
# print(grouped)  # This prints a GroupBy object, not grouped data

# Right: apply aggregation
grouped = df.groupby('Category')
sales_sum = grouped['Sales'].sum()
print(sales_sum)
Output
Category
A    250
B    200
Name: Sales, dtype: int64
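
The last two pitfalls in the list above can be checked directly. The following is a small sketch that reuses the same data; the variable names is_dataframe, method, and is_method are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A'], 'Sales': [100, 200, 150]})
grouped = df.groupby('Category')

# Pitfall: groupby() on its own does not return a DataFrame
is_dataframe = isinstance(grouped, pd.DataFrame)

# Pitfall: without parentheses you get the method itself, not a result
method = grouped['Sales'].sum
is_method = callable(method)

# Calling the method performs the aggregation and returns a Series
sales_sum = grouped['Sales'].sum()

print(is_dataframe)  # False: it is a GroupBy object
print(is_method)     # True: still waiting to be called
print(sales_sum)
```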

Quick Reference

Method        Description
groupby(by)   Group data by column(s) or a function
sum()         Sum of the values in each group
mean()        Mean of the values in each group
count()       Number of non-null values in each group
agg(func)     Apply one or more aggregation functions
size()        Number of rows in each group, including nulls
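
To illustrate agg() with several functions, and how count() and size() differ when values are missing, here is a minimal sketch. The data, including the missing Sales value, is hypothetical and chosen only to make the difference visible.

```python
import pandas as pd

# Hypothetical data with one missing Sales value in category A
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Sales': [100.0, 200.0, None, 300.0],
})
grouped = df.groupby('Category')['Sales']

# agg() with a list of function names returns one column per function
summary = grouped.agg(['sum', 'mean', 'count'])

# size() counts rows per group, including the row with the missing value
sizes = grouped.size()

print(summary)
print(sizes)
```

In this sketch, category A has two rows but only one non-null Sales value, so its count is 1 while its size is 2.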

Key Takeaways

  • Use groupby() to split data into groups based on column values.
  • Always apply an aggregation function like sum() or mean() after grouping.
  • You can group by one or multiple columns by passing a list to groupby().
  • The result of groupby() is a GroupBy object, not a DataFrame, until aggregated.
  • Common aggregation methods include sum(), mean(), count(), and agg().