
How to Use size in groupby in pandas: Simple Guide

In pandas, you can use groupby().size() to count the number of rows in each group. This returns a Series with group labels as the index and the size of each group as values.

Syntax

The basic syntax to use size with groupby is:

DataFrame.groupby(by).size()

Here:

  • DataFrame is your data table.
  • by is the column name(s) or list of columns to group by.
  • size() counts the number of rows in each group.
python
df.groupby('column_name').size()
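
Since by can also be a list of columns, a minimal sketch of grouping on two columns may help. The data and the column names 'Team' and 'Position' are made up for illustration. In this case the result is a Series with a two-level index, one entry per combination that actually occurs in the data.

```python
import pandas as pd

# Hypothetical data: 'Team' and 'Position' are illustrative column names
df = pd.DataFrame({
    'Team': ['A', 'A', 'A', 'B', 'B'],
    'Position': ['Guard', 'Guard', 'Center', 'Guard', 'Center'],
})

# Passing a list groups by every (Team, Position) combination present
pair_sizes = df.groupby(['Team', 'Position']).size()
print(pair_sizes)

# Look up one combination by its index tuple
print(pair_sizes.loc[('A', 'Guard')])
```

The sizes across all groups add up to the number of rows in the DataFrame, which is a quick sanity check.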

Example

This example shows how to group a DataFrame by a column and count the number of rows in each group using size().

python
import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
        'Points': [10, 15, 10, 20, 15, 10]}
df = pd.DataFrame(data)

# Group by 'Team' and count rows in each group
group_sizes = df.groupby('Team').size()
print(group_sizes)
Output
Team
A    2
B    3
C    1
dtype: int64

Common Pitfalls

One common mistake is confusing size() with count(). size() counts all rows including those with missing values, while count() counts only non-missing values in each column.

Also, by default size() returns a Series indexed by group label, not a DataFrame. It returns a DataFrame only when you group with as_index=False.

python
import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
        'Points': [10, None, 10, 20, None, 10]}
df = pd.DataFrame(data)

# Using size counts all rows
size_result = df.groupby('Team').size()

# Using count counts only non-NA values in 'Points'
count_result = df.groupby('Team')['Points'].count()

# Print each label and result separately so the output lines up
print('Size result:')
print(size_result)
print('\nCount result:')
print(count_result)
Output
Size result:
Team
A    2
B    3
C    1
dtype: int64

Count result:
Team
A    1
B    2
C    1
Name: Points, dtype: int64
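
Because size() gives back a Series, a common follow-up is turning it into a DataFrame. The sketch below shows one way to do that with reset_index. The column name 'n_rows' is an arbitrary choice for this example, not something pandas requires.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'Points': [10, None, 10, 20, None, 10]})

# reset_index moves the group labels into a regular column;
# name= labels the size column ('n_rows' is an illustrative name)
sizes_df = df.groupby('Team').size().reset_index(name='n_rows')
print(sizes_df)
```

The result has one row per group, with 'Team' and 'n_rows' as ordinary columns, which is often easier to merge or export than an indexed Series.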

Quick Reference

| Method | Description | Returns |
|---|---|---|
| groupby().size() | Counts total rows in each group including missing values | Series with group sizes |
| groupby().count() | Counts non-missing values per column in each group | DataFrame or Series with counts |
| groupby().agg('size') | Alternative way to get group sizes | Series with group sizes |
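
To see the three methods side by side, here is a small sketch that reuses the DataFrame from the pitfalls example. It is meant as a quick comparison of return types and values, not an exhaustive test.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'Points': [10, None, 10, 20, None, 10]})
grouped = df.groupby('Team')

sizes = grouped.size()           # Series: rows per group, missing values included
counts = grouped.count()         # DataFrame: non-missing values per column
agg_sizes = grouped.agg('size')  # same numbers as size()

print(type(sizes).__name__, type(counts).__name__)
print(sizes.tolist(), counts['Points'].tolist(), agg_sizes.tolist())
```

Calling count() on the whole grouped object returns a DataFrame with one column per non-grouping column. Selecting a single column first, as in groupby('Team')['Points'].count(), returns a Series instead.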

Key Takeaways

  • Use groupby().size() to count all rows in each group including missing values.
  • groupby().size() returns a Series with group labels as the index.
  • Do not confuse size() with count(); count() ignores missing values.
  • You can group by one or multiple columns using a list in groupby().
  • size() is a quick way to see how many items are in each group.