How to Use size in groupby in pandas: Simple Guide
In pandas, you can use `groupby().size()` to count the number of rows in each group. It returns a Series with the group labels as the index and the number of rows in each group as the values.

Syntax
The basic syntax for using `size()` with `groupby()` is:
`DataFrame.groupby(by).size()`
Here:
`DataFrame` is your data table.
`by` is the column name, or list of column names, to group by.
`size()` counts the number of rows in each group.
```python
df.groupby('column_name').size()
```

Example
This example shows how to group a DataFrame by a column and count the number of rows in each group using size().
```python
import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
        'Points': [10, 15, 10, 20, 15, 10]}
df = pd.DataFrame(data)

# Group by 'Team' and count the rows in each group
group_sizes = df.groupby('Team').size()
print(group_sizes)
```
Output
Team
A 2
B 3
C 1
dtype: int64
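Because `by` also accepts a list of columns, the same call can count rows for every combination of several columns. The sketch below is a minimal illustration that assumes a hypothetical `Season` column added to the example data; the result is a Series with a two-level index (`Team`, `Season`), and only combinations that actually occur in the data appear.

```python
import pandas as pd

# Hypothetical 'Season' column added to the example data for illustration
df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
    'Season': [1, 2, 1, 1, 2, 1],
    'Points': [10, 15, 10, 20, 15, 10],
})

# Passing a list groups by each (Team, Season) combination
pair_sizes = df.groupby(['Team', 'Season']).size()
print(pair_sizes)

# Individual counts can be read from the MultiIndex with a tuple key
print(pair_sizes.loc[('B', 1)])
```

The group sizes always add up to the number of rows in the DataFrame, as long as no grouping key is missing.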
Common Pitfalls
One common mistake is confusing size() with count(). size() counts all rows including those with missing values, while count() counts only non-missing values in each column.
Also, by default `size()` returns a Series indexed by the group labels, not a DataFrame.
```python
import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
        'Points': [10, None, 10, 20, None, 10]}
df = pd.DataFrame(data)

# size() counts every row in each group, including rows where 'Points' is missing
size_result = df.groupby('Team').size()

# count() counts only the non-missing values in 'Points'
count_result = df.groupby('Team')['Points'].count()

print('Size result:')
print(size_result)
print('\nCount result:')
print(count_result)
```
Output
Size result:
Team
A 2
B 3
C 1
dtype: int64

Count result:
Team
A 1
B 2
C 1
Name: Points, dtype: int64
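Since `size()` returns a Series rather than a DataFrame, a common follow-up is to turn the result back into a table with the group labels as a regular column. The sketch below uses `Series.reset_index` with its `name` argument; the column name `count` is an arbitrary choice, not something pandas requires.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'Points': [10, None, 10, 20, None, 10]})

sizes = df.groupby('Team').size()  # Series indexed by 'Team'

# reset_index moves the group labels into a column;
# name= sets the column name for the group sizes
sizes_df = sizes.reset_index(name='count')
print(sizes_df)
```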
Quick Reference
| Method | Description | Returns |
|---|---|---|
| groupby().size() | Counts total rows in each group including missing values | Series with group sizes |
| groupby().count() | Counts non-missing values per column in each group | DataFrame or Series with counts |
| groupby().agg('size') | Alternative way to get group sizes | Series with group sizes |
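The table lists `agg('size')` as an alternative to `size()`. One place where the string form is handy is named aggregation, which can put the total row count and the non-missing count side by side in one DataFrame. This is a minimal sketch reusing the data from the pitfall example; the output column names `rows` and `non_missing` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'Points': [10, None, 10, 20, None, 10]})

# Named aggregation: each keyword becomes an output column,
# computed from a (column, aggregation) pair
summary = df.groupby('Team').agg(
    rows=('Points', 'size'),          # all rows, missing values included
    non_missing=('Points', 'count'),  # only non-missing values
)
print(summary)

# On a single column, agg('size') gives the same counts as size()
same = df.groupby('Team')['Points'].agg('size')
print(same)
```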
Key Takeaways
Use groupby().size() to count all rows in each group including missing values.
groupby().size() returns a Series with group labels as the index.
Do not confuse size() with count(); count() ignores missing values.
You can group by a single column, or by several columns by passing a list to groupby().
size() is a quick way to see how many items are in each group.