
size() vs count() in pandas groupby: Key Differences and Usage

In pandas, groupby.size() returns the number of rows in each group, counting rows that contain missing values, while groupby.count() returns the number of non-missing values in each column of each group. Use size() to get group sizes regardless of missing data, and count() to count valid entries per column.

⚖️

Quick Comparison

Here is a quick side-by-side comparison of size() and count() in pandas groupby:

Aspect	groupby.size()	groupby.count()
What it counts	All rows in the group	Non-NA values in each column
Missing values (NaN)	Rows containing NaN are counted	NaN values are skipped
Returns	Series of group sizes	DataFrame of counts per column
Use case	Total group size	Count of valid entries per column
Output shape	One value per group	One value per group per column

⚖️

Key Differences

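groupby.size() counts the total number of rows in each group, including rows where one or more value columns are missing (NaN). By default it returns a Series indexed by the group keys, with each group's row count as the value. Rows whose group key is itself NaN are left out by default, for both size() and count().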

On the other hand, groupby.count() counts the number of non-missing values for each column within each group. It returns a DataFrame where each column shows counts of valid (non-NaN) entries per group.

This means size() gives a simple total count of rows per group, while count() provides detailed counts per column, ignoring missing data. Use size() when you want the total group size regardless of missing data, and count() when you want to know how many valid entries each column has in each group.
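One caveat to "including missing values" concerns the group key itself. The sketch below uses a small made-up DataFrame (not the article's sample data) to show that a row with a missing key is dropped by default, and that passing dropna=False to groupby keeps it. The dropna parameter is assumed to be available, which requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the third row has a missing group key (Team is NaN).
df = pd.DataFrame({
    "Team": ["A", "A", np.nan, "B"],
    "Points": [10, np.nan, 7, 3],
})

# Default behaviour: the row with the NaN key is left out of every group.
default_sizes = df.groupby("Team").size()
print(default_sizes)

# dropna=False (pandas >= 1.1) keeps NaN as its own group key.
all_sizes = df.groupby("Team", dropna=False).size()
print(all_sizes)

# count() still skips the missing Points value inside group A.
points_counts = df.groupby("Team")["Points"].count()
print(points_counts)
```

Here the default sizes add up to 3 rows, while the dropna=False sizes add up to all 4, so size() only reflects every row when no group key is missing or when NaN keys are kept explicitly.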

⚖️

Code Comparison

Example using groupby.size() to count total rows per group:

python

import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
        'Points': [10, None, 15, 20, None, 5],
        'Assists': [5, 7, None, 10, 8, None]}
df = pd.DataFrame(data)

size_result = df.groupby('Team').size()
print(size_result)

Output

Team
A    2
B    3
C    1
dtype: int64

↔️

Count Equivalent

Example using groupby.count() to count non-missing values per column per group:

python

# Reuses df from the size() example above.
count_result = df.groupby('Team').count()
print(count_result)

Output

      Points  Assists
Team
A          1        2
B          2        2
C          1        0
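To view row totals and non-missing counts in one table, one option is pandas named aggregation. This is a minimal sketch that rebuilds the same sample data so it runs on its own; the output column names (rows, points_present, assists_present) are made up for illustration.

```python
import pandas as pd

# Same sample data as in the examples above.
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B", "C"],
    "Points": [10, None, 15, 20, None, 5],
    "Assists": [5, 7, None, 10, 8, None],
})

# Named aggregation: each keyword is an output column,
# given as (input column, aggregation name).
summary = df.groupby("Team").agg(
    rows=("Points", "size"),              # all rows, NaN included
    points_present=("Points", "count"),   # non-missing Points only
    assists_present=("Assists", "count"), # non-missing Assists only
)
print(summary)
```

For this data, rows is 2, 3, 1 for teams A, B, C, matching size(), while the two "present" columns match the count() output.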

🎯

When to Use Which

Choose groupby.size() when you want the total number of rows in each group, including those with missing values. This is useful for understanding group sizes or frequencies.

Choose groupby.count() when you need to know how many valid (non-missing) entries exist per column in each group. This helps when analyzing data completeness or valid data points per feature.
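For a data completeness check, the two methods can be used together: dividing count() by size() gives the share of non-missing values per column in each group. The sketch below rebuilds the article's sample data; the variable name completeness is only illustrative.

```python
import pandas as pd

# Same sample data as in the examples above.
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B", "C"],
    "Points": [10, None, 15, 20, None, 5],
    "Assists": [5, 7, None, 10, 8, None],
})

grouped = df.groupby("Team")
sizes = grouped.size()    # Series: total rows per team
counts = grouped.count()  # DataFrame: non-missing values per column

# Divide every column of counts by the group sizes, aligned on the Team index.
completeness = counts.div(sizes, axis=0)
print(completeness)
```

A value of 1.0 means the column has no missing entries for that team, and 0.0 means every entry is missing (as with Assists for team C here).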

✅

Key Takeaways

groupby.size() counts all rows per group, including rows that contain missing values.

groupby.count() counts non-missing values per column per group.

size() returns a Series; count() returns a DataFrame.

Use size() for total group size, count() for valid data counts.

The difference between the two, per group and column, is the number of missing values.