size() vs count() in pandas groupby: Key Differences and Usage
groupby.size() returns the total number of rows in each group, including rows with missing values, while groupby.count() counts the non-missing values in each column of the group. Use size() to get group sizes regardless of missing data, and count() to count valid entries per column.
Quick Comparison
Here is a quick side-by-side comparison of size() and count() in pandas groupby:
| Aspect | groupby.size() | groupby.count() |
|---|---|---|
| What is counted | All rows in the group | Non-NA values in each column |
| Handles missing values | Includes missing (NaN) rows | Excludes missing (NaN) values |
| Returns | Series with group sizes | DataFrame with counts per column |
| Use case | Total group size | Count of valid entries per column |
| Output shape | One value per group | One value per group per column |
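One caveat the table does not cover is missing values in the grouping column itself. As far as I know, groupby drops rows whose key is NaN by default, so even size() will not see them unless dropna=False is passed (this parameter assumes pandas 1.1 or later). A minimal sketch, using a hypothetical two-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has a missing group key.
df = pd.DataFrame({
    "key": ["a", "a", np.nan],
    "value": [1.0, np.nan, 3.0],
})

# Default behaviour: the row with the NaN key belongs to no group.
default_sizes = df.groupby("key").size()

# dropna=False (assumes pandas >= 1.1) keeps NaN as its own group.
all_sizes = df.groupby("key", dropna=False).size()

print(default_sizes.sum())  # 2
print(all_sizes.sum())      # 3
```

So "all rows" in the table means all rows that were assigned to a group.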
Key Differences
groupby.size() counts the total number of rows in each group, including rows where some columns may have missing values (NaN). It returns a Series indexed by the group keys with the size as the value.
On the other hand, groupby.count() counts the number of non-missing values for each column within each group. It returns a DataFrame where each column shows counts of valid (non-NaN) entries per group.
This means size() gives a simple total count of rows per group, while count() provides detailed counts per column, ignoring missing data. Use size() when you want the total group size regardless of missing data, and count() when you want to know how many valid entries each column has in each group.
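The difference in return type and the relationship between the two results can be checked directly. The sketch below uses a hypothetical toy frame (the column names are illustrative, not from the example later in this article) and confirms that a column's count never exceeds its group's size:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: group "x" has a NaN in "score",
# group "y" has a missing "label".
df = pd.DataFrame({
    "group": ["x", "x", "y"],
    "score": [1.0, np.nan, 3.0],
    "label": ["a", "b", None],
})

grouped = df.groupby("group")
sizes = grouped.size()    # Series: rows per group, missing or not
counts = grouped.count()  # DataFrame: non-NA values per column

print(type(sizes).__name__)   # Series
print(type(counts).__name__)  # DataFrame

# Compare every column of counts against the group sizes, row by row.
print(counts.le(sizes, axis=0).all().all())  # True
```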
Code Comparison
Example using groupby.size() to count total rows per group:
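```python
import pandas as pd

# Sample data with missing values (None becomes NaN) in both numeric columns
data = {
    'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
    'Points': [10, None, 15, 20, None, 5],
    'Assists': [5, 7, None, 10, 8, None],
}
df = pd.DataFrame(data)

# Total rows per team, regardless of missing values
size_result = df.groupby('Team').size()
print(size_result)
```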
Count Equivalent
Example using groupby.count() to count non-missing values per column per group:
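```python
# Reuses df from the size() example above.
# Non-missing values per column for each team
count_result = df.groupby('Team').count()
print(count_result)
```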
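To compare the two results for this sample data at a glance, they can be placed in one table. This is only a sketch: the column name 'size' and the variable names are my own choices, and the block rebuilds the sample frame so it runs on its own.

```python
import pandas as pd

# Same sample data as in the examples above.
data = {
    'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
    'Points': [10, None, 15, 20, None, 5],
    'Assists': [5, 7, None, 10, 8, None],
}
df = pd.DataFrame(data)

grouped = df.groupby('Team')
summary = grouped.count()

# Put the group sizes in front of the per-column counts
# ('size' is a hypothetical column name chosen for this sketch).
summary.insert(0, 'size', grouped.size())
print(summary)
# size:    A=2, B=3, C=1
# Points:  A=1, B=2, C=1
# Assists: A=2, B=2, C=0
```

Team C shows the difference most clearly: it has one row, so its size is 1, but its only Assists value is missing, so its Assists count is 0.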
When to Use Which
Choose groupby.size() when you want the total number of rows in each group, including those with missing values. This is useful for understanding group sizes or frequencies.
Choose groupby.count() when you need to know how many valid (non-missing) entries exist per column in each group. This helps when analyzing data completeness or valid data points per feature.
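The two methods also combine naturally for a completeness check: dividing each column's count by the group size gives the fraction of non-missing values. A minimal sketch on the sample data from this article (the variable name completeness is my own):

```python
import pandas as pd

# Same sample data as in the examples above.
df = pd.DataFrame({
    'Team': ['A', 'A', 'B', 'B', 'B', 'C'],
    'Points': [10, None, 15, 20, None, 5],
    'Assists': [5, 7, None, 10, 8, None],
})

grouped = df.groupby('Team')

# Fraction of non-missing values per column, per team:
# count() is divided row-wise by the matching group size.
completeness = grouped.count().div(grouped.size(), axis=0)
print(completeness)
```

For team A this gives 0.5 for Points (one of two values present) and 1.0 for Assists.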
Key Takeaways
- groupby.size() counts all rows per group, including rows with missing values.
- groupby.count() counts non-missing values per column per group.
- size() returns a Series; count() returns a DataFrame.
- Use size() for total group size and count() for valid data counts.