How to Find Frequency of Values in pandas DataFrame or Series
Use the `value_counts()` method on a pandas Series or DataFrame column to find the frequency of each unique value. It returns a Series with the unique values as the index and their counts as the data.
Syntax
The basic syntax for finding the frequency of values in pandas is:
`Series.value_counts(normalize=False, sort=True, ascending=False, dropna=True)`
Use `DataFrame['column_name'].value_counts()` to get the frequencies for a specific column.
Parameters explained:
- `normalize`: If True, return relative frequencies (proportions) instead of counts.
- `sort`: Sort by frequency (default True).
- `ascending`: Sort in ascending order if True (default False).
- `dropna`: Exclude NaN values if True (default).
```python
# Syntax for a Series
series.value_counts(normalize=False, sort=True, ascending=False, dropna=True)

# Example for a DataFrame column
DataFrame['column_name'].value_counts()
```
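The `sort` and `ascending` parameters control the ordering of the result; a quick sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# Default: sorted by count, most frequent first
print(s.value_counts())
# 'a' appears 3 times, 'b' twice, 'c' once

# Flip the order with ascending=True
print(s.value_counts(ascending=True))
```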
Example
This example shows how to find the frequency of values in a pandas Series and a DataFrame column.
```python
import pandas as pd

# Create a sample DataFrame
data = {'fruits': ['apple', 'banana', 'apple', 'orange',
                   'banana', 'banana', 'apple', None]}
df = pd.DataFrame(data)

# Frequency of values in the 'fruits' column
freq = df['fruits'].value_counts()

# Frequency including NaN values
freq_with_nan = df['fruits'].value_counts(dropna=False)

# Relative frequency
rel_freq = df['fruits'].value_counts(normalize=True)

print('Frequency of fruits:')
print(freq)
print('\nFrequency including NaN:')
print(freq_with_nan)
print('\nRelative frequency:')
print(rel_freq)
```
Output
Frequency of fruits:
apple 3
banana 3
orange 1
Name: fruits, dtype: int64
Frequency including NaN:
apple 3
banana 3
orange 1
NaN 1
Name: fruits, dtype: int64
Relative frequency:
apple 0.428571
banana 0.428571
orange 0.142857
Name: fruits, dtype: float64
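Since `value_counts()` returns a Series, you may want a two-column DataFrame instead; a minimal sketch using `reset_index()` (the resulting column names vary by pandas version, so they are not assumed here):

```python
import pandas as pd

df = pd.DataFrame({'fruits': ['apple', 'banana', 'apple', 'orange',
                              'banana', 'banana', 'apple']})

# value_counts() yields a Series; reset_index() turns it into a
# DataFrame with one column of values and one column of counts
freq_df = df['fruits'].value_counts().reset_index()
print(freq_df)
```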
Common Pitfalls
Common mistakes when finding the frequency of values in pandas include:

- Forgetting to select a specific column from a DataFrame before calling `value_counts()`.
- Not handling NaN values, which are excluded by default.
- Expecting a DataFrame output instead of a Series.

Example of a wrong approach and the correct way:
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3]})

# Wrong for per-column counts: since pandas 1.1, DataFrame.value_counts()
# counts unique rows rather than the values in one column (older versions
# raise AttributeError because the method did not exist on DataFrame).

# Correct: call value_counts() on the column
counts = df['A'].value_counts()
print(counts)
```
Output
2    2
1    1
3    1
Name: A, dtype: int64
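For context, a sketch of what `DataFrame.value_counts()` itself does in pandas 1.1 and later: it counts unique row combinations across all columns, which is why it is not a substitute for per-column counts (the data here is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3],
                   'B': ['x', 'x', 'x', 'y']})

# Counts unique (A, B) row combinations; the result is a Series
# with a MultiIndex built from the row values
print(df.value_counts())

# Per-column frequencies still require selecting the column first
print(df['A'].value_counts())
```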
Quick Reference
| Method | Description | Example |
|---|---|---|
| value_counts() | Counts unique values in a Series or DataFrame column | df['col'].value_counts() |
| value_counts(normalize=True) | Returns relative frequencies (proportions) | df['col'].value_counts(normalize=True) |
| value_counts(dropna=False) | Includes NaN values in counts | df['col'].value_counts(dropna=False) |
| sort_values() | Sorts the result if needed | df['col'].value_counts().sort_values(ascending=True) |
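These options can be combined; for example, relative frequencies expressed as percentages and sorted ascending (a small illustrative snippet):

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'a'])

# normalize=True gives proportions; multiply by 100 for percentages,
# then sort ascending with sort_values()
pct = (s.value_counts(normalize=True) * 100).sort_values(ascending=True)
print(pct)
# b    25.0
# a    75.0
```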
Key Takeaways
- Use `value_counts()` on a pandas Series or DataFrame column to get frequency counts.
- By default, `value_counts()` excludes NaN values; use `dropna=False` to include them.
- Set `normalize=True` to get relative frequencies instead of counts.
- For per-column frequencies, call `value_counts()` on a single column (a Series); `DataFrame.value_counts()` counts unique rows instead.
- You can sort the frequency results with `sort_values()` if needed.