Pandas · How-To · Beginner · 3 min read

How to Find Frequency of Values in pandas DataFrame or Series

Use the value_counts() method on a pandas Series or DataFrame column to find the frequency of each unique value. It returns a Series with the unique values as the index and their counts as the data, sorted in descending order of frequency by default.

Syntax

The basic syntax for finding the frequency of values in pandas is:

  • Series.value_counts(normalize=False, sort=True, ascending=False, dropna=True)
  • DataFrame['column_name'].value_counts() to get frequencies for a specific column.

Parameters explained:

  • normalize: If True, returns relative frequencies instead of counts.
  • sort: Sort by counts (default True).
  • ascending: Sort ascending if True.
  • dropna: Exclude NaN values if True (default).
```python
series.value_counts(normalize=False, sort=True, ascending=False, dropna=True)

# Example for a DataFrame column
DataFrame['column_name'].value_counts()
```
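As a quick sketch of how the sort-related parameters behave (the sample values here are illustrative):

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# Default: counts sorted from most to least frequent
print(s.value_counts())

# ascending=True flips the sort order (least frequent first)
print(s.value_counts(ascending=True))

# sort=False leaves the counts unsorted
print(s.value_counts(sort=False))
```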

Example

This example shows how to find the frequency of values in a pandas Series and a DataFrame column.

```python
import pandas as pd

# Create a sample DataFrame
data = {'fruits': ['apple', 'banana', 'apple', 'orange', 'banana', 'banana', 'apple', None]}
df = pd.DataFrame(data)

# Frequency of values in the 'fruits' column
freq = df['fruits'].value_counts()

# Frequency including NaN values
freq_with_nan = df['fruits'].value_counts(dropna=False)

# Relative frequency
rel_freq = df['fruits'].value_counts(normalize=True)

print('Frequency of fruits:')
print(freq)
print('\nFrequency including NaN:')
print(freq_with_nan)
print('\nRelative frequency:')
print(rel_freq)
```

Output

```
Frequency of fruits:
apple     3
banana    3
orange    1
Name: fruits, dtype: int64

Frequency including NaN:
apple     3
banana    3
orange    1
NaN       1
Name: fruits, dtype: int64

Relative frequency:
apple     0.428571
banana    0.428571
orange    0.142857
Name: fruits, dtype: float64
```

Common Pitfalls

Common mistakes when finding frequency of values in pandas include:

  • Forgetting to select a specific column from a DataFrame before calling value_counts().
  • Not handling NaN values, which are excluded by default.
  • Expecting a DataFrame output instead of a Series.

Example of a wrong approach and the correct way:

```python
# Pitfall: calling value_counts() directly on the DataFrame
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3]})

# Since pandas 1.1, DataFrame.value_counts() counts unique *rows*, not the
# values within each column; in older versions it raises AttributeError.
print(df.value_counts())

# Correct: select the column first
counts = df['A'].value_counts()
print(counts)
```

Output

```
A
2    2
1    1
3    1
dtype: int64
2    2
1    1
3    1
Name: A, dtype: int64
```

Quick Reference

| Method | Description | Example |
| --- | --- | --- |
| value_counts() | Counts unique values in a Series or DataFrame column | df['col'].value_counts() |
| value_counts(normalize=True) | Returns relative frequencies (proportions) | df['col'].value_counts(normalize=True) |
| value_counts(dropna=False) | Includes NaN values in the counts | df['col'].value_counts(dropna=False) |
| sort_values() | Sorts the result if needed | df['col'].value_counts().sort_values(ascending=True) |
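The quick-reference calls above can be tried together on a small Series (the sample data is illustrative):

```python
import pandas as pd

s = pd.Series(['x', 'y', 'x', 'x', None])

print(s.value_counts())                              # counts, NaN excluded
print(s.value_counts(normalize=True))                # proportions of non-null values
print(s.value_counts(dropna=False))                  # include NaN in the counts
print(s.value_counts().sort_values(ascending=True))  # least frequent first
```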

Key Takeaways

  • Use value_counts() on a pandas Series or DataFrame column to get frequency counts.
  • By default, value_counts() excludes NaN values; use dropna=False to include them.
  • Set normalize=True to get relative frequencies instead of counts.
  • Call value_counts() on a single column (a Series); on a whole DataFrame it counts unique rows instead.
  • You can re-sort the frequency results with sort_values() if needed.