
How to Use crosstab in pandas for Data Analysis

Use pandas.crosstab() to compute a cross-tabulation of two or more factors. By default it counts how often each combination of values occurs in the input arrays or columns, which helps reveal relationships between categorical variables.

Syntax

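The basic syntax of pandas.crosstab() is:

python
pandas.crosstab(index, columns, values=None, aggfunc=None, normalize=False)

  • index: values to group by in rows.
  • columns: values to group by in columns.
  • values: optional array of values to aggregate; requires aggfunc.
  • aggfunc: optional function used to aggregate values; requires values. When both are omitted, crosstab counts occurrences.
  • normalize: False (default) leaves the results as they are; True or 'all' divides by the grand total, 'index' by each row total, and 'columns' by each column total.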

Example

This example shows how to use crosstab to count occurrences of two categorical columns in a DataFrame.

python
import pandas as pd

data = {'Gender': ['Male', 'Female', 'Female', 'Male', 'Female', 'Male'],
        'Preference': ['Tea', 'Coffee', 'Tea', 'Coffee', 'Tea', 'Tea']}
df = pd.DataFrame(data)

result = pd.crosstab(df['Gender'], df['Preference'])
print(result)
Output
Preference  Coffee  Tea
Gender
Female           1    2
Male             1    2
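
To read the same table as shares rather than counts, the normalize parameter can be passed. The sketch below reuses the example data and assumes row-wise shares are wanted (normalize='index'); the variable name row_share is only illustrative.

```python
import pandas as pd

data = {'Gender': ['Male', 'Female', 'Female', 'Male', 'Female', 'Male'],
        'Preference': ['Tea', 'Coffee', 'Tea', 'Coffee', 'Tea', 'Tea']}
df = pd.DataFrame(data)

# normalize='index' divides each row by its row total,
# so the values in every row sum to 1
row_share = pd.crosstab(df['Gender'], df['Preference'], normalize='index')
print(row_share.round(2))
```

Passing normalize='columns' would instead make each column sum to 1, and normalize=True divides every cell by the grand total.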

Common Pitfalls

Common mistakes when using crosstab include:

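  • Passing a continuous numeric column creates one row or column per distinct value, which makes the table wide and sparse. Bin such data first (for example with pd.cut) or convert it to categories.
  • Passing values without aggfunc (or aggfunc without values) raises a ValueError.
  • Reading raw counts as if they were proportions. Set normalize when you need shares rather than counts.

Check the dtypes of your inputs and the parameters you pass before interpreting the table.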

python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Wrong: using values without aggfunc raises
# ValueError: values cannot be used without an aggfunc.
# pd.crosstab(df['Category'], df['Value'], values=df['Value'])

# Right: specify aggfunc to aggregate values
result = pd.crosstab(df['Category'], df['Value'], values=df['Value'], aggfunc='sum')
print(result)
Output
Value       10    20    30    40
Category
A         10.0   NaN  30.0   NaN
B          NaN  20.0   NaN  40.0
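
The first pitfall above, tabulating a continuous numeric column directly, can be avoided by binning the column first. The following is a minimal sketch: the Age and Plan columns, the bin edges, and the labels are made-up illustration data, not part of the original example.

```python
import pandas as pd

# Hypothetical data: a continuous Age column and a categorical Plan column
df = pd.DataFrame({
    'Age': [22, 35, 47, 51, 29, 63],
    'Plan': ['Basic', 'Pro', 'Pro', 'Basic', 'Basic', 'Pro'],
})

# Bin ages into left-closed intervals [0, 30), [30, 50), [50, 120)
age_group = pd.cut(df['Age'], bins=[0, 30, 50, 120], right=False,
                   labels=['<30', '30-49', '50+'])

# One row per age group instead of one row per distinct age
table = pd.crosstab(age_group, df['Plan'])
print(table)
```

Without the pd.cut step, the same call would produce six rows, one for each distinct age.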

Quick Reference

Summary tips for using pandas.crosstab():

  • Use index and columns to specify grouping variables.
  • Use values and aggfunc to aggregate numeric data.
  • Set normalize=True for proportions of the grand total, or normalize='index' / normalize='columns' for row or column proportions.
  • Works well for categorical data analysis and frequency tables.
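
As an illustration of the values and aggfunc tip, here is a small sketch that sums a numeric column for each pair of categories. The Region, Product, and Sales columns are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West', 'East'],
    'Product': ['A', 'B', 'A', 'A', 'B'],
    'Sales': [100, 150, 200, 50, 25],
})

# Sum Sales for every Region/Product combination;
# combinations that never occur (West, B) come back as NaN
table = pd.crosstab(df['Region'], df['Product'],
                    values=df['Sales'], aggfunc='sum')
print(table)
```

Swapping aggfunc='sum' for aggfunc='mean' would report the average sale per combination instead of the total.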

Key Takeaways

Use pandas.crosstab() to count or aggregate combinations of categorical variables.
Specify aggfunc when aggregating values to avoid errors.
The normalize parameter converts counts to proportions.
Check data types to ensure meaningful crosstab results.