
crosstab() for cross-tabulation in Pandas

Introduction

We use crosstab() to build a frequency table: it counts how often each combination of values from two or more grouping columns appears together. This makes relationships between groups easy to see.

To find how many customers bought different product categories by gender.
To check how many students passed or failed in different classes.
To compare survey answers between age groups and regions.
To see the relationship between job roles and education levels.
To analyze how different marketing campaigns performed across cities.
Syntax
Pandas
import pandas as pd

# Basic crosstab syntax
df_crosstab = pd.crosstab(index=data['column1'], columns=data['column2'], margins=False, normalize=False)

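index supplies the values that become the rows; columns supplies the values that become the columns.

margins=True adds totals for rows and columns, labelled All by default.

normalize=True converts the counts to proportions of the grand total.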
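As a rough illustration of the margins and normalize parameters from the syntax line above, the sketch below uses a small made-up dataset (the column names and values are assumptions for demonstration only). It also uses normalize='index', a variant that scales each row separately.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
data = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Product': ['A', 'B', 'A', 'B']
})

# Raw counts plus an 'All' row and column holding the totals
counts = pd.crosstab(index=data['Gender'], columns=data['Product'], margins=True)
print(counts)

# normalize='index' rescales each row so that its values sum to 1
row_shares = pd.crosstab(index=data['Gender'], columns=data['Product'], normalize='index')
print(row_shares)
```

Row-wise shares are often easier to compare than raw counts when the groups differ in size.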

Examples
Shows counts of each product bought by gender.
Pandas
import pandas as pd

# Example data
data = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Product': ['A', 'B', 'A', 'B']
})

# Crosstab of Gender vs Product
result = pd.crosstab(index=data['Gender'], columns=data['Product'])
print(result)
Shows that crosstab returns an empty DataFrame if input data is empty.
Pandas
import pandas as pd

# Empty data example
data_empty = pd.DataFrame({'A': [], 'B': []})

result_empty = pd.crosstab(index=data_empty['A'], columns=data_empty['B'])
print(result_empty)
Shows crosstab with only one row and one column.
Pandas
import pandas as pd

# Single element data
data_single = pd.DataFrame({'A': ['X'], 'B': ['Y']})

result_single = pd.crosstab(index=data_single['A'], columns=data_single['B'])
print(result_single)
Adds row and column totals to the crosstab.
Pandas
import pandas as pd

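# Same data as the first example, so this snippet runs on its own
data = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Male'], 'Product': ['A', 'B', 'A', 'B']})

# margins=True adds an 'All' row and column holding the totals
result_margins = pd.crosstab(index=data['Gender'], columns=data['Product'], margins=True)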
print(result_margins)
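crosstab() can also relate more than two groups, for instance survey answers by region and age group. A minimal sketch, assuming an invented survey table (all names and values here are hypothetical): passing a list of Series as index produces nested row labels.

```python
import pandas as pd

# Hypothetical survey data, invented for this sketch
survey = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'South'],
    'AgeGroup': ['18-30', '31-50', '18-30', '18-30', '31-50'],
    'Answer': ['Yes', 'No', 'Yes', 'No', 'Yes']
})

# A list of Series for index gives one row per (Region, AgeGroup) pair
nested = pd.crosstab(
    index=[survey['Region'], survey['AgeGroup']],
    columns=survey['Answer']
)
print(nested)
```

Combinations that never occur for a given row, such as a 'No' from North respondents aged 18-30 here, appear as 0 rather than being dropped.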
Sample Program

This program shows how many products each gender bought and includes totals for rows and columns.

Pandas
import pandas as pd

# Create sample data
sales_data = pd.DataFrame({
    'Customer': ['Alice', 'Bob', 'Charlie', 'Diana', 'Evan', 'Fiona'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'Product': ['Book', 'Pen', 'Book', 'Pen', 'Notebook', 'Book']
})

print('Original Data:')
print(sales_data)

# Create crosstab to count products bought by gender
product_gender_crosstab = pd.crosstab(index=sales_data['Gender'], columns=sales_data['Product'], margins=True)

print('\nCrosstab of Product by Gender with Totals:')
print(product_gender_crosstab)
Important Notes

Time complexity: roughly O(n), where n is the number of rows in the data, since each row contributes one count.

Space complexity: The result has one cell for each combination of a unique index value and a unique column value, so its size grows with the product of those two counts.
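The link between unique values and result size can be checked directly. The sketch below reuses the Gender and Product columns from the sample program and compares the table's shape with the number of unique values in each column.

```python
import pandas as pd

# Gender and Product columns as in the sample program
sales_data = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'Product': ['Book', 'Pen', 'Book', 'Pen', 'Notebook', 'Book']
})

table = pd.crosstab(index=sales_data['Gender'], columns=sales_data['Product'])

# Rows = unique genders, columns = unique products
expected_shape = (sales_data['Gender'].nunique(), sales_data['Product'].nunique())
print(table.shape, expected_shape)
```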

Common mistake: Passing the wrong columns to index and columns, or swapping them. Swapping the two transposes the table, so rows and columns trade places and the output is easy to misread.
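One way this mistake shows up is swapping the two arguments, which yields the transposed table. A minimal sketch with illustrative data (the values are assumptions for demonstration):

```python
import pandas as pd

# Illustrative data only
sales_data = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'Product': ['Book', 'Pen', 'Book', 'Pen', 'Notebook', 'Book']
})

# Genders as rows, products as columns
by_gender = pd.crosstab(index=sales_data['Gender'], columns=sales_data['Product'])

# Arguments swapped: products as rows, genders as columns
by_product = pd.crosstab(index=sales_data['Product'], columns=sales_data['Gender'])

# The swapped table holds the same counts, transposed
same_counts = (by_gender.T.to_numpy() == by_product.to_numpy()).all()
print(by_gender.shape, by_product.shape, same_counts)
```

Checking the shape and the row labels of the result is a quick way to confirm that the intended column ended up on each axis.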

Use crosstab() when you want to count occurrences between categories. For more complex aggregation, consider pivot_table().
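To make the contrast concrete, the sketch below counts purchases with crosstab() and sums a numeric column with pivot_table(). The orders table and its Amount column are hypothetical, added only to give pivot_table() something to aggregate.

```python
import pandas as pd

# Hypothetical orders; the Amount column is an assumption for this sketch
orders = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Female'],
    'Product': ['Book', 'Pen', 'Book', 'Book'],
    'Amount': [12.0, 3.0, 10.0, 8.0]
})

# crosstab: how many purchases fall into each Gender/Product pair
counts = pd.crosstab(index=orders['Gender'], columns=orders['Product'])
print(counts)

# pivot_table: total Amount per pair; missing pairs are filled with 0
totals = orders.pivot_table(
    index='Gender', columns='Product', values='Amount',
    aggfunc='sum', fill_value=0
)
print(totals)
```

Note that crosstab() takes the column data itself (Series), while pivot_table() is called on the DataFrame and takes column names.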

Summary

crosstab() counts how often values from two or more columns appear together.

It helps compare groups easily with simple tables.

You can add row and column totals with margins=True.