
Cross-tabulation with crosstab() in Data Analysis Python

Introduction

Cross-tabulation shows how two or more categorical variables relate by counting how often each combination of their values occurs. The result is a simple table with one variable's categories as rows and the other's as columns.

Typical situations where it is useful:

You want to compare how many people in different age groups prefer different types of fruits.
You want to check the relationship between gender and favorite sport in a survey.
You want to see how many customers from different cities bought different product categories.
You want to analyze how different education levels relate to job types in a dataset.

Syntax
Data Analysis Python
import pandas as pd

# Create a cross-tabulation table
pd.crosstab(index=data['Column1'], columns=data['Column2'], margins=False, normalize=False)

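index is the row variable (its categories become the rows).

columns is the column variable (its categories become the columns).

margins adds row and column totals when set to True (default False).

normalize converts counts to proportions when set to True, 'index', or 'columns' (default False).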
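
The margins and normalize arguments from the syntax line can be tried on a small table. The sketch below uses made-up data; the column names Column1 and Column2 simply mirror the placeholders in the syntax line.

```python
import pandas as pd

# Hypothetical data, invented for this illustration
data = pd.DataFrame({
    'Column1': ['A', 'A', 'B', 'B', 'B'],
    'Column2': ['X', 'Y', 'X', 'X', 'Y'],
})

# margins=True appends an "All" row and an "All" column holding the totals
with_totals = pd.crosstab(index=data['Column1'], columns=data['Column2'], margins=True)
print(with_totals)

# normalize='index' turns each row into proportions that sum to 1
row_shares = pd.crosstab(index=data['Column1'], columns=data['Column2'], normalize='index')
print(row_shares)
```

With normalize='columns' each column sums to 1 instead, and normalize=True divides every cell by the grand total.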

Examples
Basic example showing counts of color and shape combinations.
Data Analysis Python
import pandas as pd

# Example data
colors = ['Red', 'Blue', 'Red', 'Green', 'Blue']
shapes = ['Circle', 'Square', 'Square', 'Circle', 'Circle']

# Cross-tabulation
pd.crosstab(index=colors, columns=shapes)
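
When plain lists are passed, the axes have no meaningful names. A minimal sketch of the same example, assuming you want labeled axes: the rownames and colnames arguments set them, and the labels 'Color' and 'Shape' are chosen here only for illustration.

```python
import pandas as pd

colors = ['Red', 'Blue', 'Red', 'Green', 'Blue']
shapes = ['Circle', 'Square', 'Square', 'Circle', 'Circle']

# rownames/colnames label the two axes of the resulting table
table = pd.crosstab(index=colors, columns=shapes, rownames=['Color'], colnames=['Shape'])
print(table)

# Look up one cell: Green never appears together with Square
print(table.loc['Green', 'Square'])  # 0
```

Combinations that never occur are filled with 0 rather than left missing.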
Edge case: empty data returns an empty DataFrame.
Data Analysis Python
import pandas as pd

# Empty data
colors = []
shapes = []

# Cross-tabulation on empty lists
pd.crosstab(index=colors, columns=shapes)
Edge case: single element returns a 1x1 table with count 1.
Data Analysis Python
import pandas as pd

# Single element data
colors = ['Red']
shapes = ['Circle']

# Cross-tabulation with one element
pd.crosstab(index=colors, columns=shapes)
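Edge case: when the column variable has only one category, all counts fall in a single column. Here the row variable also has one category, so the result is a 1x1 table with count 3.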
Data Analysis Python
import pandas as pd

# Data with one category in columns
colors = ['Red', 'Red', 'Red']
shapes = ['Circle', 'Circle', 'Circle']

# Cross-tabulation
pd.crosstab(index=colors, columns=shapes)
Sample Program

This program shows how many males and females prefer each fruit. It prints the original data and then the cross-tabulation table.

Data Analysis Python
import pandas as pd

# Sample data: survey of favorite fruit by gender
survey_data = {
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Male'],
    'FavoriteFruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple', 'Apple', 'Banana']
}

# Create DataFrame
survey_df = pd.DataFrame(survey_data)

print('Original DataFrame:')
print(survey_df)

# Create cross-tabulation table
fruit_gender_crosstab = pd.crosstab(index=survey_df['Gender'], columns=survey_df['FavoriteFruit'])

print('\nCross-tabulation of Gender and Favorite Fruit:')
print(fruit_gender_crosstab)
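
Tallying the seven survey rows by hand gives Female: 2 Apple and 1 Banana, Male: 2 Apple and 2 Banana. The sketch below rebuilds the same data and compares the cross-tabulation against those hand-counted values; the expected variable is introduced only for this check.

```python
import pandas as pd

survey_df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Male'],
    'FavoriteFruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple', 'Apple', 'Banana'],
})

fruit_gender_crosstab = pd.crosstab(index=survey_df['Gender'], columns=survey_df['FavoriteFruit'])

# Counts tallied by hand; rows are Female, Male and columns are Apple, Banana
expected = [[2, 1],
            [2, 2]]

# Compare the cell values only (axis labels are sorted alphabetically by crosstab)
print(fruit_gender_crosstab.values.tolist() == expected)  # True
```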
Important Notes

Time complexity is roughly O(n), where n is the number of rows in the data, because each row is counted once.

Space for the result depends on the number of unique categories: the table has one cell for each combination of an index category and a column category.
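
To see that the size of the result follows the number of categories rather than the number of rows, here is a small sketch with invented data (the City and Product columns are hypothetical).

```python
import pandas as pd

# 600 rows, but only 3 unique cities and 2 unique products
df = pd.DataFrame({
    'City': ['Paris', 'Rome', 'Oslo'] * 200,
    'Product': ['Books', 'Toys'] * 300,
})

table = pd.crosstab(index=df['City'], columns=df['Product'])

print(len(df))       # 600
print(table.shape)   # (3, 2)
```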

Common mistake: mixing up which column goes to index and which goes to columns. Swapping them transposes the table, and passing the wrong column produces a table that is hard to interpret.
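
A quick way to see the effect of swapping the two arguments, sketched with the survey data from the sample program (the names by_gender and by_fruit are introduced here for illustration):

```python
import pandas as pd

survey_df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Male'],
    'FavoriteFruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple', 'Apple', 'Banana'],
})

# Gender as rows, fruit as columns
by_gender = pd.crosstab(index=survey_df['Gender'], columns=survey_df['FavoriteFruit'])

# Arguments swapped: fruit as rows, gender as columns
by_fruit = pd.crosstab(index=survey_df['FavoriteFruit'], columns=survey_df['Gender'])

# Both tables hold the same counts; one is the transpose of the other
print(by_gender)
print(by_fruit)
```

Neither orientation is wrong, so choose the one where the rows are the groups you want to compare.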

Use crosstab() when you want to count how often categories occur together. For other statistics such as sums or means, pass values and aggfunc to crosstab(), or use pivot_table().
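
As a rough illustration of the difference between counting and aggregating, the sketch below uses a hypothetical orders table (City, Category, and Amount are invented for this example). It computes plain counts, then per-cell sums through the values and aggfunc arguments, and the same sums with pivot_table().

```python
import pandas as pd

# Hypothetical orders data, invented for this sketch
orders = pd.DataFrame({
    'City': ['Paris', 'Paris', 'Rome', 'Rome', 'Rome'],
    'Category': ['Books', 'Toys', 'Books', 'Books', 'Toys'],
    'Amount': [10.0, 20.0, 5.0, 15.0, 30.0],
})

# Default behavior: count how many orders fall in each cell
counts = pd.crosstab(index=orders['City'], columns=orders['Category'])

# With values and aggfunc: sum the Amount column within each cell
totals = pd.crosstab(index=orders['City'], columns=orders['Category'],
                     values=orders['Amount'], aggfunc='sum')

# The same aggregation expressed as a pivot table
same_totals = orders.pivot_table(index='City', columns='Category',
                                 values='Amount', aggfunc='sum')

print(counts)
print(totals)
print(same_totals)
```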

Summary

Cross-tabulation counts how often categories from two variables appear together.

Use pd.crosstab() with index and columns to create the table.

It helps compare groups easily and understand relationships in data.