Cross-tabulation shows how two or more categorical variables relate by counting how often each combination of their values occurs. The result is a simple table with one variable's categories as rows and the other's as columns.
Cross-tabulation with crosstab() in Data Analysis Python
    import pandas as pd

    # Create a cross-tabulation table
    pd.crosstab(index=data['Column1'], columns=data['Column2'], margins=False, normalize=False)
index is the row variable (categories for rows).
columns is the column variable (categories for columns).
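The syntax above also passes margins and normalize, which the parameter list does not describe. As a minimal sketch, assuming a small made-up DataFrame named data (its values are invented for illustration), margins=True adds totals and normalize="index" converts each row to proportions:

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
data = pd.DataFrame({
    "Column1": ["a", "a", "b", "b", "b"],
    "Column2": ["x", "y", "x", "x", "y"],
})

# margins=True appends an "All" row and an "All" column holding the totals
counts = pd.crosstab(index=data["Column1"], columns=data["Column2"], margins=True)
print(counts)

# normalize="index" divides each row by its total, so every row sums to 1
row_share = pd.crosstab(index=data["Column1"], columns=data["Column2"], normalize="index")
print(row_share)
```

Other accepted values for normalize are "columns" (each column sums to 1) and "all" (the whole table sums to 1).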
    import pandas as pd

    # Example data
    colors = ['Red', 'Blue', 'Red', 'Green', 'Blue']
    shapes = ['Circle', 'Square', 'Square', 'Circle', 'Circle']

    # Cross-tabulation
    pd.crosstab(index=colors, columns=shapes)

    import pandas as pd

    # Empty data
    colors = []
    shapes = []

    # Cross-tabulation on empty lists
    pd.crosstab(index=colors, columns=shapes)

    import pandas as pd

    # Single element data
    colors = ['Red']
    shapes = ['Circle']

    # Cross-tabulation with one element
    pd.crosstab(index=colors, columns=shapes)

    import pandas as pd

    # Data with one category in columns
    colors = ['Red', 'Red', 'Red']
    shapes = ['Circle', 'Circle', 'Circle']

    # Cross-tabulation
    pd.crosstab(index=colors, columns=shapes)
This program shows how many males and females prefer each fruit. It prints the original data and then the cross-tabulation table.
    import pandas as pd

    # Sample data: survey of favorite fruit by gender
    survey_data = {
        'Gender': ['Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Male'],
        'FavoriteFruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Apple', 'Apple', 'Banana']
    }

    # Create DataFrame
    survey_df = pd.DataFrame(survey_data)
    print('Original DataFrame:')
    print(survey_df)

    # Create cross-tabulation table
    fruit_gender_crosstab = pd.crosstab(index=survey_df['Gender'], columns=survey_df['FavoriteFruit'])
    print('\nCross-tabulation of Gender and Favorite Fruit:')
    print(fruit_gender_crosstab)
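To make the counting concrete, the sketch below rebuilds the same survey data, reads a single cell from the table, and reproduces the counts by hand with groupby. The groupby route is only one possible way to cross-check the result, not a description of how crosstab() is implemented; the names table, female_apple, and manual are chosen here for illustration.

```python
import pandas as pd

# Same survey data as in the program above
survey_df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male", "Male"],
    "FavoriteFruit": ["Apple", "Banana", "Apple", "Banana", "Apple", "Apple", "Banana"],
})

table = pd.crosstab(index=survey_df["Gender"], columns=survey_df["FavoriteFruit"])

# Look up one cell by its row label and column label
female_apple = table.loc["Female", "Apple"]
print("Females who prefer Apple:", female_apple)

# Cross-check: group on both columns, count the rows in each group,
# then move the fruit level into the columns, filling missing pairs with 0
manual = survey_df.groupby(["Gender", "FavoriteFruit"]).size().unstack(fill_value=0)
print(manual)
```

Both tables hold one count per (Gender, FavoriteFruit) pair, so comparing them is a quick sanity check when a result looks surprising.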
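Time complexity is roughly O(n), where n is the number of rows in the data, because each row contributes to exactly one count.
Space complexity depends on the number of unique categories: the result has one row per unique value in index and one column per unique value in columns, regardless of how many input rows there are.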
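The point about output size can be illustrated with a small sketch. It repeats the colors and shapes lists from the earlier example 1000 times (the repetition factor is an arbitrary choice for illustration) and shows that the table's shape follows the number of unique categories, not the number of input rows:

```python
import pandas as pd

# The earlier example data, repeated to simulate a larger input
colors = ["Red", "Blue", "Red", "Green", "Blue"] * 1000
shapes = ["Circle", "Square", "Square", "Circle", "Circle"] * 1000

table = pd.crosstab(index=colors, columns=shapes)

# Number of input rows versus size of the resulting table
n_rows = len(colors)
print("Input rows:", n_rows)
print("Table shape:", table.shape)
print("Unique colors:", len(set(colors)), "Unique shapes:", len(set(shapes)))
```

The table can still grow large when both variables have many distinct values, which is worth checking before cross-tabulating ID-like columns.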
Common mistake: passing the wrong columns as index and columns, or swapping the two, which produces a confusing or transposed table.
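A short sketch of what swapping the two arguments does, reusing the survey columns from the program above (the variable names by_gender and by_fruit are chosen here for illustration). The counts are identical, but rows and columns trade places:

```python
import pandas as pd

survey_df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male", "Male"],
    "FavoriteFruit": ["Apple", "Banana", "Apple", "Banana", "Apple", "Apple", "Banana"],
})

# Gender as rows, fruit as columns
by_gender = pd.crosstab(index=survey_df["Gender"], columns=survey_df["FavoriteFruit"])

# Arguments swapped: fruit as rows, gender as columns
by_fruit = pd.crosstab(index=survey_df["FavoriteFruit"], columns=survey_df["Gender"])

print(by_gender)
print(by_fruit)

# The swapped table holds the same values as the transpose of the first
same_counts = (by_fruit.values == by_gender.T.values).all()
print("Swapped table equals transpose:", same_counts)
```

Choosing which variable goes in the rows is mostly a readability decision, but it matters when normalize="index" or normalize="columns" is used, because the direction of the proportions changes with it.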
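Use crosstab() when you want to count how often categories occur together. It can also aggregate a third variable through its values and aggfunc parameters; for more complex statistics on a DataFrame, consider pivot tables.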
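As a rough comparison of the two tools, the sketch below computes the same per-group mean with crosstab() and with pivot_table(). The Servings column and its numbers are hypothetical, added only so there is something numeric to aggregate:

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "FavoriteFruit": ["Apple", "Banana", "Apple", "Apple"],
    "Servings": [2, 3, 1, 4],  # hypothetical numeric column for illustration
})

# crosstab can aggregate a third variable when values and aggfunc are both given
mean_ct = pd.crosstab(
    index=df["Gender"],
    columns=df["FavoriteFruit"],
    values=df["Servings"],
    aggfunc="mean",
)

# pivot_table expresses the same summary using column names on the DataFrame
mean_pt = df.pivot_table(
    index="Gender",
    columns="FavoriteFruit",
    values="Servings",
    aggfunc="mean",
)

print(mean_ct)
print(mean_pt)
```

Combinations that never occur in the data, such as Male and Banana here, show up as NaN rather than 0, since there is nothing to average.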
Cross-tabulation counts how often categories from two variables appear together.
Use pd.crosstab() with index and columns to create the table.
It helps compare groups easily and understand relationships in data.