We use crosstab() to see how two or more categorical variables relate by counting how often their values appear together. The resulting frequency table makes relationships between categories easy to spot.
crosstab() for cross-tabulation in Pandas
import pandas as pd

# Basic crosstab syntax
df_crosstab = pd.crosstab(
    index=data['column1'],
    columns=data['column2'],
    margins=False,
    normalize=False
)
index is the row variable, columns is the column variable.
margins=True adds totals for rows and columns.
normalize=True converts the counts into proportions of the grand total.
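To illustrate the normalize parameter from the syntax above, here is a minimal sketch. It assumes the small Gender/Product frame used throughout this tutorial and uses normalize='index', one of the string options pandas accepts, to get row-wise shares instead of raw counts. The variable name row_shares is only illustrative.

```python
import pandas as pd

# Small sample frame, matching the Gender/Product example in this tutorial
data = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Product': ['A', 'B', 'A', 'B'],
})

# normalize='index' divides each row by its own row total,
# so every row of the result sums to 1.0
row_shares = pd.crosstab(
    index=data['Gender'],
    columns=data['Product'],
    normalize='index',
)
print(row_shares)
```

Row-wise shares are handy when the groups have different sizes, because raw counts alone can hide how each group splits across the categories.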
import pandas as pd

# Example data
data = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Product': ['A', 'B', 'A', 'B']
})

# Crosstab of Gender vs Product
result = pd.crosstab(index=data['Gender'], columns=data['Product'])
print(result)

import pandas as pd

# Empty data example
data_empty = pd.DataFrame({'A': [], 'B': []})
result_empty = pd.crosstab(index=data_empty['A'], columns=data_empty['B'])
print(result_empty)

import pandas as pd

# Single element data
data_single = pd.DataFrame({'A': ['X'], 'B': ['Y']})
result_single = pd.crosstab(index=data_single['A'], columns=data_single['B'])
print(result_single)

import pandas as pd

# Using margins to add totals (reuses the Gender/Product data defined above)
result_margins = pd.crosstab(index=data['Gender'], columns=data['Product'], margins=True)
print(result_margins)
The following program counts how many of each product each gender bought and adds totals for rows and columns.
import pandas as pd

# Create sample data
sales_data = pd.DataFrame({
    'Customer': ['Alice', 'Bob', 'Charlie', 'Diana', 'Evan', 'Fiona'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'Product': ['Book', 'Pen', 'Book', 'Pen', 'Notebook', 'Book']
})
print('Original Data:')
print(sales_data)

# Create crosstab to count products bought by gender
product_gender_crosstab = pd.crosstab(
    index=sales_data['Gender'],
    columns=sales_data['Product'],
    margins=True
)
print('\nCrosstab of Product by Gender with Totals:')
print(product_gender_crosstab)
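Since the crosstab result is an ordinary DataFrame, individual counts and totals can be read back with .loc. The sketch below rebuilds the same Gender/Product data so it runs on its own. The names table, female_books, total_pens and grand_total are illustrative, and 'All' is the default label pandas gives the margins row and column.

```python
import pandas as pd

# Same Gender/Product values as the sales example in this tutorial
sales_data = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'Product': ['Book', 'Pen', 'Book', 'Pen', 'Notebook', 'Book'],
})

table = pd.crosstab(
    index=sales_data['Gender'],
    columns=sales_data['Product'],
    margins=True,
)

# Look up single cells by row label and column label
female_books = table.loc['Female', 'Book']   # books bought by female customers
total_pens = table.loc['All', 'Pen']         # column total for pens
grand_total = table.loc['All', 'All']        # total number of purchases
print(female_books, total_pens, grand_total)
```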
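Time complexity: Roughly O(n), where n is the number of rows in the data, because each row is counted once.
Space complexity: The result has one row per unique index value and one column per unique column value, so its size grows with the product of those two counts.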
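The link between unique values and result size can be checked directly. This is a small sketch on made-up data: it compares the shape of the crosstab with the number of unique values in each input column. The names df, table and expected_shape are illustrative.

```python
import pandas as pd

# Made-up sample data: 2 unique genders, 3 unique products
df = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'Product': ['Book', 'Pen', 'Book', 'Pen', 'Notebook', 'Book'],
})

table = pd.crosstab(index=df['Gender'], columns=df['Product'])

# One row per unique index value, one column per unique column value
expected_shape = (df['Gender'].nunique(), df['Product'].nunique())
print(table.shape, expected_shape)
```

With high-cardinality columns (for example, user IDs against product IDs) the table can become very wide and mostly zeros, so it is worth checking nunique() before building it.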
Common mistake: Passing the wrong columns as index and columns, or swapping the two, which changes the layout of the table and can make the results confusing.
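To see what swapping the two arguments does, the following sketch builds the table both ways on made-up data. The counts are the same, but the rows and columns trade places, so the table has to be read the other way around. The names by_gender, by_product and same_counts are illustrative.

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male', 'Female'],
    'Product': ['Book', 'Pen', 'Book', 'Pen', 'Notebook', 'Book'],
})

# Gender as rows, Product as columns
by_gender = pd.crosstab(index=df['Gender'], columns=df['Product'])

# Arguments swapped: Product as rows, Gender as columns
by_product = pd.crosstab(index=df['Product'], columns=df['Gender'])

print(by_gender)
print(by_product)

# The swapped table holds the same counts as the transpose of the first one
same_counts = (by_product.to_numpy() == by_gender.T.to_numpy()).all()
print(same_counts)
```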
Use crosstab() when you want to count occurrences between categories. For more complex aggregation, consider pivot_table().
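As a sketch of what aggregation beyond counting can look like, the example below sums a numeric column per category pair. The Amount column and its values are hypothetical, added only for illustration. crosstab() accepts values and aggfunc for this, and pivot_table() gives the same table while working with column names on the DataFrame directly.

```python
import pandas as pd

# Hypothetical order data; 'Amount' is an invented numeric column
orders = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Female'],
    'Product': ['Book', 'Pen', 'Book', 'Pen'],
    'Amount': [12.0, 3.0, 10.0, 2.5],
})

# crosstab with values and aggfunc: sum Amount for each Gender/Product pair
summed = pd.crosstab(
    index=orders['Gender'],
    columns=orders['Product'],
    values=orders['Amount'],
    aggfunc='sum',
)

# The same aggregation expressed with pivot_table
pivoted = orders.pivot_table(
    index='Gender',
    columns='Product',
    values='Amount',
    aggfunc='sum',
)

print(summed)
print(pivoted)
```

pivot_table() tends to be the more natural choice once several value columns or aggregation functions are involved, while crosstab() stays convenient for quick frequency tables.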
crosstab() counts how often values from two columns appear together.
It helps compare groups easily with simple tables.
You can add totals with margins=True for better summary.