
crosstab() for cross-tabulation in Pandas - Step-by-Step Execution

Concept Flow - crosstab() for cross-tabulation
Input Data
Select two categorical variables
Count occurrences of each combination
Create frequency table (cross-tabulation)
Display table showing counts per category pair
The crosstab function takes two categorical variables and counts how often each pair occurs, showing the result as a table.
Execution Sample
import pandas as pd

data = {'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
        'Preference': ['A', 'B', 'A', 'B', 'A']}
df = pd.DataFrame(data)

result = pd.crosstab(df['Gender'], df['Preference'])
print(result)
This code counts how many males and females prefer categories A or B and shows the counts in a table.
Execution Table
Step | Action | Gender | Preference | Count | Cross-tab Table State
1 | Read first row | Male | A | 1 | {'Male': {'A': 1}}
2 | Read second row | Female | B | 1 | {'Male': {'A': 1}, 'Female': {'B': 1}}
3 | Read third row | Female | A | 1 | {'Male': {'A': 1}, 'Female': {'B': 1, 'A': 1}}
4 | Read fourth row | Male | B | 1 | {'Male': {'A': 1, 'B': 1}, 'Female': {'B': 1, 'A': 1}}
5 | Read fifth row | Male | A | 2 | {'Male': {'A': 2, 'B': 1}, 'Female': {'B': 1, 'A': 1}}
6 | Build final table | - | - | - | Final table (shown below)

Final table:
Preference  A  B
Gender
Female      1  1
Male        2  1
💡 All rows processed, cross-tabulation table completed
Variable Tracker
Variable: Cross-tab dict
Start | {}
After 1 | {'Male': {'A': 1}}
After 2 | {'Male': {'A': 1}, 'Female': {'B': 1}}
After 3 | {'Male': {'A': 1}, 'Female': {'B': 1, 'A': 1}}
After 4 | {'Male': {'A': 1, 'B': 1}, 'Female': {'B': 1, 'A': 1}}
After 5 | {'Male': {'A': 2, 'B': 1}, 'Female': {'B': 1, 'A': 1}}
Final | Final table as shown in the Execution Table
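The row-by-row trace above can be reproduced with a plain nested dictionary. This is only a sketch of the counting idea used in the trace, not a description of how pandas implements crosstab internally. The names rows, counts, and inner are chosen here for illustration.

```python
import pandas as pd

# The five (Gender, Preference) rows from the Execution Sample.
rows = [("Male", "A"), ("Female", "B"), ("Female", "A"), ("Male", "B"), ("Male", "A")]

# Hand-rolled counting: one inner dict per gender, one counter per preference.
counts = {}
for gender, preference in rows:
    inner = counts.setdefault(gender, {})
    inner[preference] = inner.get(preference, 0) + 1

print(counts)
# {'Male': {'A': 2, 'B': 1}, 'Female': {'B': 1, 'A': 1}}

# Compare every hand-counted cell with the matching cell from pd.crosstab.
df = pd.DataFrame(rows, columns=["Gender", "Preference"])
table = pd.crosstab(df["Gender"], df["Preference"])
matches = all(
    table.loc[gender, preference] == n
    for gender, inner in counts.items()
    for preference, n in inner.items()
)
print(matches)  # True
```

The dictionary keeps pairs in the order they were first seen, while the crosstab result sorts its row and column labels, which is why Female appears before Male in the final table.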
Key Moments - 2 Insights
Why does the count for Male and Preference A increase on the last row?
Because the last row has Gender 'Male' and Preference 'A', a pair that already appeared once in the first row, so the count increments from 1 to 2, as shown in step 5 of the Execution Table.
What happens if a category pair does not appear in the data?
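It depends on whether each value occurs anywhere in the data. If both values occur but never together, the cell shows 0. If a value never occurs at all, the table has no row or column for it. For example, if nobody in the data prefers 'C', the cross-tab has no 'C' column, so there is no Female/'C' cell to look up.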
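A small check of both cases, using a hypothetical variation of the sample data in which Male/'B' and Female/'A' never occur together and 'C' does not occur at all:

```python
import pandas as pd

# Hypothetical data (not the tutorial's sample): no Male/B, no Female/A, no 'C'.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Preference": ["A", "B", "B", "A"],
})

table = pd.crosstab(df["Gender"], df["Preference"])
print(table)

# Both values exist in the data but never together: the cell is 0.
male_b = int(table.loc["Male", "B"])
print(male_b)  # 0

# A value that never occurs gets no column at all.
has_c = "C" in table.columns
print(has_c)  # False
```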
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the count for Female and Preference B after step 2?
A. 1
B. 0
C. 2
D. Not counted yet
💡 Hint
Check step 2 of the Execution Table, where Female and B first appear with count 1.
At which step does the count for Male and Preference B first appear?
A. Step 1
B. Step 4
C. Step 3
D. Step 5
💡 Hint
Look at step 4 of the Execution Table, where Male and B are first counted.
If a new row with Gender 'Female' and Preference 'A' is added, how will the count change?
A. Male and A count becomes 3
B. Female and A count becomes 1
C. Female and A count becomes 2
D. No change
💡 Hint
Refer to the Variable Tracker, which shows a count incrementing whenever the same pair appears again.
Concept Snapshot
pandas.crosstab(index, columns) creates a frequency table.
It counts occurrences of each pair of categories.
Input: two categorical series.
Output: DataFrame with counts.
Useful for quick category relationship summaries.
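To make the snapshot concrete, the sketch below inspects the object that crosstab returns for the sample data and compares it with a groupby-based count. The groupby version is shown only as a point of comparison, and the names table, by_group, and same_counts are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "Preference": ["A", "B", "A", "B", "A"],
})

table = pd.crosstab(df["Gender"], df["Preference"])

# The result is an ordinary DataFrame; row and column labels
# take their names from the two input Series.
print(type(table).__name__)  # DataFrame
print(table.index.name)      # Gender
print(table.columns.name)    # Preference

# The same counts via groupby: count each pair, then pivot Preference into columns.
by_group = df.groupby(["Gender", "Preference"]).size().unstack(fill_value=0)
same_counts = bool((table.to_numpy() == by_group.to_numpy()).all())
print(same_counts)  # True
```

Because the result is a regular DataFrame, the usual selection tools apply, for example table.loc['Male', 'A'] to read a single cell.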
Full Transcript
The crosstab function in pandas takes two categorical variables and counts how many times each combination occurs. We start with input data containing categories like Gender and Preference. For each row, crosstab counts the pair and updates the frequency table. For example, when it reads a row with Male and Preference A, it adds 1 to that cell. If the same pair appears again, the count increases. After processing all rows, it shows a table with counts for each category pair. This helps us see relationships between categories quickly.