
Cross-tabulation with crosstab() in Data Analysis Python - Step-by-Step Execution

Concept Flow - Cross-tabulation with crosstab()
Start with DataFrame
Select two columns
Apply pd.crosstab()
Count combinations
Create cross-tab table
Display result
We start with a data table, pick two columns, count how often each pair appears, and show the counts in a new table.
Execution Sample
import pandas as pd

data = {'Gender': ['M', 'F', 'F', 'M', 'F'],
        'Preference': ['Tea', 'Coffee', 'Tea', 'Coffee', 'Tea']}
df = pd.DataFrame(data)

ct = pd.crosstab(df['Gender'], df['Preference'])
print(ct)
This code counts how many males and females prefer Tea or Coffee and shows the counts in a table.
Execution Table
Step | Action | DataFrame State | Cross-tab Result
1 | Create DataFrame with Gender and Preference columns | {'Gender': ['M', 'F', 'F', 'M', 'F'], 'Preference': ['Tea', 'Coffee', 'Tea', 'Coffee', 'Tea']} | N/A
2 | Select 'Gender' and 'Preference' columns | DataFrame with two columns | N/A
3 | Apply pd.crosstab() to count combinations | Same DataFrame | Counts of each Gender-Preference pair
4 | Count 'M' with 'Tea' → 1 | Same DataFrame | M Tea: 1
5 | Count 'M' with 'Coffee' → 1 | Same DataFrame | M Coffee: 1
6 | Count 'F' with 'Tea' → 2 | Same DataFrame | F Tea: 2
7 | Count 'F' with 'Coffee' → 1 | Same DataFrame | F Coffee: 1
8 | Build cross-tab table | Same DataFrame | Table with columns Coffee, Tea: row F = 1, 2; row M = 1, 1
9 | Print cross-tab table | Same DataFrame | Output displayed
10 | End of execution | Same DataFrame | Execution stops
💡 All rows processed, cross-tabulation complete
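The per-pair counting in steps 4 to 7 can be reproduced by hand. The sketch below uses `collections.Counter` as a stand-in for the counting step (an illustrative choice, not a description of how pandas implements `crosstab()` internally) and compares each hand count with the matching cell of the cross-tab.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'Gender': ['M', 'F', 'F', 'M', 'F'],
    'Preference': ['Tea', 'Coffee', 'Tea', 'Coffee', 'Tea'],
})

# Tally each (Gender, Preference) pair by walking the rows once.
pair_counts = Counter(zip(df['Gender'], df['Preference']))

ct = pd.crosstab(df['Gender'], df['Preference'])

# Compare every hand count with the matching crosstab cell.
for (gender, preference), count in sorted(pair_counts.items()):
    matches = ct.loc[gender, preference] == count
    print(gender, preference, count, 'matches crosstab:', matches)
```

Because the sample has five rows, the hand counts must add up to five; a quick sum over `pair_counts.values()` is an easy sanity check.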
Variable Tracker
Variable | Start | After Step 1 | After Step 3 | Final
df | None | {'Gender': ['M', 'F', 'F', 'M', 'F'], 'Preference': ['Tea', 'Coffee', 'Tea', 'Coffee', 'Tea']} | Same DataFrame | Same DataFrame
ct | None | None | Counts of Gender-Preference pairs | Cross-tab table with counts
Key Moments - 2 Insights
Why does the cross-tab table show zeros for some combinations?
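Because those Gender-Preference pairs do not appear in the data. For example, if the data included 'Juice' as a preference but no 'M' row chose it, the M-Juice cell would be zero. In the sample data every pair occurs at least once, so the table built in step 8 of the execution table contains no zeros.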
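A small sketch of the zero case, using a hypothetical sixth row that is not part of the original sample: one 'F' row prefers 'Juice' and no 'M' row does, so the M-Juice cell comes out as zero.

```python
import pandas as pd

# Hypothetical data: the last row ('F', 'Juice') is added for illustration.
df = pd.DataFrame({
    'Gender': ['M', 'F', 'F', 'M', 'F', 'F'],
    'Preference': ['Tea', 'Coffee', 'Tea', 'Coffee', 'Tea', 'Juice'],
})

ct = pd.crosstab(df['Gender'], df['Preference'])

# 'Juice' gets its own column; the 'M' row has no such pair, so the cell is 0.
print(ct)
print("M-Juice count:", ct.loc['M', 'Juice'])
```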
Can crosstab() work with columns that have missing values?
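Yes. By default, rows where either value is missing (NaN) are left out of the counts, so the cells add up to fewer rows than the DataFrame has. To count them, one dependable approach is to replace the missing values with an explicit label, for example with fillna(), before calling crosstab().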
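The sketch below illustrates this with a hypothetical variant of the sample data in which one preference is missing. The label 'Unknown' is an arbitrary choice made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third row has no recorded preference.
df = pd.DataFrame({
    'Gender': ['M', 'F', 'F', 'M', 'F'],
    'Preference': ['Tea', 'Coffee', np.nan, 'Coffee', 'Tea'],
})

# Default behaviour: the row with the missing value is not counted.
ct_default = pd.crosstab(df['Gender'], df['Preference'])
print(ct_default)
print("Rows counted:", ct_default.to_numpy().sum())

# Give missing values an explicit label so they show up as their own column.
filled = df['Preference'].fillna('Unknown')
ct_filled = pd.crosstab(df['Gender'], filled)
print(ct_filled)
print("Rows counted:", ct_filled.to_numpy().sum())
```

Comparing the two totals against `len(df)` is a quick way to notice that rows were dropped.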
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table at step 6, what is the count of females who prefer Tea?
A. 1
B. 3
C. 2
D. 0
💡 Hint
Check the 'Cross-tab Result' column at step 6 in the execution_table.
At which step does the cross-tab table get fully built?
A. Step 4
B. Step 8
C. Step 7
D. Step 10
💡 Hint
Look for the step where the table with all counts is shown in the execution_table.
If the data had an extra row with Gender 'M' and Preference 'Tea', how would the count at step 4 change?
A. It would increase by 1
B. It would decrease by 1
C. It would stay the same
D. It would reset to zero
💡 Hint
Step 4 shows counting 'M' with 'Tea'. Adding one more such row increases the count.
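The last question can be checked directly. This sketch appends the extra ('M', 'Tea') row with `pd.concat` and compares the cross-tab before and after; the variable names are chosen for this example only.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['M', 'F', 'F', 'M', 'F'],
    'Preference': ['Tea', 'Coffee', 'Tea', 'Coffee', 'Tea'],
})
before = pd.crosstab(df['Gender'], df['Preference'])

# Append one more row: a male who prefers Tea.
extra = pd.DataFrame({'Gender': ['M'], 'Preference': ['Tea']})
df_more = pd.concat([df, extra], ignore_index=True)
after = pd.crosstab(df_more['Gender'], df_more['Preference'])

# Only the M-Tea cell should differ between the two tables.
print(after - before)
```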
Concept Snapshot
pd.crosstab(index, columns) counts occurrences of combinations between two columns.
Input: two Series or columns from DataFrame.
Output: table with counts for each pair.
Useful for quick frequency tables.
Pairs that never occur in the data show a count of zero.
Full Transcript
Cross-tabulation with pd.crosstab() takes two columns from a data table and counts how often each pair of values appears together. We start with a DataFrame containing columns like Gender and Preference. Then we select these columns and apply pd.crosstab() to count combinations. The result is a new table showing counts for each Gender-Preference pair, for example how many males prefer Tea and how many prefer Coffee. The process counts each pair step by step and builds the table. Pairs that never occur in the data get a count of zero. This method helps summarize relationships between two categorical variables quickly.