crosstab() for cross-tabulation in Pandas - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is the purpose of the crosstab() function in pandas?
By default, crosstab() builds a frequency table: it counts how often each combination of values from two or more categorical variables occurs. The result is a compact summary of how the categories relate to each other.
beginner
How do you create a simple cross-tabulation between two columns col1 and col2 in a DataFrame df?
Use pd.crosstab(df['col1'], df['col2']). This counts how many times each combination of values from col1 and col2 appears.
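A minimal sketch of that call, assuming a small made-up DataFrame (the values below are hypothetical; only the column names col1 and col2 come from the card):

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": ["x", "y", "x", "x", "y"],
})

# Row labels come from col1, column labels from col2,
# and each cell holds the number of rows with that pair of values.
counts = pd.crosstab(df["col1"], df["col2"])
print(counts)
```

Here the pair ("b", "x") appears twice in the data, so that cell is 2, and all cells together sum to the number of rows.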
intermediate
What does the normalize parameter do in crosstab()?
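The normalize parameter converts counts into proportions (fractions between 0 and 1). normalize='index' divides each row by its row total, normalize='columns' divides each column by its column total, and normalize='all' (or normalize=True) divides every cell by the grand total. Multiply the result by 100 to get percentages.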
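The sketch below compares the normalize settings on hypothetical data (the DataFrame and the variable names are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b", "b"],
    "col2": ["x", "y", "x", "x", "x", "y"],
})

# Each row sums to 1: share of col2 values within each col1 group.
row_share = pd.crosstab(df["col1"], df["col2"], normalize="index")

# Each column sums to 1: share of col1 values within each col2 group.
col_share = pd.crosstab(df["col1"], df["col2"], normalize="columns")

# All cells together sum to 1: share of the whole table.
overall = pd.crosstab(df["col1"], df["col2"], normalize="all")

# The results are fractions; scale by 100 if percentages are wanted.
row_pct = (row_share * 100).round(1)
print(row_pct)
```

Row-wise proportions answer "within group b, what share chose x?", while column-wise proportions answer "among those who chose x, what share is in group b?".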
intermediate
Can crosstab() handle more than two variables? How?
Yes, by passing a list of arrays or columns for the rows, the columns, or both. For example, pd.crosstab([df['col1'], df['col2']], df['col3']) returns a table whose rows are indexed by a MultiIndex of (col1, col2) pairs and whose columns are the values of col3, with counts in the cells.
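A short sketch of the three-variable case, again on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["p", "q", "p", "p"],
    "col3": ["x", "y", "x", "y"],
})

# A list of Series for the rows produces a two-level row index.
table = pd.crosstab([df["col1"], df["col2"]], df["col3"])
print(table)

# The row index levels are named after the input Series.
print(list(table.index.names))

# Look up a single cell with a (col1, col2) tuple and a col3 value.
print(table.loc[("b", "p"), "x"])
```

With the default settings, only (col1, col2) pairs that actually occur in the data appear as rows, so the pair ("b", "q") is absent from this table.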
beginner
What is a real-life example where crosstab() is useful?
Imagine a survey with answers about gender and favorite fruit. crosstab() can show how many males and females prefer each fruit, helping to understand preferences by group.
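The survey scenario could look like the sketch below. The responses are made up, and margins=True is an optional extra that adds row and column totals under the label "All":

```python
import pandas as pd

# Hypothetical survey responses, invented for illustration.
survey = pd.DataFrame({
    "gender": ["male", "female", "female", "male", "female"],
    "fruit": ["apple", "banana", "apple", "apple", "banana"],
})

# Count respondents per (gender, fruit) pair and append totals.
prefs = pd.crosstab(survey["gender"], survey["fruit"], margins=True)
print(prefs)
```

The "All" row and column make it easy to check that the group counts add up to the number of respondents.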
What does pd.crosstab(df['A'], df['B']) return?
A. A table counting occurrences of each combination of values in columns A and B
B. A table showing the sum of values in columns A and B
C. A table with the mean of columns A and B
D. A table with unique values from columns A and B
How do you get row-wise or column-wise proportions instead of counts in crosstab()?
A. Set normalize='index' or normalize='columns'
B. Set normalize='all'
C. Set normalize=True
D. Set normalize=False
Which of these is a valid way to use crosstab() with three variables?
A. pd.crosstab(df['A'], df['B'], df['C'])
B. pd.crosstab(df['A'])
C. pd.crosstab(df['A'] + df['B'], df['C'])
D. pd.crosstab([df['A'], df['B']], df['C'])
What type of data is best suited for crosstab()?
A. Numerical continuous data
B. Categorical data
C. Time series data
D. Text data
If you want to see how many customers bought product A or B by region, which function helps?
A. groupby()
B. pivot_table()
C. crosstab()
D. merge()
Explain how to use crosstab() to analyze the relationship between two categorical columns in a DataFrame.
Think about how to count how often each pair of categories appears.
Describe how the normalize parameter changes the output of crosstab() and why it might be useful.
Consider when percentages are easier to understand than raw counts.