Concept Flow - Cross-tabulation advanced usage

Start with DataFrame

↓

Select variables for rows and columns

↓

Apply pd.crosstab()

↓

Add options such as margins and normalize

↓

Use multiple index levels (rows and columns)

↓

Output: Cross-tabulated DataFrame with counts or stats

Start with data, choose row and column variables, apply crosstab with options like margins and normalization, and get a detailed summary table.
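Before any advanced options, the flow reduces to a plain frequency table. The sketch below is a minimal illustration, assuming pandas is installed and reusing the toy columns from the Execution Sample; the names `counts` and `with_totals` are illustrative only.

```python
import pandas as pd

# Toy data with the same columns as the Execution Sample
df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "AgeGroup": ["Adult", "Adult", "Child", "Child", "Adult"],
    "Purchased": ["Yes", "No", "Yes", "No", "Yes"],
})

# One row variable and one column variable: plain frequency counts
counts = pd.crosstab(df["Gender"], df["Purchased"])
print(counts)  # F: No=0, Yes=3; M: No=2, Yes=0

# margins=True on raw counts appends both an "All" row and an "All" column
with_totals = pd.crosstab(df["Gender"], df["Purchased"], margins=True)
print(with_totals)
```

The later steps of the flow replace the single row Series with a list (multi-level rows) and add normalize.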

Execution Sample

Pandas

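import pandas as pd

# Toy data: 5 customers with gender, age group, and purchase outcome
data = {'Gender': ['F', 'M', 'F', 'M', 'F'],
        'AgeGroup': ['Adult', 'Adult', 'Child', 'Child', 'Adult'],
        'Purchased': ['Yes', 'No', 'Yes', 'No', 'Yes']}
df = pd.DataFrame(data)

# Rows: (Gender, AgeGroup) pairs; columns: Purchased values.
# margins=True appends an 'All' row; normalize='index' turns each row into proportions.
ct = pd.crosstab(index=[df['Gender'], df['AgeGroup']],
                 columns=df['Purchased'],
                 margins=True,
                 normalize='index')
print(ct)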

Create a DataFrame and generate a cross-tabulation with multi-index rows, column categories, margins, and row-wise normalization.
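To see what the sample call actually returns, the sketch below rebuilds it and inspects the shape and a few cells. It relies on pandas' behavior that, with normalize='index', margins=True keeps only the All row (no All column), and that on a two-level row index the margin row is labeled with the tuple ('All', '').

```python
import pandas as pd

data = {"Gender": ["F", "M", "F", "M", "F"],
        "AgeGroup": ["Adult", "Adult", "Child", "Child", "Adult"],
        "Purchased": ["Yes", "No", "Yes", "No", "Yes"]}
df = pd.DataFrame(data)

ct = pd.crosstab(index=[df["Gender"], df["AgeGroup"]],
                 columns=df["Purchased"],
                 margins=True,
                 normalize="index")

# 4 (Gender, AgeGroup) rows plus the "All" row; columns are only No and Yes
print(ct.shape)  # (5, 2)

# On a two-level row index, the margin row is labeled ("All", "")
print(ct.loc[("All", ""), "Yes"])  # 0.6

# Every row, including "All", sums to 1
print(ct.sum(axis=1))
```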

Execution Table

Step	Action	Input Variables	Parameters	Output (Cross-tab shape and sample values)
1	Create DataFrame	Gender, AgeGroup, Purchased	None	DataFrame with 5 rows and 3 columns
2	Call pd.crosstab	Index: Gender & AgeGroup, Columns: Purchased	margins=True, normalize='index'	Cross-tab with multi-index rows (Gender, AgeGroup) and columns (No, Yes); with normalize='index', margins adds an All row but no All column
3	Calculate counts per group	Groups by Gender & AgeGroup	Count Purchased categories	Counts like Female Adult: No=0, Yes=2; Male Child: No=1, Yes=0
4	Normalize counts by row	Counts per group	normalize='index'	Values converted to proportions per row, e.g. Female Adult Yes=1.0
5	Add margins (totals)	All groups	margins=True	Row 'All' shows overall proportions, e.g. All Yes=0.6
6	Output final crosstab	Normalized proportions with totals	Multi-index rows plus the All row	DataFrame shape (5 rows, 2 columns): 4 group rows plus All, columns No and Yes, values between 0 and 1
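Steps 3 to 5 can be reproduced by hand to show where the numbers come from. This is an illustrative equivalent built with groupby and division, not pandas' internal implementation of crosstab; `counts`, `row_props`, and `all_row` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "AgeGroup": ["Adult", "Adult", "Child", "Child", "Adult"],
    "Purchased": ["Yes", "No", "Yes", "No", "Yes"],
})

# Step 3: count Purchased values per (Gender, AgeGroup) group;
# unstack turns the Purchased level into columns, filling absent combinations with 0
counts = (df.groupby(["Gender", "AgeGroup"])["Purchased"]
            .value_counts()
            .unstack(fill_value=0))

# Step 4: divide each row by its own total so every row sums to 1
row_props = counts.div(counts.sum(axis=1), axis=0)

# Step 5: the "All" row is the column totals divided by the grand total
all_row = counts.sum() / counts.to_numpy().sum()

print(counts)
print(row_props)
print(all_row)  # No 0.4, Yes 0.6
```

Computing the All row from the raw counts, rather than averaging the normalized rows, matters: groups of different sizes would otherwise be weighted equally.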

💡 All groups are counted, each row is converted to proportions, and the All row of overall proportions is appended; the final cross-tabulation is ready.

Variable Tracker

Variable	Start	After Step 1	After Step 2	After Step 3	After Step 4	Final
df	None	DataFrame with 5 rows, 3 columns	Same	Same	Same	Same
ct	None	None	Raw counts with multi-index	Counts per group	Normalized proportions per row	Normalized proportions with margins

Key Moments - 3 Insights

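Why do we use a list for the index parameter in pd.crosstab?

A list of Series (here df['Gender'] and df['AgeGroup']) groups rows by every combination of their values, so the result has a two-level MultiIndex on the rows instead of a single grouping level.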
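Passing a list of Series to index is what creates the two-level row index. A minimal comparison, assuming the sample's toy data; `single` and `multi` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "AgeGroup": ["Adult", "Adult", "Child", "Child", "Adult"],
    "Purchased": ["Yes", "No", "Yes", "No", "Yes"],
})

# A single Series gives one row level; a list of Series gives one level per Series
single = pd.crosstab(df["Gender"], df["Purchased"])
multi = pd.crosstab([df["Gender"], df["AgeGroup"]], df["Purchased"])

print(single.index.nlevels)  # 1
print(multi.index.nlevels)   # 2
print(multi.index.names)     # level names come from the Series names: Gender, AgeGroup

# Selecting an outer-level label returns the block of inner rows for that label
print(multi.loc["F"])
```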

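What does normalize='index' do in the crosstab?

It divides each count by its row total, so every row sums to 1.0. For Male Adult, the single record is 'No', giving No=1.0 and Yes=0.0.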
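Contrasting normalize='index' with normalize='columns' makes the direction of the division visible. A minimal sketch on the sample's toy data; `by_row` and `by_col` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "AgeGroup": ["Adult", "Adult", "Child", "Child", "Adult"],
    "Purchased": ["Yes", "No", "Yes", "No", "Yes"],
})
rows = [df["Gender"], df["AgeGroup"]]

# normalize="index": each count divided by its row total
by_row = pd.crosstab(rows, df["Purchased"], normalize="index")
# normalize="columns": each count divided by its column total instead
by_col = pd.crosstab(rows, df["Purchased"], normalize="columns")

print(by_row.sum(axis=1))  # every row sums to 1.0
print(by_col.sum(axis=0))  # every column sums to 1.0
print(by_col)              # e.g. Male Adult holds 1 of the 2 "No" records -> 0.5
```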

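What is the effect of margins=True?

It adds totals labeled 'All'. With plain counts you get both an All row and an All column; combined with normalize='index', only the All row is kept, holding the overall proportions (No=0.4, Yes=0.6).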
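The shape difference with and without normalization can be checked directly. A minimal sketch on the sample's toy data; `counts`, `props`, and `renamed` are illustrative names, while `margins_name` is a documented pd.crosstab parameter.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "AgeGroup": ["Adult", "Adult", "Child", "Child", "Adult"],
    "Purchased": ["Yes", "No", "Yes", "No", "Yes"],
})
rows = [df["Gender"], df["AgeGroup"]]

# Raw counts: margins add both an "All" row and an "All" column
counts = pd.crosstab(rows, df["Purchased"], margins=True)
# Row-normalized: only the "All" row survives
props = pd.crosstab(rows, df["Purchased"], margins=True, normalize="index")

print(counts.shape)                    # (5, 3)
print(props.shape)                     # (5, 2)
print(counts.loc[("All", ""), "All"])  # grand total: 5

# The margin label can be changed with margins_name
renamed = pd.crosstab(rows, df["Purchased"], margins=True, margins_name="Total")
print(renamed.columns.tolist())        # ['No', 'Yes', 'Total']
```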

Visual Quiz - 3 Questions

Test your understanding

In the Execution Table at step 4, what is the normalized value for Female Adult rows that purchased 'Yes'?

A) 0.0

B) 0.5

C) 1.0

D) 0.33

Concept Snapshot

pd.crosstab(index, columns, margins=False, normalize=False)
- index: row grouping variable(s); pass a list for multi-level rows
- columns: column grouping variable(s); a list also works, giving multi-level columns
- margins=True adds 'All' totals (row and/or column, depending on normalize)
- normalize='index' converts counts to row-wise proportions
- Output: DataFrame summarizing counts or proportions by groups
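The flow's "multiple index levels (rows and columns)" step also applies to the columns argument. A minimal sketch, assuming the sample's toy data; `wide` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "AgeGroup": ["Adult", "Adult", "Child", "Child", "Adult"],
    "Purchased": ["Yes", "No", "Yes", "No", "Yes"],
})

# Gender on rows; AgeGroup x Purchased as two column levels
wide = pd.crosstab(index=df["Gender"],
                   columns=[df["AgeGroup"], df["Purchased"]])

print(wide.columns.nlevels)  # 2
print(wide)

# Selecting the outer column label returns that block: columns No and Yes
print(wide["Adult"])
```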

Full Transcript

This visual trace shows how to use pandas crosstab for advanced grouping. We start with a DataFrame containing Gender, AgeGroup, and Purchased columns. We call pd.crosstab with a list for index to group rows by Gender and AgeGroup, and columns by Purchased. We add margins=True to get totals and normalize='index' to get proportions per row. The execution table shows each step: creating the DataFrame, grouping and counting, normalizing counts to proportions, and adding totals. The variable tracker follows the DataFrame and crosstab output states. Key moments clarify why multi-index is used, what normalization does, and the role of margins. The quiz tests understanding of normalized values, when margins are added, and effect of removing normalization. The snapshot summarizes syntax and key options for quick reference.