
crosstab() for cross-tabulation in Pandas - Time & Space Complexity

Time Complexity of crosstab(): O(n)
Understanding Time Complexity

We want to understand how the time to build a cross-tabulation table grows as the data size increases, and specifically how the pandas crosstab() function handles larger inputs.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

# Sample data
colors = ['red', 'blue', 'green', 'red', 'blue']
sizes = ['S', 'M', 'L', 'M', 'S']

# Create cross-tabulation
result = pd.crosstab(colors, sizes)
print(result)

This code counts how many times each color appears with each size, producing a table with one row per distinct color and one column per distinct size.

Identify Repeating Operations

Identify the loops, recursion, or array traversals that repeat.

  • Primary operation: Reading each (color, size) pair from the two input arrays and incrementing the count for that combination.
  • How many times: Once for each row in the input data (n times).
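
The single pass described above can be sketched in plain Python. This is a conceptual illustration only, not how pandas implements crosstab() internally; the names pair_counts, row_labels, col_labels and table are made up for the example.

```python
from collections import Counter

colors = ['red', 'blue', 'green', 'red', 'blue']
sizes = ['S', 'M', 'L', 'M', 'S']

# One pass over the n rows: each (color, size) pair is hashed
# and the counter for that pair is incremented.
pair_counts = Counter(zip(colors, sizes))

# Lay the counts out as a table: one row per distinct color,
# one column per distinct size. Missing pairs count as 0.
row_labels = sorted(set(colors))
col_labels = sorted(set(sizes))
table = [[pair_counts[(r, c)] for c in col_labels] for r in row_labels]

print(col_labels)
for label, row in zip(row_labels, table):
    print(label, row)
```

The counting step touches each row exactly once, which is where the n in O(n) comes from. Laying out the table afterwards depends only on how many distinct colors and sizes exist.
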
How Execution Grows With Input

As the number of rows grows, the function must check each row once to count occurrences.

Input Size (n)    Approx. Operations
10                About 10 checks
100               About 100 checks
1000              About 1000 checks

Pattern observation: The work grows directly with the number of rows, so doubling rows roughly doubles the work.
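
To see this pattern on a real machine, a rough timing experiment along the following lines can help. It is a minimal sketch under stated assumptions: the helper time_crosstab, the random test data and the chosen sizes are all invented for illustration, and the measured times will vary between runs and machines. For small inputs, fixed overhead inside pandas tends to dominate, so the linear trend is easier to see at larger n.

```python
import time

import numpy as np
import pandas as pd


def time_crosstab(n, seed=0):
    """Build n random (color, size) rows and time a single pd.crosstab call."""
    rng = np.random.default_rng(seed)
    colors = rng.choice(['red', 'blue', 'green'], size=n)
    sizes = rng.choice(['S', 'M', 'L'], size=n)

    start = time.perf_counter()
    table = pd.crosstab(colors, sizes)
    elapsed = time.perf_counter() - start
    return elapsed, table


for n in (10_000, 100_000, 1_000_000):
    elapsed, table = time_crosstab(n)
    # The counts in the table always add up to the number of input rows.
    total = int(table.to_numpy().sum())
    print(f"n={n:>9,}  time={elapsed:.4f} s  total count={total:,}")
```
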

Final Time Complexity

Time Complexity: O(n)

This means the time to create the cross-tabulation grows linearly with the number of rows in the input, as long as the number of distinct categories stays small compared to n.

Common Mistake

[X] Wrong: "Because the output is a table, the time must grow with the square of input size."

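[OK] Correct: The function scans each row once and updates counts. It does not compare every pair of rows, so the time grows linearly, not quadratically. The size of the output table depends on the number of distinct values, not on the square of the number of rows.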
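
A quick check makes the point concrete: repeating the same sample data many times increases the number of rows, but the output table keeps the same shape. The variable names base_colors, base_sizes and repeats are chosen for this example only.

```python
import pandas as pd

base_colors = ['red', 'blue', 'green', 'red', 'blue']
base_sizes = ['S', 'M', 'L', 'M', 'S']

for repeats in (1, 200, 20_000):
    # Repeat the 5 sample rows to get a larger input with the same categories.
    colors = base_colors * repeats
    sizes = base_sizes * repeats

    table = pd.crosstab(colors, sizes)
    # The shape stays (3, 3): 3 distinct colors by 3 distinct sizes.
    print(f"input rows: {len(colors):>7,}  table shape: {table.shape}")
```
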

Interview Connect

Understanding how pandas functions like crosstab() scale helps you explain data processing efficiency clearly in interviews.

Self-Check

"What if we added a third variable to group by in crosstab? How would the time complexity change?"
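
One way to explore this question is to pass a list of arrays as the row index, which crosstab() accepts. The sketch below adds a hypothetical third variable, materials, whose values are invented for this illustration. Each input row is still read once, but the table can have more cells because rows are now labelled by (color, size) pairs.

```python
import pandas as pd

colors = ['red', 'blue', 'green', 'red', 'blue']
sizes = ['S', 'M', 'L', 'M', 'S']
# Hypothetical third variable, invented for this illustration.
materials = ['cotton', 'wool', 'cotton', 'cotton', 'wool']

# A list of arrays as the index groups rows by (color, size) pairs;
# the columns are the distinct materials.
result = pd.crosstab([colors, sizes], materials)
print(result)
print("index levels:", result.index.nlevels)
```
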