0
0
Pandasdata~5 mins

Ordered categories in Pandas - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Ordered categories
O(n log n)
Understanding Time Complexity

We want to understand how the time needed to work with ordered categories changes as the data grows.

Specifically, how does sorting or comparing ordered categories scale with more data?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

# Create a categorical column with order
cats = pd.Categorical(["low", "medium", "high", "medium", "low"], 
                      categories=["low", "medium", "high"], ordered=True)

# Create a DataFrame
df = pd.DataFrame({"priority": cats})

# Sort the DataFrame by the ordered category
sorted_df = df.sort_values(by="priority")

This code creates an ordered categorical column and sorts the DataFrame by that column.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: Sorting the DataFrame rows based on the ordered category column.
  • How many times: The sorting algorithm compares elements multiple times, roughly proportional to n log n, where n is the number of rows.
How Execution Grows With Input

As the number of rows increases, the sorting work grows faster than just adding rows.

Input Size (n)Approx. Operations
10About 30 comparisons
100About 700 comparisons
1000About 10,000 comparisons

Pattern observation: The number of operations grows faster than the number of rows, roughly like n times log n.

Final Time Complexity

Time Complexity: O(n log n)

This means that sorting ordered categories takes more time as data grows, but not as fast as checking every pair.

Common Mistake

[X] Wrong: "Sorting ordered categories is just as fast as scanning the data once."

[OK] Correct: Sorting needs multiple comparisons and rearrangements, which take more time than just looking at each item once.

Interview Connect

Understanding how sorting ordered categories scales helps you explain data processing steps clearly and shows you know how data size affects performance.

Self-Check

"What if the categories were unordered? How would sorting by that column affect time complexity?"