0
0
Pandasdata~5 mins

Adding and removing categories in Pandas - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: Adding and removing categories
O(n + k)
Understanding Time Complexity

We want to understand how the time needed changes when we add or remove categories in a pandas categorical column.

How does the work grow as the number of categories or data size grows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

# Create a categorical column
cat = pd.Categorical(["a", "b", "a", "c"])

# Add a new category
cat = cat.add_categories(["d"])

# Remove a category
cat = cat.remove_categories(["b"])

This code creates a categorical column, then adds a new category, and finally removes an existing category.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: Updating the internal categories list and mapping data to new categories.
  • How many times: The operations scan through the categories and sometimes the data values once each.
How Execution Grows With Input

When adding or removing categories, pandas updates the list of categories and may check data values to adjust mappings.

Input Size (n)Approx. Operations
10 categories, 1000 data pointsOperations scale with categories and data size, roughly scanning categories and data once.
100 categories, 10,000 data pointsOperations increase roughly 10 times, scanning more categories and data.
1000 categories, 100,000 data pointsOperations increase roughly 100 times, scanning all categories and data.

Pattern observation: The work grows roughly linearly with the number of categories and data points.

Final Time Complexity

Time Complexity: O(n + k)

This means the time grows linearly with the number of data points (n) plus the number of categories (k).

Common Mistake

[X] Wrong: "Adding or removing categories is instant and does not depend on data size."

[OK] Correct: Pandas must update category mappings and sometimes check all data points, so time depends on both categories and data size.

Interview Connect

Understanding how operations scale with data size and categories helps you explain performance in real data tasks clearly and confidently.

Self-Check

"What if we add multiple categories at once instead of one by one? How would the time complexity change?"