
Converting to categorical in Pandas - Time & Space Complexity

Time Complexity of converting to categorical: O(n)
Understanding Time Complexity

We want to understand how long it takes to convert a column in a DataFrame to a categorical type.

How does the time needed change when the data size grows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

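import pandas as pd

# 5 color names repeated 1000 times -> a 5000-row column with 3 distinct values
df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue', 'red'] * 1000
})
# Convert to categorical: pandas finds the distinct values (the categories)
# and stores each row as a small integer code pointing into them
df['color_cat'] = df['color'].astype('category')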

This code creates a 5,000-row DataFrame of repeated color names (only three distinct values) and stores a categorical version of the 'color' column in a new 'color_cat' column.
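To see what the conversion actually produces, the short script below inspects the result using pandas' `.cat` accessor. It shows the two pieces a categorical column holds: the list of distinct categories (sorted, because strings can be compared) and one small integer code per row. The `print` comments reflect what pandas produces for this input.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'blue', 'green', 'blue', 'red'] * 1000
})
df['color_cat'] = df['color'].astype('category')

# The distinct values become the categories (sorted, since strings are comparable)
categories = list(df['color_cat'].cat.categories)
# Each row is stored as an integer code that indexes into `categories`
codes = df['color_cat'].cat.codes

print(categories)              # ['blue', 'green', 'red']
print(codes.head(5).tolist())  # [2, 0, 1, 0, 2]
print(codes.dtype)             # int8 (3 categories fit in the smallest integer type)
print(len(codes))              # 5000, one code per row
```

Producing one code per row is exactly why the conversion has to visit every row.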

Identify Repeating Operations

Identify the loops, recursion, and array traversals that repeat.

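  • Primary operation: Scanning each value in the column, looking it up among the distinct values found so far (pandas uses a hash table for this), and recording its integer category code.
  • How many times: Once for each row in the column (n times).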
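The scan described in these bullets can be sketched in plain Python. This is a hypothetical, simplified model of the idea, not pandas' actual implementation: the function name `to_codes` is made up, and a Python `dict` stands in for pandas' internal hash table. It makes one pass over the n values with one dictionary lookup each, then sorts the k distinct values to mirror pandas' default of sorted categories.

```python
def to_codes(values):
    """Hypothetical sketch of encoding a column as categories + integer codes."""
    first_seen = {}  # value -> provisional code, in order of first appearance
    codes = []
    for v in values:                 # one iteration per row: n in total
        code = first_seen.get(v)     # one hash lookup per row
        if code is None:             # value not seen before: new category
            code = len(first_seen)
            first_seen[v] = code
        codes.append(code)
    # Sort the k distinct values (O(k log k)), then remap codes to sorted order
    categories = sorted(first_seen)
    remap = {first_seen[c]: i for i, c in enumerate(categories)}
    return categories, [remap[c] for c in codes]


cats, codes = to_codes(['red', 'blue', 'green', 'blue', 'red'])
print(cats)   # ['blue', 'green', 'red']
print(codes)  # [2, 0, 1, 0, 2]
```

The remapping pass is a second linear pass, so the total work stays proportional to n, plus the sort over the k distinct values.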
How Execution Grows With Input

As the number of rows increases, the time to convert grows roughly in direct proportion.

Input Size (n)    Approx. Operations
10                About 10 checks and assignments
100               About 100 checks and assignments
1000              About 1000 checks and assignments

Pattern observation: Each tenfold increase in rows gives roughly tenfold more work, and doubling the input roughly doubles it.
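The operation counts in the table can be reproduced by instrumenting a single-pass encoder. This is a hypothetical counting sketch (the helper `count_lookups` is not part of pandas). It counts one membership check per row, which is the "check" in the table, and ignores the extra insert done for each new value.

```python
def count_lookups(values):
    """Count the per-row membership checks a single-pass encoder performs."""
    seen = {}
    lookups = 0
    for v in values:
        lookups += 1              # one hash lookup per row
        if v not in seen:
            seen[v] = len(seen)   # first occurrence: assign the next code
    return lookups


base = ['red', 'blue', 'green', 'blue', 'red']
counts = {}
for n in (10, 100, 1000, 2000):
    column = (base * (n // len(base) + 1))[:n]   # an n-row column of colors
    counts[n] = count_lookups(column)
    print(n, counts[n])
# 10 10
# 100 100
# 1000 1000
# 2000 2000
```

The count equals n at every size, so going from 1000 to 2000 rows doubles the work.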

Final Time Complexity

Time Complexity: O(n)

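This means the time to convert grows linearly with the number of rows in the column. (Strictly, pandas also sorts the distinct values it finds to form the categories, which adds O(k log k) for k distinct values; with only three colors, as here, that cost is negligible.)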
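Linear growth can be checked empirically by timing `astype('category')` on columns of increasing length. The sketch below uses a hypothetical helper, `time_conversion`, and takes the best of several runs to reduce noise. Absolute timings depend on your machine and pandas version, so no expected output is shown. For large enough n, each doubling of n should roughly double the time. At small sizes, fixed overhead tends to dominate.

```python
import time

import pandas as pd


def time_conversion(n, repeats=5):
    """Best-of-`repeats` seconds to convert an n-row string Series to category."""
    base = ['red', 'blue', 'green', 'blue', 'red']
    s = pd.Series((base * (n // len(base) + 1))[:n])   # n rows, 3 distinct values
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        s.astype('category')
        best = min(best, time.perf_counter() - start)
    return best


for n in (100_000, 200_000, 400_000, 800_000):
    print(f"n={n:>7}: {time_conversion(n):.4f} s")
```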

Common Mistake

[X] Wrong: "Converting to categorical is instant no matter the data size."

[OK] Correct: The operation must look at each value to assign categories, so it takes longer with more data.

Interview Connect

Understanding how data type conversions scale helps you write efficient data processing code in real projects.

Self-Check

"What if the column already has only a few unique values? How would that affect the time complexity?"
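One way to explore this question is to hold the number of rows n fixed and vary the number of distinct values k. The helper `time_with_k_unique` below is hypothetical, and the timings depend on your machine, so no output is shown. Every row is still visited once whatever k is. When reading the results, consider how much of any difference comes from the extra distinct values, which means more categories to hash, store, and sort.

```python
import time

import pandas as pd


def time_with_k_unique(n, k, repeats=5):
    """Best-of-`repeats` seconds to convert n rows drawn from k distinct strings."""
    s = pd.Series([f"value_{i % k}" for i in range(n)])
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        cat = s.astype('category')
        best = min(best, time.perf_counter() - start)
    assert len(cat.cat.categories) == k   # sanity check: k categories were created
    return best


n = 200_000
for k in (5, 1_000, n):
    print(f"k={k:>7}: {time_with_k_unique(n, k):.4f} s")
```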