Regex operations in Pandas - Time & Space Complexity

Time Complexity: Regex operations in Pandas
O(n)
Understanding Time Complexity

When we use regex operations in pandas, we want to know how their running time changes as the data grows.

We ask: how much slower does searching for or matching a pattern get when there is more data?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

df = pd.DataFrame({
    'text': ['apple123', 'banana456', 'cherry789', 'date012'] * 1000
})

matches = df['text'].str.contains(r'\d{3}')

This code checks each string in the 'text' column to see if it contains three digits in a row.
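To see what the snippet produces, the sketch below rebuilds the same DataFrame and inspects `matches`. `str.contains` returns a boolean Series with one entry per row. Every sample string ends in three digits, so every entry should be True.

```python
import pandas as pd

# Same data as the snippet: 4 strings repeated 1000 times -> 4000 rows
df = pd.DataFrame({
    'text': ['apple123', 'banana456', 'cherry789', 'date012'] * 1000
})

# One boolean per row: True where the string contains three consecutive digits
matches = df['text'].str.contains(r'\d{3}')

print(len(matches))   # 4000: one result per row
print(matches.sum())  # 4000: every sample string contains three digits
```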

Identify Repeating Operations
  • Primary operation: Applying the regex pattern to each string in the column.
  • How many times: Once for every row in the DataFrame.
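Conceptually, the vectorized call does roughly the same work as the explicit loop below: compile the pattern once, then run one search per string. This is a sketch for intuition, not a description of pandas' internals, which may differ. The `checks` counter is added only to make the per-row work visible.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'text': ['apple123', 'banana456', 'cherry789', 'date012'] * 1000
})

pattern = re.compile(r'\d{3}')  # compiled once, reused for every row

checks = 0
results = []
for value in df['text']:
    checks += 1  # one regex search per row
    results.append(pattern.search(value) is not None)

# The loop performs exactly one search per row...
print(checks == len(df))
# ...and gives the same answers as the vectorized version
print(results == df['text'].str.contains(r'\d{3}').tolist())
```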
How Execution Grows With Input

As the number of rows grows, the total work grows in proportion, because the pattern is applied once per row.

Input Size (n)    Approx. Operations
10                About 10 regex checks
100               About 100 regex checks
1000              About 1000 regex checks

Pattern observation: Doubling the rows roughly doubles the work because each row is checked once.
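The table can be reproduced by counting regex calls instead of timing them, which gives exact numbers. The sketch below uses a hypothetical helper, `count_regex_checks`, that applies a counting wrapper around `re.search` with `Series.map` on an object-dtype Series. It models the per-row work rather than instrumenting `str.contains` itself.

```python
import re
import pandas as pd

PATTERN = re.compile(r'\d{3}')

def count_regex_checks(n_rows):
    """Hypothetical helper: run the three-digit check on n_rows strings
    and return how many regex searches were performed."""
    calls = 0

    def counting_contains(value):
        nonlocal calls
        calls += 1  # count every regex search
        return PATTERN.search(value) is not None

    base = ['apple123', 'banana456', 'cherry789', 'date012']
    # Build exactly n_rows strings; object dtype keeps Python's re engine
    series = pd.Series((base * (n_rows // len(base) + 1))[:n_rows], dtype=object)
    series.map(counting_contains)
    return calls

for n in [10, 100, 1000, 2000]:
    # Each line prints n followed by the same n: one check per row
    print(n, count_regex_checks(n))
```

Going from 1000 to 2000 rows doubles the count, matching the pattern observation.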

Final Time Complexity

Time Complexity: O(n)

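This means the time grows linearly with the number of rows: more rows mean proportionally more work. This assumes the strings stay short and of similar length, as they do here. More precisely, for a simple pattern like \d{3} the cost scales with the total number of characters the regex has to scan.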
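A rough way to check the linear trend on your own machine is to time `str.contains` at a few sizes and compare the time per row. Absolute numbers depend on hardware and the pandas version, so the sketch prints measurements instead of promising specific values. `time_contains` is a hypothetical helper name, and the sizes are arbitrary choices.

```python
import time
import pandas as pd

def time_contains(n_rows, repeats=3):
    """Hypothetical helper: best-of-`repeats` wall-clock seconds for one
    str.contains call over n_rows strings."""
    base = ['apple123', 'banana456', 'cherry789', 'date012']
    s = pd.Series((base * (n_rows // len(base) + 1))[:n_rows])
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        s.str.contains(r'\d{3}')
        best = min(best, time.perf_counter() - start)
    return best

for n in [10_000, 100_000, 1_000_000]:
    t = time_contains(n)
    # With linear growth, time per row should stay roughly constant
    print(f"n={n:>9,}  total={t:.4f}s  per row={t / n * 1e9:.1f} ns")
```

At small sizes, fixed per-call overhead can make the time per row look higher; the trend is easier to see at larger n.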

Common Mistake

[X] Wrong: "Regex operations take constant time no matter how many rows there are."

[OK] Correct: Each row is checked separately, so more rows mean more checks and more time.

Interview Connect

Understanding how regex operations scale helps you explain your code's speed and handle bigger data confidently.

Self-Check

"What if we changed the regex to a more complex pattern that takes longer to match? How would the time complexity change?"
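One way to explore this question is to time a simple pattern and a heavier one on the same data at two sizes. Then check whether the heavier pattern changes how the time grows with n, or only how much each row costs. The helper `best_time`, the second pattern, and the sizes are illustrative assumptions. The Series uses object dtype so both patterns run through Python's `re` module.

```python
import time
import pandas as pd

def best_time(series, pattern, repeats=3):
    """Hypothetical helper: best-of-`repeats` seconds for one
    str.contains call with `pattern`."""
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        series.str.contains(pattern)
        best = min(best, time.perf_counter() - start)
    return best

base = ['apple123', 'banana456', 'cherry789', 'date012']
simple = r'\d{3}'
# Illustrative "heavier" pattern: alternation plus an end anchor
heavier = r'(?:apple|banana|cherry|date)\d{3}$'

for n in [100_000, 200_000]:
    s = pd.Series(base * (n // len(base)), dtype=object)
    t_simple = best_time(s, simple)
    t_heavy = best_time(s, heavier)
    print(f"n={n:>7,}  simple={t_simple:.4f}s  heavier={t_heavy:.4f}s")
```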