
Extracting with str.extract (regex) in Data Analysis Python - Time & Space Complexity

Time Complexity: Extracting with str.extract (regex)
O(n)
Understanding Time Complexity

We want to understand how the time needed to extract text using regex grows as the data size increases.

How does the extraction time change when we have more rows to process?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

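import pandas as pd

# 3,000 rows: the three sample strings repeated 1,000 times
data = pd.Series(['abc123', 'def456', 'ghi789'] * 1000)
# One capture group: the first run of digits in each string
pattern = r'(\d+)'
# Runs the pattern once per row; returns a DataFrame with one column (0)
extracted = data.str.extract(pattern)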

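This code extracts the numeric part of each string in a pandas Series using a regex pattern. Because the pattern has one capture group, str.extract returns a DataFrame with a single column, labeled 0, that holds the first run of digits found in each string.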
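A quick way to see what str.extract produces is to inspect its result on a few rows. This sketch assumes pandas is installed and uses the default expand=True, so a single capture group yields a one-column DataFrame.

```python
import pandas as pd

# Small sample so the result is easy to inspect
data = pd.Series(['abc123', 'def456', 'ghi789'])

# One capture group -> a DataFrame with one column labeled 0
extracted = data.str.extract(r'(\d+)')

print(type(extracted).__name__)  # DataFrame
print(extracted[0].tolist())     # ['123', '456', '789']
print(len(extracted))            # 3, one output row per input row
```

The output always has one row per input row, which is the first hint that the work scales with the number of rows.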

Identify Repeating Operations

Identify the loops, recursion, and array traversals that repeat.

  • Primary operation: Applying the regex extraction on each string in the Series.
  • How many times: Once per element in the Series, so as many times as the number of rows.
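Conceptually, the per-element work resembles the plain-Python loop below, which runs one regex search per string. This is a simplified model for reasoning about cost, not pandas's actual implementation; the helper name extract_digits_loop is hypothetical.

```python
import re

import pandas as pd

# Compile once; the same pattern is reused for every row
PATTERN = re.compile(r'(\d+)')


def extract_digits_loop(values):
    """Hypothetical helper: one regex search per element, so n elements -> n searches."""
    results = []
    searches = 0
    for s in values:
        match = PATTERN.search(s)  # the repeated operation
        searches += 1
        results.append(match.group(1) if match else None)
    return results, searches


data = pd.Series(['abc123', 'def456', 'ghi789'] * 1000)
values, searches = extract_digits_loop(data)
print(searches)  # 3000, one search per row

# The loop produces the same values as the pandas call
print(values == data.str.extract(r'(\d+)')[0].tolist())  # True
```

The counter makes the "once per element" claim concrete: the number of searches equals len(data).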
How Execution Grows With Input

As the number of rows grows, the total work grows roughly in direct proportion.

Input Size (n) | Approx. Operations
10             | 10 regex extractions
100            | 100 regex extractions
1000           | 1000 regex extractions

Pattern observation: Doubling the input roughly doubles the work, because each string is processed exactly once (assuming the strings have similar lengths).
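One rough way to check the doubling pattern is to time str.extract at a few sizes. This is a minimal sketch; absolute timings depend on the machine, the pandas version, and string lengths, so expect the ratio between consecutive sizes to be only approximately 2. The helper time_extract is hypothetical.

```python
import time

import pandas as pd


def time_extract(n, repeats=5):
    """Hypothetical helper: best-of-`repeats` wall time (seconds) for str.extract on n rows."""
    # Cycle the three sample strings to build exactly n rows
    base = ['abc123', 'def456', 'ghi789']
    data = pd.Series([base[i % 3] for i in range(n)])
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        data.str.extract(r'(\d+)')
        best = min(best, time.perf_counter() - start)
    return best


sizes = [10_000, 20_000, 40_000]
timings = {n: time_extract(n) for n in sizes}
for n in sizes:
    print(f"n={n:>6}: {timings[n]:.4f} s")

# Each doubling of n should roughly double the time (ratios near 2)
print(timings[20_000] / timings[10_000], timings[40_000] / timings[20_000])
```

Taking the best of several runs reduces noise from other processes; very small n can be dominated by fixed per-call overhead, which is why the sizes start in the thousands.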

Final Time Complexity

Time Complexity: O(n)

This means the time to extract grows linearly with the number of strings you process.
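The same conclusion can be written as a sum of per-row costs. Here c_i is a symbol introduced for illustration: the cost of running the pattern on row i. Assuming every c_i is bounded by some constant c_max (for example, strings of bounded length and a pattern without heavy backtracking), the total is linear in n.

```latex
T(n) = \sum_{i=1}^{n} c_i \;\le\; n \cdot c_{\max}
\quad\Longrightarrow\quad T(n) = O(n)
```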

Common Mistake

[X] Wrong: "Using regex extraction is instant no matter how many rows there are."

[OK] Correct: Each row requires running the regex, so more rows mean more work and more time.

Interview Connect

Understanding how regex extraction scales helps you explain performance when working with text data in real projects.

Self-Check

"What if the regex pattern was more complex and slower to match? How would that affect the time complexity?"