0
0
Pandasdata~5 mins

str.contains() for pattern matching in Pandas - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: str.contains() for pattern matching
O(n * m)
Understanding Time Complexity

We want to understand how the time needed to find patterns in text grows as the data gets bigger.

How does using str.contains() on many text entries affect the time it takes?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

data = pd.Series(["apple", "banana", "apricot", "cherry", "pineapple"] * 1000)
pattern = "app"
matches = data.str.contains(pattern)

This code checks each string in the series to see if it contains the pattern "app".

Identify Repeating Operations
  • Primary operation: Checking each string for the pattern using str.contains().
  • How many times: Once for each string in the series (n times).
How Execution Grows With Input

Each string is checked one by one. If there are more strings, the work grows in a straight line.

Input Size (n)Approx. Operations
1010 pattern checks
100100 pattern checks
10001000 pattern checks

Pattern observation: Doubling the number of strings roughly doubles the work.

Final Time Complexity

Time Complexity: O(n * m)

This means the time grows with the number of strings (n) and the average length of each string (m).

Common Mistake

[X] Wrong: "The time only depends on the number of strings, not their length."

[OK] Correct: Each string's length affects how long the pattern search takes, so longer strings mean more work per check.

Interview Connect

Understanding how pattern matching scales helps you explain and improve text search tasks in real projects.

Self-Check

"What if we used a compiled regular expression with str.contains()? How would the time complexity change?"