
Pattern matching with str.contains in Data Analysis Python - Time & Space Complexity

Understanding Time Complexity

We want to understand how the time needed to find a pattern in text grows as the data gets bigger.

How does the time to search many text entries for a pattern change as the number of entries grows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

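import pandas as pd

# 5 distinct strings repeated 1,000 times: 5,000 entries in total.
data = pd.Series(["apple", "banana", "apricot", "cherry", "pineapple"] * 1000)
pattern = "app"
# str.contains treats the pattern as a regular expression by default (regex=True);
# "app" has no special characters, so it behaves like a plain substring here.
matches = data.str.contains(pattern)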

This code checks each of the 5,000 strings in the Series to see whether it contains the pattern "app". The result is a boolean Series with one True or False value per string.
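To make the repeated work visible, here is a minimal pure-Python sketch of the same check. It is only a conceptual model, not how pandas implements str.contains internally, and the names strings, matches, and match_count are chosen for illustration.

```python
# Conceptual model only: one substring test per string.
# pandas does this work internally; this loop just makes it explicit.
strings = ["apple", "banana", "apricot", "cherry", "pineapple"] * 1000
pattern = "app"

# n strings means n containment tests.
matches = [pattern in s for s in strings]

# Only "apple" and "pineapple" contain "app", so 2 of every 5 entries match.
match_count = sum(matches)
print(len(matches), match_count)
```

The list has one entry per input string, which is why the number of checks tracks the number of strings.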

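Identify Repeating Operations

  • Primary operation: Checking each string for the pattern using str.contains.
  • How many times: Once for each string in the Series (n times).
  • Cost of each check: The search scans the characters of the string, so one check can take up to about m steps for a string of length m.

How Execution Grows With Input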

Each new string adds one more pattern check, so the work grows steadily with the number of strings.

Input Size (n)    Approx. Operations
10                10 pattern checks
100               100 pattern checks
1000              1000 pattern checks

Pattern observation: With string lengths held roughly constant, the time grows in direct proportion to the number of strings.
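The counts in the table can be reproduced with a small sketch that tallies one check per string. The helper count_pattern_checks and the results dictionary are hypothetical names introduced here, and the loop is a plain-Python stand-in for the work pandas does.

```python
def count_pattern_checks(strings, pattern):
    """Count how many containment tests are run: one per string."""
    checks = 0
    for s in strings:
        checks += 1
        _ = pattern in s  # the check itself; its result is not needed here
    return checks

base = ["apple", "banana", "apricot", "cherry", "pineapple"]
results = {}
for n in (10, 100, 1000):
    # Build a list of exactly n strings by repeating the base list.
    strings = (base * (n // len(base) + 1))[:n]
    results[n] = count_pattern_checks(strings, "app")
    print(n, results[n])
```

Multiplying the input size by ten multiplies the number of checks by ten.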

Final Time Complexity

Time Complexity: O(n * m)

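This means the time grows with both the number of strings (n) and the length of each string (m): every one of the n strings is searched, and each search may scan up to m characters. The pattern here is short and fixed, so its length is treated as a constant.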
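The n * m growth can be illustrated by counting character comparisons in a naive substring search. This is a simplified model under stated assumptions: Python's built-in search and pandas use more optimized algorithms, and naive_contains and total_comparisons are hypothetical helpers written for this sketch.

```python
def naive_contains(text, pattern):
    """Naive substring search that also counts character comparisons."""
    comparisons = 0
    k = len(pattern)
    for start in range(len(text) - k + 1):
        matched = True
        for offset in range(k):
            comparisons += 1
            if text[start + offset] != pattern[offset]:
                matched = False
                break
        if matched:
            return True, comparisons
    return False, comparisons

def total_comparisons(n, m, pattern="app"):
    """Search n strings of length m that never match (worst case for scanning)."""
    strings = ["b" * m] * n
    return sum(naive_contains(s, pattern)[1] for s in strings)

print(total_comparisons(10, 100))  # baseline
print(total_comparisons(20, 100))  # twice as many strings
print(total_comparisons(10, 200))  # strings twice as long
```

Doubling either the number of strings or their length roughly doubles the comparison count, which is the behavior O(n * m) describes.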

Common Mistake

[X] Wrong: "The time to find patterns is just O(n) because it only depends on the number of strings."

[OK] Correct: Each string's length matters because the search may have to scan every character of a string, so longer strings take more time.

Interview Connect

Understanding how pattern search scales helps you explain your code choices clearly and shows you know how data size affects performance.

Self-Check

"What if the pattern were a regular expression instead of a simple string? How would the time complexity change?"
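As a starting point for exploring this question, note that str.contains already interprets its pattern as a regular expression by default, and that passing regex=False requests a literal substring match. The small Series below is made up for illustration. It shows only how the two modes differ in matching behavior and does not measure running time.

```python
import pandas as pd

# Made-up sample data for illustration.
data = pd.Series(["abc", "a.c", "xyz"])

# Default (regex=True): "." matches any single character.
as_regex = data.str.contains("a.c").tolist()

# regex=False: "." is treated as a literal dot.
as_literal = data.str.contains("a.c", regex=False).tolist()

print(as_regex)    # [True, True, False]
print(as_literal)  # [False, True, False]
```

Timing both modes on larger data of your own is one way to see how much the choice of pattern type affects the cost per string.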