Pattern Matching with str.contains in Python Data Analysis - Time & Space Complexity
We want to understand how the time to find patterns in text grows as the data gets bigger.
How does the time to search many text entries for a pattern change as the list grows?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# 5,000 strings: the five fruit names repeated 1,000 times.
data = pd.Series(["apple", "banana", "apricot", "cherry", "pineapple"] * 1000)
pattern = "app"

# Boolean Series: True where the string contains "app".
matches = data.str.contains(pattern)
```
This code checks each string in the Series to see if it contains the pattern "app".
- Primary operation: Checking each string for the pattern using str.contains.
- How many times: Once for each string in the list (n times).
Each new string adds one more pattern check, so the work grows steadily with the number of strings.
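To make those n per-string checks explicit, the same work can be sketched in plain Python with a list comprehension (using a plain list here instead of a pandas Series):

```python
strings = ["apple", "banana", "apricot", "cherry", "pineapple"]
pattern = "app"

# One substring check per string: n checks total.
matches = [pattern in s for s in strings]
print(matches)  # [True, False, False, False, True]
```

Note that "apricot" does not match: it contains "ap" but not "app", while "pineapple" matches because it contains "apple".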
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 pattern checks |
| 100 | 100 pattern checks |
| 1000 | 1000 pattern checks |
Pattern observation: for strings of a fixed length, the time grows in direct proportion to the number of strings.
Time Complexity: O(n * m)
Here n is the number of strings and m is the typical string length: the pattern search must scan inside each string, so both factors contribute to the total time.
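A simplified sketch of the scan behind a substring check makes the m factor visible. This naive version counts character comparisons (the real implementation in Python and pandas is more optimized, but the scaling intuition is the same): longer strings force more comparisons even when the pattern never matches.

```python
def naive_contains(text, pattern):
    """Naive substring scan: try the pattern at each start position,
    counting every character comparison performed."""
    comparisons = 0
    for i in range(len(text) - len(pattern) + 1):
        for j in range(len(pattern)):
            comparisons += 1
            if text[i + j] != pattern[j]:
                break  # mismatch: slide to the next start position
        else:
            return True, comparisons  # all characters matched
    return False, comparisons

# Same pattern, same (non-matching) content, different lengths.
_, short_cost = naive_contains("banana", "app")
_, long_cost = naive_contains("banana" * 10, "app")
print(long_cost > short_cost)  # True: longer strings cost more comparisons
```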
[X] Wrong: "The time to find patterns is just O(n) because it only depends on the number of strings."
[OK] Correct: Each string's length matters because the pattern search scans across the string's characters, so longer strings take more time.
Understanding how pattern search scales helps you explain your code choices clearly and shows you know how data size affects performance.
"What if the pattern was a regular expression instead of a simple string? How would the time complexity change?"