0
0
DSA Pythonprogramming

Rabin Karp String Matching in DSA Python

Choose your learning style9 modes available
Mental Model
We use a rolling number code for each part of the text to quickly find where the pattern matches. Instead of checking every letter, we compare these codes.
Analogy: Imagine you want to find a word in a book by checking the sum of letter values instead of reading every letter. If the sums match, you check the word carefully.
Text:  T -> h -> i -> s ->   -> i -> s ->   -> a ->   -> t -> e -> x -> t -> null
Pattern:       ↑   ↑   ↑   ↑
Hash window:  [hash of 'This']
Dry Run Walkthrough
Input: text: 'abababc', pattern: 'abc'
Goal: Find the starting index where 'abc' appears in 'abababc' using rolling hash
Step 1: Calculate hash for pattern 'abc' and first window 'aba' in text
pattern_hash=hash('abc'), window_hash=hash('aba')
Why: We need initial hashes to start comparing pattern with text windows
Step 2: Compare pattern_hash and window_hash; they differ, slide window by 1
window moves to 'bab', window_hash updated using rolling hash
Why: Hashes differ, so pattern not found at index 0; move forward
Step 3: Compare pattern_hash and window_hash for 'bab'; differ again, slide window
window moves to 'aba', window_hash updated
Why: Still no match, continue sliding window
Step 4: Compare pattern_hash and window_hash for 'aba'; differ, slide window
window moves to 'bab', window_hash updated
Why: No match, keep sliding
Step 5: Compare pattern_hash and window_hash for 'bab'; differ, slide window
window moves to 'abc', window_hash updated
Why: Last window to check
Step 6: pattern_hash equals window_hash for 'abc', verify characters match
Match found at index 4
Why: Hashes match, so check actual characters to confirm pattern found
Result:
Final state: Match found at index 4
Annotated Code
DSA Python
class RabinKarp:
    def __init__(self, base=256, prime=101):
        self.base = base
        self.prime = prime

    def search(self, text: str, pattern: str) -> int:
        n, m = len(text), len(pattern)
        if m > n:
            return -1

        pattern_hash = 0
        window_hash = 0
        h = 1  # base^(m-1) % prime

        for i in range(m - 1):
            h = (h * self.base) % self.prime

        for i in range(m):
            pattern_hash = (self.base * pattern_hash + ord(pattern[i])) % self.prime
            window_hash = (self.base * window_hash + ord(text[i])) % self.prime

        for i in range(n - m + 1):
            if pattern_hash == window_hash:
                if text[i:i + m] == pattern:
                    return i
            if i < n - m:
                window_hash = (self.base * (window_hash - ord(text[i]) * h) + ord(text[i + m])) % self.prime
                if window_hash < 0:
                    window_hash += self.prime
        return -1


if __name__ == '__main__':
    rk = RabinKarp()
    text = 'abababc'
    pattern = 'abc'
    index = rk.search(text, pattern)
    if index != -1:
        print(f'Pattern found at index {index}')
    else:
        print('Pattern not found')
for i in range(m - 1): h = (h * self.base) % self.prime
Calculate the high order digit multiplier for rolling hash
for i in range(m): pattern_hash = (self.base * pattern_hash + ord(pattern[i])) % self.prime window_hash = (self.base * window_hash + ord(text[i])) % self.prime
Compute initial hash values for pattern and first text window
if pattern_hash == window_hash: if text[i:i + m] == pattern: return i
Check if hashes match and verify actual substring to confirm pattern
window_hash = (self.base * (window_hash - ord(text[i]) * h) + ord(text[i + m])) % self.prime if window_hash < 0: window_hash += self.prime
Update rolling hash for next window efficiently
OutputSuccess
Pattern found at index 4
Complexity Analysis
Time: O(n + m) because we compute initial hashes in O(m) and slide through text in O(n) with constant time hash updates
Space: O(1) because we use fixed extra space for hash calculations and no extra data structures
vs Alternative: Compared to naive substring search O(n*m), Rabin-Karp is faster on average by using hash comparisons to skip unnecessary checks
Edge Cases
pattern longer than text
returns -1 immediately as no match is possible
DSA Python
if m > n:
    return -1
pattern equals text
returns 0 as the pattern matches at start
DSA Python
if pattern_hash == window_hash:
    if text[i:i + m] == pattern:
        return i
pattern not in text
returns -1 after checking all windows
DSA Python
return -1
When to Use This Pattern
When you need to find a substring efficiently and want to avoid checking every character repeatedly, use Rabin-Karp because it uses rolling hashes to speed up matching.
Common Mistakes
Mistake: Not updating the rolling hash correctly when sliding the window
Fix: Subtract the old char's weighted value and add the new char, then mod prime and adjust if negative
Mistake: Comparing only hashes without verifying actual substring
Fix: Always verify substring characters when hashes match to avoid false positives
Summary
Finds a pattern in text using rolling hash to speed up substring matching
Use when you want faster substring search than naive character-by-character comparison
The key is to use a rolling hash to quickly update and compare hash values for each text window