DSA Pythonprogramming

Rabin Karp String Matching in DSA Python

Choose your learning style9 modes available

Learn Why Deep Visual Try Challenge Project Recall Time

Mental Model

We use a rolling number code for each part of the text to quickly find where the pattern matches. Instead of checking every letter, we compare these codes.

Analogy: Imagine you want to find a word in a book by checking the sum of letter values instead of reading every letter. If the sums match, you check the word carefully.

Text:  T -> h -> i -> s ->   -> i -> s ->   -> a ->   -> t -> e -> x -> t -> null
Pattern:       ↑   ↑   ↑   ↑
Hash window:  [hash of 'This']

Dry Run Walkthrough

Input: text: 'abababc', pattern: 'abc'

Goal: Find the starting index where 'abc' appears in 'abababc' using rolling hash

Step 1: Calculate hash for pattern 'abc' and first window 'aba' in text

pattern_hash=hash('abc'), window_hash=hash('aba')

Why: We need initial hashes to start comparing pattern with text windows

Step 2: Compare pattern_hash and window_hash; they differ, slide window by 1

window moves to 'bab', window_hash updated using rolling hash

Why: Hashes differ, so pattern not found at index 0; move forward

Step 3: Compare pattern_hash and window_hash for 'bab'; differ again, slide window

window moves to 'aba', window_hash updated

Why: Still no match, continue sliding window

Step 4: Compare pattern_hash and window_hash for 'aba'; differ, slide window

window moves to 'bab', window_hash updated

Why: No match, keep sliding

Step 5: Compare pattern_hash and window_hash for 'bab'; differ, slide window

window moves to 'abc', window_hash updated

Why: Last window to check

Step 6: pattern_hash equals window_hash for 'abc', verify characters match

Match found at index 4

Why: Hashes match, so check actual characters to confirm pattern found

Result:

Final state: Match found at index 4

Annotated Code

DSA Python

class RabinKarp:
    def __init__(self, base=256, prime=101):
        self.base = base
        self.prime = prime

    def search(self, text: str, pattern: str) -> int:
        n, m = len(text), len(pattern)
        if m > n:
            return -1

        pattern_hash = 0
        window_hash = 0
        h = 1  # base^(m-1) % prime

        for i in range(m - 1):
            h = (h * self.base) % self.prime

        for i in range(m):
            pattern_hash = (self.base * pattern_hash + ord(pattern[i])) % self.prime
            window_hash = (self.base * window_hash + ord(text[i])) % self.prime

        for i in range(n - m + 1):
            if pattern_hash == window_hash:
                if text[i:i + m] == pattern:
                    return i
            if i < n - m:
                window_hash = (self.base * (window_hash - ord(text[i]) * h) + ord(text[i + m])) % self.prime
                if window_hash < 0:
                    window_hash += self.prime
        return -1


if __name__ == '__main__':
    rk = RabinKarp()
    text = 'abababc'
    pattern = 'abc'
    index = rk.search(text, pattern)
    if index != -1:
        print(f'Pattern found at index {index}')
    else:
        print('Pattern not found')

for i in range(m - 1):
    h = (h * self.base) % self.prime

Calculate the high order digit multiplier for rolling hash

for i in range(m):
    pattern_hash = (self.base * pattern_hash + ord(pattern[i])) % self.prime
    window_hash = (self.base * window_hash + ord(text[i])) % self.prime

Compute initial hash values for pattern and first text window

if pattern_hash == window_hash:
    if text[i:i + m] == pattern:
        return i

Check if hashes match and verify actual substring to confirm pattern

window_hash = (self.base * (window_hash - ord(text[i]) * h) + ord(text[i + m])) % self.prime
if window_hash < 0:
    window_hash += self.prime

Update rolling hash for next window efficiently

OutputSuccess

Pattern found at index 4

Complexity Analysis

Time: O(n + m) because we compute initial hashes in O(m) and slide through text in O(n) with constant time hash updates

Space: O(1) because we use fixed extra space for hash calculations and no extra data structures

vs Alternative: Compared to naive substring search O(n*m), Rabin-Karp is faster on average by using hash comparisons to skip unnecessary checks

Edge Cases

pattern longer than text

returns -1 immediately as no match is possible

DSA Python

if m > n:
    return -1

pattern equals text

returns 0 as the pattern matches at start

DSA Python

if pattern_hash == window_hash:
    if text[i:i + m] == pattern:
        return i

pattern not in text

returns -1 after checking all windows

DSA Python

return -1

When to Use This Pattern

When you need to find a substring efficiently and want to avoid checking every character repeatedly, use Rabin-Karp because it uses rolling hashes to speed up matching.

Common Mistakes

Mistake: Not updating the rolling hash correctly when sliding the window
Fix: Subtract the old char's weighted value and add the new char, then mod prime and adjust if negative

Mistake: Comparing only hashes without verifying actual substring
Fix: Always verify substring characters when hashes match to avoid false positives

Summary

Finds a pattern in text using rolling hash to speed up substring matching

Use when you want faster substring search than naive character-by-character comparison

The key is to use a rolling hash to quickly update and compare hash values for each text window