Bird
0
0
DSA Cprogramming

Rabin Karp String Matching in DSA C

Choose your learning style9 modes available
Mental Model
We use a number (hash) to represent a string part and compare these numbers to find matches quickly.
Analogy: Imagine checking if a small book page is inside a big book by comparing page numbers instead of reading every word each time.
Text:  T  h  i  s  i  s  a  t  e  s  t  
Index: 0->1->2->3->4->5->6->7->8->9->10->11->null
Pattern: t  e  s  t
Index: 0->1->2->3->null
Dry Run Walkthrough
Input: text: "testtext", pattern: "test"
Goal: Find all starting positions in text where pattern "test" appears.
Step 1: Calculate hash of pattern "test" and first 4 characters of text "test"
pattern_hash=hash("test")=X, text_hash=hash("test")=X
Why: We need these hashes to compare quickly without checking characters one by one.
Step 2: Compare pattern_hash and text_hash at index 0
pattern_hash == text_hash, check characters: 't'=='t', 'e'=='e', 's'=='s', 't'=='t'
Why: Hashes match, so verify characters to confirm actual match.
Step 3: Slide window by 1: remove 't', add 'e' to text_hash for substring "estt"
text_hash=hash("estt") != pattern_hash
Why: Update hash efficiently to check next substring without full recomputation.
Step 4: Slide window by 1: remove 'e', add 's' to text_hash for substring "stte"
text_hash != pattern_hash
Why: Continue sliding to check all substrings.
Step 5: Slide window by 1: remove 's', add 't' to text_hash for substring "ttex"
text_hash != pattern_hash
Why: Keep sliding until end of text.
Step 6: Slide window by 1: remove 't', add 'x' to text_hash for substring "text"
text_hash != pattern_hash
Why: Checked all possible positions, only match at index 0.
Result:
Matches found at indices: 0 -> null
Annotated Code
DSA C
#include <stdio.h>
#include <string.h>
#define d 256
#define q 101

void rabinKarp(char* text, char* pattern) {
    int n = strlen(text);
    int m = strlen(pattern);
    int i, j;
    int p = 0; // hash value for pattern
    int t = 0; // hash value for text
    int h = 1;

    // The value of h would be "pow(d, m-1)%q"
    for (i = 0; i < m - 1; i++)
        h = (h * d) % q;

    // Calculate the hash value of pattern and first window of text
    for (i = 0; i < m; i++) {
        p = (d * p + pattern[i]) % q;
        t = (d * t + text[i]) % q;
    }

    // Slide the pattern over text one by one
    for (i = 0; i <= n - m; i++) {
        // Check the hash values of current window of text and pattern
        if (p == t) {
            // If the hash values match, check characters one by one
            for (j = 0; j < m; j++) {
                if (text[i + j] != pattern[j])
                    break;
            }
            if (j == m)
                printf("Pattern found at index %d\n", i);
        }

        // Calculate hash value for next window of text
        if (i < n - m) {
            t = (d * (t - text[i] * h) + text[i + m]) % q;
            // We might get negative value of t, convert it to positive
            if (t < 0)
                t = t + q;
        }
    }
}

int main() {
    char text[] = "testtext";
    char pattern[] = "test";
    rabinKarp(text, pattern);
    return 0;
}
for (i = 0; i < m - 1; i++) h = (h * d) % q;
Calculate h = d^(m-1) mod q for rolling hash
for (i = 0; i < m; i++) { p = (d * p + pattern[i]) % q; t = (d * t + text[i]) % q; }
Compute initial hash values for pattern and first text window
if (p == t) { for (j = 0; j < m; j++) { if (text[i + j] != pattern[j]) break; } if (j == m) printf("Pattern found at index %d\n", i); }
If hashes match, verify characters to confirm pattern
t = (d * (t - text[i] * h) + text[i + m]) % q; if (t < 0) t = t + q;
Update text hash for next window using rolling hash formula
OutputSuccess
Pattern found at index 0
Complexity Analysis
Time: O(n + m) because we compute hashes once and slide over text once, verifying matches only when hashes match
Space: O(1) because we use fixed extra space for hash calculations and counters
vs Alternative: Compared to naive O(n*m) approach that checks every substring fully, Rabin-Karp uses hashing to skip unnecessary checks, improving average performance
Edge Cases
Empty pattern
Pattern matches at every index 0 to n (n+1 matches printed)
DSA C
int m = strlen(pattern); when m==0, hash loops skipped but main loop runs i=0 to n and verification passes (j==m==0)
Pattern longer than text
No matches found, loop does not run
DSA C
for (i = 0; i <= n - m; i++) (loop condition prevents invalid access)
Pattern equals text
Single match found at index 0
DSA C
if (p == t) { ... } (hash comparison and verification)
When to Use This Pattern
When you need to find a substring quickly multiple times and want to avoid checking every character repeatedly, use Rabin-Karp because it uses hashing to speed up comparisons.
Common Mistakes
Mistake: Not handling negative hash values after rolling hash update
Fix: Add check to convert negative hash to positive by adding modulus q
Mistake: Comparing hashes only without verifying characters
Fix: Always verify characters when hashes match to avoid false positives
Summary
Finds all occurrences of a pattern in a text using hashing to speed up comparisons.
Use when you want faster substring search than naive checking, especially for multiple searches.
The key is using a rolling hash to update substring hashes efficiently instead of recomputing from scratch.