DSA Cprogramming

Rabin Karp String Matching in DSA C

Choose your learning style9 modes available

Learn Why Deep Visual Try Challenge Project Recall Time

Mental Model

We use a number (hash) to represent a string part and compare these numbers to find matches quickly.

Analogy: Imagine checking if a small book page is inside a big book by comparing page numbers instead of reading every word each time.

Text:  T  h  i  s  i  s  a  t  e  s  t  
Index: 0->1->2->3->4->5->6->7->8->9->10->11->null
Pattern: t  e  s  t
Index: 0->1->2->3->null

Dry Run Walkthrough

Input: text: "testtext", pattern: "test"

Goal: Find all starting positions in text where pattern "test" appears.

Step 1: Calculate hash of pattern "test" and first 4 characters of text "test"

pattern_hash=hash("test")=X, text_hash=hash("test")=X

Why: We need these hashes to compare quickly without checking characters one by one.

Step 2: Compare pattern_hash and text_hash at index 0

pattern_hash == text_hash, check characters: 't'=='t', 'e'=='e', 's'=='s', 't'=='t'

Why: Hashes match, so verify characters to confirm actual match.

Step 3: Slide window by 1: remove 't', add 'e' to text_hash for substring "estt"

text_hash=hash("estt") != pattern_hash

Why: Update hash efficiently to check next substring without full recomputation.

Step 4: Slide window by 1: remove 'e', add 's' to text_hash for substring "stte"

text_hash != pattern_hash

Why: Continue sliding to check all substrings.

Step 5: Slide window by 1: remove 's', add 't' to text_hash for substring "ttex"

text_hash != pattern_hash

Why: Keep sliding until end of text.

Step 6: Slide window by 1: remove 't', add 'x' to text_hash for substring "text"

text_hash != pattern_hash

Why: Checked all possible positions, only match at index 0.

Result:

Matches found at indices: 0 -> null

Annotated Code

DSA C

#include <stdio.h>
#include <string.h>
#define d 256
#define q 101

void rabinKarp(char* text, char* pattern) {
    int n = strlen(text);
    int m = strlen(pattern);
    int i, j;
    int p = 0; // hash value for pattern
    int t = 0; // hash value for text
    int h = 1;

    // The value of h would be "pow(d, m-1)%q"
    for (i = 0; i < m - 1; i++)
        h = (h * d) % q;

    // Calculate the hash value of pattern and first window of text
    for (i = 0; i < m; i++) {
        p = (d * p + pattern[i]) % q;
        t = (d * t + text[i]) % q;
    }

    // Slide the pattern over text one by one
    for (i = 0; i <= n - m; i++) {
        // Check the hash values of current window of text and pattern
        if (p == t) {
            // If the hash values match, check characters one by one
            for (j = 0; j < m; j++) {
                if (text[i + j] != pattern[j])
                    break;
            }
            if (j == m)
                printf("Pattern found at index %d\n", i);
        }

        // Calculate hash value for next window of text
        if (i < n - m) {
            t = (d * (t - text[i] * h) + text[i + m]) % q;
            // We might get negative value of t, convert it to positive
            if (t < 0)
                t = t + q;
        }
    }
}

int main() {
    char text[] = "testtext";
    char pattern[] = "test";
    rabinKarp(text, pattern);
    return 0;
}

for (i = 0; i < m - 1; i++)
    h = (h * d) % q;

Calculate h = d^(m-1) mod q for rolling hash

for (i = 0; i < m; i++) {
    p = (d * p + pattern[i]) % q;
    t = (d * t + text[i]) % q;
}

Compute initial hash values for pattern and first text window

if (p == t) {
    for (j = 0; j < m; j++) {
        if (text[i + j] != pattern[j])
            break;
    }
    if (j == m)
        printf("Pattern found at index %d\n", i);
}

If hashes match, verify characters to confirm pattern

t = (d * (t - text[i] * h) + text[i + m]) % q;
if (t < 0)
    t = t + q;

Update text hash for next window using rolling hash formula

OutputSuccess

Pattern found at index 0

Complexity Analysis

Time: O(n + m) because we compute hashes once and slide over text once, verifying matches only when hashes match

Space: O(1) because we use fixed extra space for hash calculations and counters

vs Alternative: Compared to naive O(n*m) approach that checks every substring fully, Rabin-Karp uses hashing to skip unnecessary checks, improving average performance

Edge Cases

Empty pattern

Pattern matches at every index 0 to n (n+1 matches printed)

DSA C

int m = strlen(pattern); when m==0, hash loops skipped but main loop runs i=0 to n and verification passes (j==m==0)

Pattern longer than text

No matches found, loop does not run

DSA C

for (i = 0; i <= n - m; i++) (loop condition prevents invalid access)

Pattern equals text

Single match found at index 0

DSA C

if (p == t) { ... } (hash comparison and verification)

When to Use This Pattern

When you need to find a substring quickly multiple times and want to avoid checking every character repeatedly, use Rabin-Karp because it uses hashing to speed up comparisons.

Common Mistakes

Mistake: Not handling negative hash values after rolling hash update
Fix: Add check to convert negative hash to positive by adding modulus q

Mistake: Comparing hashes only without verifying characters
Fix: Always verify characters when hashes match to avoid false positives

Summary

Finds all occurrences of a pattern in a text using hashing to speed up comparisons.

Use when you want faster substring search than naive checking, especially for multiple searches.

The key is using a rolling hash to update substring hashes efficiently instead of recomputing from scratch.