0
0
DSA Cprogramming

Longest Common Substring in DSA C

Choose your learning style9 modes available
Mental Model
Find the longest sequence of characters that appears exactly the same in both strings without skipping any characters.
Analogy: Imagine two sentences written on paper. The longest common substring is like the longest exact phrase that appears in both sentences side by side without any gaps.
String1: A B C D E F G
String2: X Y B C D Z

We look for the longest exact matching block:
B -> C -> D
↑          ↑
match starts here
Dry Run Walkthrough
Input: String1: "ABCDXYZ", String2: "XYZABCD"
Goal: Find the longest substring that appears in both strings exactly and continuously
Step 1: Initialize a 2D array dp with zeros for lengths of substrings
dp matrix filled with 0s, size 8x8 (including empty prefix)
Why: We need a table to store lengths of matching substrings ending at each pair of indices
Step 2: Compare characters at String1[1] = 'A' and String2[1] = 'X', no match, dp[1][1] = 0
dp[1][1] = 0
Why: No common substring ends here because characters differ
Step 3: Compare String1[4] = 'D' and String2[7] = 'D', match found, dp[4][7] = dp[3][6] + 1 = 4
dp[4][7] = 4 (substring length 4 ending here)
Why: We extend the previous matching substring by 1 character
Step 4: Update max length to 4 and record ending index in String1
max_len = 4, end_index = 4
Why: Keep track of the longest substring found so far
Result:
Longest common substring length = 4, substring = "ABCD"
Annotated Code
DSA C
#include <stdio.h>
#include <string.h>

// Function to find longest common substring
void longestCommonSubstring(char *s1, char *s2) {
    int len1 = strlen(s1);
    int len2 = strlen(s2);
    int dp[len1 + 1][len2 + 1];
    int max_len = 0;
    int end_index = 0; // end index of substring in s1

    // Initialize dp array with 0
    for (int i = 0; i <= len1; i++) {
        for (int j = 0; j <= len2; j++) {
            dp[i][j] = 0;
        }
    }

    // Build dp matrix
    for (int i = 1; i <= len1; i++) {
        for (int j = 1; j <= len2; j++) {
            if (s1[i - 1] == s2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1; // extend previous match
                if (dp[i][j] > max_len) {
                    max_len = dp[i][j];
                    end_index = i - 1; // update end index
                }
            } else {
                dp[i][j] = 0; // no match, reset
            }
        }
    }

    // Print the longest common substring
    if (max_len == 0) {
        printf("No common substring found\n");
    } else {
        printf("Longest common substring length = %d\n", max_len);
        printf("Longest common substring = ");
        for (int i = end_index - max_len + 1; i <= end_index; i++) {
            putchar(s1[i]);
        }
        printf("\n");
    }
}

int main() {
    char s1[] = "ABCDXYZ";
    char s2[] = "XYZABCD";
    longestCommonSubstring(s1, s2);
    return 0;
}
dp[i][j] = dp[i - 1][j - 1] + 1; // extend previous match
extend length of substring when characters match
if (dp[i][j] > max_len) { max_len = dp[i][j]; end_index = i - 1; }
update max length and end index of longest substring found
dp[i][j] = 0; // no match, reset
reset length when characters do not match
OutputSuccess
Longest common substring length = 4 Longest common substring = ABCD
Complexity Analysis
Time: O(m*n) because we fill a 2D table comparing each character of both strings
Space: O(m*n) because we store substring lengths for all pairs of indices
vs Alternative: Naive approach checks all substrings with O(m^2*n) or worse, so DP is much faster
Edge Cases
One or both strings empty
No common substring found, output indicates zero length
DSA C
if (max_len == 0) { printf("No common substring found\n"); }
No common characters at all
No common substring found, output indicates zero length
DSA C
if (max_len == 0) { printf("No common substring found\n"); }
Strings are identical
Longest common substring is the entire string
DSA C
dp[i][j] = dp[i - 1][j - 1] + 1;
When to Use This Pattern
When asked to find the longest exact matching continuous sequence between two strings, use the longest common substring pattern with dynamic programming to efficiently track matches.
Common Mistakes
Mistake: Confusing longest common substring with longest common subsequence (which allows gaps)
Fix: Remember substring must be continuous; reset dp to zero when characters don't match
Mistake: Not updating max length and end index inside the nested loops
Fix: Update max_len and end_index immediately when a longer substring is found
Summary
Finds the longest continuous matching substring between two strings.
Use when you need the longest exact sequence shared by both strings without gaps.
The key is to build a table of substring lengths and reset when characters differ.