DSA Cprogramming

Longest Common Substring in DSA C

Choose your learning style9 modes available

Learn Why Deep Visual Try Challenge Project Recall Time

Mental Model

Find the longest sequence of characters that appears exactly the same in both strings without skipping any characters.

Analogy: Imagine two sentences written on paper. The longest common substring is like the longest exact phrase that appears in both sentences side by side without any gaps.

String1: A B C D E F G
String2: X Y B C D Z

We look for the longest exact matching block:
B -> C -> D
↑          ↑
match starts here

Dry Run Walkthrough

Input: String1: "ABCDXYZ", String2: "XYZABCD"

Goal: Find the longest substring that appears in both strings exactly and continuously

Step 1: Initialize a 2D array dp with zeros for lengths of substrings

dp matrix filled with 0s, size 8x8 (including empty prefix)

Why: We need a table to store lengths of matching substrings ending at each pair of indices

Step 2: Compare characters at String1[1] = 'A' and String2[1] = 'X', no match, dp[1][1] = 0

dp[1][1] = 0

Why: No common substring ends here because characters differ

Step 3: Compare String1[4] = 'D' and String2[7] = 'D', match found, dp[4][7] = dp[3][6] + 1 = 4

dp[4][7] = 4 (substring length 4 ending here)

Why: We extend the previous matching substring by 1 character

Step 4: Update max length to 4 and record ending index in String1

max_len = 4, end_index = 4

Why: Keep track of the longest substring found so far

Result:

Longest common substring length = 4, substring = "ABCD"

Annotated Code

DSA C

#include <stdio.h>
#include <string.h>

// Function to find longest common substring
void longestCommonSubstring(char *s1, char *s2) {
    int len1 = strlen(s1);
    int len2 = strlen(s2);
    int dp[len1 + 1][len2 + 1];
    int max_len = 0;
    int end_index = 0; // end index of substring in s1

    // Initialize dp array with 0
    for (int i = 0; i <= len1; i++) {
        for (int j = 0; j <= len2; j++) {
            dp[i][j] = 0;
        }
    }

    // Build dp matrix
    for (int i = 1; i <= len1; i++) {
        for (int j = 1; j <= len2; j++) {
            if (s1[i - 1] == s2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1; // extend previous match
                if (dp[i][j] > max_len) {
                    max_len = dp[i][j];
                    end_index = i - 1; // update end index
                }
            } else {
                dp[i][j] = 0; // no match, reset
            }
        }
    }

    // Print the longest common substring
    if (max_len == 0) {
        printf("No common substring found\n");
    } else {
        printf("Longest common substring length = %d\n", max_len);
        printf("Longest common substring = ");
        for (int i = end_index - max_len + 1; i <= end_index; i++) {
            putchar(s1[i]);
        }
        printf("\n");
    }
}

int main() {
    char s1[] = "ABCDXYZ";
    char s2[] = "XYZABCD";
    longestCommonSubstring(s1, s2);
    return 0;
}

dp[i][j] = dp[i - 1][j - 1] + 1; // extend previous match

extend length of substring when characters match

if (dp[i][j] > max_len) { max_len = dp[i][j]; end_index = i - 1; }

update max length and end index of longest substring found

dp[i][j] = 0; // no match, reset

reset length when characters do not match

OutputSuccess

Longest common substring length = 4 Longest common substring = ABCD

Complexity Analysis

Time: O(m*n) because we fill a 2D table comparing each character of both strings

Space: O(m*n) because we store substring lengths for all pairs of indices

vs Alternative: Naive approach checks all substrings with O(m^2*n) or worse, so DP is much faster

Edge Cases

One or both strings empty

No common substring found, output indicates zero length

DSA C

if (max_len == 0) { printf("No common substring found\n"); }

No common characters at all

No common substring found, output indicates zero length

DSA C

if (max_len == 0) { printf("No common substring found\n"); }

Strings are identical

Longest common substring is the entire string

DSA C

dp[i][j] = dp[i - 1][j - 1] + 1;

When to Use This Pattern

When asked to find the longest exact matching continuous sequence between two strings, use the longest common substring pattern with dynamic programming to efficiently track matches.

Common Mistakes

Mistake: Confusing longest common substring with longest common subsequence (which allows gaps)
Fix: Remember substring must be continuous; reset dp to zero when characters don't match

Mistake: Not updating max length and end index inside the nested loops
Fix: Update max_len and end_index immediately when a longer substring is found

Summary

Finds the longest continuous matching substring between two strings.

Use when you need the longest exact sequence shared by both strings without gaps.

The key is to build a table of substring lengths and reset when characters differ.