DSA Cprogramming

Longest Common Subsequence in DSA C

Choose your learning style9 modes available

Learn Why Deep Visual Try Challenge Project Recall Time

Mental Model

Find the longest sequence of characters that appear in the same order in both strings, but not necessarily consecutively.

Analogy: Imagine two friends telling stories with some common events. The longest common subsequence is like the longest list of shared events they both mention in the same order, even if they skip some details.

String1: A B C D G H
String2: A E D F H R

LCS matrix (partial):
    Ø A E D F H R
Ø [0 0 0 0 0 0 0]
A [0 1 1 1 1 1 1]
B [0 1 1 1 1 1 1]
C [0 1 1 1 1 1 1]
D [0 1 1 2 2 2 2]
G [0 1 1 2 2 2 2]
H [0 1 1 2 2 3 3]

Dry Run Walkthrough

Input: String1: "ABCDGH", String2: "AEDFHR"

Goal: Find the longest common subsequence between the two strings

Step 1: Initialize a 2D array dp with zeros for lengths of substrings

dp matrix filled with zeros for dimensions (7 x 7) including empty prefix rows and columns

Why: We need a base to build subsequence lengths from empty prefixes

Step 2: Compare characters 'A' (String1[0]) and 'A' (String2[0]), they match, set dp[1][1] = dp[0][0] + 1 = 1

dp[1][1] = 1, rest zeros in first row and column

Why: Matching characters increase subsequence length by 1

Step 3: Compare 'B' (String1[1]) and 'E' (String2[1]), no match, set dp[2][2] = max(dp[1][2], dp[2][1]) = 1

dp[2][2] = 1, subsequence length remains 1

Why: No match, so take the best subsequence length from previous prefixes

Step 4: Compare 'D' (String1[3]) and 'D' (String2[2]), match, set dp[4][3] = dp[3][2] + 1 = 2

dp[4][3] = 2, subsequence length increases

Why: Matching characters extend the subsequence length

Step 5: Compare 'H' (String1[5]) and 'H' (String2[4]), match, set dp[6][5] = dp[5][4] + 1 = 3

dp[6][5] = 3, longest subsequence length found

Why: Last matching characters increase subsequence length to final value

Step 6: Trace back from dp[6][5] to find subsequence characters 'A', 'D', 'H'

LCS is "ADH"

Why: Backtracking recovers the actual subsequence from dp matrix

Result:

Longest Common Subsequence: "ADH" with length 3

Annotated Code

DSA C

#include <stdio.h>
#include <string.h>

// Function to find LCS length and print the subsequence
void longestCommonSubsequence(char* X, char* Y) {
    int m = strlen(X);
    int n = strlen(Y);
    int dp[m+1][n+1];

    // Build dp matrix
    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            if (i == 0 || j == 0) {
                dp[i][j] = 0; // base case: empty string
            } else if (X[i-1] == Y[j-1]) {
                dp[i][j] = dp[i-1][j-1] + 1; // match found
            } else {
                dp[i][j] = (dp[i-1][j] > dp[i][j-1]) ? dp[i-1][j] : dp[i][j-1]; // max of ignoring one char
            }
        }
    }

    // Length of LCS
    int index = dp[m][n];
    char lcs[index+1];
    lcs[index] = '\0';

    // Backtrack to find LCS string
    int i = m, j = n;
    while (i > 0 && j > 0) {
        if (X[i-1] == Y[j-1]) {
            lcs[index-1] = X[i-1];
            i--;
            j--;
            index--;
        } else if (dp[i-1][j] > dp[i][j-1]) {
            i--;
        } else {
            j--;
        }
    }

    printf("Longest Common Subsequence: %s\n", lcs);
}

int main() {
    char X[] = "ABCDGH";
    char Y[] = "AEDFHR";
    longestCommonSubsequence(X, Y);
    return 0;
}

if (i == 0 || j == 0) {
    dp[i][j] = 0; // base case: empty string
}

initialize dp edges with zero for empty prefixes

else if (X[i-1] == Y[j-1]) {
    dp[i][j] = dp[i-1][j-1] + 1; // match found
}

increase subsequence length when characters match

else {
    dp[i][j] = (dp[i-1][j] > dp[i][j-1]) ? dp[i-1][j] : dp[i][j-1]; // max of ignoring one char
}

choose max subsequence length ignoring one character

while (i > 0 && j > 0) {
    if (X[i-1] == Y[j-1]) {
        lcs[index-1] = X[i-1];
        i--;
        j--;
        index--;
    } else if (dp[i-1][j] > dp[i][j-1]) {
        i--;
    } else {
        j--;
    }
}

backtrack dp matrix to reconstruct LCS string

OutputSuccess

Longest Common Subsequence: ADH

Complexity Analysis

Time: O(m*n) because we fill a matrix of size m by n once

Space: O(m*n) because we store subsequence lengths for all substring pairs

vs Alternative: Naive recursion without memoization is exponential O(2^(m+n)), so DP is much faster

Edge Cases

One or both strings empty

LCS length is zero and empty subsequence returned

DSA C

if (i == 0 || j == 0) {
    dp[i][j] = 0; // base case: empty string
}

Strings with no common characters

LCS length is zero and empty subsequence returned

DSA C

else {
    dp[i][j] = (dp[i-1][j] > dp[i][j-1]) ? dp[i-1][j] : dp[i][j-1]; // max of ignoring one char
}

Strings with all characters matching

LCS is the entire string

DSA C

else if (X[i-1] == Y[j-1]) {
    dp[i][j] = dp[i-1][j-1] + 1; // match found
}

When to Use This Pattern

When you need to find the longest sequence common to two sequences in order, use the Longest Common Subsequence pattern because it efficiently finds the maximum matching order.

Common Mistakes

Mistake: Not initializing the first row and column of dp matrix to zero
Fix: Explicitly set dp[i][0] and dp[0][j] to zero to handle empty substring cases

Mistake: Incorrectly indexing strings with dp indices (off by one)
Fix: Use X[i-1] and Y[j-1] when accessing characters because dp indices start from 1

Mistake: Not backtracking dp matrix to reconstruct the subsequence
Fix: Implement backtracking loop to build the LCS string from dp matrix

Summary

Finds the longest sequence of characters common to two strings in order.

Use when you want to compare two sequences and find their maximum ordered overlap.

The key insight is building a matrix of subsequence lengths and backtracking to get the sequence.