PythonProgramBeginner · 2 min read

Python Program to Find Longest Common Substring

You can find the longest common substring in Python by using a dynamic programming approach with a 2D array to track matching characters, like longest_common_substring(s1, s2) which returns the longest substring shared by both strings.

📋

Examples

Inputs1 = 'abcdef', s2 = 'zcdemf'

Output'cde'

Inputs1 = '12345', s2 = '54321'

Output'1'

Inputs1 = 'hello', s2 = 'world'

Output'l'

🧠

How to Think About It

To find the longest common substring, compare each character of the first string with each character of the second string. When characters match, track the length of the matching substring ending at those positions. Keep updating the longest found substring as you go.

📐

Algorithm

Create a 2D array to store lengths of matching substrings ending at each pair of indices.

Initialize variables to track the maximum length and ending index of the longest substring.

Loop through each character of the first string and second string.

If characters match, update the 2D array with the length of the substring ending there.

Update maximum length and ending index if a longer substring is found.

Return the substring from the first string using the recorded maximum length and ending index.

💻

Code

python

def longest_common_substring(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    max_len = 0
    end_index = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > max_len:
                    max_len = dp[i][j]
                    end_index = i
    return s1[end_index - max_len:end_index]

print(longest_common_substring('abcdef', 'zcdemf'))

Output

cde

🔍

Dry Run

Let's trace 'abcdef' and 'zcdemf' through the code

Initialize variables

m=6, n=6, dp is a 7x7 matrix of zeros, max_len=0, end_index=0

Compare characters and fill dp

At i=3 (c), j=2 (c), dp[3][2]=1; max_len=1, end_index=3 At i=4 (d), j=3 (d), dp[4][3]=2; max_len=2, end_index=4 At i=5 (e), j=4 (e), dp[5][4]=3; max_len=3, end_index=5

Return substring

Return s1[5-3:5] = s1[2:5] = 'cde'

i	j	s1[i-1]	s2[j-1]	dp[i][j]	max_len	end_index
3	2	c	c	1	1	3
4	3	d	d	2	2	4
5	4	e	e	3	3	5

💡

Why This Works

Step 1: Use a 2D array to track matches

The 2D array dp stores lengths of common substrings ending at each pair of indices, helping to build solutions from smaller parts.

Step 2: Update longest substring info

When characters match, increase the length from the previous diagonal cell and update max_len and end_index to remember the longest substring found.

Step 3: Extract the substring

Use end_index and max_len to slice the longest common substring from the first string and return it.

🔄

Alternative Approaches

Using suffix trees

python

def longest_common_substring_suffix_tree(s1, s2):
    # This method is complex and requires building suffix trees
    # Not shown here due to complexity
    return 'Method requires advanced data structures'

print(longest_common_substring_suffix_tree('abcdef', 'zcdemf'))

Suffix trees can find longest common substring in linear time but are complex to implement.

Brute force checking all substrings

python

def longest_common_substring_brute(s1, s2):
    max_sub = ''
    for i in range(len(s1)):
        for j in range(i + 1, len(s1) + 1):
            sub = s1[i:j]
            if sub in s2 and len(sub) > len(max_sub):
                max_sub = sub
    return max_sub

print(longest_common_substring_brute('abcdef', 'zcdemf'))

Simple but slow for long strings because it checks all substrings.

⚡

Complexity: O(m * n) time, O(m * n) space

Time Complexity

The program uses two nested loops over the lengths of the two strings, so it runs in O(m * n) time where m and n are the string lengths.

Space Complexity

It uses a 2D array of size (m+1) by (n+1) to store substring lengths, so space complexity is O(m * n).

Which Approach is Fastest?

Dynamic programming is faster than brute force for large inputs. Suffix trees are fastest but complex to implement.

Approach	Time	Space	Best For
Dynamic Programming	O(m * n)	O(m * n)	Balanced speed and simplicity
Brute Force	O(m^2 * n)	O(1)	Very small strings or learning
Suffix Trees	O(m + n)	O(m + n)	Very large strings, advanced use

💡

Use dynamic programming with a 2D array to efficiently track matching substring lengths.

⚠️

Not resetting substring length counts properly, causing incorrect longest substring results.