Python Program to Find Longest Common Substring
longest_common_substring(s1, s2) which returns the longest substring shared by both strings.Examples
How to Think About It
Algorithm
Code
def longest_common_substring(s1, s2): m, n = len(s1), len(s2) dp = [[0] * (n + 1) for _ in range(m + 1)] max_len = 0 end_index = 0 for i in range(1, m + 1): for j in range(1, n + 1): if s1[i - 1] == s2[j - 1]: dp[i][j] = dp[i - 1][j - 1] + 1 if dp[i][j] > max_len: max_len = dp[i][j] end_index = i return s1[end_index - max_len:end_index] print(longest_common_substring('abcdef', 'zcdemf'))
Dry Run
Let's trace 'abcdef' and 'zcdemf' through the code
Initialize variables
m=6, n=6, dp is a 7x7 matrix of zeros, max_len=0, end_index=0
Compare characters and fill dp
At i=3 (c), j=2 (c), dp[3][2]=1; max_len=1, end_index=3 At i=4 (d), j=3 (d), dp[4][3]=2; max_len=2, end_index=4 At i=5 (e), j=4 (e), dp[5][4]=3; max_len=3, end_index=5
Return substring
Return s1[5-3:5] = s1[2:5] = 'cde'
| i | j | s1[i-1] | s2[j-1] | dp[i][j] | max_len | end_index |
|---|---|---|---|---|---|---|
| 3 | 2 | c | c | 1 | 1 | 3 |
| 4 | 3 | d | d | 2 | 2 | 4 |
| 5 | 4 | e | e | 3 | 3 | 5 |
Why This Works
Step 1: Use a 2D array to track matches
The 2D array dp stores lengths of common substrings ending at each pair of indices, helping to build solutions from smaller parts.
Step 2: Update longest substring info
When characters match, increase the length from the previous diagonal cell and update max_len and end_index to remember the longest substring found.
Step 3: Extract the substring
Use end_index and max_len to slice the longest common substring from the first string and return it.
Alternative Approaches
def longest_common_substring_suffix_tree(s1, s2): # This method is complex and requires building suffix trees # Not shown here due to complexity return 'Method requires advanced data structures' print(longest_common_substring_suffix_tree('abcdef', 'zcdemf'))
def longest_common_substring_brute(s1, s2): max_sub = '' for i in range(len(s1)): for j in range(i + 1, len(s1) + 1): sub = s1[i:j] if sub in s2 and len(sub) > len(max_sub): max_sub = sub return max_sub print(longest_common_substring_brute('abcdef', 'zcdemf'))
Complexity: O(m * n) time, O(m * n) space
Time Complexity
The program uses two nested loops over the lengths of the two strings, so it runs in O(m * n) time where m and n are the string lengths.
Space Complexity
It uses a 2D array of size (m+1) by (n+1) to store substring lengths, so space complexity is O(m * n).
Which Approach is Fastest?
Dynamic programming is faster than brute force for large inputs. Suffix trees are fastest but complex to implement.
| Approach | Time | Space | Best For |
|---|---|---|---|
| Dynamic Programming | O(m * n) | O(m * n) | Balanced speed and simplicity |
| Brute Force | O(m^2 * n) | O(1) | Very small strings or learning |
| Suffix Trees | O(m + n) | O(m + n) | Very large strings, advanced use |