0
0
PythonProgramBeginner · 2 min read

Python Program to Find Edit Distance Between Strings

You can find the edit distance between two strings in Python using dynamic programming by creating a matrix and filling it with minimum operations using insert, delete, and replace operations; for example, use the function def edit_distance(s1, s2): with a nested loop to compute the distance.
📋

Examples

Input"cat", "cut"
Output1
Input"kitten", "sitting"
Output3
Input"", "abc"
Output3
🧠

How to Think About It

To find the edit distance, think about how to change one string into another step by step. At each step, you can insert a letter, delete a letter, or replace a letter. You compare the strings from start to end, keeping track of the smallest number of changes needed so far. This is done by building a table where each cell shows the minimum edits needed for the parts of the strings up to that point.
📐

Algorithm

1
Create a 2D table with size (length of first string + 1) by (length of second string + 1).
2
Initialize the first row and first column with index values representing edits needed to convert empty string to substring.
3
Loop through each character of both strings.
4
If characters match, copy the diagonal value (no new edit needed).
5
If characters differ, take the minimum of insert, delete, or replace operations plus one.
6
Return the value in the bottom-right cell as the edit distance.
💻

Code

python
def edit_distance(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[m][n]

print(edit_distance("kitten", "sitting"))
Output
3
🔍

Dry Run

Let's trace the example 'kitten' and 'sitting' through the code

1

Initialize dp table

Create a 7x8 table (lengths + 1) with first row 0 to 7 and first column 0 to 6

2

Fill dp table

Compare characters one by one, fill dp[i][j] with minimum edits considering insert, delete, replace

3

Final result

dp[6][7] = 3 means minimum 3 edits to convert 'kitten' to 'sitting'

ijs1[i-1]s2[j-1]dp[i][j]
11ks1
22ii1
33tt1
44tt1
55ei2
66nn2
67ng3
💡

Why This Works

Step 1: Setup the table

We create a table where each cell represents the minimum edits needed to convert substrings of the two strings up to that point.

Step 2: Compare characters

If characters are the same, no new edit is needed, so we take the diagonal value; if different, we consider insert, delete, or replace.

Step 3: Choose minimum edits

We pick the smallest number of edits from the three possible operations plus one to update the current cell.

🔄

Alternative Approaches

Recursive with memoization
python
def edit_distance_recursive(s1, s2, i=0, j=0, memo=None):
    if memo is None:
        memo = {}
    if (i, j) in memo:
        return memo[(i, j)]
    if i == len(s1):
        return len(s2) - j
    if j == len(s2):
        return len(s1) - i
    if s1[i] == s2[j]:
        memo[(i, j)] = edit_distance_recursive(s1, s2, i+1, j+1, memo)
    else:
        insert = 1 + edit_distance_recursive(s1, s2, i, j+1, memo)
        delete = 1 + edit_distance_recursive(s1, s2, i+1, j, memo)
        replace = 1 + edit_distance_recursive(s1, s2, i+1, j+1, memo)
        memo[(i, j)] = min(insert, delete, replace)
    return memo[(i, j)]

print(edit_distance_recursive("kitten", "sitting"))
This method is easier to understand but slower for large strings without memoization.
Using Python's difflib library
python
import difflib

def edit_distance_difflib(s1, s2):
    seq = difflib.SequenceMatcher(None, s1, s2)
    return int(round((1 - seq.ratio()) * max(len(s1), len(s2))))

print(edit_distance_difflib("kitten", "sitting"))
This gives an approximate edit distance based on similarity ratio, not exact but faster for quick checks.

Complexity: O(m*n) time, O(m*n) space

Time Complexity

The algorithm fills a table of size m by n, where m and n are the lengths of the two strings, so it runs in O(m*n) time.

Space Complexity

The 2D table requires O(m*n) space to store intermediate results; space can be optimized but at cost of complexity.

Which Approach is Fastest?

Dynamic programming is fastest for exact results; recursive without memo is slow; difflib is faster but approximate.

ApproachTimeSpaceBest For
Dynamic ProgrammingO(m*n)O(m*n)Exact results, medium to large strings
Recursive with MemoizationO(m*n)O(m*n)Conceptual clarity, smaller inputs
difflib ApproximationO(m+n)O(1)Quick similarity checks, approximate results
💡
Use dynamic programming with a 2D table to efficiently compute edit distance for any two strings.
⚠️
Forgetting to initialize the first row and column of the table, which represent edits from empty strings.