Edit distance helps us find how different two words are by counting the smallest number of changes needed to turn one word into the other.
Edit distance (Levenshtein) in NLP
def levenshtein_distance(word1: str, word2: str) -> int: m, n = len(word1), len(word2) dp = [[0] * (n + 1) for _ in range(m + 1)] for i in range(m + 1): dp[i][0] = i for j in range(n + 1): dp[0][j] = j for i in range(1, m + 1): for j in range(1, n + 1): if word1[i - 1] == word2[j - 1]: dp[i][j] = dp[i - 1][j - 1] else: dp[i][j] = 1 + min(dp[i - 1][j], # deletion dp[i][j - 1], # insertion dp[i - 1][j - 1]) # substitution return dp[m][n]
The function uses a table (dp) to store results of smaller problems.
Each cell dp[i][j] shows the edit distance between first i letters of word1 and first j letters of word2.
distance = levenshtein_distance("cat", "cut") print(distance) # Output: 1
distance = levenshtein_distance("", "dog") print(distance) # Output: 3
distance = levenshtein_distance("book", "books") print(distance) # Output: 1
distance = levenshtein_distance("apple", "apple") print(distance) # Output: 0
This program calculates the edit distance between 'kitten' and 'sitting'. It prints the number of changes needed to turn one word into the other.
def levenshtein_distance(word1: str, word2: str) -> int: m, n = len(word1), len(word2) dp = [[0] * (n + 1) for _ in range(m + 1)] for i in range(m + 1): dp[i][0] = i for j in range(n + 1): dp[0][j] = j for i in range(1, m + 1): for j in range(1, n + 1): if word1[i - 1] == word2[j - 1]: dp[i][j] = dp[i - 1][j - 1] else: dp[i][j] = 1 + min(dp[i - 1][j], # deletion dp[i][j - 1], # insertion dp[i - 1][j - 1]) # substitution return dp[m][n] # Words to compare word_a = "kitten" word_b = "sitting" print(f"Edit distance between '{word_a}' and '{word_b}':") distance = levenshtein_distance(word_a, word_b) print(distance)
Time complexity is O(m*n), where m and n are the lengths of the two words.
Space complexity is also O(m*n) because of the table used.
Common mistake: forgetting to initialize the first row and column of the table, which represent empty strings.
Use edit distance when you want to measure how similar two strings are. For very long strings, consider faster approximate methods.
Edit distance counts the smallest number of changes to turn one word into another.
It helps in spell checking, search, and text comparison.
The dynamic programming method uses a table to build the answer step-by-step.