What is Edit distance (Levenshtein) in NLP?

NLPml~5 mins

Edit distance (Levenshtein) in NLP

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Introduction

Edit distance helps us find how different two words are by counting the smallest number of changes needed to turn one word into the other.

Checking spelling mistakes and suggesting corrections.

Comparing two names to see if they might be the same person.

Finding similar words in search engines.

Matching user input with stored commands or keywords.

Detecting plagiarism by comparing text similarity.

Syntax

NLP

def levenshtein_distance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],    # deletion
                                   dp[i][j - 1],    # insertion
                                   dp[i - 1][j - 1]) # substitution
    return dp[m][n]

The function uses a table (dp) to store results of smaller problems.

Each cell dp[i][j] shows the edit distance between first i letters of word1 and first j letters of word2.

Examples

Only one substitution needed: 'a' to 'u'.

NLP

distance = levenshtein_distance("cat", "cut")
print(distance)  # Output: 1

Empty string to 'dog' needs 3 insertions.

NLP

distance = levenshtein_distance("", "dog")
print(distance)  # Output: 3

One insertion of 's' at the end.

NLP

distance = levenshtein_distance("book", "books")
print(distance)  # Output: 1

Both words are the same, so no changes needed.

NLP

distance = levenshtein_distance("apple", "apple")
print(distance)  # Output: 0

Sample Model

This program calculates the edit distance between 'kitten' and 'sitting'. It prints the number of changes needed to turn one word into the other.

NLP

def levenshtein_distance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],    # deletion
                                   dp[i][j - 1],    # insertion
                                   dp[i - 1][j - 1]) # substitution
    return dp[m][n]

# Words to compare
word_a = "kitten"
word_b = "sitting"

print(f"Edit distance between '{word_a}' and '{word_b}':")
distance = levenshtein_distance(word_a, word_b)
print(distance)

OutputSuccess

Important Notes

Time complexity is O(m*n), where m and n are the lengths of the two words.

Space complexity is also O(m*n) because of the table used.

Common mistake: forgetting to initialize the first row and column of the table, which represent empty strings.

Use edit distance when you want to measure how similar two strings are. For very long strings, consider faster approximate methods.

Summary

Edit distance counts the smallest number of changes to turn one word into another.

It helps in spell checking, search, and text comparison.

The dynamic programming method uses a table to build the answer step-by-step.

Practice

(1/5)

1. What does the edit distance (Levenshtein distance) between two words measure?

easy

A. The length difference between two words

B. The minimum number of single-character edits to change one word into the other

C. The number of common letters between two words

D. The number of vowels in both words combined

Edit distance (Levenshtein) in NLP

Start learning this pattern below

Practice

Solution

Step 1: Understand the definition of edit distance

Step 2: Compare options with the definition

Final Answer:

Quick Check:

Solution

Step 1: Recall table size for edit distance

Step 2: Match code to correct dimensions

Final Answer:

Quick Check:

Solution

Step 1: Identify edits from "kitten" to "sitting"

Step 2: Count total edits

Final Answer:

Quick Check:

Solution

Step 1: Check string indexing in loops

Step 2: Correct indexing

Final Answer:

Quick Check:

Solution

Step 1: Calculate edit distances to each word

Step 2: Identify minimum distance

Final Answer:

Quick Check: