NLPml~20 mins

Edit distance (Levenshtein) in NLP - ML Experiment: Train & Evaluate

Choose your learning style10 modes available

Learn Why Deep Model Try Challenge Experiment Recall Metrics

Start learning this pattern below

Jump into concepts and practice - no test required

Recommended

Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong

Experiment - Edit distance (Levenshtein)

Problem:You want to measure how different two words are by counting the minimum number of changes needed to turn one word into the other. This is called the Levenshtein edit distance.

Current Metrics:The current implementation calculates edit distance correctly but is slow for long words, taking over 5 seconds for words longer than 100 characters.

Issue:The current code uses a simple recursive method without optimization, causing slow performance and making it impractical for larger inputs.

Your Task

Improve the edit distance calculation to run efficiently on longer words (100+ characters) while keeping the results accurate.

You must keep the output exactly the same (correct edit distance).

You cannot use external libraries like 'python-Levenshtein'.

You should optimize the existing code or rewrite it using dynamic programming.

Hint 1

Hint 2

Hint 3

Solution

NLP

def levenshtein_distance(s1: str, s2: str) -> int:
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],    # deletion
                                   dp[i][j - 1],    # insertion
                                   dp[i - 1][j - 1]) # substitution
    return dp[m][n]

# Example usage:
word1 = "kitten"
word2 = "sitting"
print(f"Edit distance between '{word1}' and '{word2}':", levenshtein_distance(word1, word2))

Replaced the slow recursive method with a dynamic programming approach using a 2D table.

Initialized the table with base cases for empty strings.

Filled the table iteratively to compute minimum edits step-by-step.

This change drastically improved performance for long strings while keeping accuracy.

Results Interpretation

Before: Recursive method took over 5 seconds for long words and was impractical.

After: Dynamic programming method runs in under 0.1 seconds for the same inputs with exact results.

Using dynamic programming to store intermediate results avoids repeated work, making algorithms like edit distance fast and practical for real-world use.

Bonus Experiment

Try to reduce the memory usage of the dynamic programming solution by using only two rows instead of the full 2D table.

💡 Hint

Notice that to compute the current row, you only need the previous row, so you can overwrite rows to save space.

Practice

(1/5)

1. What does the edit distance (Levenshtein distance) between two words measure?

easy

A. The length difference between two words

B. The minimum number of single-character edits to change one word into the other

C. The number of common letters between two words

D. The number of vowels in both words combined

Edit distance (Levenshtein) in NLP - ML Experiment: Train & Evaluate

Start learning this pattern below

Practice

Solution

Step 1: Understand the definition of edit distance

Step 2: Compare options with the definition

Final Answer:

Quick Check:

Solution

Step 1: Recall table size for edit distance

Step 2: Match code to correct dimensions

Final Answer:

Quick Check:

Solution

Step 1: Identify edits from "kitten" to "sitting"

Step 2: Count total edits

Final Answer:

Quick Check:

Solution

Step 1: Check string indexing in loops

Step 2: Correct indexing

Final Answer:

Quick Check:

Solution

Step 1: Calculate edit distances to each word

Step 2: Identify minimum distance

Final Answer:

Quick Check: