Bird
Raised Fist0
NLPml~10 mins

Edit distance (Levenshtein) in NLP - Interactive Code Practice

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Practice - 5 Tasks
Answer the questions below
1fill in blank
easy

Complete the code to initialize the edit distance matrix with zeros.

NLP
def levenshtein_distance(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[[1] for _ in range(n + 1)] for _ in range(m + 1)]
    return dp
Drag options to blanks, or click blank then click option'
A''
B0
CNone
D1
Attempts:
3 left
💡 Hint
Common Mistakes
Initializing with 1 instead of 0 causes incorrect distance calculations.
Using None or empty string leads to type errors later.
2fill in blank
medium

Complete the code to set the first row of the matrix representing insertions.

NLP
def levenshtein_distance(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0 for _ in range(n + 1)] for _ in range(m + 1)]
    for j in range(n + 1):
        dp[0][j] = [1]
    return dp[0]
Drag options to blanks, or click blank then click option'
Am
B0
C1
Dj
Attempts:
3 left
💡 Hint
Common Mistakes
Setting all values to 0 ignores insertion costs.
Using m instead of j confuses rows and columns.
3fill in blank
hard

Fix the error in the condition to check if characters are the same.

NLP
if s1[i - 1] [1] s2[j - 1]:
    cost = 0
else:
    cost = 1
Drag options to blanks, or click blank then click option'
A==
B<
C!=
D>
Attempts:
3 left
💡 Hint
Common Mistakes
Using '!=' reverses the logic and causes wrong cost calculation.
Using '<' or '>' is invalid for character equality check.
4fill in blank
hard

Fill both blanks to compute the minimum edit distance step.

NLP
dp[i][j] = min(dp[i - 1][j] + [1], dp[i][j - 1] + [2], dp[i - 1][j - 1] + cost)
Drag options to blanks, or click blank then click option'
A1
B0
Ccost
D2
Attempts:
3 left
💡 Hint
Common Mistakes
Using 0 for insertion or deletion cost ignores actual cost.
Using cost instead of 1 confuses substitution cost with insertion/deletion.
5fill in blank
hard

Fill all three blanks to complete the Levenshtein distance function.

NLP
def levenshtein_distance(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0 for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = [1]
    for j in range(n + 1):
        dp[0][j] = [2]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] [3] s2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[m][n]
Drag options to blanks, or click blank then click option'
Ai
Bj
C==
D!=
Attempts:
3 left
💡 Hint
Common Mistakes
Mixing up i and j in initialization causes wrong matrix setup.
Using '!=' instead of '==' reverses substitution cost logic.

Practice

(1/5)
1. What does the edit distance (Levenshtein distance) between two words measure?
easy
A. The length difference between two words
B. The minimum number of single-character edits to change one word into the other
C. The number of common letters between two words
D. The number of vowels in both words combined

Solution

  1. Step 1: Understand the definition of edit distance

    Edit distance counts how many changes like insertions, deletions, or substitutions are needed to convert one word into another.
  2. Step 2: Compare options with the definition

    Only The minimum number of single-character edits to change one word into the other correctly describes this minimum number of single-character edits.
  3. Final Answer:

    The minimum number of single-character edits to change one word into the other -> Option B
  4. Quick Check:

    Edit distance = minimum edits [OK]
Hint: Edit distance = smallest edits to match words [OK]
Common Mistakes:
  • Confusing edit distance with common letters count
  • Thinking it measures length difference only
  • Mixing up vowels or letter counts
2. Which of the following Python code snippets correctly initializes a 2D table for computing edit distance between strings s1 and s2 of lengths m and n?
easy
A. table = [[0] * (n + 1) for _ in range(m + 1)]
B. table = [[0] * m for _ in range(n)]
C. table = [[0] * (m + 1) for _ in range(n + 1)]
D. table = [[0] * n for _ in range(m)]

Solution

  1. Step 1: Recall table size for edit distance

    The table size must be (m+1) rows and (n+1) columns to include empty prefixes of both strings.
  2. Step 2: Match code to correct dimensions

    table = [[0] * (n + 1) for _ in range(m + 1)] creates a list with (m+1) rows and each row has (n+1) zeros, which is correct.
  3. Final Answer:

    table = [[0] * (n + 1) for _ in range(m + 1)] -> Option A
  4. Quick Check:

    Table size = (m+1) x (n+1) [OK]
Hint: Table size = (len(s1)+1) x (len(s2)+1) [OK]
Common Mistakes:
  • Swapping m and n in dimensions
  • Forgetting to add 1 for empty string prefix
  • Using wrong list comprehension order
3. What is the edit distance between the words "kitten" and "sitting"?
medium
A. 2
B. 4
C. 3
D. 5

Solution

  1. Step 1: Identify edits from "kitten" to "sitting"

    Changes: substitute 'k' -> 's', substitute 'e' -> 'i', insert 'g' at the end.
  2. Step 2: Count total edits

    There are 3 edits: 2 substitutions and 1 insertion.
  3. Final Answer:

    3 -> Option C
  4. Quick Check:

    Edits counted = 3 [OK]
Hint: Count substitutions + insertions + deletions [OK]
Common Mistakes:
  • Counting only substitutions
  • Missing the final insertion
  • Confusing similar letters as no change
4. Consider this Python code snippet for edit distance calculation. What is the error?
def edit_distance(s1, s2):
    m, n = len(s1), len(s2)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][0] = i
    for j in range(n + 1):
        table[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i] == s2[j]:
                cost = 0
            else:
                cost = 1
            table[i][j] = min(table[i-1][j] + 1, table[i][j-1] + 1, table[i-1][j-1] + cost)
    return table[m][n]
medium
A. The return statement should be table[0][0]
B. The table initialization is incorrect
C. The cost should be 2 when characters differ
D. Indexing s1[i] and s2[j] causes an off-by-one error

Solution

  1. Step 1: Check string indexing in loops

    Strings are 0-indexed, but loops start at 1, so s1[i] and s2[j] skip first character and cause index errors.
  2. Step 2: Correct indexing

    Use s1[i-1] and s2[j-1] to access correct characters for comparison.
  3. Final Answer:

    Indexing s1[i] and s2[j] causes an off-by-one error -> Option D
  4. Quick Check:

    Indexing error = s1[i-1], s2[j-1] needed [OK]
Hint: Remember strings start at index 0, loops at 1 need offset [OK]
Common Mistakes:
  • Using s1[i] instead of s1[i-1]
  • Thinking cost must be 2 for difference
  • Returning wrong table cell
5. You want to find the closest word to "flame" from the list ["frame", "flan", "flame", "blame"] using edit distance. Which word will the algorithm select as closest?
hard
A. "flame"
B. "frame"
C. "blame"
D. "flan"

Solution

  1. Step 1: Calculate edit distances to each word

    Distances: "flame" to "frame" = 1 (substitute 'l'->'r'), to "flan" = 2 (delete 'm' and 'e'), to "blame" = 1 (substitute 'f'->'b'), to "flame" = 0 (same word).
  2. Step 2: Identify minimum distance

    The smallest distance is 0 for "flame" itself, meaning exact match.
  3. Final Answer:

    "flame" -> Option A
  4. Quick Check:

    Exact match distance = 0 [OK]
Hint: Exact match has zero edit distance [OK]
Common Mistakes:
  • Choosing word with one letter difference over exact match
  • Ignoring exact matches
  • Miscounting substitutions