Bird
Raised Fist0
NLPml~5 mins

Edit distance (Levenshtein) in NLP - Cheat Sheet & Quick Revision

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Recall & Review
beginner
What is the Edit distance (Levenshtein)?
It is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one word into another.
Click to reveal answer
beginner
Which operations are counted in the Levenshtein distance?
Insertions, deletions, and substitutions of characters.
Click to reveal answer
beginner
Why is Edit distance useful in NLP?
It helps measure how similar two words or strings are, useful for spell checking, autocorrect, and fuzzy matching.
Click to reveal answer
intermediate
How does the Levenshtein distance between "kitten" and "sitting" calculate?
The distance is 3: substitute 'k'→'s', substitute 'e'→'i', and insert 'g' at the end.
Click to reveal answer
intermediate
What is the time complexity of the standard dynamic programming algorithm for Levenshtein distance?
It is O(m×n), where m and n are the lengths of the two strings.
Click to reveal answer
Which of these is NOT an operation counted in Levenshtein distance?
ATransposition
BInsertion
CDeletion
DSubstitution
What does a Levenshtein distance of 0 between two strings mean?
AStrings are completely different
BStrings are identical
CStrings differ by one character
DStrings have the same length
If you change 'cat' to 'cut', what is the Levenshtein distance?
A3
B2
C1
D0
Which algorithmic approach is commonly used to compute Levenshtein distance?
AGreedy algorithm
BBacktracking
CDivide and conquer
DDynamic programming
Levenshtein distance is useful in which NLP task?
ASpell checking
BPart-of-speech tagging
CParsing
DTopic modeling
Explain what Edit distance (Levenshtein) measures and why it is important in NLP.
Think about how you fix typos or compare words.
You got /3 concepts.
    Describe how the dynamic programming method calculates the Levenshtein distance between two strings.
    Imagine filling a table step by step to find the answer.
    You got /3 concepts.

      Practice

      (1/5)
      1. What does the edit distance (Levenshtein distance) between two words measure?
      easy
      A. The length difference between two words
      B. The minimum number of single-character edits to change one word into the other
      C. The number of common letters between two words
      D. The number of vowels in both words combined

      Solution

      1. Step 1: Understand the definition of edit distance

        Edit distance counts how many changes like insertions, deletions, or substitutions are needed to convert one word into another.
      2. Step 2: Compare options with the definition

        Only The minimum number of single-character edits to change one word into the other correctly describes this minimum number of single-character edits.
      3. Final Answer:

        The minimum number of single-character edits to change one word into the other -> Option B
      4. Quick Check:

        Edit distance = minimum edits [OK]
      Hint: Edit distance = smallest edits to match words [OK]
      Common Mistakes:
      • Confusing edit distance with common letters count
      • Thinking it measures length difference only
      • Mixing up vowels or letter counts
      2. Which of the following Python code snippets correctly initializes a 2D table for computing edit distance between strings s1 and s2 of lengths m and n?
      easy
      A. table = [[0] * (n + 1) for _ in range(m + 1)]
      B. table = [[0] * m for _ in range(n)]
      C. table = [[0] * (m + 1) for _ in range(n + 1)]
      D. table = [[0] * n for _ in range(m)]

      Solution

      1. Step 1: Recall table size for edit distance

        The table size must be (m+1) rows and (n+1) columns to include empty prefixes of both strings.
      2. Step 2: Match code to correct dimensions

        table = [[0] * (n + 1) for _ in range(m + 1)] creates a list with (m+1) rows and each row has (n+1) zeros, which is correct.
      3. Final Answer:

        table = [[0] * (n + 1) for _ in range(m + 1)] -> Option A
      4. Quick Check:

        Table size = (m+1) x (n+1) [OK]
      Hint: Table size = (len(s1)+1) x (len(s2)+1) [OK]
      Common Mistakes:
      • Swapping m and n in dimensions
      • Forgetting to add 1 for empty string prefix
      • Using wrong list comprehension order
      3. What is the edit distance between the words "kitten" and "sitting"?
      medium
      A. 2
      B. 4
      C. 3
      D. 5

      Solution

      1. Step 1: Identify edits from "kitten" to "sitting"

        Changes: substitute 'k' -> 's', substitute 'e' -> 'i', insert 'g' at the end.
      2. Step 2: Count total edits

        There are 3 edits: 2 substitutions and 1 insertion.
      3. Final Answer:

        3 -> Option C
      4. Quick Check:

        Edits counted = 3 [OK]
      Hint: Count substitutions + insertions + deletions [OK]
      Common Mistakes:
      • Counting only substitutions
      • Missing the final insertion
      • Confusing similar letters as no change
      4. Consider this Python code snippet for edit distance calculation. What is the error?
      def edit_distance(s1, s2):
          m, n = len(s1), len(s2)
          table = [[0] * (n + 1) for _ in range(m + 1)]
          for i in range(m + 1):
              table[i][0] = i
          for j in range(n + 1):
              table[0][j] = j
          for i in range(1, m + 1):
              for j in range(1, n + 1):
                  if s1[i] == s2[j]:
                      cost = 0
                  else:
                      cost = 1
                  table[i][j] = min(table[i-1][j] + 1, table[i][j-1] + 1, table[i-1][j-1] + cost)
          return table[m][n]
      medium
      A. The return statement should be table[0][0]
      B. The table initialization is incorrect
      C. The cost should be 2 when characters differ
      D. Indexing s1[i] and s2[j] causes an off-by-one error

      Solution

      1. Step 1: Check string indexing in loops

        Strings are 0-indexed, but loops start at 1, so s1[i] and s2[j] skip first character and cause index errors.
      2. Step 2: Correct indexing

        Use s1[i-1] and s2[j-1] to access correct characters for comparison.
      3. Final Answer:

        Indexing s1[i] and s2[j] causes an off-by-one error -> Option D
      4. Quick Check:

        Indexing error = s1[i-1], s2[j-1] needed [OK]
      Hint: Remember strings start at index 0, loops at 1 need offset [OK]
      Common Mistakes:
      • Using s1[i] instead of s1[i-1]
      • Thinking cost must be 2 for difference
      • Returning wrong table cell
      5. You want to find the closest word to "flame" from the list ["frame", "flan", "flame", "blame"] using edit distance. Which word will the algorithm select as closest?
      hard
      A. "flame"
      B. "frame"
      C. "blame"
      D. "flan"

      Solution

      1. Step 1: Calculate edit distances to each word

        Distances: "flame" to "frame" = 1 (substitute 'l'->'r'), to "flan" = 2 (delete 'm' and 'e'), to "blame" = 1 (substitute 'f'->'b'), to "flame" = 0 (same word).
      2. Step 2: Identify minimum distance

        The smallest distance is 0 for "flame" itself, meaning exact match.
      3. Final Answer:

        "flame" -> Option A
      4. Quick Check:

        Exact match distance = 0 [OK]
      Hint: Exact match has zero edit distance [OK]
      Common Mistakes:
      • Choosing word with one letter difference over exact match
      • Ignoring exact matches
      • Miscounting substitutions