Edit distance (Levenshtein) in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems

❓ Predict Output

intermediate

Output of Levenshtein distance function

What is the output of this Python function call?

```python
def levenshtein(s1, s2):
    # Recurse so that s1 is always the longer string
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    # Distance to an empty string is the length of the other string
    if len(s2) == 0:
        return len(s1)
    # Row 0 of the table: turning "" into s2[:j] takes j insertions
    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]

print(levenshtein("kitten", "sitting"))
```

🧠 Conceptual

intermediate

Understanding Levenshtein distance properties

Which of the following statements about Levenshtein distance is TRUE?

A. Levenshtein distance is always equal to the length difference between two strings.

B. Levenshtein distance is a measure of semantic similarity between words.

C. Levenshtein distance can only be zero or positive, never negative.

D. Levenshtein distance counts only substitutions, ignoring insertions and deletions.

❓ Hyperparameter

advanced

Choosing weights in weighted Levenshtein distance

In a weighted Levenshtein distance, you assign different costs to insertions, deletions, and substitutions. If you want to penalize substitutions more than insertions and deletions, which weight setting is correct?

A. Insertion=2, Deletion=2, Substitution=2

B. Insertion=1, Deletion=1, Substitution=3

C. Insertion=3, Deletion=3, Substitution=1

D. Insertion=0, Deletion=0, Substitution=5
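
One way to see what the weights do is a minimal sketch of weighted Levenshtein distance, assuming a single fixed cost per operation type (some systems use per-character costs instead). The function name `weighted_levenshtein` and its `ins_cost`, `del_cost`, and `sub_cost` parameters are illustrative only, not a library API.

```python
def weighted_levenshtein(s1: str, s2: str,
                         ins_cost: float = 1.0,
                         del_cost: float = 1.0,
                         sub_cost: float = 1.0) -> float:
    """Minimum total cost to turn s1 into s2 (illustrative sketch)."""
    m, n = len(s1), len(s2)
    # dp[i][j] = cheapest way to turn the prefix s1[:i] into s2[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + del_cost   # delete every char of s1[:i]
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost   # insert every char of s2[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = s1[i - 1] == s2[j - 1]
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,                         # delete s1[i-1]
                dp[i][j - 1] + ins_cost,                         # insert s2[j-1]
                dp[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitute
            )
    return dp[m][n]


print(weighted_levenshtein("cat", "cut"))                # 1.0
print(weighted_levenshtein("cat", "cut", sub_cost=3.0))  # 2.0
```

With `ins_cost = del_cost = 1`, any `sub_cost` above 2 is never selected, because deleting a character and inserting its replacement costs only 2; the second call shows this, returning 2.0 rather than 3.0.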

🔧 Debug

advanced

Identify the error in Levenshtein distance code

What error will this code raise when called with levenshtein('a', 'b')?

```python
def levenshtein(s1, s2):
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    if len(s2) == 0:
        return len(s1)
    previous_row = [0] * len(s2)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]

levenshtein('a', 'b')
```

A. IndexError: list index out of range

B. TypeError: unsupported operand type(s) for +: 'int' and 'str'

C. Returns 0 incorrectly

D. No error, returns 1

❓ Model Choice

expert

Best model for approximate string matching using edit distance

You want to build a system that suggests corrections for misspelled words by finding the dictionary words with the smallest edit distance. Which model or approach is best suited for this task?

A. Use a BK-tree data structure with Levenshtein distance for fast approximate matching

B. Train a deep neural network to classify words as correct or incorrect spellings

C. Use a bag-of-words model with cosine similarity to find closest words

D. Apply k-means clustering on word embeddings to group similar words
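
For reference, a BK-tree (option A) stores words so that the triangle inequality of Levenshtein distance can prune whole subtrees during a lookup. The sketch below is a minimal in-memory version under our own assumptions: unit edit costs, a small made-up word list, and hypothetical class names `BKNode` and `BKTree`.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


class BKNode:
    def __init__(self, word: str) -> None:
        self.word = word
        # Key = edit distance between the child's word and this node's word
        self.children: Dict[int, "BKNode"] = {}


class BKTree:
    def __init__(self, words: List[str]) -> None:
        self.root: Optional[BKNode] = None
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = BKNode(word)
            return
        node = self.root
        while True:
            d = levenshtein(word, node.word)
            if d == 0:
                return  # word already stored
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(word)
                return
            node = child

    def search(self, query: str, max_dist: int) -> List[Tuple[int, str]]:
        """Return (distance, word) pairs within max_dist, closest first."""
        results: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = levenshtein(query, node.word)
            if d <= max_dist:
                results.append((d, node.word))
            # Triangle inequality: only edges labelled within
            # [d - max_dist, d + max_dist] can lead to matches.
            for edge, child in node.children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree(["book", "books", "cake", "boo", "boon", "cook", "cape", "cart"])
print(tree.search("bok", 1))  # [(1, 'boo'), (1, 'book')]
```

Because each child edge is keyed by its distance to the parent, a query at distance d from a node only needs to follow edges labelled d − max_dist through d + max_dist, instead of comparing against every dictionary word.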

Practice

1. What does the edit distance (Levenshtein distance) between two words measure?

easy

A. The length difference between two words

B. The minimum number of single-character edits to change one word into the other

C. The number of common letters between two words

D. The number of vowels in both words combined

Solution

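Step 1: Understand the definition of edit distance

Edit distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into the other.

Step 2: Compare options with the definition

Only option B describes a minimum count of single-character edits; A, C, and D describe length, shared letters, and vowels, none of which is an edit count.

Final Answer: B

Quick Check: "cat" → "cut" needs exactly one substitution (a → u), so their edit distance is 1.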
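
The definition in Step 1 translates almost directly into a recursive function. This is a minimal memoized sketch (the name `edit_distance` is our own); the recursion depth grows with the combined string length, so it suits short words rather than long documents.

```python
from functools import lru_cache


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance written directly from its recursive definition."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # d(i, j) = distance between the prefixes a[:i] and b[:j]
        if i == 0:
            return j  # insert all j characters of b[:j]
        if j == 0:
            return i  # delete all i characters of a[:i]
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(d(i - 1, j) + 1,         # delete a[i-1]
                   d(i, j - 1) + 1,         # insert b[j-1]
                   d(i - 1, j - 1) + cost)  # substitute (free if equal)

    return d(len(a), len(b))


print(edit_distance("cat", "cut"))   # 1
print(edit_distance("cat", "cast"))  # 1
print(edit_distance("", "abc"))      # 3
```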

Solution

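Step 1: Recall table size for edit distance

For strings of lengths m and n, the dynamic-programming table has (m + 1) rows and (n + 1) columns; the extra row and column hold the distances from the empty prefix.

Step 2: Match code to correct dimensions

The allocation must range over m + 1 rows and n + 1 columns; sizing either side as only m or n leaves no room for the empty-prefix row or column.

Final Answer:

Quick Check: For "ab" and "abc" (m = 2, n = 3) the table is 3 × 4, and its bottom-right cell holds the distance, 1.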
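
A sketch of the table with the dimensions from Step 1, assuming unit edit costs; the function name `edit_distance_table` is hypothetical.

```python
from typing import List


def edit_distance_table(s1: str, s2: str) -> List[List[int]]:
    m, n = len(s1), len(s2)
    # (m + 1) rows x (n + 1) columns: row 0 / column 0 stand for the empty prefix
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][0] = i  # turn s1[:i] into "" with i deletions
    for j in range(n + 1):
        table[0][j] = j  # turn "" into s2[:j] with j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            table[i][j] = min(table[i - 1][j] + 1,         # deletion
                              table[i][j - 1] + 1,         # insertion
                              table[i - 1][j - 1] + cost)  # substitution or match
    return table


t = edit_distance_table("ab", "abc")
print(len(t), len(t[0]))  # 3 4
print(t[-1][-1])          # 1
```

The rows are built with a list comprehension on purpose: `[[0] * (n + 1)] * (m + 1)` would create `m + 1` references to the same row list, so writing one cell would change every row.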

Solution

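Step 1: Identify edits from "kitten" to "sitting"

kitten → sitten (substitute k → s), sitten → sittin (substitute e → i), sittin → sitting (insert g).

Step 2: Count total edits

Two substitutions plus one insertion give 3 edits.

Final Answer: 3

Quick Check: The first and fifth letters differ (k/s and e/i), and "sitting" has one extra letter (g), matching the three edits.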
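
The three edits can also be recovered mechanically by filling the table and walking back from its bottom-right cell. In this sketch (the function name `edit_script` is hypothetical) the walk prefers diagonal moves, so when several minimum edit sequences exist it returns just one of them.

```python
from typing import List, Tuple


def edit_script(s1: str, s2: str) -> List[Tuple[str, str, str]]:
    """Return one minimum sequence of (operation, from_char, to_char)."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)

    # Walk back from (m, n) to (0, 0), following a cell that produced dp[i][j].
    ops: List[Tuple[str, str, str]] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (s1[i - 1] != s2[j - 1]):
            if s1[i - 1] != s2[j - 1]:
                ops.append(("substitute", s1[i - 1], s2[j - 1]))
            i, j = i - 1, j - 1  # a match adds no operation
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops.append(("insert", "", s2[j - 1]))
            j -= 1
        else:
            ops.append(("delete", s1[i - 1], ""))
            i -= 1
    return ops[::-1]


for op in edit_script("kitten", "sitting"):
    print(op)
# ('substitute', 'k', 's')
# ('substitute', 'e', 'i')
# ('insert', '', 'g')
```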

Solution

Step 1: Check string indexing in loops

Step 2: Correct indexing

Final Answer:

Quick Check:
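
The code for this practice problem is not shown here, so the following is only a generic illustration of a common indexing slip in table-based edit distance: table indices run from 1 to len(s), while string indices run from 0 to len(s) − 1. Both function names are hypothetical.

```python
from typing import List


def costs_off_by_one(s1: str, s2: str) -> List[List[int]]:
    """BUGGY on purpose: uses the table index i directly as a string index."""
    return [[0 if s1[i] == s2[j] else 1  # fails once i == len(s1) or j == len(s2)
             for j in range(1, len(s2) + 1)]
            for i in range(1, len(s1) + 1)]


def costs_fixed(s1: str, s2: str) -> List[List[int]]:
    """Substitution cost for every table cell (i, j) with i, j >= 1."""
    # Row i stands for the prefix s1[:i], whose last character is s1[i - 1].
    return [[0 if s1[i - 1] == s2[j - 1] else 1
             for j in range(1, len(s2) + 1)]
            for i in range(1, len(s1) + 1)]


try:
    costs_off_by_one("ab", "ac")
except IndexError as exc:
    print("IndexError:", exc)   # IndexError: string index out of range

print(costs_fixed("ab", "ac"))  # [[0, 1], [1, 1]]
```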

Solution

Step 1: Calculate edit distances to each word

Step 2: Identify minimum distance

Final Answer:

Quick Check:
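
Steps 1 and 2 amount to a brute-force nearest-word search. The problem's actual word list is not shown, so the misspelling "speling" and the candidate list below are made-up illustrations, and the function names `levenshtein` and `closest_words` are hypothetical. Ties are kept, since several words can share the minimum distance.

```python
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance using two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def closest_words(word: str, dictionary: Iterable[str]) -> Tuple[int, List[str]]:
    """Step 1: distance to every candidate; Step 2: keep all at the minimum."""
    scored = [(levenshtein(word, cand), cand) for cand in dictionary]
    best = min(d for d, _ in scored)  # raises ValueError on an empty dictionary
    return best, [cand for d, cand in scored if d == best]


print(closest_words("speling", ["spelling", "spewing", "sapling", "spell"]))
# (1, ['spelling', 'spewing'])
```

This checks every dictionary word, which is fine for small lists; a structure such as a BK-tree avoids some of those comparisons on large dictionaries.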