0
0
NLPml~20 mins

Edit distance (Levenshtein) in NLP - Practice Problems & Coding Challenges

Choose your learning style9 modes available
Challenge - 5 Problems
🎖️
Edit Distance Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Predict Output
intermediate
1:30remaining
Output of Levenshtein distance function
What is the output of this Python function call?

```python def levenshtein(s1, s2): if len(s1) < len(s2): return levenshtein(s2, s1) if len(s2) == 0: return len(s1) previous_row = range(len(s2) + 1) for i, c1 in enumerate(s1): current_row = [i + 1] for j, c2 in enumerate(s2): insertions = previous_row[j + 1] + 1 deletions = current_row[j] + 1 substitutions = previous_row[j] + (c1 != c2) current_row.append(min(insertions, deletions, substitutions)) previous_row = current_row return previous_row[-1] print(levenshtein("kitten", "sitting")) ```
NLP
def levenshtein(s1, s2):
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    if len(s2) == 0:
        return len(s1)
    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]

print(levenshtein("kitten", "sitting"))
A4
B6
C5
D3
Attempts:
2 left
💡 Hint
Count the minimum edits needed to change 'kitten' into 'sitting'.
🧠 Conceptual
intermediate
1:00remaining
Understanding Levenshtein distance properties
Which of the following statements about Levenshtein distance is TRUE?
ALevenshtein distance is always equal to the length difference between two strings.
BLevenshtein distance is a measure of semantic similarity between words.
CLevenshtein distance can only be zero or positive, never negative.
DLevenshtein distance counts only substitutions, ignoring insertions and deletions.
Attempts:
2 left
💡 Hint
Think about what the distance measures and if it can be negative.
Hyperparameter
advanced
1:30remaining
Choosing weights in weighted Levenshtein distance
In a weighted Levenshtein distance, you assign different costs to insertions, deletions, and substitutions. If you want to penalize substitutions more than insertions and deletions, which weight setting is correct?
AInsertion=2, Deletion=2, Substitution=2
BInsertion=1, Deletion=1, Substitution=3
CInsertion=3, Deletion=3, Substitution=1
DInsertion=0, Deletion=0, Substitution=5
Attempts:
2 left
💡 Hint
Higher cost means more penalty.
🔧 Debug
advanced
1:30remaining
Identify the error in Levenshtein distance code
What error will this code raise when called with levenshtein('a', 'b')?

```python def levenshtein(s1, s2): if len(s1) < len(s2): return levenshtein(s2, s1) if len(s2) == 0: return len(s1) previous_row = [0] * len(s2) for i, c1 in enumerate(s1): current_row = [i + 1] for j, c2 in enumerate(s2): insertions = previous_row[j + 1] + 1 deletions = current_row[j] + 1 substitutions = previous_row[j] + (c1 != c2) current_row.append(min(insertions, deletions, substitutions)) previous_row = current_row return previous_row[-1] ```
NLP
def levenshtein(s1, s2):
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    if len(s2) == 0:
        return len(s1)
    previous_row = [0] * len(s2)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]

levenshtein('a', 'b')
AIndexError: list index out of range
BTypeError: unsupported operand type(s) for +: 'int' and 'str'
CReturns 0 incorrectly
DNo error, returns 1
Attempts:
2 left
💡 Hint
Check the length of previous_row and how indexes are accessed.
Model Choice
expert
2:00remaining
Best model for approximate string matching using edit distance
You want to build a system that suggests corrections for misspelled words by finding dictionary words with smallest edit distance. Which model or approach is best suited for this task?
AUse a BK-tree data structure with Levenshtein distance for fast approximate matching
BTrain a deep neural network to classify words as correct or incorrect spellings
CUse a bag-of-words model with cosine similarity to find closest words
DApply k-means clustering on word embeddings to group similar words
Attempts:
2 left
💡 Hint
Think about efficient search for words within a certain edit distance.