
Jaccard similarity in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this Jaccard similarity calculation?
Given two sets A = {1, 2, 3, 4} and B = {3, 4, 5, 6}, what is the Jaccard similarity computed by the code below?
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
intersection = A.intersection(B)
union = A.union(B)
jaccard_similarity = len(intersection) / len(union)
print(round(jaccard_similarity, 2))
A. 0.50
B. 0.33
C. 0.25
D. 0.40
💡 Hint
Recall that Jaccard similarity is the size of the intersection divided by the size of the union of two sets.
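The hint above can be traced step by step on a different pair of sets. This is a minimal sketch; the names `left`, `right`, `shared`, and `combined` are illustrative and not part of the challenge. It also shows that the `&` and `|` operators give the same results as the `.intersection()` and `.union()` methods used in the problem.

```python
# Illustrative sets (deliberately different from the ones in the problem).
left = {10, 20, 30}
right = {10, 20, 30, 40}

# Elements present in both sets, and elements present in either set.
shared = left.intersection(right)
combined = left.union(right)

# The set operators are equivalent to the method calls above.
assert shared == left & right
assert combined == left | right

# Jaccard similarity: size of the intersection over size of the union.
score = len(shared) / len(combined)
print(sorted(shared), sorted(combined), round(score, 2))
```

Here three of the four distinct elements are shared, so the score is 3 / 4.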
🧠 Conceptual
intermediate
Which statement best describes Jaccard similarity?
Choose the best description of what Jaccard similarity measures between two sets.
A. The ratio of the size of the intersection to the size of the union of two sets.
B. The difference in sizes between two sets.
C. The sum of the sizes of two sets.
D. The ratio of the size of the union to the size of the intersection of two sets.
💡 Hint
Think about how much two sets overlap compared to their total combined size.
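For reference, the overlap idea in the hint is usually written as the formula below. The symbol J is notation assumed here; A and B stand for any two sets that are not both empty.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1
```

The value is 1 when the two sets are identical and 0 when they share no elements.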
Metrics
advanced
What is the Jaccard similarity between these two token sets?
Given two token sets from text documents: doc1 = {'apple', 'banana', 'cherry'} and doc2 = {'banana', 'cherry', 'date', 'fig'}, what is their Jaccard similarity?
A. 0.4
B. 0.5
C. 0.6
D. 0.75
💡 Hint
Count the common tokens and total unique tokens.
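The counting approach in the hint can be sketched on a made-up pair of token sets. The variables `doc_x` and `doc_y` and their contents are hypothetical, chosen so they do not mirror the problem.

```python
# Hypothetical token sets, unrelated to doc1 and doc2 in the problem.
doc_x = {"red", "green", "blue", "black", "grey"}
doc_y = {"green", "blue", "grey", "white", "pink"}

common = doc_x & doc_y       # tokens that appear in both documents
all_unique = doc_x | doc_y   # every distinct token across both documents

score = len(common) / len(all_unique)
print(f"{len(common)} common / {len(all_unique)} unique = {score:.2f}")
```

Because sets discard duplicates, a token that appears in both documents is counted only once in the union.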
🔧 Debug
advanced
Why does this Jaccard similarity code raise an error?
Consider this code snippet:
def jaccard_similarity(list1, list2):
    intersection = list1 & list2
    union = list1 | list2
    return len(intersection) / len(union)

print(jaccard_similarity(['a', 'b'], ['b', 'c']))
Why does it raise an error?
A. The function returns a float but print expects a string.
B. The function is missing a return statement.
C. Lists do not support the '&' and '|' operators; sets are needed.
D. The lists are empty, causing division by zero.
💡 Hint
Check the data types and operators used for intersection and union.
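One way to act on the hint is to inspect what the operators do with each data type. The sketch below is one possible variant, not the official solution: it converts its inputs to sets before applying `&` and `|`. Returning 0.0 when both inputs are empty is a convention assumed here to avoid dividing by zero.

```python
def jaccard_similarity(items1, items2):
    """Jaccard similarity of two iterables, compared as sets."""
    set1, set2 = set(items1), set(items2)
    union = set1 | set2
    if not union:
        # Assumed convention for this sketch: two empty inputs score 0.0.
        return 0.0
    return len(set1 & set2) / len(union)


# Applying '&' directly to two lists raises a TypeError.
try:
    ["x", "y"] & ["y", "z"]
except TypeError as exc:
    print("lists:", exc)

# After conversion to sets the same data works.
print(jaccard_similarity(["x", "y", "z"], ["y", "z", "w"]))
```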
Model Choice
expert
Which implementation correctly computes Jaccard similarity between two texts?
You have two text documents, each given as a raw string, and want to measure their similarity using Jaccard similarity on sets of word tokens. Which function below computes this correctly?
A.
def jaccard(doc1, doc2):
    return len(set(doc1).intersection(set(doc2))) / len(set(doc1).union(set(doc2)))
B.
def jaccard(doc1, doc2):
    tokens1, tokens2 = doc1.split(), doc2.split()
    return len(tokens1 & tokens2) / len(tokens1 | tokens2)
C.
def jaccard(doc1, doc2):
    set1, set2 = set(doc1), set(doc2)
    return len(set1 & set2) / len(set1 | set2)
D.
def jaccard(doc1, doc2):
    set1, set2 = set(doc1.split()), set(doc2.split())
    return len(set1.intersection(set2)) / len(set1.union(set2))
💡 Hint
Consider how to tokenize text properly before computing sets.
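To see why tokenization matters, the sketch below compares two made-up strings at the character level and at the word level. The helper `jaccard` and the strings `text1` and `text2` are assumptions for illustration; whitespace splitting stands in for a real tokenizer.

```python
def jaccard(set1, set2):
    """Jaccard similarity of two sets; 0.0 if both are empty (assumed convention)."""
    union = set1 | set2
    return len(set1 & set2) / len(union) if union else 0.0


# Two made-up strings that use the same letters but share no words.
text1 = "night owl"
text2 = "thing low"

# set() on a string yields its individual characters (including the space).
char_score = jaccard(set(text1), set(text2))

# str.split() yields whitespace-separated words.
word_score = jaccard(set(text1.split()), set(text2.split()))

print("character level:", char_score)
print("word level:", word_score)
```

The two texts look identical at the character level and completely different at the word level, so the choice of token unit changes what the score means.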