Recall & Review
beginner
What is Jaccard similarity?
Jaccard similarity is a way to measure how similar two sets are. It is the size of the overlap divided by the size of the union of the sets.
beginner
How do you calculate Jaccard similarity between two sets A and B?
Jaccard similarity = (number of elements in both A and B) / (number of elements in A or B). As a formula: |A ∩ B| / |A ∪ B|.
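The formula can be sketched as a small Python function. The name jaccard_similarity is illustrative, and returning 1.0 for two empty sets is an assumption made here (the formula itself is 0/0 in that case, and other conventions exist).

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b| for two sets."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# One shared element out of three unique elements.
print(jaccard_similarity({"apple", "banana"}, {"banana", "cherry"}))  # 1/3, about 0.33
```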
beginner
Why is Jaccard similarity useful in NLP?
It helps compare texts by looking at shared words or features, showing how much two texts overlap in content.
beginner
What is the Jaccard similarity between the sets {apple, banana} and {banana, cherry}?
The intersection is {banana} (1 element), the union is {apple, banana, cherry} (3 elements). So similarity = 1/3 ≈ 0.33.
beginner
Can Jaccard similarity be used for comparing sentences? How?
Yes. Convert sentences into sets of words, then calculate Jaccard similarity to see how many words they share compared to total unique words.
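A minimal sketch of that idea, assuming lowercase conversion and whitespace splitting are good enough as tokenization (real text usually also needs punctuation handling). The function name sentence_jaccard and the example sentences are made up for illustration.

```python
def sentence_jaccard(s1: str, s2: str) -> float:
    """Jaccard similarity between the word sets of two sentences."""
    # Assumption: lowercase + whitespace split is sufficient tokenization.
    words1 = set(s1.lower().split())
    words2 = set(s2.lower().split())
    union = words1 | words2
    if not union:
        # Assumption: two empty sentences count as identical.
        return 1.0
    return len(words1 & words2) / len(union)


# Shared words: the, sat, on (3). Unique words overall: 7.
score = sentence_jaccard("the cat sat on the mat", "the dog sat on the log")
print(round(score, 2))  # 0.43
```

Repeated words such as "the" count only once, because each sentence is reduced to a set before comparison.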
What does Jaccard similarity measure?
A. Overlap between two sets divided by their union
B. Difference between two sets
C. Sum of elements in two sets
D. Number of elements in the first set
Jaccard similarity measures how much two sets overlap compared to their combined size.
If two sets have no elements in common, what is their Jaccard similarity?
A. 0
B. 0.5
C. 1
D. Undefined
No overlap means the intersection size is zero, so the similarity is 0 (as long as at least one of the sets is non-empty).
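The two boundary values can be checked directly. This is a small sketch with made-up example sets: disjoint sets give 0, and a set compared with itself gives 1.

```python
a = {"red", "green"}
b = {"blue", "yellow"}

# Disjoint sets: empty intersection, so 0 / 4.
disjoint = len(a & b) / len(a | b)

# A set compared with itself: intersection and union are both the set, so 2 / 2.
identical = len(a & a) / len(a | a)

print(disjoint, identical)  # 0.0 1.0
```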
Which of these is the correct formula for Jaccard similarity?
A. |A| - |B|
B. |A ∪ B| / |A ∩ B|
C. |A| + |B|
D. |A ∩ B| / |A ∪ B|
Jaccard similarity is intersection size divided by union size.
In NLP, what do we usually compare using Jaccard similarity?
A. Number of sentences
B. Sets of words from texts
C. Length of paragraphs
D. Number of characters
We compare sets of words to see how much two texts share.
If set A = {1, 2, 3} and set B = {2, 3, 4, 5}, what is their Jaccard similarity?
A. 0.6
B. 0.5
C. 0.4
D. 0.75
Intersection is {2,3} (2 elements), union is {1,2,3,4,5} (5 elements), so 2/5 = 0.4.
Explain in your own words what Jaccard similarity is and how it can be used to compare two texts.
Think about how much two groups share compared to all items they have.
Describe how to calculate Jaccard similarity step-by-step for two example sets.
Use a simple example with fruits or numbers.
Practice
1. What does the Jaccard similarity measure between two sets?
easy
A. The difference between the sizes of the two sets
B. The size of the union divided by the size of the intersection
C. The sum of the sizes of the two sets
D. The size of the intersection divided by the size of the union
Solution
Step 1: Understand the definition of Jaccard similarity
Jaccard similarity is defined as the size of the intersection of two sets divided by the size of their union.
Step 2: Compare options with the definition
The size of the intersection divided by the size of the union matches the definition exactly, while others describe different calculations.
Final Answer:
The size of the intersection divided by the size of the union -> Option D
Quick Check:
Jaccard similarity = intersection / union [OK]
Hint: Remember: overlap divided by total unique items [OK]
Common Mistakes:
Confusing union with intersection
Using subtraction instead of division
Mixing up numerator and denominator
2. Which of the following Python code snippets correctly calculates the Jaccard similarity between two sets A and B?
easy
A. len(A | B) / len(A & B)
B. len(A & B) / len(A | B)
C. len(A - B) / len(B - A)
D. len(A) + len(B)
Solution
Step 1: Identify set operations for intersection and union
In Python, & is intersection and | is union for sets.
Step 2: Check the formula for Jaccard similarity
Jaccard similarity = size of intersection / size of union, which matches len(A & B) / len(A | B).
Final Answer:
len(A & B) / len(A | B) -> Option B
Quick Check:
Intersection & union operators used correctly [OK]
Hint: Use & for intersection, | for union in Python sets [OK]
Common Mistakes:
Swapping intersection and union operators
Using subtraction instead of intersection
Adding lengths instead of dividing
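As a brief illustration of the operators discussed in this solution, the sketch below compares the & and | operators with the equivalent set methods intersection() and union(). The example sets are made up; both forms should give the same result.

```python
A = {"a", "b", "c"}
B = {"b", "c", "d"}

# Operator form: both operands must be sets.
with_operators = len(A & B) / len(A | B)

# Method form: the argument may be any iterable, such as a list.
with_methods = len(A.intersection(["b", "c", "d"])) / len(A.union(["b", "c", "d"]))

print(with_operators, with_methods)  # 0.5 0.5
```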
3. Given two sets A = {'apple', 'banana', 'cherry'} and B = {'banana', 'cherry', 'date', 'fig'}, what is the Jaccard similarity computed by this code?
len(A & B) / len(A | B)
medium
A. 0.4
B. 0.5
C. 0.6
D. 0.75
Solution
Step 1: Calculate intersection and union of sets A and B
Intersection: {'banana', 'cherry'} has 2 elements. Union: {'apple', 'banana', 'cherry', 'date', 'fig'} has 5 elements.
Step 2: Compute Jaccard similarity
Similarity = 2 / 5 = 0.4.
Final Answer:
0.4 -> Option A
Quick Check:
2 / 5 = 0.4 [OK]
Hint: Count common and total unique items, then divide [OK]
Common Mistakes:
Counting union incorrectly
Using addition instead of division
Mixing up intersection and union counts
4. The following code is intended to compute the Jaccard similarity between two sets A and B. What is the error?
def jaccard(A, B):
    return len(A & B) / len(A & B | B)
medium
A. Function missing return statement
B. Division by zero error possible
C. Incorrect use of union and intersection operators in denominator
D. Sets A and B are not defined
Solution
Step 1: Analyze the denominator expression
The denominator is len(A & B | B). In Python, & binds more tightly than |, so the expression is evaluated as (A & B) | B. Because A & B is always a subset of B, the result is just B, and the denominator becomes len(B) instead of the size of the union of A and B.
Step 2: Correct denominator for union
The denominator should simply be len(A | B). The current expression gives the right answer only when A is a subset of B.
Final Answer:
Incorrect use of union and intersection operators in denominator -> Option C
Quick Check:
Union must be A | B, not combined with & [OK]
Hint: The union of A and B is simply A | B; do not mix & into it [OK]
Common Mistakes:
Confusing operator precedence
Using intersection inside union calculation
Not testing code before use
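The effect of the precedence bug can be seen by running both versions side by side. This is a sketch: the names jaccard_buggy and jaccard_fixed and the example sets are chosen here for illustration.

```python
def jaccard_buggy(A, B):
    # The denominator parses as (A & B) | B, which always equals B.
    return len(A & B) / len(A & B | B)


def jaccard_fixed(A, B):
    # The denominator is the true union of A and B.
    return len(A & B) / len(A | B)


A = {1, 2, 3}
B = {2, 3, 4, 5}

print((A & B | B) == B)     # True
print(jaccard_buggy(A, B))  # 0.5  (2 / 4, divides by len(B))
print(jaccard_fixed(A, B))  # 0.4  (2 / 5, divides by len(A | B))
```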
5. You want to compare two documents by their unique words using Jaccard similarity. Document 1 has 100 unique words, Document 2 has 80 unique words, and they share 30 unique words. What is the Jaccard similarity? Also, if 20 new words are added to both documents (so that all 20 become shared), how does the similarity change?
hard
A. Initial similarity 0.2; after adding common words similarity increases to 0.3
B. Initial similarity 0.15; after adding common words similarity decreases
C. Initial similarity 0.25; after adding common words similarity stays the same
D. Initial similarity 0.18; after adding common words similarity increases to 0.33
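When only the set sizes are known, the union size follows from inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|. The sketch below applies that to the numbers in this question, assuming the 20 added words are new to both documents and therefore increase each document's size and the shared count by 20. The helper name jaccard_from_counts is hypothetical.

```python
def jaccard_from_counts(size_a: int, size_b: int, shared: int) -> float:
    """Jaccard similarity from set sizes, using inclusion-exclusion for the union."""
    union = size_a + size_b - shared
    return shared / union


# Before: 100 and 80 unique words, 30 shared, so the union has 150 words.
before = jaccard_from_counts(100, 80, 30)

# After (assumption: 20 new words added to both): 120, 100, 50 shared, union of 170.
after = jaccard_from_counts(120, 100, 50)

print(round(before, 2), round(after, 2))  # 0.2 0.29
```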