What is Cosine similarity in NLP?

Cosine similarity measures how alike two items are by looking at the angle between their feature vectors, ignoring the vectors' magnitudes.


Practice


1. What does cosine similarity measure between two vectors?

easy

A. The difference in vector lengths

B. How close the vectors point in the same direction

C. The sum of vector elements

D. The distance between vector origins

Solution

Step 1: Understand vector comparison
Cosine similarity compares the angle between two vectors, not their length or sum.
Step 2: Interpret cosine similarity meaning
A value close to 1 means the vectors point in nearly the same direction, which indicates similarity; 0 means they are perpendicular, and -1 means they point in opposite directions.
Final Answer:
How close the vectors point in the same direction -> Option B
Quick Check:
Cosine similarity = direction closeness [OK]

Hint: Cosine similarity checks angle, not length or sum [OK]
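The point about direction versus length can be checked with a small sketch. It uses only the standard library; the function name `cosine_similarity` and the example vectors are illustrative choices, not part of the original question.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v = [1.0, 2.0]
same_direction = [10.0, 20.0]  # v scaled by 10: longer, but same direction
perpendicular = [-2.0, 1.0]    # at 90 degrees to v
opposite = [-1.0, -2.0]        # v reversed

sim_same = cosine_similarity(v, same_direction)
sim_perp = cosine_similarity(v, perpendicular)
sim_opp = cosine_similarity(v, opposite)
print(sim_same, sim_perp, sim_opp)
```

Up to floating-point rounding, the three scores come out as 1, 0, and -1, even though `same_direction` is ten times longer than `v`.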

Common Mistakes:

Confusing cosine similarity with Euclidean distance
Thinking it measures vector length difference
Assuming it sums vector values

2. Which of the following is the correct formula for cosine similarity between vectors A and B?

easy

A. \( \frac{\|A\|}{\|B\|} \)

B. \( \|A - B\| \)

C. \( \frac{A \cdot B}{\|A\| \times \|B\|} \)

D. \( A + B \)

Solution

Step 1: Recall cosine similarity formula
Cosine similarity is the dot product of vectors divided by the product of their lengths.
Step 2: Match formula to options
Option C, \( \frac{A \cdot B}{\|A\| \times \|B\|} \), is the dot product divided by the product of the norms. The other options are a ratio of norms (A), a Euclidean distance (B), and a vector sum (D).
Final Answer:
\( \frac{A \cdot B}{\|A\| \times \|B\|} \) -> Option C
Quick Check:
Cosine similarity = dot product / product of norms [OK]

Hint: Look for dot product over product of lengths [OK]
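For reference, the same formula written out per component. The index \( i \) and the dimension \( n \) are notation introduced here, assuming both vectors have \( n \) entries:

```latex
\[
\cos\theta
  = \frac{A \cdot B}{\|A\| \times \|B\|}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^2} \times \sqrt{\sum_{i=1}^{n} B_i^2}}
\]
```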

Common Mistakes:

Choosing Euclidean distance formula
Adding vectors instead of dot product
Dividing norms instead of multiplying

3. Given vectors A = [1, 2, 3] and B = [4, 5, 6], what is the cosine similarity (rounded to 2 decimals)?

medium

A. 0.97

B. 0.83

C. 0.74

D. 0.56

Solution

Step 1: Calculate dot product of A and B
Dot product = 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
Step 2: Calculate norms of A and B
Norm A = sqrt(1^2 + 2^2 + 3^2) = sqrt(14) ≈ 3.74; Norm B = sqrt(4^2 + 5^2 + 6^2) = sqrt(77) ≈ 8.77
Step 3: Compute cosine similarity
Cosine similarity = 32 / (sqrt(14) * sqrt(77)) = 32 / sqrt(1078) ≈ 32 / 32.83 ≈ 0.9746, which rounds to 0.97
Step 4: Check closest option
0.97 matches the value rounded to 2 decimals.
Final Answer:
0.97 -> Option A
Quick Check:
Dot product / (norms product) ≈ 0.97 [OK]

Hint: Calculate dot product and divide by product of lengths [OK]
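The arithmetic above can be reproduced step by step. This is a minimal sketch using only the standard library; the variable names are illustrative. Rounding is applied only when printing, so the intermediate values keep full precision.

```python
import math

A = [1, 2, 3]
B = [4, 5, 6]

# Step 1: dot product, 1*4 + 2*5 + 3*6
dot = sum(x * y for x, y in zip(A, B))

# Step 2: Euclidean norms, sqrt(14) and sqrt(77)
norm_a = math.sqrt(sum(x * x for x in A))
norm_b = math.sqrt(sum(y * y for y in B))

# Step 3: divide the dot product by the product of the norms
similarity = dot / (norm_a * norm_b)

print(dot, round(norm_a, 2), round(norm_b, 2), round(similarity, 2))
```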

Common Mistakes:

Forgetting to take vector norms
Mixing up dot product with element-wise multiplication
Rounding too early causing wrong answer

4. What is wrong with this Python code to compute cosine similarity?

import numpy as np

def cosine_sim(a, b):
    return np.dot(a, b) / np.linalg.norm(a + b)

A = np.array([1, 0])
B = np.array([0, 1])
print(cosine_sim(A, B))

medium

A. It should add vectors before dot product

B. It uses np.dot instead of np.cross

C. It misses normalizing vectors before dot product

D. It divides by norm of sum instead of product of norms

Solution

Step 1: Analyze denominator in code
The code divides by norm of (a + b), but cosine similarity requires product of norms of a and b.
Step 2: Understand correct formula
Correct denominator is np.linalg.norm(a) * np.linalg.norm(b), not norm of sum.
Final Answer:
It divides by norm of sum instead of product of norms -> Option D
Quick Check:
Denominator must be product of norms [OK]

Hint: Denominator is product of norms, not norm of sum [OK]
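A corrected version might look like the sketch below. The name `cosine_sim_buggy` and the extra vector `C` are hypothetical additions for comparison. Note that the vectors in the question are perpendicular, so their dot product is 0 and both versions return 0 for them; a pair of vectors that are not perpendicular is needed to expose the bug.

```python
import numpy as np

def cosine_sim(a, b):
    # Divide by the product of the two norms, not the norm of the sum.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def cosine_sim_buggy(a, b):
    # The version from the question, kept only for comparison.
    return np.dot(a, b) / np.linalg.norm(a + b)

A = np.array([1, 0])
B = np.array([0, 1])
C = np.array([2, 0])  # same direction as A, twice as long

# Perpendicular inputs: the numerator is 0, so the bug is hidden.
print(cosine_sim(A, B), cosine_sim_buggy(A, B))

# Parallel inputs: the correct score is 1; the buggy version gives 2/3.
print(cosine_sim(A, C), cosine_sim_buggy(A, C))
```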

Common Mistakes:

Using norm of sum instead of product
Confusing dot product with cross product
Normalizing vectors before dot product unnecessarily

5. You have two text documents represented as TF-IDF vectors: doc1 = [0, 1, 2, 0] and doc2 = [1, 0, 1, 1]. Which step is best to improve cosine similarity comparison for very sparse vectors?

hard

A. Normalize vectors to unit length before computing cosine similarity

B. Add the vectors element-wise before similarity

C. Use Euclidean distance instead of cosine similarity

D. Ignore zero elements in vectors

Solution

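Step 1: Understand sparse vector challenges
Sparse vectors are mostly zeros, so only the few nonzero entries contribute to the dot product and to the norms.
Step 2: Identify best practice for cosine similarity
Cosine similarity already divides by both norms, so normalizing first does not change the score. Once each vector is scaled to unit length, though, cosine similarity reduces to a plain dot product, and each norm is computed once per document instead of once per comparison. Options B and C replace the measure with something else, and option D does not change the score, since zero entries contribute nothing to the dot product or the norms.
Final Answer:
Normalize vectors to unit length before computing cosine similarity -> Option A
Quick Check:
For unit-length vectors, cosine similarity = dot product [OK]

Hint: Normalize once, then compare with a plain dot product [OK]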
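As a rough check on the vectors from the question, the sketch below computes the score twice: once with the full formula and once as a dot product of unit-length vectors. The helper names `l2_normalize` and `dot` are illustrative, and dense Python lists are used for simplicity; a real TF-IDF pipeline would more likely use a sparse matrix type.

```python
import math

def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

doc1 = [0, 1, 2, 0]
doc2 = [1, 0, 1, 1]

# Full formula: dot product divided by the product of the norms.
direct = dot(doc1, doc2) / (math.sqrt(dot(doc1, doc1)) * math.sqrt(dot(doc2, doc2)))

# Normalize each document once, then use a plain dot product.
unit1, unit2 = l2_normalize(doc1), l2_normalize(doc2)
via_unit_vectors = dot(unit1, unit2)

print(round(direct, 4), round(via_unit_vectors, 4))
```

Both routes give the same score, 2 / sqrt(15), up to floating-point rounding.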

Common Mistakes:

Adding vectors instead of comparing
Switching to Euclidean distance without reason
Ignoring zeros instead of normalizing
