
Cosine similarity in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding the range of cosine similarity values
What is the range of values that cosine similarity between two vectors can take?
A. From -1 to 0, where -1 means identical vectors and 0 means orthogonal vectors
B. From -1 to 1, where 1 means identical direction and -1 means opposite direction
C. From 0 to 1, where 0 means orthogonal vectors and 1 means identical vectors
D. From 0 to infinity, where higher values mean more similarity
💡 Hint
Think about the angle between two vectors and how cosine behaves.
Predict Output
intermediate
Output of cosine similarity calculation
What is the output of this Python code calculating cosine similarity between two vectors?
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

vec1 = np.array([1, 0, 0])
vec2 = np.array([0, 1, 0])
result = cosine_similarity(vec1, vec2)
print(round(result, 2))
A. 0.71
B. 1.0
C. -1.0
D. 0.0
💡 Hint
Consider the angle between the two vectors.
Model Choice
advanced
Choosing cosine similarity for text similarity
You want to measure similarity between two text documents represented as TF-IDF vectors. Which similarity measure is most appropriate?
A. Cosine similarity
B. Euclidean distance
C. Manhattan distance
D. Jaccard index
💡 Hint
Think about how vector length affects similarity in text data.
Metrics
advanced
Interpreting cosine similarity value in recommendation
A recommendation system uses cosine similarity between user preference vectors. If two users have a cosine similarity of 0.95, what does this imply?
A. They have no preferences in common
B. They have very different preferences
C. They have very similar preferences
D. They have opposite preferences
💡 Hint
Recall what a high cosine similarity value means.
🔧 Debug
expert
Debugging cosine similarity code with zero vector
What happens when this code computes cosine similarity and one input vector is all zeros?
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

vec1 = np.array([0, 0, 0])
vec2 = np.array([1, 2, 3])
result = cosine_similarity(vec1, vec2)
print(result)
A. ZeroDivisionError
B. ValueError
C. TypeError
D. No exception; NumPy emits a RuntimeWarning and prints nan
💡 Hint
Think about what NumPy does when a value is divided by a zero norm.
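One common way to handle the zero-vector case is to check the norms before dividing. The sketch below is only an illustration, not part of the challenge: the name `safe_cosine_similarity` is hypothetical, and returning 0.0 for a zero vector is a convention assumed here, since cosine similarity is mathematically undefined when either norm is zero.

```python
import numpy as np

def safe_cosine_similarity(a, b):
    """Cosine similarity that returns 0.0 when either vector has zero norm.

    Returning 0.0 is a convention assumed for this sketch; the quantity is
    mathematically undefined for a zero vector.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Skip the division entirely instead of dividing by zero.
        return 0.0
    return float(np.dot(a, b) / denom)

print(safe_cosine_similarity([0, 0, 0], [1, 2, 3]))  # 0.0
print(safe_cosine_similarity([1, 2, 3], [1, 2, 3]))
```

Depending on the application, raising an exception or returning `nan` explicitly may be a better choice than 0.0; the point is to make the behavior deliberate.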

Practice

1. What does cosine similarity measure between two vectors?
easy
A. The difference in vector lengths
B. How close the vectors point in the same direction
C. The sum of vector elements
D. The distance between vector origins

Solution

  1. Step 1: Understand vector comparison

    Cosine similarity compares the angle between two vectors, not their length or sum.
  2. Step 2: Interpret cosine similarity meaning

    A value close to 1 means the vectors point in nearly the same direction, which indicates high similarity.
  3. Final Answer:

    How close the vectors point in the same direction -> Option B
  4. Quick Check:

    Cosine similarity = direction closeness [OK]
Hint: Cosine similarity checks angle, not length or sum [OK]
Common Mistakes:
  • Confusing cosine similarity with Euclidean distance
  • Thinking it measures vector length difference
  • Assuming it sums vector values
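To make the direction-versus-length distinction concrete, here is a small sketch that compares a vector with a scaled copy of itself. The vectors are arbitrary values chosen for illustration. Cosine similarity stays at 1 because the direction is unchanged, while Euclidean distance grows with the length difference.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = 10 * a  # same direction, ten times longer

cos_ab = cosine_similarity(a, b)
dist_ab = np.linalg.norm(a - b)  # Euclidean distance, for contrast

print(round(float(cos_ab), 4))   # 1.0
print(round(float(dist_ab), 4))  # 33.6749
```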
2. Which of the following is the correct formula for cosine similarity between vectors A and B?
easy
A. \( \frac{\|A\|}{\|B\|} \)
B. \( \|A - B\| \)
C. \( \frac{A \cdot B}{\|A\| \times \|B\|} \)
D. \( A + B \)

Solution

  1. Step 1: Recall cosine similarity formula

    Cosine similarity is the dot product of vectors divided by the product of their lengths.
  2. Step 2: Match formula to options

    Option C, \( \frac{A \cdot B}{\|A\| \times \|B\|} \), is the dot product over the product of the norms. Option A is a ratio of norms, option B is the Euclidean distance, and option D is a vector sum.
  3. Final Answer:

    \( \frac{A \cdot B}{\|A\| \times \|B\|} \) -> Option C
  4. Quick Check:

    Cosine similarity = dot product / product of norms [OK]
Hint: Look for dot product over product of lengths [OK]
Common Mistakes:
  • Choosing Euclidean distance formula
  • Adding vectors instead of dot product
  • Dividing norms instead of multiplying
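Written out in components, the same formula looks as follows. The index \( i \), the dimension \( n \), and the angle \( \theta \) between the vectors are notation introduced here for illustration:

\[ \cos\theta = \frac{A \cdot B}{\|A\| \times \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \times \sqrt{\sum_{i=1}^{n} B_i^2}} \]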
3. Given vectors A = [1, 2, 3] and B = [4, 5, 6], what is the cosine similarity (rounded to 2 decimals)?
medium
A. 0.97
B. 0.83
C. 0.74
D. 0.56

Solution

  1. Step 1: Calculate dot product of A and B

    Dot product = 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
  2. Step 2: Calculate norms of A and B

    Norm A = sqrt(1^2 + 2^2 + 3^2) = sqrt(14) ≈ 3.74; Norm B = sqrt(4^2 + 5^2 + 6^2) = sqrt(77) ≈ 8.77
  3. Step 3: Compute cosine similarity

    Cosine similarity = 32 / (sqrt(14) * sqrt(77)) = 32 / sqrt(1078) ≈ 32 / 32.83 ≈ 0.9746, which rounds to 0.97
  4. Step 4: Check closest option

    0.97 matches the value rounded to 2 decimals.
  5. Final Answer:

    0.97 -> Option A
  6. Quick Check:

    Dot product / (norms product) ≈ 0.97 [OK]
Hint: Calculate dot product and divide by product of lengths [OK]
Common Mistakes:
  • Forgetting to take vector norms
  • Mixing up dot product with element-wise multiplication
  • Rounding too early causing wrong answer
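The hand calculation above can be checked with a few lines of NumPy. This is a minimal sketch; the intermediate variable names are chosen here for readability.

```python
import numpy as np

A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

dot = np.dot(A, B)            # 1*4 + 2*5 + 3*6
norm_a = np.linalg.norm(A)    # sqrt(14)
norm_b = np.linalg.norm(B)    # sqrt(77)
similarity = dot / (norm_a * norm_b)

print(int(dot))                     # 32
print(round(float(similarity), 2))  # 0.97
```

Keeping full precision until the final `round` avoids the early-rounding mistake listed above.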
4. What is wrong with this Python code to compute cosine similarity?
import numpy as np

def cosine_sim(a, b):
    return np.dot(a, b) / np.linalg.norm(a + b)

A = np.array([1, 0])
B = np.array([0, 1])
print(cosine_sim(A, B))
medium
A. It should add vectors before dot product
B. It uses np.dot instead of np.cross
C. It misses normalizing vectors before dot product
D. It divides by norm of sum instead of product of norms

Solution

  1. Step 1: Analyze denominator in code

    The code divides by norm of (a + b), but cosine similarity requires product of norms of a and b.
  2. Step 2: Understand correct formula

    Correct denominator is np.linalg.norm(a) * np.linalg.norm(b), not norm of sum.
  3. Final Answer:

    It divides by norm of sum instead of product of norms -> Option D
  4. Quick Check:

    Denominator must be product of norms [OK]
Hint: Denominator is product of norms, not norm of sum [OK]
Common Mistakes:
  • Using norm of sum instead of product
  • Confusing dot product with cross product
  • Normalizing vectors before dot product unnecessarily
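With the orthogonal inputs from the question the dot product is 0, so the buggy and corrected versions both print 0.0 and the bug stays hidden. The sketch below uses two identical vectors instead, values chosen here for illustration, where the difference is visible. The name `cosine_sim_buggy` is hypothetical and just labels the original function.

```python
import numpy as np

def cosine_sim_buggy(a, b):
    # Denominator from the question: norm of the sum (incorrect).
    return np.dot(a, b) / np.linalg.norm(a + b)

def cosine_sim(a, b):
    # Corrected denominator: product of the two norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

A = np.array([1, 1])
B = np.array([1, 1])

print(round(float(cosine_sim_buggy(A, B)), 4))  # 0.7071
print(round(float(cosine_sim(A, B)), 4))        # 1.0
```

Identical vectors should score 1, which only the corrected version reports.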
5. You have two text documents represented as TF-IDF vectors: doc1 = [0, 1, 2, 0] and doc2 = [1, 0, 1, 1]. Which step is best to improve cosine similarity comparison for very sparse vectors?
hard
A. Normalize vectors to unit length before computing cosine similarity
B. Add the vectors element-wise before similarity
C. Use Euclidean distance instead of cosine similarity
D. Ignore zero elements in vectors

Solution

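  1. Step 1: Understand sparse vector challenges

    Sparse vectors have many zeros, so only the few non-zero entries contribute to the dot product and the norms.
  2. Step 2: Identify best practice for cosine similarity

    Scaling each vector to unit length once means every later comparison is a plain dot product. The similarity value itself does not change, because cosine similarity already divides by both norms; the gain is cheaper and more consistent comparisons. The other options either replace the measure (B, C) or break the alignment of dimensions between the two vectors (D).
  3. Final Answer:

    Normalize vectors to unit length before computing cosine similarity -> Option A
  4. Quick Check:

    Dot product of unit vectors = cosine similarity [OK]
Hint: Unit-length vectors turn cosine similarity into a dot product [OK]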
Common Mistakes:
  • Adding vectors instead of comparing
  • Switching to Euclidean distance without reason
  • Ignoring zeros instead of normalizing
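A quick check on doc1 and doc2 from the question: once both vectors are scaled to unit length, their plain dot product equals the cosine similarity computed from the raw vectors. This is a minimal sketch using dense NumPy arrays; the variable names are chosen here, and real TF-IDF pipelines typically store such vectors in a sparse format.

```python
import numpy as np

doc1 = np.array([0.0, 1.0, 2.0, 0.0])
doc2 = np.array([1.0, 0.0, 1.0, 1.0])

# Cosine similarity computed directly from the raw vectors.
cosine = np.dot(doc1, doc2) / (np.linalg.norm(doc1) * np.linalg.norm(doc2))

# Normalize each vector once, then compare with a plain dot product.
unit1 = doc1 / np.linalg.norm(doc1)
unit2 = doc2 / np.linalg.norm(doc2)
dot_of_units = np.dot(unit1, unit2)

print(round(float(cosine), 4))        # 0.5164
print(round(float(dot_of_units), 4))  # 0.5164
```

When many documents are compared against each other, normalizing once up front avoids recomputing the norms for every pair.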