Hard · Application · Q15 of 15
NLP - Text Similarity and Search
You have two text documents represented as TF-IDF vectors: doc1 = [0, 1, 2, 0] and doc2 = [1, 0, 1, 1]. Which step is best to improve cosine similarity comparison for very sparse vectors?
A. Normalize vectors to unit length before computing cosine similarity
B. Add the vectors element-wise before similarity
C. Use Euclidean distance instead of cosine similarity
D. Ignore zero elements in vectors
Step-by-Step Solution
Solution:
  1. Step 1: Understand sparse vector challenges

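    Sparse TF-IDF vectors are mostly zeros, and their magnitudes vary with document length and term weights. Scaling each vector to unit length removes magnitude from the comparison, so only the angle between the vectors matters.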
  2. Step 2: Identify best practice for cosine similarity

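    Cosine similarity divides the dot product by both vector norms, so it is already insensitive to length. Normalizing to unit length beforehand keeps that property and reduces the computation to a plain dot product, to which only the positions that are nonzero in both vectors contribute.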
  3. Final Answer:

    Normalize vectors to unit length before computing cosine similarity -> Option A
  4. Quick Check:

    For doc1 and doc2: dot product = 2, norms = sqrt(5) and sqrt(3), so cosine similarity = 2 / sqrt(15) ≈ 0.516, with or without pre-normalizing [OK]
Quick Trick: Always normalize vectors before cosine similarity [OK]
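
The solution above can be checked with a minimal sketch in plain Python. The helper names (norm, normalize, dot, cosine_similarity) are illustrative and not part of the quiz; treating an all-zero vector as having similarity 0.0 is an assumption made here to avoid division by zero.

```python
import math

def norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale v to unit length; an all-zero vector is returned unchanged (assumption)."""
    n = norm(v)
    return [x / n for x in v] if n > 0 else list(v)

def dot(u, v):
    """Dot product; positions where either entry is zero contribute nothing."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros, by assumption)."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return dot(u, v) / (nu * nv)

doc1 = [0, 1, 2, 0]
doc2 = [1, 0, 1, 1]

# Cosine similarity computed directly on the raw vectors.
direct = cosine_similarity(doc1, doc2)

# Same quantity as a plain dot product of the unit-length vectors.
via_unit = dot(normalize(doc1), normalize(doc2))

print(round(direct, 3), round(via_unit, 3))  # prints: 0.516 0.516
```

Normalizing once and storing the unit vectors is mainly useful when many pairs are compared, since each comparison then needs only a dot product.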
Common Mistakes:
  • Adding the vectors element-wise instead of comparing them
  • Switching to Euclidean distance, which is sensitive to vector length, without a reason
  • Ignoring zeros instead of normalizing
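
To see why switching to Euclidean distance is risky, here is a small sketch under one assumption: doc2_long is a hypothetical scaled copy of doc2, standing in for a longer document with the same term proportions. The function names are illustrative only.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros, by assumption)."""
    d = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return d / (nu * nv) if nu and nv else 0.0

def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

doc1 = [0, 1, 2, 0]
doc2 = [1, 0, 1, 1]
doc2_long = [3 * x for x in doc2]  # hypothetical: same direction, three times the length

cos_same = cosine_similarity(doc1, doc2)
cos_long = cosine_similarity(doc1, doc2_long)
euc_same = euclidean_distance(doc1, doc2)
euc_long = euclidean_distance(doc1, doc2_long)

# Cosine similarity is unchanged by scaling; Euclidean distance is not.
print(round(cos_same, 3), round(cos_long, 3))  # prints: 0.516 0.516
print(round(euc_same, 3), round(euc_long, 3))  # prints: 2.0 4.472
```

The scaled document points in the same direction as doc2, so the angle to doc1 is the same, yet its Euclidean distance from doc1 more than doubles.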
