You have two text documents represented as TF-IDF vectors: doc1 = [0, 1, 2, 0] and doc2 = [1, 0, 1, 1]. Which step is best to improve cosine similarity comparison for very sparse vectors?

Difficulty: hard · Type: Application · Question 15 of 15

NLP - Text Similarity and Search

A. Normalize vectors to unit length before computing cosine similarity

B. Add the vectors element-wise before similarity

C. Use Euclidean distance instead of cosine similarity

D. Ignore zero elements in vectors

Step-by-Step Solution

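Step 1: Understand sparse vector challenges
Sparse TF-IDF vectors are mostly zeros, so only the few terms two documents share contribute to their dot product. Their magnitudes still vary with document length and term weights, so a fair comparison should depend on direction (the angle between the vectors), not on length.
Step 2: Identify best practice for cosine similarity
Cosine similarity divides the dot product by both vector norms. Normalizing each vector to unit length once, up front, applies that division in advance: every later comparison is then a plain dot product, and differences in vector length cannot bias the score.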
Final Answer:
Normalize vectors to unit length before computing cosine similarity -> Option A
Quick Check:
For unit-length vectors the dot product equals the cosine similarity; for doc1 and doc2 it is 2 / sqrt(15) ≈ 0.516 [OK]
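
As a minimal sketch of the two steps above, the following plain-Python snippet normalizes the two example vectors and then takes their dot product. The helper names `l2_normalize` and `dot` are illustrative assumptions, not part of the quiz, and dense lists are used only for readability; real TF-IDF vectors would normally be kept in a sparse format.

```python
import math

def l2_normalize(vec):
    """Scale vec to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

doc1 = [0, 1, 2, 0]
doc2 = [1, 0, 1, 1]

# Normalize once, up front.
u1 = l2_normalize(doc1)
u2 = l2_normalize(doc2)

# For unit vectors, the dot product is the cosine similarity: 2 / sqrt(15).
similarity = dot(u1, u2)
print(round(similarity, 4))  # 0.5164
```

Once the vectors are stored at unit length, each further comparison needs only the dot product, which depends solely on the nonzero entries the two documents share.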

Quick Trick: Always normalize vectors before cosine similarity [OK]

Common Mistakes:

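Adding the vectors element-wise, which merges the two documents into one vector instead of comparing them
Switching to Euclidean distance, which is sensitive to vector length rather than direction
Ignoring zero elements instead of normalizing: zeros already contribute nothing to the dot product, and dropping positions misaligns the term dimensions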
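
To illustrate why switching to Euclidean distance does not help here, this sketch scales doc1 by a hypothetical factor of 10 (roughly as if the same text were repeated) and compares both measures. The function names, the variable `doc1_long`, and the scaling factor are assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

doc1 = [0, 1, 2, 0]
doc2 = [1, 0, 1, 1]
doc1_long = [10 * x for x in doc1]  # hypothetical: same direction, 10x the length

# Cosine similarity depends only on direction, so scaling does not change it.
print(round(cosine(doc1, doc2), 4), round(cosine(doc1_long, doc2), 4))

# Euclidean distance grows with the length difference.
print(euclidean(doc1, doc2), round(euclidean(doc1_long, doc2), 2))
```

The cosine score is the same for both versions of doc1, while the Euclidean distance rises from 2.0 to more than 20 even though the term proportions are unchanged.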
