You want to combine cosine similarity with another metric to improve document clustering. Which combination is most reasonable?

Hard | Application | Q9 of 15
Topic: NLP - Text Similarity and Search
A. Cosine similarity with Jaccard similarity on token sets
B. Cosine similarity with random guessing
C. Cosine similarity with Euclidean distance on raw counts without normalization
D. Cosine similarity with sorting document lengths
Step-by-Step Solution
  1. Step 1: Understand complementary metrics

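    Cosine similarity compares term-frequency (or TF-IDF) vectors by the angle between them, so it reflects how often words occur while ignoring overall document length. Jaccard similarity compares the sets of distinct tokens, |A ∩ B| / |A ∪ B|, and ignores frequency entirely. Because the two capture different signals, combining them can make the similarity used for clustering more robust.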
  2. Step 2: Evaluate options

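    Random guessing adds noise rather than information, and sorting by document length is an ordering, not a similarity measure. Euclidean distance on raw, unnormalized counts is dominated by document length: two documents on the same topic but of very different lengths end up far apart, which undermines the length invariance that cosine similarity provides.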
  3. Final Answer:

    Cosine similarity with Jaccard similarity on token sets -> Option A
  4. Quick Check:

    Cosine (angle between count vectors) plus Jaccard (overlap of token sets) adds a complementary signal, so Option A is the reasonable combination.
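A minimal standard-library sketch of the two metrics from Step 1. The helper names `cosine_similarity` and `jaccard_similarity`, the whitespace tokenization, and the toy documents are illustrative assumptions, not part of the original question. It shows the metrics disagreeing: cosine rates `doc_c` closer to `doc_a` because their word frequencies line up, while Jaccard rates `doc_b` closer because it has exactly the same vocabulary.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    # Cosine of the angle between raw term-count vectors
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(tokens_a, tokens_b):
    # |A ∩ B| / |A ∪ B| over the sets of distinct tokens
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0  # convention: two empty documents count as identical
    return len(a & b) / len(a | b)

doc_a = "data data data data model".split()
doc_b = "data model".split()            # same vocabulary, different frequencies
doc_c = "data data data data".split()   # dominated by "data" like doc_a, but no "model"

for name, other in [("doc_b", doc_b), ("doc_c", doc_c)]:
    print(name,
          f"cosine={cosine_similarity(doc_a, other):.3f}",
          f"jaccard={jaccard_similarity(doc_a, other):.3f}")
# doc_b cosine=0.857 jaccard=1.000
# doc_c cosine=0.970 jaccard=0.500
```

The two rankings flip, which is exactly why the metrics complement each other rather than duplicating information.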
Quick Trick: Pair cosine similarity with a set-based similarity such as Jaccard.
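One way to act on this trick, sketched under assumptions: blend the two scores with a hypothetical weight `alpha`, then group documents whose blended similarity clears a hypothetical `threshold`. The union-find grouping below is a simple single-linkage stand-in for a real clustering algorithm, and all helper names are illustrative. In practice you would more likely turn `1 - combined_similarity` into a distance matrix and pass it to a clusterer that accepts precomputed distances (for example, scikit-learn's `AgglomerativeClustering` with `metric="precomputed"` in recent versions; check the documentation for your version).

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    # Cosine of the angle between raw term-count vectors
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def jaccard_similarity(tokens_a, tokens_b):
    # Overlap of distinct-token sets; two empty documents count as identical
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 1.0

def combined_similarity(tokens_a, tokens_b, alpha=0.5):
    # Hypothetical weighted blend: alpha=1.0 is pure cosine, alpha=0.0 is pure Jaccard
    return (alpha * cosine_similarity(tokens_a, tokens_b)
            + (1 - alpha) * jaccard_similarity(tokens_a, tokens_b))

def threshold_clusters(docs, threshold=0.4, alpha=0.5):
    # Single-linkage grouping via union-find: documents connected by a chain of
    # pairs with combined similarity >= threshold land in the same cluster
    tokens = [d.lower().split() for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if combined_similarity(tokens[i], tokens[j], alpha) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

docs = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices rose sharply today",
    "stock prices fell sharply today",
]
print(threshold_clusters(docs))  # [[0, 1], [2, 3]]
```

`alpha` and `threshold` are placeholders to tune on your own data; the point is only that both signals feed a single similarity score before clustering.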
Common Mistakes:
  • Using random or irrelevant metrics
  • Ignoring normalization for Euclidean distance
  • Treating a sort by document length as a similarity measure
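To make the normalization mistake concrete, here is a small standard-library sketch; the toy documents and helper names are illustrative assumptions. On raw counts, a document repeated ten times sits farther from the original than an unrelated document does. After L2-normalizing each count vector, the repeated document coincides with the original. For unit vectors, squared Euclidean distance equals 2 - 2·cos, so normalized Euclidean distance ranks pairs the same way cosine similarity does.

```python
import math
from collections import Counter

def count_vector(text):
    # Raw term counts from lowercased whitespace tokens
    return Counter(text.lower().split())

def euclidean(a, b):
    # Terms missing from either vector count as 0
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

def l2_normalize(v):
    # Scale the vector to unit length; an empty vector is returned unchanged
    norm = math.sqrt(sum(c * c for c in v.values()))
    return {k: c / norm for k, c in v.items()} if norm else dict(v)

short = count_vector("data model")
long_doc = count_vector("data model " * 10)  # same text, ten times longer
other = count_vector("stock price")          # unrelated topic

print(round(euclidean(short, long_doc), 3))  # 12.728: same topic looks far apart
print(round(euclidean(short, other), 3))     # 2.0: unrelated topic looks closer

print(round(euclidean(l2_normalize(short), l2_normalize(long_doc)), 3))  # 0.0
print(round(euclidean(l2_normalize(short), l2_normalize(other)), 3))     # 1.414
```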