Challenge - 5 Problems

🧠 Conceptual

intermediate

What does cosine similarity measure in document ranking?

Imagine you have two documents represented as vectors. What does cosine similarity tell you about these documents?

A. The angle between the two document vectors, indicating how similar their content is.

B. The total number of common words between the two documents.

C. The difference in length between the two documents measured by word count.

D. The sum of the frequencies of all words in both documents.

❓ Predict Output

intermediate

Output of cosine similarity calculation

What is the output of this Python code that calculates cosine similarity between two document vectors?

import numpy as np

doc1 = np.array([1, 2, 3])
doc2 = np.array([4, 5, 6])

cos_sim = np.dot(doc1, doc2) / (np.linalg.norm(doc1) * np.linalg.norm(doc2))
print(round(cos_sim, 2))

A. 0.75

B. 1.00

C. 0.97

D. 0.87

❓ Model Choice

advanced

Best model for semantic document similarity

You want to rank documents by meaning, not just word overlap. Which model is best for this?

A. Bag-of-words model with Euclidean distance

B. Count vectorizer with Jaccard similarity

C. TF-IDF vectorizer with cosine similarity

D. Pretrained transformer embeddings with cosine similarity

❓ Hyperparameter

advanced

Effect of embedding dimension on similarity ranking

How does increasing the embedding vector size affect document similarity ranking?

A. It can improve accuracy but may cause overfitting or slow computation.

B. It reduces accuracy because larger vectors are harder to compare.

C. It has no effect on similarity ranking performance.

D. It always improves ranking accuracy by capturing more details.

❓ Metrics

expert

Choosing the best metric for ranking evaluation

You have a list of documents ranked by similarity to a query. Which metric best measures how well the ranking matches user relevance?

A. Mean Squared Error (MSE)

B. Precision at K (P@K)

C. Root Mean Squared Logarithmic Error (RMSLE)

D. Confusion Matrix
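
For readers unfamiliar with the ranking metric named in the options, Precision at K is the fraction of the top K ranked documents that are relevant. The following is a minimal sketch, assuming binary relevance labels; the helper name `precision_at_k` and the document IDs are hypothetical and chosen only for illustration.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    # Count how many of the top-k documents appear in the relevant set.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not by len(top_k)) is one common convention.
    return hits / k


# Hypothetical ranking returned for a query, best match first.
ranking = ["d3", "d1", "d7", "d2", "d5"]
# Hypothetical set of documents a user judged relevant.
relevant = {"d1", "d2", "d9"}

p_at_3 = precision_at_k(ranking, relevant, 3)  # only d1 is a hit in the top 3
p_at_5 = precision_at_k(ranking, relevant, 5)  # d1 and d2 are hits in the top 5
```

Because the metric only looks at the top K positions, it reflects what a user actually sees first, which is why it is used for rankings and not for regression-style predictions.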

Practice

1. What does document similarity ranking help us do in natural language processing?

easy

A. Find how related two texts are based on their content

B. Translate documents into different languages

C. Summarize long documents into short ones

D. Detect spelling errors in documents

Document similarity ranking in NLP - Practice Problems & Coding Challenges

Solution

Step 1: Understand the purpose of document similarity ranking

Step 2: Identify the correct description

Final Answer:

Quick Check:

Solution

Step 1: Recall cosine similarity formula

Step 2: Match formula to code

Final Answer:

Quick Check:
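
As an illustration of the two steps in this solution, the cosine similarity formula is cos(θ) = (a · b) / (‖a‖ ‖b‖). The sketch below matches each term of the formula to a line of code. It is written in plain Python under the assumption that the vectors are short lists of numbers; the function name `cosine_sim` is hypothetical, and the original problem's code is not reproduced here.

```python
import math


def cosine_sim(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))      # a · b
    norm_a = math.sqrt(sum(x * x for x in a))   # ‖a‖
    norm_b = math.sqrt(sum(y * y for y in b))   # ‖b‖
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same direction, different lengths: the angle is zero.
same_direction = cosine_sim([3, 4], [6, 8])
# Perpendicular vectors: no shared direction at all.
orthogonal = cosine_sim([1, 0], [0, 1])
```

The first pair shows that only direction matters: doubling a vector's length leaves the similarity unchanged.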

Solution

Step 1: Understand TF-IDF vectorization of similar documents

Step 2: Calculate cosine similarity between vectors

Final Answer:

Quick Check:
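
The two steps named in this solution can be sketched end to end as follows. This is a simplified illustration, not the code from the original problem: it assumes whitespace tokenization and one common smoothed IDF form, ln((1 + N) / (1 + df)) + 1. The names `tfidf_vectors` and `cosine_sim` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build simple TF-IDF vectors over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumption; other variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vectors


def cosine_sim(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
sim_close = cosine_sim(vecs[0], vecs[1])  # near-duplicate sentences
sim_far = cosine_sim(vecs[0], vecs[2])    # no words in common
```

Documents that share most of their terms end up with a similarity close to 1, while documents with no terms in common score exactly 0 under this scheme.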

Solution

Step 1: Check input types for cosine_similarity

Step 2: Understand how to fix the error

Final Answer:

Quick Check:

Solution

Step 1: Understand ranking by similarity

Step 2: Identify correct method

Final Answer:

Quick Check:
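
Ranking by similarity generally means scoring every document against the query and sorting the scores in descending order. The sketch below shows one way to do that in plain Python. It is an illustration under assumptions, not the original problem's code: the vectors are made up, and `cosine_sim` and `rank_by_similarity` are hypothetical helper names.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_sim(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    # reverse=True puts the highest similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up three-dimensional vectors for a query and three documents.
query = [1.0, 1.0, 0.0]
doc_vecs = {
    "doc_b": [1.0, 0.0, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
    "doc_a": [1.0, 1.0, 0.0],
}

ranking = rank_by_similarity(query, doc_vecs)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

Sorting in ascending order by mistake would put the least similar document first, which is a common source of wrong answers in ranking exercises.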