Information retrieval basics in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the main purpose of the TF-IDF score in information retrieval?

TF-IDF is a common technique used in information retrieval. What does it primarily help with?

A. It measures how important a word is to a document relative to a collection of documents.
B. It counts the total number of words in a document.
C. It ranks documents based on their length.
D. It removes stop words from the text.
💡 Hint

Think about how TF-IDF balances word frequency with rarity across documents.
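
The balance the hint describes can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the plain variant tf = count / document length and idf = log(N / df), and the function name `tf_idf` and the three-sentence corpus are made up for the example. Libraries such as scikit-learn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed variant: tf = count / doc length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
scores = tf_idf(docs)
print(scores[0]["the"])                     # 0.0: "the" appears in every document
print(scores[0]["mat"] > scores[0]["cat"])  # True: "mat" is rarer than "cat"
```

Note that "the" is the most frequent word in the first document yet scores zero, because it occurs in every document of the collection.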

Predict Output
intermediate
What is the output of this code computing cosine similarity?

Given two vectors representing documents, what is the cosine similarity output?

import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    return dot_product / (norm1 * norm2)

vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])
result = cosine_similarity(vec1, vec2)
print(round(result, 2))
A. 0.50
B. 0.97
C. 1.00
D. 0.74
💡 Hint

Recall the cosine similarity formula and calculate the dot product and the norms carefully.
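
As a sanity check on the formula, the sketch below applies it to three easy cases on different vectors from the question: same direction, perpendicular, and opposite direction. It uses only the standard library; the vectors are chosen purely for illustration.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

parallel = cosine_similarity([1, 2], [2, 4])     # same direction
orthogonal = cosine_similarity([1, 0], [0, 1])   # perpendicular
opposite = cosine_similarity([1, 2], [-1, -2])   # opposite direction

print(round(parallel, 6))    # 1.0
print(round(orthogonal, 6))  # 0.0
print(round(opposite, 6))    # -1.0
```

The result depends only on the angle between the vectors, not on their lengths, which is why scaling `[1, 2]` to `[2, 4]` leaves the similarity at 1.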

Model Choice
advanced
Which model is best suited for semantic search in information retrieval?

You want to build a search system that understands the meaning of queries and documents beyond exact word matches. Which model would you choose?

A. TF-IDF vectorizer
B. Simple keyword matching
C. Bag-of-Words model
D. Pretrained transformer-based embeddings (e.g., BERT)
💡 Hint

Consider models that capture context and meaning, not just word counts.
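
To make the difference between word matching and meaning concrete, here is a toy comparison. The vectors in `TOY_EMBEDDINGS` are hand-written stand-ins invented for this example, not the output of any real model; a real system would obtain embeddings from a pretrained encoder. The helper names are also hypothetical.

```python
import math

# Hypothetical, hand-written 3-d vectors standing in for real embeddings.
TOY_EMBEDDINGS = {
    "cheap flights": [0.9, 0.1, 0.0],
    "low-cost airfare": [0.85, 0.15, 0.05],
    "cheap furniture": [0.2, 0.1, 0.9],
}

def keyword_overlap(query, doc):
    # Number of exact words shared by query and document.
    return len(set(query.split()) & set(doc.split()))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = "cheap flights"
docs = ["low-cost airfare", "cheap furniture"]

best_keyword = max(docs, key=lambda d: keyword_overlap(query, d))
best_embedding = max(
    docs, key=lambda d: cosine(TOY_EMBEDDINGS[query], TOY_EMBEDDINGS[d])
)

print(best_keyword)    # cheap furniture (shares the word "cheap")
print(best_embedding)  # low-cost airfare (closest vector)
```

Exact matching rewards the shared word "cheap" even though the topic differs, while the vector comparison picks the paraphrase that shares no words with the query.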

Hyperparameter
advanced
Which hyperparameter affects the number of neighbors considered in k-NN for document retrieval?

In a k-Nearest Neighbors (k-NN) model used for retrieving similar documents, which hyperparameter controls how many neighbors are checked?

A. batch size
B. learning rate
C. k (number of neighbors)
D. max depth
💡 Hint

Think about the 'k' in k-NN and what it stands for.
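
A brute-force sketch of k-NN retrieval shows where this hyperparameter enters. It assumes cosine similarity as the closeness measure and uses made-up two-dimensional document vectors; `k_nearest` and the document ids are hypothetical names, and real systems typically use an index rather than sorting every document.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def k_nearest(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to query_vec."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical document vectors, for illustration only.
doc_vecs = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.8, 0.6],
    "doc_c": [0.0, 1.0],
}
query = [1.0, 0.1]

print(k_nearest(query, doc_vecs, k=1))  # ['doc_a']
print(k_nearest(query, doc_vecs, k=2))  # ['doc_a', 'doc_b']
```

Changing `k` changes only how many of the ranked documents are returned, not how they are ranked.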

Metrics
expert
Which metric best evaluates ranking quality in information retrieval?

You want to measure how well your search engine ranks relevant documents higher than irrelevant ones. Which metric is most appropriate?

A. Normalized Discounted Cumulative Gain (NDCG)
B. Mean Squared Error (MSE)
C. Recall
D. Precision at k (P@k)
💡 Hint

Consider metrics that account for both relevance and position in the ranked list.
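
The sketch below compares a position-aware metric with a position-blind one on the same set of results in two different orders. It assumes graded relevance labels and the linear-gain form of DCG, rel / log2(rank + 1); another common form uses 2^rel - 1 as the gain. The relevance lists are invented for the example.

```python
import math

def dcg(relevances):
    # Each relevance grade is discounted by the log of its (1-based) rank.
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def precision_at_k(relevances, k):
    # Fraction of the top k results with any relevance at all.
    return sum(1 for rel in relevances[:k] if rel > 0) / k

# Hypothetical relevance grades of the same four results in two orders.
good_order = [3, 2, 0, 1]
poor_order = [0, 1, 2, 3]

print(ndcg(good_order) > ndcg(poor_order))                            # True
print(precision_at_k(good_order, 4) == precision_at_k(poor_order, 4))  # True
```

Both orderings contain the same three relevant results in the top four, so precision at 4 cannot tell them apart, while NDCG rewards the ordering that places the most relevant results first.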