Document similarity ranking in NLP - Interactive Code Practice

Practice - 5 Tasks

Answer the questions below

1. Fill in the blank

easy

Complete the code to compute cosine similarity between two vectors.

from sklearn.metrics.pairwise import [1]

vec1 = [[1, 2, 3]]
vec2 = [[4, 5, 6]]
similarity = [1](vec1, vec2)
print(similarity)

A. cosine_similarity

B. euclidean_distance

C. manhattan_distance

D. dot_product

2. Fill in the blank

medium

Complete the code to convert text documents into TF-IDF vectors.

from sklearn.feature_extraction.text import [1]

corpus = ['I love machine learning', 'Machine learning is fun']
tfidf_vectorizer = [1]()
tfidf_matrix = tfidf_vectorizer.fit_transform(corpus)
print(tfidf_matrix.toarray())

A. HashingVectorizer

B. TfidfVectorizer

C. CountVectorizer

D. DictVectorizer

3. Fill in the blank

hard

Fix the error in the code to correctly compute similarity scores between documents.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import [1]

texts = ['Data science is cool', 'I love data science']
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(texts)
scores = [1](matrix)
print(scores)

A. manhattan_distances

B. euclidean_distances

C. pairwise_distances

D. cosine_similarity

4. Fill in the blank

hard

Fill both blanks to create a dictionary of document similarity scores above a threshold.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = ['AI is the future', 'AI and ML are related', 'I enjoy sports']
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(texts)
sim_matrix = cosine_similarity(matrix)

threshold = 0.5
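# Map each document index to the other documents that pass the threshold test.
# A document is never compared with itself, since its self-similarity is always 1.0.
similar_docs = {
    i: [j for j in range(len(texts)) if sim_matrix[i][j] [1] threshold and i != j]
    for i in range(len(texts))
    if any(sim_matrix[i][j] [2] threshold for j in range(len(texts)) if j != i)
}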
print(similar_docs)

B. >=

D. <=

5. Fill in the blank

hard

Fill all three blanks to rank documents by similarity to a query document.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ['Deep learning is powerful', 'I like deep learning', 'Cats are cute']
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(corpus)

query = ['I love learning']
query_vec = vectorizer.transform(query)
sim_scores = cosine_similarity(query_vec, matrix)[0]

ranked_docs = sorted(((i, sim_scores[i]) for i in range(len(corpus))), key=lambda x: x[1] x[2], reverse=[3])
print(ranked_docs)

B. -

D. True

Practice

1. What does document similarity ranking help us do in natural language processing?

easy

A. Find how related two texts are based on their content

B. Translate documents into different languages

C. Summarize long documents into short ones

D. Detect spelling errors in documents

Solution

Step 1: Understand the purpose of document similarity ranking

Step 2: Identify the correct description

Final Answer: A. Find how related two texts are based on their content

Quick Check:

Solution

Step 1: Recall cosine similarity formula

Step 2: Match formula to code

Final Answer:

Quick Check:
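
One way to check the cosine similarity formula against code is to compute it by hand and compare with scikit-learn. The sketch below is illustrative only: the vectors and the variable names `manual` and `library` are assumptions, not part of the original exercise.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Formula: cos(a, b) = (a . b) / (||a|| * ||b||)
manual = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# scikit-learn expects 2D inputs (one row per vector) and returns a matrix,
# so each vector is reshaped to a single row and the one entry is extracted.
library = float(cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0])

print(manual, library)
```

If the two printed values agree, the code matches the formula.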

Solution

Step 1: Understand TF-IDF vectorization of similar documents

Step 2: Calculate cosine similarity between vectors

Final Answer:

Quick Check:
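
The two steps above can be sketched as follows, assuming a small made-up corpus (the `docs` list is hypothetical). Documents that share many terms should score closer to 1, and documents with no terms in common should score 0.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus: the first two documents overlap heavily,
# the third shares no words with the first.
docs = [
    "machine learning is fun",
    "machine learning is powerful",
    "cats sleep all day",
]

# Step 1: turn each document into a TF-IDF vector (one row per document).
tfidf = TfidfVectorizer().fit_transform(docs)

# Step 2: pairwise cosine similarity; entry [i, j] compares doc i with doc j.
sim = cosine_similarity(tfidf)

print(sim.round(2))
```

The diagonal holds each document's similarity with itself, so it is useful only as a sanity check, not as a ranking signal.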

Solution

Step 1: Check input types for cosine_similarity

Step 2: Understand how to fix the error

Final Answer:

Quick Check:
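
One common input-type mistake, assumed here purely for illustration, is passing 1D vectors to `cosine_similarity`, which expects 2D arrays with one row per sample. A minimal sketch of the failure and one possible fix:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

# Passing 1D arrays is rejected by scikit-learn's input validation.
try:
    cosine_similarity(v1, v2)
    raised = False
except ValueError:
    raised = True

# Fix: reshape each vector into a single-row 2D array.
fixed = cosine_similarity(v1.reshape(1, -1), v2.reshape(1, -1))

print(raised, fixed.shape)
```

Sparse matrices returned by `TfidfVectorizer.fit_transform` are already 2D, so they can be passed directly without reshaping.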

Solution

Step 1: Understand ranking by similarity

Step 2: Identify correct method

Final Answer:

Quick Check:
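
Ranking by similarity can be sketched as sorting document indices by their score against a query, highest first. The corpus, query, and the use of `np.argsort` below are assumptions chosen for illustration; sorting `(index, score)` pairs with `sorted` would work as well.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical corpus and query.
corpus = [
    "neural networks learn features",
    "networks of neural cells in the brain",
    "the market opened higher",
]
query = ["neural networks"]

# Fit on the corpus only, then reuse the same vocabulary for the query.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(corpus)
query_vec = vectorizer.transform(query)

# One similarity score per corpus document.
scores = cosine_similarity(query_vec, matrix)[0]

# argsort is ascending, so reverse it to put the best match first.
order = np.argsort(scores)[::-1]
ranked = [(int(i), float(scores[i])) for i in order]

for idx, score in ranked:
    print(f"{score:.3f}  {corpus[idx]}")
```

Calling `transform` rather than `fit_transform` on the query keeps its vector in the same feature space as the corpus, which is what makes the scores comparable.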