Complete the code to calculate cosine similarity between two vectors.
from numpy import dot
from numpy.linalg import norm

def cosine_similarity(vec1, vec2):
    return dot(vec1, vec2) / (norm(vec1) [1] norm(vec2))
Cosine similarity divides the dot product by the product of the norms (lengths) of the two vectors, so the blank is the multiplication operator *.
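To check the formula by hand, here is a minimal sketch that computes the same quantity with only the standard library. The name cosine_similarity_plain and the choice to return 0.0 for a zero-length vector are assumptions made for illustration; the exercise itself uses NumPy and does not handle that case.

```python
import math

def cosine_similarity_plain(vec1, vec2):
    """Cosine similarity of two equal-length sequences of numbers."""
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot_product = sum(a * b for a, b in zip(vec1, vec2))
    # Euclidean norm (length) of each vector.
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    # Assumption: treat similarity with a zero vector as 0.0
    # instead of dividing by zero.
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot_product / (norm1 * norm2)
```

Vectors pointing in the same direction score close to 1, and perpendicular vectors score 0.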
Complete the code to convert text into a vector using term frequency.
def term_frequency(text, word):
    words = text.lower().split()
    return words.count([1]) / len(words)
The blank is word: words.count(word) counts how many times the given word appears in the text, and dividing by len(words) turns that count into a frequency.
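A filled-in version might look like the sketch below. Two details are assumptions added here and are not part of the exercise: the query word is lowercased so that it matches the lowercased text, and an empty text returns 0.0 instead of raising ZeroDivisionError.

```python
def term_frequency(text, word):
    """Fraction of the whitespace-separated words in text equal to word."""
    words = text.lower().split()
    # Assumption: an empty text has a term frequency of 0.0.
    if not words:
        return 0.0
    # Assumption: lowercase the query too, since the text was lowercased.
    return words.count(word.lower()) / len(words)
```

For example, term_frequency("the cat and the hat", "the") counts 2 matches out of 5 words.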
Complete the code to compute Jaccard similarity between two sets.
def jaccard_similarity(set1, set2):
    intersection = set1.intersection(set2)
    union = set1.[1](set2)
    return len(intersection) / len(union)
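The union of two sets is found with the union method, so the blank is union. Jaccard similarity is then the size of the intersection divided by the size of the union.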
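As a sketch of the completed function, the version below fills the blank with union. The guard for two empty sets is an assumption added for illustration; the exercise code would raise ZeroDivisionError in that case.

```python
def jaccard_similarity(set1, set2):
    """Size of the intersection divided by the size of the union."""
    intersection = set1.intersection(set2)
    union = set1.union(set2)
    # Assumption: two empty sets are given a similarity of 0.0
    # to avoid dividing by zero.
    if not union:
        return 0.0
    return len(intersection) / len(union)
```

With {1, 2, 3} and {2, 3, 4} the intersection has 2 elements and the union has 4, giving 0.5.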
Fill both blanks to create a dictionary of word counts for words longer than 3 letters.
text = 'machine learning finds related text'
words = text.split()
word_counts = {word: words.count(word) for word in words if len(word) [1] [2]}
We want words longer than 3 letters, so length must be greater than 3.
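The completed comprehension can be wrapped in a small helper to try it on other inputs. The function name count_long_words and its min_length parameter are hypothetical, introduced only for this sketch.

```python
def count_long_words(text, min_length=3):
    """Map each word longer than min_length letters to its count in text."""
    words = text.split()
    # Same comprehension as the exercise, with the blanks filled by > and 3
    # (min_length defaults to 3).
    return {word: words.count(word) for word in words if len(word) > min_length}

word_counts = count_long_words('machine learning finds related text')
```

Every word in the exercise text has at least 4 letters, so all five are kept with a count of 1. A text such as 'a big data data set' shows the filter at work: only 'data' passes, with a count of 2.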
Fill all three blanks to filter words and create a dictionary with uppercase keys and counts for words longer than 4 letters.
text = 'similarity measures find related text'
words = text.split()
filtered_counts = {[1]: [2] for word in words if len(word) [3] 4}
Keys are the uppercase words, values are their counts, and the filter keeps only words longer than 4 letters.
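One way to fill the three blanks is sketched below, using word.upper() for the key, words.count(word) for the value, and > for the comparison.

```python
text = 'similarity measures find related text'
words = text.split()

# Key: the word in uppercase. Value: how often the word occurs.
# Filter: keep only words with more than 4 letters.
filtered_counts = {word.upper(): words.count(word) for word in words if len(word) > 4}

print(filtered_counts)
```

Here 'find' and 'text' have exactly 4 letters, so they are dropped, leaving SIMILARITY, MEASURES, and RELATED with a count of 1 each.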