Text embedding models convert text into numbers. What is their main purpose?
Think about how computers can represent text in a numerical form that machines can compare and compute with.
Text embedding models create fixed-length vectors that represent the meaning of text, enabling machines to compare and analyze text effectively.
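As a minimal illustration of the "text in, fixed-length vector out" idea (not a semantic model), the sketch below hashes each word into one of a fixed number of buckets and counts occurrences; the function `toy_embed` and its bucket count are invented for this example:

```python
import hashlib
import numpy as np

def toy_embed(text, dim=8):
    """Toy fixed-length embedding: hash each word into one of `dim`
    buckets and count occurrences. Real embedding models learn their
    vectors from data; this only shows that texts of any length map
    to vectors of the same fixed length."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

a = toy_embed("machine learning is fun")
b = toy_embed("short text")
print(a.shape, b.shape)  # both (8,) regardless of text length
```

Because every text lands in the same 8-dimensional space, any two texts can be compared numerically, which is exactly what real embedding models enable at scale.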
You want to create embeddings for sentences to compare their meanings. Which model type is best?
Consider models designed to understand language context deeply.
Transformer-based models like BERT or GPT are trained on large text datasets and excel at capturing semantic relationships in text, making them ideal for embeddings.
Given this Python code snippet:
sentences = ["Hello world", "Machine learning is fun", "AI helps humans"]
embeddings = model.encode(sentences)
print(embeddings.shape)
What will be printed?
Each sentence gets a vector of length 768. How many sentences are there?
Assuming the model outputs 768-dimensional vectors (as BERT-base does), each of the 3 sentences is encoded into a vector of length 768, so the output shape is (3, 768).
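The shape arithmetic can be checked with a stand-in encoder; `StubModel` below is invented for this sketch, and the 768 dimensions mimic BERT-base rather than any particular library:

```python
import numpy as np

class StubModel:
    """Stand-in for an embedding model: returns one 768-dimensional
    vector per input sentence (random values, no real semantics)."""
    def encode(self, sentences):
        return np.random.rand(len(sentences), 768)

model = StubModel()
sentences = ["Hello world", "Machine learning is fun", "AI helps humans"]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768): 3 sentences, 768 dimensions each
```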
You have two text embeddings and want to measure how similar their meanings are. Which metric is most appropriate?
Think about a metric that measures the angle between two vectors rather than their length.
Cosine similarity measures the angle between two vectors, focusing on their direction, which is ideal for comparing semantic similarity of embeddings.
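A minimal NumPy implementation makes the definition concrete: the dot product of the two vectors divided by the product of their lengths. Scaling a vector does not change its direction, so its cosine similarity with itself stays at 1:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: their dot
    product divided by the product of their Euclidean norms."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, 2 * a))   # ~1.0: same direction, different length
print(cosine_similarity(a, -a))      # ~-1.0: opposite direction
```

This is why cosine similarity suits embeddings: it compares what the vectors point at, not how long they are.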
Consider this code snippet:
text = "AI is amazing"
embedding = model.encode(text)
print(embedding.shape)
It raises: AttributeError: 'float' object has no attribute 'shape'. Why?
Check the input type expected by the encode method.
This model's encode method expects a list of strings. Passing a bare string makes it mishandle the input and return a plain Python float instead of an array, and a float has no .shape attribute. Wrapping the input in a list, model.encode([text]), returns an array whose .shape works as expected.
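The fix can be demonstrated with a toy encoder that, like the model in the question, mishandles a bare string; `toy_encode` is invented for this sketch, and real libraries differ in exactly how they fail:

```python
import numpy as np

def toy_encode(texts):
    """Toy encoder that assumes `texts` is a list of strings.
    Given a bare string it collapses to a single float, mimicking
    the misuse in the question: the result has no .shape attribute."""
    if isinstance(texts, str):
        return float(len(texts))  # degenerate output: a plain float
    return np.random.rand(len(texts), 768)

text = "AI is amazing"
bad = toy_encode(text)      # a float; bad.shape would raise an error
good = toy_encode([text])   # wrap in a list: one row per input
print(good.shape)           # (1, 768)
```

The general lesson: when an encode call returns something without array attributes, check whether the API expects a batch (a list) rather than a single item.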