Embedding generation transforms raw data into numeric vectors that machine learning models can work with. What is the main goal of this process?
Think about how words or images are represented so models can work with them.
Embedding generation creates fixed-size vectors that capture the meaning or features of data, making it easier for models to learn patterns.
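As a toy illustration of the "fixed-size vector" idea (not a real model, just a hypothetical `toy_embed` function that hashes tokens into buckets), note that inputs of any length map to vectors of the same dimension:

```python
import numpy as np

# Toy illustration (not a real embedding model): map each text to a
# fixed-size vector by hashing its tokens into a small number of
# buckets. Real models learn these vectors, but the key property is
# the same: every input becomes a vector of identical length.
def toy_embed(text, dim=8):
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    # normalize so vectors are comparable regardless of text length
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

a = toy_embed("Hello world")
b = toy_embed("Machine learning is fun and useful")
print(a.shape, b.shape)  # both (8,), regardless of input length
```

Real models replace the hashing with learned weights, but the output contract is the same: one fixed-size vector per input.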
Given the following code snippet generating embeddings for 3 sentences using a model that outputs 768-dimensional vectors, what is the shape of the resulting embedding array?
sentences = ['Hello world', 'Machine learning is fun', 'AI helps humans']
embeddings = model.encode(sentences)
print(embeddings.shape)
Each sentence gets its own vector of length 768.
The model encodes each of the 3 sentences into a 768-dimensional vector, so the output shape is (3, 768).
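The shape logic can be sketched without a real model by using a stand-in `encode` function (an assumption for illustration) that returns one random 768-dimensional row per sentence:

```python
import numpy as np

# Stand-in for a real model's encode method (hypothetical): returns
# one row per sentence, one column per embedding dimension.
def encode(sentences, dim=768):
    return np.random.rand(len(sentences), dim)

sentences = ['Hello world', 'Machine learning is fun', 'AI helps humans']
embeddings = encode(sentences)
print(embeddings.shape)  # (3, 768): 3 sentences x 768 dimensions
```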
You want to generate embeddings that capture the meaning of words depending on their sentence context. Which model type should you choose?
Think about models that understand word order and context deeply.
Transformer-based models like BERT generate embeddings that consider the context of each word in a sentence, unlike static embedding models such as word2vec, which assign one fixed vector per word regardless of its surroundings.
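The idea of context-dependence can be sketched without a transformer. In this toy illustration (an assumption for demonstration, not how BERT works internally), a word's vector is built from the letter counts of its surrounding window, so the same word gets different vectors in different sentences:

```python
import numpy as np

# Toy illustration of context-dependence (not a real transformer):
# represent a word by the letter counts of its surrounding window,
# so the same word gets a different vector in different sentences.
# Contextual models like BERT achieve this with attention instead.
def toy_context_vector(words, index):
    vec = np.zeros(26)
    window = words[max(0, index - 1): index + 2]  # word plus neighbors
    for ch in "".join(window):
        if ch.isalpha():
            vec[ord(ch.lower()) - ord('a')] += 1.0
    return vec

s1 = "river bank erosion".split()
s2 = "bank account balance".split()
v1 = toy_context_vector(s1, s1.index("bank"))
v2 = toy_context_vector(s2, s2.index("bank"))
print(np.array_equal(v1, v2))  # False: same word, different contexts
```

A static model would give "bank" one vector in both sentences; a contextual model distinguishes the riverbank from the financial institution.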
You have two embedding vectors representing sentences. Which metric best measures how similar their meanings are?
Consider a metric that measures the angle between vectors rather than their length.
Cosine similarity measures the angle between two vectors, effectively capturing similarity in direction regardless of magnitude, which is ideal for embeddings.
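A minimal NumPy sketch of cosine similarity, showing that scaling a vector leaves the score unchanged (the vectors here are made-up examples):

```python
import numpy as np

# Cosine similarity: the cosine of the angle between two vectors,
# independent of their magnitudes.
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction, twice the length
c = np.array([-1.0, 0.0, 1.0])  # different direction

print(cosine_similarity(a, b))  # 1.0: identical direction despite scaling
print(round(cosine_similarity(a, c), 3))  # 0.378: partially aligned
```

Because embedding magnitudes often reflect incidental factors like text length, comparing direction rather than length is what makes cosine similarity the usual choice.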
Examine the code below that attempts to generate embeddings for a list of texts. Why does it raise a TypeError?
texts = ['data science', 'deep learning']
embeddings = model.encode(texts[0], texts[1])
Check how the encode method is called and what arguments it expects.
The encode method expects a single argument containing all the texts as a list. Calling it with two separate string arguments sends the second string to a parameter that does not accept it, which raises a TypeError.
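The corrected call can be sketched with a stand-in model class (a hypothetical `StubModel`, assumed here to mirror the expected interface of an encode method that takes one list):

```python
import numpy as np

# Hypothetical stand-in mirroring the expected interface: encode
# takes a single list of texts and returns one vector per text.
class StubModel:
    def encode(self, texts, dim=4):
        return np.random.rand(len(texts), dim)

model = StubModel()
texts = ['data science', 'deep learning']
embeddings = model.encode(texts)  # pass the list itself, not unpacked strings
print(embeddings.shape)  # (2, 4)
```

Passing `texts` directly keeps all inputs in one batch, which is also how real embedding libraries achieve efficient batched inference.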