from langchain.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

text = "Hello, world!"
vector = embedding_model.embed_query(text)
print(len(vector))
The 'sentence-transformers/all-MiniLM-L6-v2' model produces 384-dimensional embeddings, so embed_query returns a list of 384 floats and the snippet above prints 384.
The HuggingFaceEmbeddings class is configured through keyword arguments, and the model is selected with the model_name keyword. Because the class is a pydantic model, passing the model name as a positional argument raises a TypeError, and using an unrecognized keyword fails validation.
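A minimal sketch of the keyword-only behavior described above. It assumes langchain is installed and skips gracefully if it is not; the positional call fails before any model weights are downloaded.

```python
outcome = None
try:
    from langchain.embeddings import HuggingFaceEmbeddings
    try:
        # Positional argument: rejected, because HuggingFaceEmbeddings is a
        # pydantic model whose constructor accepts keyword arguments only.
        HuggingFaceEmbeddings('sentence-transformers/all-MiniLM-L6-v2')
        outcome = 'accepted'  # not expected to happen
    except TypeError:
        outcome = 'rejected'  # the keyword form model_name=... is required
except ImportError:
    outcome = 'langchain missing'

print(outcome)
```

The TypeError is raised by pydantic's constructor itself, so this check is cheap: no model is loaded and no network access is needed.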
from langchain.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vector = embedding_model.embed_query('Test input')
HuggingFace embedding models rely on the 'sentence-transformers' package (which in turn depends on 'transformers' and 'torch'). If it is not installed, constructing or using the model raises an ImportError.
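One way to surface the dependency issue early is a guarded import check, sketched below; it only probes whether the sentence-transformers backend can be imported, without loading any model.

```python
# Check whether the sentence-transformers backend (required by
# HuggingFaceEmbeddings) is importable in this environment.
try:
    import sentence_transformers  # noqa: F401
    HAS_BACKEND = True
except ImportError:
    HAS_BACKEND = False

print("sentence-transformers available:", HAS_BACKEND)
```

If HAS_BACKEND is False, installing the package (for example with pip install sentence-transformers) resolves the ImportError.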
If you call embed_query twice with the same input text on the same HuggingFace embedding model instance, what will be the relationship between the two output vectors?

embedding_model = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
vec1 = embedding_model.embed_query('Repeat this sentence')
vec2 = embedding_model.embed_query('Repeat this sentence')
print(vec1 == vec2)  # True: the two vectors are identical
Open-source embedding models like 'all-MiniLM-L6-v2' produce deterministic embeddings for the same input text when called multiple times.
The 'sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2' model is trained to handle multiple languages well, making it suitable for multilingual embedding tasks.
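A sketch of a cross-lingual check with the multilingual model: it embeds an English sentence and its German translation (example sentences chosen here for illustration) and computes their cosine similarity, which should be high for semantically equivalent text. The whole block is guarded, since it needs langchain, sentence-transformers, and network access to download the model.

```python
import math

cos = None
try:
    from langchain.embeddings import HuggingFaceEmbeddings

    model = HuggingFaceEmbeddings(
        model_name='sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')

    # Same meaning, two languages.
    en = model.embed_query('I love cats')
    de = model.embed_query('Ich liebe Katzen')

    # Cosine similarity between the two embedding vectors.
    dot = sum(a * b for a, b in zip(en, de))
    norm_en = math.sqrt(sum(a * a for a in en))
    norm_de = math.sqrt(sum(b * b for b in de))
    cos = dot / (norm_en * norm_de)
    print(f'cross-lingual cosine similarity: {cos:.3f}')
except Exception as exc:  # missing dependency or no network for the download
    print('skipped:', exc)
```

For a multilingual model, translated pairs like this typically score much closer to 1.0 than unrelated sentences do, which is what makes the model suitable for cross-lingual retrieval.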