How to Use OpenAI Embeddings with Langchain: Simple Guide
To use OpenAIEmbeddings in Langchain, import it and create an instance with your OpenAI API key. Then call embedder.embed_documents() or embedder.embed_query() to convert text into vector embeddings for search or similarity tasks.
Syntax
The main parts to use OpenAI embeddings in Langchain are:
- OpenAIEmbeddings(): The class that creates an embedding generator.
- embed_documents(list_of_texts): Converts a list of texts into vectors.
- embed_query(single_text): Converts a single text query into a vector.
You need to provide your OpenAI API key, either through the OPENAI_API_KEY environment variable or directly when creating the instance.
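Because a missing key only surfaces as an authentication error once a request is sent, it can help to check for it up front. The following is a minimal sketch, not part of Langchain: `require_openai_api_key` is a hypothetical helper name, and it only covers the environment-variable route.

```python
import os


def require_openai_api_key(env=None) -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; it is not a Langchain function.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating OpenAIEmbeddings."
        )
    return key


if __name__ == "__main__":
    try:
        require_openai_api_key()
        print("API key found")
    except RuntimeError as err:
        print(f"Configuration problem: {err}")
```

Calling this before creating the embeddings object turns a late authentication failure into an early, readable configuration message. If you pass the key to the constructor instead, check the documentation of your installed version for the exact parameter name.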
python
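# Newer releases ship this class in the langchain-openai package;
# older releases used: from langchain.embeddings import OpenAIEmbeddings
from langchain_openai import OpenAIEmbeddings

# Create an embeddings object (reads OPENAI_API_KEY from the environment)
embedder = OpenAIEmbeddings()

# Embed multiple documents
vectors = embedder.embed_documents(["Hello world", "Langchain is cool"])

# Embed a single query
query_vector = embedder.embed_query("Hello")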
Example
This example embeds two documents and a single query with OpenAI embeddings in Langchain, then prints the length of the resulting vectors.
python
# Newer releases ship this class in the langchain-openai package;
# older releases used: from langchain.embeddings import OpenAIEmbeddings
from langchain_openai import OpenAIEmbeddings

# Set your OpenAI API key in an environment variable before running:
# export OPENAI_API_KEY='your_api_key_here'
embedder = OpenAIEmbeddings()

texts = [
    "Langchain makes working with LLMs easy.",
    "OpenAI embeddings convert text to vectors.",
]

# One vector per document
vectors = embedder.embed_documents(texts)

# One vector for the query
query = "How to use embeddings?"
query_vector = embedder.embed_query(query)

print(f"Embedded {len(texts)} documents. Each vector length: {len(vectors[0])}")
print(f"Embedded query vector length: {len(query_vector)}")
Output
Embedded 2 documents. Each vector length: 1536
Embedded query vector length: 1536
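Once documents and a query are embedded, a similarity search amounts to comparing the query vector with each document vector. The sketch below uses cosine similarity, a common choice for this comparison; the function names `cosine_similarity` and `rank_by_similarity` are illustrative and not part of Langchain, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, doc_vectors):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(doc_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors for illustration; real embeddings are much longer.
    doc_vectors = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
    query_vector = [0.0, 1.0, 0.0]
    for index, score in rank_by_similarity(query_vector, doc_vectors):
        print(index, round(score, 3))
```

With real data you would pass the output of embed_query() as `query_vector` and the output of embed_documents() as `doc_vectors`, then use the returned indices to look up the original texts.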
Common Pitfalls
Common mistakes when using OpenAI embeddings in Langchain include:
- Not setting the OPENAI_API_KEY environment variable, which causes authentication errors.
- Passing non-string inputs to embed_documents or embed_query.
- Mixing synchronous and asynchronous code, for example calling an async embedding method without awaiting it.
- Confusing embed_documents (list input) with embed_query (single string input).
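The async pitfall in the list above can be sketched without any network access. `FakeAsyncEmbedder` below is a hypothetical stand-in that returns dummy vectors; its method is named `aembed_query` on the assumption that your installed Langchain version exposes an async method of that name, so verify this against its documentation.

```python
import asyncio


class FakeAsyncEmbedder:
    """Hypothetical stand-in for an embeddings client; returns dummy vectors."""

    async def aembed_query(self, text: str) -> list:
        await asyncio.sleep(0)  # simulate yielding to the event loop
        return [float(len(text)), 0.0, 0.0]


async def embed_in_async_code(embedder, text: str) -> list:
    # Right: await the coroutine to get the actual vector.
    return await embedder.aembed_query(text)


async def forgot_to_await(embedder, text: str) -> bool:
    # Pitfall: without `await`, this is a coroutine object, not a vector.
    pending = embedder.aembed_query(text)
    is_vector = isinstance(pending, list)
    pending.close()  # avoid a "never awaited" warning in this demo
    return is_vector


if __name__ == "__main__":
    embedder = FakeAsyncEmbedder()
    print(asyncio.run(embed_in_async_code(embedder, "Hello")))
    print(asyncio.run(forgot_to_await(embedder, "Hello")))
```

The second function reports that the un-awaited call did not produce a list, which is the symptom to look for when downstream code complains that a "vector" has no length.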
python
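# Newer releases ship this class in the langchain-openai package;
# older releases used: from langchain.embeddings import OpenAIEmbeddings
from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings()

# Wrong: passing a single string to embed_documents (expects a list).
# A str is itself iterable, so this call is not guaranteed to raise.
try:
    vectors = embedder.embed_documents("This is a single string")
    print(f"No error raised, got {len(vectors)} vectors")
except Exception as e:
    print(f"Error: {e}")

# Right: pass a list of strings
vectors = embedder.embed_documents(["This is a single string"])
print(f"Got {len(vectors)} vector(s) from the list input")
Output
The result of the first call depends on the installed version: it may print an error, or it may succeed and return one vector per character rather than one per document. The second call prints:
Got 1 vector(s) from the list input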
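Since the library may not reject a bare string, one defensive option is to normalize inputs before calling embed_documents. This is a minimal sketch; `normalize_texts` is a hypothetical helper, not a Langchain function.

```python
def normalize_texts(texts):
    """Return a list of strings suitable for embed_documents.

    Hypothetical helper for illustration; it is not part of Langchain.
    """
    if isinstance(texts, str):
        # Wrap a lone string so it is not iterated character by character.
        return [texts]
    items = list(texts)
    for item in items:
        if not isinstance(item, str):
            raise TypeError(f"expected str, got {type(item).__name__}")
    return items


if __name__ == "__main__":
    print(normalize_texts("This is a single string"))
    print(normalize_texts(("first", "second")))
```

You would then call embedder.embed_documents(normalize_texts(user_input)), which covers both the single-string and the non-string pitfalls in one place.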
Quick Reference
Remember these tips when using OpenAI embeddings with Langchain:
- Set OPENAI_API_KEY in your environment before running code.
- Use embed_documents for lists of texts, embed_query for single queries.
- Each embedding vector is a list of floats, usually length 1536 for OpenAI's text-embedding-ada-002 model.
- Use embeddings for similarity search, clustering, or as input features for ML models.
Key Takeaways
Always set your OpenAI API key in the environment before using OpenAIEmbeddings.
Use embed_documents() for multiple texts and embed_query() for single queries.
Embedding outputs are vectors (lists of floats) representing text meaning.
Pass only strings or lists of strings to embedding methods to avoid errors.
OpenAI embeddings in Langchain enable easy text vectorization for search and analysis.