You want to build a system that handles millions of vectors and requires fast, real-time similarity search with automatic scaling. Which vector store is the best choice?
Think about which option offers managed scaling and cloud support for millions of vectors.
Pinecone is a managed vector database service designed for large-scale, real-time similarity search with automatic scaling. FAISS is powerful but requires manual setup and scaling. Chroma is better suited for smaller, local datasets.
You run a similarity search on a vector store and get a list of retrieved items. Which metric best measures what fraction of all relevant items appear in the retrieved results?
Recall focuses on how many of the relevant items you find, not on how many items you retrieve in total.
Recall measures the fraction of relevant items retrieved among all relevant items, which is important for evaluating retrieval completeness in vector search.
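The difference between recall and precision is easiest to see on a small example. The sketch below is a minimal illustration, not part of any vector store's API: the function names `recall_at_k` and `precision_at_k` and the sample IDs are hypothetical, and it assumes the retrieved list contains no duplicate IDs.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant)


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)


# Hypothetical data: IDs returned by a search, and the ground-truth relevant set.
retrieved_ids = [7, 2, 9, 4, 1]
relevant_ids = {2, 4, 5, 8}

print(recall_at_k(retrieved_ids, relevant_ids, k=5))     # 0.5 (2 of 4 relevant items found)
print(precision_at_k(retrieved_ids, relevant_ids, k=5))  # 0.4 (2 of 5 results are relevant)
```

Both metrics count the same two hits here; only the denominator changes, which is why recall speaks to completeness and precision to the quality of the returned list.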
What is the output of this Python code using FAISS for a simple vector search?
import numpy as np
import faiss

# Create 5 vectors of dimension 3
vectors = np.array([[1,0,0],[0,1,0],[0,0,1],[1,1,0],[0,1,1]], dtype='float32')

# Build index
index = faiss.IndexFlatL2(3)
index.add(vectors)

# Query vector
query = np.array([[1,0,0]], dtype='float32')

# Search top 2 nearest neighbors
D, I = index.search(query, 2)
print(I[0].tolist())
Remember FAISS returns nearest neighbors by distance. The query is [1,0,0].
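The closest vector to [1,0,0] is itself at index 0, at distance 0. The next closest is [1,1,0] at index 3, at squared Euclidean distance 1; every other vector is at squared distance 2 or more. The output is therefore [0, 3]. Note that IndexFlatL2 reports squared L2 distances in D.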
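The ranking can be checked by hand without FAISS. The sketch below recomputes the squared Euclidean distances in plain Python for the same five vectors; the helper `squared_l2` is a name introduced here for illustration, and the brute-force sort stands in for what an exact flat index does conceptually.

```python
vectors = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
query = [1, 0, 0]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Distance from the query to every stored vector, in index order.
distances = [squared_l2(query, v) for v in vectors]

# Indices sorted by ascending distance; sorted() is stable, so ties keep index order.
nearest = sorted(range(len(vectors)), key=lambda i: distances[i])[:2]

print(distances)  # [0, 2, 2, 1, 3]
print(nearest)    # [0, 3]
```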
You want to perform offline batch similarity searches on a dataset of 100,000 vectors without needing real-time responses. Which vector store is most suitable?
Consider which tool is best for offline batch processing and large datasets without cloud dependency.
FAISS is designed for efficient similarity search on large datasets locally or on GPU, making it ideal for offline batch processing.
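In a batch setting, many queries are searched in one call rather than one at a time; FAISS's `index.search` accepts a 2-D array with one query per row. As a rough sketch of the same idea without FAISS, the NumPy code below runs an exact brute-force search over queries in chunks. The function `batch_search`, the `chunk_size` parameter, and the random data are assumptions made for illustration only.

```python
import numpy as np


def batch_search(data, queries, k, chunk_size=1024):
    """Exact top-k search by squared L2 distance, processing queries in chunks."""
    data_sq = (data ** 2).sum(axis=1)  # ||x||^2 for every stored vector
    all_ids = []
    for start in range(0, len(queries), chunk_size):
        q = queries[start:start + chunk_size]
        # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, computed for the whole chunk at once
        d = (q ** 2).sum(axis=1, keepdims=True) - 2.0 * (q @ data.T) + data_sq
        # Keep the k smallest distances per query row.
        all_ids.append(np.argsort(d, axis=1, kind="stable")[:, :k])
    return np.vstack(all_ids)


# Hypothetical dataset: 1,000 random vectors of dimension 16.
rng = np.random.default_rng(0)
data = rng.random((1000, 16), dtype=np.float32)

# Use the first five stored vectors as queries, so each should find itself first.
queries = data[:5]
result = batch_search(data, queries, k=3, chunk_size=2)
print(result.shape)
print(result[:, 0].tolist())
```

Chunking bounds the size of the intermediate distance matrix, which matters more for offline jobs than per-query latency does.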
What error will this Pinecone vector insertion code raise?
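import pinecone

# Uses the legacy pinecone-client API (pinecone.init), which newer client versions removed
pinecone.init(api_key='fake_key', environment='us-west1-gcp')
index = pinecone.Index('example-index')

# Vector IDs are strings. Note second vector has length 2
vectors = [('1', [0.1, 0.2, 0.3]), ('2', [0.4, 0.5])]
index.upsert(vectors)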
Check if all vectors have the same dimension length.
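Every vector upserted to a Pinecone index must match the dimension the index was created with. The second vector has length 2 instead of 3, so the upsert is rejected. The expected answer here is a ValueError, although the exact exception class can vary with the client version.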
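A dimension mismatch can be caught before any network call is made. The sketch below is a client-side check in plain Python, not part of the Pinecone client: `validate_dimensions` and `expected_dim` are hypothetical names, and the function assumes vectors are given as (id, values) tuples as in the snippet above.

```python
def validate_dimensions(vectors, expected_dim):
    """Raise ValueError if any (id, values) pair does not have expected_dim values."""
    for vec_id, values in vectors:
        if len(values) != expected_dim:
            raise ValueError(
                f"Vector {vec_id!r} has dimension {len(values)}, expected {expected_dim}"
            )


vectors = [("1", [0.1, 0.2, 0.3]), ("2", [0.4, 0.5])]

try:
    validate_dimensions(vectors, expected_dim=3)
except ValueError as err:
    print(err)  # Vector '2' has dimension 2, expected 3
```

Running such a check before `index.upsert(vectors)` gives a clear local error message instead of relying on how the service reports the problem.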