Which of the following best describes how vector databases like Pinecone, ChromaDB, and Weaviate index data for fast similarity search?
Think about how similarity search works with numbers instead of text.
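Vector databases store data as embedding vectors (arrays of numbers) and organize them in approximate nearest neighbor (ANN) index structures, for example HNSW graphs or clustering-based inverted-file indexes. A query is then compared against only a small, promising subset of the stored vectors rather than every one of them, which is what makes similarity search fast.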
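To make "compare against only a small subset" concrete, here is a minimal sketch of one ANN technique, an inverted-file (IVF) index: vectors are grouped into clusters with a few rounds of k-means, and a query scans only the clusters whose centroids are nearest to it. The class `TinyIVFIndex` and its parameters (`n_lists`, `n_probe`) are hypothetical names for illustration; this is not the internal implementation of Pinecone, ChromaDB, or Weaviate.

```python
import numpy as np


class TinyIVFIndex:
    """Hypothetical inverted-file (IVF) index, for illustration only.

    Vectors are grouped into clusters by a few rounds of k-means; a query
    scans only the clusters whose centroids are closest to it.
    """

    def __init__(self, n_lists: int, seed: int = 0):
        self.n_lists = n_lists
        self.rng = np.random.default_rng(seed)

    def _nearest_centroid(self, x: np.ndarray) -> np.ndarray:
        # Squared Euclidean distance from every row of x to every centroid.
        d = ((x[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    def build(self, vectors: np.ndarray, n_iter: int = 10) -> None:
        self.vectors = np.asarray(vectors, dtype=float)
        # Initialise centroids from randomly chosen data points.
        start = self.rng.choice(len(self.vectors), self.n_lists, replace=False)
        self.centroids = self.vectors[start].copy()
        for _ in range(n_iter):  # plain Lloyd's k-means iterations
            assign = self._nearest_centroid(self.vectors)
            for c in range(self.n_lists):
                members = self.vectors[assign == c]
                if len(members):
                    self.centroids[c] = members.mean(axis=0)
        assign = self._nearest_centroid(self.vectors)
        # Inverted lists: cluster id -> row ids of the vectors in that cluster.
        self.lists = [np.flatnonzero(assign == c) for c in range(self.n_lists)]

    def search(self, query: np.ndarray, k: int, n_probe: int = 1) -> np.ndarray:
        # Pick the n_probe clusters whose centroids are closest to the query.
        probe = np.argsort(((self.centroids - query) ** 2).sum(axis=1))[:n_probe]
        candidates = np.concatenate([self.lists[c] for c in probe])
        # Exact distances, but computed only over the candidate subset.
        d = ((self.vectors[candidates] - query) ** 2).sum(axis=1)
        return candidates[np.argsort(d)[:k]]
```

Raising `n_probe` scans more clusters and trades speed for accuracy; with `n_probe` equal to `n_lists` the search degenerates into exact brute-force search over every vector.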
You want to build a real-time recommendation system that updates frequently and requires low latency. Which vector database is best suited for this use case?
Consider which database supports fast updates and quick queries.
Pinecone is designed for real-time applications with dynamic updates and low-latency vector search, making it suitable for recommendation systems.
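The property that matters here is that a freshly upserted vector is visible to the very next query. The sketch below is a hypothetical in-memory store written only to illustrate that upsert-then-query pattern; the class and its method names are assumptions and this is not the Pinecone client API, which talks to a managed service over the network.

```python
from __future__ import annotations

import numpy as np


class InMemoryVectorStore:
    """Hypothetical toy store showing the upsert-then-query pattern.

    Not the Pinecone client; method names here are illustrative only.
    """

    def __init__(self, dim: int):
        self.dim = dim
        self._vectors: dict[str, np.ndarray] = {}

    @staticmethod
    def _unit(vector) -> np.ndarray:
        v = np.asarray(vector, dtype=float)
        norm = np.linalg.norm(v)
        if norm == 0:
            raise ValueError("zero vector has no direction")
        return v / norm

    def upsert(self, item_id: str, vector) -> None:
        v = self._unit(vector)
        if v.shape != (self.dim,):
            raise ValueError(f"expected shape ({self.dim},), got {v.shape}")
        # Insert a new item or overwrite an existing one; the next query sees it.
        self._vectors[item_id] = v

    def delete(self, item_id: str) -> None:
        self._vectors.pop(item_id, None)

    def query(self, vector, top_k: int) -> list[str]:
        q = self._unit(vector)
        # Stored vectors are unit length, so cosine similarity is just a dot product.
        scored = sorted(((float(v @ q), i) for i, v in self._vectors.items()), reverse=True)
        return [i for _, i in scored[:top_k]]
```

For a recommendation system, each user interaction would trigger an `upsert` of the affected item or user vector, and recommendations would come from `query`; a production store replaces the linear scan with an ANN index.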
Which metric is most appropriate to evaluate the quality of a vector database's approximate nearest neighbor search results?
Think about how to measure how many returned results are actually relevant.
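Precision@k measures the fraction of the top k retrieved vectors that are relevant, which makes it suitable for evaluating approximate nearest neighbor search quality. When "relevant" means belonging to the exact k nearest neighbors found by brute-force search, precision@k equals recall@k: the fraction of the true neighbors that the approximate search recovered.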
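A minimal sketch of that evaluation, assuming relevance is defined as membership in the exact k nearest neighbors under Euclidean distance. The helper names `exact_top_k` and `precision_at_k` are hypothetical, not part of any vector database client.

```python
from __future__ import annotations

import numpy as np


def exact_top_k(data: np.ndarray, query: np.ndarray, k: int) -> set[int]:
    """Ground truth: row ids of the k true nearest neighbours (brute-force Euclidean)."""
    dists = np.linalg.norm(data - query, axis=1)
    return set(np.argsort(dists)[:k].tolist())


def precision_at_k(retrieved: list[int], relevant: set[int], k: int) -> float:
    """Fraction of the first k retrieved ids that are relevant.

    Dividing by k (not len(retrieved)) counts missing results as misses.
    """
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k
```

In practice `retrieved` would be the ids returned by the approximate index for a query, and the score would be averaged over many queries.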
You notice that your vector database returns very poor search results despite correct vector embeddings. Which of the following is the most likely cause?
Consider how similarity is measured between vectors.
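If the distance metric configured for the index (for example Euclidean distance, cosine similarity, or dot product) does not match the metric the embedding model was designed for, similarity search returns poor results even when the embeddings themselves are correct.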
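The effect is easy to reproduce: the same vectors can produce different nearest neighbors under different metrics. The toy vectors and the `nearest` helper below are hypothetical, chosen so that one candidate has the same direction as the query but a much larger magnitude.

```python
import numpy as np


def nearest(query: np.ndarray, candidates: np.ndarray, metric: str) -> int:
    """Index of the best-matching candidate under the given metric."""
    if metric == "euclidean":
        # Negate distances so that a larger score always means "more similar".
        scores = -np.linalg.norm(candidates - query, axis=1)
    elif metric == "cosine":
        scores = candidates @ query / (
            np.linalg.norm(candidates, axis=1) * np.linalg.norm(query)
        )
    elif metric == "dot":
        scores = candidates @ query
    else:
        raise ValueError(f"unknown metric: {metric}")
    return int(np.argmax(scores))


candidates = np.array([[3.0, 4.0],   # same direction as the query, 5x longer
                       [0.8, 0.6]])  # unit length, slightly different direction
query = np.array([0.6, 0.8])
```

Here Euclidean distance picks candidate 1 while cosine similarity and dot product pick candidate 0. If all vectors are normalized to unit length, the three metrics produce the same ranking, because for unit vectors the squared Euclidean distance equals 2 minus twice the dot product.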
What is the output of the following Python snippet, which performs a vector-database-style similarity lookup with plain NumPy?
import numpy as np

# Sample vectors
vectors = {
    'id1': np.array([1.0, 0.0]),
    'id2': np.array([0.0, 1.0]),
    'id3': np.array([1.0, 1.0]),
}

# Query vector
query = np.array([1.0, 0.5])

# Function to compute cosine similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Find id with highest similarity
best_id = max(vectors, key=lambda k: cosine_similarity(query, vectors[k]))
print(best_id)
Calculate cosine similarity for each vector with the query.
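The query [1.0, 0.5] points most nearly in the same direction as id3 ([1.0, 1.0]), so its cosine similarity with id3 is the highest of the three; max returns 'id3' and the snippet prints id3.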
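Worked out by hand with the cosine similarity formula the snippet implements (values rounded to three decimals):

```latex
\begin{aligned}
\cos(\mathbf{q},\mathbf{v}) &= \frac{\mathbf{q}\cdot\mathbf{v}}{\lVert\mathbf{q}\rVert\,\lVert\mathbf{v}\rVert},
\qquad \lVert\mathbf{q}\rVert = \sqrt{1.0^2 + 0.5^2} = \sqrt{1.25} \approx 1.118 \\
\text{id1: } \cos &= \frac{1.0}{1.118 \times 1} \approx 0.894 \\
\text{id2: } \cos &= \frac{0.5}{1.118 \times 1} \approx 0.447 \\
\text{id3: } \cos &= \frac{1.5}{1.118 \times 1.414} \approx 0.949
\end{aligned}
```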