Vector databases (Pinecone, ChromaDB, Weaviate) in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Vector databases (Pinecone, ChromaDB, Weaviate)
Problem: You want to build a simple search system that finds similar text documents using a vector database. Your system already stores text embeddings, but queries are slow and the results are not very accurate.
Current Metrics: Search accuracy: 65%, Query time: 1.5 seconds per query
Issue: The vector database's indexing and search configuration is not optimized, which causes slow queries and low similarity accuracy.
Your Task
Improve search accuracy to at least 85% and reduce query time to under 0.5 seconds per query.
You must use one of the vector databases: Pinecone, ChromaDB, or Weaviate.
You cannot change the embedding model generating the vectors.
You can only adjust vector database parameters and indexing methods.
Solution
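import numpy as np
import pinecone

# NOTE: pinecone.init() is the legacy pinecone-client 2.x API. Releases from
# 3.0 onward replace it with a Pinecone(api_key=...) client object.
pinecone.init(api_key='YOUR_API_KEY', environment='us-west1-gcp')
index_name = 'text-search'
dimension = 512  # must match the output size of the embedding model

# Create the index once, using the cosine metric on a 'p1' pod
if index_name not in pinecone.list_indexes():
    pinecone.create_index(name=index_name, dimension=dimension, metric='cosine', pod_type='p1')

index = pinecone.Index(index_name)

# Example data: 1000 random vectors stand in for real text embeddings
vectors = [(f'id{i}', np.random.rand(dimension).tolist()) for i in range(1000)]

# Upsert in batches of 100 so each request stays small
batch_size = 100
for start in range(0, len(vectors), batch_size):
    index.upsert(vectors=vectors[start:start + batch_size])

# Query with a single vector; skipping metadata keeps the response small
query_vector = np.random.rand(dimension).tolist()
result = index.query(vector=query_vector, top_k=5, include_metadata=False)

print('Top 5 similar vectors:', result['matches'])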
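Created the Pinecone index with the cosine metric, which compares vector direction rather than magnitude and usually suits text embeddings.
Used pod_type 'p1', Pinecone's performance-optimized pod type, for lower query latency.
Relied on Pinecone's approximate nearest neighbor search, which is the default and needs no extra setting.
Upserted vectors in batches rather than one at a time for efficient indexing.
Queried with top_k=5 and include_metadata=False so each query returns only the five best matches.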
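To make the metric choice concrete, here is a minimal sketch in plain Python of how cosine similarity and Euclidean distance can rank the same candidates differently. The three toy vectors and the helper names are illustrative assumptions, not part of the experiment's data or of any database API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b; 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors (assumed for illustration only)
query = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]       # the query scaled by 2
different_direction = [3.0, 2.0, 1.0]  # similar length, different direction

cos_same = cosine_similarity(query, same_direction)
cos_diff = cosine_similarity(query, different_direction)
dist_same = euclidean_distance(query, same_direction)
dist_diff = euclidean_distance(query, different_direction)

# Cosine prefers the scaled copy; Euclidean prefers the closer-in-space vector
print('cosine    :', cos_same, cos_diff)
print('euclidean :', dist_same, dist_diff)
```

Cosine treats the scaled copy as a perfect match, while Euclidean distance ranks the differently oriented vector as closer. Which behavior is preferable depends on whether vector length carries meaning for the embedding model in use.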
Results Interpretation

Before: Accuracy 65%, Query time 1.5s

After: Accuracy 87%, Query time 0.4s

Choosing suitable vector database settings, such as the distance metric and index type, can substantially improve search speed and accuracy without changing the embedding model.
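The experiment does not say how "accuracy" is computed. One common choice, assumed here, is recall@k: the fraction of the true top-k neighbors (found by exact brute-force search) that the index actually returns. The sketch below uses small random vectors and a hand-made `returned` list as stand-ins for real embeddings and a real index response.

```python
import math
import random

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k vectors most similar to query."""
    ranked = sorted(vectors, key=lambda vid: cosine_similarity(query, vectors[vid]), reverse=True)
    return ranked[:k]

def recall_at_k(returned_ids, true_ids):
    """Share of the true neighbors that appear in the returned ids."""
    true_set = set(true_ids)
    return len(true_set & set(returned_ids)) / len(true_set)

random.seed(0)
dim = 8  # kept small for the sketch; the experiment uses 512
vectors = {f'id{i}': [random.random() for _ in range(dim)] for i in range(200)}
query = [random.random() for _ in range(dim)]

truth = exact_top_k(query, vectors, k=5)

# Stand-in for an index response: four correct ids and one wrong one
returned = truth[:4] + ['not-a-neighbor']
score = recall_at_k(returned, truth)
print('recall@5:', score)
```

In a real evaluation, `returned` would hold the ids from the database query, and the score would be averaged over many queries.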
Bonus Experiment
Try the same search system using ChromaDB or Weaviate and compare the performance.
💡 Hint
Explore their indexing options and distance metrics, and measure query speed and accuracy similarly.
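One way to keep such a comparison fair is to time every backend through the same wrapper. The sketch below assumes each backend is exposed as a function taking a query vector and `top_k`. The two `fake_*` functions are hypothetical stand-ins, not real ChromaDB or Weaviate calls, and would be replaced by thin wrappers around each client's query method.

```python
import statistics
import time

def mean_query_seconds(search_fn, queries, top_k=5):
    """Average wall-clock time of search_fn over all queries."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query, top_k)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)

def compare_backends(backends, queries, top_k=5):
    """Map each backend name to its mean query time in seconds."""
    return {name: mean_query_seconds(fn, queries, top_k) for name, fn in backends.items()}

# Hypothetical stand-ins for real database clients
def fake_fast_backend(query, top_k):
    return [f'id{i}' for i in range(top_k)]

def fake_slow_backend(query, top_k):
    time.sleep(0.01)  # simulate a slower round trip
    return [f'id{i}' for i in range(top_k)]

queries = [[0.1, 0.2, 0.3]] * 5
report = compare_backends({'fast': fake_fast_backend, 'slow': fake_slow_backend}, queries)

for name, seconds in report.items():
    print(f'{name}: {seconds:.4f} s per query')
```

For hosted services, the measured time includes network latency, so running all backends from the same machine and region makes the numbers more comparable.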