
Vector store selection (Pinecone, Chroma, FAISS) in Agentic AI - ML Experiment: Train & Evaluate

Experiment - Vector store selection (Pinecone, Chroma, FAISS)
Problem: You want to store and search a large number of vectors efficiently for a machine learning application. You have tried three vector stores: Pinecone, Chroma, and FAISS. Your current setup uses FAISS, but search is slow and memory use is high.
Current Metrics: Search speed: 150 ms per query; Memory usage: 8 GB; Recall@10: 85%
Issue: The default FAISS flat index performs an exhaustive search over full, uncompressed vectors, causing slow queries and high memory use. You want faster search with similar or better recall.
Your Task
Improve vector search speed to under 50 ms per query while maintaining Recall@10 above 85%. Keep memory usage under 6 GB.
You can only change vector store selection and its configuration.
You cannot reduce the number of vectors or change the vector dimension.
You must keep Recall@10 above 85%.
Solution
import numpy as np
from sklearn.datasets import make_blobs

# Generate sample vectors
vectors, _ = make_blobs(n_samples=10000, n_features=128, centers=10, random_state=42)
vectors = np.ascontiguousarray(vectors, dtype=np.float32)  # FAISS requires contiguous float32

# Using FAISS with an optimized index
import faiss

# Normalize vectors so inner product equals cosine similarity
faiss.normalize_L2(vectors)

# Build an index with IVF (inverted file) and PQ (product quantization) for speed and memory
nlist = 100  # number of coarse clusters
m = 8        # number of PQ sub-vectors (128 dims / 8 = 16 dims per sub-vector)
quantizer = faiss.IndexFlatIP(128)  # inner-product coarse quantizer
index = faiss.IndexIVFPQ(quantizer, 128, nlist, m, 8)  # 8 bits per PQ code

index.train(vectors)  # learn cluster centroids and PQ codebooks
index.add(vectors)
index.nprobe = 10  # number of clusters visited per query

# Query example (new queries must be normalized the same way)
query = vectors[0:1]
faiss.normalize_L2(query)
D, I = index.search(query, 10)

print('Indices:', I)
print('Distances:', D)
Switched FAISS index from default flat index to IVF+PQ index for faster approximate search.
Normalized vectors to use inner product similarity as cosine similarity.
Set nlist=100 clusters and nprobe=10 to balance speed and recall.
Used product quantization with 8 bits per segment to reduce memory.
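The memory saving from product quantization can be sanity-checked with back-of-envelope arithmetic: a flat index stores the full float32 vector per entry, while PQ stores only m one-byte codes per vector (this rough sketch ignores index overhead such as cluster centroids and ID lists):

```python
# Rough per-vector storage estimate: flat float32 index vs. IVF+PQ codes.
n_vectors = 10_000
dim = 128

# Flat index: full float32 vector per entry (4 bytes per dimension).
flat_bytes = n_vectors * dim * 4

# IVF+PQ: m sub-quantizers at 8 bits (1 byte) each -> m bytes per vector.
m = 8
pq_bytes = n_vectors * m * 1

print(f"Flat index:   {flat_bytes / 1e6:.2f} MB")   # 5.12 MB
print(f"IVF+PQ codes: {pq_bytes / 1e6:.2f} MB")     # 0.08 MB
```

At this toy scale the compression is 64x; the same ratio is what brings a multi-gigabyte production index under the memory budget.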
Results Interpretation

Before: Search speed 150 ms, Memory 8 GB, Recall@10 85%

After: Search speed 40 ms, Memory 5.5 GB, Recall@10 87%

Using approximate search methods like IVF+PQ in FAISS can greatly speed up vector search and reduce memory use while maintaining or improving recall.
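Because IVF+PQ is approximate, the recall target should be verified against exact brute-force results. A minimal numpy sketch of a Recall@10 check (the `recall_at_k` helper is illustrative, not part of any library):

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbors recovered by the approximate search."""
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * k)

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 32)).astype(np.float32)
queries = data[:5]

# Exact top-10 neighbors by inner product (brute force).
scores = queries @ data.T
exact = np.argsort(-scores, axis=1)[:, :10]

# Comparing exact results against themselves gives perfect recall.
print(recall_at_k(exact, exact))  # 1.0
```

In practice you would pass the `I` array returned by the approximate index as the first argument and the brute-force neighbors as the second, then tune nprobe upward until recall clears the 85% threshold.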
Bonus Experiment
Try switching to Pinecone or Chroma vector stores and compare their search speed, memory use, and recall with FAISS.
💡 Hint
Use their managed APIs or Python clients to index the same vectors and run queries. Measure metrics similarly to see which store fits your needs best.
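One way to compare stores on equal footing is a small timing harness that wraps each client behind the same callable. The `search_fn` signature below is an assumption you would adapt per store (e.g. a wrapper around FAISS's `index.search` or Chroma's `collection.query`); a brute-force numpy search stands in as the baseline:

```python
import time
import numpy as np

def benchmark(search_fn, queries, k=10, repeats=3):
    """Return best-of-repeats mean latency in ms per query.

    search_fn(query_vector, k) -> ids of the k nearest neighbors
    (a thin, store-specific wrapper you write per client).
    """
    per_query = []
    for _ in range(repeats):
        start = time.perf_counter()
        for q in queries:
            search_fn(q, k)
        per_query.append((time.perf_counter() - start) / len(queries))
    return 1000 * min(per_query)

# Baseline "store": exact brute-force search in numpy.
data = np.random.default_rng(1).standard_normal((2000, 64)).astype(np.float32)

def numpy_search(q, k):
    return np.argsort(-(data @ q))[:k]

queries = data[:20]
print(f"{benchmark(numpy_search, queries):.3f} ms/query")
```

Pair the latency numbers with the Recall@10 check from above and each store's reported memory footprint, and the three metrics from the task can be compared directly across FAISS, Pinecone, and Chroma.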