When working with vector databases, the key metrics to check are Recall and Precision for search results. This is because we want to find the most relevant vectors (data points) when we search (Read). Recall tells us how many of the truly relevant items we found, and Precision tells us how many of the found items are actually relevant. For Create, Update, and Delete, correctness and speed matter but are usually checked by system logs and response times rather than ML metrics.
Vector database operations (CRUD) in Prompt Engineering / GenAI - Model Metrics & Evaluation
Metrics & Evaluation - Vector database operations (CRUD)
Which metric matters for Vector database operations (CRUD) and WHY
Confusion matrix for vector search results
|                     | Predicted Relevant | Predicted Not Relevant |
|---------------------|--------------------|------------------------|
| Actual Relevant     | TP                 | FN                     |
| Actual Not Relevant | FP                 | TN                     |
TP = True Positives: Relevant vectors correctly found
FP = False Positives: Irrelevant vectors wrongly found
FN = False Negatives: Relevant vectors missed
TN = True Negatives: Irrelevant vectors correctly not found
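As a minimal sketch, Precision and Recall can be computed directly from these counts. The function name and the example counts are made up for illustration.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    # Precision: of everything returned, how much was relevant?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything relevant, how much was returned?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for one query: 8 relevant vectors returned,
# 2 irrelevant vectors returned, 4 relevant vectors missed.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

TN does not appear in either formula, which is why a large pool of irrelevant vectors does not inflate these two metrics.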
Precision vs Recall tradeoff with examples
Imagine you search a vector database for images similar to a photo of a cat.
- High Precision, Low Recall: You get only very clear cat images but miss some cats that look different. Good if you want only exact matches.
- High Recall, Low Precision: You get almost all cat images but also some dog images mixed in. Good if you want to see all possible cats and can ignore some noise.
Choosing the right balance depends on your goal: strict accuracy or broad coverage.
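One way to see the tradeoff is to vary a similarity cutoff over a fixed set of scored results. The sketch below uses invented scores and relevance labels for ten stored images; only the pattern matters, not the specific numbers.

```python
# Hypothetical (similarity score, is it really a cat?) pairs for ten stored images.
scored = [
    (0.95, True), (0.90, True), (0.85, True), (0.70, False), (0.65, True),
    (0.60, False), (0.55, True), (0.40, False), (0.30, False), (0.20, False),
]


def precision_recall_at(threshold: float) -> tuple[float, float]:
    """Precision and recall when only results scoring >= threshold are returned."""
    returned = [is_cat for score, is_cat in scored if score >= threshold]
    true_positives = sum(returned)
    total_relevant = sum(is_cat for _, is_cat in scored)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


strict = precision_recall_at(0.80)  # high cutoff: few results, all of them cats
loose = precision_recall_at(0.50)   # low cutoff: every cat found, plus some non-cats
print("strict:", strict)
print("loose:", loose)
```

With these made-up scores, the strict cutoff returns only cats but misses two of the five, while the loose cutoff finds all five cats and lets two non-cats through.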
What "good" vs "bad" metric values look like for vector database operations
- Good: Precision and Recall both above 0.8 mean that most relevant vectors are found and few irrelevant ones appear.
- Bad: Precision below 0.5 means many irrelevant vectors show up, confusing results.
- Bad: Recall below 0.5 means many relevant vectors are missed, so search is incomplete.
- For CRUD speed: Create, Update, Delete operations should be fast (milliseconds) to keep the database responsive.
Common pitfalls in vector database metrics
- Accuracy paradox: High accuracy can be misleading if the dataset is unbalanced (e.g., many irrelevant vectors).
- Data leakage: Using test vectors that also appeared in training or were already in the index can falsely inflate recall and precision.
- Overfitting: Tuning vector search too tightly on a small set of queries can reduce general usefulness.
- Ignoring latency: Good metrics but slow CRUD operations hurt user experience.
Self-check question
Your vector search model has 98% accuracy but only 12% recall on relevant vectors. Is it good for production? Why or why not?
Answer: No, it is not good. The low recall means it misses most relevant vectors, so users won't find what they want even if the few results shown are correct. High accuracy here is misleading because most vectors are irrelevant, so the model just avoids false positives but fails to find true matches.
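The arithmetic behind this answer can be checked with one set of counts that produces exactly these figures. The counts are an assumption chosen to match the question, not data from a real system.

```python
# Hypothetical collection of 1100 vectors, of which only 25 are relevant.
tp = 3      # relevant vectors that were returned
fn = 22     # relevant vectors that were missed
fp = 0      # irrelevant vectors that were returned
tn = 1075   # irrelevant vectors correctly left out

total = tp + fn + fp + tn
accuracy = (tp + tn) / total   # dominated by the large TN count
recall = tp / (tp + fn)
precision = tp / (tp + fp)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

Almost all of the accuracy comes from the 1075 true negatives, which says nothing about whether relevant vectors are being found.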
Key Result
Recall and Precision are key metrics to evaluate vector search quality; high recall ensures relevant vectors are found, high precision ensures results are relevant.
Practice
1. What does the CRUD acronym stand for in vector database operations?
Difficulty: easy
Solution
Step 1: Understand CRUD basics
CRUD is a common term in databases meaning the four basic operations you can do with data.
Step 2: Match each letter to its meaning
C stands for Create (add new data), R for Read (get data), U for Update (change data), and D for Delete (remove data).
Final Answer:
Create, Read, Update, Delete -> Option C
Quick Check:
CRUD = Create, Read, Update, Delete [OK]
Hint: Remember CRUD as basic data actions: add, get, change, remove [OK]
Common Mistakes:
- Confusing CRUD with unrelated terms
- Mixing up the order of operations
- Thinking CRUD only applies to files, not vectors
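To make the four letters concrete, here is a toy sketch that uses a plain Python dictionary as a stand-in for a vector store. A real vector database adds indexing and similarity search on top, so this only illustrates the operations themselves.

```python
# A dictionary from vector ID to values stands in for the database.
store: dict[str, list[float]] = {}

store["vec1"] = [0.1, 0.2, 0.3]   # Create: add a new vector
vector = store["vec1"]            # Read: fetch it by ID
store["vec1"] = [0.3, 0.2, 0.1]   # Update: replace its values
updated = store["vec1"]
del store["vec1"]                 # Delete: remove it

print(vector, updated, store)
```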
2. Which of the following is the correct syntax to add a vector with ID 'vec1' and values [0.1, 0.2, 0.3] to a vector database named db?
Difficulty: easy
Solution
Step 1: Identify the common method for adding vectors
Vector database clients typically expose a method for adding vectors; in this exercise's API it is add_vector, which takes an ID and a list of numbers.
Step 2: Check method parameters
The method should take the vector ID as a string and the vector values as a list or array.
Final Answer:
db.add_vector('vec1', [0.1, 0.2, 0.3]) -> Option D
Quick Check:
Add vector syntax = db.add_vector(id, vector) [OK]
Hint: Add vectors with add_vector(id, vector_list) method [OK]
Common Mistakes:
- Using wrong method names like insert or push_vector
- Passing vector values as separate arguments instead of a list
- Mixing ID and vector in one list
3. Given the following code snippet, what will be the output?
db = VectorDB()
db.add_vector('v1', [1, 0, 0])
db.add_vector('v2', [0, 1, 0])
results = db.search([0.9, 0.1, 0], top_k=1)
print(results)
Difficulty: medium
Solution
Step 1: Understand the vectors and query
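Vectors 'v1' = [1,0,0], 'v2' = [0,1,0], query = [0.9,0.1,0].
Step 2: Calculate similarity or distance
Assuming the score is the dot product, 'v1' scores 0.9 against the query and 'v2' scores 0.1, so 'v1' is the closest match. Cosine similarity gives the same ranking, although the score for 'v1' would then be about 0.99 rather than 0.9.
Final Answer:
[('v1', 0.9)] -> Option A
Quick Check: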
Closest vector = v1 with similarity 0.9 [OK]
Hint: Closest vector has highest dot product with query [OK]
Common Mistakes:
- Confusing similarity with distance
- Mixing up vector IDs in output
- Assuming lower score means closer
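The scores in this solution can be checked by hand or with a few lines of plain Python. The sketch below computes both the dot product and cosine similarity; the helper names are illustrative and do not belong to any particular library.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


vectors = {"v1": [1, 0, 0], "v2": [0, 1, 0]}
query = [0.9, 0.1, 0]

dot_scores = {vid: dot(query, vec) for vid, vec in vectors.items()}
cos_scores = {vid: cosine(query, vec) for vid, vec in vectors.items()}
best = max(dot_scores, key=dot_scores.get)  # higher score means more similar

print("dot:", dot_scores)
print("cosine:", cos_scores)
print("best match:", best)
```

Both measures rank 'v1' first. The dot product gives 0.9, while cosine similarity gives roughly 0.99 because it also divides by the length of the query.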
4. The following code tries to update a vector but throws an error. What is the likely cause?
db = VectorDB()
db.add_vector('v1', [0.5, 0.5, 0.5])
db.update_vector('v2', [0.1, 0.1, 0.1])
Difficulty: medium
Solution
Step 1: Check vector existence before update
Updating a vector requires it to exist in the database first.
Step 2: Identify the error cause
Since 'v2' was never added, trying to update it causes an error.
Final Answer:
Vector 'v2' does not exist, so update fails -> Option A
Quick Check:
Update needs existing vector [OK]
Hint: Update only existing vectors, else error occurs [OK]
Common Mistakes:
- Assuming update creates new vectors
- Thinking data type mismatch causes error
- Ignoring vector existence check
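The behaviour described in this question can be reproduced with a small in-memory class. ToyVectorDB is a hypothetical stand-in written for this example, not the API of a real vector database, and the choice of KeyError is an assumption. Some real systems provide an upsert operation that creates the vector when it is missing instead of failing.

```python
class ToyVectorDB:
    """Hypothetical in-memory store used only to illustrate the update check."""

    def __init__(self):
        self._vectors = {}

    def add_vector(self, vector_id, values):
        self._vectors[vector_id] = list(values)

    def update_vector(self, vector_id, values):
        # Refuse to update an ID that was never added.
        if vector_id not in self._vectors:
            raise KeyError(f"vector {vector_id!r} does not exist")
        self._vectors[vector_id] = list(values)


db = ToyVectorDB()
db.add_vector("v1", [0.5, 0.5, 0.5])

try:
    db.update_vector("v2", [0.1, 0.1, 0.1])  # 'v2' was never added
    update_failed = False
except KeyError as err:
    update_failed = True
    print("update failed:", err)
```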
5. You want to delete vectors with similarity less than 0.5 to a query vector [0, 1, 0] from your vector database. Which sequence of operations correctly achieves this?
Difficulty: hard
Solution
Step 1: Find vectors below similarity threshold
Use a search or filter operation to get IDs of vectors with similarity less than 0.5.
Step 2: Delete vectors by their IDs
Use the delete operation on each vector ID found to remove them from the database.
Final Answer:
Search vectors with similarity < 0.5, then delete each by ID -> Option B
Quick Check:
Filter then delete unwanted vectors [OK]
Hint: Filter vectors first, then delete by ID [OK]
Common Mistakes:
- Deleting all vectors instead of selective ones
- Trying to update instead of delete
- Ignoring the similarity filter step
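The filter-then-delete sequence might look like the sketch below. It assumes cosine similarity as the measure and uses a plain dictionary with invented vectors in place of a real database, scanning every stored vector to find those under the threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical stored vectors keyed by ID.
store = {
    "a": [0.0, 1.0, 0.0],  # same direction as the query
    "b": [1.0, 1.0, 0.0],  # 45 degrees away from the query
    "c": [1.0, 0.0, 0.0],  # orthogonal to the query
    "d": [1.0, 0.2, 0.0],  # mostly orthogonal to the query
}
query = [0, 1, 0]
threshold = 0.5

# Step 1: collect the IDs first, so the dictionary is not modified while iterating.
to_delete = [vid for vid, vec in store.items() if cosine(query, vec) < threshold]

# Step 2: delete each collected ID.
for vid in to_delete:
    del store[vid]

print("deleted:", to_delete)
print("remaining:", sorted(store))
```

Collecting the IDs before deleting keeps the two steps separate and makes it easy to log or review what will be removed.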
