
Vector databases (Pinecone, ChromaDB, Weaviate) in Prompt Engineering / GenAI - Full Explanation

Introduction
Finding relevant information quickly in large collections of complex data, such as images or text, is hard because similar items rarely match word for word or pixel for pixel. Vector databases solve this by storing numeric representations of the data and indexing them so computers can quickly find similar items, even when they are not exact matches.
Explanation
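What vectors are
Vectors are lists of numbers that represent complex data such as words, images, or sounds in a form computers can compare. They are usually produced by a machine-learning embedding model, which is why they are often called embeddings. Together, the numbers capture the data's characteristics (individual numbers usually have no human-readable meaning), and the model is trained so that similar items get similar vectors.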
Vectors turn complex data into numbers so computers can compare and find similarities.
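To make "data becomes a list of numbers" concrete, here is a minimal sketch using a toy bag-of-words vectorizer. The vocabulary and the `embed` function are hypothetical stand-ins for a real embedding model: they show the mechanics of turning text into a fixed-length vector, but unlike a learned embedding they only treat texts as similar when they share words.

```python
# Toy "vectorizer": counts how often each vocabulary word appears.
# Hypothetical stand-in for a real embedding model, which would
# produce dense vectors with hundreds of learned dimensions.
VOCAB = ["cat", "dog", "kitten", "car", "engine"]

def embed(text: str, vocab: list[str] = VOCAB) -> list[float]:
    """Return one number per vocabulary word: its count in `text`."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]

print(embed("cat and kitten"))  # [1.0, 0.0, 1.0, 0.0, 0.0]
print(embed("car engine"))      # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Every input, however long, maps to a vector of the same length (one slot per vocabulary word), which is what makes vectors comparable.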
Purpose of vector databases
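Vector databases store these vectors and index them so they can quickly find the stored vectors closest to a given query vector. Many use approximate nearest-neighbor indexes such as HNSW, which trade a little accuracy for much faster search than comparing the query with every stored vector. This helps in tasks like searching for similar images or finding related documents.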
Vector databases help find similar data quickly by comparing vectors.
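The core operation can be sketched in a few lines: keep (id, vector) pairs and return the k stored vectors most similar to a query. `TinyVectorStore` is a hypothetical, brute-force illustration that scores every stored vector; it is not how Pinecone, ChromaDB, or Weaviate are implemented, since they rely on approximate indexes to avoid this full scan.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 = same direction, 0.0 = unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyVectorStore:
    """Hypothetical in-memory store: keeps (id, vector) pairs and
    answers top-k similarity queries by brute force."""

    def __init__(self) -> None:
        self.vectors: dict[str, list[float]] = {}

    def add(self, item_id: str, vector: list[float]) -> None:
        self.vectors[item_id] = vector

    def query(self, vector: list[float], k: int = 1) -> list[tuple[str, float]]:
        # Score every stored vector (real databases avoid this full scan
        # with approximate nearest-neighbor indexes such as HNSW).
        scored = [(item_id, cosine_similarity(vector, v))
                  for item_id, v in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

store = TinyVectorStore()
store.add("cat-photo", [0.9, 0.1, 0.0])
store.add("dog-photo", [0.7, 0.6, 0.1])
store.add("car-photo", [0.0, 0.1, 0.9])
print(store.query([0.8, 0.2, 0.0], k=2))  # cat-photo first, then dog-photo
```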
How similarity search works
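When you search, your query is first converted into a vector with the same embedding model used for the stored data. The database then compares it with the stored vectors using a distance or similarity measure such as Euclidean distance, dot product, or cosine similarity (which depends only on the angle between vectors). The closest vectors represent the items most similar to your query.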
Similarity search finds data items with vectors closest to the query vector.
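The two most common measures can disagree, and a small worked example shows why. The vectors below are made up for illustration: `a` and `b` point in the same direction but have different lengths, while `c` is perpendicular to `a`. Vector databases typically let you pick the metric when you create an index, and the usual advice is to match the metric the embedding model was designed for.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Depends only on the angle between the vectors, not their length.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Straight-line distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b, c = [3.0, 4.0], [6.0, 8.0], [4.0, -3.0]
print(cosine_similarity(a, b))   # 1.0  same direction, different length
print(cosine_similarity(a, c))   # 0.0  perpendicular: unrelated
print(euclidean_distance(a, b))  # 5.0  far apart despite the same direction
```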
Examples: Pinecone, ChromaDB, Weaviate
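Pinecone, ChromaDB, and Weaviate are popular vector databases. Pinecone is a fully managed cloud service, while ChromaDB and Weaviate are open source and can run locally or on your own servers (Weaviate also offers a managed cloud). All three let you store vectors with ids and metadata, run similarity searches, filter results by metadata, and connect to common embedding models and AI frameworks.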
These platforms make it easy to use vector search in applications.
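Pinecone, ChromaDB, and Weaviate all let you attach metadata to each vector and restrict a query to vectors whose metadata matches a condition (Pinecone's `filter`, ChromaDB's `where`, Weaviate's `where` filter). The sketch below uses none of their APIs; it is a hypothetical pure-Python illustration of the idea: filter on metadata first, then rank the survivors by similarity.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical records: id, vector, and metadata stored alongside it.
records = [
    {"id": "p1", "vector": [0.9, 0.1], "metadata": {"category": "shoes"}},
    {"id": "p2", "vector": [0.8, 0.3], "metadata": {"category": "bags"}},
    {"id": "p3", "vector": [0.2, 0.9], "metadata": {"category": "shoes"}},
]

def filtered_query(query, where, k=1):
    """Keep records whose metadata matches every key in `where`, then rank by similarity."""
    candidates = [r for r in records
                  if all(r["metadata"].get(key) == value for key, value in where.items())]
    candidates.sort(key=lambda r: cosine_similarity(query, r["vector"]), reverse=True)
    return [r["id"] for r in candidates[:k]]

print(filtered_query([0.7, 0.4], where={}))                     # ['p2'] (no filter)
print(filtered_query([0.7, 0.4], where={"category": "shoes"}))  # ['p1']
```

Without a filter the bag `p2` is the closest match; restricting the search to shoes returns `p1` instead.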
Real World Analogy

Imagine a huge library where books are not organized by title or author but by the story's theme and style. Instead of exact titles, you describe the kind of story you want, and the librarian quickly finds books with similar themes and feelings.

Vectors → Numbers describing the story's theme and style
Vector databases → The librarian organizing and searching books by theme
Similarity search → Finding books with themes closest to your description
Pinecone, ChromaDB, Weaviate → Different libraries with expert librarians using this method
Diagram
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Input Data  │──────▶│ Vectorization │──────▶│ Vector Storage│
│ (text, image) │       │ (numbers list)│       │ (database)    │
└───────────────┘       └───────────────┘       └───────────────┘
                                   │                      │
                                   ▼                      ▼
                          ┌─────────────────┐    ┌─────────────────┐
                          │ Query Vector    │    │ Similarity      │
                          │ (search input)  │    │ Search Algorithm│
                          └─────────────────┘    └─────────────────┘
                                   │                      │
                                   └──────────┬───────────┘
                                              ▼
                                     ┌──────────────────┐
                                     │  Search Results  │
                                     │  (most similar)  │
                                     └──────────────────┘
This diagram shows how data is turned into vectors and stored in a vector database, and how a query is vectorized the same way and compared against the stored vectors to return the closest matches.
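The whole flow in the diagram fits in one short script. This is a sketch under simplifying assumptions: `vectorize` is a hypothetical word-count stand-in for an embedding model, and a plain dict plays the role of vector storage. The step worth noticing is that the query passes through the same `vectorize` function as the stored data; vectors from different models are not comparable.

```python
import math

VOCAB = ["cat", "dog", "pet", "car", "engine", "road"]  # hypothetical vocabulary

def vectorize(text):
    # Toy stand-in for an embedding model: word counts over VOCAB.
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Input data -> Vectorization -> Vector storage
documents = {
    "d1": "cat is a pet",
    "d2": "dog is a loyal pet",
    "d3": "car engine on the road",
}
storage = {doc_id: vectorize(text) for doc_id, text in documents.items()}

# Query vector: the query goes through the SAME vectorizer as the data
query_vector = vectorize("my pet cat")

# Similarity search -> Search results (most similar first)
ranked = sorted(storage,
                key=lambda doc_id: cosine_similarity(query_vector, storage[doc_id]),
                reverse=True)
print(ranked)  # ['d1', 'd2', 'd3']
```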
Key Facts
Vector: A list of numbers representing the features of complex data so items can be compared.
Vector database: A system designed to store vectors and search them efficiently by similarity.
Similarity search: Finding the data items whose vectors are closest to a query vector.
Pinecone: A fully managed, cloud-hosted vector database service focused on scalability and speed.
ChromaDB: An open-source vector database designed for AI applications.
Weaviate: An open-source vector database with optional modules that generate vectors for you, and built-in semantic search.
Common Confusions
Confusion: Vectors are the original data, like images or text.
Reality: Vectors are numeric representations derived from the original data, not the data itself.
Confusion: Vector databases search over exact copies of the data.
Reality: Search runs over vectors that summarize the data's features. Many vector databases (for example ChromaDB and Weaviate) can also store the original text or object alongside each vector, but matching is done by comparing vectors.
Confusion: Similarity search finds exact matches only.
Reality: Similarity search returns the items closest to the query, which can include an exact match but also related, non-identical items.
Summary
Vector databases organize complex data as numbers to find similar items quickly.
They use similarity search to compare vectors and return related results.
Pinecone, ChromaDB, and Weaviate are popular tools that make vector search practical.

Practice

1. What is the main purpose of a vector database like Pinecone, ChromaDB, or Weaviate?
easy
A. To store plain text documents only
B. To perform traditional SQL queries on structured data
C. To store and search data based on similarity using number lists
D. To create visual graphs from data

Solution

  1. Step 1: Understand what vector databases store

    Vector databases store data as vectors, which are lists of numbers representing complex data like images or text.
  2. Step 2: Identify the main use of vector databases

    They allow fast searching by similarity, not by exact matches like traditional databases.
  3. Final Answer:

    To store and search data based on similarity using number lists -> Option C
  4. Quick Check:

    Vector databases = similarity search [OK]
Hint: Vector DBs = search by meaning, not exact text [OK]
Common Mistakes:
  • Thinking vector DBs only store text
  • Confusing vector DBs with SQL databases
  • Assuming vector DBs create visual graphs
2. Which of the following is the correct way to insert a vector into Pinecone using Python?
easy
A. pinecone.insert(id='vec1', vector=[0.1, 0.2, 0.3])
B. pinecone.upsert(vectors=[('vec1', [0.1, 0.2, 0.3])])
C. pinecone.add_vector('vec1', [0.1, 0.2, 0.3])
D. pinecone.push_vector(id='vec1', vector=[0.1, 0.2, 0.3])

Solution

  1. Step 1: Recall Pinecone's method to add vectors

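    Pinecone uses the upsert method to insert or update vectors; its vectors argument takes a list of (id, values) tuples. In the Python client, upsert is a method of an Index object (for example index.upsert(vectors=...)), not of the pinecone module itself, so the pinecone. prefix in the options stands for that index handle.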
  2. Step 2: Match the correct syntax

    pinecone.upsert(vectors=[('vec1', [0.1, 0.2, 0.3])]) uses 'upsert' with a list of tuples, which is the correct syntax.
  3. Final Answer:

    pinecone.upsert(vectors=[('vec1', [0.1, 0.2, 0.3])]) -> Option B
  4. Quick Check:

    Use upsert with list of (id, vector) tuples [OK]
Hint: Pinecone uses upsert() with list of (id, vector) [OK]
Common Mistakes:
  • Using insert() instead of upsert()
  • Passing vector without wrapping in a list
  • Using non-existent methods like add_vector or push_vector
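"Upsert" means insert-or-overwrite: a new id adds a record, an existing id replaces it. The sketch below mimics only that behavior with a hypothetical `InMemoryIndex` class and the same (id, values) tuple shape as the quiz; the real Pinecone client sends the records to a remote index and has its own return types.

```python
# Hypothetical in-memory index illustrating "upsert" semantics:
# insert a vector if its id is new, overwrite it if the id already exists.
class InMemoryIndex:
    def __init__(self):
        self._vectors = {}

    def upsert(self, vectors):
        """Accept a list of (id, values) tuples, mirroring the call shape in the quiz."""
        for vec_id, values in vectors:
            self._vectors[vec_id] = list(values)

    def fetch(self, vec_id):
        return self._vectors.get(vec_id)

    def __len__(self):
        return len(self._vectors)

index = InMemoryIndex()
index.upsert(vectors=[("vec1", [0.1, 0.2, 0.3])])  # new id: insert
index.upsert(vectors=[("vec1", [0.4, 0.5, 0.6])])  # same id: update
print(index.fetch("vec1"), len(index))             # [0.4, 0.5, 0.6] 1
```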
3. Given the following code snippet using ChromaDB, what will be the output?
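import chromadb
client = chromadb.Client()  # in-memory client
collection = client.create_collection(name="demo")
collection.add(ids=['1'], embeddings=[[0.1, 0.2, 0.3]], metadatas=[{'type': 'image'}], documents=['cat image'])
results = collection.query(query_embeddings=[[0.1, 0.2, 0.3]], n_results=1)
print(results['documents'])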
medium
A. [['cat image']]
B. ['cat image']
C. [{'type': 'image'}]
D. []

Solution

  1. Step 1: Understand what add() does in ChromaDB

    The add() method stores the document with its vector and metadata in the collection.
  2. Step 2: Understand query() output format

    The query() method returns a dictionary whose 'documents' entry is a list of lists: one inner list of matched documents per query embedding. With one query embedding and n_results=1, that gives [['cat image']].
  3. Final Answer:

    [['cat image']] -> Option A
  4. Quick Check:

    Query returns list of lists of documents [OK]
Hint: ChromaDB query returns list of lists for documents [OK]
Common Mistakes:
  • Expecting a flat list instead of list of lists
  • Confusing documents with metadata
  • Assuming empty result when vector matches exactly
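The nested lists exist because query_embeddings accepts several queries at once. The function below is a hypothetical pure-Python imitation of that result layout (it is not ChromaDB code and ranks by plain Euclidean distance); passing two query embeddings makes the reason for the outer list visible.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical stored data, mirroring the collection.add() call in the question.
stored = [
    {"id": "1", "embedding": [0.1, 0.2, 0.3], "document": "cat image"},
    {"id": "2", "embedding": [0.9, 0.8, 0.7], "document": "car image"},
]

def query(query_embeddings, n_results):
    """Return one inner list per query embedding, like ChromaDB's result layout."""
    result = {"ids": [], "documents": []}
    for q in query_embeddings:
        nearest = sorted(stored, key=lambda item: euclidean(q, item["embedding"]))[:n_results]
        result["ids"].append([item["id"] for item in nearest])
        result["documents"].append([item["document"] for item in nearest])
    return result

print(query([[0.1, 0.2, 0.3]], n_results=1)["documents"])
# [['cat image']]
print(query([[0.1, 0.2, 0.3], [1.0, 0.9, 0.8]], n_results=1)["documents"])
# [['cat image'], ['car image']]
```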
4. You wrote this Weaviate query to find similar items but get an error:
client.query.get('Article', ['title']).with_near_vector({'vector': [0.1, 0.2]}).do()
What is the likely cause of the error?
medium
A. The query must include a filter parameter
B. The method with_near_vector does not exist in Weaviate client
C. The class name 'Article' must be lowercase
D. The vector length is too short; it should match the database dimension

Solution

  1. Step 1: Check vector length requirement in Weaviate

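    Weaviate expects the query vector to have the same number of dimensions as the vectors already stored for the class. Real embeddings typically have hundreds or thousands of dimensions (for example 384, 768, or 1536).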
  2. Step 2: Identify the error cause

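    The vector [0.1, 0.2] has only 2 dimensions, so it almost certainly does not match the stored vectors, which causes the error. The other options do not apply: with_near_vector is a valid method in the v3 Weaviate Python client, Weaviate class names such as 'Article' start with a capital letter, and a filter is optional.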
  3. Final Answer:

    The vector length is too short; it should match the database dimension -> Option D
  4. Quick Check:

    Vector length must match index dimension [OK]
Hint: Vector length must match index dimension in Weaviate [OK]
Common Mistakes:
  • Thinking method name is wrong
  • Assuming class names must be lowercase
  • Believing filter is always required
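The failure mode is easy to reproduce in plain Python. This is a hypothetical check, not Weaviate's code: it assumes the stored 'Article' vectors have 3 dimensions (a real index would use a model's full dimension) and rejects any query vector of a different length, which is what happens to the 2-number vector in the question.

```python
# Hypothetical check mimicking why the query fails: the query vector
# must have the same number of dimensions as the stored vectors.
STORED_DIMENSION = 3  # assumption: the 'Article' vectors have 3 dimensions

def check_query_vector(vector, expected_dim=STORED_DIMENSION):
    if len(vector) != expected_dim:
        raise ValueError(
            f"query vector has {len(vector)} dimensions, index expects {expected_dim}"
        )
    return vector

check_query_vector([0.1, 0.2, 0.3])  # passes
try:
    check_query_vector([0.1, 0.2])   # the vector from the question
except ValueError as err:
    print(err)  # query vector has 2 dimensions, index expects 3
```

In practice the fix is to produce the query vector with the same embedding model that produced the stored vectors, rather than padding or truncating it by hand.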
5. You want to build a search system that finds similar product descriptions using Weaviate. Which steps should you follow to prepare and query the data correctly?
hard
A. Create a schema with a vector index, add product descriptions as objects with vectors, then query using nearVector filter
B. Store product descriptions as plain text only, then query with SQL-like text search
C. Upload product images only, then query using image metadata filters
D. Create a schema without vector index, add descriptions, then query using exact match filters

Solution

  1. Step 1: Define schema with vector index in Weaviate

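    To search by similarity, the product description class needs a vector index. Weaviate creates an HNSW vector index for each class by default; the vectors themselves come either from a vectorizer module or from embeddings you supply.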
  2. Step 2: Add product descriptions as objects with vectors

    Each product description is stored as an object with its vector embedding representing meaning.
  3. Step 3: Query using nearVector filter

    Use nearVector in the query to find objects whose vectors are closest to the query vector; it can also take an optional distance or certainty cutoff.
  4. Final Answer:

    Create a schema with a vector index, add product descriptions as objects with vectors, then query using nearVector filter -> Option A
  5. Quick Check:

    Schema + vectors + nearVector query = correct approach [OK]
Hint: Schema with vectors + nearVector query = similarity search [OK]
Common Mistakes:
  • Trying to search plain text without vectors
  • Using exact match filters for similarity search
  • Ignoring schema vector index setup
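The three steps of the correct answer can be sketched end to end without a server. Everything here is hypothetical: a dict stands in for the schema (class name plus expected vector dimension), a list stands in for stored objects, and `near_vector` ranks by cosine distance with a cutoff, in the spirit of nearVector's optional distance threshold but not using Weaviate's API.

```python
import math

# Step 1: hypothetical "schema" with the vector dimension the class expects.
schema = {"class": "Product", "dimension": 3}
objects = []

def add_object(properties, vector):
    # Step 2: each object is stored together with its embedding vector.
    if len(vector) != schema["dimension"]:
        raise ValueError("vector dimension does not match the schema")
    objects.append({"properties": properties, "vector": vector})

def cosine_distance(a, b):
    # 1 - cosine similarity: 0.0 means same direction, larger means less similar.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def near_vector(vector, max_distance, limit=10):
    # Step 3: rank objects by distance and keep those within the cutoff.
    scored = sorted((cosine_distance(vector, o["vector"]), o["properties"]["name"])
                    for o in objects)
    return [name for dist, name in scored if dist <= max_distance][:limit]

add_object({"name": "running shoe"}, [0.9, 0.1, 0.0])
add_object({"name": "trail shoe"}, [0.8, 0.2, 0.1])
add_object({"name": "coffee mug"}, [0.0, 0.1, 0.9])
print(near_vector([1.0, 0.1, 0.0], max_distance=0.2))  # ['running shoe', 'trail shoe']
```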