Global Secondary Index (GSI) concept in DynamoDB - Deep Dive

Overview - Global Secondary Index (GSI) concept
What is it?
A Global Secondary Index (GSI) in DynamoDB is a way to create an alternate view of your data with a different key structure. It lets you query the table using different attributes than the main primary key. This helps you find data quickly in ways the original table design might not support. GSIs are separate from the main table but stay updated automatically.
Why it matters
Without GSIs, you can only efficiently query data by the table's primary key, which limits how you access your data. GSIs solve this by letting you search using other attributes, making your app faster and more flexible. Without GSIs, you might have to scan the whole table, which is slow and costly.
Where it fits
Before learning GSIs, you should understand DynamoDB tables, primary keys, and basic querying. After GSIs, you can explore Local Secondary Indexes (LSIs), query optimization, and advanced data modeling in DynamoDB.
Mental Model
Core Idea
A Global Secondary Index is like a separate, automatically updated shortcut that lets you find data using different keys than the main table.
Think of it like...
Imagine a library where books are organized by author (main table). A GSI is like a separate catalog organized by genre, letting you find books by genre quickly without rearranging the whole library.
Main Table (Primary Key: UserID)
┌─────────────┐
│ UserID (PK) │
│ Name        │
│ Email       │
│ City        │
└─────────────┘

Global Secondary Index (GSI) (Partition Key: City)
┌───────────────┐
│ City (GSI PK) │
│ UserID        │
│ Name          │
└───────────────┘

Query main table by UserID or query GSI by City.
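The diagram above can be mimicked with two plain Python dictionaries. This is only a toy in-memory model, not DynamoDB itself; the names `users_by_id`, `users_by_city`, `put_user`, and `query_by_city` are made up for illustration.

```python
from collections import defaultdict

# Toy in-memory model (not DynamoDB): a base table keyed by UserID and a
# second lookup structure keyed by City that holds UserID and Name only.
users_by_id = {}                    # base table: UserID -> full item
users_by_city = defaultdict(dict)   # "index": City -> {UserID -> projected item}


def put_user(item):
    """Write to the base table and mirror the projected attributes into the index."""
    old = users_by_id.get(item["UserID"])
    if old is not None:
        # Drop the previous index entry in case City changed.
        users_by_city[old["City"]].pop(old["UserID"], None)
    users_by_id[item["UserID"]] = item
    users_by_city[item["City"]][item["UserID"]] = {
        "City": item["City"],
        "UserID": item["UserID"],
        "Name": item["Name"],
    }


def query_by_city(city):
    """Find users through the alternate key without looking at other cities."""
    return sorted(users_by_city[city].values(), key=lambda u: u["UserID"])


put_user({"UserID": "u1", "Name": "Ana", "Email": "ana@example.com", "City": "Lisbon"})
put_user({"UserID": "u2", "Name": "Ben", "Email": "ben@example.com", "City": "Oslo"})
put_user({"UserID": "u3", "Name": "Cy", "Email": "cy@example.com", "City": "Lisbon"})

lisbon_names = [u["Name"] for u in query_by_city("Lisbon")]
print(lisbon_names)  # ['Ana', 'Cy']
```

Note that Email never reaches the index copy, which matches the diagram: the index holds only City, UserID, and Name.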
Build-Up - 7 Steps
1. Foundation: Understanding DynamoDB Primary Keys
Concept: Learn what a primary key is and how it uniquely identifies items in a DynamoDB table.
In DynamoDB, every item in a table must have a primary key. This key can be simple (one attribute) or composite (partition key + sort key). The primary key ensures each item is unique and helps DynamoDB find items quickly.
Result
You know how DynamoDB organizes data and why the primary key is essential for fast lookups.
Understanding primary keys is crucial because GSIs build on this concept by creating alternate keys for querying.
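As a small sketch of the two key styles, here are key definitions written in the shape that DynamoDB's CreateTable request uses for `KeySchema` (`HASH` marks the partition key, `RANGE` the sort key). The attribute names and the `describe` helper are illustrative assumptions.

```python
# Key definitions in the shape used by the CreateTable request's KeySchema.
# Attribute names are illustrative.
simple_key = [
    {"AttributeName": "UserID", "KeyType": "HASH"},      # partition key only
]
composite_key = [
    {"AttributeName": "CustomerID", "KeyType": "HASH"},  # partition key
    {"AttributeName": "OrderDate", "KeyType": "RANGE"},  # sort key
]


def describe(key_schema):
    """Render a key schema as readable text (hypothetical helper)."""
    roles = {"HASH": "partition", "RANGE": "sort"}
    return ", ".join(
        f'{part["AttributeName"]} ({roles[part["KeyType"]]})' for part in key_schema
    )


print(describe(simple_key))     # UserID (partition)
print(describe(composite_key))  # CustomerID (partition), OrderDate (sort)
```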
2. Foundation: Basics of Querying in DynamoDB
Concept: Learn how queries use primary keys to find data efficiently.
A Query in DynamoDB uses the partition key (and optionally a condition on the sort key) to go straight to the matching items. To find items by an attribute that is not part of the primary key, you have to fall back to a Scan, which reads the whole table and filters afterward. That is slow and expensive.
Result
You see why querying by primary key is fast and why querying by other attributes is slow without indexes.
Knowing this limitation sets the stage for why GSIs are needed to improve query flexibility.
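To make the difference concrete, here is a toy comparison in plain Python (again, not the DynamoDB API). It counts how many items each approach has to examine; the data and function names are invented for the example.

```python
# Toy comparison: a key lookup touches one item, while filtering on a
# non-key attribute has to examine every item in the table.
table = {
    f"u{i}": {"UserID": f"u{i}", "City": "Oslo" if i % 100 == 0 else "Lisbon"}
    for i in range(1000)
}


def get_by_key(user_id):
    """Key lookup: one item examined."""
    return table.get(user_id), 1


def scan_by_city(city):
    """Scan with a filter: every item is examined, whether it matches or not."""
    examined = 0
    matches = []
    for item in table.values():
        examined += 1
        if item["City"] == city:
            matches.append(item)
    return matches, examined


item, key_cost = get_by_key("u500")
oslo, scan_cost = scan_by_city("Oslo")
print(key_cost, scan_cost, len(oslo))  # 1 1000 10
```

The scan examined all 1000 items to return 10 of them, which is the work an index is meant to avoid.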
3. Intermediate: Introducing Global Secondary Indexes
🤔 Before reading on: do you think GSIs store copies of data or just pointers? Commit to your answer.
Concept: GSIs create a new index with a different key structure, storing copies of selected attributes to enable fast queries on those keys.
A GSI has its own partition key and optional sort key, different from the main table. It stores copies of attributes you choose, so you can query the table using these alternate keys without scanning the whole table.
Result
You can query data efficiently using the GSI keys, even if they are different from the main table's keys.
Understanding that GSIs store copies of data explains why they improve query speed but also affect storage and write costs.
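Here is a sketch of what defining such an index can look like. The function only builds the request parameters in the shape of DynamoDB's CreateTable call; the table name `Users`, the index name `CityIndex`, and the choice of on-demand billing are assumptions for the example, and nothing is sent to AWS.

```python
def build_create_table_params():
    """Build CreateTable parameters for a table with one GSI (illustrative names)."""
    return {
        "TableName": "Users",  # hypothetical table name
        # Only attributes used as keys (table or index) are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "UserID", "AttributeType": "S"},
            {"AttributeName": "City", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "UserID", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "CityIndex",  # hypothetical index name
                "KeySchema": [{"AttributeName": "City", "KeyType": "HASH"}],
                # Copy Name into the index alongside the key attributes.
                "Projection": {
                    "ProjectionType": "INCLUDE",
                    "NonKeyAttributes": ["Name"],
                },
            }
        ],
        # On-demand billing, so no ProvisionedThroughput blocks are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


params = build_create_table_params()
print(params["GlobalSecondaryIndexes"][0]["IndexName"])  # CityIndex

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("dynamodb").create_table(**params)
```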
4. Intermediate: How GSIs Stay Updated Automatically
🤔 Before reading on: do you think updating a GSI requires manual syncing? Commit to your answer.
Concept: DynamoDB automatically updates GSIs when the main table changes, keeping the index in sync without extra work from you.
When you add, update, or delete items in the main table, DynamoDB updates the GSI behind the scenes. You never write to the index yourself, and your queries on the GSI pick up the changes shortly after each write without manual intervention.
Result
Your GSI queries reflect table changes automatically, simplifying application logic.
Knowing this automatic syncing lets you rely on GSIs for near-real-time queries and reduces complexity in your code.
5. Intermediate: Choosing Attributes for a GSI
🤔 Before reading on: do you think GSIs copy all table attributes by default? Commit to your answer.
Concept: You select which attributes the GSI copies to balance query needs and storage costs.
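A GSI copies only the attributes in its projection. The table's and the index's key attributes are always included; beyond that, you choose KEYS_ONLY (nothing more), INCLUDE (a named subset), or ALL (every attribute). This choice affects storage size, write cost, and which attributes a query on the index can return.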
Result
You optimize GSIs to include only needed data, saving costs and improving speed.
Understanding attribute projection helps you design efficient GSIs tailored to your query patterns.
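The three projection types can be sketched as a small function that returns the copy of an item an index would hold. This is a plain-Python illustration of the rule, not DynamoDB code; `project` and its parameters are hypothetical names.

```python
def project(item, table_keys, index_keys, projection_type, non_key_attributes=()):
    """Return the copy of `item` that an index with this projection would hold."""
    # Table key and index key attributes are always part of the index entry.
    always = set(table_keys) | set(index_keys)
    if projection_type == "ALL":
        return dict(item)
    if projection_type == "KEYS_ONLY":
        keep = always
    elif projection_type == "INCLUDE":
        keep = always | set(non_key_attributes)
    else:
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: value for name, value in item.items() if name in keep}


user = {"UserID": "u1", "Name": "Ana", "Email": "ana@example.com", "City": "Lisbon"}
keys_only = project(user, ["UserID"], ["City"], "KEYS_ONLY")
include_name = project(user, ["UserID"], ["City"], "INCLUDE", ["Name"])
everything = project(user, ["UserID"], ["City"], "ALL")

print(sorted(keys_only))     # ['City', 'UserID']
print(sorted(include_name))  # ['City', 'Name', 'UserID']
print(sorted(everything))    # ['City', 'Email', 'Name', 'UserID']
```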
6. Advanced: Impact of GSIs on Write Capacity and Costs
🤔 Before reading on: do you think writing to a table with GSIs costs the same as without? Commit to your answer.
Concept: Writes to a table with GSIs consume additional write capacity because DynamoDB must update the index too.
Every time a write to the main table adds, changes, or removes an item's entry in a GSI, DynamoDB also writes to that GSI. Write capacity units (WCUs) are therefore consumed by the table and by each affected GSI, increasing costs and requiring capacity planning.
Result
You understand that GSIs add overhead to writes and must be managed carefully.
Knowing this prevents unexpected cost spikes and helps you balance read and write needs.
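A rough back-of-the-envelope estimator can help with that planning. The sketch below assumes standard (non-transactional) writes billed in 1 KB increments and treats the in-place update case as an approximation; the function names and byte sizes are invented for the example, so check the numbers against the DynamoDB documentation before relying on them.

```python
import math


def wcus_for(size_bytes):
    """Standard writes are metered in 1 KB increments, rounded up."""
    return max(1, math.ceil(size_bytes / 1024))


def gsi_write_cost(old_entry_bytes, new_entry_bytes, key_changed):
    """Estimate index write units for one base-table write.

    None means the item has no entry in the index on that side of the write.
    """
    if old_entry_bytes is None and new_entry_bytes is None:
        return 0  # the item never appears in the index
    if old_entry_bytes is None:
        return wcus_for(new_entry_bytes)  # put a new index entry
    if new_entry_bytes is None:
        return wcus_for(old_entry_bytes)  # delete the existing index entry
    if key_changed:
        # The index key value changed: delete the old entry, put the new one.
        return wcus_for(old_entry_bytes) + wcus_for(new_entry_bytes)
    # Approximation: an in-place update is charged on the larger size.
    return wcus_for(max(old_entry_bytes, new_entry_bytes))


table_cost = wcus_for(2500)                      # 2500-byte item in the table
insert_cost = gsi_write_cost(None, 300, False)   # new 300-byte index entry
rekey_cost = gsi_write_cost(300, 300, True)      # index key value changed
print(table_cost, insert_cost, rekey_cost)       # 3 1 2
```

Under these assumptions, inserting the item costs 3 + 1 = 4 write units in total, and an update that changes the index key costs 3 + 2 = 5.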
7. Expert: Consistency and Latency Considerations with GSIs
🤔 Before reading on: do you think GSI queries always return the latest data immediately? Commit to your answer.
Concept: GSI queries are eventually consistent, meaning there can be a slight delay before updates appear in the index.
Because GSIs update asynchronously, queries on a GSI might not reflect the most recent writes instantly. This eventual consistency is a tradeoff for performance and scalability.
Result
You know to design your application to handle slight delays in GSI query results.
Understanding eventual consistency helps avoid bugs and data confusion in real-world applications.
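The effect can be reproduced with a toy class in which base-table writes apply at once while index updates wait in a queue. This is an in-memory stand-in, not how DynamoDB is implemented; `ToyTable` and `propagate` are hypothetical names, and the explicit `propagate()` call stands in for background replication that you do not control in the real service.

```python
from collections import deque


class ToyTable:
    """In-memory stand-in: base writes apply at once, index updates are queued."""

    def __init__(self):
        self.items = {}
        self.city_index = {}
        self._pending = deque()

    def put(self, item):
        self.items[item["UserID"]] = item
        self._pending.append(item)  # the index update happens later

    def propagate(self):
        """Apply queued index updates (stands in for background replication)."""
        while self._pending:
            item = self._pending.popleft()
            self.city_index.setdefault(item["City"], set()).add(item["UserID"])

    def query_city(self, city):
        return sorted(self.city_index.get(city, set()))


t = ToyTable()
t.put({"UserID": "u1", "City": "Lisbon"})
before = t.query_city("Lisbon")  # the index has not caught up yet
t.propagate()
after = t.query_city("Lisbon")
print(before, after)  # [] ['u1']
```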
Under the Hood
When you write to a DynamoDB table, the service writes the item to the main table first and then propagates the change to any affected GSIs asynchronously. Each GSI maintains its own partition and sort keys and stores copies of projected attributes. The update to GSIs happens in the background, allowing the main write operation to complete quickly. Queries on GSIs use their own keys and storage structures, separate from the main table, enabling fast lookups on alternate keys.
Why designed this way?
GSIs were designed to provide flexible querying without redesigning the main table or duplicating data manually. The asynchronous update model balances write performance and index freshness, avoiding slow writes. Storing copies of attributes in GSIs allows queries without fetching the main table, improving speed. Alternatives like scanning the table were too slow and costly, so GSIs offer a scalable solution.
┌───────────────┐       ┌────────────────────────────────┐
│ Main Table    │       │ Global Secondary Index (GSI)   │
│ Partition Key │◄──────┤ Partition Key (different)      │
│ Sort Key      │       │ Sort Key (optional)            │
│ Attributes    │       │ Projected Attributes           │
└─────┬─────────┘       └────────────────┬───────────────┘
      │ Writes                           ▲ Updates asynchronously
      │                                  │
      ▼                                  │
┌──────────────────────────────────────────────┐
│ DynamoDB Storage and Indexing Engine         │
└──────────────────────────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do GSIs always return the latest data immediately after a write? Commit to yes or no.
Common Belief: GSIs are always up-to-date and return the latest data instantly.
Reality: GSIs update asynchronously, so queries may return slightly stale data for a short time.
Why it matters:Assuming immediate consistency can cause bugs where your app shows outdated information or misses recent changes.
Quick: Do GSIs reduce write costs because they are separate indexes? Commit to yes or no.
Common Belief: Using GSIs does not affect write costs since they are separate from the main table.
Reality: Every table write that changes an index entry also consumes write capacity on that GSI, increasing costs.
Why it matters:Ignoring this leads to unexpected high bills and capacity throttling in production.
Quick: Do GSIs automatically include all attributes from the main table? Commit to yes or no.
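Common Belief: GSIs copy all attributes from the main table by default.
Reality: A GSI holds only what its projection specifies. With KEYS_ONLY it contains just the table's and the index's key attributes; INCLUDE adds the attributes you name, and ALL copies every attribute.
Why it matters: A query on a GSI returns only projected attributes, so anything left out is missing from the results and needs a separate read of the main table.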
Quick: Can you create a GSI with the same partition key as the main table? Commit to yes or no.
Common Belief: GSIs must have different partition keys than the main table.
Reality: GSIs can have the same partition key attribute but usually differ to enable new query patterns.
Why it matters:Misunderstanding this limits your design options and flexibility.
Expert Zone
1. Index writes are metered on the size of the index entry (key attributes plus projected attributes), so a lean projection lowers write costs.
2. The eventual consistency of GSIs means that in high-write scenarios, queries might miss recent updates, requiring application-level handling or fallback strategies.
3. In provisioned mode, each GSI has its own read and write capacity, separate from the main table. Heavy GSI queries can be throttled independently, and a GSI with too little write capacity can cause writes to the main table to be throttled.
When NOT to use
Avoid GSIs when you need strongly consistent reads or when write throughput is extremely high and cost-sensitive. Instead, consider Local Secondary Indexes (LSIs), which support strongly consistent queries but share the table's partition key and must be defined when the table is created, or redesign your data model to fit primary key queries.
Production Patterns
In production, GSIs are used to support multiple query patterns without duplicating tables. For example, an e-commerce app might use the main table keyed by OrderID and a GSI keyed by CustomerID to quickly find all orders by a customer. Monitoring GSI usage and capacity is critical to avoid throttling and control costs.
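For the e-commerce example, a query against the index might be put together as follows. The function only builds parameters in the shape of DynamoDB's Query request; the table name `Orders`, the index name `CustomerIndex`, and the helper itself are assumptions for illustration.

```python
def build_orders_by_customer_query(customer_id, limit=25):
    """Build Query parameters that read a customer's orders through a GSI."""
    return {
        "TableName": "Orders",         # hypothetical table keyed by OrderID
        "IndexName": "CustomerIndex",  # hypothetical GSI keyed by CustomerID
        "KeyConditionExpression": "CustomerID = :cid",
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
        "Limit": limit,
        # ConsistentRead is left out on purpose: reads from a GSI are
        # eventually consistent.
    }


query = build_orders_by_customer_query("c-42")
print(query["IndexName"], query["ExpressionAttributeValues"][":cid"]["S"])
# CustomerIndex c-42

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   response = boto3.client("dynamodb").query(**query)
#   orders = response["Items"]
```

Naming the index in `IndexName` is what routes the query to the GSI instead of the main table.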
Connections
Database Indexing
GSIs are a type of database index specialized for NoSQL key-value stores.
Understanding traditional database indexes helps grasp how GSIs speed up queries by creating alternate data access paths.
Eventual Consistency in Distributed Systems
GSIs exhibit eventual consistency similar to many distributed data stores.
Knowing eventual consistency concepts from distributed systems explains why GSI queries might lag behind writes.
Library Catalog Systems
GSIs function like alternate catalogs in libraries, enabling different ways to find books.
Recognizing this connection helps appreciate how GSIs provide flexible data access without reorganizing the main data.
Common Pitfalls
#1 Querying a GSI expecting strongly consistent results.
Wrong approach: Query the GSI with ConsistentRead=true expecting immediate latest data. DynamoDB rejects strongly consistent reads on a GSI.
Correct approach: Query the GSI with ConsistentRead=false (the default) and design your app to handle eventual consistency.
Root cause: Not knowing that GSIs only support eventually consistent reads.
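One common way to handle this in application code is a bounded retry after a write. The helper below is a generic sketch, not part of any AWS SDK; `wait_until_visible`, `fake_index_query`, and the retry settings are all hypothetical, and the fake query stands in for a real GSI query.

```python
import time


def wait_until_visible(fetch, is_ready, attempts=5, delay_seconds=0.1):
    """Call `fetch` until `is_ready(result)` is true or the attempts run out."""
    result = fetch()
    for _ in range(attempts - 1):
        if is_ready(result):
            break
        time.sleep(delay_seconds)
        result = fetch()
    return result  # may still be "not ready" if every attempt came back empty


# Stand-in for a GSI query that needs two tries before the write shows up.
responses = iter([[], [], ["u1"]])
calls = []


def fake_index_query():
    result = next(responses)
    calls.append(result)
    return result


found = wait_until_visible(fake_index_query, bool, attempts=5, delay_seconds=0)
print(found, len(calls))  # ['u1'] 3
```

Keeping the number of attempts small matters here: the caller still has to decide what to do when the data never appears.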
#2 Creating a GSI without projecting needed attributes.
Wrong approach: Create GSI with ProjectionType=KEYS_ONLY, then query it for attributes that are not key attributes.
Correct approach: Create GSI with ProjectionType=INCLUDE or ALL to include required attributes for queries.
Root cause: Not realizing GSIs only copy attributes you specify, leading to incomplete query results.
#3 Ignoring increased write capacity when adding GSIs.
Wrong approach: Add multiple GSIs without provisioning enough write capacity for them, causing throttling.
Correct approach: Provision write capacity on each GSI to match the table's write load, or use on-demand mode.
Root cause: Underestimating the write cost impact of GSIs.
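In provisioned mode, a GSI's capacity can be raised separately from the table's. The sketch below builds parameters in the shape of DynamoDB's UpdateTable request; the helper name, table name, index name, and capacity numbers are placeholders, not recommendations.

```python
def build_gsi_capacity_update(table_name, index_name, read_units, write_units):
    """Build UpdateTable parameters that change one GSI's provisioned capacity."""
    return {
        "TableName": table_name,
        "GlobalSecondaryIndexUpdates": [
            {
                "Update": {
                    "IndexName": index_name,
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": read_units,
                        "WriteCapacityUnits": write_units,
                    },
                }
            }
        ],
    }


# Illustrative names and numbers only.
update = build_gsi_capacity_update("Orders", "CustomerIndex", 50, 100)
throughput = update["GlobalSecondaryIndexUpdates"][0]["Update"]["ProvisionedThroughput"]
print(throughput["WriteCapacityUnits"])  # 100

# With boto3 installed and AWS credentials configured, this could be sent as:
#   import boto3
#   boto3.client("dynamodb").update_table(**update)
```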
Key Takeaways
Global Secondary Indexes let you query DynamoDB tables using alternate keys for flexible data access.
GSIs store copies of selected attributes and update asynchronously, enabling fast but eventually consistent queries.
Using GSIs increases write costs and capacity needs because updates must be written to both the main table and indexes.
Careful design of GSI keys and projected attributes balances query performance, cost, and storage.
Understanding GSIs' eventual consistency and capacity impact is essential for building reliable, scalable applications.