
Creating GSI in DynamoDB - Mechanics & Internals

Overview - Creating GSI
What is it?
A Global Secondary Index (GSI) in DynamoDB is a way to create an alternate view of your data with a different key structure. It lets you query the table using different attributes than the main key. This helps you find data quickly without scanning the whole table.
Why it matters
Without secondary indexes, you can only query efficiently by the table's primary key. Any other lookup needs a full table scan, which makes some queries slow or impractical. GSIs solve this by letting you define new keys for fast lookups, improving app performance and user experience.
Where it fits
Before learning about GSIs, you should understand DynamoDB tables, primary keys, and basic queries. After GSIs, you can explore advanced querying, Local Secondary Indexes (LSIs), and data modeling strategies for DynamoDB.
Mental Model
Core Idea
A GSI is a separate index that lets you query your DynamoDB table using different keys without changing the original table structure.
Think of it like...
Imagine a library where books are organized by author (the main table key). A GSI is like adding a new shelf organized by genre, so you can find books by genre quickly without rearranging the whole library.
┌────────────────┐      ┌──────────────────┐
│ DynamoDB Table │      │ Global Secondary │
│ Partition Key: │      │ Index (GSI)      │
│ UserID         │      │ Partition Key:   │
│ Sort Key: Date │      │ Genre            │
└───────┬────────┘      └────────┬─────────┘
        │                        │
        │ Data stored once       │ Data stored separately
        │                        │
        ▼                        ▼
  Query by UserID          Query by Genre
Build-Up - 7 Steps
1
Foundation: Understanding DynamoDB Primary Keys
Concept: Learn what primary keys are and how they uniquely identify items in a DynamoDB table.
In DynamoDB, every item in a table must have a primary key. This key can be simple (one attribute) or composite (partition key + sort key). The primary key ensures each item is unique and helps DynamoDB find data quickly.
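As a rough sketch, the two key shapes can be written in the KeySchema form that the CreateTable API accepts. The attribute names (UserID, Date) follow the diagram above, and the helper key_schema is made up for illustration.

```python
def key_schema(partition_key, sort_key=None):
    """Build a DynamoDB-style KeySchema list (hypothetical helper)."""
    # HASH marks the partition key; RANGE marks the optional sort key.
    schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    if sort_key is not None:
        schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
    return schema


# Simple primary key: one attribute identifies each item.
simple_key = key_schema("UserID")

# Composite primary key: partition key plus sort key identify each item.
composite_key = key_schema("UserID", "Date")
```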
Result
You know how DynamoDB organizes data and why primary keys are essential for fast lookups.
Understanding primary keys is crucial because GSIs build on this concept by creating new keys for different query patterns.
2
Foundation: What is an Index in DynamoDB?
Concept: Introduce the idea of indexes as alternate ways to access data in DynamoDB.
An index is like a shortcut to find data without scanning the whole table. DynamoDB supports two types: Local Secondary Index (LSI) and Global Secondary Index (GSI). An LSI keeps the table's partition key and adds a different sort key, while a GSI can use a different partition key and, optionally, a different sort key.
Result
You understand that indexes let you query data in new ways without changing the main table.
Knowing indexes exist prepares you to use GSIs to solve real querying problems.
3
Intermediate: Creating a Basic Global Secondary Index
Before reading on: do you think a GSI must use the same partition key as the main table, or can it use a different one? Commit to your answer.
Concept: Learn how to define a GSI with its own partition and optional sort key.
To create a GSI, you specify a new partition key and optionally a sort key different from the main table. You also define which attributes to project (copy) into the index. This lets you query the table using these new keys efficiently.
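One possible shape for such a definition is sketched below as plain CreateTable request parameters. The table name Activity, the index name GenreIndex, and the attribute names are assumptions for illustration; only attributes used as keys need an entry in AttributeDefinitions.

```python
def build_create_table_params():
    """CreateTable parameters for a table with one GSI (all names illustrative)."""
    return {
        "TableName": "Activity",
        # Every attribute used in any key schema must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "UserID", "AttributeType": "S"},
            {"AttributeName": "Date", "AttributeType": "S"},
            {"AttributeName": "Genre", "AttributeType": "S"},
        ],
        # Primary key of the base table.
        "KeySchema": [
            {"AttributeName": "UserID", "KeyType": "HASH"},
            {"AttributeName": "Date", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GenreIndex",
                # The GSI has its own partition key and optional sort key.
                "KeySchema": [
                    {"AttributeName": "Genre", "KeyType": "HASH"},
                    {"AttributeName": "Date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # On-demand mode, so no ProvisionedThroughput blocks are needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


params = build_create_table_params()
# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("dynamodb").create_table(**params)
```

The parameters are built as a plain dictionary so the structure can be inspected without an AWS account.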
Result
You can create a GSI that supports queries on different attributes than the main table keys.
Understanding that GSIs have independent keys unlocks flexible querying options beyond the main table.
4
Intermediate: Choosing Projection Types for GSI
Before reading on: do you think projecting all attributes into a GSI always improves performance, or can it have downsides? Commit to your answer.
Concept: Learn about projection types that control which attributes are copied into the GSI.
Projection types include KEYS_ONLY (only keys), INCLUDE (specific attributes), and ALL (all attributes). Choosing the right projection balances query speed and storage cost. Projecting too many attributes increases storage and write costs.
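To make the difference concrete, here is a toy model of which attributes of one item would be copied into an index under each projection type. The function project_item and the sample item are hypothetical; the model relies on the documented rule that the table's key attributes and the index's key attributes are always projected.

```python
def project_item(item, table_keys, index_keys, projection_type, non_key_attributes=()):
    """Return the attributes of `item` that a GSI would store (toy model)."""
    # Table keys and index keys are projected under every projection type.
    always = set(table_keys) | set(index_keys)
    if projection_type == "KEYS_ONLY":
        wanted = always
    elif projection_type == "INCLUDE":
        wanted = always | set(non_key_attributes)
    elif projection_type == "ALL":
        wanted = set(item)
    else:
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: value for name, value in item.items() if name in wanted}


item = {
    "UserID": "u1",
    "Date": "2024-05-01",
    "Genre": "fiction",
    "Title": "Dune",
    "Notes": "a long free-text field that is expensive to copy",
}
table_keys = ["UserID", "Date"]
index_keys = ["Genre", "Date"]

keys_only = project_item(item, table_keys, index_keys, "KEYS_ONLY")
include = project_item(item, table_keys, index_keys, "INCLUDE", ["Title"])
everything = project_item(item, table_keys, index_keys, "ALL")
```

In this sketch the large Notes attribute is only copied under ALL, which is where the extra storage and write cost comes from.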
Result
You can optimize GSIs by selecting appropriate projection types for your query needs.
Knowing projection tradeoffs helps you design GSIs that are both fast and cost-effective.
5
Intermediate: Querying Data Using a GSI
Before reading on: do you think querying a GSI is the same as querying the main table, or are there differences? Commit to your answer.
Concept: Learn how to write queries that use the GSI keys to retrieve data.
When querying a GSI, you specify the index name and use the GSI's partition and sort keys in your query conditions. This returns items matching the GSI keys without scanning the whole table.
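A minimal sketch of such a request, assuming a table named Activity with a GSI named GenreIndex keyed on Genre and Date (all hypothetical names). Placeholders such as #d are used because Date is a DynamoDB reserved word.

```python
def build_gsi_query(genre, since):
    """Query parameters that target a GSI instead of the base table."""
    return {
        "TableName": "Activity",
        # Without IndexName the query would run against the base table.
        "IndexName": "GenreIndex",
        # Equality on the GSI partition key, range condition on its sort key.
        "KeyConditionExpression": "#g = :g AND #d >= :since",
        "ExpressionAttributeNames": {"#g": "Genre", "#d": "Date"},
        "ExpressionAttributeValues": {
            ":g": {"S": genre},
            ":since": {"S": since},
        },
    }


query = build_gsi_query("fiction", "2024-01-01")
# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("dynamodb").query(**query)
```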
Result
You can efficiently retrieve data using alternate keys defined by the GSI.
Understanding how to query GSIs unlocks powerful data access patterns in DynamoDB.
6
Advanced: Managing GSI Capacity and Performance
Before reading on: do you think GSIs share the same throughput capacity as the main table, or have separate limits? Commit to your answer.
Concept: Learn how GSIs have their own read/write capacity and how to manage them.
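A GSI consumes read and write capacity separately from the main table. Every write to the table that changes an indexed item is also written to the GSI, which adds to write cost. A GSI uses the same capacity mode as its table: in provisioned mode you set read and write capacity for each GSI, and in on-demand mode capacity scales automatically. If a GSI has too little write capacity, writes to the main table can be throttled, so monitor GSI usage alongside the table.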
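The extra write cost can be estimated with some simple arithmetic. The sketch below is a simplified model: it assumes one write capacity unit covers an index item of up to 1 KB, and that changing an indexed key value costs two index writes (one delete, one put). The function and dictionary names are hypothetical, and real accounting has more cases than shown here.

```python
import math

# Number of index writes per kind of change (simplified model).
INDEX_WRITES_PER_CHANGE = {
    "insert": 1,                      # new item that has the index key
    "update_projected_attribute": 1,  # index key unchanged
    "update_index_key": 2,            # delete old index entry, put new one
    "delete": 1,                      # remove the entry from the index
    "not_in_index": 0,                # item lacks the index key before and after
}


def gsi_write_units(change, index_item_bytes):
    """Estimate write capacity units one table write consumes on one GSI."""
    # Index item size is rounded up to the next 1 KB.
    size_units = math.ceil(index_item_bytes / 1024)
    return INDEX_WRITES_PER_CHANGE[change] * size_units


small_insert = gsi_write_units("insert", 300)
key_change = gsi_write_units("update_index_key", 1500)
untouched = gsi_write_units("not_in_index", 1500)
```

Under these assumptions, a smaller projection lowers index_item_bytes and therefore the units consumed per write.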
Result
You can plan capacity and cost for GSIs to maintain performance and control expenses.
Knowing GSIs have separate capacity helps prevent unexpected throttling and cost spikes.
7
Expert: GSI Consistency and Latency Considerations
Before reading on: do you think queries on GSIs always return the latest data instantly, or can there be delays? Commit to your answer.
Concept: Understand the eventual consistency model of GSIs and its impact on applications.
Reads from a GSI are always eventually consistent: there can be a short delay between a write to the main table and the change appearing in the GSI, so a GSI query can briefly return stale data. A Query or Scan on a GSI does not support strongly consistent reads. Applications must tolerate this lag or read from the main table when they need strong consistency.
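One way to keep the two read paths apart in application code is sketched below. The helper names, the table name Activity, and the index name GenreIndex are assumptions; the guard mirrors the fact that DynamoDB rejects ConsistentRead=True on a GSI query.

```python
TABLE_NAME = "Activity"    # hypothetical table
INDEX_NAME = "GenreIndex"  # hypothetical GSI


def gsi_query_params(genre, consistent_read=False):
    """Parameters for a GSI query; only eventually consistent reads are possible."""
    if consistent_read:
        # Fail early in the application instead of waiting for the service error.
        raise ValueError("strongly consistent reads are not supported on a GSI")
    return {
        "TableName": TABLE_NAME,
        "IndexName": INDEX_NAME,
        "KeyConditionExpression": "#g = :g",
        "ExpressionAttributeNames": {"#g": "Genre"},
        "ExpressionAttributeValues": {":g": {"S": genre}},
    }


def base_table_get_params(user_id, date):
    """Parameters for a strongly consistent GetItem on the base table."""
    return {
        "TableName": TABLE_NAME,
        "Key": {"UserID": {"S": user_id}, "Date": {"S": date}},
        "ConsistentRead": True,
    }


browse = gsi_query_params("fiction")
confirm = base_table_get_params("u1", "2024-05-01")
```

A common pattern built on this is to use the GSI to find candidate keys and then re-read the critical items from the base table.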
Result
You can design applications that handle GSI latency and consistency correctly.
Understanding eventual consistency in GSIs prevents subtle bugs and improves user experience.
Under the Hood
When you create a GSI, DynamoDB maintains a separate data structure that stores copies of the projected attributes under the new key schema. Every write to the main table that affects an indexed item is propagated to the GSI asynchronously. This separation allows fast queries on alternate keys without adding index maintenance to the main table's write path, as long as the GSI has enough write capacity to keep up.
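The asynchronous update can be pictured with a toy in-memory model. This is not how DynamoDB is implemented internally; the class ToyTable and its methods are invented here only to show why an index query can lag behind a table write.

```python
from collections import deque


class ToyTable:
    """In-memory table with one asynchronously maintained index (toy model)."""

    def __init__(self, index_key):
        self.index_key = index_key
        self.items = {}          # primary key -> item
        self.index = {}          # index key value -> set of primary keys
        self._pending = deque()  # changes not yet applied to the index

    def put(self, pk, item):
        """Write to the table immediately; queue the index update."""
        old = self.items.get(pk)
        self.items[pk] = item
        self._pending.append((pk, old, item))

    def propagate(self):
        """Apply queued changes to the index (the 'background process')."""
        while self._pending:
            pk, old, new = self._pending.popleft()
            if old is not None and self.index_key in old:
                self.index.get(old[self.index_key], set()).discard(pk)
            if self.index_key in new:
                self.index.setdefault(new[self.index_key], set()).add(pk)

    def query_index(self, value):
        """Return primary keys currently visible through the index."""
        return sorted(self.index.get(value, set()))


table = ToyTable(index_key="Genre")
table.put("u1", {"Genre": "fiction"})
before = table.query_index("fiction")  # index has not caught up yet
table.propagate()
after = table.query_index("fiction")   # write is now visible in the index
```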
Why designed this way?
GSIs were designed to provide flexible querying without duplicating entire tables or slowing down main table operations. The asynchronous update balances write performance and query flexibility, accepting eventual consistency as a tradeoff for speed and scalability.
┌───────────────┐       ┌──────────────────┐
│ Main Table    │       │ Global Secondary │
│ Partition Key │       │ Index            │
│ Sort Key      │       │ Partition Key    │
│ Attributes    │       │ Sort Key         │
└───────┬───────┘       └────────┬─────────┘
        │ Write triggers         │ Async update
        │----------------------->│
        │                        │
        ▼                        ▼
   Data stored              Data stored
   in main table            in GSI
Myth Busters - 4 Common Misconceptions
Quick: Do you think GSIs always provide strongly consistent reads? Commit to yes or no.
Common Belief: GSIs provide the same strong consistency as the main table.
Reality: GSIs are eventually consistent, so queries may return stale data briefly after writes.
Why it matters: Assuming strong consistency can cause bugs where your app shows outdated information, confusing users.
Quick: Do you think creating many GSIs has no impact on write costs? Commit to yes or no.
Common Belief: Adding GSIs does not affect write costs significantly.
Reality: Each GSI requires additional writes when the main table is updated, increasing write cost. A GSI without enough write capacity can also cause table writes to be throttled.
Why it matters: Ignoring this can lead to unexpectedly high bills and throttled writes.
Quick: Do you think GSIs can be created or deleted instantly on existing tables? Commit to yes or no.
Common Belief: You can add or remove GSIs instantly without downtime.
Reality: Creating a GSI on an existing table triggers a backfill that can take considerable time on large tables, and the index cannot be queried until it becomes active. Deleting a GSI also takes time.
Why it matters: Planning index changes poorly can cause application slowdowns or errors.
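Adding an index to an existing table goes through UpdateTable rather than CreateTable. The sketch below builds such a request and checks a DescribeTable-style response for the index status. The names Activity, StatusIndex, and Status are assumptions, and the example assumes an on-demand table; a provisioned table would also need a ProvisionedThroughput entry inside Create.

```python
def build_add_gsi_params():
    """UpdateTable parameters that add a GSI to an existing table (names illustrative)."""
    return {
        "TableName": "Activity",
        # New index key attributes must be declared in the same request.
        "AttributeDefinitions": [
            {"AttributeName": "Status", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "StatusIndex",
                    "KeySchema": [
                        {"AttributeName": "Status", "KeyType": "HASH"},
                    ],
                    "Projection": {"ProjectionType": "KEYS_ONLY"},
                }
            }
        ],
    }


def index_is_active(describe_table_response, index_name):
    """Check a DescribeTable response for an index that has finished building."""
    indexes = describe_table_response["Table"].get("GlobalSecondaryIndexes", [])
    return any(
        index["IndexName"] == index_name and index["IndexStatus"] == "ACTIVE"
        for index in indexes
    )


update = build_add_gsi_params()
# With boto3 installed and AWS credentials configured, this could be sent as:
#   client = boto3.client("dynamodb")
#   client.update_table(**update)
#   ready = index_is_active(client.describe_table(TableName="Activity"), "StatusIndex")
```

Application code should avoid querying the new index until a check like index_is_active returns True.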
Quick: Do you think GSIs always store all attributes from the main table? Commit to yes or no.
Common Belief: GSIs automatically copy all attributes from the main table.
Reality: You must specify which attributes to project; projecting all increases storage and cost.
Why it matters: Not choosing projections carefully can waste resources and increase costs.
Expert Zone
1
GSIs can cause hot partitions if the alternate key is not well-distributed, leading to throttling.
2
The eventual consistency delay varies and can be influenced by write volume and index size.
3
Sparse indexes arise naturally: an item appears in a GSI only if it has the index's key attributes, so indexing an attribute that only some items carry keeps the index small and cheap.
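A sparse index follows from the rule that an item is written to a GSI only when it has the index's key attributes. The toy filter below illustrates which items would show up; the function name and the OpenOrder attribute are hypothetical.

```python
def items_in_sparse_index(items, index_partition_key, index_sort_key=None):
    """Return the items that would appear in a GSI with the given keys (toy model)."""
    required = [index_partition_key]
    if index_sort_key is not None:
        required.append(index_sort_key)
    # Items missing any index key attribute are simply not indexed.
    return [item for item in items if all(name in item for name in required)]


orders = [
    {"OrderID": "o1", "OpenOrder": "yes"},
    {"OrderID": "o2"},                      # closed order: attribute removed
    {"OrderID": "o3", "OpenOrder": "yes"},
]

open_orders = items_in_sparse_index(orders, "OpenOrder")
```

In this sketch, removing the OpenOrder attribute when an order closes is what drops the item from the index.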
When NOT to use
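Avoid GSIs when you need strongly consistent reads or when write costs must be minimal. Instead, consider Local Secondary Indexes (LSIs), which support strongly consistent reads but must be defined when the table is created and share the table's partition key, or redesign your data model so the primary key serves the query.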
Production Patterns
In production, GSIs are often used to support multiple query patterns like searching by user, status, or date. Teams monitor GSI usage with CloudWatch and adjust capacity or use on-demand mode. Sparse GSIs are used to index only relevant items, reducing costs.
Connections
Database Indexing
GSIs are a type of database index specialized for NoSQL key-value stores.
Understanding traditional database indexes helps grasp how GSIs speed up queries by creating alternate keys.
Eventual Consistency Models
GSIs operate under eventual consistency, a common pattern in distributed systems.
Knowing eventual consistency in distributed computing explains why GSI queries might lag behind writes.
Library Cataloging Systems
GSIs resemble alternate cataloging methods in libraries for quick lookup by different criteria.
Seeing GSIs as cataloging methods clarifies their role in organizing and accessing data efficiently.
Common Pitfalls
#1 Projecting all attributes into a GSI by default, leading to high storage and write costs.
Wrong approach: CreateTable with GSI projection ALL without considering attribute size or necessity.
Correct approach: CreateTable with GSI projection KEYS_ONLY, or INCLUDE with only the attributes your queries need.
Root cause: Assuming that projecting all attributes is always best, ignoring cost and performance tradeoffs.
#2 Querying a GSI without specifying the index name, causing errors or unexpected results.
Wrong approach: Query operation with KeyConditionExpression on GSI keys but no IndexName parameter.
Correct approach: Query operation with KeyConditionExpression and IndexName set to the GSI's name.
Root cause: Not realizing that a Query runs against the main table unless the index is named explicitly.
#3 Assuming GSIs provide strong consistency and relying on them for critical real-time data.
Wrong approach: Using GSI queries for immediate consistency needs without fallback to main table reads.
Correct approach: Use main table queries for strongly consistent reads; use GSIs for eventual consistency scenarios.
Root cause: Confusing DynamoDB's consistency models and GSI update delays.
Key Takeaways
Global Secondary Indexes let you query DynamoDB tables using different keys than the main table, enabling flexible data access.
GSIs maintain separate data structures updated asynchronously, which means queries on GSIs are eventually consistent, not strongly consistent.
Choosing the right projection type for a GSI balances query performance and storage cost, preventing unnecessary expenses.
GSIs consume separate read and write capacity units, so planning capacity and monitoring usage is essential to avoid throttling and high costs.
Understanding GSIs deeply helps design scalable, efficient DynamoDB applications that meet diverse query needs without sacrificing performance.