
Secondary indexes (GSI, LSI) in AWS - Deep Dive

Overview - Secondary indexes (GSI, LSI)
What is it?
Secondary indexes in AWS DynamoDB are ways to create alternative views of your data to support different query patterns. Global Secondary Indexes (GSI) and Local Secondary Indexes (LSI) let you look up items using different keys than the main table. They help you find data quickly without scanning the whole table. Each index has its own rules and uses.
Why it matters
Without secondary indexes, you can only query DynamoDB using the main table's primary key, which limits how you access your data. This would force you to scan the entire table for many queries, making your app slow and costly. Secondary indexes let you build fast, flexible queries that match real user needs, improving performance and saving money.
Where it fits
Before learning about secondary indexes, you should understand DynamoDB tables, primary keys, and basic queries. After mastering indexes, you can explore advanced query optimization, data modeling for NoSQL, and DynamoDB Streams for real-time updates.
Mental Model
Core Idea
Secondary indexes are like alternate phone books that let you find people by different details, not just their main name or number.
Think of it like...
Imagine a phone book that lists people by their last name (main table). A Local Secondary Index is like a special section in the same book that lists people by their city but only for those with the same last name. A Global Secondary Index is like a separate phone book that lists people by their phone number, regardless of their last name.
┌─────────────────────────────┐
│        DynamoDB Table       │
│  Primary Key: UserID        │
│                             │
│  ┌─────────────────┐        │
│  │ Local Secondary │        │
│  │ Index (LSI)     │        │
│  │ Sort Key: Date  │        │
│  └─────────────────┘        │
│                             │
│  ┌──────────────────────┐   │
│  │ Global Secondary     │   │
│  │ Index (GSI)          │   │
│  │ Partition Key: Email │   │
│  └──────────────────────┘   │
└─────────────────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding DynamoDB Primary Keys
🤔
Concept: Learn what primary keys are and how they uniquely identify items in a DynamoDB table.
In DynamoDB, every item in a table must have a primary key. This key can be simple (just a partition key) or composite (partition key + sort key). The partition key decides which storage partition holds the item. The sort key lets you store multiple items with the same partition key but different sort keys, ordered for efficient queries.
Result
You can uniquely identify and retrieve items using the primary key, enabling fast lookups.
Understanding primary keys is essential because secondary indexes build on this concept to offer alternative ways to find data.
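As a minimal sketch of the two key shapes, the plain objects below follow the shape of DynamoDB's CreateTable request parameters (AttributeDefinitions, KeySchema). The table names Users and Orders and the attributes UserID and OrderDate are hypothetical, and nothing here calls AWS; the describeKey helper is a local illustration that only inspects the objects.

```javascript
// Simple primary key: partition key only (hypothetical table and attribute names).
const simpleKeyTable = {
  TableName: "Users",
  AttributeDefinitions: [{ AttributeName: "UserID", AttributeType: "S" }],
  KeySchema: [{ AttributeName: "UserID", KeyType: "HASH" }], // HASH = partition key
  BillingMode: "PAY_PER_REQUEST",
};

// Composite primary key: partition key + sort key.
const compositeKeyTable = {
  TableName: "Orders",
  AttributeDefinitions: [
    { AttributeName: "UserID", AttributeType: "S" },
    { AttributeName: "OrderDate", AttributeType: "S" },
  ],
  KeySchema: [
    { AttributeName: "UserID", KeyType: "HASH" },
    { AttributeName: "OrderDate", KeyType: "RANGE" }, // RANGE = sort key
  ],
  BillingMode: "PAY_PER_REQUEST",
};

// Local helper (not part of any SDK): classify a key schema as simple or composite.
function describeKey(params) {
  const hash = params.KeySchema.find((k) => k.KeyType === "HASH");
  const range = params.KeySchema.find((k) => k.KeyType === "RANGE");
  return range
    ? `composite (${hash.AttributeName} + ${range.AttributeName})`
    : `simple (${hash.AttributeName})`;
}

console.log(describeKey(simpleKeyTable)); // simple (UserID)
console.log(describeKey(compositeKeyTable)); // composite (UserID + OrderDate)
```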
2
Foundation: Basics of Querying in DynamoDB
🤔
Concept: Learn how queries work using primary keys and why scanning is inefficient.
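A Query in DynamoDB requires an exact partition key value and can optionally narrow the results with a condition on the sort key. If you don't know the partition key, you must Scan the whole table, which reads every item (and charges for every item read, even those a filter later discards), so it is slow and costly. Queries are efficient because DynamoDB knows exactly which partition to look in based on the key.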
Result
Queries return matching items fast, while scans are slow and expensive.
Knowing the limits of queries motivates the need for secondary indexes to support more flexible access patterns.
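To make the difference concrete, here is a toy in-memory model (not the DynamoDB API) that counts how many items each access path has to read. The names putItem, query, and scan, and the attributes UserID and OrderDate, are assumptions made for this sketch.

```javascript
// Toy in-memory table: items grouped by partition key (UserID).
const table = new Map();

function putItem(item) {
  const group = table.get(item.UserID) || [];
  group.push(item);
  group.sort((a, b) => a.OrderDate.localeCompare(b.OrderDate)); // keep sort-key order
  table.set(item.UserID, group);
}

// Query: jump straight to one partition key; only that group is read.
function query(userId) {
  const items = table.get(userId) || [];
  return { items, itemsRead: items.length };
}

// Scan: read every item in the table, then filter.
function scan(predicate) {
  let itemsRead = 0;
  const items = [];
  for (const group of table.values()) {
    for (const item of group) {
      itemsRead += 1;
      if (predicate(item)) items.push(item);
    }
  }
  return { items, itemsRead };
}

// 100 users with 10 orders each = 1000 items.
for (let u = 1; u <= 100; u++) {
  for (let d = 1; d <= 10; d++) {
    putItem({ UserID: `user-${u}`, OrderDate: `2024-01-${String(d).padStart(2, "0")}` });
  }
}

const queryResult = query("user-42");
const scanResult = scan((item) => item.UserID === "user-42");
console.log(queryResult.itemsRead, scanResult.itemsRead); // 10 1000
```

Both calls return the same 10 items; the scan simply had to touch all 1000 to find them.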
3
Intermediate: What is a Local Secondary Index (LSI)?
🤔 Before reading on: do you think an LSI can change the partition key of the main table? Commit to your answer.
Concept: LSI lets you create an alternate sort key for the same partition key, enabling different sorting and querying within that partition.
An LSI uses the same partition key as the main table but allows a different sort key, so the table itself must have a composite primary key (partition key + sort key). This means you can query items with the same partition key but order or filter them differently. LSIs must be created when the table is created and share the same storage partition as the main table.
Result
You can run queries that use the partition key and the LSI's sort key to get data sorted or filtered in new ways.
Understanding that LSIs share the partition key but change the sort key helps you design queries that need multiple sorting options within the same group.
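A sketch of what this looks like in a CreateTable request, written as plain parameter objects. The table Orders, the index OrdersByDate, and the attributes UserID, OrderID, and OrderDate are hypothetical; lsiSharesPartitionKey is a local check written for this example, not an SDK call.

```javascript
// CreateTable parameters with one LSI (hypothetical names throughout).
const createOrdersTable = {
  TableName: "Orders",
  AttributeDefinitions: [
    { AttributeName: "UserID", AttributeType: "S" },
    { AttributeName: "OrderID", AttributeType: "S" },
    { AttributeName: "OrderDate", AttributeType: "S" },
  ],
  KeySchema: [
    { AttributeName: "UserID", KeyType: "HASH" },
    { AttributeName: "OrderID", KeyType: "RANGE" },
  ],
  LocalSecondaryIndexes: [
    {
      IndexName: "OrdersByDate",
      KeySchema: [
        { AttributeName: "UserID", KeyType: "HASH" }, // same partition key as the table
        { AttributeName: "OrderDate", KeyType: "RANGE" }, // alternate sort key
      ],
      Projection: { ProjectionType: "ALL" },
    },
  ],
  BillingMode: "PAY_PER_REQUEST",
};

function partitionKeyOf(keySchema) {
  return keySchema.find((k) => k.KeyType === "HASH").AttributeName;
}

// Local check: every LSI must reuse the table's partition key.
function lsiSharesPartitionKey(params) {
  const tablePk = partitionKeyOf(params.KeySchema);
  return (params.LocalSecondaryIndexes || []).every(
    (index) => partitionKeyOf(index.KeySchema) === tablePk
  );
}

// Query parameters that use the LSI: one user's orders within a date range.
const queryByDate = {
  TableName: "Orders",
  IndexName: "OrdersByDate",
  KeyConditionExpression: "UserID = :u AND OrderDate BETWEEN :from AND :to",
  ExpressionAttributeValues: { ":u": "user-42", ":from": "2024-01-01", ":to": "2024-01-31" },
};

console.log(lsiSharesPartitionKey(createOrdersTable)); // true
```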
4
Intermediate: What is a Global Secondary Index (GSI)?
🤔 Before reading on: do you think a GSI can have a completely different partition key than the main table? Commit to your answer.
Concept: GSI lets you create a new partition key and optional sort key, independent of the main table's keys, for flexible queries across all data.
A GSI is a separate index with its own partition key and optional sort key. It can be created anytime and stores copies of selected attributes. Because it has a different partition key, you can query data in ways the main table doesn't support. GSIs have their own throughput and storage.
Result
You can query data using new keys, enabling queries that the main table cannot perform efficiently.
Knowing GSIs provide independent keys and storage shows how to build flexible, scalable query patterns.
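Because a GSI can be added to an existing table, the request goes through UpdateTable rather than CreateTable. The object below sketches that request shape; the table Users, the index UsersByEmail, and the attributes Email and CreatedAt are hypothetical, the table is assumed to use on-demand billing (so no ProvisionedThroughput is given for the index), and undeclaredKeyAttributes is a local helper written for this example.

```javascript
// UpdateTable parameters that add a GSI to an existing table (hypothetical names).
const addEmailIndex = {
  TableName: "Users",
  // Every attribute used in the new index's key schema must be declared here.
  AttributeDefinitions: [
    { AttributeName: "Email", AttributeType: "S" },
    { AttributeName: "CreatedAt", AttributeType: "S" },
  ],
  GlobalSecondaryIndexUpdates: [
    {
      Create: {
        IndexName: "UsersByEmail",
        KeySchema: [
          { AttributeName: "Email", KeyType: "HASH" }, // new partition key
          { AttributeName: "CreatedAt", KeyType: "RANGE" }, // optional sort key
        ],
        Projection: { ProjectionType: "KEYS_ONLY" }, // copy only key attributes
      },
    },
  ],
};

// Local helper: list index key attributes missing from AttributeDefinitions.
function undeclaredKeyAttributes(params) {
  const declared = new Set(params.AttributeDefinitions.map((a) => a.AttributeName));
  const missing = [];
  for (const update of params.GlobalSecondaryIndexUpdates) {
    if (!update.Create) continue;
    for (const key of update.Create.KeySchema) {
      if (!declared.has(key.AttributeName)) missing.push(key.AttributeName);
    }
  }
  return missing;
}

// Query parameters that use the GSI: no main-table key is needed.
const queryByEmail = {
  TableName: "Users",
  IndexName: "UsersByEmail",
  KeyConditionExpression: "Email = :email",
  ExpressionAttributeValues: { ":email": "user@example.com" },
};

console.log(undeclaredKeyAttributes(addEmailIndex).length); // 0
```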
5
Intermediate: Differences Between LSI and GSI
🤔 Before reading on: which index type do you think can be added after table creation, LSI or GSI? Commit to your answer.
Concept: Compare LSI and GSI on partition keys, creation time, storage, and throughput to understand their use cases.
LSIs share the main table's partition key, must be created with the table, and share its throughput and storage. GSIs have their own partition key, can be added anytime, and have separate throughput and storage. A table can have at most 5 LSIs; the default quota for GSIs is 20 per table.
Result
You can choose the right index type based on your query needs and operational constraints.
Understanding these differences helps you design indexes that balance flexibility, cost, and performance.
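The rules in this comparison can be written down as a small pre-flight check. This is a sketch only: checkIndexPlan and the plan object's fields (tableExists, tablePartitionKey, lsis, gsis) are hypothetical names, and the limits are the ones quoted above.

```javascript
const MAX_LSIS = 5; // LSIs per table, as stated above
const DEFAULT_MAX_GSIS = 20; // GSIs per table, as stated above

// Returns a list of problems with a set of indexes you plan to add (empty = OK).
function checkIndexPlan(plan) {
  const { tableExists, tablePartitionKey, lsis = [], gsis = [] } = plan;
  const problems = [];
  if (lsis.length > MAX_LSIS) {
    problems.push(`too many LSIs: ${lsis.length} > ${MAX_LSIS}`);
  }
  if (gsis.length > DEFAULT_MAX_GSIS) {
    problems.push(`too many GSIs: ${gsis.length} > ${DEFAULT_MAX_GSIS}`);
  }
  if (tableExists && lsis.length > 0) {
    problems.push("LSIs cannot be added after table creation");
  }
  for (const lsi of lsis) {
    if (lsi.partitionKey !== tablePartitionKey) {
      problems.push(`LSI ${lsi.name} must use partition key ${tablePartitionKey}`);
    }
  }
  return problems;
}

const goodPlan = checkIndexPlan({
  tableExists: false,
  tablePartitionKey: "UserID",
  lsis: [{ name: "ByDate", partitionKey: "UserID", sortKey: "OrderDate" }],
  gsis: [{ name: "ByEmail", partitionKey: "Email" }],
});

const badPlan = checkIndexPlan({
  tableExists: true,
  tablePartitionKey: "UserID",
  lsis: [{ name: "ByEmail", partitionKey: "Email", sortKey: "OrderDate" }],
});

console.log(goodPlan.length, badPlan.length); // 0 2
```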
6
Advanced: Consistency and Capacity Impacts of Indexes
🤔 Before reading on: do you think queries on GSIs are always strongly consistent? Commit to your answer.
Concept: Learn how indexes affect read consistency and capacity units, impacting performance and cost.
Queries on LSIs can request strongly consistent reads because LSIs share storage with the main table. GSIs only support eventual consistency. Both index types consume additional write capacity because every write to the table must also update the affected indexes. In provisioned mode, each GSI has its own read and write capacity settings, allowing independent scaling, while LSIs draw on the table's capacity.
Result
You can plan for performance and cost by understanding consistency and capacity trade-offs.
Knowing how indexes affect consistency and capacity prevents surprises in app behavior and billing.
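A rough back-of-the-envelope estimate of the extra write cost, as a sketch under stated assumptions: one write capacity unit covers a standard write of up to 1 KB, the item is new, it carries the key attributes of every GSI, and every GSI projects all attributes (so each index entry is about the size of the item). Updates that change an index key value cost more and are not modeled. The function names are hypothetical.

```javascript
const WCU_UNIT_BYTES = 1024; // one write capacity unit covers a write of up to 1 KB

// Write units for a single standard write of the given size (rounded up).
function writeUnits(sizeBytes) {
  return Math.max(1, Math.ceil(sizeBytes / WCU_UNIT_BYTES));
}

// Estimate for putting one NEW item that appears in `gsiCount` GSIs.
// Assumption: each GSI projects ALL attributes, so its entry is item-sized.
function estimateNewItemWrite(itemSizeBytes, gsiCount) {
  const tableUnits = writeUnits(itemSizeBytes);
  const indexUnits = gsiCount * writeUnits(itemSizeBytes);
  return { tableUnits, indexUnits, totalUnits: tableUnits + indexUnits };
}

const estimate = estimateNewItemWrite(1500, 3);
console.log(estimate); // { tableUnits: 2, indexUnits: 6, totalUnits: 8 }
```

In this model a 1.5 KB item with three GSIs costs four times as much to write as the same item with no indexes, which is why each additional GSI deserves a concrete query that justifies it.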
7
Expert: Advanced Indexing Patterns and Pitfalls
🤔 Before reading on: can you use GSIs to query data without the main table's partition key? Commit to your answer.
Concept: Explore complex use cases, limitations, and best practices for using GSIs and LSIs in production.
GSIs allow queries without the main table's partition key, enabling flexible access patterns. However, overusing GSIs increases write costs and complexity. LSIs are limited in number and must be planned upfront. Sparse indexes (indexes that contain only the items that have the index key attribute) can optimize queries but require careful attribute design. Also, because GSIs are eventually consistent, your app logic must tolerate index reads that briefly lag behind writes.
Result
You can design efficient, cost-effective indexes and avoid common mistakes that degrade performance or cause bugs.
Understanding advanced patterns and pitfalls empowers you to build robust, scalable DynamoDB applications.
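The sparse index idea can be illustrated with a toy in-memory index (not the DynamoDB API): an item gets an index entry only if it has the index key attribute. The buildIndex function and the attributes OrderID, Status, and OpenSince are assumptions made for this sketch.

```javascript
// Build a toy index keyed on `keyAttribute`; items without it are skipped.
function buildIndex(items, keyAttribute) {
  const index = new Map();
  for (const item of items) {
    const key = item[keyAttribute];
    if (key === undefined) continue; // no key attribute -> no index entry
    const bucket = index.get(key) || [];
    bucket.push(item);
    index.set(key, bucket);
  }
  return index;
}

// Only open orders carry OpenSince; the attribute is removed when an order ships.
const orders = [
  { OrderID: "o-1", Status: "SHIPPED" },
  { OrderID: "o-2", Status: "OPEN", OpenSince: "2024-03-01" },
  { OrderID: "o-3", Status: "SHIPPED" },
  { OrderID: "o-4", Status: "OPEN", OpenSince: "2024-03-02" },
];

const openOrders = buildIndex(orders, "OpenSince");
const indexedCount = [...openOrders.values()].reduce((n, bucket) => n + bucket.length, 0);
console.log(indexedCount); // 2
```

The index holds 2 of the 4 items, so reading "all open orders" touches only the items that matter.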
Under the Hood
DynamoDB stores data in partitions based on the partition key. LSIs share the same partition and storage as the main table but maintain a different sort key to order items. GSIs are separate tables internally with their own partitions and storage, replicating selected attributes asynchronously. Writes to the main table trigger updates to indexes, which consume additional capacity and storage.
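The asynchronous replication described above can be modeled with a toy class in which index updates sit in a queue until they are applied. This is an illustration of the behavior only, not how DynamoDB is implemented; ToyTable and its methods are hypothetical, and for simplicity the toy index keeps one item per key value, which real GSIs do not require.

```javascript
class ToyTable {
  constructor(indexKey) {
    this.indexKey = indexKey;
    this.items = new Map(); // base table: UserID -> item
    this.gsi = new Map(); // index: indexKey value -> item
    this.pending = []; // index updates not yet applied
  }

  // The write lands in the base table immediately; the index update is queued.
  put(item) {
    const previous = this.items.get(item.UserID);
    this.items.set(item.UserID, item);
    this.pending.push({ previous, item });
  }

  // Apply queued index updates (stands in for asynchronous replication).
  replicate() {
    for (const { previous, item } of this.pending) {
      if (previous) this.gsi.delete(previous[this.indexKey]);
      this.gsi.set(item[this.indexKey], item);
    }
    this.pending = [];
  }

  getItem(userId) {
    return this.items.get(userId);
  }

  queryIndex(value) {
    return this.gsi.get(value);
  }
}

const users = new ToyTable("Email");
users.put({ UserID: "u-1", Email: "old@example.com" });
users.replicate();

users.put({ UserID: "u-1", Email: "new@example.com" });
const beforeReplication = users.queryIndex("new@example.com"); // undefined: index lags
const staleHit = users.queryIndex("old@example.com"); // old entry still visible
users.replicate();
const afterReplication = users.queryIndex("new@example.com"); // now found

console.log(beforeReplication === undefined, afterReplication.UserID); // true u-1
```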
Why designed this way?
LSIs were designed to allow alternate sorting within the same partition for efficient queries, keeping index entries alongside the base items (they still store their own copy of the key and projected attributes). GSIs were introduced later to provide more flexible querying across partitions with different keys, at the cost of eventual consistency and extra resources. This separation balances performance, cost, and query flexibility.
┌─────────────────┐       ┌──────────────────┐
│ Main Table      │       │ Global Secondary │
│ Partition Key   │──────▶│ Index (GSI)      │
│ Sort Key        │       │ Partition Key    │
│                 │       │ Sort Key         │
│                 │       │                  │
│  ┌───────────┐  │       └──────────────────┘
│  │ Local     │  │
│  │ Secondary │  │
│  │ Index     │  │
│  │ (LSI)     │  │
│  └───────────┘  │
└─────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Can you add a Local Secondary Index after creating a DynamoDB table? Commit to yes or no.
Common Belief: You can add or remove LSIs anytime after table creation just like GSIs.
Reality: LSIs must be defined when the table is created and cannot be added or removed later.
Why it matters: Trying to add an LSI later leads to deployment failures and forces costly table rebuilds.
Quick: Do queries on GSIs always return the latest data immediately? Commit to yes or no.
Common Belief: Queries on GSIs are strongly consistent and always show the latest data.
Reality: GSIs only support eventual consistency, so queries might return slightly stale data shortly after writes.
Why it matters: Assuming strong consistency can cause bugs where your app shows outdated information.
Quick: Does using many GSIs always improve query performance without downsides? Commit to yes or no.
Common Belief: Adding more GSIs always makes queries faster and better.
Reality: Each GSI adds write overhead, storage cost, and complexity, which can degrade overall performance and increase bills.
Why it matters: Over-indexing can cause high costs and slow writes, hurting your app's scalability.
Quick: Can you query a DynamoDB table without specifying the partition key if you have a GSI? Commit to yes or no.
Common Belief: You must always specify the main table's partition key to query any data.
Reality: GSIs let you query data using their own partition key, independent of the main table's key.
Why it matters: Not knowing this limits your ability to design flexible queries and data models.
Expert Zone
1
LSIs share storage and throughput with the main table, so heavy LSI usage can impact main table performance unexpectedly.
2
GSIs replicate data asynchronously, so write-heavy workloads can cause index lag and eventual consistency delays.
3
Sparse indexes arise when only some items carry the index key attribute: items without it are left out of the index, which improves query efficiency but requires careful data design.
When NOT to use
Avoid LSIs if you need to add indexes after table creation or require different partition keys. Avoid GSIs if you need strong consistency or want to minimize write costs. Instead, consider denormalizing data or using other AWS services like Amazon OpenSearch Service (formerly Amazon Elasticsearch Service) for complex queries.
Production Patterns
In production, GSIs are often used to support alternative query patterns like searching by email or status. LSIs are used for sorting data within a user or group, like ordering orders by date. Sparse GSIs help filter data subsets efficiently. Monitoring index usage and capacity is critical to avoid throttling and high costs.
Connections
Database Indexing
Secondary indexes in DynamoDB are a cloud-native form of database indexing.
Understanding traditional database indexes helps grasp how DynamoDB indexes optimize query speed by creating alternate data access paths.
Eventual Consistency
GSIs operate with eventual consistency, a concept in distributed systems.
Knowing eventual consistency explains why GSI queries might return stale data and how to design apps to handle it gracefully.
Library Cataloging Systems
Like secondary indexes, library catalogs use multiple indexes (author, title, subject) to find books efficiently.
Seeing how libraries organize books by different keys helps understand why multiple indexes improve data retrieval flexibility.
Common Pitfalls
#1 Trying to add an LSI after table creation.
Wrong approach: aws dynamodb update-table --table-name MyTable --attribute-definitions AttributeName=UserID,AttributeType=S --local-secondary-indexes '[{"IndexName":"NewLSI","KeySchema":[{"AttributeName":"UserID","KeyType":"HASH"},{"AttributeName":"NewSortKey","KeyType":"RANGE"}],"Projection":{"ProjectionType":"ALL"}}]'
Correct approach: Define LSIs only during table creation, using the aws dynamodb create-table command with its --local-secondary-indexes option.
Root cause: Misunderstanding that LSIs are fixed at table creation and cannot be modified later.
#2 Assuming GSI queries are strongly consistent and coding accordingly.
Wrong approach: const params = { TableName: 'MyTable', IndexName: 'MyGSI', KeyConditionExpression: 'Email = :email', ExpressionAttributeValues: { ':email': 'user@example.com' }, ConsistentRead: true };
Correct approach: Remove ConsistentRead or set it to false because GSIs do not support strong consistency.
Root cause: Not knowing GSIs only support eventual consistency, leading to incorrect use of ConsistentRead.
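Following the correct approach above, the corrected parameters might look like the sketch below. MyTable, MyGSI, and Email come from the wrong-approach snippet; assertGsiQueryIsValid is a hypothetical local guard you could run before sending a query, not an SDK function.

```javascript
// Corrected GSI query parameters: ConsistentRead is simply left out,
// so the read uses the default, eventually consistent mode.
const gsiQuery = {
  TableName: "MyTable",
  IndexName: "MyGSI",
  KeyConditionExpression: "Email = :email",
  ExpressionAttributeValues: { ":email": "user@example.com" },
};

// Hypothetical local guard: fail fast if a GSI query asks for a strongly
// consistent read, instead of finding out from a rejected request.
function assertGsiQueryIsValid(params) {
  if (params.ConsistentRead === true) {
    throw new Error(`ConsistentRead is not supported on GSI ${params.IndexName}`);
  }
  return params;
}

assertGsiQueryIsValid(gsiQuery); // passes

let rejected = false;
try {
  assertGsiQueryIsValid({ ...gsiQuery, ConsistentRead: true });
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
```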
#3 Creating too many GSIs without considering write capacity impact.
Wrong approach: Adding 10+ GSIs to a table without adjusting write capacity or monitoring costs.
Correct approach: Limit GSIs to necessary queries, monitor usage, and adjust capacity to balance cost and performance.
Root cause: Ignoring that each GSI consumes additional write capacity and storage, increasing costs and potential throttling.
Key Takeaways
Secondary indexes let you query DynamoDB tables in flexible ways beyond the main primary key.
Local Secondary Indexes share the partition key but allow different sorting within that partition, and must be created with the table.
Global Secondary Indexes have their own partition and sort keys, can be added anytime, but only support eventual consistency.
Indexes consume extra storage and capacity, so design them carefully to balance query flexibility, cost, and performance.
Understanding consistency, capacity, and index limits is crucial to building scalable and reliable DynamoDB applications.