What is a sparse index in DynamoDB?


Sparse Index in DynamoDB: What It Is and How It Works

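A sparse index in DynamoDB is a secondary index that contains only some of the table's items: DynamoDB writes an index entry for an item only if that item has the index's key attributes. Items missing those attributes are left out, so the index can be much smaller than the table and cheaper to store, update, and read. Sparse is not a separate index type. Any secondary index behaves this way, and it is called sparse when its key attribute is present on only a subset of items.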


How It Works

Imagine you have a big filing cabinet with many folders, but you only want to quickly find folders that have a special sticker on them. A sparse index in DynamoDB works like a smaller cabinet that only holds folders with that sticker. This means you don't have to search through everything, just the important ones.

Technically, a sparse index contains entries only for items that have every attribute in the index's key schema: the partition key, plus the sort key if the index defines one. If an item lacks any of those attributes, DynamoDB does not write it to the index. This keeps the index small, and queries or scans against the index read only the items you care about.


Example

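This example creates a sparse global secondary index (GSI) named ActiveUsersIndex whose partition key is the Status attribute. Only items that have a Status attribute are written to the index. The index does not filter on the attribute's value, so it holds only active users because Status is set on active users (Alice) and omitted on everyone else (Bob).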

bash

aws dynamodb create-table \
    --table-name Users \
    --attribute-definitions \
        AttributeName=UserId,AttributeType=S \
        AttributeName=Status,AttributeType=S \
    --key-schema AttributeName=UserId,KeyType=HASH \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
    --global-secondary-indexes '[
        {
            "IndexName": "ActiveUsersIndex",
            "KeySchema": [
                {"AttributeName":"Status","KeyType":"HASH"}
            ],
            "Projection": {"ProjectionType":"ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits":5,"WriteCapacityUnits":5}
        }
    ]'

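# Wait until the table is ACTIVE before writing to it
aws dynamodb wait table-exists --table-name Users

# Insert items: Alice has a Status attribute, Bob does not
aws dynamodb put-item --table-name Users --item '{"UserId": {"S": "1"}, "Name": {"S": "Alice"}, "Status": {"S": "active"}}'
aws dynamodb put-item --table-name Users --item '{"UserId": {"S": "2"}, "Name": {"S": "Bob"}}'

# Query the sparse index for active users.
# Status is a DynamoDB reserved word, so it is referenced through the #s placeholder.
# GSI reads are eventually consistent, so a just-written item may take a moment to appear.
aws dynamodb query \
    --table-name Users \
    --index-name ActiveUsersIndex \
    --key-condition-expression "#s = :status" \
    --expression-attribute-names '{"#s":"Status"}' \
    --expression-attribute-values '{":status":{"S":"active"}}'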

Output

{ "Items": [ { "UserId": {"S": "1"}, "Name": {"S": "Alice"}, "Status": {"S": "active"} } ], "Count": 1, "ScannedCount": 1 }
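Because index membership depends on whether Status exists rather than on its value, a user stored with Status set to "inactive" would still land in ActiveUsersIndex. The sketch below assumes the Users table and ActiveUsersIndex from the example above; the extra user Carol and the function names are hypothetical, chosen only to illustrate the rule.

```bash
# Sketch: a sparse index keys on attribute *presence*, not attribute value.
# Assumes the Users table and ActiveUsersIndex created above.
# add_inactive_user, count_index_entries, and the user "Carol" are hypothetical.

add_inactive_user() {
    # Status is present (value "inactive"), so DynamoDB DOES write this item to the GSI.
    aws dynamodb put-item \
        --table-name Users \
        --item '{"UserId": {"S": "3"}, "Name": {"S": "Carol"}, "Status": {"S": "inactive"}}'
}

count_index_entries() {
    # Scan the GSI itself and return only the number of entries it holds.
    # Items with no Status attribute at all (like Bob) never appear here.
    aws dynamodb scan \
        --table-name Users \
        --index-name ActiveUsersIndex \
        --select COUNT
}
```

Running `add_inactive_user` and then `count_index_entries` should count Alice and Carol but not Bob once the eventually consistent GSI has caught up. To keep the index limited to active users, leave the attribute off inactive users entirely instead of giving it a different value.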


When to Use

Use a sparse index when you want to efficiently query only a subset of items that share a common attribute. For example, if you have a table of users but only want to quickly find those who are currently active, set a Status attribute on active users, leave it off (or remove it from) everyone else, and build the index on Status.

This approach saves storage and write capacity, because items without the attribute are never written to the index. It also makes reads cheaper, because a query or scan of the index touches only the indexed subset instead of the whole table.
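One way to keep ActiveUsersIndex limited to active users is to set Status when a user becomes active and delete the attribute when they become inactive, rather than overwriting it with another value. This is a minimal sketch assuming the Users table from the example; activate_user and deactivate_user are hypothetical helper names, and user IDs are assumed to be plain strings without quote characters.

```bash
# Sketch: maintain a sparse GSI by adding/removing its key attribute.
# activate_user / deactivate_user are hypothetical helpers; IDs must not contain quotes.

activate_user() {
    local user_id="$1"
    # Adding Status makes DynamoDB write the item into ActiveUsersIndex.
    # "#s" stands in for Status, which is a DynamoDB reserved word.
    # The condition stops update-item from creating a brand-new item for an unknown ID.
    aws dynamodb update-item \
        --table-name Users \
        --key "{\"UserId\": {\"S\": \"${user_id}\"}}" \
        --update-expression "SET #s = :active" \
        --condition-expression "attribute_exists(UserId)" \
        --expression-attribute-names '{"#s": "Status"}' \
        --expression-attribute-values '{":active": {"S": "active"}}'
}

deactivate_user() {
    local user_id="$1"
    # Removing the attribute (instead of setting it to "inactive") makes
    # DynamoDB delete the item's entry from ActiveUsersIndex.
    aws dynamodb update-item \
        --table-name Users \
        --key "{\"UserId\": {\"S\": \"${user_id}\"}}" \
        --update-expression "REMOVE #s" \
        --expression-attribute-names '{"#s": "Status"}'
}
```

For example, `deactivate_user 1` would take Alice out of the index and `activate_user 2` would add Bob. Without the condition expression, update-item would create a new item when the key does not exist yet.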


Key Points

A sparse index only includes items with the indexed attribute present.
It reduces index size and improves query performance for selective data.
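Useful for isolating a subset of items, such as active users, by writing the key attribute only on the items in that subset.
Works with both global secondary indexes (GSIs) and local secondary indexes (LSIs); for an LSI, membership depends on whether the item has the index's sort key attribute.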
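Local secondary indexes follow the same presence rule. In the hypothetical Orders table sketched below, every item carries the table's partition key CustomerId, so an order appears in the LSI only if it also has a ShippedAt attribute; orders that have not shipped stay out of the index. The table, attribute, index, and function names are illustrative assumptions, not part of the original example.

```bash
# Sketch of a sparse LSI. Orders, CustomerId, OrderId, ShippedAt,
# ShippedOrdersIndex, and create_orders_table are all hypothetical names.

create_orders_table() {
    # The LSI reuses the table's partition key (always present) and uses
    # ShippedAt as its sort key, so only items with ShippedAt are indexed.
    aws dynamodb create-table \
        --table-name Orders \
        --attribute-definitions \
            AttributeName=CustomerId,AttributeType=S \
            AttributeName=OrderId,AttributeType=S \
            AttributeName=ShippedAt,AttributeType=S \
        --key-schema \
            AttributeName=CustomerId,KeyType=HASH \
            AttributeName=OrderId,KeyType=RANGE \
        --billing-mode PAY_PER_REQUEST \
        --local-secondary-indexes '[
            {
                "IndexName": "ShippedOrdersIndex",
                "KeySchema": [
                    {"AttributeName": "CustomerId", "KeyType": "HASH"},
                    {"AttributeName": "ShippedAt", "KeyType": "RANGE"}
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"}
            }
        ]'
}
```

Unlike GSIs, LSIs must be defined when the table is created, which is why the index appears in the create-table call here.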


Key Takeaways

A sparse index in DynamoDB indexes only items with a specific attribute, making queries faster and indexes smaller.

It is ideal for querying selective subsets of data, such as active users or items with a certain status.

Sparse indexes reduce storage costs and improve read efficiency by excluding items without the indexed attribute.

They need no special configuration: a global or local secondary index becomes sparse whenever its key attribute is present on only some of the table's items.