
TTL with Streams for archival in DynamoDB - Deep Dive

Overview - TTL with Streams for archival
What is it?
TTL (Time to Live) is a feature in DynamoDB that automatically deletes items after a specified time. Streams capture changes in the table, including deletions caused by TTL. Combining TTL with Streams allows you to archive deleted data before it disappears permanently. This helps keep your database clean while preserving important historical records.
Why it matters
Without TTL, expired data would accumulate, making the database large and slow. Without Streams, once data is deleted by TTL, it is lost forever. Using TTL with Streams solves this by automatically cleaning old data and letting you archive it elsewhere. This keeps your system efficient and your data safe for future analysis or compliance.
Where it fits
Before learning this, you should understand basic DynamoDB tables and how TTL works. After this, you can explore advanced data lifecycle management, event-driven architectures using Lambda, and data warehousing for archived data.
Mental Model
Core Idea
TTL automatically removes expired data, and Streams capture those removals so you can archive the data before it disappears.
Think of it like...
It's like setting a timer on food in your fridge (TTL) that throws it away when expired, while a camera (Streams) records what was thrown out so you can keep a photo album of past meals.
┌─────────────┐       ┌───────────────┐       ┌───────────────┐
│ DynamoDB    │       │ TTL Feature   │       │ Streams       │
│ Table       │──────▶│ Deletes Items │──────▶│ Captures      │
│ (Data)      │       │ After Expiry  │       │ Deletion Event│
└─────────────┘       └───────────────┘       └───────┬───────┘
                                                      │
                                                      ▼
                                          ┌─────────────────────┐
                                          │ Archival System     │
                                          │ (e.g., S3, Redshift)│
                                          └─────────────────────┘
Build-Up - 6 Steps
1
Foundation: Understanding DynamoDB TTL Basics
Concept: Learn what TTL is and how it automatically deletes expired items.
TTL is a timestamp attribute you add to your DynamoDB items. When the current time passes this timestamp, DynamoDB marks the item for deletion. This process runs in the background and helps keep your table size manageable without manual cleanup.
Result
Expired items are automatically removed from the table after their TTL timestamp passes.
Understanding TTL helps you automate data cleanup, reducing manual work and storage costs.
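As a sketch of this step, the following boto3 snippet enables TTL on a hypothetical "sessions" table and writes an item that expires in seven days. The table and attribute names are assumptions; what matters is that TTL values must be Unix epoch timestamps in seconds, stored as a Number attribute.

```python
import time


def ttl_epoch(days_from_now, now=None):
    # TTL values are Unix epoch timestamps in seconds, stored as a Number.
    base = time.time() if now is None else now
    return int(base + days_from_now * 86400)


def enable_ttl_and_write(table_name="sessions", ttl_attr="expires_at"):
    # "sessions" and "expires_at" are hypothetical names; calling this
    # requires AWS credentials and an existing table.
    import boto3

    client = boto3.client("dynamodb")
    client.update_time_to_live(
        TableName=table_name,
        TimeToLiveSpecification={"Enabled": True, "AttributeName": ttl_attr},
    )
    client.put_item(
        TableName=table_name,
        Item={
            "session_id": {"S": "abc-123"},
            ttl_attr: {"N": str(ttl_epoch(7))},  # expires ~7 days from now
        },
    )
```

Note that the timestamp is serialized as a string inside an `{"N": ...}` wrapper, as the low-level DynamoDB API requires for Number attributes.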
2
Foundation: Introduction to DynamoDB Streams
Concept: Streams capture changes in your DynamoDB table as a sequence of events.
When items are added, updated, or deleted, Streams record these changes in order. You can enable Streams on your table and configure what information is captured, such as old and new images of the item.
Result
You get a real-time feed of all changes happening in your table.
Streams let you react to data changes, enabling workflows like archiving or triggering other processes.
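A minimal sketch of enabling a stream (the table name is an assumption). Only the OLD_IMAGE and NEW_AND_OLD_IMAGES view types include the pre-deletion item data in stream records, which is what archival needs.

```python
def captures_old_image(view_type):
    # Only these two view types include the item as it looked before deletion.
    return view_type in ("OLD_IMAGE", "NEW_AND_OLD_IMAGES")


def enable_stream(table_name="sessions"):
    # Hypothetical table name; calling this requires AWS credentials.
    import boto3

    boto3.client("dynamodb").update_table(
        TableName=table_name,
        StreamSpecification={
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    )
```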
3
Intermediate: How TTL Deletions Appear in Streams
🤔 Before reading on: Do you think TTL deletions generate stream events like manual deletes? Commit to your answer.
Concept: TTL deletions do generate stream events, recorded as REMOVE records; the deleted item's data is included only if the stream captures old images.
When TTL deletes an item, DynamoDB Streams writes a REMOVE event. The record's userIdentity field identifies it as a service deletion (principalId "dynamodb.amazonaws.com"), which lets you tell TTL deletes apart from manual ones. The old image (the deleted item's data) is included only if the stream view type is OLD_IMAGE or NEW_AND_OLD_IMAGES. This is crucial for archiving because you need the data before it disappears.
Result
You receive REMOVE events in Streams for TTL removals, which can be used to archive the deleted data.
Knowing TTL deletions appear in Streams allows you to capture and archive data that would otherwise be lost.
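Since TTL removals carry a service identity in the stream record, a small predicate can separate them from manual deletes. This sketch assumes the standard stream record shape:

```python
def is_ttl_delete(record):
    # TTL removals are REMOVE events attributed to the DynamoDB service
    # itself via the record's userIdentity field.
    if record.get("eventName") != "REMOVE":
        return False
    identity = record.get("userIdentity") or {}
    return (
        identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )
```

Manual deletes produce REMOVE events without this service identity, so the predicate returns False for them.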
4
Intermediate: Setting Up Archival with Lambda and Streams
🤔 Before reading on: Will a Lambda triggered by Streams see the full deleted item data by default? Commit to your answer.
Concept: Use a Lambda function triggered by Streams to process TTL delete events and archive the data to another storage.
Enable Streams with the OLD_IMAGE (or NEW_AND_OLD_IMAGES) view type to get the full deleted item. Configure a Lambda function to trigger on the stream and filter for REMOVE events. The Lambda reads the old image and writes it to an archival system like S3 or Redshift. This preserves data before it's permanently gone.
Result
Deleted items are archived automatically as soon as TTL removes them.
Automating archival with Lambda and Streams ensures no data is lost and reduces manual intervention.
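A minimal Lambda handler along these lines might look as follows. The bucket name is a hypothetical placeholder, and error handling and batched writes are omitted for brevity.

```python
import json

ARCHIVE_BUCKET = "my-archive-bucket"  # hypothetical bucket name


def archive_key(record):
    # The shard sequence number is unique per record, giving a stable,
    # deterministic object key for each deletion event.
    return f"archive/{record['dynamodb']['SequenceNumber']}.json"


def handler(event, context):
    # Sketch of a Streams-triggered Lambda; requires AWS credentials
    # and an S3 bucket when actually invoked.
    import boto3

    s3 = boto3.client("s3")
    for record in event["Records"]:
        if record["eventName"] != "REMOVE":
            continue
        old_image = record["dynamodb"].get("OldImage")
        if old_image is None:
            continue  # stream is not configured to capture old images
        s3.put_object(
            Bucket=ARCHIVE_BUCKET,
            Key=archive_key(record),
            Body=json.dumps(old_image).encode("utf-8"),
        )
```

In practice you would also filter with `is_ttl_delete`-style logic if you only want TTL-expired items, not every deletion.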
5
Advanced: Handling Stream Event Ordering and Duplication
🤔 Before reading on: Do you think DynamoDB Streams guarantees exactly-once delivery and strict ordering? Commit to your answer.
Concept: Streams guarantee ordering per shard but can deliver events more than once, so your archival must handle duplicates and ordering carefully.
DynamoDB Streams deliver events in order within each shard but may retry events, causing duplicates. Your Lambda should be idempotent—able to process the same event multiple times without side effects. Use unique keys or checksums in your archival to avoid duplicate records.
Result
Your archival system remains consistent and accurate despite retries or duplicates.
Understanding Streams' delivery model prevents data corruption and ensures reliable archival.
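One way to make processing idempotent is to key archived records by the stream sequence number, so a redelivered event maps to the same entry and is skipped rather than duplicated. A pure sketch, with a dict standing in for the real archive store:

```python
class IdempotentArchiver:
    def __init__(self, store=None):
        # `store` stands in for the archive (e.g., an S3 prefix);
        # a plain dict keeps this sketch self-contained and testable.
        self.store = {} if store is None else store

    def process(self, record):
        # The sequence number uniquely identifies the stream record, so
        # a duplicate delivery hits the same key and becomes a no-op.
        seq = record["dynamodb"]["SequenceNumber"]
        if seq in self.store:
            return False  # already archived; safe to skip on retry
        self.store[seq] = record["dynamodb"].get("OldImage")
        return True
```

With S3 as the store, simply writing to a deterministic key achieves the same effect: a retried put overwrites the identical object instead of creating a second copy.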
6
Expert: Optimizing Cost and Performance in TTL Archival
🤔 Before reading on: Is it cheaper to archive every deleted item immediately or batch archive later? Commit to your answer.
Concept: Balancing immediate archival with batching can optimize costs and performance in large-scale systems.
Archiving every TTL deletion immediately via per-record Lambda invocations can increase costs and throttling risk. Instead, buffer events in a queue or temporary store, then batch-archive during off-peak times. Tune the Lambda event source mapping's batch size and batching window, or fan events out through Kinesis or SQS for buffering. This reduces Lambda invocations and storage API calls.
Result
You achieve cost-effective, scalable archival without losing data or performance.
Knowing how to balance immediacy and batching helps build efficient, production-ready archival pipelines.
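As a sketch, the Lambda event source mapping's batch size and batching window control how many stream records each invocation receives; the function name and stream ARN below are placeholders.

```python
import math


def invocation_count(n_records, batch_size):
    # Larger batches mean fewer Lambda invocations for the same volume.
    return math.ceil(n_records / batch_size)


def configure_batched_mapping(function_name, stream_arn):
    # Hypothetical names/ARN; calling this requires AWS credentials.
    import boto3

    boto3.client("lambda").create_event_source_mapping(
        EventSourceArn=stream_arn,
        FunctionName=function_name,
        StartingPosition="TRIM_HORIZON",
        BatchSize=1000,                      # records per invocation
        MaximumBatchingWindowInSeconds=60,   # wait up to 60s to fill a batch
    )
```

At 10,000 expirations per hour, a batch size of 1,000 cuts roughly 10,000 invocations down to about 10, at the cost of up to a minute of added latency per batch.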
Under the Hood
TTL works by DynamoDB scanning items with expired timestamps and marking them for deletion asynchronously. These deletions trigger stream events that record the change. Streams partition data into shards, maintaining order per shard and storing events for 24 hours. Lambda functions poll these shards to process events. The system ensures eventual consistency and retries on failure, requiring idempotent processing.
Why designed this way?
TTL was designed to automate data lifecycle management without burdening users with manual cleanup. Streams provide a near real-time change log to enable reactive architectures. The asynchronous, partitioned design balances scalability, performance, and cost. Alternatives like synchronous deletes or polling would not scale well or would increase latency and cost.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ DynamoDB Item │       │ TTL Expiry    │       │ Item Marked   │
│ with TTL      │──────▶│ Detected by   │──────▶│ for Deletion  │
└───────────────┘       │ Background    │       └───────────────┘
                        └───────────────┘               │
                                                        ▼
                                              ┌─────────────────┐
                                              │ DynamoDB Stream │
                                              │ DELETE Event    │
                                              └─────────────────┘
                                                        │
                                                        ▼
                                              ┌─────────────────┐
                                              │ Lambda Function │
                                              │ Processes Event │
                                              └─────────────────┘
                                                        │
                                                        ▼
                                              ┌─────────────────┐
                                              │ Archival Store  │
                                              │ (S3, Redshift)  │
                                              └─────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does TTL deletion immediately remove the item from the table? Commit yes or no.
Common Belief: TTL deletes items instantly at the expiration time.
Reality: TTL deletions happen asynchronously and can take up to 48 hours to remove expired items.
Why it matters: Assuming immediate deletion can cause stale data to appear in queries, leading to incorrect application behavior.
Quick: Do TTL deletions generate stream events by default? Commit yes or no.
Common Belief: TTL deletions do not generate stream events because they are automatic.
Reality: TTL deletions do generate REMOVE events in Streams if Streams are enabled.
Why it matters: Missing this means you might not archive deleted data, losing important historical records.
Quick: Are DynamoDB Streams guaranteed to deliver each event exactly once? Commit yes or no.
Common Belief: Streams deliver each event exactly once and in perfect order.
Reality: Streams guarantee order per shard but can deliver events more than once (at-least-once delivery).
Why it matters: Not handling duplicates can cause archival systems to store repeated data, wasting space and causing confusion.
Quick: Can you rely on Streams to keep deleted item data forever? Commit yes or no.
Common Belief: Stream records are stored indefinitely, so you can archive anytime.
Reality: Stream records are kept only for 24 hours, so archival must happen promptly.
Why it matters: Delaying archival beyond 24 hours risks losing the deleted data forever.
Expert Zone
1
TTL deletions appear in Streams only when Streams are enabled, and the deleted item's data is captured only with the OLD_IMAGE or NEW_AND_OLD_IMAGES view types, so archival requires careful setup.
2
Lambda functions processing Streams must be idempotent and handle retries gracefully to avoid duplicate archival entries.
3
Partition keys affect stream shard distribution, influencing event ordering and parallelism in archival processing.
When NOT to use
Avoid using TTL with Streams for archival if your data requires immediate deletion without any retention. Instead, use manual deletion with backup snapshots. Also, for extremely high-throughput tables, consider dedicated change data capture tools or AWS Database Migration Service for archival.
Production Patterns
In production, teams use TTL with Streams combined with Lambda to move expired data to S3 in Parquet format for cost-effective long-term storage. They implement deduplication logic in Lambda and batch writes to reduce costs. Monitoring and alerting on stream lag and Lambda errors ensure reliable archival.
Connections
Event-driven Architecture
Builds-on
Understanding TTL with Streams helps grasp how event-driven systems react to data changes automatically, enabling scalable and decoupled workflows.
Data Lifecycle Management
Same pattern
TTL with Streams is a practical example of automating data lifecycle stages: creation, expiration, deletion, and archival.
Supply Chain Management
Analogous process
Just like expired products are removed from shelves but recorded for inventory and compliance, TTL with Streams removes data but archives it for future use.
Common Pitfalls
#1 Not enabling Streams, or configuring them without 'Old Image', when using TTL for archival.
Wrong approach: DynamoDB table with TTL enabled but Streams disabled or set to KEYS_ONLY.
Correct approach: Enable DynamoDB Streams with 'OLD_IMAGE' to capture full deleted item data for archival.
Root cause: Misunderstanding that TTL deletions automatically appear in Streams with full data.
#2 Assuming DynamoDB Streams deliver events exactly once and ignoring duplicate processing.
Wrong approach: Lambda function archives data without checking for duplicates or idempotency.
Correct approach: Implement idempotent Lambda logic using unique keys or checksums to avoid duplicate archival entries.
Root cause: Not knowing Streams can deliver the same event multiple times.
#3 Delaying archival processing beyond the Streams retention period.
Wrong approach: Archival system polls Streams after 48 hours expecting to find delete events.
Correct approach: Process stream events promptly, within the 24-hour window, to ensure no data loss.
Root cause: Unawareness of Streams' 24-hour retention limit.
Key Takeaways
TTL in DynamoDB automatically deletes expired items to keep tables clean and efficient.
DynamoDB Streams capture TTL deletions as events, enabling you to archive data before it is lost.
Enabling Streams with 'Old Image' is essential to get full deleted item data for archival.
Streams deliver events at least once and in order per shard, so your archival must handle duplicates and ordering.
Prompt processing of Streams events is critical because records expire after 24 hours.