
Scan pagination in DynamoDB - Deep Dive

Overview - Scan pagination
What is it?
Scan pagination is a way to read large sets of data from a DynamoDB table in smaller parts called pages. Instead of getting all data at once, which can be slow or use too much memory, you get a limited number of items per request. Each page includes a marker to get the next page, so you can continue reading until all data is retrieved.
Why it matters
Without scan pagination, trying to read a big table all at once can cause slow responses, timeouts, or high costs. Pagination helps keep your app fast and efficient by breaking data into manageable chunks. It also helps avoid hitting limits set by DynamoDB on how much data you can read in one go.
Where it fits
Before learning scan pagination, you should understand basic DynamoDB concepts like tables, items, and the Scan operation. After mastering pagination, you can learn about Query pagination, filtering data efficiently, and optimizing read capacity for better performance.
Mental Model
Core Idea
Scan pagination breaks a large data scan into smaller pages, each with a pointer to the next, so you can read all data step-by-step without overload.
Think of it like...
Imagine reading a long book by chapters instead of all pages at once. After finishing one chapter, you know where to start the next. Scan pagination works the same way by reading data page by page.
┌───────────────┐
│ Start Scan    │
└──────┬────────┘
       │
       ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Page 1 Items  │ -> │ Page 2 Items  │ -> │ Page 3 Items  │ -> ...
└──────┬────────┘    └──────┬────────┘    └──────┬────────┘
       │                   │                   │
       ▼                   ▼                   ▼
  LastEvaluatedKey    LastEvaluatedKey    LastEvaluatedKey
       │                   │                   │
       └───────────────────┴───────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding DynamoDB Scan Basics
Concept: Learn what a Scan operation does in DynamoDB and its basic behavior.
A Scan reads every item in a DynamoDB table or index. It returns all data but can be slow and costly for big tables because it reads everything. By default, a Scan returns up to 1 MB of data per request, which might not include all items if the table is large.
Result
You get a partial list of items up to 1 MB in size from the table.
Knowing that Scan reads the whole table but limits data size per request sets the stage for why pagination is needed.
2
Foundation: What is Pagination in DynamoDB Scan?
Concept: Pagination means splitting Scan results into pages to handle large data sets efficiently.
Because Scan returns limited data per request, DynamoDB provides a LastEvaluatedKey in the response if more data exists. This key is a marker to start the next Scan request from where the last one ended. Using this, you can get all data page by page.
Result
You can continue scanning by sending the LastEvaluatedKey to get the next page of results.
Understanding the LastEvaluatedKey as a pointer is crucial to reading large tables without overload.
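A single Scan response can be sketched as plain Python dictionaries shaped like what boto3's low-level client returns (the table data here is invented for illustration):

```python
# A simulated Scan response, shaped like the low-level boto3 client's output.
# When more data remains, DynamoDB includes LastEvaluatedKey; on the final
# page that field is simply absent. (All item values here are invented.)
page_response = {
    "Items": [
        {"pk": {"S": "user#1"}, "name": {"S": "Alice"}},
        {"pk": {"S": "user#2"}, "name": {"S": "Bob"}},
    ],
    "Count": 2,
    "ScannedCount": 2,
    "LastEvaluatedKey": {"pk": {"S": "user#2"}},  # marker for the next page
}

final_response = {
    "Items": [{"pk": {"S": "user#3"}, "name": {"S": "Carol"}}],
    "Count": 1,
    "ScannedCount": 1,
    # no LastEvaluatedKey: this was the last page
}

has_more = "LastEvaluatedKey" in page_response    # more pages to fetch
done = "LastEvaluatedKey" not in final_response   # scanning is finished
```

The presence or absence of that one field is the entire pagination protocol.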
3
Intermediate: Implementing Scan Pagination in Code
🤔 Before reading on: do you think you must manually track the LastEvaluatedKey, or does DynamoDB handle it automatically? Commit to your answer.
Concept: Learn how to use LastEvaluatedKey in your Scan requests to get all pages of data.
When you call Scan, check if the response has LastEvaluatedKey. If yes, include it as ExclusiveStartKey in the next Scan request. Repeat until LastEvaluatedKey is missing, meaning no more data. This loop fetches all pages safely.
Result
Your code fetches all items in multiple requests, each returning a page of data.
Knowing you must manually pass LastEvaluatedKey to continue scanning prevents missing data or infinite loops.
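The loop can be sketched against a stand-in scan function; a real application would call boto3's client.scan with TableName and ExclusiveStartKey, but the page contents and key values below are invented:

```python
# Fake pages standing in for DynamoDB responses; a real app would call
# boto3's client.scan(TableName=..., ExclusiveStartKey=...) instead.
_PAGES = {
    None: {"Items": [1, 2], "LastEvaluatedKey": "key-2"},
    "key-2": {"Items": [3, 4], "LastEvaluatedKey": "key-4"},
    "key-4": {"Items": [5]},  # final page: no LastEvaluatedKey
}

def fake_scan(exclusive_start_key=None):
    """Simulate one Scan request resuming from exclusive_start_key."""
    return _PAGES[exclusive_start_key]

def scan_all():
    """Collect every item by looping until LastEvaluatedKey disappears."""
    items = []
    start_key = None
    while True:
        response = fake_scan(start_key)
        items.extend(response["Items"])
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:  # no marker means no more pages
            break
    return items

all_items = scan_all()  # [1, 2, 3, 4, 5]
```

The loop terminates only when the response carries no LastEvaluatedKey, which is what prevents both missing data and infinite loops.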
4
Intermediate: Controlling Page Size with the Limit Parameter
🤔 Before reading on: do you think setting a Limit guarantees the exact number of items per page, or just a maximum? Commit to your answer.
Concept: Limit sets the maximum number of items returned per Scan request, helping control page size.
You can add a Limit parameter to your Scan request to get smaller pages. However, DynamoDB may return fewer items if the 1 MB data size limit is reached first. So Limit is a maximum, not a guaranteed count.
Result
Pages contain up to Limit items, but sometimes fewer due to data size limits.
Understanding Limit as a maximum helps set realistic expectations for page sizes.
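One way to see why Limit is only a ceiling is a toy model where a page stops at either the item count or a size budget, whichever comes first (the 100-byte cap stands in for DynamoDB's 1 MB limit; all sizes are invented):

```python
# Simulate a table of items with varying serialized sizes (in bytes).
# DynamoDB stops a page at EITHER the Limit count OR the size cap,
# whichever comes first; the cap is shrunk to 100 "bytes" to show it.
ITEM_SIZES = [40, 40, 40, 40, 40]  # five items, 40 bytes each
SIZE_CAP = 100                     # stand-in for DynamoDB's 1 MB cap

def scan_page(start_index, limit):
    """Return up to `limit` item indexes without exceeding SIZE_CAP bytes."""
    items, used = [], 0
    i = start_index
    while i < len(ITEM_SIZES) and len(items) < limit:
        if used + ITEM_SIZES[i] > SIZE_CAP:
            break  # size cap reached before the Limit count
        used += ITEM_SIZES[i]
        items.append(i)
        i += 1
    return items

# Limit=4 requested, but only 2 items fit under the size cap.
page = scan_page(0, limit=4)
```

With 40-byte items and a 100-byte cap, the page holds 2 items even though Limit asked for 4: exactly the behavior a real Scan shows near the 1 MB boundary.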
5
Intermediate: Handling Large Tables Efficiently
🤔 Before reading on: do you think Scan pagination alone is enough for best performance on huge tables? Commit to your answer.
Concept: Learn strategies to improve Scan performance on big tables using pagination and filters.
Use pagination with small Limits to reduce load per request. Apply FilterExpression to reduce returned items. Consider parallel scans to speed up reading by dividing the table into segments scanned concurrently.
Result
Scan operations become faster and less resource-heavy on large tables.
Knowing how to combine pagination with filters and parallel scans helps scale reading large datasets.
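A parallel scan can be sketched with a thread pool, one worker per segment; in real boto3 code each worker would pass Segment and TotalSegments to client.scan, while the data and the segmenting rule below are invented:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of a parallel Scan: the key space is split into TOTAL_SEGMENTS
# segments, each scanned concurrently. A real call would pass Segment
# and TotalSegments to boto3's client.scan.
DATA = list(range(20))   # stand-in for the table's items
TOTAL_SEGMENTS = 4

def scan_segment(segment):
    """Fake per-segment scan: each worker reads its own slice of the table."""
    return [x for x in DATA if x % TOTAL_SEGMENTS == segment]

with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    results = pool.map(scan_segment, range(TOTAL_SEGMENTS))

# Merge per-segment results; order across segments is not guaranteed,
# so sort if a stable order matters.
all_items = sorted(item for seg in results for item in seg)
```

Each segment paginates independently, so in a real implementation every worker tracks its own LastEvaluatedKey.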
6
Advanced: Dealing with Consistency and Pagination
🤔 Before reading on: do you think paginated Scan results always reflect the exact same data snapshot? Commit to your answer.
Concept: Understand how eventual consistency affects paginated Scan results and how to request strongly consistent reads.
By default, Scan uses eventually consistent reads, so data might change between pages. You can request strongly consistent reads by setting ConsistentRead to true, but they consume twice the read capacity and may slow performance. Be aware that items can appear, change, or disappear across pages if the table is updated during scanning.
Result
You can choose consistency level but must handle possible data changes between pages.
Knowing consistency trade-offs helps design applications that handle data changes gracefully during pagination.
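The hazard can be seen in a toy simulation where the "table" mutates between page fetches (all names and values are invented):

```python
# Why eventually consistent paginated reads can mix snapshots: the table
# changes between page 1 and page 2, so the combined result reflects
# neither the before-state nor the after-state of the table.
table = {"a": 1, "b": 2, "c": 3, "d": 4}

def scan_page(keys):
    """Read the current value of each key (no snapshot isolation)."""
    return {k: table[k] for k in keys if k in table}

page1 = scan_page(["a", "b"])   # sees the old value of "b"
table["b"] = 99                 # a write lands mid-scan...
del table["d"]                  # ...and an item is deleted
page2 = scan_page(["c", "d"])   # no longer sees "d" at all

combined = {**page1, **page2}
# combined holds b's stale value and is missing d entirely.
```

Applications that paginate over live tables should tolerate this kind of skew, or snapshot the data some other way before processing.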
7
Expert: Unexpected Pagination Behavior and Limits
🤔 Before reading on: do you think DynamoDB guarantees that each Scan page will always have the same number of items if Limit is set? Commit to your answer.
Concept: Learn about DynamoDB internal limits that can cause pages to have fewer items than expected and how to handle them.
DynamoDB limits Scan results by data size (1 MB) and not just item count. Even if Limit is high, a page may return fewer items if the data size limit is reached. Also, if items are large or have many attributes, pages can vary in size. This can surprise developers expecting fixed page sizes.
Result
Scan pages vary in size due to internal limits, requiring flexible handling in code.
Understanding DynamoDB's size-based limits prevents bugs and incorrect assumptions about pagination behavior.
Under the Hood
DynamoDB Scan reads data by sequentially scanning partitions of the table. It returns items up to a 1 MB data size limit per request. If more data remains, it provides a LastEvaluatedKey, which is the primary key of the last item returned. This key tells DynamoDB where to resume scanning in the next request. Internally, DynamoDB uses this key to continue scanning without repeating or skipping items.
Why designed this way?
DynamoDB limits Scan size to 1 MB to protect performance and avoid long-running operations that could block other requests. Using LastEvaluatedKey as a resume token allows stateless pagination, making it easy to fetch data in chunks without server-side session state. This design balances scalability, performance, and simplicity.
┌───────────────┐
│ Scan Request  │
└──────┬────────┘
       │
       ▼
┌───────────────────────────────┐
│ DynamoDB scans partitions      │
│ up to 1 MB data limit          │
└──────┬────────────────────────┘
       │
       ▼
┌───────────────┐       ┌─────────────────────┐
│ Items Returned│       │ LastEvaluatedKey    │
│ (partial page)│       │ (resume pointer)    │
└──────┬────────┘       └─────────┬───────────┘
       │                          │
       ▼                          ▼
┌───────────────┐         ┌───────────────────┐
│ Client uses   │         │ Next Scan request │
│ items & key   │◀────────│ with              │
└───────────────┘         │ ExclusiveStartKey │
                          └───────────────────┘
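Because the resume token carries all the state, pagination maps cleanly onto a generator that yields one page at a time; the fake client below stands in for boto3's client.scan (with real boto3 you could also reach for client.get_paginator('scan')):

```python
def scan_pages(scan_fn):
    """Yield pages lazily; the only state carried between calls is the key."""
    start_key = None
    while True:
        response = scan_fn(start_key)
        yield response["Items"]
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            return

# Fake client serving two pages (stand-in for boto3's client.scan;
# the keys and items are invented).
_FAKE = {
    None: {"Items": ["x", "y"], "LastEvaluatedKey": "y"},
    "y": {"Items": ["z"]},
}

pages = list(scan_pages(_FAKE.__getitem__))
```

Statelessness is what makes this composable: the generator can be abandoned and resumed later from the last key it saw, with no server-side session to clean up.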
Myth Busters - 4 Common Misconceptions
Quick: Does setting Limit guarantee the exact number of items per Scan page? Commit to yes or no.
Common Belief: Setting Limit in Scan always returns exactly that many items per page.
Reality: Limit is a maximum; DynamoDB may return fewer items if the 1 MB data size limit is reached first.
Why it matters: Expecting fixed page sizes can cause bugs in pagination logic and incorrect UI behavior.
Quick: Does Scan pagination automatically fetch all pages without extra code? Commit to yes or no.
Common Belief: DynamoDB Scan automatically handles pagination and returns all data in one call.
Reality: You must manually check LastEvaluatedKey and make additional Scan calls to get all pages.
Why it matters: Not handling pagination leads to incomplete data retrieval and hidden bugs.
Quick: Are Scan results always consistent across pages without special settings? Commit to yes or no.
Common Belief: Scan pagination returns a consistent snapshot of data across all pages by default.
Reality: By default, Scan uses eventually consistent reads, so data may change between pages.
Why it matters: Assuming consistency can cause data anomalies in applications that rely on stable data.
Quick: Can you use Scan pagination to efficiently get a small subset of data? Commit to yes or no.
Common Belief: Scan pagination is efficient for any data retrieval, including small filtered queries.
Reality: Scan reads the whole table and is inefficient for targeted queries; the Query operation is better for subsets.
Why it matters: Using Scan instead of Query wastes resources and increases costs.
Expert Zone
1
LastEvaluatedKey is opaque and must be passed exactly as received; modifying it breaks pagination.
2
Parallel Scan divides the table into segments scanned concurrently, but requires careful merging of results.
3
Strongly consistent reads during Scan increase latency and cost, so use only when data accuracy is critical.
When NOT to use
Avoid Scan pagination when you can use Query with keys and indexes for targeted, efficient data retrieval. Use Query pagination instead for better performance and lower cost.
Production Patterns
In production, Scan pagination is often combined with filters and parallel scans to handle large data exports or analytics. Developers implement retry logic for throttling and use LastEvaluatedKey to resume interrupted scans.
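A resumable scan with retry can be sketched as below; the ThrottledError class, backoff numbers, and fake pages are all invented, and real code would catch botocore's ProvisionedThroughputExceededException from client.scan instead:

```python
import time

class ThrottledError(Exception):
    """Stand-in for botocore's ProvisionedThroughputExceededException."""

# Fake pages; the first request for "k1" is throttled, then succeeds.
_PAGES = {None: {"Items": [1], "LastEvaluatedKey": "k1"},
          "k1": {"Items": [2]}}
_fail_once = {"pending": True}

def flaky_scan(start_key):
    """Throttle the first call that resumes at 'k1', then succeed."""
    if start_key == "k1" and _fail_once.pop("pending", False):
        raise ThrottledError
    return _PAGES[start_key]

def scan_with_retry(scan_fn, max_retries=3):
    """Fetch all pages, retrying a throttled request with backoff."""
    items, start_key, attempt = [], None, 0
    while True:
        try:
            response = scan_fn(start_key)
        except ThrottledError:
            attempt += 1
            if attempt > max_retries:
                raise
            time.sleep(0.01 * (2 ** attempt))  # exponential backoff
            continue  # retry the SAME start_key: no page is lost
        attempt = 0
        items.extend(response["Items"])
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            return items

result = scan_with_retry(flaky_scan)
```

The key production detail is that a throttled request is retried with the same start key, so an interrupted scan resumes exactly where it left off; persisting that key externally extends the same idea across process restarts.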
Connections
Cursor-based Pagination in APIs
Scan pagination uses a cursor (LastEvaluatedKey) similar to cursor-based pagination in web APIs.
Understanding cursors in APIs helps grasp how DynamoDB tracks position in large data sets for efficient navigation.
Memory Paging in Operating Systems
Both break large data into smaller chunks (pages) to manage resources efficiently.
Knowing how OS paging works clarifies why DynamoDB limits Scan size and uses pagination to avoid overload.
Streaming Data Processing
Scan pagination streams data in parts rather than loading all at once, like streaming systems process data chunks.
Recognizing streaming patterns helps design scalable data retrieval that handles large volumes gracefully.
Common Pitfalls
#1 Not checking for LastEvaluatedKey and assuming Scan returns all data in one call.
Wrong approach:
    response = dynamodb.scan(TableName='MyTable')
    items = response['Items']  # No loop to check LastEvaluatedKey
Correct approach:
    items = []
    response = dynamodb.scan(TableName='MyTable')
    items.extend(response['Items'])
    while 'LastEvaluatedKey' in response:
        response = dynamodb.scan(TableName='MyTable',
                                 ExclusiveStartKey=response['LastEvaluatedKey'])
        items.extend(response['Items'])
Root cause: Not realizing that Scan returns partial data and that pagination must be handled manually.
#2 Setting Limit and expecting a fixed page size instead of handling variable page sizes.
Wrong approach:
    response = dynamodb.scan(TableName='MyTable', Limit=100)
    # Assuming exactly 100 items are always returned
Correct approach:
    response = dynamodb.scan(TableName='MyTable', Limit=100)
    items = response['Items']
    # Check the actual number of items and handle fewer gracefully
Root cause: Treating Limit as an exact count rather than a maximum.
#3 Modifying LastEvaluatedKey before passing it back to Scan.
Wrong approach:
    last_key = response['LastEvaluatedKey']
    last_key['ExtraField'] = 'value'  # corrupts the resume token
    response = dynamodb.scan(TableName='MyTable', ExclusiveStartKey=last_key)
Correct approach:
    last_key = response['LastEvaluatedKey']
    response = dynamodb.scan(TableName='MyTable', ExclusiveStartKey=last_key)
Root cause: Not treating LastEvaluatedKey as an opaque token that must be passed back exactly as received.
Key Takeaways
Scan pagination breaks large DynamoDB scans into manageable pages using LastEvaluatedKey as a resume token.
You must manually check and use LastEvaluatedKey to retrieve all data pages; DynamoDB does not do this automatically.
Limit sets a maximum number of items per page but actual page size can vary due to data size limits.
Scan uses eventually consistent reads by default, so data may change between pages unless strong consistency is requested.
For efficient targeted data retrieval, prefer Query with pagination over Scan pagination.