
Scan pagination in DynamoDB - Deep Dive

Overview - Scan pagination
What is it?
Scan pagination is a way to read large sets of data from a DynamoDB table in smaller parts called pages. Instead of getting all data at once, which can be slow or use too much memory, you get a limited number of items per request. Each page includes a marker to get the next page, so you can continue reading until all data is retrieved.
Why it matters
Without scan pagination, trying to read a big table all at once can cause slow responses, timeouts, or high costs. Pagination helps keep your app fast and efficient by breaking data into manageable chunks. It also helps avoid hitting limits set by DynamoDB on how much data you can read in one go.
Where it fits
Before learning scan pagination, you should understand basic DynamoDB concepts like tables, items, and the Scan operation. After mastering pagination, you can learn about Query pagination, filtering data efficiently, and optimizing read capacity for better performance.
Mental Model
Core Idea
Scan pagination breaks a large data scan into smaller pages, each with a pointer to the next, so you can read all data step-by-step without overload.
Think of it like...
Imagine reading a long book by chapters instead of all pages at once. After finishing one chapter, you know where to start the next. Scan pagination works the same way by reading data page by page.
┌───────────────┐
│ Start Scan    │
└──────┬────────┘
       │
       ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Page 1 Items  │ -> │ Page 2 Items  │ -> │ Page 3 Items  │ -> ...
└──────┬────────┘    └──────┬────────┘    └──────┬────────┘
       │                   │                   │
       ▼                   ▼                   ▼
  LastEvaluatedKey    LastEvaluatedKey    LastEvaluatedKey
       │                   │                   │
       └───────────────────┴───────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding DynamoDB Scan Basics
Concept: Learn what a Scan operation does in DynamoDB and its basic behavior.
A Scan reads every item in a DynamoDB table or index. It returns all data but can be slow and costly for big tables because it reads everything. By default, a Scan returns up to 1 MB of data per request, which might not include all items if the table is large.
Result
You get a partial list of items up to 1 MB in size from the table.
Knowing that Scan reads the whole table but limits data size per request sets the stage for why pagination is needed.
2
Foundation: What is Pagination in DynamoDB Scan?
Concept: Pagination means splitting Scan results into pages to handle large data sets efficiently.
Because Scan returns limited data per request, DynamoDB provides a LastEvaluatedKey in the response if more data exists. This key is a marker to start the next Scan request from where the last one ended. Using this, you can get all data page by page.
Result
You can continue scanning by sending the LastEvaluatedKey to get the next page of results.
Understanding the LastEvaluatedKey as a pointer is crucial to reading large tables without overload.
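A single Scan response can be sketched as plain Python dictionaries shaped like what boto3's low-level client returns (the table data here is invented for illustration):

```python
# A simulated Scan response, shaped like the low-level boto3 client's output.
# When more data remains, DynamoDB includes LastEvaluatedKey; on the final
# page that field is simply absent. (All item values here are invented.)
page_response = {
    "Items": [
        {"pk": {"S": "user#1"}, "name": {"S": "Alice"}},
        {"pk": {"S": "user#2"}, "name": {"S": "Bob"}},
    ],
    "Count": 2,
    "ScannedCount": 2,
    "LastEvaluatedKey": {"pk": {"S": "user#2"}},  # marker for the next page
}

final_response = {
    "Items": [{"pk": {"S": "user#3"}, "name": {"S": "Carol"}}],
    "Count": 1,
    "ScannedCount": 1,
    # no LastEvaluatedKey: this was the last page
}

has_more = "LastEvaluatedKey" in page_response    # more pages to fetch
done = "LastEvaluatedKey" not in final_response   # scanning is finished
```

The presence or absence of that one field is the entire pagination protocol.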
3
Intermediate: Implementing Scan Pagination in Code
🤔 Before reading on: do you think you must manually track the LastEvaluatedKey, or does DynamoDB handle it automatically? Commit to your answer.
Concept: Learn how to use LastEvaluatedKey in your Scan requests to get all pages of data.
When you call Scan, check if the response has LastEvaluatedKey. If yes, include it as ExclusiveStartKey in the next Scan request. Repeat until LastEvaluatedKey is missing, meaning no more data. This loop fetches all pages safely.
Result
Your code fetches all items in multiple requests, each returning a page of data.
Knowing you must manually pass LastEvaluatedKey to continue scanning prevents missing data or infinite loops.
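The loop can be sketched against a stand-in scan function; a real application would call boto3's client.scan with TableName and ExclusiveStartKey, but the page contents and key values below are invented:

```python
# Fake pages standing in for DynamoDB responses; a real app would call
# boto3's client.scan(TableName=..., ExclusiveStartKey=...) instead.
_PAGES = {
    None: {"Items": [1, 2], "LastEvaluatedKey": "key-2"},
    "key-2": {"Items": [3, 4], "LastEvaluatedKey": "key-4"},
    "key-4": {"Items": [5]},  # final page: no LastEvaluatedKey
}

def fake_scan(exclusive_start_key=None):
    """Simulate one Scan request resuming from exclusive_start_key."""
    return _PAGES[exclusive_start_key]

def scan_all():
    """Collect every item by looping until LastEvaluatedKey disappears."""
    items = []
    start_key = None
    while True:
        response = fake_scan(start_key)
        items.extend(response["Items"])
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:  # no marker means no more pages
            break
    return items

all_items = scan_all()  # [1, 2, 3, 4, 5]
```

The loop terminates only when the response carries no LastEvaluatedKey, which is what prevents both missing data and infinite loops.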
4
Intermediate: Controlling Page Size with the Limit Parameter
🤔 Before reading on: do you think setting a Limit guarantees the exact number of items per page, or just a maximum? Commit to your answer.
Concept: Limit sets the maximum number of items returned per Scan request, helping control page size.
You can add a Limit parameter to your Scan request to get smaller pages. However, DynamoDB may return fewer items if the 1 MB data size limit is reached first. So Limit is a maximum, not a guaranteed count.
Result
Pages contain up to Limit items, but sometimes fewer due to data size limits.
Understanding Limit as a maximum helps set realistic expectations for page sizes.
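One way to see why Limit is only a ceiling is a toy model where a page stops at either the item count or a size budget, whichever comes first (the 100-byte cap stands in for DynamoDB's 1 MB limit; all sizes are invented):

```python
# Simulate a table of items with varying serialized sizes (in bytes).
# DynamoDB stops a page at EITHER the Limit count OR the size cap,
# whichever comes first; the cap is shrunk to 100 "bytes" to show it.
ITEM_SIZES = [40, 40, 40, 40, 40]  # five items, 40 bytes each
SIZE_CAP = 100                     # stand-in for DynamoDB's 1 MB cap

def scan_page(start_index, limit):
    """Return up to `limit` item indexes without exceeding SIZE_CAP bytes."""
    items, used = [], 0
    i = start_index
    while i < len(ITEM_SIZES) and len(items) < limit:
        if used + ITEM_SIZES[i] > SIZE_CAP:
            break  # size cap reached before the Limit count
        used += ITEM_SIZES[i]
        items.append(i)
        i += 1
    return items

# Limit=4 requested, but only 2 items fit under the size cap.
page = scan_page(0, limit=4)
```

With 40-byte items and a 100-byte cap, the page holds 2 items even though Limit asked for 4: exactly the behavior a real Scan shows near the 1 MB boundary.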
5
Intermediate: Handling Large Tables Efficiently
🤔 Before reading on: do you think Scan pagination alone is enough for best performance on huge tables? Commit to your answer.
Concept: Learn strategies to improve Scan performance on big tables using pagination and filters.
Use pagination with small Limits to reduce load per request. Apply FilterExpression to reduce returned items. Consider parallel scans to speed up reading by dividing the table into segments scanned concurrently.
Result
Scan operations become faster and less resource-heavy on large tables.
Knowing how to combine pagination with filters and parallel scans helps scale reading large datasets.
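A parallel scan can be sketched with a thread pool, one worker per segment; in real boto3 code each worker would pass Segment and TotalSegments to client.scan, while the data and the segmenting rule below are invented:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of a parallel Scan: the key space is split into TOTAL_SEGMENTS
# segments, each scanned concurrently. A real call would pass Segment
# and TotalSegments to boto3's client.scan.
DATA = list(range(20))   # stand-in for the table's items
TOTAL_SEGMENTS = 4

def scan_segment(segment):
    """Fake per-segment scan: each worker reads its own slice of the table."""
    return [x for x in DATA if x % TOTAL_SEGMENTS == segment]

with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    results = pool.map(scan_segment, range(TOTAL_SEGMENTS))

# Merge per-segment results; order across segments is not guaranteed,
# so sort if a stable order matters.
all_items = sorted(item for seg in results for item in seg)
```

Each segment paginates independently, so in a real implementation every worker tracks its own LastEvaluatedKey.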
6
Advanced: Dealing with Consistency and Pagination
🤔 Before reading on: do you think paginated Scan results always reflect the exact same data snapshot? Commit to your answer.
Concept: Understand how eventual consistency affects paginated Scan results and how to request strongly consistent reads.
By default, Scan uses eventually consistent reads, so data might change between pages. You can request strongly consistent reads by setting ConsistentRead to true, but they consume twice the read capacity and may slow performance. Be aware that items can appear, change, or disappear across pages if the table is updated during scanning.
Result
You can choose consistency level but must handle possible data changes between pages.
Knowing consistency trade-offs helps design applications that handle data changes gracefully during pagination.
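The hazard can be seen in a toy simulation where the "table" mutates between page fetches (all names and values are invented):

```python
# Why eventually consistent paginated reads can mix snapshots: the table
# changes between page 1 and page 2, so the combined result reflects
# neither the before-state nor the after-state of the table.
table = {"a": 1, "b": 2, "c": 3, "d": 4}

def scan_page(keys):
    """Read the current value of each key (no snapshot isolation)."""
    return {k: table[k] for k in keys if k in table}

page1 = scan_page(["a", "b"])   # sees the old value of "b"
table["b"] = 99                 # a write lands mid-scan...
del table["d"]                  # ...and an item is deleted
page2 = scan_page(["c", "d"])   # no longer sees "d" at all

combined = {**page1, **page2}
# combined holds b's stale value and is missing d entirely.
```

Applications that paginate over live tables should tolerate this kind of skew, or snapshot the data some other way before processing.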
7
Expert: Unexpected Pagination Behavior and Limits
🤔 Before reading on: do you think DynamoDB guarantees that each Scan page will always have the same number of items if Limit is set? Commit to your answer.
Concept: Learn about DynamoDB internal limits that can cause pages to have fewer items than expected and how to handle them.
DynamoDB limits Scan results by data size (1 MB) and not just item count. Even if Limit is high, a page may return fewer items if the data size limit is reached. Also, if items are large or have many attributes, pages can vary in size. This can surprise developers expecting fixed page sizes.
Result
Scan pages vary in size due to internal limits, requiring flexible handling in code.
Understanding DynamoDB's size-based limits prevents bugs and incorrect assumptions about pagination behavior.
Under the Hood
DynamoDB Scan reads data by sequentially scanning partitions of the table. It returns items up to a 1 MB data size limit per request. If more data remains, it provides a LastEvaluatedKey, which is the primary key of the last item returned. This key tells DynamoDB where to resume scanning in the next request. Internally, DynamoDB uses this key to continue scanning without repeating or skipping items.
Why designed this way?
DynamoDB limits Scan size to 1 MB to protect performance and avoid long-running operations that could block other requests. Using LastEvaluatedKey as a resume token allows stateless pagination, making it easy to fetch data in chunks without server-side session state. This design balances scalability, performance, and simplicity.
┌───────────────┐
│ Scan Request  │
└──────┬────────┘
       │
       ▼
┌───────────────────────────────┐
│ DynamoDB scans partitions      │
│ up to 1 MB data limit          │
└──────┬────────────────────────┘
       │
       ▼
┌───────────────┐       ┌─────────────────────┐
│ Items Returned│       │ LastEvaluatedKey    │
│ (partial page)│       │ (resume pointer)    │
└──────┬────────┘       └─────────┬───────────┘
       │                          │
       ▼                          ▼
┌───────────────┐         ┌───────────────────┐
│ Client uses   │         │ Next Scan request │
│ items & key   │◀────────│ with              │
└───────────────┘         │ ExclusiveStartKey │
                          └───────────────────┘
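Because the resume token carries all the state, pagination maps cleanly onto a generator that yields one page at a time; the fake client below stands in for boto3's client.scan (with real boto3 you could also reach for client.get_paginator('scan')):

```python
def scan_pages(scan_fn):
    """Yield pages lazily; the only state carried between calls is the key."""
    start_key = None
    while True:
        response = scan_fn(start_key)
        yield response["Items"]
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            return

# Fake client serving two pages (stand-in for boto3's client.scan;
# the keys and items are invented).
_FAKE = {
    None: {"Items": ["x", "y"], "LastEvaluatedKey": "y"},
    "y": {"Items": ["z"]},
}

pages = list(scan_pages(_FAKE.__getitem__))
```

Statelessness is what makes this composable: the generator can be abandoned and resumed later from the last key it saw, with no server-side session to clean up.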
Myth Busters - 4 Common Misconceptions
Quick: Does setting Limit guarantee the exact number of items per Scan page? Commit to yes or no.
Common Belief: Setting Limit in Scan always returns exactly that many items per page.
Reality: Limit is a maximum; DynamoDB may return fewer items if the 1 MB data size limit is reached first.
Why it matters: Expecting fixed page sizes can cause bugs in pagination logic and incorrect UI behavior.
Quick: Does Scan pagination automatically fetch all pages without extra code? Commit to yes or no.
Common Belief: DynamoDB Scan automatically handles pagination and returns all data in one call.
Reality: You must manually check LastEvaluatedKey and make additional Scan calls to get all pages.
Why it matters: Not handling pagination leads to incomplete data retrieval and hidden bugs.
Quick: Are Scan results always consistent across pages without special settings? Commit to yes or no.
Common Belief: Scan pagination returns a consistent snapshot of data across all pages by default.
Reality: By default, Scan uses eventually consistent reads, so data may change between pages.
Why it matters: Assuming consistency can cause data anomalies in applications that rely on stable data.
Quick: Can you use Scan pagination to efficiently get a small subset of data? Commit to yes or no.
Common Belief: Scan pagination is efficient for any data retrieval, including small filtered queries.
Reality: Scan reads the whole table and is inefficient for targeted queries; the Query operation is better for subsets.
Why it matters: Using Scan instead of Query wastes resources and increases costs.
Expert Zone
1
LastEvaluatedKey is opaque and must be passed exactly as received; modifying it breaks pagination.
2
Parallel Scan divides the table into segments scanned concurrently, but requires careful merging of results.
3
Strongly consistent reads during Scan increase latency and cost, so use only when data accuracy is critical.
When NOT to use
Avoid Scan pagination when you can use Query with keys and indexes for targeted, efficient data retrieval. Use Query pagination instead for better performance and lower cost.
Production Patterns
In production, Scan pagination is often combined with filters and parallel scans to handle large data exports or analytics. Developers implement retry logic for throttling and use LastEvaluatedKey to resume interrupted scans.
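A resumable scan with retry can be sketched as below; the ThrottledError class, backoff numbers, and fake pages are all invented, and real code would catch botocore's ProvisionedThroughputExceededException from client.scan instead:

```python
import time

class ThrottledError(Exception):
    """Stand-in for botocore's ProvisionedThroughputExceededException."""

# Fake pages; the first request for "k1" is throttled, then succeeds.
_PAGES = {None: {"Items": [1], "LastEvaluatedKey": "k1"},
          "k1": {"Items": [2]}}
_fail_once = {"pending": True}

def flaky_scan(start_key):
    """Throttle the first call that resumes at 'k1', then succeed."""
    if start_key == "k1" and _fail_once.pop("pending", False):
        raise ThrottledError
    return _PAGES[start_key]

def scan_with_retry(scan_fn, max_retries=3):
    """Fetch all pages, retrying a throttled request with backoff."""
    items, start_key, attempt = [], None, 0
    while True:
        try:
            response = scan_fn(start_key)
        except ThrottledError:
            attempt += 1
            if attempt > max_retries:
                raise
            time.sleep(0.01 * (2 ** attempt))  # exponential backoff
            continue  # retry the SAME start_key: no page is lost
        attempt = 0
        items.extend(response["Items"])
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            return items

result = scan_with_retry(flaky_scan)
```

The key production detail is that a throttled request is retried with the same start key, so an interrupted scan resumes exactly where it left off; persisting that key externally extends the same idea across process restarts.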
Connections
Cursor-based Pagination in APIs
Scan pagination uses a cursor (LastEvaluatedKey) similar to cursor-based pagination in web APIs.
Understanding cursors in APIs helps grasp how DynamoDB tracks position in large data sets for efficient navigation.
Memory Paging in Operating Systems
Both break large data into smaller chunks (pages) to manage resources efficiently.
Knowing how OS paging works clarifies why DynamoDB limits Scan size and uses pagination to avoid overload.
Streaming Data Processing
Scan pagination streams data in parts rather than loading all at once, like streaming systems process data chunks.
Recognizing streaming patterns helps design scalable data retrieval that handles large volumes gracefully.
Common Pitfalls
#1 Not checking for LastEvaluatedKey and assuming Scan returns all data in one call.
Wrong approach:
    response = dynamodb.scan(TableName='MyTable')
    items = response['Items']  # No loop to check LastEvaluatedKey
Correct approach:
    items = []
    response = dynamodb.scan(TableName='MyTable')
    items.extend(response['Items'])
    while 'LastEvaluatedKey' in response:
        response = dynamodb.scan(TableName='MyTable',
                                 ExclusiveStartKey=response['LastEvaluatedKey'])
        items.extend(response['Items'])
Root cause: Not realizing that Scan returns partial data and that pagination must be handled manually.
#2 Setting Limit and expecting a fixed page size instead of handling variable page sizes.
Wrong approach:
    response = dynamodb.scan(TableName='MyTable', Limit=100)
    # Assuming exactly 100 items are always returned
Correct approach:
    response = dynamodb.scan(TableName='MyTable', Limit=100)
    items = response['Items']
    # Check the actual number of items and handle fewer gracefully
Root cause: Treating Limit as an exact count rather than a maximum.
#3 Modifying LastEvaluatedKey before passing it back to Scan.
Wrong approach:
    last_key = response['LastEvaluatedKey']
    last_key['ExtraField'] = 'value'  # corrupts the resume token
    response = dynamodb.scan(TableName='MyTable', ExclusiveStartKey=last_key)
Correct approach:
    last_key = response['LastEvaluatedKey']
    response = dynamodb.scan(TableName='MyTable', ExclusiveStartKey=last_key)
Root cause: Not treating LastEvaluatedKey as an opaque token that must be passed back exactly as received.
Key Takeaways
Scan pagination breaks large DynamoDB scans into manageable pages using LastEvaluatedKey as a resume token.
You must manually check and use LastEvaluatedKey to retrieve all data pages; DynamoDB does not do this automatically.
Limit sets a maximum number of items per page but actual page size can vary due to data size limits.
Scan uses eventually consistent reads by default, so data may change between pages unless strong consistency is requested.
For efficient targeted data retrieval, prefer Query with pagination over Scan pagination.