Overview - Scan vs Query performance comparison

What is it?

In DynamoDB, Scan and Query are the two operations for reading multiple items from a table. Scan reads every item in the table or index, while Query reads only the items that share a given partition key value, optionally narrowed by a condition on the sort key. Both return items, but they do very different amounts of work under the hood. Understanding that difference helps you choose the right operation for your needs.

Why it matters

Choosing between Scan and Query affects how fast your app responds and how much it costs. Using Scan on large tables can be slow and expensive because it reads everything. Query is faster and cheaper when you know the keys. Without this knowledge, apps can become slow and costly, frustrating users and wasting resources.

Where it fits

Before learning this, you should know basic DynamoDB concepts like tables, items, and primary keys. After this, you can learn about advanced querying techniques, indexes, and optimizing DynamoDB performance.

Mental Model

Core Idea

Scan reads the whole table item by item, while Query fetches items directly by key, which makes Query much faster and more efficient.

Think of it like...

Imagine looking for a book in a library: Scan is like checking every book on every shelf, while Query is like going straight to the shelf where the book is known to be.

┌───────────────┐       ┌───────────────┐
│   DynamoDB    │       │   DynamoDB    │
│     Table     │       │     Table     │
│ (many items)  │       │ (many items)  │
└───────┬───────┘       └───────┬───────┘
        │ Scan: reads           │ Query: reads
        │ all items             │ items by key
        ▼                       ▼
[Item1, ..., ItemN]     [Items with matching key(s)]
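The difference in work can be sketched with a toy in-memory model. This is an illustration only, not how DynamoDB is implemented: the `ToyTable` class and its method names are hypothetical, and a Python dict stands in for the lookup by partition key.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a table with a partition key."""

    def __init__(self):
        # partition key value -> list of items stored under that key
        self._partitions = defaultdict(list)

    def put(self, pk, item):
        self._partitions[pk].append(item)

    def scan(self, predicate=lambda item: True):
        """Examine every item; the predicate only trims what is returned."""
        examined, matched = 0, []
        for items in self._partitions.values():
            for item in items:
                examined += 1
                if predicate(item):
                    matched.append(item)
        return matched, examined

    def query(self, pk):
        """Go straight to one partition key and read only its items."""
        items = list(self._partitions.get(pk, []))
        return items, len(items)


table = ToyTable()
for user in range(1000):
    for order in range(5):
        table.put(f"user#{user}", {"user": f"user#{user}", "order": order})

scan_hits, scan_examined = table.scan(lambda it: it["user"] == "user#42")
query_hits, query_examined = table.query("user#42")
print(scan_examined, query_examined)  # 5000 5
```

Both calls return the same five items, but the scan examined all 5000 items to find them.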

Build-Up - 7 Steps

1. Foundation: Understanding DynamoDB Table Basics

Concept: Learn what a DynamoDB table is and how data is stored as items with keys.

A DynamoDB table stores data as items, which are like rows in a spreadsheet. Each item has attributes, including a primary key that uniquely identifies it. The primary key can be simple (one attribute) or composite (partition key and sort key). This key helps DynamoDB find items quickly.

Result

You understand that data is organized by keys, which are essential for fast access.

Knowing how data is organized by keys is the foundation for understanding why Query is faster than Scan.
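As a rough illustration of simple versus composite primary keys, the sketch below models a table as a Python dict keyed by a `PrimaryKey` tuple. The `PrimaryKey` type, the `orders` table, and the attribute values are all hypothetical names chosen for this example.

```python
from typing import NamedTuple, Optional


class PrimaryKey(NamedTuple):
    partition_key: str
    sort_key: Optional[str] = None  # None models a simple (partition-only) key


# Hypothetical table: primary key -> item attributes
orders = {}

# Composite key: same partition key, different sort keys -> distinct items
orders[PrimaryKey("customer#1", "2024-01-05")] = {"total": 30}
orders[PrimaryKey("customer#1", "2024-02-11")] = {"total": 12}

# Writing with an existing primary key replaces the item, it does not add one
orders[PrimaryKey("customer#1", "2024-02-11")] = {"total": 15}

print(len(orders))  # 2
```

The primary key as a whole identifies an item, which is why two orders for the same customer can coexist as long as their sort keys differ.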

2. Foundation: What is a Scan Operation?

3. Intermediate: What is a Query Operation?

4. Intermediate: Performance Differences Between Scan and Query

5. Intermediate: When Scan Might Be Necessary

6. Advanced: Optimizing Scan with Parallelization and Filters

7. Expert: Impact of Indexes on Query and Scan Performance

Under the Hood

DynamoDB stores data in partitions and picks the partition for each item by hashing its partition key. Query hashes the partition key you supply, goes directly to the relevant partition, and reads only the matching items. Scan walks through every partition and reads every item regardless of its key. A filter expression is applied after the items are read, so it trims the response but does not reduce the read capacity consumed. Parallel Scan divides the table into segments that workers scan concurrently. This shortens the wall-clock time, but every item is still read, so read capacity is consumed faster rather than saved.
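The effect of filters on cost can be checked with a little arithmetic. The sketch below assumes the standard read pricing rules (the total size read by one request is rounded up to the next 4 KB, and an eventually consistent read costs half a unit per 4 KB), treats 4 KB as 4096 bytes, and uses made-up item sizes and counts. The helper names are hypothetical.

```python
import math

BLOCK_BYTES = 4 * 1024  # assumption: 4 KB taken as 4096 bytes


def read_capacity_units(bytes_read, strongly_consistent=False):
    """RCUs for one Query or Scan request that read `bytes_read` bytes."""
    blocks = math.ceil(bytes_read / BLOCK_BYTES)
    return blocks * (1.0 if strongly_consistent else 0.5)


def scan_cost(item_sizes, keep):
    """Model one Scan page: bill for every item read, return only the kept ones."""
    consumed = read_capacity_units(sum(item_sizes))
    returned = sum(1 for index in range(len(item_sizes)) if keep(index))
    return consumed, returned


sizes = [500] * 2000                              # 2000 items of 500 bytes each
unfiltered = scan_cost(sizes, lambda i: True)
filtered = scan_cost(sizes, lambda i: i < 20)     # filter keeps only 20 items
query_rcu = read_capacity_units(20 * 500)         # Query reads just those 20 items

print(unfiltered, filtered, query_rcu)  # (122.5, 2000) (122.5, 20) 1.5
```

The filtered scan returns 100 times fewer items than the unfiltered one yet costs the same, while the Query that reads only the 20 matching items costs a small fraction of either.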

Why designed this way?

DynamoDB was designed for fast, scalable access using keys to avoid full table scans. Query leverages this by using keys to jump directly to data. Scan exists to provide flexibility when keys are unknown but is less efficient. This design balances speed and flexibility, encouraging key-based access for performance while allowing full scans when necessary.

┌───────────────┐
│ DynamoDB Table│
├───────────────┤
│ Partition 1   │◄───── Query uses partition key to jump here
│ Partition 2   │
│ Partition 3   │
│ Partition N   │
└─────┬─────────┘
      │
      ▼
  Scan reads all partitions one by one

Parallel Scan:
┌───────────────┐
│ Partition 1   │◄─ Segment 1
│ Partition 2   │◄─ Segment 2
│ Partition 3   │◄─ Segment 3
│ Partition N   │◄─ Segment N
└───────────────┘
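A parallel Scan can be sketched as one worker per segment, each sending a request that carries the `Segment` and `TotalSegments` parameters of the Scan API. In the sketch below, `scan_fn` is an injected callable and `fake_scan` is a hypothetical stand-in that returns plain lists; a real Scan response is a dict with `Items` and possibly `LastEvaluatedKey`, and each segment would need its own pagination loop, which is omitted here.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_requests(table_name, total_segments):
    """Build one set of Scan parameters per segment."""
    return [
        {"TableName": table_name, "Segment": segment, "TotalSegments": total_segments}
        for segment in range(total_segments)
    ]


def parallel_scan(scan_fn, table_name, total_segments):
    """Run one worker per segment and merge what each segment returns."""
    requests = segment_requests(table_name, total_segments)
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        per_segment = list(pool.map(lambda kwargs: scan_fn(**kwargs), requests))
    return [item for items in per_segment for item in items]


def fake_scan(TableName, Segment, TotalSegments):
    # Hypothetical stand-in: segment s owns items s, s + N, s + 2N, ...
    return list(range(Segment, 12, TotalSegments))


items = parallel_scan(fake_scan, "MyTable", 4)
print(sorted(items))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Every item is still read exactly once across the segments, so the workers finish sooner without reading any less data.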

Myth Busters - 4 Common Misconceptions

Quick: Does applying a filter in Scan reduce the read capacity units consumed? Commit to yes or no.

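Common Belief: Applying filters in Scan reduces the amount of data read and lowers costs.

Reality: No. The filter is applied after the items are read, so a filtered Scan consumes the same read capacity as an unfiltered one. Only the returned data shrinks.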

Quick: Is Query always faster than Scan regardless of table size? Commit to yes or no.

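Common Belief: Query is always faster than Scan no matter what.

Reality: Not always. Query wins when the key narrows the read. On a very small table, or when a Query targets a partition key that holds most of the table's items, the two operations read a similar amount of data and perform similarly.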

Quick: Does running multiple parallel Scans always reduce total read capacity usage? Commit to yes or no.

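Common Belief: Parallel Scans reduce total read capacity usage by scanning faster.

Reality: No. A parallel Scan still reads every item, so it saves no read capacity. It consumes that capacity faster, which can exhaust provisioned throughput and cause throttling.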

Quick: Can Query be used without specifying the partition key? Commit to yes or no.

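Common Belief: You can Query DynamoDB without specifying the partition key by just using filters.

Reality: No. Query requires an equality condition on the partition key in its key condition expression. A filter expression can only refine the results, not replace the key. If the partition key is unknown, you need a Scan or a secondary index keyed on the attribute you search by.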

Expert Zone

1. Query performance depends heavily on how well the partition key distributes data; hot partitions can cause throttling even with Query.

2. A large Scan can consume enough read capacity to trigger throttling, which slows the Scan itself and can starve other requests on the same table.

3. Using ProjectionExpression in Query and Scan reduces the amount of data returned, which saves bandwidth, but it does not reduce the read capacity consumed, because that is based on the size of the items read.
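The hot-partition point can be illustrated by hashing request keys into a fixed number of buckets and comparing the load. This is a sketch under loud assumptions: DynamoDB's partitioning hash is internal, so SHA-256 is used here purely as a stand-in, and the bucket count, tenant names, and request mix are invented for the example.

```python
import hashlib
from collections import Counter


def bucket_for(partition_key, buckets):
    """Map a key to a bucket; SHA-256 is a stand-in, not DynamoDB's hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def load_per_bucket(request_keys, buckets):
    """Count how many requests land in each bucket."""
    return Counter(bucket_for(key, buckets) for key in request_keys)


# 90% of requests hit a single key versus requests spread over 1000 keys
skewed = ["tenant#big"] * 900 + [f"tenant#{i}" for i in range(100)]
even = [f"tenant#{i}" for i in range(1000)]

skewed_load = load_per_bucket(skewed, 8)
even_load = load_per_bucket(even, 8)
print(max(skewed_load.values()), max(even_load.values()))
```

With the skewed mix, one bucket receives at least 900 of the 1000 requests no matter how the hash behaves, which is the situation that leads to throttling even when every request is a Query.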

When NOT to use

Avoid Scan on large tables when you can use Query or indexes. If you need to search by non-key attributes frequently, consider adding Global Secondary Indexes. For complex queries, consider using DynamoDB Streams or exporting data to analytics databases.
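Querying a Global Secondary Index uses the same Query operation with an added `IndexName`. The sketch below only builds the request parameters in the low-level attribute-value format; the helper name, the `StatusIndex` index, and the `Status` attribute are hypothetical, and the resulting dict is what would be passed to a Query call.

```python
def gsi_query_params(table_name, index_name, key_attribute, key_value):
    """Build Query parameters that target a secondary index by its partition key."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # A placeholder name avoids clashes with DynamoDB reserved words
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_attribute},
        "ExpressionAttributeValues": {":v": {"S": key_value}},
    }


# Hypothetical index keyed on the Status attribute
params = gsi_query_params("MyTable", "StatusIndex", "Status", "active")
print(params["IndexName"], params["KeyConditionExpression"])  # StatusIndex #k = :v
```

The attribute that previously required a filtered Scan becomes the partition key of the index, so the lookup is a key-based read.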

Production Patterns

In production, Query is used for fast lookups by key, often combined with GSIs for alternate access patterns. Scan is used sparingly for maintenance tasks or rare full-table operations, often with pagination and parallelization to manage load.
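Pagination for a full-table pass can be sketched as a loop that feeds each response's `LastEvaluatedKey` back as the next request's `ExclusiveStartKey` until no key is returned. In the sketch below, `scan_fn` is an injected callable, and `make_fake_scan` is a hypothetical stand-in that pages over a Python list so the loop can run without a real table.

```python
def scan_all(scan_fn, **params):
    """Yield every item by following LastEvaluatedKey until it is absent."""
    start_key = None
    while True:
        if start_key is not None:
            params["ExclusiveStartKey"] = start_key
        page = scan_fn(**params)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break


def make_fake_scan(items, page_size):
    """Hypothetical stand-in that returns `page_size` items per call."""
    def fake_scan(TableName, ExclusiveStartKey=None, **_):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["index"] + 1
        chunk = items[start:start + page_size]
        page = {"Items": chunk}
        if start + page_size < len(items):
            page["LastEvaluatedKey"] = {"index": start + len(chunk) - 1}
        return page
    return fake_scan


fake = make_fake_scan(list(range(7)), page_size=3)
all_items = list(scan_all(fake, TableName="MyTable"))
print(all_items)  # [0, 1, 2, 3, 4, 5, 6]
```

Because the loop is a generator, a maintenance job can process items page by page and pause between pages to limit the load it places on the table.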

Connections

Indexing in Databases

Query in DynamoDB is similar to using indexes in relational databases to speed up searches.

Understanding indexing in traditional databases helps grasp why Query is efficient and how secondary indexes extend this in DynamoDB.

Caching Systems

Query results can be cached to avoid repeated reads, similar to caching in web applications.

Knowing caching strategies helps optimize DynamoDB Query performance by reducing direct database reads.
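One way to picture this is a small time-to-live cache placed in front of a query function. The `QueryCache` class, the 30-second lifetime, and `fake_query` are hypothetical names for this sketch; the clock is injected so the expiry behaviour can be shown without waiting.

```python
import time


class QueryCache:
    """Hypothetical time-to-live cache in front of a query function."""

    def __init__(self, query_fn, ttl_seconds, clock=time.monotonic):
        self._query_fn = query_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # partition key -> (fetched_at, items)

    def get(self, partition_key):
        now = self._clock()
        entry = self._entries.get(partition_key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh enough: no database read
        items = self._query_fn(partition_key)
        self._entries[partition_key] = (now, items)
        return items


calls = []


def fake_query(partition_key):
    # Stand-in for a real Query; records each call that reaches the database
    calls.append(partition_key)
    return [{"pk": partition_key}]


fake_now = [0.0]
cache = QueryCache(fake_query, ttl_seconds=30, clock=lambda: fake_now[0])
cache.get("user#1")
cache.get("user#1")   # served from the cache
fake_now[0] = 31.0
cache.get("user#1")   # entry expired, so the query runs again
print(len(calls))  # 2
```

The trade-off is staleness: within the lifetime of an entry, readers may see data that has since changed in the table.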

Library Book Search

Scan is like browsing every book, Query is like using the catalog to find a book quickly.

This real-world search analogy clarifies why key-based access is faster and more efficient.

Common Pitfalls

#1 Using Scan for frequent queries on large tables.

Wrong approach: aws dynamodb scan --table-name MyTable --filter-expression "attribute_exists(#s)" --expression-attribute-names '{"#s":"Status"}'

Correct approach: aws dynamodb query --table-name MyTable --key-condition-expression "PartitionKey = :pk" --expression-attribute-values '{":pk":{"S":"value"}}'

Root cause:Not understanding that Scan reads the whole table, causing slow performance and high cost.

#2 Trying to Query without specifying the partition key.

Wrong approach: aws dynamodb query --table-name MyTable --filter-expression "#s = :status" --expression-attribute-names '{"#s":"Status"}' --expression-attribute-values '{":status":{"S":"active"}}'

Correct approach: aws dynamodb query --table-name MyTable --key-condition-expression "PartitionKey = :pk" --expression-attribute-values '{":pk":{"S":"value"}}'

Root cause:Misunderstanding that Query requires the partition key to work.

#3 Assuming filters reduce read capacity in Scan.

Wrong approach: aws dynamodb scan --table-name MyTable --filter-expression "#s = :status" --expression-attribute-names '{"#s":"Status"}' --expression-attribute-values '{":status":{"S":"active"}}'

Correct approach: Use Query with keys or add indexes; filters only reduce returned data, not read capacity.

Root cause:Believing filters reduce the amount of data DynamoDB reads internally.

Key Takeaways

Scan reads every item in a table, making it slow and costly for large datasets.

Query uses primary keys to directly fetch matching items, making it faster and more efficient.

Filters in Scan reduce returned data but do not reduce the read capacity units consumed.

Parallel Scan shortens scan time but does not reduce the read capacity consumed; it spends that capacity faster, which can throttle other traffic on the table.

Using secondary indexes allows Query to access data flexibly and efficiently beyond the primary key.