Overview - Scan vs query performance

What is it?

Scan and query are two ways to read data from a database table. Scan reads every item in the table, checking each one to find matches. Query looks for items using specific keys, so it reads less data. Both are used to get information, but they work differently and affect speed.

Why it matters

Choosing between scan and query affects how fast your app gets data and how much it costs. If you use scan on big tables often, your app slows down and costs rise. Using query smartly makes your app faster and cheaper. Without understanding this, apps can be slow and expensive.

Where it fits

You should know basic database concepts like tables and keys before this. After this, you can learn about indexing, filtering, and optimizing database access for better performance.

Mental Model

Core Idea

Query finds data by looking directly where it should be, while scan checks every item one by one.

Think of it like...

Imagine finding a book in a library: query is like going straight to the shelf where the book belongs, scan is like checking every book in the entire library until you find it.

┌─────────────┐       ┌─────────────┐
│   Query     │       │    Scan     │
├─────────────┤       ├─────────────┤
│ Uses keys   │       │ Checks all  │
│ Direct hit  │       │ items       │
│ Fast        │       │ Slow        │
└─────┬───────┘       └─────┬───────┘
      │                     │       
      ▼                     ▼       
  Reads few items       Reads whole table

Build-Up - 7 Steps

1

FoundationUnderstanding database tables and items

Concept: Learn what a database table and items are.

A database table is like a spreadsheet with rows and columns. Each row is an item with data in columns. For example, a table of users has items with name, email, and ID.

Result

You can picture data stored in rows and columns, ready to be read or searched.

Knowing the basic structure of data storage helps understand how reading methods work.

2

FoundationWhat are keys in databases?

3

IntermediateHow query uses keys to find data

4

IntermediateHow scan reads all items in a table

5

IntermediateComparing performance and cost of scan vs query

6

AdvancedWhen to use scan despite its cost

7

ExpertOptimizing query and scan with indexes and filters

Under the Hood

Query uses the database's key-value store to jump directly to the data location using the key's hash or sorted order. Scan reads every data block sequentially, checking each item against filters. Query reads fewer data blocks and uses less I/O, while scan reads all blocks, causing more I/O and CPU usage.

Why designed this way?

Databases separate query and scan to balance speed and flexibility. Query is optimized for known keys to be fast. Scan exists to handle cases where keys don't help or full data processing is needed. This design avoids forcing all reads to be slow or all reads to require keys.

┌───────────────┐       ┌───────────────┐
│   Query       │       │    Scan       │
├───────────────┤       ├───────────────┤
│ Uses key index│       │ Reads all data│
│ Direct access │       │ Sequentially  │
│ Few I/O ops   │       │ Many I/O ops  │
└───────┬───────┘       └───────┬───────┘
        │                       │       
        ▼                       ▼       
  Fast response          Slower response

Myth Busters - 4 Common Misconceptions

Quick: Does query always return data faster than scan? Commit to yes or no.

Common Belief:Query is always faster than scan no matter what.

Tap to reveal reality

Quick: Does scan read less data if you use filters? Commit to yes or no.

Common Belief:Using filters with scan reduces the amount of data read from the database.

Tap to reveal reality

Quick: Can scan be faster than query in some cases? Commit to yes or no.

Common Belief:Scan is always slower than query.

Tap to reveal reality

Quick: Does adding more indexes always improve query speed? Commit to yes or no.

Common Belief:More indexes always make queries faster.

Tap to reveal reality

Expert Zone

1

Query performance depends heavily on how well the partition key distributes data; uneven keys cause hotspots.

2

Scan can be parallelized across partitions to improve speed but increases cost and complexity.

3

Filters applied after query or scan do not reduce read capacity units consumed; only key conditions do.

When NOT to use

Avoid scan on large tables when you can use query with proper keys or indexes. Use analytics tools or data pipelines for full data processing instead of scans. For real-time lookups, always prefer query.

Production Patterns

Use query for user-specific data retrieval with partition keys. Use scan for batch jobs like backups or analytics with parallel scans. Combine query with secondary indexes for flexible access patterns. Monitor read capacity and optimize keys to avoid throttling.

Connections

Indexing in Databases

Builds-on

Understanding scan vs query helps grasp why indexes exist and how they speed up data access.

Caching Systems

Similar pattern

Like query uses keys to find data fast, caches use keys to avoid scanning all data, improving speed.

Library Book Search

Opposite approach

Knowing how physical search in libraries works clarifies why direct access (query) is faster than checking every book (scan).

Common Pitfalls

#1Using scan for frequent user lookups.

Wrong approach:Performing scan operations to find user data by email every time a user logs in.

Correct approach:Design the table with email as a key or use a secondary index and perform query operations for user lookups.

Root cause:Not understanding that scan reads the whole table, causing slow and costly lookups.

#2Relying on filters to reduce scan cost.

Wrong approach:Using scan with filters expecting to lower read capacity usage and cost.

Correct approach:Use query with key conditions to limit data read; filters only reduce returned data, not read cost.

Root cause:Misunderstanding that filters apply after reading all data in scan.

#3Adding too many indexes to speed queries.

Wrong approach:Creating multiple secondary indexes on all attributes without planning.

Correct approach:Create only necessary indexes based on query patterns to balance read speed and write cost.

Root cause:Believing more indexes always improve performance without considering write overhead.

Key Takeaways

Query uses keys to find data directly, making it fast and cost-effective.

Scan reads every item in a table, which is slower and more expensive, suitable only for full data reads.

Filters reduce returned data but do not reduce scan read costs.

Indexes enable queries on different keys but must be used wisely to avoid overhead.

Choosing between scan and query impacts application speed, cost, and scalability.