
SCAN for safe key iteration in Redis - Deep Dive

Overview - SCAN for safe key iteration
What is it?
SCAN is a Redis command that iterates over the keys in the database incrementally instead of returning them all in one reply. Each call returns a small batch of keys plus a cursor for the next call. This way, you can walk even a very large keyspace without one long operation that blocks the server.
Why it matters
Without SCAN, fetching all keys at once can freeze Redis, making your app slow or unresponsive. SCAN solves this by spreading the work into small steps, so Redis stays fast and responsive. This is important for real-world apps that store lots of data and need to find keys safely.
Where it fits
Before learning SCAN, you should know basic Redis commands and understand what keys and databases are. After SCAN, you can learn about other cursor-based commands like SSCAN, HSCAN, and ZSCAN for iterating sets, hashes, and sorted sets.
Mental Model
Core Idea
SCAN lets you walk through Redis keys step-by-step using a cursor, avoiding heavy work all at once.
Think of it like...
Imagine searching for a book in a huge library by checking a few shelves at a time instead of trying to see every book at once.
┌─────────────┐
│ Start cursor│
└──────┬──────┘
       │ SCAN returns batch of keys + new cursor
       ▼
┌─────────────┐
│ Process keys│
└──────┬──────┘
       │ Use new cursor to continue
       ▼
┌─────────────┐
│ Repeat until│
│ cursor = 0  │
└─────────────┘
Build-Up - 7 Steps
1
Foundation: What is the SCAN command in Redis
Concept: Introducing SCAN as a command to iterate keys safely.
In Redis, the SCAN command lets you get keys in small groups instead of all at once. You start with cursor 0, and Redis returns some keys plus a new cursor. You keep calling SCAN with the new cursor until the returned cursor is 0, which means the iteration is complete.
Result
You get keys in batches, avoiding big delays or memory spikes.
Understanding SCAN as a cursor-based iterator helps avoid blocking Redis when listing keys.
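The loop described in this step can be sketched in Python. To keep the example runnable without a server, `make_fake_scan` is a hypothetical in-memory stand-in whose cursor is simply a list offset; a real Redis cursor is an opaque value and should never be interpreted by the client. With the redis-py client, the equivalent call would be `cursor, keys = r.scan(cursor=cursor)`.

```python
from typing import Callable, List, Tuple

ScanFn = Callable[[int], Tuple[int, List[str]]]


def make_fake_scan(keys: List[str], page_size: int = 2) -> ScanFn:
    """Hypothetical stand-in for SCAN: the cursor is just an offset into a list."""
    def scan(cursor: int) -> Tuple[int, List[str]]:
        batch = keys[cursor:cursor + page_size]
        next_cursor = cursor + page_size
        # A returned cursor of 0 signals that the iteration is complete.
        return (0 if next_cursor >= len(keys) else next_cursor), batch
    return scan


def iterate_all(scan: ScanFn) -> List[str]:
    """Start at cursor 0 and keep calling scan until the cursor comes back as 0."""
    collected: List[str] = []
    cursor = 0
    while True:
        cursor, batch = scan(cursor)
        collected.extend(batch)
        if cursor == 0:
            break
    return collected


fake_scan = make_fake_scan(["user:1", "user:2", "cart:9", "user:3", "job:7"])
all_keys = iterate_all(fake_scan)
print(all_keys)
```

The loop checks the cursor after the call rather than before it, because the first call also uses cursor 0.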
2
Foundation: Why not use the KEYS command for iteration
Concept: Explaining the problem with KEYS command for large data.
The KEYS command returns all matching keys at once. If the database is big, this can freeze Redis and slow down your app. KEYS is simple but unsafe for production use.
Result
Using KEYS on large databases causes delays and can crash Redis.
Knowing KEYS is unsafe motivates using SCAN for safe iteration.
3
Intermediate: How the SCAN cursor works step-by-step
Before reading on: do you think SCAN returns keys in a fixed order or can the order change? Commit to your answer.
Concept: Understanding the cursor mechanism and unordered results.
SCAN returns a cursor and a batch of keys. The cursor is a position marker inside Redis's internal data structure. Each call continues from the last cursor. The order of keys is not guaranteed and can change if keys are added or removed during iteration.
Result
You get partial key lists each time, and must keep calling SCAN until cursor is 0.
Knowing SCAN's unordered and incremental nature helps write code that handles partial and changing results.
4
Intermediate: Using MATCH and COUNT options with SCAN
Before reading on: do you think MATCH filters keys before or after Redis examines them? Commit to your answer.
Concept: Adding filtering and batch size control to SCAN.
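MATCH lets you filter keys by a glob-style pattern, like 'user:*'. COUNT is a hint for how much work Redis does per call (roughly how many keys it examines; the default is 10), so a call may return more or fewer keys than COUNT. Because MATCH is applied after keys are examined, a call can return few or no keys while the cursor is still non-zero. These options help focus iteration and control workload.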
Result
SCAN returns only keys matching the pattern; batch sizes vary, and some batches may be empty even though the iteration is not finished.
Understanding MATCH and COUNT lets you customize SCAN for efficient and targeted key iteration.
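A small simulation can illustrate why a MATCH scan sometimes returns an empty batch. This is a simplified model under stated assumptions, not Redis's implementation: `fake_scan` is hypothetical, its cursor is a list offset, it examines exactly `count` keys per call, and Python's `fnmatchcase` only approximates Redis glob patterns.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

KEYS = ["user:1", "cart:1", "cart:2", "cart:3", "user:2", "job:1"]


def fake_scan(cursor: int, match: str = "*", count: int = 10) -> Tuple[int, List[str]]:
    """Simplified model: examine up to `count` keys first, then apply the pattern."""
    examined = KEYS[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(KEYS):
        next_cursor = 0  # iteration complete
    matched = [key for key in examined if fnmatchcase(key, match)]
    return next_cursor, matched


def scan_batches(match: str, count: int) -> List[List[str]]:
    """Collect each batch separately so the batch sizes are visible."""
    batches: List[List[str]] = []
    cursor = 0
    while True:
        cursor, batch = fake_scan(cursor, match=match, count=count)
        batches.append(batch)
        if cursor == 0:
            break
    return batches


batches = scan_batches("user:*", count=2)
print(batches)  # [['user:1'], [], ['user:2']]
```

The middle batch is empty because both examined keys were filtered out. Only a cursor of 0 ends the loop; an empty batch does not.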
5
Intermediate: Handling SCAN results safely in code
Before reading on: do you think you must collect all keys in memory before processing or can you process them as they come? Commit to your answer.
Concept: Best practices for processing SCAN results incrementally.
Because SCAN returns partial results, you should process keys as you receive them instead of storing all keys at once. This keeps memory use low and avoids blocking. Also, be ready for keys to appear or disappear during iteration.
Result
Your code safely iterates keys without freezing or running out of memory.
Knowing to process keys incrementally prevents common memory and performance problems.
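One way to process keys as they arrive is to wrap the cursor loop in a generator, so the caller never holds more than one batch in memory. The sketch below assumes a hypothetical `make_fake_scan` stand-in (cursor as list offset) so that it runs without a server; redis-py offers a comparable built-in helper, `scan_iter`, for real connections.

```python
from collections import Counter
from typing import Callable, Iterator, List, Tuple

ScanFn = Callable[[int], Tuple[int, List[str]]]


def make_fake_scan(keys: List[str], page_size: int = 2) -> ScanFn:
    """Hypothetical stand-in for SCAN: the cursor is just an offset into a list."""
    def scan(cursor: int) -> Tuple[int, List[str]]:
        batch = keys[cursor:cursor + page_size]
        next_cursor = cursor + page_size
        return (0 if next_cursor >= len(keys) else next_cursor), batch
    return scan


def iter_keys(scan: ScanFn) -> Iterator[str]:
    """Yield keys one at a time; only the current batch is held in memory."""
    cursor = 0
    while True:
        cursor, batch = scan(cursor)
        yield from batch
        if cursor == 0:
            return


fake_scan = make_fake_scan(["user:1", "user:2", "cart:9", "user:3", "job:7"])

# Count keys per prefix without ever building the full key list.
prefix_counts = Counter(key.split(":", 1)[0] for key in iter_keys(fake_scan))
print(dict(prefix_counts))
```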
6
Advanced: Why SCAN is not a snapshot iterator
Before reading on: do you think SCAN guarantees to return every key exactly once during iteration? Commit to your answer.
Concept: Understanding SCAN's limitations with concurrent changes.
SCAN does not guarantee a perfect snapshot. A key that exists for the whole iteration is returned at least once, but it may be returned more than once. A key that is added or deleted during the iteration may or may not be returned. This is because SCAN walks Redis's internal hash table incrementally without locking the database, and the table can change between calls.
Result
You may see some keys twice, and keys created or deleted during the scan may or may not appear.
Knowing SCAN's weak guarantees helps design robust applications that tolerate duplicates and keys that come and go.
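Client code can tolerate repeated keys either by making the per-key work idempotent or by filtering duplicates. A minimal sketch of the filtering approach follows; the helper name `unique` and the sample key stream are illustrative assumptions, not part of any Redis client.

```python
from typing import Iterable, Iterator, Set


def unique(keys: Iterable[str]) -> Iterator[str]:
    """Yield each key the first time it is seen and skip later repeats."""
    seen: Set[str] = set()
    for key in keys:
        if key not in seen:
            seen.add(key)
            yield key


# Simulated stream in which the scan returned two keys twice.
raw_stream = ["user:1", "user:2", "user:1", "cart:9", "user:2"]
deduped = list(unique(raw_stream))
print(deduped)  # ['user:1', 'user:2', 'cart:9']
```

The `seen` set grows with the number of distinct keys, so for very large keyspaces an idempotent operation (one that is harmless to repeat) is usually the cheaper choice.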
7
Expert: Internal hashing and cursor mechanics of SCAN
Before reading on: do you think SCAN's cursor is a simple index or something more complex? Commit to your answer.
Concept: How SCAN uses Redis's hash table and bit-reversal cursor internally.
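Redis stores keys in a hash table. SCAN's cursor is a 64-bit integer representing a position in this table. Rather than counting 0, 1, 2, and so on, the cursor is advanced by incrementing its bits in reverse order (highest-order bit first). This visits every hash bucket and keeps the iteration valid even if the table is resized between calls. Each call does only a small amount of work, which supports incremental iteration.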
Result
SCAN efficiently visits all keys without blocking, but order is unpredictable.
Understanding SCAN's internal cursor and hashing explains why iteration order is not fixed and why SCAN is fast and safe.
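The reverse-order increment can be sketched for a toy table. This is a simplified model, not Redis's code: it assumes a single table with `2 ** table_bits` buckets and ignores rehashing, and the function names are hypothetical.

```python
from typing import List


def next_cursor(cursor: int, table_bits: int) -> int:
    """Advance a cursor by incrementing its bits in reverse order.

    Simplified model for one table with 2**table_bits buckets (table_bits >= 1).
    """
    mask = (1 << table_bits) - 1
    # Reverse the bit string, add one, then reverse it back.
    reversed_bits = int(format(cursor & mask, f"0{table_bits}b")[::-1], 2)
    reversed_bits = (reversed_bits + 1) & mask
    return int(format(reversed_bits, f"0{table_bits}b")[::-1], 2)


def visit_order(table_bits: int) -> List[int]:
    """List the bucket indexes in the order the cursor visits them."""
    order: List[int] = []
    cursor = 0
    while True:
        order.append(cursor)
        cursor = next_cursor(cursor, table_bits)
        if cursor == 0:  # wrapped around: every bucket has been visited
            break
    return order


order = visit_order(3)
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Every bucket is visited exactly once, but not in numeric order, which is one reason the keys come back in an order that looks arbitrary to the client.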
Under the Hood
SCAN works by iterating over Redis's internal hash table that stores keys. It uses a cursor that points to a position in the hash table. Each SCAN call returns keys from the current cursor position and a new cursor for the next call. The cursor uses a bit-reversal pattern to cover all buckets without scanning the entire table at once. This incremental approach avoids blocking Redis and keeps memory use low.
Why designed this way?
Before SCAN, commands like KEYS blocked Redis by returning all keys at once, causing performance problems. SCAN was designed to allow safe, incremental iteration without locking or blocking. The bit-reversal cursor was chosen to efficiently cover the hash table in small steps. This design balances speed, safety, and simplicity.
┌───────────────┐
│ Redis Hash    │
│ Table buckets │
│ 0 1 2 3 ... N │
└──────┬────────┘
       │ SCAN cursor points here
       ▼
┌─────────────────────────────┐
│ SCAN call:                  │
│ - Reads keys at cursor pos  │
│ - Returns keys + new cursor │
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│ Next SCAN call uses new     │
│ cursor to continue scanning │
└─────────────────────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does SCAN guarantee to return all keys exactly once? Commit to yes or no.
Common Belief: SCAN returns every key exactly once in a fixed order.
Reality: SCAN may return some keys multiple times, keys added or removed during iteration may or may not be returned, and the order is not fixed.
Why it matters: Assuming perfect iteration can cause bugs if your code does not handle duplicates or missing keys.
Quick: Is SCAN faster than KEYS for small databases? Commit to yes or no.
Common Belief: SCAN is always faster than KEYS, no matter the database size.
Reality: For very small databases, KEYS can be faster because it returns all keys at once without multiple calls.
Why it matters: Using SCAN unnecessarily on small datasets can add complexity without benefit.
Quick: Does MATCH option in SCAN filter keys before scanning or after? Commit to your answer.
Common Belief: MATCH filters keys before scanning, so SCAN only visits matching keys internally.
Reality: MATCH filters keys after scanning; SCAN still scans all keys internally and then filters results.
Why it matters: Expecting MATCH to reduce internal scanning can lead to wrong performance assumptions.
Quick: Can you rely on SCAN to get a consistent snapshot of keys at one moment? Commit to yes or no.
Common Belief: SCAN provides a consistent snapshot of keys at the time iteration starts.
Reality: SCAN does not provide a snapshot; keys added or removed during iteration may or may not appear in the results.
Why it matters: Relying on SCAN for consistent snapshots can cause data inconsistency in your application.
Expert Zone
1
SCAN's cursor is a 64-bit integer that encodes the position in the hash table using bit-reversal, which is non-intuitive but critical for efficient iteration.
2
The COUNT option is a hint, not a guarantee; Redis may return more or fewer keys per call depending on internal state.
3
SCAN is safe for production but requires client code to handle duplicates and missing keys gracefully, which many beginners overlook.
When NOT to use
Do not use SCAN when you need a perfect snapshot of keys or guaranteed order. Instead, consider Redis keyspace notifications or maintaining your own index. For small datasets, KEYS is simpler and faster.
Production Patterns
In production, SCAN is used for background tasks like cache cleanup, monitoring, or migrating keys. It is combined with incremental processing and deduplication logic. Often, SCAN is wrapped in scripts or libraries that handle cursor management and retries.
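A cache-cleanup task of this kind might look like the sketch below. `unlink_matching` assumes a client object shaped like redis-py, using its `scan_iter(match=..., count=...)` and `unlink(*names)` methods. `FakeRedis` is a hypothetical in-memory stand-in included only so the example runs without a server; it does not model real cursor behavior.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterator, List


class FakeRedis:
    """Hypothetical stand-in exposing only the two methods used below."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)

    def scan_iter(self, match: str = "*", count: int = 10) -> Iterator[str]:
        # Copy the key list so deletions during iteration are safe here.
        for key in list(self.data):
            if fnmatchcase(key, match):
                yield key

    def unlink(self, *names: str) -> int:
        # Return how many of the named keys actually existed.
        return sum(1 for name in names if self.data.pop(name, None) is not None)


def unlink_matching(client, pattern: str, batch_size: int = 500) -> int:
    """Remove keys matching `pattern` in small batches; return the number removed."""
    removed = 0
    batch: List[str] = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            removed += client.unlink(*batch)
            batch.clear()
    if batch:  # flush the final partial batch
        removed += client.unlink(*batch)
    return removed


client = FakeRedis({"cache:a": "1", "cache:b": "2", "user:1": "x", "cache:c": "3"})
removed = unlink_matching(client, "cache:*", batch_size=2)
remaining = sorted(client.data)
print(removed, remaining)  # 3 ['user:1']
```

Removing a key is harmless to repeat, so a key that SCAN returns twice needs no deduplication here: the second attempt simply removes nothing.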
Connections
Cursor-based pagination
SCAN uses a cursor to paginate through keys, similar to how databases paginate query results.
Understanding cursor pagination in databases helps grasp SCAN's incremental iteration and why it avoids loading everything at once.
Hash tables
SCAN iterates over Redis's internal hash table buckets using a bit-reversal cursor.
Knowing how hash tables store data explains why SCAN's order is unpredictable and why it uses a special cursor.
Library book search
Like scanning shelves in small batches to find books, SCAN checks keys in small groups to avoid overload.
This real-world process shows why breaking big tasks into small steps keeps systems responsive.
Common Pitfalls
#1 Trying to get all keys with SCAN but ignoring the cursor and calling SCAN only once.
Wrong approach:
    SCAN 0 MATCH * COUNT 100
Correct approach:
    cursor = 0
    repeat {
        result = SCAN cursor MATCH * COUNT 100
        cursor = result.cursor
        process(result.keys)
    } while (cursor != 0)
Root cause: Misunderstanding that SCAN returns partial results and requires repeated calls with the updated cursor.
#2 Assuming SCAN returns keys in a fixed order and relying on that order in application logic.
Wrong approach:
    keys = []
    cursor = 0
    while True:
        cursor, batch = SCAN cursor
        keys.extend(batch)
        if cursor == 0: break
    # Then expecting keys to be sorted or stable
Correct approach: Process keys as they come without assuming order; if order is needed, sort keys after collection.
Root cause: Not knowing SCAN's iteration order is unpredictable and can change between calls.
#3 Using MATCH to reduce SCAN's internal workload, expecting it to speed up scanning.
Wrong approach:
    SCAN 0 MATCH user:* COUNT 100
Correct approach: Use MATCH to filter results, but understand SCAN still scans all keys internally; optimize by reducing keyspace size if needed.
Root cause: Believing MATCH filters keys before scanning rather than after.
Key Takeaways
SCAN is a safe way to iterate Redis keys incrementally without blocking the server.
It uses a cursor to remember position and returns keys in batches until the cursor is zero.
SCAN does not guarantee order or perfect snapshots; a key may be returned more than once, and keys added or deleted during iteration may or may not be returned.
MATCH filters results only after keys are examined, and COUNT is a hint for the work done per call, not a guaranteed batch size.
Proper use of SCAN requires handling partial results, duplicates, and incremental processing in your code.