Snapshot and restore in Elasticsearch - Time & Space Complexity
When working with Elasticsearch snapshots and restores, it's important to understand how the time to complete these operations changes as the data size grows.
We want to know how the execution time scales when taking or restoring snapshots of indices.
Analyze the time complexity of the following snapshot creation request.
PUT /_snapshot/my_backup/snapshot_1
{
  "indices": "logs-*",
  "ignore_unavailable": true,
  "include_global_state": false
}
This request creates a snapshot of all indices matching the "logs-*" pattern and saves their data to the my_backup repository.
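The same request can be prepared from a script. The sketch below only builds the HTTP request with Python's standard library and does not send it. The base URL http://localhost:9200 and the helper name build_snapshot_request are assumptions for illustration, not part of the lesson.

```python
import json
import urllib.request


def build_snapshot_request(base_url, repository, snapshot, indices):
    """Build (but do not send) a create-snapshot request.

    base_url is an assumed cluster address; adjust it for a real cluster.
    """
    body = {
        "indices": indices,
        "ignore_unavailable": True,
        "include_global_state": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repository}/{snapshot}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_snapshot_request(
    "http://localhost:9200", "my_backup", "snapshot_1", "logs-*"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; authentication and TLS settings are left out of this sketch.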
Let's find the main repeated work inside snapshot creation.
- Primary operation: Reading and copying data segments from each shard of the matched indices.
- How many times: Once per shard, for all shards in all matched indices.
As the number of shards and the size of data grow, the snapshot process reads more data.
| Input Size (n shards) | Approx. Operations (data read) |
|---|---|
| 10 | Reads data from 10 shards |
| 100 | Reads data from 100 shards |
| 1000 | Reads data from 1000 shards |
Pattern observation: The work grows roughly in direct proportion to the number of shards and their data size.
Time Complexity: O(n)
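This means the time to snapshot grows linearly with the number of shards and the amount of data they hold. This describes a first snapshot into an empty repository: Elasticsearch snapshots are incremental, so later snapshots copy only the segments that are not already in the repository.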
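To make the linear growth concrete, here is a toy cost model in Python. It assumes every shard's data is read exactly once, as for a first snapshot, and uses made-up shard sizes. The function name total_gb_read and the 5 GB figure are illustrative, not Elasticsearch measurements.

```python
def total_gb_read(shard_sizes_gb):
    """Total data read when each shard is copied exactly once."""
    return sum(shard_sizes_gb)


# Hypothetical clusters with equally sized 5 GB shards.
for shard_count in (10, 100, 1000):
    sizes = [5] * shard_count
    print(shard_count, "shards ->", total_gb_read(sizes), "GB read")
```

Ten times as many shards of the same size means ten times as much data to read, which is the O(n) pattern from the table.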
[X] Wrong: "Snapshot time is constant no matter how much data we have."
[OK] Correct: Snapshotting reads actual data from each shard, so more data means more time.
Understanding how snapshot and restore scale helps you design systems that handle backups efficiently and predict their impact on performance.
"What if we snapshot only a single index instead of multiple indices? How would the time complexity change?"
Practice
Solution
Step 1: Understand snapshot purpose
A snapshot in Elasticsearch is used to save a backup of your data at a point in time.
Step 2: Compare options
Options B, C, and D describe other Elasticsearch features, not snapshot backup.
Final Answer:
To save a backup of your data for recovery later -> Option A
Quick Check:
Snapshot = Backup [OK]
- Confusing snapshots with index templates
- Thinking snapshots speed up searches
- Assuming snapshots delete data
Solution
Step 1: Identify correct HTTP method for creating repository
Creating a snapshot repository uses the PUT method to define or update it.
Step 2: Check other methods
POST is used for actions such as restoring a snapshot (the _restore endpoint), GET retrieves information, and DELETE removes repositories.
Final Answer:
PUT /_snapshot/my_backup {"type": "fs", "settings": {"location": "/mount/backups"}} -> Option B
Quick Check:
Repository creation = PUT [OK]
- Using POST instead of PUT for repository creation
- Confusing GET with creation commands
- Trying to delete instead of create repository
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "index1,index2",
  "rename_pattern": "index(.*)",
  "rename_replacement": "restored_index$1"
}
What will be the name of the restored index originally named index2?
Solution
Step 1: Understand rename_pattern and rename_replacement
The pattern "index(.*)" captures the part after "index". The replacement "restored_index$1" adds "restored_index" plus the captured part.
Step 2: Apply to index2
For "index2", the captured part is "2", so the new name is "restored_index2".
Final Answer:
restored_index2 -> Option A
Quick Check:
Rename pattern + replacement = restored_index2 [OK]
- Ignoring rename_pattern and keeping original name
- Adding extra 'index' in the replacement
- Misplacing the captured group in new name
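The renaming can be checked with a short Python sketch. Elasticsearch evaluates rename_pattern as a Java regular expression, where $1 refers to the first capture group. Python's re module writes that reference as \g<1>, so the helper below translates it first. The helper name restored_name is an assumption, and the translation covers only simple numbered references such as $1.

```python
import re


def restored_name(index_name, rename_pattern, rename_replacement):
    """Approximate the restore rename for simple $N group references."""
    # Translate Java-style "$1" into Python-style "\g<1>".
    python_replacement = re.sub(
        r"\$(\d+)",
        lambda match: "\\g<" + match.group(1) + ">",
        rename_replacement,
    )
    return re.sub(rename_pattern, python_replacement, index_name)


print(restored_name("index2", "index(.*)", "restored_index$1"))  # restored_index2
```

Running the same helper on "index1" gives "restored_index1", so both indices in the request keep their numeric suffix.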
repository_missing_exception. What is the most likely cause?
Solution
Step 1: Understand repository_missing_exception meaning
This error means Elasticsearch cannot find the snapshot repository to access snapshots.
Step 2: Check other options
Snapshot name errors cause different exceptions; corrupted indices cause restore failures but not repository missing; version mismatch causes other errors.
Final Answer:
The snapshot repository does not exist or is not registered -> Option C
Quick Check:
repository_missing_exception = missing repository [OK]
- Assuming snapshot name typo causes repository_missing_exception
- Blaming corrupted indices for repository errors
- Ignoring repository setup before restore
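When a restore fails, the error type in the response body shows where to look. The Python sketch below inspects a sample error body. The sample is a hand-written illustration of the usual Elasticsearch error shape, not captured output, and the function name is_repository_missing is hypothetical.

```python
import json

# Illustrative error body, written by hand for this sketch.
SAMPLE_ERROR_BODY = json.dumps({
    "error": {
        "type": "repository_missing_exception",
        "reason": "[my_backup] missing",
    },
    "status": 404,
})


def is_repository_missing(response_body):
    """Return True when the error type is repository_missing_exception."""
    error = json.loads(response_body).get("error", {})
    return (
        isinstance(error, dict)
        and error.get("type") == "repository_missing_exception"
    )


print(is_repository_missing(SAMPLE_ERROR_BODY))
```

If the check is true, a natural next step is to list the registered repositories with GET /_snapshot and register the missing one before retrying the restore.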
{
  "indices": "logs-2023,metrics-2023",
  "rename_pattern": "(.*)-2023",
  "rename_replacement": "$1-restore"
}
Solution
Step 1: Analyze indices and rename_pattern
Indices "logs-2023" and "metrics-2023" match the pattern "(.*)-2023" capturing "logs" and "metrics".
Step 2: Apply rename_replacement
Replacement "$1-restore" changes names to "logs-restore" and "metrics-restore".
Step 3: Confirm only specified indices restored
Only indices listed in "indices" are restored, renamed as specified.
Final Answer:
Restores logs-2023 and metrics-2023 as logs-restore and metrics-restore -> Option D
Quick Check:
Indices filtered + renamed correctly = Restores logs-2023 and metrics-2023 as logs-restore and metrics-restore [OK]
- Restoring all snapshot indices ignoring filter
- Not using rename_pattern correctly
- Expecting renamed indexes to exist before restore
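The two steps, filtering and then renaming, can be simulated in Python. The snapshot contents below are hypothetical, since the quiz does not say which other indices the snapshot holds, and the helper name plan_restore is an assumption. Java's $1 is written as Python's \g<1>, and the sketch handles only exact names in "indices", not wildcards.

```python
import re


def plan_restore(snapshot_indices, indices, rename_pattern, python_replacement):
    """Map each index selected for restore to its new name."""
    wanted = set(indices.split(","))
    return {
        name: re.sub(rename_pattern, python_replacement, name)
        for name in snapshot_indices
        if name in wanted
    }


# Hypothetical snapshot contents; only the two listed indices are selected.
snapshot_indices = ["logs-2023", "metrics-2023", "users"]
plan = plan_restore(
    snapshot_indices, "logs-2023,metrics-2023", "(.*)-2023", r"\g<1>-restore"
)
print(plan)
```

The "users" index is absent from the plan because it is not listed in "indices", which matches Step 3 above.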
