Snapshot and restore in Elasticsearch - Time Complexity
When working with Elasticsearch snapshots and restores, it's important to understand how the time to complete these operations changes as the data size grows.
We want to know how the execution time scales when taking or restoring snapshots of indices.
Analyze the time complexity of the following snapshot creation request.
```
PUT /_snapshot/my_backup/snapshot_1
{
  "indices": "logs-*",
  "ignore_unavailable": true,
  "include_global_state": false
}
```
This request creates a snapshot of every index matching the "logs-*" pattern, copying its data to the registered repository (`my_backup`).
Let's find the main repeated work inside snapshot creation.
- Primary operation: Reading and copying data segments from each shard of the matched indices.
- How many times: Once per shard, for all shards in all matched indices.
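Since the unit of work is the shard, you can estimate n before snapshotting by listing the shards the pattern matches. A sketch using the standard cat shards API (the `logs-*` pattern is taken from the request above); each row returned is one shard the snapshot must read:

```
GET /_cat/shards/logs-*?v
```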
As the number of shards and the size of data grow, the snapshot process reads more data.
| Input Size (n shards) | Approx. Operations (data read) |
|---|---|
| 10 | Reads data from 10 shards |
| 100 | Reads data from 100 shards |
| 1000 | Reads data from 1000 shards |
Pattern observation: The work grows roughly in direct proportion to the number of shards and their data size.
Time Complexity: O(n)
This means snapshot time grows linearly with the number of shards and the amount of data they hold. (In practice, Elasticsearch snapshots are incremental: after the first snapshot, only segments created since the previous snapshot are copied, so repeat snapshots of slowly changing indices complete much faster.)
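Restores scale the same way: each shard's segments must be copied back from the repository, so restore time is also roughly linear in shard count and data size. A minimal restore request for the snapshot above might look like this (repository and snapshot names are the ones used earlier):

```
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "logs-*",
  "include_global_state": false
}
```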
[X] Wrong: "Snapshot time is constant no matter how much data we have."
[OK] Correct: Snapshotting reads actual data from each shard, so more data means more time.
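Because the work is done shard by shard, you can observe this linear progress directly. The snapshot status API reports per-shard state and byte counts while a snapshot runs (a sketch, assuming the repository and snapshot names from the request above):

```
GET /_snapshot/my_backup/snapshot_1/_status
```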
Understanding how snapshot and restore scale helps you design systems that handle backups efficiently and predict their impact on performance.
"What if we snapshot only a single index instead of multiple indices? How would the time complexity change?"