
etcd backup and recovery in Kubernetes - Step-by-Step Execution

Process Flow - etcd backup and recovery
  1. Start etcd backup
  2. Run etcdctl snapshot save
  3. Verify snapshot file
  4. Store snapshot safely
  5. Recovery needed?
     No: End
     Yes: continue with step 6
  6. Stop etcd service
  7. Run etcdctl snapshot restore
  8. Restart etcd service
  9. Verify etcd health
  10. Recovery complete
This flow shows how to create a backup snapshot of etcd, store it, and restore it if needed to recover the cluster state.
Execution Sample
ETCDCTL_API=3 etcdctl snapshot save backup.db
systemctl stop etcd
ETCDCTL_API=3 etcdctl snapshot restore backup.db --data-dir restored_etcd
mv /var/lib/etcd /var/lib/etcd-old
mv restored_etcd /var/lib/etcd
systemctl start etcd
ETCDCTL_API=3 etcdctl endpoint health
This sequence saves a snapshot, stops etcd service, restores it to a new data directory, backs up and replaces the old data directory, starts etcd, and checks health.
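The sample above assumes etcdctl can reach etcd without extra flags. On many Kubernetes clusters etcd requires TLS client certificates, so the save command needs an endpoint and certificate paths. The following is a minimal sketch, not a definitive script: the function name etcd_backup and the DRY_RUN switch are hypothetical, and the certificate paths are an assumption based on common kubeadm defaults, so check them against your own cluster.

```shell
#!/bin/sh
# Sketch: snapshot save with an explicit endpoint and TLS flags.
# ASSUMPTION: the default paths below match a kubeadm-style layout; override them as needed.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_CACERT="${ETCD_CACERT:-/etc/kubernetes/pki/etcd/ca.crt}"
ETCD_CERT="${ETCD_CERT:-/etc/kubernetes/pki/etcd/server.crt}"
ETCD_KEY="${ETCD_KEY:-/etc/kubernetes/pki/etcd/server.key}"

# etcd_backup <snapshot-file>   (hypothetical helper)
# With DRY_RUN=1 the command is printed instead of executed.
etcd_backup() {
    snapshot_file="$1"
    # Rebuild the positional parameters as the full etcdctl command line.
    set -- etcdctl \
        --endpoints="$ETCD_ENDPOINT" \
        --cacert="$ETCD_CACERT" \
        --cert="$ETCD_CERT" \
        --key="$ETCD_KEY" \
        snapshot save "$snapshot_file"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ETCDCTL_API=3 $*"
    else
        ETCDCTL_API=3 "$@"
    fi
}
```

Setting DRY_RUN=1 and calling etcd_backup backup.db prints the full command, which is a safe way to review the flags before running it for real.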
Process Table
Step | Command | Action | Result/Output
1 | ETCDCTL_API=3 etcdctl snapshot save backup.db | Create snapshot file | Snapshot saved to backup.db
2 | ls backup.db | Verify snapshot file exists | backup.db listed
3 | systemctl stop etcd | Stop etcd service before restore | etcd service stopped
4 | ETCDCTL_API=3 etcdctl snapshot restore backup.db --data-dir restored_etcd | Restore snapshot to new data directory | Snapshot restored to restored_etcd
5 | mv /var/lib/etcd /var/lib/etcd-old | Backup old data directory | Old data backed up
6 | mv restored_etcd /var/lib/etcd | Replace data directory with restored data | Data directory replaced
7 | systemctl start etcd | Start etcd service | etcd service started
8 | ETCDCTL_API=3 etcdctl endpoint health | Check etcd health | endpoint is healthy
9 | - | Recovery complete | etcd cluster restored and healthy
💡 Recovery ends after etcd service is healthy and running with restored data
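Right after step 7 etcd may need a few seconds before it reports healthy, so a single health check in step 8 can fail even though the restore worked. One way to handle this is to retry the check a few times. The helper below is a generic sketch; the name wait_for and its arguments are hypothetical and not part of etcdctl.

```shell
#!/bin/sh
# wait_for <attempts> <delay-seconds> <command...>   (hypothetical helper)
# Runs the command until it succeeds or the attempts are used up.
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Success on any attempt ends the loop immediately.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt is still coming.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example use after 'systemctl start etcd' (not executed here):
#   wait_for 10 3 env ETCDCTL_API=3 etcdctl endpoint health
```

The attempt count and delay in the example comment are arbitrary; pick values that match how long etcd usually takes to start on your nodes.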
Status Tracker
Variable | Start | After Step 1 | After Step 4 | After Step 7 | Final
snapshot_file | none | backup.db created | backup.db unchanged | backup.db unchanged | backup.db unchanged
etcd_service | running | running | stopped | started | running
data_directory | /var/lib/etcd | /var/lib/etcd | restored_etcd (new) | /var/lib/etcd (restored) | /var/lib/etcd (restored)
etcd_health | healthy | healthy | unknown (stopped) | unknown (starting) | healthy
Key Moments - 3 Insights
Why do we stop the etcd service before restoring the snapshot?
Stopping etcd ensures that nothing writes to the data directory while it is being replaced, which prevents corruption. See step 3 in the Process Table, where etcd stops before the restore.
What happens if the snapshot file is missing or corrupted?
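The restore command fails and etcd cannot be recovered from that file. Step 2 confirms that the snapshot file exists, but ls checks existence only, not integrity; 'etcdctl snapshot status backup.db' (or etcdutl on newer releases) reports the snapshot's hash and size for a stronger check.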
Why do we move the old data directory before replacing it with restored data?
Moving the old directory aside keeps a copy to roll back to if the restored data turns out to be unusable. See step 5 in the Process Table.
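The second insight above can be turned into a small pre-restore check. This is a sketch under assumptions: the function names snapshot_present and verify_snapshot are hypothetical, and the integrity step relies on 'snapshot status', which lives in etcdutl on etcd 3.5 and later and in etcdctl on older releases. If neither tool is installed, only the file check runs.

```shell
#!/bin/sh
# snapshot_present <file>   (hypothetical helper)
# Succeeds only if the file exists and is not empty.
snapshot_present() {
    if [ ! -s "$1" ]; then
        echo "snapshot missing or empty: $1" >&2
        return 1
    fi
}

# verify_snapshot <file>   (hypothetical helper)
# File check first, then an integrity check if a suitable tool is on the PATH.
verify_snapshot() {
    snapshot_file="$1"
    snapshot_present "$snapshot_file" || return 1
    if command -v etcdutl >/dev/null 2>&1; then
        etcdutl snapshot status "$snapshot_file" --write-out=table
    elif command -v etcdctl >/dev/null 2>&1; then
        ETCDCTL_API=3 etcdctl snapshot status "$snapshot_file" --write-out=table
    else
        echo "etcdutl/etcdctl not found; integrity check skipped" >&2
    fi
}
```

Running verify_snapshot backup.db before stopping etcd means a bad snapshot is caught while the cluster is still serving from its current data.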
Visual Quiz - 3 Questions
Test your understanding
Look at the Status Tracker. What is the state of etcd_service after step 4?
A. running
B. stopped
C. starting
D. unknown
💡 Hint
Check the 'etcd_service' row in the Status Tracker, column 'After Step 4'
At which step is the snapshot file created?
A. Step 1
B. Step 3
C. Step 5
D. Step 7
💡 Hint
Look at the command and result for step 1 in the Process Table
If we skip moving the old data directory (step 5), what risk increases?
A. etcd service won't start
B. Snapshot file will be deleted
C. No backup of old data, risking data loss
D. etcd health check will fail
💡 Hint
Refer to the Key Moments explanation of step 5
Concept Snapshot
etcd backup and recovery:
- Use 'etcdctl snapshot save <file>' to back up
- Stop etcd before restore
- Restore with 'etcdctl snapshot restore <file> --data-dir <dir>'
- Replace old data dir with restored dir
- Restart etcd and verify health
- Always keep a backup of the old data directory before replacing it
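The last two points in the list above (replace the data directory, keep the old one) can be combined into one step that rolls back if the replacement fails. This is only a sketch: swap_data_dir is a hypothetical helper, and the timestamped backup name is a choice made here, not something the original steps require.

```shell
#!/bin/sh
# swap_data_dir <live-dir> <restored-dir>   (hypothetical helper)
# Moves the live directory aside, moves the restored one into place,
# and prints the path of the backup copy. Run it only while etcd is stopped.
swap_data_dir() {
    live_dir="$1"
    restored_dir="$2"
    if [ ! -d "$restored_dir" ]; then
        echo "restored directory not found: $restored_dir" >&2
        return 1
    fi
    backup_dir=""
    if [ -d "$live_dir" ]; then
        backup_dir="${live_dir}-old-$(date +%Y%m%d%H%M%S)"
        mv "$live_dir" "$backup_dir" || return 1
    fi
    if ! mv "$restored_dir" "$live_dir"; then
        # Roll back: put the old data back where etcd expects it.
        if [ -n "$backup_dir" ]; then
            mv "$backup_dir" "$live_dir"
        fi
        return 1
    fi
    if [ -n "$backup_dir" ]; then
        echo "$backup_dir"
    fi
}
```

For the paths used in this walkthrough the call would be swap_data_dir /var/lib/etcd restored_etcd, which leaves the previous data under a /var/lib/etcd-old-<timestamp> directory.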
Full Transcript
This visual execution shows how to back up and recover etcd data in Kubernetes. First, we create a snapshot file using 'etcdctl snapshot save'. We verify that the snapshot exists. Before restoring, we stop the etcd service to avoid data corruption. Then we restore the snapshot to a new data directory. We back up the old data directory by moving it, then replace it with the restored data. After that, we start the etcd service again and check its health to confirm that recovery succeeded. Key points include stopping etcd before the restore and backing up the old data to prevent loss. The Process Table and Status Tracker show each step and state change.

Practice

1. What is the primary purpose of taking an etcd backup in Kubernetes?
easy
A. To save the current state of the cluster data safely
B. To update the Kubernetes version automatically
C. To monitor cluster performance metrics
D. To delete old cluster data permanently

Solution

  1. Step 1: Understand etcd role in Kubernetes

    etcd stores all cluster data including configuration and state.
  2. Step 2: Purpose of backup

    Backing up etcd saves this data so it can be restored if lost or corrupted.
  3. Final Answer:

    To save the current state of the cluster data safely -> Option A
  4. Quick Check:

    Backup = Save cluster data [OK]
Hint: Backup means saving cluster data safely [OK]
Common Mistakes:
  • Confusing backup with updating Kubernetes
  • Thinking backup monitors performance
  • Assuming backup deletes data
2. Which of the following is the correct command to create an etcd snapshot backup?
easy
A. etcdctl save snapshot backup.db
B. etcdctl backup create backup.db
C. etcdctl snapshot create backup.db
D. etcdctl snapshot save backup.db

Solution

  1. Step 1: Recall etcdctl snapshot save syntax

    The correct command to save a snapshot is etcdctl snapshot save <file>.
  2. Step 2: Compare options

    Only etcdctl snapshot save backup.db matches the exact syntax.
  3. Final Answer:

    etcdctl snapshot save backup.db -> Option D
  4. Quick Check:

    Snapshot save = create backup [OK]
Hint: Use 'etcdctl snapshot save' to backup [OK]
Common Mistakes:
  • Using 'backup create' instead of 'snapshot save'
  • Mixing 'create' and 'save' commands
  • Incorrect command order
3. What does the following command do if the backup file backup.db exists and is valid?

etcdctl snapshot restore backup.db --data-dir restored-etcd
medium
A. Restores the snapshot data into the directory 'restored-etcd'
B. Creates a new snapshot named 'restored-etcd'
C. Deletes the existing backup.db file
D. Shows an error that the file does not exist

Solution

  1. Step 1: Understand snapshot restore command

    The command restores data from a snapshot file into a specified data directory.
  2. Step 2: Analyze given command

    It uses backup.db as source and restores into restored-etcd directory.
  3. Final Answer:

    Restores the snapshot data into the directory 'restored-etcd' -> Option A
  4. Quick Check:

    Restore command = recover data to directory [OK]
Hint: Restore puts data into given directory [OK]
Common Mistakes:
  • Thinking it creates a new snapshot
  • Assuming it deletes backup files
  • Expecting error when file exists
4. You ran etcdctl snapshot save backup.db but the command failed with an error: etcdctl: command not found. What is the most likely cause?
medium
A. The command syntax is incorrect
B. The etcdctl tool is not installed or not in the system PATH
C. The etcd server is down and cannot create a snapshot
D. The backup.db file already exists and cannot be overwritten

Solution

  1. Step 1: Analyze error message

    The error 'command not found' means the system cannot find the etcdctl program.
  2. Step 2: Identify cause

    This usually happens if etcdctl is not installed or not in the PATH environment variable.
  3. Final Answer:

    The etcdctl tool is not installed or not in the system PATH -> Option B
  4. Quick Check:

    Command not found = tool missing or PATH issue [OK]
Hint: Command not found means tool missing or PATH error [OK]
Common Mistakes:
  • Assuming file overwrite causes command not found
  • Blaming etcd server status for command not found
  • Thinking syntax error causes command not found
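A quick way to tell a missing tool apart from other failures is to check the PATH before running the backup. The sketch below uses the standard 'command -v' lookup; the function name require_tool is hypothetical.

```shell
#!/bin/sh
# require_tool <name>   (hypothetical helper)
# Returns 0 if the tool is on the PATH, 127 otherwise
# (127 is the status a shell reports for "command not found").
require_tool() {
    tool="$1"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool not found in PATH: $PATH" >&2
        return 127
    fi
}

# Example use before a backup (not executed here):
#   require_tool etcdctl && ETCDCTL_API=3 etcdctl snapshot save backup.db
```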
5. You want to recover your Kubernetes cluster after a failure using an etcd snapshot. Which sequence of commands correctly restores the cluster data and starts etcd with the restored data?
hard
A. systemctl restart etcd && etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored
B. etcdctl snapshot save backup.db && systemctl stop etcd
C. etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored && systemctl restart etcd
D. etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored && systemctl stop etcd

Solution

  1. Step 1: Restore snapshot to a new data directory

    Use etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored to recover data safely without overwriting live data.
  2. Step 2: Restart etcd service to use restored data

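    Restarting etcd with systemctl restart etcd loads the restored data, provided etcd's --data-dir setting points at /var/lib/etcd-restored or the restored directory has been moved into the configured path.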
  3. Final Answer:

    etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored && systemctl restart etcd -> Option C
  4. Quick Check:

    Restore then restart etcd = recovery [OK]
Hint: Restore snapshot first, then restart etcd service [OK]
Common Mistakes:
  • Restarting etcd before restoring snapshot
  • Stopping etcd without restarting after restore
  • Saving snapshot instead of restoring
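The full recovery order from this lesson (stop, restore, swap, start, check) can be sketched as one function that stops at the first failing step. Treat it as an illustration under assumptions: restore_etcd and run are hypothetical names, etcd is assumed to run as a systemd service with a single member, and DRY_RUN defaults to 1 so the steps are printed rather than executed. Multi-member clusters need extra restore flags, and newer etcd releases move 'snapshot restore' to etcdutl.

```shell
#!/bin/sh
export ETCDCTL_API=3
DRY_RUN="${DRY_RUN:-1}"   # default: print the steps only

# run <command...>   (hypothetical helper)
# Prints the command in dry-run mode, executes it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_etcd <snapshot-file> [data-dir]   (hypothetical helper)
# Each step runs only if the previous one succeeded.
restore_etcd() {
    snapshot="$1"
    data_dir="${2:-/var/lib/etcd}"
    run systemctl stop etcd &&
    run etcdctl snapshot restore "$snapshot" --data-dir "${data_dir}-restored" &&
    run mv "$data_dir" "${data_dir}-old" &&
    run mv "${data_dir}-restored" "$data_dir" &&
    run systemctl start etcd &&
    run etcdctl endpoint health
}
```

Calling restore_etcd backup.db in dry-run mode lists the six commands in order, which makes it easy to compare them with the Process Table before setting DRY_RUN=0.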