
etcd backup and recovery in Kubernetes - Step-by-Step Execution

Process Flow - etcd backup and recovery
1. Start etcd backup
2. Run etcdctl snapshot save
3. Verify the snapshot file
4. Store the snapshot safely
5. Is recovery needed? If no, the flow ends here; if yes, continue:
6. Stop the etcd service
7. Run etcdctl snapshot restore
8. Restart the etcd service
9. Verify etcd health
10. Recovery complete
This flow shows how to create a backup snapshot of etcd, store it, and restore it if needed to recover the cluster state.
Execution Sample
ETCDCTL_API=3 etcdctl snapshot save backup.db
systemctl stop etcd
ETCDCTL_API=3 etcdctl snapshot restore backup.db --data-dir restored_etcd
mv /var/lib/etcd /var/lib/etcd-old
mv restored_etcd /var/lib/etcd
systemctl start etcd
ETCDCTL_API=3 etcdctl endpoint health
This sequence saves a snapshot, stops the etcd service, restores the snapshot to a new data directory, moves the old data directory aside as a backup, swaps in the restored directory, starts etcd again, and checks its health.
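On real clusters `etcdctl` usually needs endpoint and TLS flags as well. The sketch below builds the backup command using the default kubeadm certificate paths (an assumption; adjust them for your environment) and defaults to a dry run, so it can be reviewed before being pointed at a live cluster:

```shell
#!/usr/bin/env bash
# Backup sketch. The certificate paths are kubeadm defaults and the
# DRY_RUN switch is our own addition -- both are assumptions, not part
# of the original walkthrough.
set -euo pipefail

SNAPSHOT="${SNAPSHOT:-/opt/backup/etcd-$(date +%Y%m%d-%H%M%S).db}"
CMD=(etcdctl snapshot save "$SNAPSHOT"
     --endpoints=https://127.0.0.1:2379
     --cacert=/etc/kubernetes/pki/etcd/ca.crt
     --cert=/etc/kubernetes/pki/etcd/server.crt
     --key=/etc/kubernetes/pki/etcd/server.key)

if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "DRY RUN: ETCDCTL_API=3 ${CMD[*]}"   # print the command only
else
  ETCDCTL_API=3 "${CMD[@]}"                 # actually take the snapshot
fi
```

Run it with `DRY_RUN=0` once the endpoint and certificate paths match your cluster.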
Process Table
| Step | Command | Action | Result/Output |
|------|---------|--------|---------------|
| 1 | ETCDCTL_API=3 etcdctl snapshot save backup.db | Create snapshot file | Snapshot saved to backup.db |
| 2 | ls backup.db | Verify snapshot file exists | backup.db listed |
| 3 | systemctl stop etcd | Stop etcd service before restore | etcd service stopped |
| 4 | ETCDCTL_API=3 etcdctl snapshot restore backup.db --data-dir restored_etcd | Restore snapshot to new data directory | Snapshot restored to restored_etcd |
| 5 | mv /var/lib/etcd /var/lib/etcd-old | Back up old data directory | Old data backed up |
| 6 | mv restored_etcd /var/lib/etcd | Replace data directory with restored data | Data directory replaced |
| 7 | systemctl start etcd | Start etcd service | etcd service started |
| 8 | ETCDCTL_API=3 etcdctl endpoint health | Check etcd health | endpoint is healthy |
| 9 | - | Recovery complete | etcd cluster restored and healthy |
💡 Recovery ends once the etcd service is healthy and running with the restored data
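Step 2's `ls` only proves the file exists. A slightly stronger pre-restore check, sketched here as a hypothetical helper (the name `check_snapshot` is ours), also refuses empty files before etcd is ever stopped:

```shell
#!/usr/bin/env bash
# check_snapshot: fail fast if the snapshot is missing or empty, so the
# restore is aborted before the etcd service is stopped.
set -euo pipefail

check_snapshot() {
  local snap="$1"
  if [ ! -s "$snap" ]; then
    echo "snapshot '$snap' is missing or empty; aborting restore" >&2
    return 1
  fi
  echo "snapshot '$snap' looks usable ($(wc -c < "$snap") bytes)"
}

# For a deeper integrity check (hash, revision, key count):
#   ETCDCTL_API=3 etcdctl snapshot status backup.db
# (recent etcd releases expose the same check via etcdutl)
```

Typical use before step 3: `check_snapshot backup.db && systemctl stop etcd`.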
Status Tracker
| Variable | Start | After Step 1 | After Step 4 | After Step 7 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| snapshot_file | none | backup.db created | backup.db unchanged | backup.db unchanged | backup.db unchanged |
| etcd_service | running | running | stopped | started | running |
| data_directory | /var/lib/etcd | /var/lib/etcd | restored_etcd (new) | /var/lib/etcd (restored) | /var/lib/etcd (restored) |
| etcd_health | healthy | healthy | unknown (stopped) | unknown (starting) | healthy |
Key Moments - 3 Insights
Why do we stop the etcd service before restoring the snapshot?
Stopping etcd ensures no writes happen during the restore, preventing corruption. See step 3 in the process table, where etcd is stopped before the restore.
What happens if the snapshot file is missing or corrupted?
The restore command will fail and the cluster cannot be recovered from that file. Step 2 verifies the snapshot exists to catch this early.
Why do we move the old data directory before replacing it with restored data?
This backup step prevents data loss if the restore fails, allowing a rollback. See step 5 in the process table.
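The last insight is what makes rollback possible. If the restored cluster turns out to be unhealthy, the preserved directory can be swapped back; the sketch below is hypothetical (the `rollback_etcd` name, and the assumption that etcd runs as a systemd unit, are ours):

```shell
#!/usr/bin/env bash
# rollback_etcd: undo a failed restore by putting the directory saved in
# step 5 (/var/lib/etcd-old) back in place. Destructive: it discards the
# restored data, so only call it after deciding the restore has failed.
set -euo pipefail

rollback_etcd() {
  systemctl stop etcd
  rm -rf /var/lib/etcd               # drop the failed restore
  mv /var/lib/etcd-old /var/lib/etcd # bring back the pre-restore data
  systemctl start etcd
  ETCDCTL_API=3 etcdctl endpoint health
}
```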
Visual Quiz - 3 Questions
Test your understanding
Question 1: Looking at the execution table, what is the state of etcd_service after step 4?
A. running
B. stopped
C. starting
D. unknown
💡 Hint: Check the 'etcd_service' row of the status tracker after step 4.
Question 2: At which step is the snapshot file created?
A. Step 1
B. Step 3
C. Step 5
D. Step 7
💡 Hint: Look at the step 1 command and result in the process table.
Question 3: If we skip moving the old data directory (step 5), what risk increases?
A. etcd service won't start
B. Snapshot file will be deleted
C. No backup of old data, risking data loss
D. etcd health check will fail
💡 Hint: See the key moments explanation about step 5.
Concept Snapshot
etcd backup and recovery:
- Use 'etcdctl snapshot save <file>' to backup
- Stop etcd before restore
- Restore with 'etcdctl snapshot restore <file> --data-dir <dir>'
- Replace old data dir with restored dir
- Restart etcd and verify health
- Always keep backup of old data before restore
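The checklist above can be strung together into one script. This is a sketch under the same assumptions as the walkthrough (etcd managed by systemd, data in /var/lib/etcd); on kubeadm clusters etcd runs as a static pod instead, so the systemctl steps would differ:

```shell
#!/usr/bin/env bash
# restore_etcd: stop -> restore -> swap data dirs -> start -> health
# check, mirroring steps 3-8 of the process table. Defined as a function
# so the destructive steps only run when explicitly invoked.
set -euo pipefail

restore_etcd() {
  local snapshot="$1"
  local data_dir="/var/lib/etcd"
  # Target directory must not exist yet; snapshot restore creates it.
  local restore_dir="${data_dir}-restored-$(date +%s)"

  systemctl stop etcd
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir "$restore_dir"
  mv "$data_dir" "${data_dir}-old"        # step 5: keep a rollback copy
  mv "$restore_dir" "$data_dir"           # step 6: swap in restored data
  systemctl start etcd
  ETCDCTL_API=3 etcdctl endpoint health   # step 8: confirm recovery
}
```

Invoked as `restore_etcd backup.db` on the etcd host, with root privileges.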
Full Transcript
This visual execution shows how to backup and recover etcd data in Kubernetes. First, we create a snapshot file using 'etcdctl snapshot save'. We verify the snapshot exists. Before restoring, we stop the etcd service to avoid data corruption. Then we restore the snapshot to a new data directory. We backup the old data directory by moving it, then replace it with the restored data. After that, we start the etcd service again and check its health to confirm recovery success. Key points include stopping etcd before restore and backing up old data to prevent loss. The execution table and variable tracker clearly show each step and state change for easy understanding.