etcd backup and recovery in Kubernetes - Commands & Configuration

Introduction
etcd is the key-value store that holds a Kubernetes cluster's configuration and state. If this data is lost or corrupted, the control plane cannot function. Regular backups and a tested restore procedure keep the cluster recoverable.
When you want to save a snapshot of your cluster state before making big changes.
When your Kubernetes cluster is not responding due to etcd data corruption.
When migrating your cluster data to a new etcd instance or server.
When you want to recover your cluster after accidental deletion of resources.
When performing disaster recovery drills to ensure cluster resilience.
Commands
This command creates a backup snapshot of the etcd database and saves it to /tmp/etcd-backup.db. It uses secure connection parameters to access etcd.
Terminal
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key
Expected Output
Snapshot saved at /tmp/etcd-backup.db
--endpoints - Specifies the etcd server address
--cacert - CA certificate for secure connection
--cert - Client certificate for authentication
--key - Client key for authentication
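A fixed path such as /tmp/etcd-backup.db holds only one backup at a time. As a minimal sketch, the function below wraps the same command and writes each snapshot to a timestamped file. The function name etcd_backup and the default directory /var/backups/etcd are illustrative assumptions, not part of etcdctl; the endpoint and certificate paths are copied from the command above.

```shell
# Sketch: save an etcd snapshot under a timestamped file name.
# etcd_backup and /var/backups/etcd are hypothetical names chosen for
# this example; adjust the endpoint and certificate paths to your cluster.
etcd_backup() {
  local backup_dir="${1:-/var/backups/etcd}"
  local snapshot="${backup_dir}/etcd-$(date +%Y%m%d-%H%M%S).db"

  mkdir -p "$backup_dir" || return 1

  # Send etcdctl's own messages to stderr so that stdout carries
  # only the snapshot path printed at the end.
  ETCDCTL_API=3 etcdctl snapshot save "$snapshot" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key >&2 || return 1

  printf '%s\n' "$snapshot"
}
```

Calling etcd_backup /mnt/backups would print the path of the new snapshot, which can then be passed to etcdctl snapshot status.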
This command checks the status and details of the saved snapshot to ensure it is valid.
Terminal
ETCDCTL_API=3 etcdctl snapshot status /tmp/etcd-backup.db
Expected Output
Four values describing the snapshot: hash, revision, total keys, and total size (add --write-out=table to print them under labeled columns)
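A status check is only useful if the file is actually there. The sketch below first confirms that the snapshot exists and is not empty, then asks etcdctl for its status in table form. The function name verify_snapshot is a hypothetical helper introduced for this example.

```shell
# Sketch: refuse to continue if the snapshot file is missing or empty,
# then print its status. verify_snapshot is a hypothetical helper name.
verify_snapshot() {
  local snapshot="$1"

  # -s is true only when the file exists and is larger than zero bytes.
  if [ ! -s "$snapshot" ]; then
    echo "snapshot missing or empty: $snapshot" >&2
    return 1
  fi

  # --write-out=table labels each column of the status output.
  ETCDCTL_API=3 etcdctl snapshot status "$snapshot" --write-out=table
}
```

For example, verify_snapshot /tmp/etcd-backup.db returns a non-zero exit code when the file is absent, which makes it easy to stop a backup script early.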
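Stop the Kubernetes API server so that nothing writes to etcd during the restore. The command below applies where kube-apiserver runs as a systemd service; on kubeadm clusters it runs as a static pod, so temporarily move its manifest out of /etc/kubernetes/manifests instead.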
Terminal
systemctl stop kube-apiserver
Expected Output
No output (command runs silently)
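Restore the etcd data from the backup snapshot into a new, empty data directory. On etcd 3.5 and later, etcdutl snapshot restore is the preferred equivalent of this command.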
Terminal
ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db --data-dir=/var/lib/etcd-new
Expected Output
Restored etcd snapshot to /var/lib/etcd-new
--data-dir - Directory to restore the etcd data into
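With etcd itself stopped, rename the current etcd data directory and move the restored directory into its place. Keeping the old directory gives you a fallback if the restore turns out to be bad.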
Terminal
mv /var/lib/etcd /var/lib/etcd-old && mv /var/lib/etcd-new /var/lib/etcd
Expected Output
No output (command runs silently)
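The one-line swap above fails on a second restore, because /var/lib/etcd-old already exists. As a sketch of a more defensive version, the function below checks that the restored directory is present and keeps the previous data under a timestamped name. The function name swap_etcd_data_dir is a hypothetical helper, and it assumes etcd is stopped while it runs.

```shell
# Sketch: replace the live etcd data directory with a restored one,
# keeping the previous data under a timestamped name.
# swap_etcd_data_dir is a hypothetical helper; run it only while etcd is stopped.
swap_etcd_data_dir() {
  local live="$1" restored="$2"
  local old="${live}-old-$(date +%Y%m%d-%H%M%S)"

  if [ ! -d "$restored" ]; then
    echo "restored directory not found: $restored" >&2
    return 1
  fi

  # Move the current data aside instead of deleting it.
  if [ -d "$live" ]; then
    mv "$live" "$old" || return 1
  fi

  mv "$restored" "$live"
}
```

With the paths used in this guide, the call would be swap_etcd_data_dir /var/lib/etcd /var/lib/etcd-new.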
Once etcd is running on the restored data, start the Kubernetes API server again.
Terminal
systemctl start kube-apiserver
Expected Output
No output (command runs silently)
Check the health of the etcd endpoint to confirm it is running properly after recovery.
Terminal
ETCDCTL_API=3 etcdctl endpoint health --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key
Expected Output
https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 1.234ms
--endpoints - Specifies the etcd server address
--cacert - CA certificate for secure connection
--cert - Client certificate for authentication
--key - Client key for authentication
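etcd may need a few seconds after a restart before it reports healthy, so a single check can fail even when the recovery worked. The sketch below repeats the same health command until it succeeds or a retry limit is reached. The function name wait_for_etcd_healthy and its default of 10 attempts, 3 seconds apart, are assumptions made for this example.

```shell
# Sketch: poll "etcdctl endpoint health" until it succeeds.
# wait_for_etcd_healthy and its defaults (10 attempts, 3 s apart)
# are hypothetical choices for this example.
wait_for_etcd_healthy() {
  local attempts="${1:-10}" delay="${2:-3}" i=1

  while [ "$i" -le "$attempts" ]; do
    # etcdctl exits non-zero when the endpoint is unhealthy or unreachable.
    if ETCDCTL_API=3 etcdctl endpoint health \
        --endpoints=https://127.0.0.1:2379 \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  echo "etcd not healthy after $attempts attempts" >&2
  return 1
}
```

Calling wait_for_etcd_healthy 20 5 would allow up to 20 checks, five seconds apart, before giving up.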
Key Concept

If you remember nothing else from this pattern, remember: always stop the Kubernetes API server before restoring etcd data to avoid corruption.

Common Mistakes
Trying to restore etcd snapshot while kube-apiserver is running
  Why it fails: This can cause data corruption or conflicts because the API server is actively using etcd data.
  Fix: Stop the kube-apiserver service before restoring etcd data.
Not using the correct certificates and keys when running etcdctl commands
  Why it fails: etcd uses secure communication and will reject commands without proper authentication.
  Fix: Always provide the correct --cacert, --cert, and --key flags pointing to valid certificates.
Overwriting the current etcd data directory without backing it up
  Why it fails: If the restore fails, you lose the original data and cannot recover.
  Fix: Rename or move the original data directory before replacing it with restored data.
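The certificate mistake is easy to catch before any etcdctl command runs. As a small sketch, the function below checks that the three files used throughout this guide exist and are readable. The function name check_etcd_certs is hypothetical, and the default directory is the one used in the commands above; your cluster may keep its certificates elsewhere.

```shell
# Sketch: confirm the CA certificate, client certificate, and client key
# are readable before calling etcdctl. check_etcd_certs is a hypothetical
# helper; the default directory matches the paths used in this guide.
check_etcd_certs() {
  local pki_dir="${1:-/etc/kubernetes/pki/etcd}" missing=0 f

  for f in ca.crt server.crt server.key; do
    if [ ! -r "$pki_dir/$f" ]; then
      echo "missing or unreadable: $pki_dir/$f" >&2
      missing=1
    fi
  done

  # Exit status 0 means all three files were found.
  return "$missing"
}
```

A backup script could start with check_etcd_certs || exit 1 so that a wrong path is reported by name rather than as a TLS error.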
Summary
Use etcdctl snapshot save to create a backup of your etcd data securely.
Stop the kube-apiserver before restoring etcd data to prevent conflicts.
Restore the snapshot to a new directory, then replace the old data directory safely.
Restart the kube-apiserver and verify etcd health to confirm successful recovery.

Practice

1. What is the primary purpose of taking an etcd backup in Kubernetes?
easy
A. To save the current state of the cluster data safely
B. To update the Kubernetes version automatically
C. To monitor cluster performance metrics
D. To delete old cluster data permanently

Solution

  1. Step 1: Understand etcd role in Kubernetes

    etcd stores all cluster data including configuration and state.
  2. Step 2: Purpose of backup

    Backing up etcd saves this data so it can be restored if lost or corrupted.
  3. Final Answer:

    To save the current state of the cluster data safely -> Option A
  4. Quick Check:

    Backup = Save cluster data [OK]
Hint: Backup means saving cluster data safely [OK]
Common Mistakes:
  • Confusing backup with updating Kubernetes
  • Thinking backup monitors performance
  • Assuming backup deletes data
2. Which of the following is the correct command to create an etcd snapshot backup?
easy
A. etcdctl save snapshot backup.db
B. etcdctl backup create backup.db
C. etcdctl snapshot create backup.db
D. etcdctl snapshot save backup.db

Solution

  1. Step 1: Recall etcdctl snapshot save syntax

    The correct command to save a snapshot is etcdctl snapshot save <file>.
  2. Step 2: Compare options

    Only etcdctl snapshot save backup.db matches the exact syntax.
  3. Final Answer:

    etcdctl snapshot save backup.db -> Option D
  4. Quick Check:

    Snapshot save = create backup [OK]
Hint: Use 'etcdctl snapshot save' to backup [OK]
Common Mistakes:
  • Using 'backup create' instead of 'snapshot save'
  • Mixing 'create' and 'save' commands
  • Incorrect command order
3. What does the following command do if the backup file backup.db exists and is valid?

etcdctl snapshot restore backup.db --data-dir restored-etcd
medium
A. Restores the snapshot data into the directory 'restored-etcd'
B. Creates a new snapshot named 'restored-etcd'
C. Deletes the existing backup.db file
D. Shows an error that the file does not exist

Solution

  1. Step 1: Understand snapshot restore command

    The command restores data from a snapshot file into a specified data directory.
  2. Step 2: Analyze given command

    It uses backup.db as source and restores into restored-etcd directory.
  3. Final Answer:

    Restores the snapshot data into the directory 'restored-etcd' -> Option A
  4. Quick Check:

    Restore command = recover data to directory [OK]
Hint: Restore puts data into given directory [OK]
Common Mistakes:
  • Thinking it creates a new snapshot
  • Assuming it deletes backup files
  • Expecting error when file exists
4. You ran etcdctl snapshot save backup.db but the command failed with an error: etcdctl: command not found. What is the most likely cause?
medium
A. The command syntax is incorrect
B. The etcdctl tool is not installed or not in the system PATH
C. The etcd server is down and cannot create a snapshot
D. The backup.db file already exists and cannot be overwritten

Solution

  1. Step 1: Analyze error message

    The error 'command not found' means the system cannot find the etcdctl program.
  2. Step 2: Identify cause

    This usually happens if etcdctl is not installed or not in the PATH environment variable.
  3. Final Answer:

    The etcdctl tool is not installed or not in the system PATH -> Option B
  4. Quick Check:

    Command not found = tool missing or PATH issue [OK]
Hint: Command not found means tool missing or PATH error [OK]
Common Mistakes:
  • Assuming file overwrite causes command not found
  • Blaming etcd server status for command not found
  • Thinking syntax error causes command not found
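The PATH problem from this question can be detected up front. The sketch below uses the shell built-in command -v to test whether a tool can be found before a script depends on it. The function name require_command is a hypothetical helper.

```shell
# Sketch: fail early with a clear message when a required tool is not
# on the PATH. require_command is a hypothetical helper name.
require_command() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "$1: not found in PATH ($PATH)" >&2
    return 1
  fi
}
```

For example, require_command etcdctl || exit 1 at the top of a backup script stops it before any snapshot step is attempted.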
5. You want to recover your Kubernetes cluster after a failure using an etcd snapshot. Which sequence of commands correctly restores the cluster data and starts etcd with the restored data?
hard
A. systemctl restart etcd && etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored
B. etcdctl snapshot save backup.db && systemctl stop etcd
C. etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored && systemctl restart etcd
D. etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored && systemctl stop etcd

Solution

  1. Step 1: Restore snapshot to a new data directory

    Use etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored to recover data safely without overwriting live data.
  2. Step 2: Restart etcd service to use restored data

    Restarting etcd with systemctl restart etcd loads the restored data, provided etcd's --data-dir setting points at /var/lib/etcd-restored.
  3. Final Answer:

    etcdctl snapshot restore backup.db --data-dir /var/lib/etcd-restored && systemctl restart etcd -> Option C
  4. Quick Check:

    Restore then restart etcd = recovery [OK]
Hint: Restore snapshot first, then restart etcd service [OK]
Common Mistakes:
  • Restarting etcd before restoring snapshot
  • Stopping etcd without restarting after restore
  • Saving snapshot instead of restoring