Failover Routing in Route 53: How It Works and When to Use It
Failover routing in Amazon Route 53 directs traffic to a backup resource when the primary resource becomes unhealthy. It keeps a website or application reachable by switching DNS responses from the primary endpoint to a secondary endpoint when the primary fails.
How It Works
Failover routing in Route 53 works like a safety net for your website or app. Imagine you have a main store and a backup store. If the main store closes unexpectedly, you want customers to go to the backup store instead. Route 53 does this by checking whether your main server is healthy. If it detects a problem, it starts answering DNS queries with the backup server's address, so new visitors are sent there instead.
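Route 53 does this with a health check attached to the primary record. The health check is marked unhealthy after a set number of consecutive failed checks (the failure threshold), and Route 53 then answers DNS queries with the secondary record instead. The switch is not instant: with the default 30-second request interval and a failure threshold of 3, detection takes roughly 90 seconds, and resolvers that cached the previous answer keep using it until the record's TTL expires. A low TTL, such as the 60 seconds used in the example below, keeps that window short.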
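To make the failure-threshold logic concrete, here is a small, hypothetical Bash simulation. It is not Route 53 code and makes no AWS calls; it assumes a sequence of "ok"/"fail" probe results and mirrors the FailureThreshold=3 used in the example. The status flips only after that many consecutive failures (or, when recovering, successes), and each output line is the IP address a DNS query would receive at that point. The function name `simulate_failover` is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical, offline simulation of failover decisions; not Route 53 code.
# The status changes only after FAILURE_THRESHOLD consecutive failed (or passed)
# probes, which is how the FailureThreshold setting behaves for health checks.
PRIMARY_IP="192.0.2.1"
SECONDARY_IP="192.0.2.2"
FAILURE_THRESHOLD=3

# Usage: simulate_failover ok fail fail ...
# Prints, for each probe result, the IP a DNS query would be answered with.
simulate_failover() {
  local healthy=1 fails=0 passes=0 result
  for result in "$@"; do
    if [[ "$result" == "ok" ]]; then
      fails=0
      passes=$((passes + 1))
      # An unhealthy primary recovers only after enough consecutive successes.
      if (( healthy == 0 && passes >= FAILURE_THRESHOLD )); then healthy=1; fi
    else
      passes=0
      fails=$((fails + 1))
      # A healthy primary is marked unhealthy after enough consecutive failures.
      if (( healthy == 1 && fails >= FAILURE_THRESHOLD )); then healthy=0; fi
    fi
    if (( healthy == 1 )); then echo "$PRIMARY_IP"; else echo "$SECONDARY_IP"; fi
  done
}

# Prints 192.0.2.1 three times, then 192.0.2.2 three times, then 192.0.2.1.
simulate_failover ok fail fail fail ok ok ok
```

In Route 53 itself, health checkers in several locations probe the endpoint independently and their results are aggregated, so this single-probe model is a simplification meant only to show why a failover takes several check intervals rather than happening on the first failed request.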
Example
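```shell
# 1. Create an HTTP health check that probes the primary server (192.0.2.1:80, path /).
#    --caller-reference must be unique per request; the response contains HealthCheck.Id.
aws route53 create-health-check \
  --caller-reference "primary-check-001" \
  --health-check-config Type=HTTP,ResourcePath=/,FailureThreshold=3,IPAddress=192.0.2.1,Port=80

# 2. Create the PRIMARY (health-checked) and SECONDARY failover records for example.com.
#    Replace <health-check-id> with the HealthCheck.Id returned by step 1.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z3M3LMPEXAMPLE \
  --change-batch '{"Changes":[
    {"Action":"CREATE","ResourceRecordSet":{"Name":"example.com","Type":"A",
      "SetIdentifier":"Primary","Failover":"PRIMARY","TTL":60,
      "ResourceRecords":[{"Value":"192.0.2.1"}],"HealthCheckId":"<health-check-id>"}},
    {"Action":"CREATE","ResourceRecordSet":{"Name":"example.com","Type":"A",
      "SetIdentifier":"Secondary","Failover":"SECONDARY","TTL":60,
      "ResourceRecords":[{"Value":"192.0.2.2"}]}}]}'
```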
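After creating the records, it can help to confirm what Route 53 sees and what it would answer. The sketch below wraps two existing AWS CLI commands, `aws route53 get-health-check-status` and `aws route53 test-dns-answer`, in helper functions. The function names and the `HOSTED_ZONE_ID`/`HEALTH_CHECK_ID` variables are hypothetical placeholders; substitute your own hosted zone ID and the HealthCheck.Id from the first command.

```shell
#!/usr/bin/env bash
# Placeholders: use your hosted zone ID and the HealthCheck.Id from create-health-check.
HOSTED_ZONE_ID="Z3M3LMPEXAMPLE"
HEALTH_CHECK_ID="<health-check-id>"

# Print the status string reported by each Route 53 health checker for the primary.
show_health_status() {
  aws route53 get-health-check-status \
    --health-check-id "$HEALTH_CHECK_ID" \
    --query 'HealthCheckObservations[].StatusReport.Status' \
    --output text
}

# Ask Route 53 which value it returns for the example.com A record.
# While the primary is healthy this is expected to be 192.0.2.1; after failover, 192.0.2.2.
show_dns_answer() {
  aws route53 test-dns-answer \
    --hosted-zone-id "$HOSTED_ZONE_ID" \
    --record-name example.com \
    --record-type A \
    --query 'RecordData' \
    --output text
}
```

Keeping the calls inside functions means nothing touches AWS until you run `show_health_status` or `show_dns_answer` yourself, for example in a loop while you take the primary server offline.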
When to Use
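Use failover routing when you want to keep your website or application available even if one server or resource fails. It suits critical services that must stay reachable, although clients may still see a short interruption while the health check detects the failure and cached DNS answers expire.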
Common use cases include:
- Switching traffic to a backup server during outages
- Failing over between data centers in different regions
- Maintaining high availability for web applications
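Before relying on the switch during a real outage, it may be worth rehearsing it. One approach, sketched below, uses the real `aws route53 update-health-check` options `--inverted` and `--no-inverted`: inverting the health check makes Route 53 treat the healthy primary as unhealthy, so answers should move to the secondary record. The function names and the `HEALTH_CHECK_ID` placeholder are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder for the ID of the health check attached to the PRIMARY record.
HEALTH_CHECK_ID="<health-check-id>"

# Invert the health check: a healthy primary is now reported as unhealthy,
# so Route 53 should start answering with the SECONDARY record.
start_failover_drill() {
  aws route53 update-health-check --health-check-id "$HEALTH_CHECK_ID" --inverted
}

# Restore normal evaluation so answers return to the PRIMARY record once it checks healthy.
end_failover_drill() {
  aws route53 update-health-check --health-check-id "$HEALTH_CHECK_ID" --no-inverted
}
```

As with a real failure, the change takes effect only after the health checkers re-evaluate the endpoint, and resolvers may keep the old answer until the 60-second TTL runs out.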
Key Points
- Failover routing uses health checks to monitor primary resources.
- Traffic automatically switches to a secondary resource if the primary fails.
- It helps reduce downtime and improve availability.
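- It works best for critical applications that need quick recovery; how quickly it fails over depends on the health check interval, failure threshold, and record TTL.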