Reliability pillar principles in AWS - Commands & Configuration

Introduction
Reliability means making sure your app or service keeps working well even if things go wrong. It helps avoid downtime and keeps users happy by handling failures smoothly.
When you want your website to stay online even if a server crashes
When you need your app to recover quickly from unexpected errors
When you want to test how your system behaves under failure conditions
When you want to automatically fix problems without manual help
When you want to plan for growth without losing service quality
Commands
This command creates a CloudWatch alarm that watches CPU utilization on one EC2 instance. If the average CPU utilization stays above 80% for two consecutive 5-minute periods, the alarm fires and notifies the SNS topic listed in --alarm-actions. This helps detect problems early.
Terminal
aws cloudwatch put-metric-alarm --alarm-name HighCPUUtilization --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period 300 --threshold 80 --comparison-operator GreaterThanThreshold --dimensions Name=InstanceId,Value=i-0123456789abcdef0 --evaluation-periods 2 --alarm-actions arn:aws:sns:us-east-1:123456789012:MyTopic --unit Percent
Expected Output
No output (command runs silently)
--alarm-name - Name of the alarm
--threshold - Value that triggers the alarm
--alarm-actions - Actions to run when the alarm enters the ALARM state (here, the ARN of an SNS topic to notify)
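To make the alarm rule concrete, here is a minimal local sketch of the check described above: the alarm fires only when the most recent evaluation periods all exceed the threshold. The function name alarm_state is hypothetical and not part of the AWS CLI. The sketch assumes whole-number CPU percentages and ignores CloudWatch's INSUFFICIENT_DATA state, so treat it as an illustration rather than a reimplementation of CloudWatch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an AWS command): mimics a GreaterThanThreshold alarm.
# Usage: alarm_state THRESHOLD EVALUATION_PERIODS DATAPOINT...
# Datapoints are whole-number averages, oldest first, one per period.
alarm_state() {
  local threshold=$1 periods=$2
  shift 2
  local streak=0 value
  for value in "$@"; do
    if [ "$value" -gt "$threshold" ]; then
      streak=$((streak + 1))   # another consecutive breaching period
    else
      streak=0                 # a non-breaching period resets the count
    fi
  done
  # ALARM only if the trailing run of breaching periods is long enough.
  if [ "$streak" -ge "$periods" ]; then
    echo ALARM
  else
    echo OK
  fi
}

alarm_state 80 2 45 85 90   # prints: ALARM
alarm_state 80 2 85 90 70   # prints: OK
```

With --threshold 80 and --evaluation-periods 2, a single spike above 80% does not trigger the alarm, and a value of exactly 80 does not count as a breach.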
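This command creates an Auto Scaling group that keeps between 1 and 3 EC2 instances running, starting with 2. The group replaces instances that fail health checks so the desired capacity is maintained. Adding or removing instances as demand changes additionally requires a scaling policy attached to the group.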
Terminal
aws autoscaling create-auto-scaling-group --auto-scaling-group-name my-asg --launch-configuration-name my-launch-config --min-size 1 --max-size 3 --desired-capacity 2 --vpc-zone-identifier subnet-12345678
Expected Output
No output (command runs silently)
--min-size - Minimum number of instances
--max-size - Maximum number of instances
--desired-capacity - Number of instances the group tries to keep running (must be between the minimum and maximum)
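The relationship between the three size flags can be checked locally before calling AWS. The following is a rough sketch under the assumption that you want to catch an out-of-range desired capacity and a fixed-size group early; check_asg_sizes is a hypothetical helper, not an AWS CLI command.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the sizes passed to create-auto-scaling-group.
# Usage: check_asg_sizes MIN MAX DESIRED
check_asg_sizes() {
  local min=$1 max=$2 desired=$3
  if [ "$min" -gt "$max" ]; then
    echo "invalid: min-size exceeds max-size"
    return 1
  fi
  if [ "$desired" -lt "$min" ] || [ "$desired" -gt "$max" ]; then
    echo "invalid: desired-capacity outside min-size..max-size"
    return 1
  fi
  if [ "$min" -eq "$max" ]; then
    # Valid, but the group has no room to grow or shrink.
    echo "warning: fixed size, the group cannot scale"
    return 0
  fi
  echo "ok: group can scale between $min and $max"
}

check_asg_sizes 1 3 2   # prints: ok: group can scale between 1 and 3
check_asg_sizes 2 2 2   # prints: warning: fixed size, the group cannot scale
```

The values 1, 3, and 2 match the command above; the fixed-size warning corresponds to a group whose minimum and maximum are equal.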
This command shows the current state of the Auto Scaling group: its size limits, its desired capacity, and the lifecycle state and health of each instance.
Terminal
aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names my-asg
Expected Output
{
  "AutoScalingGroups": [
    {
      "AutoScalingGroupName": "my-asg",
      "MinSize": 1,
      "MaxSize": 3,
      "DesiredCapacity": 2,
      "Instances": [
        {
          "InstanceId": "i-0123456789abcdef0",
          "LifecycleState": "InService",
          "HealthStatus": "Healthy"
        },
        {
          "InstanceId": "i-0fedcba9876543210",
          "LifecycleState": "InService",
          "HealthStatus": "Healthy"
        }
      ]
    }
  ]
}
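A quick way to act on this output is to count the instances reported as healthy and compare that number with the desired capacity. The sketch below does this with plain text tools on the sample output; count_healthy is a hypothetical helper, and the grep-based matching is an assumption made to keep the example dependency-free. A real script would more robustly use a JSON parser or the AWS CLI's --query option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts "HealthStatus": "Healthy" entries in
# describe-auto-scaling-groups JSON read from standard input.
count_healthy() {
  # Strip spaces and newlines so compact and pretty-printed JSON match alike,
  # print each match on its own line, then count the lines.
  tr -d ' \n' | grep -o '"HealthStatus":"Healthy"' | wc -l | tr -d ' '
}

# Sample copied from the expected output of the describe command.
sample='{ "AutoScalingGroups": [ { "AutoScalingGroupName": "my-asg", "MinSize": 1, "MaxSize": 3, "DesiredCapacity": 2, "Instances": [ { "InstanceId": "i-0123456789abcdef0", "LifecycleState": "InService", "HealthStatus": "Healthy" }, { "InstanceId": "i-0fedcba9876543210", "LifecycleState": "InService", "HealthStatus": "Healthy" } ] } ] }'

printf '%s' "$sample" | count_healthy   # prints: 2
```

Here the count of 2 equals DesiredCapacity, so the group is at full strength. A lower count would mean an instance is unhealthy or still being replaced.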
Key Concept

If you remember nothing else from this pattern, remember: design your system to detect problems early and automatically fix them to keep your service running smoothly.

Common Mistakes
Not setting alarms for important metrics like CPU or disk space
Without alarms, you won't know when your system is struggling until users complain or it crashes
Set alarms on key metrics to get notified early and take action
Setting the auto scaling min and max sizes too close together
If min and max are the same, auto scaling can't adjust capacity to handle load changes
Set a range that allows scaling up and down based on demand
Not verifying auto scaling group status after creation
You might think scaling is working but instances could be unhealthy or missing
Use describe commands to check the health and number of instances regularly
Summary
Create alarms to watch important system metrics and get notified of issues.
Use auto scaling groups to automatically add or remove servers based on demand.
Check the status of your auto scaling groups to ensure your system stays healthy.