Summary: An Auto Scaling Group continuously monitors member instances and replaces anything it considers unhealthy. Stopping an instance or creating an AMI (which may stop or reboot the instance) triggers this replacement logic before the operation completes. To perform these maintenance tasks safely, you need to either detach the instance, place it in Standby, or suspend the relevant ASG processes.
The Core Problem: ASG Treats State Changes as Failures
Auto Scaling Groups maintain a desired capacity. The ASG health check runs on a loop—by default using EC2 status checks, optionally augmented by ELB health checks. When an instance transitions out of the running state, the ASG marks it unhealthy, terminates it, and launches a replacement to restore capacity.
This is exactly the behavior you want during normal operations. It becomes a problem when you intentionally change the instance state for maintenance.
Scenario 1: Stopping an Instance
When you stop an EC2 instance that belongs to an ASG, the following sequence occurs:
- The instance enters the
stopping→stoppedstate. - The ASG health check detects the instance is no longer
running. - The ASG marks the instance as Unhealthy.
- The ASG terminates the stopped instance and launches a new one.
The result: the instance you stopped is terminated. You cannot simply stop and restart it later—the ASG will not allow the instance to remain in a stopped state.
Why this matters
- Cost management: You might want to stop instances outside business hours. An ASG will immediately replace them, defeating the purpose.
- Troubleshooting: Stopping an instance to inspect its EBS volumes or preserve its state for later analysis is not possible while it belongs to the ASG.
Scenario 2: Creating an AMI
Creating an AMI from an EC2 instance uses one of two approaches:
- No-reboot AMI (
--no-reboot): The AMI is created from the running instance without any restart. This is safe inside an ASG but risks filesystem inconsistency because in-flight writes are not flushed. - Default (reboot) AMI: The instance is rebooted during image creation to ensure a clean filesystem snapshot. The reboot is usually fast enough that the ASG health check does not flag the instance, but it depends on the health check grace period and timing.
- Stop-based AMI creation: Some workflows stop the instance first for a fully consistent snapshot. This triggers the same termination sequence described in Scenario 1.
If the reboot takes longer than the health check interval, or if ELB health checks are enabled and the instance fails to respond during the reboot window, the ASG may still terminate the instance mid-AMI-creation. This can leave you with a corrupted or incomplete AMI and a terminated instance.
Workaround 1: Detach the Instance
You can remove an instance from the ASG without terminating it:
aws autoscaling detach-instances \
--instance-ids i-0123456789abcdef0 \
--auto-scaling-group-name my-asg \
--should-decrement-desired-capacity
What happens
- The instance is removed from the ASG and no longer subject to health checks.
- If
--should-decrement-desired-capacityis set, the ASG reduces its desired count by one. If omitted, the ASG launches a replacement immediately. - The detached instance continues running as a standalone EC2 instance.
After your maintenance
Once you finish (stop/start, create AMI, etc.), you can either terminate the instance or re-attach it:
aws autoscaling attach-instances \
--instance-ids i-0123456789abcdef0 \
--auto-scaling-group-name my-asg
Trade-offs
- The instance loses ASG protection (no auto-recovery, no scaling actions).
- You must remember to re-attach or terminate the instance—orphaned instances accumulate cost.
- If you did not decrement desired capacity, you temporarily have one extra instance running.
Workaround 2: Enter Standby
Standby is the preferred approach when you plan to return the instance to service:
aws autoscaling enter-standby \
--instance-ids i-0123456789abcdef0 \
--auto-scaling-group-name my-asg \
--should-decrement-desired-capacity
What happens
- The instance moves to
Standbystate inside the ASG. - ASG health checks are suspended for that instance.
- If the instance is behind a load balancer, it is deregistered from the target group.
- The instance remains a member of the ASG—it has not left the group.
You can now safely stop the instance, create an AMI, install patches, or perform any maintenance. When done:
aws autoscaling exit-standby \
--instance-ids i-0123456789abcdef0 \
--auto-scaling-group-name my-asg
The instance returns to InService, re-registers with the load balancer, and health checks resume.
Standby vs. Detach
| Aspect | Standby | Detach |
|---|---|---|
| Instance remains in ASG | Yes | No |
| Health checks paused | Yes | N/A (not in group) |
| Easy to return to service | exit-standby |
attach-instances |
| Load balancer deregistration | Automatic | Manual |
| Risk of orphaned instances | Low | Higher |
| Use case | Temporary maintenance | Permanent removal or long-term work |
Workaround 3: Suspend ASG Processes
If you need to perform maintenance across multiple instances simultaneously, suspending specific ASG processes is more efficient than placing each instance in Standby:
aws autoscaling suspend-processes \
--auto-scaling-group-name my-asg \
--scaling-processes HealthCheck ReplaceUnhealthy Launch Terminate
Key processes to suspend
- HealthCheck: Stops the ASG from marking instances as unhealthy.
- ReplaceUnhealthy: Stops the ASG from terminating unhealthy instances and launching replacements.
- Launch / Terminate: Prevents any scaling actions during the maintenance window.
Resume when finished:
aws autoscaling resume-processes \
--auto-scaling-group-name my-asg \
--scaling-processes HealthCheck ReplaceUnhealthy Launch Terminate
Risk
Suspending processes disables the ASG's self-healing for all instances in the group—not just the one you are working on. If another instance fails during this window, it will not be replaced. Keep the suspension window as short as possible.
Workaround 4: Use --no-reboot for AMIs (With Caution)
If your only goal is to create an AMI and you can tolerate potential filesystem inconsistency:
aws ec2 create-image \
--instance-id i-0123456789abcdef0 \
--name "my-ami-$(date +%Y%m%d)" \
--no-reboot
This avoids triggering any state change. The ASG never notices. However, applications with in-flight disk writes (databases, queues) may produce a corrupted snapshot. This option works well for stateless instances where the root volume contains only the OS and application code.
Decision Flowchart
Need to perform maintenance on an ASG instance?
│
├─ Creating an AMI only?
│ ├─ Stateless instance → Use --no-reboot
│ └─ Stateful / need consistent snapshot
│ └─ Place instance in Standby → Create AMI → Exit Standby
│
├─ Stopping instance for inspection?
│ └─ Place instance in Standby → Stop → Inspect → Start → Exit Standby
│
├─ Patching / updating multiple instances?
│ └─ Suspend HealthCheck + ReplaceUnhealthy → Patch → Resume
│
└─ Permanently removing instance?
└─ Detach with --should-decrement-desired-capacity
Things to Watch Out For
-
ELB health checks amplify the risk. If your ASG uses ELB health checks (not just EC2 status checks), even a brief reboot can be flagged because the instance stops responding to health check probes during the restart window.
-
Lifecycle hooks interact with detach/standby. If your ASG has lifecycle hooks configured, detaching or entering standby may trigger hook actions. Test in a non-production environment first.
-
Scaling policies continue to fire. Placing an instance in Standby decrements capacity, which may trigger a scale-out policy to launch a replacement. Set
--should-decrement-desired-capacityintentionally based on whether you want a temporary replacement. -
Instance protection is not the same as Standby. Scale-in protection (
SetInstanceProtection) prevents an instance from being selected during scale-in events, but it does not prevent the ASG from terminating an instance it has marked unhealthy. Only Standby or process suspension prevents health-check-driven termination. -
Cooldown periods. After resuming processes, the ASG may enter a cooldown period before responding to scaling events. Factor this into your maintenance window planning.
Conclusion
Auto Scaling Groups are designed to maintain a healthy fleet automatically. That same automation works against you when performing routine maintenance like stopping instances or creating AMIs. The key insight is that you must temporarily opt out of ASG management—via Standby, Detach, or process suspension—before making changes that alter instance state. Choose the approach that matches the scope and duration of your maintenance task.