Service Operations Runbook
Deploy a New Version
Via GitHub Actions (Standard)
A push to the main branch triggers an automatic deployment.
Manual Deploy (Emergency)
# Update service with new image
aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --force-new-deployment

Rollback to Previous Version
# List task definition revisions
aws ecs list-task-definitions --family-prefix autom8y-{service}
# Update to previous revision
aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --task-definition autom8y-{service}:{previous-revision}
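The lookup and update above can be combined so the previous revision is derived instead of read off a list by hand. The following is a minimal sketch, not an established team script: rollback_service is a hypothetical helper name, and it assumes that revision N-1 of the same task definition family still exists and is active.

```shell
# Sketch: point the service at the revision just before the one it runs now.
# Assumption: revision N-1 of the same family exists and has not been deregistered.
rollback_service() {
  local cluster="$1" service="$2"
  local current family rev prev

  # ARN of the task definition currently in use,
  # e.g. arn:aws:ecs:...:task-definition/autom8y-api:42
  current=$(aws ecs describe-services \
    --cluster "$cluster" --services "$service" \
    --query 'services[0].taskDefinition' --output text) || return 1

  rev="${current##*:}"      # 42
  family="${current##*/}"   # autom8y-api:42
  family="${family%:*}"     # autom8y-api

  # Refuse to continue if the revision is missing or not a number.
  case "$rev" in
    ''|*[!0-9]*)
      echo "could not read current revision from: $current" >&2
      return 1
      ;;
  esac

  prev=$((rev - 1))
  if [ "$prev" -lt 1 ]; then
    echo "no revision earlier than $family:$rev" >&2
    return 1
  fi

  # Print the task definition the service now points at.
  aws ecs update-service \
    --cluster "$cluster" --service "$service" \
    --task-definition "$family:$prev" \
    --query 'service.taskDefinition' --output text
}
```

Usage: rollback_service autom8-cluster autom8y-{service}. If the previous revision was deregistered, fall back to picking a revision manually from the list.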
# Wait for rollback
aws ecs wait services-stable --cluster autom8-cluster --services autom8y-{service}

Scale Service
Immediate (via CLI)
aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --desired-count 3

Permanent (via Terraform)
Update desired_count in your service module and apply.
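For the immediate CLI path, it can help to validate the count and wait until the service settles before moving on. A minimal sketch, assuming a hypothetical helper named scale_service; note that a CLI change like this may be reverted by the next Terraform apply if the module manages desired_count.

```shell
# Sketch: set the desired count, then block until the service is stable.
scale_service() {
  local cluster="$1" service="$2" count="$3"

  # Accept only non-negative integers.
  case "$count" in
    ''|*[!0-9]*)
      echo "desired count must be a non-negative integer, got: $count" >&2
      return 1
      ;;
  esac

  aws ecs update-service \
    --cluster "$cluster" --service "$service" \
    --desired-count "$count" \
    --query 'service.{Desired:desiredCount,Running:runningCount}' \
    --output table || return 1

  # The waiter polls until running tasks match the desired count
  # and exits non-zero if that does not happen in time.
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
}
```

Usage: scale_service autom8-cluster autom8y-{service} 3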
View Logs
# Stream logs
aws logs tail /ecs/autom8y-{service} --follow
# Logs from the last hour
aws logs tail /ecs/autom8y-{service} --since 1h

CloudWatch Console: CloudWatch > Log groups > /ecs/autom8y-{service}
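When the last hour of logs is too noisy, the same command can filter on the server side. A minimal sketch, assuming a hypothetical helper named recent_errors and assuming the application writes the literal string ERROR on error lines; adjust the pattern to the actual log format.

```shell
# Sketch: show only matching log lines from a recent time window.
# $1 = full service name (for example autom8y-{service})
# $2 = time window, default 1h
# $3 = CloudWatch Logs filter pattern, default ERROR (an assumption about the log format)
recent_errors() {
  local service="$1" since="${2:-1h}" pattern="${3:-ERROR}"

  aws logs tail "/ecs/$service" \
    --since "$since" \
    --filter-pattern "$pattern" \
    --format short
}
```

Usage: recent_errors autom8y-{service} 30m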
Check Service Health
# ECS service status
aws ecs describe-services \
  --cluster autom8-cluster \
  --services autom8y-{service} \
  --query 'services[0].{Status:status,Running:runningCount,Desired:desiredCount}'
# Target group health
aws elbv2 describe-target-health \
  --target-group-arn {target-group-arn}
# Recent events
aws ecs describe-services \
  --cluster autom8-cluster \
  --services autom8y-{service} \
  --query 'services[0].events[:5]'

Restart Service
Force a new deployment without changing the image:
aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --force-new-deployment

Debugging Failed Deployments
Task Fails to Start
# Check stopped task reason
aws ecs describe-tasks \
  --cluster autom8-cluster \
  --tasks "$(aws ecs list-tasks --cluster autom8-cluster --service-name autom8y-{service} --desired-status STOPPED --query 'taskArns[0]' --output text)"

Common causes:
- Image not found: Check ECR repository and image tag
- Out of memory: Increase the memory setting in your service module
- Permission denied: Check task execution role
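To match a failure against the causes above, it is usually enough to read the stop reason of the most recent stopped tasks. A minimal sketch, assuming a hypothetical helper named stopped_task_reasons; ECS keeps stopped tasks visible only for a limited time, so run it soon after the failure.

```shell
# Sketch: print why the most recent stopped tasks of a service stopped.
stopped_task_reasons() {
  local cluster="$1" service="$2"
  local tasks

  # Up to five stopped task ARNs, whitespace-separated.
  tasks=$(aws ecs list-tasks \
    --cluster "$cluster" --service-name "$service" \
    --desired-status STOPPED \
    --query 'taskArns[:5]' --output text) || return 1

  if [ -z "$tasks" ] || [ "$tasks" = "None" ]; then
    echo "no stopped tasks found for $service" >&2
    return 0
  fi

  # $tasks is intentionally unquoted: --tasks takes a list of separate ARNs.
  aws ecs describe-tasks \
    --cluster "$cluster" --tasks $tasks \
    --query 'tasks[].{Task:taskArn,Stopped:stoppedReason,Container:containers[0].reason,Exit:containers[0].exitCode}' \
    --output table
}
```

Usage: stopped_task_reasons autom8-cluster autom8y-{service}. A reason mentioning CannotPullContainerError typically points at the image or ECR access, and one mentioning OutOfMemoryError at the memory setting.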
Health Check Failures
- Verify health endpoint works locally
- Check container logs for startup errors
- Verify security group allows ALB traffic
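The first check in the list can be scripted with curl. A minimal sketch, assuming a hypothetical helper named check_health; the port 8080 and the /health path are placeholders, not values taken from this service, and the sketch assumes the ALB health check expects HTTP 200.

```shell
# Sketch: probe a locally running container the way a health check would.
# The default URL is a placeholder; pass the real host, port, and path.
check_health() {
  local url="${1:-http://localhost:8080/health}"
  local code

  # Capture only the HTTP status code; give up after 5 seconds.
  code=$(curl --silent --output /dev/null --max-time 5 \
    --write-out '%{http_code}' "$url")

  if [ "$code" = "200" ]; then
    echo "healthy: $url returned 200"
  else
    echo "unhealthy: $url returned ${code:-no response}" >&2
    return 1
  fi
}
```

Usage: check_health http://localhost:8080/health. If the endpoint answers locally but targets still fail, compare the path, port, and timeout with the target group's health check settings.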
ALB Target Registration Timeout
- Check ECS service events for errors
- Verify subnet has NAT gateway for ECR access
- Check task execution role permissions
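For registration timeouts it helps to see the service events and the state of unhealthy targets side by side. A minimal sketch, assuming a hypothetical helper named registration_report; the target group ARN has to be supplied by the caller.

```shell
# Sketch: show recent ECS service events and every target that is not healthy.
registration_report() {
  local cluster="$1" service="$2" tg_arn="$3"

  echo "== Recent ECS events =="
  aws ecs describe-services \
    --cluster "$cluster" --services "$service" \
    --query 'services[0].events[:10].[createdAt,message]' \
    --output text

  echo "== Targets not healthy =="
  # State, reason code, and description explain why a target is not in service.
  aws elbv2 describe-target-health \
    --target-group-arn "$tg_arn" \
    --query "TargetHealthDescriptions[?TargetHealth.State!='healthy'].[Target.Id,TargetHealth.State,TargetHealth.Reason,TargetHealth.Description]" \
    --output text
}
```

Usage: registration_report autom8-cluster autom8y-{service} {target-group-arn}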
AWS CLI Quick Reference
| Command | Description |
|---|---|
| aws ecs list-services --cluster autom8-cluster | List all services |
| aws ecs describe-services --cluster autom8-cluster --services {name} | Service details |
| aws ecs list-tasks --cluster autom8-cluster --service-name {name} | Running tasks |
| aws logs tail /ecs/{name} --follow | Stream logs |
| aws ecs update-service --cluster autom8-cluster --service {name} --force-new-deployment | Restart |