Service Operations Runbook

Deploy a New Version

Via GitHub Actions (Standard)

A push to the main branch triggers an automatic deployment.

Manual Deploy (Emergency)

# Force a new deployment of the current task definition (new tasks pull the image tag again)
aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --force-new-deployment
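
The update-service call returns as soon as ECS accepts the request, not when the new tasks are running. A minimal sketch of a wrapper that forces the deployment and then blocks on the services-stable waiter used elsewhere in this runbook. The function name redeploy_and_wait and the CLUSTER variable are illustrative assumptions, not existing tooling.

```shell
# Sketch only: CLUSTER and redeploy_and_wait are illustrative names.
CLUSTER="${CLUSTER:-autom8-cluster}"

redeploy_and_wait() (
  service="$1"   # full ECS service name, e.g. autom8y-<service>

  # Start a new deployment of the current task definition.
  aws ecs update-service \
    --cluster "$CLUSTER" \
    --service "$service" \
    --force-new-deployment > /dev/null || exit 1

  # Poll until the service is stable; exits non-zero if the waiter gives up.
  aws ecs wait services-stable \
    --cluster "$CLUSTER" \
    --services "$service"
)

# Usage (the service name is a placeholder):
#   redeploy_and_wait autom8y-example && echo "deployment stable"
```

A non-zero exit means either the update request was rejected or the service did not stabilize before the waiter timed out; check the service events in both cases.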

Rollback to Previous Version

# List task definition revisions, newest first
aws ecs list-task-definitions --family-prefix autom8y-{service} --sort DESC
# Update to previous revision
aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --task-definition autom8y-{service}:{previous-revision}
# Wait for rollback
aws ecs wait services-stable --cluster autom8-cluster --services autom8y-{service}
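
Working out {previous-revision} by hand is error-prone under pressure. The sketch below derives it from the task definition ARN the service is currently running. It assumes the last good version is the immediately preceding revision number, which does not hold if revisions were registered without being deployed or have been deregistered. The function name previous_task_def is hypothetical.

```shell
# Sketch only: previous_task_def is an illustrative helper.
# Input:  a task definition ARN or "<family>:<revision>"
# Output: "<family>:<revision - 1>"
previous_task_def() (
  arn="$1"
  family_rev="${arn##*/}"       # drop everything up to the last "/"
  family="${family_rev%:*}"     # text before the last ":"
  revision="${family_rev##*:}"  # text after the last ":"

  # Reject anything whose revision part is not a plain number.
  case "$revision" in
    ''|*[!0-9]*)
      echo "cannot parse revision from: $arn" >&2
      exit 1
      ;;
  esac

  if [ "$revision" -le 1 ]; then
    echo "no revision earlier than $family_rev" >&2
    exit 1
  fi

  echo "$family:$((revision - 1))"
)

# Usage (the service name is a placeholder):
#   current=$(aws ecs describe-services --cluster autom8-cluster \
#     --services autom8y-example \
#     --query 'services[0].taskDefinition' --output text)
#   aws ecs update-service --cluster autom8-cluster \
#     --service autom8y-example \
#     --task-definition "$(previous_task_def "$current")"
```

Confirm the computed revision against the list-task-definitions output before applying it.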

Scale Service

Immediate (via CLI)

aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --desired-count 3

Permanent (via Terraform)

Update desired_count in your service module and run terraform apply.

View Logs

# Stream logs
aws logs tail /ecs/autom8y-{service} --follow
# Logs from the last hour
aws logs tail /ecs/autom8y-{service} --since 1h

CloudWatch Console: CloudWatch > Log groups > /ecs/autom8y-{service}
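
When the last hour of logs is too noisy, a filter pattern narrows the output. A minimal sketch, assuming the application writes the literal string ERROR on failure lines. The helper name recent_errors, the LOG_GROUP_PREFIX variable, and the default pattern are all assumptions for illustration.

```shell
# Sketch only: recent_errors, LOG_GROUP_PREFIX and the "ERROR" pattern are assumptions.
LOG_GROUP_PREFIX="${LOG_GROUP_PREFIX:-/ecs}"

recent_errors() (
  service="$1"           # full service name, e.g. autom8y-<service>
  window="${2:-1h}"      # relative time accepted by --since, e.g. 30m, 1h, 2d
  pattern="${3:-ERROR}"  # CloudWatch Logs filter pattern

  aws logs tail "$LOG_GROUP_PREFIX/$service" \
    --since "$window" \
    --filter-pattern "$pattern" \
    --format short
)

# Usage (the service name is a placeholder):
#   recent_errors autom8y-example 30m
#   recent_errors autom8y-example 2h "Traceback"
```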

Check Service Health

# ECS service status
aws ecs describe-services \
  --cluster autom8-cluster \
  --services autom8y-{service} \
  --query 'services[0].{Status:status,Running:runningCount,Desired:desiredCount}'
# Target group health
aws elbv2 describe-target-health \
  --target-group-arn {target-group-arn}
# Recent events
aws ecs describe-services \
  --cluster autom8-cluster \
  --services autom8y-{service} \
  --query 'services[0].events[:5]'
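
For scripted checks, the service status query above can be reduced to an exit code. The sketch below compares runningCount with desiredCount and returns 0 when they match, 1 when they differ, and 2 when the response cannot be parsed (for example, when the service name is wrong). The function name service_is_steady and the exit-code convention are assumptions.

```shell
# Sketch only: CLUSTER and service_is_steady are illustrative names.
CLUSTER="${CLUSTER:-autom8-cluster}"

service_is_steady() (
  service="$1"   # full ECS service name, e.g. autom8y-<service>

  counts=$(aws ecs describe-services \
    --cluster "$CLUSTER" \
    --services "$service" \
    --query 'services[0].[runningCount, desiredCount]' \
    --output text) || exit 2

  # Text output puts both numbers on one whitespace-separated line.
  set -- $counts
  running="${1:-}"
  desired="${2:-}"

  # A missing service yields "None" rather than numbers.
  for n in "$running" "$desired"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "unexpected response for $service: $counts" >&2
        exit 2
        ;;
    esac
  done

  echo "$service: running=$running desired=$desired"
  [ "$running" -eq "$desired" ]
)

# Usage (the service name is a placeholder):
#   service_is_steady autom8y-example || echo "investigate"
```

Matching counts only show that the tasks are running; target group health still has to be checked separately.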

Restart Service

Force a new deployment without changing the image:

aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --force-new-deployment

Debugging Failed Deployments

Task Fails to Start

# Check stopped task reason (list-tasks takes --service-name, not --service)
aws ecs describe-tasks \
  --cluster autom8-cluster \
  --tasks "$(aws ecs list-tasks --cluster autom8-cluster --service-name autom8y-{service} --desired-status STOPPED --query 'taskArns[0]' --output text)"

Common causes:

  • Image not found: Check ECR repository and image tag
  • Out of memory: Increase memory in the service module
  • Permission denied: Check task execution role
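
The full describe-tasks output is long, and the fields that distinguish the causes above are stopCode, stoppedReason, and each container's exitCode and reason. A minimal sketch that prints only those fields and handles the case where no stopped task is listed (ECS keeps stopped tasks visible only for a limited time). The function name stopped_task_report is hypothetical.

```shell
# Sketch only: CLUSTER and stopped_task_report are illustrative names.
CLUSTER="${CLUSTER:-autom8-cluster}"

stopped_task_report() (
  service="$1"   # full ECS service name, e.g. autom8y-<service>

  task_arn=$(aws ecs list-tasks \
    --cluster "$CLUSTER" \
    --service-name "$service" \
    --desired-status STOPPED \
    --query 'taskArns[0]' \
    --output text) || exit 1

  # With --output text, a null query result is printed as the literal "None".
  if [ -z "$task_arn" ] || [ "$task_arn" = "None" ]; then
    echo "no stopped tasks found for $service" >&2
    exit 1
  fi

  # Keep only the fields that explain why the task stopped.
  aws ecs describe-tasks \
    --cluster "$CLUSTER" \
    --tasks "$task_arn" \
    --query 'tasks[0].{StopCode:stopCode,StoppedReason:stoppedReason,Containers:containers[].{Name:name,ExitCode:exitCode,Reason:reason}}' \
    --output json
)

# Usage (the service name is a placeholder):
#   stopped_task_report autom8y-example
```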

Health Check Failures

  1. Verify health endpoint works locally
  2. Check container logs for startup errors
  3. Verify security group allows ALB traffic

ALB Target Registration Timeout

  1. Check ECS service events for errors
  2. Verify subnet has NAT gateway for ECR access
  3. Check task execution role permissions
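
Alongside the steps above, the target group itself reports why a target is not healthy. The sketch below lists every target that is not in the healthy state together with its reason code and description. The function name unhealthy_targets is an assumption, and the target group ARN must be supplied by the caller.

```shell
# Sketch only: unhealthy_targets is an illustrative helper.
unhealthy_targets() (
  tg_arn="$1"   # target group ARN, e.g. taken from the Terraform outputs

  # One line per target that is not healthy:
  # id, port, state, reason code, description
  aws elbv2 describe-target-health \
    --target-group-arn "$tg_arn" \
    --query "TargetHealthDescriptions[?TargetHealth.State!='healthy'].[Target.Id, Target.Port, TargetHealth.State, TargetHealth.Reason, TargetHealth.Description]" \
    --output text
)

# Usage:
#   unhealthy_targets "{target-group-arn}"
```

Reason codes such as Elb.RegistrationInProgress, Target.Timeout, and Target.FailedHealthChecks indicate whether the target is still registering or is failing its health checks.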

AWS CLI Quick Reference

| Command | Description |
| --- | --- |
| aws ecs list-services --cluster autom8-cluster | List all services |
| aws ecs describe-services --cluster autom8-cluster --services {name} | Service details |
| aws ecs list-tasks --cluster autom8-cluster --service-name {name} | Running tasks |
| aws logs tail /ecs/{name} --follow | Stream logs |
| aws ecs update-service --cluster autom8-cluster --service {name} --force-new-deployment | Restart |