Service Operations Runbook

Deploy a New Version

Via GitHub Actions (Standard)

Pushing to the main branch triggers an automatic deployment.

Manual Deploy (Emergency)

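# Force a new deployment of the current task definition.
# This picks up a new image only if it was pushed under the tag the task definition already references.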
aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --force-new-deployment
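
During an emergency deploy it helps if the command blocks until the rollout settles. The following is a minimal sketch, not existing tooling: the function name force_deploy and the CLUSTER variable are illustrative assumptions, and the service name is built from the autom8y-{service} pattern used in this runbook.

```shell
# Hypothetical helper: force a new deployment, then block until the service is stable.
# CLUSTER defaults to the cluster name used throughout this runbook.
CLUSTER="${CLUSTER:-autom8-cluster}"

force_deploy() {
  service="autom8y-$1"
  aws ecs update-service \
    --cluster "$CLUSTER" \
    --service "$service" \
    --force-new-deployment \
    --query 'service.serviceName' \
    --output text || return 1
  # The waiter exits non-zero if the service has not stabilized when it gives up.
  aws ecs wait services-stable --cluster "$CLUSTER" --services "$service"
}
```

Calling force_deploy with the short service name (the part after autom8y-) runs both steps; a non-zero exit status means the rollout needs a closer look.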

Rollback to Previous Version

# List task definition revisions (newest first)
aws ecs list-task-definitions --family-prefix autom8y-{service} --sort DESC

# Update to previous revision
aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --task-definition autom8y-{service}:{previous-revision}

# Wait for rollback
aws ecs wait services-stable --cluster autom8-cluster --services autom8y-{service}
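
Working out {previous-revision} by hand is error-prone under pressure. The sketch below derives it from the task definition ARN the service is currently running. The function name previous_task_definition is hypothetical. It also assumes that revision N-1 still exists and is the one you want, which may not hold if revisions were deregistered or several were registered between deploys.

```shell
# Hypothetical helper: derive "family:N-1" from a task definition ARN such as
# arn:aws:ecs:us-east-1:123456789012:task-definition/autom8y-api:42
previous_task_definition() {
  current="${1##*/}"          # drop everything up to the last "/": autom8y-api:42
  family="${current%:*}"      # autom8y-api
  revision="${current##*:}"   # 42
  case "$revision" in
    ''|*[!0-9]*) echo "cannot parse revision from: $1" >&2; return 1 ;;
  esac
  if [ "$revision" -le 1 ]; then
    echo "no earlier revision for $family" >&2
    return 1
  fi
  echo "$family:$((revision - 1))"
}

# Example (not run here):
#   current=$(aws ecs describe-services --cluster autom8-cluster \
#     --services autom8y-{service} --query 'services[0].taskDefinition' --output text)
#   previous_task_definition "$current"
```

The printed value can be passed directly to --task-definition in the update-service command above.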

Scale Service

Immediate (via CLI)

aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --desired-count 3

Permanent (via Terraform)

Update desired_count in your service module and run terraform apply. A count changed only through the CLI may be reset by the next apply.

View Logs

# Stream logs
aws logs tail /ecs/autom8y-{service} --follow

# Logs from the last hour
aws logs tail /ecs/autom8y-{service} --since 1h

CloudWatch Console: CloudWatch > Log groups > /ecs/autom8y-{service}
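
When the stream is noisy, it can help to narrow it to a single term. A minimal sketch, assuming AWS CLI v2 (which provides aws logs tail with --filter-pattern and --format); the function name tail_matching and the 30-minute default window are illustrative choices:

```shell
# Hypothetical helper: show only log events containing a term from the last N minutes.
# Usage: tail_matching <service> <term> [minutes]
tail_matching() {
  # The term is wrapped in double quotes so CloudWatch treats it as an exact phrase.
  aws logs tail "/ecs/autom8y-$1" \
    --since "${3:-30}m" \
    --filter-pattern "\"$2\"" \
    --format short
}
```

For example, tail_matching api ERROR 15 would show events containing ERROR from the last 15 minutes of /ecs/autom8y-api.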

Check Service Health

# ECS service status
aws ecs describe-services \
  --cluster autom8-cluster \
  --services autom8y-{service} \
  --query 'services[0].{Status:status,Running:runningCount,Desired:desiredCount}'

# Target group health
aws elbv2 describe-target-health \
  --target-group-arn {target-group-arn}

# Recent events
aws ecs describe-services \
  --cluster autom8-cluster \
  --services autom8y-{service} \
  --query 'services[0].events[:5]'
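
For scripted checks, the status query can be reduced to a pass/fail result. The sketch below compares runningCount with desiredCount; the function name service_at_capacity is hypothetical, and matching counts alone do not prove the targets are passing ALB health checks.

```shell
# Hypothetical helper: succeed only when runningCount equals desiredCount.
service_at_capacity() {
  counts=$(aws ecs describe-services \
    --cluster "${CLUSTER:-autom8-cluster}" \
    --services "autom8y-$1" \
    --query 'services[0].[runningCount,desiredCount]' \
    --output text) || return 2
  # Text output is two whitespace-separated integers; split them into $1 and $2.
  set -- $counts
  echo "running=$1 desired=$2"
  # An unknown service yields "None" with no second field, which counts as a failure.
  [ -n "$2" ] && [ "$1" = "$2" ]
}
```

The exit status is 0 when the counts match, 1 when they differ, and 2 when the AWS call itself fails.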

Restart Service

Force a new deployment without changing the image:

aws ecs update-service \
  --cluster autom8-cluster \
  --service autom8y-{service} \
  --force-new-deployment

Debugging Failed Deployments

Task Fails to Start

# Check stopped task reason
aws ecs describe-tasks \
  --cluster autom8-cluster \
  --tasks $(aws ecs list-tasks --cluster autom8-cluster --service-name autom8y-{service} --desired-status STOPPED --query 'taskArns[0]' --output text)

Common causes:

Image not found: Check ECR repository and image tag
Out of memory: Increase memory in module
Permission denied: Check task execution role
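
The one-liner above passes the literal string None to describe-tasks when no stopped task exists, which produces a confusing error. A minimal sketch that guards against that case and prints only the stop reasons; the function name stopped_reason is hypothetical, and list-tasks does not guarantee that the first ARN returned is the most recent stopped task:

```shell
# Hypothetical helper: print why a stopped task for the service stopped.
stopped_reason() {
  cluster="${CLUSTER:-autom8-cluster}"
  task_arn=$(aws ecs list-tasks \
    --cluster "$cluster" \
    --service-name "autom8y-$1" \
    --desired-status STOPPED \
    --query 'taskArns[0]' \
    --output text) || return 1
  # With no stopped tasks, the query yields the literal string "None".
  if [ -z "$task_arn" ] || [ "$task_arn" = "None" ]; then
    echo "no stopped tasks found for autom8y-$1"
    return 0
  fi
  aws ecs describe-tasks \
    --cluster "$cluster" \
    --tasks "$task_arn" \
    --query 'tasks[0].{Stopped:stoppedReason,Containers:containers[].{Name:name,Exit:exitCode,Reason:reason}}' \
    --output json
}
```

ECS keeps stopped tasks visible only for a limited time, so run this soon after the failure.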

Health Check Failures

Verify health endpoint works locally
Check container logs for startup errors
Verify security group allows ALB traffic
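
The first check can be sketched as a small curl wrapper. The URL below is an assumption; this runbook does not specify the health check path or port, so substitute whatever your target group is configured to probe. The function name check_health is likewise illustrative.

```shell
# Hypothetical helper: succeed when the endpoint answers with HTTP 200.
# The default URL is a placeholder, not a value taken from this runbook.
check_health() {
  url="${1:-http://localhost:8080/health}"
  # curl reports status 000 when it cannot connect or times out.
  code=$(curl --silent --output /dev/null --max-time 5 --write-out '%{http_code}' "$url")
  echo "$url -> HTTP $code"
  [ "$code" = "200" ]
}
```

If your target group accepts status codes other than 200, adjust the comparison to match.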

ALB Target Registration Timeout

Check ECS service events for errors
Verify the task's subnet can reach ECR (NAT gateway or VPC endpoints)
Check task execution role permissions
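
When targets never become healthy, the reason the load balancer reports for each target often narrows the search. A minimal sketch that lists only the non-healthy targets with their state, reason code, and description; the function name unhealthy_targets is hypothetical:

```shell
# Hypothetical helper: list targets that are not healthy, with the reported reason.
# Usage: unhealthy_targets <target-group-arn>
unhealthy_targets() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query "TargetHealthDescriptions[?TargetHealth.State!='healthy'].[Target.Id,TargetHealth.State,TargetHealth.Reason,TargetHealth.Description]" \
    --output text
}
```

Empty output means every registered target is healthy, or that no targets are registered at all.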

AWS CLI Quick Reference

Command	Description
`aws ecs list-services --cluster autom8-cluster`	List all services
`aws ecs describe-services --cluster autom8-cluster --services {name}`	Service details
`aws ecs list-tasks --cluster autom8-cluster --service-name {name}`	Running tasks
`aws logs tail /ecs/{name} --follow`	Stream logs
`aws ecs update-service --cluster autom8-cluster --service {name} --force-new-deployment`	Restart