aws-ecs-monitor

AWS ECS production health monitoring with CloudWatch log analysis: monitors ECS service health, ALB targets, SSL certificates, and provides deep CloudWatch...

Install via ClawdBot CLI:

```bash
clawdbot install briancolinger/aws-ecs-monitor
```

Production health monitoring and log analysis for AWS ECS services.

Requirements:

- `aws` CLI configured with appropriate IAM permissions:
  - `ecs:ListServices`, `ecs:DescribeServices`
  - `elasticloadbalancing:DescribeTargetGroups`, `elasticloadbalancing:DescribeTargetHealth`
  - `logs:FilterLogEvents`, `logs:DescribeLogGroups`
- `curl` for HTTP health checks
- `python3` for JSON processing and log analysis
- `openssl` for SSL certificate checks (optional)

All configuration is via environment variables:

| Variable | Required | Default | Description |
|---|---|---|---|
| ECS_CLUSTER | Yes | — | ECS cluster name |
| ECS_REGION | No | us-east-1 | AWS region |
| ECS_DOMAIN | No | — | Domain for HTTP/SSL checks (skip if unset) |
| ECS_SERVICES | No | auto-detect | Comma-separated service names to monitor |
| ECS_HEALTH_STATE | No | ./data/ecs-health.json | Path to write health state JSON |
| ECS_HEALTH_OUTDIR | No | ./data/ | Output directory for logs and alerts |
| ECS_LOG_PATTERN | No | /ecs/{service} | CloudWatch log group pattern ({service} is replaced with each service name) |
| ECS_HTTP_ENDPOINTS | No | — | Comma-separated name=url pairs for HTTP probes |
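
The defaults in the table can be reproduced in your own wrapper scripts with shell parameter expansion. The sketch below is an assumption about how such resolution might look, not code taken from `ecs-health.sh` or `cloudwatch-logs.sh`; `load_ecs_config`, `parse_http_endpoints`, and `ECS_SERVICE_LIST` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the documented environment variables.
# Not taken from the skill's scripts; defaults copied from the table above.

load_ecs_config() {
  # ECS_CLUSTER is the only required variable: fail fast, returning 2
  # to mirror the "script error" exit code documented for ecs-health.sh.
  if [ -z "${ECS_CLUSTER:-}" ]; then
    echo "error: ECS_CLUSTER is required" >&2
    return 2
  fi

  # Keep the literal {service} placeholder in a variable so its braces
  # cannot interfere with the ${VAR:-default} expansion below.
  local default_pattern='/ecs/{service}'

  ECS_REGION="${ECS_REGION:-us-east-1}"
  ECS_HEALTH_STATE="${ECS_HEALTH_STATE:-./data/ecs-health.json}"
  ECS_HEALTH_OUTDIR="${ECS_HEALTH_OUTDIR:-./data/}"
  ECS_LOG_PATTERN="${ECS_LOG_PATTERN:-$default_pattern}"

  # Split the optional comma-separated service list into an array;
  # an empty array means "auto-detect from the cluster".
  IFS=',' read -r -a ECS_SERVICE_LIST <<< "${ECS_SERVICES:-}"
}

parse_http_endpoints() {
  # Print one "name url" line per name=url pair taken from $1, or from
  # ECS_HTTP_ENDPOINTS when no argument is given.
  local raw="${1:-${ECS_HTTP_ENDPOINTS:-}}" pair
  local -a pairs
  IFS=',' read -r -a pairs <<< "$raw"
  for pair in "${pairs[@]}"; do
    [ -n "$pair" ] || continue
    # Split on the first "=" only, so URLs may themselves contain "=".
    printf '%s %s\n' "${pair%%=*}" "${pair#*=}"
  done
}
```

Splitting each `name=url` pair on the first `=` only keeps query strings such as `?v=1` intact.
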

Output files:

- `ECS_HEALTH_STATE` (default: `./data/ecs-health.json`): Health state JSON file
- `ECS_HEALTH_OUTDIR` (default: `./data/`): Output directory for logs, alerts, and analysis reports

`scripts/ecs-health.sh` (Health Monitor):

```bash
# Full check
ECS_CLUSTER=my-cluster ECS_DOMAIN=example.com ./scripts/ecs-health.sh

# JSON output only
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --json

# Quiet mode (no alerts, just status file)
ECS_CLUSTER=my-cluster ./scripts/ecs-health.sh --quiet
```

Exit codes: 0 = healthy, 1 = unhealthy/degraded, 2 = script error
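
The exit codes make the health monitor easy to gate on from other tooling. A minimal sketch, assuming only the three documented codes; `check_status` is a hypothetical helper, not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run any health-check command, print a label for its
# exit code (per the convention documented for ecs-health.sh), and return
# the original exit code unchanged.
check_status() {
  "$@"
  local rc=$?   # exit status of the wrapped command
  case "$rc" in
    0) echo "healthy" ;;
    1) echo "unhealthy/degraded" ;;
    2) echo "script error" ;;
    *) echo "unexpected exit code $rc" ;;
  esac
  return "$rc"
}
```

For example, after `export ECS_CLUSTER=my-cluster`, running `check_status ./scripts/ecs-health.sh --quiet` prints one of the labels and still returns the script's own exit code, so it can be chained with `&&` or `||`.
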

`scripts/cloudwatch-logs.sh` (Log Analyzer):

```bash
# Pull raw logs from a service
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh pull my-api --minutes 30

# Show errors across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh errors all --minutes 120

# Deep analysis with error categorization
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose --minutes 60

# Detect container restarts
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh restarts my-api

# Auto-diagnose from health state file
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh auto-diagnose

# Summary across all services
ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh summary --minutes 120
```

Options: --minutes N (default: 60), --json, --limit N (default: 200), --verbose
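
`--minutes` and `--limit` correspond naturally to inputs of the CloudWatch Logs `FilterLogEvents` API, which takes its start time in milliseconds since the Unix epoch. The sketch below assumes that mapping; it is not the analyzer's implementation, and `start_time_ms` and `fetch_error_events` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch (assumption): translate a --minutes window into the epoch-millisecond
# start time that CloudWatch Logs FilterLogEvents expects.
start_time_ms() {
  local minutes="${1:-60}"        # same default as --minutes
  local now="${2:-$(date +%s)}"   # optional fixed "now" (seconds) for reproducible runs
  echo $(( (now - minutes * 60) * 1000 ))
}

# Hypothetical helper: fetch events containing the term ERROR from one log
# group. Filter-pattern terms are matched case-sensitively.
fetch_error_events() {
  local group="$1" minutes="${2:-60}" max_items="${3:-200}"
  aws logs filter-log-events \
    --region "${ECS_REGION:-us-east-1}" \
    --log-group-name "$group" \
    --start-time "$(start_time_ms "$minutes")" \
    --filter-pattern "ERROR" \
    --max-items "$max_items" \
    --output json
}
```

Here `--max-items` is the AWS CLI's own pagination control, used to cap the total number of events returned in the same spirit as `--limit N`.
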
When ECS_SERVICES is not set, both scripts auto-detect services from the cluster:

```bash
aws ecs list-services --cluster "$ECS_CLUSTER"
```

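`list-services` returns service ARNs rather than bare names, and the AWS CLI follows pagination automatically. A sketch of the detection step under those assumptions, preferring `ECS_SERVICES` when it is set; `resolve_services` and `arns_to_names` are hypothetical helpers, not the scripts' own code.

```bash
#!/usr/bin/env bash
# Sketch (assumption): resolve the list of services to monitor.

# list-services returns ARNs such as
#   arn:aws:ecs:us-east-1:123456789012:service/my-cluster/my-api
# (older-format ARNs omit the cluster segment); the name is the last segment.
arns_to_names() {
  local arn
  # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while read -r arn || [ -n "$arn" ]; do
    if [ -n "$arn" ]; then
      echo "${arn##*/}"
    fi
  done
}

resolve_services() {
  if [ -n "${ECS_SERVICES:-}" ]; then
    # An explicit comma-separated list wins over auto-detection.
    tr ',' '\n' <<< "$ECS_SERVICES"
  else
    # Text output separates the ARNs with tabs; put one per line.
    aws ecs list-services --cluster "$ECS_CLUSTER" \
      --region "${ECS_REGION:-us-east-1}" \
      --query 'serviceArns[]' --output text |
      tr '\t' '\n' | arns_to_names
  fi
}
```
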
Log groups are resolved by pattern (default /ecs/{service}). Override with ECS_LOG_PATTERN:

```bash
# If your log groups are /ecs/prod/my-api, /ecs/prod/my-frontend, etc.
ECS_LOG_PATTERN="/ecs/prod/{service}" ECS_CLUSTER=my-cluster ./scripts/cloudwatch-logs.sh diagnose
```

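Pattern resolution amounts to a string substitution. A minimal sketch, assuming every `{service}` token in the pattern is replaced by the service name; `resolve_log_group` is a hypothetical helper and the skill's scripts may resolve patterns differently.

```bash
#!/usr/bin/env bash
# Sketch (assumption): expand ECS_LOG_PATTERN for a single service.
resolve_log_group() {
  local service="$1"
  local default_pattern='/ecs/{service}'
  local pattern="${ECS_LOG_PATTERN:-$default_pattern}"
  local token='{service}'
  # "{" and "}" are not glob characters, so the token matches literally;
  # "//" replaces every occurrence, not just the first.
  echo "${pattern//$token/$service}"
}
```
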
The health monitor can trigger the log analyzer for auto-diagnosis when issues are detected. Set ECS_HEALTH_OUTDIR to a shared directory and run both scripts together:

```bash
export ECS_CLUSTER=my-cluster
export ECS_DOMAIN=example.com
export ECS_HEALTH_OUTDIR=./data

# Run health check (auto-triggers log analysis on failure)
./scripts/ecs-health.sh

# Or run log analysis independently
./scripts/cloudwatch-logs.sh auto-diagnose --minutes 30
```

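If you would rather chain the two steps yourself, for example from cron, the documented exit codes are enough to gate the analyzer. A minimal sketch; `monitor_and_diagnose` and its scripts-directory and minutes arguments are hypothetical, and because `ecs-health.sh` may already trigger analysis on failure, this wrapper can end up running the analyzer a second time.

```bash
#!/usr/bin/env bash
# Hypothetical cron-style wrapper. It relies only on the documented exit
# codes (0 healthy, 1 unhealthy/degraded, 2 script error) and on the
# auto-diagnose subcommand of cloudwatch-logs.sh.
monitor_and_diagnose() {
  local scripts_dir="${1:-./scripts}"
  local minutes="${2:-30}"

  "$scripts_dir/ecs-health.sh" --quiet
  local rc=$?

  case "$rc" in
    0) ;;  # healthy: nothing to investigate
    1) "$scripts_dir/cloudwatch-logs.sh" auto-diagnose --minutes "$minutes" ;;
    *) echo "ecs-health.sh failed with exit code $rc" >&2 ;;
  esac
  return "$rc"
}
```

With `ECS_CLUSTER` and `ECS_HEALTH_OUTDIR` exported as above, `monitor_and_diagnose ./scripts 30` runs the analyzer only when the health check reports a degraded state.
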
The log analyzer classifies errors into:

- `panic`: Go panics
- `fatal`: Fatal errors
- `oom`: Out of memory
- `timeout`: Connection/request timeouts
- `connection_error`: Connection refused/reset
- `http_5xx`: HTTP 500-level responses
- `python_traceback`: Python tracebacks
- `exception`: Generic exceptions
- `auth_error`: Permission/authorization failures
- `structured_error`: JSON-structured error logs
- `error`: Generic ERROR-level messages

Health check noise (`GET`/`HEAD /health` requests from the ALB) is automatically filtered out of error counts and the HTTP status distribution.

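The category list above could be approximated with ordered regular expressions. The sketch below is illustrative only: the patterns, the first-match-wins ordering, and the helper names (`classify_line`, `is_health_noise`, `count_categories`) are assumptions rather than the analyzer's real rules, and it needs bash 4 or later for `${var,,}` lower-casing.

```bash
#!/usr/bin/env bash
# Illustrative classifier (assumption): regexes and ordering are guesses,
# not the skill's actual rules. Requires bash 4+.

CATEGORY_NAMES=(panic fatal oom timeout connection_error http_5xx
                python_traceback exception auth_error structured_error error)
CATEGORY_PATTERNS=(
  'panic:'
  'fatal'
  'out of memory|oomkilled|cannot allocate memory'
  'timeout|timed out|deadline exceeded'
  'connection refused|connection reset'
  'http/[0-9.]+" 5[0-9]{2}|status[":= ]+5[0-9]{2}'
  'traceback \(most recent call last\)'
  'exception'
  'permission denied|unauthorized|forbidden|access denied'
  '"(level|severity)" *: *"error"'
  '(^|[^a-z])error([^a-z]|$)'
)

# Succeeds for ALB health-check requests (GET/HEAD /health).
is_health_noise() {
  local line="${1,,}" re='(get|head) /health([ ?/"]|$)'
  [[ $line =~ $re ]]
}

# Print the first matching category for one log line; fail if none match.
classify_line() {
  local line="${1,,}" i
  for i in "${!CATEGORY_NAMES[@]}"; do
    if [[ $line =~ ${CATEGORY_PATTERNS[i]} ]]; then
      echo "${CATEGORY_NAMES[i]}"
      return 0
    fi
  done
  return 1
}

# Count categories for log lines on stdin, skipping health-check noise.
count_categories() {
  local line category
  while IFS= read -r line; do
    if is_health_noise "$line"; then continue; fi
    category="$(classify_line "$line")" || continue
    echo "$category"
  done | sort | uniq -c
}
```

Because the first match wins, a Go line such as `fatal error: runtime: out of memory` is counted as `fatal` rather than `oom`; reorder the two arrays together if you prefer a different precedence.
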
Generated Mar 1, 2026
Monitors AWS ECS services hosting an e-commerce application, ensuring high availability during peak shopping periods. It checks service health, SSL certificates for secure transactions, and analyzes CloudWatch logs to quickly identify and categorize errors like payment gateway timeouts or inventory service failures.
Provides continuous health monitoring for a SaaS platform's microservices on ECS, automating detection of degraded services and SSL expiry. Log analysis helps diagnose issues like user authentication errors or API rate limit breaches, reducing downtime and improving customer satisfaction.
Supports regulatory compliance for financial applications by monitoring ECS service uptime and SSL certificate validity. Log analysis surfaces security-related errors such as unauthorized access attempts or transaction failures, which helps with audit trails and incident response.
Monitors ECS clusters that process patient data, checking service health and ALB target groups for reliability. Log analysis identifies errors like data ingestion timeouts or processing exceptions, helping protect data integrity and support healthcare compliance requirements.
Tracks ECS services for a media streaming platform, monitoring load balancer health and SSL certificates for secure content delivery. Log analysis categorizes errors such as buffering timeouts or encoding failures, helping optimize performance and reduce viewer interruptions.
Offers AWS ECS monitoring as part of a managed IT services package, using this skill to provide clients with automated health checks and log analysis. Revenue is generated through monthly subscription fees based on the number of clusters or services monitored, with upsells for advanced diagnostics.
Integrates this skill into consulting engagements to help clients set up production monitoring for their ECS environments. Revenue comes from project-based fees for implementation and ongoing support contracts, leveraging the skill's auto-diagnosis features to reduce troubleshooting time.
Uses this skill as a core component in a broader SaaS product that monitors multiple cloud services, offering ECS-specific insights. Revenue is generated through tiered pricing plans based on usage metrics like log volume or number of alerts, with premium features for deep log analysis.
💬 Integration Tip
Set ECS_HEALTH_OUTDIR to a shared directory and run ecs-health.sh followed by cloudwatch-logs.sh auto-diagnose for automated issue investigation when health checks fail.