error-recovery-automation

Standardize handling of common OpenClaw errors (gateway restart, browser service unavailable, cron failures) with automated recovery steps. Use when you need...
Install via ClawdBot CLI:
clawdbot install konscious0beast/error-recovery-automation

This skill provides patterns for automating the detection and recovery of common OpenClaw errors: gateway unresponsiveness, browser service failures, cron scheduler issues, and other recurring problems. It builds on health-monitoring and system-diagnostics by adding automated recovery workflows that can be triggered by cron jobs, heartbeat checks, or external monitoring.
Before automating recovery, you must reliably detect the error. Use these detection methods:
Gateway unresponsive:
- openclaw gateway status returns a non-zero exit code or shows "running": false.
- Logs (~/.openclaw/logs/gateway.err.log) contain recent CRITICAL or ERROR entries.

Browser service unavailable:
- openclaw browser --browser-profile openclaw status --json shows "running": false or reports that CDP is not ready.
- A curl request to the CDP endpoint fails.

Cron scheduler not running:
- openclaw cron status returns "running": false or an error.
- Jobs miss their scheduled runs (check openclaw cron list for missed runs).

Memory search disabled:
- The memory_search tool returns "disabled" or a native-module error.
- openclaw doctor --fix reports a better-sqlite3 mismatch.

Permission errors:
- Commands or logs report EACCES/EPERM.

For each error type, define a recovery script that attempts to restore service automatically. Each template below checks the service, restarts it if it is unhealthy, retries up to MAX_ATTEMPTS times, logs every step with a timestamp, and exits non-zero if the service cannot be recovered:
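#!/bin/bash
# scripts/gateway-recovery.sh: restart the gateway if it is unhealthy.
set -e
SERVICE="gateway"
MAX_ATTEMPTS=2
SLEEP_SECONDS=5
log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"; }
check() {
  openclaw gateway status > /dev/null 2>&1
}
restart() {
  openclaw gateway restart
  sleep "$SLEEP_SECONDS"
}
attempt=0
while [ "$attempt" -lt "$MAX_ATTEMPTS" ]; do
  if check; then
    log "$SERVICE is healthy"
    exit 0
  fi
  log "$SERVICE is unhealthy, restarting (attempt $((attempt+1))/$MAX_ATTEMPTS)..."
  # A failed restart command is logged instead of aborting the script under set -e.
  restart || log "$SERVICE restart command failed"
  # Not ((attempt++)): it evaluates to 0 on the first pass, returns status 1, and set -e exits.
  attempt=$((attempt + 1))
done
# Re-check after the final restart so a successful last attempt is not reported as a failure.
if check; then
  log "$SERVICE recovered"
  exit 0
fi
log "$SERVICE could not be recovered after $MAX_ATTEMPTS attempts"
exit 1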
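The gateway template decides health from the exit code of openclaw gateway status alone, while the detection list also mentions CRITICAL/ERROR entries in the gateway error log. A minimal sketch of a log-based signal follows. It assumes the log path from the detection list, plain-text lines that contain the level name, and an arbitrary 200-line window; count_recent_matches, gateway_log_clean, and gateway_log_permission_errors are hypothetical helper names. The same function also covers the EACCES/EPERM permission errors.

```bash
#!/bin/bash
# Sketch: log-based health signals to complement the exit-code check.
# Assumptions: log path from the detection list, 200-line window,
# level names (CRITICAL, ERROR) appear literally in each log line.
LOG_FILE="${LOG_FILE:-$HOME/.openclaw/logs/gateway.err.log}"
TAIL_LINES="${TAIL_LINES:-200}"

# count_recent_matches FILE N REGEX: print how many of the last N lines of
# FILE match the extended REGEX. Prints 0 if FILE does not exist.
count_recent_matches() {
  local file="$1" lines="$2" pattern="$3"
  if [ ! -f "$file" ]; then
    echo 0
    return 0
  fi
  # grep -c prints 0 but exits 1 when nothing matches; || true keeps that quiet.
  tail -n "$lines" "$file" | grep -Ec "$pattern" || true
}

# Succeeds only if the recent log window has no CRITICAL or ERROR lines.
gateway_log_clean() {
  [ "$(count_recent_matches "$LOG_FILE" "$TAIL_LINES" 'CRITICAL|ERROR')" -eq 0 ]
}

# Prints the number of recent permission errors (EACCES/EPERM).
gateway_log_permission_errors() {
  count_recent_matches "$LOG_FILE" "$TAIL_LINES" 'EACCES|EPERM'
}
```

Old entries stay in the log until it rotates, so a restart does not clear them. For that reason this check is better used to raise an alert than as the restart trigger inside the retry loop, where it could cause repeated restarts of a healthy gateway.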
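#!/bin/bash
# scripts/browser-recovery.sh: stop and restart the browser profile if it is not running.
set -e
SERVICE="browser"
PROFILE="openclaw"
MAX_ATTEMPTS=2
SLEEP_SECONDS=8
log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"; }
check() {
  # Tolerate pretty-printed JSON ("running": true with a space).
  openclaw browser --browser-profile "$PROFILE" status --json 2>&1 | grep -Eq '"running": *true'
}
restart() {
  # stop may fail if the browser is already down; that is fine.
  openclaw browser --browser-profile "$PROFILE" stop || true
  sleep 2
  openclaw browser --browser-profile "$PROFILE" start
  sleep "$SLEEP_SECONDS"
}
attempt=0
while [ "$attempt" -lt "$MAX_ATTEMPTS" ]; do
  if check; then
    log "$SERVICE ($PROFILE) is healthy"
    exit 0
  fi
  log "$SERVICE ($PROFILE) is unhealthy, restarting (attempt $((attempt+1))/$MAX_ATTEMPTS)..."
  restart || log "$SERVICE ($PROFILE) restart command failed"
  # Not ((attempt++)): with set -e it aborts the script on the first pass.
  attempt=$((attempt + 1))
done
# Re-check after the final restart.
if check; then
  log "$SERVICE ($PROFILE) recovered"
  exit 0
fi
log "$SERVICE ($PROFILE) could not be recovered after $MAX_ATTEMPTS attempts"
exit 1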
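The status checks in these templates look for the "running" field with grep. If the CLI pretty-prints its JSON (a space after the colon, as written in the detection list), an exact match on "running":true misses it. A small reusable sketch, assuming only that the status output is JSON text with a "running" boolean on one line; json_running_true is a hypothetical name and the exact OpenClaw schema is not confirmed here:

```bash
#!/bin/bash
# Sketch: whitespace-tolerant test for "running": true in status JSON.
# Assumption: status output is JSON text with a "running" boolean field.

# Reads JSON text on stdin; succeeds if it contains "running" : true
# with any amount of whitespace around the colon.
json_running_true() {
  grep -Eq '"running"[[:space:]]*:[[:space:]]*true'
}
```

Usage inside a template: check() { openclaw browser --browser-profile "$PROFILE" status --json 2>&1 | json_running_true; }. If jq is installed, jq -e '.running == true' > /dev/null is stricter, assuming running is a top-level key; in that case drop 2>&1 so stderr text does not break JSON parsing.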
#!/bin/bash
# scripts/cron-recovery.sh: restart the gateway if the cron scheduler is not running.
set -e
SERVICE="cron"
MAX_ATTEMPTS=1
SLEEP_SECONDS=3
log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"; }
check() {
  openclaw cron status 2>&1 | grep -Eq '"running": *true'
}
restart() {
  # Cron is restarted automatically when gateway restarts.
  # If cron is not running, restart gateway.
  openclaw gateway restart
  sleep "$SLEEP_SECONDS"
}
attempt=0
while [ "$attempt" -lt "$MAX_ATTEMPTS" ]; do
  if check; then
    log "$SERVICE scheduler is running"
    exit 0
  fi
  log "$SERVICE scheduler is not running, restarting gateway (attempt $((attempt+1))/$MAX_ATTEMPTS)..."
  restart || log "gateway restart command failed"
  attempt=$((attempt + 1))
done
# With MAX_ATTEMPTS=1 the loop never checks again, so verify the restart here.
if check; then
  log "$SERVICE scheduler recovered"
  exit 0
fi
log "$SERVICE scheduler still not running after $MAX_ATTEMPTS attempts"
exit 1
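#!/bin/bash
# Memory search recovery: rebuild better-sqlite3 if memory search is disabled.
set -e
SERVICE="memory_search"
MAX_ATTEMPTS=1
log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"; }
check() {
  # Healthy only if the output contains neither error marker.
  # (grep -v -q would succeed as soon as any single line lacked the markers.)
  ! openclaw memory search --query "test" 2>&1 | grep -Eq 'disabled|Module did not self-register'
}
restart() {
  # Rebuild better-sqlite3 inside the globally installed openclaw package
  # (assumes a global npm install); the subshell keeps the cd local.
  (cd "$(npm root -g)/openclaw" && npm rebuild better-sqlite3)
  # Restart gateway to pick up the rebuilt module
  openclaw gateway restart
  sleep 5
}
attempt=0
while [ "$attempt" -lt "$MAX_ATTEMPTS" ]; do
  if check; then
    log "$SERVICE is functional"
    exit 0
  fi
  log "$SERVICE is disabled, rebuilding native module (attempt $((attempt+1))/$MAX_ATTEMPTS)..."
  restart || log "$SERVICE rebuild or restart failed"
  attempt=$((attempt + 1))
done
# Verify the rebuild before reporting failure.
if check; then
  log "$SERVICE recovered"
  exit 0
fi
log "$SERVICE could not be recovered after $MAX_ATTEMPTS attempts"
exit 1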
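The four templates repeat the same check, restart, and retry loop with different check and restart functions. A sketch of factoring that loop into one function, assuming you pass the names of your own check and restart functions; recover_service is a hypothetical helper, not part of the OpenClaw CLI. It re-checks after the last restart so that a single-attempt recovery can still report success.

```bash
#!/bin/bash
# Sketch: shared check/restart/retry loop for the recovery templates.
# recover_service is a hypothetical helper name.

log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"; }

# recover_service NAME CHECK_FN RESTART_FN MAX_ATTEMPTS
# Returns 0 if CHECK_FN succeeds, either initially or after a restart.
recover_service() {
  local name="$1" check_fn="$2" restart_fn="$3" max="$4"
  local attempt=0
  while [ "$attempt" -lt "$max" ]; do
    if "$check_fn"; then
      log "$name is healthy"
      return 0
    fi
    attempt=$((attempt + 1))
    log "$name is unhealthy, restarting (attempt $attempt/$max)..."
    # A failed restart command is logged, not fatal; the next check decides.
    "$restart_fn" || log "$name restart command failed"
  done
  # One final check after the last restart.
  if "$check_fn"; then
    log "$name recovered"
    return 0
  fi
  log "$name could not be recovered after $max attempts"
  return 1
}
```

In gateway-recovery.sh, for example, keep the existing check and restart functions and replace the loop and the lines after it with recover_service gateway check restart 2 || exit 1.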
Once you have a recovery script, schedule it as a cron job at an interval that matches how often the service tends to fail (e.g., every 30 minutes for the browser, every hour for the gateway). Use an isolated agent session to execute the script and announce failures.
Example cron job for browser recovery:
openclaw cron add \
--name "Browser-Recovery-Automation" \
--schedule 'every 30 minutes' \
--session isolated \
--payload '{"kind":"agentTurn","message":"Run browser recovery automation script","model":"default","thinking":"low"}' \
--delivery '{"mode":"announce","channel":"telegram"}'
Inside the isolated session, the agent reads the script (or inline logic) and runs it via exec. If the script exits with 0, the agent announces success; if it exits non-zero, the cron delivery forwards the failure message.
Alternative: You can embed the recovery logic directly in the agent’s response (without a separate script) for simplicity, but a script is easier to test and reuse.
If automated recovery fails after the maximum attempts, escalate:
- Log the failure in memory/YYYY-MM-DD.md with the tag error-recovery-failed.
- Add a task to inbox/agent-aufgaben.md (the agent task list) for manual diagnosis.

Example escalation snippet:
# Run the recovery script and keep its exit code before another command overwrites $?.
status=0
./scripts/browser-recovery.sh || status=$?
if [ "$status" -ne 0 ]; then
  echo "Browser recovery failed. Adding manual diagnosis task."
  # Append to agent-aufgaben.md
  echo "| 99 | Diagnose browser recovery failure – automated recovery failed after 2 attempts | ⬜ |" >> inbox/agent-aufgaben.md
  # Store in memory
  MEMORY_FILE="memory/$(date +%Y-%m-%d).md"
  {
    echo "## [error] Browser recovery automation failed"
    echo "Date: $(date +%Y-%m-%d)"
    echo "Tags: error, browser, error-recovery-failed"
    echo "Browser recovery script exited with code $status. Manual intervention required."
  } >> "$MEMORY_FILE"
fi
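The escalation snippet hardcodes task number 99, so repeated failures produce duplicate IDs. A sketch that picks the next free number, assuming task rows in inbox/agent-aufgaben.md have the form | <number> | <description> | <status> | and that header or separator rows have a non-numeric first cell; next_task_id and add_task are hypothetical helper names.

```bash
#!/bin/bash
# Sketch: append a task row with the next free number instead of a fixed 99.
# Assumed row format: | <number> | <description> | <status> |

# next_task_id FILE: print the largest first-column number + 1 (1 if none).
next_task_id() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo 1
    return 0
  fi
  # Split on "|"; field 2 is the first cell. Non-numeric cells are skipped.
  awk -F'|' '
    $2 ~ /^[ \t]*[0-9]+[ \t]*$/ {
      n = $2 + 0
      if (n > max) max = n
    }
    END { print max + 1 }
  ' "$file"
}

# add_task FILE DESCRIPTION: append "| <id> | DESCRIPTION | ⬜ |".
add_task() {
  local file="$1" description="$2" id
  id=$(next_task_id "$file")
  echo "| $id | $description | ⬜ |" >> "$file"
}
```

In the escalation snippet, the echo line for agent-aufgaben.md would become add_task inbox/agent-aufgaben.md "Diagnose browser recovery failure – automated recovery failed after 2 attempts".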
Before deploying a recovery script as a cron job, test it manually: stop the service, run the script, and confirm that the service is running again.
Example test command:
# Stop browser service
openclaw browser --browser-profile openclaw stop
# Run recovery script
./scripts/browser-recovery.sh
# Verify browser is running
openclaw browser --browser-profile openclaw status --json | grep -E '"running": *true'
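The manual test above has three steps: break the service, run the recovery script, and verify the result. A sketch that wraps those steps into one pass/fail check, so the same test can be repeated for each script; test_recovery is a hypothetical helper, and each command is passed as a string and run with bash -c.

```bash
#!/bin/bash
# Sketch: run break / recover / verify as a single pass-fail test.
# test_recovery is a hypothetical helper name.

# test_recovery LABEL BREAK_CMD RECOVER_CMD VERIFY_CMD
# Prints "PASS LABEL" and returns 0, or prints a FAIL reason and returns 1.
test_recovery() {
  local label="$1" break_cmd="$2" recover_cmd="$3" verify_cmd="$4"
  # Breaking the service may fail if it is already down; ignore that.
  bash -c "$break_cmd" > /dev/null 2>&1 || true
  if ! bash -c "$recover_cmd"; then
    echo "FAIL $label: recovery script exited non-zero"
    return 1
  fi
  if ! bash -c "$verify_cmd" > /dev/null 2>&1; then
    echo "FAIL $label: service not healthy after recovery"
    return 1
  fi
  echo "PASS $label"
}
```

For the browser example, BREAK_CMD is the stop command shown above, RECOVER_CMD is ./scripts/browser-recovery.sh, and VERIFY_CMD is the status-and-grep line. The recovery script's own log output is left visible so a failure can be diagnosed.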
Gateway recovery: script scripts/gateway-recovery.sh (see template above), cron schedule every 1 hour, announce only on failure.
Browser recovery: script scripts/browser-recovery.sh (see template above), cron schedule every 30 minutes, announce only on failure.
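"Announce only on failure" means the job should stay silent when recovery succeeds. One way to get that behavior, sketched under the assumption that the agent announces whatever the command prints: buffer the script's output and print it only if the script exits non-zero. run_quiet_unless_failed is a hypothetical helper, and how OpenClaw's announce delivery treats empty output is not confirmed here.

```bash
#!/bin/bash
# Sketch: run a command silently; print its buffered output only on failure.
# run_quiet_unless_failed is a hypothetical helper name.

# run_quiet_unless_failed CMD [ARGS...]: returns the command's exit code.
run_quiet_unless_failed() {
  local out status=0
  # "|| status=$?" keeps a failure from aborting a caller that uses set -e.
  out=$("$@" 2>&1) || status=$?
  if [ "$status" -ne 0 ]; then
    printf '%s\n' "$out"
    echo "exit code: $status"
  fi
  return "$status"
}
```

For example, run_quiet_unless_failed ./scripts/gateway-recovery.sh prints nothing when the gateway is healthy or was recovered, and prints the script's log plus its exit code when recovery fails.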
Combined check: a single script that checks multiple services and recovers any that are unhealthy, useful for a comprehensive “keep-alive” cron job.
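#!/bin/bash
# Combined keep-alive: restart any unhealthy service, then re-verify before reporting.
set -e
PROFILE="openclaw"
log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"; }
gateway_ok() { openclaw gateway status > /dev/null 2>&1; }
browser_ok() { openclaw browser --browser-profile "$PROFILE" status --json 2>&1 | grep -Eq '"running": *true'; }
# Check gateway
if ! gateway_ok; then
  log "Gateway unhealthy, restarting..."
  openclaw gateway restart || log "Gateway restart command failed"
  sleep 5
fi
# Check browser
if ! browser_ok; then
  log "Browser unhealthy, restarting..."
  openclaw browser --browser-profile "$PROFILE" stop || true
  sleep 2
  openclaw browser --browser-profile "$PROFILE" start || log "Browser start command failed"
  sleep 8
fi
# Re-check both services so a failed restart is reported instead of masked.
if gateway_ok && browser_ok; then
  log "All services healthy"
  exit 0
fi
log "At least one service is still unhealthy"
exit 1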
Schedule this script every 30 minutes with an isolated agentTurn job.
Related files and commands:
- scripts/gateway-recovery.sh (template)
- scripts/browser-recovery.sh (template)
- scripts/cron-recovery.sh (template)
- skills/health-monitoring/SKILL.md
- skills/system-diagnostics/SKILL.md
- docs/MAINTENANCE.md
- memory/patterns/playbooks.md
- openclaw cron --help
- openclaw gateway --help
- openclaw browser --help

When an OpenClaw error occurs (gateway, browser, cron, memory search), read this skill to create or run an automated recovery script. Store successful recovery patterns in memory/patterns/tools.md. Update pending.md if automation fails and manual intervention is needed.
This skill increases autonomy by providing standardized, automated recovery workflows for common failures, reducing the need for manual intervention and increasing system resilience.
Generated Mar 1, 2026
An e-commerce site uses OpenClaw for automated product scraping and price monitoring. The gateway service intermittently fails due to high traffic, causing downtime. This skill automates detection and restart of the gateway, ensuring continuous data collection and reducing manual intervention by IT staff.
A financial institution relies on OpenClaw to aggregate real-time market data from multiple sources. Browser service failures disrupt data feeds, leading to incomplete analytics. This skill automates recovery of the browser service, maintaining data integrity and minimizing operational risks in fast-paced trading environments.
A healthcare provider uses OpenClaw to automate appointment scheduling and patient data synchronization across clinics. Cron scheduler issues cause missed job executions, affecting appointment reminders. This skill detects and recovers cron failures by restarting the gateway, ensuring reliable scheduling and improving patient care coordination.
A media company employs OpenClaw for automated content scraping and archiving from news websites. Memory search errors due to native module mismatches halt content indexing. This skill rebuilds the better-sqlite3 module and restarts services, restoring search functionality and preventing data loss in content pipelines.
A logistics firm uses OpenClaw to monitor shipment tracking data from carrier websites. Permission errors on log files disrupt error logging and system diagnostics. This skill detects these EACCES/EPERM errors and provides a pattern for automating the fix and the service recovery, ensuring continuous tracking and reducing manual troubleshooting in supply chain operations.
Offer this skill as part of a premium subscription for OpenClaw users, providing automated error recovery features. Revenue is generated through monthly or annual fees, targeting businesses that require high uptime and reduced IT overhead. This model includes regular updates and support for new error patterns.
Provide consulting services to customize and integrate this skill into clients' existing OpenClaw deployments. Revenue comes from one-time project fees or hourly rates, focusing on enterprises with complex infrastructure. This model includes training and ongoing maintenance contracts for tailored recovery workflows.
Release the skill as open source to build community adoption, while offering premium support and advanced features for a fee. Revenue is generated from support contracts, custom development, and enterprise licenses, appealing to organizations that value transparency but need guaranteed assistance.
💬 Integration Tip
Schedule recovery scripts as cron jobs during low-activity periods to minimize disruption, and test them in a staging environment first to ensure they handle edge cases without causing additional failures.
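To keep disruptive restarts inside a low-activity period, a recovery script can check the current hour before acting. A minimal sketch, assuming an example window of 01:00 to 05:00 local time; in_window, QUIET_START, and QUIET_END are hypothetical names, and windows that wrap past midnight are handled.

```bash
#!/bin/bash
# Sketch: succeed only if the current hour falls inside a quiet window.
# The 01:00-05:00 default window is an assumed example value.

# in_window HOUR START END: succeeds if HOUR (0-23) is in [START, END).
# Supports windows that wrap past midnight, e.g. START=22 END=4.
in_window() {
  # 10# forces base 10 so "08" and "09" from date +%H are not read as octal.
  local hour=$((10#$1)) start="$2" end="$3"
  if [ "$start" -le "$end" ]; then
    [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
  else
    [ "$hour" -ge "$start" ] || [ "$hour" -lt "$end" ]
  fi
}

QUIET_START="${QUIET_START:-1}"
QUIET_END="${QUIET_END:-5}"
```

A script would call in_window "$(date +%H)" "$QUIET_START" "$QUIET_END" || exit 0 before a preventive restart. A guard like this suits disruptive maintenance; deferring recovery of a service that is already down until the window opens would lengthen the outage.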