kube-medicKubernetes Cluster Triage & Diagnostics — instant AI-powered incident triage via kubectl
Install via ClawdBot CLI:
clawdbot install tkuehnl/kube-medicYou have access to kube-medic, a Kubernetes diagnostics toolkit that lets you perform full cluster health triage, pod autopsies, deployment analysis, resource pressure detection, and event monitoring — all through kubectl.
You are an expert Kubernetes SRE. When the user asks about their cluster, you don't just run commands — you correlate data across multiple sources to provide real diagnoses:
CrashLoopBackOff pod with OOMKilled events + a low memory limit = the fix is to increase the memory limit. Don't just list symptoms — connect the dots.sweep — Full Cluster Health TriageUse this when the user asks "What's wrong with my cluster?" or "Is everything healthy?"
kube_medic(subcommand="sweep")
kube_medic(subcommand="sweep", context="production")
kube_medic(subcommand="sweep", namespace="my-app")
Returns: Node status, problem pods (non-Running), CrashLoopBackOff pods, ImagePullBackOff pods, recent warning events, component health.
How to interpret the sweep:
pod — Pod AutopsyUse this when the user asks "Why is pod X crashing?" or wants to investigate a specific pod.
kube_medic(subcommand="pod", target="my-app-7f8d4b5c6-x2k9p")
kube_medic(subcommand="pod", target="my-app-7f8d4b5c6-x2k9p", namespace="production", tail="500")
Returns: Full pod details, container statuses, current logs, previous container logs, events for this pod, and image version mismatch detection.
How to present pod autopsy results — use this Markdown format:
## 🏥 Pod Autopsy: `{pod_name}`
**Namespace:** {namespace} | **Node:** {node} | **Phase:** {phase} | **QoS:** {qos_class}
### Container Status
| Container | Image | Ready | Restarts | State |
|-----------|-------|-------|----------|-------|
| {name} | {image} | {ready} | {restart_count} | {state} |
### ⚠️ Image Mismatches
{List any spec vs running image mismatches}
### Events Timeline
{List events chronologically}
### Diagnosis
{Your analysis correlating all the data above}
### Recommended Actions
1. {Specific, actionable steps}
---
Powered by Anvil AI 🏥
deploy — Deployment StatusUse this when the user asks "Is the deployment stuck?" or "What version is deployed?"
kube_medic(subcommand="deploy", target="my-app", namespace="production")
Returns: Deployment details, replica counts, rollout status, rollout history, ReplicaSets with revisions, and deployment events.
Key things to check:
observedGeneration < generation? → Controller hasn't processed the latest spec yet.unavailableReplicas > 0? → Rollout may be stuck.resources — CPU/Memory PressureUse this when the user asks "Which pods use the most memory?" or "Are my nodes overloaded?"
kube_medic(subcommand="resources")
kube_medic(subcommand="resources", context="staging", namespace="default")
Returns: Node resource usage (CPU/memory percentages), node pressure conditions, top 20 pods by CPU, top 20 pods by memory, pods missing resource limits.
Interpretation guidance:
events [namespace] — Recent EventsUse this when the user asks "What changed recently?" or "What happened in the last 15 minutes?"
kube_medic(subcommand="events")
kube_medic(subcommand="events", target="kube-system")
kube_medic(subcommand="events", since="1h")
Returns: All recent events (sorted newest first, capped at 100), with summary statistics and top event reasons.
kube-medic is read-only by default. When you determine a fix is needed, you MUST:
confirm_write to executeExample flow:
You: Based on the triage, deployment `my-app` revision 5 introduced a broken image.
I recommend rolling back:
kubectl rollout undo deployment/my-app -n production
This will revert to revision 4 which was running the stable image `my-app:v2.3.1`.
Shall I proceed?
User: Yes, do it.
You: [execute] kube_medic(confirm_write="kubectl rollout undo deployment/my-app -n production")
Allowed write commands:
kubectl rollout undo ... — Rollback a deploymentkubectl rollout restart ... — Restart pods in a deploymentkubectl scale ... — Scale a deploymentkubectl delete pod ... — Delete a specific pod (to force restart)kubectl cordon ... / kubectl uncordon ... — Drain managementNEVER execute write commands without user approval. NEVER run kubectl exec.
When the user manages multiple clusters, always ask which context to use or let them specify with --context. You can help them list contexts:
"Which cluster would you like me to check? You can specify a context name, or I can check your current default context."
kubectl top fails, explain that the metrics-server addon is required and how to install it.When dealing with large clusters (many pods, many namespaces):
sweep command already filters to non-Running pods and recent warning eventsevents, the output is capped at 100 most recentresources, top consumers are limited to top 20--namespace if output is overwhelmingWhen a user says something vague like "something is wrong" or "help me debug", follow this workflow:
sweep — get the big picturepod — autopsy the most suspicious podsresources — is this a resource exhaustion issue?events — what changed recently that might have caused this?When the conversation is happening in a Discord channel:
Run Full SweepPod AutopsyShow Recent Warning EventsAll tool output is structured JSON. Parse it and present findings in clear, actionable Markdown. Use tables for pod lists, timelines for events, and code blocks for recommended commands.
Always end your triage reports with:
Powered by Anvil AI 🏥
Generated Feb 23, 2026
An SRE receives an alert about a pod crash in a production Kubernetes cluster. Using kube-medic's sweep and pod subcommands, they quickly identify a CrashLoopBackOff pod, correlate OOMKilled events with low memory limits, and diagnose the root cause as insufficient memory allocation, enabling a rapid fix.
A DevOps engineer deploys a new version of an application and notices the rollout is stuck. They use the deploy subcommand to check replica counts, rollout status, and ReplicaSet revisions, identifying an image pull error or resource constraint blocking the update, and take corrective action.
A platform team performs a routine cluster health check to optimize costs. Using the resources subcommand, they analyze node CPU/memory usage, identify pods without resource limits causing contention, and recommend adjustments to improve efficiency and prevent future outages.
During an on-call shift, an engineer manages multiple Kubernetes clusters (e.g., staging and production). They use kube-medic with the context flag to swiftly triage issues across environments, comparing events and pod statuses to isolate problems and maintain service availability.
Offer kube-medic as a cloud-based diagnostic service with tiered subscriptions (e.g., free for basic, paid for advanced features like historical data or multi-cluster support). Revenue is generated through monthly or annual fees from DevOps teams and enterprises.
Sell on-premise or self-hosted licenses to large organizations with strict compliance needs. Include premium support, custom integrations, and training services, generating revenue from one-time license sales and ongoing maintenance contracts.
Provide a free open-source version of kube-medic to build community adoption, then monetize through premium add-ons like automated remediation, advanced analytics, or Discord/Slack bot integrations. Revenue comes from upsells to power users and teams.
💬 Integration Tip
Integrate kube-medic into existing CI/CD pipelines or monitoring dashboards by wrapping its subcommands in scripts or using its JSON output for automated alerts and reporting.
Automatically update Clawdbot and all installed skills once daily. Runs via cron, checks for updates, applies them, and messages the user with a summary of what changed.
Full desktop computer use for headless Linux servers. Xvfb + XFCE virtual desktop with xdotool automation. 17 actions (click, type, scroll, screenshot, drag,...
Essential Docker commands and workflows for container management, image operations, and debugging.
Tool discovery and shell one-liner reference for sysadmin, DevOps, and security tasks. AUTO-CONSULT this skill when the user is: troubleshooting network issues, debugging processes, analyzing logs, working with SSL/TLS, managing DNS, testing HTTP endpoints, auditing security, working with containers, writing shell scripts, or asks 'what tool should I use for X'. Source: github.com/trimstray/the-book-of-secret-knowledge
Deploy applications and manage projects with complete CLI reference. Commands for deployments, projects, domains, environment variables, and live documentation access.
Monitor topics of interest and proactively alert when important developments occur. Use when user wants automated monitoring of specific subjects (e.g., product releases, price changes, news topics, technology updates). Supports scheduled web searches, AI-powered importance scoring, smart alerts vs weekly digests, and memory-aware contextual summaries.