kubernetesComprehensive Kubernetes and OpenShift cluster management skill covering operations, troubleshooting, manifest generation, security, and GitOps. Use this skill when: (1) Cluster operations: upgrades, backups, node management, scaling, monitoring setup (2) Troubleshooting: pod failures, networking issues, storage problems, performance analysis (3) Creating manifests: Deployments, StatefulSets, Services, Ingress, NetworkPolicies, RBAC (4) Security: audits, Pod Security Standards, RBAC, secrets management, vulnerability scanning (5) GitOps: ArgoCD, Flux, Kustomize, Helm, CI/CD pipelines, progressive delivery (6) OpenShift-specific: SCCs, Routes, Operators, Builds, ImageStreams (7) Multi-cloud: AKS, EKS, GKE, ARO, ROSA operations
Install via ClawdBot CLI:
clawdbot install kcns008/kubernetesComprehensive skill for Kubernetes and OpenShift clusters covering operations, troubleshooting, manifests, security, and GitOps.
| Platform | Version | Documentation |
|----------|---------|---------------|
| Kubernetes | 1.31.x | https://kubernetes.io/docs/ |
| OpenShift | 4.17.x | https://docs.openshift.com/ |
| EKS | 1.31 | https://docs.aws.amazon.com/eks/ |
| AKS | 1.31 | https://learn.microsoft.com/azure/aks/ |
| GKE | 1.31 | https://cloud.google.com/kubernetes-engine/docs |
| Tool | Version | Purpose |
|------|---------|---------|
| ArgoCD | v2.13.x | GitOps deployments |
| Flux | v2.4.x | GitOps toolkit |
| Kustomize | v5.5.x | Manifest customization |
| Helm | v3.16.x | Package management |
| Velero | 1.15.x | Backup/restore |
| Trivy | 0.58.x | Security scanning |
| Kyverno | 1.13.x | Policy engine |
IMPORTANT: Use kubectl for standard Kubernetes. Use oc for OpenShift/ARO.
# View nodes
kubectl get nodes -o wide
# Drain node for maintenance
kubectl drain ${NODE} --ignore-daemonsets --delete-emptydir-data --grace-period=60
# Uncordon after maintenance
kubectl uncordon ${NODE}
# View node resources
kubectl top nodes
AKS:
az aks get-upgrades -g ${RG} -n ${CLUSTER} -o table
az aks upgrade -g ${RG} -n ${CLUSTER} --kubernetes-version ${VERSION}
EKS:
aws eks update-cluster-version --name ${CLUSTER} --kubernetes-version ${VERSION}
GKE:
gcloud container clusters upgrade ${CLUSTER} --master --cluster-version ${VERSION}
OpenShift:
oc adm upgrade --to=${VERSION}
oc get clusterversion
# Install Velero
velero install --provider ${PROVIDER} --bucket ${BUCKET} --secret-file ${CREDS}
# Create backup
velero backup create ${BACKUP_NAME} --include-namespaces ${NS}
# Restore
velero restore create --from-backup ${BACKUP_NAME}
Run the bundled script for comprehensive health check:
bash scripts/cluster-health-check.sh
| Status | Meaning | Action |
|--------|---------|--------|
| Pending | Scheduling issue | Check resources, nodeSelector, tolerations |
| CrashLoopBackOff | Container crashing | Check logs: kubectl logs ${POD} --previous |
| ImagePullBackOff | Image unavailable | Verify image name, registry access |
| OOMKilled | Out of memory | Increase memory limits |
| Evicted | Node pressure | Check node resources |
# Pod logs (current and previous)
kubectl logs ${POD} -c ${CONTAINER} --previous
# Multi-pod logs with stern
stern ${LABEL_SELECTOR} -n ${NS}
# Exec into pod
kubectl exec -it ${POD} -- /bin/sh
# Pod events
kubectl describe pod ${POD} | grep -A 20 Events
# Cluster events (sorted by time)
kubectl get events -A --sort-by='.lastTimestamp' | tail -50
# Test DNS
kubectl run -it --rm debug --image=busybox -- nslookup kubernetes.default
# Test service connectivity
kubectl run -it --rm debug --image=curlimages/curl -- curl -v http://${SVC}.${NS}:${PORT}
# Check endpoints
kubectl get endpoints ${SVC}
apiVersion: apps/v1
kind: Deployment
metadata:
name: ${APP_NAME}
namespace: ${NAMESPACE}
labels:
app.kubernetes.io/name: ${APP_NAME}
app.kubernetes.io/version: "${VERSION}"
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app.kubernetes.io/name: ${APP_NAME}
template:
metadata:
labels:
app.kubernetes.io/name: ${APP_NAME}
spec:
serviceAccountName: ${APP_NAME}
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: ${APP_NAME}
image: ${IMAGE}:${TAG}
ports:
- name: http
containerPort: 8080
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop: ["ALL"]
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
httpGet:
path: /healthz
port: http
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 5
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app.kubernetes.io/name: ${APP_NAME}
topologyKey: kubernetes.io/hostname
apiVersion: v1
kind: Service
metadata:
name: ${APP_NAME}
spec:
selector:
app.kubernetes.io/name: ${APP_NAME}
ports:
- name: http
port: 80
targetPort: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ${APP_NAME}
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- ${HOST}
secretName: ${APP_NAME}-tls
rules:
- host: ${HOST}
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ${APP_NAME}
port:
name: http
apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: ${APP_NAME}
spec:
to:
kind: Service
name: ${APP_NAME}
port:
targetPort: http
tls:
termination: edge
insecureEdgeTerminationPolicy: Redirect
Use the bundled script for manifest generation:
bash scripts/generate-manifest.sh deployment myapp production
Run the bundled script:
bash scripts/security-audit.sh [namespace]
apiVersion: v1
kind: Namespace
metadata:
name: ${NAMESPACE}
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: baseline
pod-security.kubernetes.io/warn: restricted
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: ${APP_NAME}-policy
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: ${APP_NAME}
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app.kubernetes.io/name: frontend
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app.kubernetes.io/name: database
ports:
- protocol: TCP
port: 5432
# Allow DNS
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
apiVersion: v1
kind: ServiceAccount
metadata:
name: ${APP_NAME}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: ${APP_NAME}-role
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ${APP_NAME}-binding
subjects:
- kind: ServiceAccount
name: ${APP_NAME}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: ${APP_NAME}-role
# Scan image with Trivy
trivy image ${IMAGE}:${TAG}
# Scan with severity filter
trivy image --severity HIGH,CRITICAL ${IMAGE}:${TAG}
# Generate SBOM
trivy image --format spdx-json -o sbom.json ${IMAGE}:${TAG}
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: ${APP_NAME}
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: ${GIT_REPO}
targetRevision: main
path: k8s/overlays/${ENV}
destination:
server: https://kubernetes.default.svc
namespace: ${NAMESPACE}
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
k8s/
āāā base/
ā āāā kustomization.yaml
ā āāā deployment.yaml
ā āāā service.yaml
āāā overlays/
āāā dev/
ā āāā kustomization.yaml
āāā staging/
ā āāā kustomization.yaml
āāā prod/
āāā kustomization.yaml
base/kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
overlays/prod/kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namePrefix: prod-
namespace: production
replicas:
- name: myapp
count: 5
images:
- name: myregistry/myapp
newTag: v1.2.3
name: Build and Deploy
on:
push:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build and push image
uses: docker/build-push-action@v5
with:
push: true
tags: ${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}
- name: Update Kustomize image
run: |
cd k8s/overlays/prod
kustomize edit set image myapp=${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}
- name: Commit and push
run: |
git config user.name "github-actions"
git config user.email "github-actions@github.com"
git add .
git commit -m "Update image to ${{ github.sha }}"
git push
Use the bundled script for ArgoCD sync:
bash scripts/argocd-app-sync.sh ${APP_NAME} --prune
This skill includes automation scripts in the scripts/ directory:
| Script | Purpose |
|--------|---------|
| cluster-health-check.sh | Comprehensive cluster health assessment with scoring |
| security-audit.sh | Security posture audit (privileged, root, RBAC, NetworkPolicy) |
| node-maintenance.sh | Safe node drain and maintenance prep |
| pre-upgrade-check.sh | Pre-upgrade validation checklist |
| generate-manifest.sh | Generate production-ready K8s manifests |
| argocd-app-sync.sh | ArgoCD application sync helper |
Run any script:
bash scripts/<script-name>.sh [arguments]
Generated Mar 1, 2026
A financial services company needs to deploy a microservices-based trading platform across multiple cloud regions. They use this skill to create secure, production-ready manifests with rolling updates, set up monitoring with Prometheus, and scale applications based on load using Horizontal Pod Autoscaler. The skill helps ensure high availability and compliance with security standards during peak trading hours.
An online retailer is migrating their legacy monolithic application to Kubernetes on AWS EKS to improve scalability and reduce costs. They leverage this skill for cluster setup, troubleshooting pod failures during migration, and implementing GitOps with ArgoCD for continuous deployment. The skill assists in managing stateful components like databases using StatefulSets and ensuring zero-downtime upgrades.
A healthcare provider runs data analytics pipelines on OpenShift to process patient data while maintaining HIPAA compliance. This skill is used to enforce Pod Security Standards, manage RBAC for sensitive data access, and troubleshoot networking issues between services. It also helps set up Velero backups for disaster recovery and audit cluster activities for security compliance.
A SaaS company operates across AKS, GKE, and EKS to serve global customers with low latency. They apply this skill to standardize deployments using Helm charts, implement progressive delivery with canary releases, and troubleshoot cross-cluster connectivity problems. The skill enables efficient management of multi-cloud resources and cost optimization through resource monitoring.
A manufacturing firm uses Kubernetes at the edge to manage IoT devices in factories, requiring robust node management and offline capabilities. This skill helps drain nodes for maintenance, set up local registries for image caching, and troubleshoot storage issues with persistent volumes. It also supports security scanning with Trivy to prevent vulnerabilities in container images.
A company offers fully managed Kubernetes clusters on public clouds like AKS, EKS, or GKE, providing automated upgrades, monitoring, and support. They use this skill to deliver value-added services such as backup management with Velero, security audits, and GitOps implementation, generating revenue through subscription fees based on cluster size and features.
A consultancy specializes in helping enterprises adopt Kubernetes and OpenShift, offering services like cluster design, migration, and team training. They leverage this skill for hands-on workshops, troubleshooting complex issues, and creating custom manifests, with revenue coming from project-based contracts and certification courses.
A startup provides a SaaS platform for GitOps workflows, integrating tools like ArgoCD and Flux to automate deployments across multiple clusters. They utilize this skill to develop features for manifest generation, policy enforcement with Kyverno, and multi-cloud support, earning revenue through tiered SaaS subscriptions and enterprise licensing.
š¬ Integration Tip
Integrate this skill with existing CI/CD pipelines by using Helm or Kustomize for templating, and ensure role-based access control (RBAC) is configured to match organizational policies for secure operations.
Automatically update Clawdbot and all installed skills once daily. Runs via cron, checks for updates, applies them, and messages the user with a summary of what changed.
Full desktop computer use for headless Linux servers. Xvfb + XFCE virtual desktop with xdotool automation. 17 actions (click, type, scroll, screenshot, drag,...
Essential Docker commands and workflows for container management, image operations, and debugging.
Tool discovery and shell one-liner reference for sysadmin, DevOps, and security tasks. AUTO-CONSULT this skill when the user is: troubleshooting network issues, debugging processes, analyzing logs, working with SSL/TLS, managing DNS, testing HTTP endpoints, auditing security, working with containers, writing shell scripts, or asks 'what tool should I use for X'. Source: github.com/trimstray/the-book-of-secret-knowledge
Deploy applications and manage projects with complete CLI reference. Commands for deployments, projects, domains, environment variables, and live documentation access.
Monitor topics of interest and proactively alert when important developments occur. Use when user wants automated monitoring of specific subjects (e.g., product releases, price changes, news topics, technology updates). Supports scheduled web searches, AI-powered importance scoring, smart alerts vs weekly digests, and memory-aware contextual summaries.