phy-content-safety-guard
Dual-layer AI content guardrail with red-team test methodology
Install via ClawdBot CLI:
clawdbot install PHY041/phy-content-safety-guard
Grade: Fair — based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals.
Calls external URL not in known-safe list:
https://generativelanguage.googleapis.com/v1beta/models — uses known external API (expected, informational)
Domain: googleapis.com
Audited Apr 18, 2026 · audit v1.0
Generated Mar 22, 2026
Prevents AI agents from leaking internal policies, making inappropriate brand statements, or providing harmful advice in customer interactions. Ensures all responses align with company values and safety guidelines.
Blocks content that could negatively evaluate student capabilities, provide medical/psychological diagnoses, or contain discriminatory remarks. Maintains encouraging, educational tone while filtering unsafe outputs.
Filters out diagnostic language, harmful self-evaluation statements, and dangerous content while allowing supportive guidance. Crucial for preventing AI from causing psychological harm through inappropriate responses.
Acts as a second layer of defense against violent, sexual, or hateful content that slips past primary AI moderation. Catches failures where primary models are manipulated through prompt-injection attacks.
Prevents leakage of internal information like API keys, system prompts, or proprietary data while blocking harmful content. Essential for maintaining corporate security and brand integrity.
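The use cases above all rely on the same dual-layer pattern: the primary model produces a reply, and a separate judge model (the Gemini endpoint listed in the audit) approves or rejects it before it reaches the user. A minimal sketch of that flow, assuming a hypothetical `GUARD_SYSTEM_PROMPT` wording and the standard Gemini `generateContent` REST shape — the skill's actual prompt, model choice, and fallback text may differ:

```python
import json
import urllib.request

# Hypothetical judge prompt; the skill's real GUARD_SYSTEM_PROMPT may differ.
GUARD_SYSTEM_PROMPT = (
    "You are a safety judge. Reply with exactly SAFE or UNSAFE.\n"
    "Flag content in these abstract categories: medical or psychological "
    "diagnosis, disparaging evaluation of a person, leaked internal data."
)

# Endpoint from the audit note above; model name is an assumption.
GEMINI_ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-1.5-flash:generateContent"
)

def build_judge_payload(candidate_reply: str) -> dict:
    """Wrap the primary model's reply in a judge request body."""
    return {
        "system_instruction": {"parts": [{"text": GUARD_SYSTEM_PROMPT}]},
        "contents": [{"role": "user", "parts": [{"text": candidate_reply}]}],
    }

def parse_verdict(judge_text: str) -> bool:
    """True when the judge marks the content safe."""
    return judge_text.strip().upper().startswith("SAFE")

def guard(candidate_reply: str, api_key: str, fallback: str) -> str:
    """Second layer: release the reply only if the judge approves it."""
    req = urllib.request.Request(
        f"{GEMINI_ENDPOINT}?key={api_key}",
        data=json.dumps(build_judge_payload(candidate_reply)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    judge_text = body["candidates"][0]["content"]["parts"][0]["text"]
    return candidate_reply if parse_verdict(judge_text) else fallback
```

The fallback string is what makes this safe to fail closed: any reply the judge does not explicitly mark SAFE is replaced, which is what catches the prompt-injection case where the primary model has been steered off-policy.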
Offer tiered monthly subscriptions based on message volume and customization levels. Enterprise tiers include custom forbidden categories, multiple language fallbacks, and detailed analytics dashboards.
Charge per API call for content evaluation with volume discounts. Include premium features like custom judge models, lower latency guarantees, and industry-specific safety templates.
License the guardrail technology to chatbot platforms and AI agent marketplaces. Provide customization tools for brands to define their safety parameters while maintaining core infrastructure.
💬 Integration Tip
Customize the GUARD_SYSTEM_PROMPT with abstract category descriptions rather than specific forbidden terms to avoid triggering Gemini's own safety filters on benign content.
Scored Apr 19, 2026