yuyonghao-multimodal-base
Supports image understanding, OCR, speech-to-text, and text-to-speech synthesis with multi-voice and multimodal unified processing using OpenAI and Edge TTS.
Install via ClawdBot CLI:
clawdbot install yuyonghao-123/yuyonghao-multimodal-base
Grade: Fair (based on market validation, documentation quality, package completeness, maintenance status, and authenticity signals).
Accesses sensitive credential files or environment variables
process.env.OPENAI
Calls external URL not in known-safe list
https://registry.npmjs.org/asynckit/-/asynckit-0.4.0.tgz
Uses known external API (expected, informational)
api.openai.com
AI Analysis
The skill's external API usage (OpenAI GPT-4V, Whisper) is consistent with its stated multimodal purpose and requires explicit user-provided API keys. No hidden instructions, credential harvesting, or obfuscation are evident in the provided definition. The primary risk is the standard data-sharing inherent to using third-party AI services.
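The audit findings above describe a skill that reads a credential from an environment variable and depends on a key the user supplies explicitly. A minimal sketch of that pattern follows. The helper `requireEnvKey`, the `Env` type, and the variable name `EXAMPLE_API_KEY` are all hypothetical placeholders, since the listing does not show the skill's code or the full variable name.

```typescript
// Hypothetical helper, not taken from the skill's published code.
// It fails loudly when a required credential is absent instead of
// silently sending unauthenticated requests.
type Env = Record<string, string | undefined>;

function requireEnvKey(env: Env, name: string): string {
  // Treat undefined, empty, and whitespace-only values as missing.
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage with a placeholder variable name (an assumption for illustration).
// In Node.js the first argument would typically be process.env.
const demoEnv: Env = { EXAMPLE_API_KEY: " sk-demo " };
const apiKey: string = requireEnvKey(demoEnv, "EXAMPLE_API_KEY");
```

Passing the environment in as an argument, rather than reading a global inside the function, keeps the check easy to test and makes it obvious which credential the code touches.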
Usage Guide
Scored May 19, 2026
Audited Apr 16, 2026 · audit v1.0
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud).
Start voice calls via the OpenClaw voice-call plugin.