Local speech-to-text with the Whisper CLI (no API key).
Convert text to natural speech, transcribe audio to text, and translate spoken content with AI.
These skills handle the full audio pipeline — synthesizing voices with ElevenLabs and OpenAI TTS, transcribing recordings with Whisper, translating spoken content across languages, and processing audio files. Used by podcasters, developers, and accessibility teams.
Convert text to natural-sounding speech with AI voice models and custom voice styles.
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.
Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection.
Quick install — most popular voice & audio ai skill:
clawdbot install steipete/openai-whisper459 skills found
Page 1 of 20
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Local text-to-speech via sherpa-onnx (offline, no cloud)
Start voice calls via the OpenClaw voice-call plugin.
Pure Aliyun ASR skill for voice message transcription, supports multiple channels including Feishu
Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.
Transcribes local voice messages to text using Faster Whisper models for fast, privacy-focused speech recognition on audio files.
Text-to-speech via OpenAI Audio Speech API.
Local speech-to-text with the Whisper CLI (no API key).
Pixel art desktop lobster that lip-syncs to OpenClaw TTS speech. Use when: (1) user wants a visual avatar for their AI agent, (2) user wants a desktop overla...
AI/DevOps 유튜브 숏츠 자동 생성. 트렌드 수집 → 스크립트 → 이미지 → Veo 영상 → TTS 나레이션 → Remotion 합성 → YouTube 업로드
Text-to-Speech using SiliconFlow API (CosyVoice2). Supports multiple voices, languages, and dialects.
Text-to-speech using macOS built-in `say` command. Use for voice notifications, audio alerts, reading text aloud, or announcing messages through Mac speakers. Supports multiple languages including Chinese (Mandarin), English, Japanese, etc.
Generate, manage, and track professional invoices with client onboarding, customizable payment terms, recurring billing, automated overdue reminders, and fin...
Convert text to speech using Hume AI (or OpenAI) API. Use when the user asks for an audio message, a voice reply, or to hear something "of vive voix".
Automated daily news video downloader with AI subtitle proofreading. Downloads CBS Evening News and BBC News at Ten from YouTube, extracts and proofreads sub...
将音频文件转换为飞书可播放的语音消息。先用 ffmpeg 转为 opus 格式,再上传到飞书,最后发送 audio 消息。适用于用户想要在飞书中收到可播放的语音消息的场景。
Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).
Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.
AI audio generation and text-to-speech powered by CellCog. Voiceover, narration, voice cloning, avatar voices, sound effects, music, podcasts, dialogue. Thre...
Auto-play TTS voice files with wake word detection. Only plays audio when user message contains wake words like "语音", "念出来", "voice", etc. Perfect for Webcha...
Generate spoken audio from text using the local Kokoro TTS engine. Use when the user asks to "say" something, requests a voice message, or wants text converted to speech.
Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required. Use when user asks for a voice reply, audio response, spoken answer, or wants to hear something read aloud. Supports multiple languages including German (thorsten) and English (ryan) voices. Outputs Telegram-compatible voice notes with [[audio_as_voice]] tag.
Skills integrate with ElevenLabs, OpenAI TTS, Murf, Play.ht, Azure Cognitive Services, and Google Cloud TTS — covering natural voices, voice cloning, and custom voice styles.
Yes. Whisper-based skills handle long-form audio by chunking and processing in segments, with speaker diarization and timestamp output. Cloud-based skills handle files up to several hours.