ClawHub Skills Lib

🎤 Speech & Audio AI Skills

177 AI agent skills for Speech & Audio. Part of the 🤖 AI & Agents category.

Speech & Audio Skills

177 skills

Openai Whisper Api

openai-whisper-api
steipete
v1.0.0

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

488
11.3k
23
today

Openai Whisper

openai-whisper
steipete
v1.0.0

Local speech-to-text with the Whisper CLI (no API key).

356
13.4k
70
3d ago

Sag

sag
steipete
v1.0.0

ElevenLabs text-to-speech with a macOS-style `say` UX.

272
6.8k
8
3d ago

Edge TTS

edge-tts
i3130002
v2.0.0

Text-to-speech conversion using the node-edge-tts npm package to generate audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) the user requests audio/voice output with the "tts" trigger or keyword; (2) content needs to be spoken rather than read (multitasking, accessibility, driving, cooking); (3) the user wants a specific voice, speed, pitch, or format for TTS output.

25
3.8k
6
3d ago

whisper

whisper
fiddlybit
v1.0.0

End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.

24
2.4k
2d ago

OpenAI TTS

openai-tts
pors
v1.0.0

Text-to-speech via OpenAI Audio Speech API.

22
3.5k
4
2d ago

Alexa CLI

alexa-cli
buddyh
v1.3.0

Control Amazon Alexa devices and smart home via the `alexacli` CLI. Use when a user asks to speak/announce on Echo devices, control lights/thermostats/locks, send voice commands, or query Alexa.

18
2.9k
13
3d ago

ElevenLabs Voices

elevenlabs-voices
robbyczgw-cla
v2.1.5

High-quality voice synthesis with 18 personas, 32 languages, sound effects, batch processing, and voice design using ElevenLabs API.

18
5.1k
16
3d ago

Transcribe

transcribe
javicasper
v1.0.2

Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.

16
2.3k
2
today

audio-cog

audio-cog
nitishgargiitd
v1.0.3

AI audio generation powered by CellCog. Text-to-speech, voice synthesis, voiceovers, podcast audio, narration, music generation, background music, sound design. Professional audio creation with AI.

15
2.9k
2
3d ago

Elevenlabs Tts

elevenlabs-tts
Shaharsha
v2.2.0

ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...

13
4.5k
6
today

Qwen3-tts

qwen-tts
paki81
v1.0.0

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages (including Italian), 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). An alternative to cloud-based TTS services like ElevenLabs; runs entirely offline after the initial model download.

12
2k
6
yesterday

Kokoro TTS

kokoro-tts
edkief
v0.1.0

Generate spoken audio from text using the local Kokoro TTS engine. Use when the user asks to "say" something, requests a voice message, or wants text converted to speech.

12
3.1k
2d ago

Faster Whisper

faster-whisper
ThePlasmak
v1.5.1

Local speech-to-text using faster-whisper. 4-6x faster than OpenAI Whisper with identical accuracy; GPU acceleration enables ~20x realtime transcription. SRT...

9
4.5k
4
today

It will help you send voice messages to your AI assistant and can also make it talk

elevenlabs-voice
amreahmed
v1.0.0

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

9
1.8k
yesterday

ElevenLabs Speech-to-Text

elevenlabs-stt
clawdbotborges
v1.0.0

Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).

7
2.9k
4
3d ago

Local Whisper

local-whisper
araa47
v1.0.0

Local speech-to-text using OpenAI Whisper. Runs fully offline after the model download. High-quality transcription with multiple model sizes.

7
2.8k
5
3d ago

Supercall

supercall
xonder
v2.0.0

Make AI-powered phone calls with custom personas and goals. Uses OpenAI Realtime API + Twilio for ultra-low latency voice conversations. Supports DTMF/IVR na...

7
1.3k
6
today

Voice Reply

voice-reply
stolot0mt0m
v1.0.0

Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required. Use when the user asks for a voice reply, audio response, or spoken answer, or wants to hear something read aloud. Supports multiple languages, including German (thorsten) and English (ryan) voices. Outputs Telegram-compatible voice notes with the [[audio_as_voice]] tag.

7
2.6k
4
yesterday

Speech To Text

speech-to-text
okaris
v0.1.5

Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...

6
1.3k
today

MLX STT

mlx-stt
guoqiao
v1.0.7

Local speech-to-text with MLX (Apple Silicon) and open-source models (default: GLM-ASR-Nano-2512).

6
2.6k
yesterday

Openai Whisper 1.0.0

openai-whisper-1-0-0
czubi1928
v1.0.0

Local speech-to-text with the Whisper CLI (no API key).

6
222
3d ago

music-cog

music-cog
nitishgargiitd
v1.0.1

Original music, fully yours. 5 seconds to 10 minutes using frontier music generation models. Instrumental and vocal tracks with perfect vocals. Cinematic scores, background tracks, podcast intros, game soundtracks, ambient soundscapes, jingles, lo-fi beats, orchestral compositions, songs with lyrics.

6
1.7k
2
3d ago

Pocket Tts

pocket-tts
sherajdev
v1.0.1

Generate high-quality English speech offline on CPU using 8 built-in voices or custom voice cloning with Kyutai's Pocket TTS model.

6
1.7k
2
3d ago

More in 🤖 AI & Agents

🧠 LLMs & Model APIs · 251 skills
🤖 Agent Frameworks · 1794 skills
⚙️ AI Tools & Utilities · 135 skills
🖼️ Image Generation · 247 skills
🎬 Video Generation · 74 skills
⚡ Automation & Workflows · 326 skills
💬 Chatbots & Assistants · 133 skills
📝 Prompt & Config · 55 skills
🦞 OpenClaw Platform · 638 skills

Data sourced from clawhub.ai · Built with Next.js, Supabase, Prisma