ClawHub Skills Lib

🎤 Speech & Audio AI Skills

177 AI agent skills for Speech & Audio. Part of the 🤖 AI & Agents category.

Speech & Audio Skills — Page 2

177 skills
🎤Speech & Audio

music-cog

music-cog
nitishgargiitd
v1.0.1

Original music, fully yours, from 5 seconds to 10 minutes long, generated with frontier music generation models. Instrumental and vocal tracks: cinematic scores, background tracks, podcast intros, game soundtracks, ambient soundscapes, jingles, lo-fi beats, orchestral compositions, and songs with lyrics.

6
1.7k
2
3d ago
🎤Speech & Audio

Voice

voice
zhaov1976
v1.0.1

Convert text to speech using Microsoft Edge's TTS engine with customizable voices, direct playback, and automatic temporary file cleanup.

+3
6
1.5k
today
🎤Speech & Audio

Vocal Chat

vocal-chat
rubenfb23
v1.0.0

Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.

6
2.2k
8
2d ago
🎤Speech & Audio

MLX STT

mlx-stt
guoqiao
v1.0.7

Local speech-to-text with MLX (Apple Silicon) and open-source models (default: GLM-ASR-Nano-2512).

+12
6
2.6k
yesterday
🎤Speech & Audio

Pocket TTS

pocket-tts
sherajdev
v1.0.1

Generate high-quality English speech offline on CPU using 8 built-in voices or custom voice cloning with Kyutai's Pocket TTS model.

+2
6
1.7k
2
3d ago
🎤Speech & Audio

Mac TTS

mac-tts
kalijason
v1.0.0

Text-to-speech using the macOS built-in `say` command. Use for voice notifications, audio alerts, reading text aloud, or announcing messages through Mac speakers. Supports multiple languages, including Chinese (Mandarin), English, and Japanese.

5
1.5k
1
3d ago
🎤Speech & Audio

TTS

tts
AMSTKO
v1.0.0

Convert text to speech using the Hume AI (or OpenAI) API. Use when the user asks for an audio message, a voice reply, or to hear something "de vive voix".

5
2k
2d ago
🎤Speech & Audio

MLX Whisper

mlx-whisper
Kevin37Li
v1.0.0

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

5
2.4k
3d ago
🎤Speech & Audio

SiliconFlow TTS Gen

siliconflow-tts-gen
lilei0311
v1.0.0

Text-to-Speech using SiliconFlow API (CosyVoice2). Supports multiple voices, languages, and dialects.

+1
4
303
today
🎤Speech & Audio

MLX Audio Server

mlx-audio-server
guoqiao
v0.2.2

Local 24x7 OpenAI-compatible API server for STT/TTS, powered by MLX on your Mac.

+21
4
2k
today
🎤Speech & Audio

AI Phone Calls (Bland AI)

phone-calls-bland
dru-ca
v1.0.0

Make AI-powered phone calls via Bland AI: book restaurants, make appointments, or inquire about services. The AI calls on your behalf and reports back with transcripts.

4
2.5k
5
3d ago
🎤Speech & Audio

Text To Speech

text-to-speech
okaris
v0.1.5

Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: DIA TTS (conversational), Kokoro TTS, Chatterbox, Hig...

4
846
today
🎤Speech & Audio

Phone Voice Agent

phone-agent
kesslerio
v1.0.0

Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via an LLM, and speaks back via streaming TTS. Use when the user wants to: (1) test voice AI capabilities, (2) handle phone calls programmatically, or (3) build a conversational voice bot.

4
2.1k
6
today
🎤Speech & Audio

ElevenLabs

elevenlabs-api
byungkyu
v1.0.0

ElevenLabs API integration with managed authentication. AI-powered text-to-speech, voice cloning, sound effects, and audio processing. Use this skill when users want to generate speech from text, clone voices, create sound effects, or process audio. For other third-party apps, use the api-gateway skill (https://clawhub.ai/byungkyu/api-gateway).

3
567
1
3d ago
🎤Speech & Audio

Podcast

podcast
ivangdavila
v1.0.1

Create and grow podcasts by planning episodes, producing audio or video, generating clips, and building audience across formats.

3
812
2
3d ago
🎤Speech & Audio

Parakeet STT

parakeet-stt
carlulsoe
v1.1.0

Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.

3
1.9k
4d ago
🎤Speech & Audio

Audio Reply

audio-reply-skill
MaTriXy
v1.1.0

Generate audio replies using TTS. Trigger with "read it to me [public URL]" to fetch and read content aloud, or "talk to me [topic]" to generate a spoken res...

3
1.8k
2
3d ago
🎤Speech & Audio

Local STT (Nvidia Parakeet + Whisper Support)

local-stt
araa47
v1.0.0

Local STT with selectable backends: Parakeet (best accuracy) or Whisper (fastest, multilingual).

3
1.8k
yesterday
🎤Speech & Audio

audio-broadcast

audio-broadcast
oxiaom
v1.0.1

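Control the xiaoboshu (小播鼠) broadcast system for audio playback and broadcast notifications. Use when the user needs to play audio on broadcast devices, set the volume, manage scheduled broadcast tasks, or check device status. Supports audio file playback, URL playback, volume adjustment, device management, scheduled task management, and text-to-speech (TTS) broadcasts.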

2
238
3d ago
🎤Speech & Audio

RingBot

ringbot
gbessoni
v1.1.0

Make outbound AI phone calls. Use when asked to call a business, make a phone call, order food by phone, schedule appointments, or any task requiring voice calls. Triggers on "call", "phone", "dial", "ring", "order pizza", "make reservation", "schedule appointment".

2
1.9k
3
2d ago
🎤Speech & Audio

Alicloud AI Audio TTS

alicloud-ai-audio-tts
cinience
v1.0.3

Generate human-like speech audio with Model Studio DashScope Qwen TTS models (qwen3-tts-flash, qwen3-tts-instruct-flash). Use when converting text to speech,...

2
702
today
🎤Speech & Audio

WebSocket

websocket
ivangdavila
v1.0.0

Implement reliable WebSocket connections with proper reconnection, heartbeats, and scaling.

2
651
2
3d ago
🎤Speech & Audio

macOS Local Voice

macos-local-voice
STRRL
v1.0.0

Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection.

2
748
today
🎤Speech & Audio

say

say
tobihagemann
v1.0.2

Text-to-speech via the macOS `say` command with Siri Natural Voices. Use for generating speech audio, TTS clips, or speaking text aloud on macOS.

2
273
3d ago

Data sourced from clawhub.ai · Built with Next.js, Supabase, Prisma