🎤 Speech & Audio AI Skills

177 AI agent skills for Speech & Audio. Part of the 🤖 AI & Agents category.

Speech & Audio Skills — Page 2

🎤Speech & Audio

music-cog

music-cog

nitishgargiitd

v1.0.1

Generate original music that is fully yours, from 5 seconds to 10 minutes long, using frontier music generation models. Produces instrumental and vocal tracks: cinematic scores, background tracks, podcast intros, game soundtracks, ambient soundscapes, jingles, lo-fi beats, orchestral compositions, and songs with lyrics.

1.7k

3d ago

🎤Speech & Audio

Voice

voice

zhaov1976

v1.0.1

Convert text to speech using Microsoft Edge's TTS engine with customizable voices, direct playback, and automatic temporary file cleanup.

1.5k

today

🎤Speech & Audio

Vocal Chat

vocal-chat

rubenfb23

v1.0.0

Handles voice-to-voice conversations on WhatsApp. Automatically transcribes incoming audio and responds with local TTS audio. Use when the user wants to "talk" instead of type.

2.2k

2d ago

🎤Speech & Audio

MLX STT

mlx-stt

guoqiao

v1.0.7

Local speech-to-text with MLX (Apple Silicon) and open-source models (default: GLM-ASR-Nano-2512).

+12

2.6k

yesterday

🎤Speech & Audio

Pocket Tts

pocket-tts

sherajdev

v1.0.1

Generate high-quality English speech offline on CPU using 8 built-in voices or custom voice cloning with Kyutai's Pocket TTS model.

1.7k

3d ago

🎤Speech & Audio

Mac TTS

mac-tts

kalijason

v1.0.0

Text-to-speech using the macOS built-in `say` command. Use for voice notifications, audio alerts, reading text aloud, or announcing messages through Mac speakers. Supports multiple languages, including Chinese (Mandarin), English, and Japanese.

1.5k

3d ago

🎤Speech & Audio

Tts

tts

AMSTKO

v1.0.0

Convert text to speech using the Hume AI (or OpenAI) API. Use when the user asks for an audio message, a voice reply, or to hear something "de vive voix" (out loud).

2d ago

🎤Speech & Audio

Mlx Whisper

mlx-whisper

Kevin37Li

v1.0.0

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

2.4k

3d ago

🎤Speech & Audio

SiliconFlow TTS Gen

siliconflow-tts-gen

lilei0311

v1.0.0

Text-to-Speech using SiliconFlow API (CosyVoice2). Supports multiple voices, languages, and dialects.

303

today

🎤Speech & Audio

MLX Audio Server

mlx-audio-server

guoqiao

v0.2.2

Local 24x7 OpenAI-compatible API server for STT/TTS, powered by MLX on your Mac.

+21

today

🎤Speech & Audio

AI Phone Calls (Bland AI)

phone-calls-bland

dru-ca

v1.0.0

Make AI-powered phone calls via Bland AI: book restaurants, make appointments, or inquire about services. The AI calls on your behalf and reports back with transcripts.

2.5k

3d ago

🎤Speech & Audio

Text To Speech

text-to-speech

okaris

v0.1.5

Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: DIA TTS (conversational), Kokoro TTS, Chatterbox, Hig...

846

today

🎤Speech & Audio

Phone Voice Agent

phone-agent

kesslerio

v1.0.0

Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via an LLM, and speaks back via streaming TTS. Use when the user wants to: (1) test voice AI capabilities, (2) handle phone calls programmatically, or (3) build a conversational voice bot.

2.1k

today

🎤Speech & Audio

ElevenLabs

elevenlabs-api

byungkyu

v1.0.0

ElevenLabs API integration with managed authentication. AI-powered text-to-speech, voice cloning, sound effects, and audio processing. Use this skill when users want to generate speech from text, clone voices, create sound effects, or process audio. For other third-party apps, use the api-gateway skill (https://clawhub.ai/byungkyu/api-gateway).

567

3d ago

🎤Speech & Audio

Podcast

podcast

ivangdavila

v1.0.1

Create and grow podcasts by planning episodes, producing audio or video, generating clips, and building audience across formats.

812

3d ago

🎤Speech & Audio

Parakeet Stt

parakeet-stt

carlulsoe

v1.1.0

Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.

1.9k

4d ago

🎤Speech & Audio

Audio Reply

audio-reply-skill

MaTriXy

v1.1.0

Generate audio replies using TTS. Trigger with "read it to me [public URL]" to fetch and read content aloud, or "talk to me [topic]" to generate a spoken res...

1.8k

3d ago

🎤Speech & Audio

Local STT (Nvidia Parakeet + Whisper Support)

local-stt

araa47

v1.0.0

Local STT with selectable backends: Parakeet (best accuracy) or Whisper (fastest, multilingual).

1.8k

yesterday

🎤Speech & Audio

audio-broadcast

audio-broadcast

oxiaom

v1.0.1

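Control the xiaoboshu (小播鼠) broadcast system for audio playback and broadcast announcements. Use when the user needs to play audio on broadcast devices, set the volume, manage scheduled broadcast tasks, or check device status. Supports audio file playback, URL playback, volume adjustment, device management, scheduled task management, text-to-speech (TTS) broadcasts, and more.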

238

3d ago

🎤Speech & Audio

RingBot

ringbot

gbessoni

v1.1.0

Make outbound AI phone calls. Use when asked to call a business, make a phone call, order food by phone, schedule appointments, or any task requiring voice calls. Triggers on "call", "phone", "dial", "ring", "order pizza", "make reservation", "schedule appointment".

1.9k

2d ago

🎤Speech & Audio

Alicloud Ai Audio Tts

alicloud-ai-audio-tts

cinience

v1.0.3

Generate human-like speech audio with Model Studio DashScope Qwen TTS models (qwen3-tts-flash, qwen3-tts-instruct-flash). Use when converting text to speech,...

702

today

🎤Speech & Audio

WebSocket

websocket

ivangdavila

v1.0.0

Implement reliable WebSocket connections with proper reconnection, heartbeats, and scaling.

651

3d ago

🎤Speech & Audio

macOS Local Voice

macos-local-voice

STRRL

v1.0.0

Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via `say` and ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection.

748

today

🎤Speech & Audio

say

say

tobihagemann

v1.0.2

Text-to-speech via the macOS `say` command with Siri Natural Voices. Use for generating speech audio, TTS clips, or speaking text aloud on macOS.

273

3d ago