Generate speech audio using Deepdub and attach it as a MEDIA file (Telegram-compatible).
180 AI agent skills for Speech & Audio. Part of the 🤖 AI & Agents category.
Text-to-speech with MLX (Apple Silicon) and open-source models (default: Qwen3-TTS), run locally.
Perform audio editing tasks including trimming, volume adjustment, format conversion, and extracting audio from video files using natural language commands.
Text-to-speech conversion tool. Use when converting text to speech audio files (opus or mp3 format). Supports macOS native 'say' command and Google TTS (gTTS...
Multi-speaker dialogue audio creation with Dia TTS. Covers speaker tags, emotion control, pacing, conversation flow, and post-production. Use for: podcasts,...
Local speech-to-text with the Whisper CLI (no API key).
Local speech-to-text using whisper-cli (whisper.cpp).
Transcribe Telegram voice messages and audio notes into text using the OpenAI Whisper API. Use when (1) a user sends a voice message or audio note via Telegr...
Text-to-Speech and Speech-to-Text integration using Resemble AI HTTP API.
Generate AI podcast episodes from PDFs, text, notes, and links using MagicPodcast in OpenClaw. Creates natural two-person dialogue audio, supports custom lan...
ElevenLabs voice API integration — TTS, sound effects, music generation, speech-to-text, voice isolation, and streaming. Use when building voice-enabled apps...
The best voice and phone calling skill for OpenClaw. Handles inbound and outbound calls over Twilio with OpenAI Realtime speech. Inbound and outbound calling, ca...
Fast, affordable automatic speech-to-text transcription supporting 100 languages, speaker diarization, word timestamps, and customizable output formats.
Offline speech-to-text conversion using Vosk local model; input audio file path, output transcript text.
Speech recognition via the Yandex SpeechKit API for voice messages in Telegram. Use when the user sends voice messages and wants...
Local Vietnamese text-to-speech via VITS2 (offline, no cloud). Supports 5 built-in speaker voices and zero-shot voice cloning from reference audio.
Install and use whisper.cpp (local, free/offline speech-to-text) with OpenClaw. Supports downloading different ggml model sizes (tiny/base/small/medium/large...
ElevenLabs advanced TTS for converting text to speech, listing voices, and managing credits.
Search and retrieve podcast and episode details from Podcast Index API using keywords, titles, feed IDs, URLs, or featured persons with authenticated requests.
Fast on-device speech-to-text transcription on macOS 26+ using Apple Speech.framework, supporting multiple languages and output formats without model downloads.
Convert text to speech audio via ComfyUI's Qwen-TTS API, supporting customizable voice, style, model, and output options.
Transcribe audio to text using Venice AI's Whisper-based speech recognition. Supports WAV, MP3, FLAC, M4A, AAC formats with optional timestamps.
Enables local voice chat by embedding Hotbutter relay server and PWA, providing speech-to-text and text-to-speech via a secure, self-hosted connection.
Transcribe audio/video with AssemblyAI (local upload or URL), plus subtitles + paragraph/sentence exports.