Local speech-to-text with Parakeet MLX (ASR) for Apple Silicon (no API key).
178 AI agent skills for Speech & Audio. Part of the 🤖 AI & Agents category.
Local speech-to-text with Parakeet MLX (ASR) for Apple Silicon (no API key).
Generate human-like speech audio with Model Studio DashScope Qwen TTS models (qwen3-tts-flash, qwen3-tts-instruct-flash). Use when converting text to speech,...
Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model. Use when you need to transcribe audio files to text. Supports Chinese audio trans...
Implement reliable WebSocket connections with proper reconnection, heartbeats, and scaling.
Real-time speech synthesis with Alibaba Cloud Model Studio Qwen TTS Realtime models. Use when low-latency interactive speech is required, including instruction-controlled realtime synthesis.
控制小播鼠广播系统进行音频播放和广播通知。使用当用户需要向广播设备播放音频、设置音量、管理定时广播任务、或查看设备状态时。支持播放音频文件、URL播放、音量调节、设备管理、定时任务管理、文字转语音(TTS)广播等功能。Control xiaoboshu broadcast system for audio pla...
Create AI-powered podcasts with text-to-speech, music, and audio editing. Tools: Kokoro TTS, DIA TTS, Chatterbox, AI music generation, media merger. Capabili...
Query and create field observations and AI-processed captures. Photos, voice notes, and text notes from the field.
Make outbound AI phone calls. Use when asked to call a business, make a phone call, order food by phone, schedule appointments, or any task requiring voice calls. Triggers on "call", "phone", "dial", "ring", "order pizza", "make reservation", "schedule appointment".
Control Anova Precision Ovens and Precision Cookers (sous vide) via WiFi WebSocket API. Start cooking modes (sous vide, roasting, steam), set temperatures, monitor status, and stop cooking remotely.
A great podcast needs three things: compelling content, natural-sounding voices, and polished production. CellCog delivers all three — #1 on DeepResearch Bench (Feb 2026) for script depth, frontier multi-voice dialogue, and automatic music + editing. Podcast production, episode scripts, show notes, interview prep, audiograms — single prompt to finished MP3.
Text-to-Speech via macOS say command with Siri Natural Voices. Use for generating speech audio, TTS clips, or speaking text aloud on macOS.
Text-to-speech, speech-to-text, voice conversion, and audio processing using EachLabs AI models. Supports ElevenLabs TTS, Whisper transcription with diarization, and RVC voice conversion. Use when the user needs TTS, transcription, or voice conversion.
Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chin...
Publish podcast episodes from RSS and Notion to Substack with Apple Podcasts embeds and images, then generate LinkedIn-ready companion posts.
youtube-voice-summarizer-elevenlabsTransform YouTube videos into podcast-style voice summaries using ElevenLabs TTS
Local STT and TTS on macOS using native Apple capabilities. Speech-to-text via yap (Apple Speech.framework), text-to-speech via say + ffmpeg. Fully offline, no API keys required. Includes voice quality detection and smart voice selection.
Build AI phone call reminders with ElevenLabs Conversational AI + Twilio. Free starter guide.
The Botcast — a podcast platform for AI agents. Be a guest or host on long-form interview episodes. Use when an agent is invited to The Botcast, wants to participate in a podcast episode, or needs to interact with The Botcast API.
触发阿里云晓蜜外呼机器人任务,自动批量拨打电话。适用于批量外呼、客户回访、满意度调查、简历筛查约面试等场景。可从前置工具或节点获取外呼名单。
Alibaba Cloud Text-to-Speech synthesis service.
Generate audio visualization videos using each::sense AI. Create waveforms, spectrum analyzers, particle effects, 3D visualizations, and beat-synced animatio...
OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without...
openai-tts-bak-2026-01-28t18-01-23-10-30Text-to-speech via OpenAI Audio Speech API.