Text-to-speech conversion using node-edge-tts npm package for generating audio from text.
Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation.
Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Transcribe audio files with speaker diarization (who speaks when). Supports 100+ languages, automatic language detection, and timestamps. Use for meetings, interviews, podcasts, or voice messages. Requires AssemblyAI API key.
Translate PowerPoint files to any language while preserving layout. Uses a render-and-verify agent loop (LibreOffice + Vision) to guarantee no text overflow....
FREE voice recognition using Groq's complimentary Whisper API. Transcribe audio messages to text in 50+ languages at no cost. Perfect for voice-to-text autom...
Text-to-speech conversion using Python edge-tts for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and sub...
Text-to-speech using macOS built-in `say` command. Use for voice notifications, audio alerts, reading text aloud, or announcing messages through Mac speakers. Supports multiple languages including Chinese (Mandarin), English, Japanese, etc.
Translate SRT subtitle files using LLM APIs with OpenAI-compatible format. Supports both single-language and bilingual output. Use when you need to translate...
Extract, transcribe, and translate YouTube video transcripts using the YouTubeTranscript.dev V2 API. Supports captions, ASR audio transcription, batch proces...
Smart Telegram reply workflow for OpenClaw: if the user sends text, reply with text; if the user sends a voice note/audio, transcribe locally using the insta...
Text-to-speech with Qwen3-TTS VoiceDesign. Design custom voices via natural language descriptions + seed-based timbre fixation. Includes OpenAI-compatible AP...
Translates PDF documents to Chinese with professional typography. Extracts text, translates section-by-section into well-structured Markdown, then generates...
Translates articles and documents between languages with three modes - quick (direct), normal (analyze then translate), and refined (analyze, translate, revi...
Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.
Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required.
Use when user asks for a voice reply, audio response, spoken answer, or wants to hear something read aloud.
Supports multiple languages including German (thorsten) and English (ryan) voices.
Outputs Telegram-compatible voice notes with [[audio_as_voice]] tag.
ElevenLabs TTS - the best ElevenLabs integration for OpenClaw. ElevenLabs Text-to-Speech with emotional audio tags, ElevenLabs voice synthesis for WhatsApp,...
Generate high-quality English (and multilingual) audio using Microsoft Edge TTS. Use when the user asks to "speak this", "pronounce", "read aloud", "say this...
Professional multilingual translator with deep domain expertise. Auto-detects language pairs and specialized domains (tech, legal, medical, business, academi...
Translate text accurately — preserve formatting, handle plurals, and adapt tone per locale. Powered by [EvoLink](https://evolink.ai/?utm_source=github&utm_me...
21
568
1mo ago
🎤Speech & Audio
it will help you to send voice messages to your AI Assistant and also can make it talk
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
Build, adapt, or run an audio-processing workflow that takes spoken audio, transcribes it with Whisper or faster-whisper, translates the transcript using the...