Text-to-speech conversion using the node-edge-tts npm package for generating audio from text.
Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation.
Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Translates articles and documents between languages with three modes: quick (direct), normal (analyze then translate), and refined (analyze, translate, revi...
End-user guide for running and configuring the `translate` CLI across text/stdin/file/glob inputs, provider selection, presets, custom prompt templates, and...
Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.
Lets you send voice messages to your AI assistant and have it reply with speech.
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
Transcribe audio files to text using OpenAI Whisper. Supports speech-to-text with auto language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to large). Use when transcribing audio recordings, podcasts, voice messages, lectures, meetings, or any audio/video file to text. Handles mp3, wav, m4a, ogg, flac, webm, opus, aac formats.
Translate PowerPoint files to any language while preserving layout. Uses a render-and-verify agent loop (LibreOffice + Vision) to guarantee no text overflow....
Local text-to-speech using Piper voices via sherpa-onnx. 100% offline, no API keys required.
Use when the user asks for a voice reply, audio response, or spoken answer, or wants to hear something read aloud.
Supports multiple languages including German (thorsten) and English (ryan) voices.
Outputs Telegram-compatible voice notes with [[audio_as_voice]] tag.
Perform audio editing tasks including trimming, volume adjustment, format conversion, and extracting audio from video files using natural language commands.
Transcribe, diarise, translate, post-process, and structure audio/video with AssemblyAI. Use this skill when the user wants AssemblyAI specifically, needs hi...
Japanese-English translator and language tutor. Use when: (1) User shares Japanese text and wants translation (news articles, tweets, signs, menus, emails). (2) User asks "what does X mean" for Japanese words/phrases. (3) User wants to learn Japanese grammar, vocabulary, or cultural context. (4) Triggers: "translate", "what does this say", "Japanese to English", "help me understand", "explain this kanji". Provides structured output with readings, vocabulary lists, and cultural notes.
Text-to-speech using macOS built-in `say` command. Use for voice notifications, audio alerts, reading text aloud, or announcing messages through Mac speakers. Supports multiple languages including Chinese (Mandarin), English, Japanese, etc.
Text-to-speech generation on Volcengine audio services. Use when users need narration, multi-language speech output, voice selection, or TTS troubleshooting.
AI-agent Skill for PPTX OOXML localization workflows. Use it to unpack PPTX, extract and apply text translations, normalize terminology, enforce language-specific fonts, validate XML integrity, and repack outputs with machine-readable JSON interfaces for automation.
Get subtitles from YouTube videos for translation, language learning, or reading along. Use when the user asks for subtitles, subs, or foreign-language text, or wants to read video content. Supports multiple languages and timestamped output for synced reading.
Convert text to speech using Microsoft Edge TTS with real-time streaming, customizable voice settings, and support for multiple languages including Chinese a...
Windows SAPI5 text-to-speech with Neural voices. Lightweight alternative to GPU-heavy TTS: zero GPU usage, instant generation. Auto-detects the best available voice for your language. Works on Windows 10/11.