edge-tts
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
Install via ClawdBot CLI:
clawdbot install i3130002/edge-tts
Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via the node-edge-tts npm package. Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation.
When you detect TTS intent from triggers or user request:
// Example: Built-in tts tool usage
tts("Your text to convert to speech")
// Returns: MEDIA: /path/to/audio.mp3
Recognize the "tts" keyword as a TTS request. The skill automatically strips TTS-related keywords from the text before conversion, so the trigger words themselves are not converted to audio.
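The keyword filtering described above can be sketched as follows. This is a minimal illustration, not the skill's actual filter: the function name `stripTriggers` and the trigger list are hypothetical, and the real implementation may handle more keywords or punctuation.

```javascript
// Hypothetical helper: the skill's real filter is not shown in this document.
const TRIGGER_WORDS = ["tts"]; // assumed trigger list

function stripTriggers(text, triggers = TRIGGER_WORDS) {
  // Escape regex metacharacters so each trigger is matched literally.
  const escaped = triggers.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // Match each trigger as a whole word, case-insensitively, together with
  // an optional trailing colon or comma (e.g. "tts: hello").
  const pattern = new RegExp("\\b(?:" + escaped.join("|") + ")\\b[:,]?", "gi");
  // Remove the triggers, then collapse the whitespace left behind.
  return text.replace(pattern, " ").replace(/\s+/g, " ").trim();
}

console.log(stripTriggers("tts: Good morning, everyone"));
```

Matching on word boundaries keeps words that merely contain the letters "tts" intact, so only standalone trigger words are dropped.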
For more control, use the bundled scripts directly:
cd scripts
npm install
node tts-converter.js "Your text" --voice en-US-AriaNeural --rate +10% --output output.mp3
Options:
--voice, -v: Voice name (default: en-US-AriaNeural)
--lang, -l: Language code (e.g., en-US, es-ES)
--format, -o: Output format (default: audio-24khz-48kbitrate-mono-mp3)
--pitch: Pitch adjustment (e.g., +10%, -20%, default)
--rate, -r: Rate adjustment (e.g., +10%, -20%, default)
--volume: Volume adjustment (e.g., +0%, -10%, default)
--save-subtitles, -s: Save subtitles as JSON file
--output, -f: Output file path (default: tts_output.mp3)
--proxy, -p: Proxy URL (e.g., http://localhost:7890)
--timeout: Request timeout in milliseconds (default: 10000)
--list-voices, -L: List available voices
cd scripts
npm install
node config-manager.js --set-voice en-US-AriaNeural
node config-manager.js --set-rate +10%
node config-manager.js --get
node config-manager.js --reset
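When calling tts-converter.js from another Node.js program, the options listed above can be assembled into an argument array rather than a shell string. The sketch below is one possible way to do that: the flag names come from the options list, while `buildArgs` and the option object's key names are hypothetical.

```javascript
// Flag names are taken from the options list above.
// The keys of this map (and of the options object) are hypothetical.
const FLAG_BY_KEY = {
  voice: "--voice",
  lang: "--lang",
  format: "--format",
  pitch: "--pitch",
  rate: "--rate",
  volume: "--volume",
  output: "--output",
  proxy: "--proxy",
  timeout: "--timeout",
};

function buildArgs(text, options = {}) {
  const args = ["tts-converter.js", text];
  // Append "--flag value" for every option the caller actually set.
  for (const [key, flag] of Object.entries(FLAG_BY_KEY)) {
    if (options[key] !== undefined) args.push(flag, String(options[key]));
  }
  // --save-subtitles is a switch without a value.
  if (options.saveSubtitles) args.push("--save-subtitles");
  return args;
}

const exampleArgs = buildArgs("Your text", {
  voice: "en-US-AriaNeural",
  rate: "+10%",
  output: "output.mp3",
});
console.log(["node", ...exampleArgs].join(" "));
```

The resulting array can be passed to `child_process.execFile("node", exampleArgs, { cwd: "scripts" }, callback)`, which avoids shell quoting problems when the text contains spaces or quotes.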
Common voices (use --list-voices for full list):
English:
en-US-MichelleNeural (female, natural, default)
en-US-AriaNeural (female, natural)
en-US-GuyNeural (male, natural)
en-GB-SoniaNeural (female, British)
en-GB-RyanNeural (male, British)
Other Languages:
es-ES-ElviraNeural (Spanish, Spain)
fr-FR-DeniseNeural (French)
de-DE-KatjaNeural (German)
ja-JP-NanamiNeural (Japanese)
zh-CN-XiaoxiaoNeural (Chinese)
ar-SA-ZariyahNeural (Arabic)
Rate values use percentage format:
"default": Normal speed
"-20%" to "-10%": Slow, clear (tutorials, stories, accessibility)
"+10%" to "+20%": Slightly fast (summaries)
"+30%" to "+50%": Fast (news, efficiency)
Choose audio quality based on use case:
audio-24khz-48kbitrate-mono-mp3: Standard quality (voice notes, messages)
audio-24khz-96kbitrate-mono-mp3: High quality (presentations, content)
audio-48khz-96kbitrate-stereo-mp3: Highest quality (professional audio, music)
scripts/tts-converter.js: Main TTS conversion script using node-edge-tts. Generates audio files with customizable voice, rate, volume, pitch, and format. Supports subtitle generation and voice listing.
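Before passing user-supplied settings to the converter, it can help to check them against the rate values and output formats listed above. The sketch below assumes adjustments are either the literal default or a signed percentage, as in the examples in this document; the service itself may accept other shapes. The function names and the quality labels (`standard`, `high`, `highest`) are hypothetical.

```javascript
// Accepts "default" or a signed percentage such as "+10%" or "-20%".
// Assumption: this mirrors the examples in this document only.
function isValidAdjustment(value) {
  return value === "default" || /^[+-]\d+%$/.test(value);
}

// Format names are copied from the list above; the labels are hypothetical.
const FORMAT_BY_QUALITY = new Map([
  ["standard", "audio-24khz-48kbitrate-mono-mp3"],
  ["high", "audio-24khz-96kbitrate-mono-mp3"],
  ["highest", "audio-48khz-96kbitrate-stereo-mp3"],
]);

// Falls back to the standard format when the label is unknown.
function pickFormat(quality) {
  return FORMAT_BY_QUALITY.get(quality) ?? FORMAT_BY_QUALITY.get("standard");
}

console.log(isValidAdjustment("+10%"), pickFormat("high"));
```

Rejecting malformed values up front gives a clearer error than a failed or timed-out request to the TTS service.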
scripts/config-manager.js: Manages persistent user preferences for TTS settings (voice, language, format, pitch, rate, volume). Stores the configuration in ~/.tts-config.json.
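Persisting preferences in a JSON file such as ~/.tts-config.json can be sketched as below. This is not the actual config-manager.js code: the key names, the default values (taken from the options list), and the `loadConfig`/`saveConfig` helpers are assumptions for illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Path taken from the description above.
const CONFIG_PATH = path.join(os.homedir(), ".tts-config.json");

// Assumed defaults; the real script's keys and defaults may differ.
const DEFAULTS = {
  voice: "en-US-AriaNeural",
  rate: "default",
  pitch: "default",
  volume: "default",
};

// Read saved settings and layer them over the defaults.
function loadConfig(file = CONFIG_PATH) {
  try {
    return { ...DEFAULTS, ...JSON.parse(fs.readFileSync(file, "utf8")) };
  } catch (err) {
    // Missing or unparsable file: fall back to the defaults.
    return { ...DEFAULTS };
  }
}

// Merge the changes into the stored settings and write them back.
function saveConfig(changes, file = CONFIG_PATH) {
  const merged = { ...loadConfig(file), ...changes };
  fs.writeFileSync(file, JSON.stringify(merged, null, 2));
  return merged;
}
```

Merging over defaults on every load means a partially filled or older config file still yields a complete set of settings.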
scripts/package.json: npm package configuration with the node-edge-tts dependency.
Complete documentation for the node-edge-tts npm package. Refer to it when you need specific voice details or advanced features.
Test different voices and preview audio quality at: https://tts.travisvn.com/
To use the bundled scripts:
cd /home/user/clawd/skills/public/edge-tts/scripts
npm install
This installs:
node-edge-tts - TTS library
commander - CLI argument parsing
Use the built-in tts tool for simple requests, or scripts/tts-converter.js for customization.
Run the test script to verify TTS functionality:
cd /home/user/clawd/skills/public/edge-tts/scripts
npm test
This generates a test audio file and verifies the TTS service is working.
Use the built-in tts tool for quick testing:
// Example: Test TTS with default settings
tts("This is a test of the TTS functionality.")
Verify configuration persistence:
cd /home/user/clawd/skills/public/edge-tts/scripts
node config-manager.js --get
node config-manager.js --set-voice en-US-GuyNeural
node config-manager.js --get
Run npm test to check if the TTS service is accessible
Run node tts-converter.js --list-voices to see available voices
If a proxy is required, try node tts-converter.js "test" --proxy http://localhost:7890
Check for test-output.mp3 in the scripts directory
Output files are written to a temporary directory (/tmp/edge-tts-temp/ on Unix, C:\Users\\AppData\Local\Temp\edge-tts-temp\ on Windows) with unique filenames (e.g., tts_1234567890_abc123.mp3). Files are not automatically deleted; the calling application (Clawdbot) should handle cleanup after use. You can specify a custom output path with the --output option if permanent storage is needed.
Use config-manager.js to set defaults
Default voice: en-US-MichelleNeural (female, natural)
Neural voices (names ending in Neural) provide higher quality than Standard voices
Generated Mar 1, 2026
Educational platforms can integrate this skill to convert textbooks and course materials into audio, allowing visually impaired students to listen to content. The adjustable speed and pitch features help customize the listening experience for different learning paces and preferences.
Smart home devices can use this TTS skill to read out recipes, news summaries, or reminders while users cook or clean. The support for multiple languages and voices enables personalized interactions in diverse household settings.
Media companies can generate high-quality voiceovers for podcasts, audiobooks, or video narrations using customizable voices and output formats. The subtitle generation feature adds value by creating synchronized text for accessibility or translations.
Call centers can deploy this skill to convert automated responses or FAQs into natural-sounding speech for IVR systems. The ability to adjust rate and pitch helps convey urgency or calmness based on customer needs.
Language learning apps can integrate TTS to provide audio examples of vocabulary and phrases in different accents, such as British or American English. The pitch control aids in emphasizing intonation patterns for better pronunciation practice.
Offer this TTS skill as a cloud-based API service where developers pay a monthly fee based on usage volume, such as number of audio minutes generated. It targets app builders needing reliable, scalable text-to-speech without managing infrastructure.
Integrate the skill into a mobile app that offers basic TTS features for free, with premium upgrades for advanced voices, higher audio quality, or ad-free usage. Monetize through in-app purchases and advertisements.
License the skill to large enterprises for internal use in training modules, accessibility tools, or communication systems, with custom support and integration services. Charge based on the number of users or annual contracts.
Integration Tip
Start by using the built-in tts tool for simple implementations, then leverage the scripts for advanced customization like voice selection and rate adjustments.