voice
Convert text to speech using Microsoft Edge's TTS engine with customizable voices, direct playback, and automatic temporary file cleanup.
Install via ClawdBot CLI:
clawdbot install zhaov1976/voice
The Voice skill provides enhanced text-to-speech functionality using edge-tts, allowing you to convert text to spoken audio with multiple playback options.
Before using this skill, you need to install the required dependency:
pip3 install edge-tts
Or use the skill's install action:
await skill.execute({ action: 'install' });
Speak text directly without storing to file:
const result = await skill.execute({
  action: 'speak', // generates and plays the audio in one step
  text: 'Hello, how are you today?'
});
// Audio is played directly and the temporary file is cleaned up automatically
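The generate, play, then clean up sequence described above can be sketched with a `try`/`finally` block. This is only an illustration of the pattern, not the skill's actual implementation: `speakOnce`, `synthesize`, and `play` are hypothetical names, and the two callbacks stand in for whatever the skill uses to call edge-tts and an audio player.

```javascript
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

// Hypothetical flow: `synthesize(text, filePath)` and `play(filePath)` are
// stand-ins supplied by the caller, not functions exposed by the skill.
async function speakOnce(text, { synthesize, play }) {
  // Use a private temporary directory so cleanup cannot touch other files.
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'voice-'));
  const filePath = path.join(dir, 'speech.mp3');
  try {
    await synthesize(text, filePath);
    await play(filePath);
  } finally {
    // Runs whether synthesis and playback succeeded or threw.
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```

Putting the removal in `finally` means a failed playback does not leave audio files behind.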
Convert text to speech with default settings:
const result = await skill.execute({
  action: 'tts',
  text: 'Hello, how are you today?'
});
// Returns a MEDIA link to the audio file
With direct playback:
const result = await skill.execute({
  action: 'tts',
  text: 'Hello, how are you today?',
  playImmediately: true // Plays the audio immediately after generation
});
With custom options:
const result = await skill.execute({
  action: 'tts',
  text: 'This is a sample of voice customization.',
  options: {
    voice: 'zh-CN-XiaoxiaoNeural',
    rate: '+10%',
    volume: '-5%',
    pitch: '+10Hz'
  }
});
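The option strings above follow a fixed shape: a sign, digits, then `%` for rate and volume or `Hz` for pitch. If you build these values from user input, a small pre-check can catch malformed strings before the skill is called. The sketch below is a hypothetical helper (`findInvalidOptions` is not part of the skill), and it assumes these are the only formats accepted.

```javascript
// Hypothetical helper, not part of the skill's API.
// Assumption: rate and volume are signed percentages, pitch is a signed Hz value.
const OPTION_PATTERNS = {
  rate: /^[+-]\d+%$/,
  volume: /^[+-]\d+%$/,
  pitch: /^[+-]\d+Hz$/,
};

function findInvalidOptions(options = {}) {
  const invalid = [];
  for (const [name, pattern] of Object.entries(OPTION_PATTERNS)) {
    const value = options[name];
    // Skip options the caller did not set; defaults apply to those.
    if (value !== undefined && !pattern.test(String(value))) {
      invalid.push(name);
    }
  }
  return invalid;
}

console.log(findInvalidOptions({ rate: '+10%', volume: '-5%', pitch: '+10Hz' })); // []
console.log(findInvalidOptions({ rate: '10%', pitch: '+10' })); // [ 'rate', 'pitch' ]
```

Note that the sign is required in this sketch, so `'10%'` is rejected while `'+10%'` passes.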
Play an existing audio file:
const result = await skill.execute({
  action: 'play',
  filePath: '/path/to/audio/file.mp3'
});
Get a list of available voices:
const result = await skill.execute({
  action: 'voices'
});
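The shape of the value returned by the `voices` action is not documented here, so the following sketch simply assumes you have extracted an array of voice short names such as `'zh-CN-XiaoxiaoNeural'`. The hypothetical `groupVoicesByLocale` helper groups those names by their locale prefix, which can make a long list easier to browse.

```javascript
// Hypothetical helper: group voice short names such as 'zh-CN-XiaoxiaoNeural' by locale.
// Assumption: the input is a plain array of short-name strings.
function groupVoicesByLocale(voiceNames) {
  const groups = {};
  for (const name of voiceNames) {
    // Everything before the final hyphen-separated segment is treated as the locale.
    const cut = name.lastIndexOf('-');
    const locale = cut === -1 ? 'unknown' : name.slice(0, cut);
    (groups[locale] ??= []).push(name);
  }
  return groups;
}

// Example names for illustration only.
const grouped = groupVoicesByLocale([
  'zh-CN-XiaoxiaoNeural',
  'zh-CN-YunxiNeural',
  'en-US-AriaNeural',
]);
console.log(Object.keys(grouped)); // [ 'zh-CN', 'en-US' ]
```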
Clean up temporary audio files older than 1 hour (default):
const result = await skill.execute({
  action: 'cleanup'
});
Or specify a custom age threshold:
const result = await skill.execute({
  action: 'cleanup',
  options: {
    hoursOld: 2 // Clean files older than 2 hours
  }
});
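To make the age threshold concrete, here is a minimal sketch of age-based cleanup using Node's `fs` module. It is not the skill's own code: `cleanupOldFiles` is a hypothetical name, and the sketch assumes the audio files live in a dedicated directory and that "older than" is judged by last-modified time.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: delete regular files in `dir` that were last modified
// more than `hoursOld` hours ago, and return the names that were removed.
function cleanupOldFiles(dir, hoursOld = 1, now = Date.now()) {
  const cutoff = now - hoursOld * 60 * 60 * 1000;
  const removed = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile()) continue; // leave subdirectories alone
    const filePath = path.join(dir, entry.name);
    if (fs.statSync(filePath).mtimeMs < cutoff) {
      fs.unlinkSync(filePath);
      removed.push(entry.name);
    }
  }
  return removed;
}
```

Calling `cleanupOldFiles(dir, 2)` mirrors the `hoursOld: 2` example: a file modified three hours ago is removed, while one written a minute ago is kept.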
The following options are available for text-to-speech:
voice: The voice to use (default: 'zh-CN-XiaoxiaoNeural')
rate: Speech rate adjustment (default: '+0%')
volume: Volume adjustment (default: '+0%')
pitch: Pitch adjustment (default: '+0Hz')
Edge-TTS supports many voices in different languages; use the 'voices' action to list them.
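As a sketch of how the option defaults listed above might be applied, the helper below merges caller options over the defaults and turns the result into arguments for the `edge-tts` command-line tool. Whether the skill actually shells out to the CLI this way is an assumption; `resolveTtsOptions` and `buildEdgeTtsArgs` are hypothetical names. The flags themselves (`--voice`, `--rate`, `--volume`, `--pitch`, `--text`, `--write-media`) are the ones the `edge-tts` CLI provides.

```javascript
// Defaults as listed in the options section.
const DEFAULT_TTS_OPTIONS = {
  voice: 'zh-CN-XiaoxiaoNeural',
  rate: '+0%',
  volume: '+0%',
  pitch: '+0Hz',
};

// Hypothetical helper: caller-supplied options override the defaults.
function resolveTtsOptions(options = {}) {
  // Drop undefined values so they do not overwrite a default.
  const defined = Object.fromEntries(
    Object.entries(options).filter(([, value]) => value !== undefined)
  );
  return { ...DEFAULT_TTS_OPTIONS, ...defined };
}

// Assumption: the audio is produced by invoking the edge-tts CLI.
function buildEdgeTtsArgs(text, outputPath, options = {}) {
  const { voice, rate, volume, pitch } = resolveTtsOptions(options);
  return [
    '--voice', voice,
    // The `=` form keeps negative values such as -5% from being parsed as flags.
    `--rate=${rate}`,
    `--volume=${volume}`,
    `--pitch=${pitch}`,
    '--text', text,
    '--write-media', outputPath,
  ];
}
```

The resulting array could be passed to `child_process.execFile('edge-tts', args)`, which avoids shell quoting issues with the text.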
Generated Mar 1, 2026
Automatically generate voiceovers for online courses, tutorials, or e-learning modules in multiple languages. This enables educators to create accessible audio content without manual recording, enhancing learning experiences for students with visual impairments or those who prefer auditory learning.
Integrate text-to-speech into chatbots or IVR systems to provide spoken responses to customer inquiries. This reduces reliance on pre-recorded audio, allowing dynamic generation of announcements, instructions, or support messages in real-time, improving efficiency and scalability.
Convert text-based content like articles, blogs, or documents into audio for users with visual impairments or reading difficulties. This skill can be embedded in websites or apps to offer audio playback options, making digital content more inclusive and compliant with accessibility standards.
Develop voice-enabled applications such as virtual assistants, smart home devices, or gaming interfaces that require natural-sounding speech output. By leveraging customizable voice options and playback features, developers can create engaging user interactions without complex audio engineering.
Offer the voice skill as a cloud-based service with tiered pricing based on usage volume, such as number of characters converted or API calls. This model targets businesses needing scalable TTS solutions, with revenue generated from monthly or annual subscriptions and potential add-ons for premium voices.
License the skill to other companies for embedding into their products, such as educational platforms or customer service tools, with customization options. Revenue comes from one-time licensing fees or ongoing royalties, allowing partners to enhance their offerings without developing TTS from scratch.
Provide basic text-to-speech functionality for free to attract individual users or small projects, while charging for advanced features like high-quality voices, faster processing, or ad-free playback. This model drives user adoption and converts a portion to paid plans for additional capabilities.
💬 Integration Tip
Ensure Python 3.x is installed and that edge-tts has been installed with pip. Use the 'speak' action for direct playback; it plays the audio and removes the temporary file for you, so no file handling is needed.