openai-tts-bak-2026-01-28t18-01-23-10-30
Text-to-speech via OpenAI Audio Speech API.
Install via ClawdBot CLI:
clawdbot install nicoataiza/openai-tts-bak-2026-01-28t18-01-23-10-30
Requires:
Generate speech from text via OpenAI's /v1/audio/speech endpoint.
{baseDir}/scripts/speak.sh "Hello, world!"
{baseDir}/scripts/speak.sh "Hello, world!" --out /tmp/hello.mp3
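The source of speak.sh is not shown here, so the following is only a minimal sketch of the kind of request such a script presumably wraps. The function names (`json_escape`, `build_payload`, `speak_sketch`) are hypothetical. The endpoint URL and the JSON fields (`model`, `input`, `voice`, `response_format`, `speed`) follow OpenAI's public Audio Speech API.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical, not the contents of speak.sh.

# Escape backslashes and double quotes so text can sit inside a JSON string.
# Newlines and control characters are NOT handled; a real script would
# more likely build the payload with a tool such as jq.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Build the JSON request body. Defaults mirror the ones documented here.
build_payload() {
  local text=$1 voice=${2:-alloy} model=${3:-tts-1} format=${4:-mp3} speed=${5:-1.0}
  printf '{"model":"%s","input":"%s","voice":"%s","response_format":"%s","speed":%s}' \
    "$model" "$(json_escape "$text")" "$voice" "$format" "$speed"
}

# POST the payload and write the returned audio bytes to a file.
# Aborts with a message if OPENAI_API_KEY is unset.
speak_sketch() {
  local text=$1 out=$2
  curl -sS --fail https://api.openai.com/v1/audio/speech \
    -H "Authorization: Bearer ${OPENAI_API_KEY:?OPENAI_API_KEY is not set}" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$text")" \
    -o "$out"
}
```

With a valid key exported, `speak_sketch "Hello, world!" /tmp/hello.mp3` would be roughly equivalent to the second command above.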
Defaults:
- Model: tts-1 (fast) or tts-1-hd (quality)
- Voice: alloy (neutral), also: echo, fable, onyx, nova, shimmer
- Format: mp3

| Voice | Description |
|-------|-------------|
| alloy | Neutral, balanced |
| echo | Male, warm |
| fable | British, expressive |
| onyx | Deep, authoritative |
| nova | Female, friendly |
| shimmer | Female, soft |
{baseDir}/scripts/speak.sh "Text" --voice nova --model tts-1-hd --out speech.mp3
{baseDir}/scripts/speak.sh "Text" --format opus --speed 1.2
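The flags used in these commands accept only a restricted set of values (see Options). Whether speak.sh validates them locally is not stated, so the helpers below are a hypothetical sketch of how a wrapper could reject bad input before spending an API call. The value lists and the 0.25 to 4.0 speed range are taken from this document.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical validation helpers, not part of speak.sh.

# Succeeds if the first argument equals one of the remaining arguments.
valid_choice() {
  local value=$1 candidate
  shift
  for candidate in "$@"; do
    [ "$candidate" = "$value" ] && return 0
  done
  return 1
}

valid_voice()  { valid_choice "$1" alloy echo fable onyx nova shimmer; }
valid_model()  { valid_choice "$1" tts-1 tts-1-hd; }
valid_format() { valid_choice "$1" mp3 opus aac flac wav pcm; }

# Succeeds if the argument is a plain decimal number within 0.25..4.0.
# awk handles the floating-point comparison, which bash cannot do natively.
valid_speed() {
  awk -v s="$1" 'BEGIN {
    exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.25 && s + 0 <= 4.0)
  }'
}
```

A wrapper might call these as `valid_speed "$speed" || { echo "bad --speed" >&2; exit 2; }` before building the request.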
Options:
- `--voice`: alloy|echo|fable|onyx|nova|shimmer (default: alloy)
- `--model`: tts-1|tts-1-hd (default: tts-1)
- `--format`: mp3|opus|aac|flac|wav|pcm (default: mp3)
- `--speed`: 0.25-4.0 (default: 1.0)
- `--out`: output file (default: stdout or auto-named)

Set OPENAI_API_KEY, or configure in ~/.clawdbot/clawdbot.json:
{
  skills: {
    entries: {
      "openai-tts": {
        apiKey: "sk-..."
      }
    }
  }
}
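How the skill chooses between the environment variable and the config file is not documented here. The sketch below assumes the environment variable wins and the config file is a fallback. The function name `resolve_api_key` is hypothetical, and the `sed` pattern is a deliberately naive line match that returns the first `apiKey` it finds in the file, not a real parser for the config format.

```bash
#!/usr/bin/env bash
# Sketch only: the precedence (env first, then config) is an assumption.

# Print the API key, or fail with a message on stderr if none is found.
# Optional first argument overrides the config path (useful for testing).
resolve_api_key() {
  local config=${1:-$HOME/.clawdbot/clawdbot.json}
  local key=${OPENAI_API_KEY:-}

  if [ -z "$key" ] && [ -f "$config" ]; then
    # Naive extraction: first line of the form  apiKey: "..."  or  "apiKey": "..."
    # It does not check which skill entry the key belongs to.
    key=$(sed -n 's/.*"\{0,1\}apiKey"\{0,1\}[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)
  fi

  if [ -z "$key" ]; then
    echo "No API key: set OPENAI_API_KEY or add apiKey to $config" >&2
    return 1
  fi
  printf '%s\n' "$key"
}
```

A caller could then use `OPENAI_API_KEY=$(resolve_api_key) || exit 1` before making any request.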
OpenAI bills speech generation by the length of the input text, so short responses are very affordable.
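Since the cost depends on input length, a rough estimate can be computed from the character count. The helper below is a hypothetical sketch: the rate is passed in as an argument rather than hard-coded because prices change, and the value used in the usage note is a placeholder, not a quoted price.

```bash
#!/usr/bin/env bash
# Sketch only: estimate_cost is a hypothetical helper; supply the current
# per-million-character rate from OpenAI's pricing page yourself.

# Usage: estimate_cost "<text>" <usd_per_million_chars>
# Prints the estimated cost in USD with six decimal places.
estimate_cost() {
  local text=$1 usd_per_million=$2
  # ${#text} is the character count of the input in the current locale.
  awk -v n="${#text}" -v rate="$usd_per_million" \
    'BEGIN { printf "%.6f\n", n * rate / 1000000 }'
}
```

For example, `estimate_cost "Hello, world!" 15` multiplies the 13 input characters by the placeholder rate of 15 USD per million characters.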
Generated Mar 1, 2026
Automatically generate voiceovers for online courses, tutorials, or e-learning modules. This allows educators to quickly produce accessible audio versions of text materials, enhancing learning experiences for auditory learners and those with visual impairments.
Integrate TTS into IVR systems or chatbots to provide spoken responses to customer inquiries. This improves accessibility and user experience by offering audio feedback in support applications, reducing reliance on text-only interactions.
Convert written books or articles into audio for audiobook platforms such as Audible, or for podcasts. This speeds up production timelines and reduces costs compared to hiring human narrators, making it ideal for indie authors or small publishers.
Add text-to-speech functionality to websites to assist users with visual impairments or reading difficulties. This can be implemented as a browser extension or integrated directly into web applications to comply with accessibility standards.
Generate dynamic voice alerts or instructions for smart home devices, wearables, or industrial equipment. This enables personalized audio feedback based on sensor data or user interactions, enhancing usability in connected environments.
Offer a cloud-based TTS service with tiered pricing based on usage volume or features. This model provides recurring revenue from businesses that need regular audio generation, such as for marketing videos or training materials.
Charge customers based on the number of characters processed or audio minutes generated. This appeals to developers and companies with variable needs, allowing them to scale costs with usage without upfront commitments.
License the TTS technology to other software providers for embedding into their products, such as CRM systems or educational platforms. This generates revenue through licensing fees or revenue-sharing agreements with partners.
💬 Integration Tip
Store OPENAI_API_KEY securely in an environment variable or configuration file, and test with a quick curl script before integrating fully into an application.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.