mac-tts
Text-to-speech using the macOS built-in `say` command. Use for voice notifications, audio alerts, reading text aloud, or announcing messages through Mac speakers. Supports multiple languages, including Chinese (Mandarin), English, and Japanese.
Install via ClawdBot CLI:
clawdbot install kalijason/mac-tts
Use the macOS built-in `say` command for text-to-speech output through the system speakers.
say "Hello, this is a test"
say -v "Meijia" "你好,這是測試" # Taiwanese Mandarin (recommended)
say -v "Tingting" "你好,这是测试" # Simplified Chinese (Mainland Mandarin)
say -v "Samantha" "Hello world" # English
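The voice choices above can be wrapped in a small helper so that callers pass a language tag instead of a voice name. This is only a sketch: the function names `voice_for_lang` and `speak_in` are hypothetical, and the mapping simply reuses the three voices from the examples above, which are assumed to be installed.

```shell
# Map a language tag to one of the voices used in the examples above.
# Returns non-zero for languages this sketch does not know about.
voice_for_lang() {
  case "$1" in
    zh_TW|zh-TW) echo "Meijia" ;;
    zh_CN|zh-CN) echo "Tingting" ;;
    en*)         echo "Samantha" ;;
    *)           return 1 ;;
  esac
}

# Speak a message in the given language, falling back to the
# system default voice when the language has no mapping.
speak_in() {
  lang="$1"; shift
  if voice=$(voice_for_lang "$lang"); then
    say -v "$voice" "$@"
  else
    say "$@"
  fi
}
```

For example, `speak_in zh_TW "你好"` would run `say -v "Meijia" "你好"`, while `speak_in fr_FR "Bonjour"` would use whatever voice is set as the system default.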
| Voice | Description |
|-------|-------------|
| Meijia | Meijia (美佳), natural female voice (recommended) |
| Flo | Young female voice |
| Eddy | Male voice |
| Reed | Male voice |
| Sandy | Female voice |
| Shelley | Female voice |
say -v "?" # List all voices
say -v "?" | grep zh_TW # List only Taiwanese Mandarin voices
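The `grep` filter above prints whole lines, including the sample sentence. If a script only needs the voice names, the listing can be parsed instead. The sketch below assumes each line of `say -v "?"` has the shape `NAME  LOCALE  # sample text` (names may contain spaces); the function names `voices_for_locale` and `has_voice` are hypothetical.

```shell
# Print the names of installed voices for one locale, e.g. zh_TW.
voices_for_locale() {
  say -v "?" | awk -v loc="$1" '
    {
      sub(/[ \t]*#.*$/, "")       # drop the trailing "# sample sentence"
      if ($NF != loc) next         # the last remaining field is the locale
      sub(/[ \t]+[^ \t]+$/, "")   # strip the locale, leaving the voice name
      print
    }'
}

# Succeed (exit status 0) only if a voice with exactly this name is installed.
has_voice() {
  say -v "?" | awk -v name="$1" '
    {
      sub(/[ \t]*#.*$/, "")
      sub(/[ \t]+[^ \t]+$/, "")
      if ($0 == name) found = 1
    }
    END { exit !found }'
}
```

A script could then guard a call with `has_voice "Meijia" && say -v "Meijia" "你好"`, so that a missing voice is detected before speaking.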
Check/adjust system volume before speaking:
# Check current volume (0-100) and mute status
osascript -e "output volume of (get volume settings)"
osascript -e "output muted of (get volume settings)"
# Unmute
osascript -e "set volume without output muted"
# Set volume (0-100)
osascript -e "set volume output volume 70"
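The volume commands above can be combined into a single check that runs before speaking. This is a minimal sketch, not part of the skill itself: the function name `ensure_audible` and the default minimum level of 40 are assumptions chosen for illustration. If the reported volume is not a plain number, the sketch leaves the volume untouched.

```shell
# Unmute the output if needed and raise the volume to a minimum level.
# Usage: ensure_audible [MIN_LEVEL]   (MIN_LEVEL is 0-100, default 40)
ensure_audible() {
  min_level="${1:-40}"
  muted=$(osascript -e "output muted of (get volume settings)")
  level=$(osascript -e "output volume of (get volume settings)")

  if [ "$muted" = "true" ]; then
    osascript -e "set volume without output muted"
  fi

  case "$level" in
    ''|*[!0-9]*) ;;   # not a plain number: leave the volume alone
    *)
      if [ "$level" -lt "$min_level" ]; then
        osascript -e "set volume output volume $min_level"
      fi
      ;;
  esac
}
```

Calling `ensure_audible 50 && say "Hello"` would unmute and raise the volume to 50 only when it is currently lower; a louder setting chosen by the user is not reduced.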
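say -v "Meijia" "外送到了" # "The delivery has arrived"
say -v "Meijia" "會議即將開始" # "The meeting is about to start"
say -v "Meijia" "注意,有新的緊急訊息" # "Attention: there is a new urgent message"
Append & to run asynchronously: say "message" &
Generated Mar 1, 2026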
This skill converts text to speech on macOS so that users with visual impairments can hear notifications, documents, or web content read aloud. It supports multiple languages, including Mandarin and English, which makes it useful in educational and workplace settings.
Businesses can use this skill to provide audio alerts for incoming customer queries or urgent messages in call centers or support desks. By integrating with monitoring systems, it announces updates like 'New ticket received' or 'High-priority alert' to keep staff informed without constant screen checking.
Language learners can utilize the skill to hear correct pronunciations in different languages, such as Mandarin with the Meijia voice or English with Samantha. It helps practice listening and speaking skills by converting written text into speech, useful in self-study apps or classroom tools.
In retail stores or hotels, this skill can be integrated to make automated announcements for events, promotions, or safety messages. For example, announcing 'Sale starting in 10 minutes' or 'Welcome guests' in multiple languages to enhance customer experience and operational efficiency.
Developers can use this skill to test audio output and voice synthesis in macOS applications, such as for debugging notifications or voice interfaces. It provides a quick way to simulate speech without complex setups, aiding in prototyping and quality assurance for software projects.
Offer a free basic version for personal use with limited voices, and charge for premium features like advanced voice customization, multilingual support, or integration APIs. Revenue comes from subscriptions or one-time purchases, targeting individuals and small businesses needing enhanced accessibility.
Provide consulting and integration services to businesses looking to embed text-to-speech capabilities into their macOS-based systems, such as call centers or educational platforms. Charge based on project scope, customization, and ongoing support, generating revenue from service contracts and maintenance fees.
License the skill to educational institutions or content creators for use in language learning apps, e-books, or online courses. Revenue is generated through licensing agreements based on usage volume or per-user fees, leveraging the skill's multilingual support for diverse educational markets.
💬 Integration Tip
Make sure the system output is unmuted and the volume is set before speaking, and consider appending `&` to run `say` in the background so that speech does not block the calling script.
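One way to apply the `&` advice in a script is sketched below. The function names `speak_async` and `speak_queue_async` and the variable `SPEAK_PID` are hypothetical. The PID is stored in a variable rather than printed, because a caller using command substitution would start the job in a subshell and could not `wait` for it afterwards.

```shell
# Speak one message in the background.
# The PID of the background job is stored in SPEAK_PID.
speak_async() {
  say -v "$1" "$2" &
  SPEAK_PID=$!
}

# Speak several messages one after another inside a single background job,
# so the caller is not blocked and the queued messages play in order.
speak_queue_async() {
  queue_voice="$1"; shift
  (
    for msg in "$@"; do
      say -v "$queue_voice" "$msg"
    done
  ) &
  SPEAK_PID=$!
}
```

A script can call `speak_async "Meijia" "會議即將開始"`, continue with other work, and run `wait "$SPEAK_PID"` before exiting if it needs the speech to finish.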