voice-agent
Local Voice Input/Output for Agents using the AI Voice Agent API.
Install via ClawdBot CLI:
clawdbot install ricardotrevisan/voice-agent
This skill allows you to speak and listen to the user using a local Voice Agent API.
It is client-only and does not start containers or services.
It uses local Whisper for Speech-to-Text transcription and AWS Polly for Text-to-Speech generation.
Requires a running backend API at http://localhost:8000.
Backend setup instructions are in this repository:
README.md
walkthrough.md
DOCKER_README.md
1. User sends audio.
2. Use transcribe to read it.
3. You think of a response.
4. Use synthesize to generate the audio file.
5. You send the file.
6. STOP. Do not add text commentary.
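The six steps above could be wired together roughly as follows. This is a minimal sketch, not the skill's own implementation: `voice_turn`, `client_cmd`, `run_cli`, and the `think` callback are hypothetical names, and `base_dir` stands in for the `{baseDir}` placeholder. It also assumes that `client.py transcribe` prints the transcript to standard output, which the source does not confirm.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# Hypothetical runner type: takes an argv list and returns the command's stdout.
Runner = Callable[[List[str]], str]


def run_cli(argv: List[str]) -> str:
    """Run a command and return its stripped stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def client_cmd(base_dir: str, *args: str) -> List[str]:
    """Build the argv for scripts/client.py under base_dir (the skill's {baseDir})."""
    return ["python3", str(Path(base_dir) / "scripts" / "client.py"), *args]


def voice_turn(
    base_dir: str,
    audio_in: str,
    audio_out: str,
    think: Callable[[str], str],
    run: Runner = run_cli,
) -> str:
    """Handle one voice exchange and return the path of the reply audio."""
    # Step 2: transcribe the user's audio.
    # ASSUMPTION: the transcript is printed to stdout.
    transcript = run(client_cmd(base_dir, "transcribe", audio_in))
    # Step 3: produce a reply; `think` is a placeholder for the agent's reasoning.
    reply = think(transcript)
    # Step 4: synthesize the reply into an audio file.
    run(client_cmd(base_dir, "synthesize", reply, "--output", audio_out))
    # Steps 5 and 6: hand back only the file path, with no extra text commentary.
    return audio_out
```

Passing the runner as a parameter keeps the sketch testable without a live backend: a stub runner can record the commands that would have been executed.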
If health fails or connection errors occur, do not attempt service management from this skill. Ask the user to start or fix the backend using the repository docs.
To transcribe an audio file with local Whisper STT, run the client script with the transcribe command.
python3 {baseDir}/scripts/client.py transcribe "/path/to/audio/file.ogg"
To generate audio from text with AWS Polly TTS and save it to a file, run the client script with the synthesize command.
python3 {baseDir}/scripts/client.py synthesize "Text to speak" --output "/path/to/output.mp3"
To check if the voice agent API is running and healthy:
python3 {baseDir}/scripts/client.py health
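A caller might gate every voice exchange on that health command and, on failure, fall back to asking the user for help rather than managing services. The sketch below illustrates one way to do this. `backend_is_healthy`, `preflight`, and `BACKEND_HINT` are hypothetical names, and it assumes that `client.py health` exits with a non-zero status when the backend is unreachable, which the source does not state.

```python
import subprocess
from typing import Optional, Sequence

# Message to relay to the user; the skill itself must not start or repair services.
BACKEND_HINT = (
    "The voice backend at http://localhost:8000 is not responding. "
    "Please start or fix it using README.md, walkthrough.md, or DOCKER_README.md."
)


def backend_is_healthy(health_cmd: Sequence[str], timeout: float = 10.0) -> bool:
    """Return True if the health command exits with status 0.

    ASSUMPTION: client.py signals an unhealthy backend with a non-zero exit code.
    """
    try:
        result = subprocess.run(
            list(health_cmd), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command missing, not executable, or hung: treat as unhealthy.
        return False
    return result.returncode == 0


def preflight(health_cmd: Sequence[str]) -> Optional[str]:
    """Return None when the backend is healthy, otherwise a message for the user."""
    return None if backend_is_healthy(health_cmd) else BACKEND_HINT
```

A typical call would pass the command shown above as a list, for example `preflight(["python3", base_dir + "/scripts/client.py", "health"])`, where `base_dir` is the resolved `{baseDir}`.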
Generated Feb 24, 2026
Enables automated customer support via voice interactions, allowing users to speak queries and receive spoken responses. Ideal for call centers or help desks to handle common inquiries without human agents, improving efficiency and availability.
Facilitates language practice by transcribing student speech and generating audio feedback for pronunciation and conversation. Useful for educational apps or tutoring services to provide immersive, interactive learning experiences.
Allows patients to verbally log symptoms or medication adherence, with the system transcribing and synthesizing reminders or summaries. Supports telehealth platforms by enhancing accessibility for users with mobility or literacy challenges.
Integrates with home automation systems to process voice commands for controlling devices like lights or thermostats, responding with audio confirmations. Enhances user convenience in residential IoT applications by enabling hands-free operation.
Converts text-based content like documents or websites into audio output and transcribes user voice inputs for navigation. Serves assistive technology providers to improve digital accessibility and independence for users with visual impairments.
Offers the voice agent as a cloud service with tiered pricing based on usage volume or features, such as higher-quality TTS or faster transcription. Generates recurring revenue from businesses integrating it into their applications for scalable voice capabilities.
Charges customers per transaction, such as each audio transcription or synthesis request, with volume discounts for high usage. Attracts developers and startups needing flexible, low-cost access without long-term commitments, driving revenue from variable demand.
Licenses the skill to enterprises for customization and branding within their own products, such as call center software or educational tools. Provides upfront licensing fees and ongoing support contracts, targeting large organizations seeking proprietary voice solutions.
Integration Tip
Ensure the local backend API is running on port 8000 and follow the provided documentation for setup; test health checks before deployment to avoid connection issues.
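Before deployment, a quick TCP probe can confirm that something is listening on port 8000 without involving the client script. This is only a sketch: `port_is_open` is a hypothetical helper, and an open port shows that a process accepts connections, not that the API itself is healthy.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```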