phone-agent
Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot.
Install via ClawdBot CLI:
clawdbot install kesslerio/phone-agent
Runs a local FastAPI server that acts as a real-time voice bridge.
Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
                                                  |
                                                  +--> OpenAI (LLM)
                                                  +--> ElevenLabs (TTS)
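As a rough illustration of the WebSocket (Audio) leg in the diagram above: Twilio Media Streams deliver call audio as JSON text frames, and frames with the `media` event carry base64-encoded 8 kHz mu-law audio. The Python sketch below shows how such a frame could be decoded before the bytes are forwarded to Deepgram. The function name `extract_audio` is hypothetical, and whether `scripts/server.py` is organized this way is an assumption.

```python
import base64
import json
from typing import Optional


def extract_audio(message: str) -> Optional[bytes]:
    """Return raw audio bytes from a Twilio Media Streams "media" frame.

    Other event types ("connected", "start", "stop", "mark") carry no
    audio, so they yield None. Hypothetical helper, not taken from
    scripts/server.py.
    """
    data = json.loads(message)
    if data.get("event") != "media":
        return None
    # The audio chunk is base64-encoded in media.payload.
    return base64.b64decode(data["media"]["payload"])
```

In a bridge like the one described, a helper of this kind would be called on each text frame received from Twilio, and any non-`None` result passed on to the speech-to-text connection.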
pip install -r scripts/requirements.txt
Set the environment variables (in ~/.moltbot/.env, ~/.clawdbot/.env, or export):
export DEEPGRAM_API_KEY="your_key"
export OPENAI_API_KEY="your_key"
export ELEVENLABS_API_KEY="your_key"
export TWILIO_ACCOUNT_SID="your_sid"
export TWILIO_AUTH_TOKEN="your_token"
export PORT=8080
python3 scripts/server.py
ngrok http 8080
https://.ngrok.io/incoming POST
Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.
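For the call audio to reach the local server, the `/incoming` webhook has to answer Twilio with TwiML that opens a media stream. The Python sketch below builds such a response with the standard library only. The `<Connect><Stream>` structure is standard TwiML; the function name `build_stream_twiml` and the `/stream` WebSocket path are hypothetical and may not match what `scripts/server.py` does.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(ws_url: str) -> str:
    """Build TwiML telling Twilio to open a bidirectional media stream.

    ws_url is the public wss:// address of the local server's WebSocket
    endpoint (hypothetical here; use whatever path the server exposes).
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": ws_url})
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(response, encoding="unicode")
```

A webhook handler would return this string with the `application/xml` content type, passing the `wss://` form of the ngrok address as `ws_url`.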
Edit SYSTEM_PROMPT in scripts/server.py to change the persona.
Set ELEVENLABS_VOICE_ID to use different voices.
Change gpt-4o-mini to gpt-4 for smarter (but slower) responses.
Generated Mar 1, 2026
Deploy the phone agent to handle routine customer inquiries, such as account balance checks or service status updates, reducing wait times and freeing human agents for complex issues. It can provide 24/7 support in multiple languages by integrating with different ElevenLabs voices.
Use the agent to manage phone-based appointment bookings for clinics or salons, transcribing caller requests and confirming details via LLM-generated responses. It streamlines scheduling without manual input, improving operational efficiency.
Implement the agent to answer inbound sales calls, ask qualifying questions based on a customized system prompt, and log lead information for follow-up. This helps prioritize high-potential leads and reduces sales team workload.
Set up the agent to provide automated updates during crises, such as weather alerts or service disruptions, by delivering pre-programmed or real-time information via TTS. It ensures reliable communication when human operators are overwhelmed.
Replace traditional IVR systems with this AI agent to handle complex menu navigation and natural language queries, offering more intuitive customer interactions. It may reduce call abandonment rates by understanding caller context better than fixed menus do.
Offer the phone agent as a cloud-based service with tiered pricing based on call volume or features, such as advanced LLM models or custom voices. Revenue comes from monthly subscriptions, targeting small businesses needing affordable automation.
Provide professional services to customize and deploy the agent for specific industries, including system prompt tuning and API integration. Revenue is generated through one-time project fees and ongoing maintenance contracts.
Monetize the agent by exposing its capabilities via an API, charging per minute of call time or per transaction processed. This model suits developers building voice applications without managing infrastructure.
💬 Integration Tip
Ensure all API keys are securely stored and test the WebSocket connection with Twilio before going live to avoid call drops.
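One way to act on this tip is a preflight check that fails fast when a key is missing, instead of letting a live call drop. The Python sketch below checks the variables listed in the setup section. The names `REQUIRED_KEYS` and `missing_keys` are hypothetical and not part of the skill.

```python
import os
from typing import List, Mapping

# Variable names taken from the setup section of this document.
REQUIRED_KEYS = (
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
)


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` before starting the server and aborting on a non-empty result keeps a misconfigured deployment from answering calls. It checks presence only and does not verify that the keys are valid.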