# faster-whisper-local-service

OpenClaw local speech-to-text backend using faster-whisper over HTTP on 127.0.0.1:18790. Use when you want voice transcription without external APIs, without...
Install via ClawdBot CLI:

```
clawdbot install neldar/faster-whisper-local-service
```

Provisions a local STT backend used by voice skills.
The skill installs transcribe-server.py, which serves an HTTP endpoint at http://127.0.0.1:18790/transcribe and runs under the openclaw-transcribe.service systemd user unit. On first startup, faster-whisper downloads model weights from Hugging Face (~1.5 GB for medium). This requires internet access and disk space. After the initial download, models are cached locally and the service runs fully offline.
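The request/response contract is not spelled out above, so the client sketch below assumes the endpoint accepts raw audio bytes in a POST body and returns JSON; both are assumptions, so check transcribe-server.py for the actual contract before relying on this:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18790"


def endpoint(base_url: str = BASE_URL) -> str:
    """Build the transcription endpoint URL."""
    return base_url.rstrip("/") + "/transcribe"


def transcribe(audio_path: str, base_url: str = BASE_URL) -> dict:
    """POST an audio file's bytes and parse the JSON reply.

    Assumes a raw-bytes request body and a JSON response; verify against
    transcribe-server.py before relying on this shape.
    """
    with open(audio_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        endpoint(base_url),
        data=body,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, `transcribe("note.wav")` would return the parsed JSON reply.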
| Model | Download size | RAM usage |
|---|---|---|
| tiny | ~75 MB | ~400 MB |
| base | ~150 MB | ~500 MB |
| small | ~500 MB | ~800 MB |
| medium | ~1.5 GB | ~1.4 GB |
| large-v3 | ~3.0 GB | ~3.5 GB |
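Using the figures above, a small helper (illustrative only; the table's numbers are approximate) can pick the largest model that fits a RAM budget:

```python
# Approximate RAM usage per model in MB, taken from the table above.
RAM_MB = {"tiny": 400, "base": 500, "small": 800, "medium": 1400, "large-v3": 3500}


def pick_model(available_mb: int) -> str:
    """Return the largest model whose approximate RAM footprint fits the budget."""
    fitting = [name for name, mb in RAM_MB.items() if mb <= available_mb]
    # Dict order runs smallest to largest, so the last fitting entry is the biggest.
    return fitting[-1] if fitting else "tiny"


print(pick_model(1000))  # -> small
```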
To pre-download models for an air-gapped environment, see the faster-whisper docs.
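One possible approach, assuming faster-whisper's default CTranslate2 weights are hosted under the Systran organization on Hugging Face (verify the repo names against the docs): fetch the weights on a connected machine, then copy the cache across.

```python
def model_repo(size: str) -> str:
    """Hugging Face repo id for a faster-whisper model size (assumed naming scheme)."""
    return f"Systran/faster-whisper-{size}"


# On a machine with internet access (needs `pip install huggingface_hub`):
#   from huggingface_hub import snapshot_download
#   snapshot_download(model_repo("medium"))
#
# Then copy ~/.cache/huggingface/ to the same path on the air-gapped host.

print(model_repo("medium"))  # -> Systran/faster-whisper-medium
```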
Security notes:

- The server invokes GStreamer's decodebin (via gst-launch-1.0) to convert incoming audio to WAV. While arguments are passed as a list (no shell injection), processing untrusted/malformed audio files carries inherent risk through GStreamer's media parsers. Ensure gst-launch-1.0 is installed from your OS vendor's trusted packages.
- The server listens on 127.0.0.1 only (not exposed externally).
- Cross-origin requests are restricted to the allowed origin (default https://127.0.0.1:8443).

Dependencies:

- faster-whisper==1.1.1 (override via env)
- gst-launch-1.0
- Allowed origin https://127.0.0.1:8443 (override via env)

Deploy:

```
bash scripts/deploy.sh
```
With custom settings:

```
WORKSPACE=~/.openclaw/workspace \
TRANSCRIBE_PORT=18790 \
WHISPER_MODEL_SIZE=medium \
WHISPER_LANGUAGE=auto \
TRANSCRIBE_ALLOWED_ORIGIN=https://10.0.0.42:8443 \
bash scripts/deploy.sh
```
Default: auto (auto-detect language). Set WHISPER_LANGUAGE=de for German-only, en for English-only, etc. A fixed language is faster and more accurate if you only use one language.
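The settings above can be read in one place; a minimal sketch, where the defaults mirror the deploy example and the `auto` sentinel maps to `None` following faster-whisper's convention of `language=None` meaning auto-detect:

```python
import os


def env(name: str, default: str) -> str:
    """Read a setting from the environment with a fallback default."""
    return os.environ.get(name, default)


def resolve_language(value: str):
    """Map the 'auto' sentinel to None, which faster-whisper treats as auto-detect."""
    return None if value == "auto" else value


# Defaults mirror the deploy example above; override any value via the environment.
port = int(env("TRANSCRIBE_PORT", "18790"))
model_size = env("WHISPER_MODEL_SIZE", "medium")
language = resolve_language(env("WHISPER_LANGUAGE", "auto"))
allowed_origin = env("TRANSCRIBE_ALLOWED_ORIGIN", "https://127.0.0.1:8443")
```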
Idempotent: safe to run repeatedly.
| What | Path | Action |
|---|---|---|
| Python venv | $WORKSPACE/.venv-faster-whisper/ | Creates venv, installs faster-whisper via pip |
| Transcribe server | $WORKSPACE/voice-input/transcribe-server.py | Writes server script |
| Systemd service | ~/.config/systemd/user/openclaw-transcribe.service | Creates + enables persistent service |
| Model cache | ~/.cache/huggingface/ | Downloads model weights on first run |
To uninstall:

```
systemctl --user stop openclaw-transcribe.service
systemctl --user disable openclaw-transcribe.service
rm -f ~/.config/systemd/user/openclaw-transcribe.service
systemctl --user daemon-reload
```
Optional full cleanup:

```
rm -rf ~/.openclaw/workspace/.venv-faster-whisper
rm -f ~/.openclaw/workspace/voice-input/transcribe-server.py
```
Check service status:

```
bash scripts/status.sh
```

Expected: active.

Related skills:

- webchat-voice-proxy for browser mic + HTTPS/WSS integration.
- webchat-voice-full-stack (deploys backend + proxy in order).

Generated Mar 1, 2026
Universities use this skill to transcribe lectures and seminars locally, ensuring student accessibility without recurring API costs. It supports multiple languages for international programs and operates offline after initial model download.
Healthcare providers deploy this service to transcribe patient consultations and medical notes securely on-premises, complying with data privacy regulations like HIPAA. It runs offline to protect sensitive voice data from external exposure.
Media companies integrate this skill to transcribe podcast episodes locally, enabling efficient editing, subtitling, and content repurposing. The offline operation eliminates API fees and supports high-volume audio processing with customizable models.
Businesses use this service to transcribe customer support calls in-house for quality assurance and training, leveraging local processing to reduce costs and enhance data security. It pairs with voice workflows for real-time or batch transcription.
Law firms implement this skill to transcribe legal depositions and meetings locally, ensuring confidentiality and avoiding third-party API dependencies. The service supports fixed languages for accuracy in legal terminology.
Sell a packaged solution with this skill for on-premises deployment, charging a one-time fee for setup and support. Revenue comes from customization, training, and optional updates, targeting industries with strict data sovereignty needs.
Offer the skill as free open-source software, generating revenue through paid support, consulting, and premium features like advanced model tuning. Monetize by assisting organizations with deployment and integration into their workflows.
Bundle this skill with dedicated hardware appliances for offline transcription, selling to sectors like healthcare or government. Revenue includes appliance sales, maintenance, and optional cloud backup services.
💬 Integration Tip
Ensure gst-launch-1.0 is installed from trusted OS packages to mitigate security risks from audio parsing, and pre-download models in air-gapped environments using faster-whisper documentation.
Other skills:

- Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
- Local speech-to-text with the Whisper CLI (no API key).
- ElevenLabs text-to-speech with mac-style say UX.
- Text-to-speech conversion using the node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when the user requests audio output (the "tts" trigger), when content should be spoken rather than read (multitasking, accessibility, driving, cooking), or when a specific voice, speed, pitch, or format is needed.
- End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.
- Text-to-speech via OpenAI Audio Speech API.