elevenlabs-stt
Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
Install via ClawdBot CLI:
clawdbot install clawdbotborges/elevenlabs-stt
Requires:
Transcribe audio files using ElevenLabs' Scribe v2 model. Supports 90+ languages with speaker diarization.
# Basic transcription
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3
# With speaker diarization
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --diarize
# Specify language (improves accuracy)
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --lang en
# Full JSON output with timestamps
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --json
| Flag | Description |
|------|-------------|
| --diarize | Identify different speakers |
| --lang CODE | ISO language code (e.g., en, pt, es) |
| --json | Output full JSON with word timestamps |
| --events | Tag audio events (laughter, music, etc.) |
All major audio/video formats: mp3, m4a, wav, ogg, webm, mp4, etc.
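A quick extension check before calling the script can catch obvious mistakes, such as passing a text file. The sketch below is only an illustration: `is_listed_format` is a hypothetical helper, not part of the skill, and it knows only the six extensions named above. Because the list ends in "etc.", a non-match does not prove the file is unsupported.

```shell
# Hypothetical helper (not shipped with the skill): returns 0 when the file
# extension is one of the formats named in this document, 1 otherwise.
# The source list is open-ended ("etc."), so treat a non-match as a warning.
is_listed_format() {
  local file="$1"
  # Take everything after the last dot as the extension.
  local ext="${file##*.}"
  # Lowercase the extension so "MP3" and "mp3" are treated alike.
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3|m4a|wav|ogg|webm|mp4) return 0 ;;
    *) return 1 ;;
  esac
}
```

It could be used as a guard, for example `is_listed_format "$f" || echo "warning: $f is not a listed format" >&2`.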
Set the ELEVENLABS_API_KEY environment variable, or configure the key in clawdbot.json:
{
  skills: {
    entries: {
      "elevenlabs-stt": {
        apiKey: "sk_..."
      }
    }
  }
}
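When the environment variable route is used, a small pre-flight check can fail early with a readable message instead of an API error. This is a minimal sketch: `require_api_key` is a hypothetical helper, and it inspects only the environment, so a key configured solely in clawdbot.json will not be detected by it.

```shell
# Hypothetical pre-flight check (not part of the skill): returns 1 and prints
# a message when ELEVENLABS_API_KEY is empty or unset in the environment.
# It does not read clawdbot.json.
require_api_key() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    return 1
  fi
}
```

A typical use would be to export the key once (`export ELEVENLABS_API_KEY="sk_..."`) and then chain the check in front of the script: `require_api_key && {baseDir}/scripts/transcribe.sh /path/to/audio.mp3`.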
# Transcribe a WhatsApp voice note
{baseDir}/scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Meeting recording with multiple speakers
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --lang en
# Get JSON for processing
{baseDir}/scripts/transcribe.sh podcast.mp3 --json > transcript.json
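The single-file examples above extend naturally to a folder of recordings. The following is a sketch under stated assumptions rather than a feature of the skill: `transcribe_dir` and the `TRANSCRIBE_SH` variable are hypothetical names, `TRANSCRIBE_SH` must be set to the real location of `transcribe.sh`, and the script is assumed to write its transcript to standard output, as the `> transcript.json` example suggests. Only `.mp3` files are processed, to keep the loop short.

```shell
# Hypothetical batch helper (not shipped with the skill).
# Usage: transcribe_dir <input-dir> <output-dir> [extra flags...]
# Assumes TRANSCRIBE_SH holds the path to transcribe.sh and that the script
# prints the transcript on stdout.
transcribe_dir() {
  local in_dir="$1"
  local out_dir="$2"
  shift 2
  if [ -z "${TRANSCRIBE_SH:-}" ]; then
    echo "TRANSCRIBE_SH is not set" >&2
    return 1
  fi
  mkdir -p "$out_dir" || return 1
  local f name
  for f in "$in_dir"/*.mp3; do
    # Skip the unexpanded pattern when the directory has no .mp3 files.
    [ -e "$f" ] || continue
    name="$(basename "$f" .mp3)"
    # Extra flags (for example --diarize or --lang en) are passed through.
    "$TRANSCRIBE_SH" "$f" "$@" > "$out_dir/$name.txt" || {
      echo "failed: $f" >&2
      return 1
    }
  done
}
```

For instance, `transcribe_dir ~/recordings ~/transcripts --diarize --lang en` would write one `.txt` file per recording. The helper always uses a `.txt` suffix, so when passing `--json` the output names would need adjusting.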
Generated Mar 1, 2026
Transcribe recorded business meetings with speaker diarization to identify who said what, enabling efficient minute-taking and action item extraction. Useful for remote teams and compliance documentation.
Convert podcast audio into text transcripts with timestamps for SEO optimization, accessibility, and repurposing into blog posts or social media snippets. Supports multiple languages for global content.
Transcribe customer service calls to analyze sentiment, identify common issues, and train AI models for automated responses. Speaker diarization helps track agent and customer interactions.
Transcribe university lectures or educational videos into text for student notes, accessibility for hearing-impaired learners, and content indexing for searchable archives.
Accurately transcribe legal depositions with speaker identification and timestamps for evidence tracking and case preparation. JSON output facilitates integration with legal databases.
Offer a cloud-based transcription service with tiered pricing based on audio length or features like diarization and event tagging. Target small businesses and freelancers needing regular transcription.
License the transcription API to developers and enterprises for integration into their own applications, such as call centers or content management systems, with pay-per-use or bulk pricing.
Provide a customizable transcription platform for resellers like media companies or educational institutions, allowing them to brand it as their own service with added features.
💬 Integration Tip
Set the ELEVENLABS_API_KEY environment variable and use the provided shell scripts for easy command-line integration into automated workflows.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.
Text-to-speech via OpenAI Audio Speech API.