elevenlabs-transcribe
Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.
Install via ClawdBot CLI:
clawdbot install PaulAsjes/elevenlabs-transcribe
Official ElevenLabs skill for speech-to-text transcription.
Convert audio to text with state-of-the-art accuracy. Supports 90+ languages, speaker diarization, and realtime streaming.
ffmpeg (brew install ffmpeg on macOS)
{baseDir}/scripts/transcribe.sh <audio_file> [options]
{baseDir}/scripts/transcribe.sh --url <stream_url> [options]
{baseDir}/scripts/transcribe.sh --mic [options]
Transcribe a local audio file:
{baseDir}/scripts/transcribe.sh recording.mp3
With speaker identification:
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize
Get full JSON response with timestamps:
{baseDir}/scripts/transcribe.sh interview.wav --diarize --json
Stream from a URL (e.g., live radio, podcast):
{baseDir}/scripts/transcribe.sh --url https://npr-ice.streamguys1.com/live.mp3
Transcribe from microphone:
{baseDir}/scripts/transcribe.sh --mic
Stream a local file in realtime (useful for testing):
{baseDir}/scripts/transcribe.sh audio.mp3 --realtime
Suppress status messages on stderr:
{baseDir}/scripts/transcribe.sh --mic --quiet
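The single-file commands above extend to a whole folder of recordings. The following is a minimal sketch, not part of the skill: it assumes the transcript is printed on stdout (status messages go to stderr, per the --quiet description), and `TRANSCRIBE` and `transcribe_dir` are hypothetical names introduced here for illustration.

```bash
# Hypothetical variable: point it at {baseDir}/scripts/transcribe.sh for your install.
TRANSCRIBE="${TRANSCRIBE:-./scripts/transcribe.sh}"

# Transcribe every .mp3 in a directory, writing NAME.txt next to NAME.mp3.
# Returns 1 if any file failed, 0 otherwise.
transcribe_dir() {
  local dir="$1" f out failed=0
  for f in "$dir"/*.mp3; do
    [ -e "$f" ] || continue          # directory has no .mp3 files
    out="${f%.mp3}.txt"
    if ! "$TRANSCRIBE" "$f" --quiet > "$out"; then
      echo "failed: $f" >&2
      rm -f "$out"                   # do not keep an incomplete transcript
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `transcribe_dir ./recordings` keeps going after a failed file, so one bad recording does not stop the rest of the batch.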
| Option | Description |
|--------|-------------|
| --diarize | Identify different speakers in the audio |
| --lang CODE | ISO language hint (e.g., en, pt, es, fr) |
| --json | Output full JSON with timestamps and metadata |
| --events | Tag audio events (laughter, music, applause) |
| --realtime | Stream local file instead of batch processing |
| --partials | Show interim transcripts during realtime mode |
| -q, --quiet | Suppress status messages (recommended for agents) |
Plain text transcription:
The quick brown fox jumps over the lazy dog.
JSON output (--json):
{
  "text": "The quick brown fox jumps over the lazy dog.",
  "language_code": "eng",
  "language_probability": 0.98,
  "words": [
    {"text": "The", "start": 0.0, "end": 0.15, "type": "word", "speaker_id": "speaker_0"}
  ]
}
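With --diarize --json, the words array can be folded into speaker turns. The sketch below uses jq, which is an assumption: the skill does not mention jq and it must be installed separately. It relies only on the text, type, and speaker_id fields shown in the sample, and further assumes that any non-word entries carry a different type value and can be skipped. `speaker_turns` is a hypothetical helper name.

```bash
# Read the script's --json output on stdin and print one line per speaker turn.
# Consecutive words from the same speaker_id are joined with single spaces.
speaker_turns() {
  jq -r '
    reduce (.words[] | select(.type == "word")) as $w ([];
      if length > 0 and .[-1].speaker == $w.speaker_id
      then .[:-1] + [.[-1] | .text += " " + $w.text]
      else . + [{speaker: $w.speaker_id, text: $w.text}]
      end)
    | .[] | "\(.speaker): \(.text)"'
}
```

A possible invocation is `{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --json --quiet | speaker_turns`.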
In realtime mode, final transcripts print as they are committed. With --partials, interim transcripts are shown first:
[partial] The quick
[partial] The quick brown fox
The quick brown fox jumps over the lazy dog.
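When both interim and final results are wanted, the stream can be split so that only committed lines are saved. This is a sketch that assumes interim lines always begin with the literal prefix `[partial] `, as in the sample output above; `split_partials` is a hypothetical helper, not part of the skill.

```bash
# Pass final transcript lines to stdout; divert "[partial] " lines to stderr.
split_partials() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "[partial] "*) printf '%s\n' "$line" >&2 ;;
      *)             printf '%s\n' "$line" ;;
    esac
  done
}
```

For example, `{baseDir}/scripts/transcribe.sh --mic --partials --quiet | split_partials > transcript.txt` would leave interim lines visible on the terminal while the file receives only final transcripts.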
Audio: MP3, WAV, M4A, FLAC, OGG, WebM, AAC, AIFF, Opus
Video: MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GPP
Limits: Up to 3GB file size, 10 hours duration
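A pre-flight check can reject files that would fail anyway. The sketch below rests on stated assumptions: the file extensions are inferred from the format names listed above (for example `.3gp` for 3GPP), "3GB" is read as 3 × 1024³ bytes, and duration is not checked because that would need a media probe. `MAX_BYTES` and `check_media_file` are hypothetical names.

```bash
# Assumption: the 3GB limit means 3 * 1024^3 bytes; override MAX_BYTES if not.
MAX_BYTES="${MAX_BYTES:-$((3 * 1024 * 1024 * 1024))}"

# Return 0 if the file exists, has a listed extension, and fits the size limit.
check_media_file() {
  local file="$1" ext size
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }
  # Lowercase the text after the last dot so clip.MP3 and clip.mp3 match alike.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|flac|ogg|webm|aac|aiff|opus) ;;   # audio
    mp4|avi|mkv|mov|wmv|flv|mpeg|3gp) ;;          # video (extensions assumed)
    *) echo "unsupported extension: .$ext" >&2; return 1 ;;
  esac
  size=$(wc -c < "$file" | tr -d '[:space:]')
  [ "$size" -le "$MAX_BYTES" ] || { echo "too large: $size bytes" >&2; return 1; }
}
```

A caller might write `check_media_file talk.mp4 && {baseDir}/scripts/transcribe.sh talk.mp4` so that obviously unsuitable files never reach the API.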
The script exits with non-zero status on errors:
ELEVENLABS_API_KEY environment variable

| Scenario | Command |
|----------|---------|
| Transcribe a recording | ./transcribe.sh file.mp3 |
| Meeting with multiple speakers | ./transcribe.sh meeting.mp3 --diarize |
| Live radio/podcast stream | ./transcribe.sh --url <stream_url> |
| Voice input from user | ./transcribe.sh --mic --quiet |
| Need word timestamps | ./transcribe.sh file.mp3 --json |
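
For agent use, the non-zero exit status on errors lets a caller separate a usable transcript from a failure. This is a sketch that assumes the transcript arrives on stdout and diagnostics on stderr; `TRANSCRIBE` and `transcribe_or_report` are hypothetical names, not part of the skill.

```bash
# Hypothetical variable: point it at {baseDir}/scripts/transcribe.sh for your install.
TRANSCRIBE="${TRANSCRIBE:-./scripts/transcribe.sh}"

# Print the transcript on success; on failure, report the exit status and pass it on.
transcribe_or_report() {
  local text status
  if text=$("$TRANSCRIBE" "$@" --quiet); then
    printf '%s\n' "$text"
  else
    status=$?
    echo "transcription failed (exit status $status)" >&2
    return "$status"
  fi
}
```

Used as `reply=$(transcribe_or_report --mic) || reply=""`, the caller never mistakes an error for an empty but valid transcript without seeing the reported status.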