openai-whisper-api
Transcribe audio via the OpenAI Audio Transcriptions API (Whisper).
Install via ClawdBot CLI:
clawdbot install steipete/openai-whisper-api
Requires:
Transcribe an audio file via OpenAI's /v1/audio/transcriptions endpoint.
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
Defaults: model whisper-1, output written to a .txt file.
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model whisper-1 --out /tmp/transcript.txt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language en
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --prompt "Speaker names: Peter, Daniel"
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json
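For readers curious how flags like these could map onto the endpoint, here is a minimal sketch of a curl call against /v1/audio/transcriptions. It is not the skill's actual transcribe.sh. The function name `transcribe`, the positional argument order, the `text` default for `response_format`, and the `CURL_BIN` override (useful for dry runs) are all assumptions made for illustration. The form fields `file`, `model`, `language`, `prompt`, and `response_format` are the ones the OpenAI endpoint accepts.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the skill's actual transcribe.sh.
# CURL_BIN is a hypothetical override, handy for dry runs.

# Usage: transcribe FILE [MODEL] [LANGUAGE] [PROMPT] [FORMAT]
transcribe() (
  file="$1"
  model="${2:-whisper-1}"
  language="${3:-}"
  prompt="${4:-}"
  format="${5:-text}"   # assumed default; the API also accepts json, srt, vtt, verbose_json

  # Fail early with a clear message if the key is missing.
  key="${OPENAI_API_KEY:?OPENAI_API_KEY is not set}"

  # Required multipart form fields.
  set -- -F "file=@${file}" -F "model=${model}" -F "response_format=${format}"
  # Optional fields are only sent when a value was given.
  if [ -n "$language" ]; then set -- "$@" -F "language=${language}"; fi
  if [ -n "$prompt" ]; then set -- "$@" -F "prompt=${prompt}"; fi

  ${CURL_BIN:-curl} -sS --fail \
    https://api.openai.com/v1/audio/transcriptions \
    -H "Authorization: Bearer ${key}" \
    "$@"
)
```

With the function loaded, `transcribe /path/to/audio.m4a whisper-1 en > /tmp/transcript.txt` would send the file with a language hint and save the plain-text response. The function body runs in a subshell, so its variables do not leak into the calling shell.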
Set OPENAI_API_KEY, or configure it in ~/.clawdbot/clawdbot.json:
{
  skills: {
    "openai-whisper-api": {
      apiKey: "OPENAI_KEY_HERE"
    }
  }
}
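When wrapping the script in your own automation, a small preflight check can catch a missing key before any audio is uploaded. The sketch below only inspects the OPENAI_API_KEY environment variable; the function name `check_api_key` is hypothetical, and it does not read ~/.clawdbot/clawdbot.json or model how ClawdBot passes the configured key to the skill.

```shell
#!/bin/sh
# Hypothetical preflight helper; checks the environment variable only.
check_api_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it or add apiKey to ~/.clawdbot/clawdbot.json" >&2
    return 1
  fi
  return 0
}
```

A wrapper could call `check_api_key || exit 1` before invoking the transcription script.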
Generated Feb 27, 2026
Automatically transcribe podcast episodes for content creators to repurpose into blog posts, show notes, or subtitles. This saves time and improves accessibility for listeners.
Transcribe audio recordings from healthcare professionals for patient notes or reports. Helps streamline documentation and reduce manual transcription errors in clinical settings.
Generate accurate transcriptions for online courses or lectures to create subtitles, study materials, or accessible content for students with hearing impairments.
Transcribe customer service calls to analyze sentiment, identify common issues, and improve training. Enables data-driven insights for quality assurance.
Convert audio recordings from legal depositions into text for case files, evidence preparation, and review by attorneys. Enhances accuracy and efficiency in legal workflows.
Offer a monthly subscription for developers or businesses to access the transcription API with usage limits. Provides recurring revenue and scales with customer demand.
Provide a free tier with limited transcriptions and charge per minute for additional usage. Attracts small users and converts heavy users to paid plans.
License the transcription technology to large companies for integration into their internal tools or customer-facing products. Targets B2B sales with custom pricing.
Integration Tip
Store OPENAI_API_KEY securely, and call the API with curl from scripts so transcription can be automated inside larger workflows.
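As one way to automate this, the sketch below transcribes every .m4a file in a directory by calling the skill's script with the `--out` flag shown earlier. The function name `transcribe_dir` and the `TRANSCRIBE_SH` variable (the path to transcribe.sh) are assumptions for illustration, as is the choice to skip files that already have a non-empty transcript.

```shell
#!/bin/sh
# Hypothetical batch wrapper. TRANSCRIBE_SH must point at the skill's
# transcribe.sh (for example {baseDir}/scripts/transcribe.sh).
transcribe_dir() (
  dir="$1"
  script="${TRANSCRIBE_SH:?set TRANSCRIBE_SH to the path of transcribe.sh}"

  for f in "$dir"/*.m4a; do
    # If the glob matched nothing, the literal pattern is left; skip it.
    [ -e "$f" ] || continue

    out="${f%.m4a}.txt"
    # Skip files that already have a non-empty transcript.
    if [ -s "$out" ]; then continue; fi

    # Report failures but keep going with the remaining files.
    "$script" "$f" --out "$out" || echo "failed: $f" >&2
  done
)
```

For example, `TRANSCRIBE_SH=/path/to/transcribe.sh; transcribe_dir ~/recordings` would leave a .txt file next to each recording. Re-running it only processes files that are still missing a transcript.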
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.
Text-to-speech conversion using node-edge-tts npm package for generating audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation. Use when: (1) User requests audio/voice output with the "tts" trigger or keyword. (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking). (3) User wants a specific voice, speed, pitch, or format for TTS output.
End-to-end encrypted agent-to-agent private messaging via Moltbook dead drops. Use when agents need to communicate privately, exchange secrets, or coordinate without human visibility.
Text-to-speech via OpenAI Audio Speech API.
Control Amazon Alexa devices and smart home via the `alexacli` CLI. Use when a user asks to speak/announce on Echo devices, control lights/thermostats/locks, send voice commands, or query Alexa.