gemini-stt
Transcribe audio files using Google's Gemini API or Vertex AI.
Install via ClawdBot CLI:
clawdbot install araa47/gemini-stt
Transcribe audio files using Google's Gemini API or Vertex AI. The default model is gemini-2.0-flash-lite, chosen for fastest transcription.
gcloud auth application-default login
gcloud config set project YOUR_PROJECT_ID
The script automatically detects and uses Application Default Credentials (ADC) when they are available.
Alternatively, set GEMINI_API_KEY in your environment (e.g., in ~/.env or ~/.clawdbot/.env).
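The credential order described here (ADC first, then GEMINI_API_KEY) can be pictured with a small sketch. The function name `choose_auth` and its parameters are hypothetical and only illustrate the decision order; the script's actual internals are not shown in this document and may differ.

```python
import os
from typing import Mapping, Optional


def choose_auth(
    adc_available: bool,
    env: Optional[Mapping[str, str]] = None,
    force_vertex: bool = False,
) -> str:
    """Return "vertex" or "api_key" following the documented order.

    Hypothetical helper: an illustration, not the script's real code.
    """
    env = os.environ if env is None else env
    if force_vertex:
        # Forcing Vertex AI only works when ADC credentials exist.
        if not adc_available:
            raise RuntimeError("Vertex AI requested but no ADC credentials found")
        return "vertex"
    if adc_available:
        return "vertex"  # ADC is tried first
    if env.get("GEMINI_API_KEY"):
        return "api_key"  # fall back to the API key
    raise RuntimeError("No credentials: log in with gcloud or set GEMINI_API_KEY")
```

Passing `adc_available` in as a plain boolean keeps the sketch free of any Google client library; how the real script probes for ADC is not specified here.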
- .ogg / .opus (Telegram voice messages)
- .mp3
- .wav
- .m4a

# Auto-detect auth (tries ADC first, then GEMINI_API_KEY)
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg
# Force Vertex AI
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex
# With a specific model
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --model gemini-2.5-pro
# Vertex AI with specific project and region
python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex --project my-project --region us-central1
# With Clawdbot media
python ~/.claude/skills/gemini-stt/transcribe.py ~/.clawdbot/media/inbound/voice-message.ogg
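Before invoking the script, a caller may want to check that a file has one of the supported extensions listed earlier. The following is a minimal sketch; `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical names, and the case-insensitive comparison is an assumption rather than documented behavior of the script.

```python
from pathlib import Path

# Extensions taken from the supported-formats list in this document.
SUPPORTED_EXTENSIONS = {".ogg", ".opus", ".mp3", ".wav", ".m4a"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is in the supported list."""
    # Assumption: compare case-insensitively so "VOICE.MP3" also passes.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```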
| Option | Description |
|--------|-------------|
| (positional argument) | Path to the audio file (required) |
| --model, -m | Gemini model to use (default: gemini-2.0-flash-lite) |
| --vertex, -v | Force use of Vertex AI with ADC |
| --project, -p | GCP project ID (for Vertex, defaults to gcloud config) |
| --region, -r | GCP region (for Vertex, default: us-central1) |
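When calling the script from other Python code, the options in the table above map naturally onto an argument list. The sketch below assembles one; `build_command` and its parameters are hypothetical, and only the flags documented in the table are used.

```python
import os
import sys
from typing import List, Optional

# Script location as used in the examples in this document.
DEFAULT_SCRIPT = "~/.claude/skills/gemini-stt/transcribe.py"


def build_command(
    audio_path: str,
    model: Optional[str] = None,
    vertex: bool = False,
    project: Optional[str] = None,
    region: Optional[str] = None,
    script: str = DEFAULT_SCRIPT,
) -> List[str]:
    """Build an argv list for transcribe.py from the documented options."""
    cmd = [sys.executable, os.path.expanduser(script), audio_path]
    if model:
        cmd += ["--model", model]
    if vertex:
        cmd.append("--vertex")
    if project:
        cmd += ["--project", project]
    if region:
        cmd += ["--region", region]
    return cmd
```

Returning a list rather than a single string lets the result be passed to `subprocess.run` without shell quoting concerns.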
Any Gemini model that supports audio input can be used. Recommended models:
| Model | Notes |
|-------|-------|
| gemini-2.0-flash-lite | Default. Fastest transcription speed. |
| gemini-2.0-flash | Fast and cost-effective. |
| gemini-2.5-flash-lite | Lightweight 2.5 model. |
| gemini-2.5-flash | Balanced speed and quality. |
| gemini-2.5-pro | Higher quality, slower. |
| gemini-3-flash-preview | Latest flash model. |
| gemini-3-pro-preview | Latest pro model, best quality. |
See Gemini API Models for the latest list.
For Clawdbot voice message handling:
# Transcribe incoming voice message
TRANSCRIPT=$(python ~/.claude/skills/gemini-stt/transcribe.py "$AUDIO_PATH")
echo "User said: $TRANSCRIPT"
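The same integration can be sketched in Python for callers that are not shell scripts. This is an illustration under stated assumptions: the function name `transcribe` and the `script` parameter are hypothetical, the transcript is assumed to be written to stdout (as the shell capture above implies), and any non-zero exit code is treated as a failure.

```python
import subprocess
import sys


def transcribe(audio_path: str, script: str) -> str:
    """Run the transcription script and return its stdout as the transcript."""
    result = subprocess.run(
        [sys.executable, script, audio_path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface the script's stderr so the caller can see what went wrong.
        raise RuntimeError(
            f"transcription failed (exit {result.returncode}): "
            f"{result.stderr.strip()}"
        )
    return result.stdout.strip()
```

A caller would pass the expanded script path, for example `os.path.expanduser("~/.claude/skills/gemini-stt/transcribe.py")`, together with the audio file path.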
The script exits with code 1 and prints to stderr on:
Generated Mar 1, 2026
Transcribe customer service calls from audio recordings for quality assurance and training. Enables analysis of customer interactions and agent performance. Useful for identifying common issues and improving service protocols.
Transcribe doctor-patient consultations or medical dictations into text for electronic health records. Helps streamline documentation, reduce administrative burden, and ensure accurate patient records. Supports compliance with healthcare regulations.
Convert recorded lectures or classroom discussions into text for accessibility and study materials. Assists students with disabilities and provides searchable notes for review. Enhances learning resources in online or hybrid education.
Transcribe court hearings, depositions, or client meetings for legal documentation and case preparation. Ensures accurate records for evidence and reduces manual transcription costs. Supports law firms in managing case files efficiently.
Generate transcripts for podcasts, videos, or live streams to create subtitles or closed captions. Improves accessibility for hearing-impaired audiences and enhances content reach. Useful for media producers and content creators.
Offer monthly or annual subscriptions for unlimited or tiered transcription usage. Target small businesses or individuals needing regular audio processing. Revenue generated from recurring fees based on usage limits or features.
Provide the skill as an API that charges per audio minute or file processed. Integrate with third-party platforms like CRM or project management tools. Revenue comes from transaction fees based on volume and processing time.
License the transcription technology to large organizations for internal use or resale. Customize branding and integrate with existing enterprise systems like healthcare or legal software. Revenue generated through upfront licensing fees and ongoing support contracts.
💬 Integration Tip
Store API keys in environment variables rather than hard-coding them; this simplifies deployment and keeps secrets out of source control in production environments.