parakeet-stt
Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.
Install via ClawdBot CLI:
clawdbot install carlulsoe/parakeet-stt
Local transcription using NVIDIA Parakeet TDT 0.6B v3 with ONNX Runtime.
Runs on CPU — no GPU required. ~30x faster than realtime.
# Clone the repo
git clone https://github.com/groxaxo/parakeet-tdt-0.6b-v3-fastapi-openai.git
cd parakeet-tdt-0.6b-v3-fastapi-openai
# Run with Docker (recommended)
docker compose up -d parakeet-cpu
# Or run directly with Python
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 5000
Default port is 5000. Set PARAKEET_URL to override (e.g., http://localhost:5092).
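A minimal sketch of resolving the server address in Python with the same default and override. The helper names `parakeet_base_url` and `transcriptions_endpoint` are hypothetical and not part of the project.

```python
import os

DEFAULT_URL = "http://localhost:5000"  # default port stated above


def parakeet_base_url(env=None):
    """Return the server root URL without a trailing slash.

    Falls back to DEFAULT_URL when PARAKEET_URL is unset or blank.
    """
    env = os.environ if env is None else env
    url = env.get("PARAKEET_URL", "").strip() or DEFAULT_URL
    return url.rstrip("/")


def transcriptions_endpoint(env=None):
    """Build the full transcription endpoint from the base URL."""
    return parakeet_base_url(env) + "/v1/audio/transcriptions"
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real environment.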
OpenAI-compatible API at $PARAKEET_URL (default: http://localhost:5000).
# Transcribe audio file (plain text)
curl -X POST "${PARAKEET_URL:-http://localhost:5000}/v1/audio/transcriptions" \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=text"
# Get timestamps and segments
curl -X POST "${PARAKEET_URL:-http://localhost:5000}/v1/audio/transcriptions" \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=verbose_json"
# Generate subtitles (SRT)
curl -X POST "${PARAKEET_URL:-http://localhost:5000}/v1/audio/transcriptions" \
  -F "file=@/path/to/audio.mp3" \
  -F "response_format=srt"
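The same multipart POST can be sketched in Python with only the standard library, for environments where neither curl nor the OpenAI SDK is available. The function names are hypothetical, and sending the file part as `application/octet-stream` is an assumption about what the server accepts.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, filename, file_bytes):
    """Encode form fields plus one part named "file" as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    # Assumption: the server does not require a specific audio MIME type.
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, response_format="text", base_url=None):
    """POST an audio file to /v1/audio/transcriptions and return the raw reply."""
    base_url = base_url or os.getenv("PARAKEET_URL", "http://localhost:5000")
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            {"response_format": response_format}, os.path.basename(path), f.read()
        )
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running, `transcribe("/path/to/audio.mp3", "srt")` would correspond to the third curl command above.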
import os
from openai import OpenAI

# Point the OpenAI SDK at the local Parakeet server
client = OpenAI(
    base_url=os.getenv("PARAKEET_URL", "http://localhost:5000") + "/v1",
    api_key="not-needed",
)

with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="parakeet-tdt-0.6b-v3",
        file=f,
        response_format="text",
    )
print(transcript)
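For more than one file, the call above can be wrapped in a loop. This is a rough sketch: `transcribe_folder` is a hypothetical helper, `client` is expected to be an `OpenAI` client configured as above, and the suffix list is an assumption, since the set of audio formats the server accepts is not stated here.

```python
from pathlib import Path

# Assumed suffixes; adjust to whatever the server actually accepts.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def transcribe_folder(client, folder, model="parakeet-tdt-0.6b-v3", response_format="text"):
    """Transcribe each audio file in `folder` and write a .txt file next to it.

    Returns the list of transcript paths that were written.
    """
    written = []
    for audio in sorted(Path(folder).iterdir()):
        if audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip non-audio files
        with audio.open("rb") as f:
            result = client.audio.transcriptions.create(
                model=model, file=f, response_format=response_format
            )
        out = audio.with_suffix(".txt")
        out.write_text(str(result), encoding="utf-8")
        written.append(out)
    return written
```

Taking the client as a parameter keeps the helper independent of how the base URL is configured.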
| Format | Output |
|--------|--------|
| text | Plain text |
| json | {"text": "..."} |
| verbose_json | Segments with timestamps and words |
| srt | SRT subtitles |
| vtt | WebVTT subtitles |
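A sketch of reading a `verbose_json` reply. It assumes the payload follows the OpenAI layout, with a top-level `segments` list whose items carry `start`, `end` (seconds), and `text`; the exact schema this server returns is not confirmed here, and both function names are hypothetical.

```python
import json


def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (the SRT timestamp style)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segment_lines(verbose_json_text):
    """Yield one 'start --> end  text' line per segment in the reply.

    Assumes OpenAI-style keys: segments[].start, segments[].end, segments[].text.
    """
    payload = json.loads(verbose_json_text)
    for seg in payload.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        yield f"{start} --> {end}  {seg['text'].strip()}"
```

A reply without a `segments` key simply yields nothing, so the helper is safe to call on `json` output as well.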
English, Spanish, French, German, Italian, Portuguese, Polish, Russian,
Ukrainian, Dutch, Swedish, Danish, Finnish, Norwegian, Greek, Czech,
Romanian, Hungarian, Bulgarian, Slovak, Croatian, Lithuanian, Latvian,
Estonian, Slovenian
Language is auto-detected — no configuration needed.
Open $PARAKEET_URL in a browser for a drag-and-drop transcription UI.
# Check status
docker ps --filter "name=parakeet"
# View logs
docker logs -f <container-name>
# Restart
docker compose restart
# Stop
docker compose down
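`docker ps` reports container state; to check that the HTTP endpoint itself answers (for example right after a restart), a small polling loop can help. This is a sketch using only the standard library. `wait_until_up` is a hypothetical helper, and treating any HTTP reply as "up" is an assumption, since no dedicated health endpoint is documented here.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url, timeout=30.0, interval=1.0):
    """Poll `url` until the server sends any HTTP reply or `timeout` seconds pass.

    Returns True if a reply arrived, False if the deadline was reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server replied, even if with an error status
        except (urllib.error.URLError, OSError):
            pass  # connection refused or timed out; not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_up("http://localhost:5000")` could be called before submitting the first transcription request.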
Generated Mar 1, 2026
Transcribe patient-doctor consultations and medical dictations locally so that audio never leaves the premises, which can support HIPAA compliance and patient privacy. The fast processing enables quick turnaround for medical records and reports without relying on cloud services.
Generate subtitles in multiple formats like SRT and VTT for videos, podcasts, and online courses. The auto-detection of 25 languages supports multilingual content production efficiently on local hardware.
Transcribe court proceedings, depositions, and legal meetings securely on-premises to protect sensitive information. The OpenAI-compatible API allows easy integration with existing legal software workflows.
Process voice recordings from call centers to analyze customer interactions and generate text for sentiment analysis. Local deployment keeps recordings off third-party servers and avoids network round trips to a cloud API.
Convert lecture recordings and educational videos into text and subtitles to aid students with hearing impairments or language barriers. The speed and local operation make it cost-effective for institutions.
Offer a hosted version of the Parakeet STT service with tiered pricing based on transcription volume and features like advanced analytics. Target small to medium businesses needing reliable, private speech-to-text without infrastructure management.
Sell licenses for organizations to deploy the software locally, with support and maintenance contracts. Ideal for industries with strict data privacy regulations like healthcare and finance, ensuring full control over data.
Integrate the Parakeet STT API into existing platforms or applications and charge per API call or on a usage basis. Partner with software vendors in media, legal, or customer service to enhance their offerings with fast transcription.
💬 Integration Tip
Set the PARAKEET_URL environment variable to point to your local or hosted instance, and use the OpenAI SDK for seamless integration with minimal code changes.