zhipu-asr
Automatic Speech Recognition (ASR) using Zhipu AI (BigModel) GLM-ASR model. Use when you need to transcribe audio files to text. Supports Chinese audio trans...
Install via ClawdBot CLI:
clawdbot install franklu0819-lang/zhipu-asr
Transcribe Chinese audio files to text using Zhipu AI's GLM-ASR model.
1. Get your API Key:
Get a key from Zhipu AI Console
2. Set it in your environment:
export ZHIPU_API_KEY="your-key-here"
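Because the key is read from the environment, a wrapper can fail early when it is missing. The following is a minimal sketch; `require_api_key` is a hypothetical helper name for illustration and is not part of the skill.

```shell
#!/usr/bin/env bash
# require_api_key: hypothetical helper (not part of the skill).
# Fails with a clear message when ZHIPU_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${ZHIPU_API_KEY:-}" ]; then
    echo "ZHIPU_API_KEY is not set; export it before running the script" >&2
    return 1
  fi
}
```

Calling `require_api_key || exit 1` at the top of a wrapper script would stop before any audio is processed.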
Note: The API accepts only WAV and MP3. The script uses ffmpeg to convert any other format to MP3 automatically, so any input that ffmpeg can read will work.
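The format rule in the note above can be sketched as follows. `needs_conversion` and `to_mp3` are hypothetical helpers, and the ffmpeg options are an assumption; the script's actual conversion command is not shown on this page.

```shell
#!/usr/bin/env bash
# needs_conversion FILE: hypothetical helper.
# Succeeds (exit 0) when FILE does not end in .wav or .mp3 (case-insensitive).
needs_conversion() {
  case "$1" in
    *.[Ww][Aa][Vv] | *.[Mm][Pp]3) return 1 ;;
    *) return 0 ;;
  esac
}

# to_mp3 FILE: hypothetical helper.
# Prints the path of a WAV/MP3 file, converting with ffmpeg only when needed.
# Assumption: ffmpeg's default MP3 settings are acceptable to the API.
to_mp3() {
  local src=$1 dst
  if needs_conversion "$src"; then
    dst="${src%.*}.mp3"
    ffmpeg -nostdin -loglevel error -y -i "$src" "$dst" || return 1
    printf '%s\n' "$dst"
  else
    printf '%s\n' "$src"
  fi
}
```

Converting by hand like this is only useful for checking what the API will receive; the script already performs the conversion on its own.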
Transcribe an audio file with default settings:
bash scripts/speech_to_text.sh recording.wav
Provide previous transcription or context for better accuracy:
bash scripts/speech_to_text.sh recording.wav "这是之前的转录内容,有助于提高准确性"
Use custom vocabulary to improve recognition of specific terms:
bash scripts/speech_to_text.sh recording.mp3 "" "人名,地名,专业术语,公司名称"
Combine context and hotwords:
bash scripts/speech_to_text.sh recording.wav "会议记录片段" "张三,李四,项目名称"
Parameters:
audio_file (required): Path to audio file (.wav or .mp3; other formats are converted with ffmpeg)
prompt (optional): Previous transcription or context text (max 8000 chars)
hotwords (optional): Comma-separated list of specific terms (max 100 words)
Why use context prompts:
When to use:
Example:
bash scripts/speech_to_text.sh part2.wav "第一部分的转录内容:讨论了项目进展和下一步计划"
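Since the prompt is capped at 8000 characters, a long earlier transcript has to be shortened before it is passed as context. A minimal sketch, assuming the most recent text is the most useful part to keep; `tail_chars` is a hypothetical helper, not part of the skill.

```shell
#!/usr/bin/env bash
# tail_chars TEXT [MAX]: hypothetical helper.
# Prints at most MAX trailing characters of TEXT (default 8000, the prompt limit).
# Note: bash counts characters only in a UTF-8 locale; in the C locale it counts bytes.
tail_chars() {
  local text=$1 max=${2:-8000}
  if [ "${#text}" -gt "$max" ]; then
    printf '%s' "${text:$(( ${#text} - max ))}"
  else
    printf '%s' "$text"
  fi
}
```

With the earlier transcript held in a variable such as `previous_text` (a placeholder name), the call would look like `bash scripts/speech_to_text.sh part2.wav "$(tail_chars "$previous_text")"`.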
What are hotwords:
Custom vocabulary list that boosts recognition accuracy for specific terms.
Best use cases:
Examples:
# Medical transcription
bash scripts/speech_to_text.sh medical.wav "" "患者,症状,诊断,治疗方案"
# Business meeting
bash scripts/speech_to_text.sh meeting.wav "" "张经理,李总,项目代号,预算"
# Tech discussion
bash scripts/speech_to_text.sh tech.wav "" "API,数据库,算法,框架"
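For longer vocabularies it can be easier to keep the terms in a file and build the comma-separated argument from it. The sketch below assumes one term per line and reads the "max 100 words" limit as 100 terms; `build_hotwords` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# build_hotwords FILE: hypothetical helper.
# Reads one term per line, drops blank lines and duplicates (order preserved),
# and prints the terms joined by commas. Fails when more than 100 terms remain.
build_hotwords() {
  local terms count
  terms=$(awk 'NF && !seen[$0]++' "$1")
  count=$(printf '%s\n' "$terms" | grep -c .)
  if [ "$count" -gt 100 ]; then
    echo "too many hotwords: $count (max 100)" >&2
    return 1
  fi
  printf '%s\n' "$terms" | paste -sd, -
}
```

A call could then look like `bash scripts/speech_to_text.sh meeting.wav "" "$(build_hotwords terms.txt)"`, where `terms.txt` is a placeholder file name.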
# Part 1
bash scripts/speech_to_text.sh meeting_part1.wav
# Part 2 with context
bash scripts/speech_to_text.sh meeting_part2.wav "第一部分讨论了项目进度" "张总,李经理,项目名称"
# Part 3 with context
bash scripts/speech_to_text.sh meeting_part3.wav "前两部分讨论了项目进度和预算" "张总,李经理,项目名称"
bash scripts/speech_to_text.sh lecture.wav "" "教授,课程名称,专业术语1,专业术语2"
for file in recording_*.wav; do
  bash scripts/speech_to_text.sh "$file"
done
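The batch loop above prints every result to the terminal. A variant that saves each result and keeps going after a failure might look like this. It is a sketch under two assumptions not confirmed by this page: the script writes its JSON to stdout, and it exits with a non-zero status on failure. `transcribe` and `batch_transcribe` are hypothetical names.

```shell
#!/usr/bin/env bash
# transcribe: thin wrapper so the script path is defined in one place.
transcribe() {
  bash scripts/speech_to_text.sh "$@"
}

# batch_transcribe FILE...: hypothetical helper.
# Saves each result next to its input with a .json suffix.
# Returns non-zero if any file failed.
batch_transcribe() {
  local file out failed=0
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "skipped (not a file): $file" >&2
      continue
    fi
    out="${file%.*}.json"
    if transcribe "$file" > "$out"; then
      echo "ok: $file -> $out"
    else
      echo "failed: $file" >&2
      rm -f "$out"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Usage would be `batch_transcribe recording_*.wav`. The file check also covers the case where the glob matches nothing and is passed through literally.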
Best practices for accurate transcription:
The script outputs JSON with:
id: Task ID
created: Request timestamp (Unix timestamp)
request_id: Unique request identifier
model: Model name used
text: Transcribed text
Example output:
{
  "id": "task-12345",
  "created": 1234567890,
  "request_id": "req-abc123",
  "model": "glm-asr-2512",
  "text": "你好,这是转录的文本内容"
}
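To use only the transcript, the `text` field can be pulled out of this JSON. The sketch below assumes the script prints the JSON to stdout and that either `jq` or `python3` is installed; `extract_text` is a hypothetical helper, not part of the skill.

```shell
#!/usr/bin/env bash
# extract_text: hypothetical helper.
# Reads the JSON result on stdin and prints the value of its "text" field.
# Uses jq when available, otherwise falls back to python3.
extract_text() {
  if command -v jq >/dev/null 2>&1; then
    jq -r '.text'
  else
    python3 -c 'import json, sys; print(json.load(sys.stdin)["text"])'
  fi
}
```

For example, `bash scripts/speech_to_text.sh recording.wav | extract_text > recording.txt` would save the plain transcript to a file.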
File Size Issues:
Duration Issues:
Poor Accuracy:
Format Issues:
Generated Mar 1, 2026
Transcribe business meetings or conference calls in Chinese, using context prompts to link multiple audio segments for continuity. Ideal for capturing discussions on project updates, decisions, and action items in corporate settings.
Convert Chinese-language lectures, seminars, or online course audio into text, with hotwords for technical terms or names to improve accuracy. Useful for creating study materials or subtitles in education and training.
Transcribe patient-doctor conversations in Chinese, leveraging hotwords for medical terminology like symptoms or treatments to ensure precise documentation. Supports compliance and record-keeping in healthcare.
Process audio from customer support calls in Chinese, using context prompts to maintain conversation flow and hotwords for product names or issues. Helps in quality assurance and feedback analysis for service improvement.
Transcribe Chinese audio from podcasts, interviews, or broadcasts, with hotwords for names and brands to enhance transcription quality. Facilitates content creation, subtitling, and archiving in media industries.
Offer tiered subscription plans for developers or businesses to access the ASR service via API, with limits on requests or features like hotwords. Revenue comes from monthly or annual fees based on usage volume.
Charge users per audio file or minute transcribed, with optional add-ons for advanced features like context prompts or bulk processing. Targets occasional users or small businesses needing flexible pricing.
License the ASR technology to large organizations for integration into their internal systems, such as call centers or compliance tools, with customization and support. Revenue is generated through licensing fees and service contracts.
💬 Integration Tip
Make sure ZHIPU_API_KEY is set in the environment, then call the provided shell script with valid audio file paths for a quick setup.