smart-model-routing-for-zaiAuto-route tasks to the cheapest z.ai (GLM) model that works correctly. Three-tier progression: Flash β Standard β Plus/32B. Classify before responding. FLASH (default): factual Q&A, greetings, reminders, status checks, lookups, simple file ops, heartbeats, casual chat, 1β2 sentence tasks, cron jobs. ESCALATE TO STANDARD: code >10 lines, analysis, comparisons, planning, reports, multi-step reasoning, tables, long writing >3 paragraphs, summarization, research synthesis, most user conversations. ESCALATE TO PLUS/32B: architecture decisions, complex debugging, multi-file refactoring, strategic planning, nuanced judgment, deep research, critical production decisions. Rule: If a human needs >30 seconds of focused thinking, escalate. If Standard struggles with complexity, go to Plus/32B. Save major API costs by starting cheap and escalating only when needed.
Install via ClawdBot CLI:
clawdbot install PrincNL/smart-model-routing-for-zaiThree-tier z.ai (GLM) routing: Flash β Standard β Plus / 32B
Start with the cheapest model. Escalate only when needed. Designed to minimize API cost without sacrificing correctness.
If a human would need more than 30 seconds of focused thinking, escalate from Flash to Standard.
If the task involves architecture, complex tradeoffs, or deep reasoning, escalate to Plus / 32B.
| Tier | Example Models | Purpose |
|-----|----------------|---------|
| Flash | GLM-4.5-Flash, GLM-4.7-Flash | Fastest & cheapest |
| Standard | GLM-4.6, GLM-4.7 | Strong reasoning & code |
| Plus / 32B | GLM-4-Plus, GLM-4-32B-128K | Heavy reasoning & architecture |
Bottom line: Wrong model selection wastes money OR time. Flash for simple, Standard for normal work, Plus/32B for complex decisions.
Stay on Flash for:
Escalate to Standard for:
Most real user conversations belong here.
Escalate to Plus / 32B for:
```javascript
// Routine monitoring
sessions_spawn(task="Check backup status", model="GLM-4.5-Flash")
// Standard code work
sessions_spawn(task="Build the REST API endpoint", model="GLM-4.7")
// Architecture decisions
sessions_spawn(task="Design the database schema for multi-tenancy", model="GLM-4-Plus")
For Cron Jobs
json
Copy code
{
"payload": {
"kind": "agentTurn",
"model": "GLM-4.5-Flash"
}
}
Always use Flash for cron unless the task genuinely needs reasoning.
π Quick Decision Tree
pgsql
Copy code
Is it a greeting, lookup, status check, or 1β2 sentence answer?
YES β FLASH
NO β
Is it code, analysis, planning, writing, or multi-step?
YES β STANDARD
NO β
Is it architecture, deep reasoning, or a critical decision?
YES β PLUS / 32B
NO β Default to STANDARD, escalate if struggling
π Quick Reference Card
less
Copy code
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SMART MODEL SWITCHING β
β Flash β Standard β Plus / 32B β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π FLASH (cheapest) β
β β’ Greetings, status checks, quick lookups β
β β’ Factual Q&A, reminders β
β β’ Simple file ops, 1β2 sentence answers β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π STANDARD (workhorse) β
β β’ Code > 10 lines, debugging β
β β’ Analysis, comparisons, planning β
β β’ Reports, long writing β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β€οΈ PLUS / 32B (complex) β
β β’ Architecture decisions β
β β’ Complex debugging, multi-file refactoring β
β β’ Strategic planning, deep research β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β π‘ RULE: >30 sec human thinking β escalate β
β π° START CHEAP β SCALE ONLY WHEN NEEDED β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Built for z.ai (GLM) setups.
Generated Mar 1, 2026
Handles initial customer inquiries like order status, FAQs, and simple troubleshooting using Flash for quick responses. Escalates to Standard for detailed issue analysis or generating step-by-step guides, and to Plus/32B for complex complaint resolution requiring nuanced judgment.
Uses Flash for code lookups and basic syntax help, Standard for writing functions and debugging typical bugs, and Plus/32B for designing system architecture or refactoring large codebases across multiple files.
Employs Flash for generating short social media posts or fact-checking, Standard for drafting blog articles and creating comparison tables, and Plus/32B for strategic content planning and deep research on trending topics.
Leverages Flash for quick status checks on patient data or simple reminders, Standard for synthesizing research reports and analyzing trends, and Plus/32B for critical decisions on treatment plans or optimizing hospital workflows.
Uses Flash for basic market lookups and greetings, Standard for generating investment reports and planning portfolios, and Plus/32B for complex risk assessment and strategic financial decisions requiring deep reasoning.
Offers tiered pricing based on model usage, with Flash included in basic plans, Standard in premium plans, and Plus/32B as an add-on for enterprises needing high-complexity tasks, generating recurring revenue.
Charges users per API call with different rates for Flash, Standard, and Plus/32B models, encouraging cost-effective usage by starting cheap and scaling only when necessary, ideal for developers and startups.
Provides custom packages for large organizations, bundling all model tiers with dedicated support and integration services, focusing on industries like finance or healthcare where complex reasoning is critical.
π¬ Integration Tip
Start by integrating Flash for simple tasks like status checks, then gradually add Standard and Plus/32B based on user feedback and performance needs to optimize costs.
Create jobs and transact with other specialised agents through the Agent Commerce Protocol (ACP) β extends the agent's action space by discovering and using agents on the marketplace, enables launching an agent token for fundraising and revenue, and supports registering service offerings to sell capabilities to other agents.
Write, structure, and update a business plan for a solopreneur. Use when creating a plan from scratch, updating an existing plan after a pivot or new phase, or preparing a plan to share with investors, partners, or even just to clarify your own strategy. Covers executive summary, market analysis, competitive positioning, revenue model, operations plan, financial projections, and risk assessment β all adapted for a one-person business. Trigger on "write a business plan", "business plan", "create my plan", "business plan template", "update my business plan", "plan for my business", "investor pitch plan".
Executive leadership guidance for strategic decision-making, organizational development, and stakeholder management. Includes strategy analyzer, financial scenario modeling, board governance frameworks, and investor relations playbooks. Use when planning strategy, preparing board presentations, managing investors, developing organizational culture, making executive decisions, or when user mentions CEO, strategic planning, board meetings, investor updates, organizational leadership, or executive strategy.
Strategic product leadership toolkit for Head of Product including OKR cascade generation, market analysis, vision setting, and team scaling. Use for strategic planning, goal alignment, competitive analysis, and organizational design.
B2B SaaS competitive intelligence with 24 scenarios across Sales/HR/Fintech/Ops Tech
Multi-agent war room for brainstorming, system design, architecture review, product specs, business strategy, or any complex problem. Use when a user wants to run a structured multi-agent session with specialist roles, when they mention "war room", when they need to brainstorm a project from scratch, design a system with multiple perspectives, stress-test decisions with a devil's advocate, or produce a comprehensive blueprint/spec. Works for software, hardware, content, business β any domain.