# token-guard

Prevents LLM API 429 errors by estimating tokens, tracking quotas, throttling requests, detecting duplicates, caching responses, and auto-falling back by model.
Install via ClawdBot CLI:
```
clawdbot install edmonddantesj/token-guard
```

Version: 1.5.0
Author: Aoineco & Co.
License: MIT
Tags: rate-limit, 429, token-management, cost-optimization, llm-guard, high-performance
Prevents LLM API 429 (Rate Limit / Resource Exhausted) errors by intercepting requests before they're sent. Designed for users on free/low-cost API plans who need maximum intelligence per dollar.
Core philosophy: "Intelligence is measured not by how much you spend, but by how little you need."
Core features, aimed at tight per-minute quotas when using LLM APIs (especially Google Gemini Flash with its 1M TPM limit):
| Feature | Description |
|---------|-------------|
| Pre-flight Token Estimation | Estimates token count before API call (CJK-aware, no tiktoken dependency) |
| Real-time Quota Tracking | Tracks per-model per-minute token usage with sliding window |
| Smart Throttle | Auto-waits when quota > 80%, blocks at > 95% |
| Duplicate Detection | Blocks identical requests within 60s window (3+ = runaway) |
| Response Caching | Caches successful responses for duplicate requests |
| Auto Model Fallback | Switches to cheaper/available model when primary is exhausted |
| 429 Error Parser | Extracts exact retry delay from Google/Anthropic error responses |
| Batch vs Mistake Detection | Distinguishes intentional bulk processing from error loops |
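To make the "Pre-flight Token Estimation" row concrete, here is a minimal sketch of a CJK-aware, character-based estimator. This is our own illustration of the idea, not token-guard's actual code: the `estimate_tokens` name, the 4-characters-per-token ratio for Latin text, and the 1-token-per-CJK-character rule are all assumptions.

```python
import re

# Rough coverage of CJK ideographs, kana, Hangul, and fullwidth forms.
CJK = re.compile(r'[\u3000-\u9fff\uac00-\ud7af\uff00-\uffef]')

def estimate_tokens(text: str) -> int:
    """Pre-flight estimate: CJK characters count as ~1 token each;
    everything else averages ~4 characters per token."""
    cjk = len(CJK.findall(text))
    other = len(text) - cjk
    return cjk + (other + 3) // 4  # round the non-CJK portion up
```

Counting each CJK character as roughly one token avoids the large underestimates a plain `len(text) / 4` heuristic produces on Chinese, Japanese, or Korean prompts, without pulling in tiktoken.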
Pre-configured quotas for:
- gemini-3-flash (1M TPM)
- gemini-3-pro (2M TPM)
- claude-haiku (50K TPM)
- claude-sonnet (200K TPM)
- claude-opus (200K TPM)
- gpt-4o (800K TPM)
- deepseek (1M TPM)

Custom quotas can be added for any model.
```python
import time

from token_guard import TokenGuard

guard = TokenGuard()

# Before every API call:
decision = guard.check(prompt_text, model="gemini-3-flash")

if decision.action == "proceed":
    response = call_your_api(prompt_text)
    guard.record_usage(decision.estimated_tokens, model="gemini-3-flash")
    guard.cache_response(prompt_text, response)
elif decision.action == "wait":
    time.sleep(decision.wait_seconds)
    # retry
elif decision.action == "fallback":
    response = call_your_api(prompt_text, model=decision.fallback_model)
elif decision.action == "block":
    print(f"Blocked: {decision.reason}")

# If you get a 429 error anyway:
guard.record_429("gemini-3-flash", retry_delay=53.0)
```
Add to your agent's config or use as a middleware:
```yaml
skills:
  - token-guard
```
The agent can invoke TokenGuard before any LLM API call to prevent quota exhaustion.
```
token-guard/
├── SKILL.md              # This file
└── scripts/
    └── token_guard.py    # Main engine (zero external dependencies)
```
```json
{
  "models": {
    "gemini-3-flash": {
      "tpm_limit": 1000000,
      "used_this_minute": 750000,
      "remaining": 250000,
      "usage_pct": "75.0%",
      "status": "🟢 OK"
    }
  },
  "stats": {
    "total_checks": 42,
    "tokens_saved": 128000,
    "blocks": 3,
    "fallbacks": 2
  }
}
```
Pure Python 3.10+. No pip install needed. No tiktoken, no external API calls.
Designed for the $7 Bootstrap Protocol — every byte counts.
Generated Mar 1, 2026
Researchers processing large volumes of PDFs and documents to extract insights using LLM APIs on limited budgets. TokenGuard prevents 429 errors from exceeding token quotas during bulk analysis, ensuring cost-effective data processing without interruptions.
Companies using LLMs to summarize support tickets from various sources like emails and chats. TokenGuard manages token usage across models to avoid rate limits, enabling continuous processing of high-volume ticket streams without wasting tokens on retries.
Platforms moderating user-generated content in real-time with LLM APIs. TokenGuard throttles requests and detects duplicates to prevent quota exhaustion during peak usage, maintaining moderation efficiency while optimizing API costs.
Law firms automating the review of lengthy legal documents such as contracts and case files. TokenGuard estimates tokens pre-flight and caches responses to avoid redundant API calls, reducing costs and ensuring reliable processing under strict token limits.
Online retailers generating product descriptions for large catalogs using LLM APIs. TokenGuard tracks quotas and enables model fallback to cheaper options when primary models are exhausted, scaling content creation without exceeding budget constraints.
Offer a free tier with basic token management features for individual developers and small projects, then charge for advanced features like custom model quotas and priority support. Revenue is generated through subscription plans tailored to different usage levels.
License TokenGuard as a middleware solution for large organizations integrating it into their existing AI workflows. Provide dedicated support, custom integrations, and SLA guarantees. Revenue comes from annual licensing fees based on the scale of deployment.
Host TokenGuard as a cloud service where users pay per token managed or per API call intercepted. Offer pay-as-you-go pricing with volume discounts, targeting developers and businesses needing scalable rate limit prevention without self-hosting.
💬 Integration Tip
Integrate TokenGuard as a middleware in your AI agent's workflow by adding it to the config file, ensuring it checks requests before each LLM API call to prevent quota exhaustion.