# llmfit

Detect local hardware (RAM, CPU, GPU/VRAM) and recommend the best-fit local LLM models, with optimal quantization, speed estimates, and fit scoring.
Install via the ClawdBot CLI:

```shell
clawdbot install AlexsJones/llmfit
```

llmfit is a hardware-aware local LLM advisor. It detects your system specs (RAM, CPU, GPU/VRAM) and recommends models that actually fit, with optimal quantization and speed estimates.
Use this skill immediately when the user asks any of:
Also use this skill when:

- `models.providers.ollama` or `models.providers.lmstudio` is configured

To detect the system, run:

```shell
llmfit --json system
```
Returns JSON with CPU, RAM, GPU name, VRAM, multi-GPU info, and whether memory is unified (Apple Silicon).
```shell
llmfit recommend --json --limit 5
```
Returns the top 5 models ranked by a composite score (quality, speed, fit, context) with optimal quantization for the detected hardware.
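The JSON output can be summarized into one line per model. This is an illustrative sketch, not llmfit source; the sample payload is hypothetical but follows the documented fields:

```python
import json

# Hypothetical sample shaped like `llmfit recommend --json` output.
payload = '''
{"models": [
  {"name": "meta-llama/Llama-3.1-8B-Instruct", "score": 87,
   "best_quant": "Q5_K_M", "estimated_tps": 42.0, "fit_level": "Perfect"},
  {"name": "google/gemma-2-9b-it", "score": 81,
   "best_quant": "Q4_K_M", "estimated_tps": 35.0, "fit_level": "Good"}
]}
'''

# Print a compact summary line per recommended model.
for m in json.loads(payload)["models"]:
    print(f"{m['name']}: score {m['score']}, "
          f"{m['best_quant']} @ ~{m['estimated_tps']} tok/s ({m['fit_level']})")
```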
```shell
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 3
llmfit recommend --json --use-case chat --limit 3
```
Valid use cases: general, coding, reasoning, chat, multimodal, embedding.
```shell
llmfit recommend --json --min-fit good --limit 10
```
Valid fit levels (best to worst): perfect, good, marginal.
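The `--min-fit` filter can be mimicked in a few lines. A minimal sketch, assuming the capitalized `fit_level` field values documented below (the CLI flag itself takes lowercase names); the sample model dicts are hypothetical:

```python
# Rank fit levels best to worst, then keep models at or above a cutoff.
FIT_ORDER = {"Perfect": 0, "Good": 1, "Marginal": 2, "TooTight": 3}

def filter_by_min_fit(models, min_fit="Good"):
    cutoff = FIT_ORDER[min_fit]
    return [m for m in models if FIT_ORDER[m["fit_level"]] <= cutoff]

# Hypothetical samples shaped like llmfit's per-model output.
models = [
    {"name": "a", "fit_level": "Perfect"},
    {"name": "b", "fit_level": "Marginal"},
    {"name": "c", "fit_level": "TooTight"},
]
print([m["name"] for m in filter_by_min_fit(models, "Good")])  # ['a']
```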
Example output of `llmfit --json system`:

```json
{
  "system": {
    "cpu_name": "Apple M2 Max",
    "cpu_cores": 12,
    "total_ram_gb": 32.0,
    "available_ram_gb": 24.5,
    "has_gpu": true,
    "gpu_name": "Apple M2 Max",
    "gpu_vram_gb": 32.0,
    "gpu_count": 1,
    "backend": "Metal",
    "unified_memory": true
  }
}
```
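One practical use of this payload is deriving a memory budget for model selection. A hedged sketch: the unified-memory heuristic (treat available RAM as the single shared pool on Apple Silicon, discrete VRAM otherwise) is our assumption, not llmfit's documented algorithm:

```python
import json

# Sample payload matching the example above.
system_json = '''
{"system": {"cpu_name": "Apple M2 Max", "cpu_cores": 12,
 "total_ram_gb": 32.0, "available_ram_gb": 24.5,
 "has_gpu": true, "gpu_name": "Apple M2 Max",
 "gpu_vram_gb": 32.0, "gpu_count": 1,
 "backend": "Metal", "unified_memory": true}}
'''

def memory_budget_gb(info):
    # Unified memory: CPU and GPU share one pool, so available RAM
    # is the real ceiling rather than the nominal VRAM figure.
    if info["unified_memory"]:
        return info["available_ram_gb"]
    if info["has_gpu"]:
        return info["gpu_vram_gb"]   # discrete GPU: VRAM bounds the model
    return info["available_ram_gb"]  # CPU-only fallback

info = json.loads(system_json)["system"]
print(f"{info['backend']}: {memory_budget_gb(info)} GB usable")
```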
Each model in the `models` array includes:
| Field | Meaning |
|---|---|
| name | HuggingFace model ID (e.g. meta-llama/Llama-3.1-8B-Instruct) |
| provider | Model provider (Meta, Alibaba, Google, etc.) |
| params_b | Parameter count in billions |
| score | Composite score 0–100 (higher is better) |
| score_components | Breakdown: quality, speed, fit, context (each 0–100) |
| fit_level | Perfect, Good, Marginal, or TooTight |
| run_mode | GPU, CPU+GPU Offload, or CPU Only |
| best_quant | Optimal quantization for the hardware (e.g. Q5_K_M, Q4_K_M) |
| estimated_tps | Estimated tokens per second |
| memory_required_gb | VRAM/RAM needed at this quantization |
| memory_available_gb | Available VRAM/RAM detected |
| utilization_pct | How much of available memory the model uses |
| use_case | What the model is designed for |
| context_length | Maximum context window |
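The memory fields are related by simple arithmetic. This is our reconstruction of how `utilization_pct` follows from the two memory fields, not a quote of llmfit's source:

```python
# utilization_pct = share of available memory the model occupies.
def utilization_pct(required_gb, available_gb):
    return round(100.0 * required_gb / available_gb, 1)

# E.g. a 5.7 GB quantized model on the 24.5 GB budget detected above:
print(utilization_pct(5.7, 24.5))   # ~23.3
print(utilization_pct(12.0, 24.0))  # 50.0
```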
After getting recommendations, configure the user's local model provider.
Map the HuggingFace model name to its Ollama tag. Common mappings:
| llmfit name | Ollama tag |
|---|---|
| meta-llama/Llama-3.1-8B-Instruct | llama3.1:8b |
| meta-llama/Llama-3.3-70B-Instruct | llama3.3:70b |
| Qwen/Qwen2.5-Coder-7B-Instruct | qwen2.5-coder:7b |
| Qwen/Qwen2.5-72B-Instruct | qwen2.5:72b |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | deepseek-coder-v2:16b |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | deepseek-r1:32b |
| google/gemma-2-9b-it | gemma2:9b |
| mistralai/Mistral-7B-Instruct-v0.3 | mistral:7b |
| microsoft/Phi-3-mini-4k-instruct | phi3:mini |
| microsoft/Phi-4-mini-instruct | phi4-mini |
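The mapping table above can be expressed as a lookup with a fallback for models without a known Ollama tag. All names come from the table; the helper function itself is our illustrative addition:

```python
# HuggingFace model ID -> Ollama tag (from the mapping table).
OLLAMA_TAGS = {
    "meta-llama/Llama-3.1-8B-Instruct": "llama3.1:8b",
    "meta-llama/Llama-3.3-70B-Instruct": "llama3.3:70b",
    "Qwen/Qwen2.5-Coder-7B-Instruct": "qwen2.5-coder:7b",
    "Qwen/Qwen2.5-72B-Instruct": "qwen2.5:72b",
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct": "deepseek-coder-v2:16b",
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B": "deepseek-r1:32b",
    "google/gemma-2-9b-it": "gemma2:9b",
    "mistralai/Mistral-7B-Instruct-v0.3": "mistral:7b",
    "microsoft/Phi-3-mini-4k-instruct": "phi3:mini",
    "microsoft/Phi-4-mini-instruct": "phi4-mini",
}

def to_ollama_model(hf_name):
    # Returns the "ollama/<tag>" identifier, or None if unmapped.
    tag = OLLAMA_TAGS.get(hf_name)
    return f"ollama/{tag}" if tag else None

print(to_ollama_model("google/gemma-2-9b-it"))  # ollama/gemma2:9b
```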
Then update `openclaw.json`:

```json
{
  "models": {
    "providers": {
      "ollama": {
        "models": ["ollama/<ollama-tag>"]
      }
    }
  }
}
```
And optionally set as default:
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/<ollama-tag>"
      }
    }
  }
}
```
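Both config edits can be applied programmatically. A minimal sketch, assuming an `openclaw.json`-shaped dict as shown in the snippets above; the `set_local_model` helper is hypothetical, and file I/O is omitted:

```python
import json

def set_local_model(config, ollama_tag, make_default=False):
    """Insert the chosen model into an openclaw.json-style dict."""
    model_id = f"ollama/{ollama_tag}"
    providers = config.setdefault("models", {}).setdefault("providers", {})
    ollama = providers.setdefault("ollama", {"models": []})
    if model_id not in ollama["models"]:
        ollama["models"].append(model_id)  # avoid duplicate entries
    if make_default:
        defaults = config.setdefault("agents", {}).setdefault("defaults", {})
        defaults["model"] = {"primary": model_id}
    return config

cfg = set_local_model({}, "llama3.1:8b", make_default=True)
print(json.dumps(cfg, indent=2))
```

Using `setdefault` keeps any other providers or agent settings already present in the config intact.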
For vLLM or LM Studio, use the HuggingFace model name directly as the model identifier with the appropriate provider prefix (`vllm/` or `lmstudio/`).
When a user asks "what local models can I run?":

1. Run `llmfit --json system` to show a hardware summary.
2. Run `llmfit recommend --json --limit 5` to get the top picks.
3. Update `openclaw.json` with the chosen model.

When a user asks for a specific use case like "recommend a coding model":

```shell
llmfit recommend --json --use-case coding --limit 3
```

Notes:

- The `best_quant` field gives the optimal quantization; higher quants (Q6_K, Q8_0) mean better quality if VRAM allows.
- Speed estimates (`estimated_tps`) are approximate and vary by hardware and quantization.
- Models with `fit_level: "TooTight"` should never be recommended to users.

Generated Feb 24, 2026
When a developer sets up a new machine for local AI development, they need to quickly identify which LLMs can run efficiently on their hardware. This skill automates hardware detection and model recommendation, saving hours of manual research and trial-and-error testing.
IT teams in enterprises deploying local LLMs for privacy or cost reasons use this skill to assess hardware across departments. It ensures optimal model selection based on available resources, preventing performance bottlenecks and overspending on unnecessary upgrades.
Universities and coding bootcamps use this skill to configure lab computers for AI courses. It helps instructors recommend models that fit diverse student hardware, enabling hands-on learning with local inference without requiring high-end GPUs for all users.
Creative agencies and indie developers use local LLMs for tasks like scriptwriting or code generation. This skill recommends models that balance quality and speed on their existing hardware, avoiding disruptions in workflow due to slow inference or memory issues.
Computer retailers and system integrators use this skill to demo AI capabilities on different hardware configurations for customers. It provides data-driven recommendations, helping upsell appropriate GPUs or RAM based on the models clients want to run locally.
Offer a free version for basic hardware detection and model recommendations, with a paid tier for advanced analytics like batch processing across multiple systems, historical performance tracking, and priority support. Revenue comes from subscriptions targeting enterprise teams.
Partner with hardware manufacturers (e.g., NVIDIA, Apple) or AI platforms (e.g., Ollama, LM Studio) to bundle this skill as a value-add tool. Revenue is generated through referral fees, co-marketing deals, or licensing agreements for embedded use in their software suites.
Provide tailored consulting services for businesses needing custom model recommendations or integration into existing AI workflows. This includes on-site training, bespoke plugin development, and ongoing support contracts for large-scale deployments.
💬 Integration Tip
Integrate this skill into CI/CD pipelines to automatically test model compatibility during development, or use it with monitoring tools to alert when hardware upgrades are needed for optimal AI performance.