senior-ml-engineer

ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization.
Install via the ClawdBot CLI:

```shell
clawdbot install alirezarezvani/senior-ml-engineer
```

Production ML engineering patterns for model deployment, MLOps infrastructure, and LLM integration.
Deploy a trained model to production with monitoring:
```dockerfile
FROM python:3.11-slim
WORKDIR /app
# curl is needed for the HEALTHCHECK below and is not included in the slim image
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model/ /app/model/
COPY src/ /app/src/
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1
EXPOSE 8080
CMD ["uvicorn", "src.server:app", "--host", "0.0.0.0", "--port", "8080"]
```
| Option | Latency | Throughput | Use Case |
|--------|---------|------------|----------|
| FastAPI + Uvicorn | Low | Medium | REST APIs, small models |
| Triton Inference Server | Very Low | Very High | GPU inference, batching |
| TensorFlow Serving | Low | High | TensorFlow models |
| TorchServe | Low | High | PyTorch models |
| Ray Serve | Medium | High | Complex pipelines, multi-model |
Establish automated training and deployment:
```python
from datetime import timedelta

# Legacy Feast API (pre-0.22); newer releases replace Feature/ValueType
# with Field/schema in FeatureView definitions.
from feast import Entity, Feature, FeatureView, FileSource, ValueType

user = Entity(name="user_id", value_type=ValueType.INT64)

user_features = FeatureView(
    name="user_features",
    entities=["user_id"],
    ttl=timedelta(days=1),
    features=[
        Feature(name="purchase_count_30d", dtype=ValueType.INT64),
        Feature(name="avg_order_value", dtype=ValueType.FLOAT),
    ],
    online=True,
    source=FileSource(path="data/user_features.parquet"),
)
```
| Trigger | Detection | Action |
|---------|-----------|--------|
| Scheduled | Cron (weekly/monthly) | Full retrain |
| Performance drop | Accuracy < threshold | Immediate retrain |
| Data drift | PSI > 0.2 | Evaluate, then retrain |
| New data volume | X new samples | Incremental update |
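The PSI trigger in the table can be computed with a histogram-based sketch like the one below. The bin count and the small floor used to avoid division by zero are conventional choices, not specified by this skill:

```python
import numpy as np


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample and a current sample.

    Rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift, > 0.2 retrain.
    """
    # Bin edges come from the reference (training-time) distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor sparse bins so the log term stays finite.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))
```

A mean shift of one standard deviation on a standard-normal feature pushes PSI well past the 0.2 retrain threshold, while resampling the same distribution stays near zero.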
Integrate LLM APIs into production applications:
```python
from abc import ABC, abstractmethod

from tenacity import retry, stop_after_attempt, wait_exponential


class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str:
        pass


@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(provider: LLMProvider, prompt: str) -> str:
    return provider.complete(prompt)
```
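Retries handle transient failures of a single provider; a fallback chain handles a provider being fully down. A minimal sketch, assuming providers are callables taking a prompt (real implementations would wrap the OpenAI/Anthropic SDKs and catch their specific exception types rather than bare `Exception`):

```python
from typing import Callable, Optional


def call_with_fallback(providers: list[Callable[[str], str]], prompt: str) -> str:
    """Try each provider in order; raise only if every one of them fails."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # in production, catch provider-specific errors
            last_error = exc
    raise RuntimeError("all LLM providers failed") from last_error
```

Ordering the chain cheapest-first (e.g. Haiku before Opus) also doubles as a cost-control measure when the cheap model's answer passes a quality check.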
| Provider | Input (per 1K tokens) | Output (per 1K tokens) |
|----------|-----------------------|------------------------|
| GPT-4 | $0.03 | $0.06 |
| GPT-3.5 | $0.0005 | $0.0015 |
| Claude 3 Opus | $0.015 | $0.075 |
| Claude 3 Haiku | $0.00025 | $0.00125 |
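Per-request cost tracking can be as simple as multiplying token counts by the table's rates. The sketch below hard-codes the prices above purely for illustration; provider pricing changes frequently and should live in config, not code:

```python
# Rough per-request cost estimator; prices are per 1K tokens and illustrative.
PRICING = {  # model: (input $/1K tokens, output $/1K tokens)
    "gpt-4": (0.03, 0.06),
    "gpt-3.5": (0.0005, 0.0015),
    "claude-3-opus": (0.015, 0.075),
    "claude-3-haiku": (0.00025, 0.00125),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_rate, output_rate = PRICING[model]
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate
```

Logging this per request makes budget alerts and per-feature cost attribution straightforward.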
Build retrieval-augmented generation pipeline:
| Database | Hosting | Scale | Latency | Best For |
|----------|---------|-------|---------|----------|
| Pinecone | Managed | High | Low | Production, managed |
| Qdrant | Both | High | Very Low | Performance-critical |
| Weaviate | Both | High | Low | Hybrid search |
| Chroma | Self-hosted | Medium | Low | Prototyping |
| pgvector | Self-hosted | Medium | Medium | Existing Postgres |
| Strategy | Chunk Size | Overlap | Best For |
|----------|------------|---------|----------|
| Fixed | 500-1000 tokens | 50-100 | General text |
| Sentence | 3-5 sentences | 1 sentence | Structured text |
| Semantic | Variable | Based on meaning | Research papers |
| Recursive | Hierarchical | Parent-child | Long documents |
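The fixed-size strategy in the table can be sketched as follows. It operates on a pre-tokenized sequence; a real pipeline would tokenize with the embedding model's own tokenizer rather than splitting on whitespace:

```python
def chunk_fixed(tokens: list[str], chunk_size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Fixed-size chunking with overlap, the first strategy in the table above."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk reached the end of the document
    return chunks
```

The overlap ensures a sentence straddling a chunk boundary is fully contained in at least one chunk, at the cost of some duplicated storage in the vector index.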
Monitor production models for drift and degradation:
```python
from scipy.stats import ks_2samp


def detect_drift(reference, current, threshold=0.05):
    """Two-sample Kolmogorov-Smirnov test between training and live data."""
    statistic, p_value = ks_2samp(reference, current)
    return {
        # Null hypothesis: both samples come from the same distribution,
        # so a small p-value indicates drift.
        "drift_detected": p_value < threshold,
        "ks_statistic": statistic,
        "p_value": p_value,
    }
```
| Metric | Warning | Critical |
|--------|---------|----------|
| p95 latency | > 100ms | > 200ms |
| Error rate | > 0.1% | > 1% |
| PSI (drift) | > 0.1 | > 0.2 |
| Accuracy drop | > 2% | > 5% |
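Mapping observed metrics onto the warning/critical thresholds above can be centralized in one place. A minimal sketch, where the metric names and units are assumptions for illustration:

```python
# Thresholds from the table above; metric names/units are illustrative.
THRESHOLDS = {  # metric: (warning, critical)
    "p95_latency_ms": (100, 200),
    "error_rate": (0.001, 0.01),
    "psi": (0.1, 0.2),
    "accuracy_drop": (0.02, 0.05),
}


def alert_level(metric: str, value: float) -> str:
    """Classify a metric reading as ok, warning, or critical."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

In practice the "warning" tier would page nobody but open a ticket, while "critical" triggers on-call alerting and, for PSI, the retraining pipeline.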
Bundled reference documents:
- references/mlops_production_patterns.md
- references/llm_integration_guide.md
- references/rag_system_architecture.md
```shell
python scripts/model_deployment_pipeline.py --model model.pkl --target staging
```

Generates deployment artifacts: Dockerfile, Kubernetes manifests, health checks.

```shell
python scripts/rag_system_builder.py --config rag_config.yaml --analyze
```

Scaffolds a RAG pipeline with vector store integration and retrieval logic.

```shell
python scripts/ml_monitoring_suite.py --config monitoring.yaml --deploy
```

Sets up drift detection, alerting, and performance dashboards.
| Category | Tools |
|----------|-------|
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| MLOps | MLflow, Weights & Biases, Kubeflow |
| Data | Spark, Airflow, dbt, Kafka |
| Deployment | Docker, Kubernetes, Triton |
| Databases | PostgreSQL, BigQuery, Pinecone, Redis |
Generated Mar 1, 2026
A retail company needs to deploy a trained product recommendation model to production. The skill guides them through containerizing the model with Docker, deploying it to a staging environment for integration testing, and then rolling it out to production via a canary deployment with monitoring of latency and error rates. This ensures reliable, low-latency recommendations for users.
A fintech firm wants to automate the retraining and deployment of fraud detection models. Using this skill, they set up a feature store for real-time transaction data, implement drift monitoring to detect changes in data patterns, and configure automated retraining triggers based on performance drops or data drift, ensuring models stay accurate and compliant.
A healthcare research organization builds a retrieval-augmented generation system to answer queries from medical documents. The skill helps select a vector database like Qdrant, implement document chunking strategies, and integrate an LLM with cost tracking, enabling context-aware responses grounded in the retrieved documents and reducing hallucination risk for clinicians and researchers.
A SaaS company integrates LLMs into their customer support chatbot to handle complex queries. The skill provides guidance on creating a provider abstraction layer for flexibility, implementing retry logic and fallback mechanisms, and tracking costs per request to stay within budget while ensuring high availability and response quality.
A manufacturing plant deploys models to predict equipment failures and needs continuous monitoring. The skill outlines setting up model serving with Triton Inference Server for high throughput, monitoring for drift and performance degradation, and automating alerts to trigger retraining, minimizing downtime and maintenance costs.
Offer a managed platform that automates model deployment, monitoring, and retraining for clients. Revenue is generated through subscription fees based on usage tiers, such as number of models deployed or compute resources consumed, providing scalable infrastructure without upfront investment.
Provide expert consulting services to help businesses implement MLOps pipelines, LLM integrations, or RAG systems. Revenue comes from project-based fees or hourly rates, targeting industries like finance or healthcare that require specialized, compliant solutions.
Host and maintain production models for clients, handling deployment, scaling, and monitoring tasks. Revenue is based on a combination of hosting fees and performance-based pricing, such as charges per inference request or SLA guarantees for uptime and latency.
💬 Integration Tip
Start by containerizing models with the provided Docker template to ensure portability, then integrate with existing CI/CD pipelines for automated testing and deployment.