# senior-data-scientist

World-class data science skill for statistical modeling, experimentation, causal inference, and advanced analytics. Covers Python (NumPy, Pandas, Scikit-learn), R, SQL, statistical methods, A/B testing, time series, and business intelligence, including experiment design, feature engineering, model evaluation, and stakeholder communication. Use when designing experiments, building predictive models, performing causal analysis, or driving data-driven decisions.
Install via the ClawdBot CLI:

```shell
clawdbot install alirezarezvani/senior-data-scientist
```
```shell
# Core Tool 1: experiment design
python scripts/experiment_designer.py --input data/ --output results/

# Core Tool 2: feature engineering
python scripts/feature_engineering_pipeline.py --target project/ --analyze

# Core Tool 3: model evaluation
python scripts/model_evaluation_suite.py --config config.yaml --deploy
```
This skill covers world-class capabilities in:

- **Languages:** Python, SQL, R, Scala, Go
- **ML Frameworks:** PyTorch, TensorFlow, Scikit-learn, XGBoost
- **Data Tools:** Spark, Airflow, dbt, Kafka, Databricks
- **LLM Frameworks:** LangChain, LlamaIndex, DSPy
- **Deployment:** Docker, Kubernetes, AWS/GCP/Azure
- **Monitoring:** MLflow, Weights & Biases, Prometheus
- **Databases:** PostgreSQL, BigQuery, Snowflake, Pinecone
Detailed reference material ships with the skill:

- `references/statistical_methods_advanced.md` — comprehensive guide to advanced statistical methods
- `references/experiment_design_frameworks.md` — complete experiment-design workflow documentation
- `references/feature_engineering_patterns.md` — technical reference for feature engineering patterns
Example deployment scenarios:

- Enterprise-scale data processing with distributed computing
- Production ML system with high availability
- High-throughput inference system

Each deployment tracks performance targets for latency, throughput, and availability.
```shell
# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
python scripts/evaluate.py --model best.pth

# Deployment
docker build -t service:v1 .
kubectl apply -f k8s/
helm upgrade service ./charts/

# Monitoring
kubectl logs -f deployment/service
python scripts/health_check.py
```
Bundled resources: `references/statistical_methods_advanced.md`, `references/experiment_design_frameworks.md`, `references/feature_engineering_patterns.md`, and the `scripts/` directory.

As a world-class senior professional, this skill supports use cases such as:
Generated Mar 1, 2026
Design and analyze A/B tests to evaluate new website features, such as checkout flow changes or personalized recommendations, using statistical methods to determine impact on conversion rates and revenue. Implement scalable experiment pipelines with real-time monitoring to drive data-driven decisions.
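As a minimal sketch of the statistical core of such an A/B test, the conversion-rate comparison can be run as a two-proportion z-test. The counts below are illustrative, not real data:

```python
import numpy as np
from scipy import stats

def ab_test_conversion(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-proportion z-test comparing conversion rates of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled proportion under the null hypothesis of equal rates
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))  # two-sided test
    return {"lift": p_b - p_a, "z": z, "p_value": p_value,
            "significant": p_value < alpha}

# Hypothetical experiment: 4.8% vs 5.6% conversion on 10k users per arm
result = ab_test_conversion(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(result)
```

In production you would additionally fix the sample size up front with a power analysis and guard against peeking; this sketch covers only the final significance test.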
Build predictive models using time series analysis and feature engineering to detect fraudulent transactions in real-time, leveraging distributed computing frameworks like Spark for high-throughput processing. Deploy models with low-latency inference and monitor for drift to ensure compliance and security.
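One common unsupervised approach to this kind of fraud detection is anomaly scoring with an Isolation Forest. The sketch below uses synthetic transaction features (amount, hour of day, recent transaction velocity) purely for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic features: [amount, hour-of-day, transactions-in-last-hour]
normal = np.column_stack([
    rng.lognormal(3.0, 0.5, 1000),   # typical purchase amounts
    rng.integers(8, 22, 1000),       # daytime activity
    rng.poisson(1, 1000),            # low transaction velocity
])
fraud = np.column_stack([
    rng.lognormal(6.0, 0.3, 20),     # unusually large amounts
    rng.integers(0, 5, 20),          # night-time activity
    rng.poisson(8, 20),              # bursts of transactions
])
X = np.vstack([normal, fraud])

# contamination = expected anomaly share; predict() returns -1 for anomalies
model = IsolationForest(contamination=0.02, random_state=0).fit(X)
scores = model.predict(X)
flagged_fraud = (scores[-20:] == -1).mean()
print(f"share of fraud rows flagged: {flagged_fraud:.0%}")
```

For real-time scoring, the same fitted model would sit behind a low-latency service and be periodically refit as drift monitors detect distribution shift.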
Develop causal inference models to analyze treatment effects and predict patient outcomes, using advanced statistical methods and feature engineering on electronic health records. Ensure data privacy with PII handling and deploy models in a secure, compliant production environment.
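As a sketch of why causal methods matter here, the example below estimates a treatment effect with inverse-propensity weighting (IPW) on synthetic data where treatment assignment is confounded by age and severity. All variables and coefficients are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
age = rng.normal(50, 10, n)        # confounder
severity = rng.normal(0, 1, n)     # confounder
# Sicker/older patients are more likely to be treated (non-random assignment)
p_treat = 1 / (1 + np.exp(-(0.03 * (age - 50) + 0.8 * severity)))
treated = rng.random(n) < p_treat
# Outcome: true treatment effect is +2.0, plus confounder effects and noise
outcome = 2.0 * treated + 0.1 * age + 1.5 * severity + rng.normal(0, 1, n)

# Naive comparison is biased because treated patients differ at baseline
naive = outcome[treated].mean() - outcome[~treated].mean()

# IPW: reweight each group by the inverse of its propensity score
X = np.column_stack([age, severity])
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
w = np.where(treated, 1 / ps, 1 / (1 - ps))
ate = (np.average(outcome[treated], weights=w[treated])
       - np.average(outcome[~treated], weights=w[~treated]))
print(f"naive diff: {naive:.2f}, IPW estimate: {ate:.2f} (true effect: 2.0)")
```

On real EHR data the confounders are not fully observed, so methods like sensitivity analysis or instrumental variables would supplement this, and all processing would run under PII-safe, compliant infrastructure.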
Create time series models to forecast product demand, integrating data from multiple sources and optimizing for scalability with tools like Airflow and Kafka. Use model evaluation suites to refine predictions and support inventory management decisions.
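A minimal version of such a demand forecast can be built from lag features and a linear model; the daily series below is synthetic (trend plus weekly seasonality) for illustration only:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
# Synthetic daily demand: upward trend + weekly seasonality + noise
t = np.arange(365)
demand = 100 + 0.1 * t + 15 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 5, 365)
df = pd.DataFrame({"demand": demand},
                  index=pd.date_range("2024-01-01", periods=365, freq="D"))

# Lag features: yesterday and the same weekday last week
df["lag_1"] = df["demand"].shift(1)
df["lag_7"] = df["demand"].shift(7)
df = df.dropna()

# Hold out the last 30 days for one-step-ahead evaluation
train, test = df.iloc[:-30], df.iloc[-30:]
model = LinearRegression().fit(train[["lag_1", "lag_7"]], train["demand"])
pred = model.predict(test[["lag_1", "lag_7"]])
mae = np.mean(np.abs(pred - test["demand"]))
print(f"30-day holdout MAE: {mae:.1f}")
```

In a production pipeline the same feature/train/evaluate steps would be orchestrated as Airflow tasks, with Kafka feeding fresh sales events into the feature store.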
Design and deploy machine learning models for personalized content recommendations, utilizing LLM frameworks like LangChain for enhanced user engagement. Implement A/B testing infrastructure to experiment with algorithms and monitor performance targets for latency and throughput.
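Stripped to its core, a content recommender ranks unseen items by similarity to a user's history. The sketch below uses tiny hand-made embeddings; in practice these vectors would come from an embedding model or matrix factorization:

```python
import numpy as np

# Toy item embeddings (hypothetical; real ones come from a trained model)
items = {
    "python_tutorial": np.array([0.9, 0.1, 0.0]),
    "pandas_guide":    np.array([0.8, 0.2, 0.1]),
    "cooking_basics":  np.array([0.0, 0.9, 0.1]),
    "travel_vlog":     np.array([0.1, 0.1, 0.9]),
}

def recommend(history, items, k=2):
    """Rank unseen items by cosine similarity to the mean watched-item vector."""
    profile = np.mean([items[name] for name in history], axis=0)

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    scores = {name: cosine(profile, vec) for name, vec in items.items()
              if name not in history}
    return sorted(scores, key=scores.get, reverse=True)[:k]

rec = recommend(["python_tutorial"], items)
print(rec)
```

The A/B testing infrastructure mentioned above would then compare this baseline against richer candidates (e.g. LLM-reranked lists) on engagement, latency, and throughput targets.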
Offer data science platforms or tools as a service, with tiered pricing based on usage or features, leveraging cloud deployment and monitoring for high availability. Revenue is generated through recurring subscriptions from businesses needing advanced analytics capabilities.
Provide expert consulting for enterprises to design experiments, build predictive models, and implement MLOps practices, with project-based or retainer fees. Revenue comes from tailored solutions that drive data-driven decisions and optimize business processes.
Develop and license proprietary data products, such as pre-trained models or analytics dashboards, to clients in specific industries like finance or healthcare. Revenue is generated through one-time licenses or usage-based royalties, supported by scalable deployment and security compliance.
💬 **Integration Tip:** Set up automated pipelines with Airflow for experiment workflows, deploy models at scale with Docker and Kubernetes, and track performance and compliance with MLflow.