data-analysis
Turn raw data into decisions with statistical rigor, proper methodology, and awareness of analytical pitfalls.
Install via ClawdBot CLI:
clawdbot install ivangdavila/data-analysis
User asks about: analyzing data, finding patterns, understanding metrics, testing hypotheses, cohort analysis, A/B testing, churn analysis, statistical significance.
Analysis without a decision is just arithmetic. Always clarify: What would change if this analysis shows X vs Y?
Before touching data:
| Pitfall | What it looks like | How to avoid |
|---------|-------------------|--------------|
| Simpson's Paradox | Trend reverses when you segment | Always check by key dimensions |
| Survivorship bias | Only analyzing current users | Include churned/failed in dataset |
| Comparing unequal periods | Feb (28d) vs March (31d) | Normalize to per-day or same-length windows |
| p-hacking | Testing until something is "significant" | Pre-register hypotheses or adjust for multiple comparisons |
| Correlation in time series | Both went up = "related" | Check if controlling for time removes relationship |
| Aggregating percentages | Averaging percentages directly | Re-calculate from underlying totals |
For detailed examples of each pitfall, see pitfalls.md.
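As a rough illustration of two rows in the table (Simpson's Paradox and aggregating percentages), the sketch below uses invented counts in which variant B converts better inside every segment, yet variant A looks better once the segments are pooled. The data and the helper names `pooled_rate` and `mean_of_segment_rates` are hypothetical and exist only for this example.

```python
# Hypothetical (variant, segment) -> (conversions, visitors) counts, invented for illustration.
counts = {
    ("A", "mobile"): (10, 100),
    ("A", "desktop"): (270, 900),
    ("B", "mobile"): (135, 900),
    ("B", "desktop"): (35, 100),
}


def segment_rate(variant, segment):
    """Conversion rate inside a single segment."""
    conversions, visitors = counts[(variant, segment)]
    return conversions / visitors


def pooled_rate(variant):
    """Recompute the rate from summed totals rather than averaging percentages."""
    conversions = sum(c for (v, _), (c, _n) in counts.items() if v == variant)
    visitors = sum(n for (v, _), (_c, n) in counts.items() if v == variant)
    return conversions / visitors


def mean_of_segment_rates(variant):
    """The pitfall: an unweighted average of per-segment percentages."""
    rates = [c / n for (v, _), (c, n) in counts.items() if v == variant]
    return sum(rates) / len(rates)


for variant in ("A", "B"):
    print(
        variant,
        "mobile:", round(segment_rate(variant, "mobile"), 3),
        "desktop:", round(segment_rate(variant, "desktop"), 3),
        "pooled:", round(pooled_rate(variant), 3),
        "naive mean:", round(mean_of_segment_rates(variant), 3),
    )
```

In this made-up data the reversal comes from the traffic mix: variant A received mostly desktop visitors, who convert at a higher rate regardless of variant. The unweighted mean of segment rates also disagrees with the pooled rate, which is why percentages should be recomputed from the underlying totals.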
| Question type | Approach | Key output |
|---------------|----------|------------|
| "Is X different from Y?" | Hypothesis test | p-value + effect size + CI |
| "What predicts Z?" | Regression/correlation | Coefficients + R² + residual check |
| "How do users behave over time?" | Cohort analysis | Retention curves by cohort |
| "Are these groups different?" | Segmentation | Profiles + statistical comparison |
| "What's unusual?" | Anomaly detection | Flagged points + context |
For technique details and when to use each, see techniques.md.
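For the "What predicts Z?" row, a minimal sketch of the listed outputs (coefficients, R², and residuals to inspect) might look like the following. It fits a one-predictor least-squares line using only the standard library. The function name `fit_line` and the sample data are assumptions made for illustration. A real analysis would likely use a statistics package and several predictors.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residuals are what remains unexplained; plot or scan them for patterns.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared, residuals


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared, residuals = fit_line(xs, ys)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.4f}")
print("residuals:", [round(r, 3) for r in residuals])
```

A high R² alone does not validate the model. Residuals that trend or fan out with `x` suggest the linear form is missing something.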
Generated Mar 1, 2026
An online retailer wants to analyze A/B test results for a new checkout page design to determine if it significantly increases conversion rates. The analysis must account for seasonality, ensure statistical significance, and quantify the lift with confidence intervals to support a rollout decision.
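One way to produce the p-value, lift, and confidence interval for this scenario is a two-proportion z-test. The sketch below is a minimal version using only the standard library. The function name `two_proportion_test` and the visitor counts are hypothetical. It relies on a normal approximation, which assumes reasonably large samples, and it does not address seasonality, which would need to be handled in the test design (for example, by running both variants over the same calendar window).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the test statistic (null: equal rates).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    return {
        "rate_a": p_a,
        "rate_b": p_b,
        "absolute_lift": diff,
        "relative_lift": diff / p_a,
        "z": z,
        "p_value": p_value,
        "ci": ci,
    }


# Hypothetical counts, invented for illustration: control A vs. new checkout B.
result = two_proportion_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"absolute lift: {result['absolute_lift']:.4f}")
print(f"p-value: {result['p_value']:.4f}")
print(f"95% CI for the lift: ({result['ci'][0]:.4f}, {result['ci'][1]:.4f})")
```

Reporting the interval alongside the p-value shows how small the lift might plausibly be, which matters more for a rollout decision than significance alone.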
A software-as-a-service company seeks to understand factors driving customer churn by analyzing user behavior data over time. The analysis involves cohort analysis, regression modeling to identify predictors, and checking for pitfalls like survivorship bias to inform retention strategies.
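The cohort step of such an analysis can be sketched as follows. This assumes a hypothetical activity log of `(user_id, signup_month, active_month)` tuples in which every user has at least one event in their signup month. The function name `retention_by_cohort` and the data are invented for illustration. The denominator is everyone who signed up in the cohort, including users who later churned, which is what guards against survivorship bias.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, signup_month, active_month), months as integers.
events = [
    ("u1", 1, 1), ("u1", 1, 2), ("u1", 1, 3),
    ("u2", 1, 1), ("u2", 1, 2),
    ("u3", 1, 1),                # churned after the first month, still counted
    ("u4", 2, 2), ("u4", 2, 3),
    ("u5", 2, 2),
]


def retention_by_cohort(events):
    """Return {cohort: {months_since_signup: share of the cohort still active}}."""
    cohort_users = defaultdict(set)
    active = defaultdict(set)  # (cohort, months_since_signup) -> active users

    for user, signup_month, active_month in events:
        cohort_users[signup_month].add(user)
        active[(signup_month, active_month - signup_month)].add(user)

    table = {}
    for cohort, users in cohort_users.items():
        offsets = sorted(offset for (c, offset) in active if c == cohort)
        table[cohort] = {
            offset: len(active[(cohort, offset)]) / len(users) for offset in offsets
        }
    return table


retention = retention_by_cohort(events)
for cohort in sorted(retention):
    print(cohort, {k: round(v, 2) for k, v in retention[cohort].items()})
```

Each row of the result is one retention curve. Comparing curves across cohorts at the same offset avoids mixing users of different ages.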
A hospital aims to compare the effectiveness of two treatment protocols by analyzing patient outcome data. The analysis requires rigorous hypothesis testing, controlling for confounders like age and comorbidities, and quantifying effect sizes to guide clinical decisions.
A marketing team needs to evaluate the return on investment for multiple digital ad campaigns across different channels. The analysis involves segmentation, statistical comparisons of conversion metrics, and avoiding pitfalls like aggregating percentages incorrectly to allocate budget effectively.
A manufacturing plant wants to detect anomalies in production line data to reduce defects. The analysis uses anomaly detection techniques, checks for correlation in time series, and quantifies uncertainty to recommend process improvements and maintenance schedules.
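A simple starting point for flagging unusual readings is a robust z-score based on the median and the median absolute deviation (MAD), which is less distorted by the outliers it is trying to find than a mean and standard deviation would be. The sketch below is one such approach, not a prescribed method. The function name `flag_anomalies` and the sensor readings are hypothetical, and the 0.6745 scaling constant and 3.5 cutoff are a common convention that should be tuned to the process. It treats readings as independent, so trends or autocorrelation in a time series would need to be removed first.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical sensor readings, invented for illustration.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 14.5, 10.1]
flagged = flag_anomalies(readings)
print(flagged)
```

A flagged point is a prompt to look at context (shift change, maintenance event, sensor fault), not a conclusion in itself.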
Businesses that rely on recurring revenue from customers, such as SaaS or streaming services. Data analysis is crucial for tracking metrics like monthly recurring revenue, churn rates, and customer lifetime value to optimize pricing and retention strategies.
Companies that generate revenue per transaction, like e-commerce platforms or payment processors. Data analysis helps in optimizing conversion funnels, analyzing A/B tests for user interfaces, and understanding customer purchase patterns to increase average order value.
Businesses that monetize through ad impressions or clicks, such as social media or content websites. Data analysis is used to measure engagement metrics, conduct cohort analysis for user retention, and evaluate ad performance to maximize ad revenue and user growth.
💬 Integration Tip
Integrate this skill by first clarifying the decision goal with stakeholders, then applying the methodology checklist to ensure statistical rigor and avoid common pitfalls before proceeding with data analysis.