Data Pipeline & Analytics Skills: ETL, SQL Agents, Pandas, Data Engineering & More
Raw data is worthless without processing. Data pipeline skills handle the gap between "I have data" and "I have answers" — covering analysis, transformation, quality checking, and engineering workflows that production data teams run daily.
ClawHub hosts data pipeline and analytics skills across 5 workflow stages — from exploratory analysis to production ETL pipelines.
Note: Install and download figures in text descriptions reflect stats at the time of writing and may be outdated. All skill tables are live — they fetch current data from the ClawHub database on every page load. Treat table values as authoritative.
Data Overview
| Metric | Value |
|---|---|
| Auto-classified data pipeline skills | 132 |
| Top by downloads | data-analysis (8.4k downloads at the time of writing) |
| Top by installs | data-analysis (102 installs at the time of writing) |
| ETL-specific skills | 8 |
1. General Data Analysis Agents
data-analysis (8.4k downloads, 102 installs) is the workhorse: data analysis and visualization, database querying, and report generation in a single skill. With 102 installs, it is one of the more consistently used skills in this category. data-analyst (7.2k downloads, 34 installs) covers the same territory with a more explicit analyst persona, which is useful when you want Claude to reason about data rather than just compute it.
ai-data-analysis (1.1k downloads) adds automated CSV/Excel cleaning and statistical analysis pipelines. powerdrill-data-analysis-skill (1k downloads) integrates with the Powerdrill data platform for enterprise data catalog access.
2. Python & Pandas
Python-native data work benefits from skills tuned for DataFrame operations rather than natural language descriptions.
python-executor (4.8k downloads, 54 installs) is the broadest tool: executes arbitrary Python code in a sandboxed environment, making it useful for any data task that can be expressed in Python. pandas (648 downloads, 12 installs) focuses specifically on DataFrame patterns — efficient transformations, joins, groupby operations, and the common pitfalls (chained indexing, copy warnings, memory efficiency).
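The chained-indexing pitfall mentioned above can be illustrated with a minimal sketch. It assumes pandas is installed, and the `orders` frame and its column names are invented for illustration; it is not taken from any of the skills listed here.

```python
import warnings

import pandas as pd

# Hypothetical sample data, for illustration only.
orders = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "qty": [1, 5, 3, 2],
        "price": [10.0, 20.0, 30.0, 40.0],
    }
)

# Pitfall: chained indexing assigns into a temporary copy, so `orders`
# is left unchanged (pandas also emits a warning, silenced here).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    orders[orders["qty"] > 2]["price"] = 0.0
unchanged_total = orders["price"].sum()

# Fix: a single .loc call selects the rows and the column in one step.
orders.loc[orders["qty"] > 2, "price"] = 0.0

# A groupby aggregation on the corrected frame.
revenue_by_region = (
    orders.assign(revenue=orders["qty"] * orders["price"])
    .groupby("region")["revenue"]
    .sum()
)
```

The chained form fails silently, which is why it is a common source of wrong results; the `.loc` form makes the write explicit.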
pandas-skill (278 downloads) provides expert-level pandas guidance with emphasis on clean, idiomatic code. pandas-construction-analysis (955 downloads) specializes in construction-industry data formats. It is an example of a domain-specific pandas wrapper that can serve its target use case better than the general tool.
3. SQL & Database Agents
sql-query-generator (1k downloads, 4 installs) generates and optimizes SQL queries from natural language descriptions — covering SELECT, JOIN, GROUP BY, window functions, and query plan analysis. data-analysis-sql (45 downloads) targets enterprise-scale SQL engines: Hive, SparkSQL, Presto, ClickHouse — the distributed query systems that appear in large data engineering stacks.
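To make the natural-language-to-SQL idea concrete, here is a small sketch of the kind of query such a request might map to. The schema, the data, and the question are all hypothetical, and the query is run against an in-memory SQLite database from Python's standard library rather than any of the engines named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 2, 40.0);
    """
)

# Question: "What is the total order amount per customer, highest first?"
query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()
```

Running a generated query against a small fixture like this is one simple way to check it before pointing it at production data.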
4. ETL Pipelines & Data Engineering
ETL skills handle the plumbing: extracting data from sources, transforming it to target schemas, and loading it to destinations — plus the infrastructure to schedule, monitor, and version that process.
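The extract, transform, and load steps described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the CSV content, the `users` table, and the function names are hypothetical, and a production pipeline would add scheduling, monitoring, and error handling on top.

```python
import csv
import io
import sqlite3

# Hypothetical source data; row 3 is missing its signup date.
RAW_CSV = """user_id,signup_date,plan
1,2026-01-05,Pro
2,2026-01-09,free
3,,PRO
"""

def extract(text):
    """Read rows from a CSV source (an in-memory string here)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalize plan names and drop rows missing a signup date."""
    return [
        (int(r["user_id"]), r["signup_date"], r["plan"].strip().lower())
        for r in rows
        if r["signup_date"]
    ]

def load(records, conn):
    """Write transformed records to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (user_id INTEGER, signup_date TEXT, plan TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT user_id, plan FROM users ORDER BY user_id").fetchall()
```

Keeping the three stages as separate functions makes each one testable on its own and easy to swap, for example replacing the in-memory CSV with a file or API source.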
senior-data-engineer (1.6k downloads, 5 installs) is the most comprehensive: scalable data pipeline design, Spark optimization, dbt modeling, and cloud data warehouse integration. afrexai-data-engineering (477 downloads) covers architecture-level decisions — choosing between streaming and batch, designing data lake schemas, and managing backfill strategies.
data-pipeline-toolkit (222 downloads, 8 installs) takes a more operational focus: create, schedule, and monitor ETL pipelines with source-to-destination configuration. dbt-cloud (96 downloads) integrates directly with dbt Cloud's API for project and model management.
spark-engineer (660 downloads) specializes in Apache Spark — distributed data processing, performance tuning, and cluster configuration. For organizations running Spark workloads, the Spark-specific guidance here is more targeted than what general data engineering skills provide.
5. Data Quality & Validation
Data quality is the problem that breaks pipelines silently. These skills build validation checks into the data workflow.
data-quality-check (945 downloads, 4 installs) assesses data completeness, accuracy, and consistency — particularly strong on construction-industry data formats but applicable generally. data-validation (725 downloads, 3 installs) validates data against schemas across languages and formats: JSON Schema, Pydantic models, SQL constraints, and custom rules.
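As a rough illustration of schema checks combined with custom rules, the sketch below validates records in plain Python. The `SCHEMA` layout, field names, and rules are hypothetical; real projects would more likely use JSON Schema or Pydantic, as mentioned above.

```python
from datetime import date

# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "order_id": (int, True),
    "amount": (float, True),
    "shipped_on": (str, False),
}

def validate(record):
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    for field, (expected_type, required) in SCHEMA.items():
        value = record.get(field)
        if value is None:
            if required:
                problems.append(f"{field}: missing")
            continue
        if not isinstance(value, expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    # Custom rule: amounts must not be negative.
    amount = record.get("amount")
    if isinstance(amount, float) and amount < 0:
        problems.append("amount: must be >= 0")
    # Custom rule: dates must parse as ISO 8601 (YYYY-MM-DD).
    shipped = record.get("shipped_on")
    if isinstance(shipped, str):
        try:
            date.fromisoformat(shipped)
        except ValueError:
            problems.append("shipped_on: not an ISO date")
    return problems
```

Returning a list of problems instead of raising on the first one lets a pipeline log every issue in a bad record and decide whether to reject or quarantine it.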
How to Choose
| I need... | Recommended skill |
|---|---|
| General data analysis and visualization | data-analysis |
| Arbitrary Python code for data manipulation | python-executor |
| Efficient Pandas operations | pandas |
| SQL query from natural language | sql-query-generator |
| Enterprise SQL (Hive/Spark/ClickHouse) | data-analysis-sql |
| Design production ETL pipelines | senior-data-engineer |
| dbt Cloud project management | dbt-cloud |
| Apache Spark optimization | spark-engineer |
| Validate data against schemas | data-validation |
Last Words
data-analysis and data-analyst together cover most casual use cases, and the 102-install count for data-analysis is the signal. For ad-hoc analysis, EDA, and one-off reports, these two skills handle the 80% case. The more specialized skills (pandas, SQL, ETL) are for production workflows that need tighter control.
ETL skills can show low install counts relative to downloads: senior-data-engineer has 5 installs against 1.6k downloads, roughly 0.3%. This pattern makes sense: ETL pipelines are typically set up once and run automatically, not invoked interactively in every session. The skill is used to design and configure the pipeline, then put away.
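The install-to-download comparison can be worked through from the figures quoted in this article. The sketch below uses only skills for which both numbers are given; the "k" download figures are rounded, so the rates are approximate.

```python
# (downloads, installs) as quoted in the text; "k" figures are rounded.
stats = {
    "data-analysis": (8400, 102),
    "data-analyst": (7200, 34),
    "python-executor": (4800, 54),
    "senior-data-engineer": (1600, 5),
    "data-pipeline-toolkit": (222, 8),
}

# Installs divided by downloads for each skill.
install_rate = {
    name: installs / downloads for name, (downloads, installs) in stats.items()
}

# Print the skills from lowest to highest install rate.
for name, rate in sorted(install_rate.items(), key=lambda kv: kv[1]):
    print(f"{name:<24}{rate:.2%}")
```

On these numbers senior-data-engineer has the lowest rate, while data-pipeline-toolkit has the highest, so the low-install pattern does not hold for every ETL skill.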
Data quality is systematically underinvested. Only 2 dedicated skills explicitly tackle data validation, even though bad input data is among the most common causes of pipeline failures. Most skills assume input data is correct. In production, it never is.
python-executor is the Swiss army knife this category needs. 4.8k downloads for a general-purpose code executor suggests that data teams are often writing their own analysis logic rather than relying on pre-built skills. The tool is the skill.
Data source: ClawHub platform download and install stats as of April 16, 2026. Browse more skills at clawhub-skills.com.