# pandas-construction-analysis

Comprehensive Pandas toolkit for construction data analysis: filter, group, and aggregate BIM elements, calculate quantities, merge datasets, and generate reports.
Install via ClawdBot CLI:

```shell
clawdbot install datadrivenconstruction/pandas-construction-analysis
```

Based on the DDC methodology (Chapter 2.3), this skill provides comprehensive Pandas operations for construction data processing. Pandas is the Swiss Army knife of data analysis, handling everything from simple data filtering to complex aggregations across millions of rows.
Book Reference: "Pandas DataFrame and LLM ChatGPT"
"Using Pandas, you can manage and analyze datasets far beyond Excel's capabilities. While Excel can handle up to 1 million rows of data, Pandas can effortlessly work with datasets containing tens of millions of rows."
— DDC Book, Chapter 2.3
```python
import pandas as pd
df = pd.read_excel("bim_export.xlsx")
print(df.head()) # First 5 rows
print(df.info()) # Column types and memory
print(df.describe()) # Statistics for numeric columns
structural = df[df['Category'] == 'Structural']
total_volume = df['Volume'].sum()
print(f"Total volume: {total_volume:.2f} m³")
```
```python
import pandas as pd
elements = pd.DataFrame({
    'ElementId': ['E001', 'E002', 'E003', 'E004'],
    'Category': ['Wall', 'Floor', 'Wall', 'Column'],
    'Material': ['Concrete', 'Concrete', 'Brick', 'Steel'],
    'Volume_m3': [45.5, 120.0, 32.0, 8.5],
    'Level': ['Level 1', 'Level 1', 'Level 2', 'Level 1']
})
df_csv = pd.read_csv("construction_data.csv")
df_excel = pd.read_excel("project_data.xlsx", sheet_name="Elements")
all_sheets = pd.read_excel("project.xlsx", sheet_name=None) # Dict of DataFrames
```
```python
df = pd.DataFrame({
    'element_id': pd.Series(['W001', 'W002'], dtype='string'),
    'quantity': pd.Series([10, 20], dtype='int64'),
    'volume': pd.Series([45.5, 32.0], dtype='float64'),
    'is_structural': pd.Series([True, False], dtype='bool'),
    'created_date': pd.to_datetime(['2024-01-15', '2024-01-16']),
    'category': pd.Categorical(['Wall', 'Slab'])
})
print(df.dtypes)
df['quantity'] = df['quantity'].astype('float64')
df['volume'] = pd.to_numeric(df['volume'], errors='coerce')
```
```python
walls = df[df['Category'] == 'Wall']
large_concrete = df[(df['Material'] == 'Concrete') & (df['Volume_m3'] > 50)]
walls_or_floors = df[(df['Category'] == 'Wall') | (df['Category'] == 'Floor')]
structural = df[df['Category'].isin(['Wall', 'Column', 'Beam', 'Foundation'])]
insulated = df[df['Description'].str.contains('insulated', case=False, na=False)]
incomplete = df[df['Cost'].isna()]
complete = df[df['Cost'].notna()]
```
```python
volumes = df[['ElementId', 'Category', 'Volume_m3']]
result = df.query("Category == 'Wall' and Volume_m3 > 30")
specific_row = df.loc[0] # By label
range_rows = df.iloc[0:10] # By position
specific_cell = df.loc[0, 'Volume_m3'] # Row and column
subset = df.loc[0:5, ['Category', 'Volume_m3']] # Range with columns
```
```python
by_category = df.groupby('Category')['Volume_m3'].sum()
summary = df.groupby('Category').agg({
    'Volume_m3': ['sum', 'mean', 'count'],
    'Cost': ['sum', 'mean']
})
summary = df.groupby('Category').agg(
    total_volume=('Volume_m3', 'sum'),
    avg_volume=('Volume_m3', 'mean'),
    element_count=('ElementId', 'count'),
    total_cost=('Cost', 'sum')
).reset_index()
by_level_cat = df.groupby(['Level', 'Category']).agg({
    'Volume_m3': 'sum',
    'Cost': 'sum'
}).reset_index()
```
```python
pivot = pd.pivot_table(
    df,
    values='Volume_m3',
    index='Level',
    columns='Category',
    aggfunc='sum',
    fill_value=0,
    margins=True,  # Add totals
    margins_name='Total'
)
pivot_detailed = pd.pivot_table(
    df,
    values=['Volume_m3', 'Cost'],
    index='Level',
    columns='Category',
    aggfunc={'Volume_m3': 'sum', 'Cost': 'mean'}
)
```
```python
df['Cost_Total'] = df['Volume_m3'] * df['Unit_Price']
df['Size_Category'] = df['Volume_m3'].apply(
    lambda x: 'Large' if x > 50 else ('Medium' if x > 20 else 'Small')
)
import numpy as np
df['Is_Large'] = np.where(df['Volume_m3'] > 50, True, False)
df['Volume_Bin'] = pd.cut(
    df['Volume_m3'],
    bins=[0, 10, 50, 100, float('inf')],
    labels=['XS', 'S', 'M', 'L']
)
```
```python
df['Level_Number'] = df['Level'].str.extract(r'(\d+)').astype(int)
df[['Building', 'Floor']] = df['Location'].str.split('-', expand=True)
df['Category'] = df['Category'].str.strip().str.lower().str.title()
df['Material'] = df['Material'].str.replace('Reinforced Concrete', 'RC')
```
```python
df['Start_Date'] = pd.to_datetime(df['Start_Date'])
df['Year'] = df['Start_Date'].dt.year
df['Month'] = df['Start_Date'].dt.month
df['Week'] = df['Start_Date'].dt.isocalendar().week
df['DayOfWeek'] = df['Start_Date'].dt.day_name()
df['Duration_Days'] = (df['End_Date'] - df['Start_Date']).dt.days
recent = df[df['Start_Date'] >= '2024-01-01']
```
```python
elements = pd.DataFrame({
    'ElementId': ['E001', 'E002', 'E003'],
    'Category': ['Wall', 'Floor', 'Column'],
    'Volume_m3': [45.5, 120.0, 8.5]
})
prices = pd.DataFrame({
    'Category': ['Wall', 'Floor', 'Column', 'Beam'],
    'Unit_Price': [150, 80, 450, 200]
})
merged = elements.merge(prices, on='Category', how='inner')
merged = elements.merge(prices, on='Category', how='left')
result = df1.merge(df2, left_on='elem_id', right_on='ElementId')  # keys named differently on each side
```
```python
all_floors = pd.concat([floor1_df, floor2_df, floor3_df], ignore_index=True)
combined = pd.concat([quantities, costs, schedule], axis=1)
new_elements = pd.DataFrame({'ElementId': ['E004'], 'Category': ['Beam']})
df = pd.concat([df, new_elements], ignore_index=True)
```
```python
def generate_qto_report(df):
    """Generate a Quantity Take-Off summary by category and material."""
    qto = df.groupby(['Category', 'Material']).agg(
        count=('ElementId', 'count'),
        total_volume=('Volume_m3', 'sum'),
        total_area=('Area_m2', 'sum'),
        avg_volume=('Volume_m3', 'mean')
    ).round(2)
    # Add percentage column
    qto['volume_pct'] = (qto['total_volume'] /
                         qto['total_volume'].sum() * 100).round(1)
    return qto.sort_values('total_volume', ascending=False)

qto_report = generate_qto_report(df)
qto_report.to_excel("qto_report.xlsx")
```
```python
def calculate_project_cost(elements_df, prices_df, markup=0.15):
    """Calculate total project cost with markup."""
    # Merge with prices
    df = elements_df.merge(prices_df, on='Category', how='left')
    # Calculate base cost
    df['Base_Cost'] = df['Volume_m3'] * df['Unit_Price']
    # Apply markup
    df['Total_Cost'] = df['Base_Cost'] * (1 + markup)
    # Summary by category
    summary = df.groupby('Category').agg(
        volume=('Volume_m3', 'sum'),
        base_cost=('Base_Cost', 'sum'),
        total_cost=('Total_Cost', 'sum')
    ).round(2)
    return df, summary, summary['total_cost'].sum()

detailed, summary, total = calculate_project_cost(elements, prices)
print(f"Project Total: ${total:,.2f}")
```
```python
def material_summary(df):
    """Summarize materials across the project."""
    summary = df.groupby('Material').agg({
        'Volume_m3': 'sum',
        'Weight_kg': 'sum',
        'ElementId': 'nunique'
    }).rename(columns={'ElementId': 'Element_Count'})
    summary['Volume_Pct'] = (summary['Volume_m3'] /
                             summary['Volume_m3'].sum() * 100).round(1)
    return summary.sort_values('Volume_m3', ascending=False)
```
```python
def analyze_by_level(df):
    """Analyze construction quantities by building level."""
    level_summary = df.pivot_table(
        values=['Volume_m3', 'Cost'],
        index='Level',
        columns='Category',
        aggfunc='sum',
        fill_value=0
    )
    # Row totals across the MultiIndex column groups
    level_summary['Total_Volume'] = level_summary['Volume_m3'].sum(axis=1)
    level_summary['Total_Cost'] = level_summary['Cost'].sum(axis=1)
    return level_summary
```
```python
def export_to_excel_formatted(df, summary, filepath):
    """Export with multiple sheets."""
    with pd.ExcelWriter(filepath, engine='openpyxl') as writer:
        df.to_excel(writer, sheet_name='Details', index=False)
        summary.to_excel(writer, sheet_name='Summary')
        pivot = pd.pivot_table(df, values='Volume_m3',
                               index='Level', columns='Category')
        pivot.to_excel(writer, sheet_name='By_Level')

export_to_excel_formatted(elements, qto_summary, "project_report.xlsx")
```
```python
df.to_csv("output.csv", index=False)
df.to_csv("output.csv", index=False, encoding='utf-8-sig')
df[['ElementId', 'Category', 'Volume_m3']].to_csv("volumes.csv", index=False)
```
```python
df['Category'] = df['Category'].astype('category')
df = pd.read_csv("large_file.csv", usecols=['ElementId', 'Category', 'Volume'])
chunks = pd.read_csv("huge_file.csv", chunksize=100000)
result = pd.concat([chunk[chunk['Category'] == 'Wall'] for chunk in chunks])
print(df.memory_usage(deep=True).sum() / 1024**2, "MB")
```
| Operation | Code |
|-----------|------|
| Read Excel | `pd.read_excel("file.xlsx")` |
| Read CSV | `pd.read_csv("file.csv")` |
| Filter rows | `df[df['Column'] == 'Value']` |
| Select columns | `df[['Col1', 'Col2']]` |
| Group and sum | `df.groupby('Cat')['Vol'].sum()` |
| Pivot table | `pd.pivot_table(df, values='Vol', index='Level')` |
| Merge | `df1.merge(df2, on='key')` |
| Add column | `df['New'] = df['A'] * df['B']` |
| Export Excel | `df.to_excel("out.xlsx", index=False)` |
Related skills:
- llm-data-automation: generating Pandas code with AI
- qto-report: specialized QTO calculations
- cost-estimation-resource: detailed cost calculations

Generated Mar 1, 2026
Analyze BIM export data to calculate material quantities like concrete volume and steel weight per element category. Use filtering and grouping to generate reports for procurement and cost estimation, ensuring accurate project budgeting.
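A minimal sketch of that workflow, using an in-memory stand-in for the BIM export (column names like `Volume_m3` and `Weight_kg` are assumptions, not a fixed schema):

```python
import pandas as pd

# Hypothetical BIM export -- columns and values are illustrative
bim = pd.DataFrame({
    'Category': ['Wall', 'Wall', 'Floor', 'Column'],
    'Material': ['Concrete', 'Concrete', 'Concrete', 'Steel'],
    'Volume_m3': [45.5, 32.0, 120.0, 8.5],
    'Weight_kg': [0, 0, 0, 6650],
})

# Concrete volume per category, for procurement
concrete_volume = (bim[bim['Material'] == 'Concrete']
                   .groupby('Category')['Volume_m3'].sum())

# Total steel weight across the model
steel_weight = bim.loc[bim['Material'] == 'Steel', 'Weight_kg'].sum()
print(concrete_volume)
print(f"Steel: {steel_weight} kg")
```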
Process daily construction logs to track completed elements by level and category. Aggregate data to monitor progress percentages and identify delays, supporting project management decisions and stakeholder reporting.
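One way this could look, with a made-up daily log and a notional 80% progress threshold (both are assumptions for illustration):

```python
import pandas as pd

# Hypothetical daily log: planned vs completed element counts
log = pd.DataFrame({
    'Level': ['L1', 'L1', 'L2', 'L2'],
    'Category': ['Wall', 'Floor', 'Wall', 'Floor'],
    'Planned': [40, 10, 40, 10],
    'Completed': [40, 10, 20, 0],
})

progress = log.groupby('Level')[['Planned', 'Completed']].sum()
progress['Pct_Complete'] = (progress['Completed'] / progress['Planned'] * 100).round(1)

# Flag levels behind a notional 80% threshold
delayed = progress[progress['Pct_Complete'] < 80]
print(progress)
```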
Merge material price datasets with element quantities to calculate total costs per category. Use pivot tables to compare costs across project phases, optimizing budget allocation and identifying cost-saving opportunities.
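A hedged sketch of that pattern, assuming quantities carry a `Phase` column and prices are keyed by `Category` (both column names are assumptions):

```python
import pandas as pd

# Hypothetical quantities tagged with project phase, plus unit prices by category
qty = pd.DataFrame({
    'Phase': ['Foundations', 'Foundations', 'Superstructure'],
    'Category': ['Wall', 'Column', 'Wall'],
    'Volume_m3': [45.5, 8.5, 32.0],
})
prices = pd.DataFrame({'Category': ['Wall', 'Column'], 'Unit_Price': [150, 450]})

costed = qty.merge(prices, on='Category', how='left')
costed['Cost'] = costed['Volume_m3'] * costed['Unit_Price']

# Compare costs across phases and categories
by_phase = pd.pivot_table(costed, values='Cost', index='Phase',
                          columns='Category', aggfunc='sum', fill_value=0)
print(by_phase)
```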
Filter inspection data to identify elements with missing or non-compliant attributes. Group results by subcontractor or location to prioritize rework and improve construction quality assurance processes.
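An illustrative sketch; `FireRating` below is just a placeholder for whichever attribute the inspection requires:

```python
import pandas as pd

# Hypothetical inspection records with a required attribute that may be missing
inspections = pd.DataFrame({
    'ElementId': ['E001', 'E002', 'E003', 'E004'],
    'Subcontractor': ['Acme', 'Acme', 'BuildCo', 'BuildCo'],
    'FireRating': ['REI60', None, None, None],
})

# Elements missing the required attribute, grouped to prioritize rework
missing = inspections[inspections['FireRating'].isna()]
rework = missing.groupby('Subcontractor')['ElementId'].count()
print(rework.sort_values(ascending=False))
```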
Aggregate material data to calculate embodied carbon per building element. Use transformations to categorize elements by sustainability ratings, supporting green building certifications and environmental compliance.
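A sketch under the assumption that carbon factors are keyed by material; the factor values and rating bands below are purely illustrative, not real coefficients:

```python
import pandas as pd

# Illustrative embodied-carbon factors in kgCO2e per m3 (not real data)
factors = {'Concrete': 300.0, 'Steel': 12000.0, 'Timber': 100.0}

elements = pd.DataFrame({
    'ElementId': ['E001', 'E002', 'E003'],
    'Material': ['Concrete', 'Steel', 'Timber'],
    'Volume_m3': [100.0, 1.0, 20.0],
})

# Embodied carbon per element = factor for its material x volume
elements['Carbon_kgCO2e'] = elements['Material'].map(factors) * elements['Volume_m3']

# Bucket into notional sustainability bands
elements['Rating'] = pd.cut(elements['Carbon_kgCO2e'],
                            bins=[0, 5000, 20000, float('inf')],
                            labels=['Low', 'Medium', 'High'])
print(elements[['ElementId', 'Carbon_kgCO2e', 'Rating']])
```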
Offer a cloud-based tool integrating this skill for construction firms to analyze BIM data. Provide tiered subscriptions based on data volume and features, generating recurring revenue from monthly or annual fees.
Provide expert consulting to help construction companies implement data analysis workflows using this skill. Offer training workshops and custom report development, charging hourly or project-based fees.
License this skill as part of a larger construction management software suite. Integrate with existing BIM tools and charge licensing fees per user or per project, targeting enterprise clients.
💬 Integration Tip
Integrate with existing BIM software exports like Revit or Navisworks to automate data ingestion. Use Python scripts to schedule regular analysis and export results to Excel or PDF for easy sharing.
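As a sketch of that scheduled-analysis pattern (paths, column names, and the CSV layout are assumptions; a scheduler such as cron would call the function):

```python
import io
import pandas as pd

# Hypothetical scheduled job: ingest a Revit CSV export and publish a summary
def publish_daily_summary(export_path, output_path):
    df = pd.read_csv(export_path)
    summary = df.groupby('Category')['Volume_m3'].sum().reset_index()
    summary.to_csv(output_path, index=False)
    return summary

# Example run against an in-memory stand-in for the export file
export = io.StringIO("Category,Volume_m3\nWall,10\nWall,5\nFloor,3\n")
summary = publish_daily_summary(export, io.StringIO())
print(summary)
```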