vision-sandbox
Agentic Vision via Gemini's native Code Execution sandbox. Use for spatial grounding, visual math, and UI auditing.
Install via ClawdBot CLI:
clawdbot install johanesalxd/vision-sandbox
Leverage Gemini's native code execution to analyze images with high precision. The model writes and runs Python code in a Google-hosted sandbox to verify visual data, which makes it well suited to UI auditing, spatial grounding, and visual reasoning.
clawhub install vision-sandbox
uv run vision-sandbox --image "path/to/image.png" --prompt "Identify all buttons and provide [x, y] coordinates."
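For callers that script the CLI, the command above can be assembled and run from Python. This is a minimal sketch: the `--image` and `--prompt` flags are taken from the command shown above, while the helper names `build_command` and `run_vision_sandbox` are hypothetical. It also assumes that `uv` is on the PATH and that the tool prints its answer to standard output, which the source does not confirm.

```python
import subprocess


def build_command(image_path: str, prompt: str) -> list[str]:
    """Assemble the argv list for the CLI call shown above."""
    return ["uv", "run", "vision-sandbox", "--image", image_path, "--prompt", prompt]


def run_vision_sandbox(image_path: str, prompt: str) -> str:
    """Run the CLI and return its stdout (assumed to carry the model's answer)."""
    result = subprocess.run(
        build_command(image_path, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Example call (requires uv and the skill to be installed):
# print(run_vision_sandbox("path/to/image.png", "Identify all buttons and provide [x, y] coordinates."))
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains brackets or quotes.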
Ask the model to find specific items and return coordinates.
Ask the model to count or calculate based on the image.
Check layout and readability.
Solve visual counting tasks with code verification.
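The source does not show the code the model writes inside the sandbox. Purely as an illustration of what counting with code verification can mean, the sketch below counts connected blobs in a small binary mask with a flood fill, so the count comes from a computation rather than an estimate. The function name `count_blobs` and the mask format (rows of 0/1 values) are assumptions made for this example.

```python
from collections import deque


def count_blobs(mask: list[list[int]]) -> int:
    """Count 4-connected regions of 1s in a binary mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Found an unvisited foreground pixel: start a new region.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    in_bounds = 0 <= ny < rows and 0 <= nx < cols
                    if in_bounds and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Hypothetical mask with three separate objects.
example_mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
print(count_blobs(example_mask))  # prints 3
```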
This skill is designed to provide Visual Grounding for automated coding agents like OpenCode.
Use vision-sandbox to extract UI metadata (coordinates, sizes, colors).
gemini-3-flash-preview.
Generated Mar 1, 2026
Automate visual verification of product pages to ensure buttons, images, and text are correctly positioned and functional. This reduces manual QA effort and improves user experience by detecting layout issues early in development cycles.
Analyze diagrams or charts in textbooks or online courses to extract data points, count elements, or verify spatial relationships. This aids in creating interactive learning materials and automating assessment of visual assignments.
Process medical forms or charts to locate and extract specific fields, such as patient data or diagnostic markers, using spatial grounding. This enhances accuracy in digitizing records and supports compliance with data handling standards.
Inspect assembly line images to count components, verify placements, or detect defects by analyzing visual patterns. This streamlines production monitoring and reduces errors through automated visual audits.
Offer the skill as a cloud-based service with tiered pricing based on usage volume, such as number of images processed per month. This provides recurring revenue and scales easily with client demand for automated visual analysis.
Provide custom integration services to embed the skill into existing workflows, such as combining with OpenCode for UI development. Revenue comes from project-based fees and ongoing support contracts for tailored solutions.
Release a free basic version with limited features to attract individual developers, then upsell to a premium version with advanced capabilities like batch processing or API access. This builds a user base and drives conversions.
💬 Integration Tip
Integrate with OpenCode by passing JSON outputs from visual analysis directly into coding workflows to automate UI adjustments based on detected coordinates and elements.
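A minimal sketch of that hand-off, assuming the analysis returns JSON shaped like the sample below. The schema (`elements`, `label`, `x`, `y`) and the helper `parse_elements` are hypothetical; the source does not specify the output format, so adjust the keys to whatever the skill actually emits.

```python
import json
from dataclasses import dataclass


@dataclass
class UIElement:
    label: str
    x: int
    y: int


def parse_elements(raw_json: str) -> list[UIElement]:
    """Turn the (assumed) JSON output into typed records a coding agent can use."""
    data = json.loads(raw_json)
    return [
        UIElement(label=item["label"], x=int(item["x"]), y=int(item["y"]))
        for item in data.get("elements", [])
    ]


# Hypothetical output; not taken from a real run.
sample_output = (
    '{"elements": ['
    '{"label": "Submit", "x": 120, "y": 340}, '
    '{"label": "Cancel", "x": 260, "y": 340}]}'
)
elements = parse_elements(sample_output)
for element in elements:
    print(f"{element.label}: ({element.x}, {element.y})")
```

Parsing into a small dataclass keeps malformed output from flowing silently into later steps: a missing key raises `KeyError` at the boundary.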
Generate/edit images with Nano Banana Pro (Gemini 3 Pro Image). Use for image create/modify requests incl. edits. Supports text-to-image + image-to-image; 1K/2K/4K; use --input-image.
Capture frames or clips from RTSP/ONVIF cameras.
Batch-generate images via OpenAI Images API. Random prompt sampler + `index.html` gallery.
Generate images using the internal Google Antigravity API (Gemini 3 Pro Image). High quality, native generation without browser automation.
Generate images using the built-in image_generate.py script; prepare a clear, specific `prompt`.
AI image generation powered by CellCog. Create images, edit photos, consistent characters, product photography, reference-based images, sets of images, style transfer. Professional image creation with AI.