vlmrun-cli-skill
Use the VLM Run CLI (`vlmrun`) to interact with the Orion visual AI agent. Process images, videos, and documents with natural language. Triggers: image understanding/generation, object detection, OCR, video summarization, document extraction, visual AI chat, 'generate an image/video', 'analyze this image/video', 'extract text from', 'summarize this video', 'process this PDF'.
Install via ClawdBot CLI:
clawdbot install spillai/vlmrun-cli-skill
Chat with VLM Run's Orion visual AI agent via CLI.
uv venv && source .venv/bin/activate
uv pip install "vlmrun[cli]"
Load the following variables into your environment so that the CLI can use them. You can source them from a .env file.
| Variable | Type | Description |
|----------|------|-------------|
| VLMRUN_API_KEY | Required | Your VLM Run API key |
| VLMRUN_BASE_URL | Optional | Base URL (default: https://agent.vlm.run/v1) |
| VLMRUN_CACHE_DIR | Optional | Cache directory (default: ~/.vlmrun/cache/artifacts/) |
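The environment setup above could be wrapped in a small helper. This is a minimal sketch, assuming the .env file contains plain `KEY=value` lines in shell syntax; `load_vlmrun_env` is a hypothetical helper name and is not part of the `vlmrun` CLI.

```shell
# load_vlmrun_env [ENV_FILE]
# Hypothetical helper: sources ENV_FILE (default ./.env) if it exists,
# then fails unless VLMRUN_API_KEY ends up set.
load_vlmrun_env() {
    env_file="${1:-./.env}"
    # The "." builtin searches PATH for names without a slash, so force a path.
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac
    if [ -f "$env_file" ]; then
        set -a            # export every variable assigned while sourcing
        . "$env_file"
        set +a
    fi
    if [ -z "${VLMRUN_API_KEY:-}" ]; then
        echo "VLMRUN_API_KEY is not set" >&2
        return 1
    fi
}
```

Calling `load_vlmrun_env && vlmrun chat "Describe this image" -i photo.jpg` would stop early with a clear message when the key is missing, rather than relying on the CLI to report it.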
vlmrun chat "<prompt>" -i input.jpg [options]
| Flag | Description |
|------|-------------|
| -p, --prompt | Prompt text, file path, or stdin |
| -i, --input | Input file(s) - images, videos, docs (repeatable) |
| -o, --output | Artifact directory (default: ~/.vlmrun/cache/artifacts/) |
| -m, --model | vlmrun-orion-1:fast, vlmrun-orion-1:auto (default), vlmrun-orion-1:pro |
| -s, --session | Optional session ID to continue a previous session |
| -j, --json | Raw JSON output |
| -ns, --no-stream | Disable streaming |
| -nd, --no-download | Skip artifact download |
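Because `-i` is repeatable, a wrapper can turn a list of files into one `-i` flag per file. The function below is a sketch using only the flags listed in the table; `chat_with_inputs` is a hypothetical name introduced for illustration.

```shell
# chat_with_inputs PROMPT FILE...
# Hypothetical wrapper: passes the prompt, then one -i flag per input file.
chat_with_inputs() {
    prompt="$1"
    shift
    n=$#
    # Rotate through the positional parameters, re-appending each file as "-i FILE".
    while [ "$n" -gt 0 ]; do
        file="$1"
        shift
        set -- "$@" -i "$file"
        n=$((n - 1))
    done
    vlmrun chat "$prompt" "$@"
}
```

For example, `chat_with_inputs "Compare these two images and describe the differences" before.jpg after.jpg` expands to the same command line as passing `-i before.jpg -i after.jpg` by hand.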
vlmrun chat "Describe what you see in this image in detail" -i photo.jpg
vlmrun chat "Detect and list all objects visible in this scene" -i scene.jpg
vlmrun chat "Extract all text and numbers from this document image" -i document.png
vlmrun chat "Compare these two images and describe the differences" -i before.jpg -i after.jpg
vlmrun chat "Generate a photorealistic image of a cozy cabin in a snowy forest at sunset" -o ./generated
vlmrun chat "Remove the background from this product image and make it transparent" -i product.jpg -o ./output
vlmrun chat "Summarize the key points discussed in this meeting video" -i meeting.mp4
vlmrun chat "Find the top 3 highlight moments and create short clips from them" -i sports.mp4
vlmrun chat "Transcribe this lecture with timestamps for each section" -i lecture.mp4 --json
vlmrun chat "Generate a 5-second video of ocean waves crashing on a rocky beach at golden hour" -o ./videos
vlmrun chat "Create a smooth slow-motion video from this image" -i ocean.jpg -o ./output
vlmrun chat "Extract the vendor name, line items, and total amount" -i invoice.pdf --json
vlmrun chat "Summarize the key terms and obligations in this contract" -i contract.pdf
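For extraction tasks run with `--json`, it can help to capture the result in a file for later processing. The sketch below assumes the JSON is written to standard output, which the flag table does not state explicitly; `extract_to_json` is a hypothetical helper, and the shape of the JSON is not assumed.

```shell
# extract_to_json PROMPT INPUT OUT_FILE
# Hypothetical helper: runs a --json extraction and saves stdout to OUT_FILE.
extract_to_json() {
    prompt="$1"
    input="$2"
    out_file="$3"
    # Write to a temp file first so a failed run does not leave a partial result.
    tmp_file="${out_file}.tmp"
    if vlmrun chat "$prompt" -i "$input" --json --no-download > "$tmp_file"; then
        mv "$tmp_file" "$out_file"
    else
        rm -f "$tmp_file"
        return 1
    fi
}
```

A call such as `extract_to_json "Extract the vendor name, line items, and total amount" invoice.pdf invoice.json` leaves `invoice.json` in place only when the command exits successfully. `--no-download` is included here because a text extraction typically needs no artifacts; drop it if the run is expected to produce files.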
# Direct prompt
vlmrun chat "What objects and people are visible in this image?" -i photo.jpg
# Prompt from file
vlmrun chat -p long_prompt.txt -i photo.jpg
# Prompt from stdin
echo "Describe this image in detail" | vlmrun chat - -i photo.jpg
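The stdin form shown above also works with a here-document, which keeps a multi-line prompt readable without a separate prompt file. This is a sketch; `describe_product` and the prompt wording are illustrative only.

```shell
# describe_product IMAGE
# Hypothetical helper: sends a fixed multi-line prompt over stdin ("-").
describe_product() {
    vlmrun chat - -i "$1" <<'EOF'
Describe this product image for a catalog listing.
Include the color, material, and any visible text.
EOF
}
```

Quoting the delimiter (`'EOF'`) stops the shell from expanding `$` or backticks inside the prompt text.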
To keep the past conversation and generated artifacts in context, pass the -s flag with the session ID that was generated when the session started.
# Start a new session with an image generation task that introduces a new character
vlmrun chat "Create an iconic scene of a ninja in a forest, practicing his skills with a katana" -i photo.jpg
# Continue that session so the same character and scene stay in context (replace <session_id> with the session ID from the first run)
vlmrun chat "Create a new scene with the same character meditating under a tree" -i photo.jpg -s <session_id>
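In a script, the session flag can be added only when a session ID is already known. The sketch below reads the ID from a shell variable named `SESSION_ID`, which is a hypothetical name chosen for this example and not a variable the CLI reads. How the ID is obtained from the first run is left out, since the source does not describe it.

```shell
# chat_in_session PROMPT INPUT
# Hypothetical helper: adds -s only when SESSION_ID is non-empty.
chat_in_session() {
    prompt="$1"
    input="$2"
    if [ -n "${SESSION_ID:-}" ]; then
        vlmrun chat "$prompt" -i "$input" -s "$SESSION_ID"
    else
        vlmrun chat "$prompt" -i "$input"
    fi
}
```

With `SESSION_ID` unset the call starts a fresh session; once it is set, every later call continues the same one.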
To skip the artifact download, pass the -nd (--no-download) flag.
vlmrun chat "What objects and people are visible in this image?" -i photo.jpg -nd
Use -o ./ to save generated artifacts (images, videos) relative to your current working directory.
Without -o, artifacts save to ~/.vlmrun/cache/artifacts/.
Generated Mar 1, 2026
Online retailers can use this skill to automatically generate high-quality product images, remove backgrounds for transparent PNGs, and create detailed descriptions for listings. This streamlines catalog management and improves visual appeal, reducing manual editing time.
Law firms can process contracts and legal documents to extract key terms, obligations, and summarize content. This aids in quick review and due diligence, helping lawyers focus on critical clauses without manual scanning.
Educational institutions can summarize lecture videos, transcribe with timestamps, and create highlight clips for study materials. This enhances learning efficiency by providing concise, searchable content for students and educators.
Marketing agencies can generate photorealistic images and short videos for campaigns, such as promotional visuals or social media content. This accelerates creative production and allows for rapid iteration based on client feedback.
Healthcare providers can extract text from medical forms, invoices, and reports using OCR capabilities. This automates data entry, reduces errors, and helps in organizing patient information for administrative tasks.
Offer a cloud-based API service where businesses pay a monthly fee based on usage tiers (e.g., number of images processed or API calls). This provides predictable revenue and scales with client demand for visual AI tasks.
Provide a free CLI version with basic features and limited usage, then charge for advanced capabilities like high-resolution generation, batch processing, or priority support. This attracts individual users and converts them to paid plans.
Sell custom integrations into existing business workflows, such as CRM or content management systems, with dedicated support and training. This targets large organizations needing tailored solutions for document and media processing.
💬 Integration Tip
Set up environment variables like VLMRUN_API_KEY in a .env file for easy CLI access, and use the -o flag to organize output artifacts in project directories.
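One way to organize artifacts per project is to give each run its own timestamped directory and pass it with `-o`. This is a sketch; the `./artifacts/` layout and the name `save_to_run_dir` are assumptions made for illustration, not conventions of the CLI.

```shell
# save_to_run_dir PROMPT [INPUT]
# Hypothetical helper: creates ./artifacts/<timestamp>/ and passes it via -o.
save_to_run_dir() {
    prompt="$1"
    out_dir="./artifacts/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$out_dir" || return 1
    if [ -n "${2:-}" ]; then
        vlmrun chat "$prompt" -i "$2" -o "$out_dir"
    else
        vlmrun chat "$prompt" -o "$out_dir"
    fi
}
```

Running `save_to_run_dir "Remove the background from this product image and make it transparent" product.jpg` from a project root keeps each run's outputs separate from the shared cache directory.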