meta-video-ad-analyzer
Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.
Install via ClawdBot CLI:
clawdbot install meta-video-ad-analyzer
AI-powered video content extraction using Google Gemini Vision.
# Required for Gemini Vision
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Required for audio transcription
# (same service account needs Speech-to-Text API enabled)
pip install opencv-python pillow easyocr ffmpeg-python google-cloud-speech vertexai google-api-python-client
Also requires ffmpeg and ffprobe to be installed on the system.
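Because ffmpeg and ffprobe are system binaries rather than pip packages, it can help to confirm they are reachable before running an extraction. The following is a minimal sketch using only the standard library; the helper name `find_missing_tools` is hypothetical and not part of this skill.

```python
import shutil

# Tool names taken from the requirement above.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing system tools: " + ", ".join(missing))
    else:
        print("ffmpeg and ffprobe found on PATH")
```

This only checks that the executables can be located; it does not verify their versions or codec support.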
from scripts.video_extractor import VideoExtractor
from scripts.models import ExtractedVideoContent
import vertexai
from vertexai.generative_models import GenerativeModel
# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")
gemini_model = GenerativeModel("gemini-1.5-flash")
# Create extractor
extractor = VideoExtractor(gemini_model=gemini_model)
# Analyze video
result = extractor.extract_content("/path/to/video.mp4")
print(f"Duration: {result.duration}s")
print(f"Scenes: {len(result.scene_timeline)}")
print(f"Text overlays: {len(result.text_timeline)}")
print(f"Transcript: {result.transcript[:200]}...")
frames, timestamps, text_timeline, scene_timeline, thumbnail = extractor.extract_smart_frames(
    "/path/to/video.mp4",
    scene_interval=2,   # Check for scene changes every 2s
    text_interval=0.5   # Check for text every 0.5s
)
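To get a feel for what the two intervals imply, the sketch below lists the timestamps that would be checked for a clip of a given length. It assumes sampling starts at 0 and proceeds in fixed steps, which is a guess about the extractor's behavior rather than something the skill documents; `sample_times` is a hypothetical helper.

```python
def sample_times(duration, interval):
    """Return timestamps 0, interval, 2*interval, ... that fall within duration."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    steps = int(duration // interval)
    # Multiply by the step index instead of accumulating to avoid float drift.
    return [round(i * interval, 3) for i in range(steps + 1)]


scene_checks = sample_times(30.5, 2)    # scene_interval=2
text_checks = sample_times(30.5, 0.5)   # text_interval=0.5
print(f"{len(scene_checks)} scene checks, {len(text_checks)} text checks")
```

Under these assumptions, a shorter `text_interval` raises the number of OCR passes roughly in proportion, which is the main cost trade-off when tuning it.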
# Works with images too
result = extractor.extract_content("/path/to/image.jpg")
print(result.scene_timeline[0]['description'])
ExtractedVideoContent(
    video_path="/path/to/video.mp4",
    duration=30.5,
    transcript="Here's what we found...",
    text_timeline=[
        {"at": 0.0, "text": ["Download Now"]},
        {"at": 5.5, "text": ["50% Off Today"]}
    ],
    scene_timeline=[
        {"timestamp": 0.0, "description": "Woman using phone app..."},
        {"timestamp": 2.0, "description": "Product showcase with features..."}
    ],
    thumbnail_url="/static/thumbnails/video_thumb.jpg",
    extraction_complete=True
)
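Note that the two timelines use different time keys (`at` for text, `timestamp` for scenes). One way to read them together is to merge them into a single chronological list, as in the sketch below. It works on plain lists of dicts shaped like the example output; `merge_timelines` is a hypothetical helper, not part of the skill.

```python
def merge_timelines(text_timeline, scene_timeline):
    """Combine text and scene entries into one list sorted by time."""
    events = []
    for entry in text_timeline:
        events.append((entry["at"], "text", " | ".join(entry["text"])))
    for entry in scene_timeline:
        events.append((entry["timestamp"], "scene", entry["description"]))
    # Sort by time; on ties, list the scene first so context precedes overlays.
    events.sort(key=lambda event: (event[0], event[1] != "scene"))
    return events


# Sample data copied from the example output.
text_timeline = [
    {"at": 0.0, "text": ["Download Now"]},
    {"at": 5.5, "text": ["50% Off Today"]},
]
scene_timeline = [
    {"timestamp": 0.0, "description": "Woman using phone app..."},
    {"timestamp": 2.0, "description": "Product showcase with features..."},
]

for time_s, kind, detail in merge_timelines(text_timeline, scene_timeline):
    print(f"{time_s:5.1f}s  {kind:<5}  {detail}")
```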
| Feature | Description |
|---------|-------------|
| Scene Detection | Histogram-based change detection (threshold=65) |
| OCR Confidence | Tiered thresholds (0.5 high, 0.3 low) |
| AI Proofreading | Gemini cleans up OCR errors |
| Source Reconciliation | Merges OCR + Vision text intelligently |
| Native Video | Direct Gemini analysis for <20MB files |
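The thresholds in the table can be read as simple decision rules. The sketch below shows one plausible interpretation; the tier names, the inclusive boundaries, and reading "20MB" as 20 MiB are all assumptions, and neither function is taken from the skill's code.

```python
HIGH_CONFIDENCE = 0.5  # "high" OCR threshold from the table
LOW_CONFIDENCE = 0.3   # "low" OCR threshold from the table
NATIVE_VIDEO_LIMIT_BYTES = 20 * 1024 * 1024  # assumption: 20MB read as 20 MiB


def ocr_tier(confidence):
    """Classify an OCR confidence score; the tier names are illustrative."""
    if confidence >= HIGH_CONFIDENCE:
        return "accept"
    if confidence >= LOW_CONFIDENCE:
        return "review"
    return "discard"


def fits_native_analysis(size_bytes):
    """Return True if a file is under the assumed native-video size limit."""
    return size_bytes < NATIVE_VIDEO_LIMIT_BYTES


print(ocr_tier(0.72), ocr_tier(0.41), ocr_tier(0.12))
print(fits_native_analysis(5 * 1024 * 1024))
```

A middle tier like "review" is where AI proofreading or reconciliation with Vision text would plausibly apply, though the table does not say how the skill treats each tier.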
Customize AI behavior by editing prompts in the prompts/ folder:
scene_analysis.md - Frame analysis prompts
scene_reconciliation.md - Scene enrichment prompts
Generated Mar 1, 2026
Marketing teams analyze video ads to extract text overlays and scene descriptions, identifying key messaging and visual elements. This helps optimize ad creatives by comparing successful campaigns and refining call-to-action placement.
Regulatory or legal departments use the skill to transcribe audio and detect on-screen text in video ads, supporting reviews against advertising standards and copyright laws. It can help automate checks for disclaimers, trademarks, or prohibited content.
Content creators generate transcripts and scene descriptions from video ads to improve accessibility for viewers with disabilities. This supports adding captions and audio descriptions to meet accessibility guidelines.
Business analysts extract and analyze competitors' video ads to understand their messaging, product features, and promotional strategies. This informs market positioning and identifies trends in advertising tactics.
Educators or training providers use the skill to break down instructional videos into scenes and text, creating study guides or summaries. It helps in repurposing video content for e-learning modules.
Offer a subscription-based service where ad agencies access video analysis tools to optimize creatives. Revenue comes from tiered plans based on usage volume, such as number of videos analyzed per month.
Provide specialized consulting to businesses needing video ad compliance checks, using the skill to automate audits. Revenue is generated through project-based fees or retainer agreements for ongoing monitoring.
License the skill as an API for integration into video hosting or social media platforms, enabling users to analyze ads directly. Revenue comes from API usage fees or per-request pricing models.
💬 Integration Tip
Ensure the Google Cloud Speech-to-Text API is enabled and the service account credentials are properly configured to avoid authentication errors during audio transcription.
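A quick local check can catch the most common credential mistakes before any API call is made. The sketch below inspects the file that `GOOGLE_APPLICATION_CREDENTIALS` points to; `check_service_account` is a hypothetical helper, and it cannot confirm that the Speech-to-Text API is enabled for the project, which still has to be verified in Google Cloud.

```python
import json
import os


def check_service_account(path):
    """Return a list of problems found with a service-account key file."""
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return [f"credentials file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError) as exc:
        return [f"credentials file is not readable JSON: {exc}"]
    problems = []
    # Service-account key files carry "type": "service_account".
    if not isinstance(data, dict) or data.get("type") != "service_account":
        problems.append('expected "type": "service_account" in the key file')
    return problems


issues = check_service_account(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))
print("Credential check:", issues or "no local problems found")
```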
Extract frames or short clips from videos using ffmpeg.
Download videos, audio, subtitles, and clean paragraph-style transcripts from YouTube and any other yt-dlp supported site. Use when asked to “download this video”, “save this clip”, “rip audio”, “get subtitles”, “get transcript”, or to troubleshoot yt-dlp/ffmpeg and formats/playlists.
Generate SRT subtitles from video/audio with translation support. Transcribes Hebrew (ivrit.ai) and English (whisper), translates between languages, burns subtitles into video. Use for creating captions, transcripts, or hardcoded subtitles for WhatsApp/social media.
Create AI videos with optimized prompts, motion control, and platform-ready output.
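Automatically log in to a Douyin account, then upload and publish videos to the Douyin Creator Platform, with support for video tag management and login status checks.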
AI video generation workflow on Volcengine. Use when users need text-to-video, image-to-video, generation parameter tuning, or async task troubleshooting for video jobs.