# instagram-scraper

Browser-based tool to discover Instagram profiles by location/category and scrape their public info, stats, images, and engagement, with export options.

Install via ClawdBot CLI:

```bash
clawdbot install ArulmozhiV/instagram-scraper
```

Part of ScrapeClaw — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, and Facebook, built with Python & Playwright, no API keys required.
---
name: instagram-scraper
description: Discover and scrape Instagram profiles from your browser.
emoji: 📸
version: 1.0.3
author: influenza
tags:
  - instagram
  - scraping
  - social-media
  - influencer-discovery
metadata:
  clawdbot:
    requires:
      bins:
        - python3
        - chromium
    config:
      stateDirs:
        - data/output
        - data/queue
        - thumbnails
      outputFormats:
        - json
        - csv
---
This skill provides a two-phase Instagram scraping system: a discovery phase finds candidate profiles (e.g. via Google search with instagram.com as the site to search), and a scraping phase then collects each profile's public data.

For OpenClaw agent integration, the skill provides JSON output:
```bash
# Discover profiles (returns JSON)
discover --location "Miami" --category "fitness" --output json

# Scrape a single profile (returns JSON)
scrape --username influencer123 --output json
```

A scraped profile record looks like:

```json
{
  "username": "example_user",
  "full_name": "Example User",
  "bio": "Fashion blogger | NYC",
  "followers": 125000,
  "following": 1500,
  "posts_count": 450,
  "is_verified": false,
  "is_private": false,
  "influencer_tier": "macro",
  "category": "fashion",
  "location": "New York",
  "profile_pic_local": "thumbnails/example_user/profile_abc123.jpg",
  "content_thumbnails": [
    "thumbnails/example_user/content_1_def456.jpg",
    "thumbnails/example_user/content_2_ghi789.jpg"
  ],
  "post_engagement": [
    {"post_url": "https://instagram.com/p/ABC123/", "likes": 5420, "comments": 89}
  ],
  "scrape_timestamp": "2025-02-09T14:30:00"
}
```
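As a quick sketch of consuming this output — say, to rank discovered profiles — here is a minimal engagement-rate calculation over the `post_engagement` array. The `engagement_rate` helper is illustrative only, not part of the skill's API:

```python
def engagement_rate(profile: dict) -> float:
    """Average (likes + comments) per post, divided by follower count.
    Illustrative helper -- not something the skill itself provides."""
    posts = profile.get("post_engagement", [])
    followers = profile.get("followers", 0)
    if not posts or not followers:
        return 0.0
    total = sum(p["likes"] + p["comments"] for p in posts)
    return total / len(posts) / followers

# Using the example record above (as written to data/output/{username}.json)
profile = {
    "username": "example_user",
    "followers": 125000,
    "post_engagement": [
        {"post_url": "https://instagram.com/p/ABC123/", "likes": 5420, "comments": 89},
    ],
}
print(f"{engagement_rate(profile):.2%}")  # (5420 + 89) / 125000 ≈ 4.41%
```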
| Tier | Follower Range |
|-------|-------------------|
| nano | < 1,000 |
| micro | 1,000 - 10,000 |
| mid | 10,000 - 100,000 |
| macro | 100,000 - 1M |
| mega | > 1,000,000 |
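The tier boundaries above can be encoded as a simple threshold lookup. Treating each boundary value as belonging to the higher tier is an assumption on my part; the table leaves the exact boundaries ambiguous:

```python
def influencer_tier(followers: int) -> str:
    """Map a follower count to a tier per the table above.
    Assumes boundary values (e.g. exactly 10,000) belong to the higher tier."""
    if followers < 1_000:
        return "nano"
    if followers < 10_000:
        return "micro"
    if followers < 100_000:
        return "mid"
    if followers < 1_000_000:
        return "macro"
    return "mega"

print(influencer_tier(125_000))  # → macro
```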
Output locations:

- Discovery queue: `data/queue/{location}_{category}_{timestamp}.json`
- Scraped profiles: `data/output/{username}.json`
- Thumbnails: `thumbnails/{username}/profile_*.jpg` and `thumbnails/{username}/content_*.jpg`
- Exports: `data/export_{timestamp}.json` and `data/export_{timestamp}.csv`

To change defaults, edit `config/scraper_config.json`:
```json
{
  "proxy": {
    "enabled": false,
    "provider": "brightdata",
    "country": "",
    "sticky": true,
    "sticky_ttl_minutes": 10
  },
  "google_search": {
    "enabled": true,
    "api_key": "",
    "search_engine_id": "",
    "queries_per_location": 3
  },
  "scraper": {
    "headless": false,
    "min_followers": 1000,
    "download_thumbnails": true,
    "max_thumbnails": 6
  },
  "cities": ["New York", "Los Angeles", "Miami", "Chicago"],
  "categories": ["fashion", "beauty", "fitness", "food", "travel", "tech"]
}
```
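A sketch of how such a config file might be read with fallbacks to sensible defaults. The merge behavior shown here (user values override defaults, one nesting level deep) is an assumption, not necessarily what the skill does internally:

```python
import json
from pathlib import Path

# Defaults mirroring the "scraper" section of config/scraper_config.json above
DEFAULTS = {
    "scraper": {"headless": False, "min_followers": 1000,
                "download_thumbnails": True, "max_thumbnails": 6},
}

def load_config(path: str = "config/scraper_config.json") -> dict:
    """Merge a user config file over DEFAULTS; missing file means pure defaults."""
    cfg = {k: dict(v) for k, v in DEFAULTS.items()}
    p = Path(path)
    if p.exists():
        user = json.loads(p.read_text())
        for section, values in user.items():
            if isinstance(values, dict):
                cfg.setdefault(section, {}).update(values)
            else:
                cfg[section] = values
    return cfg

cfg = load_config()
print(cfg["scraper"]["min_followers"])  # 1000 when no user config is present
```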
The scraper automatically filters out private accounts and profiles below the configured `min_followers` threshold.
Running a scraper at scale without a residential proxy will get your IP blocked fast. Here's why proxies are essential for long-running scrapes:
| Advantage | Description |
|-----------|-------------|
| Avoid IP Bans | Residential IPs look like real household users, not data-center bots. Instagram is far less likely to flag them. |
| Automatic IP Rotation | Each request (or session) gets a fresh IP, so rate-limits never stack up on one address. |
| Geo-Targeting | Route traffic through a specific country/city so scraped content matches the target audience's locale. |
| Sticky Sessions | Keep the same IP for a configurable window (e.g. 10 min) — critical for maintaining a consistent browsing session. |
| Higher Success Rate | Rotating residential IPs deliver 95%+ success rates compared to ~30% with data-center proxies on Instagram. |
| Long-Running Scrapes | Scrape thousands of profiles over hours or days without interruption. |
| Concurrent Scraping | Run multiple browser instances across different IPs simultaneously. |
We have affiliate partnerships with top residential proxy providers. Using these links supports continued development of this skill:
| Provider | Best For | Sign Up |
|----------|----------|---------|
| Bright Data | World's largest network, 72M+ IPs, enterprise-grade | 👉 Get Bright Data |
| IProyal | Pay-as-you-go, 195+ countries, no traffic expiry | 👉 Get IProyal |
| Storm Proxies | Fast & reliable, developer-friendly API, competitive pricing | 👉 Get Storm Proxies |
| NetNut | ISP-grade network, 52M+ IPs, direct connectivity | 👉 Get NetNut |
Sign up with any provider above, then grab your proxy credentials and export them as environment variables:

```bash
export PROXY_ENABLED=true
export PROXY_PROVIDER=brightdata   # brightdata | iproyal | stormproxies | netnut | custom
export PROXY_USERNAME=your_user
export PROXY_PASSWORD=your_pass
export PROXY_COUNTRY=us            # optional: two-letter country code
export PROXY_STICKY=true           # optional: keep same IP per session
```
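Under the hood, these variables presumably end up in a credentialed proxy URL; the actual wiring lives in the skill's `ProxyManager.from_env()`, so treat this self-contained sketch as an approximation:

```python
import os

def proxy_url_from_env() -> "str | None":
    """Assemble a credentialed proxy URL from PROXY_* vars, or None if disabled."""
    if os.environ.get("PROXY_ENABLED", "false").lower() != "true":
        return None
    user = os.environ["PROXY_USERNAME"]
    password = os.environ["PROXY_PASSWORD"]
    host = os.environ.get("PROXY_HOST", "brd.superproxy.io")  # Bright Data default
    port = os.environ.get("PROXY_PORT", "22225")
    return f"http://{user}:{password}@{host}:{port}"

# Demo with the values from the exports above
os.environ.pop("PROXY_HOST", None)   # make the default-gateway fallback observable
os.environ.pop("PROXY_PORT", None)
os.environ.update(PROXY_ENABLED="true", PROXY_USERNAME="your_user",
                  PROXY_PASSWORD="your_pass")
print(proxy_url_from_env())  # → http://your_user:your_pass@brd.superproxy.io:22225
```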
These are auto-configured when you set the provider name:
| Provider | Host | Port |
|----------|------|------|
| Bright Data | brd.superproxy.io | 22225 |
| IProyal | proxy.iproyal.com | 12321 |
| Storm Proxies | rotating.stormproxies.com | 9999 |
| NetNut | gw-resi.netnut.io | 5959 |
Override with `PROXY_HOST` / `PROXY_PORT` env vars if your plan uses a different gateway.
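The default gateways from the table, plus the env-var override, could be resolved like this. The host/port values come from this document; the `resolve_gateway` helper itself is illustrative:

```python
import os

# Default gateways per provider (from the table above)
GATEWAYS = {
    "brightdata":   ("brd.superproxy.io", 22225),
    "iproyal":      ("proxy.iproyal.com", 12321),
    "stormproxies": ("rotating.stormproxies.com", 9999),
    "netnut":       ("gw-resi.netnut.io", 5959),
}

def resolve_gateway(provider: str):
    """Return (host, port), letting PROXY_HOST/PROXY_PORT override the defaults."""
    host, port = GATEWAYS[provider]
    host = os.environ.get("PROXY_HOST", host)
    port = int(os.environ.get("PROXY_PORT", port))
    return host, port

os.environ.pop("PROXY_HOST", None)  # demo with no override set
os.environ.pop("PROXY_PORT", None)
print(resolve_gateway("netnut"))  # → ('gw-resi.netnut.io', 5959)
```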
For any other proxy service, set the provider to `custom` and supply host/port manually:
```json
{
  "proxy": {
    "enabled": true,
    "provider": "custom",
    "host": "your.proxy.host",
    "port": 8080,
    "username": "user",
    "password": "pass"
  }
}
```
Once configured, the scraper picks up the proxy automatically — no extra flags needed:
```bash
# Discover and scrape as usual — proxy is applied automatically
python main.py discover --location "Miami" --category "fitness"
python main.py scrape --username influencer123

# The log will confirm the proxy is active:
# INFO - Proxy enabled: <ProxyManager provider=brightdata enabled host=brd.superproxy.io:22225>
# INFO - Browser using proxy: brightdata → brd.superproxy.io:22225
```
The skill also exposes `ProxyManager` as a Python API:

```python
from proxy_manager import ProxyManager

# From config (auto-reads config/scraper_config.json)
pm = ProxyManager.from_config()

# From environment variables
pm = ProxyManager.from_env()

# Manual construction
pm = ProxyManager(
    provider="brightdata",
    username="your_user",
    password="your_pass",
    country="us",
    sticky=True
)

# For Playwright browser context
proxy = pm.get_playwright_proxy()
# → {"server": "http://brd.superproxy.io:22225", "username": "user-country-us-session-abc123", "password": "pass"}

# For requests / aiohttp
proxies = pm.get_requests_proxy()
# → {"http": "http://user:pass@host:port", "https": "http://user:pass@host:port"}

# Force new IP (rotates session ID)
pm.rotate_session()

# Debug info
print(pm.info())
```
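The sticky-session username format shown above (`user-country-us-session-abc123`) can be sketched as follows. The suffix scheme is provider-specific (this one is modeled on Bright Data's convention), and the class below is a stand-in for what `ProxyManager.rotate_session()` presumably does internally:

```python
import secrets

class StickySession:
    """Minimal sketch of sticky-session proxy usernames (assumed scheme)."""

    def __init__(self, username: str, country: str = "us"):
        self.base = username
        self.country = country
        self.session_id = secrets.token_hex(4)

    def proxy_username(self) -> str:
        # e.g. "your_user-country-us-session-1a2b3c4d"
        return f"{self.base}-country-{self.country}-session-{self.session_id}"

    def rotate(self) -> None:
        """Request a fresh exit IP by changing the session suffix."""
        self.session_id = secrets.token_hex(4)

s = StickySession("your_user")
before = s.proxy_username()
s.rotate()
after = s.proxy_username()
print(before != after)  # a new session id maps to a new exit IP
```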
"sticky": true."country": "us" (or your target region) so Instagram serves content in the expected locale.pm.rotate_session() between large batches of profiles to get a fresh IP.delay_between_profiles in config to avoid aggressive patterns.Generated Mar 1, 2026
Marketing agencies can use this skill to discover and vet Instagram influencers in specific locations and categories, such as fitness trainers in Miami, for targeted campaigns. It provides detailed engagement data and follower tiers to ensure brand alignment and ROI. The scraped profile data and thumbnails help create curated influencer lists and performance reports.
Retail brands can scrape competitor Instagram profiles to analyze content strategies, follower growth, and engagement metrics in real-time. This helps identify market trends, benchmark performance, and optimize social media strategies. The JSON/CSV exports facilitate easy integration with analytics dashboards for ongoing monitoring.
Entertainment agencies can discover emerging Instagram creators in niches like fashion or beauty by location, assessing their potential through verified stats and content quality. The tool filters out low-follower and private accounts, streamlining the scouting process. Downloaded thumbnails aid in portfolio reviews and client presentations.
Local businesses, such as restaurants or gyms, can scrape Instagram profiles in their city to understand local influencer landscapes and customer preferences. This data informs partnership opportunities and targeted marketing efforts. The resume feature allows for periodic updates without restarting from scratch.
Researchers can use this skill to collect large-scale Instagram data on user behavior, engagement patterns, and demographic trends across different categories and locations. The browser simulation ensures accurate data capture, while JSON output supports statistical analysis. Proxy support enables ethical, high-volume scraping without IP blocks.
Build a subscription-based platform that leverages this skill to offer automated Instagram scraping, analytics dashboards, and influencer discovery tools to marketing agencies. Revenue comes from monthly plans tiered by scrape volume and features. Integration with residential proxies ensures reliable service for enterprise clients.
Provide customized Instagram data feeds to large corporations in retail, finance, or media, delivering scraped profile insights via API or exports. Revenue is generated through annual contracts based on data scope and frequency. The skill's configurable filters and output formats allow tailoring to specific client needs.
Offer managed services using this skill to conduct one-off or ongoing Instagram audits, competitor analysis, and influencer outreach for clients. Revenue is earned through project-based fees or retainer agreements. The tool's ease of use and proxy support enable scalable service delivery with minimal technical overhead.
💬 Integration Tip
Ensure Python3 and Chromium are installed on the agent's system, and configure residential proxies in the JSON config to avoid IP bans during large-scale scrapes.