korean-scraper
Korean website specialized scraper with anti-bot protection (Naver, Coupang, Daum, Instagram)
Install via ClawdBot CLI:
clawdbot install mupengi-bot/korean-scraper
A specialized scraper for Korean websites. Built on Playwright, it extracts structured data from major Korean sites such as Naver, Coupang, and Daum, and includes features for getting past anti-bot protection.
cd skills/korean-scraper
npm install
npx playwright install chromium
# Naver Blog: collect search results
node scripts/naver-blog.js search "restaurant recommendations" --limit 10
# Naver Blog: extract the body of a specific post
node scripts/naver-blog.js extract "https://blog.naver.com/..."
# Naver Cafe: collect popular posts
node scripts/naver-cafe.js popular "CAFE_URL" --limit 20
# Naver Cafe: collect recent posts
node scripts/naver-cafe.js recent "CAFE_URL" --limit 20
# Coupang: extract product information
node scripts/coupang.js product "PRODUCT_URL"
# Coupang: collect search results
node scripts/coupang.js search "wireless earphones" --limit 20
# Naver News: collect search results
node scripts/naver-news.js search "AI" --limit 10
# Naver News: extract the body of an article
node scripts/naver-news.js extract "https://n.news.naver.com/..."
# Daum News: collect search results
node scripts/daum-news.js search "economy" --limit 10
# Daum News: extract the body of an article
node scripts/daum-news.js extract "https://v.daum.net/..."
All scripts return structured JSON:
Naver Blog search (naver-blog.js search):
{
  "status": "success",
  "query": "restaurant recommendations",
  "count": 10,
  "results": [
    {
      "title": "Seoul Gangnam Restaurant Recommendations BEST 5",
      "url": "https://blog.naver.com/...",
      "blogger": "Restaurant Explorer",
      "date": "2026-02-15",
      "snippet": "Hidden restaurants near Gangnam Station..."
    }
  ]
}
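As a minimal sketch of consuming a search response like the one above, the helper below checks the envelope before handing back the results. The name `readSearchResults` is hypothetical, and the field names (`status`, `results`, `title`, `url`) are taken from the sample only; the scripts may emit other fields or a different error shape.

```javascript
// Hypothetical helper: validate a search response and return its usable results.
// Field names follow the sample output; real responses may differ.
function readSearchResults(response) {
  if (response.status !== 'success') {
    throw new Error(`scraper reported status: ${response.status}`);
  }
  if (!Array.isArray(response.results)) {
    throw new Error('response has no results array');
  }
  // Keep only entries that carry the two fields most callers need.
  return response.results.filter(
    (r) => typeof r.title === 'string' && typeof r.url === 'string'
  );
}

const sample = {
  status: 'success',
  query: 'restaurant recommendations',
  count: 1,
  results: [{ title: 'Example post', url: 'https://blog.naver.com/...' }],
};
console.log(readSearchResults(sample).map((r) => r.title));
```

Filtering instead of throwing on a malformed entry keeps one bad row from discarding a whole page of results; a stricter caller could throw instead.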
Naver Blog post (naver-blog.js extract):
{
  "status": "success",
  "url": "https://blog.naver.com/...",
  "title": "Seoul Gangnam Restaurant Recommendations BEST 5",
  "author": "Restaurant Explorer",
  "date": "2026-02-15",
  "content": "# Seoul Gangnam Restaurant Recommendations BEST 5\n\n1. ...",
  "images": ["https://..."],
  "tags": ["restaurants", "Gangnam", "Seoul"]
}
Coupang product (coupang.js product):
{
  "status": "success",
  "url": "https://www.coupang.com/...",
  "productName": "Apple AirPods Pro (2nd generation)",
  "price": 299000,
  "originalPrice": 359000,
  "discount": "17%",
  "rating": 4.8,
  "reviewCount": 1523,
  "rocketDelivery": true,
  "seller": "Coupang",
  "images": ["https://..."]
}
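The `discount` string in the product sample can be cross-checked against the two price fields. The sketch below assumes the percentage is rounded to the nearest whole number; whether the script computes it or copies it from the page is not stated, and `discountPercent` is a hypothetical name.

```javascript
// Recompute the discount from the price fields shown in the sample output.
// Assumption: the displayed percentage is rounded to the nearest integer.
function discountPercent(price, originalPrice) {
  if (!(originalPrice > 0) || price > originalPrice) return 0;
  return Math.round((1 - price / originalPrice) * 100);
}

console.log(`${discountPercent(299000, 359000)}%`); // prints "17%"
```

A mismatch between the recomputed value and the scraped `discount` can be a cheap signal that a price selector has drifted.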
Naver Cafe posts (naver-cafe.js popular):
{
  "status": "success",
  "cafeUrl": "https://cafe.naver.com/...",
  "type": "popular",
  "count": 20,
  "posts": [
    {
      "title": "New member saying hello",
      "url": "https://cafe.naver.com/.../12345",
      "author": "nickname",
      "date": "2026-02-17",
      "views": 523,
      "comments": 12
    }
  ]
}
Naver News article (naver-news.js extract):
{
  "status": "success",
  "url": "https://n.news.naver.com/...",
  "title": "AI market expected to grow rapidly",
  "media": "Chosun Ilbo",
  "author": "Reporter Hong Gil-dong",
  "date": "2026-02-17 09:30",
  "content": "# AI market expected to grow rapidly\n\n...",
  "category": "IT/Science",
  "images": ["https://..."]
}
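The `date` fields in the samples carry no time zone. A minimal sketch for normalizing them, assuming they are Korea Standard Time (UTC+9, no daylight saving); that assumption is not confirmed by the scripts, and `kstToIso` is a hypothetical name.

```javascript
// Assumption: timestamps such as "2026-02-17 09:30" or "2026-02-15" are in
// Korea Standard Time (UTC+9). Converts them to a UTC ISO 8601 string.
function kstToIso(dateText) {
  const m = /^(\d{4}-\d{2}-\d{2})(?: (\d{2}:\d{2}))?$/.exec(dateText);
  if (!m) throw new Error(`unrecognized date: ${dateText}`);
  const time = m[2] ?? '00:00'; // date-only values are treated as midnight KST
  return new Date(`${m[1]}T${time}:00+09:00`).toISOString();
}

console.log(kstToIso('2026-02-17 09:30')); // 2026-02-17T00:30:00.000Z
console.log(kstToIso('2026-02-15'));       // 2026-02-14T15:00:00.000Z
```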
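All scripts protect the target site by default by adding delays between requests. The --fast flag shortens these delays (use with caution).

| Situation | Behavior |
|------|------|
| 404 | Return the error as JSON and continue |
| 403/blocked | Retry (up to 3 times) |
| Timeout | Extend the wait time, then retry |
| Login required | Print a warning and return only the data that is available |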
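The table above can be read as a small decision function. This is only a sketch of that policy, not the scripts' actual implementation: the `kind` labels and the name `nextStep` are hypothetical, doubling the wait on timeout is an assumption, and so is applying the 3-retry cap to timeouts (the table states it only for 403/blocked).

```javascript
// Sketch of the error-handling table as a pure decision function.
// MAX_RETRIES comes from the table; doubling the wait time and capping
// timeout retries are assumptions made for this example.
const MAX_RETRIES = 3;

// attempt = number of retries already made for this request (0 on first failure)
function nextStep(kind, attempt, waitMs) {
  switch (kind) {
    case 'not_found': // 404: report the error as JSON and move on
      return { action: 'report', waitMs };
    case 'blocked': // 403 or block page: retry up to MAX_RETRIES times
      return attempt < MAX_RETRIES
        ? { action: 'retry', waitMs }
        : { action: 'report', waitMs };
    case 'timeout': // extend the wait, then retry
      return attempt < MAX_RETRIES
        ? { action: 'retry', waitMs: waitMs * 2 }
        : { action: 'report', waitMs };
    case 'login_required': // warn and return whatever was readable
      return { action: 'partial', waitMs };
    default:
      return { action: 'report', waitMs };
  }
}

console.log(nextStep('timeout', 0, 5000));
```

Keeping the policy separate from the browser code makes it testable without launching Chromium.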
# Turn off headless mode (for debugging)
HEADLESS=false node scripts/naver-blog.js ...
# Save screenshots
SCREENSHOT=true node scripts/coupang.js ...
# Adjust the wait time (ms)
WAIT_TIME=10000 node scripts/naver-cafe.js ...
# Set a custom User-Agent
USER_AGENT="..." node scripts/naver-news.js ...
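For readers wiring these variables into their own wrapper, here is a hedged sketch of reading them in Node. The function name `readConfig` and every fallback value (headless on, screenshots off, 5000 ms wait) are assumptions for illustration; the scripts' real defaults are not documented here.

```javascript
// Sketch of reading the documented environment variables.
// The fallback values are assumptions, not the scripts' real defaults.
function readConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    headless: env.HEADLESS !== 'false',    // only the literal "false" disables it
    screenshot: env.SCREENSHOT === 'true', // only the literal "true" enables it
    waitTimeMs: Number.isNaN(waitTime) ? 5000 : waitTime,
    userAgent: env.USER_AGENT || undefined, // undefined = browser default
  };
}

console.log(readConfig({ HEADLESS: 'false', WAIT_TIME: '10000' }));
```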
// Naver Blog search
const result = await exec({
  command: 'node scripts/naver-blog.js search "AI trends" --limit 5',
  workdir: '/path/to/skills/korean-scraper'
});
const data = JSON.parse(result.stdout);
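Outside an agent, the same call can be made with Node's own `child_process`. The sketch below is one way to do it under stated assumptions: `runScraper` and `parseScraperOutput` are hypothetical names, and treating any `status` other than `"success"` as a failure is a guess, since only success samples are documented.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const execFileAsync = promisify(execFile);

// Parse a script's stdout and fail loudly on a non-success status.
// Assumption: failures are signalled by a status other than "success".
function parseScraperOutput(stdout) {
  const data = JSON.parse(stdout);
  if (data.status !== 'success') {
    throw new Error(`scraper failed: ${JSON.stringify(data)}`);
  }
  return data;
}

// Run one of the scripts, passing arguments as an array so no shell quoting is needed.
async function runScraper(workdir, script, args) {
  const { stdout } = await execFileAsync('node', [`scripts/${script}`, ...args], {
    cwd: workdir,
    maxBuffer: 10 * 1024 * 1024, // extracted posts can be large
  });
  return parseScraperOutput(stdout);
}

// Usage (the path is a placeholder, as in the example above):
// runScraper('/path/to/skills/korean-scraper', 'naver-blog.js',
//            ['search', 'AI trends', '--limit', '5']).then(console.log);
```

Using `execFile` with an argument array avoids quoting problems when a query contains spaces or quotes.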
# Batch-process multiple URLs (one URL per line in urls.txt)
while IFS= read -r url; do
  node scripts/naver-blog.js extract "$url" >> results.jsonl
done < urls.txt
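Reading `results.jsonl` back could look like the sketch below. It assumes each script prints its JSON on a single line; if the output is pretty-printed across several lines, the file is not valid JSON Lines and this parser will throw. `parseJsonl` is a hypothetical helper name.

```javascript
// Parse JSON Lines text (one JSON object per line), skipping blank lines.
// Assumption: each scraper run wrote exactly one single-line JSON object.
function parseJsonl(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line));
}

const demo = '{"status":"success","title":"A"}\n\n{"status":"success","title":"B"}\n';
console.log(parseJsonl(demo).map((post) => post.title));

// With a real file:
// const fs = require('node:fs');
// const posts = parseJsonl(fs.readFileSync('results.jsonl', 'utf8'));
```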
Solutions:
- Increase the wait time (WAIT_TIME=15000)
- Turn off headless mode (HEADLESS=false)
Solutions:
- Increase WAIT_TIME
Korean sites change their UI frequently, so the selectors may need updating.
Selector location: the SELECTORS object at the top of each file in scripts/
const SELECTORS = {
  blogTitle: '.se-title-text',
  blogContent: '.se-main-container',
  // ...
};
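Because selectors go stale when a site changes its UI, a quick check of which ones still match can narrow down what to update. The sketch below assumes `page` is a Playwright `Page` that has already navigated to a sample URL; `findBrokenSelectors` is a hypothetical helper, not part of the scripts.

```javascript
// Report which named selectors match nothing on the current page.
// `page` is expected to be a Playwright Page; only page.locator(sel).count() is used.
async function findBrokenSelectors(page, selectors) {
  const broken = [];
  for (const [name, selector] of Object.entries(selectors)) {
    const matches = await page.locator(selector).count();
    if (matches === 0) broken.push(name);
  }
  return broken;
}

// Usage, inside a script that already has a page open:
// const broken = await findBrokenSelectors(page, SELECTORS);
// if (broken.length > 0) console.warn('Selectors needing an update:', broken);
```

A selector that matches nothing is not always broken (content may sit in a frame or load late), so treat the result as a hint rather than a verdict.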
Generated Mar 1, 2026
E-commerce companies can use this scraper to monitor competitor pricing and product reviews on Coupang, Korea's largest e-commerce platform. This enables dynamic pricing strategies and product positioning based on real-time market data from Korean consumers.
Marketing agencies can scrape Naver blogs and cafes to identify trending topics, popular influencers, and consumer sentiment in the Korean market. This helps in creating targeted marketing campaigns and understanding niche community discussions.
Financial institutions can extract structured data from Naver and Daum news to monitor market-moving events, regulatory changes, and economic indicators specific to South Korea. This supports investment decisions and risk assessment in Korean markets.
Global companies entering the Korean market can scrape blog content and news articles to understand local preferences, cultural nuances, and language usage patterns. This informs content localization strategies and product adaptation for Korean consumers.
Researchers studying Korean society, media, or consumer behavior can systematically collect data from multiple Korean websites while maintaining ethical scraping practices. The structured JSON output facilitates quantitative analysis of Korean online content.
Offer a subscription-based service providing structured data feeds from Korean websites to international businesses. Clients receive regular reports on pricing, trends, and consumer sentiment from Naver, Coupang, and Daum through automated scraping pipelines.
Provide customized scraping services to help businesses monitor Korean competitors' pricing, product launches, and marketing strategies. The anti-bot features make data collection more reliable, and the default rate limiting reduces the load placed on the target sites.
Develop a REST API that exposes scraped Korean website data to developers and businesses. Charge based on API calls and data volume, with endpoints for blog content, product information, and news articles from major Korean platforms.
Integration Tip
Use the OpenClaw Agent integration example to embed scraping commands within automated workflows, ensuring proper workdir paths and JSON parsing of stdout results.