agentic-paper-digest-skill

Fetches and summarizes recent arXiv and Hugging Face papers with Agentic Paper Digest. Use when the user wants a paper digest, a JSON feed of recent papers, or to run the arXiv/HF pipeline.
Install via the ClawdBot CLI:

clawdbot install matanle51/agentic-paper-digest-skill

Requires OPENAI_API_KEY or an OpenAI-compatible provider via LITELLM_API_BASE + LITELLM_API_KEY. git is optional for bootstrap; otherwise curl/wget (or Python) is used to download the repo.

Bootstrap:

bash "{baseDir}/scripts/bootstrap.sh"

The repo is installed under PROJECT_DIR; to override the location:

PROJECT_DIR="$HOME/agentic_paper_digest" bash "{baseDir}/scripts/bootstrap.sh"
bash "{baseDir}/scripts/run_cli.sh"
bash "{baseDir}/scripts/run_cli.sh" --window-hours 24 --sources arxiv,hf
bash "{baseDir}/scripts/run_api.sh"
curl -X POST http://127.0.0.1:8000/api/run
curl http://127.0.0.1:8000/api/status
curl http://127.0.0.1:8000/api/papers
bash "{baseDir}/scripts/stop_api.sh"
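The API commands above can be scripted end to end. A minimal sketch, assuming the server is on its default port and that the /api/status body mentions "running" while a run is in flight (that status shape is an assumption, not documented here):

```shell
#!/bin/sh
# Drive the local Agentic Paper Digest API server.
API="${API:-http://127.0.0.1:8000}"

# Thin wrapper around the status endpoint so the polling
# logic below is easy to stub out.
fetch_status() { curl -s "$API/api/status"; }

# Poll /api/status until the body no longer mentions "running"
# (the exact status field is an assumption), up to ~60 seconds.
wait_for_run() {
  tries=0
  while [ "$tries" -lt 30 ]; do
    body=$(fetch_status)
    case "$body" in
      *running*) sleep 2 ;;                 # still working: poll again
      *) printf '%s\n' "$body"; return 0 ;; # finished (or idle)
    esac
    tries=$((tries + 1))
  done
  return 1  # timed out
}
```

A full cycle would then be: `curl -X POST "$API/api/run" && wait_for_run && curl "$API/api/papers"`.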
- --json prints run_id, seen, kept, window_start, and window_end.
- Paper data is stored in data/papers.sqlite3 (under PROJECT_DIR).
- API endpoints: POST /api/run, GET /api/status, GET /api/papers, GET/POST /api/topics, GET/POST /api/settings.
- Config files live in PROJECT_DIR/config. Environment variables can be set in the shell or via a .env file. The wrappers here auto-load .env from PROJECT_DIR (override with ENV_FILE=/path/to/.env).
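The --json summary can feed downstream scripts directly. A sketch with an illustrative payload (the values are invented; a real run would come from the CLI):

```shell
# A real run would produce this via:
#   out=$(bash "{baseDir}/scripts/run_cli.sh" --json)
out='{"run_id": "r-123", "seen": 120, "kept": 8, "window_start": "2026-02-28T09:00:00Z", "window_end": "2026-03-01T09:00:00Z"}'

# Pull the "kept" count out with sed so there is no jq dependency.
kept=$(printf '%s' "$out" | sed -n 's/.*"kept": *\([0-9][0-9]*\).*/\1/p')
echo "kept=$kept"  # prints: kept=8
```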
Environment (.env or exported vars)
- OPENAI_API_KEY: required for OpenAI models (litellm reads this).
- LITELLM_API_BASE, LITELLM_API_KEY: use an OpenAI-compatible proxy/provider.
- LITELLM_MODEL_RELEVANCE, LITELLM_MODEL_SUMMARY: models for relevance and summarization (summary defaults to the relevance model if unset).
- LITELLM_TEMPERATURE_RELEVANCE, LITELLM_TEMPERATURE_SUMMARY: lower for more deterministic output.
- LITELLM_MAX_RETRIES: retry count for LLM calls.
- LITELLM_DROP_PARAMS=1: drop unsupported params to avoid provider errors.
- WINDOW_HOURS, APP_TZ: recency window and timezone.
- ARXIV_CATEGORIES: comma-separated categories (default includes cs.CL, cs.AI, cs.LG, stat.ML, cs.CR).
- ARXIV_API_BASE, HF_API_BASE: override source endpoints if needed.
- ARXIV_MAX_RESULTS, ARXIV_PAGE_SIZE: arXiv paging limits.
- MAX_CANDIDATES_PER_SOURCE: cap candidates per source before LLM filtering.
- FETCH_TIMEOUT_S, REQUEST_TIMEOUT_S: source fetch and per-request timeouts.
- ENABLE_PDF_TEXT=1: include first-page PDF text in summaries; requires PyMuPDF (pip install pymupdf).
- DATA_DIR: location for papers.sqlite3.
- CORS_ORIGINS: comma-separated origins allowed by the API server (UI use).
- TOPICS_PATH, SETTINGS_PATH, AFFILIATION_BOOSTS_PATH: paths to the corresponding config files.

Config files
- config/topics.json: list of topics with id, label, description, max_per_topic, and keywords. The relevance classifier must output topic IDs exactly as defined here. max_per_topic also caps results in GET /api/papers when apply_topic_caps=1.
- config/settings.json: overrides fetch limits (arxiv_max_results, arxiv_page_size, fetch_timeout_s, max_candidates_per_source). Updated via POST /api/settings.
- config/affiliations.json: list of {pattern, weight} boosts applied by substring match over affiliations. Weights add up and are capped at 1.0. Invalid JSON disables boosts, so keep the file strict JSON (no trailing commas).

During setup, review config/topics.json, config/settings.json, and config/affiliations.json (if present). Start with config/topics.json (topics[].id/label/description/keywords, max_per_topic): show the current defaults and ask whether to keep or change them.
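A minimal topics.json makes the fields concrete. This is a sketch: the field names come from this page, but the exact top-level shape (a "topics" array, suggested by the topics[].id notation) and the topic content are assumptions:

```shell
# Write a minimal topics.json to a scratch dir for illustration;
# point CONFIG_DIR at "$PROJECT_DIR/config" for real use.
# The topic itself is invented.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"
cat > "$CONFIG_DIR/topics.json" <<'EOF'
{
  "topics": [
    {
      "id": "llm-safety",
      "label": "LLM safety",
      "description": "Alignment, jailbreaks, and safety evaluation of language models.",
      "max_per_topic": 5,
      "keywords": ["alignment", "jailbreak", "red-teaming"]
    }
  ]
}
EOF
echo "wrote $CONFIG_DIR/topics.json"
```

The relevance classifier must emit topic IDs exactly as defined ("llm-safety" here), so keep IDs stable once chosen, and keep the file strict JSON with no trailing commas.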
- Set WINDOW_HOURS (or pass --window-hours to the CLI) only if the user cares; otherwise keep the default of 24h.
- ARXIV_CATEGORIES, ARXIV_MAX_RESULTS, ARXIV_PAGE_SIZE, MAX_CANDIDATES_PER_SOURCE: ask whether to keep the defaults and show the current values.
- Set OPENAI_API_KEY or LITELLM_API_KEY (plus LITELLM_API_BASE if using a proxy), and set LITELLM_MODEL_RELEVANCE/LITELLM_MODEL_SUMMARY.
- Default to PROJECT_DIR="$HOME/agentic_paper_digest" if the user doesn't care. Never hardcode /Users/... paths.
- If .env is missing, create it from .env.example (in the repo), then ask the user to fill in keys and any requested preferences.
- Verify OPENAI_API_KEY or LITELLM_API_KEY is set before running.
- Update topics and settings via POST /api/topics and POST /api/settings if running the API.
- Use scripts/run_cli.sh for one-off JSON output; use scripts/run_api.sh only if the user explicitly asks for UI/API access or polling.

Troubleshooting

- Raise WINDOW_HOURS or ARXIV_MAX_RESULTS (or broaden topics) when results are sparse; lower them if results are too noisy.
- Tune ARXIV_CATEGORIES to your research domains.
- Enable first-page PDF text (ENABLE_PDF_TEXT=1) when abstracts are too thin.
- If the port is busy, run bash "{baseDir}/scripts/stop_api.sh" or pass --port to the API command.
- If a run returns nothing, raise WINDOW_HOURS or verify the API key in .env.
- On auth errors, export OPENAI_API_KEY or LITELLM_API_KEY in the shell before running.

Generated Mar 1, 2026
Research institutions and university labs use this skill to automatically track and summarize the latest AI/ML papers from arXiv and Hugging Face. The JSON output feeds into internal dashboards, helping researchers stay current without manual paper scanning.
AI startups deploy this skill to monitor competitor publications and emerging techniques. The local API server allows continuous polling for new papers, with summaries helping technical teams quickly assess relevant innovations in their domain.
Venture capital and investment firms use the skill to identify promising AI research trends for investment opportunities. The topic filtering and affiliation boosting help prioritize papers from top institutions and emerging research areas.
Large tech corporations integrate this skill into their R&D workflows to automatically ingest and categorize relevant academic papers. The configurable topics and caps ensure teams receive tailored digests aligned with specific product development areas.
Technical journalists and content creators use the skill to generate timely summaries of AI research breakthroughs. The structured JSON output enables easy transformation into articles, newsletters, or social media content about the latest developments.
Offer a hosted version of the API server with managed infrastructure, eliminating the need for users to handle installation and configuration. Provide tiered subscriptions based on paper volume, API call limits, and advanced features like custom topic models.
License the skill to large organizations for internal deployment, with added enterprise features like SSO integration, audit logging, and priority support. Include customization services for specific industry use cases and integration with existing research platforms.
Process and enrich the paper data at scale, then sell access to the aggregated, cleaned dataset through API endpoints or data dumps. Offer specialized datasets focused on particular AI subfields, institutions, or geographic regions.
Integration Tip
Start with the CLI mode to validate the setup before deploying the API server for production workflows. Use the .env file for environment variables rather than exporting them in the shell for better security and reproducibility.
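Following that tip, a starting .env might look like this (all values are placeholders; the variable names are the ones documented above):

```shell
# Example PROJECT_DIR/.env — replace the placeholder values.
OPENAI_API_KEY=sk-replace-me
WINDOW_HOURS=24
ARXIV_CATEGORIES=cs.CL,cs.AI,cs.LG
LITELLM_TEMPERATURE_SUMMARY=0.2
```

The wrappers auto-load this file from PROJECT_DIR; point ENV_FILE at another path to override.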