AI-powered research theme discovery system for ArXiv papers. Automatically identifies trending research themes and their hierarchical relationships through agent-guided exploration and AI critique.

shilongdai/LitReviews

LitReviews

An agentic system for hierarchical discovery and organization of research themes from ArXiv papers. Rather than exhaustive cataloging, LitReviews identifies genuinely central topics and their relationships through iterative refinement guided by AI critique.

Problem Statement

Practitioners in fast-moving areas face a fundamental challenge: research directions shift quickly, and keeping up with the thousands of new papers published each month through manual reading is infeasible. As a result, it is hard to maintain an up-to-date mental model of current research trends.

LitReviews addresses this by automatically discovering trending research themes and their relationships, so that researchers can quickly understand the current landscape of a field covered by ArXiv and see where innovation is accelerating.

System Overview

Core Approach

The system employs a structured agent graph with the following stages:

  1. Theme Exploration - An LLM agent executes an OODA loop (Observe-Orient-Decide-Act) to iteratively discover research themes, search for relevant papers, and build theme hierarchies
  2. Validation - The theme structure is validated for consistency (parent-child relationships, no cycles, proper nesting)
  3. Critique Evaluation - An AI critic independently evaluates each theme on distinctiveness, coherence, and paper relevance alignment
  4. Refinement - Based on critic feedback (severity levels: CRITICAL, MAJOR, MINOR), the agent iteratively improves themes until they converge or the iteration limit is reached
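The refinement stop rule can be made concrete. Below is a minimal sketch. It assumes that only CRITICAL and MAJOR issues block convergence while MINOR ones are tolerated, and that the iteration budget is the `critic.max_iterations` value from the configuration. The `Issue` type, `should_refine`, and `themes_to_revisit` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "CRITICAL"
    MAJOR = "MAJOR"
    MINOR = "MINOR"


@dataclass
class Issue:
    # Hypothetical shape of one critic finding
    theme: str
    severity: Severity
    recommendation: str


# Assumption: only CRITICAL and MAJOR issues block convergence
BLOCKING = {Severity.CRITICAL, Severity.MAJOR}
_RANK = {Severity.CRITICAL: 0, Severity.MAJOR: 1, Severity.MINOR: 2}


def should_refine(issues: list[Issue], iteration: int, max_iterations: int) -> bool:
    """Return True if another explore/refine pass should run."""
    if iteration >= max_iterations:
        return False  # budget exhausted: finalize with the current themes
    return any(issue.severity in BLOCKING for issue in issues)


def themes_to_revisit(issues: list[Issue]) -> list[str]:
    """Themes with blocking issues, most severe first, without duplicates."""
    blocking = sorted(
        (i for i in issues if i.severity in BLOCKING), key=lambda i: _RANK[i.severity]
    )
    ordered: list[str] = []
    for issue in blocking:
        if issue.theme not in ordered:
            ordered.append(issue.theme)
    return ordered
```

Ordering the revisit list by severity lets the agent spend a limited iteration budget on the worst problems first.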

Design Principles

  • Selective, not exhaustive: Prioritizes finding 50 highly relevant papers over 500 marginally related ones
  • Hierarchical organization: Themes nest meaningfully with root, parent, and leaf-level distinctions
  • Multi-dimensional relevance: Papers evaluated on topic relevance, root theme relevance, and current landscape representation
  • Token-aware processing: Real-time context management prevents exceeding LLM token limits through selective consolidation
  • Provider-agnostic: Supports Anthropic Claude and Google Gemini interchangeably
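One way to combine "selective, not exhaustive" with multi-dimensional relevance is to use a conjunctive filter. A paper is kept only if it clears a bar on every dimension, and the result is capped at a small number. Below is a minimal sketch. The 0 to 1 scores, the 0.6 threshold, the min-based ranking, and the `ScoredPaper`/`select_papers` names are assumptions for illustration. The default cap of 50 mirrors the figure quoted above.

```python
from dataclasses import dataclass


@dataclass
class ScoredPaper:
    # Hypothetical per-dimension relevance scores in [0, 1]
    arxiv_id: str
    topic: float      # relevance to the specific theme
    root: float       # relevance to the root research topic
    landscape: float  # how representative of the current landscape


def weakest(p: ScoredPaper) -> float:
    return min(p.topic, p.root, p.landscape)


def select_papers(papers: list[ScoredPaper], threshold: float = 0.6, limit: int = 50) -> list[ScoredPaper]:
    """Keep papers that clear the threshold on every dimension, best first."""
    qualified = [p for p in papers if weakest(p) >= threshold]
    # Rank by the weakest dimension so one strong score cannot mask a weak one
    qualified.sort(key=weakest, reverse=True)
    return qualified[:limit]
```

A paper that is highly on-topic but unrelated to the root theme is dropped rather than ranked low. That is the behavior the "50 over 500" principle asks for.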

Architecture

Agent Graph

[Figure: LitReviews agent graph]

The graph orchestrates theme discovery and refinement through a multi-stage pipeline. The explore node iteratively discovers themes via LLM tool-use. Validation checks for structural consistency before critique. If validation fails, the system returns to exploration. Critic feedback determines whether to continue refinement (returning to explore) or finalize output.
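The two routing decisions described above can be written as plain functions of the graph state. Below is a minimal sketch. `GraphState` and its fields are hypothetical, and `"finalize"` stands in for whatever terminal node the project uses. If the graph is built with a library such as LangGraph (an assumption; this README does not name the framework), functions like these are what would be passed to `StateGraph.add_conditional_edges`.

```python
from dataclasses import dataclass, field


@dataclass
class GraphState:
    # Hypothetical subset of the agent state that the routers inspect
    validation_errors: list[str] = field(default_factory=list)
    has_blocking_issues: bool = False
    critic_iteration: int = 0
    max_iterations: int = 2  # mirrors critic.max_iterations in config.yaml


def route_after_validation(state: GraphState) -> str:
    """Structural errors go back to exploration before any critique runs."""
    return "explore" if state.validation_errors else "evaluate_single_theme"


def route_after_critic(state: GraphState) -> str:
    """Refine while blocking issues remain and the iteration budget allows."""
    if state.has_blocking_issues and state.critic_iteration < state.max_iterations:
        return "explore"
    return "finalize"
```

Keeping routers as pure functions of state makes the control flow testable without calling an LLM.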

State Components

| Component | Role |
|---|---|
| explore | LLM-driven theme discovery with tool use (add/update/delete themes, search papers) |
| validate_themes | Structural validation of the theme hierarchy |
| evaluate_single_theme | Critic assessment of individual theme quality |
| compile_critic_scores | Aggregation of critique results across all themes |
| critic_feedback | Routing logic: continue refining or finalize |
| consolidate_history_node | Context window recovery when approaching token limits |
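evaluate_single_theme runs once per theme, and compile_critic_scores merges the results. Below is a minimal sketch of such a merge. It assumes each critique yields integer scores on the three dimensions named under Core Approach (distinctiveness, coherence, paper relevance). The 1 to 5 scale, the `ThemeScore` type, and the summary keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThemeScore:
    # Hypothetical critic scores on a 1-5 scale for one theme
    theme: str
    distinctiveness: int
    coherence: int
    relevance: int

    def overall(self) -> float:
        return (self.distinctiveness + self.coherence + self.relevance) / 3


def compile_scores(scores: list[ThemeScore]) -> dict:
    """Aggregate per-theme critiques into one summary for the routing step."""
    if not scores:
        raise ValueError("no theme critiques to compile")
    weakest = min(scores, key=ThemeScore.overall)
    return {
        "mean_overall": sum(s.overall() for s in scores) / len(scores),
        "weakest_theme": weakest.theme,
        "weakest_score": weakest.overall(),
    }
```

Reporting the weakest theme alongside the mean keeps one poor theme from being hidden by several good ones.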

Tool Integration

The system integrates with ArXiv via a Model Context Protocol (MCP) server:

  • Paper search by keyword/category
  • Metadata retrieval (authors, abstract, publication date, citations)
  • Citation graph parsing
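The MCP server's internals are not shown here. For orientation, the sketch below performs a keyword plus category search against the public arXiv API (`http://export.arxiv.org/api/query`) and parses the Atom response with the standard library. It is not the project's server code, and the function names are hypothetical. It covers search and basic metadata only; the arXiv API response does not include citation data.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(keywords: str, category: str, max_results: int = 25) -> str:
    """Phrase search within one arXiv category, newest submissions first."""
    params = {
        "search_query": f'all:"{keywords}" AND cat:{category}',
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml: str) -> list[dict]:
    """Extract basic paper metadata from an arXiv Atom feed."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        text = lambda tag: entry.findtext(tag, default="", namespaces=ATOM)
        papers.append({
            "id": text("atom:id").rsplit("/abs/", 1)[-1],
            "title": " ".join(text("atom:title").split()),  # collapse line breaks
            "authors": [a.findtext("atom:name", default="", namespaces=ATOM)
                        for a in entry.findall("atom:author", ATOM)],
            "abstract": text("atom:summary").strip(),
            "published": text("atom:published"),
            "categories": [c.get("term") for c in entry.findall("atom:category", ATOM)],
        })
    return papers


def search(keywords: str, category: str, max_results: int = 25) -> list[dict]:
    """Network call; space out repeated requests to respect arXiv's API terms."""
    url = build_query_url(keywords, category, max_results)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_feed(resp.read().decode("utf-8"))
```

For example, `search("agentic AI", "cs.AI", 10)` returns a list of dicts with id, title, authors, abstract, published date, and categories.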

Theme management tools:

  • add_research_theme() - Create new theme with parent relationship
  • update_research_theme() - Modify theme description and paper assignments
  • validate_theme_list() - Check hierarchy consistency
  • complete_theme_draft() - Mark theme as finalized
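This README does not show how validate_theme_list() works. Below is a minimal sketch of the structural checks listed under Core Approach (parent-child consistency, no cycles). It assumes themes can be reduced to a mapping from topic to parent topic, with roots mapping to None. `check_theme_hierarchy` and that input shape are hypothetical.

```python
from typing import Optional


def check_theme_hierarchy(themes: dict[str, Optional[str]]) -> list[str]:
    """Validate a {theme: parent} mapping; an empty list means it is consistent."""
    errors: list[str] = []
    # 1. Every non-root parent must exist and must not be the theme itself
    for theme, parent in themes.items():
        if parent is None:
            continue
        if parent == theme:
            errors.append(f"{theme!r} is its own parent")
        elif parent not in themes:
            errors.append(f"{theme!r} has unknown parent {parent!r}")
    # 2. A non-empty hierarchy needs at least one root
    if themes and all(parent is not None for parent in themes.values()):
        errors.append("no root theme")
    # 3. Walk up from each theme; reaching a node already on the path is a cycle
    reported: set[frozenset[str]] = set()
    for start in themes:
        path: list[str] = []
        node: Optional[str] = start
        while node is not None and node in themes and node not in path:
            path.append(node)
            node = themes[node]
        if node is not None and node in path:
            loop = path[path.index(node):]
            # Self-loops were already reported in step 1; report each cycle once
            if len(loop) > 1 and frozenset(loop) not in reported:
                reported.add(frozenset(loop))
                errors.append("cycle: " + " -> ".join(loop + [node]))
    return errors
```

Returning every problem at once, rather than raising on the first, gives the explore agent a complete list to fix in a single pass.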

Implementation Details

Configuration

Create config.yaml from the template (not version controlled; contains API keys):

```yaml
provider: "anthropic"

anthropic:
  api_key: ${ANTHROPIC_API_KEY}
  model: claude-haiku-4-5

google:
  api_key: ${GOOGLE_API_KEY}
  model: gemini-2.5-flash

critic:
  max_iterations: 2

rate_limiter:
  requests_per_second: 15
```
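YAML has no `${VAR}` syntax of its own. To keep keys in the environment rather than in the file, whatever loads the config must expand the placeholders. Whether LitReviews does this is not stated here. Below is a minimal sketch that works on the already-parsed dict, for example the output of PyYAML's `yaml.safe_load`. `expand_env` and `provider_settings` are hypothetical helpers.

```python
import os
import re

_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value):
    """Recursively replace ${VAR} placeholders with environment values."""
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in os.environ:
                # Fail loudly instead of sending a literal "${...}" as an API key
                raise KeyError(f"environment variable {name} is not set")
            return os.environ[name]
        return _ENV_PATTERN.sub(lookup, value)
    return value


def provider_settings(config: dict) -> dict:
    """Return the expanded api_key/model block for the selected provider only."""
    provider = config["provider"]
    if provider not in ("anthropic", "google"):
        raise ValueError(f"unsupported provider: {provider}")
    return expand_env(config[provider])
```

Only the selected provider's block is expanded, so an unset `GOOGLE_API_KEY` does not break a run configured for Anthropic. Unlike `os.path.expandvars`, which leaves unknown variables untouched, this raises on a missing variable.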

Data Models

Core Pydantic models ensure type safety and validation:

| Model | Purpose |
|---|---|
| ResearchTheme | Hierarchical theme with description, papers, and scoring |
| Paper | ArXiv paper metadata (id, title, authors, abstract, categories, age) |
| ThemeCriticOutput | Evaluation results with issue list (severity, recommendation) |
| PaperAnalysis | Paper with relevance justifications across multiple dimensions |
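The project defines these models with Pydantic. So that the sketch below runs on the standard library alone, it uses dataclasses with a hand-written check in `__post_init__`. With Pydantic you would subclass `BaseModel` and get validation from the type annotations. The field names come from the table above; `published`, `parent`, `score`, the types, and the derived age are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Paper:
    # Fields mirror the Paper row above; types and validation are assumptions
    id: str
    title: str
    authors: tuple[str, ...]
    abstract: str
    categories: tuple[str, ...]
    published: date

    def __post_init__(self):
        if not self.id or not self.title:
            raise ValueError("a paper needs an id and a title")

    def age_days(self, today: date) -> int:
        """Age relative to a reference date, e.g. to favour recent work."""
        return (today - self.published).days


@dataclass
class ResearchTheme:
    topic: str
    description: str
    parent: Optional[str] = None  # None marks a root theme
    papers: list[Paper] = field(default_factory=list)
    score: Optional[float] = None

    @property
    def is_root(self) -> bool:
        return self.parent is None
```

Passing the reference date into `age_days` keeps the computation deterministic, which helps when comparing runs.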

Token Management Strategy

The system monitors context usage against a configurable recommended_token threshold (default: 64,000 tokens) and escalates as usage grows:

  • 70%: Advisory message ("approaching limit")
  • 80%: Warning state ("critical usage")
  • 100%+: Force consolidation (summarizes conversation history)

Consolidation is order-independent and preserves theme structure while reducing verbosity of earlier exchanges.
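The threshold ladder above maps directly onto a small classifier, and the consolidation step can be written against an injected summarizer. Below is a minimal sketch. It assumes the theme hierarchy lives in graph state outside the message list, so consolidation only touches conversation turns. `context_status`, `consolidate`, `keep_recent`, and the `summarize` callback (an LLM call in practice) are hypothetical.

```python
from typing import Callable

ADVISORY, WARNING, CONSOLIDATE = 0.70, 0.80, 1.00


def context_status(used_tokens: int, recommended_tokens: int = 64_000) -> str:
    """Classify context usage against the recommended threshold."""
    ratio = used_tokens / recommended_tokens
    if ratio >= CONSOLIDATE:
        return "consolidate"
    if ratio >= WARNING:
        return "warning"
    if ratio >= ADVISORY:
        return "advisory"
    return "ok"


def consolidate(messages: list[str], keep_recent: int,
                summarize: Callable[[list[str]], str]) -> list[str]:
    """Replace all but the most recent messages with one summary message."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(messages) <= keep_recent:
        return list(messages)
    split = len(messages) - keep_recent
    return [summarize(messages[:split])] + messages[split:]
```

Keeping the most recent turns verbatim preserves the agent's immediate working context while older exchanges shrink to a summary.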

Usage

Web Interface

```shell
streamlit run arxiv_agent_ui.py
```

Enter a research topic (e.g., "Continuous Improvement in Agentic AI") and the system will iteratively discover and refine themes.

Programmatic API

```python
from arxiv_agent import create_and_run_agent

result = create_and_run_agent(
    topic="Your research question",
    config_path="config.yaml",
    recursion_limit=25
)

# result contains hierarchical ResearchTheme structure
for theme in result.themes:
    print(f"{theme.topic}: {len(theme.papers)} papers")
```

Key Files

| File | Lines | Purpose |
|---|---|---|
| arxiv_agent.py | 1532 | State graph orchestration and node logic |
| arxiv_server.py | 706 | MCP server implementation for ArXiv |
| arxiv_prompts.py | 652 | System and dynamic instruction prompts |
| arxiv_agent_ui.py | 754 | Streamlit interface for interactive use |
| research_theme_tools.py | 516 | Theme CRUD operations and validation |
| data_models.py | 274 | Pydantic models for type safety |

Discussion

Strengths

  • Iterative refinement: Critic-guided revision aims to improve theme quality and distinctiveness over successive passes
  • Transparent reasoning: LLM maintains conversation history; decisions are traceable
  • Structured output: Hierarchical themes are easily parsed and consumed by downstream systems
  • Flexible scope: Token management allows handling 5-paper deep dives or 500-paper broad surveys

Limitations & Future Work

  1. Critique consistency: Critic evaluation can vary; weighted aggregation of multiple independent passes could improve stability
  2. Cold start problem: Initial theme discovery quality depends on LLM understanding of topic; few-shot examples in prompts could help
  3. Cross-theme relationships: Current design emphasizes tree structure; modeling sibling relationships or cross-cutting concerns remains unexplored
  4. Evaluation metrics: No ground-truth benchmark for theme hierarchy quality; developing reproducible evaluation metrics is an open question
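Limitation 1 proposes weighted aggregation of several independent critic passes. Below is a minimal sketch of that idea for a single theme's score. The weights (for example, trusting a stronger critic model more) and the spread measure are assumptions; the project does not currently implement this. `aggregate_passes` is a hypothetical name.

```python
from statistics import pstdev
from typing import Optional


def aggregate_passes(scores: list[float],
                     weights: Optional[list[float]] = None) -> tuple[float, float]:
    """Combine one theme's scores from several independent critic passes.

    Returns (weighted mean, unweighted population standard deviation).
    A large spread flags themes on which the critic disagrees with itself.
    """
    if not scores:
        raise ValueError("need at least one critic pass")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores) or sum(weights) <= 0:
        raise ValueError("weights must match scores and sum to a positive value")
    mean = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return mean, pstdev(scores)
```

Routing on the mean and surfacing high-spread themes for another pass would address the stability concern without changing the critic itself.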

Requirements

  • Python 3.13+
  • API key for Claude (Anthropic) or Gemini (Google)

See pyproject.toml for dependency versions.


Built for researchers seeking rapid, AI-guided literature organization. Designed for reproducibility and extensibility across different research domains.
