An agentic system for hierarchical discovery and organization of research themes from ArXiv papers. Rather than exhaustive cataloging, LitReviews identifies genuinely central topics and their relationships through iterative refinement guided by AI critique.
Practitioners in fast-moving fields face a fundamental challenge: research directions shift quickly, and staying current across thousands of new papers published each month is infeasible through manual reading. Maintaining an up-to-date mental model of current research trends is therefore difficult.
LitReviews addresses this by automatically discovering trending research themes and their relationships, allowing researchers to rapidly understand the current landscape of any field and identify where innovation is accelerating.
The system employs a structured agent graph with the following stages:
- Theme Exploration - An LLM agent executes an OODA loop (Observe-Orient-Decide-Act) to iteratively discover research themes, search for relevant papers, and build theme hierarchies
- Validation - The theme structure is validated for consistency (parent-child relationships, no cycles, proper nesting)
- Critique Evaluation - An AI critic independently evaluates each theme on distinctiveness, coherence, and paper relevance alignment
- Refinement - Based on critic feedback (severity levels: CRITICAL, MAJOR, MINOR), the agent iteratively improves themes until convergence or the iteration limit is reached
- Selective, not exhaustive: Prioritizes finding 50 highly relevant papers over 500 marginally related ones
- Hierarchical organization: Themes nest meaningfully with root, parent, and leaf-level distinctions
- Multi-dimensional relevance: Papers evaluated on topic relevance, root theme relevance, and current landscape representation
- Token-aware processing: Real-time context management prevents exceeding LLM token limits through selective consolidation
- Provider-agnostic: Supports Anthropic Claude and Google Gemini interchangeably
The graph orchestrates theme discovery and refinement through a multi-stage pipeline. The explore node iteratively discovers themes via LLM tool-use. Validation checks for structural consistency before critique. If validation fails, the system returns to exploration. Critic feedback determines whether to continue refinement (returning to explore) or finalize output.
| Component | Role |
|---|---|
| explore | LLM-driven theme discovery with tool-use (add/update/delete themes, search papers) |
| validate_themes | Structural validation of theme hierarchy |
| evaluate_single_theme | Critic assessment of individual theme quality |
| compile_critic_scores | Aggregation of critique results across all themes |
| critic_feedback | Routing logic: continue refining or finalize |
| consolidate_history_node | Context window recovery when approaching token limits |
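
The wiring below is a minimal sketch of how this routing could be expressed with LangGraph (assuming that is the state-graph library behind the orchestration in arxiv_agent.py); node names match the table, but the state fields and node bodies are illustrative placeholders, not the project's actual code:

```python
# Illustrative LangGraph wiring of the pipeline above. State fields and
# node bodies are assumptions for the sketch, not the project's code.
from typing import Literal
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    themes: list          # current theme hierarchy (hypothetical field)
    validation_ok: bool   # result of structural validation
    critic_verdict: str   # "refine" or "finalize"

def route_after_validation(state: AgentState) -> Literal["explore", "evaluate_single_theme"]:
    # Failed validation sends the agent back to exploration.
    return "evaluate_single_theme" if state["validation_ok"] else "explore"

def route_after_critique(state: AgentState) -> str:
    # Critic feedback either triggers another refinement pass or finalizes.
    return END if state["critic_verdict"] == "finalize" else "explore"

graph = StateGraph(AgentState)
for name in ("explore", "validate_themes", "evaluate_single_theme",
             "compile_critic_scores", "critic_feedback"):
    graph.add_node(name, lambda state: state)  # placeholder node bodies

graph.set_entry_point("explore")
graph.add_edge("explore", "validate_themes")
graph.add_conditional_edges("validate_themes", route_after_validation)
graph.add_edge("evaluate_single_theme", "compile_critic_scores")
graph.add_edge("compile_critic_scores", "critic_feedback")
graph.add_conditional_edges("critic_feedback", route_after_critique)
app = graph.compile()
```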
The system integrates with ArXiv via a Model Context Protocol (MCP) server:
- Paper search by keyword/category
- Metadata retrieval (authors, abstract, publication date, citations)
- Citation graph parsing
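
For illustration, a search tool on such a server might look like the following, using the FastMCP helper from the MCP Python SDK together with the `arxiv` client library; the tool name and parameters here are assumptions, not the server's actual interface:

```python
# Hypothetical shape of an ArXiv search tool exposed over MCP.
import arxiv  # pip install arxiv
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("arxiv")

@mcp.tool()
def search_papers(query: str, category: str = "cs.AI", max_results: int = 10) -> list[dict]:
    """Search ArXiv by keyword within a category and return paper metadata."""
    search = arxiv.Search(
        query=f"cat:{category} AND all:{query}",
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
    )
    return [
        {
            "id": result.entry_id,
            "title": result.title,
            "authors": [a.name for a in result.authors],
            "abstract": result.summary,
            "published": result.published.isoformat(),
        }
        for result in arxiv.Client().results(search)
    ]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```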
Theme management tools:
- `add_research_theme()` - Create a new theme with a parent relationship
- `update_research_theme()` - Modify a theme's description and paper assignments
- `validate_theme_list()` - Check hierarchy consistency
- `complete_theme_draft()` - Mark a theme as finalized
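
A hypothetical call sequence is sketched below; the actual signatures live in `research_theme_tools.py` and may differ, so every argument name here is an assumption:

```python
# Illustrative use of the theme-management tools; all parameter names
# and return shapes are assumptions, not the module's real signatures.
from research_theme_tools import (
    add_research_theme,
    update_research_theme,
    validate_theme_list,
)

# Create a root theme, then nest a child under it (hypothetical arguments).
root = add_research_theme(topic="Agentic AI", parent=None,
                          description="LLM agents that plan and act autonomously")
child = add_research_theme(topic="Self-improvement loops", parent=root,
                           description="Agents that refine their own behavior")

# Attach papers discovered during exploration (hypothetical paper IDs).
update_research_theme(theme=child, paper_ids=["2401.00001", "2402.00002"])

# Check parent-child consistency, proper nesting, and absence of cycles.
issues = validate_theme_list([root, child])
```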
Create `config.yaml` from the template (not version controlled; it contains API keys):

```yaml
provider: "anthropic"

anthropic:
  api_key: ${ANTHROPIC_API_KEY}
  model: claude-haiku-4-5

google:
  api_key: ${GOOGLE_API_KEY}
  model: gemini-2.5-flash

critic:
  max_iterations: 2

rate_limiter:
  requests_per_second: 15
```
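
The `${VAR}` placeholders imply environment-variable expansion at load time. A minimal loader sketch, assuming PyYAML (the project's actual loader may differ):

```python
# Minimal config-loader sketch: expands ${VAR} placeholders from the
# environment before parsing. Illustrative only.
import os
import re
import yaml  # pip install pyyaml

def load_config(path: str = "config.yaml") -> dict:
    with open(path) as f:
        raw = f.read()
    # Replace each ${VAR} with the value of the environment variable VAR.
    expanded = re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), raw)
    return yaml.safe_load(expanded)

config = load_config()
provider = config["provider"]           # e.g. "anthropic"
api_key = config[provider]["api_key"]   # resolved from the environment
```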
Core Pydantic models ensure type safety and validation:

| Model | Purpose |
|---|---|
| `ResearchTheme` | Hierarchical theme with description, papers, and scoring |
| `Paper` | ArXiv paper metadata (id, title, authors, abstract, categories, age) |
| `ThemeCriticOutput` | Evaluation results with issue list (severity, recommendation) |
| `PaperAnalysis` | Paper with relevance justifications across multiple dimensions |
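
As a sketch of their shape, two of these models might look like the following; any field not named in the table above is an assumption:

```python
# Illustrative Pydantic models; fields beyond those listed in the
# table (e.g. age_days, score) are assumptions.
from pydantic import BaseModel, Field

class Paper(BaseModel):
    id: str                    # ArXiv identifier, e.g. "2401.00001"
    title: str
    authors: list[str]
    abstract: str
    categories: list[str]      # e.g. ["cs.AI", "cs.CL"]
    age_days: int              # assumed representation of "age"

class ResearchTheme(BaseModel):
    topic: str
    description: str
    parent: str | None = None  # topic of the parent theme; None for roots
    papers: list[Paper] = Field(default_factory=list)
    score: float | None = None # critic-assigned quality score (assumed)
```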
The system actively monitors context usage against a configurable `recommended_token` threshold (default: 64,000):
- 70%: Advisory message ("approaching limit")
- 80%: Warning state ("critical usage")
- 100%+: Force consolidation (summarizes conversation history)
Consolidation is order-independent and preserves theme structure while reducing verbosity of earlier exchanges.
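
The threshold logic reduces to a simple ratio check; a sketch, with token counting and the consolidation step stubbed out:

```python
# Sketch of the threshold logic described above; the returned labels
# mirror the 70% / 80% / 100%+ tiers.
RECOMMENDED_TOKENS = 64_000  # configurable recommended_token threshold

def check_context_usage(used_tokens: int) -> str:
    ratio = used_tokens / RECOMMENDED_TOKENS
    if ratio >= 1.0:
        return "consolidate"  # force-summarize conversation history
    if ratio >= 0.8:
        return "warning"      # critical usage
    if ratio >= 0.7:
        return "advisory"     # approaching limit
    return "ok"

# Example: 56,000 tokens used -> 87.5% of the threshold -> "warning"
assert check_context_usage(56_000) == "warning"
```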
Launch the interactive UI with:

```bash
streamlit run arxiv_agent_ui.py
```

Enter a research topic (e.g., "Continuous Improvement in Agentic AI") and the system will iteratively discover and refine themes.
For programmatic use:

```python
from arxiv_agent import create_and_run_agent

result = create_and_run_agent(
    topic="Your research question",
    config_path="config.yaml",
    recursion_limit=25,
)

# result contains the hierarchical ResearchTheme structure
for theme in result.themes:
    print(f"{theme.topic}: {len(theme.papers)} papers")
```

| File | Lines | Purpose |
|---|---|---|
| `arxiv_agent.py` | 1532 | State graph orchestration and node logic |
| `arxiv_server.py` | 706 | MCP server implementation for ArXiv |
| `arxiv_prompts.py` | 652 | System and dynamic instruction prompts |
| `arxiv_agent_ui.py` | 754 | Streamlit interface for interactive use |
| `research_theme_tools.py` | 516 | Theme CRUD operations and validation |
| `data_models.py` | 274 | Pydantic models for type safety |
- Iterative refinement: Critic-guided improvement ensures theme quality and distinctiveness
- Transparent reasoning: LLM maintains conversation history; decisions are traceable
- Structured output: Hierarchical themes are easily parsed and consumed by downstream systems
- Flexible scope: Token management allows handling 5-paper deep dives or 500-paper broad surveys
- Critique consistency: Critic evaluation can vary; weighted aggregation of multiple independent passes could improve stability
- Cold start problem: Initial theme discovery quality depends on LLM understanding of topic; few-shot examples in prompts could help
- Cross-theme relationships: Current design emphasizes tree structure; modeling sibling relationships or cross-cutting concerns remains unexplored
- Evaluation metrics: No ground-truth benchmark for theme hierarchy quality; developing reproducible evaluation metrics is an open question
- Python 3.13+
- API key for Claude (Anthropic) or Gemini (Google)
See `pyproject.toml` for dependency versions.
Built for researchers seeking rapid, AI-guided literature organization. Designed for reproducibility and extensibility across different research domains.
