
DocThinker

Self-Evolving Knowledge Graphs · Tiered Memory · Structured Reasoning

Language captures the results of cognition, while cognition itself encompasses perception, experience, and reasoning.

Paper License: MIT Demo LightRAG OpenClaw

Python FastAPI Flask NetworkX FAISS

English | 中文


DocThinker is a document-driven RAG system that constructs self-evolving knowledge graphs from uploaded documents. Unlike conventional retrieve-then-respond pipelines, DocThinker treats knowledge as a dynamic graph that expands and refines itself as documents are ingested and queried.

🎬 Explore our tutorials!

▶️ Watch the YouTube Tutorial | 🚀 Use DocThinker in HuggingFace Space | 📝 Try Colab Tutorial




🚀 Quick Install

We recommend using Python version 3.10 or higher for DocThinker.

# 1. Clone the repository
git clone https://github.com/Yang-Jiashu/doc-thinker.git
cd doc-thinker

# 2. Create a virtual environment
conda create -n docthinker python=3.11 -y
conda activate docthinker

# 3. Install dependencies
pip install -r requirements.txt
pip install -e .

🔥 Quick Start

1. Web UI & Server

The easiest way to experience DocThinker is through its web dashboard.

# 1. Configure environment variables (LLM API Keys)
cp env.example .env

# 2. Start the Backend API (FastAPI)
python -m uvicorn docthinker.server.app:app --host 0.0.0.0 --port 8000

# 3. Start the Frontend UI (Flask)
python run_ui.py

Open http://localhost:5000 — upload a PDF, ask questions, and explore the evolving knowledge graph.

2. Python API Usage

You can also use DocThinker programmatically with just a few lines of code.

import asyncio
from docthinker import DocThinker, DocThinkerConfig

async def main():
    # 1. Configuration
    config = DocThinkerConfig(working_dir="./my_knowledge_base")
    
    # 2. Initialize (Requires LLM and Embedding models setup)
    dt = DocThinker(config=config, ...) 
    
    # 3. Ingest Document (Parsing & Knowledge Graph Construction)
    await dt.process_document_complete("your_document.pdf")
    
    # 4. Trigger Test-Time Scaling (Self-Study Loop) to enhance KG density
    await dt.run_self_study_loop(max_rounds=5)
    
    # 5. Query with SPARQL CoT Reasoning
    response = await dt.aquery("What is the core idea of the document?", mode="deep")
    print(response)

asyncio.run(main())

🧬 Key Contributions

DocThinker splits the monolithic pipeline into autonomous agents and introduces graph-based cognitive reasoning.

DocThinker Pipeline

Figure 1. DocThinker end-to-end pipeline — from document input to knowledge graph construction, tiered memory management, hybrid retrieval & reasoning, and output with feedback back to the graph.

1. 🧠 Test-Time Scaling & Agentic Memory

Between document ingestion and user querying, DocThinker runs a background self-study loop (Test-Time Scaling on KG). The LLM autonomously analyzes existing subgraphs, generates questions, retrieves answers, performs continuous deductive reasoning, and writes back new knowledge and methodological experiences (entity_type="experience"). This significantly increases graph density and reasoning capability without requiring additional user prompts.
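The round described above can be sketched in a few lines. This is a minimal illustration over a dict-based graph, not the internal API: `generate_question` and `answer_question` are hypothetical stand-ins for the LLM and retriever calls, and only the `entity_type="experience"` write-back mirrors the text directly.

```python
# Minimal sketch of one self-study round over a dict-based knowledge graph.
# generate_question / answer_question are hypothetical stand-ins for LLM calls.
def self_study_round(graph, generate_question, answer_question):
    """Pick the highest-degree node, ask about it, and write the answer back."""
    # Choose the node with the most outgoing edges as the study focus.
    focus = max(graph["nodes"], key=lambda n: len(graph["edges"].get(n, [])))
    question = generate_question(focus)
    answer = answer_question(question)
    # Write back the new knowledge as an "experience" node linked to the focus.
    node_id = f"exp::{focus}"
    graph["nodes"][node_id] = {"entity_type": "experience", "text": answer}
    graph["edges"].setdefault(focus, []).append(node_id)
    return node_id
```

In the real loop this would run for several rounds (`max_rounds` in `run_self_study_loop`), each round densifying the graph a little further.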

2. 🔀 Two-Path KG Self-Expansion

Expansion operates in two complementary passes:

  • Path A (Cluster-based): HDBSCAN clusters entity embeddings → LLM generates cluster summaries → expands new entities grounded in cluster themes.
  • Path B (Top-N multi-angle): Top-50 highest-degree nodes expanded across 6 cognitive dimensions (hierarchy, causation, analogy, contrast, temporal, application).
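Path B can be sketched as degree ranking plus a cross-product with the six dimensions. This is an illustrative skeleton, assuming a plain adjacency-dict graph; the actual expansion prompts each (node, dimension) pair through the LLM.

```python
# Sketch of Path B target selection: rank nodes by degree, then pair the
# top n with each cognitive dimension. Assumes an adjacency-dict graph.
DIMENSIONS = ["hierarchy", "causation", "analogy", "contrast",
              "temporal", "application"]

def top_n_expansion_targets(edges, n=50):
    """Return (node, dimension) pairs for the n highest-degree nodes."""
    degree = {node: len(neighbors) for node, neighbors in edges.items()}
    top = sorted(degree, key=degree.get, reverse=True)[:n]
    return [(node, dim) for node in top for dim in DIMENSIONS]
```

Path A would replace the degree ranking with HDBSCAN clustering over entity embeddings, expanding from LLM-written cluster summaries instead of individual hub nodes.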

3. 🔄 Self-Evolving Knowledge Graph

Newly expanded nodes do not immediately become authoritative knowledge — they enter the graph as candidates. Only when users repeatedly adopt a node in actual conversations do its usage count and score accumulate; once thresholds are met, the node is promoted to a formal part of the graph.
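The promotion rule can be sketched as a counter plus a score that cross thresholds together. The thresholds and score increment below are hypothetical placeholders; in DocThinker they are configurable.

```python
# Sketch of candidate-node promotion. The thresholds (usage >= 3,
# score >= 0.5) and score_delta are hypothetical placeholder values.
def record_adoption(node, score_delta=0.2, usage_threshold=3,
                    score_threshold=0.5):
    """Bump a candidate node's stats; promote it once both thresholds are met."""
    node["usage"] = node.get("usage", 0) + 1
    node["score"] = node.get("score", 0.0) + score_delta
    if node["usage"] >= usage_threshold and node["score"] >= score_threshold:
        node["status"] = "promoted"
    return node
```

A matching decay step (not shown) would lower the score of nodes that keep getting retrieved but never adopted, eventually evicting stale candidates.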

4. 🤖 Multi-Agent Co-Evolution

DocThinker splits the traditional RAG monolithic pipeline into three specialized Agents:

  • Retrieval Agent: Maximizes retrieval hit rate.
  • Extraction Agent: Maximizes extraction coverage.
  • Answering Agent: Generates final answers and triggers node promotion/decay feedback.
Multi-Agent Co-Evolution Architecture

Figure 2. DocThinker multi-Agent co-evolution architecture.
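The hand-off between the three agents can be sketched as a simple pipeline with a feedback hook at the end. Plain functions stand in for the agents here; this is a shape sketch, not DocThinker's actual interfaces.

```python
# Sketch of the three-agent hand-off; each callable stands in for one agent.
def answer_query(query, retrieve, extract, answer, feedback):
    """Run retrieval -> extraction -> answering, then feed usage back."""
    hits = retrieve(query)           # Retrieval Agent: candidate passages
    facts = extract(hits)            # Extraction Agent: structured facts
    response = answer(query, facts)  # Answering Agent: final text
    feedback(facts)                  # promote/decay the nodes that were used
    return response
```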

5. 🗃️ Tiered Conversation Memory (Claw)

Inspired by the OpenClaw / Letta architecture, Claw implements a three-layer memory hierarchy (Hot, Warm, Cold) for unbounded conversation length.
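A three-tier memory of this shape can be sketched with two bounded queues and an unbounded archive. The capacities and the truncation-based summarizer below are hypothetical; Claw's real demotion and summarization policy differs.

```python
# Sketch of a Hot/Warm/Cold memory. Capacities and the default summarizer
# (plain truncation) are hypothetical placeholders.
from collections import deque

class TieredMemory:
    def __init__(self, hot_size=4, warm_size=16):
        self.hot = deque(maxlen=hot_size)    # verbatim recent turns
        self.warm = deque(maxlen=warm_size)  # summarized older turns
        self.cold = []                       # archived, retrieval-only

    def add(self, turn, summarize=lambda t: t[:40]):
        if len(self.hot) == self.hot.maxlen:
            demoted = self.hot[0]            # oldest hot turn gets demoted
            if len(self.warm) == self.warm.maxlen:
                self.cold.append(self.warm[0])  # archive before eviction
            self.warm.append(summarize(demoted))
        self.hot.append(turn)                # deque drops the oldest itself
```

The payoff is that conversation length is unbounded: only the hot tier is replayed verbatim into the prompt, while warm summaries and cold archives are pulled in on demand.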

6. 🧠 SPARQL Chain-of-Thought (CoT) Reasoning

Complex queries are internally decomposed into SPARQL-style triple-pattern chains before answer generation. The LLM binds variables against KG context via shared-variable chaining.

SPARQL CoT Reasoning
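Shared-variable chaining can be sketched directly over (subject, predicate, object) facts: each pattern narrows the binding set, and variables bound by one pattern constrain the next. This is a minimal illustration of the matching semantics, not DocThinker's internal reasoner (which binds variables via the LLM against KG context).

```python
# Sketch of shared-variable triple-pattern chaining, SPARQL-style.
# Terms starting with "?" are variables; others must match literally.
def match_chain(triples, patterns):
    """Bind ?variables across a chain of triple patterns."""
    bindings = [{}]  # start with one empty binding
    for s, p, o in patterns:
        next_bindings = []
        for binding in bindings:
            for ts, tp, to in triples:
                trial = dict(binding)
                ok = True
                for pat, val in ((s, ts), (p, tp), (o, to)):
                    if pat.startswith("?"):
                        if trial.get(pat, val) != val:
                            ok = False  # variable already bound differently
                            break
                        trial[pat] = val
                    elif pat != val:
                        ok = False      # literal mismatch
                        break
                if ok:
                    next_bindings.append(trial)
        bindings = next_bindings
    return bindings
```

For example, the chain `("alice", "knows", "?x")` then `("?x", "works_at", "?y")` forces `?x` to take the same value in both patterns, which is what turns independent lookups into a reasoning chain.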

💡 Use Cases

"Upload a novel and explore its knowledge graph"

"Deep-mode conversation with SPARQL CoT reasoning and tiered memory"


⚡ Query Modes

| Mode | Strategy | Latency | Depth |
|---|---|---|---|
| Fast | Vector similarity | ~1 s | Shallow |
| Standard | Hybrid KG + vector + reranking | ~3 s | Medium |
| Deep | SPARQL CoT + spreading activation + episodic memory + expansion matching + post-query feedback | ~8 s | Full |

📄 PDF Processing

| Mode | Engine | Best for |
|---|---|---|
| auto (default) | VLM (short) / MinerU (long) | General use |
| vlm | Cloud VLM (Qwen-VL) | Image-heavy documents |
| mineru | MinerU layout engine | Long documents with complex tables |
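The `auto` mode's short/long split can be sketched as a page-count heuristic. The threshold below is a hypothetical placeholder; the real cutoff lives in the ingestion pipeline.

```python
# Sketch of the auto engine choice. short_doc_pages is a hypothetical
# placeholder threshold, not DocThinker's actual cutoff.
def pick_pdf_engine(mode, page_count, short_doc_pages=20):
    """Resolve an explicit mode, or split auto by document length."""
    if mode in ("vlm", "mineru"):
        return mode
    # auto: cloud VLM for short docs, MinerU layout engine for long ones
    return "vlm" if page_count <= short_doc_pages else "mineru"
```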

📡 API Reference

| Category | Endpoint | Method | Description |
|---|---|---|---|
| Sessions | /sessions | GET / POST | List / create sessions |
| | /sessions/{id}/history | GET | Chat history |
| | /sessions/{id}/files | GET | Ingested files |
| Ingest | /ingest | POST | Upload PDF / TXT |
| | /ingest/stream | POST | Stream raw text |
| Query | /query/stream | POST | SSE streaming query |
| | /query | POST | Non-streaming query |
| KG | /knowledge-graph/data | GET | Nodes + edges for visualization |
| | /knowledge-graph/expand | POST | Trigger 2-path expansion |
| | /knowledge-graph/stats | GET | KG statistics |
| Memory | /memory/stats | GET | Episode + Claw memory stats |
| | /memory/consolidate | POST | Run episodic consolidation |
| Settings | /settings | GET / POST | Runtime config |

📝 Citation

If you find DocThinker useful in your research, please cite:

@article{docthinker2026,
  title={DocThinker: Self-Evolving Knowledge Graphs with Tiered Memory and Structured Reasoning for Document Understanding},
  author={Yang, Jiashu},
  journal={arXiv preprint arXiv:2603.05551},
  year={2026}
}

🤝 Contributing

PRs and issues welcome! See CONTRIBUTING.md.

📜 License

MIT
