Self-Evolving Knowledge Graphs · Tiered Memory · Structured Reasoning
Language captures the results of cognition, while cognition itself encompasses perception, experience, and reasoning.
DocThinker is a document-driven RAG system that constructs self-evolving knowledge graphs from uploaded documents. Unlike conventional retrieve-then-respond pipelines, DocThinker treats knowledge as a dynamic graph.
▶️ Watch the YouTube Tutorial | 🚀 Use DocThinker in HuggingFace Space | 📝 Try Colab Tutorial
- 🚀 Quick Install
- 🔥 Quick Start
- 🧬 Key Contributions (Pipeline)
- 💡 Use Cases
- ⚡ Query Modes & PDF Processing
- 📡 API Reference
- ❓ FAQ
We recommend using Python version 3.10 or higher for DocThinker.
# 1. Clone the repository
git clone https://github.com/Yang-Jiashu/doc-thinker.git
cd doc-thinker
# 2. Create a virtual environment
conda create -n docthinker python=3.11 -y
conda activate docthinker
# 3. Install dependencies
pip install -r requirements.txt
pip install -e .

The easiest way to experience DocThinker is through its web dashboard.
# 1. Configure environment variables (LLM API Keys)
cp env.example .env
# 2. Start the Backend API (FastAPI)
python -m uvicorn docthinker.server.app:app --host 0.0.0.0 --port 8000
# 3. Start the Frontend UI (Flask)
python run_ui.py

Open http://localhost:5000, then upload a PDF, ask questions, and explore the evolving knowledge graph.
You can also use DocThinker programmatically with just a few lines of code.
import asyncio
from docthinker import DocThinker, DocThinkerConfig
async def main():
    # 1. Configuration
    config = DocThinkerConfig(working_dir="./my_knowledge_base")

    # 2. Initialize (requires LLM and embedding model setup)
    dt = DocThinker(config=config, ...)

    # 3. Ingest document (parsing & knowledge graph construction)
    await dt.process_document_complete("your_document.pdf")

    # 4. Trigger Test-Time Scaling (self-study loop) to enhance KG density
    await dt.run_self_study_loop(max_rounds=5)

    # 5. Query with SPARQL CoT reasoning
    response = await dt.aquery("What is the core idea of the document?", mode="deep")
    print(response)

asyncio.run(main())

DocThinker splits the monolithic pipeline into autonomous agents and introduces graph-based cognitive reasoning.
Figure 1. DocThinker end-to-end pipeline — from document input to knowledge graph construction, tiered memory management, hybrid retrieval & reasoning, and output with feedback back to the graph.
Between document ingestion and user querying, DocThinker runs a background self-study loop (Test-Time Scaling on KG). The LLM autonomously analyzes existing subgraphs, generates questions, retrieves answers, performs continuous deductive reasoning, and writes back new knowledge and methodological experiences (entity_type="experience"). This significantly increases graph density and reasoning capability without requiring additional user prompts.
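The loop can be sketched in miniature as follows. All names here (`self_study_round`, the `kg` dict layout) are hypothetical stand-ins for illustration, not the actual DocThinker API; the LLM calls are stubbed out:

```python
# Toy sketch of the self-study loop (Test-Time Scaling on a KG).
# Every name below is a hypothetical stand-in, not DocThinker's real internals.

def self_study_round(graph: dict, rnd: int) -> dict:
    """One round: analyze a subgraph, ask a question, write back new knowledge."""
    # 1. Analyze existing subgraphs (here: just pick the densest node).
    focus = max(graph, key=lambda n: len(graph[n]["edges"]))
    # 2. Generate a question about it (unused here; would be sent to the LLM).
    question = f"What else relates to {focus}?"
    # 3. Retrieve + reason (stubbed): derive a new fact from the focus node.
    new_entity = f"{focus}_insight_{rnd}"
    # 4. Write back new knowledge as a methodological-experience node.
    graph[new_entity] = {"edges": [focus], "entity_type": "experience"}
    graph[focus]["edges"].append(new_entity)
    return graph

kg = {"transformer": {"edges": ["attention"]},
      "attention": {"edges": ["transformer"]}}
for rnd in range(3):  # bounded rounds, like run_self_study_loop(max_rounds=5)
    kg = self_study_round(kg, rnd)
print(len(kg))  # graph density grows each round: 2 -> 5 nodes
```

Each round adds one `entity_type="experience"` node and one edge, so graph density grows without any user prompt, mirroring the description above.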
Expansion operates in two complementary passes:
- Path A (Cluster-based): HDBSCAN clusters entity embeddings → LLM generates cluster summaries → expands new entities grounded in cluster themes.
- Path B (Top-N multi-angle): Top-50 highest-degree nodes expanded across 6 cognitive dimensions (hierarchy, causation, analogy, contrast, temporal, application).
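Path B's node selection can be sketched with a few lines of plain Python (the edge list and Top-2 cutoff are toy stand-ins; DocThinker uses the Top-50):

```python
from collections import Counter

# Sketch of Path B: pick the highest-degree nodes and expand each along
# six cognitive dimensions. Data structures here are illustrative only.
DIMENSIONS = ["hierarchy", "causation", "analogy",
              "contrast", "temporal", "application"]

edges = [("transformer", "attention"), ("transformer", "bert"),
         ("attention", "softmax"), ("bert", "nlp")]

degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

top_n = [node for node, _ in degree.most_common(2)]  # Top-50 in DocThinker
# Each (node, dimension) pair would become one LLM expansion prompt.
prompts = [f"Expand '{n}' along the '{d}' dimension."
           for n in top_n for d in DIMENSIONS]
print(len(prompts))  # 2 nodes x 6 dimensions = 12 prompts
```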
Newly expanded nodes do not immediately become authoritative knowledge — they enter the graph as candidates. Only when users repeatedly adopt a node in actual conversations do its usage count and score accumulate; once thresholds are met, the node is promoted to a formal part of the graph.
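A minimal sketch of this candidate-to-formal promotion, assuming made-up threshold values (the real thresholds are internal to DocThinker):

```python
# Illustrative promotion thresholds; NOT DocThinker's actual values.
PROMOTE_USES = 3
PROMOTE_SCORE = 2.0

class CandidateNode:
    """An expanded node that must earn its place in the graph."""
    def __init__(self, name: str):
        self.name = name
        self.uses = 0
        self.score = 0.0
        self.status = "candidate"

    def record_adoption(self, reward: float = 1.0) -> None:
        """Called when a user's conversation actually adopts this node."""
        self.uses += 1
        self.score += reward
        if self.uses >= PROMOTE_USES and self.score >= PROMOTE_SCORE:
            self.status = "formal"  # promoted into the authoritative graph

node = CandidateNode("attention_is_weighted_average")
for _ in range(3):
    node.record_adoption()
print(node.status)  # formal
```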
DocThinker splits the traditional RAG monolithic pipeline into three specialized Agents:
- Retrieval Agent: Maximizes retrieval hit rate.
- Extraction Agent: Maximizes extraction coverage.
- Answering Agent: Generates final answers and triggers node promotion/decay feedback.
Inspired by the OpenClaw / Letta architecture, DocThinker's Claw memory module implements a three-layer memory hierarchy (Hot, Warm, Cold) to support unbounded conversation length.
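The tier mechanics can be illustrated with a toy demotion chain (the capacities and the string-based "summaries" below are invented for the sketch, not Claw's actual behavior):

```python
from collections import deque

class TieredMemory:
    """Toy three-tier memory: Hot = recent turns kept verbatim,
    Warm = summarized turns, Cold = archived items.
    Capacities are illustrative, not DocThinker's actual settings."""

    def __init__(self, hot_cap: int = 2, warm_cap: int = 3):
        self.hot, self.warm, self.cold = deque(), deque(), []
        self.hot_cap, self.warm_cap = hot_cap, warm_cap

    def add_turn(self, turn: str) -> None:
        self.hot.append(turn)
        if len(self.hot) > self.hot_cap:       # demote oldest hot turn
            self.warm.append("summary:" + self.hot.popleft())
        if len(self.warm) > self.warm_cap:     # archive oldest warm item
            self.cold.append(self.warm.popleft())

mem = TieredMemory()
for i in range(8):
    mem.add_turn(f"turn{i}")
print(len(mem.hot), len(mem.warm), len(mem.cold))  # 2 3 3
```

Because overflow only cascades downward, the in-context Hot tier stays bounded no matter how long the conversation runs, which is what makes unbounded conversation length workable.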
Complex queries are internally decomposed into SPARQL-style triple-pattern chains before answer generation. The LLM binds variables against KG context via shared-variable chaining.
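Shared-variable chaining can be shown with a small triple matcher. The KG triples and the query below are toy examples for illustration, not DocThinker internals:

```python
# Sketch of SPARQL-style triple-pattern chaining with shared variables.
triples = [
    ("DocThinker", "uses", "knowledge_graph"),
    ("knowledge_graph", "stored_in", "working_dir"),
    ("DocThinker", "uses", "tiered_memory"),
]

def match(pattern, bindings):
    """Yield extended bindings for one (s, p, o) pattern; '?x' terms are variables."""
    for triple in triples:
        new, ok = dict(bindings), True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if term in new and new[term] != value:
                    ok = False
                    break
                new[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield new

def chain(patterns):
    """Bind patterns left to right, threading shared variables through."""
    results = [{}]
    for p in patterns:
        results = [b for r in results for b in match(p, r)]
    return results

# "What does DocThinker use, and where is it stored?"
query = [("DocThinker", "uses", "?x"), ("?x", "stored_in", "?y")]
print(chain(query))  # [{'?x': 'knowledge_graph', '?y': 'working_dir'}]
```

The shared `?x` is what chains the two patterns: `tiered_memory` matches the first pattern but is pruned because nothing binds it in the second.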
| Mode | Strategy | Latency | Depth |
|---|---|---|---|
| Fast | Vector similarity | ~1 s | Shallow |
| Standard | Hybrid KG + vector + reranking | ~3 s | Medium |
| Deep | SPARQL CoT + spreading activation + episodic memory + expansion matching + post-query feedback | ~8 s | Full |
| Mode | Engine | Best for |
|---|---|---|
| `auto` (default) | VLM (short) / MinerU (long) | General use |
| `vlm` | Cloud VLM (Qwen-VL) | Image-heavy documents |
| `mineru` | MinerU layout engine | Long documents with complex tables |
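The `auto` routing logic amounts to a short/long split. A sketch, assuming a hypothetical 20-page cutoff (the real threshold is not documented here):

```python
# Illustrative routing for the `auto` parse mode. The 20-page cutoff is an
# assumption for this sketch, not DocThinker's documented threshold.
def pick_parser(page_count: int, mode: str = "auto") -> str:
    if mode != "auto":
        return mode  # honor an explicit "vlm" or "mineru" choice
    return "vlm" if page_count <= 20 else "mineru"

print(pick_parser(5))            # vlm
print(pick_parser(120))          # mineru
print(pick_parser(5, "mineru"))  # mineru (explicit override wins)
```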
| Category | Endpoint | Method | Description |
|---|---|---|---|
| Sessions | `/sessions` | GET / POST | List / create sessions |
| | `/sessions/{id}/history` | GET | Chat history |
| | `/sessions/{id}/files` | GET | Ingested files |
| Ingest | `/ingest` | POST | Upload PDF / TXT |
| | `/ingest/stream` | POST | Stream raw text |
| Query | `/query/stream` | POST | SSE streaming query |
| | `/query` | POST | Non-streaming query |
| KG | `/knowledge-graph/data` | GET | Nodes + edges for visualization |
| | `/knowledge-graph/expand` | POST | Trigger 2-path expansion |
| | `/knowledge-graph/stats` | GET | KG statistics |
| Memory | `/memory/stats` | GET | Episode + Claw memory stats |
| | `/memory/consolidate` | POST | Run episodic consolidation |
| Settings | `/settings` | GET / POST | Runtime config |
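Consuming `/query/stream` means parsing Server-Sent Events. A sketch of the client side; the `{"token": ...}` payload shape is an assumption for illustration, so check the actual event format the API emits:

```python
import json

# Sketch of parsing SSE output from POST /query/stream.
# The {"token": ...} payload shape is assumed, not taken from the API docs.
def parse_sse(raw: str):
    """Extract JSON payloads from Server-Sent Events 'data:' lines."""
    for line in raw.splitlines():
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

# In practice `raw` would arrive incrementally over an HTTP response body.
raw = 'data: {"token": "Doc"}\ndata: {"token": "Thinker"}\n\n'
answer = "".join(event["token"] for event in parse_sse(raw))
print(answer)  # DocThinker
```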
If you find DocThinker useful in your research, please cite:
@article{docthinker2026,
title={DocThinker: Self-Evolving Knowledge Graphs with Tiered Memory and Structured Reasoning for Document Understanding},
author={Yang, Jiashu},
journal={arXiv preprint arXiv:2603.05551},
year={2026}
}

PRs and issues welcome! See CONTRIBUTING.md.



