AutoResearcher - The AI-Powered Research Agent

Automate the literature review, all locally!

A local-first, agentic AI research system that automates literature review, identifies research gaps, and scaffolds novel research papers, all powered by local LLMs.

Overview

AutoResearcher is a complete research automation pipeline that transforms a research domain into publication-ready paper scaffolds through a multi-agent workflow. Unlike traditional literature review tools, AutoResearcher does more than find papers: it analyzes them, extracts insights, identifies novelty gaps, and generates coherent research directions grounded in actual literature.

What It Does

Research Setup - Accepts the required research domain, optional focus areas, and related research parameters from the user
Domain Mapping - Analyzes the research domain and identifies subfields, methodologies, and trends
Literature Scouting - Generates targeted arXiv queries and discovers relevant papers
Paper Analysis - Extracts claims, limitations, and key insights from the PDFs
Novelty Detection and Synthesis - Identifies research gaps and generates possible novel research directions with detailed justifications
Paper Scaffolding - Creates a complete paper outline with title, abstract, and section structures

Key Features

100% Local - All of the AI inference runs on your machine via Ollama (no API keys, no data sharing)
Multi-Agent Architecture - 9 specialized AI agents orchestrated via Streamlit or n8n workflow
Literature-Grounded - Every claim and citation is traceable to actual papers
Interactive UI - An all-black dark-mode themed Streamlit interface
Production-Ready - FastAPI backend with error handling and logging

Component Details

Agent 1: Domain Mapper

Purpose: Analyzes the research domain and maps the landscape
Input: Domain name + constraints
Output: Subfields, methodologies, trends, problem classes
LLM: deepseek-r1:7b

Agent 2: Query Generator

Purpose: Generates optimal search queries for arXiv
Input: Domain map
Output: List of search queries
LLM: qwen3:4b

Agent 3: Literature Scout

Purpose: Discovers relevant papers via arXiv API
Input: Search queries + max papers
Output: Paper metadata (title, authors, abstract, arXiv ID)
External API: arXiv
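
As a rough illustration of this step, the sketch below builds a query URL for the public arXiv API and parses the Atom feed it returns into the metadata fields listed above. The function names (`build_query_url`, `parse_feed`) and the choice of the `all:` search field are assumptions for illustration, not AutoResearcher's actual code.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # namespace used by the feed


def build_query_url(query: str, max_papers: int = 10) -> str:
    """Build an arXiv API URL that searches all fields for `query`."""
    params = {"search_query": f"all:{query}", "start": 0, "max_results": max_papers}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _text(entry: ET.Element, tag: str) -> str:
    """Read a child element's text and collapse internal whitespace."""
    return " ".join(entry.findtext(tag, default="", namespaces=ATOM).split())


def parse_feed(atom_xml: str) -> list[dict]:
    """Turn an arXiv Atom response into a list of paper metadata dicts."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        papers.append({
            # The entry id is a URL ending in /abs/<arXiv ID>.
            "arxiv_id": _text(entry, "atom:id").rsplit("/abs/", 1)[-1],
            "title": _text(entry, "atom:title"),
            "abstract": _text(entry, "atom:summary"),
            "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", ATOM)],
        })
    return papers
```

Fetching is left out on purpose: passing the URL to `urllib.request.urlopen` and the decoded body to `parse_feed` would complete the loop, and keeping the two halves separate makes the parser easy to test offline.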

Agent 4: Paper Ingestion

Purpose: Downloads and processes PDFs
Input: arXiv IDs
Output: Extracted text from PDFs
Libraries: PyMuPDF, pdfplumber

Agent 5: Claim Extractor

Purpose: Extracts key claims from papers
Input: Paper text
Output: List of claims per paper
LLM: qwen3:4b
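
One way to turn a model reply into the per-paper claim list is to ask for a JSON array and parse it defensively. The sketch below assumes that prompt format, and it also strips a `<think>...</think>` block, which reasoning-style models may emit before the answer. `build_claim_prompt` and `parse_claims` are hypothetical names, not the project's API.

```python
import json
import re


def build_claim_prompt(paper_text: str, max_chars: int = 6000) -> str:
    """Ask for claims as a JSON array so the reply is machine-readable."""
    return (
        "Extract the key claims from the paper below. "
        "Reply with a JSON array of strings only.\n\n"
        + paper_text[:max_chars]  # truncate to keep the prompt small
    )


def parse_claims(reply: str) -> list[str]:
    """Parse a model reply into claim strings; return [] if nothing usable."""
    # Drop any reasoning block that precedes the answer.
    reply = re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL)
    # Take the outermost [...] span so surrounding chatter is ignored.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        items = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return []
    # Keep only non-empty strings.
    return [c.strip() for c in items if isinstance(c, str) and c.strip()]
```

Returning an empty list on malformed output, rather than raising, lets a pipeline skip one bad paper without aborting the whole run.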

Agent 6: Limitation Extractor

Purpose: Identifies limitations and future work
Input: Paper text
Output: List of limitations per paper
LLM: qwen3:4b

Agent 7: Novelty Detector

Purpose: Identifies research gaps and underexplored areas
Input: All claims + limitations
Output: Novelty gaps with explanations
LLM: deepseek-r1:7b
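
To make the notion of a "gap" concrete, here is a deliberately simple sketch: a limitation counts as a candidate gap when no extracted claim overlaps with it strongly. Word-level Jaccard overlap stands in for real semantic similarity. The function names, the threshold, and the heuristic itself are assumptions for illustration only; the agent described above relies on an LLM for this judgment.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word set, ignoring words of three letters or fewer."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sentences, in [0, 1]."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def candidate_gaps(claims: list[str], limitations: list[str],
                   threshold: float = 0.3) -> list[str]:
    """Return limitations that no claim appears to address."""
    return [
        lim for lim in limitations
        if all(jaccard(lim, claim) < threshold for claim in claims)
    ]
```

A filter like this could serve as a cheap pre-screen before handing the surviving limitations to the reasoning model, which keeps its prompt shorter.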

Agent 8: Direction Synthesizer

Purpose: Generates possible novel research directions
Input: Domain map + novelty gaps
Output: Research directions with titles, descriptions, methodologies
LLM: deepseek-r1:7b

Agent 9: Scaffold Generator

Purpose: Creates a paper outline from the selected direction
Input: Research direction + papers
Output: Title, abstract, contributions, section outline
LLM: deepseek-r1:7b

System Architecture

┌─────────────────┐
│  Streamlit UI   │  ← User Input (Domain + Constraints)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   n8n Workflow  │  ← Orchestration Layer (Supportable)
└────────┬────────┘
         │
         ▼
┌───────────────────────────────────────────────────┐
│              FastAPI Backend                      │
│  ┌──────────────────────────────────────────────┐ │
│  │          9 Research Agents                   │ │
│  ├──────────────────────────────────────────────┤ │
│  │  1. Domain Mapper                            │ │
│  │  2. Query Generator                          │ │
│  │  3. Literature Scout                         │ │
│  │  4. Paper Ingestion                          │ │
│  │  5. Claim Extractor                          │ │
│  │  6. Limitation Extractor                     │ │
│  │  7. Novelty Detector                         │ │
│  │  8. Direction Synthesizer                    │ │
│  │  9. Scaffold Generator                       │ │
│  └──────────────────────────────────────────────┘ │
│  ┌──────────────────────────────────────────────┐ │
│  │          Core Services                       │ │
│  ├──────────────────────────────────────────────┤ │
│  │  • LLM Service (Ollama)                      │ │
│  │  • Vector Store (Embeddings)                 │ │
│  │  • arXiv Service                             │ │
│  │  • PDF Processor                             │ │
│  │  • Semantic Scholar (Supportable)            │ │
│  └──────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────┐
│  Ollama (LLMs)  │  ← deepseek-r1:7b + qwen3:4b
└─────────────────┘
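
Read top to bottom, the diagram describes a linear hand-off between the nine agents. The sketch below models that hand-off as an ordered list of stages sharing one state dict. The stage bodies are stubs, and every name in it (`run_pipeline`, `make_stub`, the state keys) is hypothetical; it illustrates only the ordering, not the real FastAPI endpoints or n8n workflow.

```python
from typing import Callable

State = dict  # shared pipeline state (hypothetical shape)
Stage = Callable[[State], State]


def make_stub(name: str, output_key: str) -> Stage:
    """Create a placeholder stage that records that it ran."""
    def stage(state: State) -> State:
        state[output_key] = f"<{name} output>"
        state.setdefault("trace", []).append(name)
        return state
    return stage


# Agent order as listed in the architecture diagram; keys are illustrative.
PIPELINE: list[tuple[str, str]] = [
    ("Domain Mapper", "domain_map"),
    ("Query Generator", "queries"),
    ("Literature Scout", "papers"),
    ("Paper Ingestion", "paper_texts"),
    ("Claim Extractor", "claims"),
    ("Limitation Extractor", "limitations"),
    ("Novelty Detector", "novelty_gaps"),
    ("Direction Synthesizer", "directions"),
    ("Scaffold Generator", "scaffold"),
]


def run_pipeline(domain: str) -> State:
    """Run every stage in order, threading the state through."""
    state: State = {"domain": domain}
    for name, key in PIPELINE:
        state = make_stub(name, key)(state)
    return state


if __name__ == "__main__":
    print(" -> ".join(run_pipeline("graph learning")["trace"]))
```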

Tech Stack

Backend

FastAPI - High-performance Python API framework
Python 3.9+ - Core language
Pydantic - Data validation and schemas

AI & ML

Ollama - Local LLM inference engine
- deepseek-r1:7b - Reasoning model for complex analysis
- qwen3:4b - Fast model for simple tasks
sentence-transformers - Semantic embeddings
NLTK - Text processing
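
The split between the two models can be expressed as a small routing table. The sketch below maps each LLM-backed agent to the model listed for it under Component Details and builds a request for Ollama's `/api/generate` endpoint on its default port (11434). The helper names and the idea of a central routing table are assumptions; the project may wire its models differently.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default address

REASONING, FAST = "deepseek-r1:7b", "qwen3:4b"
AGENT_MODELS = {
    "Domain Mapper": REASONING,
    "Query Generator": FAST,
    "Claim Extractor": FAST,
    "Limitation Extractor": FAST,
    "Novelty Detector": REASONING,
    "Direction Synthesizer": REASONING,
    "Scaffold Generator": REASONING,
}


def build_request(agent: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) a non-streaming generate request."""
    body = {"model": AGENT_MODELS[agent], "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(agent: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the request to a running Ollama server and return its text."""
    with urllib.request.urlopen(build_request(agent, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running and both models pulled, `generate("Query Generator", "...")` would return the model's reply as a string. Agents 3 and 4 are absent from the table because they use the arXiv API and PDF libraries rather than an LLM.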

Data Processing

PyMuPDF - PDF text extraction
pdfplumber - Alternative PDF parser
BeautifulSoup4 - HTML parsing (Semantic Scholar)

Orchestration

n8n - Workflow automation (run locally via npx or a similar setup)

Frontend

Streamlit - Interactive UI with custom CSS

Data Storage

DiskCache - Result caching
FAISS/Chroma - Vector embeddings

External APIs

arXiv API - Paper search and metadata
Semantic Scholar - Enhanced metadata

Data Privacy & Ethics

100% Local: All LLM inference runs on your machine
No Cloud: No data is sent to cloud LLM services; the only external calls fetch paper data (arXiv, plus Semantic Scholar when used)
No Tracking: Zero telemetry or analytics
Ethical Citations: Every claim grounded in actual papers
No Fabrication: System never invents citations or experiments

How to Run

Make sure Python 3.9+ is installed and, if you plan to use it for orchestration, that n8n is set up on your system.
Install Ollama and pull the deepseek-r1:7b and qwen3:4b models.
Clone this repository to your local machine.
Set up a Python virtual environment and install the required dependencies:

python -m venv venv

source venv/bin/activate  # macOS/Linux
# OR
venv\Scripts\activate     # Windows

pip install -r requirements.txt

Configure the environment:

# Copy environment template
cp .env.example .env

# Edit .env with your settings (if needed)
nano .env

Run the system in three separate terminal windows, all running at the same time:

# Terminal 1 : Start n8n, import workflow
npx n8n 
# Opens on: http://localhost:5678

# Terminal 2 : Start FastAPI Backend
source venv/bin/activate  # from the repository root; or venv\Scripts\activate on Windows
cd python_agents
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# API: http://localhost:8000

# Terminal 3 : Start Streamlit UI
source venv/bin/activate  # from the repository root; or venv\Scripts\activate on Windows
cd streamlit_app
streamlit run app.py
# UI: http://localhost:8501 - Open this and start using the system
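
Before opening the UI, it can help to confirm that all three services are listening. The following optional sketch probes the ports mentioned above with a plain TCP connect. The script and its names are not part of the repository, and it checks only that a port accepts connections, not that the service behind it is healthy.

```python
import socket

# Ports taken from the commands above.
SERVICES = {"n8n": 5678, "FastAPI backend": 8000, "Streamlit UI": 8501}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report() -> dict[str, bool]:
    """Check every service and return a name -> reachable mapping."""
    return {name: is_listening(port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in report().items():
        print(f"{name:16} {'up' if up else 'DOWN'}")
```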

Disclaimer

AutoResearcher is a research assistance agent. It:

Does NOT guarantee novelty,
Does NOT replace human judgment,
Does NOT write complete papers.

Always verify outputs and conduct a proper literature review.

Contributing

Contributions are welcome!

License

Distributed under the MIT License.

krishang118/AutoResearcher
