A local-first AI research agent automating literature review via n8n/Streamlit orchestration, covering arXiv and Semantic Scholar for gap identification and novel LLM-powered scaffold generation.

krishang118/AutoResearcher

AutoResearcher - The AI-Powered Research Agent

Automate the literature review, all locally!

A local-first, agentic AI research system that automates literature review, identifies research gaps, and scaffolds novel research papers, all powered by local LLMs.

Overview

AutoResearcher is a complete research automation pipeline that transforms a research domain into publication-ready paper scaffolds through a multi-agent workflow. Unlike traditional literature review tools, AutoResearcher doesn't just find papers: it analyzes them, extracts insights, identifies novelty gaps, and generates coherent research directions grounded in the actual literature.

What It Does

  1. Research Setup - Accepts the required research domain, optional focus areas, and the related research parameters from the user
  2. Domain Mapping - Analyzes research domains and identifies subfields, methodologies, and trends
  3. Literature Scouting - Intelligently queries arXiv and discovers relevant papers
  4. Paper Analysis - Extracts claims, limitations, and key insights from the PDFs
  5. Novelty Detection and Synthesis - Identifies research gaps and generates possible novel research directions with detailed justifications
  6. Paper Scaffolding - Creates a complete paper outline with title, abstract, and section structures

Key Features

  • 100% Local - All AI inference runs on your machine via Ollama (no API keys, no data sharing)
  • Multi-Agent Architecture - 9 specialized agents orchestrated via Streamlit or an n8n workflow
  • Literature-Grounded - Every claim and citation is traceable to actual papers
  • Interactive UI - An all-black dark-mode themed Streamlit interface
  • Production-Ready - FastAPI backend with error handling and logging

Component Details

Agent 1: Domain Mapper

  • Purpose: Analyzes research domain and maps the landscape
  • Input: Domain name + constraints
  • Output: Subfields, methodologies, trends, problem classes
  • LLM: deepseek-r1:7b

Agent 2: Query Generator

  • Purpose: Generates optimal search queries for arXiv
  • Input: Domain map
  • Output: List of search queries
  • LLM: qwen3:4b

Agent 3: Literature Scout

  • Purpose: Discovers relevant papers via arXiv API
  • Input: Search queries + max papers
  • Output: Paper metadata (title, authors, abstract, arXiv ID)
  • External API: arXiv
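
As a rough illustration of the scouting step, the sketch below builds a search URL for the public arXiv API and parses the Atom feed it returns into the metadata fields listed above. The function names (`build_query_url`, `parse_feed`) are hypothetical and not taken from the repository; this is a minimal standard-library sketch, not the project's actual arXiv service.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by the arXiv Atom feed


def build_query_url(query: str, max_results: int = 10) -> str:
    """Build an arXiv API search URL for a free-text query (hypothetical helper)."""
    params = {"search_query": f"all:{query}", "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(atom_xml: str) -> list[dict]:
    """Convert an arXiv Atom feed into title, authors, abstract, and arXiv ID."""
    papers = []
    for entry in ET.fromstring(atom_xml).iter(f"{ATOM}entry"):
        abs_url = entry.findtext(f"{ATOM}id", default="")
        papers.append({
            # Titles and abstracts are often wrapped across lines; collapse whitespace.
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "authors": [a.findtext(f"{ATOM}name", default="")
                        for a in entry.iter(f"{ATOM}author")],
            "abstract": " ".join(entry.findtext(f"{ATOM}summary", default="").split()),
            # The entry id is a URL ending in /abs/<arXiv ID>.
            "arxiv_id": abs_url.rsplit("/abs/", 1)[-1],
        })
    return papers
```

Fetching the URL from `build_query_url` (for example with `urllib.request.urlopen`) and passing the response body to `parse_feed` would yield one dictionary per paper.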

Agent 4: Paper Ingestion

  • Purpose: Downloads and processes PDFs
  • Input: arXiv IDs
  • Output: Extracted text from PDFs
  • Libraries: PyMuPDF, pdfplumber
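
One way the two PDF libraries could work together is a simple fallback chain: try PyMuPDF first and fall back to pdfplumber if extraction fails or returns no text. The sketch below assumes that ordering and uses hypothetical function names; the repository may combine the libraries differently.

```python
from typing import Callable, Sequence


def extract_with_pymupdf(path: str) -> str:
    import fitz  # PyMuPDF; imported lazily so this module loads without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber  # imported lazily for the same reason

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_text(
    path: str,
    extractors: Sequence[Callable[[str], str]] = (
        extract_with_pymupdf,
        extract_with_pdfplumber,
    ),
) -> str:
    """Return text from the first extractor that succeeds with non-empty output."""
    for extract in extractors:
        try:
            text = extract(path)
        except Exception:
            continue  # missing library or unreadable file: try the next parser
        if text.strip():
            return text
    return ""  # nothing could be extracted (for example, a scanned PDF)
```

Passing the extractors as a parameter keeps the fallback order easy to change and lets the logic be exercised without real PDF files.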

Agent 5: Claim Extractor

  • Purpose: Extracts key claims from papers
  • Input: Paper text
  • Output: List of claims per paper
  • LLM: qwen3:4b
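
To make the claim extraction step concrete, here is a hedged sketch of a single call to a local Ollama server through its `/api/generate` endpoint. The prompt wording, the JSON-array output format, and the helper names are assumptions made for illustration; the project's real prompts and parsing are not shown in this README. The endpoint address assumes Ollama's default port, 11434.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes Ollama's default port


def build_payload(paper_text: str, model: str = "qwen3:4b") -> dict:
    """Build a non-streaming generate request (prompt wording is illustrative)."""
    prompt = (
        "List the key claims made in the paper below as a JSON array of strings. "
        "Return only the JSON array.\n\n" + paper_text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def parse_claims(raw: str) -> list[str]:
    """Pull a JSON array of claims out of the model's raw text output."""
    # Some models emit <think>...</think> reasoning before the answer; drop it.
    raw = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    match = re.search(r"\[.*\]", raw, flags=re.DOTALL)
    if not match:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    return [str(item).strip() for item in items if str(item).strip()]


def extract_claims(paper_text: str) -> list[str]:
    """Send the paper text to the local model and return the parsed claims."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(paper_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return parse_claims(json.loads(response.read())["response"])
```

Keeping `parse_claims` separate from the network call means malformed model output degrades to an empty list rather than an exception.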

Agent 6: Limitation Extractor

  • Purpose: Identifies limitations and future work
  • Input: Paper text
  • Output: List of limitations per paper
  • LLM: qwen3:4b

Agent 7: Novelty Detector

  • Purpose: Identifies research gaps and underexplored areas
  • Input: All claims + limitations
  • Output: Novelty gaps with explanations
  • LLM: deepseek-r1:7b

Agent 8: Direction Synthesizer

  • Purpose: Generates possible novel research directions
  • Input: Domain map + novelty gaps
  • Output: Research directions with titles, descriptions, methodologies
  • LLM: deepseek-r1:7b

Agent 9: Scaffold Generator

  • Purpose: Creates paper outline from selected direction
  • Input: Research direction + papers
  • Output: Title, abstract, contributions, section outline
  • LLM: deepseek-r1:7b
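
The inputs and outputs listed for the nine agents imply an ordering constraint: each agent can only run once the data it reads has been produced. The sketch below encodes the lists above as a small table and checks that constraint. The field names (`domain_map`, `texts`, `gaps`, and so on) are hypothetical labels chosen for this example, not identifiers from the codebase.

```python
# (agent name, data it reads, data it writes), following the lists above.
STAGES = [
    ("domain_mapper", {"domain", "constraints"}, {"domain_map"}),
    ("query_generator", {"domain_map"}, {"queries"}),
    ("literature_scout", {"queries", "max_papers"}, {"papers"}),
    ("paper_ingestion", {"papers"}, {"texts"}),
    ("claim_extractor", {"texts"}, {"claims"}),
    ("limitation_extractor", {"texts"}, {"limitations"}),
    ("novelty_detector", {"claims", "limitations"}, {"gaps"}),
    ("direction_synthesizer", {"domain_map", "gaps"}, {"directions"}),
    ("scaffold_generator", {"directions", "papers"}, {"scaffold"}),
]


def check_order(stages, initial):
    """Verify every stage's inputs exist before it runs; return all produced data."""
    available = set(initial)
    for name, reads, writes in stages:
        missing = reads - available
        if missing:
            raise ValueError(f"{name} is missing inputs: {sorted(missing)}")
        available |= writes
    return available


# The user supplies the domain, constraints, and the paper limit up front.
produced = check_order(STAGES, {"domain", "constraints", "max_papers"})
```

Under these labels, the Claim Extractor and Limitation Extractor depend only on the extracted paper text, so in principle they could run in either order or side by side.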

System Architecture

┌─────────────────┐
│  Streamlit UI   │  ← User Input (Domain + Constraints)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   n8n Workflow  │  ← Orchestration Layer (optional)
└────────┬────────┘
         │
         ▼
┌───────────────────────────────────────────────────┐
│              FastAPI Backend                      │                             
│  ┌──────────────────────────────────────────────┐ │
│  │          9 Research Agents                   │ │
│  ├──────────────────────────────────────────────┤ │
│  │  1. Domain Mapper                            │ │
│  │  2. Query Generator                          │ │
│  │  3. Literature Scout                         │ │
│  │  4. Paper Ingestion                          │ │
│  │  5. Claim Extractor                          │ │
│  │  6. Limitation Extractor                     │ │
│  │  7. Novelty Detector                         │ │
│  │  8. Direction Synthesizer                    │ │
│  │  9. Scaffold Generator                       │ │
│  └──────────────────────────────────────────────┘ │                                  
│  ┌──────────────────────────────────────────────┐ │
│  │          Core Services                       │ │
│  ├──────────────────────────────────────────────┤ │
│  │  • LLM Service (Ollama)                      │ │
│  │  • Vector Store (Embeddings)                 │ │
│  │  • arXiv Service                             │ │
│  │  • PDF Processor                             │ │
│  │  • Semantic Scholar (optional)               │ │
│  └──────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────┘
         │
         ▼
┌─────────────────┐
│  Ollama (LLMs)  │  ← deepseek-r1:7b + qwen3:4b
└─────────────────┘

Tech Stack

Backend

  • FastAPI - High-performance Python API framework
  • Python 3.9+ - Core language
  • Pydantic - Data validation and schemas

AI & ML

  • Ollama - Local LLM inference engine
    • deepseek-r1:7b - Reasoning model for complex analysis
    • qwen3:4b - Fast model for simple tasks
  • sentence-transformers - Semantic embeddings
  • NLTK - Text processing

Data Processing

  • PyMuPDF - PDF text extraction
  • pdfplumber - Alternative PDF parser
  • BeautifulSoup4 - HTML parsing (Semantic Scholar)

Orchestration

  • n8n - Workflow automation (runs via local npx, or similar setup)

Frontend

  • Streamlit - Interactive UI with custom CSS

Data Storage

  • DiskCache - Result caching
  • FAISS/Chroma - Vector embeddings

External APIs

  • arXiv API - Paper search and metadata
  • Semantic Scholar - Enhanced metadata

Data Privacy & Ethics

  • 100% Local: All LLM inference runs on your machine
  • No Cloud: No data sent to external APIs, except paper search and metadata requests to arXiv (and Semantic Scholar, if used)
  • No Tracking: Zero telemetry or analytics
  • Ethical Citations: Every claim grounded in actual papers
  • No Fabrication: System never invents citations or experiments

How to Run

  1. Make sure Python 3.9+ is installed. If you plan to use n8n for orchestration, make sure it is set up on your system as well.
  2. Install Ollama and pull the deepseek-r1:7b and qwen3:4b models.
  3. Clone this repository to your local machine.
  4. Set up a Python virtual environment and install the required dependencies:

     python -m venv venv

     source venv/bin/activate  # macOS/Linux
     # OR
     venv\Scripts\activate     # Windows

     pip install -r requirements.txt

  5. Configure the environment:

     # Copy the environment template
     cp .env.example .env

     # Edit .env with your settings (if needed)
     nano .env

  6. Run the system in three separate terminal windows, all running at the same time:

     # Terminal 1: Start n8n and import the workflow
     npx n8n
     # Opens on: http://localhost:5678

     # Terminal 2: Start the FastAPI backend
     cd python_agents
     source venv/bin/activate  # or venv\Scripts\activate on Windows
     uvicorn main:app --reload --host 0.0.0.0 --port 8000
     # API: http://localhost:8000

     # Terminal 3: Start the Streamlit UI
     cd streamlit_app
     source venv/bin/activate  # or venv\Scripts\activate on Windows
     streamlit run app.py
     # UI: http://localhost:8501 - Open this and start using the system
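
With several services running at once, a quick check that each one is actually listening can save some debugging. The sketch below is an optional helper that is not part of the repository. It probes the ports named in the steps above, plus 11434, which is assumed here to be Ollama's default port.

```python
import socket

SERVICES = {
    "n8n": 5678,
    "FastAPI backend": 8000,
    "Streamlit UI": 8501,
    "Ollama": 11434,  # assumption: Ollama's default port, not stated in the steps
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def report(services: dict = SERVICES) -> dict:
    """Map each service name to whether its port is reachable."""
    return {name: is_listening(port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in report().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

A port that accepts connections only shows that some process is bound to it, so this is a first sanity check rather than a full health check.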

Disclaimer

AutoResearcher is a research assistance agent. It:

  • Does NOT guarantee novelty
  • Does NOT replace human judgment
  • Does NOT write complete papers

Always verify the outputs and conduct a proper literature review of your own.

Contributing

Contributions are welcome!

License

Distributed under the MIT License.
