DistillAPI is a modern, full-stack AI application that transforms dense academic research papers into structured, actionable insights. By leveraging LangGraph to orchestrate complex LLM workflows and Next.js for a sleek, dashboard-style interface, DistillAPI extracts core methodologies and key findings and generates interactive study questions in seconds.
- 📄 Smart PDF Ingestion: Asynchronously parses and chunks large PDF documents using LangChain.
- 🤖 State-Based AI Agent: Uses LangGraph to orchestrate a multi-step summarization and Q&A pipeline.
- 🎯 Guaranteed Structured Outputs: Enforces strict JSON schemas using Pydantic, ensuring the LLM always returns the exact format required by the UI.
- 🛡️ Defensive UI Rendering: Built with Next.js and Shadcn, featuring a resilient frontend that handles missing data gracefully and renders markdown natively.
- ⚡ Async Backend: FastAPI implementation with `asyncio` thread delegation to keep the event loop unblocked, allowing high-concurrency document processing.
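The async pattern behind the last bullet can be sketched as follows; the function names and the simulated parse are illustrative assumptions, not the project's actual code:

```python
import asyncio
import time

def parse_pdf(path: str) -> str:
    """Stand-in for a blocking PDF parse (the real app would use pypdf)."""
    time.sleep(0.1)  # simulate CPU/IO-bound work
    return f"text from {path}"

async def handle_upload(path: str) -> str:
    # Delegate the blocking call to a worker thread so the event loop
    # stays free to serve other requests concurrently.
    return await asyncio.to_thread(parse_pdf, path)

print(asyncio.run(handle_upload("paper.pdf")))  # → text from paper.pdf
```

`asyncio.to_thread` (Python 3.9+) is the stdlib way to offload blocking work; inside FastAPI route handlers it keeps uploads from stalling other requests.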
### Backend
- Framework: FastAPI
- AI Orchestration: LangGraph & LangChain
- LLM Provider: OpenRouter (`hunter-alpha`)
- Data Validation: Pydantic

### Frontend
- Framework: Next.js 15+ (App Router)
- Styling: Tailwind CSS
- UI Components: Shadcn UI (Radix Primitives)
- Markdown: `react-markdown`
- Icons: Lucide React
Follow these instructions to get a copy of the project up and running on your local machine.
- Node.js (v18+)
- Python (3.9+)
- An OpenRouter API Key.
Open a terminal and navigate to your backend directory:

```bash
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install required Python packages
pip install fastapi uvicorn langchain langchain-openai langchain-community pypdf python-multipart pydantic python-dotenv
```

Create a `.env` file in the same directory as `main.py` and add your API key:

```
OPENAI_API_KEY="your-api-key-here"
```
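`python-dotenv` reads this file into the process environment at startup; its effect can be sketched with a minimal stdlib parser (this assumes simple `KEY="value"` lines with no `export` statements or multiline values):

```python
import os

def load_env(path: str = ".env") -> None:
    # Minimal stand-in for python-dotenv's load_dotenv(): parse KEY=VALUE
    # lines and put them in os.environ without overriding existing values.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

In the real app, `load_dotenv()` is called once near the top of `main.py` before the LLM client is constructed.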
Start the backend server:

```bash
python main.py
# Server will start on http://localhost:8000
```
Open a new terminal window and navigate to your frontend directory (`docuparse-web`):

```bash
# Install dependencies
npm install

# Install markdown renderer (if you haven't already)
npm install react-markdown
```
Start the development server:

```bash
npm run dev
# Frontend will start on http://localhost:3000
```
The backend utilizes a directed graph (StateGraph) to process documents predictably:
1. `load_pdf` node: Reads the uploaded temporary PDF file and extracts raw text.
2. `chunk_text` node: Splits the text into manageable 4000-character chunks with overlap to preserve context.
3. `summarize` node: Ingests the chunks and uses `.with_structured_output(StructuredSummary)` to force the LLM to extract 7 specific academic data points (Core Problem, Methodology, etc.).
4. `generate_qa` node: Reads the structured summary state and generates 3 conceptual study questions and answers using strict Pydantic schemas.
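Stripped of the LangGraph wiring, the state flow through these nodes can be sketched framework-free; the `State` shape, the 200-character overlap, and the stub node bodies are illustrative assumptions (the real `summarize` and `generate_qa` nodes call the LLM):

```python
from typing import List, TypedDict

class State(TypedDict, total=False):
    raw_text: str
    chunks: List[str]
    summary: dict
    qa: List[dict]

def chunk_text(state: State, size: int = 4000, overlap: int = 200) -> State:
    # Fixed-size character chunks with overlap, as in the chunk_text node
    # (the overlap value is an assumption; the text only says "with overlap").
    text, chunks, start = state["raw_text"], [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    state["chunks"] = chunks
    return state

def summarize(state: State) -> State:
    # Placeholder for the structured-output LLM call.
    state["summary"] = {"core_problem": "...", "methodology": "..."}
    return state

def generate_qa(state: State) -> State:
    # Placeholder for the Q&A-generation LLM call.
    state["qa"] = [{"q": "...", "a": "..."} for _ in range(3)]
    return state

# In essence the StateGraph is a directed pipeline: each node reads the
# shared state dict and extends it before passing it on.
state: State = {"raw_text": "x" * 10_000}
for node in (chunk_text, summarize, generate_qa):
    state = node(state)
print(len(state["chunks"]))  # → 3
```

In the actual backend, LangGraph's `StateGraph` adds compile-time edges, persistence, and error handling on top of this basic flow.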
- Map-Reduce Summarization: Upgrade the `summarize_node` to process entire 50+ page PDFs by summarizing individual chunks and combining them, rather than just reading the first few pages.
- Streaming Responses: Implement Server-Sent Events (SSE) to stream the summary to the UI token by token as the LLM generates it.
- Export to PDF/Markdown: Add a button to download the generated insights.
- Citation Tracking: Map extracted methodologies back to specific page numbers in the original PDF.
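The map-reduce idea in the first roadmap item amounts to summarizing each chunk independently, then summarizing the combined partial summaries; here `llm_summarize` is a hypothetical stand-in for the real LLM call:

```python
def llm_summarize(text: str) -> str:
    # Stand-in for an LLM summarization call; truncation simulates
    # compressing the input to a shorter summary.
    return text[:40]

def map_reduce_summarize(chunks: list[str]) -> str:
    # Map step: summarize each chunk independently (these calls are
    # embarrassingly parallel and could run concurrently).
    partials = [llm_summarize(c) for c in chunks]
    # Reduce step: combine the partial summaries and summarize the result.
    # A real implementation may need to recurse if the combined text still
    # exceeds the model's context window.
    return llm_summarize("\n".join(partials))
```

This is the same shape as LangChain's map-reduce summarization chains, just written out explicitly.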
This project is licensed under the MIT License - see the LICENSE file for details.