Anirudh77715/Generative-AI-Agent-Biomedical-Research-Papers-Review

Generative AI Agent for Biomedical Research Papers Review 🔬

An advanced AI-powered platform for analyzing biomedical research papers with automated PICO extraction, entity recognition, semantic search, and intelligent Q&A with verified citations.

This repository hosts the full-stack implementation published at Generative-AI-Agent-Biomedical-Research-Papers-Review.

✨ Features

Core Capabilities

1. Paper Upload & Management

  • 📄 PDF Support: Upload biomedical research papers in PDF format
  • 📝 Text Support: Also accepts plain text files
  • 🚀 Automatic Processing: Papers are automatically chunked and embedded for semantic search
  • 📚 Library Management: Organize and browse all uploaded papers

2. PICO Element Extraction

Automatically extracts the four key components of clinical research:

  • Population: Patient groups or subjects being studied
  • Intervention: Treatments or exposures being investigated
  • Comparison: Alternative treatments or control groups
  • Outcome: Measured results or endpoints

Each element includes a confidence score (0.0-1.0) and is displayed in its own color-coded section.

3. Biomedical Entity Recognition

Identifies and extracts key biomedical entities:

  • 🦠 Diseases/Conditions: Medical conditions and diagnoses
  • 💊 Drugs/Medications: Pharmaceutical compounds and treatments
  • 🧬 Proteins: Protein names and identifiers
  • 🔬 Genes: Genetic markers and gene names

Entities are grouped by type with frequency counts for easy analysis.

4. Semantic Search

  • 🔍 AI-Powered Search: Natural language queries across all papers
  • 📊 Relevance Scoring: Cosine similarity matching with relevance percentages
  • 📌 Contextual Excerpts: Shows matching passages with highlighting
  • ⚡ Fast Results: Similarity search over precomputed chunk embeddings

5. Intelligent Q&A System

  • 💬 Conversational Interface: Chat-style Q&A about your research papers
  • 🎯 Context-Aware Answers: AI analyzes relevant passages before answering
  • 📚 Automatic Citations: All answers include citations with source papers
  • 🔗 Citation Tracking: Click citations to view full excerpts and paper details
  • 💾 Conversation History: All Q&A sessions are saved for reference

6. Analytics Dashboard

  • 📈 Real-time Statistics: Track papers, analyses, entities, and Q&A sessions
  • 📋 Recent Activity: View recently uploaded papers
  • 🎨 Visual Overview: Clean interface showing your research corpus at a glance

7. Professional UI/UX

  • 🌓 Dark Mode: Full dark/light theme support with toggle
  • 📱 Responsive Design: Works seamlessly on desktop, tablet, and mobile
  • ♿ Accessibility: WCAG AA compliant with keyboard navigation
  • 🎨 Modern Design: Clean, professional interface optimized for research workflows

🛠 Technology Stack

Frontend

  • React 18 - Modern UI library with hooks
  • TypeScript - Type-safe development
  • Wouter - Lightweight routing
  • TanStack Query v5 - Server state management
  • Tailwind CSS - Utility-first styling
  • Shadcn UI - High-quality component library
  • Lucide Icons - Beautiful icon set
  • Vite - Lightning-fast build tool

Backend

  • Node.js 20 - JavaScript runtime
  • Express.js - Web application framework
  • TypeScript - Type-safe server code
  • Multer - File upload handling
  • pdf-parse - PDF text extraction

AI & ML

  • OpenAI GPT-5 - Advanced language model (released August 2025)
  • text-embedding-3-small - Text embedding for semantic search
  • Vector Embeddings - 1536-dimensional vectors for similarity matching
  • Cosine Similarity - Mathematical measure for relevance scoring

Data Management

  • Drizzle ORM - Type-safe database toolkit
  • Zod - Schema validation
  • In-Memory Storage - Fast development with MemStorage

🚀 Getting Started

Prerequisites

Installation

  1. Clone or fork this Repl

  2. Set up environment variables

    Add your OpenAI API key to Replit Secrets:

    • Open the "Secrets" tab (🔒 icon in the left sidebar)
    • Add secret: OPENAI_API_KEY = your-api-key-here
  3. Start the application

    npm run dev
  4. Access the application

    • The app runs on port 5000
    • Click the web preview or open your Repl URL

📁 Project Structure

biomedical-research-ai-agent/
├── client/                    # Frontend React application
│   ├── public/               # Static assets
│   ├── src/
│   │   ├── components/       # Reusable React components
│   │   │   ├── ui/          # Shadcn UI components
│   │   │   ├── app-sidebar.tsx
│   │   │   ├── theme-provider.tsx
│   │   │   └── theme-toggle.tsx
│   │   ├── hooks/           # Custom React hooks
│   │   ├── lib/             # Utilities and helpers
│   │   ├── pages/           # Page components
│   │   │   ├── dashboard.tsx
│   │   │   ├── upload.tsx
│   │   │   ├── library.tsx
│   │   │   ├── search.tsx
│   │   │   ├── qa.tsx
│   │   │   └── not-found.tsx
│   │   ├── App.tsx          # Main app component
│   │   ├── index.css        # Global styles
│   │   └── main.tsx         # Entry point
│   └── index.html           # HTML template
├── server/                   # Backend Express application
│   ├── index.ts             # Server entry point
│   ├── routes.ts            # API route handlers
│   ├── storage.ts           # Data storage layer
│   ├── openai.ts            # OpenAI integration
│   ├── pdf-parser.ts        # PDF text extraction
│   └── vite.ts              # Vite dev server integration
├── shared/                   # Shared code between frontend/backend
│   └── schema.ts            # TypeScript types & Zod schemas
├── package.json             # Dependencies and scripts
├── tsconfig.json            # TypeScript configuration
├── tailwind.config.ts       # Tailwind CSS configuration
├── vite.config.ts           # Vite build configuration
├── replit.md                # Project documentation
└── README.md                # This file

📡 API Documentation

Papers

Upload Paper (Manual Entry)

POST /api/papers
Content-Type: application/json

{
  "title": "Paper title",
  "authors": "Author names",
  "abstract": "Abstract text",
  "fullText": "Full paper text"
}
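As a sketch of calling this endpoint from TypeScript, the helper below builds the request without sending it. NewPaper, JsonRequest, and buildCreatePaperRequest are hypothetical names; the field names come from the JSON body above, and the required-field check mirrors the Usage Guide, where title, authors, and abstract are marked required. The base URL in the usage line assumes the default port 5000 mentioned under Getting Started.

```typescript
// Hypothetical client-side shape of the JSON body for POST /api/papers.
interface NewPaper {
  title: string;
  authors: string;
  abstract: string;
  fullText: string;
}

// Plain request description that can be passed to fetch(req.url, req).
interface JsonRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) the manual-entry upload request.
// Assumption: title, authors, and abstract must be non-empty, as in the Usage Guide.
function buildCreatePaperRequest(baseUrl: string, paper: NewPaper): JsonRequest {
  for (const field of ["title", "authors", "abstract"] as const) {
    if (paper[field].trim() === "") {
      throw new Error(`Missing required field: ${field}`);
    }
  }
  return {
    url: new URL("/api/papers", baseUrl).toString(),
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(paper),
  };
}
```

Sending it is then a matter of const req = buildCreatePaperRequest("http://localhost:5000", paper); await fetch(req.url, req); using the fetch built into browsers and Node.js 18+.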

Upload PDF

POST /api/papers/upload-pdf
Content-Type: multipart/form-data

Fields:
- pdf: PDF file
- title: Paper title
- authors: Author names  
- abstract: Abstract text
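A multipart upload can be assembled with the standard FormData and Blob globals, which exist in browsers and in Node.js 18+. The field names pdf, title, authors, and abstract are taken from the list above; PdfMetadata and buildPdfUploadForm are hypothetical names used only for this sketch.

```typescript
// Hypothetical metadata accompanying the uploaded PDF.
interface PdfMetadata {
  title: string;
  authors: string;
  abstract: string;
}

// Builds the multipart/form-data body for POST /api/papers/upload-pdf.
// The file goes in the "pdf" field; metadata goes in plain text fields.
function buildPdfUploadForm(pdf: Blob, fileName: string, meta: PdfMetadata): FormData {
  const form = new FormData();
  form.append("pdf", pdf, fileName);
  form.append("title", meta.title);
  form.append("authors", meta.authors);
  form.append("abstract", meta.abstract);
  return form;
}
```

When sending it with fetch("/api/papers/upload-pdf", { method: "POST", body: form }), leave the Content-Type header unset: fetch fills in multipart/form-data together with the boundary string the server needs to split the parts.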

List All Papers

GET /api/papers

Get Single Paper

GET /api/papers/:id

Delete Paper

DELETE /api/papers/:id

Analysis

Extract PICO Elements

POST /api/papers/:id/extract-pico

Returns PICO analysis with confidence scores.

Extract Biomedical Entities

POST /api/papers/:id/extract-entities

Returns diseases, drugs, proteins, and genes.

Get All PICO Analyses

GET /api/pico-elements

Get All Entities

GET /api/entities

Search & Q&A

Semantic Search

GET /api/search?query=your+search+query

Returns relevant paper excerpts ranked by similarity.
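The query string uses form encoding, where spaces become "+", which is exactly what URLSearchParams produces. A small hypothetical helper, assuming (as a choice of this sketch) that empty queries should be rejected on the client:

```typescript
// Builds the URL for GET /api/search. URLSearchParams applies
// application/x-www-form-urlencoded escaping, so spaces become "+".
function buildSearchUrl(baseUrl: string, query: string): string {
  const trimmed = query.trim();
  if (trimmed === "") {
    throw new Error("Search query must not be empty");
  }
  const url = new URL("/api/search", baseUrl);
  url.searchParams.set("query", trimmed);
  return url.toString();
}
```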

Ask Question

POST /api/qa
Content-Type: application/json

{
  "question": "What are the effects of metformin?"
}

Returns answer with citations.
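The exact response schema is not spelled out here, so the sketch below assumes a JSON object with an answer string and a citations array whose entries carry paperId, paperTitle, and excerpt. QaResponse, QaCitation, buildQaBody, and isQaResponse are hypothetical; a runtime check like this keeps the client from rendering malformed data, since JSON from the network is untyped.

```typescript
// Assumed (hypothetical) shape of one citation in a POST /api/qa response.
interface QaCitation {
  paperId: string;
  paperTitle: string;
  excerpt: string;
}

// Assumed (hypothetical) shape of the whole response.
interface QaResponse {
  answer: string;
  citations: QaCitation[];
}

// Request body documented above: a single "question" field.
function buildQaBody(question: string): string {
  return JSON.stringify({ question });
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function isQaCitation(value: unknown): value is QaCitation {
  return (
    isRecord(value) &&
    typeof value.paperId === "string" &&
    typeof value.paperTitle === "string" &&
    typeof value.excerpt === "string"
  );
}

// Validates the assumed response shape before it reaches the UI.
function isQaResponse(value: unknown): value is QaResponse {
  return (
    isRecord(value) &&
    typeof value.answer === "string" &&
    Array.isArray(value.citations) &&
    value.citations.every(isQaCitation)
  );
}
```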

Get Conversation History

GET /api/conversations

🔍 How It Works

1. Paper Upload Pipeline

PDF/Text Upload
    ↓
Extract Text (pdf-parse)
    ↓
Save Paper Metadata
    ↓
Chunk Text (~500 chars/chunk)
    ↓
Generate Embeddings (OpenAI)
    ↓
Store Chunks + Vectors
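The chunking step can be sketched as a word-boundary splitter. The pipeline above only states a size of roughly 500 characters; breaking on whitespace, using no overlap between chunks, and hard-splitting tokens longer than the limit are assumptions of this sketch, and chunkText is a hypothetical name.

```typescript
// Splits text into chunks of at most maxChars characters, breaking on
// whitespace so words are not cut in half. Assumption: no overlap.
function chunkText(text: string, maxChars = 500): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const word of words) {
    // A single token longer than the limit is hard-split into pieces.
    if (word.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < word.length; i += maxChars) {
        chunks.push(word.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars) {
      // current is non-empty here, because a lone word always fits.
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Many retrieval pipelines add a small overlap between neighbouring chunks so that a sentence cut at a boundary still appears whole in one of them; that would be a one-line change to how current is reset.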

2. PICO Extraction

Paper Abstract + First 2000 chars
    ↓
Send to GPT-5 with specialized prompt
    ↓
Extract Population, Intervention, Comparison, Outcome
    ↓
Calculate confidence scores (0.0-1.0)
    ↓
Store structured PICO data
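The input assembly and output handling around the GPT-5 call can be sketched as below. The JSON shape requested from the model (one object with text and confidence per PICO element) and the names buildPicoInput and parsePicoResponse are assumptions, not this project's code; clamping into the 0.0-1.0 range guards against model output that drifts outside it.

```typescript
// Hypothetical structured PICO result, one entry per element.
interface PicoElement {
  text: string;
  confidence: number; // 0.0-1.0
}

const PICO_KEYS = ["population", "intervention", "comparison", "outcome"] as const;
type PicoKey = (typeof PICO_KEYS)[number];
type PicoResult = Record<PicoKey, PicoElement>;

// Model input: the abstract plus the first 2000 characters of the full text.
function buildPicoInput(abstract: string, fullText: string): string {
  return `Abstract:\n${abstract}\n\nText excerpt:\n${fullText.slice(0, 2000)}`;
}

// Parses the model's JSON reply. Missing elements default to empty text
// with zero confidence; confidence values are clamped into [0, 1].
function parsePicoResponse(raw: string): PicoResult {
  const parsed: unknown = JSON.parse(raw);
  const data = (typeof parsed === "object" && parsed !== null ? parsed : {}) as Record<string, unknown>;
  const result = {} as PicoResult;
  for (const key of PICO_KEYS) {
    const el = (data[key] ?? {}) as { text?: unknown; confidence?: unknown };
    const text = typeof el.text === "string" ? el.text : "";
    const conf =
      typeof el.confidence === "number" && Number.isFinite(el.confidence) ? el.confidence : 0;
    result[key] = { text, confidence: Math.min(1, Math.max(0, conf)) };
  }
  return result;
}
```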

3. Entity Recognition

Paper Abstract + Full Text
    ↓
Send to GPT-5 for NER
    ↓
Extract entities by type:
  - Diseases
  - Drugs
  - Proteins
  - Genes
    ↓
Store with frequency counts
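Once the model returns a list of entity mentions, grouping and counting them is ordinary bookkeeping. The sketch below assumes each mention arrives as an object with a type and a name and treats names case-insensitively; groupEntities, EntityMention, and the four type labels are hypothetical.

```typescript
type EntityType = "disease" | "drug" | "protein" | "gene";

interface EntityMention {
  type: EntityType;
  name: string;
}

interface EntityCount {
  name: string;  // first spelling seen
  count: number; // number of mentions
}

// Groups mentions by type and counts repeats, ignoring case and surrounding
// whitespace. Each group is sorted by descending count, then by name.
function groupEntities(mentions: EntityMention[]): Record<EntityType, EntityCount[]> {
  const groups: Record<EntityType, Map<string, EntityCount>> = {
    disease: new Map(),
    drug: new Map(),
    protein: new Map(),
    gene: new Map(),
  };
  for (const { type, name } of mentions) {
    const display = name.trim();
    const key = display.toLowerCase();
    if (key === "") continue;
    const existing = groups[type].get(key);
    if (existing) {
      existing.count += 1;
    } else {
      groups[type].set(key, { name: display, count: 1 });
    }
  }
  const sorted = (m: Map<string, EntityCount>): EntityCount[] =>
    Array.from(m.values()).sort((a, b) => b.count - a.count || a.name.localeCompare(b.name));
  return {
    disease: sorted(groups.disease),
    drug: sorted(groups.drug),
    protein: sorted(groups.protein),
    gene: sorted(groups.gene),
  };
}
```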

4. Semantic Search

User Query
    ↓
Generate Query Embedding
    ↓
Calculate Cosine Similarity with all chunks
    ↓
Filter by threshold (>0.7)
    ↓
Rank by relevance
    ↓
Return top 10 results with excerpts
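The ranking step is a linear scan: each stored chunk vector is compared with the query vector using cosine similarity, a·b / (|a| |b|), then filtered, sorted, and truncated. The threshold of 0.7 and the limit of 10 come from the diagram above; the rounded relevance percentage is an assumption about how the displayed score is derived, and searchChunks and cosineSimilarity are hypothetical names.

```typescript
interface EmbeddedChunk {
  paperId: string;
  text: string;
  embedding: number[]; // e.g. 1536 dimensions for text-embedding-3-small
}

interface SearchResult {
  paperId: string;
  excerpt: string;
  score: number;     // cosine similarity
  relevance: number; // score as a rounded percentage
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query, keeps scores above the threshold,
// sorts best-first, and returns at most `limit` results.
function searchChunks(
  query: number[],
  chunks: EmbeddedChunk[],
  threshold = 0.7,
  limit = 10,
): SearchResult[] {
  return chunks
    .map((c) => {
      const score = cosineSimilarity(query, c.embedding);
      return { paperId: c.paperId, excerpt: c.text, score, relevance: Math.round(score * 100) };
    })
    .filter((r) => r.score > threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

OpenAI documents its embeddings as normalized to length 1, in which case the dot product alone yields the same ranking; the full formula is kept so the function also behaves correctly on unnormalized vectors.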

5. Q&A System

User Question
    ↓
Generate Question Embedding
    ↓
Find top 5 relevant chunks (>0.6 similarity)
    ↓
Build context from relevant passages
    ↓
Send to GPT-5 with citation instructions
    ↓
Generate answer with [1], [2] style citations
    ↓
Map citations to source papers
    ↓
Return answer + citation metadata
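The steps around the model call (selecting passages, numbering them for the prompt, and mapping [n] markers back to sources) can be sketched as follows. The 0.6 threshold and top-5 limit come from the diagram; the context layout, the ScoredPassage and Citation shapes, and the function names are assumptions of this sketch.

```typescript
interface ScoredPassage {
  paperId: string;
  paperTitle: string;
  text: string;
  score: number; // cosine similarity to the question
}

interface Citation {
  number: number;
  paperId: string;
  paperTitle: string;
  excerpt: string;
}

// Keeps at most `limit` passages scoring above the threshold, best-first.
function selectPassages(passages: ScoredPassage[], threshold = 0.6, limit = 5): ScoredPassage[] {
  return passages
    .filter((p) => p.score > threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Numbers each passage so the model can cite it as [1], [2], ...
function buildContext(passages: ScoredPassage[]): string {
  return passages.map((p, i) => `[${i + 1}] ${p.paperTitle}\n${p.text}`).join("\n\n");
}

// Maps [n] markers in the answer back to passages in first-appearance order,
// skipping duplicates and numbers that do not correspond to a passage.
function mapCitations(answer: string, passages: ScoredPassage[]): Citation[] {
  const citations: Citation[] = [];
  const seen = new Set<number>();
  const marker = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = marker.exec(answer)) !== null) {
    const n = Number(match[1]);
    if (n < 1 || n > passages.length || seen.has(n)) continue;
    seen.add(n);
    const p = passages[n - 1];
    citations.push({ number: n, paperId: p.paperId, paperTitle: p.paperTitle, excerpt: p.text });
  }
  return citations;
}
```

Ignoring out-of-range numbers matters because a model can emit a marker such as [9] that matches no supplied passage; dropping it avoids linking the reader to the wrong paper.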

📖 Usage Guide

Uploading Your First Paper

  1. Navigate to "Upload Papers" in the sidebar
  2. Fill in paper details:
    • Title (required)
    • Authors (required)
    • Abstract (required)
  3. Add the full text:
    • Option A: Drag & drop a PDF file
    • Option B: Drag & drop a .txt file
    • Option C: Paste text directly
  4. Click "Upload Paper"

The system will automatically:

  • Extract text from PDFs
  • Generate embeddings for semantic search
  • Make the paper available for analysis

Extracting PICO Elements

  1. Go to "Paper Library"
  2. Click on a paper to select it
  3. Click "Extract PICO" button
  4. Wait for analysis (typically 5-10 seconds)
  5. View results in the PICO tab

Extracting Entities

  1. Select a paper in the library
  2. Click "Extract Entities"
  3. View results in the Entities tab
  4. Entities are grouped by type and show frequency counts

Searching Papers

  1. Navigate to "Search"
  2. Enter your research question in natural language
    • Example: "What are the side effects of metformin?"
  3. Click "Search"
  4. Review results ranked by relevance
  5. Click on results to see full context

Asking Questions

  1. Go to "Q&A Assistant"
  2. Type your question about your research papers
    • Example: "Summarize the findings about diabetes treatments"
  3. Click send or press Enter
  4. Review the answer with citations
  5. Click citations to see source excerpts

🔧 Development

Running Locally

# Install dependencies
npm install

# Start development server
npm run dev

Adding New Features

  1. Define data model in shared/schema.ts
  2. Update storage interface in server/storage.ts
  3. Create API routes in server/routes.ts
  4. Build React components in client/src/pages/
  5. Add to navigation in client/src/components/app-sidebar.tsx
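Steps 1 and 2 follow a storage-interface pattern: routes depend on an interface, and MemStorage is one implementation of it. A minimal sketch, assuming a plain TypeScript type in place of the Drizzle/Zod definitions in shared/schema.ts; IStorage, the method names, and the counter-based ids are hypothetical, and only the MemStorage name is taken from this README.

```typescript
// Hypothetical data model (step 1 would normally live in shared/schema.ts).
interface Paper {
  id: string;
  title: string;
  authors: string;
  abstract: string;
  fullText: string;
  uploadedAt: Date;
}

type InsertPaper = Omit<Paper, "id" | "uploadedAt">;

// Hypothetical storage contract (step 2, server/storage.ts).
interface IStorage {
  createPaper(input: InsertPaper): Promise<Paper>;
  getPaper(id: string): Promise<Paper | undefined>;
  listPapers(): Promise<Paper[]>;
  deletePaper(id: string): Promise<boolean>;
}

// In-memory implementation: data is lost when the process restarts.
class MemStorage implements IStorage {
  private papers = new Map<string, Paper>();
  private nextId = 1;

  async createPaper(input: InsertPaper): Promise<Paper> {
    const paper: Paper = { ...input, id: String(this.nextId++), uploadedAt: new Date() };
    this.papers.set(paper.id, paper);
    return paper;
  }

  async getPaper(id: string): Promise<Paper | undefined> {
    return this.papers.get(id);
  }

  async listPapers(): Promise<Paper[]> {
    return Array.from(this.papers.values());
  }

  async deletePaper(id: string): Promise<boolean> {
    return this.papers.delete(id);
  }
}
```

Because route handlers would only see IStorage, a database-backed class (the PostgreSQL item under Future Enhancements) could replace MemStorage without touching the routes.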

Code Style

  • TypeScript: Strict mode enabled
  • Formatting: Automatic via Vite
  • Components: Functional components with hooks
  • State: TanStack Query for server state
  • Styling: Tailwind utility classes

Testing

The application relies on several layers of validation and error handling:

  • Zod schemas for all API inputs
  • TypeScript for compile-time type safety
  • Error boundaries for runtime errors
  • Toast notifications for user feedback

🚀 Deployment

Publishing on Replit

  1. Click the "Publish" button in the top right
  2. Configure your deployment settings

Custom Domain

  1. Go to deployment settings
  2. Add your custom domain
  3. Configure DNS records as instructed

Environment Variables

Ensure these secrets are set:

  • OPENAI_API_KEY - Your OpenAI API key
  • SESSION_SECRET - Random secret for sessions (auto-generated)
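A startup check that fails fast when either secret is missing can be sketched as follows. The variable names come from the list above; requireEnv, loadConfig, and AppConfig are hypothetical, and taking the environment as a parameter (for example loadConfig(process.env) in the server entry point) keeps the function easy to test.

```typescript
type Env = Record<string, string | undefined>;

// Returns the variable's value or throws a descriptive error.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

interface AppConfig {
  openaiApiKey: string;
  sessionSecret: string;
}

// Reads all required settings once, at startup.
function loadConfig(env: Env): AppConfig {
  return {
    openaiApiKey: requireEnv(env, "OPENAI_API_KEY"),
    sessionSecret: requireEnv(env, "SESSION_SECRET"),
  };
}
```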

🔮 Future Enhancements

Planned Features

  1. SciSpacy Integration

    • Advanced biomedical NER with medical vocabularies
    • Relationship extraction between entities
    • Medical ontology linking
  2. Batch Processing Pipeline

    • Upload multiple papers at once
    • Process 1000+ papers in parallel
    • Background job queue
  3. Database Persistence

    • PostgreSQL with Drizzle migrations
    • Data persistence across restarts
    • Better scalability
  4. PubMed Integration

    • Import papers directly from PubMed
    • Automatic metadata fetching
    • Citation network analysis
  5. Advanced Analytics

    • Comparative analysis across papers
    • Meta-analysis capabilities
    • Trend detection and visualization
  6. n8n Workflow Automation

    • Automated paper processing pipelines
    • Integration with external APIs
    • Scheduled analysis jobs

n8n workflow snapshot (super simple)

To keep BioPaperGenie automation beginner-friendly, the project ships with a minimal three-node n8n workflow:

  1. HTTP trigger (n8n's Webhook node) - waits for a basic POST payload containing a paper URL or title.
  2. HTTP Request node - forwards that payload to the server's /api/papers endpoint, acting as a headless upload form.
  3. Email node - sends a "paper ingested" confirmation so you know the job finished.

It's intentionally small, but it is enough to explain (for example, in an interview) how n8n can orchestrate BioPaperGenie without touching backend code.

  7. Collaboration Features

    • Share papers and analyses
    • Team workspaces
    • Comments and annotations
  8. Export & Reports

    • PDF reports of analyses
    • Export to CSV/JSON
    • Integration with reference managers

🤝 Contributing

Contributions are welcome! Here's how you can help:

  1. Report bugs - Open an issue describing the problem
  2. Suggest features - Share your ideas for improvements
  3. Submit PRs - Fork, make changes, and submit pull requests
  4. Improve docs - Help make documentation clearer

📄 License

This project is licensed under the MIT License.

🙏 Acknowledgments

  • OpenAI for GPT-5 and embeddings API
  • Shadcn for the beautiful UI component library
  • Replit for the development platform
  • pdf-parse for PDF text extraction
  • Biomedical research community for inspiration

📧 Support

Need help? Have questions?

  • 📝 Check existing documentation
  • 🐛 Report bugs via issues
  • 💬 Ask questions in discussions
  • 📧 Contact support

Built with ❤️ for biomedical researchers

Last updated: November 2025
