An advanced AI-powered platform for analyzing biomedical research papers with automated PICO extraction, entity recognition, semantic search, and intelligent Q&A with verified citations.
This repository hosts the full-stack implementation published at Generative-AI-Agent-Biomedical-Research-Papers-Review.
- Features
- Technology Stack
- Getting Started
- Project Structure
- API Documentation
- How It Works
- Usage Guide
- Development
- Deployment
- Future Enhancements
- Contributing
- PDF Support: Upload biomedical research papers in PDF format
- Text Support: Also accepts plain text files
- Automatic Processing: Papers are automatically chunked and embedded for semantic search
- Library Management: Organize and browse all uploaded papers
Automatically extracts the four key components of clinical research:
- Population: Patient groups or subjects being studied
- Intervention: Treatments or exposures being investigated
- Comparison: Alternative treatments or control groups
- Outcome: Measured results or endpoints
Each element includes confidence scores (0.0-1.0) and is displayed with color-coded visual sections.
Identifies and extracts key biomedical entities:
- Diseases/Conditions: Medical conditions and diagnoses
- Drugs/Medications: Pharmaceutical compounds and treatments
- Proteins: Protein names and identifiers
- Genes: Genetic markers and gene names
Entities are grouped by type with frequency counts for easy analysis.
- AI-Powered Search: Natural language queries across all papers
- Relevance Scoring: Cosine similarity matching with relevance percentages
- Contextual Excerpts: Shows matching passages with highlighting
- Fast Results: Optimized vector search with embedding indices
- Conversational Interface: Chat-style Q&A about your research papers
- Context-Aware Answers: AI analyzes relevant passages before answering
- Automatic Citations: All answers include citations with source papers
- Citation Tracking: Click citations to view full excerpts and paper details
- Conversation History: All Q&A sessions are saved for reference
- Real-time Statistics: Track papers, analyses, entities, and Q&A sessions
- Recent Activity: View recently uploaded papers
- Visual Overview: Clean interface showing your research corpus at a glance
- Dark Mode: Full dark/light theme support with toggle
- Responsive Design: Works seamlessly on desktop, tablet, and mobile
- Accessibility: WCAG AA compliant with keyboard navigation
- Modern Design: Clean, professional interface optimized for research workflows
- React 18 - Modern UI library with hooks
- TypeScript - Type-safe development
- Wouter - Lightweight routing
- TanStack Query v5 - Server state management
- Tailwind CSS - Utility-first styling
- Shadcn UI - High-quality component library
- Lucide Icons - Beautiful icon set
- Vite - Lightning-fast build tool
- Node.js 20 - JavaScript runtime
- Express.js - Web application framework
- TypeScript - Type-safe server code
- Multer - File upload handling
- pdf-parse - PDF text extraction
- OpenAI GPT-5 - Advanced language model (released August 2025)
- text-embedding-3-small - Text embedding for semantic search
- Vector Embeddings - 1536-dimensional vectors for similarity matching
- Cosine Similarity - Mathematical measure for relevance scoring
- Drizzle ORM - Type-safe database toolkit
- Zod - Schema validation
- In-Memory Storage - Fast development with MemStorage
- Node.js 18 or higher
- OpenAI API key
- Clone or fork this Repl
- Set up environment variables
  Add your OpenAI API key to Replit Secrets:
  - Open the "Secrets" tab in the left sidebar
  - Add secret: OPENAI_API_KEY=your-api-key-here
- Start the application
  npm run dev
- Access the application
  - The app runs on port 5000
  - Click the web preview or open your Repl URL
biomedical-research-ai-agent/
├── client/                      # Frontend React application
│   ├── public/                  # Static assets
│   ├── src/
│   │   ├── components/          # Reusable React components
│   │   │   ├── ui/              # Shadcn UI components
│   │   │   ├── app-sidebar.tsx
│   │   │   ├── theme-provider.tsx
│   │   │   └── theme-toggle.tsx
│   │   ├── hooks/               # Custom React hooks
│   │   ├── lib/                 # Utilities and helpers
│   │   ├── pages/               # Page components
│   │   │   ├── dashboard.tsx
│   │   │   ├── upload.tsx
│   │   │   ├── library.tsx
│   │   │   ├── search.tsx
│   │   │   ├── qa.tsx
│   │   │   └── not-found.tsx
│   │   ├── App.tsx              # Main app component
│   │   ├── index.css            # Global styles
│   │   └── main.tsx             # Entry point
│   └── index.html               # HTML template
├── server/                      # Backend Express application
│   ├── index.ts                 # Server entry point
│   ├── routes.ts                # API route handlers
│   ├── storage.ts               # Data storage layer
│   ├── openai.ts                # OpenAI integration
│   ├── pdf-parser.ts            # PDF text extraction
│   └── vite.ts                  # Vite dev server integration
├── shared/                      # Shared code between frontend/backend
│   └── schema.ts                # TypeScript types & Zod schemas
├── package.json                 # Dependencies and scripts
├── tsconfig.json                # TypeScript configuration
├── tailwind.config.ts           # Tailwind CSS configuration
├── vite.config.ts               # Vite build configuration
├── replit.md                    # Project documentation
└── README.md                    # This file
POST /api/papers
Content-Type: application/json
{
  "title": "Paper title",
  "authors": "Author names",
  "abstract": "Abstract text",
  "fullText": "Full paper text"
}

POST /api/papers/upload-pdf
Content-Type: multipart/form-data
Fields:
- pdf: PDF file
- title: Paper title
- authors: Author names
- abstract: Abstract text

GET /api/papers
GET /api/papers/:id
DELETE /api/papers/:id

POST /api/papers/:id/extract-pico
Returns PICO analysis with confidence scores.

POST /api/papers/:id/extract-entities
Returns diseases, drugs, proteins, and genes.

GET /api/pico-elements
GET /api/entities

GET /api/search?query=your+search+query
Returns relevant paper excerpts ranked by similarity.

POST /api/qa
Content-Type: application/json
{
  "question": "What are the effects of metformin?"
}
Returns answer with citations.
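As a rough illustration of calling the Q&A endpoint from a client, the sketch below builds the request for POST /api/qa. The helper name `buildQaRequest` and the default base URL `http://localhost:5000` are assumptions (the port comes from the setup steps; the host is a guess for local development), and the response shape is not documented here, so it is left out.

```typescript
// Hypothetical helper: builds the URL and request options for POST /api/qa.
// The base URL is an assumption; only the path, method, header, and body
// shape come from the endpoint description.
function buildQaRequest(
  question: string,
  baseUrl: string = "http://localhost:5000",
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  const trimmed = question.trim();
  if (trimmed.length === 0) {
    throw new Error("question must not be empty");
  }
  return {
    url: `${baseUrl}/api/qa`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question: trimmed }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)` in a browser or in Node.js 18+.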
GET /api/conversations

PDF/Text Upload
↓
Extract Text (pdf-parse)
↓
Save Paper Metadata
↓
Chunk Text (~500 chars/chunk)
↓
Generate Embeddings (OpenAI)
↓
Store Chunks + Vectors
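
The chunking step in the pipeline above could look roughly like the following. This is a minimal sketch, not the project's actual implementation: the function name `chunkText` and the whitespace-aware splitting are assumptions, and only the ~500-character target comes from the pipeline.

```typescript
// Splits text into chunks of at most `maxChars` characters, breaking on
// whitespace where possible so words are not cut in half.
// Runs of whitespace are normalized to single spaces (an assumption).
function chunkText(text: string, maxChars: number = 500): string[] {
  if (maxChars <= 0) {
    throw new Error("maxChars must be positive");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current = "";

  for (const word of words) {
    let rest = word;
    // A single word longer than maxChars is hard-split into pieces.
    while (rest.length > maxChars) {
      if (current.length > 0) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    const candidate = current.length === 0 ? rest : `${current} ${rest}`;
    if (candidate.length > maxChars) {
      // The word does not fit: close the current chunk and start a new one.
      chunks.push(current);
      current = rest;
    } else {
      current = candidate;
    }
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}
```

Each resulting chunk would then be sent to the embedding model and stored alongside its vector.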

Paper Abstract + First 2000 chars
↓
Send to GPT-5 with specialized prompt
↓
Extract Population, Intervention, Comparison, Outcome
↓
Calculate confidence scores (0.0-1.0)
↓
Store structured PICO data
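
To make the input and output of the PICO pipeline above concrete, here is a hedged sketch of the two deterministic ends of it: assembling the model input and normalizing the returned scores. The type and function names (`PicoResult`, `buildPicoInput`, `normalizePico`) are hypothetical, and the model call itself is omitted.

```typescript
type PicoKey = "population" | "intervention" | "comparison" | "outcome";
const PICO_KEYS: PicoKey[] = ["population", "intervention", "comparison", "outcome"];

interface PicoElement {
  text: string;
  confidence: number; // expected range: 0.0 to 1.0
}
type PicoResult = Record<PicoKey, PicoElement>;

// Builds the model input: the abstract plus the first `maxChars` characters
// of the full text (2000 in the pipeline above).
function buildPicoInput(abstract: string, fullText: string, maxChars: number = 2000): string {
  return `${abstract}\n\n${fullText.slice(0, maxChars)}`;
}

// Clamps a confidence score into [0, 1]; NaN is treated as 0 (an assumption).
function clampConfidence(value: number): number {
  if (Number.isNaN(value)) {
    return 0;
  }
  return Math.min(1, Math.max(0, value));
}

// Fills in missing elements and clamps scores so every stored result has
// all four PICO components.
function normalizePico(raw: Partial<Record<PicoKey, Partial<PicoElement>>>): PicoResult {
  const result = {} as PicoResult;
  for (const key of PICO_KEYS) {
    const element = raw[key];
    result[key] = {
      text: element?.text ?? "",
      confidence: clampConfidence(element?.confidence ?? 0),
    };
  }
  return result;
}
```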

Paper Abstract + Full Text
↓
Send to GPT-5 for NER
↓
Extract entities by type:
- Diseases
- Drugs
- Proteins
- Genes
↓
Store with frequency counts
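
The final step of the pipeline above, grouping entities by type with frequency counts, might be implemented along these lines. This is a sketch under assumptions: the names `EntityMention` and `groupEntities` are hypothetical, and case-insensitive matching of entity names is a design choice not stated in the source.

```typescript
type EntityType = "disease" | "drug" | "protein" | "gene";

interface EntityMention {
  type: EntityType;
  name: string;
}

interface EntityCount {
  name: string;
  count: number;
}

// Groups mentions by entity type and counts how often each name occurs.
// Names are matched case-insensitively; the first-seen spelling is kept.
function groupEntities(mentions: EntityMention[]): Record<EntityType, EntityCount[]> {
  const buckets: Record<EntityType, Map<string, EntityCount>> = {
    disease: new Map(),
    drug: new Map(),
    protein: new Map(),
    gene: new Map(),
  };

  for (const mention of mentions) {
    const display = mention.name.trim();
    if (display.length === 0) {
      continue;
    }
    const key = display.toLowerCase();
    const bucket = buckets[mention.type];
    const existing = bucket.get(key);
    if (existing) {
      existing.count += 1;
    } else {
      bucket.set(key, { name: display, count: 1 });
    }
  }

  // Most frequent first; ties are broken alphabetically.
  const sorted = (bucket: Map<string, EntityCount>): EntityCount[] =>
    Array.from(bucket.values()).sort(
      (a, b) => b.count - a.count || a.name.localeCompare(b.name),
    );

  return {
    disease: sorted(buckets.disease),
    drug: sorted(buckets.drug),
    protein: sorted(buckets.protein),
    gene: sorted(buckets.gene),
  };
}
```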

User Query
↓
Generate Query Embedding
↓
Calculate Cosine Similarity with all chunks
↓
Filter by threshold (>0.7)
↓
Rank by relevance
↓
Return top 10 results with excerpts
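
The scoring, filtering, and ranking steps above can be sketched as follows. The threshold (0.7) and result limit (10) come from the pipeline; the type and function names are hypothetical, and converting a score to a percentage by rounding is an assumption about how the relevance percentages are derived.

```typescript
interface EmbeddedChunk {
  paperId: string;
  text: string;
  embedding: number[];
}

interface ScoredChunk {
  paperId: string;
  text: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query, keeps those above the threshold,
// and returns the best `topK` in descending order of similarity.
function searchChunks(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  threshold: number = 0.7,
  topK: number = 10,
): ScoredChunk[] {
  return chunks
    .map((c) => ({
      paperId: c.paperId,
      text: c.text,
      score: cosineSimilarity(queryEmbedding, c.embedding),
    }))
    .filter((r) => r.score > threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Converts a similarity score to a whole-number percentage for display.
function toRelevancePercent(score: number): number {
  return Math.round(score * 100);
}
```

This is a brute-force scan over all chunks, which matches the "with all chunks" step; a larger corpus would likely need an approximate nearest-neighbor index instead.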

User Question
↓
Generate Question Embedding
↓
Find top 5 relevant chunks (>0.6 similarity)
↓
Build context from relevant passages
↓
Send to GPT-5 with citation instructions
↓
Generate answer with [1], [2] style citations
↓
Map citations to source papers
↓
Return answer + citation metadata
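
Two steps of the Q&A flow above are purely mechanical and can be sketched without the model call: building a numbered context from the retrieved chunks, and mapping the [1], [2] markers in the answer back to their sources. The names `SourceChunk`, `buildContext`, and `mapCitations` are hypothetical, and the exact context format is an assumption.

```typescript
interface SourceChunk {
  paperId: string;
  paperTitle: string;
  excerpt: string;
}

interface CitationRef extends SourceChunk {
  marker: number; // the N in "[N]"
}

// Numbers each retrieved passage so the model can cite it as [1], [2], ...
function buildContext(chunks: SourceChunk[]): string {
  return chunks
    .map((c, i) => `[${i + 1}] ${c.paperTitle}: ${c.excerpt}`)
    .join("\n\n");
}

// Finds [N] markers in the answer and maps each to the N-th chunk.
// Markers that point outside the context are ignored; duplicates are
// reported once, in order of first appearance.
function mapCitations(answer: string, chunks: SourceChunk[]): CitationRef[] {
  const seen = new Set<number>();
  const citations: CitationRef[] = [];
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(answer)) !== null) {
    const marker = Number(match[1]);
    const chunk = chunks[marker - 1];
    if (chunk === undefined || seen.has(marker)) {
      continue;
    }
    seen.add(marker);
    citations.push({ marker, ...chunk });
  }
  return citations;
}
```

Dropping markers that do not correspond to a retrieved passage is one way to keep citations verifiable, since the model can occasionally emit a number that was never in the context.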
- Navigate to "Upload Papers" in the sidebar
- Fill in paper details:
  - Title (required)
  - Authors (required)
  - Abstract (required)
- Add the full text:
  - Option A: Drag & drop a PDF file
  - Option B: Drag & drop a .txt file
  - Option C: Paste text directly
- Click "Upload Paper"
The system will automatically:
- Extract text from PDFs
- Generate embeddings for semantic search
- Make the paper available for analysis
- Go to "Paper Library"
- Click on a paper to select it
- Click "Extract PICO" button
- Wait for analysis (typically 5-10 seconds)
- View results in the PICO tab
- Select a paper in the library
- Click "Extract Entities"
- View results in the Entities tab
- Entities are grouped by type and show frequency counts
- Navigate to "Search"
- Enter your research question in natural language
  - Example: "What are the side effects of metformin?"
- Click "Search"
- Review results ranked by relevance
- Click on results to see full context
- Go to "Q&A Assistant"
- Type your question about your research papers
  - Example: "Summarize the findings about diabetes treatments"
- Click send or press Enter
- Review the answer with citations
- Click citations to see source excerpts
# Install dependencies
npm install
# Start development server
npm run dev

- Define data model in shared/schema.ts
- Update storage interface in server/storage.ts
- Create API routes in server/routes.ts
- Build React components in client/src/pages/
- Add to navigation in client/src/components/app-sidebar.tsx
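
As an illustration of the first two steps above, here is what a data model and an in-memory storage layer might look like for a made-up "annotations" feature. Everything in this sketch is hypothetical: the `Annotation` model, the `AnnotationStorage` interface, and the synchronous method signatures are assumptions for illustration, not the project's actual schema or storage API.

```typescript
// Step 1 (would live in shared/schema.ts): a hypothetical data model.
interface Annotation {
  id: string;
  paperId: string;
  text: string;
  createdAt: Date;
}
type InsertAnnotation = Pick<Annotation, "paperId" | "text">;

// Step 2 (would live in server/storage.ts): the storage interface and an
// in-memory implementation in the spirit of MemStorage.
interface AnnotationStorage {
  createAnnotation(input: InsertAnnotation): Annotation;
  getAnnotationsByPaper(paperId: string): Annotation[];
}

class MemAnnotationStorage implements AnnotationStorage {
  private readonly annotations = new Map<string, Annotation>();
  private nextId = 1;

  createAnnotation(input: InsertAnnotation): Annotation {
    const annotation: Annotation = {
      id: String(this.nextId++),
      paperId: input.paperId,
      text: input.text,
      createdAt: new Date(),
    };
    this.annotations.set(annotation.id, annotation);
    return annotation;
  }

  getAnnotationsByPaper(paperId: string): Annotation[] {
    return Array.from(this.annotations.values()).filter(
      (a) => a.paperId === paperId,
    );
  }
}
```

Coding against the interface rather than the class keeps the route handlers unchanged if the in-memory store is later swapped for a database-backed one.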
- TypeScript: Strict mode enabled
- Formatting: Automatic via Vite
- Components: Functional components with hooks
- State: TanStack Query for server state
- Styling: Tailwind utility classes
The application includes comprehensive data validation:
- Zod schemas for all API inputs
- TypeScript for compile-time type safety
- Error boundaries for runtime errors
- Toast notifications for user feedback
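
To show the kind of check the input schemas perform, the sketch below validates a POST /api/papers body. The project uses Zod for this; the hand-written validator here is a dependency-free stand-in, not the project's schema. The name `validatePaperInput` is hypothetical, and treating `fullText` as required for the JSON endpoint is an assumption.

```typescript
interface PaperInput {
  title: string;
  authors: string;
  abstract: string;
  fullText: string;
}

type ValidationResult =
  | { ok: true; value: PaperInput }
  | { ok: false; errors: string[] };

// Checks that the body is an object whose four fields are non-empty strings.
// All problems are collected so the client can show them in one response.
function validatePaperInput(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];

  const read = (field: keyof PaperInput): string => {
    const value = record[field];
    if (typeof value !== "string" || value.trim().length === 0) {
      errors.push(`${field} is required`);
      return "";
    }
    return value.trim();
  };

  const value: PaperInput = {
    title: read("title"),
    authors: read("authors"),
    abstract: read("abstract"),
    fullText: read("fullText"),
  };

  return errors.length === 0 ? { ok: true, value } : { ok: false, errors };
}
```

A route handler would typically respond with HTTP 400 and the `errors` array when `ok` is false.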
- Click the "Publish" button in the top right
- Configure your deployment settings
- Go to deployment settings
- Add your custom domain
- Configure DNS records as instructed
Ensure these secrets are set:
- OPENAI_API_KEY - Your OpenAI API key
- SESSION_SECRET - Random secret for sessions (auto-generated)
- scispaCy Integration
  - Advanced biomedical NER with medical vocabularies
  - Relationship extraction between entities
  - Medical ontology linking
- Batch Processing Pipeline
  - Upload multiple papers at once
  - Process 1000+ papers in parallel
  - Background job queue
- Database Persistence
  - PostgreSQL with Drizzle migrations
  - Data persistence across restarts
  - Better scalability
- PubMed Integration
  - Import papers directly from PubMed
  - Automatic metadata fetching
  - Citation network analysis
- Advanced Analytics
  - Comparative analysis across papers
  - Meta-analysis capabilities
  - Trend detection and visualization
- n8n Workflow Automation
  - Automated paper processing pipelines
  - Integration with external APIs
  - Scheduled analysis jobs
To keep BioPaperGenie automation beginner-friendly, the project ships with a small n8n workflow:
- HTTP Trigger: waits for a basic POST payload containing a paper URL/title.
- HTTP Request node: forwards that payload directly to the server's /api/papers endpoint (acting like a headless upload form).
- Email node: sends a "paper ingested" confirmation so you know the job finished.
It's intentionally small, but it is enough to explain in an interview how n8n can orchestrate BioPaperGenie without touching backend code.
- Collaboration Features
  - Share papers and analyses
  - Team workspaces
  - Comments and annotations
- Export & Reports
  - PDF reports of analyses
  - Export to CSV/JSON
  - Integration with reference managers
Contributions are welcome! Here's how you can help:
- Report bugs - Open an issue describing the problem
- Suggest features - Share your ideas for improvements
- Submit PRs - Fork, make changes, and submit pull requests
- Improve docs - Help make documentation clearer
This project is licensed under the MIT License.
- OpenAI for GPT-5 and embeddings API
- Shadcn for the beautiful UI component library
- Replit for the development platform
- pdf-parse for PDF text extraction
- Biomedical research community for inspiration
Need help? Have questions?
- Check existing documentation
- Report bugs via issues
- Ask questions in discussions
- Contact support
Built with ❤️ for biomedical researchers
Last updated: November 2025