A local-first, self-hosted research assistant inspired by NotebookLM. Built with R and Shiny.
Named after the Serapeum of Alexandria, the daughter library of the ancient Library of Alexandria.
Upload PDFs and chat with your documents using RAG (Retrieval-Augmented Generation).
- Chat with citations - Get answers with document name and page number references
- Markdown rendering - Assistant responses display with formatted headers, tables, lists, and code blocks
- One-click presets - Summarize, Key Points, Study Guide, Outline, and more
- Chat export - Download conversations as Markdown (.md) or styled HTML (.html)
- Semantic search - Vector embeddings enable meaning-based search across documents
- Slide generation - Generate Quarto RevealJS presentations from your research
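The retrieval step behind "chat with your documents" can be illustrated with a small base R sketch. This is not Serapeum's actual pipeline; the function names (`cosine_similarity`, `top_k_chunks`) and the three-dimensional toy vectors are assumptions for illustration. Real embeddings come from the configured embedding model and have hundreds of dimensions or more.

```r
# Toy retrieval: rank stored chunks by cosine similarity to a query vector.
# cosine_similarity() and top_k_chunks() are hypothetical helper names.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

top_k_chunks <- function(query_vec, chunk_matrix, k = 2) {
  # One similarity score per row (chunk) of the embedding matrix
  scores <- apply(chunk_matrix, 1, cosine_similarity, b = query_vec)
  ranked <- order(scores, decreasing = TRUE)[seq_len(min(k, length(scores)))]
  data.frame(
    chunk = rownames(chunk_matrix)[ranked],
    score = unname(scores[ranked]),
    stringsAsFactors = FALSE
  )
}

# Made-up embeddings, labelled with document name and page number
chunks <- rbind(
  "paper_a.pdf p.3" = c(0.9, 0.1, 0.0),
  "paper_a.pdf p.7" = c(0.1, 0.8, 0.1),
  "paper_b.pdf p.1" = c(0.7, 0.3, 0.0)
)
query <- c(1, 0, 0)

result <- top_k_chunks(query, chunks, k = 2)
print(result)
```

The top-ranked chunks, together with their document and page labels, are what a RAG prompt would pass to the chat model so that answers can cite their sources.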
Discover and curate academic papers via OpenAlex (240M+ scholarly works).
- Smart search - Query across titles, abstracts, or full text
- Document type filters - Filter by article, review, preprint, book, dissertation, dataset
- Quality filters - Exclude retracted papers, flag predatory journals/publishers
- Citation filters - Set minimum citation thresholds
- Rich metadata display:
  - Type badges (article, review, preprint, etc.)
  - Open Access status badges (gold, green, hybrid, bronze, closed)
  - Citation metrics (cited-by count, FWCI, reference count)
  - Paper keywords from OpenAlex
  - DOI as clickable link (with citation key fallback for legacy papers)
- Citation export - Download results as BibTeX (.bib) or CSV (.csv) with unique citation keys
- Chat export - Download abstract chat conversations as Markdown or HTML
- Export to seed search - Use any paper as a seed for a new discovery search with one click
- Import to documents - Move curated papers to document notebooks for deeper analysis
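As a rough illustration of what "unique citation keys" means for a BibTeX export, the sketch below builds author-year keys and appends a letter suffix to duplicates. The helpers `make_citation_keys` and `to_bibtex` are hypothetical and the key scheme is an assumption; the app's own formatter may differ.

```r
# Hypothetical helper: "authorYEAR" keys, with a, b, c ... added to collisions.
make_citation_keys <- function(first_author, year) {
  base <- paste0(tolower(gsub("[^A-Za-z]", "", first_author)), year)
  # Position of each key within its group of identical keys (1, 2, 3, ...)
  occurrence <- ave(seq_along(base), base, FUN = seq_along)
  # First occurrence gets no suffix; handles up to 27 identical keys
  paste0(base, c("", letters)[occurrence])
}

# Hypothetical helper: format one minimal @article entry
to_bibtex <- function(key, title, author, year) {
  sprintf(
    "@article{%s,\n  title = {%s},\n  author = {%s},\n  year = {%s}\n}",
    key, title, author, year
  )
}

# Made-up example records
authors <- c("Smith", "Smith", "O'Neil")
years <- c(2020, 2020, 2021)
keys <- make_citation_keys(authors, years)
print(keys)

entry <- to_bibtex(keys[2], "An Example Title", authors[2], years[2])
cat(entry, "\n")
```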
Generate presentation slides from notebook content using Quarto RevealJS.
- Configurable options - Length, audience level, theme selection
- 11 RevealJS themes - moon, sky, beige, serif, and more
- Speaker notes - Optional auto-generated presenter notes
- Multiple formats - Preview in-app, download .qmd, export to HTML/PDF
- Custom instructions - Guide the AI on focus areas
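For orientation, a Quarto RevealJS deck is a `.qmd` file with YAML front matter followed by `##` slide headings; speaker notes live in `::: {.notes}` blocks. The sketch below assembles such a file in base R. `build_qmd` is a hypothetical helper and the slide content is made up; this is not the app's generator, which uses an LLM to draft the content.

```r
# Hypothetical helper: assemble the lines of a minimal RevealJS .qmd file.
# `slides` is a named list (heading -> bullet points); `notes` is an optional
# named list (heading -> speaker note text).
build_qmd <- function(title, slides, theme = "moon", notes = NULL) {
  header <- c(
    "---",
    sprintf('title: "%s"', title),
    "format:",
    "  revealjs:",
    sprintf("    theme: %s", theme),
    "---"
  )
  body <- unlist(lapply(names(slides), function(heading) {
    lines <- c("", paste("##", heading), "", paste("-", slides[[heading]]))
    if (!is.null(notes[[heading]])) {
      lines <- c(lines, "", "::: {.notes}", notes[[heading]], ":::")
    }
    lines
  }))
  c(header, body)
}

qmd <- build_qmd(
  "Literature Review",
  slides = list(
    Background = c("Point one", "Point two"),
    Methods = "Single point"
  ),
  theme = "sky",
  notes = list(Background = "Mention the scope of the review.")
)

path <- file.path(tempdir(), "slides.qmd")
writeLines(qmd, path)
cat(qmd, sep = "\n")
```

With Quarto installed, running `quarto render` on the resulting file produces the HTML presentation.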
Explore citation relationships through interactive network graphs.
- Multi-seed networks - Seed from all papers in a notebook or BibTeX import
- Overlap detection - Papers cited by multiple seeds highlighted as diamonds
- Shape encoding - Stars (seeds), diamonds (overlap), dots (regular) with year color gradient
- Missing papers discovery - Find frequently-cited papers not in your collection, import with one click
- Directional control - Explore forward citations, backward references, or both
- Configurable depth - Traverse 1-3 hops from seed papers
- Node cap - Per-seed node limits to keep graphs readable
- Interactive graph - Pan, zoom, click nodes to view paper details
- Physics toggle - Freeze/unfreeze node simulation from the legend panel
- Color palettes - Five viridis color schemes with live-switching
- Save & reload - Persist networks to database with layout positions preserved
- Collapsible legend - Minimizable legend with shape key and gradient preview
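The ideas of hop-limited traversal and overlap detection can be sketched with a plain edge list. The code below is an illustrative assumption, not the app's implementation: `traverse` and `find_overlap` are hypothetical names, the edge data is made up, and only the backward direction (following references) is shown.

```r
# Each row means "from cites to" (made-up data)
edges <- data.frame(
  from = c("A", "A", "B", "B", "D"),
  to   = c("C", "D", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Breadth-first traversal along references, stopping after max_depth hops
traverse <- function(edges, seed, max_depth = 2) {
  visited <- seed
  frontier <- seed
  for (depth in seq_len(max_depth)) {
    nxt <- setdiff(unique(edges$to[edges$from %in% frontier]), visited)
    if (length(nxt) == 0) break
    visited <- c(visited, nxt)
    frontier <- nxt
  }
  visited
}

# Papers reached from more than one seed (excluding the seeds themselves)
find_overlap <- function(edges, seeds, max_depth = 1) {
  reached <- lapply(seeds, function(s) setdiff(traverse(edges, s, max_depth), s))
  counts <- table(unlist(reached))
  setdiff(names(counts)[as.vector(counts) > 1], seeds)
}

two_hops <- traverse(edges, "A", max_depth = 2)
overlap <- find_overlap(edges, c("A", "B"), max_depth = 1)
print(two_hops)
print(overlap)
```

Here paper D is cited by both seeds, so it is the kind of node that would be drawn as a diamond. Forward citations would follow the same logic with `from` and `to` swapped.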
Full dark mode support with Catppuccin color palette.
- One-click toggle - Moon/sun icon in the navbar switches between light (Latte) and dark (Mocha) themes
- Persistent preference - Theme choice saved to localStorage, restored on reload
- Auto-themed plots - Chart backgrounds adapt automatically via thematic integration
- Comprehensive coverage - All components styled: value boxes, alerts, chat messages, network graphs, tables
Monitor API usage in real-time.
- Session costs - Live cost display in the sidebar footer
- OpenRouter balance - View remaining credits and usage
- Cost history - 30-day bar chart of daily API spending
- Per-call breakdown - Detailed log of every API call with model, tokens, and cost
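The arithmetic behind per-call and daily costs is simple enough to sketch. The prices, token counts, and model names below are placeholders, not real rates, and `estimate_cost` is a hypothetical helper; it assumes pricing is quoted per million tokens, separately for prompt and completion.

```r
# Hypothetical helper: cost in USD given token counts and per-million prices
estimate_cost <- function(prompt_tokens, completion_tokens,
                          price_in_per_m, price_out_per_m) {
  (prompt_tokens * price_in_per_m + completion_tokens * price_out_per_m) / 1e6
}

# Made-up call log; prices are placeholders
calls <- data.frame(
  date = as.Date(c("2025-01-01", "2025-01-01", "2025-01-02")),
  model = c("model-a", "model-a", "model-b"),
  prompt_tokens = c(1000, 2000, 500000),
  completion_tokens = c(500, 1000, 0),
  price_in_per_m = c(1, 1, 1),
  price_out_per_m = c(2, 2, 2),
  stringsAsFactors = FALSE
)

calls$cost <- estimate_cost(
  calls$prompt_tokens, calls$completion_tokens,
  calls$price_in_per_m, calls$price_out_per_m
)

# Daily totals, the kind of series a cost-history bar chart would plot
daily <- aggregate(cost ~ date, data = calls, FUN = sum)
print(daily)
```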
- API key validation - Visual indicators show if keys are configured and working
- Model selection - Choose from budget, mid-tier, or premium chat models
- Embedding models - Select from OpenAI, Google, Mistral, and more
- Quality data downloads - Fetch predatory journal lists and retraction databases
- All data stays local - DuckDB for portable, single-file storage
- No cloud dependencies - Everything runs on your machine
- Single-user - No authentication needed
- Portable - Copy the database file to move your research
- R (>= 4.5)
- Quarto (for slide generation)
- RStudio (optional but recommended)
# Clone the repository
git clone https://github.com/seanthimons/serapeum.git
cd serapeum
# One-shot setup: installs renv + all R packages
Rscript setup.R
This installs all 106 dependencies from the lockfile. No manual package management needed.
Configure API keys via the Settings page in the app (recommended), or copy the example config:
cp config.example.yml config.yml
openrouter:
  api_key: "your-openrouter-key" # Get from openrouter.ai/keys
openalex:
  email: "your@email.com" # For polite pool access (faster rate limits)
shiny::runApp()
Open http://localhost:8080 in your browser.
- DuckDB database is created automatically
- Quality data (predatory journals, retraction watch, OpenAlex topics) is seeded from bundled RDS files — no download needed
- Startup wizard guides you through your first search
- Click "New Document Notebook"
- Give it a name
- Upload PDFs using the upload button
- Wait for processing (text extraction)
- Click "Embed Documents" to generate embeddings
- Ask questions in the chat interface
- Use preset buttons for common tasks (Summary, Key Points, etc.)
- Generate slides with the "Slides" tab
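Between text extraction and embedding, page text is typically split into overlapping chunks that remember their page number, which is what makes page-level citations possible. The sketch below shows one way to do this in base R. `chunk_pages` and its parameters are hypothetical, and the word-based splitting is an assumption rather than the app's actual strategy. The input mirrors what `pdftools::pdf_text()` returns: a character vector with one string per page.

```r
# Hypothetical helper: split each page into overlapping word windows,
# keeping the page number alongside each chunk.
chunk_pages <- function(pages, chunk_words = 50, overlap = 10) {
  stopifnot(overlap < chunk_words)
  out <- list()
  for (p in seq_along(pages)) {
    words <- strsplit(trimws(pages[p]), "\\s+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) == 0) next
    for (s in seq(1, length(words), by = chunk_words - overlap)) {
      e <- min(s + chunk_words - 1, length(words))
      out[[length(out) + 1]] <- data.frame(
        page = p,
        text = paste(words[s:e], collapse = " "),
        stringsAsFactors = FALSE
      )
      if (e == length(words)) break  # page fully covered
    }
  }
  do.call(rbind, out)
}

# Stand-in for extracted text: one string per page
pages <- c(paste(paste0("w", 1:12), collapse = " "), "short page")
chunked <- chunk_pages(pages, chunk_words = 5, overlap = 1)
print(chunked)
```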
- Click "New Search Notebook"
- Enter a search query and configure filters:
  - Date range
  - Document types (article, review, preprint, etc.)
  - Open access only
  - Minimum citations
  - Exclude retracted papers
- Click "Refresh" to search OpenAlex
- Browse results - each paper shows:
  - Type badge (article, review, etc.)
  - OA status badge (gold, green, hybrid, etc.)
  - Citation metrics (cited-by, FWCI, references)
  - Keywords
- Remove unwanted papers with the X button
- Click "Embed Papers" to enable semantic search
- Query the abstracts in chat
- Export results: Export dropdown → BibTeX (.bib) or CSV (.csv)
- Use "Use as Seed" on any paper to launch a new discovery search
- Import selected papers to a document notebook
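To show how filters like these map onto an OpenAlex request, the sketch below builds a `/works` query URL without sending it. `build_openalex_url` is a hypothetical helper, not the app's client. The query parameters (`search`, `filter`, `per-page`, `mailto`) and filter names (`type`, `cited_by_count`, `is_retracted`, `from_publication_date`) are taken from the public OpenAlex API.

```r
# Hypothetical helper: translate UI-style filters into an OpenAlex works URL
build_openalex_url <- function(query, types = NULL, min_citations = NULL,
                               exclude_retracted = TRUE, from_date = NULL,
                               email = NULL, per_page = 25) {
  filters <- character(0)
  if (length(types) > 0) {
    # "|" means OR within a single filter
    filters <- c(filters, paste0("type:", paste(types, collapse = "|")))
  }
  if (!is.null(min_citations)) {
    # "at least N citations" is expressed as "> N - 1"
    filters <- c(filters, sprintf("cited_by_count:>%d", as.integer(min_citations) - 1L))
  }
  if (exclude_retracted) filters <- c(filters, "is_retracted:false")
  if (!is.null(from_date)) {
    filters <- c(filters, paste0("from_publication_date:", from_date))
  }

  params <- c(search = query, "per-page" = as.character(per_page))
  if (length(filters) > 0) params <- c(params, filter = paste(filters, collapse = ","))
  if (!is.null(email)) params <- c(params, mailto = email)  # polite pool

  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0("https://api.openalex.org/works?",
         paste(names(params), encoded, sep = "=", collapse = "&"))
}

url <- build_openalex_url(
  "crispr gene editing",
  types = c("article", "review"),
  min_citations = 10,
  email = "you@example.com"
)
print(url)
```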
- API Keys - Configure OpenRouter and OpenAlex credentials
  - Visual indicators show validation status (green check = valid)
- Models - Select chat and embedding models
- Quality Data - Download predatory journal/publisher lists and retraction database
- DOI Management - View DOI coverage stats and backfill missing DOIs from OpenAlex
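DOIs arrive in several shapes (bare, `doi:` prefixed, or as a full URL), so coverage stats and backfills usually depend on normalizing them first. The sketch below is one plausible approach in base R; `normalize_doi` is a hypothetical name and the validation pattern is a simplifying assumption, not necessarily what the app does.

```r
# Hypothetical helper: reduce DOIs to a bare lowercase "10.xxxx/suffix" form;
# anything that does not look like a DOI becomes NA.
normalize_doi <- function(x) {
  x <- trimws(tolower(x))
  x <- sub("^https?://(dx\\.)?doi\\.org/", "", x)
  x <- sub("^doi:[[:space:]]*", "", x)
  # Simplified shape check: "10.", 4 to 9 digits, "/", non-space suffix
  x[!grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", x)] <- NA_character_
  x
}

raw <- c(
  "https://doi.org/10.1038/NATURE12373",
  "doi:10.1000/xyz123",
  "not a doi"
)
normalized <- normalize_doi(raw)
print(normalized)
```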
- R + Shiny + bslib: Web framework with Bootstrap 5 UI components
- DuckDB: Embedded analytical database for local storage
- OpenRouter: Unified API for multiple LLM providers (Claude, GPT-4, Llama, etc.)
- OpenAlex: Free, open academic paper search API
- Quarto: Scientific publishing system for slide generation
- pdftools: PDF text extraction
serapeum/
├── app.R # Main Shiny app
├── setup.R # One-shot setup script
├── .Rprofile # Auto-activates renv
├── renv.lock # Locked dependency versions
├── config.yml # Your config (gitignored)
├── config.example.yml # Config template
├── R/
│ ├── config.R # Config loading
│ ├── db.R # Database operations
│ ├── db_migrations.R # Schema migrations
│ ├── api_openrouter.R # OpenRouter client
│ ├── api_openalex.R # OpenAlex client
│ ├── pdf.R # PDF utilities
│ ├── rag.R # RAG pipeline
│ ├── _ragnar.R # Ragnar embedding store helpers
│ ├── slides.R # Slide generation
│ ├── cost_tracking.R # API cost tracking and pricing
│ ├── theme_catppuccin.R # Catppuccin Latte/Mocha dark mode CSS
│ ├── citation_network.R # Citation graph data and layout
│ ├── quality_filter.R # Predatory/retraction filtering + auto-seed
│ ├── interrupt.R # Graceful request cancellation
│ ├── utils_doi.R # DOI normalization and citation keys
│ ├── utils_citation.R # BibTeX/CSV export formatters
│ ├── utils_export.R # Chat export formatters (Markdown/HTML)
│ ├── utils_filters.R # Search filter utilities
│ ├── mod_about.R # About page
│ ├── mod_citation_network.R # Network visualization UI
│ ├── mod_cost_tracker.R # Cost tracking dashboard
│ ├── mod_document_notebook.R
│ ├── mod_search_notebook.R
│ ├── mod_seed_discovery.R # Seed paper discovery
│ ├── mod_query_builder.R # LLM-assisted query building
│ ├── mod_topic_explorer.R # Topic browsing
│ ├── mod_journal_filter.R # Journal filtering
│ ├── mod_keyword_filter.R # Keyword filtering
│ ├── mod_bulk_import.R # DOI/BibTeX bulk import
│ ├── mod_settings.R
│ └── mod_slides.R
├── data/
│ ├── support/ # Bundled RDS files (quality data, topics)
│ └── notebooks.duckdb # Database file (auto-created)
├── storage/ # Uploaded PDFs
├── output/ # Generated slides
└── tests/
    └── testthat/ # Unit tests
testthat::test_dir("tests/testthat")
Delete data/notebooks.duckdb to start fresh.
We welcome contributions! Please see:
- CONTRIBUTING.md for detailed contribution guidelines
- CODE_OF_CONDUCT.md for community standards
- TODO.md for the feature roadmap and open issues
Important: Serapeum is a research tool powered by AI language models.
- Not an Oracle: AI-generated responses may contain errors, hallucinations, or inaccuracies. Always verify important information from primary sources.
- Not Professional Advice: This tool is not a substitute for professional, medical, legal, financial, or other expert advice.
- Makes Mistakes: AI models can misinterpret documents, generate plausible-sounding but incorrect answers, and miss important context.
- Research Tool Only: Intended for exploratory research and learning. Critical decisions should be based on careful review of original sources.
MIT - Copyright (c) 2024-2026 Sean Thimons
To report security vulnerabilities, please see SECURITY.md.
- Inspired by NotebookLM
- Paper data from OpenAlex
- LLM access via OpenRouter
- Quality data from Retraction Watch and Predatory Journals