Local OpenAlex database with 284M+ scholarly works, abstracts, and semantic search
SciTeX Impact Factor (OpenAlex) validated against JCR 2024 (r = 0.96, 17,042 journals)
Why OpenAlex Local?
Built for the LLM era - features that matter for AI research assistants:
| Feature | Benefit |
|---|---|
| 284M Works | More coverage than CrossRef |
| Abstracts | ~45-60% availability for semantic search |
| Concepts & Topics | Built-in classification |
| Author Disambiguation | Linked to institutions |
| Open Access Info | OA status and URLs |
Perfect for: RAG systems, research assistants, literature review automation.
Installation
pip install openalex-local

From source:
git clone https://github.com/ywatanabe1989/openalex-local
cd openalex-local && make install

Database setup (~300 GB, ~1-2 days to build):
# Check system status
make status
# 1. Download OpenAlex Works snapshot (~300GB)
make download-screen # runs in background
# 2. Build SQLite database
make build-db
# 3. Build FTS5 index
make build-fts

Python API
from openalex_local import search, get, count
# Full-text search (title + abstract)
results = search("machine learning neural networks")
for work in results:
    print(f"{work.title} ({work.year})")
    # Abstracts exist for only part of the corpus, so guard against a missing one
    print(f"  Abstract: {(work.abstract or '')[:200]}...")
    print(f"  Concepts: {[c['name'] for c in work.concepts]}")
# Get by OpenAlex ID or DOI
work = get("W2741809807")
work = get("10.1038/nature12373")
# Count matches
n = count("CRISPR")

CLI
openalex-local search "CRISPR genome editing" -n 5
openalex-local search-by-doi W2741809807
openalex-local search-by-doi 10.1038/nature12373
openalex-local status  # Configuration and database stats

With abstracts (-a flag):
$ openalex-local search "neural network" -n 1 -a
Found 1,523,847 matches in 45.2ms
1. Deep learning for neural networks (2015)
OpenAlex ID: W2741809807
Abstract: This paper presents a comprehensive overview of deep learning
techniques for neural network architectures...
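
For RAG-style use, search results like the ones above can be flattened into a plain-text context block. The following is a minimal sketch, not part of the package: `format_context` is a hypothetical helper, it relies only on the `title`, `year`, and `abstract` attributes shown in the Python API example, and it assumes a missing abstract is `None` or an empty string.

```python
from types import SimpleNamespace
from typing import Iterable


def format_context(works: Iterable, max_abstract_chars: int = 200) -> str:
    """Render work-like objects as a numbered plain-text context block."""
    lines = []
    for i, work in enumerate(works, start=1):
        # Assumption: a missing abstract is None or an empty string.
        abstract = (getattr(work, "abstract", None) or "").strip()
        if len(abstract) > max_abstract_chars:
            abstract = abstract[:max_abstract_chars].rstrip() + "..."
        lines.append(f"[{i}] {work.title} ({work.year})")
        lines.append(f"    Abstract: {abstract or 'not available'}")
    return "\n".join(lines)


# Stand-in objects so the sketch runs without a database;
# in practice `works` would come from openalex_local.search(...).
demo_works = [
    SimpleNamespace(title="Paper A", year=2015, abstract="A short abstract."),
    SimpleNamespace(title="Paper B", year=2020, abstract=None),
]
print(format_context(demo_works))
```

Truncating abstracts keeps the context small when many works are passed to a model; the 200-character default simply mirrors the slice used in the Python API example.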
HTTP API
Start the FastAPI server:
openalex-local relay --host 0.0.0.0 --port 31292

Endpoints:
# Search works (FTS5)
curl "http://localhost:31292/works?q=CRISPR&limit=10"
# Get by ID or DOI
curl "http://localhost:31292/works/W2741809807"
curl "http://localhost:31292/works/10.1038/nature12373"
# Batch lookup
curl -X POST "http://localhost:31292/works/batch" \
  -H "Content-Type: application/json" \
  -d '{"ids": ["W2741809807", "10.1038/nature12373"]}'
# Database info
curl "http://localhost:31292/info"

HTTP mode (connect to running server):
# On local machine (if server is remote)
ssh -L 31292:127.0.0.1:31292 your-server
# Python client
from openalex_local import configure_http
configure_http("http://localhost:31292")
# Or via CLI
openalex-local --http search "CRISPR"

MCP Server
Run as MCP (Model Context Protocol) server:
openalex-local mcp start

Local MCP client configuration:
{
  "mcpServers": {
    "openalex-local": {
      "command": "openalex-local",
      "args": ["mcp", "start"],
      "env": {
        "OPENALEX_LOCAL_DB": "/path/to/openalex.db"
      }
    }
  }
}

Remote MCP via HTTP:
# On server: start persistent MCP server
openalex-local mcp start -t http --host 0.0.0.0 --port 8083

{
  "mcpServers": {
    "openalex-remote": {
      "url": "http://your-server:8083/mcp"
    }
  }
}

Diagnose setup:
openalex-local mcp doctor # Check dependencies and database
openalex-local mcp list-tools # Show available MCP tools
openalex-local mcp installation  # Show client config examples

Available tools:
- search - Full-text search across 284M+ papers
- search_by_id - Get paper by OpenAlex ID or DOI
- enrich_ids - Batch lookup with metadata
- status - Database statistics
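
The local and remote client configurations shown earlier in this section can also be generated from a script, which helps when the database path or server URL differs per machine. This is a sketch under stated assumptions: `local_mcp_config` and `remote_mcp_config` are hypothetical helper names, and the structure simply mirrors the JSON examples above.

```python
import json


def local_mcp_config(db_path: str) -> dict:
    """Build the stdio-style client entry, mirroring the JSON example above."""
    return {
        "mcpServers": {
            "openalex-local": {
                "command": "openalex-local",
                "args": ["mcp", "start"],
                "env": {"OPENALEX_LOCAL_DB": db_path},
            }
        }
    }


def remote_mcp_config(url: str) -> dict:
    """Build the HTTP-style client entry for a server started with `-t http`."""
    return {"mcpServers": {"openalex-remote": {"url": url}}}


# Placeholder path, as in the example above; replace with the real database location.
print(json.dumps(local_mcp_config("/path/to/openalex.db"), indent=2))
```

Where the resulting JSON has to be saved depends on the MCP client in use; `openalex-local mcp installation` prints client-specific examples.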
SciTeX Impact Factor (OpenAlex)
We provide precomputed SciTeX Impact Factors calculated from OpenAlex citation data. These follow the JCR formula but use OpenAlex as the data source.
Validation against JCR 2024 (17,042 matched journals):
| Metric | Value |
|---|---|
| Pearson r | 0.96 |
| Spearman ρ | 0.93 |
| p-value | < 1e-100 |
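
To make the two correlation metrics in the table concrete, here is a standard-library sketch of how Pearson r and Spearman ρ could be computed for paired impact-factor values. It is not the project's validation script; the function names are hypothetical and the input lists are made-up toy numbers, not real journal data.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only (not real journals).
toy_scitex_if = [1.0, 2.0, 3.0, 10.0]
toy_jcr_if = [1.2, 1.9, 3.5, 40.0]
print(f"Pearson r:  {pearson(toy_scitex_if, toy_jcr_if):.3f}")
print(f"Spearman ρ: {spearman(toy_scitex_if, toy_jcr_if):.3f}")
```

Pearson r measures linear agreement of the values, while Spearman ρ measures agreement of the journal rankings, which is why both are reported.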
Export SciTeX IF:
# Export all SciTeX IF values
openalex-local export-if -o scitex_if.csv
openalex-local export-if -o scitex_if.json
# Top 1000
openalex-local export-if -o top1000.csv --limit 1000

Use in search results:
openalex-local search "machine learning" --with-if

Formula:
SciTeX IF(Year) = Citations in Year to articles from (Year-1, Year-2)
                  ─────────────────────────────────────────────────────
                  Citable articles published in (Year-1, Year-2)
Note: "SciTeX IF" is our calculation using OpenAlex data. It is not the trademarked "Journal Impact Factor" from Clarivate/JCR.
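
As a worked example of the formula above, the sketch below computes a two-year impact factor from per-publication-year counts. The function name and input layout are assumptions made for illustration, and the numbers are invented; the package's own computation may be organized differently.

```python
def scitex_if(citations_by_pub_year: dict, citable_by_pub_year: dict, year: int) -> float:
    """Two-year impact factor for `year`.

    citations_by_pub_year[y]: citations received in `year` by articles published in y.
    citable_by_pub_year[y]:   number of citable articles published in y.
    """
    window = (year - 1, year - 2)
    citations = sum(citations_by_pub_year.get(y, 0) for y in window)
    citable = sum(citable_by_pub_year.get(y, 0) for y in window)
    if citable == 0:
        raise ValueError("no citable articles in the two-year window")
    return citations / citable


# Invented counts for a hypothetical journal, target year 2024:
# (200 + 300) citations / (80 + 120) citable articles
example = scitex_if({2023: 200, 2022: 300}, {2023: 80, 2022: 120}, year=2024)
print(example)
```

Years outside the two-year window are ignored, so passing a longer history for the same journal does not change the result.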
Related Projects
crossref-local - Sister project with CrossRef data:
| Feature | crossref-local | openalex-local |
|---|---|---|
| Works | 167M | 284M |
| Abstracts | ~21% | ~45-60% |
| Update frequency | Real-time | Monthly |
| DOI authority | Yes (source) | Uses CrossRef |
| Citations | Raw references | Linked works |
| Concepts/Topics | No | Yes |
| Author IDs | No | Yes |
| Best for | DOI lookup, raw refs | Semantic search |
When to use CrossRef: Real-time DOI updates, raw reference parsing, authoritative metadata. When to use OpenAlex: Semantic search, citation analysis, topic discovery.
Documentation
Full documentation available at openalex-local.readthedocs.io
Data Source
Data from OpenAlex, an open catalog of scholarly works. Updated monthly from their snapshot.
OpenAlex Local is part of SciTeX. When used inside the SciTeX framework, literature search integrates with the scholar module:
import scitex
# Search local OpenAlex database via SciTeX
results = scitex.scholar.search("neural oscillations gamma band")
# Enrich BibTeX with OpenAlex metadata
scitex.scholar.enrich_bibtex("references.bib")

The SciTeX system follows the Four Freedoms for Research below, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere — your machine, your terms.
- The freedom to study how every step works — from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.