
ojowwalker77/get-papers


get-papers-engine

Open-source engine to fetch academic papers at scale. Built to make research accessible.

Why?

Academic research shouldn't be locked behind paywalls and clunky interfaces. This tool lets you programmatically collect paper data from open sources for literature reviews, research projects, or your own tools.

Current Sources

  • OpenAlex: 250M+ scholarly works, free to use, no API key needed

Planned / Request a Source

The engine is modular: each source is a separate adapter. Want support for another database?

Open an issue: github.com/jowwalker77/get-papers-engine/issues

Potential additions:

  • Semantic Scholar
  • arXiv
  • PubMed
  • bioRxiv/medRxiv
  • CORE
  • Unpaywall

Install

Requires Bun:

curl -fsSL https://bun.sh/install | bash

Clone and install:

git clone https://github.com/jowwalker77/get-papers-engine.git
cd get-papers-engine
bun install

Usage

Create a script in the repository root (e.g. my-search.ts), since the example imports from ./src:

import { createPapersEngine } from "./src"
import { Effect } from "effect"

const engine = createPapersEngine({
  dbPath: "./my-papers.db",
})

const program = Effect.gen(function* () {
  yield* engine.migrate()

  const result = yield* engine.fetch({
    query: "your research topic",
    fromYear: 2020,
    minCitations: 10,
    maxPapers: 100,
  })

  console.log(`Fetched ${result.imported} papers`)
})

Effect.runPromise(program)

Run:

bun run my-search.ts

Papers are saved to the SQLite database given by dbPath (my-papers.db in this example). Open it with any SQLite viewer or query it programmatically.

Search Options

Option         What it does
query          Search terms
fromYear       Only papers published in this year or later
minCitations   Minimum citation count
maxPapers      Maximum number of papers to fetch
hasAbstract    Only papers that have an abstract
language       Language code (e.g. en, es, zh)
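The engine's internal request logic is not shown in this README. As a hedged sketch, assuming the options are translated into filters on the public OpenAlex Works API, a hypothetical `toOpenAlexUrl` helper could look like this. The filter names (`from_publication_date`, `cited_by_count`, `has_abstract`, `language`) and the `search` and `per-page` parameters belong to the OpenAlex API; the helper itself, the `SearchOptions` type, the single-page request, and the fallback page size of 25 are assumptions, and the real engine likely pages through results to reach `maxPapers`.

```typescript
// Hypothetical option shape mirroring the table above (not the engine's exported type).
interface SearchOptions {
  query: string
  fromYear?: number
  minCitations?: number
  maxPapers?: number
  hasAbstract?: boolean
  language?: string
}

// Sketch: translate options into a single OpenAlex /works request URL.
// Assumption: one page only; OpenAlex caps per-page at 200.
function toOpenAlexUrl(opts: SearchOptions): string {
  const filters: string[] = []
  if (opts.fromYear !== undefined) {
    filters.push(`from_publication_date:${opts.fromYear}-01-01`)
  }
  if (opts.minCitations !== undefined && opts.minCitations > 0) {
    // Using a strict ">" comparison: minCitations 10 becomes cited_by_count:>9.
    filters.push(`cited_by_count:>${opts.minCitations - 1}`)
  }
  if (opts.hasAbstract) filters.push("has_abstract:true")
  if (opts.language) filters.push(`language:${opts.language}`)

  const url = new URL("https://api.openalex.org/works")
  url.searchParams.set("search", opts.query)
  // Comma-separated filters are combined with AND by OpenAlex.
  if (filters.length > 0) url.searchParams.set("filter", filters.join(","))
  // Assumption: fall back to 25 results when maxPapers is unset.
  url.searchParams.set("per-page", String(Math.min(opts.maxPapers ?? 25, 200)))
  return url.toString()
}

console.log(
  toOpenAlexUrl({ query: "graph neural networks", fromYear: 2020, minCitations: 10, maxPapers: 100 }),
)
```

URLSearchParams percent-encodes the `:`, `>` and `,` characters, which is standard query-string encoding.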

What You Get

Each paper includes:

  • Title, abstract, authors
  • Publication date
  • DOI
  • Citation count
  • Open access PDF link (when available)
  • Paper type (article, review, etc.)
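OpenAlex does not return abstracts as plain text: each Work carries an `abstract_inverted_index` that maps every word to the positions where it occurs. The engine presumably handles this for you; as a hedged sketch of how the fields listed above could be pulled out of a raw OpenAlex Work, here is a hypothetical `toPaper` helper. The OpenAlex field names used (`display_name`, `abstract_inverted_index`, `authorships`, `publication_date`, `doi`, `cited_by_count`, `best_oa_location.pdf_url`, `type`) are real; the `Paper` shape is invented for illustration and may not match the engine's SQLite schema.

```typescript
// Subset of an OpenAlex Work object (only the fields used here).
interface OpenAlexWork {
  display_name: string | null
  abstract_inverted_index: Record<string, number[]> | null
  authorships: { author: { display_name: string } }[]
  publication_date: string | null
  doi: string | null
  cited_by_count: number
  best_oa_location: { pdf_url: string | null } | null
  type: string | null
}

// Hypothetical normalized record; the engine's real schema may differ.
interface Paper {
  title: string
  abstract: string | null
  authors: string[]
  publicationDate: string | null
  doi: string | null
  citationCount: number
  pdfUrl: string | null
  type: string | null
}

// Rebuild plain text from OpenAlex's { word: [positions] } inverted index.
function abstractFromInvertedIndex(index: Record<string, number[]> | null): string | null {
  if (!index) return null
  const words: string[] = []
  for (const [word, positions] of Object.entries(index)) {
    for (const pos of positions) words[pos] = word
  }
  // filter() skips any positions that were never assigned.
  return words.filter((w) => w !== undefined).join(" ")
}

// Map a raw Work to the hypothetical Paper shape.
function toPaper(work: OpenAlexWork): Paper {
  return {
    title: work.display_name ?? "",
    abstract: abstractFromInvertedIndex(work.abstract_inverted_index),
    authors: work.authorships.map((a) => a.author.display_name),
    publicationDate: work.publication_date,
    doi: work.doi, // OpenAlex returns DOIs as https://doi.org/... URLs
    citationCount: work.cited_by_count,
    pdfUrl: work.best_oa_location?.pdf_url ?? null, // null when no open access PDF is known
    type: work.type,
  }
}
```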

Semantic Search

Find related papers using text embeddings. Embeddings are computed locally, so no API key is needed.

// Inside an Effect.gen block like the one in Usage, after fetching papers:
yield* engine.embedAll()

const similar = yield* engine.similar("your research question")

The model downloads once (~23MB) on first use.
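Conceptually, embedding-based search turns the query and each stored paper into vectors and ranks papers by how close their vectors are. The sketch below shows that ranking step with cosine similarity; `rankBySimilarity`, the `EmbeddedPaper` shape, and the choice of cosine similarity are assumptions for illustration, not a description of how `engine.similar` is implemented. The toy 3-dimensional vectors stand in for real model output.

```typescript
// Hypothetical shape: a stored paper plus its embedding vector.
interface EmbeddedPaper {
  title: string
  embedding: number[]
}

// Cosine similarity: dot(a, b) / (|a| * |b|); 1 means same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ")
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score every paper against the query vector and return the top k, best first.
function rankBySimilarity(query: number[], papers: EmbeddedPaper[], k = 5) {
  return papers
    .map((p) => ({ title: p.title, score: cosineSimilarity(query, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}

const papers: EmbeddedPaper[] = [
  { title: "Protein folding with deep learning", embedding: [0.9, 0.1, 0.0] },
  { title: "Medieval trade routes", embedding: [0.0, 0.2, 0.9] },
  { title: "Neural networks for structure prediction", embedding: [0.8, 0.3, 0.1] },
]
console.log(rankBySimilarity([1, 0.2, 0], papers, 2))
```

If the stored vectors are already normalized to unit length, the dot product alone produces the same ranking and skips the square roots. A linear scan like this is fine for thousands of papers; much larger collections usually call for an approximate nearest-neighbor index.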


Contributing

Pull requests are welcome. To add a new paper source:

  1. Create an adapter in src/sources/
  2. Follow the pattern of the existing OpenAlex adapter
  3. Open a pull request
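The real adapter contract is whatever the OpenAlex adapter in src/sources/ defines (and since the engine is built on Effect, it will look different from this). As a purely hypothetical sketch of the idea, an adapter can be reduced to two pure functions, one that builds a request URL and one that parses a response into a common record, so networking, paging, and storage stay in one place. `SourceAdapter`, `CommonPaper`, `fetchPage`, the response shape, and the `https://example.org/api/papers` endpoint are all invented for illustration.

```typescript
// Hypothetical common record every adapter produces (not the engine's real type).
interface CommonPaper {
  source: string
  title: string
  year: number | null
  doi: string | null
}

// Hypothetical adapter contract: pure functions, no I/O.
interface SourceAdapter<Raw> {
  name: string
  buildUrl(query: string, page: number): string
  parse(body: Raw): CommonPaper[]
}

// Invented response shape for an imaginary JSON API.
interface ExampleResponse {
  items: { title: string; year?: number; doi?: string }[]
}

const exampleAdapter: SourceAdapter<ExampleResponse> = {
  name: "example",
  buildUrl(query, page) {
    const url = new URL("https://example.org/api/papers") // placeholder endpoint
    url.searchParams.set("q", query)
    url.searchParams.set("page", String(page))
    return url.toString()
  },
  parse(body) {
    // Normalize missing fields to null so every source yields the same shape.
    return body.items.map((item) => ({
      source: "example",
      title: item.title,
      year: item.year ?? null,
      doi: item.doi ?? null,
    }))
  },
}

// Engine side: one generic fetch helper works for any adapter.
async function fetchPage<Raw>(adapter: SourceAdapter<Raw>, query: string, page: number): Promise<CommonPaper[]> {
  const res = await fetch(adapter.buildUrl(query, page))
  if (!res.ok) throw new Error(`${adapter.name}: HTTP ${res.status}`)
  return adapter.parse((await res.json()) as Raw)
}
```

Keeping `buildUrl` and `parse` free of I/O means a new source can be unit-tested against saved JSON responses without touching the network.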

License

MIT

Contact

Questions or source requests: @jowwalker77

About

A TypeScript library for Bun that fetches academic papers from OpenAlex and, through adapters, other open databases.
