Skip to content

BioReasoner: Training LLMs for grounded scientific reasoning. 0% hallucination rate on citations, 100% format adherence. Cross-domain polymathic insights via Scientific Tribunal evaluation.

License

Notifications You must be signed in to change notification settings

vanbelkummax/bioreasoner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BioReasoner Logo

BioReasoner

Training Language Models for Grounded Scientific Reasoning
Cross-domain "polymathic" insights without hallucinated citations

0% Hallucinations 100% Format 15k Papers Qwen2.5-7B License


Overview

BioReasoner is not a theoretical proposal—it is an active, evolving project with established momentum.

We are training a 7B language model to perform rigorous scientific reasoning across domains, generating novel hypotheses that are:

  • Grounded in real literature (no hallucinated citations)
  • Transparent in reasoning (explicit <think> blocks)
  • Cross-domain ("polymathic" bridges between disparate fields)

Our approach uses Direct Preference Optimization (DPO) with a unique "Scientific Tribunal" evaluation framework where multiple AI judges assess outputs for logical rigor, novelty, and citation accuracy.


Key Results

BioReasoner v2.1 (Current)

Metric Result Notes
Hallucination Rate 0% Zero fabricated citations across 83 test samples
Format Adherence 100% All outputs include proper <think> reasoning blocks
Logic Score 3.4/5 Multi-judge consensus
Novelty Score 3.9/5 Cross-domain hypothesis generation

Training Evolution

Version Date DPO Pairs Key Achievement
v2.0 2026-01-02 280 Zero hallucinations, 20% missing <think>
v2.1 2026-01-03 46 100% format adherence via force-prefixing

Methodology

1. Supervised Fine-Tuning (SFT)

  • Base Model: Qwen/Qwen2.5-7B-Instruct
  • Training Data: 751 paper triplets from 7 Vanderbilt faculty
  • Method: LoRA (r=16, alpha=32)
  • Task Types: critique_extend, method_bridge, gap_analysis

2. Direct Preference Optimization (DPO)

  • Chosen Responses: Claude Opus 4.5 generated critiques
  • Rejected Responses: Student model outputs with lower scores
  • Beta (KL penalty): 0.1
  • Key Innovation: Force-prefix <think> to eliminate format regression

3. Scientific Tribunal Evaluation

A multi-judge evaluation framework where:

  • Lead Judge (Claude Opus 4.5) scores logic, novelty, citation accuracy
  • Second Judge (Codex) provides independent verification
  • Auto-Filter catches format violations and low-quality outputs
┌─────────────────────────────────────────────────────────────┐
│                    SCIENTIFIC TRIBUNAL                       │
├─────────────────────────────────────────────────────────────┤
│  Input: Model-generated scientific critique                  │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │ Auto-Filter │→ │ Lead Judge  │→ │Second Judge │         │
│  │ (Format)    │  │ (Claude)    │  │ (Codex)     │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
│                          ↓                                   │
│                   Final Verdict                              │
│            (PASS/FAIL + Detailed Rubric)                    │
└─────────────────────────────────────────────────────────────┘

Polymath Corpus

View Full Corpus Documentation →

We are building an unprecedented cross-domain knowledge base spanning 35+ theoretical domains and 15,294 papers:

AI/ML Methods (13 domains)

  • PEFT (LoRA, QLoRA, DoRA)
  • Preference Optimization (DPO, PPO, RLHF)
  • Quantization & Compression
  • Self-Play & Distillation
  • Evolution Strategies
  • Directed Evolution (Biology)

Theoretical Foundations (6 domains)

  • Category Theory (Spivak, Baez)
  • Network Science (Barabási, Alon)
  • Information Theory (Shannon)
  • Complexity Science (Kauffman)
  • Control Theory (Åström)
  • Causal Inference (Pearl)

Interdisciplinary Bridges (8 domains)

  • Biosemiotics
  • Cognitive Analogy
  • Evolutionary Game Theory
  • Thermodynamics of Computation

Current Status: 15,294 papers harvested | 73.3% with abstracts | 288+ PDFs downloaded


Repository Structure

bioreasoner/
├── scripts/
│   ├── train_dpo_8k.py           # DPO training with 4-bit quantization
│   ├── merge_adapter.py          # LoRA → full model merging
│   ├── bioreasoner_21_pipeline.py # Best-of-N → Filter → DPO pairs
│   ├── novelty_scoring.py        # Rubric-based scoring system
│   └── download_all_pdfs.py      # Multi-source PDF acquisition
├── data/
│   ├── train_750.jsonl           # SFT training data
│   ├── test_84.jsonl             # Held-out evaluation set
│   └── dpo_v21_pairs.jsonl       # DPO preference pairs
├── evaluation/
│   ├── tribunal_evaluation.py    # Multi-judge evaluation framework
│   └── format_checker.py         # Auto-filter for format compliance
└── docs/
    ├── LAB_NOTEBOOK.md           # Detailed training logs
    └── POLYMATH_CORPUS.md        # Cross-domain paper collection

Quick Start

Installation

git clone https://github.com/vanbelkummax/bioreasoner.git
cd bioreasoner
pip install -r requirements.txt

Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("vanbelkummax/bioreasoner-2.1")
tokenizer = AutoTokenizer.from_pretrained("vanbelkummax/bioreasoner-2.1")

prompt = """Given these three papers:
1. [Paper on spatial transcriptomics methodology]
2. [Paper on graph neural networks]
3. [Paper on tumor microenvironment]

Generate a novel hypothesis that bridges these domains."""

# Model outputs include <think> blocks for transparent reasoning
output = model.generate(tokenizer(prompt, return_tensors="pt").input_ids)

Commitment to Open Science

We believe that "Scientific Alignment"—making models that are both novel and rigorously grounded—should be a shared community resource, not a proprietary secret.

We will release:

  • Model weights (Hugging Face)
  • Scientific Tribunal evaluation scripts
  • Synthetic reasoning datasets
  • Training pipelines and configurations

Institutional Validation

VUMC

Being based at Vanderbilt University Medical Center allows us to validate BioReasoner's outputs against:

  • Real-world clinical data
  • Expert pathology feedback
  • Ongoing research collaborations

This ensures the model's "polymathic" bridges are grounded in biological reality.


Roadmap

  • SFT on 751 paper triplets
  • DPO v2.0 with 280 preference pairs
  • DPO v2.1 with force-prefix (100% format adherence)
  • Polymath corpus: 15,294 papers across 25+ domains
  • Scale to 15,000+ triplets for cross-domain emergence
  • Release model weights on Hugging Face
  • Publish Scientific Tribunal framework
  • Clinical validation studies

Citation

@software{bioreasoner2026,
  author = {Van Belkum, Max},
  title = {BioReasoner: Training Language Models for Grounded Scientific Reasoning},
  year = {2026},
  url = {https://github.com/vanbelkummax/bioreasoner},
  institution = {Vanderbilt University Medical Center}
}

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.


Building AI that reasons like a scientist, not a search engine.

About

BioReasoner: Training LLMs for grounded scientific reasoning. 0% hallucination rate on citations, 100% format adherence. Cross-domain polymathic insights via Scientific Tribunal evaluation.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages