Skip to content

jakobrichert/openblup

OpenBLUP

Open-source REML and BLUP for plant and animal breeding — a modern linear mixed model engine written in Rust with Python bindings.

Why This Project?

The breeding community deserves modern, open tools.

For decades, variance component estimation and breeding value prediction in genetics have depended on ASReml, a proprietary Fortran-based tool. While ASReml is an excellent piece of software that has powered thousands of research papers, its closed-source nature and Fortran codebase create real barriers:

  • No transparency: Researchers cannot inspect, audit, or modify the algorithms they depend on for scientific conclusions.
  • No extensibility: Adding new variance structures, integrating with modern ML pipelines, or embedding in larger systems is impossible without vendor support.
  • Fortran lock-in: The Fortran ecosystem lacks modern tooling (package managers, CI/CD, cross-compilation, WebAssembly targets). Maintaining and extending Fortran code requires increasingly rare expertise.
  • Licence costs: ASReml licences are expensive, creating inequity between well-funded programs and breeding efforts in developing countries where genetic gain matters most.
  • Reproducibility: Closed-source software is a weak link in reproducible science. When the solver is a black box, "reproducing" a result means "having the same licence."

Open alternatives exist (e.g., sommer in R, MixedModels.jl in Julia), but none combine ASReml's full feature set (pedigree BLUP, genomic BLUP, spatial models, multi-trait, factor analytic structures) with the performance needed for national-scale evaluations.

This project aims to change that by implementing the core algorithms from scratch in Rust — a language with C-level performance, memory safety without garbage collection, and first-class tooling — and exposing them through an ergonomic Python API.

Features

Core Engine

  • AI-REML (Average Information) with quadratic convergence + EM-REML fallback
  • Henderson's Mixed Model Equations (MME) — dense and sparse assembly/solve
  • BLUP/BLUE extraction with standard errors
  • Sparse Cholesky solver via faer (supernodal, symbolic/numeric split)
  • Wald F-tests for fixed effects with p-values
  • Diagnostics: Log-likelihood, AIC, BIC, convergence monitoring

Pedigree BLUP (Animal Model)

  • Pedigree parsing and validation (CSV, programmatic)
  • Henderson's A-inverse with Meuwissen & Luo (1992) inbreeding
  • Topological sort for correct pedigree ordering
  • Validated against Mrode (2005) textbook examples

Genomic BLUP (GBLUP)

  • VanRaden Method 1 (2008) G-matrix construction
  • G-matrix blending with A22 (Misztal et al. 2010)
  • Single-step H-inverse (Legarra et al. 2009, Aguilar et al. 2010)
  • Full A-matrix computation and A22 extraction

Spatial & Variance Structures

  • AR1 (first-order autoregressive) with closed-form tridiagonal inverse
  • Kronecker product for separable spatial models (AR1 x AR1)
  • Diagonal (heterogeneous variances)
  • Unstructured covariance (Cholesky parameterized)
  • Identity (IID random effects)

Multi-trait Models

  • Kronecker-structured MME for correlated traits
  • EM-REML for trait covariance estimation (G0, R0)
  • Genetic correlation estimation
  • Positive-definiteness enforcement via eigenvalue bending

Python Bindings (PyO3)

  • MixedModel class with fluent API
  • Pedigree class with CSV import and A-inverse
  • compute_g_matrix() from numpy marker arrays
  • scipy.sparse interop for relationship matrices
  • Full type stubs for IDE autocompletion
  • Install via pip install -e . (maturin)

CLI Tool

  • openblup fit — fit models from CSV with formula specification
  • openblup ainverse — compute and inspect A-inverse from pedigree
  • Text and JSON output formats

Data I/O

  • CSV import with automatic type detection (numeric vs. categorical)
  • Flexible model specification with builder pattern API

Factor Analytic (FA) Models

  • FA1, FA2, and higher-order factor analytic covariance structures
  • Woodbury identity for efficient inverse: Σ⁻¹ = Ψ⁻¹ - Ψ⁻¹Λ(I + Λ'Ψ⁻¹Λ)⁻¹Λ'Ψ⁻¹
  • Reduced-rank modeling for multi-environment trials (Smith et al. 2001)

Residual Diagnostics

  • Conditional and marginal residuals
  • Leverage (hat values), Cook's distance
  • Standardized and studentized residuals
  • Satterthwaite denominator degrees of freedom for Wald F-tests

Selection Indices

  • Smith-Hazel index: b = P⁻¹Ga (Smith 1936, Hazel 1943)
  • Restricted index (Kempthorne & Nordskog 1959) — zero gain on restricted traits
  • Desired gains index (Pesek & Baker 1969)
  • Accuracy, expected genetic gain, index variance

Marker Effect Models (RR-BLUP)

  • Ridge regression BLUP for individual SNP marker effects
  • EM-REML for variance component estimation
  • Genomic prediction from marker effects

Cross-Validation

  • k-fold and leave-one-out cross-validation
  • Stratified fold creation
  • Prediction accuracy (Pearson correlation, regression slope)

Sparse Inverse Subset

  • Sparse Cholesky factorization (left-looking algorithm)
  • Selected elements of C⁻¹ via Takahashi equations
  • Efficient tr(A⁻¹B) computation

GPU Acceleration (feature-gated)

  • gpu feature flag for optional GPU support
  • CPU fallback for G-matrix computation
  • Architecture ready for wgpu compute shaders

WebAssembly Target

  • Self-contained WASM crate (no rayon, no faer, no file I/O)
  • JSON API for A-inverse, mixed model fitting, G-matrix
  • Browser demo page included

Quick Start (Rust)

use plant_breeding_lmm_core::data::DataFrame;
use plant_breeding_lmm_core::model::MixedModelBuilder;
use plant_breeding_lmm_core::variance::Identity;
use plant_breeding_lmm_core::genetics::{Pedigree, compute_a_inverse};

// Load data
let df = DataFrame::from_csv("field_trial.csv")?;

// Simple model: yield = rep (fixed) + genotype (random, IID) + error
let mut model = MixedModelBuilder::new()
    .data(&df)
    .response("yield")
    .fixed("rep")
    .random("genotype", Identity::new(1.0), None)
    .build()?;

let result = model.fit_reml()?;
println!("{}", result.summary());

// With pedigree relationship matrix
let ped = Pedigree::from_csv("pedigree.csv")?;
let a_inv = compute_a_inverse(&ped)?;

let mut model = MixedModelBuilder::new()
    .data(&df)
    .response("yield")
    .fixed("rep")
    .random("animal", Identity::new(1.0), Some(a_inv))
    .build()?;

let result = model.fit_reml()?;
println!("{}", result.summary());

Quick Start (Python)

# Install (requires Rust toolchain + maturin)
pip install maturin
pip install -e .
from plant_breeding_lmm import MixedModel, Pedigree, compute_a_inverse

# Fit a simple mixed model
model = MixedModel()
model.load_csv("field_trial.csv")
model.set_response("yield")
model.add_fixed("rep")
model.add_random("genotype")
result = model.fit()
print(result.summary())

# With pedigree relationship matrix
ped = Pedigree.from_csv("pedigree.csv")
a_inv = compute_a_inverse(ped)  # scipy.sparse compatible

model = MixedModel()
model.load_csv("field_trial.csv")
model.set_response("yield")
model.add_fixed("rep")
model.add_random("animal", ginverse=a_inv)
result = model.fit()
print(result.variance_components())  # {'animal': 20.5, 'residual': 40.1}

Quick Start (CLI)

# Fit a model from the command line
openblup fit --data trial.csv --response yield --fixed "rep" --random genotype

# With pedigree
openblup fit --data trial.csv --response yield --fixed "rep" \
    --random animal --pedigree pedigree.csv

# Inspect A-inverse
openblup ainverse --pedigree pedigree.csv

Building

# Requires Rust 1.70+ (tested with 1.93)
cargo build --release

# Run tests (295 tests)
cargo test --workspace

# Build Python bindings
pip install maturin
maturin develop --release

# Build CLI
cargo build --release -p openblup-cli

Architecture

openblup/
├── crates/
│   ├── core/                 # Pure Rust library (17,000+ lines)
│   │   ├── data/             # DataFrame, Factor columns, CSV I/O
│   │   ├── matrix/           # Sparse ops, dense helpers, faer Cholesky, sparse inverse
│   │   ├── model/            # Builder API, design matrices, multi-trait
│   │   ├── lmm/              # MME, AI-REML, EM-REML, BLUP/BLUE, structured R
│   │   ├── variance/         # AR1, Diagonal, Unstructured, Kronecker, Factor Analytic
│   │   ├── genetics/         # Pedigree, A/G/H matrices, RR-BLUP, selection indices
│   │   ├── diagnostics/      # Wald tests, residuals, cross-validation, Satterthwaite df
│   │   └── gpu/              # Feature-gated GPU acceleration
│   ├── python-bindings/      # PyO3 bridge with numpy/scipy interop
│   ├── cli/                  # Command-line tool (clap)
│   └── wasm/                 # WebAssembly target with browser demo
└── python/                   # Python package + type stubs

Algorithms & References

The algorithms implemented here are based on well-established quantitative genetics literature:

Core REML & Mixed Model Theory

  • Henderson, C.R. (1984). Applications of Linear Models in Animal Breeding. University of Guelph. — The foundational text for mixed model equations in breeding.
  • Patterson, H.D. & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545-554. — Original REML paper.
  • Searle, S.R., Casella, G. & McCulloch, C.E. (1992). Variance Components. Wiley. — Comprehensive treatment of variance component estimation.
  • Gilmour, A.R., Thompson, R. & Cullis, B.R. (1995). Average Information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics, 51(4), 1440-1450. — AI-REML algorithm used in ASReml.

Pedigree & Relationship Matrices

  • Henderson, C.R. (1976). A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics, 32(1), 69-83. — Henderson's rules for A-inverse.
  • Meuwissen, T.H.E. & Luo, Z. (1992). Computing inbreeding coefficients in large populations. Genetics, Selection, Evolution, 24(4), 305-313. — Efficient inbreeding algorithm.
  • Mrode, R.A. (2005). Linear Models for the Prediction of Animal Breeding Values (2nd ed.). CABI Publishing. — Standard textbook; our integration tests validate against examples from this book.
  • Quaas, R.L. (1976). Computing the diagonal elements and inverse of a large numerator relationship matrix. Biometrics, 32(4), 949-953.

Genomic Selection

  • VanRaden, P.M. (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91(11), 4414-4423. — G-matrix construction (Method 1).
  • Legarra, A., Aguilar, I. & Misztal, I. (2009). A relationship matrix including full pedigree and genomic information. Journal of Dairy Science, 92(9), 4656-4663. — Single-step H-matrix.
  • Aguilar, I., Misztal, I., Johnson, D.L., Legarra, A., Tsuruta, S. & Lawlor, T.J. (2010). Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. Journal of Dairy Science, 93(2), 743-752.

Spatial Analysis

  • Gilmour, A.R., Cullis, B.R. & Verbyla, A.P. (1997). Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics, 2(3), 269-293. — AR1xAR1 spatial models.
  • Smith, A., Cullis, B.R. & Thompson, R. (2005). The analysis of crop cultivar breeding and evaluation trials: an overview of current mixed model approaches. Journal of Agricultural Science, 143(6), 449-462.

Multi-Environment & Factor Analytic Models

  • Smith, A., Cullis, B.R. & Thompson, R. (2001). Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics, 57(4), 1138-1147. — Factor analytic models for MET.
  • Thompson, R., Cullis, B.R., Smith, A. & Gilmour, A.R. (2003). A sparse implementation of the Average Information algorithm for factor analytic and reduced rank variance models. Australian & New Zealand Journal of Statistics, 45(4), 445-459.

Software Comparison

  • Butler, D.G., Cullis, B.R., Gilmour, A.R. & Gogel, B.J. (2009). ASReml-R Reference Manual. VSN International. — The proprietary reference implementation.
  • Covarrubias-Pazaran, G. (2016). Genome-assisted prediction of quantitative traits using the R package sommer. PLoS ONE, 11(6), e0156744. — Open-source R alternative.

Comparison with Existing Tools

Feature ASReml sommer (R) MixedModels.jl OpenBLUP
Language Fortran R Julia Rust + Python
Open source No Yes Yes Yes
AI-REML Yes No No Yes
Pedigree BLUP Yes Yes No Yes
Genomic BLUP Yes Yes No Yes
Single-step (H) Yes Yes No Yes
Spatial (AR1xAR1) Yes Yes No Yes
Multi-trait Yes Yes Yes Yes
Factor analytic Yes Limited No Yes
Sparse solver Yes No Yes Yes (faer)
Python API No No No Yes (PyO3)
CLI tool Yes No No Yes
Wald tests Yes Yes Yes Yes
WebAssembly target No No No Yes
Memory safe No N/A Yes (GC) Yes (ownership)
Performance Excellent Slow Good Excellent

Contributing

Contributions are welcome! See CONTRIBUTING.md for the full guide, including:

  • How to set up your development environment
  • Code style and naming conventions (follows standard QG notation)
  • Testing requirements (all new algorithms must be validated)
  • Pull request process and templates

Areas where help is especially valuable:

Area What's Needed
Validation Run the same model in OpenBLUP + ASReml/sommer, compare variance components and BLUPs
Tutorials Worked examples from real breeding programs (dairy, wheat, maize, forestry)
Python polish pandas DataFrame input, better error messages, documentation
GPU backends wgpu compute shader implementations for G-matrix and dense solvers

Even if you don't write code — validation reports, bug reports, and feature requests are extremely valuable. See our issue templates.

Licence

Dual-licensed under MIT or Apache 2.0, at your option.

Acknowledgements

This project draws on decades of quantitative genetics research. We are grateful to the scientists who developed and published these algorithms, and to the ASReml team whose software set the standard for mixed model analysis in breeding.

About

Open-source REML and BLUP for plant and animal breeding - a modern mixed model engine in Rust

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Code of conduct

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors