OpenAge

Open-source blood-based biological age estimation from standard clinical biomarkers.

Developed by Healome.

Try it now — upload a blood test report and chat with AI about your biological age at openageai.com/chat. No code required.

Core thesis

This project is built around one idea: there is a small, fixed set of blood-derived measurements—a minimal clique—that, together with the right low-bias model, captures almost all the signal that matters for biological age and mortality prediction. Hyperparameter tuning and feature ablation were aimed at finding that clique, not at maximizing complexity. Features outside this set either add little or actively hurt some metrics once the core signal is already represented.

Why This Exists

Many biological age models are difficult to interpret in clinical contexts, hard to benchmark reproducibly, and show high variability on repeated measurements. I built this release to provide a transparent, reproducible baseline using routine blood biomarkers and a public dataset. The methodology, model, and evaluation are fully open so the community can inspect, compare, and improve on this work.

Model and feature selection

We evaluated a wide range of predictors, from simple to highly flexible: linear regression, elastic net, gradient-boosted trees, bagging / tree ensembles, broader ensembles, neural networks, CNNs, and variational autoencoders. We settled on the architectures that performed best on our validation objectives and then focused on feature pruning.

Starting from roughly 120 blood-related biomarkers, we progressively dropped inputs that did not improve (or did not stably improve) held-out accuracy. That process yielded the 35-feature extended model. We took one further reduction step to the 21-feature standard model: a smaller panel without a meaningful accuracy penalty, so routine labs stay sufficient where the extended panel is not available. On the standard model, performance remains in a strong range on at least one of the headline metrics we track.

Beyond the extended model, adding more markers did not improve metrics; in several cases it caused clear degradation. That pattern supports the thesis above: once the minimal clique is covered, extra features are mostly noise or redundancy for this task.

Design Principles

Interpretable — Built on standard clinical biomarkers (CBC + CMP + medical history) that map to physiological systems physicians already reason about.
Reproducible — Trained entirely on open, public survey data. Anyone can retrain, validate, and audit.
Validated against mortality — Survival analysis confirms the model's biological age estimates predict mortality (univariate Cox PH HR = 1.098 per year, concordance = 0.83).
Extensible — Structured for community contributions: new data sources, models, and benchmarks.

Quick Start

git clone https://github.com/Healome/openage.git
cd openage
pip install -e .

Model weights and validation data are on the Hugging Face Hub: model weights, validation dataset. Download once (see below) or let the library fetch weights automatically when you first use a model.

Requirements

Package	Version
Python	3.8+
numpy	>=1.21, <2
pandas	>=1.3
scikit-learn	>=1.0, <1.1 (models trained with 0.24.1)
joblib	>=1.0
matplotlib	>=3.4

Optional extras:

pip install -e ".[survival]" — adds lifelines (for Kaplan-Meier, Cox PH)
pip install -e ".[neural]" — adds PyTorch (experimental model)

Biomarker names (friendly keys)

predict_age and HealomeClock.predict accept canonical snake_case names (recommended) or the original survey variable codes used in training (e.g. LBXGH). Common synonyms work too (e.g. hba1c_percent → glycohemoglobin). See openage.feature_aliases and NHANES_TO_CANONICAL_KEY in code for the full map.

Questionnaire (history) fields use 1 = Yes, 2 = No for the six items below, matching how the model was trained.

Example — standard model (21 features)

from openage import predict_age

result = predict_age({
    "mean_cell_volume_fl": 93,                      # MCV, fL
    "glycohemoglobin_percent": 5.4,                 # HbA1c, % (glycated hemoglobin)
    "alt_iu_l": 22,                                 # ALT, IU/L
    "rbc_count_million_per_ul": 4.52,               # RBC, million cells/µL
    "ever_cancer_or_malignancy": 2,                 # 1=Yes, 2=No
    "platelet_count_thousand_per_ul": 245,          # platelets, 1000 cells/µL
    "ldh_iu_l": 138,                                # LDH, IU/L
    "ever_angina": 2,                               # 1=Yes, 2=No
    "lymphocyte_percent": 28.5,                     # % of WBCs
    "lymphocyte_count_thousand_per_ul": 2.1,        # 1000 cells/µL
    "cpk_iu_l": 132,                                # CPK, IU/L
    "creatinine_mg_dl": 0.80,                       # mg/dL
    "ever_arthritis": 2,                            # 1=Yes, 2=No
    "alp_iu_l": 68,                                 # alkaline phosphatase, IU/L
    "ever_liver_condition": 2,                      # 1=Yes, 2=No
    "potassium_mmol_l": 4.0,                        # mmol/L
    "rdw_percent": 12.8,                            # RDW, %
    "monocyte_percent": 6.2,                        # % of WBCs
    "bun_mg_dl": 15,                                # BUN, mg/dL
    "ever_gallstones": 2,                          # 1=Yes, 2=No
    "glucose_mg_dl": 95,                            # mg/dL
}, chronological_age=45, variant="standard")

print(result.summary())

Example — extended model (35 features)

result = predict_age({
    "mean_cell_volume_fl": 93,                      # MCV, fL
    "hemoglobin_g_dl": 14.2,                        # g/dL
    "hematocrit_percent": 42.1,                     # %
    "alt_iu_l": 22,                                 # IU/L
    "rbc_count_million_per_ul": 4.52,               # million cells/µL
    "wbc_count_thousand_per_ul": 6.1,               # 1000 cells/µL
    "platelet_count_thousand_per_ul": 245,          # 1000 cells/µL
    "ldh_iu_l": 138,                                # IU/L
    "ever_angina": 2,                               # 1=Yes, 2=No
    "lymphocyte_percent": 28.5,                     # % of WBCs
    "ast_iu_l": 24,                                 # IU/L
    "lymphocyte_count_thousand_per_ul": 2.1,        # 1000 cells/µL
    "cpk_iu_l": 132,                                # IU/L
    "total_bilirubin_mg_dl": 0.8,                   # mg/dL
    "ever_arthritis": 2,                            # 1=Yes, 2=No
    "alp_iu_l": 68,                                 # IU/L
    "calcium_mg_dl": 9.5,                           # mg/dL
    "bun_mg_dl": 15,                                # mg/dL
    "ever_gallstones": 2,                           # 1=Yes, 2=No
    "glucose_mg_dl": 95,                            # mg/dL
    "chloride_mmol_l": 102,                         # mmol/L
    "ldl_cholesterol_mg_dl": 110,                   # mg/dL (Martin–Hopkins)
    "glycohemoglobin_percent": 5.4,                 # %
    "hdl_cholesterol_mg_dl": 55,                    # mg/dL (direct HDL)
    "ever_cancer_or_malignancy": 2,                 # 1=Yes, 2=No
    "triglycerides_mg_dl": 90,                      # mg/dL
    "total_protein_g_dl": 7.0,                      # g/dL
    "creatinine_mg_dl": 0.80,                       # mg/dL
    "ever_liver_condition": 2,                      # 1=Yes, 2=No
    "potassium_mmol_l": 4.0,                        # mmol/L
    "bicarbonate_mmol_l": 25,                       # mmol/L
    "osmolality_mmol_kg": 280,                      # mmol/kg
    "rdw_percent": 12.8,                            # %
    "monocyte_percent": 6.2,                        # % of WBCs
    "sodium_mmol_l": 140,                           # mmol/L
}, chronological_age=45, variant="extended")

print(result.summary())

CSV/JSON column names can use these same friendly keys (or the original survey codes).

Two Model Variants

Variant	Features	Test MAE	Test R²	Pearson r
Standard	21 (16 lab + 5 questionnaire)	5.11 years	0.906	0.952
Extended	35 (expanded lab panel)	6.07 years	0.873	0.934

Both models: GradientBoosting trained on ~50K public survey records (2003–2020), validated with Cox PH survival analysis (univariate HR = 1.098 per year for extended model, concordance = 0.83).

Use standard for routine panels (21 inputs) or extended when you have the larger lipid/liver/electrolyte set (35 inputs). Pass biomarkers as canonical friendly names (see examples above) or the original survey codes.

Models load from openage/models/weights/ (standard_21feat.joblib, extended_35feat.joblib). If the files are missing, the library will try to download them from the Hub; otherwise see Downloading model weights and validation data below.

from openage import HealomeClock

# Standard model (15 blood markers + 6 medical history questions)
clock = HealomeClock(variant="standard")

# Extended model (35 features for comprehensive panels)
clock = HealomeClock(variant="extended")

Downloading model weights and validation data

Assets live on the Hugging Face Hub under Healome.

Resource	Hugging Face	Local path (after download)
Model weights	`Healome/openage-weights`	`openage/models/weights/`
Validation data (XPT bundle for `tests/validate_on_nhanes.py`)	`Healome/nhanes-validation-data` · files	`nhanes_data_dump/`

Model weights (standard_21feat.joblib, extended_35feat.joblib)

Option A — automatic: If you have huggingface_hub installed, the library will download missing weights from the Hub the first time you use HealomeClock or predict_age.

Option B — Python:

from huggingface_hub import hf_hub_download
from pathlib import Path

weights_dir = Path("openage/models/weights")
weights_dir.mkdir(parents=True, exist_ok=True)
for name in ["standard_21feat.joblib", "extended_35feat.joblib"]:
    hf_hub_download(repo_id="Healome/openage-weights", filename=name, local_dir=weights_dir)

Option C — CLI:

huggingface-cli download Healome/openage-weights standard_21feat.joblib --local-dir openage/models/weights
huggingface-cli download Healome/openage-weights extended_35feat.joblib --local-dir openage/models/weights

Validation data (`nhanes_data_dump`)

To run tests/validate_on_nhanes.py, download the dataset snapshot into the repo root:

huggingface-cli download Healome/nhanes-validation-data --local-dir nhanes_data_dump --repo-type dataset

Or in Python:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="Healome/nhanes-validation-data", repo_type="dataset", local_dir="nhanes_data_dump")

Place the contents so that nhanes_data_dump/2017-2020/ and (optionally) nhanes_data_dump/extended_data/ match the structure described in nhanes_data_dump/README.md.

Maintainers: To upload or update Hub assets, use python scripts/upload_to_huggingface.py (weights) or add --dataset for the validation bundle. Requires huggingface_hub and huggingface-cli login.

Training Data

Trained on approximately 50,000 records from public survey cycles 2003–2020. See MODEL_CARD.md and DATASET_FACTS.md for details and primary sources.

Internal Validation

Validated against a proprietary longitudinal clinical dataset (~1.5M blood-test records). Summary statistics and validation results are in DATASET_FACTS.md. Raw data cannot be shared for patient privacy reasons.

Survival Analysis

The model's biological age predictions are validated against mortality outcomes using linked public mortality records:

Cox PH: Univariate HR = 1.098 per year of biological age (95% CI: 1.095–1.100), concordance = 0.83 — biological age is a strong predictor of mortality
Kaplan-Meier: Clear separation between accelerated aging (bio_age - chrono_age >= 5 years) and decelerated aging groups

See BENCHMARKS.md for full results.

Benchmarking & Leaderboard

This repo maintains a dual-track leaderboard for community benchmarking:

Track 1: Age prediction accuracy (MAE, R², RMSE)
Track 2: Mortality prediction (Cox PH concordance, Kaplan-Meier)

See benchmarks/README.md for how to submit your model.

Repo Structure

├── openage/
│   ├── models/              # Tree-based (primary) + experimental neural net
│   ├── data/                # Public survey + mortality data loaders
│   ├── evaluation/          # Metrics, survival analysis, leaderboard
│   ├── inference.py         # Main prediction API
│   └── visualization.py     # Plotting utilities
├── notebooks/
│   ├── demo.ipynb           # Quick start (2 min)
│   ├── training.ipynb       # Full training pipeline
│   └── evaluation.ipynb     # Benchmarking walkthrough
├── benchmarks/              # Leaderboard submissions
├── data/                    # Sample input data
└── figures/                 # Generated plots

Limitations

Read LIMITATIONS.md before using. Key points:

Trained on a US-representative survey; other populations not validated
Estimates overall biological age; no organ-specific resolution
Not a medical device; not for clinical decision-making without oversight

Future Directions

Extended model using proprietary longitudinal clinical dataset
Organ-specific biological age estimation
Methodology paper

Star this repo or follow @Healome for updates.

Web App

Upload & analyze: openageai.com — upload a blood test report, extract biomarkers via OCR, and get your biological age estimate
Chat with AI: openageai.com/chat — ask questions about your results, biomarkers, and what they mean

About

OpenAge is developed and maintained by Healome, a longevity-focused health technology company building blood-based aging models and tools for longitudinal health tracking.

Citation

@article{yadala2026healome,
  title={Healome Clock: An Open-Source, Interpretable, and Actionable Blood-Based Biological Aging Clock with Disease-Specific Mortality Validation},
  author={Yadala, Nikhil},
  year={2026},
  doi={10.5281/zenodo.19244274},
  url={https://doi.org/10.5281/zenodo.19244274}
}

Contributing

I believe the field benefits from better benchmarking and open scrutiny. See CONTRIBUTING.md for how to get involved.

License

Open-source: GNU Affero General Public License v3.0 (AGPLv3). See LICENSE.

Commercial use without AGPL compliance — organizations that wish to integrate OpenAge into proprietary or closed-source products, SaaS platforms, clinical tools, diagnostics, or other applications without releasing their source code — requires a separate commercial license from Healome One Inc. Contact nikhil@healome.one for terms.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenAge

Core thesis

Why This Exists

Model and feature selection

Design Principles

Quick Start

Requirements

Biomarker names (friendly keys)

Example — standard model (21 features)

Example — extended model (35 features)

Two Model Variants

Downloading model weights and validation data

Model weights (standard_21feat.joblib, extended_35feat.joblib)

Validation data (`nhanes_data_dump`)

Training Data

Internal Validation

Survival Analysis

Benchmarking & Leaderboard

Repo Structure

Limitations

Future Directions

Web App

About

Citation

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
benchmarks		benchmarks
data		data
notebooks		notebooks
openage		openage
tests		tests
.gitignore		.gitignore
BENCHMARKS.md		BENCHMARKS.md
CONTRIBUTING.md		CONTRIBUTING.md
DATASET_FACTS.md		DATASET_FACTS.md
LICENSE		LICENSE
LIMITATIONS.md		LIMITATIONS.md
METHODOLOGY.md		METHODOLOGY.md
MODEL_CARD.md		MODEL_CARD.md
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Folders and files

Latest commit

History

Repository files navigation

OpenAge

Core thesis

Why This Exists

Model and feature selection

Design Principles

Quick Start

Requirements

Biomarker names (friendly keys)

Example — standard model (21 features)

Example — extended model (35 features)

Two Model Variants

Downloading model weights and validation data

Model weights (standard_21feat.joblib, extended_35feat.joblib)

Validation data (nhanes_data_dump)

Training Data

Internal Validation

Survival Analysis

Benchmarking & Leaderboard

Repo Structure

Limitations

Future Directions

Web App

About

Citation

Contributing

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Validation data (`nhanes_data_dump`)

Packages