# Contributing

Thank you for your interest in contributing! This guide explains the repository structure and how to add new models or improvements.

In addition to this repo, feel free to reach out on Slack in the Bits in Bio community, specifically the channel #competition-ginkgo-antibody-2025.
## Getting Started

1. Install Pixi:

   ```bash
   curl -fsSL https://pixi.sh/install.sh | bash
   ```

2. Clone the repository:

   ```bash
   git clone <repository-url>
   cd abdev-benchmark
   ```

3. Install the root environment:

   ```bash
   pixi install
   ```
## Repository Structure

The repository uses a multi-project Pixi architecture:

- `models/`: Each model is an independent Pixi project
- `libs/abdev_core/`: Shared utilities, base classes, and evaluation
- `configs/`: Configuration files for the orchestrator
- `data/`: Benchmark datasets and schema documentation
- `outputs/`: Generated outputs (models, predictions, evaluation)
- `tests/`: Model contract tests and reference data
- `run_all_models.py`: Main orchestrator script
## Adding a New Model

Create the model directory:

```bash
mkdir -p models/your_model/src/your_model
cd models/your_model
```

Create `pixi.toml`:

```toml
[workspace]
name = "your-model"
version = "0.1.0"
description = "Brief description of your model"
channels = ["conda-forge"]
platforms = ["linux-64", "osx-64", "osx-arm64"]

[dependencies]
python = "3.11.*"
pandas = ">=2.0"
numpy = ">=1.24"
# Add other conda dependencies

[pypi-dependencies]
abdev-core = { path = "../../libs/abdev_core", editable = true }
your-model = { path = ".", editable = true }
# Add other PyPI dependencies

[feature.dev.dependencies]
pytest = ">=7.0"
ruff = ">=0.1"
```

Create `src/your_model/__init__.py`:
"""Your model description."""
__version__ = "0.1.0"Create src/your_model/model.py:
"""Model implementation for your model."""
from pathlib import Path
import pandas as pd
from abdev_core import BaseModel, load_features
class YourModel(BaseModel):
"""Your model description.
This model [describe approach].
"""
def train(self, df: pd.DataFrame, run_dir: Path, *, seed: int = 42) -> None:
"""Train model on ALL provided data and save artifacts to run_dir.
Args:
df: Training dataframe with sequences and labels
run_dir: Directory to save model artifacts
seed: Random seed for reproducibility
"""
run_dir.mkdir(parents=True, exist_ok=True)
# Load features if needed
features = load_features("YourFeatureSource", dataset="GDPa1")
# Train your model on ALL samples in df
# The orchestrator handles CV splitting externally
# ... your training logic here ...
# Save model artifacts
# model_path = run_dir / "model.pkl"
# pickle.dump(model, open(model_path, "wb"))
print(f"Model saved to {run_dir}")
def predict(self, df: pd.DataFrame, run_dir: Path) -> pd.DataFrame:
"""Generate predictions for ALL provided samples.
Args:
df: Input dataframe with sequences
run_dir: Directory containing saved model artifacts
Returns:
DataFrame with predictions
"""
# Load model artifacts
# model = pickle.load(open(run_dir / "model.pkl", "rb"))
# Load features if needed
features = load_features("YourFeatureSource")
# Generate predictions for ALL samples
# ... your prediction logic here ...
# Return predictions
df_output = df[["antibody_name", "vh_protein_sequence", "vl_protein_sequence"]].copy()
# df_output["HIC"] = predictions
return df_outputCreate src/your_model/run.py:
"""CLI entry point."""
from abdev_core import create_cli_app
from .model import YourModel
app = create_cli_app(YourModel, "Your Model")
if __name__ == "__main__":
app()Create src/your_model/__main__.py:
"""Allow running as python -m your_model."""
from .run import app
if __name__ == "__main__":
app()Create README.md:
````markdown
# Your Model Name

Brief description.

## Description

Detailed explanation of the method, what it does, and how it works.

## Requirements

- List data dependencies
- List feature dependencies

## Installation

```bash
pixi install
```

## Usage

### Train Model

```bash
pixi run python -m your_model train \
    --data ../../data/GDPa1_v1.2_20250814.csv \
    --run-dir ./outputs/run_001 \
    --seed 42
```

### Generate Predictions

```bash
# On training data
pixi run python -m your_model predict \
    --data ../../data/GDPa1_v1.2_20250814.csv \
    --run-dir ./outputs/run_001 \
    --out-dir ./outputs/predictions

# On heldout data
pixi run python -m your_model predict \
    --data ../../data/heldout-set-sequences.csv \
    --run-dir ./outputs/run_001 \
    --out-dir ./outputs/predictions_heldout
```

### Run via Orchestrator

```bash
# From repository root
pixi run all
```

## Reference

Citation if applicable.
````

## Test Your Model

From your model directory, install the environment and run train/predict manually:

```bash
pixi install

# Test train/predict manually
pixi run python -m your_model train \
    --data ../../data/GDPa1_v1.2_20250814.csv \
    --run-dir ./test_run

pixi run python -m your_model predict \
    --data ../../data/GDPa1_v1.2_20250814.csv \
    --run-dir ./test_run \
    --out-dir ./test_out
```

Then run the contract test:

```bash
# From repository root
python tests/test_model_contract.py --model your_model
```

This validates that your model correctly implements the BaseModel interface.
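The train/predict contract can also be illustrated with a minimal standalone sketch. This is an assumption-laden toy, not the real `BaseModel`: the `MeanPredictor` class, its JSON artifact, and the tiny dataframe are all hypothetical, and `abdev_core` is deliberately not imported so the snippet runs on its own.

```python
"""Toy sketch of the train/predict contract (standalone; not abdev_core)."""
import json
from pathlib import Path

import pandas as pd


class MeanPredictor:
    """Predicts the per-property training mean for every antibody."""

    def train(self, df: pd.DataFrame, run_dir: Path, *, seed: int = 42) -> None:
        # Train on ALL rows; the orchestrator handles CV splitting externally.
        run_dir.mkdir(parents=True, exist_ok=True)
        artifact = {"HIC": float(df["HIC"].mean())}
        (run_dir / "model.json").write_text(json.dumps(artifact))

    def predict(self, df: pd.DataFrame, run_dir: Path) -> pd.DataFrame:
        # Load the saved artifact and emit one prediction per input row.
        artifact = json.loads((run_dir / "model.json").read_text())
        out = df[
            ["antibody_name", "vh_protein_sequence", "vl_protein_sequence"]
        ].copy()
        out["HIC"] = artifact["HIC"]
        return out


if __name__ == "__main__":
    train_df = pd.DataFrame({
        "antibody_name": ["ab1", "ab2"],
        "vh_protein_sequence": ["EVQ", "QVQ"],
        "vl_protein_sequence": ["DIQ", "EIV"],
        "HIC": [1.0, 3.0],
    })
    model = MeanPredictor()
    model.train(train_df, Path("./test_run"))
    preds = model.predict(train_df, Path("./test_run"))
    print(preds["HIC"].tolist())  # [2.0, 2.0]
```

The key point the contract test checks is the shape of this interface: `train` sees all rows and persists everything it needs into `run_dir`, and `predict` reconstructs the model from `run_dir` alone.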
## Prediction Format

All predictions must follow the standard format (see `data/schema/README.md`):

- `antibody_name` (string, unique, no NaN)
- `HIC`, `Tm2`, `Titer`, `PR_CHO`, `AC-SINS_pH7.4` (float)
- `vh_protein_sequence`, `vl_protein_sequence` (string)
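Before submitting, you can sanity-check a prediction file against this schema. The `check_predictions` helper below is only a sketch (it is not part of `abdev_core`); the column names come from the schema above.

```python
"""Sketch of a schema self-check for prediction dataframes."""
import pandas as pd

REQUIRED = ["antibody_name", "vh_protein_sequence", "vl_protein_sequence"]
PROPERTIES = ["HIC", "Tm2", "Titer", "PR_CHO", "AC-SINS_pH7.4"]


def check_predictions(df: pd.DataFrame) -> None:
    """Raise AssertionError if df violates the prediction schema."""
    missing = [c for c in REQUIRED if c not in df.columns]
    assert not missing, f"missing columns: {missing}"
    assert df["antibody_name"].notna().all(), "NaN in antibody_name"
    assert df["antibody_name"].is_unique, "duplicate antibody_name"
    for prop in PROPERTIES:
        # Any property column that is present must be float-typed.
        if prop in df.columns:
            assert pd.api.types.is_float_dtype(df[prop]), f"{prop} must be float"


preds = pd.DataFrame({
    "antibody_name": ["ab1", "ab2"],
    "vh_protein_sequence": ["EVQ", "QVQ"],
    "vl_protein_sequence": ["DIQ", "EIV"],
    "HIC": [0.5, 1.2],
})
check_predictions(preds)  # passes silently on a conforming dataframe
```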
## Code Style

- Follow PEP 8
- Type hints encouraged
- Docstrings for public functions

## Documentation

- Clear README per model
- Inline comments for complex logic
- Citation for external methods

## Testing

- Add pytest tests in `tests/`
- Compare against reference predictions if available
- Test edge cases (missing values, etc.)
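For example, an edge-case test for missing values might look like the sketch below. `predict_with_fallback`, `some_feature`, and the mean-imputation rule are all hypothetical stand-ins for your model's actual prediction path.

```python
"""Sketch of a pytest-style edge-case test (all names hypothetical)."""
import numpy as np
import pandas as pd


def predict_with_fallback(df: pd.DataFrame) -> pd.DataFrame:
    """Toy stand-in for YourModel.predict: mean-imputes a missing feature."""
    out = df[["antibody_name"]].copy()
    feature = df["some_feature"].fillna(df["some_feature"].mean())
    out["HIC"] = feature * 0.5
    return out


def test_predict_handles_missing_values():
    df = pd.DataFrame({
        "antibody_name": ["ab1", "ab2", "ab3"],
        "some_feature": [2.0, np.nan, 4.0],
    })
    preds = predict_with_fallback(df)
    # One prediction per input row, and no NaNs leak through.
    assert len(preds) == len(df)
    assert preds["HIC"].notna().all()
```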
## Modifying Shared Code (`libs/abdev_core`)

When adding shared constants, utilities, or evaluation functions:

- Add to the appropriate module:
  - `constants.py`: Shared constants (properties, datasets, etc.)
  - `utils.py`: Data manipulation utilities
  - `base.py`: BaseModel interface
  - `evaluation_metrics.py`: Evaluation metrics
  - `features.py`: Feature loading utilities
- Export in `__init__.py`
- Update docstrings
- Test that existing models still work
## Modifying the Orchestrator

When modifying the orchestration logic:

- Test with multiple models
- Ensure config file compatibility
- Update `configs/README.md` if adding new config options
- Verify evaluation metrics are computed correctly
## Contribution Workflow

We use the standard GitHub fork and pull request workflow:

1. Fork the repository to your GitHub account
2. Clone your fork locally:

   ```bash
   git clone https://github.com/YOUR_USERNAME/abdev-benchmark.git
   cd abdev-benchmark
   ```

3. Create a feature branch (not `main`):

   ```bash
   git checkout -b add-your-model-name
   # or
   git checkout -b fix-issue-description
   ```

4. Make your changes following the guidelines in this document
5. Commit your changes with clear commit messages (see below)
6. Push to your fork:

   ```bash
   git push origin add-your-model-name
   ```

7. Open a Pull Request from your fork's branch to our `main` branch
8. Address review feedback; we appreciate your contribution and will aim for a timely review
## PR Checklist

Before submitting your PR, please ensure:

- Code follows style guidelines
- Documentation added/updated (README, docstrings, etc.)
- Tests added/passing (`python tests/test_model_contract.py --model your_model`)
- Lockfile (`pixi.lock`) committed for new models
- README updated if needed
- No breaking changes to shared components (or discussed in the PR description)
- For new models: entry added to the main README's model table
- For new models: external dependencies, data sources, and licensing specified
- CI checks pass (if applicable)
## Commit Messages

Use clear, descriptive commit messages:

```
Add XYZ model with feature engineering

- Implement prediction module
- Add documentation
- Configure Pixi environment
- Add tests and validation
```
## Resources

- Check existing models for examples (e.g., `models/random_predictor/`, `models/tap_linear/`)
- Read `data/schema/README.md` for format specifications
- Review `libs/abdev_core/` for shared utilities and base classes
- See `configs/README.md` for orchestrator configuration options
## Questions?

Open an issue or reach out to the maintainers.