# The definitive protocol for Python in computational biology
A template repository demonstrating modern Python standards for computational biology research. Created for the CBG Retreat 2026 at ETH Zurich (January 21-23, 2026).
The Python ecosystem has matured dramatically, yet much research code still relies on outdated practices. This template showcases the 2025 state of the art:
| Old Way | Modern Way | Benefit |
|---|---|---|
| pip + requirements.txt | pixi | 10-100x faster, handles conda + PyPI |
| Black + flake8 + isort | Ruff | Single tool, 30-100x faster |
| pandas for everything | Polars | 5-50x faster for large data |
| Jupyter notebooks | Quarto | Clean git diffs, reproducible |
| Manual testing | pytest + Hypothesis | Property-based testing |
| No type hints | Type hints + mypy | Catch bugs before runtime |
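The last two rows of the table can be illustrated together. Below is a minimal sketch of a type-annotated helper plus a property-based check of it. The `gc_content` and `reverse_complement` functions are hypothetical examples, not part of this template. The check generates inputs with the standard library's `random` module so it runs anywhere; Hypothesis automates the same idea through `@given` strategies and additionally shrinks failing inputs to a minimal counterexample.

```python
import random

# Hypothetical helpers for illustration; not part of hive_protocol.
COMPLEMENT: dict[str, str] = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases in a DNA sequence (0.0 for empty input)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def check_properties(n_cases: int = 200, seed: int = 0) -> None:
    """Property-based check: invariants that must hold for every generated sequence."""
    rng = random.Random(seed)
    for _ in range(n_cases):
        seq = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 50)))
        gc = gc_content(seq)
        assert 0.0 <= gc <= 1.0  # a fraction is always bounded
        assert gc_content(reverse_complement(seq)) == gc  # G<->C swap keeps the GC count
        assert reverse_complement(reverse_complement(seq)) == seq  # applying twice is identity


check_properties()
# With the annotations above, mypy rejects a call such as gc_content(42) before it runs.
```

The point of the property style is that each assertion states a rule that holds for all inputs, rather than one hand-picked expected value.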
```bash
# Clone the repository
git clone https://github.com/cbg-ethz/hive-protocol.git
cd hive-protocol

# Install pixi (if not already installed)
curl -fsSL https://pixi.sh/install.sh | bash

# Set up environment (installs all dependencies)
pixi install

# Verify installation
pixi run test

# Start exploring
pixi run docs  # Render tutorial notebooks
```

### Package Management: Pixi
Handles conda and PyPI packages in a single environment. This is essential for bioinformatics, where command-line tools such as samtools are distributed through Bioconda rather than PyPI.
### Bayesian Inference: PyMC 5+ (content example)
Used here only as example scientific content: the Kalman filter demonstrates state-space modeling. Replace it with your own domain logic.
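For readers new to state-space models, the core Kalman filter recursion fits in a few lines. The sketch below assumes the simplest case, a one-dimensional local-level (random-walk-plus-noise) model with known process variance `q` and observation variance `r`. The function name, signature, and prior defaults are illustrative assumptions, not the API of `hive_protocol.inference`; in a Bayesian treatment such as the PyMC example, `q` and `r` would be given priors instead of being fixed.

```python
from dataclasses import dataclass


@dataclass
class KalmanResult:
    means: list[float]  # filtered state means E[x_t | y_1..y_t]
    variances: list[float]  # filtered state variances Var[x_t | y_1..y_t]


def local_level_filter(
    ys: list[float], q: float, r: float, m0: float = 0.0, p0: float = 1e6
) -> KalmanResult:
    """Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t (hypothetical example).

    w_t ~ N(0, q) and v_t ~ N(0, r); m0 and p0 define a vague Gaussian prior on x_0.
    """
    m, p = m0, p0
    means: list[float] = []
    variances: list[float] = []
    for y in ys:
        # Predict: a random walk keeps the mean and inflates the variance by q.
        p_pred = p + q
        # Update: the Kalman gain weighs the prediction against the new observation.
        gain = p_pred / (p_pred + r)
        m = m + gain * (y - m)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return KalmanResult(means, variances)


result = local_level_filter([1.0, 1.2, 0.9, 1.1], q=0.01, r=0.1)
print(result.means)
```

Because the prior variance `p0` is large, the first gain is close to 1 and the first filtered mean is essentially the first observation; later updates settle towards a steady-state gain determined by the ratio of `q` to `r`.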
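### Data Processing: Polars
A DataFrame library written in Rust, typically 5-50x faster than pandas on large data. Its lazy API records a query plan and optimizes it as a whole before anything is computed.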
### Code Quality: Ruff
A single tool that replaces Black, flake8, isort, pyupgrade, and more. Written in Rust, it runs 30-100x faster than the tools it replaces.
### Documentation: Quarto
Notebooks are stored as plain-text `.qmd` files, so git diffs stay clean and readable; outputs are regenerated whenever a document is rendered.
### Workflow: Snakemake
Reproducible pipeline orchestration. Snakemake infers the job graph from each rule's declared inputs and outputs and runs independent jobs in parallel.
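Automatic parallelization follows from dependency inference: once every rule's inputs are matched to the rule that produces them, the jobs form a directed acyclic graph, and any jobs whose dependencies are complete can run at the same time. The sketch below reproduces only that idea with the standard library's `graphlib`; the rule names and file paths are hypothetical, and this is not Snakemake's own code.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: name -> (input files, output files).
RULES: dict[str, tuple[list[str], list[str]]] = {
    "simulate": ([], ["data/sim.csv"]),
    "run_filter": (["data/sim.csv"], ["results/filtered.csv"]),
    "diagnostics": (["results/filtered.csv"], ["results/diagnostics.txt"]),
    "plot": (["results/filtered.csv"], ["figures/states.png"]),
}


def execution_waves(rules: dict[str, tuple[list[str], list[str]]]) -> list[list[str]]:
    """Group rules into waves; rules inside one wave do not depend on each other."""
    # Map every output file to the rule that produces it.
    producer = {out: name for name, (_, outs) in rules.items() for out in outs}
    # Each rule depends on the producers of its inputs (raw input files have none).
    graph = {
        name: {producer[f] for f in ins if f in producer}
        for name, (ins, _) in rules.items()
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(RULES))
# [['simulate'], ['run_filter'], ['diagnostics', 'plot']]
```

Here `diagnostics` and `plot` only need the filtered results, so they land in the same wave and could run concurrently.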
```text
hive-protocol/
├── src/hive_protocol/          # Source code
│   ├── inference/              # Kalman filter + diagnostics
│   └── data/                   # Data simulation
├── tests/                      # pytest + Hypothesis tests
├── notebooks/                  # Quarto tutorials
│   ├── 01_introduction.qmd
│   ├── 02_kalman_filter.qmd
│   └── 03_diagnostics.qmd
├── workflow/                   # Snakemake pipeline
│   ├── Snakefile
│   └── config/params.yaml
├── docs/                       # Workshop materials
├── pyproject.toml              # Python packaging config
├── pixi.toml                   # Pixi environment config
└── .pre-commit-config.yaml     # Code quality hooks
```
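As an illustration of the kind of code the data-simulation module holds, here is a hypothetical simulator for a local-level (random-walk-plus-noise) state-space model. The function name and parameters are assumptions, not the contents of `src/hive_protocol/data/`. A private, seeded random generator makes every simulated dataset reproducible, which is what lets tests compare results exactly.

```python
import random


def simulate_local_level(
    n_steps: int, q: float, r: float, x0: float = 0.0, seed: int = 0
) -> tuple[list[float], list[float]]:
    """Simulate latent states x_t and observations y_t (hypothetical helper).

    x_t = x_{t-1} + w_t with w_t ~ N(0, q); y_t = x_t + v_t with v_t ~ N(0, r).
    """
    rng = random.Random(seed)  # local generator: global random state is untouched
    states: list[float] = []
    observations: list[float] = []
    x = x0
    for _ in range(n_steps):
        x += rng.gauss(0.0, q**0.5)  # latent random-walk step
        states.append(x)
        observations.append(x + rng.gauss(0.0, r**0.5))  # noisy measurement
    return states, observations


states, obs = simulate_local_level(100, q=0.01, r=0.1, seed=42)
```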
```bash
# Run tests
pixi run test

# Run tests with coverage
pixi run test-cov

# Check code style
pixi run lint

# Auto-fix style issues
pixi run lint-fix

# Format code
pixi run format

# Type check
pixi run typecheck

# Run all checks
pixi run check

# Install pre-commit hooks
pixi run hooks

# Render documentation
pixi run docs

# Run Snakemake workflow
pixi run workflow
```

For the January 2026 workshop:
- `slides.qmd`: presentation slides (Quarto reveal.js)
- `TUTORIAL.md`: step-by-step guide for participants
Render slides with:

```bash
pixi run slides
```

This template is designed to be forked and customized:
- Fork this repository
- Rename `hive_protocol` to your project name
- Update `pyproject.toml` and `pixi.toml` with your details
- Replace the Kalman filter code with your domain logic
- Keep the testing, CI/CD, and documentation patterns
We welcome contributions! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run `pixi run check` to ensure quality
- Commit with a descriptive message
- Push to your branch
- Open a Pull Request
MIT License - see LICENSE for details.
- CBG-ETH Zurich for supporting modern research practices
- The PyMC, Polars, Ruff, and Quarto communities
- All workshop participants who improve this template
Built with modern Python for computational biology