@rec3141 rec3141 commented Feb 12, 2026

Summary

Update COMEBin to run with modern dependencies (Python 3.11, NumPy 1.26, scikit-learn 1.4+, PyTorch 2.2+), replacing removed/deprecated APIs and fixing env solver failures.

What changed

Python code fixes:

  • sklearn.cluster._kmeans private imports → public APIs (KMeans, euclidean_distances, check_random_state, row_norms)
  • KMeans(algorithm="full", n_jobs=-1) → KMeans(algorithm="lloyd", n_init=10)
  • stable_cumsum → np.cumsum(dtype=np.float64)
  • np.int → np.int64 (5 instances across 3 files)
  • .item(0) → .item() (deprecated positional arg in NumPy 2.x)
  • torch.cuda.amp.{GradScaler,autocast} → torch.amp.{GradScaler,autocast} with explicit device_type (PyTorch 2.4+)
  • ruamel.yaml Path loading → file handle (avoids deprecation warning)
  • hmmsearch: fallback to -E 1e-10 when --cut_tc fails (HMMER ≥3.3)
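
The hmmsearch fallback in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the `runner` parameter, and the argument order are assumptions made for the example. `--cut_tc` and `-E 1e-10` come from the description above; `--domtblout` and `--cpu` are standard hmmsearch options.

```python
import subprocess


def build_hmmsearch_cmd(hmm_file, protein_fasta, domtblout, threshold_args, cpus=1):
    """Assemble an hmmsearch command line (hypothetical helper for this sketch)."""
    return [
        "hmmsearch",
        "--domtblout", domtblout,
        *threshold_args,
        "--cpu", str(cpus),
        hmm_file,
        protein_fasta,
    ]


def run_hmmsearch_with_fallback(hmm_file, protein_fasta, domtblout, cpus=1,
                                runner=subprocess.run):
    """Try --cut_tc first; if hmmsearch exits non-zero, retry with -E 1e-10.

    Returns the threshold arguments that succeeded. `runner` is injectable
    so the logic can be exercised without HMMER installed (an assumption of
    this sketch, not something the PR describes).
    """
    errors = []
    for threshold_args in (["--cut_tc"], ["-E", "1e-10"]):
        cmd = build_hmmsearch_cmd(hmm_file, protein_fasta, domtblout,
                                  threshold_args, cpus)
        result = runner(cmd, stdout=subprocess.DEVNULL,
                        stderr=subprocess.PIPE, text=True)
        if result.returncode == 0:
            return threshold_args
        errors.append(result.stderr)
    raise RuntimeError("hmmsearch failed with both thresholds: " + " | ".join(errors))
```

Retrying on any non-zero exit code keeps the sketch simple; a stricter version could inspect stderr and only fall back when the failure is specifically about missing TC cutoffs.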

Runtime hardening:

  • Auto-resolve CHECKM_DATA_PATH from $CONDA_PREFIX
  • Auto-set MPLCONFIGDIR to output dir (avoids writes to restricted $HOME)
  • Lazy imports for lightweight subcommands
  • Suppress pkg_resources deprecation warning from CheckM
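
As a rough sketch of the environment handling listed above (not the PR's actual code): the helper names, the `checkm_data` subdirectory under `$CONDA_PREFIX`, and the `.mplconfig` directory name are all assumptions chosen for illustration. `CHECKM_DATA_PATH` and `MPLCONFIGDIR` are the real environment variables; `MPLCONFIGDIR` has to be set before matplotlib is first imported to take effect.

```python
import os
import warnings
from pathlib import Path


def resolve_checkm_data_path(env=os.environ):
    """Return an explicit CHECKM_DATA_PATH, else look under the conda prefix."""
    explicit = env.get("CHECKM_DATA_PATH")
    if explicit:
        return explicit
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return None
    # Hypothetical location; the real layout depends on how CheckM data was installed.
    candidate = Path(prefix) / "checkm_data"
    return str(candidate) if candidate.is_dir() else None


def configure_runtime(output_dir, env=os.environ):
    """Set CHECKM_DATA_PATH and MPLCONFIGDIR without overriding user choices."""
    data_path = resolve_checkm_data_path(env)
    if data_path:
        env.setdefault("CHECKM_DATA_PATH", data_path)

    # Keep matplotlib's config/cache inside the output dir instead of $HOME.
    mpl_dir = Path(output_dir) / ".mplconfig"
    mpl_dir.mkdir(parents=True, exist_ok=True)
    env.setdefault("MPLCONFIGDIR", str(mpl_dir))

    # Silence the pkg_resources deprecation notice triggered by importing CheckM.
    warnings.filterwarnings("ignore", message=".*pkg_resources.*")
    return env
```

Using `setdefault` means a value the user exported explicitly always wins over the auto-resolved one.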

Environment (comebin_env.yaml):

  • Python 3.10–3.11 (pinned <3.12 for CheckM1 compatibility)
  • NumPy >=1.26,<2.0, scikit-learn >=1.4, PyTorch >=2.2
  • Moved hnswlib, python-igraph, leidenalg, scanpy from pip to conda-forge
  • Pinned pplacer=1.1.alpha19 (>=1.1 doesn't match alpha version strings → solver failure)
  • Added missing ruamel.yaml and tqdm dependencies
  • CPU-only PyTorch via conda-forge (GPU users can override)

Issue tags

Fixes #44
Refs #29, #38, #11

Validation

  • Environment solves cleanly with mamba env create -f comebin_env.yaml
  • All COMEBin Python modules import without errors
  • sklearn, torch.amp, numpy API smoke tests pass
  • End-to-end run with real contigs + BAMs (pending)

AI Assistance

Portions of this PR were developed with assistance from OpenAI Codex (initial refactoring) and Claude Code (code review, additional fixes, env debugging).

…+, PyTorch 2.2+)

Python code fixes:
- Replace sklearn.cluster._kmeans private imports with public APIs
- KMeans: algorithm="full" → "lloyd", drop removed n_jobs param
- stable_cumsum → np.cumsum(dtype=float64)
- np.int → np.int64 (removed alias in NumPy 1.24+)
- .item(0) → .item() (deprecated positional arg)
- torch.cuda.amp → torch.amp with explicit device_type (PyTorch 2.4+)
- ruamel.yaml: Path() → file handle loading
- hmmsearch: add -E 1e-10 fallback when --cut_tc fails (HMMER 3.3+)

Runtime hardening:
- Auto-resolve CHECKM_DATA_PATH from conda prefix
- Auto-set MPLCONFIGDIR to avoid writes to restricted home dirs
- Lazy imports to avoid loading ML libs for lightweight subcommands
- Suppress pkg_resources deprecation warning from CheckM

Environment (comebin_env.yaml):
- Modernize all deps: Python 3.10-3.11, NumPy <2.0, scikit-learn 1.4+
- Move pip packages to conda-forge where available
- Pin pplacer=1.1.alpha19 (>=1.1 doesn't match alpha version strings)
- Pin python<3.12 for CheckM1 compatibility
- Add missing ruamel.yaml and tqdm dependencies
- CPU-only PyTorch (conda-forge default)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rec3141 rec3141 force-pushed the codex/recent-dependencies-fix branch from db6e76d to 8fab9d7 Compare February 12, 2026 08:58
Successfully merging this pull request may close these issues.

BUG: HMMER Version Conflict Causes Crash with '--cut_tc' (Strict Error Handling)