Releases: p-gueguen/rctd-py
v0.3.0
What's Changed
Bug Fixes
- counts_MIN pixel filter now enforced (fixes #11): R spacexr calls restrict_counts() twice; the second call, with gene_list_bulk, was missing from rctd-py. Pixels with fewer than counts_MIN=10 counts in the DE gene set are now correctly removed. Validated: exact pixel count match with R spacexr on Xenium Region 1 (n_filtered=13,936).
- torch.compile fallback for environments without CUDA headers (fixes #10): torch.compile fails at runtime on GPU nodes without CUDA development headers (cuda.h) because Triton attempts to compile CUDA code. Added lazy auto-detection with graceful fallback to eager mode, plus RCTDConfig(compile=False) and a --no-compile CLI flag for explicit control.
- cuSOLVER batch-size crash fix: torch.linalg.eigh has an undocumented batch-size limit in CUDA 12.8 (~27k-31k depending on K). Added _eigh_safe(), which sub-batches at 25k, fixing crashes at --batch-size 50000.
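The sub-batching idea behind _eigh_safe() can be sketched roughly as follows. This is an illustration only: it uses NumPy's batched np.linalg.eigh as a stand-in for torch.linalg.eigh, and the name eigh_in_chunks and its max_batch parameter are hypothetical, not taken from rctd-py.

```python
import numpy as np

def eigh_in_chunks(mats: np.ndarray, max_batch: int = 25_000):
    """Eigendecompose a stack of symmetric matrices in sub-batches.

    mats has shape (B, K, K) with B >= 1. Each backend call receives at most
    max_batch matrices, so no single call exceeds the chosen batch limit.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be positive")
    vals, vecs = [], []
    for start in range(0, mats.shape[0], max_batch):
        w, v = np.linalg.eigh(mats[start:start + max_batch])
        vals.append(w)
        vecs.append(v)
    return np.concatenate(vals, axis=0), np.concatenate(vecs, axis=0)

# Example: 7 random symmetric 3x3 matrices, processed 3 at a time.
rng = np.random.default_rng(0)
a = rng.normal(size=(7, 3, 3))
sym = a + a.transpose(0, 2, 1)
w_chunked, v_chunked = eigh_in_chunks(sym, max_batch=3)
```

Because each matrix is decomposed independently, chunking should not change the results; it only bounds the size of each call.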
New Features
- pixel_mask in result types (fixes #8, fixes #9): FullResult, DoubletResult, and MultiResult now include a pixel_mask field (boolean array matching the input AnnData shape). It maps results back to the original barcodes:

      result = run_rctd(spatial, reference)
      weights_df = pd.DataFrame(
          result.weights,
          index=spatial.obs_names[result.pixel_mask],
          columns=result.cell_type_names,
      )
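When a table covering every input barcode is more convenient than a filtered one, the mask can also be used to scatter results back to full length. A minimal NumPy sketch, using small stand-in arrays rather than real rctd-py output and assuming pixel_mask is a 1-D boolean array over the input observations; expand_to_all_pixels is a hypothetical helper, not part of the package.

```python
import numpy as np

def expand_to_all_pixels(weights: np.ndarray, pixel_mask: np.ndarray) -> np.ndarray:
    """Return an (n_input_pixels, n_types) array; filtered-out rows are NaN."""
    if weights.shape[0] != int(pixel_mask.sum()):
        raise ValueError("weights must have one row per True entry in pixel_mask")
    full = np.full((pixel_mask.shape[0], weights.shape[1]), np.nan)
    full[pixel_mask] = weights  # kept pixels land on their original row
    return full

# Stand-in data: 5 input pixels, 3 kept after filtering, 2 cell types.
pixel_mask = np.array([True, False, True, True, False])
weights = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
full_weights = expand_to_all_pixels(weights, pixel_mask)
```

Using NaN for removed pixels keeps them distinguishable from pixels that were fitted with a weight of zero.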
Improvements
- Memory: sparse-aware reference profiles: Large references (370k+ cells) no longer require .todense() during profile computation. Sparse mat-vec products keep memory usage proportional to the number of non-zero entries.
- Numerical precision: _longdouble_sum() uses numpy longdouble (80-bit) for bulk reductions, matching R's extended precision on x86-64.
- Tutorial notebook fixed: Marimo figures now render in static HTML export.
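The extended-precision reduction mentioned above can be illustrated with plain NumPy. This is a sketch of the general technique, not the body of _longdouble_sum(); the helper name longdouble_sum is hypothetical. Note that np.longdouble is 80-bit extended precision only on some platforms (typically x86-64 Linux and macOS); elsewhere it may have a different width or be no wider than float64.

```python
import numpy as np

def longdouble_sum(x: np.ndarray) -> np.longdouble:
    """Sum with an np.longdouble accumulator, regardless of the input dtype."""
    return np.sum(x, dtype=np.longdouble)

# One million float32 values; the accumulator is wider than the inputs.
values = np.full(1_000_000, 0.1, dtype=np.float32)
total = longdouble_sum(values)

# Number of mantissa bits np.longdouble provides on this platform.
mantissa_bits = np.finfo(np.longdouble).nmant
```

Checking mantissa_bits at runtime is a cheap way to see whether the platform actually offers more precision than float64 (52 bits).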
Breaking Changes
- counts_MIN=10 is now enforced: result pixel counts will differ from v0.2.x (fewer pixels, matching R spacexr).
- FullResult, DoubletResult, and MultiResult gain a pixel_mask field (default None, backward-compatible for direct run_*_mode() callers).
- RCTDConfig gains a compile field (default True).
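The compile field pairs with the automatic fallback to eager mode. One way such a fallback could be structured is sketched below; it is not the rctd-py implementation. Because torch.compile compiles lazily, failures tend to surface on the first call rather than when the wrapper is created, so the sketch guards the call itself. The names compile_with_fallback, compile_fn, and broken_compile are hypothetical; compile_fn exists only so the fallback can be demonstrated without a GPU.

```python
from typing import Any, Callable, Optional

def compile_with_fallback(
    fn: Callable[..., Any],
    enabled: bool = True,
    compile_fn: Optional[Callable[[Callable[..., Any]], Callable[..., Any]]] = None,
) -> Callable[..., Any]:
    """Wrap fn with a compiler, reverting to eager mode if compilation fails."""
    if not enabled:
        return fn
    if compile_fn is None:
        try:
            import torch
            compile_fn = torch.compile
        except (ImportError, AttributeError):
            return fn  # no torch, or a torch build without torch.compile
    try:
        compiled = compile_fn(fn)
    except Exception:
        return fn  # compiler rejected the function up front

    state = {"impl": compiled}

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return state["impl"](*args, **kwargs)
        except Exception:
            if state["impl"] is fn:
                raise  # already eager: a genuine error in fn
            state["impl"] = fn  # remember the fallback for later calls
            return fn(*args, **kwargs)

    return wrapper

def scale(x, factor=2.0):
    return x * factor

def broken_compile(fn):
    """Simulated compiler whose output fails at call time."""
    def compiled(*args, **kwargs):
        raise RuntimeError("simulated: cuda.h not found")
    return compiled

safe_scale = compile_with_fallback(scale, compile_fn=broken_compile)
result = safe_scale(3.0)  # first call fails to "compile", then runs eagerly
```

A broad except like this also retries genuine errors once in eager mode, which is acceptable for a sketch but worth narrowing in production code.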
Validation
- 100/100 tests pass (Python 3.10-3.12)
- Xenium Region 1: n_filtered=13,936 exact match with R, dominant_type_agreement=0.9973, pixel_corr_median=1.0
- No runtime regression on tutorial or Xenium benchmarks
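The pixel count validated above depends on the counts_MIN filter, which keeps a pixel only if its counts summed over the DE gene set reach the threshold. A minimal NumPy sketch of that kind of filter, on made-up data; gene_set_count_mask and its arguments are hypothetical names, not the rctd-py API.

```python
import numpy as np

def gene_set_count_mask(
    counts: np.ndarray, gene_idx: np.ndarray, counts_min: int = 10
) -> np.ndarray:
    """Boolean mask of pixels whose counts summed over gene_idx reach counts_min."""
    in_set_totals = counts[:, gene_idx].sum(axis=1)
    return in_set_totals >= counts_min

# Stand-in data: 4 pixels x 5 genes; genes 0, 2 and 3 form the gene set.
counts = np.array([
    [5, 0, 4, 3, 9],    # 12 in-set counts: kept
    [1, 20, 2, 0, 30],  # 3 in-set counts: removed despite 53 total counts
    [0, 0, 10, 0, 0],   # exactly 10: kept (pixels with fewer than 10 are removed)
    [3, 3, 3, 3, 3],    # 9 in-set counts: removed
])
gene_idx = np.array([0, 2, 3])
keep = gene_set_count_mask(counts, gene_idx)
n_kept = int(keep.sum())
```

The second row shows why a total-count filter alone is not enough: a pixel can have many counts overall yet too few in the genes used for fitting.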
Full Changelog: v0.2.2...v0.3.0
v0.2.2: Fix GPU multi mode crash
Bug fix
- Fix cuSOLVER crash in multi mode on GPU: NVIDIA's batched eigendecomposition (cusolverDnXsyevBatched) fails on 1×1 matrices, which occur during multi mode's iterative type selection (K_sub=1). Added analytical K=1 path and NaN guard for degenerate Hessians.
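An analytical K=1 path is possible because a 1×1 matrix [[a]] has the single eigenvalue a with eigenvector [1], so no solver call is needed. The sketch below illustrates the idea with NumPy as a stand-in for the GPU backend; eigh_with_k1_path is a hypothetical name, and the NaN guard mentioned above is not covered.

```python
import numpy as np

def eigh_with_k1_path(mats: np.ndarray):
    """Batched symmetric eigendecomposition with a closed form for 1x1 input.

    mats has shape (B, K, K). For K == 1 the eigenvalue is the entry itself
    and the eigenvector is 1, so the batched solver is bypassed entirely.
    """
    if mats.shape[-1] == 1:
        eigvals = mats[..., 0, :].copy()  # shape (B, 1): the single entry
        eigvecs = np.ones_like(mats)      # shape (B, 1, 1)
        return eigvals, eigvecs
    return np.linalg.eigh(mats)

# Two 1x1 "matrices" take the closed-form path.
vals, vecs = eigh_with_k1_path(np.array([[[2.5]], [[-1.0]]]))
```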
Upgrade
uv pip install --upgrade rctd-py==0.2.2
Full mode and doublet mode were unaffected.
v0.2.1: device control, performance optimizations, CLI
New features
- device parameter in RCTDConfig: force CPU/GPU with device="cpu"/"cuda"/"auto"
- rctd run CLI command for full/doublet/multi modes
- Auto batch sizing based on available VRAM
- Analytical K=2 solvers for faster doublet mode
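The release notes do not spell out how device="auto" is resolved. A common pattern, shown here purely as an assumption, is to prefer CUDA when PyTorch reports it as available and to use the CPU otherwise; resolve_device is a hypothetical helper, not part of rctd-py.

```python
def resolve_device(device: str = "auto") -> str:
    """Map "cpu" / "cuda" / "auto" to a concrete device string."""
    if device not in {"cpu", "cuda", "auto"}:
        raise ValueError(f"unknown device: {device!r}")
    if device != "auto":
        return device  # explicit choice is passed through unchanged
    try:
        import torch
    except ImportError:
        return "cpu"  # without torch there is no GPU backend to select
    return "cuda" if torch.cuda.is_available() else "cpu"

chosen = resolve_device("auto")
```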
Performance
- Shared-profile IRWLS solver (28% faster, 17% less VRAM)
- Batched log-likelihood computation
- torch.compile integration
Bug fixes
- Correct 0-indexed spot_class labels in tutorial (#4)
- Handle corrupt Q-matrices download with automatic retry
- Fix flaky test tolerances
v0.2.0 — PyTorch backend, device control, consolidated CI
What's Changed
Breaking: JAX → PyTorch migration
- Replaced JAX/jaxlib with PyTorch as the sole compute backend
- All dependencies updated: torch>=2.0 replaces jax>=0.4.20, jaxlib>=0.4.20
New features
- device parameter: force CPU or GPU via RCTDConfig(device="cpu") or "cuda" (default "auto" preserves existing behavior)
- sigma_override: bypass sigma auto-calibration with a known value (e.g. from R) for exact concordance
Testing & validation
- Added R concordance tests with pre-computed spacexr v2.2.1 fixtures (no R required to run)
- 99.7% dominant type agreement on 14k-cell Xenium, 100% with sigma_override
CI & packaging
- Consolidated lint + test into single CI workflow
- Removed codecov (no token configured)
- Updated README: uv pip install, benchmark tables, device docs
Full Changelog: v0.1.1...v0.2.0
v0.1.1 — Sigma estimation 23× speedup
What's new in v0.1.1
Performance
- 23× faster sigma estimation via three targeted optimizations:
  - Cache the 437×437 tridiagonal matrix inverse (eliminated ~144 redundant O(n³) inversions)
  - Precompute all 126 spline coefficient matrices once at startup
  - Vectorize sigma candidate evaluation with jax.vmap/jax.jit (85 sequential → 1 fused kernel)
- Total end-to-end time on Blackwell B200: ~3.5 min vs ~51 min for R spacexr (15× speedup)
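Two of the optimizations above, caching an expensive inverse and evaluating all sigma candidates in one vectorized step, follow a general pattern that can be sketched on a toy problem. The matrix, the objective, and the names tridiagonal_inverse and score_candidates are all made up for illustration, and NumPy broadcasting stands in for jax.vmap/jax.jit; none of this is the rctd-py code.

```python
from functools import lru_cache

import numpy as np

@lru_cache(maxsize=None)
def tridiagonal_inverse(n: int) -> np.ndarray:
    """Invert a toy n x n tridiagonal matrix once; later calls hit the cache."""
    mat = 4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    inv = np.linalg.inv(mat)
    inv.setflags(write=False)  # the cached array is shared, so keep it read-only
    return inv

def score_candidates(sigmas: np.ndarray, data: np.ndarray) -> np.ndarray:
    """Score every sigma candidate in one broadcasted expression (toy objective)."""
    smoothed = tridiagonal_inverse(data.shape[0]) @ data  # shared by all candidates
    resid = smoothed[None, :] / sigmas[:, None]           # (n_candidates, n)
    return -0.5 * np.sum(resid**2, axis=1) - data.shape[0] * np.log(sigmas)

data = np.linspace(-1.0, 1.0, 437)
sigmas = np.linspace(0.1, 2.0, 85)
scores = score_candidates(sigmas, data)
best_sigma = sigmas[np.argmax(scores)]
```

The gain comes from doing the O(n³) inversion once and replacing a Python-level loop over candidates with a single array expression.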
New
- Validation report with spatial cell-type maps: https://p-gueguen.github.io/rctd-py/
- q_matrices.npz is now auto-downloaded on first use (not bundled in wheel)
Fixes
- Lint: remove unused variable in _likelihood.py
Installation
uv pip install rctd-py
rctd-py v0.1.0: Initial release
rctd-py v0.1.0
GPU-accelerated Robust Cell Type Decomposition (RCTD) for spatial transcriptomics.
Highlights
- JAX reimplementation of spacexr RCTD with 63x GPU speedup (L40S) over R
- 99.7% agreement with R spacexr on 58k Xenium pixels
- Three deconvolution modes: full, doublet, multi
- Pure Python — no R dependency