Releases: p-gueguen/rctd-py

v0.3.0

31 Mar 11:02
51def76

What's Changed

Bug Fixes

  • counts_MIN pixel filter now enforced (fixes #11): R spacexr calls restrict_counts() twice — the second call with gene_list_bulk was missing from rctd-py. Pixels with fewer than counts_MIN=10 counts in the DE gene set are now correctly removed. Validated: exact pixel count match with R spacexr on Xenium Region 1 (n_filtered=13,936).

  • torch.compile fallback for environments without CUDA headers (fixes #10): torch.compile fails at runtime on GPU nodes without CUDA development headers (cuda.h) because Triton attempts to compile CUDA code. Added lazy auto-detection with graceful fallback to eager mode, plus RCTDConfig(compile=False) and --no-compile CLI flag for explicit control.

  • cuSOLVER batch-size crash fix: torch.linalg.eigh has an undocumented batch-size limit in CUDA 12.8 (~27k-31k depending on K). Added _eigh_safe() that sub-batches at 25k, fixing crashes at --batch-size 50000.
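
The sub-batching idea behind the cuSOLVER fix can be sketched as follows. This is an illustration under stated assumptions, not the rctd-py implementation: it uses NumPy's np.linalg.eigh in place of torch.linalg.eigh so that it runs without a GPU, and the name eigh_in_chunks and its max_batch parameter are hypothetical. Only the 25k chunk size is taken from the note above.

```python
import numpy as np


def eigh_in_chunks(mats, max_batch=25_000):
    """Eigendecompose a stack of symmetric matrices in slices of at most max_batch.

    mats has shape (N, K, K). Returns (eigenvalues, eigenvectors) with shapes
    (N, K) and (N, K, K), laid out exactly like a single batched call.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be positive")
    n = mats.shape[0]
    if n <= max_batch:
        # Small enough for one call; no splitting needed.
        w, v = np.linalg.eigh(mats)
        return w, v
    vals, vecs = [], []
    for start in range(0, n, max_batch):
        # Each slice stays under the batch limit of the underlying solver.
        w, v = np.linalg.eigh(mats[start:start + max_batch])
        vals.append(w)
        vecs.append(v)
    return np.concatenate(vals, axis=0), np.concatenate(vecs, axis=0)


# Toy usage: 7 symmetric 3x3 matrices, forced into chunks of 3, 3, and 1.
rng = np.random.default_rng(0)
a = rng.normal(size=(7, 3, 3))
sym = a + a.transpose(0, 2, 1)  # make each matrix symmetric
w_chunked, v_chunked = eigh_in_chunks(sym, max_batch=3)
```

Because each matrix is decomposed independently, splitting along the batch axis should not change the per-matrix results; it only bounds how many matrices reach the solver at once.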

New Features

  • pixel_mask in result types (fixes #8, fixes #9): FullResult, DoubletResult, and MultiResult now include a pixel_mask field, a boolean array with one entry per input AnnData observation that is True for pixels kept after filtering. Use it to map results back to the original barcodes:
    import pandas as pd

    result = run_rctd(spatial, reference)
    # Rows of result.weights correspond to the pixels where pixel_mask is True.
    weights_df = pd.DataFrame(
        result.weights,
        index=spatial.obs_names[result.pixel_mask],
        columns=result.cell_type_names,
    )
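
To make the relationship between the counts_MIN filter and pixel_mask concrete, here is a small NumPy sketch. The helper counts_min_mask, the toy counts matrix, the barcodes, and the gene indices are all hypothetical; only the threshold of 10 and the masking pattern come from these notes.

```python
import numpy as np


def counts_min_mask(counts, gene_idx, counts_min=10):
    """Return True for pixels with at least counts_min counts in the given genes."""
    return counts[:, gene_idx].sum(axis=1) >= counts_min


# Toy data: 4 pixels x 3 genes; only genes 0 and 2 belong to the gene set.
counts = np.array([[8, 50, 3], [2, 90, 1], [10, 0, 0], [0, 7, 9]])
barcodes = np.array(["AAAC", "AAAG", "AAAT", "AACA"])

pixel_mask = counts_min_mask(counts, gene_idx=[0, 2])
kept_barcodes = barcodes[pixel_mask]

# Stand-in for per-pixel results computed on the kept pixels only.
kept_totals = counts[pixel_mask].sum(axis=1)

# Scatter back to the input order, leaving NaN for filtered pixels.
full_totals = np.full(len(barcodes), np.nan)
full_totals[pixel_mask] = kept_totals
```

The same scatter pattern applies to a weights matrix: allocate a NaN-filled array with one row per input pixel and assign the result rows at the positions where the mask is True.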

Improvements

  • Memory: sparse-aware reference profiles: Large references (370k+ cells) no longer require .todense() during profile computation. Sparse mat-vec products keep memory usage proportional to non-zero entries.

  • Numerical precision: _longdouble_sum() uses numpy longdouble (80-bit) for bulk reductions, matching R's extended precision on x86-64.

  • Tutorial notebook fixed: Marimo figures now render in static HTML export.
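
An extended-precision reduction of the kind described for _longdouble_sum() might look like the sketch below. The function name longdouble_sum and its signature are assumptions, not the library's code. Note that np.longdouble is only wider than float64 on platforms where the C long double is wider (typically x86-64 Linux); elsewhere it can be the same 64-bit type, in which case nothing is gained.

```python
import numpy as np


def longdouble_sum(values, axis=None):
    """Sum using a long double accumulator instead of the input dtype."""
    return np.sum(np.asarray(values), axis=axis, dtype=np.longdouble)


x = np.full(1_000, 0.1)
total = longdouble_sum(x)

# Mantissa width actually available here; it exceeds float64's only
# where long double is a wider type than double.
longdouble_mantissa = np.finfo(np.longdouble).nmant
float64_mantissa = np.finfo(np.float64).nmant
```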

Breaking Changes

  • counts_MIN=10 is now enforced — result pixel counts will differ from v0.2.x (fewer pixels, matching R spacexr).
  • FullResult, DoubletResult, MultiResult gain a pixel_mask field (default None, backward-compatible for direct run_*_mode() callers).
  • RCTDConfig gains a compile field (default True).
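
The compile field interacts with the fallback described under Bug Fixes. The sketch below shows one way such a decision could be made using only the standard library. It is an assumption-laden illustration: find_cuda_header and resolve_compile are hypothetical names, the search locations are guesses, and the actual detection in rctd-py may work differently (for example, by also considering the selected device).

```python
import os
from pathlib import Path


def find_cuda_header(search_dirs=None):
    """Return the path to cuda.h if found in a candidate include dir, else None."""
    if search_dirs is None:
        # Assumed locations: CUDA_HOME / CUDA_PATH, then a common default prefix.
        roots = [
            os.environ.get("CUDA_HOME"),
            os.environ.get("CUDA_PATH"),
            "/usr/local/cuda",
        ]
        search_dirs = [Path(r) / "include" for r in roots if r]
    for d in search_dirs:
        candidate = Path(d) / "cuda.h"
        if candidate.is_file():
            return candidate
    return None


def resolve_compile(requested, search_dirs=None):
    """Honor an explicit opt-out; otherwise compile only if cuda.h is present."""
    if not requested:
        return False  # user opted out, e.g. compile=False or --no-compile
    return find_cuda_header(search_dirs) is not None
```

A caller could evaluate resolve_compile lazily, on first use, and run in eager mode whenever it returns False.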

Validation

  • 100/100 tests pass (Python 3.10-3.12)
  • Xenium Region 1: n_filtered=13,936 exact match with R, dominant_type_agreement=0.9973, pixel_corr_median=1.0
  • No runtime regression on tutorial or Xenium benchmarks

Full Changelog: v0.2.2...v0.3.0

v0.2.2: Fix GPU multi mode crash

21 Mar 13:54

Bug fix

  • Fix cuSOLVER crash in multi mode on GPU: NVIDIA's batched eigendecomposition (cusolverDnXsyevBatched) fails on 1×1 matrices, which occur during multi mode's iterative type selection (K_sub=1). Added analytical K=1 path and NaN guard for degenerate Hessians.

Upgrade

uv pip install --upgrade rctd-py==0.2.2

Full mode and doublet mode were unaffected.

v0.2.1: device control, performance optimizations, CLI

21 Mar 10:16
520b703

New features

  • device parameter in RCTDConfig: force CPU/GPU with device="cpu" / "cuda" / "auto"
  • rctd run CLI command for full/doublet/multi modes
  • Auto batch sizing based on available VRAM
  • Analytical K=2 solvers for faster doublet mode

Performance

  • Shared-profile IRWLS solver (28% faster, 17% less VRAM)
  • Batched log-likelihood computation
  • torch.compile integration

Bug fixes

  • Correct 0-indexed spot_class labels in tutorial (#4)
  • Handle corrupt Q-matrix downloads with automatic retry
  • Fix flaky test tolerances

v0.2.0 — PyTorch backend, device control, consolidated CI

10 Mar 12:49

What's Changed

Breaking: JAX → PyTorch migration

  • Replaced JAX/jaxlib with PyTorch as the sole compute backend
  • All dependencies updated: torch>=2.0 replaces jax>=0.4.20, jaxlib>=0.4.20

New features

  • device parameter — force CPU or GPU via RCTDConfig(device="cpu") or "cuda" (default "auto" preserves existing behavior)
  • sigma_override — bypass sigma auto-calibration with a known value (e.g. from R) for exact concordance

Testing & validation

  • Added R concordance tests with pre-computed spacexr v2.2.1 fixtures (no R required to run)
  • 99.7% dominant type agreement with R on a 14k-cell Xenium dataset, 100% with sigma_override

CI & packaging

  • Consolidated lint + test into single CI workflow
  • Removed codecov (no token configured)
  • Updated README: uv pip install, benchmark tables, device docs

Full Changelog: v0.1.1...v0.2.0

v0.1.1 — Sigma estimation 23× speedup

01 Mar 22:43

What's new in v0.1.1

Performance

  • 23× faster sigma estimation via three targeted optimizations:
    • Cache the 437×437 tridiagonal matrix inverse (eliminated ~144 redundant O(n³) inversions)
    • Precompute all 126 spline coefficient matrices once at startup
    • Vectorize sigma candidate evaluation with jax.vmap/jax.jit (85 sequential → 1 fused kernel)
  • Total end-to-end time on Blackwell B200: ~3.5 min vs ~51 min for R spacexr (15× speedup)

New

Fixes

  • Lint: remove unused variable in _likelihood.py

Installation

uv pip install rctd-py

rctd-py v0.1.0 — Initial release

28 Feb 10:35

rctd-py v0.1.0

GPU-accelerated Robust Cell Type Decomposition (RCTD) for spatial transcriptomics.

Highlights

  • JAX reimplementation of spacexr RCTD with 63× GPU speedup (L40S) over R
  • 99.7% agreement with R spacexr on 58k Xenium pixels
  • Three deconvolution modes: full, doublet, multi
  • Pure Python — no R dependency
