Optimize stack_images: pre-allocate + fill, add get_image(..., out=)#35

Open
YanLogovskiy wants to merge 3 commits into neuro-ml:master from YanLogovskiy:master
@YanLogovskiy

Reduce peak memory during DICOM series stacking by pre-allocating a single buffer and filling it slice by slice, instead of building a list of arrays and calling np.stack. Add an optional out argument to get_image so callers can write the decoded slice directly into the result buffer.

Changes

  1. dicom_csv.misc.stack_images

    • Replaced np.stack(list(map(get_image, series)), axis) with one pre-allocated array and slice-by-slice fill.
    • For axis in (0, 1, 2) (including axis=-1): allocate once with np.empty(...), copy the first slice with np.copyto, then fill the rest with get_image(series[i], ..., out=out[...]).
    • No list of N arrays and no second full-size allocation from np.stack.
  2. dicom_csv.misc.get_image

    • Added optional argument out: Optional[np.ndarray] = None.
    • When provided, the result is written into out via np.copyto(out, array) and out is returned.
    • Calls without out are unchanged (backward compatible).
  3. Scripts

    • scripts/benchmark_stack_images.py — benchmark for dcmread, pixel_array, get_image, stack_images (with breakdown for get_image_loop vs np_stack); supports --path (real DICOM) and --synthetic (temp series).
    • scripts/generate_compressed_dataset.py — generate synthetic series with optional compression (RLE, JPEG-LS) for benchmarking on compressed data.
  4. Cleanup

    • Removed unused partial import in misc.py.
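The pre-allocate-and-fill pattern described in points 1–2 can be sketched as follows. This is an illustrative sketch, not the actual diff: here get_image receives a plain ndarray as its "instance", whereas the real function decodes pixel_array and applies rescale/color-space conversion.

```python
import numpy as np

def get_image(instance, to_color_space=None, out=None):
    """Stand-in for dicom_csv's get_image: `instance` is already an ndarray
    here; the real function decodes pixel_array and applies rescale."""
    array = np.asarray(instance)
    if out is not None:
        np.copyto(out, array)   # write into the caller-provided view
        return out
    return array

def stack_images(series, axis=-1):
    """Pre-allocate one buffer and fill it slice by slice (no np.stack)."""
    first = get_image(series[0])
    out_ndim = first.ndim + 1
    axis = axis % out_ndim              # normalize axis=-1 to a positive index
    shape = list(first.shape)
    shape.insert(axis, len(series))
    out = np.empty(shape, dtype=first.dtype)
    index = [slice(None)] * out_ndim
    index[axis] = 0
    np.copyto(out[tuple(index)], first)  # first slice is already decoded
    for i in range(1, len(series)):
        index[axis] = i
        # decode each remaining slice directly into its view of the buffer
        get_image(series[i], out=out[tuple(index)])
    return out
```

Because each out[tuple(index)] is a view into the shared buffer, no per-slice copy survives the loop iteration, which is what removes the second full-size allocation.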

Rationale

  • Memory: The old path held a list of N arrays (N × slice size) plus the np.stack result (same size), so peak usage during stack_images was about double the series size. The new path uses one buffer and fills it via out=, avoiding the list and the extra full-size allocation.
  • Structure: Single pass over the series and explicit buffer; ready for future optimizations (e.g. decoding directly into the buffer via Rust/dicom-rs) without changing the stack_images contract.
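The roughly 2× difference in peak allocation can be reproduced with a toy measurement. This sketch uses plain ndarrays in place of DICOM decoding; exact numbers depend on the NumPy version and allocator, but the old path's peak is close to twice the new path's.

```python
import tracemalloc
import numpy as np

N, H, W = 50, 256, 256  # small toy series, int16

def old_path():
    # list of N slices and the full-size np.stack result coexist at peak
    slices = [np.ones((H, W), dtype=np.int16) for _ in range(N)]
    return np.stack(slices, axis=0)

def new_path():
    # one buffer; only a single per-slice temporary exists at any moment
    out = np.empty((N, H, W), dtype=np.int16)
    for i in range(N):
        out[i] = np.ones((H, W), dtype=np.int16)
    return out

for fn in (old_path, new_path):
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{fn.__name__}: peak ≈ {peak / 2**20:.1f} MiB")
```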

Impact

Memory (per series):

| Series size | Before (peak) | After (peak) | Saved |
| --- | --- | --- | --- |
| 500×512×512 (int16) | ~500 MB | ~250 MB | ~250 MB (~2×) |
| 100×512×512 (int16) | ~150 MB | ~50 MB | ~100 MB |
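The raw pixel sizes behind the table can be checked directly; the "before" column additionally includes the slice list held alongside the np.stack result, plus per-slice decode temporaries.

```python
# int16 = 2 bytes per voxel
for slices, rows, cols in [(500, 512, 512), (100, 512, 512)]:
    mib = slices * rows * cols * 2 / 2**20
    print(f"{slices}x{rows}x{cols} int16 -> {mib:.0f} MiB of raw pixel data")
```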

At 1500 studies/day, the peak-memory saving shown above applies to every series processed; with multiple parallel workers, the total saving scales with the number of concurrently stacked series.

CPU: No measurable speedup in current benchmarks; most time is in N get_image calls (pixel_array + rescale) and copying into the buffer. The gain is in memory and in a cleaner path for future optimizations (e.g. direct decode into buffer).

Backward compatibility

  • stack_images(series, axis=-1, to_color_space=None) signature unchanged.
  • get_image(instance, to_color_space=None) without out behaves as before.
  • New get_image(..., out=...) is optional and does not affect existing callers.

How to verify

  • Run benchmark:
    python scripts/benchmark_stack_images.py --synthetic --slices 500 --rows 512 --cols 512 --runs 2
  • Correctness: stack_images(series, axis=k) matches np.stack([get_image(ds) for ds in series], axis=k) for axis -1, 0, 1, 2.
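The equivalence check above can be wrapped in a small harness. A sketch: stack_images and get_image are passed in as parameters so it can run against dicom_csv (with a loaded series) or against any stand-ins.

```python
import numpy as np

def check_stack_images(stack_images, get_image, series, axes=(-1, 0, 1, 2)):
    """Assert stack_images matches the reference np.stack construction
    for every axis in `axes`."""
    for axis in axes:
        expected = np.stack([get_image(ds) for ds in series], axis=axis)
        actual = stack_images(series, axis=axis)
        assert np.array_equal(actual, expected), f"mismatch at axis={axis}"
    return True
```

With dicom_csv installed, this would be called as check_stack_images(stack_images, get_image, series) on a real series.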

@YanLogovskiy YanLogovskiy self-assigned this Feb 2, 2026