Optimize stack_images: pre-allocate + fill, add get_image(..., out=)#35

Open
YanLogovskiy wants to merge 3 commits into neuro-ml:master from YanLogovskiy:master
@YanLogovskiy

Reduce peak memory during DICOM series stacking by pre-allocating a single buffer and filling it slice by slice, instead of building a list of arrays and calling np.stack. Add an optional out argument to get_image so callers can write the decoded slice directly into the result buffer.

Changes

  1. dicom_csv.misc.stack_images

    • Replaced np.stack(list(map(get_image, series)), axis) with one pre-allocated array and slice-by-slice fill.
    • For axis in (0, 1, 2) (including axis=-1): allocate once with np.empty(...), copy the first slice with np.copyto, then fill the rest with get_image(series[i], ..., out=out[...]).
    • No list of N arrays and no second full-size allocation from np.stack.
  2. dicom_csv.misc.get_image

    • Added optional argument out: Optional[np.ndarray] = None.
    • When provided, the result is written into out via np.copyto(out, array) and out is returned.
    • Calls without out are unchanged (backward compatible).
  3. Scripts

    • scripts/benchmark_stack_images.py — benchmark for dcmread, pixel_array, get_image, stack_images (with breakdown for get_image_loop vs np_stack); supports --path (real DICOM) and --synthetic (temp series).
    • scripts/generate_compressed_dataset.py — generate synthetic series with optional compression (RLE, JPEG-LS) for benchmarking on compressed data.
  4. Cleanup

    • Removed unused partial import in misc.py.
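The pre-allocate-and-fill pattern described in points 1–2 can be sketched as follows. This is an illustrative sketch, not the actual diff: here get_image receives a plain ndarray as its "instance", whereas the real function decodes pixel_array and applies rescale/color-space conversion.

```python
import numpy as np

def get_image(instance, to_color_space=None, out=None):
    """Stand-in for dicom_csv's get_image: `instance` is already an ndarray
    here; the real function decodes pixel_array and applies rescale."""
    array = np.asarray(instance)
    if out is not None:
        np.copyto(out, array)   # write into the caller-provided view
        return out
    return array

def stack_images(series, axis=-1):
    """Pre-allocate one buffer and fill it slice by slice (no np.stack)."""
    first = get_image(series[0])
    out_ndim = first.ndim + 1
    axis = axis % out_ndim              # normalize axis=-1 to a positive index
    shape = list(first.shape)
    shape.insert(axis, len(series))
    out = np.empty(shape, dtype=first.dtype)
    index = [slice(None)] * out_ndim
    index[axis] = 0
    np.copyto(out[tuple(index)], first)  # first slice is already decoded
    for i in range(1, len(series)):
        index[axis] = i
        # decode each remaining slice directly into its view of the buffer
        get_image(series[i], out=out[tuple(index)])
    return out
```

Because each out[tuple(index)] is a view into the shared buffer, no per-slice copy survives the loop iteration, which is what removes the second full-size allocation.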

Rationale

  • Memory: The old path held a list of N arrays (N × slice size) plus the np.stack result (same size), so peak usage during stack_images was about double the series size. The new path uses one buffer and fills it via out=, avoiding the list and the extra full-size allocation.
  • Structure: Single pass over the series and explicit buffer; ready for future optimizations (e.g. decoding directly into the buffer via Rust/dicom-rs) without changing the stack_images contract.
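The roughly 2× difference in peak allocation can be reproduced with a toy measurement. This sketch uses plain ndarrays in place of DICOM decoding; exact numbers depend on the NumPy version and allocator, but the old path's peak is close to twice the new path's.

```python
import tracemalloc
import numpy as np

N, H, W = 50, 256, 256  # small toy series, int16

def old_path():
    # list of N slices and the full-size np.stack result coexist at peak
    slices = [np.ones((H, W), dtype=np.int16) for _ in range(N)]
    return np.stack(slices, axis=0)

def new_path():
    # one buffer; only a single per-slice temporary exists at any moment
    out = np.empty((N, H, W), dtype=np.int16)
    for i in range(N):
        out[i] = np.ones((H, W), dtype=np.int16)
    return out

for fn in (old_path, new_path):
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{fn.__name__}: peak ≈ {peak / 2**20:.1f} MiB")
```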

Impact

Memory (per series):

| Series size | Before (peak) | After (peak) | Saved |
| --- | --- | --- | --- |
| 500×512×512 (int16) | ~500 MB | ~250 MB | ~250 MB (~2×) |
| 100×512×512 (int16) | ~150 MB | ~50 MB | ~100 MB |
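The raw pixel sizes behind the table can be checked directly; the "before" column additionally includes the slice list held alongside the np.stack result, plus per-slice decode temporaries.

```python
# int16 = 2 bytes per voxel
for slices, rows, cols in [(500, 512, 512), (100, 512, 512)]:
    mib = slices * rows * cols * 2 / 2**20
    print(f"{slices}x{rows}x{cols} int16 -> {mib:.0f} MiB of raw pixel data")
```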

At 1500 studies/day, the peak-memory saving shown above applies to every series processed; with multiple parallel workers, the total saving scales with the number of concurrently stacked series.

CPU: No measurable speedup in current benchmarks; most time is in N get_image calls (pixel_array + rescale) and copying into the buffer. The gain is in memory and in a cleaner path for future optimizations (e.g. direct decode into buffer).

Backward compatibility

  • stack_images(series, axis=-1, to_color_space=None) signature unchanged.
  • get_image(instance, to_color_space=None) without out behaves as before.
  • New get_image(..., out=...) is optional and does not affect existing callers.

How to verify

  • Run benchmark:
    python scripts/benchmark_stack_images.py --synthetic --slices 500 --rows 512 --cols 512 --runs 2
  • Correctness: stack_images(series, axis=k) matches np.stack([get_image(ds) for ds in series], axis=k) for axis -1, 0, 1, 2.
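The equivalence check above can be wrapped in a small harness. A sketch: stack_images and get_image are passed in as parameters so it can run against dicom_csv (with a loaded series) or against any stand-ins.

```python
import numpy as np

def check_stack_images(stack_images, get_image, series, axes=(-1, 0, 1, 2)):
    """Assert stack_images matches the reference np.stack construction
    for every axis in `axes`."""
    for axis in axes:
        expected = np.stack([get_image(ds) for ds in series], axis=axis)
        actual = stack_images(series, axis=axis)
        assert np.array_equal(actual, expected), f"mismatch at axis={axis}"
    return True
```

With dicom_csv installed, this would be called as check_stack_images(stack_images, get_image, series) on a real series.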

@YanLogovskiy YanLogovskiy self-assigned this Feb 2, 2026