Reading subsets of streams from files without iterating over everything #172

@fdellekart

Hello,

I am investigating different toolboxes for PET listmode reconstruction and have been trying out PyTomography. It supports PETSIRD as an input format, which in turn uses yardl for its model definition.

Specifically, I am looking into dynamic (frame-by-frame) reconstruction of the listmode data.
To test PyTomography, I also tried to limit the input to a subset of a few seconds of acquisition time, so that I would not waste time loading and processing my full dataset (~10 GB, ~1 h+ acquisition time) before knowing that things actually work.

PyTomography currently does not support selecting specific time intervals from listmode data with a longer acquisition time. It allows specifying time block IDs, but it still iterates over all the time blocks and keeps only the ones with the specified IDs, which is very inefficient (see here).
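The cost of this filter-while-iterating pattern can be illustrated with a minimal sketch. The reader below is a hypothetical stand-in (`iter_time_blocks` and `select_blocks` are made-up names, not PyTomography or PETSIRD functions); it only shows that every block is decoded even when just three are wanted.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_time_blocks(n_blocks: int, decoded_log: List[int]) -> Iterator[Tuple[int, bytes]]:
    """Hypothetical stream reader: blocks can only be produced in file order."""
    for block_id in range(n_blocks):
        decoded_log.append(block_id)  # stands in for the real decoding work
        yield block_id, b"events"


def select_blocks(blocks: Iterable[Tuple[int, bytes]], wanted_ids: Iterable[int]) -> List[bytes]:
    """Keep only the wanted blocks; the iteration still visits every block."""
    wanted = set(wanted_ids)
    return [payload for block_id, payload in blocks if block_id in wanted]


decoded_log: List[int] = []
selected = select_blocks(iter_time_blocks(1000, decoded_log), [3, 4, 5])
print(len(selected), len(decoded_log))  # 3 1000
```

The selection returns three blocks, but all 1000 were decoded along the way.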

Therefore, I dug a bit deeper to figure out whether I could adapt the toolbox to read only certain time spans from the PETSIRD file. After seeing that the protocol readers use streams, I analyzed the binary structure of the protocols, hoping to find a way to calculate the size of the data and then seek to the position in the file where I'd like to start reading.

I found that vector and array types store their length at the start of their binary data. As far as I can tell, this means that what I want to achieve isn't possible: I can't know the length of the event vectors stored inside the time blocks up front, so I would have to iterate over all of them.
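To make the problem concrete, here is a minimal sketch with a toy length-prefixed format. It is an assumption for illustration only and not yardl's actual encoding: each block is a little-endian `uint32` count followed by that many `uint64` events. The helper names `write_blocks` and `read_block` are hypothetical.

```python
import io
import struct


def write_blocks(stream, blocks):
    """Write each block as a uint32 count followed by `count` uint64 events."""
    for events in blocks:
        stream.write(struct.pack("<I", len(events)))
        stream.write(struct.pack(f"<{len(events)}Q", *events))


def read_block(stream, index):
    """Return block `index` and the number of length headers that had to be read."""
    stream.seek(0)
    headers_read = 0
    for _ in range(index):
        # The offset of the next block is unknown until this header is read.
        (count,) = struct.unpack("<I", stream.read(4))
        headers_read += 1
        stream.seek(count * 8, io.SEEK_CUR)  # skip the payload (fixed-width toy only)
    (count,) = struct.unpack("<I", stream.read(4))
    headers_read += 1
    events = list(struct.unpack(f"<{count}Q", stream.read(count * 8)))
    return events, headers_read


buf = io.BytesIO()
write_blocks(buf, [[1, 2, 3], [4], [], [5, 6]])
events, headers_read = read_block(buf, 3)
print(events, headers_read)  # [5, 6] 4
```

Even in this toy case, reaching the fourth block means reading the headers of all blocks before it. Payloads can be skipped here only because the events have a fixed width; with variable-width elements, each payload would have to be decoded as well.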

Is there another way to achieve this? I may also be missing or misunderstanding something, so please let me know if that is the case 🙂

If it really is not possible, I think it would be a useful feature to consider. Being forced to read the full file when only part of it is of interest is cumbersome. It could also benefit dynamic reconstruction, as it would no longer be necessary to wait for all frames to load before the first one can be processed.
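One possible shape for such a feature is a table of byte offsets recorded while writing, which a reader can use to seek straight to a block. The sketch below uses a toy format (a `uint32` count followed by `uint64` events) and hypothetical helper names; it is not an existing yardl or PETSIRD mechanism.

```python
import io
import struct


def write_blocks_with_index(stream, blocks):
    """Write length-prefixed blocks and return the byte offset of each one."""
    offsets = []
    for events in blocks:
        offsets.append(stream.tell())  # remember where this block starts
        stream.write(struct.pack("<I", len(events)))
        stream.write(struct.pack(f"<{len(events)}Q", *events))
    return offsets


def read_block_at(stream, offset):
    """Seek directly to a block's offset and decode only that block."""
    stream.seek(offset)
    (count,) = struct.unpack("<I", stream.read(4))
    return list(struct.unpack(f"<{count}Q", stream.read(count * 8)))


buf = io.BytesIO()
offsets = write_blocks_with_index(buf, [[1, 2, 3], [4], [], [5, 6]])
frame = read_block_at(buf, offsets[3])
print(offsets, frame)  # [0, 28, 40, 44] [5, 6]
```

Such an index needs a seekable file, so it would not help when the same bytes are consumed as a one-way stream.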

However, I am not very familiar with the scope of this project. I guess this would not make sense when the same stream of bytes is transmitted over a network, so let me know if this simply is not intended. 👍

The PETSIRD protocol definition can be found here.

Thanks in advance and best regards, Florian

Labels: enhancement (New feature or request)
