Open
Description
Even with multiprocessing working, one critical operation remains before the segment and ingestion loops can safely run in parallel with each other: catching up the new estimators to the current frame (that is, to the frames that accrued between starting NMF slicing and updating assets).
potential points of race condition / frame_idx mismatch include:
- NMF: uses `residual`. since we're using a single deepcopy for the entire thing, it shouldn't cause too much issue?
- Catalog: uses `footprints` and `traces`. `traces` especially gets live updated, potentially flushing out the frame_idx that the new estimators from NMF are from. this means the quality test (trace correlation) needs to be handled (a) in a thread-safe manner and (b) possibly using zarr.
- Update: uses all assets.
  - `new_footprints` and `overlaps` just get appended, so no biggie.
  - `new_traces` will have ??? values where the new frames have been accrued in `ingestion loop` without the new estimators. same with `sufficient_statistics`. some of these values cannot be `nan` since it gets used to predict next frame_idx values.
  - `residuals` should get an update here, where clearing new estimator areas to zero occurs - however, this is risky, since we may now be removing overlapping estimators from residuals. maybe i should track overlapping frame_idx and only clear those?
  - `buffer` gets only used here with `frame_summary`. i'll just make it an attr for `FrameSummary`.
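To make the frame_idx mismatch concrete, here is a minimal sketch of handing segment a deep copy that is stamped with the frame it was taken at. `Assets`, `AssetStore`, `ingest` and `snapshot` are hypothetical names for illustration, not the project's API, and a single `threading.Lock` is an assumption (a multiprocessing setup would need a process-safe equivalent).

```python
import copy
import threading
from dataclasses import dataclass, field


@dataclass
class Assets:
    """Hypothetical shared state; field names mirror the issue text."""
    frame_idx: int = -1
    traces: dict = field(default_factory=dict)    # component id -> list of values
    residual: list = field(default_factory=list)  # one flattened frame per entry


class AssetStore:
    """Guards the live assets with one lock (an assumption for this sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._assets = Assets()

    def ingest(self, frame_idx, trace_values, residual_frame):
        # Ingestion mutates the live assets while holding the lock.
        with self._lock:
            self._assets.frame_idx = frame_idx
            for cid, value in trace_values.items():
                self._assets.traces.setdefault(cid, []).append(value)
            self._assets.residual.append(residual_frame)

    def snapshot(self):
        # Segment gets a private deep copy, stamped with its frame_idx.
        with self._lock:
            return copy.deepcopy(self._assets)


store = AssetStore()
for i in range(100):
    store.ingest(i, {"cell_0": float(i)}, [0.0, 0.0])

snap = store.snapshot()        # segment starts NMF from this copy
for i in range(100, 150):      # ingestion keeps going in the meantime
    store.ingest(i, {"cell_0": float(i)}, [0.0, 0.0])

live = store.snapshot()
lag = live.frame_idx - snap.frame_idx  # frames the new estimators must catch up on
```

Because the copy carries its own `frame_idx`, Catalog and Update can compute how far behind the new estimators are instead of assuming they come from the latest frame.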
Scenario (turn this into a sequence diagram)
sequenceDiagram
participant I as Ingest
participant Asset@{ "type" : "queue" }
participant S as Segment
I <<->> +Asset: Frame 0-99
Asset ->> S: NMF (residual)
I <<->> Asset: Frame 100-149
Asset ->> S: Catalog (footprints, traces)
I <<->> Asset: Frame 150-200
Asset <<->> S: Update (all)
- Residual needs to take async into account when updating (the “new” cells may not be from the immediate previous epoch anymore)
- Residual needs to be cleared every time `segment` is run - otherwise we run the risk of duplicate detection
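One way to limit the clearing, sketched under assumptions: zero the residual only inside the new estimator's footprint, and only for the frames the NMF run actually saw. The dict-of-lists layout and the name `clear_residual` are hypothetical and only keep the example self-contained.

```python
def clear_residual(residual, footprint_mask, frame_range):
    """Zero the residual inside a new footprint, for the given frames only.

    residual: dict mapping frame_idx -> list of pixel values (hypothetical layout)
    footprint_mask: list of bools, True where the new estimator has support
    frame_range: (first, last) frame_idx, inclusive, that the NMF run saw
    """
    first, last = frame_range
    cleared = []
    for frame_idx, pixels in residual.items():
        if first <= frame_idx <= last:
            residual[frame_idx] = [
                0.0 if inside else value
                for value, inside in zip(pixels, footprint_mask)
            ]
            cleared.append(frame_idx)
    return cleared


residual = {
    98: [1.0, 2.0, 3.0],
    99: [4.0, 5.0, 6.0],
    100: [7.0, 8.0, 9.0],  # accrued by ingest after NMF started
}
mask = [True, False, True]
cleared = clear_residual(residual, mask, (0, 99))
```

Frame 100 is left untouched here, so whatever ingest wrote after the NMF slice was taken is not wiped by a stale footprint.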
- Stage 1: we have gathered 100 frames
- Stage 2:
  - segment: detect 3
  - ingest: get 101-150th frames
- Stage 3:
  - segment: catalog needs fp and tr
    - now we gotta grab 0-100th frames for `traces` that were flushed to zarr (for comparison with `new`)
  - ingest: get 151-200th frames
- Stage 4:
  - segment: update fp, tr, stats, overlaps
    - fp can be updated fine
    - tr has a bunch of nan’s. might as well start as nans
      - probably ok since by the time an overlapping cell is detected, the new traces would have accrued the same number of epochs
      - the latest value however has to be a real number since the `trace_ingestion` uses it as a starting value
        - 0?
    - cc: just start with zeros…? if we don’t wanna deal with zarr-loading older traces
    - cy: buffer needs to be not updated the whole segment loop otherwise frames 0-100 are not in buffer anymore!
      - or it keeps accruing and gets cleared after each detect. maybe it should be a class attribute then.
  - ingest: locked out
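The Stage 4 idea for `tr` (nans for the frames missed during segmentation, but a real number at the latest frame) could look roughly like this. `pad_new_trace` and its parameters are hypothetical, and the `0.0` seed is just the tentative "0?" from the notes above, not a settled choice.

```python
import math


def pad_new_trace(segment_trace, segment_last_idx, current_idx, seed=0.0):
    """Extend a trace computed up to segment_last_idx so it reaches current_idx."""
    gap = current_idx - segment_last_idx
    if gap < 0:
        raise ValueError("segment is ahead of ingest, which should not happen")
    if gap == 0:
        return list(segment_trace)
    # Frames ingested while segment was running are unknown, so mark them NaN,
    # except the newest one, which the next trace update uses as its start.
    return list(segment_trace) + [math.nan] * (gap - 1) + [seed]


# New estimator found on frames 0-99; ingest has reached frame 200 by Stage 4.
new_trace = [1.0] * 100
padded = pad_new_trace(new_trace, segment_last_idx=99, current_idx=200)
nan_count = sum(math.isnan(v) for v in padded)
```

Anything that reads the padded span (for example the trace correlation in Catalog) would need to skip the nans explicitly.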
Alternative
... or something else we could do - keep the ingestion loop accruing frame_update results within its own thread, and only update the shared assets from segment. (or the opposite: copy the entire asset set into the segment loop)
- this circumvents possibly having to load zarr in segment in case segment takes a long time and frames get flushed (in catalog and update)
- “catching up” the new estimators to the latest frame still is an issue
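A rough sketch of the first variant, assuming threads: ingest puts per-frame results on a `queue.Queue`, and segment drains them when it is ready to write. `PendingUpdates`, `accrue` and `drain` are made-up names, and a multiprocessing setup would need `multiprocessing.Queue` or similar instead.

```python
import queue


class PendingUpdates:
    """Holds frame_update results until segment applies them (hypothetical)."""

    def __init__(self):
        self._queue = queue.Queue()  # thread-safe FIFO from the stdlib

    def accrue(self, frame_idx, update):
        # Called by the ingestion loop; never touches the shared assets.
        self._queue.put((frame_idx, update))

    def drain(self):
        # Called by segment; returns everything accrued so far, in order.
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


pending = PendingUpdates()
for i in range(100, 150):
    pending.accrue(i, {"cell_0": float(i)})

assets = {"frame_idx": 99, "traces": {"cell_0": [0.0] * 100}}
drained = pending.drain()
for frame_idx, update in drained:
    # Only segment writes to the assets, so no lock is needed around them.
    for cid, value in update.items():
        assets["traces"].setdefault(cid, []).append(value)
    assets["frame_idx"] = frame_idx
```

This keeps a single writer for the assets, but as noted above it does not by itself solve catching the new estimators up to the drained frames.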