Async Prep #155

@raymondwjang

Description

Even with multiprocessing working, one critical operation remains before the segment and ingestion loops can safely run in parallel: catching the new estimators up to the current frame (that is, to the frames that accrued between the start of NMF slicing and the asset update).

Potential points of race condition / frame_idx mismatch:

  1. NMF: uses residual. Since we use a single deepcopy for the whole step, this shouldn't cause much trouble?
  2. Catalog: uses footprints and traces. Traces in particular get updated live, which can flush out the frame_idx range that the new estimators from NMF were computed on. So the quality test (trace correlation) needs to be handled (a) in a thread-safe manner and (b) possibly by reading from zarr.
  3. Update: uses all assets.
    • new_footprints and overlaps just get appended, so no biggie.
    • new_traces will have ??? values for the frames that accrued in the ingestion loop without the new estimators. Same with sufficient_statistics. Some of these values cannot be NaN, since they are used to predict the next frame_idx values.
    • residuals should get an update here, where the new estimator areas are cleared to zero. This is risky, though, since we may now be removing overlapping estimators from residuals. Maybe I should track the overlapping frame_idx and only clear those?
    • buffer only gets used here, with frame_summary. I'll just make it an attr of FrameSummary.
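
To make point 2 concrete, here is a minimal sketch of a thread-safe trace snapshot. Everything in it is hypothetical (`TraceStore`, `flush_before`, `snapshot` are illustration names, not existing classes or methods), and the in-memory dict only stands in for the real traces asset. The idea is that segment copies the frame range it needs under a lock and gets back the indices that have already been flushed, which are the ones it would have to read from zarr.

```python
import threading


class TraceStore:
    """Hypothetical in-memory trace store shared by ingest and segment."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frames = {}  # frame_idx -> {component_id: trace value}

    def append(self, frame_idx, values):
        """Called by ingest for every new frame."""
        with self._lock:
            self._frames[frame_idx] = dict(values)

    def flush_before(self, frame_idx):
        """Drop frames older than frame_idx (stand-in for a flush to zarr)."""
        with self._lock:
            for idx in [i for i in self._frames if i < frame_idx]:
                del self._frames[idx]

    def snapshot(self, start, stop):
        """Copy frames in [start, stop) and report which indices are gone."""
        with self._lock:
            found = {
                i: dict(self._frames[i])
                for i in range(start, stop)
                if i in self._frames
            }
        missing = [i for i in range(start, stop) if i not in found]
        return found, missing
```

The quality test would then run on `found` and only touch zarr for `missing`, without holding the lock while it computes correlations.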

Scenario (turn this into a sequence diagram)

sequenceDiagram

participant I as Ingest
participant Asset@{ "type" : "queue" }
participant S as Segment
I <<->> +Asset: Frame 0-99
Asset ->> S: NMF (residual)
I <<->> Asset: Frame 100-149
Asset ->> S: Catalog (footprints, traces)
I <<->> Asset: Frame 150-200
Asset <<->> S: Update (all)
  1. The residual update needs to take async into account (the “new” cells may no longer be from the immediately preceding epoch).
  2. The residual needs to be cleared every time segment runs; otherwise we risk duplicate detection.
  • Stage 1: we have gathered 100 frames
  • Stage 2:
    • segment: detect 3
    • ingest: get 101-150th frames
  • Stage 3:
    • segment: catalog needs fp and tr
      • now we gotta grab 0-100th frames for traces that were flushed to zarr (for comparison with new)
    • ingest: get 151-200th frames
  • Stage 4:
    • segment: update fp, tr, stats, overlaps
      • fp can be updated fine
      • tr has a bunch of NaNs. might as well start as NaNs
        • probably ok, since by the time an overlapping cell is detected, the new traces will have accrued the same number of epochs
        • the latest value, however, has to be a real number, since trace_ingestion uses it as a starting value
          • 0?
      • cc: just start with zeros…? if we don’t wanna deal with zarr-loading older traces
      • cy: the buffer must not be updated during the whole segment loop, otherwise frames 0-100 are no longer in the buffer!
        • or it keeps accruing and gets cleared after each detect. maybe it should be a class attribute then.
    • ingest: locked out
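
A rough sketch of the Stage 4 trace handling described above, assuming plain Python lists in place of the real trace arrays. `backfill_trace` and its `seed` parameter are made-up names for illustration, and the seed of 0.0 is just the open "0?" guess from the list, not a settled choice.

```python
import math


def backfill_trace(detected, current_frame_count, seed=0.0):
    """Extend a new component's trace up to the current frame count.

    detected: values segment computed for frames 0..len(detected)-1.
    Frames that accrued during segment are filled with NaN, except the
    latest one, which gets a real number so ingestion can start from it.
    """
    gap = current_frame_count - len(detected)
    if gap < 0:
        raise ValueError("trace is longer than the current frame count")
    if gap == 0:
        return list(detected)
    return list(detected) + [math.nan] * (gap - 1) + [seed]
```

For example, a trace detected on 2 frames and caught up to frame count 5 ends as two real values, two NaNs, and the seed.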

Alternative

... or something else we could do: keep the ingestion loop accruing frame_update results within its own thread, and only update the assets from segment. (Or the opposite: copy the entire asset set into the segment loop.)

  • this avoids possibly having to load zarr in segment (in catalog and update) when segment takes a long time and frames get flushed
  • “catching up” the new estimators to the latest frame is still an issue
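
One way the first variant could look, as a sketch only: ingest records its per-frame results in a thread-safe queue, and the segment side drains and applies them in one place. `PendingUpdates` and `apply_pending` are hypothetical names, and the `assets` dict is a stand-in for the real asset objects.

```python
import queue


class PendingUpdates:
    """Hypothetical holder for frame_update results accrued by ingest."""

    def __init__(self):
        self._q = queue.Queue()  # thread-safe FIFO from the stdlib

    def record(self, frame_idx, update):
        """Called from the ingest thread; never touches shared assets."""
        self._q.put((frame_idx, update))

    def drain(self):
        """Called from segment; returns everything recorded so far, in order."""
        items = []
        while True:
            try:
                items.append(self._q.get_nowait())
            except queue.Empty:
                return items


def apply_pending(assets, pending):
    """Apply accrued updates to the assets (a plain dict in this sketch)."""
    for frame_idx, update in pending.drain():
        assets[frame_idx] = update
    return assets
```

With this shape only segment ever writes to the assets, so the lock-out in Stage 4 goes away, but the new estimators still have no values for the frames that were recorded while segment was running.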
