How do we make our perception stack conscious of active and inactive agents? Some proposals so far: Optical Flow (YOLOACT) Video-Level detection