Implementation of Overlapped Streaming Inference #25

@praisechan

Description

Hello,

Thanks for the excellent paper and code.

I noticed the current repository covers the kernel optimizations (Sec 3–5) but seems to lack the Overlapped Streaming Inference implementation described in Section 6.1. Specifically, I am looking for the logic that manages concurrent VLM and Action Expert execution via separate CUDA streams.

Do you plan to release the code for this concurrent execution mode? It would be very helpful to see how the multi-stream synchronization is handled to reproduce the real-time throughput results.
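For reference, this is roughly the pattern I am imagining, sketched with PyTorch streams and events. To be clear, `vlm`, `action_expert`, and `OverlappedRunner` are hypothetical stand-ins of mine, not names from this repository, and the sketch omits details such as input-transfer synchronization with the default stream:

```python
import torch
import torch.nn as nn


class OverlappedRunner:
    """Pipelines two modules on separate CUDA streams: while the action
    expert decodes actions from the features of frame t, the VLM already
    encodes frame t+1. Falls back to sequential execution without CUDA.
    (Hypothetical sketch, not code from this repo.)"""

    def __init__(self, vlm: nn.Module, action_expert: nn.Module):
        self.vlm = vlm
        self.action_expert = action_expert
        self.use_cuda = torch.cuda.is_available()
        if self.use_cuda:
            self.vlm_stream = torch.cuda.Stream()
            self.ae_stream = torch.cuda.Stream()
            self.vlm_done = torch.cuda.Event()
        self.cached_features = None  # VLM features of the previous frame

    @torch.no_grad()
    def step(self, frame: torch.Tensor):
        if not self.use_cuda:
            # Sequential fallback: encode, then decode last frame's features.
            actions = (self.action_expert(self.cached_features)
                       if self.cached_features is not None else None)
            self.cached_features = self.vlm(frame)
            return actions

        # Launch the VLM for the *current* frame on its own stream.
        with torch.cuda.stream(self.vlm_stream):
            feats = self.vlm(frame)
            self.vlm_done.record(self.vlm_stream)

        # Concurrently decode actions from the *previous* frame's features.
        actions = None
        if self.cached_features is not None:
            with torch.cuda.stream(self.ae_stream):
                actions = self.action_expert(self.cached_features)

        # The next step's action expert reads `feats`, so make its stream
        # wait on the VLM-completion event before it can consume them.
        self.ae_stream.wait_event(self.vlm_done)
        self.cached_features = feats
        return actions
```

On the first call the runner returns `None` (no cached features yet); from the second call on, each `step` returns actions for the previous frame while the new frame is being encoded. I would mainly like to see how the official implementation handles the cross-stream handoff and whether it matches the real-time throughput numbers in the paper.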

Thank you for your work.
