Implementation of Overlapped Streaming Inference #25

@praisechan

Description

Hello,

Thanks for the excellent paper and code.

I noticed the current repository covers the kernel optimizations (Sec 3–5) but seems to lack the Overlapped Streaming Inference implementation described in Section 6.1. Specifically, I am looking for the logic that manages concurrent VLM and Action Expert execution via separate CUDA streams.

Do you plan to release the code for this concurrent execution mode? It would be very helpful to see how the multi-stream synchronization is handled to reproduce the real-time throughput results.
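For reference, this is roughly the pattern I am imagining, sketched with PyTorch streams and events. To be clear, `vlm`, `action_expert`, and `OverlappedRunner` are hypothetical stand-ins of mine, not names from this repository, and the sketch omits details such as input-transfer synchronization with the default stream:

```python
import torch
import torch.nn as nn


class OverlappedRunner:
    """Pipelines two modules on separate CUDA streams: while the action
    expert decodes actions from the features of frame t, the VLM already
    encodes frame t+1. Falls back to sequential execution without CUDA.
    (Hypothetical sketch, not code from this repo.)"""

    def __init__(self, vlm: nn.Module, action_expert: nn.Module):
        self.vlm = vlm
        self.action_expert = action_expert
        self.use_cuda = torch.cuda.is_available()
        if self.use_cuda:
            self.vlm_stream = torch.cuda.Stream()
            self.ae_stream = torch.cuda.Stream()
            self.vlm_done = torch.cuda.Event()
        self.cached_features = None  # VLM features of the previous frame

    @torch.no_grad()
    def step(self, frame: torch.Tensor):
        if not self.use_cuda:
            # Sequential fallback: encode, then decode last frame's features.
            actions = (self.action_expert(self.cached_features)
                       if self.cached_features is not None else None)
            self.cached_features = self.vlm(frame)
            return actions

        # Launch the VLM for the *current* frame on its own stream.
        with torch.cuda.stream(self.vlm_stream):
            feats = self.vlm(frame)
            self.vlm_done.record(self.vlm_stream)

        # Concurrently decode actions from the *previous* frame's features.
        actions = None
        if self.cached_features is not None:
            with torch.cuda.stream(self.ae_stream):
                actions = self.action_expert(self.cached_features)

        # The next step's action expert reads `feats`, so make its stream
        # wait on the VLM-completion event before it can consume them.
        self.ae_stream.wait_event(self.vlm_done)
        self.cached_features = feats
        return actions
```

On the first call the runner returns `None` (no cached features yet); from the second call on, each `step` returns actions for the previous frame while the new frame is being encoded. I would mainly like to see how the official implementation handles the cross-stream handoff and whether it matches the real-time throughput numbers in the paper.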

Thank you for your work.
