How can I actually put it into use? #2

@dingjingzhen-bot

Great job!
However, the biggest problem with this type of work is integrating it efficiently into existing inference engines such as SGLang or vLLM. The additional overhead it introduces and the damage to existing features (paged attention, prefix cache) are hard to avoid, which makes such methods of little practical use. How did you address this?
