Skip to content

The method transfer the KV cache from cpu memory to gpu memory #3

@digbangbang

Description

@digbangbang

Wonderful work! Following Q and looking forward ur reply.

  1. I am curious about the method in your paper that copy the KV cache from cpu memory to gpu memory.
image

Since I have test the following code

model = nn.Linear()

model.to('cuda')

sometimes the model is large, then the time is costly.

  1. Otherwise, does the method in ur paper is time efficient?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions