The method transfer the KV cache from cpu memory to gpu memory

Wonderful work! Following Q and looking forward ur reply.

1) I am curious about the method in your paper that copy the KV cache from cpu memory to gpu memory.

<img width="1063" alt="image" src="https://github.com/user-attachments/assets/0d8504f1-9706-41d4-af74-732349acd8ce">

Since I have test the following code

`model = nn.Linear()`

`model.to('cuda')`

sometimes the model is large, then the time is costly.

2) Otherwise, does the method in ur paper is time efficient?


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The method transfer the KV cache from cpu memory to gpu memory #3

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

The method transfer the KV cache from cpu memory to gpu memory #3

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions