Failed to run demo/inference.py on multiple GPUs with RuntimeError: Expected all tensors to be on the same device #7

@shiqi-dai

Description

I successfully ran demo/inference.py on the CPU, but it responds slowly. Due to limited memory on a single 3090 GPU, I attempted to run it on two GPUs. However, I encounter an error in Chat.answer(): "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!". Screenshot of the error:

[Screenshot 2024-06-15 at 00 20 40]

I also printed the device map of the model:

[image]

I am unsure why this error occurs. I have spent all day trying to fix it. Any insights or solutions would be greatly appreciated.
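A common cause of this error when a model is sharded across GPUs (e.g. via a `device_map` in Accelerate/Transformers) is that the input tensors are moved to a fixed device like `"cuda"` (i.e. `cuda:0` generally) instead of the device that actually holds the model's first layers. A minimal sketch of the idea, assuming an `hf_device_map`-style dict like the one printed above (the dict shape and `first_device` helper are hypothetical, for illustration only):

```python
# Hypothetical sketch: pick the device of the earliest-loaded module
# from an hf_device_map-style dict, and move inputs there.
def first_device(device_map):
    """Return the device holding the first module in the map.

    hf_device_map-style dicts preserve insertion order, and the first
    entry is typically the embedding layer, which receives the inputs.
    """
    first_module = next(iter(device_map))
    dev = device_map[first_module]
    # Entries are either GPU indices (ints) or strings like "cpu"/"disk".
    return f"cuda:{dev}" if isinstance(dev, int) else str(dev)


# Example map resembling a two-GPU shard (module names are illustrative):
device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.20": 1,
    "lm_head": 1,
}
print(first_device(device_map))  # → cuda:0
```

With a real Transformers model loaded with `device_map="auto"`, the same idea would be to send the tokenized inputs to `first_device(model.hf_device_map)` (rather than a hard-coded `.to("cuda")` or `.to("cuda:1")`) before calling generation; Accelerate's hooks then move activations between shards automatically. If `Chat.answer()` hard-codes a device internally, that line would need the analogous change.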
