Description
I installed the environment from environment.yml and wanted to generate the evaluation results. In the S-Terminal I run bash /app/distserve/distserve/evaluation/ae-scripts/kick-the-tires/distllm-server.sh, and in the C-Terminal I run bash /app/distserve/distserve/evaluation/ae-scripts/kick-the-tires/distllm-client.sh. I tried OPT-125M, OPT-1.3B, and OPT-6.7B, but with every model the server fails with the error shown after the reproduction steps below.
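For completeness, here is roughly how I set things up and run the two scripts. The environment-creation line is just the standard conda invocation for an environment.yml (I am assuming the file sits at the repo root; the env name distserve is taken from the traceback below); the script paths are exactly as in the repository.

```bash
# Create and activate the conda environment
# (assumed: environment.yml at the repo root, env name "distserve" as seen in the traceback)
conda env create -f environment.yml
conda activate distserve

# S-Terminal: start the DistServe server
bash /app/distserve/distserve/evaluation/ae-scripts/kick-the-tires/distllm-server.sh

# C-Terminal: run the evaluation client
bash /app/distserve/distserve/evaluation/ae-scripts/kick-the-tires/distllm-client.sh
```

The S-Terminal output (this run was with OPT-6.7B):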
INFO 22:10:15 (context) 0 waiting, 7 finished but unaccepted, 0 blocks occupied by on-the-fly requests
INFO 22:10:15 (decoding) CPU blocks: 0 / 2048 (0.00%) used, (0 swapping in)
INFO 22:10:15 (decoding) GPU blocks: 270 / 3755 (7.19%) used, (0 swapping out)
INFO 22:10:15 (decoding) 7 unaccepted, 0 waiting, 3 processing
(ParaWorker pid=14304) [ERROR] CUDA error /DistServe/SwiftTransformer/src/csrc/model/gpt/gpt.cc:294 'cudaMemcpy(ith_context_req_token_index.ptr, ith_context_req_token_index_cpu, sizeof(int32_t) * (batch_size+1), cudaMemcpyHostToDevice)': (719) unspecified launch failure
(ParaWorker pid=14304) INFO 21:47:14 (worker decoding.#0) model facebook/opt-6.7b loaded
(ParaWorker pid=14304) INFO 21:47:14 runtime peak memory: 12.844 GB
(ParaWorker pid=14304) INFO 21:47:14 total GPU memory: 44.403 GB
(ParaWorker pid=14304) INFO 21:47:14 kv cache size for one token: 0.50000 MB
(ParaWorker pid=14304) INFO 21:47:14 num_gpu_blocks: 3755
(ParaWorker pid=14304) INFO 21:47:14 num_cpu_blocks: 2048
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffe65f627fffa81faeb8ef03d901000000 Worker ID: 8bb0dd0ebc21a6586b9a66fd01f48a05fd4f19680185a80d75c0ed4a Node ID: 5c51d1203d5390054ce3a6973294116a342892b6cf2bb763c8dfd950 Worker IP address: 192.168.0.44 Worker port: 46581 Worker PID: 14304 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
Traceback (most recent call last):
File "/DistServe/distserve/api_server/distserve_api_server.py", line 174, in start_event_loop_wrapper
await task
File "/DistServe/distserve/llm.py", line 167, in start_event_loop
await self.engine.start_all_event_loops()
File "/DistServe/distserve/engine.py", line 253, in start_all_event_loops
await asyncio.gather(
File "/DistServe/distserve/single_stage_engine.py", line 672, in start_event_loop
await asyncio.gather(event_loop1(), event_loop2(), event_loop3())
File "/DistServe/distserve/single_stage_engine.py", line 663, in event_loop2
await self._step()
File "/DistServe/distserve/single_stage_engine.py", line 609, in _step
generated_tokens_ids = await self.batches_ret_futures[0]
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class_name: ParaWorker
actor_id: e65f627fffa81faeb8ef03d901000000
pid: 14304
namespace: d56e3e54-6ed7-4dc2-9f7f-10a347d1960d
ip: 192.168.0.44
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
^CException ignored in atexit callback: <function _exit_function at 0x7a4ab42ea050>
Traceback (most recent call last):
File "/miniconda3/envs/distserve/lib/python3.10/multiprocessing/util.py", line 357, in _exit_function
p.join()
File "/miniconda3/envs/distserve/lib/python3.10/multiprocessing/process.py", line 149, in join
res = self._popen.wait(timeout)
File "/miniconda3/envs/distserve/lib/python3.10/multiprocessing/popen_fork.py", line 43, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/miniconda3/envs/distserve/lib/python3.10/multiprocessing/popen_fork.py", line 27, in poll
pid, sts = os.waitpid(self.pid, flag)
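As a sanity check, the reported kv cache size of 0.5 MB per token looks consistent with OPT-6.7B (assuming fp16: 2 × 32 layers × 4096 hidden dim × 2 bytes = 512 KB), so the model itself appears to load and be configured correctly before the cudaMemcpy failure.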
I also tried to run it on another machine, but I could not get DistServe to run there either; I am attaching the error message.
Can someone help me figure out how to run OPT models with DistServe? I was using the ShareGPT dataset.
