Skip to content

DeepSeek OCR Triton Error [CUDA] an illegal memory access on vLLM 0.11.2 #150

@bojanlazarevski1

Description

@bojanlazarevski1

For certain images only (as far as I observed if mixed with handwritten and digital letters in the same image, but not on all...), I get thrown an error for illegal memory access. Sometimes, for the same image I get thrown RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx

The server crashes for every request on the same image, but works fine for other images.
When I send the same image on a local HF version of the model, the image is processed

I am using an OpenAI call to the deployed model on vllm.

Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) Process EngineCore_DP0:
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) Traceback (most recent call last):
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self.run()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._target(*self._args, **self._kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise e
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 835, in run_engine_core
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) engine_core.run_busy_loop()
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] AsyncLLM output_handler failed.
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] Traceback (most recent call last):
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 477, in output_handler
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] outputs = await engine_core.get_output_async()
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 883, in get_output_async
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] raise self._format_exception(outputs) from None
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 862, in run_busy_loop
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._process_engine_step()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 891, in _process_engine_step
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) outputs, model_executed = self.step_fn()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 342, in step
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) model_output = future.result()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.__get_result()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise self._exception
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) result = run_method(self.driver_worker, method, args, kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 479, in run_method
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.worker.execute_model(scheduler_output, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) output = self.model_runner.execute_model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2799, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) model_output = self._model_forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2621, in _model_forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.runnable(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_ocr.py", line 578, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) hidden_states = self.language_model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1402, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) hidden_states = self.model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 399, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return TorchCompileWithNoGuardsWrapper.call(self, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 152, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1242, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) def forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/dynamo/eval_frame.py", line 1044, in fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fn(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 53, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.optimized_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.wrapped_call(self, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise e
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc]
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "<eval_with_key>.26", line 73, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) submod_4 = self.submod_4(getitem_8, s59, l_self_modules_layers_modules_1_modules_self_attn_modules_o_proj_parameters_weight
, l_self_modules_layers_modules_1_modules_post_attention_layernorm_parameters_weight
, getitem_9, l_self_modules_layers_modules_2_modules_input_layernorm_parameters_weight
, l_self_modules_layers_modules_2_modules_self_attn_modules_qkv_proj_parameters_weight
, l_positions
, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache
); getitem_8 = l_self_modules_layers_modules_1_modules_self_attn_modules_o_proj_parameters_weight
= l_self_modules_layers_modules_1_modules_post_attention_layernorm_parameters_weight
= getitem_9 = l_self_modules_layers_modules_2_modules_input_layernorm_parameters_weight
= l_self_modules_layers_modules_2_modules_self_attn_modules_qkv_proj_parameters_weight
= None
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.runnable(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 99, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.compiled_graph_for_general_shape(*args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._compiled_fn(*args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fn(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return compiled_fn(full_args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) all_outs = call_func_at_runtime_with_args(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) out = normalize_as_list(f(args))
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) outs = compiled_fn(args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return compiled_fn(runtime_args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.current_callable(inputs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2962, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) out = model(new_inputs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/tmp/torchinductor_root/z5/cz5za5j2kxjgmasjvuycovhqpzczqjhi6f4pko5oevs3w3tobfvl.py", line 740, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) buf4 = torch.ops.vllm.moe_forward_shared.default(buf2, buf3, 'language_model.model.layers.1.mlp.experts')
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._op(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1921, in moe_forward_shared
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward_impl(hidden_states, router_logits)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1757, in forward_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) final_hidden_states = self.quant_method.apply(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 295, in apply
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 46, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._forward_method(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 401, in forward_cuda
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) result = fused_experts(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1641, in fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return dispatch_fused_experts_func(inplace)(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1564, in torch_vllm_outplace_fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return torch.ops.vllm.outplace_fused_experts(**kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._op(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1488, in outplace_fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fused_experts_impl(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1896, in fused_experts_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) invoke_fused_moe_kernel(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 693, in invoke_fused_moe_kernel
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) fused_moe_kernel[grid](
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 419, in
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 756, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) launch_metadata = kernel.launch_metadata(grid, stream, *bound_args.values())
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 490, in launch_metadata
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._init_handles()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 473, in _init_handles
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self.module, self.function, self.n_regs, self.n_spills, self.n_max_threads = driver.active.utils.load_binary(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered

Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) Process EngineCore_DP0:
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) Traceback (most recent call last):
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self.run()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._target(*self._args, **self._kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise e
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 835, in run_engine_core
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) engine_core.run_busy_loop()
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] AsyncLLM output_handler failed.
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] Traceback (most recent call last):
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 477, in output_handler
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] outputs = await engine_core.get_output_async()
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 883, in get_output_async
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] raise self._format_exception(outputs) from None
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 862, in run_busy_loop
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._process_engine_step()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 891, in _process_engine_step
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) outputs, model_executed = self.step_fn()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 342, in step
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) model_output = future.result()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.__get_result()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise self._exception
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) result = run_method(self.driver_worker, method, args, kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 479, in run_method
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.worker.execute_model(scheduler_output, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) output = self.model_runner.execute_model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2799, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) model_output = self._model_forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2621, in _model_forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.runnable(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_ocr.py", line 578, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) hidden_states = self.language_model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1402, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) hidden_states = self.model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 399, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return TorchCompileWithNoGuardsWrapper.call(self, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 152, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1242, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) def forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/dynamo/eval_frame.py", line 1044, in fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fn(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 53, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.optimized_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.wrapped_call(self, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise e
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc]
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "<eval_with_key>.26", line 73, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) submod_4 = self.submod_4(getitem_8, s59, l_self_modules_layers_modules_1_modules_self_attn_modules_o_proj_parameters_weight
, l_self_modules_layers_modules_1_modules_post_attention_layernorm_parameters_weight
, getitem_9, l_self_modules_layers_modules_2_modules_input_layernorm_parameters_weight
, l_self_modules_layers_modules_2_modules_self_attn_modules_qkv_proj_parameters_weight
, l_positions
, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache
); getitem_8 = l_self_modules_layers_modules_1_modules_self_attn_modules_o_proj_parameters_weight
= l_self_modules_layers_modules_1_modules_post_attention_layernorm_parameters_weight
= getitem_9 = l_self_modules_layers_modules_2_modules_input_layernorm_parameters_weight
= l_self_modules_layers_modules_2_modules_self_attn_modules_qkv_proj_parameters_weight
= None
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.runnable(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 99, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.compiled_graph_for_general_shape(*args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._compiled_fn(*args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fn(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return compiled_fn(full_args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) all_outs = call_func_at_runtime_with_args(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) out = normalize_as_list(f(args))
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) outs = compiled_fn(args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return compiled_fn(runtime_args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.current_callable(inputs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2962, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) out = model(new_inputs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/tmp/torchinductor_root/z5/cz5za5j2kxjgmasjvuycovhqpzczqjhi6f4pko5oevs3w3tobfvl.py", line 740, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) buf4 = torch.ops.vllm.moe_forward_shared.default(buf2, buf3, 'language_model.model.layers.1.mlp.experts')
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._op(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1921, in moe_forward_shared
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward_impl(hidden_states, router_logits)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1757, in forward_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) final_hidden_states = self.quant_method.apply(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 295, in apply
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 46, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._forward_method(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 401, in forward_cuda
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) result = fused_experts(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1641, in fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return dispatch_fused_experts_func(inplace)(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1564, in torch_vllm_outplace_fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return torch.ops.vllm.outplace_fused_experts(**kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._op(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1488, in outplace_fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fused_experts_impl(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1896, in fused_experts_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) invoke_fused_moe_kernel(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 693, in invoke_fused_moe_kernel
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) fused_moe_kernel[grid](
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 419, in
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 756, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) launch_metadata = kernel.launch_metadata(grid, stream, *bound_args.values())
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 490, in launch_metadata
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._init_handles()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 473, in _init_handles
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self.module, self.function, self.n_regs, self.n_spills, self.n_max_threads = driver.active.utils.load_binary(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions