-
Notifications
You must be signed in to change notification settings - Fork 165
Description
For certain images only (as far as I observed if mixed with handwritten and digital letters in the same image, but not on all...), I get thrown an error for illegal memory access. Sometimes, for the same image I get thrown RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx
The server crashes for every request on the same image, but works fine for other images.
When I send the same image on a local HF version of the model, the image is processed
I am using an OpenAI call to the deployed model on vllm.
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) Process EngineCore_DP0:
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) Traceback (most recent call last):
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self.run()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._target(*self._args, **self._kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise e
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 835, in run_engine_core
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) engine_core.run_busy_loop()
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] AsyncLLM output_handler failed.
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] Traceback (most recent call last):
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 477, in output_handler
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] outputs = await engine_core.get_output_async()
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 883, in get_output_async
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] raise self._format_exception(outputs) from None
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 862, in run_busy_loop
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._process_engine_step()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 891, in _process_engine_step
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) outputs, model_executed = self.step_fn()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 342, in step
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) model_output = future.result()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.__get_result()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise self._exception
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) result = run_method(self.driver_worker, method, args, kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 479, in run_method
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.worker.execute_model(scheduler_output, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) output = self.model_runner.execute_model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2799, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) model_output = self._model_forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2621, in _model_forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.runnable(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_ocr.py", line 578, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) hidden_states = self.language_model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1402, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) hidden_states = self.model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 399, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return TorchCompileWithNoGuardsWrapper.call(self, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 152, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1242, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) def forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/dynamo/eval_frame.py", line 1044, in fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fn(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 53, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.optimized_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.wrapped_call(self, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise e
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc]
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "<eval_with_key>.26", line 73, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) submod_4 = self.submod_4(getitem_8, s59, l_self_modules_layers_modules_1_modules_self_attn_modules_o_proj_parameters_weight, l_self_modules_layers_modules_1_modules_post_attention_layernorm_parameters_weight, getitem_9, l_self_modules_layers_modules_2_modules_input_layernorm_parameters_weight, l_self_modules_layers_modules_2_modules_self_attn_modules_qkv_proj_parameters_weight, l_positions, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache); getitem_8 = l_self_modules_layers_modules_1_modules_self_attn_modules_o_proj_parameters_weight = l_self_modules_layers_modules_1_modules_post_attention_layernorm_parameters_weight = getitem_9 = l_self_modules_layers_modules_2_modules_input_layernorm_parameters_weight = l_self_modules_layers_modules_2_modules_self_attn_modules_qkv_proj_parameters_weight = None
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.runnable(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 99, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.compiled_graph_for_general_shape(*args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._compiled_fn(*args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fn(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return compiled_fn(full_args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) all_outs = call_func_at_runtime_with_args(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) out = normalize_as_list(f(args))
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) outs = compiled_fn(args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return compiled_fn(runtime_args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.current_callable(inputs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2962, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) out = model(new_inputs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/tmp/torchinductor_root/z5/cz5za5j2kxjgmasjvuycovhqpzczqjhi6f4pko5oevs3w3tobfvl.py", line 740, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) buf4 = torch.ops.vllm.moe_forward_shared.default(buf2, buf3, 'language_model.model.layers.1.mlp.experts')
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._op(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1921, in moe_forward_shared
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward_impl(hidden_states, router_logits)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1757, in forward_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) final_hidden_states = self.quant_method.apply(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 295, in apply
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 46, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._forward_method(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 401, in forward_cuda
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) result = fused_experts(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1641, in fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return dispatch_fused_experts_func(inplace)(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1564, in torch_vllm_outplace_fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return torch.ops.vllm.outplace_fused_experts(**kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._op(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1488, in outplace_fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fused_experts_impl(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1896, in fused_experts_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) invoke_fused_moe_kernel(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 693, in invoke_fused_moe_kernel
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) fused_moe_kernel[grid](
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 419, in
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 756, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) launch_metadata = kernel.launch_metadata(grid, stream, *bound_args.values())
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 490, in launch_metadata
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._init_handles()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 473, in _init_handles
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self.module, self.function, self.n_regs, self.n_spills, self.n_max_threads = driver.active.utils.load_binary(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) Process EngineCore_DP0:
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) Traceback (most recent call last):
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self.run()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._target(*self._args, **self._kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise e
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 835, in run_engine_core
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) engine_core.run_busy_loop()
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] AsyncLLM output_handler failed.
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] Traceback (most recent call last):
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 477, in output_handler
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] outputs = await engine_core.get_output_async()
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 883, in get_output_async
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] raise self._format_exception(outputs) from None
Dec 09 13:35:40 host deepseek-ocr[206214]: (APIServer pid=1) ERROR 12-09 04:35:40 [async_llm.py:525] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 862, in run_busy_loop
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._process_engine_step()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 891, in _process_engine_step
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) outputs, model_executed = self.step_fn()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 342, in step
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) model_output = future.result()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.__get_result()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise self._exception
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/uniproc_executor.py", line 79, in collective_rpc
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) result = run_method(self.driver_worker, method, args, kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/serial_utils.py", line 479, in run_method
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 367, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.worker.execute_model(scheduler_output, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 563, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) output = self.model_runner.execute_model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return func(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2799, in execute_model
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) model_output = self._model_forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2621, in _model_forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.runnable(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_ocr.py", line 578, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) hidden_states = self.language_model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1402, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) hidden_states = self.model(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 399, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return TorchCompileWithNoGuardsWrapper.call(self, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 152, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v2.py", line 1242, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) def forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/dynamo/eval_frame.py", line 1044, in fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fn(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 53, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.optimized_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.wrapped_call(self, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) raise e
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc]
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in wrapped_call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.call_impl(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in call_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return forward_call(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "<eval_with_key>.26", line 73, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) submod_4 = self.submod_4(getitem_8, s59, l_self_modules_layers_modules_1_modules_self_attn_modules_o_proj_parameters_weight, l_self_modules_layers_modules_1_modules_post_attention_layernorm_parameters_weight, getitem_9, l_self_modules_layers_modules_2_modules_input_layernorm_parameters_weight, l_self_modules_layers_modules_2_modules_self_attn_modules_qkv_proj_parameters_weight, l_positions, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache); getitem_8 = l_self_modules_layers_modules_1_modules_self_attn_modules_o_proj_parameters_weight = l_self_modules_layers_modules_1_modules_post_attention_layernorm_parameters_weight = getitem_9 = l_self_modules_layers_modules_2_modules_input_layernorm_parameters_weight = l_self_modules_layers_modules_2_modules_self_attn_modules_qkv_proj_parameters_weight = None
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.runnable(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 99, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.compiled_graph_for_general_shape(*args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 63, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._compiled_fn(*args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fn(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1130, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return compiled_fn(full_args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 353, in runtime_wrapper
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) all_outs = call_func_at_runtime_with_args(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 129, in call_func_at_runtime_with_args
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) out = normalize_as_list(f(args))
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 724, in inner_fn
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) outs = compiled_fn(args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 526, in wrapper
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return compiled_fn(runtime_args)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 613, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.current_callable(inputs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 2962, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) out = model(new_inputs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/tmp/torchinductor_root/z5/cz5za5j2kxjgmasjvuycovhqpzczqjhi6f4pko5oevs3w3tobfvl.py", line 740, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) buf4 = torch.ops.vllm.moe_forward_shared.default(buf2, buf3, 'language_model.model.layers.1.mlp.experts')
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 841, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._op(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1921, in moe_forward_shared
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward_impl(hidden_states, router_logits)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1757, in forward_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) final_hidden_states = self.quant_method.apply(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 295, in apply
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self.forward(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/custom_op.py", line 46, in forward
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._forward_method(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py", line 401, in forward_cuda
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) result = fused_experts(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1641, in fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return dispatch_fused_experts_func(inplace)(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1564, in torch_vllm_outplace_fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return torch.ops.vllm.outplace_fused_experts(**kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in call
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return self._op(*args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1488, in outplace_fused_experts
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return fused_experts_impl(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1896, in fused_experts_impl
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) invoke_fused_moe_kernel(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 693, in invoke_fused_moe_kernel
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) fused_moe_kernel[grid](
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 419, in
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 756, in run
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) launch_metadata = kernel.launch_metadata(grid, stream, *bound_args.values())
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 490, in launch_metadata
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self._init_handles()
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 473, in _init_handles
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) self.module, self.function, self.n_regs, self.n_spills, self.n_max_threads = driver.active.utils.load_binary(
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 09 13:35:40 host deepseek-ocr[206214]: (EngineCore_DP0 pid=86) RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered