Open
Labels: bug (Something isn't working)
Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
(ktransformers) ➜ ik_llama.cpp git:(main) kt doctor
KTransformers Environment Diagnostics
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check                ┃ Status ┃ Value                                                                                    ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Python Version       │ OK     │ 3.11.11                                                                                  │
│ CUDA Availability    │ OK     │ 12.6                                                                                     │
│ GPU Detection        │ OK     │ Found 1 GPU: NVIDIA RTX A6000                                                            │
│ CPU                  │ OK     │ INTEL(R) XEON(R) GOLD 6530 (64 cores / 64 threads)                                       │
│ CPU Instruction Set  │ OK     │ AMX-INT8, AMX-BF16, AMX-TILE, AVX512BF16, AVX512F, AVX512BW, AVX512VL, AVX2 (+14 more)   │
│ NUMA Topology        │ OK     │ 4 nodes                                                                                  │
│ System Memory        │ OK     │ 617.3GB available / 628.6GB total                                                        │
│ Disk Space           │ OK     │ /data/llm-models has 2301.6GB free                                                       │
│ SGLang Source        │ OK     │ Source (GitHub: kvcache-ai/sglang, branch: main)                                         │
│ SGLang kt-kernel     │ OK     │ SGLang kt-kernel support verified                                                        │
│ Environment Managers │ OK     │ conda 24.9.2, venv builtin, docker 28.3.0                                                │
└──────────────────────┴────────┴──────────────────────────────────────────────────────────────────────────────────────────┘
Reproduction
kt run /data/llm-models/MiniMaxAI/MiniMax-M2.1 --cpu-threads 64 --numa-nodes 4
Output and error:
[2025-12-27 14:37:30] The available memory for KV cache is 2.00 GB.
[2025-12-27 14:37:30] max_total_tokens=100000 is larger than the profiled value 8469. Use the profiled value instead.
[2025-12-27 14:37:30] KV Cache is allocated. #tokens: 8469, K size: 1.00 GB, V size: 1.00 GB
[2025-12-27 14:37:30] Memory pool end. avail mem=8.14 GB
[2025-12-27 14:37:31] Capture cuda graph begin. This can take up to several minutes. avail mem=7.63 GB
[2025-12-27 14:37:31] Capture cuda graph bs [1, 2, 4, 8, 12, 16]
Capturing batches (bs=16 avail_mem=7.61 GB): 0%| | 0/6 [00:16<?, ?it/s]
[2025-12-27 14:37:48] Scheduler hit an exception: Traceback (most recent call last):
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/managers/scheduler.py", line 2833, in run_scheduler_process
scheduler = Scheduler(
^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/managers/scheduler.py", line 314, in __init__
self.init_model_worker()
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/managers/scheduler.py", line 452, in init_model_worker
self.tp_worker = TpModelWorker(
^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/managers/tp_worker.py", line 253, in __init__
self._model_runner = ModelRunner(
^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/model_runner.py", line 385, in __init__
self.initialize(min_per_gpu_memory)
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/model_runner.py", line 584, in initialize
self.init_device_graphs()
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/model_runner.py", line 2670, in init_device_graphs
self.graph_runner = graph_runners[self.device](self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 358, in __init__
self.capture()
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 513, in capture
_capture_one_stream()
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 500, in _capture_one_stream
) = self.capture_one_batch_size(bs, forward, stream_idx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 719, in capture_one_batch_size
run_once()
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 706, in run_once
logits_output_or_pp_proxy_tensors = forward(
^^^^^^^^
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 795, in forward
hidden_states = self.model(input_ids, positions, forward_batch, input_embeds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 710, in forward
hidden_states, residual = layer(
^^^^^^
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 559, in forward
hidden_states = self.block_sparse_moe(hidden_states, forward_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 200, in forward
return self.forward_normal(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 210, in forward_normal
final_hidden_states = self.experts(hidden_states, topk_output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 908, in forward
return self.forward_impl(hidden_states, topk_output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 927, in forward_impl
combine_input = self.run_moe_core(
^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 948, in run_moe_core
return self.quant_method.apply(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/kt_ep_wrapper.py", line 965, in apply
self.submit(layer, dispatch_output)
File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/kt_ep_wrapper.py", line 895, in submit
self.wrapper.submit_forward(
File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/kt_kernel/experts_base.py", line 277, in submit_forward
self.cpu_infer.submit_with_cuda_stream(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'kt_kernel_ext.CPUInfer' object has no attribute 'submit_with_cuda_stream'
[2025-12-27 14:37:48] Received sigquit from a child process. It usually means the child failed.
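For triage: the crash is raised inside kt_kernel's Python wrapper (experts_base.py) at the point where it calls into the compiled extension, which suggests the installed kt_kernel build predates the submit_with_cuda_stream API that this sglang branch expects. A minimal sketch like the following (module and class names taken from the traceback; the pip distribution name "kt_kernel" and the nested-import fallback are assumptions) can confirm whether the binding is simply missing the method:

# Sketch: check which kt_kernel is installed and whether its CPUInfer binding
# exposes submit_with_cuda_stream (the attribute the traceback reports missing).
# Assumption: the pip distribution is named "kt_kernel".
import importlib.metadata as md

try:
    import kt_kernel_ext  # compiled extension, as named in the AttributeError
except ImportError:
    from kt_kernel import kt_kernel_ext  # fallback if the binding is nested

try:
    print("kt_kernel version:", md.version("kt_kernel"))
except md.PackageNotFoundError:
    print("no pip metadata for kt_kernel (editable/source build?)")

print("has submit_with_cuda_stream:",
      hasattr(kt_kernel_ext.CPUInfer, "submit_with_cuda_stream"))
print("submit* methods:",
      [m for m in dir(kt_kernel_ext.CPUInfer) if m.startswith("submit")])

If the method is indeed absent, rebuilding or upgrading kt-kernel to the revision that matches the sglang checkout is the likely fix.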
Others
No response