AttributeError: 'kt_kernel_ext.CPUInfer' object has no attribute 'submit_with_cuda_stream' #1760

Description

@sunnsi

Reminder

  • I have read the above rules and searched the existing issues.

System Info

(ktransformers) ➜  ik_llama.cpp git:(main) kt doctor                                                                     

KTransformers Environment Diagnostics

┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Check                ┃ Status   ┃ Value                                                                                  ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Python Version       │ OK       │ 3.11.11                                                                                │
│ CUDA Availability    │ OK       │ 12.6                                                                                   │
│ GPU Detection        │ OK       │ Found 1 GPU: NVIDIA RTX A6000                                                          │
│ CPU                  │ OK       │ INTEL(R) XEON(R) GOLD 6530 (64 cores / 64 threads)                                     │
│ CPU Instruction Sets │ OK       │ AMX-INT8, AMX-BF16, AMX-TILE, AVX512BF16, AVX512F, AVX512BW, AVX512VL, AVX2 (+14 more) │
│ NUMA Topology        │ OK       │ 4 nodes                                                                                │
│ System Memory        │ OK       │ 617.3GB available / 628.6GB total                                                      │
│ Disk Space           │ OK       │ 2301.6GB free on /data/llm-models                                                      │
│ SGLang Source        │ OK       │ Source (GitHub: kvcache-ai/sglang, branch: main)                                       │
│ SGLang kt-kernel     │ OK       │ SGLang kt-kernel support verified                                                      │
│ Environment Managers │ OK       │ conda 24.9.2, venv builtin, docker 28.3.0                                              │
└──────────────────────┴──────────┴────────────────────────────────────────────────────────────────────────────────────────┘

Reproduction

 kt run /data/llm-models/MiniMaxAI/MiniMax-M2.1 --cpu-threads 64 --numa-nodes 4

Output and error:

[2025-12-27 14:37:30] The available memory for KV cache is 2.00 GB.
[2025-12-27 14:37:30] max_total_tokens=100000 is larger than the profiled value 8469. Use the profiled value instead.                                   
[2025-12-27 14:37:30] KV Cache is allocated. #tokens: 8469, K size: 1.00 GB, V size: 1.00 GB                                                            
[2025-12-27 14:37:30] Memory pool end. avail mem=8.14 GB                                                                                                
[2025-12-27 14:37:31] Capture cuda graph begin. This can take up to several minutes. avail mem=7.63 GB                                                  
[2025-12-27 14:37:31] Capture cuda graph bs [1, 2, 4, 8, 12, 16]                                                                                        
Capturing batches (bs=16 avail_mem=7.61 GB):   0%|                                                                             | 0/6 [00:16<?, ?it/s]   
[2025-12-27 14:37:48] Scheduler hit an exception: Traceback (most recent call last):                                                                    
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/managers/scheduler.py", line 2833, in run_scheduler_process                                    
    scheduler = Scheduler(                                                                                                                              
                ^^^^^^^^^^                                                                                                                              
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/managers/scheduler.py", line 314, in __init__                                                  
    self.init_model_worker()                                                                                                                            
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/managers/scheduler.py", line 452, in init_model_worker                                         
    self.tp_worker = TpModelWorker(                                                                                                                     
                     ^^^^^^^^^^^^^^                                                                                                                     
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/managers/tp_worker.py", line 253, in __init__                                                  
    self._model_runner = ModelRunner(                                                                                                                   
                         ^^^^^^^^^^^^                                                                                                                   
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/model_runner.py", line 385, in __init__                                         
    self.initialize(min_per_gpu_memory)                                                                                                                 
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/model_runner.py", line 584, in initialize                                       
    self.init_device_graphs()                                                                                                                           
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/model_runner.py", line 2670, in init_device_graphs                              
    self.graph_runner = graph_runners[self.device](self)                                                                                                
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 358, in __init__                                    
    self.capture()                                                                                                                                      
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 513, in capture                                     
    _capture_one_stream()                                                                                                                               
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 500, in _capture_one_stream                         
    ) = self.capture_one_batch_size(bs, forward, stream_idx)                                                                                            
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                            
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 719, in capture_one_batch_size                      
    run_once()                                                                                                                                          
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 706, in run_once                                    
    logits_output_or_pp_proxy_tensors = forward(                                                                                                        
                                        ^^^^^^^^                                                                                                        
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context              
    return func(*args, **kwargs)                                                                                                                        
           ^^^^^^^^^^^^^^^^^^^^^                                                                                                                        
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 795, in forward                                                    
    hidden_states = self.model(input_ids, positions, forward_batch, input_embeds)                                                                       
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                       
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl           
    return self._call_impl(*args, **kwargs)                                                                                                             
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                             
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl                   
    return forward_call(*args, **kwargs)                                                                                                                
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                      
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 710, in forward                                                    
    hidden_states, residual = layer(                                                                                                                    
                              ^^^^^^                                                                                                                    
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl           
    return self._call_impl(*args, **kwargs)                                                                                                             
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                             
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl                   
    return forward_call(*args, **kwargs)                                                                                                                
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 559, in forward                                                    
    hidden_states = self.block_sparse_moe(hidden_states, forward_batch)                                                                                 
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                 
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl           
    return self._call_impl(*args, **kwargs)                                                                                                             
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                             
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl                   
    return forward_call(*args, **kwargs)                                                                                                                
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 200, in forward                                                    
    return self.forward_normal(hidden_states)                                                                                                           
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                           
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/models/minimax_m2.py", line 210, in forward_normal                                             
    final_hidden_states = self.experts(hidden_states, topk_output)                                                                                      
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                      
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl           
    return self._call_impl(*args, **kwargs)                                                                                                             
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                             
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl                   
    return forward_call(*args, **kwargs)                                                                                                                
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 908, in forward                                    
    return self.forward_impl(hidden_states, topk_output)                                                                                                
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 927, in forward_impl                               
    combine_input = self.run_moe_core(                                                                                                                  
                    ^^^^^^^^^^^^^^^^^^                                                                                                                  
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 948, in run_moe_core                               
    return self.quant_method.apply(                                                                                                                     
           ^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                     
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/kt_ep_wrapper.py", line 965, in apply                                               
    self.submit(layer, dispatch_output)                                                                                                                 
  File "/home/mgi527a/Softwares/sglang/python/sglang/srt/layers/moe/kt_ep_wrapper.py", line 895, in submit                                              
    self.wrapper.submit_forward(                                                                                                                        
  File "/home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/kt_kernel/experts_base.py", line 277, in submit_forward                 
    self.cpu_infer.submit_with_cuda_stream(                                                                                                             
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                              
AttributeError: 'kt_kernel_ext.CPUInfer' object has no attribute 'submit_with_cuda_stream'                                                              
                                                                                                                                                        
[2025-12-27 14:37:48] Received sigquit from a child process. It usually means the child failed.
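
The traceback shows the Python wrapper (`kt_kernel/experts_base.py`) calling a `submit_with_cuda_stream` method that the compiled `kt_kernel_ext` extension does not expose, which usually indicates the installed `kt_kernel` Python package and the native extension come from different versions. A minimal sketch for checking what the installed extension actually provides (this assumes `kt_kernel_ext` is importable under the module name shown in the AttributeError; the real import path may differ depending on the build):

```python
# Diagnostic sketch: list the public methods of the compiled CPUInfer class.
# Assumption: kt_kernel_ext imports under the name shown in the AttributeError;
# adjust the import if the extension is nested inside the kt_kernel package.
import kt_kernel_ext

methods = [m for m in dir(kt_kernel_ext.CPUInfer) if not m.startswith("_")]
print("CPUInfer public methods:", methods)
print("has submit_with_cuda_stream:", "submit_with_cuda_stream" in methods)
```

If the method is missing, rebuilding or reinstalling kt-kernel so that the Python layer and the native extension match should clear the error.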

Others

No response
