
Deploying GLM-4.6V-Flash on Ascend 910B4: startup succeeds but inference throws an error #251

@maitaotao

Description

System Info

910B4, 8×64 GB; Docker image: xllm-ai:xllm-dev-hb-rc2-arm; xLLM: master branch

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts and tasks

Reproduction

Startup script:
BATCH_SIZE=256
XLLM_PATH="./build/xllm/core/server/xllm"
MODEL_PATH=/mnt/ZhipuAI/GLM-4.6V-Flash/

MASTER_NODE_ADDR="172.20.117.34:10015"
LOCAL_HOST="172.20.117.34"
START_PORT=18994
START_DEVICE=0
LOG_DIR="logs"
NNODES=1

for (( i=0; i<$NNODES; i++ ))
do
PORT=$((START_PORT + i))
DEVICE=$((START_DEVICE + i))
LOG_FILE="$LOG_DIR/node_$i.log"
nohup numactl -C $((DEVICE*12))-$((DEVICE*12+11)) $XLLM_PATH \
--model $MODEL_PATH --model_id glm_46v \
--host $LOCAL_HOST \
--port $PORT \
--devices="npu:$DEVICE" \
--master_node_addr=$MASTER_NODE_ADDR \
--nnodes=$NNODES \
--node_rank=$i \
--max_memory_utilization=0.86 \
--max_tokens_per_batch=40000 \
--max_seqs_per_batch=$BATCH_SIZE \
--communication_backend=hccl \
--enable_schedule_overlap=true \
--enable_prefix_cache=true \
--enable_chunked_prefill=false \
--enable_shm=true \
> $LOG_FILE 2>&1 &
done
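The `numactl -C` range in the loop pins a dedicated block of 12 CPU cores to each NPU (12 cores per device is inferred from the script's `$((DEVICE*12))` arithmetic, not a verified property of the host; adjust to the actual topology). A quick sketch of the intended mapping, in Python for illustration:

```python
# Assumption: each NPU i gets a dedicated block of 12 CPU cores,
# mirroring the $((DEVICE*12))-$((DEVICE*12+11)) range in the script.
def numactl_core_range(device: int, cores_per_device: int = 12) -> str:
    start = device * cores_per_device
    return f"{start}-{start + cores_per_device - 1}"

for dev in range(3):
    print(f"device {dev}: numactl -C {numactl_core_range(dev)}")
# device 0 -> cores 0-11, device 1 -> 12-23, device 2 -> 24-35
```

With NNODES=1 only device 0 is launched, so the effective binding is cores 0-11.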

Successful startup log:
I20260123 15:40:53.112957 140995 hf_model_loader.cpp:80] Loading model weights from /mnt/ZhipuAI/GLM-4.6V-Flash/model-00002-of-00004.safetensors
I20260123 15:40:53.112959 140994 hf_model_loader.cpp:80] Loading model weights from /mnt/ZhipuAI/GLM-4.6V-Flash/model-00001-of-00004.safetensors
I20260123 15:40:53.112958 140996 hf_model_loader.cpp:80] Loading model weights from /mnt/ZhipuAI/GLM-4.6V-Flash/model-00003-of-00004.safetensors
I20260123 15:40:53.112963 140997 hf_model_loader.cpp:80] Loading model weights from /mnt/ZhipuAI/GLM-4.6V-Flash/model-00004-of-00004.safetensors
I20260123 15:40:56.015409 140694 comm_channel.cpp:543] Init_model_async succeed.
I20260123 15:40:56.019799 140644 vlm_engine.cpp:202] worker #0: available memory: 41.28 GB, total memory: 60.96 GB. Using max_memory_utilization: 0.86, max_cache_size: 0.00 B
I20260123 15:40:56.019863 140644 vlm_engine.cpp:257] kv cache capacity: bytes: 35155882292, blocks: 6705, slot_size: 1024
I20260123 15:40:56.019874 140644 vlm_engine.cpp:280] Initializing k cache with shape: [6705 128 2 128]
I20260123 15:40:56.019883 140644 vlm_engine.cpp:281] Initializing v cache with shape: [6705 128 2 128]
I20260123 15:40:57.122172 140644 jinja_chat_template.cpp:30] Jinja chat template init succeed.
I20260123 15:40:57.625254 140644 server.cpp:1200] Server[xllm::APIService] is serving on port=18994.
I20260123 15:40:57.625342 140644 server.cpp:1203] Check out http://node34:18994 in web browser.
I20260123 15:40:57.625355 140644 xllm_server.cpp:59] Brpc Server started on port 18994, idle_timeout_s: -1, num_threads: 32

Request:
curl --location 'http://172.20.117.34:18994/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "glm_46v",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "你是谁"}
]
}],
"max_tokens": 256,
"stream": false,
"temperature": 0.2
}'

Error output:
terminate called after throwing an instance of 'c10::Error'
what(): split_with_sizes expects split_sizes to sum exactly to 64 (input tensor's size at dimension -1), but got split_sizes=[]
Exception raised from split_with_sizes at /pytorch/aten/src/ATen/native/TensorShape.cpp:2610 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x68 (0xffff85a1d898 in /usr/local/lib64/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x6c (0xffff859d62a8 in /usr/local/lib64/python3.11/site-packages/torch/lib/libc10.so)
frame #2: at::native::split_with_sizes(at::Tensor const&, c10::ArrayRef<long>, long) + 0x20c (0xffff86d428d0 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #3: + 0x1e48c24 (0xffff878d8c24 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #4: at::_ops::split_with_sizes::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, long) + 0x98 (0xffff872da5e8 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #5: + 0x37ac8bc (0xffff8923c8bc in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #6: + 0x37aca38 (0xffff8923ca38 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #7: at::_ops::split_with_sizes::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<c10::SymInt>, long) + 0x98 (0xffff872da5e8 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #8: + 0x3077098 (0xffff88b07098 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #9: + 0x3077738 (0xffff88b07738 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::_ops::split_with_sizes::call(at::Tensor const&, c10::ArrayRef<c10::SymInt>, long) + 0x140 (0xffff8731b030 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #11: at::native::split_symint(at::Tensor const&, c10::ArrayRef<c10::SymInt>, long) + 0x14 (0xffff86d22f08 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #12: + 0x1fb9854 (0xffff87a49854 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::_ops::split_sizes::call(at::Tensor const&, c10::ArrayRef<c10::SymInt>, long) + 0x140 (0xffff871b0be0 in /usr/local/lib64/python3.11/site-packages/torch/lib/libtorch_cpu.so)
frame #14: ./build/xllm/core/server/xllm() [0xcd6258]
frame #15: ./build/xllm/core/server/xllm() [0xceaa74]
frame #16: ./build/xllm/core/server/xllm() [0xd0b16c]
frame #17: ./build/xllm/core/server/xllm() [0xd0b45c]
frame #18: ./build/xllm/core/server/xllm() [0xb645c0]
frame #19: ./build/xllm/core/server/xllm() [0xb644f4]
frame #20: ./build/xllm/core/server/xllm() [0xb47f2c]
frame #21: ./build/xllm/core/server/xllm() [0xb32084]
frame #22: ./build/xllm/core/server/xllm() [0x28b0c80]
frame #23: + 0x19cd24 (0xffff8dc0cd24 in /usr/local/Ascend/ascend-toolkit/latest/lib64/libllm_datadist.so)
frame #24: + 0x7ce4c (0xffff8560ce4c in /usr/lib64/libc.so.6)
frame #25: + 0xe3b0c (0xffff85673b0c in /usr/lib64/libc.so.6)

terminate called recursively
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
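The crash is a shape-check failure inside torch's `split_with_sizes`: the last dimension being split is 64, but the list of split sizes it received is empty, so the sizes sum to 0 instead of 64. This often indicates that some size list (e.g. a rope-section split) was read as empty from the model config, though that cause is a guess, not confirmed. A pure-Python mimic of the check (an illustrative sketch, not the real ATen code) shows why an empty list trips exactly this message:

```python
# Assumption: pure-Python mimic of the shape check in ATen's
# split_with_sizes (TensorShape.cpp), not the real torch call.
def split_with_sizes(dim_size, split_sizes):
    """The sizes must sum exactly to the size of the split dimension."""
    if sum(split_sizes) != dim_size:
        raise RuntimeError(
            f"split_with_sizes expects split_sizes to sum exactly to "
            f"{dim_size} (input tensor's size at dimension -1), "
            f"but got split_sizes={split_sizes}")
    # Return (start, end) offsets of each chunk along the dimension.
    out, offset = [], 0
    for s in split_sizes:
        out.append((offset, offset + s))
        offset += s
    return out

try:
    split_with_sizes(64, [])  # empty list sums to 0, not 64
except RuntimeError as e:
    print(e)
```

If this is the failure mode, comparing the model's config.json fields against what the xLLM master branch expects for GLM-4.6V-Flash may help narrow down which list is coming back empty.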

Expected behavior

Server starts successfully and inference requests complete successfully.
