Description
I successfully built the image, but running the test worker fails: the launcher tries to read image.cmdline, which does not exist in the image directory.
Detailed log below:
/data/release-431b14c# ./scripts/cocoon-launch --test --fake-ton worker.conf
=== Config (worker) ===
Common:
root_contract_address: EQCns7bYSp0igFvS1wpb5wsZjCKCV19MD5AVzI4EyxsnU73k
model: Qwen/Qwen3-0.6B
external_ip: 10.0.2.2
external_worker_port: 11001
external_client_port: 11002
node_wallet_key: <redacted>
Runtime:
test: True
fake_ton: True
local_mode: False
instance: 0
Worker:
worker_coefficient: 1000
hf_token: <redacted>
Paths:
image_dir: /data/release-431b14c/images/test
build_dir: /data/release-431b14c/cmake-build-default-tdx
prepared_spec_dir: /tmp/cocoon-spec-ymhgs9_a/spec
QEMU/hardware:
gpu: 0000:19:00.00
persistent: persistent-worker-0.img
ssh_port: 12005
vsock_cid: 6
tcp_ports: [12000]
no_tdx: False
Extra:
prepare_only: False
print_only: False
skip_build: False
just_build: False
========================================
Preparing worker: /data/release-431b14c/spec/spec-worker->/tmp/cocoon-spec-ymhgs9_a/spec
Building model: Qwen/Qwen3-0.6B
+ /data/release-431b14c/scripts/build-model Qwen/Qwen3-0.6B
=== Building COCOON model package 'Qwen/Qwen3-0.6B' ===
Activating venv...
Installing huggingface_hub...
=== Tar file already exists, skipping download and tar creation ===
=== Using existing .hash file ===
=== Build complete ===
-rw-r--r-- 1 root root 1.5G Dec 8 01:02 /data/release-431b14c/images/Qwen_Qwen3_0_6B.tar
-rw------- 1 root root 12M Dec 8 01:02 /data/release-431b14c/images/Qwen_Qwen3_0_6B.tar.verity
Qwen/Qwen3-0.6B@c1899de289a04d12100db370d81485cdf75e47ca:b46980c30f690cc59d22e211bca4470c3069432a4230fed0cfb602d2150798f3 /data/release-431b14c/images/Qwen_Qwen3_0_6B.tar
Result: Qwen/Qwen3-0.6B@c1899de289a04d12100db370d81485cdf75e47ca:b46980c30f690cc59d22e211bca4470c3069432a4230fed0cfb602d2150798f3 /data/release-431b14c/images/Qwen_Qwen3_0_6B.tar
======================================================================
SPEC: /tmp/cocoon-spec-ymhgs9_a/spec
======================================================================
Files:
644 8 .gitignore
644 304 cocoon-router.service
644 810 cocoon-sglang.service
644 564 cocoon-vllm.service
644 347 cocoon-worker-runner.service
644 723 fake-ton-config.json
755 877 init
644 18 model.verity_hash
644 4546 runtime/runtime.vars
644 557 worker-config.json
======================================================================
[RUNTIME VARS] runtime.vars
----------------------------------------------------------------------
PROXY_SC_CODE=b5ee9c724102070100020a000114ff00f4a413f4bcf2c80b0102f6d3eda2edfb6c2220c700915be001d0d3030171b0915be0fa403001d31fd33fed44d0d301fa00fa40d3ffd31fd33fd33fd4d4302b821077be8515ba925f0de02b82105cfc6b87ba945f0ddb31e02b8210ab0fb31cba945f0ddb31e02b8210a35cb580ba945f0ddb31e02b8210c68ebc7bbae3020b821008e7d036ba020401fe145f0433333502fa40fa00fa40d1f8284045705470005300104710361025104910381029c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc922c8cb0112f400f400cb00c9f9007074c8cb02ca07cbffc9d00581524f06c70515f2f482105f2542e2801050427003c8cb0558cf1601fa02cb6a12cb1f0300585210cb3fc970fb0082102565934c80105003707003c8cb0558cf1601fa02cb6a12cb1fcb3fc98040fb00db310108e3025f0c0501fc08fa40d33ffa40d15313a85354a113a8f8285464915610705450311036102504c8cb3f5003cf1601cf16cbffcb1fc921c8cb0113f40012f400cb00c9f9007074c8cb02ca07cbffc9d00f81524f1110c7051ff2f48210f0990d01801040437003c8cb0558cf1601fa02cb6a12cb1f52a0cb3fc970fb00506ba0106807104606008810354403401908c8cb015007fa025005cf1613cbffcb1fcb3fcb3fccccc9ed5482102565934c80105003707003c8cb0558cf1601fa02cb6a12cb1fcb3fc98040fb00db3112368999
WORKER_SC_CODE=b5ee9c724101030100d1000114ff00f4a413f4bcf2c80b0101a0d3eda2edfb3322c700925f03e0d0d3033071b0915be001d31fd33fed44d0d33ffa40fa40d3ffd31f3030268210a040ad28bae30210675f07208210c4ec3349ba9330db31e0821014702741ba92db31e00200de8109c50882103b9aca00bc18f2f403fa40318308d718d430d020d31fd33fd33ffa4030038109c60bba1af2f4278109c702ba385007f2f48109c85347b9355004f2f48109c95152c70515f2f48109ca02f901541046f910f2f45502f82304c8cb3f5003cf1601cf16cbffcb1fc9ed548f8921a4
CLIENT_SC_CODE=b5ee9c72410218010004c3000114ff00f4a413f4bcf2c80b010202cd020d03f7d7b68bb7ec831c02497c138007434c0c05c6c2497c1383e900c0074c7f4cffb51343500740074c07e8034cfc13e903e9034ffcc01b4c7f4ffcc0419c4158411440d0408caa08405f75532eea497c3b80aa0840ecbd01eeea517c3b6cc780aa0843c5cb9b0aeb8c08f0f0a20843117e7ceeeb8c08a20842a4d5c0d2ea03040500cc3a07d15192c705f2e42f2982103b9aca00bef2e430509aa082101dcd6500a12082101dcd6500bef2e4315122a082101dcd6500a15449605456442bf02917103645550304c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db310020303436363636d33fd1504213f02adb3104bc8e3438383803d3ffd154565529f02b4576441403c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db31e0288210fafa6cc1bae302288210c68ebc7bbae302288210a0a1927fbae30239278210bb63ff93ba0608090a01fe36375177c705f2e43922c002f2d43a02c0008e353771f82382015180a0525527f02d0750561413c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db31e007f823b9f2e43b72705445595377f02c2250561413c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54070004db310092383a5179c705f2e44302c002f2d44402d33ffa00301023f02e72705448535379f02c2245764430c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db31008037375176c705f2e4b122c002f2d4b203d33ffa00301034f02e4756144330c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db3102fe8ef424c002f2d5dc05fa408308d718d430d020d31fd33fd33ffa00fa4030504ebaf2e5de52b2baf2e5df5360b9f2e5e051b5c705f2e5e101f90154102bf910f2e5e24480f02e10375e504398c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed5482102565934c80105870e03a268210efd711e10b0c00307003c8cb0558cf1601fa02cb6acb1fcb3fc98040fb00db3100fcba8e7803c002f2d5dc03fa408308d718d430d020d31fd33fd33ffa00fa4030504cbaf2e5de5292baf2e5df5370b9f2e5e05196c705f2e5e101f901541029f910f2e5e2103416f02e722845535247f02c702010371610454400c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db31e05f0a0201660e130201200f10006d582105cfc6b8780185410747003c8cb0558cf1601fa02cb6a16cb1f14cb3f5004cf1601fa020282
101dcd6500a112fa02cb3fc971fb0080201201112005d208428d72d6020061401dc5c00f232c15633c5807e80b2da85b2c7c532cfd40133c5807e8084b2cff2cff25c7ec020004f20842ac3ecc7200614015c5c00f232c15633c5807e80b2da8532c7c4b2cfd633c5b2fff25c7ec02002012014170201201516006708b0002497c178208431a3af1ee00614019c1c00f232c15633c5807e80b2da8572c7c4f2cfd633c5807e808073c5b260c1bec020004b20843cc65ce8600614011c1c00f232c15633c5807e80b2da84f2c7f2cfc073c5b260103ec020002945313bb915be05214a15003a85301b991a1e05b70839b479f7
MODEL_NAME=Qwen/Qwen3-0.6B
MODEL_COMMIT=c1899de289a04d12100db370d81485cdf75e47ca
MODEL_VERITY_HASH=b46980c30f690cc59d22e211bca4470c3069432a4230fed0cfb602d2150798f3
OWNER_ADDRESS=UQCHnr27nLpEmNbGVfDF45tKvu-WvBL9yxLVo-FDfFcjBpu3
ROOT_CONTRACT_ADDRESS=EQCns7bYSp0igFvS1wpb5wsZjCKCV19MD5AVzI4EyxsnU73k
EXTERNAL_IP=10.0.2.2
EXTERNAL_WORKER_PORT=11001
EXTERNAL_CLIENT_PORT=11002
WORKER_COEFFICIENT=1000
HF_TOKEN=<redacted>
NODE_WALLET_KEY=<redacted>
----------------------------------------------------------------------
[CONFIG] fake-ton-config.json
----------------------------------------------------------------------
{
"proxy_hashes" : [ "ProxyProxyProxyProxyProxyProxyProxyProxyPro=" ],
"worker_hashes" : [ "WorkerWorkerWorkerWorkerWorkerWorkerWorkerU=" ],
"model_names" : [ "$MODEL_NAME" ],
"registered_proxies" : [
{
"seqno" : 1,
"address" : "$EXTERNAL_IP:$EXTERNAL_WORKER_PORT $EXTERNAL_IP:$EXTERNAL_CLIENT_PORT",
"proxy_hash" : "ProxyProxyProxyProxyProxyProxyProxyProxyPro="
}
],
"price_per_token" : 2,
"worker_fee_per_token" : 1,
"last_proxy_seqno" : 1,
"version" : 3,
"params_version" : 1,
"proxy_sc_code" : "$PROXY_SC_CODE",
"worker_sc_code" : "$WORKER_SC_CODE",
"client_sc_code" : "$CLIENT_SC_CODE",
"root_owner_address" : "0QBgPMq7ye2WRcudRHK_-yjHwMXjeN-UCzAuoo06TR6hmgHV"
}
----------------------------------------------------------------------
[CONFIG] worker-config.json
----------------------------------------------------------------------
{
"is_test": "$IS_DEBUG",
"is_testnet" : false,
"http_port" : 12000,
"rpc_port" : 12001,
"proxy_connections" : 1,
"model_name" : "$MODEL_NAME@$MODEL_COMMIT:$MODEL_VERITY_HASH",
"ton_config_filename": "$TON_CONFIG_FILE",
"owner_address": "$OWNER_ADDRESS",
"image_hash": "$TDX_IMAGE_HASH",
"root_contract_address": "$ROOT_CONTRACT_ADDRESS",
"node_wallet_key": "$NODE_WALLET_KEY",
"connect_to_proxy_via": "127.0.0.1:8116",
"forward_requests_to": "127.0.0.1:8000",
"coefficient": "$WORKER_COEFFICIENT",
"max_active_requests": 60
}
----------------------------------------------------------------------
[SERVICE] cocoon-router.service
----------------------------------------------------------------------
[Unit]
Description=TDX router
Wants=vllm.service
[Service]
Type=simple
ExecStart=/usr/bin/router -S 8116@tdx --serialize-info -C /etc/tdx/tdx
StandardOutput=journal+console
StandardError=journal+console
# Resource accounting (for health-monitor)
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
----------------------------------------------------------------------
[SERVICE] cocoon-sglang.service
----------------------------------------------------------------------
[Unit]
Description=sglang docker container
Requires=docker.service
After=docker.service
[Service]
ExecStartPre=/usr/bin/docker pull lmsysorg/sglang:v0.5.5.post3@sha256:97fe3876fd7f0d27c72c79f612b024e08e9ac4ffdc52b5e4f81b7b53e1f3e819
ExecStart=docker run --rm --gpus all --name %n -v /mnt/model:/model -p 8000:8000 --ipc=host --entrypoint=python3 lmsysorg/sglang:v0.5.5.post3@sha256:97fe3876fd7f0d27c72c79f612b024e08e9ac4ffdc52b5e4f81b7b53e1f3e819 -m sglang.launch_server --tp=1 --trust-remote-code --host 0.0.0.0 --port 8000 --model /model --served-model-name $MODEL_NAME --enable-cache-report
ExecStop=-/usr/bin/docker stop %n
TimeoutStartSec=2h
StandardOutput=journal+console
StandardError=journal+console
# Resource accounting (for health-monitor)
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
----------------------------------------------------------------------
[SERVICE] cocoon-vllm.service
----------------------------------------------------------------------
[Unit]
Description=vllm docker container
Requires=docker.service
After=docker.service
[Service]
ExecStartPre=/usr/bin/docker pull vllm/vllm-openai:latest
ExecStart=docker run --rm --gpus all --name %n -v /mnt/model:/model -p 8000:8000 --ipc=host vllm/vllm-openai:latest --model /model --served-model-name $MODEL_NAME --enable-prompt-tokens-details
ExecStop=-/usr/bin/docker stop %n
TimeoutStartSec=2h
StandardOutput=journal+console
StandardError=journal+console
# Resource accounting (for health-monitor)
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
----------------------------------------------------------------------
[SERVICE] cocoon-worker-runner.service
----------------------------------------------------------------------
[Unit]
Description=TDX VLLM proxy
Wants=vllm.service
[Service]
Type=simple
ExecStart=/usr/bin/worker-runner -c /run/spec/worker-config.json --disable-ton /run/spec/fake-ton-config.json
StandardOutput=journal+console
StandardError=journal+console
# Resource accounting (for health-monitor)
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
----------------------------------------------------------------------
[INIT SCRIPT] init
----------------------------------------------------------------------
#!/bin/bash
set -euo pipefail
set -x
echo "INIT WORKER RUNNER"
# Mount modle
# For now using cocoon-render-config to get access to MODEL_VERITY_HASH
cocoon-render-config model.verity_hash
MODEL_VERITY_HASH=$(cat /run/spec/model.verity_hash)
time veritysetup open /dev/disk/by-id/virtio-model-tar model.tar /dev/disk/by-id/virtio-model-tar-verity $MODEL_VERITY_HASH
time fuse-archive -o nocache /dev/mapper/model.tar /mnt/model
# Render worker config with runtime values
cocoon-render-config worker-config.json
# Render ton config
BACKEND=sglang
cocoon-render-config cocoon-$BACKEND.service
cocoon-render-config fake-ton-config.json
systemctl enable /spec/cocoon-router.service
systemctl enable /spec/cocoon-worker-runner.service
systemctl enable /run/spec/cocoon-$BACKEND.service
systemctl start cocoon-router.service cocoon-worker-runner.service cocoon-$BACKEND.service
----------------------------------------------------------------------
======================================================================
+ /data/release-431b14c/scripts/setup-gpu-vfio 0000:19:00.00
✓ GPU 0000:19:00.0 is bound to vfio-pci
✓ GPU 0000:19:00.0 CC mode is enabled
Traceback (most recent call last):
File "/data/release-431b14c/./scripts/cocoon-launch", line 888, in <module>
main()
File "/data/release-431b14c/./scripts/cocoon-launch", line 884, in main
run_qemu(cfg)
File "/data/release-431b14c/./scripts/cocoon-launch", line 573, in run_qemu
cmdline = cmdline_file.read_text().strip()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/pathlib.py", line 1029, in read_text
with self.open(mode='r', encoding=encoding, errors=errors) as f:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/pathlib.py", line 1015, in open
return io.open(self, mode, buffering, encoding, errors, newline)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/data/release-431b14c/images/test/image.cmdline'
Below are the contents of the images directory:
/data/release-431b14c# ll -h images/
total 1.5G
drwxr-xr-x 2 root root 4.0K Dec 8 01:30 ./
drwxr-xr-x 9 root root 4.0K Dec 8 01:28 ../
-rw-r--r-- 1 root root 0 Dec 8 01:30 .build-Qwen_Qwen3_0_6B.lock
-rw-r--r-- 1 root root 0 Dec 8 01:30 .build-__help.lock
-rw-r--r-- 1 root root 0 Dec 8 01:30 .build-google_gemma_3_270m.lock
-rw-r--r-- 1 root root 1.5G Dec 8 01:02 Qwen_Qwen3_0_6B.tar
-rw------- 1 root root 64 Dec 8 01:02 Qwen_Qwen3_0_6B.tar.hash
-rw------- 1 root root 12M Dec 8 01:02 Qwen_Qwen3_0_6B.tar.verity
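Note that the listing above shows no test/ subdirectory under images/ at all, so the configured image_dir (/data/release-431b14c/images/test) does not exist, not just image.cmdline. A pre-flight check along the following lines might make this failure easier to read than the raw traceback. This is only a minimal sketch and is not taken from cocoon-launch: the function names are hypothetical, and the only required file I know of is image.cmdline (from the traceback), so any other file names would have to be supplied by someone who knows the image layout.

```python
from pathlib import Path


def missing_image_files(image_dir, required=("image.cmdline",)):
    """Return the names in `required` that are absent from `image_dir`.

    Only image.cmdline is known from the traceback; any other entry passed
    in `required` is the caller's assumption about the image layout.
    """
    image_dir = Path(image_dir)
    return [name for name in required if not (image_dir / name).is_file()]


def check_image_dir(image_dir, required=("image.cmdline",)):
    """Exit with one readable message instead of a raw FileNotFoundError."""
    image_dir = Path(image_dir)
    if not image_dir.is_dir():
        raise SystemExit(f"image directory does not exist: {image_dir}")
    missing = missing_image_files(image_dir, required)
    if missing:
        raise SystemExit(f"{image_dir} is missing: {', '.join(missing)}")
```

With the paths from the log, calling check_image_dir("/data/release-431b14c/images/test") before run_qemu would stop at the first branch and report that the directory itself is absent.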