Missing image.cmdline while running worker #27

@iamliuyin

I've successfully built the image, but running the test worker fails with an error that image.cmdline is missing. The file is not in the image directory.

The detailed log is below:

/data/release-431b14c# ./scripts/cocoon-launch --test --fake-ton worker.conf

=== Config (worker) ===

Common:
  root_contract_address: EQCns7bYSp0igFvS1wpb5wsZjCKCV19MD5AVzI4EyxsnU73k
  model: Qwen/Qwen3-0.6B
  external_ip: 10.0.2.2
  external_worker_port: 11001
  external_client_port: 11002
  node_wallet_key: <redacted>

Runtime:
  test: True
  fake_ton: True
  local_mode: False
  instance: 0

Worker:
  worker_coefficient: 1000
  hf_token: <redacted>

Paths:
  image_dir: /data/release-431b14c/images/test
  build_dir: /data/release-431b14c/cmake-build-default-tdx
  prepared_spec_dir: /tmp/cocoon-spec-ymhgs9_a/spec

QEMU/hardware:
  gpu: 0000:19:00.00
  persistent: persistent-worker-0.img
  ssh_port: 12005
  vsock_cid: 6
  tcp_ports: [12000]
  no_tdx: False

Extra:
  prepare_only: False
  print_only: False
  skip_build: False
  just_build: False
========================================
Preparing worker: /data/release-431b14c/spec/spec-worker->/tmp/cocoon-spec-ymhgs9_a/spec
Building model: Qwen/Qwen3-0.6B
+ /data/release-431b14c/scripts/build-model Qwen/Qwen3-0.6B
=== Building COCOON model package 'Qwen/Qwen3-0.6B' ===
Activating venv...
Installing huggingface_hub...
=== Tar file already exists, skipping download and tar creation ===
=== Using existing .hash file ===
=== Build complete ===
-rw-r--r-- 1 root root 1.5G Dec  8 01:02 /data/release-431b14c/images/Qwen_Qwen3_0_6B.tar
-rw------- 1 root root  12M Dec  8 01:02 /data/release-431b14c/images/Qwen_Qwen3_0_6B.tar.verity
Qwen/Qwen3-0.6B@c1899de289a04d12100db370d81485cdf75e47ca:b46980c30f690cc59d22e211bca4470c3069432a4230fed0cfb602d2150798f3 /data/release-431b14c/images/Qwen_Qwen3_0_6B.tar
Result: Qwen/Qwen3-0.6B@c1899de289a04d12100db370d81485cdf75e47ca:b46980c30f690cc59d22e211bca4470c3069432a4230fed0cfb602d2150798f3 /data/release-431b14c/images/Qwen_Qwen3_0_6B.tar

======================================================================
SPEC: /tmp/cocoon-spec-ymhgs9_a/spec
======================================================================

Files:
  644      8  .gitignore
  644    304  cocoon-router.service
  644    810  cocoon-sglang.service
  644    564  cocoon-vllm.service
  644    347  cocoon-worker-runner.service
  644    723  fake-ton-config.json
  755    877  init
  644     18  model.verity_hash
  644   4546  runtime/runtime.vars
  644    557  worker-config.json

======================================================================

[RUNTIME VARS] runtime.vars
----------------------------------------------------------------------
PROXY_SC_CODE=b5ee9c724102070100020a000114ff00f4a413f4bcf2c80b0102f6d3eda2edfb6c2220c700915be001d0d3030171b0915be0fa403001d31fd33fed44d0d301fa00fa40d3ffd31fd33fd33fd4d4302b821077be8515ba925f0de02b82105cfc6b87ba945f0ddb31e02b8210ab0fb31cba945f0ddb31e02b8210a35cb580ba945f0ddb31e02b8210c68ebc7bbae3020b821008e7d036ba020401fe145f0433333502fa40fa00fa40d1f8284045705470005300104710361025104910381029c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc922c8cb0112f400f400cb00c9f9007074c8cb02ca07cbffc9d00581524f06c70515f2f482105f2542e2801050427003c8cb0558cf1601fa02cb6a12cb1f0300585210cb3fc970fb0082102565934c80105003707003c8cb0558cf1601fa02cb6a12cb1fcb3fc98040fb00db310108e3025f0c0501fc08fa40d33ffa40d15313a85354a113a8f8285464915610705450311036102504c8cb3f5003cf1601cf16cbffcb1fc921c8cb0113f40012f400cb00c9f9007074c8cb02ca07cbffc9d00f81524f1110c7051ff2f48210f0990d01801040437003c8cb0558cf1601fa02cb6a12cb1f52a0cb3fc970fb00506ba0106807104606008810354403401908c8cb015007fa025005cf1613cbffcb1fcb3fcb3fccccc9ed5482102565934c80105003707003c8cb0558cf1601fa02cb6a12cb1fcb3fc98040fb00db3112368999
WORKER_SC_CODE=b5ee9c724101030100d1000114ff00f4a413f4bcf2c80b0101a0d3eda2edfb3322c700925f03e0d0d3033071b0915be001d31fd33fed44d0d33ffa40fa40d3ffd31f3030268210a040ad28bae30210675f07208210c4ec3349ba9330db31e0821014702741ba92db31e00200de8109c50882103b9aca00bc18f2f403fa40318308d718d430d020d31fd33fd33ffa4030038109c60bba1af2f4278109c702ba385007f2f48109c85347b9355004f2f48109c95152c70515f2f48109ca02f901541046f910f2f45502f82304c8cb3f5003cf1601cf16cbffcb1fc9ed548f8921a4
CLIENT_SC_CODE=b5ee9c72410218010004c3000114ff00f4a413f4bcf2c80b010202cd020d03f7d7b68bb7ec831c02497c138007434c0c05c6c2497c1383e900c0074c7f4cffb51343500740074c07e8034cfc13e903e9034ffcc01b4c7f4ffcc0419c4158411440d0408caa08405f75532eea497c3b80aa0840ecbd01eeea517c3b6cc780aa0843c5cb9b0aeb8c08f0f0a20843117e7ceeeb8c08a20842a4d5c0d2ea03040500cc3a07d15192c705f2e42f2982103b9aca00bef2e430509aa082101dcd6500a12082101dcd6500bef2e4315122a082101dcd6500a15449605456442bf02917103645550304c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db310020303436363636d33fd1504213f02adb3104bc8e3438383803d3ffd154565529f02b4576441403c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db31e0288210fafa6cc1bae302288210c68ebc7bbae302288210a0a1927fbae30239278210bb63ff93ba0608090a01fe36375177c705f2e43922c002f2d43a02c0008e353771f82382015180a0525527f02d0750561413c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db31e007f823b9f2e43b72705445595377f02c2250561413c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54070004db310092383a5179c705f2e44302c002f2d44402d33ffa00301023f02e72705448535379f02c2245764430c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db31008037375176c705f2e4b122c002f2d4b203d33ffa00301034f02e4756144330c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db3102fe8ef424c002f2d5dc05fa408308d718d430d020d31fd33fd33ffa00fa4030504ebaf2e5de52b2baf2e5df5360b9f2e5e051b5c705f2e5e101f90154102bf910f2e5e24480f02e10375e504398c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed5482102565934c80105870e03a268210efd711e10b0c00307003c8cb0558cf1601fa02cb6acb1fcb3fc98040fb00db3100fcba8e7803c002f2d5dc03fa408308d718d430d020d31fd33fd33ffa00fa4030504cbaf2e5de5292baf2e5df5370b9f2e5e05196c705f2e5e101f901541029f910f2e5e2103416f02e722845535247f02c702010371610454400c85005cf165003cf16cbffc905c8cb015004fa0212cb3f12cb1fcbffccc9ed54db31e05f0a0201660e130201200f10006d582105cfc6b8780185410747003c8cb0558cf1601fa02cb6a16cb1f14cb3f5004cf1601fa020282
101dcd6500a112fa02cb3fc971fb0080201201112005d208428d72d6020061401dc5c00f232c15633c5807e80b2da85b2c7c532cfd40133c5807e8084b2cff2cff25c7ec020004f20842ac3ecc7200614015c5c00f232c15633c5807e80b2da8532c7c4b2cfd633c5b2fff25c7ec02002012014170201201516006708b0002497c178208431a3af1ee00614019c1c00f232c15633c5807e80b2da8572c7c4f2cfd633c5807e808073c5b260c1bec020004b20843cc65ce8600614011c1c00f232c15633c5807e80b2da84f2c7f2cfc073c5b260103ec020002945313bb915be05214a15003a85301b991a1e05b70839b479f7
MODEL_NAME=Qwen/Qwen3-0.6B
MODEL_COMMIT=c1899de289a04d12100db370d81485cdf75e47ca
MODEL_VERITY_HASH=b46980c30f690cc59d22e211bca4470c3069432a4230fed0cfb602d2150798f3
OWNER_ADDRESS=UQCHnr27nLpEmNbGVfDF45tKvu-WvBL9yxLVo-FDfFcjBpu3
ROOT_CONTRACT_ADDRESS=EQCns7bYSp0igFvS1wpb5wsZjCKCV19MD5AVzI4EyxsnU73k
EXTERNAL_IP=10.0.2.2
EXTERNAL_WORKER_PORT=11001
EXTERNAL_CLIENT_PORT=11002
WORKER_COEFFICIENT=1000
HF_TOKEN=<redacted>
NODE_WALLET_KEY=<redacted>
----------------------------------------------------------------------

[CONFIG] fake-ton-config.json
----------------------------------------------------------------------
{
  "proxy_hashes" : [ "ProxyProxyProxyProxyProxyProxyProxyProxyPro=" ],
  "worker_hashes" : [ "WorkerWorkerWorkerWorkerWorkerWorkerWorkerU=" ],
  "model_names" : [ "$MODEL_NAME" ],
  "registered_proxies" : [
    {
      "seqno" : 1,
      "address" : "$EXTERNAL_IP:$EXTERNAL_WORKER_PORT $EXTERNAL_IP:$EXTERNAL_CLIENT_PORT",
      "proxy_hash" : "ProxyProxyProxyProxyProxyProxyProxyProxyPro="
    }
  ],
  "price_per_token" : 2,
  "worker_fee_per_token" : 1,
  "last_proxy_seqno" : 1,
  "version" : 3,
  "params_version" : 1,
  "proxy_sc_code" : "$PROXY_SC_CODE",
  "worker_sc_code" : "$WORKER_SC_CODE",
  "client_sc_code" : "$CLIENT_SC_CODE",
  "root_owner_address" : "0QBgPMq7ye2WRcudRHK_-yjHwMXjeN-UCzAuoo06TR6hmgHV"
}
----------------------------------------------------------------------

[CONFIG] worker-config.json
----------------------------------------------------------------------
{
  "is_test": "$IS_DEBUG",
  "is_testnet" : false,
  "http_port" : 12000,
  "rpc_port" : 12001,
  "proxy_connections" : 1,
  "model_name" : "$MODEL_NAME@$MODEL_COMMIT:$MODEL_VERITY_HASH",
  "ton_config_filename": "$TON_CONFIG_FILE",
  "owner_address": "$OWNER_ADDRESS",
  "image_hash": "$TDX_IMAGE_HASH",
  "root_contract_address": "$ROOT_CONTRACT_ADDRESS",
  "node_wallet_key": "$NODE_WALLET_KEY",
  "connect_to_proxy_via": "127.0.0.1:8116",
  "forward_requests_to": "127.0.0.1:8000",
  "coefficient": "$WORKER_COEFFICIENT",
  "max_active_requests": 60
}
----------------------------------------------------------------------

[SERVICE] cocoon-router.service
----------------------------------------------------------------------
[Unit]
Description=TDX router
Wants=vllm.service

[Service]
Type=simple
ExecStart=/usr/bin/router -S 8116@tdx --serialize-info -C /etc/tdx/tdx
StandardOutput=journal+console
StandardError=journal+console

# Resource accounting (for health-monitor)
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
----------------------------------------------------------------------

[SERVICE] cocoon-sglang.service
----------------------------------------------------------------------
[Unit]
Description=sglang docker container
Requires=docker.service
After=docker.service

[Service]
ExecStartPre=/usr/bin/docker pull lmsysorg/sglang:v0.5.5.post3@sha256:97fe3876fd7f0d27c72c79f612b024e08e9ac4ffdc52b5e4f81b7b53e1f3e819
ExecStart=docker run --rm --gpus all --name %n -v /mnt/model:/model -p 8000:8000 --ipc=host --entrypoint=python3 lmsysorg/sglang:v0.5.5.post3@sha256:97fe3876fd7f0d27c72c79f612b024e08e9ac4ffdc52b5e4f81b7b53e1f3e819 -m sglang.launch_server --tp=1 --trust-remote-code --host 0.0.0.0 --port 8000 --model /model --served-model-name $MODEL_NAME --enable-cache-report
ExecStop=-/usr/bin/docker stop %n

TimeoutStartSec=2h
StandardOutput=journal+console
StandardError=journal+console

# Resource accounting (for health-monitor)
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
----------------------------------------------------------------------

[SERVICE] cocoon-vllm.service
----------------------------------------------------------------------
[Unit]
Description=vllm docker container
Requires=docker.service
After=docker.service

[Service]
ExecStartPre=/usr/bin/docker pull vllm/vllm-openai:latest
ExecStart=docker run --rm --gpus all --name %n -v /mnt/model:/model -p 8000:8000 --ipc=host vllm/vllm-openai:latest --model /model --served-model-name $MODEL_NAME --enable-prompt-tokens-details
ExecStop=-/usr/bin/docker stop %n

TimeoutStartSec=2h
StandardOutput=journal+console
StandardError=journal+console

# Resource accounting (for health-monitor)
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
----------------------------------------------------------------------

[SERVICE] cocoon-worker-runner.service
----------------------------------------------------------------------
[Unit]
Description=TDX VLLM proxy
Wants=vllm.service

[Service]
Type=simple
ExecStart=/usr/bin/worker-runner -c /run/spec/worker-config.json --disable-ton /run/spec/fake-ton-config.json
StandardOutput=journal+console
StandardError=journal+console

# Resource accounting (for health-monitor)
CPUAccounting=yes
MemoryAccounting=yes
IOAccounting=yes
----------------------------------------------------------------------

[INIT SCRIPT] init
----------------------------------------------------------------------
#!/bin/bash
set -euo pipefail
set -x

echo "INIT WORKER RUNNER"

# Mount modle
# For now using cocoon-render-config to get access to MODEL_VERITY_HASH
cocoon-render-config model.verity_hash
MODEL_VERITY_HASH=$(cat /run/spec/model.verity_hash)
time veritysetup open /dev/disk/by-id/virtio-model-tar model.tar /dev/disk/by-id/virtio-model-tar-verity $MODEL_VERITY_HASH
time fuse-archive -o nocache /dev/mapper/model.tar /mnt/model

# Render worker config with runtime values
cocoon-render-config worker-config.json

# Render ton config

BACKEND=sglang
cocoon-render-config cocoon-$BACKEND.service

cocoon-render-config fake-ton-config.json
systemctl enable /spec/cocoon-router.service
systemctl enable /spec/cocoon-worker-runner.service
systemctl enable /run/spec/cocoon-$BACKEND.service
systemctl start cocoon-router.service cocoon-worker-runner.service cocoon-$BACKEND.service
----------------------------------------------------------------------
======================================================================

+ /data/release-431b14c/scripts/setup-gpu-vfio 0000:19:00.00
✓ GPU 0000:19:00.0 is bound to vfio-pci
✓ GPU 0000:19:00.0 CC mode is enabled
Traceback (most recent call last):
  File "/data/release-431b14c/./scripts/cocoon-launch", line 888, in <module>
    main()
  File "/data/release-431b14c/./scripts/cocoon-launch", line 884, in main
    run_qemu(cfg)
  File "/data/release-431b14c/./scripts/cocoon-launch", line 573, in run_qemu
    cmdline = cmdline_file.read_text().strip()
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/pathlib.py", line 1029, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/pathlib.py", line 1015, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/data/release-431b14c/images/test/image.cmdline'

Below is a listing of the images directory:

/data/release-431b14c# ll -h images/
total 1.5G
drwxr-xr-x 2 root root 4.0K Dec  8 01:30 ./
drwxr-xr-x 9 root root 4.0K Dec  8 01:28 ../
-rw-r--r-- 1 root root    0 Dec  8 01:30 .build-Qwen_Qwen3_0_6B.lock
-rw-r--r-- 1 root root    0 Dec  8 01:30 .build-__help.lock
-rw-r--r-- 1 root root    0 Dec  8 01:30 .build-google_gemma_3_270m.lock
-rw-r--r-- 1 root root 1.5G Dec  8 01:02 Qwen_Qwen3_0_6B.tar
-rw------- 1 root root   64 Dec  8 01:02 Qwen_Qwen3_0_6B.tar.hash
-rw------- 1 root root  12M Dec  8 01:02 Qwen_Qwen3_0_6B.tar.verity
