
FileNotFoundError: [Errno 2] No such file or directory: 'serve' #171

@bakas-george

Description


Hello,

We are trying to deploy vLLM using the aws-neuron container on AWS SageMaker. On startup, the container fails at its ENTRYPOINT:

ENTRYPOINT ["python", "/usr/local/bin/vllm_entrypoint.py"]

I can see that the file is copied on line 107 of the Dockerfile, but it seems that we are getting an error related to this entrypoint.

The exact error we see in the logs on startup is the following:

Traceback (most recent call last):
  File "/usr/local/bin/vllm_entrypoint.py", line 4, in <module>
    subprocess.check_call(sys.argv[1:])
  File "/opt/conda/lib/python3.11/subprocess.py", line 408, in check_call
    retcode = call(*popenargs, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 389, in call
    with Popen(*popenargs, **kwargs) as p:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/conda/lib/python3.11/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'serve'

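From the traceback, the entrypoint at /usr/local/bin/vllm_entrypoint.py appears to do little more than forward its command-line arguments to subprocess.check_call. SageMaker starts a hosting container with the single argument "serve", so the script ends up trying to execute a program literally named "serve", which is not on PATH, hence the FileNotFoundError. A minimal sketch of that behaviour, reconstructed from the traceback (the actual script in the image may contain additional logic):

# vllm_entrypoint.py -- minimal reconstruction based on the traceback above;
# the real script in the image may do more than this.
import subprocess
import sys

# SageMaker invokes a hosting container as "docker run <image> serve",
# so sys.argv[1:] is ["serve"]; check_call then tries to execute a binary
# named "serve", which does not exist on PATH -> FileNotFoundError.
subprocess.check_call(sys.argv[1:])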
For your information, we are using the following to create the endpoint; a rough boto3 sketch of the same setup is shown after the configuration.

SageMaker model:

custom_image: "public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py311-sdk2.26.1-ubuntu22.04"
mode: "SingleModel"
model_data_url: "<custom_model_data_vllm>"
environment:
  - name: "SM_VLLM_MAX_MODEL_LEN"
    value: "12000"
  - name: "SM_VLLM_LIMIT_MM_PER_PROMPT"
    value: '{"image":6, "video":0}'
  - name: "SM_VLLM_MODEL"
    value: "/opt/ml/model/qwen2_W4A16"
  - name: "SM_VLLM_MM_PROCESSOR_CACHE_GB"
    value: "0"
  - name: "SM_VLLM_NO_ENABLE_PREFIX_CACHING"
    value: "true"
  - name: "SM_VLLM_ADDITIONAL_CONFIG"
    value: "{\"override_neuron_config\":{\"enable_bucketing\":false}}"

SageMaker endpoint configuration:

instance_type: "ml.inf2.8xlarge"
routing_config:
  routing_strategy: "LEAST_OUTSTANDING_REQUESTS"
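
For reference, a rough boto3 equivalent of the model and endpoint configuration above (the role ARN, resource names, and the model data placeholder are illustrative only, not the values we actually use):

# Rough boto3 equivalent of the SageMaker model / endpoint configuration above.
# The role ARN, names, and model data URL below are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_model(
    ModelName="vllm-neuron-qwen2",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    PrimaryContainer={
        "Image": "public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py311-sdk2.26.1-ubuntu22.04",
        "Mode": "SingleModel",
        "ModelDataUrl": "<custom_model_data_vllm>",  # placeholder, as above
        "Environment": {
            "SM_VLLM_MAX_MODEL_LEN": "12000",
            "SM_VLLM_LIMIT_MM_PER_PROMPT": '{"image":6, "video":0}',
            "SM_VLLM_MODEL": "/opt/ml/model/qwen2_W4A16",
            "SM_VLLM_MM_PROCESSOR_CACHE_GB": "0",
            "SM_VLLM_NO_ENABLE_PREFIX_CACHING": "true",
            "SM_VLLM_ADDITIONAL_CONFIG": '{"override_neuron_config":{"enable_bucketing":false}}',
        },
    },
)

sm.create_endpoint_config(
    EndpointConfigName="vllm-neuron-qwen2-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "vllm-neuron-qwen2",
            "InstanceType": "ml.inf2.8xlarge",
            "InitialInstanceCount": 1,
            "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
        }
    ],
)

sm.create_endpoint(
    EndpointName="vllm-neuron-qwen2",
    EndpointConfigName="vllm-neuron-qwen2-config",
)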


Labels: bug (Something isn't working), enhancement (New feature or request)
