Status: Open
Labels: bug, enhancement
Description
Hello,
We are trying to deploy vLLM using the aws-neuron container on AWS SageMaker. On startup, the container fails at its ENTRYPOINT:
ENTRYPOINT ["python", "/usr/local/bin/vllm_entrypoint.py"]
I can see that the file is copied on line 107, but we still seem to be getting an error related to this entrypoint.
The exact error we get from the logs on startup is the following:
Traceback (most recent call last):
  File "/usr/local/bin/vllm_entrypoint.py", line 4, in <module>
    subprocess.check_call(sys.argv[1:])
  File "/opt/conda/lib/python3.11/subprocess.py", line 408, in check_call
    retcode = call(*popenargs, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 389, in call
    with Popen(*popenargs, **kwargs) as p:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/opt/conda/lib/python3.11/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
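For context, the failing frame is `subprocess.check_call(sys.argv[1:])`, so the entrypoint simply executes whatever arguments the container is started with. The final exception line is missing from the log above, but the last frame (`raise child_exception_type(...)`) is where `subprocess` raises an `OSError` subclass such as `FileNotFoundError` when the command cannot be executed. SageMaker hosting starts inference containers with the argument `serve`, so one possible reading (an assumption, not confirmed by the log) is that the entrypoint tries to execute a `serve` command that does not exist in the image. A minimal sketch of that failure mode follows; the helper name `try_launch` and the command name are hypothetical:

```python
import subprocess


def try_launch(argv):
    """Run argv the way the entrypoint does; return the exception name on failure."""
    try:
        subprocess.check_call(argv)
    except OSError as exc:
        # Raised before the child starts, e.g. FileNotFoundError or PermissionError.
        return type(exc).__name__
    except subprocess.CalledProcessError:
        # The command started but exited with a non-zero status.
        return "CalledProcessError"
    return None


# Hypothetical command name, chosen so that it does not exist on PATH.
print(try_launch(["no-such-command-for-this-sketch"]))
```

If this is what is happening, the missing last line of the traceback would name the command that could not be found.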
For your information, we are using the following to create the endpoint:
SageMaker model:
  custom_image: "public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:0.9.1-neuronx-py311-sdk2.26.1-ubuntu22.04"
  mode: "SingleModel"
  model_data_url: "<custom_model_data_vllm>"
  environment:
    - name: "SM_VLLM_MAX_MODEL_LEN"
      value: "12000"
    - name: "SM_VLLM_LIMIT_MM_PER_PROMPT"
      value: '{"image":6, "video":0}'
    - name: "SM_VLLM_MODEL"
      value: "/opt/ml/model/qwen2_W4A16"
    - name: "SM_VLLM_MM_PROCESSOR_CACHE_GB"
      value: "0"
    - name: "SM_VLLM_NO_ENABLE_PREFIX_CACHING"
      value: "true"
    - name: "SM_VLLM_ADDITIONAL_CONFIG"
      value: "{\"override_neuron_config\":{\"enable_bucketing\":false}}"
SageMaker endpoint configuration:
  instance_type: "ml.inf2.8xlarge"
  routing_config:
    routing_strategy: "LEAST_OUTSTANDING_REQUESTS"
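Two of the environment values above are JSON strings, so it may be worth ruling out a quoting problem on our side. A minimal sketch, using only the standard library, that checks both values parse as JSON once the YAML quoting is removed; the dict name `json_env` is hypothetical, and the check says nothing about how the container actually consumes these variables:

```python
import json

# Values as they would reach the container after YAML unquoting.
json_env = {
    "SM_VLLM_LIMIT_MM_PER_PROMPT": '{"image":6, "video":0}',
    "SM_VLLM_ADDITIONAL_CONFIG": '{"override_neuron_config":{"enable_bucketing":false}}',
}

# json.loads raises json.JSONDecodeError if a value is not valid JSON.
parsed = {name: json.loads(value) for name, value in json_env.items()}

for name, value in parsed.items():
    print(name, "->", value)
```

Both values parse cleanly, so the JSON itself does not look like the cause of the startup failure.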