Conversation
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

not stale - planning on coming back to this
```diff
-            sys.executable,
-            '-m',
-            'vllm.entrypoints.openai.api_server',
+            self._vllm_executable,
```
Changing this doesn't work on its own, because the dynamo vllm executable doesn't include an API server. As a result, running this produces:

```
error: unrecognized arguments: --port 48455
```
So I'll need a different way of doing this
I think I'll need to replicate something like https://github.com/ai-dynamo/dynamo?tab=readme-ov-file#run-dynamo instead
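For reference, a rough sketch of what the swapped-out launch command might look like, building on the `dynamo.vllm` module path that shows up in the tracebacks below (the flag names here are assumptions for illustration, not verified against the dynamo CLI):

```python
import sys


def build_dynamo_worker_cmd(model: str, port: int) -> list[str]:
    """Build a dynamo-style worker command (hypothetical flags, for illustration).

    Unlike `python -m vllm.entrypoints.openai.api_server`, the dynamo vllm
    executable has no built-in API server, so the worker has to be launched
    the way the dynamo README describes, with the frontend handled separately.
    """
    return [
        sys.executable,
        '-m',
        'dynamo.vllm',             # worker entrypoint seen in the tracebacks
        '--model', model,
        '--http-port', str(port),  # hypothetical flag name
    ]
```

Passing the result straight to `subprocess.Popen` (no shell string) keeps the per-argument quoting intact.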
I made this update, and now I'm successfully starting up a model endpoint (`HTTP Request: GET http://localhost:52921/v1/models "HTTP/1.1 200 OK"`). However, I'm now running into a new problem:
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/__main__.py", line 7, in <module>
    main()
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 820, in main
    uvloop.run(worker())
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 67, in worker
    runtime = DistributedRuntime(
              ^^^^^^^^^^^^^^^^^^^
Exception: Failed to connect to NATS: IO error: Connection refused (os error 111). Verify NATS server is running and accessible.
```
https://console.cloud.google.com/dataflow/jobs/us-central1/2026-02-11_07_08_48-18398043110228237613
I think that this is called out in https://github.com/ai-dynamo/dynamo?tab=readme-ov-file#run-dynamo and I can avoid NATS entirely with `--kv-events-config '{"enable_kv_cache_events": false}'`, but I've had a little trouble getting that right so far
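One way to sidestep the quoting trouble with that flag is to build the JSON with `json.dumps` when assembling the subprocess argument list, instead of embedding it in a shell string (the flag name is taken from the comment above; the helper itself is just a sketch):

```python
import json


def kv_events_flag(enabled: bool = False) -> list[str]:
    """Return the --kv-events-config argument pair with properly encoded JSON.

    Keeping the JSON as its own list element means subprocess hands it to the
    worker verbatim -- no shell, so no nested-quote escaping to get wrong.
    """
    config = {"enable_kv_cache_events": enabled}
    return ['--kv-events-config', json.dumps(config)]
```

The returned pair can be appended straight onto the worker command list.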
I solved that piece, but am still running into issues:
```
{"job":"2026-02-11_07_49_28-348318588552584172", "logger":"/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.12/site-packages/apache_beam/ml/inference/vllm_inference.py:84", "portability_worker_id":"sdk-0-0_sibling_2", "thread":"Thread-91 (log_stdout)", "worker":"beamapp-dannymccormick-02-02110749-wdvx-harness-qpdp"}
thread '<unnamed>' panicked at /opt/dynamo/lib/runtime/src/storage/kv.rs:440:29:
called `Result::unwrap()` on an `Err` value: BuildError(Unable to create lease. Check etcd server status at http://localhost:2379

Caused by:
    grpc request error: status: 'The service is currently unavailable', self: "tcp connect error")
2026-02-11T16:22:03.347934Z ERROR runners._cancel_all_tasks: unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-4' coro=<VllmEngineMonitor._check_engine_health() done, defined at /usr/local/lib/python3.12/dist-packages/dynamo/vllm/engine_monitor.py:68> exception=PanicException('called `Result::unwrap()` on an `Err` value: BuildError(Unable to create lease. Check etcd server status at http://localhost:2379\n\nCaused by:\n    grpc request error: status: \'The service is currently unavailable\', self: "tcp connect error")')>
Traceback (most recent call last):
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 117, in worker
    await init(runtime, config)
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 578, in init
    await register_vllm_model(
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 370, in register_vllm_model
    await register_llm(
Exception: unable to extract tokenizer kind from directory /root/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/engine_monitor.py", line 71, in _check_engine_health
    await self.engine_client.check_health()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 734, in check_health
    raise self.dead_error
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/engine_monitor.py", line 78, in _check_engine_health
    self.runtime.shutdown()
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: BuildError(Unable to create lease. Check etcd server status at http://localhost:2379

Caused by:
    grpc request error: status: 'The service is currently unavailable', self: "tcp connect error")
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/__main__.py", line 7, in <module>
    main()
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 820, in main
    uvloop.run(worker())
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 117, in worker
    await init(runtime, config)
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 578, in init
    await register_vllm_model(
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 370, in register_vllm_model
    await register_llm(
Exception: unable to extract tokenizer kind from directory /root/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6
```
Not sure what is going on yet
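As a debugging aid for the "unable to extract tokenizer kind" error, a quick check of what the snapshot directory actually contains might help. The file names below are the standard Hugging Face tokenizer artifacts; exactly which of them dynamo's `register_llm` probes for is an assumption:

```python
from pathlib import Path

# Standard Hugging Face tokenizer artifacts; dynamo's register_llm presumably
# infers the "tokenizer kind" from files like these in the snapshot directory.
TOKENIZER_FILES = ('tokenizer.json', 'tokenizer_config.json',
                   'tokenizer.model', 'vocab.json')


def find_tokenizer_files(snapshot_dir: str) -> list[str]:
    """List which known tokenizer files exist in an HF snapshot directory."""
    root = Path(snapshot_dir)
    return [name for name in TOKENIZER_FILES if (root / name).is_file()]
```

Running this against the `models--facebook--opt-125m` snapshot path from the traceback would show whether the cache download is incomplete (e.g. only weights, no tokenizer files) or whether the files are there and the detection logic is at fault.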
TODO - needs more testing and benchmarking
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- Update `CHANGES.md` with noteworthy changes.

See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.