[WIP] Add vllm Dynamo support #36966

Draft

damccorm wants to merge 5 commits into master from users/damccorm/dynamo

Conversation

@damccorm damccorm commented Dec 2, 2025

TODO - needs more testing and benchmarking


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch):

  • Build python source distribution and wheels
  • Python tests
  • Java tests
  • Go tests
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@github-actions github-actions bot added the python label Dec 2, 2025
github-actions bot commented Feb 1, 2026

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 1, 2026
damccorm commented Feb 2, 2026

not stale - planning on coming back to this

@github-actions github-actions bot removed the stale label Feb 3, 2026
-    sys.executable,
-    '-m',
-    'vllm.entrypoints.openai.api_server',
+    self._vllm_executable,
damccorm (author) commented on this change:
Changing this doesn't work on its own because the dynamo vllm executable doesn't include an API server. As a result, running this produces:

error: unrecognized arguments: --port 48455

So I'll need a different approach.

damccorm (author) commented:

I think I'll need to replicate something like https://github.com/ai-dynamo/dynamo?tab=readme-ov-file#run-dynamo instead
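As a rough sketch of that direction: the tracebacks later in this thread show the worker entry point being run as `python -m dynamo.vllm`, so the launch could look something like the following. The `--model` flag and the pass-through of extra arguments are assumptions that would need to be checked against Dynamo's actual CLI.

```python
import subprocess
import sys

def build_dynamo_cmd(model, extra_args=None):
    # The tracebacks in this thread show the worker running as
    # `python -m dynamo.vllm`, so launch it the same way. The `--model`
    # flag is an assumption to verify against dynamo's CLI help.
    cmd = [sys.executable, '-m', 'dynamo.vllm', '--model', model]
    if extra_args:
        cmd.extend(extra_args)
    return cmd

cmd = build_dynamo_cmd('facebook/opt-125m')
# proc = subprocess.Popen(cmd)  # would also need NATS/etcd running (or flags to disable them)
```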

damccorm (author) commented:

I made this update, and now I'm successfully starting up a model endpoint (HTTP Request: GET http://localhost:52921/v1/models "HTTP/1.1 200 OK"). However, I'm now running into a new problem:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/__main__.py", line 7, in <module>
    main()
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 820, in main
    uvloop.run(worker())
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 67, in worker
    runtime = DistributedRuntime(
              ^^^^^^^^^^^^^^^^^^^
Exception: Failed to connect to NATS: IO error: Connection refused (os error 111). Verify NATS server is running and accessible.
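A quick way to confirm the connection refusal from inside a worker is a plain TCP probe. NATS listens on port 4222 by default (an assumption here, since the log doesn't show the port):

```python
import socket

def port_open(host, port, timeout=1.0):
    # Plain TCP probe: a refused connection here matches the
    # "Connection refused (os error 111)" in the exception above.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# NATS defaults to 4222 (assumed; verify against the deployment config).
# reachable = port_open('localhost', 4222)
```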

https://console.cloud.google.com/dataflow/jobs/us-central1/2026-02-11_07_08_48-18398043110228237613

I think that this is called out in https://github.com/ai-dynamo/dynamo?tab=readme-ov-file#run-dynamo and I can avoid NATS entirely with --kv-events-config '{"enable_kv_cache_events": false}', but I've had a little trouble getting that right so far
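One common way this flag goes wrong is shell quoting of the JSON value; building it programmatically sidesteps that. This is just a sketch of assembling the argument, not a confirmed fix:

```python
import json

# Build the --kv-events-config value as real JSON instead of hand-quoting
# it on the command line; mis-quoted JSON is silently mangled when the
# command is assembled as a string rather than a list of arguments.
kv_events_config = json.dumps({"enable_kv_cache_events": False})
extra_args = ['--kv-events-config', kv_events_config]
```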

damccorm (author) commented:

I solved that piece, but am still running into issues:

{"job":"2026-02-11_07_49_28-348318588552584172", "logger":"/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.12/site-packages/apache_beam/ml/inference/vllm_inference.py:84", "portability_worker_id":"sdk-0-0_sibling_2", "thread":"Thread-91 (log_stdout)", "worker":"beamapp-dannymccormick-02-02110749-wdvx-harness-qpdp"}
thread '<unnamed>' panicked at /opt/dynamo/lib/runtime/src/storage/kv.rs:440:29:
called `Result::unwrap()` on an `Err` value: BuildError(Unable to create lease. Check etcd server status at http://localhost:2379
Caused by:
grpc request error: status: 'The service is currently unavailable', self: "tcp connect error")
2026-02-11T16:22:03.347934Z ERROR runners._cancel_all_tasks: unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-4' coro=<VllmEngineMonitor._check_engine_health() done, defined at /usr/local/lib/python3.12/dist-packages/dynamo/vllm/engine_monitor.py:68> exception=PanicException('called `Result::unwrap()` on an `Err` value: BuildError(Unable to create lease. Check etcd server status at http://localhost:2379\n\nCaused by:\n grpc request error: status: \'The service is currently unavailable\', self: "tcp connect error")')>
Traceback (most recent call last):
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 117, in worker
    await init(runtime, config)
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 578, in init
    await register_vllm_model(
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 370, in register_vllm_model
    await register_llm(
Exception: unable to extract tokenizer kind from directory /root/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/engine_monitor.py", line 71, in _check_engine_health
    await self.engine_client.check_health()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 734, in check_health
    raise self.dead_error
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/engine_monitor.py", line 78, in _check_engine_health
    self.runtime.shutdown()
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: BuildError(Unable to create lease. Check etcd server status at http://localhost:2379
Caused by:
grpc request error: status: 'The service is currently unavailable', self: "tcp connect error")
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/__main__.py", line 7, in <module>
    main()
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 820, in main
    uvloop.run(worker())
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 117, in worker
    await init(runtime, config)
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 578, in init
    await register_vllm_model(
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 370, in register_vllm_model
    await register_llm(
Exception: unable to extract tokenizer kind from directory /root/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6

Not sure what is going on yet
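One possible first diagnostic: the error suggests `register_llm` couldn't classify the tokenizer from the snapshot directory, and facebook/opt-125m ships GPT-2-style `vocab.json`/`merges.txt` files rather than a `tokenizer.json`, which may be what trips it up. A small sketch to see which files the snapshot actually contains (the candidate list is a guess at what a loader might probe for):

```python
from pathlib import Path

def list_tokenizer_files(snapshot_dir):
    # Which files register_llm actually probes for is unknown; this
    # candidate list is just the usual Hugging Face tokenizer artifacts.
    candidates = [
        'tokenizer.json', 'tokenizer_config.json',
        'vocab.json', 'merges.txt', 'special_tokens_map.json',
    ]
    root = Path(snapshot_dir)
    return {name: (root / name).is_file() for name in candidates}
```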
