[WIP] Add vllm Dynamo support #36966

Draft

damccorm wants to merge 5 commits into master from users/damccorm/dynamo

Conversation

@damccorm damccorm commented Dec 2, 2025

TODO - needs more testing and benchmarking


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch):

  • Build python source distribution and wheels
  • Python tests
  • Java tests
  • Go tests
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@github-actions github-actions bot added the python label Dec 2, 2025
github-actions bot commented Feb 1, 2026

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 1, 2026
damccorm commented Feb 2, 2026

not stale - planning on coming back to this

@github-actions github-actions bot removed the stale label Feb 3, 2026
-    sys.executable,
-    '-m',
-    'vllm.entrypoints.openai.api_server',
+    self._vllm_executable,
damccorm (author) commented on this change:
Changing this doesn't work on its own because the dynamo vllm executable doesn't include an API server. As a result, running this produces:

error: unrecognized arguments: --port 48455

So I'll need a different approach.

damccorm (author) commented:

I think I'll need to replicate something like https://github.com/ai-dynamo/dynamo?tab=readme-ov-file#run-dynamo instead
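As a rough sketch of that direction: the tracebacks later in this thread show the worker entry point being run as `python -m dynamo.vllm`, so the launch could look something like the following. The `--model` flag and the pass-through of extra arguments are assumptions that would need to be checked against Dynamo's actual CLI.

```python
import subprocess
import sys

def build_dynamo_cmd(model, extra_args=None):
    # The tracebacks in this thread show the worker running as
    # `python -m dynamo.vllm`, so launch it the same way. The `--model`
    # flag is an assumption to verify against dynamo's CLI help.
    cmd = [sys.executable, '-m', 'dynamo.vllm', '--model', model]
    if extra_args:
        cmd.extend(extra_args)
    return cmd

cmd = build_dynamo_cmd('facebook/opt-125m')
# proc = subprocess.Popen(cmd)  # would also need NATS/etcd running (or flags to disable them)
```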

damccorm (author) commented:

I made this update, and now I'm successfully starting up a model endpoint (HTTP Request: GET http://localhost:52921/v1/models "HTTP/1.1 200 OK"). However, I'm now running into a new problem:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/__main__.py", line 7, in <module>
    main()
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 820, in main
    uvloop.run(worker())
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 67, in worker
    runtime = DistributedRuntime(
              ^^^^^^^^^^^^^^^^^^^
Exception: Failed to connect to NATS: IO error: Connection refused (os error 111). Verify NATS server is running and accessible.
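A quick way to confirm the connection refusal from inside a worker is a plain TCP probe. NATS listens on port 4222 by default (an assumption here, since the log doesn't show the port):

```python
import socket

def port_open(host, port, timeout=1.0):
    # Plain TCP probe: a refused connection here matches the
    # "Connection refused (os error 111)" in the exception above.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# NATS defaults to 4222 (assumed; verify against the deployment config).
# reachable = port_open('localhost', 4222)
```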

https://console.cloud.google.com/dataflow/jobs/us-central1/2026-02-11_07_08_48-18398043110228237613

I think that this is called out in https://github.com/ai-dynamo/dynamo?tab=readme-ov-file#run-dynamo and I can avoid NATS entirely with --kv-events-config '{"enable_kv_cache_events": false}', but I've had a little trouble getting that right so far
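One common way this flag goes wrong is shell quoting of the JSON value; building it programmatically sidesteps that. This is just a sketch of assembling the argument, not a confirmed fix:

```python
import json

# Build the --kv-events-config value as real JSON instead of hand-quoting
# it on the command line; mis-quoted JSON is silently mangled when the
# command is assembled as a string rather than a list of arguments.
kv_events_config = json.dumps({"enable_kv_cache_events": False})
extra_args = ['--kv-events-config', kv_events_config]
```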

damccorm (author) commented:

I solved that piece, but am still running into issues:

{"job":"2026-02-11_07_49_28-348318588552584172", "logger":"/opt/apache/beam-venv/beam-venv-worker-sdk-0-0/lib/python3.12/site-packages/apache_beam/ml/inference/vllm_inference.py:84", "portability_worker_id":"sdk-0-0_sibling_2", "thread":"Thread-91 (log_stdout)", "worker":"beamapp-dannymccormick-02-02110749-wdvx-harness-qpdp"}
thread '<unnamed>' panicked at /opt/dynamo/lib/runtime/src/storage/kv.rs:440:29:
called `Result::unwrap()` on an `Err` value: BuildError(Unable to create lease. Check etcd server status at http://localhost:2379
Caused by:
grpc request error: status: 'The service is currently unavailable', self: "tcp connect error")
2026-02-11T16:22:03.347934Z ERROR runners._cancel_all_tasks: unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-4' coro=<VllmEngineMonitor._check_engine_health() done, defined at /usr/local/lib/python3.12/dist-packages/dynamo/vllm/engine_monitor.py:68> exception=PanicException('called `Result::unwrap()` on an `Err` value: BuildError(Unable to create lease. Check etcd server status at http://localhost:2379\n\nCaused by:\n grpc request error: status: \'The service is currently unavailable\', self: "tcp connect error")')>
Traceback (most recent call last):
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 117, in worker
    await init(runtime, config)
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 578, in init
    await register_vllm_model(
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 370, in register_vllm_model
    await register_llm(
Exception: unable to extract tokenizer kind from directory /root/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/engine_monitor.py", line 71, in _check_engine_health
    await self.engine_client.check_health()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 734, in check_health
    raise self.dead_error
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/engine_monitor.py", line 78, in _check_engine_health
    self.runtime.shutdown()
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: BuildError(Unable to create lease. Check etcd server status at http://localhost:2379
Caused by:
grpc request error: status: 'The service is currently unavailable', self: "tcp connect error")
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/__main__.py", line 7, in <module>
    main()
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 820, in main
    uvloop.run(worker())
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 117, in worker
    await init(runtime, config)
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 578, in init
    await register_vllm_model(
  File "/usr/local/lib/python3.12/dist-packages/dynamo/vllm/main.py", line 370, in register_vllm_model
    await register_llm(
Exception: unable to extract tokenizer kind from directory /root/.cache/huggingface/hub/models--facebook--opt-125m/snapshots/27dcfa74d334bc871f3234de431e71c6eeba5dd6

Not sure what is going on yet
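One possible first diagnostic: the error suggests `register_llm` couldn't classify the tokenizer from the snapshot directory, and facebook/opt-125m ships GPT-2-style `vocab.json`/`merges.txt` files rather than a `tokenizer.json`, which may be what trips it up. A small sketch to see which files the snapshot actually contains (the candidate list is a guess at what a loader might probe for):

```python
from pathlib import Path

def list_tokenizer_files(snapshot_dir):
    # Which files register_llm actually probes for is unknown; this
    # candidate list is just the usual Hugging Face tokenizer artifacts.
    candidates = [
        'tokenizer.json', 'tokenizer_config.json',
        'vocab.json', 'merges.txt', 'special_tokens_map.json',
    ]
    root = Path(snapshot_dir)
    return {name: (root / name).is_file() for name in candidates}
```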
