fix serve_multiplexed_model_id and mm data process #120
Conversation
Code Review
This pull request addresses an issue with serve_multiplexed_model_id by correcting the HTTP header name to the canonical Serve-Multiplexed-Model-Id format and introducing a patch for Ray Serve to handle header normalization from proxies. The changes also improve robustness by using .get() for dictionary access and add a request start hook to several endpoints. My review focuses on improving maintainability by refactoring duplicated code in the endpoint handlers. The core fix and the patching mechanism appear solid.
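As a hedged illustration of the two fixes mentioned above (the canonical header name plus `.get()`-style dictionary access), the sketch below shows one way a server could look up the multiplexed model ID tolerantly. The function name and the normalization rule are assumptions for illustration, not the PR's actual code:

```python
from __future__ import annotations

# Hypothetical sketch: proxies may rewrite header casing and swap
# hyphens for underscores, so compare on a normalized key instead of
# indexing the dict directly (which would raise KeyError).
CANONICAL = "Serve-Multiplexed-Model-Id"


def get_multiplexed_model_id(headers: dict) -> str | None:
    wanted = CANONICAL.lower().replace("-", "_")
    for key, value in headers.items():
        if key.lower().replace("-", "_") == wanted:
            return value
    # .get()-style behavior: return None rather than raising.
    return None
```

Both the proxy-rewritten form (`serve_multiplexed_model_id`) and the canonical hyphenated form resolve to the same model ID under this lookup.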
/gemini review
Code Review
This pull request introduces a significant set of changes, primarily focused on fixing an HTTP header normalization issue with serve_multiplexed_model_id by patching Ray Serve. It also includes a major and beneficial refactoring of the server state management and resource lifecycle from synchronous to asynchronous, which improves the architecture.
My review has identified a critical bug in the new lifecycle management for processors that will cause a TypeError at runtime. I've also pointed out a high-severity issue regarding a potential resource leak in the sampler application due to the removal of adapter lifecycle management. Additionally, I've suggested a medium-severity refactoring to reduce code duplication in the model endpoint handlers. Please address the critical and high-severity issues before merging.
/gemini review
Code Review
This pull request introduces a new client script for multimodal training with Tinker, updates existing client examples, and significantly refactors the Twinkle server's resource lifecycle management to be asynchronous. This refactoring involves converting numerous synchronous methods for session, model, adapter, and processor management to async/await patterns, consolidating common logic into a new SessionResourceMixin, and updating all corresponding calls throughout the server and client components. A patch for Ray Serve is also included to correctly handle HTTP header normalization for multiplexed model IDs. A review comment suggests refactoring duplicated logic for packing multimodal tensors into loss_fn_inputs within the input_feature_to_datum function for improved maintainability.
Pull request overview
This PR fixes Ray Serve multiplexed model header handling and refactors server-side lifecycle/serialization utilities to improve compatibility between Twinkle and Tinker flows (including multimodal datum processing).
Changes:
- Standardize multiplexing header usage (`Serve-Multiplexed-Model-Id`) and add a Ray Serve proxy patch to normalize headers across proxies.
- Convert `ServerState`/`ServerStateProxy` APIs to async and propagate `await` usage across gateway/model/sampler handlers.
- Consolidate adapter/processor lifecycle management into a shared async `SessionResourceMixin`, and unify server-side deserialization via `twinkle_client.common.serialize`.
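To make the lifecycle consolidation concrete, here is a minimal sketch of what a shared async session-resource mixin could look like. The method names `register_resource` and `assert_resource_exists` come from the file summaries below; the class body itself is a guess at shape, not the PR's implementation (which lives in src/twinkle/server/utils/lifecycle/base.py):

```python
import asyncio


class SessionResourceMixin:
    """Illustrative async base for session-bound resources
    (adapters, processors). Bodies are assumptions for sketching."""

    def __init__(self):
        self._resources = {}
        self._lock = asyncio.Lock()

    async def register_resource(self, resource_id: str, resource) -> None:
        # Serialize mutations so concurrent handlers don't race.
        async with self._lock:
            self._resources[resource_id] = resource

    async def assert_resource_exists(self, resource_id: str) -> None:
        async with self._lock:
            if resource_id not in self._resources:
                raise KeyError(f"unknown resource: {resource_id}")

    async def release_resource(self, resource_id: str) -> None:
        # Idempotent removal: releasing twice is not an error.
        async with self._lock:
            self._resources.pop(resource_id, None)
```

Adapter and processor mixins would then build their lifecycle (registration, existence checks, expiry cleanup) on top of this shared base instead of each keeping its own bookkeeping.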
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/twinkle_client/types/server.py | Adds capability response models (SupportedModel, GetServerCapabilitiesResponse). |
| src/twinkle_client/types/__init__.py | Re-exports new server capability response types. |
| src/twinkle_client/manager.py | Adds get_server_capabilities() client method. |
| src/twinkle_client/http/http_utils.py | Switches multiplex header key to Serve-Multiplexed-Model-Id. |
| src/twinkle_client/common/serialize.py | Adds deserialize helpers and data_slice deserialization for client/server interchange. |
| src/twinkle_client/__init__.py | Adds TYPE_CHECKING import for type-only references. |
| src/twinkle/server/utils/state/server_state.py | Converts state/proxy APIs to async and updates cleanup loop awaits. |
| src/twinkle/server/utils/ray_serve_patch.py | Introduces Ray Serve proxy patch for header normalization (hyphen/underscore). |
| src/twinkle/server/utils/processor_manager.py | Removes old processor lifecycle mixin (replaced by lifecycle package). |
| src/twinkle/server/utils/lifecycle/processor.py | New processor lifecycle mixin built atop SessionResourceMixin. |
| src/twinkle/server/utils/lifecycle/base.py | New shared async lifecycle base for session-bound resources. |
| src/twinkle/server/utils/lifecycle/adapter.py | New adapter lifecycle mixin built atop SessionResourceMixin. |
| src/twinkle/server/utils/lifecycle/__init__.py | Exposes lifecycle mixins via package API. |
| src/twinkle/server/utils/adapter_manager.py | Removes old adapter lifecycle mixin (replaced by lifecycle package). |
| src/twinkle/server/utils/__init__.py | Re-exports lifecycle mixins from the new package. |
| src/twinkle/server/sampler/twinkle_handlers.py | Uses shared client deserializer; removes sampler-side adapter lifecycle registration. |
| src/twinkle/server/sampler/tinker_handlers.py | Updates sampling-session state access to await async proxy methods. |
| src/twinkle/server/sampler/app.py | Removes AdapterManagerMixin usage from sampler service and simplifies init args. |
| src/twinkle/server/processor/twinkle_handlers.py | Switches to shared client deserializer; uses register_resource/assert_resource_exists. |
| src/twinkle/server/processor/app.py | Uses new lifecycle mixin and lazy-starts countdown on first sticky request. |
| src/twinkle/server/model/twinkle_handlers.py | Uses shared client deserializer; converts adapter existence checks to resource checks; preserves HTTPException codes. |
| src/twinkle/server/model/tinker_handlers.py | Awaits async ServerStateProxy methods; migrates adapter state storage to resource state. |
| src/twinkle/server/model/backends/transformers_model.py | Makes logits handling resilient when logits is absent from outputs. |
| src/twinkle/server/model/app.py | Migrates to new lifecycle mixin, lazy replica registration, async adapter cleanup/expiry. |
| src/twinkle/server/launcher.py | Applies Ray Serve patch and initializes Ray with runtime env hook for worker processes. |
| src/twinkle/server/gateway/twinkle_gateway_handlers.py | Adds /twinkle/get_server_capabilities; updates state calls to await. |
| src/twinkle/server/gateway/tinker_gateway_handlers.py | Updates state calls to await; fixes base-model lookup awaiting; normalizes capability response types. |
| src/twinkle/server/gateway/server.py | Switches supported model normalization to twinkle_client.types and makes base-model lookup async. |
| src/twinkle/server/gateway/proxy.py | Renames forwarded multiplex header to Serve-Multiplexed-Model-Id. |
| src/twinkle/server/common/serialize.py | Removes server-local serialize module (now using client-side common serialize). |
| src/twinkle/server/common/datum.py | Adds packing/unpacking of multimodal tensors via loss_fn_inputs for Tinker↔Twinkle bridge. |
| src/twinkle/server/common/__init__.py | Stops exporting serialize helpers from twinkle.server.common. |
| cookbook/client/twinkle/self_host/self_congnition.py | Demonstrates querying server capabilities and uses shared base model config vars. |
| cookbook/client/twinkle/self_host/multi_modal.py | New example: multimodal LoRA training via Twinkle client. |
| cookbook/client/tinker/self_host/sample.py | Removes stop token config line from sample script. |
| cookbook/client/tinker/self_host/multi_modal.py | New example: multimodal training via Tinker client using Twinkle dataset components. |
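The src/twinkle/server/common/datum.py row above refers to bridging multimodal tensors through `loss_fn_inputs`. A hedged sketch of that packing/unpacking round-trip follows; the `mm:` key prefix, the function names, and the `pixel_values` key are assumptions for illustration, not the PR's actual scheme:

```python
# Assumed namespace prefix keeping multimodal keys separate from
# ordinary loss-function inputs during the Tinker<->Twinkle round-trip.
MM_PREFIX = "mm:"


def pack_mm_tensors(loss_fn_inputs: dict, mm_tensors: dict) -> dict:
    """Fold multimodal tensors (e.g. pixel_values) into loss_fn_inputs
    so they survive serialization as a single datum field."""
    packed = dict(loss_fn_inputs)
    for name, tensor in mm_tensors.items():
        packed[MM_PREFIX + name] = tensor
    return packed


def unpack_mm_tensors(loss_fn_inputs: dict) -> tuple:
    """Split packed inputs back into (loss_fn_inputs, mm_tensors)."""
    plain, mm = {}, {}
    for key, value in loss_fn_inputs.items():
        if key.startswith(MM_PREFIX):
            mm[key[len(MM_PREFIX):]] = value
        else:
            plain[key] = value
    return plain, mm
```

Centralizing this pair in one place would also address the review comment about duplicated packing logic in `input_feature_to_datum`.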