
fix serve_multiplexed_model_id and mm data process#120

Merged
Yunnglin merged 13 commits into main from fix_request
Mar 24, 2026

Conversation

@Yunnglin
Collaborator

@Yunnglin Yunnglin commented Mar 19, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

This pull request introduces a new client script for multimodal training with Tinker, updates existing client examples, and significantly refactors the Twinkle server's resource lifecycle management to be asynchronous. This refactoring involves converting numerous synchronous methods for session, model, adapter, and processor management to async/await patterns, consolidating common logic into a new SessionResourceMixin, and updating all corresponding calls throughout the server and client components. A patch for Ray Serve is also included to correctly handle HTTP header normalization for multiplexed model IDs. A review comment suggests refactoring duplicated logic for packing multimodal tensors into loss_fn_inputs within the input_feature_to_datum function for improved maintainability.
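The description mentions packing multimodal tensors into loss_fn_inputs inside input_feature_to_datum. As a hedged illustration of that idea (the helper name, key names, and shapes below are assumptions, not the repo's actual implementation), the pattern is to move multimodal tensors out of the flat feature dict into the loss_fn_inputs mapping so they travel together through the Tinker↔Twinkle bridge:

```python
# Illustrative sketch only: pack_mm_into_loss_fn_inputs and the key names
# "pixel_values" / "image_grid_thw" are hypothetical stand-ins for the
# repo's actual input_feature_to_datum logic.
def pack_mm_into_loss_fn_inputs(feature: dict,
                                mm_keys=("pixel_values", "image_grid_thw")) -> dict:
    """Move multimodal tensors from the feature dict into loss_fn_inputs."""
    loss_fn_inputs = dict(feature.get("loss_fn_inputs", {}))
    for key in mm_keys:
        if key in feature:
            # Relocate the tensor so it is carried alongside loss inputs.
            loss_fn_inputs[key] = feature.pop(key)
    return {**feature, "loss_fn_inputs": loss_fn_inputs}
```

The review comment's point is that when this packing logic appears in more than one branch, it should be factored into a single helper like the one above.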

Experiment results

Paste your experiment result here (if needed).

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the Twinkle server's handling of the serve_multiplexed_model_id header, which was being inconsistently interpreted due to HTTP proxy normalization. The core solution involves introducing a robust patching mechanism for Ray Serve to correctly process these headers, alongside standardizing the header's naming convention across the codebase. Additionally, it includes minor bug fixes for model output access and adds request lifecycle hooks to various API endpoints for better extensibility and monitoring.

Highlights

  • Ray Serve Header Normalization: Implemented a patching mechanism for Ray Serve to correctly handle HTTP header normalization, specifically converting hyphens to underscores for the Serve-Multiplexed-Model-Id header, which was previously causing issues with proxies.
  • Consistent Header Naming: Standardized the serve_multiplexed_model_id header key to Serve-Multiplexed-Model-Id across both the server gateway and the HTTP client utilities for improved consistency and compatibility.
  • Ray Initialization with Patches: Updated the Ray initialization process to ensure that the newly introduced Ray Serve patches are applied not only in the main process but also in all worker processes via runtime_env configuration.
  • Robust Model Output Handling: Enhanced the transformers_model backend to safely access 'logits' from model outputs using .get() with a None check, preventing potential KeyError exceptions.
  • API Request Lifecycle Hooks: Integrated _on_request_start calls into numerous API endpoints within twinkle_handlers.py to allow for consistent pre-request processing.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an issue with serve_multiplexed_model_id by correcting the HTTP header name to the canonical Serve-Multiplexed-Model-Id format and introducing a patch for Ray Serve to handle header normalization from proxies. The changes also improve robustness by using .get() for dictionary access and add a request start hook to several endpoints. My review focuses on improving maintainability by refactoring duplicated code in the endpoint handlers. The core fix and the patching mechanism appear solid.
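The ".get() for dictionary access" change the review mentions follows a standard defensive pattern. A minimal sketch, assuming a dict-like model output (the function name and error message are illustrative, not the repo's actual transformers_model code):

```python
# Hypothetical sketch of defensive output access: prefer .get() plus an
# explicit None check over outputs["logits"], so a missing key produces
# a clear error instead of a bare KeyError.
def extract_logits(outputs: dict):
    logits = outputs.get("logits")
    if logits is None:
        raise ValueError("model outputs did not include 'logits'")
    return logits
```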

@Yunnglin
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant set of changes, primarily focused on fixing an HTTP header normalization issue with serve_multiplexed_model_id by patching Ray Serve. It also includes a major and beneficial refactoring of the server state management and resource lifecycle from synchronous to asynchronous, which improves the architecture.

My review has identified a critical bug in the new lifecycle management for processors that will cause a TypeError at runtime. I've also pointed out a high-severity issue regarding a potential resource leak in the sampler application due to the removal of adapter lifecycle management. Additionally, I've suggested a medium-severity refactoring to reduce code duplication in the model endpoint handlers. Please address the critical and high-severity issues before merging.

@Yunnglin
Collaborator Author

/gemini review

@Yunnglin Yunnglin changed the title from "fix serve_multiplexed_model_id" to "fix serve_multiplexed_model_id and datum process" on Mar 24, 2026
@Yunnglin Yunnglin marked this pull request as ready for review March 24, 2026 07:45
Copilot AI review requested due to automatic review settings March 24, 2026 07:45
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new client script for multimodal training with Tinker, updates existing client examples, and significantly refactors the Twinkle server's resource lifecycle management to be asynchronous. This refactoring involves converting numerous synchronous methods for session, model, adapter, and processor management to async/await patterns, consolidating common logic into a new SessionResourceMixin, and updating all corresponding calls throughout the server and client components. A patch for Ray Serve is also included to correctly handle HTTP header normalization for multiplexed model IDs. A review comment suggests refactoring duplicated logic for packing multimodal tensors into loss_fn_inputs within the input_feature_to_datum function for improved maintainability.


Copilot AI left a comment


Pull request overview

This PR fixes Ray Serve multiplexed model header handling and refactors server-side lifecycle/serialization utilities to improve compatibility between Twinkle and Tinker flows (including multimodal datum processing).

Changes:

  • Standardize multiplexing header usage (Serve-Multiplexed-Model-Id) and add a Ray Serve proxy patch to normalize headers across proxies.
  • Convert ServerState/ServerStateProxy APIs to async and propagate await usage across gateway/model/sampler handlers.
  • Consolidate adapter/processor lifecycle management into a shared async SessionResourceMixin, and unify server-side deserialization via twinkle_client.common.serialize.
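The sync-to-async conversion described above follows a common pattern: state-proxy methods become coroutines so calls that cross an actor or RPC boundary can be awaited without blocking the event loop. A minimal sketch under assumed names (StateProxy, register_resource, get_resource are illustrative; the real ServerStateProxy and SessionResourceMixin APIs differ):

```python
import asyncio

class StateProxy:
    """Hypothetical session-scoped resource store with async accessors."""

    def __init__(self):
        self._sessions: dict = {}

    async def register_resource(self, session_id: str, name: str, value) -> None:
        # Async so a real implementation could await an actor round-trip here.
        self._sessions.setdefault(session_id, {})[name] = value

    async def get_resource(self, session_id: str, name: str):
        return self._sessions.get(session_id, {}).get(name)

async def main():
    proxy = StateProxy()
    # Callers must now await every state access, as the handlers do after
    # this refactor.
    await proxy.register_resource("s1", "adapter", "lora-a")
    return await proxy.get_resource("s1", "adapter")

result = asyncio.run(main())
```

The maintainability gain is that every handler touches session resources through one awaited interface instead of mixing blocking and non-blocking access paths.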

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.

Show a summary per file

  • src/twinkle_client/types/server.py: Adds capability response models (SupportedModel, GetServerCapabilitiesResponse).
  • src/twinkle_client/types/__init__.py: Re-exports the new server capability response types.
  • src/twinkle_client/manager.py: Adds a get_server_capabilities() client method.
  • src/twinkle_client/http/http_utils.py: Switches the multiplex header key to Serve-Multiplexed-Model-Id.
  • src/twinkle_client/common/serialize.py: Adds deserialize helpers and data_slice deserialization for client/server interchange.
  • src/twinkle_client/__init__.py: Adds a TYPE_CHECKING import for type-only references.
  • src/twinkle/server/utils/state/server_state.py: Converts state/proxy APIs to async and updates cleanup-loop awaits.
  • src/twinkle/server/utils/ray_serve_patch.py: Introduces a Ray Serve proxy patch for header normalization (hyphen/underscore).
  • src/twinkle/server/utils/processor_manager.py: Removes the old processor lifecycle mixin (replaced by the lifecycle package).
  • src/twinkle/server/utils/lifecycle/processor.py: New processor lifecycle mixin built atop SessionResourceMixin.
  • src/twinkle/server/utils/lifecycle/base.py: New shared async lifecycle base for session-bound resources.
  • src/twinkle/server/utils/lifecycle/adapter.py: New adapter lifecycle mixin built atop SessionResourceMixin.
  • src/twinkle/server/utils/lifecycle/__init__.py: Exposes the lifecycle mixins via the package API.
  • src/twinkle/server/utils/adapter_manager.py: Removes the old adapter lifecycle mixin (replaced by the lifecycle package).
  • src/twinkle/server/utils/__init__.py: Re-exports lifecycle mixins from the new package.
  • src/twinkle/server/sampler/twinkle_handlers.py: Uses the shared client deserializer; removes sampler-side adapter lifecycle registration.
  • src/twinkle/server/sampler/tinker_handlers.py: Updates sampling-session state access to await async proxy methods.
  • src/twinkle/server/sampler/app.py: Removes AdapterManagerMixin usage from the sampler service and simplifies init args.
  • src/twinkle/server/processor/twinkle_handlers.py: Switches to the shared client deserializer; uses register_resource/assert_resource_exists.
  • src/twinkle/server/processor/app.py: Uses the new lifecycle mixin and lazy-starts the countdown on the first sticky request.
  • src/twinkle/server/model/twinkle_handlers.py: Uses the shared client deserializer; converts adapter existence checks to resource checks; preserves HTTPException codes.
  • src/twinkle/server/model/tinker_handlers.py: Awaits async ServerStateProxy methods; migrates adapter state storage to resource state.
  • src/twinkle/server/model/backends/transformers_model.py: Makes logits handling resilient when logits is absent from outputs.
  • src/twinkle/server/model/app.py: Migrates to the new lifecycle mixin, lazy replica registration, and async adapter cleanup/expiry.
  • src/twinkle/server/launcher.py: Applies the Ray Serve patch and initializes Ray with a runtime-env hook for worker processes.
  • src/twinkle/server/gateway/twinkle_gateway_handlers.py: Adds /twinkle/get_server_capabilities; updates state calls to await.
  • src/twinkle/server/gateway/tinker_gateway_handlers.py: Updates state calls to await; fixes base-model lookup awaiting; normalizes capability response types.
  • src/twinkle/server/gateway/server.py: Switches supported-model normalization to twinkle_client.types and makes base-model lookup async.
  • src/twinkle/server/gateway/proxy.py: Renames the forwarded multiplex header to Serve-Multiplexed-Model-Id.
  • src/twinkle/server/common/serialize.py: Removes the server-local serialize module (now using the client-side common serialize).
  • src/twinkle/server/common/datum.py: Adds packing/unpacking of multimodal tensors via loss_fn_inputs for the Tinker↔Twinkle bridge.
  • src/twinkle/server/common/__init__.py: Stops exporting serialize helpers from twinkle.server.common.
  • cookbook/client/twinkle/self_host/self_congnition.py: Demonstrates querying server capabilities and uses shared base-model config vars.
  • cookbook/client/twinkle/self_host/multi_modal.py: New example: multimodal LoRA training via the Twinkle client.
  • cookbook/client/tinker/self_host/sample.py: Removes a stop-token config line from the sample script.
  • cookbook/client/tinker/self_host/multi_modal.py: New example: multimodal training via the Tinker client using Twinkle dataset components.

@Yunnglin Yunnglin changed the title from "fix serve_multiplexed_model_id and datum process" to "fix serve_multiplexed_model_id and mm data process" on Mar 24, 2026
@Yunnglin Yunnglin merged commit 4bfed40 into main Mar 24, 2026
2 of 4 checks passed
@Yunnglin Yunnglin deleted the fix_request branch March 24, 2026 08:28