
fix serve_multiplexed_model_id and mm data process#120

Merged
Yunnglin merged 13 commits into main from fix_request
Mar 24, 2026

Conversation

@Yunnglin
Collaborator

@Yunnglin Yunnglin commented Mar 19, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

This pull request introduces a new client script for multimodal training with Tinker, updates existing client examples, and significantly refactors the Twinkle server's resource lifecycle management to be asynchronous. This refactoring involves converting numerous synchronous methods for session, model, adapter, and processor management to async/await patterns, consolidating common logic into a new SessionResourceMixin, and updating all corresponding calls throughout the server and client components. A patch for Ray Serve is also included to correctly handle HTTP header normalization for multiplexed model IDs. A review comment suggests refactoring duplicated logic for packing multimodal tensors into loss_fn_inputs within the input_feature_to_datum function for improved maintainability.
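The description mentions packing multimodal tensors into loss_fn_inputs inside input_feature_to_datum. As a hedged illustration of that idea (the helper name, key names, and shapes below are assumptions, not the repo's actual implementation), the pattern is to move multimodal tensors out of the flat feature dict into the loss_fn_inputs mapping so they travel together through the Tinker↔Twinkle bridge:

```python
# Illustrative sketch only: pack_mm_into_loss_fn_inputs and the key names
# "pixel_values" / "image_grid_thw" are hypothetical stand-ins for the
# repo's actual input_feature_to_datum logic.
def pack_mm_into_loss_fn_inputs(feature: dict,
                                mm_keys=("pixel_values", "image_grid_thw")) -> dict:
    """Move multimodal tensors from the feature dict into loss_fn_inputs."""
    loss_fn_inputs = dict(feature.get("loss_fn_inputs", {}))
    for key in mm_keys:
        if key in feature:
            # Relocate the tensor so it is carried alongside loss inputs.
            loss_fn_inputs[key] = feature.pop(key)
    return {**feature, "loss_fn_inputs": loss_fn_inputs}
```

The review comment's point is that when this packing logic appears in more than one branch, it should be factored into a single helper like the one above.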

Experiment results

Paste your experiment result here (if needed).

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the Twinkle server's handling of the serve_multiplexed_model_id header, which was being inconsistently interpreted due to HTTP proxy normalization. The core solution involves introducing a robust patching mechanism for Ray Serve to correctly process these headers, alongside standardizing the header's naming convention across the codebase. Additionally, it includes minor bug fixes for model output access and adds request lifecycle hooks to various API endpoints for better extensibility and monitoring.

Highlights

  • Ray Serve Header Normalization: Implemented a patching mechanism for Ray Serve to correctly handle HTTP header normalization, specifically converting hyphens to underscores for the Serve-Multiplexed-Model-Id header, which was previously causing issues with proxies.
  • Consistent Header Naming: Standardized the serve_multiplexed_model_id header key to Serve-Multiplexed-Model-Id across both the server gateway and the HTTP client utilities for improved consistency and compatibility.
  • Ray Initialization with Patches: Updated the Ray initialization process to ensure that the newly introduced Ray Serve patches are applied not only in the main process but also in all worker processes via runtime_env configuration.
  • Robust Model Output Handling: Enhanced the transformers_model backend to safely access 'logits' from model outputs using .get() with a None check, preventing potential KeyError exceptions.
  • API Request Lifecycle Hooks: Integrated _on_request_start calls into numerous API endpoints within twinkle_handlers.py to allow for consistent pre-request processing.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an issue with serve_multiplexed_model_id by correcting the HTTP header name to the canonical Serve-Multiplexed-Model-Id format and introducing a patch for Ray Serve to handle header normalization from proxies. The changes also improve robustness by using .get() for dictionary access and add a request start hook to several endpoints. My review focuses on improving maintainability by refactoring duplicated code in the endpoint handlers. The core fix and the patching mechanism appear solid.
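The ".get() for dictionary access" change the review mentions follows a standard defensive pattern. A minimal sketch, assuming a dict-like model output (the function name and error message are illustrative, not the repo's actual transformers_model code):

```python
# Hypothetical sketch of defensive output access: prefer .get() plus an
# explicit None check over outputs["logits"], so a missing key produces
# a clear error instead of a bare KeyError.
def extract_logits(outputs: dict):
    logits = outputs.get("logits")
    if logits is None:
        raise ValueError("model outputs did not include 'logits'")
    return logits
```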

@Yunnglin
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant set of changes, primarily focused on fixing an HTTP header normalization issue with serve_multiplexed_model_id by patching Ray Serve. It also includes a major and beneficial refactoring of the server state management and resource lifecycle from synchronous to asynchronous, which improves the architecture.

My review has identified a critical bug in the new lifecycle management for processors that will cause a TypeError at runtime. I've also pointed out a high-severity issue regarding a potential resource leak in the sampler application due to the removal of adapter lifecycle management. Additionally, I've suggested a medium-severity refactoring to reduce code duplication in the model endpoint handlers. Please address the critical and high-severity issues before merging.

@Yunnglin
Collaborator Author

/gemini review

@Yunnglin Yunnglin changed the title from "fix serve_multiplexed_model_id" to "fix serve_multiplexed_model_id and datum process" on Mar 24, 2026
@Yunnglin Yunnglin marked this pull request as ready for review March 24, 2026 07:45
Copilot AI review requested due to automatic review settings March 24, 2026 07:45
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new client script for multimodal training with Tinker, updates existing client examples, and significantly refactors the Twinkle server's resource lifecycle management to be asynchronous. This refactoring involves converting numerous synchronous methods for session, model, adapter, and processor management to async/await patterns, consolidating common logic into a new SessionResourceMixin, and updating all corresponding calls throughout the server and client components. A patch for Ray Serve is also included to correctly handle HTTP header normalization for multiplexed model IDs. A review comment suggests refactoring duplicated logic for packing multimodal tensors into loss_fn_inputs within the input_feature_to_datum function for improved maintainability.


Copilot AI left a comment


Pull request overview

This PR fixes Ray Serve multiplexed model header handling and refactors server-side lifecycle/serialization utilities to improve compatibility between Twinkle and Tinker flows (including multimodal datum processing).

Changes:

  • Standardize multiplexing header usage (Serve-Multiplexed-Model-Id) and add a Ray Serve proxy patch to normalize headers across proxies.
  • Convert ServerState/ServerStateProxy APIs to async and propagate await usage across gateway/model/sampler handlers.
  • Consolidate adapter/processor lifecycle management into a shared async SessionResourceMixin, and unify server-side deserialization via twinkle_client.common.serialize.
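The sync-to-async conversion described above follows a common pattern: state-proxy methods become coroutines so calls that cross an actor or RPC boundary can be awaited without blocking the event loop. A minimal sketch under assumed names (StateProxy, register_resource, get_resource are illustrative; the real ServerStateProxy and SessionResourceMixin APIs differ):

```python
import asyncio

class StateProxy:
    """Hypothetical session-scoped resource store with async accessors."""

    def __init__(self):
        self._sessions: dict = {}

    async def register_resource(self, session_id: str, name: str, value) -> None:
        # Async so a real implementation could await an actor round-trip here.
        self._sessions.setdefault(session_id, {})[name] = value

    async def get_resource(self, session_id: str, name: str):
        return self._sessions.get(session_id, {}).get(name)

async def main():
    proxy = StateProxy()
    # Callers must now await every state access, as the handlers do after
    # this refactor.
    await proxy.register_resource("s1", "adapter", "lora-a")
    return await proxy.get_resource("s1", "adapter")

result = asyncio.run(main())
```

The maintainability gain is that every handler touches session resources through one awaited interface instead of mixing blocking and non-blocking access paths.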

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.

Show a summary per file

  • src/twinkle_client/types/server.py: Adds capability response models (SupportedModel, GetServerCapabilitiesResponse).
  • src/twinkle_client/types/__init__.py: Re-exports the new server capability response types.
  • src/twinkle_client/manager.py: Adds a get_server_capabilities() client method.
  • src/twinkle_client/http/http_utils.py: Switches the multiplex header key to Serve-Multiplexed-Model-Id.
  • src/twinkle_client/common/serialize.py: Adds deserialize helpers and data_slice deserialization for client/server interchange.
  • src/twinkle_client/__init__.py: Adds a TYPE_CHECKING import for type-only references.
  • src/twinkle/server/utils/state/server_state.py: Converts state/proxy APIs to async and updates cleanup-loop awaits.
  • src/twinkle/server/utils/ray_serve_patch.py: Introduces a Ray Serve proxy patch for header normalization (hyphen/underscore).
  • src/twinkle/server/utils/processor_manager.py: Removes the old processor lifecycle mixin (replaced by the lifecycle package).
  • src/twinkle/server/utils/lifecycle/processor.py: New processor lifecycle mixin built atop SessionResourceMixin.
  • src/twinkle/server/utils/lifecycle/base.py: New shared async lifecycle base for session-bound resources.
  • src/twinkle/server/utils/lifecycle/adapter.py: New adapter lifecycle mixin built atop SessionResourceMixin.
  • src/twinkle/server/utils/lifecycle/__init__.py: Exposes the lifecycle mixins via the package API.
  • src/twinkle/server/utils/adapter_manager.py: Removes the old adapter lifecycle mixin (replaced by the lifecycle package).
  • src/twinkle/server/utils/__init__.py: Re-exports lifecycle mixins from the new package.
  • src/twinkle/server/sampler/twinkle_handlers.py: Uses the shared client deserializer; removes sampler-side adapter lifecycle registration.
  • src/twinkle/server/sampler/tinker_handlers.py: Updates sampling-session state access to await async proxy methods.
  • src/twinkle/server/sampler/app.py: Removes AdapterManagerMixin usage from the sampler service and simplifies init args.
  • src/twinkle/server/processor/twinkle_handlers.py: Switches to the shared client deserializer; uses register_resource/assert_resource_exists.
  • src/twinkle/server/processor/app.py: Uses the new lifecycle mixin and lazy-starts the countdown on the first sticky request.
  • src/twinkle/server/model/twinkle_handlers.py: Uses the shared client deserializer; converts adapter existence checks to resource checks; preserves HTTPException codes.
  • src/twinkle/server/model/tinker_handlers.py: Awaits async ServerStateProxy methods; migrates adapter state storage to resource state.
  • src/twinkle/server/model/backends/transformers_model.py: Makes logits handling resilient when logits is absent from outputs.
  • src/twinkle/server/model/app.py: Migrates to the new lifecycle mixin, lazy replica registration, and async adapter cleanup/expiry.
  • src/twinkle/server/launcher.py: Applies the Ray Serve patch and initializes Ray with a runtime-env hook for worker processes.
  • src/twinkle/server/gateway/twinkle_gateway_handlers.py: Adds /twinkle/get_server_capabilities; updates state calls to await.
  • src/twinkle/server/gateway/tinker_gateway_handlers.py: Updates state calls to await; fixes base-model lookup awaiting; normalizes capability response types.
  • src/twinkle/server/gateway/server.py: Switches supported-model normalization to twinkle_client.types and makes base-model lookup async.
  • src/twinkle/server/gateway/proxy.py: Renames the forwarded multiplex header to Serve-Multiplexed-Model-Id.
  • src/twinkle/server/common/serialize.py: Removes the server-local serialize module (now using the client-side common serialize).
  • src/twinkle/server/common/datum.py: Adds packing/unpacking of multimodal tensors via loss_fn_inputs for the Tinker↔Twinkle bridge.
  • src/twinkle/server/common/__init__.py: Stops exporting serialize helpers from twinkle.server.common.
  • cookbook/client/twinkle/self_host/self_congnition.py: Demonstrates querying server capabilities and uses shared base-model config vars.
  • cookbook/client/twinkle/self_host/multi_modal.py: New example: multimodal LoRA training via the Twinkle client.
  • cookbook/client/tinker/self_host/sample.py: Removes a stop-token config line from the sample script.
  • cookbook/client/tinker/self_host/multi_modal.py: New example: multimodal training via the Tinker client using Twinkle dataset components.

@Yunnglin Yunnglin changed the title from "fix serve_multiplexed_model_id and datum process" to "fix serve_multiplexed_model_id and mm data process" on Mar 24, 2026
@Yunnglin Yunnglin merged commit 4bfed40 into main Mar 24, 2026
2 of 4 checks passed
@Yunnglin Yunnglin deleted the fix_request branch March 24, 2026 08:28