Fix proxy #87

Merged
Yunnglin merged 9 commits into main from fix/proxy on Mar 2, 2026

Yunnglin (Collaborator) commented on Mar 1, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Write the detailed information about this PR here.

Experiment results

Paste your experiment results here (if needed).

Copilot AI review requested due to automatic review settings March 1, 2026 06:32
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the internal proxying mechanism within the server by introducing a dedicated ServiceProxy class, centralizing request routing logic. It standardizes the request identification header used for sticky sessions, ensuring consistency across client and server components. Additionally, server module imports have been optimized through lazy loading, and a minor configuration change was made to the ModelManagement service's replica settings.

Highlights

  • Dedicated Service Proxy Implementation: Introduced a new ServiceProxy class to centralize and improve internal request routing for model and sampler services, handling URL construction, header management, and error handling.
  • Standardized Request ID Header: Updated the header used for sticky sessions and request identification from serve_multiplexed_model_id to X-Ray-Serve-Request-Id across server-side validation, client-side header building, and documentation.
  • Server Component Lazy Loading Refactoring: Refactored server-side __init__.py files to utilize _LazyModule for lazy loading of build_*_app functions, improving import performance and module organization.
  • Configuration Adjustment: Reduced the minimum and maximum replica counts for the ModelManagement service in the server_config.yaml from 2 to 1.
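The ServiceProxy source is not reproduced on this page, so the following is only a minimal sketch of the two pure steps the highlights describe: building an internal URL from http_options and choosing which headers to forward. It assumes http_options is a dict with Ray Serve-style host and port keys, and that a wildcard bind address is replaced by loopback (the Gemini review notes that internal requests are routed to localhost). The helper names build_internal_url and forward_headers are hypothetical, not the project's API.

```python
from urllib.parse import urlunsplit

# Hop-by-hop headers only describe a single connection, so a proxy should not
# forward them. Host and Content-Length are also dropped so the outgoing HTTP
# client recomputes them for the new target. X-Ray-Serve-Request-Id is not in
# this set, so the sticky-session ID passes through unchanged.
_DROP_HEADERS = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade", "host", "content-length",
}


def build_internal_url(http_options: dict, route_prefix: str, path: str) -> str:
    """Build an internal URL (assumes http_options has 'host' and 'port' keys)."""
    host = str(http_options.get("host") or "127.0.0.1")
    # A wildcard bind address is not a valid destination; use loopback instead.
    if host in ("0.0.0.0", "::"):
        host = "127.0.0.1"
    # A bare IPv6 literal must be bracketed inside a URL authority.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    port = int(http_options.get("port", 8000))
    # Join prefix and path with exactly one slash between non-empty segments.
    parts = [p.strip("/") for p in (route_prefix, path)]
    full_path = "/" + "/".join(p for p in parts if p)
    return urlunsplit(("http", f"{host}:{port}", full_path, "", ""))


def forward_headers(incoming: dict) -> dict:
    """Copy end-to-end headers from the incoming request, dropping hop-by-hop ones."""
    return {k: v for k, v in incoming.items() if k.lower() not in _DROP_HEADERS}
```

Actually sending the request (server.py previously used httpx.AsyncClient directly) and mapping upstream errors back to the caller are left out, since the changelog does not show how ServiceProxy handles them.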

Changelog
  • cookbook/client/tinker/custom_service/megatron/server_config.yaml
    • Updated min_replicas and max_replicas for ModelManagement deployment from 2 to 1.
  • docs/source_en/Usage Guide/Server and Client/Tinker-Compatible-Client.md
    • Updated documentation to reflect the change in authentication header from serve_multiplexed_model_id to X-Ray-Serve-Request-Id.
  • docs/source_zh/使用指引/服务端和客户端/Tinker兼容客户端.md
    • Updated Chinese documentation to reflect the change in authentication header from serve_multiplexed_model_id to X-Ray-Serve-Request-Id.
  • src/twinkle/server/__init__.py
    • Removed direct imports and __all__ entries for build_model_app, build_processor_app, build_sampler_app, and build_server_app.
  • src/twinkle/server/launcher.py
    • Adjusted import paths for build_*_app functions to reflect their new location within twinkle.server.twinkle.
    • Added logic to pass http_options to server applications for internal proxy routing.
  • src/twinkle/server/tinker/__init__.py
    • Refactored module imports to use _LazyModule for lazy loading of model, sampler, and server components.
  • src/twinkle/server/tinker/common/datum.py
    • Removed unused List, Union imports.
    • Added handling for list type when converting labels_raw to a NumPy array.
  • src/twinkle/server/tinker/model.py
    • Imported wrap_builder_with_device_group_env.
    • Applied wrap_builder_with_device_group_env to build_model_app.
  • src/twinkle/server/tinker/proxy.py
    • Added new file proxy.py containing the ServiceProxy class for internal HTTP request routing.
  • src/twinkle/server/tinker/sampler.py
    • Imported wrap_builder_with_device_group_env.
    • Applied wrap_builder_with_device_group_env to build_sampler_app.
  • src/twinkle/server/tinker/server.py
    • Removed direct httpx and logging imports, and updated logger initialization.
    • Added http_options parameter to build_server_app.
    • Replaced internal httpx.AsyncClient and proxy methods with an instance of the new ServiceProxy.
    • Updated all model and sampler endpoint handlers to use the ServiceProxy for request forwarding.
    • Passed http_options to the TinkerCompatServer initialization.
  • src/twinkle/server/twinkle/__init__.py
    • Refactored module imports to use _LazyModule for lazy loading of model, processor, sampler, and server components.
  • src/twinkle/server/utils/validation.py
    • Updated the expected request header for sticky sessions from serve_multiplexed_model_id to X-Ray-Serve-Request-Id.
  • src/twinkle_client/http/http_utils.py
    • Changed the header key for request ID from serve_multiplexed_model_id to X-Ray-Serve-Request-Id.
  • src/twinkle_client/utils/patch_tinker.py
    • Updated the header key for request ID from serve_multiplexed_model_id to X-Ray-Serve-Request-Id in patched service client initialization.
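The changelog does not show how _LazyModule is constructed, so the following swaps in a small hand-written equivalent: a types.ModuleType subclass whose __getattr__ runs only when normal attribute lookup fails. Each exported name maps to the module that defines it, and that module is imported the first time the name is accessed. The class name LazyModule and its constructor arguments are assumptions for illustration.

```python
import importlib
import sys
import types


class LazyModule(types.ModuleType):
    """Module whose public names are imported from other modules on first access."""

    def __init__(self, name: str, attr_to_module: dict):
        super().__init__(name)
        # Maps an exported name (e.g. a build_*_app function) to its defining module.
        self._attr_to_module = dict(attr_to_module)
        self.__all__ = list(self._attr_to_module)

    def __getattr__(self, attr: str):
        # Only reached when the name is not yet in the module's __dict__.
        if attr not in self._attr_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(self._attr_to_module[attr])
        value = getattr(module, attr)
        # Cache the resolved object so later lookups bypass __getattr__.
        setattr(self, attr, value)
        return value

    def __dir__(self):
        return sorted(set(super().__dir__()) | set(self._attr_to_module))


# Demo with stdlib modules; registering in sys.modules makes `import lazy_demo` work.
demo = LazyModule("lazy_demo", {"dumps": "json", "sqrt": "math"})
sys.modules["lazy_demo"] = demo
```

In a package such as twinkle.server.twinkle, the __init__.py could end with something like sys.modules[__name__] = LazyModule(__name__, {"build_model_app": "twinkle.server.twinkle.model"}), where that module path is hypothetical. Importing the package then stays cheap until a build_*_app function is actually used.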

gemini-code-assist (bot) left a comment

Code Review

This pull request effectively addresses a proxying issue by ensuring internal requests are correctly routed to localhost. The refactoring of proxy logic into a dedicated ServiceProxy class is a significant improvement for code clarity and maintainability. The change of the sticky session header to X-Ray-Serve-Request-Id is consistently applied across the client, server, and documentation. Additionally, the introduction of lazy loading for modules is a good optimization. I have one suggestion to improve code consistency.

Copilot AI left a comment

Pull request overview

This PR updates Twinkle’s client/server request routing to improve sticky-session request ID handling and internal proxying for the Tinker-compatible server, aiming to avoid issues caused by proxies and to reduce import overhead.

Changes:

  • Switch client + server sticky-session header from serve_multiplexed_model_id to X-Ray-Serve-Request-Id.
  • Refactor Tinker server proxying into a dedicated ServiceProxy that routes internally using http_options.
  • Introduce lazy imports for twinkle.server.twinkle and twinkle.server.tinker (and adjust launcher imports accordingly).
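Both sides of the header switch can be sketched with plain dictionaries. The helper names below are hypothetical, and the 400 status for a missing header is an assumption about what the validation middleware returns; the sketch only illustrates that clients now send X-Ray-Serve-Request-Id and that the server should match it case-insensitively, since HTTP header names are case-insensitive.

```python
import uuid
from typing import Mapping, Optional, Tuple

# Sticky-session header name introduced by this PR.
REQUEST_ID_HEADER = "X-Ray-Serve-Request-Id"


def build_client_headers(request_id: Optional[str] = None) -> dict:
    """Client side (hypothetical helper): attach the request ID, generating one if absent."""
    return {REQUEST_ID_HEADER: request_id or uuid.uuid4().hex}


def get_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Server side: look the header up case-insensitively and ignore empty values."""
    wanted = REQUEST_ID_HEADER.lower()
    for key, value in headers.items():
        if key.lower() == wanted and value:
            return value
    return None


def validate_sticky_session(headers: Mapping[str, str]) -> Tuple[int, str]:
    """Return (status_code, detail); the 400 status is an assumed choice."""
    request_id = get_request_id(headers)
    if request_id is None:
        return 400, f"Missing required header: {REQUEST_ID_HEADER}"
    return 200, request_id
```

Under this sketch, a client that still sends only serve_multiplexed_model_id is rejected, which is consistent with the PR updating the client HTTP utilities, the patched Tinker client, and both language versions of the documentation together.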

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Summary per file

| File | Description |
| --- | --- |
| src/twinkle_client/utils/patch_tinker.py | Updates patched Tinker client headers to send X-Ray-Serve-Request-Id. |
| src/twinkle_client/http/http_utils.py | Updates default HTTP headers to send X-Ray-Serve-Request-Id. |
| src/twinkle/server/utils/validation.py | Middleware now requires X-Ray-Serve-Request-Id for sticky sessions. |
| src/twinkle/server/twinkle/__init__.py | Converts to _LazyModule-based lazy imports for server builders. |
| src/twinkle/server/tinker/server.py | Uses ServiceProxy; adds http_options, plumbed from the launcher, for internal routing. |
| src/twinkle/server/tinker/sampler.py | Wraps build_sampler_app with the device-group env helper within the module. |
| src/twinkle/server/tinker/proxy.py | New internal proxy module handling URL building, header forwarding, and request forwarding. |
| src/twinkle/server/tinker/model.py | Wraps build_model_app with the device-group env helper within the module. |
| src/twinkle/server/tinker/common/datum.py | Adds list handling when converting labels to NumPy arrays. |
| src/twinkle/server/tinker/__init__.py | Converts to _LazyModule-based lazy imports for Tinker builders. |
| src/twinkle/server/launcher.py | Imports Twinkle builders from twinkle.server.twinkle and passes http_options into apps. |
| src/twinkle/server/__init__.py | Removes re-exports of build_* app builders (leaving launcher exports). |
| cookbook/client/tinker/custom_service/megatron/server_config.yaml | Changes ModelManagement autoscaling replicas from 2→1. |

Yunnglin merged commit 2c24077 into main on Mar 2, 2026
2 of 4 checks passed

Labels

None yet

Projects

None yet


3 participants