
Conversation

@guapisolo (Collaborator) commented:

Details are to be checked with the Miles router code owner.

@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @guapisolo, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Miles router by adding native support for OpenAI API-compatible chat completion requests. It introduces a new middleware layer and associated utility functions to seamlessly translate between OpenAI's request/response formats and the router's internal processing, alongside an example script to validate this new capability. The changes aim to broaden the router's interoperability and ease of use for developers accustomed to the OpenAI API standard.

Highlights

  • OpenAI Compatibility Middleware: Introduced a new OpenAICompatMiddleware that enables the Miles router to process OpenAI-style chat completion requests (/v1/chat/completions), translating them into the router's internal format and converting the responses back to OpenAI-compatible JSON (see the client sketch after this list).
  • Custom OpenAI Generate Function: Added a new Python file openai_generate.py that defines an asynchronous function to interact with the SGLang router, handling OpenAI-style messages and sampling parameters, and integrating with the radix tree post-processing.
  • Example Script for OpenAI Integration: Provided a new shell script run-qwen3-4B-openai.sh to demonstrate how to run the Qwen3-4B model with the new OpenAI compatibility middleware and custom generate function, validating the end-to-end OpenAI format pipeline.
  • Radix Tree Middleware Refactoring: Refactored the RadixTreeMiddleware to explicitly handle /generate and /retrieve_from_text paths, encapsulating their logic into dedicated methods (_generate and _retrieve_from_text). The postprocess_sample_with_radix_tree function was updated to accept raw text instead of an output dictionary.
  • OpenAI Utility Functions: Created a new utility module openai_utils.py containing helper functions for converting between SGLang's internal data structures and OpenAI's ChatCompletionRequest and ChatCompletionResponse protocols, facilitating seamless integration.
  • Router Route Management: Modified the main router to remove the direct retrieve_from_text route, as its functionality is now managed within the RadixTreeMiddleware for better modularity and middleware-driven request handling.
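
For context, a call against the new endpoint might look like the sketch below. The router address, port, model name, and sampling fields are illustrative placeholders, not values taken from this PR.

```python
# Hypothetical client call to the OpenAI-compatible route added by this PR.
# Host, port, and model id are assumptions for illustration only.
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",  # assumed router address
    json={
        "model": "Qwen/Qwen3-4B",  # placeholder model id
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
# The response is OpenAI-compatible JSON, so the usual fields apply.
print(resp.json()["choices"][0]["message"]["content"])
```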
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment:

Code Review

This pull request adds support for an OpenAI-compatible API endpoint to the Miles router. The changes include a new OpenAICompatMiddleware to handle /v1/chat/completions requests, utility functions for converting between OpenAI and internal formats, and an example usage script. My review identified two critical issues: a bug in openai_generate.py that duplicates response content, and another in OpenAICompatMiddleware that could lead to an UnboundLocalError. I've also pointed out several high- and medium-severity issues related to unsafe data access, potential logic bugs, and maintainability improvements such as avoiding broad exception catches and removing redundant code. Please address the critical issues before merging.

Comment on lines 56 to 65
```python
try:
    response = await self.router.client.post(url, json=generate_payload)
    response.raise_for_status()
    output = response.json()
finally:
    self.router._finish_url(worker_url)

model = chat_request.model or getattr(self.router.args, "hf_checkpoint", "default")
prompt_tokens = len(generate_payload.get("input_ids") or [])
return JSONResponse(content=build_chat_response(model, output, prompt_tokens))
```
**critical**

There is a potential UnboundLocalError here. If an exception is raised within the try block (e.g., from client.post or response.raise_for_status), the output variable will not be assigned. The finally block will execute, and then the code will attempt to use output on line 65, which will crash. The logic that depends on output should be moved inside the try block.

Suggested change

```diff
-try:
-    response = await self.router.client.post(url, json=generate_payload)
-    response.raise_for_status()
-    output = response.json()
-finally:
-    self.router._finish_url(worker_url)
-
-model = chat_request.model or getattr(self.router.args, "hf_checkpoint", "default")
-prompt_tokens = len(generate_payload.get("input_ids") or [])
-return JSONResponse(content=build_chat_response(model, output, prompt_tokens))
+try:
+    response = await self.router.client.post(url, json=generate_payload)
+    response.raise_for_status()
+    output = response.json()
+    model = chat_request.model or getattr(self.router.args, "hf_checkpoint", "default")
+    prompt_tokens = len(generate_payload.get("input_ids") or [])
+    return JSONResponse(content=build_chat_response(model, output, prompt_tokens))
+finally:
+    self.router._finish_url(worker_url)
```

Comment on lines 23 to 24
```python
choice = data["choices"][0]
content = choice["message"]["content"]
```
**high**

Directly accessing nested dictionary and list elements like data["choices"][0] is risky and can lead to IndexError or KeyError if the response structure from the server is not as expected. Consider using .get() with checks to handle potential malformed responses gracefully.
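
For illustration, a defensive version of that access could look like the following sketch; the names follow the snippet above, and raising ValueError is an assumption about how the caller wants failures surfaced:

```python
# Sketch: tolerate malformed upstream responses instead of crashing with
# KeyError/IndexError. The error type raised here is illustrative.
choices = data.get("choices") or []
if not choices:
    raise ValueError(f"malformed response: no 'choices' in {data!r}")
message = choices[0].get("message") or {}
content = message.get("content")
if content is None:
    raise ValueError("malformed response: missing 'message.content'")
```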

Comment on lines 12 to 14
```bash
sleep 3
pkill -9 ray || true
pkill -9 python || true
```
**medium**

This block of pkill commands is redundant. The commands pkill -9 ray and pkill -9 python are repeated from lines 10-11. One round of cleanup should be sufficient. Repeating them adds clutter and can hide issues with the initial cleanup process.

```python
) -> dict[str, Any]:
    try:
        chat_request = ChatCompletionRequest.model_validate(payload)
    except Exception as exc:  # noqa: BLE001
```
**medium**

Catching a broad Exception can hide bugs and make debugging difficult. It's better to catch more specific exceptions. Since ChatCompletionRequest is a Pydantic model, ChatCompletionRequest.model_validate(payload) will raise pydantic.ValidationError on failure. Catching this specific exception would be more robust.
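
A minimal sketch of the narrower catch; the JSONResponse error shape and import path are assumptions, chosen to match the middleware's existing use of JSONResponse:

```python
# Sketch: catch only Pydantic validation failures and surface them as a 400.
from fastapi.responses import JSONResponse  # assumed import path
from pydantic import ValidationError

try:
    chat_request = ChatCompletionRequest.model_validate(payload)
except ValidationError as exc:
    # Invalid request body; don't mask unrelated bugs with a broad except.
    return JSONResponse(status_code=400, content={"error": str(exc)})
```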

@guapisolo force-pushed the feat/openai_router branch 2 times, most recently from de122b1 to ac4483c on January 17, 2026 at 05:02.
@guapisolo changed the title from "Feat: OpenAI middleware support in miles router" to "Feat: OpenAI TITIO support by middlware" on Jan 17, 2026.
@guapisolo changed the title from "Feat: OpenAI TITIO support by middlware" to "Feat: OpenAI TITIO support by middleware" on Jan 17, 2026.