
Conversation

@guapisolo (Collaborator) commented:

Details are to be checked with the Miles router code owner.

@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @guapisolo, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Miles router by adding native support for OpenAI API-compatible chat completion requests. It introduces a new middleware layer and associated utility functions to seamlessly translate between OpenAI's request/response formats and the router's internal processing, alongside an example script to validate this new capability. The changes aim to broaden the router's interoperability and ease of use for developers accustomed to the OpenAI API standard.

Highlights

  • OpenAI Compatibility Middleware: Introduced a new OpenAICompatMiddleware that enables the Miles router to process OpenAI-style chat completion requests (/v1/chat/completions), translating them into the router's internal format and converting the responses back to OpenAI-compatible JSON (see the client sketch after this list).
  • Custom OpenAI Generate Function: Added a new Python file openai_generate.py that defines an asynchronous function to interact with the SGLang router, handling OpenAI-style messages and sampling parameters, and integrating with the radix tree post-processing.
  • Example Script for OpenAI Integration: Provided a new shell script run-qwen3-4B-openai.sh to demonstrate how to run the Qwen3-4B model with the new OpenAI compatibility middleware and custom generate function, validating the end-to-end OpenAI format pipeline.
  • Radix Tree Middleware Refactoring: Refactored the RadixTreeMiddleware to explicitly handle /generate and /retrieve_from_text paths, encapsulating their logic into dedicated methods (_generate and _retrieve_from_text). The postprocess_sample_with_radix_tree function was updated to accept raw text instead of an output dictionary.
  • OpenAI Utility Functions: Created a new utility module openai_utils.py containing helper functions for converting between SGLang's internal data structures and OpenAI's ChatCompletionRequest and ChatCompletionResponse protocols, facilitating seamless integration.
  • Router Route Management: Modified the main router to remove the direct retrieve_from_text route, as its functionality is now managed within the RadixTreeMiddleware for better modularity and middleware-driven request handling.
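
For context, a call against the new endpoint might look like the sketch below. The router address, port, model name, and sampling fields are illustrative placeholders, not values taken from this PR.

```python
# Hypothetical client call to the OpenAI-compatible route added by this PR.
# Host, port, and model id are assumptions for illustration only.
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",  # assumed router address
    json={
        "model": "Qwen/Qwen3-4B",  # placeholder model id
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
# The response is OpenAI-compatible JSON, so the usual fields apply.
print(resp.json()["choices"][0]["message"]["content"])
```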
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment:

Code Review

This pull request adds support for an OpenAI-compatible API endpoint to the Miles router. The changes include a new OpenAICompatMiddleware to handle /v1/chat/completions requests, utility functions for converting between OpenAI and internal formats, and an example usage script. My review identified two critical issues: a bug in openai_generate.py that duplicates response content, and another in OpenAICompatMiddleware that could lead to an UnboundLocalError. I've also pointed out several high- and medium-severity issues related to unsafe data access, potential logic bugs, and maintainability improvements such as avoiding broad exception catches and removing redundant code. Please address the critical issues before merging.

Comment on lines 56 to 65
```python
try:
    response = await self.router.client.post(url, json=generate_payload)
    response.raise_for_status()
    output = response.json()
finally:
    self.router._finish_url(worker_url)

model = chat_request.model or getattr(self.router.args, "hf_checkpoint", "default")
prompt_tokens = len(generate_payload.get("input_ids") or [])
return JSONResponse(content=build_chat_response(model, output, prompt_tokens))
```
**critical**

There is a potential UnboundLocalError here. If an exception is raised within the try block (e.g., from client.post or response.raise_for_status), the output variable will not be assigned. The finally block will execute, and then the code will attempt to use output on line 65, which will crash. The logic that depends on output should be moved inside the try block.

Suggested change

```diff
-try:
-    response = await self.router.client.post(url, json=generate_payload)
-    response.raise_for_status()
-    output = response.json()
-finally:
-    self.router._finish_url(worker_url)
-
-model = chat_request.model or getattr(self.router.args, "hf_checkpoint", "default")
-prompt_tokens = len(generate_payload.get("input_ids") or [])
-return JSONResponse(content=build_chat_response(model, output, prompt_tokens))
+try:
+    response = await self.router.client.post(url, json=generate_payload)
+    response.raise_for_status()
+    output = response.json()
+    model = chat_request.model or getattr(self.router.args, "hf_checkpoint", "default")
+    prompt_tokens = len(generate_payload.get("input_ids") or [])
+    return JSONResponse(content=build_chat_response(model, output, prompt_tokens))
+finally:
+    self.router._finish_url(worker_url)
```

Comment on lines 23 to 24
```python
choice = data["choices"][0]
content = choice["message"]["content"]
```
**high**

Directly accessing nested dictionary and list elements like data["choices"][0] is risky and can lead to IndexError or KeyError if the response structure from the server is not as expected. Consider using .get() with checks to handle potential malformed responses gracefully.
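
For illustration, a defensive version of that access could look like the following sketch; the names follow the snippet above, and raising ValueError is an assumption about how the caller wants failures surfaced:

```python
# Sketch: tolerate malformed upstream responses instead of crashing with
# KeyError/IndexError. The error type raised here is illustrative.
choices = data.get("choices") or []
if not choices:
    raise ValueError(f"malformed response: no 'choices' in {data!r}")
message = choices[0].get("message") or {}
content = message.get("content")
if content is None:
    raise ValueError("malformed response: missing 'message.content'")
```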

Comment on lines 12 to 14
```bash
sleep 3
pkill -9 ray || true
pkill -9 python || true
```
**medium**

This block of pkill commands is redundant. The commands pkill -9 ray and pkill -9 python are repeated from lines 10-11. One round of cleanup should be sufficient. Repeating them adds clutter and can hide issues with the initial cleanup process.

```python
) -> dict[str, Any]:
    try:
        chat_request = ChatCompletionRequest.model_validate(payload)
    except Exception as exc:  # noqa: BLE001
```
**medium**

Catching a broad Exception can hide bugs and make debugging difficult. It's better to catch more specific exceptions. Since ChatCompletionRequest is a Pydantic model, ChatCompletionRequest.model_validate(payload) will raise pydantic.ValidationError on failure. Catching this specific exception would be more robust.
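
A minimal sketch of the narrower catch; the JSONResponse error shape and import path are assumptions, chosen to match the middleware's existing use of JSONResponse:

```python
# Sketch: catch only Pydantic validation failures and surface them as a 400.
from fastapi.responses import JSONResponse  # assumed import path
from pydantic import ValidationError

try:
    chat_request = ChatCompletionRequest.model_validate(payload)
except ValidationError as exc:
    # Invalid request body; don't mask unrelated bugs with a broad except.
    return JSONResponse(status_code=400, content={"error": str(exc)})
```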

@guapisolo force-pushed the feat/openai_router branch 2 times, most recently from de122b1 to ac4483c on January 17, 2026 at 05:02.
@guapisolo changed the title from "Feat: OpenAI middleware support in miles router" to "Feat: OpenAI TITIO support by middlware" on Jan 17, 2026.
@guapisolo changed the title from "Feat: OpenAI TITIO support by middlware" to "Feat: OpenAI TITIO support by middleware" on Jan 17, 2026.