Doc: Add Doc for agentic multi-turn #509
base: main
Conversation
Summary of Changes
Hello @guapisolo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request upgrades the OpenAI-compatible chat completion endpoint by adding explicit support for pre-tokenized inputs and enhancing log-probability reporting. It also refactors the Miles Router's session management, improving its modularity and laying the groundwork for advanced multi-turn agentic functionality. The changes are accompanied by new documentation that guides users through the updated endpoint usage and related concepts such as TITO, ensuring clarity and ease of adoption.
Code Review
This pull request introduces significant new capabilities for agentic multi-turn interactions, primarily by enabling input_ids to be passed directly to the chat completion endpoint and refactoring session management. The addition of comprehensive documentation, examples, and E2E tests is commendable and greatly enhances the usability and correctness of the new features.
My main feedback centers on a large block of duplicated code within the sglang patch, which I recommend refactoring to improve long-term maintainability. I've also highlighted a functional limitation regarding the use of stop sequences with input_ids that could cause unexpected behavior. Minor suggestions include improving documentation formatting and removing a redundant parameter in an example file.
Overall, this is a strong contribution. Addressing the code duplication will make the implementation more robust and easier to maintain.
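For reviewers who want a concrete picture of the new capability, here is a rough sketch of a pre-tokenized request against the OpenAI-compatible endpoint. The `input_ids`, `logprobs`, and `top_logprobs` field names are taken from the diff below; the server address, token IDs, and endpoint path are illustrative assumptions, not values from the PR or its docs.

```python
import requests

# Hypothetical local server address; the path follows the OpenAI chat completions convention.
BASE_URL = "http://localhost:30000"

payload = {
    "model": "default",
    # Pre-tokenized prompt: token IDs are sent instead of a "messages" list.
    "input_ids": [151644, 872, 198, 9906, 151645],
    "max_tokens": 64,
    "logprobs": True,     # request per-token log probabilities
    "top_logprobs": 2,    # and the top-2 alternatives per position
}

resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=60)
print(resp.json()["choices"][0])
```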
```diff
+    def _hack_convert_input_ids_to_messages(
+        self,
+        request: ChatCompletionRequest,
+        raw_request: Request = None,
+    ) -> tuple[GenerateReqInput, ChatCompletionRequest]:
+
+        # Notice: currently, if input_ids is provided, the stop token is not used.
+        sampling_params = request.to_sampling_params(
+            model_generation_config=self.default_sampling_params
+        )
+
+        prompt_kwargs = {"input_ids": request.input_ids}
+
+        # Extract custom labels from raw request headers
+        custom_labels = self.extract_custom_labels(raw_request)
+
+        # Resolve LoRA adapter from model parameter or explicit lora_path
+        lora_path = self._resolve_lora_path(request.model, request.lora_path)
+        if lora_path:
+            first_adapter = (
+                lora_path
+                if isinstance(lora_path, str)
+                else next((a for a in lora_path if a), None)
+            )
+            if first_adapter:
+                self._validate_lora_enabled(first_adapter)
+
+        logprob_start_len = (
+            request.logprob_start_len if request.logprob_start_len is not None else -1
+        )
+
+        adapted_request = GenerateReqInput(
+            **prompt_kwargs,
+            sampling_params=sampling_params,
+            return_logprob=request.logprobs,
+            logprob_start_len=logprob_start_len,
+            top_logprobs_num=request.top_logprobs or 0,
+            stream=request.stream,
+            return_text_in_logprobs=True,
+            lora_path=lora_path,
+            bootstrap_host=request.bootstrap_host,
+            bootstrap_port=request.bootstrap_port,
+            bootstrap_room=request.bootstrap_room,
+            data_parallel_rank=request.data_parallel_rank,
+            return_hidden_states=request.return_hidden_states,
+            rid=request.rid,
+            extra_key=self._compute_extra_key(request),
+            require_reasoning=self._get_reasoning_from_request(request),
+            priority=request.priority,
+            custom_labels=custom_labels,
+            custom_logit_processor=request.custom_logit_processor,
+        )
+
+        return adapted_request, request
```
The function _hack_convert_input_ids_to_messages contains a significant amount of code duplicated from _convert_to_internal_request. The function name prefix _hack_ also suggests this is a temporary or non-ideal solution.
To improve maintainability and reduce redundancy, I recommend refactoring this logic. The two paths (one for input_ids and one for messages) can be unified within _convert_to_internal_request. You could determine the prompt_kwargs and other message-related data at the beginning, and then proceed with the common logic for handling sampling parameters, LoRA, and creating the GenerateReqInput.
This would make the code cleaner, easier to understand, and less prone to bugs when one path is updated and the other is forgotten.
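One possible shape for that unified path, sketched under the assumption that it lives in the same serving class as the patched method. `_build_prompt_kwargs_from_messages` and `_build_generate_req_input` are hypothetical helper names standing in for the existing message-handling and request-construction logic; they are not functions from the PR.

```python
def _convert_to_internal_request(
    self,
    request: ChatCompletionRequest,
    raw_request: Request = None,
) -> tuple[GenerateReqInput, ChatCompletionRequest]:
    # Branch only on how the prompt is supplied; everything afterwards is shared.
    if request.input_ids is not None:
        # Pre-tokenized path: bypass chat-template rendering.
        # Note: stop handling still needs to be reconciled with the message path.
        prompt_kwargs = {"input_ids": request.input_ids}
    else:
        # Message path: placeholder for the existing template/tokenization logic.
        prompt_kwargs = self._build_prompt_kwargs_from_messages(request)

    # Shared tail: sampling params, LoRA resolution, GenerateReqInput construction.
    return self._build_generate_req_input(prompt_kwargs, request, raw_request)
```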
```diff
+        raw_request: Request = None,
+    ) -> tuple[GenerateReqInput, ChatCompletionRequest]:
+
+        # Notice: currently, if input_ids is provided, the stop token is not used.
```
The comment notes a significant limitation: stop tokens are not used when input_ids are provided. This could lead to unexpected behavior for users who expect stop sequences to function as they do with message-based inputs. This is a potential correctness issue from a user's perspective.
This limitation should either be addressed to make the feature complete, or it should be very clearly documented in the public-facing API documentation to prevent confusion and bugs in user code.
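Until that is resolved, callers who send input_ids may want a client-side guard. Below is a minimal standalone sketch (not part of the PR) that truncates the returned completion at the first stop sequence:

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut a completion at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# e.g. a ReAct-style agent that would normally stop generation at "\nObservation:"
completion = "Thought: call the tool\nAction: search(weather)\nObservation: sunny"
print(truncate_at_stop(completion, ["\nObservation:"]))
# -> "Thought: call the tool\nAction: search(weather)"
```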
```diff
+async def run_agent(base_url: str, prompt: list[dict[str, Any]] | str, request_kwargs: dict[str, Any] = None) -> None:
+    payload = {"model": "default", "messages": prompt, "logprobs": True, **request_kwargs}
```
The request_kwargs passed to this function are generated by build_chat_request_kwargs in agentic_tool_call.py, which already hardcodes "logprobs": True. Setting it again here in the payload is redundant.
To avoid redundancy, you can remove "logprobs": True from this payload definition.
Suggested change:

```diff
-    payload = {"model": "default", "messages": prompt, "logprobs": True, **request_kwargs}
+    payload = {"model": "default", "messages": prompt, **request_kwargs}
```
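Worth noting: because `**request_kwargs` is unpacked after the literal key, it would override `"logprobs"` anyway, so the literal entry adds nothing. A quick standalone illustration of the precedence (not code from the PR):

```python
request_kwargs = {"logprobs": False, "temperature": 0.0}
payload = {"model": "default", "logprobs": True, **request_kwargs}
print(payload["logprobs"])  # False -- later entries from **request_kwargs win
```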
Force-pushed from 5fcb85b to c11ecea.
@zhaochenyang20 Could you help review the docs oai_endpoint.md and gen_endpoint.md? Thanks!
as the title