Conversation

@guapisolo (Collaborator)

As the title states.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @guapisolo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the OpenAI-compatible chat completion endpoint by integrating explicit support for pre-tokenized inputs and enhancing log probability reporting. It also involves a substantial refactoring of the Miles Router's session management, improving its modularity and laying groundwork for advanced multi-turn agentic functionalities. The changes are accompanied by new documentation to guide users through the updated endpoint usage and related concepts like TITO, ensuring clarity and ease of adoption.

Highlights

  • OpenAI Protocol Enhancements: Introduced support for input_ids in OpenAI chat completion requests, allowing pre-tokenized prompts. Added token_ids to the LogProbs and ChatCompletionTokenLogprob models, and a logprob_start_len parameter for granular log-probability control (see the example request after this list).
  • Refactored Session Management: The Miles Router's session management has been refactored into a dedicated session package, including NaiveTrajectoryManager for handling session records and SessionRecord/GetSessionResponse models for structured data.
  • New Documentation: Added comprehensive documentation for both the /generate endpoint (gen_endpoint.md) and the OpenAI-format chat endpoint (oai_endpoint.md), covering agentic multi-turn interactions and Token-In Token-Out (TITO) concepts.
  • Router Configuration and Logic: A new --miles-router-enable-token-input-for-chat-completions argument allows the router to tokenize messages and inject input_ids. The radix_tree_middleware now directly handles /retrieve_from_text requests, streamlining router responsibilities.
  • Agentic Tool Call Updates: The agentic_tool_call.py module was updated to use a configurable custom_agent_function and includes build_chat_request_kwargs to standardize sampling parameters for chat requests.
  • End-to-End Testing: New end-to-end tests (test_chat_input_ids_equivalence.py) were added to validate the functional equivalence of providing input_ids versus raw messages in chat completions and to ensure input_ids are correctly echoed back in responses.
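
To make the new fields concrete, here is a minimal sketch of a pre-tokenized chat completion request. The input_ids and logprob_start_len fields come from this PR; the server URL, model name, and token IDs below are placeholders, and the exact payload shape may differ from the final implementation.

```python
# Illustrative only: a pre-tokenized chat completion request, assuming the
# server exposes this PR's new fields on /v1/chat/completions.
import requests

payload = {
    "model": "default",
    # Pre-tokenized prompt (placeholder IDs); when present, the server can
    # skip chat-template rendering and tokenization of `messages`.
    "input_ids": [151644, 872, 198, 9707, 151645],
    "logprobs": True,
    # Only return logprobs for tokens at or after this prompt offset.
    "logprob_start_len": 0,
    "max_tokens": 64,
}

resp = requests.post("http://localhost:30000/v1/chat/completions", json=payload)
resp.raise_for_status()
choice = resp.json()["choices"][0]
# With this PR, logprob entries also carry the sampled token IDs.
print(choice["logprobs"])
```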
Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/pr-test.yml
    • .github/workflows/pr-test.yml.j2

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces significant new capabilities for agentic multi-turn interactions, primarily by enabling input_ids to be passed directly to the chat completion endpoint and refactoring session management. The addition of comprehensive documentation, examples, and E2E tests is commendable and greatly enhances the usability and correctness of the new features.

My main feedback centers on a large block of duplicated code within the sglang patch, which I recommend refactoring to improve long-term maintainability. I've also highlighted a functional limitation regarding the use of stop sequences with input_ids that could cause unexpected behavior. Minor suggestions include improving documentation formatting and removing a redundant parameter in an example file.

Overall, this is a strong contribution. Addressing the code duplication will make the implementation more robust and easier to maintain.
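
For reviewers who want a feel for what the new E2E coverage asserts, here is a hypothetical sketch of the input_ids/messages equivalence property; the actual test lives in test_chat_input_ids_equivalence.py and may differ in structure. The tokenizer name, server URL, greedy-decoding setup, and the assumption that input_ids can stand in for messages entirely are all placeholders.

```python
# Hypothetical sketch of the equivalence check, not the actual test file.
import requests
from transformers import AutoTokenizer

BASE_URL = "http://localhost:30000/v1/chat/completions"  # placeholder
messages = [{"role": "user", "content": "What is 2 + 2?"}]

# Render and tokenize the prompt locally the same way the server would.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # placeholder model
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

common = {"model": "default", "temperature": 0, "max_tokens": 32}
via_messages = requests.post(BASE_URL, json={**common, "messages": messages}).json()
via_input_ids = requests.post(BASE_URL, json={**common, "input_ids": input_ids}).json()

# With greedy decoding, both request styles should yield identical text.
assert (
    via_messages["choices"][0]["message"]["content"]
    == via_input_ids["choices"][0]["message"]["content"]
)
```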

Comment on lines +141 to +195
```diff
+    def _hack_convert_input_ids_to_messages(
+        self,
+        request: ChatCompletionRequest,
+        raw_request: Request = None,
+    ) -> tuple[GenerateReqInput, ChatCompletionRequest]:
+
+        # Notice: currently, if input_ids is provided, the stop token is not used.
+        sampling_params = request.to_sampling_params(
+            model_generation_config=self.default_sampling_params
+        )
+
+        prompt_kwargs = {"input_ids": request.input_ids}
+
+        # Extract custom labels from raw request headers
+        custom_labels = self.extract_custom_labels(raw_request)
+
+        # Resolve LoRA adapter from model parameter or explicit lora_path
+        lora_path = self._resolve_lora_path(request.model, request.lora_path)
+        if lora_path:
+            first_adapter = (
+                lora_path
+                if isinstance(lora_path, str)
+                else next((a for a in lora_path if a), None)
+            )
+            if first_adapter:
+                self._validate_lora_enabled(first_adapter)
+
+        logprob_start_len = (
+            request.logprob_start_len if request.logprob_start_len is not None else -1
+        )
+
+        adapted_request = GenerateReqInput(
+            **prompt_kwargs,
+            sampling_params=sampling_params,
+            return_logprob=request.logprobs,
+            logprob_start_len=logprob_start_len,
+            top_logprobs_num=request.top_logprobs or 0,
+            stream=request.stream,
+            return_text_in_logprobs=True,
+            lora_path=lora_path,
+            bootstrap_host=request.bootstrap_host,
+            bootstrap_port=request.bootstrap_port,
+            bootstrap_room=request.bootstrap_room,
+            data_parallel_rank=request.data_parallel_rank,
+            return_hidden_states=request.return_hidden_states,
+            rid=request.rid,
+            extra_key=self._compute_extra_key(request),
+            require_reasoning=self._get_reasoning_from_request(request),
+            priority=request.priority,
+            custom_labels=custom_labels,
+            custom_logit_processor=request.custom_logit_processor,
+        )
+
+        return adapted_request, request
```
@gemini-code-assist (Contributor)

Severity: high

The function _hack_convert_input_ids_to_messages contains a significant amount of code duplicated from _convert_to_internal_request. The function name prefix _hack_ also suggests this is a temporary or non-ideal solution.

To improve maintainability and reduce redundancy, I recommend refactoring this logic. The two paths (one for input_ids and one for messages) can be unified within _convert_to_internal_request. You could determine the prompt_kwargs and other message-related data at the beginning, and then proceed with the common logic for handling sampling parameters, LoRA, and creating the GenerateReqInput.

This would make the code cleaner, easier to understand, and less prone to bugs when one path is updated and the other is forgotten.
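
A minimal sketch of the suggested unification, heavily abridged; the helper _build_prompt_kwargs_from_messages is hypothetical shorthand for the existing message-rendering path, and the shared tail is elided.

```python
# Illustrative shape of the suggested refactor, not a definitive implementation.
def _convert_to_internal_request(self, request, raw_request=None):
    if request.input_ids is not None:
        # Pre-tokenized path: bypass chat-template rendering and tokenization.
        prompt_kwargs = {"input_ids": request.input_ids}
    else:
        # Message path: render the chat template as before (hypothetical helper).
        prompt_kwargs = self._build_prompt_kwargs_from_messages(request)

    # Previously duplicated tail, now shared by both paths: sampling params,
    # LoRA resolution, logprob bookkeeping, GenerateReqInput construction.
    sampling_params = request.to_sampling_params(
        model_generation_config=self.default_sampling_params
    )
    lora_path = self._resolve_lora_path(request.model, request.lora_path)
    ...  # remaining construction identical to the diff above
    return adapted_request, request
```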

```diff
+        raw_request: Request = None,
+    ) -> tuple[GenerateReqInput, ChatCompletionRequest]:
+
+        # Notice: currently, if input_ids is provided, the stop token is not used.
```
@gemini-code-assist (Contributor)

Severity: high

The comment notes a significant limitation: stop tokens are not used when input_ids are provided. This could lead to unexpected behavior for users who expect stop sequences to function as they do with message-based inputs. This is a potential correctness issue from a user's perspective.

This limitation should either be addressed to make the feature complete, or it should be very clearly documented in the public-facing API documentation to prevent confusion and bugs in user code.
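
As an interim, purely illustrative mitigation (not part of this PR), a caller could enforce stop strings client-side until the server honors them for pre-tokenized requests:

```python
# Truncate generated text at the first occurrence of any stop string.
def truncate_at_stop(text: str, stop: list[str]) -> str:
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("Answer: 4\nUser:", ["\nUser:"]))  # -> "Answer: 4"
```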



```python
async def run_agent(base_url: str, prompt: list[dict[str, Any]] | str, request_kwargs: dict[str, Any] = None) -> None:
    payload = {"model": "default", "messages": prompt, "logprobs": True, **request_kwargs}
```
@gemini-code-assist (Contributor)

Severity: medium

The request_kwargs passed to this function are generated by build_chat_request_kwargs in agentic_tool_call.py, which already hardcodes "logprobs": True. Setting it again here in the payload is redundant.

To avoid redundancy, you can remove "logprobs": True from this payload definition.

Suggested change
```diff
-    payload = {"model": "default", "messages": prompt, "logprobs": True, **request_kwargs}
+    payload = {"model": "default", "messages": prompt, **request_kwargs}
```
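
For context, here is a hypothetical shape of the build_chat_request_kwargs helper referenced above; only the hardcoded "logprobs": True is confirmed by this thread, and the other defaults are placeholders.

```python
# Hypothetical sketch of build_chat_request_kwargs from agentic_tool_call.py.
def build_chat_request_kwargs(temperature: float = 0.7, max_tokens: int = 512) -> dict:
    return {
        "logprobs": True,        # already set here, hence the redundancy above
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```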

@guapisolo (Collaborator, Author)

@zhaochenyang20 Could you help review the doc oai_endpoint.md and gen_endpoint.md? Thanks!

@guapisolo changed the title from "Add Doc for agentic multi-turn" to "Doc: Add Doc for agentic multi-turn" on Jan 24, 2026
@guapisolo marked this pull request as draft on January 24, 2026, at 05:13