OLS-1406: Add tool output truncation #2717
Conversation
@onmete: This pull request references OLS-1406, which is a valid Jira issue.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
asamal4 left a comment:
Overall, the tool output truncation logic is fine. But we will have to handle a few other scenarios to avoid errors caused by the context window limit:
- Enforce an overall tool token usage limit. Imagine a scenario with multiple small parallel tool outputs: each one fits individually, but their combined token size may be too large.
- Consider the tool definitions in the initial prompt token calculation.
- Account for the AIMessage from each iteration in the input token calculation for the next iteration.
- Re-calculate the available tokens for each tool iteration; with every iteration (and with parallel tool calls), an AIMessage and tool outputs are added to the prompt. See the sketch after this list.
- The token count should be weighted.
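A minimal sketch of that per-iteration recalculation, assuming a generic `count_tokens` callable; `DEFAULT_MAX_TOKENS_PER_TOOL_OUTPUT` is a constant name visible in the diff further down, but its value here is made up:

```python
from typing import Callable

DEFAULT_MAX_TOKENS_PER_TOOL_OUTPUT = 2048  # illustrative value only


def remaining_tokens(
    context_window_size: int,
    messages: list[str],
    count_tokens: Callable[[str], int],
) -> int:
    """Recompute the available budget from the full message history.

    This runs before every tool iteration, because each iteration appends
    an AIMessage plus one tool output per (possibly parallel) tool call,
    and all of them count as input tokens on the next model call.
    """
    used = sum(count_tokens(m) for m in messages)
    return max(context_window_size - used, 0)


def per_output_limit(budget: int, n_parallel_calls: int) -> int:
    """Split the remaining budget across the parallel tool calls of one iteration."""
    return min(budget // max(n_parallel_calls, 1), DEFAULT_MAX_TOKENS_PER_TOOL_OUTPUT)
```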
ols/utils/token_handler.py (outdated):

```python
            if truncation occurred
        """
        tokens = self.text_to_tokens(text)
        token_count = len(tokens)
```
We should consider applying the weight here; the constant is already there, with weight 1.1.
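A sketch of what applying that weight could look like; the constant name `TOKEN_BUFFER_WEIGHT` is assumed, only the 1.1 value comes from the comment above:

```python
import math

TOKEN_BUFFER_WEIGHT = 1.1  # assumed name; the 1.1 value is quoted above


def weighted_token_count(tokens: list[int]) -> int:
    # Over-estimate the raw count by 10% so that drift between the local
    # tokenizer and the model's tokenizer does not blow the context window.
    return math.ceil(len(tokens) * TOKEN_BUFFER_WEIGHT)
```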
```python
max_tokens_per_tool_output: PositiveInt = (
    constants.DEFAULT_MAX_TOKENS_PER_TOOL_OUTPUT
)
max_tokens_for_tools: PositiveInt = constants.DEFAULT_MAX_TOKENS_FOR_TOOLS
```
This is only used for the initial token reservation and is never enforced later. It is possible that after multiple iterations the overall limit will be exceeded. One way to enforce it is sketched below.
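A possible enforcement mechanism, assuming nothing about the PR's actual structure: a running total checked before each tool output is appended.

```python
class ToolTokenBudget:
    """Track cumulative tool-output tokens across all iterations."""

    def __init__(self, max_tokens_for_tools: int) -> None:
        self.limit = max_tokens_for_tools
        self.used = 0

    def try_consume(self, token_count: int) -> bool:
        """Return True if the output still fits under the overall limit."""
        if self.used + token_count > self.limit:
            return False
        self.used += token_count
        return True
```

When `try_consume` returns False, the loop could stop issuing tool calls or truncate more aggressively, whichever fits the design.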
```diff
@@ -388,17 +393,29 @@ async def iterate_with_tools(  # noqa: C901

         # execute tools and add to messages
         tool_calls_messages = await execute_tool_calls(
```
We need to recalculate the available tokens for every iteration.
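For reference, the per-output truncation being wired in here might look roughly like this; `text_to_tokens` appears in the token_handler diff above, while `tokens_to_text` is an assumed inverse, not a confirmed method:

```python
def truncate_tool_output(handler, text: str, limit: int) -> tuple[str, bool]:
    """Cut a single tool output down to `limit` tokens.

    Returns the (possibly shortened) text and whether truncation occurred.
    """
    tokens = handler.text_to_tokens(text)  # method shown in the diff above
    if len(tokens) <= limit:
        return text, False
    return handler.tokens_to_text(tokens[:limit]), True  # assumed inverse method
```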
@onmete Do we want to keep the summarization separate? How would that impact this calculation?
@xrajesh Partially. The logic for truncating/cutting off very long tool outputs (outputs that don't fit into the context window) will still be required; the rest will be replaced by the summarization logic.
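A rough sketch of that split; `truncate` stands in for this PR's cut-off logic and `summarize` for the planned follow-up, both hypothetical callables:

```python
from typing import Callable


def process_tool_output(
    text: str,
    token_count: int,
    hard_cap: int,
    truncate: Callable[[str, int], str],  # this PR: cut-off for oversized outputs
    summarize: Callable[[str], str],      # planned: summarization for the rest
) -> str:
    """Hard-truncate outputs that can never fit; hand everything else to summarization."""
    if token_count > hard_cap:
        return truncate(text, hard_cap)
    return summarize(text)
```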
@asamal4 comments addressed.
watson timeout
@onmete: all tests passed!
Description
Add tool output truncation