Problem
When web_search tools are enabled and executed, build_prompt() is called twice, causing the expensive CTE query (get_messages_by_ids) to execute twice per request:
- Phase 2 (line 1039): Build context for billing check and initial LLM call
- Phase 6 (line 1863): Rebuild entire prompt from DB after tools execute to include tool messages
This doubles the cost of the most expensive query in the Responses API flow.
Proposed Solution
Instead of rebuilding the entire prompt from scratch, incrementally append tool messages to the existing context:
    // In Phase 6, instead of full rebuild:
    let prompt_messages = if tools_executed {
        let mut messages = Arc::as_ref(&context.prompt_messages).clone();

        // Fetch ONLY the new tool messages (much cheaper)
        let tool_calls = db.get_tool_calls_for_response(response_id)?;
        let tool_outputs = db.get_tool_outputs_for_response(response_id)?;

        // Decrypt and append to existing context
        for tool_call in tool_calls {
            messages.push(format_tool_call_message(tool_call, &user_key)?);
        }
        for tool_output in tool_outputs {
            messages.push(format_tool_output_message(tool_output, &user_key)?);
        }
        messages
    } else {
        Arc::as_ref(&context.prompt_messages).clone()
    };

Benefits
- Eliminates second full CTE UNION query
- Only fetches 2-4 new records (tool_calls + tool_outputs)
- Avoids re-decrypting entire conversation history
- Significant performance improvement for web_search-enabled requests
Implementation Notes
- Need to add get_tool_calls_for_response and get_tool_outputs_for_response helper methods to DB trait
- Ensure tool messages are ordered correctly when appending (created_at ASC)
- May need to verify token counts still work correctly with incremental approach
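As a rough illustration of the first two notes, here is a minimal sketch of what the helpers and the ordered append could look like. Everything except the two method names is an assumption made for the example: the `Message`, `ToolRecord`, and `ToolKind` types, the `ToolMessageStore` trait, the in-memory store, the `u64` response ID, the `String` error type, and the "assistant"/"tool" role strings are all hypothetical stand-ins. Decryption is left out entirely. The real DB trait and message types will differ.

```rust
use std::sync::Arc;

/// Hypothetical, simplified prompt message (the real type is not shown in this issue).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Hypothetical tag distinguishing tool calls from tool outputs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToolKind {
    Call,
    Output,
}

/// Hypothetical row shape shared by tool calls and tool outputs.
#[derive(Debug, Clone)]
pub struct ToolRecord {
    pub response_id: u64,
    pub created_at: i64, // assumed to be a sortable timestamp
    pub kind: ToolKind,
    pub payload: String, // assumed already decrypted for this sketch
}

/// Stand-in for the two helper methods proposed for the DB trait.
pub trait ToolMessageStore {
    fn get_tool_calls_for_response(&self, response_id: u64) -> Result<Vec<ToolRecord>, String>;
    fn get_tool_outputs_for_response(&self, response_id: u64) -> Result<Vec<ToolRecord>, String>;
}

/// In-memory store used only to make the sketch self-contained.
pub struct InMemoryStore {
    pub rows: Vec<ToolRecord>,
}

impl InMemoryStore {
    /// Mimics `WHERE response_id = ? ORDER BY created_at ASC` for one record kind.
    fn select(&self, response_id: u64, kind: ToolKind) -> Vec<ToolRecord> {
        let mut rows: Vec<ToolRecord> = self
            .rows
            .iter()
            .filter(|r| r.response_id == response_id && r.kind == kind)
            .cloned()
            .collect();
        rows.sort_by_key(|r| r.created_at);
        rows
    }
}

impl ToolMessageStore for InMemoryStore {
    fn get_tool_calls_for_response(&self, response_id: u64) -> Result<Vec<ToolRecord>, String> {
        Ok(self.select(response_id, ToolKind::Call))
    }

    fn get_tool_outputs_for_response(&self, response_id: u64) -> Result<Vec<ToolRecord>, String> {
        Ok(self.select(response_id, ToolKind::Output))
    }
}

/// Clones the existing context and appends only the new tool messages,
/// merged across both record kinds in created_at ASC order.
pub fn append_tool_messages<S: ToolMessageStore>(
    store: &S,
    existing: &Arc<Vec<Message>>,
    response_id: u64,
) -> Result<Vec<Message>, String> {
    let mut records = store.get_tool_calls_for_response(response_id)?;
    records.extend(store.get_tool_outputs_for_response(response_id)?);
    // Stable sort: on equal timestamps, calls stay ahead of outputs
    // because they were collected first.
    records.sort_by_key(|r| r.created_at);

    let mut messages = existing.to_vec();
    for record in records {
        let role = match record.kind {
            ToolKind::Call => "assistant",
            ToolKind::Output => "tool",
        };
        messages.push(Message {
            role: role.to_string(),
            content: record.payload,
        });
    }
    Ok(messages)
}
```

Unlike the snippet in the proposal, which appends all tool calls and then all tool outputs, this sketch merges both sets by created_at before appending. That may matter if a request can trigger more than one call/output pair. Whether the merged order matches what the full rebuild produces would still need to be checked against the real query.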
Related
This was discovered during investigation of query frequency increases after web_search tool integration (PR #103).