Skip to content

Conversation

@bparees
Copy link

@bparees bparees commented Jan 30, 2026

Description

Adds support for the current LCORE tool_call response format, e.g:

data: {"event": "tool_call", "data": {"id": "mcp_list_cb6c7811-b2de-4ab7-9109-63a3fac1a5ee", "name": "mcp_list_tools", "args": {"server_label": "obs"}, "type": "mcp_list_tools"}}

data: {"event": "tool_result", "data": {"id": "mcp_list_cb6c7811-b2de-4ab7-9109-63a3fac1a5ee", "status": "success", "content": "{\"server_label\": \"obs\", \"tools\": [{\"name\": \"execute_range_query\", \"description\": \"Execute a PromQL range query with flexible time specification.\\n\\nFor current time data queries, use only the 'duration' parameter to specify how far back\\nto look from now (e.g., '1h' for last hour, '30m' for last 30 minutes). In that case\\nSET 'end' to 'NOW' and leave 'start' empty.\\n\\nFor historical data queries, use explicit 'start' and 'end' times.\\n\", \"input_schema\": {\"type\": \"object\", \"properties\": {\"duration\": {\"description\": \"Duration to look back from now (e.g., '1h', '30m', '1d', '2w') (optional)\", \"pattern\": \"^\\\\d+[smhdwy]$\", \"type\": \"string\"}, \"end\": {\"description\": \"End time as RFC3339 or Unix timestamp (optional). Use `NOW` for current time.\", \"type\": \"string\"}, \"query\": {\"description\": \"PromQL query string\", \"type\": \"string\"}, \"start\": {\"description\": \"Start time as RFC3339 or Unix timestamp (optional)\", \"type\": \"string\"}, \"step\": {\"description\": \"Query resolution step width (e.g., '15s', '1m', '1h')\", \"pattern\": \"^\\\\d+[smhdwy]$\", \"type\": \"string\"}}, \"required\": [\"query\", \"step\"]}}, {\"name\": \"list_metrics\", \"description\": \"List all available metrics in Prometheus\", \"input_schema\": {\"type\": \"object\", \"properties\": {}}}]}", "type": "mcp_list_tools", "round": 1}}

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Verified by running evals against a current LCORE deployment (v0.4+)

Summary by CodeRabbit

Release Notes

  • New Features

    • Tool calls can now be linked with their execution results via unique identifiers
    • Enhanced tool call parsing to support both legacy and new event data formats while maintaining backward compatibility
  • Tests

    • Added test coverage for new tool call format and result association scenarios

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 30, 2026

Walkthrough

The streaming parser has been extended to handle tool_result events alongside tool calls. A new tool_calls_by_id mapping tracks tool calls by ID, enabling result association. Parsing now supports both legacy (tool_name/arguments) and new (name/args) formats. A new helper processes and applies results to corresponding tool calls.

Changes

Cohort / File(s) Summary
Streaming Parser Enhancement
src/lightspeed_evaluation/core/api/streaming_parser.py
Added support for tool_result events and extended CONTENT_EVENTS constant. Introduced tool_calls_by_id field to StreamingContext for ID-based tool call tracking. Modified _parse_tool_call to accept broader event data and support both legacy and new formats. Added _process_tool_result helper to link results to tool calls by ID with error handling and logging.
Parser Tests
tests/unit/core/api/test_streaming_parser.py
Added comprehensive tests for new-format tool call parsing (id, name, args) and tool_result event association. Verified result propagation to corresponding tool calls and token count preservation. Updated existing test docstring to clarify legacy format usage.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Parse new LCORE tool_call format' directly and concisely summarizes the main change: adding support for parsing the new LCORE tool_call response format with updated event structures.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bparees
Copy link
Author

bparees commented Jan 30, 2026

@asamal4 w/o this change i was not getting tool_call data in the eval results when using a current LCORE deployment

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/lightspeed_evaluation/core/api/streaming_parser.py`:
- Around line 206-219: The parser incorrectly treats an empty new-format args
dict as missing because it uses "or" when assigning tool_name and arguments;
change the logic to prefer the "name"/"args" keys when present by checking key
presence on event_data (e.g., use "if 'name' in event_data: tool_name =
event_data['name'] else: tool_name = event_data.get('tool_name')" and similarly
for "args"/"arguments") and then treat arguments as missing only if neither key
is present (i.e., arguments is None and neither "args" nor "arguments" keys
exist). Update the code paths around event_data, tool_name, and arguments in
streaming_parser.py so empty dicts for args are accepted as valid.

Comment on lines 206 to 219
try:
tool_name = token.get("tool_name")
arguments = token.get("arguments")
# Support both "name"/"args" (new format) and "tool_name"/"arguments" (legacy)
tool_name = event_data.get("name") or event_data.get("tool_name")
arguments = event_data.get("args") or event_data.get("arguments")

if not tool_name:
logger.debug("Tool call missing tool_name field")
logger.debug("Tool call missing name/tool_name field")
return None

# Only process tool calls that explicitly have arguments field
# Arguments can be empty dict, but field should exist
if arguments is None:
logger.debug("Tool call missing arguments field for %s", tool_name)
logger.debug("Tool call missing args/arguments field for %s", tool_name)
return None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle empty new-format args without dropping tool calls.

Line 208-210 uses or, so args={} in the new format becomes falsy and is treated as missing, causing valid tool calls to be discarded. Consider key-presence checks instead.

🛠️ Proposed fix
-        tool_name = event_data.get("name") or event_data.get("tool_name")
-        arguments = event_data.get("args") or event_data.get("arguments")
+        tool_name = (
+            event_data.get("name") if "name" in event_data else event_data.get("tool_name")
+        )
+        if "args" in event_data:
+            arguments = event_data.get("args")
+        elif "arguments" in event_data:
+            arguments = event_data.get("arguments")
+        else:
+            arguments = None
🤖 Prompt for AI Agents
In `@src/lightspeed_evaluation/core/api/streaming_parser.py` around lines 206 -
219, The parser incorrectly treats an empty new-format args dict as missing
because it uses "or" when assigning tool_name and arguments; change the logic to
prefer the "name"/"args" keys when present by checking key presence on
event_data (e.g., use "if 'name' in event_data: tool_name = event_data['name']
else: tool_name = event_data.get('tool_name')" and similarly for
"args"/"arguments") and then treat arguments as missing only if neither key is
present (i.e., arguments is None and neither "args" nor "arguments" keys exist).
Update the code paths around event_data, tool_name, and arguments in
streaming_parser.py so empty dicts for args are accepted as valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant