Conversation
Walkthrough

The PR introduces logging improvements via a new filter for noisy HTTP warnings, adds two new GraphQL field resolvers for team members and labels with validation enhancements for issue mutations, reformats a calendar API route definitions table for readability, updates a Git LFS path pattern for recursive JSON file matching, and adds a comprehensive benchmarking notebook for agent-diff experiments with a ReAct-based workflow and checkpoint management.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Benchmark Suite
    participant ReAct as ReAct Agent
    participant API as OpenRouter API
    participant Executor as Bash Executor
    participant CP as Checkpoint
    Client->>ReAct: Initialize with system prompt & task
    loop Max Iterations
        ReAct->>ReAct: Generate Thought
        ReAct->>ReAct: Parse Thought/Action/Observation
        alt Action is Tool Use
            ReAct->>Executor: Execute bash command
            Executor-->>ReAct: Return output
        else Action is Final
            ReAct-->>Client: Return result
        end
        ReAct->>API: Send ReAct state to model
        API-->>ReAct: Receive model response
    end
    Client->>CP: Mark test completed
    CP->>CP: Persist checkpoint state
    Client->>Client: Aggregate results & analyze
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Code review

No issues found. Checked for bugs and CLAUDE.md compliance.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@backend/src/services/linear/api/resolvers.py`:
- Around line 1245-1278: The resolver accepts a filter param but never applies
it; update the labels query to apply the provided filter (similar to
resolve_team_members). Import or use the existing apply_issue_label_filter (or
implement it if missing) and call it with the current base_query and the
incoming filter param (e.g., base_query = apply_issue_label_filter(base_query,
filter)) before pagination/ordering, ensuring you reference IssueLabel,
base_query, and order_field when composing the filtered query.
In `@experiments/kdd 2026/agent-diff bench.ipynb`:
- Around line 718-719: Remove the unused local variable assignment to score:
instead of assigning score = client.evaluate_run(runId=run.runId), invoke
client.evaluate_run(runId=run.runId) for its side effects and drop the score
variable; keep the subsequent run_result =
client.get_results_for_run(runId=run.runId) unchanged so the evaluation call
still runs but no unused variable remains.
- Around line 48-53: The load_api_docs function currently calls
json.load(open(filepath)) which leaves the file handle open; change it to open
the file using a context manager and explicit encoding (e.g. use with
open(filepath, "r", encoding="utf-8") as f: return json.load(f)) so the file is
closed reliably; keep the existing filepath.exists() check and the fallback
print/return {} behavior in load_api_docs.
- Around line 359-435: The call_openrouter function can end up raising None when
max_retries <= 0 because the retry loop never runs; add an upfront guard in
call_openrouter to validate max_retries (e.g., if max_retries < 1: raise
ValueError("max_retries must be >= 1") or coerce to 1) before entering the for
loop, so last_error is never None and behavior is deterministic; update any
callers/tests if you choose to change the parameter semantics.
- Around line 675-744: The current await of asyncio.to_thread(run_react_agent,
...) is cancelled on timeout but the worker thread keeps running; change to
create a persistent task handle (e.g., worker_task =
asyncio.create_task(asyncio.to_thread(run_react_agent, ...))) instead of
directly awaiting asyncio.to_thread, await worker_task with asyncio.wait_for,
and on asyncio.TimeoutError set stop_event and then await worker_task again with
a short grace period (e.g., grace_seconds) to allow the thread to exit before
proceeding to cleanup; if that second wait_for also times out, proceed but
record that the worker did not stop. Also replace the silent except around
client.delete_env with logging of the exception (use an existing logger or
logging.exception) so deletion failures are visible. Ensure references:
run_react_agent, stop_event, worker_task (new), and client.delete_env are
updated accordingly.
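The `max_retries` guard suggested above can be sketched generically. `call_openrouter`'s body isn't shown in this review, so the wrapper name, parameters, and backoff below are illustrative assumptions, not the notebook's actual code:

```python
import time


def call_with_retries(fn, max_retries: int = 3, backoff_seconds: float = 0.0):
    """Run fn() up to max_retries times; re-raise the last error if all fail.

    Guarding max_retries up front guarantees the loop body runs at least
    once, so last_error can never still be None when it is re-raised.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be >= 1")
    last_error = None
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:  # real code would catch the specific API errors
            last_error = e
            if backoff_seconds:
                time.sleep(backoff_seconds * (attempt + 1))
    raise last_error
```

Coercing `max_retries` to 1 instead of raising is the other option mentioned in the prompt; raising is stricter and surfaces caller bugs earlier.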
🧹 Nitpick comments (3)
backend/src/platform/logging_config.py (1)
21-24: Rename unused `exc_type` binding.

Line 22 binds `exc_type` but never uses it; rename to `_exc_type` to avoid unused-variable noise.

Suggested change

```diff
- exc_type, exc_value, _ = record.exc_info
+ _exc_type, exc_value, _ = record.exc_info
```

backend/src/services/calendar/api/batch.py (1)
301-306: Mark unused `param_names` as intentionally unused.

Line 301 binds `param_names` but never reads it; rename to `_param_names` to make intent explicit.

Suggested change

```diff
- for route_method, pattern, handler, param_names in ROUTE_PATTERNS:
+ for route_method, pattern, handler, _param_names in ROUTE_PATTERNS:
```

backend/src/services/linear/api/resolvers.py (1)
1280-1327: Consider extracting duplicated cursor pagination logic.

The pagination logic in this resolver (lines 1280-1327) is nearly identical to `resolve_team_members` (lines 1189-1236). The only differences are the model class and its attributes. This could be extracted into a generic helper function that accepts the model class and query as parameters.
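The shape of that extraction can be illustrated with an in-memory sketch. The real resolvers presumably build ORM queries, so `paginate_by_cursor`, its parameters, and the list-based filtering here are assumptions that only demonstrate how the model-specific parts become arguments:

```python
def paginate_by_cursor(items, key, after=None, first=50):
    """Cursor pagination over a loaded sequence: order by key, skip
    everything at or before the cursor, and look one item past the page
    size to detect whether a next page exists."""
    rows = sorted(items, key=key)
    if after is not None:
        rows = [r for r in rows if key(r) > after]
    page = rows[:first]
    has_next_page = len(rows) > first
    return page, has_next_page
```

A resolver would then pass its own `key` (the order field accessor) and cursor, keeping the pagination mechanics in one place.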
```python
def load_api_docs(filepath: Path) -> dict:
    """Load API docs JSON, return empty dict if not found."""
    if filepath.exists():
        return json.load(open(filepath))
    print(f"Docs not found: {filepath}")
    return {}
```
Close the API docs file handle after reading.
Line 51 uses json.load(open(...)) without closing the file, which can leak descriptors during long runs. Use a context manager (and set encoding).
🔧 Suggested fix

```diff
- if filepath.exists():
-     return json.load(open(filepath))
+ if filepath.exists():
+     with open(filepath, "r", encoding="utf-8") as f:
+         return json.load(f)
```

🤖 Prompt for AI Agents
In `@experiments/kdd 2026/agent-diff bench.ipynb` around lines 48-53, the
load_api_docs function currently calls json.load(open(filepath)) which leaves
the file handle open; change it to open the file using a context manager and
explicit encoding (e.g. use with open(filepath, "r", encoding="utf-8") as f:
return json.load(f)) so the file is closed reliably; keep the existing
filepath.exists() check and the fallback print/return {} behavior in
load_api_docs.
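Put together, the fixed function is a direct application of the suggested change:

```python
import json
from pathlib import Path


def load_api_docs(filepath: Path) -> dict:
    """Load API docs JSON, return empty dict if not found."""
    if filepath.exists():
        # The context manager closes the handle even if json.load raises.
        with open(filepath, "r", encoding="utf-8") as f:
            return json.load(f)
    print(f"Docs not found: {filepath}")
    return {}
```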
```python
    start = time.perf_counter()
    try:
        response = await asyncio.wait_for(
            asyncio.to_thread(
                run_react_agent,
                model_name=model_name,
                task_prompt=prompt,
                bash_executor=bash_executor,
                system_prompt=system_prompt,
                max_iterations=max_iterations,
                trace_accumulator=trace_accumulator,
                stop_event=stop_event,
            ),
            timeout=test_timeout_seconds
        )
    except asyncio.TimeoutError:
        timed_out = True
        # Signal thread to stop at next iteration
        stop_event.set()
        # Give thread a moment to finish current operation and exit
        await asyncio.sleep(2)
        # Use accumulated trace (partial) instead of losing it
        response = {
            "steps": trace_accumulator.get("steps", []),
            "final": trace_accumulator.get("final"),
            "iterations": len(trace_accumulator.get("steps", [])),
            "completed": False,
            "usage": trace_accumulator.get("usage", {}),
            "timeout_error": f"Test timed out after {test_timeout_seconds} seconds",
        }
    except Exception as e:
        response = {
            "steps": trace_accumulator.get("steps", []),
            "final": trace_accumulator.get("final"),
            "iterations": len(trace_accumulator.get("steps", [])),
            "completed": False,
            "usage": trace_accumulator.get("usage", {}),
            "error": str(e),
        }
    finally:
        execution_time = time.perf_counter() - start

    # Evaluation
    score = client.evaluate_run(runId=run.runId)
    run_result = client.get_results_for_run(runId=run.runId)

    result = {
        "prompt": prompt,
        "status": "timeout" if timed_out else run_result.status,
        "passed": False if timed_out else run_result.passed,
        "score": 0 if timed_out else run_result.score.get("percent", 0),
        "time": round(execution_time, 2),
        "failures": ["Test timed out"] if timed_out else run_result.failures,
        "runId": run.runId,
        "trace": response,
        "diff": getattr(run_result, "diff", None),
    }

    # Cleanup
    client.delete_env(envId=env.environmentId)
    return test_id, result

except Exception as e:
    # Cleanup on error if environment was created
    if env:
        try:
            client.delete_env(envId=env.environmentId)
        except:
            pass
    return test_id, {"passed": False, "score": 0, "status": "error", "error": str(e)}
```
🧩 Analysis chain
🌐 Web query:
asyncio.wait_for cancel task behavior asyncio.to_thread timeout
💡 Result:
- `asyncio.wait_for(aw, timeout)` works by cancelling the awaited Task/awaitable when the timeout expires, then raising `TimeoutError` (Python 3.11+) to the caller. It also waits for the cancellation to actually finish, so the total time spent can exceed `timeout`. If the caller is cancelled, `wait_for` also cancels `aw`. To prevent cancelling `aw`, wrap it in `asyncio.shield()`. [1]
- `asyncio.to_thread(func, ...)` runs `func` in a separate OS thread and returns an awaitable for its result. The docs describe execution-in-a-thread and result propagation, but there is no general mechanism in Python to forcibly stop an arbitrary running thread once started. [2]
- Therefore, `await asyncio.wait_for(asyncio.to_thread(...), timeout=...)` will typically:
  - time out,
  - cancel the await (the asyncio Task waiting for the thread result),
  - but the underlying thread keeps running in the background until `func` returns (or never returns). This is the same behavior people observe with `run_in_executor`, which `to_thread` is built upon. [3][4]
- Practical implication: if you need "real" cancellation on timeout for work done in `to_thread`, you must build it into `func` (e.g., periodically check a `threading.Event` / cancellation flag, or use a cancellable API in the blocking call). [4]

Sources:
[1] Python docs: asyncio.wait_for timeout/cancellation semantics https://docs.python.org/3.12/library/asyncio-task.html
[2] Python docs: asyncio.to_thread ("Running in Threads") https://docs.python.org/3.12/library/asyncio-task.html
[3] CPython issue: cancelling run_in_executor doesn't stop the thread python/cpython#107505
[4] Stack Overflow: cancelling a to_thread task won't stop the thread https://stackoverflow.com/questions/71416383/python-asyncio-cancelling-a-to-thread-task-wont-stop-the-thread
🌐 Web query:
asyncio.to_thread documentation timeout handling thread cleanup
💡 Result:
asyncio.to_thread: timeouts, cancellation, and “thread cleanup”
What `to_thread()` does

`asyncio.to_thread(func, *args, **kwargs)` runs `func` in a separate thread (via asyncio's thread-pool machinery) and returns an awaitable for its result; it also propagates contextvars into that thread. [1]

Timeout handling (recommended patterns)

Use an asyncio timeout around the await:

```python
import asyncio

async def main():
    try:
        return await asyncio.wait_for(asyncio.to_thread(blocking_fn), timeout=2.0)
    except TimeoutError:
        ...
```

Or (Python 3.11+) structured timeout:

```python
async with asyncio.timeout(2.0):
    await asyncio.to_thread(blocking_fn)
```

Important: timeout/cancellation does not stop the underlying thread

If the timeout triggers, asyncio cancels the awaiting task, but a running thread-pool job generally keeps running; it is not forcibly terminated. This is the same behavior people observe with `run_in_executor()` (which `to_thread()` is built on top of). [2]

"Thread cleanup": what you can and can't do

- You typically cannot kill the worker thread safely from asyncio.
- The thread-pool threads are reused, so there's usually no per-call "thread cleanup" to perform; your concern is stopping/cleaning up the work function you ran.

Practical options:

- Cooperative cancellation: pass a `threading.Event` (or similar) that the worker checks periodically and exits cleanly.
- Design `func` with timeouts (e.g., socket timeouts, request timeouts) so it returns on its own.
- If you manage your own `ThreadPoolExecutor`, you can shut it down; `shutdown(cancel_futures=True)` only cancels tasks that haven't started yet; running ones still run to completion. [3]
- If you rely on the loop's default executor, note that asyncio uses a lazily-created `ThreadPoolExecutor` for `run_in_executor(None, ...)`-style calls. [4]

Sources

- [1] Python docs: `asyncio.to_thread()` (asyncio tasks / "Running in Threads").
- [2] CPython issue discussing that cancelling `run_in_executor()` doesn't stop the running thread.
- [3] Python docs: `ThreadPoolExecutor.shutdown(cancel_futures=...)` semantics.
- [4] Python docs: default executor behavior in `loop.run_in_executor()` (ThreadPoolExecutor is lazy-initialized if needed).
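A minimal, runnable illustration of the cooperative-cancellation option (the same `threading.Event` pattern the notebook's `stop_event` uses); `worker` here is a stand-in for the real blocking function, and the timing values are arbitrary:

```python
import asyncio
import threading
import time


def worker(stop_event: threading.Event, results: list) -> None:
    # Stand-in for a blocking job: checks the stop flag between steps.
    for i in range(100):
        if stop_event.is_set():
            results.append("stopped")
            return
        results.append(i)
        time.sleep(0.05)


async def main() -> list:
    stop_event = threading.Event()
    results: list = []
    # shield() keeps wait_for's timeout from cancelling the task wrapper,
    # so the thread's completion can still be awaited after the timeout.
    task = asyncio.create_task(asyncio.to_thread(worker, stop_event, results))
    try:
        await asyncio.wait_for(asyncio.shield(task), timeout=0.2)
    except (asyncio.TimeoutError, TimeoutError):
        stop_event.set()  # ask the worker to exit at its next check
        await task        # the thread actually finishes before we proceed
    return results


results = asyncio.run(main())
```

After the event is set, the worker records `"stopped"` and returns, so by the time `main()` resumes past `await task` no thread work is still in flight; that is exactly the ordering guarantee the environment-deletion cleanup needs.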
🏁 Script executed:
```shell
# First, find the notebook file
find . -name "*agent-diff*" -type f 2>/dev/null | head -20
```

Repository: hubert-marek/agent-diff
Length of output: 233
🏁 Script executed:
```shell
# Check if the file exists in the expected path
ls -la "experiments/kdd 2026/agent-diff bench.ipynb" 2>/dev/null || echo "File not found at expected location"
```

Repository: hubert-marek/agent-diff
Length of output: 160
🏁 Script executed:
```shell
# Extract the notebook content as JSON and examine the code cell around lines 675-744
python3 << 'EOF'
import json
import sys
try:
    with open('./experiments/kdd 2026/agent-diff bench.ipynb', 'r') as f:
        nb = json.load(f)
    # Find the cell containing the code around lines 675-744
    # Note: line numbers in the comment refer to line numbers within the cell
    line_count = 0
    for idx, cell in enumerate(nb.get('cells', [])):
        if cell['cell_type'] == 'code':
            source_text = ''.join(cell['source'])
            cell_lines = source_text.split('\n')
            cell_line_count = len(cell_lines)
            # Print info to help locate the right cell
            if 'asyncio.wait_for' in source_text or 'to_thread' in source_text:
                print(f"\n=== Cell {idx} (Type: {cell['cell_type']}) ===")
                print(f"Total lines in this cell: {cell_line_count}")
                print("=" * 60)
                # Print the entire cell
                for i, line in enumerate(cell_lines, 1):
                    print(f"{i:4d}: {line}")
                print("=" * 60)
except Exception as e:
    print(f"Error: {e}", file=sys.stderr)
    sys.exit(1)
EOF
```

Repository: hubert-marek/agent-diff
Length of output: 37275
🏁 Script executed:
```shell
# Also search for the run_react_agent function to understand if stop_event is actually used
python3 << 'EOF'
import json
import sys
try:
    with open('./experiments/kdd 2026/agent-diff bench.ipynb', 'r') as f:
        nb = json.load(f)
    # Find cells containing run_react_agent definition or calls
    for idx, cell in enumerate(nb.get('cells', [])):
        if cell['cell_type'] == 'code':
            source_text = ''.join(cell['source'])
            if 'def run_react_agent' in source_text:
                print(f"\n=== Found run_react_agent definition in Cell {idx} ===")
                print("=" * 60)
                cell_lines = source_text.split('\n')
                for i, line in enumerate(cell_lines, 1):
                    print(f"{i:4d}: {line}")
                print("=" * 60)
except Exception as e:
    print(f"Error: {e}", file=sys.stderr)
    sys.exit(1)
EOF
```

Repository: hubert-marek/agent-diff
Length of output: 37268
Ensure the background ReAct thread stops before cleanup.
When asyncio.wait_for times out, asyncio.to_thread does not terminate the underlying thread—it only cancels the awaiting task. The worker thread continues running, and with a fixed 2-second sleep, there is no guarantee it finishes before cleanup deletes the environment. This creates a race condition where in-flight API calls (lines 194–198) or bash executions (line 227) can race with environment deletion (line 439).
Keep a handle to the worker task, signal stop via stop_event, then await task completion with a grace period. Only after confirming completion (or timing out again) should cleanup proceed. Additionally, log deletion failures instead of silently swallowing them (lines 446–448).
🔧 Suggested fix

```diff
- try:
-     response = await asyncio.wait_for(
-         asyncio.to_thread(
-             run_react_agent,
-             model_name=model_name,
-             task_prompt=prompt,
-             bash_executor=bash_executor,
-             system_prompt=system_prompt,
-             max_iterations=max_iterations,
-             trace_accumulator=trace_accumulator,
-             stop_event=stop_event,
-         ),
-         timeout=test_timeout_seconds
-     )
+ grace_period_seconds = 5
+ worker_task = asyncio.create_task(
+     asyncio.to_thread(
+         run_react_agent,
+         model_name=model_name,
+         task_prompt=prompt,
+         bash_executor=bash_executor,
+         system_prompt=system_prompt,
+         max_iterations=max_iterations,
+         trace_accumulator=trace_accumulator,
+         stop_event=stop_event,
+     )
+ )
+ try:
+     response = await asyncio.wait_for(worker_task, timeout=test_timeout_seconds)
  except asyncio.TimeoutError:
      timed_out = True
      # Signal thread to stop at next iteration
      stop_event.set()
-     # Give thread a moment to finish current operation and exit
-     await asyncio.sleep(2)
-     # Use accumulated trace (partial) instead of losing it
-     response = {
+     try:
+         response = await asyncio.wait_for(worker_task, timeout=grace_period_seconds)
+     except asyncio.TimeoutError:
+         # Use accumulated trace (partial) instead of losing it
+         response = {
              "steps": trace_accumulator.get("steps", []),
              "final": trace_accumulator.get("final"),
              "iterations": len(trace_accumulator.get("steps", [])),
              "completed": False,
              "usage": trace_accumulator.get("usage", {}),
              "timeout_error": f"Test timed out after {test_timeout_seconds} seconds",
-     }
+         }
  except Exception as e:
      response = {
          "steps": trace_accumulator.get("steps", []),
          "final": trace_accumulator.get("final"),
          "iterations": len(trace_accumulator.get("steps", [])),
          "completed": False,
          "usage": trace_accumulator.get("usage", {}),
          "error": str(e),
      }
  finally:
      execution_time = time.perf_counter() - start
  ...
- except:
-     pass
+ except Exception as e:
+     print(f"[WARN] Failed to delete env {env.environmentId}: {e}")
```

🧰 Tools
🪛 Ruff (0.14.14)
[error] 680-680: Local variable score is assigned to but never used
Remove assignment to unused variable score
(F841)
[warning] 697-697: Consider moving this statement to an else block
(TRY300)
[warning] 699-699: Do not catch blind exception: Exception
(BLE001)
[error] 704-704: Do not use bare except
(E722)
[error] 704-705: try-except-pass detected, consider logging the exception
(S110)
[warning] 713-713: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
In `@experiments/kdd 2026/agent-diff bench.ipynb` around lines 675-744, the
current await of asyncio.to_thread(run_react_agent, ...) is cancelled on timeout
but the worker thread keeps running; change to create a persistent task handle
(e.g., worker_task = asyncio.create_task(asyncio.to_thread(run_react_agent,
...))) instead of directly awaiting asyncio.to_thread, await worker_task with
asyncio.wait_for, and on asyncio.TimeoutError set stop_event and then await
worker_task again with a short grace period (e.g., grace_seconds) to allow the
thread to exit before proceeding to cleanup; if that second wait_for also times
out, proceed but record that the worker did not stop. Also replace the silent
except around client.delete_env with logging of the exception (use an existing
logger or logging.exception) so deletion failures are visible. Ensure
references: run_react_agent, stop_event, worker_task (new), and
client.delete_env are updated accordingly.
Summary by CodeRabbit
New Features
Bug Fixes