Description
Hi Team,
This is Luke.
I've been interested in trying out and benchmarking Droidrun so I started with this repo.
While running a task, I came across a few issues that could be fixed rather quickly.
Error Messages:
💬 Preparing chat for task execution...
Error during task execution: 'Context' object has no attribute 'get'
2025-10-14 17:59:55,024 - eval.runner - ERROR - Error completing task ContactsNewContactDraft 0: 1 validation error for CodeActResultEvent
steps
  Input should be a valid integer [type=int_type, input_value=[], input_type=list]
    For further information visit https://errors.pydantic.dev/2.11/v/int_type
2025-10-14 17:59:55,024 - eval.runner - DEBUG - Tearing down task ContactsNewContactDraft 0
2025-10-14 17:59:55,461 - eval.cli - INFO - Task ContactsNewContactDraft 0 completed successfully
2025-10-14 17:59:55,462 - tracker - ERROR - DISCORD_WEBHOOK_URL is not set
Issues shown:
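- Context API mismatch (`'Context' object has no attribute 'get'`)
- Pydantic validation error (`steps` expects an integer but receives a list)
- Missing environment variable (`DISCORD_WEBHOOK_URL`)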
Temporary workarounds for now:
Issue 1:
Updated the following files to replace `ctx.get` with `ctx.store.get` and `ctx.set` with `ctx.store.set` (`ctx` here is the `Context` from `llama_index.core.workflow`):
- droidrun/agent/utils/executer.py
- droidrun/agent/planner/planner_agent.py
- droidrun/agent/codeact/codeact_agent.py
- droidrun/agent/droid/droid_agent.py
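If the repo needs to keep working with both older and newer `llama_index` releases, a small compatibility shim could replace the per-file edits. The sketch below is only an illustration: `ctx_get` and `ctx_set` are hypothetical helper names, and it assumes that both the legacy `ctx.get`/`ctx.set` and the newer `ctx.store.get`/`ctx.store.set` are coroutines taking a key (plus a `default` keyword for `get`). `FakeStore` and `FakeContext` are stand-ins so the example runs without `llama_index` installed.

```python
import asyncio
from typing import Any


async def ctx_get(ctx: Any, key: str, default: Any = None) -> Any:
    """Read a value via ctx.store.get when available, else fall back to ctx.get."""
    store = getattr(ctx, "store", None)
    if store is not None:
        return await store.get(key, default=default)
    return await ctx.get(key, default=default)


async def ctx_set(ctx: Any, key: str, value: Any) -> None:
    """Write a value via ctx.store.set when available, else fall back to ctx.set."""
    store = getattr(ctx, "store", None)
    if store is not None:
        await store.set(key, value)
    else:
        await ctx.set(key, value)


class FakeStore:
    """Stand-in for the newer state store (assumption, not the real class)."""

    def __init__(self) -> None:
        self._data: dict = {}

    async def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    async def set(self, key: str, value: Any) -> None:
        self._data[key] = value


class FakeContext:
    """Stand-in for a Context that only exposes `.store`."""

    def __init__(self) -> None:
        self.store = FakeStore()


async def demo() -> Any:
    ctx = FakeContext()
    await ctx_set(ctx, "remembered_info", ["contact draft opened"])
    return await ctx_get(ctx, "remembered_info", default=[])


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Call sites would then use `await ctx_get(ctx, "key")` instead of touching `ctx.get` or `ctx.store.get` directly, which keeps the version check in one place.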
Issue 2:
Updated the "droid_agent.py" file to replace `steps=[]` with `steps=0`, since `CodeActResultEvent.steps` is validated as an integer:
CodeActResultEvent(success=False, reason=f"Error: {str(e)}", task=task, steps=0)
Issue 3:
Updated the "eval/tracker.py" file to replace `logger.error` with `logger.debug`, so that the webhook is optional and a missing `DISCORD_WEBHOOK_URL` is no longer reported as an error:
def send_discord_embed(embed: dict):
    webhook_url = os.getenv("DISCORD_WEBHOOK_URL")
    if webhook_url is None:
        logger.debug("DISCORD_WEBHOOK_URL is not set, skipping Discord notification")
        return
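For anyone who wants to see the whole function in one piece, here is a minimal self-contained sketch of the optional-webhook pattern. Only the early-return guard comes from the change above. The rest is an assumption on my side: the real `eval/tracker.py` may use a different HTTP client, and the `bool` return value and the `urllib` request are purely illustrative.

```python
import json
import logging
import os
import urllib.request

logger = logging.getLogger("tracker")


def send_discord_embed(embed: dict) -> bool:
    """Post an embed to Discord; return False when no webhook is configured."""
    webhook_url = os.getenv("DISCORD_WEBHOOK_URL")
    if not webhook_url:
        # Treat the webhook as optional: log quietly and skip the network call.
        logger.debug("DISCORD_WEBHOOK_URL is not set, skipping Discord notification")
        return False

    # Assumption: the payload is sent as JSON with a single embed.
    payload = json.dumps({"embeds": [embed]}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return 200 <= response.status < 300
```

With the variable unset, `send_discord_embed({"title": "Run finished"})` returns `False` and only emits a debug-level message, so a benchmark run without Discord configured stays free of spurious errors.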
I am pretty sure your team already has an updated version internally, but I thought this would be helpful for others who run into the same issues when reproducing the benchmark results.