fix: Debug and enhance Exgentic A2A runner by yoavkatz · Pull Request #10 · kagenti/workload-harness

yoavkatz · 2026-03-19T16:09:55Z

This PR adds a test harness for running and checking Exgentic benchmarks.

Implement complete test harness for Exgentic benchmarks following the flow described in kagenti/kagenti#963

Key features:
- MCP client using official Python SDK with streamable HTTP transport
- Sequential session processing with full lifecycle management
- A2A protocol integration for agent communication
- OpenTelemetry instrumentation for metrics and tracing
- Comprehensive configuration and documentation

Components:
- mcp_client.py: MCP protocol client for Exgentic server
- exgentic_adapter.py: High-level adapter for session management
- runner.py: Main orchestration with telemetry
- config.py: Configuration management
- prompt.py: Prompt builder with session_id injection
- otel.py: OpenTelemetry setup
- a2a_client.py: A2A protocol client (from appworld_a2a_runner)

Testing:
- Successfully connects to Exgentic MCP server (tau2 benchmark)
- Verified session creation with 114 available tasks
- Proper error handling and logging configuration

Documentation:
- README.md: Complete usage guide
- QUICKSTART.md: Quick start for Kagenti cluster
- Architecture and implementation docs

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Changes:
- Add list_tasks() method to MCPClient to fetch all available task IDs
- Add get_task_ids() method to ExgenticAdapter
- Update iterate_sessions() to accept task_ids list and respect max_tasks
- Update create_session() to accept optional task_id parameter
- Update runner to fetch task IDs first, then iterate over them
- Remove debug exit(99) statement
- Improve logging to show progress (task X/Y)

This ensures we know the total number of tasks upfront and can properly limit processing with max_tasks configuration.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
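
The "fetch task IDs first, then iterate" flow might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: the signatures of `iterate_sessions` and `create_session`, the helper `select_task_ids`, and the choice that `max_tasks=None` means "no limit" are all hypothetical.

```python
import logging
from typing import Callable, Iterator, List, Optional, Tuple

logger = logging.getLogger("runner")


def select_task_ids(task_ids: List[str], max_tasks: Optional[int]) -> List[str]:
    """Return the task IDs to run, capped at max_tasks when it is set.

    Assumption: None means "no limit"; negative values are treated as 0.
    """
    if max_tasks is None:
        return list(task_ids)
    return list(task_ids[: max(max_tasks, 0)])


def iterate_sessions(
    task_ids: List[str],
    create_session: Callable[[str], str],
    max_tasks: Optional[int] = None,
) -> Iterator[Tuple[str, str]]:
    """Yield (task_id, session_id) pairs and log progress as task X/Y."""
    selected = select_task_ids(task_ids, max_tasks)
    total = len(selected)  # known upfront because the IDs were fetched first
    for index, task_id in enumerate(selected, start=1):
        logger.info("Processing task %d/%d (task_id=%s)", index, total, task_id)
        # create_session stands in for the adapter call that accepts a task_id.
        yield task_id, create_session(task_id)
```

Because the full ID list is available before the loop starts, the total used in the progress log is exact rather than estimated.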

Remove all '# Made with ...' comments from Python files for cleaner code. Signed-off-by: Yoav Katz <katz@il.ibm.com>

The agent card may advertise an internal URL (e.g., 0.0.0.0:8000) that is not accessible from outside the pod. This change ensures we always use the configured A2A_BASE_URL (e.g., localhost:8080 via port-forward) instead of the URL from the agent card. This fixes the 404 error when connecting to agents behind port-forwards or proxies. Signed-off-by: Yoav Katz <katz@il.ibm.com>
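
A minimal sketch of this behaviour, assuming the agent card is available as a plain dictionary with a `url` field. The function name `resolve_agent_url` is hypothetical and only illustrates the idea of ignoring the advertised URL.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("a2a_client")


def resolve_agent_url(configured_base_url: str, agent_card: Dict[str, Any]) -> str:
    """Return the URL to call, always preferring the configured base URL.

    The card's "url" may point at an address such as 0.0.0.0:8000 that is
    only reachable from inside the pod, so it is logged and then ignored.
    """
    advertised = agent_card.get("url")
    if advertised and advertised != configured_base_url:
        logger.debug(
            "Ignoring agent card URL %s; using configured URL %s",
            advertised,
            configured_base_url,
        )
    return configured_base_url
```

For example, `resolve_agent_url("http://localhost:8080", {"url": "http://0.0.0.0:8000"})` returns the configured `http://localhost:8080`.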

- Fix syntax errors in run-with-port-forward.sh:
  * Add missing comment symbol on line 36
  * Fix unclosed quote on line 40
  * Replace parentheses in echo statements to avoid syntax errors
  * Update service names to match actual cluster services
- Configure A2A endpoint to use root path (/) instead of /v1/chat
- Enable OTEL trace collection to local Jaeger instance (localhost:4317)
- Enhance OTEL instrumentation:
  * Add full prompt text to span attributes (prompt.text)
  * Add full response text to span attributes (response.text)
  * Improve visibility of inputs/outputs in Jaeger traces
- Improve prompt instructions:
  * Add explicit instruction to call submit MCP tool when asked
- Enhance logging:
  * Add evaluation result details to session evaluation logs

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Add AGENT_SERVICE and BENCHMARK_SERVICE to example.env
- Update run-with-port-forward.sh to read service names from .env
- Use default values if environment variables are not set
- Improves configurability and makes it easier to switch between different deployments

Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Add MAX_PARALLEL_SESSIONS configuration parameter (default: 1)
- Implement ThreadPoolExecutor for concurrent session execution
- Add thread-safe result collection with mutex lock
- Display max parallel sessions in run summary
- Maintain backward compatibility with sequential processing (max_parallel_sessions=1)
- Support abort_on_failure in parallel mode by canceling remaining futures

Benefits:
- Significantly improves throughput for I/O-bound workloads
- Allows users to configure parallelism based on their needs
- Maintains all existing functionality and error handling

Signed-off-by: Yoav Katz <katz@il.ibm.com>
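
The parallel execution described in this commit could be sketched as follows. This is an assumption-laden illustration rather than the runner's actual code: `run_sessions`, `run_one`, and the shape of the result dictionaries are hypothetical names chosen for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List


def run_sessions(
    task_ids: List[str],
    run_one: Callable[[str], Any],
    max_parallel_sessions: int = 1,
    abort_on_failure: bool = False,
) -> List[Dict[str, Any]]:
    """Run one session per task ID and collect results thread-safely."""
    results: List[Dict[str, Any]] = []
    results_lock = threading.Lock()

    def worker(task_id: str) -> bool:
        """Run a single session; return True on success, False on failure."""
        try:
            record = {"task_id": task_id, "ok": True, "output": run_one(task_id)}
        except Exception as exc:  # keep going so other sessions are reported
            record = {"task_id": task_id, "ok": False, "error": str(exc)}
        with results_lock:  # mutex around the shared results list
            results.append(record)
        return record["ok"]

    if max_parallel_sessions <= 1:
        # Sequential path, kept for backward compatibility.
        for task_id in task_ids:
            if not worker(task_id) and abort_on_failure:
                break
        return results

    with ThreadPoolExecutor(max_workers=max_parallel_sessions) as pool:
        futures = [pool.submit(worker, task_id) for task_id in task_ids]
        for future in as_completed(futures):
            if not future.result() and abort_on_failure:
                # cancel() only stops futures that have not started yet;
                # sessions already running are allowed to finish.
                for pending in futures:
                    pending.cancel()
                break
    return results
```

Threads fit here because each session spends most of its time waiting on network calls, so the interpreter lock is rarely the bottleneck.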

- Display table of all failed sessions with their error messages at end of run summary
- Truncate long error messages to 50 characters for readability
- Only show table if there are failed sessions
- Helps quickly identify and diagnose session failures

Signed-off-by: Yoav Katz <katz@il.ibm.com>
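
One way to render such a table is sketched below. The result dictionary keys (`task_id`, `ok`, `error`), the column headers, and the decision to count the trailing `...` inside the 50-character limit are assumptions made for this example.

```python
from typing import Any, Dict, List

MAX_ERROR_LENGTH = 50  # limit named in the commit message


def truncate(message: str, limit: int = MAX_ERROR_LENGTH) -> str:
    """Shorten message to at most `limit` characters, marking the cut."""
    if len(message) <= limit:
        return message
    return message[: limit - 3] + "..."


def format_failed_sessions(results: List[Dict[str, Any]]) -> str:
    """Return a plain-text table of failed sessions, or "" if none failed."""
    failed = [r for r in results if not r.get("ok")]
    if not failed:
        return ""  # the table is only shown when something failed
    id_width = max(len("Session"), *(len(str(r["task_id"])) for r in failed))
    header = f"{'Session':<{id_width}}  Error"
    divider = "-" * (id_width + 2 + MAX_ERROR_LENGTH)
    rows = [
        f"{str(r['task_id']):<{id_width}}  {truncate(str(r.get('error', '')))}"
        for r in failed
    ]
    return "\n".join([header, divider, *rows])
```

The full error text would still be available in the logs; only the summary view is shortened.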

- Extract text from artifacts and result first, regardless of state
- Then handle failed/canceled/rejected states with extracted information
- Include extracted output in error messages for better debugging
- Provides complete context when tasks don't complete successfully

Signed-off-by: Yoav Katz <katz@il.ibm.com>
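
The "extract first, then check state" ordering can be sketched as below. The task is modelled as a plain dictionary with `artifacts`, `result`, and `status.state` fields; that shape, the `TaskFailedError` class, and both function names are assumptions for illustration and may not match the real A2A client in this PR.

```python
from typing import Any, Dict, List

FAILURE_STATES = {"failed", "canceled", "rejected"}


class TaskFailedError(RuntimeError):
    """Raised when a task ends in a failure state (hypothetical class)."""


def extract_text(task: Dict[str, Any]) -> str:
    """Collect text from artifact parts and the result, whatever the state."""
    chunks: List[str] = []
    for artifact in task.get("artifacts") or []:
        for part in artifact.get("parts") or []:
            text = part.get("text")
            if isinstance(text, str) and text:
                chunks.append(text)
    result = task.get("result")
    if isinstance(result, str) and result:
        chunks.append(result)
    return "\n".join(chunks)


def handle_task(task: Dict[str, Any]) -> str:
    """Return the task output, or raise with the output attached on failure."""
    output = extract_text(task)  # done before looking at the state
    state = (task.get("status") or {}).get("state", "unknown")
    if state in FAILURE_STATES:
        detail = output or "<no output>"
        raise TaskFailedError(f"Task ended in state '{state}': {detail}")
    return output
```

Extracting before the state check means a failed task still reports whatever partial output the agent produced.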

…cluster

Add three new scripts to automate deployment and configuration of Exgentic benchmark system on Kagenti Kubernetes cluster:

1. deploy-benchmark.sh: Deploy MCP tools via Kagenti API
   - Syncs local container images to cluster registry
   - Authenticates with Keycloak using password grant flow
   - Deploys tools with proper service configuration
   - Patches imagePullPolicy for local images
   - Waits for deployment readiness
2. deploy-agent.sh: Deploy A2A agents from source
   - Fetches and parses environment variables from GitHub
   - Deploys agents using Shipwright builds
   - Monitors build progress and waits for completion
   - Waits for deployment creation and readiness
   - Tests agent accessibility via A2A protocol
   - Fixes port configuration (8080 -> 8000)
3. configure-agent-environment.sh: Configure agent environment
   - Updates OpenAI API secret via kubectl patch
   - Patches agent deployment with Azure OpenAI settings
   - Accepts benchmark name as parameter
   - Waits for rollout completion

These scripts enable automated deployment and testing of the Exgentic benchmark system without manual kubectl commands or UI interaction.

Fixes:
- Agent port mismatch (container port 8000 vs service port 8080)
- MCP_URLS environment variable configuration
- Azure OpenAI endpoint and model configuration

Signed-off-by: Yoav Katz <katz@il.ibm.com>

…agenti-ui

Port 8080 was being used by both the A2A agent port-forward and the kagenti-ui service (via Istio gateway), causing intermittent access issues to http://kagenti-ui.localtest.me:8080/.

Changes:
- Updated A2A_BASE_URL from localhost:8080 to localhost:8081 in example.env
- Modified run-with-port-forward.sh to forward A2A agent to port 8081
- Updated connectivity test to check port 8081

This allows kagenti-ui to be accessed on port 8080 via Istio gateway while the A2A agent uses port 8081, eliminating port conflicts.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Changes:
- Made configure-agent-environment.sh executable (chmod +x)
- Fixed tool name in deploy-agent.sh: removed duplicate '-mcp' suffix from 'exgentic-mcp-${BENCHMARK_NAME}-mcp' to 'exgentic-mcp-${BENCHMARK_NAME}'
- Fixed tool name in deploy-benchmark.sh: removed duplicate '-mcp' suffix from 'exgentic-mcp-${BENCHMARK_NAME}-mcp' to 'exgentic-mcp-${BENCHMARK_NAME}'

This ensures consistent tool naming across deployment scripts and makes the configuration script directly executable.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Signed-off-by: Yoav Katz <katz@il.ibm.com>

… auth

Changes:
- Updated QUICKSTART.md with comprehensive deployment instructions
- Added Option 1: Deploy Your Own Benchmark and Agent
- Added Option 2: Use Existing Services
- Documented deploy-benchmark.sh and deploy-agent.sh usage
- Updated configuration section with new port (8081) for A2A agent
- Added reference documentation for deployment scripts
- Fixed Keycloak authentication error in deployment scripts
- Added automatic enabling of Direct Access Grants for kagenti client
- Both deploy-benchmark.sh and deploy-agent.sh now configure Keycloak
- Added better error messages for authentication failures
- Renumbered steps after adding Keycloak configuration step

This resolves the 'unauthorized_client' error when running deployment scripts and provides clear documentation for deploying benchmarks and agents to the Kagenti cluster.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Changes:

1. Renamed configure-agent-environment.sh to configure-agent-and-benchmark-environment.sh
   - Use 'kubectl set env' instead of JSON patch for cleaner updates
   - Extended script to configure both agent and benchmark deployments
   - Added clear separation between agent and benchmark configuration sections
   - Improved output formatting with dedicated sections for each component
   - Added deployment-specific configuration summaries
   - Agent gets: LLM_API_BASE, OPENAI_API_BASE, LLM_MODEL
   - Benchmark gets: OPENAI_API_BASE, EXGENTIC_SET_BENCHMARK_USER_SIMULATOR_MODEL
2. Enhanced deploy-benchmark.sh
   - Added fetching and parsing of benchmark-specific environment variables
   - Fetches .env.<benchmark> from agent-examples repository
   - Parses environment variables using Kagenti API
   - Includes env vars in tool deployment configuration
   - Added graceful handling when env file is not found
   - Renumbered steps after adding env var fetching step

These improvements ensure:
- Consistent LLM configuration across agent and benchmark
- Better visibility into what's being configured
- Benchmark-specific settings are properly applied from repository
- Clearer output for troubleshooting
- Proper separation of concerns between agent and benchmark configuration

Signed-off-by: Yoav Katz <katz@il.ibm.com>

Changed port-forward cleanup to kill processes by port number instead of service name. This ensures all existing port-forwards on ports 8000 and 8081 are cleaned up regardless of which benchmark or agent service they were forwarding to. Uses lsof to find processes using the ports and kills them, making the script more robust when switching between different benchmarks/agents. Signed-off-by: Yoav Katz <katz@il.ibm.com>

- Add resource limits (2Gi memory) to benchmark pod deployments
- Rename close_session to delete_session throughout the stack
- Add validation for delete_session response (supports both 'success' and 'status' fields)
- Conditionally set EXGENTIC_SET_BENCHMARK_USER_SIMULATOR_MODEL only for tau benchmarks
- Create evaluate_benchmark.sh script that accepts benchmark name as parameter
- Set AGENT_SERVICE and BENCHMARK_SERVICE dynamically based on benchmark name

Signed-off-by: Yoav Katz <katz@il.ibm.com>
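
The delete_session response validation might be sketched as follows. Only the two field names, 'success' and 'status', come from the commit message; the function name `delete_succeeded` and the set of status strings treated as success are assumptions for illustration.

```python
from typing import Any, Dict

# Assumption: these status strings indicate a successful delete.
ACCEPTED_STATUSES = {"success", "ok", "deleted"}


def delete_succeeded(response: Dict[str, Any]) -> bool:
    """Return True if a delete_session response reports success.

    Two response shapes are tolerated: a boolean-like "success" field,
    or a string "status" field.
    """
    if "success" in response:
        return bool(response["success"])
    status = response.get("status")
    if isinstance(status, str):
        return status.strip().lower() in ACCEPTED_STATUSES
    return False  # neither field present: treat as not confirmed
```

Treating a response with neither field as a failure is the conservative choice, since an unconfirmed delete may leave a session open on the server.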

yoavkatz added 6 commits March 17, 2026 19:21

chore: Remove attribution comments from code

0c8e92e


pdettori requested a review from kellyaa March 19, 2026 17:55

yoavkatz added 10 commits March 22, 2026 09:07

Updated documentation with new scripts

23f5aac

Signed-off-by: Yoav Katz <katz@il.ibm.com>

rubambiza mentioned this pull request Mar 23, 2026

Org Weekly Report 2026-03-16 -- 2026-03-23 kagenti/kagenti#1094

Open

yoavkatz wants to merge 17 commits into kagenti:main from yoavkatz:feature/exgentic-a2a-runner
