fix: Debug and enhance Exgentic A2A runner #10
Open
yoavkatz wants to merge 17 commits into kagenti:main from
Implement complete test harness for Exgentic benchmarks following the flow described in kagenti/kagenti#963
Key features:
- MCP client using official Python SDK with streamable HTTP transport
- Sequential session processing with full lifecycle management
- A2A protocol integration for agent communication
- OpenTelemetry instrumentation for metrics and tracing
- Comprehensive configuration and documentation
Components:
- mcp_client.py: MCP protocol client for Exgentic server
- exgentic_adapter.py: High-level adapter for session management
- runner.py: Main orchestration with telemetry
- config.py: Configuration management
- prompt.py: Prompt builder with session_id injection
- otel.py: OpenTelemetry setup
- a2a_client.py: A2A protocol client (from appworld_a2a_runner)
Testing:
- Successfully connects to Exgentic MCP server (tau2 benchmark)
- Verified session creation with 114 available tasks
- Proper error handling and logging configuration
Documentation:
- README.md: Complete usage guide
- QUICKSTART.md: Quick start for Kagenti cluster
- Architecture and implementation docs
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Changes:
- Add list_tasks() method to MCPClient to fetch all available task IDs
- Add get_task_ids() method to ExgenticAdapter
- Update iterate_sessions() to accept task_ids list and respect max_tasks
- Update create_session() to accept optional task_id parameter
- Update runner to fetch task IDs first, then iterate over them
- Remove debug exit(99) statement
- Improve logging to show progress (task X/Y)
This ensures we know the total number of tasks upfront and can properly limit processing with max_tasks configuration.
Signed-off-by: Yoav Katz <katz@il.ibm.com>
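The "fetch IDs first, then iterate with a cap" flow described in this commit can be sketched roughly as follows. This is an illustration only, not the code in the PR: the helper name `iterate_tasks` and its return shape are hypothetical, and the real `iterate_sessions()` presumably also creates and tears down sessions.

```python
from typing import Iterator, Optional, Sequence, Tuple


def iterate_tasks(
    task_ids: Sequence[str], max_tasks: Optional[int] = None
) -> Iterator[Tuple[int, int, str]]:
    """Yield (position, total, task_id) for at most max_tasks task IDs.

    Hypothetical helper: the total is known before the first task runs,
    which is what makes "task X/Y" progress logging possible.
    """
    selected = list(task_ids)
    if max_tasks is not None:
        # Treat a negative cap as "no tasks" instead of slicing from the end.
        selected = selected[: max(max_tasks, 0)]
    total = len(selected)
    for position, task_id in enumerate(selected, start=1):
        yield position, total, task_id
```

A caller could log progress with something like `print(f"task {position}/{total}: {task_id}")` inside a loop over `iterate_tasks(ids, max_tasks=2)`.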
Remove all '# Made with ...' comments from Python files for cleaner code.
Signed-off-by: Yoav Katz <katz@il.ibm.com>
The agent card may advertise an internal URL (e.g., 0.0.0.0:8000) that is not accessible from outside the pod. This change ensures we always use the configured A2A_BASE_URL (e.g., localhost:8080 via port-forward) instead of the URL from the agent card.
This fixes the 404 error when connecting to agents behind port-forwards or proxies.
Signed-off-by: Yoav Katz <katz@il.ibm.com>
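A minimal sketch of the URL selection described in this commit, assuming the agent card is available as a plain dict with a `url` field. The function name `select_endpoint` and the debug log are hypothetical and only illustrate the idea of preferring the configured base URL.

```python
import logging
from typing import Any, Mapping, Optional

logger = logging.getLogger(__name__)


def select_endpoint(
    configured_base_url: str, agent_card: Optional[Mapping[str, Any]]
) -> str:
    """Return the configured base URL, ignoring the URL the card advertises.

    Hypothetical helper: the card's URL is only logged so that a mismatch
    (for example an in-pod address like 0.0.0.0:8000) is visible in logs.
    """
    base = configured_base_url.rstrip("/")
    advertised = (agent_card or {}).get("url")
    if advertised and str(advertised).rstrip("/") != base:
        logger.debug(
            "Agent card advertises %s; using configured %s instead",
            advertised,
            base,
        )
    return base
```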
- Fix syntax errors in run-with-port-forward.sh:
  * Add missing comment symbol on line 36
  * Fix unclosed quote on line 40
  * Replace parentheses in echo statements to avoid syntax errors
  * Update service names to match actual cluster services
- Configure A2A endpoint to use root path (/) instead of /v1/chat
- Enable OTEL trace collection to local Jaeger instance (localhost:4317)
- Enhance OTEL instrumentation:
  * Add full prompt text to span attributes (prompt.text)
  * Add full response text to span attributes (response.text)
  * Improve visibility of inputs/outputs in Jaeger traces
- Improve prompt instructions:
  * Add explicit instruction to call submit MCP tool when asked
- Enhance logging:
  * Add evaluation result details to session evaluation logs
Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add AGENT_SERVICE and BENCHMARK_SERVICE to example.env
- Update run-with-port-forward.sh to read service names from .env
- Use default values if environment variables are not set
- Improves configurability and makes it easier to switch between different deployments
Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add MAX_PARALLEL_SESSIONS configuration parameter (default: 1)
- Implement ThreadPoolExecutor for concurrent session execution
- Add thread-safe result collection with mutex lock
- Display max parallel sessions in run summary
- Maintain backward compatibility with sequential processing (max_parallel_sessions=1)
- Support abort_on_failure in parallel mode by canceling remaining futures
Benefits:
- Significantly improves throughput for I/O-bound workloads
- Allows users to configure parallelism based on their needs
- Maintains all existing functionality and error handling
Signed-off-by: Yoav Katz <katz@il.ibm.com>
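One way the parallel mode described in this commit could look is sketched below. It is not the runner's actual code: `run_sessions`, `run_one`, and the result dict shape are hypothetical names chosen for illustration. The sketch shows a `ThreadPoolExecutor`, a lock around the shared result list, and cancellation of not-yet-started futures when `abort_on_failure` is set.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, List, Sequence


def run_sessions(
    task_ids: Sequence[str],
    run_one: Callable[[str], Any],
    max_parallel_sessions: int = 1,
    abort_on_failure: bool = False,
) -> List[Dict[str, Any]]:
    """Run run_one(task_id) for each task, at most N at a time."""
    results: List[Dict[str, Any]] = []
    lock = threading.Lock()

    def worker(task_id: str) -> Dict[str, Any]:
        # Failures are captured as data so one bad session cannot crash the pool.
        try:
            outcome = {"task_id": task_id, "ok": True, "output": run_one(task_id)}
        except Exception as exc:
            outcome = {"task_id": task_id, "ok": False, "error": str(exc)}
        with lock:  # guard the shared list across worker threads
            results.append(outcome)
        return outcome

    workers = max(1, max_parallel_sessions)  # 1 behaves like sequential mode
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, task_id) for task_id in task_ids]
        for future in as_completed(futures):
            if abort_on_failure and not future.result()["ok"]:
                # cancel() only affects futures that have not started yet.
                for pending in futures:
                    pending.cancel()
                break
    return results
```

Note that in this sketch sessions already running when a failure is detected still finish, because `Future.cancel()` cannot interrupt a running call; only queued sessions are skipped.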
- Display table of all failed sessions with their error messages at end of run summary
- Truncate long error messages to 50 characters for readability
- Only show table if there are failed sessions
- Helps quickly identify and diagnose session failures
Signed-off-by: Yoav Katz <katz@il.ibm.com>
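A rough sketch of such a failure table, assuming each result is a dict with `task_id`, `ok`, and `error` keys. Those keys, the column headers, and the function name `format_failed_sessions` are hypothetical; only the 50-character limit and the "no table when nothing failed" rule come from the commit message.

```python
from typing import Any, Dict, Sequence


def format_failed_sessions(
    results: Sequence[Dict[str, Any]], max_error_len: int = 50
) -> str:
    """Return a plain-text table of failed sessions, or "" if none failed."""
    failed = [r for r in results if not r.get("ok")]
    if not failed:
        return ""

    def truncate(text: str) -> str:
        # Keep the total width at max_error_len, including the ellipsis.
        if len(text) <= max_error_len:
            return text
        return text[: max_error_len - 3] + "..."

    rows = [(str(r["task_id"]), truncate(str(r.get("error", "")))) for r in failed]
    id_width = max(len("Session"), max(len(task) for task, _ in rows))
    lines = [
        f"{'Session':<{id_width}}  Error",
        f"{'-' * id_width}  {'-' * max_error_len}",
    ]
    lines += [f"{task:<{id_width}}  {error}" for task, error in rows]
    return "\n".join(lines)
```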
- Extract text from artifacts and result first, regardless of state
- Then handle failed/canceled/rejected states with extracted information
- Include extracted output in error messages for better debugging
- Provides complete context when tasks don't complete successfully
Signed-off-by: Yoav Katz <katz@il.ibm.com>
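The "extract first, then check state" ordering could be sketched as below. The dict layout (`status.state`, `status.message.parts`, `artifacts[].parts[].text`) is an assumption modeled loosely on A2A task objects, and `extract_text`, `handle_task`, and `TaskFailedError` are hypothetical names, not the PR's actual API.

```python
from typing import Any, Dict, List

# States treated as unsuccessful, per the commit message.
FAILED_STATES = {"failed", "canceled", "rejected"}


class TaskFailedError(RuntimeError):
    """Raised when a task ends in a failed, canceled, or rejected state."""


def extract_text(task: Dict[str, Any]) -> str:
    """Collect text parts from artifacts and the status message (assumed layout)."""
    chunks: List[str] = []
    for artifact in task.get("artifacts") or []:
        for part in artifact.get("parts") or []:
            if part.get("text"):
                chunks.append(part["text"])
    message = (task.get("status") or {}).get("message") or {}
    for part in message.get("parts") or []:
        if part.get("text"):
            chunks.append(part["text"])
    return "\n".join(chunks)


def handle_task(task: Dict[str, Any]) -> str:
    """Extract output regardless of state, then raise on unsuccessful states."""
    output = extract_text(task)  # done first so failures keep their context
    state = (task.get("status") or {}).get("state", "unknown")
    if state in FAILED_STATES:
        raise TaskFailedError(
            f"task ended in state '{state}'; output: {output or '<none>'}"
        )
    return output
```

Extracting before branching on state means the error message can carry whatever the agent produced before it stopped.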
…cluster
Add three new scripts to automate deployment and configuration of Exgentic benchmark system on Kagenti Kubernetes cluster:
1. deploy-benchmark.sh: Deploy MCP tools via Kagenti API
   - Syncs local container images to cluster registry
   - Authenticates with Keycloak using password grant flow
   - Deploys tools with proper service configuration
   - Patches imagePullPolicy for local images
   - Waits for deployment readiness
2. deploy-agent.sh: Deploy A2A agents from source
   - Fetches and parses environment variables from GitHub
   - Deploys agents using Shipwright builds
   - Monitors build progress and waits for completion
   - Waits for deployment creation and readiness
   - Tests agent accessibility via A2A protocol
   - Fixes port configuration (8080 -> 8000)
3. configure-agent-environment.sh: Configure agent environment
   - Updates OpenAI API secret via kubectl patch
   - Patches agent deployment with Azure OpenAI settings
   - Accepts benchmark name as parameter
   - Waits for rollout completion
These scripts enable automated deployment and testing of the Exgentic benchmark system without manual kubectl commands or UI interaction.
Fixes:
- Agent port mismatch (container port 8000 vs service port 8080)
- MCP_URLS environment variable configuration
- Azure OpenAI endpoint and model configuration
Signed-off-by: Yoav Katz <katz@il.ibm.com>
…agenti-ui
Port 8080 was being used by both the A2A agent port-forward and the kagenti-ui service (via Istio gateway), causing intermittent access issues to http://kagenti-ui.localtest.me:8080/.
Changes:
- Updated A2A_BASE_URL from localhost:8080 to localhost:8081 in example.env
- Modified run-with-port-forward.sh to forward A2A agent to port 8081
- Updated connectivity test to check port 8081
This allows kagenti-ui to be accessed on port 8080 via Istio gateway while the A2A agent uses port 8081, eliminating port conflicts.
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Changes:
- Made configure-agent-environment.sh executable (chmod +x)
- Fixed tool name in deploy-agent.sh: removed duplicate '-mcp' suffix
from 'exgentic-mcp-${BENCHMARK_NAME}-mcp' to 'exgentic-mcp-${BENCHMARK_NAME}'
- Fixed tool name in deploy-benchmark.sh: removed duplicate '-mcp' suffix
from 'exgentic-mcp-${BENCHMARK_NAME}-mcp' to 'exgentic-mcp-${BENCHMARK_NAME}'
This ensures consistent tool naming across deployment scripts and makes
the configuration script directly executable.
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
… auth
Changes:
- Updated QUICKSTART.md with comprehensive deployment instructions
- Added Option 1: Deploy Your Own Benchmark and Agent
- Added Option 2: Use Existing Services
- Documented deploy-benchmark.sh and deploy-agent.sh usage
- Updated configuration section with new port (8081) for A2A agent
- Added reference documentation for deployment scripts
- Fixed Keycloak authentication error in deployment scripts
- Added automatic enabling of Direct Access Grants for kagenti client
- Both deploy-benchmark.sh and deploy-agent.sh now configure Keycloak
- Added better error messages for authentication failures
- Renumbered steps after adding Keycloak configuration step
This resolves the 'unauthorized_client' error when running deployment scripts and provides clear documentation for deploying benchmarks and agents to the Kagenti cluster.
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Changes:
1. Renamed configure-agent-environment.sh to configure-agent-and-benchmark-environment.sh
   - Use 'kubectl set env' instead of JSON patch for cleaner updates
   - Extended script to configure both agent and benchmark deployments
   - Added clear separation between agent and benchmark configuration sections
   - Improved output formatting with dedicated sections for each component
   - Added deployment-specific configuration summaries
   - Agent gets: LLM_API_BASE, OPENAI_API_BASE, LLM_MODEL
   - Benchmark gets: OPENAI_API_BASE, EXGENTIC_SET_BENCHMARK_USER_SIMULATOR_MODEL
2. Enhanced deploy-benchmark.sh
   - Added fetching and parsing of benchmark-specific environment variables
   - Fetches .env.<benchmark> from agent-examples repository
   - Parses environment variables using Kagenti API
   - Includes env vars in tool deployment configuration
   - Added graceful handling when env file is not found
   - Renumbered steps after adding env var fetching step
These improvements ensure:
- Consistent LLM configuration across agent and benchmark
- Better visibility into what's being configured
- Benchmark-specific settings are properly applied from repository
- Clearer output for troubleshooting
- Proper separation of concerns between agent and benchmark configuration
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Changed port-forward cleanup to kill processes by port number instead of service name. This ensures all existing port-forwards on ports 8000 and 8081 are cleaned up regardless of which benchmark or agent service they were forwarding to.
Uses lsof to find processes using the ports and kills them, making the script more robust when switching between different benchmarks/agents.
Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add resource limits (2Gi memory) to benchmark pod deployments
- Rename close_session to delete_session throughout the stack
- Add validation for delete_session response (supports both 'success' and 'status' fields)
- Conditionally set EXGENTIC_SET_BENCHMARK_USER_SIMULATOR_MODEL only for tau benchmarks
- Create evaluate_benchmark.sh script that accepts benchmark name as parameter
- Set AGENT_SERVICE and BENCHMARK_SERVICE dynamically based on benchmark name
Signed-off-by: Yoav Katz <katz@il.ibm.com>
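The delete_session response check that accepts either field might look something like this sketch. The commit only says that both 'success' and 'status' fields are supported; the helper name `delete_succeeded` and the set of status strings treated as success are assumptions made for illustration.

```python
from typing import Any

# Assumption: which status strings count as success is not stated in the PR.
_OK_STATUSES = {"success", "ok", "deleted"}


def delete_succeeded(response: Any) -> bool:
    """Return True if a delete_session response indicates success.

    Accepts either a boolean-like 'success' field or a string 'status'
    field; anything else is treated as a failure.
    """
    if not isinstance(response, dict):
        return False
    if "success" in response:
        return bool(response["success"])
    status = response.get("status")
    return isinstance(status, str) and status.lower() in _OK_STATUSES
```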
This PR adds a test harness for running Exgentic benchmarks.
For: kagenti/kagenti#963
as part of Epic: kagenti/kagenti#962