fix: Debug and enhance Exgentic A2A runner #10

Open
yoavkatz wants to merge 17 commits into kagenti:main from yoavkatz:feature/exgentic-a2a-runner

yoavkatz commented Mar 19, 2026

This PR adds a test harness for running Exgentic benchmarks.

For: kagenti/kagenti#963

as part of Epic: kagenti/kagenti#962

Implement a complete test harness for Exgentic benchmarks, following the flow
described in kagenti/kagenti#963.

Key features:
- MCP client using official Python SDK with streamable HTTP transport
- Sequential session processing with full lifecycle management
- A2A protocol integration for agent communication
- OpenTelemetry instrumentation for metrics and tracing
- Comprehensive configuration and documentation

Components:
- mcp_client.py: MCP protocol client for Exgentic server
- exgentic_adapter.py: High-level adapter for session management
- runner.py: Main orchestration with telemetry
- config.py: Configuration management
- prompt.py: Prompt builder with session_id injection
- otel.py: OpenTelemetry setup
- a2a_client.py: A2A protocol client (from appworld_a2a_runner)

Testing:
- Successfully connects to Exgentic MCP server (tau2 benchmark)
- Verified session creation with 114 available tasks
- Proper error handling and logging configuration

Documentation:
- README.md: Complete usage guide
- QUICKSTART.md: Quick start for Kagenti cluster
- Architecture and implementation docs

Signed-off-by: Yoav Katz <katz@il.ibm.com>
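
The session_id injection done by the prompt builder could look roughly like the sketch below. The template wording and the `build_prompt` name are illustrative assumptions, not the contents of prompt.py.

```python
# Hypothetical template: the real wording in prompt.py is not shown in this PR.
PROMPT_TEMPLATE = (
    "You are working on an Exgentic benchmark task.\n"
    "Session ID: {session_id}\n"
    "Pass this session ID to every MCP tool call.\n\n"
    "Task:\n{task}"
)


def build_prompt(task: str, session_id: str) -> str:
    """Return the agent prompt with the session ID embedded."""
    if not session_id:
        raise ValueError("session_id must be non-empty")
    return PROMPT_TEMPLATE.format(session_id=session_id, task=task.strip())
```

Embedding the ID in the prompt text lets the agent forward it to the benchmark's MCP tools without any change to the A2A message format.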
Changes:
- Add list_tasks() method to MCPClient to fetch all available task IDs
- Add get_task_ids() method to ExgenticAdapter
- Update iterate_sessions() to accept task_ids list and respect max_tasks
- Update create_session() to accept optional task_id parameter
- Update runner to fetch task IDs first, then iterate over them
- Remove debug exit(99) statement
- Improve logging to show progress (task X/Y)

This ensures we know the total number of tasks upfront and can properly
limit processing with max_tasks configuration.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
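
A minimal sketch of the "fetch IDs first, then iterate" flow described above. The function names are hypothetical, and treating a missing or non-positive `max_tasks` as "no limit" is an assumption rather than confirmed runner behavior.

```python
import logging

logger = logging.getLogger(__name__)


def select_tasks(task_ids, max_tasks=None):
    """Return the task IDs to run, truncated to max_tasks when a limit is set."""
    if max_tasks is None or max_tasks <= 0:  # assumption: no limit in these cases
        return list(task_ids)
    return list(task_ids[:max_tasks])


def iterate_tasks(task_ids, max_tasks=None):
    """Yield (index, total, task_id) so callers can log progress as 'task X/Y'."""
    selected = select_tasks(task_ids, max_tasks)
    total = len(selected)
    for index, task_id in enumerate(selected, start=1):
        logger.info("Processing task %d/%d: %s", index, total, task_id)
        yield index, total, task_id
```

Because the full ID list is known before the loop starts, the total in the progress message is exact rather than discovered as sessions are created.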
Remove all '# Made with ...' comments from Python files for cleaner code.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
The agent card may advertise an internal URL (e.g., 0.0.0.0:8000) that is
not accessible from outside the pod. This change ensures we always use the
configured A2A_BASE_URL (e.g., localhost:8080 via port-forward) instead of
the URL from the agent card.

This fixes the 404 error when connecting to agents behind port-forwards or
proxies.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
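
The override can be sketched as follows. This assumes the agent card has been parsed into a dict with a `url` field; `resolve_a2a_url` is a hypothetical helper name, not necessarily what a2a_client.py uses.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_a2a_url(agent_card: dict, configured_base_url: str) -> str:
    """Return the URL to send A2A requests to.

    The URL advertised in the agent card is deliberately ignored, because it
    may be an in-pod address such as http://0.0.0.0:8000/ that is unreachable
    through a port-forward or proxy.
    """
    advertised = agent_card.get("url")
    target = configured_base_url.rstrip("/") + "/"
    if advertised and advertised.rstrip("/") + "/" != target:
        logger.debug("Ignoring agent card URL %s; using %s", advertised, target)
    return target
```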
- Fix syntax errors in run-with-port-forward.sh:
  * Add missing comment symbol on line 36
  * Fix unclosed quote on line 40
  * Replace parentheses in echo statements to avoid syntax errors
  * Update service names to match actual cluster services

- Configure A2A endpoint to use root path (/) instead of /v1/chat

- Enable OTEL trace collection to local Jaeger instance (localhost:4317)

- Enhance OTEL instrumentation:
  * Add full prompt text to span attributes (prompt.text)
  * Add full response text to span attributes (response.text)
  * Improve visibility of inputs/outputs in Jaeger traces

- Improve prompt instructions:
  * Add explicit instruction to call submit MCP tool when asked

- Enhance logging:
  * Add evaluation result details to session evaluation logs

Signed-off-by: Yoav Katz <katz@il.ibm.com>
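
For illustration, attaching the prompt and response to a span might look like this. The helper accepts any object with a `set_attribute(key, value)` method, which OpenTelemetry spans provide; `RecordingSpan` is a stand-in used only to keep the sketch self-contained, and the function name is an assumption.

```python
def record_prompt_and_response(span, prompt: str, response: str) -> None:
    """Store the full prompt and response text as span attributes."""
    span.set_attribute("prompt.text", prompt)
    span.set_attribute("response.text", response)


class RecordingSpan:
    """Stand-in for a tracing span; it only remembers the attributes it is given."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


span = RecordingSpan()
record_prompt_and_response(span, "Book a flight", "Flight booked")
```

With a real tracer, the same helper would be called inside the span that wraps the A2A request, so that both values appear on that span in Jaeger.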
- Add AGENT_SERVICE and BENCHMARK_SERVICE to example.env
- Update run-with-port-forward.sh to read service names from .env
- Use default values if environment variables are not set
- Improves configurability and makes it easier to switch between different deployments

Signed-off-by: Yoav Katz <katz@il.ibm.com>
pdettori requested a review from kellyaa on March 19, 2026 17:55
yoavkatz added 10 commits March 22, 2026 09:07
- Add MAX_PARALLEL_SESSIONS configuration parameter (default: 1)
- Implement ThreadPoolExecutor for concurrent session execution
- Add thread-safe result collection with mutex lock
- Display max parallel sessions in run summary
- Maintain backward compatibility with sequential processing (max_parallel_sessions=1)
- Support abort_on_failure in parallel mode by canceling remaining futures

Benefits:
- Improves throughput for I/O-bound workloads by running up to MAX_PARALLEL_SESSIONS sessions concurrently
- Allows users to configure parallelism based on their needs
- Maintains all existing functionality and error handling

Signed-off-by: Yoav Katz <katz@il.ibm.com>
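
A simplified sketch of the parallel execution pattern described above, not the runner's actual code. `run_sessions` and the `(task_id, ok, detail)` result shape are assumptions made for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_sessions(task_ids, run_one, max_parallel_sessions=1, abort_on_failure=False):
    """Run run_one(task_id) for every task and return (task_id, ok, detail) tuples."""
    results = []
    lock = threading.Lock()  # guards the shared results list

    def worker(task_id):
        try:
            outcome = (task_id, True, run_one(task_id))
        except Exception as exc:
            outcome = (task_id, False, str(exc))
        with lock:
            results.append(outcome)
        return outcome

    with ThreadPoolExecutor(max_workers=max(1, max_parallel_sessions)) as pool:
        futures = [pool.submit(worker, task_id) for task_id in task_ids]
        for future in as_completed(futures):
            _, ok, _ = future.result()
            if abort_on_failure and not ok:
                # cancel() only stops futures that have not started yet;
                # sessions already running are allowed to finish.
                for pending in futures:
                    pending.cancel()
                break
    return results
```

With `max_parallel_sessions=1` the pool has a single worker, so tasks run one at a time in submission order, which corresponds to the sequential mode mentioned above.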
- Display table of all failed sessions with their error messages at end of run summary
- Truncate long error messages to 50 characters for readability
- Only show table if there are failed sessions
- Helps quickly identify and diagnose session failures

Signed-off-by: Yoav Katz <katz@il.ibm.com>
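
One way to render such a table, as a sketch. The column layout, and the choice to count the trailing "..." toward the 50-character limit, are assumptions; `format_failed_sessions` is a hypothetical name.

```python
def format_failed_sessions(failures, max_error_len=50):
    """Format (session_id, error) pairs as a text table; return '' if there are none."""
    if not failures:
        return ""
    rows = []
    for session_id, error in failures:
        message = " ".join(str(error).split())  # collapse newlines and repeated spaces
        if len(message) > max_error_len:
            message = message[: max_error_len - 3] + "..."
        rows.append((str(session_id), message))
    id_width = max([len("Session")] + [len(sid) for sid, _ in rows])
    lines = [
        f"{'Session':<{id_width}}  Error",
        f"{'-' * id_width}  {'-' * max_error_len}",
    ]
    lines += [f"{sid:<{id_width}}  {msg}" for sid, msg in rows]
    return "\n".join(lines)
```

Returning an empty string when nothing failed lets the caller skip the section entirely with a simple truthiness check.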
- Extract text from artifacts and result first, regardless of state
- Then handle failed/canceled/rejected states with extracted information
- Include extracted output in error messages for better debugging
- Provides complete context when tasks don't complete successfully

Signed-off-by: Yoav Katz <katz@il.ibm.com>
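
The "extract first, then check state" ordering can be sketched like this. The dict layout is modeled on A2A task objects (`status.state`, `artifacts[].parts[]` with text parts) and is an assumption about what the client receives; `A2ATaskError`, `extract_text`, and `handle_task` are hypothetical names.

```python
FAILED_STATES = {"failed", "canceled", "rejected"}


class A2ATaskError(RuntimeError):
    """Raised when a task ends in a non-successful terminal state."""


def extract_text(task: dict) -> str:
    """Collect all text parts from the task's artifacts, regardless of state."""
    chunks = []
    for artifact in task.get("artifacts") or []:
        for part in artifact.get("parts") or []:
            if part.get("kind") == "text" and part.get("text"):
                chunks.append(part["text"])
    return "\n".join(chunks)


def handle_task(task: dict) -> str:
    """Return the task output, or raise with whatever output was produced."""
    text = extract_text(task)  # done first so failures still carry context
    state = (task.get("status") or {}).get("state", "unknown")
    if state in FAILED_STATES:
        raise A2ATaskError(
            f"Task ended in state '{state}'. Output so far: {text or '<none>'}"
        )
    return text
```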
…cluster

Add three new scripts to automate deployment and configuration of Exgentic
benchmark system on Kagenti Kubernetes cluster:

1. deploy-benchmark.sh: Deploy MCP tools via Kagenti API
   - Syncs local container images to cluster registry
   - Authenticates with Keycloak using password grant flow
   - Deploys tools with proper service configuration
   - Patches imagePullPolicy for local images
   - Waits for deployment readiness

2. deploy-agent.sh: Deploy A2A agents from source
   - Fetches and parses environment variables from GitHub
   - Deploys agents using Shipwright builds
   - Monitors build progress and waits for completion
   - Waits for deployment creation and readiness
   - Tests agent accessibility via A2A protocol
   - Fixes port configuration (8080 -> 8000)

3. configure-agent-environment.sh: Configure agent environment
   - Updates OpenAI API secret via kubectl patch
   - Patches agent deployment with Azure OpenAI settings
   - Accepts benchmark name as parameter
   - Waits for rollout completion

These scripts enable automated deployment and testing of the Exgentic
benchmark system without manual kubectl commands or UI interaction.

Fixes:
- Agent port mismatch (container port 8000 vs service port 8080)
- MCP_URLS environment variable configuration
- Azure OpenAI endpoint and model configuration

Signed-off-by: Yoav Katz <katz@il.ibm.com>
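
The "wait for readiness" steps follow a poll-until-ready pattern. The scripts themselves are shell and may rely on kubectl's own waiting commands; the generic Python helper below only illustrates the pattern, and its name and default timings are assumptions.

```python
import time


def wait_until(check, timeout_s=300.0, interval_s=5.0):
    """Call check() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Using a monotonic clock for the deadline keeps the timeout unaffected by system clock adjustments during a long build or rollout.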
…agenti-ui

Port 8080 was being used by both the A2A agent port-forward and the
kagenti-ui service (via Istio gateway), causing intermittent access
issues to http://kagenti-ui.localtest.me:8080/.

Changes:
- Updated A2A_BASE_URL from localhost:8080 to localhost:8081 in example.env
- Modified run-with-port-forward.sh to forward A2A agent to port 8081
- Updated connectivity test to check port 8081

This allows kagenti-ui to be accessed on port 8080 via Istio gateway
while the A2A agent uses port 8081, eliminating port conflicts.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
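
A conflict like this can be detected before starting a port-forward by trying to bind the local port. This is a sketch of one possible pre-flight check, not something the scripts are known to do; `port_is_free` is a hypothetical helper.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

For example, checking `port_is_free(8081)` before forwarding the A2A agent would surface a clash up front instead of as intermittent access failures.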
Changes:
- Made configure-agent-environment.sh executable (chmod +x)
- Fixed tool name in deploy-agent.sh: removed duplicate '-mcp' suffix
  from 'exgentic-mcp-${BENCHMARK_NAME}-mcp' to 'exgentic-mcp-${BENCHMARK_NAME}'
- Fixed tool name in deploy-benchmark.sh: removed duplicate '-mcp' suffix
  from 'exgentic-mcp-${BENCHMARK_NAME}-mcp' to 'exgentic-mcp-${BENCHMARK_NAME}'

This ensures consistent tool naming across deployment scripts and makes
the configuration script directly executable.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
… auth

Changes:
- Updated QUICKSTART.md with comprehensive deployment instructions
  - Added Option 1: Deploy Your Own Benchmark and Agent
  - Added Option 2: Use Existing Services
  - Documented deploy-benchmark.sh and deploy-agent.sh usage
  - Updated configuration section with new port (8081) for A2A agent
  - Added reference documentation for deployment scripts

- Fixed Keycloak authentication error in deployment scripts
  - Added automatic enabling of Direct Access Grants for kagenti client
  - Both deploy-benchmark.sh and deploy-agent.sh now configure Keycloak
  - Added better error messages for authentication failures
  - Renumbered steps after adding Keycloak configuration step

This resolves the 'unauthorized_client' error when running deployment
scripts and provides clear documentation for deploying benchmarks and
agents to the Kagenti cluster.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
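
For context, a password grant request to Keycloak has roughly the shape below, and Keycloak rejects it with 'unauthorized_client' when Direct Access Grants are disabled for the client. The sketch only builds the request and does not send it; the realm name in the usage line is an assumption, and older Keycloak versions prefix the path with `/auth`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(keycloak_url, realm, client_id, username, password):
    """Build (but do not send) an OAuth2 password grant request for Keycloak."""
    token_url = (
        f"{keycloak_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    )
    form = urlencode(
        {
            "grant_type": "password",
            "client_id": client_id,
            "username": username,
            "password": password,
        }
    ).encode("ascii")
    return Request(
        token_url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical values, for illustration only.
request = build_token_request("http://keycloak.example", "master", "kagenti", "admin", "secret")
```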
Changes:

1. Renamed configure-agent-environment.sh to configure-agent-and-benchmark-environment.sh
   - Use 'kubectl set env' instead of JSON patch for cleaner updates
   - Extended script to configure both agent and benchmark deployments
   - Added clear separation between agent and benchmark configuration sections
   - Improved output formatting with dedicated sections for each component
   - Added deployment-specific configuration summaries
   - Agent gets: LLM_API_BASE, OPENAI_API_BASE, LLM_MODEL
   - Benchmark gets: OPENAI_API_BASE, EXGENTIC_SET_BENCHMARK_USER_SIMULATOR_MODEL

2. Enhanced deploy-benchmark.sh
   - Added fetching and parsing of benchmark-specific environment variables
   - Fetches .env.<benchmark> from agent-examples repository
   - Parses environment variables using Kagenti API
   - Includes env vars in tool deployment configuration
   - Added graceful handling when env file is not found
   - Renumbered steps after adding env var fetching step

These improvements ensure:
- Consistent LLM configuration across agent and benchmark
- Better visibility into what's being configured
- Benchmark-specific settings are properly applied from repository
- Clearer output for troubleshooting
- Proper separation of concerns between agent and benchmark configuration

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Changed port-forward cleanup to kill processes by port number instead of
service name. This ensures all existing port-forwards on ports 8000 and
8081 are cleaned up regardless of which benchmark or agent service they
were forwarding to.

Uses lsof to find processes using the ports and kills them, making the
script more robust when switching between different benchmarks/agents.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add resource limits (2Gi memory) to benchmark pod deployments
- Rename close_session to delete_session throughout the stack
- Add validation for delete_session response (supports both 'success' and 'status' fields)
- Conditionally set EXGENTIC_SET_BENCHMARK_USER_SIMULATOR_MODEL only for tau benchmarks
- Create evaluate_benchmark.sh script that accepts benchmark name as parameter
- Set AGENT_SERVICE and BENCHMARK_SERVICE dynamically based on benchmark name

Signed-off-by: Yoav Katz <katz@il.ibm.com>
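
Validating a delete_session response that may carry either field could look like the sketch below. The set of accepted `status` strings is an assumption, as is the `delete_succeeded` name.

```python
def delete_succeeded(response: dict) -> bool:
    """Return True if a delete_session response indicates success.

    Supports both a boolean 'success' field and a string 'status' field.
    """
    if "success" in response:
        return response["success"] is True
    status = response.get("status")
    if isinstance(status, str):
        # Assumed success values; adjust to whatever the server actually returns.
        return status.lower() in {"success", "ok", "deleted"}
    return False
```

Treating a response with neither field as a failure keeps an unexpected server reply from being silently counted as a clean session teardown.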