Add agent filtering and unify CLI/pipeline code paths #17
Open
sdubagun-amd wants to merge 33 commits into yab from
Conversation
Co-authored-by: Cursor <cursoragent@cursor.com>
Agent filtering:
- GEAK_ALLOWED_AGENTS / GEAK_EXCLUDED_AGENTS env vars with CLI flags
- Prompt-level enforcement via system prompt addendum in task_generator
- Parse-time safety-net filter in dispatch and task_generator
- Default fallback agent: swe_agent
- Accept **_extra kwargs in orchestrator tool functions
- Handle task_files arriving as a JSON string in dispatch_tasks

CLI/pipeline unification:
- run-tasks delegates to dispatch.task_file_to_agent_task and run_task_batch
- Publicize dispatch.task_file_to_agent_task as the canonical entry point
- Add codebase-context standalone CLI entry point
- Update tools.md diagram with agent type nodes
- Fix preprocessor CLI to pass model_factory for UnitTestAgent

Tests:
- test_agent_filtering: 17 cases for filtering logic and prompt injection
- test_tool_consistency: structural checks for CLI/pipeline alignment

Co-authored-by: Cursor <cursoragent@cursor.com>
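The parse-time safety-net filter could look roughly like this, a minimal sketch assuming comma-separated env vars and the `swe_agent` fallback named above (helper names here are illustrative, not the repo's actual API):

```python
import os

DEFAULT_FALLBACK_AGENT = "swe_agent"  # default used when nothing survives filtering

def _csv_env(name: str) -> set[str]:
    """Parse a comma-separated env var into a set of agent names (hypothetical helper)."""
    raw = os.environ.get(name, "")
    return {item.strip() for item in raw.split(",") if item.strip()}

def filter_agents(requested: list[str]) -> list[str]:
    """Drop agents outside GEAK_ALLOWED_AGENTS or inside GEAK_EXCLUDED_AGENTS.

    An empty GEAK_ALLOWED_AGENTS means "allow everything"; the exclude list
    always wins. Falls back to the default agent rather than returning nothing.
    """
    allowed = _csv_env("GEAK_ALLOWED_AGENTS")
    excluded = _csv_env("GEAK_EXCLUDED_AGENTS")
    kept = [a for a in requested
            if (not allowed or a in allowed) and a not in excluded]
    return kept or [DEFAULT_FALLBACK_AGENT]
```

The same predicate can back both the CLI flags and the prompt addendum, so the LLM-facing rule and the parse-time enforcement cannot drift apart.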
…code

- Add pipeline_helpers.py: centralize harness creation/validation, baseline profiling, context injection, model loading, and agent filtering across all CLI entry points (geak, geak-orchestrate, run-tasks, task-generator)
- Add discovery_types.py: shared DiscoveryResult/KernelInfo dataclasses with from_dict() factory and kernel language inference
- Flow per-kernel metrics (duration, pct_of_total, bottleneck) from baseline_metrics.json through inject_pipeline_context to all agents
- Fix config priority to CLI > Prompt > YAML (prevents the LLM prompt from overriding explicit CLI flags like --gpu-ids)
- Fix COMMANDMENT path resolution in task file writing (relative_to)
- Add backend-agnostic warmup to profiler-mcp
- Add harness static validation for --profile/--correctness flags
- Integrate mini-swe-agent tools into the orchestrator (bash, str_replace_editor, profile_kernel, strategy_manager)
- Remove deprecated kernel-profiler MCP, discovery built-in tool, and discovery_defaults.toml
- Update architecture, flow, and tools documentation

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add _normalize_command() to profiler-mcp: wraps commands containing shell constructs (cd, $VAR, &&, |) in bash -c so rocprofv3's os.execvpe can execute them correctly
- Fix all MCP test suites (profiler-mcp, kernel-evolve, openevolve-mcp, kernel-ercs) broken by a fastmcp API change: replace _tool_manager._tools with asyncio.run(mcp.list_tools()) and .fn() with direct function calls

Co-authored-by: Cursor <cursoragent@cursor.com>
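The core of the `_normalize_command()` fix can be sketched like this (the marker list and public name are assumptions; the real helper may detect more constructs):

```python
import shlex

# Substrings that only a shell can interpret; os.execvpe passes them through
# literally, which is why rocprofv3 choked on them.
SHELL_MARKERS = ("&&", "||", "|", ";", ">", "<", "$", "`")

def normalize_command(command: str) -> list[str]:
    """Split a command for execvpe-style execution, wrapping shell syntax in bash -c.

    Plain commands are tokenized with shlex; anything needing shell semantics
    (cd, variable expansion, pipes, conditionals) is delegated to bash.
    """
    tokens = shlex.split(command)
    needs_shell = (
        (tokens and tokens[0] in ("cd", "source"))
        or any(marker in command for marker in SHELL_MARKERS)
    )
    if needs_shell:
        return ["bash", "-c", command]
    return tokens
```

With this shape, `cd /workspace && python3 bench.py` becomes `["bash", "-c", ...]` while a plain `python3 bench.py` stays a direct argv list.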
…llocations

- Extend _normalize_command to detect VAR=value prefix patterns (e.g. HIP_VISIBLE_DEVICES=4 python3 ...) that crash rocprofv3's execvpe
- Add a static check in validate_harness for torch.randn(..., device='cuda') inside run_profile, which pollutes profiler traces with RNG/memset kernels

Co-authored-by: Cursor <cursoragent@cursor.com>
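Detecting the `VAR=value` prefix is a one-regex check; a minimal sketch (the function name is illustrative, and the real code folds this into `_normalize_command`):

```python
import re

# Matches a leading NAME=value environment assignment, which execvpe would try
# to exec as a program, e.g. "HIP_VISIBLE_DEVICES=4 python3 bench.py".
_ENV_PREFIX = re.compile(r"^\s*[A-Za-z_][A-Za-z0-9_]*=\S+")

def has_env_prefix(command: str) -> bool:
    """Return True when the command starts with a VAR=value assignment."""
    return bool(_ENV_PREFIX.match(command))
```

A command flagged here gets the same `bash -c` wrapping as other shell constructs, since only a shell knows how to apply the assignment to the child process.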
…add per-round evaluation

- Add BENCHMARK and FULL_BENCHMARK sections to COMMANDMENT.md generation and validation (now requires all 5: SETUP, CORRECTNESS, PROFILE, BENCHMARK, FULL_BENCHMARK)
- Rename the test_perf tool to save_and_test across all agents, configs, prompts, and tests
- Preprocessor now captures benchmark_baseline.txt and full_benchmark_baseline.txt from harness --benchmark/--full-benchmark
- dispatch.py includes the BENCHMARK section in _geak_test_cmd.sh so agents get wall-clock latency feedback via save_and_test
- Orchestrator runs per-round evaluation: FULL_BENCHMARK + PROFILE on the best candidate from each round, feeding results into next-round task generation
- Update the Dockerfile to use the geak-oe branch with the BENCHMARK-based evaluator
- Update all tests to match the 5-section COMMANDMENT structure

Co-authored-by: Cursor <cursoragent@cursor.com>
The preprocessor writes the UnitTestAgent's generated harness path to harness_path.txt. The orchestrator reads it instead of falling back to discovery's focused_test_file, which lacks --benchmark/--profile support. Co-authored-by: Cursor <cursoragent@cursor.com>
- Rewrite _evaluate_round_best to create a temporary worktree, apply the best patch, set GEAK_* env vars, and run SETUP + FULL_BENCHMARK + PROFILE against the patched kernel (it was running against the unpatched baseline).
- Add programmatic benchmark parsing (_parse_median_latency_ms, _parse_shape_count) for independent speedup verification.
- Auto-discover task_files and results_dir when the LLM omits them.
- Catch LimitsExceeded from the task-generation sub-agent gracefully.
- Add a GEAK_ORCHESTRATOR_STEP_LIMIT safety net (default 200).
- Write benchmark_duration_us into baseline_metrics.json during preprocessing for consistent wall-clock comparisons.
- Increase task-generator limits (step: 75 → 200, cost: $10 → $50).

Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix eval worktree crash: resolve eval_dir to an absolute path so subprocess.run(cwd=...) works regardless of the process CWD.
- Fix wrong winner selection: compare absolute TOTAL_KERNEL_TIME_MS across agents instead of self-reported speedup (which varies per agent baseline); fall back to speedup when kernel times are unavailable.
- Fix the profiler call: replace the unsupported env= kwarg with a temporary os.environ PYTHONPATH, avoiding both the TypeError and rocprofv3 nested-quote issues.
- Add a --start-round flag to the geak-orchestrate CLI so the orchestrator can resume from a given round, skipping exploration and loading prior round evaluations from disk.

Co-authored-by: Cursor <cursoragent@cursor.com>
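The temporary-PYTHONPATH workaround can be written as a small context manager, a sketch of the pattern (the helper name is illustrative, not the repo's API):

```python
import os
from contextlib import contextmanager

@contextmanager
def temp_pythonpath(extra: str):
    """Temporarily prepend a path to PYTHONPATH via os.environ.

    Workaround for callees that reject an env= kwarg: mutate the process
    environment around the call, then restore the original value exactly.
    """
    old = os.environ.get("PYTHONPATH")
    os.environ["PYTHONPATH"] = extra if old is None else f"{extra}{os.pathsep}{old}"
    try:
        yield
    finally:
        if old is None:
            os.environ.pop("PYTHONPATH", None)
        else:
            os.environ["PYTHONPATH"] = old
```

Because the child process inherits `os.environ` at spawn time, this also sidesteps rocprofv3's nested-quote issues: no `VAR=value` string ever appears in the command line.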
- Validate that the fallback agent is in the allowed set before using it
- Strip fenced code block markers when reading COMMANDMENT sections
- Use unique temp filenames for test scripts to avoid concurrent races
- Gracefully handle test discovery failures in the preprocessor
- Safely parse priority and num_gpus in task generator LLM responses
- Fix commandment auto-fix regexes to match both quote styles
- Fix the workspace_path type annotation (Path -> Path | None)

Co-authored-by: Cursor <cursoragent@cursor.com>
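Stripping fence markers from a COMMANDMENT section is a small normalization step; a minimal sketch of one plausible implementation (the function name is hypothetical):

```python
def strip_code_fences(section: str) -> str:
    """Remove a leading ```lang line and trailing ``` line from a section body.

    LLM-written COMMANDMENT sections sometimes arrive wrapped in a fenced
    code block; the commands inside must be extracted before execution.
    """
    lines = section.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    return "\n".join(lines).strip()
```

Unfenced sections pass through unchanged, so the normalization is safe to apply unconditionally.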
A task with num_gpus > len(gpu_ids) would block forever in gpu_queue.get() waiting for GPUs that don't exist. Cap the request to the pool size to prevent the deadlock. Co-authored-by: Cursor <cursoragent@cursor.com>
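The deadlock fix reduces to clamping the request before touching the queue; a sketch (clamping the lower bound to 1 is an added assumption):

```python
def cap_gpu_request(num_gpus: int, pool_size: int) -> int:
    """Cap a task's GPU request to the pool size to avoid a queue deadlock.

    With a queue.Queue holding one token per physical GPU, asking for more
    tokens than exist blocks forever in gpu_queue.get(). Clamp to [1, pool_size].
    """
    return max(1, min(num_gpus, pool_size))
```

The cap runs before any `gpu_queue.get()` call, so an over-sized request degrades to "use every GPU in the pool" instead of hanging the dispatcher.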
- Exclude .rocprofv3/, __pycache__/, *.pyc, .pytest_cache/, *.egg-info/,
*.so, and .geak_resolved/ from git diff when generating patches to
prevent binary artifacts from breaking patch application.
- Add DEFAULT_EVAL_BENCHMARK_ITERATIONS (50) as a shared constant in
pipeline_helpers.py. All benchmark invocations — preprocessing baselines,
agent benchmarks, and orchestrator evaluations — now use this value via
the GEAK_BENCHMARK_EXTRA_ARGS env var, ensuring apples-to-apples
speedup comparisons.
- COMMANDMENT BENCHMARK/FULL_BENCHMARK sections now expand
${GEAK_BENCHMARK_EXTRA_ARGS:-} so iteration count is configurable.
- Preprocessor re-runs all harness modes with --iterations 50 after
initial validation to collect high-quality baselines.
- geak --from-task and geak parallel mode now propagate
GEAK_BENCHMARK_EXTRA_ARGS to agent environments.
- Harness template (mini_unit_test_agent.yaml) instructs agents to
accept --iterations N CLI arg with GEAK_BENCHMARK_ITERATIONS env
fallback.
- Updated INSTRUCTIONS.md, README.md, and docs/ with new env vars,
patch exclusion list, and baseline benchmark re-run step.
Co-authored-by: Cursor <cursoragent@cursor.com>
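The exclusion list above translates into `:(exclude)` pathspecs appended to the diff command; a sketch of how the argv might be assembled (the builder function is illustrative, the patterns are taken from the commit message):

```python
# Artifact patterns kept out of generated patches, so binary files cannot
# break patch application later.
PATCH_EXCLUDES = [
    ".rocprofv3/", "__pycache__/", "*.pyc",
    ".pytest_cache/", "*.egg-info/", "*.so", ".geak_resolved/",
]

def build_diff_cmd(excludes: list[str] = PATCH_EXCLUDES) -> list[str]:
    """Assemble a `git diff` argv using :(exclude) pathspec magic."""
    return ["git", "diff", "--", "."] + [f":(exclude){pat}" for pat in excludes]
```

Passing the pathspecs as separate argv entries (rather than one quoted shell string) avoids a second round of shell-quoting problems when the command is run via subprocess.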
Co-authored-by: Cursor <cursoragent@cursor.com>
- Add MCP integration module (mcp_integration/) from geak_v3_db
- Add AMD/NVIDIA knowledge base (knowledge-base/) from geak_v3_db
- Add RAG scripts and examples from geak_v3_db
- Add --mcp / --debug flags to the mini.py entry point
- Add TeeOutput for console log capture to the trajectory
- Add a langchain optional-dependencies group to pyproject.toml
- Apply output truncation logic to all agent config templates (geak_v3_db)
- Keep the dev multi-agent architecture: ParallelAgent, StrategyInteractiveAgent, UnitTestAgent
- Keep the dev tools_runtime, save_patch, config_editor, task_parser modules
- Merge mkdocs.yml nav: retain all pages from both branches
- save.py: use ensure_ascii=False for non-ASCII character support

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
- Add missing geak_v3_db content to README.md: detailed config table (mini_no_temp, mini_reverse_kl), RAG config parameter table, knowledge base document requirements
- Rewrite README_zh.md to match the README.md structure: add GEAK-v3 intro, multi-agent architecture, parallel optimization, tools, output artifacts

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add HIP/CK kernel language detection and COMMANDMENT generation
- Broaden orchestrator metric parsers to recognize BENCHMARK_LATENCY_MS and "Total median time:" output formats
- Fix preprocessor baseline enrichment to parse BENCHMARK_LATENCY_MS from benchmark_baseline.txt
- Update test discovery, the task planner, and the unit test agent for CK kernels

Co-authored-by: Cursor <cursoragent@cursor.com>
…llel

- Remove the separate MCP early-return path in mini.py
- MCP environment now flows into the standard ParallelAgent pipeline
- env_factory creates MCPEnabledEnvironment for parallel sub-agents when --mcp is set
- MCP prompts (SYSTEM_TEMPLATE, INSTANCE_TEMPLATE) injected into agent config
- Fix profiling_tools.py f-string Python 3.10 compatibility (double → single quotes)

Co-authored-by: Cursor <cursoragent@cursor.com>
…, skip redundant baselines

Seven focused changes to reduce geak-orchestrate wall-clock time:

1. Build on the prior round's best patch: round N+1 agents start from the globally best patch via create_worktree_with_patch, tracked in ctx.
2. Full round context for the task generator: auto-inject ALL prior rounds' results, planned tasks, and orchestrator evaluations so the task generator avoids repeating strategies.
3. Remove baseline establishment: replace Phase 3 / Step 3 in the YAML prompts with "Review Provided Baselines" and add a skip-baseline instruction to inject_pipeline_context for dispatch-path agents.
4. Separate agent vs. eval benchmark iterations: agents use 10 iterations (GEAK_AGENT_BENCHMARK_ITERATIONS) for fast feedback; eval keeps 50.
5. Deterministic patch selection first: try rewrite_best_results before falling back to the LLM SelectPatchAgent, saving 8-76 LLM steps per task.
6. Default rounds to 2: change the GEAK_MAX_ROUNDS default from 5 to 2.
7. Early stopping: break out of the round loop when verified_speedup doesn't improve over the prior best by GEAK_EARLY_STOP_THRESHOLD (default 0.5%).

Also includes bash tool cwd propagation and a save_and_test git exclude syntax fix (:(exclude) instead of :!).

Co-authored-by: Cursor <cursoragent@cursor.com>
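The early-stopping check in item 7 might look like the following sketch; treating the threshold as a relative fraction read from the env var is an assumption about the exact semantics:

```python
import os

def should_stop_early(best_speedup: float, new_speedup: float) -> bool:
    """Decide whether the round loop should break after this round.

    Stops when the new verified speedup fails to beat the prior best by at
    least GEAK_EARLY_STOP_THRESHOLD (interpreted here as a relative fraction,
    default 0.5% — an assumption about the real implementation).
    """
    threshold = float(os.environ.get("GEAK_EARLY_STOP_THRESHOLD", "0.005"))
    return new_speedup <= best_speedup * (1.0 + threshold)
```

Combined with the default of 2 rounds, this means a flat second round ends the run immediately instead of burning eval iterations on diminishing returns.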
Co-authored-by: Cursor <cursoragent@cursor.com>
merge: integrate geak_v3_db into dev (MCP + multi-agent coexistence)
…ture

- Add benchmark_parsing.py to git (imported by the orchestrator, default agent, and parallel agent but previously untracked; it would break on a fresh clone)
- Add previous_tasks_dir, round_evaluations, and current_round params to generate_tasks_from_content to match the generate_tasks signature

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve all conflicts by keeping the PR branch versions:

- README.md, scripts/README.md: keep pipeline-focused docs
- mini.py, profiling_tools.py: keep the PR's CLI flags and ruff formatting
- mkdocs.yml: delete (intentionally removed in cleanup)
- examples/test_scripts/: keep the renamed files

Co-authored-by: Cursor <cursoragent@cursor.com>