Agent filtering and cli unification #19

Open
sdubagun-amd wants to merge 76 commits into dev from agent-filtering-and-cli-unification

Conversation

@sdubagun-amd

Latest changes include unit test discovery and heterogeneous parallel agents with task creation and execution.

Umangatamd and others added 30 commits February 9, 2026 11:38
Decoupled additions (copied as-is from MSA):
- mcp_tools/: 6 MCP servers (automated-test-discovery, kernel-evolve,
  kernel-ercs, kernel-profiler, metrix-mcp, openevolve-mcp) + mcp-client
- Dockerfile, entrypoint.sh, scripts/run-docker.sh
- runtime_env.py (local/Docker auto-detection)
- optimizer/ (unified OpenEvolve + Autotune interface)
- benchmark.py (standardized benchmarking framework)
- kernel_profile.py (GPU profiling CLI)
- mcp_tools/metrix.py (AMD Metrix API tool)
- reference/ (50+ GPU optimization strategies database + state machine)
- test_suite/ (10-kernel AITER regression suite)
- examples/add_kernel/
- docs: DISCOVERY_PIPELINE.md, METRIX_TOOL.md, GETTING_STARTED.md,
  RUNTIME_ENV.md, RUNTIME_QUICKSTART.md

Integrated changes (best-of-both-worlds):
- Test discovery: MSA's content-based pipeline runs first (fast, free),
  results fed into v3's UnitTestAgent as context. Subagent always runs
  but starts informed rather than exploring from scratch.
- mini.py: Added --runtime, --docker-image, --workspace CLI flags
- pyproject.toml: Added geak/kernel-profile scripts, mcp[cli] dep
- README.md: Added MCP servers, Docker, architecture sections

All geakagent -> minisweagent import references fixed in ported files.
Analysis/comparison docs moved to ~/geak_analysis_docs/ (not needed in repo).
Cherry-picked 6 geak_v3_features commits (167fc13..18853fb):
- Model refactor: amd_base, amd_claude, amd_openai, amd_gemini
- Tool-call message protocol in default.py
- Trajectory saving, test_profiling_tool.py deletion
- Parallel worktree fixes, unit test prompt

Ported 15 MSA post-squash-merge commits (path-translated geakagent -> minisweagent):
- baseline_metrics.py, protected_files.py (new)
- resolve_kernel_url.py, test discovery injection (new)
- OpenEvolve COMMANDMENT-based evaluation refactor
- kernel_profile: remove --filter, always profile all kernels
- optimizer/core.py: new MCP API (gpu, output_dir, commandment_path)
- mini.py: --kernel-url flag, discovery injection, INSTRUCTIONS.md loading
- default.py: summary_on_cost_limit feature
- Dockerfile: OpenEvolve installation
- openevolve-mcp/server.py: refactored

Doc rejects (README, METRIX_TOOL, RUNTIME_QUICKSTART) deferred to cleanup phase.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace double quotes inside f-string expressions with single quotes.
Python 3.10 does not support reusing the outer quote character inside
f-string braces (PEP 701 landed in 3.12).

Co-authored-by: Cursor <cursoragent@cursor.com>
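The incompatibility can be reproduced in a few lines (the dict here is purely illustrative):

```python
# On Python 3.10/3.11, an f-string may not reuse the outer quote character
# inside its braces; PEP 701, which lifts the restriction, landed in 3.12.
d = {"key": "value"}

# Broken before 3.12:  f"{d["key"]}"  -> SyntaxError
ok = f"{d['key']}"  # single quotes inside the braces work on every version
print(ok)
```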
New MCP server `profiler-mcp` wraps both profiling backends behind a
single `profile_kernel` tool with a `backend` parameter:
- backend="metrix": AMD Metrix API (structured JSON, bottleneck classification)
- backend="rocprof-compute": rocprof-compute CLI (deep roofline, instruction mix)

Files:
- mcp_tools/profiler-mcp/src/profiler_mcp/server.py - unified MCP server
- mcp_tools/profiler-mcp/tests/test_profiler_unit.py - 14 mock-based tests
- mcp_tools/profiler-mcp/tests/test_profiler_integration.py - 4 GPU tests
- mcp_tools/profiler-mcp/examples/profile_kernel.py - Python API example

All tests pass (14 unit, 4 integration on MI300X). Ruff-clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
…(Phase 4)

Co-authored-by: Cursor <cursoragent@cursor.com>
…Worker (Phase 5)

Co-authored-by: Cursor <cursoragent@cursor.com>
- Delete geak_agent/ legacy package; move resolve_kernel_url to src/minisweagent/tools/
- Deduplicate MetrixTool (delete src copy, keep mcp_tools/metrix-mcp canonical)
- Delete 66 inherited mini-swe-agent docs + mkdocs.yml + assets
- Delete 23 dead tests (missing upstream modules), fix 3 remaining
- Rename test_suite/ -> eval_suite/, reference/ -> knowledge_base/
- Consolidate examples under examples/
- Fix all stale geakagent/sdubagun/yueliu14 references
- Remove 16 ghost __pycache__-only directories
- Deprecate mcp_tools/kernel-profiler/ (superseded by profiler-mcp)
- Remove mcp_tools/kernel-from-url-mcp/ (dead, only __pycache__)
- Remove mkdocs dependencies from pyproject.toml
- Update git remote to AMD-AGI/GEAK.git

Tests: 96 passed, 58 skipped, 1 xfailed, 0 failures
Co-authored-by: Cursor <cursoragent@cursor.com>
…plumbing tests

- Refactor MCPToolBridge to use a single persistent asyncio event loop on a
  background daemon thread per instance, resolving "Future attached to a
  different loop" errors that broke MCP server calls during e2e runs.

- Fix discovery scoping for --kernel-url flows: when the kernel lives inside
  a .geak_resolved clone, scope both discovery calls (in mini.py and
  unit_test_agent.py) to the clone root instead of the entire workspace.
  Add a hard boundary in discovery._expand_workspace_for_file to prevent
  walking above .geak_resolved.

- Export RESOLVED_DIR_NAME constant and add find_resolved_clone_root() helper
  in resolve_kernel_url_impl.py to couple the directory convention cleanly.

- Fix mini.py second discovery call: prioritize _resolved_kernel_path (from
  --kernel-url resolution) when determining _kernel_path, instead of falling
  through to None when --task is not provided.

- Add new test files: test_discovery_scope.py (scope boundary + mini.py
  wiring), test_e2e_pipeline_smoke.py, test_mcp_server_smoke.py,
  test_plumbing_contracts.py, test_toolruntime_dispatch.py, and extend
  test_mcp_bridge.py with event loop lifecycle tests.

- Apply ruff formatting fixes across modified files.
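The persistent-loop pattern behind the first fix can be sketched as follows (the class name is hypothetical; the point is that every coroutine is submitted to the same long-lived loop, so no future is ever attached to a different one):

```python
import asyncio
import threading

class PersistentLoopBridge:
    """One long-lived event loop on a background daemon thread; all async
    calls from any thread are routed to it via run_coroutine_threadsafe."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def call(self, coro, timeout=30):
        # Submit from any caller thread; block until the coroutine finishes.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
```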
…nt instructions

- File-based MCP transport for large profiler results; bump StreamReader limit to 16MB
- Auto-detect num_parallel from gpu_ids when not explicitly set
- Externalize discovery patterns into discovery_defaults.toml with per-project overrides
- Detect Triton wrapper files; fall back to kernel_file stem for test/bench matching
- Strengthen test_perf mandatory usage in agent prompts
- Fix default task to defer to INSTRUCTIONS.md instead of banning OpenEvolve
- Add confirm_exit to base AgentConfig for --exit-immediately support
- Extend discovery to detect HIP/CK/ASM kernels, trace cross-language
  call chains (Python→torch.ops→pybind11→.cu), and build dependency
  graphs with fusion opportunity detection
- Add GPU pool scheduler: M tasks on N GPUs with dynamic slot assignment
  via ThreadPoolExecutor and thread-safe GPU queue
- Add dynamic task planner generating language-aware optimization tasks
  (OpenEvolve, CK template tuning, HIP launch config, fusion, etc.)
- Add COMMANDMENT.md validation (required sections, shell built-in
  detection) with auto-validation hook in str_replace_editor
- Fix baseline_metrics NaN/inf sanitization and post-write JSON roundtrip
- Update SelectPatchAgent to handle task_*/parallel_* directories and
  prefer per-kernel latency metrics
- Consolidate duplicated extension lists into shared constants
- Add 25 unit tests for validate_commandment and task_planner

Co-authored-by: Cursor <cursoragent@cursor.com>
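The GPU pool scheduler described above (M tasks on N GPUs with dynamic slot assignment) reduces to a small pattern; this sketch uses hypothetical names and stands in for the real implementation:

```python
import queue
from concurrent.futures import ThreadPoolExecutor

def run_on_gpu_pool(tasks, gpu_ids):
    """Schedule M tasks on N GPUs: each worker borrows a GPU slot from a
    thread-safe queue, runs its task pinned to that GPU, then returns it."""
    gpu_queue = queue.Queue()
    for gpu in gpu_ids:
        gpu_queue.put(gpu)

    def worker(task):
        gpu = gpu_queue.get()      # blocks until a GPU slot frees up
        try:
            return task(gpu)       # the task runs pinned to this GPU
        finally:
            gpu_queue.put(gpu)     # return the slot to the pool

    with ThreadPoolExecutor(max_workers=len(gpu_ids)) as pool:
        return list(pool.map(worker, tasks))
```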
- Add GPU/profiler rules to task planner and strategy yaml to prevent
  agents from using inline env vars or HIP_VISIBLE_DEVICES prefixes
- Detect inline env var prefixes (VAR=val cmd) in COMMANDMENT validator
- Add COMMANDMENT.md validation hook to bash tool (agents bypass editor)
- Update INSTRUCTIONS.md with anti-patterns and wrapper script template
- Add 4 new tests for inline env var detection

Co-authored-by: Cursor <cursoragent@cursor.com>
Discovery was broken when given a directory instead of a single file:
- MCP server crashed on read_text() for directories, _expand_workspace
  started from parent instead of the directory itself
- mini.py missed .git inside the directory (kp.parents excludes kp)
- DiscoveryPipeline.run() skipped workspace expansion for directories

Fixes:
- Add directory mode to MCP discover() with recursive kernel scanning
- Fix _expand_workspace to check the path itself when it's a directory
- Add _expand_workspace_for_dir() that scopes workspace to the given
  directory, preventing unrelated sibling files from polluting results
- Use parent directory name as kernel_name when file has a generic name
  (kernel.py, main.py) so test-name matching works properly
- Fix mini.py to check kp itself for .git before walking kp.parents

Co-authored-by: Cursor <cursoragent@cursor.com>
- Eliminate double discovery: mini.py reuses stashed _run_discovery._last_result
  instead of calling run_discovery_pipeline() a second time
- Enrich discovery context: new format_discovery_for_agent() includes kernel
  analysis, language-specific testing guidance (triton/hip/ck/asm), and
  extracted test patterns (tolerances, shapes, dtypes, imports)
- Extract test patterns: _extract_test_patterns() in discovery.py pulls
  atol/rtol, input shapes, dtypes, reference impls, and import patterns from
  top-confidence test files
- Upgrade UnitTestAgent to TestHarnessAgent: creates a fixed test harness
  with --correctness/--profile/--benchmark modes. Reads INSTRUCTIONS.md for
  harness rules. The harness is an immutable evaluation contract.
- Update INSTRUCTIONS.md: section 1a references pre-scanned discovery results
  (no re-discovery needed), section 1b notes pre-built harness from UTA

Co-authored-by: Cursor <cursoragent@cursor.com>
Umangatamd and others added 30 commits February 19, 2026 01:45
Fix task pipeline: agent cwd, config conflicts, and task execution
resolve_kernel_url stored local_repo_path as a relative path while
local_file_path was absolute. The parallel agent resolved the relative
path against the task file directory, producing a doubled nonsense path
that didn't exist. Now all three layers ensure absolute paths: the
source (resolve_kernel_url_impl), the orchestrator context loader, and
the dispatch batch runner.

Co-authored-by: Cursor <cursoragent@cursor.com>
The Full Pipeline Mode (preprocessor → orchestrator) was skipping the
UnitTestAgent, relying on a single-shot LLM finisher in the MCP
discovery server for harness creation. That approach consistently failed
because a single LLM call can't reliably generate correct test harnesses
(wrong tensor shapes, wrong tolerances, wrong imports).

The UnitTestAgent is a multi-turn agent with bash/editor tools that can
read the kernel, read existing tests, run them, see errors, and iterate
until the harness works. It was already built for this purpose but
wasn't wired into the new pipeline.

Changes:
- preprocessor.py: Add model/model_factory params to run_preprocessor().
  After MCP discovery (Step 2), run UnitTestAgent (Step 2b) with
  discovery context to create a validated harness. Extract absolute path
  to the harness script for the profiler. Fall back to raw discovery
  test command if UnitTestAgent fails.
- mini.py: Pass model and model_factory to run_preprocessor().

Tested on ROCm/aiter RoPE kernel: UnitTestAgent creates a working
harness, profiling succeeds (48.44 us baseline), orchestrator generates
tasks, optimization agent produces 18+ patches with ~13% speedup.

Co-authored-by: Cursor <cursoragent@cursor.com>
Wire UnitTestAgent into Full Pipeline Mode preprocessor
…ntext passing

Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix GPU isolation: propagate HIP_VISIBLE_DEVICES through BashCommand,
  MCPToolBridge, ProfilingAnalyzer, and OpenEvolve subprocess env.
  Prevent shallow-copy race in ParallelAgent by creating new env dicts
  per thread. Add defensive copy in ToolRuntime.set_env().

- GPU-aware task generation: extend AgentTask with num_gpus, teach
  task-generator LLM to allocate GPUs per task, ParallelAgent acquires
  N GPU slots from pool for multi-GPU tasks (e.g. OpenEvolve).

- Docker: remove hardcoded HIP_VISIBLE_DEVICES=0 from Dockerfile,
  unset it in entrypoint.sh so geak --gpu-ids controls isolation.

- Fix profiler integration tests: add __main__ to examples/add_kernel
  so rocprofv3 captures GPU activity, fix MetrixTool empty
  HIP_VISIBLE_DEVICES handling, update test assertions to match
  add_kernel (not rope), mark rocprof-compute roofline as xfail.

- Add developer docs: gpu-isolation.md (invariants, how-to, pitfalls),
  update architecture/flow/tools diagrams with SweAgent, codebase
  context passing chain, multi-GPU dispatch, and --gpu-ids flags.
  Remove redundant diagrams.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix 1: profiler-mcp no longer mutates os.environ; passes clean env
  via _env_override to ProfilingAnalyzer subprocess instead.
- Fix 2: Centralize agent-type ↔ class mappings into agent_spec.py
  (_agent_type_to_class / _agent_class_to_type) eliminating 4 duplicate
  definitions across dispatch, orchestrator, task_generator, task_runner.
- Fix 3: Replace silent `except Exception: pass` in
  OpenEvolveWorker._save_result_artifacts with logger.warning().
- Fix 4: Add public set_tools() to AmdLlmModelBase and AmdLlmModel
  router; SweAgent and task_generator use it instead of reaching into
  model._impl.
- Fix 5: Remove duplicate `cfg: dict` type annotation in dispatch.py
  else-branch.
- Fix 6: Harden _derive_test_command_from_commandment to support
  fenced code blocks, add fallback for raw .py commands, and log
  debug messages on parse outcomes.

Co-authored-by: Cursor <cursoragent@cursor.com>
The previous _env_override approach didn't actually remove the empty
key from the subprocess env (dict merge brings it back from os.environ).
Switch to save/restore of os.environ, which is safe here because
profiler-mcp runs as a dedicated single-threaded MCP server process.

Co-authored-by: Cursor <cursoragent@cursor.com>
SweAgent and OpenEvolve fixes, plus context passing
The test harness had no control over how many shapes were used for
profiling vs testing, causing OOM during GPU profiling.

Changes:
- Add select_shapes_uniform() utility in discovery.py for programmatic
  shape selection (dedup, sort by element count, uniform sampling)
- UnitTestAgent system prompt now instructs the LLM to read discovered
  test files, extract ALL shapes (variables, loops, configs — not just
  literal tuples), and build two lists:
  HARNESS_SHAPES (20-25) for correctness/benchmark
  PROFILE_SHAPES (5) for --profile mode only
- format_discovery_for_agent() cleaned up: passes all extracted patterns
  without truncation so the LLM has full shape context

Co-authored-by: Cursor <cursoragent@cursor.com>
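A minimal sketch of the selection utility described above (dedup, sort by element count, uniform sampling); it also folds in the later count == 1 fix, returning the median shape instead of dividing by zero:

```python
import math

def select_shapes_uniform(shapes, count):
    """Pick `count` shapes evenly spread across the deduplicated list,
    ordered by element count. Illustrative, not the repo's exact code."""
    unique = sorted(set(shapes), key=math.prod)
    if count <= 0 or not unique:
        return []
    if count == 1:
        return [unique[len(unique) // 2]]   # median shape, avoids /0 below
    if count >= len(unique):
        return unique
    step = (len(unique) - 1) / (count - 1)  # safe: count >= 2 here
    return [unique[round(i * step)] for i in range(count)]
```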
The harness now supports four CLI modes with distinct shape sets:
  --profile        → PROFILE_SHAPES (5)
  --benchmark      → HARNESS_SHAPES (20-25 sampled)
  --correctness    → HARNESS_SHAPES
  --full-benchmark → ALL_SHAPES (every discovered shape)

--full-benchmark runs all discovered shapes and is intended for use
only at the start and end of optimization to get the complete picture.
--benchmark uses the sampled subset for fast iteration loops.
If ALL_SHAPES has ≤25 entries, HARNESS_SHAPES = ALL_SHAPES and both
benchmark modes behave identically.

Updated INSTRUCTIONS.md and UTA system prompt accordingly.

Co-authored-by: Cursor <cursoragent@cursor.com>
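The four-mode CLI can be sketched with a mutually exclusive argument group; the flag names come from the commit message, but the parser itself is an assumption about the harness's shape:

```python
import argparse

def build_harness_parser():
    """Illustrative parser for the harness's four modes."""
    p = argparse.ArgumentParser(description="kernel test harness")
    mode = p.add_mutually_exclusive_group(required=True)
    mode.add_argument("--profile", action="store_true")         # PROFILE_SHAPES (5)
    mode.add_argument("--benchmark", action="store_true")       # HARNESS_SHAPES (20-25)
    mode.add_argument("--correctness", action="store_true")     # HARNESS_SHAPES
    mode.add_argument("--full-benchmark", action="store_true")  # ALL_SHAPES
    return p
```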
The baseline must record BOTH --benchmark (reduced, 20-25 shapes) and
--full-benchmark (all shapes) results. During iterations the agent
compares reduced vs reduced; at the end it compares full vs full.
Mixing modes in a comparison produces meaningless speedup numbers
because the shape sets differ.

Co-authored-by: Cursor <cursoragent@cursor.com>
Fix OOM in profiling: LLM-driven shape extraction from discovery
The uniform index calculation divides by (count-1), which crashes when
count=1. Add early returns for count<=0 (empty) and count==1 (median
shape).

Co-authored-by: Cursor <cursoragent@cursor.com>
Agent filtering:
- GEAK_ALLOWED_AGENTS / GEAK_EXCLUDED_AGENTS env vars with CLI flags
- Prompt-level enforcement via system prompt addendum in task_generator
- Parse-time safety-net filter in dispatch and task_generator
- Default fallback agent: swe_agent
- Accept **_extra kwargs in orchestrator tool functions
- Handle task_files arriving as JSON string in dispatch_tasks

CLI/pipeline unification:
- run-tasks delegates to dispatch.task_file_to_agent_task and run_task_batch
- Publicize dispatch.task_file_to_agent_task as canonical entry point
- Add codebase-context standalone CLI entry point
- Update tools.md diagram with agent type nodes
- Fix preprocessor CLI to pass model_factory for UnitTestAgent

Tests:
- test_agent_filtering: 17 cases for filtering logic and prompt injection
- test_tool_consistency: structural checks for CLI/pipeline alignment

Co-authored-by: Cursor <cursoragent@cursor.com>
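The safety-net filter can be sketched as below; the env var names and the swe_agent fallback come from this PR, but how allow/exclude combine is an assumption:

```python
import os

DEFAULT_AGENT = "swe_agent"

def filter_agents(candidates):
    """Keep agents in GEAK_ALLOWED_AGENTS (if set), drop those in
    GEAK_EXCLUDED_AGENTS, and fall back to swe_agent if nothing survives."""
    allowed = {a.strip() for a in os.environ.get("GEAK_ALLOWED_AGENTS", "").split(",") if a.strip()}
    excluded = {a.strip() for a in os.environ.get("GEAK_EXCLUDED_AGENTS", "").split(",") if a.strip()}
    kept = [a for a in candidates
            if (not allowed or a in allowed) and a not in excluded]
    return kept or [DEFAULT_AGENT]   # default fallback agent
```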
…code

- Add pipeline_helpers.py: centralize harness creation/validation, baseline
  profiling, context injection, model loading, and agent filtering across
  all CLI entry points (geak, geak-orchestrate, run-tasks, task-generator)
- Add discovery_types.py: shared DiscoveryResult/KernelInfo dataclasses
  with from_dict() factory and kernel language inference
- Flow per-kernel metrics (duration, pct_of_total, bottleneck) from
  baseline_metrics.json through inject_pipeline_context to all agents
- Fix config priority to CLI > Prompt > YAML (prevents LLM prompt from
  overriding explicit CLI flags like --gpu-ids)
- Fix COMMANDMENT path resolution in task file writing (relative_to)
- Add backend-agnostic warmup to profiler-mcp
- Add harness static validation for --profile/--correctness flags
- Integrate mini-swe-agent tools into orchestrator (bash, str_replace_editor,
  profile_kernel, strategy_manager)
- Remove deprecated kernel-profiler MCP, discovery built-in tool, and
  discovery_defaults.toml
- Update architecture, flow, and tools documentation

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add _normalize_command() to profiler-mcp: wraps commands containing
  shell constructs (cd, $VAR, &&, |) in bash -c so rocprofv3's
  os.execvpe can execute them correctly
- Fix all MCP test suites (profiler-mcp, kernel-evolve, openevolve-mcp,
  kernel-ercs) broken by fastmcp API change: replace _tool_manager._tools
  with asyncio.run(mcp.list_tools()) and .fn() with direct function calls

Co-authored-by: Cursor <cursoragent@cursor.com>
…llocations

- Extend _normalize_command to detect VAR=value prefix patterns (e.g.
  HIP_VISIBLE_DEVICES=4 python3 ...) that crash rocprofv3's execvpe
- Add static check in validate_harness for torch.randn(..., device='cuda')
  inside run_profile, which pollutes profiler traces with RNG/memset kernels

Co-authored-by: Cursor <cursoragent@cursor.com>
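The normalization these two commits describe can be sketched as follows; the detection heuristics are assumptions, but the idea is the one stated above: wrap anything with shell syntax (pipes, &&, $VAR, cd, VAR=value prefixes) in `bash -c` so an execvpe-style launcher can run it:

```python
import re
import shlex

_ENV_PREFIX = re.compile(r"^\w+=\S+\s")          # e.g. HIP_VISIBLE_DEVICES=4 python3 ...
_SHELL_TOKENS = ("&&", "||", "|", ";", "$", "cd ")

def normalize_command(cmd: str) -> list[str]:
    """Return an argv list; shell-dependent commands get a bash -c wrapper."""
    if _ENV_PREFIX.match(cmd) or any(tok in cmd for tok in _SHELL_TOKENS):
        return ["bash", "-c", cmd]
    return shlex.split(cmd)
```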
…add per-round evaluation

- Add BENCHMARK and FULL_BENCHMARK sections to COMMANDMENT.md generation
  and validation (now requires all 5: SETUP, CORRECTNESS, PROFILE,
  BENCHMARK, FULL_BENCHMARK)
- Rename test_perf tool to save_and_test across all agents, configs,
  prompts, and tests
- Preprocessor now captures benchmark_baseline.txt and
  full_benchmark_baseline.txt from harness --benchmark/--full-benchmark
- dispatch.py includes BENCHMARK section in _geak_test_cmd.sh so agents
  get wall-clock latency feedback via save_and_test
- Orchestrator runs per-round evaluation: FULL_BENCHMARK + PROFILE on
  the best candidate from each round, feeding results into next-round
  task generation
- Update Dockerfile to use geak-oe branch with BENCHMARK-based evaluator
- Update all tests to match 5-section COMMANDMENT structure

Co-authored-by: Cursor <cursoragent@cursor.com>
The preprocessor writes the UnitTestAgent's generated harness path to
harness_path.txt. The orchestrator reads it instead of falling back to
discovery's focused_test_file, which lacks --benchmark/--profile support.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Rewrite _evaluate_round_best to create a temporary worktree, apply the
  best patch, set GEAK_* env vars, and run SETUP+FULL_BENCHMARK+PROFILE
  against the patched kernel (was running against unpatched baseline).
- Add programmatic benchmark parsing (_parse_median_latency_ms,
  _parse_shape_count) for independent speedup verification.
- Auto-discover task_files and results_dir when LLM omits them.
- Catch LimitsExceeded from task-generation sub-agent gracefully.
- Add GEAK_ORCHESTRATOR_STEP_LIMIT safety net (default 200).
- Write benchmark_duration_us into baseline_metrics.json during
  preprocessing for consistent wall-clock comparisons.
- Increase task-generator limits (step: 75→200, cost: $10→$50).

Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix eval worktree crash: resolve eval_dir to absolute path so
  subprocess.run(cwd=...) works regardless of the process CWD.
- Fix wrong winner selection: compare absolute TOTAL_KERNEL_TIME_MS
  across agents instead of self-reported speedup (which varies per
  agent baseline). Falls back to speedup when kernel times unavailable.
- Fix profiler call: replace unsupported env= kwarg with temporary
  os.environ PYTHONPATH, avoiding both the TypeError and rocprofv3
  nested-quote issues.
- Add --start-round flag to geak-orchestrate CLI so the orchestrator
  can resume from a given round, skipping exploration and loading
  prior round evaluations from disk.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Validate fallback agent is in the allowed set before using it
- Strip fenced code block markers when reading COMMANDMENT sections
- Use unique temp filenames for test scripts to avoid concurrent races
- Gracefully handle test discovery failures in preprocessor
- Safely parse priority and num_gpus in task generator LLM responses
- Fix commandment auto-fix regexes to match both quote styles
- Fix workspace_path type annotation (Path -> Path | None)

Co-authored-by: Cursor <cursoragent@cursor.com>
A task with num_gpus > len(gpu_ids) would block forever in
gpu_queue.get() waiting for GPUs that don't exist. Cap the
request to the pool size to prevent the deadlock.

Co-authored-by: Cursor <cursoragent@cursor.com>
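The deadlock fix amounts to one cap before draining the queue; a minimal sketch (the helper name is hypothetical):

```python
import queue

def acquire_gpu_slots(gpu_queue, num_requested, pool_size):
    """Cap the request at the pool size so a task asking for more GPUs
    than exist cannot block forever on gpu_queue.get()."""
    n = min(num_requested, pool_size)
    return [gpu_queue.get() for _ in range(n)]
```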
- Exclude .rocprofv3/, __pycache__/, *.pyc, .pytest_cache/, *.egg-info/,
  *.so, and .geak_resolved/ from git diff when generating patches to
  prevent binary artifacts from breaking patch application.

- Add DEFAULT_EVAL_BENCHMARK_ITERATIONS (50) as a shared constant in
  pipeline_helpers.py. All benchmark invocations — preprocessing baselines,
  agent benchmarks, and orchestrator evaluations — now use this value via
  the GEAK_BENCHMARK_EXTRA_ARGS env var, ensuring apples-to-apples
  speedup comparisons.

- COMMANDMENT BENCHMARK/FULL_BENCHMARK sections now expand
  ${GEAK_BENCHMARK_EXTRA_ARGS:-} so iteration count is configurable.

- Preprocessor re-runs all harness modes with --iterations 50 after
  initial validation to collect high-quality baselines.

- geak --from-task and geak parallel mode now propagate
  GEAK_BENCHMARK_EXTRA_ARGS to agent environments.

- Harness template (mini_unit_test_agent.yaml) instructs agents to
  accept --iterations N CLI arg with GEAK_BENCHMARK_ITERATIONS env
  fallback.

- Updated INSTRUCTIONS.md, README.md, and docs/ with new env vars,
  patch exclusion list, and baseline benchmark re-run step.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add HIP/CK kernel language detection and COMMANDMENT generation
- Broaden orchestrator metric parsers to recognize BENCHMARK_LATENCY_MS
  and "Total median time:" output formats
- Fix preprocessor baseline enrichment to parse BENCHMARK_LATENCY_MS
  from benchmark_baseline.txt
- Update test discovery, task planner, and unit test agent for CK kernels

Co-authored-by: Cursor <cursoragent@cursor.com>
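A parser for the two output formats named above could look like this; the regexes are assumptions about the exact line shapes:

```python
import re

def parse_median_latency_ms(output: str):
    """Recognize BENCHMARK_LATENCY_MS and 'Total median time:' lines,
    returning the latency in milliseconds or None if neither is found."""
    match = re.search(r"BENCHMARK_LATENCY_MS[:=]\s*([0-9.]+)", output)
    if match is None:
        match = re.search(r"Total median time:\s*([0-9.]+)\s*ms", output)
    return float(match.group(1)) if match else None
```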
…, skip redundant baselines

Seven focused changes to reduce geak-orchestrate wall-clock time:

1. Build on prior round's best patch: round N+1 agents start from the
   globally best patch via create_worktree_with_patch, tracked in ctx.
2. Full round context for task generator: auto-inject ALL prior rounds'
   results, planned tasks, and orchestrator evaluations so the task
   generator avoids repeating strategies.
3. Remove baseline establishment: replace Phase 3 / Step 3 in YAML
   prompts with "Review Provided Baselines" and add skip-baseline
   instruction to inject_pipeline_context for dispatch-path agents.
4. Separate agent vs eval benchmark iterations: agents use 10 iterations
   (GEAK_AGENT_BENCHMARK_ITERATIONS) for fast feedback; eval keeps 50.
5. Deterministic patch selection first: try rewrite_best_results before
   falling back to LLM SelectPatchAgent, saving 8-76 LLM steps per task.
6. Default rounds to 2: change GEAK_MAX_ROUNDS default from 5 to 2.
7. Early stopping: break out of round loop when verified_speedup doesn't
   improve over prior best by GEAK_EARLY_STOP_THRESHOLD (default 0.5%).

Also includes bash tool cwd propagation and save_and_test git exclude
syntax fix (:(exclude) instead of :!).

Co-authored-by: Cursor <cursoragent@cursor.com>
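The early-stopping rule (change 7) can be sketched as a single comparison; reading GEAK_EARLY_STOP_THRESHOLD as a fraction with a 0.005 default is an assumption:

```python
import os

def should_stop_early(prior_best_speedup, current_speedup):
    """Stop the round loop when the verified speedup fails to improve on
    the prior best by more than GEAK_EARLY_STOP_THRESHOLD (default 0.5%)."""
    threshold = float(os.environ.get("GEAK_EARLY_STOP_THRESHOLD", "0.005"))
    return current_speedup <= prior_best_speedup * (1.0 + threshold)
```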
…ture

- Add benchmark_parsing.py to git (imported by orchestrator, default agent,
  and parallel agent but was untracked -- would break on fresh clone)
- Add previous_tasks_dir, round_evaluations, current_round params to
  generate_tasks_from_content to match generate_tasks signature

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve all conflicts by keeping PR branch versions:
- README.md, scripts/README.md: keep pipeline-focused docs
- mini.py, profiling_tools.py: keep PR's CLI flags and ruff formatting
- mkdocs.yml: delete (intentionally removed in cleanup)
- examples/test_scripts/: keep renamed files

Co-authored-by: Cursor <cursoragent@cursor.com>