
[RFC] Task queue manager for eval execution #21

@BobbyZhouZijian

Description

Problem

Currently, each agent triggers grader execution immediately and independently via coral eval. With multiple agents running concurrently, this leads to several issues:

  1. No concurrency control — All agents can spawn grader child processes simultaneously, causing resource exhaustion when grading is expensive (ML models, GPU, heavy computation)
  2. Eval counter race condition — _increment_eval_count() in post_commit.py uses a non-atomic read-modify-write pattern. Concurrent evals can read the same counter value and both write back the same incremented result, so increments are lost (undercounting)
  3. No prioritization — All evals are treated equally. There's no way to prioritize (e.g., agents that haven't been graded recently, or agents showing improvement)
  4. No rate limiting — A fast agent can flood the grader, starving slower agents of feedback

Proposal

Add a task queue manager that sits between coral eval and grader execution:

Agent calls `coral eval`
    → Request enqueued (commit hash, agent_id, message, timestamp)
    → Queue manager picks next eval based on policy
    → Grader runs with concurrency limit (e.g., max N concurrent graders)
    → Result written to .coral/attempts/
    → Agent notified of result
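The enqueued request above could be a small ordered record; a sketch (field names and the priority convention are assumptions, not settled API):

```python
import time
from dataclasses import dataclass, field

@dataclass(order=True)
class EvalRequest:
    # Only `priority` participates in ordering; lower = graded sooner.
    priority: int = 0
    timestamp: float = field(default_factory=time.time, compare=False)
    commit_hash: str = field(default="", compare=False)
    agent_id: str = field(default="", compare=False)
    message: str = field(default="", compare=False)
```

Making the record orderable means it can drop straight into a heap-based priority queue later without changing the enqueue path.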

Key capabilities

  • Concurrency limit: Configure max concurrent grader processes (e.g., queue.max_concurrent: 2)
  • Fair scheduling: Round-robin or similar policy so all agents get graded, not just the fastest committer
  • Rate limiting: Optional per-agent rate limit (e.g., max 1 eval per agent per minute)
  • Priority support: Optional priority hints (e.g., boost agents that recently improved)
  • Atomic counters: Replace file-based read-modify-write with proper atomic operations
  • Backpressure: When the queue is full, agents get immediate feedback ("eval queued, position N") instead of blocking

Possible architecture

┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│  Agent 1    │────▶│                  │────▶│  Grader      │
│  coral eval │     │   Queue Manager  │     │  (slot 1)    │
├─────────────┤     │                  │     ├──────────────┤
│  Agent 2    │────▶│  - FIFO/fair     │────▶│  Grader      │
│  coral eval │     │  - concurrency   │     │  (slot 2)    │
├─────────────┤     │  - rate limit    │     └──────────────┘
│  Agent 3    │────▶│  - backpressure  │
│  coral eval │     └──────────────────┘
└─────────────┘

Options for implementation:

  • In-process: asyncio.Semaphore + queue in the manager process, agents communicate via IPC (Unix socket or named pipe)
  • File-based: Lock file + queue directory in .coral/, compatible with current design
  • Lightweight daemon: Separate queue process started by coral start, agents submit via local socket
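A minimal sketch of the in-process option: a semaphore caps concurrent grader runs while everything else waits its turn. `run_grader()` here is a stand-in for the real `_run_grader_with_timeout()` child-process call, not the actual API:

```python
import asyncio

async def run_grader(req: str) -> str:
    # Placeholder for the real grading work (child process + timeout).
    await asyncio.sleep(0.01)
    return f"graded:{req}"

async def queue_manager(requests, max_concurrent: int = 2) -> dict:
    sem = asyncio.Semaphore(max_concurrent)
    results: dict = {}

    async def run_one(req):
        async with sem:                 # at most max_concurrent graders at once
            results[req] = await run_grader(req)

    await asyncio.gather(*(run_one(r) for r in requests))
    return results
```

Fair scheduling and rate limiting would layer on top of this by controlling the order in which requests reach the semaphore.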

Configuration (example)

queue:
  max_concurrent: 2        # max grader processes at once
  strategy: fair            # fair | fifo | priority
  rate_limit: 60            # min seconds between evals per agent (0 = unlimited)
  max_queue_size: 50        # drop oldest if exceeded
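The config above could map onto a small dataclass with the same keys and defaults; a sketch (the parsing helper and its tolerance of unknown keys are assumptions about how coral loads config):

```python
from dataclasses import dataclass

@dataclass
class QueueConfig:
    max_concurrent: int = 2     # max grader processes at once
    strategy: str = "fair"      # fair | fifo | priority
    rate_limit: int = 60        # min seconds between evals per agent (0 = unlimited)
    max_queue_size: int = 50    # drop oldest if exceeded

def parse_queue_config(raw: dict) -> QueueConfig:
    # Ignore unknown keys so older/newer config files still load.
    known = set(QueueConfig.__dataclass_fields__)
    return QueueConfig(**{k: v for k, v in raw.items() if k in known})
```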

Context

  • Eval is triggered in coral/hooks/post_commit.py:run_eval()
  • Grader runs in a child process via _run_grader_with_timeout() (multiprocessing)
  • Attempts are written to .coral/public/attempts/{commit_hash}.json
  • The manager's monitor_loop() polls for new attempt files every 5s
  • The eval counter race is in _increment_eval_count() (read-modify-write without locks)
