Skip to content

fix(daemon): platform-aware resource thresholds for macOS#1173

Open
ahmedibrahim085 wants to merge 3 commits intoruvnet:mainfrom
ahmedibrahim085:fix/platform-aware-resource-thresholds
Open

fix(daemon): platform-aware resource thresholds for macOS#1173
ahmedibrahim085 wants to merge 3 commits intoruvnet:mainfrom
ahmedibrahim085:fix/platform-aware-resource-thresholds

Conversation

@ahmedibrahim085
Copy link

Summary

  • Daemon workers (audit, optimize, testgaps) permanently blocked on macOS due to hardcoded resource thresholds that are incompatible with macOS system metric reporting
  • CPU threshold of 2.0 is exceeded by os.loadavg() on any multi-core Mac at idle (reports aggregate load, not per-core)
  • Memory threshold of 20% is unreachable because os.freemem() reports only truly unused RAM, not reclaimable cache (macOS typically shows 5-8% "free")
  • Compute thresholds from system capabilities: CPU cores × 3 for load, 1% free memory on Darwin (5% on Linux)
  • Thresholds remain overridable via config parameter — zero breaking change

Problem

All daemon workers with resource checks fail with:

CPU load too high: 3.80
Memory too low: 7.6% free

On a 16-core Mac at idle. Workers have 0% success rate (0 runs out of hundreds of scheduler cycles).

Root cause: os.loadavg()[0] returns aggregate load (not per-core), and os.freemem() excludes reclaimable cache on macOS. The hardcoded thresholds assume Linux-like reporting semantics.

Approach

  1. import os from 'os' — Use top-level ESM import instead of await import('os') in async method. Built-in modules don't need dynamic import, and this enables synchronous access in the constructor.

  2. getDefaultResourceThresholds() — Extract threshold computation into a pure function that:

  3. Startup logging — Log computed thresholds (values, core count, platform) at daemon start for debuggability.

Test plan

  • Verified on 16-core macOS: all 5 workers execute successfully after patch
  • daemon status shows 100% success rate for audit, optimize, testgaps (were 0% before)
  • daemon trigger --worker audit completes in <1ms
  • Config override still works (config parameter takes precedence over computed defaults)
  • Verify Linux behavior unchanged (same function, higher thresholds: cores×3 for CPU, 5% for memory)

Fixes #1077

…port

The `os` module was imported dynamically via `await import('os')` inside
the async `canRunWorker()` method. Since `os` is a Node.js built-in that
is always available and never changes at runtime, this dynamic import
adds unnecessary overhead on every resource check call.

Move to a standard top-level ESM import. This also enables synchronous
access to `os` APIs (like `os.cpus()`) in non-async contexts such as
the constructor, which the next commit requires for platform-aware
threshold computation.
…tead of hardcoded values

Daemon workers (audit, optimize, testgaps) never execute on macOS because
the hardcoded resource thresholds are incompatible with how macOS reports
system metrics.

CPU: os.loadavg() returns aggregate load across all cores. A 16-core Mac
at modest utilization reports loadavg ~3-5, permanently exceeding the
hardcoded maxCpuLoad of 2.0. Workers are blocked even when the system is
effectively idle.

Memory: macOS uses available RAM as filesystem cache. os.freemem() reports
only truly unused memory (typically 5-8%), not reclaimable cache. The 20%
minFreeMemoryPercent threshold is unreachable under normal macOS operation.

Extract threshold computation into getDefaultResourceThresholds() that
scales CPU threshold by core count (cores × 3) and uses a 1% free memory
floor on Darwin. Linux gets a 5% floor since its memory reporting is more
conservative. Both remain overridable via config parameter.

Fixes ruvnet#1077
When debugging why workers are blocked, the first question is "what
thresholds is the daemon using?" Currently this requires reading the
daemon-state.json file after startup.

Add a single log line at daemon start that shows the active maxCpuLoad,
core count, minFreeMemoryPercent, and platform. This makes threshold
misconfiguration immediately visible in daemon logs without additional
tooling.
Copy link

@mixingchex mixingchex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_CHANGES — Reviewed via hive-mind swarm analysis.

The fix correctly diagnoses the macOS issue — os.loadavg() scaling and macOS memory reporting make hardcoded thresholds permanently block all workers. The memory thresholds (1% Darwin, 5% Linux) are sound.

Required change:
The CPU multiplier cpuCount * 3 is too aggressive — on a 16-core Mac it sets maxCpuLoad = 48, effectively disabling the resource gate entirely.

System Cores Current (2.0) Proposed (x3) Recommended (x1.0)
1-core 1 2.0 3.0 2.0 (floor)
4-core 4 2.0 12.0 4.0
8-core 8 2.0 24.0 8.0
16-core 16 2.0 48.0 16.0

Recommended fix:

maxCpuLoad: Math.max(cpuCount * 1.0, 2.0),

This means "allow workers when load < 100% per core, with a minimum of 2.0 for small machines."

Everything else (static import, startup logging, config override preservation) is clean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Daemon workers never execute on macOS due to unrealistic resource thresholds

2 participants