Skip to content

feat: implement 8 HumanLayer-informed audit improvements#17

Open
WellDunDun wants to merge 4 commits intomasterfrom
WellDunDun/fabric-wisdom-to-reins
Open

feat: implement 8 HumanLayer-informed audit improvements#17
WellDunDun wants to merge 4 commits intomasterfrom
WellDunDun/fabric-wisdom-to-reins

Conversation

@WellDunDun
Copy link
Owner

@WellDunDun WellDunDun commented Mar 14, 2026

Summary

Implement all 8 proposed Reins CLI improvements derived from HumanLayer's harness engineering wisdom. Expands scoring model (22→24), promotes back-pressure and hooks to scored dimensions, adds compare command, language-agnostic back-pressure detection, and custom check plugins.

Changes

  • Back-pressure and hooks now award scored points (agent_legibility max 4→5, agent_workflow max 5→6)
  • AGENTS.md tiered evaluation (<60/<100/<150/>150 with quality labels)
  • MCP tool proliferation detection (warns >10 servers)
  • Language-agnostic back-pressure (JS, Python, Rust, Makefile)
  • New reins compare <path> <baseline.json> command for tracking audit progression
  • Plugin system via .reins/custom-checks.json (file-exists, file-contains checks)
  • Maturity thresholds scaled: L0 ≤6, L1 ≤12, L2 ≤18, L3 ≤21, L4 22-24
  • All documentation updated (README, skill workflows, design docs)

Testing

  • bun test passes (80/80)
  • Self-audit shows 19/24, L3: Full Outloop
  • No new external dependencies

Merge Readiness

  • All changes committed and pushed
  • Code reviewed by parallel agents (reuse, quality, efficiency)
  • Documentation fully updated

Audit Impact

Before: max_score 22 with L3 threshold at 19. After: max_score 24 with L3 threshold at 21. Back-pressure and hooks now directly contribute to scores rather than just findings.

Summary by CodeRabbit

  • New Features

    • Added a compare command to diff current vs. baseline audits
    • Support for user-defined custom checks and new runtime detections (back-pressure, hooks, MCP tool proliferation)
  • Updates

    • Expanded audit scoring from 0–22 to 0–24; adjusted per-dimension maxima and maturity thresholds
    • Evolve guidance updated to include back-pressure and workflow signals; help text updated
  • Tests

    • New tests covering compare, custom checks, and MCP detection
  • Chores

    • Version bumped to 0.1.5 and added a TypeScript typecheck script

… command

Apply extracted wisdom from HumanLayer's harness engineering analysis to expand
the Reins scoring model and CLI capabilities. Back-pressure and hooks promoted
from findings to scored points (max 22→24), AGENTS.md uses tiered evaluation,
MCP tool proliferation detected, language-agnostic back-pressure covers JS/Python/
Rust/Make, plugin system via .reins/custom-checks.json, and new compare command
for tracking audit progression over time. All docs updated to reflect new
thresholds and scoring dimensions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai
Copy link

coderabbitai bot commented Mar 14, 2026

📝 Walkthrough

Walkthrough

This PR expands the audit scoring from 0–22 to 0–24, adds runtime detection signals (hooks, back-pressure, MCP counts, custom checks), adds new scoring hooks and MATURITY_THRESHOLDS, implements a reins compare CLI command, and updates docs, tests, and package metadata to match the new model.

Changes

Cohort / File(s) Summary
Top-level docs & CLI README
README.md, cli/reins/README.md, skill/Reins/SKILL.md, skill/Reins/Workflows/Audit.md
Expanded scoring scale to 0–24 in text, examples, tables and diagrams; updated maturity brackets and documented reins compare.
Design & methodology docs
docs/design-docs/core-beliefs.md, docs/design-docs/symphony-integration.md, skill/Reins/HarnessMethodology.md
Added "Configuration Over Model Capability" and configuration-surface guidance; updated score references and added related guidance.
CLI wiring & command
cli/reins/src/index.ts, cli/reins/src/lib/commands/compare.ts
Wired compare into CLI and implemented runCompare to load baseline JSON, run a current audit via injected runAudit, compute per-dimension deltas/findings, and print JSON with error handling.
Audit runtime context
cli/reins/src/lib/audit/context.ts
Added CustomCheck type; extended AuditRuntimeContext with hasHooksConfig, hasBackPressure, mcpToolCount, customChecks; added detectors/loaders to populate these fields.
Scoring logic
cli/reins/src/lib/audit/scoring.ts
Added MATURITY_THRESHOLDS; new scoring functions: scoreBackPressure, scoreCustomChecks, scoreAgentWorkflowHooks, scoreMcpToolProliferation; integrated into scoring flow and adjusted AGENTS.md heuristics.
Evolution & commands
cli/reins/src/lib/commands/evolve.ts
Replaced hard-coded thresholds with MATURITY_THRESHOLDS; added an L1 evolve step to set up back-pressure.
Types & package metadata
cli/reins/src/lib/types.ts, cli/reins/package.json, package.json
Updated AuditResult.max_score to 24; bumped package version to 0.1.5; added a typecheck script.
Tests & fixtures
cli/reins/src/index.test.ts, cli/reins/src/lib/commands/evolve.test.ts
Added tests for compare, MCP proliferation, and custom checks; updated fixtures/assertions to new per-dimension and total max scores.
Skill docs
skill/Reins/HarnessMethodology.md
Expanded configuration surfaces, anti-patterns, and sources; added guidance on back-pressure, hooks, and isolation.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant User as CLI (user)
participant FS as Filesystem (baseline.json)
participant Audit as Audit Engine (runAudit)
participant Compare as Compare Logic
User->>FS: read baseline.json
FS-->>User: baseline JSON
User->>Audit: runAudit(path)
Audit-->>User: current AuditResult
User->>Compare: compute deltas (baseline, current)
Compare-->>User: CompareResult JSON (stdout)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: implementing 8 audit improvements derived from HumanLayer principles, which aligns with the comprehensive updates across scoring, commands, and documentation.
Description check ✅ Passed The description includes all required template sections with substantive content: comprehensive Summary, detailed Changes list, Testing confirmation, Merge Readiness checklist completion, and clear Audit Impact metrics.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch WellDunDun/fabric-wisdom-to-reins
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
docs/design-docs/symphony-integration.md (1)

14-20: ⚠️ Potential issue | 🟡 Minor

Scoring deltas in this section are internally inconsistent with the final model.

Please align this block to the final thresholds/maxima used in this PR (agent_workflow max 6 and total max shift 22→24).

🧩 Suggested consistency patch
-- **Scoring**: `agent_workflow` dimension expanded from max 4 to max 5, adding orchestration readiness (3+ of: workflow config, skills directory, isolation policy, concurrency limits, merge protection)
+- **Scoring**: `agent_workflow` dimension expanded from max 5 to max 6, adding orchestration readiness (3+ of: workflow config, skills directory, isolation policy, concurrency limits, merge protection)
...
-- **Total max score**: 24 (was 21)
+- **Total max score**: 24 (was 22)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design-docs/symphony-integration.md` around lines 14 - 20, The scoring
summary is inconsistent with the PR's final thresholds: update the "Scoring"
line to show agent_workflow expanded from max 4 to max 6 (not 5), and update the
"Total max score" line to reflect the PR's final shift from 22→24 (total max
24); ensure the rest of the block still lists the new detection signals in
context.ts (hasWorkflowConfig, hasSkillsDirectory, hasIsolationPolicy,
hasConcurrencyLimits, hasMergeProtection, hasSpecDocument, frameworksDetected),
the three Doctor checks, the Evolve steps, Init scaffolds (WORKFLOW.md,
SPEC.md), and the new frameworks_detected audit field so the section matches the
final model.
skill/Reins/HarnessMethodology.md (1)

13-16: ⚠️ Potential issue | 🟡 Minor

Audit scale reference is stale in this document.

The command mapping still says reins audit scores (0-22), which is now outdated for this PR.

📝 Suggested fix
-- `reins audit` — score maturity across six dimensions (0-22)
+- `reins audit` — score maturity across six dimensions (0-24)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Reins/HarnessMethodology.md` around lines 13 - 16, Update the stale
audit-scale text for the `reins audit` mapping in HarnessMethodology.md to match
the real, current scoring range used by the implementation; find the
authoritative value in the `reins audit` command implementation or its scoring
constant (search for symbols like AUDIT_MAX_SCORE, maxScore, computeAuditScore,
or scoreAudit) and replace “(0-22)” with the actual range (e.g.,
“(0-<actual_max>)”) so the doc and code are consistent.
cli/reins/package.json (1)

40-44: ⚠️ Potential issue | 🟠 Major

typecheck script fails without typescript in devDependencies.

The npm script "typecheck": "tsc --noEmit" invokes tsc directly, but typescript is not declared in this package's dependencies. Developers running npm run typecheck or bun run typecheck will fail with "command not found: tsc". CI avoids this by explicitly calling bunx tsc --noEmit instead of using the npm script.

Use bunx tsc --noEmit in the script to ensure tsc is available:

Suggested fix
 "scripts": {
   "start": "bun src/index.ts",
   "test": "bun test",
-  "typecheck": "tsc --noEmit"
+  "typecheck": "bunx tsc --noEmit"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/reins/package.json` around lines 40 - 44, Update the package.json
"typecheck" npm script so it does not assume a global tsc binary is installed;
replace the current command "tsc --noEmit" with "bunx tsc --noEmit" (i.e., edit
the "typecheck" script entry) so running npm run typecheck or bun run typecheck
works without adding typescript to devDependencies.
🧹 Nitpick comments (1)
skill/Reins/SKILL.md (1)

68-70: Clarify compare arguments to match baseline workflow semantics.

Using <path1> <path2> is ambiguous here; naming the second argument as a baseline audit artifact is clearer for users and agents.

✏️ Suggested doc tweak
-# Compare audits across repos or over time
-reins compare <path1> <path2>
+# Compare current audit against a baseline
+reins compare <path> <baseline.json>

Based on learnings cli/reins is the product engine and the only source of truth for readiness scoring and JSON outputs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skill/Reins/SKILL.md` around lines 68 - 70, Update the documentation for the
reins compare command to explicitly name and describe the arguments (e.g.,
`reins compare <current-audit-path> <baseline-audit-path>`), clarifying that the
second argument is a baseline audit artifact used for diffing against the
current audit; update the example and any surrounding text referencing `reins
compare` in SKILL.md so it reflects these baseline workflow semantics and that
cli/reins is the source of truth for readiness scoring and JSON outputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/reins/src/lib/audit/scoring.ts`:
- Around line 537-545: The code constructs RegExp from user input (new
RegExp(check.pattern, "i")) which can cause ReDoS; update scoring.ts to validate
and/or sandbox regex before using it: refuse or warn on overly-complex patterns
by running check.pattern through a safe-regex/regexp-tree complexity check (or
length/quantifier heuristics) and if unsafe push a finding like the existing
invalid-regex message; alternatively, execute the actual match within a
time-limited sandbox (worker thread or AbortController-based timeout) and catch
timeouts to record a finding instead of allowing the CLI to hang. Ensure you
reference and update the creation/usage points around the regex variable and the
matching logic that uses check.pattern so unsafe patterns are rejected or
timeboxed.

---

Outside diff comments:
In `@cli/reins/package.json`:
- Around line 40-44: Update the package.json "typecheck" npm script so it does
not assume a global tsc binary is installed; replace the current command "tsc
--noEmit" with "bunx tsc --noEmit" (i.e., edit the "typecheck" script entry) so
running npm run typecheck or bun run typecheck works without adding typescript
to devDependencies.

In `@docs/design-docs/symphony-integration.md`:
- Around line 14-20: The scoring summary is inconsistent with the PR's final
thresholds: update the "Scoring" line to show agent_workflow expanded from max 4
to max 6 (not 5), and update the "Total max score" line to reflect the PR's
final shift from 22→24 (total max 24); ensure the rest of the block still lists
the new detection signals in context.ts (hasWorkflowConfig, hasSkillsDirectory,
hasIsolationPolicy, hasConcurrencyLimits, hasMergeProtection, hasSpecDocument,
frameworksDetected), the three Doctor checks, the Evolve steps, Init scaffolds
(WORKFLOW.md, SPEC.md), and the new frameworks_detected audit field so the
section matches the final model.

In `@skill/Reins/HarnessMethodology.md`:
- Around line 13-16: Update the stale audit-scale text for the `reins audit`
mapping in HarnessMethodology.md to match the real, current scoring range used
by the implementation; find the authoritative value in the `reins audit` command
implementation or its scoring constant (search for symbols like AUDIT_MAX_SCORE,
maxScore, computeAuditScore, or scoreAudit) and replace “(0-22)” with the actual
range (e.g., “(0-<actual_max>)”) so the doc and code are consistent.

---

Nitpick comments:
In `@skill/Reins/SKILL.md`:
- Around line 68-70: Update the documentation for the reins compare command to
explicitly name and describe the arguments (e.g., `reins compare
<current-audit-path> <baseline-audit-path>`), clarifying that the second
argument is a baseline audit artifact used for diffing against the current
audit; update the example and any surrounding text referencing `reins compare`
in SKILL.md so it reflects these baseline workflow semantics and that cli/reins
is the source of truth for readiness scoring and JSON outputs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 825f348b-7508-4a1f-b7b3-2d83ee48122a

📥 Commits

Reviewing files that changed from the base of the PR and between 5510d31 and 7a21ca2.

📒 Files selected for processing (17)
  • README.md
  • cli/reins/README.md
  • cli/reins/package.json
  • cli/reins/src/index.test.ts
  • cli/reins/src/index.ts
  • cli/reins/src/lib/audit/context.ts
  • cli/reins/src/lib/audit/scoring.ts
  • cli/reins/src/lib/commands/compare.ts
  • cli/reins/src/lib/commands/evolve.test.ts
  • cli/reins/src/lib/commands/evolve.ts
  • cli/reins/src/lib/types.ts
  • docs/design-docs/core-beliefs.md
  • docs/design-docs/symphony-integration.md
  • package.json
  • skill/Reins/HarnessMethodology.md
  • skill/Reins/SKILL.md
  • skill/Reins/Workflows/Audit.md

Reject patterns with nested quantifiers (e.g. (a+)+$) before
constructing RegExp to prevent catastrophic backtracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli/reins/src/lib/audit/scoring.ts`:
- Around line 522-564: The scoreCustomChecks function is over-complex; split the
logic for each check.type into small helpers to reduce nesting and cognitive
complexity. Extract and call at least handleFileExistsCheck(ctx, check, result)
to handle the "file-exists" branch and handleFileContainsCheck(ctx, check,
result) to encapsulate path resolution, existsSync, file read, ReDoS pattern
guard, regex construction (separately catching invalid RegExp) and test/result
push; optionally add validatePattern(check.pattern) to centralize the
nested-quantifier check. Ensure helpers reference the same symbols
(ctx.customChecks, result.scores, check.dimension) and preserve the exact
result.scores[dimension].findings messages for missing files, invalid regex,
rejected patterns, read errors, pattern found/not found.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d28a4cb7-104a-4e5e-b54d-ab50ff98f6e2

📥 Commits

Reviewing files that changed from the base of the PR and between 7a21ca2 and 49c9a3f.

📒 Files selected for processing (1)
  • cli/reins/src/lib/audit/scoring.ts

Split scoreCustomChecks into handleFileExistsCheck and
handleFileContainsCheck helpers. Resolves Biome lint error
(cognitive complexity 25 > 15 limit).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cli/reins/src/lib/audit/scoring.ts (1)

1-2: ⚠️ Potential issue | 🟡 Minor

CI failure: Run formatter to fix style issues.

The pipeline reports formatting differences. Run the project's formatter (e.g., bun format or biome format --write) to resolve before merge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/reins/src/lib/audit/scoring.ts` around lines 1 - 2, The CI failure is due
to formatting differences in cli/reins/src/lib/audit/scoring.ts; run the
project's formatter (e.g., bun format or biome format --write) on the repository
so imports like existsSync, readFileSync, readdirSync and join and the rest of
scoring.ts are reformatted to match the project's style and resolve the pipeline
error.
🧹 Nitpick comments (1)
cli/reins/src/lib/audit/scoring.ts (1)

544-550: ReDoS guard may produce false positives for non-greedy quantifiers.

The pattern /([+*?]\)?[+*?]|(\.\*){3,})/ catches dangerous nested quantifiers, but it may also reject legitimate non-greedy patterns like a+? or .*? since +? matches the [+*?]\)?[+*?] portion.

Since this is repo-owner-authored config, false positives are low risk — users can simply adjust their patterns. However, if you want to be more precise:

🔧 Optional: More precise ReDoS detection
   // Guard against ReDoS: reject nested quantifiers like (a+)+
-  if (/([+*?]\)?[+*?]|(\.\*){3,})/.test(check.pattern)) {
+  // Detect: (X+)+, (X*)+, (X+)*, etc. but allow non-greedy like a+?
+  if (/\([^)]*[+*]\)[+*]|([+*])\1/.test(check.pattern)) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli/reins/src/lib/audit/scoring.ts` around lines 544 - 550, The current ReDoS
guard in the block that checks check.pattern is too broad and flags non-greedy
quantifiers like +? and .*?; update the regex used there so it does not match a
quantifier immediately followed by the non-greedy "?" (i.e., only detect
nested/grotesque quantifiers when the first quantifier is not followed by ?),
then keep the existing behavior of pushing `[custom] ${check.name}: regex
pattern rejected (nested quantifiers may cause ReDoS)` into
result.scores[dimension].findings when a true nested-quantifier case is
detected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@cli/reins/src/lib/audit/scoring.ts`:
- Around line 1-2: The CI failure is due to formatting differences in
cli/reins/src/lib/audit/scoring.ts; run the project's formatter (e.g., bun
format or biome format --write) on the repository so imports like existsSync,
readFileSync, readdirSync and join and the rest of scoring.ts are reformatted to
match the project's style and resolve the pipeline error.

---

Nitpick comments:
In `@cli/reins/src/lib/audit/scoring.ts`:
- Around line 544-550: The current ReDoS guard in the block that checks
check.pattern is too broad and flags non-greedy quantifiers like +? and .*?;
update the regex used there so it does not match a quantifier immediately
followed by the non-greedy "?" (i.e., only detect nested/grotesque quantifiers
when the first quantifier is not followed by ?), then keep the existing behavior
of pushing `[custom] ${check.name}: regex pattern rejected (nested quantifiers
may cause ReDoS)` into result.scores[dimension].findings when a true
nested-quantifier case is detected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 21e489f8-40c7-4850-adb9-0f5619c4f593

📥 Commits

Reviewing files that changed from the base of the PR and between 49c9a3f and a545379.

📒 Files selected for processing (1)
  • cli/reins/src/lib/audit/scoring.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant