Skip to content

Execution agent over-engineering patterns degrading target codebase #407

@peteromallet

Description

@peteromallet

Problem

The execution agent (not the review/scoring bot) consistently over-corrects when fixing findings, producing code that is larger and more complex than before. The review bot finds real issues (dead exports, missing tests, duplication), but the execution agent's fixes violate the target codebase's core principle: "every change should make the codebase smaller or more explicit."

Recurring patterns observed

Evidence from chore/desloppify-queue-2026-03-02 on peteromallet/reigh:

1. Single-consumer extractions

Creates new files/functions with exactly 1 caller:

  • toolSettingsAuthCache.ts, toolSettingsErrors.ts, toolSettingsOperationFailures.ts, toolSettingsScopeReader.ts — all had 0-1 external consumers, added +155 lines of overhead (reverted in bc01baa7f)
  • buildShotEditorControllerSlices, buildShotEditorSections + 117 lines of interface defs, each called once (reverted)

2. __internal test export hacks

7 files used export const __internal = {...} to expose private functions for testing, instead of exporting them properly or testing through the public API. Files affected:

  • useShotEditorLayoutModel.ts, generateVideoService.ts, useTimelineOrchestrator.ts, calculate-task-cost/index.ts, huggingface-upload/index.ts

3. Naming convention churn

8 functions renamed with OrThrow suffix; none had a non-throwing variant; pure noise across 18 files (reverted in bc01baa7f)

4. Stripping "why" comments

Navigator.locks rationale, "IMPORTANT: update these transformers" warnings, per-field JSDoc on HiresFixConfig — all stripped as collateral during moves (restored in 7561f7b9e)

5. Behavioral changes hidden in refactors

resolvePrimaryStructureVideo dropped a legacy field fallback chain in a "unify contracts" commit; the test was rewritten to match the new behavior rather than preserving the old (fixed in 7561f7b9e)

6. Net line growth while claiming cleanup

4 of 5 commits in one batch added lines (+440, +17, +90, +38); only 1 reduced lines (-20)

7. Repetitive boilerplate

buildTravelStructureSource had 22 identical pickFirstDefinedValue calls in 148 lines. A 25-line field-descriptor loop does the same thing.

Suggested constraints for the execution agent

  • Net lines must decrease or stay flat — if a cleanup commit adds lines, something is wrong
  • Don't extract unless 2+ consumers exist — single-consumer extractions add indirection without value
  • Don't create new files for single-consumer code — keep things colocated
  • Never use __internal exports — either export the function properly or test through the public API
  • Preserve "why" comments; only strip "what" comments — rationale comments are load-bearing
  • Refactors must preserve behavior — no test expectation changes in refactor commits
  • Don't rename for convention aloneOrThrow suffixes, etc. are noise if there's no non-throwing variant
  • Don't delete tests without replacement — two test files (191 + 155 lines) were deleted in 6fd933f83 without equivalent coverage

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions