feat: env validation v2 with mode-aware runtime gates#320
feat: env validation v2 with mode-aware runtime gates#320eva57gr wants to merge 3 commits intoLight-Heart-Labs:mainfrom
Conversation
Lightheartdevs
left a comment
There was a problem hiding this comment.
Review: feat: env validation v2 with mode-aware runtime gates
Good concept — mode-aware validation that fails early before runtime crashes is the right pattern. Two breaking bugs need fixing before this can land.
Breaking Bugs
-
validate_env_gatecalled but never defined.dream-clicallsvalidate_env_gateincmd_startandcmd_update, but this function doesn't exist anywhere in the codebase or this PR. Underset -euo pipefail, this crashesdream startanddream updateimmediately. Either define it indream-clior source it from a shared lib. -
preflight-engine.shargument count mismatch. The PR adds 7 new variables to the Pythonsys.argv[1:]unpacking (env_file, schema_file, env_validation_attempted, etc.) but does NOT update the Bash invocation line that passes arguments to the Python heredoc. Python expects 19 args, only 12 are passed →ValueError: not enough values to unpackon every preflight run.
Must-fix
-
Duplicate
usage()invalidate-env.sh. The PR adds a newusage()function but doesn't remove the old one. Bash silently uses the last definition, so the new detailed help text is overridden by the old shorter version. Remove the old one. -
Unknown key detection silently removed. The loop that detects unknown/typo'd env keys is deleted with no replacement. The
--allow-unknownflag is parsed but has no corresponding check. Unknown keys now silently pass validation — regression from current behavior. -
STRICThas no default value. The--strictflag description says it's the default, but no default is set in the arg-parsing block. If unset, behavior is undefined.
Conflicts
- PR #349: Both modify
get_current_version()indream-update.shandcompare_versionsinmigrate-config.sh. Merge conflict guaranteed. Coordinate merge order. - PR #356: Minor textual conflicts on port defaults in shared files.
Non-blocking
DREAM_MODEnot set at all has no explicit fallback — should default tolocalor error clearly.- Empty
.env(zero keys) behavior is unclear. - The core design (schema-driven mode requirements, deprecated key tracking, exit code separation) is solid. Just needs the bugs fixed.
Lightheartdevs
left a comment
There was a problem hiding this comment.
Review of env-validation-v2
Lightheartdevs
left a comment
There was a problem hiding this comment.
Review: feat/env-validation-v2
Solid direction -- mode-aware env validation with schema-driven deprecation auto-fix is a good architectural choice. The test coverage in test-env-validation-v2.sh is thorough. However, there are several bugs that will break runtime behavior and some CLAUDE.md convention violations that need fixing before merge.
Bugs (blocking)
1. validate_env_gate is called but never defined (dream-cli, lines ~358/381)
cmd_start() and cmd_update() both call validate_env_gate, but this function is never defined anywhere in dream-cli or sourced from another file. This will crash every dream-cli start and dream-cli update invocation. Likely the implementation was planned but not included in this PR.
2. preflight-engine.sh -- Python argv mismatch will crash every invocation
The Python code (after the diff) unpacks 19 positional args from sys.argv[1:] (adding env_file, schema_file, env_validation_attempted, env_validation_file, env_validation_exit, env_validation_strict, skip_env_validation) but the bash invocation line (~line 83) still only passes the original 12 args. The 7 new variables are never appended to the Python invocation. This will produce a ValueError: not enough values to unpack on every preflight run, breaking dream-doctor.sh and the installer.
3. validate-env.sh -- duplicate usage() function
The diff adds a new usage() function (with the full --strict/--json docs) but does NOT remove the original usage() that remains further down in the file. Bash will silently use the second definition, meaning the new help text will never display. The old usage() also references the now-removed positional-arg interface. One of them needs to be removed.
4. validate-env.sh -- unknown-key check silently removed
The diff removes the unknown-key detection loop but the reporting section that references the unknown array still exists downstream. Unknown keys will never be detected, silently weakening validation. If this was intentional (related to --allow-unknown), the conditional logic is missing.
CLAUDE.md Convention Violations
5. || true used twice in migrate-config.sh (~line 253-254 in diff)
cmd_autofix_env || true and cmd_validate --warn-only >/dev/null || true violate the CLAUDE.md rule: "Never || true or 2>/dev/null." The second line also uses >/dev/null. If these failures are intentionally non-fatal, use the approved pattern: cmd_autofix_env || log_warn "autofix-env failed (non-fatal)".
6. Multiple set +e blocks in migrate-config.sh
Four set +e / set -e blocks to capture compare_versions return codes. Toggling set -e is fragile -- any command failing between set +e and set -e is silently swallowed. Safer pattern: compare_versions "$a" "$b" && result=0 || result=$?.
7. Broad except Exception in preflight-engine.sh Python (line ~661 in diff)
CLAUDE.md rule: "No broad or silent catches." This should catch specific exceptions (json.JSONDecodeError, FileNotFoundError, KeyError) that map to distinct statuses.
8. 2>/dev/null in migrate-config.sh get_current_version()
jq errors are suppressed with 2>/dev/null. Per CLAUDE.md, errors should be visible or logged.
Minor / Non-blocking
9. cmd_autofix_env uses ((changed+=1)) which returns exit code 1 when changed goes from 0 to 1 under set -e (arithmetic returning 0 is falsy in bash). Use : $((changed += 1)) instead.
10. dream-doctor.sh references PREFLIGHT_ENV_VALIDATION_STATUS etc. that are only emitted in --env mode. If preflight crashes due to bug #2, these vars will be unset and the ${VAR:-not_run} defaults will silently mask the failure.
Summary
Bugs #1 and #2 are ship-blockers -- they will crash dream-cli start/update and every preflight invocation. Bug #3 and #4 are functional regressions. The || true and broad-catch patterns need updating per CLAUDE.md conventions. The overall design (schema-driven mode requirements, deprecation auto-fix, JSON reporting) is sound and well-tested.
No description provided.