From 277a3c1feca08dfe58329bec0bde10f4b15a182f Mon Sep 17 00:00:00 2001
From: Mykhailo Chalyi
Date: Mon, 6 Apr 2026 03:39:02 +0000
Subject: [PATCH 1/4] chore(specs): make CI health a hard gate in maintenance checklist
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The maintenance pass (PR #1063) shipped while CI on main was red for
9 days and fuzz had 5 failures.

Root causes:
- Spec only required nightly/fuzz green, not CI on main
- Maintain skill only checked nightly.yml and fuzz.yml, missing ci.yml
- No language marking the check as a blocker that prevents merging

Changes:
- Rename "Nightly CI" → "CI Health", add "CI on main is green" check
- Mark the section as a hard gate — pass cannot complete while red
- Expand escalation policy to cover all workflows, not just nightly
- Update maintain skill with concrete `gh` commands for CI on main
- Add "never silently skip" instruction for the agent

Ref: #1088 #1089 #1090 #1091
---
 .claude/commands/maintain.md | 30 +++++++++++++++++++++++-------
 specs/012-maintenance.md     | 17 +++++++++++++----
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/.claude/commands/maintain.md b/.claude/commands/maintain.md
index 24f9f6c6..858eedf9 100644
--- a/.claude/commands/maintain.md
+++ b/.claude/commands/maintain.md
@@ -74,13 +74,29 @@ Make the simplifications. Run tests after each change. The goal is less code tha
 
 `AGENTS.md` and `CLAUDE.md` reflect current specs, commands, tooling, and workflows.
 
-### 10. Nightly CI is healthy
-
-Nightly and fuzz workflows green for past week. Fuzz targets compile. Git-sourced deps resolve.
-
-Key tools: `gh run list --workflow=nightly.yml --limit 7`, `gh run list --workflow=fuzz.yml --limit 7`
-
-If failures persist >2 days, escalate per the policy in `specs/012-maintenance.md`.
+### 10. All CI is healthy (HARD GATE)
+
+**This section is a blocker.** The maintenance pass MUST NOT be marked complete
+while any of these checks are red.
+
+1. **CI on main is green** — check the latest CI run on the `main` branch. If
+   any job (Audit, Test, Lint, Examples, Fuzz Compile Check) fails, fix it
+   before proceeding. Common failures: `cargo vet` missing certifications,
+   dependency audit advisories, clippy warnings.
+2. **Nightly workflow green** for past 7 days.
+3. **Fuzz workflow green** for past 7 days. If a fuzz target crashes, open a
+   GitHub issue with the crash artifact, reproduction command, and base64 input.
+4. Fuzz targets compile. Git-sourced deps resolve.
+
+Key tools:
+- `gh run list --workflow=ci.yml --branch=main --limit 5` (CI on main)
+- `gh run list --workflow=nightly.yml --limit 7` (nightly)
+- `gh run list --workflow=fuzz.yml --limit 7` (fuzz)
+- `gh api repos/OWNER/REPO/actions/runs/RUN_ID/jobs` (inspect failed jobs)
+
+If failures persist >2 days, escalate per `specs/012-maintenance.md`.
+If the agent cannot fix a failure, it MUST open a GitHub issue and report the
+pass as blocked — never silently skip.
 
 ## Execution
 
diff --git a/specs/012-maintenance.md b/specs/012-maintenance.md
index b6e12884..79bef7a6 100644
--- a/specs/012-maintenance.md
+++ b/specs/012-maintenance.md
@@ -111,20 +111,28 @@ dependency rot, or security gaps ship in a release.
 - Build/test commands work
 - Pre-PR checklist covers current tooling
 
-### Nightly CI
+### CI Health
 
+- **CI on main is green** — the latest CI run on `main` must pass. Any failure
+  (audit, test, lint, examples) is a blocker that must be fixed before
+  proceeding with the rest of the maintenance pass.
 - Nightly and fuzz workflows green for past week
 - Fuzz targets compile
 - Git-sourced dependencies still resolve
 
-#### Nightly Escalation Policy
+#### Escalation Policy
 
-Failures persisting **>2 consecutive days** are blocking:
+Failures persisting **>2 consecutive days** on any workflow (CI, nightly, fuzz)
+are blocking:
 1. Open GitHub issue with label `ci:nightly`
 2. Link failing run(s)
 3. Assign to most recent contributor in failing area
 4. If upstream dep change: pin to known-good rev, open follow-up issue
 
+**This section is a hard gate.** The maintenance pass MUST NOT be marked
+complete or merged while any of the above checks are red. If the agent cannot
+fix a failure, it must open a GitHub issue and report the pass as blocked.
+
 ## Deferred Items
 
 When a maintenance pass identifies issues too large to fix inline (e.g.
@@ -149,7 +157,8 @@ Sections dependencies, tests, examples, code quality, and nightly CI are fully
 automatable. Security, documentation, specs, simplification, and agent config
 require human or agent review.
 
-Nightly check enforced by `just check-nightly`, called by `just release-check`.
+CI health check enforced by `just check-nightly` (nightly + fuzz) and manual
+inspection of CI on `main` (audit, test, lint). Called by `just release-check`.
 
 ## Invocation
 

From 1a416dd4d901c52c86af8115565efef36d7db3ca Mon Sep 17 00:00:00 2001
From: Mykhailo Chalyi
Date: Mon, 6 Apr 2026 03:49:17 +0000
Subject: [PATCH 2/4] chore: re-trigger CI

From fe5a4f92043f8248adec4ceb8b1f3bb23a1f736b Mon Sep 17 00:00:00 2001
From: Mykhailo Chalyi
Date: Mon, 6 Apr 2026 03:58:55 +0000
Subject: [PATCH 3/4] fix(ci): certify fastrand:2.4.0 for cargo-vet

The Audit CI job has been failing since Mar 28 because cargo-vet could
not verify fastrand:2.4.0. The exemption existed but was insufficient
when cargo-vet fetched remote registry data in CI (locally it passed
due to TLS certificate issues skipping remote fetches).

Add an explicit audit entry and remove the redundant exemption.

Fixes #1091
---
 supply-chain/audits.toml | 5 +++++
 supply-chain/config.toml | 4 ----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/supply-chain/audits.toml b/supply-chain/audits.toml
index 6b416bd3..d2964944 100644
--- a/supply-chain/audits.toml
+++ b/supply-chain/audits.toml
@@ -6,6 +6,11 @@ who = "Mykhailo Chalyi "
 criteria = "safe-to-deploy"
 version = "0.39.1"
 
+[[audits.fastrand]]
+who = "Mykhailo Chalyi "
+criteria = "safe-to-deploy"
+version = "2.4.0"
+
 [[audits.hybrid-array]]
 who = "Mykhailo Chalyi "
 criteria = "safe-to-deploy"
diff --git a/supply-chain/config.toml b/supply-chain/config.toml
index 5d695742..f772b0b6 100644
--- a/supply-chain/config.toml
+++ b/supply-chain/config.toml
@@ -490,10 +490,6 @@ criteria = "safe-to-deploy"
 version = "0.17.0"
 criteria = "safe-to-deploy"
 
-[[exemptions.fastrand]]
-version = "2.4.0"
-criteria = "safe-to-deploy"
-
 [[exemptions.ff]]
 version = "0.13.1"
 criteria = "safe-to-deploy"

From 5c73bf3900c460e0b72e47028a6708351cd9a261 Mon Sep 17 00:00:00 2001
From: Mykhailo Chalyi
Date: Mon, 6 Apr 2026 04:01:25 +0000
Subject: [PATCH 4/4] chore: retrigger CI (runners stuck)