Skip to content

Add scaffold tool for creating new health checks [2/2]#78

Open
gustcol wants to merge 7 commits intofacebookresearch:mainfrom
gustcol:feature/health-check-scaffold
Open

Add scaffold tool for creating new health checks [2/2]#78
gustcol wants to merge 7 commits intofacebookresearch:mainfrom
gustcol:feature/health-check-scaffold

Conversation

@gustcol
Copy link
Contributor

@gustcol gustcol commented Mar 1, 2026

Summary

Ref: #75
Depends on: #77

  • Introduce bin/create_new_health_check.py, a scaffold tool that automates the creation of new health checks by generating all required files and registering the check across all touchpoints
  • Generated checks use the HealthCheckRuntime context manager from Add HealthCheckRuntime context manager for shared boilerplate [1/2] #77
  • Supports --group flag for grouped checks (@click.group()) and --dry-run for previewing changes
  • Idempotent — safe to re-run without duplicating entries
  • Updates the "Adding New Health Check" documentation with a Quick Start section

Stacked PR series: [1/2] Runtime helper (#77) → [2/2] Scaffold tool (this PR)

What the tool does

$ python bin/create_new_health_check.py check_ntp_sync --dry-run
[dry-run] Would create: gcm/health_checks/checks/check_ntp_sync.py
[dry-run] Would create: gcm/tests/health_checks_tests/test_check_ntp_sync.py
[dry-run] Would create: website/docs/GCM_Health_Checks/health_checks/check-ntp-sync.md
[dry-run] Would add import to gcm/health_checks/checks/__init__.py
[dry-run] Would add __all__ entry to gcm/health_checks/checks/__init__.py
[dry-run] Would add checks.check_ntp_sync to gcm/health_checks/cli/health_checks.py
[dry-run] Would add CHECK_NTP_SYNC to gcm/schemas/health_check/health_check_name.py
[dry-run] Would add disable_check_ntp_sync to gcm/monitoring/features/.../health_checks_features.py
Action Target Insert Order
Create gcm/health_checks/checks/check_{name}.py
Create gcm/tests/health_checks_tests/test_check_{name}.py
Create website/docs/.../check-{name}.md
Modify checks/__init__.py Alphabetical import + append __all__
Modify cli/health_checks.py Append to list_of_checks
Modify health_check_name.py Alphabetical enum entry
Modify health_checks_features.py Alphabetical disable_ field

Test plan

  • nox -s tests -- gcm/tests/test_create_health_check.py — 15 tests covering validation, file generation, registration, idempotency, dry-run, grouped checks
  • python bin/create_new_health_check.py check_test_example --dry-run — integration smoke test
  • nox -s lint
  • nox -s format
  • Verify Docusaurus website builds with updated docs

…plate

Extract the ~30 lines of repeated setup code (logger init, GPU node ID
detection, derived cluster resolution, TelemetryContext + OutputContext
nesting, killswitch check) into a reusable HealthCheckRuntime dataclass
context manager. This reduces per-subcommand boilerplate from ~30 lines
to ~5 lines.

The helper is purely additive — existing checks continue to work
unchanged. New checks can use `with HealthCheckRuntime(...) as rt:`
instead of manually wiring up the setup ceremony.

Includes comprehensive tests covering field initialization, killswitch
behavior, context manager nesting, GPU node ID failure handling, and
the finish() convenience method.

Refs: facebookresearch#75
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

CI Commands

The following CI workflows run automatically on every push and pull request:

Workflow What it runs
GPU Cluster Monitoring Python CI lint, tests, typecheck, format, deb build, pyoxidizer builds
Go packages CI shelper tests, format, lint

The following commands can be used by maintainers to trigger additional tests that require access to secrets:

Command Description Requires approval?
/metaci tests Runs Meta internal integration tests (pytest) Yes — a maintainer must trigger the command and approve the deployment request
/metaci integration tests Same as above (alias) Yes

Note: Only repository maintainers (OWNER association) can trigger /metaci commands. After commenting the command, a maintainer must also navigate to the Actions tab and approve the deployment to the graph-api-access environment before the jobs will run. See the approval guidelines for what to approve or reject.

gustcol added 3 commits March 1, 2026 12:51
Apply ufmt formatting and fix mypy errors in test helper
by using explicit typed parameters instead of **kwargs dict
unpacking.
Introduce bin/create_new_health_check.py that automates the creation
of new health checks by generating all required files (check module,
test skeleton, documentation stub) and registering the check across
all touchpoints (checks/__init__.py, CLI entry point, HealthCheckName
enum, killswitch feature flag).

The tool supports single-command and grouped-command checks, has a
dry-run mode, and is idempotent (safe to re-run). Generated checks
use the new HealthCheckRuntime context manager from the previous
commit.

Also updates the "Adding New Health Check" guide with a Quick Start
section pointing to the scaffold tool.

Refs: facebookresearch#75
- Replace len() slice with constant to avoid E203 whitespace-before-colon
- Remove placeholder-less f-strings (F541)
- Remove unused StringIO import (F401)
@gustcol gustcol force-pushed the feature/health-check-scaffold branch from 158959c to 4110400 Compare March 1, 2026 11:58
Add type narrowing asserts for importlib spec/loader (which return
Optional types), type the dynamically-loaded scaffold module as Any
to allow attribute access, and annotate the scaffold_env fixture
return type as Iterator[Path].
```bash
python bin/generate_features.py
ufmt format gcm
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run these from create_new_health_check.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea — the scaffold tool now runs generate_features.py and ufmt format gcm automatically as a post-scaffold step (skipped in --dry-run mode). Updated the docs to remove the manual instructions.

gustcol added 2 commits March 1, 2026 22:02
The scaffold tool now runs generate_features.py and ufmt format gcm
as a post-scaffold step, so users no longer need to run them manually.
Updated docs to reflect the automated workflow.
- Remove unused import sys from generated templates (would fail lint)
- Remove unused check_name parameter from run_post_scaffold
- Add guard for silent no-op when CLI anchor string is not found
- Fix dry-run inconsistency in update_init __all__ block
- Remove tautological assertion in test_update_init_idempotent
- Add test for missing CLI anchor warning
- Update Quick Start docs to list all remaining manual steps
Copy link
Member

@luccabb luccabb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gustcol if you could move the base branch to #77 instead of main

@luccabb
Copy link
Member

luccabb commented Mar 12, 2026

@claude

@claude
Copy link

claude bot commented Mar 12, 2026

Claude finished @luccabb's task in 3m 5s —— View job


Updating PR Base Branch

  • Gather context
  • Analyze what's needed to change the base branch

Summary

The request is to retarget PR #78 so it points to PR #77's branch instead of main. This would make the diff show only the scaffold tool changes (not the runtime helper changes from #77).

What I found: The current branch feature/health-check-scaffold is already structured correctly — the runtime commits (8bc8cb6, a076093) sit below the scaffold commits (b22f798 through fe67af4), so no rebasing is needed. Only the GitHub base branch setting needs to change.

However, I'm unable to complete this automatically because:

  • curl / network requests require approval that hasn't been granted
  • gh CLI requires approval that hasn't been granted
  • I cannot change the PR base branch without an authenticated GitHub API call

How to fix this manually

Option 1 — GitHub UI:

  1. Go to Add scaffold tool for creating new health checks [2/2] #78
  2. Click Edit next to the PR title
  3. Change the base dropdown from main to PR Add HealthCheckRuntime context manager for shared boilerplate [1/2] #77's branch (e.g. feature/health-check-runtime)
  4. Click Change base

Option 2 — gh CLI (from your local machine):

# First find PR #77's branch
gh pr view 77 --json headRefName -q .headRefName

# Then update the base branch
gh pr edit 78 --base <pr77-branch-name>

No code changes are needed — the git history is already correct for a stacked PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants