Add pre-commit hook to sanitize Jupyter notebooks (strip noisy metadata / execution counts, keep cell source + outputs) #333

@conorheins

Description

Notebook diffs in pymdp often include noise from Jupyter metadata (e.g., metadata.kernelspec, metadata.language_info) and execution-count churn. Let’s add a pre-commit hook that automatically sanitizes notebooks before commit while keeping cell source/markdown (and outputs) intact.

Proposed changes

  • Add .pre-commit-config.yaml with nbstripout:

    repos:
      - repo: https://github.com/kynan/nbstripout
        rev: 0.8.2
        hooks:
          - id: nbstripout
            files: \.ipynb$
            args:
              - --keep-output
              - "--extra-keys=metadata.kernelspec metadata.language_info"
  • Add pre-commit to a dev dependency group (e.g. [dependency-groups].test or a new [dependency-groups].dev). nbstripout is already listed in pyproject.toml, so pre-commit is the only new dependency.

  • Add a short note to CONTRIBUTING.md / README:

    • uv sync --group test (or --group dev)
    • uv run pre-commit install
    • (optional) uv run pre-commit run --all-files
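
To make the intended effect of the hook configuration concrete, here is a minimal Python sketch of the transformation on a notebook's JSON: top-level kernelspec and language_info metadata are dropped, execution counts are reset, and cell source and outputs are kept. It is an illustration only and not nbstripout's implementation (the real tool handles more cases). The `sanitize_notebook` helper and the toy `example` notebook are hypothetical names introduced here.

```python
import copy
import json

# Top-level metadata keys named in the proposed --extra-keys argument.
EXTRA_KEYS = ("kernelspec", "language_info")


def sanitize_notebook(nb: dict) -> dict:
    """Return a copy of `nb` without execution counts or the extra metadata keys.

    Illustrative only: approximates the intended effect of the hook and is
    not nbstripout's implementation.
    """
    clean = copy.deepcopy(nb)
    metadata = clean.get("metadata", {})
    for key in EXTRA_KEYS:
        metadata.pop(key, None)
    for cell in clean.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown/raw cells are left untouched
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            # Outputs are kept; only their execution counter is reset.
            if "execution_count" in output:
                output["execution_count"] = None
    return clean


# Hypothetical toy notebook, used only to exercise the sketch.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3"},
        "language_info": {"name": "python", "version": "3.11.4"},
    },
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

cleaned = sanitize_notebook(example)
print(json.dumps(cleaned["metadata"]))
print(cleaned["cells"][1]["execution_count"])
print(cleaned["cells"][1]["outputs"][0]["data"]["text/plain"])
```

Two people re-running the same notebook on different kernels or in a different cell order would produce the same cleaned JSON under this sketch, which is the diff-noise reduction the hook is meant to provide.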

Why this approach

  • The hook runs locally and reduces notebook diff noise without changing our CI workflows.
  • nbstripout supports running as a pre-commit hook, and pre-commit reads .pre-commit-config.yaml at the repo root (see https://github.com/kynan/nbstripout).

Metadata

Labels

cleanup (Code cleanups and improvements), enhancement (New feature or request), help wanted (Extra attention is needed)
