A modular, security-first pipeline designed to bridge the gap between data science experimentation and production-grade software engineering. This suite automates the conversion of `.ipynb` notebooks into clean, PEP 8-compliant Python scripts while enforcing a pre-push secret scan and token-safe Git authentication.
The suite operates as a decoupled orchestration layer, ensuring that your logic is transformed without compromising the integrity of your production environment.
- Extraction: Parses JSON-based notebooks, strips IPython "magics" (e.g., `%%bash`), and wraps the code in `if __name__ == "__main__":` guards.
- Security Gate: A zero-trust `security_scan.py` module audits the generated code for high-entropy strings or known API-key patterns before allowing a Git push.
- Secure Git Ops: Uses `GIT_ASKPASS` to handle authentication, ensuring Personal Access Tokens (PATs) never leak into system logs or shell history.
- Repo Hygiene: Automatically bootstraps the destination repository with a `.gitignore` and descriptive documentation.
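The Extraction step can be approximated with the standard library alone, since an `.ipynb` file is plain JSON. The following is a minimal sketch, not the suite's `notebook_refactor.py`: the function names `load_notebook`, `extract_code_cells`, and `to_script` are hypothetical, magic detection is a simple prefix heuristic, and indenting the code under the guard also shifts the contents of any multi-line string literals.

```python
import json
from pathlib import Path


def load_notebook(path: Path) -> dict:
    """Read an .ipynb file; notebooks are plain JSON documents."""
    return json.loads(path.read_text(encoding="utf-8"))


def extract_code_cells(nb: dict) -> list[str]:
    """Return the source of each code cell with IPython magics removed."""
    blocks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat allows "source" to be a list of lines or a single string.
        text = "".join(source) if isinstance(source, list) else source
        lines = text.splitlines()
        # A cell magic such as %%bash makes the whole cell non-Python: skip it.
        if lines and lines[0].lstrip().startswith("%%"):
            continue
        # Drop line magics (%timeit ...) and shell escapes (!pip install ...).
        kept = [ln for ln in lines if not ln.lstrip().startswith(("%", "!"))]
        if any(ln.strip() for ln in kept):
            blocks.append("\n".join(kept))
    return blocks


def to_script(blocks: list[str]) -> str:
    """Join cells and nest them under an `if __name__ == "__main__":` guard."""
    body = "\n\n".join(blocks)
    indented = "\n".join(("    " + ln) if ln.strip() else "" for ln in body.splitlines())
    # An empty notebook still has to yield valid Python, hence the `pass`.
    return f'if __name__ == "__main__":\n{indented or "    pass"}\n'
```

Usage, with a hypothetical notebook name: `print(to_script(extract_code_cells(load_notebook(Path("analysis.ipynb")))))`.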
nb_refactor_suite/
├── nb_refactor/ # Core Package
│ ├── cli.py # Orchestrator (The "Brain")
│ ├── git_ops.py # Secure Git integration (GIT_ASKPASS)
│ ├── security_scan.py # Pre-push secret detection
│ └── notebook_refactor.py # IPYNB -> PY logic
└── ...
To prevent "Git noise," the suite uses a `write_text_if_changed` utility. It compares a hash of the newly generated content with a hash of the existing file and only writes (and commits) when they differ, keeping your contribution graph meaningful.
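The change check can be sketched with `hashlib`. This is a hypothetical implementation of `write_text_if_changed`: its real signature is not documented here, so the `(path, text)` parameters and the boolean return value (whether a write happened, which a caller could use to decide whether to commit) are assumptions.

```python
import hashlib
from pathlib import Path


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def write_text_if_changed(path: Path, text: str, encoding: str = "utf-8") -> bool:
    """Write `text` to `path` only if the content differs.

    Returns True when the file was (re)written, False when it was left alone.
    """
    new_bytes = text.encode(encoding)
    if path.exists() and _sha256(path.read_bytes()) == _sha256(new_bytes):
        return False  # identical content: no write, so nothing to commit
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(new_bytes)
    return True
```

Hashing both sides gives the same answer as comparing the bytes directly; a digest becomes useful when you want to store or log it instead of keeping whole file contents around.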
The pipeline assumes all notebooks are "unsafe." Even when a push is triggered, `security_scan` acts as a hard gate: if it detects a potential secret, the entire operation is aborted.
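The hard gate can be illustrated with two common heuristics: regular expressions for known token formats and Shannon entropy for random-looking strings. This is a minimal sketch, not the contents of `security_scan.py`. The two patterns, the 20-character minimum token length, and the 4.5 bits-per-character threshold are assumptions to tune, and `find_secrets`, `security_gate`, and `SecretDetectedError` are hypothetical names.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; a real scanner would use a much larger rule set.
SECRET_PATTERNS = {
    "github_pat": re.compile(r"\b(?:ghp_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{22,})\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}
# Candidate tokens for the entropy check: long runs of base64/hex-like chars.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.5  # bits per character (assumed; tune per codebase)


class SecretDetectedError(RuntimeError):
    """Raised to abort the pipeline before anything is pushed."""


def shannon_entropy(s: str) -> float:
    """Average bits of information per character of `s`."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def find_secrets(code: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for every suspicious match."""
    findings = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
        for token in TOKEN_RE.findall(line):
            if shannon_entropy(token) >= ENTROPY_THRESHOLD:
                findings.append((lineno, "high_entropy_string"))
    return findings


def security_gate(code: str) -> None:
    """Fail closed: any finding aborts the whole operation."""
    findings = find_secrets(code)
    if findings:
        raise SecretDetectedError(f"Aborting push; suspicious lines: {findings}")
```

Note how the settings interact: a string of n characters carries at most log2(n) bits per character, so with these values only tokens of 23 or more characters can trip the entropy check.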
The architecture is provider-agnostic. While currently tuned for GitHub, the git_ops.py module is isolated, allowing for an easy swap to GitLab or Bitbucket without touching the core transformation logic.
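Because `git_ops.py` is the only module that talks to the remote, authentication can stay host-agnostic. Below is a minimal sketch of the `GIT_ASKPASS` approach, assuming a POSIX shell is available. The helper names (`write_askpass`, `git_env`, `push`) and the `NB_GIT_USERNAME`/`NB_GIT_TOKEN` variable names are hypothetical, not taken from the suite.

```python
import os
import stat
import subprocess
import tempfile
from pathlib import Path

# Git runs the GIT_ASKPASS program with a prompt such as
# "Username for 'https://host': " or "Password for 'https://user@host': "
# and reads the answer from its stdout. The script only echoes environment
# variables, so the token is never written to disk or passed on a command line.
ASKPASS_SCRIPT = r"""#!/bin/sh
case "$1" in
  Username*) printf '%s\n' "$NB_GIT_USERNAME" ;;
  *) printf '%s\n' "$NB_GIT_TOKEN" ;;
esac
"""


def write_askpass(directory: Path) -> Path:
    script = directory / "askpass.sh"
    script.write_text(ASKPASS_SCRIPT, encoding="utf-8")
    # Git executes the helper directly, so it must be executable.
    script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return script


def git_env(askpass: Path, username: str, token: str) -> dict[str, str]:
    env = os.environ.copy()
    env.update({
        "GIT_ASKPASS": str(askpass),
        "GIT_TERMINAL_PROMPT": "0",  # fail instead of prompting interactively
        "NB_GIT_USERNAME": username,
        "NB_GIT_TOKEN": token,
    })
    return env


def push(repo_dir: Path, remote_url: str, branch: str, username: str, token: str) -> None:
    # remote_url is a plain HTTPS URL with no embedded credentials, so the
    # same code path works for any HTTPS Git host.
    with tempfile.TemporaryDirectory() as tmp:
        askpass = write_askpass(Path(tmp))
        subprocess.run(
            ["git", "push", remote_url, branch],
            cwd=repo_dir,
            env=git_env(askpass, username, token),
            check=True,
        )
```

Swapping providers in this sketch only means passing a different `remote_url`; the extraction and scanning code never sees credentials.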
- Configure: Add your `PAT` and `REPO_URL` to `.env`.
- Install: `pip install -r requirements.txt`
- Execute:
  ```bash
  python -m nb_refactor
  ```
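The Configure step expects two keys in `.env`. How the suite parses that file is not specified here, so the following stdlib-only sketch is an assumption: a hypothetical `load_env` helper with naive `KEY=VALUE` parsing (no escapes, no multi-line values) that fails fast when a key is missing and reports only key names, never the token itself. Libraries such as python-dotenv provide a fuller parser.

```python
from pathlib import Path

REQUIRED_KEYS = ("PAT", "REPO_URL")


def load_env(path: Path = Path(".env")) -> dict[str, str]:
    """Parse simple KEY=VALUE lines and check that required keys are set."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes: PAT="..." and PAT=... both work.
        values[key.strip()] = value.strip().strip('"').strip("'")
    missing = [k for k in REQUIRED_KEYS if not values.get(k)]
    if missing:
        # Report key names only, so the error never echoes a secret.
        raise ValueError(f"missing keys in {path}: {', '.join(missing)}")
    return values
```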