Hancock uses GitHub Actions for continuous integration, security scanning, benchmarking, and releases. All workflows live in .github/workflows/.
- Workflow Overview
- Test Workflow
- Benchmark Workflow
- Security Workflow
- Release Workflow
- Supporting Workflows
- Required Secrets
- Local Equivalents
| Workflow file | Trigger | Purpose |
|---|---|---|
test.yml |
Push to main/master, PRs |
Lint + unit tests on Python 3.10–3.12 |
benchmark.yml |
PRs to main/master |
Latency benchmark, posts p99 table to PR |
security.yml |
Push, PR, weekly (Monday) | Bandit SAST, pip-audit, Trivy container scan |
release.yml |
Tagged push (v*.*.*) |
Build/push Docker image, create GitHub Release |
deploy.yml |
Push to main (docs/**) |
Deploy docs/ to Netlify |
python-package.yml |
All pushes | Secondary lint + pytest run |
finetune.yml |
Manual workflow_dispatch |
Fine-tune Hancock on Modal GPU |
codeql.yml |
Push, PR, schedule | CodeQL static analysis |
File: .github/workflows/test.yml
Triggers: Push to main/master, all pull requests
Runs a matrix build across Python 3.10, 3.11, and 3.12.
- Lint (critical errors only) —
flake8with--select E9,F63,F7,F82. These flags catch syntax errors, undefined names, and broken imports. The build fails on any match. - pytest — Runs the full test suite in
tests/.
All three Python versions must pass. A red check on any matrix leg blocks merge.
# Lint (matches CI exactly)
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# Tests
pytest tests/ -vFile: .github/workflows/benchmark.yml
Triggers: Pull requests to main/master
Runs tests/benchmark_suite.py and posts a latency summary table as a PR comment.
- p50, p95, and p99 response times for
/health,/models,/mode, and/chat - 50 iterations per endpoint after a 5-request warm-up
- Hard failure if p99 > 500 ms on non-LLM endpoints
The PR comment shows a table like:
| Endpoint | p50 | p95 | p99 | Status |
|-------------|-------|--------|--------|--------|
| GET /health | 3 ms | 6 ms | 9 ms | ✅ |
| POST /chat | 41 ms | 89 ms | 134 ms | ✅ |
A regression in p99 will fail the job and block merge. See performance.md for target values and tuning advice.
pytest tests/benchmark_suite.py -vFile: .github/workflows/security.yml
Triggers: Push, pull requests, weekly schedule (every Monday)
Three independent security scans run in parallel:
Scans Python source for common security issues (hardcoded credentials, use of eval, insecure deserialization, etc.).
# Local equivalent
pip install bandit
bandit -r . -ll # medium severity and aboveChecks all installed packages against the OSV and PyPI advisory databases.
# Local equivalent
pip install pip-audit
pip-auditBuilds the Docker image and scans it for CRITICAL and HIGH CVEs in OS packages and Python dependencies.
# Local equivalent
docker build -t hancock:local .
trivy image --severity CRITICAL,HIGH hancock:local- Bandit findings: Review the flagged line and either fix the code or add a
# noseccomment with a justification. - pip-audit findings: Update the affected package in
requirements.txtto a patched version. - Trivy findings: Update the base image or the vulnerable OS/Python package.
The weekly schedule ensures dependency vulnerabilities are caught even when no code is pushed.
File: .github/workflows/release.yml
Triggers: Tagged pushes matching v*.*.* (e.g., v0.6.0)
- Build Docker image — built from the root
Dockerfile - Push to GHCR — pushes to
ghcr.io/cyberviser/hancockwith three tags:- Full semver:
v0.6.0 - Minor:
v0.6 latest
- Full semver:
- Extract changelog — pulls the relevant section from
CHANGELOG.md - Create GitHub Release — attaches the changelog section as release notes
# Tag and push — this triggers the workflow automatically
git tag v0.6.0
git push origin v0.6.0The image will be available at ghcr.io/cyberviser/hancock:v0.6.0 a few minutes after the tag is pushed.
Hancock follows Semantic Versioning:
- Patch (
v0.5.1): Bug fixes, dependency updates - Minor (
v0.6.0): New features, new agent modes, backwards-compatible changes - Major (
v1.0.0): Breaking API changes
Deploys the docs/ directory to Netlify whenever files under docs/ are pushed to main. Requires NETLIFY_AUTH_TOKEN and NETLIFY_SITE_ID secrets. Posts a preview URL as a PR comment.
A secondary lint + test run that also runs flake8 with --exit-zero (reports style warnings without failing) alongside the stricter check in test.yml.
Manual workflow for fine-tuning Hancock on Modal GPUs (T4, A10G, A100). Dispatched via the GitHub Actions UI with optional --dry-run and --push-hub flags. Requires MODAL_TOKEN_ID and MODAL_TOKEN_SECRET secrets.
GitHub's CodeQL engine performs deep static analysis on pushes, PRs, and a schedule. Results appear in the Security → Code scanning tab. High and critical alerts should be resolved before merge.
Configure these in Settings → Secrets and variables → Actions:
| Secret | Required by | Purpose |
|---|---|---|
NETLIFY_AUTH_TOKEN |
deploy.yml |
Netlify deployment |
NETLIFY_SITE_ID |
deploy.yml |
Target Netlify site |
MODAL_TOKEN_ID |
finetune.yml |
Modal GPU compute |
MODAL_TOKEN_SECRET |
finetune.yml |
Modal GPU compute |
GHCR_TOKEN |
release.yml |
Push to GitHub Container Registry |
The GITHUB_TOKEN (automatically provided by Actions) is used by the release and CodeQL workflows.
Run these locally to replicate what CI does before opening a PR:
# 1. Lint (fail-fast, matches test.yml)
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# 2. Full test suite
pytest tests/ -v
# 3. Benchmark
pytest tests/benchmark_suite.py -v
# 4. SAST
bandit -r . -ll
# 5. Dependency audit
pip-audit
# 6. Container scan
docker build -t hancock:local .
trivy image --severity CRITICAL,HIGH hancock:local
# 7. Pre-flight checks
python deploy/startup_checks.py