Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
106 changes: 106 additions & 0 deletions .cursor/commands/resolve-production-issue.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
# Resolve Production Issue - Mandatory Hotfix Workflow

## CRITICAL: Follow #file:../rules/cursorrules.mdc EXACTLY

### STEP 1: Prepare Hotfix Branch (MANDATORY)

#### Branch Setup
- [ ] Ensure clean working tree (starting from local `dev`)
- [ ] Fetch latest remote references: `git fetch origin`
- [ ] Create hotfix branch from remote master: `git checkout -b hotfix/issue-{ISSUE}-{slug} origin/master`
- [ ] Add `in-progress` label to the issue: `gh issue edit {ISSUE} --add-label in-progress`
- [ ] Keep implementation scoped only to production fix requirements
- [ ] Ensure preprod compatibility for config/deployment updates

---

### STEP 2: Issue Intake & Plan Confirmation

- [ ] Read the full issue, incident details, and any linked docs/runbooks
- [ ] Identify blast radius (service, API routes, infra, data, auth, tenant impact)
- [ ] Define rollback and validation strategy before coding
- [ ] Post issue comment with:
- root-cause hypothesis
- planned fix scope
- preprod + production validation plan
- [ ] STOP! Announce plan and wait for user approval before changes

### STEP 3: Implement Hotfix (Scoped + Safe)

- [ ] Implement only the minimal safe fix for production behavior
- [ ] Follow Python + FastAPI clean architecture conventions:
- Domain entities in `coaching/src/domain/entities/`
- Value objects in `coaching/src/domain/value_objects/`
- Ports in `coaching/src/domain/ports/`
- Application services in `coaching/src/application/`
- Infrastructure adapters in `coaching/src/infrastructure/`
- API routes in `coaching/src/api/routes/`
- Core types/constants in `coaching/src/core/`
- [ ] Use Pydantic models, not untyped `dict[str, Any]` in domain
- [ ] Preserve tenant isolation checks on all data access
- [ ] Update workflow/deployment checks when reliability is part of the incident
- [ ] Remove temporary code, debug logs, and dead code

### STEP 4: Validate on Hotfix Branch (Preprod First)

- [ ] Run lint/type/tests locally before push:
```powershell
# Lint + format
python -m ruff check coaching/ shared/ --fix
python -m ruff format coaching/ shared/

# Type checking
python -m mypy coaching/src shared/ --explicit-package-bases

# Tests
cd coaching && uv run pytest --cov=src
```
- [ ] Push hotfix branch: `git push -u origin hotfix/issue-{ISSUE}-{slug}`
- [ ] Confirm preprod deployment workflow runs successfully
- [ ] Execute manual validation in preprod for the incident scenario
- [ ] If validation fails: fix on same hotfix branch and repeat this step

### STEP 5: Promote Hotfix to Production

- [ ] Open PR `hotfix/issue-{ISSUE}-{slug} -> master`
- [ ] Include incident context, root cause, and validation evidence in PR body
- [ ] Merge PR only after preprod validation is confirmed
- [ ] Watch production deployment workflow end-to-end
- [ ] Verify post-deploy checks:
- Lambda/runtime state healthy (if applicable)
- API health endpoint responsive
- CORS preflight/critical route behavior validated

### STEP 6: Propagate Fix Downstream (MANDATORY)

- [ ] Open PR `master -> staging`, merge after checks pass
- [ ] Open PR `staging -> dev`, merge after checks pass
- [ ] Ensure all three branches now contain the hotfix commit(s)

### STEP 7: Close Incident + Cleanup

- [ ] Post closing summary on the issue with:
- root cause
- final fix
- validation evidence (preprod + production)
- follow-up actions (if any)
- [ ] Remove `in-progress` label
- [ ] Close issue with state_reason: `completed`
- [ ] Delete hotfix branches (local + remote) only after downstream merges:
```powershell
git branch -d hotfix/issue-{ISSUE}-{slug}
git push origin --delete hotfix/issue-{ISSUE}-{slug}
```

---

**Non-negotiables:**
- ❌ Never commit directly to `dev`/`staging`/`master`
- ❌ Never skip preprod validation before `hotfix -> master` merge
- ❌ Never leave workflow reliability gaps unverified for deployment incidents
- ❌ Never leave mock data, TODOs, or `dict[str, Any]` in domain
- ❌ Never skip tenant isolation checks
- ✅ Always keep scope to production incident requirements
- ✅ Always document root cause and validation evidence
- ✅ Always merge `master -> staging -> dev` after hotfix production release
- ✅ Always keep docs/config/workflows in sync with the fix
181 changes: 181 additions & 0 deletions .github/workflows/deploy-preprod-hotfix.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,181 @@
name: Deploy Preprod Hotfix

on:
push:
branches:
- "hotfix/**"
workflow_dispatch:
inputs:
branch:
description: "Branch to deploy (defaults to current ref)"
required: false
type: string
skip_tests:
description: "Skip tests before deployment"
required: false
default: "false"
type: choice
options:
- "true"
- "false"

concurrency:
group: deploy-preprod-hotfix-${{ github.ref_name }}
cancel-in-progress: true

jobs:
pre-deployment-checks:
name: Pre-Deployment Validation
runs-on: ubuntu-latest
if: ${{ github.event_name != 'workflow_dispatch' || github.event.inputs.skip_tests != 'true' }}
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch || github.ref }}

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"

- name: Set up uv
uses: astral-sh/setup-uv@v5
with:
version: "latest"
enable-cache: true

- name: Create virtual environment
run: uv venv .venv

- name: Install dependencies
run: |
source .venv/bin/activate
uv pip install -r coaching/requirements.txt
uv pip install -r coaching/requirements-dev.txt
shell: bash

- name: Run Ruff Linting
run: |
source .venv/bin/activate
python -m ruff check . --exclude=".venv,venv,__pycache__,.pytest_cache"
shell: bash

- name: Run MyPy Type Checking
run: |
source .venv/bin/activate
python -m mypy coaching/src/ shared/ --config-file=pyproject.toml
shell: bash

- name: Run Unit Tests
run: |
source .venv/bin/activate
python -m pytest coaching/tests/unit/ -v --cov=coaching/src --cov-fail-under=70
shell: bash
env:
PYTHONPATH: coaching:shared:.

deploy-coaching:
name: Deploy to Preprod
runs-on: ubuntu-latest
needs: [pre-deployment-checks]
if: ${{ always() && (needs.pre-deployment-checks.result == 'success' || (github.event_name == 'workflow_dispatch' && github.event.inputs.skip_tests == 'true')) }}
permissions:
id-token: write
contents: read
steps:
- name: Checkout
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch || github.ref }}

- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: "3.11"

- name: Install Pulumi Python dependencies
working-directory: coaching/pulumi
run: pip install -r requirements.txt

- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: us-east-1
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

- name: Deploy Coaching Service
uses: pulumi/actions@v5
with:
command: up
stack-name: preprod
work-dir: coaching/pulumi
env:
PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: us-east-1

smoke-tests:
name: Post-Deployment Smoke Tests
runs-on: ubuntu-latest
needs: [deploy-coaching]
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch || github.ref }}

- name: Install Pulumi CLI
uses: pulumi/actions@v5
with:
pulumi-version: "latest"

- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: us-east-1
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

- name: Get API Gateway URL
id: api-url
working-directory: coaching/pulumi
run: |
URL=$(pulumi stack output customDomainUrl --stack preprod)
echo "url=$URL" >> $GITHUB_OUTPUT
env:
PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}

- name: Health Check
run: |
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "${{ steps.api-url.outputs.url }}/health" || echo "000")
if [ "$HTTP_CODE" != "200" ] && [ "$HTTP_CODE" != "404" ]; then
echo "❌ Health check failed with HTTP $HTTP_CODE"
exit 1
fi
echo "✅ Health check passed ($HTTP_CODE)"

- name: CORS Preflight Check
run: |
ORIGIN="https://preprod.purposepath.app"
TARGET="${{ steps.api-url.outputs.url }}/api/v1/ai/execute-async"
CORS_HEADERS=$(curl -s -D - -o /dev/null -X OPTIONS "$TARGET" \
-H "Origin: $ORIGIN" \
-H "Access-Control-Request-Method: POST" \
-H "Access-Control-Request-Headers: Authorization,Content-Type,X-Tenant-Id")

ALLOW_ORIGIN=$(echo "$CORS_HEADERS" | tr -d '\r' | awk -F': ' 'tolower($1)=="access-control-allow-origin"{print $2}' | tail -n 1)
ALLOW_CREDENTIALS=$(echo "$CORS_HEADERS" | tr -d '\r' | awk -F': ' 'tolower($1)=="access-control-allow-credentials"{print $2}' | tail -n 1)

if [ "$ALLOW_ORIGIN" != "$ORIGIN" ]; then
echo "❌ Invalid Access-Control-Allow-Origin: '$ALLOW_ORIGIN' (expected '$ORIGIN')"
exit 1
fi
if [ "$ALLOW_CREDENTIALS" != "true" ]; then
echo "❌ Invalid Access-Control-Allow-Credentials: '$ALLOW_CREDENTIALS' (expected 'true')"
exit 1
fi

echo "✅ CORS preflight returned expected headers"
Loading