Last updated: 2026-02-28
The frontend is deployed on Vercel from the frontend/ directory.
| Setting | Value |
|---|---|
| Root Directory | frontend |
| Framework | Next.js |
| Build Command | (auto) |
| Install Command | npm ci |
| Output Dir | .next |
Set these in Vercel > Project Settings > Environment Variables:
| Key | Example Value |
|---|---|
| `NEXT_PUBLIC_SUPABASE_URL` | `https://your-project.supabase.co` |
| `NEXT_PUBLIC_SUPABASE_ANON_KEY` | `eyJhbGciOiJIUzI1NiIsInR5cCI6...` |
These are public keys (embedded in the client bundle). The anon key only grants access allowed by RLS policies.
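For reference, a minimal client initialization using these variables might look like the following sketch, assuming the standard `@supabase/supabase-js` client (the file path is illustrative, not necessarily the project's actual code):

```ts
// lib/supabaseClient.ts — illustrative sketch. NEXT_PUBLIC_* values are inlined
// into the client bundle at build time, which is why only the anon key belongs here.
import { createClient } from '@supabase/supabase-js';

export const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!,
);
```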
Critical: Supabase must know your production domain for auth callbacks to work.
- Go to Supabase Dashboard > Authentication > URL Configuration
- Set Site URL to your production domain: `https://your-domain.vercel.app`
- Add Redirect URLs (at minimum): `https://your-domain.vercel.app/auth/callback`
- Click Save
For Vercel preview deployments, add a wildcard redirect URL matching your preview domain pattern:
https://*-<your-vercel-username>.vercel.app/auth/callback
Replace <your-vercel-username> with your actual Vercel account name (e.g., https://*-janedoe.vercel.app/auth/callback).
Note: Supabase supports wildcard subdomains in redirect URLs. This allows all Vercel preview deployments to use auth callbacks.
If wildcards aren't supported in your Supabase plan, add each preview domain individually as needed.
```
User signs up → Supabase sends confirmation email
→ User clicks link → /auth/callback (exchanges code for session)
→ Redirect to /app/search
→ App layout checks onboarding_complete
→ If false → /onboarding/region → /onboarding/preferences → /app/search
```
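A hedged sketch of the onboarding gate in this flow, assuming a Next.js App Router layout (`getProfile` is a hypothetical helper; only the `onboarding_complete` check and the redirect targets come from this document):

```tsx
// app/app/layout.tsx — illustrative sketch, not the project's actual code.
import { redirect } from 'next/navigation';
import type { ReactNode } from 'react';
import { getProfile } from '@/lib/profile'; // hypothetical server-side profile fetch

export default async function AppLayout({ children }: { children: ReactNode }) {
  const profile = await getProfile();
  if (!profile?.onboarding_complete) {
    redirect('/onboarding/region'); // first onboarding step
  }
  return <>{children}</>;
}
```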
The callback route (/auth/callback) is the only auth callback target. It exchanges the Supabase auth code for a session and redirects to /app/search.
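A minimal sketch of such a route handler, assuming a cookie-aware server client (`createServerSupabase` is a hypothetical helper; the project's actual route may differ in details):

```ts
// app/auth/callback/route.ts — illustrative sketch of the code-for-session exchange.
import { NextResponse, type NextRequest } from 'next/server';
import { createServerSupabase } from '@/lib/supabase-server'; // hypothetical helper

export async function GET(request: NextRequest) {
  const { searchParams, origin } = new URL(request.url);
  const code = searchParams.get('code');
  if (code) {
    const supabase = createServerSupabase();
    await supabase.auth.exchangeCodeForSession(code); // sets the session cookie
  }
  return NextResponse.redirect(`${origin}/app/search`);
}
```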
Post-login redirect validation happens in the login form (LoginForm.tsx): the redirect query parameter is validated to prevent open-redirect attacks — only relative paths starting with / are accepted, and // prefixes are blocked.
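A sketch of that guard (the actual `LoginForm.tsx` implementation may differ; the function name is illustrative):

```ts
// Open-redirect guard: accept only same-origin relative paths.
function safeRedirect(target: string | null, fallback = '/app/search'): string {
  if (!target) return fallback;
  // Must start with "/" but not "//" ("//evil.com" is protocol-relative)
  if (target.startsWith('/') && !target.startsWith('//')) return target;
  return fallback;
}
```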
Database deployments are automated via GitHub Actions using .github/workflows/deploy.yml. The workflow is triggered manually from the GitHub Actions UI with an environment selector (production/staging).
```
Developer triggers via GitHub UI
→ Pre-flight: Schema diff + dry-run option
→ Approval gate (production only — GitHub Environment protection)
→ Pre-deploy backup (supabase db dump → artifact)
→ Push migrations (supabase db push)
→ Post-deploy sanity checks (16 SQL checks)
→ Summary in GitHub Actions step summary
```
- Go to GitHub → Actions → Deploy Database
- Click Run workflow
- Select the target environment (production or staging)
- Optionally check Dry run to only see the schema diff without deploying
- Click Run workflow
For production deployments, a reviewer must approve the deployment in the GitHub Environment approval UI before the deploy job starts.
| Step | Description | On Failure |
|---|---|---|
| Pre-flight: Schema diff | Shows pending migrations and drift between Git and remote | Informational — does not block |
| Dry run gate | If dry run is checked, stops after showing diff | N/A |
| Approval gate | Production requires reviewer approval via GitHub Environments | Deploy waits indefinitely |
| Pre-deploy backup | `supabase db dump --data-only` saved as artifact (30-day retention) | Aborts deployment |
| Push migrations | `supabase db push` applies pending migrations | Workflow fails, backup available |
| Post-deploy sanity | Runs all 16 SQL sanity checks against remote | Workflow fails with check details |
| Environment | Approval Required | Wait Timer | Secrets |
|---|---|---|---|
| production | Yes (1+ reviewer) | 5 minutes | `SUPABASE_ACCESS_TOKEN`, `SUPABASE_PROJECT_REF`, `SUPABASE_DB_PASSWORD` |
| staging | No | None | `SUPABASE_ACCESS_TOKEN`, `SUPABASE_STAGING_PROJECT_REF`, `SUPABASE_DB_PASSWORD` |
| Secret | Purpose | Scope |
|---|---|---|
| `SUPABASE_ACCESS_TOKEN` | CLI authentication | Repository |
| `SUPABASE_PROJECT_REF` | Production project reference | Environment: production |
| `SUPABASE_STAGING_PROJECT_REF` | Staging project reference | Environment: staging |
| `SUPABASE_DB_PASSWORD` | Direct DB access (backup + sanity) | Repository |
Security: All secrets are accessed via `${{ secrets.* }}` and are never echoed in logs or step outputs. Rotate access tokens quarterly.
The workflow uses `concurrency: deploy-<environment>` to prevent parallel deployments to the same environment. A new deployment to the same environment will wait for the current one to finish.
`sync-cloud-db.yml` automatically pushes migrations to staging first (when `STAGING_ENABLED=true`), then to production on merge to `main`. The manual `deploy.yml` workflow is intended for:
- Controlled deployments with approval gates
- Staging deployments
- Re-deployments after failed syncs
- Dry-run schema diff checks
If `deploy.yml` fails mid-push:
- Download the backup artifact from the workflow run
- Follow Restore Procedures below
- Investigate the failing migration
- Fix and re-trigger the deployment
See also: Issue #121 (Rollback Documentation) for detailed procedures.
- Go to Vercel > Project Settings > Domains
- Add your custom domain (e.g., `TryVit.example.com`)
- Configure DNS per Vercel's instructions (CNAME or A record)
- Update Supabase Auth URLs to match:
  - Site URL: `https://TryVit.example.com`
  - Redirect URL: `https://TryVit.example.com/auth/callback`
- Keep the Vercel `.vercel.app` domain in Supabase redirect URLs as a fallback
Add these in GitHub > Settings > Secrets and variables > Actions > Repository secrets:
| Secret | Value | Used by |
|---|---|---|
| `NEXT_PUBLIC_SUPABASE_URL` | Production Supabase URL | CI (fallback) |
| `NEXT_PUBLIC_SUPABASE_ANON_KEY` | Production Supabase anon key | CI (fallback) |
| `SUPABASE_URL_STAGING` | Staging Supabase URL | CI + preview E2E |
| `SUPABASE_ANON_KEY_STAGING` | Staging Supabase anon key | CI + preview E2E |
| `SUPABASE_SERVICE_ROLE_KEY_STAGING` | Staging service role key | CI auth E2E |
| `SUPABASE_SERVICE_ROLE_KEY` | Production service role key (fallback) | CI auth E2E |
Note: When staging secrets are configured, CI automatically prefers them over the production fallbacks, ensuring CI never touches production.
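Conceptually, the precedence looks like this (purely illustrative TypeScript; the actual selection happens in workflow YAML):

```ts
// Illustrative sketch only: staging secrets win whenever they are configured,
// production values are only a fallback. Variable names match the table above.
const supabaseUrl =
  process.env.SUPABASE_URL_STAGING ?? process.env.NEXT_PUBLIC_SUPABASE_URL;
const supabaseAnonKey =
  process.env.SUPABASE_ANON_KEY_STAGING ?? process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY;
```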
The CI workflows use a tiered architecture:
- PR Gate (`.github/workflows/pr-gate.yml`): Typecheck + Lint → Unit tests + Build (parallel) → Playwright smoke E2E
- Main Gate (`.github/workflows/main-gate.yml`): Full build + tests + coverage → Full Playwright E2E → SonarCloud (blocking) → Sentry sourcemaps
- Nightly (`.github/workflows/nightly.yml`): Full Playwright (all projects incl. visual) + Data Integrity Audit
Vercel preview deployments are wired to the staging Supabase project:
| Vercel Environment | Supabase Target | `NEXT_PUBLIC_SUPABASE_URL` | `NEXT_PUBLIC_SUPABASE_ANON_KEY` |
|---|---|---|---|
| Preview | Staging | Staging URL | Staging anon key |
| Production | Production | Production URL | Production anon key |
How it works:
- Developer opens a PR → Vercel creates a preview deployment using Preview env vars (staging Supabase)
- The CI `playwright-preview` job waits for the Vercel deployment to be ready
- Playwright runs smoke tests against the preview URL (no local dev server needed)
- Preview E2E uses the `BASE_URL` env var to override the `playwright.config.ts` `baseURL`
- The `webServer` block in the Playwright config is automatically skipped when `BASE_URL` is set (see the config sketch below)
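The override pattern, sketched as a `playwright.config.ts` (the dev-server command and port are assumptions; the conditional `webServer` is a standard Playwright idiom, not necessarily the project's exact config):

```ts
import { defineConfig } from '@playwright/test';

const baseURL = process.env.BASE_URL ?? 'http://localhost:3000';

export default defineConfig({
  use: { baseURL },
  // Only start a local dev server when no external BASE_URL is provided
  webServer: process.env.BASE_URL
    ? undefined
    : {
        command: 'npm run dev',
        url: 'http://localhost:3000',
        reuseExistingServer: true,
      },
});
```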
Activation: Set STAGING_ENABLED=true as a GitHub repository variable and configure the staging secrets listed above.
Tip: The preview E2E job is non-blocking during initial rollout — it runs as a separate check and does not prevent merging.
```powershell
cd frontend
npm ci
npm run type-check    # TypeScript check
npm run lint          # ESLint
npm run build         # Production build
npx playwright test   # E2E tests (auto-starts dev server via webServer config)
```

| Item | Value |
|---|---|
| Plan tier | Free (verified 2026-02-22 via Supabase Dashboard > Billing) |
| PITR availability | Not available — PITR requires the Pro plan ($25/month) |
| Daily auto-backups | Yes (Free tier: last 7 days, no PITR, no point-in-time granularity) |
| Backup granularity | Daily snapshot only — no sub-day recovery |
Implication: Because there is no PITR, a bad migration could lose all data written since the last daily snapshot. The `BACKUP.ps1` pre-deployment dump is the primary safety net.
`RUN_REMOTE.ps1` automatically calls `BACKUP.ps1 -Env remote` before executing any SQL pipelines. If the backup fails, deployment is aborted.

To skip the backup in an emergency:

```powershell
.\RUN_REMOTE.ps1 -SkipBackup -Force
```

Warning: Skipping the backup removes your safety net. Only do this if the backup itself is broken and you have another recovery path.
```powershell
# Remote (production)
.\BACKUP.ps1 -Env remote

# Local (Docker)
.\BACKUP.ps1 -Env local
```

Produces: `backups/cloud_backup_YYYYMMDD_HHmmss.dump` (compressed custom format)
Prerequisites:

- `pg_dump` and `psql` on PATH
- Remote: `SUPABASE_DB_PASSWORD` environment variable (or interactive prompt)
- Local: Docker Desktop + Supabase running (`supabase start`)
```powershell
.\scripts\export_user_data.ps1 -Env remote
```

Exports 8 user tables to `backups/user_data_YYYYMMDD_HHmmss.json`:

`user_preferences`, `user_health_profiles`, `user_product_lists`, `user_product_list_items`, `user_comparisons`, `user_saved_searches`, `scan_history`, `product_submissions`
```powershell
# Full database restore (drops and recreates objects)
pg_restore --no-owner --no-privileges --clean --if-exists -d postgres backups/cloud_backup_YYYYMMDD_HHmmss.dump
```

For remote restore, set the connection via environment:

```powershell
$env:PGPASSWORD = "your-password"
pg_restore --no-owner --no-privileges --clean --if-exists `
  -h aws-1-eu-west-1.pooler.supabase.com `
  -p 5432 `
  -U "postgres.uskvezwftkkudvksmken" `
  -d postgres `
  backups/cloud_backup_YYYYMMDD_HHmmss.dump
```

To restore user data from a JSON export:

```powershell
.\scripts\import_user_data.ps1 -Env local -File backups\user_data_YYYYMMDD_HHmmss.json
.\scripts\import_user_data.ps1 -Env remote -File backups\user_data_YYYYMMDD_HHmmss.json
```

Import uses `ON CONFLICT DO UPDATE` (upsert) and is safe to run multiple times.
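To illustrate the same idempotent semantics with `supabase-js` (the real import goes through `import_user_data.ps1`; the `user_id` conflict target and sample row here are assumptions for the sketch):

```ts
import { createClient } from '@supabase/supabase-js';

// Admin-side client; the service role bypasses RLS for restore operations.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!,
);

async function restorePreferences(rows: Record<string, unknown>[]) {
  // upsert = INSERT ... ON CONFLICT DO UPDATE: re-running updates instead of failing
  const { error } = await supabase
    .from('user_preferences')
    .upsert(rows, { onConflict: 'user_id' });
  if (error) throw error;
}
```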
After any restore, run the full validation suite:
- Sanity checks: `.\RUN_SANITY.ps1 -Env production` (17 checks pass)
- QA checks: `.\RUN_QA.ps1` (all suites pass)
- Row counts: verify user table row counts match pre-backup values
- Frontend smoke: load a product detail page and verify data displays correctly
| Metric | Approximate Value |
|---|---|
| Total products | ~1,279 |
| Database size | ~50–100 MB |
| Dump file size | ~10–30 MB |
| Backup duration | ~10–30 seconds |
| User data JSON | < 1 MB |
These will grow as more products and users are added.
Before running `RUN_REMOTE.ps1` against production:

- On the `main` branch: the script enforces this
- All CI checks pass: `tsc --noEmit`, lint, build, Vitest, Playwright
- QA checks pass locally: `.\RUN_QA.ps1` with local Supabase
- Backup runs successfully: automatic via `RUN_REMOTE.ps1`, or manual `.\BACKUP.ps1 -Env remote`
- Review the execution plan: `.\RUN_REMOTE.ps1 -DryRun` shows which files will execute
- Confirm interactively: type `YES` when prompted (or use `-Force` for CI)
Golden rule: Always take a backup before attempting any fix. If the database is accessible, run `.\BACKUP.ps1 -Env remote` first.
A migration was successfully applied but introduced a schema error — e.g., dropped a column, altered a constraint incorrectly, or created a conflicting index.
Steps:

1. Identify the bad migration:

   ```sql
   SELECT version, name, statements
   FROM supabase_migrations.schema_migrations
   ORDER BY version DESC LIMIT 5;
   ```

2. Take a current backup (if the DB is still accessible):

   ```powershell
   .\BACKUP.ps1 -Env remote
   ```

3. Write a compensating migration: a new migration that undoes the damage. Example:

   ```sql
   -- ═══════════════════════════════════════════════════════════════════════
   -- Migration: Fix accidental column drop from migration YYYYMMDD_HHMMSS
   -- Rollback: This migration itself is forward-only; manual DROP if needed
   -- ═══════════════════════════════════════════════════════════════════════
   BEGIN;

   -- Re-create the accidentally dropped column
   ALTER TABLE products ADD COLUMN IF NOT EXISTS product_name text;

   -- Restore data from backup if needed (backfill from last known-good dump)
   -- UPDATE products SET product_name = b.product_name
   --   FROM backup_products b WHERE products.id = b.id;

   -- Restore any constraints
   -- ALTER TABLE products ALTER COLUMN product_name SET NOT NULL;

   COMMIT;
   ```

4. Save the compensating migration to `supabase/migrations/` with the next timestamp.

5. Apply:

   ```powershell
   # Local test first
   supabase db push --local
   .\RUN_QA.ps1

   # Then production
   supabase db push --linked
   # Or via the deploy.yml workflow with approval gate
   ```

6. Verify:

   ```powershell
   .\RUN_SANITY.ps1 -Env production   # 17 checks pass
   .\RUN_QA.ps1                       # 724 checks pass
   ```

7. Document the incident: write a post-mortem within 24 hours.
The database is corrupted or data integrity is compromised beyond what a compensating migration can repair. This requires a full restore from the latest `.dump` file.
Steps:

1. Locate the latest backup:

   ```powershell
   Get-ChildItem backups/*.dump | Sort-Object LastWriteTime -Descending | Select-Object -First 5
   ```

   Also check GitHub Actions artifacts from `deploy.yml` runs (30-day retention).

2. Verify backup integrity:

   ```powershell
   pg_restore --list backups/cloud_backup_YYYYMMDD_HHmmss.dump | Select-Object -First 20
   ```

   If this prints a table of contents, the file is valid. If it errors, the dump is corrupt; try an older backup.

3. Take a snapshot of the current (broken) state (if accessible):

   ```powershell
   .\BACKUP.ps1 -Env remote   # Save as evidence for the post-mortem
   ```

4. Restore the backup:

   ```powershell
   $env:PGPASSWORD = "your-password"
   pg_restore --no-owner --no-privileges --clean --if-exists `
     -h aws-1-eu-west-1.pooler.supabase.com `
     -p 5432 `
     -U "postgres.uskvezwftkkudvksmken" `
     -d postgres `
     backups/cloud_backup_YYYYMMDD_HHmmss.dump
   ```

   Flags explained:

   - `--clean --if-exists`: drops objects before recreating (safe even if objects don't exist)
   - `--no-owner --no-privileges`: avoids permission errors on Supabase managed roles

5. Re-apply migrations newer than the backup (if any):

   ```powershell
   # Check which migrations are recorded in the restored DB
   psql -c "SELECT version FROM supabase_migrations.schema_migrations ORDER BY version DESC LIMIT 10;"

   # Manually apply any missing migrations after the backup timestamp
   ```

6. Verify:

   ```powershell
   .\RUN_SANITY.ps1 -Env production   # 17 checks pass
   .\RUN_QA.ps1                       # 724 checks pass
   ```

7. Verify the product count:

   ```sql
   SELECT COUNT(*) FROM products WHERE is_deprecated IS NOT TRUE;
   -- Expected: ≥ 1,279
   ```

8. Spot-check the frontend: load a product detail page and verify data displays correctly.
User-facing data (preferences, lists, scan history) was lost or corrupted, but the schema and product data are intact.
Steps:

1. Locate the user data export:

   ```powershell
   Get-ChildItem backups/user_data_*.json | Sort-Object LastWriteTime -Descending | Select-Object -First 5
   ```

   If no export exists, create one from the backup dump manually.

2. Import the user data:

   ```powershell
   .\scripts\import_user_data.ps1 -Env remote -File backups\user_data_YYYYMMDD_HHmmss.json
   ```

   Uses `ON CONFLICT DO UPDATE` (upsert), so it is safe to run multiple times. FK dependency order is handled automatically.

3. Verify user table row counts:

   ```sql
   SELECT 'user_preferences' AS tbl, COUNT(*) FROM user_preferences
   UNION ALL SELECT 'user_health_profiles', COUNT(*) FROM user_health_profiles
   UNION ALL SELECT 'user_product_lists', COUNT(*) FROM user_product_lists
   UNION ALL SELECT 'user_product_list_items', COUNT(*) FROM user_product_list_items
   UNION ALL SELECT 'user_comparisons', COUNT(*) FROM user_comparisons
   UNION ALL SELECT 'user_saved_searches', COUNT(*) FROM user_saved_searches
   UNION ALL SELECT 'scan_history', COUNT(*) FROM scan_history
   UNION ALL SELECT 'product_submissions', COUNT(*) FROM product_submissions;
   ```

4. Verify auth still works: log in with a test account, view preferences, check saved lists.
A bad frontend deployment was pushed — the site is broken, shows errors, or has a critical UX regression.
Steps:

1. Go to Vercel Dashboard → Project → Deployments
2. Find the last known-good deployment (green checkmark before the bad one)
3. Click "..." → "Promote to Production"
4. Wait ~30 seconds for the rollback to propagate
5. Verify (see the smoke-check sketch below):
   - Home page loads (`/`)
   - Search works (`/app/search`)
   - Product detail page renders data (`/app/product/[id]`)
   - Auth callback works (`/auth/callback`)
   - Health endpoint returns 200 (`/api/health`)
6. If the rollback needs to stay in place, revert the bad commit on `main` to prevent the next push from re-deploying the broken code.
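A quick smoke check over these routes, as a hedged sketch (run with e.g. `npx tsx smoke-check.ts`; the `BASE_URL` default is a placeholder, and the product-detail route is omitted because it needs a real id):

```ts
// smoke-check.ts — illustrative post-rollback smoke check, Node 18+ (global fetch).
const base = process.env.BASE_URL ?? 'https://your-domain.vercel.app';
const routes = ['/', '/app/search', '/api/health']; // add /app/product/<real-id> manually

async function main() {
  for (const route of routes) {
    const res = await fetch(base + route);
    console.log(`${route} -> ${res.status}`);
    if (!res.ok) process.exitCode = 1; // non-zero exit signals a failed smoke check
  }
}

main();
```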
A migration applied successfully but introduced data corruption: e.g., an UPDATE with an incorrect WHERE clause, a bad DEFAULT value, or a trigger that modified existing rows.

Steps:

1. Assess the damage:

   ```powershell
   .\RUN_QA.ps1   # Failing suites identify the affected data
   ```

2. If damage is limited (a few rows affected):
   - Write a compensating SQL script to fix the data
   - Apply and verify with QA

3. If damage is widespread (many tables/rows affected):
   - Follow Scenario 2 (full restore from backup)

4. If data was deleted irreversibly:
   - Restore from backup (Scenario 2)
   - Accept data loss between backup time and incident time
   - Document the gap in the post-mortem
Copy-paste this into your incident channel (Slack/Discord/Teams) when a production incident occurs. For the full incident response process (severity classification, escalation ladder, runbooks, post-mortem template), see docs/INCIDENT_RESPONSE.md.
## 🚨 Production Incident — [DATE] [TIME UTC]
**Reported by:** @name
**Severity:** P1 / P2 / P3
**Impact:** [Describe what users are experiencing]
### Immediate Actions
- [ ] Stop any in-progress deployments (cancel GitHub Actions run on deploy.yml)
- [ ] Stop `sync-cloud-db.yml` if running (cancel workflow)
- [ ] Take a current backup if DB is accessible: `.\BACKUP.ps1 -Env remote`
- [ ] Export user data if schema is intact: `.\scripts\export_user_data.ps1 -Env remote`
### Investigation
- [ ] Identify root cause (check: migration logs, Supabase dashboard logs, Vercel deployment logs)
- [ ] Identify scope — which tables/data/features are affected
- [ ] Check `supabase_migrations.schema_migrations` for recently applied migrations
### Recovery
- [ ] Choose restore scenario (1–5 from DEPLOYMENT.md Rollback Procedures)
- [ ] Execute restore with a second person verifying each step
- [ ] Run `.\RUN_SANITY.ps1 -Env production` — all 17 checks pass
- [ ] Run `.\RUN_QA.ps1` against production data — all 724 checks pass
- [ ] Verify frontend loads correctly (home, search, product detail, auth)
- [ ] Verify `/api/health` returns 200
### Communication
- [ ] Notify stakeholders of impact and ETA
- [ ] Update status page (if applicable)
- [ ] Write post-mortem within 24 hours
### Post-mortem Template
- **Timeline:** When was the incident detected? When was it resolved?
- **Root cause:** What exactly went wrong?
- **Impact:** How many users were affected? For how long?
- **Recovery:** What steps were taken? How long did recovery take?
- **Prevention:** What changes will prevent recurrence?

If normal tooling fails (Supabase CLI, scripts), use direct `psql` access:
```powershell
# Set credentials
$env:PGPASSWORD = "your-db-password"

# Direct connection (bypasses CLI, bypasses pooler)
psql -h db.uskvezwftkkudvksmken.supabase.co `
  -p 5432 `
  -U postgres `
  -d postgres
```

When to use break-glass:
- Supabase CLI is down or unresponsive
- GitHub Actions is not available
- `RUN_REMOTE.ps1` or `BACKUP.ps1` are failing for script-level reasons
- You need to run a manual SQL fix immediately
Security note: Direct database access bypasses all application-level security. Use only during incidents. Log all manual SQL commands for the post-mortem.
Frequency: Run this drill at least once per quarter, and after every change to the backup or restore scripts. Last drill: 2026-02-23 | Next scheduled: 2026-05-23. Full report: `docs/DISASTER_DRILL_REPORT.md`
The DR drill is fully automated via RUN_DR_DRILL.ps1, which runs 6 scenarios:
| Scenario | Description | TTR Target | Recovery Method |
|---|---|---|---|
| A | Bad Migration (column drop) | < 5 min | SAVEPOINT/ROLLBACK |
| B | Table Truncation (data loss) | < 5 min | SAVEPOINT/ROLLBACK |
| C | Full Backup Restore | < 30 min | pg_restore / supabase db reset |
| D | User Data Restore | < 5 min | SAVEPOINT/ROLLBACK or import script |
| E | Frontend Deployment Rollback | < 5 min | Vercel "Promote to Production" |
| F | API Endpoint Failure | < 10 min | Compensating migration |
```powershell
# Run all scenarios against local Supabase
.\RUN_DR_DRILL.ps1 -Env local

# Run a specific scenario
.\RUN_DR_DRILL.ps1 -Env local -Scenario A

# Skip full restore (quick validation)
.\RUN_DR_DRILL.ps1 -Env local -SkipRestore

# JSON output for CI integration
.\RUN_DR_DRILL.ps1 -Env local -Json -OutFile dr-results.json

# Run against staging
.\RUN_DR_DRILL.ps1 -Env staging
```

Prerequisites:

- Docker Desktop running + `supabase start` (local mode)
- `psql`, `pg_dump`, `pg_restore` on PATH
- At least one backup file in `backups/`
- For staging: `SUPABASE_STAGING_DB_PASSWORD` environment variable
Drill scripts: supabase/dr-drill/ directory contains per-scenario SQL files and post-drill verification queries.
For a manual walkthrough (useful for training or when automation cannot run):
```powershell
# 1. Create backup of healthy local DB
.\BACKUP.ps1 -Env local

# 2. Verify backup exists
Get-ChildItem backups/local_backup*.dump | Sort-Object LastWriteTime -Descending | Select-Object -First 1

# 3. Apply destructive migration to simulate disaster
psql -h 127.0.0.1 -p 54322 -U postgres -d postgres -c "ALTER TABLE products DROP COLUMN product_name;"

# 4. Confirm QA detects the breakage
.\RUN_QA.ps1
# Expected: multiple suite failures (product_name referenced in views, queries, API contract)

# 5. Restore from backup
pg_restore --clean --if-exists --no-owner --no-privileges `
  -h 127.0.0.1 -p 54322 -U postgres -d postgres `
  backups/local_backup_YYYYMMDD_HHmmss.dump

# Alternative: full reset (reapplies all migrations from scratch)
supabase db reset

# 6. Verify recovery
.\RUN_SANITY.ps1   # 17 checks pass
.\RUN_QA.ps1       # 724 checks pass
```

| Recovery Method | TTR | Data Loss | When to Use |
|---|---|---|---|
| SAVEPOINT/ROLLBACK | < 100 ms | Zero | Failure caught within active transaction |
| Compensating migration | 5–30 min | Varies | Schema error committed, data intact |
| User data import (JSON) | < 2 min | Since last export | User data loss, schema intact |
| Full backup restore | 10–30 min | Since last backup | Widespread corruption |
| supabase db reset | < 30 sec | All user data | Local dev only |
| Vercel Promote | ~30 sec | Zero | Frontend deployment broken |
### Drill #N — [Date]
- **Operator:** @name
- **Environment:** local / staging
- **Scenario(s) run:** A, B, C, D, E, F
- **All scenarios passed:** yes / no
- **Total TTR (worst scenario):** ___
- **All QA checks passed after restore:** yes / no
- **Issues encountered:** ___
- **Lessons learned:** ___

Key lessons from past drills:

- `pg_restore --clean` on a running DB with active connections requires `--if-exists` to avoid errors on missing objects
- Always verify backup integrity (`pg_restore --list`) before relying on it for restore
- `TRUNCATE CASCADE` cascades to 3+ dependent tables; verify ALL dependent tables after recovery
- The QA suite catches column drops immediately; sanity + QA provide comprehensive post-restore validation
- The compensating migration approach (Scenario 1) is preferred over full restore when the issue is isolated to schema changes
- SAVEPOINT/ROLLBACK provides near-instant recovery (< 100 ms) but requires the failure to be caught within a transaction