Governance: Mandatory subsection for any issue whose acceptance criteria, technical plan, or scope mentions "backfill".
Parent: #195 — Execution Governance Blueprint
Extends: #208 — Backfill Orchestration & Validation Framework
| Concern | What to Document |
|---|---|
| Expected runtime | How long at current data? At 10K? At 50K? |
| Batch size | Rows per batch and lock impact |
| Max lock duration | Longest row/table lock per batch |
| Rollback strategy | Per-step recovery actions |
| Blast radius | Tables, columns, indexes, triggers, views affected |
Copy this into any issue that involves a backfill:
## Backfill Plan
### Overview
| Field | Value |
|---|---|
| Target table(s) | `{table_name}` |
| Affected columns | `{column_list}` |
| Estimated row count | {N} rows (current), {M} projected at scale |
| Batch size | {B} rows per batch |
| Estimated runtime | {T} (current data), {T'} (at 10K products) |
| Lock type | `{lock_type}` (e.g., RowExclusiveLock via FOR UPDATE SKIP LOCKED) |
| Max lock duration per batch | {D} (e.g., <100ms per batch of 1000) |
| Blast radius | Tables: {list} · Indexes: {list} · Triggers: {list} · Views: {list} |
### Execution Steps
| Step | Operation | Reversible? | Validation |
|---|---|---|---|
| 1 | {description} | Yes/No | {how to verify} |
| 2 | {description} | Yes/No | {how to verify} |
| ... | | | |
### Rollback Strategy
| Failure Point | Database State | Recovery Action |
|---|---|---|
| After step 1, before step 2 | {state description} | {action} |
| After step 2, before step 3 | {state description} | {action} |
| Partial step N completion | {state description} | {action} |
### Runtime Estimation
- Current data: ({N} rows ÷ {batch_size}) batches × ({avg_batch_ms}ms + {pause_ms}ms pause) + {overhead_ms}ms = {total}
- At 10K products: {estimate}
- At 50K products: {estimate}
### Pre-flight Checks
- [ ] Backup verified (point-in-time or manual)
- [ ] Estimated runtime communicated
- [ ] Monitoring dashboard open during execution
- [ ] Rollback SQL prepared and tested on staging
- [ ] Batch size tested on representative data subset

For small backfills under 1,000 rows, use this abbreviated version:
## Backfill Plan (Mini)
| Field | Value |
|---|---|
| Target table(s) | `{table_name}` |
| Row count | {N} rows |
| Batch size | {B} (or single batch) |
| Estimated runtime | {T} |
| Reversible? | Yes/No — {brief description} |

All backfill runtime estimates must use this formula:
Total Runtime = (total_rows / batch_size) × (avg_batch_ms + pause_ms) + overhead_ms
| Variable | Description | Default |
|---|---|---|
| `total_rows` | `SELECT count(*) FROM target WHERE needs_backfill_condition` | — |
| `batch_size` | Rows per batch | 1,000 |
| `avg_batch_ms` | Average time per batch (measure on 3 test batches) | — |
| `pause_ms` | `pg_sleep` between batches (reduces contention) | 100ms |
| `overhead_ms` | Connection setup + validation + logging | 500ms |
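The formula fits in a few lines of Python; this is an illustrative helper (the function name is ours, not part of the tooling):

```python
import math

def estimate_runtime_ms(total_rows, batch_size, avg_batch_ms,
                        pause_ms=100, overhead_ms=500):
    """Standard backfill runtime formula: batches × (batch + pause) + overhead."""
    batches = math.ceil(total_rows / batch_size)
    return batches * (avg_batch_ms + pause_ms) + overhead_ms

# Example 1 below: 1,076 rows at 1,000/batch, 50ms per batch
print(estimate_runtime_ms(1076, 1000, 50))  # → 800
```

Note the `ceil`: a partial final batch still costs a full batch plus its pause.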
**Example 1: Scoring re-computation**

- Current: 1,076 rows / 1,000 per batch = 2 batches → 2 × (50ms + 100ms) + 500ms = 800ms → <1 second
- At 10K: 10 batches → 10 × (50ms + 100ms) + 500ms = 2,000ms → ~2 seconds
- At 50K: 50 batches → 50 × (50ms + 100ms) + 500ms = 8,000ms → ~8 seconds

**Example 2: Search vector regeneration (tsvector is CPU-intensive)**

- Current: 1,076 rows / 500 per batch = 3 batches → 3 × (200ms + 100ms) + 500ms = 1,400ms → ~1.5 seconds
- At 50K: 100 batches → 100 × (200ms + 100ms) + 500ms = 30,500ms → ~31 seconds
Before executing any backfill:
- Run 3 test batches on a representative data subset
- Measure the actual `avg_batch_ms`
- Recalculate the total estimate with the measured value
- If actual exceeds 2× the estimate, investigate before proceeding
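The three-batch measurement can be sketched generically; `run_batch` stands in for whatever executes one batch (a hypothetical callable, not a helper in this repo):

```python
import time

def measure_avg_batch_ms(run_batch, n_test_batches=3):
    """Run a few test batches and return the mean duration in milliseconds."""
    timings = []
    for _ in range(n_test_batches):
        start = time.monotonic()
        run_batch()  # execute one real batch against a representative subset
        timings.append((time.monotonic() - start) * 1000.0)
    return sum(timings) / len(timings)
```

Feed the measured value back into the runtime formula before scheduling the real run.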
For every backfill, document each item:
| Checklist Item | What to Document |
|---|---|
| Tables modified | Which tables receive UPDATE/INSERT during backfill |
| Columns touched | Which specific columns are written |
| Indexes affected | Which indexes must be updated (GIN on JSONB is expensive) |
| Triggers fired | Which BEFORE/AFTER triggers fire on each UPDATE (see trigger naming convention) |
| Views invalidated | Which materialized views need refresh after backfill |
| API impact | Will users see partial/inconsistent data during backfill? |
| Concurrent access | Can users read/write during backfill? Any degradation? |
| Scoring impact | Does this backfill change unhealthiness_score or visible scores? |
When updating rows in a table with triggers, each batch fires triggers per-row. A backfill of 50K rows with a trigger that writes to an audit table means 50K audit inserts. Document this.
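The write amplification is simple arithmetic; a small helper (our naming, purely illustrative) makes it explicit:

```python
def trigger_write_amplification(rows_backfilled, audit_rows_per_fire=1,
                                triggers_per_row=1):
    """Extra rows written by triggers as a side effect of the backfill."""
    return rows_backfilled * audit_rows_per_fire * triggers_per_row

# 50K-row backfill, one audit trigger writing one row per UPDATE:
print(trigger_write_amplification(50_000))  # → 50000
```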
| Operation Pattern | Lock Type | Duration | Safe for Production? |
|---|---|---|---|
| `UPDATE ... SET col = val WHERE id IN (batch)` | RowExclusiveLock | Per-row, <1ms | Yes |
| `FOR UPDATE SKIP LOCKED` batch processing | RowExclusiveLock | Per-batch, <100ms | Yes (recommended) |
| `ALTER TABLE ADD COLUMN` (nullable, no default expression) | AccessExclusiveLock | <1s | Yes (brief) |
| `ALTER TABLE ADD COLUMN ... DEFAULT val` (PG 11+) | AccessExclusiveLock | <1s | Yes (metadata only) |
| `CREATE INDEX CONCURRENTLY` | ShareUpdateExclusiveLock | Minutes (non-blocking) | Yes |
| `CREATE INDEX` (without CONCURRENTLY) | ShareLock | Minutes (BLOCKING) | No — never do this |
| `UPDATE ... SET col = val` (full table, no WHERE) | RowExclusiveLock | Proportional to rows | No — always batch |
| `REFRESH MATERIALIZED VIEW` | AccessExclusiveLock | Seconds | No — use CONCURRENTLY |
| `REFRESH MATERIALIZED VIEW CONCURRENTLY` | ExclusiveLock | Seconds | Yes |
- Always batch — Never UPDATE an entire table in one statement.
- Always use CONCURRENTLY — For index creation and materialized view refreshes.
- Always use FOR UPDATE SKIP LOCKED — When concurrent access is expected.
- Always pause between batches — Minimum 100ms to prevent lock starvation.
Issues that require or may require backfill plans:
| Issue | Backfill Description | Status |
|---|---|---|
| #189 (Scoring Engine) | Re-score all products with new model version | Shipped — rescore_batch() implements batched re-scoring |
| #192 (Search Architecture) | Generate `search_vector` for all products | Open — needs Backfill Plan subsection |
| #193 (Data Provenance) | Populate `field_provenance` for existing products | Open — needs Backfill Plan subsection |
| #127 (Image Optimization) | Image processing/resizing for existing product images | Open — needs Backfill Plan subsection |
When reviewing PRs, reviewers should verify:
- If the PR's linked issue or migration mentions "backfill", a Backfill Plan subsection is present in the issue
- Runtime estimation uses the standard formula (Section 2)
- Blast radius is documented (Section 3)
- Lock-unsafe patterns are absent (Section 4)
- For > 1K rows: rollback strategy is defined
- For > 10K rows: 3-batch test run results are included
Reference pattern for a batched backfill:

```sql
DO $backfill$
DECLARE
    v_batch_size integer := 1000;
    v_affected   integer;
    v_total      integer := 0;
BEGIN
    LOOP
        WITH batch AS (
            SELECT product_id
            FROM products
            WHERE needs_backfill_condition
            ORDER BY product_id
            LIMIT v_batch_size
            FOR UPDATE SKIP LOCKED
        )
        UPDATE products p
        SET target_column = computed_value
        FROM batch b
        WHERE p.product_id = b.product_id;

        GET DIAGNOSTICS v_affected = ROW_COUNT;
        v_total := v_total + v_affected;
        EXIT WHEN v_affected = 0;

        RAISE NOTICE 'Backfill progress: % rows updated', v_total;
        PERFORM pg_sleep(0.1);  -- 100ms pause between batches
    END LOOP;
    RAISE NOTICE 'Backfill complete: % total rows', v_total;
END
$backfill$;
```

```sql
-- NEVER do this in production:
UPDATE products SET target_column = computed_value;
```

```sql
-- NEVER do this in production:
CREATE INDEX idx_foo ON products (bar);
-- Always use:
CREATE INDEX CONCURRENTLY idx_foo ON products (bar);
```

Last updated: 2026-03-03 — Extended for #208 (backfill registry + monitoring view)
All backfills must be registered in the `backfill_registry` table before execution.
This provides an audit trail, progress monitoring, and resumability.
Migration: `20260303000000_backfill_registry.sql`
| Column | Type | Notes |
|---|---|---|
| `backfill_id` | `uuid` (PK) | Auto-generated |
| `name` | `text` (UNIQUE) | e.g., `provenance_field_backfill_v1` |
| `description` | `text` | Human-readable purpose |
| `source_issue` | `text` | e.g., #193 |
| `status` | `text` | `pending` / `running` / `completed` / `failed` / `rolled_back` |
| `started_at` | `timestamptz` | Set by `start_backfill()` |
| `completed_at` | `timestamptz` | Set by `complete_backfill()` or `fail_backfill()` |
| `rows_processed` | `integer` | Updated during execution |
| `rows_expected` | `integer` | From pre-count |
| `batch_size` | `integer` | Default: 1000 |
| `error_message` | `text` | On failure |
| `executed_by` | `text` | GitHub username or `automation` |
| `rollback_sql` | `text` | SQL to undo this backfill |
| `validation_passed` | `boolean` | Post-validation result |
| Function | Purpose |
|---|---|
| `register_backfill(...)` | Register (upsert) a new backfill; returns `backfill_id` |
| `start_backfill(id)` | Mark as running, set `started_at` |
| `update_backfill_progress(id, rows)` | Update `rows_processed` during execution |
| `complete_backfill(id, rows, passed)` | Mark as completed, set `completed_at` |
| `fail_backfill(id, error_msg)` | Mark as failed, record error |
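The state transitions these functions enforce can be mirrored in a small in-memory model. This is illustrative only: field names follow the registry table, but the class and methods are our sketch, not the real implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class BackfillRecord:
    """In-memory mirror of one backfill_registry row (illustrative)."""
    name: str
    rows_expected: int
    batch_size: int = 1000
    status: str = "pending"
    rows_processed: int = 0
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    error_message: Optional[str] = None
    validation_passed: Optional[bool] = None

    def start(self):                  # mirrors start_backfill(id)
        self.status = "running"
        self.started_at = datetime.now(timezone.utc)

    def update_progress(self, rows):  # mirrors update_backfill_progress(id, rows)
        self.rows_processed += rows

    def complete(self, passed):       # mirrors complete_backfill(id, rows, passed)
        self.status = "completed"
        self.validation_passed = passed
        self.completed_at = datetime.now(timezone.utc)

    def fail(self, msg):              # mirrors fail_backfill(id, error_msg)
        self.status = "failed"
        self.error_message = msg
        self.completed_at = datetime.now(timezone.utc)
```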
`v_backfill_status` provides a real-time dashboard:

```sql
SELECT name, status, rows_processed, rows_expected,
       pct_complete, elapsed_seconds, validation_passed
FROM v_backfill_status
WHERE status = 'running';
```

Access:

- `service_role`: full access (read + write)
- `authenticated`: read-only
- `anon`: no access
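The `pct_complete` column can be reproduced client-side. How the view derives it exactly is an assumption; a reasonable reading is:

```python
def pct_complete(rows_processed, rows_expected):
    """Percent complete as v_backfill_status likely derives it (assumed)."""
    if not rows_expected:
        return None  # avoid division by zero before the pre-count lands
    return round(100.0 * rows_processed / rows_expected, 1)

print(pct_complete(538, 1076))  # → 50.0
```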
Copy `scripts/backfill_template.py` for each backfill. It handles registration, pre-validation, batched execution, progress updates, post-validation, and status finalization.

```shell
python scripts/backfill_{name}.py --dry-run         # estimate only
python scripts/backfill_{name}.py --batch-size 500  # execute
```