[Feature]: Trigger ETL pipeline automatically after staging deploy #218

@Vraj1234


Pre-flight checklist

  • I have searched the existing issues

Problem to solve

The Master ETL Flow must be triggered manually after every staging deployment. This is error-prone: a developer can merge to main and the deploy can succeed, yet the ETL never runs because nobody remembered to trigger it. Data served by the webservice then goes stale without any visible indication.

Proposed solution or API

Add a fire-and-forget ETL trigger as the final job in deploy-staging.yml. After infrastructure deploys, migrations run, services update, and health checks pass, the workflow enqueues a master-etl-deployment run on the Prefect server and exits.

Implementation:

  • New scripts/cloud-trigger-etl.sh — discovers Prefect server URL via gcloud run services describe, calls prefect deployment run
  • New cloud-trigger-etl pixi task in the cloud feature
  • New trigger-etl job in deploy-staging.yml after validate-deployment
  • concurrency_limit: 1 on the Prefect deployment to prevent overlapping runs
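As a rough illustration of the steps the trigger script would perform, here is a minimal sketch. The issue proposes a shell script; this Python version only mirrors the same logic. The service name, region, and flow name are placeholders that do not come from this issue, and the `/api` suffix assumes a standard self-hosted Prefect server layout.

```python
import os
import subprocess
from typing import Callable, Sequence

# Hypothetical values: none of these appear in the issue.
SERVICE = "prefect-server"
REGION = "us-central1"
DEPLOYMENT = "master-etl/master-etl-deployment"  # "<flow name>/<deployment name>"

Runner = Callable[[Sequence[str], dict], str]


def run_cli(cmd: Sequence[str], env: dict) -> str:
    """Run a command, raise CalledProcessError on non-zero exit, return stdout."""
    result = subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def trigger_etl(run: Runner = run_cli) -> str:
    """Discover the Prefect server URL, then enqueue one flow run without waiting."""
    env = dict(os.environ)
    url = run(
        ["gcloud", "run", "services", "describe", SERVICE,
         "--region", REGION, "--format", "value(status.url)"],
        env,
    )
    if not url:
        raise RuntimeError(f"no URL found for Cloud Run service {SERVICE!r}")
    # The Prefect CLI reads the API endpoint from PREFECT_API_URL.
    env["PREFECT_API_URL"] = f"{url.rstrip('/')}/api"
    # By default this creates the flow run and returns; it does not wait
    # for the ETL itself to finish.
    return run(["prefect", "deployment", "run", DEPLOYMENT], env)
```

Calling `trigger_etl()` from the final CI job would raise (and so fail the job) if either command exits non-zero, which matches the "trigger failure = CI failure" decision. The `run` parameter exists only so the logic can be exercised without real `gcloud` or `prefect` binaries.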

Design decisions:

  • Fire-and-forget — ETL can take 30+ mins; blocking CI is fragile. Monitor ETL health via Prefect UI.
  • Trigger failure = CI failure — if we can't enqueue, the workflow fails (infrastructure problem).
  • ETL execution failure ≠ CI failure — once enqueued, ETL is a Prefect concern.
  • Concurrency limit 1 — prevents two rapid pushes from running ETL in parallel.

Failure modes to be aware of:

  • Partial failure: the master flow catches sub-flow exceptions and continues, so a run with failed sub-flows still appears successful in Prefect.
  • Silent failure: some flows (e.g., LandIQ) return early without raising an error if extraction fails.
  • OOM: the worker has 2 GiB of memory; LandIQ shapefile processing can be memory-intensive.
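To make the partial-failure mode concrete, the sketch below shows one possible way a master flow could keep running the remaining sub-flows while still ending in a failed state. It is plain Python, not the project's actual flow code; `run_all` and the sub-flow callables are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List


def run_all(subflows: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every sub-flow, record failures, and raise at the end if any failed."""
    completed: List[str] = []
    errors: Dict[str, str] = {}
    for name, subflow in subflows.items():
        try:
            subflow()
            completed.append(name)
        except Exception as exc:
            # Keep going so one bad source does not block the others,
            # but remember the failure instead of swallowing it.
            errors[name] = repr(exc)
    if errors:
        # Re-raising here is what stops the overall run from looking successful.
        raise RuntimeError(f"{len(errors)} sub-flow(s) failed: {errors}")
    return completed
```

If the loop simply returned after catching exceptions, the caller would have no way to tell a clean run from a partial one, which is the behaviour described in the first bullet above. The early-return case in the second bullet would need a separate check, since no exception is raised there.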

Alternatives considered

  • Synchronous wait — CI blocks until ETL finishes. Rejected: 30+ min ETL runtime makes this fragile; timeouts are indistinguishable from real failures.
  • Scheduled ETL (cron) — run ETL on a fixed schedule regardless of deploys. Rejected: wastes compute when no code changed, and data could be stale between schedule intervals.
  • Cloud Run Job for ETL — package the ETL as a Cloud Run Job instead of a Prefect flow. Rejected: loses Prefect's orchestration, logging, retry, and UI capabilities.

Additional context

  • Prefect server is currently unauthenticated (allUsers invoker) — securing it is a separate follow-up.
  • Post-merge action: run prefect deploy --name master-etl-deployment once to apply concurrency_limit: 1.
  • All ETL loaders use upsert/ON CONFLICT semantics — re-runs are safe and idempotent.
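The idempotency claim can be illustrated with a small, self-contained example. This sketch uses SQLite from the standard library purely for convenience; the project's real database, table, and column names are not given in this issue, so `parcels`, `id`, and `crop` are made up.

```python
import sqlite3


def load(conn: sqlite3.Connection, rows) -> None:
    """Insert rows, updating the existing row when the primary key already exists."""
    conn.executemany(
        "INSERT INTO parcels (id, crop) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET crop = excluded.crop",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, crop TEXT)")

rows = [(1, "almonds"), (2, "rice")]
load(conn, rows)
load(conn, rows)  # simulate the ETL being triggered twice for the same data

row_count = conn.execute("SELECT COUNT(*) FROM parcels").fetchone()[0]
print(row_count)  # 2
```

Because the second load updates rows in place rather than inserting duplicates, a deploy that triggers the ETL again on unchanged data leaves the table in the same state.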

Labels

enhancement: New feature or request