RFC: Airflow integration via JobbersExecutor #44

Open
yoshrote wants to merge 2 commits into main from airflow-integration-proposal

@yoshrote
Owner

Context

Apache Airflow uses an executor abstraction to decouple DAG/task orchestration from actual task execution. The CeleryExecutor sends tasks to Celery queues; workers execute them and report status back. The goal is to replace this Celery layer with Jobbers while keeping Airflow responsible for DAG definition, dependency resolution, scheduling, and the UI.

Integration Model: Jobbers as an Airflow Executor

| Airflow responsibility (unchanged) | Jobbers responsibility (new) |
| --- | --- |
| DAG definitions (Python files) | Distributed task execution |
| Task dependency resolution | Stall detection via heartbeats |
| Scheduler (decides what runs next) | Retry logic with backoff |
| Metadata DB, XCom, UI | Cancellation, DLQ |
| Connections/Variables/config | Concurrency control |

What Needs to Be Built

1. JobbersExecutor Airflow plugin (new airflow-jobbers package)

Implements BaseExecutor with:

  • execute_async(): Submits a task to the Jobbers Manager API (POST /submit-task)
  • sync(): Polls Jobbers for status changes and reports back to Airflow's scheduler
  • State mapping: Jobbers statuses → Airflow TaskInstanceState
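The three responsibilities above can be sketched without importing Airflow. This is a minimal illustration, not the proposed implementation: `JobbersExecutorSketch`, `submit`, `get_status`, and `report` are hypothetical names, the payload shape sent to `POST /submit-task` is an assumption, and the status names are taken from the state-mapping row in the Key Gaps table. A real executor would subclass Airflow's `BaseExecutor` and report through its own state-change methods instead of the `report` callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Optional, Sequence

# Jobbers status names (as listed in this RFC) mapped to the string values
# of Airflow's TaskInstanceState. Statuses missing here are treated as
# non-terminal by the sketch below.
JOBBERS_TO_AIRFLOW: Dict[str, str] = {
    "COMPLETED": "success",
    "FAILED": "failed",
    "STALLED": "failed",
    "CANCELLED": "removed",
}


@dataclass
class JobbersExecutorSketch:
    """Mirrors the shape of execute_async()/sync() without depending on Airflow."""

    # Hypothetical callable wrapping POST /submit-task; returns a Jobbers task id.
    submit: Callable[[dict], str]
    # Hypothetical callable returning the current Jobbers status for a task id.
    get_status: Callable[[str], str]
    # Stand-in for reporting a terminal state back to the Airflow scheduler.
    report: Callable[[Hashable, str], None]
    # Airflow task key -> Jobbers task id, for tasks not yet terminal.
    in_flight: Dict[Hashable, str] = field(default_factory=dict)

    def execute_async(
        self,
        key: Hashable,
        command: Sequence[str],
        queue: Optional[str] = None,
        executor_config: Optional[dict] = None,
    ) -> None:
        # Assumed payload shape; the real Manager API may differ.
        payload = {"command": list(command), "queue": queue}
        self.in_flight[key] = self.submit(payload)

    def sync(self) -> None:
        # Copy the items so entries can be removed while iterating.
        for key, jobbers_id in list(self.in_flight.items()):
            airflow_state = JOBBERS_TO_AIRFLOW.get(self.get_status(jobbers_id))
            if airflow_state is None:
                continue  # still queued or running; check again on the next sync
            self.report(key, airflow_state)
            del self.in_flight[key]
```

Injecting `submit` and `get_status` as callables keeps the bookkeeping testable against an in-memory fake before any HTTP client is wired in.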

2. Built-in Airflow runner task in Jobbers

A @register_task function that receives an Airflow task instance descriptor (dag_id, task_id, run_id) and executes it via airflow tasks run, the same mechanism CeleryExecutor workers use in Airflow 2.x. This is the bridge between the two systems.
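A rough sketch of what the body of that runner might look like follows. The function names `build_airflow_command` and `run_airflow_task` are hypothetical, and registration with `@register_task` is left out because its signature is not shown in this RFC. The sketch assumes the worker has the `airflow` CLI on its `PATH` and that the task body is an `async` function, so the blocking subprocess call is moved off the event loop with `asyncio.to_thread()`.

```python
import asyncio
import subprocess
from typing import Callable, List


def build_airflow_command(dag_id: str, task_id: str, run_id: str) -> List[str]:
    """Build the CLI invocation for one task instance."""
    # Passed as a list (no shell), so the ids are never interpreted by a shell.
    return ["airflow", "tasks", "run", dag_id, task_id, run_id]


async def run_airflow_task(
    dag_id: str,
    task_id: str,
    run_id: str,
    command_builder: Callable[[str, str, str], List[str]] = build_airflow_command,
) -> int:
    """Run the task in a subprocess without blocking the event loop.

    Returns the process exit code; how a non-zero code is turned into a
    Jobbers failure is left to the caller in this sketch.
    """
    command = command_builder(dag_id, task_id, run_id)
    completed = await asyncio.to_thread(subprocess.run, command, check=False)
    return completed.returncode
```

The `command_builder` parameter exists only so the subprocess path can be exercised on a machine without Airflow installed.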

3. Optional: bulk status endpoint on the Manager

POST /task-status/bulk so the executor can poll many task states in one call rather than N individual requests.
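To make the request-count argument concrete, here is a small sketch of the lookup such an endpoint could perform, plus the client-side batching the executor might use. Everything here is an assumption for illustration: the function names, the `{task_id: status}` response shape, and the `"UNKNOWN"` placeholder for ids the Manager does not recognize are not part of the current Jobbers API.

```python
from typing import Dict, Iterable, List, Mapping


def bulk_task_status(store: Mapping[str, str], task_ids: Iterable[str]) -> Dict[str, str]:
    """Resolve many task ids in one pass over an id -> status mapping.

    `store` stands in for whatever state backend the Manager uses.
    """
    return {task_id: store.get(task_id, "UNKNOWN") for task_id in task_ids}


def chunked(task_ids: List[str], size: int) -> List[List[str]]:
    """Split ids into batches so one request body stays bounded."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [task_ids[i:i + size] for i in range(0, len(task_ids), size)]
```

With N in-flight tasks and a batch size of B, each poll cycle issues ceil(N / B) requests instead of N.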

Key Gaps

| Gap | Severity | Notes |
| --- | --- | --- |
| No bulk status query | Low | Easy to add; executor polling scales linearly without it |
| Synchronous operator support | Low | Airflow operators are sync; asyncio.to_thread() handles this |
| No webhook/callback | Low | Polling works; CeleryExecutor also polls |
| Worker environment | Medium | Workers need Airflow installed + DAG file access (same as CeleryExecutor) |
| State mapping | Low | COMPLETED→SUCCESS, FAILED/STALLED→FAILED, CANCELLED→REMOVED |

Why This Is Viable

  • Airflow's executor interface is stable and designed for this kind of replacement
  • Jobbers already has submission, status tracking, cancellation, and retry infrastructure
  • CeleryExecutor is ~600 lines; a JobbersExecutor would be comparable in scope
  • Jobbers adds value over Celery: heartbeat-based stall detection, structured retry policies, explicit cancellation, and DLQ

Scope

Two deliverables:

  1. Jobbers changes: built-in Airflow runner task + optional bulk status endpoint
  2. New airflow-jobbers package: JobbersExecutor class + state mapping + config

No changes to Airflow core are needed.

🤖 Generated with Claude Code

yoshrote and others added 2 commits March 9, 2026 21:03
Documents the approach for using Jobbers as an Airflow executor,
replacing CeleryExecutor with a JobbersExecutor plugin while keeping
Airflow responsible for DAG orchestration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>