Open
Labels
enhancement (New feature or request)
Description
Priority Level
High
Task Summary
Transform the dataset builder from sequential column-by-column processing into an async task queue with dependency-aware scheduling. Generators become async-first, and the builder dispatches individual cell/batch tasks as soon as their upstream dependencies are satisfied — enabling pipeline parallelism across columns and rows.
Technical Details & Implementation Plan
- Dependency map: built from each column config's `required_columns` property (Jinja2 template introspection); no config schema changes needed
- Completion tracker: lightweight columns × rows matrix that determines task readiness
- Async task scheduler: replaces the sequential `_run_batch` loop; dispatches tasks as dependencies are met, bounded by semaphore
- Generator async migration: all generator types get async-capable `agenerate` methods (LLM generators already have native async from feat(engine): env-var switch for async-first models experiment #280)
- Row group checkpointing: parquet written when a row group fully completes
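The plan above could be prototyped roughly as follows. This is a minimal sketch under stated assumptions, not the builder's actual implementation: `extract_required_columns`, `build_dependency_map`, `CompletionTracker`, and `run_pipeline` are hypothetical names, column configs are reduced to a plain `name -> template` dict, a regex stands in for real Jinja2 introspection (for example `jinja2.meta.find_undeclared_variables`), and checkpointing is reduced to recording which rows have completed.

```python
import asyncio
import re
from typing import Awaitable, Callable

# Assumption: a regex is a stand-in for Jinja2 template introspection.
_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def extract_required_columns(template: str) -> set[str]:
    """Return the column names referenced as {{ name }} in a template."""
    return set(_VAR.findall(template))


def build_dependency_map(templates: dict[str, str]) -> dict[str, set[str]]:
    """Map each column to the set of columns it depends on."""
    return {col: extract_required_columns(t) for col, t in templates.items()}


class CompletionTracker:
    """Columns x rows matrix of booleans used to decide task readiness."""

    def __init__(self, columns: list[str], num_rows: int) -> None:
        self.done = {col: [False] * num_rows for col in columns}

    def mark_done(self, col: str, row: int) -> None:
        self.done[col][row] = True

    def is_ready(self, row: int, deps: set[str]) -> bool:
        return all(self.done[d][row] for d in deps)

    def row_complete(self, row: int) -> bool:
        return all(cells[row] for cells in self.done.values())


async def run_pipeline(
    templates: dict[str, str],
    num_rows: int,
    agenerate: Callable[[str, dict], Awaitable[str]],
    max_concurrency: int = 4,
) -> list[dict]:
    deps = build_dependency_map(templates)
    tracker = CompletionTracker(list(templates), num_rows)
    sem = asyncio.Semaphore(max_concurrency)  # bounds in-flight tasks
    rows: list[dict] = [{} for _ in range(num_rows)]
    scheduled: set[tuple[str, int]] = set()
    pending: set[asyncio.Task] = set()
    completed_rows: list[int] = []  # a real builder would checkpoint here

    async def run_cell(col: str, row: int) -> tuple[str, int]:
        async with sem:
            rows[row][col] = await agenerate(col, rows[row])
        return col, row

    def dispatch_ready() -> None:
        # Schedule every not-yet-scheduled cell whose upstream cells are done.
        for col in templates:
            for row in range(num_rows):
                if (col, row) not in scheduled and tracker.is_ready(row, deps[col]):
                    scheduled.add((col, row))
                    pending.add(asyncio.create_task(run_cell(col, row)))

    dispatch_ready()
    while pending:
        finished, _ = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        pending.difference_update(finished)
        for task in finished:
            col, row = task.result()
            tracker.mark_done(col, row)
            if tracker.row_complete(row):
                completed_rows.append(row)
        dispatch_ready()

    if len(completed_rows) != num_rows:
        # Nothing left to run but cells remain: cyclic or unknown dependency.
        raise ValueError("unsatisfiable column dependencies")
    return rows
```

In this sketch a cell for a downstream column can start as soon as its own row's upstream cells finish, without waiting for the whole upstream column, which is the pipeline parallelism the summary describes. Rescanning the full matrix after each completion is simple but quadratic; a real scheduler would more likely track per-cell outstanding dependency counts.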
Dependencies