
Add middleware-based staging system #41

Open
cswaney wants to merge 10 commits into main from feature/staging-conditions


@cswaney (Member) commented Feb 7, 2026

Summary

  • Replace boolean conditions with middleware pattern where each step transforms the candidate list: (list[Path], state) -> list[Path]
  • Add PipelineState dataclass exposing waiting/staged/completed/failed counts plus input_dir/output_dir paths
  • Built-in filters: min_size, max_size, min_age, filename_match, companion_file
  • Built-in limits: max_staged, max_batch, sort_by
  • CallableMiddleware for custom user functions
  • Config uses staging.steps list instead of conditions.file/directory
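A minimal sketch of the step contract summarized above. The `PipelineState` field names follow the bullet list, but the exact types and the `run_steps` helper are assumptions for illustration, not code from this PR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class PipelineState:
    # Hypothetical fields mirroring the summary: queue counts plus directories.
    waiting: int
    staged: int
    completed: int
    failed: int
    input_dir: Path
    output_dir: Path


# A step takes the current candidates and the pipeline state and returns a
# (possibly filtered, truncated, or reordered) candidate list.
Step = Callable[[list[Path], PipelineState], list[Path]]


def run_steps(candidates: list[Path], state: PipelineState, steps: list[Step]) -> list[Path]:
    """Feed each step's output into the next step (hypothetical helper)."""
    for step in steps:
        candidates = step(candidates, state)
    return candidates


# Usage: a filter followed by a limit.
state = PipelineState(waiting=4, staged=0, completed=0, failed=0,
                      input_dir=Path("in"), output_dir=Path("out"))
keep_txt: Step = lambda cands, st: [p for p in cands if p.suffix == ".txt"]
first_two: Step = lambda cands, st: cands[:2]
result = run_steps([Path("a.txt"), Path("b.wav"), Path("c.txt"), Path("d.txt")],
                   state, [keep_txt, first_two])
# result == [Path("a.txt"), Path("c.txt")]
```

Because steps run in order, a filter placed before a limit shrinks the pool the limit draws from; swapping `keep_txt` and `first_two` above would keep only `a.txt`.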

Example config

```yaml
staging:
  steps:
    - kind: min_size
      bytes: 1024
    - kind: filename_match
      pattern: "^recording_\\d+"
    - kind: max_staged
      count: 100
    - kind: callable
      function: myproject.staging:limit_by_capacity

tasks:
  - name: transcribe
    # ...
```
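A sketch of how the first three steps of this config might map onto middleware factories. The factory names match the `kind` values, but their bodies, the one-field `State`, and the semantics of `max_staged` (admit only enough files to keep the staged total at or below `count`) are assumptions; the `callable` step is omitted. The YAML string `"^recording_\\d+"` is the regex `^recording_\d+`, written below as a raw string.

```python
import re
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable


@dataclass(frozen=True)
class State:
    staged: int  # hypothetical: files already staged; the only field used here


Step = Callable[[list[Path], State], list[Path]]


def min_size(min_bytes: int) -> Step:
    # Keep files whose size is at least `min_bytes`.
    return lambda cands, state: [p for p in cands if p.stat().st_size >= min_bytes]


def filename_match(pattern: str) -> Step:
    regex = re.compile(pattern)
    # re.search respects anchors in the pattern itself, e.g. a leading "^".
    return lambda cands, state: [p for p in cands if regex.search(p.name)]


def max_staged(count: int) -> Step:
    # Assumed semantics: admit only enough files to keep the staged total <= count.
    return lambda cands, state: cands[: max(count - state.staged, 0)]


FACTORIES: dict[str, Callable[[dict[str, Any]], Step]] = {
    "min_size": lambda cfg: min_size(cfg["bytes"]),
    "filename_match": lambda cfg: filename_match(cfg["pattern"]),
    "max_staged": lambda cfg: max_staged(cfg["count"]),
}


def build_steps(config: list[dict[str, Any]]) -> list[Step]:
    """Map each `kind` entry to a configured step (callable kind omitted)."""
    return [FACTORIES[cfg["kind"]](cfg) for cfg in config]


# Usage: the first three steps of the example config, as parsed YAML.
config = [
    {"kind": "min_size", "bytes": 1024},
    {"kind": "filename_match", "pattern": r"^recording_\d+"},
    {"kind": "max_staged", "count": 100},
]
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sizes = {"notes.txt": 5000, "recording_1.wav": 2000,
             "recording_2.wav": 10, "recording_3.wav": 3000}
    for name, size in sizes.items():
        (root / name).write_bytes(b"\0" * size)
    candidates = sorted(root.iterdir())
    for step in build_steps(config):
        candidates = step(candidates, State(staged=99))
    selected = [p.name for p in candidates]
# selected == ["recording_1.wav"]
```

Here `min_size` drops `recording_2.wav`, `filename_match` drops `notes.txt`, and with 99 files already staged `max_staged` admits one of the two remaining recordings.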

Test plan

  • 117 unit tests pass
  • E2E test with playground example (staged 2/5 files as expected)
  • CI will run pytest on Python 3.10-3.13

Closes #39

cswaney and others added 7 commits February 3, 2026 19:41
Provides the ability for users to specify pipeline- and file-level staging conditions.

Pipeline conditions define a `check` method taking the pipeline input directory as its only argument. File conditions define a `check` method taking a file path as its only argument. All `check` methods return a `bool` indicating whether any files (pipeline-level) or a specific file (file-level) should be staged. There are five built-in, parameterized file-level condition types and one pipeline-level type that runs an arbitrary shell command.
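Under this boolean design, each file-level condition is a per-file predicate. The hypothetical adapter below (`MinSizeCondition` and `as_step` are illustrative names, not from this PR) shows that any such predicate fits the later `(list[Path], state) -> list[Path]` signature.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Protocol


class FileCondition(Protocol):
    def check(self, path: Path) -> bool: ...


class MinSizeCondition:
    """Boolean file-level condition in the style described above (illustrative)."""

    def __init__(self, min_bytes: int) -> None:
        self.min_bytes = min_bytes

    def check(self, path: Path) -> bool:
        return path.stat().st_size >= self.min_bytes


def as_step(cond: FileCondition) -> Callable[[list[Path], Any], list[Path]]:
    """Wrap a per-file predicate as a (candidates, state) -> candidates step."""
    def step(candidates: list[Path], state: Any) -> list[Path]:
        return [p for p in candidates if cond.check(p)]
    return step


# Usage
with tempfile.TemporaryDirectory() as tmp:
    small, large = Path(tmp) / "small.bin", Path(tmp) / "large.bin"
    small.write_bytes(b"\0" * 10)
    large.write_bytes(b"\0" * 2000)
    kept = [p.name for p in as_step(MinSizeCondition(1024))([small, large], None)]
# kept == ["large.bin"]
```

The reverse does not hold: a predicate sees one path at a time, so it cannot express limits such as `max_staged` or orderings such as `sort_by`, which is plausibly part of the motivation for the middleware redesign.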
- Add validate_callable_reference and import_callable utilities
- Replace ScriptCondition with CallableFileCondition and
  CallablePipelineCondition using module:function references
- Validate callables eagerly at config load time
- Handle callable errors gracefully (log warning, return False)
- Rename task -> file (field, method, union type) for clarity
- Replace boolean conditions with middleware pattern where each step
  transforms the candidate list: (list[Path], state) -> list[Path]
- Add PipelineState dataclass exposing waiting/staged/completed/failed
  counts plus input_dir/output_dir paths
- Built-in filters: min_size, max_size, min_age, filename_match,
  companion_file
- Built-in limits: max_staged, max_batch, sort_by
- CallableMiddleware for custom user functions
- Config uses staging.steps list instead of conditions.file/directory
- Delete conditions.py, add staging.py with 117 tests

Closes #39
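A sketch of how two of the commit items above might fit together: eager resolution of a `module:function` reference when the config is loaded, and graceful handling of errors raised by the user function. The body of `import_callable` is an assumption about its behavior, and returning an empty list on error (stage nothing this pass) is a hypothetical analogue of the old "log warning, return False" policy.

```python
import importlib
import logging
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger(__name__)


def import_callable(ref: str) -> Callable[..., Any]:
    """Resolve a 'package.module:function' reference (assumed format)."""
    module_name, sep, attr = ref.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {ref!r}")
    # import_module raises ImportError and getattr raises AttributeError,
    # so a bad reference fails when this runs (i.e. at config load time).
    func = getattr(importlib.import_module(module_name), attr)
    if not callable(func):
        raise TypeError(f"{ref!r} does not name a callable")
    return func


class CallableMiddleware:
    """Staging step that delegates to a user function (illustrative)."""

    def __init__(self, function: str) -> None:
        self.function = import_callable(function)  # eager validation

    def __call__(self, candidates: list[Path], context: Any) -> list[Path]:
        try:
            # list() also rejects a non-iterable return value such as None.
            return list(self.function(candidates, context))
        except Exception:
            # Assumed policy: log and stage nothing for this pass.
            logger.warning("staging function %r failed", self.function, exc_info=True)
            return []
```

With the example config, `function: myproject.staging:limit_by_capacity` would be resolved when the config is loaded, so a typo in the reference fails fast instead of on the first staging pass.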
@yoonspark self-requested a review February 20, 2026 12:28
Collaborator

@yoonspark left a comment


@cswaney Sharing the first round of feedback (will take a closer look at tests later):


kind: Literal["callable"]
function: str

Collaborator


My understanding is that CallableMiddleware only works if the function's module is importable (i.e., an installed package or on sys.path). Users cannot reference a function defined in a standalone Python script. Is this intentional, or should we support that? For example, could we allow something like:

```yaml
staging:
  steps:
    - kind: callable
      function: path/to/filters.py:my_func
```
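One possible way to support the suggested `path/to/filters.py:my_func` form alongside importable modules, using the standard-library `importlib.util.spec_from_file_location` / `module_from_spec` / `exec_module` sequence. The helper name `load_callable` and the rule that a target ending in `.py` is treated as a file path are assumptions, not part of this PR.

```python
import importlib
import importlib.util
from pathlib import Path
from typing import Any, Callable


def load_callable(ref: str) -> Callable[..., Any]:
    """Resolve 'pkg.module:func' or 'path/to/file.py:func' (hypothetical helper)."""
    # Split on the LAST colon so Windows paths like C:\steps\filters.py:f survive.
    target, sep, attr = ref.rpartition(":")
    if not sep or not target or not attr:
        raise ValueError(f"expected '<module or .py path>:<function>', got {ref!r}")
    if target.endswith(".py"):
        path = Path(target).resolve()
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load {path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the script's top-level code
    else:
        module = importlib.import_module(target)
    return getattr(module, attr)
```

`Path.resolve()` interprets relative paths against the current working directory; resolving them against the config file's directory instead may be less surprising for users, but that is a design choice left open here.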

Member Author

@cswaney Feb 23, 2026


Good question. Let me think about this a bit more.

Collaborator


This PR can proceed without addressing this — we can tackle it in a separate issue. Let me know your preference!

@yoonspark added the labels feature (Implement new functionalities) and docs (Improvements or additions to documentation) Feb 20, 2026
- Change PipelineState to StagingContext
- Fix staged files count error
Collaborator

@yoonspark left a comment


@cswaney Looks good. Requesting minor updates for file reorganization and variable renaming.

Comment on lines +215 to +216
def _build_pipeline_state(self) -> StagingContext:
    """Build current pipeline state for staging middleware."""
Collaborator


Rename for consistency:

Suggested change
def _build_pipeline_state(self) -> StagingContext:
    """Build current pipeline state for staging middleware."""
def _build_staging_context(self) -> StagingContext:
    """Build the current context for staging middleware."""

Collaborator


This file needs to be moved into tests/unit/ per file reorg done in #32.

Collaborator


This file needs to be merged into existing tests/unit/test_utils.py.


@pytest.fixture
def mock_context(tmp_path: Path) -> StagingContext:
    """Create a mock pipeline state for testing."""
Collaborator


Rename for consistency:

Suggested change
"""Create a mock pipeline state for testing."""
"""Create a mock staging context for testing."""



class TestCallableMiddleware:
    def test_calls_function_with_candidates_and_state(
Collaborator


Rename for consistency:

Suggested change
def test_calls_function_with_candidates_and_state(
def test_calls_function_with_candidates_and_context(

test_module = tmp_path / "test_staging_func.py"
test_module.write_text(
"""
def keep_first(candidates, state):
Collaborator


Rename for consistency:

Suggested change
def keep_first(candidates, state):
def keep_first(candidates, context):

test_module = tmp_path / "test_error_func.py"
test_module.write_text(
"""
def raise_error(candidates, state):
Collaborator


Rename for consistency:

Suggested change
def raise_error(candidates, state):
def raise_error(candidates, context):

def _build_pipeline_state(self) -> StagingContext:
    """Build current pipeline state for staging middleware."""
    n_finished = sum(1 for f in self._finished_dir.iterdir() if f.is_file())
    n_failed = sum(len(e) for e in self._task_error_filenames.values())
Collaborator


[Nit] Rename for consistency across codebase:

Suggested change
n_failed = sum(len(e) for e in self._task_error_filenames.values())
n_failed = sum(len(errs) for errs in self._task_error_filenames.values())
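A standalone sketch of the context builder under review, using the renamed `errs` variable. The `StagingContext` fields are assumed to match the `PipelineState` fields listed in the summary, and the directory arguments (`staged_dir`, `finished_dir`) are hypothetical stand-ins for the pipeline's own attributes.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StagingContext:
    # Assumed fields, mirroring the summary's PipelineState description.
    waiting: int
    staged: int
    completed: int
    failed: int
    input_dir: Path
    output_dir: Path


def count_files(directory: Path) -> int:
    """Count regular files directly inside `directory` (not recursive)."""
    return sum(1 for f in directory.iterdir() if f.is_file())


def build_staging_context(
    input_dir: Path,
    staged_dir: Path,
    finished_dir: Path,
    output_dir: Path,
    task_error_filenames: dict[str, list[str]],
) -> StagingContext:
    # One count per (task, filename) entry, as in the reviewed code.
    n_failed = sum(len(errs) for errs in task_error_filenames.values())
    return StagingContext(
        waiting=count_files(input_dir),
        staged=count_files(staged_dir),
        completed=count_files(finished_dir),
        failed=n_failed,
        input_dir=input_dir,
        output_dir=output_dir,
    )


# Usage
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    dirs = {name: root / name for name in ("input", "staged", "finished", "output")}
    for d in dirs.values():
        d.mkdir()
    (dirs["input"] / "a.wav").touch()
    (dirs["input"] / "b.wav").touch()
    (dirs["input"] / "sub").mkdir()  # subdirectories are not counted
    (dirs["finished"] / "c.wav").touch()
    ctx = build_staging_context(dirs["input"], dirs["staged"], dirs["finished"],
                                dirs["output"], {"transcribe": ["d.wav"], "tag": ["d.wav"]})
# ctx.waiting == 2, ctx.staged == 0, ctx.completed == 1, ctx.failed == 2
```

Because the sum counts one entry per task, a file that fails in two tasks (as `d.wav` does above) contributes 2. If `failed` is meant to count distinct files, `len(set().union(*task_error_filenames.values()))` would be the alternative.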




Development

Successfully merging this pull request may close these issues.

Add configurable staging conditions

2 participants