Add middleware-based staging system by cswaney · Pull Request #41 · princeton-ddss/tigerflow

cswaney · 2026-02-07T13:24:53Z

Summary

Replace boolean conditions with a middleware pattern in which each step transforms the candidate list: (list[Path], state) -> list[Path]
Add PipelineState dataclass exposing waiting/staged/completed/failed counts plus input_dir/output_dir paths
Built-in filters: min_size, max_size, min_age, filename_match, companion_file
Built-in limits: max_staged, max_batch, sort_by
CallableMiddleware for custom user functions
Config uses staging.steps list instead of conditions.file/directory
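A minimal sketch of the middleware contract described above. It assumes each step is a plain callable `(candidates, context) -> candidates` and that the steps run in config order. The `StagingContext` fields mirror the counts and paths listed in the summary; the PR later renames `PipelineState` to `StagingContext`. `min_size`, `max_staged`, and `run_steps` are illustrative stand-ins, not the PR's classes. The `max_staged` semantics (a cap on the total number of staged files) are also an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class StagingContext:
    """Snapshot of pipeline state handed to every staging step."""

    waiting: int
    staged: int
    completed: int
    failed: int
    input_dir: Path
    output_dir: Path


# A staging step narrows (or reorders) the candidate list.
Middleware = Callable[[list[Path], StagingContext], list[Path]]


def min_size(n_bytes: int) -> Middleware:
    """Filter: keep candidates whose size is at least n_bytes."""

    def step(candidates: list[Path], context: StagingContext) -> list[Path]:
        return [p for p in candidates if p.stat().st_size >= n_bytes]

    return step


def max_staged(count: int) -> Middleware:
    """Limit (assumed semantics): never let the staged total exceed count."""

    def step(candidates: list[Path], context: StagingContext) -> list[Path]:
        room = max(count - context.staged, 0)
        return candidates[:room]

    return step


def run_steps(
    steps: list[Middleware], candidates: list[Path], context: StagingContext
) -> list[Path]:
    """Thread the candidate list through each step in order."""
    for step in steps:
        candidates = step(candidates, context)
    return candidates
```

Every step has the same shape, so order matters. Filtering before `max_staged` ensures the staging budget is spent only on files that pass the filters.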

Example config

staging:
  steps:
    - kind: min_size
      bytes: 1024
    - kind: filename_match
      pattern: "^recording_\\d+"
    - kind: max_staged
      count: 100
    - kind: callable
      function: myproject.staging:limit_by_capacity

tasks:
  - name: transcribe
    # ...
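One way to turn the `steps` list into middleware is a registry keyed on `kind`. The sketch below covers `min_size`, `filename_match`, and `max_staged`, whose parameters appear in the example config. The diff's `kind: Literal["callable"]` suggests the PR validates steps with typed models. The plain-dict `REGISTRY`, `build_steps`, and `apply_steps` here are hypothetical stand-ins. Matching `pattern` against the file name with `re.match` is an assumption.

```python
import re
from pathlib import Path
from typing import Any, Callable

# (candidates, context) -> candidates; context only needs a `staged` count here.
Step = Callable[[list[Path], Any], list[Path]]


def _min_size(cfg: dict[str, Any]) -> Step:
    n = int(cfg["bytes"])
    return lambda cands, ctx: [p for p in cands if p.stat().st_size >= n]


def _filename_match(cfg: dict[str, Any]) -> Step:
    rx = re.compile(cfg["pattern"])
    return lambda cands, ctx: [p for p in cands if rx.match(p.name)]


def _max_staged(cfg: dict[str, Any]) -> Step:
    count = int(cfg["count"])
    return lambda cands, ctx: cands[: max(count - ctx.staged, 0)]


REGISTRY: dict[str, Callable[[dict[str, Any]], Step]] = {
    "min_size": _min_size,
    "filename_match": _filename_match,
    "max_staged": _max_staged,
}


def build_steps(config: list[dict[str, Any]]) -> list[Step]:
    """Turn parsed `staging.steps` entries into callables, failing fast on typos."""
    steps = []
    for entry in config:
        kind = entry.get("kind")
        if kind not in REGISTRY:
            raise ValueError(f"unknown staging step kind: {kind!r}")
        steps.append(REGISTRY[kind](entry))
    return steps


def apply_steps(steps: list[Step], candidates: list[Path], ctx: Any) -> list[Path]:
    """Run each step on the output of the previous one."""
    for step in steps:
        candidates = step(candidates, ctx)
    return candidates
```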

Test plan

117 unit tests pass
E2E test with playground example (staged 2/5 files as expected)
CI will run pytest on Python 3.10-3.13

Closes #39

Lets users specify pipeline- and file-level staging conditions. Pipeline conditions define a `check` method that takes the pipeline input directory as its only argument; file conditions define a `check` method that takes a file path as its only argument. Every `check` method returns a `bool` indicating whether any files (pipeline-level) or a specific file (file-level) should be staged. There are five built-in, parameterized file-level condition types and one pipeline-level type that runs an arbitrary shell command.
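A minimal sketch of the two `check` interfaces described above (the design this PR later replaces), using `typing.Protocol`. `MinSize`, `HasFiles`, and `select_files` are illustrative names, not the PR's built-ins. Requiring every condition to pass is an assumption about how multiple conditions combine.

```python
from pathlib import Path
from typing import Protocol


class FileCondition(Protocol):
    def check(self, file: Path) -> bool: ...


class DirectoryCondition(Protocol):
    def check(self, input_dir: Path) -> bool: ...


class MinSize:
    """File-level: stage a file only if it has at least n_bytes."""

    def __init__(self, n_bytes: int) -> None:
        self.n_bytes = n_bytes

    def check(self, file: Path) -> bool:
        return file.stat().st_size >= self.n_bytes


class HasFiles:
    """Pipeline-level: stage anything only if the input directory holds a file."""

    def check(self, input_dir: Path) -> bool:
        return any(p.is_file() for p in input_dir.iterdir())


def select_files(
    input_dir: Path,
    dir_conditions: list[DirectoryCondition],
    file_conditions: list[FileCondition],
) -> list[Path]:
    # Pipeline-level gate first: a single False stages nothing this cycle.
    if not all(c.check(input_dir) for c in dir_conditions):
        return []
    files = sorted(p for p in input_dir.iterdir() if p.is_file())
    return [f for f in files if all(c.check(f) for c in file_conditions)]
```

Each condition answers yes or no for a single file or directory. That makes a cap such as max_staged awkward to express here. The middleware steps in the summary handle it by trimming the candidate list.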

- Add validate_callable_reference and import_callable utilities
- Replace ScriptCondition with CallableFileCondition and CallablePipelineCondition using module:function references
- Validate callables eagerly at config load time
- Handle callable errors gracefully (log warning, return False)
- Rename task -> file (field, method, union type) for clarity
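The two utilities named in this commit could look roughly like the sketch below. It assumes references of the form `package.module:function`, resolved with `importlib.import_module`. The regex, the error types, and the `safe_check` helper (modeling "log warning, return False") are hypothetical, not the PR's code.

```python
import importlib
import logging
import re
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Dotted module path, one colon, then a Python identifier.
_CALLABLE_REF = re.compile(r"[A-Za-z_][\w.]*:[A-Za-z_]\w*")


def validate_callable_reference(ref: str) -> str:
    """Reject strings that are not shaped like 'module:function'."""
    if not _CALLABLE_REF.fullmatch(ref):
        raise ValueError(f"expected 'module:function', got {ref!r}")
    return ref


def import_callable(ref: str) -> Callable[..., Any]:
    """Import the module and return the named attribute, checking it is callable."""
    module_name, func_name = validate_callable_reference(ref).split(":")
    module = importlib.import_module(module_name)  # ImportError if not importable
    func = getattr(module, func_name)  # AttributeError if the name is missing
    if not callable(func):
        raise TypeError(f"{ref!r} does not refer to a callable")
    return func


def safe_check(func: Callable[..., Any], *args: Any) -> bool:
    """Run a user condition; log and treat any exception as False."""
    try:
        return bool(func(*args))
    except Exception:
        logger.warning("staging condition %r raised", func, exc_info=True)
        return False
```

Calling `import_callable` during config validation is what "validate eagerly" buys. A typo in the module or function name then fails at load time, not on the first staging cycle.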

- Replace boolean conditions with middleware pattern where each step transforms the candidate list: (list[Path], state) -> list[Path]
- Add PipelineState dataclass exposing waiting/staged/completed/failed counts plus input_dir/output_dir paths
- Built-in filters: min_size, max_size, min_age, filename_match, companion_file
- Built-in limits: max_staged, max_batch, sort_by
- CallableMiddleware for custom user functions
- Config uses staging.steps list instead of conditions.file/directory
- Delete conditions.py, add staging.py with 117 tests

Closes #39

yoonspark

@cswaney Sharing the first round of feedback (will take a closer look at tests later):

src/tigerflow/pipeline.py

src/tigerflow/staging.py

yoonspark · 2026-02-20T14:17:29Z

src/tigerflow/staging.py

+
+    kind: Literal["callable"]
+    function: str
+


My understanding is that CallableMiddleware only works if the function's module is importable (i.e., an installed package or on sys.path). Users cannot reference a function defined in a standalone Python script. Is this intentional, or should we support that? For example, could we allow something like:

staging:
  steps:
    - kind: callable
      function: path/to/filters.py:my_func

Good question. Let me think about this a bit more.

This PR can proceed without addressing this — we can tackle it in a separate issue. Let me know your preference!
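One possible way to support file-path references is sketched below. It assumes a string such as `path/to/filters.py:my_func` is split at its last colon and the file is loaded with `importlib.util.spec_from_file_location`. `import_callable_from_file` is a hypothetical helper, not part of the PR; the thread above defers this to a separate issue.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable


def import_callable_from_file(ref: str) -> Callable[..., Any]:
    """Load 'path/to/file.py:func' without the file being on sys.path."""
    # Split at the last colon so Windows drive letters survive.
    path_str, sep, func_name = ref.rpartition(":")
    if not sep or not path_str.endswith(".py") or not func_name:
        raise ValueError(f"expected 'path/to/file.py:function', got {ref!r}")
    path = Path(path_str).resolve()
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"{ref!r} does not refer to a callable")
    return func
```

The module is not registered in `sys.modules`, so each call re-executes the file. A real implementation might cache by resolved path.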

src/tigerflow/models.py


yoonspark

@cswaney Looks good. Requesting minor updates for file reorganization and variable renaming.

yoonspark · 2026-02-24T15:29:49Z

src/tigerflow/pipeline.py

+    def _build_pipeline_state(self) -> StagingContext:
+        """Build current pipeline state for staging middleware."""


Rename for consistency:

Suggested change

-    def _build_pipeline_state(self) -> StagingContext:
-        """Build current pipeline state for staging middleware."""
+    def _build_staging_context(self) -> StagingContext:
+        """Build the current context for staging middleware."""

src/tigerflow/models.py

yoonspark · 2026-02-24T15:55:46Z

tests/test_staging.py

This file needs to be moved into tests/unit/ per file reorg done in #32.

yoonspark · 2026-02-24T15:57:07Z

tests/test_utils.py

This file needs to be merged into existing tests/unit/test_utils.py.

yoonspark · 2026-02-24T16:02:13Z

tests/test_staging.py

+
+@pytest.fixture
+def mock_context(tmp_path: Path) -> StagingContext:
+    """Create a mock pipeline state for testing."""


Rename for consistency:

Suggested change

-    """Create a mock pipeline state for testing."""
+    """Create a mock staging context for testing."""

yoonspark · 2026-02-24T16:04:26Z

tests/test_staging.py

+
+
+class TestCallableMiddleware:
+    def test_calls_function_with_candidates_and_state(


Rename for consistency:

Suggested change

-    def test_calls_function_with_candidates_and_state(
+    def test_calls_function_with_candidates_and_context(

yoonspark · 2026-02-24T16:04:44Z

tests/test_staging.py

+        test_module = tmp_path / "test_staging_func.py"
+        test_module.write_text(
+            """
+def keep_first(candidates, state):


Rename for consistency:

Suggested change

-def keep_first(candidates, state):
+def keep_first(candidates, context):
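The excerpt shows the test writing `test_staging_func.py` into `tmp_path`. For a `module:function` reference to resolve, that directory has to be importable. A minimal sketch of one way to do that, assuming the test temporarily prepends the directory to `sys.path`; the helper `load_from_dir` is hypothetical.

```python
import importlib
import sys
from pathlib import Path
from typing import Any, Callable


def load_from_dir(directory: Path, ref: str) -> Callable[..., Any]:
    """Resolve 'module:function' with `directory` temporarily on sys.path."""
    module_name, func_name = ref.split(":")
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()  # make sure a freshly written file is visible
        module = importlib.import_module(module_name)
    finally:
        sys.path.remove(str(directory))
    return getattr(module, func_name)
```

With pytest, `monkeypatch.syspath_prepend(tmp_path)` performs the same insertion and undoes it after the test.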

yoonspark · 2026-02-24T16:05:02Z

tests/test_staging.py

+        test_module = tmp_path / "test_error_func.py"
+        test_module.write_text(
+            """
+def raise_error(candidates, state):


Rename for consistency:

Suggested change

-def raise_error(candidates, state):
+def raise_error(candidates, context):
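The `raise_error` test module exercises the failure path. The diff does not show what the middleware returns when the user function raises. An earlier commit in this PR handled callable errors by logging a warning and returning False. A middleware analogue, assumed here, is to log and stage nothing that cycle. `SafeCallableStep` is a hypothetical stand-in, not the PR's `CallableMiddleware`.

```python
import logging
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger("staging")


class SafeCallableStep:
    """Wrap a user function (candidates, context) -> candidates."""

    def __init__(self, func: Callable[[list[Path], Any], list[Path]]) -> None:
        self.func = func

    def __call__(self, candidates: list[Path], context: Any) -> list[Path]:
        try:
            result = self.func(candidates, context)
        except Exception:
            # Assumed policy: a broken user step stages nothing rather than crashing.
            logger.warning("staging function %r raised", self.func, exc_info=True)
            return []
        return list(result)
```

Returning `[]` is the conservative choice. Passing `candidates` through unchanged would keep the pipeline moving but would silently skip the user's rule.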

yoonspark · 2026-02-24T16:31:08Z

src/tigerflow/pipeline.py

+    def _build_pipeline_state(self) -> StagingContext:
+        """Build current pipeline state for staging middleware."""
+        n_finished = sum(1 for f in self._finished_dir.iterdir() if f.is_file())
+        n_failed = sum(len(e) for e in self._task_error_filenames.values())


[Nit] Rename for consistency across codebase:

Suggested change

-        n_failed = sum(len(e) for e in self._task_error_filenames.values())
+        n_failed = sum(len(errs) for errs in self._task_error_filenames.values())
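Pulled out of the pipeline class, `_build_staging_context` might reduce to the function below. The `n_finished` and `n_failed` expressions follow the reviewed snippet. The parameter names are assumptions for illustration. So are the choices of `waiting` as the files in the input directory and `staged` as the files in a staging directory.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StagingContext:
    waiting: int
    staged: int
    completed: int
    failed: int
    input_dir: Path
    output_dir: Path


def _count_files(directory: Path) -> int:
    """Count regular files directly inside `directory` (subdirectories ignored)."""
    return sum(1 for f in directory.iterdir() if f.is_file())


def build_staging_context(
    input_dir: Path,
    staged_dir: Path,
    finished_dir: Path,
    output_dir: Path,
    task_error_filenames: dict[str, set[str]],
) -> StagingContext:
    """Snapshot the counts the staging steps see on this cycle."""
    n_finished = _count_files(finished_dir)
    # One entry per (task, failed file) pair, as in the reviewed snippet.
    n_failed = sum(len(errs) for errs in task_error_filenames.values())
    return StagingContext(
        waiting=_count_files(input_dir),
        staged=_count_files(staged_dir),
        completed=n_finished,
        failed=n_failed,
        input_dir=input_dir,
        output_dir=output_dir,
    )
```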

yoonspark · 2026-02-24T16:36:19Z

src/tigerflow/staging.py

+
+    kind: Literal["callable"]
+    function: str
+


This PR can proceed without addressing this — we can tackle it in a separate issue. Let me know your preference!

cswaney and others added 7 commits February 3, 2026 19:41

Merge branch 'main' into feature/staging-conditions

42e224f

Merge branch 'main' into feature/staging-conditions

2bccdda

Rename pipeline conditions to directory conditions

eb59fda

Ignore draft documentation

f525858

yoonspark self-requested a review February 20, 2026 12:28

yoonspark requested changes Feb 20, 2026


yoonspark added the feature (Implement new functionalities) and docs (Improvements or additions to documentation) labels Feb 20, 2026

cswaney added 3 commits February 23, 2026 16:36

Merge remote-tracking branch 'origin/main' into feature/staging-conditions

34bca1e

Address PR feedback

b0e1581

- Change PipelineState to StagingContext
- Fix staged files count error

Add documentation

9807e07

yoonspark requested changes Feb 24, 2026




cswaney · Feb 23, 2026 (edited)
