Feat/20250715/threaded transformer #22
Merged
This pull request introduces significant enhancements to the pipeline framework, focusing on context-aware transformations, improved threading capabilities, and updates to test coverage. The most important changes include introducing explicit context handling in transformers, adding a parallel transformer for multi-threaded execution, and refactoring the pipeline and transformer classes for better type safety and functionality.
Enhancements to Context Handling:
efemel/pipeline/helpers.py: Added utility functions is_context_aware and is_context_aware_reduce to determine whether functions are context-aware, along with the PipelineContext class to manage shared context across pipeline operations.
efemel/pipeline/pipeline.py: Introduced a context method to set shared context for pipelines and updated the apply method to handle context-aware transformers.
Parallel Execution:
efemel/pipeline/transformers/parallel.py: Added the ParallelTransformer class to enable concurrent execution of transformations using multiple threads, with support for ordered and unordered results.
Refactoring for Type Safety and Explicit Context:
efemel/pipeline/transformers/transformer.py: Refactored the Transformer class to explicitly pass context through all transformation operations, replacing implicit context handling with explicit context-aware function signatures.
Performance and Test Updates:
performance_test.py: Updated performance tests to use the new Transformer class and added type hints for better clarity.
test.py: Added a new test script to validate pipeline transformations with large datasets.
tests/test_integration.py: Updated integration tests to validate context sharing and added a test for the new parallel transformer.
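For reviewers who want a feel for the two core ideas (detecting context-aware functions and running a transformation across threads with ordered or unordered results), here is a minimal sketch. It is not the code from this PR: accepts_context and parallel_map are hypothetical stand-ins for is_context_aware and ParallelTransformer, and the rule "a function is context-aware if it takes two or more positional parameters" is an assumption made only for illustration.

```python
import inspect
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable, Iterator


def accepts_context(func: Callable[..., Any]) -> bool:
    """Hypothetical check: assume a function is context-aware
    if it declares at least two positional parameters (item, context)."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    params = inspect.signature(func).parameters.values()
    return sum(1 for p in params if p.kind in positional_kinds) >= 2


def parallel_map(
    func: Callable[..., Any],
    items: Iterable[Any],
    context: dict,
    max_workers: int = 4,
    ordered: bool = True,
) -> Iterator[Any]:
    """Apply func to every item on a thread pool, passing the shared
    context explicitly when func accepts it."""
    if accepts_context(func):
        def call(item: Any) -> Any:
            return func(item, context)
    else:
        call = func

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        if ordered:
            # Executor.map yields results in input order.
            yield from pool.map(call, items)
        else:
            # as_completed yields each result as soon as it is ready.
            futures = [pool.submit(call, item) for item in items]
            for future in as_completed(futures):
                yield future.result()


def scale(x: int, ctx: dict) -> int:
    return x * ctx["factor"]


def double(x: int) -> int:
    return x * 2


shared = {"factor": 10}
ordered_result = list(parallel_map(scale, [1, 2, 3], shared))
# Unordered results can arrive in any order, so sort before comparing.
unordered_result = sorted(parallel_map(double, [1, 2, 3], shared, ordered=False))
```

In this sketch the ordered path trades a little latency for deterministic output order, while the unordered path emits results as workers finish, which mirrors the ordered/unordered option described above.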