1 | | -# TheCodeCrate's Pipeline |
2 | | - |
3 | | -This package provides a pipeline pattern implementation. |
4 | | - |
5 | | -The implementation is inspired by the excellent [PHP League Pipeline](https://github.com/thephpleague/pipeline) package. |
6 | | - |
7 | | -## Installation |
8 | | - |
9 | | -```bash |
10 | | -pip install thecodecrate-pipeline |
11 | | -``` |
12 | | - |
13 | | -## Pipeline Pattern |
14 | | - |
15 | | -The pipeline pattern lets you compose sequential operations by chaining stages. |
16 | | - |
17 | | -In this particular implementation, the interface consists of two parts: |
18 | | - |
19 | | -- `StageInterface` |
20 | | -- `PipelineInterface` |
21 | | - |
22 | | -A pipeline consists of zero, one, or multiple stages. A pipeline can process a payload. During the processing, the payload will be passed to the first stage. From that moment on, the resulting value is passed on from stage to stage. |
23 | | - |
24 | | -In the simplest form, the execution chain can be represented as a for loop: |
25 | | - |
26 | | -```python |
27 | | -result = payload |
28 | | - |
29 | | -for stage in stages: |
30 | | - result = stage(result) |
31 | | - |
32 | | -return result |
33 | | -``` |
34 | | - |
35 | | -Effectively, this is the same as: |
36 | | - |
37 | | -```python |
38 | | -result = stage3(stage2(stage1(payload))) |
39 | | -``` |
40 | | - |
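The equivalence between the loop and the nested call can be checked with a small standalone sketch. It does not use this package; `stage1`, `stage2`, `stage3`, and `run_stages` are hypothetical names chosen for illustration, and the stages are plain synchronous functions.

```python
from functools import reduce
from typing import Any, Callable, Iterable


def stage1(x: int) -> int:
    return x + 1


def stage2(x: int) -> int:
    return x * 2


def stage3(x: int) -> int:
    return x - 3


def run_stages(payload: Any, stages: Iterable[Callable[[Any], Any]]) -> Any:
    """Feed the payload through each stage in order."""
    result = payload
    for stage in stages:
        result = stage(result)
    return result


stages = [stage1, stage2, stage3]

looped = run_stages(10, stages)                              # explicit for loop
nested = stage3(stage2(stage1(10)))                          # nested calls
folded = reduce(lambda acc, stage: stage(acc), stages, 10)   # left fold

print(looped, nested, folded)  # 19 19 19
```

All three forms apply the stages left to right, so they produce the same value.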
41 | | -## Immutability |
42 | | - |
43 | | -Pipelines are implemented as immutable stage chains. When you pipe a new stage, a new pipeline will be created with the added stage. This makes pipelines easy to reuse and minimizes side-effects. |
44 | | - |
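As a rough illustration of the immutability described above, the following sketch uses a hypothetical `SketchPipeline` class. It is not the package's `Pipeline` implementation, and it is synchronous for brevity; it only shows the idea that `pipe` returns a new object and leaves the original untouched.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Stage = Callable[[Any], Any]


@dataclass(frozen=True)
class SketchPipeline:
    """Hypothetical stand-in for illustration, not the library's Pipeline."""

    stages: Tuple[Stage, ...] = ()

    def pipe(self, stage: Stage) -> "SketchPipeline":
        # Build a new pipeline with the extra stage instead of mutating this one.
        return SketchPipeline(self.stages + (stage,))

    def process(self, payload: Any) -> Any:
        result = payload
        for stage in self.stages:
            result = stage(result)
        return result

    # Being callable lets a pipeline act as a stage inside another pipeline.
    __call__ = process


base = SketchPipeline().pipe(lambda x: x * 10)
extended = base.pipe(lambda x: x + 1)        # base is left unchanged
outer = SketchPipeline().pipe(extended).pipe(lambda x: x * 2)

print(base.process(10))      # 100
print(extended.process(10))  # 101
print(outer.process(10))     # 202
```

Because `base` never changes, it can be shared and extended in several places without one caller affecting another.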
45 | | -## Usage |
46 | | - |
47 | | -Operations in a pipeline, called stages, can be anything that satisfies the `Callable` type hint, so functions and any other callable objects are acceptable. |
48 | | - |
49 | | -```python |
50 | | -pipeline = Pipeline().pipe(lambda payload: payload * 10) |
51 | | - |
52 | | -# Returns 100 |
53 | | -await pipeline.process(10) |
54 | | -``` |
55 | | - |
56 | | -## Class-Based Stages |
57 | | - |
58 | | -Class-based stages are also possible. The `StageInterface[InputType, OutputType]` interface can be implemented, which ensures you have the correct method signature for the `__call__` method. |
59 | | - |
60 | | -```python |
61 | | -class TimesTwoStage(StageInterface[int, int]): |
62 | | - async def __call__(self, payload: int) -> int: |
63 | | - return payload * 2 |
64 | | - |
65 | | -class AddOneStage(StageInterface[int, int]): |
66 | | - async def __call__(self, payload: int) -> int: |
67 | | - return payload + 1 |
68 | | - |
69 | | -pipeline = ( |
70 | | - Pipeline[int, int]() |
71 | | - .pipe(TimesTwoStage()) |
72 | | - .pipe(AddOneStage()) |
73 | | -) |
74 | | - |
75 | | -# Returns 21 |
76 | | -await pipeline.process(10) |
77 | | -``` |
78 | | - |
79 | | -## Reusable Pipelines |
80 | | - |
81 | | -Because the `PipelineInterface` is an extension of the `StageInterface`, pipelines can be reused as stages. This creates a highly composable model to create complex execution patterns while keeping the cognitive load low. |
82 | | - |
83 | | -For example, if we'd want to compose a pipeline to process API calls, we'd create something along these lines: |
84 | | - |
85 | | -```python |
86 | | -process_api_request = ( |
87 | | - Pipeline() |
88 | | - .pipe(ExecuteHttpRequest()) |
89 | | - .pipe(ParseJsonResponse()) |
90 | | -) |
91 | | - |
92 | | -pipeline = ( |
93 | | - Pipeline() |
94 | | - .pipe(ConvertToPsr7Request()) |
95 | | - .pipe(process_api_request) |
96 | | - .pipe(ConvertToResponseDto()) |
97 | | -) |
98 | | - |
99 | | -await pipeline.process(DeleteBlogPost(post_id)) |
100 | | -``` |
101 | | - |
102 | | -## Type Hinting |
103 | | - |
104 | | -You can specify the input and output types for pipelines and stages using type variables `T_in` and `T_out`. This allows you to handle varying types between stages, enhancing type safety and code clarity. |
105 | | - |
106 | | -The `T_out` type variable is optional and defaults to `T_in`. Similarly, `T_in` is also optional and defaults to `Any`. |
107 | | - |
108 | | -```python |
109 | | -from typing import Any |
110 | | - |
111 | | -pipeline = Pipeline[int]().pipe(lambda payload: payload * 2) |
112 | | - |
113 | | -# Returns 20 |
114 | | -await pipeline.process(10) |
115 | | -``` |
116 | | - |
117 | | -You can also handle varying types between stages: |
118 | | - |
119 | | -```python |
120 | | -pipeline = Pipeline[int, str]().pipe(lambda payload: f"Number: {payload}") |
121 | | - |
122 | | -# Returns "Number: 10" |
123 | | -await pipeline.process(10) |
124 | | -``` |
125 | | - |
126 | | -This flexibility allows you to build pipelines that transform data types between stages seamlessly. |
127 | | - |
128 | | -## Custom Processors |
129 | | - |
130 | | -You can create your own processors to customize how the pipeline processes stages. This allows you to implement different execution strategies, such as handling exceptions, processing resources, or implementing middleware patterns. |
131 | | - |
132 | | -For example, you can define a custom processor: |
133 | | - |
134 | | -```python |
135 | | -class MyCustomProcessor(Processor[T_in, T_out]): |
136 | | - async def process( |
137 | | - self, |
138 | | - payload: T_in, |
139 | | - stages: StageInstanceCollection, |
140 | | - ) -> T_out: |
141 | | - # Custom processing logic |
142 | | - for stage in stages: |
143 | | - payload = await stage(payload) |
144 | | - return payload |
145 | | -``` |
146 | | - |
147 | | -And use it in your pipeline: |
148 | | - |
149 | | -```python |
150 | | -pipeline = Pipeline[int, int](processor=MyCustomProcessor()).pipe(lambda x: x * 2) |
151 | | -``` |
152 | | - |
153 | | -## Declarative Stages |
154 | | - |
155 | | -Instead of using `pipe` to add stages at runtime, you can define stages declaratively by specifying them as class-level attributes. This makes pipelines easier to set up and reuse with predefined stages. |
156 | | - |
157 | | -```python |
158 | | -class MyPipeline(Pipeline[int, int]): |
159 | | - stages = [ |
160 | | - TimesTwoStage(), |
161 | | - TimesThreeStage(), |
162 | | - ] |
163 | | - |
164 | | -# Process the payload through the pipeline with the declared stages |
165 | | -result = await MyPipeline().process(5) |
166 | | - |
167 | | -# Returns 30 |
168 | | -print(result) |
169 | | -``` |
170 | | - |
171 | | -In this example, `MyPipeline` declares its stages directly in the class definition, making the pipeline setup more readable and maintainable. |
172 | | - |
173 | | -## Declarative Processor |
174 | | - |
175 | | -You can also specify the processor in a declarative way by setting the `processor_class` attribute in your pipeline class. |
176 | | - |
177 | | -```python |
178 | | -class MyPipeline(Pipeline[T_in, T_out]): |
179 | | - processor_class = MyCustomProcessor |
180 | | -``` |
181 | | - |
182 | | -This allows you to customize the processing behavior of your pipeline while keeping the definition clean and declarative. |
183 | | - |
184 | | -## Processing Streams |
185 | | - |
186 | | -The pipeline can also process streams in real-time, allowing you to handle asynchronous iterators and process data as it becomes available. |
187 | | - |
188 | | -```python |
189 | | -from typing import AsyncIterator |
190 | | -import asyncio |
191 | | - |
192 | | -async def input_stream() -> AsyncIterator[int]: |
193 | | - for i in range(5): |
194 | | - yield i |
195 | | - |
196 | | -async def stage1(stream: AsyncIterator[int]) -> AsyncIterator[int]: |
197 | | - async for item in stream: |
198 | | - yield item * 2 |
199 | | - await asyncio.sleep(1) # Simulate processing delay |
200 | | - |
201 | | -async def stage2(stream: AsyncIterator[int]) -> AsyncIterator[str]: |
202 | | - async for item in stream: |
203 | | - yield f"Number: {item}" |
204 | | - |
205 | | - |
206 | | -async def main(): |
207 | | - pipeline = ( |
208 | | - Pipeline[AsyncIterator[int], AsyncIterator[str]]() |
209 | | - .pipe(stage1) |
210 | | - .pipe(stage2) |
211 | | - ) |
212 | | - |
213 | | - stream = await pipeline.process(input_stream()) |
214 | | - |
215 | | - async for result in stream: |
216 | | - print(result) |
217 | | - |
218 | | -# Run the async main function |
219 | | -asyncio.run(main()) |
220 | | -``` |
221 | | - |
222 | | -This allows you to process data in a streaming fashion, where each stage can yield results that are immediately consumed by the next stage. |
223 | | - |
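The item-by-item flow can be observed without the package by chaining plain async generators. This is only a sketch under that assumption: `source`, `double`, `label`, and the `events` log are hypothetical names, and the stages are composed by direct calls rather than through `Pipeline`.

```python
import asyncio
from typing import AsyncIterator, List

events: List[str] = []  # records the order in which things happen


async def source() -> AsyncIterator[int]:
    for i in range(3):
        events.append(f"produce {i}")
        yield i


async def double(stream: AsyncIterator[int]) -> AsyncIterator[int]:
    async for item in stream:
        yield item * 2


async def label(stream: AsyncIterator[int]) -> AsyncIterator[str]:
    async for item in stream:
        yield f"Number: {item}"


async def collect() -> List[str]:
    results: List[str] = []
    # Each item travels through both stages before the next one is produced.
    async for text in label(double(source())):
        events.append(f"consume {text}")
        results.append(text)
    return results


results = asyncio.run(collect())
print(results)
print(events)
```

The `events` log alternates between producing and consuming, which shows that no stage waits for the whole input before passing results along.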
224 | | -## Pipeline Factory |
225 | | - |
226 | | -Because pipelines themselves are immutable, a pipeline factory is provided to facilitate distributed composition of a pipeline. |
227 | | - |
228 | | -The `PipelineFactory[InputType, OutputType]` collects stages and allows you to create a pipeline at any given time. |
229 | | - |
230 | | -```python |
231 | | -pipeline_factory = PipelineFactory().with_stages([LogicalStage(), AddOneStage()]) |
232 | | - |
233 | | -# Additional stages can be added later |
234 | | -pipeline_factory.add_stage(LastStage()).with_processor(MyCustomProcessor()) |
235 | | - |
236 | | -# Build the pipeline |
237 | | -pipeline = pipeline_factory.build() |
238 | | -``` |
239 | | - |
240 | | -## Exception Handling |
241 | | - |
242 | | -This package is completely transparent when dealing with exceptions. In no case will this package catch an exception or silence an error. Exceptions should be dealt with on a per-case basis, either inside a _stage_ or at the time the pipeline processes a payload. |
243 | | - |
244 | | -```python |
245 | | -pipeline = Pipeline().pipe(lambda payload: payload / 0) |
246 | | - |
247 | | -try: |
248 | | - await pipeline.process(10) |
249 | | -except ZeroDivisionError as e: |
250 | | - # Handle the exception. |
251 | | - pass |
252 | | -``` |
| 1 | +--8<-- "README.md" |