-
Notifications
You must be signed in to change notification settings - Fork 1
Support custom task serialization #44
Description
Summary
Add an optional serializer argument to @wool.routine that allows users to supply custom serialization and deserialization functions for task arguments and return values. When provided, the custom serializer bypasses cloudpickle on the dispatch path (Task.to_protobuf / Task.from_protobuf), giving users control over how payloads cross the wire.
Motivation
Wool currently uses cloudpickle unconditionally to serialize task arguments, return values, and yielded value. This works for arbitrary Python objects but is suboptimal for types that have more efficient representations — for example, bytes payloads are cloudpickled into an even larger bytestream before being placed into a protobuf bytes field, when they could be passed through directly.
Competing frameworks like Ray avoid this by checking argument types against Arrow-native types and only falling back to cloudpickle for complex objects. However, baking type-specific fast paths into wool's core couples the serialization layer to specific types. A user-facing serializer parameter is more extensible: users (or wool itself in future built-in optimizations) can supply serializers tuned to their workload without modifying the dispatch internals.
Concrete use cases:
- Zero-overhead
bytespassthrough — skip cloudpickle entirely for byte-array workloads - Arrow/numpy integration — serialize large numerical arrays via Arrow IPC instead of pickle
- Domain-specific codecs — e.g., MessagePack, Avro, or custom protobuf messages for structured data