Skip to content

Support custom task serialization #44

@conradbzura

Description

@conradbzura

Summary

Add an optional serializer argument to @wool.routine that allows users to supply custom serialization and deserialization functions for task arguments and return values. When provided, the custom serializer bypasses cloudpickle on the dispatch path (Task.to_protobuf / Task.from_protobuf), giving users control over how payloads cross the wire.

Motivation

Wool currently uses cloudpickle unconditionally to serialize task arguments, return values, and yielded value. This works for arbitrary Python objects but is suboptimal for types that have more efficient representations — for example, bytes payloads are cloudpickled into an even larger bytestream before being placed into a protobuf bytes field, when they could be passed through directly.

Competing frameworks like Ray avoid this by checking argument types against Arrow-native types and only falling back to cloudpickle for complex objects. However, baking type-specific fast paths into wool's core couples the serialization layer to specific types. A user-facing serializer parameter is more extensible: users (or wool itself in future built-in optimizations) can supply serializers tuned to their workload without modifying the dispatch internals.

Concrete use cases:

  • Zero-overhead bytes passthrough — skip cloudpickle entirely for byte-array workloads
  • Arrow/numpy integration — serialize large numerical arrays via Arrow IPC instead of pickle
  • Domain-specific codecs — e.g., MessagePack, Avro, or custom protobuf messages for structured data

Metadata

Metadata

Assignees

No one assigned

    Labels

    featureNew feature or capability

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions