Implement the serialization of tagged values by pyfisch · Pull Request #1643 · serde-rs/serde

pyfisch · 2019-10-07T15:35:37Z

Add a new method called serialize_tagged_value to the Serializer trait.
The method has a default implementation serializing just the value for all
formats without tags.

Tagged values are the most requested feature of the serde_cbor crate and they are an important part of working with CBOR documents. But tagged values do not fit neatly into serde's data model therefore this small change to serde is required to enable serialization of tagged values in CBOR and possibly other data formats.

Simple CBOR serialization with a string tagged as an URL (tag 32):

let mut ser = serde_cbor::Serializer::new(&mut buf);
assert!(ser.serialize_tagged_value("cbor", &32u64, "http://example.org/").is_ok());

To support tagged values in a serializer for a data format it is necessary to implement serialize_tagged_value and write a custom serializer as I have done for CBOR: https://github.com/pyfisch/cbor/blob/6ad5ea3ade62ceadc20c5bcc832ac85f6c1d3901/src/ser.rs#L511-L695

This PR deliberately only includes serialization and not deserialization as both parts can be discussed and used independently from each other.

What do you think about this extension to serde?

Add a new method called serialize_tagged_value to the Serializer trait. The method has a default implementation serializing just the value for all formats without tags.

pyfisch · 2019-10-22T08:57:26Z

@dtolnay Did you have a look at this? Do you see any problems with this approach?

rkuhn · 2019-10-24T08:46:22Z

This would be highly beneficial for us (Actyx), how can we help?

vmx · 2019-10-24T10:23:48Z

I've implemented the deserialization part based on #408: https://github.com/vmx/serde/tree/tagged-value-deserialization

I dare to cross-post my comment from pyfisch/cbor#157 (comment) in hope that this issue might have more people looking at it:

I'm not happy with the result. One missing thing is to deserialize the value (without the tag) by default. I just couldn't get it working. I always got trait bounds errors. I hope someone can help with that.

I also don't really like the Tagger. I tried several different ways. Traits with generics, traits with associated types, using a deserializer for the tag, implementing a TagAccess trait similar to the SeqAccess trait. I could none of those working. If anyone has a good idea, please let me know. Meanwhile I changed the Tagger to return an Any-type instead of having methods for each type to access it. I like it better, but I'm also open to change that to some other approach.

vmx · 2019-10-27T17:04:54Z

@dtolnay Is there any chance that tagged values become part of the Serde Data Model given the design constraints outlined at pyfisch/cbor#157 (comment) (unrelated to the implementation proposed in later comments)?

I'm asking as only if there is a chance, it makes sense for @pyfisch and me to spend more time on this. I really need tagged values and else I would need to look into other ways solving it and I fear that I would eventually end up with a less good re-implementation of Serde.

dtolnay

Thanks for the PR! I think unfortunately introducing support for tagged values wouldn't be the right course for Serde. It isn't a goal for the data model to be a union of all the features of data formats that people care about. Rather it's a fairly selective subset that translates well across a range of formats (and maybe would be smaller if it were being redesigned today). Formats that equate naturally with the data model are a good fit for Serde. Formats where a subset can be made to fit the data model (like serde_cbor today) are still often useful for working within that subset. Beyond that, I would recommend building dedicated libraries to get full coverage of individual formats -- such as I've encouraged in the past for formats that care about element types of empty lists and maps, formats that care about preserving comments or whitespace, formats that distinguish between fields and attributes, etc.

pyfisch · 2019-10-27T22:24:05Z

Thanks for your comment. I am disappointed by your response.

Beyond that, I would recommend building dedicated libraries to get full coverage of individual formats

serde-cbor already covers 95% of CBOR it just needs this tiny bit of help from serde to push it across the finish line. Building a separate library to implement all of serde would be a lot of work and you'd lose all the benefits of integration into the serde ecosystem.

dtolnay · 2019-10-27T22:43:20Z

From my perspective the trouble is that many other data formats also need a different tiny bit of help, and even if each one is only a 5% extension, we really need to draw a line to keep the overall complexity of the core under control.

There should be ways for a library with its own traits to integrate with the Serde ecosystem, such as providing blanket impls for its traits for any type that implements Serde's traits.

vmx · 2019-10-28T15:19:44Z

I agree that projects shouldn't just blindly implement whatever is requested from users. It's important to keep scope so you don't end up in some hard to maintain state where lots of additions are only used by a small subset of people.

Though I think things are different with Serde and this request to add tags:

Serde is the de facto standard library in the Rust ecosystem to serialize/deserialize things. It looks like it started as "I want to serialize my Rust types" library, but in my opinion it long outgrew that use case and is used in a bigger scope now. I think if you are implementing some serialization the go-to crate is Serde. Just using something else is not as easy as with other libraries for e.g. error handling or logging, where there are several implementations for different tasts and you could pick the one that fits best. Of course you could create your own library, but so much work has been put into Serde over the years, it would be a waste re-implementing a similar library.
This is not the request from a single library author that needs one special feature that is only for his implementation. Several formats work around this limitation in the same hacky way (CBOR, MessagePack, TOML). If such a common pattern emerges, I think it makes sense to move it to the upstream library and solve it once for good there. There's also more formats that have tag support (BSON, YAML) that could leverage that feature. Correct me if I'm wrong, but it would also apply to formats that don't have direct tag support, but implement types that are not part of the Serde Data Model (e.g. TOML's date type).
It has been discussed in the past between @pyfisch, @oli-obk, @dtolnay, @erickt and it sounded like there was consensus that this feature is a good idea if a good implementation is found.

rklaehn · 2019-10-29T10:35:57Z

Is there possibly a way to keep the serde data model as it is, but just add a mechanism for different data formats that have a bit that is outside the serde data model to work? Just a lightweight optional communication channel between values and serializers/deserializers?

This could be beneficial not just for tagged values, but also for other things that are currently done in a slightly hackish way, like e.g. RawValue support in serde_json, where a "magic value" is used as a communication channel between a special value and a certain serializer.

The tag hack that @vmx came up with in serde_cbor ( pyfisch/cbor#151 ) and the RawValue hack ( https://github.com/serde-rs/json/blob/master/src/raw.rs#L220) kinda have the same basic approach.

I have not yet spent time thinking about this, but would you be willing to add something to make the life of formats that go beyond the serde data model easier if it does not include extending the serde data model?

pyfisch · 2019-11-10T15:34:50Z

Correct me if I'm wrong, but it would also apply to formats that don't have direct tag support, but implement types that are not part of the Serde Data Model (e.g. TOML's date type).

Yes this could be used for the TOML date type or to differentiate between undefined and null in data formats which usually map to unit in the serde data model.

Just a lightweight optional communication channel between values and serializers/deserializers?

Tags are kind of a lightweight communication channel. I made sure that they are entirely optional and specific to a given data format so messages are not confused between different formats. I have not checked if the proposed tags are sufficient for deserializing a RawValue but serialization would work and I am confident that a design for deserialization would be found too that enables both tags and raw values.

rklaehn · 2019-11-15T09:04:01Z

Tags are kind of a lightweight communication channel. I made sure that they are entirely optional and specific to a given data format so messages are not confused between different formats. I have not checked if the proposed tags are sufficient for deserializing a RawValue but serialization would work and I am confident that a design for deserialization would be found too that enables both tags and raw values.

If many of these ugly hacks like RawValue etc. can be made nicer with an additional communication channel, that probably increases the chances of getting the additional communication channel merged.

Would having an additional &str parameter for newtype structs be sufficient (as described in #1556 )

pyfisch · 2019-11-15T13:23:03Z

I don't think an additional &'static str parameter would be sufficient as for tags I need to pass two pieces of information, namely the format CBOR and the tag e.g. 42. But I have not thought about this much, so it may also be possible. Am 15. November 2019 10:04:09 MEZ schrieb "Rüdiger Klaehn" <notifications@github.com>:

…

> Tags are kind of a lightweight communication channel. I made sure that they are entirely optional and specific to a given data format so messages are not confused between different formats. I have not checked if the proposed tags are sufficient for deserializing a RawValue but serialization would work and I am confident that a design for deserialization would be found too that enables both tags and raw values. If many of these ugly hacks like RawValue etc. can be made nicer with an additional communication channel, that probably increases the chances of getting the additional communication channel merged. Would having an additional &str parameter for newtype structs be sufficient (as described in #1556 ) -- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: #1643 (comment)

rklaehn · 2019-11-16T14:35:34Z

I don't think an additional &'static str parameter would be sufficient as for tags I need to pass two pieces of information, namely the format CBOR and the tag e.g. 42. But I have not thought about this much, so it may also be possible.

I am just trying to think of something really minimal that has a chance to get accepted. You could of course encode CBOR:42 in a string, but that would be super ugly. Maybe make the additional newtype parameter a &'static [u8] so all the different tagged formats can make up whatever they need to transfer to the visitor?

Implement the serialization of tagged values

70c4acb

Add a new method called serialize_tagged_value to the Serializer trait. The method has a default implementation serializing just the value for all formats without tags.

This was referenced Oct 7, 2019

Implement semantic tagging of values pyfisch/cbor#157

Closed

Add tags support behind feature flag pyfisch/cbor#151

Closed

dtolnay reviewed Oct 27, 2019

View reviewed changes

dtolnay closed this Oct 27, 2019

vmx mentioned this pull request Jan 28, 2020

Rust IPLD ipld/ipld#83

Open

not-my-profile mentioned this pull request Apr 3, 2021

Add deserialize-datetime-as-string feature flag toml-rs/toml-rs#423

Closed

vmx mentioned this pull request Feb 15, 2022

Design for Data Model Extensibility mitsuhiko/deser#13

Open

serde-rs locked and limited conversation to collaborators Feb 15, 2022

Uh oh!

Conversation

pyfisch commented Oct 7, 2019

Uh oh!

pyfisch commented Oct 22, 2019

Uh oh!

rkuhn commented Oct 24, 2019

Uh oh!

vmx commented Oct 24, 2019

Uh oh!

vmx commented Oct 27, 2019

Uh oh!

dtolnay left a comment

Choose a reason for hiding this comment

Uh oh!

pyfisch commented Oct 27, 2019

Uh oh!

dtolnay commented Oct 27, 2019

Uh oh!

vmx commented Oct 28, 2019

Uh oh!

rklaehn commented Oct 29, 2019

Uh oh!

pyfisch commented Nov 10, 2019

Uh oh!

rklaehn commented Nov 15, 2019

Uh oh!

pyfisch commented Nov 15, 2019 via email

Uh oh!

rklaehn commented Nov 16, 2019

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

5 participants