Skip to content

Implement the serialization of tagged values#1643

Closed
pyfisch wants to merge 1 commit intoserde-rs:masterfrom
pyfisch:tagged-value-serialization
Closed

Implement the serialization of tagged values#1643
pyfisch wants to merge 1 commit intoserde-rs:masterfrom
pyfisch:tagged-value-serialization

Conversation

@pyfisch
Copy link
Contributor

@pyfisch pyfisch commented Oct 7, 2019

Add a new method called serialize_tagged_value to the Serializer trait.
The method has a default implementation serializing just the value for all
formats without tags.

Tagged values are the most requested feature of the serde_cbor crate and they are an important part of working with CBOR documents. But tagged values do not fit neatly into serde's data model therefore this small change to serde is required to enable serialization of tagged values in CBOR and possibly other data formats.

Simple CBOR serialization with a string tagged as an URL (tag 32):

let mut ser = serde_cbor::Serializer::new(&mut buf);
assert!(ser.serialize_tagged_value("cbor", &32u64, "http://example.org/").is_ok());

To support tagged values in a serializer for a data format it is necessary to implement serialize_tagged_value and write a custom serializer as I have done for CBOR: https://github.com/pyfisch/cbor/blob/6ad5ea3ade62ceadc20c5bcc832ac85f6c1d3901/src/ser.rs#L511-L695

This PR deliberately only includes serialization and not deserialization as both parts can be discussed and used independently from each other.

What do you think about this extension to serde?

Add a new method called serialize_tagged_value to the Serializer trait.
The method has a default implementation serializing just the value for all
formats without tags.
@pyfisch
Copy link
Contributor Author

pyfisch commented Oct 22, 2019

@dtolnay Did you have a look at this? Do you see any problems with this approach?

@rkuhn
Copy link

rkuhn commented Oct 24, 2019

This would be highly beneficial for us (Actyx), how can we help?

@vmx
Copy link

vmx commented Oct 24, 2019

I've implemented the deserialization part based on #408: https://github.com/vmx/serde/tree/tagged-value-deserialization

I dare to cross-post my comment from pyfisch/cbor#157 (comment) in hope that this issue might have more people looking at it:

I'm not happy with the result. One missing thing is to deserialize the value (without the tag) by default. I just couldn't get it working. I always got trait bounds errors. I hope someone can help with that.

I also don't really like the Tagger. I tried several different ways. Traits with generics, traits with associated types, using a deserializer for the tag, implementing a TagAccess trait similar to the SeqAccess trait. I could none of those working. If anyone has a good idea, please let me know. Meanwhile I changed the Tagger to return an Any-type instead of having methods for each type to access it. I like it better, but I'm also open to change that to some other approach.

@vmx
Copy link

vmx commented Oct 27, 2019

@dtolnay Is there any chance that tagged values become part of the Serde Data Model given the design constraints outlined at pyfisch/cbor#157 (comment) (unrelated to the implementation proposed in later comments)?

I'm asking as only if there is a chance, it makes sense for @pyfisch and me to spend more time on this. I really need tagged values and else I would need to look into other ways solving it and I fear that I would eventually end up with a less good re-implementation of Serde.

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I think unfortunately introducing support for tagged values wouldn't be the right course for Serde. It isn't a goal for the data model to be a union of all the features of data formats that people care about. Rather it's a fairly selective subset that translates well across a range of formats (and maybe would be smaller if it were being redesigned today). Formats that equate naturally with the data model are a good fit for Serde. Formats where a subset can be made to fit the data model (like serde_cbor today) are still often useful for working within that subset. Beyond that, I would recommend building dedicated libraries to get full coverage of individual formats -- such as I've encouraged in the past for formats that care about element types of empty lists and maps, formats that care about preserving comments or whitespace, formats that distinguish between fields and attributes, etc.

@dtolnay dtolnay closed this Oct 27, 2019
@pyfisch
Copy link
Contributor Author

pyfisch commented Oct 27, 2019

Thanks for your comment. I am disappointed by your response.

Beyond that, I would recommend building dedicated libraries to get full coverage of individual formats

serde-cbor already covers 95% of CBOR it just needs this tiny bit of help from serde to push it across the finish line. Building a separate library to implement all of serde would be a lot of work and you'd lose all the benefits of integration into the serde ecosystem.

@dtolnay
Copy link
Member

dtolnay commented Oct 27, 2019

From my perspective the trouble is that many other data formats also need a different tiny bit of help, and even if each one is only a 5% extension, we really need to draw a line to keep the overall complexity of the core under control.

There should be ways for a library with its own traits to integrate with the Serde ecosystem, such as providing blanket impls for its traits for any type that implements Serde's traits.

@vmx
Copy link

vmx commented Oct 28, 2019

I agree that projects shouldn't just blindly implement whatever is requested from users. It's important to keep scope so you don't end up in some hard to maintain state where lots of additions are only used by a small subset of people.

Though I think things are different with Serde and this request to add tags:

  • Serde is the de facto standard library in the Rust ecosystem to serialize/deserialize things. It looks like it started as "I want to serialize my Rust types" library, but in my opinion it long outgrew that use case and is used in a bigger scope now. I think if you are implementing some serialization the go-to crate is Serde. Just using something else is not as easy as with other libraries for e.g. error handling or logging, where there are several implementations for different tasts and you could pick the one that fits best. Of course you could create your own library, but so much work has been put into Serde over the years, it would be a waste re-implementing a similar library.
  • This is not the request from a single library author that needs one special feature that is only for his implementation. Several formats work around this limitation in the same hacky way (CBOR, MessagePack, TOML). If such a common pattern emerges, I think it makes sense to move it to the upstream library and solve it once for good there. There's also more formats that have tag support (BSON, YAML) that could leverage that feature. Correct me if I'm wrong, but it would also apply to formats that don't have direct tag support, but implement types that are not part of the Serde Data Model (e.g. TOML's date type).
  • It has been discussed in the past between @pyfisch, @oli-obk, @dtolnay, @erickt and it sounded like there was consensus that this feature is a good idea if a good implementation is found.

@rklaehn
Copy link

rklaehn commented Oct 29, 2019

Is there possibly a way to keep the serde data model as it is, but just add a mechanism for different data formats that have a bit that is outside the serde data model to work? Just a lightweight optional communication channel between values and serializers/deserializers?

This could be beneficial not just for tagged values, but also for other things that are currently done in a slightly hackish way, like e.g. RawValue support in serde_json, where a "magic value" is used as a communication channel between a special value and a certain serializer.

The tag hack that @vmx came up with in serde_cbor ( pyfisch/cbor#151 ) and the RawValue hack ( https://github.com/serde-rs/json/blob/master/src/raw.rs#L220) kinda have the same basic approach.

I have not yet spent time thinking about this, but would you be willing to add something to make the life of formats that go beyond the serde data model easier if it does not include extending the serde data model?

@pyfisch
Copy link
Contributor Author

pyfisch commented Nov 10, 2019

Correct me if I'm wrong, but it would also apply to formats that don't have direct tag support, but implement types that are not part of the Serde Data Model (e.g. TOML's date type).

Yes this could be used for the TOML date type or to differentiate between undefined and null in data formats which usually map to unit in the serde data model.

Just a lightweight optional communication channel between values and serializers/deserializers?

Tags are kind of a lightweight communication channel. I made sure that they are entirely optional and specific to a given data format so messages are not confused between different formats. I have not checked if the proposed tags are sufficient for deserializing a RawValue but serialization would work and I am confident that a design for deserialization would be found too that enables both tags and raw values.

@rklaehn
Copy link

rklaehn commented Nov 15, 2019

Tags are kind of a lightweight communication channel. I made sure that they are entirely optional and specific to a given data format so messages are not confused between different formats. I have not checked if the proposed tags are sufficient for deserializing a RawValue but serialization would work and I am confident that a design for deserialization would be found too that enables both tags and raw values.

If many of these ugly hacks like RawValue etc. can be made nicer with an additional communication channel, that probably increases the chances of getting the additional communication channel merged.

Would having an additional &str parameter for newtype structs be sufficient (as described in #1556 )

@pyfisch
Copy link
Contributor Author

pyfisch commented Nov 15, 2019 via email

@rklaehn
Copy link

rklaehn commented Nov 16, 2019

I don't think an additional &'static str parameter would be sufficient as for tags I need to pass two pieces of information, namely the format CBOR and the tag e.g. 42. But I have not thought about this much, so it may also be possible.

I am just trying to think of something really minimal that has a chance to get accepted. You could of course encode CBOR:42 in a string, but that would be super ugly. Maybe make the additional newtype parameter a &'static [u8] so all the different tagged formats can make up whatever they need to transfer to the visitor?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants