perf(rmw_cyclonedds_cpp): Add thread-local caching to rmw_serialize and rmw_deserialize#561
perf(rmw_cyclonedds_cpp): Add thread-local caching to rmw_serialize and rmw_deserialize#561mjcarroll wants to merge 1 commit intoros2:rollingfrom
Conversation
…nd rmw_deserialize This optimization caches MessageTypeSupport and CDR writer objects to avoid expensive repeated construction and string manipulation.
| RMW_SET_ERROR_MSG_WITH_FORMAT_STRING( | ||
| "rmw_serialize: failed to serialize: %s", e.what()); |
There was a problem hiding this comment.
There are still irrelevant formatting changes, it would be great to avoid them.
| void *ros_message) { | ||
| // Cache the resolved typesupport and MessageTypeSupport per type_support | ||
| // pointer, mirroring the rmw_serialize thread-local writer cache. | ||
| // MessageTypeSupport's constructor runs std::regex_replace + ostringstream |
There was a problem hiding this comment.
Does it worth to complicate the code with caching instead of fixing the root cause in the constructor by removing std::regex and std::ostringstream? E.g. something like this:
template<typename MembersType>
MessageTypeSupport<MembersType>::MessageTypeSupport(const MembersType * members)
{
assert(members);
this->members_ = members;
std::string message_namespace(this->members_->message_namespace_);
std::string message_name(this->members_->message_name_);
if (!message_namespace.empty()) {
std::string::size_type pos = 0;
while ((pos = message_namespace.find("__", pos)) != std::string::npos) {
message_namespace.replace(pos, 2, "::");
pos += 2;
}
}
std::string name;
name.reserve(
message_namespace.size() + 2 + 5 + message_name.size() + 1); // "::" + "dds_::" + "_"
if (!message_namespace.empty()) {
name += message_namespace;
name += "::";
}
name += "dds_::";
name += message_name;
name += '_';
this->setName(name.c_str());
}There was a problem hiding this comment.
Perhaps. I'm mostly pushing this so that @jmachowinski can take a look at it as well. This was vibe coded in response to some micro-benchmarks that came up in Zulip, so I wouldn't read too much into the actual implementation versus the general idea behind it.
There was a problem hiding this comment.
Also, the TLS caching looks really good in the mcirobenchmark because you are always publishing the same message repeatedly. In the case that you alternated the message types that you are publishing, you would constantly get cache misses, so optimizing the constructor additionally makes sense.
There was a problem hiding this comment.
Ah so you did in fact manage to genuinely benchmaxx the optimization @mjcarroll, I'm impressed 😂
fujitatomoya
left a comment
There was a problem hiding this comment.
after #562, this still reduces the frequency of construction entirely.
on a thread that repeatedly serializes the same type (which is extremely common in ROS nodes with dedicated publisher threads), we can skip construction altogether.
Cache typesupport in thread local storage for repeated serialization/deserialization. Just a prototype at the moment to start some conversation here based on results from a microbenchmark: https://github.com/mjcarroll/RosBenchmarks
Replaces #560
Related: ros2/rosidl_typesupport_fastrtps#142
Related: ros2/rmw_fastrtps#866
Ref Zulip conversation: #ROS General > Lyrical default RMW
Generated with the help of Claude Sonnet 4.6