perf(rmw_cyclonedds_cpp): Add thread-local caching to rmw_serialize and rmw_deserialize#561

Open
mjcarroll wants to merge 1 commit into ros2:rolling from mjcarroll:optimization/tls-caching

@mjcarroll (Member) commented Mar 5, 2026

Cache typesupport in thread local storage for repeated serialization/deserialization. Just a prototype at the moment to start some conversation here based on results from a microbenchmark: https://github.com/mjcarroll/RosBenchmarks

Replaces #560
Related: ros2/rosidl_typesupport_fastrtps#142
Related: ros2/rmw_fastrtps#866

Ref Zulip conversation: #ROS General > Lyrical default RMW

Generated with the help of Claude Sonnet 4.6

…nd rmw_deserialize

This optimization caches MessageTypeSupport and CDR writer objects to avoid expensive repeated construction and string manipulation.
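As a rough illustration of the general idea (not the PR's actual code), the sketch below keeps a per-thread map from a typesupport pointer to an object that is costly to build, so repeated calls on the same thread reuse it. `ExpensiveTypeSupport`, `get_cached`, and the placeholder name string are hypothetical stand-ins; the real `MessageTypeSupport` and CDR writer are not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an object that is costly to construct.
// The real MessageTypeSupport is not reproduced here.
struct ExpensiveTypeSupport
{
  explicit ExpensiveTypeSupport(const void * key)
  : source(key), name("dds_::Example_")
  {
    ++construction_count();
  }

  // Counts constructor calls so the effect of caching is observable.
  // Plain int: adequate for a single-threaded demo only.
  static int & construction_count()
  {
    static int count = 0;
    return count;
  }

  const void * source;
  std::string name;
};

// Returns this thread's cached instance for `key`, constructing it on first use.
inline const ExpensiveTypeSupport & get_cached(const void * key)
{
  assert(key != nullptr);
  thread_local std::unordered_map<const void *, ExpensiveTypeSupport> cache;
  auto it = cache.find(key);
  if (it == cache.end()) {
    it = cache.emplace(key, ExpensiveTypeSupport(key)).first;
  }
  return it->second;
}
```

Because the map is `thread_local`, no locking is needed, at the cost of one construction per type per thread and memory that lives until the thread exits.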
Comment on lines +1875 to +1876
RMW_SET_ERROR_MSG_WITH_FORMAT_STRING(
"rmw_serialize: failed to serialize: %s", e.what());

There are still irrelevant formatting changes; it would be great to avoid them.

void *ros_message) {
// Cache the resolved typesupport and MessageTypeSupport per type_support
// pointer, mirroring the rmw_serialize thread-local writer cache.
// MessageTypeSupport's constructor runs std::regex_replace + ostringstream

Is it worth complicating the code with caching instead of fixing the root cause in the constructor by removing std::regex and std::ostringstream? E.g., something like this:

template<typename MembersType>
MessageTypeSupport<MembersType>::MessageTypeSupport(const MembersType * members)
{
  assert(members);
  this->members_ = members;

  std::string message_namespace(this->members_->message_namespace_);
  std::string message_name(this->members_->message_name_);

  if (!message_namespace.empty()) {
    std::string::size_type pos = 0;
    while ((pos = message_namespace.find("__", pos)) != std::string::npos) {
      message_namespace.replace(pos, 2, "::");
      pos += 2;
    }
  }

  std::string name;
  name.reserve(
    message_namespace.size() + 2 + 6 + message_name.size() + 1);  // "::" + "dds_::" + "_"

  if (!message_namespace.empty()) {
    name += message_namespace;
    name += "::";
  }
  name += "dds_::";
  name += message_name;
  name += '_';

  this->setName(name.c_str());
}
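To make the string handling above easy to check in isolation, here is the same logic pulled into a free function. `make_dds_type_name` is a hypothetical name introduced only for this sketch, and the `std_msgs__msg` / `String` inputs in the usage note are illustrative values, not taken from the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds "<ns>::dds_::<name>_" from a "__"-separated
// namespace using only std::string operations (no std::regex, no streams).
inline std::string make_dds_type_name(
  std::string message_namespace, const std::string & message_name)
{
  // Replace every "__" with "::" in place; both are two characters long.
  std::string::size_type pos = 0;
  while ((pos = message_namespace.find("__", pos)) != std::string::npos) {
    message_namespace.replace(pos, 2, "::");
    pos += 2;
  }

  std::string name;
  // Sizes: namespace + "::" (2) + "dds_::" (6) + message name + "_" (1).
  name.reserve(message_namespace.size() + 2 + 6 + message_name.size() + 1);

  if (!message_namespace.empty()) {
    name += message_namespace;
    name += "::";
  }
  name += "dds_::";
  name += message_name;
  name += '_';
  return name;
}
```

With these inputs, `make_dds_type_name("std_msgs__msg", "String")` yields `std_msgs::msg::dds_::String_`, and an empty namespace yields `dds_::String_`.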

Member Author

Perhaps. I'm mostly pushing this so that @jmachowinski can take a look at it as well. This was vibe coded in response to some micro-benchmarks that came up in Zulip, so I wouldn't read too much into the actual implementation versus the general idea behind it.

Member Author

Also, the TLS caching looks really good in the microbenchmark because you are always publishing the same message repeatedly. If you alternated between the message types you publish, you would constantly get cache misses, so optimizing the constructor as well makes sense.
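The effect described here can be shown with a toy model. The sketch assumes the cache remembers only the most recently used key, which is what the remark about constant misses suggests; the PR's actual cache layout may differ. `LastKeyCache` and `count_misses` are hypothetical names, and a miss stands in for reconstructing the cached object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical single-slot cache that remembers only the most recent key.
struct LastKeyCache
{
  const void * last_key = nullptr;
  std::size_t hits = 0;
  std::size_t misses = 0;

  // Records one lookup; a miss models rebuilding the cached object.
  void lookup(const void * key)
  {
    if (key == last_key) {
      ++hits;
    } else {
      ++misses;
      last_key = key;
    }
  }
};

// Performs `count` lookups cycling through `num_keys` distinct keys
// (1 to 4) and returns how many of them missed.
inline std::size_t count_misses(std::size_t num_keys, std::size_t count)
{
  assert(num_keys >= 1 && num_keys <= 4);
  static const int keys[4] = {0, 1, 2, 3};  // element addresses act as keys
  LastKeyCache cache;
  for (std::size_t i = 0; i < count; ++i) {
    cache.lookup(&keys[i % num_keys]);
  }
  return cache.misses;
}
```

In this model, 100 lookups of a single key miss once, while 100 lookups alternating between two keys miss every time, which is the case a cheaper constructor would help with.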

Ah so you did in fact manage to genuinely benchmaxx the optimization @mjcarroll, I'm impressed 😂

@fujitatomoya (Contributor) left a comment

Even after #562, this still reduces how often construction happens.
On a thread that repeatedly serializes the same type (which is extremely common in ROS nodes with dedicated publisher threads), we can skip construction altogether.

4 participants