Resolve issues identified while investigating #21#22
Conversation
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
|
Linter errors resolved in 0b8fd8a |
|
@clalancette @ivanpauno @wjwwood a quick review to merge these changes to CI looks good, and the few errors are linter ones that have already been resolved. Thanks! |
This is true, I hope at some point we improve the rmw waitset API.
If we're deleting "waited on" objects somewhere, that's a bug. |
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
|
Addressed review comments. Running CI again to validate:
As I mentioned in this comment, based on #21, something "fishy" seemed to be happening in Gazebo's |
|
@ivanpauno after reading the article that you linked, I think the code still suffers from "lost wakeup". Maybe I'm missing something, but I'm considering refactoring to use semaphores and avoid this problem with the condition variable API altogether. |
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
ivanpauno
left a comment
There was a problem hiding this comment.
I still have some concerns related with concurrency.
I also need to take a closer look to the code again...
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
| std::mutex * waitset_mutex = this->waitset_mutex; | ||
| auto scope_exit = rcpputils::make_scope_exit( | ||
| [waitset_mutex]() | ||
| { | ||
| if (nullptr != waitset_mutex) { | ||
| waitset_mutex->unlock(); | ||
| } | ||
| }); | ||
| if (nullptr != waitset_mutex) { | ||
| waitset_mutex->lock(); | ||
| } | ||
| std::lock_guard<std::mutex> lock(this->mutex_internal); | ||
| this->triggered_data = true; |
There was a problem hiding this comment.
Similar to above, I think this should be:
| std::mutex * waitset_mutex = this->waitset_mutex; | |
| auto scope_exit = rcpputils::make_scope_exit( | |
| [waitset_mutex]() | |
| { | |
| if (nullptr != waitset_mutex) { | |
| waitset_mutex->unlock(); | |
| } | |
| }); | |
| if (nullptr != waitset_mutex) { | |
| waitset_mutex->lock(); | |
| } | |
| std::lock_guard<std::mutex> lock(this->mutex_internal); | |
| this->triggered_data = true; | |
| std::lock_guard<std::mutex> lock(this->mutex_internal); | |
| if (nullptr != this->waitset_mutex) { | |
| std::lock_guard<std::mutex> lock(this->waitset_mutex); | |
| this->triggered_data = true; | |
| } else { | |
| this->triggered_data = true; | |
| } | |
| if (this->waitset_condition) { | |
| this->waitset_condition->notify_one(); | |
| } |
rmw_connextdds_common/include/rmw_connextdds/rmw_waitset_std.hpp
Outdated
Show resolved
Hide resolved
Yes, this kind of code always gets a bit tricky 😬 . The trick with condition variables is to always follow a pattern like this when modifying a "condition" (i.e. something that can modify the return value of |
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
Failures look in line with my expectations:
|
You can ignore that one, it fails with any rmw implementation. |
hidmic
left a comment
There was a problem hiding this comment.
Overall looks reasonable to me too.
Signed-off-by: Andrea Sorbini <asorbini@rti.com>
|
Removed DDS-based implementation altogether in ce3f86d, and created branch asorbini/dds-waitsets to preserve it for future reference. |
|
I opened rmw_connextdds#26 to introduce some QoS optimizations for the reliability protocol which resolve the test failures in |
|
I've been investigating the invalid memory access in I think the problem is that when a Node object is removed from the executor and ultimately deleted, there is no guarantee that the executor thread has already got out of a concurrent When the executor's waitset is awaken, it goes through its list of condition to check their status and "tear down" the Relevant code in rclcpp: Not sure about a fix, because I'm not too familiar with For now, I don't think this is a bug in I wonder if this is the root cause of the concurrent |
Yes, your description sounds like a probelem in |
|
@ivanpauno thanks, looking forward to your findings! |
|
Wow, this is really reducing the ammount of failures when testing with Great work @asorbini !!! |
* Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Resolve issues identified while investigating #21 (#22) * Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unavailable time helpers Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Resolve issues identified while investigating #21 (#22) * Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unavailable time helpers Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unsupported QoS event Signed-off-by: Andrea Sorbini <asorbini@rti.com>
* Resolve issues identified while investigating #21 (#22) * Use C++ std WaitSets by default Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Use Rolling in README's Quick Start Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Improved implementation of client::is_service_available for Connext Pro Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Only add request header to typecode with Basic req/rep profile Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove commented/unused code Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Avoid topic name validation in get_info functions Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Reduce shutdown period to 10ms Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Pass HistoryQosPolicy to graph cache * Reset error string after looking up type support Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove DDS-based WaitSet implementation Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unavailable time helpers Signed-off-by: Andrea Sorbini <asorbini@rti.com> * Remove unsupported QoS event Signed-off-by: Andrea Sorbini <asorbini@rti.com>
This PR includes several improvements identified while investigating issue #21:
stdlibrary.The reason for this is still unclear, but it probably lies in the fundamental mismatch between the ROS 2 WaitSet API and the DDS one. The RMW tries to compensate for the lack of
attach()anddetach()in ROS 2 by trying to cache its arguments acrosswait()calls. Evidently this is not enough, and it may be even partially responsible for unnecessarily added CPU cycles under certain circumstances.stdwas updated to fix the standing issues with "status reporting", and it is now the default.stdimplementation relies on DDS listeners and coordinates events usingstd::condition_variable's andmutex's. Because it uses listeners (which automatically consume status flags and reset "change" counters), it must cache and keep track of reported statuses, in order to then later report the correct "status changes" to the user.RMW_Connext_Client::is_service_available()RMW_Connext_Client::is_service_available()was updated to only returntrueif both a "request reader" and a "reply writer" have been matched from the same remote DomainParticipant.RMW_DURATION_INFINITEinWaitSet::wait()wait_timeoutargument. I noticed in Add rmw_publisher_wait_for_all_acked support #20 that there is this constant so I added support for it.Allow deletion of "waited on" objects (alternative WaitSet implementation)This implementation will most likely be removed altogether, but I made some updates to try to resolve some "catastrophic failures" when running with applications that seem to be deleting objects while also trying to delete them (gzserverfrom Gazebo). This code didn't fully resolve the issues and is probably still bugged, hence why I ultimately went with item1.Note to reviewers: feel free to ignore changes inrmw_impl_waitset_dds.cppandrmw_waitset_dds.hppInitial CI validation: