Issue
Based on discussion over at ros2/rclcpp#1447 (comment)
If setup code is needed during each iteration of a benchmark run, and that setup code takes a long time relative to the code under test, Google Benchmark will run the benchmark for an abnormally long time. This can happen when PauseTiming/ResumeTiming or SetIterationTime is used to keep Google Benchmark from measuring the full iteration time.
For example, this benchmark was removed in ros2/rclcpp#1447 due to its long runtime. (Note this version is fixed so the destructors of the client and service are not included in the measured time.)
BENCHMARK_F(PerformanceTest, get_node_name)(benchmark::State & state)
{
  int count = 0;
  for (auto _ : state) {
    state.PauseTiming();
    const std::string service_name = std::string("service_") + std::to_string(count++);
    auto client = node->create_client<test_msgs::srv::Empty>(service_name);
    auto callback =
      [](
      const test_msgs::srv::Empty::Request::SharedPtr,
      test_msgs::srv::Empty::Response::SharedPtr) {};
    auto service =
      node->create_service<test_msgs::srv::Empty>(service_name, callback);
    state.ResumeTiming();
    client->wait_for_service(std::chrono::seconds(1));
    state.PauseTiming();
    client.reset();
    service.reset();
    state.ResumeTiming();
  }
}
Here client->wait_for_service() was measured at about 4us, but creating the client and service and then destroying both takes several orders of magnitude longer. Google Benchmark chooses how many iterations to run based on the configured MinTime, which defaults to about one second, so it would aim to run this benchmark for roughly 200,000 iterations.
Based on these benchmark results:
http://build.ros2.org/view/Rci/job/Rci__benchmark_ubuntu_focal_amd64/193/testReport/projectroot.test/benchmark/benchmark_client/
http://build.ros2.org/view/Rci/job/Rci__benchmark_ubuntu_focal_amd64/193/testReport/projectroot.test/benchmark/benchmark_service/
Constructing and destructing both a client and a service takes about 500us-600us. At 200,000 iterations, this benchmark would need to run for roughly 120 seconds to accumulate 1 second of evaluation time for the code under test.
Possible resolutions
There may be more resolutions than those identified here; these are just suggestions to get the ball rolling.
- Make use of MinTime and accept the reduced accuracy of measurement averages. Calling MinTime on a registered benchmark overrides the default value of 1 second. The MinTime duration would need to be chosen specifically for each benchmark and might need periodic updates if significant changes to the source code occur. The downside of reducing MinTime is that fewer iterations are measured, which could decrease the accuracy of the averaged result.
- Measure the full iteration time with and without the code under test. In the example above, this would require creating two benchmarks: one with the call to wait_for_service and another without it. The two benchmarks could then be compared side by side. The benefit here is that the reduced number of iterations is chosen automatically by Google Benchmark and adjusts as the source code changes. The disadvantage is that we'd be measuring the difference of two averages, which would be noisier than choice 1.
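Both resolutions can be sketched with Google Benchmark's registration API. This is a sketch only: the PerformanceTest fixture, the `node` member, and the test_msgs types come from the rclcpp benchmark fixtures and are assumed here, the 0.05s MinTime value is an arbitrary placeholder, and the benchmark names for resolution 2 are hypothetical.

```cpp
#include <benchmark/benchmark.h>
// rclcpp and test_msgs headers omitted; see the rclcpp benchmark fixtures.

// Resolution 1: define the original benchmark, then register it with an
// explicitly reduced MinTime instead of the ~1s default.
BENCHMARK_DEFINE_F(PerformanceTest, get_node_name)(benchmark::State & state)
{
  // ... same body as the example above ...
}
BENCHMARK_REGISTER_F(PerformanceTest, get_node_name)->MinTime(0.05);

// Resolution 2: measure the full iteration twice, once with
// wait_for_service and once without, then compare side by side.
BENCHMARK_F(PerformanceTest, client_service_with_wait)(benchmark::State & state)
{
  int count = 0;
  for (auto _ : state) {
    const std::string service_name = "service_" + std::to_string(count++);
    auto client = node->create_client<test_msgs::srv::Empty>(service_name);
    auto service = node->create_service<test_msgs::srv::Empty>(
      service_name,
      [](const test_msgs::srv::Empty::Request::SharedPtr,
         test_msgs::srv::Empty::Response::SharedPtr) {});
    client->wait_for_service(std::chrono::seconds(1));
    // client and service are destroyed at the end of the loop body,
    // so destruction is included in both measurements.
  }
}

BENCHMARK_F(PerformanceTest, client_service_without_wait)(benchmark::State & state)
{
  int count = 0;
  for (auto _ : state) {
    const std::string service_name = "service_" + std::to_string(count++);
    auto client = node->create_client<test_msgs::srv::Empty>(service_name);
    auto service = node->create_service<test_msgs::srv::Empty>(
      service_name,
      [](const test_msgs::srv::Empty::Request::SharedPtr,
         test_msgs::srv::Empty::Response::SharedPtr) {});
  }
}
```

For resolution 2, the difference between the two mean times estimates the cost of wait_for_service alone; since the setup and destruction appear identically in both benchmarks, they cancel out in the comparison.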