Prototype for Mimick-based memory recording (#14)
Force-pushed from e61c4b8 to 3d60945
More testing is needed before we switch over to this new memory recording approach, but I added a test demonstrating how to use it right now. The Benchmark MemoryManager system exposes the metrics under different names, and unfortunately, only the JSON reporter is currently writing them out.
Signed-off-by: Scott K Logan <logans@cottsay.net>
The provided console reporter in Benchmark doesn't display memory statistics at all. This change adds two things:
1. A BenchmarkReporter which augments the run data so that the memory statistics are shown as user counters (even though they're not).
2. A modified version of libbenchmark_main.so which utilizes (1) in place of the default console reporter.
Note that use of (2) means that the command line arguments that augment the behavior of the console output won't work. The API doesn't provide a mechanism to get those options after they've been parsed. Also note that the JSON reporter does process memory statistics, so no change is necessary there.
Signed-off-by: Scott K Logan <logans@cottsay.net>
Force-pushed from 3d60945 to b48fb91
I had to drop the

Also, the Microsoft implementation of

I'm still having issues with aborts in Windows in my local builds, but ci.ros2.org seems to show that it's working as expected in both
brawner left a comment
I'm really excited about those results. It's showing practically no timing overhead during the timing runs while still getting correct memory allocation results. Very cool. It might be helpful to run a benchmark with thousands of allocations, to match creating/destroying a node, to see if the pointer map is prohibitively expensive.
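To gauge that concern concretely, something like the following throwaway micro-timing could work. This is a sketch, not Google Benchmark: `track_and_free`, `map_overhead_us`, and the 64-byte block size are all arbitrary choices for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Allocate n fixed-size blocks, tracking each pointer in a std::map the
// way the memory manager would, then erase and free them all.
// Returns the peak number of tracked pointers (should equal n).
std::size_t track_and_free(std::size_t n)
{
  std::map<void *, std::size_t> sizes;
  std::vector<void *> ptrs;
  ptrs.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    void * p = std::malloc(64);
    sizes[p] = 64;
    ptrs.push_back(p);
  }
  const std::size_t peak = sizes.size();
  for (void * p : ptrs) {
    sizes.erase(p);
    std::free(p);
  }
  return peak;
}

// Wall-clock cost of the whole allocate/track/free cycle, in microseconds.
long long map_overhead_us(std::size_t n)
{
  const auto start = std::chrono::steady_clock::now();
  track_and_free(n);
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Comparing `map_overhead_us(10000)` against the same loop with the map lines removed would give a rough per-allocation cost for the tracking itself.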
memory_manager->Reset();
}

void MimickPerformanceTest::set_are_allocation_measurements_active(bool value)
Probably worth renaming this function since it's no longer a simple setter.
How about `PauseRecording`/`ResumeRecording` to mirror `benchmark::State`'s `PauseTiming`/`ResumeTiming`?
I also don't have any good feedback to provide about the two main questions you're seeking answers to, unfortunately. Tagging @hidmic to get his eyes on this PR and see if he has any general recommendations.
Signed-off-by: Scott K Logan <logans@cottsay.net>
Good idea, I'll look into that. I'm not sure I have an alternative if it is - I don't see any other way to track the high water mark memory usage without it (unless I'm missing something obvious). The map is the only reason I'm using a mutex; otherwise, atomic variables would be sufficient.
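As a sketch of the atomics-only part (`AllocStats` and its members are hypothetical names, not the PR's actual code): the counters and the high-water mark can be maintained lock-free, but note that `on_free` still needs the allocation's size, which is exactly what forces the per-pointer map and its mutex.

```cpp
#include <atomic>
#include <cstddef>

// Lock-free allocation statistics: works only if the size of each
// allocation is known at free time (via a map, or a size header).
struct AllocStats
{
  std::atomic<std::size_t> num_allocs{0};
  std::atomic<std::size_t> bytes_in_use{0};
  std::atomic<std::size_t> max_bytes_used{0};

  void on_alloc(std::size_t size)
  {
    num_allocs.fetch_add(1, std::memory_order_relaxed);
    const std::size_t now =
      bytes_in_use.fetch_add(size, std::memory_order_relaxed) + size;
    // CAS loop: raise the high-water mark if this allocation exceeded it.
    std::size_t peak = max_bytes_used.load(std::memory_order_relaxed);
    while (now > peak &&
      !max_bytes_used.compare_exchange_weak(peak, now, std::memory_order_relaxed))
    {
    }
  }

  void on_free(std::size_t size)
  {
    bytes_in_use.fetch_sub(size, std::memory_order_relaxed);
  }
};
```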
What about allocating
There are four Mimick or
We'd have to reverse it, but maybe. If we allocated

There's one huge catch though - if ANY memory is freed during the test that wasn't allocated during the test, we'll probably segfault. Which is really not ideal.
@brawner - we'd have to make explicit implementations for each platform, but this may be an option for getting rid of the map: https://stackoverflow.com/a/1281720

If you think about it though, our high water mark statistic is still going to get messed up if you try to free something that wasn't allocated while timing was enabled. The whole "pause recording" thing could make it even worse. I'm not sure we can avoid at least maintaining a list of what allocations were performed while recording is enabled...
That's a pretty good concern I didn't fully think through. I agree that it gets tricky during paused recording. It might still have to allocate the space and just set the memory size to 0 so we know it's an untracked pointer. That doesn't solve the allocate-before-tracking, free-during issue. I'm not expecting std::map to be a problem for something less than 10,000 items though; I just thought it would be good to get a rough handle on the impact.
Another thing to think about is that we've talked about the need to filter memory recording based on which thread is performing the operation. Depending on exactly which way we implement this (i.e. allow-list or block-list of thread IDs), we'll probably have a std::set that will need to be protected as well. Might be easier to just keep the mutex.
Also store the size of the allocation at the beginning of the allocation. Note that this could fail catastrophically if the code under test overwrites that block, but this approach appears to be more performant.
Signed-off-by: Scott K Logan <logans@cottsay.net>
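The size-header approach from that commit might look roughly like this. All names here are hypothetical, and a real stub would call the Mimick "vital" malloc/free rather than `std::malloc`/`std::free`. As the commit message warns, a write before the returned pointer corrupts the header catastrophically.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Header sized so the pointer handed back stays maximally aligned.
constexpr std::size_t kHeader = alignof(std::max_align_t);

void * tracked_malloc(std::size_t size)
{
  unsigned char * raw = static_cast<unsigned char *>(std::malloc(kHeader + size));
  if (raw == nullptr) {
    return nullptr;
  }
  std::memcpy(raw, &size, sizeof(size));  // stash the size at the front
  return raw + kHeader;
}

// Recover the size of a tracked allocation without any pointer map.
std::size_t tracked_size(const void * ptr)
{
  std::size_t size;
  std::memcpy(&size, static_cast<const unsigned char *>(ptr) - kHeader, sizeof(size));
  return size;
}

void tracked_free(void * ptr)
{
  if (ptr != nullptr) {
    std::free(static_cast<unsigned char *>(ptr) - kHeader);
  }
}
```

This also illustrates the hazard discussed above: passing a pointer that wasn't produced by `tracked_malloc` into `tracked_free` reads (and frees) memory before the block.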
Signed-off-by: Scott K Logan <logans@cottsay.net>
The `calloc` function isn't one of the Mimick "vital" functions, so we don't have a safe function to call in our stub. Since timing performance isn't really a concern here, and our underlying infrastructure can't handle operations bigger than `size_t` anyway, I just made the stub call `malloc` and `memset` to get similar behavior.
Signed-off-by: Scott K Logan <logans@cottsay.net>
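A stub along those lines might look like this sketch. `calloc_stub` is a hypothetical name, and the real stub would route through the Mimick vital malloc rather than `std::malloc`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Emulate calloc with malloc + memset, as described above.
void * calloc_stub(std::size_t nmemb, std::size_t size)
{
  // calloc must fail cleanly if nmemb * size would overflow size_t.
  if (size != 0 && nmemb > static_cast<std::size_t>(-1) / size) {
    return nullptr;
  }
  const std::size_t total = nmemb * size;
  void * ptr = std::malloc(total);  // would be the Mimick vital malloc
  if (ptr != nullptr) {
    std::memset(ptr, 0, total);  // calloc guarantees zeroed memory
  }
  return ptr;
}
```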
Signed-off-by: Scott K Logan <logans@cottsay.net>
hidmic left a comment
Ability to build a `SHARED` library. Mimick is explicitly `STATIC`, which I don't really care about, but it doesn't build with PIC, so it forces downstream artifacts to also be `STATIC` (again, unless I'm missing something obvious). Maybe it's a technical limitation that it needs to be static - I'm not sure.
I wouldn't expect havoc if we flip -fPIC on, but I haven't tried. Perhaps I'm missing some obscure detail too.
Explicit initialization function. I need the "vital" functions to be initialized for the `mmk_allocator` to work. What I have right now is hacky - it would be nice if `mmk_init` was callable. Alternatively, we could submit the `mmk_allocator` upstream - it might be useful to others.
Pushing mmk_allocator into mimick sounds reasonable.
There are symbols getting defined when you include mimick.h. You can reproduce this by creating two empty cpp files that both `#include "mimick.h"` and try to compile them into the same executable or library. There is a symbol collision (only C++ - C works just fine).
I bet it's bad inlining within literal.h.
The headers getting installed don't appear to have the right hierarchy. I would expect `/opt/ros/rolling/include/mimick.h` and `/opt/ros/rolling/include/mimick/*.h` so that I could compile the Mimick sample as-is. Right now, we're doing `#include "mimick/mimick.h"`, where the sample makes it look like we should just `#include <mimick.h>`.
Hmm, not sure I follow you here. I can see #include "mimick/mimick.h" in the sample. This is something we did to please cpplint.
&std::remove_reference<decltype(*inst)>::type::func, \
std::remove_reference<decltype(*inst)>::type, \
__VA_ARGS__>, \
inst);
@cottsay meta: some of this machinery is already available at brawner/test_mocking_utils#1. This one's neat though, perhaps we can mix and match.
I don't have any plans of making test_mocking_utils official. So if you want to make use of the ideas and steal code, feel free to incorporate them here.
decltype(benchmark::MemoryManager::Result::max_bytes_used) max_bytes_used;
decltype(benchmark::MemoryManager::Result::num_allocs) num_allocs;
std::unique_ptr<std::unordered_set<void *, std::hash<void *>, std::equal_to<void *>,
  mmk_allocator<void *>>> ptr_set;
@cottsay why the extra level of indirection using std::unique_ptr?
I did that so that I could forward-declare mmk_allocator and avoid including "mimick.h" in this public API header.
I didn't know we'd forked Mimick. I was looking at the upstream: https://github.com/Snaipe/Mimick/blob/87e9898ebc9f1644c00df74dc1b34bf20391661b/sample/strdup/test.c#L1-L2
Oh, yes. We did. I should spend time submitting our changes upstream though...
Signed-off-by: Scott K Logan <logans@cottsay.net>
After showing that this approach works, this commit actually swaps the `PerformanceTestFixture` from using `osrf_testing_tools_cpp` to using Mimick.
Signed-off-by: Scott K Logan <logans@cottsay.net>
Signed-off-by: Scott K Logan <logans@cottsay.net>
Alright, this isn't working out to be quite as easy as I thought it would. In the main test case, stubbing the four memory functions works just fine, but it doesn't appear to work in all cases. I tried running the downstream benchmarks for other packages with this change, and the stubbing fails with

I tried a rather contrived change to iterate over each of the loaded libraries and specifically tried stubbing the memory functions in each one, but that seemed to be a little TOO aggressive. I had to specifically skip

I'm not really sure where to go from here. Maybe Mimick isn't suited to do what we want it to here.
There are several parts in this change, which when combined, yield a better mechanism for memory measurements than we're currently using in this package.
Here's a breakdown of the components:
- `MimickMemoryManager` implements the `benchmark::MemoryManager` interface by using Mimick to stub out `malloc`, `realloc`, and `free`. Since Mimick is making changes that affect the whole executable, it should be noted that you should NEVER try to use more than one instance of this class at the same time. Internally, Mimick uses un-stubbed variants of these allocation functions, which it refers to as "vital". The `MimickMemoryManager` uses these same functions to do the actual allocation operations before recording the relevant statistics. It should be noted that we are not currently using the `benchmark::MemoryManager` model for memory tracking in this package. We're collecting statistics in the test fixture and adding the results as "user counters". One of the biggest advantages of the approach presented here is that memory instrumentation isn't active during timing runs, whereas our current approach impacts allocation performance during timing runs quite substantially.
- `MimickPerformanceTest`: unlike the existing `PerformanceTest` fixture, this fixture isn't responsible for performing any memory recording. Instead, it's used to ensure that the `MimickMemoryManager` gets registered and unregistered for a particular run. From what I can tell, the Benchmark API makes it look like we're supposed to register the `benchmark::MemoryManager` before any of the tests run, meaning that all tests, whether they use fixtures or not, would have memory measurements enabled. This fixture essentially bends the API to act the way we're currently expecting memory measurements to be enabled. Honestly, because this new implementation should work on all of our supported platforms, I'm tempted to instead introduce a custom version of
`libbenchmark_main.so` that registers `MimickMemoryManager` for all of the tests, so everything always gets memory measurements and we don't have to use a fixture to turn it on. Unfortunately, that's not possible at the moment because the benchmark main library needs to be `SHARED`, and `libmimick.a` was specifically compiled `STATIC` without PIC. We'd need a change to Mimick for sure, but it might be doable.
- `MemoryAwareConsoleReporter`: the stock `benchmark::ConsoleReporter` implementation doesn't display any of the memory statistics. This isn't an enormous problem because the JSON reporter DOES include the memory statistics, so they'll get displayed in Jenkins anyway, but I think that seeing the statistics on the console is really important for local development. This implementation augments the run data, and if memory statistics are present, it fakes those statistics as user counters, which ARE displayed by `benchmark::ConsoleReporter`. The only way to plug in `MemoryAwareConsoleReporter` was to re-implement `BENCHMARK_MAIN` manually. I don't see too much of a problem doing this, since the process is very straightforward, but there is a downside: the command line arguments include modifiers for the behavior of `benchmark::ConsoleReporter`, and the flags that get parsed from those arguments aren't exposed in the Benchmark API. Since I'm instantiating my custom reporter class manually, and I don't have the flags that Benchmark parsed, I can't communicate them to the reporter. This isn't a big deal because the flags only seem to affect formatting, and we haven't used them in the past. I'm pretty sure we don't actually have a way to affect those command line arguments without hard-coding them into `ament_cmake_google_benchmark` anyway.
- `mmk_wrapper` and `mmk_stub_create_wrapped` are used to create a non-member function that uses a class instance passed through the Mimick stub's context to invoke a public member function on the instance.
There is some pretty thick template code here, but I used it as a learning exercise to generalize the concept I had originally implemented manually for each of the three functions we needed to stub.
- `mmk_allocator` is a simple C++ allocator that redirects allocation operations directly to the Mimick "vital" allocation methods, thereby ensuring that those operations aren't redirected to stub functions. I'm using this to make sure that the pointer map that's used for allocation size tracking doesn't affect the memory statistics when it grows.
- Since the new test uses the `add_performance_test` macro, it's running on all platforms. It's mostly a copy/paste from `benchmark_malloc_realloc`, but I modified it to run a third test where the memory measurements are forced on, so that we can see how much overhead `MimickMemoryManager` is causing. Keep in mind that registering a `benchmark::MemoryManager` is not the same as starting it. Normally, the Benchmark test runner specifically starts the `benchmark::MemoryManager` to do a small number of runs that are not timed. In this "forced" test, I'm turning the measurements on explicitly. That should never happen during typical use, but I wanted to see the performance.

I think we need to do two things before we can move forward here:
1. Decide whether we want to modify `benchmark::ConsoleReporter` to show the memory statistics, or if we're OK with the current solution based on `MemoryAwareConsoleReporter` and `memory_aware_benchmark_main`.
2. If we want to follow the model that `benchmark::MemoryManager` seems to be designed for, where the memory statistics are always collected, we'll need to figure out how to get Mimick to play nice with `SHARED` libraries. The fixture-based approach doesn't strictly require that because, unlike the `libbenchmark_main` replacement which must be `SHARED`, the fixture can be `STATIC`, but it might be nice to build the fixture as `SHARED` anyway.
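For reference, the `mmk_allocator` concept described above can be sketched as a minimal C++ allocator. Here plain `std::malloc`/`std::free` stand in for the Mimick "vital" functions, and `vital_allocator` is a hypothetical name, not the actual implementation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <set>

// Allocator that routes all container bookkeeping through a chosen pair
// of allocation functions, so it never hits the stubbed malloc/free.
template<typename T>
struct vital_allocator
{
  using value_type = T;

  vital_allocator() = default;
  template<typename U>
  vital_allocator(const vital_allocator<U> &) {}

  T * allocate(std::size_t n)
  {
    void * p = std::malloc(n * sizeof(T));  // would be the vital malloc
    if (p == nullptr) {
      throw std::bad_alloc();
    }
    return static_cast<T *>(p);
  }

  void deallocate(T * p, std::size_t)
  {
    std::free(p);  // would be the vital free
  }
};

template<typename T, typename U>
bool operator==(const vital_allocator<T> &, const vital_allocator<U> &) {return true;}
template<typename T, typename U>
bool operator!=(const vital_allocator<T> &, const vital_allocator<U> &) {return false;}
```

A tracking container would then be declared with the allocator as its last template argument, e.g. `std::set<void *, std::less<void *>, vital_allocator<void *>>`, so growing the tracking structure doesn't perturb the recorded statistics.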