- Darshan Dinesh Kumar (dd3888)
The end of Moore's Law necessitates innovative solutions for improving application performance, rather than attempting to pack more transistors onto a chip or increase the CPU frequency. In this direction, multicore and multiprocessor systems are now ubiquitous and are ideal candidates for improving performance, as they enable the execution of parallel, multithreaded programs. However, these parallel applications are often held back by the memory allocator, which can throttle performance and scalability. The allocator can also introduce problems such as false sharing and fragmentation, which can become considerable performance bottlenecks. This project presents a scalable memory allocator that can be used with such parallel applications (including those developed with OpenMP). It introduces per-thread heaps with memory ownership, a design that scales efficiently while almost eliminating false sharing and minimizing fragmentation. The results generated with the developed OpenMP benchmark programs are promising: the proposed allocator exhibits considerable improvements in scalability, false-sharing avoidance, and fragmentation compared to the malloc and Hoard memory allocators.
- All the Source Code is placed in the src directory
- The src directory includes the 3 implemented incremental versions of the memory allocator organized as nested directories as follows:
- 1_Sequential_Memory_Allocator: This corresponds to the single-threaded-only implementation
- 2_Concurrent_Memory_Allocator: This corresponds to the Serial Single Heap with global lock implementation
- 3_Concurrent_Scalable_Memory_Allocator: This corresponds to the finalized implementation with per-thread heaps
- Each of the above 3 directories contains the following files:
- custom_mem_alloc.h: A header file exposing the interface of the required functions
- custom_mem_alloc.cpp: The implementation file where all the functionalities are implemented
- client_mem_alloc.cpp: A Sanity Client file that verifies the behavior of the implemented functionalities
- Makefile: The Makefile to build the executable for a particular version of the memory allocator
- The steps to execute the Sanity Client for each of the above implementations are as follows (an example session appears after the list):
- `cd` into the directory corresponding to the version that you need to verify
- Execute the `make clean` command to remove the previous object files and executable
- Execute the `make` command to generate the executable for the Sanity Client linked with that specific implementation of the memory allocator
- Execute the `./<executable_name>` command to run the executable, where <executable_name> refers to the name of the executable generated by the above Makefile
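Put together, a typical session for the finalized version might look like the following; the exact executable name depends on what the Makefile produces, so it is left as a placeholder.

```sh
# Build and run the Sanity Client for the per-thread-heap allocator version
cd src/3_Concurrent_Scalable_Memory_Allocator
make clean            # remove previous object files and the executable
make                  # build the Sanity Client linked with this allocator version
./<executable_name>   # substitute the executable name produced by the Makefile
```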
- Note that the 3_Concurrent_Scalable_Memory_Allocator version is the final version used for the benchmark experiments and for generating the reported results
- All the Benchmark programs used for experimentation are placed within the Benchmark_Experiments directory
- This directory is further organized into the following directories:
- comparison_testing_malloc: Benchmark testing for the malloc memory allocator
- comparison_testing_hoard: Benchmark testing for the hoard memory allocator
- comparison_testing_my_mem_alloc: Benchmark testing for the my_mem_alloc memory allocator which corresponds to the 3_Concurrent_Scalable_Memory_Allocator version implemented as part of this project
- Each of the above comparison_testing directories is organized as follows:
- 1_Speed: Contains the benchmarks for the Speed metric
- 2_Scalability: Contains the benchmarks for the Scalability metric
- 3_False_Sharing: Contains the benchmarks for the False Sharing metric
- 4_Fragmentation: Contains the benchmarks for the Fragmentation metric
- Note that when running the Scalability benchmark, the number of threads must be passed as a command-line argument
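For example, assuming the thread count is passed as the first command-line argument (the executable name below is a placeholder):

```sh
# Run the Scalability benchmark with 8 threads
./<scalability_executable> 8
```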
- In order to run the benchmarks for the malloc memory allocator, follow the steps given below (a consolidated example appears after the list):
- Compile the benchmark program using the command `g++ -fopenmp <benchmark_program>`
- Run the generated executable using the command `./<executable_name>`
- While running the Scalability benchmark, the number of threads needs to be passed as a command-line argument
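Put together, a session for one of the malloc benchmarks might look like the following; the benchmark file name is a placeholder, and the 1_Speed directory is used purely as an example.

```sh
# Compile and run a benchmark against the system malloc
cd Benchmark_Experiments/comparison_testing_malloc/1_Speed
g++ -fopenmp <benchmark_program>.cpp   # without -o, g++ names the executable a.out
./a.out                                # for the Scalability benchmark (in 2_Scalability), also pass the thread count, e.g. ./a.out 8
```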
- In order to run the benchmarks for the hoard memory allocator, follow the steps given below (a full example session appears after the list):
- Clone the source code for Hoard using the command `git clone https://github.com/emeryberger/Hoard`
- `cd` into the src directory of the cloned repository
- Run the `make` command to generate the libhoard.so shared object
- Run the command `export LD_LIBRARY_PATH=<path_to_so>:$LD_LIBRARY_PATH`
- Compile the benchmark program using the command `g++ -c -fopenmp <benchmark_program>` to generate the .o file
- Link the .o with the .so using the command `g++ -fopenmp <benchmark_program>.o -L<path_to_so> -lhoard`
- Run the generated executable using the command `./<executable_name>`
- While running the Scalability benchmark, the number of threads needs to be passed as a command-line argument
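As a consolidated example, with the benchmark file name and paths as placeholders (the exact location of libhoard.so depends on where the repository is cloned):

```sh
# Build libhoard.so from the Hoard sources
git clone https://github.com/emeryberger/Hoard
cd Hoard/src
make                                              # generates libhoard.so
export LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH    # let the loader find libhoard.so at run time

# Compile the benchmark, link it against Hoard, and run it
cd <path_to_benchmark_directory>
g++ -c -fopenmp <benchmark_program>.cpp
g++ -fopenmp <benchmark_program>.o -L<path_to_so> -lhoard   # without -o, the executable is named a.out
./a.out                                           # pass the thread count when running the Scalability benchmark
```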
- In order to run the benchmarks for the my_mem_alloc memory allocator, follow the steps given below (a consolidated example appears after the list):
- Ensure that the custom_mem_alloc.h from the 3_Concurrent_Scalable_Memory_Allocator directory is placed in the same directory as the benchmark program, or change the include path in the benchmark program accordingly
- Copy the custom_mem_alloc.o generated using make in 3_Concurrent_Scalable_Memory_Allocator to the directory containing the benchmark program
- Compile the benchmark program using the command `g++ -c -fopenmp <benchmark_program>`
- Link the .o files using the command `g++ -fopenmp <benchmark_program>.o custom_mem_alloc.o`
- Run the generated executable using the command `./<executable_name>`
- While running the Scalability benchmark, the number of threads needs to be passed as a command-line argument
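A consolidated example under the same assumptions (the benchmark file name and the path back to the repository's src directory are placeholders, and custom_mem_alloc.o has already been built with make in 3_Concurrent_Scalable_Memory_Allocator):

```sh
# Place the custom allocator's header and object file next to the benchmark program
cp <path_to_repo>/src/3_Concurrent_Scalable_Memory_Allocator/custom_mem_alloc.h .
cp <path_to_repo>/src/3_Concurrent_Scalable_Memory_Allocator/custom_mem_alloc.o .

# Compile the benchmark, link it against the custom allocator, and run it
g++ -c -fopenmp <benchmark_program>.cpp
g++ -fopenmp <benchmark_program>.o custom_mem_alloc.o   # without -o, the executable is named a.out
./a.out                                                 # pass the thread count when running the Scalability benchmark
```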
- The details regarding the implementation are included as part of the Project Report titled Multicore_Project_Report_dd3888_Darshan_Dinesh_Kumar.pdf