
OpenMP NUMA first touch replay does not exactly reproduce the original behavior #190

@mihailpopov


On NUMA systems, pages are allocated lazily under the first-touch policy: a page is mapped to the NUMA domain of the thread that first touches it. To replay codelets faithfully on NUMA systems, CERE must therefore map each page to the same domain as in the original run.

At replay, CERE touches the previously recorded pages with strncpy from within an OpenMP parallel region. Although this is more faithful to the original run than touching all the pages from a serial region of code, it does not always reproduce the original mapping.

We ran the following test to show that the current NUMA mapping is incorrect. We focus on the rhs parallel region of SP (OpenMP version) running over 4 NUMA nodes, and compare two versions. In the first, we use a first-touch file in which all the pages are touched by the same master thread. In the second, we unset CERE_FIRST_TOUCH so that all the pages are touched from a serial region. Since both versions should produce the same page mapping, they should have the same performance. Yet the first is 25% faster.

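One solution is to use libnuma. In particular, the function "numa_move_pages" moves pages to specific NUMA domains. When it is called with a NULL list of target nodes, it moves nothing and instead reports the domain where each page currently resides, so it can also be used to check the actual allocation of a page.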
