OFI MR flags selection based on the provider #1077
wrrobin wants to merge 5 commits into Sandia-OpenSHMEM:main
What is MR mode?
From: Md Rahman ***@***.***>
Date: Thursday, December 1, 2022 at 6:06 PM
To: Sandia-OpenSHMEM/SOS ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [Sandia-OpenSHMEM/SOS] OFI MR flags selection based on the provider (PR #1077)
For most OFI providers, SOS uses a specific MR mode selection. However, the user can still select a different mode that the provider may not support. This PR attempts to select the MR mode based on the provider by default. The user may still choose a different mode at configure time.
Commit Summary
* e0be06f Removed deprecated flag; added trial MR hint based on provider
* f3a0540 Initial changes to adopt provider based MR flag selection
File Changes
(4 files<https://github.com/Sandia-OpenSHMEM/SOS/pull/1077/files>)
* M configure.ac<https://github.com/Sandia-OpenSHMEM/SOS/pull/1077/files#diff-49473dca262eeab3b4a43002adb08b4db31020d190caaad1594b47f1d5daa810> (8)
* M src/shmem_env_defs.h<https://github.com/Sandia-OpenSHMEM/SOS/pull/1077/files#diff-f68772abccc3858c31d45e1eb92a77ae0f449462ae2864ff76a9e37a29e89726> (2)
* M src/transport_ofi.c<https://github.com/Sandia-OpenSHMEM/SOS/pull/1077/files#diff-3cac951d73790a8a9ce2e3647758bc66cd9fc5d4c3243173fee89846df3d2ce3> (222)
* M src/transport_ofi.h<https://github.com/Sandia-OpenSHMEM/SOS/pull/1077/files#diff-d085d3b6407317efe43e401b2c469a5edf6c802fb513363ddd5043f10781817d> (96)
Patch Links:
* https://github.com/Sandia-OpenSHMEM/SOS/pull/1077.patch
* https://github.com/Sandia-OpenSHMEM/SOS/pull/1077.diff
@stewartl318 - Great question. In OFI memory registration, certain flags define how registration behaves. For example, OFI can choose memory region attributes internally, or the user can choose them. These settings are controlled by the MR mode flags. Traditionally, there were two modes: basic and scalable (https://ofiwg.github.io/libfabric/v1.1.1/man/fi_mr.3.html). Now there are additional flags, and each provider supports or requires a subset of them (https://github.com/ofiwg/libfabric/wiki/Provider-Feature-Matrix-Main).
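To make the idea of provider-based selection concrete, here is a minimal sketch of how a default MR mode could be looked up by provider name while still honoring an explicit user choice. Everything in it is an assumption for illustration: the `SKETCH_MR_*` bits are local stand-ins for libfabric's `FI_MR_*` mode bits (not their real values), the provider names in the table are made up, and `sketch_select_mr_mode` is a hypothetical helper, not a function from this PR. The one grounded detail is that the fi_mr(3) man page describes basic mode as equivalent to the virtual-address, allocated, and provider-key bits together.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for libfabric's FI_MR_* mode bits. The values are
 * arbitrary; real code would take them from <rdma/fabric.h>. */
#define SKETCH_MR_VIRT_ADDR (1u << 0)
#define SKETCH_MR_ALLOCATED (1u << 1)
#define SKETCH_MR_PROV_KEY  (1u << 2)
#define SKETCH_MR_ENDPOINT  (1u << 3)

/* Basic mode expressed as the three bits it is documented to imply. */
#define SKETCH_MR_BASIC_EQUIV \
    (SKETCH_MR_VIRT_ADDR | SKETCH_MR_ALLOCATED | SKETCH_MR_PROV_KEY)

struct sketch_provider_default {
    const char *name;     /* provider name as reported by the fabric query */
    uint32_t    mr_mode;  /* default MR mode bits for that provider */
};

/* Hypothetical table: names and flag choices are illustrative only and
 * do not describe any real provider. */
static const struct sketch_provider_default sketch_defaults[] = {
    { "prov_basic",    SKETCH_MR_BASIC_EQUIV },
    { "prov_endpoint", SKETCH_MR_BASIC_EQUIV | SKETCH_MR_ENDPOINT },
    { "prov_scalable", 0 },
};

/* Return the MR mode to request. An explicit user choice always wins;
 * otherwise fall back to the per-provider default, and to 0 (no extra
 * mode bits, i.e. scalable-like behavior) for unknown providers. */
static uint32_t sketch_select_mr_mode(const char *provider,
                                      int has_user_choice,
                                      uint32_t user_mode)
{
    size_t i;
    size_t n = sizeof(sketch_defaults) / sizeof(sketch_defaults[0]);

    if (has_user_choice)
        return user_mode;

    for (i = 0; i < n; i++) {
        if (provider != NULL && strcmp(provider, sketch_defaults[i].name) == 0)
            return sketch_defaults[i].mr_mode;
    }
    return 0;
}
```

In this sketch the configure-time option would only set `has_user_choice` and `user_mode`; leaving it unset lets the table decide, which is the behavior the PR description calls the default.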
  AC_ARG_ENABLE([ofi-mr],
    [AC_HELP_STRING([--enable-ofi-mr=MODE],
-     [OFI memory registration mode: basic, scalable, or rma-event (default: scalable)])])
+     [OFI memory registration mode: none, basic, scalable, or rma-event (default: none)])])
Should "none" be called "auto" since it selects MR flags based on the provider?
    [AC_DEFINE([ENABLE_MR_NONE], [1], [If defined, the OFI transport will use MR mode based on provider])],
    [basic],
    [],
    [AC_DEFINE([ENABLE_MR_BASIC], [1], [If defined, the OFI transport will use FI_MR_BASIC])],
Is ENABLE_MR_BASIC used anywhere? Is it needed?
    ret = fi_mr_reg(shmem_transport_ofi_domainfd, 0, UINT64_MAX,
                    FI_REMOTE_READ | FI_REMOTE_WRITE, 0, 0ULL, flags,
                    &shmem_transport_ofi_target_heap_mrfd, NULL);
    OFI_CHECK_RETURN_STR(ret, "target memory (all) registration failed");
I'm confused about why virtual addressing now registers the heap. Does it not need the data segment?
#if defined(ENABLE_REMOTE_VIRTUAL_ADDRESSING)
    if (shmem_transport_ofi_mr_mode != 0) {
        ret = fi_close(&shmem_transport_ofi_target_data_mrfd->fid);
I don't see a corresponding mr_reg for shmem_transport_ofi_target_data_mrfd if only ENABLE_REMOTE_VIRTUAL_ADDRESSING is enabled...
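One way to keep teardown safe regardless of which `#if` branches ran is to close only the handles that were actually registered. The following is a minimal sketch of that pattern, not code from this PR: `struct sketch_mr`, `sketch_mr_reg`, and `sketch_mr_close` are hypothetical stand-ins for the libfabric handle and calls, used only to show the pairing.

```c
#include <stddef.h>

/* Hypothetical stand-in for a registered memory region handle. */
struct sketch_mr {
    int open;
};

/* Number of regions currently registered, to make leaks visible. */
static int sketch_open_count = 0;

/* Pretend to register a region and hand back its handle. */
static int sketch_mr_reg(struct sketch_mr *slot, struct sketch_mr **out)
{
    slot->open = 1;
    sketch_open_count++;
    *out = slot;
    return 0;
}

/* Close a handle only if it was registered. A NULL handle means the
 * matching registration never happened, so there is nothing to do. */
static int sketch_mr_close(struct sketch_mr **mr)
{
    if (*mr == NULL)
        return 0;
    (*mr)->open = 0;
    sketch_open_count--;
    *mr = NULL;  /* make a second close a harmless no-op */
    return 0;
}
```

With handles initialized to `NULL`, a configuration that registers only the heap can still call the close routine on both the heap and data handles without dereferencing an unregistered one.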
@@ -1157,20 +1158,10 @@ int query_for_fabric(struct fabric_info *info)
     hints.addr_format = FI_FORMAT_UNSPEC;
     domain_attr.data_progress = FI_PROGRESS_AUTO;
Just a note that #1059 possibly motivates a guard warning about and/or preventing the usage of FI_MR_ENDPOINT (and thus FI_PROGRESS_MANUAL) except for providers where it makes sense. But come to think of it, maybe this PR already accomplishes that?