Skip to content

OFI MR flags selection based on the provider #1077

Open
wrrobin wants to merge 5 commits intoSandia-OpenSHMEM:mainfrom
wrrobin:pr/no-mr-config
Open

OFI MR flags selection based on the provider #1077
wrrobin wants to merge 5 commits intoSandia-OpenSHMEM:mainfrom
wrrobin:pr/no-mr-config

Conversation

@wrrobin
Copy link
Collaborator

@wrrobin wrrobin commented Dec 1, 2022

For most of the OFI providers, SOS uses a specific MR mode selection. However, user still has the option to select a different mode that may not be supported. This is an attempt to select the MR mode based on the provider by default. User may still choose a different mode during configuration.

@stewartl318
Copy link
Collaborator

stewartl318 commented Dec 2, 2022 via email

@wrrobin
Copy link
Collaborator Author

wrrobin commented Dec 2, 2022

@stewartl318 - Great question. In OFI memory registration, there are certain flags that define the behavior of the operations. For example, OFI internally can choose memory region attributes v/s. the user can choose by themselves. These settings are controlled by different MR mode flags. Traditionally, there were two modes: basic and scalable (https://ofiwg.github.io/libfabric/v1.1.1/man/fi_mr.3.html). But, now, there are additional flags that you can use and each provider supports / requires a subset of them (https://github.com/ofiwg/libfabric/wiki/Provider-Feature-Matrix-Main).

AC_ARG_ENABLE([ofi-mr],
[AC_HELP_STRING([--enable-ofi-mr=MODE],
[OFI memory registration mode: basic, scalable, or rma-event (default: scalable)])])
[OFI memory registration mode: none, basic, scalable, or rma-event (default: none)])])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "none" be called "auto" since it selects MR flags based on the provider?

[AC_DEFINE([ENABLE_MR_NONE], [1], [If defined, the OFI transport will use MR mode based on provider])],
[basic],
[],
[AC_DEFINE([ENABLE_MR_BASIC], [1], [If defined, the OFI transport will use FI_MR_BASIC])],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ENABLE_MR_BASIC used anywhere? Is it needed?

ret = fi_mr_reg(shmem_transport_ofi_domainfd, 0, UINT64_MAX,
FI_REMOTE_READ | FI_REMOTE_WRITE, 0, 0ULL, flags,
&shmem_transport_ofi_target_heap_mrfd, NULL);
OFI_CHECK_RETURN_STR(ret, "target memory (all) registration failed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused why virtual addressing now registers the heap now. Does it not need the data segment?


#if defined(ENABLE_REMOTE_VIRTUAL_ADDRESSING)
if (shmem_transport_ofi_mr_mode != 0) {
ret = fi_close(&shmem_transport_ofi_target_data_mrfd->fid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a corresponding mr_reg for shmem_transport_ofi_target_data_mrfd if only ENABLE_REMOTE_VIRTUAL_ADDRESSING is enabled...

@davidozog davidozog added this to the 1.5.x milestone Feb 1, 2023
@@ -1157,20 +1158,10 @@ int query_for_fabric(struct fabric_info *info)
hints.addr_format = FI_FORMAT_UNSPEC;
domain_attr.data_progress = FI_PROGRESS_AUTO;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that #1059 possibly motivates a guard warning about and/or preventing the usage of FI_MR_ENDPOINT (and thus FI_PROGRESS_MANUAL) except for providers where it makes sense. But come to think of it, maybe this PR already accomplishes that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants