prof/efa: Wide WQE support#11944

Open
alekswn wants to merge 7 commits into ofiwg:main from alekswn:wide-wqe-support
@alekswn (Contributor) commented Mar 3, 2026

Summary

Add support for 128-byte wide Work Queue Elements (WQE) in the EFA provider's efa-direct data path. This enables larger inline data for send and inject operations, and introduces inline RDMA write support when the device firmware advertises the capability via inline_buf_size_ex.

Problem

The EFA device currently uses 64-byte WQEs with a 32-byte inline data limit. Some workloads benefit from larger inline payloads to avoid memory registration overhead. New firmware exposes a wider 128-byte WQE format that enlarges the available inline data space (from 32 to 80 bytes, per the data path commit below) and enables inline RDMA write, which was previously unsupported.

Changes

  1. Configure (configure.m4): Add compile-time detection for the efadv_device_attr.inline_buf_size_ex field and the efadv_get_max_sq_depth API.

  2. fi_getinfo (efa_prov_info.c, efa_user_info.c): Advertise inline_buf_size_ex as max inject_size in provider info. Handle three cases in efa_user_info_alter_direct: no hint (default 32B), hint ≤ inline_buf_size (no change), hint > inline_buf_size (query efadv_get_max_sq_depth for reduced TX depth).

  3. QP creation (efa_base_ep.c): Use info->tx_attr->inject_size as max_inline_data when creating efa-direct QPs, allowing the device to allocate wider WQEs.

  4. Data path (efa_io_defs.h, efa_data_path_direct_entry.h, efa_data_path_direct_internal.h): Add efa_io_tx_wqe_128 and efa_io_rdma_req_128 structs. Update post_send, post_read, post_write to use the 128-byte WQE format. Use memcpy with sq->wq.wqe_size for WQE submission to LLQ.

  5. Inject RMA write (efa_rma.c, efa_data_path_ops.h): Add inline_data_list and use_inline parameters to efa_qp_post_write. Implement inline data copy in efa_data_path_direct_post_write. Restore fi_inject_write, fi_inject_writedata, and FI_INJECT support in fi_writemsg.

  6. EP inject sizes (efa_base_ep.c): Set inject_msg_size from info->tx_attr->inject_size. Enable inject_rma_size when inject_size exceeds inline_buf_size (wide WQE active).

  7. Tests: Add 6 unit tests for inject_size handling in fi_getinfo and fi_getopt. Add test_efa_rma_writemsg_with_inject. Skip inject RMA tests on hardware without wide WQE support. Fix double-free in test teardown.

Compatibility

The feature depends on an rdma-core API change (linux-rdma/rdma-core#1708) and on hardware support.

This change is compatible with any combination of new/old rdma-core and hardware.

The ./configure script detects whether the new rdma-core API is present. The rdma-core API itself ensures compatibility with old hardware.
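
As a rough illustration of the compile-time gating described above, the sketch below guards access to the extended field behind a feature macro. `SKETCH_HAVE_INLINE_BUF_SIZE_EX`, `struct sketch_efadv_attr`, and `sketch_max_inject_size` are hypothetical names; the macro that configure.m4 actually defines is not shown in this description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature macro; a configure check would define it to 1
 * when the rdma-core headers expose inline_buf_size_ex. */
#ifndef SKETCH_HAVE_INLINE_BUF_SIZE_EX
#define SKETCH_HAVE_INLINE_BUF_SIZE_EX 0
#endif

/* Stand-in for the device attributes, not the real efadv_device_attr. */
struct sketch_efadv_attr {
    size_t inline_buf_size;
#if SKETCH_HAVE_INLINE_BUF_SIZE_EX
    size_t inline_buf_size_ex;
#endif
};

/* Largest inject size a provider could advertise under these assumptions. */
size_t sketch_max_inject_size(const struct sketch_efadv_attr *attr)
{
    assert(attr != NULL);
#if SKETCH_HAVE_INLINE_BUF_SIZE_EX
    /* New API: prefer the extended limit when the device reports one. */
    if (attr->inline_buf_size_ex > attr->inline_buf_size)
        return attr->inline_buf_size_ex;
#endif
    /* Old rdma-core or old hardware: fall back to the regular limit. */
    return attr->inline_buf_size;
}
```

Built without the macro, the function only ever sees `inline_buf_size`, which is one way an old rdma-core build could keep the 32-byte behavior unchanged.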

Manual Testing on hardware with wide WQE support

  • Unit tests passed
  • Ran the fabtests selected by the following pytest expressions: test_rma, test_rdm, test_info.

@alekswn alekswn changed the title from "Wide wqe support" to "prof/efa: Wide WQE support" Mar 3, 2026
@alekswn (Contributor, Author) commented Mar 3, 2026

bot:aws:retest

Add configure check for inline_buf_size_ex field in efadv_device_attr
and update device initialization to query this field when available

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
@alekswn alekswn force-pushed the wide-wqe-support branch 5 times, most recently from 7f7dbf2 to 63a957c Compare March 6, 2026 07:06
@alekswn (Contributor, Author) commented Mar 6, 2026

bot:aws:retest

alekswn added 6 commits March 6, 2026 19:17
Add unit tests to verify inject_size handling in fi_getinfo and
fi_getopt for the wide WQE feature:

- test_info_direct_inject_size_no_hint: default 32-byte inject size
- test_info_direct_inject_size_small: requested size <= inline_buf_size
- test_info_direct_inject_size_wide_wqe: wide WQE with reduced TX depth
- test_info_direct_inject_size_exceeds_max: reject oversized requests
- test_ep_getopt_inject_size_regular_wqe: verify MSG/RMA inject sizes
- test_ep_getopt_inject_size_wide_wqe: verify wide WQE inject sizes

Also fix a double-free in efa_unit_test_resource_destruct by NULLing
info and hints pointers after freeing them.

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
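
The double-free fix mentioned in this commit follows a common pattern, sketched here with plain `malloc`/`free`. `struct sketch_resource` and `sketch_resource_destruct` are hypothetical stand-ins; the real teardown function operates on fi_info pointers rather than `int *`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the unit test resource. */
struct sketch_resource {
    int *info;
    int *hints;
};

/* Release both pointers and clear them so a repeated call is harmless. */
void sketch_resource_destruct(struct sketch_resource *res)
{
    assert(res != NULL);
    free(res->info);    /* free(NULL) is a no-op */
    res->info = NULL;   /* a second destruct now frees nothing */
    free(res->hints);
    res->hints = NULL;
}
```

Calling `sketch_resource_destruct` twice on the same object is safe in this sketch because the second call only sees NULL pointers.
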
Implement inject_size hint handling for wide WQE support in efa-direct:

- Advertise inline_buf_size_ex (when available) as the maximum
  inject_size in prov_info, so ofi_check_info allows larger hints.

- In efa_user_info_alter_direct, handle three cases:
  1. No hint: default inject_size to inline_buf_size (32 bytes)
  2. inject_size <= inline_buf_size: use requested size as-is
  3. inject_size > inline_buf_size: query actual TX queue depth
     via efadv_get_max_sq_depth and adjust tx_attr->size

When rdma-core lacks inline_buf_size_ex support, requests exceeding
inline_buf_size are rejected with -FI_ENODATA.

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
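
The three cases in this commit can be sketched as a small decision function. This is not the code of efa_user_info_alter_direct: the `sketch_*` types are hypothetical, `wide_sq_depth` stands in for whatever efadv_get_max_sq_depth would report, `-ENODATA` stands in for -FI_ENODATA, and rejecting hints above `inline_buf_size_ex` inside this function is an assumption (the description suggests ofi_check_info may catch those earlier).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical view of the device limits. */
struct sketch_dev {
    size_t inline_buf_size;    /* regular WQE limit (32 in the PR text) */
    size_t inline_buf_size_ex; /* wide WQE limit, 0 when unavailable */
    size_t wide_sq_depth;      /* stand-in for an efadv_get_max_sq_depth result */
};

struct sketch_tx_attr {
    size_t inject_size;
    size_t size; /* TX queue depth */
};

/* has_hint == 0 models "no hint". Returns 0 or a negative errno value. */
int sketch_alter_inject_size(const struct sketch_dev *dev, int has_hint,
                             size_t hint, struct sketch_tx_attr *tx)
{
    assert(dev != NULL && tx != NULL);

    if (!has_hint) {
        /* Case 1: default to the regular inline limit. */
        tx->inject_size = dev->inline_buf_size;
        return 0;
    }
    if (hint <= dev->inline_buf_size) {
        /* Case 2: fits a regular WQE, keep the request as-is. */
        tx->inject_size = hint;
        return 0;
    }
    /* Case 3: needs a wide WQE. Without inline_buf_size_ex (value 0),
     * every such request is rejected. */
    if (hint > dev->inline_buf_size_ex)
        return -ENODATA;
    tx->inject_size = hint;
    tx->size = dev->wide_sq_depth; /* wide WQEs reduce the TX depth */
    return 0;
}
```

Only the third case touches the TX depth, which matches the description's point that regular-WQE users see no change.
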
For efa-direct endpoints, use info->tx_attr->inject_size as the QP's
max_inline_data during creation. This allows the QP to use wide WQE
(128-byte) when a larger inject size is requested.

For efa RDM endpoints, continue using the device's inline_buf_size
since inject != inline in the RDM protocol path.

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
Add efa_io_tx_wqe_128 and efa_io_rdma_req_128 structs for the 128-byte
wide WQE format. The new structs extend inline data capacity from 32 to
80 bytes for both send and RDMA write operations.

Update data path direct post functions (send, read, write) to use
efa_io_tx_wqe_128 as the local stack variable. Since the 128-byte
format is backward compatible with the 64-byte format within the first
64 bytes, this is safe for both regular and wide WQE paths.

Update send_wr_post to copy wqe_size bytes (determined at QP creation)
instead of a hardcoded 64-byte copy, using memcpy.

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
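
A minimal sketch of the "build a 128-byte WQE on the stack, copy only wqe_size bytes" idea from this commit. The field layout of `struct sketch_wqe_128` is an assumption made so that the first 64 bytes match a 32-byte-header plus 32-byte-inline format; it is not the layout of efa_io_tx_wqe_128, and `sketch_post_inline` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_WQE_SIZE      64
#define SKETCH_WQE_SIZE_WIDE 128

/* Hypothetical layout: header, inline payload, padding. */
struct sketch_wqe_128 {
    uint8_t meta[32];
    uint8_t inline_data[80];
    uint8_t reserved[16];
};

_Static_assert(sizeof(struct sketch_wqe_128) == SKETCH_WQE_SIZE_WIDE,
               "wide WQE must be 128 bytes");

/* Build the WQE locally, then copy wqe_size bytes into the queue slot. */
void sketch_post_inline(uint8_t *slot, size_t wqe_size,
                        const void *data, size_t len)
{
    struct sketch_wqe_128 wqe;
    size_t capacity = (wqe_size == SKETCH_WQE_SIZE) ? 32 : 80;

    assert(wqe_size == SKETCH_WQE_SIZE || wqe_size == SKETCH_WQE_SIZE_WIDE);
    assert(data != NULL && len <= capacity);

    memset(&wqe, 0, sizeof(wqe));
    wqe.meta[0] = (uint8_t)len;          /* toy header: inline length only */
    memcpy(wqe.inline_data, data, len);
    memcpy(slot, &wqe, wqe_size);        /* 64 or 128, fixed at QP creation */
}
```

With `wqe_size` set to 64, bytes past the first 64 of the slot are left untouched, so the same function can serve both regular and wide queues under the assumed layout.
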
Add inline data support for RDMA write operations, enabling
fi_inject_write, fi_inject_writedata, and fi_writemsg with FI_INJECT
flag on efa-direct endpoints.

- Add inline_data_list and use_inline parameters to efa_qp_post_write
  and efa_data_path_direct_post_write signatures
- Implement inline data path in post_write: when use_inline is set,
  copy data into rdma_req.inline_data and set INLINE_MSG control flag
- Restore efa_rma_inject_write, efa_rma_inject_writedata, and
  FI_INJECT handling in efa_rma_post_write
- Construct inline_data_list and determine use_inline in efa_rma.c
  based on message length vs inject_rma_size
- Add test_efa_rma_writemsg_with_inject test
- Skip inject RMA tests on hardware without wide WQE support using
  fi_getopt(FI_OPT_INJECT_RMA_SIZE)
- Update all callers, test mocks, and stubs for new signatures
- Fix use-after-free in efa_unit_test_av.c for efa-direct path

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
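
The inline decision for RDMA write (message length versus inject_rma_size) might look roughly like the sketch below. `sketch_iov_len` and `sketch_try_inline` are hypothetical helpers, the caller is assumed to supply an inline buffer of at least `inject_rma_size` bytes, and what the provider does when the payload does not fit is left out because the description does not say.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Total payload length across an iovec array. */
size_t sketch_iov_len(const struct iovec *iov, size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++)
        total += iov[i].iov_len;
    return total;
}

/* Returns 1 and gathers the payload into inline_buf when it fits
 * inject_rma_size; returns 0 when the write cannot go inline. */
int sketch_try_inline(const struct iovec *iov, size_t count,
                      size_t inject_rma_size, unsigned char *inline_buf)
{
    size_t off = 0;

    assert(iov != NULL && inline_buf != NULL);
    /* inject_rma_size == 0 models hardware without inline RDMA write. */
    if (inject_rma_size == 0 || sketch_iov_len(iov, count) > inject_rma_size)
        return 0;
    for (size_t i = 0; i < count; i++) {
        memcpy(inline_buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    return 1;
}
```

Copying the payload at post time is what lets an inject call return with the source buffer immediately reusable.
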
Set inject_msg_size from info->tx_attr->inject_size instead of
hardcoding the device inline_buf_size. Enable inject_rma_size when
inject_size exceeds inline_buf_size, indicating wide WQE is active
and firmware supports inline RDMA write.

The RDM path already overrides both values at EP open time, so this
change only affects efa-direct endpoints.

Signed-off-by: Alexey Novikov <nalexey@amazon.com>
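
The rule in this commit can be written down in a few lines. `struct sketch_ep` and `sketch_set_inject_sizes` are hypothetical, and using `inject_size` as the enabled RMA limit (with 0 meaning disabled) is an assumption; the commit message only says that inject_rma_size is enabled when inject_size exceeds inline_buf_size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the endpoint state. */
struct sketch_ep {
    size_t inject_msg_size;
    size_t inject_rma_size;
};

void sketch_set_inject_sizes(struct sketch_ep *ep, size_t inject_size,
                             size_t inline_buf_size)
{
    assert(ep != NULL);
    ep->inject_msg_size = inject_size;
    /* Wide WQE is active only when inject_size exceeds the regular
     * inline limit; otherwise inline RDMA write stays disabled. */
    ep->inject_rma_size = inject_size > inline_buf_size ? inject_size : 0;
}
```
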
@alekswn alekswn force-pushed the wide-wqe-support branch from 63a957c to 22d6847 Compare March 6, 2026 19:17
@alekswn alekswn removed the evaluating label Mar 6, 2026
@alekswn alekswn marked this pull request as ready for review March 6, 2026 21:14
@alekswn alekswn requested a review from a team March 7, 2026 15:13