
[perf] Improve performance for putting jagged tensor#36

Merged
0oshowero0 merged 8 commits into Ascend:main from 0oshowero0:jagged_tensor
Feb 25, 2026

Conversation

0oshowero0 (Collaborator) commented Feb 25, 2026

Background

When users input a TensorDict containing jagged tensors (nested tensors), the put_data process becomes extremely slow.

Specifically, the _filter_storage_data function uses itemgetter(*batch_indexes)(data[fname]) to extract individual items from each tensor in the TensorDict. This indexing approach works efficiently for strided tensors but is extremely inefficient for jagged tensors.

Root Cause

For jagged tensors, itemgetter with multiple batch indexes performs one indexing operation per index, and each such access costs $\mathcal{O}(n)$. Extracting $k$ samples therefore costs $\mathcal{O}(k \cdot n)$, which degenerates to $\mathcal{O}(n^2)$ when the number of extracted samples grows with the batch size.

Solution

We unbind the nested tensor once up front, then read each sample from the resulting tuple of per-sample views.

  # unbind nested tensor
  results: dict = {}
  for field in sorted(data.keys()):
      field_data = data[field]
      if isinstance(field_data, Tensor) and field_data.is_nested:
          results[field] = field_data.unbind()
      else:
          results[field] = field_data

Simple Reproduction Script

  import torch
  import time
  from operator import itemgetter

  # Create a jagged tensor with 1000 variable-length samples
  lengths = torch.randint(10, 50, (1000,))
  offsets = torch.cat([torch.zeros(1, dtype=torch.long), lengths.cumsum(0)])
  values = torch.randn(int(offsets[-1]), 128)
  jagged = torch.nested.as_nested_tensor(
      [values[offsets[i]:offsets[i + 1]] for i in range(1000)],
      layout=torch.jagged
  )

  batch_indexes = list(range(0, 1000, 10))  # 100 indexes

  # Method 1: Direct itemgetter on jagged tensor (SLOW)
  start = time.perf_counter()
  result = itemgetter(*batch_indexes)(jagged)
  print(f"Direct itemgetter: {(time.perf_counter() - start)*1000:.2f} ms")

  # Method 2: Unbind first, then itemgetter (FAST)
  start = time.perf_counter()
  field_list = jagged.unbind()
  result = itemgetter(*batch_indexes)(field_list)
  print(f"Unbind + itemgetter: {(time.perf_counter() - start)*1000:.2f} ms")

Output:

Direct itemgetter: 150.94 ms
Unbind + itemgetter: 1.80 ms

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
Copilot AI review requested due to automatic review settings February 25, 2026 02:56
@ascend-robot

CLA Signature Pass

0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍

Copilot AI (Contributor) left a comment

Pull request overview

This PR targets a performance bottleneck when put_data processes TensorDict fields backed by jagged (nested) tensors by avoiding repeated expensive multi-indexing on jagged tensors.

Changes:

  • Optimize _filter_storage_data to unbind jagged tensors before applying itemgetter over multiple batch indexes.
  • Add a note in KVStorageManager._generate_values indicating a similar potential optimization for jagged tensors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • transfer_queue/storage/managers/simple_backend_manager.py: Adds a jagged-tensor fast path in _filter_storage_data by unbinding before multi-index selection.
  • transfer_queue/storage/managers/base.py: Adds a TODO note in _generate_values related to jagged tensor handling.



Copilot AI (Contributor) left a comment

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment on lines 205 to 214
  # unbind jagged tensor
  results: dict = {}
  for field in sorted(data.keys()):
      field_data = data[field]

      # For jagged tensors, unbind() first to accelerate indexing process
      if isinstance(field_data, Tensor) and field_data.layout == torch.jagged:
          results[field] = field_data.unbind()
      else:
          results[field] = field_data
Copilot AI commented Feb 25, 2026
This change introduces a jagged-tensor fast path (pre-unbind before indexing), but there’s no test exercising put_data with layout=torch.jagged. Adding a unit test that uses a jagged tensor field and asserts the data sent to _put_to_single_storage_unit matches expected samples would prevent regressions (and ensure the performance fix stays wired in).


Copilot AI (Contributor) left a comment

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines 205 to 215 (the same jagged-tensor fast path quoted above)
Copilot AI commented Feb 25, 2026
This adds a new jagged-tensor fast path (unbind() before indexing), but there isn't a unit test exercising it. Consider extending the existing tests/test_async_simple_storage_manager.py::test_async_storage_manager_mock_operations to include a layout=torch.jagged nested tensor and assert unbind() is called and that _put_to_single_storage_unit receives the expected sliced items.



@0oshowero0 0oshowero0 merged commit 73ed4c9 into Ascend:main Feb 25, 2026
5 checks passed