
[CI] Add E2E consistency tests with comprehensive data type coverage#35

Closed

mpb159753 wants to merge 10 commits into Ascend:main from mpb159753:feature/add_e2e_consistency_tests

Conversation

@mpb159753 (Contributor)

Test Case Design

Same as https://gitcode.com/Ascend/TransferQueue/pull/25
Only for review

This PR introduces a new E2E consistency test suite tests/test_e2e_consistency.py to comprehensively verify the read/write consistency of TransferQueue in complex data scenarios. The tests cover the following three core scenarios:

  1. Core Data Types (test_consistency_core_types)

    • Verifies system support for diverse data types, including:
      • Tensor: Standard, Non-contiguous, Special values (Inf/NaN), Bool.
      • NestedTensor: Ragged tensors (layout=torch.jagged).
      • Numpy: List of Arrays, Object Arrays (String/Mixed).
      • Python Built-ins: List[int], List[str].
  2. Multi-round Put & Field Merge (test_consistency_multi_round_put_get)

    • Verifies data consistency in multi-round, fragmented update scenarios.
    • Simulates production behavior: writing Standard Group fields first, then updating Complex Group fields.
    • Ensures robust shard merging logic by mixing all 9 complex data types in all update steps.
  3. Slicing and Field Subsetting (test_consistency_slicing_and_subset)

    • Verifies on-demand reading and slice access for specific fields in force_fetch mode.
    • Enhancement: Validates type preservation and data correctness for NestedTensor and Numpy Object in slicing scenarios.

Source Code Modifications

Client API Update: Modified TransferQueueClient.get_meta in transfer_queue/client.py to expose a mode parameter (default="fetch"), supporting force_fetch and insert modes in the synchronous client and aligning its capabilities with async_get_meta.
This allows the synchronous client to perform specialized operations such as force_fetch (used in inspection tests) and insert (used in test data allocation), which were previously available only in the async API.

Discussion

  • API Signature Consistency (/transfer_queue/client.py:L1044):
    • Currently, get_meta strictly follows the parameter order of async_get_meta, with mode located in the middle (data_fields, batch_size, partition_id, mode, ...).
    • Discussion: Should we move mode to the end of the parameter list to improve external interface compatibility, or should we prioritize consistency between the sync and async signatures?
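One way to sidestep the ordering question (a sketch, not the actual client code) is to make mode keyword-only, so its position in the signature stops mattering to positional callers:

```python
def get_meta(data_fields, batch_size, partition_id, *, mode="fetch"):
    """Illustrative signature only: with the bare `*`, `mode` must be
    passed by keyword, so reordering later parameters cannot break callers."""
    if mode not in ("fetch", "force_fetch", "insert"):
        raise ValueError(f"unknown mode: {mode}")
    return {"fields": data_fields, "batch": batch_size,
            "partition": partition_id, "mode": mode}

meta = get_meta(["obs"], 8, "p0", mode="force_fetch")
```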

@ascend-robot

CLA Signature Pass

mpb159753, thanks for your pull request. All authors of the commits have signed the CLA. 👍

@mpb159753 mpb159753 force-pushed the feature/add_e2e_consistency_tests branch from ed40bba to 845b85c on February 14, 2026 01:42

@0oshowero0 0oshowero0 requested a review from Copilot February 14, 2026 01:50
@0oshowero0 0oshowero0 changed the title Add E2E Consistency Tests with Comprehensive Data Type Coverage [CI] Add E2E consistency tests with comprehensive data type coverage Feb 14, 2026
```python
"""
E2E Lifecycle Consistency Tests for TransferQueue.

Implements all 5 scenarios from consistency_validation_plan.md:
```

Collaborator:

clean unnecessary AI doc

```python
sys.path.append(str(parent_dir))

from transfer_queue import (  # noqa: E402
    SimpleStorageUnit,
```

Collaborator:

Please import it from lower layers. The top-level namespace was cleaned up recently.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new end-to-end lifecycle consistency test module for TransferQueue, exercising read/write consistency across complex data types and key lifecycle operations (production/consumption/custom meta/reset/clear).

Changes:

  • Introduces an E2E test suite covering 5 lifecycle consistency scenarios.
  • Adds helpers to generate complex mixed-type TensorDict payloads and verify round-tripped correctness.
  • Validates partition lifecycle operations (custom meta persistence, reset consumption, and clear partition).


Comment on lines 507 to 530:

```python
custom_metadata = {}
for i in range(batch_size):
    custom_metadata[meta.global_indexes[i]] = {
        "score": float(i) / 10.0,
        "label": f"label_{i}",
        "tags": [f"tag_{i}_a", f"tag_{i}_b"],
    }
meta.update_custom_meta(custom_metadata)

# 3. Upload Custom Metadata
client.set_custom_meta(meta)

# 4. Retrieve Metadata and Verify Custom Meta
retrieved_meta = poll_for_meta(client, partition_id, fields, batch_size, task_name, mode="force_fetch")
assert retrieved_meta is not None, "Failed to retrieve metadata"

# Verify custom metadata content
retrieved_custom = retrieved_meta.get_all_custom_meta()
for global_idx, expected_meta in custom_metadata.items():
    assert global_idx in retrieved_custom, f"Missing custom_meta for index {global_idx}"
    actual = retrieved_custom[global_idx]
    assert actual["score"] == expected_meta["score"], f"Score mismatch at index {global_idx}"
    assert actual["label"] == expected_meta["label"], f"Label mismatch at index {global_idx}"
    assert actual["tags"] == expected_meta["tags"], f"Tags mismatch at index {global_idx}"
```

Copilot AI (Feb 14, 2026):

BatchMeta.update_custom_meta() expects a list of per-sample dicts (length == batch size, aligned with meta.global_indexes), but the test passes a dict keyed by global_index. This will raise at runtime (dict indexing with 0..N-1), and get_all_custom_meta() returns a list, not a dict keyed by global_index. Please build a custom_meta_list in batch order and compare against the returned list (or add a helper that returns a global_index->meta mapping and use it consistently).
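The reviewer's fix can be sketched as a small helper (the helper name is illustrative) that converts a {global_index: meta} mapping into the batch-ordered list that update_custom_meta() reportedly expects:

```python
def to_batch_ordered_list(meta_by_global_index, global_indexes):
    """Align a {global_index: custom_meta} mapping with the batch order
    given by global_indexes, producing one dict per sample."""
    return [meta_by_global_index[gi] for gi in global_indexes]

global_indexes = [7, 3, 11]
by_index = {gi: {"score": gi / 10.0} for gi in global_indexes}
custom_meta_list = to_batch_ordered_list(by_index, global_indexes)
```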
Comment on lines 374 to 388:

```python
# 6. Verify region 0-9: original Put A values
original_data_0_9 = generate_complex_data(list(range(0, 10)))
assert torch.allclose(full_data["tensor_f32"][:10], original_data_0_9["tensor_f32"]), (
    "Region 0-9 tensor_f32 should match original Put A"
)

# 7. Verify region 10-29: updated values (using offset indices 1010-1029)
updated_expected = generate_complex_data([i + 1000 for i in range(10, 30)])
assert torch.allclose(full_data["tensor_f32"][10:30], updated_expected["tensor_f32"]), (
    "Region 10-29 tensor_f32 should match updated values"
)

# 8. Verify region 30-39: original Put B values
original_data_30_39 = generate_complex_data(list(range(30, 40)))
assert torch.allclose(full_data["tensor_f32"][30:40], original_data_30_39["tensor_f32"]), (
```

Copilot AI (Feb 14, 2026):

This test assumes force_fetch returns metadata/data in a deterministic order that matches the original insertion order (e.g., slicing full_data["tensor_f32"][:10] to represent indices 0-9). In the current implementation, force_fetch uses IndexManager.get_indexes_for_partition(), which returns a list built from a set, so ordering is not guaranteed. Please sort/reorder the returned BatchMeta by global_indexes (or build an index->row mapping) before doing positional slicing assertions.

Suggested change:

```python
# The order of rows returned by force_fetch is not guaranteed.
# Sort by global_indexes so that position i corresponds to global index i.
sorted_order = np.argsort(full_meta.global_indexes)
tensor_f32_sorted = full_data["tensor_f32"][sorted_order]

# 6. Verify region 0-9: original Put A values
original_data_0_9 = generate_complex_data(list(range(0, 10)))
assert torch.allclose(tensor_f32_sorted[:10], original_data_0_9["tensor_f32"]), (
    "Region 0-9 tensor_f32 should match original Put A"
)

# 7. Verify region 10-29: updated values (using offset indices 1010-1029)
updated_expected = generate_complex_data([i + 1000 for i in range(10, 30)])
assert torch.allclose(tensor_f32_sorted[10:30], updated_expected["tensor_f32"]), (
    "Region 10-29 tensor_f32 should match updated values"
)

# 8. Verify region 30-39: original Put B values
original_data_30_39 = generate_complex_data(list(range(30, 40)))
assert torch.allclose(tensor_f32_sorted[30:40], original_data_30_39["tensor_f32"]), (
```
Comment on lines 603 to 628:

```python
# 1. Put Data
data = generate_complex_data(list(range(batch_size)))
client.put(data=data, partition_id=partition_id)

# 2. Verify Data Exists - production status should be True
is_ready = client.check_production_status(data_fields=fields, partition_id=partition_id)
assert is_ready, "Data should be ready after put"

# 3. Get Data to confirm accessibility
meta = poll_for_meta(client, partition_id, fields, batch_size, task_name, mode="force_fetch")
assert meta is not None and meta.size == batch_size, "Failed to poll metadata"

# 4. Verify partition exists before clear
partition_list_before = client.get_partition_list()
assert partition_id in partition_list_before, "Partition should exist before clear"

# 5. Clear Partition
client.clear_partition(partition_id)

# 6. Verify partition is removed from list
partition_list_after = client.get_partition_list()
assert partition_id not in partition_list_after, "Partition should be removed after clear"

# 7. Verify Production Status returns False for cleared partition
is_ready_after_clear = client.check_production_status(data_fields=fields, partition_id=partition_id)
assert not is_ready_after_clear, "Production status should be False after clear"
```

Copilot AI (Feb 14, 2026):

test_clear_partition doesn't have a try/finally cleanup. If an assertion fails before client.clear_partition(partition_id) runs, the partition can leak into later tests in this module (shared e2e_client) and cause cascading failures. Please wrap the body in try/finally and call clear_partition in the finally block.

Suggested change:

```python
try:
    # 1. Put Data
    data = generate_complex_data(list(range(batch_size)))
    client.put(data=data, partition_id=partition_id)

    # 2. Verify Data Exists - production status should be True
    is_ready = client.check_production_status(data_fields=fields, partition_id=partition_id)
    assert is_ready, "Data should be ready after put"

    # 3. Get Data to confirm accessibility
    meta = poll_for_meta(client, partition_id, fields, batch_size, task_name, mode="force_fetch")
    assert meta is not None and meta.size == batch_size, "Failed to poll metadata"

    # 4. Verify partition exists before clear
    partition_list_before = client.get_partition_list()
    assert partition_id in partition_list_before, "Partition should exist before clear"

    # 5. Clear Partition
    client.clear_partition(partition_id)

    # 6. Verify partition is removed from list
    partition_list_after = client.get_partition_list()
    assert partition_id not in partition_list_after, "Partition should be removed after clear"

    # 7. Verify Production Status returns False for cleared partition
    is_ready_after_clear = client.check_production_status(data_fields=fields, partition_id=partition_id)
    assert not is_ready_after_clear, "Production status should be False after clear"
finally:
    # Ensure partition is cleared even if assertions above fail
    try:
        client.clear_partition(partition_id)
    except Exception:
        # Best-effort cleanup; ignore errors during teardown
        pass
```
Comment on lines 74 to 98:

```python
"""Create a client with 2 storage units for lifecycle testing."""
controller_actor = TransferQueueController.options(
    name="lifecycle_controller",
    get_if_exists=True,
).remote(polling_mode=True)
controller_info = ray.get(controller_actor.get_zmq_server_info.remote())

# Two storage units to ensure sharding
storage_actor_1 = SimpleStorageUnit.options(
    name="lifecycle_storage_1",
    get_if_exists=True,
).remote(storage_unit_size=10000)
storage_info_1 = ray.get(storage_actor_1.get_zmq_server_info.remote())

storage_actor_2 = SimpleStorageUnit.options(
    name="lifecycle_storage_2",
    get_if_exists=True,
).remote(storage_unit_size=10000)
storage_info_2 = ray.get(storage_actor_2.get_zmq_server_info.remote())

client = TransferQueueClient(
    client_id="lifecycle_test_client",
    controller_info=controller_info,
)
```

Collaborator:

The initialization can now be simplified through transfer_queue.init(config). It also helps to test the initialization in e2e.

Comment on lines 133 to 139
# Nested Tensor (Strided) - fallback to jagged if not supported
try:
strided_tensors = [torch.full((3, 4), float(i)) for i in indices]
nested_strided = torch.nested.nested_tensor(strided_tensors, layout=torch.strided)
except (TypeError, RuntimeError):
strided_tensors = [torch.full((3, 4), float(i)) for i in indices]
nested_strided = torch.nested.as_nested_tensor(strided_tensors, layout=torch.jagged)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to make sure this is strided. No silent fallback

```python
"""Verify nested tensors element by element."""
if len(retrieved.unbind()) != len(expected.unbind()):
    return False
for r, e in zip(retrieved.unbind(), expected.unbind(), strict=False):
```

Collaborator:

strict=True?



```python
def verify_list_equal(retrieved, expected) -> bool:
    """Verify list content, handling possible Tensor conversion."""
```

Collaborator:

Why can it become a Tensor? It should not change the data type.

@mpb159753 (Contributor, Author):

TensorDict automatically converts Python list[int] / list[float] to torch.Tensor at construction time. For list[str] / list[dict], TensorDict stores them as NonTensorData with no type change.

(screenshot attached)

```python
# =============================================================================
# Scenario Two: Cross-Partition & Complex Update
# =============================================================================
def test_cross_partition_complex_update(e2e_client):
```

Collaborator:

The name is a little misleading - the partition here is not the same concept as a partition in TransferQueue.

Collaborator:

Consider changing it to shard?

Comment on lines 415 to 416:

```python
1. Round 1 Put: Indices 0-9, only Set_A fields -> Check production(Set_A)=True, production(Set_B)=False
2. Round 2 Put: Indices 0-9, complete Set_B fields -> Check production(Set_A+Set_B)=True
```

Collaborator:

These descriptions are not easy to understand.

Comment on lines 565 to 570:

```python
retrieved_data = client.get_data(meta)
assert retrieved_data.batch_size[0] == batch_size, "Retrieved data batch_size mismatch"

# 4. Post-Consumption Status Check - should be True
is_consumed_after = client.check_consumption_status(task_name=task_name, partition_id=partition_id)
assert is_consumed_after, "Data should be consumed after get_data"
```

Collaborator:

Is this logic correct? get_data will not trigger the consumption label; get_meta will.

```python
partition_list_after = client.get_partition_list()
assert partition_id not in partition_list_after, "Partition should be removed after clear"

# 7. Verify Production Status returns False for cleared partition
```

Collaborator:

Also need to check consumption status.

```python
assert not is_ready_after_clear, "Production status should be False after clear"


if __name__ == "__main__":
```

Collaborator:

And please also add additional tests for our high-level kv API provided in interface.py~

@0oshowero0 (Collaborator)

Great work!

- Added tests/test_e2e_consistency.py covering core types, multi-round puts, and slicing.
- Updated terminology to Standard/Complex groups.
- Verified cross-batch put scenarios.
- Fixed linter errors.

Signed-off-by: 看我72遍 <m.pb@msn.com>
Signed-off-by: 看我72遍 <m.pb@msn.com>
…ation

Signed-off-by: 看我72遍 <m.pb@msn.com>
Signed-off-by: 看我72遍 <m.pb@msn.com>
Signed-off-by: 看我72遍 <m.pb@msn.com>
…cleanup

Signed-off-by: 看我72遍 <m.pb@msn.com>
Signed-off-by: 看我72遍 <m.pb@msn.com>
…anup

- Add runtime assertion in generate_complex_data to ensure field_values
  keys exactly match DEFAULT_FIELDS, preventing silent field mismatches
- Build TensorDict via dict comprehension keyed by DEFAULT_FIELDS order
- Extract inline reorder logic into reusable _reorder_tensordict helper
- Remove redundant section separator comments (=== banners)
- Add missing assertions for tensor_bf16 and list_obj in core consistency
- Rename test_cross_partition_complex_update -> test_cross_shard_complex_update
- Improve verify_list_equal docstring with TensorDict conversion note

Signed-off-by: 看我72遍 <m.pb@msn.com>
@mpb159753 mpb159753 force-pushed the feature/add_e2e_consistency_tests branch from 845b85c to c447faf on February 16, 2026 09:27

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.



Comment on lines 301 to 302:

```python
assert verify_list_equal(retrieved_data["list_str"], original_data["list_str"]), "list_str mismatch"
assert verify_list_equal(retrieved_data["list_obj"], original_data["list_obj"]), "list_obj mismatch"
```

Copilot AI (Feb 25, 2026):

list_str and list_obj are written as non-tensor fields, but SimpleStorageManager.get_data() will materialize non-tensor batches as NonTensorStack (not torch.Tensor / a Python list). verify_list_equal() only converts torch.Tensor via .tolist(), so these assertions will compare a NonTensorStack to the original Python lists and likely fail. Consider switching these checks to verify_non_tensor_data() (or normalizing both sides via .tolist() when available).

Suggested change:

```python
assert verify_non_tensor_data(retrieved_data["list_str"], original_data["list_str"]), "list_str mismatch"
assert verify_non_tensor_data(retrieved_data["list_obj"], original_data["list_obj"]), "list_obj mismatch"
```

```python
# 7. Verify NumPy Arrays
assert np.allclose(retrieved_data["np_array"], original_data["np_array"]), "np_array mismatch"
assert np.array_equal(retrieved_data["np_obj"], original_data["np_obj"]), "np_obj mismatch"
```

Copilot AI (Feb 25, 2026):

np_obj is an object-dtype NumPy array; per-sample reads/writes through SimpleStorage will round-trip it as a non-tensor batch (typically a NonTensorStack of Python objects), not necessarily a NumPy array with the same dtype/shape. Using np.array_equal(retrieved_data["np_obj"], original_data["np_obj"]) is therefore unlikely to be a valid equality check. Normalize the retrieved value (e.g., .tolist()) and compare to original_data["np_obj"].tolist(), or store per-sample object arrays explicitly and assert against that representation.

Suggested change:

```python
original_np_obj_list = original_data["np_obj"].tolist()
retrieved_np_obj = retrieved_data["np_obj"]
if hasattr(retrieved_np_obj, "tolist"):
    retrieved_np_obj_list = retrieved_np_obj.tolist()
else:
    retrieved_np_obj_list = list(retrieved_np_obj)
assert retrieved_np_obj_list == original_np_obj_list, "np_obj mismatch"
```
Comment on lines 241 to 260:

```python
def _reorder_tensordict(td: TensorDict, order: list[int]) -> TensorDict:
    """Reorder a TensorDict by the given index order.

    Handles regular tensors, nested/jagged tensors, lists, and other indexable types.
    """
    reordered = {}
    for key in td.keys():
        field = td[key]
        if hasattr(field, "unbind"):
            items = field.unbind(0)
            reordered_items = [items[i] for i in order]
            try:
                reordered[key] = torch.stack(reordered_items)
            except RuntimeError:
                reordered[key] = torch.nested.as_nested_tensor(reordered_items, layout=field.layout)
        elif isinstance(field, list):
            reordered[key] = [field[i] for i in order]
        else:
            reordered[key] = field[torch.tensor(order)]
    return TensorDict(reordered, batch_size=td.batch_size)
```

Copilot AI (Feb 25, 2026):

_reorder_tensordict() treats any field with .unbind() as tensor-like and tries torch.stack(reordered_items). For non-tensor batches returned as NonTensorStack, .unbind() exists but the unbound items are not plain torch.Tensors, so torch.stack(...) can raise a TypeError (not caught here). This can break the cross-shard test when reordering is needed. Handle NonTensorStack explicitly (e.g., index it directly or reorder via .tolist()), and/or broaden the exception handling to include TypeError.
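A defensive variant of the reorder step, along the lines the review suggests (a sketch; routing non-tensor stacks through .tolist() is an assumption about their interface):

```python
def reorder_field(values, order):
    """Reorder one batch field by position without assuming it is a Tensor.

    Plain lists are re-indexed directly; objects exposing .tolist() but no
    tensor layout (e.g. non-tensor stacks) are converted to a list first;
    anything else is assumed to support fancy indexing with a position list.
    """
    if isinstance(values, list):
        return [values[i] for i in order]
    if hasattr(values, "tolist") and not hasattr(values, "layout"):
        items = values.tolist()
        return [items[i] for i in order]
    return values[order]

reordered = reorder_field(["a", "b", "c"], [2, 0, 1])
```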
Comment on lines 401 to 409:

```python
# 9. Verify new fields exist in update region
extended_fields = base_fields + ["new_extra_tensor", "new_extra_non_tensor"]
update_region_meta = poll_for_meta(
    client, partition_id, extended_fields, 20, "update_region_task", mode="force_fetch"
)
if update_region_meta is not None and update_region_meta.size > 0:
    update_region_data = client.get_data(update_region_meta)
    assert "new_extra_tensor" in update_region_data.keys(), "new_extra_tensor should exist"
    assert "new_extra_non_tensor" in update_region_data.keys(), "new_extra_non_tensor should exist"
```

Copilot AI (Feb 25, 2026):

In the cross-shard update test, the "verify new fields exist in update region" step doesn't actually ensure the returned batch corresponds to the updated global index range (10-29), and "new_extra_tensor" in update_region_data.keys() will always be true because those fields were requested. To make this meaningful, assert the returned global_indexes are within the update range (or select the exact updated samples) and validate the retrieved values of the new fields for those samples.

Suggested change:

```python
# 9. Verify new fields exist in update region and have correct values
extended_fields = base_fields + ["new_extra_tensor", "new_extra_non_tensor"]
update_region_meta = poll_for_meta(
    client, partition_id, extended_fields, 20, "update_region_task", mode="force_fetch"
)
if update_region_meta is not None and update_region_meta.size > 0:
    # Ensure we are actually reading from the updated global index range (10-29)
    assert all(
        10 <= gi < 30 for gi in update_region_meta.global_indexes
    ), f"Fetched indexes {list(update_region_meta.global_indexes)} are not within the update region 10-29"
    update_region_data = client.get_data(update_region_meta)

    # New fields should exist in the retrieved data
    assert "new_extra_tensor" in update_region_data.keys(), "new_extra_tensor should exist in update region data"
    assert "new_extra_non_tensor" in update_region_data.keys(), "new_extra_non_tensor should exist in update region data"

    # Map fetched global indexes back to positions within the update_data tensor dict
    fetched_global_indexes = list(update_region_meta.global_indexes)
    relative_update_positions = [idx_update.index(gi) for gi in fetched_global_indexes]

    # Validate tensor field values for the updated samples
    expected_new_tensor = update_data["new_extra_tensor"][relative_update_positions]
    actual_new_tensor = update_region_data["new_extra_tensor"]
    assert torch.allclose(actual_new_tensor, expected_new_tensor), (
        "new_extra_tensor values in update region do not match updated data"
    )

    # Validate non-tensor field values for the updated samples
    expected_new_non_tensor = [update_data["new_extra_non_tensor"][i] for i in relative_update_positions]
    actual_new_non_tensor = list(update_region_data["new_extra_non_tensor"])
    assert actual_new_non_tensor == expected_new_non_tensor, (
        "new_extra_non_tensor values in update region do not match updated data"
    )
```
Comment on lines 52 to 55:

```python
if not ray.is_initialized():
    ray.init(ignore_reinit_error=True)
yield
if ray.is_initialized():
```

Copilot AI (Feb 25, 2026):

The module-level Ray fixture always calls ray.shutdown() at teardown, even if Ray was already initialized by another test/fixture in the same worker. This can cause cross-module interference. Consider tracking whether this fixture performed the init and only shutting down in that case, and also setting a dedicated namespace (as done in other E2E tests) to avoid actor-name collisions.

Suggested change:

```python
did_init = False
if not ray.is_initialized():
    ray.init(
        ignore_reinit_error=True,
        namespace="transfer_queue_e2e_lifecycle_consistency",
    )
    did_init = True
yield
if did_init and ray.is_initialized():
```

Signed-off-by: 看我72遍 <m.pb@msn.com>
@mpb159753 mpb159753 force-pushed the feature/add_e2e_consistency_tests branch from 2eaf7dc to 0ac69a2 on February 25, 2026 07:13

@ascend-robot

CLA Signature Guide

@mpb159753 , thanks for your pull request.

The following commit(s) are not associated with a signed Contributor License Agreement (CLA).

Commit: 3af1a97c (Merge branch 'Ascend:main' into ...)
Reason: the email used in the commit is not linked to a signed CLA. Please verify that it matches the email you used when signing the CLA.

To sign CLA, click here.

To check if your email is configured correctly, refer to the FAQs.

Once you've signed the CLA or updated your email, please comment /check-cla to revalidate CLA status.

@0oshowero0 0oshowero0 closed this Feb 26, 2026
@0oshowero0 (Collaborator)

Closed as already merged in GitCode: https://gitcode.com/Ascend/TransferQueue/pull/25
