[feat] Introduce high-level key-value (KV) interface #28
0oshowero0 wants to merge 35 commits into Ascend:main
Conversation
Pull request overview
This pull request adds a Key-Value (KV) adapter API to TransferQueue, enabling users to interact with data using string keys instead of BatchMeta objects and global indexes. The feature provides both synchronous and asynchronous APIs for putting, getting, listing, and clearing key-value pairs.
Changes:
- Adds KV interface API with `kv_put`, `kv_batch_put`, `kv_get`, `kv_list`, `kv_clear` and their async variants
- Modifies the `BatchMeta.update_custom_meta()` API from dict-based (indexed by global_index) to list-based (indexed by position); a sketch follows this list
- Adds `keys_mapping` and `revert_keys_mapping` to DataPartitionStatus for key-to-index translation
- Adds new ZMQ request types (`KV_RETRIEVE_KEYS`, `KV_LIST`) for KV operations
- Includes comprehensive test coverage for the new KV interface
- Adds tutorial files demonstrating custom samplers, controller features, and streaming data loading
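To make the `update_custom_meta()` change concrete, a minimal hypothetical sketch (the variable, key names, and values here are illustrative, not taken from the PR):

```python
# Before this PR: custom_meta keyed by global_index (dict[int, dict]).
batch_meta.update_custom_meta({101: {"source": "web"}, 102: {"source": "wiki"}})

# After this PR: custom_meta is a list indexed by position within the batch
# (list[dict]); its length must match the batch size or a ValueError is raised.
batch_meta.update_custom_meta([{"source": "web"}, {"source": "wiki"}])
```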
Reviewed changes
Copilot reviewed 15 out of 18 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| transfer_queue/interface.py | Adds KV API functions (kv_put, kv_get, kv_list, kv_clear) with sync/async variants and helper utility dict_to_tensordict |
| transfer_queue/controller.py | Adds keys_mapping/revert_keys_mapping fields, kv_retrieve_keys method, KV_LIST request handling, and key cleanup logic |
| transfer_queue/client.py | Implements async_kv_retrieve_keys and async_kv_list methods with proper validation and error handling |
| transfer_queue/metadata.py | Changes custom_meta API from dict[int, dict] to list[dict], affecting update_custom_meta and get_all_custom_meta |
| transfer_queue/storage/managers/base.py | Renames custom_meta to custom_backend_meta for clarity |
| transfer_queue/utils/zmq_utils.py | Adds KV_RETRIEVE_KEYS and KV_LIST ZMQ request types |
| transfer_queue/utils/common.py | Adds dict_to_tensordict utility function for converting dicts to TensorDict |
| `transfer_queue/__init__.py` | Exports new KV API functions |
| tutorial/03_metadata_concepts.py | Updates to use new list-based custom_meta API |
| tutorial/04_understanding_controller.py | New tutorial demonstrating controller features |
| tutorial/05_custom_sampler.py | New tutorial showing custom sampler development |
| tutorial/06_streaming_dataloader.py | New tutorial for streaming data loading |
| tests/test_kv_interface.py | Comprehensive unit tests for all KV interface functions |
| tests/test_controller.py | Tests for controller KV interface functionality |
| tests/test_controller_data_partitions.py | Tests for DataPartitionStatus KV methods |
| tests/test_client.py | Tests for client KV methods and mock KV responses |
| tests/test_metadata.py | Updates tests for new custom_meta API |
| tests/test_kv_storage_manager.py | Updates tests renaming custom_meta to custom_backend_meta |
Comments suppressed due to low confidence (1)
tutorial/03_metadata_concepts.py:213
- The `update_custom_meta` call on lines 208-213 only provides 2 items in the list for a batch that contains 5 samples (created on line 193). According to the new API (line 333 in metadata.py), this will raise a ValueError because the length of custom_meta (2) doesn't match the batch size (5). Either provide custom_meta for all 5 samples or adjust the example to create a batch with only 2 samples.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 12 comments.
Comments suppressed due to low confidence (1)
README.md:168
- This link still points to `tutorial/05_streaming_dataloader.py`, but the StreamingDataLoader tutorial file is now `tutorial/06_streaming_dataloader.py` (and 05 is `custom_sampler`). The current URL likely 404s; please update it to the new tutorial path/number.
We have experimentally implemented a **standardized, fully-streamed distributed** workflow via TransferQueue.
By leveraging the `RankAwareSampler` and `StreamingDataLoader` interfaces, we achieve a **streamlined micro-batch-level producer-consumer pipeline**. This design eliminates the need to manually determine data dispatching logic across varying parallelism strategies—a typical complexity in the single-controller paradigm—thereby greatly simplifying framework design.
Please refer to our [Roadmap](https://github.com/Ascend/TransferQueue/issues/1) and [tutorials/05_streaming_dataloader.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/05_streaming_dataloader.py) for more details.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 10 comments.
```diff
  # TODO(tianyi): the order of custom meta is coupled with keys/values
  for (field_name, global_idx), meta_value in zip(
      itertools.product(sorted(metadata.field_names), metadata.global_indexes),
-     custom_meta,
+     custom_backend_meta,
      strict=True,
  ):
-     per_field_custom_meta[global_idx][field_name] = meta_value
- metadata.update_custom_meta(per_field_custom_meta)
+     per_field_custom_backend_meta[global_idx][field_name] = meta_value
+ metadata._custom_backend_meta.update(per_field_custom_backend_meta)
```
In KVStorageManager.put_data, keys are generated from data.keys() but the per-field custom_backend_meta mapping is zipped against sorted(metadata.field_names). If metadata.field_names is empty or doesn’t match the fields being written (e.g., when inserting brand-new KV keys or adding new columns), this can either make the write a no-op (due to the earlier if not metadata.field_names: return) or raise due to the strict zip/length mismatch. Consider deriving the field iteration order from data.keys() (or updating metadata with data fields before this point) so KV writes work reliably.
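A minimal sketch of that suggestion, reusing the names from the excerpt above (an illustration of the direction, not the PR's actual code):

```python
# Sketch: iterate fields from the data actually being written, so brand-new
# KV keys / newly added columns are covered even when metadata.field_names is
# empty or stale. Names (data, metadata, custom_backend_meta, ...) are taken
# from the diff excerpt above.
field_names = sorted(data.keys())
for (field_name, global_idx), meta_value in zip(
    itertools.product(field_names, metadata.global_indexes),
    custom_backend_meta,
    strict=True,
):
    per_field_custom_backend_meta[global_idx][field_name] = meta_value
metadata._custom_backend_meta.update(per_field_custom_backend_meta)
```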
@Evelynn-V Please notice this potential issue
```python
    fields = TensorDict(batch, batch_size=[1])
elif not isinstance(fields, TensorDict):
    raise ValueError("field can only be dict or TensorDict")
```
kv_put retrieves a BatchMeta that can have an empty field_names set (especially for brand-new keys). For the KV storage backend (KVStorageManager), put_data() is a no-op when metadata.field_names is empty and may also mis-handle custom_backend_meta ordering if metadata.field_names doesn’t include the fields being written. Before calling tq_client.put(...), ensure batch_meta is populated with the fields being written (e.g., add the fields to metadata) so inserts/partial updates work across backends.
Suggested change:

```python
# Ensure BatchMeta.field_names includes all fields being written so that
# KV backends handle inserts/updates correctly even for brand-new keys.
if hasattr(batch_meta, "field_names"):
    if batch_meta.field_names is None:
        batch_meta.field_names = set()
    try:
        batch_meta.field_names.update(list(fields.keys()))
    except AttributeError:
        # In case field_names is not a set-like container, fall back to assignment.
        batch_meta.field_names = set(fields.keys())
```
transfer_queue/interface.py (outdated)

```python
_maybe_create_transferqueue_client(final_conf)


# ==================== Basic API ====================
```
Does method init() belong to 'Basic API'?
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
README.md:168
- In the Disaggregated Example section, the tutorial reference still points to `tutorial/05_streaming_dataloader.py`, but this PR renumbers the streaming dataloader tutorial to 06 (and updates other references accordingly). Update this link/text to avoid sending users to the wrong tutorial.
We have experimentally implemented a **standardized, fully-streamed distributed** workflow via TransferQueue.
By leveraging the `RankAwareSampler` and `StreamingDataLoader` interfaces, we achieve a **streamlined micro-batch-level producer-consumer pipeline**. This design eliminates the need to manually determine data dispatching logic across varying parallelism strategies—a typical complexity in the single-controller paradigm—thereby greatly simplifying framework design.
Please refer to our [Roadmap](https://github.com/Ascend/TransferQueue/issues/1) and [tutorials/05_streaming_dataloader.py](https://github.com/Ascend/TransferQueue/blob/main/tutorial/05_streaming_dataloader.py) for more details.
Summary
This PR introduces a High-Level Key-Value (KV) Interface to TransferQueue, offering a Redis-style API that retains access to most of the advanced features TransferQueue provides.
Background
In previous versions of TransferQueue, the learning curve was relatively steep for new users. To perform basic operations, users had to:
- Understand the `BatchMeta`, `SampleMeta` and `FieldMeta` design (as illustrated in tutorial/02_metadata_concepts.py)
- Directly manipulate the `TransferQueueClient` API

Although PR #26 simplified the initialization process, the core interaction still exposed low-level details. This PR bridges that gap by providing a familiar, easy-to-use KV abstraction.
TransferQueue API Architecture
With this PR, TransferQueue now supports a two-level API architecture to satisfy different user needs.
High-Level API
Key-Value based API (This PR)
Methods
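The KV entry points added by this PR (sync names from the diff; each also has an async variant, e.g. `async_kv_list` on the client):

- `kv_put` / `kv_batch_put`: write fields under one or several string keys
- `kv_get`: fetch stored fields by key
- `kv_list`: enumerate the currently stored keys
- `kv_clear`: remove key-value pairs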
Key Features
StreamingDataLoader API
Refer to our RoadMap and related PRs (#23).
The usage example can be found in tutorial/06_streaming_dataloader.py.
Low-Level API
Directly manipulate the `TransferQueueClient`. Refer to tutorial/03_metadata_concepts.py, tutorial/04_understanding_controller.py and tutorial/05_custom_sampler.py for details.
Usage Example
Please refer to tutorial/02_kv_interface.py and tests/e2e/test_kv_interface_e2e.py for details.
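For orientation, a minimal sketch of the intended call pattern (the exact signatures are an assumption here; the tutorial and e2e test above are the authoritative references):

```python
import transfer_queue as tq

tq.init()  # one-call setup, building on the simplified initialization from PR #26

# Write fields under a string key; values may be a plain dict or a TensorDict.
tq.kv_put("sample_0", {"prompt": "hello", "response": "world"})

# Discover the stored keys, then fetch data back by key.
keys = tq.kv_list()
data = tq.kv_get(keys)

# Remove entries once consumed.
tq.kv_clear(keys)
```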
Use Cases & Limitations
Best For:
Limitations (vs. Streaming/Low-level APIs):
- Requires knowing the `keys` before fetching, rather than consuming a continuous stream.