Compared to the KVGroup approach proposed in #263, this PR replaces the internal KVGroup struct with a multiton pattern: one FTensorAllocator instance per group_id, lazily created via global_allocator(group_id).
What changed
allocator.hpp: Replaced single g_allocator_ with g_allocators_ map.
allocator.cpp: global_allocator(group_id) lazily creates per-group allocators.
torch_bindings.cpp: Routes group_id to the correct allocator at the binding layer.
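The multiton described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the FTensorAllocator internals are assumptions, and only the global_allocator(group_id) lookup-or-create behavior is taken from the description.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Placeholder for the real allocator; internals are assumed.
class FTensorAllocator {
 public:
  explicit FTensorAllocator(int64_t group_id) : group_id_(group_id) {}
  int64_t group_id() const { return group_id_; }

 private:
  int64_t group_id_;
};

// One allocator per group_id, created lazily on first access.
// Replaces the former single g_allocator_ with a g_allocators_ map.
FTensorAllocator& global_allocator(int64_t group_id) {
  static std::mutex mu;
  static std::unordered_map<int64_t, std::unique_ptr<FTensorAllocator>>
      g_allocators_;
  std::lock_guard<std::mutex> lock(mu);
  auto it = g_allocators_.find(group_id);
  if (it == g_allocators_.end()) {
    it = g_allocators_
             .emplace(group_id, std::make_unique<FTensorAllocator>(group_id))
             .first;
  }
  return *it->second;
}
```

Because lookup and creation happen in one place, the binding layer only needs to forward group_id; repeated calls with the same group_id return the same instance.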
Thanks for the PR! This one does look simpler. QQ: how do we set the kv cache config, such as tensor size, number of layers, etc., into the cpp extension?
The kv cache config is set when the Python side calls create_kv_tensors(size, dtype_size, dev_str, num_layers, num_kv_buffers, group_id), which is the same entry point as before. torch_bindings.cpp receives all config + group_id, and calls global_allocator(group_id) to get or lazily create the right allocator instance.
Then in allocator.cpp, create_kv_tensors() stores the config into its own members, creates the zero page, and builds the FTensors.
So each allocator instance is configured the first time create_kv_tensors is called with its group_id.
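The configure-on-first-call flow described above might look like the sketch below. The parameter names follow the Python-side call in the thread; the member fields, the configured_ flag, and the zero-page/FTensor construction are placeholders, not the PR's actual implementation.

```cpp
#include <cstdint>
#include <string>

// Assumed shape of the allocator's config storage; illustrative only.
class FTensorAllocator {
 public:
  // Stores the KV-cache config from the Python side on first call,
  // then creates the zero page and builds the FTensors.
  void create_kv_tensors(int64_t size, int64_t dtype_size,
                         const std::string& dev_str, int64_t num_layers,
                         int64_t num_kv_buffers) {
    if (configured_) return;  // this group's allocator is already set up
    size_ = size;
    dtype_size_ = dtype_size;
    dev_str_ = dev_str;
    num_layers_ = num_layers;
    num_kv_buffers_ = num_kv_buffers;
    // ... create the zero page and build the FTensors here ...
    configured_ = true;
  }

  bool configured() const { return configured_; }

 private:
  bool configured_ = false;
  int64_t size_ = 0;
  int64_t dtype_size_ = 0;
  int64_t num_layers_ = 0;
  int64_t num_kv_buffers_ = 0;
  std::string dev_str_;
};
```

In this sketch the binding layer would call global_allocator(group_id).create_kv_tensors(...), so each group's allocator picks up its own config independently.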
Tested gpt-oss-20b on sglang-0.5.9.