
Conversation

@daniellepintz (Contributor) commented Jan 4, 2026

Concatenate tensors into one blob of bytes for sending across the transport (RDMA, Gloo, etc.) instead of sending them one by one. In theory, batching should be faster because it amortizes the per-message overhead of the transport buffers.
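The core idea can be sketched with simple length-prefixed framing: serialize each tensor, prefix each buffer with its size, and join everything into one blob so the transport sees a single message. This is a minimal illustration, not the PR's actual wire format; the function names and the 8-byte header are assumptions for the sketch.

```python
import struct

def pack_blobs(buffers):
    """Concatenate serialized tensor buffers into a single blob.

    Each buffer is prefixed with an 8-byte little-endian length
    header so the receiver can split the blob back apart.
    (Hypothetical helper; the real PR may carry tensor metadata
    such as dtype and shape alongside the raw bytes.)
    """
    parts = []
    for buf in buffers:
        parts.append(struct.pack("<Q", len(buf)))  # length header
        parts.append(buf)
    return b"".join(parts)

def unpack_blobs(blob):
    """Inverse of pack_blobs: split one blob back into buffers."""
    bufs, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from("<Q", blob, offset)
        offset += 8
        bufs.append(blob[offset:offset + length])
        offset += length
    return bufs
```

With this framing, N tensors cost one transport send instead of N, at the price of one extra copy into the joined blob.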

@meta-cla bot added the label: CLA Signed (managed by the Meta Open Source bot) — Jan 4, 2026
@daniellepintz daniellepintz changed the title regular tensor working TorchStoreStateDict Jan 4, 2026
@LucasLLC (Contributor) left a comment

This is awesome! Impressed to see so much progress in a short time span.

Some recommended next steps:

  • Let's profile the current implementation and see what kind of speedup we're getting on batch put vs. non-batch put. It could be helpful to add some fine-grained logging (e.g. check out latency_tracker)

  • Next solid step would be to unpack the state dict within the storage volume

  • Once this is done, we can take a look at what it would take to "fetch in batch" as well
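For the profiling step above, the fine-grained logging could look like a small timing context manager. This is only a sketch of the idea; `track_latency` is a hypothetical name, not the project's actual `latency_tracker` API.

```python
import time
from contextlib import contextmanager

@contextmanager
def track_latency(label, log=print):
    """Log wall-clock time spent inside the block under `label`.

    Hypothetical stand-in for a latency_tracker-style utility:
    wrap the batch put and the per-tensor put separately and
    compare the reported durations.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log(f"{label}: {elapsed_ms:.2f} ms")

# Usage sketch:
# with track_latency("batch_put"):
#     store.put(batched_blob)
# with track_latency("per_tensor_put"):
#     for t in tensors:
#         store.put(t)
```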

MODEL_LINER_LENGTH = 10


def _setup_process_group():

I wonder if it's worth moving this function into a helper in tests/utils, since it's used in multiple places?

https://github.com/meta-pytorch/torchstore/blob/main/tests/utils.py#L105

@daniellepintz (Contributor, Author) commented Jan 11, 2026

@LucasLLC I moved the get changes to a new PR: #97

