TorchStoreStateDict #90
base: main
Conversation
LucasLLC left a comment
This is awesome! Impressed to see so much progress in a short time span.
Some recommended next steps:
- Let's profile the current implementation and see what kind of speedup we're getting on batch put vs. non-batch put. It could be helpful to add some fine-grained logging (e.g. check out latency_tracker); see the sketch after this list.
- The next solid step would be to unpack the state dict within the storage volume.
- Once that is done, we can take a look at what it would take to "fetch in batch" as well.
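
A minimal sketch of that kind of fine-grained timing, using only the standard library (the real latency_tracker utility likely looks different, and the put APIs in the usage comments are placeholders, not the store's actual interface):

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger(__name__)

@contextmanager
def track_latency(label: str):
    # Log wall-clock time for the wrapped block at millisecond granularity.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1e3
        logger.info("%s took %.3f ms", label, elapsed_ms)

# Hypothetical usage around the two code paths being compared:
# with track_latency("batch_put"):
#     store.put_state_dict(state_dict)      # placeholder API name
# with track_latency("per_tensor_put"):
#     for key, tensor in state_dict.items():
#         store.put(key, tensor)            # placeholder API name
```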
```python
MODEL_LINER_LENGTH = 10
...
def _setup_process_group():
```
I wonder if it's worth putting this function in a helper in tests/utils since it's used in multiple places?
https://github.com/meta-pytorch/torchstore/blob/main/tests/utils.py#L105
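
For illustration, a minimal sketch of what a shared helper in tests/utils could look like, assuming a single-host Gloo setup (the linked helper is the authoritative version; the rank/world_size/port defaults here are assumptions):

```python
import os

import torch.distributed as dist

def _setup_process_group(rank: int = 0, world_size: int = 1, port: str = "29500"):
    # Minimal single-host process-group bootstrap for tests; the defaults
    # are illustrative, not taken from the repo's actual helper.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", port)
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)

def _teardown_process_group():
    if dist.is_initialized():
        dist.destroy_process_group()
```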
Concatenate tensors into one blob of bytes for sending across the transport (RDMA, Gloo, etc.) instead of sending them one by one. In theory this should be faster than per-tensor sends because each send pays per-message overhead from the transport buffers.
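
A rough sketch of the packing idea, assuming CPU tensors and a simple offset/shape/dtype metadata dict (pack_tensors/unpack_tensors are illustrative names, not the PR's actual implementation):

```python
import torch

def pack_tensors(tensors: dict[str, torch.Tensor]) -> tuple[bytes, dict]:
    # Lay every tensor out back-to-back as raw uint8 bytes, recording the
    # offset, byte length, shape, and dtype needed to slice each one back out.
    meta, chunks, offset = {}, [], 0
    for key, t in tensors.items():
        flat = t.detach().contiguous().view(-1).view(torch.uint8)
        meta[key] = {"offset": offset, "nbytes": flat.numel(),
                     "shape": tuple(t.shape), "dtype": t.dtype}
        chunks.append(flat)
        offset += flat.numel()
    blob = torch.cat(chunks).numpy().tobytes()  # one contiguous buffer
    return blob, meta

def unpack_tensors(blob: bytes, meta: dict) -> dict[str, torch.Tensor]:
    flat = torch.frombuffer(bytearray(blob), dtype=torch.uint8)
    out = {}
    for key, m in meta.items():
        raw = flat[m["offset"]: m["offset"] + m["nbytes"]]
        # clone() gives the slice its own zero-offset storage, so the dtype
        # view below is always legal regardless of alignment inside the blob.
        out[key] = raw.clone().view(m["dtype"]).view(m["shape"])
    return out
```

Keeping the metadata separate from the blob means a single large message carries all tensor bytes, while the small metadata dict can travel over whatever control channel the store already uses.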