Piper is a PyTorch library for training large models with flexible pipeline parallel schedules.
We assume a Linux-based environment.

- Create a conda environment with `python==3.10`
- Install the requirements in `requirements.txt`
- Modify the PyTorch and Ray dependencies according to the instructions below
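Assuming conda and pip are available, the steps above might look like the following sketch (the environment name `piper` is an assumption, not mandated by the repository):

```shell
# Illustrative setup commands; the environment name "piper" is an
# assumption, not something the repository requires.
conda create -n piper python==3.10 -y
conda activate piper
pip install -r requirements.txt
# Then apply the PyTorch and Ray modifications described below.
```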
## PyTorch
Piper's `RemoteTensor` is not traceable by TorchDynamo.
- WIP: following FakeTensor's approach, register operator implementations that make `RemoteTensor` transparently traceable by TorchDynamo.
- Modification: Add to the beginning of `transform` in `convert_frame.py`:
```python
####### PIPER MODIFICATION START #######
# Instead of tracing RemoteTensors, trace their
# underlying FakeTensor
from src.piper_utils import RemoteTensor
for k, v in locals.items():
    if isinstance(v, RemoteTensor):
        locals[k] = v._fake
####### PIPER MODIFICATION END #######
```
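The unwrapping step can be illustrated in isolation. The following sketch uses hypothetical stand-ins for `RemoteTensor` and `FakeTensor` (only the `_fake` attribute is modeled, not Piper's actual classes):

```python
# Minimal sketch: replace RemoteTensor-like wrappers in a frame's locals
# with their underlying fake tensors before Dynamo traces the frame.
class FakeTensor:
    """Stand-in for torch._subclasses.FakeTensor."""

class RemoteTensor:
    """Hypothetical stand-in for Piper's RemoteTensor."""
    def __init__(self, fake):
        self._fake = fake  # the FakeTensor Dynamo should trace instead

def unwrap_remote_tensors(frame_locals):
    # Mirrors the patch: swap each RemoteTensor for its ._fake in place.
    for k, v in frame_locals.items():
        if isinstance(v, RemoteTensor):
            frame_locals[k] = v._fake
    return frame_locals

fake = FakeTensor()
frame_locals = {"x": RemoteTensor(fake), "y": 42}
unwrap_remote_tensors(frame_locals)
# frame_locals["x"] now holds the underlying FakeTensor; "y" is untouched
```

Mutating the dict in place is safe here because only values of existing keys are reassigned; no keys are added or removed during iteration.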
Piper's `RemoteTensor` causes recompilation bugs because it is not traceable by TorchDynamo.
- WIP: same as above
- Modification: Add at the beginning of `CheckFunctionManager.__init__` in `guards.py`:
```python
####### PIPER MODIFICATION START #######
def filter_guards(guard):
    return not guard.inner_create_fn().__name__ == "TENSOR_MATCH"
guards = list(filter(filter_guards, guards))
####### PIPER MODIFICATION END #######
```
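The filtering logic can be exercised on its own. In this sketch, `Guard` is a simplified stand-in for Dynamo's guard objects, modeling only the `inner_create_fn()` accessor the patch relies on:

```python
# Sketch of the guard filter: drop guards whose create function is named
# TENSOR_MATCH, keeping all other guards. `Guard` is a stand-in for
# Dynamo's guard objects, not the real class.
class Guard:
    def __init__(self, create_fn):
        self._create_fn = create_fn

    def inner_create_fn(self):
        return self._create_fn

# Dummy create functions; only their __name__ matters to the filter.
def TENSOR_MATCH():
    pass

def TYPE_MATCH():
    pass

def drop_tensor_match_guards(guards):
    def keep(guard):
        return guard.inner_create_fn().__name__ != "TENSOR_MATCH"
    return list(filter(keep, guards))

guards = [Guard(TENSOR_MATCH), Guard(TYPE_MATCH)]
filtered = drop_tensor_match_guards(guards)
# Only the TYPE_MATCH guard survives the filter
```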
## Ray
Tensor transport backends currently support only one return value per task.
- WIP: Upstream this into Ray.
- Modifications (2): Comment out the assertion in `ActorMethod._remote()` and add logic for handling multiple return values with a GPU object manager.
```python
####### PIPER MODIFICATION START #######
# if num_returns != 1:
#     raise ValueError(
#         f"Currently, methods with tensor_transport={tensor_transport.name} only support 1 return value. "
#         "Please make sure the actor method is decorated with `@ray.method(num_returns=1)` (the default)."
#     )
####### PIPER MODIFICATION END #######
```
```python
####### PIPER MODIFICATION START #######
gpu_object_manager = ray._private.worker.global_worker.gpu_object_manager
if isinstance(object_refs, ObjectRef):
    object_ref = object_refs
    gpu_object_manager.add_gpu_object_ref(
        object_ref, self._actor, tensor_transport
    )
else:
    for object_ref in object_refs:
        assert isinstance(object_ref, ObjectRef)
        gpu_object_manager.add_gpu_object_ref(
            object_ref, self._actor, tensor_transport
        )
####### PIPER MODIFICATION END #######
```
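The control flow of this second modification can be shown in isolation. Here `ObjectRef` and `GpuObjectManager` are simplified stand-ins for Ray's internals, used only to illustrate how a single return value and multiple return values are both registered:

```python
# Sketch of the multi-return handling: register every ObjectRef with the
# GPU object manager, whether the task returned one ref or a list of refs.
# ObjectRef and GpuObjectManager are stand-ins, not Ray's real classes.
class ObjectRef:
    def __init__(self, ref_id):
        self.ref_id = ref_id

class GpuObjectManager:
    def __init__(self):
        self.registered = []

    def add_gpu_object_ref(self, object_ref, actor, tensor_transport):
        self.registered.append(object_ref)

def register_refs(manager, object_refs, actor=None, tensor_transport=None):
    # Normalize: object_refs is either a single ObjectRef or a list of them.
    if isinstance(object_refs, ObjectRef):
        manager.add_gpu_object_ref(object_refs, actor, tensor_transport)
    else:
        for object_ref in object_refs:
            assert isinstance(object_ref, ObjectRef)
            manager.add_gpu_object_ref(object_ref, actor, tensor_transport)

mgr = GpuObjectManager()
register_refs(mgr, ObjectRef("a"))                    # single return value
register_refs(mgr, [ObjectRef("b"), ObjectRef("c")])  # multiple return values
# mgr.registered now holds all three refs
```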