uw-syfi/piper

Piper

Piper is a PyTorch library for training large models with flexible pipeline-parallel schedules.

Environment setup: conda

We assume a Linux-based environment.

  1. Create a conda environment with python==3.10.
  2. Install the requirements listed in requirements.txt.
  3. Modify the PyTorch and Ray dependencies according to the instructions below.
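
Step 3 means editing files inside the installed packages, so it helps to know where they live in the active environment. A minimal sketch for looking up a module's file; the helper name locate_module_file is hypothetical and not part of Piper:

```python
import importlib.util


def locate_module_file(module_name):
    """Return the file a module would be loaded from, or None if it is unavailable.

    Hypothetical helper for illustration; it is not part of Piper.
    """
    try:
        # For dotted names, find_spec imports the parent packages first,
        # so a missing parent package raises ModuleNotFoundError.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin
```

For example, assuming the convert_frame.py mentioned below is the module torch._dynamo.convert_frame, calling locate_module_file("torch._dynamo.convert_frame") inside the conda environment returns the path of the file to edit, or None if PyTorch is not installed.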

Modifying dependencies

PyTorch

Piper's RemoteTensor is not traceable by TorchDynamo.

  • WIP: following the FakeTensor approach, register operator implementations that make RemoteTensor transparently traceable by TorchDynamo.
  • Modification: add the following to the beginning of transform in convert_frame.py:
####### PIPER MODIFICATION START #######
# Instead of tracing RemoteTensors, trace their
# underlying FakeTensor
from src.piper_utils import RemoteTensor
for k, v in locals.items():
    if isinstance(v, RemoteTensor):
        locals[k] = v._fake
####### PIPER MODIFICATION END #######
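
The effect of this modification can be illustrated outside PyTorch. The sketch below is a simplified stand-in, not Piper's or PyTorch's code: RemoteTensorStandIn and FakeStandIn are hypothetical placeholder classes, and only the `_fake` attribute mirrors the snippet above.

```python
class FakeStandIn:
    """Hypothetical placeholder for the FakeTensor that TorchDynamo can trace."""

    def __init__(self, shape):
        self.shape = shape


class RemoteTensorStandIn:
    """Hypothetical placeholder for Piper's RemoteTensor; only `_fake` matters here."""

    def __init__(self, fake):
        self._fake = fake


def swap_remote_for_fake(frame_locals):
    """Replace each RemoteTensor stand-in in the mapping with its `_fake`, in place."""
    for name, value in frame_locals.items():
        if isinstance(value, RemoteTensorStandIn):
            # Reassigning an existing key does not resize the dict,
            # so it is safe to do while iterating over items().
            frame_locals[name] = value._fake
    return frame_locals


fake = FakeStandIn(shape=(2, 3))
frame_locals = {"x": RemoteTensorStandIn(fake), "step": 7}
swap_remote_for_fake(frame_locals)
```

After the call, frame_locals["x"] is the underlying fake object and every other entry is left untouched, so the tracer only ever sees values it knows how to handle.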

Piper's RemoteTensor also causes recompilation bugs because it is not traceable by TorchDynamo. The modification below filters out the TENSOR_MATCH guards:

####### PIPER MODIFICATION START #######
def filter_guards(guard):
    return guard.inner_create_fn().__name__ != "TENSOR_MATCH"
guards = list(filter(filter_guards, guards))
####### PIPER MODIFICATION END #######
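
To show what this filter keeps and drops, here is a small self-contained sketch. GuardStandIn and the module-level functions are hypothetical stand-ins for Dynamo's guard objects; the only behavior taken from the snippet above is that a guard exposes inner_create_fn(), whose `__name__` identifies the guard type.

```python
def TENSOR_MATCH():
    """Hypothetical stand-in for a guard-creation function."""


def ID_MATCH():
    """Hypothetical stand-in for a guard-creation function."""


class GuardStandIn:
    """Hypothetical placeholder for a Dynamo guard."""

    def __init__(self, create_fn):
        self._create_fn = create_fn

    def inner_create_fn(self):
        return self._create_fn


def drop_tensor_match(guards):
    """Keep every guard except those created by TENSOR_MATCH."""
    return [g for g in guards if g.inner_create_fn().__name__ != "TENSOR_MATCH"]


guards = [GuardStandIn(TENSOR_MATCH), GuardStandIn(ID_MATCH), GuardStandIn(TENSOR_MATCH)]
kept = drop_tensor_match(guards)
kept_names = [g.inner_create_fn().__name__ for g in kept]
```

Only the ID_MATCH stand-in survives the filter; guards of every other type pass through in their original order.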

Ray

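Ray's tensor transport backends currently support only one return value per task. The modifications below comment out the check that enforces this and register every returned ObjectRef with the GPU object manager.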

####### PIPER MODIFICATION START #######
# if num_returns != 1:
#     raise ValueError(
#         f"Currently, methods with tensor_transport={tensor_transport.name} only support 1 return value. "
#         "Please make sure the actor method is decorated with `@ray.method(num_returns=1)` (the default)."
#     )
####### PIPER MODIFICATION END #######

####### PIPER MODIFICATION START #######
gpu_object_manager = ray._private.worker.global_worker.gpu_object_manager
if isinstance(object_refs, ObjectRef):
    object_ref = object_refs
    gpu_object_manager.add_gpu_object_ref(
        object_ref, self._actor, tensor_transport
    )
else:
    for object_ref in object_refs:
        assert isinstance(object_ref, ObjectRef)
        gpu_object_manager.add_gpu_object_ref(
            object_ref, self._actor, tensor_transport
        )
####### PIPER MODIFICATION END #######
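
The second Ray modification handles both a single ObjectRef and a collection of them. A more compact way to express the same branching is to normalize the input to a list first. The sketch below uses hypothetical stand-ins (ObjectRefStandIn, ManagerStandIn, register_all) rather than Ray's real classes, so it runs without Ray installed:

```python
class ObjectRefStandIn:
    """Hypothetical placeholder for ray.ObjectRef."""


class ManagerStandIn:
    """Hypothetical placeholder for Ray's GPU object manager; it records calls."""

    def __init__(self):
        self.registered = []

    def add_gpu_object_ref(self, object_ref, actor, tensor_transport):
        self.registered.append((object_ref, actor, tensor_transport))


def register_all(manager, object_refs, actor, tensor_transport):
    """Register one ref or an iterable of refs with the manager."""
    # A task with num_returns=1 yields a single ref; otherwise an iterable of refs.
    if isinstance(object_refs, ObjectRefStandIn):
        refs = [object_refs]
    else:
        refs = list(object_refs)
    for ref in refs:
        assert isinstance(ref, ObjectRefStandIn)
        manager.add_gpu_object_ref(ref, actor, tensor_transport)
    return refs


manager = ManagerStandIn()
single = ObjectRefStandIn()
pair = [ObjectRefStandIn(), ObjectRefStandIn()]
register_all(manager, single, "actor-0", "transport-a")
register_all(manager, pair, "actor-0", "transport-a")
```

After both calls the manager has recorded three registrations: one for the single ref and one for each element of the pair.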
