A fromconfig Launcher for yarn execution.
```bash
pip install fromconfig_yarn
```

Once installed, the launcher is available under the name `yarn`.
Given the following module

```python
# foo.py
class Model:
    def __init__(self, learning_rate: float):
        self.learning_rate = learning_rate

    def train(self):
        print(f"Training model with learning_rate {self.learning_rate}")
```

and config files
```yaml
# config.yaml
model:
  _attr_: foo.Model
  learning_rate: "${params.learning_rate}"
```

```yaml
# params.yaml
params:
  learning_rate: 0.001
```

```yaml
# launcher.yaml
yarn:
  name: test-fromconfig
logging:
  level: 20
launcher:
  run: yarn
```

Run (assuming you are in a Hadoop environment)

```bash
fromconfig config.yaml params.yaml launcher.yaml - model - train
```

which prints
```
INFO:fromconfig.launcher.logger:- yarn.name: test-fromconfig
INFO:fromconfig.launcher.logger:- logging.level: 20
INFO:fromconfig.launcher.logger:- params.learning_rate: 0.001
INFO:fromconfig.launcher.logger:- model._attr_: foo.Model
INFO:fromconfig.launcher.logger:- model.learning_rate: 0.001
INFO skein.Driver: Driver started, listening on 12345
INFO:fromconfig_yarn.launcher:Uploading pex to viewfs://root/user/path/to/pex
INFO:cluster_pack.filesystem:Resolved base filesystem: <class 'pyarrow.hdfs.HadoopFileSystem'>
INFO:cluster_pack.uploader:Zipping and uploading your env to viewfs://root/user/path/to/pex
INFO skein.Driver: Uploading application resources to viewfs://root/user/...
INFO skein.Driver: Submitting application...
INFO impl.YarnClientImpl: Submitted application application_12345
INFO:fromconfig_yarn.launcher:TRACKING_URL: http://12.34.56/application_12345
```
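The logged keys above show the fully resolved config: the `${params.learning_rate}` reference has been replaced by `0.001`, and each `_attr_` entry is resolved to an import path and instantiated with the remaining keys as keyword arguments. A minimal sketch of that mechanism (illustrative only; `instantiate` is a hypothetical helper, not fromconfig's actual implementation):

```python
# Sketch of how an `_attr_` entry becomes an object: import the module,
# look up the attribute, and call it with the remaining keys as keyword
# arguments. This illustrates the mechanism only; `instantiate` is a
# hypothetical helper, not fromconfig's actual code.
import importlib


def instantiate(config: dict):
    """Instantiate an object from a config dict carrying an _attr_ key."""
    kwargs = dict(config)
    module_name, _, attr_name = kwargs.pop("_attr_").rpartition(".")
    attr = getattr(importlib.import_module(module_name), attr_name)
    return attr(**kwargs)


model = instantiate({"_attr_": "fractions.Fraction", "numerator": 1, "denominator": 3})
print(model)  # 1/3
```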
You can also monkeypatch the relevant functions to "fake" the Hadoop environment with

```bash
python monkeypatch_fromconfig.py config.yaml params.yaml launcher.yaml - model - train
```

This example can be found in `docs/examples/quickstart`.
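The contents of `monkeypatch_fromconfig.py` are not shown here, but the general monkeypatching pattern it relies on is standard: replace the functions that would talk to Hadoop with local fakes before running anything. A minimal sketch using `unittest.mock` (the patched target here, `getpass.getuser`, is just a stand-in for illustration):

```python
# General monkeypatching pattern: swap a function for a fake within a scope.
# getpass.getuser is only a stand-in target; a real script would patch the
# fromconfig_yarn functions that upload to and submit against Hadoop.
import getpass
from unittest import mock


def fake_getuser():
    """Fake replacement that never touches the real environment."""
    return "fake-user"


with mock.patch("getpass.getuser", side_effect=fake_getuser):
    # Inside this block, any code calling getpass.getuser() gets the fake.
    print(getpass.getuser())  # fake-user
```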
To configure Yarn, add a `yarn` entry to your config. You can set the following parameters:

- `env_vars`: A list of environment variables to forward to the container(s)
- `hadoop_file_systems`: The list of available filesystems
- `ignored_packages`: The list of packages not to include in the environment
- `jvm_memory_in_gb`: The JVM memory (default: 8)
- `memory`: The executor's memory (default: 32 GiB)
- `num_cores`: The executor's number of cores (default: 8)
- `package_path`: The HDFS location where to save the environment
- `zip_file`: The path to an existing pex file, either local or on HDFS
- `name`: The application name
- `queue`: The yarn queue to submit the application to
- `node_label`: The label of the hadoop node to be scheduled
- `pre_script_hook`: A script to be executed before python is invoked
- `extra_env_vars`: A mapping of extra environment variables to forward to the container(s)
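For example, a `launcher.yaml` combining several of these parameters might look like the following (all values here are illustrative assumptions, not defaults):

```yaml
# launcher.yaml
yarn:
  name: my-training-job
  queue: default
  num_cores: 4
  jvm_memory_in_gb: 4
  package_path: viewfs://root/user/me/envs
  extra_env_vars:
    LOG_LEVEL: "INFO"
launcher:
  run: yarn
```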