This project is meant to automate the distribution of a PyGrid training job across multiple Genesis Cloud compute instances. The training protocol is specified on a local desktop. The distributor then launches a PyGrid network distributed across multiple Genesis Cloud instances. Each instance is home to one worker node and one instance runs the gateway.
Executing this script will start a series of instances, the runtime of which will count against your account credits. Make sure to shut down any idle instances to avoid accidental costs. This is solely a proof of concept and, as stated in the license, none of the functionality is guaranteed.
- Set up a Genesis Cloud account.
- Create a security group called pygrid that allows inbound TCP connections on ports 3000 and 5000.
- Clone this repository to your local machine:

      git clone https://github.com/Benecoder/distributor.git
      cd distributor

- Install the required dependencies. It is recommended to do so in your preferred virtual environment. For compatibility down the road, make sure that you are using Python version 3.7. Here is how you would do that using Anaconda:

      conda create -n pygrid_env python=3.7
      conda activate pygrid_env
      pip install -r requirements.txt

As an example, a Jupyter notebook is provided in model.ipynb.
To run the example, install JupyterLab (or Jupyter Notebook) and place your Genesis Cloud API token and the name of your SSH key in environment variables.
conda install jupyter notebook
export GC_API_TOKEN="your-api-token"
export GC_SSH_KEY_NAME="your-ssh-key-name"
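
A minimal sketch of how a script might read these two variables and fail early when one is missing. The helper name `load_credentials` is hypothetical and not part of this repository; only the variable names `GC_API_TOKEN` and `GC_SSH_KEY_NAME` come from the instructions above.

```python
import os


def load_credentials(environ=os.environ):
    """Return (api_token, ssh_key_name), or raise if either is unset.

    Hypothetical helper for illustration; not part of this repository.
    """
    required = ("GC_API_TOKEN", "GC_SSH_KEY_NAME")
    # Treat empty strings the same as unset variables.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return environ["GC_API_TOKEN"], environ["GC_SSH_KEY_NAME"]
```

Calling `load_credentials()` before any instance is created means a forgotten `export` surfaces as a clear error instead of as a failed API request later on.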
This is how the network is built up: using the developer API, build_docker_image.py creates
a new Ubuntu 18 instance and installs the GPU drivers. All of the PyGrid software that is used is
obtained by pulling the latest Docker containers and starting a Redis server using docker-compose
and the correct docker-compose file. To make sure Docker, docker-compose and the correct GPU drivers
are installed, the first instance needs to be built using the base_image_cloud_init.yml file. This
cloud-init file makes sure the correct software is installed and, once that is the case, touches the
/home/ubuntu/installation_finished file.
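
The marker file gives a simple way to tell when an instance is ready: poll until the file exists. Below is a hedged sketch of such a polling loop. The names `wait_for` and `installation_finished` are hypothetical and not taken from this repository, and the file check shown here is a local stand-in; in practice the check would have to run on the instance itself, for example over SSH.

```python
import os
import time


def wait_for(check, timeout=600.0, interval=5.0):
    """Call check() repeatedly until it returns True or the timeout expires.

    Returns True if check() succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Path taken from the cloud-init description above.
MARKER = "/home/ubuntu/installation_finished"


def installation_finished(path=MARKER):
    """Local stand-in for the readiness check (assumption: a real check runs remotely)."""
    return os.path.exists(path)
```

A call such as `wait_for(installation_finished, timeout=900)` would then block until the marker appears or 15 minutes have passed.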
On the master node, it then starts a Redis server based on the information in the docker-compose.yml file. This node houses one gateway node and one worker, called brutus.
As soon as main.py receives the IP address of the gateway, it starts an additional series of workers.
These connect to the gateway using the private network between the instances.
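
The ordering described above (gateway first, workers only once its address is known) can be sketched as follows. This is an illustration under stated assumptions, not the logic of main.py: the function `worker_plan`, the worker naming scheme, and the choice of port 5000 for the gateway are all hypothetical.

```python
def worker_plan(gateway_private_ip, n_workers, gateway_port=5000):
    """Build one config dict per additional worker.

    Assumptions for illustration only: the gateway listens on port 5000
    and workers are named worker-1, worker-2, ...; neither detail is
    taken from main.py.
    """
    if not gateway_private_ip:
        # Workers cannot be configured before the gateway address is known.
        raise ValueError("gateway IP is not known yet")
    gateway_url = "http://{}:{}".format(gateway_private_ip, gateway_port)
    return [
        {"id": "worker-{}".format(i), "gateway_url": gateway_url}
        for i in range(1, n_workers + 1)
    ]
```

Passing the private address rather than the public one keeps worker-to-gateway traffic on the private network between the instances.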