Description
The FBM services (node, gui, restful, mqtt, jupyter, tensorboard) are run as ECS services.
These services can be turned off when not in use by setting their "desired tasks" to zero.
- For Fargate services (all except the node), costs stop once the service task ends.
- For the EC2 services (the GPU-enabled node services), the instance is terminated 15 minutes after the task ends, and costs stop then.
This would save on AWS costs, particularly for the expensive GPU instances running the node services. Services can be restarted by setting "desired tasks" back to one.
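As a rough illustration of switching a service off and on, the sketch below wraps the ECS `update_service` call from boto3. The helper name `set_desired_tasks` and the cluster and service names (`fbm-cluster`, `fbm-node`) are placeholders invented for this example, not names taken from the actual deployment.

```python
def set_desired_tasks(ecs_client, cluster: str, service: str, count: int) -> int:
    """Set the "desired tasks" of an ECS service and return the new value.

    `ecs_client` is expected to be a boto3 ECS client, i.e. boto3.client("ecs").
    """
    if count not in (0, 1):
        # This issue only discusses stopping (0) and restarting (1) a service.
        raise ValueError("count must be 0 (stopped) or 1 (running)")
    response = ecs_client.update_service(
        cluster=cluster,
        service=service,
        desiredCount=count,
    )
    return response["service"]["desiredCount"]


if __name__ == "__main__":
    import boto3  # requires AWS credentials and a configured region

    ecs = boto3.client("ecs")
    # Hypothetical names: replace with the real cluster and service.
    set_desired_tasks(ecs, "fbm-cluster", "fbm-node", 0)  # turn off
    set_desired_tasks(ecs, "fbm-cluster", "fbm-node", 1)  # turn back on
```

Passing the client in as an argument keeps the helper easy to exercise without touching AWS.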
Ideally services could be spun up when needed, for example with GPU nodes starting up before a training plan is executed and shutting down afterwards. However, the current Fed-BioMed architecture does not support this. The only communication between the node and the network is the node connecting to the mqtt and restful servers, so the node must already be running in order to know that a GPU instance needs to start.
A way of dealing with this for the GPU nodes (which are the most expensive components) could be to split the Fed-BioMed node code into two services: a messaging agent, which runs as a lightweight service always connected to the mqtt server, and a processing service on the GPU instance, which runs the training. The messaging agent would spin up the GPU instance when required before instructing it to perform the training.
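The proposed split could look roughly like the sketch below. It is not Fed-BioMed code: the `MessagingAgent` class, the JSON message shape with a `"command": "train"` field, and the four injected callbacks are all assumptions made for illustration. In a real deployment `start_gpu` and `stop_gpu` would change the ECS service's "desired tasks", and `handle_message` would be called from the mqtt client's message callback.

```python
import json
import time
from typing import Callable


class MessagingAgent:
    """Lightweight, always-on agent that wakes the GPU service on demand."""

    def __init__(
        self,
        start_gpu: Callable[[], None],      # e.g. set "desired tasks" to one
        gpu_ready: Callable[[], bool],      # True once the processing service is up
        forward: Callable[[dict], None],    # hand the request to the processing service
        stop_gpu: Callable[[], None],       # e.g. set "desired tasks" to zero
        poll_seconds: float = 10.0,
        max_polls: int = 60,
    ) -> None:
        self.start_gpu = start_gpu
        self.gpu_ready = gpu_ready
        self.forward = forward
        self.stop_gpu = stop_gpu
        self.poll_seconds = poll_seconds
        self.max_polls = max_polls

    def handle_message(self, payload: bytes) -> bool:
        """Process one message; return True if it triggered training."""
        message = json.loads(payload)
        # Assumed message format: only training requests need the GPU.
        if message.get("command") != "train":
            return False
        self.start_gpu()
        for _ in range(self.max_polls):
            if self.gpu_ready():
                self.forward(message)
                return True
            time.sleep(self.poll_seconds)
        raise TimeoutError("GPU processing service did not become ready")

    def training_finished(self) -> None:
        """Call when the processing service reports that training is done."""
        self.stop_gpu()
```

Injecting the callbacks keeps the agent independent of any particular AWS or mqtt library, so the start, wait, forward, stop sequence can be tested on its own.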