Automatically turn off GPU and other services to save AWS costs #6

The FBM services (node, gui, restful, mqtt, jupyter, tensorboard) are run as ECS services.
These services can be turned off when not in use by setting their "desired tasks" to zero.
- For the Fargate services (all except the node), costs stop once the service task ends;
- For the EC2 services (the GPU-enabled node services), the instance is terminated 15 minutes later, and costs stop then.
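A minimal sketch of turning the services off (or back on) from a script, assuming the boto3 ECS client, whose `update_service` call accepts `cluster`, `service` and `desiredCount`. The ECS service names below simply reuse the component names from this issue and are hypothetical, as is any cluster name passed in; the real deployment may use different names.

```python
"""Scale Fed-BioMed ECS services by changing their desired task count."""
from typing import Any, Iterable, List

# Hypothetical ECS service names; replace with the names used in the deployment.
FBM_SERVICES = ("node", "gui", "restful", "mqtt", "jupyter", "tensorboard")


def set_desired_count(client: Any, cluster: str, services: Iterable[str], count: int) -> List[str]:
    """Set desiredCount on each service and return the names that were updated.

    `client` is expected to be a boto3 ECS client, i.e. boto3.client("ecs").
    """
    if count < 0:
        raise ValueError("desired count must be non-negative")
    updated = []
    for service in services:
        # ECS stops (count=0) or starts (count=1) tasks to match the new value.
        client.update_service(cluster=cluster, service=service, desiredCount=count)
        updated.append(service)
    return updated


def stop_services(client: Any, cluster: str, services: Iterable[str] = FBM_SERVICES) -> List[str]:
    """Turn services off: Fargate billing ends with the task, EC2 later on scale-in."""
    return set_desired_count(client, cluster, services, 0)


def start_services(client: Any, cluster: str, services: Iterable[str] = FBM_SERVICES) -> List[str]:
    """Turn services back on with a single task each."""
    return set_desired_count(client, cluster, services, 1)
```

For example, `stop_services(boto3.client("ecs"), "fbm-cluster")` would set all six services to zero tasks, and `start_services(boto3.client("ecs"), "fbm-cluster", ["node"])` would bring back only the node ("fbm-cluster" being a placeholder name).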

This would save on AWS costs, particularly for the expensive GPU instances running the node services. Services can be restarted by setting "desired tasks" back to one.
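Turning services off *automatically* needs a rule for when to do it. One possible rule, sketched here as an assumption rather than anything Fed-BioMed currently provides, is an idle timeout: keep one task while a training is running or while the service was used recently, otherwise scale to zero. The function name and its inputs are hypothetical; how `last_activity` and `training_running` would be obtained (for example from MQTT message timestamps) is left open.

```python
from datetime import datetime, timedelta, timezone


def desired_count_for(
    last_activity: datetime,
    now: datetime,
    idle_limit: timedelta,
    training_running: bool = False,
) -> int:
    """Return the desired task count (0 or 1) under a simple idle-timeout rule."""
    if now < last_activity:
        raise ValueError("now must not be earlier than last_activity")
    if training_running:
        return 1  # never stop a node in the middle of a training round
    # Scale to zero only once the idle period exceeds the limit.
    return 0 if now - last_activity > idle_limit else 1
```

A scheduled job could evaluate this periodically for each service and update the service's desired count only when the returned value differs from the current one.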

Ideally, services would be spun up only when needed, for example with GPU nodes starting before a training plan is executed and shutting down afterwards. However, the current Fed-BioMed architecture does not support this. The only communication between the node and the network is the node connecting to the MQTT and RESTful servers, so the node must already be running in order to learn that a GPU instance needs to start.

One way of dealing with this for the GPU nodes (the most expensive components) could be to split the Fed-BioMed node code into two services: a messaging agent that runs as a lightweight service always connected to the MQTT server, and a processing service on the GPU instance that runs the training. The messaging agent would start the GPU service when required, then instruct it to perform the training.
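The proposed split can be sketched as follows. This is a hypothetical outline, not existing Fed-BioMed code: `GpuWorkerAgent` and its three hooks (`scale`, `is_ready`, `run_training`) are illustrative names, and the polling interval and 30-minute start-up timeout are arbitrary defaults. In a deployment, `scale` might wrap an ECS `update_service` call on the GPU node service and `is_ready` a health check on the processing service.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class GpuWorkerAgent:
    """Lightweight agent that stays connected to MQTT and starts the GPU
    processing service only for the duration of a training request.

    Hypothetical hooks:
      scale(count)      -> set the GPU service's desired task count
      is_ready()        -> True once the GPU task is running and reachable
      run_training(req) -> forward the request and block until training ends
    """

    scale: Callable[[int], None]
    is_ready: Callable[[], bool]
    run_training: Callable[[dict], dict]
    poll_seconds: float = 5.0
    timeout_seconds: float = 1800.0

    def handle_training_request(self, request: dict) -> dict:
        self.scale(1)  # ask ECS for one GPU task (an EC2 instance may need to boot)
        try:
            deadline = time.monotonic() + self.timeout_seconds
            while not self.is_ready():
                if time.monotonic() > deadline:
                    raise TimeoutError("GPU service did not become ready in time")
                time.sleep(self.poll_seconds)
            return self.run_training(request)
        finally:
            self.scale(0)  # release the GPU even if start-up or training failed
```

The `finally` clause ensures the GPU service is scaled back to zero whether training succeeds or fails. This sketch handles one request at a time; with concurrent training requests the agent would need to track in-flight jobs so that it does not scale to zero while another job is still running.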
