Automatically turn off GPU and other services to save AWS costs #6

@tomdoel

The FBM services (node, gui, restful, mqtt, jupyter, tensorboard) are run as ECS services.
These services can be turned off when not in use by setting their "desired tasks" to zero.

  • For Fargate services (all except the node), costs will stop once the service task ends;
  • For the EC2 services (the GPU-enabled node services), the instance will be terminated 15 minutes later, and costs will stop then.

This would save on AWS costs, particularly for the expensive GPU instances running the node services. Services can be restarted by setting "desired tasks" back to one.
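As a rough illustration of the stop/restart step described above, the sketch below sets the desired task count for a list of ECS services. It relies on the boto3 ECS client's `update_service` call; the cluster name, the service names, and the helper `set_desired_count` are hypothetical placeholders rather than names taken from the FBM deployment.

```python
from typing import Any, Iterable, List

# Hypothetical names: replace with the real ECS cluster and service names.
CLUSTER = "fbm-cluster"
SERVICES = ["node", "gui", "restful", "mqtt", "jupyter", "tensorboard"]


def set_desired_count(
    ecs_client: Any, cluster: str, services: Iterable[str], count: int
) -> List[str]:
    """Set the desired task count of each ECS service and return the names updated.

    `ecs_client` is expected to behave like `boto3.client("ecs")`.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    updated = []
    for name in services:
        # desiredCount=0 stops the service's tasks; desiredCount=1 restarts one task.
        ecs_client.update_service(cluster=cluster, service=name, desiredCount=count)
        updated.append(name)
    return updated


# Example use (needs boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   ecs = boto3.client("ecs")
#   set_desired_count(ecs, CLUSTER, SERVICES, 0)    # turn everything off
#   set_desired_count(ecs, CLUSTER, ["node"], 1)    # bring the node back
```

Passing the client in as an argument keeps the helper easy to test without AWS access. Running it from a scheduled job would be one possible way to automate the shutdown.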

Ideally, services could be spun up when needed, for example with GPU nodes starting up before a training plan is executed and shutting down afterwards. However, the current Fed-BioMed architecture does not support this. The only communication between the node and the network is the node connecting to the mqtt and restful servers, so the node must already be running in order to know that a GPU instance needs to start.

One way of dealing with this for the GPU nodes (which are the most expensive components) could be to split the Fed-BioMed node code into two services: a messaging agent, which runs as a lightweight service always connected to the mqtt server, and a processing service on the GPU instance, which runs the training. The messaging agent would spin up the GPU instance when required, before instructing it to perform the training.
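The proposed split could look roughly like the sketch below. This is not existing Fed-BioMed code: the `MessagingAgent` class, its parameters, and the `forward` callback are all hypothetical. It assumes the agent receives training requests from elsewhere (for example an mqtt subscription, not shown), that `forward` blocks until training has finished, and that `ecs_client` behaves like `boto3.client("ecs")` with its `update_service` and `describe_services` calls.

```python
import time
from typing import Any, Callable


class MessagingAgent:
    """Hypothetical always-on agent that starts the GPU service on demand."""

    def __init__(
        self,
        ecs_client: Any,
        cluster: str,
        gpu_service: str,
        forward: Callable[[dict], Any],
        poll_seconds: float = 15.0,
        timeout_seconds: float = 900.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.ecs = ecs_client
        self.cluster = cluster
        self.gpu_service = gpu_service
        self.forward = forward  # assumed to block until training completes
        self.poll_seconds = poll_seconds
        self.timeout_seconds = timeout_seconds
        self.sleep = sleep

    def _scale(self, count: int) -> None:
        self.ecs.update_service(
            cluster=self.cluster, service=self.gpu_service, desiredCount=count
        )

    def _running_count(self) -> int:
        resp = self.ecs.describe_services(
            cluster=self.cluster, services=[self.gpu_service]
        )
        return resp["services"][0]["runningCount"]

    def handle_training_request(self, message: dict) -> Any:
        """Start the GPU service, wait for a running task, forward, then stop."""
        self._scale(1)
        try:
            waited = 0.0
            while self._running_count() < 1:
                if waited >= self.timeout_seconds:
                    raise TimeoutError("GPU service did not start in time")
                self.sleep(self.poll_seconds)
                waited += self.poll_seconds
            return self.forward(message)
        finally:
            # Always scale back to zero so the GPU instance can be terminated.
            self._scale(0)
```

Scaling back to zero in the `finally` block means a failed or timed-out request does not leave the GPU instance running. A real implementation would also need to handle concurrent requests, which this sketch ignores.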

Labels: enhancement
