
Add support for distributed asynchronous training (ParamServer) #2

Description

@Henry-Chinner

The ParamServer is a network service responsible for storing the model parameters. Parameters are updated by asynchronously applying deltas pushed from stochastic gradient descent processes (ParamServer clients).
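
To make the intended behaviour concrete, here is a minimal in-process sketch. Everything in it is an assumption rather than existing code: the ParamServer class name, the push_delta/pull interface, and the lock-based update all stand in for whatever network service we end up building.

```python
import threading
import numpy as np

# Minimal sketch of a parameter server (hypothetical API, not existing code).
# Clients pull a snapshot of the weights and push gradient deltas back;
# the lock keeps each update atomic, so clients only wait on this short
# critical section, never on each other's gradient computation.
class ParamServer:
    def __init__(self, shapes):
        # Zero-initialised parameters for simplicity; real code would use
        # the model's own initialiser.
        self._params = {name: np.zeros(shape) for name, shape in shapes.items()}
        self._lock = threading.Lock()

    def push_delta(self, deltas):
        # Apply a client's deltas in place, asynchronously with respect
        # to other clients' training steps.
        with self._lock:
            for name, delta in deltas.items():
                self._params[name] += delta

    def pull(self):
        # Return a snapshot of the current parameters.
        with self._lock:
            return {name: p.copy() for name, p in self._params.items()}
```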

Since the ParamServer is a separate process, it can be implemented in a lower-level language like C, so weight updates can be fast. At the moment weight updates take up 50% of the training time, so it makes sense to move them out into a new service for two reasons:

  1. Allow for asynchronous training (data parallelism); a sketch of such a training loop follows this list.
  2. Move a bottleneck of the code into a specialized service, which will place focus on it and allow us to implement it in another language.
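
As a sketch of the client side, reusing the hypothetical ParamServer above: each worker pulls the (possibly stale) weights, computes gradients on its own data shard, and pushes deltas back without synchronising with the other workers. The linear-regression objective here is only a stand-in for a real network's backward pass.

```python
import threading
import numpy as np

def worker(server, shard, lr=0.01, steps=100):
    # One asynchronous SGD worker: pull, compute, push, repeat.
    x, y = shard
    for _ in range(steps):
        w = server.pull()["w"]                   # possibly stale weights
        grad = 2.0 * x.T @ (x @ w - y) / len(y)  # MSE gradient (stand-in for backprop)
        server.push_delta({"w": -lr * grad})     # asynchronous delta update

# Hypothetical usage: two workers fit a linear model on two data shards.
rng = np.random.default_rng(0)
true_w = np.array([3.0, -2.0])
X = rng.normal(size=(200, 2))
Y = X @ true_w
server = ParamServer({"w": (2,)})
shards = [(X[:100], Y[:100]), (X[100:], Y[100:])]
threads = [threading.Thread(target=worker, args=(server, s)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(server.pull()["w"])  # converges towards [3, -2]
```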

One reason to delay this implementation is that it is not known how easy it will be to generalize this approach to other types of networks. The approach is employed by Google, and I have personal experience applying it; in both cases it was used to manage the parameters of a feed-forward network. It may be a good idea to delay this implementation until stronger patterns have emerged in the code.
