
Add support for distributed asynchronous training (ParamServer) #2

Description

@Henry-Chinner

The ParamServer is a network service responsible for storing the model parameters. Parameters are updated by asynchronously applying deltas pushed from stochastic gradient descent processes (ParamServer clients).
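
To make the intended behaviour concrete, here is a minimal in-process sketch. Everything in it is an assumption rather than existing code: the ParamServer class name, the push_delta/pull interface, and the lock-based update all stand in for whatever network service we end up building.

```python
import threading
import numpy as np

# Minimal sketch of a parameter server (hypothetical API, not existing code).
# Clients pull a snapshot of the weights and push gradient deltas back;
# the lock keeps each update atomic, so clients only wait on this short
# critical section, never on each other's gradient computation.
class ParamServer:
    def __init__(self, shapes):
        # Zero-initialised parameters for simplicity; real code would use
        # the model's own initialiser.
        self._params = {name: np.zeros(shape) for name, shape in shapes.items()}
        self._lock = threading.Lock()

    def push_delta(self, deltas):
        # Apply a client's deltas in place, asynchronously with respect
        # to other clients' training steps.
        with self._lock:
            for name, delta in deltas.items():
                self._params[name] += delta

    def pull(self):
        # Return a snapshot of the current parameters.
        with self._lock:
            return {name: p.copy() for name, p in self._params.items()}
```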

Since the ParamServer is a separate process, it can be implemented in a lower-level language like C, so weight updates can be fast. At the moment weight updates take up 50% of the training time, so it makes sense to move them out into a new service for two reasons:

  1. Allow for asynchronous training (data parallelism); a sketch of such a training loop follows this list.
  2. Move a bottleneck of the code into a specialized service, which will place focus on it and allow us to implement it in another language.
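
As a sketch of the client side, reusing the hypothetical ParamServer above: each worker pulls the (possibly stale) weights, computes gradients on its own data shard, and pushes deltas back without synchronising with the other workers. The linear-regression objective here is only a stand-in for a real network's backward pass.

```python
import threading
import numpy as np

def worker(server, shard, lr=0.01, steps=100):
    # One asynchronous SGD worker: pull, compute, push, repeat.
    x, y = shard
    for _ in range(steps):
        w = server.pull()["w"]                   # possibly stale weights
        grad = 2.0 * x.T @ (x @ w - y) / len(y)  # MSE gradient (stand-in for backprop)
        server.push_delta({"w": -lr * grad})     # asynchronous delta update

# Hypothetical usage: two workers fit a linear model on two data shards.
rng = np.random.default_rng(0)
true_w = np.array([3.0, -2.0])
X = rng.normal(size=(200, 2))
Y = X @ true_w
server = ParamServer({"w": (2,)})
shards = [(X[:100], Y[:100]), (X[100:], Y[100:])]
threads = [threading.Thread(target=worker, args=(server, s)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(server.pull()["w"])  # converges towards [3, -2]
```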

One reason to delay this implementation is that it is not known how easy it will be to generalize this approach to other types of networks. The approach is employed by Google, and I have personal experience applying it; in both cases it was used to manage the parameters of a feed-forward network. It may be a good idea to delay this implementation until stronger patterns have emerged in the code.
