This repository was archived by the owner on Oct 8, 2019. It is now read-only.
Implement general Parameter Server #332
A parameter server is a framework for asynchronously sharing parameters among machine learning workers to achieve higher scalability. Hivemall currently has a standalone server implementation, called a MIX server, that asynchronously averages parameters among workers for internal use only. To make the MIX server more general, we plan to implement parameter-server functionality (e.g., cluster-manager support, optimizers that compute deltas from gradients to update parameters, and RPC protocols that third-party libraries can use) on top of this implementation.
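To make the averaging behavior concrete, here is a minimal sketch of the MIX-style idea: each worker sends its local weight for a feature, and the server keeps a running average that workers can blend back in. Class and method names (`AveragingServer`, `mix`) are illustrative assumptions, not Hivemall's actual API.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of MIX-style asynchronous parameter averaging.
// Workers call mix(featureId, localWeight); the server accumulates the
// contributions and returns the current global average for that feature.
class AveragingServer {
    private static final class Accumulator {
        double sum;
        long count;
    }

    private final ConcurrentHashMap<String, Accumulator> params =
            new ConcurrentHashMap<>();

    // Called concurrently by workers; returns the running average so the
    // caller can blend it into its local model.
    double mix(String featureId, double localWeight) {
        Accumulator acc = params.computeIfAbsent(featureId, k -> new Accumulator());
        synchronized (acc) {
            acc.sum += localWeight;
            acc.count++;
            return acc.sum / acc.count;
        }
    }
}
```

The real MIX server batches and cancels requests over the network; this sketch only shows the averaging contract itself.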
We have started some work as a first step:
- Support cluster managers. The MIX server implementation currently supports a standalone mode, and a cluster of MIX servers can be started through the start-up script. For easier operability, it would be good to deploy MIX servers via cluster managers such as Apache Hadoop YARN and Apache Mesos. We are working on a YARN integration in [WIP] Add codes to launch MIX servers through a yarn cluster manager #236, Launch MIX servers through a yarn cluster manager #246, and the topic branch.
- Incorporate optimizer functionality into the MIX server. Hivemall has optimizer functionality in the core package, so we will separate it out in Separate optimizer implementations from the core package #285 and then import it in both core and mixserv.
- Define RPC protocols for general use. There is some prior work (e.g., Making MixServer's serialization pluggable #147), but this interoperability issue is still open.
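For the optimizer item above, the essential contract is turning a gradient into a delta that the server applies to a parameter. Below is a minimal sketch of such an abstraction, assuming plain SGD; the interface and class names (`Optimizer`, `Sgd`) are hypothetical and do not reflect Hivemall's actual optimizer package.

```java
// Hypothetical optimizer abstraction: an optimizer converts a gradient
// into a delta that the parameter server applies to the stored weight.
interface Optimizer {
    double computeDelta(double gradient);
}

// Plain SGD as the simplest instance: delta = -learningRate * gradient.
final class Sgd implements Optimizer {
    private final double learningRate;

    Sgd(double learningRate) {
        this.learningRate = learningRate;
    }

    @Override
    public double computeDelta(double gradient) {
        return -learningRate * gradient;
    }
}
```

Moving such an interface server-side is what would let the MIX server apply adaptive updates (e.g., AdaGrad-style state per parameter) instead of only averaging.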
This ticket is for tracking activities related to parameter servers; please feel free to leave comments and advice here.