Implementation of Spark history server that uses MongoDB as a backend to store events.
History server provides persistence and quick access to application logs without keeping data in memory. It is inspired by the amazing work done for Spark UI in hammerlab/spree, and makes install and ops easy, since the project is designed to be a drop-in replacement for Spark history server.
This is a very early stage of the project and some notable features are missing such as RDD operation graph, event timeline, cache timeline, etc. I will be working on adding them, and contributions are always welcome.
Requirements
- Spark 2.x
- Java 7+
- Mongo 3.2+ (see install below)
Distributions (.tgz) are uploaded with every release and are available in the releases tab on GitHub. You can also build your own, see Build section.
Download one of the distributions history-server-bin-X.Y.Z.tgz, unpack the archive, and edit a few
configuration parameters in conf/history-server-env.sh (see Configuration).
$ tar -xzf history-server-bin-X.Y.Z.tgz
# optionally edit configuration
$ vi conf/history-server-env.sh
Make sure that you have MongoDB running before you start the application (though the app will report an error
if the database is not accessible). You can also run MongoDB in a docker container; in this case you do not need
to change any settings in conf/history-server-env.sh (unless you also change container host/port).
$ docker run -it -p 27017:27017 mongo:3.2
Application will create database history_server and the necessary collections automatically.
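Since the server depends on MongoDB being reachable, a small pre-flight check before starting it can save a failed launch. The sketch below is only an illustration: mongo_host_port and check_mongo are hypothetical helper names that are not part of this project, nc (netcat) is assumed to be installed, and the parsing only handles simple URLs of the form mongodb://host:port without credentials.

```shell
# Hypothetical helper: extract "host:port" from a simple mongodb:// URL
mongo_host_port() {
  hostport="${1#mongodb://}"   # drop the scheme
  hostport="${hostport%%/*}"   # drop any trailing /database path
  printf '%s\n' "$hostport"
}

# Hypothetical pre-flight check: succeeds if the MongoDB port accepts connections.
# Falls back to the default URL when MONGO_CONNECTION is not set.
# Assumes the URL carries an explicit port and that nc (netcat) is available.
check_mongo() {
  target="$(mongo_host_port "${MONGO_CONNECTION:-mongodb://localhost:27017}")"
  nc -z "${target%%:*}" "${target##*:}"
}
```

With these defined, something like `check_mongo && sbin/start.sh` would only launch the server when the database port is open.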
To launch application run:
$ sbin/start.sh
Following options can be specified with start.sh:
- -d, --daemon=true/false: launch service as daemon process
- --help: show help for script
To stop application use Ctrl-C or sbin/stop.sh. Script does not stop Mongo database or docker
container as part of shutdown.
Configuration for history server is available in conf/history-server-env.sh. You can set
following options:
- HISTORY_SERVER_HOST: host to use for history server, default is localhost
- HISTORY_SERVER_PORT: port to use for history server, default is 8080
- SPARK_EVENT_LOG_DIR: directory with Spark application logs, normally configured as spark.eventLog.dir option in Spark, can be either file:/ or hdfs:/; directory should exist, otherwise error is raised
- MONGO_CONNECTION: connection url to MongoDB, default is mongodb://localhost:27017
- LOG4J_CONF_FILE: alternative path to log4j configuration file, should be in form of file:/path/to/file; if not provided, default is used in conf/ directory
You can also configure logging in conf/log4j.properties; by default logging level is set to INFO.
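As an illustration, a filled-in conf/history-server-env.sh might look like the sketch below. It assumes the file is a plain shell script that exports these variables, which its name suggests but which is not stated here. The event log directory and the commented-out log4j path are arbitrary example values.

```shell
# Sketch of conf/history-server-env.sh (assumed format: exported shell variables)

# Host and port for the history server (documented defaults)
export HISTORY_SERVER_HOST="localhost"
export HISTORY_SERVER_PORT="8080"

# Directory with Spark event logs (example path); it must already exist
export SPARK_EVENT_LOG_DIR="file:/tmp/spark-events"

# Connection URL for MongoDB (documented default)
export MONGO_CONNECTION="mongodb://localhost:27017"

# Optional: alternative log4j configuration (example path, left disabled)
# export LOG4J_CONF_FILE="file:/path/to/file"
```

On the Spark side, the matching settings would be spark.eventLog.enabled=true and spark.eventLog.dir pointing at the same directory.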
If you want to build project, instructions are below:
Build requirements
- Java 7+
- Node 6+ (npm 3.9.5 works)
Clone repository:
git clone https://github.com/lightcopy/history-server.git
cd history-server
# Prepare code and dev files
sbt compile # pull dependencies and compile code
npm install # install frontend dependencies
To make distribution, just run bin/make-distribution. Script will compile sources, assemble jar,
create static files (html/css/js), and copy them into target/history-server-bin directory.
Following options are available:
- --name: adds suffix to the name, e.g. --name=xyz will result in target/history-server-bin-xyz
- --tgz: create .tgz archive, release directory will be removed afterwards; if not provided, only directory is created
- --help: show help for script
Note that there is no need to build distribution to test code, since repository acts like distribution (all scripts work the same way). Following process might be useful:
# build code and assembly jar
$ sbt assembly
# build static files
$ npm run dev
# run start script (Mongo should be running)
$ sbin/start.sh
start.sh will discover jars that need to be added to classpath.
Also bin/start-dev.sh script is available to test either frontend or some basic functionality.
This runs server that does not require MongoDB or scanning any event logs and returns sample data
when API is invoked.
You can also run individual build commands declared in package.json, e.g. to rebuild javascript
code, just run npm run make_js.
Run sbt test to launch tests.
To prepare a release, run bin/make-release with --release set to release version (e.g. 0.1.2) and --next set to
next development version (e.g. 0.1.3-SNAPSHOT).
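In the example values above, the next development version is the release version with the patch number bumped and -SNAPSHOT appended, so it can be derived instead of typed by hand. A minimal sketch, assuming versions of the form X.Y.Z and a patch-level bump; next_snapshot is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: derive the next development version from a release version
# by incrementing the last (patch) component and appending -SNAPSHOT.
# Assumes a plain X.Y.Z version without leading zeros in the patch number.
next_snapshot() {
  release="$1"
  prefix="${release%.*}"    # everything before the last dot, e.g. 0.1
  patch="${release##*.}"    # the last component, e.g. 2
  printf '%s.%s-SNAPSHOT\n' "$prefix" "$((patch + 1))"
}
```

With that helper, a release could be started as `bin/make-release --release=0.1.2 --next="$(next_snapshot 0.1.2)"`, assuming the script accepts the --option=value form used by the other scripts here.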