
Spark Cluster Docker

Powered by Rock the JVM

This repository contains the Docker files to create a Spark cluster with a JupyterLab interface. The cluster is used as a teaching tool for the Rock the JVM online courses and live training sessions on Apache Spark.

The cluster is set up for Spark 3.0.0.

How to Install

As a prerequisite, you need a Docker installation for your OS. This repository has been tested on Linux and macOS, but with a Bash interpreter it should also work on Windows as-is.

Then, you need to build the Docker images. This repository contains image definitions for

  • a JupyterLab interface
  • a Spark master node
  • a Spark worker node (of which we'll instantiate two, each with 2 vCores and 1 GB of memory)

To build the images, run the build script from the root directory:

./build-images.sh

After the build finishes, still in the root directory, run

docker-compose up

That's it!

Other Tools

To kill the cluster, hit Ctrl-C in the terminal running it, or run this command from another terminal in the root directory:

docker-compose kill

To remove the containers altogether, run this from the root directory:

docker-compose rm

To start a (Scala) Spark Shell, run

./start-spark-shell.sh
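
Once the Scala shell is up, a quick sanity check might look like the following minimal sketch; the spark session and sc context are already provided by the shell:

println(spark.version)                               // should print 3.0.0, per the cluster setup
val total = sc.parallelize(1 to 1000).reduce(_ + _)  // runs as a distributed job on the workers
println(total)                                       // expected: 500500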

To start a PySpark shell, run

./start-pyspark.sh

To start a Spark SQL shell, run

./start-spark-sql.sh

PostgreSQL

This setup also includes a SQL database (PostgreSQL) for students to access from Apache Spark. The database comes preloaded with a smaller version of the classic, fictitious "employees" database.

To open a PSQL shell and manage the database manually, run the helper script

./psql.sh
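
For reference, reading one of the preloaded tables from the Scala Spark shell might look like the sketch below. The host name, database name, table name and credentials are placeholders (assumptions), not the actual values; check the docker-compose configuration for the real ones. The sketch also assumes the PostgreSQL JDBC driver is on the Spark classpath, which this setup is built for:

// a minimal sketch with placeholder connection details; replace them
// with the values defined in this repository's compose configuration
val employeesDF = spark.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://<postgres-host>:5432/<database>")  // placeholder host and database
  .option("dbtable", "employees")                                      // table name may differ
  .option("user", "<user>")                                            // placeholder credentials
  .option("password", "<password>")
  .load()

employeesDF.printSchema()
employeesDF.show(5)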

How to upload data to the Spark cluster

You have two options:

  1. Use the JupyterLab upload interface while it's active.
  2. Copy your data to the shared-workspace directory, which is auto-mounted on all the containers (see the sketch after this list).
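
As a sketch of option 2, once a CSV file has been copied into shared-workspace, it can be read from the Spark shell. The mount point and file name below are assumptions for illustration; check docker-compose.yml for the actual path inside the containers:

// a minimal sketch; "/opt/workspace" is an assumed mount point for shared-workspace,
// and "my-data.csv" is a hypothetical file name
val df = spark.read
  .option("header", "true")       // first line contains column names
  .option("inferSchema", "true")  // let Spark guess column types
  .csv("/opt/workspace/my-data.csv")

df.show(5)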

How to port your notebooks to another Jupyter instance

Similar options:

  1. Use the JupyterLab interface to download your notebooks as .ipynb files.
  2. Copy the .ipynb files directly from the shared-workspace directory: everything you save will be immediately visible there.
