Spark 2.4.3
Download and extract Spark to the path where you want to install it.
Edit the hosts file:
$ sudo vim /etc/hosts
Add entries for the master and slaves to the hosts file:
<MASTER-IP> master
<SLAVE01-IP> slave1
<SLAVE02-IP> slave2
<SLAVE03-IP> slave3
Generate key pairs:
$ ssh-keygen -t rsa -P ""
Configure passwordless SSH by copying the public key to each slave:
$ ssh-copy-id -p port remote_username@server_ip_address
Verify by SSHing into each slave:
$ ssh slave1
$ ssh slave2
$ ssh slave3
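The key-copy step above can also be scripted for all three slaves at once. The loop below is a sketch that assumes the same username on every slave and the default SSH port:

```shell
# Copy the master's public key to every slave listed in /etc/hosts.
# Assumes the same username on each slave and the default SSH port (22).
for host in slave1 slave2 slave3; do
  ssh-copy-id "$USER@$host"
done
```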
Perform the following steps on the master only.
Move to the Spark conf directory and create spark-env.sh from its template:
$ cd /usr/local/spark/conf
$ cp spark-env.sh.template spark-env.sh
Now edit the configuration file spark-env.sh.
$ sudo vim spark-env.sh
And set the following parameters.
export SPARK_MASTER_HOST='<MASTER-IP>'
export JAVA_HOME=<Path_of_JAVA_installation>
Note: JAVA_HOME must be set on the slaves as well.
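A filled-in spark-env.sh might look like the following; the IP address and Java path are illustrative assumptions, so substitute the values for your own machines:

```shell
# Example spark-env.sh entries -- replace with values for your cluster
export SPARK_MASTER_HOST='192.168.0.10'             # IP of the master node
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64  # path of the Java installation
```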
Edit the slaves configuration file (in /usr/local/spark/conf):
$ sudo vim slaves
And add the following entries.
slave1
slave2
slave3
To start the Spark cluster, run the following commands on the master:
$ cd /usr/local/spark
$ ./sbin/start-all.sh
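Once the daemons are up, you can sanity-check the cluster by submitting the SparkPi example that ships with Spark. The jar name below matches a stock Spark 2.4.3 build (Scala 2.11); 7077 is the default standalone master port. Adjust either if your installation differs:

```shell
$ ./bin/spark-submit \
    --master spark://<MASTER-IP>:7077 \
    --class org.apache.spark.examples.SparkPi \
    examples/jars/spark-examples_2.11-2.4.3.jar 100
```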
To stop the Spark cluster, run the following commands on the master:
$ cd /usr/local/spark
$ ./sbin/stop-all.sh
To check the running daemons, run jps on the master and slaves. The master should list a Master process, and each slave a Worker.
Browse the Spark web UIs to see the worker nodes, running applications, and cluster resources:
http://<MASTER-IP>:8080/ (master UI)
http://<MASTER-IP>:4040/ (application UI, available while an application is running)
Reference: https://medium.com/ymedialabs-innovation/apache-spark-on-a-multi-node-cluster-b75967c8cb2b