spark-machine-learning

This is a demonstration of supervised and unsupervised machine learning techniques in Spark

This is Workshop 11, Spark Machine Learning, one of a workshop series given as part of the Big Data Engineering for Analytics module, which fulfills a requirement for the Engineering Big Data certificate issued by NUS-ISS

I have translated the original Python code to Scala

Getting started

Clone the repo

git clone https://github.com/frenoid/tour-of-spark.git

Structure

src/main/scala/com/normanlimxk/SparkML contains the ML code
src/main/resources contains data grouped by ML algorithm
build.sbt lists the dependencies, similar to pom.xml in Maven

Running the Spark job

You have 2 options to run the spark job

Compile and run on a spark-cluster
Use IntelliJ (Recommended)

(Option 1) Compile and run on a spark-cluster

Do this if you have a spark cluster to spark-submit to
Take note of these versions. See also build.sbt

scala = 2.12.10
spark = 3.0.3
sbt = 1.6.1
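
As a rough sketch, a build.sbt matching the versions above might look like the following. The project name, project version, exact dependency list and the "provided" scope are assumptions for illustration; the repository's actual build.sbt is the source of truth. The sbt version itself is normally pinned separately in project/build.properties.

```scala
// build.sbt (illustrative sketch; the repository's actual file may differ)
name := "spark-machine-learning" // assumed project name
version := "0.1.0"               // hypothetical version
scalaVersion := "2.12.10"

// Assumption: Spark is marked "provided" because a cluster supplies it at
// runtime, which is also why Option 2 adds "provided" dependencies to the
// classpath when running inside the IDE.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "3.0.3" % "provided",
  "org.apache.spark" %% "spark-mllib" % "3.0.3" % "provided"
)
```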

Use sbt to compile and package the code into a jar

sbt package

The jar file will be in target/scala-2.12

Use spark-submit to submit the spark job

spark-submit --class {fully-qualified-main-class} {your-jar-file}

(Option 2 RECOMMENDED) Use IntelliJ

Install IntelliJ IDEA and use it to open the build.sbt file as a project

IntelliJ will resolve the dependencies listed in build.sbt

Go to Run > Edit Configurations > Modify options and enable Add dependencies with "provided" scope to classpath

Run > Run class of your choice

Data

The data was provided by Dr LIU FAN from NUS-ISS

Structure

Each class under src/main/scala/com/normanlimxk/SparkML contains examples of a machine learning method
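
To give a feel for the shape such a class might take, here is a minimal sketch of an unsupervised example using KMeans from Spark ML. The object name KMeansSketch, the helper names and the in-memory data points are all hypothetical and are not taken from the repository, whose classes read their data from src/main/resources.

```scala
import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object name, for illustration only.
object KMeansSketch {

  // Builds a tiny in-memory dataset with a single "features" vector column.
  def samplePoints(spark: SparkSession): DataFrame = {
    val rows = Seq(
      Vectors.dense(0.0, 0.0),
      Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0),
      Vectors.dense(9.1, 9.1)
    ).map(Tuple1.apply)
    spark.createDataFrame(rows).toDF("features")
  }

  // Fits a 2-cluster KMeans model; the seed is fixed for repeatable runs.
  def fitModel(points: DataFrame): KMeansModel =
    new KMeans().setK(2).setSeed(1L).fit(points)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KMeansSketch")
      .master("local[*]") // local mode for IDE runs; drop this when using spark-submit on a cluster
      .getOrCreate()

    val points = samplePoints(spark)
    val model = fitModel(points)

    // transform adds a "prediction" column holding each row's cluster index
    model.transform(points).show()

    spark.stop()
  }
}
```

A supervised example would follow the same pattern, with a labelled DataFrame and an estimator such as LogisticRegression in place of KMeans.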

The project uses the spark-sbt.g8 template from MrPowers
