This project exists as a place to view and test Scala Spark examples, with a corresponding pom.xml file for building locally with Maven. It assumes IntelliJ IDEA as the IDE but should work with any editor. Spark examples can be found at src/main/scala/com/datakickstart/spark/examples.
Recommended starting points:
- Spark Core Batch: Spark Core Example
- Spark Structured Streaming: Kafka Structured Streaming
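For orientation before digging into the repo's examples, a Kafka structured streaming consumer in Spark 2.3 follows a standard pattern. Below is a minimal sketch, not the repo's actual `KafkaStructuredStreamingExample`; the topic name, broker address, and console sink are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object KafkaConsumerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaConsumerSketch")
      .getOrCreate()

    // Read from Kafka as an unbounded streaming DataFrame
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "vehicle-stops") // topic name assumed for illustration
      .load()

    // Kafka keys/values arrive as bytes; cast to strings before processing
    val messages = stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Print each micro-batch to the console for quick inspection
    messages.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

Running this requires the same `--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0` flag used in the spark-submit command later in this README, since the Kafka source is not bundled with core Spark.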
The recommended approach is to use Docker, in which case you can follow the instructions in order (Docker setup, building the project, running the pipeline). You will need Java 8 and Maven installed locally. On a Mac, Homebrew should work, though you may have to specify versions.
To test the Kafka pipeline locally, you can use Docker and Docker Compose. Once those are installed, follow these steps:
- Start Docker (if not already running)
- In a terminal window, navigate to the project home directory (likely `spark-scala-training`)
- Run command `docker-compose up -d zookeeper`
- Run command `docker-compose up`
- Leave this running in the terminal window and proceed with the other steps in a new terminal window/tab
- Navigate to the project home directory
- Run command `mvn install`
- Kick off the Kafka Structured Streaming consuming pipeline with command `spark-submit --master local[*] --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 --class com.datakickstart.spark.examples.streaming.structured.KafkaStructuredStreamingExample target/spark-training-1.0-SNAPSHOT.jar`
- While the pipeline is running, produce messages to Kafka by running job [VehicleStopsWriter](https://github.com/datakickstart/spark-scala-training/blob/master/src/main/scala/com/datakickstart/common/VehicleStopsWriter.scala), either from your IDE or with command `java -cp target/spark-training-1.0-SNAPSHOT.jar com.datakickstart.common.VehicleStopsWriter vehicle-stops localhost:9092`
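The two arguments passed to VehicleStopsWriter above are the topic name (`vehicle-stops`) and the broker list (`localhost:9092`). If you want to produce test messages yourself, a plain Kafka producer along the following lines does the same kind of thing. This is a sketch under those assumptions, not the repo's actual implementation; the class name and message payloads are illustrative:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SimpleProducerSketch {
  def main(args: Array[String]): Unit = {
    // Topic and broker list mirror the arguments passed to VehicleStopsWriter
    val topic   = if (args.length > 0) args(0) else "vehicle-stops"
    val brokers = if (args.length > 1) args(1) else "localhost:9092"

    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Payloads here are placeholders; the real job writes vehicle-stop records
      (1 to 10).foreach { i =>
        producer.send(new ProducerRecord[String, String](topic, s"message-$i"))
      }
    } finally {
      producer.close() // flushes any buffered records before exiting
    }
  }
}
```

Any messages produced this way should show up in the console output of the running structured streaming pipeline within a micro-batch or two.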