In this demo, we will show you how to do the following, all orchestrated by Airflow:
- Create a Dataproc cluster
- Submit a Scala Spark job that runs a `select * from database.keyspace.table` against Astra
- Destroy the Dataproc cluster upon job completion
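The create/submit/destroy flow above can be sketched as an Airflow DAG using the Google provider's Dataproc operators. This is a minimal sketch, not the demo's actual DAG (see `poc.py` for that); the project ID, region, bucket, jar path, and main class below are placeholder assumptions you would replace with your own values.

```python
# Sketch of the orchestration flow; all IDs, paths, and names here are
# placeholders, not the values used by the demo's poc.py.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)

PROJECT_ID = "my-project"          # assumption: your GCP project ID
REGION = "us-central1"             # assumption: your Dataproc region
CLUSTER_NAME = "astra-demo-cluster"

SPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "spark_job": {
        # assumption: jar built from spark-cassandra.scala, uploaded to your bucket
        "jar_file_uris": ["gs://my-bucket/spark-cassandra-assembly.jar"],
        "main_class": "SparkCassandra",  # assumption: your jar's main class
    },
}

with DAG(
    dag_id="dataproc_astra_demo",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config={},  # empty config = provider defaults; tune as needed
    )
    submit_job = DataprocSubmitJobOperator(
        task_id="submit_spark_job",
        project_id=PROJECT_ID,
        region=REGION,
        job=SPARK_JOB,
    )
    # trigger_rule="all_done" tears the cluster down even if the job fails,
    # so you are not billed for an idle cluster after a failed run.
    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        trigger_rule="all_done",
    )

    create_cluster >> submit_job >> delete_cluster
```

Chaining the delete task with `trigger_rule="all_done"` is the usual pattern for guaranteeing cleanup regardless of job outcome.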
If you want to modify the Scala Spark jar, reference `spark-cassandra.scala` and the comments within the file.
1.1 Create a Google Cloud Storage Bucket: (https://console.cloud.google.com/storage/browser)
The default Airflow credentials are `admin` / `admin`.
3.3 Copy the full path of `keyfile.json`, paste it into the Keyfile Path field of the Google connection setup, and save.
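Before pasting the path, you can sanity-check that the file at that path is actually a service-account key. This helper is hypothetical (not part of the demo); it only checks for the standard fields a Google service-account key JSON carries.

```python
import json
from pathlib import Path


def looks_like_service_account_key(path: str) -> bool:
    """Return True if the file at `path` parses as JSON and contains the
    fields expected in a Google service-account key (the kind of keyfile
    the Airflow Google connection's Keyfile Path points at)."""
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError):
        return False
    required = {"type", "project_id", "private_key", "client_email"}
    return data.get("type") == "service_account" and required <= data.keys()
```

A `False` result usually means a wrong path, a truncated download, or a non-key JSON file.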
Reference the comments in `poc.py` if unsure.