Apache Spark for big data

After testing the distributed engine, it may be good to go back to the first examples: what if you have a lot of files or tabular data to process, and the data does not fit into the distributed engine?

You can try Spark programming. Spark is complex, so we only work through an example to get a feeling for how powerful it is in supporting us. You can go back to the taxi example.

Setting up Spark

You can try a single-node Spark on your laptop, or you can get a node in Databricks. Even better, if you have credits, you can get a Dataproc cluster in Google Cloud for free.

If you find it difficult to set up Apache Spark, that is normal, as this part should be part of the CS core. So let us assume that you have Spark running.
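For a quick check that a single-node setup works, a minimal sketch follows. It assumes PySpark was installed with `pip install pyspark` and that a Java runtime is available on the machine. The helper names `local_master` and `build_local_session` and the application name are made up for illustration.

```python
def local_master(cores="*"):
    """Return a Spark master URL for single-node mode, e.g. 'local[4]'."""
    return f"local[{cores}]"


def build_local_session(app_name="spark-demo", cores="*"):
    """Start (or reuse) a SparkSession that runs entirely on this machine."""
    # Imported here so this file can be loaded even before PySpark is installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(local_master(cores))   # "local[*]" uses all available cores
        .appName(app_name)
        .getOrCreate()
    )
```

Calling `spark = build_local_session()` and then `spark.range(5).show()` should print a small table if the installation works, and `spark.stop()` shuts the session down. On Databricks or Dataproc a session is normally provided for you, so the `master` setting would not be needed there.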

Low-level programming

Spark has several APIs. Since you work in data analytics, we assume you can write and run Python (other programming languages may work as well, but in the age of ML/AI, everyone likes to write some Python code).
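As a placeholder illustration, here is a small sketch that assumes "low-level" refers to Spark's RDD API. The three-column line format (`vendor,distance,fare`), the sample values, and the function names are hypothetical and not taken from the taxi example.

```python
def parse_trip(line):
    """Turn 'vendor,distance,fare' into (vendor, fare); columns are assumed."""
    vendor, _distance, fare = line.split(",")
    return vendor, float(fare)


def add(a, b):
    """Combine two partial sums; used as the reduce function."""
    return a + b


def fare_per_vendor(sc, lines):
    """Sum fares per vendor with RDD transformations.

    `sc` is a SparkContext, for example `spark.sparkContext`.
    """
    return (
        sc.parallelize(lines)   # distribute the raw lines over partitions
        .map(parse_trip)        # -> (vendor, fare) pairs
        .reduceByKey(add)       # sum the fares of each vendor
        .collectAsMap()         # bring the small result back as a dict
    )


# Made-up sample data for trying the functions out.
SAMPLE_LINES = ["A,1.2,7.5", "B,3.4,14.0", "A,0.8,5.0"]
```

With a running session `spark`, `fare_per_vendor(spark.sparkContext, SAMPLE_LINES)` should return `{"A": 12.5, "B": 14.0}`. The functions passed to `map` and `reduceByKey` are plain Python, so they can be tried on a few lines without a cluster before running them on the full data.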

Note: to be added

SQL style

Check our taxi sample again.
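A rough sketch of what the SQL style could look like on taxi data is shown here. The column names `passenger_count` and `trip_distance`, the view name `trips`, and the function name are assumptions, so adjust them to the actual taxi file.

```python
# Column names below are assumptions; adjust them to the actual taxi file.
TRIPS_PER_PASSENGER_COUNT = """
SELECT passenger_count,
       COUNT(*)           AS trips,
       AVG(trip_distance) AS avg_distance
FROM trips
GROUP BY passenger_count
ORDER BY passenger_count
"""


def query_taxi_csv(spark, csv_path):
    """Load a CSV with a header row, expose it as view `trips`, run the query."""
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    df.createOrReplaceTempView("trips")   # makes the DataFrame visible to SQL
    return spark.sql(TRIPS_PER_PASSENGER_COUNT)
```

With a running session `spark`, `query_taxi_csv(spark, "taxi.csv").show()` would print one row per passenger count; `taxi.csv` is a placeholder path. The result is an ordinary DataFrame, so SQL and DataFrame calls can be mixed freely.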
