After testing the distributed engine, it is worth going back to the first examples: what if you have many files or a large amount of tabular data to process, and that workload does not fit into the distributed engine?
You can try Spark programming. Spark is complex, but we will only work through one example to get a feeling for how powerful it is in supporting us. You can go back to the taxi example.
You can try a single-node Spark installation on your laptop, or you can get a node in Databricks. Even better, you can get a Dataproc cluster in Google Cloud at no cost if you have credits.
If you find it difficult to set up Apache Spark, that is normal, as this part should be part of the CS core. So let us assume that you have Spark running.
Spark has several APIs. Say you are a data analyst: we assume you can write and run Python (maybe other programming languages as well, but in the age of ML/AI, everyone likes to write some Python code).
Note: to be added
Check our taxi sample again.