Deployment Guide for Spark on EMR #11279
Gluten uses a couple of Spark APIs that may be modified in customized Spark distributions such as AWS EMR. Gluten is built against vanilla Spark only, so there is a high risk of API conflicts when you run Gluten on EMR Spark.
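A quick sanity check before going further is to confirm that the bundle was built for the same Spark minor line as the cluster. The sketch below parses the bundle file name (using the naming pattern of `gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.4.0.jar`) and compares it with a Spark version string; the helper name `bundle_matches_spark` is hypothetical. A match is necessary but not sufficient: as noted above, EMR's customized Spark can still change APIs that Gluten relies on even when the version line is the same. On a live cluster, `spark.version` from the `SparkSession` can supply the second argument; any suffix after the patch number is ignored.

```python
import re

# Matches names like gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.4.0.jar
# (optionally preceded by a path or S3 URI).
BUNDLE_RE = re.compile(
    r"gluten-velox-bundle-spark(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)"
    r"-(?P<platform>[a-z0-9_]+)-(?P<version>[\d.]+)\.jar$"
)


def bundle_matches_spark(jar_name: str, spark_version: str) -> bool:
    """Hypothetical helper: True if the bundle targets the cluster's Spark major.minor."""
    match = BUNDLE_RE.search(jar_name)
    if match is None:
        raise ValueError(f"unrecognized Gluten bundle name: {jar_name}")
    # Keep only major.minor, e.g. "3.5.4" -> "3.5"; anything after the patch part is dropped.
    major_minor = ".".join(spark_version.split(".")[:2])
    return match.group("spark") == major_minor


if __name__ == "__main__":
    jar = "gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.4.0.jar"
    print(bundle_matches_spark(jar, "3.5.4"))
```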
Small update: I managed to start the cluster with the JAR stored in S3, but then it was unable to read the Parquet files from S3. EMR has preinstalled packages that manage… I didn't really manage to get it working and gave up; maybe I'll come back to this at some point. If anyone manages to run Gluten-Velox on EMR, I'm looking forward to following your guidance :)
Hi, sorry for the amateur question, but I'm having difficulty testing a deployment of Gluten + Velox on AWS EMR.
I'm using EMR 7.8 with Spark 3.5.4 and PySpark, on Intel x86 instances running Amazon Linux. I want to test a job that reads plain Parquet, performs a simple Project & Exchange, and writes to an Iceberg table (copy-on-write). Both input and output are in S3.
I downloaded gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.4.0.jar and put it directly into S3. Then, on the EMR cluster, I'm trying to run spark-submit, passing the JAR and setting the configurations.
Questions:
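For reference, here is a minimal sketch of the kind of spark-submit invocation described in the question, assembled in Python to match the PySpark job. The S3 paths and the helper name `build_spark_submit` are hypothetical. The `--conf` keys are the basic ones from Gluten's Velox backend documentation (plugin class, off-heap memory, columnar shuffle manager), and the 4g off-heap size is only a placeholder to tune per instance type. Gluten's docs also show adding the bundle to `spark.driver.extraClassPath` and `spark.executor.extraClassPath`; those entries must be local paths on every node, not S3 URIs.

```python
import shlex
from typing import Dict, List, Optional

# Hypothetical S3 location of the downloaded bundle.
GLUTEN_JAR = "s3://my-bucket/jars/gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.4.0.jar"

# Basic Gluten + Velox settings; the off-heap size is a placeholder to tune.
GLUTEN_CONF: Dict[str, str] = {
    "spark.plugins": "org.apache.gluten.GlutenPlugin",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "4g",
    "spark.shuffle.manager": "org.apache.spark.shuffle.sort.ColumnarShuffleManager",
}


def build_spark_submit(
    app: str, jar: str = GLUTEN_JAR, extra_conf: Optional[Dict[str, str]] = None
) -> List[str]:
    """Hypothetical helper: return spark-submit argv with the Gluten jar and confs."""
    # Caller-supplied settings override the defaults above.
    conf = {**GLUTEN_CONF, **(extra_conf or {})}
    cmd = ["spark-submit", "--jars", jar]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)  # the PySpark script goes last
    return cmd


if __name__ == "__main__":
    # Hypothetical job script; prints a shell-quoted command line to copy or run.
    print(shlex.join(build_spark_submit("s3://my-bucket/jobs/parquet_to_iceberg.py")))
```

Keeping the settings in a dictionary makes it easy to layer the Iceberg catalog and S3 settings for the job on top via `extra_conf` without editing the Gluten defaults.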