10 changes: 10 additions & 0 deletions NOTICE-binary
@@ -53,6 +53,16 @@ Copyright (2021) The Delta Lake Project Authors.

---------------------------------------------------------

+Apache Hudi
+Copyright 2019-2020 The Apache Software Foundation.
+
+---------------------------------------------------------
+
+Apache Paimon
+Copyright 2023-2025 The Apache Software Foundation.
+
+---------------------------------------------------------

The Velox Project
Copyright © 2024 Meta Platforms, Inc.

12 changes: 6 additions & 6 deletions docs/get-started/Velox.md
@@ -247,12 +247,12 @@ When compiling the Gluten Java module, it's required to enable `celeborn` profile
mvn clean package -Pbackends-velox -Pspark-3.3 -Pceleborn -DskipTests
```

-Then add the Gluten and Spark Celeborn Client packages to your Spark application's classpath(usually add them into `$SPARK_HOME/jars`).
+Then add the Gluten and Spark Celeborn Client packages to your Spark application's classpath (usually add them into `$SPARK_HOME/jars`).

- Celeborn: celeborn-client-spark-3-shaded_2.12-[celebornVersion].jar
-- Gluten: gluten-velox-bundle-spark3.x_2.12-xx_xx_xx-SNAPSHOT.jar (The bundled Gluten Jar. Make sure -Pceleborn is specified when it is built.)
+- Gluten: gluten-velox-bundle-spark3.x_2.12-xxx.jar (The bundled Gluten JAR built with the -Pceleborn profile specified)
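As an illustration only (the helper name, directories, and version strings below are placeholders, not official artifact names), staging both jars into Spark's jar directory could be scripted like this:

```shell
# Copy the Celeborn client jar and the Gluten bundle jar into $SPARK_HOME/jars.
# Filenames are matched by glob because the exact versions depend on your build.
stage_shuffle_jars() {
  src_dir="$1"
  spark_home="$2"
  mkdir -p "$spark_home/jars"
  cp "$src_dir"/celeborn-client-spark-3-shaded_2.12-*.jar \
     "$src_dir"/gluten-velox-bundle-*.jar \
     "$spark_home/jars/"
}
```

Running `stage_shuffle_jars /path/to/built/jars "$SPARK_HOME"` would then make both packages visible to every Spark application.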

-Currently to use Gluten following configurations are required in `spark-defaults.conf`
+Currently, to use Gluten, the following configurations are required in `spark-defaults.conf`

```
spark.shuffle.manager org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager
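# Typically needed as well (values are placeholders for your deployment;
# the keys come from Celeborn's Spark client documentation):
spark.celeborn.master.endpoints <celeborn-master-host>:9097
# Celeborn replaces Spark's external shuffle service, which should be disabled.
spark.shuffle.service.enabled false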
@@ -304,12 +304,12 @@ When compiling the Gluten Java module, it's required to enable `uniffle` profile
mvn clean package -Pbackends-velox -Pspark-3.3 -Puniffle -DskipTests
```

-Then add the Uniffle and Spark Celeborn Client packages to your Spark application's classpath(usually add them into `$SPARK_HOME/jars`).
+Then add the Gluten and Uniffle client packages to your Spark application's classpath (usually add them into `$SPARK_HOME/jars`).

- Uniffle: rss-client-spark3-shaded-[uniffleVersion].jar
-- Gluten: gluten-velox-bundle-spark3.x_2.12-xx_xx_xx-SNAPSHOT.jar (The bundled Gluten Jar. Make sure -Puniffle is specified when it is built.)
+- Gluten: gluten-velox-bundle-spark3.x_2.12-xxx.jar (The bundled Gluten JAR built with the -Puniffle profile specified)

-Currently to use Gluten following configurations are required in `spark-defaults.conf`
+Currently, to use Gluten, the following configurations are required in `spark-defaults.conf`

```
spark.shuffle.manager org.apache.spark.shuffle.gluten.uniffle.UniffleShuffleManager
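# Typically needed as well (values are placeholders for your deployment;
# the keys come from Uniffle's Spark client documentation):
spark.rss.coordinator.quorum <coordinator-host>:19999
spark.rss.storage.type MEMORY_LOCALFILE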
20 changes: 10 additions & 10 deletions docs/velox-backend-troubleshooting.md
@@ -40,30 +40,30 @@ the gluten jar is loaded prior to the vanilla spark jar. In this section, we will

```
// Spark will upload the Gluten JAR to HDFS, and the NodeManager will fetch it before starting the executor process. spark.jars can be used here as well.
-spark.files = {absolute_path}/gluten-<spark-version>-<gluten-version>-SNAPSHOT-jar-with-dependencies.jar
+spark.files=<absolute_path_to_Gluten_JAR>
// The absolute path on running node
-spark.driver.extraClassPath={absolute_path}/gluten-<spark-version>-<gluten-version>-SNAPSHOT-jar-with-dependencies.jar
+spark.driver.extraClassPath=<absolute_path_to_Gluten_JAR>
// The relative path under the executor working directory
-spark.executor.extraClassPath=./gluten-<spark-version>-<gluten-version>-SNAPSHOT-jar-with-dependencies.jar
+spark.executor.extraClassPath=./<Gluten_JAR>
```

#### Configurations for Yarn Cluster mode
```
-spark.driver.userClassPathFirst = true
-spark.executor.userClassPathFirst = true
+spark.driver.userClassPathFirst=true
+spark.executor.userClassPathFirst=true

-spark.files = {absolute_path}/gluten-<spark-version>-<gluten-version>-SNAPSHOT-jar-with-dependencies.jar
+spark.files=<absolute_path_to_Gluten_JAR>
// The relative path under the driver container's working directory
-spark.driver.extraClassPath=./gluten-<spark-version>-<gluten-version>-SNAPSHOT-jar-with-dependencies.jar
+spark.driver.extraClassPath=./<Gluten_JAR>
// The relative path under the executor working directory
-spark.executor.extraClassPath=./gluten-<spark-version>-<gluten-version>-SNAPSHOT-jar-with-dependencies.jar
+spark.executor.extraClassPath=./<Gluten_JAR>
```
#### Configurations for Local & Standalone mode
```
// The absolute path on running node
-spark.driver.extraClassPath={absolute_path}/gluten-<spark-version>-<gluten-version>-SNAPSHOT-jar-with-dependencies.jar
+spark.driver.extraClassPath=<absolute_path_to_Gluten_JAR>
// The absolute path on running node
-spark.executor.extraClassPath={absolute_path}/gluten-<spark-version>-<gluten-version>-SNAPSHOT-jar-with-dependencies.jar
+spark.executor.extraClassPath=<absolute_path_to_Gluten_JAR>
```
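The same pair of settings appears in every mode, so it can be generated; a minimal sketch (the helper name and the example jar path are made up for illustration) that emits the two classpath entries for a given Gluten JAR:

```shell
# Print the Local/Standalone classpath settings for an absolute Gluten JAR path.
print_gluten_conf() {
  jar_path="$1"
  printf 'spark.driver.extraClassPath=%s\n' "$jar_path"
  printf 'spark.executor.extraClassPath=%s\n' "$jar_path"
}

print_gluten_conf /opt/gluten/gluten-velox-bundle.jar
```

Appending its output to `spark-defaults.conf` reproduces the block above with your own path filled in.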

### Invalid pointer error
6 changes: 3 additions & 3 deletions tools/qualification-tool/README.md
@@ -17,7 +17,7 @@ This will create a jar file in the `target` directory.
To execute the tool, use the following command:

```bash
-java -jar target/qualification-tool-1.3.0-SNAPSHOT-jar-with-dependencies.jar -f <eventFile>
+java -jar <qualification_tool_fat_JAR> -f <eventFile>
```

### Parameters:
@@ -37,12 +37,12 @@ java -jar target/qualification-tool-1.3.0-SNAPSHOT-jar-with-dependencies.jar -f

### Example Usage:
```bash
-java -jar target/qualification-tool-1.3.0-SNAPSHOT-jar-with-dependencies.jar -f /path/to/eventlog
+java -jar <qualification_tool_fat_JAR> -f /path/to/eventlog
```

### Advanced Example:
```bash
-java -jar target/qualification-tool-1.3.0-SNAPSHOT-jar-with-dependencies.jar -f /path/to/folder -o /output/path -t 8 -d 2023-01-01 -k /path/to/gcs_keys.json -p my_project
+java -jar <qualification_tool_fat_JAR> -f /path/to/folder -o /output/path -t 8 -d 2023-01-01 -k /path/to/gcs_keys.json -p my_project
```

## Features
9 changes: 2 additions & 7 deletions tools/workload/benchmark_velox/README.md
@@ -48,18 +48,13 @@ Please check the **Set up perf analysis tools (optional)** section in [initializ

After the workload completes, the tool generates a notebook, executes it automatically, and saves the output notebook in the `$HOME/PAUS/base_dir` directory with the name of `[APP_NAME]_[APP_ID].ipynb`. Additionally, the output notebook is converted into an HTML format for improved readability, with the same filename, and stored in the `html` sub-folder.

-A sample generated notebook for TPCH Q1 and its corresponding HTML file are available for reference:
-- Notebook: [tpch_q1.ipynb](./sample/tpch_q1.ipynb)
-- HTML file: [tpch_q1.html](./sample/tpch_q1.html)

-The notebook also produces a trace-viewer JSON file to analyze workload statistics. This includes SAR metrics and stage/task-level breakdowns. Using this tool, users can compare statistics across stages and queries, identify performance bottlenecks, and target specific stages for optimization.
+Executing the notebook will produce a trace-viewer JSON file for analysis. This includes SAR metrics and stage/task-level breakdowns. Using this tool, users can compare statistics across stages and queries, identify performance bottlenecks, and target specific stages for optimization.

You can explore the trace-viewer JSON file using the Google Chrome browser. To do so:

-1. Download the sample file [trace_result_tpch_q1.json](./sample/trace_result_tpch_q1.json)
+1. Download the trace result file, e.g., trace_result_tpch_q1.json.
2. Launch Google Chrome. In the address bar, enter "chrome://tracing/".
3. Use the "Load" button to upload the JSON file.

This will allow you to check the trace data interactively.

![trace-result-tpch-q1](./sample/Trace-viewer.png)
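If chrome://tracing rejects the file, it can help to first confirm that the trace JSON parses at all. A small sketch (the helper name is made up for illustration; python3 is assumed to be on the PATH):

```shell
# Report whether a trace file is well-formed JSON before loading it in Chrome.
check_trace() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "valid JSON"
  else
    echo "invalid JSON"
  fi
}
```

For example, `check_trace trace_result_tpch_q1.json` prints "valid JSON" only when the file parses cleanly.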
2 changes: 1 addition & 1 deletion tools/workload/benchmark_velox/params.yaml.template
@@ -2,7 +2,7 @@
gluten_home: /home/sparkuser/gluten

# Local path to gluten jar.
-gluten_target_jar: /home/sparkuser/gluten-velox-bundle-spark3.3_2.12-centos_7_x86_64-1.5.0-SNAPSHOT.jar
+gluten_target_jar: /home/sparkuser/gluten-velox-bundle-spark3.5_2.12-linux_amd64-1.5.0.jar

# Spark app master.
master: yarn