[VL] GPU and CPU mixed cluster schedule #11524

### Backend

VL (Velox)

### Bug description

We want to schedule IO-bound tasks, such as stages that contain a table scan, on CPU nodes, and computation-intensive tasks on GPU nodes.
Spark can already schedule stages by resource profile, as described in https://spark.apache.org/docs/latest/configuration.html#custom-resource-scheduling-and-configuration-overview. Gluten already uses a ResourceProfile to adjust the offheap/onheap memory allocation.
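The routing rule described above can be sketched roughly as follows. This is only an illustration of the intended policy, not Gluten code: the operator names and the `choose_profile` helper are hypothetical, and a real implementation would inspect the actual plan nodes of each stage.

```python
# Hypothetical operator names; the real plan nodes in Gluten/Velox may differ.
SCAN_OPERATORS = frozenset({"TableScan", "FileScan"})


def choose_profile(stage_operators):
    """Return "cpu" for a stage that contains a scan, otherwise "gpu"."""
    if any(op in SCAN_OPERATORS for op in stage_operators):
        # IO-bound stage: keep it on a CPU node.
        return "cpu"
    # No scan found: treat the stage as computation-intensive.
    return "gpu"


if __name__ == "__main__":
    print(choose_profile(["FileScan", "Filter", "Project"]))
    print(choose_profile(["HashJoin", "HashAggregate"]))
```

The returned label would then be mapped to a Spark ResourceProfile for that stage.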

The following script sets up the GPU host environment. It has already been run on the IBM internal AMI Linux image, so if you use the IBM pipeline `pipeline-create-dev-vm` and select a GPU node such as g4dn.xlarge, the environment is ready and you do not need to run the script.
https://raw.githubusercontent.com/jinchengchenghh/gluten/cudf_script/dev/start_cudf_amazon.sh
Note: the script is outdated. It installs CUDA 12.8, but the environment has since been upgraded to CUDA 13.1 because of a cudf build issue.

These documents describe how to set up YARN on a GPU node.
https://docs.nvidia.com/spark-rapids/user-guide/23.10/getting-started/yarn-gpu.html
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/UsingGpus.html
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html

This document describes how to build Gluten with GPU support.
https://github.com/apache/incubator-gluten/blob/main/docs/get-started/VeloxGPU.md

Gluten already allocates offheap/onheap memory through a ResourceProfile (see the PR linked below). We should set the profile in a similar way so that it requires 1 GPU. Spark currently cannot set the number of cores through a resource profile; that feature is still under development.
https://github.com/apache/incubator-gluten/pull/8209
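As a rough sketch of what "a profile that requires 1 GPU" could look like with Spark's public stage-level scheduling API (PySpark 3.1+), see below. This is not the code from the PR above: `GpuStageRequest` and both helper functions are hypothetical names introduced for illustration, and the default amounts are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuStageRequest:
    """Hypothetical holder for what a GPU stage asks for (not a Spark or Gluten type)."""
    executor_gpus: int = 1
    task_gpus: float = 1.0
    task_cpus: int = 1


def to_resource_profile(req):
    """Build a Spark ResourceProfile from the request (needs PySpark 3.1+)."""
    # Imported lazily so this file can be loaded without PySpark installed.
    from pyspark.resource import (
        ExecutorResourceRequests,
        ResourceProfileBuilder,
        TaskResourceRequests,
    )

    executor_reqs = ExecutorResourceRequests().resource("gpu", req.executor_gpus)
    task_reqs = TaskResourceRequests().cpus(req.task_cpus).resource("gpu", req.task_gpus)
    # In PySpark, `build` is a property rather than a method.
    return ResourceProfileBuilder().require(executor_reqs).require(task_reqs).build


def to_static_conf(req):
    """Application-wide equivalent, using Spark's documented config keys."""
    return {
        "spark.executor.resource.gpu.amount": str(req.executor_gpus),
        "spark.task.resource.gpu.amount": str(req.task_gpus),
        "spark.task.cpus": str(req.task_cpus),
    }


if __name__ == "__main__":
    print(to_static_conf(GpuStageRequest()))
```

On the RDD API, such a profile would be attached with `rdd.withResources(to_resource_profile(GpuStageRequest()))`. How Gluten attaches it to its own stages is not shown here.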

We could use TPCDS q95 to test.

The query runs successfully on YARN. However, if the environment is set up according to https://docs.nvidia.com/spark-rapids/user-guide/23.10/getting-started/yarn-gpu.html, the query hangs. I also tried standalone mode earlier, and it hangs there as well.


### Gluten version

_No response_

### Spark version

None

### Spark configurations

_No response_

### System information

_No response_

### Relevant logs

```bash

```
