
[RFC] RayDP 2.0: Migration to Spark 4.1 & Java 17 #459

@rexminnis


1. Executive Summary

This proposal outlines the roadmap for RayDP 2.0, a major architectural upgrade designed to support the next generation of big data infrastructure. The primary goal is to migrate the core runtime to Apache Spark 4.1.1, enforcing Java 17 and Scala 2.13. This modernization addresses the deprecation of older JVMs in the Spark ecosystem and aligns RayDP with the performance and security standards of 2025+.

2. Motivation

  • Spark 4.0 Paradigm Shift: Apache Spark 4.0 has officially dropped support for Scala 2.12 and Java 8/11. To ensure RayDP remains compatible with modern data stacks, we must align with these upstream requirements.
  • Performance & Security: Moving to Java 17 (LTS) unlocks significant garbage collection improvements (ZGC), better container awareness, and enhanced security features required by enterprise deployments.
  • Technical Debt Resolution: The previous build system relied on legacy setup.py behaviors and Maven plugins incompatible with the Java module system (JPMS). This overhaul modernizes the entire build chain.

3. Technical Specification

3.1 Core Dependency Matrix

RayDP 2.0 introduces a strict upgrade to the dependency matrix.

| Component | RayDP 1.x (Legacy) | RayDP 2.0 (New) | Impact |
|-----------|--------------------|----------------|--------|
| Java      | 8 / 11             | 17 (Strict)    | Requires environment updates on all Ray nodes. |
| Spark     | 3.x                | 4.1.1          | Major internal API drift (Scheduler, SQL, RPC). |
| Scala     | 2.12               | 2.13           | Incompatible binary interface; requires recompilation. |
| Ray       | 2.x                | 2.53.0+        | Aligned with recent Ray releases. |
| Python    | 3.6+               | 3.10+          | Dropped support for EOL Python versions. |
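The Python row of this matrix can be enforced at import time. The sketch below is a hypothetical guard (the function name and error message are illustrative; RayDP's actual check may differ):

```python
import sys

def check_raydp2_python(version_info=sys.version_info):
    """Illustrative guard for the RayDP 2.0 matrix: Python must be 3.10+."""
    if tuple(version_info[:2]) < (3, 10):
        raise RuntimeError(
            "RayDP 2.0 requires Python 3.10+, found "
            f"{version_info[0]}.{version_info[1]}"
        )
    return True
```

A guard like this fails fast with a clear message instead of surfacing later as an obscure syntax or wheel-resolution error.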

3.2 Architectural Changes

A. The Shim Layer Strategy

Spark 4.1 introduces aggressive encapsulation and API removals. To handle this, we have restructured the core/shims module:

  • Legacy Shims Disabled: Modules for Spark 3.2–3.5 are disabled in the build reactor.
  • New Shim (raydp-shims-spark411): A dedicated module implementing the SparkShims interface for the 4.1.x line.
  • Package-Private Accessors: Due to stricter visibility in Spark 4.x, we introduced helper classes in the org.apache.spark namespace:
    • Spark411Helper: Handles TaskContextImpl instantiation and CoarseGrainedExecutorBackend creation.
    • Spark411SQLHelper: Bridges ArrowConverters for toDataFrame operations, handling the new signature requirements for Arrow batch conversion.

B. Scheduler Backend Refactoring

The RayCoarseGrainedSchedulerBackend has been significantly refactored:

  • Removed Deprecated APIs: Calls to Utils.sparkJavaOpts (removed in Spark 4.x) have been replaced with direct configuration handling.
  • Constructor Updates: Adapted to the new HadoopDelegationTokenManager and CoarseGrainedSchedulerBackend signatures.
  • Scala 2.13 Compliance: Migrated all collection conversions from JavaConverters to CollectionConverters and handled explicit boxing for primitive types.

3.3 Build System Overhaul

  • PEP 517 Compliance: The Python build system now uses pyproject.toml as the source of truth for dependencies and metadata.
  • Editable Installs: setup.py was rewritten to support pip install -e . by correctly copying compiled JARs from core/ into the raydp/jars source tree during development. This fixes runtime IndexError and ClassNotFoundException issues during local testing.
  • Maven Modernization: Updated maven-compiler-plugin, scala-maven-plugin, and scalatest to versions compatible with Java 17 modules.
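The editable-install JAR copy described above can be sketched roughly as follows; the exact glob pattern and directory layout used by setup.py are assumptions here:

```python
import shutil
from pathlib import Path

def copy_jars(core_dir, dest_dir):
    """Copy compiled JARs from the Maven build output into the Python
    source tree so that `pip install -e .` can find them at runtime.
    (Sketch only; the real setup.py layout may differ.)"""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in Path(core_dir).glob("**/target/*.jar"):
        shutil.copy2(jar, dest / jar.name)
        copied.append(jar.name)
    if not copied:
        raise FileNotFoundError(
            f"No JARs found under {core_dir}; run `mvn clean package` first"
        )
    return sorted(copied)
```

Failing loudly when no JARs exist is what turns the old silent `ClassNotFoundException` at runtime into an actionable build-time error.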

4. Breaking Changes

Users upgrading to RayDP 2.0 must be aware of the following:

  1. Java 8/11 is no longer supported. Attempting to run RayDP 2.0 on older JVMs will result in UnsupportedClassVersionError.
  2. Spark 3.x is no longer supported. The codebase is not backward compatible.
  3. Scala 2.12 is no longer supported. Custom UDFs or libraries compiled against Scala 2.12 will cause binary incompatibility errors.
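To see why item 1 fails hard: classes compiled for Java 17 carry class-file major version 61, and older JVMs refuse to load anything above their own ceiling (Java 8 reads up to major 52, Java 11 up to 55), raising UnsupportedClassVersionError. A small sketch for inspecting a .class file's version (helper names are illustrative, not RayDP API):

```python
import struct

# Class-file major versions: Java 8 -> 52, Java 11 -> 55, Java 17 -> 61.
JAVA_17_MAJOR = 61

def class_file_major(data: bytes) -> int:
    """Return the class-file major version from raw .class bytes.
    Layout: 4-byte magic 0xCAFEBABE, 2-byte minor, 2-byte major."""
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major

def requires_java17(data: bytes) -> bool:
    """True if loading this class on Java 8/11 would raise
    UnsupportedClassVersionError."""
    return class_file_major(data) >= JAVA_17_MAJOR
```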

5. Verification Plan

We have established a rigorous verification suite (examples/test_raydp_sanity.py) to validate the migration:

  • [x] Cluster Initialization: Verifies that RayDPSparkMaster and RayDPExecutor actors start successfully on Ray.
  • [x] Network Binding: Validates that Spark Driver and Executors can communicate, specifically addressing macOS split-brain networking (localhost vs. LAN IP) via explicit binding.
  • [x] Job Execution: Runs standard Spark actions (count, range) to verify the RPC layer.
  • [x] Data Transfer (Spark → Ray): Validates ray.data.from_spark() using the updated ObjectStoreWriter.
  • [x] Data Transfer (Ray → Spark): Validates ds.to_spark() using the new Spark411SQLHelper to reconstruct DataFrames from Arrow batches.

6. Migration Guide

To upgrade to RayDP 2.0:

  1. Upgrade Java: Install OpenJDK 17 on your local machine and all cluster nodes.
  2. Set JAVA_HOME: Ensure JAVA_HOME points to the JDK 17 installation.
  3. Update Python Deps: pip install "pyspark>=4.1.1" "ray>=2.53.0" (quote the specifiers so the shell does not interpret >= as a redirect).
  4. Rebuild: Run mvn clean package in core/ and pip install . in python/.
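Steps 1–2 can be sanity-checked with a small script before rebuilding. The parsing helper below is illustrative, not part of RayDP; it handles both the legacy ("1.8.0_392") and modern ("17.0.9") version schemes printed by `java -version`:

```python
import re
import subprocess

def java_major_version(banner_line: str) -> int:
    """Parse the major version out of a `java -version` banner line."""
    m = re.search(r'version "([^"]+)"', banner_line)
    if not m:
        raise ValueError(f"unrecognized banner: {banner_line!r}")
    parts = m.group(1).split(".")
    # Legacy scheme "1.8.0_x" -> 8; modern scheme "17.0.9" / "21" -> 17 / 21.
    return int(parts[1]) if parts[0] == "1" else int(parts[0].split("-")[0])

def check_java17() -> int:
    """Run `java -version` and fail fast unless it reports Java 17+."""
    banner = subprocess.run(
        ["java", "-version"], capture_output=True, text=True
    ).stderr.splitlines()[0]
    major = java_major_version(banner)
    if major < 17:
        raise RuntimeError(f"RayDP 2.0 needs Java 17+, found Java {major}")
    return major
```

Note that `java -version` prints its banner to stderr, not stdout, which is a common source of confusion when scripting this check.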
