πŸ”„ daily merge: master β†’ main 2026-01-13 #743

Open

antfin-oss wants to merge 281 commits into main from create-pull-request/patch-fcde85e757

Conversation

@antfin-oss

This Pull Request was created automatically to merge the latest changes from master into the main branch.

πŸ“… Created: 2026-01-13
πŸ”€ Merge direction: master β†’ main
πŸ€– Triggered by: Scheduled

Please review and merge if everything looks good.

eicherseiji and others added 30 commits December 17, 2025 15:34
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
fixes ray-project#59218

Signed-off-by: abrar <abrar@anyscale.com>
…t#59502)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
We now prebuild manylinux2014 with JDK as part of
ray-project#59204

We can directly consume this, rather than rebuilding each time we need
it

---------

Signed-off-by: andrew <andrew@anyscale.com>
fixes ray-project#59218

---------

Signed-off-by: abrar <abrar@anyscale.com>
…55781)

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Co-authored-by: sampan <sampan@anyscale.com>
fixes ray-project#59218

---------

Signed-off-by: abrar <abrar@anyscale.com>
as we do not need to support python 3.9 anymore

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Mark Towers <mark.m.towers@gmail.com>
were not all renamed

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…l_export_options (ray-project#59509)

the options are required for test telemetry gathering to work at the end of the test job run.
not used anymore, and does not work after gymnasium upgrade.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
needs to take `HOSTTYPE` from env and avoid adding defaults

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…stic multi-epoch training (ray-project#59528)

This PR is a renamed version of ray-project#59044

## Description

This PR adds execution-aware shuffling to Ray Data's file-based
datasources, enabling different file orders across different executions
while maintaining determinism.

### Changes

**Core functionality:**
- Added `execution_idx` field to `DataContext` to track the current
epoch
- `FileShuffleConfig` can receive `base_seed` to automatically increment
seed after each execution
- If `FileShuffleConfig` still uses `seed`, the random seed is still the
same for each execution.
- Modified `FileBasedDatasource.get_read_tasks()` to accept
`execution_idx` parameter and pass it through the shuffle logic

**Benefits:**
- No breaking API change.
- Each epoch produces a different but deterministic shuffle when
`base_seed` is provided
- Ensures that multiple datasets with the same shuffle config produce
identical results within each epoch
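
A minimal usage sketch of the feature described above (hedged: the `base_seed` keyword is the new behavior from this PR, and the file path is a placeholder):

```python
import ray
from ray.data import FileShuffleConfig

# Assumed new keyword from this PR: `base_seed` lets the shuffle seed advance
# with the execution index, so each epoch sees a different (but deterministic)
# file order. Passing `seed` instead keeps the same order every execution.
shuffle = FileShuffleConfig(base_seed=42)

ds = ray.data.read_parquet("s3://bucket/training-data/", shuffle=shuffle)

for epoch in range(3):
    # Each re-execution of the dataset reshuffles the file order
    # deterministically according to the base_seed semantics above.
    for batch in ds.iter_batches(batch_size=1024):
        pass  # train on batch
```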

---------

Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com>
Signed-off-by: xgui <xgui@anyscale.com>
ray-project#59218

---------

Signed-off-by: abrar <abrar@anyscale.com>
fixes ray-project#56633

- [x] Add documentation
- [x] update `get_multiplexed_model_id` to see if we are batch context
first
- [x] update logic
- [x] add tests
- [x] does not introduce any backwards incompatibility; previously the
system did not provide any guarantee about the contents of a batch, and now
we add a constraint that guarantees each batch contains requests for the
same model.
- [x] execute sub batches concurrently 

The thing I dislike about this implementation is that it does not fill
the batch in the case where the replica is responsible for more than two models
and incoming traffic is equally distributed between those models,
because the current implementation fills the batch first and then divides
it.
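
For context, a hedged sketch of the pattern this guarantee targets, using the public Serve multiplexing and batching APIs (the deployment, model names, and loading logic are placeholders, not the code changed by this PR):

```python
from ray import serve


@serve.deployment
class MultiplexedBatcher:
    @serve.multiplexed(max_num_models_per_replica=3)
    async def get_model(self, model_id: str):
        # Placeholder: load and return the model identified by model_id.
        return f"model::{model_id}"

    @serve.batch(max_batch_size=8)
    async def handle_batch(self, inputs):
        # With this PR, all requests in `inputs` belong to the same multiplexed
        # model, so one model lookup can serve the whole batch.
        model_id = serve.get_multiplexed_model_id()
        model = await self.get_model(model_id)
        return [f"{model}:{x}" for x in inputs]

    async def __call__(self, request):
        return await self.handle_batch(request)
```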

Metric | Baseline (42905 reqs) | Master (27526 reqs) | Ξ” Change (Master βˆ’ Baseline)
-- | -- | -- | --
Requests | 42,905 | 27,526 | βˆ’15,379
Fails | 0 | 0 | 0
Median (ms) | 290 | 300 | +10 ms
95%ile (ms) | 560 | 570 | +10 ms
99%ile (ms) | 620 | 640 | +20 ms
Average (ms) | 327.41 | 332.96 | +5.55 ms
Min (ms) | 61 | 80 | +19 ms
Max (ms) | 764 | 802 | +38 ms
Avg Size (bytes) | 13 | 13 | 0
Current RPS | 299 | 293 | βˆ’6
Current Failures/s | 0 | 0 | 0

---------

Signed-off-by: abrar <abrar@anyscale.com>
…Data LLM (ray-project#59499)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…#59423)

PR ray-project#58325 added shutdown and abort hooks to enhance the resource-cleanup
logic in DatasetsSetupCallback, so the callback's responsibilities have
expanded beyond initial setup. Accordingly, this PR renames it to
DatasetsCallback to better align with its behavior.

Signed-off-by: JasonLi1909 <jasli1909@gmail.com>
## Description

Enable sync actors to detect task cancellation during execution via
`ray.get_runtime_context().is_canceled()` API for graceful termination

Main changes:

Record task cancellation in executor:

- `core_worker.cc`
  - Track canceled tasks with `canceled_tasks_` and expose them to the Python API
through `IsTaskCanceled()`
- `runtime_context.py`
  - Add an `is_canceled()` API that calls `CoreWorker::IsTaskCanceled()`
following the path `worker.py` -> `_raylet.pyx` -> `core_worker.cc`

Raise cancellation error when using `ray.get()` (on submitter side)

- `actor_task_submitter.cc`
  - For a sync actor task that completed without error but is marked as
canceled (queried via `TaskManager::IsTaskCanceled()`), explicitly set the
`TASK_CANCELLED` error
- `task_manager.cc`
  - Add `IsTaskCanceled()` to expose the `is_canceled_` flag

Docs update:


https://anyscale-ray--58914.com.readthedocs.build/en/58914/ray-core/actors.html#cancelling-actor-tasks

https://anyscale-ray--58914.com.readthedocs.build/en/58914/ray-core/tasks.html#cancelling-tasks

https://anyscale-ray--58914.com.readthedocs.build/en/58914/ray-core/api/doc/ray.cancel.html#ray.cancel


## Related issues

Related to ray-project#58213

## Additional information

Tested with following example:

```py
import ray
import time

ray.init()


@ray.remote(max_concurrency=10)
class ThreadedActor:
    def __init__(self):
        self.counter = 0

    def long_running_task(self, duration=10):
        for i in range(duration):
            print("counter: ", self.counter)
            if ray.get_runtime_context().is_canceled():
                return "canceled"

            self.counter += 1
            time.sleep(0.1)

        return "completed"


if __name__ == "__main__":
    actor = ThreadedActor.remote()
    task_ref = actor.long_running_task.remote(duration=10)

    time.sleep(0.3)
    ray.cancel(task_ref)

    try:
        result = ray.get(task_ref)
        print(f"Result: {result}")
    except ray.exceptions.TaskCancelledError as e:
        print(f"Cancelled: {e}")

    ray.shutdown()
```

Result screenshot: https://github.com/user-attachments/assets/29acb5dd-1dd6-4635-8d20-ea39090b7b1d

---------

Signed-off-by: machichima <nary12321@gmail.com>
## Description
Remove unused NODE_DEFAULT_IP constant


Signed-off-by: yicheng <yicheng@anyscale.com>
Co-authored-by: yicheng <yicheng@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
…ct#59476)

Analysis of the two operator patterns:

## Streaming_repartition β†’ map_batches

|                | Number of `map_batches` tasks |
|----------------|-------------------------------|
| **Fused**      | `num_input_blocks` (which is ≀ the number of output blocks of StreamingRepartition) |
| **Not fused**  | number of output blocks of StreamingRepartition |

When fused, the number of tasks equals the number of input blocks, which is
≀ the number of output blocks of StreamingRepartition. If StreamingRepartition
is supposed to break down blocks to increase parallelism, that won't happen
when fused. So we don't fuse.

---

## Map_batches β†’ streaming_repartition

`batch_size % target_num_rows == 0`

|                      | Number of `map_batches` tasks |
|----------------------|-------------------------------|
| **Fused**            | == total_rows / batch_size |
| **Not fused**        | == total_rows / batch_size |

So, the fusion doesn’t affect the parallelism.

---

Thus, we currently disable the `Streaming_repartition β†’ map_batches`
fusion and enable the fusion when `batch_size % target_num_rows == 0`
for `Map_batches β†’ streaming_repartition`.
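
To make the two pipeline shapes concrete, a small hedged sketch (assuming `target_num_rows_per_block` is the streaming-repartition knob; the batch function and sizes are placeholders):

```python
import ray


def add_one(batch):
    batch["id"] = batch["id"] + 1
    return batch


ds = ray.data.range(10_000)

# Pattern 1: streaming repartition -> map_batches.
# Per the analysis above, fusing would cap the number of map_batches tasks at
# the number of input blocks, defeating the repartition, so this is not fused.
out1 = ds.repartition(target_num_rows_per_block=100).map_batches(add_one)

# Pattern 2: map_batches -> streaming repartition.
# When batch_size % target_num_rows == 0, fusion does not change parallelism,
# so the operators can be fused.
out2 = ds.map_batches(add_one, batch_size=200).repartition(target_num_rows_per_block=100)
```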

---------

Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com>
Signed-off-by: xgui <xgui@anyscale.com>
Signed-off-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
…#59512)

## Description
Currently, actor pool utilization is calculated as follows:

- `max_tasks_in_flight / (num_actors_running * max_concurrency)`

Since `max_tasks_in_flight_per_actor = 2 * max_concurrency` and the default
value for the scale-up threshold
`RAY_DATA_DEFAULT_ACTOR_POOL_UTIL_UPSCALING_THRESHOLD` is 2.0, the only
way for the actor pool utilization to reach 200% is if the actor pool is fully
saturated.
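
A quick numeric check of the formula above (plain arithmetic, not the Ray Data internals):

```python
# Utilization = max_tasks_in_flight / (num_actors_running * max_concurrency)
max_concurrency = 4
num_actors_running = 2

# Default: max_tasks_in_flight_per_actor = 2 * max_concurrency,
# so a fully saturated pool has this many tasks in flight.
max_tasks_in_flight = num_actors_running * 2 * max_concurrency  # 16

util = max_tasks_in_flight / (num_actors_running * max_concurrency)
print(util)  # 2.0 -> 200%, i.e. the upscaling threshold is only hit at full saturation
```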

## Related issues
None

## Additional information
None

---------

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
)

Adds documentation around
* Expressions
* Resource configuration
* Async UDFs
* Placement Groups / Distributed UDFs

And also refine text around key concepts.

---------

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
## Description
This pull request removes Unity3D-based environments (`mlagents` and
`mlagents_envs`) from RLlib, including dependencies, code,
documentation, and related test requirements.

The main goal is to clean up the requirements dependency.

---------

Signed-off-by: Kamil Kaczmarek <kamil@anyscale.com>
```
REGRESSION 29.30%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 1152.0966726586987 to 814.5335887693782 in microbenchmark.json
REGRESSION 27.74%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 1146.7134222243185 to 828.6299560282166 in microbenchmark.json
REGRESSION 27.16%: multi_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 13260.224066162647 to 9658.981678535481 in microbenchmark.json
REGRESSION 25.88%: multi_client_put_gigabytes (THROUGHPUT) regresses from 47.62336463265461 to 35.29689743165927 in microbenchmark.json
REGRESSION 25.52%: client__tasks_and_get_batch (THROUGHPUT) regresses from 1.0755792867557323 to 0.8011125804259877 in microbenchmark.json
REGRESSION 21.24%: client__1_1_actor_calls_sync (THROUGHPUT) regresses from 576.2018967799997 to 453.8059915017394 in microbenchmark.json
REGRESSION 20.91%: client__tasks_and_put_batch (THROUGHPUT) regresses from 11657.102874288967 to 9220.111790372692 in microbenchmark.json
REGRESSION 12.79%: single_client_tasks_and_get_batch (THROUGHPUT) regresses from 6.5168926025589275 to 5.683136512909751 in microbenchmark.json
REGRESSION 12.54%: 1_n_actor_calls_async (THROUGHPUT) regresses from 7818.847120700663 to 6838.2845805526595 in microbenchmark.json
REGRESSION 12.17%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 4686.60144219099 to 4116.404938052882 in microbenchmark.json
REGRESSION 11.79%: 1_1_actor_calls_concurrent (THROUGHPUT) regresses from 5629.888924437268 to 4965.99522007048 in microbenchmark.json
REGRESSION 10.73%: client__put_calls (THROUGHPUT) regresses from 821.8214713340072 to 733.6703843739219 in microbenchmark.json
REGRESSION 10.69%: 1_1_async_actor_calls_async (THROUGHPUT) regresses from 4314.570035703319 to 3853.261228971964 in microbenchmark.json
REGRESSION 10.46%: client__get_calls (THROUGHPUT) regresses from 1033.7763022350296 to 925.594265020844 in microbenchmark.json
REGRESSION 9.84%: 1_1_async_actor_calls_with_args_async (THROUGHPUT) regresses from 2762.9385297368535 to 2490.991223668351 in microbenchmark.json
REGRESSION 9.16%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 6913.550938819563 to 6280.583274671035 in microbenchmark.json
REGRESSION 8.78%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 21866.061040938854 to 19945.253372184772 in microbenchmark.json
REGRESSION 7.90%: n_n_actor_calls_async (THROUGHPUT) regresses from 24531.521409406632 to 22593.67022851302 in microbenchmark.json
REGRESSION 3.15%: single_client_tasks_sync (THROUGHPUT) regresses from 872.2036137608502 to 844.7209532677355 in microbenchmark.json
REGRESSION 2.79%: tasks_per_second (THROUGHPUT) regresses from 390.190063861316 to 379.30168512953065 in benchmarks/many_nodes.json
REGRESSION 2.75%: single_client_tasks_async (THROUGHPUT) regresses from 6961.354217387221 to 6769.634231009387 in microbenchmark.json
REGRESSION 2.70%: n_n_actor_calls_with_arg_async (THROUGHPUT) regresses from 3353.6340010226468 to 3263.220674257469 in microbenchmark.json
REGRESSION 2.21%: multi_client_tasks_async (THROUGHPUT) regresses from 20569.559125979922 to 20114.199533908533 in microbenchmark.json
REGRESSION 1.89%: single_client_get_calls_Plasma_Store (THROUGHPUT) regresses from 9541.420239681218 to 9361.068161075398 in microbenchmark.json
REGRESSION 1.82%: single_client_wait_1k_refs (THROUGHPUT) regresses from 4.803898199921876 to 4.716418799247922 in microbenchmark.json
REGRESSION 1.39%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 12.697911312818526 to 12.521640061929554 in microbenchmark.json
REGRESSION 1.02%: placement_group_create/removal (THROUGHPUT) regresses from 685.9076055741489 to 678.9244842339416 in microbenchmark.json
REGRESSION 0.03%: client__put_gigabytes (THROUGHPUT) regresses from 0.10220457395600176 to 0.10217222369611438 in microbenchmark.json
REGRESSION 157.48%: dashboard_p99_latency_ms (LATENCY) regresses from 382.069 to 983.739 in benchmarks/many_pgs.json
REGRESSION 108.23%: dashboard_p95_latency_ms (LATENCY) regresses from 17.033 to 35.467 in benchmarks/many_pgs.json
REGRESSION 68.74%: stage_4_spread (LATENCY) regresses from 0.26078188348514014 to 0.4400494907723027 in stress_tests/stress_test_many_tasks.json
REGRESSION 49.97%: dashboard_p95_latency_ms (LATENCY) regresses from 17.15 to 25.72 in benchmarks/many_nodes.json
REGRESSION 49.08%: stage_3_time (LATENCY) regresses from 1911.0371930599213 to 2849.039920568466 in stress_tests/stress_test_many_tasks.json
REGRESSION 47.13%: dashboard_p50_latency_ms (LATENCY) regresses from 20.494 to 30.152 in benchmarks/many_actors.json
REGRESSION 39.86%: time_to_broadcast_1073741824_bytes_to_50_nodes (LATENCY) regresses from 12.110535204000001 to 16.937953781000004 in scalability/object_store.json
REGRESSION 24.17%: 1000000_queued_time (LATENCY) regresses from 177.25064926800002 to 220.095988608 in scalability/single_node.json
REGRESSION 22.86%: dashboard_p50_latency_ms (LATENCY) regresses from 6.037 to 7.417 in benchmarks/many_nodes.json
REGRESSION 14.23%: stage_2_avg_iteration_time (LATENCY) regresses from 36.306496143341064 to 41.472771883010864 in stress_tests/stress_test_many_tasks.json
REGRESSION 11.81%: dashboard_p99_latency_ms (LATENCY) regresses from 49.411 to 55.245 in benchmarks/many_nodes.json
REGRESSION 4.39%: stage_1_avg_iteration_time (LATENCY) regresses from 14.045441269874573 to 14.662270617485046 in stress_tests/stress_test_many_tasks.json
REGRESSION 1.20%: avg_pg_remove_time_ms (LATENCY) regresses from 1.4351014099100747 to 1.452357480480116 in stress_tests/stress_test_placement_group.json
REGRESSION 1.20%: 10000_args_time (LATENCY) regresses from 17.502108547999995 to 17.71234498399999 in scalability/single_node.json
REGRESSION 0.66%: 3000_returns_time (LATENCY) regresses from 5.539246789999993 to 5.576066161 in scalability/single_node.json
```

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Co-authored-by: Lonnie Liu <lonnie@anyscale.com>
…roject#59572)

we can remove the python version constraint after windows CI is migrated
to python 3.9

Signed-off-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
otherwise the wheel uploading is failing with older versions.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…ification loop (ray-project#59574)

## Description

In test_cancel_recursive_tree, the concurrent test case:
1. Creates 10 ChildActor instances
2. Submits 10 Actor.run tasks, each spawning child tasks on a ChildActor
3. Cancels 5 tasks with recursive=True and 5 with recursive=False
4. Expects that for recursive=True, both the parent task and child tasks
are cancelled; for recursive=False, only the parent task is cancelled

The issue is in the verification loop: when checking if the parent tasks
are cancelled, the test uses `run_ref` (a stale loop variable from the
previous loop) instead of run_refs[i]. This causes the test to verify
the same task (`run_refs[9]`) ten times, rather than verifying all 10
tasks.

This PR fixes the issue by using `run_refs[i]` to correctly verify each
task.
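
A self-contained illustration of the stale-variable pitfall (not the actual Ray test; the assertion shape in the real test may differ):

```python
# Stand-ins for the 10 parent task refs submitted in the first loop.
refs = [f"task-{i}" for i in range(10)]

for ref in refs:
    pass  # after this loop, `ref` is permanently "task-9"

# Before the fix: reusing the stale `ref` checks the same task ten times.
checked_buggy = [ref for _ in range(10)]

# After the fix: index into the list so every task is verified.
checked_fixed = [refs[i] for i in range(10)]

assert checked_buggy == ["task-9"] * 10
assert checked_fixed == refs
```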


Signed-off-by: yicheng <yicheng@anyscale.com>
Co-authored-by: yicheng <yicheng@anyscale.com>
This is being done as part of cataloging the Ray Serve environment variables as per
the doc:
https://docs.google.com/spreadsheets/d/1mU_ds6_hI39dK-7zZEFr4SBgHJoASpTvqW5t0yi596A/edit?usp=sharing

This PR removes support for several environment variables that were used
to override Ray Serve HTTP and gRPC configuration settings. These
settings should now be configured exclusively through the Serve config
API (http_options). Additionally, this PR adds documentation for the
RAY_SERVE_GRPC_MAX_MESSAGE_SIZE environment variable.

### Removed Environment Variables
| Environment Variable | Default Value | Alternative |
|---------------------|---------------|-------------|
| `RAY_SERVE_DEFAULT_HTTP_HOST` | `127.0.0.1` | Use `http_options.host` in config |
| `RAY_SERVE_DEFAULT_HTTP_PORT` | `8000` | Use `http_options.port` in config |
| `RAY_SERVE_DEFAULT_GRPC_PORT` | `9000` | Use `grpc_options.port` in config |
| `RAY_SERVE_HTTP_KEEP_ALIVE_TIMEOUT_S` | `0` (disabled) | Use `http_options.keep_alive_timeout_s` in config |
| `RAY_SERVE_REQUEST_PROCESSING_TIMEOUT_S` | `0.0` (disabled) | Use `http_options.request_timeout_s` in config |
| `SERVE_REQUEST_PROCESSING_TIMEOUT_S` | `0.0` (disabled) | Use `http_options.request_timeout_s` in config |
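
A hedged sketch of the replacement configuration path (values are placeholders; the same fields can also be set in a Serve config file):

```python
from ray import serve

# Instead of the removed environment variables, configure HTTP/gRPC options
# directly when starting Serve.
serve.start(
    http_options={
        "host": "127.0.0.1",         # replaces RAY_SERVE_DEFAULT_HTTP_HOST
        "port": 8000,                # replaces RAY_SERVE_DEFAULT_HTTP_PORT
        "keep_alive_timeout_s": 30,  # replaces RAY_SERVE_HTTP_KEEP_ALIVE_TIMEOUT_S
        "request_timeout_s": 60,     # replaces RAY_SERVE_REQUEST_PROCESSING_TIMEOUT_S
    },
    grpc_options={
        "port": 9000,                # replaces RAY_SERVE_DEFAULT_GRPC_PORT
    },
)
```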

### Changes
- `python/ray/serve/_private/constants.py`
  - Replaced environment variable lookups with hardcoded default values
- `doc/source/serve/http-guide.md`
  - Removed documentation for the RAY_SERVE_HTTP_KEEP_ALIVE_TIMEOUT_S environment variable
- `doc/source/serve/advanced-guides/grpc-guide.md`
  - Added new section "Configure gRPC message size limits" documenting the RAY_SERVE_GRPC_MAX_MESSAGE_SIZE environment variable
  - Updated introduction to include the new topic
- `python/ray/serve/tests/test_proxy.py`
  - Removed test_set_keep_alive_timeout_in_env test
  - Removed test_set_timeout_keep_alive_in_both_config_and_env test
- `python/ray/serve/tests/unit/test_http_util.py`
  - Removed mock_env_constants fixture
  - Simplified test_basic_configuration (formerly test_basic_configuration_with_mock_env)
  - Removed test_keep_alive_timeout_override_from_env test
  - Removed test_request_timeout_preserved_when_already_set test
- `python/ray/serve/tests/test_request_timeout.py`
  - Updated all tests to use serve.start(http_options={"request_timeout_s": ...}) instead of environment variable parametrization

---------

Signed-off-by: harshit <harshit@anyscale.com>
mgchoi239 and others added 22 commits January 9, 2026 20:35
… Data APIs (ray-project#59918)

This PR splits the Input/Output API reference page into two separate
pages to improve organization and mirror the structure of the user
guides.

## Changes
- Renamed `input_output.rst` to `loading_data.rst`
- Created `saving_data.rst` with all saving/writing APIs
- Updated `api.rst` to reference both new files
- Updated all references from `input-output` to
`loading-data-api`/`saving-data-api`
- Standardized section header formatting with dashes matching title
length

Fixes ray-project#59301

---------

Signed-off-by: mgchoi239 <mg.choi.239@gmail.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: mgchoi239 <mg.choi.239@gmail.com>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
…y-project#59959)

## Description
The
[test_operators.py](https://github.com/ray-project/ray/blob/f85c5255669d8a682673401b54a86612a79058e3/python/ray/data/tests/test_operators.py)
file (~1,670 lines, 28 test functions) occasionally times out during CI
runs. We should split it into smaller, logically grouped test modules to
improve test reliability and allow better parallel execution.
@tianyi-ge 
## Related issues
Fixes ray-project#59881

Signed-off-by: Haichuan <kaisennhu@gmail.com>
…ay-project#59955)

All PRs that are submitted to Ray must have clear titles and
descriptions that give the reviewer adequate context. To help catch PRs
that violate this rule, I've added a bugbot rule to Cursor that will
automate this check, in turn taking load off of the review process.
---------

Signed-off-by: joshlee <joshlee@anyscale.com>
…ject#59957)

# Summary

Before this PR, the training-failure error was buried in the `exc_text`
part of the log record. After this PR, it should also appear in the `message`
part of the log.

# Testing

Unit tests

---------

Signed-off-by: Timothy Seah <tseah@anyscale.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Dead code is a maintenance burden. Removing unused mocks. 

This came up as I was working on removing the ClusterLeaseManager from
the GCS in ray-project#60008.

Signed-off-by: irabbani <israbbani@gmail.com>
ray-project#60012)

## Description
Reverting ray-project#59852 as it causes the
release test infra to fail. We need to update the infra to work properly with
the new port discovery settings.

## Additional information
Example failing build:
https://buildkite.com/ray-project/release/builds/74681#
…orphan (ray-project#59982)

If publishers follow the example publishing pattern for ray
data/serve/train, they should end up with a structure like:

```
doc/source/{path-to-examples}/my-example/
  └─ content/
     β”œβ”€ notebook.ipynb
     β”œβ”€ README.md
     └─ …
```
where path-to-examples is any of "serve/tutorials/", "data/examples/", or
"train/examples/".

In the `examples.yml`, publishers link to `notebook.ipynb`.

### Issue

* `examples.rst` is dynamically created by custom scripts in
`custom_directives.py`. The script reads each `examples.yml` file and
creates the HTML for the examples index page in the Ray docs.
* Because `examples.rst` is initially empty, Sphinx considers all
notebooks as "orphan" documents and emits warnings, which fail CI
because ReadTheDocs is configured to fail on warnings.
* To silence the warning, publishers must manually add an `orphan: True`
entry to the metadata of the notebook.
* This adds unnecessary overhead for publishers unfamiliar with Sphinx.
They should only worry about their content, not what an "orphan"
document is.

### Solution

This PR automatically adds the `orphan: True` metadata to any files
listed in any of the `examples.yml` files for Ray Data/Serve/Train. This
ensures:

* `examples.yml` is used as the source of truth, so only the listed files are
affected. No side effects on unrelated files.
* Publishers can focus on their content and don't have to worry about Sphinx.
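
A rough sketch of the mechanism, assuming the build script injects the metadata with nbformat; the `examples.yml` structure and path shown here are hypothetical, and the real implementation in the docs build scripts may differ:

```python
import nbformat
import yaml

# Hypothetical: read the files listed in one examples.yml and mark the
# notebooks as orphan so Sphinx stops warning about them.
with open("doc/source/serve/tutorials/examples.yml") as f:
    examples = yaml.safe_load(f)

for example in examples.get("examples", []):
    path = example["path"]  # hypothetical field name
    if path.endswith(".ipynb"):
        nb = nbformat.read(path, as_version=4)
        nb.metadata["orphan"] = True
        nbformat.write(nb, path)
```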

---------

Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: Aydin Abiar <62435714+Aydin-ab@users.noreply.github.com>
Co-authored-by: Aydin Abiar <aydin@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… different processes (ray-project#59634)

**Problem:**

fixes ray-project#57803

When tracing is enabled, calling an actor method from a different
process than the one that created the actor fails with:
```
TypeError: got an unexpected keyword argument '_ray_trace_ctx'
```

This commonly occurs with Ray Serve, where:
- `serve start` creates the controller actor (process A)
- Dashboard calls `ray.get_actor()` to interact with it (process B)

## Repro
Simplest way to repro is to run the following
```bash
ray start --head --tracing-startup-hook="ray.util.tracing.setup_local_tmp_tracing:setup_tracing"
serve start
```

But here is a core specific repro script

`repro_actor_module.py`
```python
class MyActor:
    """A simple actor class that will be decorated dynamically."""
    
    def __init__(self):
        self.value = 0
    
    def my_method(self, x):
        """A simple method."""
        return x * 2
    
    def check_alive(self):
        """Health check method."""
        return True
    
    def increment(self, amount=1):
        """Method with a default parameter."""
        self.value += amount
        return self.value
```

`repro_tracing_issue.py`
```python
import multiprocessing
import subprocess
import sys


NAMESPACE = "test_ns"


def creator_process(ready_event, done_event):
    import ray
    from ray.util.tracing.tracing_helper import _is_tracing_enabled
    
    # Import the actor class from module (NOT decorated yet)
    from repro_actor_module import MyActor
    
    setup_tracing_path = "ray.util.tracing.setup_local_tmp_tracing:setup_tracing"
    ray.init(_tracing_startup_hook=setup_tracing_path, namespace=NAMESPACE)
    
    print(f"[CREATOR] Tracing enabled: {_is_tracing_enabled()}")
    
    # Dynamically decorate and create the test actor (like Serve does)
    MyActorRemote = ray.remote(
        name="my_test_actor",
        namespace=NAMESPACE,
        num_cpus=0,
        lifetime="detached",
    )(MyActor)
    
    actor = MyActorRemote.remote()
    
    # Print signatures from creator's handle
    print(f"[CREATOR] Signatures in handle from creation:")
    for method_name, sig in actor._ray_method_signatures.items():
        param_names = [p.name for p in sig]
        print(f"  {method_name}: {param_names}")
    
    my_method_sig = actor._ray_method_signatures.get("my_method", [])
    has_trace = "_ray_trace_ctx" in [p.name for p in my_method_sig]
    print(f"[CREATOR] my_method has _ray_trace_ctx: {has_trace}")
    
    # Verify the method works from creator
    result = ray.get(actor.my_method.remote(5))
    print(f"[CREATOR] Test call result: {result}")
    
    # Signal that actor is ready
    print("[CREATOR] Actor created, signaling getter...")
    sys.stdout.flush()
    ready_event.set()
    
    # Wait for getter to finish
    done_event.wait(timeout=30)
    print("[CREATOR] Getter finished, shutting down...")
    
    # Cleanup
    ray.kill(actor)
    ray.shutdown()


def getter_process(ready_event, done_event):
    import ray
    from ray.util.tracing.tracing_helper import _is_tracing_enabled
    
    # Wait for creator to signal ready
    print("[GETTER] Waiting for creator to set up actor...")
    if not ready_event.wait(timeout=30):
        print("[GETTER] Timeout waiting for creator!")
        done_event.set()
        return
    
    # Connect to the existing cluster (this will also enable tracing from GCS hook)
    ray.init(address="auto", namespace=NAMESPACE)
    
    print(f"\n[GETTER] Tracing enabled: {_is_tracing_enabled()}")
    
    # Get the actor by name - this will RELOAD the class fresh in this process
    # The class loaded here was NEVER processed by _inject_tracing_into_class
    actor = ray.get_actor("my_test_actor", namespace=NAMESPACE)
    
    # Print signatures from getter's handle
    print(f"[GETTER] Signatures in handle from get_actor():")
    for method_name, sig in actor._ray_method_signatures.items():
        param_names = [p.name for p in sig]
        print(f"  {method_name}: {param_names}")
    
    my_method_sig = actor._ray_method_signatures.get("my_method", [])
    has_trace = "_ray_trace_ctx" in [p.name for p in my_method_sig]
    print(f"[GETTER] my_method has _ray_trace_ctx: {has_trace}")
    
    # Try calling a method
    print(f"\n[GETTER] Attempting to call my_method.remote(5)...")
    sys.stdout.flush()
    try:
        result = ray.get(actor.my_method.remote(5))
        print(f"[GETTER] Method call SUCCEEDED! Result: {result}")
    except TypeError as e:
        print(f"[GETTER] Method call FAILED with TypeError: {e}")
    
    # Signal done
    done_event.set()
    ray.shutdown()


def main():
    # Stop any existing Ray cluster
    print("Stopping any existing Ray cluster...")
    subprocess.run(["ray", "stop", "--force"], capture_output=True)
    
    # Create synchronization events
    ready_event = multiprocessing.Event()
    done_event = multiprocessing.Event()
    
    # Start creator process
    creator = multiprocessing.Process(target=creator_process, args=(ready_event, done_event))
    creator.start()
    
    # Start getter process (will connect to existing cluster)
    getter = multiprocessing.Process(target=getter_process, args=(ready_event, done_event))
    getter.start()
    
    # Wait for both to complete
    getter.join(timeout=60)
    creator.join(timeout=10)
    
    # Clean up any hung processes
    if creator.is_alive():
        creator.terminate()
        creator.join(timeout=5)
    if getter.is_alive():
        getter.terminate()
        getter.join(timeout=5)
    
    # Cleanup Ray
    print("\nCleaning up...")
    subprocess.run(["ray", "stop", "--force"], capture_output=True)
    print("Done.")


if __name__ == "__main__":
    main()
```

<details>

<summary> output from master </summary>

```bash
❯ python repro_tracing_issue.py
Stopping any existing Ray cluster...
[GETTER] Waiting for creator to set up actor...
2025-12-24 07:05:02,215 INFO worker.py:1991 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
/home/ubuntu/ray/python/ray/_private/worker.py:2039: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
[CREATOR] Tracing enabled: True
[CREATOR] Signatures in handle from creation:
  __init__: ['_ray_trace_ctx']
  __ray_call__: ['fn', 'args', '_ray_trace_ctx', 'kwargs']
  __ray_ready__: ['_ray_trace_ctx']
  __ray_terminate__: ['_ray_trace_ctx']
  check_alive: ['_ray_trace_ctx']
  increment: ['amount', '_ray_trace_ctx']
  my_method: ['x', '_ray_trace_ctx']
[CREATOR] my_method has _ray_trace_ctx: True
[CREATOR] Test call result: 10
[CREATOR] Actor created, signaling getter...
2025-12-24 07:05:02,953 INFO worker.py:1811 -- Connecting to existing Ray cluster at address: 172.31.7.228:38871...
2025-12-24 07:05:02,984 INFO worker.py:1991 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265 
/home/ubuntu/ray/python/ray/_private/worker.py:2039: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(

[GETTER] Tracing enabled: True
[GETTER] Signatures in handle from get_actor():
  __init__: []
  __ray_call__: ['fn', 'args', '_ray_trace_ctx', 'kwargs']
  __ray_ready__: ['_ray_trace_ctx']
  __ray_terminate__: ['_ray_trace_ctx']
  check_alive: []
  increment: ['amount']
  my_method: ['x']
[GETTER] my_method has _ray_trace_ctx: False

[GETTER] Attempting to call my_method.remote(5)...
[GETTER] Method call FAILED with TypeError: got an unexpected keyword argument '_ray_trace_ctx'
[CREATOR] Getter finished, shutting down...

Cleaning up...
Done.
```

</details>

<details>

<summary>output from this PR</summary>

```bash
❯ python repro_tracing_issue.py
Stopping any existing Ray cluster...
[GETTER] Waiting for creator to set up actor...
2025-12-24 07:04:03,758 INFO worker.py:1991 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
/home/ubuntu/ray/python/ray/_private/worker.py:2039: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
[CREATOR] Tracing enabled: True
[CREATOR] Signatures in handle from creation:
  __init__: ['_ray_trace_ctx']
  __ray_call__: ['fn', 'args', '_ray_trace_ctx', 'kwargs']
  __ray_ready__: ['_ray_trace_ctx']
  __ray_terminate__: ['_ray_trace_ctx']
  check_alive: ['_ray_trace_ctx']
  increment: ['amount', '_ray_trace_ctx']
  my_method: ['x', '_ray_trace_ctx']
[CREATOR] my_method has _ray_trace_ctx: True
[CREATOR] Test call result: 10
[CREATOR] Actor created, signaling getter...
2025-12-24 07:04:04,476 INFO worker.py:1811 -- Connecting to existing Ray cluster at address: 172.31.7.228:37231...
2025-12-24 07:04:04,504 INFO worker.py:1991 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265 
/home/ubuntu/ray/python/ray/_private/worker.py:2039: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(

[GETTER] Tracing enabled: True
[GETTER] Signatures in handle from get_actor():
  __init__: ['_ray_trace_ctx']
  __ray_call__: ['fn', 'args', '_ray_trace_ctx', 'kwargs']
  __ray_ready__: ['_ray_trace_ctx']
  __ray_terminate__: ['_ray_trace_ctx']
  check_alive: ['_ray_trace_ctx']
  increment: ['amount', '_ray_trace_ctx']
  my_method: ['x', '_ray_trace_ctx']
[GETTER] my_method has _ray_trace_ctx: True

[GETTER] Attempting to call my_method.remote(5)...
[GETTER] Method call SUCCEEDED! Result: 10
[CREATOR] Getter finished, shutting down...

Cleaning up...
Done.
```

</details>

**Root Cause:**

`_inject_tracing_into_class` sets `__signature__` (including
`_ray_trace_ctx`) on the method object during actor creation. However:

1. When the actor class is serialized (cloudpickle) and loaded in
another process, `__signature__` is **not preserved** on module-level
functions. See repro script at the end of PR description as proof
2. `_ActorClassMethodMetadata.create()` uses `inspect.unwrap()` which
follows the `__wrapped__` chain to the **deeply unwrapped original
method**
3. The original method's `__signature__` was lost during serialization β†’
signatures extracted **without** `_ray_trace_ctx`
4. When calling the method, `_tracing_actor_method_invocation` adds
`_ray_trace_ctx` to kwargs β†’ **signature validation fails**

**Fix:**

1. In `_inject_tracing_into_class`: Set `__signature__` on the **deeply
unwrapped** method (via `inspect.unwrap`) rather than the immediate
method. This ensures `_ActorClassMethodMetadata.create()` finds it after
unwrapping.

2. In `load_actor_class`: Call `_inject_tracing_into_class` after
loading to re-inject the lost `__signature__` attributes.
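
A condensed, standalone sketch of the first part of the fix (not the actual Ray code): the extended signature is attached to the deeply unwrapped function so that later `inspect.unwrap` lookups still find it.

```python
import functools
import inspect


def method(self, x):
    return x * 2


@functools.wraps(method)
def wrapper(self, x, *, _ray_trace_ctx=None):
    return method(self, x)


# Build a signature that includes _ray_trace_ctx.
sig = inspect.signature(method)
params = list(sig.parameters.values()) + [
    inspect.Parameter("_ray_trace_ctx", inspect.Parameter.KEYWORD_ONLY, default=None)
]
traced_sig = sig.replace(parameters=params)

# Attach it to the *unwrapped* target, mirroring the fix described above, so
# code that calls inspect.unwrap(wrapper) still sees the extended signature.
inspect.unwrap(wrapper).__signature__ = traced_sig

assert "_ray_trace_ctx" in inspect.signature(inspect.unwrap(wrapper)).parameters
```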

**Testing:**
- Added reproduction script demonstrating cross-process actor method
calls with tracing
- All existing tracing tests pass
- Add a new test for serve with tracing

`repro_cloudpickle_signature.py`
```python
import inspect
import cloudpickle
import pickle
import multiprocessing


def check_signature_in_subprocess(pickled_func_bytes):
    func = pickle.loads(pickled_func_bytes)
    
    print(f"[SUBPROCESS] Unpickled function: {func}")
    print(f"[SUBPROCESS] Module: {func.__module__}")
    
    sig = getattr(func, '__signature__', None)
    if sig is not None:
        params = list(sig.parameters.keys())
        print(f"[SUBPROCESS] __signature__: {sig}")
        if '_ray_trace_ctx' in params:
            print(f"[SUBPROCESS] __signature__ WAS preserved")
            return True
        else:
            print(f"[SUBPROCESS] __signature__ NOT preserved (missing _ray_trace_ctx)")
            return False
    else:
        print(f"[SUBPROCESS] __signature__ NOT preserved (attribute missing)")
        return False


def main():
    from repro_actor_module import MyActor
    func = MyActor.my_method
    
    print(f"\n[MAIN] Function: {func}")
    print(f"[MAIN] Module: {func.__module__}")
    print(f"[MAIN] __signature__ before: {getattr(func, '__signature__', 'NOT SET')}")
    
    # Set a custom __signature__ with _ray_trace_ctx
    custom_sig = inspect.signature(func)
    new_params = list(custom_sig.parameters.values()) + [
        inspect.Parameter("_ray_trace_ctx", inspect.Parameter.KEYWORD_ONLY, default=None)
    ]
    func.__signature__ = custom_sig.replace(parameters=new_params)
    
    print(f"[MAIN] __signature__ after: {func.__signature__}")
    print(f"[MAIN] Parameters: {list(func.__signature__.parameters.keys())}")
    
    # Pickle
    print(f"\n[MAIN] Pickling with cloudpickle...")
    pickled = cloudpickle.dumps(func)
    
    # Test 1: Same process
    print(f"\n{'='*70}")
    print("TEST 1: Unpickle in SAME process")
    print(f"{'='*70}")
    same_func = pickle.loads(pickled)
    same_sig = getattr(same_func, '__signature__', None)
    if same_sig and '_ray_trace_ctx' in list(same_sig.parameters.keys()):
        print(f"Same process: __signature__ preserved")
    else:
        print(f"Same process: __signature__ NOT preserved")
    
    # Test 2: Different process
    print(f"\n{'='*70}")
    print("TEST 2: Unpickle in DIFFERENT process")
    print(f"{'='*70}")
    
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(1) as pool:
        result = pool.apply(check_signature_in_subprocess, (pickled,))
    
    if result:
        print("__signature__ IS preserved (unexpected)")
    else:
        print("__signature__ is NOT preserved for functions from imported modules!")


if __name__ == "__main__":
    main()

```

---------

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: abrar <abrar@anyscale.com>
… output (ray-project#60034)

Signed-off-by: yicheng <yicheng@anyscale.com>
Co-authored-by: yicheng <yicheng@anyscale.com>
## Description
Third PR for isolating progress managers / making them easier to work
with.

- Fully unify interfaces for all progress managers
- Introduce the `Noop` progress manager, basically to use for no-op
situations
- Separate out the function for determining which progress manager to use
- Add the `verbose_progress` setting to `ExecutionOptions`. This was
missing from previous versions (my bad).

## Related issues
N/A

## Additional information
N/A

---------

Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
## Description

Add sql_params support to read_sql so callers can pass [DB‑API
2](https://peps.python.org/pep-0249/#id20) parameter bindings instead of
string formatting. This enables safe, parameterized queries and is
propagated through all SQL execution paths (count, sharding checks, and
reads). Also adds a sqlite parameterized query test and updates
docstring.

## Related issues

Related to ray-project#54098.

## Additional information

Design/implementation notes:

- API: add optional sql_params to read_sql, matching DB‑API 2
cursor.execute(operation[, parameters]).
- Call chain: 
   read_sql(...) 
   β†’ SQLDatasource(sql_params=...) 
   β†’ get_read_tasks(...)
   β†’ supports_sharding/_get_num_rows/fallback read/per‑shard read 
   β†’ _execute(cursor, sql, sql_params).
- No paramstyle parsing: Ray doesn’t interpret placeholders; it passes
sql_params through to the driver as‑is.
- Behavior: if sql_params is None, _execute falls back to
cursor.execute(sql), preserving existing behavior.

Tests:

- pytest python/ray/data/tests/test_sql.py

- Local quick check (example):

```python
Python 3.10.19 | packaged by conda-forge | (main, Oct 22 2025, 22:46:49) [Clang 19.1.7 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> import ray
>>> 
>>> db = "example.db"
>>> conn = sqlite3.connect(db)
>>> conn.execute("DROP TABLE IF EXISTS movie")
<sqlite3.Cursor object at 0x1035af6c0>
>>> conn.execute("CREATE TABLE movie(title, year, score)")
<sqlite3.Cursor object at 0x1055c7040>
>>> conn.executemany(
...     "INSERT INTO movie VALUES (?, ?, ?)",
...     [
...         ("Monty Python and the Holy Grail", 1975, 8.2),
...         ("And Now for Something Completely Different", 1971, 7.5),
...         ("Monty Python's Life of Brian", 1979, 8.0),
...     ],
... )
<sqlite3.Cursor object at 0x1035af6c0>
>>> conn.commit()
>>> conn.close()
>>> 
>>> def create_connection():
...     return sqlite3.connect(db)
... 
>>> # tuple 
>>> ds_tuple = ray.data.read_sql(
...     "SELECT * FROM movie WHERE year >= ?",
...     create_connection,
...     sql_params=(1975,),
... )
2026-01-11 00:26:54,103 INFO worker.py:2007 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
/Users/XXX/miniforge3/envs/clion-ray-ce/lib/python3.10/site-packages/ray/_private/worker.py:2055: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
2026-01-11 00:26:54,700 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
>>> print("tuple:", ds_tuple.take_all())
2026-01-11 00:26:56,226 INFO logging.py:397 -- Registered dataset logger for dataset dataset_0_0
2026-01-11 00:26:56,240 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
2026-01-11 00:26:56,241 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
2026-01-11 00:26:56,242 INFO streaming_executor.py:182 -- Starting execution of Dataset dataset_0_0. Full logs are in /tmp/ray/session_2026-01-11_00-26-50_843083_56953/logs/ray-data
2026-01-11 00:26:56,242 INFO streaming_executor.py:183 -- Execution plan of Dataset dataset_0_0: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadSQL]
2026-01-11 00:26:56,242 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
2026-01-11 00:26:56,246 WARNING resource_manager.py:134 -- ⚠️  Ray's object store is configured to use only 25.2% of available memory (2.0GiB out of 7.9GiB total). For optimal Ray Data performance, we recommend setting the object store to at least 50% of available memory. You can do this by setting the 'object_store_memory' parameter when calling ray.init() or by setting the RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION environment variable.
2026-01-11 00:26:56,246 INFO streaming_executor.py:661 -- [dataset]: A new progress UI is available. To enable, set `ray.data.DataContext.get_current().enable_rich_progress_bars = True` and `ray.data.DataContext.get_current().use_ray_tqdm = False`.
Running Dataset dataset_0_0.: 0.00 row [00:00, ? row/s]
2026-01-11 00:26:56,267 WARNING resource_manager.py:791 -- Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadSQL]. The job may hang forever unless the cluster scales up.
βœ”οΈ  Dataset dataset_0_0 execution finished in 0.46 seconds: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2.00/2.00 [00:00<00:00, 4.48 row/s]
- ReadSQL->SplitBlocks(200): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 99.0B object store: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2.00/2.00 [00:00<00:00, 4.48 row/s]
2026-01-11 00:26:56,702 INFO streaming_executor.py:302 -- βœ”οΈ  Dataset dataset_0_0 execution finished in 0.46 seconds                                                   
tuple: [{'title': 'Monty Python and the Holy Grail', 'year': 1975, 'score': 8.2}, {'title': "Monty Python's Life of Brian", 'year': 1979, 'score': 8.0}]
>>> # list 
>>> ds_list = ray.data.read_sql(
...     "SELECT * FROM movie WHERE year >= ?",
...     create_connection,
...     sql_params=[1975],
... )
2026-01-11 00:27:07,304 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
>>> print("list:", ds_list.take_all())
2026-01-11 00:27:08,867 INFO logging.py:397 -- Registered dataset logger for dataset dataset_1_0
2026-01-11 00:27:08,871 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
2026-01-11 00:27:08,872 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
2026-01-11 00:27:08,873 INFO streaming_executor.py:182 -- Starting execution of Dataset dataset_1_0. Full logs are in /tmp/ray/session_2026-01-11_00-26-50_843083_56953/logs/ray-data
2026-01-11 00:27:08,873 INFO streaming_executor.py:183 -- Execution plan of Dataset dataset_1_0: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadSQL]
2026-01-11 00:27:08,874 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
Running Dataset dataset_1_0.: 0.00 row [00:00, ? row/s]
2026-01-11 00:27:08,881 WARNING resource_manager.py:791 -- Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadSQL]. The job may hang forever unless the cluster scales up.
βœ”οΈ  Dataset dataset_1_0 execution finished in 0.06 seconds: : 2.00 row [00:00, 38.9 row/s]
- ReadSQL->SplitBlocks(200): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 0.0B object store: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2.00/2.00 [00:00<00:00, 37.6 row/s]
2026-01-11 00:27:08,932 INFO streaming_executor.py:302 -- βœ”οΈ  Dataset dataset_1_0 execution finished in 0.06 seconds                                                   
list: [{'title': 'Monty Python and the Holy Grail', 'year': 1975, 'score': 8.2}, {'title': "Monty Python's Life of Brian", 'year': 1979, 'score': 8.0}]
>>> # dict 
>>> ds_dict = ray.data.read_sql(
...     "SELECT * FROM movie WHERE year >= :year",
...     create_connection,
...     sql_params={"year": 1975},
... )
2026-01-11 00:27:19,155 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
>>> print("dict:", ds_dict.take_all())
2026-01-11 00:27:19,807 INFO logging.py:397 -- Registered dataset logger for dataset dataset_2_0
2026-01-11 00:27:19,811 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
2026-01-11 00:27:19,812 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
2026-01-11 00:27:19,813 INFO streaming_executor.py:182 -- Starting execution of Dataset dataset_2_0. Full logs are in /tmp/ray/session_2026-01-11_00-26-50_843083_56953/logs/ray-data
2026-01-11 00:27:19,813 INFO streaming_executor.py:183 -- Execution plan of Dataset dataset_2_0: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadSQL]
2026-01-11 00:27:19,814 INFO sql_datasource.py:153 -- Sharding is not supported. Falling back to reading all data in a single task.
Running Dataset dataset_2_0.: 0.00 row [00:00, ? row/s]
2026-01-11 00:27:19,821 WARNING resource_manager.py:791 -- Cluster resources are not enough to run any task from TaskPoolMapOperator[ReadSQL]. The job may hang forever unless the cluster scales up.
βœ”οΈ  Dataset dataset_2_0 execution finished in 0.04 seconds: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2.00/2.00 [00:00<00:00, 51.6 row/s]
- ReadSQL->SplitBlocks(200): Tasks: 0; Actors: 0; Queued blocks: 0 (0.0B); Resources: 0.0 CPU, 99.0B object store: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2.00/2.00 [00:00<00:00, 49.0 row/s]
2026-01-11 00:27:19,859 INFO streaming_executor.py:302 -- βœ”οΈ  Dataset dataset_2_0 execution finished in 0.04 seconds                                                   
dict: [{'title': 'Monty Python and the Holy Grail', 'year': 1975, 'score': 8.2}, {'title': "Monty Python's Life of Brian", 'year': 1979, 'score': 8.0}]
>>> 
>>> 
```

---------

Signed-off-by: yaommen <myanstu@163.com>
Fixes missing space in the warning message.

---------

Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
Leaking Ray Train actors have been observed occupying GPU memory
following Train run termination, causing training failures/OOMs in
subsequent train runs. Despite the train actors being marked DEAD by Ray
Core, we find upon ssh-ing into nodes that the actor processes are
still alive and occupying valuable GPU memory.

This PR: 
- Replaces `__ray_terminate__` with `ray.kill` in Train run shutdown and
abort paths to guarantee the termination of train actors
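
A minimal sketch of the difference (the actor and method names here are placeholders, not the actual Train internals):

```python
import ray


@ray.remote
class TrainWorker:
    def ping(self):
        return "ok"


worker = TrainWorker.remote()

# Previous shutdown path: __ray_terminate__ asks the actor to exit gracefully,
# which a wedged process (e.g. one stuck holding GPU memory) can ignore.
# worker.__ray_terminate__.remote()

# New shutdown/abort path: ray.kill force-terminates the actor process, so the
# GPU memory it holds is reliably released.
ray.kill(worker, no_restart=True)
```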

---------

Signed-off-by: JasonLi1909 <jasli1909@gmail.com>
Signed-off-by: Jason Li <57246540+JasonLi1909@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
this works for different python versions and it is much easier to use
than in a conda managed python env.

Signed-off-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
- The async-inf README wasn't included in the toctree and was included in the
exclude pattern list; fixed it.

Signed-off-by: harshit <harshit@anyscale.com>
stop using the large oss ci test base

Signed-off-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
…#59896)

## Description

Addresses a critical issue in the `DefaultAutoscalerV2`, where nodes
were not being properly scaled from zero. With this update, clusters
managed by Ray will now automatically provision additional nodes when
there is workload demand, even when starting from an idle (zero-node)
state.

## Related issues
Closes ray-project#59682



---------

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>
Co-authored-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: sampan <sampan@anyscale.com>
Co-authored-by: sampan <sampan@anyscale.com>
…#59616)

## Description
We observed that raylet frequently emits log messages of the form
β€œDropping sync message with stale version”, which can become quite noisy
in practice.

This behavior occurs because raylet does not update the message version
for sync messages received from the GCS, and stale-version broadcast
messages are expected to be skipped by default. As a result, these log
entries are generated repeatedly even though this is normal and
non-actionable behavior.

Given that this does not indicate an error or unexpected state, logging
it at the INFO level significantly increases log noise and makes it
harder to identify genuinely important events.

We propose demoting this log from INFO to DEBUG in
RaySyncerBidiReactorBase to keep raylet logs cleaner while still
preserving the information for debugging purposes when needed.


![img_v3_02t7_be5071d6-99d2-4b3c-b189-66aa77476d3g](https://github.com/user-attachments/assets/ed91c317-3a86-441c-a2bf-b317ac0af618)

## Related issues
Closes ray-project#59615

## Additional information
- Change log level from INFO to DEBUG for β€œDropping sync message with
stale version” in RaySyncerBidiReactorBase.

Signed-off-by: Mao Yancan <yancan.mao@bytedance.com>
Co-authored-by: Mao Yancan <yancan.mao@bytedance.com>
## Description
Runs linkcheck on the docs, in particular for RLlib, where we've moved
tuned-examples to examples/algorithms.
Further, updated GitHub links that were automatically redirected.

There are problems with some of the RLlib examples missing but I'm going
to fix these in the algorithm premerge PRs, i.e.,
ray-project#59007

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
@gemini-code-assist

Note

The number of changes in this pull request is too large for Gemini Code Assist to generate a review.

@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label Jan 27, 2026