πŸ”„ daily merge: master β†’ main 2026-01-12 #742

Open

antfin-oss wants to merge 268 commits into main from create-pull-request/patch-a5d3e3c80e

Conversation

@antfin-oss

This Pull Request was created automatically to merge the latest changes from master into main branch.

πŸ“… Created: 2026-01-12
πŸ”€ Merge direction: master β†’ main
πŸ€– Triggered by: Scheduled

Please review and merge if everything looks good.

aslonnie and others added 30 commits December 16, 2025 22:44
missed one in the change.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…ay-project#59438)

## Description

Rollout fragment length in AlgorithmConfig can be computed as zero
if num_learners is high enough.
This PR guarantees that rollout fragments have a size of at least 1.
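
A minimal sketch of the guarantee, with hypothetical names; the real computation lives in RLlib's `AlgorithmConfig` and divides the train batch across learners and env runners:

```python
def rollout_fragment_length(train_batch_size: int, num_learners: int,
                            num_envs_per_env_runner: int) -> int:
    # Integer division can round down to 0 when num_learners is large.
    computed = train_batch_size // max(1, num_learners * num_envs_per_env_runner)
    # Guarantee fragments of at least size 1.
    return max(1, computed)
```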
…ct#58886)

Signed-off-by: dancingactor <s990346@gmail.com>
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
…ject#59299)

Add documentation for Ray's token authentication feature introduced in
v2.52.0 (based on REP:
https://github.com/ray-project/enhancements/tree/048270c0750b9fe8aeac858466cb1b539d9b49c2/reps/2025-11-24-ray-token-auth
)

Co-authored-by: Andrew Sy Kim andrewsy@google.com

---------

Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Signed-off-by: sampan <sampan@anyscale.com>
Co-authored-by: Andrew Sy Kim <andrewsy@google.com>
Co-authored-by: sampan <sampan@anyscale.com>
The old log line was getting executed as part of the control loop
unconditionally.

cc @KeeProMise

Signed-off-by: abrar <abrar@anyscale.com>
…9392)

## Description

### [Data] Concurrency cap backpressure tuning
**EWMA_ALPHA**
Update EWMA_ALPHA from 0.2 to 0.1. This biases the adjustment level toward limiting concurrency by making it more sensitive to downstream queuing.

**K_DEV**
Update K_DEV from 2.0 to 1.0. This narrows the stddev band, again biasing toward limiting concurrency by making it more sensitive to downstream queuing.
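
For context, a minimal sketch (assuming scalar metrics and these constant names) of how an EWMA smoothing factor and a `mean + K_DEV * stddev` threshold interact; the actual policy lives in Ray Data's concurrency-cap backpressure code and may differ:

```python
EWMA_ALPHA = 0.1  # was 0.2: smooths the queuing signal over a longer window
K_DEV = 1.0       # was 2.0: a narrower band trips backpressure sooner

def update_ewma(prev: float, sample: float, alpha: float = EWMA_ALPHA) -> float:
    # Standard exponentially weighted moving average update.
    return alpha * sample + (1.0 - alpha) * prev

def should_backpressure(queued: float, mean: float, stddev: float) -> bool:
    # Limit concurrency once downstream queuing exceeds mean + K_DEV * stddev.
    return queued > mean + K_DEV * stddev
```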


---------

Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
### Description
Completing the Expr (direct) Rounding operations (ceil, floor, round,
trunc)

for example:
```python
import ray
from ray.data import from_items
from ray.data.expressions import col

ray.init(include_dashboard=False, ignore_reinit_error=True)
ds = from_items([{"x": 1.1}, {"x": 2.5}, {"x": -3.7}])
for row in ds.iter_rows():
    print(row)

x_expr = col("x")
hasattr(x_expr, "ceil")
ds = ds.with_column("x_ceil", x_expr.ceil())
hasattr(x_expr, "floor")
ds = ds.with_column("x_floor", x_expr.floor())
hasattr(x_expr, "round")
ds = ds.with_column("x_round", x_expr.round())
hasattr(x_expr, "trunc")
ds = ds.with_column("x_trunc", x_expr.trunc())
for row in ds.iter_rows():
    print(row)
```


### Related issues
Related to ray-project#58674

### Additional information

---------

Signed-off-by: yaommen <myanstu@163.com>
Co-authored-by: yanmin <homeryan@didiglobal.com>
…() (ray-project#59102)

## Description
Currently, `write_parquet` is hard-coded to use `hive`
partitioning. This PR allows passing `partitioning_flavor` via
`arrow_parquet_args`/`arrow_parquet_args_fn`.

The default behaviors differ between Ray Data and pyarrow:
- Ray Data defaults to "hive", which applies when `partitioning_flavor` is not specified.
- pyarrow uses `None` to represent dictionary partitioning, so `partitioning_flavor=None` selects that flavor.

Also, I did not use the Partitioning class from `ray.data.read_parquet`,
which seems to be overkill here (e.g., we already expose `partition_cols` as a
top-level arg).

Finally, I have rearranged the docstring a little bit.
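
A hedged usage sketch based on this description; whether the flavor is passed as a keyword (collected into `arrow_parquet_args`) or via `arrow_parquet_args_fn` may differ slightly in the released API:

```python
import ray

ds = ray.data.from_items([{"country": "US", "v": 1}, {"country": "DE", "v": 2}])

# Default: hive-style partitioning (country=US/ directory names).
ds.write_parquet("/tmp/out_hive", partition_cols=["country"])

# partitioning_flavor=None selects pyarrow's dictionary partitioning
# (plain US/ directory names) instead.
ds.write_parquet(
    "/tmp/out_dict",
    partition_cols=["country"],
    partitioning_flavor=None,
)
```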

## Related issues
NA


---------

Signed-off-by: Kit Lee <7000003+wingkitlee0@users.noreply.github.com>
…ray-project#59412)

This PR adds a cap on the total budget allocation by max_resource_usage
in ReservationOpResourceAllocator.

Previously, the reservation was capped by max_resource_usage, but the
total budget (reservation + shared resources) could exceed the max. This
could lead to operators being allocated more resources than they can
use, while other operators are starved.
 
Changes

Cap total allocation by max_resource_usage:
- Total allocation = max(total_reserved, op_usage) + op_shared
- We now cap op_shared so that total allocation ≀ max_resource_usage
- Excess shared resources remain in remaining_shared for other operators

Redistribute remaining shared resources (sketch below):
- After the allocation loop, any remaining shared resources (from caps) are given to the most downstream uncapped operator
- An operator is "uncapped" if its max_resource_usage == ExecutionResources.inf()
- If all operators are capped, remaining resources are not redistributed
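
A minimal scalar sketch of the capping rule above; real resources are multi-dimensional `ExecutionResources`, and these names are hypothetical:

```python
def cap_allocation(total_reserved: float, op_usage: float,
                   op_shared: float, max_resource_usage: float):
    # Total allocation = max(total_reserved, op_usage) + op_shared,
    # capped so that it never exceeds max_resource_usage.
    base = max(total_reserved, op_usage)
    capped_shared = min(op_shared, max(0.0, max_resource_usage - base))
    # Whatever was trimmed stays in remaining_shared for other operators.
    excess = op_shared - capped_shared
    return base + capped_shared, excess
```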

Tests

- test_budget_capped_by_max_resource_usage: Tests that capped operators
don't receive excess shared resources, and remaining goes to uncapped
downstream op
- test_budget_capped_by_max_resource_usage_all_capped: Tests that when
all operators are capped, remaining shared resources are not
redistributed

---------

Signed-off-by: Hao Chen <chenh1024@gmail.com>
## Description

This change addresses a long-standing problem where Pandas tensors
holding null values couldn't be converted into Arrow ones.

More details are captured in
ray-project#59445.

The following changes are made to address that:

- Fixed `_is_ndarray_variable_shaped_tensor`
- NumPy tensors with `dtype='o'` are cycled through PyArrow to be converted into proper ndarrays
- Path raveling and formatting are unified between fixed-shape and variable-shaped tensors

## Related issues

Addresses ray-project#59445


---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
)

## Description
This PR introduces the following optimizations in the
`opentelemetryMetricsRecorder` and some of its consumers:
- use asynchronous instruments wherever available (counter and up-down counter)
- introduce a batch API to record histogram metrics (to prevent lock contention caused by repeated `set_metric_value()` calls)
- batch the events-received metric updates in aggregator_agent instead of making individual calls

---------

Signed-off-by: sampan <sampan@anyscale.com>
Co-authored-by: sampan <sampan@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
fixes ray-project#59218

Signed-off-by: abrar <abrar@anyscale.com>
…t#59502)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
We now prebuild manylinux2014 with JDK as part of
ray-project#59204

We can directly consume this, rather than rebuilding each time we need
it

---------

Signed-off-by: andrew <andrew@anyscale.com>
fixes ray-project#59218

---------

Signed-off-by: abrar <abrar@anyscale.com>
…55781)

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Co-authored-by: sampan <sampan@anyscale.com>
fixes ray-project#59218

---------

Signed-off-by: abrar <abrar@anyscale.com>
as we do not need to support python 3.9 anymore

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Mark Towers <mark.m.towers@gmail.com>
were not all renamed

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…l_export_options (ray-project#59509)

The options are required for test telemetry gathering to work at the end of the test job run.
Not used anymore, and does not work after the gymnasium upgrade.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
needs to take `HOSTTYPE` from env and avoid adding defaults

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…stic multi-epoch training (ray-project#59528)

This PR is a renamed version of ray-project#59044

## Description

This PR adds execution-aware shuffling to Ray Data's file-based
datasources, enabling different file orders across different executions
while maintaining determinism.

### Changes

**Core functionality:**
- Added `execution_idx` field to `DataContext` to track the current
epoch
- `FileShuffleConfig` can receive `base_seed` to automatically increment the seed after each execution (usage sketch below)
- If `FileShuffleConfig` still uses `seed`, the random seed stays the same for every execution
- Modified `FileBasedDatasource.get_read_tasks()` to accept
`execution_idx` parameter and pass it through the shuffle logic

**Benefits:**
- No breaking API change.
- Each epoch produces a different but deterministic shuffle when
`base_seed` is provided
- Ensures that multiple datasets with the same shuffle config produce
identical results within each epoch
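
A hedged usage sketch; `FileShuffleConfig` is the existing Ray Data config, and `base_seed` is the field this PR adds:

```python
import ray
from ray.data import FileShuffleConfig

# base_seed: each execution (epoch) derives a different, still-deterministic seed.
per_epoch = FileShuffleConfig(base_seed=42)

# seed (existing behavior): identical file order on every execution.
fixed = FileShuffleConfig(seed=42)

ds = ray.data.read_parquet("example://iris.parquet", shuffle=per_epoch)
```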

---------

Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com>
Signed-off-by: xgui <xgui@anyscale.com>
ray-project#59218

---------

Signed-off-by: abrar <abrar@anyscale.com>
fixes ray-project#56633

- [x] Add documentation
- [x] update `get_multiplexed_model_id` to see if we are batch context
first
- [x] update logic
- [x] add tests
- [x] does not introduce any backwards incompatibility; previously the
system did not provide any guarantee about the contents of a batch, and now
we add a constraint that guarantees each batch contains requests for the
same model.
- [x] execute sub batches concurrently 

The thing I dislike about this implementation is that it does not fill
the batch when the replica is responsible for more than 2 models
and incoming traffic is distributed equally between those models,
because the current implementation fills the batch first, then divides it.

| Metric | Baseline (42905 reqs) | Master (27526 reqs) | Ξ” Change (Master βˆ’ Baseline) |
| -- | -- | -- | -- |
| Requests | 42,905 | 27,526 | βˆ’15,379 |
| Fails | 0 | 0 | 0 |
| Median (ms) | 290 | 300 | +10 ms |
| 95%ile (ms) | 560 | 570 | +10 ms |
| 99%ile (ms) | 620 | 640 | +20 ms |
| Average (ms) | 327.41 | 332.96 | +5.55 ms |
| Min (ms) | 61 | 80 | +19 ms |
| Max (ms) | 764 | 802 | +38 ms |
| Avg Size (bytes) | 13 | 13 | 0 |
| Current RPS | 299 | 293 | βˆ’6 |
| Current Failures/s | 0 | 0 | 0 |

---------

Signed-off-by: abrar <abrar@anyscale.com>
srinathk10 and others added 22 commits January 8, 2026 11:07
…roject#59956)

## Description


**[Data] Fix test_execution_optimizer_limit_pushdown determinism**

Fix by adding `override_num_blocks=1` (sketch after the failure log below).

```

[2026-01-08T00:20:42Z] =================================== FAILURES ===================================
--
[2026-01-08T00:20:42Z] ____________________ test_limit_pushdown_basic_limit_fusion ____________________
[2026-01-08T00:20:42Z]
[2026-01-08T00:20:42Z] ray_start_regular_shared_2_cpus = RayContext(dashboard_url='127.0.0.1:8265', python_version='3.10.19', ray_version='3.0.0.dev0', ray_commit='{{RAY_COMMIT_SHA}}')
[2026-01-08T00:20:42Z]
[2026-01-08T00:20:42Z]     def test_limit_pushdown_basic_limit_fusion(ray_start_regular_shared_2_cpus):
[2026-01-08T00:20:42Z]         """Test basic Limit -> Limit fusion."""
[2026-01-08T00:20:42Z]         ds = ray.data.range(100).limit(5).limit(100)
[2026-01-08T00:20:42Z] >       _check_valid_plan_and_result(
[2026-01-08T00:20:42Z]             ds,
[2026-01-08T00:20:42Z]             "Read[ReadRange] -> Limit[limit=5]",
[2026-01-08T00:20:42Z]             [{"id": i} for i in range(5)],
[2026-01-08T00:20:42Z]             check_ordering=False,
[2026-01-08T00:20:42Z]         )
[2026-01-08T00:20:42Z]
[2026-01-08T00:20:42Z] python/ray/data/tests/test_execution_optimizer_limit_pushdown.py:40:
[2026-01-08T00:20:42Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2026-01-08T00:20:42Z]
[2026-01-08T00:20:42Z] ds = limit=5
[2026-01-08T00:20:42Z] +- Dataset(num_rows=5, schema={id: int64})
[2026-01-08T00:20:42Z] expected_plan = 'Read[ReadRange] -> Limit[limit=5]'
[2026-01-08T00:20:42Z] expected_result = [{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
[2026-01-08T00:20:42Z] expected_physical_plan_ops = None, check_ordering = False
[2026-01-08T00:20:42Z]
[2026-01-08T00:20:42Z]     def _check_valid_plan_and_result(
[2026-01-08T00:20:42Z]         ds: Dataset,
[2026-01-08T00:20:42Z]         expected_plan: Plan,
[2026-01-08T00:20:42Z]         expected_result: List[Dict[str, Any]],
[2026-01-08T00:20:42Z]         expected_physical_plan_ops=None,
[2026-01-08T00:20:42Z]         check_ordering=True,
[2026-01-08T00:20:42Z]     ):
[2026-01-08T00:20:42Z]         actual_result = ds.take_all()
[2026-01-08T00:20:42Z]         if check_ordering:
[2026-01-08T00:20:42Z]             assert actual_result == expected_result
[2026-01-08T00:20:42Z]         else:
[2026-01-08T00:20:42Z] >           assert rows_same(pd.DataFrame(actual_result), pd.DataFrame(expected_result))
[2026-01-08T00:20:42Z] E           AssertionError: assert False
[2026-01-08T00:20:42Z] E            +  where False = rows_same(   id\n0  25\n1  26\n2  27\n3  28\n4  29,    id\n0   0\n1   1\n2   2\n3   3\n4   4)
[2026-01-08T00:20:42Z] E            +    where    id\n0  25\n1  26\n2  27\n3  28\n4  29 = <class 'pandas.core.frame.DataFrame'>([{'id': 25}, {'id': 26}, {'id': 27}, {'id': 28}, {'id': 29}])
[2026-01-08T00:20:42Z] E            +      where <class 'pandas.core.frame.DataFrame'> = pd.DataFrame
[2026-01-08T00:20:42Z] E            +    and      id\n0   0\n1   1\n2   2\n3   3\n4   4 = <class 'pandas.core.frame.DataFrame'>([{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}])
[2026-01-08T00:20:42Z] E            +      where <class 'pandas.core.frame.DataFrame'> = pd.DataFrame
Β 
```
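
A hedged sketch of the fixed test setup: pinning the read to a single block makes `limit(5)` deterministic, so the test sees ids 0-4 rather than an arbitrary block's rows.

```python
import ray

ds = ray.data.range(100, override_num_blocks=1).limit(5).limit(100)
assert sorted(r["id"] for r in ds.take_all()) == [0, 1, 2, 3, 4]
```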




Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
…remental streaming (ray-project#59972)

## Description

This change makes the `StatefulShuffleAggregation.finalize` interface allow
incremental streaming, preparing it for a future shift to streaming
results incrementally.


Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
### Summary

This PR is part 1 of 3 for adding queue-based autoscaling support for
Ray Serve TaskConsumer deployments.

###  Background

TaskConsumers are workloads that consume tasks from message queues
(Redis, RabbitMQ), and their scaling needs are fundamentally different
from HTTP-based deployments. Instead of scaling based on HTTP request
load, TaskConsumers should scale based on the number of pending tasks in
the message queue.

###  Overall Architecture (Full Feature)
```
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚  Message Queue  │◄─────│  QueueMonitor    β”‚      β”‚ ServeController β”‚
  β”‚  (Redis/RMQ)    β”‚      β”‚  Actor           │◄─────│ Autoscaler      β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                   β”‚                         β”‚
                                   β”‚ get_queue_length()      β”‚
                                   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                             β”‚
                                             β–Ό
                                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                                β”‚ queue_based_autoscaling   β”‚
                                β”‚ _policy()                 β”‚
                                β”‚ desired = ceil(len/target)β”‚
                                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
The full implementation consists of three PRs:

| PR | Description | Status |
|----|-------------|--------|
| PR 1 (This PR) | QueueMonitor actor for querying broker queue length | πŸ”„ Current |
| PR 2 | Introduce default queue-based autoscaling policy | Upcoming |
| PR 3 | Integration with TaskConsumer deployments | Upcoming |

###  This PR: QueueMonitor Actor

This PR introduces the QueueMonitor Ray actor that queries message
brokers to get queue length for autoscaling decisions.

###  Key Features

- Multi-broker support: Redis and RabbitMQ
- Lightweight Ray actor: runs with num_cpus=0, with pika and redis in the runtime env
- Fault tolerance: caches the last known queue length on query failures
- Named actor pattern: QUEUE_MONITOR::<deployment_name> for easy lookup

###  Queue Length Calculation

For accurate autoscaling, QueueMonitor returns total workload (pending
tasks):

  | Broker   | Pending Tasks  |
  |----------|----------------|
  | Redis    | LLEN <queue>   |
  | RabbitMQ | messages_ready  |
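
Illustrative only: how the two queries in the table above map onto broker client calls (assumes the `redis` and `pika` packages; the actual QueueMonitor code may differ):

```python
import pika
import redis

def redis_queue_length(url: str, queue: str) -> int:
    # LLEN <queue>: number of pending tasks in the Redis list.
    return redis.Redis.from_url(url).llen(queue)

def rabbitmq_queue_length(url: str, queue: str) -> int:
    conn = pika.BlockingConnection(pika.URLParameters(url))
    try:
        # passive=True only inspects the queue; message_count reflects
        # messages_ready (tasks not yet delivered to a consumer).
        frame = conn.channel().queue_declare(queue=queue, passive=True)
        return frame.method.message_count
    finally:
        conn.close()
```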


###  Components

1. QueueMonitorConfig - configuration dataclass with broker URL and queue name
2. QueueMonitor - core class that initializes broker connections and queries queue length
3. QueueMonitorActor - Ray actor wrapper for remote access
4. Helper functions:
   - create_queue_monitor_actor() - create a named actor
   - get_queue_monitor_actor() - look up an existing actor
   - delete_queue_monitor_actor() - clean up on deployment deletion

###  Test Plan

- Unit tests for QueueMonitorConfig (7 tests)
  - Broker type detection (Redis, RabbitMQ, SQS, unknown)
  - Config value storage
- Unit tests for QueueMonitor (4 tests)
  - Redis queue length retrieval (pending)
  - RabbitMQ queue length retrieval
  - Error handling with cached value fallback

---------

Signed-off-by: harshit <harshit@anyscale.com>
… Checkpoint Report Time" (ray-project#58470)

The existing Checkpoint Report Time metric represents the total time
workers spend reporting metrics across multiple checkpoints. However,
the current title does not clearly indicate that this time is
cumulative. This PR addresses the issue by renaming the panel to
β€œCumulative Checkpoint Report Time” and updating the description to
reflect that it sums reporting time across multiple checkpoints.

Current State:
<img width="1730" height="988" alt="image"
src="https://github.com/user-attachments/assets/5d74eb17-6b6a-4902-a9f3-c25749ec8e5d"
/>

Signed-off-by: JasonLi1909 <jasli1909@gmail.com>
…ct#59963)

### Summary
This PR removes the `RAY_SERVE_ALWAYS_RUN_PROXY_ON_HEAD_NODE`
environment variable as it has no practical effect in production.

### Why this env var is redundant
The env var was intended to "always run a proxy on the head node even if
it has no replicas." However, the controller already unconditionally
ensures the head node is included in the proxy nodes set, making this
flag ineffective.

### Code flow analysis
#### In `controller.py` `_update_proxy_nodes()` (lines 419-430):
```python
def _update_proxy_nodes(self):
    new_proxy_nodes = self.deployment_state_manager.get_active_node_ids()
    new_proxy_nodes = new_proxy_nodes - set(
        self.cluster_node_info_cache.get_draining_nodes()
    )
    new_proxy_nodes.add(self._controller_node_id)  # <-- UNCONDITIONAL
    self._proxy_nodes = new_proxy_nodes
```

The head node (`_controller_node_id`) is **always** added to
`_proxy_nodes` unconditionally.
#### In `proxy_state.py` `update()` (where the env var was checked):
```python
if RAY_SERVE_ALWAYS_RUN_PROXY_ON_HEAD_NODE:
    proxy_nodes.add(self._head_node_id)
```

This check happens **after** the controller already added the head node.
Since proxy_nodes is a set, adding the same node again is a no-op.

### Consequence
- Setting `RAY_SERVE_ALWAYS_RUN_PROXY_ON_HEAD_NODE=0` had no effect
because the controller adds the head node regardless.
- Setting `RAY_SERVE_ALWAYS_RUN_PROXY_ON_HEAD_NODE=1` (the default) was
also a no-op since the head node was already present.

### Changes
- Removed `RAY_SERVE_ALWAYS_RUN_PROXY_ON_HEAD_NODE` definition from
`constants.py`
- Removed the import and usage in `proxy_state.py`

---------

Signed-off-by: harshit <harshit@anyscale.com>
## Description
Our testing logs can often be incredibly long. This PR
aims to reduce them by changing three things:
1. By default, the CLIReporter in `run_rllib_example_script_experiment`
reports an algorithm's training results at least every 5 seconds.
This PR adds a `tune-max-report-freq` argument that stays at 5 seconds for
end users, while tests raise it to 30 seconds.
2. Change the verbosity of the tune results from 2 to 1 when testing.
3. Removed ` WARNING impala_learner.py:576 -- No old learner state to
remove from the queue.` warnings.

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
## Description

Changes
---
- Added `AsList` aggregation that aggregates the given column's values into a single list-valued element (usage sketch below)
- Added `AsListVectorized`
- Added a modulo op to `Expr`
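
A hedged usage sketch of `AsList`; the import path and output column name are assumptions:

```python
import ray
from ray.data.aggregate import AsList  # assumed import path

ds = ray.data.from_items(
    [{"k": "a", "v": 1}, {"k": "a", "v": 2}, {"k": "b", "v": 3}]
)
# Collect each group's values into a single list-valued element.
print(ds.groupby("k").aggregate(AsList("v")).take_all())
# e.g. [{'k': 'a', 'list(v)': [1, 2]}, {'k': 'b', 'list(v)': [3]}]
```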
 


---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…multi-agent (ray-project#59928)

## Description
For the FlattenObservation connector in the multi-agent case, if any of
the agents had a nested space that was ignored, we used the
base_struct version of the observation_spaces rather than the actual
spaces, which is a problem for nested spaces.
FYI, I believe that `get_base_struct_from_space` can be removed, as
Gymnasium has added the ability to treat `Dict` and `Tuple` spaces
like a dict or tuple for iterating and calling `.keys()`, `.values()` and
`.items()`. Possibly for a future PR.

## Related issues
Fixes ray-project#59849

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Signed-off-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Artur Niederfahrenhorst <attaismyname@googlemail.com>
Co-authored-by: Artur Niederfahrenhorst <artur@anyscale.com>
…E.md (ray-project#59980)

If publishers follow the example publishing pattern, they should end up
with a structure like:

```
doc/source/{path-to-examples}/my-example/
  └─ content/
     β”œβ”€ notebook.ipynb
     β”œβ”€ README.md
     └─ …
```
where path-to-examples is any of "serve/tutorials/" | "data/examples/" |
"ray-overview/examples/" | "train/examples/" | "tune/examples/"

In the toctree (or `examples.yml` for data/train/serve), publishers
*should* link to `notebook.ipynb`. The `README.md` exists only for
display in the console once the example is converted into an Anyscale
template.

### Issue

* `README.md` is a copy of the notebook and doesn't belong to any
toctree because it is **not used by Ray Docs** -> Sphinx emits *orphan
warnings* for these files, which fail CI because ReadTheDocs is
configured to fail on warnings
* To silence the warning, publishers must manually add the file to
`exclude_patterns` in `conf.py` -> This adds unnecessary overhead for
publishers unfamiliar with Sphinx
* Over time, it also clutters `exclude_patterns`

### Solution

This PR automatically adds example `README.md` files to
`exclude_patterns`, **only** when they match specific glob patterns.
These patterns ensure the files:

* Live under a Ray docs examples directory, and
* Are contained within a `content/` folder

This minimizes the risk of accidentally excluding unrelated files
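
A minimal sketch of the `conf.py` change, assuming glob patterns like these (the PR's exact patterns may differ):

```python
# In doc/source/conf.py: README.md files under an examples content/
# folder are excluded automatically, silencing the orphan warnings.
EXAMPLE_README_GLOBS = [
    "serve/tutorials/*/content/README.md",
    "data/examples/*/content/README.md",
    "ray-overview/examples/*/content/README.md",
    "train/examples/*/content/README.md",
    "tune/examples/*/content/README.md",
]
exclude_patterns.extend(EXAMPLE_README_GLOBS)  # exclude_patterns is Sphinx's
```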

---------

Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Co-authored-by: Aydin Abiar <aydin@anyscale.com>
## Description
Currently, when displaying hanging tasks, we show the Ray Data level task
index, which is useless for Ray Core debugging. This PR adds more info
to long-running tasks, namely:
- node_id
- pid
- attempt #

I did consider adding this to the high-memory detector, but avoided it for 2
reasons:
- requires more refactoring of `RunningTaskInfo`
- afaik, not helpful in debugging since high memory is detected _after the task
completes_

## Example script to trigger hanging issues
```python
import ray
import time
from ray.data._internal.issue_detection.detectors import HangingExecutionIssueDetectorConfig


ctx = ray.data.DataContext.get_current()
ctx.issue_detectors_config.hanging_detector_config = HangingExecutionIssueDetectorConfig(
    detection_time_interval_s=1.0,
)

def sleep(x):
    if x['id'] == 0:
        time.sleep(100)
    return x
ray.data.range(100, override_num_blocks=100).map_batches(sleep).materialize()
```




## Related issues
None

## Additional information
None

---------

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <140442892+iamjustinhsu@users.noreply.github.com>
Co-authored-by: Alexey Kudinkin <alexey.kudinkin@gmail.com>
…ay-project#59907)

## Description
The grid is not being rendered correctly in some logs. Instead, this
change opts for a simpler representation from tabulate.


Signed-off-by: Goutam <goutam@anyscale.com>
## Description

### [Data] Fixes to DownstreamCapacityBackpressurePolicy

### Fixes

In DownstreamCapacityBackpressurePolicy, when calculating the
available/utilized object store ratio per Op, also include the
downstream ineligible Ops.


Signed-off-by: Srinath Krishnamachari <srinath.krishnamachari@anyscale.com>
## Description
test_import_in_subprocess was failing on Windows because `return
subprocess.run(["python", "-c", "import pip_install_test"])` would use
the system python instead of the virtualenv's python.

Warning highlighted in the Python subprocess docs:
> Warning For maximum reliability, use a fully qualified path for the
executable. To search for an unqualified name on PATH, use
[shutil.which()](https://docs.python.org/3/library/shutil.html#shutil.which).
On all platforms, passing
[sys.executable](https://docs.python.org/3/library/sys.html#sys.executable)
is the recommended way to launch the current Python interpreter again,
and use the -m command-line format to launch an installed module.

https://docs.python.org/3/library/subprocess.html
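
The fix, per the quoted recommendation (a hedged sketch of the relevant call):

```python
import subprocess
import sys

# sys.executable is the virtualenv's interpreter, so the import is checked
# against the virtualenv rather than whatever "python" is on PATH.
result = subprocess.run([sys.executable, "-c", "import pip_install_test"])
```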

Test passed (but it looks like some other unrelated tests failed):

https://buildkite.com/organizations/ray-project/pipelines/premerge/builds/57162/jobs/019b9c44-0ec0-457b-8b8a-dac06d5ea818/log#13851-14985

https://buildkite.com/ray-project/premerge/builds/57254#019ba0e5-9bf8-4157-b997-2a2e4e01358b/L11484

---------

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Co-authored-by: sampan <sampan@anyscale.com>
… Data APIs (ray-project#59918)

This PR splits the Input/Output API reference page into two separate
pages to improve organization and mirror the structure of the user
guides.

## Changes
- Renamed `input_output.rst` to `loading_data.rst`
- Created `saving_data.rst` with all saving/writing APIs
- Updated `api.rst` to reference both new files
- Updated all references from `input-output` to
`loading-data-api`/`saving-data-api`
- Standardized section header formatting with dashes matching title
length

Fixes ray-project#59301

---------

Signed-off-by: mgchoi239 <mg.choi.239@gmail.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: mgchoi239 <mg.choi.239@gmail.com>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
…y-project#59959)

## Description
The
[test_operators.py](https://github.com/ray-project/ray/blob/f85c5255669d8a682673401b54a86612a79058e3/python/ray/data/tests/test_operators.py)
file (~1,670 lines, 28 test functions) occasionally times out during CI
runs. We should split it into smaller, logically grouped test modules to
improve test reliability and allow better parallel execution.
@tianyi-ge 
## Related issues
Fixes ray-project#59881

Signed-off-by: Haichuan <kaisennhu@gmail.com>
…ay-project#59955)

All PRs that are submitted to Ray must have clear titles and
descriptions that give the reviewer adequate context. To help catch PRs
that violate this rule, I've added a bugbot rule to Cursor that
automates this check, in turn taking load off of the review process.

---------

Signed-off-by: joshlee <joshlee@anyscale.com>
…ject#59957)

# Summary

Before this PR, the training failed error was buried in the `exc_text`
part of the log. After this PR it should also appear in the `message`
part of the log.

# Testing

Unit tests

---------

Signed-off-by: Timothy Seah <tseah@anyscale.com>
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Dead code is a maintenance burden. Removing unused mocks. 

This came up as I was working on removing the ClusterLeaseManager from
the GCS in ray-project#60008.

Signed-off-by: irabbani <israbbani@gmail.com>
ray-project#60012)

## Description
Reverting ray-project#59852 as it causes
release test infra to fail. We need to update the infra to jive with the
new port discovery settings properly.


## Additional information
Example failing build:
https://buildkite.com/ray-project/release/builds/74681#
…orphan (ray-project#59982)

If publishers follow the example publishing pattern for ray
data/serve/train, they should end up with a structure like:

```
doc/source/{path-to-examples}/my-example/
  └─ content/
     β”œβ”€ notebook.ipynb
     β”œβ”€ README.md
     └─ …
```
where path-to-examples is any of "serve/tutorials/" | "data/examples/" |
"train/examples/"

In the `examples.yml`, publishers link to `notebook.ipynb`.

### Issue

* `examples.rst` is dynamically created by custom scripts in
`custom_directives.py`. The script reads each `examples.yml` file and
creates the HTML for the examples index page in the Ray docs.
* Because `examples.rst` is initially empty, Sphinx considers all
notebooks as "orphan" documents and emits warnings, which fail CI
because ReadTheDocs is configured to fail on warnings.
* To silence the warning, publishers must manually add an `orphan: True`
entry to the metadata of the notebook.
* This adds unnecessary overhead for publishers unfamiliar with Sphinx.
They should only worry about their content, not what an "orphan"
document is.

### Solution

This PR automatically adds the `orphan: True` metadata to any files
listed in any of the `examples.yml` for Ray data/serve/train (sketch below).
This ensures:

* `examples.yml` is the source of truth, so only the listed files are
affected; no side effects on unrelated files.
* Publishers can focus on their content and don't have to worry about Sphinx.
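
An illustrative sketch, assuming `nbformat` and a parsed list of notebook paths from `examples.yml` (the real hook lives in the docs build scripts):

```python
import nbformat

def mark_orphan(notebook_path: str) -> None:
    nb = nbformat.read(notebook_path, as_version=4)
    # "orphan: True" in the notebook metadata suppresses Sphinx's warning.
    nb.metadata["orphan"] = True
    nbformat.write(nb, notebook_path)

for path in notebook_paths_from_examples_yml:  # hypothetical iterable
    mark_orphan(path)
```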

---------

Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: Aydin Abiar <62435714+Aydin-ab@users.noreply.github.com>
Co-authored-by: Aydin Abiar <aydin@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… different processes (ray-project#59634)

**Problem:**

fixes ray-project#57803

When tracing is enabled, calling an actor method from a different
process than the one that created the actor fails with:
```
TypeError: got an unexpected keyword argument '_ray_trace_ctx'
```

This commonly occurs with Ray Serve, where:
- `serve start` creates the controller actor (process A)
- Dashboard calls `ray.get_actor()` to interact with it (process B)

## Repro
Simplest way to repro is to run the following
```bash
ray start --head --tracing-startup-hook="ray.util.tracing.setup_local_tmp_tracing:setup_tracing"
serve start
```

But here is a core specific repro script

`repro_actor_module.py`
```python
class MyActor:
    """A simple actor class that will be decorated dynamically."""
    
    def __init__(self):
        self.value = 0
    
    def my_method(self, x):
        """A simple method."""
        return x * 2
    
    def check_alive(self):
        """Health check method."""
        return True
    
    def increment(self, amount=1):
        """Method with a default parameter."""
        self.value += amount
        return self.value
```

`repro_tracing_issue.py`
```python
import multiprocessing
import subprocess
import sys


NAMESPACE = "test_ns"


def creator_process(ready_event, done_event):
    import ray
    from ray.util.tracing.tracing_helper import _is_tracing_enabled
    
    # Import the actor class from module (NOT decorated yet)
    from repro_actor_module import MyActor
    
    setup_tracing_path = "ray.util.tracing.setup_local_tmp_tracing:setup_tracing"
    ray.init(_tracing_startup_hook=setup_tracing_path, namespace=NAMESPACE)
    
    print(f"[CREATOR] Tracing enabled: {_is_tracing_enabled()}")
    
    # Dynamically decorate and create the test actor (like Serve does)
    MyActorRemote = ray.remote(
        name="my_test_actor",
        namespace=NAMESPACE,
        num_cpus=0,
        lifetime="detached",
    )(MyActor)
    
    actor = MyActorRemote.remote()
    
    # Print signatures from creator's handle
    print(f"[CREATOR] Signatures in handle from creation:")
    for method_name, sig in actor._ray_method_signatures.items():
        param_names = [p.name for p in sig]
        print(f"  {method_name}: {param_names}")
    
    my_method_sig = actor._ray_method_signatures.get("my_method", [])
    has_trace = "_ray_trace_ctx" in [p.name for p in my_method_sig]
    print(f"[CREATOR] my_method has _ray_trace_ctx: {has_trace}")
    
    # Verify the method works from creator
    result = ray.get(actor.my_method.remote(5))
    print(f"[CREATOR] Test call result: {result}")
    
    # Signal that actor is ready
    print("[CREATOR] Actor created, signaling getter...")
    sys.stdout.flush()
    ready_event.set()
    
    # Wait for getter to finish
    done_event.wait(timeout=30)
    print("[CREATOR] Getter finished, shutting down...")
    
    # Cleanup
    ray.kill(actor)
    ray.shutdown()


def getter_process(ready_event, done_event):
    import ray
    from ray.util.tracing.tracing_helper import _is_tracing_enabled
    
    # Wait for creator to signal ready
    print("[GETTER] Waiting for creator to set up actor...")
    if not ready_event.wait(timeout=30):
        print("[GETTER] Timeout waiting for creator!")
        done_event.set()
        return
    
    # Connect to the existing cluster (this will also enable tracing from GCS hook)
    ray.init(address="auto", namespace=NAMESPACE)
    
    print(f"\n[GETTER] Tracing enabled: {_is_tracing_enabled()}")
    
    # Get the actor by name - this will RELOAD the class fresh in this process
    # The class loaded here was NEVER processed by _inject_tracing_into_class
    actor = ray.get_actor("my_test_actor", namespace=NAMESPACE)
    
    # Print signatures from getter's handle
    print(f"[GETTER] Signatures in handle from get_actor():")
    for method_name, sig in actor._ray_method_signatures.items():
        param_names = [p.name for p in sig]
        print(f"  {method_name}: {param_names}")
    
    my_method_sig = actor._ray_method_signatures.get("my_method", [])
    has_trace = "_ray_trace_ctx" in [p.name for p in my_method_sig]
    print(f"[GETTER] my_method has _ray_trace_ctx: {has_trace}")
    
    # Try calling a method
    print(f"\n[GETTER] Attempting to call my_method.remote(5)...")
    sys.stdout.flush()
    try:
        result = ray.get(actor.my_method.remote(5))
        print(f"[GETTER] Method call SUCCEEDED! Result: {result}")
    except TypeError as e:
        print(f"[GETTER] Method call FAILED with TypeError: {e}")
    
    # Signal done
    done_event.set()
    ray.shutdown()


def main():
    # Stop any existing Ray cluster
    print("Stopping any existing Ray cluster...")
    subprocess.run(["ray", "stop", "--force"], capture_output=True)
    
    # Create synchronization events
    ready_event = multiprocessing.Event()
    done_event = multiprocessing.Event()
    
    # Start creator process
    creator = multiprocessing.Process(target=creator_process, args=(ready_event, done_event))
    creator.start()
    
    # Start getter process (will connect to existing cluster)
    getter = multiprocessing.Process(target=getter_process, args=(ready_event, done_event))
    getter.start()
    
    # Wait for both to complete
    getter.join(timeout=60)
    creator.join(timeout=10)
    
    # Clean up any hung processes
    if creator.is_alive():
        creator.terminate()
        creator.join(timeout=5)
    if getter.is_alive():
        getter.terminate()
        getter.join(timeout=5)
    
    # Cleanup Ray
    print("\nCleaning up...")
    subprocess.run(["ray", "stop", "--force"], capture_output=True)
    print("Done.")


if __name__ == "__main__":
    main()
```

<details>

<summary> output from master </summary>

```bash
❯ python repro_tracing_issue.py
Stopping any existing Ray cluster...
[GETTER] Waiting for creator to set up actor...
2025-12-24 07:05:02,215 INFO worker.py:1991 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
/home/ubuntu/ray/python/ray/_private/worker.py:2039: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
[CREATOR] Tracing enabled: True
[CREATOR] Signatures in handle from creation:
  __init__: ['_ray_trace_ctx']
  __ray_call__: ['fn', 'args', '_ray_trace_ctx', 'kwargs']
  __ray_ready__: ['_ray_trace_ctx']
  __ray_terminate__: ['_ray_trace_ctx']
  check_alive: ['_ray_trace_ctx']
  increment: ['amount', '_ray_trace_ctx']
  my_method: ['x', '_ray_trace_ctx']
[CREATOR] my_method has _ray_trace_ctx: True
[CREATOR] Test call result: 10
[CREATOR] Actor created, signaling getter...
2025-12-24 07:05:02,953 INFO worker.py:1811 -- Connecting to existing Ray cluster at address: 172.31.7.228:38871...
2025-12-24 07:05:02,984 INFO worker.py:1991 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265 
/home/ubuntu/ray/python/ray/_private/worker.py:2039: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(

[GETTER] Tracing enabled: True
[GETTER] Signatures in handle from get_actor():
  __init__: []
  __ray_call__: ['fn', 'args', '_ray_trace_ctx', 'kwargs']
  __ray_ready__: ['_ray_trace_ctx']
  __ray_terminate__: ['_ray_trace_ctx']
  check_alive: []
  increment: ['amount']
  my_method: ['x']
[GETTER] my_method has _ray_trace_ctx: False

[GETTER] Attempting to call my_method.remote(5)...
[GETTER] Method call FAILED with TypeError: got an unexpected keyword argument '_ray_trace_ctx'
[CREATOR] Getter finished, shutting down...

Cleaning up...
Done.
```

</details>

<details>

<summary>output from this PR</summary>

```bash
❯ python repro_tracing_issue.py
Stopping any existing Ray cluster...
[GETTER] Waiting for creator to set up actor...
2025-12-24 07:04:03,758 INFO worker.py:1991 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
/home/ubuntu/ray/python/ray/_private/worker.py:2039: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
[CREATOR] Tracing enabled: True
[CREATOR] Signatures in handle from creation:
  __init__: ['_ray_trace_ctx']
  __ray_call__: ['fn', 'args', '_ray_trace_ctx', 'kwargs']
  __ray_ready__: ['_ray_trace_ctx']
  __ray_terminate__: ['_ray_trace_ctx']
  check_alive: ['_ray_trace_ctx']
  increment: ['amount', '_ray_trace_ctx']
  my_method: ['x', '_ray_trace_ctx']
[CREATOR] my_method has _ray_trace_ctx: True
[CREATOR] Test call result: 10
[CREATOR] Actor created, signaling getter...
2025-12-24 07:04:04,476 INFO worker.py:1811 -- Connecting to existing Ray cluster at address: 172.31.7.228:37231...
2025-12-24 07:04:04,504 INFO worker.py:1991 -- Connected to Ray cluster. View the dashboard at 127.0.0.1:8265 
/home/ubuntu/ray/python/ray/_private/worker.py:2039: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(

[GETTER] Tracing enabled: True
[GETTER] Signatures in handle from get_actor():
  __init__: ['_ray_trace_ctx']
  __ray_call__: ['fn', 'args', '_ray_trace_ctx', 'kwargs']
  __ray_ready__: ['_ray_trace_ctx']
  __ray_terminate__: ['_ray_trace_ctx']
  check_alive: ['_ray_trace_ctx']
  increment: ['amount', '_ray_trace_ctx']
  my_method: ['x', '_ray_trace_ctx']
[GETTER] my_method has _ray_trace_ctx: True

[GETTER] Attempting to call my_method.remote(5)...
[GETTER] Method call SUCCEEDED! Result: 10
[CREATOR] Getter finished, shutting down...

Cleaning up...
Done.
```

</details>

**Root Cause:**

`_inject_tracing_into_class` sets `__signature__` (including
`_ray_trace_ctx`) on the method object during actor creation. However:

1. When the actor class is serialized (cloudpickle) and loaded in
another process, `__signature__` is **not preserved** on module-level
functions. See repro script at the end of PR description as proof
2. `_ActorClassMethodMetadata.create()` uses `inspect.unwrap()` which
follows the `__wrapped__` chain to the **deeply unwrapped original
method**
3. The original method's `__signature__` was lost during serialization β†’
signatures extracted **without** `_ray_trace_ctx`
4. When calling the method, `_tracing_actor_method_invocation` adds
`_ray_trace_ctx` to kwargs β†’ **signature validation fails**

**Fix:**

1. In `_inject_tracing_into_class`: Set `__signature__` on the **deeply
unwrapped** method (via `inspect.unwrap`) rather than the immediate
method. This ensures `_ActorClassMethodMetadata.create()` finds it after
unwrapping.

2. In `load_actor_class`: Call `_inject_tracing_into_class` after
loading to re-inject the lost `__signature__` attributes.
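
A minimal sketch of the injection described in step 1, using `inspect.unwrap` as described (simplified relative to `ray.util.tracing.tracing_helper`):

```python
import inspect

def inject_trace_ctx_signature(method) -> None:
    # Follow the __wrapped__ chain to the deeply unwrapped original, the
    # same object _ActorClassMethodMetadata.create() inspects.
    target = inspect.unwrap(method)
    sig = inspect.signature(target)
    params = list(sig.parameters.values()) + [
        inspect.Parameter(
            "_ray_trace_ctx", inspect.Parameter.KEYWORD_ONLY, default=None
        )
    ]
    target.__signature__ = sig.replace(parameters=params)
```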

**Testing:**
- Added reproduction script demonstrating cross-process actor method
calls with tracing
- All existing tracing tests pass
- Add a new test for serve with tracing

`repro_cloudpickle_signature.py`
```python
import inspect
import cloudpickle
import pickle
import multiprocessing


def check_signature_in_subprocess(pickled_func_bytes):
    func = pickle.loads(pickled_func_bytes)
    
    print(f"[SUBPROCESS] Unpickled function: {func}")
    print(f"[SUBPROCESS] Module: {func.__module__}")
    
    sig = getattr(func, '__signature__', None)
    if sig is not None:
        params = list(sig.parameters.keys())
        print(f"[SUBPROCESS] __signature__: {sig}")
        if '_ray_trace_ctx' in params:
            print(f"[SUBPROCESS] __signature__ WAS preserved")
            return True
        else:
            print(f"[SUBPROCESS] __signature__ NOT preserved (missing _ray_trace_ctx)")
            return False
    else:
        print(f"[SUBPROCESS] __signature__ NOT preserved (attribute missing)")
        return False


def main():
    from repro_actor_module import MyActor
    func = MyActor.my_method
    
    print(f"\n[MAIN] Function: {func}")
    print(f"[MAIN] Module: {func.__module__}")
    print(f"[MAIN] __signature__ before: {getattr(func, '__signature__', 'NOT SET')}")
    
    # Set a custom __signature__ with _ray_trace_ctx
    custom_sig = inspect.signature(func)
    new_params = list(custom_sig.parameters.values()) + [
        inspect.Parameter("_ray_trace_ctx", inspect.Parameter.KEYWORD_ONLY, default=None)
    ]
    func.__signature__ = custom_sig.replace(parameters=new_params)
    
    print(f"[MAIN] __signature__ after: {func.__signature__}")
    print(f"[MAIN] Parameters: {list(func.__signature__.parameters.keys())}")
    
    # Pickle
    print(f"\n[MAIN] Pickling with cloudpickle...")
    pickled = cloudpickle.dumps(func)
    
    # Test 1: Same process
    print(f"\n{'='*70}")
    print("TEST 1: Unpickle in SAME process")
    print(f"{'='*70}")
    same_func = pickle.loads(pickled)
    same_sig = getattr(same_func, '__signature__', None)
    if same_sig and '_ray_trace_ctx' in list(same_sig.parameters.keys()):
        print(f"Same process: __signature__ preserved")
    else:
        print(f"Same process: __signature__ NOT preserved")
    
    # Test 2: Different process
    print(f"\n{'='*70}")
    print("TEST 2: Unpickle in DIFFERENT process")
    print(f"{'='*70}")
    
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool(1) as pool:
        result = pool.apply(check_signature_in_subprocess, (pickled,))
    
    if result:
        print("__signature__ IS preserved (unexpected)")
    else:
        print("__signature__ is NOT preserved for functions from imported modules!")


if __name__ == "__main__":
    main()

```

---------

Signed-off-by: abrar <abrar@anyscale.com>
@gemini-code-assist

Note

The number of changes in this pull request is too large for Gemini Code Assist to generate a review.

@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label Jan 26, 2026