From 847d6572a9e531f5f327472fefb333e4501493a8 Mon Sep 17 00:00:00 2001
From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com>
Date: Tue, 5 Nov 2024 18:37:45 +0000
Subject: [PATCH 1/5] Docs update (ee0512a)

Signed-off-by: Promptless Bot <hi@gopromptless.ai>
---
 doc/source/serve/advanced-guides/dev-workflow.md | 9 +++++++++
 doc/source/serve/http-guide.md                   | 9 +++++++++
 doc/source/serve/model_composition.md            | 8 +-------
 3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/doc/source/serve/advanced-guides/dev-workflow.md b/doc/source/serve/advanced-guides/dev-workflow.md
index de8b6745a400..81e8080c3118 100644
--- a/doc/source/serve/advanced-guides/dev-workflow.md
+++ b/doc/source/serve/advanced-guides/dev-workflow.md
@@ -79,6 +79,15 @@ After you're done testing, you can shut down Ray Serve by interrupting the `serv
 
 Note that rerunning `serve run` will redeploy all deployments. To prevent redeploying those deployments whose code hasn't changed, you can use `serve deploy`; see the [Production Guide](serve-in-production) for details.
 
+### Local Testing Mode
+
+Ray Serve now supports a local testing mode that allows you to run your deployments locally in a single process. This mode is useful for unit testing and debugging your application logic without the overhead of a full Ray cluster. To enable this mode, use the `_local_testing_mode` flag in the `serve.run` function:
+
+```python
+serve.run(app, _local_testing_mode=True)
+```
+
+This mode runs each deployment in a background thread and supports most of the same features as running on a full Ray cluster, with some limitations. For example, converting deployment responses to Ray object references is not supported in local testing mode.
 ## Testing on a remote cluster
 
 To test on a remote cluster, you'll use `serve run` again, but this time you'll pass in an `--address` argument to specify the address of the Ray cluster to connect to.  For remote clusters, this address has the form `ray://<head-node-ip-address>:10001`; see [Ray Client](ray-client-ref) for more information.
diff --git a/doc/source/serve/http-guide.md b/doc/source/serve/http-guide.md
index 054ac9ff2145..f1a210eaa643 100644
--- a/doc/source/serve/http-guide.md
+++ b/doc/source/serve/http-guide.md
@@ -19,6 +19,7 @@ Considering your use case, you can choose the right level of abstraction:
 
 (serve-http)=
 ## Calling Deployments via HTTP
+
 When you deploy a Serve application, the [ingress deployment](serve-key-concepts-ingress-deployment) (the one passed to `serve.run`) is exposed over HTTP.
 
 ```{literalinclude} doc_code/http_guide/http_guide.py
@@ -58,6 +59,14 @@ To prevent an async call from being interrupted by `asyncio.CancelledError`, use
 When the request is cancelled, a cancellation error is raised inside the `SnoringSleeper` deployment's `__call__()` method. However, the cancellation is not raised inside the `snore()` call, so `ZZZ` is printed even if the request is cancelled. Note that `asyncio.shield` cannot be used on a `DeploymentHandle` call to prevent the downstream handler from being cancelled. You need to explicitly handle the cancellation error in that handler as well.
 
 (serve-fastapi-http)=
+
+### Local Testing Mode
+
+Ray Serve now supports a local testing mode, which allows you to run deployments locally for faster development and testing. This mode can be enabled by setting the `_local_testing_mode` flag to `True` in `serve.run()`. This feature is particularly useful for unit testing application and model composition logic without deploying to a full cluster.
+
+To enable local testing mode, you can set the environment variable `RAY_SERVE_FORCE_LOCAL_TESTING_MODE=1` or pass `_local_testing_mode=True` directly to `serve.run()`. This mode runs user code for each deployment in a background thread, allowing for the use of debugging tools like PDB.
+
+Note that some features, such as converting `DeploymentResponses` to `ObjectRefs`, are not supported in local testing mode. If you encounter limitations, consider filing a feature request on GitHub.
 ## FastAPI HTTP Deployments
 
 If you want to define more complex HTTP handling logic, Serve integrates with [FastAPI](https://fastapi.tiangolo.com/). This allows you to define a Serve deployment using the {mod}`@serve.ingress <ray.serve.ingress>` decorator that wraps a FastAPI app with its full range of features. The most basic example of this is shown below, but for more details on all that FastAPI has to offer such as variable routes, automatic type validation, dependency injection (e.g., for database connections), and more, please check out [their documentation](https://fastapi.tiangolo.com/).
diff --git a/doc/source/serve/model_composition.md b/doc/source/serve/model_composition.md
index 28de84c558e6..fa98328af48e 100644
--- a/doc/source/serve/model_composition.md
+++ b/doc/source/serve/model_composition.md
@@ -113,12 +113,7 @@ Note how the response from the `Adder` handle passes directly to the `Multiplier
 
 ## Streaming DeploymentHandle calls
 
-You can also use `DeploymentHandles` to make streaming method calls that return multiple outputs.
-To make a streaming call, the method must be a generator and you must set `handle.options(stream=True)`.
-Then, the handle call returns a {mod}`DeploymentResponseGenerator <ray.serve.handle.DeploymentResponseGenerator>` instead of a unary `DeploymentResponse`.
-You can use `DeploymentResponseGenerators` as a sync or async generator, like in an `async for` code block.
-Similar to `DeploymentResponse.result()`, avoid using a `DeploymentResponseGenerator` as a sync generator within a deployment, as that blocks other requests from executing concurrently on that replica.
-Note that you can't pass `DeploymentResponseGenerators` to other handle calls.
+You can also use `DeploymentHandles` to make streaming method calls that return multiple outputs. To make a streaming call, the method must be a generator and you must set `handle.options(stream=True)`. Then, the handle call returns a {mod}`DeploymentResponseGenerator <ray.serve.handle.DeploymentResponseGenerator>` instead of a unary `DeploymentResponse`. You can use `DeploymentResponseGenerators` as a sync or async generator, like in an `async for` code block. Similar to `DeploymentResponse.result()`, avoid using a `DeploymentResponseGenerator` as a sync generator within a deployment, as that blocks other requests from executing concurrently on that replica. Note that you can't pass `DeploymentResponseGenerators` to other handle calls. If you have a use case requiring this feature, please file a feature request on GitHub.
 
 Example:
 
@@ -127,7 +122,6 @@ Example:
 :end-before: __streaming_example_end__
 :language: python
 ```
-
 ## Advanced: Pass a DeploymentResponse in a nested object [FULLY DEPRECATED]
 
 :::{warning}

From 17579cba9b793c2eccd2937d7a58e26ce22646db Mon Sep 17 00:00:00 2001
From: Frances Liu <francestfls@gmail.com>
Date: Tue, 5 Nov 2024 10:47:58 -0800
Subject: [PATCH 2/5] Edit doc changes to remove additional updates

Signed-off-by: Frances Liu <francestfls@gmail.com>
---
 doc/source/serve/advanced-guides/dev-workflow.md | 5 ++++-
 doc/source/serve/http-guide.md                   | 9 ---------
 doc/source/serve/model_composition.md            | 8 +++++++-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/doc/source/serve/advanced-guides/dev-workflow.md b/doc/source/serve/advanced-guides/dev-workflow.md
index 81e8080c3118..a909dca8566e 100644
--- a/doc/source/serve/advanced-guides/dev-workflow.md
+++ b/doc/source/serve/advanced-guides/dev-workflow.md
@@ -87,7 +87,10 @@ Ray Serve now supports a local testing mode that allows you to run your deployme
 serve.run(app, _local_testing_mode=True)
 ```
 
-This mode runs each deployment in a background thread and supports most of the same features as running on a full Ray cluster, with some limitations. For example, converting deployment responses to Ray object references is not supported in local testing mode.
+Alternatively, you can set the environment variable `RAY_SERVE_FORCE_LOCAL_TESTING_MODE=1`.
+
+This mode runs each deployment in a background thread and supports most of the same features as running on a full Ray cluster. Note that some features, such as converting `DeploymentResponses` to `ObjectRefs`, are not supported in local testing mode. If you encounter limitations, consider filing a feature request on GitHub.
+
 ## Testing on a remote cluster
 
 To test on a remote cluster, you'll use `serve run` again, but this time you'll pass in an `--address` argument to specify the address of the Ray cluster to connect to.  For remote clusters, this address has the form `ray://<head-node-ip-address>:10001`; see [Ray Client](ray-client-ref) for more information.
diff --git a/doc/source/serve/http-guide.md b/doc/source/serve/http-guide.md
index f1a210eaa643..054ac9ff2145 100644
--- a/doc/source/serve/http-guide.md
+++ b/doc/source/serve/http-guide.md
@@ -19,7 +19,6 @@ Considering your use case, you can choose the right level of abstraction:
 
 (serve-http)=
 ## Calling Deployments via HTTP
-
 When you deploy a Serve application, the [ingress deployment](serve-key-concepts-ingress-deployment) (the one passed to `serve.run`) is exposed over HTTP.
 
 ```{literalinclude} doc_code/http_guide/http_guide.py
@@ -59,14 +58,6 @@ To prevent an async call from being interrupted by `asyncio.CancelledError`, use
 When the request is cancelled, a cancellation error is raised inside the `SnoringSleeper` deployment's `__call__()` method. However, the cancellation is not raised inside the `snore()` call, so `ZZZ` is printed even if the request is cancelled. Note that `asyncio.shield` cannot be used on a `DeploymentHandle` call to prevent the downstream handler from being cancelled. You need to explicitly handle the cancellation error in that handler as well.
 
 (serve-fastapi-http)=
-
-### Local Testing Mode
-
-Ray Serve now supports a local testing mode, which allows you to run deployments locally for faster development and testing. This mode can be enabled by setting the `_local_testing_mode` flag to `True` in `serve.run()`. This feature is particularly useful for unit testing application and model composition logic without deploying to a full cluster.
-
-To enable local testing mode, you can set the environment variable `RAY_SERVE_FORCE_LOCAL_TESTING_MODE=1` or pass `_local_testing_mode=True` directly to `serve.run()`. This mode runs user code for each deployment in a background thread, allowing for the use of debugging tools like PDB.
-
-Note that some features, such as converting `DeploymentResponses` to `ObjectRefs`, are not supported in local testing mode. If you encounter limitations, consider filing a feature request on GitHub.
 ## FastAPI HTTP Deployments
 
 If you want to define more complex HTTP handling logic, Serve integrates with [FastAPI](https://fastapi.tiangolo.com/). This allows you to define a Serve deployment using the {mod}`@serve.ingress <ray.serve.ingress>` decorator that wraps a FastAPI app with its full range of features. The most basic example of this is shown below, but for more details on all that FastAPI has to offer such as variable routes, automatic type validation, dependency injection (e.g., for database connections), and more, please check out [their documentation](https://fastapi.tiangolo.com/).
diff --git a/doc/source/serve/model_composition.md b/doc/source/serve/model_composition.md
index fa98328af48e..28de84c558e6 100644
--- a/doc/source/serve/model_composition.md
+++ b/doc/source/serve/model_composition.md
@@ -113,7 +113,12 @@ Note how the response from the `Adder` handle passes directly to the `Multiplier
 
 ## Streaming DeploymentHandle calls
 
-You can also use `DeploymentHandles` to make streaming method calls that return multiple outputs. To make a streaming call, the method must be a generator and you must set `handle.options(stream=True)`. Then, the handle call returns a {mod}`DeploymentResponseGenerator <ray.serve.handle.DeploymentResponseGenerator>` instead of a unary `DeploymentResponse`. You can use `DeploymentResponseGenerators` as a sync or async generator, like in an `async for` code block. Similar to `DeploymentResponse.result()`, avoid using a `DeploymentResponseGenerator` as a sync generator within a deployment, as that blocks other requests from executing concurrently on that replica. Note that you can't pass `DeploymentResponseGenerators` to other handle calls. If you have a use case requiring this feature, please file a feature request on GitHub.
+You can also use `DeploymentHandles` to make streaming method calls that return multiple outputs.
+To make a streaming call, the method must be a generator and you must set `handle.options(stream=True)`.
+Then, the handle call returns a {mod}`DeploymentResponseGenerator <ray.serve.handle.DeploymentResponseGenerator>` instead of a unary `DeploymentResponse`.
+You can use `DeploymentResponseGenerators` as a sync or async generator, like in an `async for` code block.
+Similar to `DeploymentResponse.result()`, avoid using a `DeploymentResponseGenerator` as a sync generator within a deployment, as that blocks other requests from executing concurrently on that replica.
+Note that you can't pass `DeploymentResponseGenerators` to other handle calls.
 
 Example:
 
@@ -122,6 +127,7 @@ Example:
 :end-before: __streaming_example_end__
 :language: python
 ```
+
 ## Advanced: Pass a DeploymentResponse in a nested object [FULLY DEPRECATED]
 
 :::{warning}

From 9fcd41f03484c4ab24ae01b3ace6653d00f74b63 Mon Sep 17 00:00:00 2001
From: frances720 <francestfls@gmail.com>
Date: Wed, 6 Nov 2024 11:57:30 -0800
Subject: [PATCH 3/5] Apply suggestions from code review

Style changes

Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: frances720 <francestfls@gmail.com>
---
 doc/source/serve/advanced-guides/dev-workflow.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/source/serve/advanced-guides/dev-workflow.md b/doc/source/serve/advanced-guides/dev-workflow.md
index a909dca8566e..006771a947ba 100644
--- a/doc/source/serve/advanced-guides/dev-workflow.md
+++ b/doc/source/serve/advanced-guides/dev-workflow.md
@@ -77,11 +77,11 @@ After you're done testing, you can shut down Ray Serve by interrupting the `serv
 (ServeController pid=9865) INFO 2022-08-11 11:47:19,929 controller 9865 deployment_state.py:1257 - Removing 1 replicas from deployment 'HelloDeployment'.
 ```
 
-Note that rerunning `serve run` will redeploy all deployments. To prevent redeploying those deployments whose code hasn't changed, you can use `serve deploy`; see the [Production Guide](serve-in-production) for details.
+Note that rerunning `serve run` redeploys all deployments. To prevent redeploying the deployments whose code hasn't changed, you can use `serve deploy`; see the [Production Guide](serve-in-production) for details.
 
 ### Local Testing Mode
 
-Ray Serve now supports a local testing mode that allows you to run your deployments locally in a single process. This mode is useful for unit testing and debugging your application logic without the overhead of a full Ray cluster. To enable this mode, use the `_local_testing_mode` flag in the `serve.run` function:
+Ray Serve supports a local testing mode that allows you to run your deployments locally in a single process. This mode is useful for unit testing and debugging your application logic without the overhead of a full Ray cluster. To enable this mode, use the `_local_testing_mode` flag in the `serve.run` function:
 
 ```python
 serve.run(app, _local_testing_mode=True)
@@ -89,11 +89,11 @@ serve.run(app, _local_testing_mode=True)
 
 Alternatively, you can set the environment variable `RAY_SERVE_FORCE_LOCAL_TESTING_MODE=1`.
 
-This mode runs each deployment in a background thread and supports most of the same features as running on a full Ray cluster. Note that some features, such as converting `DeploymentResponses` to `ObjectRefs`, are not supported in local testing mode. If you encounter limitations, consider filing a feature request on GitHub.
+This mode runs each deployment in a background thread and supports most of the same features as running on a full Ray cluster. Note that some features, such as converting `DeploymentResponses` to `ObjectRefs`, aren't supported in local testing mode. If you encounter limitations, consider filing a feature request on GitHub.
 
 ## Testing on a remote cluster
 
-To test on a remote cluster, you'll use `serve run` again, but this time you'll pass in an `--address` argument to specify the address of the Ray cluster to connect to.  For remote clusters, this address has the form `ray://<head-node-ip-address>:10001`; see [Ray Client](ray-client-ref) for more information.
+To test on a remote cluster, use `serve run` again, but this time, pass in an `--address` argument to specify the address of the Ray cluster to connect to. For remote clusters, this address has the form `ray://<head-node-ip-address>:10001`; see [Ray Client](ray-client-ref) for more information.
 
 When making the transition from your local machine to a remote cluster, you'll need to make sure your cluster has a similar environment to your local machine--files, environment variables, and Python packages, for example.
 

From 94198caa1715e5f56771a170f5b6c131cf91447a Mon Sep 17 00:00:00 2001
From: Frances Liu <francestfls@gmail.com>
Date: Thu, 7 Nov 2024 09:53:13 -0800
Subject: [PATCH 4/5] Resolve comment changes

Signed-off-by: Frances Liu <francestfls@gmail.com>
---
 doc/source/serve/advanced-guides/dev-workflow.md | 10 +++++-----
 doc/source/serve/doc_code/local_dev.py           |  4 ++++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/doc/source/serve/advanced-guides/dev-workflow.md b/doc/source/serve/advanced-guides/dev-workflow.md
index 006771a947ba..5720b28165cb 100644
--- a/doc/source/serve/advanced-guides/dev-workflow.md
+++ b/doc/source/serve/advanced-guides/dev-workflow.md
@@ -83,13 +83,13 @@ Note that rerunning `serve run` redeploys all deployments. To prevent redeployin
 
 Ray Serve supports a local testing mode that allows you to run your deployments locally in a single process. This mode is useful for unit testing and debugging your application logic without the overhead of a full Ray cluster. To enable this mode, use the `_local_testing_mode` flag in the `serve.run` function:
 
-```python
-serve.run(app, _local_testing_mode=True)
+```{literalinclude} ../doc_code/local_dev.py
+:start-after: __local_dev_testing_start__
+:end-before: __local_dev_testing_end__
+:language: python
 ```
 
-Alternatively, you can set the environment variable `RAY_SERVE_FORCE_LOCAL_TESTING_MODE=1`.
-
-This mode runs each deployment in a background thread and supports most of the same features as running on a full Ray cluster. Note that some features, such as converting `DeploymentResponses` to `ObjectRefs`, aren't supported in local testing mode. If you encounter limitations, consider filing a feature request on GitHub.
+This mode runs each deployment in a background thread and supports most of the same features as running on a full Ray cluster. Note that some features, such as converting `DeploymentResponses` to `ObjectRefs`, aren't supported in local testing mode. If you encounter limitations, consider filing a feature request on GitHub.
 
 ## Testing on a remote cluster
 
diff --git a/doc/source/serve/doc_code/local_dev.py b/doc/source/serve/doc_code/local_dev.py
index f5ef7ead2795..269f819b6590 100644
--- a/doc/source/serve/doc_code/local_dev.py
+++ b/doc/source/serve/doc_code/local_dev.py
@@ -32,3 +32,7 @@ async def __call__(self, request: Request):
 response: DeploymentResponse = handle.say_hello_twice.remote(name="Ray")
 assert response.result() == "Hello, Ray! Hello, Ray!"
 # __local_dev_handle_end__
+
+# __local_dev_testing_start__
+serve.run(app, _local_testing_mode=True)
+# __local_dev_testing_end__

From 6011019234c1bc322224996644b6bc028c472d47 Mon Sep 17 00:00:00 2001
From: Frances Liu <francestfls@gmail.com>
Date: Thu, 7 Nov 2024 10:07:18 -0800
Subject: [PATCH 5/5] Add note block to indicate experimental

Signed-off-by: Frances Liu <francestfls@gmail.com>
---
 doc/source/serve/advanced-guides/dev-workflow.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/source/serve/advanced-guides/dev-workflow.md b/doc/source/serve/advanced-guides/dev-workflow.md
index 5720b28165cb..9785c7a68643 100644
--- a/doc/source/serve/advanced-guides/dev-workflow.md
+++ b/doc/source/serve/advanced-guides/dev-workflow.md
@@ -81,6 +81,10 @@ Note that rerunning `serve run` redeploys all deployments. To prevent redeployin
 
 ### Local Testing Mode
 
+:::{note}
+This is an experimental feature.
+:::
+
 Ray Serve supports a local testing mode that allows you to run your deployments locally in a single process. This mode is useful for unit testing and debugging your application logic without the overhead of a full Ray cluster. To enable this mode, use the `_local_testing_mode` flag in the `serve.run` function:
 
 ```{literalinclude} ../doc_code/local_dev.py