From edd06c7a32fee9731835d35c7dec5170602ce483 Mon Sep 17 00:00:00 2001
From: Mike Christensen
Date: Thu, 15 Jan 2026 14:50:44 +0000
Subject: [PATCH] ait: advise disabling echo

For high-volume token streaming use cases, we generally advise
disabling `echoMessages` to avoid incurring additional cost from
echoed messages.
---
 .../features/messaging/accepting-user-input.mdx           | 8 ++++++++
 .../ai-transport/features/messaging/chain-of-thought.mdx  | 8 ++++++++
 .../docs/ai-transport/features/messaging/citations.mdx    | 4 ++++
 .../features/messaging/human-in-the-loop.mdx              | 8 ++++++++
 .../docs/ai-transport/features/messaging/tool-calls.mdx   | 8 ++++++++
 .../features/token-streaming/message-per-response.mdx     | 4 ++++
 .../features/token-streaming/message-per-token.mdx        | 4 ++++
 .../ai-transport/anthropic-message-per-response.mdx       | 9 ++++++++-
 .../guides/ai-transport/anthropic-message-per-token.mdx   | 9 ++++++++-
 .../guides/ai-transport/openai-message-per-response.mdx   | 9 ++++++++-
 .../guides/ai-transport/openai-message-per-token.mdx      | 9 ++++++++-
 11 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/src/pages/docs/ai-transport/features/messaging/accepting-user-input.mdx b/src/pages/docs/ai-transport/features/messaging/accepting-user-input.mdx
index 7442b12ab9..4ce509c3ca 100644
--- a/src/pages/docs/ai-transport/features/messaging/accepting-user-input.mdx
+++ b/src/pages/docs/ai-transport/features/messaging/accepting-user-input.mdx
@@ -109,6 +109,10 @@ await channel.publish('user-input', {
 ```
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ## Subscribe to user input
 
 The agent subscribes to a channel to receive messages from users. When a user publishes a message to the channel, the agent receives it through the subscription callback.
@@ -191,6 +195,10 @@ await channel.subscribe('agent-response', (message) => {
 ```
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ## Stream responses
 
 For longer AI responses, you'll typically want to stream tokens back to the user rather than waiting for the complete response. The `promptId` correlation allows users to associate streamed tokens with their original prompt.
diff --git a/src/pages/docs/ai-transport/features/messaging/chain-of-thought.mdx b/src/pages/docs/ai-transport/features/messaging/chain-of-thought.mdx
index e058e9e39f..35c6473ac9 100644
--- a/src/pages/docs/ai-transport/features/messaging/chain-of-thought.mdx
+++ b/src/pages/docs/ai-transport/features/messaging/chain-of-thought.mdx
@@ -82,6 +82,10 @@ for await (const event of stream) {
 
 To learn how to stream individual tokens as they are generated, see the [token streaming](/docs/ai-transport/features/token-streaming) documentation.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 #### Subscribing
 
 Subscribe to both reasoning and model output messages on the same channel.
@@ -204,6 +208,10 @@ for await (const event of stream) {
 
 To learn how to stream individual tokens as they are generated, see the [token streaming](/docs/ai-transport/features/token-streaming) documentation.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 #### Subscribing
 
 Subscribe to the main conversation channel to receive control messages and model output. Subscribe to the reasoning channel on demand, for example in response to a click event.
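The note this patch adds to each messaging page targets agents that publish and subscribe on the same channel: by default, Ably echoes a connection's own publishes back to its matching subscriptions, so the agent incurs additional cost from the echoed copies. The following is a minimal sketch of the suppression, not part of the patch; the channel name, event names, and environment variable are illustrative:

```javascript
import Ably from 'ably';

// With the default echoMessages: true, every 'agent-response' this client
// publishes would be delivered straight back to its own catch-all
// subscription below, adding to the message count the agent pays for.
const realtime = new Ably.Realtime({
  key: process.env.ABLY_API_KEY, // assumption: key supplied via env var
  echoMessages: false
});

// Illustrative channel name; the docs pages use their own examples.
const channel = realtime.channels.get('conversation:example');

// Catch-all subscription: the agent watches the whole conversation.
await channel.subscribe(async (message) => {
  if (message.name === 'user-input') {
    // Reply on the same channel; with echo disabled, this publish is not
    // delivered back to this connection.
    await channel.publish('agent-response', `received: ${message.data}`);
  }
});
```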
diff --git a/src/pages/docs/ai-transport/features/messaging/citations.mdx b/src/pages/docs/ai-transport/features/messaging/citations.mdx
index d44f85e614..50f162785d 100644
--- a/src/pages/docs/ai-transport/features/messaging/citations.mdx
+++ b/src/pages/docs/ai-transport/features/messaging/citations.mdx
@@ -139,6 +139,10 @@ await channel.annotations.publish(msgSerial, {
 
 When streaming response tokens using the [message-per-response](/docs/ai-transport/message-per-response) pattern, citations can be published while the response is still being streamed since the `serial` of the response message is known after the [initial message is published](/docs/ai-transport/features/token-streaming/message-per-response#publishing).
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
diff --git a/src/pages/docs/ai-transport/features/messaging/human-in-the-loop.mdx b/src/pages/docs/ai-transport/features/messaging/human-in-the-loop.mdx
index ba1d7ceec2..1c020871cf 100644
--- a/src/pages/docs/ai-transport/features/messaging/human-in-the-loop.mdx
+++ b/src/pages/docs/ai-transport/features/messaging/human-in-the-loop.mdx
@@ -52,6 +52,10 @@ async function requestHumanApproval(toolCall) {
 ```
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ## Review and decide
 
 Authorized humans subscribe to approval requests on the conversation channel and publish their decisions. The `requestId` correlates the response with the original request.
@@ -99,6 +103,10 @@ async function reject(requestId) {
 ```
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ## Process the decision
 
 The agent listens for human decisions and acts accordingly. When a response arrives, the agent retrieves the pending request using the `requestId`, verifies that the user is permitted to approve that specific action, and either executes the action or handles the rejection.
diff --git a/src/pages/docs/ai-transport/features/messaging/tool-calls.mdx b/src/pages/docs/ai-transport/features/messaging/tool-calls.mdx
index 36fc09784a..d01db9a499 100644
--- a/src/pages/docs/ai-transport/features/messaging/tool-calls.mdx
+++ b/src/pages/docs/ai-transport/features/messaging/tool-calls.mdx
@@ -94,6 +94,10 @@ Model APIs like OpenAI's [Responses API](https://platform.openai.com/docs/api-re
 
 To learn how to stream individual tokens as they are generated, see the [token streaming](/docs/ai-transport/features/token-streaming) documentation.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ## Subscribing to tool calls
 
 Subscribe to tool call and model output messages on the channel.
@@ -239,6 +243,10 @@ await channel.subscribe('tool_call', async (message) => {
 
 Client-side tools often require user permission to access device APIs. These permissions are managed by the device operating system, not the agent. Handle permission denials gracefully by publishing an error tool result so the AI can respond appropriately.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 The agent subscribes to tool results to continue processing. The `toolCallId` correlates the result back to the original request:
diff --git a/src/pages/docs/ai-transport/features/token-streaming/message-per-response.mdx b/src/pages/docs/ai-transport/features/token-streaming/message-per-response.mdx
index 5b4f09934b..60031382d1 100644
--- a/src/pages/docs/ai-transport/features/token-streaming/message-per-response.mdx
+++ b/src/pages/docs/ai-transport/features/token-streaming/message-per-response.mdx
@@ -94,6 +94,10 @@ for await (const event of stream) {
 
 Append only supports concatenating data of the same type as the original message. For example, if the initial message data is a string, all appended tokens must also be strings. If the initial message data is binary, all appended tokens must be binary.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 This pattern allows publishing append operations for multiple concurrent model responses on the same channel. As long as you append to the correct message serial, tokens from different responses will not interfere with each other, and the final concatenated message for each response will contain only the tokens from that response.
 
 ## Subscribing to token streams
diff --git a/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx b/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx
index 66232ff5a2..4a31f2fa38 100644
--- a/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx
+++ b/src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx
@@ -48,6 +48,10 @@ for await (const event of stream) {
 
 This approach maximizes throughput while maintaining ordering guarantees, allowing you to stream tokens as fast as your AI model generates them.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ## Streaming patterns
 
 Ably is a pub/sub messaging platform, so you can structure your messages however works best for your application. Below are common patterns for streaming tokens, each showing both agent-side publishing and client-side subscription. Choose the approach that fits your use case, or create your own variation.
diff --git a/src/pages/docs/guides/ai-transport/anthropic-message-per-response.mdx b/src/pages/docs/guides/ai-transport/anthropic-message-per-response.mdx
index 7cc7579371..fd5e3ab7e3 100644
--- a/src/pages/docs/guides/ai-transport/anthropic-message-per-response.mdx
+++ b/src/pages/docs/guides/ai-transport/anthropic-message-per-response.mdx
@@ -175,7 +175,10 @@ Add the Ably client initialization to your `publisher.mjs` file:
 import Ably from 'ably';
 
 // Initialize Ably Realtime client
-const realtime = new Ably.Realtime({ key: '{{API_KEY}}' });
+const realtime = new Ably.Realtime({
+  key: '{{API_KEY}}',
+  echoMessages: false
+});
 
 // Create a channel for publishing streamed AI responses
 const channel = realtime.channels.get('ai:{{RANDOM_CHANNEL_NAME}}');
@@ -184,6 +187,10 @@ const channel = realtime.channels.get('ai:{{RANDOM_CHANNEL_NAME}}');
 
 The Ably Realtime client maintains a persistent connection to the Ably service, which allows you to publish tokens at high message rates with low latency.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ### Publish initial message and append tokens
 
 When a new response begins, publish an initial message to create it. Ably assigns a [`serial`](/docs/messages#properties) identifier to the message. Use this `serial` to append each token to the message as it arrives from the Anthropic model.
diff --git a/src/pages/docs/guides/ai-transport/anthropic-message-per-token.mdx b/src/pages/docs/guides/ai-transport/anthropic-message-per-token.mdx
index 30f32d844e..73e8d86c76 100644
--- a/src/pages/docs/guides/ai-transport/anthropic-message-per-token.mdx
+++ b/src/pages/docs/guides/ai-transport/anthropic-message-per-token.mdx
@@ -152,7 +152,10 @@ Add the Ably client initialization to your `publisher.mjs` file:
 import Ably from 'ably';
 
 // Initialize Ably Realtime client
-const realtime = new Ably.Realtime({ key: '{{API_KEY}}' });
+const realtime = new Ably.Realtime({
+  key: '{{API_KEY}}',
+  echoMessages: false
+});
 
 // Create a channel for publishing streamed AI responses
 const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');
@@ -161,6 +164,10 @@ const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');
 
 The Ably Realtime client maintains a persistent connection to the Ably service, which allows you to publish tokens at high message rates with low latency.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ### Map Anthropic streaming events to Ably messages
 
 Choose how to map [Anthropic streaming events](#understand-streaming-events) to Ably messages. You can choose any mapping strategy that suits your application's needs. This guide uses the following pattern as an example:
diff --git a/src/pages/docs/guides/ai-transport/openai-message-per-response.mdx b/src/pages/docs/guides/ai-transport/openai-message-per-response.mdx
index 76e0ed6a2b..7ac44a9e02 100644
--- a/src/pages/docs/guides/ai-transport/openai-message-per-response.mdx
+++ b/src/pages/docs/guides/ai-transport/openai-message-per-response.mdx
@@ -189,7 +189,10 @@ Add the Ably client initialization to your `publisher.mjs` file:
 import Ably from 'ably';
 
 // Initialize Ably Realtime client
-const realtime = new Ably.Realtime({ key: '{{API_KEY}}' });
+const realtime = new Ably.Realtime({
+  key: '{{API_KEY}}',
+  echoMessages: false
+});
 
 // Create a channel for publishing streamed AI responses
 const channel = realtime.channels.get('ai:{{RANDOM_CHANNEL_NAME}}');
@@ -198,6 +201,10 @@ const channel = realtime.channels.get('ai:{{RANDOM_CHANNEL_NAME}}');
 
 The Ably Realtime client maintains a persistent connection to the Ably service, which allows you to publish tokens at high message rates with low latency.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ### Publish initial message and append tokens
 
 When a new response begins, publish an initial message to create it. Ably assigns a [`serial`](/docs/messages#properties) identifier to the message. Use this `serial` to append each token to the message as it arrives from the OpenAI model.
diff --git a/src/pages/docs/guides/ai-transport/openai-message-per-token.mdx b/src/pages/docs/guides/ai-transport/openai-message-per-token.mdx
index adc9fdfb87..d88b7f5991 100644
--- a/src/pages/docs/guides/ai-transport/openai-message-per-token.mdx
+++ b/src/pages/docs/guides/ai-transport/openai-message-per-token.mdx
@@ -166,7 +166,10 @@ Add the Ably client initialization to your `publisher.mjs` file:
 import Ably from 'ably';
 
 // Initialize Ably Realtime client
-const realtime = new Ably.Realtime({ key: '{{API_KEY}}' });
+const realtime = new Ably.Realtime({
+  key: '{{API_KEY}}',
+  echoMessages: false
+});
 
 // Create a channel for publishing streamed AI responses
 const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');
@@ -175,6 +178,10 @@ const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');
 
 The Ably Realtime client maintains a persistent connection to the Ably service, which allows you to publish tokens at high message rates with low latency.
 
+<Aside data-type='note'>
+For high-volume token streaming use cases, we generally advise disabling `echoMessages` to avoid incurring additional cost from echoed messages.
+</Aside>
+
 ### Map OpenAI streaming events to Ably messages
 
 Choose how to map [OpenAI streaming events](#understand-streaming-events) to Ably messages. You can choose any mapping strategy that suits your application's needs. This guide uses the following pattern as an example:
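For reference, the four guides converge on the same publisher shape. The following sketch, which is not part of the patch, condenses that shape end to end under stated assumptions: the `ably` npm package, an API key read from an environment variable, an illustrative channel name, and a stub async iterator standing in for the Anthropic or OpenAI streaming client:

```javascript
import Ably from 'ably';

// Stand-in for a model SDK's streaming iterator; yields a few fixed
// tokens so the sketch runs on its own.
async function* modelTokens() {
  for (const token of ['Hello', ', ', 'world', '!']) {
    yield token;
  }
}

// echoMessages: false stops Ably delivering this connection's own
// publishes back to it, which matters at token-stream message rates.
const realtime = new Ably.Realtime({
  key: process.env.ABLY_API_KEY, // assumption: key supplied via env var
  echoMessages: false
});

// Illustrative channel name; the guides use templated names.
const channel = realtime.channels.get('ai:example-response');

// Message-per-token: publish each token as its own message.
for await (const token of modelTokens()) {
  await channel.publish('token', token);
}

realtime.close();
```

Subscribers on other connections are unaffected by the setting; echo suppression applies only to the connection that sets it.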