From 52a80a8db92c261a706b78bc63b8419bb5251e3c Mon Sep 17 00:00:00 2001 From: Fiona Corden Date: Thu, 15 Jan 2026 09:29:43 +0000 Subject: [PATCH 1/3] WIP - add pricing information to overview --- src/pages/docs/ai-transport/index.mdx | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/src/pages/docs/ai-transport/index.mdx b/src/pages/docs/ai-transport/index.mdx index 90f130a268..30c0870f30 100644 --- a/src/pages/docs/ai-transport/index.mdx +++ b/src/pages/docs/ai-transport/index.mdx @@ -134,4 +134,17 @@ Take a look at some example code running in-browser of the sorts of features you ## Pricing -// Todo +AI Transport uses Ably's [usage based billing model](/pricing) at your package rates. Your consumption costs will depend on the number of messages inbound (published to Ably) and outbound (delivered to subscribers), and how long channels or connections are active. [Contact Ably](/contact) to discuss options for Enterprise pricing and volume discounts. + +The cost of streaming token responses over Ably depends on: + +- the number of tokens in the LLM responses that you are streaming. For example, a simple support chatbot response is around 300 tokens, a code session chat can be 2000-3000 tokens and a deep reasoning response could be 50000+ tokens. +- the rate at which your agent publishes tokens to Ably and the number of messages it uses to do so. Some LLMs output every token as a single event, while others batch multiple tokens together. Similarly, your agent may publish tokens as they are received from the LLM or perform its own processing and batching first. +- the number of subscribers receiving the response +- the [token streaming pattern](/docs/ai-transport/features/token-streaming#token-streaming-patterns) you choose + + +- message-per-response Ably will automatically + +- message-per-token you are in control, you can turn on server side batching to group messages together in a batching interval. Higher batching interval increases latency but reduces total number of messages, lower batching interval delivers messages quickly. +[server-side batching](/docs/messages/batch#server-side) From 56a0dfc7baa4d2eeeba2f9332c68efd8ddf113dc Mon Sep 17 00:00:00 2001 From: Fiona Corden Date: Thu, 15 Jan 2026 16:03:00 +0000 Subject: [PATCH 2/3] WIP - pricing notes --- src/pages/docs/ai-transport/index.mdx | 14 ++++------- .../platform/pricing/examples/ai-chatbot.mdx | 23 +++++++++++++++++++ 2 files changed, 28 insertions(+), 9 deletions(-) create mode 100644 src/pages/docs/platform/pricing/examples/ai-chatbot.mdx diff --git a/src/pages/docs/ai-transport/index.mdx b/src/pages/docs/ai-transport/index.mdx index 30c0870f30..9e6106699f 100644 --- a/src/pages/docs/ai-transport/index.mdx +++ b/src/pages/docs/ai-transport/index.mdx @@ -134,17 +134,13 @@ Take a look at some example code running in-browser of the sorts of features you ## Pricing -AI Transport uses Ably's [usage based billing model](/pricing) at your package rates. Your consumption costs will depend on the number of messages inbound (published to Ably) and outbound (delivered to subscribers), and how long channels or connections are active. [Contact Ably](/contact) to discuss options for Enterprise pricing and volume discounts. +AI Transport uses Ably's [usage based billing model](/docs/platform/pricing) at your package rates. Your consumption costs will depend on the number of messages inbound (published to Ably) and outbound (delivered to subscribers), and how long channels or connections are active. 
[Contact Ably](https://ably.com/contact) to discuss options for Enterprise pricing and volume discounts.

 The cost of streaming token responses over Ably depends on:

-- the number of tokens in the LLM responses that you are streaming. For example, a simple support chatbot response is around 300 tokens, a code session chat can be 2000-3000 tokens and a deep reasoning response could be 50000+ tokens.
+- the number of tokens in the LLM responses that you are streaming. For example, a simple support chatbot response is around 300 tokens, a coding session chat can be 2,000-3,000 tokens, and a deep reasoning response could be over 50,000 tokens.
 - the rate at which your agent publishes tokens to Ably and the number of messages it uses to do so. Some LLMs output every token as a single event, while others batch multiple tokens together. Similarly, your agent may publish tokens as they are received from the LLM or perform its own processing and batching first.
-- the number of subscribers receiving the response
-- the [token streaming pattern](/docs/ai-transport/features/token-streaming#token-streaming-patterns) you choose
+- the number of subscribers receiving the response.
+- the [token streaming pattern](/docs/ai-transport/features/token-streaming#token-streaming-patterns) you choose.

-
-- message-per-response Ably will automatically
-
-- message-per-token you are in control, you can turn on server side batching to group messages together in a batching interval. Higher batching interval increases latency but reduces total number of messages, lower batching interval delivers messages quickly.
-[server-side batching](/docs/messages/batch#server-side)
+*** Link to worked example(s) ***
diff --git a/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx b/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
new file mode 100644
index 0000000000..5a433988fa
--- /dev/null
+++ b/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
@@ -0,0 +1,23 @@
+---
+title: AI support chatbot
+meta_description: "Calculate AI Transport pricing for conversations with an AI chatbot. Example shows how using the message-per-response pattern and modifying the append rollup window can generate cost savings."
+meta_keywords: "chatbot, support chat, token streaming, token cost, AI Transport pricing, Ably AI Transport pricing, stream cost, Pub/Sub pricing, realtime data delivery, Ably Pub/Sub pricing"
+intro: "This example uses consumption-based pricing for an AI support chatbot use case, where a single agent is publishing tokens to a user over AI Transport."
+---
+
+### Assumptions
+
+The scale and features used in this calculation.
+
+### Cost summary
+
+The high level cost breakdown for this scenario. Messages are billed for both inbound (published to Ably) and outbound (delivered to subscribers).
+
+### Effect
+
+
+- message-per-response Ably will automatically
+
+- message-per-token you are in control, you can turn on server side batching to group messages together in a batching interval. Higher batching interval increases latency but reduces total number of messages, lower batching interval delivers messages quickly.
+[server-side batching](/docs/messages/batch#server-side)
+

From 5d511d9da78055411f31bde5e9d44b394938afc7 Mon Sep 17 00:00:00 2001
From: Fiona Corden
Date: Thu, 15 Jan 2026 22:19:13 +0000
Subject: [PATCH 3/3] Add worked example for AI chatbot use case

---
 src/pages/docs/ai-transport/index.mdx         |  2 +-
 .../platform/pricing/examples/ai-chatbot.mdx  | 44 ++++++++++++++++++++++++++++++++++++------
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/src/pages/docs/ai-transport/index.mdx b/src/pages/docs/ai-transport/index.mdx
index 9e6106699f..4753186876 100644
--- a/src/pages/docs/ai-transport/index.mdx
+++ b/src/pages/docs/ai-transport/index.mdx
@@ -143,4 +143,4 @@ The cost of streaming token responses over Ably depends on:
 - the number of subscribers receiving the response.
 - the [token streaming pattern](/docs/ai-transport/features/token-streaming#token-streaming-patterns) you choose.

-*** Link to worked example(s) ***
+For example, an AI support chatbot sending a response of 250 tokens at 70 tokens/s to a single client using the [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) pattern, with the default 40ms append rollup window, would consume 90 inbound messages, 90 outbound messages, and 90 persisted messages. See the [AI support chatbot pricing example](/docs/platform/pricing/examples/ai-chatbot) for a full breakdown of the costs in this scenario.
diff --git a/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx b/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
index 5a433988fa..41d08bc5d4 100644
--- a/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
+++ b/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
@@ -9,15 +9,49 @@ intro: "This example uses consumption-based pricing for an AI support chatbot us

 ### Assumptions

 The scale and features used in this calculation.

+| Scale | Features |
+|-------|----------|
+| 4 user prompts to get to resolution | ✓ Message-per-response |
+| 250 tokens per LLM response | |
+| 70 appends per second from agent | |
+| 3-minute average chat duration | |
+| 1 million chats | |
+
 ### Cost summary

-The high level cost breakdown for this scenario. Messages are billed for both inbound (published to Ably) and outbound (delivered to subscribers).
+The high-level cost breakdown for this scenario. Messages are billed for both inbound (published to Ably) and outbound (delivered to subscribers). Creating the "Message updates and deletes" [channel rule](/docs/ai-transport/features/token-streaming/message-per-response#enable) will automatically enable message persistence.
+
+| Item | Calculation | Cost |
+|------|-------------|------|
+| Messages | 1,092M × $2.50/M | $2,730.00 |
+| Connection minutes | 6M × $1.00/M | $6.00 |
+| Channel minutes | 3M × $1.00/M | $3.00 |
+| Package fee | | [See plans](/docs/platform/pricing) |
+| **Total** | | **~$2,739.00/M chats** |
+
+### Message breakdown
+
+How the message cost breaks down. The message-per-response pattern includes [automatic rollup of append events](/docs/ai-transport/features/token-streaming/token-rate-limits#per-response) to reduce consumption costs and avoid rate limits.
+
+| Type | Calculation | Inbound | Outbound | Total messages | Cost |
+|------|-------------|---------|----------|----------------|------|
+| User prompts | 1M chats × 4 prompts | 4M | 4M | 8M | $20.00 |
+| Agent responses | 1M chats × 4 responses × 90 append messages per response | 360M | 360M | 720M | $1,800.00 |
+| Persisted messages | Every inbound message is persisted | 364M | 0 | 364M | $910.00 |
+
-### Effect
+### Effect of append rollup
+
+The calculation above uses the default append rollup window of 40ms, chosen to control costs with minimal impact on responsiveness. For a text chatbot use case, you could increase the window to 200ms without noticeably impacting the user experience.
+
+| Rollup window | Inbound response messages | Total messages | Cost |
+|---------------|---------------------------|----------------|------|
+| 40ms | 360 per chat | 1,092M | $2,730.00/M chats |
+| 100ms | 144 per chat | 444M | $1,110.00/M chats |
+| 200ms | 72 per chat | 228M | $570.00/M chats |

+
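The message counts in these tables follow directly from the response length, the append rate, and the rollup window. A minimal sketch of that arithmetic, assuming a steady append rate and one rolled-up message per rollup window (the function and variable names are illustrative, not part of any Ably SDK):

```typescript
// Sketch of the append rollup arithmetic used in the example above.
// Assumes one token per append, a steady append rate, and one rolled-up
// message per rollup window.
function messagesPerResponse(
  tokensPerResponse: number,
  tokensPerSecond: number,
  rollupWindowMs: number
): number {
  const responseMs = (tokensPerResponse / tokensPerSecond) * 1000;
  return Math.ceil(responseMs / rollupWindowMs);
}

// 250-token responses streamed at 70 tokens/s, as in the assumptions table.
for (const windowMs of [40, 100, 200]) {
  console.log(`${windowMs}ms window: ${messagesPerResponse(250, 70, windowMs)} messages per response`);
}
// => 40ms: 90, 100ms: 36, 200ms: 18 (matching 360, 144 and 72 per 4-response chat)
```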
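Extending the same sketch to the full scenario in the assumptions table (1 million chats, 4 prompts and 4 responses per chat, every published message delivered to one recipient and persisted, and the $2.50 per million messages rate from the cost summary) reproduces the message totals above; again this is illustrative arithmetic, not an Ably API:

```typescript
// Sketch of the message totals and cost per million chats.
const CHATS = 1_000_000;
const PROMPTS_PER_CHAT = 4;
const RESPONSES_PER_CHAT = 4;
const PRICE_PER_MILLION_MESSAGES = 2.5; // USD, example rate from the cost summary table

// Same helper as the previous sketch: appends rolled up per window.
function messagesPerResponse(tokens: number, tokensPerSecond: number, windowMs: number): number {
  return Math.ceil(((tokens / tokensPerSecond) * 1000) / windowMs);
}

function messageCost(windowMs: number): { totalMillions: number; usd: number } {
  const inbound =
    CHATS * PROMPTS_PER_CHAT +
    CHATS * RESPONSES_PER_CHAT * messagesPerResponse(250, 70, windowMs);
  const outbound = inbound;  // each published message delivered to one recipient
  const persisted = inbound; // persistence enabled by the channel rule
  const total = inbound + outbound + persisted;
  return { totalMillions: total / 1_000_000, usd: (total / 1_000_000) * PRICE_PER_MILLION_MESSAGES };
}

for (const windowMs of [40, 100, 200]) {
  const { totalMillions, usd } = messageCost(windowMs);
  console.log(`${windowMs}ms rollup: ${totalMillions}M messages ≈ $${usd.toFixed(2)} per 1M chats`);
}
// => 40ms: 1092M ≈ $2730.00, 100ms: 444M ≈ $1110.00, 200ms: 228M ≈ $570.00
```

Adding the $6.00 of connection minutes and $3.00 of channel minutes from the cost summary gives the ~$2,739.00 per million chats shown above; those charges do not change with the rollup window.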