diff --git a/src/pages/docs/ai-transport/index.mdx b/src/pages/docs/ai-transport/index.mdx
index 90f130a268..4753186876 100644
--- a/src/pages/docs/ai-transport/index.mdx
+++ b/src/pages/docs/ai-transport/index.mdx
@@ -134,4 +134,13 @@ Take a look at some example code running in-browser of the sorts of features you
 
 ## Pricing
 
-// Todo
+AI Transport uses Ably's [usage-based billing model](/docs/platform/pricing) at your package rates. Your consumption costs depend on the number of messages inbound (published to Ably) and outbound (delivered to subscribers), and on how long channels and connections are active. [Contact Ably](https://ably.com/contact) to discuss options for Enterprise pricing and volume discounts.
+
+The cost of streaming token responses over Ably depends on:
+
+- The number of tokens in the LLM responses that you are streaming. For example, a simple support chatbot response is around 300 tokens, a coding session chat can be 2,000-3,000 tokens, and a deep reasoning response can exceed 50,000 tokens.
+- The rate at which your agent publishes tokens to Ably and the number of messages it uses to do so. Some LLMs emit every token as a single event, while others batch multiple tokens together. Similarly, your agent may publish tokens as they are received from the LLM, or perform its own processing and batching first.
+- The number of subscribers receiving the response.
+- The [token streaming pattern](/docs/ai-transport/features/token-streaming#token-streaming-patterns) you choose.
+
+For example, an AI support chatbot sending a response of 250 tokens at 70 tokens/s to a single client using the [message-per-response](/docs/ai-transport/features/token-streaming/message-per-response) pattern consumes 90 inbound messages, 90 outbound messages, and 90 persisted messages. See the [AI support chatbot pricing example](/docs/platform/pricing/examples/ai-chatbot) for a full breakdown of the costs in this scenario.
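The 90-message figure in the example above follows from the default 40ms append rollup window described in the linked pricing example: the stream lasts 250 / 70 ≈ 3.57 seconds, and appends are rolled up into one message per window. A minimal sketch of the arithmetic (function name is illustrative):

```python
import math

def messages_per_response(tokens: int, tokens_per_second: float, rollup_window_ms: float) -> int:
    """Messages consumed by one streamed response: append events are
    rolled up into one message per rollup window."""
    stream_seconds = tokens / tokens_per_second
    return math.ceil(stream_seconds / (rollup_window_ms / 1000))

# 250-token response at 70 tokens/s with the default 40ms rollup window
print(messages_per_response(250, 70, 40))  # 90 (inbound; same count outbound and persisted)
```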
diff --git a/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx b/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
new file mode 100644
index 0000000000..41d08bc5d4
--- /dev/null
+++ b/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
@@ -0,0 +1,57 @@
+---
+title: AI support chatbot
+meta_description: "Calculate AI Transport pricing for conversations with an AI chatbot. Example shows how using the message-per-response pattern and modifying the append rollup window can generate cost savings."
+meta_keywords: "chatbot, support chat, token streaming, token cost, AI Transport pricing, Ably AI Transport pricing, stream cost, Pub/Sub pricing, realtime data delivery, Ably Pub/Sub pricing"
+intro: "This example uses consumption-based pricing for an AI support chatbot use case, where a single agent is publishing tokens to users over AI Transport."
+---
+
+### Assumptions
+
+The scale and features used in this calculation.
+
+| Scale | Features |
+|-------|----------|
+| 4 user prompts to get to resolution | ✓ Message-per-response |
+| 250 tokens per LLM response | |
+| 70 appends per second from the agent | |
+| 3 minute average chat duration | |
+| 1 million chats | |
+
+### Cost summary
+
+The high-level cost breakdown for this scenario. Messages are billed both inbound (published to Ably) and outbound (delivered to subscribers). Creating the "Message updates and deletes" [channel rule](/docs/ai-transport/features/token-streaming/message-per-response#enable) automatically enables message persistence.
+
+| Item | Calculation | Cost |
+|------|-------------|------|
+| Messages | 1092M × $2.50/M | $2730.00 |
+| Connection minutes | 6M × $1.00/M | $6.00 |
+| Channel minutes | 3M × $1.00/M | $3.00 |
+| Package fee | | [See plans](/pricing) |
+| **Total** | | **~$2739.00/M chats** |
+
+### Message breakdown
+
+How the message cost breaks down. The message-per-response pattern includes [automatic rollup of append events](/docs/ai-transport/features/token-streaming/token-rate-limits#per-response) to reduce consumption costs and avoid rate limits.
+
+| Type | Calculation | Inbound | Outbound | Total messages | Cost |
+|------|-------------|---------|----------|----------------|------|
+| User prompts | 1M chats × 4 prompts | 4M | 4M | 8M | $20.00 |
+| Agent responses | 1M chats × 4 responses × 90 messages per response (250 append events rolled up at 40ms) | 360M | 360M | 720M | $1800.00 |
+| Persisted messages | Every inbound message is persisted | 364M | 0 | 364M | $910.00 |
+
+### Effect of append rollup
+
+The calculation above uses the default append rollup window of 40ms, chosen to control costs with minimal impact on responsiveness. For a text chatbot use case, you could increase the window to 200ms without noticeably impacting the user experience.
+
+| Rollup window | Inbound response messages | Total messages | Cost |
+|---------------|---------------------------|----------------|------|
+| 40ms | 360 per chat | 1092M | $2730.00/M chats |
+| 100ms | 144 per chat | 444M | $1110.00/M chats |
+| 200ms | 72 per chat | 228M | $570.00/M chats |
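The rollup table in this new page can be reproduced from the assumptions table. This is a sketch of the arithmetic only, using the example's figures (constant and function names are illustrative):

```python
import math

# Figures from the assumptions table in the pricing example.
PROMPTS = 4          # user prompts per chat
RESPONSES = 4        # agent responses per chat
TOKENS = 250         # tokens per LLM response
RATE = 70            # append events per second from the agent
PRICE_PER_M = 2.50   # $ per million messages

def totals(rollup_ms: int) -> tuple[int, int, float]:
    """Inbound response messages per chat, total messages per chat
    (== millions of messages per 1M chats), and cost per 1M chats."""
    per_response = math.ceil((TOKENS / RATE) / (rollup_ms / 1000))
    inbound_responses = RESPONSES * per_response    # rolled-up response messages per chat
    prompts = 2 * PROMPTS                           # prompt messages, inbound + outbound
    responses = 2 * inbound_responses               # response messages, inbound + outbound
    persisted = PROMPTS + inbound_responses         # every inbound message is persisted
    total = prompts + responses + persisted
    return inbound_responses, total, total * PRICE_PER_M

for ms in (40, 100, 200):
    per_chat, total_m, cost = totals(ms)
    print(f"{ms}ms: {per_chat} per chat, {total_m}M messages, ${cost:.2f}/M chats")
```

Running this prints the three rows of the rollup table: 360/1092M/$2730.00, 144/444M/$1110.00, and 72/228M/$570.00.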