From b0adc211fd1a146b9a9f32de829ae821d6693149 Mon Sep 17 00:00:00 2001 From: Pallavi-Janardhan Date: Fri, 26 Dec 2025 19:02:32 +0530 Subject: [PATCH 1/7] eventing_memory_quota_first_cut_for_my_reference --- .../install/pages/eventing-memory-quota.adoc | 157 ++++++++++++++++++ modules/install/pages/sizing-general.adoc | 9 +- 2 files changed, 164 insertions(+), 2 deletions(-) create mode 100644 modules/install/pages/eventing-memory-quota.adoc diff --git a/modules/install/pages/eventing-memory-quota.adoc b/modules/install/pages/eventing-memory-quota.adoc new file mode 100644 index 0000000000..2426b83ec8 --- /dev/null +++ b/modules/install/pages/eventing-memory-quota.adoc @@ -0,0 +1,157 @@ += Eventing Memory Quota + +:description: The Eventing Service memory quota does not enforce a hard memory limit on the Eventing subsystem, including producer and worker processes. + +This page explains how the memory quota actually works, how Eventing distributes it across workers, and its limitations in non-containerized environments. + +[abstract] +{description} + +== Overview + +The Couchbase Eventing Service memory quota does not enforce a hard memory limit on the entire Eventing subsystem, including producer and worker processes. +The service uses the quota for queue sizing and JavaScript (JS) runtime memory heap, not as an absolute memory cap. +In VM or bare-metal deployments, Eventing functions can exceed the configured quota at runtime. +Understanding this behavior helps you to appropriately size and monitor Eventing functions in production environments. + +== How Memory Quota Works +The Eventing Service memory quota controls specific aspects of memory management but does not act as an absolute ceiling on memory usage. +The Eventing Service memory quota controls specific aspects of memory management, such as queue sizing and garbage collection, but does not act as an absolute ceiling on memory usage. + +=== Minimum Size +The memory quota must be a minimum of 256 MB. +Couchbase does not support values lesser than 256 MB. + +=== Purpose of the Quota + +The memory quota serves 2 primary purposes: + +* `Producer-to-worker queue sizing`: controls the maximum size of each worker's input queue. +* `Garbage Collection (GC) triggering`: determines when Eventing invokes JavaScript environment (JSE) GC to reclaim memory. +The quota does not restrict the total memory consumed by JSE heaps or Eventing processes. + +== Per-Worker Distribution + +The Eventing Service divides the total memory quota uniformly across all workers in all deployed functions. + +=== Calculation +Eventing calculates the per-worker quota as follows: +`Per-Worker quota = Total memory quota ÷ Total number of workers` + +=== Example + +Consider the following configuration: + +* Total memory quota: 256 MB +* Total workers (across all deployed functions): 4 +* Per-worker budget: 64 MB (256 MB ÷ 4 workers) + +Each worker receives an equal share of the total quota, regardless of the actual memory usage of individual functions. + +=== Impact of Worker Count + +As you deploy more functions or add workers per function, the per-worker quota decreases proportionally. +A smaller per-worker quota can affect performance if individual functions require substantial memory for processing. + +== Queue Sizing and GC Behavior + +The per-worker quota directly influences 2 key aspects of Eventing runtime behavior. 
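As a conceptual illustration of that relationship, the short sketch below ties the uniform per-worker split to the two decisions described in the rest of this section.
It is not Couchbase code: the function names and the simple greater-or-equal checks are assumptions made for illustration, and the actual scheduler and garbage collector use more elaborate, throughput-oriented heuristics.

[source,python]
----
MB = 1024 * 1024

def per_worker_quota(total_quota_bytes, total_workers):
    # Uniform split of the configured Eventing memory quota across all deployed workers.
    return total_quota_bytes // total_workers

def should_throttle_dcp(queue_bytes, worker_quota_bytes):
    # Queue sizing: pause the DCP stream while a worker's input queue is at its cap.
    return queue_bytes >= worker_quota_bytes

def should_request_gc(heap_bytes, worker_quota_bytes):
    # GC triggering: ask the JS runtime to collect as heap usage nears the per-worker
    # budget. The real collector is throughput-oriented and may defer collection.
    return heap_bytes >= worker_quota_bytes

quota = per_worker_quota(256 * MB, 4)
print(quota // MB)  # 64, matching the example above
----
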
=== Producer-to-Worker Queue Sizing

Each worker has a bounded input queue that receives changes to documents from the Database Change Protocol (DCP) stream.

* The per-worker quota determines the maximum size of this queue.
* When the queue exceeds this size, Eventing throttles the DCP stream to prevent unbounded memory growth.
* Throttling temporarily pauses mutation flow, ensuring workers do not become overwhelmed when processing lags behind the mutation rate.

=== Garbage Collection (GC) Triggering

NOTE: When memory usage approaches the per-worker quota limit, the JavaScript runtime may trigger garbage collection to reclaim memory.
However, the garbage collector may delay the stop-the-world collection because it's optimized for throughput.
As a result, the runtime may not reclaim memory without delay, which can temporarily affect Eventing memory consumption.
The service invokes a stop-the-world JSE GC to reduce memory inside the JSE isolate running each function.

* GC frees memory occupied by unreachable objects that are no longer in use.
* GC reclaims memory only from objects that are no longer in use.
Memory held by live objects in the JSE heap remains allocated, so total usage may exceed the per-worker quota.

== Limitations and Considerations
The Eventing Service memory quota has important limitations that users must understand to prevent operational issues:

* Not a hard limit: The quota does not cap total memory usage.
Live objects in JSE heaps may persist, so memory usage can exceed the quota.
* No OS-level enforcement outside cgroups: In VM or bare-metal deployments, the operating system does not enforce memory limits.
* Worker sizing impacts memory: Configuring an excessive number of workers or running memory-intensive functions can increase memory pressure and affect node stability.
* Quota applies per worker, not per function: Each worker receives an equal share of the total quota, regardless of individual function memory requirements.

=== Not a Hard Limit

The Eventing Service memory quota does not restrict total memory usage.

JSE heaps may still contain live (non-garbage) objects that:
* GC cannot reclaim.
* The quota does not account for.
* Continue to consume memory beyond the configured limit.

As a result, setting a memory quota does not guarantee that Eventing functions remain within that limit.

=== No OS-Level Enforcement Outside cgroups

In non-containerized environments, the operating system does not enforce the memory quota.

* Users must monitor Eventing memory usage using system tools.
* Configuring an excessive number of workers can lead to memory exhaustion.
* High-memory-consuming functions can affect overall system stability.
* Eventing does not automatically stop or throttle execution when total memory exceeds the quota.

=== Memory Isolation

The memory quota does not isolate memory usage between Eventing functions.

* All workers draw from the same total memory quota.
* A memory-intensive function can reduce the memory available to other functions.
* Eventing does not support per-function memory limits or reservations.

== Best Practices
Follow these recommendations to manage Eventing memory usage effectively and avoid operational issues.

=== Monitor Memory Usage

* Use system-level monitoring tools to track actual memory consumption (a minimal sketch follows this list).
* Monitor memory usage for the Eventing process and individual workers.
* Configure alerts for memory thresholds well below system limits.
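The following sketch shows one way to track Eventing memory at the host level with Python and the `psutil` package.
It is an assumption-laden illustration, not a supported tool: the process-name filter and the alert threshold must be adapted to the Eventing processes and limits in your own deployment.

[source,python]
----
import psutil

# Assumptions for illustration only: adjust the name filter and threshold to your deployment.
EVENTING_NAME_HINT = "eventing"
ALERT_THRESHOLD_BYTES = 4 * 1024 ** 3

def eventing_rss_bytes():
    # Sum resident set size (RSS) across processes whose name matches the hint.
    total = 0
    for proc in psutil.process_iter(attrs=["name", "memory_info"], ad_value=None):
        name = (proc.info["name"] or "").lower()
        mem = proc.info["memory_info"]
        if EVENTING_NAME_HINT in name and mem is not None:
            total += mem.rss
    return total

usage = eventing_rss_bytes()
print(f"Eventing-related RSS: {usage / 1024 ** 2:.0f} MB")
if usage > ALERT_THRESHOLD_BYTES:
    print("Warning: usage exceeds the alert threshold")
----
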
+ +=== Configure Worker Count Appropriately + +* Consider the per-worker quota when determining the number of workers. +* Avoid over-provisioning workers when Eventing functions require substantial memory. +* Balance the worker count against available system memory. + +=== Test in Representative Environments + +* Test Eventing functions under production-like load conditions. +* Measure actual memory consumption during peak processing. +* Validate that memory usage remains within acceptable limits. + +=== Use cgroups for Enforcement + +* Consider containerized deployments or cgroup-based memory limits in production environments. +* Container platforms such as Docker and Kubernetes provide hard memory limits. +* cgroups enforce OS-level memory restrictions. +* This approach adds a safety layer beyond the Eventing Service memory quota. + +=== Size the Memory Quota Appropriately + +* Set the Eventing memory quota based on expected worker count and function complexity. +* Allow headroom above the minimum 256 MB for production workloads. +* Account for peak processing scenarios, not just average load. + +=== Review Function Memory Efficiency + +* Optimize JavaScript code to minimize memory allocation. +* Avoid accumulating large data structures in function scope. +* Release object references when data is no longer needed. +* Profile functions to identify memory-intensive operations. + +// Learn how to provide links to other doc pages and provide references to pages like Eventing Service Overview, Eventing Service Settings, and Eventing Function Examples. \ No newline at end of file diff --git a/modules/install/pages/sizing-general.adoc b/modules/install/pages/sizing-general.adoc index 1358382b0d..9fd4a521cf 100644 --- a/modules/install/pages/sizing-general.adoc +++ b/modules/install/pages/sizing-general.adoc @@ -420,9 +420,14 @@ Eventing also can perform I/O to external REST endpoints via a synchronous HTTP/ === RAM -In general, the Eventing memory quota of 256 MB is sufficient for almost all workloads. -When scaling up vertically by adding more workers (in the handler’s settings), you see a stall in processing when the number exceeds 48 workers. In this case, the memory quota can be increased to 384 MB or 512 MB. Do not add memory to the Eventing Service’s memory quota without a justified need as it can create resource issues. +The Eventing memory quota is discussed in detail in the xref:manage/manage-eventing/eventing-memory-quota.adoc[Eventing Memory Quota] doc. 
+Refer this document to learn: +* What is Eventing Service memory quota +* How the Eventing Service memory quota works +* The distribution of the total memory quota across workers +* The limitations and considerations +* Best practices and recommendations === Eventing Storage Collection (previously Metadata Bucket) From 42f98a48caf5e410c16f583f8f12938a0d57b7d3 Mon Sep 17 00:00:00 2001 From: Pallavi-Janardhan Date: Fri, 26 Dec 2025 19:13:09 +0530 Subject: [PATCH 2/7] eventing_memory_quota_first_cut_for_my_reference --- modules/install/pages/sizing-general.adoc | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/modules/install/pages/sizing-general.adoc b/modules/install/pages/sizing-general.adoc index 9fd4a521cf..a080717010 100644 --- a/modules/install/pages/sizing-general.adoc +++ b/modules/install/pages/sizing-general.adoc @@ -420,14 +420,14 @@ Eventing also can perform I/O to external REST endpoints via a synchronous HTTP/ === RAM - -The Eventing memory quota is discussed in detail in the xref:manage/manage-eventing/eventing-memory-quota.adoc[Eventing Memory Quota] doc. -Refer this document to learn: -* What is Eventing Service memory quota -* How the Eventing Service memory quota works -* The distribution of the total memory quota across workers -* The limitations and considerations -* Best practices and recommendations +The xref:eventing-memory-quota.adoc[Eventing Memory Quota] doc discusses the Eventing Service memory quota in detail. + +Refer this document to learn about: +. Eventing Service memory quota +. How Eventing Service memory quota works +. The distribution of the total memory quota across workers +. The limitations and considerations +. Best practices and recommendations === Eventing Storage Collection (previously Metadata Bucket) From 06328cab1f8fe60b00f56bfeb7cea018e5688e6b Mon Sep 17 00:00:00 2001 From: Pallavi-Janardhan Date: Mon, 29 Dec 2025 15:01:28 +0530 Subject: [PATCH 3/7] eventing_memory_quota_first_cut_for_my_reference_1 --- .../install/pages/eventing-memory-quota.adoc | 21 ++++++++++++++----- modules/install/pages/sizing-general.adoc | 18 +++++++++------- 2 files changed, 26 insertions(+), 13 deletions(-) diff --git a/modules/install/pages/eventing-memory-quota.adoc b/modules/install/pages/eventing-memory-quota.adoc index 2426b83ec8..4c5b54840d 100644 --- a/modules/install/pages/eventing-memory-quota.adoc +++ b/modules/install/pages/eventing-memory-quota.adoc @@ -15,10 +15,14 @@ In VM or bare-metal deployments, Eventing functions can exceed the configured qu Understanding this behavior helps you to appropriately size and monitor Eventing functions in production environments. == How Memory Quota Works + The Eventing Service memory quota controls specific aspects of memory management but does not act as an absolute ceiling on memory usage. -The Eventing Service memory quota controls specific aspects of memory management, such as queue sizing and garbage collection, but does not act as an absolute ceiling on memory usage. + +The Eventing Service memory quota controls specific aspects of memory management, such as queue sizing and garbage collection. +However, it does not act as an absolute ceiling on memory usage. === Minimum Size + The memory quota must be a minimum of 256 MB. Couchbase does not support values lesser than 256 MB. @@ -26,8 +30,8 @@ Couchbase does not support values lesser than 256 MB. 
The memory quota serves 2 primary purposes:

-* `Producer-to-worker queue sizing`: controls the maximum size of each worker's input queue.
-* `Garbage Collection (GC) triggering`: determines when Eventing invokes JavaScript environment (JSE) GC to reclaim memory.
+* **Producer-to-worker queue sizing**: controls the maximum size of each worker's input queue.
+* **Garbage Collection (GC) triggering**: determines when Eventing invokes JavaScript environment (JSE) GC to reclaim memory.
The quota does not restrict the total memory consumed by JSE heaps or Eventing processes.

== Per-Worker Distribution
@@ -36,7 +40,10 @@ The Eventing Service divides the total memory quota uniformly across all workers

=== Calculation
Eventing calculates the per-worker quota as follows:
-`Per-Worker quota = Total memory quota ÷ Total number of workers`
++
+....
+Per-Worker quota = Total memory quota ÷ Total number of workers
+....

=== Example

@@ -154,4 +161,8 @@ Follow these recommendations to effectively manage Eventing memory usage effecti
* Release object references when data is no longer needed.
* Profile functions to identify memory-intensive operations.

-// Learn how to provide links to other doc pages and provide references to pages like Eventing Service Overview, Eventing Service Settings, and Eventing Function Examples.
\ No newline at end of file
+// Learn how to provide links to other doc pages and provide references to pages like Eventing Service Overview, Eventing Service Settings, and Eventing Function Examples.
+// == Related Links
+// * xref:eventing-service-overview.adoc[Eventing Service Overview]
+// * xref:eventing-service-settings.adoc[Eventing Service Settings]
+// *
\ No newline at end of file
diff --git a/modules/install/pages/sizing-general.adoc b/modules/install/pages/sizing-general.adoc
index a080717010..dc5a15d7f2 100644
--- a/modules/install/pages/sizing-general.adoc
+++ b/modules/install/pages/sizing-general.adoc
@@ -234,14 +237,16 @@ You should allocate additional CPU cores for these workloads.

A node running the Index Service must be sized properly to create and maintain secondary indexes and to perform index scan for {sqlpp} queries.
Similar to the nodes that run the Data Service, answer the following questions to take care of your application needs:
-
-. What is the length oƒ the document key?
-. Which fields need to be indexed?
-. Will you be using simple or compound indexes?
-. What is the minimum, maximum, or average value size of the index field?
-. How many indexes do you need?
-. How many documents need to be indexed?
-. What is the working set percentage of index required memory?
++
+--
+* What is the length of the document key?
+* Which fields need to be indexed?
+* Will you be using simple or compound indexes?
+* What is the minimum, maximum, or average value size of the index field?
+* How many indexes do you need?
+* How many documents need to be indexed?
+* What is the working set percentage of index required memory?
+--

Answers to these questions can help you better understand the capacity requirement of your cluster and provide a better estimation for sizing.
From 3b4cd9be3c1abc9fd882d01732abe91c256d65dd Mon Sep 17 00:00:00 2001 From: Pallavi-Janardhan Date: Mon, 29 Dec 2025 19:41:21 +0530 Subject: [PATCH 4/7] eventing_memory_quota_first_cut_for_my_reference_2_sizing_general --- modules/install/pages/sizing-general.adoc | 155 +++++++++++++--------- 1 file changed, 90 insertions(+), 65 deletions(-) diff --git a/modules/install/pages/sizing-general.adoc b/modules/install/pages/sizing-general.adoc index dc5a15d7f2..82fc1121d0 100644 --- a/modules/install/pages/sizing-general.adoc +++ b/modules/install/pages/sizing-general.adoc @@ -4,7 +4,7 @@ [abstract] {description} -When you plan to deploy a Couchbase Server cluster, the most common and important question that comes up is: how many nodes do I need and what size do they need to be? +The most common and important question that comes up when you plan to deploy a Couchbase Server cluster is: how many nodes do I need and what size do they need to be? With the increasing number of Couchbase services and the flexibility of the Couchbase Data Platform, the answer to this question can be challenging. This guide aims to help you better size your deployment. @@ -23,27 +23,29 @@ There needs to be enough capacity in all areas to support everything the system === Multi-Dimensional Scaling -Couchbase Services are what allow you to access and maintain your data. -These services can be deployed, maintained, and provisioned independently of one another. -This independent service model allows you to take advantage of _Multi-Dimensional Scaling_, whereby a cluster can be fine-tuned for optimal handling of emergent workload-requirements, on a service-by-service basis. +Couchbase Services are what allow you to use and maintain your data. +Deploy, maintain, and provision these services independently of each other. +This independent service model allows you to take advantage of `Multi-Dimensional Scaling`, whereby a cluster can be fine-tuned for optimal handling of emergent workload-requirements, on a service-by-service basis. -Since each service has different demands on hardware resources, Multi-Dimensional Scaling plays an important role when sizing your Couchbase cluster, both pre and post-deployment. -For example, core Data Service operations can often benefit greatly from the _scale out_ of smaller commodity nodes, whereas low latency operations with the Query Service may see a greater benefit from the _scale up_ of hardware resources on a given node. +Every service has different demands on hardware resources. +Multi-Dimensional Scaling plays an important role when sizing your Couchbase cluster, both pre and post-deployment. +For example, core Data Service operations can often benefit greatly from the `scale out` of smaller commodity nodes, whereas low latency operations with the Query Service may see a greater benefit from the `scale up` of hardware resources on a given node. -For more information about the nature and resource demands of each Couchbase Service, refer to xref:learn:services-and-indexes/services/services.adoc[Services]. +For more information about the nature and resource demands of each Couchbase Service, see xref:learn:services-and-indexes/services/services.adoc[Services]. == About Couchbase Server Resources This guide discusses four types of resources that you should consider when sizing a Couchbase Server cluster node: CPU:: -CPU refers to the number of cores and clock speed that are required to run your workload. +CPU specifies the number of cores and clock speed required to run your workload. 
RAM:: -RAM is frequently one of the most crucial areas to size correctly. -Cached documents allow the reads to be served at low latency and consistently high throughput. +RAM is often the most crucial areas to size accurately. +Cached documents provide low-latency reads and consistently high throughput. + -This resource represents the main memory that you allocate to Couchbase Server and must be determined by the following factors: +This resource represents the main memory you allocate to Couchbase Server. +Determine the allocation based on the following factors: + -- * How much free RAM is available beyond OS and other applications @@ -75,7 +77,7 @@ Some components that require RAM are: | 256 MB minimum; 2048 MB and above recommended | Query Service -| No RAM-allocation is required for this service. +| This service does not require any RAM allocation. | Eventing Service | 256 MB @@ -88,8 +90,8 @@ Storage (disk space):: Requirements for your disk subsystem are: + -- -* [.term]_Disk size_ — Refers to the amount of the disk storage space that is needed to hold your entire data set. -* [.term]_Disk I/O_ — Is a combination of your sustained read/write rate, the need for compacting the database files, and anything else that requires disk access. +* [.term]`Disk size` — Specifies the disk storage space needed to hold your entire dataset. +* [.term]`Disk I/O` — Combines your sustained read/write rate, database file compaction, and any other operations that requires disk access. -- + To better support Couchbase Server, keep in mind the following: @@ -97,9 +99,9 @@ To better support Couchbase Server, keep in mind the following: -- * Disk space continues to grow if fragmentation ratio keeps climbing. To mitigate this, add enough buffer in your disk space to store all of the data. -Also, keep an eye on the fragmentation ratio in the Couchbase user interfaces and trigger compaction processes when needed. -* Solid State Drives (SSDs) are desired, but not required. -An SSD will give much better performance than a Hard Disk Drive (HDD) when it comes to disk throughput and latency. +Monitor the fragmentation ratio in the Couchbase user interfaces and trigger compaction processes when needed. +* Prefer Solid State Drives(SSD), but is not required. +An SSD gives much better performance than a Hard Disk Drive(HDD) when it comes to disk throughput and latency. -- Network:: @@ -110,19 +112,20 @@ Most deployments can achieve optimal performance with 1 Gbps interconnects, but == Sizing Data Service Nodes -Data Service nodes handle data service operations, such as create/read/update/delete (CRUD). -The sizing information provided below applies both to the _Couchstore_ and _Magma_ storage engines: however, the _differences_ between these storage engines should also be reviewed, before sizing is attempted. +Data Service nodes handle data service operations, such as create/read/update/delete(CRUD). +The sizing information provided below applies to both the `Couchstore` and `Magma` storage engines. +Review the differences between these storage engines before attempting sizing. For information, see xref:learn:buckets-memory-and-storage/storage-engines.adoc[Storage Engines]. It's important to keep use-cases and application workloads in mind since different application workloads have different resource requirements. -For example, if your working set needs to be fully in memory, you might need large RAM size. 
-On the other hand, if your application requires only 10% of data in memory, you will need disks with enough space to store all of the data, and that are fast enough for your read/write operations. +For example, if your working set needs to be fully in-memory, you might need large RAM size. +On the other hand, if your application requires only 10% of data in-memory, you'll need disks with enough space to store all of the data, and that are fast enough for your read/write operations. You can start sizing the Data Service nodes by answering the following questions: -. Is the application primarily (or even exclusively) using individual document access? -. Do you plan to use XDCR? -. What’s your working set size and what are your data operation throughput and latency requirements? +* Is the application primarily using individual document access? +* Do you plan to use XDCR? +* What's your working set size and what are your data operation throughput and latency requirements? Answers to the above questions can help you better understand the capacity requirement of your cluster and provide a better estimation for sizing. @@ -214,7 +217,7 @@ Based on the above formula, these are the suggested sizing guidelines: | = (312,000,000 + 4,000,000,000) * (1+0.25)/(0.85) = 6,341,176,470 bytes |=== -This tells you that the RAM requirement for the whole cluster is 7 GB. +This tells you that the RAM requirement for the whole cluster is 7{nbsp}GB. NOTE: This amount is in addition to the RAM requirements for the operating system and any other software that runs on the cluster nodes. @@ -227,11 +230,11 @@ When sizing, you must account for raw CPU overhead when using a high number of b This overhead does not account for any front-end workloads. You should allocate additional CPU cores for these workloads. -* xref:manage:monitor/monitor-intro.adoc[Monitoring] is recommended for CPU usage and System Limits. +* Refer xref:manage:monitor/monitor-intro.adoc[Monitoring] to monitor CPU usage and System Limits. == Sizing Index Service Nodes -A node running the Index Service must be sized properly to create and maintain secondary indexes and to perform index scan for {sqlpp} queries. +Size a node running the Index Service properly to create and maintain secondary indexes and perform index scans for {sqlpp} queries. Similar to the nodes that run the Data Service, answer the following questions to take care of your application needs: + @@ -249,7 +252,7 @@ Answers to these questions can help you better understand the capacity requireme *The following is an example use-case for sizing RAM for the Index service:* -The following sizing guide can be used to compute the memory requirement for each individual index and can be used to determine the total RAM quota required for the Index service. +Use the following sizing guide to compute the memory requirement for each individual index and to determine the total RAM quota required for the Index service. .Input Variables for Sizing RAM |=== @@ -334,7 +337,8 @@ Based on the above formula, these are the suggested sizing guidelines: | (3200000000) * (1 + 0.25) * 0.2 = 800000000 bytes |=== -The above example shows the memory requirement of a secondary index with 10M index entries, each with 50 bytes size of secondary key and 30 bytes size of documentID. The memory usage requirements are 2.5GB(Nitro, 100% resident), 1GB(plasma, 20% resident), 800MB(Forestdb, 20% resident). 
+The above example shows the memory requirement of a secondary index with 10M index entries, each with 50 bytes size of secondary key and 30 bytes size of documentID.
+The memory usage requirements are 2.5{nbsp}GB (Nitro, 100% resident), 1{nbsp}GB (plasma, 20% resident), 800{nbsp}MB (ForestDB, 20% resident).

NOTE: The storage engine used in the sizing calculation corresponds to the storage mode chosen for Index Service as explained in the table below.

@@ -356,22 +360,22 @@ NOTE: The storage engine used in the sizing calculation corresponds to the stora

A node that runs the Query Service executes queries for your application needs.

-Since the Query Service doesn’t need to persist data to disk, there are very minimal resource requirements for disk space and disk I/O.
+Since the Query Service doesn’t need to persist data to disk, there are minimal resource requirements for disk space and disk I/O.
You only need to consider CPU and memory.

-There are a few questions that will help size the cluster:
+Questions that help in sizing the cluster:

-. What types of queries do you need to run?
-. Do you need to run `stale=ok` or `stale=false` queries?
-. Are the queries simple or complex (requiring JOINs, for example)?
-. What are the throughput and latency requirements for your queries?
+* What types of queries do you need to run?
+* Do you need to run `stale=ok` or `stale=false` queries?
+* Are the queries simple or complex (requiring JOINs, for example)?
+* What are the throughput and latency requirements for your queries?

Different queries have different resource requirements.
A simple query might return results within milliseconds while a complex query may require several seconds.

-The number of queries that may be processed simultaneously may be approximated with the formula _CPU_cores * 4_.
-The maximum queue-length for queries may be approximated with the formula _CPU_cores * 256_.
-If either limit is reached, additional queries are rejected with a `503` error.
+The formula used to calculate the number of queries that's processed simultaneously is `CPU_cores * 4`.
+The formula used to calculate the maximum queue-length for queries is`CPU_cores * 256`.
+The system rejects additional queries with a 503 error once it reaches the limits.

== Sizing Analytics Service Nodes

@@ -381,67 +385,86 @@ The Analytics Service is dependent on the Data Service and requires the Data ser

=== Data space

-* Ensure that the data space for Analytics node takes into account metadata replicas. The Analytics Service currently only replicates metadata and not the actual data. There is a small overhead for metadata replicas as metadata is usually small.
+* Make sure that the data space for the Analytics node takes into account metadata replicas.
+The Analytics Service only replicates the metadata and not the actual data.
+There's a small overhead for metadata replicas as metadata is generally small.

-* When evaluating a query, the Analytics engine uses temporary disk space. The type of query being executed can impact the amount of temporary disk space required. For example, a query with heavy JOINs, aggregates, windowing, or more predicates will require more temporary disk space. Typically, the temporary disk space can be 2x the data space.
+* When evaluating a query, the Analytics engine uses temporary disk space.
+The query type determines the required amount of temporary disk space.
+For example, queries with heavy JOINs, aggregates, windowing, or additional predicates require more temporary disk space.
+Typically, the temporary disk space can be 2x the data space.

* The percent of data shadowed, which is dependent on your use case.

-* When ingesting data from the the Data Service into the Analytics Service a filter can be provided that reduces the size of the data that is ingested and also the storage size for the Analytics Service proportionally.
+* When you load data from the Data Service into the Analytics Service, you can apply a filter to reduce both the loaded data size and the Analytics Service storage requirements proportionally.

=== Disk types and partitioning

-During query execution, Analytics’s query engine attempts to concurrently read and process data from all data partitions. Because of that, the Input/Output Operations per Second (IOPS) of the actual physical disk in which each data partition resides plays a major role in determining the query execution time.
-Modern storage devices such as SSDs have much higher IOPS and can deal better with concurrent reads than HDDs. Therefore, having a single data partition on devices with high IOPS will not fully utilize their capabilities.
+During query execution, the Analytics query engine concurrently reads and processes data from all partitions.
+The Input/Output Operations per Second (IOPS) of the physical disk that hosts the data partitions plays a major role in determining the query execution time.
+Modern storage devices such as SSDs have much higher IOPS and can deal better with concurrent reads than HDDs.
+Therefore, a single data partition under-utilizes high-IOPS devices.

-To simplify the setup of the typical case of a node having a single modern storage device, the Analytics service automatically creates multiple data partitions within the same storage device if and only if a single “Analytics Disk Path” is specified during the node initialization. The number of automatically created data partitions is based on this formula:
+To simplify setup for nodes with a single modern storage device, the Analytics Service creates multiple data partitions on the same storage device.
+It does this only when you specify a single Analytics disk path during node initialization.
+The service determines the number of partitions using the following formula:

* `Maximum partitions to create = Min((Analytics Memory in MB / 1024), 16)`
* `Actual created partitions = Min(node virtual cores, Maximum partitions to create)`

-For example, if a node has 8 virtual cores and the Analytics service was configured with memory >= 8GB, 8 data partitions will be created on that node.
-Similarly, if a node has 32 virtual cores and was configured with memory >= 16GB, only 16 partitions will be created as 16 is the upper bound for automatic partitioning.
+For example, if a node has 8 virtual cores and the Analytics Service has at least 8{nbsp}GB of memory, the system creates 8 data partitions on that node.
+Similarly, for a node with 32 virtual cores and at least 16{nbsp}GB of memory, the system creates 16 partitions, the maximum for automatic partitioning.

=== Index considerations

-The size of a secondary index is approximately the total size of indexed fields in the Analytics collection. For example, if a collection has 20 fields and only 1 of those fields appears in the secondary index, the secondary index size will be ~1/20 of the collection size.
+The size of a secondary index is approximately the total size of indexed fields in the Analytics collection.
+For example, if a collection has 20 fields and only 1 of those fields appears in the secondary index, the secondary index size is ~1/20 of the collection size. == Sizing Eventing Service Nodes -Eventing is a compute oriented service. By default, Eventing service has one worker and each worker has two threads of execution. You can scale eventing both vertically by adding more workers or horizontally by adding more nodes. The Eventing service will partition vBuckets across the number of available nodes. +Eventing is a compute oriented service. +By default, Eventing service has 1 worker and each worker has 2 threads of execution. +You can scale eventing both vertically by adding more workers or horizontally by adding more nodes. +The Eventing service partitions the vBuckets across the number of available nodes. === CPU -Because Eventing allows arbitrary code, JavaScript, to be written and run, it is difficult to come up with a perfect sizing formula unless all Functions have been designed and their KV ops, Query ops, and cURLops are known along with the expected mutation rate. +Eventing runs arbitrary JavaScript code. +This flexibility makes it difficult to define a precise sizing formula. +You cannot define a precise formula unless you know the function designs, their KV operations, query operations, cURL operations, and the expected mutation rate. For example, if you process 100K mutations per second and only match 1 out of 1000 patterns, then perform some intense computation on the matched 100 items in your Eventing Function, you need 100X less compute than if you performed the intense computation on each mutation. -Eventing also can perform I/O to external REST endpoints via a synchronous HTTP/S cURL call. In this case, Eventing typically blocks on I/O and doesn’t need much CPU. However. if you want high throughput to overcome bandwidth, you will need more workers and thus more cores. +Eventing also can perform I/O to external REST endpoints via a synchronous HTTP/S cURL call. +Eventing typically blocks on I/O and requires little CPU. +Achieving high throughput to overcome bandwidth requires additional workers and cores. -8 vCPUs or 4 physical cores should be considered a good start for running a few Eventing Functions. +Use 8 vCPUs or 4 physical cores to run Eventing Functions. === RAM The xref:eventing-memory-quota.adoc[Eventing Memory Quota] doc discusses the Eventing Service memory quota in detail. - -Refer this document to learn about: -. Eventing Service memory quota -. How Eventing Service memory quota works -. The distribution of the total memory quota across workers -. The limitations and considerations -. Best practices and recommendations +Refer this document to learn more about: +* Eventing Service memory quota +* How Eventing Service memory quota works +* The distribution of the total memory quota across workers +* The limitations and considerations +* Best practices and recommendations === Eventing Storage Collection (previously Metadata Bucket) -Eventing functions store less than 2048 docs per Function. If timers are not used or if you have less than a few thousand active timers, then the size of the Eventing storage collection can simply be in a bucket with a minimum size 100 MB. +Each Eventing function stores fewer than 2048 documents in its Eventing storage collection. +If timers are not used or if the active timers count does not exceed the per-function document limit, store the Eventing storage collection in a 100 MB bucket. 
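The timer-related arithmetic spelled out in the next few paragraphs is easy to script.
A minimal sketch follows, using the per-timer overhead and example context size given below; the function and variable names are illustrative only and are not part of any Couchbase API.

[source,python]
----
def timer_storage_bytes(active_timers, context_bytes, per_timer_overhead=800):
    # Extra space needed in the Eventing storage collection when timers are used:
    # roughly 800 bytes of overhead plus the passed context, per active timer.
    return active_timers * (per_timer_overhead + context_bytes)

# A 200-byte context is about 1 KB per timer, so 100,000 active timers
# need roughly 100 MB on top of the baseline bucket size.
print(timer_storage_bytes(100_000, 200) / 1e6, "MB")  # 100.0 MB
----
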
-However, if you use timers you will have to allocate an additional space of about 800 bytes + the size of the passed context (which is the state passed to the function when it is called in the future) per active timer. +Using timers requires additional storage for each active timer. +Each active timer requires 800 bytes, plus the size of the passed context, which represents the state supplied to the function at future execution. -If you have a context of 200 bytes (total 1K/timer), then for 100,000 active timers you'll need 100 MB of additional space in this bucket. +A 200-byte context results in 1 KB of storage per active timer. +100,000 active timers require 100 MB of additional bucket space. -As a best practice, it's recommended to keep this collection 100% resident, so that it's always available in-memory. +As a best practice, keep this collection fully resident in-memory to make sure constant availability. -NOTE: This collection is shared across all your Eventing Functions. +NOTE: All Eventing functions use this collection. == Sizing Backup Service Nodes @@ -470,6 +493,8 @@ Before setting up a replication, you must make sure your cluster is appropriatel Your cluster must be properly sized to be able to handle new XDCR streams. -For example, XDCR needs 1-2 additional CPU cores per stream. In some cases, it also requires additional RAM and network resources. If a cluster is not sized to handle _both_ the existing workload _and_ the new XDCR streams, the performance of both XDCR and the cluster overall may be negatively impacted. +For example, XDCR needs 1-2 additional CPU cores per stream. +In some cases, it also requires additional RAM and network resources. +If a cluster is not sized to handle `both` the existing workload `and` the new XDCR streams, the performance of both XDCR and the cluster overall may be negatively impacted. -For information on preparing your cluster for replication, see xref:manage:manage-xdcr/prepare-for-xdcr.adoc[Prepare for XDCR]. +For information about preparing your cluster for replication, see xref:manage:manage-xdcr/prepare-for-xdcr.adoc[Prepare for XDCR]. From eebb46f5e761e8cb5efbe60abd7ab431615e2dec Mon Sep 17 00:00:00 2001 From: Pallavi-Janardhan Date: Mon, 29 Dec 2025 20:38:30 +0530 Subject: [PATCH 5/7] eventing_memory_quota_first_cut_for_my_reference_3_sizing_general --- modules/install/pages/sizing-general.adoc | 25 ++++++++++++++--------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/modules/install/pages/sizing-general.adoc b/modules/install/pages/sizing-general.adoc index 82fc1121d0..3a85a16f9f 100644 --- a/modules/install/pages/sizing-general.adoc +++ b/modules/install/pages/sizing-general.adoc @@ -48,11 +48,12 @@ This resource represents the main memory you allocate to Couchbase Server. Determine the allocation based on the following factors: + -- -* How much free RAM is available beyond OS and other applications -* How much data do you want to store in main memory -* How much latency you expect from KV/indexing/query performance +* How much free RAM is available beyond OS and other applications? +* How much data do you want to store in main memory? +* How much latency you expect from KV/indexing/query performance? -- + + Some components that require RAM are: + -- @@ -237,7 +238,7 @@ You should allocate additional CPU cores for these workloads. Size a node running the Index Service properly to create and maintain secondary indexes and perform index scans for {sqlpp} queries. 
Similar to the nodes that run the Data Service, answer the following questions to take care of your application needs: -+ + -- * What is the length oƒ the document key? * Which fields need to be indexed? @@ -360,7 +361,7 @@ NOTE: The storage engine used in the sizing calculation corresponds to the stora A node that runs the Query Service executes queries for your application needs. -Since the Query Service doesn’t need to persist data to disk, there are minimal resource requirements for disk space and disk I/O. +Since the Query Service doesn't need to persist data to disk, there are minimal resource requirements for disk space and disk I/O. You only need to consider CPU and memory. Questions that help in sizing the cluster: @@ -374,7 +375,7 @@ Different queries have different resource requirements. A simple query might return results within milliseconds while a complex query may require several seconds. The formula used to calculate the number of queries that's processed simultaneously is `CPU_cores * 4`. -The formula used to calculate the maximum queue-length for queries is`CPU_cores * 256`. +The formula used to calculate the maximum queue-length for queries is `CPU_cores * 256`. The system rejects additional queries with a 503 error once it reaches the limits. == Sizing Analytics Service Nodes @@ -443,13 +444,19 @@ Use 8 vCPUs or 4 physical cores to run Eventing Functions. === RAM -The xref:eventing-memory-quota.adoc[Eventing Memory Quota] doc discusses the Eventing Service memory quota in detail. -Refer this document to learn more about: +The xref:eventing-memory-quota.adoc[Eventing Memory Quota] doc discusses the Eventing Service memory quota in detail. + +Refer to this document to learn more about: + ++ +-- * Eventing Service memory quota * How Eventing Service memory quota works * The distribution of the total memory quota across workers * The limitations and considerations * Best practices and recommendations +-- ++ === Eventing Storage Collection (previously Metadata Bucket) @@ -470,7 +477,6 @@ NOTE: All Eventing functions use this collection. The hardware requirements for running a backup cluster are as follows: - .Hardware requirements |=== ||Minimum |Recommended @@ -486,7 +492,6 @@ The hardware requirements for running a backup cluster are as follows: |=== - == Sizing for Replication (XDCR) Before setting up a replication, you must make sure your cluster is appropriately configured and provisioned. From 882f1d17e0e258e9d6b5ac19c6e365e3e1e26ab5 Mon Sep 17 00:00:00 2001 From: Pallavi-Janardhan Date: Mon, 29 Dec 2025 21:08:49 +0530 Subject: [PATCH 6/7] eventing_memory_quota_adding_&_Modifying_sizing_guidelines --- modules/install/pages/eventing-memory-quota.adoc | 12 ++++++------ modules/install/pages/sizing-general.adoc | 4 +--- 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/modules/install/pages/eventing-memory-quota.adoc b/modules/install/pages/eventing-memory-quota.adoc index 4c5b54840d..7451f4fdeb 100644 --- a/modules/install/pages/eventing-memory-quota.adoc +++ b/modules/install/pages/eventing-memory-quota.adoc @@ -40,7 +40,6 @@ The Eventing Service divides the total memory quota uniformly across all workers === Calculation Eventing calculates the per-worker quota as follows: -+ .... Per-Worker quota = Total memory quota ÷ Total number of workers .... 
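To evaluate the formula above for your own deployments, a minimal Python sketch follows; the worker counts are illustrative only, and the first line reproduces the example worked through next.

[source,python]
----
def per_worker_quota_mb(total_quota_mb, total_workers):
    # Per-Worker quota = Total memory quota ÷ Total number of workers
    return total_quota_mb / total_workers

# The per-worker budget shrinks proportionally as functions and workers are added.
for workers in (4, 8, 16, 32):
    print(f"{workers:>2} workers -> {per_worker_quota_mb(256, workers):.0f} MB per worker")
----
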
@@ -74,15 +73,15 @@ Each worker has a bounded input queue that receives changes to documents from th === Garbage Collection (GC) Triggering +* GC frees memory occupied by unreachable objects that are no longer in use. +* GC reclaims memory only from objects that are no longer in use. +Memory held by live objects in the JSE heap remains allocated, so total usage may exceed the per-worker quota. + NOTE: When memory usage approaches the per-worker quota limit, the JavaScript runtime may trigger garbage collection to reclaim memory. However, the garbage collector may delay the stop-the-world collection because it's optimized for throughput. As a result, the runtime may not reclaim memory without delay, which can temporarily affect Eventing memory consumption. The service invokes a stop-the-world JSE GC to reduce memory inside the JSE isolate running each function. -* GC frees memory occupied by unreachable objects that are no longer in use. -* GC reclaims memory only from objects that are no longer in use. -Memory held by live objects in the JSE heap remains allocated, so total usage may exceed the per-worker quota. - == Limitations and Considerations The Eventing Service memory quota has important limitations that users must understand to prevent operational issues: @@ -97,6 +96,7 @@ Live objects in JSE heaps may persist, so memory usage can exceed the quota. The Eventing Service memory quota does not restrict total memory usage. JSE heaps may still contain live (non-garbage) objects that: + * GC cannot reclaim. * The quota does not account. * Continue to consume memory beyond the configured limit. @@ -151,7 +151,7 @@ Follow these recommendations to effectively manage Eventing memory usage effecti === Size the Memory Quota Appropriately * Set the Eventing memory quota based on expected worker count and function complexity. -* Allow headroom above the minimum 256 MB for production workloads. +* Allow additional headroom on top of the 256 MB default for production workloads. * Account for peak processing scenarios, not just average load. === Review Function Memory Efficiency diff --git a/modules/install/pages/sizing-general.adoc b/modules/install/pages/sizing-general.adoc index 3a85a16f9f..c44753926d 100644 --- a/modules/install/pages/sizing-general.adoc +++ b/modules/install/pages/sizing-general.adoc @@ -58,7 +58,7 @@ Some components that require RAM are: + -- ** All index storage types which need sufficient memory quota allocation for proper functioning. -** Full Text Search (FTS) +** Full Text Search (FTS). 
-- + .Minimum RAM Quota for Couchbase Server Components @@ -448,7 +448,6 @@ The xref:eventing-memory-quota.adoc[Eventing Memory Quota] doc discusses the Eve Refer to this document to learn more about: -+ -- * Eventing Service memory quota * How Eventing Service memory quota works @@ -456,7 +455,6 @@ Refer to this document to learn more about: * The limitations and considerations * Best practices and recommendations -- -+ === Eventing Storage Collection (previously Metadata Bucket) From 263d9c85a42ed35681582bf28cf8e02ffb9d97a8 Mon Sep 17 00:00:00 2001 From: Pallavi-Janardhan Date: Tue, 30 Dec 2025 16:16:58 +0530 Subject: [PATCH 7/7] DOC-13422_incorporating_technical_comments --- .../install/pages/eventing-memory-quota.adoc | 30 +++++++++---------- .../database-change-protocol.adoc | 0 2 files changed, 14 insertions(+), 16 deletions(-) create mode 100644 modules/install/pages/modules/architecture/database-change-protocol.adoc diff --git a/modules/install/pages/eventing-memory-quota.adoc b/modules/install/pages/eventing-memory-quota.adoc index 7451f4fdeb..606afeb0d8 100644 --- a/modules/install/pages/eventing-memory-quota.adoc +++ b/modules/install/pages/eventing-memory-quota.adoc @@ -1,6 +1,6 @@ = Eventing Memory Quota -:description: The Eventing Service memory quota does not enforce a hard memory limit on the Eventing subsystem, including producer and worker processes. +:description: The Eventing Service memory quota does not enforce a hard memory limit on the Eventing subsystem, including worker processes. This page explains how the memory quota actually works, how Eventing distributes it across workers, and its limitations in non-containerized environments. @@ -9,16 +9,14 @@ This page explains how the memory quota actually works, how Eventing distributes == Overview -The Couchbase Eventing Service memory quota does not enforce a hard memory limit on the entire Eventing subsystem, including producer and worker processes. +The Couchbase Eventing Service memory quota does not enforce a hard memory limit on the entire Eventing subsystem, including worker processes. The service uses the quota for queue sizing and JavaScript (JS) runtime memory heap, not as an absolute memory cap. In VM or bare-metal deployments, Eventing functions can exceed the configured quota at runtime. Understanding this behavior helps you to appropriately size and monitor Eventing functions in production environments. == How Memory Quota Works -The Eventing Service memory quota controls specific aspects of memory management but does not act as an absolute ceiling on memory usage. - -The Eventing Service memory quota controls specific aspects of memory management, such as queue sizing and garbage collection. +The Eventing Service memory quota controls specific aspects of memory management, such as queue sizing and frequency of garbage collection. However, it does not act as an absolute ceiling on memory usage. === Minimum Size @@ -30,9 +28,9 @@ Couchbase does not support values lesser than 256 MB. The memory quota serves 2 primary purposes: -* **Producer-to-worker queue sizing**: controls the maximum size of each worker's input queue. -* **Garbage Collection (GC) triggering**: determines when Eventing invokes JavaScript environment (JSE) GC to reclaim memory. -The quota does not restrict the total memory consumed by JSE heaps or Eventing processes. +* **Worker queue sizing**: controls the maximum size of each worker's input queue. 
+* **Garbage Collection (GC) triggering**: determines when Eventing invokes JavaScript environment GC to reclaim memory. +The quota does not restrict the total memory consumed by JavaScript runtime heaps or Eventing processes. == Per-Worker Distribution @@ -63,30 +61,30 @@ A smaller per-worker quota can affect performance if individual functions requir The per-worker quota directly influences 2 key aspects of Eventing runtime behavior. -=== Producer-to-Worker Queue Sizing +=== Worker Queue Sizing -Each worker has a bounded input queue that receives changes to documents from the Database Change Protocol (DCP) stream. +Each worker maintains a bounded input queue that receives changes to documents from the Database Change Protocol (xref:server:learn:clusters-and-availability/intra-cluster-replication.adoc#database-change-protocol[DCP]) stream. * The per-worker quota determines the maximum size of this queue. -* When the queue exceeds this size, Eventing throttles the DCP stream to prevent unbounded memory growth. -* Throttling temporarily pauses mutation flow, ensuring workers do not become overwhelmed when processing lags behind the mutation rate. +* When the queue exceeds this size, Eventing throttles the xref:server:learn:clusters-and-availability/intra-cluster-replication.adoc#database-change-protocol[DCP] stream to prevent unbounded memory growth. +* Throttling temporarily pauses mutation flow, ensuring workers do not become overwhelmed when processing lags behind the mutation rate. === Garbage Collection (GC) Triggering * GC frees memory occupied by unreachable objects that are no longer in use. * GC reclaims memory only from objects that are no longer in use. -Memory held by live objects in the JSE heap remains allocated, so total usage may exceed the per-worker quota. +Memory held by live objects in the JavaScript runtime heap remains allocated, so total usage may exceed the per-worker quota. NOTE: When memory usage approaches the per-worker quota limit, the JavaScript runtime may trigger garbage collection to reclaim memory. However, the garbage collector may delay the stop-the-world collection because it's optimized for throughput. As a result, the runtime may not reclaim memory without delay, which can temporarily affect Eventing memory consumption. -The service invokes a stop-the-world JSE GC to reduce memory inside the JSE isolate running each function. +The service invokes a stop-the-world JavaScript runtime GC to reduce memory inside the JavaScript runtime isolate running each function. == Limitations and Considerations The Eventing Service memory quota has important limitations that users must understand to prevent operational issues: * Not a hard limit: The quota does not cap total memory usage. -Live objects in JSE heaps may persist, so memory usage can exceed the quota. +Live objects in JavaScript runtime heaps may persist, so memory usage can exceed the quota. * No OS-level enforcement outside cgroups: In VM or bare-metal deployments, the operating system does not enforce memory limits. * Worker sizing impacts memory: Configuring an excessive number of workers or running memory-intensive functions can increase memory pressure and affect node stability. * Quota applies per worker, not per function: Each worker receives an equal share of the total quota, regardless of individual function memory requirements. @@ -95,7 +93,7 @@ Live objects in JSE heaps may persist, so memory usage can exceed the quota. The Eventing Service memory quota does not restrict total memory usage. 
-JSE heaps may still contain live (non-garbage) objects that: +JavaScript runtime heaps may still contain live (non-garbage) objects that: * GC cannot reclaim. * The quota does not account. diff --git a/modules/install/pages/modules/architecture/database-change-protocol.adoc b/modules/install/pages/modules/architecture/database-change-protocol.adoc new file mode 100644 index 0000000000..e69de29bb2