diff --git a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md index 8cabc7278..e897012e0 100644 --- a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -184,498 +184,499 @@ Ensure the URL for Prometheus is correct. Click "Save & Test". If the message "D ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) -## 3. Appendix, Detailed Monitoring Metrics +## 3. Appendix, Detailed Explanation of Monitoring Indicators ### 3.1 System Dashboard -This dashboard displays the current system's **CPU****, memory, disk, and network resource****s**, as well as some **JVM****-related metrics**. - -#### 3.1.1 CPU - -- **CPU Core:** Number of CPU cores. -- **CPU Load:** - - **System CPU Load:** The average CPU load and utilization of the entire system during the sampling period. - - **Process CPU Load:** The percentage of CPU resources occupied by the IoTDB process during the sampling period. -- **CPU Time Per Minute:** The total CPU time consumed by all processes in the system per minute. - -#### 3.1.2 Memory - -- **System Memory:** Current system memory usage. - - **Committed VM Size:** Virtual memory size allocated by the operating system to running processes. - - **Total Physical Memory****:** Total available physical memory in the system. - - **Used Physical Memory****:** The total amount of memory currently in use, including memory actively used by processes and memory occupied by the operating system for buffers and caching. -- **System Swap Memory:** The amount of swap space memory in use. -- **Process Memory:** Memory usage of the IoTDB process. - - **Max Memory:** The maximum amount of memory that the IoTDB process can request from the OS (configured in the `datanode-env`/`confignode-env` configuration files). - - **Total Memory:** The total amount of memory currently allocated by the IoTDB process from the OS. - - **Used Memory:** The total amount of memory currently in use by the IoTDB process. - -#### 3.1.3 Disk - -- **Disk Space:** - - **Total Disk Space:** Maximum disk space available for IoTDB. - - **Used Disk Space:** Disk space currently occupied by IoTDB. -- **Log Number Per Minute:** Average number of IoTDB logs generated per minute, categorized by log levels. -- **File Count:** The number of files related to IoTDB. - - **All:** Total number of files. - - **TsFile:** Number of TsFiles. - - **Seq:** Number of sequential TsFiles. - - **Unseq:** Number of unordered TsFiles. - - **WAL:** Number of WAL (Write-Ahead Log) files. - - **Cross-Temp:** Number of temporary files generated during cross-space merge operations. - - **Inner-Seq-Temp:** Number of temporary files generated during sequential-space merge operations. - - **Inner-Unseq-Temp:** Number of temporary files generated during unordered-space merge operations. - - **Mods:** Number of tombstone files. -- **Open File Count:** Number of open file handles in the system. -- **File Size:** The size of IoTDB-related files, with each sub-item representing the size of a specific file type. -- **Disk I/O Busy Rate:** Equivalent to the `%util` metric in `iostat`, indicating the level of disk utilization. Each sub-item corresponds to a specific disk. -- **Disk I/O Throughput****:** Average I/O throughput of system disks over a given period. Each sub-item corresponds to a specific disk. 
-- **Disk I/O Ops:** Equivalent to `r/s`, `w/s`, `rrqm/s`, and `wrqm/s` in `iostat`, representing the number of I/O operations per second. -- **Disk I/O Avg Time:** Equivalent to the `await` metric in `iostat`, representing the average latency of each I/O request, recorded separately for read and write operations. -- **Disk I/O Avg Size:** Equivalent to the `avgrq-sz` metric in `iostat`, indicating the average size of each I/O request, recorded separately for read and write operations. -- **Disk I/O Avg Queue Size:** Equivalent to `avgqu-sz` in `iostat`, representing the average length of the I/O request queue. -- **I/O System Call Rate:** Frequency of read/write system calls invoked by the process, similar to IOPS. -- **I/O Throughput****:** I/O throughput of the process, divided into `actual_read/write` and `attempt_read/write`. `Actual read` and `actual write` refer to the number of bytes actually written to or read from the storage device, excluding those handled by the Page Cache. - -#### 3.1.4 JVM - -- **GC Time Percentage:** Percentage of time spent on garbage collection (GC) by the JVM in the past minute. -- **GC Allocated/Promoted Size Detail:** The average size of objects promoted to the old generation per minute, as well as newly allocated objects in the young/old generation and non-generational areas. -- **GC Data Size Detail:** Size of long-lived objects in the JVM and the maximum allowed size for each generation. -- **Heap Memory:** JVM heap memory usage. - - **Maximum Heap Memory:** Maximum available heap memory for the JVM. - - **Committed Heap Memory:** Committed heap memory size for the JVM. - - **Used Heap Memory:** The amount of heap memory currently in use. - - **PS Eden Space:** Size of the PS Young generation's Eden space. - - **PS Old Space:** Size of the PS Old generation. - - **PS Survivor Space:** Size of the PS Survivor space. -- **O****ff Heap Memory:** Off-heap memory usage. - - **Direct Memory:** The amount of direct memory used. - - **Mapped Memory:** The amount of memory used for mapped files. -- **GC Number Per Minute:** Average number of garbage collections (YGC and FGC) performed per minute. -- **GC Time Per Minute:** Average time spent on garbage collection (YGC and FGC) per minute. -- **GC Number Per Minute Detail:** Average number of garbage collections performed per minute due to different causes. -- **GC Time Per Minute Detail:** Average time spent on garbage collection per minute due to different causes. -- **Time Consumed of Compilation Per Minute:** Total time spent on JVM compilation per minute. -- **The Number of Class:** - - **Loaded:** Number of classes currently loaded by the JVM. - - **Unloaded:** Number of classes unloaded by the JVM since system startup. -- **The Number of Java Thread:** The number of currently active threads in IoTDB. Each sub-item represents the number of threads in different states. - -#### 3.1.5 Network - -- **Net Speed:** Data transmission and reception speed by the network interface. -- **Receive/Transmit Data Size:** The total size of data packets sent and received by the network interface since system startup. -- **Packet Speed:** The rate of data packets sent and received by the network interface. A single RPC request may correspond to one or more packets. -- **Connection Num:** Number of socket connections for the current process (IoTDB only uses TCP). +This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. 
+
+#### CPU
+
+- CPU Cores: Number of CPU cores.
+- CPU Utilization:
+  - System CPU Utilization: The average CPU load and utilization of the entire system during the sampling period.
+  - Process CPU Utilization: The percentage of CPU occupied by the IoTDB process during the sampling period.
+- CPU Time Per Minute: The total CPU time consumed by all processes in the system per minute.
+
+#### Memory
+
+- System Memory: Current system memory usage.
+  - Committed VM Size: The size of virtual memory allocated by the operating system to running processes.
+  - Total Physical Memory: The total amount of physical memory available in the system.
+  - Used Physical Memory: The total amount of memory already in use, including memory actively used by processes and memory occupied by the operating system for buffers and caches.
+- System Swap Memory: The amount of swap space in use.
+- Process Memory: Memory usage of the IoTDB process.
+  - Max Memory: The maximum amount of memory that the IoTDB process can request from the operating system (configured in the datanode-env/confignode-env configuration files).
+  - Total Memory: The total amount of memory that the IoTDB process has currently requested from the operating system.
+  - Used Memory: The total amount of memory currently used by the IoTDB process.
+
+#### Disk
+
+- Disk Space:
+  - Total Disk Space: The maximum disk space available to IoTDB.
+  - Used Disk Space: The disk space already used by IoTDB.
+- Logs Per Minute: The average number of IoTDB logs of each level generated per minute during the sampling period.
+- File Count: Number of IoTDB-related files.
+  - All: Total number of files.
+  - TsFile: Number of TsFiles.
+  - Seq: Number of sequential TsFiles.
+  - Unseq: Number of unsequence TsFiles.
+  - WAL: Number of WAL files.
+  - Cross-Temp: Number of temporary files generated by cross-space compaction.
+  - Inner-Seq-Temp: Number of temporary files generated by sequential-space compaction.
+  - Inner-Unseq-Temp: Number of temporary files generated by unsequence-space compaction.
+  - Mods: Number of tombstone files.
+- Open File Handles: Number of file handles opened by the system.
+- File Size: The size of IoTDB-related files; each sub-item shows the size of the corresponding file type.
+- Disk Utilization (%): Equivalent to the `%util` metric in `iostat`; it reflects how busy each disk is. Each sub-item corresponds to a disk.
+- Disk I/O Throughput: The average I/O throughput of each disk in the system over a period of time. Each sub-item corresponds to a disk.
+- Disk IOPS: Equivalent to the `r/s`, `w/s`, `rrqm/s`, and `wrqm/s` metrics in `iostat`, i.e. the number of I/O operations a disk performs per second. Read and write count single I/O operations; because the block-device scheduler can merge adjacent I/Os, merge-read and merge-write count how often multiple I/Os are merged into one.
+- Disk I/O Latency (Avg): Equivalent to `await` in `iostat`, the average latency of each I/O request, recorded separately for reads and writes.
+- Disk I/O Request Size (Avg): Equivalent to `avgrq-sz` in `iostat`, the average size of each I/O request, recorded separately for reads and writes.
+- Disk I/O Queue Length (Avg): Equivalent to `avgqu-sz` in `iostat`, the average length of the I/O request queue.
+- I/O Syscall Rate: The frequency of read/write system calls issued by the process, similar to IOPS.
+- I/O Throughput: The I/O throughput of the process, divided into actual_read/write and attempt_read/write. Actual read and actual write refer to the number of bytes actually read from or written to the storage device, excluding the portion handled by the Page Cache.
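+
+The disk metrics above come from the metrics framework, but on Linux they can be cross-checked against the same kernel counters that `iostat` reads. The following sketch is illustrative only (it is not part of IoTDB or of the dashboards, assumes a Linux host, and uses a placeholder device name `sda` plus the `self` pseudo-pid): it derives a `%util`-style busy rate and an `await`-style latency from `/proc/diskstats`, and reads the attempted vs. actual I/O counters of a process from `/proc/<pid>/io`.
+
+```python
+# Minimal sketch (Linux only): cross-check the disk panels against /proc counters.
+# Field positions follow the kernel's Documentation/admin-guide/iostats.rst.
+import time
+
+def diskstats(device):
+    with open("/proc/diskstats") as f:
+        for line in f:
+            parts = line.split()
+            if parts[2] == device:
+                # reads completed, ms reading, writes completed, ms writing, ms doing I/O
+                return {"r": int(parts[3]), "r_ms": int(parts[6]),
+                        "w": int(parts[7]), "w_ms": int(parts[10]),
+                        "io_ms": int(parts[12])}
+    raise ValueError(f"device {device} not found")
+
+def sample(device="sda", interval=1.0):
+    a = diskstats(device)
+    time.sleep(interval)
+    b = diskstats(device)
+    ios = (b["r"] - a["r"]) + (b["w"] - a["w"])
+    util = (b["io_ms"] - a["io_ms"]) / (interval * 1000) * 100          # ~ Disk Utilization (%)
+    await_ms = ((b["r_ms"] - a["r_ms"]) + (b["w_ms"] - a["w_ms"])) / ios if ios else 0.0  # ~ Disk I/O Latency (Avg)
+    print(f"{device}: util={util:.1f}% iops={ios / interval:.1f} await={await_ms:.2f}ms")
+
+def process_io(pid):
+    # rchar/wchar count bytes requested by read/write syscalls (attempt_read/write),
+    # read_bytes/write_bytes count bytes that actually reached the storage layer
+    # (actual_read/write), i.e. Page Cache hits are excluded.
+    stats = {}
+    with open(f"/proc/{pid}/io") as f:
+        for line in f:
+            key, value = line.split(":")
+            stats[key] = int(value)
+    return stats
+
+if __name__ == "__main__":
+    sample("sda")               # adjust the device name to your environment
+    print(process_io("self"))   # replace "self" with the IoTDB DataNode pid
+```
+
+Comparing `read_bytes`/`write_bytes` with `rchar`/`wchar` for the DataNode process is a quick way to see how much of its I/O is absorbed by the Page Cache.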
+
+#### JVM
+
+- GC Time Percentage: The proportion of time the node's JVM spent on garbage collection within the past one-minute window.
+- GC Allocated/Promoted Size: The average size of objects promoted to the old generation per minute, as well as the size of objects newly allocated in the young generation, the old generation, and non-generational areas.
+- GC Live Data Size: The size of long-lived objects in the JVM and the maximum allowed size of each generation.
+- Heap Memory: JVM heap memory usage.
+  - Maximum heap memory: The maximum available heap memory for the JVM.
+  - Committed heap memory: The committed heap memory size of the JVM.
+  - Used heap memory: The amount of heap memory already used by the JVM.
+  - PS Eden Space: The size of the PS Eden space.
+  - PS Old Space: The size of the PS Old space.
+  - PS Survivor Space: The size of the PS Survivor space.
+  - ... (CMS/G1/ZGC, etc.)
+- Off-Heap Memory: Off-heap memory usage.
+  - Direct Memory: Off-heap direct memory.
+  - Mapped Memory: Off-heap mapped memory.
+- GCs Per Minute: The average number of garbage collections performed by the node's JVM per minute, including YGC and FGC.
+- GC Latency Per Minute: The average time the node's JVM spent on garbage collection per minute, including YGC and FGC.
+- GC Events Breakdown Per Minute: The average number of garbage collections per minute, broken down by cause, including YGC and FGC.
+- GC Pause Time Breakdown Per Minute: The average time spent on garbage collection per minute, broken down by cause, including YGC and FGC.
+- JIT Compilation Time Per Minute: The total time the JVM spent on compilation per minute.
+- Loaded & Unloaded Classes:
+  - Loaded: The number of classes currently loaded by the JVM.
+  - Unloaded: The number of classes unloaded by the JVM since system startup.
+- Active Java Threads: The number of currently live threads in IoTDB. Each sub-item represents the number of threads in each state.
+
+#### Network
+
+In these panels, eno refers to the physical network interface that carries external traffic, and lo refers to the loopback (virtual) interface.
+
+- Network Speed: The send and receive speed of the network interface.
+- Network Throughput (Receive/Transmit): The total size of data packets sent and received by the network interface, counted since system restart.
+- Packet Transmission Rate: The rate at which the network interface sends and receives packets; a single RPC request may correspond to one or more packets.
+- Active TCP Connections: The current number of socket connections of the selected process (IoTDB only uses TCP).
### 3.2 Performance Overview Dashboard
-This dashboard provides an overview of the system's overall performance.
-
-#### 3.2.1 Cluster Overview
-
-- **Total CPU Core:** Total number of CPU cores in the cluster.
-- **DataNode CPU Load:** CPU utilization of each DataNode in the cluster.
-- Disk:
-  - **Total Disk Space:** Total disk space across all cluster nodes.
-  - **DataNode Disk Usage:** Disk usage of each DataNode in the cluster.
-- **Total Timeseries:** The total number of time series managed by the cluster (including replicas). The actual number of time series should be calculated considering metadata replicas.
-- **Cluster:** The number of ConfigNode and DataNode instances in the cluster. -- **Up Time:** The duration since the cluster started. -- **Total Write Point Per Second:** The total number of data points written per second in the cluster (including replicas). The actual number of writes should be analyzed in conjunction with the data replication factor. -- Memory: - - **Total System Memory:** The total system memory available in the cluster. - - **Total Swap Memory:** The total swap memory available in the cluster. - - **DataNode Process Memory Usage:** The memory usage of each DataNode in the cluster. -- **Total File Number:** The total number of files managed by the cluster. -- **Cluster System Overview:** An overview of cluster-wide system resources, including average DataNode memory usage and average disk usage. -- **Total Database:** The total number of databases managed by the cluster (including replicas). -- **Total DataRegion:** The total number of DataRegions in the cluster. -- **Total SchemaRegion:** The total number of SchemaRegions in the cluster. - -#### 3.2.2 Node Overview - -- **CPU Core:** Number of CPU cores on the node’s machine. -- **Disk Space:** Total disk space available on the node’s machine. -- **Timeseries:** The number of time series managed by the node (including replicas). -- **System Overview:** Overview of the node’s system resources, including CPU load, process memory usage, and disk usage. -- **Write Point Per Second:** The write speed of the node, including replicated data. -- **System Memory:** The total system memory available on the node’s machine. -- **Swap Memory:** The total swap memory available on the node’s machine. -- **File Number:** The number of files managed by the node. - -#### 3.2.3 Performance - -- **Session Idle Time:** The total idle time of session connections on the node. -- **Client Connection:** The status of client connections on the node, including the total number of connections and the number of active connections. -- **Time Consumed Of Operation:** The latency of various operations on the node, including the average value and P99 percentile. -- **Average Time Consumed Of Interface:** The average latency of each **Thrift interface** on the node. -- **P99 Time Consumed Of Interface:** The P99 latency of each Thrift interface on the node. -- **Task Number:** The number of system tasks running on the node. -- **Average Time Consumed Of Task:** The average execution time of system tasks on the node. -- **P99 Time Consumed Of Task:** The P99 execution time of system tasks on the node. -- **Operation Per Second:** The number of operations executed per second on the node. -- Main Process: - - **Operation Per Second of Stage:** The number of operations executed per second in different stages of the main process. - - **Average Time Consumed of Stage:** The average execution time of different stages in the main process. - - **P99 Time Consumed of Stage:** The P99 execution time of different stages in the main process. -- Scheduling Stage: - - **OPS Of Schedule:** The number of operations executed per second in different sub-stages of the scheduling stage. - - **Average Time Consumed Of Schedule Stage:** The average execution time in different sub-stages of the scheduling stage. - - **P99 Time Consumed Of Schedule Stage:** The P99 execution time in different sub-stages of the scheduling stage. -- Local Scheduling Stage: - - **OPS of Local Schedule Stage:** Number of operations per second at each sub-stage of the local schedule stage. 
-  - **Average Time Consumed of Local Schedule Stage:** Average time consumed at each sub-stage of the local schedule stage.
-  - **P99 Time Consumed of Local Schedule Stage:** P99 time consumed at each sub-stage of the local schedule stage.
-- Storage Stage:
-  - **OPS of Storage Stage:** Number of operations per second at each sub-stage of the storage stage.
-  - **Average Time Consumed of Storage Stage:** Average time consumed at each sub-stage of the storage stage.
-  - **P99 Time Consumed of Storage Stage:** P99 time consumed at each sub-stage of the storage stage.
-- Engine Stage:
-  - **OPS Of Engine Stage:** The number of operations executed per second in different sub-stages of the engine stage.
-  - **Average Time Consumed Of Engine Stage:** The average execution time in different sub-stages of the engine stage.
-  - **P99 Time Consumed Of Engine Stage:** The P99 execution time in different sub-stages of the engine stage.
-
-#### 3.2.4 System
-
-- **CPU Load:** The CPU load of the node.
-- **CPU Time Per Minute:** The total CPU time per minute on the node, which is influenced by the number of CPU cores.
-- **GC Time Per Minute:** The average time spent on Garbage Collection (GC) per minute on the node, including Young GC (YGC) and Full GC (FGC).
-- **Heap Memory:** The heap memory usage of the node.
-- **Off-Heap Memory:** The off-heap memory usage of the node.
-- **The Number Of Java Thread:** The number of Java threads on the node.
-- **File Count:** The number of files managed by the node.
-- **File Size:** The total size of files managed by the node.
-- **Log Number Per Minute:** The number of logs generated per minute on the node, categorized by log type.
+#### Cluster Overview
+
+- Total CPU Cores: Total number of CPU cores of the cluster machines.
+- DataNode CPU Load: CPU utilization of each DataNode in the cluster.
+- Disk
+  - Total Disk Space: Total disk space of the cluster machines.
+  - DataNode Disk Utilization: The disk usage rate of each DataNode in the cluster.
+- Total Time Series: The total number of time series managed by the cluster (including replicas); the actual number of time series should be calculated in conjunction with the number of metadata replicas.
+- Cluster Info: The number of ConfigNodes and DataNodes in the cluster (a query example follows this list).
+- Up Time: The time elapsed since the cluster started.
+- Total Write Throughput: The total number of points written per second in the cluster (including replicas); the actual write volume should be analyzed in conjunction with the number of data replicas.
+- Memory
+  - Total System Memory: Total system memory of the cluster machines.
+  - Total Swap Memory: Total swap memory of the cluster machines.
+  - DataNode Process Memory Utilization: Memory usage of each DataNode in the cluster.
+- Total Files: Total number of files managed by the cluster.
+- Cluster System Overview: Overview of the cluster machines, including average DataNode memory usage and average disk usage.
+- Total DataBases: The total number of databases managed by the cluster (including replicas).
+- Total DataRegions: The total number of DataRegions managed by the cluster.
+- Total SchemaRegions: The total number of SchemaRegions managed by the cluster.
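+
+All of these panels are rendered from the Prometheus data source configured earlier, so any value can also be pulled straight from the Prometheus HTTP API when a number needs to be double-checked. The sketch below is only an illustration: it assumes Prometheus is reachable at `http://localhost:9090` and uses the built-in `up` metric to count successfully scraped ConfigNode/DataNode targets; the IoTDB-specific metric names behind the other panels should be looked up in your own Prometheus instance (for example via its `/api/v1/label/__name__/values` endpoint) rather than taken from this example.
+
+```python
+# Minimal sketch: query the Prometheus HTTP API that also feeds the Grafana panels.
+# Assumes Prometheus at localhost:9090; adjust the URL to your deployment.
+import json
+import urllib.parse
+import urllib.request
+
+PROMETHEUS = "http://localhost:9090"
+
+def instant_query(promql: str):
+    """Run an instant PromQL query and return the raw result list."""
+    url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
+    with urllib.request.urlopen(url, timeout=5) as resp:
+        body = json.load(resp)
+    if body.get("status") != "success":
+        raise RuntimeError(f"query failed: {body}")
+    return body["data"]["result"]
+
+if __name__ == "__main__":
+    # 'up' is a built-in Prometheus metric: 1 for every target that was scraped
+    # successfully, so count(up == 1) approximates the node count in "Cluster Info".
+    for sample in instant_query("count(up == 1)"):
+        print("scraped targets:", sample["value"][1])
+```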
+
+#### Node Overview
+
+- CPU Cores: The number of CPU cores of the machine where the node is located.
+- Disk Space: The disk space of the machine where the node is located.
+- Time Series: The number of time series managed by the node (including replicas).
+- System Overview: System overview of the node, including CPU load, process memory usage ratio, and disk usage ratio.
+- Write Throughput: The number of points written per second on the node (including replicas).
+- System Memory: The system memory size of the machine where the node is located.
+- Swap Memory: The swap memory size of the machine where the node is located.
+- File Count: The number of files managed by the node.
+
+#### Performance
+
+- Session Idle Time: The total idle time and total busy time of the node's session connections.
+- Client Connections: The client connection status of the node, including the total number of connections and the number of active connections.
+- Operation Latency: The latency of various operations on the node, including the average and P99.
+- Average Interface Latency: The average latency of each Thrift interface of the node.
+- P99 Interface Latency: The P99 latency of each Thrift interface of the node.
+- Total Tasks: The number of system tasks on the node.
+- Average Task Latency: The average execution time of each system task on the node.
+- P99 Task Latency: The P99 execution time of each system task on the node.
+- Operations Per Second: The number of operations executed per second on the node.
+- Main Process
+  - Operations Per Second (Stage-wise): The number of operations executed per second in each stage of the node's main process.
+  - Average Stage Latency: The average execution time of each stage of the node's main process.
+  - P99 Stage Latency: The P99 execution time of each stage of the node's main process.
+- Schedule Stage
+  - Schedule Operations Per Second: The number of operations executed per second in each sub-stage of the schedule stage.
+  - Average Schedule Stage Latency: The average execution time of each sub-stage of the schedule stage.
+  - P99 Schedule Stage Latency: The P99 execution time of each sub-stage of the schedule stage.
+- Local Schedule Sub Stages
+  - Local Schedule Operations Per Second: The number of operations executed per second in each sub-stage of the local schedule stage.
+  - Average Local Schedule Stage Latency: The average execution time of each sub-stage of the local schedule stage.
+  - P99 Local Schedule Latency: The P99 execution time of each sub-stage of the local schedule stage.
+- Storage Stage
+  - Storage Operations Per Second: The number of operations executed per second in each sub-stage of the storage stage.
+  - Average Storage Stage Latency: The average execution time of each sub-stage of the storage stage.
+  - P99 Storage Stage Latency: The P99 execution time of each sub-stage of the storage stage.
+- Engine Stage
+  - Engine Operations Per Second: The number of operations executed per second in each sub-stage of the engine stage.
+  - Average Engine Stage Latency: The average execution time of each sub-stage of the engine stage.
+  - P99 Engine Stage Latency: The P99 execution time of each sub-stage of the engine stage.
+
+#### System
+
+- CPU Utilization: The CPU load of the node.
+- CPU Latency Per Minute: The CPU time consumed per minute on the node; the maximum value is related to the number of CPU cores.
+- GC Latency Per Minute: The average time the node spent on GC per minute, including YGC and FGC.
+- Heap Memory: The heap memory usage of the node.
+- Off-Heap Memory: The off-heap memory usage of the node.
+- Total Java Threads: The number of Java threads on the node.
+- File Count: The number of files managed by the node.
+- File Size: The total size of files managed by the node.
+- Logs Per Minute: The number of logs of each type generated per minute on the node.
### 3.3
ConfigNode Dashboard -This dashboard displays the performance metrics of all management nodes in the cluster, including **partition information, node status, and client connection statistics**. - -#### 3.3.1 Node Overview - -- **Database Count:** Number of databases on the node. -- Region: - - **DataRegion Count:** Number of DataRegions on the node. - - **DataRegion Current Status:** Current status of DataRegions on the node. - - **SchemaRegion Count:** Number of SchemaRegions on the node. - - **SchemaRegion Current Status:** Current status of SchemaRegions on the node. -- **System Memory:** System memory on the node's machine. -- **Swap Memory:** Swap memory on the node's machine. -- **ConfigNodes:** Status of ConfigNodes in the cluster. -- **DataNodes:** Status of DataNodes in the cluster. -- **System Overview:** Overview of the node's system resources, including system memory, disk usage, process memory, and CPU load. - -#### 3.3.2 NodeInfo - -- **Node Count:** The total number of nodes in the cluster, including ConfigNodes and DataNodes. -- **ConfigNode Status:** The status of ConfigNodes in the cluster. -- **DataNode Status:** The status of DataNodes in the cluster. -- **SchemaRegion Distribution:** The distribution of SchemaRegions in the cluster. -- **SchemaRegionGroup Leader Distribution:** The leader distribution of SchemaRegionGroups in the cluster. -- **DataRegion Distribution:** The distribution of DataRegions in the cluster. -- **DataRegionGroup Leader Distribution:** The leader distribution of DataRegionGroups in the cluster. - -#### 3.3.3 Protocol - -- Client Count Statistics: - - **Active Client Num:** The number of active clients in each thread pool on the node. - - **Idle Client Num:** The number of idle clients in each thread pool on the node. - - **Borrowed Client Count:** The number of borrowed clients in each thread pool on the node. - - **Created Client Count:** The number of clients created in each thread pool on the node. - - **Destroyed Client Count:** The number of clients destroyed in each thread pool on the node. -- Client Time Statistics: - - **Client Mean Active Time:** The average active time of clients in each thread pool on the node. - - **Client Mean Borrow Wait Time:** The average time clients spend waiting for borrowed resources in each thread pool. - - **Client Mean Idle Time:** The average idle time of clients in each thread pool. - -#### 3.3.4 Partition Table - -- **SchemaRegionGroup Count:** The number of **SchemaRegionGroups** in the cluster’s databases. -- **DataRegionGroup Count:** The number of DataRegionGroups in the cluster’s databases. -- **SeriesSlot Count:** The number of SeriesSlots in the cluster’s databases. -- **TimeSlot Count:** The number of TimeSlots in the cluster’s databases. -- **DataRegion Status:** The status of DataRegions in the cluster. -- **SchemaRegion Status:** The status of SchemaRegions in the cluster. - -#### 3.3.5 Consensus - -- **Ratis Stage Time:** The execution time of different stages in the Ratis consensus protocol. -- **Write Log Entry:** The execution time for writing log entries in Ratis. -- **Remote / Local Write Time:** The time taken for remote and local writes in Ratis. -- **Remote / Local Write QPS:** The **queries per second (QPS)** for remote and local writes in Ratis. -- **RatisConsensus Memory:** The memory usage of the Ratis consensus protocol on the node. 
+This panel displays the performance of all management nodes in the cluster, including partition information, node status, and client connection statistics.
+
+#### Node Overview
+
+- Database Count: The number of databases on the node.
+- Region
+  - DataRegion Count: The number of DataRegions on the node.
+  - DataRegion Status: The status of the DataRegions on the node.
+  - SchemaRegion Count: The number of SchemaRegions on the node.
+  - SchemaRegion Status: The status of the SchemaRegions on the node.
+- System Memory Utilization: The system memory usage of the node.
+- Swap Memory Utilization: The swap memory usage of the node.
+- ConfigNodes Status: The running status of the ConfigNodes in the cluster where the node is located.
+- DataNodes Status: The running status of the DataNodes in the cluster where the node is located.
+- System Overview: System overview of the node, including system memory, disk usage, process memory, and CPU load.
+
+#### NodeInfo
+
+- Node Count: The number of nodes in the cluster where the node is located, including ConfigNodes and DataNodes.
+- ConfigNode Status: The status of the ConfigNodes in the cluster where the node is located.
+- DataNode Status: The status of the DataNodes in the cluster where the node is located.
+- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located.
+- SchemaRegionGroup Leader Distribution: The leader distribution of the SchemaRegionGroups in the cluster where the node is located.
+- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located.
+- DataRegionGroup Leader Distribution: The leader distribution of the DataRegionGroups in the cluster where the node is located.
+
+#### Protocol
+
+- Client Count
+  - Active Clients: The number of active clients in each thread pool of the node.
+  - Idle Clients: The number of idle clients in each thread pool of the node.
+  - Borrowed Clients Per Second: The number of borrowed clients in each thread pool of the node.
+  - Created Clients Per Second: The number of created clients in each thread pool of the node.
+  - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node.
+- Client Time
+  - Average Client Active Time: The average active time of clients in each thread pool of the node.
+  - Average Client Borrowing Latency: The average time clients wait to be borrowed in each thread pool of the node.
+  - Average Client Idle Time: The average idle time of clients in each thread pool of the node.
+
+#### Partition Table
+
+- SchemaRegionGroup Count: The number of SchemaRegionGroups in the databases of the cluster where the node is located.
+- DataRegionGroup Count: The number of DataRegionGroups in the databases of the cluster where the node is located.
+- SeriesSlot Count: The number of SeriesSlots in the databases of the cluster where the node is located.
+- TimeSlot Count: The number of TimeSlots in the databases of the cluster where the node is located.
+- DataRegion Status: The status of the DataRegions in the cluster where the node is located.
+- SchemaRegion Status: The status of the SchemaRegions in the cluster where the node is located.
+
+#### Consensus
+
+- Ratis Stage Latency: The time consumed in each stage of the node's Ratis.
+- Write Log Entry Latency: The time taken by the node's Ratis to write a log entry.
+- Remote / Local Write Latency: The latency of remote and local writes in the node's Ratis.
+- Remote / Local Write Throughput: The QPS of remote and local writes in the node's Ratis.
+- RatisConsensus Memory Utilization: Memory usage of the node's
Ratis consensus protocol ### 3.4 DataNode Dashboard -This dashboard displays the monitoring status of all **DataNodes** in the cluster, including **write latency, query latency, and storage file counts**. - -#### 3.4.1 Node Overview - -- **The Number of Entity:** The number of entities managed by the node. -- **Write Point Per Second:** The write speed of the node (points per second). -- **Memory Usage:** The memory usage of the node, including IoT Consensus memory usage, SchemaRegion memory usage, and per-database memory usage. - -#### 3.4.2 Protocol - -- Operation Latency: - - **The Time Consumed of Operation (avg):** The average latency of operations on the node. - - **The Time Consumed of Operation (50%):** The median latency of operations on the node. - - **The Time Consumed of Operation (99%):** The P99 latency of operations on the node. -- Thrift Statistics: - - **The QPS of Interface:** The queries per second (QPS) for each Thrift interface on the node. - - **The Avg Time Consumed of Interface:** The average execution time for each Thrift interface on the node. - - **Thrift Connection:** The number of active Thrift connections on the node. - - **Thrift Active Thread:** The number of active Thrift threads on the node. -- Client Statistics: - - **Active Client Num:** The number of active clients in each thread pool. - - **Idle Client Num:** The number of idle clients in each thread pool. - - **Borrowed Client Count:** The number of borrowed clients in each thread pool. - - **Created Client Count:** The number of clients created in each thread pool. - - **Destroyed Client Count:** The number of clients destroyed in each thread pool. - - **Client Mean Active Time:** The average active time of clients in each thread pool. - - **Client Mean Borrow Wait Time:** The average time clients spend waiting for borrowed resources in each thread pool. - - **Client Mean Idle Time:** The average idle time of clients in each thread pool. - -#### 3.4.3 Storage Engine - -- **File Count:** The number of files managed by the node. -- **File Size:** The total size of files managed by the node. -- TsFile: - - **TsFile Total Size In Each Level:** The total size of TsFiles at each level. - - **TsFile Count In Each Level:** The number of TsFiles at each level. - - **Avg TsFile Size In Each Level:** The average size of TsFiles at each level. -- **Task Number:** The number of tasks on the node. -- **The Time Consumed of Task:** The total execution time of tasks on the node. -- Compaction: - - **Compaction Read And Write Per Second:** The read/write speed of compaction operations. - - **Compaction Number Per Minute:** The number of **compaction** operations per minute. - - **Compaction Process Chunk Status:** The number of **chunks** in different states during compaction. - - **Compacted Point Num Per Minute:** The number of data points compacted per minute. - -#### 3.4.4 Write Performance - -- **Write Cost (avg):** The average **write latency**, including WAL and **memtable** writes. -- **Write Cost (50%):** The **median write latency**, including WAL and **memtable** writes. -- **Write Cost (99%):** The **P99 write latency**, including WAL and **memtable** writes. -- WAL (Write-Ahead Logging) - - **WAL File Size:** The total size of WAL files managed by the node. - - **WAL File Num:** The total number of WAL files managed by the node. - - **WAL Nodes Num:** The total number of WAL Nodes managed by the node. - - **Make Checkpoint Costs:** The time required to create different types of Checkpoints. 
-  - **WAL Serialize Total Cost:** The total serialization time for WAL.
-  - **Data Region Mem Cost:** The memory usage of different DataRegions, including total memory usage of DataRegions on the current instance and total memory usage of DataRegions across the entire cluster.
-  - **Serialize One WAL Info Entry Cost:** The time taken to serialize a single WAL Info Entry.
-  - **Oldest MemTable Ram Cost When Cause Snapshot:** The memory size of the oldest MemTable when a snapshot is triggered by WAL.
-  - **Oldest MemTable Ram Cost When Cause Flush:** The memory size of the oldest MemTable when a flush is triggered by WAL.
-  - **Effective Info Ratio of WALNode:** The ratio of effective information in different WALNodes.
+This panel displays the monitoring status of all DataNodes in the cluster, including write latency, query latency, the number of stored files, and so on.
+
+#### Node Overview
+
+- Total Managed Entities: The number of entities managed by the node.
+- Write Throughput: The number of points written per second on the node.
+- Memory Usage: The memory usage of the node, including the memory used by the components of IoT Consensus, the total memory used by SchemaRegions, and the memory used by each database.
+
+#### Protocol
+
+- Operation Latency (how the avg/P50/P99 summaries are computed is illustrated after this section)
+  - Average Operation Latency: The average latency of each type of operation on the node.
+  - P50 Operation Latency: The median latency of each type of operation on the node.
+  - P99 Operation Latency: The P99 latency of each type of operation on the node.
+- Thrift Statistics
+  - Thrift Interface QPS: The QPS of each Thrift interface of the node.
+  - Average Thrift Interface Latency: The average latency of each Thrift interface of the node.
+  - Thrift Connections: The number of Thrift connections of each type on the node.
+  - Active Thrift Threads: The number of active Thrift threads of each type on the node.
+- Client Statistics
+  - Active Clients: The number of active clients in each thread pool of the node.
+  - Idle Clients: The number of idle clients in each thread pool of the node.
+  - Borrowed Clients Per Second: The number of borrowed clients in each thread pool of the node.
+  - Created Clients Per Second: The number of created clients in each thread pool of the node.
+  - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node.
+  - Average Client Active Time: The average active time of clients in each thread pool of the node.
+  - Average Client Borrowing Latency: The average time clients wait to be borrowed in each thread pool of the node.
+  - Average Client Idle Time: The average idle time of clients in each thread pool of the node.
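+
+Many panels in this dashboard report the same quantity as an average, a P50 (median), and a P99. The sketch below is purely illustrative (synthetic latencies, not IoTDB code) and shows how the three summaries relate, which helps when judging whether a high average is caused by a general slowdown or by a small slow tail:
+
+```python
+# Minimal sketch: how the avg / P50 / P99 summaries shown in the panels relate.
+# Uses synthetic latency samples; real values come from the IoTDB metrics module.
+import random
+
+def percentile(samples, p):
+    """Nearest-rank percentile of a list of numbers (p in [0, 100])."""
+    ordered = sorted(samples)
+    rank = max(1, int(round(p / 100 * len(ordered))))
+    return ordered[rank - 1]
+
+# Mostly fast operations with a small slow tail, a typical latency shape.
+latencies_ms = [random.uniform(1, 5) for _ in range(990)] + \
+               [random.uniform(50, 200) for _ in range(10)]
+
+avg = sum(latencies_ms) / len(latencies_ms)
+print(f"avg = {avg:.2f} ms")                           # pulled up by the tail
+print(f"P50 = {percentile(latencies_ms, 50):.2f} ms")  # the typical operation
+print(f"P99 = {percentile(latencies_ms, 99):.2f} ms")  # the slow tail the P99 panels expose
+```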
+#### Storage Engine
+
+- File Count: The number of files of each type managed by the node.
+- File Size: The size of each type of file managed by the node.
+- TsFile
+  - Total TsFile Size Per Level: The total size of the TsFiles at each level managed by the node.
+  - TsFile Count Per Level: The number of TsFiles at each level managed by the node.
+  - Average TsFile Size Per Level: The average size of the TsFiles at each level managed by the node.
+- Total Tasks: The number of tasks on the node.
+- Task Latency: The execution time of the tasks on the node.
+- Compaction
+  - Compaction Read/Write Throughput: The read and write speed of compaction on the node per second.
+  - Compactions Per Minute: The number of compactions performed by the node per minute.
+  - Compaction Chunk Status: The number of chunks in different states during compaction on the node.
+  - Compacted-Points Per Minute: The number of data points compacted by the node per minute.
+
+#### Write Performance
+
+- Average Write Latency: The average write latency of the node, including writing the WAL and the memtable.
+- P50 Write Latency: The median write latency of the node, including writing the WAL and the memtable.
+- P99 Write Latency: The P99 write latency of the node, including writing the WAL and the memtable.
+- WAL
+  - WAL File Size: The total size of the WAL files managed by the node.
+  - WAL Files: The number of WAL files managed by the node.
+  - WAL Nodes: The number of WAL nodes managed by the node.
+  - Checkpoint Creation Time: The time required to create each type of checkpoint on the node.
+  - WAL Serialization Time (Total): The total time spent on WAL serialization on the node.
+  - Data Region Mem Cost: The memory usage of different DataRegions, the total memory usage of the DataRegions on the current instance, and the total memory usage of the DataRegions in the current cluster.
+  - Serialize One WAL Info Entry Cost: The time taken by the node to serialize a single WAL info entry.
+  - Oldest MemTable Ram Cost When Cause Snapshot: The memory size of the oldest MemTable when the WAL triggers a snapshot of it.
+  - Oldest MemTable Ram Cost When Cause Flush: The memory size of the oldest MemTable when the WAL triggers a flush of it.
+  - WALNode Effective Info Ratio: The effective information ratio of the different WALNodes on the node.
- WAL Buffer
-  - **WAL Buffer Cost:** The time taken to flush the SyncBuffer of WAL, including both synchronous and asynchronous flushes.
-  - **WAL Buffer Used Ratio:** The utilization ratio of the WAL Buffer.
-  - **WAL Buffer Entries Count:** The number of entries in the WAL Buffer.
+  - WAL Buffer Latency: The time taken to flush the WAL SyncBuffer on the node, including both synchronous and asynchronous flushes.
+  - WAL Buffer Used Ratio: The utilization ratio of the node's WAL buffer.
+  - WAL Buffer Entries Count: The number of entries in the node's WAL buffer.
- Flush Statistics
-  - **Flush MemTable Cost (avg):** The average total flush time, including time spent in different sub-stages.
-  - **Flush MemTable Cost (50%):** The median total flush time, including time spent in different sub-stages.
-  - **Flush MemTable Cost (99%):** The P99 total flush time, including time spent in different sub-stages.
-  - **Flush Sub Task Cost (avg):** The average execution time of flush sub-tasks, including sorting, encoding, and I/O stages.
-  - **Flush Sub Task Cost (50%):** The median execution time of flush sub-tasks, including sorting, encoding, and I/O stages.
-  - **Flush Sub Task Cost (99%):** The P99 execution time of flush sub-tasks, including sorting, encoding, and I/O stages.
-- **Pending Flush Task Num:** The number of Flush tasks currently in a blocked state.
-- **Pending Flush Sub Task Num:** The number of blocked Flush sub-tasks.
-- **TsFile Compression Ratio of Flushing MemTable:** The compression ratio of TsFiles generated from flushed MemTables.
-- **Flush TsFile Size of DataRegions:** The size of TsFiles generated from flushed MemTables in different DataRegions.
-- **Size of Flushing MemTable:** The size of the MemTable currently being flushed.
-- **Points Num of Flushing MemTable:** The number of data points being flushed from MemTables in different DataRegions.
-S**eries Num of Flushing MemTable:** The number of time series being flushed from MemTables in different DataRegions.
-- **Average Point Num of Flushing MemChunk:** The average number of points in MemChunks being flushed.
-
-#### 3.4.5 Schema Engine
-
-**Schema Engine Mode:** The metadata engine mode used by the node.
-**Schema Consensus Protocol:** The metadata consensus protocol used by the node.
-**Schema Region Number:** The number of SchemaRegions managed by the node.
-**Schema Region Memory Overview:** The total memory used by SchemaRegions on the node.
-**Memory Usage per SchemaRegion:** The average memory usage per SchemaRegion.
-**Cache MNode per SchemaRegion:** The number of cached MNodes per SchemaRegion.
-**MLog Length and Checkpoint****:** The current MLog size and checkpoint position for each SchemaRegion (valid only for SimpleConsensus).
-**Buffer MNode per SchemaRegion:** The number of buffered MNodes per SchemaRegion.
-**Activated Template Count per SchemaRegion:** The number of activated templates per SchemaRegion.
-- Time Series Statistics
-  - **Timeseries Count per SchemaRegion:** The average number of time series per SchemaRegion.
-  - **Series Type:** The number of time series of different types.
-  - **Time Series Number:** The total number of time series on the node.
-  - **Template Series Number:** The total number of template-based time series on the node.
-  - **Template Series Count per SchemaRegion:** The number of time series created via templates per SchemaRegion.
+  - Average Flush Latency: The total time spent on flush on the node and the average time spent in each sub-stage.
+  - P50 Flush Latency: The total time spent on flush on the node and the median time spent in each sub-stage.
+  - P99 Flush Latency: The total time spent on flush on the node and the P99 time spent in each sub-stage.
+  - Average Flush Subtask Latency: The average execution time of each flush sub-task on the node, including the sorting, encoding, and I/O stages.
+  - P50 Flush Subtask Latency: The median execution time of each flush sub-task on the node, including the sorting, encoding, and I/O stages.
+  - P99 Flush Subtask Latency: The P99 execution time of each flush sub-task on the node, including the sorting, encoding, and I/O stages.
+- Pending Flush Task Num: The number of flush tasks currently in a blocked state on the node.
+- Pending Flush Sub Task Num: The number of blocked flush sub-tasks on the node.
+- Tsfile Compression Ratio of Flushing MemTable: The compression ratio of the TsFiles generated from flushed MemTables on the node.
+- Flush TsFile Size of DataRegions: The size of the TsFiles generated by each flush in different DataRegions of the node.
+- Size of Flushing MemTable: The size of the MemTable currently being flushed on the node.
+- Points Num of Flushing MemTable: The number of data points flushed from MemTables in different DataRegions of the node.
+- Series Num of Flushing MemTable: The number of time series flushed from MemTables in different DataRegions of the node.
+- Average Point Num of Flushing MemChunk: The average number of points in the MemChunks being flushed on the node.
+
+#### Schema Engine
+
+- Schema Engine Mode: The metadata engine mode of the node.
+- Schema Consensus Protocol: The metadata consensus protocol of the node.
+- Schema Region Number: The number of SchemaRegions managed by the node.
+- Schema Region Memory Overview: The total memory used by the SchemaRegions of the node.
+- Memory Usage per SchemaRegion: The average memory usage of each SchemaRegion of the node.
+- Cache MNode per SchemaRegion: The number of cached MNodes in each SchemaRegion of the node.
+- MLog Length and Checkpoint: The current MLog length and checkpoint position of each SchemaRegion of the node (valid only for SimpleConsensus).
+- Buffer MNode per SchemaRegion: The number of buffered MNodes in each SchemaRegion of the node.
+- Activated Template Count per SchemaRegion: The number of activated templates in each SchemaRegion of the node.
+- Time Series statistics
+  - Timeseries Count per SchemaRegion: The average number of time series per SchemaRegion of the node.
+  - Series Type: The number of time series of each type on the node.
+  - Time Series Number: The total number of time series on the node.
+  - Template Series Number: The total number of template-based time series on the node.
+  - Template Series Count per SchemaRegion: The number of time series created via templates in each SchemaRegion of the node.
- IMNode Statistics
-  - **Pinned MNode per SchemaRegion:** The number of pinned IMNodes per SchemaRegion.
-  - **Pinned Memory per SchemaRegion:** The memory usage of pinned IMNodes per SchemaRegion.
-  - **Unpinned MNode per SchemaRegion:** The number of unpinned IMNodes per SchemaRegion.
-  - **Unpinned Memory per SchemaRegion:** The memory usage of unpinned IMNodes per SchemaRegion.
-  - **Schema File Memory MNode Number:** The total number of pinned and unpinned IMNodes on the node.
-  - **Release and Flush MNode Rate:** The number of IMNodes released and flushed per second.
-- **Cache Hit Rate:** The cache hit ratio of the node.
-- **Release and Flush Thread Number:** The number of active threads for releasing and flushing memory.
-- **Time Consumed of Release and Flush (avg):** The average execution time for cache release and buffer flush.
-- **Time Consumed of Release and Flush (99%):** The P99 execution time for cache release and buffer flush.
-
-#### 3.4.6 Query Engine
-
-- Time Consumed at Each Stage
-  - **The time consumed of query plan stages (avg):** The average time consumed in different query plan stages on the node.
-  - **The time consumed of query plan stages (50%):** The median time consumed in different query plan stages on the node.
-  - **The time consumed of query plan stages (99%):** The P99 time consumed in different query plan stages on the node.
-- Plan Dispatch Time
-  - **The time consumed of plan dispatch stages (avg):** The average time consumed in query execution plan dispatch.
-  - **The time consumed of plan dispatch stages (50%):** The median time consumed in query execution plan dispatch.
-  - **The time consumed of plan dispatch stages (99%):** The P99 time consumed in query execution plan dispatch.
-- Query Execution Time
-  - **The time consumed of query execution stages (avg):** The average time consumed in query execution on the node.
-  - **The time consumed of query execution stages (50%):** The median time consumed in query execution on the node.
-  - **The time consumed of query execution stages (99%):** The P99 time consumed in query execution on the node.
+  - Pinned MNode per SchemaRegion: The number of pinned IMNodes in each SchemaRegion of the node.
+  - Pinned Memory per SchemaRegion: The memory usage of the pinned IMNodes in each SchemaRegion of the node.
+  - Unpinned MNode per SchemaRegion: The number of unpinned IMNodes in each SchemaRegion of the node.
+  - Unpinned Memory per SchemaRegion: The memory usage of the unpinned IMNodes in each SchemaRegion of the node.
+  - Schema File Memory MNode Number: The total number of pinned and unpinned IMNodes on the node.
+  - Release and Flush MNode Rate: The number of IMNodes released and flushed per second on the node.
+- Cache Hit Rate: The cache hit rate of the node.
+- Release and Flush Thread Number: The current number of active release and flush threads on the node.
+- Time Consumed of Release and Flush (avg): The average time taken by cache release and buffer flush triggered on the node.
+- Time Consumed of Release and Flush (99%): The P99 time taken by cache release and buffer flush triggered on the node.
+
+#### Query Engine
+
+- Time Consumption In Each Stage
+  - Average Query Plan Execution Time: The average time spent in each query plan stage on the node.
+  - P50 Query Plan Execution Time: The median time spent in each query plan stage on the node.
+  - P99 Query Plan Execution Time: The P99 time spent in each query plan stage on the node.
+- Execution Plan Distribution Time
+  - Average Query Plan Dispatch Time: The average time spent dispatching query execution plans on the node.
+  - P50 Query Plan Dispatch Time: The median time spent dispatching query execution plans on the node.
+  - P99 Query Plan Dispatch Time: The P99 time spent dispatching query execution plans on the node.
+- Execution Plan Execution Time
+  - Average Query Execution Time: The average execution time of query execution plans on the node.
+  - P50 Query Execution Time: The median execution time of query execution plans on the node.
+  - P99 Query Execution Time: The P99 execution time of query execution plans on the node.
- Operator Execution Time
-  - **The time consumed of operator execution stages (avg):** The average time consumed in query operator execution.
-  - **The time consumed of operator execution (50%):** The median time consumed in query operator execution.
-  - **The time consumed of operator execution (99%):** The P99 time consumed in query operator execution
+  - Average Query Operator Execution Time: The average execution time of query operators on the node.
+  - P50 Query Operator Execution Time: The median execution time of query operators on the node.
+  - P99 Query Operator Execution Time: The P99 execution time of query operators on the node.
- Aggregation Query Computation Time
-  - **The time consumed of query aggregation (avg):** The average time consumed in aggregation query computation.
-  - **The time consumed of query aggregation (50%):** The median time consumed in aggregation query computation.
-  - **The time consumed of query aggregation (99%):** The P99 time consumed in aggregation query computation.
-- File/Memory Interface Time
-  - **The time consumed of query scan (avg):** The average time consumed in file/memory interface query scans.
-  - **The time consumed of query scan (50%):** The median time consumed in file/memory interface query scans.
-  - **The time consumed of query scan (99%):** The P99 time consumed in file/memory interface query scans.
-- Resource Access Count
-  - **The usage of query resource (avg):** The average number of resource accesses during query execution.
-  - **The usage of query resource (50%):** The median number of resource accesses during query execution.
-  - **The usage of query resource (99%):** The P99 number of resource accesses during query execution.
+  - Average Query Aggregation Execution Time: The average computation time of aggregation queries on the node.
+  - P50 Query Aggregation Execution Time: The median computation time of aggregation queries on the node.
+  - P99 Query Aggregation Execution Time: The P99 computation time of aggregation queries on the node.
+- File/Memory Interface Time Consumption
+  - Average Query Scan Execution Time: The average time spent by node queries on the file/memory interface.
+  - P50 Query Scan Execution Time: The median time spent by node queries on the file/memory interface.
+  - P99 Query Scan Execution Time: The P99 time spent by node queries on the file/memory interface.
+- Number Of Resource Visits
+  - Average Query Resource Utilization: The average number of resource accesses per query on the node.
+  - P50 Query Resource Utilization: The median number of resource accesses per query on the node.
+  - P99 Query Resource Utilization: The P99 number of resource accesses per query on the node.
- Data Transmission Time
-  - **The time consumed of query data exchange (avg):** The average time consumed in query data exchange.
-  - **The time consumed of query data exchange (50%):** The median time consumed in query data exchange.
-  - **The time consumed of query data exchange (99%):** The P99 time consumed in query data exchange.
-- Data Transmission Count
-  - **The count of Data Exchange (avg):** The average number of data exchanges during queries.
-  - **The count of Data Exchange:** The quantiles (median, P99) of data exchanges during queries.
-- Task Scheduling Count and Time
-  - **The number of query queue:** The number of query tasks scheduled.
-  - **The time consumed of query schedule time (avg):** The average time consumed for query scheduling.
-  - **The time consumed of query schedule time (50%):** The median time consumed for query scheduling.
-  - **The time consumed of query schedule time (99%):** The P99 time consumed for query scheduling.
-
-#### 3.4.7 Query Interface
+  - Average Query Data Exchange Latency: The average time spent on query data exchange on the node.
+  - P50 Query Data Exchange Latency: The median time spent on query data exchange on the node.
+  - P99 Query Data Exchange Latency: The P99 time spent on query data exchange on the node.
+- Number Of Data Transfers
+  - Average Query Data Exchange Count: The average number of data exchanges per query on the node.
+  - Query Data Exchange Count: The quantiles of the number of data exchanges per query on the node, including the median and P99.
+- Task Scheduling Quantity And Time Consumption
+  - Query Queue Length: The number of query tasks scheduled on the node.
+  - Average Query Scheduling Latency: The average time spent on query task scheduling on the node.
+  - P50 Query Scheduling Latency: The median time spent on query task scheduling on the node.
+  - P99 Query Scheduling Latency: The P99 time spent on query task scheduling on the node.
+
+#### Query Interface
- Load Time Series Metadata
-  - **The time consumed of load timeseries metadata (avg):** The average time consumed for loading time series metadata.
-  - **The time consumed of load timeseries metadata (50%):** The median time consumed for loading time series metadata.
-  - **The time consumed of load timeseries metadata (99%):** The P99 time consumed for loading time series metadata.
+  - Average Timeseries Metadata Load Time: The average time taken by node queries to load time series metadata.
+  - P50 Timeseries Metadata Load Time: The median time taken by node queries to load time series metadata.
+  - P99 Timeseries Metadata Load Time: The P99 time taken by node queries to load time series metadata.
- Read Time Series
-  - **The time consumed of read timeseries metadata (avg):** The average time consumed for reading time series.
-  - **The time consumed of read timeseries metadata (50%):** The median time consumed for reading time series.
-  - **The time consumed of read timeseries metadata (99%):** The P99 time consumed for reading time series.
+  - Average Timeseries Metadata Read Time: The average time taken by node queries to read time series.
+  - P50 Timeseries Metadata Read Time: The median time taken by node queries to read time series.
+  - P99 Timeseries Metadata Read Time: The P99 time taken by node queries to read time series.
- Modify Time Series Metadata
-  - **The time consumed of timeseries metadata modification (avg):** The average time consumed for modifying time series metadata.
-  - **The time consumed of timeseries metadata modification (50%):** The median time consumed for modifying time series metadata.
-  - **The time consumed of timeseries metadata modification (99%):** The P99 time consumed for modifying time series metadata.
+  - Average Timeseries Metadata Modification Time: The average time taken by node queries to modify time series metadata.
+  - P50 Timeseries Metadata Modification Time: The median time taken by node queries to modify time series metadata.
+  - P99 Timeseries Metadata Modification Time: The P99 time taken by node queries to modify time series metadata.
- Load Chunk Metadata List
-  - The time consumed of load chunk metadata list(avg): Average time consumed of loading chunk metadata list by the node
-  - The time consumed of load chunk metadata list(50%): Median time consumed of loading chunk metadata list by the node
-  - The time consumed of load chunk metadata list(99%): P99 time consumed of loading chunk metadata list by the node
+  - Average Chunk Metadata List Load Time: The average time taken by node queries to load chunk metadata lists.
+  - P50 Chunk Metadata List Load Time: The median time taken by node queries to load chunk metadata lists.
+  - P99 Chunk Metadata List Load Time: The P99 time taken by node queries to load chunk metadata lists.
- Modify Chunk Metadata
-  - The time consumed of chunk metadata modification(avg): Average time consumed of modifying chunk metadata by the node
-  - The time consumed of chunk metadata modification(50%): Median time consumed of modifying chunk metadata by the node
-  - The time consumed of chunk metadata modification(99%): P99 time consumed of modifying chunk metadata by the node
-- Filter by Chunk Metadata
-  - **The time consumed of chunk metadata filter (avg):** The average time consumed for filtering by chunk metadata.
-  - **The time consumed of chunk metadata filter (50%):** The median time consumed for filtering by chunk metadata.
-  - **The time consumed of chunk metadata filter (99%):** The P99 time consumed for filtering by chunk metadata.
-- Construct Chunk Reader
-  - **The time consumed of construct chunk reader (avg):** The average time consumed for constructing a Chunk Reader.
-  - **The time consumed of construct chunk reader (50%):** The median time consumed for constructing a Chunk Reader.
- - **The time consumed of construct chunk reader (99%):** The P99 time consumed for constructing a Chunk Reader.
+ - Average Chunk Metadata Modification Time: The average time it takes for node queries to modify Chunk metadata
+ - P50 Chunk Metadata Modification Time: The median time spent on modifying Chunk metadata for node queries
+ - P99 Chunk Metadata Modification Time: P99 time consumption for node query and modification of Chunk metadata
+- Filter According To Chunk Metadata
+ - Average Chunk Metadata Filtering Time: The average time spent on node queries filtering by Chunk metadata
+ - P50 Chunk Metadata Filtering Time: Median filtering time for node queries based on Chunk metadata
+ - P99 Chunk Metadata Filtering Time: P99 time consumption for node query filtering based on Chunk metadata
+- Constructing Chunk Reader
+ - Average Chunk Reader Construction Time: The average time spent on constructing Chunk Reader for node queries
+ - P50 Chunk Reader Construction Time: Median time spent on constructing Chunk Reader for node queries
+ - P99 Chunk Reader Construction Time: P99 time consumption for constructing Chunk Reader for node queries
- Read Chunk
- - **The time consumed of read chunk (avg):** The average time consumed for reading a Chunk.
- - **The time consumed of read chunk (50%):** The median time consumed for reading a Chunk.
- - **The time consumed of read chunk (99%):** The P99 time consumed for reading a Chunk.
+ - Average Chunk Read Time: The average time taken for node queries to read Chunks
+ - P50 Chunk Read Time: Median time taken for node queries to read Chunks
+ - P99 Chunk Read Time: P99 time taken for node queries to read Chunks
- Initialize Chunk Reader
- - **The time consumed of init chunk reader (avg):** The average time consumed for initializing a Chunk Reader.
- - **The time consumed of init chunk reader (50%):** The median time consumed for initializing a Chunk Reader.
- - **The time consumed of init chunk reader (99%):** The P99 time consumed for initializing a Chunk Reader.
-- Build TsBlock from Page Reader
- - **The time consumed of build tsblock from page reader (avg):** The average time consumed for building a TsBlock using a Page Reader.
- - **The time consumed of build tsblock from page reader (50%):** The median time consumed for building a TsBlock using a Page Reader.
- - **The time consumed of build tsblock from page reader (99%):** The P99 time consumed for building a TsBlock using a Page Reader.
-- Build TsBlock from Merge Reader
- - **The time consumed of build tsblock from merge reader (avg):** The average time consumed for building a TsBlock using a Merge Reader.
- - **The time consumed of build tsblock from merge reader (50%):** The median time consumed for building a TsBlock using a Merge Reader.
- - **The time consumed of build tsblock from merge reader (99%):** The P99 time consumed for building a TsBlock using a Merge Reader.
-
-#### 3.4.8 Query Data Exchange
-
-Time consumed of data exchange in queries.
-
-- Get TsBlock via Source Handle
- - **The time consumed of source handle get tsblock (avg):** The average time consumed for retrieving a TsBlock using the source handle.
- - **The time consumed of source handle get tsblock (50%):** The median time consumed for retrieving a TsBlock using the source handle.
- - **The time consumed of source handle get tsblock (99%):** The P99 time consumed for retrieving a TsBlock using the source handle.
-- Deserialize TsBlock via Source Handle - - **The time consumed of source handle deserialize tsblock (avg):** The average time consumed for deserializing a TsBlock via the source handle. - - **The time consumed of source handle deserialize tsblock (50%):** The median time consumed for deserializing a TsBlock via the source handle. - - **The time consumed of source handle deserialize tsblock (99%):** The P99 time consumed for deserializing a TsBlock via the source handle. -- Send TsBlock via Sink Handle - - **The time consumed of sink handle send tsblock (avg):** The average time consumed for sending a TsBlock via the sink handle. - - **The time consumed of sink handle send tsblock (50%):** The median time consumed for sending a TsBlock via the sink handle. - - **The time consumed of sink handle send tsblock (99%):** The P99 time consumed for sending a TsBlock via the sink handle. -- Handle Data Block Event Callback - - **The time consumed of handling data block event callback (avg):** The average time consumed for handling the callback of a data block event during query execution. - - **The time consumed of handling data block event callback (50%):** The median time consumed for handling the callback of a data block event during query execution. - - **The time consumed of handling data block event callback (99%):** The P99 time consumed for handling the callback of a data block event during query execution. -- Get Data Block Task - - **The time consumed of get data block task (avg):** The average time consumed for retrieving a data block task. - - **The time consumed of get data block task (50%):** The median time consumed for retrieving a data block task. - - **The time consumed of get data block task (99%):** The P99 time consumed for retrieving a data block task. - -#### 3.4.9 Query Related Resource - -- **MppDataExchangeManager:** The number of shuffle sink handles and source handles during queries. -- **LocalExecutionPlanner:** The remaining memory available for query fragments. -- **FragmentInstanceManager:** The context information and count of running query fragments. -- **Coordinator:** The number of queries recorded on the node. -- **MemoryPool Size:** The status of the memory pool related to queries. -- **MemoryPool Capacity:** The size of the query-related memory pool, including the maximum and remaining available capacity. -- **DriverScheduler:** The number of queued query tasks. 
-
-#### 3.4.10 Consensus - IoT Consensus
+ - Average Chunk Reader Initialization Time: The average time spent initializing Chunk Reader for node queries
+ - P50 Chunk Reader Initialization Time: Median time spent initializing Chunk Reader for node queries
+ - P99 Chunk Reader Initialization Time: P99 time spent initializing Chunk Reader for node queries
+- Constructing TsBlock Through Page Reader
+ - Average TsBlock Construction Time from Page Reader: The average time it takes for node queries to construct TsBlock through Page Reader
+ - P50 TsBlock Construction Time from Page Reader: The median time spent on constructing TsBlock through Page Reader for node queries
+ - P99 TsBlock Construction Time from Page Reader: P99 time spent on constructing TsBlock through Page Reader for node queries
+- Constructing TsBlock Through Merge Reader
+ - Average TsBlock Construction Time from Merge Reader: The average time taken for node queries to construct TsBlock through Merge Reader
+ - P50 TsBlock Construction Time from Merge Reader: The median time spent on constructing TsBlock through Merge Reader for node queries
+ - P99 TsBlock Construction Time from Merge Reader: P99 time spent on constructing TsBlock through Merge Reader for node queries
+
+#### Query Data Exchange
+
+Time consumed by data exchange during queries.
+
+- Obtain TsBlock through source handle
+ - Average Source Handle TsBlock Retrieval Time: The average time taken for node queries to obtain TsBlock through source handle
+ - P50 Source Handle TsBlock Retrieval Time: Median time taken for node queries to obtain TsBlock through source handle
+ - P99 Source Handle TsBlock Retrieval Time: P99 time taken for node queries to obtain TsBlock through source handle
+- Deserialize TsBlock through source handle
+ - Average Source Handle TsBlock Deserialization Time: The average time taken for node queries to deserialize TsBlock through source handle
+ - P50 Source Handle TsBlock Deserialization Time: The median time taken for node queries to deserialize TsBlock through source handle
+ - P99 Source Handle TsBlock Deserialization Time: P99 time taken for node queries to deserialize TsBlock through source handle
+- Send TsBlock through sink handle
+ - Average Sink Handle TsBlock Transmission Time: The average time taken for node queries to send TsBlock through sink handle
+ - P50 Sink Handle TsBlock Transmission Time: Median time taken for node queries to send TsBlock through sink handle
+ - P99 Sink Handle TsBlock Transmission Time: P99 time taken for node queries to send TsBlock through sink handle
+- Data Block Event Callback
+ - Average Data Block Event Acknowledgment Time: The average time taken for node queries to handle data block event callbacks
+ - P50 Data Block Event Acknowledgment Time: Median time taken for node queries to handle data block event callbacks
+ - P99 Data Block Event Acknowledgment Time: P99 time taken for node queries to handle data block event callbacks
+- Get Data Block Tasks
+ - Average Data Block Task Retrieval Time: The average time taken for node queries to obtain data block tasks
+ - P50 Data Block Task Retrieval Time: The median time taken for node queries to obtain data block tasks
+ - P99 Data Block Task Retrieval Time: P99 time taken for node queries to obtain data block tasks
+
+#### Query Related Resource
+
+- MppDataExchangeManager: The number of shuffle sink handles and source handles during node queries
+- LocalExecutionPlanner: The remaining memory that the node can allocate to query shards
+- FragmentInstanceManager: The query sharding context 
information and the number of query shards that the node is running +- Coordinator: The number of queries recorded on the node +- MemoryPool Size: Node query related memory pool situation +- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values +- DriverScheduler Count: Number of queue tasks related to node queries + +#### Consensus - IoT Consensus - Memory Usage - - **IoTConsensus Used Memory:** The memory usage of IoT Consensus, including total used memory, queue memory usage, and synchronization memory usage. -- Synchronization between Nodes - - **IoTConsensus Sync Index:** The sync index size of different DataRegions. - - **IoTConsensus Overview:** The total synchronization lag and cached request count of IoT Consensus. - - **IoTConsensus Search Index Rate:** The growth rate of SearchIndex writes for different DataRegions. - - **IoTConsensus Safe Index Rate:** The growth rate of SafeIndex synchronization for different DataRegions. - - **IoTConsensus LogDispatcher Request Size:** The size of synchronization requests sent to other nodes for different DataRegions. - - **Sync Lag:** The synchronization lag size of different DataRegions. - - **Min Peer Sync Lag:** The minimum synchronization lag to different replicas for different DataRegions. - - **Sync Speed Diff of Peers:** The maximum synchronization lag to different replicas for different DataRegions. - - **IoTConsensus LogEntriesFromWAL Rate:** The rate of retrieving log entries from WAL for different DataRegions. - - **IoTConsensus LogEntriesFromQueue Rate:** The rate of retrieving log entries from the queue for different DataRegions. -- Execution Time of Different Stages - - **The Time Consumed of Different Stages (avg):** The average execution time of different stages in IoT Consensus. - - **The Time Consumed of Different Stages (50%):** The median execution time of different stages in IoT Consensus. - - **The Time Consumed of Different Stages (99%):** The P99 execution time of different stages in IoT Consensus. - -#### 3.4.11 Consensus - DataRegion Ratis Consensus - -- **Ratis Stage Time:** The execution time of different stages in Ratis. -- **Write Log Entry:** The execution time for writing logs in Ratis. -- **Remote / Local Write Time:** The time taken for remote and local writes in Ratis. -- **Remote / Local Write QPS****:** The QPS for remote and local writes in Ratis. -- **RatisConsensus Memory:** The memory usage of Ratis consensus. - -#### 3.4.12 Consensus - SchemaRegion Ratis Consensus - -- **Ratis Stage Time:** The execution time of different stages in Ratis. -- **Write Log Entry:** The execution time for writing logs in Ratis. -- **Remote / Local Write Time:** The time taken for remote and local writes in Ratis. -- **Remote / Local Write QPS****:** The QPS for remote and local writes in Ratis. -- **RatisConsensus Memory:** The memory usage of Ratis consensus. 
\ No newline at end of file
+ - IoTConsensus Used Memory: The memory usage of IoTConsensus on the node, including total memory usage, queue memory usage, and synchronization memory usage
+- Synchronization Status Between Nodes
+ - IoTConsensus Sync Index Size: The SyncIndex size of different DataRegions for IoTConsensus on the node
+ - IoTConsensus Overview: The total synchronization lag and cached request count of IoTConsensus on the node
+ - IoTConsensus Search Index Growth Rate: The growth rate of the write SearchIndex of different DataRegions for IoTConsensus on the node
+ - IoTConsensus Safe Index Growth Rate: The growth rate of the synchronization SafeIndex of different DataRegions for IoTConsensus on the node
+ - IoTConsensus LogDispatcher Request Size: The size of the synchronization requests sent by different DataRegions of the node's IoTConsensus to other nodes
+ - Sync Lag: The synchronization lag of different DataRegions for IoTConsensus on the node
+ - Min Peer Sync Lag: The minimum synchronization lag from different DataRegions to their replicas for IoTConsensus on the node
+ - Peer Sync Speed Difference: The maximum difference in synchronization lag from different DataRegions to their replicas for IoTConsensus on the node
+ - IoTConsensus LogEntriesFromWAL Rate: The rate at which different DataRegions of the node's IoTConsensus retrieve log entries from the WAL
+ - IoTConsensus LogEntriesFromQueue Rate: The rate at which different DataRegions of the node's IoTConsensus retrieve log entries from the queue
+- Time Consumption of Different Execution Stages
+ - The Time Consumed of Different Stages (avg): The average time spent on different execution stages of IoTConsensus on the node
+ - The Time Consumed of Different Stages (50%): The median time spent on different execution stages of IoTConsensus on the node
+ - The Time Consumed of Different Stages (99%): P99 of the time spent on different execution stages of IoTConsensus on the node
+
+#### Consensus - DataRegion Ratis Consensus
+
+- Ratis Consensus Stage Latency: The time consumption of different stages of Ratis on the node
+- Ratis Log Write Latency: The time consumption of writing log entries for Ratis on the node
+- Remote / Local Write Latency: The time it takes for Ratis on the node to perform local or remote writes
+- Remote / Local Write Throughput (QPS): The QPS of local and remote writes for Ratis on the node
+- RatisConsensus Memory Usage: Memory usage of Ratis consensus on the node
+
+#### Consensus - SchemaRegion Ratis Consensus
+
+- RatisConsensus Stage Latency: The time consumption of different stages of Ratis on the node
+- Ratis Log Write Latency: The time consumption of writing log entries for Ratis on the node
+- Remote / Local Write Latency: The time it takes for Ratis on the node to perform local or remote writes
+- Remote / Local Write Throughput (QPS): The QPS of local and remote writes for Ratis on the node
+- RatisConsensus Memory Usage: Memory usage of Ratis consensus on the node
\ No newline at end of file
diff --git a/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md
index ec61a2a41..541ff4946 100644
--- a/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md
+++ b/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md
@@ -192,18 +192,18 @@ This panel displays the current usage of system CPU, memory, disk, and network r

#### CPU

-- CPU Core:CPU cores
-- CPU Load:
- - System CPU Load:The average CPU load and busyness of the entire system during the sampling time
- - Process CPU Load:The proportion of CPU occupied by the IoTDB process during sampling time
+- CPU Cores: CPU cores
+- CPU Utilization:
+ - System CPU Utilization: The average CPU load and busyness of the entire system during the sampling time
+ - Process CPU Utilization: The proportion of CPU occupied by the IoTDB process during sampling time
- CPU Time Per Minute:The total CPU time of all processes in the system per minute

#### Memory

- System Memory:The current usage of system memory.
- - Commited vm size: The size of virtual memory allocated by the operating system to running processes.
- - Total physical memory:The total amount of available physical memory in the system.
- - Used physical memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache.
+ - Committed VM Size: The size of virtual memory allocated by the operating system to running processes.
+ - Total Physical Memory: The total amount of available physical memory in the system.
+ - Used Physical Memory: The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache.
- System Swap Memory:Swap Space memory usage.
- Process Memory:The usage of memory by the IoTDB process.
- - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file)
@@ -213,35 +213,35 @@ This panel displays the current usage of system CPU, memory, disk, and network r

#### Disk

- Disk Space:
- - Total disk space:The maximum disk space that IoTDB can use.
- - Used disk space:The disk space already used by IoTDB.
-- Log Number Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time.
+ - Total Disk Space: The maximum disk space that IoTDB can use.
+ - Used Disk Space: The disk space already used by IoTDB.
+- Logs Per Minute: The average number of logs at each level of IoTDB per minute during the sampling time.
- File Count:Number of IoTDB related files
- - all:All file quantities
+ - All: All file quantities
- TsFile:Number of TsFiles
- - seq:Number of sequential TsFiles
- - unseq:Number of unsequence TsFiles
- - wal:Number of WAL files
- - cross-temp:Number of cross space merge temp files
- - inner-seq-temp:Number of merged temp files in sequential space
- - innser-unseq-temp:Number of merged temp files in unsequential space
- - mods:Number of tombstone files
-- Open File Count:Number of file handles opened by the system
+ - Seq: Number of sequential TsFiles
+ - Unseq: Number of unsequence TsFiles
+ - WAL: Number of WAL files
+ - Cross-Temp: Number of cross space merge temp files
+ - Inner-Seq-Temp: Number of merged temp files in sequential space
+ - Inner-Unseq-Temp: Number of merged temp files in unsequential space
+ - Mods: Number of tombstone files
+- Open File Handles: Number of file handles opened by the system
- File Size:The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file.
-- Disk I/O Busy Rate:Equivalent to the% util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk.
+- Disk Utilization (%): Equivalent to the %util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk.
- Disk I/O Throughput:The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. -- Disk I/O Ops:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. -- Disk I/O Avg Time:Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. -- Disk I/O Avg Size:Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. -- Disk I/O Avg Queue Size:Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. -- I/O System Call Rate:The frequency of process calls to read and write system calls, similar to IOPS. +- Disk IOPS:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. +- Disk I/O Latency (Avg):Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. +- Disk I/O Request Size (Avg):Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. +- Disk I/O Queue Length (Avg):Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. +- I/O Syscall Rate:The frequency of process calls to read and write system calls, similar to IOPS. - I/O Throughput:The throughput of process I/O can be divided into two categories: actual-read/write and attemppt-read/write. Actual read and actual write refer to the number of bytes that a process actually causes block devices to perform I/O, excluding the parts processed by Page Cache. #### JVM - GC Time Percentage:The proportion of GC time spent by the node JVM in the past minute's time window -- GC Allocated/Promoted Size Detail: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications -- GC Data Size Detail:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value +- GC Allocated/Promoted Size: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications +- GC Live Data Size:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value - Heap Memory:JVM heap memory usage. - Maximum heap memory:The maximum available heap memory size for the JVM. - Committed heap memory:The size of heap memory that has been committed by the JVM. 
@@ -250,105 +250,105 @@ This panel displays the current usage of system CPU, memory, disk, and network r - PS Old Space:The size of the PS Old area. - PS Survivor Space:The size of the PS survivor area. - ...(CMS/G1/ZGC, etc) -- Off Heap Memory:Out of heap memory usage. - - direct memory:Out of heap direct memory. - - mapped memory:Out of heap mapped memory. -- GC Number Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC -- GC Time Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC -- GC Number Per Minute Detail:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC -- GC Time Per Minute Detail:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC -- Time Consumed Of Compilation Per Minute:The total time JVM spends compiling per minute -- The Number of Class: - - loaded:The number of classes currently loaded by the JVM - - unloaded:The number of classes uninstalled by the JVM since system startup -- The Number of Java Thread:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. +- Off-Heap Memory:Out of heap memory usage. + - Direct Memory:Out of heap direct memory. + - Mapped Memory:Out of heap mapped memory. +- GCs Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC +- GC Latency Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC +- GC Events Breakdown Per Minute:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC +- GC Pause Time Breakdown Per Minute:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC +- JIT Compilation Time Per Minute:The total time JVM spends compiling per minute +- Loaded & Unloaded Classes: + - Loaded:The number of classes currently loaded by the JVM + - Unloaded:The number of classes uninstalled by the JVM since system startup +- Active Java Threads:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. #### Network Eno refers to the network card connected to the public network, while lo refers to the virtual network card. 
-- Net Speed:The speed of network card sending and receiving data
-- Receive/Transmit Data Size:The size of data packets sent or received by the network card, calculated from system restart
-- Packet Speed:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets
-- Connection Num:The current number of socket connections for the selected process (IoTDB only has TCP)
+- Network Speed: The speed of network card sending and receiving data
+- Network Throughput (Receive/Transmit): The size of data packets sent or received by the network card, accumulated since system restart
+- Packet Transmission Rate: The speed at which the network card sends and receives packets; one RPC request can correspond to one or more packets
+- Active TCP Connections: The current number of socket connections of the selected process (IoTDB only has TCP)

### 3.2 Performance Overview Dashboard

#### Cluster Overview

-- Total CPU Core:Total CPU cores of cluster machines
+- Total CPU Cores: Total CPU cores of cluster machines
- DataNode CPU Load:CPU usage of each DataNode node in the cluster
- Disk
- Total Disk Space: Total disk size of cluster machines
- - DataNode Disk Usage: The disk usage rate of each DataNode in the cluster
-- Total Timeseries: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas
-- Cluster: Number of ConfigNode and DataNode nodes in the cluster
+ - DataNode Disk Utilization: The disk usage rate of each DataNode in the cluster
+- Total Time Series: The total number of time series managed by the cluster (including replicas); the actual number of time series needs to be calculated in conjunction with the number of metadata replicas
+- Cluster Info: Number of ConfigNode and DataNode nodes in the cluster
- Up Time: The duration of cluster startup until now
-- Total Write Point Per Second: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas
+- Total Write Throughput: The total number of writes per second in the cluster (including replicas); the actual total number of writes needs to be analyzed in conjunction with the number of data replicas
- Memory
- Total System Memory: Total memory size of cluster machine system
- Total Swap Memory: Total size of cluster machine swap memory
- - DataNode Process Memory Usage: Memory usage of each DataNode in the cluster
-- Total File Number:Total number of cluster management files
+ - DataNode Process Memory Utilization: Memory usage of each DataNode in the cluster
+- Total Files: Total number of cluster management files
- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage
-- Total DataBase: The total number of databases managed by the cluster (including replicas)
-- Total DataRegion: The total number of DataRegions managed by the cluster
-- Total SchemaRegion: The total number of SchemeRegions managed by the cluster
+- Total DataBases: The total number of databases managed by the cluster (including replicas)
+- Total DataRegions: The total number of DataRegions managed by the cluster
+- Total SchemaRegions: The total number of SchemaRegions managed by the cluster

#### Node Overview

-- CPU Core: The number of CPU cores in the machine where the node is located
+- CPU Cores: The number of CPU cores in the machine where the node is located
- Disk Space: The disk size of the machine where the node is located
-- Timeseries: Number of time series managed by the machine where the node is located (including replicas)
+- Time Series: Number of time series managed by the machine where the node is located (including replicas)
- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio
-- Write Point Per Second: The write speed per second of the machine where the node is located (including replicas)
+- Write Throughput: The write speed per second of the machine where the node is located (including replicas)
- System Memory: The system memory size of the machine where the node is located
- Swap Memory:The swap memory size of the machine where the node is located
-- File Number: Number of files managed by nodes
+- File Count: Number of files managed by nodes

#### Performance

- Session Idle Time:The total idle time and total busy time of the session connection of the node
-- Client Connection: The client connection status of the node, including the total number of connections and the number of active connections
-- Time Consumed Of Operation: The time consumption of various types of node operations, including average and P99
-- Average Time Consumed Of Interface: The average time consumption of each thrust interface of a node
-- P99 Time Consumed Of Interface: P99 time consumption of various thrust interfaces of nodes
-- Task Number: The number of system tasks for each node
-- Average Time Consumed of Task: The average time spent on various system tasks of a node
-- P99 Time Consumed of Task: P99 time consumption for various system tasks of nodes
-- Operation Per Second: The number of operations per second for a node
+- Client Connections: The client connection status of the node, including the total number of connections and the number of active connections
+- Operation Latency: The time consumption of various types of node operations, including average and P99
+- Average Interface Latency: The average time consumption of each Thrift interface of a node
+- P99 Interface Latency: P99 time consumption of various Thrift interfaces of nodes
+- Total Tasks: The number of system tasks for each node
+- Average Task Latency: The average time spent on various system tasks of a node
+- P99 Task Latency: P99 time consumption for various system tasks of nodes
+- Operations Per Second: The number of operations per second for a node
- Mainstream Process
- - Operation Per Second Of Stage: The number of operations per second for each stage of the node's main process
- - Average Time Consumed Of Stage: The average time consumption of each stage in the main process of a node
- - P99 Time Consumed Of Stage: P99 time consumption for each stage of the node's main process
+ - Operations Per Second (Stage-wise): The number of operations per second for each stage of the node's main process
+ - Average Stage Latency: The average time consumption of each stage in the main process of a node
+ - P99 Stage Latency: P99 time consumption for each stage of the node's main process
- Schedule Stage
- - OPS Of Schedule: The number of operations per second in each sub stage of the node schedule stage
- - Average Time Consumed Of Schedule Stage:The average time consumption of each sub stage in the node schedule stage
- - P99 Time Consumed Of Schedule Stage: P99 time consumption for each sub stage of the schedule 
stage of the node + - Schedule Operations Per Second: The number of operations per second in each sub stage of the node schedule stage + - Average Schedule Stage Latency:The average time consumption of each sub stage in the node schedule stage + - P99 Schedule Stage Latency: P99 time consumption for each sub stage of the schedule stage of the node - Local Schedule Sub Stages - - OPS Of Local Schedule Stage: The number of operations per second in each sub stage of the local schedule node - - Average Time Consumed Of Local Schedule Stage: The average time consumption of each sub stage in the local schedule stage of the node - - P99 Time Consumed Of Local Schedule Stage: P99 time consumption for each sub stage of the local schedule stage of the node + - Local Schedule Operations Per Second: The number of operations per second in each sub stage of the local schedule node + - Average Local Schedule Stage Latency: The average time consumption of each sub stage in the local schedule stage of the node + - P99 Local Schedule Latency: P99 time consumption for each sub stage of the local schedule stage of the node - Storage Stage - - OPS Of Storage Stage: The number of operations per second in each sub stage of the node storage stage - - Average Time Consumed Of Storage Stage: Average time consumption of each sub stage in the node storage stage - - P99 Time Consumed Of Storage Stage: P99 time consumption for each sub stage of node storage stage + - Storage Operations Per Second: The number of operations per second in each sub stage of the node storage stage + - Average Storage Stage Latency: Average time consumption of each sub stage in the node storage stage + - P99 Storage Stage Latency: P99 time consumption for each sub stage of node storage stage - Engine Stage - - OPS Of Engine Stage: The number of operations per second in each sub stage of the node engine stage - - Average Time Consumed Of Engine Stage: The average time consumption of each sub stage in the engine stage of a node - - P99 Time Consumed Of Engine Stage: P99 time consumption of each sub stage in the node engine stage + - Engine Operations Per Second: The number of operations per second in each sub stage of the node engine stage + - Average Engine Stage Latency: The average time consumption of each sub stage in the engine stage of a node + - P99 Engine Stage Latency: P99 time consumption of each sub stage in the node engine stage #### System -- CPU Load: CPU load of nodes -- CPU Time Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores -- GC Time Per Minute:The average GC time per minute for nodes, including YGC and FGC +- CPU Utilization: CPU load of nodes +- CPU Latency Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores +- GC Latency Per Minute:The average GC time per minute for nodes, including YGC and FGC - Heap Memory: Node's heap memory usage -- Off Heap Memory: Non heap memory usage of nodes -- The Number Of Java Thread: Number of Java threads on nodes +- Off-Heap Memory: Non heap memory usage of nodes +- Total Java Threads: Number of Java threads on nodes - File Count:Number of files managed by nodes - File Size: Node management file size situation -- Log Number Per Minute: Different types of logs per minute for nodes +- Logs Per Minute: Different types of logs per minute for nodes ### 3.3 ConfigNode Dashboard @@ -359,13 +359,13 @@ This panel displays the performance of all management nodes in the cluster, incl - 
Database Count: Number of databases for nodes
- Region
- DataRegion Count:Number of DataRegions for nodes
- - DataRegion Current Status: The state of the DataRegion of the node
+ - DataRegion Status: The state of the DataRegion of the node
- SchemaRegion Count: Number of SchemeRegions for nodes
- - SchemaRegion Current Status: The state of the SchemeRegion of the node
-- System Memory: The system memory size of the node
-- Swap Memory: Node's swap memory size
-- ConfigNodes: The running status of the ConfigNode in the cluster where the node is located
-- DataNodes:The DataNode situation of the cluster where the node is located
+ - SchemaRegion Status: The state of the SchemaRegion of the node
+- System Memory Utilization: The system memory size of the node
+- Swap Memory Utilization: Node's swap memory size
+- ConfigNodes Status: The running status of the ConfigNode in the cluster where the node is located
+- DataNodes Status: The status of the DataNodes in the cluster where the node is located
- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load

#### NodeInfo

@@ -381,15 +381,15 @@ This panel displays the performance of all management nodes in the cluster, incl
#### Protocol

- Client Count
- - Active Client Num: The number of active clients in each thread pool of a node
- - Idle Client Num: The number of idle clients in each thread pool of a node
- - Borrowed Client Count: Number of borrowed clients in each thread pool of the node
- - Created Client Count: Number of created clients for each thread pool of the node
- - Destroyed Client Count: The number of destroyed clients in each thread pool of the node
+ - Active Clients: The number of active clients in each thread pool of a node
+ - Idle Clients: The number of idle clients in each thread pool of a node
+ - Borrowed Clients Per Second: Number of borrowed clients in each thread pool of the node
+ - Created Clients Per Second: Number of created clients for each thread pool of the node
+ - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node
- Client time situation
- - Client Mean Active Time: The average active time of clients in each thread pool of a node
- - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node
- - Client Mean Idle Time: The average idle time of clients in each thread pool of a node
+ - Average Client Active Time: The average active time of clients in each thread pool of a node
+ - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node
+ - Average Client Idle Time: The average idle time of clients in each thread pool of a node

#### Partition Table

@@ -402,11 +402,11 @@ This panel displays the performance of all management nodes in the cluster, incl
#### Consensus

-- Ratis Stage Time: The time consumption of each stage of the node's Ratis
-- Write Log Entry: The time required to write a log for the Ratis of a node
-- Remote / Local Write Time: The time consumption of remote and local writes for the Ratis of nodes
-- Remote / Local Write QPS: Remote and local QPS written to node Ratis
-- RatisConsensus Memory: Memory usage of Node Ratis consensus protocol
+- Ratis Stage Latency: The time consumption of each stage of the node's Ratis
+- Write Log Entry Latency: The time required to write a log for the Ratis of a node
+- Remote / Local Write Latency: The time consumption of remote and local writes for the Ratis of nodes
+- Remote / Local Write Throughput: Remote and local QPS written to node Ratis
+- RatisConsensus Memory Utilization: Memory usage of Node Ratis consensus protocol

### 3.4 DataNode Dashboard

@@ -414,82 +414,82 @@ This panel displays the monitoring status of all data nodes in the cluster, incl
#### Node Overview

-- The Number Of Entity: Entity situation of node management
-- Write Point Per Second: The write speed per second of the node
+- Total Managed Entities: The number of entities managed by the node
+- Write Throughput: The write speed per second of the node
- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases.

#### Protocol

- Node Operation Time Consumption
- - The Time Consumed Of Operation (avg): The average time spent on various operations of a node
- - The Time Consumed Of Operation (50%): The median time spent on various operations of a node
- - The Time Consumed Of Operation (99%): P99 time consumption for various operations of nodes
+ - Average Operation Latency: The average time spent on various operations of a node
+ - P50 Operation Latency: The median time spent on various operations of a node
+ - P99 Operation Latency: P99 time consumption for various operations of nodes
- Thrift Statistics
- - The QPS Of Interface: QPS of various Thrift interfaces of nodes
- - The Avg Time Consumed Of Interface: The average time consumption of each Thrift interface of a node
- - Thrift Connection: The number of Thrfit connections of each type of node
- - Thrift Active Thread: The number of active Thrift connections for each type of node
+ - Thrift Interface QPS: QPS of various Thrift interfaces of nodes
+ - Average Thrift Interface Latency: The average time consumption of each Thrift interface of a node
+ - Thrift Connections: The number of Thrift connections of each type of node
+ - Active Thrift Threads: The number of active Thrift connections for each type of node
- Client Statistics
- - Active Client Num: The number of active clients in each thread pool of a node
- - Idle Client Num: The number of idle clients in each thread pool of a node
- - Borrowed Client Count:Number of borrowed clients for each thread pool of a node
- - Created Client Count: Number of created clients for each thread pool of the node
- - Destroyed Client Count: The number of destroyed clients in each thread pool of the node
- - Client Mean Active Time: The average active time of clients in each thread pool of a node
- - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node
- - Client Mean Idle Time: The average idle time of clients in each thread pool of a node
+ - Active Clients: The number of active clients in each thread pool of a node
+ - Idle Clients: The number of idle clients in each thread pool of a node
+ - Borrowed Clients Per Second: Number of borrowed clients for each thread pool of a node
+ - Created Clients Per Second: Number of created clients for each thread pool of the node
+ - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node
+ - Average Client Active Time: The average active time of clients in each thread pool of a node
+ - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node
+ - Average Client Idle Time: The average idle time of clients in each thread pool of a node

#### Storage Engine

- File Count: Number of files of various types managed by nodes
- File Size: Node management of various types of file sizes
- TsFile
- - TsFile Total Size In Each Level: The total size of TsFile files at each level of node management
- - TsFile Count In Each Level: Number of TsFile files at each level of node management
- - Avg TsFile Size In Each Level: The average size of TsFile files at each level of node management
-- Task Number: Number of Tasks for Nodes
-- The Time Consumed of Task: The time consumption of tasks for nodes
+ - Total TsFile Size Per Level: The total size of TsFile files at each level of node management
+ - TsFile Count Per Level: Number of TsFile files at each level of node management
+ - Average TsFile Size Per Level: The average size of TsFile files at each level of node management
+- Total Tasks: Number of tasks for nodes
+- Task Latency: The time consumption of tasks for nodes
- Compaction
- - Compaction Read And Write Per Second: The merge read and write speed of nodes per second
- - Compaction Number Per Minute: The number of merged nodes per minute
- - Compaction Process Chunk Status: The number of Chunks in different states merged by nodes
- - Compacted Point Num Per Minute: The number of merged nodes per minute
+ - Compaction Read/Write Throughput: The compaction read and write speed of the node per second
+ - Compactions Per Minute: The number of compactions performed by the node per minute
+ - Compaction Chunk Status: The number of Chunks in different states processed by compaction on the node
+ - Compacted-Points Per Minute: The number of data points compacted by the node per minute

#### Write Performance

-- Write Cost(avg): Average node write time, including writing wal and memtable
-- Write Cost(50%): Median node write time, including writing wal and memtable
-- Write Cost(99%): P99 for node write time, including writing wal and memtable
+- Average Write Latency: Average node write time, including writing wal and memtable
+- P50 Write Latency: Median node write time, including writing wal and memtable
+- P99 Write Latency: P99 for node write time, including writing wal and memtable
- WAL
- WAL File Size: Total size of WAL files managed by nodes
- - WAL File Num:Number of WAL files managed by nodes
- - WAL Nodes Num: Number of WAL nodes managed by nodes
- - Make Checkpoint Costs: The time required to create various types of CheckPoints for nodes
- - WAL Serialize Total Cost: Total time spent on node WAL serialization
+ - WAL Files: Number of WAL files managed by nodes
+ - WAL Nodes: Number of WAL nodes managed by nodes
+ - Checkpoint Creation Time: The time required to create various types of CheckPoints for nodes
+ - WAL Serialization Time (Total): Total time spent on node WAL serialization
- Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster
- Serialize One WAL Info Entry Cost: Node serialization time for a WAL Info Entry
- Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot
- Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush
- - Effective Info Ratio Of WALNode: The effective information ratio of different WALNodes of nodes
+ - WALNode Effective Info Ratio: The effective information ratio of different WALNodes of nodes
- WAL Buffer
- - WAL Buffer Cost: Node WAL flush SyncBuffer takes time, including both synchronous and asynchronous options
+ - WAL Buffer Latency: Time consumed by the node WAL flushing the SyncBuffer, in both synchronous and asynchronous modes
- WAL Buffer Used Ratio: The usage rate of the WAL Buffer of the node
- WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node
- Flush Statistics
- - Flush MemTable Cost(avg): The total time spent on node Flush and the average time spent on each sub stage
- - Flush MemTable Cost(50%): The total time spent on node Flush and the median time spent on each sub stage
- - Flush MemTable Cost(99%): The total time spent on node Flush and the P99 time spent on each sub stage
- - Flush Sub Task Cost(avg): The average time consumption of each node's Flush subtask, including sorting, encoding, and IO stages
- - Flush Sub Task Cost(50%): The median time consumption of each subtask of the Flush node, including sorting, encoding, and IO stages
- - Flush Sub Task Cost(99%): The average subtask time P99 for Flush of nodes, including sorting, encoding, and IO stages
+ - Average Flush Latency: The total time spent on node Flush and the average time spent on each sub stage
+ - P50 Flush Latency: The total time spent on node Flush and the median time spent on each sub stage
+ - P99 Flush Latency: The total time spent on node Flush and the P99 time spent on each sub stage
+ - Average Flush Subtask Latency: The average time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages
+ - P50 Flush Subtask Latency: The median time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages
+ - P99 Flush Subtask Latency: The P99 time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages
- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node
- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes
-- Tsfile Compression Ratio Of Flushing MemTable: The compression rate of TsFile corresponding to node flashing Memtable
-- Flush TsFile Size Of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions
-- Size Of Flushing MemTable: The size of the Memtable for node disk flushing
-- Points Num Of Flushing MemTable: The number of points when flashing data in different DataRegions of a node
-- Series Num Of Flushing MemTable: The number of time series when flashing Memtables in different DataRegions of a node
-- Average Point Num Of Flushing MemChunk: The average number of disk flushing points for node MemChunk
+- Tsfile Compression Ratio of Flushing MemTable: The compression ratio of the TsFile corresponding to the MemTable flushed by the node
+- Flush TsFile Size of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions
+- Size of Flushing MemTable: The size of the MemTable flushed to disk by the node
+- Points Num of Flushing MemTable: The number of points when flushing data in different DataRegions of a node
+- Series Num of Flushing MemTable: The number of time series when flushing MemTables in different DataRegions of a node
+- Average Point Num of Flushing MemChunk: The average number of points per MemChunk flushed by the node

#### Schema Engine

@@ -523,117 +523,117 @@ This panel displays the monitoring status of all data nodes in the cluster, incl
#### Query Engine

- Time Consumption In Each Stage
- - The time consumed of query plan stages(avg): The average time spent on node queries at each stage
- - The time consumed of query plan stages(50%): Median time spent on node queries at each stage
- - The time consumed of query plan stages(99%): P99 time consumption for node query 
at each stage + - Average Query Plan Execution Time: The average time spent on node queries at each stage + - P50 Query Plan Execution Time: Median time spent on node queries at each stage + - P99 Query Plan Execution Time: P99 time consumption for node query at each stage - Execution Plan Distribution Time - - The time consumed of plan dispatch stages(avg): The average time spent on node query execution plan distribution - - The time consumed of plan dispatch stages(50%): Median time spent on node query execution plan distribution - - The time consumed of plan dispatch stages(99%): P99 of node query execution plan distribution time + - Average Query Plan Dispatch Time: The average time spent on node query execution plan distribution + - P50 Query Plan Dispatch Time: Median time spent on node query execution plan distribution + - P99 Query Plan Dispatch Time: P99 of node query execution plan distribution time - Execution Plan Execution Time - - The time consumed of query execution stages(avg): The average execution time of node query execution plan - - The time consumed of query execution stages(50%):Median execution time of node query execution plan - - The time consumed of query execution stages(99%): P99 of node query execution plan execution time + - Average Query Execution Time: The average execution time of node query execution plan + - P50 Query Execution Time:Median execution time of node query execution plan + - P99 Query Execution Time: P99 of node query execution plan execution time - Operator Execution Time - - The time consumed of operator execution stages(avg): The average execution time of node query operators - - The time consumed of operator execution(50%): Median execution time of node query operator - - The time consumed of operator execution(99%): P99 of node query operator execution time + - Average Query Operator Execution Time: The average execution time of node query operators + - P50 Query Operator Execution Time: Median execution time of node query operator + - P99 Query Operator Execution Time: P99 of node query operator execution time - Aggregation Query Computation Time - - The time consumed of query aggregation(avg): The average computation time for node aggregation queries - - The time consumed of query aggregation(50%): Median computation time for node aggregation queries - - The time consumed of query aggregation(99%): P99 of node aggregation query computation time + - Average Query Aggregation Execution Time: The average computation time for node aggregation queries + - P50 Query Aggregation Execution Time: Median computation time for node aggregation queries + - P99 Query Aggregation Execution Time: P99 of node aggregation query computation time - File/Memory Interface Time Consumption - - The time consumed of query scan(avg): The average time spent querying file/memory interfaces for nodes - - The time consumed of query scan(50%): Median time spent querying file/memory interfaces for nodes - - The time consumed of query scan(99%): P99 time consumption for node query file/memory interface + - Average Query Scan Execution Time: The average time spent querying file/memory interfaces for nodes + - P50 Query Scan Execution Time: Median time spent querying file/memory interfaces for nodes + - P99 Query Scan Execution Time: P99 time consumption for node query file/memory interface - Number Of Resource Visits - - The usage of query resource(avg): The average number of resource visits for node queries - - The usage of query resource(50%): Median number of 
resource visits for node queries - - The usage of query resource(99%): P99 for node query resource access quantity + - Average Query Resource Utilization: The average number of resource visits for node queries + - P50 Query Resource Utilization: Median number of resource visits for node queries + - P99 Query Resource Utilization: P99 for node query resource access quantity - Data Transmission Time - - The time consumed of query data exchange(avg): The average time spent on node query data transmission - - The time consumed of query data exchange(50%): Median query data transmission time for nodes - - The time consumed of query data exchange(99%): P99 for node query data transmission time + - Average Query Data Exchange Latency: The average time spent on node query data transmission + - P50 Query Data Exchange Latency: Median query data transmission time for nodes + - P99 Query Data Exchange Latency: P99 for node query data transmission time - Number Of Data Transfers - - The count of Data Exchange(avg): The average number of data transfers queried by nodes - - The count of Data Exchange: The quantile of the number of data transfers queried by nodes, including the median and P99 + - Average Query Data Exchange Count: The average number of data transfers queried by nodes + - Query Data Exchange Count: The quantile of the number of data transfers queried by nodes, including the median and P99 - Task Scheduling Quantity And Time Consumption - - The number of query queue: Node query task scheduling quantity - - The time consumed of query schedule time(avg): The average time spent on scheduling node query tasks - - The time consumed of query schedule time(50%): Median time spent on node query task scheduling - - The time consumed of query schedule time(99%): P99 of node query task scheduling time + - Query Queue Length: Node query task scheduling quantity + - Average Query Scheduling Latency: The average time spent on scheduling node query tasks + - P50 Query Scheduling Latency: Median time spent on node query task scheduling + - P99 Query Scheduling Latency: P99 of node query task scheduling time #### Query Interface - Load Time Series Metadata - - The time consumed of load timeseries metadata(avg): The average time taken for node queries to load time series metadata - - The time consumed of load timeseries metadata(50%): Median time spent on loading time series metadata for node queries - - The time consumed of load timeseries metadata(99%): P99 time consumption for node query loading time series metadata + - Average Timeseries Metadata Load Time: The average time taken for node queries to load time series metadata + - P50 Timeseries Metadata Load Time: Median time spent on loading time series metadata for node queries + - P99 Timeseries Metadata Load Time: P99 time consumption for node query loading time series metadata - Read Time Series - - The time consumed of read timeseries metadata(avg): The average time taken for node queries to read time series - - The time consumed of read timeseries metadata(50%): The median time taken for node queries to read time series - - The time consumed of read timeseries metadata(99%): P99 time consumption for node query reading time series + - Average Timeseries Metadata Read Time: The average time taken for node queries to read time series + - P50 Timeseries Metadata Read Time: The median time taken for node queries to read time series + - P99 Timeseries Metadata Read Time: P99 time consumption for node query reading time series - Modify Time Series 
Metadata
- - The time consumed of timeseries metadata modification(avg):The average time taken for node queries to modify time series metadata
- - The time consumed of timeseries metadata modification(50%): Median time spent on querying and modifying time series metadata for nodes
- - The time consumed of timeseries metadata modification(99%): P99 time consumption for node query and modification of time series metadata
+ - Average Timeseries Metadata Modification Time: The average time taken for node queries to modify time series metadata
+ - P50 Timeseries Metadata Modification Time: Median time spent on querying and modifying time series metadata for nodes
+ - P99 Timeseries Metadata Modification Time: P99 time consumption for node query and modification of time series metadata
- Load Chunk Metadata List
- - The time consumed of load chunk metadata list(avg): The average time it takes for node queries to load Chunk metadata lists
- - The time consumed of load chunk metadata list(50%): Median time spent on node query loading Chunk metadata list
- - The time consumed of load chunk metadata list(99%): P99 time consumption for node query loading Chunk metadata list
+ - Average Chunk Metadata List Load Time: The average time it takes for node queries to load Chunk metadata lists
+ - P50 Chunk Metadata List Load Time: Median time spent on node query loading Chunk metadata list
+ - P99 Chunk Metadata List Load Time: P99 time consumption for node query loading Chunk metadata list
- Modify Chunk Metadata
- - The time consumed of chunk metadata modification(avg): The average time it takes for node queries to modify Chunk metadata
- - The time consumed of chunk metadata modification(50%): The total number of bits spent on modifying Chunk metadata for node queries
- - The time consumed of chunk metadata modification(99%): P99 time consumption for node query and modification of Chunk metadata
+ - Average Chunk Metadata Modification Time: The average time it takes for node queries to modify Chunk metadata
+ - P50 Chunk Metadata Modification Time: The median time spent on modifying Chunk metadata for node queries
+ - P99 Chunk Metadata Modification Time: P99 time consumption for node query and modification of Chunk metadata
- Filter According To Chunk Metadata
- - The time consumed of chunk metadata filter(avg): The average time spent on node queries filtering by Chunk metadata
- - The time consumed of chunk metadata filter(50%): Median filtering time for node queries based on Chunk metadata
- - The time consumed of chunk metadata filter(99%): P99 time consumption for node query filtering based on Chunk metadata
+ - Average Chunk Metadata Filtering Time: The average time spent on node queries filtering by Chunk metadata
+ - P50 Chunk Metadata Filtering Time: Median filtering time for node queries based on Chunk metadata
+ - P99 Chunk Metadata Filtering Time: P99 time consumption for node query filtering based on Chunk metadata
- Constructing Chunk Reader
- - The time consumed of construct chunk reader(avg): The average time spent on constructing Chunk Reader for node queries
- - The time consumed of construct chunk reader(50%): Median time spent on constructing Chunk Reader for node queries
- - The time consumed of construct chunk reader(99%): P99 time consumption for constructing Chunk Reader for node queries
+ - Average Chunk Reader Construction Time: The average time spent on constructing Chunk Reader for node queries
+ - P50 Chunk Reader Construction Time: Median time spent on constructing
Chunk Reader for node queries + - P99 Chunk Reader Construction Time: P99 time consumption for constructing Chunk Reader for node queries - Read Chunk - - The time consumed of read chunk(avg): The average time taken for node queries to read Chunks - - The time consumed of read chunk(50%): Median time spent querying nodes to read Chunks - - The time consumed of read chunk(99%): P99 time spent on querying and reading Chunks for nodes + - Average Chunk Read Time: The average time taken for node queries to read Chunks + - P50 Chunk Read Time: Median time spent querying nodes to read Chunks + - P99 Chunk Read Time: P99 time spent on querying and reading Chunks for nodes - Initialize Chunk Reader - - The time consumed of init chunk reader(avg): The average time spent initializing Chunk Reader for node queries - - The time consumed of init chunk reader(50%): Median time spent initializing Chunk Reader for node queries - - The time consumed of init chunk reader(99%):P99 time spent initializing Chunk Reader for node queries + - Average Chunk Reader Initialization Time: The average time spent initializing Chunk Reader for node queries + - P50 Chunk Reader Initialization Time: Median time spent initializing Chunk Reader for node queries + - P99 Chunk Reader Initialization Time:P99 time spent initializing Chunk Reader for node queries - Constructing TsBlock Through Page Reader - - The time consumed of build tsblock from page reader(avg): The average time it takes for node queries to construct TsBlock through Page Reader - - The time consumed of build tsblock from page reader(50%): The median time spent on constructing TsBlock through Page Reader for node queries - - The time consumed of build tsblock from page reader(99%):Node query using Page Reader to construct TsBlock time-consuming P99 + - Average TsBlock Construction Time from Page Reader: The average time it takes for node queries to construct TsBlock through Page Reader + - P50 TsBlock Construction Time from Page Reader: The median time spent on constructing TsBlock through Page Reader for node queries + - P99 TsBlock Construction Time from Page Reader:Node query using Page Reader to construct TsBlock time-consuming P99 - Query the construction of TsBlock through Merge Reader - - The time consumed of build tsblock from merge reader(avg): The average time taken for node queries to construct TsBlock through Merge Reader - - The time consumed of build tsblock from merge reader(50%): The median time spent on constructing TsBlock through Merge Reader for node queries - - The time consumed of build tsblock from merge reader(99%): Node query using Merge Reader to construct TsBlock time-consuming P99 + - Average TsBlock Construction Time from Merge Reader: The average time taken for node queries to construct TsBlock through Merge Reader + - P50 TsBlock Construction Time from Merge Reader: The median time spent on constructing TsBlock through Merge Reader for node queries + - P99 TsBlock Construction Time from Merge Reader: Node query using Merge Reader to construct TsBlock time-consuming P99 #### Query Data Exchange The data exchange for the query is time-consuming. 
- Obtain TsBlock through source handle - - The time consumed of source handle get tsblock(avg): The average time taken for node queries to obtain TsBlock through source handle - - The time consumed of source handle get tsblock(50%):Node query obtains the median time spent on TsBlock through source handle - - The time consumed of source handle get tsblock(99%): Node query obtains TsBlock time P99 through source handle + - Average Source Handle TsBlock Retrieval Time: The average time taken for node queries to obtain TsBlock through source handle + - P50 Source Handle TsBlock Retrieval Time:Node query obtains the median time spent on TsBlock through source handle + - P99 Source Handle TsBlock Retrieval Time: Node query obtains TsBlock time P99 through source handle - Deserialize TsBlock through source handle - - The time consumed of source handle deserialize tsblock(avg): The average time taken for node queries to deserialize TsBlock through source handle - - The time consumed of source handle deserialize tsblock(50%): The median time taken for node queries to deserialize TsBlock through source handle - - The time consumed of source handle deserialize tsblock(99%): P99 time spent on deserializing TsBlock through source handle for node query + - Average Source Handle TsBlock Deserialization Time: The average time taken for node queries to deserialize TsBlock through source handle + - P50 Source Handle TsBlock Deserialization Time: The median time taken for node queries to deserialize TsBlock through source handle + - P99 Source Handle TsBlock Deserialization Time: P99 time spent on deserializing TsBlock through source handle for node query - Send TsBlock through sink handle - - The time consumed of sink handle send tsblock(avg): The average time taken for node queries to send TsBlock through sink handle - - The time consumed of sink handle send tsblock(50%): Node query median time spent sending TsBlock through sink handle - - The time consumed of sink handle send tsblock(99%): Node query sends TsBlock through sink handle with a time consumption of P99 + - Average Sink Handle TsBlock Transmission Time: The average time taken for node queries to send TsBlock through sink handle + - P50 Sink Handle TsBlock Transmission Time: Node query median time spent sending TsBlock through sink handle + - P99 Sink Handle TsBlock Transmission Time: Node query sends TsBlock through sink handle with a time consumption of P99 - Callback data block event - - The time consumed of on acknowledge data block event task(avg): The average time taken for node query callback data block event - - The time consumed of on acknowledge data block event task(50%): Median time spent on node query callback data block event - - The time consumed of on acknowledge data block event task(99%): P99 time consumption for node query callback data block event + - Average Data Block Event Acknowledgment Time: The average time taken for node query callback data block event + - P50 Data Block Event Acknowledgment Time: Median time spent on node query callback data block event + - P99 Data Block Event Acknowledgment Time: P99 time consumption for node query callback data block event - Get Data Block Tasks - - The time consumed of get data block task(avg): The average time taken for node queries to obtain data block tasks - - The time consumed of get data block task(50%): The median time taken for node queries to obtain data block tasks - - The time consumed of get data block task(99%): P99 time consumption for node query to obtain data block 
task
+ - Average Data Block Task Retrieval Time: The average time taken for node queries to obtain data block tasks
+ - P50 Data Block Task Retrieval Time: The median time taken for node queries to obtain data block tasks
+ - P99 Data Block Task Retrieval Time: P99 time consumption for node query to obtain data block task
#### Query Related Resource
@@ -643,40 +643,40 @@ The data exchange for the query is time-consuming.
- Coordinator: The number of queries recorded on the node
- MemoryPool Size: Node query related memory pool situation
- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values
-- DriverScheduler: Number of queue tasks related to node queries
+- DriverScheduler Count: Number of queue tasks related to node queries
#### Consensus - IoT Consensus
- Memory Usage
- IoTConsensus Used Memory: The memory usage of the node's IoTConsensus, including total memory usage, queue usage, and synchronization usage
- Synchronization Status Between Nodes
- - IoTConsensus Sync Index: SyncIndex size for different DataRegions of IoT Consumption nodes
+ - IoTConsensus Sync Index Size: SyncIndex size for different DataRegions of the node's IoTConsensus
- IoTConsensus Overview: The total synchronization gap and number of cached requests of the node's IoTConsensus
- - IoTConsensus Search Index Rate: The growth rate of writing SearchIndex for different DataRegions of IoT Consumer nodes
- - IoTConsensus Safe Index Rate: The growth rate of synchronous SafeIndex for different DataRegions of IoT Consumer nodes
+ - IoTConsensus Search Index Growth Rate: The growth rate of the write SearchIndex for different DataRegions of the node's IoTConsensus
+ - IoTConsensus Safe Index Growth Rate: The growth rate of the synchronized SafeIndex for different DataRegions of the node's IoTConsensus
- IoTConsensus LogDispatcher Request Size: The size of the requests the node's IoTConsensus sends when synchronizing different DataRegions to other nodes
- Sync Lag: The synchronization gap of different DataRegions of the node's IoTConsensus
- Min Peer Sync Lag: The minimum synchronization gap between different DataRegions and their different replicas for the node's IoTConsensus
- - Sync Speed Diff Of Peers: The maximum difference in synchronization from different DataRegions to different replicas for node IoT Consumption
+ - Peer Sync Speed Difference: The maximum difference in synchronization from different DataRegions to different replicas for the node's IoTConsensus
- IoTConsensus LogEntriesFromWAL Rate: The rate at which the node's IoTConsensus obtains logs from the WAL for different DataRegions
- IoTConsensus LogEntriesFromQueue Rate: The rate at which the node's IoTConsensus retrieves logs from the queue for different DataRegions
- Different Execution Stages Take Time
- - The Time Consumed Of Different Stages (avg): The average time spent on different execution stages of node IoT Consumus
- - The Time Consumed Of Different Stages (50%): The median time spent on different execution stages of node IoT Consusus
- - The Time Consumed Of Different Stages (99%):P99 of the time consumption for different execution stages of node IoT Consusus
+ - The Time Consumed of Different Stages (avg): The average time spent on different execution stages of the node's IoTConsensus
+ - The Time Consumed of Different Stages (50%): The median time spent on different execution stages of the node's IoTConsensus
+ - The Time Consumed of Different Stages (99%): P99 of the time consumption for different execution stages of the node's IoTConsensus
#### Consensus - DataRegion Ratis Consensus
-- Ratis Stage Time: The time consumption of different stages of node Ratis
-- Write Log Entry: The time consumption of writing logs at different stages of node Ratis
-- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely
-- Remote / Local Write QPS: QPS written by node Ratis locally or remotely
-- RatisConsensus Memory:Memory usage of node Ratis
+- Ratis Consensus Stage Latency: The time consumption of different stages of node Ratis
+- Ratis Log Write Latency: The time consumption of writing logs at different stages of node Ratis
+- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely
+- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely
+- RatisConsensus Memory Usage: Memory usage of node Ratis
#### Consensus - SchemaRegion Ratis Consensus
-- Ratis Stage Time: The time consumption of different stages of node Ratis
-- Write Log Entry: The time consumption for writing logs at each stage of node Ratis
-- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotelyThe time it takes for node Ratis to write locally or remotely
-- Remote / Local Write QPS: QPS written by node Ratis locally or remotely
-- RatisConsensus Memory: Node Ratis Memory Usage
\ No newline at end of file
+- RatisConsensus Stage Latency: The time consumption of different stages of node Ratis
+- Ratis Log Write Latency: The time consumption for writing logs at each stage of node Ratis
+- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely
+- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely
+- RatisConsensus Memory Usage: Node Ratis Memory Usage
\ No newline at end of file
diff --git a/src/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md
index 8cabc7278..e897012e0 100644
--- a/src/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md
+++ b/src/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md
@@ -184,498 +184,499 @@ Ensure the URL for Prometheus is correct. Click "Save & Test". If the message "D
![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png)
-## 3. Appendix, Detailed Monitoring Metrics
+## 3. Appendix, Detailed Explanation of Monitoring Indicators
### 3.1 System Dashboard
-This dashboard displays the current system's **CPU****, memory, disk, and network resource****s**, as well as some **JVM****-related metrics**.
-
-#### 3.1.1 CPU
-
-- **CPU Core:** Number of CPU cores.
-- **CPU Load:**
- - **System CPU Load:** The average CPU load and utilization of the entire system during the sampling period.
- - **Process CPU Load:** The percentage of CPU resources occupied by the IoTDB process during the sampling period.
-- **CPU Time Per Minute:** The total CPU time consumed by all processes in the system per minute.
-
-#### 3.1.2 Memory
-
-- **System Memory:** Current system memory usage.
- - **Committed VM Size:** Virtual memory size allocated by the operating system to running processes.
- - **Total Physical Memory****:** Total available physical memory in the system.
- - **Used Physical Memory****:** The total amount of memory currently in use, including memory actively used by processes and memory occupied by the operating system for buffers and caching.
-- **System Swap Memory:** The amount of swap space memory in use. -- **Process Memory:** Memory usage of the IoTDB process. - - **Max Memory:** The maximum amount of memory that the IoTDB process can request from the OS (configured in the `datanode-env`/`confignode-env` configuration files). - - **Total Memory:** The total amount of memory currently allocated by the IoTDB process from the OS. - - **Used Memory:** The total amount of memory currently in use by the IoTDB process. - -#### 3.1.3 Disk - -- **Disk Space:** - - **Total Disk Space:** Maximum disk space available for IoTDB. - - **Used Disk Space:** Disk space currently occupied by IoTDB. -- **Log Number Per Minute:** Average number of IoTDB logs generated per minute, categorized by log levels. -- **File Count:** The number of files related to IoTDB. - - **All:** Total number of files. - - **TsFile:** Number of TsFiles. - - **Seq:** Number of sequential TsFiles. - - **Unseq:** Number of unordered TsFiles. - - **WAL:** Number of WAL (Write-Ahead Log) files. - - **Cross-Temp:** Number of temporary files generated during cross-space merge operations. - - **Inner-Seq-Temp:** Number of temporary files generated during sequential-space merge operations. - - **Inner-Unseq-Temp:** Number of temporary files generated during unordered-space merge operations. - - **Mods:** Number of tombstone files. -- **Open File Count:** Number of open file handles in the system. -- **File Size:** The size of IoTDB-related files, with each sub-item representing the size of a specific file type. -- **Disk I/O Busy Rate:** Equivalent to the `%util` metric in `iostat`, indicating the level of disk utilization. Each sub-item corresponds to a specific disk. -- **Disk I/O Throughput****:** Average I/O throughput of system disks over a given period. Each sub-item corresponds to a specific disk. -- **Disk I/O Ops:** Equivalent to `r/s`, `w/s`, `rrqm/s`, and `wrqm/s` in `iostat`, representing the number of I/O operations per second. -- **Disk I/O Avg Time:** Equivalent to the `await` metric in `iostat`, representing the average latency of each I/O request, recorded separately for read and write operations. -- **Disk I/O Avg Size:** Equivalent to the `avgrq-sz` metric in `iostat`, indicating the average size of each I/O request, recorded separately for read and write operations. -- **Disk I/O Avg Queue Size:** Equivalent to `avgqu-sz` in `iostat`, representing the average length of the I/O request queue. -- **I/O System Call Rate:** Frequency of read/write system calls invoked by the process, similar to IOPS. -- **I/O Throughput****:** I/O throughput of the process, divided into `actual_read/write` and `attempt_read/write`. `Actual read` and `actual write` refer to the number of bytes actually written to or read from the storage device, excluding those handled by the Page Cache. - -#### 3.1.4 JVM - -- **GC Time Percentage:** Percentage of time spent on garbage collection (GC) by the JVM in the past minute. -- **GC Allocated/Promoted Size Detail:** The average size of objects promoted to the old generation per minute, as well as newly allocated objects in the young/old generation and non-generational areas. -- **GC Data Size Detail:** Size of long-lived objects in the JVM and the maximum allowed size for each generation. -- **Heap Memory:** JVM heap memory usage. - - **Maximum Heap Memory:** Maximum available heap memory for the JVM. - - **Committed Heap Memory:** Committed heap memory size for the JVM. 
- - **Used Heap Memory:** The amount of heap memory currently in use.
- - **PS Eden Space:** Size of the PS Young generation's Eden space.
- - **PS Old Space:** Size of the PS Old generation.
- - **PS Survivor Space:** Size of the PS Survivor space.
-- **O****ff Heap Memory:** Off-heap memory usage.
- - **Direct Memory:** The amount of direct memory used.
- - **Mapped Memory:** The amount of memory used for mapped files.
-- **GC Number Per Minute:** Average number of garbage collections (YGC and FGC) performed per minute.
-- **GC Time Per Minute:** Average time spent on garbage collection (YGC and FGC) per minute.
-- **GC Number Per Minute Detail:** Average number of garbage collections performed per minute due to different causes.
-- **GC Time Per Minute Detail:** Average time spent on garbage collection per minute due to different causes.
-- **Time Consumed of Compilation Per Minute:** Total time spent on JVM compilation per minute.
-- **The Number of Class:**
- - **Loaded:** Number of classes currently loaded by the JVM.
- - **Unloaded:** Number of classes unloaded by the JVM since system startup.
-- **The Number of Java Thread:** The number of currently active threads in IoTDB. Each sub-item represents the number of threads in different states.
-
-#### 3.1.5 Network
-
-- **Net Speed:** Data transmission and reception speed by the network interface.
-- **Receive/Transmit Data Size:** The total size of data packets sent and received by the network interface since system startup.
-- **Packet Speed:** The rate of data packets sent and received by the network interface. A single RPC request may correspond to one or more packets.
-- **Connection Num:** Number of socket connections for the current process (IoTDB only uses TCP).
+This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM.
+
+#### CPU
+
+- CPU Cores: The number of CPU cores
+- CPU Utilization:
+ - System CPU Utilization: The average CPU load and utilization of the entire system during the sampling period
+ - Process CPU Utilization: The proportion of CPU occupied by the IoTDB process during the sampling period
+- CPU Time Per Minute: The total CPU time of all processes in the system per minute
+
+#### Memory
+
+- System Memory: The current usage of system memory.
+ - Committed VM Size: The size of virtual memory allocated by the operating system to running processes.
+ - Total Physical Memory: The total amount of available physical memory in the system.
+ - Used Physical Memory: The total amount of memory already used by the system, including memory actually used by processes and memory occupied by the operating system for buffers/cache.
+- System Swap Memory: Swap space memory usage.
+- Process Memory: The usage of memory by the IoTDB process.
+ - Max Memory: The maximum amount of memory that the IoTDB process can request from the operating system (the allocated size is configured in the datanode-env/confignode-env configuration files).
+ - Total Memory: The total amount of memory that the IoTDB process has currently requested from the operating system.
+ - Used Memory: The total amount of memory currently used by the IoTDB process.
+
+#### Disk
+
+- Disk Space:
+ - Total Disk Space: The maximum disk space that IoTDB can use.
+ - Used Disk Space: The disk space already used by IoTDB.
+- Logs Per Minute: The average number of logs at each level generated by IoTDB per minute during the sampling period.
+- File Count: Number of IoTDB-related files
+ - All: Total number of files
+ - TsFile: Number of TsFiles
+ - Seq: Number of sequential TsFiles
+ - Unseq: Number of unsequence TsFiles
+ - WAL: Number of WAL files
+ - Cross-Temp: Number of cross-space merge temporary files
+ - Inner-Seq-Temp: Number of sequential-space merge temporary files
+ - Inner-Unseq-Temp: Number of unsequence-space merge temporary files
+ - Mods: Number of tombstone files
+- Open File Handles: Number of file handles opened by the system
+- File Size: The size of IoTDB-related files. Each sub-item corresponds to the size of one file type.
+- Disk Utilization (%): Equivalent to the %util indicator in iostat; it reflects, to some extent, how busy the disk is. Each sub-item is the indicator for one disk.
+- Disk I/O Throughput: The average I/O throughput of each disk in the system over a period of time. Each sub-item is the indicator for one disk.
+- Disk IOPS: Equivalent to the r/s, w/s, rrqm/s, and wrqm/s indicators in iostat, i.e. the number of I/O operations the disk performs per second. Read and write refer to single I/O operations; because of the block device scheduling algorithm, adjacent I/Os can sometimes be merged into one, and merge-read and merge-write refer to the number of I/Os merged in this way.
+- Disk I/O Latency (Avg): Equivalent to the await indicator in iostat, i.e. the average latency of each I/O request, recorded separately for read and write requests.
+- Disk I/O Request Size (Avg): Equivalent to the avgrq-sz indicator in iostat; it reflects the average size of each I/O request, recorded separately for read and write requests.
+- Disk I/O Queue Length (Avg): Equivalent to the avgqu-sz indicator in iostat, i.e. the average length of the I/O request queue.
+- I/O Syscall Rate: The frequency of read and write system calls issued by the process, similar to IOPS.
+- I/O Throughput: The I/O throughput of the process, divided into two categories: actual-read/write and attempt-read/write. Actual read and actual write refer to the number of bytes for which the process actually causes the block device to perform I/O, excluding the parts handled by the Page Cache.
+
+#### JVM
+
+- GC Time Percentage: The proportion of time the node JVM spent on GC in the past one-minute window
+- GC Allocated/Promoted Size: The average size of objects promoted to the old generation per minute by the node JVM, as well as the size of newly allocated objects in the young generation, the old generation, and non-generational areas
+- GC Live Data Size: The size of long-lived objects in the node JVM and the maximum allowed size of each generation
+- Heap Memory: JVM heap memory usage.
+ - Maximum heap memory: The maximum available heap memory size for the JVM.
+ - Committed heap memory: The size of heap memory that has been committed by the JVM.
+ - Used heap memory: The size of heap memory already used by the JVM.
+ - PS Eden Space: The size of the PS Eden area.
+ - PS Old Space: The size of the PS Old area.
+ - PS Survivor Space: The size of the PS Survivor area.
+ - ...(CMS/G1/ZGC, etc)
+- Off-Heap Memory: Off-heap memory usage.
+ - Direct Memory: Off-heap direct memory.
+ - Mapped Memory: Off-heap mapped memory.
+- GCs Per Minute: The average number of garbage collections performed by the node JVM per minute, including YGC and FGC
+- GC Latency Per Minute: The average time the node JVM spent on garbage collection per minute, including YGC and FGC
+- GC Events Breakdown Per Minute: The average number of garbage collections per minute by the node JVM, broken down by cause, including YGC and FGC
+- GC Pause Time Breakdown Per Minute: The average time spent by the node JVM on garbage collection per minute, broken down by cause, including YGC and FGC
+- JIT Compilation Time Per Minute: The total time the JVM spends compiling per minute
+- Loaded & Unloaded Classes:
+ - Loaded: The number of classes currently loaded by the JVM
+ - Unloaded: The number of classes unloaded by the JVM since system startup
+- Active Java Threads: The current number of live threads in IoTDB. Each sub-item represents the number of threads in each state.
+
+#### Network
+
+eno refers to the physical network card connected to the public network, while lo refers to the loopback (virtual) network card.
+
+- Network Speed: The speed at which the network card sends and receives data
+- Network Throughput (Receive/Transmit): The size of data packets sent or received by the network card, accumulated since system startup
+- Packet Transmission Rate: The speed at which the network card sends and receives packets; one RPC request can correspond to one or more packets
+- Active TCP Connections: The current number of socket connections of the selected process (IoTDB only uses TCP)
### 3.2 Performance Overview Dashboard
-This dashboard provides an overview of the system's overall performance.
-
-#### 3.2.1 Cluster Overview
-
-- **Total CPU Core:** Total number of CPU cores in the cluster.
-- **DataNode CPU Load:** CPU utilization of each DataNode in the cluster.
-- Disk:
- - **Total Disk Space:** Total disk space across all cluster nodes.
- - **DataNode Disk Usage:** Disk usage of each DataNode in the cluster.
-- **Total Timeseries:** The total number of time series managed by the cluster (including replicas). The actual number of time series should be calculated considering metadata replicas.
-- **Cluster:** The number of ConfigNode and DataNode instances in the cluster.
-- **Up Time:** The duration since the cluster started.
-- **Total Write Point Per Second:** The total number of data points written per second in the cluster (including replicas). The actual number of writes should be analyzed in conjunction with the data replication factor.
-- Memory:
- - **Total System Memory:** The total system memory available in the cluster.
- - **Total Swap Memory:** The total swap memory available in the cluster.
- - **DataNode Process Memory Usage:** The memory usage of each DataNode in the cluster.
-- **Total File Number:** The total number of files managed by the cluster.
-- **Cluster System Overview:** An overview of cluster-wide system resources, including average DataNode memory usage and average disk usage.
-- **Total Database:** The total number of databases managed by the cluster (including replicas).
-- **Total DataRegion:** The total number of DataRegions in the cluster.
-- **Total SchemaRegion:** The total number of SchemaRegions in the cluster.
-
-#### 3.2.2 Node Overview
-
-- **CPU Core:** Number of CPU cores on the node’s machine.
-- **Disk Space:** Total disk space available on the node’s machine.
-- **Timeseries:** The number of time series managed by the node (including replicas).
-- **System Overview:** Overview of the node’s system resources, including CPU load, process memory usage, and disk usage. -- **Write Point Per Second:** The write speed of the node, including replicated data. -- **System Memory:** The total system memory available on the node’s machine. -- **Swap Memory:** The total swap memory available on the node’s machine. -- **File Number:** The number of files managed by the node. - -#### 3.2.3 Performance - -- **Session Idle Time:** The total idle time of session connections on the node. -- **Client Connection:** The status of client connections on the node, including the total number of connections and the number of active connections. -- **Time Consumed Of Operation:** The latency of various operations on the node, including the average value and P99 percentile. -- **Average Time Consumed Of Interface:** The average latency of each **Thrift interface** on the node. -- **P99 Time Consumed Of Interface:** The P99 latency of each Thrift interface on the node. -- **Task Number:** The number of system tasks running on the node. -- **Average Time Consumed Of Task:** The average execution time of system tasks on the node. -- **P99 Time Consumed Of Task:** The P99 execution time of system tasks on the node. -- **Operation Per Second:** The number of operations executed per second on the node. -- Main Process: - - **Operation Per Second of Stage:** The number of operations executed per second in different stages of the main process. - - **Average Time Consumed of Stage:** The average execution time of different stages in the main process. - - **P99 Time Consumed of Stage:** The P99 execution time of different stages in the main process. -- Scheduling Stage: - - **OPS Of Schedule:** The number of operations executed per second in different sub-stages of the scheduling stage. - - **Average Time Consumed Of Schedule Stage:** The average execution time in different sub-stages of the scheduling stage. - - **P99 Time Consumed Of Schedule Stage:** The P99 execution time in different sub-stages of the scheduling stage. -- Local Scheduling Stage: - - **OPS of Local Schedule Stage:** Number of operations per second at each sub-stage of the local schedule stage. - - **Average Time Consumed of Local Schedule Stage:** Average time consumed at each sub-stage of the local schedule stage. - - **P99 Time Consumed of Local Schedule Stage:** P99 time consumed at each sub-stage of the local schedule stage. -- Storage Stage: - - **OPS of Storage Stage:** Number of operations per second at each sub-stage of the storage stage. - - **Average Time Consumed of Storage Stage:** Average time consumed at each sub-stage of the storage stage. - - **P99 Time Consumed of Storage Stage:** P99 time consumed at each sub-stage of the storage stage. -- Engine Stage: - - **OPS Of Engine Stage:** The number of operations executed per second in different sub-stages of the engine stage. - - **Average Time Consumed Of Engine Stage:** The average execution time in different sub-stages of the engine stage. - - **P99 Time Consumed Of Engine Stage:** The P99 execution time in different sub-stages of the engine stage. - -#### 3.2.4 System - -- **CPU Load:** The CPU load of the node. -- **CPU Time Per Minute:** The total CPU time per minute on the node, which is influenced by the number of CPU cores. -- **GC Time Per Minute:** The average time spent on Garbage Collection (GC) per minute on the node, including Young GC (YGC) and Full GC (FGC). -- **Heap Memory:** The heap memory usage of the node. 
-- **Off-Heap Memory:** The off-heap memory usage of the node.
-- **The Number Of Java Thread:** The number of Java threads on the node.
-- **File Count:** The number of files managed by the node.
-- **File Size:** The total size of files managed by the node.
-- **Log Number Per Minute:** The number of logs generated per minute on the node, categorized by log type.
+#### Cluster Overview
+
+- Total CPU Cores: Total number of CPU cores of the cluster machines
+- DataNode CPU Load: CPU usage of each DataNode in the cluster
+- Disk
+ - Total Disk Space: Total disk size of the cluster machines
+ - DataNode Disk Utilization: The disk usage rate of each DataNode in the cluster
+- Total Time Series: The total number of time series managed by the cluster (including replicas); the actual number of time series needs to be calculated in conjunction with the number of metadata replicas
+- Cluster Info: Number of ConfigNode and DataNode nodes in the cluster
+- Up Time: The duration since the cluster started
+- Total Write Throughput: The total number of writes per second in the cluster (including replicas); the actual total number of writes needs to be analyzed in conjunction with the number of data replicas
+- Memory
+ - Total System Memory: Total system memory size of the cluster machines
+ - Total Swap Memory: Total swap memory size of the cluster machines
+ - DataNode Process Memory Utilization: Memory usage of each DataNode in the cluster
+- Total Files: Total number of files managed by the cluster
+- Cluster System Overview: Overview of the cluster machines, including average DataNode memory usage and average machine disk usage
+- Total DataBases: The total number of databases managed by the cluster (including replicas)
+- Total DataRegions: The total number of DataRegions managed by the cluster
+- Total SchemaRegions: The total number of SchemaRegions managed by the cluster
+
+#### Node Overview
+
+- CPU Cores: The number of CPU cores of the machine where the node is located
+- Disk Space: The disk size of the machine where the node is located
+- Time Series: Number of time series managed by the machine where the node is located (including replicas)
+- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio
+- Write Throughput: The write speed per second of the machine where the node is located (including replicas)
+- System Memory: The system memory size of the machine where the node is located
+- Swap Memory: The swap memory size of the machine where the node is located
+- File Count: Number of files managed by the node
+
+#### Performance
+
+- Session Idle Time: The total idle time and total busy time of the session connections of the node
+- Client Connections: The client connection status of the node, including the total number of connections and the number of active connections
+- Operation Latency: The time consumption of various types of node operations, including average and P99
+- Average Interface Latency: The average time consumption of each Thrift interface of the node
+- P99 Interface Latency: P99 time consumption of each Thrift interface of the node
+- Total Tasks: The number of system tasks of the node
+- Average Task Latency: The average time spent on various system tasks of the node
+- P99 Task Latency: P99 time consumption of various system tasks of the node
+- Operations Per Second: The number of operations per second of the node
+- Main Process
+ - Operations Per Second (Stage-wise): The number of operations per
second for each stage of the node's main process + - Average Stage Latency: The average time consumption of each stage in the main process of a node + - P99 Stage Latency: P99 time consumption for each stage of the node's main process +- Schedule Stage + - Schedule Operations Per Second: The number of operations per second in each sub stage of the node schedule stage + - Average Schedule Stage Latency:The average time consumption of each sub stage in the node schedule stage + - P99 Schedule Stage Latency: P99 time consumption for each sub stage of the schedule stage of the node +- Local Schedule Sub Stages + - Local Schedule Operations Per Second: The number of operations per second in each sub stage of the local schedule node + - Average Local Schedule Stage Latency: The average time consumption of each sub stage in the local schedule stage of the node + - P99 Local Schedule Latency: P99 time consumption for each sub stage of the local schedule stage of the node +- Storage Stage + - Storage Operations Per Second: The number of operations per second in each sub stage of the node storage stage + - Average Storage Stage Latency: Average time consumption of each sub stage in the node storage stage + - P99 Storage Stage Latency: P99 time consumption for each sub stage of node storage stage +- Engine Stage + - Engine Operations Per Second: The number of operations per second in each sub stage of the node engine stage + - Average Engine Stage Latency: The average time consumption of each sub stage in the engine stage of a node + - P99 Engine Stage Latency: P99 time consumption of each sub stage in the node engine stage + +#### System + +- CPU Utilization: CPU load of nodes +- CPU Latency Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores +- GC Latency Per Minute:The average GC time per minute for nodes, including YGC and FGC +- Heap Memory: Node's heap memory usage +- Off-Heap Memory: Non heap memory usage of nodes +- Total Java Threads: Number of Java threads on nodes +- File Count:Number of files managed by nodes +- File Size: Node management file size situation +- Logs Per Minute: Different types of logs per minute for nodes ### 3.3 ConfigNode Dashboard -This dashboard displays the performance metrics of all management nodes in the cluster, including **partition information, node status, and client connection statistics**. - -#### 3.3.1 Node Overview - -- **Database Count:** Number of databases on the node. -- Region: - - **DataRegion Count:** Number of DataRegions on the node. - - **DataRegion Current Status:** Current status of DataRegions on the node. - - **SchemaRegion Count:** Number of SchemaRegions on the node. - - **SchemaRegion Current Status:** Current status of SchemaRegions on the node. -- **System Memory:** System memory on the node's machine. -- **Swap Memory:** Swap memory on the node's machine. -- **ConfigNodes:** Status of ConfigNodes in the cluster. -- **DataNodes:** Status of DataNodes in the cluster. -- **System Overview:** Overview of the node's system resources, including system memory, disk usage, process memory, and CPU load. - -#### 3.3.2 NodeInfo - -- **Node Count:** The total number of nodes in the cluster, including ConfigNodes and DataNodes. -- **ConfigNode Status:** The status of ConfigNodes in the cluster. -- **DataNode Status:** The status of DataNodes in the cluster. -- **SchemaRegion Distribution:** The distribution of SchemaRegions in the cluster. 
-- **SchemaRegionGroup Leader Distribution:** The leader distribution of SchemaRegionGroups in the cluster. -- **DataRegion Distribution:** The distribution of DataRegions in the cluster. -- **DataRegionGroup Leader Distribution:** The leader distribution of DataRegionGroups in the cluster. - -#### 3.3.3 Protocol - -- Client Count Statistics: - - **Active Client Num:** The number of active clients in each thread pool on the node. - - **Idle Client Num:** The number of idle clients in each thread pool on the node. - - **Borrowed Client Count:** The number of borrowed clients in each thread pool on the node. - - **Created Client Count:** The number of clients created in each thread pool on the node. - - **Destroyed Client Count:** The number of clients destroyed in each thread pool on the node. -- Client Time Statistics: - - **Client Mean Active Time:** The average active time of clients in each thread pool on the node. - - **Client Mean Borrow Wait Time:** The average time clients spend waiting for borrowed resources in each thread pool. - - **Client Mean Idle Time:** The average idle time of clients in each thread pool. - -#### 3.3.4 Partition Table - -- **SchemaRegionGroup Count:** The number of **SchemaRegionGroups** in the cluster’s databases. -- **DataRegionGroup Count:** The number of DataRegionGroups in the cluster’s databases. -- **SeriesSlot Count:** The number of SeriesSlots in the cluster’s databases. -- **TimeSlot Count:** The number of TimeSlots in the cluster’s databases. -- **DataRegion Status:** The status of DataRegions in the cluster. -- **SchemaRegion Status:** The status of SchemaRegions in the cluster. - -#### 3.3.5 Consensus - -- **Ratis Stage Time:** The execution time of different stages in the Ratis consensus protocol. -- **Write Log Entry:** The execution time for writing log entries in Ratis. -- **Remote / Local Write Time:** The time taken for remote and local writes in Ratis. -- **Remote / Local Write QPS:** The **queries per second (QPS)** for remote and local writes in Ratis. -- **RatisConsensus Memory:** The memory usage of the Ratis consensus protocol on the node. +This panel displays the performance of all management nodes in the cluster, including partitioning, node information, and client connection statistics. 
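+
+All of the panels described in this appendix are backed by time series collected in Prometheus, so any value shown in Grafana can also be read programmatically. Below is a minimal sketch (not part of the official tooling) that pulls one instant value through the standard Prometheus HTTP API; the address `http://127.0.0.1:9090` and the PromQL expression `up{job="iotdb"}` are placeholders — substitute the Prometheus address and the metric name used in your own deployment.
+
+```python
+# Minimal sketch: fetch one instant value from the Prometheus HTTP API.
+# Assumptions (adjust to your deployment): Prometheus listens on
+# 127.0.0.1:9090 and the scrape job for IoTDB is labelled "iotdb".
+import requests
+
+PROMETHEUS = "http://127.0.0.1:9090"   # placeholder Prometheus address
+QUERY = 'up{job="iotdb"}'              # placeholder PromQL expression
+
+resp = requests.get(f"{PROMETHEUS}/api/v1/query",
+                    params={"query": QUERY}, timeout=5)
+resp.raise_for_status()
+
+# An instant query returns a vector: one (labels, [timestamp, value]) pair
+# per matching series.
+for series in resp.json()["data"]["result"]:
+    instance = series["metric"].get("instance", "?")
+    timestamp, value = series["value"]
+    print(f"{instance}: {value} @ {timestamp}")
+```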
+
+#### Node Overview
+
+- Database Count: Number of databases on the node
+- Region
+ - DataRegion Count: Number of DataRegions on the node
+ - DataRegion Status: The status of the DataRegions of the node
+ - SchemaRegion Count: Number of SchemaRegions on the node
+ - SchemaRegion Status: The status of the SchemaRegions of the node
+- System Memory Utilization: The system memory size of the node
+- Swap Memory Utilization: The swap memory size of the node
+- ConfigNodes Status: The running status of the ConfigNodes in the cluster where the node is located
+- DataNodes Status: The status of the DataNodes in the cluster where the node is located
+- System Overview: System overview of the node, including system memory, disk usage, process memory, and CPU load
+
+#### NodeInfo
+
+- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode
+- ConfigNode Status: The status of the ConfigNode nodes in the cluster where the node is located
+- DataNode Status: The status of the DataNode nodes in the cluster where the node is located
+- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located
+- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroups of the cluster where the node is located
+- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located
+- DataRegionGroup Leader Distribution: The distribution of leaders in the DataRegionGroups of the cluster where the node is located
+
+#### Protocol
+
+- Client Count
+ - Active Clients: The number of active clients in each thread pool of the node
+ - Idle Clients: The number of idle clients in each thread pool of the node
+ - Borrowed Clients Per Second: Number of borrowed clients in each thread pool of the node
+ - Created Clients Per Second: Number of created clients in each thread pool of the node
+ - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node
+- Client Time Statistics
+ - Average Client Active Time: The average active time of clients in each thread pool of the node
+ - Average Client Borrowing Latency: The average borrow-wait time of clients in each thread pool of the node
+ - Average Client Idle Time: The average idle time of clients in each thread pool of the node
+
+#### Partition Table
+
+- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located
+- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located
+- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located
+- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located
+- DataRegion Status: The DataRegion status of the cluster where the node is located
+- SchemaRegion Status: The SchemaRegion status of the cluster where the node is located
+
+#### Consensus
+
+- Ratis Stage Latency: The time consumption of each stage of the node's Ratis
+- Write Log Entry Latency: The time required for the node's Ratis to write a log entry
+- Remote / Local Write Latency: The time consumption of remote and local writes of the node's Ratis
+- Remote / Local Write Throughput: The QPS of remote and local writes of the node's Ratis
+- RatisConsensus Memory Utilization: Memory usage of the node's Ratis consensus protocol
### 3.4 DataNode Dashboard
-This dashboard displays the monitoring status of all **DataNodes** in the cluster, including **write
latency, query latency, and storage file counts**. - -#### 3.4.1 Node Overview - -- **The Number of Entity:** The number of entities managed by the node. -- **Write Point Per Second:** The write speed of the node (points per second). -- **Memory Usage:** The memory usage of the node, including IoT Consensus memory usage, SchemaRegion memory usage, and per-database memory usage. - -#### 3.4.2 Protocol - -- Operation Latency: - - **The Time Consumed of Operation (avg):** The average latency of operations on the node. - - **The Time Consumed of Operation (50%):** The median latency of operations on the node. - - **The Time Consumed of Operation (99%):** The P99 latency of operations on the node. -- Thrift Statistics: - - **The QPS of Interface:** The queries per second (QPS) for each Thrift interface on the node. - - **The Avg Time Consumed of Interface:** The average execution time for each Thrift interface on the node. - - **Thrift Connection:** The number of active Thrift connections on the node. - - **Thrift Active Thread:** The number of active Thrift threads on the node. -- Client Statistics: - - **Active Client Num:** The number of active clients in each thread pool. - - **Idle Client Num:** The number of idle clients in each thread pool. - - **Borrowed Client Count:** The number of borrowed clients in each thread pool. - - **Created Client Count:** The number of clients created in each thread pool. - - **Destroyed Client Count:** The number of clients destroyed in each thread pool. - - **Client Mean Active Time:** The average active time of clients in each thread pool. - - **Client Mean Borrow Wait Time:** The average time clients spend waiting for borrowed resources in each thread pool. - - **Client Mean Idle Time:** The average idle time of clients in each thread pool. - -#### 3.4.3 Storage Engine - -- **File Count:** The number of files managed by the node. -- **File Size:** The total size of files managed by the node. -- TsFile: - - **TsFile Total Size In Each Level:** The total size of TsFiles at each level. - - **TsFile Count In Each Level:** The number of TsFiles at each level. - - **Avg TsFile Size In Each Level:** The average size of TsFiles at each level. -- **Task Number:** The number of tasks on the node. -- **The Time Consumed of Task:** The total execution time of tasks on the node. -- Compaction: - - **Compaction Read And Write Per Second:** The read/write speed of compaction operations. - - **Compaction Number Per Minute:** The number of **compaction** operations per minute. - - **Compaction Process Chunk Status:** The number of **chunks** in different states during compaction. - - **Compacted Point Num Per Minute:** The number of data points compacted per minute. - -#### 3.4.4 Write Performance - -- **Write Cost (avg):** The average **write latency**, including WAL and **memtable** writes. -- **Write Cost (50%):** The **median write latency**, including WAL and **memtable** writes. -- **Write Cost (99%):** The **P99 write latency**, including WAL and **memtable** writes. -- WAL (Write-Ahead Logging) - - **WAL File Size:** The total size of WAL files managed by the node. - - **WAL File Num:** The total number of WAL files managed by the node. - - **WAL Nodes Num:** The total number of WAL Nodes managed by the node. - - **Make Checkpoint Costs:** The time required to create different types of Checkpoints. - - **WAL Serialize Total Cost:** The total serialization time for WAL. 
- - **Data Region Mem Cost:** The memory usage of different DataRegions, including total memory usage of DataRegions on the current instance and total memory usage of DataRegions across the entire cluster.
- - **Serialize One WAL Info Entry Cost:** The time taken to serialize a single WAL Info Entry.
- - **Oldest MemTable Ram Cost When Cause Snapshot:** The memory size of the oldest MemTable when a snapshot is triggered by WAL.
- - **Oldest MemTable Ram Cost When Cause Flush:** The memory size of the oldest MemTable when a flush is triggered by WAL.
- - **Effective Info Ratio of WALNode:** The ratio of effective information in different WALNodes.
+This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc.
+
+#### Node Overview
+
+- Total Managed Entities: The number of entities managed by the node
+- Write Throughput: The write speed per second of the node
+- Memory Usage: The memory usage of the node, including the memory usage of the various parts of IoT Consensus, the total memory usage of SchemaRegions, and the memory usage of each database.
+
+#### Protocol
+
+- Node Operation Time Consumption
+ - Average Operation Latency: The average time spent on various operations of the node
+ - P50 Operation Latency: The median time spent on various operations of the node
+ - P99 Operation Latency: P99 time consumption of various operations of the node
+- Thrift Statistics
+ - Thrift Interface QPS: QPS of each Thrift interface of the node
+ - Average Thrift Interface Latency: The average time consumption of each Thrift interface of the node
+ - Thrift Connections: The number of Thrift connections of each type on the node
+ - Active Thrift Threads: The number of active Thrift threads of each type on the node
+- Client Statistics
+ - Active Clients: The number of active clients in each thread pool of the node
+ - Idle Clients: The number of idle clients in each thread pool of the node
+ - Borrowed Clients Per Second: Number of borrowed clients in each thread pool of the node
+ - Created Clients Per Second: Number of created clients in each thread pool of the node
+ - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node
+ - Average Client Active Time: The average active time of clients in each thread pool of the node
+ - Average Client Borrowing Latency: The average borrow-wait time of clients in each thread pool of the node
+ - Average Client Idle Time: The average idle time of clients in each thread pool of the node
+
+#### Storage Engine
+
+- File Count: Number of files of each type managed by the node
+- File Size: The size of each type of file managed by the node
+- TsFile
+ - Total TsFile Size Per Level: The total size of the TsFiles at each level managed by the node
+ - TsFile Count Per Level: Number of TsFiles at each level managed by the node
+ - Average TsFile Size Per Level: The average size of the TsFiles at each level managed by the node
+- Total Tasks: The number of tasks on the node
+- Task Latency: The time consumption of the tasks on the node
+- Compaction
+ - Compaction Read/Write Throughput: The compaction read and write speed of the node per second
+ - Compactions Per Minute: The number of compactions performed by the node per minute
+ - Compaction Chunk Status: The number of Chunks in different states compacted by the node
+ - Compacted-Points Per Minute: The number of data points compacted by the node per minute
+
+#### Write Performance
+
+- Average Write Latency: Average node write time, including writing wal and memtable
+- P50 Write Latency:
Median node write time, including writing wal and memtable +- P99 Write Latency: P99 for node write time, including writing wal and memtable +- WAL + - WAL File Size: Total size of WAL files managed by nodes + - WAL Files:Number of WAL files managed by nodes + - WAL Nodes: Number of WAL nodes managed by nodes + - Checkpoint Creation Time: The time required to create various types of CheckPoints for nodes + - WAL Serialization Time (Total): Total time spent on node WAL serialization + - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster + - Serialize One WAL Info Entry Cost: Node serialization time for a WAL Info Entry + - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot + - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush + - WALNode Effective Info Ratio: The effective information ratio of different WALNodes of nodes - WAL Buffer - - **WAL Buffer Cost:** The time taken to flush the SyncBuffer of WAL, including both synchronous and asynchronous flushes. - - **WAL Buffer Used Ratio:** The utilization ratio of the WAL Buffer. - - **WAL Buffer Entries Count:** The number of entries in the WAL Buffer. + - WAL Buffer Latency: Node WAL flush SyncBuffer takes time, including both synchronous and asynchronous options + - WAL Buffer Used Ratio: The usage rate of the WAL Buffer of the node + - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node - Flush Statistics - - **Flush MemTable Cost (avg):** The average total flush time, including time spent in different sub-stages. - - **Flush MemTable Cost (50%):** The median total flush time, including time spent in different sub-stages. - - **Flush MemTable Cost (99%):** The P99 total flush time, including time spent in different sub-stages. - - **Flush Sub Task Cost (avg):** The average execution time of flush sub-tasks, including sorting, encoding, and I/O stages. - - **Flush Sub Task Cost (50%):** The median execution time of flush sub-tasks, including sorting, encoding, and I/O stages. - - **Flush Sub Task Cost (99%):** The P99 execution time of flush sub-tasks, including sorting, encoding, and I/O stages. -- **Pending Flush Task Num:** The number of Flush tasks currently in a blocked state. -- **Pending Flush Sub Task Num:** The number of blocked Flush sub-tasks. -- **TsFile Compression Ratio of Flushing MemTable:** The compression ratio of TsFiles generated from flushed MemTables. -- **Flush TsFile Size of DataRegions:** The size of TsFiles generated from flushed MemTables in different DataRegions. -- **Size of Flushing MemTable:** The size of the MemTable currently being flushed. -- **Points Num of Flushing MemTable:** The number of data points being flushed from MemTables in different DataRegions. -- S**eries Num of Flushing MemTable:** The number of time series being flushed from MemTables in different DataRegions. -- **Average Point Num of Flushing MemChunk:** The average number of points in MemChunks being flushed. - -#### 3.4.5 Schema Engine - -- **Schema Engine Mode:** The metadata engine mode used by the node. -- **Schema Consensus Protocol:** The metadata consensus protocol used by the node. -- **Schema Region Number:** The number of SchemaRegions managed by the node. -- **Schema Region Memory Overview:** The total memory used by SchemaRegions on the node. 
-- **Memory Usage per SchemaRegion:** The average memory usage per SchemaRegion.
-- **Cache MNode per SchemaRegion:** The number of cached MNodes per SchemaRegion.
-- **MLog Length and Checkpoint****:** The current MLog size and checkpoint position for each SchemaRegion (valid only for SimpleConsensus).
-- **Buffer MNode per SchemaRegion:** The number of buffered MNodes per SchemaRegion.
-- **Activated Template Count per SchemaRegion:** The number of activated templates per SchemaRegion.
-- Time Series Statistics
-  - **Timeseries Count per SchemaRegion:** The average number of time series per SchemaRegion.
-  - **Series Type:** The number of time series of different types.
-  - **Time Series Number:** The total number of time series on the node.
-  - **Template Series Number:** The total number of template-based time series on the node.
-  - **Template Series Count per SchemaRegion:** The number of time series created via templates per SchemaRegion.
+  - Average Flush Latency: The average total time spent on node Flush, including the time spent on each sub stage
+  - P50 Flush Latency: The median total time spent on node Flush, including the time spent on each sub stage
+  - P99 Flush Latency: The P99 total time spent on node Flush, including the time spent on each sub stage
+  - Average Flush Subtask Latency: The average time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages
+  - P50 Flush Subtask Latency: The median time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages
+  - P99 Flush Subtask Latency: The P99 time consumption of each Flush subtask of the node, including sorting, encoding, and IO stages
+- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node
+- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes
+- Tsfile Compression Ratio of Flushing MemTable: The compression ratio of the TsFile corresponding to the MemTable being flushed by the node
+- Flush TsFile Size of DataRegions: The TsFile size corresponding to each flush of the node in different DataRegions
+- Size of Flushing MemTable: The size of the MemTable being flushed by the node
+- Points Num of Flushing MemTable: The number of data points when flushing MemTables in different DataRegions of the node
+- Series Num of Flushing MemTable: The number of time series when flushing MemTables in different DataRegions of the node
+- Average Point Num of Flushing MemChunk: The average number of points in each MemChunk flushed by the node
+
+#### Schema Engine
+
+- Schema Engine Mode: The metadata engine mode of the node
+- Schema Consensus Protocol: Node metadata consensus protocol
+- Schema Region Number:Number of SchemaRegions managed by the node
+- Schema Region Memory Overview: The amount of memory used by SchemaRegions of the node
+- Memory Usage per SchemaRegion:The average memory usage of each SchemaRegion of the node
+- Cache MNode per SchemaRegion: The number of cached MNodes in each SchemaRegion of the node
+- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemaRegion of the node (valid only for SimpleConsensus)
+- Buffer MNode per SchemaRegion: The number of buffered MNodes in each SchemaRegion of the node
+- Activated Template Count per SchemaRegion: The number of activated templates in each SchemaRegion of the node
+- Time Series Statistics
+  - Timeseries Count per SchemaRegion: The average number of time series per SchemaRegion of the node
+  - Series Type: Number of time series of different types on the node
+  - Time Series Number: The total number of time series on the node
+  - Template Series Number: The total number of template time series on the node
+  - Template Series Count per SchemaRegion: The number of time series created through templates in each SchemaRegion of the node
 - IMNode Statistics
-  - **Pinned MNode per SchemaRegion:** The number of pinned IMNodes per SchemaRegion.
-  - **Pinned Memory per SchemaRegion:** The memory usage of pinned IMNodes per SchemaRegion.
-  - **Unpinned MNode per SchemaRegion:** The number of unpinned IMNodes per SchemaRegion.
-  - **Unpinned Memory per SchemaRegion:** The memory usage of unpinned IMNodes per SchemaRegion.
-  - **Schema File Memory MNode Number:** The total number of pinned and unpinned IMNodes on the node.
-  - **Release and Flush MNode Rate:** The number of IMNodes released and flushed per second.
-- **Cache Hit Rate:** The cache hit ratio of the node.
-- **Release and Flush Thread Number:** The number of active threads for releasing and flushing memory.
-- **Time Consumed of Release and Flush (avg):** The average execution time for cache release and buffer flush.
-- **Time Consumed of Release and Flush (99%):** The P99 execution time for cache release and buffer flush.
-
-#### 3.4.6 Query Engine
-
-- Time Consumed at Each Stage
-  - **The time consumed of query plan stages (avg):** The average time consumed in different query plan stages on the node.
-  - **The time consumed of query plan stages (50%):** The median time consumed in different query plan stages on the node.
-  - **The time consumed of query plan stages (99%):** The P99 time consumed in different query plan stages on the node.
-- Plan Dispatch Time
-  - **The time consumed of plan dispatch stages (avg):** The average time consumed in query execution plan dispatch.
-  - **The time consumed of plan dispatch stages (50%):** The median time consumed in query execution plan dispatch.
-  - **The time consumed of plan dispatch stages (99%):** The P99 time consumed in query execution plan dispatch.
-- Query Execution Time
-  - **The time consumed of query execution stages (avg):** The average time consumed in query execution on the node.
-  - **The time consumed of query execution stages (50%):** The median time consumed in query execution on the node.
-  - **The time consumed of query execution stages (99%):** The P99 time consumed in query execution on the node.
+  - Pinned MNode per SchemaRegion: The number of pinned IMNodes in each SchemaRegion of the node
+  - Pinned Memory per SchemaRegion: The memory usage of pinned IMNodes in each SchemaRegion of the node
+  - Unpinned MNode per SchemaRegion: The number of unpinned IMNodes in each SchemaRegion of the node
+  - Unpinned Memory per SchemaRegion: The memory usage of unpinned IMNodes in each SchemaRegion of the node
+  - Schema File Memory MNode Number: The total number of pinned and unpinned IMNodes on the node
+  - Release and Flush MNode Rate: The number of IMNodes released and flushed by the node per second
+- Cache Hit Rate: Cache hit rate of the node
+- Release and Flush Thread Number: The current number of active Release and Flush threads on the node
+- Time Consumed of Release and Flush (avg): The average time taken for node triggered cache release and buffer flushing
+- Time Consumed of Release and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing
+
+#### Query Engine
+
+- Time Consumption In Each Stage
+  - Average Query Plan Execution Time: The average time spent on node queries at each stage
+  - P50 Query Plan Execution Time: Median time spent on node queries at each stage
+  - P99 Query Plan Execution Time: P99 time consumption for node query at each stage
+- Execution Plan Distribution Time
+  - Average Query Plan Dispatch Time: The average time spent on node query execution plan distribution
+  - P50 Query Plan Dispatch Time: Median time spent on node query execution plan distribution
+  - P99 Query Plan Dispatch Time: P99 of node query execution plan distribution time
+- Execution Plan Execution Time
+  - Average Query Execution Time: The average execution time of node query execution plan
+  - P50 Query Execution Time:Median execution time of node query execution plan
+  - P99 Query Execution Time: P99 of node query execution plan execution time
 - Operator Execution Time
-  - **The time consumed of operator execution stages (avg):** The average time consumed in query operator execution.
-  - **The time consumed of operator execution (50%):** The median time consumed in query operator execution.
-  - **The time consumed of operator execution (99%):** The P99 time consumed in query operator execution
+  - Average Query Operator Execution Time: The average execution time of node query operators
+  - P50 Query Operator Execution Time: Median execution time of node query operator
+  - P99 Query Operator Execution Time: P99 of node query operator execution time
 - Aggregation Query Computation Time
-  - **The time consumed of query aggregation (avg):** The average time consumed in aggregation query computation.
-  - **The time consumed of query aggregation (50%):** The median time consumed in aggregation query computation.
-  - **The time consumed of query aggregation (99%):** The P99 time consumed in aggregation query computation.
-- File/Memory Interface Time
-  - **The time consumed of query scan (avg):** The average time consumed in file/memory interface query scans.
-  - **The time consumed of query scan (50%):** The median time consumed in file/memory interface query scans.
-  - **The time consumed of query scan (99%):** The P99 time consumed in file/memory interface query scans.
-- Resource Access Count
-  - **The usage of query resource (avg):** The average number of resource accesses during query execution.
-  - **The usage of query resource (50%):** The median number of resource accesses during query execution.
- - **The usage of query resource (99%):** The P99 number of resource accesses during query execution. + - Average Query Aggregation Execution Time: The average computation time for node aggregation queries + - P50 Query Aggregation Execution Time: Median computation time for node aggregation queries + - P99 Query Aggregation Execution Time: P99 of node aggregation query computation time +- File/Memory Interface Time Consumption + - Average Query Scan Execution Time: The average time spent querying file/memory interfaces for nodes + - P50 Query Scan Execution Time: Median time spent querying file/memory interfaces for nodes + - P99 Query Scan Execution Time: P99 time consumption for node query file/memory interface +- Number Of Resource Visits + - Average Query Resource Utilization: The average number of resource visits for node queries + - P50 Query Resource Utilization: Median number of resource visits for node queries + - P99 Query Resource Utilization: P99 for node query resource access quantity - Data Transmission Time - - **The time consumed of query data exchange (avg):** The average time consumed in query data exchange. - - **The time consumed of query data exchange (50%):** The median time consumed in query data exchange. - - **The time consumed of query data exchange (99%):** The P99 time consumed in query data exchange. -- Data Transmission Count - - **The count of Data Exchange (avg):** The average number of data exchanges during queries. - - **The count of Data Exchange:** The quantiles (median, P99) of data exchanges during queries. -- Task Scheduling Count and Time - - **The number of query queue:** The number of query tasks scheduled. - - **The time consumed of query schedule time (avg):** The average time consumed for query scheduling. - - **The time consumed of query schedule time (50%):** The median time consumed for query scheduling. - - **The time consumed of query schedule time (99%):** The P99 time consumed for query scheduling. - -#### 3.4.7 Query Interface + - Average Query Data Exchange Latency: The average time spent on node query data transmission + - P50 Query Data Exchange Latency: Median query data transmission time for nodes + - P99 Query Data Exchange Latency: P99 for node query data transmission time +- Number Of Data Transfers + - Average Query Data Exchange Count: The average number of data transfers queried by nodes + - Query Data Exchange Count: The quantile of the number of data transfers queried by nodes, including the median and P99 +- Task Scheduling Quantity And Time Consumption + - Query Queue Length: Node query task scheduling quantity + - Average Query Scheduling Latency: The average time spent on scheduling node query tasks + - P50 Query Scheduling Latency: Median time spent on node query task scheduling + - P99 Query Scheduling Latency: P99 of node query task scheduling time + +#### Query Interface - Load Time Series Metadata - - **The time consumed of load timeseries metadata (avg):** The average time consumed for loading time series metadata. - - **The time consumed of load timeseries metadata (50%):** The median time consumed for loading time series metadata. - - **The time consumed of load timeseries metadata (99%):** The P99 time consumed for loading time series metadata. 
+ - Average Timeseries Metadata Load Time: The average time taken for node queries to load time series metadata + - P50 Timeseries Metadata Load Time: Median time spent on loading time series metadata for node queries + - P99 Timeseries Metadata Load Time: P99 time consumption for node query loading time series metadata - Read Time Series - - **The time consumed of read timeseries metadata (avg):** The average time consumed for reading time series. - - **The time consumed of read timeseries metadata (50%):** The median time consumed for reading time series. - - **The time consumed of read timeseries metadata (99%):** The P99 time consumed for reading time series. + - Average Timeseries Metadata Read Time: The average time taken for node queries to read time series + - P50 Timeseries Metadata Read Time: The median time taken for node queries to read time series + - P99 Timeseries Metadata Read Time: P99 time consumption for node query reading time series - Modify Time Series Metadata - - **The time consumed of timeseries metadata modification (avg):** The average time consumed for modifying time series metadata. - - **The time consumed of timeseries metadata modification (50%):** The median time consumed for modifying time series metadata. - - **The time consumed of timeseries metadata modification (99%):** The P99 time consumed for modifying time series metadata. + - Average Timeseries Metadata Modification Time:The average time taken for node queries to modify time series metadata + - P50 Timeseries Metadata Modification Time: Median time spent on querying and modifying time series metadata for nodes + - P99 Timeseries Metadata Modification Time: P99 time consumption for node query and modification of time series metadata - Load Chunk Metadata List - - The time consumed of load chunk metadata list(avg): Average time consumed of loading chunk metadata list by the node - - The time consumed of load chunk metadata list(50%): Median time consumed of loading chunk metadata list by the node - - The time consumed of load chunk metadata list(99%): P99 time consumed of loading chunk metadata list by the node + - Average Chunk Metadata List Load Time: The average time it takes for node queries to load Chunk metadata lists + - P50 Chunk Metadata List Load Time: Median time spent on node query loading Chunk metadata list + - P99 Chunk Metadata List Load Time: P99 time consumption for node query loading Chunk metadata list - Modify Chunk Metadata - - The time consumed of chunk metadata modification(avg): Average time consumed of modifying chunk metadata by the node - - The time consumed of chunk metadata modification(50%): Median time consumed of modifying chunk metadata by the node - - The time consumed of chunk metadata modification(99%): P99 time consumed of modifying chunk metadata by the node -- Filter by Chunk Metadata - - **The time consumed of chunk metadata filter (avg):** The average time consumed for filtering by chunk metadata. - - **The time consumed of chunk metadata filter (50%):** The median time consumed for filtering by chunk metadata. - - **The time consumed of chunk metadata filter (99%):** The P99 time consumed for filtering by chunk metadata. -- Construct Chunk Reader - - **The time consumed of construct chunk reader (avg):** The average time consumed for constructing a Chunk Reader. - - **The time consumed of construct chunk reader (50%):** The median time consumed for constructing a Chunk Reader. 
-  - **The time consumed of construct chunk reader (99%):** The P99 time consumed for constructing a Chunk Reader.
+  - Average Chunk Metadata Modification Time: The average time it takes for node queries to modify Chunk metadata
+  - P50 Chunk Metadata Modification Time: The median time spent on modifying Chunk metadata for node queries
+  - P99 Chunk Metadata Modification Time: P99 time consumption for node query and modification of Chunk metadata
+- Filter According To Chunk Metadata
+  - Average Chunk Metadata Filtering Time: The average time spent on node queries filtering by Chunk metadata
+  - P50 Chunk Metadata Filtering Time: Median filtering time for node queries based on Chunk metadata
+  - P99 Chunk Metadata Filtering Time: P99 time consumption for node query filtering based on Chunk metadata
+- Constructing Chunk Reader
+  - Average Chunk Reader Construction Time: The average time spent on constructing Chunk Reader for node queries
+  - P50 Chunk Reader Construction Time: Median time spent on constructing Chunk Reader for node queries
+  - P99 Chunk Reader Construction Time: P99 time consumption for constructing Chunk Reader for node queries
 - Read Chunk
-  - **The time consumed of read chunk (avg):** The average time consumed for reading a Chunk.
-  - **The time consumed of read chunk (50%):** The median time consumed for reading a Chunk.
-  - **The time consumed of read chunk (99%):** The P99 time consumed for reading a Chunk.
+  - Average Chunk Read Time: The average time taken for node queries to read Chunks
+  - P50 Chunk Read Time: The median time taken for node queries to read Chunks
+  - P99 Chunk Read Time: P99 time taken for node queries to read Chunks
 - Initialize Chunk Reader
-  - **The time consumed of init chunk reader (avg):** The average time consumed for initializing a Chunk Reader.
-  - **The time consumed of init chunk reader (50%):** The median time consumed for initializing a Chunk Reader.
-  - **The time consumed of init chunk reader (99%):** The P99 time consumed for initializing a Chunk Reader.
-- Build TsBlock from Page Reader
-  - **The time consumed of build tsblock from page reader (avg):** The average time consumed for building a TsBlock using a Page Reader.
-  - **The time consumed of build tsblock from page reader (50%):** The median time consumed for building a TsBlock using a Page Reader.
-  - **The time consumed of build tsblock from page reader (99%):** The P99 time consumed for building a TsBlock using a Page Reader.
-- Build TsBlock from Merge Reader
-  - **The time consumed of build tsblock from merge reader (avg):** The average time consumed for building a TsBlock using a Merge Reader.
-  - **The time consumed of build tsblock from merge reader (50%):** The median time consumed for building a TsBlock using a Merge Reader.
-  - **The time consumed of build tsblock from merge reader (99%):** The P99 time consumed for building a TsBlock using a Merge Reader.
-
-#### 3.4.8 Query Data Exchange
-
-Time consumed of data exchange in queries.
-
-- Get TsBlock via Source Handle
-  - **The time consumed of source handle get tsblock (avg):** The average time consumed for retrieving a TsBlock using the source handle.
-  - **The time consumed of source handle get tsblock (50%):** The median time consumed for retrieving a TsBlock using the source handle.
-  - **The time consumed of source handle get tsblock (99%):** The P99 time consumed for retrieving a TsBlock using the source handle.
-- Deserialize TsBlock via Source Handle - - **The time consumed of source handle deserialize tsblock (avg):** The average time consumed for deserializing a TsBlock via the source handle. - - **The time consumed of source handle deserialize tsblock (50%):** The median time consumed for deserializing a TsBlock via the source handle. - - **The time consumed of source handle deserialize tsblock (99%):** The P99 time consumed for deserializing a TsBlock via the source handle. -- Send TsBlock via Sink Handle - - **The time consumed of sink handle send tsblock (avg):** The average time consumed for sending a TsBlock via the sink handle. - - **The time consumed of sink handle send tsblock (50%):** The median time consumed for sending a TsBlock via the sink handle. - - **The time consumed of sink handle send tsblock (99%):** The P99 time consumed for sending a TsBlock via the sink handle. -- Handle Data Block Event Callback - - **The time consumed of handling data block event callback (avg):** The average time consumed for handling the callback of a data block event during query execution. - - **The time consumed of handling data block event callback (50%):** The median time consumed for handling the callback of a data block event during query execution. - - **The time consumed of handling data block event callback (99%):** The P99 time consumed for handling the callback of a data block event during query execution. -- Get Data Block Task - - **The time consumed of get data block task (avg):** The average time consumed for retrieving a data block task. - - **The time consumed of get data block task (50%):** The median time consumed for retrieving a data block task. - - **The time consumed of get data block task (99%):** The P99 time consumed for retrieving a data block task. - -#### 3.4.9 Query Related Resource - -- **MppDataExchangeManager:** The number of shuffle sink handles and source handles during queries. -- **LocalExecutionPlanner:** The remaining memory available for query fragments. -- **FragmentInstanceManager:** The context information and count of running query fragments. -- **Coordinator:** The number of queries recorded on the node. -- **MemoryPool Size:** The status of the memory pool related to queries. -- **MemoryPool Capacity:** The size of the query-related memory pool, including the maximum and remaining available capacity. -- **DriverScheduler:** The number of queued query tasks. 
-
-#### 3.4.10 Consensus - IoT Consensus
+  - Average Chunk Reader Initialization Time: The average time spent initializing Chunk Reader for node queries
+  - P50 Chunk Reader Initialization Time: Median time spent initializing Chunk Reader for node queries
+  - P99 Chunk Reader Initialization Time: P99 time spent initializing Chunk Reader for node queries
+- Constructing TsBlock Through Page Reader
+  - Average TsBlock Construction Time from Page Reader: The average time it takes for node queries to construct TsBlock through Page Reader
+  - P50 TsBlock Construction Time from Page Reader: The median time spent on constructing TsBlock through Page Reader for node queries
+  - P99 TsBlock Construction Time from Page Reader: P99 time spent on constructing TsBlock through Page Reader for node queries
+- Constructing TsBlock Through Merge Reader
+  - Average TsBlock Construction Time from Merge Reader: The average time taken for node queries to construct TsBlock through Merge Reader
+  - P50 TsBlock Construction Time from Merge Reader: The median time spent on constructing TsBlock through Merge Reader for node queries
+  - P99 TsBlock Construction Time from Merge Reader: P99 time spent on constructing TsBlock through Merge Reader for node queries
+
+#### Query Data Exchange
+
+Time consumed by data exchange during queries.
+
+- Obtain TsBlock through source handle
+  - Average Source Handle TsBlock Retrieval Time: The average time taken for node queries to obtain TsBlock through source handle
+  - P50 Source Handle TsBlock Retrieval Time: The median time taken for node queries to obtain TsBlock through source handle
+  - P99 Source Handle TsBlock Retrieval Time: P99 time taken for node queries to obtain TsBlock through source handle
+- Deserialize TsBlock through source handle
+  - Average Source Handle TsBlock Deserialization Time: The average time taken for node queries to deserialize TsBlock through source handle
+  - P50 Source Handle TsBlock Deserialization Time: The median time taken for node queries to deserialize TsBlock through source handle
+  - P99 Source Handle TsBlock Deserialization Time: P99 time spent on deserializing TsBlock through source handle for node query
+- Send TsBlock through sink handle
+  - Average Sink Handle TsBlock Transmission Time: The average time taken for node queries to send TsBlock through sink handle
+  - P50 Sink Handle TsBlock Transmission Time: The median time taken for node queries to send TsBlock through sink handle
+  - P99 Sink Handle TsBlock Transmission Time: P99 time taken for node queries to send TsBlock through sink handle
+- Data Block Event Callback
+  - Average Data Block Event Acknowledgment Time: The average time taken for node queries to handle data block event callbacks
+  - P50 Data Block Event Acknowledgment Time: The median time taken for node queries to handle data block event callbacks
+  - P99 Data Block Event Acknowledgment Time: P99 time taken for node queries to handle data block event callbacks
+- Get Data Block Tasks
+  - Average Data Block Task Retrieval Time: The average time taken for node queries to obtain data block tasks
+  - P50 Data Block Task Retrieval Time: The median time taken for node queries to obtain data block tasks
+  - P99 Data Block Task Retrieval Time: P99 time consumption for node query to obtain data block task
+
+#### Query Related Resource
+
+- MppDataExchangeManager:The number of shuffle sink handles and source handles during node queries
+- LocalExecutionPlanner: The remaining memory that nodes can allocate to query shards
+- FragmentInstanceManager: The query sharding context 
information and the number of query shards that the node is running +- Coordinator: The number of queries recorded on the node +- MemoryPool Size: Node query related memory pool situation +- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values +- DriverScheduler Count: Number of queue tasks related to node queries + +#### Consensus - IoT Consensus - Memory Usage - - **IoTConsensus Used Memory:** The memory usage of IoT Consensus, including total used memory, queue memory usage, and synchronization memory usage. -- Synchronization between Nodes - - **IoTConsensus Sync Index:** The sync index size of different DataRegions. - - **IoTConsensus Overview:** The total synchronization lag and cached request count of IoT Consensus. - - **IoTConsensus Search Index Rate:** The growth rate of SearchIndex writes for different DataRegions. - - **IoTConsensus Safe Index Rate:** The growth rate of SafeIndex synchronization for different DataRegions. - - **IoTConsensus LogDispatcher Request Size:** The size of synchronization requests sent to other nodes for different DataRegions. - - **Sync Lag:** The synchronization lag size of different DataRegions. - - **Min Peer Sync Lag:** The minimum synchronization lag to different replicas for different DataRegions. - - **Sync Speed Diff of Peers:** The maximum synchronization lag to different replicas for different DataRegions. - - **IoTConsensus LogEntriesFromWAL Rate:** The rate of retrieving log entries from WAL for different DataRegions. - - **IoTConsensus LogEntriesFromQueue Rate:** The rate of retrieving log entries from the queue for different DataRegions. -- Execution Time of Different Stages - - **The Time Consumed of Different Stages (avg):** The average execution time of different stages in IoT Consensus. - - **The Time Consumed of Different Stages (50%):** The median execution time of different stages in IoT Consensus. - - **The Time Consumed of Different Stages (99%):** The P99 execution time of different stages in IoT Consensus. - -#### 3.4.11 Consensus - DataRegion Ratis Consensus - -- **Ratis Stage Time:** The execution time of different stages in Ratis. -- **Write Log Entry:** The execution time for writing logs in Ratis. -- **Remote / Local Write Time:** The time taken for remote and local writes in Ratis. -- **Remote / Local Write QPS****:** The QPS for remote and local writes in Ratis. -- **RatisConsensus Memory:** The memory usage of Ratis consensus. - -#### 3.4.12 Consensus - SchemaRegion Ratis Consensus - -- **Ratis Stage Time:** The execution time of different stages in Ratis. -- **Write Log Entry:** The execution time for writing logs in Ratis. -- **Remote / Local Write Time:** The time taken for remote and local writes in Ratis. -- **Remote / Local Write QPS****:** The QPS for remote and local writes in Ratis. -- **RatisConsensus Memory:** The memory usage of Ratis consensus. 
\ No newline at end of file
+  - IoTConsensus Used Memory: The memory usage of IoT Consensus on the node, including total memory usage, queue usage, and synchronization usage
+- Synchronization Status Between Nodes
+  - IoTConsensus Sync Index Size: SyncIndex size for different DataRegions of the node's IoT Consensus
+  - IoTConsensus Overview:The total synchronization lag and cached request count of IoT Consensus on the node
+  - IoTConsensus Search Index Growth Rate: The growth rate of writing SearchIndex for different DataRegions of the node's IoT Consensus
+  - IoTConsensus Safe Index Growth Rate: The growth rate of synchronous SafeIndex for different DataRegions of the node's IoT Consensus
+  - IoTConsensus LogDispatcher Request Size: The request size for the node's IoT Consensus to synchronize different DataRegions to other nodes
+  - Sync Lag: The synchronization lag of different DataRegions of the node's IoT Consensus
+  - Min Peer Sync Lag: The minimum synchronization lag from different DataRegions to different replicas of the node's IoT Consensus
+  - Peer Sync Speed Difference: The maximum difference in synchronization from different DataRegions to different replicas of the node's IoT Consensus
+  - IoTConsensus LogEntriesFromWAL Rate: The rate at which the node's IoT Consensus obtains logs from WAL for different DataRegions
+  - IoTConsensus LogEntriesFromQueue Rate: The rate at which the node's IoT Consensus retrieves logs from the queue for different DataRegions
+- Different Execution Stages Take Time
+  - The Time Consumed of Different Stages (avg): The average time spent on different execution stages of the node's IoT Consensus
+  - The Time Consumed of Different Stages (50%): The median time spent on different execution stages of the node's IoT Consensus
+  - The Time Consumed of Different Stages (99%):P99 of the time consumption for different execution stages of the node's IoT Consensus
+
+#### Consensus - DataRegion Ratis Consensus
+
+- Ratis Consensus Stage Latency: The time consumption of different stages of node Ratis
+- Ratis Log Write Latency: The time consumption of writing logs at different stages of node Ratis
+- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely
+- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely
+- RatisConsensus Memory Usage:Memory usage of node Ratis
+
+#### Consensus - SchemaRegion Ratis Consensus
+
+- RatisConsensus Stage Latency: The time consumption of different stages of node Ratis
+- Ratis Log Write Latency: The time consumption for writing logs at each stage of node Ratis
+- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely
+- Remote / Local Write Throughput(QPS): QPS written by node Ratis locally or remotely
+- RatisConsensus Memory Usage: Node Ratis Memory Usage
\ No newline at end of file
diff --git a/src/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md
index ec61a2a41..541ff4946 100644
--- a/src/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md
+++ b/src/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md
@@ -192,18 +192,18 @@ This panel displays the current usage of system CPU, memory, disk, and network r
 #### CPU
 
-- CPU Core:CPU cores
-- CPU Load:
-  - System CPU Load:The average CPU load and busyness of the entire system during the sampling time
-  - Process CPU Load:The 
proportion of CPU occupied by the IoTDB process during sampling time
+- CPU Cores:CPU cores
+- CPU Utilization:
+  - System CPU Utilization:The average CPU load and busyness of the entire system during the sampling time
+  - Process CPU Utilization:The proportion of CPU occupied by the IoTDB process during sampling time
 - CPU Time Per Minute:The total CPU time of all processes in the system per minute
 
 #### Memory
 
 - System Memory:The current usage of system memory.
-  - Commited vm size: The size of virtual memory allocated by the operating system to running processes.
-  - Total physical memory:The total amount of available physical memory in the system.
-  - Used physical memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache.
+  - Committed VM Size: The size of virtual memory allocated by the operating system to running processes.
+  - Total Physical Memory:The total amount of available physical memory in the system.
+  - Used Physical Memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache.
 - System Swap Memory:Swap Space memory usage.
 - Process Memory:The usage of memory by the IoTDB process.
   - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file)
@@ -213,35 +213,35 @@ This panel displays the current usage of system CPU, memory, disk, and network r
 #### Disk
 
 - Disk Space:
-  - Total disk space:The maximum disk space that IoTDB can use.
-  - Used disk space:The disk space already used by IoTDB.
-- Log Number Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time.
+  - Total Disk Space:The maximum disk space that IoTDB can use.
+  - Used Disk Space:The disk space already used by IoTDB.
+- Logs Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time.
 - File Count:Number of IoTDB related files
-  - all:All file quantities
+  - All:All file quantities
 - TsFile:Number of TsFiles
-  - seq:Number of sequential TsFiles
-  - unseq:Number of unsequence TsFiles
-  - wal:Number of WAL files
-  - cross-temp:Number of cross space merge temp files
-  - inner-seq-temp:Number of merged temp files in sequential space
-  - innser-unseq-temp:Number of merged temp files in unsequential space
-  - mods:Number of tombstone files
-- Open File Count:Number of file handles opened by the system
+  - Seq:Number of sequential TsFiles
+  - Unseq:Number of unsequence TsFiles
+  - WAL:Number of WAL files
+  - Cross-Temp:Number of cross space merge temp files
+  - Inner-Seq-Temp:Number of merged temp files in sequential space
+  - Inner-Unseq-Temp:Number of merged temp files in unsequential space
+  - Mods:Number of tombstone files
+- Open File Handles:Number of file handles opened by the system
 - File Size:The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file.
-- Disk I/O Busy Rate:Equivalent to the% util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk.
+- Disk Utilization (%):Equivalent to the %util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk. 
- Disk I/O Throughput:The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. -- Disk I/O Ops:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. -- Disk I/O Avg Time:Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. -- Disk I/O Avg Size:Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. -- Disk I/O Avg Queue Size:Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. -- I/O System Call Rate:The frequency of process calls to read and write system calls, similar to IOPS. +- Disk IOPS:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. +- Disk I/O Latency (Avg):Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. +- Disk I/O Request Size (Avg):Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. +- Disk I/O Queue Length (Avg):Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. +- I/O Syscall Rate:The frequency of process calls to read and write system calls, similar to IOPS. - I/O Throughput:The throughput of process I/O can be divided into two categories: actual-read/write and attemppt-read/write. Actual read and actual write refer to the number of bytes that a process actually causes block devices to perform I/O, excluding the parts processed by Page Cache. #### JVM - GC Time Percentage:The proportion of GC time spent by the node JVM in the past minute's time window -- GC Allocated/Promoted Size Detail: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications -- GC Data Size Detail:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value +- GC Allocated/Promoted Size: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications +- GC Live Data Size:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value - Heap Memory:JVM heap memory usage. - Maximum heap memory:The maximum available heap memory size for the JVM. - Committed heap memory:The size of heap memory that has been committed by the JVM. 
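Most of the Average/P50/P99 latency panels described in this appendix are drawn in Grafana from the timer metrics that IoTDB reports to the Prometheus data source configured earlier. As a rough, non-authoritative sketch of what such a panel query can look like (the metric name below is hypothetical; substitute the series names actually exposed by your deployment's metrics endpoint):

```
# Average latency over the last minute, from a hypothetical timer metric
rate(example_operation_latency_seconds_sum[1m])
  / rate(example_operation_latency_seconds_count[1m])

# P99 latency, assuming the timer also publishes a 0.99 quantile gauge
example_operation_latency_seconds{quantile="0.99"}
```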
@@ -250,105 +250,105 @@ This panel displays the current usage of system CPU, memory, disk, and network r - PS Old Space:The size of the PS Old area. - PS Survivor Space:The size of the PS survivor area. - ...(CMS/G1/ZGC, etc) -- Off Heap Memory:Out of heap memory usage. - - direct memory:Out of heap direct memory. - - mapped memory:Out of heap mapped memory. -- GC Number Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC -- GC Time Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC -- GC Number Per Minute Detail:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC -- GC Time Per Minute Detail:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC -- Time Consumed Of Compilation Per Minute:The total time JVM spends compiling per minute -- The Number of Class: - - loaded:The number of classes currently loaded by the JVM - - unloaded:The number of classes uninstalled by the JVM since system startup -- The Number of Java Thread:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. +- Off-Heap Memory:Out of heap memory usage. + - Direct Memory:Out of heap direct memory. + - Mapped Memory:Out of heap mapped memory. +- GCs Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC +- GC Latency Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC +- GC Events Breakdown Per Minute:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC +- GC Pause Time Breakdown Per Minute:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC +- JIT Compilation Time Per Minute:The total time JVM spends compiling per minute +- Loaded & Unloaded Classes: + - Loaded:The number of classes currently loaded by the JVM + - Unloaded:The number of classes uninstalled by the JVM since system startup +- Active Java Threads:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. #### Network Eno refers to the network card connected to the public network, while lo refers to the virtual network card. 
-- Net Speed:The speed of network card sending and receiving data -- Receive/Transmit Data Size:The size of data packets sent or received by the network card, calculated from system restart -- Packet Speed:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets -- Connection Num:The current number of socket connections for the selected process (IoTDB only has TCP) +- Network Speed:The speed of network card sending and receiving data +- Network Throughput (Receive/Transmit):The size of data packets sent or received by the network card, calculated from system restart +- Packet Transmission Rate:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets +- Active TCP Connections:The current number of socket connections for the selected process (IoTDB only has TCP) ### 3.2 Performance Overview Dashboard #### Cluster Overview -- Total CPU Core:Total CPU cores of cluster machines +- Total CPU Cores:Total CPU cores of cluster machines - DataNode CPU Load:CPU usage of each DataNode node in the cluster - Disk - Total Disk Space: Total disk size of cluster machines - - DataNode Disk Usage: The disk usage rate of each DataNode in the cluster -- Total Timeseries: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas -- Cluster: Number of ConfigNode and DataNode nodes in the cluster + - DataNode Disk Utilization: The disk usage rate of each DataNode in the cluster +- Total Time Series: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas +- Cluster Info: Number of ConfigNode and DataNode nodes in the cluster - Up Time: The duration of cluster startup until now -- Total Write Point Per Second: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas +- Total Write Throughput: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas - Memory - Total System Memory: Total memory size of cluster machine system - Total Swap Memory: Total size of cluster machine swap memory - - DataNode Process Memory Usage: Memory usage of each DataNode in the cluster -- Total File Number:Total number of cluster management files + - DataNode Process Memory Utilization: Memory usage of each DataNode in the cluster +- Total Files:Total number of cluster management files - Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage -- Total DataBase: The total number of databases managed by the cluster (including replicas) -- Total DataRegion: The total number of DataRegions managed by the cluster -- Total SchemaRegion: The total number of SchemeRegions managed by the cluster +- Total DataBases: The total number of databases managed by the cluster (including replicas) +- Total DataRegions: The total number of DataRegions managed by the cluster +- Total SchemaRegions: The total number of SchemeRegions managed by the cluster #### Node Overview -- CPU Core: The number of CPU cores in the machine where the node is located +- CPU 
Cores: The number of CPU cores in the machine where the node is located
 - Disk Space: The disk size of the machine where the node is located
-- Timeseries: Number of time series managed by the machine where the node is located (including replicas)
+- Time Series: Number of time series managed by the machine where the node is located (including replicas)
 - System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio
-- Write Point Per Second: The write speed per second of the machine where the node is located (including replicas)
+- Write Throughput: The write speed per second of the machine where the node is located (including replicas)
 - System Memory: The system memory size of the machine where the node is located
 - Swap Memory:The swap memory size of the machine where the node is located
-- File Number: Number of files managed by nodes
+- File Count: Number of files managed by nodes
 
 #### Performance
 
 - Session Idle Time:The total idle time and total busy time of the session connection of the node
-- Client Connection: The client connection status of the node, including the total number of connections and the number of active connections
-- Time Consumed Of Operation: The time consumption of various types of node operations, including average and P99
-- Average Time Consumed Of Interface: The average time consumption of each thrust interface of a node
-- P99 Time Consumed Of Interface: P99 time consumption of various thrust interfaces of nodes
-- Task Number: The number of system tasks for each node
-- Average Time Consumed of Task: The average time spent on various system tasks of a node
-- P99 Time Consumed of Task: P99 time consumption for various system tasks of nodes
-- Operation Per Second: The number of operations per second for a node
+- Client Connections: The client connection status of the node, including the total number of connections and the number of active connections
+- Operation Latency: The time consumption of various types of node operations, including average and P99
+- Average Interface Latency: The average time consumption of each Thrift interface of a node
+- P99 Interface Latency: P99 time consumption of various Thrift interfaces of nodes
+- Total Tasks: The number of system tasks for each node
+- Average Task Latency: The average time spent on various system tasks of a node
+- P99 Task Latency: P99 time consumption for various system tasks of nodes
+- Operations Per Second: The number of operations per second for a node
 - Mainstream Process
-  - Operation Per Second Of Stage: The number of operations per second for each stage of the node's main process
-  - Average Time Consumed Of Stage: The average time consumption of each stage in the main process of a node
-  - P99 Time Consumed Of Stage: P99 time consumption for each stage of the node's main process
+  - Operations Per Second (Stage-wise): The number of operations per second for each stage of the node's main process
+  - Average Stage Latency: The average time consumption of each stage in the main process of a node
+  - P99 Stage Latency: P99 time consumption for each stage of the node's main process
 - Schedule Stage
-  - OPS Of Schedule: The number of operations per second in each sub stage of the node schedule stage
-  - Average Time Consumed Of Schedule Stage:The average time consumption of each sub stage in the node schedule stage
-  - P99 Time Consumed Of Schedule Stage: P99 time consumption for each sub stage of the schedule 
stage of the node + - Schedule Operations Per Second: The number of operations per second in each sub stage of the node schedule stage + - Average Schedule Stage Latency:The average time consumption of each sub stage in the node schedule stage + - P99 Schedule Stage Latency: P99 time consumption for each sub stage of the schedule stage of the node - Local Schedule Sub Stages - - OPS Of Local Schedule Stage: The number of operations per second in each sub stage of the local schedule node - - Average Time Consumed Of Local Schedule Stage: The average time consumption of each sub stage in the local schedule stage of the node - - P99 Time Consumed Of Local Schedule Stage: P99 time consumption for each sub stage of the local schedule stage of the node + - Local Schedule Operations Per Second: The number of operations per second in each sub stage of the local schedule node + - Average Local Schedule Stage Latency: The average time consumption of each sub stage in the local schedule stage of the node + - P99 Local Schedule Latency: P99 time consumption for each sub stage of the local schedule stage of the node - Storage Stage - - OPS Of Storage Stage: The number of operations per second in each sub stage of the node storage stage - - Average Time Consumed Of Storage Stage: Average time consumption of each sub stage in the node storage stage - - P99 Time Consumed Of Storage Stage: P99 time consumption for each sub stage of node storage stage + - Storage Operations Per Second: The number of operations per second in each sub stage of the node storage stage + - Average Storage Stage Latency: Average time consumption of each sub stage in the node storage stage + - P99 Storage Stage Latency: P99 time consumption for each sub stage of node storage stage - Engine Stage - - OPS Of Engine Stage: The number of operations per second in each sub stage of the node engine stage - - Average Time Consumed Of Engine Stage: The average time consumption of each sub stage in the engine stage of a node - - P99 Time Consumed Of Engine Stage: P99 time consumption of each sub stage in the node engine stage + - Engine Operations Per Second: The number of operations per second in each sub stage of the node engine stage + - Average Engine Stage Latency: The average time consumption of each sub stage in the engine stage of a node + - P99 Engine Stage Latency: P99 time consumption of each sub stage in the node engine stage #### System -- CPU Load: CPU load of nodes -- CPU Time Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores -- GC Time Per Minute:The average GC time per minute for nodes, including YGC and FGC +- CPU Utilization: CPU load of nodes +- CPU Latency Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores +- GC Latency Per Minute:The average GC time per minute for nodes, including YGC and FGC - Heap Memory: Node's heap memory usage -- Off Heap Memory: Non heap memory usage of nodes -- The Number Of Java Thread: Number of Java threads on nodes +- Off-Heap Memory: Non heap memory usage of nodes +- Total Java Threads: Number of Java threads on nodes - File Count:Number of files managed by nodes - File Size: Node management file size situation -- Log Number Per Minute: Different types of logs per minute for nodes +- Logs Per Minute: Different types of logs per minute for nodes ### 3.3 ConfigNode Dashboard @@ -359,13 +359,13 @@ This panel displays the performance of all management nodes in the cluster, incl - 
Database Count: Number of databases for nodes - Region - DataRegion Count:Number of DataRegions for nodes - - DataRegion Current Status: The state of the DataRegion of the node + - DataRegion Status: The state of the DataRegion of the node - SchemaRegion Count: Number of SchemeRegions for nodes - - SchemaRegion Current Status: The state of the SchemeRegion of the node -- System Memory: The system memory size of the node -- Swap Memory: Node's swap memory size -- ConfigNodes: The running status of the ConfigNode in the cluster where the node is located -- DataNodes:The DataNode situation of the cluster where the node is located + - SchemaRegion Status: The state of the SchemeRegion of the node +- System Memory Utilization: The system memory size of the node +- Swap Memory Utilization: Node's swap memory size +- ConfigNodes Status: The running status of the ConfigNode in the cluster where the node is located +- DataNodes Status:The DataNode situation of the cluster where the node is located - System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load #### NodeInfo @@ -381,15 +381,15 @@ This panel displays the performance of all management nodes in the cluster, incl #### Protocol - Client Count - - Active Client Num: The number of active clients in each thread pool of a node - - Idle Client Num: The number of idle clients in each thread pool of a node - - Borrowed Client Count: Number of borrowed clients in each thread pool of the node - - Created Client Count: Number of created clients for each thread pool of the node - - Destroyed Client Count: The number of destroyed clients in each thread pool of the node + - Active Clients: The number of active clients in each thread pool of a node + - Idle Clients: The number of idle clients in each thread pool of a node + - Borrowed Clients Per Second: Number of borrowed clients in each thread pool of the node + - Created Clients Per Second: Number of created clients for each thread pool of the node + - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node - Client time situation - - Client Mean Active Time: The average active time of clients in each thread pool of a node - - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node - - Client Mean Idle Time: The average idle time of clients in each thread pool of a node + - Average Client Active Time: The average active time of clients in each thread pool of a node + - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node + - Average Client Idle Time: The average idle time of clients in each thread pool of a node #### Partition Table @@ -402,11 +402,11 @@ This panel displays the performance of all management nodes in the cluster, incl #### Consensus -- Ratis Stage Time: The time consumption of each stage of the node's Ratis -- Write Log Entry: The time required to write a log for the Ratis of a node -- Remote / Local Write Time: The time consumption of remote and local writes for the Ratis of nodes -- Remote / Local Write QPS: Remote and local QPS written to node Ratis -- RatisConsensus Memory: Memory usage of Node Ratis consensus protocol +- Ratis Stage Latency: The time consumption of each stage of the node's Ratis +- Write Log Entry Latency: The time required to write a log for the Ratis of a node +- Remote / Local Write Latency: The time consumption of remote and local writes for the Ratis of nodes +- 
Remote / Local Write Throughput: Remote and local QPS written to node Ratis +- RatisConsensus Memory Utilization: Memory usage of Node Ratis consensus protocol ### 3.4 DataNode Dashboard @@ -414,82 +414,82 @@ This panel displays the monitoring status of all data nodes in the cluster, incl #### Node Overview -- The Number Of Entity: Entity situation of node management -- Write Point Per Second: The write speed per second of the node +- Total Managed Entities: Entity situation of node management +- Write Throughput: The write speed per second of the node - Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. #### Protocol - Node Operation Time Consumption - - The Time Consumed Of Operation (avg): The average time spent on various operations of a node - - The Time Consumed Of Operation (50%): The median time spent on various operations of a node - - The Time Consumed Of Operation (99%): P99 time consumption for various operations of nodes + - Average Operation Latency: The average time spent on various operations of a node + - P50 Operation Latency: The median time spent on various operations of a node + - P99 Operation Latency: P99 time consumption for various operations of nodes - Thrift Statistics - - The QPS Of Interface: QPS of various Thrift interfaces of nodes - - The Avg Time Consumed Of Interface: The average time consumption of each Thrift interface of a node - - Thrift Connection: The number of Thrfit connections of each type of node - - Thrift Active Thread: The number of active Thrift connections for each type of node + - Thrift Interface QPS: QPS of various Thrift interfaces of nodes + - Average Thrift Interface Latency: The average time consumption of each Thrift interface of a node + - Thrift Connections: The number of Thrfit connections of each type of node + - Active Thrift Threads: The number of active Thrift connections for each type of node - Client Statistics - - Active Client Num: The number of active clients in each thread pool of a node - - Idle Client Num: The number of idle clients in each thread pool of a node - - Borrowed Client Count:Number of borrowed clients for each thread pool of a node - - Created Client Count: Number of created clients for each thread pool of the node - - Destroyed Client Count: The number of destroyed clients in each thread pool of the node - - Client Mean Active Time: The average active time of clients in each thread pool of a node - - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node - - Client Mean Idle Time: The average idle time of clients in each thread pool of a node + - Active Clients: The number of active clients in each thread pool of a node + - Idle Clients: The number of idle clients in each thread pool of a node + - Borrowed Clients Per Second:Number of borrowed clients for each thread pool of a node + - Created Clients Per Second: Number of created clients for each thread pool of the node + - Destroyed Clients Per Second: The number of destroyed clients in each thread pool of the node + - Average Client Active Time: The average active time of clients in each thread pool of a node + - Average Client Borrowing Latency: The average borrowing waiting time of clients in each thread pool of a node + - Average Client Idle Time: The average idle time of clients in each thread pool of a node #### Storage Engine - File Count: Number of files of 
 
 #### Storage Engine
 
 - File Count: Number of files of various types managed by nodes
 - File Size: Node management of various types of file sizes
 - TsFile
-  - TsFile Total Size In Each Level: The total size of TsFile files at each level of node management
-  - TsFile Count In Each Level: Number of TsFile files at each level of node management
-  - Avg TsFile Size In Each Level: The average size of TsFile files at each level of node management
-- Task Number: Number of Tasks for Nodes
-- The Time Consumed of Task: The time consumption of tasks for nodes
+  - Total TsFile Size Per Level: The total size of TsFile files at each level of node management
+  - TsFile Count Per Level: Number of TsFile files at each level of node management
+  - Average TsFile Size Per Level: The average size of TsFile files at each level of node management
+- Total Tasks: Number of Tasks for Nodes
+- Task Latency: The time consumption of tasks for nodes
 - Compaction
-  - Compaction Read And Write Per Second: The merge read and write speed of nodes per second
-  - Compaction Number Per Minute: The number of merged nodes per minute
-  - Compaction Process Chunk Status: The number of Chunks in different states merged by nodes
-  - Compacted Point Num Per Minute: The number of merged nodes per minute
+  - Compaction Read/Write Throughput: The compaction read and write speed of the node per second
+  - Compactions Per Minute: The number of compactions performed by the node per minute
+  - Compaction Chunk Status: The number of Chunks in different states processed by compaction on the node
+  - Compacted-Points Per Minute: The number of points compacted by the node per minute
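+
+When the File Count and TsFile panels above look suspicious, the numbers can be cross-checked directly on the DataNode host. A minimal sketch, assuming a default single-disk layout in which TsFiles live under `data/datanode/data`; adjust the path to wherever your data directories are actually configured:
+
+```bash
+# Count the TsFiles on disk for a rough comparison with the
+# TsFile-related items of the File Count panel.
+find /path/to/iotdb/data/datanode/data -name "*.tsfile" | wc -l
+```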
 
 #### Write Performance
 
-- Write Cost(avg): Average node write time, including writing wal and memtable
-- Write Cost(50%): Median node write time, including writing wal and memtable
-- Write Cost(99%): P99 for node write time, including writing wal and memtable
+- Average Write Latency: Average node write time, including writing wal and memtable
+- P50 Write Latency: Median node write time, including writing wal and memtable
+- P99 Write Latency: P99 for node write time, including writing wal and memtable
 - WAL
   - WAL File Size: Total size of WAL files managed by nodes
-  - WAL File Num:Number of WAL files managed by nodes
-  - WAL Nodes Num: Number of WAL nodes managed by nodes
-  - Make Checkpoint Costs: The time required to create various types of CheckPoints for nodes
-  - WAL Serialize Total Cost: Total time spent on node WAL serialization
+  - WAL Files: Number of WAL files managed by nodes
+  - WAL Nodes: Number of WAL nodes managed by nodes
+  - Checkpoint Creation Time: The time required to create various types of CheckPoints for nodes
+  - WAL Serialization Time (Total): Total time spent on node WAL serialization
   - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster
   - Serialize One WAL Info Entry Cost: Node serialization time for a WAL Info Entry
   - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot
   - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush
-  - Effective Info Ratio Of WALNode: The effective information ratio of different WALNodes of nodes
+  - WALNode Effective Info Ratio: The effective information ratio of different WALNodes of nodes
 - WAL Buffer
-  - WAL Buffer Cost: Node WAL flush SyncBuffer takes time, including both synchronous and asynchronous options
+  - WAL Buffer Latency: Time taken by the node to flush the WAL SyncBuffer, including both the synchronous and asynchronous cases
   - WAL Buffer Used Ratio: The usage rate of the WAL Buffer of the node
   - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node
 - Flush Statistics
-  - Flush MemTable Cost(avg): The total time spent on node Flush and the average time spent on each sub stage
-  - Flush MemTable Cost(50%): The total time spent on node Flush and the median time spent on each sub stage
-  - Flush MemTable Cost(99%): The total time spent on node Flush and the P99 time spent on each sub stage
-  - Flush Sub Task Cost(avg): The average time consumption of each node's Flush subtask, including sorting, encoding, and IO stages
-  - Flush Sub Task Cost(50%): The median time consumption of each subtask of the Flush node, including sorting, encoding, and IO stages
-  - Flush Sub Task Cost(99%): The average subtask time P99 for Flush of nodes, including sorting, encoding, and IO stages
+  - Average Flush Latency: The total time spent on node Flush and the average time spent on each sub stage
+  - P50 Flush Latency: The total time spent on node Flush and the median time spent on each sub stage
+  - P99 Flush Latency: The total time spent on node Flush and the P99 time spent on each sub stage
+  - Average Flush Subtask Latency: The average time consumption of each node's Flush subtask, including sorting, encoding, and IO stages
+  - P50 Flush Subtask Latency: The median time consumption of each of the node's Flush subtasks, including sorting, encoding, and IO stages
+  - P99 Flush Subtask Latency: The P99 time consumption of each of the node's Flush subtasks, including sorting, encoding, and IO stages
 - Pending Flush Task Num: The number of Flush tasks in a blocked state for a node
 - Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes
-- Tsfile Compression Ratio Of Flushing MemTable: The compression rate of TsFile corresponding to node flashing Memtable
-- Flush TsFile Size Of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions
-- Size Of Flushing MemTable: The size of the Memtable for node disk flushing
-- Points Num Of Flushing MemTable: The number of points when flashing data in different DataRegions of a node
-- Series Num Of Flushing MemTable: The number of time series when flashing Memtables in different DataRegions of a node
-- Average Point Num Of Flushing MemChunk: The average number of disk flushing points for node MemChunk
+- Tsfile Compression Ratio of Flushing MemTable: The compression ratio of the TsFile written when the node flushes a Memtable
+- Flush TsFile Size of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions
+- Size of Flushing MemTable: The size of the Memtable for node disk flushing
+- Points Num of Flushing MemTable: The number of points when the node flushes data in different DataRegions
+- Series Num of Flushing MemTable: The number of time series when the node flushes Memtables in different DataRegions
+- Average Point Num of Flushing MemChunk: The average number of points in each MemChunk flushed by the node
 
 #### Schema Engine
 
@@ -523,117 +523,117 @@ This panel displays the monitoring status of all data nodes in the cluster, incl
 
 #### Query Engine
 
 - Time Consumption In Each Stage
-  - The time consumed of query plan stages(avg): The average time spent on node queries at each stage
-  - The time consumed of query plan stages(50%): Median time spent on node queries at each stage
-  - The time consumed of query plan stages(99%): P99 time consumption for node query
at each stage + - Average Query Plan Execution Time: The average time spent on node queries at each stage + - P50 Query Plan Execution Time: Median time spent on node queries at each stage + - P99 Query Plan Execution Time: P99 time consumption for node query at each stage - Execution Plan Distribution Time - - The time consumed of plan dispatch stages(avg): The average time spent on node query execution plan distribution - - The time consumed of plan dispatch stages(50%): Median time spent on node query execution plan distribution - - The time consumed of plan dispatch stages(99%): P99 of node query execution plan distribution time + - Average Query Plan Dispatch Time: The average time spent on node query execution plan distribution + - P50 Query Plan Dispatch Time: Median time spent on node query execution plan distribution + - P99 Query Plan Dispatch Time: P99 of node query execution plan distribution time - Execution Plan Execution Time - - The time consumed of query execution stages(avg): The average execution time of node query execution plan - - The time consumed of query execution stages(50%):Median execution time of node query execution plan - - The time consumed of query execution stages(99%): P99 of node query execution plan execution time + - Average Query Execution Time: The average execution time of node query execution plan + - P50 Query Execution Time:Median execution time of node query execution plan + - P99 Query Execution Time: P99 of node query execution plan execution time - Operator Execution Time - - The time consumed of operator execution stages(avg): The average execution time of node query operators - - The time consumed of operator execution(50%): Median execution time of node query operator - - The time consumed of operator execution(99%): P99 of node query operator execution time + - Average Query Operator Execution Time: The average execution time of node query operators + - P50 Query Operator Execution Time: Median execution time of node query operator + - P99 Query Operator Execution Time: P99 of node query operator execution time - Aggregation Query Computation Time - - The time consumed of query aggregation(avg): The average computation time for node aggregation queries - - The time consumed of query aggregation(50%): Median computation time for node aggregation queries - - The time consumed of query aggregation(99%): P99 of node aggregation query computation time + - Average Query Aggregation Execution Time: The average computation time for node aggregation queries + - P50 Query Aggregation Execution Time: Median computation time for node aggregation queries + - P99 Query Aggregation Execution Time: P99 of node aggregation query computation time - File/Memory Interface Time Consumption - - The time consumed of query scan(avg): The average time spent querying file/memory interfaces for nodes - - The time consumed of query scan(50%): Median time spent querying file/memory interfaces for nodes - - The time consumed of query scan(99%): P99 time consumption for node query file/memory interface + - Average Query Scan Execution Time: The average time spent querying file/memory interfaces for nodes + - P50 Query Scan Execution Time: Median time spent querying file/memory interfaces for nodes + - P99 Query Scan Execution Time: P99 time consumption for node query file/memory interface - Number Of Resource Visits - - The usage of query resource(avg): The average number of resource visits for node queries - - The usage of query resource(50%): Median number of 
resource visits for node queries - - The usage of query resource(99%): P99 for node query resource access quantity + - Average Query Resource Utilization: The average number of resource visits for node queries + - P50 Query Resource Utilization: Median number of resource visits for node queries + - P99 Query Resource Utilization: P99 for node query resource access quantity - Data Transmission Time - - The time consumed of query data exchange(avg): The average time spent on node query data transmission - - The time consumed of query data exchange(50%): Median query data transmission time for nodes - - The time consumed of query data exchange(99%): P99 for node query data transmission time + - Average Query Data Exchange Latency: The average time spent on node query data transmission + - P50 Query Data Exchange Latency: Median query data transmission time for nodes + - P99 Query Data Exchange Latency: P99 for node query data transmission time - Number Of Data Transfers - - The count of Data Exchange(avg): The average number of data transfers queried by nodes - - The count of Data Exchange: The quantile of the number of data transfers queried by nodes, including the median and P99 + - Average Query Data Exchange Count: The average number of data transfers queried by nodes + - Query Data Exchange Count: The quantile of the number of data transfers queried by nodes, including the median and P99 - Task Scheduling Quantity And Time Consumption - - The number of query queue: Node query task scheduling quantity - - The time consumed of query schedule time(avg): The average time spent on scheduling node query tasks - - The time consumed of query schedule time(50%): Median time spent on node query task scheduling - - The time consumed of query schedule time(99%): P99 of node query task scheduling time + - Query Queue Length: Node query task scheduling quantity + - Average Query Scheduling Latency: The average time spent on scheduling node query tasks + - P50 Query Scheduling Latency: Median time spent on node query task scheduling + - P99 Query Scheduling Latency: P99 of node query task scheduling time #### Query Interface - Load Time Series Metadata - - The time consumed of load timeseries metadata(avg): The average time taken for node queries to load time series metadata - - The time consumed of load timeseries metadata(50%): Median time spent on loading time series metadata for node queries - - The time consumed of load timeseries metadata(99%): P99 time consumption for node query loading time series metadata + - Average Timeseries Metadata Load Time: The average time taken for node queries to load time series metadata + - P50 Timeseries Metadata Load Time: Median time spent on loading time series metadata for node queries + - P99 Timeseries Metadata Load Time: P99 time consumption for node query loading time series metadata - Read Time Series - - The time consumed of read timeseries metadata(avg): The average time taken for node queries to read time series - - The time consumed of read timeseries metadata(50%): The median time taken for node queries to read time series - - The time consumed of read timeseries metadata(99%): P99 time consumption for node query reading time series + - Average Timeseries Metadata Read Time: The average time taken for node queries to read time series + - P50 Timeseries Metadata Read Time: The median time taken for node queries to read time series + - P99 Timeseries Metadata Read Time: P99 time consumption for node query reading time series - Modify Time Series 
Metadata
-  - The time consumed of timeseries metadata modification(avg):The average time taken for node queries to modify time series metadata
-  - The time consumed of timeseries metadata modification(50%): Median time spent on querying and modifying time series metadata for nodes
-  - The time consumed of timeseries metadata modification(99%): P99 time consumption for node query and modification of time series metadata
+  - Average Timeseries Metadata Modification Time: The average time taken for node queries to modify time series metadata
+  - P50 Timeseries Metadata Modification Time: Median time spent on querying and modifying time series metadata for nodes
+  - P99 Timeseries Metadata Modification Time: P99 time consumption for node query and modification of time series metadata
 - Load Chunk Metadata List
-  - The time consumed of load chunk metadata list(avg): The average time it takes for node queries to load Chunk metadata lists
-  - The time consumed of load chunk metadata list(50%): Median time spent on node query loading Chunk metadata list
-  - The time consumed of load chunk metadata list(99%): P99 time consumption for node query loading Chunk metadata list
+  - Average Chunk Metadata List Load Time: The average time it takes for node queries to load Chunk metadata lists
+  - P50 Chunk Metadata List Load Time: Median time spent on node query loading Chunk metadata list
+  - P99 Chunk Metadata List Load Time: P99 time consumption for node query loading Chunk metadata list
 - Modify Chunk Metadata
-  - The time consumed of chunk metadata modification(avg): The average time it takes for node queries to modify Chunk metadata
-  - The time consumed of chunk metadata modification(50%): The total number of bits spent on modifying Chunk metadata for node queries
-  - The time consumed of chunk metadata modification(99%): P99 time consumption for node query and modification of Chunk metadata
+  - Average Chunk Metadata Modification Time: The average time it takes for node queries to modify Chunk metadata
+  - P50 Chunk Metadata Modification Time: The median time spent on modifying Chunk metadata for node queries
+  - P99 Chunk Metadata Modification Time: P99 time consumption for node query and modification of Chunk metadata
 - Filter According To Chunk Metadata
-  - The time consumed of chunk metadata filter(avg): The average time spent on node queries filtering by Chunk metadata
-  - The time consumed of chunk metadata filter(50%): Median filtering time for node queries based on Chunk metadata
-  - The time consumed of chunk metadata filter(99%): P99 time consumption for node query filtering based on Chunk metadata
+  - Average Chunk Metadata Filtering Time: The average time spent on node queries filtering by Chunk metadata
+  - P50 Chunk Metadata Filtering Time: Median filtering time for node queries based on Chunk metadata
+  - P99 Chunk Metadata Filtering Time: P99 time consumption for node query filtering based on Chunk metadata
 - Constructing Chunk Reader
-  - The time consumed of construct chunk reader(avg): The average time spent on constructing Chunk Reader for node queries
-  - The time consumed of construct chunk reader(50%): Median time spent on constructing Chunk Reader for node queries
-  - The time consumed of construct chunk reader(99%): P99 time consumption for constructing Chunk Reader for node queries
+  - Average Chunk Reader Construction Time: The average time spent on constructing Chunk Reader for node queries
+  - P50 Chunk Reader Construction Time: Median time spent on constructing
Chunk Reader for node queries + - P99 Chunk Reader Construction Time: P99 time consumption for constructing Chunk Reader for node queries - Read Chunk - - The time consumed of read chunk(avg): The average time taken for node queries to read Chunks - - The time consumed of read chunk(50%): Median time spent querying nodes to read Chunks - - The time consumed of read chunk(99%): P99 time spent on querying and reading Chunks for nodes + - Average Chunk Read Time: The average time taken for node queries to read Chunks + - P50 Chunk Read Time: Median time spent querying nodes to read Chunks + - P99 Chunk Read Time: P99 time spent on querying and reading Chunks for nodes - Initialize Chunk Reader - - The time consumed of init chunk reader(avg): The average time spent initializing Chunk Reader for node queries - - The time consumed of init chunk reader(50%): Median time spent initializing Chunk Reader for node queries - - The time consumed of init chunk reader(99%):P99 time spent initializing Chunk Reader for node queries + - Average Chunk Reader Initialization Time: The average time spent initializing Chunk Reader for node queries + - P50 Chunk Reader Initialization Time: Median time spent initializing Chunk Reader for node queries + - P99 Chunk Reader Initialization Time:P99 time spent initializing Chunk Reader for node queries - Constructing TsBlock Through Page Reader - - The time consumed of build tsblock from page reader(avg): The average time it takes for node queries to construct TsBlock through Page Reader - - The time consumed of build tsblock from page reader(50%): The median time spent on constructing TsBlock through Page Reader for node queries - - The time consumed of build tsblock from page reader(99%):Node query using Page Reader to construct TsBlock time-consuming P99 + - Average TsBlock Construction Time from Page Reader: The average time it takes for node queries to construct TsBlock through Page Reader + - P50 TsBlock Construction Time from Page Reader: The median time spent on constructing TsBlock through Page Reader for node queries + - P99 TsBlock Construction Time from Page Reader:Node query using Page Reader to construct TsBlock time-consuming P99 - Query the construction of TsBlock through Merge Reader - - The time consumed of build tsblock from merge reader(avg): The average time taken for node queries to construct TsBlock through Merge Reader - - The time consumed of build tsblock from merge reader(50%): The median time spent on constructing TsBlock through Merge Reader for node queries - - The time consumed of build tsblock from merge reader(99%): Node query using Merge Reader to construct TsBlock time-consuming P99 + - Average TsBlock Construction Time from Merge Reader: The average time taken for node queries to construct TsBlock through Merge Reader + - P50 TsBlock Construction Time from Merge Reader: The median time spent on constructing TsBlock through Merge Reader for node queries + - P99 TsBlock Construction Time from Merge Reader: Node query using Merge Reader to construct TsBlock time-consuming P99 #### Query Data Exchange The data exchange for the query is time-consuming. 
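+
+Most of the panels in the query sections above come in avg/P50/P99 variants. When digging into one of them outside Grafana, the same quantile can be computed against Prometheus directly. A minimal sketch, assuming Prometheus is reachable at the address below and using a purely hypothetical histogram name (`hypothetical_query_latency_bucket`); substitute the real series name found on the node's `/metrics` endpoint, which may also be exported as a summary with pre-computed quantile labels:
+
+```bash
+# Ask Prometheus for the P99 of a (hypothetical) latency histogram over the last 5 minutes.
+curl -s -G 'http://127.0.0.1:9090/api/v1/query' \
+  --data-urlencode 'query=histogram_quantile(0.99, sum by (le) (rate(hypothetical_query_latency_bucket[5m])))'
+```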
 - Obtain TsBlock through source handle
-  - The time consumed of source handle get tsblock(avg): The average time taken for node queries to obtain TsBlock through source handle
-  - The time consumed of source handle get tsblock(50%):Node query obtains the median time spent on TsBlock through source handle
-  - The time consumed of source handle get tsblock(99%): Node query obtains TsBlock time P99 through source handle
+  - Average Source Handle TsBlock Retrieval Time: The average time taken for node queries to obtain TsBlock through source handle
+  - P50 Source Handle TsBlock Retrieval Time: The median time taken for node queries to obtain TsBlock through source handle
+  - P99 Source Handle TsBlock Retrieval Time: P99 time taken for node queries to obtain TsBlock through source handle
 - Deserialize TsBlock through source handle
-  - The time consumed of source handle deserialize tsblock(avg): The average time taken for node queries to deserialize TsBlock through source handle
-  - The time consumed of source handle deserialize tsblock(50%): The median time taken for node queries to deserialize TsBlock through source handle
-  - The time consumed of source handle deserialize tsblock(99%): P99 time spent on deserializing TsBlock through source handle for node query
+  - Average Source Handle TsBlock Deserialization Time: The average time taken for node queries to deserialize TsBlock through source handle
+  - P50 Source Handle TsBlock Deserialization Time: The median time taken for node queries to deserialize TsBlock through source handle
+  - P99 Source Handle TsBlock Deserialization Time: P99 time spent on deserializing TsBlock through source handle for node query
 - Send TsBlock through sink handle
-  - The time consumed of sink handle send tsblock(avg): The average time taken for node queries to send TsBlock through sink handle
-  - The time consumed of sink handle send tsblock(50%): Node query median time spent sending TsBlock through sink handle
-  - The time consumed of sink handle send tsblock(99%): Node query sends TsBlock through sink handle with a time consumption of P99
+  - Average Sink Handle TsBlock Transmission Time: The average time taken for node queries to send TsBlock through sink handle
+  - P50 Sink Handle TsBlock Transmission Time: The median time taken for node queries to send TsBlock through sink handle
+  - P99 Sink Handle TsBlock Transmission Time: P99 time taken for node queries to send TsBlock through sink handle
 - Callback data block event
-  - The time consumed of on acknowledge data block event task(avg): The average time taken for node query callback data block event
-  - The time consumed of on acknowledge data block event task(50%): Median time spent on node query callback data block event
-  - The time consumed of on acknowledge data block event task(99%): P99 time consumption for node query callback data block event
+  - Average Data Block Event Acknowledgment Time: The average time taken for node query callback data block event
+  - P50 Data Block Event Acknowledgment Time: Median time spent on node query callback data block event
+  - P99 Data Block Event Acknowledgment Time: P99 time consumption for node query callback data block event
 - Get Data Block Tasks
-  - The time consumed of get data block task(avg): The average time taken for node queries to obtain data block tasks
-  - The time consumed of get data block task(50%): The median time taken for node queries to obtain data block tasks
-  - The time consumed of get data block task(99%): P99 time consumption for node query to obtain data block task
+  - Average Data Block Task Retrieval Time: The average time taken for node queries to obtain data block tasks
+  - P50 Data Block Task Retrieval Time: The median time taken for node queries to obtain data block tasks
+  - P99 Data Block Task Retrieval Time: P99 time consumption for node query to obtain data block task
 
 #### Query Related Resource
 
@@ -643,40 +643,40 @@ The data exchange for the query is time-consuming.
 - Coordinator: The number of queries recorded on the node
 - MemoryPool Size: Node query related memory pool situation
 - MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values
-- DriverScheduler: Number of queue tasks related to node queries
+- DriverScheduler Count: Number of queue tasks related to node queries
 
 #### Consensus - IoT Consensus
 
 - Memory Usage
   - IoTConsensus Used Memory: The memory usage of IoT Consumes for nodes, including total memory usage, queue usage, and synchronization usage
 - Synchronization Status Between Nodes
-  - IoTConsensus Sync Index: SyncIndex size for different DataRegions of IoT Consumption nodes
+  - IoTConsensus Sync Index Size: SyncIndex size for different DataRegions of the node's IoT Consensus
   - IoTConsensus Overview:The total synchronization gap and cached request count of IoT consumption for nodes
-  - IoTConsensus Search Index Rate: The growth rate of writing SearchIndex for different DataRegions of IoT Consumer nodes
-  - IoTConsensus Safe Index Rate: The growth rate of synchronous SafeIndex for different DataRegions of IoT Consumer nodes
+  - IoTConsensus Search Index Growth Rate: The growth rate of the write SearchIndex for different DataRegions of the node's IoT Consensus
+  - IoTConsensus Safe Index Growth Rate: The growth rate of the synchronized SafeIndex for different DataRegions of the node's IoT Consensus
   - IoTConsensus LogDispatcher Request Size: The request size for node IoT Consusus to synchronize different DataRegions to other nodes
   - Sync Lag: The size of synchronization gap between different DataRegions in IoT Consumption node
   - Min Peer Sync Lag: The minimum synchronization gap between different DataRegions and different replicas of node IoT Consumption
-  - Sync Speed Diff Of Peers: The maximum difference in synchronization from different DataRegions to different replicas for node IoT Consumption
+  - Peer Sync Speed Difference: The maximum difference in synchronization from different DataRegions to different replicas for the node's IoT Consensus
  - IoTConsensus LogEntriesFromWAL Rate: The rate at which nodes IoT Consumus obtain logs from WAL for different DataRegions
  - IoTConsensus LogEntriesFromQueue Rate: The rate at which nodes IoT Consumes different DataRegions retrieve logs from the queue
 - Different Execution Stages Take Time
-  - The Time Consumed Of Different Stages (avg): The average time spent on different execution stages of node IoT Consumus
-  - The Time Consumed Of Different Stages (50%): The median time spent on different execution stages of node IoT Consusus
-  - The Time Consumed Of Different Stages (99%):P99 of the time consumption for different execution stages of node IoT Consusus
+  - The Time Consumed of Different Stages (avg): The average time spent on different execution stages of the node's IoT Consensus
+  - The Time Consumed of Different Stages (50%): The median time spent on different execution stages of the node's IoT Consensus
+  - The Time Consumed of Different Stages (99%): P99 of the time consumption for different execution stages of the node's IoT Consensus
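+
+During maintenance it can be useful to watch the IoT Consensus replication gap from a shell as well as from the Sync Lag panel. A minimal sketch, assuming the Prometheus reporter is enabled on the DataNode; the address, the port, and the grep pattern are placeholders, since the exact series names depend on the IoTDB version:
+
+```bash
+# Poll the raw metric output every 5 seconds and keep only lines that
+# mention IoTConsensus; refine the pattern once the real names are known.
+watch -n 5 'curl -s http://127.0.0.1:9091/metrics | grep -i "iot_consensus"'
+```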
 
 #### Consensus - DataRegion Ratis Consensus
 
-- Ratis Stage Time: The time consumption of different stages of node Ratis
-- Write Log Entry: The time consumption of writing logs at different stages of node Ratis
-- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely
-- Remote / Local Write QPS: QPS written by node Ratis locally or remotely
-- RatisConsensus Memory:Memory usage of node Ratis
+- Ratis Consensus Stage Latency: The time consumption of different stages of node Ratis
+- Ratis Log Write Latency: The time consumption of writing logs at different stages of node Ratis
+- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely
+- Remote / Local Write Throughput (QPS): QPS written by node Ratis locally or remotely
+- RatisConsensus Memory Usage: Memory usage of node Ratis
 
 #### Consensus - SchemaRegion Ratis Consensus
 
-- Ratis Stage Time: The time consumption of different stages of node Ratis
-- Write Log Entry: The time consumption for writing logs at each stage of node Ratis
-- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotelyThe time it takes for node Ratis to write locally or remotely
-- Remote / Local Write QPS: QPS written by node Ratis locally or remotely
-- RatisConsensus Memory: Node Ratis Memory Usage
\ No newline at end of file
+- Ratis Consensus Stage Latency: The time consumption of different stages of node Ratis
+- Ratis Log Write Latency: The time consumption for writing logs at each stage of node Ratis
+- Remote / Local Write Latency: The time it takes for node Ratis to write locally or remotely
+- Remote / Local Write Throughput (QPS): QPS written by node Ratis locally or remotely
+- RatisConsensus Memory Usage: Node Ratis Memory Usage
\ No newline at end of file
diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md
index 14b1e29e5..8d2526c4a 100644
--- a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md
+++ b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md
@@ -190,185 +190,185 @@ cd grafana-*
 该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。
 
-#### 3.1.1 CPU
+#### CPU
 
-- CPU Core:CPU 核数
-- CPU Load:
-  - System CPU Load:整个系统在采样时间内 CPU 的平均负载和繁忙程度
-  - Process CPU Load:IoTDB 进程在采样时间内占用的 CPU 比例
+- CPU Cores:CPU 核数
+- CPU Utilization:
+  - System CPU Utilization:整个系统在采样时间内 CPU 的平均负载和繁忙程度
+  - Process CPU Utilization:IoTDB 进程在采样时间内占用的 CPU 比例
 - CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和
 
-#### 3.1.2 Memory
+#### Memory
 
 - System Memory:当前系统内存的使用情况。
-  - Commited vm size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。
-  - Total physical memory:系统可用物理内存的总量。
-  - Used physical memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。
+  - Committed VM Size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。
+  - Total Physical Memory:系统可用物理内存的总量。
+  - Used Physical Memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。
 - System Swap Memory:交换空间(Swap Space)内存用量。
 - Process Memory:IoTDB 进程使用内存的情况。
   - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小)
   - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。
   - Used Memory:IoTDB 进程当前已经使用的内存总量。
 
-#### 3.1.3 Disk
+#### Disk
 
 - Disk Space:
-  - Total disk space:IoTDB 可使用的最大磁盘空间。
-  - Used disk space:IoTDB 已经使用的磁盘空间。
-- Log Number Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。
+  - Total Disk Space:IoTDB 可使用的最大磁盘空间。
+  - Used Disk
Space:IoTDB 已经使用的磁盘空间。 +- Logs Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 - File Count:IoTDB 相关文件数量 - - all:所有文件数量 + - All:所有文件数量 - TsFile:TsFile 数量 - - seq:顺序 TsFile 数量 - - unseq:乱序 TsFile 数量 - - wal:WAL 文件数量 - - cross-temp:跨空间合并 temp 文件数量 - - inner-seq-temp:顺序空间内合并 temp 文件数量 - - innser-unseq-temp:乱序空间内合并 temp 文件数量 - - mods:墓碑文件数量 -- Open File Count:系统打开的文件句柄数量 + - Seq:顺序 TsFile 数量 + - Unseq:乱序 TsFile 数量 + - WAL:WAL 文件数量 + - Cross-Temp:跨空间合并 temp 文件数量 + - Tnner-Seq-Temp:顺序空间内合并 temp 文件数量 + - Innser-Unseq-Temp:乱序空间内合并 temp 文件数量 + - Mods:墓碑文件数量 +- Open File Handles:系统打开的文件句柄数量 - File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk I/O Busy Rate:等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 +- Disk Utilization (%):等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 - Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk I/O Ops:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Avg Time:等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Avg Size:等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Avg Queue Size:等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O System Call Rate:进程调用读写系统调用的频率,类似于 IOPS。 +- Disk IOPS:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 +- Disk I/O Latency (Avg):等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 +- Disk I/O Request Size (Avg):等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 +- Disk I/O Queue Length (Avg):等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 +- I/O Syscall Rate:进程调用读写系统调用的频率,类似于 IOPS。 - I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 -#### 3.1.4 JVM +#### JVM - GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size Detail: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Data Size Detail:节点 JVM 长期存活的对象大小和对应代际允许的最大值 +- GC Allocated/Promoted Size: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 +- GC Live Data Size:节点 JVM 长期存活的对象大小和对应代际允许的最大值 - Heap Memory:JVM 堆内存使用情况。 - - Maximum heap memory:JVM 最大可用的堆内存大小。 - - Committed heap memory:JVM 已提交的堆内存大小。 - - Used heap memory:JVM 已经使用的堆内存大小。 + - Maximum Heap Memory:JVM 最大可用的堆内存大小。 + - Committed Heap Memory:JVM 已提交的堆内存大小。 + - Used Heap Memory:JVM 已经使用的堆内存大小。 - PS Eden Space:PS Young 区的大小。 - PS Old Space:PS Old 区的大小。 - PS Survivor Space:PS Survivor 区的大小。 - ...(CMS/G1/ZGC 等) -- Off Heap Memory:堆外内存用量。 - - direct memory:堆外直接内存。 - - mapped memory:堆外映射内存。 -- GC Number Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Number Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- Time Consumed Of Compilation Per Minute:每分钟 JVM 用于编译的总时间 -- The Number of Class: - - loaded:JVM 目前已经加载的类的数量 - - unloaded:系统启动至今 JVM 卸载的类的数量 -- The Number of Java Thread:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 - -#### 3.1.5 Network +- Off-Heap Memory:堆外内存用量。 + - Direct Memory:堆外直接内存。 + - Mapped Memory:堆外映射内存。 +- GCs Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC +- GC Latency Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC +- GC Events Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC +- GC Pause Time Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 
FGC +- JIT Compilation Time Per Minute:每分钟 JVM 用于编译的总时间 +- Loaded & Unloaded Classes: + - Loaded:JVM 目前已经加载的类的数量 + - Unloaded:系统启动至今 JVM 卸载的类的数量 +- Active Java Threads:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 + +#### Network eno 指的是到公网的网卡,lo 是虚拟网卡。 -- Net Speed:网卡发送和接收数据的速度 -- Receive/Transmit Data Size:网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Speed:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Connection Num:当前选定进程的 socket 连接数(IoTDB只有 TCP) +- Network Speed:网卡发送和接收数据的速度 +- Network Throughput (Receive/Transmit):网卡发送或者接收的数据包大小,自系统重启后算起 +- Packet Transmission Rate:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 +- Active TCP Connections:当前选定进程的 socket 连接数(IoTDB只有 TCP) ### 3.2 整体性能面板(Performance Overview Dashboard) -#### 3.2.1 Cluster Overview +#### Cluster Overview -- Total CPU Core: 集群机器 CPU 总核数 +- Total CPU Cores: 集群机器 CPU 总核数 - DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 - 磁盘 - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Usage: 集群各 DataNode 的磁盘使用率 -- Total Timeseries: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster: 集群 ConfigNode 和 DataNode 节点数量 + - DataNode Disk Utilization: 集群各 DataNode 的磁盘使用率 +- Total Time Series: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 +- Cluster Info: 集群 ConfigNode 和 DataNode 节点数量 - Up Time: 集群启动至今的时长 -- Total Write Point Per Second: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 +- Total Write Throughput: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 - 内存 - Total System Memory: 集群机器系统内存总大小 - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Usage: 集群各 DataNode 的内存使用率 -- Total File Number: 集群管理文件总数量 + - DataNode Process Memory Utilization: 集群各 DataNode 的内存使用率 +- Total Files: 集群管理文件总数量 - Cluster System Overview: 集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBase: 集群管理的 Database 总数(含副本) -- Total DataRegion: 集群管理的 DataRegion 总数 -- Total SchemaRegion: 集群管理的 SchemaRegion 总数 +- Total DataBases: 集群管理的 Database 总数(含副本) +- Total DataRegions: 集群管理的 DataRegion 总数 +- Total SchemaRegions: 集群管理的 SchemaRegion 总数 -#### 3.2.2 Node Overview +#### Node Overview -- CPU Core: 节点所在机器的 CPU 核数 +- CPU Cores: 节点所在机器的 CPU 核数 - Disk Space: 节点所在机器的磁盘大小 -- Timeseries: 节点所在机器管理的时间序列数量(含副本) +- Time Series: 节点所在机器管理的时间序列数量(含副本) - System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Point Per Second: 节点所在机器的每秒写入速度(含副本) +- Write Throughput: 节点所在机器的每秒写入速度(含副本) - System Memory: 节点所在机器的系统内存大小 - Swap Memory: 节点所在机器的交换内存大小 -- File Number: 节点管理的文件数 +- File Count: 节点管理的文件数 -#### 3.2.3 Performance +#### Performance - Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connection: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Time Consumed Of Operation: 节点的各类型操作耗时,包括平均值和P99 -- Average Time Consumed Of Interface: 节点的各个 thrift 接口平均耗时 -- P99 Time Consumed Of Interface: 节点的各个 thrift 接口的 P99 耗时数 -- Task Number: 节点的各项系统任务数量 -- Average Time Consumed of Task: 节点的各项系统任务的平均耗时 -- P99 Time Consumed of Task: 节点的各项系统任务的 P99 耗时 -- Operation Per Second: 节点的每秒操作数 +- Client Connections: 节点的客户端连接情况,包括总连接数和活跃连接数 +- Operation Latency: 节点的各类型操作耗时,包括平均值和P99 +- Average Interface Latency: 节点的各个 thrift 接口平均耗时 +- P99 Interface Latency: 节点的各个 thrift 接口的 P99 耗时数 +- Total Tasks: 节点的各项系统任务数量 +- Average Task Latency: 节点的各项系统任务的平均耗时 +- P99 Task Latency: 节点的各项系统任务的 P99 耗时 +- Operations Per Second: 节点的每秒操作数 - 主流程 - - Operation Per Second Of Stage: 节点主流程各阶段的每秒操作数 - - Average Time Consumed Of Stage: 节点主流程各阶段平均耗时 - - P99 Time Consumed Of Stage: 节点主流程各阶段 P99 耗时 + - Operations Per Second (Stage-wise): 节点主流程各阶段的每秒操作数 + - Average Stage Latency: 节点主流程各阶段平均耗时 + - P99 Stage Latency: 节点主流程各阶段 P99 耗时 - Schedule 阶段 - - OPS Of Schedule: 节点 schedule 阶段各子阶段每秒操作数 - - Average Time Consumed Of Schedule Stage: 节点 
schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Schedule Stage: 节点的 schedule 阶段各子阶段 P99 耗时 + - Schedule Operations Per Second: 节点 schedule 阶段各子阶段每秒操作数 + - Average Schedule Stage Latency: 节点 schedule 阶段各子阶段平均耗时 + - P99 Schedule Stage Latency: 节点的 schedule 阶段各子阶段 P99 耗时 - Local Schedule 各子阶段 - - OPS Of Local Schedule Stage: 节点 local schedule 各子阶段每秒操作数 - - Average Time Consumed Of Local Schedule Stage: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Local Schedule Stage: 节点的 local schedule 阶段各子阶段 P99 耗时 + - Local Schedule Operations Per Second: 节点 local schedule 各子阶段每秒操作数 + - Average Local Schedule Stage Latency: 节点 local schedule 阶段各子阶段平均耗时 + - P99 Local Schedule Latency: 节点的 local schedule 阶段各子阶段 P99 耗时 - Storage 阶段 - - OPS Of Storage Stage: 节点 storage 阶段各子阶段每秒操作数 - - Average Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段平均耗时 - - P99 Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段 P99 耗时 + - Storage Operations Per Second: 节点 storage 阶段各子阶段每秒操作数 + - Average Storage Stage Latency: 节点 storage 阶段各子阶段平均耗时 + - P99 Storage Stage Latency: 节点 storage 阶段各子阶段 P99 耗时 - Engine 阶段 - - OPS Of Engine Stage: 节点 engine 阶段各子阶段每秒操作数 - - Average Time Consumed Of Engine Stage: 节点的 engine 阶段各子阶段平均耗时 - - P99 Time Consumed Of Engine Stage: 节点 engine 阶段各子阶段的 P99 耗时 + - Engine Operations Per Second: 节点 engine 阶段各子阶段每秒操作数 + - Average Engine Stage Latency: 节点的 engine 阶段各子阶段平均耗时 + - P99 Engine Stage Latency: 节点 engine 阶段各子阶段的 P99 耗时 -#### 3.2.4 System +#### System -- CPU Load: 节点的 CPU 负载 -- CPU Time Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Time Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC +- CPU Utilization: 节点的 CPU 负载 +- CPU Latency Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 +- GC Latency Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC - Heap Memory: 节点的堆内存使用情况 -- Off Heap Memory: 节点的非堆内存使用情况 -- The Number Of Java Thread: 节点的 Java 线程数量情况 +- Off-Heap Memory: 节点的非堆内存使用情况 +- Total Java Threads: 节点的 Java 线程数量情况 - File Count: 节点管理的文件数量情况 - File Size: 节点管理文件大小情况 -- Log Number Per Minute: 节点的每分钟不同类型日志情况 +- Logs Per Minute: 节点的每分钟不同类型日志情况 ### 3.3 ConfigNode 面板(ConfigNode Dashboard) 该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 -#### 3.3.1 Node Overview +#### Node Overview - Database Count: 节点的数据库数量 - Region - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Current Status: 节点的 DataRegion 的状态 + - DataRegion Status: 节点的 DataRegion 的状态 - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Current Status: 节点的 SchemaRegion 的状态 -- System Memory: 节点的系统内存大小 -- Swap Memory: 节点的交换区内存大小 -- ConfigNodes: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes: 节点所在集群的 DataNode 情况 + - SchemaRegion Status: 节点的 SchemaRegion 的状态 +- System Memory Utilization: 节点的系统内存大小 +- Swap Memory Utilization: 节点的交换区内存大小 +- ConfigNodes Status: 节点所在集群的 ConfigNode 的运行状态 +- DataNodes Status: 节点所在集群的 DataNode 情况 - System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 -#### 3.3.2 NodeInfo +#### NodeInfo - Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode - ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 @@ -378,20 +378,20 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 - DataRegionGroup Leader Distribution: 节点所在集群的 DataRegionGroup 的 Leader 分布情况 -#### 3.3.3 Protocol +#### Protocol - 客户端数量统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点各线程池的借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 + - Active Clients: 节点各线程池的活跃客户端数量 + - Idle Clients: 节点各线程池的空闲客户端数量 + - Borrowed Clients Per Second: 节点各线程池的借用客户端数量 + - Created Clients Per Second: 节点各线程池的创建客户端数量 + - Destroyed 
Clients Per Second: 节点各线程池的销毁客户端数量 - 客户端时间情况 - - Client Mean Active Time: 节点各线程池客户端的平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + - Average Client Active Time: 节点各线程池客户端的平均活跃时间 + - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 + - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 -#### 3.3.4 Partition Table +#### Partition Table - SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 - DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 @@ -400,98 +400,98 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - DataRegion Status: 节点所在集群的 DataRegion 状态 - SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 -#### 3.3.5 Consensus +#### Consensus -- Ratis Stage Time: 节点的 Ratis 各阶段耗时 -- Write Log Entry: 节点的 Ratis 写 Log 的耗时 -- Remote / Local Write Time: 节点的 Ratis 的远程写入和本地写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory: 节点 Ratis 共识协议的内存使用 +- Ratis Stage Latency: 节点的 Ratis 各阶段耗时 +- Write Log Entry Latency: 节点的 Ratis 写 Log 的耗时 +- Remote/Local Write Latency: 节点的 Ratis 的远程写入和本地写入的耗时 +- Remote/Local Write Throughput: 节点 Ratis 的远程和本地写入的 QPS +- RatisConsensus Memory Utilization: 节点 Ratis 共识协议的内存使用 ### 3.4 DataNode 面板(DataNode Dashboard) 该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 -#### 3.4.1 Node Overview +#### Node Overview -- The Number Of Entity: 节点管理的实体情况 -- Write Point Per Second: 节点的每秒写入速度 +- Total Managed Entities: 节点管理的实体情况 +- Write Throughput: 节点的每秒写入速度 - Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 -#### 3.4.2 Protocol +#### Protocol - 节点操作耗时 - - The Time Consumed Of Operation (avg): 节点的各项操作的平均耗时 - - The Time Consumed Of Operation (50%): 节点的各项操作耗时的中位数 - - The Time Consumed Of Operation (99%): 节点的各项操作耗时的P99 + - Average Operation Latency: 节点的各项操作的平均耗时 + - P50 Operation Latency: 节点的各项操作耗时的中位数 + - P99 Operation Latency: 节点的各项操作耗时的P99 - Thrift统计 - - The QPS Of Interface: 节点各个 Thrift 接口的 QPS - - The Avg Time Consumed Of Interface: 节点各个 Thrift 接口的平均耗时 - - Thrift Connection: 节点的各类型的 Thrfit 连接数量 - - Thrift Active Thread: 节点各类型的活跃 Thrift 连接数量 + - Thrift Interface QPS: 节点各个 Thrift 接口的 QPS + - Average Thrift Interface Latency: 节点各个 Thrift 接口的平均耗时 + - Thrift Connections: 节点的各类型的 Thrfit 连接数量 + - Active Thrift Threads: 节点各类型的活跃 Thrift 连接数量 - 客户端统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点的各线程池借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 - - Client Mean Active Time: 节点各线程池的客户端平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + - Active Clients: 节点各线程池的活跃客户端数量 + - Idle Clients: 节点各线程池的空闲客户端数量 + - Borrowed Clients Per Second: 节点的各线程池借用客户端数量 + - Created Clients Per Second: 节点各线程池的创建客户端数量 + - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 + - Average Client Active Time: 节点各线程池的客户端平均活跃时间 + - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 + - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 -#### 3.4.3 Storage Engine +#### Storage Engine - File Count: 节点管理的各类型文件数量 - File Size: 节点管理的各类型文件大小 - TsFile - - TsFile Total Size In Each Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count In Each Level: 节点管理的各级别 TsFile 文件数量 - - Avg TsFile Size In Each Level: 节点管理的各级别 TsFile 文件的平均大小 -- Task Number: 节点的 Task 数量 -- The Time Consumed of Task: 节点的 Task 的耗时 + - Total TsFile Size Per Level: 节点管理的各级别 TsFile 文件总大小 + - TsFile Count Per Level: 节点管理的各级别 TsFile 文件数量 + - Average TsFile Size Per Level: 节点管理的各级别 TsFile 文件的平均大小 +- Total Tasks: 节点的 Task 数量 +- Task Latency: 节点的 
Task 的耗时 - Compaction - - Compaction Read And Write Per Second: 节点的每秒钟合并读写速度 - - Compaction Number Per Minute: 节点的每分钟合并数量 - - Compaction Process Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted Point Num Per Minute: 节点每分钟合并的点数 + - Compaction Read/Write Throughput: 节点的每秒钟合并读写速度 + - Compactions Per Minute: 节点的每分钟合并数量 + - Compaction Chunk Status: 节点合并不同状态的 Chunk 的数量 + - Compacted-Points Per Minute: 节点每分钟合并的点数 -#### 3.4.4 Write Performance +#### Write Performance -- Write Cost(avg): 节点写入耗时平均值,包括写入 wal 和 memtable -- Write Cost(50%): 节点写入耗时中位数,包括写入 wal 和 memtable -- Write Cost(99%): 节点写入耗时的P99,包括写入 wal 和 memtable +- Average Write Latency: 节点写入耗时平均值,包括写入 wal 和 memtable +- P50 Write Latency: 节点写入耗时中位数,包括写入 wal 和 memtable +- P99 Write Latency: 节点写入耗时的P99,包括写入 wal 和 memtable - WAL - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL File Num: 节点管理的 WAL 文件数量 - - WAL Nodes Num: 节点管理的 WAL Node 数量 - - Make Checkpoint Costs: 节点创建各类型的 CheckPoint 的耗时 - - WAL Serialize Total Cost: 节点 WAL 序列化总耗时 + - WAL Files: 节点管理的 WAL 文件数量 + - WAL Nodes: 节点管理的 WAL Node 数量 + - Checkpoint Creation Time: 节点创建各类型的 CheckPoint 的耗时 + - WAL Serialization Time (Total): 节点 WAL 序列化总耗时 - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - Effective Info Ratio Of WALNode: 节点的不同 WALNode 的有效信息比 + - WALNode Effective Info Ratio: 节点的不同 WALNode 的有效信息比 - WAL Buffer - - WAL Buffer Cost: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 + - WAL Buffer Latency: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 - Flush统计 - - Flush MemTable Cost(avg): 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - Flush MemTable Cost(50%): 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - Flush MemTable Cost(99%): 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Flush Sub Task Cost(avg): 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(50%): 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(99%): 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 + - Average Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的平均值 + - P50 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的中位数 + - P99 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的 P99 + - Average Flush Subtask Latency: 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 + - P50 Flush Subtask Latency: 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 + - P99 Flush Subtask Latency: 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 - Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 - Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio Of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size Of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size Of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num Of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num Of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num Of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 +- Tsfile Compression Ratio of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 +- Flush TsFile Size of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 +- Size of Flushing MemTable: 节点刷盘的 Memtable 的大小 +- Points Num of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 +- Series Num of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 +- Average Point Num of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 -#### 3.4.5 Schema Engine +#### Schema Engine - Schema Engine Mode: 节点的元数据引擎模式 - Schema 
Consensus Protocol: 节点的元数据共识协议 @@ -520,122 +520,122 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 - Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 -#### 3.4.6 Query Engine +#### Query Engine - 各阶段耗时 - - The time consumed of query plan stages(avg): 节点查询各阶段耗时的平均值 - - The time consumed of query plan stages(50%): 节点查询各阶段耗时的中位数 - - The time consumed of query plan stages(99%): 节点查询各阶段耗时的P99 + - Average Query Plan Execution Time: 节点查询各阶段耗时的平均值 + - P50 Query Plan Execution Time: 节点查询各阶段耗时的中位数 + - P99 Query Plan Execution Time: 节点查询各阶段耗时的P99 - 执行计划分发耗时 - - The time consumed of plan dispatch stages(avg): 节点查询执行计划分发耗时的平均值 - - The time consumed of plan dispatch stages(50%): 节点查询执行计划分发耗时的中位数 - - The time consumed of plan dispatch stages(99%): 节点查询执行计划分发耗时的P99 + - Average Query Plan Dispatch Time: 节点查询执行计划分发耗时的平均值 + - P50 Query Plan Dispatch Time: 节点查询执行计划分发耗时的中位数 + - P99 Query Plan Dispatch Time: 节点查询执行计划分发耗时的P99 - 执行计划执行耗时 - - The time consumed of query execution stages(avg): 节点查询执行计划执行耗时的平均值 - - The time consumed of query execution stages(50%): 节点查询执行计划执行耗时的中位数 - - The time consumed of query execution stages(99%): 节点查询执行计划执行耗时的P99 + - Average Query Execution Time: 节点查询执行计划执行耗时的平均值 + - P50 Query Execution Time: 节点查询执行计划执行耗时的中位数 + - P99 Query Execution Time: 节点查询执行计划执行耗时的P99 - 算子执行耗时 - - The time consumed of operator execution stages(avg): 节点查询算子执行耗时的平均值 - - The time consumed of operator execution(50%): 节点查询算子执行耗时的中位数 - - The time consumed of operator execution(99%): 节点查询算子执行耗时的P99 + - Average Query Operator Execution Time: 节点查询算子执行耗时的平均值 + - P50 Query Operator Execution Time: 节点查询算子执行耗时的中位数 + - P99 Query Operator Execution Time: 节点查询算子执行耗时的P99 - 聚合查询计算耗时 - - The time consumed of query aggregation(avg): 节点聚合查询计算耗时的平均值 - - The time consumed of query aggregation(50%): 节点聚合查询计算耗时的中位数 - - The time consumed of query aggregation(99%): 节点聚合查询计算耗时的P99 + - Average Query Aggregation Execution Time: 节点聚合查询计算耗时的平均值 + - P50 Query Aggregation Execution Time: 节点聚合查询计算耗时的中位数 + - P99 Query Aggregation Execution Time: 节点聚合查询计算耗时的P99 - 文件/内存接口耗时 - - The time consumed of query scan(avg): 节点查询文件/内存接口耗时的平均值 - - The time consumed of query scan(50%): 节点查询文件/内存接口耗时的中位数 - - The time consumed of query scan(99%): 节点查询文件/内存接口耗时的P99 + - Average Query Scan Execution Time: 节点查询文件/内存接口耗时的平均值 + - P50 Query Scan Execution Time: 节点查询文件/内存接口耗时的中位数 + - P99 Query Scan Execution Time: 节点查询文件/内存接口耗时的P99 - 资源访问数量 - - The usage of query resource(avg): 节点查询资源访问数量的平均值 - - The usage of query resource(50%): 节点查询资源访问数量的中位数 - - The usage of query resource(99%): 节点查询资源访问数量的P99 + - Average Query Resource Utilization: 节点查询资源访问数量的平均值 + - P50 Query Resource Utilization: 节点查询资源访问数量的中位数 + - P99 Query Resource Utilization: 节点查询资源访问数量的P99 - 数据传输耗时 - - The time consumed of query data exchange(avg): 节点查询数据传输耗时的平均值 - - The time consumed of query data exchange(50%): 节点查询数据传输耗时的中位数 - - The time consumed of query data exchange(99%): 节点查询数据传输耗时的P99 + - Average Query Data Exchange Latency: 节点查询数据传输耗时的平均值 + - P50 Query Data Exchange Latency: 节点查询数据传输耗时的中位数 + - P99 Query Data Exchange Latency: 节点查询数据传输耗时的P99 - 数据传输数量 - - The count of Data Exchange(avg): 节点查询的数据传输数量的平均值 - - The count of Data Exchange: 节点查询的数据传输数量的分位数,包括中位数和P99 + - Average Query Data Exchange Count: 节点查询的数据传输数量的平均值 + - Query Data Exchange Count: 节点查询的数据传输数量的分位数,包括中位数和P99 - 任务调度数量与耗时 - - The number of query queue: 节点查询任务调度数量 - - The time consumed of query schedule time(avg): 节点查询任务调度耗时的平均值 - - The time consumed of query 
schedule time(50%): 节点查询任务调度耗时的中位数 - - The time consumed of query schedule time(99%): 节点查询任务调度耗时的P99 + - Query Queue Length: 节点查询任务调度数量 + - Average Query Scheduling Latency: 节点查询任务调度耗时的平均值 + - P50 Query Scheduling Latency: 节点查询任务调度耗时的中位数 + - P99 Query Scheduling Latency: 节点查询任务调度耗时的P99 -#### 3.4.7 Query Interface +#### Query Interface - 加载时间序列元数据 - - The time consumed of load timeseries metadata(avg): 节点查询加载时间序列元数据耗时的平均值 - - The time consumed of load timeseries metadata(50%): 节点查询加载时间序列元数据耗时的中位数 - - The time consumed of load timeseries metadata(99%): 节点查询加载时间序列元数据耗时的P99 + - Average Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的平均值 + - P50 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的中位数 + - P99 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的P99 - 读取时间序列 - - The time consumed of read timeseries metadata(avg): 节点查询读取时间序列耗时的平均值 - - The time consumed of read timeseries metadata(50%): 节点查询读取时间序列耗时的中位数 - - The time consumed of read timeseries metadata(99%): 节点查询读取时间序列耗时的P99 + - Average Timeseries Metadata Read Time: 节点查询读取时间序列耗时的平均值 + - P50 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的中位数 + - P99 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的P99 - 修改时间序列元数据 - - The time consumed of timeseries metadata modification(avg): 节点查询修改时间序列元数据耗时的平均值 - - The time consumed of timeseries metadata modification(50%): 节点查询修改时间序列元数据耗时的中位数 - - The time consumed of timeseries metadata modification(99%): 节点查询修改时间序列元数据耗时的P99 + - Average Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的平均值 + - P50 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的中位数 + - P99 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的P99 - 加载Chunk元数据列表 - - The time consumed of load chunk metadata list(avg): 节点查询加载Chunk元数据列表耗时的平均值 - - The time consumed of load chunk metadata list(50%): 节点查询加载Chunk元数据列表耗时的中位数 - - The time consumed of load chunk metadata list(99%): 节点查询加载Chunk元数据列表耗时的P99 + - Average Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的平均值 + - P50 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的中位数 + - P99 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的P99 - 修改Chunk元数据 - - The time consumed of chunk metadata modification(avg): 节点查询修改Chunk元数据耗时的平均值 - - The time consumed of chunk metadata modification(50%): 节点查询修改Chunk元数据耗时的总位数 - - The time consumed of chunk metadata modification(99%): 节点查询修改Chunk元数据耗时的P99 + - Average Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的平均值 + - P50 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的总位数 + - P99 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的P99 - 按照Chunk元数据过滤 - - The time consumed of chunk metadata filter(avg): 节点查询按照Chunk元数据过滤耗时的平均值 - - The time consumed of chunk metadata filter(50%): 节点查询按照Chunk元数据过滤耗时的中位数 - - The time consumed of chunk metadata filter(99%): 节点查询按照Chunk元数据过滤耗时的P99 + - Average Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的平均值 + - P50 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的中位数 + - P99 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的P99 - 构造Chunk Reader - - The time consumed of construct chunk reader(avg): 节点查询构造Chunk Reader耗时的平均值 - - The time consumed of construct chunk reader(50%): 节点查询构造Chunk Reader耗时的中位数 - - The time consumed of construct chunk reader(99%): 节点查询构造Chunk Reader耗时的P99 + - Average Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的平均值 + - P50 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的中位数 + - P99 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的P99 - 读取Chunk - - The time consumed of read chunk(avg): 节点查询读取Chunk耗时的平均值 - - The time consumed of read chunk(50%): 节点查询读取Chunk耗时的中位数 - - The time consumed 
of read chunk(99%): 节点查询读取Chunk耗时的P99 + - Average Chunk Read Time: 节点查询读取Chunk耗时的平均值 + - P50 Chunk Read Time: 节点查询读取Chunk耗时的中位数 + - P99 Chunk Read Time: 节点查询读取Chunk耗时的P99 - 初始化Chunk Reader - - The time consumed of init chunk reader(avg): 节点查询初始化Chunk Reader耗时的平均值 - - The time consumed of init chunk reader(50%): 节点查询初始化Chunk Reader耗时的中位数 - - The time consumed of init chunk reader(99%): 节点查询初始化Chunk Reader耗时的P99 -- 通过 Page Reader 构造 TsBlock - - The time consumed of build tsblock from page reader(avg): 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from page reader(50%): 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from page reader(99%): 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 + - Average Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的平均值 + - P50 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的中位数 + - P99 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的P99 +- 通过 Page Reader 构造 TsBlock + - Average TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 + - P50 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 + - P99 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 - 查询通过 Merge Reader 构造 TsBlock - - The time consumed of build tsblock from merge reader(avg): 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from merge reader(50%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from merge reader(99%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 + - Average TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 + - P50 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 + - P99 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 -#### 3.4.8 Query Data Exchange +#### Query Data Exchange 查询的数据交换耗时。 -- 通过 source handle 获取 TsBlock - - The time consumed of source handle get tsblock(avg): 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - The time consumed of source handle get tsblock(50%): 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - The time consumed of source handle get tsblock(99%): 节点查询通过 source handle 获取 TsBlock 耗时的P99 +- 通过 source handle 获取 TsBlock + - Average Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的平均值 + - P50 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的中位数 + - P99 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的P99 - 通过 source handle 反序列化 TsBlock - - The time consumed of source handle deserialize tsblock(avg): 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - The time consumed of source handle deserialize tsblock(50%): 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - The time consumed of source handle deserialize tsblock(99%): 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 -- 通过 sink handle 发送 TsBlock - - The time consumed of sink handle send tsblock(avg): 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - The time consumed of sink handle send tsblock(50%): 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - The time consumed of sink handle send tsblock(99%): 节点查询通过 sink handle 发送 TsBlock 耗时的P99 + - Average Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 + - P50 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 + - P99 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 +- 通过 sink handle 发送 TsBlock + - Average Sink Handle TsBlock Transmission 
Time: 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 + - P50 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 + - P99 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的P99 - 回调 data block event - - The time consumed of on acknowledge data block event task(avg): 节点查询回调 data block event 耗时的平均值 - - The time consumed of on acknowledge data block event task(50%): 节点查询回调 data block event 耗时的中位数 - - The time consumed of on acknowledge data block event task(99%): 节点查询回调 data block event 耗时的P99 -- 获取 data block task - - The time consumed of get data block task(avg): 节点查询获取 data block task 耗时的平均值 - - The time consumed of get data block task(50%): 节点查询获取 data block task 耗时的中位数 - - The time consumed of get data block task(99%): 节点查询获取 data block task 耗时的 P99 + - Average Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的平均值 + - P50 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的中位数 + - P99 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的P99 +- 获取 data block task + - Average Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的平均值 + - P50 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的中位数 + - P99 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的 P99 -#### 3.4.9 Query Related Resource +#### Query Related Resource - MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 - LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 @@ -643,40 +643,40 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - Coordinator: 节点上记录的查询数量 - MemoryPool Size: 节点查询相关的内存池情况 - MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler: 节点查询相关的队列任务数量 +- DriverScheduler Count: 节点查询相关的队列任务数量 -#### 3.4.10 Consensus - IoT Consensus +#### Consensus - IoT Consensus - 内存使用 - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 - 节点间同步情况 - - IoTConsensus Sync Index: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 + - IoTConsensus Sync Index Size: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 + - IoTConsensus Search Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 + - IoTConsensus Safe Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Sync Speed Diff Of Peers: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 + - Peer Sync Speed Difference: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 - 不同执行阶段耗时 - - The Time Consumed Of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed Of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed Of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 + - The Time Consumed of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 + - The Time Consumed of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 + - The Time Consumed of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 -#### 3.4.11 Consensus - DataRegion Ratis Consensus +#### Consensus - DataRegion Ratis Consensus -- Ratis Stage Time: 
节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory: 节点 Ratis 的内存使用情况 +- Ratis Consensus Stage Latency: 节点 Ratis 不同阶段的耗时 +- Ratis Log Write Latency: 节点 Ratis 写 Log 不同阶段的耗时 +- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的 QPS +- RatisConsensus Memory Usage: 节点 Ratis 的内存使用情况 -#### 3.4.12 Consensus - SchemaRegion Ratis Consensus +#### Consensus - SchemaRegion Ratis Consensus -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory: 节点 Ratis 内存使用情况 \ No newline at end of file +- RatisConsensus Stage Latency: 节点 Ratis 不同阶段的耗时 +- Ratis Log Write Latency: 节点 Ratis 写 Log 各阶段的耗时 +- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的QPS +- RatisConsensus Memory Usage: 节点 Ratis 内存使用情况 \ No newline at end of file diff --git a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md index d2156fa29..2e626a773 100644 --- a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ b/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -194,18 +194,18 @@ cd grafana-* #### CPU -- CPU Core:CPU 核数 -- CPU Load: - - System CPU Load:整个系统在采样时间内 CPU 的平均负载和繁忙程度 - - Process CPU Load:IoTDB 进程在采样时间内占用的 CPU 比例 +- CPU Cores:CPU 核数 +- CPU Utilization: + - System CPU Utilization:整个系统在采样时间内 CPU 的平均负载和繁忙程度 + - Process CPU Utilization:IoTDB 进程在采样时间内占用的 CPU 比例 - CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 #### Memory - System Memory:当前系统内存的使用情况。 - - Commited vm size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 - - Total physical memory:系统可用物理内存的总量。 - - Used physical memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 + - Commited VM Size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 + - Total Physical Memory:系统可用物理内存的总量。 + - Used Physical Memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 - System Swap Memory:交换空间(Swap Space)内存用量。 - Process Memory:IoTDB 进程使用内存的情况。 - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) @@ -215,142 +215,142 @@ cd grafana-* #### Disk - Disk Space: - - Total disk space:IoTDB 可使用的最大磁盘空间。 - - Used disk space:IoTDB 已经使用的磁盘空间。 -- Log Number Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 + - Total Disk Space:IoTDB 可使用的最大磁盘空间。 + - Used Disk Space:IoTDB 已经使用的磁盘空间。 +- Logs Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 - File Count:IoTDB 相关文件数量 - - all:所有文件数量 + - All:所有文件数量 - TsFile:TsFile 数量 - - seq:顺序 TsFile 数量 - - unseq:乱序 TsFile 数量 - - wal:WAL 文件数量 - - cross-temp:跨空间合并 temp 文件数量 - - inner-seq-temp:顺序空间内合并 temp 文件数量 - - innser-unseq-temp:乱序空间内合并 temp 文件数量 - - mods:墓碑文件数量 -- Open File Count:系统打开的文件句柄数量 + - Seq:顺序 TsFile 数量 + - Unseq:乱序 TsFile 数量 + - WAL:WAL 文件数量 + - Cross-Temp:跨空间合并 temp 文件数量 + - Tnner-Seq-Temp:顺序空间内合并 temp 文件数量 + - Innser-Unseq-Temp:乱序空间内合并 temp 文件数量 + - Mods:墓碑文件数量 +- Open File Handles:系统打开的文件句柄数量 - File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk I/O Busy Rate:等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 +- Disk Utilization (%):等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 - Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk I/O Ops:等价于 iostat 中的 r/s 
、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Avg Time:等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Avg Size:等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Avg Queue Size:等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O System Call Rate:进程调用读写系统调用的频率,类似于 IOPS。 +- Disk IOPS:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 +- Disk I/O Latency (Avg):等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 +- Disk I/O Request Size (Avg):等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 +- Disk I/O Queue Length (Avg):等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 +- I/O Syscall Rate:进程调用读写系统调用的频率,类似于 IOPS。 - I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 #### JVM - GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size Detail: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Data Size Detail:节点 JVM 长期存活的对象大小和对应代际允许的最大值 +- GC Allocated/Promoted Size: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 +- GC Live Data Size:节点 JVM 长期存活的对象大小和对应代际允许的最大值 - Heap Memory:JVM 堆内存使用情况。 - - Maximum heap memory:JVM 最大可用的堆内存大小。 - - Committed heap memory:JVM 已提交的堆内存大小。 - - Used heap memory:JVM 已经使用的堆内存大小。 + - Maximum Heap Memory:JVM 最大可用的堆内存大小。 + - Committed Heap Memory:JVM 已提交的堆内存大小。 + - Used Heap Memory:JVM 已经使用的堆内存大小。 - PS Eden Space:PS Young 区的大小。 - PS Old Space:PS Old 区的大小。 - PS Survivor Space:PS Survivor 区的大小。 - ...(CMS/G1/ZGC 等) -- Off Heap Memory:堆外内存用量。 - - direct memory:堆外直接内存。 - - mapped memory:堆外映射内存。 -- GC Number Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Number Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- Time Consumed Of Compilation Per Minute:每分钟 JVM 用于编译的总时间 -- The Number of Class: - - loaded:JVM 目前已经加载的类的数量 - - unloaded:系统启动至今 JVM 卸载的类的数量 -- The Number of Java Thread:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 +- Off-Heap Memory:堆外内存用量。 + - Direct Memory:堆外直接内存。 + - Mapped Memory:堆外映射内存。 +- GCs Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC +- GC Latency Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC +- GC Events Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC +- GC Pause Time Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC +- JIT Compilation Time Per Minute:每分钟 JVM 用于编译的总时间 +- Loaded & Unloaded Classes: + - Loaded:JVM 目前已经加载的类的数量 + - Unloaded:系统启动至今 JVM 卸载的类的数量 +- Active Java Threads:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 #### Network eno 指的是到公网的网卡,lo 是虚拟网卡。 -- Net Speed:网卡发送和接收数据的速度 -- Receive/Transmit Data Size:网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Speed:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Connection Num:当前选定进程的 socket 连接数(IoTDB只有 TCP) +- Network Speed:网卡发送和接收数据的速度 +- Network Throughput (Receive/Transmit):网卡发送或者接收的数据包大小,自系统重启后算起 +- Packet Transmission Rate:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 +- Active TCP Connections:当前选定进程的 socket 连接数(IoTDB只有 TCP) ### 3.2 整体性能面板(Performance Overview Dashboard) #### Cluster Overview -- Total CPU Core: 集群机器 CPU 总核数 +- Total CPU Cores: 集群机器 CPU 总核数 - DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 - 磁盘 - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Usage: 集群各 DataNode 的磁盘使用率 -- Total Timeseries: 
集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster: 集群 ConfigNode 和 DataNode 节点数量 + - DataNode Disk Utilization: 集群各 DataNode 的磁盘使用率 +- Total Time Series: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 +- Cluster Info: 集群 ConfigNode 和 DataNode 节点数量 - Up Time: 集群启动至今的时长 -- Total Write Point Per Second: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 +- Total Write Throughput: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 - 内存 - Total System Memory: 集群机器系统内存总大小 - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Usage: 集群各 DataNode 的内存使用率 -- Total File Number: 集群管理文件总数量 + - DataNode Process Memory Utilization: 集群各 DataNode 的内存使用率 +- Total Files: 集群管理文件总数量 - Cluster System Overview: 集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBase: 集群管理的 Database 总数(含副本) -- Total DataRegion: 集群管理的 DataRegion 总数 -- Total SchemaRegion: 集群管理的 SchemaRegion 总数 +- Total DataBases: 集群管理的 Database 总数(含副本) +- Total DataRegions: 集群管理的 DataRegion 总数 +- Total SchemaRegions: 集群管理的 SchemaRegion 总数 #### Node Overview -- CPU Core: 节点所在机器的 CPU 核数 +- CPU Cores: 节点所在机器的 CPU 核数 - Disk Space: 节点所在机器的磁盘大小 -- Timeseries: 节点所在机器管理的时间序列数量(含副本) +- Time Series: 节点所在机器管理的时间序列数量(含副本) - System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Point Per Second: 节点所在机器的每秒写入速度(含副本) +- Write Throughput: 节点所在机器的每秒写入速度(含副本) - System Memory: 节点所在机器的系统内存大小 - Swap Memory: 节点所在机器的交换内存大小 -- File Number: 节点管理的文件数 +- File Count: 节点管理的文件数 #### Performance - Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connection: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Time Consumed Of Operation: 节点的各类型操作耗时,包括平均值和P99 -- Average Time Consumed Of Interface: 节点的各个 thrift 接口平均耗时 -- P99 Time Consumed Of Interface: 节点的各个 thrift 接口的 P99 耗时数 -- Task Number: 节点的各项系统任务数量 -- Average Time Consumed of Task: 节点的各项系统任务的平均耗时 -- P99 Time Consumed of Task: 节点的各项系统任务的 P99 耗时 -- Operation Per Second: 节点的每秒操作数 +- Client Connections: 节点的客户端连接情况,包括总连接数和活跃连接数 +- Operation Latency: 节点的各类型操作耗时,包括平均值和P99 +- Average Interface Latency: 节点的各个 thrift 接口平均耗时 +- P99 Interface Latency: 节点的各个 thrift 接口的 P99 耗时数 +- Total Tasks: 节点的各项系统任务数量 +- Average Task Latency: 节点的各项系统任务的平均耗时 +- P99 Task Latency: 节点的各项系统任务的 P99 耗时 +- Operations Per Second: 节点的每秒操作数 - 主流程 - - Operation Per Second Of Stage: 节点主流程各阶段的每秒操作数 - - Average Time Consumed Of Stage: 节点主流程各阶段平均耗时 - - P99 Time Consumed Of Stage: 节点主流程各阶段 P99 耗时 + - Operations Per Second (Stage-wise): 节点主流程各阶段的每秒操作数 + - Average Stage Latency: 节点主流程各阶段平均耗时 + - P99 Stage Latency: 节点主流程各阶段 P99 耗时 - Schedule 阶段 - - OPS Of Schedule: 节点 schedule 阶段各子阶段每秒操作数 - - Average Time Consumed Of Schedule Stage: 节点 schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Schedule Stage: 节点的 schedule 阶段各子阶段 P99 耗时 + - Schedule Operations Per Second: 节点 schedule 阶段各子阶段每秒操作数 + - Average Schedule Stage Latency: 节点 schedule 阶段各子阶段平均耗时 + - P99 Schedule Stage Latency: 节点的 schedule 阶段各子阶段 P99 耗时 - Local Schedule 各子阶段 - - OPS Of Local Schedule Stage: 节点 local schedule 各子阶段每秒操作数 - - Average Time Consumed Of Local Schedule Stage: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Local Schedule Stage: 节点的 local schedule 阶段各子阶段 P99 耗时 + - Local Schedule Operations Per Second: 节点 local schedule 各子阶段每秒操作数 + - Average Local Schedule Stage Latency: 节点 local schedule 阶段各子阶段平均耗时 + - P99 Local Schedule Latency: 节点的 local schedule 阶段各子阶段 P99 耗时 - Storage 阶段 - - OPS Of Storage Stage: 节点 storage 阶段各子阶段每秒操作数 - - Average Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段平均耗时 - - P99 Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段 P99 耗时 + - Storage Operations Per Second: 节点 storage 阶段各子阶段每秒操作数 + - Average Storage Stage Latency: 节点 
storage 阶段各子阶段平均耗时 + - P99 Storage Stage Latency: 节点 storage 阶段各子阶段 P99 耗时 - Engine 阶段 - - OPS Of Engine Stage: 节点 engine 阶段各子阶段每秒操作数 - - Average Time Consumed Of Engine Stage: 节点的 engine 阶段各子阶段平均耗时 - - P99 Time Consumed Of Engine Stage: 节点 engine 阶段各子阶段的 P99 耗时 + - Engine Operations Per Second: 节点 engine 阶段各子阶段每秒操作数 + - Average Engine Stage Latency: 节点的 engine 阶段各子阶段平均耗时 + - P99 Engine Stage Latency: 节点 engine 阶段各子阶段的 P99 耗时 #### System -- CPU Load: 节点的 CPU 负载 -- CPU Time Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Time Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC +- CPU Utilization: 节点的 CPU 负载 +- CPU Latency Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 +- GC Latency Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC - Heap Memory: 节点的堆内存使用情况 -- Off Heap Memory: 节点的非堆内存使用情况 -- The Number Of Java Thread: 节点的 Java 线程数量情况 +- Off-Heap Memory: 节点的非堆内存使用情况 +- Total Java Threads: 节点的 Java 线程数量情况 - File Count: 节点管理的文件数量情况 - File Size: 节点管理文件大小情况 -- Log Number Per Minute: 节点的每分钟不同类型日志情况 +- Logs Per Minute: 节点的每分钟不同类型日志情况 ### 3.3 ConfigNode 面板(ConfigNode Dashboard) @@ -361,13 +361,13 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - Database Count: 节点的数据库数量 - Region - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Current Status: 节点的 DataRegion 的状态 + - DataRegion Status: 节点的 DataRegion 的状态 - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Current Status: 节点的 SchemaRegion 的状态 -- System Memory: 节点的系统内存大小 -- Swap Memory: 节点的交换区内存大小 -- ConfigNodes: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes: 节点所在集群的 DataNode 情况 + - SchemaRegion Status: 节点的 SchemaRegion 的状态 +- System Memory Utilization: 节点的系统内存大小 +- Swap Memory Utilization: 节点的交换区内存大小 +- ConfigNodes Status: 节点所在集群的 ConfigNode 的运行状态 +- DataNodes Status: 节点所在集群的 DataNode 情况 - System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 #### NodeInfo @@ -383,15 +383,15 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 #### Protocol - 客户端数量统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点各线程池的借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 + - Active Clients: 节点各线程池的活跃客户端数量 + - Idle Clients: 节点各线程池的空闲客户端数量 + - Borrowed Clients Per Second: 节点各线程池的借用客户端数量 + - Created Clients Per Second: 节点各线程池的创建客户端数量 + - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 - 客户端时间情况 - - Client Mean Active Time: 节点各线程池客户端的平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + - Average Client Active Time: 节点各线程池客户端的平均活跃时间 + - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 + - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 #### Partition Table @@ -404,11 +404,11 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 #### Consensus -- Ratis Stage Time: 节点的 Ratis 各阶段耗时 -- Write Log Entry: 节点的 Ratis 写 Log 的耗时 -- Remote / Local Write Time: 节点的 Ratis 的远程写入和本地写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory: 节点 Ratis 共识协议的内存使用 +- Ratis Stage Latency: 节点的 Ratis 各阶段耗时 +- Write Log Entry Latency: 节点的 Ratis 写 Log 的耗时 +- Remote/Local Write Latency: 节点的 Ratis 的远程写入和本地写入的耗时 +- Remote/Local Write Throughput: 节点 Ratis 的远程和本地写入的 QPS +- RatisConsensus Memory Utilization: 节点 Ratis 共识协议的内存使用 ### 3.4 DataNode 面板(DataNode Dashboard) @@ -416,82 +416,82 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 #### Node Overview -- The Number Of Entity: 节点管理的实体情况 -- Write Point Per Second: 节点的每秒写入速度 +- Total Managed Entities: 节点管理的实体情况 +- Write Throughput: 节点的每秒写入速度 - Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 #### Protocol - 节点操作耗时 - - The Time Consumed Of Operation (avg): 节点的各项操作的平均耗时 - - The Time Consumed Of 
Operation (50%): 节点的各项操作耗时的中位数 - - The Time Consumed Of Operation (99%): 节点的各项操作耗时的P99 + - Average Operation Latency: 节点的各项操作的平均耗时 + - P50 Operation Latency: 节点的各项操作耗时的中位数 + - P99 Operation Latency: 节点的各项操作耗时的P99 - Thrift统计 - - The QPS Of Interface: 节点各个 Thrift 接口的 QPS - - The Avg Time Consumed Of Interface: 节点各个 Thrift 接口的平均耗时 - - Thrift Connection: 节点的各类型的 Thrfit 连接数量 - - Thrift Active Thread: 节点各类型的活跃 Thrift 连接数量 + - Thrift Interface QPS: 节点各个 Thrift 接口的 QPS + - Average Thrift Interface Latency: 节点各个 Thrift 接口的平均耗时 + - Thrift Connections: 节点的各类型的 Thrfit 连接数量 + - Active Thrift Threads: 节点各类型的活跃 Thrift 连接数量 - 客户端统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点的各线程池借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 - - Client Mean Active Time: 节点各线程池的客户端平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + - Active Clients: 节点各线程池的活跃客户端数量 + - Idle Clients: 节点各线程池的空闲客户端数量 + - Borrowed Clients Per Second: 节点的各线程池借用客户端数量 + - Created Clients Per Second: 节点各线程池的创建客户端数量 + - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 + - Average Client Active Time: 节点各线程池的客户端平均活跃时间 + - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 + - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 #### Storage Engine - File Count: 节点管理的各类型文件数量 - File Size: 节点管理的各类型文件大小 - TsFile - - TsFile Total Size In Each Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count In Each Level: 节点管理的各级别 TsFile 文件数量 - - Avg TsFile Size In Each Level: 节点管理的各级别 TsFile 文件的平均大小 -- Task Number: 节点的 Task 数量 -- The Time Consumed of Task: 节点的 Task 的耗时 + - Total TsFile Size Per Level: 节点管理的各级别 TsFile 文件总大小 + - TsFile Count Per Level: 节点管理的各级别 TsFile 文件数量 + - Average TsFile Size Per Level: 节点管理的各级别 TsFile 文件的平均大小 +- Total Tasks: 节点的 Task 数量 +- Task Latency: 节点的 Task 的耗时 - Compaction - - Compaction Read And Write Per Second: 节点的每秒钟合并读写速度 - - Compaction Number Per Minute: 节点的每分钟合并数量 - - Compaction Process Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted Point Num Per Minute: 节点每分钟合并的点数 + - Compaction Read/Write Throughput: 节点的每秒钟合并读写速度 + - Compactions Per Minute: 节点的每分钟合并数量 + - Compaction Chunk Status: 节点合并不同状态的 Chunk 的数量 + - Compacted-Points Per Minute: 节点每分钟合并的点数 #### Write Performance -- Write Cost(avg): 节点写入耗时平均值,包括写入 wal 和 memtable -- Write Cost(50%): 节点写入耗时中位数,包括写入 wal 和 memtable -- Write Cost(99%): 节点写入耗时的P99,包括写入 wal 和 memtable +- Average Write Latency: 节点写入耗时平均值,包括写入 wal 和 memtable +- P50 Write Latency: 节点写入耗时中位数,包括写入 wal 和 memtable +- P99 Write Latency: 节点写入耗时的P99,包括写入 wal 和 memtable - WAL - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL File Num: 节点管理的 WAL 文件数量 - - WAL Nodes Num: 节点管理的 WAL Node 数量 - - Make Checkpoint Costs: 节点创建各类型的 CheckPoint 的耗时 - - WAL Serialize Total Cost: 节点 WAL 序列化总耗时 + - WAL Files: 节点管理的 WAL 文件数量 + - WAL Nodes: 节点管理的 WAL Node 数量 + - Checkpoint Creation Time: 节点创建各类型的 CheckPoint 的耗时 + - WAL Serialization Time (Total): 节点 WAL 序列化总耗时 - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - Effective Info Ratio Of WALNode: 节点的不同 WALNode 的有效信息比 + - WALNode Effective Info Ratio: 节点的不同 WALNode 的有效信息比 - WAL Buffer - - WAL Buffer Cost: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 + - WAL Buffer Latency: 节点 WAL flush 
SyncBuffer 耗时,包含同步和异步两种 - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 - Flush统计 - - Flush MemTable Cost(avg): 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - Flush MemTable Cost(50%): 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - Flush MemTable Cost(99%): 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Flush Sub Task Cost(avg): 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(50%): 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(99%): 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 + - Average Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的平均值 + - P50 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的中位数 + - P99 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的 P99 + - Average Flush Subtask Latency: 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 + - P50 Flush Subtask Latency: 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 + - P99 Flush Subtask Latency: 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 - Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 - Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio Of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size Of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size Of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num Of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num Of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num Of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 +- Tsfile Compression Ratio of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 +- Flush TsFile Size of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 +- Size of Flushing MemTable: 节点刷盘的 Memtable 的大小 +- Points Num of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 +- Series Num of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 +- Average Point Num of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 #### Schema Engine @@ -525,117 +525,117 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 #### Query Engine - 各阶段耗时 - - The time consumed of query plan stages(avg): 节点查询各阶段耗时的平均值 - - The time consumed of query plan stages(50%): 节点查询各阶段耗时的中位数 - - The time consumed of query plan stages(99%): 节点查询各阶段耗时的P99 + - Average Query Plan Execution Time: 节点查询各阶段耗时的平均值 + - P50 Query Plan Execution Time: 节点查询各阶段耗时的中位数 + - P99 Query Plan Execution Time: 节点查询各阶段耗时的P99 - 执行计划分发耗时 - - The time consumed of plan dispatch stages(avg): 节点查询执行计划分发耗时的平均值 - - The time consumed of plan dispatch stages(50%): 节点查询执行计划分发耗时的中位数 - - The time consumed of plan dispatch stages(99%): 节点查询执行计划分发耗时的P99 + - Average Query Plan Dispatch Time: 节点查询执行计划分发耗时的平均值 + - P50 Query Plan Dispatch Time: 节点查询执行计划分发耗时的中位数 + - P99 Query Plan Dispatch Time: 节点查询执行计划分发耗时的P99 - 执行计划执行耗时 - - The time consumed of query execution stages(avg): 节点查询执行计划执行耗时的平均值 - - The time consumed of query execution stages(50%): 节点查询执行计划执行耗时的中位数 - - The time consumed of query execution stages(99%): 节点查询执行计划执行耗时的P99 + - Average Query Execution Time: 节点查询执行计划执行耗时的平均值 + - P50 Query Execution Time: 节点查询执行计划执行耗时的中位数 + - P99 Query Execution Time: 节点查询执行计划执行耗时的P99 - 算子执行耗时 - - The time consumed of operator execution stages(avg): 节点查询算子执行耗时的平均值 - - The time consumed of operator execution(50%): 节点查询算子执行耗时的中位数 - - The time consumed of operator execution(99%): 节点查询算子执行耗时的P99 + - Average Query Operator Execution Time: 节点查询算子执行耗时的平均值 + - P50 Query Operator Execution Time: 节点查询算子执行耗时的中位数 + - P99 Query Operator Execution Time: 节点查询算子执行耗时的P99 - 聚合查询计算耗时 - - The time consumed of query aggregation(avg): 节点聚合查询计算耗时的平均值 - - The time consumed of query aggregation(50%): 节点聚合查询计算耗时的中位数 - - The time consumed of query aggregation(99%): 节点聚合查询计算耗时的P99 + - Average Query Aggregation Execution Time: 
节点聚合查询计算耗时的平均值 + - P50 Query Aggregation Execution Time: 节点聚合查询计算耗时的中位数 + - P99 Query Aggregation Execution Time: 节点聚合查询计算耗时的P99 - 文件/内存接口耗时 - - The time consumed of query scan(avg): 节点查询文件/内存接口耗时的平均值 - - The time consumed of query scan(50%): 节点查询文件/内存接口耗时的中位数 - - The time consumed of query scan(99%): 节点查询文件/内存接口耗时的P99 + - Average Query Scan Execution Time: 节点查询文件/内存接口耗时的平均值 + - P50 Query Scan Execution Time: 节点查询文件/内存接口耗时的中位数 + - P99 Query Scan Execution Time: 节点查询文件/内存接口耗时的P99 - 资源访问数量 - - The usage of query resource(avg): 节点查询资源访问数量的平均值 - - The usage of query resource(50%): 节点查询资源访问数量的中位数 - - The usage of query resource(99%): 节点查询资源访问数量的P99 + - Average Query Resource Utilization: 节点查询资源访问数量的平均值 + - P50 Query Resource Utilization: 节点查询资源访问数量的中位数 + - P99 Query Resource Utilization: 节点查询资源访问数量的P99 - 数据传输耗时 - - The time consumed of query data exchange(avg): 节点查询数据传输耗时的平均值 - - The time consumed of query data exchange(50%): 节点查询数据传输耗时的中位数 - - The time consumed of query data exchange(99%): 节点查询数据传输耗时的P99 + - Average Query Data Exchange Latency: 节点查询数据传输耗时的平均值 + - P50 Query Data Exchange Latency: 节点查询数据传输耗时的中位数 + - P99 Query Data Exchange Latency: 节点查询数据传输耗时的P99 - 数据传输数量 - - The count of Data Exchange(avg): 节点查询的数据传输数量的平均值 - - The count of Data Exchange: 节点查询的数据传输数量的分位数,包括中位数和P99 + - Average Query Data Exchange Count: 节点查询的数据传输数量的平均值 + - Query Data Exchange Count: 节点查询的数据传输数量的分位数,包括中位数和P99 - 任务调度数量与耗时 - - The number of query queue: 节点查询任务调度数量 - - The time consumed of query schedule time(avg): 节点查询任务调度耗时的平均值 - - The time consumed of query schedule time(50%): 节点查询任务调度耗时的中位数 - - The time consumed of query schedule time(99%): 节点查询任务调度耗时的P99 + - Query Queue Length: 节点查询任务调度数量 + - Average Query Scheduling Latency: 节点查询任务调度耗时的平均值 + - P50 Query Scheduling Latency: 节点查询任务调度耗时的中位数 + - P99 Query Scheduling Latency: 节点查询任务调度耗时的P99 #### Query Interface - 加载时间序列元数据 - - The time consumed of load timeseries metadata(avg): 节点查询加载时间序列元数据耗时的平均值 - - The time consumed of load timeseries metadata(50%): 节点查询加载时间序列元数据耗时的中位数 - - The time consumed of load timeseries metadata(99%): 节点查询加载时间序列元数据耗时的P99 + - Average Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的平均值 + - P50 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的中位数 + - P99 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的P99 - 读取时间序列 - - The time consumed of read timeseries metadata(avg): 节点查询读取时间序列耗时的平均值 - - The time consumed of read timeseries metadata(50%): 节点查询读取时间序列耗时的中位数 - - The time consumed of read timeseries metadata(99%): 节点查询读取时间序列耗时的P99 + - Average Timeseries Metadata Read Time: 节点查询读取时间序列耗时的平均值 + - P50 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的中位数 + - P99 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的P99 - 修改时间序列元数据 - - The time consumed of timeseries metadata modification(avg): 节点查询修改时间序列元数据耗时的平均值 - - The time consumed of timeseries metadata modification(50%): 节点查询修改时间序列元数据耗时的中位数 - - The time consumed of timeseries metadata modification(99%): 节点查询修改时间序列元数据耗时的P99 + - Average Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的平均值 + - P50 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的中位数 + - P99 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的P99 - 加载Chunk元数据列表 - - The time consumed of load chunk metadata list(avg): 节点查询加载Chunk元数据列表耗时的平均值 - - The time consumed of load chunk metadata list(50%): 节点查询加载Chunk元数据列表耗时的中位数 - - The time consumed of load chunk metadata list(99%): 节点查询加载Chunk元数据列表耗时的P99 + - Average Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的平均值 + - P50 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的中位数 + - P99 Chunk 
Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的P99 - 修改Chunk元数据 - - The time consumed of chunk metadata modification(avg): 节点查询修改Chunk元数据耗时的平均值 - - The time consumed of chunk metadata modification(50%): 节点查询修改Chunk元数据耗时的总位数 - - The time consumed of chunk metadata modification(99%): 节点查询修改Chunk元数据耗时的P99 + - Average Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的平均值 + - P50 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的总位数 + - P99 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的P99 - 按照Chunk元数据过滤 - - The time consumed of chunk metadata filter(avg): 节点查询按照Chunk元数据过滤耗时的平均值 - - The time consumed of chunk metadata filter(50%): 节点查询按照Chunk元数据过滤耗时的中位数 - - The time consumed of chunk metadata filter(99%): 节点查询按照Chunk元数据过滤耗时的P99 + - Average Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的平均值 + - P50 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的中位数 + - P99 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的P99 - 构造Chunk Reader - - The time consumed of construct chunk reader(avg): 节点查询构造Chunk Reader耗时的平均值 - - The time consumed of construct chunk reader(50%): 节点查询构造Chunk Reader耗时的中位数 - - The time consumed of construct chunk reader(99%): 节点查询构造Chunk Reader耗时的P99 + - Average Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的平均值 + - P50 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的中位数 + - P99 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的P99 - 读取Chunk - - The time consumed of read chunk(avg): 节点查询读取Chunk耗时的平均值 - - The time consumed of read chunk(50%): 节点查询读取Chunk耗时的中位数 - - The time consumed of read chunk(99%): 节点查询读取Chunk耗时的P99 + - Average Chunk Read Time: 节点查询读取Chunk耗时的平均值 + - P50 Chunk Read Time: 节点查询读取Chunk耗时的中位数 + - P99 Chunk Read Time: 节点查询读取Chunk耗时的P99 - 初始化Chunk Reader - - The time consumed of init chunk reader(avg): 节点查询初始化Chunk Reader耗时的平均值 - - The time consumed of init chunk reader(50%): 节点查询初始化Chunk Reader耗时的中位数 - - The time consumed of init chunk reader(99%): 节点查询初始化Chunk Reader耗时的P99 + - Average Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的平均值 + - P50 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的中位数 + - P99 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的P99 - 通过 Page Reader 构造 TsBlock - - The time consumed of build tsblock from page reader(avg): 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from page reader(50%): 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from page reader(99%): 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 + - Average TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 + - P50 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 + - P99 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 - 查询通过 Merge Reader 构造 TsBlock - - The time consumed of build tsblock from merge reader(avg): 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from merge reader(50%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from merge reader(99%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 + - Average TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 + - P50 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 + - P99 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 #### Query Data Exchange 查询的数据交换耗时。 - 通过 source handle 获取 TsBlock - - The time consumed of source handle get tsblock(avg): 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - The time consumed of source 
handle get tsblock(50%): 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - The time consumed of source handle get tsblock(99%): 节点查询通过 source handle 获取 TsBlock 耗时的P99 + - Average Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的平均值 + - P50 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的中位数 + - P99 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的P99 - 通过 source handle 反序列化 TsBlock - - The time consumed of source handle deserialize tsblock(avg): 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - The time consumed of source handle deserialize tsblock(50%): 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - The time consumed of source handle deserialize tsblock(99%): 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 + - Average Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 + - P50 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 + - P99 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 - 通过 sink handle 发送 TsBlock - - The time consumed of sink handle send tsblock(avg): 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - The time consumed of sink handle send tsblock(50%): 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - The time consumed of sink handle send tsblock(99%): 节点查询通过 sink handle 发送 TsBlock 耗时的P99 + - Average Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 + - P50 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 + - P99 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的P99 - 回调 data block event - - The time consumed of on acknowledge data block event task(avg): 节点查询回调 data block event 耗时的平均值 - - The time consumed of on acknowledge data block event task(50%): 节点查询回调 data block event 耗时的中位数 - - The time consumed of on acknowledge data block event task(99%): 节点查询回调 data block event 耗时的P99 + - Average Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的平均值 + - P50 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的中位数 + - P99 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的P99 - 获取 data block task - - The time consumed of get data block task(avg): 节点查询获取 data block task 耗时的平均值 - - The time consumed of get data block task(50%): 节点查询获取 data block task 耗时的中位数 - - The time consumed of get data block task(99%): 节点查询获取 data block task 耗时的 P99 + - Average Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的平均值 + - P50 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的中位数 + - P99 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的 P99 #### Query Related Resource @@ -645,40 +645,40 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - Coordinator: 节点上记录的查询数量 - MemoryPool Size: 节点查询相关的内存池情况 - MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler: 节点查询相关的队列任务数量 +- DriverScheduler Count: 节点查询相关的队列任务数量 #### Consensus - IoT Consensus - 内存使用 - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 - 节点间同步情况 - - IoTConsensus Sync Index: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 + - IoTConsensus Sync Index Size: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 + - IoTConsensus Search Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 + - IoTConsensus Safe Index 
Growth Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Sync Speed Diff Of Peers: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 + - Peer Sync Speed Difference: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 - 不同执行阶段耗时 - - The Time Consumed Of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed Of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed Of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 + - The Time Consumed of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 + - The Time Consumed of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 + - The Time Consumed of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 #### Consensus - DataRegion Ratis Consensus -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory: 节点 Ratis 的内存使用情况 +- Ratis Consensus Stage Latency: 节点 Ratis 不同阶段的耗时 +- Ratis Log Write Latency: 节点 Ratis 写 Log 不同阶段的耗时 +- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的 QPS +- RatisConsensus Memory Usage: 节点 Ratis 的内存使用情况 #### Consensus - SchemaRegion Ratis Consensus -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory: 节点 Ratis 内存使用情况 \ No newline at end of file +- RatisConsensus Stage Latency: 节点 Ratis 不同阶段的耗时 +- Ratis Log Write Latency: 节点 Ratis 写 Log 各阶段的耗时 +- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的QPS +- RatisConsensus Memory Usage: 节点 Ratis 内存使用情况 \ No newline at end of file diff --git a/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md index 14b1e29e5..8d2526c4a 100644 --- a/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ b/src/zh/UserGuide/latest-Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -190,185 +190,185 @@ cd grafana-* 该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 -#### 3.1.1 CPU +#### CPU -- CPU Core:CPU 核数 -- CPU Load: - - System CPU Load:整个系统在采样时间内 CPU 的平均负载和繁忙程度 - - Process CPU Load:IoTDB 进程在采样时间内占用的 CPU 比例 +- CPU Cores:CPU 核数 +- CPU Utilization: + - System CPU Utilization:整个系统在采样时间内 CPU 的平均负载和繁忙程度 + - Process CPU Utilization:IoTDB 进程在采样时间内占用的 CPU 比例 - CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 -#### 3.1.2 Memory +#### Memory - System Memory:当前系统内存的使用情况。 - - Commited vm size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 - - Total physical memory:系统可用物理内存的总量。 - - Used physical memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 + - Commited VM Size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 + - Total Physical Memory:系统可用物理内存的总量。 + - Used Physical Memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 - System Swap Memory:交换空间(Swap Space)内存用量。 - Process Memory:IoTDB 进程使用内存的情况。 - Max Memory:IoTDB 
进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 - Used Memory:IoTDB 进程当前已经使用的内存总量。 -#### 3.1.3 Disk +#### Disk - Disk Space: - - Total disk space:IoTDB 可使用的最大磁盘空间。 - - Used disk space:IoTDB 已经使用的磁盘空间。 -- Log Number Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 + - Total Disk Space:IoTDB 可使用的最大磁盘空间。 + - Used Disk Space:IoTDB 已经使用的磁盘空间。 +- Logs Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 - File Count:IoTDB 相关文件数量 - - all:所有文件数量 + - All:所有文件数量 - TsFile:TsFile 数量 - - seq:顺序 TsFile 数量 - - unseq:乱序 TsFile 数量 - - wal:WAL 文件数量 - - cross-temp:跨空间合并 temp 文件数量 - - inner-seq-temp:顺序空间内合并 temp 文件数量 - - innser-unseq-temp:乱序空间内合并 temp 文件数量 - - mods:墓碑文件数量 -- Open File Count:系统打开的文件句柄数量 + - Seq:顺序 TsFile 数量 + - Unseq:乱序 TsFile 数量 + - WAL:WAL 文件数量 + - Cross-Temp:跨空间合并 temp 文件数量 + - Tnner-Seq-Temp:顺序空间内合并 temp 文件数量 + - Innser-Unseq-Temp:乱序空间内合并 temp 文件数量 + - Mods:墓碑文件数量 +- Open File Handles:系统打开的文件句柄数量 - File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk I/O Busy Rate:等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 +- Disk Utilization (%):等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 - Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk I/O Ops:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Avg Time:等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Avg Size:等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Avg Queue Size:等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O System Call Rate:进程调用读写系统调用的频率,类似于 IOPS。 +- Disk IOPS:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 +- Disk I/O Latency (Avg):等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 +- Disk I/O Request Size (Avg):等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 +- Disk I/O Queue Length (Avg):等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 +- I/O Syscall Rate:进程调用读写系统调用的频率,类似于 IOPS。 - I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 -#### 3.1.4 JVM +#### JVM - GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size Detail: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Data Size Detail:节点 JVM 长期存活的对象大小和对应代际允许的最大值 +- GC Allocated/Promoted Size: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 +- GC Live Data Size:节点 JVM 长期存活的对象大小和对应代际允许的最大值 - Heap Memory:JVM 堆内存使用情况。 - - Maximum heap memory:JVM 最大可用的堆内存大小。 - - Committed heap memory:JVM 已提交的堆内存大小。 - - Used heap memory:JVM 已经使用的堆内存大小。 + - Maximum Heap Memory:JVM 最大可用的堆内存大小。 + - Committed Heap Memory:JVM 已提交的堆内存大小。 + - Used Heap Memory:JVM 已经使用的堆内存大小。 - PS Eden Space:PS Young 区的大小。 - PS Old Space:PS Old 区的大小。 - PS Survivor Space:PS Survivor 区的大小。 - ...(CMS/G1/ZGC 等) -- Off Heap Memory:堆外内存用量。 - - direct memory:堆外直接内存。 - - mapped memory:堆外映射内存。 -- GC Number Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Number Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- Time Consumed Of Compilation Per Minute:每分钟 JVM 用于编译的总时间 -- The Number of Class: - - loaded:JVM 目前已经加载的类的数量 - - unloaded:系统启动至今 JVM 卸载的类的数量 -- The Number of Java Thread:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 - -#### 
3.1.5 Network +- Off-Heap Memory:堆外内存用量。 + - Direct Memory:堆外直接内存。 + - Mapped Memory:堆外映射内存。 +- GCs Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC +- GC Latency Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC +- GC Events Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC +- GC Pause Time Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC +- JIT Compilation Time Per Minute:每分钟 JVM 用于编译的总时间 +- Loaded & Unloaded Classes: + - Loaded:JVM 目前已经加载的类的数量 + - Unloaded:系统启动至今 JVM 卸载的类的数量 +- Active Java Threads:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 + +#### Network eno 指的是到公网的网卡,lo 是虚拟网卡。 -- Net Speed:网卡发送和接收数据的速度 -- Receive/Transmit Data Size:网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Speed:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Connection Num:当前选定进程的 socket 连接数(IoTDB只有 TCP) +- Network Speed:网卡发送和接收数据的速度 +- Network Throughput (Receive/Transmit):网卡发送或者接收的数据包大小,自系统重启后算起 +- Packet Transmission Rate:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 +- Active TCP Connections:当前选定进程的 socket 连接数(IoTDB只有 TCP) ### 3.2 整体性能面板(Performance Overview Dashboard) -#### 3.2.1 Cluster Overview +#### Cluster Overview -- Total CPU Core: 集群机器 CPU 总核数 +- Total CPU Cores: 集群机器 CPU 总核数 - DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 - 磁盘 - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Usage: 集群各 DataNode 的磁盘使用率 -- Total Timeseries: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster: 集群 ConfigNode 和 DataNode 节点数量 + - DataNode Disk Utilization: 集群各 DataNode 的磁盘使用率 +- Total Time Series: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 +- Cluster Info: 集群 ConfigNode 和 DataNode 节点数量 - Up Time: 集群启动至今的时长 -- Total Write Point Per Second: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 +- Total Write Throughput: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 - 内存 - Total System Memory: 集群机器系统内存总大小 - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Usage: 集群各 DataNode 的内存使用率 -- Total File Number: 集群管理文件总数量 + - DataNode Process Memory Utilization: 集群各 DataNode 的内存使用率 +- Total Files: 集群管理文件总数量 - Cluster System Overview: 集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBase: 集群管理的 Database 总数(含副本) -- Total DataRegion: 集群管理的 DataRegion 总数 -- Total SchemaRegion: 集群管理的 SchemaRegion 总数 +- Total DataBases: 集群管理的 Database 总数(含副本) +- Total DataRegions: 集群管理的 DataRegion 总数 +- Total SchemaRegions: 集群管理的 SchemaRegion 总数 -#### 3.2.2 Node Overview +#### Node Overview -- CPU Core: 节点所在机器的 CPU 核数 +- CPU Cores: 节点所在机器的 CPU 核数 - Disk Space: 节点所在机器的磁盘大小 -- Timeseries: 节点所在机器管理的时间序列数量(含副本) +- Time Series: 节点所在机器管理的时间序列数量(含副本) - System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Point Per Second: 节点所在机器的每秒写入速度(含副本) +- Write Throughput: 节点所在机器的每秒写入速度(含副本) - System Memory: 节点所在机器的系统内存大小 - Swap Memory: 节点所在机器的交换内存大小 -- File Number: 节点管理的文件数 +- File Count: 节点管理的文件数 -#### 3.2.3 Performance +#### Performance - Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connection: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Time Consumed Of Operation: 节点的各类型操作耗时,包括平均值和P99 -- Average Time Consumed Of Interface: 节点的各个 thrift 接口平均耗时 -- P99 Time Consumed Of Interface: 节点的各个 thrift 接口的 P99 耗时数 -- Task Number: 节点的各项系统任务数量 -- Average Time Consumed of Task: 节点的各项系统任务的平均耗时 -- P99 Time Consumed of Task: 节点的各项系统任务的 P99 耗时 -- Operation Per Second: 节点的每秒操作数 +- Client Connections: 节点的客户端连接情况,包括总连接数和活跃连接数 +- Operation Latency: 节点的各类型操作耗时,包括平均值和P99 +- Average Interface Latency: 节点的各个 thrift 接口平均耗时 +- P99 Interface Latency: 节点的各个 thrift 接口的 P99 耗时数 +- Total Tasks: 节点的各项系统任务数量 +- Average Task Latency: 节点的各项系统任务的平均耗时 +- P99 Task Latency: 节点的各项系统任务的 P99 耗时 +- Operations Per Second: 节点的每秒操作数 - 主流程 - - Operation Per Second Of 
Stage: 节点主流程各阶段的每秒操作数 - - Average Time Consumed Of Stage: 节点主流程各阶段平均耗时 - - P99 Time Consumed Of Stage: 节点主流程各阶段 P99 耗时 + - Operations Per Second (Stage-wise): 节点主流程各阶段的每秒操作数 + - Average Stage Latency: 节点主流程各阶段平均耗时 + - P99 Stage Latency: 节点主流程各阶段 P99 耗时 - Schedule 阶段 - - OPS Of Schedule: 节点 schedule 阶段各子阶段每秒操作数 - - Average Time Consumed Of Schedule Stage: 节点 schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Schedule Stage: 节点的 schedule 阶段各子阶段 P99 耗时 + - Schedule Operations Per Second: 节点 schedule 阶段各子阶段每秒操作数 + - Average Schedule Stage Latency: 节点 schedule 阶段各子阶段平均耗时 + - P99 Schedule Stage Latency: 节点的 schedule 阶段各子阶段 P99 耗时 - Local Schedule 各子阶段 - - OPS Of Local Schedule Stage: 节点 local schedule 各子阶段每秒操作数 - - Average Time Consumed Of Local Schedule Stage: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Local Schedule Stage: 节点的 local schedule 阶段各子阶段 P99 耗时 + - Local Schedule Operations Per Second: 节点 local schedule 各子阶段每秒操作数 + - Average Local Schedule Stage Latency: 节点 local schedule 阶段各子阶段平均耗时 + - P99 Local Schedule Latency: 节点的 local schedule 阶段各子阶段 P99 耗时 - Storage 阶段 - - OPS Of Storage Stage: 节点 storage 阶段各子阶段每秒操作数 - - Average Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段平均耗时 - - P99 Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段 P99 耗时 + - Storage Operations Per Second: 节点 storage 阶段各子阶段每秒操作数 + - Average Storage Stage Latency: 节点 storage 阶段各子阶段平均耗时 + - P99 Storage Stage Latency: 节点 storage 阶段各子阶段 P99 耗时 - Engine 阶段 - - OPS Of Engine Stage: 节点 engine 阶段各子阶段每秒操作数 - - Average Time Consumed Of Engine Stage: 节点的 engine 阶段各子阶段平均耗时 - - P99 Time Consumed Of Engine Stage: 节点 engine 阶段各子阶段的 P99 耗时 + - Engine Operations Per Second: 节点 engine 阶段各子阶段每秒操作数 + - Average Engine Stage Latency: 节点的 engine 阶段各子阶段平均耗时 + - P99 Engine Stage Latency: 节点 engine 阶段各子阶段的 P99 耗时 -#### 3.2.4 System +#### System -- CPU Load: 节点的 CPU 负载 -- CPU Time Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Time Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC +- CPU Utilization: 节点的 CPU 负载 +- CPU Latency Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 +- GC Latency Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC - Heap Memory: 节点的堆内存使用情况 -- Off Heap Memory: 节点的非堆内存使用情况 -- The Number Of Java Thread: 节点的 Java 线程数量情况 +- Off-Heap Memory: 节点的非堆内存使用情况 +- Total Java Threads: 节点的 Java 线程数量情况 - File Count: 节点管理的文件数量情况 - File Size: 节点管理文件大小情况 -- Log Number Per Minute: 节点的每分钟不同类型日志情况 +- Logs Per Minute: 节点的每分钟不同类型日志情况 ### 3.3 ConfigNode 面板(ConfigNode Dashboard) 该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 -#### 3.3.1 Node Overview +#### Node Overview - Database Count: 节点的数据库数量 - Region - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Current Status: 节点的 DataRegion 的状态 + - DataRegion Status: 节点的 DataRegion 的状态 - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Current Status: 节点的 SchemaRegion 的状态 -- System Memory: 节点的系统内存大小 -- Swap Memory: 节点的交换区内存大小 -- ConfigNodes: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes: 节点所在集群的 DataNode 情况 + - SchemaRegion Status: 节点的 SchemaRegion 的状态 +- System Memory Utilization: 节点的系统内存大小 +- Swap Memory Utilization: 节点的交换区内存大小 +- ConfigNodes Status: 节点所在集群的 ConfigNode 的运行状态 +- DataNodes Status: 节点所在集群的 DataNode 情况 - System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 -#### 3.3.2 NodeInfo +#### NodeInfo - Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode - ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 @@ -378,20 +378,20 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 - DataRegionGroup Leader Distribution: 节点所在集群的 DataRegionGroup 的 Leader 分布情况 -#### 3.3.3 Protocol +#### Protocol - 客户端数量统计 - - Active 
Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点各线程池的借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 + - Active Clients: 节点各线程池的活跃客户端数量 + - Idle Clients: 节点各线程池的空闲客户端数量 + - Borrowed Clients Per Second: 节点各线程池的借用客户端数量 + - Created Clients Per Second: 节点各线程池的创建客户端数量 + - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 - 客户端时间情况 - - Client Mean Active Time: 节点各线程池客户端的平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + - Average Client Active Time: 节点各线程池客户端的平均活跃时间 + - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 + - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 -#### 3.3.4 Partition Table +#### Partition Table - SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 - DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 @@ -400,98 +400,98 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - DataRegion Status: 节点所在集群的 DataRegion 状态 - SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 -#### 3.3.5 Consensus +#### Consensus -- Ratis Stage Time: 节点的 Ratis 各阶段耗时 -- Write Log Entry: 节点的 Ratis 写 Log 的耗时 -- Remote / Local Write Time: 节点的 Ratis 的远程写入和本地写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory: 节点 Ratis 共识协议的内存使用 +- Ratis Stage Latency: 节点的 Ratis 各阶段耗时 +- Write Log Entry Latency: 节点的 Ratis 写 Log 的耗时 +- Remote/Local Write Latency: 节点的 Ratis 的远程写入和本地写入的耗时 +- Remote/Local Write Throughput: 节点 Ratis 的远程和本地写入的 QPS +- RatisConsensus Memory Utilization: 节点 Ratis 共识协议的内存使用 ### 3.4 DataNode 面板(DataNode Dashboard) 该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 -#### 3.4.1 Node Overview +#### Node Overview -- The Number Of Entity: 节点管理的实体情况 -- Write Point Per Second: 节点的每秒写入速度 +- Total Managed Entities: 节点管理的实体情况 +- Write Throughput: 节点的每秒写入速度 - Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 -#### 3.4.2 Protocol +#### Protocol - 节点操作耗时 - - The Time Consumed Of Operation (avg): 节点的各项操作的平均耗时 - - The Time Consumed Of Operation (50%): 节点的各项操作耗时的中位数 - - The Time Consumed Of Operation (99%): 节点的各项操作耗时的P99 + - Average Operation Latency: 节点的各项操作的平均耗时 + - P50 Operation Latency: 节点的各项操作耗时的中位数 + - P99 Operation Latency: 节点的各项操作耗时的P99 - Thrift统计 - - The QPS Of Interface: 节点各个 Thrift 接口的 QPS - - The Avg Time Consumed Of Interface: 节点各个 Thrift 接口的平均耗时 - - Thrift Connection: 节点的各类型的 Thrfit 连接数量 - - Thrift Active Thread: 节点各类型的活跃 Thrift 连接数量 + - Thrift Interface QPS: 节点各个 Thrift 接口的 QPS + - Average Thrift Interface Latency: 节点各个 Thrift 接口的平均耗时 + - Thrift Connections: 节点的各类型的 Thrfit 连接数量 + - Active Thrift Threads: 节点各类型的活跃 Thrift 连接数量 - 客户端统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点的各线程池借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 - - Client Mean Active Time: 节点各线程池的客户端平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + - Active Clients: 节点各线程池的活跃客户端数量 + - Idle Clients: 节点各线程池的空闲客户端数量 + - Borrowed Clients Per Second: 节点的各线程池借用客户端数量 + - Created Clients Per Second: 节点各线程池的创建客户端数量 + - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 + - Average Client Active Time: 节点各线程池的客户端平均活跃时间 + - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 + - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 -#### 3.4.3 Storage Engine +#### Storage Engine - File Count: 节点管理的各类型文件数量 - File Size: 节点管理的各类型文件大小 - TsFile - - TsFile Total Size In Each Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count In Each Level: 
节点管理的各级别 TsFile 文件数量 - - Avg TsFile Size In Each Level: 节点管理的各级别 TsFile 文件的平均大小 -- Task Number: 节点的 Task 数量 -- The Time Consumed of Task: 节点的 Task 的耗时 + - Total TsFile Size Per Level: 节点管理的各级别 TsFile 文件总大小 + - TsFile Count Per Level: 节点管理的各级别 TsFile 文件数量 + - Average TsFile Size Per Level: 节点管理的各级别 TsFile 文件的平均大小 +- Total Tasks: 节点的 Task 数量 +- Task Latency: 节点的 Task 的耗时 - Compaction - - Compaction Read And Write Per Second: 节点的每秒钟合并读写速度 - - Compaction Number Per Minute: 节点的每分钟合并数量 - - Compaction Process Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted Point Num Per Minute: 节点每分钟合并的点数 + - Compaction Read/Write Throughput: 节点的每秒钟合并读写速度 + - Compactions Per Minute: 节点的每分钟合并数量 + - Compaction Chunk Status: 节点合并不同状态的 Chunk 的数量 + - Compacted-Points Per Minute: 节点每分钟合并的点数 -#### 3.4.4 Write Performance +#### Write Performance -- Write Cost(avg): 节点写入耗时平均值,包括写入 wal 和 memtable -- Write Cost(50%): 节点写入耗时中位数,包括写入 wal 和 memtable -- Write Cost(99%): 节点写入耗时的P99,包括写入 wal 和 memtable +- Average Write Latency: 节点写入耗时平均值,包括写入 wal 和 memtable +- P50 Write Latency: 节点写入耗时中位数,包括写入 wal 和 memtable +- P99 Write Latency: 节点写入耗时的P99,包括写入 wal 和 memtable - WAL - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL File Num: 节点管理的 WAL 文件数量 - - WAL Nodes Num: 节点管理的 WAL Node 数量 - - Make Checkpoint Costs: 节点创建各类型的 CheckPoint 的耗时 - - WAL Serialize Total Cost: 节点 WAL 序列化总耗时 + - WAL Files: 节点管理的 WAL 文件数量 + - WAL Nodes: 节点管理的 WAL Node 数量 + - Checkpoint Creation Time: 节点创建各类型的 CheckPoint 的耗时 + - WAL Serialization Time (Total): 节点 WAL 序列化总耗时 - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - Effective Info Ratio Of WALNode: 节点的不同 WALNode 的有效信息比 + - WALNode Effective Info Ratio: 节点的不同 WALNode 的有效信息比 - WAL Buffer - - WAL Buffer Cost: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 + - WAL Buffer Latency: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 - Flush统计 - - Flush MemTable Cost(avg): 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - Flush MemTable Cost(50%): 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - Flush MemTable Cost(99%): 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Flush Sub Task Cost(avg): 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(50%): 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(99%): 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 + - Average Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的平均值 + - P50 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的中位数 + - P99 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的 P99 + - Average Flush Subtask Latency: 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 + - P50 Flush Subtask Latency: 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 + - P99 Flush Subtask Latency: 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 - Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 - Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio Of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size Of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size Of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num Of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num Of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num Of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 +- Tsfile Compression Ratio of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 +- Flush TsFile Size of DataRegions: 节点不同 
DataRegion 的每次刷盘时对应的 TsFile 大小 +- Size of Flushing MemTable: 节点刷盘的 Memtable 的大小 +- Points Num of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 +- Series Num of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 +- Average Point Num of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 -#### 3.4.5 Schema Engine +#### Schema Engine - Schema Engine Mode: 节点的元数据引擎模式 - Schema Consensus Protocol: 节点的元数据共识协议 @@ -520,122 +520,122 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 - Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 -#### 3.4.6 Query Engine +#### Query Engine - 各阶段耗时 - - The time consumed of query plan stages(avg): 节点查询各阶段耗时的平均值 - - The time consumed of query plan stages(50%): 节点查询各阶段耗时的中位数 - - The time consumed of query plan stages(99%): 节点查询各阶段耗时的P99 + - Average Query Plan Execution Time: 节点查询各阶段耗时的平均值 + - P50 Query Plan Execution Time: 节点查询各阶段耗时的中位数 + - P99 Query Plan Execution Time: 节点查询各阶段耗时的P99 - 执行计划分发耗时 - - The time consumed of plan dispatch stages(avg): 节点查询执行计划分发耗时的平均值 - - The time consumed of plan dispatch stages(50%): 节点查询执行计划分发耗时的中位数 - - The time consumed of plan dispatch stages(99%): 节点查询执行计划分发耗时的P99 + - Average Query Plan Dispatch Time: 节点查询执行计划分发耗时的平均值 + - P50 Query Plan Dispatch Time: 节点查询执行计划分发耗时的中位数 + - P99 Query Plan Dispatch Time: 节点查询执行计划分发耗时的P99 - 执行计划执行耗时 - - The time consumed of query execution stages(avg): 节点查询执行计划执行耗时的平均值 - - The time consumed of query execution stages(50%): 节点查询执行计划执行耗时的中位数 - - The time consumed of query execution stages(99%): 节点查询执行计划执行耗时的P99 + - Average Query Execution Time: 节点查询执行计划执行耗时的平均值 + - P50 Query Execution Time: 节点查询执行计划执行耗时的中位数 + - P99 Query Execution Time: 节点查询执行计划执行耗时的P99 - 算子执行耗时 - - The time consumed of operator execution stages(avg): 节点查询算子执行耗时的平均值 - - The time consumed of operator execution(50%): 节点查询算子执行耗时的中位数 - - The time consumed of operator execution(99%): 节点查询算子执行耗时的P99 + - Average Query Operator Execution Time: 节点查询算子执行耗时的平均值 + - P50 Query Operator Execution Time: 节点查询算子执行耗时的中位数 + - P99 Query Operator Execution Time: 节点查询算子执行耗时的P99 - 聚合查询计算耗时 - - The time consumed of query aggregation(avg): 节点聚合查询计算耗时的平均值 - - The time consumed of query aggregation(50%): 节点聚合查询计算耗时的中位数 - - The time consumed of query aggregation(99%): 节点聚合查询计算耗时的P99 + - Average Query Aggregation Execution Time: 节点聚合查询计算耗时的平均值 + - P50 Query Aggregation Execution Time: 节点聚合查询计算耗时的中位数 + - P99 Query Aggregation Execution Time: 节点聚合查询计算耗时的P99 - 文件/内存接口耗时 - - The time consumed of query scan(avg): 节点查询文件/内存接口耗时的平均值 - - The time consumed of query scan(50%): 节点查询文件/内存接口耗时的中位数 - - The time consumed of query scan(99%): 节点查询文件/内存接口耗时的P99 + - Average Query Scan Execution Time: 节点查询文件/内存接口耗时的平均值 + - P50 Query Scan Execution Time: 节点查询文件/内存接口耗时的中位数 + - P99 Query Scan Execution Time: 节点查询文件/内存接口耗时的P99 - 资源访问数量 - - The usage of query resource(avg): 节点查询资源访问数量的平均值 - - The usage of query resource(50%): 节点查询资源访问数量的中位数 - - The usage of query resource(99%): 节点查询资源访问数量的P99 + - Average Query Resource Utilization: 节点查询资源访问数量的平均值 + - P50 Query Resource Utilization: 节点查询资源访问数量的中位数 + - P99 Query Resource Utilization: 节点查询资源访问数量的P99 - 数据传输耗时 - - The time consumed of query data exchange(avg): 节点查询数据传输耗时的平均值 - - The time consumed of query data exchange(50%): 节点查询数据传输耗时的中位数 - - The time consumed of query data exchange(99%): 节点查询数据传输耗时的P99 + - Average Query Data Exchange Latency: 节点查询数据传输耗时的平均值 + - P50 Query Data Exchange Latency: 节点查询数据传输耗时的中位数 + - P99 Query Data Exchange Latency: 节点查询数据传输耗时的P99 - 数据传输数量 - - The count 
of Data Exchange(avg): 节点查询的数据传输数量的平均值 - - The count of Data Exchange: 节点查询的数据传输数量的分位数,包括中位数和P99 + - Average Query Data Exchange Count: 节点查询的数据传输数量的平均值 + - Query Data Exchange Count: 节点查询的数据传输数量的分位数,包括中位数和P99 - 任务调度数量与耗时 - - The number of query queue: 节点查询任务调度数量 - - The time consumed of query schedule time(avg): 节点查询任务调度耗时的平均值 - - The time consumed of query schedule time(50%): 节点查询任务调度耗时的中位数 - - The time consumed of query schedule time(99%): 节点查询任务调度耗时的P99 + - Query Queue Length: 节点查询任务调度数量 + - Average Query Scheduling Latency: 节点查询任务调度耗时的平均值 + - P50 Query Scheduling Latency: 节点查询任务调度耗时的中位数 + - P99 Query Scheduling Latency: 节点查询任务调度耗时的P99 -#### 3.4.7 Query Interface +#### Query Interface - 加载时间序列元数据 - - The time consumed of load timeseries metadata(avg): 节点查询加载时间序列元数据耗时的平均值 - - The time consumed of load timeseries metadata(50%): 节点查询加载时间序列元数据耗时的中位数 - - The time consumed of load timeseries metadata(99%): 节点查询加载时间序列元数据耗时的P99 + - Average Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的平均值 + - P50 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的中位数 + - P99 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的P99 - 读取时间序列 - - The time consumed of read timeseries metadata(avg): 节点查询读取时间序列耗时的平均值 - - The time consumed of read timeseries metadata(50%): 节点查询读取时间序列耗时的中位数 - - The time consumed of read timeseries metadata(99%): 节点查询读取时间序列耗时的P99 + - Average Timeseries Metadata Read Time: 节点查询读取时间序列耗时的平均值 + - P50 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的中位数 + - P99 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的P99 - 修改时间序列元数据 - - The time consumed of timeseries metadata modification(avg): 节点查询修改时间序列元数据耗时的平均值 - - The time consumed of timeseries metadata modification(50%): 节点查询修改时间序列元数据耗时的中位数 - - The time consumed of timeseries metadata modification(99%): 节点查询修改时间序列元数据耗时的P99 + - Average Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的平均值 + - P50 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的中位数 + - P99 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的P99 - 加载Chunk元数据列表 - - The time consumed of load chunk metadata list(avg): 节点查询加载Chunk元数据列表耗时的平均值 - - The time consumed of load chunk metadata list(50%): 节点查询加载Chunk元数据列表耗时的中位数 - - The time consumed of load chunk metadata list(99%): 节点查询加载Chunk元数据列表耗时的P99 + - Average Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的平均值 + - P50 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的中位数 + - P99 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的P99 - 修改Chunk元数据 - - The time consumed of chunk metadata modification(avg): 节点查询修改Chunk元数据耗时的平均值 - - The time consumed of chunk metadata modification(50%): 节点查询修改Chunk元数据耗时的总位数 - - The time consumed of chunk metadata modification(99%): 节点查询修改Chunk元数据耗时的P99 + - Average Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的平均值 + - P50 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的中位数 + - P99 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的P99 - 按照Chunk元数据过滤 - - The time consumed of chunk metadata filter(avg): 节点查询按照Chunk元数据过滤耗时的平均值 - - The time consumed of chunk metadata filter(50%): 节点查询按照Chunk元数据过滤耗时的中位数 - - The time consumed of chunk metadata filter(99%): 节点查询按照Chunk元数据过滤耗时的P99 + - Average Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的平均值 + - P50 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的中位数 + - P99 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的P99 - 构造Chunk Reader - - The time consumed of construct chunk reader(avg): 节点查询构造Chunk Reader耗时的平均值 - - The time consumed of construct chunk reader(50%): 节点查询构造Chunk Reader耗时的中位数 - - The time consumed of construct chunk reader(99%): 节点查询构造Chunk 
Reader耗时的P99 + - Average Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的平均值 + - P50 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的中位数 + - P99 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的P99 - 读取Chunk - - The time consumed of read chunk(avg): 节点查询读取Chunk耗时的平均值 - - The time consumed of read chunk(50%): 节点查询读取Chunk耗时的中位数 - - The time consumed of read chunk(99%): 节点查询读取Chunk耗时的P99 + - Average Chunk Read Time: 节点查询读取Chunk耗时的平均值 + - P50 Chunk Read Time: 节点查询读取Chunk耗时的中位数 + - P99 Chunk Read Time: 节点查询读取Chunk耗时的P99 - 初始化Chunk Reader - - The time consumed of init chunk reader(avg): 节点查询初始化Chunk Reader耗时的平均值 - - The time consumed of init chunk reader(50%): 节点查询初始化Chunk Reader耗时的中位数 - - The time consumed of init chunk reader(99%): 节点查询初始化Chunk Reader耗时的P99 -- 通过 Page Reader 构造 TsBlock - - The time consumed of build tsblock from page reader(avg): 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from page reader(50%): 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from page reader(99%): 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 + - Average Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的平均值 + - P50 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的中位数 + - P99 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的P99 +- 通过 Page Reader 构造 TsBlock + - Average TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 + - P50 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 + - P99 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 - 查询通过 Merge Reader 构造 TsBlock - - The time consumed of build tsblock from merge reader(avg): 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from merge reader(50%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from merge reader(99%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 + - Average TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 + - P50 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 + - P99 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 -#### 3.4.8 Query Data Exchange +#### Query Data Exchange 查询的数据交换耗时。 -- 通过 source handle 获取 TsBlock - - The time consumed of source handle get tsblock(avg): 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - The time consumed of source handle get tsblock(50%): 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - The time consumed of source handle get tsblock(99%): 节点查询通过 source handle 获取 TsBlock 耗时的P99 +- 通过 source handle 获取 TsBlock + - Average Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的平均值 + - P50 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的中位数 + - P99 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的P99 - 通过 source handle 反序列化 TsBlock - - The time consumed of source handle deserialize tsblock(avg): 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - The time consumed of source handle deserialize tsblock(50%): 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - The time consumed of source handle deserialize tsblock(99%): 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 -- 通过 sink handle 发送 TsBlock - - The time consumed of sink handle send tsblock(avg): 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - The time consumed of sink handle send tsblock(50%): 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - The time consumed of sink handle send tsblock(99%): 节点查询通过 sink handle 发送 TsBlock 
耗时的P99 + - Average Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 + - P50 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 + - P99 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 +- 通过 sink handle 发送 TsBlock + - Average Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 + - P50 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 + - P99 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的P99 - 回调 data block event - - The time consumed of on acknowledge data block event task(avg): 节点查询回调 data block event 耗时的平均值 - - The time consumed of on acknowledge data block event task(50%): 节点查询回调 data block event 耗时的中位数 - - The time consumed of on acknowledge data block event task(99%): 节点查询回调 data block event 耗时的P99 -- 获取 data block task - - The time consumed of get data block task(avg): 节点查询获取 data block task 耗时的平均值 - - The time consumed of get data block task(50%): 节点查询获取 data block task 耗时的中位数 - - The time consumed of get data block task(99%): 节点查询获取 data block task 耗时的 P99 + - Average Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的平均值 + - P50 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的中位数 + - P99 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的P99 +- 获取 data block task + - Average Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的平均值 + - P50 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的中位数 + - P99 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的 P99 -#### 3.4.9 Query Related Resource +#### Query Related Resource - MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 - LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 @@ -643,40 +643,40 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - Coordinator: 节点上记录的查询数量 - MemoryPool Size: 节点查询相关的内存池情况 - MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler: 节点查询相关的队列任务数量 +- DriverScheduler Count: 节点查询相关的队列任务数量 -#### 3.4.10 Consensus - IoT Consensus +#### Consensus - IoT Consensus - 内存使用 - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 - 节点间同步情况 - - IoTConsensus Sync Index: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 + - IoTConsensus Sync Index Size: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 + - IoTConsensus Search Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 + - IoTConsensus Safe Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Sync Speed Diff Of Peers: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 + - Peer Sync Speed Difference: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 - 不同执行阶段耗时 - - The Time Consumed Of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed Of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed Of Different Stages (99%): 节点 IoT Consensus 
不同执行阶段的耗时的P99 + - The Time Consumed of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 + - The Time Consumed of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 + - The Time Consumed of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 -#### 3.4.11 Consensus - DataRegion Ratis Consensus +#### Consensus - DataRegion Ratis Consensus -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory: 节点 Ratis 的内存使用情况 +- Ratis Consensus Stage Latency: 节点 Ratis 不同阶段的耗时 +- Ratis Log Write Latency: 节点 Ratis 写 Log 不同阶段的耗时 +- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的 QPS +- RatisConsensus Memory Usage: 节点 Ratis 的内存使用情况 -#### 3.4.12 Consensus - SchemaRegion Ratis Consensus +#### Consensus - SchemaRegion Ratis Consensus -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory: 节点 Ratis 内存使用情况 \ No newline at end of file +- RatisConsensus Stage Latency: 节点 Ratis 不同阶段的耗时 +- Ratis Log Write Latency: 节点 Ratis 写 Log 各阶段的耗时 +- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的QPS +- RatisConsensus Memory Usage: 节点 Ratis 内存使用情况 \ No newline at end of file diff --git a/src/zh/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md index d2156fa29..0e77aea91 100644 --- a/src/zh/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ b/src/zh/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -194,18 +194,18 @@ cd grafana-* #### CPU -- CPU Core:CPU 核数 -- CPU Load: - - System CPU Load:整个系统在采样时间内 CPU 的平均负载和繁忙程度 - - Process CPU Load:IoTDB 进程在采样时间内占用的 CPU 比例 +- CPU Cores:CPU 核数 +- CPU Utilization: + - System CPU Utilization:整个系统在采样时间内 CPU 的平均负载和繁忙程度 + - Process CPU Utilization:IoTDB 进程在采样时间内占用的 CPU 比例 - CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 #### Memory - System Memory:当前系统内存的使用情况。 - - Commited vm size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 - - Total physical memory:系统可用物理内存的总量。 - - Used physical memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 + - Committed VM Size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 + - Total Physical Memory:系统可用物理内存的总量。 + - Used Physical Memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 - System Swap Memory:交换空间(Swap Space)内存用量。 - Process Memory:IoTDB 进程使用内存的情况。 - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) @@ -215,142 +215,142 @@ cd grafana-* #### Disk - Disk Space: - - Total disk space:IoTDB 可使用的最大磁盘空间。 - - Used disk space:IoTDB 已经使用的磁盘空间。 -- Log Number Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 + - Total Disk Space:IoTDB 可使用的最大磁盘空间。 + - Used Disk Space:IoTDB 已经使用的磁盘空间。 +- Logs Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 - File Count:IoTDB 相关文件数量 - - all:所有文件数量 + - All:所有文件数量 - TsFile:TsFile 数量 - - seq:顺序 TsFile 数量 - - unseq:乱序 TsFile 数量 - - wal:WAL 文件数量 - - cross-temp:跨空间合并 temp 文件数量 - - inner-seq-temp:顺序空间内合并 temp 文件数量 - - innser-unseq-temp:乱序空间内合并 temp 文件数量 - - mods:墓碑文件数量 -- Open File Count:系统打开的文件句柄数量 + - Seq:顺序 TsFile 数量 + - Unseq:乱序 TsFile 数量 + - WAL:WAL 文件数量 + - Cross-Temp:跨空间合并 temp 文件数量 + - Inner-Seq-Temp:顺序空间内合并 temp 文件数量 + - Inner-Unseq-Temp:乱序空间内合并 temp 文件数量 + - 
Mods:墓碑文件数量 +- Open File Handles:系统打开的文件句柄数量 - File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 -- Disk I/O Busy Rate:等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 +- Disk Utilization (%):等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 - Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 -- Disk I/O Ops:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 -- Disk I/O Avg Time:等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 -- Disk I/O Avg Size:等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 -- Disk I/O Avg Queue Size:等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 -- I/O System Call Rate:进程调用读写系统调用的频率,类似于 IOPS。 +- Disk IOPS:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 +- Disk I/O Latency (Avg):等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 +- Disk I/O Request Size (Avg):等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 +- Disk I/O Queue Length (Avg):等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 +- I/O Syscall Rate:进程调用读写系统调用的频率,类似于 IOPS。 - I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 #### JVM - GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 -- GC Allocated/Promoted Size Detail: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 -- GC Data Size Detail:节点 JVM 长期存活的对象大小和对应代际允许的最大值 +- GC Allocated/Promoted Size: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 +- GC Live Data Size:节点 JVM 长期存活的对象大小和对应代际允许的最大值 - Heap Memory:JVM 堆内存使用情况。 - - Maximum heap memory:JVM 最大可用的堆内存大小。 - - Committed heap memory:JVM 已提交的堆内存大小。 - - Used heap memory:JVM 已经使用的堆内存大小。 + - Maximum Heap Memory:JVM 最大可用的堆内存大小。 + - Committed Heap Memory:JVM 已提交的堆内存大小。 + - Used Heap Memory:JVM 已经使用的堆内存大小。 - PS Eden Space:PS Young 区的大小。 - PS Old Space:PS Old 区的大小。 - PS Survivor Space:PS Survivor 区的大小。 - ...(CMS/G1/ZGC 等) -- Off Heap Memory:堆外内存用量。 - - direct memory:堆外直接内存。 - - mapped memory:堆外映射内存。 -- GC Number Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC -- GC Number Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC -- GC Time Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC -- Time Consumed Of Compilation Per Minute:每分钟 JVM 用于编译的总时间 -- The Number of Class: - - loaded:JVM 目前已经加载的类的数量 - - unloaded:系统启动至今 JVM 卸载的类的数量 -- The Number of Java Thread:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 +- Off-Heap Memory:堆外内存用量。 + - Direct Memory:堆外直接内存。 + - Mapped Memory:堆外映射内存。 +- GCs Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC +- GC Latency Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC +- GC Events Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC +- GC Pause Time Breakdown Per Minute:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC +- JIT Compilation Time Per Minute:每分钟 JVM 用于编译的总时间 +- Loaded & Unloaded Classes: + - Loaded:JVM 目前已经加载的类的数量 + - Unloaded:系统启动至今 JVM 卸载的类的数量 +- Active Java Threads:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 #### Network eno 指的是到公网的网卡,lo 是虚拟网卡。 -- Net Speed:网卡发送和接收数据的速度 -- Receive/Transmit Data Size:网卡发送或者接收的数据包大小,自系统重启后算起 -- Packet Speed:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 -- Connection Num:当前选定进程的 socket 连接数(IoTDB只有 TCP) +- Network Speed:网卡发送和接收数据的速度 +- Network Throughput (Receive/Transmit):网卡发送或者接收的数据包大小,自系统重启后算起 +- Packet Transmission Rate:网卡发送和接收数据包的速度,一次 RPC 
请求可以对应一个或者多个数据包 +- Active TCP Connections:当前选定进程的 socket 连接数(IoTDB只有 TCP) ### 3.2 整体性能面板(Performance Overview Dashboard) #### Cluster Overview -- Total CPU Core: 集群机器 CPU 总核数 +- Total CPU Cores: 集群机器 CPU 总核数 - DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 - 磁盘 - Total Disk Space: 集群机器磁盘总大小 - - DataNode Disk Usage: 集群各 DataNode 的磁盘使用率 -- Total Timeseries: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 -- Cluster: 集群 ConfigNode 和 DataNode 节点数量 + - DataNode Disk Utilization: 集群各 DataNode 的磁盘使用率 +- Total Time Series: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 +- Cluster Info: 集群 ConfigNode 和 DataNode 节点数量 - Up Time: 集群启动至今的时长 -- Total Write Point Per Second: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 +- Total Write Throughput: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 - 内存 - Total System Memory: 集群机器系统内存总大小 - Total Swap Memory: 集群机器交换内存总大小 - - DataNode Process Memory Usage: 集群各 DataNode 的内存使用率 -- Total File Number: 集群管理文件总数量 + - DataNode Process Memory Utilization: 集群各 DataNode 的内存使用率 +- Total Files: 集群管理文件总数量 - Cluster System Overview: 集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 -- Total DataBase: 集群管理的 Database 总数(含副本) -- Total DataRegion: 集群管理的 DataRegion 总数 -- Total SchemaRegion: 集群管理的 SchemaRegion 总数 +- Total DataBases: 集群管理的 Database 总数(含副本) +- Total DataRegions: 集群管理的 DataRegion 总数 +- Total SchemaRegions: 集群管理的 SchemaRegion 总数 #### Node Overview -- CPU Core: 节点所在机器的 CPU 核数 +- CPU Cores: 节点所在机器的 CPU 核数 - Disk Space: 节点所在机器的磁盘大小 -- Timeseries: 节点所在机器管理的时间序列数量(含副本) +- Time Series: 节点所在机器管理的时间序列数量(含副本) - System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 -- Write Point Per Second: 节点所在机器的每秒写入速度(含副本) +- Write Throughput: 节点所在机器的每秒写入速度(含副本) - System Memory: 节点所在机器的系统内存大小 - Swap Memory: 节点所在机器的交换内存大小 -- File Number: 节点管理的文件数 +- File Count: 节点管理的文件数 #### Performance - Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 -- Client Connection: 节点的客户端连接情况,包括总连接数和活跃连接数 -- Time Consumed Of Operation: 节点的各类型操作耗时,包括平均值和P99 -- Average Time Consumed Of Interface: 节点的各个 thrift 接口平均耗时 -- P99 Time Consumed Of Interface: 节点的各个 thrift 接口的 P99 耗时数 -- Task Number: 节点的各项系统任务数量 -- Average Time Consumed of Task: 节点的各项系统任务的平均耗时 -- P99 Time Consumed of Task: 节点的各项系统任务的 P99 耗时 -- Operation Per Second: 节点的每秒操作数 +- Client Connections: 节点的客户端连接情况,包括总连接数和活跃连接数 +- Operation Latency: 节点的各类型操作耗时,包括平均值和P99 +- Average Interface Latency: 节点的各个 thrift 接口平均耗时 +- P99 Interface Latency: 节点的各个 thrift 接口的 P99 耗时数 +- Total Tasks: 节点的各项系统任务数量 +- Average Task Latency: 节点的各项系统任务的平均耗时 +- P99 Task Latency: 节点的各项系统任务的 P99 耗时 +- Operations Per Second: 节点的每秒操作数 - 主流程 - - Operation Per Second Of Stage: 节点主流程各阶段的每秒操作数 - - Average Time Consumed Of Stage: 节点主流程各阶段平均耗时 - - P99 Time Consumed Of Stage: 节点主流程各阶段 P99 耗时 + - Operations Per Second (Stage-wise): 节点主流程各阶段的每秒操作数 + - Average Stage Latency: 节点主流程各阶段平均耗时 + - P99 Stage Latency: 节点主流程各阶段 P99 耗时 - Schedule 阶段 - - OPS Of Schedule: 节点 schedule 阶段各子阶段每秒操作数 - - Average Time Consumed Of Schedule Stage: 节点 schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Schedule Stage: 节点的 schedule 阶段各子阶段 P99 耗时 + - Schedule Operations Per Second: 节点 schedule 阶段各子阶段每秒操作数 + - Average Schedule Stage Latency: 节点 schedule 阶段各子阶段平均耗时 + - P99 Schedule Stage Latency: 节点的 schedule 阶段各子阶段 P99 耗时 - Local Schedule 各子阶段 - - OPS Of Local Schedule Stage: 节点 local schedule 各子阶段每秒操作数 - - Average Time Consumed Of Local Schedule Stage: 节点 local schedule 阶段各子阶段平均耗时 - - P99 Time Consumed Of Local Schedule Stage: 节点的 local schedule 阶段各子阶段 P99 耗时 + - Local Schedule Operations Per Second: 节点 local schedule 各子阶段每秒操作数 + - Average Local Schedule Stage Latency: 节点 local schedule 阶段各子阶段平均耗时 + 
- P99 Local Schedule Latency: 节点的 local schedule 阶段各子阶段 P99 耗时 - Storage 阶段 - - OPS Of Storage Stage: 节点 storage 阶段各子阶段每秒操作数 - - Average Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段平均耗时 - - P99 Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段 P99 耗时 + - Storage Operations Per Second: 节点 storage 阶段各子阶段每秒操作数 + - Average Storage Stage Latency: 节点 storage 阶段各子阶段平均耗时 + - P99 Storage Stage Latency: 节点 storage 阶段各子阶段 P99 耗时 - Engine 阶段 - - OPS Of Engine Stage: 节点 engine 阶段各子阶段每秒操作数 - - Average Time Consumed Of Engine Stage: 节点的 engine 阶段各子阶段平均耗时 - - P99 Time Consumed Of Engine Stage: 节点 engine 阶段各子阶段的 P99 耗时 + - Engine Operations Per Second: 节点 engine 阶段各子阶段每秒操作数 + - Average Engine Stage Latency: 节点的 engine 阶段各子阶段平均耗时 + - P99 Engine Stage Latency: 节点 engine 阶段各子阶段的 P99 耗时 #### System -- CPU Load: 节点的 CPU 负载 -- CPU Time Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 -- GC Time Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC +- CPU Utilization: 节点的 CPU 负载 +- CPU Latency Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 +- GC Latency Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC - Heap Memory: 节点的堆内存使用情况 -- Off Heap Memory: 节点的非堆内存使用情况 -- The Number Of Java Thread: 节点的 Java 线程数量情况 +- Off-Heap Memory: 节点的非堆内存使用情况 +- Total Java Threads: 节点的 Java 线程数量情况 - File Count: 节点管理的文件数量情况 - File Size: 节点管理文件大小情况 -- Log Number Per Minute: 节点的每分钟不同类型日志情况 +- Logs Per Minute: 节点的每分钟不同类型日志情况 ### 3.3 ConfigNode 面板(ConfigNode Dashboard) @@ -361,13 +361,13 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - Database Count: 节点的数据库数量 - Region - DataRegion Count: 节点的 DataRegion 数量 - - DataRegion Current Status: 节点的 DataRegion 的状态 + - DataRegion Status: 节点的 DataRegion 的状态 - SchemaRegion Count: 节点的 SchemaRegion 数量 - - SchemaRegion Current Status: 节点的 SchemaRegion 的状态 -- System Memory: 节点的系统内存大小 -- Swap Memory: 节点的交换区内存大小 -- ConfigNodes: 节点所在集群的 ConfigNode 的运行状态 -- DataNodes: 节点所在集群的 DataNode 情况 + - SchemaRegion Status: 节点的 SchemaRegion 的状态 +- System Memory Utilization: 节点的系统内存大小 +- Swap Memory Utilization: 节点的交换区内存大小 +- ConfigNodes Status: 节点所在集群的 ConfigNode 的运行状态 +- DataNodes Status: 节点所在集群的 DataNode 情况 - System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 #### NodeInfo @@ -383,15 +383,15 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 #### Protocol - 客户端数量统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点各线程池的借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 + - Active Clients: 节点各线程池的活跃客户端数量 + - Idle Clients: 节点各线程池的空闲客户端数量 + - Borrowed Clients Per Second: 节点各线程池的借用客户端数量 + - Created Clients Per Second: 节点各线程池的创建客户端数量 + - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 - 客户端时间情况 - - Client Mean Active Time: 节点各线程池客户端的平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + - Average Client Active Time: 节点各线程池客户端的平均活跃时间 + - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 + - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 #### Partition Table @@ -404,11 +404,11 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 #### Consensus -- Ratis Stage Time: 节点的 Ratis 各阶段耗时 -- Write Log Entry: 节点的 Ratis 写 Log 的耗时 -- Remote / Local Write Time: 节点的 Ratis 的远程写入和本地写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 的远程和本地写入的 QPS -- RatisConsensus Memory: 节点 Ratis 共识协议的内存使用 +- Ratis Stage Latency: 节点的 Ratis 各阶段耗时 +- Write Log Entry Latency: 节点的 Ratis 写 Log 的耗时 +- Remote/Local Write Latency: 节点的 Ratis 的远程写入和本地写入的耗时 +- Remote/Local Write Throughput: 节点 Ratis 的远程和本地写入的 QPS +- RatisConsensus Memory Utilization: 节点 Ratis 共识协议的内存使用 ### 3.4 DataNode 面板(DataNode Dashboard) @@ -416,82 +416,82 @@ eno 
指的是到公网的网卡,lo 是虚拟网卡。 #### Node Overview -- The Number Of Entity: 节点管理的实体情况 -- Write Point Per Second: 节点的每秒写入速度 +- Total Managed Entities: 节点管理的实体情况 +- Write Throughput: 节点的每秒写入速度 - Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 #### Protocol - 节点操作耗时 - - The Time Consumed Of Operation (avg): 节点的各项操作的平均耗时 - - The Time Consumed Of Operation (50%): 节点的各项操作耗时的中位数 - - The Time Consumed Of Operation (99%): 节点的各项操作耗时的P99 + - Average Operation Latency: 节点的各项操作的平均耗时 + - P50 Operation Latency: 节点的各项操作耗时的中位数 + - P99 Operation Latency: 节点的各项操作耗时的P99 - Thrift统计 - - The QPS Of Interface: 节点各个 Thrift 接口的 QPS - - The Avg Time Consumed Of Interface: 节点各个 Thrift 接口的平均耗时 - - Thrift Connection: 节点的各类型的 Thrfit 连接数量 - - Thrift Active Thread: 节点各类型的活跃 Thrift 连接数量 + - Thrift Interface QPS: 节点各个 Thrift 接口的 QPS + - Average Thrift Interface Latency: 节点各个 Thrift 接口的平均耗时 + - Thrift Connections: 节点的各类型的 Thrift 连接数量 + - Active Thrift Threads: 节点各类型的活跃 Thrift 连接数量 - 客户端统计 - - Active Client Num: 节点各线程池的活跃客户端数量 - - Idle Client Num: 节点各线程池的空闲客户端数量 - - Borrowed Client Count: 节点的各线程池借用客户端数量 - - Created Client Count: 节点各线程池的创建客户端数量 - - Destroyed Client Count: 节点各线程池的销毁客户端数量 - - Client Mean Active Time: 节点各线程池的客户端平均活跃时间 - - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 - - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + - Active Clients: 节点各线程池的活跃客户端数量 + - Idle Clients: 节点各线程池的空闲客户端数量 + - Borrowed Clients Per Second: 节点的各线程池借用客户端数量 + - Created Clients Per Second: 节点各线程池的创建客户端数量 + - Destroyed Clients Per Second: 节点各线程池的销毁客户端数量 + - Average Client Active Time: 节点各线程池的客户端平均活跃时间 + - Average Client Borrowing Latency: 节点各线程池的客户端平均借用等待时间 + - Average Client Idle Time: 节点各线程池的客户端平均空闲时间 #### Storage Engine - File Count: 节点管理的各类型文件数量 - File Size: 节点管理的各类型文件大小 - TsFile - - TsFile Total Size In Each Level: 节点管理的各级别 TsFile 文件总大小 - - TsFile Count In Each Level: 节点管理的各级别 TsFile 文件数量 - - Avg TsFile Size In Each Level: 节点管理的各级别 TsFile 文件的平均大小 -- Task Number: 节点的 Task 数量 -- The Time Consumed of Task: 节点的 Task 的耗时 + - Total TsFile Size Per Level: 节点管理的各级别 TsFile 文件总大小 + - TsFile Count Per Level: 节点管理的各级别 TsFile 文件数量 + - Average TsFile Size Per Level: 节点管理的各级别 TsFile 文件的平均大小 +- Total Tasks: 节点的 Task 数量 +- Task Latency: 节点的 Task 的耗时 - Compaction - - Compaction Read And Write Per Second: 节点的每秒钟合并读写速度 - - Compaction Number Per Minute: 节点的每分钟合并数量 - - Compaction Process Chunk Status: 节点合并不同状态的 Chunk 的数量 - - Compacted Point Num Per Minute: 节点每分钟合并的点数 + - Compaction Read/Write Throughput: 节点的每秒钟合并读写速度 + - Compactions Per Minute: 节点的每分钟合并数量 + - Compaction Chunk Status: 节点合并不同状态的 Chunk 的数量 + - Compacted-Points Per Minute: 节点每分钟合并的点数 #### Write Performance -- Write Cost(avg): 节点写入耗时平均值,包括写入 wal 和 memtable -- Write Cost(50%): 节点写入耗时中位数,包括写入 wal 和 memtable -- Write Cost(99%): 节点写入耗时的P99,包括写入 wal 和 memtable +- Average Write Latency: 节点写入耗时平均值,包括写入 wal 和 memtable +- P50 Write Latency: 节点写入耗时中位数,包括写入 wal 和 memtable +- P99 Write Latency: 节点写入耗时的P99,包括写入 wal 和 memtable - WAL - WAL File Size: 节点管理的 WAL 文件总大小 - - WAL File Num: 节点管理的 WAL 文件数量 - - WAL Nodes Num: 节点管理的 WAL Node 数量 - - Make Checkpoint Costs: 节点创建各类型的 CheckPoint 的耗时 - - WAL Serialize Total Cost: 节点 WAL 序列化总耗时 + - WAL Files: 节点管理的 WAL 文件数量 + - WAL Nodes: 节点管理的 WAL Node 数量 + - Checkpoint Creation Time: 节点创建各类型的 CheckPoint 的耗时 + - WAL Serialization Time (Total): 节点 WAL 序列化总耗时 - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 
触发 oldest MemTable snapshot 时 MemTable 大小 - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 - - Effective Info Ratio Of WALNode: 节点的不同 WALNode 的有效信息比 + - WALNode Effective Info Ratio: 节点的不同 WALNode 的有效信息比 - WAL Buffer - - WAL Buffer Cost: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 + - WAL Buffer Latency: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 - WAL Buffer Entries Count: 节点的 WAL Buffer 的条目数量 - Flush统计 - - Flush MemTable Cost(avg): 节点 Flush 的总耗时和各个子阶段耗时的平均值 - - Flush MemTable Cost(50%): 节点 Flush 的总耗时和各个子阶段耗时的中位数 - - Flush MemTable Cost(99%): 节点 Flush 的总耗时和各个子阶段耗时的 P99 - - Flush Sub Task Cost(avg): 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(50%): 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 - - Flush Sub Task Cost(99%): 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 + - Average Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的平均值 + - P50 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的中位数 + - P99 Flush Latency: 节点 Flush 的总耗时和各个子阶段耗时的 P99 + - Average Flush Subtask Latency: 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 + - P50 Flush Subtask Latency: 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 + - P99 Flush Subtask Latency: 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 - Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 - Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 -- Tsfile Compression Ratio Of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 -- Flush TsFile Size Of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 -- Size Of Flushing MemTable: 节点刷盘的 Memtable 的大小 -- Points Num Of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 -- Series Num Of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 -- Average Point Num Of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 +- Tsfile Compression Ratio of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 +- Flush TsFile Size of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 +- Size of Flushing MemTable: 节点刷盘的 Memtable 的大小 +- Points Num of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 +- Series Num of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 +- Average Point Num of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 #### Schema Engine @@ -525,117 +525,117 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 #### Query Engine - 各阶段耗时 - - The time consumed of query plan stages(avg): 节点查询各阶段耗时的平均值 - - The time consumed of query plan stages(50%): 节点查询各阶段耗时的中位数 - - The time consumed of query plan stages(99%): 节点查询各阶段耗时的P99 + - Average Query Plan Execution Time: 节点查询各阶段耗时的平均值 + - P50 Query Plan Execution Time: 节点查询各阶段耗时的中位数 + - P99 Query Plan Execution Time: 节点查询各阶段耗时的P99 - 执行计划分发耗时 - - The time consumed of plan dispatch stages(avg): 节点查询执行计划分发耗时的平均值 - - The time consumed of plan dispatch stages(50%): 节点查询执行计划分发耗时的中位数 - - The time consumed of plan dispatch stages(99%): 节点查询执行计划分发耗时的P99 + - Average Query Plan Dispatch Time: 节点查询执行计划分发耗时的平均值 + - P50 Query Plan Dispatch Time: 节点查询执行计划分发耗时的中位数 + - P99 Query Plan Dispatch Time: 节点查询执行计划分发耗时的P99 - 执行计划执行耗时 - - The time consumed of query execution stages(avg): 节点查询执行计划执行耗时的平均值 - - The time consumed of query execution stages(50%): 节点查询执行计划执行耗时的中位数 - - The time consumed of query execution stages(99%): 节点查询执行计划执行耗时的P99 + - Average Query Execution Time: 节点查询执行计划执行耗时的平均值 + - P50 Query Execution Time: 节点查询执行计划执行耗时的中位数 + - P99 Query Execution Time: 节点查询执行计划执行耗时的P99 - 算子执行耗时 - - The time consumed of operator execution stages(avg): 节点查询算子执行耗时的平均值 - - The time consumed of operator execution(50%): 节点查询算子执行耗时的中位数 - - The time consumed of operator execution(99%): 节点查询算子执行耗时的P99 + - Average Query Operator Execution Time: 节点查询算子执行耗时的平均值 + - 
P50 Query Operator Execution Time: 节点查询算子执行耗时的中位数 + - P99 Query Operator Execution Time: 节点查询算子执行耗时的P99 - 聚合查询计算耗时 - - The time consumed of query aggregation(avg): 节点聚合查询计算耗时的平均值 - - The time consumed of query aggregation(50%): 节点聚合查询计算耗时的中位数 - - The time consumed of query aggregation(99%): 节点聚合查询计算耗时的P99 + - Average Query Aggregation Execution Time: 节点聚合查询计算耗时的平均值 + - P50 Query Aggregation Execution Time: 节点聚合查询计算耗时的中位数 + - P99 Query Aggregation Execution Time: 节点聚合查询计算耗时的P99 - 文件/内存接口耗时 - - The time consumed of query scan(avg): 节点查询文件/内存接口耗时的平均值 - - The time consumed of query scan(50%): 节点查询文件/内存接口耗时的中位数 - - The time consumed of query scan(99%): 节点查询文件/内存接口耗时的P99 + - Average Query Scan Execution Time: 节点查询文件/内存接口耗时的平均值 + - P50 Query Scan Execution Time: 节点查询文件/内存接口耗时的中位数 + - P99 Query Scan Execution Time: 节点查询文件/内存接口耗时的P99 - 资源访问数量 - - The usage of query resource(avg): 节点查询资源访问数量的平均值 - - The usage of query resource(50%): 节点查询资源访问数量的中位数 - - The usage of query resource(99%): 节点查询资源访问数量的P99 + - Average Query Resource Utilization: 节点查询资源访问数量的平均值 + - P50 Query Resource Utilization: 节点查询资源访问数量的中位数 + - P99 Query Resource Utilization: 节点查询资源访问数量的P99 - 数据传输耗时 - - The time consumed of query data exchange(avg): 节点查询数据传输耗时的平均值 - - The time consumed of query data exchange(50%): 节点查询数据传输耗时的中位数 - - The time consumed of query data exchange(99%): 节点查询数据传输耗时的P99 + - Average Query Data Exchange Latency: 节点查询数据传输耗时的平均值 + - P50 Query Data Exchange Latency: 节点查询数据传输耗时的中位数 + - P99 Query Data Exchange Latency: 节点查询数据传输耗时的P99 - 数据传输数量 - - The count of Data Exchange(avg): 节点查询的数据传输数量的平均值 - - The count of Data Exchange: 节点查询的数据传输数量的分位数,包括中位数和P99 + - Average Query Data Exchange Count: 节点查询的数据传输数量的平均值 + - Query Data Exchange Count: 节点查询的数据传输数量的分位数,包括中位数和P99 - 任务调度数量与耗时 - - The number of query queue: 节点查询任务调度数量 - - The time consumed of query schedule time(avg): 节点查询任务调度耗时的平均值 - - The time consumed of query schedule time(50%): 节点查询任务调度耗时的中位数 - - The time consumed of query schedule time(99%): 节点查询任务调度耗时的P99 + - Query Queue Length: 节点查询任务调度数量 + - Average Query Scheduling Latency: 节点查询任务调度耗时的平均值 + - P50 Query Scheduling Latency: 节点查询任务调度耗时的中位数 + - P99 Query Scheduling Latency: 节点查询任务调度耗时的P99 #### Query Interface - 加载时间序列元数据 - - The time consumed of load timeseries metadata(avg): 节点查询加载时间序列元数据耗时的平均值 - - The time consumed of load timeseries metadata(50%): 节点查询加载时间序列元数据耗时的中位数 - - The time consumed of load timeseries metadata(99%): 节点查询加载时间序列元数据耗时的P99 + - Average Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的平均值 + - P50 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的中位数 + - P99 Timeseries Metadata Load Time: 节点查询加载时间序列元数据耗时的P99 - 读取时间序列 - - The time consumed of read timeseries metadata(avg): 节点查询读取时间序列耗时的平均值 - - The time consumed of read timeseries metadata(50%): 节点查询读取时间序列耗时的中位数 - - The time consumed of read timeseries metadata(99%): 节点查询读取时间序列耗时的P99 + - Average Timeseries Metadata Read Time: 节点查询读取时间序列耗时的平均值 + - P50 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的中位数 + - P99 Timeseries Metadata Read Time: 节点查询读取时间序列耗时的P99 - 修改时间序列元数据 - - The time consumed of timeseries metadata modification(avg): 节点查询修改时间序列元数据耗时的平均值 - - The time consumed of timeseries metadata modification(50%): 节点查询修改时间序列元数据耗时的中位数 - - The time consumed of timeseries metadata modification(99%): 节点查询修改时间序列元数据耗时的P99 + - Average Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的平均值 + - P50 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的中位数 + - P99 Timeseries Metadata Modification Time: 节点查询修改时间序列元数据耗时的P99 - 加载Chunk元数据列表 - - The time consumed of load 
chunk metadata list(avg): 节点查询加载Chunk元数据列表耗时的平均值 - - The time consumed of load chunk metadata list(50%): 节点查询加载Chunk元数据列表耗时的中位数 - - The time consumed of load chunk metadata list(99%): 节点查询加载Chunk元数据列表耗时的P99 + - Average Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的平均值 + - P50 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的中位数 + - P99 Chunk Metadata List Load Time: 节点查询加载Chunk元数据列表耗时的P99 - 修改Chunk元数据 - - The time consumed of chunk metadata modification(avg): 节点查询修改Chunk元数据耗时的平均值 - - The time consumed of chunk metadata modification(50%): 节点查询修改Chunk元数据耗时的总位数 - - The time consumed of chunk metadata modification(99%): 节点查询修改Chunk元数据耗时的P99 + - Average Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的平均值 + - P50 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的中位数 + - P99 Chunk Metadata Modification Time: 节点查询修改Chunk元数据耗时的P99 - 按照Chunk元数据过滤 - - The time consumed of chunk metadata filter(avg): 节点查询按照Chunk元数据过滤耗时的平均值 - - The time consumed of chunk metadata filter(50%): 节点查询按照Chunk元数据过滤耗时的中位数 - - The time consumed of chunk metadata filter(99%): 节点查询按照Chunk元数据过滤耗时的P99 + - Average Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的平均值 + - P50 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的中位数 + - P99 Chunk Metadata Filtering Time: 节点查询按照Chunk元数据过滤耗时的P99 - 构造Chunk Reader - - The time consumed of construct chunk reader(avg): 节点查询构造Chunk Reader耗时的平均值 - - The time consumed of construct chunk reader(50%): 节点查询构造Chunk Reader耗时的中位数 - - The time consumed of construct chunk reader(99%): 节点查询构造Chunk Reader耗时的P99 + - Average Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的平均值 + - P50 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的中位数 + - P99 Chunk Reader Construction Time: 节点查询构造Chunk Reader耗时的P99 - 读取Chunk - - The time consumed of read chunk(avg): 节点查询读取Chunk耗时的平均值 - - The time consumed of read chunk(50%): 节点查询读取Chunk耗时的中位数 - - The time consumed of read chunk(99%): 节点查询读取Chunk耗时的P99 + - Average Chunk Read Time: 节点查询读取Chunk耗时的平均值 + - P50 Chunk Read Time: 节点查询读取Chunk耗时的中位数 + - P99 Chunk Read Time: 节点查询读取Chunk耗时的P99 - 初始化Chunk Reader - - The time consumed of init chunk reader(avg): 节点查询初始化Chunk Reader耗时的平均值 - - The time consumed of init chunk reader(50%): 节点查询初始化Chunk Reader耗时的中位数 - - The time consumed of init chunk reader(99%): 节点查询初始化Chunk Reader耗时的P99 + - Average Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的平均值 + - P50 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的中位数 + - P99 Chunk Reader Initialization Time: 节点查询初始化Chunk Reader耗时的P99 - 通过 Page Reader 构造 TsBlock - - The time consumed of build tsblock from page reader(avg): 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from page reader(50%): 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from page reader(99%): 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 + - Average TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 + - P50 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 + - P99 TsBlock Construction Time from Page Reader: 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 - 查询通过 Merge Reader 构造 TsBlock - - The time consumed of build tsblock from merge reader(avg): 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 - - The time consumed of build tsblock from merge reader(50%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 - - The time consumed of build tsblock from merge reader(99%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 + - Average TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 + - P50 TsBlock 
Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 + - P99 TsBlock Construction Time from Merge Reader: 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 #### Query Data Exchange 查询的数据交换耗时。 - 通过 source handle 获取 TsBlock - - The time consumed of source handle get tsblock(avg): 节点查询通过 source handle 获取 TsBlock 耗时的平均值 - - The time consumed of source handle get tsblock(50%): 节点查询通过 source handle 获取 TsBlock 耗时的中位数 - - The time consumed of source handle get tsblock(99%): 节点查询通过 source handle 获取 TsBlock 耗时的P99 + - Average Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的平均值 + - P50 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的中位数 + - P99 Source Handle TsBlock Retrieval Time: 节点查询通过 source handle 获取 TsBlock 耗时的P99 - 通过 source handle 反序列化 TsBlock - - The time consumed of source handle deserialize tsblock(avg): 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 - - The time consumed of source handle deserialize tsblock(50%): 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 - - The time consumed of source handle deserialize tsblock(99%): 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 + - Average Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 + - P50 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 + - P99 Source Handle TsBlock Deserialization Time: 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 - 通过 sink handle 发送 TsBlock - - The time consumed of sink handle send tsblock(avg): 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 - - The time consumed of sink handle send tsblock(50%): 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 - - The time consumed of sink handle send tsblock(99%): 节点查询通过 sink handle 发送 TsBlock 耗时的P99 + - Average Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 + - P50 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 + - P99 Sink Handle TsBlock Transmission Time: 节点查询通过 sink handle 发送 TsBlock 耗时的P99 - 回调 data block event - - The time consumed of on acknowledge data block event task(avg): 节点查询回调 data block event 耗时的平均值 - - The time consumed of on acknowledge data block event task(50%): 节点查询回调 data block event 耗时的中位数 - - The time consumed of on acknowledge data block event task(99%): 节点查询回调 data block event 耗时的P99 + - Average Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的平均值 + - P50 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的中位数 + - P99 Data Block Event Acknowledgment Time: 节点查询回调 data block event 耗时的P99 - 获取 data block task - - The time consumed of get data block task(avg): 节点查询获取 data block task 耗时的平均值 - - The time consumed of get data block task(50%): 节点查询获取 data block task 耗时的中位数 - - The time consumed of get data block task(99%): 节点查询获取 data block task 耗时的 P99 + - Average Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的平均值 + - P50 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的中位数 + - P99 Data Block Task Retrieval Time: 节点查询获取 data block task 耗时的 P99 #### Query Related Resource @@ -645,40 +645,40 @@ eno 指的是到公网的网卡,lo 是虚拟网卡。 - Coordinator: 节点上记录的查询数量 - MemoryPool Size: 节点查询相关的内存池情况 - MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 -- DriverScheduler: 节点查询相关的队列任务数量 +- DriverScheduler Count: 节点查询相关的队列任务数量 #### Consensus - IoT Consensus - 内存使用 - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 - 节点间同步情况 - - IoTConsensus Sync Index: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 + - IoTConsensus Sync Index Size: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 
大小 - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 - - IoTConsensus Search Index Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 - - IoTConsensus Safe Index Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 + - IoTConsensus Search Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 + - IoTConsensus Safe Index Growth Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 - - Sync Speed Diff Of Peers: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 + - Peer Sync Speed Difference: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 - 不同执行阶段耗时 - - The Time Consumed Of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 - - The Time Consumed Of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 - - The Time Consumed Of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 + - The Time Consumed of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 + - The Time Consumed of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 + - The Time Consumed of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 #### Consensus - DataRegion Ratis Consensus -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 不同阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的 QPS -- RatisConsensus Memory: 节点 Ratis 的内存使用情况 +- Ratis Consensus Stage Latency: 节点 Ratis 不同阶段的耗时 +- Ratis Log Write Latency: 节点 Ratis 写 Log 不同阶段的耗时 +- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的 QPS +- RatisConsensus Memory Usage: 节点 Ratis 的内存使用情况 #### Consensus - SchemaRegion Ratis Consensus -- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 -- Write Log Entry: 节点 Ratis 写 Log 各阶段的耗时 -- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 -- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的QPS -- RatisConsensus Memory: 节点 Ratis 内存使用情况 \ No newline at end of file +- RatisConsensus Stage Latency: 节点 Ratis 不同阶段的耗时 +- Ratis Log Write Latency: 节点 Ratis 写 Log 各阶段的耗时 +- Remote / Local Write Latency: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write Throughput (QPS): 节点 Ratis 在本地或者远端写入的QPS +- RatisConsensus Memory Usage: 节点 Ratis 内存使用情况 \ No newline at end of file
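
All of the renamed panels above read their data from the Prometheus instance that was configured as the Grafana data source, so a quick way to confirm that a panel still has series behind it after a title change is to query Prometheus directly. The snippet below is a minimal sketch only: the Prometheus address and the `up{job="iotdb"}` selector are assumptions, and should be replaced with the address and the actual IoTDB metric name used in your deployment.

```python
# Minimal sketch: run an instant query against the Prometheus HTTP API to verify
# that a metric backing one of the dashboard panels currently returns data.
# Assumptions: Prometheus listens on 127.0.0.1:9090 and scrapes IoTDB under job="iotdb".
import requests

PROMETHEUS = "http://127.0.0.1:9090"   # assumed Prometheus address (the Grafana data source)
QUERY = 'up{job="iotdb"}'              # placeholder selector; substitute a real IoTDB metric name

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for sample in resp.json().get("data", {}).get("result", []):
    # Each instant-query sample carries its label set and a [timestamp, value] pair.
    print(sample["metric"], sample["value"])
```

If such a query returns an empty result, the corresponding Grafana panel will also be blank, which usually points at the scrape job or the metric reporter configuration rather than at the dashboard itself.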