diff --git a/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md b/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md index 98f992ba0..cd20b3671 100644 --- a/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md +++ b/src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md @@ -154,30 +154,31 @@ SELECT LEAST(temperature,humidity) FROM table2; 2. Except for `COUNT()`, all other aggregate functions ignore null values and return null when there are no input rows or all values are null. For example, `SUM()` returns null instead of zero, and `AVG()` does not include null values in the count. -### 2.2 Supported Aggregate Functions - -| Function Name | Description | Allowed Input Types | Output Type | -|:--------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------|:-------------------------------------------| -| COUNT | Counts the number of data points. | All types | INT64 | -| COUNT_IF | COUNT_IF(exp) counts the number of rows that satisfy a specified boolean expression. | `exp` must be a boolean expression,(e.g. `count_if(temperature>20)`) | INT64 | -| SUM | Calculates the sum. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| AVG | Calculates the average. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| MAX | Finds the maximum value. | All types | Same as input type | -| MIN | Finds the minimum value. | All types | Same as input type | -| FIRST | Finds the value with the smallest timestamp that is not NULL. | All types | Same as input type | -| LAST | Finds the value with the largest timestamp that is not NULL. 
| All types | Same as input type | -| STDDEV | Alias for STDDEV_SAMP, calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| STDDEV_POP | Calculates the population standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| STDDEV_SAMP | Calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| VARIANCE | Alias for VAR_SAMP, calculates the sample variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| VAR_POP | Calculates the population variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| VAR_SAMP | Calculates the sample variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| EXTREME | Finds the value with the largest absolute value. If the largest absolute values of positive and negative values are equal, returns the positive value. | INT32 INT64 FLOAT DOUBLE | Same as input type | -| MODE | Finds the mode. Note: 1. There is a risk of memory exception when the number of distinct values in the input sequence is too large; 2. If all elements have the same frequency, i.e., there is no mode, a random element is returned; 3. If there are multiple modes, a random mode is returned; 4. NULL values are also counted in frequency, so even if not all values in the input sequence are NULL, the final result may still be NULL. | All types | Same as input type | -| MAX_BY | MAX_BY(x, y) finds the value of x corresponding to the maximum y in the binary input x and y. MAX_BY(time, x) returns the timestamp when x is at its maximum. | x and y can be of any type | Same as the data type of the first input x | -| MIN_BY | MIN_BY(x, y) finds the value of x corresponding to the minimum y in the binary input x and y. MIN_BY(time, x) returns the timestamp when x is at its minimum. | x and y can be of any type | Same as the data type of the first input x | -| FIRST_BY | FIRST_BY(x, y) finds the value of x in the same row when y is the first non-null value. 
| x and y can be of any type | Same as the data type of the first input x | -| LAST_BY | LAST_BY(x, y) finds the value of x in the same row when y is the last non-null value. | x and y can be of any type | Same as the data type of the first input x | +### 2.2 Supported Aggregate Functions + +| Function Name | Description | Allowed Input Types | Output Type | +|:-----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------| +| COUNT | Counts the number of data points. | All types | INT64 | +| COUNT_IF | COUNT_IF(exp) counts the number of rows that satisfy a specified boolean expression. | `exp` must be a boolean expression,(e.g. `count_if(temperature>20)`) | INT64 | +| APPROX_COUNT_DISTINCT | The APPROX_COUNT_DISTINCT(x[, maxStandardError]) function provides an approximation of COUNT(DISTINCT x), returning the estimated number of distinct input values. | `x`: The target column to be calculated, supports all data types.
`maxStandardError` (optional): Specifies the maximum standard error allowed for the function's result. Valid range is [0.0040625, 0.26]. Defaults to 0.023 if not specified. | INT64 | +| SUM | Calculates the sum. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| AVG | Calculates the average. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| MAX | Finds the maximum value. | All types | Same as input type | +| MIN | Finds the minimum value. | All types | Same as input type | +| FIRST | Finds the value with the smallest timestamp that is not NULL. | All types | Same as input type | +| LAST | Finds the value with the largest timestamp that is not NULL. | All types | Same as input type | +| STDDEV | Alias for STDDEV_SAMP, calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| STDDEV_POP | Calculates the population standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| STDDEV_SAMP | Calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| VARIANCE | Alias for VAR_SAMP, calculates the sample variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| VAR_POP | Calculates the population variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| VAR_SAMP | Calculates the sample variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| EXTREME | Finds the value with the largest absolute value. If the largest absolute values of positive and negative values are equal, returns the positive value. | INT32 INT64 FLOAT DOUBLE | Same as input type | +| MODE | Finds the mode. Note: 1. There is a risk of memory exception when the number of distinct values in the input sequence is too large; 2. If all elements have the same frequency, i.e., there is no mode, a random element is returned; 3. If there are multiple modes, a random mode is returned; 4. NULL values are also counted in frequency, so even if not all values in the input sequence are NULL, the final result may still be NULL. 
| All types | Same as input type | +| MAX_BY | MAX_BY(x, y) finds the value of x corresponding to the maximum y in the binary input x and y. MAX_BY(time, x) returns the timestamp when x is at its maximum. | x and y can be of any type | Same as the data type of the first input x | +| MIN_BY | MIN_BY(x, y) finds the value of x corresponding to the minimum y in the binary input x and y. MIN_BY(time, x) returns the timestamp when x is at its minimum. | x and y can be of any type | Same as the data type of the first input x | +| FIRST_BY | FIRST_BY(x, y) finds the value of x in the same row when y is the first non-null value. | x and y can be of any type | Same as the data type of the first input x | +| LAST_BY | LAST_BY(x, y) finds the value of x in the same row when y is the last non-null value. | x and y can be of any type | Same as the data type of the first input x | ### 2.3 Examples @@ -229,8 +230,29 @@ Total line number = 1 It costs 0.047s ``` +#### 2.3.4 Approx_count_distinct -#### 2.3.4 First +Retrieve the number of distinct values in the `temperature` column from `table1`. + +```sql +IoTDB> SELECT COUNT(DISTINCT temperature) as origin, APPROX_COUNT_DISTINCT(temperature) as approx FROM table1; +IoTDB> SELECT COUNT(DISTINCT temperature) as origin, APPROX_COUNT_DISTINCT(temperature,0.006) as approx FROM table1; +``` + +The execution result is as follows: + +```sql ++------+------+ +|origin|approx| ++------+------+ +| 3| 3| ++------+------+ +Total line number = 1 +It costs 0.022s +``` + + +#### 2.3.5 First Finds the values with the smallest timestamp that are not NULL in the `temperature` and `humidity` columns. @@ -250,7 +272,7 @@ Total line number = 1 It costs 0.170s ``` -#### 2.3.5 Last +#### 2.3.6 Last Finds the values with the largest timestamp that are not NULL in the `temperature` and `humidity` columns. 
@@ -270,7 +292,7 @@ Total line number = 1
 It costs 0.211s
 ```
 
-#### 2.3.6 First_by
+#### 2.3.7 First_by
 
 Finds the `time` value of the row with the smallest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the smallest timestamp that is not NULL in the `temperature` column.
 
@@ -290,7 +312,7 @@ Total line number = 1
 It costs 0.269s
 ```
 
-#### 2.3.7 Last_by
+#### 2.3.8 Last_by
 
 Queries the `time` value of the row with the largest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the largest timestamp that is not NULL in the `temperature` column.
 
@@ -310,7 +332,7 @@ Total line number = 1
 It costs 0.070s
 ```
 
-#### 2.3.8 Max_by
+#### 2.3.9 Max_by
 
 Queries the `time` value of the row where the `temperature` column is at its maximum, and the `humidity` value of the row where the `temperature` column is at its maximum.
 
@@ -330,7 +352,7 @@ Total line number = 1
 It costs 0.172s
 ```
 
-#### 2.3.9 Min_by
+#### 2.3.10 Min_by
 
 Queries the `time` value of the row where the `temperature` column is at its minimum, and the `humidity` value of the row where the `temperature` column is at its minimum.
 
@@ -395,7 +417,7 @@ NULL OR true -- true
 
 ##### 3.2.2.1 Truth Table
 
-The following truth table illustrates how `NULL` is handled in `AND` and `OR` operators:
+The following truth table illustrates how `NULL` is handled in `AND` and `OR` operators:
 
 | a | b | a AND b | a OR b |
 | :---- | :---- | :------ | :----- |
@@ -469,7 +491,7 @@ date_bin(interval,source,origin)
 4. If `source` is `null`, the function returns `null`.
 5. Mixing months and non-month time units (e.g., `1 MONTH 1 DAY`) is not supported due to ambiguity.
 
-> For example, if the starting point is **April 30, 2000**, calculating `1 DAY` first and then `1 MONTH` results in **June 1, 2000**, whereas calculating `1 MONTH` first and then `1 DAY` results in **May 31, 2000**. The resulting dates are different.
+> For example, if the starting point is **April 30, 2000**, calculating `1 DAY` first and then `1 MONTH` results in **June 1, 2000**, whereas calculating `1 MONTH` first and then `1 DAY` results in **May 31, 2000**. The resulting dates are different.
 
 #### 4.2.2 Examples
 
@@ -980,10 +1002,10 @@ Msg: org.apache.iotdb.jdbc.IoTDBSQLException: 701: Invalid format string: %.5f (
 ```
 
 3. Invalid Invocation Errors
-Triggered if:
+ Triggered if:
 
-* Total arguments < 2 (must include `pattern` and at least one argument).•
-* `pattern` is not of type `STRING`/`TEXT`.
+ * Total arguments < 2 (must include `pattern` and at least one argument).
+ * `pattern` is not of type `STRING`/`TEXT`.
 
 ```SQL
 -- Example 1
@@ -1006,7 +1028,7 @@ The `||` operator is used for string concatenation and functions the same as the
 
 #### 8.1.2 LIKE Statement
 
-The `LIKE` statement is used for pattern matching. For detailed usage, refer to Pattern Matching:[LIKE](#1-like-运算符).
+ The `LIKE` statement is used for pattern matching. For detailed usage, refer to Pattern Matching: [LIKE](#1-like-运算符).
### 8.2 String Functions diff --git a/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md b/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md index 5c97d7435..cd20b3671 100644 --- a/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md +++ b/src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md @@ -156,28 +156,29 @@ SELECT LEAST(temperature,humidity) FROM table2; ### 2.2 Supported Aggregate Functions -| Function Name | Description | Allowed Input Types | Output Type | -|:--------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------|:-------------------------------------------| -| COUNT | Counts the number of data points. | All types | INT64 | -| COUNT_IF | COUNT_IF(exp) counts the number of rows that satisfy a specified boolean expression. | `exp` must be a boolean expression,(e.g. `count_if(temperature>20)`) | INT64 | -| SUM | Calculates the sum. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| AVG | Calculates the average. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| MAX | Finds the maximum value. | All types | Same as input type | -| MIN | Finds the minimum value. | All types | Same as input type | -| FIRST | Finds the value with the smallest timestamp that is not NULL. | All types | Same as input type | -| LAST | Finds the value with the largest timestamp that is not NULL. | All types | Same as input type | -| STDDEV | Alias for STDDEV_SAMP, calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| STDDEV_POP | Calculates the population standard deviation. 
| INT32 INT64 FLOAT DOUBLE | DOUBLE | -| STDDEV_SAMP | Calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| VARIANCE | Alias for VAR_SAMP, calculates the sample variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| VAR_POP | Calculates the population variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| VAR_SAMP | Calculates the sample variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | -| EXTREME | Finds the value with the largest absolute value. If the largest absolute values of positive and negative values are equal, returns the positive value. | INT32 INT64 FLOAT DOUBLE | Same as input type | -| MODE | Finds the mode. Note: 1. There is a risk of memory exception when the number of distinct values in the input sequence is too large; 2. If all elements have the same frequency, i.e., there is no mode, a random element is returned; 3. If there are multiple modes, a random mode is returned; 4. NULL values are also counted in frequency, so even if not all values in the input sequence are NULL, the final result may still be NULL. | All types | Same as input type | -| MAX_BY | MAX_BY(x, y) finds the value of x corresponding to the maximum y in the binary input x and y. MAX_BY(time, x) returns the timestamp when x is at its maximum. | x and y can be of any type | Same as the data type of the first input x | -| MIN_BY | MIN_BY(x, y) finds the value of x corresponding to the minimum y in the binary input x and y. MIN_BY(time, x) returns the timestamp when x is at its minimum. | x and y can be of any type | Same as the data type of the first input x | -| FIRST_BY | FIRST_BY(x, y) finds the value of x in the same row when y is the first non-null value. | x and y can be of any type | Same as the data type of the first input x | -| LAST_BY | LAST_BY(x, y) finds the value of x in the same row when y is the last non-null value. 
| x and y can be of any type | Same as the data type of the first input x | +| Function Name | Description | Allowed Input Types | Output Type | +|:-----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------| +| COUNT | Counts the number of data points. | All types | INT64 | +| COUNT_IF | COUNT_IF(exp) counts the number of rows that satisfy a specified boolean expression. | `exp` must be a boolean expression,(e.g. `count_if(temperature>20)`) | INT64 | +| APPROX_COUNT_DISTINCT | The APPROX_COUNT_DISTINCT(x[, maxStandardError]) function provides an approximation of COUNT(DISTINCT x), returning the estimated number of distinct input values. | `x`: The target column to be calculated, supports all data types.
`maxStandardError` (optional): Specifies the maximum standard error allowed for the function's result. Valid range is [0.0040625, 0.26]. Defaults to 0.023 if not specified. | INT64 | +| SUM | Calculates the sum. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| AVG | Calculates the average. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| MAX | Finds the maximum value. | All types | Same as input type | +| MIN | Finds the minimum value. | All types | Same as input type | +| FIRST | Finds the value with the smallest timestamp that is not NULL. | All types | Same as input type | +| LAST | Finds the value with the largest timestamp that is not NULL. | All types | Same as input type | +| STDDEV | Alias for STDDEV_SAMP, calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| STDDEV_POP | Calculates the population standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| STDDEV_SAMP | Calculates the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| VARIANCE | Alias for VAR_SAMP, calculates the sample variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| VAR_POP | Calculates the population variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| VAR_SAMP | Calculates the sample variance. | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| EXTREME | Finds the value with the largest absolute value. If the largest absolute values of positive and negative values are equal, returns the positive value. | INT32 INT64 FLOAT DOUBLE | Same as input type | +| MODE | Finds the mode. Note: 1. There is a risk of memory exception when the number of distinct values in the input sequence is too large; 2. If all elements have the same frequency, i.e., there is no mode, a random element is returned; 3. If there are multiple modes, a random mode is returned; 4. NULL values are also counted in frequency, so even if not all values in the input sequence are NULL, the final result may still be NULL. 
| All types | Same as input type | +| MAX_BY | MAX_BY(x, y) finds the value of x corresponding to the maximum y in the binary input x and y. MAX_BY(time, x) returns the timestamp when x is at its maximum. | x and y can be of any type | Same as the data type of the first input x | +| MIN_BY | MIN_BY(x, y) finds the value of x corresponding to the minimum y in the binary input x and y. MIN_BY(time, x) returns the timestamp when x is at its minimum. | x and y can be of any type | Same as the data type of the first input x | +| FIRST_BY | FIRST_BY(x, y) finds the value of x in the same row when y is the first non-null value. | x and y can be of any type | Same as the data type of the first input x | +| LAST_BY | LAST_BY(x, y) finds the value of x in the same row when y is the last non-null value. | x and y can be of any type | Same as the data type of the first input x | ### 2.3 Examples @@ -229,8 +230,29 @@ Total line number = 1 It costs 0.047s ``` +#### 2.3.4 Approx_count_distinct -#### 2.3.4 First +Retrieve the number of distinct values in the `temperature` column from `table1`. + +```sql +IoTDB> SELECT COUNT(DISTINCT temperature) as origin, APPROX_COUNT_DISTINCT(temperature) as approx FROM table1; +IoTDB> SELECT COUNT(DISTINCT temperature) as origin, APPROX_COUNT_DISTINCT(temperature,0.006) as approx FROM table1; +``` + +The execution result is as follows: + +```sql ++------+------+ +|origin|approx| ++------+------+ +| 3| 3| ++------+------+ +Total line number = 1 +It costs 0.022s +``` + + +#### 2.3.5 First Finds the values with the smallest timestamp that are not NULL in the `temperature` and `humidity` columns. @@ -250,7 +272,7 @@ Total line number = 1 It costs 0.170s ``` -#### 2.3.5 Last +#### 2.3.6 Last Finds the values with the largest timestamp that are not NULL in the `temperature` and `humidity` columns. 
@@ -270,7 +292,7 @@ Total line number = 1
 It costs 0.211s
 ```
 
-#### 2.3.6 First_by
+#### 2.3.7 First_by
 
 Finds the `time` value of the row with the smallest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the smallest timestamp that is not NULL in the `temperature` column.
 
@@ -290,7 +312,7 @@ Total line number = 1
 It costs 0.269s
 ```
 
-#### 2.3.7 Last_by
+#### 2.3.8 Last_by
 
 Queries the `time` value of the row with the largest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the largest timestamp that is not NULL in the `temperature` column.
 
@@ -310,7 +332,7 @@ Total line number = 1
 It costs 0.070s
 ```
 
-#### 2.3.8 Max_by
+#### 2.3.9 Max_by
 
 Queries the `time` value of the row where the `temperature` column is at its maximum, and the `humidity` value of the row where the `temperature` column is at its maximum.
 
@@ -330,7 +352,7 @@ Total line number = 1
 It costs 0.172s
 ```
 
-#### 2.3.9 Min_by
+#### 2.3.10 Min_by
 
 Queries the `time` value of the row where the `temperature` column is at its minimum, and the `humidity` value of the row where the `temperature` column is at its minimum.
 
diff --git a/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md b/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
index 62ad12638..5b3bca340 100644
--- a/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
+++ b/src/zh/UserGuide/Master/Table/SQL-Manual/Basis-Function.md
@@ -144,7 +144,7 @@ SELECT GREATEST(temperature,humidity) FROM table2;
 -- 查询 table2 中 temperature 和 humidity 的最小记录
 SELECT LEAST(temperature,humidity) FROM table2;
 ```
-
+
 
 ## 2. 聚合函数
 
@@ -153,30 +153,31 @@ SELECT LEAST(temperature,humidity) FROM table2;
 1. 聚合函数是多对一函数。它们对一组值进行聚合计算,得到单个聚合结果。
 2. 除了 `COUNT()`之外,其他所有聚合函数都忽略空值,并在没有输入行或所有值为空时返回空值。 例如,`SUM()` 返回 null 而不是零,而 `AVG()` 在计数中不包括 null 值。
 
-### 2.2 支持的聚合函数
-
-| 函数名 | 功能描述 | 允许的输入类型 | 输出类型 |
-| ----------- | ------------------------------------------------------------ |-----------------------------------------------|------------------|
-| COUNT | 计算数据点数。 | 所有类型 | INT64 |
-| COUNT_IF | COUNT_IF(exp) 用于统计满足指定布尔表达式的记录行数 | exp 必须是一个布尔类型的表达式,例如 count_if(temperature>20) | INT64 |
-| SUM | 求和。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| AVG | 求平均值。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| MAX | 求最大值。 | 所有类型 | 与输入类型一致 |
-| MIN | 求最小值。 | 所有类型 | 与输入类型一致 |
-| FIRST | 求时间戳最小且不为 NULL 的值。 | 所有类型 | 与输入类型一致 |
-| LAST | 求时间戳最大且不为 NULL 的值。 | 所有类型 | 与输入类型一致 |
-| STDDEV | STDDEV_SAMP 的别名,求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| STDDEV_POP | 求总体标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| STDDEV_SAMP | 求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| VARIANCE | VAR_SAMP 的别名,求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| VAR_POP | 求总体方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| VAR_SAMP | 求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE | 与输入类型一致 |
-| MODE | 求众数。注意: 1.输入序列的不同值个数过多时会有内存异常风险; 2.如果所有元素出现的频次相同,即没有众数,则随机返回一个元素; 3.如果有多个众数,则随机返回一个众数; 4. NULL 值也会被统计频次,所以即使输入序列的值不全为 NULL,最终结果也可能为 NULL。 | 所有类型 | 与输入类型一致 |
-| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x 的值。MAX_BY(time, x) 返回 x 取最大值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
-| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x 的值。MIN_BY(time, x) 返回 x 取最小值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
-| FIRST_BY | FIRST_BY(x, y) 求当 y 为第一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
-| LAST_BY | LAST_BY(x, y) 求当 y 为最后一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+### 2.2 支持的聚合函数
+
+| 函数名 | 功能描述 | 允许的输入类型 | 输出类型 |
+|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|------------------|
+| COUNT | 计算数据点数。 | 所有类型 | INT64 |
+| COUNT_IF | COUNT_IF(exp) 用于统计满足指定布尔表达式的记录行数 | exp 必须是一个布尔类型的表达式,例如 count_if(temperature>20) | INT64 |
+| APPROX_COUNT_DISTINCT | APPROX_COUNT_DISTINCT(x[,maxStandardError]) 函数提供 COUNT(DISTINCT x) 的近似值,返回不同输入值的近似个数。 | x:待计算列,支持所有类型;<br> maxStandardError:指定该函数应产生的最大标准误差,取值范围[0.0040625, 0.26],未指定值时默认0.023。 | INT64 |
+| SUM | 求和。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| AVG | 求平均值。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| MAX | 求最大值。 | 所有类型 | 与输入类型一致 |
+| MIN | 求最小值。 | 所有类型 | 与输入类型一致 |
+| FIRST | 求时间戳最小且不为 NULL 的值。 | 所有类型 | 与输入类型一致 |
+| LAST | 求时间戳最大且不为 NULL 的值。 | 所有类型 | 与输入类型一致 |
+| STDDEV | STDDEV_SAMP 的别名,求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| STDDEV_POP | 求总体标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| STDDEV_SAMP | 求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| VARIANCE | VAR_SAMP 的别名,求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| VAR_POP | 求总体方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| VAR_SAMP | 求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE | 与输入类型一致 |
+| MODE | 求众数。注意: 1.输入序列的不同值个数过多时会有内存异常风险; 2.如果所有元素出现的频次相同,即没有众数,则随机返回一个元素; 3.如果有多个众数,则随机返回一个众数; 4. NULL 值也会被统计频次,所以即使输入序列的值不全为 NULL,最终结果也可能为 NULL。 | 所有类型 | 与输入类型一致 |
+| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x 的值。MAX_BY(time, x) 返回 x 取最大值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x 的值。MIN_BY(time, x) 返回 x 取最小值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+| FIRST_BY | FIRST_BY(x, y) 求当 y 为第一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+| LAST_BY | LAST_BY(x, y) 求当 y 为最后一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
 
 ### 2.3 示例
 
@@ -213,7 +214,7 @@ It costs 0.834s
 统计 `table2` 中 到达时间 `arrival_time` 不是 `null` 的记录行数。
 
 ```sql
-select count_if(arrival_time is not null) from table2;
+IoTDB> select count_if(arrival_time is not null) from table2;
 ```
 
 执行结果如下:
@@ -228,8 +229,29 @@ Total line number = 1
 It costs 0.047s
 ```
 
+#### 2.3.4 Approx_count_distinct
+
+查询 `table1` 中 `temperature` 列不同值的个数。
 
-#### 2.3.4 First
+```sql
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin, APPROX_COUNT_DISTINCT(temperature) as approx FROM table1;
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin, APPROX_COUNT_DISTINCT(temperature,0.006) as approx FROM table1;
+```
+
+执行结果如下:
+
+```sql
++------+------+
+|origin|approx|
++------+------+
+|     3|     3|
++------+------+
+Total line number = 1
+It costs 0.022s
+```
+
+
+#### 2.3.5 First
 
 查询`temperature`列、`humidity`列时间戳最小且不为 NULL 的值。
 
@@ -249,7 +271,7 @@ Total line number = 1
 It costs 0.170s
 ```
 
-#### 2.3.5 Last
+#### 2.3.6 Last
 
 查询`temperature`列、`humidity`列时间戳最大且不为 NULL 的值。
 
@@ -269,7 +291,7 @@ Total line number = 1
 It costs 0.211s
 ```
 
-#### 2.3.6 First_by
+#### 2.3.7 First_by
 
 查询 `temperature` 列中非 NULL 且时间戳最小的行的 `time` 值,以及 `temperature` 列中非 NULL 且时间戳最小的行的 `humidity` 值。
 
@@ -289,7 +311,7 @@ Total line number = 1
 It costs 0.269s
 ```
 
-#### 2.3.7 Last_by
+#### 2.3.8 Last_by
 
 查询`temperature` 列中非 NULL 且时间戳最大的行的 `time` 值,以及 `temperature` 列中非 NULL 且时间戳最大的行的 `humidity` 值。
 
@@ -309,7 +331,7 @@ Total line number = 1
 It costs 0.070s
 ```
 
-#### 2.3.8 Max_by
+#### 2.3.9 Max_by
 
 查询`temperature` 列中最大值所在行的 `time` 值,以及`temperature` 列中最大值所在行的 `humidity` 值。
 
@@ -329,7 +351,7 @@ Total line number = 1
 It costs 0.172s
 ```
 
-#### 2.3.9 Min_by
+#### 2.3.10 Min_by
 
 查询`temperature` 列中最小值所在行的 `time` 值,以及`temperature` 列中最小值所在行的 `humidity` 值。
 
@@ -350,7 +372,6 @@ It costs 0.244s
 ```
 
 
-
 ## 3. 逻辑运算符
 
 ### 3.1 概述
diff --git a/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md b/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
index 00e421f38..5b3bca340 100644
--- a/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
+++ b/src/zh/UserGuide/latest-Table/SQL-Manual/Basis-Function.md
@@ -155,28 +155,29 @@ SELECT LEAST(temperature,humidity) FROM table2;
 
 ### 2.2 支持的聚合函数
 
-| 函数名 | 功能描述 | 允许的输入类型 | 输出类型 |
-| ----------- | ------------------------------------------------------------ |-----------------------------------------------|------------------|
-| COUNT | 计算数据点数。 | 所有类型 | INT64 |
-| COUNT_IF | COUNT_IF(exp) 用于统计满足指定布尔表达式的记录行数 | exp 必须是一个布尔类型的表达式,例如 count_if(temperature>20) | INT64 |
-| SUM | 求和。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| AVG | 求平均值。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| MAX | 求最大值。 | 所有类型 | 与输入类型一致 |
-| MIN | 求最小值。 | 所有类型 | 与输入类型一致 |
-| FIRST | 求时间戳最小且不为 NULL 的值。 | 所有类型 | 与输入类型一致 |
-| LAST | 求时间戳最大且不为 NULL 的值。 | 所有类型 | 与输入类型一致 |
-| STDDEV | STDDEV_SAMP 的别名,求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| STDDEV_POP | 求总体标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| STDDEV_SAMP | 求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| VARIANCE | VAR_SAMP 的别名,求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| VAR_POP | 求总体方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| VAR_SAMP | 求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
-| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE | 与输入类型一致 |
-| MODE | 求众数。注意: 1.输入序列的不同值个数过多时会有内存异常风险; 2.如果所有元素出现的频次相同,即没有众数,则随机返回一个元素; 3.如果有多个众数,则随机返回一个众数; 4. NULL 值也会被统计频次,所以即使输入序列的值不全为 NULL,最终结果也可能为 NULL。 | 所有类型 | 与输入类型一致 |
-| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x 的值。MAX_BY(time, x) 返回 x 取最大值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
-| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x 的值。MIN_BY(time, x) 返回 x 取最小值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
-| FIRST_BY | FIRST_BY(x, y) 求当 y 为第一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
-| LAST_BY | LAST_BY(x, y) 求当 y 为最后一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+| 函数名 | 功能描述 | 允许的输入类型 | 输出类型 |
+|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|------------------|
+| COUNT | 计算数据点数。 | 所有类型 | INT64 |
+| COUNT_IF | COUNT_IF(exp) 用于统计满足指定布尔表达式的记录行数 | exp 必须是一个布尔类型的表达式,例如 count_if(temperature>20) | INT64 |
+| APPROX_COUNT_DISTINCT | APPROX_COUNT_DISTINCT(x[,maxStandardError]) 函数提供 COUNT(DISTINCT x) 的近似值,返回不同输入值的近似个数。 | x:待计算列,支持所有类型;<br> maxStandardError:指定该函数应产生的最大标准误差,取值范围[0.0040625, 0.26],未指定值时默认0.023。 | INT64 |
+| SUM | 求和。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| AVG | 求平均值。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| MAX | 求最大值。 | 所有类型 | 与输入类型一致 |
+| MIN | 求最小值。 | 所有类型 | 与输入类型一致 |
+| FIRST | 求时间戳最小且不为 NULL 的值。 | 所有类型 | 与输入类型一致 |
+| LAST | 求时间戳最大且不为 NULL 的值。 | 所有类型 | 与输入类型一致 |
+| STDDEV | STDDEV_SAMP 的别名,求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| STDDEV_POP | 求总体标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| STDDEV_SAMP | 求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| VARIANCE | VAR_SAMP 的别名,求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| VAR_POP | 求总体方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| VAR_SAMP | 求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE |
+| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE | 与输入类型一致 |
+| MODE | 求众数。注意: 1.输入序列的不同值个数过多时会有内存异常风险; 2.如果所有元素出现的频次相同,即没有众数,则随机返回一个元素; 3.如果有多个众数,则随机返回一个众数; 4. NULL 值也会被统计频次,所以即使输入序列的值不全为 NULL,最终结果也可能为 NULL。 | 所有类型 | 与输入类型一致 |
+| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x 的值。MAX_BY(time, x) 返回 x 取最大值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x 的值。MIN_BY(time, x) 返回 x 取最小值时对应的时间戳。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+| FIRST_BY | FIRST_BY(x, y) 求当 y 为第一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
+| LAST_BY | LAST_BY(x, y) 求当 y 为最后一个不为 NULL 的值时,同一行里对应的 x 值。 | x 和 y 可以是任意类型 | 与第一个输入 x 的数据类型一致 |
 
 ### 2.3 示例
 
@@ -213,7 +214,7 @@ It costs 0.834s
 统计 `table2` 中 到达时间 `arrival_time` 不是 `null` 的记录行数。
 
 ```sql
-select count_if(arrival_time is not null) from table2;
+IoTDB> select count_if(arrival_time is not null) from table2;
 ```
 
 执行结果如下:
@@ -228,8 +229,29 @@ Total line number = 1
 It costs 0.047s
 ```
 
+#### 2.3.4 Approx_count_distinct
 
-#### 2.3.4 First
+查询 `table1` 中 `temperature` 列不同值的个数。
+
+```sql
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin, APPROX_COUNT_DISTINCT(temperature) as approx FROM table1;
+IoTDB> SELECT COUNT(DISTINCT temperature) as origin, APPROX_COUNT_DISTINCT(temperature,0.006) as approx FROM table1;
+```
+
+执行结果如下:
+
+```sql
++------+------+
+|origin|approx|
++------+------+
+|     3|     3|
++------+------+
+Total line number = 1
+It costs 0.022s
+```
+
+
+#### 2.3.5 First
 
 查询`temperature`列、`humidity`列时间戳最小且不为 NULL 的值。
 
@@ -249,7 +271,7 @@ Total line number = 1
 It costs 0.170s
 ```
 
-#### 2.3.5 Last
+#### 2.3.6 Last
 
 查询`temperature`列、`humidity`列时间戳最大且不为 NULL 的值。
 
@@ -269,7 +291,7 @@ Total line number = 1
 It costs 0.211s
 ```
 
-#### 2.3.6 First_by
+#### 2.3.7 First_by
 
 查询 `temperature` 列中非 NULL 且时间戳最小的行的 `time` 值,以及 `temperature` 列中非 NULL 且时间戳最小的行的 `humidity` 值。
 
@@ -289,7 +311,7 @@ Total line number = 1
 It costs 0.269s
 ```
 
-#### 2.3.7 Last_by
+#### 2.3.8 Last_by
 
 查询`temperature` 列中非 NULL 且时间戳最大的行的 `time` 值,以及 `temperature` 列中非 NULL 且时间戳最大的行的 `humidity` 值。
 
@@ -309,7 +331,7 @@ Total line number = 1
 It costs 0.070s
 ```
 
-#### 2.3.8 Max_by
+#### 2.3.9 Max_by
 
 查询`temperature` 列中最大值所在行的 `time` 值,以及`temperature` 列中最大值所在行的 `humidity` 值。
 
@@ -329,7 +351,7 @@ Total line number = 1
 It costs 0.172s
 ```
 
-#### 2.3.9 Min_by
+#### 2.3.10 Min_by
 
 查询`temperature` 列中最小值所在行的 `time` 值,以及`temperature` 列中最小值所在行的 `humidity` 值。
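
Review note on the `maxStandardError` bounds in the new `APPROX_COUNT_DISTINCT` row: the following is a minimal sketch for sanity-checking the documented numbers, not a description of how IoTDB computes the result. It assumes a HyperLogLog-style estimator whose standard error is roughly 1.04 / sqrt(m) for m buckets. The diff does not name the algorithm, so that formula and the helper names `standard_error` and `buckets_for` are assumptions introduced here. Under that assumption, the documented range [0.0040625, 0.26] and the default 0.023 line up with power-of-two bucket counts.

```python
import math

# Assumption (not stated in the diff): a HyperLogLog-style estimator,
# whose standard error is about 1.04 / sqrt(m) for m buckets.
HLL_CONSTANT = 1.04

# Bounds and default copied from the APPROX_COUNT_DISTINCT table row.
MIN_ERROR, MAX_ERROR, DEFAULT_ERROR = 0.0040625, 0.26, 0.023


def standard_error(buckets: int) -> float:
    """Approximate standard error for a given bucket count."""
    return HLL_CONSTANT / math.sqrt(buckets)


def buckets_for(max_standard_error: float = DEFAULT_ERROR) -> int:
    """Smallest power-of-two bucket count whose error fits the bound."""
    if not MIN_ERROR <= max_standard_error <= MAX_ERROR:
        raise ValueError("maxStandardError must be in [0.0040625, 0.26]")
    buckets = 16  # 2**4: the count at which the error equals the upper bound
    while standard_error(buckets) > max_standard_error:
        buckets *= 2
    return buckets


if __name__ == "__main__":
    for error in (MAX_ERROR, DEFAULT_ERROR, 0.006, MIN_ERROR):
        m = buckets_for(error)
        print(f"maxStandardError={error}: {m} buckets, "
              f"error ~ {standard_error(m):.7f}")
```

If the assumption holds, the `0.006` used in the second sample query would need more buckets than the default, so a tighter bound trades memory for accuracy. On a column with only three distinct values, as in the sample output, both settings would be expected to agree with the exact count.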