diff --git a/src/UserGuide/V1.3.0-2/API/Programming-CSharp-Native-API.md b/src/UserGuide/V1.3.0-2/API/Programming-CSharp-Native-API.md deleted file mode 100644 index a4f208f7c..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-CSharp-Native-API.md +++ /dev/null @@ -1,213 +0,0 @@ - - -# C# Native API - -## Installation - -### Install from NuGet Package - -We have prepared Nuget Package for C# users. Users can directly install the client through .NET CLI. [The link of our NuGet Package is here](https://www.nuget.org/packages/Apache.IoTDB/). Run the following command in the command line to complete installation - -```sh -dotnet add package Apache.IoTDB -``` - -Note that the `Apache.IoTDB` package only supports versions greater than `.net framework 4.6.1`. - -## Prerequisites - - .NET SDK Version >= 5.0 - .NET Framework >= 4.6.1 - -## How to Use the Client (Quick Start) - -Users can quickly get started by referring to the use cases under the Apache-IoTDB-Client-CSharp-UserCase directory. These use cases serve as a useful resource for getting familiar with the client's functionality and capabilities. - -For those who wish to delve deeper into the client's usage and explore more advanced features, the samples directory contains additional code samples. - -## Developer environment requirements for iotdb-client-csharp - -``` -.NET SDK Version >= 5.0 -.NET Framework >= 4.6.1 -ApacheThrift >= 0.14.1 -NLog >= 4.7.9 -``` - -### OS - -* Linux, Macos or other unix-like OS -* Windows+bash(WSL, cygwin, Git Bash) - -### Command Line Tools - -* dotnet CLI -* Thrift - -## Basic interface description - -The Session interface is semantically identical to other language clients - -```csharp -// Parameters -string host = "localhost"; -int port = 6667; -int pool_size = 2; - -// Init Session -var session_pool = new SessionPool(host, port, pool_size); - -// Open Session -await session_pool.Open(false); - -// Create TimeSeries -await session_pool.CreateTimeSeries("root.test_group.test_device.ts1", TSDataType.TEXT, TSEncoding.PLAIN, Compressor.UNCOMPRESSED); -await session_pool.CreateTimeSeries("root.test_group.test_device.ts2", TSDataType.BOOLEAN, TSEncoding.PLAIN, Compressor.UNCOMPRESSED); -await session_pool.CreateTimeSeries("root.test_group.test_device.ts3", TSDataType.INT32, TSEncoding.PLAIN, Compressor.UNCOMPRESSED); - -// Insert Record -var measures = new List{"ts1", "ts2", "ts3"}; -var values = new List { "test_text", true, (int)123 }; -var timestamp = 1; -var rowRecord = new RowRecord(timestamp, values, measures); -await session_pool.InsertRecordAsync("root.test_group.test_device", rowRecord); - -// Insert Tablet -var timestamp_lst = new List{ timestamp + 1 }; -var value_lst = new List {"iotdb", true, (int) 12}; -var tablet = new Tablet("root.test_group.test_device", measures, value_lst, timestamp_lst); -await session_pool.InsertTabletAsync(tablet); - -// Close Session -await session_pool.Close(); -``` - -## **Row Record** - -- Encapsulate and abstract the `record` data in **IoTDB** -- e.g. - - | timestamp | status | temperature | - | --------- | ------ | ----------- | - | 1 | 0 | 20 | - -- Construction: - -```csharp -var rowRecord = - new RowRecord(long timestamps, List values, List measurements); -``` - -### **Tablet** - -- A data structure similar to a table, containing several non empty data blocks of a device's rows。 -- e.g. 
- - | time | status | temperature | - | ---- | ------ | ----------- | - | 1 | 0 | 20 | - | 2 | 0 | 20 | - | 3 | 3 | 21 | - -- Construction: - -```csharp -var tablet = - Tablet(string deviceId, List measurements, List> values, List timestamps); -``` - - - -## **API** - -### **Basic API** - -| api name | parameters | notes | use example | -| -------------- | ------------------------- | ------------------------ | ----------------------------- | -| Open | bool | open session | session_pool.Open(false) | -| Close | null | close session | session_pool.Close() | -| IsOpen | null | check if session is open | session_pool.IsOpen() | -| OpenDebugMode | LoggingConfiguration=null | open debug mode | session_pool.OpenDebugMode() | -| CloseDebugMode | null | close debug mode | session_pool.CloseDebugMode() | -| SetTimeZone | string | set time zone | session_pool.GetTimeZone() | -| GetTimeZone | null | get time zone | session_pool.GetTimeZone() | - -### **Record API** - -| api name | parameters | notes | use example | -| ----------------------------------- | ----------------------------- | ----------------------------------- | ------------------------------------------------------------ | -| InsertRecordAsync | string, RowRecord | insert single record | session_pool.InsertRecordAsync("root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE", new RowRecord(1, values, measures)); | -| InsertRecordsAsync | List\, List\ | insert records | session_pool.InsertRecordsAsync(device_id, rowRecords) | -| InsertRecordsOfOneDeviceAsync | string, List\ | insert records of one device | session_pool.InsertRecordsOfOneDeviceAsync(device_id, rowRecords) | -| InsertRecordsOfOneDeviceSortedAsync | string, List\ | insert sorted records of one device | InsertRecordsOfOneDeviceSortedAsync(deviceId, sortedRowRecords); | -| TestInsertRecordAsync | string, RowRecord | test insert record | session_pool.TestInsertRecordAsync("root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE", rowRecord) | -| TestInsertRecordsAsync | List\, List\ | test insert record | session_pool.TestInsertRecordsAsync(device_id, rowRecords) | - -### **Tablet API** - -| api name | parameters | notes | use example | -| ---------------------- | ------------ | -------------------- | -------------------------------------------- | -| InsertTabletAsync | Tablet | insert single tablet | session_pool.InsertTabletAsync(tablet) | -| InsertTabletsAsync | List\ | insert tablets | session_pool.InsertTabletsAsync(tablets) | -| TestInsertTabletAsync | Tablet | test insert tablet | session_pool.TestInsertTabletAsync(tablet) | -| TestInsertTabletsAsync | List\ | test insert tablets | session_pool.TestInsertTabletsAsync(tablets) | - -### **SQL API** - -| api name | parameters | notes | use example | -| ----------------------------- | ---------- | ------------------------------ | ------------------------------------------------------------ | -| ExecuteQueryStatementAsync | string | execute sql query statement | session_pool.ExecuteQueryStatementAsync("select * from root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE where time<15"); | -| ExecuteNonQueryStatementAsync | string | execute sql nonquery statement | session_pool.ExecuteNonQueryStatementAsync( "create timeseries root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE.status with datatype=BOOLEAN,encoding=PLAIN") | - -### **Scheam API** - -| api name | parameters | notes | use example | -| -------------------------- | ------------------------------------------------------------ | 
--------------------------- | ------------------------------------------------------------ | -| SetStorageGroup | string | set storage group | session_pool.SetStorageGroup("root.97209_TEST_CSHARP_CLIENT_GROUP_01") | -| CreateTimeSeries | string, TSDataType, TSEncoding, Compressor | create time series | session_pool.InsertTabletsAsync(tablets) | -| DeleteStorageGroupAsync | string | delete single storage group | session_pool.DeleteStorageGroupAsync("root.97209_TEST_CSHARP_CLIENT_GROUP_01") | -| DeleteStorageGroupsAsync | List\ | delete storage group | session_pool.DeleteStorageGroupAsync("root.97209_TEST_CSHARP_CLIENT_GROUP") | -| CreateMultiTimeSeriesAsync | List\, List\ , List\ , List\ | create multi time series | session_pool.CreateMultiTimeSeriesAsync(ts_path_lst, data_type_lst, encoding_lst, compressor_lst); | -| DeleteTimeSeriesAsync | List\ | delete time series | | -| DeleteTimeSeriesAsync | string | delete time series | | -| DeleteDataAsync | List\, long, long | delete data | session_pool.DeleteDataAsync(ts_path_lst, 2, 3) | - -### **Other API** - -| api name | parameters | notes | use example | -| -------------------------- | ---------- | --------------------------- | ---------------------------------------------------- | -| CheckTimeSeriesExistsAsync | string | check if time series exists | session_pool.CheckTimeSeriesExistsAsync(time series) | - - - -[e.g.](https://github.com/apache/iotdb-client-csharp/tree/main/samples/Apache.IoTDB.Samples) - -## SessionPool - -To implement concurrent client requests, we provide a `SessionPool` for the native interface. Since `SessionPool` itself is a superset of `Session`, when `SessionPool` is a When the `pool_size` parameter is set to 1, it reverts to the original `Session` - -We use the `ConcurrentQueue` data structure to encapsulate a client queue to maintain multiple connections with the server. When the `Open()` interface is called, a specified number of clients are created in the queue, and synchronous access to the queue is achieved through the `System.Threading.Monitor` class. - -When a request occurs, it will try to find an idle client connection from the Connection pool. If there is no idle connection, the program will need to wait until there is an idle connection - -When a connection is used up, it will automatically return to the pool and wait for the next time it is used up - diff --git a/src/UserGuide/V1.3.0-2/API/Programming-Cpp-Native-API.md b/src/UserGuide/V1.3.0-2/API/Programming-Cpp-Native-API.md deleted file mode 100644 index b462983d2..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-Cpp-Native-API.md +++ /dev/null @@ -1,428 +0,0 @@ - - -# C++ Native API - -## Dependencies - -- Java 8+ -- Flex -- Bison 2.7+ -- Boost 1.56+ -- OpenSSL 1.0+ -- GCC 5.5.0+ - -## Installation - -### Install Required Dependencies - -- **MAC** - 1. Install Bison: - - Use the following brew command to install the Bison version: - ```shell - brew install bison - ``` - - 2. Install Boost: Make sure to install the latest version of Boost. - - ```shell - brew install boost - ``` - - 3. Check OpenSSL: Make sure the OpenSSL library is installed. The default OpenSSL header file path is "/usr/local/opt/openssl/include". - - If you encounter errors related to OpenSSL not being found during compilation, try adding `-Dopenssl.include.dir=""`. 
- -- **Ubuntu 16.04+ or Other Debian-based Systems** - - Use the following commands to install dependencies: - - ```shell - sudo apt-get update - sudo apt-get install gcc g++ bison flex libboost-all-dev libssl-dev - ``` - -- **CentOS 7.7+/Fedora/Rocky Linux or Other Red Hat-based Systems** - - Use the yum command to install dependencies: - - ```shell - sudo yum update - sudo yum install gcc gcc-c++ boost-devel bison flex openssl-devel - ``` - -- **Windows** - - 1. Set Up the Build Environment - - Install MS Visual Studio (version 2019+ recommended): Make sure to select Visual Studio C/C++ IDE and compiler (supporting CMake, Clang, MinGW) during installation. - - Download and install [CMake](https://cmake.org/download/). - - 2. Download and Install Flex, Bison - - Download [Win_Flex_Bison](https://sourceforge.net/projects/winflexbison/). - - After downloading, rename the executables to flex.exe and bison.exe to ensure they can be found during compilation, and add the directory of these executables to the PATH environment variable. - - 3. Install Boost Library - - Download [Boost](https://www.boost.org/users/download/). - - Compile Boost locally: Run `bootstrap.bat` and `b2.exe` in sequence. - - Add the Boost installation directory to the PATH environment variable, e.g., `C:\Program Files (x86)\boost_1_78_0`. - - 4. Install OpenSSL - - Download and install [OpenSSL](http://slproweb.com/products/Win32OpenSSL.html). - - Add the include directory under the installation directory to the PATH environment variable. - -### Compilation - -Clone the source code from git: -```shell -git clone https://github.com/apache/iotdb.git -``` - -The default main branch is the master branch. If you want to use a specific release version, switch to that branch (e.g., version 1.3.2): -```shell -git checkout rc/1.3.2 -``` - -Run Maven to compile in the IoTDB root directory: - -- Mac or Linux with glibc version >= 2.32 - ```shell - ./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp - ``` - -- Linux with glibc version >= 2.31 - ```shell - ./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Diotdb-tools-thrift.version=0.14.1.1-old-glibc-SNAPSHOT - ``` - -- Linux with glibc version >= 2.17 - ```shell - ./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Diotdb-tools-thrift.version=0.14.1.1-glibc223-SNAPSHOT - ``` - -- Windows using Visual Studio 2022 - ```batch - .\mvnw.cmd clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp - ``` - -- Windows using Visual Studio 2019 - ```batch - .\mvnw.cmd clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Dcmake.generator="Visual Studio 16 2019" -Diotdb-tools-thrift.version=0.14.1.1-msvc142-SNAPSHOT - ``` - - If you haven't added the Boost library path to the PATH environment variable, you need to add the relevant parameters to the compile command, e.g., `-DboostIncludeDir="C:\Program Files (x86)\boost_1_78_0" -DboostLibraryDir="C:\Program Files (x86)\boost_1_78_0\stage\lib"`. - -After successful compilation, the packaged library files will be located in `iotdb-client/client-cpp/target`, and you can find the compiled example program under `example/client-cpp-example/target`. - -### Compilation Q&A - -Q: What are the requirements for the environment on Linux? - -A: -- The known minimum version requirement for glibc (x86_64 version) is 2.17, and the minimum version for GCC is 5.5. 
-- The known minimum version requirement for glibc (ARM version) is 2.31, and the minimum version for GCC is 10.2. -- If the above requirements are not met, you can try compiling Thrift locally: - - Download the code from https://github.com/apache/iotdb-bin-resources/tree/iotdb-tools-thrift-v0.14.1.0/iotdb-tools-thrift. - - Run `./mvnw clean install`. - - Go back to the IoTDB code directory and run `./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp`. - -Q: How to resolve the `undefined reference to '_libc_single_thread'` error during Linux compilation? - -A: -- This issue is caused by the precompiled Thrift dependencies requiring a higher version of glibc. -- You can try adding `-Diotdb-tools-thrift.version=0.14.1.1-glibc223-SNAPSHOT` or `-Diotdb-tools-thrift.version=0.14.1.1-old-glibc-SNAPSHOT` to the Maven compile command. - -Q: What if I need to compile using Visual Studio 2017 or earlier on Windows? - -A: -- You can try compiling Thrift locally before compiling the client: - - Download the code from https://github.com/apache/iotdb-bin-resources/tree/iotdb-tools-thrift-v0.14.1.0/iotdb-tools-thrift. - - Run `.\mvnw.cmd clean install`. - - Go back to the IoTDB code directory and run `.\mvnw.cmd clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Dcmake.generator="Visual Studio 15 2017"`. - - -## Native APIs - -Here we show the commonly used interfaces and their parameters in the Native API: - -### Initialization - -- Open a Session -```cpp -void open(); -``` - -- Open a session, with a parameter to specify whether to enable RPC compression -```cpp -void open(bool enableRPCCompression); -``` -Notice: this RPC compression status of client must comply with that of IoTDB server - -- Close a Session -```cpp -void close(); -``` - -### Data Definition Interface (DDL) - -#### Database Management - -- CREATE DATABASE -```cpp -void setStorageGroup(const std::string &storageGroupId); -``` - -- Delete one or several databases -```cpp -void deleteStorageGroup(const std::string &storageGroup); -void deleteStorageGroups(const std::vector &storageGroups); -``` - -#### Timeseries Management - -- Create one or multiple timeseries -```cpp -void createTimeseries(const std::string &path, TSDataType::TSDataType dataType, TSEncoding::TSEncoding encoding, - CompressionType::CompressionType compressor); - -void createMultiTimeseries(const std::vector &paths, - const std::vector &dataTypes, - const std::vector &encodings, - const std::vector &compressors, - std::vector> *propsList, - std::vector> *tagsList, - std::vector> *attributesList, - std::vector *measurementAliasList); -``` - -- Create aligned timeseries -```cpp -void createAlignedTimeseries(const std::string &deviceId, - const std::vector &measurements, - const std::vector &dataTypes, - const std::vector &encodings, - const std::vector &compressors); -``` - -- Delete one or several timeseries -```cpp -void deleteTimeseries(const std::string &path); -void deleteTimeseries(const std::vector &paths); -``` - -- Check whether the specific timeseries exists. -```cpp -bool checkTimeseriesExists(const std::string &path); -``` - -#### Schema Template - -- Create a schema template -```cpp -void createSchemaTemplate(const Template &templ); -``` - -- Set the schema template named `templateName` at path `prefixPath`. 
-```cpp -void setSchemaTemplate(const std::string &template_name, const std::string &prefix_path); -``` - -- Unset the schema template -```cpp -void unsetSchemaTemplate(const std::string &prefix_path, const std::string &template_name); -``` - -- After measurement template created, you can edit the template with belowed APIs. -```cpp -// Add aligned measurements to a template -void addAlignedMeasurementsInTemplate(const std::string &template_name, - const std::vector &measurements, - const std::vector &dataTypes, - const std::vector &encodings, - const std::vector &compressors); - -// Add one aligned measurement to a template -void addAlignedMeasurementsInTemplate(const std::string &template_name, - const std::string &measurement, - TSDataType::TSDataType dataType, - TSEncoding::TSEncoding encoding, - CompressionType::CompressionType compressor); - -// Add unaligned measurements to a template -void addUnalignedMeasurementsInTemplate(const std::string &template_name, - const std::vector &measurements, - const std::vector &dataTypes, - const std::vector &encodings, - const std::vector &compressors); - -// Add one unaligned measurement to a template -void addUnalignedMeasurementsInTemplate(const std::string &template_name, - const std::string &measurement, - TSDataType::TSDataType dataType, - TSEncoding::TSEncoding encoding, - CompressionType::CompressionType compressor); - -// Delete a node in template and its children -void deleteNodeInTemplate(const std::string &template_name, const std::string &path); -``` - -- You can query measurement templates with these APIS: -```cpp -// Return the amount of measurements inside a template -int countMeasurementsInTemplate(const std::string &template_name); - -// Return true if path points to a measurement, otherwise returne false -bool isMeasurementInTemplate(const std::string &template_name, const std::string &path); - -// Return true if path exists in template, otherwise return false -bool isPathExistInTemplate(const std::string &template_name, const std::string &path); - -// Return all measurements paths inside template -std::vector showMeasurementsInTemplate(const std::string &template_name); - -// Return all measurements paths under the designated patter inside template -std::vector showMeasurementsInTemplate(const std::string &template_name, const std::string &pattern); -``` - - -### Data Manipulation Interface (DML) - -#### Insert - -> It is recommended to use insertTablet to help improve write efficiency. - -- Insert a Tablet,which is multiple rows of a device, each row has the same measurements - - Better Write Performance - - Support null values: fill the null value with any value, and then mark the null value via BitMap -```cpp -void insertTablet(Tablet &tablet); -``` - -- Insert multiple Tablets -```cpp -void insertTablets(std::unordered_map &tablets); -``` - -- Insert a Record, which contains multiple measurement value of a device at a timestamp -```cpp -void insertRecord(const std::string &deviceId, int64_t time, const std::vector &measurements, - const std::vector &types, const std::vector &values); -``` - -- Insert multiple Records -```cpp -void insertRecords(const std::vector &deviceIds, - const std::vector ×, - const std::vector> &measurementsList, - const std::vector> &typesList, - const std::vector> &valuesList); -``` - -- Insert multiple Records that belong to the same device. 
With type info the server has no need to do type inference, which leads a better performance -```cpp -void insertRecordsOfOneDevice(const std::string &deviceId, - std::vector ×, - std::vector> &measurementsList, - std::vector> &typesList, - std::vector> &valuesList); -``` - -#### Insert with type inference - -Without type information, server has to do type inference, which may cost some time. - -```cpp -void insertRecord(const std::string &deviceId, int64_t time, const std::vector &measurements, - const std::vector &values); - - -void insertRecords(const std::vector &deviceIds, - const std::vector ×, - const std::vector> &measurementsList, - const std::vector> &valuesList); - - -void insertRecordsOfOneDevice(const std::string &deviceId, - std::vector ×, - std::vector> &measurementsList, - const std::vector> &valuesList); -``` - -#### Insert data into Aligned Timeseries - -The Insert of aligned timeseries uses interfaces like `insertAlignedXXX`, and others are similar to the above interfaces: - -- insertAlignedRecord -- insertAlignedRecords -- insertAlignedRecordsOfOneDevice -- insertAlignedTablet -- insertAlignedTablets - -#### Delete - -- Delete data in a time range of one or several timeseries -```cpp -void deleteData(const std::string &path, int64_t endTime); -void deleteData(const std::vector &paths, int64_t endTime); -void deleteData(const std::vector &paths, int64_t startTime, int64_t endTime); -``` - -### IoTDB-SQL Interface - -- Execute query statement -```cpp -unique_ptr executeQueryStatement(const std::string &sql); -``` - -- Execute non query statement -```cpp -void executeNonQueryStatement(const std::string &sql); -``` - - -## Examples - -The sample code of using these interfaces is in: - -- `example/client-cpp-example/src/SessionExample.cpp` -- `example/client-cpp-example/src/AlignedTimeseriesSessionExample.cpp` (Aligned Timeseries) - -If the compilation finishes successfully, the example project will be placed under `example/client-cpp-example/target` - -## FAQ - -### on Mac - -If errors occur when compiling thrift source code, try to downgrade your xcode-commandline from 12 to 11.5 - -see https://stackoverflow.com/questions/63592445/ld-unsupported-tapi-file-type-tapi-tbd-in-yaml-file/65518087#65518087 - - -### on Windows - -When Building Thrift and downloading packages via "wget", a possible annoying issue may occur with -error message looks like: -```shell -Failed to delete cached file C:\Users\Administrator\.m2\repository\.cache\download-maven-plugin\index.ser -``` -Possible fixes: -- Try to delete the ".m2\repository\\.cache\" directory and try again. -- Add "\true\" configuration to the download-maven-plugin maven phase that complains this error. 
- diff --git a/src/UserGuide/V1.3.0-2/API/Programming-Go-Native-API.md b/src/UserGuide/V1.3.0-2/API/Programming-Go-Native-API.md deleted file mode 100644 index e077dcf85..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-Go-Native-API.md +++ /dev/null @@ -1,65 +0,0 @@ - - -# Go Native API - -## Dependencies - - * golang >= 1.13 - * make >= 3.0 - * curl >= 7.1.1 - * thrift 0.15.0 - * Linux、Macos or other unix-like systems - * Windows+bash (WSL、cygwin、Git Bash) - - - - -## Installation - - * go mod - -```sh -export GO111MODULE=on -export GOPROXY=https://goproxy.io - -mkdir session_example && cd session_example - -curl -o session_example.go -L https://github.com/apache/iotdb-client-go/raw/main/example/session_example.go - -go mod init session_example -go run session_example.go -``` - -* GOPATH - -```sh -# get thrift 0.13.0 -go get github.com/apache/thrift -cd $GOPATH/src/github.com/apache/thrift -git checkout 0.13.0 - -mkdir -p $GOPATH/src/iotdb-client-go-example/session_example -cd $GOPATH/src/iotdb-client-go-example/session_example -curl -o session_example.go -L https://github.com/apache/iotdb-client-go/raw/main/example/session_example.go -go run session_example.go -``` - diff --git a/src/UserGuide/V1.3.0-2/API/Programming-JDBC.md b/src/UserGuide/V1.3.0-2/API/Programming-JDBC.md deleted file mode 100644 index b717ac540..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-JDBC.md +++ /dev/null @@ -1,212 +0,0 @@ - - -# JDBC (Not Recommend) - -*NOTICE: CURRENTLY, JDBC IS USED FOR CONNECTING SOME THIRD-PART TOOLS. -IT CAN NOT PROVIDE HIGH THROUGHPUT FOR WRITE OPERATIONS. -PLEASE USE [Java Native API](./Programming-Java-Native-API.md) INSTEAD* - -## Dependencies - -* JDK >= 1.8 -* Maven >= 3.6 - -## Installation - -In root directory: - -```shell -mvn clean install -pl iotdb-client/jdbc -am -DskipTests -``` - -## Use IoTDB JDBC with Maven - -```xml - - - org.apache.iotdb - iotdb-jdbc - 1.3.1 - - -``` - -## Coding Examples - -This chapter provides an example of how to open a database connection, execute a SQL query, and display the results. - -It requires including the packages containing the JDBC classes needed for database programming. - -**NOTE: For faster insertion, the insertTablet() in Session is recommended.** - -```java -import java.sql.*; -import org.apache.iotdb.jdbc.IoTDBSQLException; - -public class JDBCExample { - /** - * Before executing a SQL statement with a Statement object, you need to create a Statement object using the createStatement() method of the Connection object. - * After creating a Statement object, you can use its execute() method to execute a SQL statement - * Finally, remember to close the 'statement' and 'connection' objects by using their close() method - * For statements with query results, we can use the getResultSet() method of the Statement object to get the result set. - */ - public static void main(String[] args) throws SQLException { - Connection connection = getConnection(); - if (connection == null) { - System.out.println("get connection defeat"); - return; - } - Statement statement = connection.createStatement(); - //Create database - try { - statement.execute("CREATE DATABASE root.demo"); - }catch (IoTDBSQLException e){ - System.out.println(e.getMessage()); - } - - - //SHOW DATABASES - statement.execute("SHOW DATABASES"); - outputResult(statement.getResultSet()); - - //Create time series - //Different data type has different encoding methods. 
Here use INT32 as an example - try { - statement.execute("CREATE TIMESERIES root.demo.s0 WITH DATATYPE=INT32,ENCODING=RLE;"); - }catch (IoTDBSQLException e){ - System.out.println(e.getMessage()); - } - //Show time series - statement.execute("SHOW TIMESERIES root.demo"); - outputResult(statement.getResultSet()); - //Show devices - statement.execute("SHOW DEVICES"); - outputResult(statement.getResultSet()); - //Count time series - statement.execute("COUNT TIMESERIES root"); - outputResult(statement.getResultSet()); - //Count nodes at the given level - statement.execute("COUNT NODES root LEVEL=3"); - outputResult(statement.getResultSet()); - //Count timeseries group by each node at the given level - statement.execute("COUNT TIMESERIES root GROUP BY LEVEL=3"); - outputResult(statement.getResultSet()); - - - //Execute insert statements in batch - statement.addBatch("insert into root.demo(timestamp,s0) values(1,1);"); - statement.addBatch("insert into root.demo(timestamp,s0) values(1,1);"); - statement.addBatch("insert into root.demo(timestamp,s0) values(2,15);"); - statement.addBatch("insert into root.demo(timestamp,s0) values(2,17);"); - statement.addBatch("insert into root.demo(timestamp,s0) values(4,12);"); - statement.executeBatch(); - statement.clearBatch(); - - //Full query statement - String sql = "select * from root.demo"; - ResultSet resultSet = statement.executeQuery(sql); - System.out.println("sql: " + sql); - outputResult(resultSet); - - //Exact query statement - sql = "select s0 from root.demo where time = 4;"; - resultSet= statement.executeQuery(sql); - System.out.println("sql: " + sql); - outputResult(resultSet); - - //Time range query - sql = "select s0 from root.demo where time >= 2 and time < 5;"; - resultSet = statement.executeQuery(sql); - System.out.println("sql: " + sql); - outputResult(resultSet); - - //Aggregate query - sql = "select count(s0) from root.demo;"; - resultSet = statement.executeQuery(sql); - System.out.println("sql: " + sql); - outputResult(resultSet); - - //Delete time series - statement.execute("delete timeseries root.demo.s0"); - - //close connection - statement.close(); - connection.close(); - } - - public static Connection getConnection() { - // JDBC driver name and database URL - String driver = "org.apache.iotdb.jdbc.IoTDBDriver"; - String url = "jdbc:iotdb://127.0.0.1:6667/"; - // set rpc compress mode - // String url = "jdbc:iotdb://127.0.0.1:6667?rpc_compress=true"; - - // Database credentials - String username = "root"; - String password = "root"; - - Connection connection = null; - try { - Class.forName(driver); - connection = DriverManager.getConnection(url, username, password); - } catch (ClassNotFoundException e) { - e.printStackTrace(); - } catch (SQLException e) { - e.printStackTrace(); - } - return connection; - } - - /** - * This is an example of outputting the results in the ResultSet - */ - private static void outputResult(ResultSet resultSet) throws SQLException { - if (resultSet != null) { - System.out.println("--------------------------"); - final ResultSetMetaData metaData = resultSet.getMetaData(); - final int columnCount = metaData.getColumnCount(); - for (int i = 0; i < columnCount; i++) { - System.out.print(metaData.getColumnLabel(i + 1) + " "); - } - System.out.println(); - while (resultSet.next()) { - for (int i = 1; ; i++) { - System.out.print(resultSet.getString(i)); - if (i < columnCount) { - System.out.print(", "); - } else { - System.out.println(); - break; - } - } - } - System.out.println("--------------------------\n"); 
- } - } -} -``` - -The parameter `version` can be used in the url: -````java -String url = "jdbc:iotdb://127.0.0.1:6667?version=V_1_0"; -```` -The parameter `version` represents the SQL semantic version used by the client, which is used to be compatible with the SQL semantics of 0.12 when upgrading 0.13. The possible values are: `V_0_12`, `V_0_13`, `V_1_0`. diff --git a/src/UserGuide/V1.3.0-2/API/Programming-Java-Native-API.md b/src/UserGuide/V1.3.0-2/API/Programming-Java-Native-API.md deleted file mode 100644 index 186a2dd40..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-Java-Native-API.md +++ /dev/null @@ -1,536 +0,0 @@ - - -# Java Native API - -## Installation - -### Dependencies - -* JDK >= 1.8 -* Maven >= 3.6 - - - -### Using IoTDB Java Native API with Maven - -```xml - - - org.apache.iotdb - iotdb-session - 1.0.0 - - -``` - -## Syntax Convention - -- **IoTDB-SQL interface:** The input SQL parameter needs to conform to the [syntax conventions](../User-Manual/Syntax-Rule.md#Literal-Values) and be escaped for JAVA strings. For example, you need to add a backslash before the double-quotes. (That is: after JAVA escaping, it is consistent with the SQL statement executed on the command line.) -- **Other interfaces:** - - The node names in path or path prefix as parameter: The node names which should be escaped by backticks (`) in the SQL statement, escaping is required here. - - Identifiers (such as template names) as parameters: The identifiers which should be escaped by backticks (`) in the SQL statement, and escaping is not required here. -- **Code example for syntax convention could be found at:** `example/session/src/main/java/org/apache/iotdb/SyntaxConventionRelatedExample.java` - -## Native APIs - -Here we show the commonly used interfaces and their parameters in the Native API: - -### Initialization - -* Initialize a Session - -``` java -// use default configuration -session = new Session.Builder.build(); - -// initialize with a single node -session = - new Session.Builder() - .host(String host) - .port(int port) - .build(); - -// initialize with multiple nodes -session = - new Session.Builder() - .nodeUrls(List nodeUrls) - .build(); - -// other configurations -session = - new Session.Builder() - .fetchSize(int fetchSize) - .username(String username) - .password(String password) - .thriftDefaultBufferSize(int thriftDefaultBufferSize) - .thriftMaxFrameSize(int thriftMaxFrameSize) - .enableRedirection(boolean enableRedirection) - .version(Version version) - .build(); -``` - -Version represents the SQL semantic version used by the client, which is used to be compatible with the SQL semantics of 0.12 when upgrading 0.13. The possible values are: `V_0_12`, `V_0_13`, `V_1_0`. 
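For example, a client that pins the SQL semantic version explicitly can be built as follows. This is a minimal sketch based on the builder methods listed above; the host, port and credentials are placeholders, and the package that contains the `Version` enum may differ between releases.

``` java
Session session =
    new Session.Builder()
        .host("127.0.0.1")
        .port(6667)
        .username("root")
        .password("root")
        .version(Version.V_1_0)   // pin the SQL semantic version
        .build();
session.open(false);              // open the session without RPC compression
```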
- - - -* Open a Session - -``` java -void open() -``` - -* Open a session, with a parameter to specify whether to enable RPC compression - -``` java -void open(boolean enableRPCCompression) -``` - -Notice: this RPC compression status of client must comply with that of IoTDB server - -* Close a Session - -``` java -void close() -``` - -### Data Definition Interface (DDL Interface) - -#### Database Management - -* CREATE DATABASE - -``` java -void setStorageGroup(String storageGroupId) -``` - -* Delete one or several databases - -``` java -void deleteStorageGroup(String storageGroup) -void deleteStorageGroups(List storageGroups) -``` - -#### Timeseries Management - -* Create one or multiple timeseries - -``` java -void createTimeseries(String path, TSDataType dataType, - TSEncoding encoding, CompressionType compressor, Map props, - Map tags, Map attributes, String measurementAlias) - -void createMultiTimeseries(List paths, List dataTypes, - List encodings, List compressors, - List> propsList, List> tagsList, - List> attributesList, List measurementAliasList) -``` - -* Create aligned timeseries -``` -void createAlignedTimeseries(String prefixPath, List measurements, - List dataTypes, List encodings, - List compressors, List measurementAliasList); -``` - -Attention: Alias of measurements are **not supported** currently. - -* Delete one or several timeseries - -``` java -void deleteTimeseries(String path) -void deleteTimeseries(List paths) -``` - -* Check whether the specific timeseries exists. - -``` java -boolean checkTimeseriesExists(String path) -``` - -#### Schema Template - - -Create a schema template for massive identical devices will help to improve memory performance. You can use Template, InternalNode and MeasurementNode to depict the structure of the template, and use belowed interface to create it inside session. - -``` java -public void createSchemaTemplate(Template template); - -Class Template { - private String name; - private boolean directShareTime; - Map children; - public Template(String name, boolean isShareTime); - - public void addToTemplate(Node node); - public void deleteFromTemplate(String name); - public void setShareTime(boolean shareTime); -} - -Abstract Class Node { - private String name; - public void addChild(Node node); - public void deleteChild(Node node); -} - -Class MeasurementNode extends Node { - TSDataType dataType; - TSEncoding encoding; - CompressionType compressor; - public MeasurementNode(String name, - TSDataType dataType, - TSEncoding encoding, - CompressionType compressor); -} -``` - -We strongly suggest you implement templates only with flat-measurement (like object 'flatTemplate' in belowed snippet), since tree-structured template may not be a long-term supported feature in further version of IoTDB. 
- -A snippet of using above Method and Class: - -``` java -MeasurementNode nodeX = new MeasurementNode("x", TSDataType.FLOAT, TSEncoding.RLE, CompressionType.SNAPPY); -MeasurementNode nodeY = new MeasurementNode("y", TSDataType.FLOAT, TSEncoding.RLE, CompressionType.SNAPPY); -MeasurementNode nodeSpeed = new MeasurementNode("speed", TSDataType.DOUBLE, TSEncoding.GORILLA, CompressionType.SNAPPY); - -// This is the template we suggest to implement -Template flatTemplate = new Template("flatTemplate"); -template.addToTemplate(nodeX); -template.addToTemplate(nodeY); -template.addToTemplate(nodeSpeed); - -createSchemaTemplate(flatTemplate); -``` - -You can query measurement inside templates with these APIS: - -```java -// Return the amount of measurements inside a template -public int countMeasurementsInTemplate(String templateName); - -// Return true if path points to a measurement, otherwise returne false -public boolean isMeasurementInTemplate(String templateName, String path); - -// Return true if path exists in template, otherwise return false -public boolean isPathExistInTemplate(String templateName, String path); - -// Return all measurements paths inside template -public List showMeasurementsInTemplate(String templateName); - -// Return all measurements paths under the designated patter inside template -public List showMeasurementsInTemplate(String templateName, String pattern); -``` - -To implement schema template, you can set the measurement template named 'templateName' at path 'prefixPath'. - -**Please notice that, we strongly recommend not setting templates on the nodes above the database to accommodate future updates and collaboration between modules.** - -``` java -void setSchemaTemplate(String templateName, String prefixPath) -``` - -Before setting template, you should firstly create the template using - -``` java -void createSchemaTemplate(Template template) -``` - -After setting template to a certain path, you can use the template to create timeseries on given device paths through the following interface, or you can write data directly to trigger timeseries auto creation using schema template under target devices. - -``` java -void createTimeseriesUsingSchemaTemplate(List devicePathList) -``` - -After setting template to a certain path, you can query for info about template using belowed interface in session: - -``` java -/** @return All template names. */ -public List showAllTemplates(); - -/** @return All paths have been set to designated template. */ -public List showPathsTemplateSetOn(String templateName); - -/** @return All paths are using designated template. */ -public List showPathsTemplateUsingOn(String templateName) -``` - -If you are ready to get rid of schema template, you can drop it with belowed interface. Make sure the template to drop has been unset from MTree. - -``` java -void unsetSchemaTemplate(String prefixPath, String templateName); -public void dropSchemaTemplate(String templateName); -``` - -Unset the measurement template named 'templateName' from path 'prefixPath'. When you issue this interface, you should assure that there is a template named 'templateName' set at the path 'prefixPath'. - -Attention: Unsetting the template named 'templateName' from node at path 'prefixPath' or descendant nodes which have already inserted records using template is **not supported**. - - -### Data Manipulation Interface (DML Interface) - -#### Insert - -It is recommended to use insertTablet to help improve write efficiency. 
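As an illustration of that recommendation, a typical batched write built around a Tablet might look like the sketch below. The device path, measurement name and row counts are placeholders; it assumes an open `session` as created in the Initialization section, and `MeasurementSchema`/`TSDataType` come from the TsFile module (package names can differ between releases). The individual interfaces are described in detail after the sketch.

``` java
List<MeasurementSchema> schemas = new ArrayList<>();
schemas.add(new MeasurementSchema("s1", TSDataType.INT64));

// One Tablet buffers up to 100 rows of device root.sg1.d1 before it is flushed.
Tablet tablet = new Tablet("root.sg1.d1", schemas, 100);

for (long time = 0; time < 1000; time++) {
  int row = tablet.rowSize++;
  tablet.addTimestamp(row, time);
  tablet.addValue("s1", row, time * 10);
  if (tablet.rowSize == tablet.getMaxRowNumber()) {
    session.insertTablet(tablet, true);   // 'true' marks the rows as already sorted by time
    tablet.reset();
  }
}
if (tablet.rowSize != 0) {                // flush the remaining rows
  session.insertTablet(tablet, true);
  tablet.reset();
}
```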
- -* Insert a Tablet,which is multiple rows of a device, each row has the same measurements - * **Better Write Performance** - * **Support batch write** - * **Support null values**: fill the null value with any value, and then mark the null value via BitMap - -``` java -void insertTablet(Tablet tablet) - -public class Tablet { - /** deviceId of this tablet */ - public String prefixPath; - /** the list of measurement schemas for creating the tablet */ - private List schemas; - /** timestamps in this tablet */ - public long[] timestamps; - /** each object is a primitive type array, which represents values of one measurement */ - public Object[] values; - /** each bitmap represents the existence of each value in the current column. */ - public BitMap[] bitMaps; - /** the number of rows to include in this tablet */ - public int rowSize; - /** the maximum number of rows for this tablet */ - private int maxRowNumber; - /** whether this tablet store data of aligned timeseries or not */ - private boolean isAligned; -} -``` - -* Insert multiple Tablets - -``` java -void insertTablets(Map tablet) -``` - -* Insert a Record, which contains multiple measurement value of a device at a timestamp. This method is equivalent to providing a common interface for multiple data types of values. Later, the value can be cast to the original type through TSDataType. - - The correspondence between the Object type and the TSDataType type is shown in the following table. - - | TSDataType | Object | - | ---------- | -------------- | - | BOOLEAN | Boolean | - | INT32 | Integer | - | INT64 | Long | - | FLOAT | Float | - | DOUBLE | Double | - | TEXT | String, Binary | - -``` java -void insertRecord(String deviceId, long time, List measurements, - List types, List values) -``` - -* Insert multiple Records - -``` java -void insertRecords(List deviceIds, List times, - List> measurementsList, List> typesList, - List> valuesList) -``` -* Insert multiple Records that belong to the same device. - With type info the server has no need to do type inference, which leads a better performance - -``` java -void insertRecordsOfOneDevice(String deviceId, List times, - List> measurementsList, List> typesList, - List> valuesList) -``` - -#### Insert with type inference - -When the data is of String type, we can use the following interface to perform type inference based on the value of the value itself. For example, if value is "true" , it can be automatically inferred to be a boolean type. If value is "3.2" , it can be automatically inferred as a flout type. Without type information, server has to do type inference, which may cost some time. - -* Insert a Record, which contains multiple measurement value of a device at a timestamp - -``` java -void insertRecord(String prefixPath, long time, List measurements, List values) -``` - -* Insert multiple Records - -``` java -void insertRecords(List deviceIds, List times, - List> measurementsList, List> valuesList) -``` - -* Insert multiple Records that belong to the same device. 
- -``` java -void insertStringRecordsOfOneDevice(String deviceId, List times, - List> measurementsList, List> valuesList) -``` - -#### Insert of Aligned Timeseries - -The Insert of aligned timeseries uses interfaces like insertAlignedXXX, and others are similar to the above interfaces: - -* insertAlignedRecord -* insertAlignedRecords -* insertAlignedRecordsOfOneDevice -* insertAlignedStringRecordsOfOneDevice -* insertAlignedTablet -* insertAlignedTablets - -#### Delete - -* Delete data before or equal to a timestamp of one or several timeseries - -``` java -void deleteData(String path, long time) -void deleteData(List paths, long time) -``` - -#### Query - -* Time-series raw data query with time range: - - The specified query time range is a left-closed right-open interval, including the start time but excluding the end time. - -``` java -SessionDataSet executeRawDataQuery(List paths, long startTime, long endTime); -``` - -* Last query: - - Query the last data, whose timestamp is greater than or equal LastTime. - ``` java - SessionDataSet executeLastDataQuery(List paths, long LastTime); - ``` - - Query the latest point of the specified series of single device quickly, and support redirection; - If you are sure that the query path is valid, set 'isLegalPathNodes' to true to avoid performance penalties from path verification. - ``` java - SessionDataSet executeLastDataQueryForOneDevice( - String db, String device, List sensors, boolean isLegalPathNodes); - ``` - -* Aggregation query: - - Support specified query time range: The specified query time range is a left-closed right-open interval, including the start time but not the end time. - - Support GROUP BY TIME. - -``` java -SessionDataSet executeAggregationQuery(List paths, List aggregations); - -SessionDataSet executeAggregationQuery( - List paths, List aggregations, long startTime, long endTime); - -SessionDataSet executeAggregationQuery( - List paths, - List aggregations, - long startTime, - long endTime, - long interval); - -SessionDataSet executeAggregationQuery( - List paths, - List aggregations, - long startTime, - long endTime, - long interval, - long slidingStep); -``` - -### IoTDB-SQL Interface - -* Execute query statement - -``` java -SessionDataSet executeQueryStatement(String sql) -``` - -* Execute non query statement - -``` java -void executeNonQueryStatement(String sql) -``` - -### Write Test Interface (to profile network cost) - -These methods **don't** insert data into database and server just return after accept the request. 
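For instance, timing one of the test calls listed below against the same Tablet that would be handed to a real `insertTablet` gives a rough measure of client-side serialization plus the network round trip. This is a sketch that assumes an open `session` and a prepared `tablet` as in the earlier insert examples.

``` java
long start = System.nanoTime();
session.testInsertTablet(tablet);   // parsed and acknowledged by the server, but nothing is stored
long elapsedMs = (System.nanoTime() - start) / 1_000_000;
System.out.println("write path cost without storage: " + elapsedMs + " ms");
```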
- -* Test the network and client cost of insertRecord - -``` java -void testInsertRecord(String deviceId, long time, List measurements, List values) - -void testInsertRecord(String deviceId, long time, List measurements, - List types, List values) -``` - -* Test the network and client cost of insertRecords - -``` java -void testInsertRecords(List deviceIds, List times, - List> measurementsList, List> valuesList) - -void testInsertRecords(List deviceIds, List times, - List> measurementsList, List> typesList - List> valuesList) -``` - -* Test the network and client cost of insertTablet - -``` java -void testInsertTablet(Tablet tablet) -``` - -* Test the network and client cost of insertTablets - -``` java -void testInsertTablets(Map tablets) -``` - -### Coding Examples - -To get more information of the following interfaces, please view session/src/main/java/org/apache/iotdb/session/Session.java - -The sample code of using these interfaces is in example/session/src/main/java/org/apache/iotdb/SessionExample.java,which provides an example of how to open an IoTDB session, execute a batch insertion. - -For examples of aligned timeseries and measurement template, you can refer to `example/session/src/main/java/org/apache/iotdb/AlignedTimeseriesSessionExample.java` - - -## Session Pool for Native API - -We provide a connection pool (`SessionPool) for Native API. -Using the interface, you need to define the pool size. - -If you can not get a session connection in 60 seconds, there is a warning log but the program will hang. - -If a session has finished an operation, it will be put back to the pool automatically. -If a session connection is broken, the session will be removed automatically and the pool will try -to create a new session and redo the operation. -You can also specify an url list of multiple reachable nodes when creating a SessionPool, just as you would when creating a Session. To ensure high availability of clients in distributed cluster. - -For query operations: - -1. When using SessionPool to query data, the result set is `SessionDataSetWrapper`; -2. Given a `SessionDataSetWrapper`, if you have not scanned all the data in it and stop to use it, -you have to call `SessionPool.closeResultSet(wrapper)` manually; -3. When you call `hasNext()` and `next()` of a `SessionDataSetWrapper` and there is an exception, then -you have to call `SessionPool.closeResultSet(wrapper)` manually; -4. You can call `getColumnNames()` of `SessionDataSetWrapper` to get the column names of query result; - -Examples: ```session/src/test/java/org/apache/iotdb/session/pool/SessionPoolTest.java``` - -Or `example/session/src/main/java/org/apache/iotdb/SessionPoolExample.java` - - diff --git a/src/UserGuide/V1.3.0-2/API/Programming-Kafka.md b/src/UserGuide/V1.3.0-2/API/Programming-Kafka.md deleted file mode 100644 index a03f3183b..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-Kafka.md +++ /dev/null @@ -1,118 +0,0 @@ - - -# Kafka - -[Apache Kafka](https://kafka.apache.org/) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. 
- -## Coding Example - -### kafka Producer Producing Data Java Code Example - -```java - Properties props = new Properties(); - props.put("bootstrap.servers", "127.0.0.1:9092"); - props.put("key.serializer", StringSerializer.class); - props.put("value.serializer", StringSerializer.class); - KafkaProducer producer = new KafkaProducer<>(props); - producer.send( - new ProducerRecord<>( - "Kafka-Test", "key", "root.kafka," + System.currentTimeMillis() + ",value,INT32,100")); - producer.close(); -``` - -### kafka Consumer Receiving Data Java Code Example - -```java - Properties props = new Properties(); - props.put("bootstrap.servers", "127.0.0.1:9092"); - props.put("key.deserializer", StringDeserializer.class); - props.put("value.deserializer", StringDeserializer.class); - props.put("auto.offset.reset", "earliest"); - props.put("group.id", "Kafka-Test"); - KafkaConsumer kafkaConsumer = new KafkaConsumer<>(props); - kafkaConsumer.subscribe(Collections.singleton("Kafka-Test")); - ConsumerRecords records = kafkaConsumer.poll(Duration.ofSeconds(1)); - ``` - -### Example of Java Code Stored in IoTDB Server - -```java - SessionPool pool = - new SessionPool.Builder() - .host("127.0.0.1") - .port(6667) - .user("root") - .password("root") - .maxSize(3) - .build(); - List datas = new ArrayList<>(records.count()); - for (ConsumerRecord record : records) { - datas.add(record.value()); - } - int size = datas.size(); - List deviceIds = new ArrayList<>(size); - List times = new ArrayList<>(size); - List> measurementsList = new ArrayList<>(size); - List> typesList = new ArrayList<>(size); - List> valuesList = new ArrayList<>(size); - for (String data : datas) { - String[] dataArray = data.split(","); - String device = dataArray[0]; - long time = Long.parseLong(dataArray[1]); - List measurements = Arrays.asList(dataArray[2].split(":")); - List types = new ArrayList<>(); - for (String type : dataArray[3].split(":")) { - types.add(TSDataType.valueOf(type)); - } - List values = new ArrayList<>(); - String[] valuesStr = dataArray[4].split(":"); - for (int i = 0; i < valuesStr.length; i++) { - switch (types.get(i)) { - case INT64: - values.add(Long.parseLong(valuesStr[i])); - break; - case DOUBLE: - values.add(Double.parseDouble(valuesStr[i])); - break; - case INT32: - values.add(Integer.parseInt(valuesStr[i])); - break; - case TEXT: - values.add(valuesStr[i]); - break; - case FLOAT: - values.add(Float.parseFloat(valuesStr[i])); - break; - case BOOLEAN: - values.add(Boolean.parseBoolean(valuesStr[i])); - break; - } - } - deviceIds.add(device); - times.add(time); - measurementsList.add(measurements); - typesList.add(types); - valuesList.add(values); - } - pool.insertRecords(deviceIds, times, measurementsList, typesList, valuesList); - ``` - diff --git a/src/UserGuide/V1.3.0-2/API/Programming-MQTT.md b/src/UserGuide/V1.3.0-2/API/Programming-MQTT.md deleted file mode 100644 index 9f4a3e86a..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-MQTT.md +++ /dev/null @@ -1,183 +0,0 @@ - -# MQTT Protocol - -[MQTT](http://mqtt.org/) is a machine-to-machine (M2M)/"Internet of Things" connectivity protocol. -It was designed as an extremely lightweight publish/subscribe messaging transport. -It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium. - -IoTDB supports the MQTT v3.1(an OASIS Standard) protocol. -IoTDB server includes a built-in MQTT service that allows remote devices send messages into IoTDB server directly. 
- - - - -## Built-in MQTT Service -The Built-in MQTT Service provide the ability of direct connection to IoTDB through MQTT. It listen the publish messages from MQTT clients - and then write the data into storage immediately. -The MQTT topic corresponds to IoTDB timeseries. -The messages payload can be format to events by `PayloadFormatter` which loaded by java SPI, and the default implementation is `JSONPayloadFormatter`. -The default `json` formatter support two json format and its json array. The following is an MQTT message payload example: - -```json - { - "device":"root.sg.d1", - "timestamp":1586076045524, - "measurements":["s1","s2"], - "values":[0.530635,0.530635] - } -``` -or -```json - { - "device":"root.sg.d1", - "timestamps":[1586076045524,1586076065526], - "measurements":["s1","s2"], - "values":[[0.530635,0.530635], [0.530655,0.530695]] - } -``` -or json array of the above two. - - - -## MQTT Configurations -The IoTDB MQTT service load configurations from `${IOTDB_HOME}/${IOTDB_CONF}/iotdb-common.properties` by default. - -Configurations are as follows: - -| NAME | DESCRIPTION | DEFAULT | -| ------------- |:-------------:|:------:| -| enable_mqtt_service | whether to enable the mqtt service | false | -| mqtt_host | the mqtt service binding host | 127.0.0.1 | -| mqtt_port | the mqtt service binding port | 1883 | -| mqtt_handler_pool_size | the handler pool size for handing the mqtt messages | 1 | -| mqtt_payload_formatter | the mqtt message payload formatter | json | -| mqtt_max_message_size | the max mqtt message size in byte| 1048576 | - - -## Coding Examples -The following is an example which a mqtt client send messages to IoTDB server. - -```java -MQTT mqtt = new MQTT(); -mqtt.setHost("127.0.0.1", 1883); -mqtt.setUserName("root"); -mqtt.setPassword("root"); - -BlockingConnection connection = mqtt.blockingConnection(); -connection.connect(); - -Random random = new Random(); -for (int i = 0; i < 10; i++) { - String payload = String.format("{\n" + - "\"device\":\"root.sg.d1\",\n" + - "\"timestamp\":%d,\n" + - "\"measurements\":[\"s1\"],\n" + - "\"values\":[%f]\n" + - "}", System.currentTimeMillis(), random.nextDouble()); - - connection.publish("root.sg.d1.s1", payload.getBytes(), QoS.AT_LEAST_ONCE, false); -} - -connection.disconnect(); - -``` - -## Customize your MQTT Message Format - -If you do not like the above Json format, you can customize your MQTT Message format by just writing several lines -of codes. An example can be found in `example/mqtt-customize` project. - -Steps: -1. Create a java project, and add dependency: -```xml - - org.apache.iotdb - iotdb-server - 1.1.0-SNAPSHOT - -``` -2. 
Define your implementation which implements `org.apache.iotdb.db.protocol.mqtt.PayloadFormatter` -e.g., - -```java -package org.apache.iotdb.mqtt.server; - -import io.netty.buffer.ByteBuf; -import org.apache.iotdb.db.protocol.mqtt.Message; -import org.apache.iotdb.db.protocol.mqtt.PayloadFormatter; - -import java.nio.charset.StandardCharsets; -import java.util.ArrayList; -import java.util.Arrays; -import java.util.List; - -public class CustomizedJsonPayloadFormatter implements PayloadFormatter { - - @Override - public List format(ByteBuf payload) { - // Suppose the payload is a json format - if (payload == null) { - return null; - } - - String json = payload.toString(StandardCharsets.UTF_8); - // parse data from the json and generate Messages and put them into List ret - List ret = new ArrayList<>(); - // this is just an example, so we just generate some Messages directly - for (int i = 0; i < 2; i++) { - long ts = i; - Message message = new Message(); - message.setDevice("d" + i); - message.setTimestamp(ts); - message.setMeasurements(Arrays.asList("s1", "s2")); - message.setValues(Arrays.asList("4.0" + i, "5.0" + i)); - ret.add(message); - } - return ret; - } - - @Override - public String getName() { - // set the value of mqtt_payload_formatter in iotdb-common.properties as the following string: - return "CustomizedJson"; - } -} -``` -3. modify the file in `src/main/resources/META-INF/services/org.apache.iotdb.db.protocol.mqtt.PayloadFormatter`: - clean the file and put your implementation class name into the file. - In this example, the content is: `org.apache.iotdb.mqtt.server.CustomizedJsonPayloadFormatter` -4. compile your implementation as a jar file: `mvn package -DskipTests` - - -Then, in your server: -1. Create ${IOTDB_HOME}/ext/mqtt/ folder, and put the jar into this folder. -2. Update configuration to enable MQTT service. (`enable_mqtt_service=true` in `conf/iotdb-common.properties`) -3. Set the value of `mqtt_payload_formatter` in `conf/iotdb-common.properties` as the value of getName() in your implementation - , in this example, the value is `CustomizedJson` -4. Launch the IoTDB server. -5. Now IoTDB will use your implementation to parse the MQTT message. - -More: the message format can be anything you want. For example, if it is a binary format, -just use `payload.forEachByte()` or `payload.array` to get bytes content. - - - diff --git a/src/UserGuide/V1.3.0-2/API/Programming-NodeJS-Native-API.md b/src/UserGuide/V1.3.0-2/API/Programming-NodeJS-Native-API.md deleted file mode 100644 index cae3f918a..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-NodeJS-Native-API.md +++ /dev/null @@ -1,196 +0,0 @@ - - - -# Node.js Native API - -IoTDB uses Thrift as a cross language RPC framework, so access to IoTDB can be achieved through the interface provided by Thrift. This document will introduce how to generate a native Node.js interface that can access IoTDB. - -## Dependents - - * JDK >= 1.8 - * Node.js >= 16.0.0 - * thrift 0.14.1 - * Linux、Macos or like unix - * Windows+bash - -Thrift (0.14.1 or higher) must be installed to compile Thrift files into Node.js code. The following is the official installation tutorial, and in the end, you should receive a Thrift executable file. -``` -http://thrift.apache.org/docs/install/ -``` - - -## Compile the Thrift library and generate the Node.js native interface - -1. Find the pom.xml file in the root directory of the IoTDB source code folder. -2. 
Open the pom.xml file and find the following content: - -```xml - - generate-thrift-sources-java - generate-sources - - compile - - - java - ${thrift.exec.absolute.path} - ${basedir}/src/main/thrift - - -``` -3. Referring to this setting, add the following content to the pom.xml file to generate the native interface for Node.js: - -```xml - - generate-thrift-sources-nodejs - generate-sources - - compile - - - js:node - ${thrift.exec.absolute.path} - ${basedir}/src/main/thrift - **/common.thrift,**/client.thrift - ${project.build.directory}/generated-sources-nodejs - - -``` - -4. In the root directory of the IoTDB source code folder,run `mvn clean generate-sources`, - -This command will automatically delete the files in `iotdb/iotdb-protocol/thrift/target` and `iotdb/iotdb-protocol/thrift-commons/target`, and repopulate the folder with the newly generated throttle file. - - -## Using the Node.js native interface - -copy `iotdb/iotdb-protocol/thrift/target/generated-sources-nodejs/` and `iotdb/iotdb-protocol/thrift-commons/target/generated-sources-nodejs/` in your project。 - - -## rpc interface - -``` -// open a session -TSOpenSessionResp openSession(1:TSOpenSessionReq req); - -// close a session -TSStatus closeSession(1:TSCloseSessionReq req); - -// run an SQL statement in batch -TSExecuteStatementResp executeStatement(1:TSExecuteStatementReq req); - -// execute SQL statement in batch -TSStatus executeBatchStatement(1:TSExecuteBatchStatementReq req); - -// execute query SQL statement -TSExecuteStatementResp executeQueryStatement(1:TSExecuteStatementReq req); - -// execute insert, delete and update SQL statement -TSExecuteStatementResp executeUpdateStatement(1:TSExecuteStatementReq req); - -// fetch next query result -TSFetchResultsResp fetchResults(1:TSFetchResultsReq req) - -// fetch meta data -TSFetchMetadataResp fetchMetadata(1:TSFetchMetadataReq req) - -// cancel a query -TSStatus cancelOperation(1:TSCancelOperationReq req); - -// close a query dataset -TSStatus closeOperation(1:TSCloseOperationReq req); - -// get time zone -TSGetTimeZoneResp getTimeZone(1:i64 sessionId); - -// set time zone -TSStatus setTimeZone(1:TSSetTimeZoneReq req); - -// get server's properties -ServerProperties getProperties(); - -// CREATE DATABASE -TSStatus setStorageGroup(1:i64 sessionId, 2:string storageGroup); - -// create timeseries -TSStatus createTimeseries(1:TSCreateTimeseriesReq req); - -// create multi timeseries -TSStatus createMultiTimeseries(1:TSCreateMultiTimeseriesReq req); - -// delete timeseries -TSStatus deleteTimeseries(1:i64 sessionId, 2:list path) - -// delete sttorage groups -TSStatus deleteStorageGroups(1:i64 sessionId, 2:list storageGroup); - -// insert record -TSStatus insertRecord(1:TSInsertRecordReq req); - -// insert record in string format -TSStatus insertStringRecord(1:TSInsertStringRecordReq req); - -// insert tablet -TSStatus insertTablet(1:TSInsertTabletReq req); - -// insert tablets in batch -TSStatus insertTablets(1:TSInsertTabletsReq req); - -// insert records in batch -TSStatus insertRecords(1:TSInsertRecordsReq req); - -// insert records of one device -TSStatus insertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); - -// insert records in batch as string format -TSStatus insertStringRecords(1:TSInsertStringRecordsReq req); - -// test the latency of innsert tablet,caution:no data will be inserted, only for test latency -TSStatus testInsertTablet(1:TSInsertTabletReq req); - -// test the latency of innsert tablets,caution:no data will be inserted, only for test latency 
-TSStatus testInsertTablets(1:TSInsertTabletsReq req); - -// test the latency of innsert record,caution:no data will be inserted, only for test latency -TSStatus testInsertRecord(1:TSInsertRecordReq req); - -// test the latency of innsert record in string format,caution:no data will be inserted, only for test latency -TSStatus testInsertStringRecord(1:TSInsertStringRecordReq req); - -// test the latency of innsert records,caution:no data will be inserted, only for test latency -TSStatus testInsertRecords(1:TSInsertRecordsReq req); - -// test the latency of innsert records of one device,caution:no data will be inserted, only for test latency -TSStatus testInsertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); - -// test the latency of innsert records in string formate,caution:no data will be inserted, only for test latency -TSStatus testInsertStringRecords(1:TSInsertStringRecordsReq req); - -// delete data -TSStatus deleteData(1:TSDeleteDataReq req); - -// execute raw data query -TSExecuteStatementResp executeRawDataQuery(1:TSRawDataQueryReq req); - -// request a statement id from server -i64 requestStatementId(1:i64 sessionId); -``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/API/Programming-ODBC.md b/src/UserGuide/V1.3.0-2/API/Programming-ODBC.md deleted file mode 100644 index 443f9ef1a..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-ODBC.md +++ /dev/null @@ -1,146 +0,0 @@ - - -# ODBC -With IoTDB JDBC, IoTDB can be accessed using the ODBC-JDBC bridge. - -## Dependencies -* IoTDB-JDBC's jar-with-dependency package -* ODBC-JDBC bridge (e.g. ZappySys JDBC Bridge) - -## Deployment -### Preparing JDBC package -Download the source code of IoTDB, and execute the following command in root directory: -```shell -mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies -``` -Then, you can see the output `iotdb-jdbc-1.3.2-SNAPSHOT-jar-with-dependencies.jar` under `iotdb-client/jdbc/target` directory. - -### Preparing ODBC-JDBC Bridge -*Note: Here we only provide one kind of ODBC-JDBC bridge as the instance. Readers can use other ODBC-JDBC bridges to access IoTDB with the IOTDB-JDBC.* -1. **Download Zappy-Sys ODBC-JDBC Bridge**: - Enter the https://zappysys.com/products/odbc-powerpack/odbc-jdbc-bridge-driver/ website, and click "download". - - ![ZappySys_website.jpg](/img/ZappySys_website.jpg) - -2. **Prepare IoTDB**: Set up IoTDB cluster, and write a row of data arbitrarily. - ```sql - IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) - ``` - -3. **Deploy and Test the Bridge**: - 1. Open ODBC Data Sources(32/64 bit), depending on the bits of Windows. One possible position is `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Administrative Tools`. - - ![ODBC_ADD_EN.jpg](/img/ODBC_ADD_EN.jpg) - - 2. Click on "add" and select ZappySys JDBC Bridge. - - ![ODBC_CREATE_EN.jpg](/img/ODBC_CREATE_EN.jpg) - - 3. 
Fill in the following settings: - - | Property | Content | Example | - |---------------------|-----------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------| - | Connection String | jdbc:iotdb://\:\/ | jdbc:iotdb://127.0.0.1:6667/ | - | Driver Class | org.apache.iotdb.jdbc.IoTDBDriver | org.apache.iotdb.jdbc.IoTDBDriver | - | JDBC driver file(s) | The path of IoTDB JDBC jar-with-dependencies | C:\Users\13361\Documents\GitHub\iotdb\iotdb-client\jdbc\target\iotdb-jdbc-1.3.2-SNAPSHOT-jar-with-dependencies.jar | - | User name | IoTDB's user name | root | - | User password | IoTDB's password | root | - - ![ODBC_CONNECTION.png](/img/ODBC_CONNECTION.png) - - 4. Click on "Test Connection" button, and a "Test Connection: SUCCESSFUL" should appear. - - ![ODBC_CONFIG_EN.jpg](/img/ODBC_CONFIG_EN.jpg) - - 5. Click the "Preview" button above, and replace the original query text with `select * from root.**`, then click "Preview Data", and the query result should correctly. - - ![ODBC_TEST.jpg](/img/ODBC_TEST.jpg) - -4. **Operate IoTDB's data with ODBC**: After correct deployment, you can use Microsoft's ODBC library to operate IoTDB's data. Here's an example written in C#: - ```C# - using System.Data.Odbc; - - // Get a connection - var dbConnection = new OdbcConnection("DSN=ZappySys JDBC Bridge"); - dbConnection.Open(); - - // Execute the write commands to prepare data - var dbCommand = dbConnection.CreateCommand(); - dbCommand.CommandText = "insert into root.Keller.Flur.Energieversorgung(time, s1) values(1715670861634, 1)"; - dbCommand.ExecuteNonQuery(); - dbCommand.CommandText = "insert into root.Keller.Flur.Energieversorgung(time, s2) values(1715670861634, true)"; - dbCommand.ExecuteNonQuery(); - dbCommand.CommandText = "insert into root.Keller.Flur.Energieversorgung(time, s3) values(1715670861634, 3.1)"; - dbCommand.ExecuteNonQuery(); - - // Execute the read command - dbCommand.CommandText = "SELECT * FROM root.Keller.Flur.Energieversorgung"; - var dbReader = dbCommand.ExecuteReader(); - - // Write the output header - var fCount = dbReader.FieldCount; - Console.Write(":"); - for(var i = 0; i < fCount; i++) - { - var fName = dbReader.GetName(i); - Console.Write(fName + ":"); - } - Console.WriteLine(); - - // Output the content - while (dbReader.Read()) - { - Console.Write(":"); - for(var i = 0; i < fCount; i++) - { - var fieldType = dbReader.GetFieldType(i); - switch (fieldType.Name) - { - case "DateTime": - var dateTime = dbReader.GetInt64(i); - Console.Write(dateTime + ":"); - break; - case "Double": - if (dbReader.IsDBNull(i)) - { - Console.Write("null:"); - } - else - { - var fValue = dbReader.GetDouble(i); - Console.Write(fValue + ":"); - } - break; - default: - Console.Write(fieldType.Name + ":"); - break; - } - } - Console.WriteLine(); - } - - // Shut down gracefully - dbReader.Close(); - dbCommand.Dispose(); - dbConnection.Close(); - ``` - This program can write data into IoTDB, and query the data we have just written. diff --git a/src/UserGuide/V1.3.0-2/API/Programming-Python-Native-API.md b/src/UserGuide/V1.3.0-2/API/Programming-Python-Native-API.md deleted file mode 100644 index fa3f939b8..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-Python-Native-API.md +++ /dev/null @@ -1,732 +0,0 @@ - - -# Python Native API - -## Requirements - -You have to install thrift (>=0.13) before using the package. 
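For example, in a typical Python 3 environment the Thrift library can be installed with pip; the version bound below simply mirrors the requirement above:

```shell
pip3 install "thrift>=0.13"
```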
- - - -## How to use (Example) - -First, download the package: `pip3 install apache-iotdb` - -You can get an example of using the package to read and write data at here: [Session Example](https://github.com/apache/iotdb/blob/rc/1.3.0/iotdb-client/client-py/SessionExample.py) - -An example of aligned timeseries: [Aligned Timeseries Session Example](https://github.com/apache/iotdb/blob/rc/1.3.0/iotdb-client/client-py/SessionAlignedTimeseriesExample.py) - -(you need to add `import iotdb` in the head of the file) - -Or: - -```python -from iotdb.Session import Session - -ip = "127.0.0.1" -port_ = "6667" -username_ = "root" -password_ = "root" -session = Session(ip, port_, username_, password_) -session.open(False) -zone = session.get_time_zone() -session.close() -``` - -## Initialization - -* Initialize a Session - -```python -session = Session( - ip="127.0.0.1", - port="6667", - user="root", - password="root", - fetch_size=1024, - zone_id="UTC+8", - enable_redirection=True -) -``` - -* Initialize a Session to connect multiple nodes - -```python -session = Session.init_from_node_urls( - node_urls=["127.0.0.1:6667", "127.0.0.1:6668", "127.0.0.1:6669"], - user="root", - password="root", - fetch_size=1024, - zone_id="UTC+8", - enable_redirection=True -) -``` - -* Open a session, with a parameter to specify whether to enable RPC compression - -```python -session.open(enable_rpc_compression=False) -``` - -Notice: this RPC compression status of client must comply with that of IoTDB server - -* Close a Session - -```python -session.close() -``` -## Managing Session through SessionPool - -Utilizing SessionPool to manage sessions eliminates the need to worry about session reuse. When the number of session connections reaches the maximum capacity of the pool, requests for acquiring a session will be blocked, and you can set the blocking wait time through parameters. After using a session, it should be returned to the SessionPool using the `putBack` method for proper management. - -### Create SessionPool - -```python -pool_config = PoolConfig(host=ip,port=port, user_name=username, - password=password, fetch_size=1024, - time_zone="UTC+8", max_retry=3) -max_pool_size = 5 -wait_timeout_in_ms = 3000 - -# # Create the connection pool -session_pool = SessionPool(pool_config, max_pool_size, wait_timeout_in_ms) -``` -### Create a SessionPool using distributed nodes. 
-```python -pool_config = PoolConfig(node_urls=node_urls=["127.0.0.1:6667", "127.0.0.1:6668", "127.0.0.1:6669"], user_name=username, - password=password, fetch_size=1024, - time_zone="UTC+8", max_retry=3) -max_pool_size = 5 -wait_timeout_in_ms = 3000 -``` -### Acquiring a session through SessionPool and manually calling PutBack after use - -```python -session = session_pool.get_session() -session.set_storage_group(STORAGE_GROUP_NAME) -session.create_time_series( - TIMESERIES_PATH, TSDataType.BOOLEAN, TSEncoding.PLAIN, Compressor.SNAPPY -) -# After usage, return the session using putBack -session_pool.put_back(session) -# When closing the sessionPool, all managed sessions will be closed as well -session_pool.close() -``` - -## Data Definition Interface (DDL Interface) - -### Database Management - -* CREATE DATABASE - -```python -session.set_storage_group(group_name) -``` - -* Delete one or several databases - -```python -session.delete_storage_group(group_name) -session.delete_storage_groups(group_name_lst) -``` -### Timeseries Management - -* Create one or multiple timeseries - -```python -session.create_time_series(ts_path, data_type, encoding, compressor, - props=None, tags=None, attributes=None, alias=None) - -session.create_multi_time_series( - ts_path_lst, data_type_lst, encoding_lst, compressor_lst, - props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None -) -``` - -* Create aligned timeseries - -```python -session.create_aligned_time_series( - device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst -) -``` - -Attention: Alias of measurements are **not supported** currently. - -* Delete one or several timeseries - -```python -session.delete_time_series(paths_list) -``` - -* Check whether the specific timeseries exists - -```python -session.check_time_series_exists(path) -``` - -## Data Manipulation Interface (DML Interface) - -### Insert - -It is recommended to use insertTablet to help improve write efficiency. - -* Insert a Tablet,which is multiple rows of a device, each row has the same measurements - * **Better Write Performance** - * **Support null values**: fill the null value with any value, and then mark the null value via BitMap (from v0.13) - - -We have two implementations of Tablet in Python API. - -* Normal Tablet - -```python -values_ = [ - [False, 10, 11, 1.1, 10011.1, "test01"], - [True, 100, 11111, 1.25, 101.0, "test02"], - [False, 100, 1, 188.1, 688.25, "test03"], - [True, 0, 0, 0, 6.25, "test04"], -] -timestamps_ = [1, 2, 3, 4] -tablet_ = Tablet( - device_id, measurements_, data_types_, values_, timestamps_ -) -session.insert_tablet(tablet_) - -values_ = [ - [None, 10, 11, 1.1, 10011.1, "test01"], - [True, None, 11111, 1.25, 101.0, "test02"], - [False, 100, None, 188.1, 688.25, "test03"], - [True, 0, 0, 0, None, None], -] -timestamps_ = [16, 17, 18, 19] -tablet_ = Tablet( - device_id, measurements_, data_types_, values_, timestamps_ -) -session.insert_tablet(tablet_) -``` -* Numpy Tablet - -Comparing with Tablet, Numpy Tablet is using [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) to record data. -With less memory footprint and time cost of serialization, the insert performance will be better. - -**Notice** -1. time and numerical value columns in Tablet is ndarray -2. recommended to use the specific dtypes to each ndarray, see the example below - (if not, the default dtypes are also ok). 
- -```python -import numpy as np -data_types_ = [ - TSDataType.BOOLEAN, - TSDataType.INT32, - TSDataType.INT64, - TSDataType.FLOAT, - TSDataType.DOUBLE, - TSDataType.TEXT, -] -np_values_ = [ - np.array([False, True, False, True], TSDataType.BOOLEAN.np_dtype()), - np.array([10, 100, 100, 0], TSDataType.INT32.np_dtype()), - np.array([11, 11111, 1, 0], TSDataType.INT64.np_dtype()), - np.array([1.1, 1.25, 188.1, 0], TSDataType.FLOAT.np_dtype()), - np.array([10011.1, 101.0, 688.25, 6.25], TSDataType.DOUBLE.np_dtype()), - np.array(["test01", "test02", "test03", "test04"], TSDataType.TEXT.np_dtype()), -] -np_timestamps_ = np.array([1, 2, 3, 4], TSDataType.INT64.np_dtype()) -np_tablet_ = NumpyTablet( - device_id, measurements_, data_types_, np_values_, np_timestamps_ -) -session.insert_tablet(np_tablet_) - -# insert one numpy tablet with None into the database. -np_values_ = [ - np.array([False, True, False, True], TSDataType.BOOLEAN.np_dtype()), - np.array([10, 100, 100, 0], TSDataType.INT32.np_dtype()), - np.array([11, 11111, 1, 0], TSDataType.INT64.np_dtype()), - np.array([1.1, 1.25, 188.1, 0], TSDataType.FLOAT.np_dtype()), - np.array([10011.1, 101.0, 688.25, 6.25], TSDataType.DOUBLE.np_dtype()), - np.array(["test01", "test02", "test03", "test04"], TSDataType.TEXT.np_dtype()), -] -np_timestamps_ = np.array([98, 99, 100, 101], TSDataType.INT64.np_dtype()) -np_bitmaps_ = [] -for i in range(len(measurements_)): - np_bitmaps_.append(BitMap(len(np_timestamps_))) -np_bitmaps_[0].mark(0) -np_bitmaps_[1].mark(1) -np_bitmaps_[2].mark(2) -np_bitmaps_[4].mark(3) -np_bitmaps_[5].mark(3) -np_tablet_with_none = NumpyTablet( - device_id, measurements_, data_types_, np_values_, np_timestamps_, np_bitmaps_ -) -session.insert_tablet(np_tablet_with_none) -``` - -* Insert multiple Tablets - -```python -session.insert_tablets(tablet_lst) -``` - -* Insert a Record - -```python -session.insert_record(device_id, timestamp, measurements_, data_types_, values_) -``` - -* Insert multiple Records - -```python -session.insert_records( - device_ids_, time_list_, measurements_list_, data_type_list_, values_list_ -) -``` - -* Insert multiple Records that belong to the same device. - With type info the server has no need to do type inference, which leads a better performance - - -```python -session.insert_records_of_one_device(device_id, time_list, measurements_list, data_types_list, values_list) -``` - -### Insert with type inference - -When the data is of String type, we can use the following interface to perform type inference based on the value of the value itself. For example, if value is "true" , it can be automatically inferred to be a boolean type. If value is "3.2" , it can be automatically inferred as a flout type. Without type information, server has to do type inference, which may cost some time. 
- -* Insert a Record, which contains multiple measurement value of a device at a timestamp - -```python -session.insert_str_record(device_id, timestamp, measurements, string_values) -``` - -### Insert of Aligned Timeseries - -The Insert of aligned timeseries uses interfaces like insert_aligned_XXX, and others are similar to the above interfaces: - -* insert_aligned_record -* insert_aligned_records -* insert_aligned_records_of_one_device -* insert_aligned_tablet -* insert_aligned_tablets - - -## IoTDB-SQL Interface - -* Execute query statement - -```python -session.execute_query_statement(sql) -``` - -* Execute non query statement - -```python -session.execute_non_query_statement(sql) -``` - -* Execute statement - -```python -session.execute_statement(sql) -``` - -## Schema Template -### Create Schema Template -The step for creating a metadata template is as follows -1. Create the template class -2. Adding MeasurementNode -3. Execute create schema template function - -```python -template = Template(name=template_name, share_time=True) - -m_node_x = MeasurementNode("x", TSDataType.FLOAT, TSEncoding.RLE, Compressor.SNAPPY) -m_node_y = MeasurementNode("y", TSDataType.FLOAT, TSEncoding.RLE, Compressor.SNAPPY) -m_node_z = MeasurementNode("z", TSDataType.FLOAT, TSEncoding.RLE, Compressor.SNAPPY) - -template.add_template(m_node_x) -template.add_template(m_node_y) -template.add_template(m_node_z) - -session.create_schema_template(template) -``` -### Modify Schema Template measurements -Modify measurements in a template, the template must be already created. These are functions that add or delete some measurement nodes. -* add node in template -```python -session.add_measurements_in_template(template_name, measurements_path, data_types, encodings, compressors, is_aligned) -``` - -* delete node in template -```python -session.delete_node_in_template(template_name, path) -``` - -### Set Schema Template -```python -session.set_schema_template(template_name, prefix_path) -``` - -### Uset Schema Template -```python -session.unset_schema_template(template_name, prefix_path) -``` - -### Show Schema Template -* Show all schema templates -```python -session.show_all_templates() -``` -* Count all measurements in templates -```python -session.count_measurements_in_template(template_name) -``` - -* Judge whether the path is measurement or not in templates, This measurement must be in the template -```python -session.count_measurements_in_template(template_name, path) -``` - -* Judge whether the path is exist or not in templates, This path may not belong to the template -```python -session.is_path_exist_in_template(template_name, path) -``` - -* Show nodes under in schema template -```python -session.show_measurements_in_template(template_name) -``` - -* Show the path prefix where a schema template is set -```python -session.show_paths_template_set_on(template_name) -``` - -* Show the path prefix where a schema template is used (i.e. the time series has been created) -```python -session.show_paths_template_using_on(template_name) -``` - -### Drop Schema Template -Delete an existing metadata template,dropping an already set template is not supported -```python -session.drop_schema_template("template_python") -``` - - -## Pandas Support - -To easily transform a query result to a [Pandas Dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) -the SessionDataSet has a method `.todf()` which consumes the dataset and transforms it to a pandas dataframe. 
- -Example: - -```python -from iotdb.Session import Session - -ip = "127.0.0.1" -port_ = "6667" -username_ = "root" -password_ = "root" -session = Session(ip, port_, username_, password_) -session.open(False) -result = session.execute_query_statement("SELECT * FROM root.*") - -# Transform to Pandas Dataset -df = result.todf() - -session.close() - -# Now you can work with the dataframe -df = ... -``` - - -## IoTDB Testcontainer - -The Test Support is based on the lib `testcontainers` (https://testcontainers-python.readthedocs.io/en/latest/index.html) which you need to install in your project if you want to use the feature. - -To start (and stop) an IoTDB Database in a Docker container simply do: -```python -class MyTestCase(unittest.TestCase): - - def test_something(self): - with IoTDBContainer() as c: - session = Session("localhost", c.get_exposed_port(6667), "root", "root") - session.open(False) - result = session.execute_query_statement("SHOW TIMESERIES") - print(result) - session.close() -``` - -by default it will load the image `apache/iotdb:latest`, if you want a specific version just pass it like e.g. `IoTDBContainer("apache/iotdb:0.12.0")` to get version `0.12.0` running. - -## IoTDB DBAPI - -IoTDB DBAPI implements the Python DB API 2.0 specification (https://peps.python.org/pep-0249/), which defines a common -interface for accessing databases in Python. - -### Examples -+ Initialization - -The initialized parameters are consistent with the session part (except for the sqlalchemy_mode). -```python -from iotdb.dbapi import connect - -ip = "127.0.0.1" -port_ = "6667" -username_ = "root" -password_ = "root" -conn = connect(ip, port_, username_, password_,fetch_size=1024,zone_id="UTC+8",sqlalchemy_mode=False) -cursor = conn.cursor() -``` -+ simple SQL statement execution -```python -cursor.execute("SELECT ** FROM root") -for row in cursor.fetchall(): - print(row) -``` - -+ execute SQL with parameter - -IoTDB DBAPI supports pyformat style parameters -```python -cursor.execute("SELECT ** FROM root WHERE time < %(time)s",{"time":"2017-11-01T00:08:00.000"}) -for row in cursor.fetchall(): - print(row) -``` - -+ execute SQL with parameter sequences -```python -seq_of_parameters = [ - {"timestamp": 1, "temperature": 1}, - {"timestamp": 2, "temperature": 2}, - {"timestamp": 3, "temperature": 3}, - {"timestamp": 4, "temperature": 4}, - {"timestamp": 5, "temperature": 5}, -] -sql = "insert into root.cursor(timestamp,temperature) values(%(timestamp)s,%(temperature)s)" -cursor.executemany(sql,seq_of_parameters) -``` - -+ close the connection and cursor -```python -cursor.close() -conn.close() -``` - -## IoTDB SQLAlchemy Dialect (Experimental) -The SQLAlchemy dialect of IoTDB is written to adapt to Apache Superset. -This part is still being improved. -Please do not use it in the production environment! -### Mapping of the metadata -The data model used by SQLAlchemy is a relational data model, which describes the relationships between different entities through tables. -While the data model of IoTDB is a hierarchical data model, which organizes the data through a tree structure. -In order to adapt IoTDB to the dialect of SQLAlchemy, the original data model in IoTDB needs to be reorganized. -Converting the data model of IoTDB into the data model of SQLAlchemy. - -The metadata in the IoTDB are: - -1. Database -2. Path -3. Entity -4. Measurement - -The metadata in the SQLAlchemy are: -1. Schema -2. Table -3. 
Column - -The mapping relationship between them is: - -| The metadata in the SQLAlchemy | The metadata in the IoTDB | -| -------------------- | -------------------------------------------- | -| Schema | Database | -| Table | Path ( from database to entity ) + Entity | -| Column | Measurement | - -The following figure shows the relationship between the two more intuitively: - -![sqlalchemy-to-iotdb](/img/UserGuide/API/IoTDB-SQLAlchemy/sqlalchemy-to-iotdb.png?raw=true) - -### Data type mapping -| data type in IoTDB | data type in SQLAlchemy | -|--------------------|-------------------------| -| BOOLEAN | Boolean | -| INT32 | Integer | -| INT64 | BigInteger | -| FLOAT | Float | -| DOUBLE | Float | -| TEXT | Text | -| LONG | BigInteger | - -### Example - -+ execute statement - -```python -from sqlalchemy import create_engine - -engine = create_engine("iotdb://root:root@127.0.0.1:6667") -connect = engine.connect() -result = connect.execute("SELECT ** FROM root") -for row in result.fetchall(): - print(row) -``` - -+ ORM (now only simple queries are supported) - -```python -from sqlalchemy import create_engine, Column, Float, BigInteger, MetaData -from sqlalchemy.ext.declarative import declarative_base -from sqlalchemy.orm import sessionmaker - -metadata = MetaData( - schema='root.factory' -) -Base = declarative_base(metadata=metadata) - - -class Device(Base): - __tablename__ = "room2.device1" - Time = Column(BigInteger, primary_key=True) - temperature = Column(Float) - status = Column(Float) - - -engine = create_engine("iotdb://root:root@127.0.0.1:6667") - -DbSession = sessionmaker(bind=engine) -session = DbSession() - -res = session.query(Device.status).filter(Device.temperature > 1) - -for row in res: - print(row) -``` - - -## Developers - -### Introduction - -This is an example of how to connect to IoTDB with python, using the thrift rpc interfaces. Things are almost the same on Windows or Linux, but pay attention to the difference like path separator. - - - -### Prerequisites - -Python3.7 or later is preferred. - -You have to install Thrift (0.11.0 or later) to compile our thrift file into python code. Below is the official tutorial of installation, eventually, you should have a thrift executable. - -``` -http://thrift.apache.org/docs/install/ -``` - -Before starting you need to install `requirements_dev.txt` in your python environment, e.g. by calling -```shell -pip install -r requirements_dev.txt -``` - - - -### Compile the thrift library and Debug - -In the root of IoTDB's source code folder, run `mvn clean generate-sources -pl iotdb-client/client-py -am`. - -This will automatically delete and repopulate the folder `iotdb/thrift` with the generated thrift files. -This folder is ignored from git and should **never be pushed to git!** - -**Notice** Do not upload `iotdb/thrift` to the git repo. - - - - -### Session Client & Example - -We packed up the Thrift interface in `client-py/src/iotdb/Session.py` (similar with its Java counterpart), also provided an example file `client-py/src/SessionExample.py` of how to use the session module. please read it carefully. - - -Or, another simple example: - -```python -from iotdb.Session import Session - -ip = "127.0.0.1" -port_ = "6667" -username_ = "root" -password_ = "root" -session = Session(ip, port_, username_, password_) -session.open(False) -zone = session.get_time_zone() -session.close() -``` - - - -### Tests - -Please add your custom tests in `tests` folder. - -To run all defined tests just type `pytest .` in the root folder. 
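A typical test run might look like the following sketch (the single test file name is only an illustration):

```shell
# install the development dependencies first, as described in the Prerequisites section
pip install -r requirements_dev.txt

# run the whole test suite from the client-py root folder
pytest .

# or run a single test module (hypothetical file name)
pytest tests/test_session.py
```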
- -**Notice** Some tests need docker to be started on your system as a test instance is started in a docker container using [testcontainers](https://testcontainers-python.readthedocs.io/en/latest/index.html). - - - -### Futher Tools - -[black](https://pypi.org/project/black/) and [flake8](https://pypi.org/project/flake8/) are installed for autoformatting and linting. -Both can be run by `black .` or `flake8 .` respectively. - - - -## Releasing - -To do a release just ensure that you have the right set of generated thrift files. -Then run linting and auto-formatting. -Then, ensure that all tests work (via `pytest .`). -Then you are good to go to do a release! - - - -### Preparing your environment - -First, install all necessary dev dependencies via `pip install -r requirements_dev.txt`. - - - -### Doing the Release - -There is a convenient script `release.sh` to do all steps for a release. -Namely, these are - -* Remove all transient directories from last release (if exists) -* (Re-)generate all generated sources via mvn -* Run Linting (flake8) -* Run Tests via pytest -* Build -* Release to pypi - diff --git a/src/UserGuide/V1.3.0-2/API/Programming-Rust-Native-API.md b/src/UserGuide/V1.3.0-2/API/Programming-Rust-Native-API.md deleted file mode 100644 index a5bbacffd..000000000 --- a/src/UserGuide/V1.3.0-2/API/Programming-Rust-Native-API.md +++ /dev/null @@ -1,198 +0,0 @@ - - - -# Rust Native API Native API - -IoTDB uses Thrift as a cross language RPC framework, so access to IoTDB can be achieved through the interface provided by Thrift. This document will introduce how to generate a native Rust interface that can access IoTDB. - - -## Dependents - - * JDK >= 1.8 - * Rust >= 1.0.0 - * thrift 0.14.1 - * Linux、Macos or like unix - * Windows+bash - -Thrift (0.14.1 or higher) must be installed to compile Thrift files into Rust code. The following is the official installation tutorial, and in the end, you should receive a Thrift executable file. - -``` -http://thrift.apache.org/docs/install/ -``` - - -## Compile the Thrift library and generate the Rust native interface - -1. Find the pom.xml file in the root directory of the IoTDB source code folder. -2. Open the pom.xml file and find the following content: - -```xml - - generate-thrift-sources-java - generate-sources - - compile - - - java - ${thrift.exec.absolute.path} - ${basedir}/src/main/thrift - - -``` -3. Referring to this setting, add the following content to the pom.xml file to generate the native interface for Rust: - -```xml - - generate-thrift-sources-rust - generate-sources - - compile - - - rs - ${thrift.exec.absolute.path} - ${basedir}/src/main/thrift - **/common.thrift,**/client.thrift - ${project.build.directory}/generated-sources-rust - - -``` - - -4. In the root directory of the IoTDB source code folder,run `mvn clean generate-sources`, - -This command will automatically delete the files in `iotdb/iotdb-protocol/thrift/target` and `iotdb/iotdb-protocol/thrift-commons/target`, and repopulate the folder with the newly generated throttle file. 
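As a quick reference, the regeneration step described above can be summarized as the following shell sketch; the output locations are the ones listed in the next section:

```shell
# run from the root directory of the IoTDB source code folder
mvn clean generate-sources

# afterwards the regenerated Rust sources are placed under
#   iotdb/iotdb-protocol/thrift/target/generated-sources-rust/
#   iotdb/iotdb-protocol/thrift-commons/target/generated-sources-rust/
```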
## Using the Rust native interface

Copy `iotdb/iotdb-protocol/thrift/target/generated-sources-rust/` and `iotdb/iotdb-protocol/thrift-commons/target/generated-sources-rust/` into your project.

## rpc interface

```
// open a session
TSOpenSessionResp openSession(1:TSOpenSessionReq req);

// close a session
TSStatus closeSession(1:TSCloseSessionReq req);

// run an SQL statement in batch
TSExecuteStatementResp executeStatement(1:TSExecuteStatementReq req);

// execute SQL statement in batch
TSStatus executeBatchStatement(1:TSExecuteBatchStatementReq req);

// execute query SQL statement
TSExecuteStatementResp executeQueryStatement(1:TSExecuteStatementReq req);

// execute insert, delete and update SQL statement
TSExecuteStatementResp executeUpdateStatement(1:TSExecuteStatementReq req);

// fetch next query result
TSFetchResultsResp fetchResults(1:TSFetchResultsReq req)

// fetch meta data
TSFetchMetadataResp fetchMetadata(1:TSFetchMetadataReq req)

// cancel a query
TSStatus cancelOperation(1:TSCancelOperationReq req);

// close a query dataset
TSStatus closeOperation(1:TSCloseOperationReq req);

// get time zone
TSGetTimeZoneResp getTimeZone(1:i64 sessionId);

// set time zone
TSStatus setTimeZone(1:TSSetTimeZoneReq req);

// get server's properties
ServerProperties getProperties();

// CREATE DATABASE
TSStatus setStorageGroup(1:i64 sessionId, 2:string storageGroup);

// create timeseries
TSStatus createTimeseries(1:TSCreateTimeseriesReq req);

// create multi timeseries
TSStatus createMultiTimeseries(1:TSCreateMultiTimeseriesReq req);

// delete timeseries
TSStatus deleteTimeseries(1:i64 sessionId, 2:list<string> path)

// delete storage groups
TSStatus deleteStorageGroups(1:i64 sessionId, 2:list<string> storageGroup);

// insert record
TSStatus insertRecord(1:TSInsertRecordReq req);

// insert record in string format
TSStatus insertStringRecord(1:TSInsertStringRecordReq req);

// insert tablet
TSStatus insertTablet(1:TSInsertTabletReq req);

// insert tablets in batch
TSStatus insertTablets(1:TSInsertTabletsReq req);

// insert records in batch
TSStatus insertRecords(1:TSInsertRecordsReq req);

// insert records of one device
TSStatus insertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req);

// insert records in batch as string format
TSStatus insertStringRecords(1:TSInsertStringRecordsReq req);

// test the latency of insert tablet,caution:no data will be inserted, only for test latency
TSStatus testInsertTablet(1:TSInsertTabletReq req);

// test the latency of insert tablets,caution:no data will be inserted, only for test latency
TSStatus testInsertTablets(1:TSInsertTabletsReq req);

// test the latency of insert record,caution:no data will be inserted, only for test latency
TSStatus testInsertRecord(1:TSInsertRecordReq req);

// test the latency of insert record in string format,caution:no data will be inserted, only for test latency
TSStatus testInsertStringRecord(1:TSInsertStringRecordReq req);

// test the latency of insert records,caution:no data will be inserted, only for test latency
TSStatus testInsertRecords(1:TSInsertRecordsReq req);

// test the latency of insert records of one device,caution:no data will be inserted, only for test latency
TSStatus testInsertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req);

// test the latency of insert records in string format,caution:no data will be inserted, only for test latency
TSStatus 
testInsertStringRecords(1:TSInsertStringRecordsReq req); - -// delete data -TSStatus deleteData(1:TSDeleteDataReq req); - -// execute raw data query -TSExecuteStatementResp executeRawDataQuery(1:TSRawDataQueryReq req); - -// request a statement id from server -i64 requestStatementId(1:i64 sessionId); -``` diff --git a/src/UserGuide/V1.3.0-2/API/RestServiceV1.md b/src/UserGuide/V1.3.0-2/API/RestServiceV1.md deleted file mode 100644 index c110b527b..000000000 --- a/src/UserGuide/V1.3.0-2/API/RestServiceV1.md +++ /dev/null @@ -1,924 +0,0 @@ - - -# RESTful API V1(Not Recommend) -IoTDB's RESTful services can be used for query, write, and management operations, using the OpenAPI standard to define interfaces and generate frameworks. - -## Enable RESTful Services - -RESTful services are disabled by default. - - Find the `conf/conf/iotdb-datanode.properties` file under the IoTDB installation directory and set `enable_rest_service` to `true` to enable the module. - - ```properties - enable_rest_service=true - ``` - -## Authentication -Except the liveness probe API `/ping`, RESTful services use the basic authentication. Each URL request needs to carry `'Authorization': 'Basic ' + base64.encode(username + ':' + password)`. - -The username used in the following examples is: `root`, and password is: `root`. - -And the authorization header is - -``` -Authorization: Basic cm9vdDpyb290 -``` - -- If a user authorized with incorrect username or password, the following error is returned: - - HTTP Status Code:`401` - - HTTP response body: - ```json - { - "code": 600, - "message": "WRONG_LOGIN_PASSWORD_ERROR" - } - ``` - -- If the `Authorization` header is missing,the following error is returned: - - HTTP Status Code:`401` - - HTTP response body: - ```json - { - "code": 603, - "message": "UNINITIALIZED_AUTH_ERROR" - } - ``` - -## Interface - -### ping - -The `/ping` API can be used for service liveness probing. - -Request method: `GET` - -Request path: `http://ip:port/ping` - -The user name used in the example is: root, password: root - -Example request: - -```shell -$ curl http://127.0.0.1:18080/ping -``` - -Response status codes: - -- `200`: The service is alive. -- `503`: The service cannot accept any requests now. - -Response parameters: - -|parameter name |parameter type |parameter describe| -|:--- | :--- | :---| -|code | integer | status code | -| message | string | message | - -Sample response: - -- With HTTP status code `200`: - - ```json - { - "code": 200, - "message": "SUCCESS_STATUS" - } - ``` - -- With HTTP status code `503`: - - ```json - { - "code": 500, - "message": "thrift service is unavailable" - } - ``` - -> `/ping` can be accessed without authorization. - -### query - -The query interface can be used to handle data queries and metadata queries. - -Request method: `POST` - -Request header: `application/json` - -Request path: `http://ip:port/rest/v1/query` - -Parameter Description: - -| parameter name | parameter type | required | parameter description | -|----------------| -------------- | -------- | ------------------------------------------------------------ | -| sql | string | yes | | -| rowLimit | integer | no | The maximum number of rows in the result set that can be returned by a query.
If this parameter is not set, the `rest_query_default_row_size_limit` of the configuration file will be used as the default value.
When the number of rows in the returned result set exceeds the limit, the status code `411` will be returned. | - -Response parameters: - -| parameter name | parameter type | parameter description | -|----------------| -------------- | ------------------------------------------------------------ | -| expressions | array | Array of result set column names for data query, `null` for metadata query | -| columnNames | array | Array of column names for metadata query result set, `null` for data query | -| timestamps | array | Timestamp column, `null` for metadata query | -| values | array | A two-dimensional array, the first dimension has the same length as the result set column name array, and the second dimension array represents a column of the result set | - -**Examples:** - -Tip: Statements like `select * from root.xx.**` are not recommended because those statements may cause OOM. - -**Expression query** - - ```shell - curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4, s3 + 1 from root.sg27 limit 2"}' http://127.0.0.1:18080/rest/v1/query - ```` -Response instance - ```json - { - "expressions": [ - "root.sg27.s3", - "root.sg27.s4", - "root.sg27.s3 + 1" - ], - "columnNames": null, - "timestamps": [ - 1635232143960, - 1635232153960 - ], - "values": [ - [ - 11, - null - ], - [ - false, - true - ], - [ - 12.0, - null - ] - ] - } - ``` - -**Show child paths** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child paths root"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "child paths" - ], - "timestamps": null, - "values": [ - [ - "root.sg27", - "root.sg28" - ] - ] -} -``` - -**Show child nodes** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child nodes root"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "child nodes" - ], - "timestamps": null, - "values": [ - [ - "sg27", - "sg28" - ] - ] -} -``` - -**Show all ttl** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show all ttl"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "database", - "ttl" - ], - "timestamps": null, - "values": [ - [ - "root.sg27", - "root.sg28" - ], - [ - null, - null - ] - ] -} -``` - -**Show ttl** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show ttl on root.sg27"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "database", - "ttl" - ], - "timestamps": null, - "values": [ - [ - "root.sg27" - ], - [ - null - ] - ] -} -``` - -**Show functions** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show functions"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "function name", - "function type", - "class name (UDF)" - ], - "timestamps": null, - "values": [ - [ - "ABS", - "ACOS", - "ASIN", - ... - ], - [ - "built-in UDTF", - "built-in UDTF", - "built-in UDTF", - ... - ], - [ - "org.apache.iotdb.db.query.udf.builtin.UDTFAbs", - "org.apache.iotdb.db.query.udf.builtin.UDTFAcos", - "org.apache.iotdb.db.query.udf.builtin.UDTFAsin", - ... 
- ] - ] -} -``` - -**Show timeseries** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show timeseries"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "timeseries", - "alias", - "database", - "dataType", - "encoding", - "compression", - "tags", - "attributes" - ], - "timestamps": null, - "values": [ - [ - "root.sg27.s3", - "root.sg27.s4", - "root.sg28.s3", - "root.sg28.s4" - ], - [ - null, - null, - null, - null - ], - [ - "root.sg27", - "root.sg27", - "root.sg28", - "root.sg28" - ], - [ - "INT32", - "BOOLEAN", - "INT32", - "BOOLEAN" - ], - [ - "RLE", - "RLE", - "RLE", - "RLE" - ], - [ - "SNAPPY", - "SNAPPY", - "SNAPPY", - "SNAPPY" - ], - [ - null, - null, - null, - null - ], - [ - null, - null, - null, - null - ] - ] -} -``` - -**Show latest timeseries** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show latest timeseries"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "timeseries", - "alias", - "database", - "dataType", - "encoding", - "compression", - "tags", - "attributes" - ], - "timestamps": null, - "values": [ - [ - "root.sg28.s4", - "root.sg27.s4", - "root.sg28.s3", - "root.sg27.s3" - ], - [ - null, - null, - null, - null - ], - [ - "root.sg28", - "root.sg27", - "root.sg28", - "root.sg27" - ], - [ - "BOOLEAN", - "BOOLEAN", - "INT32", - "INT32" - ], - [ - "RLE", - "RLE", - "RLE", - "RLE" - ], - [ - "SNAPPY", - "SNAPPY", - "SNAPPY", - "SNAPPY" - ], - [ - null, - null, - null, - null - ], - [ - null, - null, - null, - null - ] - ] -} -``` - -**Count timeseries** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count timeseries root.**"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "count" - ], - "timestamps": null, - "values": [ - [ - 4 - ] - ] -} -``` - -**Count nodes** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count nodes root.** level=2"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "count" - ], - "timestamps": null, - "values": [ - [ - 4 - ] - ] -} -``` - -**Show devices** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "devices", - "isAligned" - ], - "timestamps": null, - "values": [ - [ - "root.sg27", - "root.sg28" - ], - [ - "false", - "false" - ] - ] -} -``` - -**Show devices with database** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices with database"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "devices", - "database", - "isAligned" - ], - "timestamps": null, - "values": [ - [ - "root.sg27", - "root.sg28" - ], - [ - "root.sg27", - "root.sg28" - ], - [ - "false", - "false" - ] - ] -} -``` - -**List user** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"list user"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "user" - ], - "timestamps": null, - "values": 
[ - [ - "root" - ] - ] -} -``` - -**Aggregation** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": [ - "count(root.sg27.s3)", - "count(root.sg27.s4)" - ], - "columnNames": null, - "timestamps": [ - 0 - ], - "values": [ - [ - 1 - ], - [ - 2 - ] - ] -} -``` - -**Group by level** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.** group by level = 1"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "count(root.sg27.*)", - "count(root.sg28.*)" - ], - "timestamps": null, - "values": [ - [ - 3 - ], - [ - 3 - ] - ] -} -``` - -**Group by** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27 group by([1635232143960,1635232153960),1s)"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": [ - "count(root.sg27.s3)", - "count(root.sg27.s4)" - ], - "columnNames": null, - "timestamps": [ - 1635232143960, - 1635232144960, - 1635232145960, - 1635232146960, - 1635232147960, - 1635232148960, - 1635232149960, - 1635232150960, - 1635232151960, - 1635232152960 - ], - "values": [ - [ - 1, - 0, - 0, - 0, - 0, - 0, - 0, - 0, - 0, - 0 - ], - [ - 1, - 0, - 0, - 0, - 0, - 0, - 0, - 0, - 0, - 0 - ] - ] -} -``` - -**Last** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select last s3 from root.sg27"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "expressions": null, - "columnNames": [ - "timeseries", - "value", - "dataType" - ], - "timestamps": [ - 1635232143960 - ], - "values": [ - [ - "root.sg27.s3" - ], - [ - "11" - ], - [ - "INT32" - ] - ] -} -``` - -**Disable align** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select * from root.sg27 disable align"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "code": 407, - "message": "disable align clauses are not supported." -} -``` - -**Align by device** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(s3) from root.sg27 align by device"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "code": 407, - "message": "align by device clauses are not supported." -} -``` - -**Select into** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4 into root.sg29.s1, root.sg29.s2 from root.sg27"}' http://127.0.0.1:18080/rest/v1/query -``` - -```json -{ - "code": 407, - "message": "select into clauses are not supported." 
-} -``` - -### nonQuery - -Request method: `POST` - -Request header: `application/json` - -Request path: `http://ip:port/rest/v1/nonQuery` - -Parameter Description: - -|parameter name |parameter type |parameter describe| -|:--- | :--- | :---| -| sql | string | query content | - -Example request: -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"CREATE DATABASE root.ln"}' http://127.0.0.1:18080/rest/v1/nonQuery -``` - -Response parameters: - -|parameter name |parameter type |parameter describe| -|:--- | :--- | :---| -| code | integer | status code | -| message | string | message | - -Sample response: -```json -{ - "code": 200, - "message": "SUCCESS_STATUS" -} -``` - - - -### insertTablet - -Request method: `POST` - -Request header: `application/json` - -Request path: `http://ip:port/rest/v1/insertTablet` - -Parameter Description: - -| parameter name |parameter type |is required|parameter describe| -|:---------------| :--- | :---| :---| -| timestamps | array | yes | Time column | -| measurements | array | yes | The name of the measuring point | -| dataTypes | array | yes | The data type | -| values | array | yes | Value columns, the values in each column can be `null` | -| isAligned | boolean | yes | Whether to align the timeseries | -| deviceId | string | yes | Device name | - -Example request: -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"timestamps":[1635232143960,1635232153960],"measurements":["s3","s4"],"dataTypes":["INT32","BOOLEAN"],"values":[[11,null],[false,true]],"isAligned":false,"deviceId":"root.sg27"}' http://127.0.0.1:18080/rest/v1/insertTablet -``` - -Sample response: - -|parameter name |parameter type |parameter describe| -|:--- | :--- | :---| -| code | integer | status code | -| message | string | message | - -Sample response: -```json -{ - "code": 200, - "message": "SUCCESS_STATUS" -} -``` - -## Configuration - -The configuration is located in 'iotdb-datanode.properties'. - -* Set 'enable_rest_service' to 'true' to enable the module, and 'false' to disable the module. By default, this value is' false '. - -```properties -enable_rest_service=true -``` - -* This parameter is valid only when 'enable_REST_service =true'. Set 'rest_service_port' to a number (1025 to 65535) to customize the REST service socket port. By default, the value is 18080. - -```properties -rest_service_port=18080 -``` - -* Set 'enable_swagger' to 'true' to display rest service interface information through swagger, and 'false' to do not display the rest service interface information through the swagger. By default, this value is' false '. - -```properties -enable_swagger=false -``` - -* The maximum number of rows in the result set that can be returned by a query. When the number of rows in the returned result set exceeds the limit, the status code `411` is returned. - -````properties -rest_query_default_row_size_limit=10000 -```` - -* Expiration time for caching customer login information (used to speed up user authentication, in seconds, 8 hours by default) - -```properties -cache_expire=28800 -``` - - -* Maximum number of users stored in the cache (default: 100) - -```properties -cache_max_num=100 -``` - -* Initial cache size (default: 10) - -```properties -cache_init_num=10 -``` - -* REST Service whether to enable SSL configuration, set 'enable_https' to' true 'to enable the module, and set' false 'to disable the module. By default, this value is' false '. 
- -```properties -enable_https=false -``` - -* keyStore location path (optional) - -```properties -key_store_path= -``` - - -* keyStore password (optional) - -```properties -key_store_pwd= -``` - - -* trustStore location path (optional) - -```properties -trust_store_path= -``` - -* trustStore password (optional) - -```properties -trust_store_pwd= -``` - - -* SSL timeout period, in seconds - -```properties -idle_timeout=5000 -``` diff --git a/src/UserGuide/V1.3.0-2/API/RestServiceV2.md b/src/UserGuide/V1.3.0-2/API/RestServiceV2.md deleted file mode 100644 index 04df1f8ec..000000000 --- a/src/UserGuide/V1.3.0-2/API/RestServiceV2.md +++ /dev/null @@ -1,964 +0,0 @@ - - -# RESTful API V2 -IoTDB's RESTful services can be used for query, write, and management operations, using the OpenAPI standard to define interfaces and generate frameworks. - -## Enable RESTful Services - -RESTful services are disabled by default. - - Find the `conf/iotdb-datanode.properties` file under the IoTDB installation directory and set `enable_rest_service` to `true` to enable the module. - - ```properties - enable_rest_service=true - ``` - -## Authentication -Except the liveness probe API `/ping`, RESTful services use the basic authentication. Each URL request needs to carry `'Authorization': 'Basic ' + base64.encode(username + ':' + password)`. - -The username used in the following examples is: `root`, and password is: `root`. - -And the authorization header is - -``` -Authorization: Basic cm9vdDpyb290 -``` - -- If a user authorized with incorrect username or password, the following error is returned: - - HTTP Status Code:`401` - - HTTP response body: - ```json - { - "code": 600, - "message": "WRONG_LOGIN_PASSWORD_ERROR" - } - ``` - -- If the `Authorization` header is missing,the following error is returned: - - HTTP Status Code:`401` - - HTTP response body: - ```json - { - "code": 603, - "message": "UNINITIALIZED_AUTH_ERROR" - } - ``` - -## Interface - -### ping - -The `/ping` API can be used for service liveness probing. - -Request method: `GET` - -Request path: `http://ip:port/ping` - -The user name used in the example is: root, password: root - -Example request: - -```shell -$ curl http://127.0.0.1:18080/ping -``` - -Response status codes: - -- `200`: The service is alive. -- `503`: The service cannot accept any requests now. - -Response parameters: - -|parameter name |parameter type |parameter describe| -|:--- | :--- | :---| -|code | integer | status code | -| message | string | message | - -Sample response: - -- With HTTP status code `200`: - - ```json - { - "code": 200, - "message": "SUCCESS_STATUS" - } - ``` - -- With HTTP status code `503`: - - ```json - { - "code": 500, - "message": "thrift service is unavailable" - } - ``` - -> `/ping` can be accessed without authorization. - -### query - -The query interface can be used to handle data queries and metadata queries. - -Request method: `POST` - -Request header: `application/json` - -Request path: `http://ip:port/rest/v2/query` - -Parameter Description: - -| parameter name | parameter type | required | parameter description | -|----------------| -------------- | -------- | ------------------------------------------------------------ | -| sql | string | yes | | -| row_limit | integer | no | The maximum number of rows in the result set that can be returned by a query.
If this parameter is not set, the `rest_query_default_row_size_limit` of the configuration file will be used as the default value.
When the number of rows in the returned result set exceeds the limit, the status code `411` will be returned. | - -Response parameters: - -| parameter name | parameter type | parameter description | -|----------------| -------------- | ------------------------------------------------------------ | -| expressions | array | Array of result set column names for data query, `null` for metadata query | -| column_names | array | Array of column names for metadata query result set, `null` for data query | -| timestamps | array | Timestamp column, `null` for metadata query | -| values | array | A two-dimensional array, the first dimension has the same length as the result set column name array, and the second dimension array represents a column of the result set | - -**Examples:** - -Tip: Statements like `select * from root.xx.**` are not recommended because those statements may cause OOM. - -**Expression query** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4, s3 + 1 from root.sg27 limit 2"}' http://127.0.0.1:18080/rest/v2/query -```` - -```json -{ - "expressions": [ - "root.sg27.s3", - "root.sg27.s4", - "root.sg27.s3 + 1" - ], - "column_names": null, - "timestamps": [ - 1635232143960, - 1635232153960 - ], - "values": [ - [ - 11, - null - ], - [ - false, - true - ], - [ - 12.0, - null - ] - ] -} -``` - -**Show child paths** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child paths root"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "child paths" - ], - "timestamps": null, - "values": [ - [ - "root.sg27", - "root.sg28" - ] - ] -} -``` - -**Show child nodes** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child nodes root"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "child nodes" - ], - "timestamps": null, - "values": [ - [ - "sg27", - "sg28" - ] - ] -} -``` - -**Show all ttl** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show all ttl"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "database", - "ttl" - ], - "timestamps": null, - "values": [ - [ - "root.sg27", - "root.sg28" - ], - [ - null, - null - ] - ] -} -``` - -**Show ttl** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show ttl on root.sg27"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "database", - "ttl" - ], - "timestamps": null, - "values": [ - [ - "root.sg27" - ], - [ - null - ] - ] -} -``` - -**Show functions** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show functions"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "function name", - "function type", - "class name (UDF)" - ], - "timestamps": null, - "values": [ - [ - "ABS", - "ACOS", - "ASIN", - ... - ], - [ - "built-in UDTF", - "built-in UDTF", - "built-in UDTF", - ... - ], - [ - "org.apache.iotdb.db.query.udf.builtin.UDTFAbs", - "org.apache.iotdb.db.query.udf.builtin.UDTFAcos", - "org.apache.iotdb.db.query.udf.builtin.UDTFAsin", - ... 
- ] - ] -} -``` - -**Show timeseries** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show timeseries"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "timeseries", - "alias", - "database", - "dataType", - "encoding", - "compression", - "tags", - "attributes" - ], - "timestamps": null, - "values": [ - [ - "root.sg27.s3", - "root.sg27.s4", - "root.sg28.s3", - "root.sg28.s4" - ], - [ - null, - null, - null, - null - ], - [ - "root.sg27", - "root.sg27", - "root.sg28", - "root.sg28" - ], - [ - "INT32", - "BOOLEAN", - "INT32", - "BOOLEAN" - ], - [ - "RLE", - "RLE", - "RLE", - "RLE" - ], - [ - "SNAPPY", - "SNAPPY", - "SNAPPY", - "SNAPPY" - ], - [ - null, - null, - null, - null - ], - [ - null, - null, - null, - null - ] - ] -} -``` - -**Show latest timeseries** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show latest timeseries"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "timeseries", - "alias", - "database", - "dataType", - "encoding", - "compression", - "tags", - "attributes" - ], - "timestamps": null, - "values": [ - [ - "root.sg28.s4", - "root.sg27.s4", - "root.sg28.s3", - "root.sg27.s3" - ], - [ - null, - null, - null, - null - ], - [ - "root.sg28", - "root.sg27", - "root.sg28", - "root.sg27" - ], - [ - "BOOLEAN", - "BOOLEAN", - "INT32", - "INT32" - ], - [ - "RLE", - "RLE", - "RLE", - "RLE" - ], - [ - "SNAPPY", - "SNAPPY", - "SNAPPY", - "SNAPPY" - ], - [ - null, - null, - null, - null - ], - [ - null, - null, - null, - null - ] - ] -} -``` - -**Count timeseries** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count timeseries root.**"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "count" - ], - "timestamps": null, - "values": [ - [ - 4 - ] - ] -} -``` - -**Count nodes** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count nodes root.** level=2"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "count" - ], - "timestamps": null, - "values": [ - [ - 4 - ] - ] -} -``` - -**Show devices** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "devices", - "isAligned" - ], - "timestamps": null, - "values": [ - [ - "root.sg27", - "root.sg28" - ], - [ - "false", - "false" - ] - ] -} -``` - -**Show devices with database** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices with database"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "devices", - "database", - "isAligned" - ], - "timestamps": null, - "values": [ - [ - "root.sg27", - "root.sg28" - ], - [ - "root.sg27", - "root.sg28" - ], - [ - "false", - "false" - ] - ] -} -``` - -**List user** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"list user"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "user" - ], - "timestamps": null, - 
"values": [ - [ - "root" - ] - ] -} -``` - -**Aggregation** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": [ - "count(root.sg27.s3)", - "count(root.sg27.s4)" - ], - "column_names": null, - "timestamps": [ - 0 - ], - "values": [ - [ - 1 - ], - [ - 2 - ] - ] -} -``` - -**Group by level** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.** group by level = 1"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "count(root.sg27.*)", - "count(root.sg28.*)" - ], - "timestamps": null, - "values": [ - [ - 3 - ], - [ - 3 - ] - ] -} -``` - -**Group by** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27 group by([1635232143960,1635232153960),1s)"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": [ - "count(root.sg27.s3)", - "count(root.sg27.s4)" - ], - "column_names": null, - "timestamps": [ - 1635232143960, - 1635232144960, - 1635232145960, - 1635232146960, - 1635232147960, - 1635232148960, - 1635232149960, - 1635232150960, - 1635232151960, - 1635232152960 - ], - "values": [ - [ - 1, - 0, - 0, - 0, - 0, - 0, - 0, - 0, - 0, - 0 - ], - [ - 1, - 0, - 0, - 0, - 0, - 0, - 0, - 0, - 0, - 0 - ] - ] -} -``` - -**Last** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select last s3 from root.sg27"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "expressions": null, - "column_names": [ - "timeseries", - "value", - "dataType" - ], - "timestamps": [ - 1635232143960 - ], - "values": [ - [ - "root.sg27.s3" - ], - [ - "11" - ], - [ - "INT32" - ] - ] -} -``` - -**Disable align** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select * from root.sg27 disable align"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "code": 407, - "message": "disable align clauses are not supported." -} -``` - -**Align by device** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(s3) from root.sg27 align by device"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "code": 407, - "message": "align by device clauses are not supported." -} -``` - -**Select into** - -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4 into root.sg29.s1, root.sg29.s2 from root.sg27"}' http://127.0.0.1:18080/rest/v2/query -``` - -```json -{ - "code": 407, - "message": "select into clauses are not supported." 
-} -``` - -### nonQuery - -Request method: `POST` - -Request header: `application/json` - -Request path: `http://ip:port/rest/v2/nonQuery` - -Parameter Description: - -|parameter name |parameter type |parameter describe| -|:--- | :--- | :---| -| sql | string | query content | - -Example request: -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"CREATE DATABASE root.ln"}' http://127.0.0.1:18080/rest/v2/nonQuery -``` - -Response parameters: - -|parameter name |parameter type |parameter describe| -|:--- | :--- | :---| -| code | integer | status code | -| message | string | message | - -Sample response: -```json -{ - "code": 200, - "message": "SUCCESS_STATUS" -} -``` - - - -### insertTablet - -Request method: `POST` - -Request header: `application/json` - -Request path: `http://ip:port/rest/v2/insertTablet` - -Parameter Description: - -| parameter name |parameter type |is required|parameter describe| -|:---------------| :--- | :---| :---| -| timestamps | array | yes | Time column | -| measurements | array | yes | The name of the measuring point | -| data_types | array | yes | The data type | -| values | array | yes | Value columns, the values in each column can be `null` | -| is_aligned | boolean | yes | Whether to align the timeseries | -| device | string | yes | Device name | - -Example request: -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"timestamps":[1635232143960,1635232153960],"measurements":["s3","s4"],"data_types":["INT32","BOOLEAN"],"values":[[11,null],[false,true]],"is_aligned":false,"device":"root.sg27"}' http://127.0.0.1:18080/rest/v2/insertTablet -``` - -Sample response: - -|parameter name |parameter type |parameter describe| -|:--- | :--- | :---| -| code | integer | status code | -| message | string | message | - -Sample response: -```json -{ - "code": 200, - "message": "SUCCESS_STATUS" -} -``` - -### insertRecords - -Request method: `POST` - -Request header: `application/json` - -Request path: `http://ip:port/rest/v2/insertRecords` - -Parameter Description: - -| parameter name |parameter type |is required|parameter describe| -|:------------------| :--- | :---| :---| -| timestamps | array | yes | Time column | -| measurements_list | array | yes | The name of the measuring point | -| data_types_list | array | yes | The data type | -| values_list | array | yes | Value columns, the values in each column can be `null` | -| devices | string | yes | Device name | -| is_aligned | boolean | yes | Whether to align the timeseries | - -Example request: -```shell -curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"timestamps":[1635232113960,1635232151960,1635232143960,1635232143960],"measurements_list":[["s33","s44"],["s55","s66"],["s77","s88"],["s771","s881"]],"data_types_list":[["INT32","INT64"],["FLOAT","DOUBLE"],["FLOAT","DOUBLE"],["BOOLEAN","TEXT"]],"values_list":[[1,11],[2.1,2],[4,6],[false,"cccccc"]],"is_aligned":false,"devices":["root.s1","root.s1","root.s1","root.s3"]}' http://127.0.0.1:18080/rest/v2/insertRecords -``` - -Sample response: - -|parameter name |parameter type |parameter describe| -|:--- | :--- | :---| -| code | integer | status code | -| message | string | message | - -Sample response: -```json -{ - "code": 200, - "message": "SUCCESS_STATUS" -} -``` - - -## Configuration - -The configuration is located in 'iotdb-datanode.properties'. 
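
Before walking through the individual configuration items, the endpoints documented above can be tied together in a short script. The sketch below is illustrative only: it assumes the REST service is enabled on the default port 18080 with the `root`/`root` credentials used in the curl examples, and it uses the third-party `requests` library rather than any official client.

```python
# Minimal sketch: exercising the REST v2 endpoints documented above.
# Assumes the REST service is enabled, listens on the default port 18080,
# and accepts the root/root credentials from the curl examples.
import requests

BASE = "http://127.0.0.1:18080/rest/v2"
AUTH = ("root", "root")  # sent as HTTP Basic auth, same as 'Authorization: Basic cm9vdDpyb290'

# Write one tablet (same payload shape as the insertTablet example).
tablet = {
    "timestamps": [1635232143960, 1635232153960],
    "measurements": ["s3", "s4"],
    "data_types": ["INT32", "BOOLEAN"],
    "values": [[11, None], [False, True]],
    "is_aligned": False,
    "device": "root.sg27",
}
print(requests.post(f"{BASE}/insertTablet", json=tablet, auth=AUTH).json())

# Run a data query and walk the columnar response described above:
# values[i] is the column that corresponds to expressions[i].
resp = requests.post(f"{BASE}/query",
                     json={"sql": "select s3, s4 from root.sg27 limit 2"},
                     auth=AUTH).json()
for i, expr in enumerate(resp.get("expressions") or []):
    print(expr, list(zip(resp["timestamps"], resp["values"][i])))

# Execute a non-query statement.
print(requests.post(f"{BASE}/nonQuery", json={"sql": "CREATE DATABASE root.ln"}, auth=AUTH).json())
```
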
- -* Set 'enable_rest_service' to 'true' to enable the module, and 'false' to disable the module. By default, this value is' false '. - -```properties -enable_rest_service=true -``` - -* This parameter is valid only when 'enable_REST_service =true'. Set 'rest_service_port' to a number (1025 to 65535) to customize the REST service socket port. By default, the value is 18080. - -```properties -rest_service_port=18080 -``` - -* Set 'enable_swagger' to 'true' to display rest service interface information through swagger, and 'false' to do not display the rest service interface information through the swagger. By default, this value is' false '. - -```properties -enable_swagger=false -``` - -* The maximum number of rows in the result set that can be returned by a query. When the number of rows in the returned result set exceeds the limit, the status code `411` is returned. - -````properties -rest_query_default_row_size_limit=10000 -```` - -* Expiration time for caching customer login information (used to speed up user authentication, in seconds, 8 hours by default) - -```properties -cache_expire=28800 -``` - - -* Maximum number of users stored in the cache (default: 100) - -```properties -cache_max_num=100 -``` - -* Initial cache size (default: 10) - -```properties -cache_init_num=10 -``` - -* REST Service whether to enable SSL configuration, set 'enable_https' to' true 'to enable the module, and set' false 'to disable the module. By default, this value is' false '. - -```properties -enable_https=false -``` - -* keyStore location path (optional) - -```properties -key_store_path= -``` - - -* keyStore password (optional) - -```properties -key_store_pwd= -``` - - -* trustStore location path (optional) - -```properties -trust_store_path= -``` - -* trustStore password (optional) - -```properties -trust_store_pwd= -``` - - -* SSL timeout period, in seconds - -```properties -idle_timeout=5000 -``` diff --git a/src/UserGuide/V1.3.0-2/Basic-Concept/Cluster-Concept.md b/src/UserGuide/V1.3.0-2/Basic-Concept/Cluster-Concept.md deleted file mode 100644 index 4c86aa2ea..000000000 --- a/src/UserGuide/V1.3.0-2/Basic-Concept/Cluster-Concept.md +++ /dev/null @@ -1,59 +0,0 @@ - - -# Cluster-related Concepts -The figure below illustrates a typical IoTDB 3C3D1A cluster deployment mode, comprising 3 ConfigNodes, 3 DataNodes, and 1 AINode: - - -This deployment involves several key concepts that users commonly encounter when working with IoTDB clusters, including: -- **Nodes** (ConfigNode, DataNode, AINode); -- **Slots** (SchemaSlot, DataSlot); -- **Regions** (SchemaRegion, DataRegion); -- **Replica Groups**. - -The following sections will provide a detailed introduction to these concepts. - -## Nodes - -An IoTDB cluster consists of three types of nodes (processes): **ConfigNode** (the main node), **DataNode**, and **AINode**, as detailed below: -- **ConfigNode:** ConfigNodes store cluster configurations, database metadata, the routing information of time series' schema and data. They also monitor cluster nodes and conduct load balancing. All ConfigNodes maintain full mutual backups, as shown in the figure with ConfigNode-1, ConfigNode-2, and ConfigNode-3. ConfigNodes do not directly handle client read or write requests. Instead, they guide the distribution of time series' schema and data within the cluster using a series of [load balancing algorithms](https://iotdb.apache.org/UserGuide/latest/Technical-Insider/Cluster-data-partitioning.html). 
-- **DataNode:** DataNodes are responsible for reading and writing time series' schema and data. Each DataNode can accept client read and write requests and provide corresponding services, as illustrated with DataNode-1, DataNode-2, and DataNode-3 in the above figure. When a DataNode receives client requests, it can process them directly or forward them if it has the relevant routing information cached locally. Otherwise, it queries the ConfigNode for routing details and caches the information to improve the efficiency of subsequent requests. -- **AINode:** AINodes interact with ConfigNodes and DataNodes to extend IoTDB's capabilities for data intelligence analysis on time series data. They support registering pre-trained machine learning models from external sources and performing time series analysis tasks using simple SQL statements on specified data. This process integrates model creation, management, and inference within the database engine. Currently, the system provides built-in algorithms or self-training models for common time series analysis scenarios, such as forecasting and anomaly detection. - -## Slots - -IoTDB divides time series' schema and data into smaller, more manageable units called **slots**. Slots are logical entities, and in an IoTDB cluster, the **SchemaSlots** and **DataSlots** are defined as follows: -- **SchemaSlot:** A SchemaSlot represents a subset of the time series' schema collection. The total number of SchemaSlots is fixed, with a default value of 1000. IoTDB uses a hashing algorithm to evenly distribute all devices across these SchemaSlots. -- **DataSlot:** A DataSlot represents a subset of the time series' data collection. Based on the SchemaSlots, the data for corresponding devices is further divided into DataSlots by a fixed time interval. The default time interval for a DataSlot is 7 days. - -## Region - -In IoTDB, time series' schema and data are replicated across DataNodes to ensure high availability in the cluster. However, replicating data at the slot level can increase management complexity and reduce write throughput. To address this, IoTDB introduces the concept of **Region**, which groups SchemaSlots and DataSlots into **SchemaRegions** and **DataRegions** respectively. Replication is then performed at the Region level. The definitions of SchemaRegion and DataRegion are as follows: -- **SchemaRegion**: A SchemaRegion is the basic unit for storing and replicating time series' schema. All SchemaSlots in a database are evenly distributed across the database's SchemaRegions. SchemaRegions with the same RegionID are replicas of each other. For example, in the figure above, SchemaRegion-1 has three replicas located on DataNode-1, DataNode-2, and DataNode-3. -- **DataRegion**: A DataRegion is the basic unit for storing and replicating time series' data. All DataSlots in a database are evenly distributed across the database's DataRegions. DataRegions with the same RegionID are replicas of each other. For instance, in the figure above, DataRegion-2 has two replicas located on DataNode-1 and DataNode-2. - -## Replica Groups -Region replicas are critical for the fault tolerance of the cluster. Each Region's replicas are organized into **replica groups**, where the replicas are assigned roles as either **leader** or **follower**, working together to provide read and write services. 
Recommended replica group configurations under different architectures are as follows: - -| Category | Parameter | Single-node Recommended Configuration | Distributed Recommended Configuration | -|:------------:|:-----------------------:|:------------------------------------:|:-------------------------------------:| -| Schema | `schema_replication_factor` | 1 | 3 | -| Data | `data_replication_factor` | 1 | 2 | \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Basic-Concept/Cluster-data-partitioning.md b/src/UserGuide/V1.3.0-2/Basic-Concept/Cluster-data-partitioning.md deleted file mode 100644 index 479f95527..000000000 --- a/src/UserGuide/V1.3.0-2/Basic-Concept/Cluster-data-partitioning.md +++ /dev/null @@ -1,110 +0,0 @@ - - -# Load Balance -This document introduces the partitioning strategies and load balance strategies in IoTDB. According to the characteristics of time series data, IoTDB partitions them by series and time dimensions. Combining a series partition with a time partition creates a partition, the unit of division. To enhance throughput and reduce management costs, these partitions are evenly allocated to RegionGroups, which serve as the unit of replication. The RegionGroup's Regions then determine the storage location, with the leader Region managing the primary load. During this process, the Region placement strategy determines which nodes will host the replicas, while the leader selection strategy designates which Region will act as the leader. - -## Partitioning Strategy & Partition Allocation -IoTDB implements tailored partitioning algorithms for time series data. Building on this foundation, the partition information cached on both ConfigNodes and DataNodes is not only manageable in size but also clearly differentiated between hot and cold. Subsequently, balanced partitions are evenly allocated across the cluster's RegionGroups to achieve storage balance. - -### Partitioning Strategy -IoTDB maps each sensor in the production environment to a time series. The time series are then partitioned using the series partitioning algorithm to manage their schema, and combined with the time partitioning algorithm to manage their data. The following figure illustrates how IoTDB partitions time series data. - - - -#### Partitioning Algorithm -Because numerous devices and sensors are commonly deployed in production environments, IoTDB employs a series partitioning algorithm to ensure the size of partition information is manageable. Since the generated time series associated with timestamps, IoTDB uses a time partioning algorithm to clearly distinguish between hot and cold partitions. - -##### Series Partitioning Algorithm -By default, IoTDB limits the number of series partitions to 1000 and configures the series partitioning algorithm to use a hash partitioning algorithm. This leads to the following outcomes: -+ Since the number of series partitions is a fixed constant, the mapping between series and series partitions remains stable. As a result, IoTDB does not require frequent data migrations. -+ The load across series partitions is relatively balanced because the number of series partitions is much smaller than the number of sensors deployed in the production environment. - -Furthermore, if a more accurate estimate of the actual load in the production environment is available, the series partitioning algorithm can be configured to use a customized hash partitioning or a list partitioning to achieve a more uniform load distribution across all series partitions. 
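
As an illustration of the stable device-to-partition mapping described above, the sketch below hashes device paths into a fixed set of 1000 series partitions. The CRC32 hash and the generated device names are stand-ins chosen for the example; the actual hash function used by IoTDB is an implementation detail not specified here.

```python
# Illustrative sketch of hash-based series partitioning (not IoTDB's actual hash
# function): every device maps to one of a fixed number of series partitions, so
# the device-to-partition mapping is stable and adding devices needs no migration.
import zlib
from collections import Counter

SERIES_PARTITION_NUM = 1000  # default number of series partitions

def series_partition(device_id: str) -> int:
    # crc32 is only a deterministic stand-in for the real hash function.
    return zlib.crc32(device_id.encode("utf-8")) % SERIES_PARTITION_NUM

devices = [f"root.ln.wf01.wt{i:04d}" for i in range(1, 5001)]
counts = Counter(series_partition(d) for d in devices)

# With many more devices than partitions, each partition holds a similar number
# of devices, which keeps the load across series partitions roughly balanced.
print("partitions used:", len(counts),
      "min/max devices per partition:", min(counts.values()), max(counts.values()))
```
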
- -##### Time Partitioning Algorithm -The time partitioning algorithm converts a given timestamp to the corresponding time partition by - -$$\left\lfloor\frac{\text{Timestamp}-\text{StartTimestamp}}{\text{TimePartitionInterval}}\right\rfloor.$$ - -In this equation, both $\text{StartTimestamp}$ and $\text{TimePartitionInterval}$ are configurable parameters to accommodate various production environments. The $\text{StartTimestamp}$ represents the starting time of the first time partition, while the $\text{TimePartitionInterval}$ defines the duration of each time partition. By default, the $\text{TimePartitionInterval}$ is set to seven day. - -#### Schema Partitioning -Since the series partitioning algorithm evenly partitions the time series, each series partition corresponds to a schema partition. These schema partitions are then evenly allocated across the SchemaRegionGroups to achieve a balanced schema distribution. - -#### Data Partitioning -Combining a series partition with a time partition creates a data partition. Since the series partitioning algorithm evenly partitions the time series, the load of data partitions within a specified time partition remains balanced. These data partitions are then evenly allocated across the DataRegionGroups to achieve balanced data distribution. - -### Partition Allocation -IoTDB uses RegionGroups to enable elastic storage of time series, with the number of RegionGroups in the cluster determined by the total resources available across all DataNodes. Since the number of RegionGroups is dynamic, IoTDB can easily scale out. Both the SchemaRegionGroup and DataRegionGroup follow the same partition allocation algorithm, which evenly splits all series partitions. The following figure demonstrates the partition allocation process, where the dynamic RegionGroups match the variously expending time series and cluster. - - - -#### RegionGroup Expansion -The number of RegionGroups is given by - -$$\text{RegionGroupNumber}=\left\lfloor\frac{\sum_{i=1}^{DataNodeNumber}\text{RegionNumber}_i}{\text{ReplicationFactor}}\right\rfloor.$$ - -In this equation, $\text{RegionNumber}_i$ represents the number of Regions expected to be hosted on the $i$-th DataNode, while $\text{ReplicationFactor}$ denotes the number of Regions within each RegionGroup. Both $\text{RegionNumber}_i$ and $\text{ReplicationFactor}$ are configurable parameters. The $\text{RegionNumber}_i$ can be determined by the available hardware resources---such as CPU cores, memory sizes, etc.---on the $i$-th DataNode to accommodate different physical servers. The $\text{ReplicationFactor}$ can be adjusted to ensure diverse levels of fault tolerance. - -#### Allocation Algorithm -Both the SchemaRegionGroup and the DataRegionGroup follow the same allocation algorithm--splitting all series partitions evenly. As a result, each SchemaRegionGroup holds the same number of schema partitions, ensuring balanced schema storage. Similarly, for each time partition, each DataRegionGroup acquires the data partitions corresponding to the series partitions it holds. Consequently, the data partitions within a time partition are evenly distributed across all DataRegionGroups, ensuring balanced data storage in each time partition. - -Notably, IoTDB effectively leverages the characteristics of time series data. When the TTL (Time to Live) is configured, IoTDB enables migration-free elastic storage for time series data. This feature facilitates cluster expansion while minimizing the impact on online operations. 
The figures above illustrate an instance of this feature: newborn data partitions are evenly allocated to each DataRegion, and expired data are automatically archived. As a result, the cluster's storage will eventually remain balanced. - -## Balance Strategy -To enhance the cluster's availability and performance, IoTDB employs sophisticated storage load and computing load balance algorithms. - -### Storage Load Balance -The number of Regions held by a DataNode reflects its storage load. If the difference in the number of Regions across DataNodes is relatively large, the DataNode with more Regions is likely to become a storage bottleneck. Although a straightforward Round Robin placement algorithm can achieve storage balance by ensuring that each DataNode hosts an equal number of Regions, it compromises the cluster's fault tolerance, as illustrated below: - - - -+ Assume the cluster has 4 DataNodes, 4 RegionGroups and a replication factor of 2. -+ Place RegionGroup $r_1$'s 2 Regions on DataNodes $n_1$ and $n_2$. -+ Place RegionGroup $r_2$'s 2 Regions on DataNodes $n_3$ and $n_4$. -+ Place RegionGroup $r_3$'s 2 Regions on DataNodes $n_1$ and $n_3$. -+ Place RegionGroup $r_4$'s 2 Regions on DataNodes $n_2$ and $n_4$. - -In this scenario, if DataNode $n_2$ fails, the load previously handled by DataNode $n_2$ would be transferred solely to DataNode $n_1$, potentially overloading it. - -To address this issue, IoTDB employs a Region placement algorithm that not only evenly distributes Regions across all DataNodes but also ensures that each DataNode can offload its storage to sufficient other DataNodes in the event of a failure. As a result, the cluster achieves balanced storage distribution and a high level of fault tolerance, ensuring its availability. - -### Computing Load Balance -The number of leader Regions held by a DataNode reflects its Computing load. If the difference in the number of leaders across DataNodes is relatively large, the DataNode with more leaders is likely to become a Computing bottleneck. If the leader selection process is conducted using a transparent Greedy algorithm, the result may be an unbalanced leader distribution when the Regions are fault-tolerantly placed, as demonstrated below: - - - -+ Assume the cluster has 4 DataNodes, 4 RegionGroups and a replication factor of 2. -+ Select RegionGroup $r_5$'s Region on DataNode $n_5$ as the leader. -+ Select RegionGroup $r_6$'s Region on DataNode $n_7$ as the leader. -+ Select RegionGroup $r_7$'s Region on DataNode $n_7$ as the leader. -+ Select RegionGroup $r_8$'s Region on DataNode $n_8$ as the leader. - -Please note that all the above steps strictly follow the Greedy algorithm. However, by Step 3, selecting the leader of RegionGroup $r_7$ on either DataNode $n_5$ or $n_7$ results in an unbalanced leader distribution. The rationale is that each greedy step lacks a global perspective, leading to a locally optimal solution. - -To address this issue, IoTDB employs a leader selection algorithm that can consistently balance the cluster's leader distribution. Consequently, the cluster achieves balanced Computing load distribution, ensuring its performance. 
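
The effect described above can be checked by simply tallying leaders per DataNode. In the sketch below, the greedy assignment follows the steps listed above, while the balanced assignment is one hypothetical alternative used for comparison; neither is the output of IoTDB's actual selection algorithm.

```python
# Tally leaders per DataNode for the greedy selection described above and for one
# hypothetical balanced alternative. The balanced assignment is an illustrative
# assumption, not the result of IoTDB's leader selection algorithm.
from collections import Counter

def leader_counts(assignment):
    """assignment maps RegionGroup -> DataNode chosen as its leader."""
    return Counter(assignment.values())

greedy = {"r5": "n5", "r6": "n7", "r7": "n7", "r8": "n8"}    # steps listed above
balanced = {"r5": "n5", "r6": "n6", "r7": "n7", "r8": "n8"}  # one globally balanced choice (assumed)

print("greedy  :", dict(leader_counts(greedy)))    # n7 carries two leaders, n6 carries none
print("balanced:", dict(leader_counts(balanced)))  # every DataNode carries exactly one leader
```
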
- -## Source Code -+ [Data Partitioning](https://github.com/apache/iotdb/tree/master/iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/partition) -+ [Partition Allocation](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/partition) -+ [Region Placement](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/region) -+ [Leader Selection](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/router/leader) \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Basic-Concept/Data-Model-and-Terminology.md b/src/UserGuide/V1.3.0-2/Basic-Concept/Data-Model-and-Terminology.md deleted file mode 100644 index 4b63a268f..000000000 --- a/src/UserGuide/V1.3.0-2/Basic-Concept/Data-Model-and-Terminology.md +++ /dev/null @@ -1,149 +0,0 @@ - - -# Data Model - -A wind power IoT scenario is taken as an example to illustrate how to create a correct data model in IoTDB. - -According to the enterprise organization structure and equipment entity hierarchy, it is expressed as an attribute hierarchy structure, as shown below. The hierarchical from top to bottom is: power group layer - power plant layer - entity layer - measurement layer. ROOT is the root node, and each node of measurement layer is a leaf node. In the process of using IoTDB, the attributes on the path from ROOT node is directly connected to each leaf node with ".", thus forming the name of a timeseries in IoTDB. For example, The left-most path in Figure 2.1 can generate a timeseries named `root.ln.wf01.wt01.status`. - -
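
As a concrete illustration of how the attribute hierarchy turns into timeseries names, the sketch below joins each root-to-leaf path with `.`. The hierarchy literal mirrors the wind power example used in this chapter and is only meant to show the naming rule.

```python
# Build timeseries names by joining each root-to-leaf path of the hierarchy with ".".
# The nested dict mirrors the wind power example in this chapter
# (power group ln -> plants wf01/wf02 -> devices wt01/wt02 -> measurements).
hierarchy = {
    "ln": {
        "wf01": {"wt01": ["status", "temperature"]},
        "wf02": {"wt02": ["hardware", "status"]},
    }
}

def timeseries_names(tree, prefix="root"):
    names = []
    for node, child in tree.items():
        path = f"{prefix}.{node}"
        if isinstance(child, dict):
            names.extend(timeseries_names(child, path))
        else:  # leaf level: a list of measurements
            names.extend(f"{path}.{m}" for m in child)
    return names

print(timeseries_names(hierarchy))
# ['root.ln.wf01.wt01.status', 'root.ln.wf01.wt01.temperature',
#  'root.ln.wf02.wt02.hardware', 'root.ln.wf02.wt02.status']
```
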
- -Here are the basic concepts of the model involved in IoTDB. - -## Measurement, Entity, Database, Path - -### Measurement (Also called field) - -It is information measured by detection equipment in an actual scene and can transform the sensed information into an electrical signal or other desired form of information output and send it to IoTDB. In IoTDB, all data and paths stored are organized in units of measurements. - -### Entity (Also called device) - -**An entity** is an equipped with measurements in real scenarios. In IoTDB, all measurements should have their corresponding entities. Entities do not need to be created manually, the default is the second last layer. - -### Database - -**A group of entities.** Users can create any prefix path as a database. Provided that there are four timeseries `root.ln.wf01.wt01.status`, `root.ln.wf01.wt01.temperature`, `root.ln.wf02.wt02.hardware`, `root.ln.wf02.wt02.status`, two devices `wf01`, `wf02` under the path `root.ln` may belong to the same owner or the same manufacturer, so d1 and d2 are closely related. At this point, the prefix path root.vehicle can be designated as a database, which will enable IoTDB to store all devices under it in the same folder. Newly added devices under `root.ln` will also belong to this database. - -In general, it is recommended to create 1 database. - -> Note1: A full path (`root.ln.wf01.wt01.status` as in the above example) is not allowed to be set as a database. -> -> Note2: The prefix of a timeseries must belong to a database. Before creating a timeseries, users must set which database the series belongs to. Only timeseries whose database is set can be persisted to disk. -> -> Note3: The number of character in the path as database, including `root.`, shall not exceed 64. - -Once a prefix path is set as a database, the database settings cannot be changed. - -After a database is set, the ancestral layers, children and descendant layers of the corresponding prefix path are not allowed to be set up again (for example, after `root.ln` is set as the database, the root layer and `root.ln.wf01` are not allowed to be created as database). - -The Layer Name of database can only consist of characters, numbers, and underscores, like `root.storagegroup_1`. - -**Schema-less writing**: When metadata is not defined, data can be directly written through an insert statement, and the required metadata will be automatically recognized and registered in the database, achieving automatic modeling. - -### Path - -A `path` is an expression that conforms to the following constraints: - -```sql -path - : nodeName ('.' nodeName)* - ; - -nodeName - : wildcard? identifier wildcard? - | wildcard - ; - -wildcard - : '*' - | '**' - ; -``` - -We call the part of a path divided by `'.'` as a `node` or `nodeName`. For example: `root.a.b.c` is a path with 4 nodes. - -The following are the constraints on the `nodeName`: - -* `root` is a reserved character, and it is only allowed to appear at the beginning layer of the time series mentioned below. If `root` appears in other layers, it cannot be parsed and an error will be reported. -* Except for the beginning layer (`root`) of the time series, the characters supported in other layers are as follows: - - * [ 0-9 a-z A-Z _ ] (letters, numbers, underscore) - * ['\u2E80'..'\u9FFF'] (Chinese characters) -* In particular, if the system is deployed on a Windows machine, the database layer name will be case-insensitive. For example, creating both `root.ln` and `root.LN` at the same time is not allowed. 
-* If you want to use special characters in `nodeName`, you can quote it with back quote, detailed information can be found from charpter Syntax-Conventions,click here: [Syntax-Conventions](https://iotdb.apache.org/UserGuide/Master/Syntax-Conventions/Literal-Values.html). - -### Path Pattern - -In order to make it easier and faster to express multiple timeseries paths, IoTDB provides users with the path pattern. Users can construct a path pattern by using wildcard `*` and `**`. Wildcard can appear in any node of the path. - -`*` represents one node. For example, `root.vehicle.*.sensor1` represents a 4-node path which is prefixed with `root.vehicle` and suffixed with `sensor1`. - -`**` represents (`*`)+, which is one or more nodes of `*`. For example, `root.vehicle.device1.**` represents all paths prefixed by `root.vehicle.device1` with nodes num greater than or equal to 4, like `root.vehicle.device1.*`, `root.vehicle.device1.*.*`, `root.vehicle.device1.*.*.*`, etc; `root.vehicle.**.sensor1` represents a path which is prefixed with `root.vehicle` and suffixed with `sensor1` and has at least 4 nodes. - -> Note1: Wildcard `*` and `**` cannot be placed at the beginning of the path. - - -## Timeseries - -### Timestamp - -The timestamp is the time point at which data is produced. It includes absolute timestamps and relative timestamps. For detailed description, please go to [Data Type doc](./Data-Type.md). - -### Data point - -**A "time-value" pair**. - -### Timeseries - -**The record of a measurement of an entity on the time axis.** Timeseries is a series of data points. - -A measurement of an entity corresponds to a timeseries. - -Also called meter, timeline, and tag, parameter in real time database. - -The number of measurements managed by IoTDB can reach more than billions. - -For example, if entity wt01 in power plant wf01 of power group ln has a measurement named status, its timeseries can be expressed as: `root.ln.wf01.wt01.status`. - -### Aligned timeseries - -There is a situation that multiple measurements of an entity are sampled simultaneously in practical applications, forming multiple timeseries with the same time column. Such a group of timeseries can be modeled as aligned timeseries in Apache IoTDB. - -The timestamp columns of a group of aligned timeseries need to be stored only once in memory and disk when inserting data, instead of once per timeseries. - -It would be best if you created a group of aligned timeseries at the same time. - -You cannot create non-aligned timeseries under the entity to which the aligned timeseries belong, nor can you create aligned timeseries under the entity to which the non-aligned timeseries belong. - -When querying, you can query each timeseries separately. - -When inserting data, it is allowed to insert null value in the aligned timeseries. - - - -In the following chapters of data definition language, data operation language and Java Native Interface, various operations related to aligned timeseries will be introduced one by one. - -## Schema Template - -In the actual scenario, many entities collect the same measurements, that is, they have the same measurements name and type. A **schema template** can be declared to define the collectable measurements set. Schema template helps save memory by implementing schema sharing. For detailed description, please refer to [Schema Template doc](../User-Manual/Operate-Metadata.md#Device-Template). 
- -In the following chapters of, data definition language, data operation language and Java Native Interface, various operations related to schema template will be introduced one by one. diff --git a/src/UserGuide/V1.3.0-2/Basic-Concept/Data-Type.md b/src/UserGuide/V1.3.0-2/Basic-Concept/Data-Type.md deleted file mode 100644 index 1e5242c92..000000000 --- a/src/UserGuide/V1.3.0-2/Basic-Concept/Data-Type.md +++ /dev/null @@ -1,180 +0,0 @@ - - -# Data Type - -## Basic Data Type - -IoTDB supports the following data types: - -* BOOLEAN (Boolean) -* INT32 (Integer) -* INT64 (Long Integer) -* FLOAT (Single Precision Floating Point) -* DOUBLE (Double Precision Floating Point) -* TEXT (String) - - - -### Float Precision - -The time series of **FLOAT** and **DOUBLE** type can specify (MAX\_POINT\_NUMBER, see [this page](../SQL-Manual/SQL-Manual.md) for more information on how to specify), which is the number of digits after the decimal point of the floating point number, if the encoding method is [RLE](Encoding-and-Compression.md) or [TS\_2DIFF](Encoding-and-Compression.md). If MAX\_POINT\_NUMBER is not specified, the system will use [float\_precision](../Reference/DataNode-Config-Manual.md) in the configuration file `iotdb-datanode.properties`. - -```sql -CREATE TIMESERIES root.vehicle.d0.s0 WITH DATATYPE=FLOAT, ENCODING=RLE, 'MAX_POINT_NUMBER'='2'; -``` - -* For Float data value, The data range is (-Integer.MAX_VALUE, Integer.MAX_VALUE), rather than Float.MAX_VALUE, and the max_point_number is 19, caused by the limition of function Math.round(float) in Java. -* For Double data value, The data range is (-Long.MAX_VALUE, Long.MAX_VALUE), rather than Double.MAX_VALUE, and the max_point_number is 19, caused by the limition of function Math.round(double) in Java (Long.MAX_VALUE=9.22E18). - -### Data Type Compatibility - -When the written data type is inconsistent with the data type of time-series, -- If the data type of time-series is not compatible with the written data type, the system will give an error message. -- If the data type of time-series is compatible with the written data type, the system will automatically convert the data type. - -The compatibility of each data type is shown in the following table: - -| Series Data Type | Supported Written Data Types | -|------------------|------------------------------| -| BOOLEAN | BOOLEAN | -| INT32 | INT32 | -| INT64 | INT32 INT64 | -| FLOAT | INT32 FLOAT | -| DOUBLE | INT32 INT64 FLOAT DOUBLE | -| TEXT | TEXT | - -## Timestamp - -The timestamp is the time point at which data is produced. It includes absolute timestamps and relative timestamps - -### Absolute timestamp - -Absolute timestamps in IoTDB are divided into two types: LONG and DATETIME (including DATETIME-INPUT and DATETIME-DISPLAY). When a user inputs a timestamp, he can use a LONG type timestamp or a DATETIME-INPUT type timestamp, and the supported formats of the DATETIME-INPUT type timestamp are shown in the table below: - -
- -**Supported formats of DATETIME-INPUT type timestamp** - - - -| Format | -| :--------------------------: | -| yyyy-MM-dd HH:mm:ss | -| yyyy/MM/dd HH:mm:ss | -| yyyy.MM.dd HH:mm:ss | -| yyyy-MM-dd HH:mm:ssZZ | -| yyyy/MM/dd HH:mm:ssZZ | -| yyyy.MM.dd HH:mm:ssZZ | -| yyyy/MM/dd HH:mm:ss.SSS | -| yyyy-MM-dd HH:mm:ss.SSS | -| yyyy.MM.dd HH:mm:ss.SSS | -| yyyy-MM-dd HH:mm:ss.SSSZZ | -| yyyy/MM/dd HH:mm:ss.SSSZZ | -| yyyy.MM.dd HH:mm:ss.SSSZZ | -| ISO8601 standard time format | - -
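
Client code often needs to turn one of the DATETIME-INPUT strings above into a LONG timestamp in milliseconds. The sketch below does this for the first format in the table; the Python `strptime` directives and the UTC+08:00 time zone are assumptions chosen for the example and should be adapted to the server's settings.

```python
# Convert a DATETIME-INPUT style string into a LONG timestamp in milliseconds.
# "yyyy-MM-dd HH:mm:ss" from the table corresponds to "%Y-%m-%d %H:%M:%S" in Python.
# The UTC+08:00 time zone below is an assumption; use your server's time zone.
from datetime import datetime, timezone, timedelta

def to_long_timestamp(s: str, fmt: str = "%Y-%m-%d %H:%M:%S",
                      tz: timezone = timezone(timedelta(hours=8))) -> int:
    dt = datetime.strptime(s, fmt).replace(tzinfo=tz)
    return int(dt.timestamp() * 1000)

print(to_long_timestamp("2017-11-01 00:08:00"))  # LONG millisecond timestamp
```

For ISO8601 strings with an explicit offset, `datetime.fromisoformat` can be used instead of an explicit format string.
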
- - -IoTDB can support LONG types and DATETIME-DISPLAY types when displaying timestamps. The DATETIME-DISPLAY type can support user-defined time formats. The syntax of the custom time format is shown in the table below: - -
- -**The syntax of the custom time format** - - -| Symbol | Meaning | Presentation | Examples | -| :----: | :-------------------------: | :----------: | :--------------------------------: | -| G | era | era | era | -| C | century of era (>=0) | number | 20 | -| Y | year of era (>=0) | year | 1996 | -| | | | | -| x | weekyear | year | 1996 | -| w | week of weekyear | number | 27 | -| e | day of week | number | 2 | -| E | day of week | text | Tuesday; Tue | -| | | | | -| y | year | year | 1996 | -| D | day of year | number | 189 | -| M | month of year | month | July; Jul; 07 | -| d | day of month | number | 10 | -| | | | | -| a | halfday of day | text | PM | -| K | hour of halfday (0~11) | number | 0 | -| h | clockhour of halfday (1~12) | number | 12 | -| | | | | -| H | hour of day (0~23) | number | 0 | -| k | clockhour of day (1~24) | number | 24 | -| m | minute of hour | number | 30 | -| s | second of minute | number | 55 | -| S | fraction of second | millis | 978 | -| | | | | -| z | time zone | text | Pacific Standard Time; PST | -| Z | time zone offset/id | zone | -0800; -08:00; America/Los_Angeles | -| | | | | -| ' | escape for text | delimiter | | -| '' | single quote | literal | ' | - -
- -### Relative timestamp - -Relative time refers to the time relative to the server time ```now()``` and ```DATETIME``` time. - - Syntax: - - ``` - Duration = (Digit+ ('Y'|'MO'|'W'|'D'|'H'|'M'|'S'|'MS'|'US'|'NS'))+ - RelativeTime = (now() | DATETIME) ((+|-) Duration)+ - - ``` - -
- -**The syntax of the duration unit** - - -| Symbol | Meaning | Presentation | Examples | -| :----: | :---------: | :----------------------: | :------: | -| y | year | 1y=365 days | 1y | -| mo | month | 1mo=30 days | 1mo | -| w | week | 1w=7 days | 1w | -| d | day | 1d=1 day | 1d | -| | | | | -| h | hour | 1h=3600 seconds | 1h | -| m | minute | 1m=60 seconds | 1m | -| s | second | 1s=1 second | 1s | -| | | | | -| ms | millisecond | 1ms=1000_000 nanoseconds | 1ms | -| us | microsecond | 1us=1000 nanoseconds | 1us | -| ns | nanosecond | 1ns=1 nanosecond | 1ns | - -
- - eg: - - ``` - now() - 1d2h //1 day and 2 hours earlier than the current server time - now() - 1w //1 week earlier than the current server time - ``` - - > Note:There must be spaces on the left and right of '+' and '-'. diff --git a/src/UserGuide/V1.3.0-2/Basic-Concept/Encoding-and-Compression.md b/src/UserGuide/V1.3.0-2/Basic-Concept/Encoding-and-Compression.md deleted file mode 100644 index f70377a93..000000000 --- a/src/UserGuide/V1.3.0-2/Basic-Concept/Encoding-and-Compression.md +++ /dev/null @@ -1,128 +0,0 @@ - - -# Encoding and Compression - - -## Encoding Methods - -To improve the efficiency of data storage, it is necessary to encode data during data writing, thereby reducing the amount of disk space used. In the process of writing and reading data, the amount of data involved in the I/O operations can be reduced to improve performance. IoTDB supports the following encoding methods for different data types: - -1. PLAIN - - PLAIN encoding, the default encoding mode, i.e, no encoding, supports multiple data types. It has high compression and decompression efficiency while suffering from low space storage efficiency. - -2. TS_2DIFF - - Second-order differential encoding is more suitable for encoding monotonically increasing or decreasing sequence data, and is not recommended for sequence data with large fluctuations. - -3. RLE - - Run-length encoding is suitable for storing sequence with continuous values, and is not recommended for sequence data with most of the time different values. - - Run-length encoding can also be used to encode floating-point numbers, while it is necessary to specify reserved decimal digits (MAX\_POINT\_NUMBER) when creating time series. It is more suitable to store sequence data where floating-point values appear continuously, monotonously increasing or decreasing, and it is not suitable for storing sequence data with high precision requirements after the decimal point or with large fluctuations. - - > TS_2DIFF and RLE have precision limit for data type of float and double. By default, two decimal places are reserved. GORILLA is recommended. - -4. GORILLA - - GORILLA encoding is lossless. It is more suitable for numerical sequence with similar values and is not recommended for sequence data with large fluctuations. - - Currently, there are two versions of GORILLA encoding implementation, it is recommended to use `GORILLA` instead of `GORILLA_V1` (deprecated). - - Usage restrictions: When using GORILLA to encode INT32 data, you need to ensure that there is no data point with the value `Integer.MIN_VALUE` in the sequence. When using GORILLA to encode INT64 data, you need to ensure that there is no data point with the value `Long.MIN_VALUE` in the sequence. - -5. DICTIONARY - - DICTIONARY encoding is lossless. It is suitable for TEXT data with low cardinality (i.e. low number of distinct values). It is not recommended to use it for high-cardinality data. - -6. ZIGZAG - - ZIGZAG encoding maps signed integers to unsigned integers so that numbers with a small absolute value (for instance, -1) have a small variant encoded value too. It does this in a way that "zig-zags" back and forth through the positive and negative integers. - -7. CHIMP - - CHIMP encoding is lossless. It is the state-of-the-art compression algorithm for streaming floating point data, providing impressive savings compared to earlier approaches. It is suitable for any numerical sequence with similar values and works best for sequence data without large fluctuations and/or random noise. 
- - Usage restrictions: When using CHIMP to encode INT32 data, you need to ensure that there is no data point with the value `Integer.MIN_VALUE` in the sequence. When using CHIMP to encode INT64 data, you need to ensure that there is no data point with the value `Long.MIN_VALUE` in the sequence. - -8. SPRINTZ - - SPRINTZ coding is a type of lossless data compression technique that involves predicting the original time series data, applying Zigzag encoding, bit-packing encoding, and run-length encoding. SPRINTZ encoding is effective for time series data with small absolute differences between values. However, it may not be as effective for time series data with large differences between values, indicating large fluctuation. -9. RLBE - - RLBE is a lossless encoding that combines the ideas of differential encoding, bit-packing encoding, run-length encoding, Fibonacci encoding and concatenation. RLBE encoding is suitable for time series data with increasing and small increment value, and is not suitable for time series data with large fluctuation. - - -### Correspondence between data type and encoding - -The five encodings described in the previous sections are applicable to different data types. If the correspondence is wrong, the time series cannot be created correctly. - -The correspondence between the data type and its supported encodings is summarized in the Table below. - - -| **Data Type** | **Recommended Encoding (default)** | **Supported Encoding** | -| ------------- | --------------------------- | ----------------------------------------------------------- | -| BOOLEAN | RLE | PLAIN, RLE | -| INT32 | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | -| INT64 | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | -| FLOAT | GORILLA | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE | -| DOUBLE | GORILLA | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE | -| TEXT | PLAIN | PLAIN, DICTIONARY | - -When the data type specified by the user does not correspond to the encoding method, the system will prompt an error. - -As shown below, the second-order difference encoding does not support the Boolean type: - -``` -IoTDB> create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF -Msg: 507: encoding TS_2DIFF does not support BOOLEAN -``` -## Compression - -When the time series is written and encoded as binary data according to the specified type, IoTDB compresses the data using compression technology to further improve space storage efficiency. Although both encoding and compression are designed to improve storage efficiency, encoding techniques are usually available only for specific data types (e.g., second-order differential encoding is only suitable for INT32 or INT64 data type, and storing floating-point numbers requires multiplying them by 10m to convert to integers), after which the data is converted to a binary stream. The compression method (SNAPPY) compresses the binary stream, so the use of the compression method is no longer limited by the data type. - -### Basic Compression Methods - -IoTDB allows you to specify the compression method of the column when creating a time series, and supports the following compression methods: - -* UNCOMPRESSED - -* SNAPPY - -* LZ4 (Recommended compression method) - -* GZIP - -* ZSTD - -* LZMA2 - -The specified syntax for compression is detailed in [Create Timeseries Statement](../SQL-Manual/SQL-Manual.md). 
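
When creating a timeseries, the encoding and compression method are chosen in the Create Timeseries statement linked above. The sketch below submits such a statement through the REST v2 `nonQuery` endpoint shown elsewhere in this documentation; the host, port, credentials, series path, and the `COMPRESSOR` keyword are assumptions based on the examples in these documents, so check the Create Timeseries reference for the exact syntax of your version.

```python
# Create a FLOAT timeseries with the recommended GORILLA encoding and LZ4 compressor
# by sending a Create Timeseries statement through the REST v2 nonQuery endpoint.
# Host, port, credentials, the series path, and the COMPRESSOR keyword are assumptions
# taken from the examples in this documentation; verify the exact SQL syntax.
import requests

sql = ("CREATE TIMESERIES root.ln.wf02.wt02.temperature "
       "WITH DATATYPE=FLOAT, ENCODING=GORILLA, COMPRESSOR=LZ4")
resp = requests.post(
    "http://127.0.0.1:18080/rest/v2/nonQuery",
    json={"sql": sql},
    auth=("root", "root"),
)
print(resp.json())  # {"code": 200, "message": "SUCCESS_STATUS"} on success
```
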
- -### Compression Ratio Statistics - -Compression ratio statistics file: data/datanode/system/compression_ratio - -* ratio_sum: sum of memtable compression ratios -* memtable_flush_time: memtable flush times - -The average compression ratio can be calculated by `ratio_sum / memtable_flush_time` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Basic-Concept/Navigating_Time_Series_Data.md b/src/UserGuide/V1.3.0-2/Basic-Concept/Navigating_Time_Series_Data.md deleted file mode 100644 index ca47e475b..000000000 --- a/src/UserGuide/V1.3.0-2/Basic-Concept/Navigating_Time_Series_Data.md +++ /dev/null @@ -1,65 +0,0 @@ - -# Navigating Time Series Data - -## What Is Time Series Data? - -In today's era of the Internet of Things, various scenarios such as the Internet of Things and industrial scenarios are undergoing digital transformation. People collect various states of devices by installing sensors on them. If the motor collects voltage and current, the blade speed, angular velocity, and power generation of the fan; Vehicle collection of latitude and longitude, speed, and fuel consumption; The vibration frequency, deflection, displacement, etc. of the bridge. The data collection of sensors has penetrated into various industries. - -![](/img/20240505154735.png) - -Generally speaking, we refer to each collection point as a measurement point (also known as a physical quantity, time series, timeline, signal quantity, indicator, measurement value, etc.). Each measurement point continuously collects new data information over time, forming a time series. In the form of a table, each time series is a table formed by two columns: time and value; In a graphical way, each time series is a trend chart formed over time, which can also be vividly referred to as the device's electrocardiogram. - -![](/img/20240505154843.png) - -The massive time series data generated by sensors is the foundation of digital transformation in various industries, so our modeling of time series data mainly focuses on equipment and sensors. - -## Key Concepts of Time Series Data - -The main concepts involved in time-series data can be divided from bottom to top: data points, measurement points, and equipment. - -![](/img/20240505154513.png) - -### Data Point - -- Definition: Consists of a timestamp and a value, where the timestamp is of type long and the value can be of various types such as BOOLEAN, FLOAT, INT32, etc. -- Example: A row of a time series in the form of a table in the above figure, or a point of a time series in the form of a graph, is a data point. - -![](/img/20240505154432.png) - -### Measurement Points - -- Definition: It is a time series formed by multiple data points arranged in increments according to timestamps. Usually, a measuring point represents a collection point and can regularly collect physical quantities of the environment it is located in. 
-- Also known as: physical quantity, time series, timeline, semaphore, indicator, measurement value, etc -- Example: - - Electricity scenario: current, voltage - - Energy scenario: wind speed, rotational speed - - Vehicle networking scenarios: fuel consumption, vehicle speed, longitude, dimensions - - Factory scenario: temperature, humidity - -### Device - -- Definition: Corresponding to a physical device in an actual scene, usually a collection of measurement points, identified by one to multiple labels -- Example: - - Vehicle networking scenario: Vehicles identified by vehicle identification code (VIN) - - Factory scenario: robotic arm, unique ID identification generated by IoT platform - - Energy scenario: Wind turbines, identified by region, station, line, model, instance, etc - - Monitoring scenario: CPU, identified by machine room, rack, Hostname, device type, etc \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/AINode_Deployment_timecho.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/AINode_Deployment_timecho.md deleted file mode 100644 index 815a6d84b..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/AINode_Deployment_timecho.md +++ /dev/null @@ -1,556 +0,0 @@ - -# AINode Deployment - -## AINode Introduction - -### Capability Introduction - - AINode is the third type of endogenous node provided by IoTDB after the Configurable Node and DataNode. This node extends its ability to perform machine learning analysis on time series by interacting with the DataNode and Configurable Node of the IoTDB cluster. It supports the introduction of existing machine learning models from external sources for registration and the use of registered models to complete time series analysis tasks on specified time series data through simple SQL statements. The creation, management, and inference of models are integrated into the database engine. Currently, machine learning algorithms or self-developed models are available for common time series analysis scenarios, such as prediction and anomaly detection. - -### Delivery Method - It is an additional package outside the IoTDB cluster, with independent installation and activation (if you need to try or use it, please contact Timecho Technology Business or Technical Support). - -### Deployment mode -
- - -
- -## Installation preparation - -### Get installation package - - Users can download the software installation package for AINode, download and unzip it to complete the installation of AINode. - - Unzip and install the package - `(iotdb-enterprise-ainode-.zip)`, The directory structure after unpacking the installation package is as follows: -| **Catalogue** | **Type** | **Explain** | -| ------------ | -------- | ------------------------------------------------ | -| lib | folder | AINode compiled binary executable files and related code dependencies | -| sbin | folder | The running script of AINode can start, remove, and stop AINode | -| conf | folder | Contains configuration items for AINode, specifically including the following configuration items | -| LICENSE | file | Certificate | -| NOTICE | file | Tips | -| README_ZH.md | file | Explanation of the Chinese version of the markdown format | -| `README.md` | file | Instructions | - -### Environment preparation -- Suggested operating environment:Ubuntu, CentOS, MacOS - -- Runtime Environment - - Python>=3.8 and Python <= 3.14 is sufficient in a networked environment, and comes with pip and venv tools; Python 3.8 version is required for non networked environments, and download the zip package for the corresponding operating system from [here](https://cloud.tsinghua.edu.cn/d/4c1342f6c272439aa96c/?p=%2Flibs&mode=list) (Note that when downloading dependencies, you need to select the zip file in the libs folder, as shown in the following figure). Copy all files in the folder to the `lib` folder in the `iotdb-enterprise-ainode-` folder, and follow the steps below to start AINode. - - - - - There must be a Python interpreter in the environment variables that can be directly called through the `python` instruction. - - It is recommended to create a Python interpreter venv virtual environment in the `iotdb-enterprise-ainode-` folder. If installing version 3.8.0 virtual environment, the statement is as follows: - ```shell - # Install version 3.8.0 of Venv , Create a virtual environment with the folder name `venv`. - ../Python-3.8.0/python -m venv `venv` - ``` - -## Installation steps - -### Install AINode - -1. AINode activation - - Require IoTDB to be in normal operation and have AINode module authorization in the license (usually not in the license, please contact T Business or technical support personnel to obtain AINode module authorization). - - The authorization method for activating the AINode module is as follows: - - Method 1: Activate file copy activation - - After restarting the confignode node, enter the activation folder, copy the system_info file to the Timecho staff, and inform them to apply for independent authorization for AINode; - - Received the license file returned by the staff; - - Put the license file into the activation folder of the corresponding node; - -- Method 2: Activate Script Activation - - Obtain the required machine code for activation, enter the `sbin` directory of the installation directory, and execute the activation script: - ```shell - cd sbin - ./start-activate.sh - ``` - - The following information is displayed. Please copy the machine code (i.e. 
this string of characters) to the Timecho staff and inform them to apply for independent authorization of AINode: - ```shell - Please copy the system_info's content and send it to Timecho: - Y17hFA0xRCE1TmkVxILuCIEPc7uJcr5bzlXWiptw8uZTmTX5aThfypQdLUIhMljw075hNRSicyvyJR9JM7QaNm1gcFZPHVRWVXIiY5IlZkXdxCVc1erXMsbCqUYsR2R2Mw4PSpFJsUF5jHWSoFIIjQ2bmJFW5P52KCccFMVeHTc= - Please enter license: - ``` - - Enter the activation code returned by the staff into the `Please enter license:` command prompt in the previous step, as shown below: - ```shell - Please enter license: - Jw+MmF+AtexsfgNGOFgTm83BgXbq0zT1+fOfPvQsLlj6ZsooHFU6HycUSEGC78eT1g67KPvkcLCUIsz2QpbyVmPLr9x1+kVjBubZPYlVpsGYLqLFc8kgpb5vIrPLd3hGLbJ5Ks8fV1WOVrDDVQq89YF2atQa2EaB9EAeTWd0bRMZ+s9ffjc/1Zmh9NSP/T3VCfJcJQyi7YpXWy5nMtcW0gSV+S6fS5r7a96PjbtE0zXNjnEhqgRzdU+mfO8gVuUNaIy9l375cp1GLpeCh6m6pF+APW1CiXLTSijK9Qh3nsL5bAOXNeob5l+HO5fEMgzrW8OJPh26Vl6ljKUpCvpTiw== - License has been stored to sbin/../activation/license - Import completed. Please start cluster and excute 'show cluster' to verify activation status - ``` -- After updating the license, restart the DataNode node and enter the sbin directory of IoTDB to start the datanode: - ```shell - cd sbin - ./start-datanode.sh -d #The parameter'd 'will be started in the background - ``` - - 2. Check the kernel architecture of Linux - ```shell - uname -m - ``` - - 3. Import Python environment [Download](https://repo.anaconda.com/miniconda/) - - Recommend downloading the py311 version application and importing it into the iotdb dedicated folder in the user's root directory - - 4. Switch to the iotdb dedicated folder to install the Python environment - - Taking Miniconda 3-py311_24.5.0-0-Lux-x86_64 as an example: - - ```shell - bash ./Miniconda3-py311_24.5.0-0-Linux-x86_64.sh - ``` - > Type "Enter", "Long press space", "Enter", "Yes", "Yes" according to the prompt
- > Close the current SSH window and reconnect - - 5. Create a dedicated environment - - ```shell - conda create -n ainode_py python=3.11.9 - ``` - - Type 'y' according to the prompt - - 6. Activate dedicated environment - - ```shell - conda activate ainode_py - ``` - - 7. Verify Python version - - ```shell - python --version - ``` - 8. Download and import AINode to a dedicated folder, switch to the dedicated folder and extract the installation package - - ```shell - unzip iotdb-enterprise-ainode-1.3.3.2.zip - ``` - - 9. Configuration item modification - - ```shell - vi iotdb-enterprise-ainode-1.3.3.2/conf/iotdb-ainode.properties - ``` - Configuration item modification:[detailed information](#configuration-item-modification) - - > ain_seed_config_node=iotdb-1:10710 (Cluster communication node IP: communication node port)
- > ain_inference_rpc_address=iotdb-3 (IP address of the server running AINode) - - 10. Replace Python source - - ```shell - pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/ - ``` - - 11. Start the AINode node - - ```shell - nohup bash iotdb-enterprise-ainode-1.3.3.2/sbin/start-ainode.sh > myout.file 2>& 1 & - ``` - > Return to the default environment of the system: conda deactivate - - - ### Configuration item modification - -AINode supports modifying some necessary parameters. You can find the following parameters in the `conf/iotdb-ainode.properties` file and make persistent modifications to them: -: - -| **Name** | **Describe** | **Type** | **Default value** | **Effective method after modification** | -| :----------------------------- | ------------------------------------------------------------ | ------- | ------------------ | ---------------------------- | -| cluster_name | The identifier for AINode to join the cluster | string | defaultCluster | Only allow modifications before the first service startup | -| ain_seed_config_node | The Configurable Node address registered during AINode startup | String | 127.0.0.1:10710 | Only allow modifications before the first service startup | -| ain_inference_rpc_address | AINode provides service and communication addresses , Internal Service Communication Interface | String | 127.0.0.1 | Only allow modifications before the first service startup | -| ain_inference_rpc_port | AINode provides ports for services and communication | String | 10810 | Only allow modifications before the first service startup | -| ain_system_dir | AINode metadata storage path, the starting directory of the relative path is related to the operating system, and it is recommended to use an absolute path | String | data/AINode/system | Only allow modifications before the first service startup | -| ain_models_dir | AINode stores the path of the model file, and the starting directory of the relative path is related to the operating system. It is recommended to use an absolute path | String | data/AINode/models | Only allow modifications before the first service startup | -| ain_logs_dir | The path where AINode stores logs, the starting directory of the relative path is related to the operating system, and it is recommended to use an absolute path | String | logs/AINode | Effective after restart | -| ain_thrift_compression_enabled | Does AINode enable Thrift's compression mechanism , 0-Do not start, 1-Start | Boolean | 0 | Effective after restart | - -### Start AINode - - After completing the deployment of Seed Config Node, the registration and inference functions of the model can be supported by adding AINode nodes. 
After specifying the information of the IoTDB cluster in the configuration file, the corresponding instruction can be executed to start AINode and join the IoTDB cluster。 - -#### Networking environment startup - -##### Start command - -```shell - # Start command - # Linux and MacOS systems - bash sbin/start-ainode.sh - - # Windows systems - sbin\start-ainode.bat - - # Backend startup command (recommended for long-term running) - # Linux and MacOS systems - nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & - - # Windows systems - nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & - ``` - -#### Detailed Syntax - -```shell - # Start command - # Linux and MacOS systems - bash sbin/start-ainode.sh -i -r -n - - # Windows systems - sbin\start-ainode.bat -i -r -n - ``` - -##### Parameter introduction: - -| **Name** | **Label** | **Describe** | **Is it mandatory** | **Type** | **Default value** | **Input method** | -| ------------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ---------------- | ---------------------- | -| ain_interpreter_dir | -i | The interpreter path of the virtual environment where AINode is installed requires the use of an absolute path. | no | String | Default reading of environment variables | Input or persist modifications during invocation | -| ain_force_reinstall | -r | Does this script check the version when checking the installation status of AINode. If it does, it will force the installation of the whl package in lib if the version is incorrect. | no | Bool | false | Input when calling | -| ain_no_dependencies | -n | Specify whether to install dependencies when installing AINode, and if so, only install the AINode main program without installing dependencies. | no | Bool | false | Input when calling | - - If you don't want to specify the corresponding parameters every time you start, you can also persistently modify the parameters in the `ainode-env.sh` and `ainode-env.bat` scripts in the `conf` folder (currently supporting persistent modification of the ain_interpreter-dir parameter). - - `ainode-env.sh` : - ```shell - # The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark - # ain_interpreter_dir= - ``` - `ainode-env.bat` : -```shell - @REM The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark - @REM set ain_interpreter_dir= - ``` - After writing the parameter value, uncomment the corresponding line and save it to take effect on the next script execution. - - -#### Example - -##### Directly start: - -```shell - # Start command - # Linux and MacOS systems - bash sbin/start-ainode.sh - # Windows systems - sbin\start-ainode.bat - - - # Backend startup command (recommended for long-term running) - # Linux and MacOS systems - nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & - # Windows systems - nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & - ``` - -##### Update Start: -If the version of AINode has been updated (such as updating the `lib` folder), this command can be used. Firstly, it is necessary to ensure that AINode has stopped running, and then restart it using the `-r` parameter, which will reinstall AINode based on the files under `lib`. 
- - -```shell - # Update startup command - # Linux and MacOS systems - bash sbin/start-ainode.sh -r - # Windows systems - sbin\start-ainode.bat -r - - - # Backend startup command (recommended for long-term running) - # Linux and MacOS systems - nohup bash sbin/start-ainode.sh -r > myout.file 2>& 1 & - # Windows systems - nohup bash sbin\start-ainode.bat -r > myout.file 2>& 1 & - ``` -#### Non networked environment startup - -##### Start command - -```shell - # Start command - # Linux and MacOS systems - bash sbin/start-ainode.sh - - # Windows systems - sbin\start-ainode.bat - - # Backend startup command (recommended for long-term running) - # Linux and MacOS systems - nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & - - # Windows systems - nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & - ``` - -#### Detailed Syntax - -```shell - # Start command - # Linux and MacOS systems - bash sbin/start-ainode.sh -i -r -n - - # Windows systems - sbin\start-ainode.bat -i -r -n - ``` - -##### Parameter introduction: - -| **Name** | **Label** | **Describe** | **Is it mandatory** | **Type** | **Default value** | **Input method** | -| ------------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ---------------- | ---------------------- | -| ain_interpreter_dir | -i | The interpreter path of the virtual environment where AINode is installed requires the use of an absolute path | no | String | Default reading of environment variables | Input or persist modifications during invocation | -| ain_force_reinstall | -r | Does this script check the version when checking the installation status of AINode. If it does, it will force the installation of the whl package in lib if the version is incorrect | no | Bool | false | Input when calling | - -> Attention: When installation fails in a non networked environment, first check if the installation package corresponding to the platform is selected, and then confirm that the Python version is 3.8 (due to the limitations of the downloaded installation package on Python versions, 3.7, 3.9, and others are not allowed) - -#### Example - -##### Directly start: - -```shell - # Start command - # Linux and MacOS systems - bash sbin/start-ainode.sh - # Windows systems - sbin\start-ainode.bat - - # Backend startup command (recommended for long-term running) - # Linux and MacOS systems - nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & - # Windows systems - nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & - ``` - -### Detecting the status of AINode nodes - -During the startup process of AINode, the new AINode will be automatically added to the IoTDB cluster. After starting AINode, you can enter SQL in the command line to query. If you see an AINode node in the cluster and its running status is Running (as shown below), it indicates successful joining. - - -```shell -IoTDB> show cluster -+------+----------+-------+---------------+------------+-------+-----------+ -|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| -+------+----------+-------+---------------+------------+-------+-----------+ -| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| -| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| -| 2| AINode|Running| 127.0.0.1| 10810|UNKNOWN|190e303-dev| -+------+----------+-------+---------------+------------+-------+-----------+ -``` - -### Stop AINode - -If you need to stop a running AINode node, execute the corresponding shutdown script. 
- -#### Stop command - -```shell - # Linux / MacOS - bash sbin/stop-ainode.sh - - #Windows - sbin\stop-ainode.bat - ``` - -#### Detailed Syntax - -```shell - # Linux / MacOS - bash sbin/stop-ainode.sh -t/: - - #Windows - sbin\stop-ainode.bat -t/: - ``` - -##### Parameter introduction: - -| **Name** | **Label** | **Describe** | **Is it mandatory** | **Type** | **Default value** | **Input method** | -| ----------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ------ | ---------- | -| ain_remove_target | -t | When closing AINode, you can specify the Node ID, address, and port number of the target AINode to be removed, in the format of `/:` | no | String | nothing | Input when calling | - -#### Example - -```shell - # Linux / MacOS - bash sbin/stop-ainode.sh - - # Windows - sbin\stop-ainode.bat - ``` -After stopping AINode, you can still see AINode nodes in the cluster, whose running status is UNKNOWN (as shown below), and the AINode function cannot be used at this time. - - ```shell -IoTDB> show cluster -+------+----------+-------+---------------+------------+-------+-----------+ -|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| -+------+----------+-------+---------------+------------+-------+-----------+ -| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| -| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| -| 2| AINode|UNKNOWN| 127.0.0.1| 10790|UNKNOWN|190e303-dev| -+------+----------+-------+---------------+------------+-------+-----------+ -``` -If you need to restart the node, you need to execute the startup script again. - -### Remove AINode - -When it is necessary to remove an AINode node from the cluster, a removal script can be executed. The difference between removing and stopping scripts is that stopping retains the AINode node in the cluster but stops the AINode service, while removing removes the AINode node from the cluster. - -#### Remove command - - -```shell - # Linux / MacOS - bash sbin/remove-ainode.sh - - # Windows - sbin\remove-ainode.bat - ``` - -#### Detailed Syntax - -```shell - # Linux / MacOS - bash sbin/remove-ainode.sh -i -t/: -r -n - - # Windows - sbin\remove-ainode.bat -i -t/: -r -n - ``` - -##### Parameter introduction: - - | **Name** | **Label** | **Describe** | **Is it mandatory** | **Type** | **Default value** | **Input method** | -| ------------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ---------------- | --------------------- | -| ain_interpreter_dir | -i | The interpreter path of the virtual environment where AINode is installed requires the use of an absolute path | no | String | Default reading of environment variables | Input+persistent modification during invocation | -| ain_remove_target | -t | When closing AINode, you can specify the Node ID, address, and port number of the target AINode to be removed, in the format of `/:` | no | String | nothing | Input when calling | -| ain_force_reinstall | -r | Does this script check the version when checking the installation status of AINode. 
If it does, it will force the installation of the whl package in lib if the version is incorrect | no | Bool | false | Input when calling | -| ain_no_dependencies | -n | Specify whether to install dependencies when installing AINode, and if so, only install the AINode main program without installing dependencies | no | Bool | false | Input when calling | - - If you don't want to specify the corresponding parameters every time you start, you can also persistently modify the parameters in the `ainode-env.sh` and `ainode-env.bat` scripts in the `conf` folder (currently supporting persistent modification of the ain_interpreter-dir parameter). - - `ainode-env.sh` : - ```shell - # The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark - # ain_interpreter_dir= - ``` - `ainode-env.bat` : -```shell - @REM The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark - @REM set ain_interpreter_dir= - ``` - After writing the parameter value, uncomment the corresponding line and save it to take effect on the next script execution. - -#### Example - -##### Directly remove: - - ```shell - # Linux / MacOS - bash sbin/remove-ainode.sh - - # Windows - sbin\remove-ainode.bat - ``` - After removing the node, relevant information about the node cannot be queried. - - ```shell -IoTDB> show cluster -+------+----------+-------+---------------+------------+-------+-----------+ -|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| -+------+----------+-------+---------------+------------+-------+-----------+ -| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| -| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| -+------+----------+-------+---------------+------------+-------+-----------+ -``` -##### Specify removal: - -If the user loses files in the data folder, AINode may not be able to actively remove them locally. The user needs to specify the node number, address, and port number for removal. In this case, we support users to input parameters according to the following methods for deletion. - - ```shell - # Linux / MacOS - bash sbin/remove-ainode.sh -t /: - - # Windows - sbin\remove-ainode.bat -t /: - ``` - -## common problem - -### An error occurs when starting AINode stating that the venv module cannot be found - - When starting AINode using the default method, a Python virtual environment will be created in the installation package directory and dependencies will be installed, so it is required to install the venv module. Generally speaking, Python 3.8 and above versions come with built-in VenV, but for some systems with built-in Python environments, this requirement may not be met. There are two solutions when this error occurs (choose one or the other): - - To install the Venv module locally, taking Ubuntu as an example, you can run the following command to install the built-in Venv module in Python. Or install a Python version with built-in Venv from the Python official website. - - ```shell -apt-get install python3.8-venv -``` -Install version 3.8.0 of venv into AINode in the AINode path. - - ```shell -../Python-3.8.0/python -m venv venv(Folder Name) -``` - When running the startup script, use ` -i ` to specify an existing Python interpreter path as the running environment for AINode, eliminating the need to create a new virtual environment. 
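For example (the interpreter path below is only an illustrative placeholder; substitute the absolute path of an interpreter that already exists on your machine), AINode can be started against an existing interpreter like this:

```shell
# Reuse an existing Python interpreter instead of creating a new venv.
# /data/iotdb/venv/bin/python is a hypothetical path; replace it with your own absolute interpreter path.
bash sbin/start-ainode.sh -i /data/iotdb/venv/bin/python
```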
- - ### The SSL module in Python is not properly installed and configured to handle HTTPS resources -WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available. -You can install OpenSSLS and then rebuild Python to solve this problem -> Currently Python versions 3.6 to 3.9 are compatible with OpenSSL 1.0.2, 1.1.0, and 1.1.1. - - Python requires OpenSSL to be installed on our system, the specific installation method can be found in [link](https://stackoverflow.com/questions/56552390/how-to-fix-ssl-module-in-python-is-not-available-in-centos) - - ```shell -sudo apt-get install build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev uuid-dev lzma-dev liblzma-dev -sudo -E ./configure --with-ssl -make -sudo make install -``` - - ### Pip version is lower - - A compilation issue similar to "error: Microsoft Visual C++14.0 or greater is required..." appears on Windows - -The corresponding error occurs during installation and compilation, usually due to insufficient C++version or Setup tools version. You can check it in - - ```shell -./python -m pip install --upgrade pip -./python -m pip install --upgrade setuptools -``` - - - ### Install and compile Python - - Use the following instructions to download the installation package from the official website and extract it: - ```shell -.wget https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tar.xz -tar Jxf Python-3.8.0.tar.xz -``` - Compile and install the corresponding Python package: - ```shell -cd Python-3.8.0 -./configure prefix=/usr/local/python3 -make -sudo make install -python3 --version -``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Cluster-Deployment_apache.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Cluster-Deployment_apache.md deleted file mode 100644 index 84453e544..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Cluster-Deployment_apache.md +++ /dev/null @@ -1,347 +0,0 @@ - -# Cluster Deployment - -This section will take the IoTDB classic cluster deployment architecture 3C3D (3 ConfigNodes and 3 DataNodes) as an example to introduce how to deploy a cluster, commonly known as the 3C3D cluster. The architecture diagram of the 3C3D cluster is as follows: - -
- -## Note - -1. Before installation, ensure that the system is complete by referring to [System configuration](./Environment-Requirements.md) - -2. It is recommended to prioritize using `hostname` for IP configuration during deployment, which can avoid the problem of modifying the host IP in the later stage and causing the database to fail to start. To set the host name, you need to configure /etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name and configure the `cn_internal_address` and `dn_internal_address` of IoTDB using the host name. - - ``` shell - echo "192.168.1.3 iotdb-1" >> /etc/hosts - ``` - -3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings. - -4. Whether in linux or windows, ensure that the IoTDB installation path does not contain Spaces and Chinese characters to avoid software exceptions. - -5. Please note that when installing and deploying IoTDB, it is necessary to use the same user for operations. You can: -- Using root user (recommended): Using root user can avoid issues such as permissions. -- Using a fixed non root user: - - Using the same user operation: Ensure that the same user is used for start, stop and other operations, and do not switch users. - - Avoid using sudo: Try to avoid using sudo commands as they execute commands with root privileges, which may cause confusion or security issues. - -## Preparation Steps - -1. Prepare the IoTDB database installation package::apache-iotdb-{version}-all-bin.zip(Please refer to the installation package for details:[IoTDB-Package](../Deployment-and-Maintenance/IoTDB-Package_apache.md)) - -2. Configure the operating system environment according to environmental requirements (system environment configuration can be found in:[Environment Requirements](../Deployment-and-Maintenance/Environment-Requirements.md)) - -## Installation Steps - -Assuming there are three Linux servers now, the IP addresses and service roles are assigned as follows: - -| Node IP | Host Name | Service | -| ----------- | --------- | -------------------- | -| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | -| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | -| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | - -### Set Host Name - -On three machines, configure the host names separately. To set the host names, configure `/etc/hosts` on the target server. 
Use the following command: - -```Bash -echo "192.168.1.3 iotdb-1" >> /etc/hosts -echo "192.168.1.4 iotdb-2" >> /etc/hosts -echo "192.168.1.5 iotdb-3" >> /etc/hosts -``` - -### Configuration - -Unzip the installation package and enter the installation directory - -```Plain -unzip apache-iotdb-{version}-all-bin.zip -cd apache-iotdb-{version}-all-bin -``` - -#### Environment Script Configuration - -- `./conf/confignode-env.sh` configuration - -| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | -| :---------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | -| MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | - -- `./conf/datanode-env.sh` configuration - -| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | -| :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | -| MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | - -#### General Configuration - -Open the general configuration file `./conf/iotdb-common.properties`, The following parameters can be set according to the deployment method: - -| **Configuration** | **Description** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | -| ------------------------- | ------------------------------------------------------------ | -------------- | -------------- | -------------- | -| cluster_name | Cluster Name | defaultCluster | defaultCluster | defaultCluster | -| schema_replication_factor | The number of metadata replicas, the number of DataNodes should not be less than this number | 3 | 3 | 3 | -| data_replication_factor | The number of data replicas should not be less than this number of DataNodes | 2 | 2 | 2 | - -#### ConfigNode Configuration - -Open the ConfigNode configuration file `./conf/iotdb-confignode.properties`, Set the following parameters - -| **Configuration** | **Description** | **Default** | **Recommended value** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | Note | -| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | -| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | -| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | 10710 | 10710 | 10710 | Cannot be modified after initial startup | -| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | 10720 | 10720 | 10720 | Cannot be modified after initial startup | -| cn_seed_config_node | TThe address of the ConfigNode that the node connects to when registering to join the cluster, 
`cn_internal_address:cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's `cn_internal-address: cn_internal_port` | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup | - -#### DataNode Configuration - -Open DataNode Configuration File `./conf/iotdb-datanode.properties`,Set the following parameters: - -| **Configuration** | **Description** | **Default** | **Recommended value** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | Note | -| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | -| dn_rpc_address | The address of the client RPC service | 127.0.0.1 | Recommend using the **IPV4 address or hostname** of the server where it is located | iotdb-1 |iotdb-2 | iotdb-3 | Restarting the service takes effect | -| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | 6667 | 6667 | 6667 | Restarting the service takes effect | -| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | -| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | 10730 | 10730 | 10730 | Cannot be modified after initial startup | -| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | 10740 | 10740 | 10740 | Cannot be modified after initial startup | -| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | 10750 | 10750 | 10750 | Cannot be modified after initial startup | -| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | 10760 | 10760 | 10760 | Cannot be modified after initial startup | -| dn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, i.e. `cn_internal-address: cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's cn_internal-address: cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup | - -> ❗️Attention: Editors such as VSCode Remote do not have automatic configuration saving function. Please ensure that the modified files are saved persistently, otherwise the configuration items will not take effect - -### Start ConfigNode - -Start the first confignode of IoTDB-1 first, ensuring that the seed confignode node starts first, and then start the second and third confignode nodes in sequence - -```Bash -cd sbin -./start-confignode.sh -d #"- d" parameter will start in the background -``` - -If the startup fails, please refer to [Common Questions](#common-questions). - -### Start DataNode - - Enter the `sbin` directory of iotdb and start three datanode nodes in sequence: - -```Bash -cd sbin -./start-datanode.sh -d #"- d" parameter will start in the background -``` - -### Verify Deployment - -Can be executed directly Cli startup script in `./sbin` directory: - -```Plain -./start-cli.sh -h ip(local IP or domain name) -p port(6667) -``` - -After successful startup, the following interface will appear displaying successful installation of IOTDB. 
- -![](/img/%E5%BC%80%E6%BA%90%E6%88%90%E5%8A%9F.png) - -You can use the `show cluster` command to view cluster information: - -![](/img/%E5%BC%80%E6%BA%90%E7%89%88%20show%20cluter.png) - - -> The appearance of `ACTIVATED (W)` indicates passive activation, which means that this Configurable Node does not have a license file (or has not issued the latest license file with a timestamp), and its activation depends on other Activated Configurable Nodes in the cluster. At this point, it is recommended to check if the license file has been placed in the license folder. If not, please place the license file. If a license file already exists, it may be due to inconsistency between the license file of this node and the information of other nodes. Please contact Timecho staff to reapply. - -## Node Maintenance Steps - -### ConfigNode Node Maintenance - -ConfigNode node maintenance is divided into two types of operations: adding and removing ConfigNodes, with two common use cases: -- Cluster expansion: For example, when there is only one ConfigNode in the cluster, and you want to increase the high availability of ConfigNode nodes, you can add two ConfigNodes, making a total of three ConfigNodes in the cluster. -- Cluster failure recovery: When the machine where a ConfigNode is located fails, making the ConfigNode unable to run normally, you can remove this ConfigNode and then add a new ConfigNode to the cluster. - -> ❗️Note, after completing ConfigNode node maintenance, you need to ensure that there are 1 or 3 ConfigNodes running normally in the cluster. Two ConfigNodes do not have high availability, and more than three ConfigNodes will lead to performance loss. - -#### Adding ConfigNode Nodes - -Script command: -```shell -# Linux / MacOS -# First switch to the IoTDB root directory -sbin/start-confignode.sh - -# Windows -# First switch to the IoTDB root directory -sbin/start-confignode.bat -``` - -Parameter introduction: - -| Parameter | Description | Is it required | -| :--- | :--------------------------------------------- | :----------- | -| -v | Show version information | No | -| -f | Run the script in the foreground, do not put it in the background | No | -| -d | Start in daemon mode, i.e. run in the background | No | -| -p | Specify a file to store the process ID for process management | No | -| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | -| -g | Print detailed garbage collection (GC) information | No | -| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | -| -E | Specify the path of the JVM error log file | No | -| -D | Define system properties, in the format key=value | No | -| -X | Pass -XX parameters directly to the JVM | No | -| -h | Help instruction | No | - -#### Removing ConfigNode Nodes - -First connect to the cluster through the CLI and confirm the internal address and port number of the ConfigNode you want to remove by using `show confignodes`: - -```Bash -IoTDB> show confignodes -+------+-------+---------------+------------+--------+ -|NodeID| Status|InternalAddress|InternalPort| Role| -+------+-------+---------------+------------+--------+ -| 0|Running| 127.0.0.1| 10710| Leader| -| 1|Running| 127.0.0.1| 10711|Follower| -| 2|Running| 127.0.0.1| 10712|Follower| -+------+-------+---------------+------------+--------+ -Total line number = 3 -It costs 0.030s -``` - -Then use the script to remove the DataNode. 
Script command: - -```Bash -# Linux / MacOS -sbin/remove-confignode.sh [confignode_id] -or -./sbin/remove-confignode.sh [cn_internal_address:cn_internal_port] - -#Windows -sbin/remove-confignode.bat [confignode_id] -or -./sbin/remove-confignode.bat [cn_internal_address:cn_internal_port] -``` - -### DataNode Node Maintenance - -There are two common scenarios for DataNode node maintenance: - -- Cluster expansion: For the purpose of expanding cluster capabilities, add new DataNodes to the cluster -- Cluster failure recovery: When a machine where a DataNode is located fails, making the DataNode unable to run normally, you can remove this DataNode and add a new DataNode to the cluster - -> ❗️Note, in order for the cluster to work normally, during the process of DataNode node maintenance and after the maintenance is completed, the total number of DataNodes running normally should not be less than the number of data replicas (usually 2), nor less than the number of metadata replicas (usually 3). - -#### Adding DataNode Nodes - -Script command: - -```Bash -# Linux / MacOS -# First switch to the IoTDB root directory -sbin/start-datanode.sh - -# Windows -# First switch to the IoTDB root directory -sbin/start-datanode.bat -``` - -Parameter introduction: - -| Abbreviation | Description | Is it required | -| :--- | :--------------------------------------------- | :----------- | -| -v | Show version information | No | -| -f | Run the script in the foreground, do not put it in the background | No | -| -d | Start in daemon mode, i.e. run in the background | No | -| -p | Specify a file to store the process ID for process management | No | -| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | -| -g | Print detailed garbage collection (GC) information | No | -| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | -| -E | Specify the path of the JVM error log file | No | -| -D | Define system properties, in the format key=value | No | -| -X | Pass -XX parameters directly to the JVM | No | -| -h | Help instruction | No | - -Note: After adding a DataNode, as new writes arrive (and old data expires, if TTL is set), the cluster load will gradually balance towards the new DataNode, eventually achieving a balance of storage and computation resources on all nodes. - -#### Removing DataNode Nodes - -First connect to the cluster through the CLI and confirm the RPC address and port number of the DataNode you want to remove with `show datanodes`: - -```Bash -IoTDB> show datanodes -+------+-------+----------+-------+-------------+---------------+ -|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| -+------+-------+----------+-------+-------------+---------------+ -| 1|Running| 0.0.0.0| 6667| 0| 0| -| 2|Running| 0.0.0.0| 6668| 1| 1| -| 3|Running| 0.0.0.0| 6669| 1| 0| -+------+-------+----------+-------+-------------+---------------+ -Total line number = 3 -It costs 0.110s -``` - -Then use the script to remove the DataNode. Script command: - -```Bash -# Linux / MacOS -sbin/remove-datanode.sh [dn_rpc_address:dn_rpc_port] - -#Windows -sbin/remove-datanode.bat [dn_rpc_address:dn_rpc_port] -``` - -## Common Questions - -1. Confignode failed to start - - Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. - - Step 2: Please check the startup log for any other abnormalities. 
If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions. - - Step 3: If it is the first deployment or data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart. - - Step 4: Clean up the environment: - - a. Terminate all ConfigNode Node and DataNode processes. - ```Bash - # 1. Stop the ConfigNode and DataNode services - sbin/stop-standalone.sh - - # 2. Check for any remaining processes - jps - # Or - ps -ef|gerp iotdb - - # 3. If there are any remaining processes, manually kill the - kill -9 - # If you are sure there is only one iotdb on the machine, you can use the following command to clean up residual processes - ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 - ``` - b. Delete the data and logs directories. - - Explanation: Deleting the data directory is necessary, deleting the logs directory is for clean logs and is not mandatory. - - ```Bash - cd /data/iotdb - rm -rf data logs - ``` diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Cluster-Deployment_timecho.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Cluster-Deployment_timecho.md deleted file mode 100644 index d9e3c1247..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Cluster-Deployment_timecho.md +++ /dev/null @@ -1,388 +0,0 @@ - -# Cluster Deployment - -This section describes how to manually deploy an instance that includes 3 ConfigNodes and 3 DataNodes, commonly known as a 3C3D cluster. - -
- -## Note - -1. Before installation, ensure that the system is complete by referring to [System configuration](./Environment-Requirements.md) - -2. It is recommended to prioritize using `hostname` for IP configuration during deployment, which can avoid the problem of modifying the host IP in the later stage and causing the database to fail to start. To set the host name, you need to configure /etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name and configure the `cn_internal_address` and `dn_internal_address` of IoTDB using the host name. - - ``` shell - echo "192.168.1.3 iotdb-1" >> /etc/hosts - ``` - -3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings. - -4. Whether in linux or windows, ensure that the IoTDB installation path does not contain Spaces and Chinese characters to avoid software exceptions. - -5. Please note that when installing and deploying IoTDB (including activating and using software), it is necessary to use the same user for operations. You can: -- Using root user (recommended): Using root user can avoid issues such as permissions. -- Using a fixed non root user: - - Using the same user operation: Ensure that the same user is used for start, activation, stop, and other operations, and do not switch users. - - Avoid using sudo: Try to avoid using sudo commands as they execute commands with root privileges, which may cause confusion or security issues. - -6. It is recommended to deploy a monitoring panel, which can monitor important operational indicators and keep track of database operation status at any time. The monitoring panel can be obtained by contacting the business department,The steps for deploying a monitoring panel can refer to:[Monitoring Panel Deployment](./Monitoring-panel-deployment.md) - -## Preparation Steps - -1. Prepare the IoTDB database installation package: iotdb enterprise- {version}-bin.zip(The installation package can be obtained from:[IoTDB-Package](../Deployment-and-Maintenance/IoTDB-Package_timecho.md)) -2. Configure the operating system environment according to environmental requirements(The system environment configuration can be found in:[Environment Requirements](../Deployment-and-Maintenance/Environment-Requirements.md)) - -## Installation Steps - -Assuming there are three Linux servers now, the IP addresses and service roles are assigned as follows: - -| Node IP | Host Name | Service | -| ----------- | --------- | -------------------- | -| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | -| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | -| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | - -### Set Host Name - -On three machines, configure the host names separately. To set the host names, configure `/etc/hosts` on the target server. 
Use the following command: - -```Bash -echo "192.168.1.3 iotdb-1" >> /etc/hosts -echo "192.168.1.4 iotdb-2" >> /etc/hosts -echo "192.168.1.5 iotdb-3" >> /etc/hosts -``` - -### Configuration - -Unzip the installation package and enter the installation directory - -```Plain -unzip iotdb-enterprise-{version}-bin.zip -cd iotdb-enterprise-{version}-bin -``` - -#### Environment script configuration - -- `./conf/confignode-env.sh` configuration - - | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | - | :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | - | MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | - -- `./conf/datanode-env.sh` configuration - - | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | - | :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | - | MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | - -#### General Configuration - -Open the general configuration file `./conf/iotdb-common.properties`,The following parameters can be set according to the deployment method: - -| **Configuration** | **Description** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | -| ------------------------- | ------------------------------------------------------------ | -------------- | -------------- | -------------- | -| cluster_name | Cluster Name | defaultCluster | defaultCluster | defaultCluster | -| schema_replication_factor | The number of metadata replicas, the number of DataNodes should not be less than this number | 3 | 3 | 3 | -| data_replication_factor | The number of data replicas should not be less than this number of DataNodes | 2 | 2 | 2 | - -#### **ConfigNode Configuration** - -Open the ConfigNode configuration file `./conf/iotdb-confignode.properties`,Set the following parameters - -| **Configuration** | **Description** | **Default** | **Recommended value** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | Note | -| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | -| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | -| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | 10710 | 10710 | 10710 | Cannot be modified after initial startup | -| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | 10720 | 10720 | 10720 | Cannot be modified after initial startup | -| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the 
cluster, `cn_internal_address:cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's `cn_internal-address: cn_internal_port` | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup | - -#### DataNode Configuration - -Open DataNode Configuration File `./conf/iotdb-datanode.properties`,Set the following parameters: - -| **Configuration** | **Description** | **Default** | **Recommended value** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | Note | -| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | -| dn_rpc_address | The address of the client RPC service | 127.0.0.1 | Recommend using the **IPV4 address or hostname** of the server where it is located | iotdb-1 |iotdb-2 | iotdb-3 | Restarting the service takes effect | -| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | 6667 | 6667 | 6667 | Restarting the service takes effect | -| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | -| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | 10730 | 10730 | 10730 | Cannot be modified after initial startup | -| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | 10740 | 10740 | 10740 | Cannot be modified after initial startup | -| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | 10750 | 10750 | 10750 | Cannot be modified after initial startup | -| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | 10760 | 10760 | 10760 | Cannot be modified after initial startup | -| dn_seed_config_node | The addresses of the ConfigNode that the node connects to when registering to join the cluster, i.e. `cn_internal-address: cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's cn_internal-address: cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup | - -> ❗️Attention: Editors such as VSCode Remote do not have automatic configuration saving function. Please ensure that the modified files are saved persistently, otherwise the configuration items will not take effect - -### Start ConfigNode - -Start the first confignode of IoTDB-1 first, ensuring that the seed confignode node starts first, and then start the second and third confignode nodes in sequence - -```Bash -cd sbin -./start-confignode.sh -d #"- d" parameter will start in the background -``` -If the startup fails, please refer to [Common Questions](#common-questions). 
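Before moving on to activation, it can be useful to confirm on each machine that the ConfigNode process is actually up. A minimal check, using the same tools referenced in the Common Questions section below, might look like this (the process name reported by `jps` is assumed to be `ConfigNode`):

```shell
# Check that the ConfigNode process is running on this machine
jps | grep ConfigNode

# Alternatively, if jps is not available
ps -ef | grep -i confignode | grep -v grep
```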
- -### Activate Database - -#### Method 1: Activate file copy activation - -- After starting three confignode nodes in sequence, copy the `activation` folder of each machine and the `system_info` file of each machine to the Timecho staff; -- The staff will return the license files for each ConfigNode node, where 3 license files will be returned; -- Put the three license files into the `activation` folder of the corresponding ConfigNode node; - -#### Method 2: Activate Script Activation - -- Obtain the machine codes of three machines in sequence, enter the `sbin` directory of the installation directory, and execute the activation script `start activate.sh`: - - ```Bash - cd sbin - ./start-activate.sh - ``` - -- The following information is displayed, where the machine code of one machine is displayed: - - ```Bash - Please copy the system_info's content and send it to Timecho: - Y17hFA0xRCE1TmkVxILuxxxxxxxxxxxxxxxxxxxxxxxxxxxxW5P52KCccFMVeHTc= - Please enter license: - ``` - -- The other two nodes execute the activation script `start activate.sh` in sequence, and then copy the machine codes of the three machines obtained to the Timecho staff -- The staff will return 3 activation codes, which normally correspond to the order of the provided 3 machine codes. Please paste each activation code into the previous command line prompt `Please enter license:`, as shown below: - - ```Bash - Please enter license: - Jw+MmF+Atxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx5bAOXNeob5l+HO5fEMgzrW8OJPh26Vl6ljKUpCvpTiw== - License has been stored to sbin/../activation/license - Import completed. Please start cluster and excute 'show cluster' to verify activation status - ``` - -### Start DataNode - - Enter the `sbin` directory of iotdb and start three datanode nodes in sequence: - -```Go -cd sbin -./start-datanode.sh -d #"- d" parameter will start in the background -``` - -### Verify Deployment - -Can be executed directly Cli startup script in `./sbin` directory: - -```Plain -./start-cli.sh -h ip(local IP or domain name) -p port(6667) -``` - - After successful startup, the following interface will appear displaying successful installation of IOTDB. - -![](/img/%E4%BC%81%E4%B8%9A%E7%89%88%E6%88%90%E5%8A%9F.png) - -After the installation success interface appears, continue to check if the activation is successful and use the `show cluster` command. - -When you see the display of `Activated` on the far right, it indicates successful activation. - -![](/img/%E4%BC%81%E4%B8%9A%E7%89%88%E6%BF%80%E6%B4%BB.png) - - -> The appearance of `ACTIVATED (W)` indicates passive activation, which means that this Configurable Node does not have a license file (or has not issued the latest license file with a timestamp), and its activation depends on other Activated Configurable Nodes in the cluster. At this point, it is recommended to check if the license file has been placed in the license folder. If not, please place the license file. If a license file already exists, it may be due to inconsistency between the license file of this node and the information of other nodes. Please contact Timecho staff to reapply. 
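As an optional extra check beyond `show cluster` (not part of the procedure above), a simple write/read round trip in the CLI confirms that the DataNodes are serving requests; `root.test` below is an arbitrary example database that can be dropped afterwards:

```shell
IoTDB> create database root.test
IoTDB> insert into root.test.d1(timestamp, s1) values(1, 36.5)
IoTDB> select * from root.test.d1
IoTDB> delete database root.test
```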
- - -## Node Maintenance Steps - -### ConfigNode Node Maintenance - -ConfigNode node maintenance is divided into two types of operations: adding and removing ConfigNodes, with two common use cases: -- Cluster expansion: For example, when there is only one ConfigNode in the cluster, and you want to increase the high availability of ConfigNode nodes, you can add two ConfigNodes, making a total of three ConfigNodes in the cluster. -- Cluster failure recovery: When the machine where a ConfigNode is located fails, making the ConfigNode unable to run normally, you can remove this ConfigNode and then add a new ConfigNode to the cluster. - -> ❗️Note, after completing ConfigNode node maintenance, you need to ensure that there are 1 or 3 ConfigNodes running normally in the cluster. Two ConfigNodes do not have high availability, and more than three ConfigNodes will lead to performance loss. - -#### Adding ConfigNode Nodes - -Script command: -```shell -# Linux / MacOS -# First switch to the IoTDB root directory -sbin/start-confignode.sh - -# Windows -# First switch to the IoTDB root directory -sbin/start-confignode.bat -``` - -Parameter introduction: - -| Parameter | Description | Is it required | -| :--- | :--------------------------------------------- | :----------- | -| -v | Show version information | No | -| -f | Run the script in the foreground, do not put it in the background | No | -| -d | Start in daemon mode, i.e. run in the background | No | -| -p | Specify a file to store the process ID for process management | No | -| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | -| -g | Print detailed garbage collection (GC) information | No | -| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | -| -E | Specify the path of the JVM error log file | No | -| -D | Define system properties, in the format key=value | No | -| -X | Pass -XX parameters directly to the JVM | No | -| -h | Help instruction | No | - -#### Removing ConfigNode Nodes - -First connect to the cluster through the CLI and confirm the internal address and port number of the ConfigNode you want to remove by using `show confignodes`: - -```Bash -IoTDB> show confignodes -+------+-------+---------------+------------+--------+ -|NodeID| Status|InternalAddress|InternalPort| Role| -+------+-------+---------------+------------+--------+ -| 0|Running| 127.0.0.1| 10710| Leader| -| 1|Running| 127.0.0.1| 10711|Follower| -| 2|Running| 127.0.0.1| 10712|Follower| -+------+-------+---------------+------------+--------+ -Total line number = 3 -It costs 0.030s -``` - -Then use the script to remove the DataNode. 
Script command: - -```Bash -# Linux / MacOS -sbin/remove-confignode.sh [confignode_id] -or -./sbin/remove-confignode.sh [cn_internal_address:cn_internal_port] - -#Windows -sbin/remove-confignode.bat [confignode_id] -or -./sbin/remove-confignode.bat [cn_internal_address:cn_internal_port] -``` - -### DataNode Node Maintenance - -There are two common scenarios for DataNode node maintenance: - -- Cluster expansion: For the purpose of expanding cluster capabilities, add new DataNodes to the cluster -- Cluster failure recovery: When a machine where a DataNode is located fails, making the DataNode unable to run normally, you can remove this DataNode and add a new DataNode to the cluster - -> ❗️Note, in order for the cluster to work normally, during the process of DataNode node maintenance and after the maintenance is completed, the total number of DataNodes running normally should not be less than the number of data replicas (usually 2), nor less than the number of metadata replicas (usually 3). - -#### Adding DataNode Nodes - -Script command: - -```Bash -# Linux / MacOS -# First switch to the IoTDB root directory -sbin/start-datanode.sh - -# Windows -# First switch to the IoTDB root directory -sbin/start-datanode.bat -``` - -Parameter introduction: - -| Abbreviation | Description | Is it required | -| :--- | :--------------------------------------------- | :----------- | -| -v | Show version information | No | -| -f | Run the script in the foreground, do not put it in the background | No | -| -d | Start in daemon mode, i.e. run in the background | No | -| -p | Specify a file to store the process ID for process management | No | -| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | -| -g | Print detailed garbage collection (GC) information | No | -| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | -| -E | Specify the path of the JVM error log file | No | -| -D | Define system properties, in the format key=value | No | -| -X | Pass -XX parameters directly to the JVM | No | -| -h | Help instruction | No | - -Note: After adding a DataNode, as new writes arrive (and old data expires, if TTL is set), the cluster load will gradually balance towards the new DataNode, eventually achieving a balance of storage and computation resources on all nodes. - -#### Removing DataNode Nodes - -First connect to the cluster through the CLI and confirm the RPC address and port number of the DataNode you want to remove with `show datanodes`: - -```Bash -IoTDB> show datanodes -+------+-------+----------+-------+-------------+---------------+ -|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| -+------+-------+----------+-------+-------------+---------------+ -| 1|Running| 0.0.0.0| 6667| 0| 0| -| 2|Running| 0.0.0.0| 6668| 1| 1| -| 3|Running| 0.0.0.0| 6669| 1| 0| -+------+-------+----------+-------+-------------+---------------+ -Total line number = 3 -It costs 0.110s -``` - -Then use the script to remove the DataNode. Script command: - -```Bash -# Linux / MacOS -sbin/remove-datanode.sh [dn_rpc_address:dn_rpc_port] - -#Windows -sbin/remove-datanode.bat [dn_rpc_address:dn_rpc_port] -``` - -## Common Questions -1. Multiple prompts indicating activation failure during deployment process - - Use the `ls -al` command: Use the `ls -al` command to check if the owner information of the installation package root directory is the current user. 
- - Check activation directory: Check all files in the `./activation` directory and whether the owner information is the current user. - -2. Confignode failed to start - - Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. - - Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions. - - Step 3: If it is the first deployment or data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart. - - Step 4: Clean up the environment: - - a. Terminate all ConfigNode Node and DataNode processes. - ```Bash - # 1. Stop the ConfigNode and DataNode services - sbin/stop-standalone.sh - - # 2. Check for any remaining processes - jps - # Or - ps -ef|gerp iotdb - - # 3. If there are any remaining processes, manually kill the - kill -9 - # If you are sure there is only one iotdb on the machine, you can use the following command to clean up residual processes - ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 - ``` - b. Delete the data and logs directories. - - Explanation: Deleting the data directory is necessary, deleting the logs directory is for clean logs and is not mandatory. - - ```Bash - cd /data/iotdb - rm -rf data logs - ``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Database-Resources.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Database-Resources.md deleted file mode 100644 index 67a5f1fb2..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Database-Resources.md +++ /dev/null @@ -1,203 +0,0 @@ - -# Database Resources -## CPU - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Number of timeseries (frequency <= 1Hz) | CPU | Nodes (standalone mode) | Nodes (double active) | Nodes (distributed) |
| :--- | :--- | :--- | :--- | :--- |
| Within 100,000 | 2-4 cores | 1 | 2 | 3 |
| Within 300,000 | 4-8 cores | 1 | 2 | 3 |
| Within 500,000 | 8-26 cores | 1 | 2 | 3 |
| Within 1,000,000 | 16-32 cores | 1 | 2 | 3 |
| Within 2,000,000 | 32-48 cores | 1 | 2 | 3 |
| Within 10,000,000 | 48 cores | 1 | 2 | Please contact Timecho Business for consultation |
| Over 10,000,000 | Please contact Timecho Business for consultation | | | |
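When checking whether an existing server matches the CPU row for your workload, the core count can be read directly from the operating system; a small sketch for Linux:

```Bash
# Logical CPU cores available on this host
nproc
# Socket / core / thread breakdown for a more detailed view
lscpu | grep -E '^(CPU\(s\)|Socket|Core|Thread)'
```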
- -## Memory - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Number of timeseries (frequency <= 1Hz) | Memory | Nodes (standalone mode) | Nodes (double active) | Nodes (distributed) |
| :--- | :--- | :--- | :--- | :--- |
| Within 100,000 | 4G-8G | 1 | 2 | 3 |
| Within 300,000 | 12G-32G | 1 | 2 | 3 |
| Within 500,000 | 24G-48G | 1 | 2 | 3 |
| Within 1,000,000 | 32G-96G | 1 | 2 | 3 |
| Within 2,000,000 | 64G-128G | 1 | 2 | 3 |
| Within 10,000,000 | 128G | 1 | 2 | Please contact Timecho Business for consultation |
| Over 10,000,000 | Please contact Timecho Business for consultation | | | |
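The same kind of spot check applies to memory before assigning a server one of the roles above; a small sketch for Linux:

```Bash
# Total and available physical memory in GB
free -g
# Current swap devices (swap is expected to stay unused; see the system requirements later in this guide)
swapon --show
```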
- - -## Storage (Disk) -### Storage space -Calculation formula: Number of measurement points * Sampling frequency (Hz) * Size of each data point (Byte, different data types may vary, see table below) * Storage time (seconds) * Number of copies (usually 1 copy for a single node and 2 copies for a cluster) ÷ Compression ratio (can be estimated at 5-10 times, but may be higher in actual situations) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
**Data point size calculation**

| Data type | Timestamp (Bytes) | Value (Bytes) | Total size of a data point (Bytes) |
| :--- | :--- | :--- | :--- |
| Boolean | 8 | 1 | 9 |
| INT32 / FLOAT | 8 | 4 | 12 |
| INT64 / DOUBLE | 8 | 8 | 16 |
| TEXT | 8 | a (average value size) | 8 + a |
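The storage formula can be checked with a one-line calculation; the sketch below reproduces the worked example that follows (100,000 INT32 series at 1 Hz, one year of data, 3 copies, an assumed 10x compression ratio):

```Bash
# series * bytes per point * seconds per year * copies / compression ratio, converted to TB
awk 'BEGIN { printf "%.1f TB\n", 1000 * 100 * 12 * 86400 * 365 * 3 / 10 / 1024^4 }'
# Prints roughly 10.3 TB, which the example below rounds up to about 11 TB
```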
- -Example: 1000 devices, each with 100 measurement points, a total of 100000 sequences, INT32 type. Sampling frequency 1Hz (once per second), storage for 1 year, 3 copies. -- Complete calculation formula: 1000 devices * 100 measurement points * 12 bytes per data point * 86400 seconds per day * 365 days per year * 3 copies / 10 compression ratio / 1024 / 1024 / 1024 / 1024 =11T -- Simplified calculation formula: 1000 * 100 * 12 * 86400 * 365 * 3 / 10 / 1024 / 1024 / 1024 / 1024 =11T -### Storage Configuration -If the number of nodes is over 10000000 or the query load is high, it is recommended to configure SSD -## Network (Network card) -If the write throughput does not exceed 10 million points/second, configure 1Gbps network card. When the write throughput exceeds 10 million points per second, a 10Gbps network card needs to be configured. -| **Write throughput (data points per second)** | **NIC rate** | -| ------------------- | ------------- | -| /<10 million | 1Gbps | -| >=10 million | 10Gbps | -## Other instructions -IoTDB has the ability to scale up clusters in seconds, and expanding node data does not require migration. Therefore, you do not need to worry about the limited cluster capacity estimated based on existing data. In the future, you can add new nodes to the cluster when you need to scale up. \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Docker-Deployment_apache.md deleted file mode 100644 index 048c3e0d8..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Docker-Deployment_apache.md +++ /dev/null @@ -1,416 +0,0 @@ - -# Docker Deployment - -## Environmental Preparation - -### Docker Installation - -```SQL -#Taking Ubuntu as an example, other operating systems can search for installation methods themselves -#step1: Install some necessary system tools -sudo apt-get update -sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common -#step2: Install GPG certificate -curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add - -#step3: Write software source information -sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" -#step4: Update and install Docker CE -sudo apt-get -y update -sudo apt-get -y install docker-ce -#step5: Set Docker to start automatically upon startup -sudo systemctl enable docker -#step6: Verify if Docker installation is successful -docker --version #Display version information, indicating successful installation -``` - -### Docker-compose Installation - -```SQL -#Installation command -curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose -chmod +x /usr/local/bin/docker-compose -ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose -#Verify if the installation was successful -docker-compose --version #Displaying version information indicates successful installation -``` - -## Stand-Alone Deployment - -This section demonstrates how to deploy a standalone Docker version of 1C1D. 
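Before pulling the image, it is worth confirming that the Docker environment prepared above is actually ready; a minimal sanity check:

```Bash
# The daemon should report "active", and both tools should print a version
systemctl is-active docker
docker --version
docker-compose --version
```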
- -### Pull Image File - -The Docker image of Apache IoTDB has been uploaded tohttps://hub.docker.com/r/apache/iotdb。 - -Taking obtaining version 1.3.2 as an example, pull the image command: - -```bash -docker pull apache/iotdb:1.3.2-standalone -``` - -View image: - -```bash -docker images -``` - -![](/img/%E5%BC%80%E6%BA%90-%E6%8B%89%E5%8F%96%E9%95%9C%E5%83%8F.png) - -### Create Docker Bridge Network - -```Bash -docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb -``` - -### Write The Yml File For Docker-Compose - -Here we take the example of consolidating the IoTDB installation directory and yml files in the/docker iotdb folder: - -The file directory structure is:`/docker-iotdb/iotdb`, `/docker-iotdb/docker-compose-standalone.yml ` - -```bash -docker-iotdb: -├── iotdb #Iotdb installation directory -│── docker-compose-standalone.yml #YML file for standalone Docker Composer -``` - -The complete docker-compose-standalone.yml content is as follows: - -```bash -version: "3" -services: - iotdb-service: - image: apache/iotdb:1.3.2-standalone #The image used - hostname: iotdb - container_name: iotdb - restart: always - ports: - - "6667:6667" - environment: - - cn_internal_address=iotdb - - cn_internal_port=10710 - - cn_consensus_port=10720 - - cn_seed_config_node=iotdb:10710 - - dn_rpc_address=iotdb - - dn_internal_address=iotdb - - dn_rpc_port=6667 - - dn_internal_port=10730 - - dn_mpp_data_exchange_port=10740 - - dn_schema_region_consensus_port=10750 - - dn_data_region_consensus_port=10760 - - dn_seed_config_node=iotdb:10710 - privileged: true - volumes: - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - networks: - iotdb: - ipv4_address: 172.18.0.6 -networks: - iotdb: - external: true -``` - -### Start IoTDB - -Use the following command to start: - -```bash -cd /docker-iotdb -docker-compose -f docker-compose-standalone.yml up -d #Background startup -``` - -### Validate Deployment - -- Viewing the log, the following words indicate successful startup - -```SQL -docker logs -f iotdb-datanode #View log command -2024-07-21 08:22:38,457 [main] INFO o.a.i.db.service.DataNode:227 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! -``` - -![](/img/%E5%BC%80%E6%BA%90-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B2.png) - -- Enter the container to view the service running status and activation information - -View the launched container - -```SQL -docker ps -``` - -![](/img/%E5%BC%80%E6%BA%90-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B22.png) - -Enter the container, log in to the database through CLI, and use the `show cluster` command to view the service status and activation status - -```SQL -docker exec -it iotdb /bin/bash #Entering the container -./start-cli.sh -h iotdb #Log in to the database -IoTDB> show cluster #View status -``` - -You can see that all services are running and the activation status shows as activated. 
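The same check can also be run from the host in a single command, without opening an interactive shell; a sketch, assuming the CLI script sits under `/iotdb/sbin` inside the container as the interactive steps above imply:

```Bash
# Non-interactive cluster status check against the standalone container
docker exec iotdb /iotdb/sbin/start-cli.sh -h iotdb -e "show cluster"
```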
- -![](/img/%E5%BC%80%E6%BA%90-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B23.png) - -### Map/conf Directory (optional) - -If you want to directly modify the configuration file in the physical machine in the future, you can map the/conf folder in the container in three steps: - -Step 1: Copy the /conf directory from the container to `/docker-iotdb/iotdb/conf` - -```bash -docker cp iotdb:/iotdb/conf /docker-iotdb/iotdb/conf -``` - -Step 2: Add mappings in docker-compose-standalone.yml - -```bash - volumes: - - ./iotdb/conf:/iotdb/conf #Add mapping for this/conf folder - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs -``` - -Step 3: Restart IoTDB - -```bash -docker-compose -f docker-compose-standalone.yml up -d -``` - -## Cluster Deployment - -This section describes how to manually deploy an instance that includes 3 Config Nodes and 3 Data Nodes, commonly known as a 3C3D cluster. - -
- -
- -**Note: The cluster version currently only supports host and overlay networks, and does not support bridge networks.** - -Taking the host network as an example, we will demonstrate how to deploy a 3C3D cluster. - -### Set Host Name - -Assuming there are currently three Linux servers, the IP addresses and service role assignments are as follows: - -| Node IP | Host Name | Service | -| ----------- | --------- | -------------------- | -| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | -| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | -| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | - -Configure the host names on three machines separately. To set the host names, configure `/etc/hosts` on the target server using the following command: - -```Bash -echo "192.168.1.3 iotdb-1" >> /etc/hosts -echo "192.168.1.4 iotdb-2" >> /etc/hosts -echo "192.168.1.5 iotdb-3" >> /etc/hosts -``` - -### Pull Image File - -The Docker image of Apache IoTDB has been uploaded tohttps://hub.docker.com/r/apache/iotdb。 - -Pull IoTDB images from three servers separately, taking version 1.3.2 as an example. The pull image command is: - -```SQL -docker pull apache/iotdb:1.3.2-standalone -``` - -View image: - -```SQL -docker images -``` - -![](/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%881.png) - -### Write The Yml File For Docker Compose - -Here we take the example of consolidating the IoTDB installation directory and yml files in the `/docker-iotdb` folder: - -The file directory structure is :`/docker-iotdb/iotdb`, `/docker-iotdb/confignode.yml`,`/docker-iotdb/datanode.yml` - -```SQL -docker-iotdb: -├── confignode.yml #Yml file of confignode -├── datanode.yml #Yml file of datanode -└── iotdb #IoTDB installation directory -``` - -On each server, two yml files need to be written, namely confignnode. yml and datanode. yml. 
The example of yml is as follows: - -**confignode.yml:** - -```bash -#confignode.yml -version: "3" -services: - iotdb-confignode: - image: iotdb-enterprise:1.3.2.3-standalone #The image used - hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - container_name: iotdb-confignode - command: ["bash", "-c", "entrypoint.sh confignode"] - restart: always - environment: - - cn_internal_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - - cn_internal_port=10710 - - cn_consensus_port=10720 - - cn_seed_config_node=iotdb-1:10710 #The default first node is the seed node - - schema_replication_factor=3 #Number of metadata copies - - data_replication_factor=2 #Number of data replicas - privileged: true - volumes: - - ./iotdb/activation:/iotdb/activation - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro - - /dev/mem:/dev/mem:ro - network_mode: "host" #Using the host network -``` - -**datanode.yml:** - -```bash -#datanode.yml -version: "3" -services: - iotdb-datanode: - image: iotdb-enterprise:1.3.2.3-standalone #The image used - hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - container_name: iotdb-datanode - command: ["bash", "-c", "entrypoint.sh datanode"] - restart: always - ports: - - "6667:6667" - privileged: true - environment: - - dn_rpc_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - - dn_internal_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - - dn_seed_config_node=iotdb-1:10710 #The default first node is the seed node - - dn_rpc_port=6667 - - dn_internal_port=10730 - - dn_mpp_data_exchange_port=10740 - - dn_schema_region_consensus_port=10750 - - dn_data_region_consensus_port=10760 - - schema_replication_factor=3 #Number of metadata copies - - data_replication_factor=2 #Number of data replicas - volumes: - - ./iotdb/activation:/iotdb/activation - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro - - /dev/mem:/dev/mem:ro - network_mode: "host" #Using the host network -``` - -### Starting Confignode For The First Time - -First, start configNodes on each of the three servers to obtain the machine code. Pay attention to the startup order, start the first iotdb-1 first, then start iotdb-2 and iotdb-3. - -```bash -cd /docker-iotdb -docker-compose -f confignode.yml up -d #Background startup -``` - -### Start Datanode - -Start datanodes on 3 servers separately - -```SQL -cd /docker-iotdb -docker-compose -f datanode.yml up -d #Background startup -``` - -![](/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%882.png) - -### Validate Deployment - -- Viewing the logs, the following words indicate that the datanode has successfully started - - ```SQL - docker logs -f iotdb-datanode #View log command - 2024-07-21 09:40:58,120 [main] INFO o.a.i.db.service.DataNode:227 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! 
- ``` - - ![](/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%883.png) - -- Enter any container to view the service running status and activation information - - View the launched container - - ```SQL - docker ps - ``` - - ![](/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%884.png) - - Enter the container, log in to the database through CLI, and use the `show cluster` command to view the service status and activation status - - ```SQL - docker exec -it iotdb-datanode /bin/bash #Entering the container - ./start-cli.sh -h iotdb-1 #Log in to the database - IoTDB> show cluster #View status - ``` - - You can see that all services are running and the activation status shows as activated. - - ![](/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%885.png) - -### Map/conf Directory (optional) - -If you want to directly modify the configuration file in the physical machine in the future, you can map the/conf folder in the container in three steps: - -Step 1: Copy the `/conf` directory from the container to `/docker-iotdb/iotdb/conf` on each of the three servers - -```bash -docker cp iotdb-confignode:/iotdb/conf /docker-iotdb/iotdb/conf -or -docker cp iotdb-datanode:/iotdb/conf /docker-iotdb/iotdb/conf -``` - -Step 2: Add `/conf` directory mapping in `confignode.yml` and `datanode. yml` on 3 servers - -```bash -#confignode.yml - volumes: - - ./iotdb/conf:/iotdb/conf #Add mapping for this /conf folder - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /dev/mem:/dev/mem:ro - -#datanode.yml - volumes: - - ./iotdb/conf:/iotdb/conf #Add mapping for this /conf folder - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /dev/mem:/dev/mem:ro -``` - -Step 3: Restart IoTDB on 3 servers - -```bash -cd /docker-iotdb -docker-compose -f confignode.yml up -d -docker-compose -f datanode.yml up -d -``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Docker-Deployment_timecho.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Docker-Deployment_timecho.md deleted file mode 100644 index ada158d17..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Docker-Deployment_timecho.md +++ /dev/null @@ -1,475 +0,0 @@ - -# Docker Deployment - -## Environmental Preparation - -### Docker Installation - -```Bash -#Taking Ubuntu as an example, other operating systems can search for installation methods themselves -#step1: Install some necessary system tools -sudo apt-get update -sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common -#step2: Install GPG certificate -curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add - -#step3: Write software source information -sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" -#step4: Update and install Docker CE -sudo apt-get -y update -sudo apt-get -y install docker-ce -#step5: Set Docker to start automatically upon startup -sudo systemctl enable docker -#step6: Verify if Docker installation is successful -docker --version #Display version information, indicating successful installation -``` - -### Docker-compose Installation - -```Bash -#Installation command -curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose -chmod +x /usr/local/bin/docker-compose -ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose -#Verify if the installation was successful -docker-compose --version #Displaying 
version information indicates successful installation -``` - -### Install The Dmidecode Plugin - -By default, Linux servers should already be installed. If not, you can use the following command to install them. - -```Bash -sudo apt-get install dmidecode -``` - -After installing dmidecode, search for the installation path: `wherever dmidecode`. Assuming the result is `/usr/sbin/dmidecode`, remember this path as it will be used in the later docker compose yml file. - -### Get Container Image Of IoTDB - -You can contact business or technical support to obtain container images for IoTDB Enterprise Edition. - -## Stand-Alone Deployment - -This section demonstrates how to deploy a standalone Docker version of 1C1D. - -### Load Image File - -For example, the container image file name of IoTDB obtained here is: `iotdb-enterprise-1.3.2-3-standalone-docker.tar.gz` - -Load image: - -```Bash -docker load -i iotdb-enterprise-1.3.2.3-standalone-docker.tar.gz -``` - -View image: - -```Bash -docker images -``` - -![](/img/%E5%8D%95%E6%9C%BA-%E6%9F%A5%E7%9C%8B%E9%95%9C%E5%83%8F.png) - -### Create Docker Bridge Network - -```Bash -docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb -``` - -### Write The Yml File For docker-compose - -Here we take the example of consolidating the IoTDB installation directory and yml files in the/docker iotdb folder: - -The file directory structure is:`/docker-iotdb/iotdb`, `/docker-iotdb/docker-compose-standalone.yml ` - -```Bash -docker-iotdb: -├── iotdb #Iotdb installation directory -│── docker-compose-standalone.yml #YML file for standalone Docker Composer -``` - -The complete docker-compose-standalone.yml content is as follows: - -```Bash -version: "3" -services: - iotdb-service: - image: iotdb-enterprise:1.3.2.3-standalone #The image used - hostname: iotdb - container_name: iotdb - restart: always - ports: - - "6667:6667" - environment: - - cn_internal_address=iotdb - - cn_internal_port=10710 - - cn_consensus_port=10720 - - cn_seed_config_node=iotdb:10710 - - dn_rpc_address=iotdb - - dn_internal_address=iotdb - - dn_rpc_port=6667 - - dn_internal_port=10730 - - dn_mpp_data_exchange_port=10740 - - dn_schema_region_consensus_port=10750 - - dn_data_region_consensus_port=10760 - - dn_seed_config_node=iotdb:10710 - privileged: true - volumes: - - ./iotdb/activation:/iotdb/activation - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro - - /dev/mem:/dev/mem:ro - networks: - iotdb: - ipv4_address: 172.18.0.6 -networks: - iotdb: - external: true -``` - -### First Launch - -Use the following command to start: - -```Bash -cd /docker-iotdb -docker-compose -f docker-compose-standalone.yml up -``` - -Due to lack of activation, it is normal to exit directly upon initial startup. The initial startup is to obtain the machine code file for the subsequent activation process. - -![](/img/%E5%8D%95%E6%9C%BA-%E6%BF%80%E6%B4%BB.png) - -### Apply For Activation - -- After the first startup, a system_info file will be generated in the physical machine directory `/docker-iotdb/iotdb/activation`, and this file will be copied to the Timecho staff. - - ![](/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB1.png) - -- Received the license file returned by the staff, copy the license file to the `/docker iotdb/iotdb/activation` folder. 
- - ![](/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB2.png) - -### Restart IoTDB - -```Bash -docker-compose -f docker-compose-standalone.yml up -d -``` - -![](/img/%E5%90%AF%E5%8A%A8iotdb.png) - -### Validate Deployment - -- Viewing the log, the following words indicate successful startup - - ```Bash - docker logs -f iotdb-datanode #View log command - 2024-07-19 12:02:32,608 [main] INFO o.a.i.db.service.DataNode:231 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! - ``` - - ![](/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B21.png) - -- Enter the container to view the service running status and activation information - - View the launched container - - ```Bash - docker ps - ``` - - ![](/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B22.png) - - Enter the container, log in to the database through CLI, and use the `show cluster` command to view the service status and activation status - - ```Bash - docker exec -it iotdb /bin/bash #Entering the container - ./start-cli.sh -h iotdb #Log in to the database - IoTDB> show cluster #View status - ``` - - You can see that all services are running and the activation status shows as activated. - - ![](/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B23.png) - -### Map/conf Directory (optional) - -If you want to directly modify the configuration file in the physical machine in the future, you can map the/conf folder in the container in three steps: - -Step 1: Copy the/conf directory from the container to/docker-iotdb/iotdb/conf - -```Bash -docker cp iotdb:/iotdb/conf /docker-iotdb/iotdb/conf -``` - -Step 2: Add mappings in docker-compose-standalone.yml - -```Bash - volumes: - - ./iotdb/conf:/iotdb/conf #Add mapping for this/conf folder - - ./iotdb/activation:/iotdb/activation - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro - - /dev/mem:/dev/mem:ro -``` - -Step 3: Restart IoTDB - -```Bash -docker-compose -f docker-compose-standalone.yml up -d -``` - -## Cluster Deployment - -This section describes how to manually deploy an instance that includes 3 Config Nodes and 3 Data Nodes, commonly known as a 3C3D cluster. - -
- -
- -**Note: The cluster version currently only supports host and overlay networks, and does not support bridge networks.** - -Taking the host network as an example, we will demonstrate how to deploy a 3C3D cluster. - -### Set Host Name - -Assuming there are currently three Linux servers, the IP addresses and service role assignments are as follows: - -| Node IP | Host Name | Service | -| ----------- | --------- | -------------------- | -| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | -| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | -| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | - -Configure the host names on three machines separately. To set the host names, configure `/etc/hosts` on the target server using the following command: - -```Bash -echo "192.168.1.3 iotdb-1" >> /etc/hosts -echo "192.168.1.4 iotdb-2" >> /etc/hosts -echo "192.168.1.5 iotdb-3" >> /etc/hosts -``` - -### Load Image File - -For example, the container image file name obtained for IoTDB is: `iotdb-enterprise-1.3.23-standalone-docker.tar.gz` - -Execute the load image command on three servers separately: - -```Bash -docker load -i iotdb-enterprise-1.3.2.3-standalone-docker.tar.gz -``` - -View image: - -```Bash -docker images -``` - -![](/img/%E9%95%9C%E5%83%8F%E5%8A%A0%E8%BD%BD.png) - -### Write The Yml File For Docker Compose - -Here we take the example of consolidating the IoTDB installation directory and yml files in the /docker-iotdb folder: - -The file directory structure is:/docker-iotdb/iotdb, /docker-iotdb/confignode.yml,/docker-iotdb/datanode.yml - -```Bash -docker-iotdb: -├── confignode.yml #Yml file of confignode -├── datanode.yml #Yml file of datanode -└── iotdb #IoTDB installation directory -``` - -On each server, two yml files need to be written, namely confignnode. yml and datanode. yml. 
The example of yml is as follows: - -**confignode.yml:** - -```Bash -#confignode.yml -version: "3" -services: - iotdb-confignode: - image: iotdb-enterprise:1.3.2.3-standalone #The image used - hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - container_name: iotdb-confignode - command: ["bash", "-c", "entrypoint.sh confignode"] - restart: always - environment: - - cn_internal_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - - cn_internal_port=10710 - - cn_consensus_port=10720 - - cn_seed_config_node=iotdb-1:10710 #The default first node is the seed node - - schema_replication_factor=3 #Number of metadata copies - - data_replication_factor=2 #Number of data replicas - privileged: true - volumes: - - ./iotdb/activation:/iotdb/activation - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro - - /dev/mem:/dev/mem:ro - network_mode: "host" #Using the host network -``` - -**datanode.yml:** - -```Bash -#datanode.yml -version: "3" -services: - iotdb-datanode: - image: iotdb-enterprise:1.3.2.3-standalone #The image used - hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - container_name: iotdb-datanode - command: ["bash", "-c", "entrypoint.sh datanode"] - restart: always - ports: - - "6667:6667" - privileged: true - environment: - - dn_rpc_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - - dn_internal_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation - - dn_seed_config_node=iotdb-1:10710 #The default first node is the seed node - - dn_rpc_port=6667 - - dn_internal_port=10730 - - dn_mpp_data_exchange_port=10740 - - dn_schema_region_consensus_port=10750 - - dn_data_region_consensus_port=10760 - - schema_replication_factor=3 #Number of metadata copies - - data_replication_factor=2 #Number of data replicas - volumes: - - ./iotdb/activation:/iotdb/activation - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro - - /dev/mem:/dev/mem:ro - network_mode: "host" #Using the host network -``` - -### Starting Confignode For The First Time - -First, start configNodes on each of the three servers to obtain the machine code. Pay attention to the startup order, start the first iotdb-1 first, then start iotdb-2 and iotdb-3. 
- -```Bash -cd /docker-iotdb -docker-compose -f confignode.yml up -d #Background startup -``` - -### Apply For Activation - -- After starting three confignodes for the first time, a system_info file will be generated in each physical machine directory `/docker-iotdb/iotdb/activation`, and the system_info files of the three servers will be copied to the Timecho staff; - - ![](/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB1.png) - -- Put the three license files into the `/docker iotdb/iotdb/activation` folder of the corresponding Configurable Node node; - - ![](/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB2.png) - -- After the license is placed in the corresponding activation folder, confignode will be automatically activated without restarting confignode - -### Start Datanode - -Start datanodes on 3 servers separately - -```Bash -cd /docker-iotdb -docker-compose -f datanode.yml up -d #Background startup -``` - -![](/img/%E9%9B%86%E7%BE%A4%E7%89%88-dn%E5%90%AF%E5%8A%A8.png) - -### Validate Deployment - -- Viewing the logs, the following words indicate that the datanode has successfully started - - ```Bash - docker logs -f iotdb-datanode #View log command - 2024-07-20 16:50:48,937 [main] INFO o.a.i.db.service.DataNode:231 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! - ``` - - ![](/img/dn%E5%90%AF%E5%8A%A8.png) - -- Enter any container to view the service running status and activation information - - View the launched container - - ```Bash - docker ps - ``` - - ![](/img/%E6%9F%A5%E7%9C%8B%E5%AE%B9%E5%99%A8.png) - - Enter the container, log in to the database through CLI, and use the `show cluster` command to view the service status and activation status - - ```Bash - docker exec -it iotdb-datanode /bin/bash #Entering the container - ./start-cli.sh -h iotdb-1 #Log in to the database - IoTDB> show cluster #View status - ``` - - You can see that all services are running and the activation status shows as activated. - - ![](/img/%E9%9B%86%E7%BE%A4-%E6%BF%80%E6%B4%BB.png) - -### Map/conf Directory (optional) - -If you want to directly modify the configuration file in the physical machine in the future, you can map the/conf folder in the container in three steps: - -Step 1: Copy the `/conf` directory from the container to `/docker-iotdb/iotdb/conf` on each of the three servers - -```Bash -docker cp iotdb-confignode:/iotdb/conf /docker-iotdb/iotdb/conf -or -docker cp iotdb-datanode:/iotdb/conf /docker-iotdb/iotdb/conf -``` - -Step 2: Add `/conf` directory mapping in `confignode.yml` and `datanode. 
yml` on 3 servers - -```Bash -#confignode.yml - volumes: - - ./iotdb/conf:/iotdb/conf #Add mapping for this /conf folder - - ./iotdb/activation:/iotdb/activation - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro - - /dev/mem:/dev/mem:ro - -#datanode.yml - volumes: - - ./iotdb/conf:/iotdb/conf #Add mapping for this /conf folder - - ./iotdb/activation:/iotdb/activation - - ./iotdb/data:/iotdb/data - - ./iotdb/logs:/iotdb/logs - - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro - - /dev/mem:/dev/mem:ro -``` - -Step 3: Restart IoTDB on 3 servers - -```Bash -cd /docker-iotdb -docker-compose -f confignode.yml up -d -docker-compose -f datanode.yml up -d -``` - diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md deleted file mode 100644 index 45fec09c6..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md +++ /dev/null @@ -1,164 +0,0 @@ - -# Dual Active Deployment - -## What is a double active version? - -Dual active usually refers to two independent machines (or clusters) that perform real-time mirror synchronization. Their configurations are completely independent and can simultaneously receive external writes. Each independent machine (or cluster) can synchronize the data written to itself to another machine (or cluster), and the data of the two machines (or clusters) can achieve final consistency. - -- Two standalone machines (or clusters) can form a high availability group: when one of the standalone machines (or clusters) stops serving, the other standalone machine (or cluster) will not be affected. When the single machine (or cluster) that stopped the service is restarted, another single machine (or cluster) will synchronize the newly written data. Business can be bound to two standalone machines (or clusters) for read and write operations, thereby achieving high availability. -- The dual active deployment scheme allows for high availability with fewer than 3 physical nodes and has certain advantages in deployment costs. At the same time, the physical supply isolation of two sets of single machines (or clusters) can be achieved through the dual ring network of power and network, ensuring the stability of operation. -- At present, the dual active capability is a feature of the enterprise version. - -![](/img/20240731104336.png) - -## Note - -1. It is recommended to prioritize using `hostname` for IP configuration during deployment to avoid the problem of database failure caused by modifying the host IP in the later stage. To set the hostname, you need to configure `/etc/hosts` on the target server. If the local IP is 192.168.1.3 and the hostname is iotdb-1, you can use the following command to set the server's hostname and configure IoTDB's `cn_internal-address` and` dn_internal-address` using the hostname. - - ```Bash - echo "192.168.1.3 iotdb-1" >> /etc/hosts - ``` - -2. Some parameters cannot be modified after the first startup, please refer to the "Installation Steps" section below to set them. - -3. Recommend deploying a monitoring panel, which can monitor important operational indicators and keep track of database operation status at any time. The monitoring panel can be obtained by contacting the business department. 
The steps for deploying the monitoring panel can be referred to [Monitoring Panel Deployment](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) - -## Installation Steps - -Taking the dual active version IoTDB built by two single machines A and B as an example, the IP addresses of A and B are 192.168.1.3 and 192.168.1.4, respectively. Here, we use hostname to represent different hosts. The plan is as follows: - -| Machine | Machine IP | Host Name | -| ------- | ----------- | --------- | -| A | 192.168.1.3 | iotdb-1 | -| B | 192.168.1.4 | iotdb-2 | - -### Step1:Install Two Independent IoTDBs Separately - -Install IoTDB on two machines separately, and refer to the deployment documentation for the standalone version [Stand-Alone Deployment](../Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md),The deployment document for the cluster version can be referred to [Cluster Deployment](../Deployment-and-Maintenance/Cluster-Deployment_timecho.md)。**It is recommended that the configurations of clusters A and B remain consistent to achieve the best dual active effect** - -### Step2:Create A Aata Synchronization Task On Machine A To Machine B - -- Create a data synchronization process on machine A, where the data on machine A is automatically synchronized to machine B. Use the cli tool in the sbin directory to connect to the IoTDB database on machine A: - - ```Bash - ./sbin/start-cli.sh -h iotdb-1 - ``` - -- Create and start the data synchronization command with the following SQL: - - ```Bash - create pipe AB - with source ( - 'source.forwarding-pipe-requests' = 'false' - ) - with sink ( - 'sink'='iotdb-thrift-sink', - 'sink.ip'='iotdb-2', - 'sink.port'='6667' - ) - ``` - -- Note: To avoid infinite data loops, it is necessary to set the parameter `source.forwarding pipe questions` on both A and B to `false`, indicating that data transmitted from another pipe will not be forwarded. - -### Step3:Create A Data Synchronization Task On Machine B To Machine A - -- Create a data synchronization process on machine B, where the data on machine B is automatically synchronized to machine A. Use the cli tool in the sbin directory to connect to the IoTDB database on machine B - - ```Bash - ./sbin/start-cli.sh -h iotdb-2 - ``` - - Create and start the pipe with the following SQL: - - ```Bash - create pipe BA - with source ( - 'source.forwarding-pipe-requests' = 'false' - ) - with sink ( - 'sink'='iotdb-thrift-sink', - 'sink.ip'='iotdb-1', - 'sink.port'='6667' - ) - ``` - -- Note: To avoid infinite data loops, it is necessary to set the parameter `source. forwarding pipe questions` on both A and B to `false` , indicating that data transmitted from another pipe will not be forwarded. - -### Step4:Validate Deployment - -After the above data synchronization process is created, the dual active cluster can be started. - -#### Check the running status of the cluster - -```Bash -#Execute the show cluster command on two nodes respectively to check the status of IoTDB service -show cluster -``` - -**Machine A**: - -![](/img/%E5%8F%8C%E6%B4%BB-A.png) - -**Machine B**: - -![](/img/%E5%8F%8C%E6%B4%BB-B.png) - -Ensure that every Configurable Node and DataNode is in the Running state. - -#### Check synchronization status - -- Check the synchronization status on machine A - -```Bash -show pipes -``` - -![](/img/show%20pipes-A.png) - -- Check the synchronization status on machine B - -```Bash -show pipes -``` - -![](/img/show%20pipes-B.png) - -Ensure that every pipe is in the RUNNING state. 
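Beyond `show pipes`, a quick end-to-end test of the dual-active link is to write a point on one machine and read it back on the other; a sketch using a hypothetical timeseries `root.test.d1.s1`, run from the IoTDB installation directory:

```Bash
# Write one point on machine A ...
./sbin/start-cli.sh -h iotdb-1 -e "insert into root.test.d1(timestamp, s1) values (1, 1.0)"
# ... then, after a few seconds, confirm it has been replicated to machine B
./sbin/start-cli.sh -h iotdb-2 -e "select s1 from root.test.d1"
```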
- -### Step5:Stop Dual Active Version IoTDB - -- Execute the following command on machine A: - - ```SQL - ./sbin/start-cli.sh -h iotdb-1 #Log in to CLI - IoTDB> stop pipe AB #Stop the data synchronization process - ./sbin/stop-standalone.sh #Stop database service - ``` - -- Execute the following command on machine B: - - ```SQL - ./sbin/start-cli.sh -h iotdb-2 #Log in to CLI - IoTDB> stop pipe BA #Stop the data synchronization process - ./sbin/stop-standalone.sh #Stop database service - ``` - diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Environment-Requirements.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Environment-Requirements.md deleted file mode 100644 index c93bda082..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Environment-Requirements.md +++ /dev/null @@ -1,195 +0,0 @@ - -# System Requirements - -## Disk Array - -### Configuration Suggestions - -IoTDB has no strict operation requirements on disk array configuration. It is recommended to use multiple disk arrays to store IoTDB data to achieve the goal of concurrent writing to multiple disk arrays. For configuration, refer to the following suggestions: - -1. Physical environment - System disk: You are advised to use two disks as Raid1, considering only the space occupied by the operating system itself, and do not reserve system disk space for the IoTDB - Data disk: - Raid is recommended to protect data on disks - It is recommended to provide multiple disks (1-6 disks) or disk groups for the IoTDB. (It is not recommended to create a disk array for all disks, as this will affect the maximum performance of the IoTDB.) -2. Virtual environment - You are advised to mount multiple hard disks (1-6 disks). -3. When deploying IoTDB, it is recommended to avoid using network storage devices such as NAS. - -### Configuration Example - -- Example 1: Four 3.5-inch hard disks - -Only a few hard disks are installed on the server. Configure Raid5 directly. -The recommended configurations are as follows: -| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | -| ----------- | -------- | -------- | --------- | -------- | -| system/data disk | RAID5 | 4 | 1 | 3 | is allowed to fail| - -- Example 2: Twelve 3.5-inch hard disks - -The server is configured with twelve 3.5-inch disks. -Two disks are recommended as Raid1 system disks. The two data disks can be divided into two Raid5 groups. Each group of five disks can be used as four disks. -The recommended configurations are as follows: -| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | -| -------- | -------- | -------- | --------- | -------- | -| system disk | RAID1 | 2 | 1 | 1 | -| data disk | RAID5 | 5 | 1 | 4 | -| data disk | RAID5 | 5 | 1 | 4 | -- Example 3:24 2.5-inch disks - -The server is configured with 24 2.5-inch disks. -Two disks are recommended as Raid1 system disks. The last two disks can be divided into three Raid5 groups. Each group of seven disks can be used as six disks. The remaining block can be idle or used to store pre-write logs. 
-The recommended configurations are as follows: -| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | -| -------- | -------- | -------- | --------- | -------- | -| system disk | RAID1 | 2 | 1 | 1 | -| data disk | RAID5 | 7 | 1 | 6 | -| data disk | RAID5 | 7 | 1 | 6 | -| data disk | RAID5 | 7 | 1 | 6 | -| data disk | NoRaid | 1 | 0 | 1 | - -## Operating System - -### Version Requirements - -IoTDB supports operating systems such as Linux, Windows, and MacOS, while the enterprise version supports domestic CPUs such as Loongson, Phytium, and Kunpeng. It also supports domestic server operating systems such as Neokylin, KylinOS, UOS, and Linx. - -### Disk Partition - -- The default standard partition mode is recommended. LVM extension and hard disk encryption are not recommended. -- The system disk needs only the space used by the operating system, and does not need to reserve space for the IoTDB. -- Each disk group corresponds to only one partition. Data disks (with multiple disk groups, corresponding to raid) do not need additional partitions. All space is used by the IoTDB. -The following table lists the recommended disk partitioning methods. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Disk classification | Disk group | Mount point | Capacity | File system type |
| :--- | :--- | :--- | :--- | :--- |
| System disk | Disk group 0 | /boot | 1GB | Default |
| System disk | Disk group 0 | / | Remaining space of the disk group | Default |
| Data disk | Disk group 1 | /data1 | Full space of disk group 1 | Default |
| Data disk | Disk group 2 | /data2 | Full space of disk group 2 | Default |
| ... | ... | ... | ... | ... |
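Each data disk group in the layout above is formatted and mounted as a single partition; a minimal sketch for one data disk, where the device name `/dev/sdb1` is an assumption to be replaced with the actual RAID device:

```Bash
# Create the file system (distribution default; ext4 shown here) and mount it
mkfs.ext4 /dev/sdb1
mkdir -p /data1
mount /dev/sdb1 /data1
# Make the mount persistent across reboots
echo "/dev/sdb1 /data1 ext4 defaults 0 0" >> /etc/fstab
```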
-### Network Configuration - -1. Disable the firewall - -```Bash -# View firewall -systemctl status firewalld -# Disable firewall -systemctl stop firewalld -# Disable firewall permanently -systemctl disable firewalld -``` -2. Ensure that the required port is not occupied - -(1) Check the ports occupied by the cluster: In the default cluster configuration, ConfigNode occupies ports 10710 and 10720, and DataNode occupies ports 6667, 10730, 10740, 10750, 10760, 9090, 9190, and 3000. Ensure that these ports are not occupied. Check methods are as follows: - -```Bash -lsof -i:6667 or netstat -tunp | grep 6667 -lsof -i:10710 or netstat -tunp | grep 10710 -lsof -i:10720 or netstat -tunp | grep 10720 -# If the command outputs, the port is occupied. -``` - -(2) Checking the port occupied by the cluster deployment tool: When using the cluster management tool opskit to install and deploy the cluster, enable the SSH remote connection service configuration and open port 22. - -```Bash -yum install openssh-server # Install the ssh service -systemctl start sshd # Enable port 22 -``` - -3. Ensure that servers are connected to each other - -### Other Configuration - -1. Reduce the system swap priority to the lowest level - -```Bash -echo "vm.swappiness = 0">> /etc/sysctl.conf -# The swapoff -a and swapon -a commands are executed together to dump the data in swap back to memory and to empty the data in swap. -# Do not omit the swappiness setting and just execute swapoff -a; Otherwise, swap automatically opens again after the restart, making the operation invalid. -swapoff -a && swapon -a -# Make the configuration take effect without restarting. -sysctl -p -# Swap's used memory has become 0 -free -m -``` - -2. Set the maximum number of open files to 65535 to avoid the error of "too many open files". - -```Bash -# View current restrictions -ulimit -n -# Temporary changes -ulimit -n 65535 -# Permanent modification -echo "* soft nofile 65535" >> /etc/security/limits.conf -echo "* hard nofile 65535" >> /etc/security/limits.conf -# View after exiting the current terminal session, expect to display 65535 -ulimit -n -``` -## Software Dependence - -Install the Java runtime environment (Java version >= 1.8). Ensure that jdk environment variables are set. (It is recommended to deploy JDK17 for V1.3.2.2 or later. In some scenarios, the performance of JDK of earlier versions is compromised, and Datanodes cannot be stopped.) 
- -```Bash -# The following is an example of installing in centos7 using JDK-17: -tar -zxvf JDk-17_linux-x64_bin.tar # Decompress the JDK file -Vim ~/.bashrc # Configure the JDK environment -{ export JAVA_HOME=/usr/lib/jvm/jdk-17.0.9 - export PATH=$JAVA_HOME/bin:$PATH -} # Add JDK environment variables -source ~/.bashrc # The configuration takes effect -java -version # Check the JDK environment -``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/IoTDB-Package.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/IoTDB-Package.md deleted file mode 100644 index 6057ef6a2..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/IoTDB-Package.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -redirectTo: IoTDB-Package_apache.html ---- - \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/IoTDB-Package_apache.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/IoTDB-Package_apache.md deleted file mode 100644 index dc0484412..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/IoTDB-Package_apache.md +++ /dev/null @@ -1,42 +0,0 @@ - -# Obtain IoTDB - -## How to obtain installation packages -The installation package can be directly obtained from the Apache IoTDB official website:https://iotdb.apache.org/Download/ - -## Installation Package Structure -Install the package after decompression(`apache-iotdb--all-bin.zip`),After decompressing the installation package, the directory structure is as follows: -| **catalogue** | **Type** | **Explanation** | -| :--------------: | :------: | :----------------------------------------------------------: | -| conf | folder | Configuration file directory, including configuration files such as ConfigNode, DataNode, JMX, and logback | -| data | folder | The default data file directory contains data files for ConfigNode and DataNode. (The directory will only be generated after starting the program) | -| lib | folder | IoTDB executable library file directory | -| licenses | folder | Open source community certificate file directory | -| logs | folder | The default log file directory, which includes log files for ConfigNode and DataNode (this directory will only be generated after starting the program) | -| sbin | folder | Main script directory, including start, stop, and other scripts | -| tools | folder | Directory of System Peripheral Tools | -| ext | folder | Related files for pipe, trigger, and UDF plugins (created by the user when needed) | -| LICENSE | file | certificate | -| NOTICE | file | Tip | -| README_ZH\.md | file | Explanation of the Chinese version in Markdown format | -| README\.md | file | Instructions for use | -| RELEASE_NOTES\.md | file | Version Description | \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/IoTDB-Package_timecho.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/IoTDB-Package_timecho.md deleted file mode 100644 index 86e0af2aa..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/IoTDB-Package_timecho.md +++ /dev/null @@ -1,42 +0,0 @@ - -# Obtain TimechoDB -## How to obtain TimechoDB -The enterprise version installation package can be obtained through product trial application or by directly contacting the business personnel who are in contact with you. 
- -## Installation Package Structure -Install the package after decompression(iotdb-enterprise-{version}-bin.zip),The directory structure after unpacking the installation package is as follows: -| **catalogue** | **Type** | **Explanation** | -| :--------------: | -------- | ------------------------------------------------------------ | -| activation | folder | The directory where the activation file is located, including the generated machine code and the enterprise version activation code obtained from the business side (this directory will only be generated after starting ConfigNode to obtain the activation code) | -| conf | folder | Configuration file directory, including configuration files such as ConfigNode, DataNode, JMX, and logback | -| data | folder | The default data file directory contains data files for ConfigNode and DataNode. (The directory will only be generated after starting the program) | -| lib | folder | IoTDB executable library file directory | -| licenses | folder | Open source community certificate file directory | -| logs | folder | The default log file directory, which includes log files for ConfigNode and DataNode (this directory will only be generated after starting the program) | -| sbin | folder | Main script directory, including start, stop, and other scripts | -| tools | folder | Directory of System Peripheral Tools | -| ext | folder | Related files for pipe, trigger, and UDF plugins (created by the user when needed) | -| LICENSE | file | certificate | -| NOTICE | file | Tip | -| README_ZH\.md | file | Explanation of the Chinese version in Markdown format | -| README\.md | file | Instructions for use | -| RELEASE_NOTES\.md | file | Version Description | diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Monitoring-panel-deployment.md deleted file mode 100644 index 835dd1120..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Monitoring-panel-deployment.md +++ /dev/null @@ -1,680 +0,0 @@ - -# Monitoring Panel Deployment - -The IoTDB monitoring panel is one of the supporting tools for the IoTDB Enterprise Edition. It aims to solve the monitoring problems of IoTDB and its operating system, mainly including operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring indicators, in order to help users monitor the health status of the cluster, and perform cluster optimization and operation. This article will take common 3C3D clusters (3 Confignodes and 3 Datanodes) as examples to introduce how to enable the system monitoring module in an IoTDB instance and use Prometheus+Grafana to visualize the system monitoring indicators. - -## Installation Preparation - -1. Installing IoTDB: You need to first install IoTDB V1.0 or above Enterprise Edition. You can contact business or technical support to obtain -2. Obtain the IoTDB monitoring panel installation package: Based on the enterprise version of IoTDB database monitoring panel, you can contact business or technical support to obtain - -## Installation Steps - -### Step 1: IoTDB enables monitoring indicator collection - -1. Open the monitoring configuration item. The configuration items related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to open the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration). 
- -| **Configuration** | Located in the configuration file | **Description** | -| :--------------------------------- | :-------------------------------- | :----------------------------------------------------------- | -| cn_metric_reporter_list | conf/iotdb-confignode.properties | Uncomment the configuration item and set the value to PROMETHEUS | -| cn_metric_level | conf/iotdb-confignode.properties | Uncomment the configuration item and set the value to IMPORTANT | -| cn_metric_prometheus_reporter_port | conf/iotdb-sysconfignodetem.properties | Uncomment the configuration item to maintain the default setting of 9091. If other ports are set, they will not conflict with each other | -| dn_metric_reporter_list | conf/iotdb-datanode.properties | Uncomment the configuration item and set the value to PROMETHEUS | -| dn_metric_level | conf/iotdb-datanode.properties | Uncomment the configuration item and set the value to IMPORTANT | -| dn_metric_prometheus_reporter_port | conf/iotdb-datanode.properties | Uncomment the configuration item and set it to 9092 by default. If other ports are set, they will not conflict with each other | - -Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows: - -| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration | -| ----------- | --------- | ------------ | -------------------------------- | ------------------------------------------------------------ | -| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-confignode.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-confignode.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-confignode.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | -| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-datanode.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-datanode.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | -| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-datanode.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | - -2. Restart all nodes. After modifying the monitoring indicator configuration of three nodes, the confignode and datanode services of all nodes can be restarted: - -```Bash -./sbin/stop-standalone.sh #Stop confignode and datanode first -./sbin/start-confignode.sh -d #Start confignode -./sbin/start-datanode.sh -d #Start datanode -``` - -3. After restarting, confirm the running status of each node through the client. If the status is Running, it indicates successful configuration: - -![](/img/%E5%90%AF%E5%8A%A8.png) - -### Step 2: Install and configure Prometheus - -> Taking Prometheus installed on server 192.168.1.3 as an example. - -1. Download the Prometheus installation package, which requires installation of V2.30.3 and above. You can go to the Prometheus official website to download it(https://prometheus.io/docs/introduction/first_steps/) -2. Unzip the installation package and enter the unzipped folder: - -```Shell -tar xvfz prometheus-*.tar.gz -cd prometheus-* -``` - -3. Modify the configuration. 
Modify the configuration file prometheus.yml as follows - 1. Add configNode task to collect monitoring data for ConfigNode - 2. Add a datanode task to collect monitoring data for DataNodes - -```YAML -global: - scrape_interval: 15s - evaluation_interval: 15s -scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true -``` - -4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows: - -```Shell -./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d -``` - -5. Confirm successful startup. Enter in browser http://192.168.1.3:9090 Go to Prometheus and click on the Target interface under Status. When you see that all States are Up, it indicates successful configuration and connectivity. - -
- -6. Clicking on the left link in Targets will redirect you to web monitoring and view the monitoring information of the corresponding node: - -![](/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) - -### Step 3: Install Grafana and configure the data source - -> Taking Grafana installed on server 192.168.1.3 as an example. - -1. Download the Grafana installation package, which requires installing version 8.4.2 or higher. You can go to the Grafana official website to download it(https://grafana.com/grafana/download) -2. Unzip and enter the corresponding folder - -```Shell -tar -zxvf grafana-*.tar.gz -cd grafana-* -``` - -3. Start Grafana: - -```Shell -./bin/grafana-server web -``` - -4. Log in to Grafana. Enter in browser http://192.168.1.3:3000 (or the modified port), enter Grafana, and the default initial username and password are both admin. - -5. Configure data sources. Find Data sources in Connections, add a new data source, and configure the Data Source to Prometheus - -![](/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) - -When configuring the Data Source, pay attention to the URL where Prometheus is located. After configuring it, click on Save&Test and a Data Source is working prompt will appear, indicating successful configuration - -![](/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) - -### Step 4: Import IoTDB Grafana Dashboards - -1. Enter Grafana and select Dashboards: - - ![](/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) - -2. Click the Import button on the right side - - ![](/img/Import%E6%8C%89%E9%92%AE.png) - -3. Import Dashboard using upload JSON file - - ![](/img/%E5%AF%BC%E5%85%A5Dashboard.png) - -4. Select the JSON file of one of the panels in the IoTDB monitoring panel, using the Apache IoTDB ConfigNode Dashboard as an example (refer to the installation preparation section in this article for the monitoring panel installation package): - - ![](/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) - -5. Select Prometheus as the data source and click Import - - ![](/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) - -6. Afterwards, you can see the imported Apache IoTDB ConfigNode Dashboard monitoring panel - - ![](/img/%E9%9D%A2%E6%9D%BF.png) - -7. Similarly, we can import the Apache IoTDB DataNode Dashboard Apache Performance Overview Dashboard、Apache System Overview Dashboard, You can see the following monitoring panel: - -
(Screenshots of the imported Apache IoTDB DataNode, Performance Overview, and System Overview dashboards.)
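If you prefer to script the import instead of clicking through the UI, the same JSON files can be loaded through Grafana's HTTP API. This is only a sketch: the admin credentials and the file name `Apache-IoTDB-ConfigNode-Dashboard.json` are placeholders, and an exported dashboard JSON may need its `id` field set to null before it can be created on a fresh Grafana instance.

```Shell
# POST the dashboard JSON to Grafana's create/update endpoint (placeholder credentials and file name)
curl -s -u admin:admin -H "Content-Type: application/json" \
  -X POST http://192.168.1.3:3000/api/dashboards/db \
  -d "{\"dashboard\": $(cat Apache-IoTDB-ConfigNode-Dashboard.json), \"overwrite\": true}"
```

Repeating the call for the DataNode, Performance Overview, and System Overview JSON files imports all four panels the same way; you may still need to point them at the Prometheus data source configured in Step 3.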
- -8. At this point, all IoTDB monitoring panels have been imported and monitoring information can now be viewed at any time. - - ![](/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) - -## Appendix, Detailed Explanation of Monitoring Indicators - -### System Dashboard - -This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. - -#### CPU - -- CPU Core:CPU cores -- CPU Load: - - System CPU Load:The average CPU load and busyness of the entire system during the sampling time - - Process CPU Load:The proportion of CPU occupied by the IoTDB process during sampling time -- CPU Time Per Minute:The total CPU time of all processes in the system per minute - -#### Memory - -- System Memory:The current usage of system memory. - - Commited vm size: The size of virtual memory allocated by the operating system to running processes. - - Total physical memory:The total amount of available physical memory in the system. - - Used physical memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache. -- System Swap Memory:Swap Space memory usage. -- Process Memory:The usage of memory by the IoTDB process. - - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file) - - Total Memory:The total amount of memory that the IoTDB process has currently requested from the operating system. - - Used Memory:The total amount of memory currently used by the IoTDB process. - -#### Disk - -- Disk Space: - - Total disk space:The maximum disk space that IoTDB can use. - - Used disk space:The disk space already used by IoTDB. -- Log Number Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time. -- File Count:Number of IoTDB related files - - all:All file quantities - - TsFile:Number of TsFiles - - seq:Number of sequential TsFiles - - unseq:Number of unsequence TsFiles - - wal:Number of WAL files - - cross-temp:Number of cross space merge temp files - - inner-seq-temp:Number of merged temp files in sequential space - - innser-unseq-temp:Number of merged temp files in unsequential space - - mods:Number of tombstone files -- Open File Count:Number of file handles opened by the system -- File Size:The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file. -- Disk I/O Busy Rate:Equivalent to the% util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk. -- Disk I/O Throughput:The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. -- Disk I/O Ops:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. -- Disk I/O Avg Time:Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. 
-- Disk I/O Avg Size:Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. -- Disk I/O Avg Queue Size:Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. -- I/O System Call Rate:The frequency of process calls to read and write system calls, similar to IOPS. -- I/O Throughput:The throughput of process I/O can be divided into two categories: actual-read/write and attemppt-read/write. Actual read and actual write refer to the number of bytes that a process actually causes block devices to perform I/O, excluding the parts processed by Page Cache. - -#### JVM - -- GC Time Percentage:The proportion of GC time spent by the node JVM in the past minute's time window -- GC Allocated/Promoted Size Detail: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications -- GC Data Size Detail:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value -- Heap Memory:JVM heap memory usage. - - Maximum heap memory:The maximum available heap memory size for the JVM. - - Committed heap memory:The size of heap memory that has been committed by the JVM. - - Used heap memory:The size of heap memory already used by the JVM. - - PS Eden Space:The size of the PS Young area. - - PS Old Space:The size of the PS Old area. - - PS Survivor Space:The size of the PS survivor area. - - ...(CMS/G1/ZGC, etc) -- Off Heap Memory:Out of heap memory usage. - - direct memory:Out of heap direct memory. - - mapped memory:Out of heap mapped memory. -- GC Number Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC -- GC Time Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC -- GC Number Per Minute Detail:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC -- GC Time Per Minute Detail:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC -- Time Consumed Of Compilation Per Minute:The total time JVM spends compiling per minute -- The Number of Class: - - loaded:The number of classes currently loaded by the JVM - - unloaded:The number of classes uninstalled by the JVM since system startup -- The Number of Java Thread:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. - -#### Network - -Eno refers to the network card connected to the public network, while lo refers to the virtual network card. 
- -- Net Speed:The speed of network card sending and receiving data -- Receive/Transmit Data Size:The size of data packets sent or received by the network card, calculated from system restart -- Packet Speed:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets -- Connection Num:The current number of socket connections for the selected process (IoTDB only has TCP) - -### Performance Overview Dashboard - -#### Cluster Overview - -- Total CPU Core:Total CPU cores of cluster machines -- DataNode CPU Load:CPU usage of each DataNode node in the cluster -- Disk - - Total Disk Space: Total disk size of cluster machines - - DataNode Disk Usage: The disk usage rate of each DataNode in the cluster -- Total Timeseries: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas -- Cluster: Number of ConfigNode and DataNode nodes in the cluster -- Up Time: The duration of cluster startup until now -- Total Write Point Per Second: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas -- Memory - - Total System Memory: Total memory size of cluster machine system - - Total Swap Memory: Total size of cluster machine swap memory - - DataNode Process Memory Usage: Memory usage of each DataNode in the cluster -- Total File Number:Total number of cluster management files -- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage -- Total DataBase: The total number of databases managed by the cluster (including replicas) -- Total DataRegion: The total number of DataRegions managed by the cluster -- Total SchemaRegion: The total number of SchemeRegions managed by the cluster - -#### Node Overview - -- CPU Core: The number of CPU cores in the machine where the node is located -- Disk Space: The disk size of the machine where the node is located -- Timeseries: Number of time series managed by the machine where the node is located (including replicas) -- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio -- Write Point Per Second: The write speed per second of the machine where the node is located (including replicas) -- System Memory: The system memory size of the machine where the node is located -- Swap Memory:The swap memory size of the machine where the node is located -- File Number: Number of files managed by nodes - -#### Performance - -- Session Idle Time:The total idle time and total busy time of the session connection of the node -- Client Connection: The client connection status of the node, including the total number of connections and the number of active connections -- Time Consumed Of Operation: The time consumption of various types of node operations, including average and P99 -- Average Time Consumed Of Interface: The average time consumption of each thrust interface of a node -- P99 Time Consumed Of Interface: P99 time consumption of various thrust interfaces of nodes -- Task Number: The number of system tasks for each node -- Average Time Consumed of Task: The average time spent on various system tasks of a node -- P99 Time Consumed of Task: P99 time consumption for various system tasks of nodes -- Operation Per 
Second: The number of operations per second for a node -- Mainstream Process - - Operation Per Second Of Stage: The number of operations per second for each stage of the node's main process - - Average Time Consumed Of Stage: The average time consumption of each stage in the main process of a node - - P99 Time Consumed Of Stage: P99 time consumption for each stage of the node's main process -- Schedule Stage - - OPS Of Schedule: The number of operations per second in each sub stage of the node schedule stage - - Average Time Consumed Of Schedule Stage:The average time consumption of each sub stage in the node schedule stage - - P99 Time Consumed Of Schedule Stage: P99 time consumption for each sub stage of the schedule stage of the node -- Local Schedule Sub Stages - - OPS Of Local Schedule Stage: The number of operations per second in each sub stage of the local schedule node - - Average Time Consumed Of Local Schedule Stage: The average time consumption of each sub stage in the local schedule stage of the node - - P99 Time Consumed Of Local Schedule Stage: P99 time consumption for each sub stage of the local schedule stage of the node -- Storage Stage - - OPS Of Storage Stage: The number of operations per second in each sub stage of the node storage stage - - Average Time Consumed Of Storage Stage: Average time consumption of each sub stage in the node storage stage - - P99 Time Consumed Of Storage Stage: P99 time consumption for each sub stage of node storage stage -- Engine Stage - - OPS Of Engine Stage: The number of operations per second in each sub stage of the node engine stage - - Average Time Consumed Of Engine Stage: The average time consumption of each sub stage in the engine stage of a node - - P99 Time Consumed Of Engine Stage: P99 time consumption of each sub stage in the node engine stage - -#### System - -- CPU Load: CPU load of nodes -- CPU Time Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores -- GC Time Per Minute:The average GC time per minute for nodes, including YGC and FGC -- Heap Memory: Node's heap memory usage -- Off Heap Memory: Non heap memory usage of nodes -- The Number Of Java Thread: Number of Java threads on nodes -- File Count:Number of files managed by nodes -- File Size: Node management file size situation -- Log Number Per Minute: Different types of logs per minute for nodes - -### ConfigNode Dashboard - -This panel displays the performance of all management nodes in the cluster, including partitioning, node information, and client connection statistics. 
- -#### Node Overview - -- Database Count: Number of databases for nodes -- Region - - DataRegion Count:Number of DataRegions for nodes - - DataRegion Current Status: The state of the DataRegion of the node - - SchemaRegion Count: Number of SchemeRegions for nodes - - SchemaRegion Current Status: The state of the SchemeRegion of the node -- System Memory: The system memory size of the node -- Swap Memory: Node's swap memory size -- ConfigNodes: The running status of the ConfigNode in the cluster where the node is located -- DataNodes:The DataNode situation of the cluster where the node is located -- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load - -#### NodeInfo - -- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode -- ConfigNode Status: The status of the ConfigNode node in the cluster where the node is located -- DataNode Status: The status of the DataNode node in the cluster where the node is located -- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located -- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located -- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located -- DataRegionGroup Leader Distribution:The distribution of leaders in the DataRegionGroup of the cluster where the node is located - -#### Protocol - -- Client Count - - Active Client Num: The number of active clients in each thread pool of a node - - Idle Client Num: The number of idle clients in each thread pool of a node - - Borrowed Client Count: Number of borrowed clients in each thread pool of the node - - Created Client Count: Number of created clients for each thread pool of the node - - Destroyed Client Count: The number of destroyed clients in each thread pool of the node -- Client time situation - - Client Mean Active Time: The average active time of clients in each thread pool of a node - - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node - - Client Mean Idle Time: The average idle time of clients in each thread pool of a node - -#### Partition Table - -- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located -- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located -- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located -- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located -- DataRegion Status: The DataRegion status of the cluster where the node is located -- SchemaRegion Status: The status of the SchemeRegion of the cluster where the node is located - -#### Consensus - -- Ratis Stage Time: The time consumption of each stage of the node's Ratis -- Write Log Entry: The time required to write a log for the Ratis of a node -- Remote / Local Write Time: The time consumption of remote and local writes for the Ratis of nodes -- Remote / Local Write QPS: Remote and local QPS written to node Ratis -- RatisConsensus Memory: Memory usage of Node Ratis consensus protocol - -### DataNode Dashboard - -This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc. 
- -#### Node Overview - -- The Number Of Entity: Entity situation of node management -- Write Point Per Second: The write speed per second of the node -- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. - -#### Protocol - -- Node Operation Time Consumption - - The Time Consumed Of Operation (avg): The average time spent on various operations of a node - - The Time Consumed Of Operation (50%): The median time spent on various operations of a node - - The Time Consumed Of Operation (99%): P99 time consumption for various operations of nodes -- Thrift Statistics - - The QPS Of Interface: QPS of various Thrift interfaces of nodes - - The Avg Time Consumed Of Interface: The average time consumption of each Thrift interface of a node - - Thrift Connection: The number of Thrfit connections of each type of node - - Thrift Active Thread: The number of active Thrift connections for each type of node -- Client Statistics - - Active Client Num: The number of active clients in each thread pool of a node - - Idle Client Num: The number of idle clients in each thread pool of a node - - Borrowed Client Count:Number of borrowed clients for each thread pool of a node - - Created Client Count: Number of created clients for each thread pool of the node - - Destroyed Client Count: The number of destroyed clients in each thread pool of the node - - Client Mean Active Time: The average active time of clients in each thread pool of a node - - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node - - Client Mean Idle Time: The average idle time of clients in each thread pool of a node - -#### Storage Engine - -- File Count: Number of files of various types managed by nodes -- File Size: Node management of various types of file sizes -- TsFile - - TsFile Total Size In Each Level: The total size of TsFile files at each level of node management - - TsFile Count In Each Level: Number of TsFile files at each level of node management - - Avg TsFile Size In Each Level: The average size of TsFile files at each level of node management -- Task Number: Number of Tasks for Nodes -- The Time Consumed of Task: The time consumption of tasks for nodes -- Compaction - - Compaction Read And Write Per Second: The merge read and write speed of nodes per second - - Compaction Number Per Minute: The number of merged nodes per minute - - Compaction Process Chunk Status: The number of Chunks in different states merged by nodes - - Compacted Point Num Per Minute: The number of merged nodes per minute - -#### Write Performance - -- Write Cost(avg): Average node write time, including writing wal and memtable -- Write Cost(50%): Median node write time, including writing wal and memtable -- Write Cost(99%): P99 for node write time, including writing wal and memtable -- WAL - - WAL File Size: Total size of WAL files managed by nodes - - WAL File Num:Number of WAL files managed by nodes - - WAL Nodes Num: Number of WAL nodes managed by nodes - - Make Checkpoint Costs: The time required to create various types of CheckPoints for nodes - - WAL Serialize Total Cost: Total time spent on node WAL serialization - - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster - - Serialize One WAL Info Entry Cost: Node serialization 
time for a WAL Info Entry - - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot - - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush - - Effective Info Ratio Of WALNode: The effective information ratio of different WALNodes of nodes - - WAL Buffer - - WAL Buffer Cost: Node WAL flush SyncBuffer takes time, including both synchronous and asynchronous options - - WAL Buffer Used Ratio: The usage rate of the WAL Buffer of the node - - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node -- Flush Statistics - - Flush MemTable Cost(avg): The total time spent on node Flush and the average time spent on each sub stage - - Flush MemTable Cost(50%): The total time spent on node Flush and the median time spent on each sub stage - - Flush MemTable Cost(99%): The total time spent on node Flush and the P99 time spent on each sub stage - - Flush Sub Task Cost(avg): The average time consumption of each node's Flush subtask, including sorting, encoding, and IO stages - - Flush Sub Task Cost(50%): The median time consumption of each subtask of the Flush node, including sorting, encoding, and IO stages - - Flush Sub Task Cost(99%): The average subtask time P99 for Flush of nodes, including sorting, encoding, and IO stages -- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node -- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes -- Tsfile Compression Ratio Of Flushing MemTable: The compression rate of TsFile corresponding to node flashing Memtable -- Flush TsFile Size Of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions -- Size Of Flushing MemTable: The size of the Memtable for node disk flushing -- Points Num Of Flushing MemTable: The number of points when flashing data in different DataRegions of a node -- Series Num Of Flushing MemTable: The number of time series when flashing Memtables in different DataRegions of a node -- Average Point Num Of Flushing MemChunk: The average number of disk flushing points for node MemChunk - -#### Schema Engine - -- Schema Engine Mode: The metadata engine pattern of nodes -- Schema Consensus Protocol: Node metadata consensus protocol -- Schema Region Number:Number of SchemeRegions managed by nodes -- Schema Region Memory Overview: The amount of memory in the SchemeRegion of a node -- Memory Usgae per SchemaRegion:The average memory usage size of node SchemaRegion -- Cache MNode per SchemaRegion: The number of cache nodes in each SchemeRegion of a node -- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemeRegion of the node (valid only for SimpleConsense) -- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemeRegion of a node -- Activated Template Count per SchemaRegion: The number of activated templates in each SchemeRegion of a node -- Time Series statistics - - Timeseries Count per SchemaRegion: The average number of time series for node SchemaRegion - - Series Type: Number of time series of different types of nodes - - Time Series Number: The total number of time series nodes - - Template Series Number: The total number of template time series for nodes - - Template Series Count per SchemaRegion: The number of sequences created through templates in each SchemeRegion of a node -- IMNode Statistics - - Pinned MNode per SchemaRegion: Number of IMNode nodes with Pinned nodes in 
each SchemeRegion - - Pinned Memory per SchemaRegion: The memory usage size of the IMNode node for Pinned nodes in each SchemeRegion of the node - - Unpinned MNode per SchemaRegion: The number of unpinned IMNode nodes in each SchemeRegion of a node - - Unpinned Memory per SchemaRegion: Memory usage size of unpinned IMNode nodes in each SchemeRegion of the node - - Schema File Memory MNode Number: Number of IMNode nodes with global pinned and unpinned nodes - - Release and Flush MNode Rate: The number of IMNodes that release and flush nodes per second -- Cache Hit Rate: Cache hit rate of nodes -- Release and Flush Thread Number: The current number of active Release and Flush threads on the node -- Time Consumed of Relead and Flush (avg): The average time taken for node triggered cache release and buffer flushing -- Time Consumed of Relead and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing - -#### Query Engine - -- Time Consumption In Each Stage - - The time consumed of query plan stages(avg): The average time spent on node queries at each stage - - The time consumed of query plan stages(50%): Median time spent on node queries at each stage - - The time consumed of query plan stages(99%): P99 time consumption for node query at each stage -- Execution Plan Distribution Time - - The time consumed of plan dispatch stages(avg): The average time spent on node query execution plan distribution - - The time consumed of plan dispatch stages(50%): Median time spent on node query execution plan distribution - - The time consumed of plan dispatch stages(99%): P99 of node query execution plan distribution time -- Execution Plan Execution Time - - The time consumed of query execution stages(avg): The average execution time of node query execution plan - - The time consumed of query execution stages(50%):Median execution time of node query execution plan - - The time consumed of query execution stages(99%): P99 of node query execution plan execution time -- Operator Execution Time - - The time consumed of operator execution stages(avg): The average execution time of node query operators - - The time consumed of operator execution(50%): Median execution time of node query operator - - The time consumed of operator execution(99%): P99 of node query operator execution time -- Aggregation Query Computation Time - - The time consumed of query aggregation(avg): The average computation time for node aggregation queries - - The time consumed of query aggregation(50%): Median computation time for node aggregation queries - - The time consumed of query aggregation(99%): P99 of node aggregation query computation time -- File/Memory Interface Time Consumption - - The time consumed of query scan(avg): The average time spent querying file/memory interfaces for nodes - - The time consumed of query scan(50%): Median time spent querying file/memory interfaces for nodes - - The time consumed of query scan(99%): P99 time consumption for node query file/memory interface -- Number Of Resource Visits - - The usage of query resource(avg): The average number of resource visits for node queries - - The usage of query resource(50%): Median number of resource visits for node queries - - The usage of query resource(99%): P99 for node query resource access quantity -- Data Transmission Time - - The time consumed of query data exchange(avg): The average time spent on node query data transmission - - The time consumed of query data exchange(50%): Median query data transmission time for nodes - - 
The time consumed of query data exchange(99%): P99 for node query data transmission time -- Number Of Data Transfers - - The count of Data Exchange(avg): The average number of data transfers queried by nodes - - The count of Data Exchange: The quantile of the number of data transfers queried by nodes, including the median and P99 -- Task Scheduling Quantity And Time Consumption - - The number of query queue: Node query task scheduling quantity - - The time consumed of query schedule time(avg): The average time spent on scheduling node query tasks - - The time consumed of query schedule time(50%): Median time spent on node query task scheduling - - The time consumed of query schedule time(99%): P99 of node query task scheduling time - -#### Query Interface - -- Load Time Series Metadata - - The time consumed of load timeseries metadata(avg): The average time taken for node queries to load time series metadata - - The time consumed of load timeseries metadata(50%): Median time spent on loading time series metadata for node queries - - The time consumed of load timeseries metadata(99%): P99 time consumption for node query loading time series metadata -- Read Time Series - - The time consumed of read timeseries metadata(avg): The average time taken for node queries to read time series - - The time consumed of read timeseries metadata(50%): The median time taken for node queries to read time series - - The time consumed of read timeseries metadata(99%): P99 time consumption for node query reading time series -- Modify Time Series Metadata - - The time consumed of timeseries metadata modification(avg):The average time taken for node queries to modify time series metadata - - The time consumed of timeseries metadata modification(50%): Median time spent on querying and modifying time series metadata for nodes - - The time consumed of timeseries metadata modification(99%): P99 time consumption for node query and modification of time series metadata -- Load Chunk Metadata List - - The time consumed of load chunk metadata list(avg): The average time it takes for node queries to load Chunk metadata lists - - The time consumed of load chunk metadata list(50%): Median time spent on node query loading Chunk metadata list - - The time consumed of load chunk metadata list(99%): P99 time consumption for node query loading Chunk metadata list -- Modify Chunk Metadata - - The time consumed of chunk metadata modification(avg): The average time it takes for node queries to modify Chunk metadata - - The time consumed of chunk metadata modification(50%): The total number of bits spent on modifying Chunk metadata for node queries - - The time consumed of chunk metadata modification(99%): P99 time consumption for node query and modification of Chunk metadata -- Filter According To Chunk Metadata - - The time consumed of chunk metadata filter(avg): The average time spent on node queries filtering by Chunk metadata - - The time consumed of chunk metadata filter(50%): Median filtering time for node queries based on Chunk metadata - - The time consumed of chunk metadata filter(99%): P99 time consumption for node query filtering based on Chunk metadata -- Constructing Chunk Reader - - The time consumed of construct chunk reader(avg): The average time spent on constructing Chunk Reader for node queries - - The time consumed of construct chunk reader(50%): Median time spent on constructing Chunk Reader for node queries - - The time consumed of construct chunk reader(99%): P99 time consumption for constructing Chunk Reader 
for node queries -- Read Chunk - - The time consumed of read chunk(avg): The average time taken for node queries to read Chunks - - The time consumed of read chunk(50%): Median time spent querying nodes to read Chunks - - The time consumed of read chunk(99%): P99 time spent on querying and reading Chunks for nodes -- Initialize Chunk Reader - - The time consumed of init chunk reader(avg): The average time spent initializing Chunk Reader for node queries - - The time consumed of init chunk reader(50%): Median time spent initializing Chunk Reader for node queries - - The time consumed of init chunk reader(99%):P99 time spent initializing Chunk Reader for node queries -- Constructing TsBlock Through Page Reader - - The time consumed of build tsblock from page reader(avg): The average time it takes for node queries to construct TsBlock through Page Reader - - The time consumed of build tsblock from page reader(50%): The median time spent on constructing TsBlock through Page Reader for node queries - - The time consumed of build tsblock from page reader(99%):Node query using Page Reader to construct TsBlock time-consuming P99 -- Query the construction of TsBlock through Merge Reader - - The time consumed of build tsblock from merge reader(avg): The average time taken for node queries to construct TsBlock through Merge Reader - - The time consumed of build tsblock from merge reader(50%): The median time spent on constructing TsBlock through Merge Reader for node queries - - The time consumed of build tsblock from merge reader(99%): Node query using Merge Reader to construct TsBlock time-consuming P99 - -#### Query Data Exchange - -The data exchange for the query is time-consuming. - -- Obtain TsBlock through source handle - - The time consumed of source handle get tsblock(avg): The average time taken for node queries to obtain TsBlock through source handle - - The time consumed of source handle get tsblock(50%):Node query obtains the median time spent on TsBlock through source handle - - The time consumed of source handle get tsblock(99%): Node query obtains TsBlock time P99 through source handle -- Deserialize TsBlock through source handle - - The time consumed of source handle deserialize tsblock(avg): The average time taken for node queries to deserialize TsBlock through source handle - - The time consumed of source handle deserialize tsblock(50%): The median time taken for node queries to deserialize TsBlock through source handle - - The time consumed of source handle deserialize tsblock(99%): P99 time spent on deserializing TsBlock through source handle for node query -- Send TsBlock through sink handle - - The time consumed of sink handle send tsblock(avg): The average time taken for node queries to send TsBlock through sink handle - - The time consumed of sink handle send tsblock(50%): Node query median time spent sending TsBlock through sink handle - - The time consumed of sink handle send tsblock(99%): Node query sends TsBlock through sink handle with a time consumption of P99 -- Callback data block event - - The time consumed of on acknowledge data block event task(avg): The average time taken for node query callback data block event - - The time consumed of on acknowledge data block event task(50%): Median time spent on node query callback data block event - - The time consumed of on acknowledge data block event task(99%): P99 time consumption for node query callback data block event -- Get Data Block Tasks - - The time consumed of get data block task(avg): The average time taken for 
node queries to obtain data block tasks - - The time consumed of get data block task(50%): The median time taken for node queries to obtain data block tasks - - The time consumed of get data block task(99%): P99 time consumption for node query to obtain data block task - -#### Query Related Resource - -- MppDataExchangeManager:The number of shuffle sink handles and source handles during node queries -- LocalExecutionPlanner: The remaining memory that nodes can allocate to query shards -- FragmentInstanceManager: The query sharding context information and the number of query shards that the node is running -- Coordinator: The number of queries recorded on the node -- MemoryPool Size: Node query related memory pool situation -- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values -- DriverScheduler: Number of queue tasks related to node queries - -#### Consensus - IoT Consensus - -- Memory Usage - - IoTConsensus Used Memory: The memory usage of IoT Consumes for nodes, including total memory usage, queue usage, and synchronization usage -- Synchronization Status Between Nodes - - IoTConsensus Sync Index: SyncIndex size for different DataRegions of IoT Consumption nodes - - IoTConsensus Overview:The total synchronization gap and cached request count of IoT consumption for nodes - - IoTConsensus Search Index Rate: The growth rate of writing SearchIndex for different DataRegions of IoT Consumer nodes - - IoTConsensus Safe Index Rate: The growth rate of synchronous SafeIndex for different DataRegions of IoT Consumer nodes - - IoTConsensus LogDispatcher Request Size: The request size for node IoT Consusus to synchronize different DataRegions to other nodes - - Sync Lag: The size of synchronization gap between different DataRegions in IoT Consumption node - - Min Peer Sync Lag: The minimum synchronization gap between different DataRegions and different replicas of node IoT Consumption - - Sync Speed Diff Of Peers: The maximum difference in synchronization from different DataRegions to different replicas for node IoT Consumption - - IoTConsensus LogEntriesFromWAL Rate: The rate at which nodes IoT Consumus obtain logs from WAL for different DataRegions - - IoTConsensus LogEntriesFromQueue Rate: The rate at which nodes IoT Consumes different DataRegions retrieve logs from the queue -- Different Execution Stages Take Time - - The Time Consumed Of Different Stages (avg): The average time spent on different execution stages of node IoT Consumus - - The Time Consumed Of Different Stages (50%): The median time spent on different execution stages of node IoT Consusus - - The Time Consumed Of Different Stages (99%):P99 of the time consumption for different execution stages of node IoT Consusus - -#### Consensus - DataRegion Ratis Consensus - -- Ratis Stage Time: The time consumption of different stages of node Ratis -- Write Log Entry: The time consumption of writing logs at different stages of node Ratis -- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely -- Remote / Local Write QPS: QPS written by node Ratis locally or remotely -- RatisConsensus Memory:Memory usage of node Ratis - -#### Consensus - SchemaRegion Ratis Consensus - -- Ratis Stage Time: The time consumption of different stages of node Ratis -- Write Log Entry: The time consumption for writing logs at each stage of node Ratis -- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotelyThe time it takes for 
node Ratis to write locally or remotely -- Remote / Local Write QPS: QPS written by node Ratis locally or remotely -- RatisConsensus Memory: Node Ratis Memory Usage \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md deleted file mode 100644 index 39605d31a..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md +++ /dev/null @@ -1,178 +0,0 @@ - -# Stand-Alone Deployment - -## Matters Needing Attention - -1. Before installation, ensure that the system is complete by referring to [System configuration](./Environment-Requirements.md). - -2. It is recommended to prioritize using 'hostname' for IP configuration during deployment, which can avoid the problem of modifying the host IP in the later stage and causing the database to fail to start. To set the host name, you need to configure/etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name and configure IoTDB's' cn_internal-address' using the host name dn_internal_address、dn_rpc_address. - - ``` Shell - echo "192.168.1.3 iotdb-1" >> /etc/hosts - ``` - -3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings. - -4. Whether in linux or windows, ensure that the IoTDB installation path does not contain Spaces and Chinese characters to avoid software exceptions. - -5. Please note that when installing and deploying IoTDB, it is necessary to use the same user for operations. You can: -- Using root user (recommended): Using root user can avoid issues such as permissions. -- Using a fixed non root user: - - Using the same user operation: Ensure that the same user is used for start, stop and other operations, and do not switch users. - - Avoid using sudo: Try to avoid using sudo commands as they execute commands with root privileges, which may cause confusion or security issues. 
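For the fixed non-root user option in point 5 above, the sketch below shows one way to prepare such a user before unzipping the package. The user name `iotdb` and the installation path `/data/iotdb` are assumptions, so adapt them to your environment:

```Shell
# Create a dedicated user, give it ownership of the installation directory,
# and switch to it for all subsequent install/start/stop operations
useradd -m iotdb
chown -R iotdb:iotdb /data/iotdb
su - iotdb
```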
- -## Installation Steps - -### Unzip the installation package and enter the installation directory - -```Shell -unzip apache-iotdb-{version}-all-bin.zip -cd apache-iotdb-{version}-all-bin -``` - -### Parameter Configuration - -#### Environment Script Configuration - -- ./conf/confignode-env.sh (./conf/confignode-env.bat) configuration - -| **Configuration** | **Description** | **Default** | **Recommended value** | Note | -| :---------------: | :----------------------------------------------------------: | :---------: | :----------------------------------------------------------: | :---------------------------------: | -| MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | - -- ./conf/datanode-env.sh (./conf/datanode-env.bat) configuration - -| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | -| :---------: | :----------------------------------: | :--------: | :----------------------------------------------: | :----------: | -| MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | - -#### System General Configuration - -Open the general configuration file (./conf/iotdb-common. properties file) and set the following parameters: - -| **Configuration** | **Description** | **Default** | **Recommended value** | Note | -| :-----------------------: | :----------------------------------------------------------: | :------------: | :----------------------------------------------------------: | :---------------------------------------------------: | -| cluster_name | Cluster Name | defaultCluster | The cluster name can be set as needed, and if there are no special needs, the default can be kept | Cannot be modified after initial startup | -| schema_replication_factor | Number of metadata replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | -| data_replication_factor | Number of data replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | - -#### ConfigNode Configuration - -Open the ConfigNode configuration file (./conf/iotdb-confignode. 
properties file) and set the following parameters: - -| **Configuration** | **Description** | **Default** | **Recommended value** | Note | -| :-----------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------------------: | :--------------------------------------: | -| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | -| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | Cannot be modified after initial startup | -| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | Cannot be modified after initial startup | -| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup | - -#### DataNode Configuration - -Open the DataNode configuration file (./conf/iotdb-datanode.properties file) and set the following parameters: - -| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | -| :-----------------------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------------------: | :--------------------------------------- | -| dn_rpc_address | The address of the client RPC service | 0.0.0.0 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Restarting the service takes effect | -| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | Restarting the service takes effect | -| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | -| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | Cannot be modified after initial startup | -| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | Cannot be modified after initial startup | -| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | Cannot be modified after initial startup | -| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | Cannot be modified after initial startup | -| dn_seed_config_node | The ConfigNode address that the node connects to when registering to join the cluster, i.e. cn_internal-address: cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup | - -> ❗️Attention: Editors such as VSCode Remote do not have automatic configuration saving function. 
Please ensure that the modified files are saved persistently, otherwise the configuration items will not take effect.
-
-### Start ConfigNode
-
-Enter the sbin directory of IoTDB and start the ConfigNode:
-
-```Shell
-./start-confignode.sh -d # The "-d" parameter starts the service in the background
-```
-If the startup fails, please refer to [Common Questions](#common-questions).
-
-### Start DataNode
-
-Enter the sbin directory of IoTDB and start the DataNode:
-
-```Shell
-cd sbin
-./start-datanode.sh -d # The "-d" parameter starts the service in the background
-```
-
-### Verify Deployment
-
-You can directly run the CLI startup script in the sbin directory:
-
-```Shell
-./start-cli.sh -h ip(local IP or domain name) -p port(6667)
-```
-
-After successful startup, the following interface will appear, indicating that IoTDB has been installed successfully.
-
-![](/img/%E5%BC%80%E6%BA%90%E7%89%88%E5%90%AF%E5%8A%A8%E6%88%90%E5%8A%9F.png)
-
-After the successful installation interface appears, use the `show cluster` command to check the service running status.
-
-When the status of all nodes is Running, the service has started successfully.
-
-![](/img/%E5%BC%80%E6%BA%90-%E5%8D%95%E6%9C%BAshow.jpeg)
-
-> The appearance of 'Activated (W)' indicates passive activation, meaning that this ConfigNode does not have a license file (or the latest license file with a timestamp has not been issued). In this case, check whether the license file has been placed in the license folder; if not, please place it there. If a license file already exists, the license file of this node may be inconsistent with the information of other nodes. Please contact Timecho staff to reapply.
-
-## Common Questions
-
-1. ConfigNode failed to start
-
- Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified.
-
- Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions.
-
- Step 3: If it is the first deployment or the data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart.
-
- Step 4: Clean up the environment:
-
- a. Terminate all ConfigNode and DataNode processes.
- ```Bash
- # 1. Stop the ConfigNode and DataNode services
- sbin/stop-standalone.sh
-
- # 2. Check for any remaining processes
- jps
- # Or
- ps -ef|grep iotdb
-
- # 3. If there are any remaining processes, manually kill them
- kill -9 <pid>
- # If you are sure there is only one IoTDB instance on the machine, you can use the following command to clean up residual processes
- ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9
- ```
- b. Delete the data and logs directories.
-
- Explanation: Deleting the data directory is necessary; deleting the logs directory only cleans up the logs and is optional. 
- - ```Bash - cd /data/iotdb - rm -rf data logs - ``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md deleted file mode 100644 index 0ef620e96..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md +++ /dev/null @@ -1,221 +0,0 @@ - -# Stand-Alone Deployment - -This chapter will introduce how to start an IoTDB standalone instance, which includes 1 ConfigNode and 1 DataNode (commonly known as 1C1D). - -## Matters Needing Attention - -1. Before installation, ensure that the system is complete by referring to [System configuration](./Environment-Requirements.md). - -2. It is recommended to prioritize using 'hostname' for IP configuration during deployment, which can avoid the problem of modifying the host IP in the later stage and causing the database to fail to start. To set the host name, you need to configure/etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name and configure IoTDB's' cn_internal-address' using the host name dn_internal_address、dn_rpc_address。 - - ```shell - echo "192.168.1.3 iotdb-1" >> /etc/hosts - ``` - -3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings - -4. Whether in linux or windows, ensure that the IoTDB installation path does not contain Spaces and Chinese characters to avoid software exceptions. - -5. Please note that when installing and deploying IoTDB (including activating and using software), it is necessary to use the same user for operations. You can: -- Using root user (recommended): Using root user can avoid issues such as permissions. -- Using a fixed non root user: - - Using the same user operation: Ensure that the same user is used for start, activation, stop, and other operations, and do not switch users. - - Avoid using sudo: Try to avoid using sudo commands as they execute commands with root privileges, which may cause confusion or security issues. - -6. It is recommended to deploy a monitoring panel, which can monitor important operational indicators and keep track of database operation status at any time. The monitoring panel can be obtained by contacting the business department, and the steps for deploying the monitoring panel can be referred to:[Monitoring Board Install and Deploy](./Monitoring-panel-deployment.md). 
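After adding the mapping from point 2 above, it is worth confirming that the host name resolves before it is written into `cn_internal_address`, `dn_internal_address`, and `dn_rpc_address`. A quick check, assuming the example host name iotdb-1:

```shell
getent hosts iotdb-1   # expected output: 192.168.1.3   iotdb-1
```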
- -## Installation Steps - -### Unzip the installation package and enter the installation directory - -```shell -unzip iotdb-enterprise-{version}-bin.zip -cd iotdb-enterprise-{version}-bin -``` - -### Parameter Configuration - -#### Environment Script Configuration - -- ./conf/confignode-env.sh (./conf/confignode-env.bat) configuration - -| **Configuration** | **Description** | **Default** | **Recommended value** | Note | -| :---------------: | :----------------------------------------------------------: | :---------: | :----------------------------------------------------------: | :---------------------------------: | -| MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | - - -- ./conf/datanode-env.sh (./conf/datanode-env.bat) configuration - -| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | -| :---------: | :----------------------------------: | :--------: | :----------------------------------------------: | :----------: | -| MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | - - -#### System General Configuration - -Open the general configuration file (./conf/iotdb-common. properties file) and set the following parameters: - -| **Configuration** | **Description** | **Default** | **Recommended value** | Note | -| :-----------------------: | :----------------------------------------------------------: | :------------: | :----------------------------------------------------------: | :---------------------------------------------------: | -| cluster_name | Cluster Name | defaultCluster | The cluster name can be set as needed, and if there are no special needs, the default can be kept | Cannot be modified after initial startup | -| schema_replication_factor | Number of metadata replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | -| data_replication_factor | Number of data replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | - -#### ConfigNode Configuration - -Open the ConfigNode configuration file (./conf/iotdb-confignode. 
properties file) and set the following parameters: - -| **Configuration** | **Description** | **Default** | **Recommended value** | Note | -| :-----------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------------------: | :--------------------------------------: | -| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | -| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | Cannot be modified after initial startup | -| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | Cannot be modified after initial startup | -| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup | - -#### DataNode Configuration - -Open the DataNode configuration file (./conf/iotdb-datanode.properties file) and set the following parameters: - -| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | -| :------------------------------ | :----------------------------------------------------------- | :-------------- | :----------------------------------------------------------- | :--------------------------------------- | -| dn_rpc_address | The address of the client RPC service | 0.0.0.0 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Restarting the service takes effect | -| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | Restarting the service takes effect | -| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | -| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | Cannot be modified after initial startup | -| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | Cannot be modified after initial startup | -| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | Cannot be modified after initial startup | -| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | Cannot be modified after initial startup | -| dn_seed_config_node | The ConfigNode address that the node connects to when registering to join the cluster, i.e. cn_internal-address: cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup | - -> ❗️Attention: Editors such as VSCode Remote do not have automatic configuration saving function. 
-
-> ❗️Attention: Editors such as VSCode Remote do not save modified files automatically. Please make sure your changes are persisted to disk, otherwise the configuration items will not take effect.
-
-### Start ConfigNode
-
-Enter the sbin directory of IoTDB and start the ConfigNode:
-
-```shell
-./start-confignode.sh -d   # the -d option starts the process in the background
-```
-
-If the startup fails, please refer to [Common Questions](#common-questions).
-
-### Activate Database
-
-#### Method 1: Activation by copying the license file
-
-- After starting the ConfigNode, enter the activation folder and send the system_info file to the Timecho staff;
-- Receive the license file returned by the staff;
-- Place the license file in the activation folder of the corresponding node.
-
-#### Method 2: Activation by script
-
-- Obtain the machine code required for activation: enter the sbin directory of the installation directory and execute the activation script:
-
-```shell
-cd sbin
-./start-activate.sh
-```
-
-- The following information is displayed. Please copy the machine code (i.e. the string of characters) and send it to the Timecho staff:
-
-```shell
-Please copy the system_info's content and send it to Timecho:
-Y17hFA0xRCE1TmkVxILuCIEPc7uJcr5bzlXWiptw8uZTmTX5aThfypQdLUIhMljw075hNRSicyvyJR9JM7QaNm1gcFZPHVRWVXIiY5IlZkXdxCVc1erXMsbCqUYsR2R2Mw4PSpFJsUF5jHWSoFIIjQ2bmJFW5P52KCccFMVeHTc=
-Please enter license:
-```
-
-- Enter the activation code returned by the staff at the prompt 'Please enter license:', as shown below:
-
-```shell
-Please enter license:
-Jw+MmF+AtexsfgNGOFgTm83Bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxm6pF+APW1CiXLTSijK9Qh3nsLgzrW8OJPh26Vl6ljKUpCvpTiw==
-License has been stored to sbin/../activation/license
-Import completed. Please start cluster and excute 'show cluster' to verify activation status
-```
-
-### Start DataNode
-
-Enter the sbin directory of IoTDB and start the DataNode:
-
-```shell
-cd sbin
-./start-datanode.sh -d   # the -d option starts the process in the background
-```
-
-### Verify Deployment
-
-You can directly execute the Cli startup script in the sbin directory:
-
-```shell
-./start-cli.sh -h ip(local IP or domain name) -p port(6667)
-```
-
-After a successful start, the following interface appears, indicating that IoTDB was installed successfully.
-
-![](/img/%E5%90%AF%E5%8A%A8%E6%88%90%E5%8A%9F.png)
-
-After the installation success interface appears, continue to check whether activation succeeded by running the `show cluster` command.
-
-When "Activated" is displayed on the far right, activation was successful.
-
-![](/img/show%20cluster.png)
-
-> The appearance of 'Activated (W)' indicates passive activation, meaning that this ConfigNode does not have a license file (or has not been issued the latest license file with a timestamp). In this case, check whether the license file has been placed in the license folder. If not, place it there. If a license file already exists, it may be inconsistent with the information of other nodes; please contact Timecho staff to reapply.
-
-## Common Questions
-
-1. Multiple prompts indicating activation failure during the deployment process
-
-    - Use the `ls -al` command to check whether the owner of the installation package root directory is the current user.
-    - Check all files in the `./activation` directory and whether their owner is the current user.
-
-2. 
Confignode failed to start - - Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. - - Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions. - - Step 3: If it is the first deployment or data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart. - - Step 4: Clean up the environment: - - a. Terminate all ConfigNode Node and DataNode processes. - ```Bash - # 1. Stop the ConfigNode and DataNode services - sbin/stop-standalone.sh - - # 2. Check for any remaining processes - jps - # Or - ps -ef|gerp iotdb - - # 3. If there are any remaining processes, manually kill the - kill -9 - # If you are sure there is only one iotdb on the machine, you can use the following command to clean up residual processes - ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 - ``` - b. Delete the data and logs directories. - - Explanation: Deleting the data directory is necessary, deleting the logs directory is for clean logs and is not mandatory. - - ```Bash - cd /data/iotdb - rm -rf data logs - ``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/workbench-deployment_timecho.md b/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/workbench-deployment_timecho.md deleted file mode 100644 index d18d4bbd0..000000000 --- a/src/UserGuide/V1.3.0-2/Deployment-and-Maintenance/workbench-deployment_timecho.md +++ /dev/null @@ -1,251 +0,0 @@ - -# Workbench Deployment - -The visualization console is one of the supporting tools for IoTDB (similar to Navicat for MySQL). It is an official application tool system used for database deployment implementation, operation and maintenance management, and application development stages, making the use, operation, and management of databases simpler and more efficient, truly achieving low-cost management and operation of databases. This document will assist you in installing Workbench. - -
-  -  -
- - -## Installation Preparation - -| Preparation Content | Name | Version Requirements | Link | -| :----------------------: | :-------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | -| Operating System | Windows or Linux | - | - | -| Installation Environment | JDK | Need>=V1.8.0_162 (recommended to use 11 or 17, please choose ARM or x64 installation package according to machine configuration when downloading) | https://www.oracle.com/java/technologies/downloads/ | -| Related Software | Prometheus | Requires installation of V2.30.3 and above. | https://prometheus.io/download/ | -| Database | IoTDB | Requires V1.2.0 Enterprise Edition and above | You can contact business or technical support to obtain | -| Console | IoTDB-Workbench-`` | - | You can choose according to the appendix version comparison table and contact business or technical support to obtain it | - -## Installation Steps - -### Step 1: IoTDB enables monitoring indicator collection - -1. Open the monitoring configuration item. The configuration items related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to open the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ConfigurationLocated in the configuration fileDescription
cn_metric_reporter_listconf/iotdb-confignode.propertiesUncomment the configuration item and set the value to PROMETHEUS
cn_metric_levelUncomment the configuration item and set the value to IMPORTANT
cn_metric_prometheus_reporter_portUncomment the configuration item to maintain the default setting of 9091. If other ports are set, they will not conflict with each other
dn_metric_reporter_listconf/iotdb-datanode.propertiesUncomment the configuration item and set the value to PROMETHEUS
dn_metric_levelUncomment the configuration item and set the value to IMPORTANT
dn_metric_prometheus_reporter_portUncomment the configuration item and set it to 9092 by default. If other ports are set, they will not conflict with each other
dn_metric_internal_reporter_typeUncomment the configuration item and set the value to IOTDB
enable_audit_logconf/iotdb-common.propertiesUncomment the configuration item and set the value to true
audit_log_storageUncomment configuration items
audit_log_operationUncomment configuration items
- - -2. Restart all nodes. After modifying the monitoring indicator configuration of three nodes, the confignode and datanode services of all nodes can be restarted: - - ```shell - ./sbin/stop-standalone.sh #Stop confignode and datanode first - ./sbin/start-confignode.sh -d #Start confignode - ./sbin/start-datanode.sh -d #Start datanode - ``` - -3. After restarting, confirm the running status of each node through the client. If the status is Running, it indicates successful configuration: - - ![](/img/%E5%90%AF%E5%8A%A8.png) - -### Step 2: Install and configure Prometheus - -1. Download the Prometheus installation package, which requires installation of V2.30.3 and above. You can go to the Prometheus official website to download it (https://prometheus.io/docs/introduction/first_steps/) -2. Unzip the installation package and enter the unzipped folder: - - ```Shell - tar xvfz prometheus-*.tar.gz - cd prometheus-* - ``` - -3. Modify the configuration. Modify the configuration file prometheus.yml as follows - 1. Add configNode task to collect monitoring data for ConfigNode - 2. Add a datanode task to collect monitoring data for DataNodes - - ```shell - global: - scrape_interval: 15s - evaluation_interval: 15s - scrape_configs: - - job_name: "prometheus" - static_configs: - - targets: ["localhost:9090"] - - job_name: "confignode" - static_configs: - - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] - honor_labels: true - - job_name: "datanode" - static_configs: - - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] - honor_labels: true - ``` - -4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows: - - ```Shell - ./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d - ``` - -5. Confirm successful startup. Enter in browser `http://IP:port` Go to Prometheus and click on the Target interface under Status. When you see that all States are Up, it indicates successful configuration and connectivity. - -
- - -
- - -### Step 3: Install Workbench - -#### Windows: - -1. Enter the config directory of iotdb Workbench -`` - -2. Modify Workbench configuration file: Go to the `config` folder and modify the configuration file `application-prod.properties`. If you are installing it locally, there is no need to modify it. If you are deploying it on a server, you need to modify the IP address - > Workbench can be deployed on a local or cloud server as long as it can connect to IoTDB - - | Configuration | Before Modification | After modification | - | ---------------- | ----------------------------------- | ----------------------------------------------- | - | pipe.callbackUrl | pipe.callbackUrl=`http://127.0.0.1` | pipe.callbackUrl=`http://` | - - ![](/img/workbench-conf-1.png) - -3. Startup program: Please execute the startup command in the sbin folder of IoTDB Workbench -`` - - ```shell - # Start Workbench in the background - start.bat -d - ``` - -4. You can use the `jps` command to check if the startup was successful, as shown in the figure: - - ![](/img/windows-jps.png) - -5. Verification successful: Open "`http://Server IP: Port in configuration file`" in the browser to access, for example:"`http://127.0.0.1:9190`" When the login interface appears, it is considered successful - - ![](/img/workbench-en.png) - -#### Linux: - -1. Enter the IoTDB Workbench -`` directory - -2. Modify Workbench configuration: Go to the `config` folder and modify the configuration file `application-prod.properties`. If you are installing it locally, there is no need to modify it. If you are deploying it on a server, you need to modify the IP address - > Workbench can be deployed on a local or cloud server as long as it can connect to IoTDB - - | Configuration | Before Modification | After modification | - | ---------------- | ----------------------------------- | ----------------------------------------------- | - | pipe.callbackUrl | pipe.callbackUrl=`http://127.0.0.1` | pipe.callbackUrl=`http://` | - - ![](/img/workbench-conf-1.png) - -3. Startup program: Please execute the startup command in the sbin folder of IoTDB Workbench -`` - - ```shell - # Start Workbench in the background - ./start.sh -d - ``` - -4. You can use the `jps` command to check if the startup was successful, as shown in the figure: - - ![](/img/linux-jps.png) - -5. Verification successful: Open "`http://Server IP: Port in configuration file`" in the browser to access, for example:"`http://127.0.0.1:9190`" When the login interface appears, it is considered successful - - ![](/img/workbench-en.png) - -### Step 4: Configure Instance Information - -1. Configure instance information: You only need to fill in the following information to connect to the instance - - ![](/img/workbench-en-1.jpeg) - - - | Field Name | Is It A Required Field | Field Meaning | Default Value | - | --------------- | ---------------------- | ------------------------------------------------------------ | ------ | - | Connection Type | | The content filled in for different connection types varies, and supports selecting "single machine, cluster, dual active" | - | - | Instance Name | Yes | You can distinguish different instances based on their names, with a maximum input of 50 characters | - | - | Instance | Yes | Fill in the database address (`dn_rpc_address` field in the `iotdb/conf/iotdb-datanode.properties` file) and port number (`dn_rpc_port` field). 
Note: For clusters and dual active devices, clicking the "+" button supports entering multiple instance information | - | - | Prometheus | No | Fill in `http://:/app/v1/query` to view some monitoring information on the homepage. We recommend that you configure and use it | - | - | Username | Yes | Fill in the username for IoTDB, supporting input of 4 to 32 characters, including uppercase and lowercase letters, numbers, and special characters (! @ # $% ^&* () _+-=) | root | - | Enter Password | No | Fill in the password for IoTDB. To ensure the security of the database, we will not save the password. Please fill in the password yourself every time you connect to the instance or test | root | - -2. Test the accuracy of the information filled in: You can perform a connection test on the instance information by clicking the "Test" button - - ![](/img/workbench-en-2.png) - -## Appendix: IoTDB and Workbench Version Comparison Table - -| Workbench Version Number | Release Note | Supports IoTDB Versions | -| :------------------------: | :------------------------------------------------------------: | :-------------------------: | -| V1.5.1 | Add AI analysis function and pattern matching function | V1.3.2 and above versions | -| V1.4.0 | New tree model display and internationalization | V1.3.2 and above versions | -| V1.3.1 |New analysis methods have been added to the analysis function, and functions such as optimizing import templates have been optimized |V1.3.2 and above versions | -| V1.3.0 | Add database configuration function |V1.3.2 and above versions | -| V1.2.6 | Optimize the permission control function of each module | V1.3.1 and above versions | -| V1.2.5 | The visualization function has added the concept of "commonly used templates", and all interface optimization and page caching functions have been supplemented | V1.3.0 and above versions | -| V1.2.4 | The calculation function has added the "import and export" function, and the measurement point list has added the "time alignment" field | V1.2.2 and above versions | -| V1.2.3 | New "activation details" and analysis functions added to the homepage | V1.2.2 and above versions | -| V1.2.2 | Optimize the display content and other functions of "measurement point description" | V1.2.2 and above versions | -| V1.2.1 | New "Monitoring Panel" added to the data synchronization interface to optimize Prometheus prompt information | V1.2.2 and above versions | -| V1.2.0 | New Workbench version upgrade | V1.2.0 and above versions | \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/DBeaver.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/DBeaver.md deleted file mode 100644 index cd28d1b38..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/DBeaver.md +++ /dev/null @@ -1,92 +0,0 @@ - - -# DBeaver - -DBeaver is a SQL client software application and a database administration tool. It can use the JDBC application programming interface (API) to interact with IoTDB via the JDBC driver. - -## DBeaver Installation - -* From DBeaver site: https://dbeaver.io/download/ - -## IoTDB Installation - -* Download binary version - * From IoTDB site: https://iotdb.apache.org/Download/ - * Version >= 0.13.0 -* Or compile from source code - * See https://github.com/apache/iotdb - -## Connect IoTDB and DBeaver - -1. Start IoTDB server - - ```shell - ./sbin/start-server.sh - ``` -2. Start DBeaver -3. Open Driver Manager - - ![](/img/UserGuide/Ecosystem-Integration/DBeaver/01.png) - -4. 
Create a new driver type for IoTDB - - ![](/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) - -5. Download `iotdb-jdbc`, from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/),choose the corresponding jar file,download the suffix `jar-with-dependencies.jar` file. - ![](/img/20230920-192746.jpg) - -6. Add the downloaded jar file, then select `Find Class`. - - ![](/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) - -7. Edit the driver Settings - - ![](/img/UserGuide/Ecosystem-Integration/DBeaver/05.png) - - ``` - Driver Name: IoTDB - Driver Type: Generic - URL Template: jdbc:iotdb://{host}:{port}/ - Default Port: 6667 - Default User: root - ``` - -8. Open New DataBase Connection and select iotdb - - ![](/img/UserGuide/Ecosystem-Integration/DBeaver/06.png) - -9. Edit JDBC Connection Settings - - ``` - JDBC URL: jdbc:iotdb://127.0.0.1:6667/ - Username: root - Password: root - ``` - ![](/img/UserGuide/Ecosystem-Integration/DBeaver/07.png) - -10. Test Connection - - ![](/img/UserGuide/Ecosystem-Integration/DBeaver/08.png) - -11. Enjoy IoTDB with DBeaver - - ![](/img/UserGuide/Ecosystem-Integration/DBeaver/09.png) diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/DataEase.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/DataEase.md deleted file mode 100644 index 021ed7d68..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/DataEase.md +++ /dev/null @@ -1,228 +0,0 @@ - -# DataEase - -## Product Overview - -1. Introduction to DataEase - - DataEase is an open-source data visualization and analysis tool that provides a drag-and-drop interface, allowing users to easily create charts and dashboards. It supports multiple data sources such as MySQL, SQL Server, Hive, ClickHouse, and DM, and can be integrated into other applications. This tool helps users quickly gain insights from their data and make informed decisions. For more detailed information, please refer to [DataEase official website](https://www.fit2cloud.com/dataease/index.html) - -
- -
- -2. Introduction to the DataEase-IoTDB Connector - - IoTDB can be efficiently integrated with DataEase through API data sources, and IoTDB data can be accessed through the Session interface using API data source plugins. This plugin supports customized data processing functions, providing users with greater flexibility and more diverse data operation options. -
- -
- -## Installation Requirements - -| **Preparation Content** | **Version Requirements** | -| :-------------------- | :----------------------------------------------------------- | -| IoTDB | Version not required, please refer to [Deployment Guidance](https://www.timecho-global.com/docs/UserGuide/latest/Deployment-and-Maintenance/IoTDB-Package_timecho.html) | -| JDK | Requires JDK 11 or higher (JDK 17 or above is recommended for optimal performance) | -| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported. For integration with other versions, please contact Timecho) | -| DataEase-IoTDB Connector | Please contact Timecho for assistance | - -## Installation Steps - -Step 1: Please contact Timecho to obtain the file and unzip the installation package `iotdb-api-source-1.0.0.zip` - -Step 2: After extracting the files, modify the `application.properties` configuration file in the `config` folder - -- `server.port` can be modified as needed. -- `iotdb.nodeUrls` should be configured with the address and port of the IoTDB instance to be connected. -- `iotdb.user` should be set to the IoTDB username. -- `iotdb.password` should be set to the IoTDB password. - -```Properties -# Port on which the IoTDB API Source listens -server.port=8097 -# IoTDB instance addresses, multiple nodeUrls separated by ; -iotdb.nodeUrls=127.0.0.1:6667 -# IoTDB username -iotdb.user=root -# IoTDB password -iotdb.password=root -``` - -Step 3: Start up DataEase-IoTDB Connector - -- Foreground start - -```Shell -./sbin/start.sh -``` - -- Background start (add - d parameter) - -```Shell -./sbin/start.sh -d -``` - -Step 4: After startup, you can check whether the startup was successful through the log. - -```Shell - lsof -i:8097 // The port configured in the file where the IoTDB API Source listens -``` - -## Instructions - -### Sign in DataEase - -1. Sign in DataEase,access address: `http://[target server IP address]:80` -
- -
- -### Configure data source - -1. Navigate to "Data Source". -
- -
- -2. Click on the "+" on the top left corner, choose "API" at the bottom as data source. -
- -
- -3. Set the "Display Name" and add the API Data Source. -
- -
- -4. Set the name of the Dataset Table, select "Post" as the Request Type, fill in the address with `http://[IoTDB API Source]:[port]/getData>`. If operating on the local machine and using the default port, the address should be set to `http://127.0.0.1:8097/getData`. -
- -
- -5. In the "Request parameters"-"Request Body" configuration, set the format as "JSON". Please fill in the parameters according to the following example: - - timeseries:The full path of the series to be queried (currently only one series can be queried). - - limit:The number of entries to query (valid range is greater than 0 and less than 100,000). - - ```JSON - { - "timeseries": "root.ln.wf03.wt03.speed", - "limit": 1000 - } - ``` -
- -
- -6. In the "Request parameters"-"Request Body" configuration, set "Basic Auth" as the verification method, and enter the IoTDB username and password. -
- -
- -7. In the next step, results are returned in the "data" section. For example, it returns `time`, `rownumber` and `value` as shown in the interface below. The date type for each field also need to be specified. After completing the settings, click the "Save" button in the bottom. -
- -
- -8. Save the settings to complete creating new API data source. -
- -
- -9. You can now view and edit the data source and its detailed information under "API"-"Data Source". -
- -
- -### Configure the Dataset - -1. Create API dataset: Navigate to "Data Set",click on the "+" on the top left corner, select "API dataset" and choose the directory where this dataset is located to enter the New API Dataset interface. -
- - -
- -2. Select the newly created API data source and the corresponding dataset table, then define the DataSet Name. Save the settings to complete the creation of the dataset. -
- - -
- -3. Select the newly created dataset and navigate to "Field Manage", check the required fields (such as `rowNum`) and convert them to dimensions. -
- -
- -4. Configure update frequency: Click on "Add Task" under "Update info" tag and set the following information: - - - Task Name: Define the task name - - - Update method: Select "Full update" - - - Execution frequency: Set according to the actual situation. Considering the data retrieval speed of DataEase, it is recommended to set the update frequency to more than 5 seconds. For example, to set the update frequency to every 5 seconds, select "Expression setting" and configure the cron expression as `0/5****?*`. - Click on "Confirm" to save the settings. -
- -
- -5. The task is now successfully added. You can click "Execution record" to view the logs. -
- -
- -### Configure Dashboard - -1. Navigate to "Dashboard", click on "+" to create a directory, then click on "+" of the directory and select "Create Dashboard". -
- -
- -2. After setting up as needed, click on "Confirm". We will taking "Custom" setting as an example. -
- -
- -3. In the new dashboard interface, click on "Chart" to open a pop-up window for adding views. Select the previously created dataset and click on "Next". -
- -
- -4. Choose a chart type by need and define the chart title. We take "Base Line" as an example. Confirm to proceed. -
- -
- -5. In the chart configuration interface, drag and drop the `rowNum` field to the category axis (usually the X-axis) and the `value` field to the value axis (usually the Y-axis). -
- -
- -6. In the chart's category axis settings, set the sorting order to ascending, so that the data will be displayed in increasing order. Set the data refresh frequency to determine the frequency of chart updates. After completing these settings, you can further adjust other format and style options for the chart, such as color, size, etc., to meet display needs. Once adjustments are made, click the "Save" button to save the chart configuration. ->Since DataEase may cause the API data, originally returned in ascending order, to become disordered after automatically updating the dataset, it is necessary to manually specify the sorting order in the chart configuration. -
- -
- -7. After exiting the editing mode, you will be able to see the corresponding chart. -
- -
\ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Flink-IoTDB.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Flink-IoTDB.md deleted file mode 100644 index 676604775..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Flink-IoTDB.md +++ /dev/null @@ -1,215 +0,0 @@ - - -# Apache Flink(IoTDB) - -IoTDB integration for [Apache Flink](https://flink.apache.org/). This module includes the IoTDB sink that allows a flink job to write events into timeseries, and the IoTDB source allowing reading data from IoTDB. - -## IoTDBSink - -To use the `IoTDBSink`, you need construct an instance of it by specifying `IoTDBSinkOptions` and `IoTSerializationSchema` instances. -The `IoTDBSink` send only one event after another by default, but you can change to batch by invoking `withBatchSize(int)`. - -### Example - -This example shows a case that sends data to a IoTDB server from a Flink job: - -- A simulated Source `SensorSource` generates data points per 1 second. -- Flink uses `IoTDBSink` to consume the generated data points and write the data into IoTDB. - -It is noteworthy that to use IoTDBSink, schema auto-creation in IoTDB should be enabled. - -```java -import org.apache.iotdb.flink.options.IoTDBSinkOptions; -import org.apache.iotdb.tsfile.file.metadata.enums.CompressionType; -import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; -import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding; - -import com.google.common.collect.Lists; -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; -import org.apache.flink.streaming.api.functions.source.SourceFunction; - -import java.security.SecureRandom; -import java.util.HashMap; -import java.util.Map; -import java.util.Random; - -public class FlinkIoTDBSink { - public static void main(String[] args) throws Exception { - // run the flink job on local mini cluster - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - - IoTDBSinkOptions options = new IoTDBSinkOptions(); - options.setHost("127.0.0.1"); - options.setPort(6667); - options.setUser("root"); - options.setPassword("root"); - - // If the server enables auto_create_schema, then we do not need to register all timeseries - // here. 
- options.setTimeseriesOptionList( - Lists.newArrayList( - new IoTDBSinkOptions.TimeseriesOption( - "root.sg.d1.s1", TSDataType.DOUBLE, TSEncoding.GORILLA, CompressionType.SNAPPY))); - - IoTSerializationSchema serializationSchema = new DefaultIoTSerializationSchema(); - IoTDBSink ioTDBSink = - new IoTDBSink(options, serializationSchema) - // enable batching - .withBatchSize(10) - // how many connections to the server will be created for each parallelism - .withSessionPoolSize(3); - - env.addSource(new SensorSource()) - .name("sensor-source") - .setParallelism(1) - .addSink(ioTDBSink) - .name("iotdb-sink"); - - env.execute("iotdb-flink-example"); - } - - private static class SensorSource implements SourceFunction> { - boolean running = true; - Random random = new SecureRandom(); - - @Override - public void run(SourceContext context) throws Exception { - while (running) { - Map tuple = new HashMap(); - tuple.put("device", "root.sg.d1"); - tuple.put("timestamp", String.valueOf(System.currentTimeMillis())); - tuple.put("measurements", "s1"); - tuple.put("types", "DOUBLE"); - tuple.put("values", String.valueOf(random.nextDouble())); - - context.collect(tuple); - Thread.sleep(1000); - } - } - - @Override - public void cancel() { - running = false; - } - } -} - -``` - -### Usage - -* Launch the IoTDB server. -* Run `org.apache.iotdb.flink.FlinkIoTDBSink.java` to run the flink job on local mini cluster. - -## IoTDBSource -To use the `IoTDBSource`, you need to construct an instance of `IoTDBSource` by specifying `IoTDBSourceOptions` -and implementing the abstract method `convert()` in `IoTDBSource`. The `convert` methods defines how -you want the row data to be transformed. - -### Example -This example shows a case where data are read from IoTDB. -```java -import org.apache.iotdb.flink.options.IoTDBSourceOptions; -import org.apache.iotdb.rpc.IoTDBConnectionException; -import org.apache.iotdb.rpc.StatementExecutionException; -import org.apache.iotdb.rpc.TSStatusCode; -import org.apache.iotdb.session.Session; -import org.apache.iotdb.tsfile.file.metadata.enums.CompressionType; -import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; -import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding; -import org.apache.iotdb.tsfile.read.common.RowRecord; - -import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; - -import java.util.ArrayList; -import java.util.List; - -public class FlinkIoTDBSource { - - static final String LOCAL_HOST = "127.0.0.1"; - static final String ROOT_SG1_D1_S1 = "root.sg1.d1.s1"; - static final String ROOT_SG1_D1 = "root.sg1.d1"; - - public static void main(String[] args) throws Exception { - prepareData(); - - // run the flink job on local mini cluster - StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); - - IoTDBSourceOptions ioTDBSourceOptions = - new IoTDBSourceOptions("127.0.0.1", 6667, "root", "root", - "select s1 from " + ROOT_SG1_D1 + " align by device"); - - env.addSource( - new IoTDBSource(ioTDBSourceOptions) { - @Override - public RowRecord convert(RowRecord rowRecord) { - return rowRecord; - } - }) - .name("sensor-source") - .print() - .setParallelism(2); - env.execute(); - } - - /** - * Write some data to IoTDB - */ - private static void prepareData() throws IoTDBConnectionException, StatementExecutionException { - Session session = new Session(LOCAL_HOST, 6667, "root", "root"); - session.open(false); - try { - session.setStorageGroup("root.sg1"); - if (!session.checkTimeseriesExists(ROOT_SG1_D1_S1)) { 
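-        // Create the time series only when it does not already exist, so re-running
-        // the example does not fail with PATH_ALREADY_EXIST_ERROR.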
- session.createTimeseries( - ROOT_SG1_D1_S1, TSDataType.INT64, TSEncoding.RLE, CompressionType.SNAPPY); - List measurements = new ArrayList<>(); - List types = new ArrayList<>(); - measurements.add("s1"); - measurements.add("s2"); - measurements.add("s3"); - types.add(TSDataType.INT64); - types.add(TSDataType.INT64); - types.add(TSDataType.INT64); - - for (long time = 0; time < 100; time++) { - List values = new ArrayList<>(); - values.add(1L); - values.add(2L); - values.add(3L); - session.insertRecord(ROOT_SG1_D1, time, measurements, types, values); - } - } - } catch (StatementExecutionException e) { - if (e.getStatusCode() != TSStatusCode.PATH_ALREADY_EXIST_ERROR.getStatusCode()) { - throw e; - } - } - } -} -``` - -### Usage -Launch the IoTDB server. -Run org.apache.iotdb.flink.FlinkIoTDBSource.java to run the flink job on local mini cluster. - diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Flink-TsFile.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Flink-TsFile.md deleted file mode 100644 index e1ea626dd..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Flink-TsFile.md +++ /dev/null @@ -1,180 +0,0 @@ - - -# Apache Flink(TsFile) - -## About Flink-TsFile-Connector - -Flink-TsFile-Connector implements the support of Flink for external data sources of Tsfile type. -This enables users to read and write Tsfile by Flink via DataStream/DataSet API. - -With this connector, you can - -* load a single TsFile or multiple TsFiles(only for DataSet), from either the local file system or hdfs, into Flink -* load all files in a specific directory, from either the local file system or hdfs, into Flink - -## Quick Start - -### TsFileInputFormat Example - -1. create TsFileInputFormat with default RowRowRecordParser. - -```java -String[] filedNames = { - QueryConstant.RESERVED_TIME, - "device_1.sensor_1", - "device_1.sensor_2", - "device_1.sensor_3", - "device_2.sensor_1", - "device_2.sensor_2", - "device_2.sensor_3" -}; -TypeInformation[] typeInformations = new TypeInformation[] { - Types.LONG, - Types.FLOAT, - Types.INT, - Types.INT, - Types.FLOAT, - Types.INT, - Types.INT -}; -List paths = Arrays.stream(filedNames) - .filter(s -> !s.equals(QueryConstant.RESERVED_TIME)) - .map(Path::new) - .collect(Collectors.toList()); -RowTypeInfo rowTypeInfo = new RowTypeInfo(typeInformations, filedNames); -QueryExpression queryExpression = QueryExpression.create(paths, null); -RowRowRecordParser parser = RowRowRecordParser.create(rowTypeInfo, queryExpression.getSelectedSeries()); -TsFileInputFormat inputFormat = new TsFileInputFormat<>(queryExpression, parser); -``` - -2. Read data from the input format and print to stdout: - -DataStream: - -```java -StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment(); -inputFormat.setFilePath("source.tsfile"); -DataStream source = senv.createInput(inputFormat); -DataStream rowString = source.map(Row::toString); -Iterator result = DataStreamUtils.collect(rowString); -while (result.hasNext()) { - System.out.println(result.next()); -} -``` - -DataSet: - -```java -ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); -inputFormat.setFilePath("source.tsfile"); -DataSet source = env.createInput(inputFormat); -List result = source.map(Row::toString).collect(); -for (String s : result) { - System.out.println(s); -} -``` - -### Example of TSRecordOutputFormat - -1. create TSRecordOutputFormat with default RowTSRecordConverter. 
- -```java -String[] filedNames = { - QueryConstant.RESERVED_TIME, - "device_1.sensor_1", - "device_1.sensor_2", - "device_1.sensor_3", - "device_2.sensor_1", - "device_2.sensor_2", - "device_2.sensor_3" -}; -TypeInformation[] typeInformations = new TypeInformation[] { - Types.LONG, - Types.LONG, - Types.LONG, - Types.LONG, - Types.LONG, - Types.LONG, - Types.LONG -}; -RowTypeInfo rowTypeInfo = new RowTypeInfo(typeInformations, filedNames); -Schema schema = new Schema(); -schema.extendTemplate("template", new MeasurementSchema("sensor_1", TSDataType.INT64, TSEncoding.TS_2DIFF)); -schema.extendTemplate("template", new MeasurementSchema("sensor_2", TSDataType.INT64, TSEncoding.TS_2DIFF)); -schema.extendTemplate("template", new MeasurementSchema("sensor_3", TSDataType.INT64, TSEncoding.TS_2DIFF)); -RowTSRecordConverter converter = new RowTSRecordConverter(rowTypeInfo); -TSRecordOutputFormat outputFormat = new TSRecordOutputFormat<>(schema, converter); -``` - -2. write data via the output format: - -DataStream: - -```java -StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment(); -senv.setParallelism(1); -List data = new ArrayList<>(7); -data.add(new Tuple7(1L, 2L, 3L, 4L, 5L, 6L, 7L)); -data.add(new Tuple7(2L, 3L, 4L, 5L, 6L, 7L, 8L)); -data.add(new Tuple7(3L, 4L, 5L, 6L, 7L, 8L, 9L)); -data.add(new Tuple7(4L, 5L, 6L, 7L, 8L, 9L, 10L)); -data.add(new Tuple7(6L, 6L, 7L, 8L, 9L, 10L, 11L)); -data.add(new Tuple7(7L, 7L, 8L, 9L, 10L, 11L, 12L)); -data.add(new Tuple7(8L, 8L, 9L, 10L, 11L, 12L, 13L)); -outputFormat.setOutputFilePath(new org.apache.flink.core.fs.Path(path)); -DataStream source = senv.fromCollection( - data, Types.TUPLE(Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG)); -source.map(t -> { - Row row = new Row(7); - for (int i = 0; i < 7; i++) { - row.setField(i, t.getField(i)); - } - return row; -}).returns(rowTypeInfo).writeUsingOutputFormat(outputFormat); -senv.execute(); -``` - -DataSet: - -```java -ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); -env.setParallelism(1); -List data = new ArrayList<>(7); -data.add(new Tuple7(1L, 2L, 3L, 4L, 5L, 6L, 7L)); -data.add(new Tuple7(2L, 3L, 4L, 5L, 6L, 7L, 8L)); -data.add(new Tuple7(3L, 4L, 5L, 6L, 7L, 8L, 9L)); -data.add(new Tuple7(4L, 5L, 6L, 7L, 8L, 9L, 10L)); -data.add(new Tuple7(6L, 6L, 7L, 8L, 9L, 10L, 11L)); -data.add(new Tuple7(7L, 7L, 8L, 9L, 10L, 11L, 12L)); -data.add(new Tuple7(8L, 8L, 9L, 10L, 11L, 12L, 13L)); -DataSet source = env.fromCollection( - data, Types.TUPLE(Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG)); -source.map(t -> { - Row row = new Row(7); - for (int i = 0; i < 7; i++) { - row.setField(i, t.getField(i)); - } - return row; -}).returns(rowTypeInfo).write(outputFormat, path); -env.execute(); -``` - diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Grafana-Connector.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Grafana-Connector.md deleted file mode 100644 index 92fb176fa..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Grafana-Connector.md +++ /dev/null @@ -1,180 +0,0 @@ - - -# Grafana(IoTDB) - -Grafana is an open source volume metrics monitoring and visualization tool, which can be used to display time series data and application runtime analysis. Grafana supports Graphite, InfluxDB and other major time series databases as data sources. 
IoTDB-Grafana-Connector is a connector which we developed to show time series data in IoTDB by reading data from IoTDB and sends to Grafana(https://grafana.com/). Before using this tool, make sure Grafana and IoTDB are correctly installed and started. - -## Installation and deployment - -### Install Grafana - -* Download url: https://grafana.com/grafana/download -* Version >= 4.4.1 - -### Install data source plugin - -* Plugin name: simple-json-datasource -* Download url: https://github.com/grafana/simple-json-datasource - -After downloading this plugin, use the grafana-cli tool to install SimpleJson from the commandline: - -``` -grafana-cli plugins install grafana-simple-json-datasource -``` - -Alternatively, manually download the .zip file and unpack it into grafana plugins directory. - -* `{grafana-install-directory}\data\plugins\` (Windows) -* `/var/lib/grafana/plugins` (Linux) -* `/usr/local/var/lib/grafana/plugins`(Mac) - -Then you need to restart grafana server, then you can use browser to visit grafana. - -If you see "SimpleJson" in "Type" of "Add data source" pages, then it is install successfully. - -Or, if you meet following errors: - -``` -Unsigned plugins were found during plugin initialization. Grafana Labs cannot guarantee the integrity of these plugins. We recommend only using signed plugins. -The following plugins are disabled and not shown in the list below: -``` - -Please try to find config file of grafana(eg. customer.ini in windows, and /etc/grafana/grafana.ini in linux), then add following configuration: - -``` -allow_loading_unsigned_plugins = "grafana-simple-json-datasource" -``` - -### Start Grafana -If Unix is used, Grafana will start automatically after installing, or you can run `sudo service grafana-server start` command. See more information [here](http://docs.grafana.org/installation/debian/). - -If Mac and `homebrew` are used to install Grafana, you can use `homebrew` to start Grafana. -First make sure homebrew/services is installed by running `brew tap homebrew/services`, then start Grafana using: `brew services start grafana`. -See more information [here](http://docs.grafana.org/installation/mac/). - -If Windows is used, start Grafana by executing grafana-server.exe, located in the bin directory, preferably from the command line. See more information [here](http://docs.grafana.org/installation/windows/). - -## IoTDB installation - -See https://github.com/apache/iotdb - -## IoTDB-Grafana-Connector installation - -```shell -git clone https://github.com/apache/iotdb.git -``` - -## Start IoTDB-Grafana-Connector - -* Option one - -Import the entire project, after the maven dependency is installed, directly run`iotdb/grafana-connector/rc/main/java/org/apache/iotdb/web/grafana`directory` TsfileWebDemoApplication.java`, this grafana connector is developed by springboot - -* Option two - -In `/grafana/target/`directory - -```shell -cd iotdb -mvn clean package -pl iotdb-connector/grafana-connector -am -Dmaven.test.skip=true -cd iotdb-connector/grafana-connector/target -java -jar iotdb-grafana-connector-{version}.war -``` - -If following output is displayed, then iotdb-grafana-connector connector is successfully activated. - -```shell -$ java -jar iotdb-grafana-connector-{version}.war - - . ____ _ __ _ _ - /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \ -( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \ - \\/ ___)| |_)| | | | | || (_| | ) ) ) ) - ' |____| .__|_| |_|_| |_\__, | / / / / - =========|_|==============|___/=/_/_/_/ - :: Spring Boot :: (v1.5.4.RELEASE) -... 
-``` - -To configure properties, move the `grafana-connector/src/main/resources/application.properties` to the same directory as the war package (`grafana/target`) - -## Explore in Grafana - -The default port of Grafana is 3000, see http://localhost:3000/ - -Username and password are both "admin" by default. - -### Add data source - -Select `Data Sources` and then `Add data source`, select `SimpleJson` in `Type` and `URL` is http://localhost:8888. -After that, make sure IoTDB has been started, click "Save & Test", and "Data Source is working" will be shown to indicate successful configuration. - - - - -### Design in dashboard - -Add diagrams in dashboard and customize your query. See http://docs.grafana.org/guides/getting_started/ - - - -## config grafana - -``` -# ip and port of IoTDB -spring.datasource.url=jdbc:iotdb://127.0.0.1:6667/ -spring.datasource.username=root -spring.datasource.password=root -spring.datasource.driver-class-name=org.apache.iotdb.jdbc.IoTDBDriver -server.port=8888 -# Use this value to set timestamp precision as "ms", "us" or "ns", which must to be same with the timestamp -# precision of Apache IoTDB engine. -timestamp_precision=ms - -# Use this value to set down sampling true/false -isDownSampling=true -# defaut sampling intervals -interval=1m -# aggregation function to use to downsampling the data (int, long, float, double) -# COUNT, FIRST_VALUE, LAST_VALUE, MAX_TIME, MAX_VALUE, AVG, MIN_TIME, MIN_VALUE, NOW, SUM -continuous_data_function=AVG -# aggregation function to use to downsampling the data (boolean, string) -# COUNT, FIRST_VALUE, LAST_VALUE, MAX_TIME, MIN_TIME, NOW -discrete_data_function=LAST_VALUE -``` - -The specific configuration information of interval is as follows - -<1h: no sampling - -1h~1d : intervals = 1m - -1d~30d:intervals = 1h - -\>30d:intervals = 1d - -After configuration, please re-run war package - -``` -java -jar iotdb-grafana-connector-{version}.war -``` - diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Grafana-Plugin.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Grafana-Plugin.md deleted file mode 100644 index fd0db53db..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Grafana-Plugin.md +++ /dev/null @@ -1,298 +0,0 @@ - - -# Grafana Plugin - - -Grafana is an open source volume metrics monitoring and visualization tool, which can be used to present time series data and analyze application runtime status. - -We developed the Grafana-Plugin for IoTDB, using the IoTDB REST service to present time series data and providing many visualization methods for time series data. -Compared with previous IoTDB-Grafana-Connector, current Grafana-Plugin performs more efficiently and supports more query types. So, **we recommend using Grafana-Plugin instead of IoTDB-Grafana-Connector**. - -## Installation and deployment - -### Install Grafana - -* Download url: https://grafana.com/grafana/download -* Version >= 9.3.0 - - -### Acquisition method of grafana plugin - -#### Download apache-iotdb-datasource from Grafana's official website - -Download url:https://grafana.com/api/plugins/apache-iotdb-datasource/versions/1.0.0/download - -### Install Grafana-Plugin - -### Method 1: Install using the grafana cli tool (recommended) - -* Use the grafana cli tool to install apache-iotdb-datasource from the command line. 
The command content is as follows: - -```shell -grafana-cli plugins install apache-iotdb-datasource -``` - -### Method 2: Install using the Grafana interface (recommended) - -* Click on Configuration ->Plugins ->Search IoTDB from local Grafana to install the plugin - -### Method 3: Manually install the grafana-plugin plugin (not recommended) - - -* Copy the front-end project target folder generated above to Grafana's plugin directory `${Grafana directory}\data\plugins\`。If there is no such directory, you can manually create it or start grafana and it will be created automatically. Of course, you can also modify the location of plugins. For details, please refer to the following instructions for modifying the location of Grafana's plugin directory. - -* Start Grafana (restart if the Grafana service is already started) - -For more details,please click [here](https://grafana.com/docs/grafana/latest/plugins/installation/) - -### Start Grafana - -Start Grafana with the following command in the Grafana directory: - -* Windows: - -```shell -bin\grafana-server.exe -``` -* Linux: - -```shell -sudo service grafana-server start -``` - -* MacOS: - -```shell -brew services start grafana -``` - -For more details,please click [here](https://grafana.com/docs/grafana/latest/installation/) - - - -### Configure IoTDB REST Service - -* Modify `{iotdb directory}/conf/iotdb-datanode.properties` as following: - -```properties -# Is the REST service enabled -enable_rest_service=true - -# the binding port of the REST service -rest_service_port=18080 -``` - -Start IoTDB (restart if the IoTDB service is already started) - - -## How to use Grafana-Plugin - -### Access Grafana dashboard - -Grafana displays data in a web page dashboard. Please open your browser and visit `http://:` when using it. - -* IP is the IP of the server where your Grafana is located, and Port is the running port of Grafana (default 3000). - -* The default login username and password are both `admin`. - - -### Add IoTDB as Data Source - -Click the `Settings` icon on the left, select the `Data Source` option, and then click `Add data source`. - - - - - -Select the `Apache IoTDB` data source. - -* Fill in `http://:` in the `URL` field - * ip is the host ip where your IoTDB server is located - * port is the running port of the REST service (default 18080). -* Enter the username and password of the IoTDB server - -Click `Save & Test`, and `Data source is working` will appear. - - - - -### Create a new Panel - -Click the `Dashboards` icon on the left, and select `Manage` option. - - - -Click the `New Dashboard` icon on the top right, and select `Add an empty panel` option. - - - -Grafana plugin supports SQL: Full Customized mode and SQL: Drop-down List mode, and the default mode is SQL: Full Customized mode. - - - -#### SQL: Full Customized input method - -Enter content in the SELECT, FROM , WHERE and CONTROL input box, where the WHERE and CONTROL input boxes are optional. - -If a query involves multiple expressions, we can click `+` on the right side of the SELECT input box to add expressions in the SELECT clause, or click `+` on the right side of the FROM input box to add a path prefix: - - - -SELECT input box: contents can be the time series suffix, function, udf, arithmetic expression, or nested expressions. You can also use the as clause to rename the result. 
- -Here are some examples of valid SELECT content: - -* `s1` -* `top_k(s1, 'k'='1') as top` -* `sin(s1) + cos(s1 + s2)` -* `udf(s1) as "alias"` - -FROM input box: contents must be the prefix path of the time series, such as `root.sg.d`. - -WHERE input box: contents should be the filter condition of the query, such as `time > 0` or `s1 < 1024 and s2 > 1024`. - -CONTROL input box: contents should be a special clause that controls the query type and output format. -The GROUP BY input box supports the use of grafana's global variables to obtain the current time interval changes $__from (start time), $__to (end time) - -Here are some examples of valid CONTROL content: - -* `GROUP BY ([$__from, $__to), 1d)` -* `GROUP BY ([$__from, $__to),3h,1d)` -* `GROUP BY ([2017-11-01T00:00:00, 2017-11-07T23:00:00), 1d)` -* `GROUP BY ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d)` -* `GROUP BY ([$__from, $__to), 1m) FILL (PREVIOUSUNTILLAST)` -* `GROUP BY ([2017-11-07T23:50:00, 2017-11-07T23:59:00), 1m) FILL (PREVIOUSUNTILLAST)` -* `GROUP BY ([2017-11-07T23:50:00, 2017-11-07T23:59:00), 1m) FILL (PREVIOUS, 1m)` -* `GROUP BY ([2017-11-07T23:50:00, 2017-11-07T23:59:00), 1m) FILL (LINEAR, 5m, 5m)` -* `GROUP BY ((2017-11-01T00:00:00, 2017-11-07T23:00:00], 1d), LEVEL=1` -* `GROUP BY ([0, 20), 2ms, 3ms), LEVEL=1` - - -Tip: Statements like `select * from root.xx.**` are not recommended because those statements may cause OOM. - -#### SQL: Drop-down List - -Select a time series in the TIME-SERIES selection box, select a function in the FUNCTION option, and enter the contents in the SAMPLING INTERVAL、SLIDING STEP、LEVEL、FILL input boxes, where TIME-SERIES is a required item and the rest are non required items. - - - -### Support for variables and template functions - -Both SQL: Full Customized and SQL: Drop-down List input methods support the variable and template functions of grafana. In the following example, raw input method is used, and aggregation is similar. - -After creating a new Panel, click the Settings button in the upper right corner: - - - -Select `Variables`, click `Add variable`: - - - -Example 1:Enter `Name`, `Label`, and `Query`, and then click the `Update` button: - - - -Apply Variables, enter the variable in the `grafana panel` and click the `save` button: - - - -Example 2: Nested use of variables: - - - - - - - - -Example 3: using function variables - - - - - -The Name in the above figure is the variable name and the variable name we will use in the panel in the future. Label is the display name of the variable. If it is empty, the variable of Name will be displayed. Otherwise, the name of the Label will be displayed. -There are Query, Custom, Text box, Constant, DataSource, Interval, Ad hoc filters, etc. in the Type drop-down, all of which can be used in IoTDB's Grafana Plugin -For a more detailed introduction to usage, please check the official manual (https://grafana.com/docs/grafana/latest/variables/) - -In addition to the examples above, the following statements are supported: - -* `show databases` -* `show timeseries` -* `show child nodes` -* `show all ttl` -* `show latest timeseries` -* `show devices` -* `select xx from root.xxx limit xx 等sql 查询` - -Tip: If the query field contains Boolean data, the result value will be converted to 1 by true and 0 by false. - -### Grafana alert function - -This plugin supports Grafana alert function. - -1. In the Grafana panel, click the `alerting` button, as shown in the following figure: - - - -2. 
Click `Create alert rule from this panel`, as shown in the figure below: - - - -3. Set query and alarm conditions in step 1. Conditions represent query conditions, and multiple combined query conditions can be configured. As shown below: - - -The query condition in the figure: `min() OF A IS BELOW 0`, means that the condition will be triggered when the minimum value in the A tab is 0, click this function to change it to another function. - -Tip: Queries used in alert rules cannot contain any template variables. Currently we only support AND and OR operators between conditions, which are executed serially. -For example, we have 3 conditions in the following order: Condition: B (Evaluates to: TRUE) OR Condition: C (Evaluates to: FALSE) and Condition: D (Evaluates to: TRUE) So the result will evaluate to ((True or False ) and right) = right. - - -4. After selecting indicators and alarm rules, click the `Preview` button to preview the data as shown in the figure below: - - - -5. In step 2, specify the alert evaluation interval, and for `Evaluate every`, specify the evaluation frequency. Must be a multiple of 10 seconds. For example, 1m, 30s. - For `Evaluate for`, specify the duration before the alert fires. As shown below: - - - -6. In step 3, add the storage location, rule group, and other metadata associated with the rule. Where `Rule name` specifies the name of the rule. Rule names must be unique. - - - -7. In step 4, add a custom label. Add a custom label by selecting an existing key-value pair from the drop-down list, or add a new label by entering a new key or value. As shown below: - - - -8. Click `Save` to save the rule or click `Save and Exit` to save the rule and return to the alerts page. - -9. Commonly used alarm states include `Normal`, `Pending`, `Firing` and other states, as shown in the figure below: - - - - -10. We can also configure `Contact points` for alarms to receive alarm notifications. For more detailed operations, please refer to the official document (https://grafana.com/docs/grafana/latest/alerting/manage-notifications/create-contact-point/). - -## More Details about Grafana - -For more details about Grafana operation, please refer to the official Grafana documentation: http://docs.grafana.org/guides/getting_started/. diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Hive-TsFile.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Hive-TsFile.md deleted file mode 100644 index e8b4dc30d..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Hive-TsFile.md +++ /dev/null @@ -1,170 +0,0 @@ - -# Apache Hive(TsFile) - -## About Hive-TsFile-Connector - -Hive-TsFile-Connector implements the support of Hive for external data sources of Tsfile type. This enables users to operate TsFile by Hive. - -With this connector, you can - -* Load a single TsFile, from either the local file system or hdfs, into hive -* Load all files in a specific directory, from either the local file system or hdfs, into hive -* Query the tsfile through HQL. -* As of now, the write operation is not supported in hive-connector. So, insert operation in HQL is not allowed while operating tsfile through hive. - -## System Requirements - -|Hadoop Version |Hive Version | Java Version | TsFile | -|------------- |------------ | ------------ |------------ | -| `2.7.3` or `3.2.1` | `2.3.6` or `3.1.2` | `1.8` | `1.0.0`| - -> Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/iotdb/tree/master/tsfile. 
- -## Data Type Correspondence - -| TsFile data type | Hive field type | -| ---------------- | --------------- | -| BOOLEAN | Boolean | -| INT32 | INT | -| INT64 | BIGINT | -| FLOAT | Float | -| DOUBLE | Double | -| TEXT | STRING | - - -## Add Dependency For Hive - -To use hive-connector in hive, we should add the hive-connector jar into hive. - -After downloading the code of iotdb from , you can use the command of `mvn clean package -pl iotdb-connector/hive-connector -am -Dmaven.test.skip=true -P get-jar-with-dependencies` to get a `hive-connector-X.X.X-jar-with-dependencies.jar`. - -Then in hive, use the command of `add jar XXX` to add the dependency. For example: - -``` -hive> add jar /Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar; - -Added [/Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar] to class path -Added resources: [/Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar] -``` - - -## Create Tsfile-backed Hive tables - -To create a Tsfile-backed table, specify the `serde` as `org.apache.iotdb.hive.TsFileSerDe`, -specify the `inputformat` as `org.apache.iotdb.hive.TSFHiveInputFormat`, -and the `outputformat` as `org.apache.iotdb.hive.TSFHiveOutputFormat`. - -Also provide a schema which only contains two fields: `time_stamp` and `sensor_id` for the table. -`time_stamp` is the time value of the time series -and `sensor_id` is the sensor name to extract from the tsfile to hive such as `sensor_1`. -The name of the table can be any valid table names in hive. - -Also a location provided for hive-connector to pull the most current data for the table. - -The location should be a specific directory on your local file system or HDFS to set up Hadoop. -If it is in your local file system, the location should look like `file:///data/data/sequence/root.baic2.WWS.leftfrontdoor/` - -Last, set the `device_id` in `TBLPROPERTIES` to the device name you want to analyze. - -For example: - -``` -CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1( - time_stamp TIMESTAMP, - sensor_1 BIGINT) -ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe' -STORED AS - INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat' - OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat' -LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/' -TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1'); -``` -In this example, the data of `root.baic2.WWS.leftfrontdoor.plc1.sensor_1` is pulled from the directory of `/data/data/sequence/root.baic2.WWS.leftfrontdoor/`. -This table results in a description as below: - -``` -hive> describe only_sensor_1; -OK -time_stamp timestamp from deserializer -sensor_1 bigint from deserializer -Time taken: 0.053 seconds, Fetched: 2 row(s) -``` -At this point, the Tsfile-backed table can be worked with in Hive like any other table. - -## Query from TsFile-backed Hive tables - -Before we do any queries, we should set the `hive.input.format` in hive by executing the following command. - -``` -hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; -``` - -Now, we already have an external table named `only_sensor_1` in hive. -We can use any query operations through HQL to analyse it. 
- -For example: - -### Select Clause Example - -``` -hive> select * from only_sensor_1 limit 10; -OK -1 1000000 -2 1000001 -3 1000002 -4 1000003 -5 1000004 -6 1000005 -7 1000006 -8 1000007 -9 1000008 -10 1000009 -Time taken: 1.464 seconds, Fetched: 10 row(s) -``` - -### Aggregate Clause Example - -``` -hive> select count(*) from only_sensor_1; -WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. -Query ID = jackietien_20191016202416_d1e3e233-d367-4453-b39a-2aac9327a3b6 -Total jobs = 1 -Launching Job 1 out of 1 -Number of reduce tasks determined at compile time: 1 -In order to change the average load for a reducer (in bytes): - set hive.exec.reducers.bytes.per.reducer= -In order to limit the maximum number of reducers: - set hive.exec.reducers.max= -In order to set a constant number of reducers: - set mapreduce.job.reduces= -Job running in-process (local Hadoop) -2019-10-16 20:24:18,305 Stage-1 map = 0%, reduce = 0% -2019-10-16 20:24:27,443 Stage-1 map = 100%, reduce = 100% -Ended Job = job_local867757288_0002 -MapReduce Jobs Launched: -Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS -Total MapReduce CPU Time Spent: 0 msec -OK -1000000 -Time taken: 11.334 seconds, Fetched: 1 row(s) -``` - diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Ignition-IoTDB-plugin_timecho.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Ignition-IoTDB-plugin_timecho.md deleted file mode 100644 index 10f07ed73..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Ignition-IoTDB-plugin_timecho.md +++ /dev/null @@ -1,275 +0,0 @@ - - -# Ignition - -## Product Overview - -1. Introduction to Ignition - - Ignition is a web-based monitoring and data acquisition tool (SCADA) - an open and scalable universal platform. Ignition allows you to more easily control, track, display, and analyze all data of your enterprise, enhancing business capabilities. For more introduction details, please refer to [Ignition Official Website](https://docs.inductiveautomation.com/docs/8.1/getting-started/introducing-ignition) - -2. Introduction to the Ignition-IoTDB Connector - - The ignition-IoTDB Connector is divided into two modules: the ignition-IoTDB Connector,Ignition-IoTDB With JDBC。 Among them: - - - Ignition-IoTDB Connector: Provides the ability to store data collected by Ignition into IoTDB, and also supports data reading in Components. It injects script interfaces such as `system. iotdb. insert`and`system. iotdb. query`to facilitate programming in Ignition - - Ignition-IoTDB With JDBC: Ignition-IoTDB With JDBC can be used in the`Transaction Groups`module and is not applicable to the`Tag Historian`module. It can be used for custom writing and querying. - - The specific relationship and content between the two modules and ignition are shown in the following figure. - - ![](/img/20240703114443.png) - -## Installation Requirements - -| **Preparation Content** | Version Requirements | -| ------------------------------- | ------------------------------------------------------------ | -| IoTDB | Version 1.3.1 and above are required to be installed, please refer to IoTDB for installation [Deployment Guidance](../Deployment-and-Maintenance/IoTDB-Package_timecho.md) | -| Ignition | Requirement: 8.1 version (8.1.37 and above) of version 8.1 must be installed. 
Please refer to the Ignition official website for installation [Installation Guidance](https://docs.inductiveautomation.com/docs/8.1/getting-started/installing-and-upgrading)(Other versions are compatible, please contact the business department for more information) | -| Ignition-IoTDB Connector module | Please contact Business to obtain | -| Ignition-IoTDB With JDBC module | Download address:https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/ | - -## Instruction Manual For Ignition-IoTDB Connector - -### Introduce - -The Ignition-IoTDB Connector module can store data in a database connection associated with the historical database provider. The data is directly stored in a table in the SQL database based on its data type, as well as a millisecond timestamp. Store data only when making changes based on the value pattern and dead zone settings on each label, thus avoiding duplicate and unnecessary data storage. - -The Ignition-IoTDB Connector provides the ability to store the data collected by Ignition into IoTDB. - -### Installation Steps - -Step 1: Enter the `Configuration` - `System` - `Modules` module and click on the `Install or Upgrade a Module` button at the bottom - -![](/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-1.png) - -Step 2: Select the obtained `modl`, select the file and upload it, click `Install`, and trust the relevant certificate. - -![](/img/20240703-151030.png) - -Step 3: After installation is completed, you can see the following content - -![](/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-3.png) - -Step 4: Enter the `Configuration` - `Tags` - `History` module and click on `Create new Historical Tag Provider` below - -![](/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-4.png) - -Step 5: Select `IoTDB` and fill in the configuration information - -![](/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-5.png) - -The configuration content is as follows: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Name | Description | Default Value | Notes |
| ------------------------------ | -------------------------------------------------------------------- | ------------- | --------------------------------------------- |
| **Main** | | | |
| Provider Name | Provider Name | - | |
| Enabled | | true | The provider can only be used when it is true |
| Description | Description | - | |
| **IoTDB Settings** | | | |
| Host Name | The address of the target IoTDB instance | - | |
| Port Number | The port of the target IoTDB instance | 6667 | |
| Username | The username of the target IoTDB | - | |
| Password | Password for the target IoTDB | - | |
| Database Name | The database name to be stored, starting with root, such as root.db | - | |
| Pool Size | Size of the SessionPool | 50 | Can be configured as needed |
| **Store and Forward Settings** | | | Just keep it as default |
- - - -### Instructions - -#### Configure Historical Data Storage - -- After configuring the `Provider`, you can use the `IoTDB Tag Historian` in the `Designer`, just like using other `Providers`. Right click on the corresponding `Tag` and select `Edit Tag (s) `, then select the History category in the Tag Editor - - ![](/img/ignition-7.png) - -- Set `History Disabled` to `true`, select `Storage Provider` as the `Provider` created in the previous step, configure other parameters as needed, click `OK`, and then save the project. At this point, the data will be continuously stored in the 'IoTDB' instance according to the set content. - - ![](/img/ignition-8.png) - -#### Read Data - -- You can also directly select the tags stored in IoTDB under the Data tab of the Report - - ![](/img/ignition-9.png) - -- You can also directly browse relevant data in Components - - ![](/img/ignition-10.png) - -#### Script module: This function can interact with IoTDB - -1. system.iotdb.insert: - - -- Script Description: Write data to an IoTDB instance - -- Script Definition: - - `system.iotdb.insert(historian, deviceId, timestamps, measurementNames, measurementValues)` - -- Parameter: - - - `str historian`:The name of the corresponding IoTDB Tag Historian Provider - - `str deviceId`:The deviceId written, excluding the configured database, such as Sine - - `long[] timestamps`:List of timestamps for written data points - - `str[] measurementNames`:List of names for written physical quantities - - `str[][] measurementValues`:The written data point data corresponds to the timestamp list and physical quantity name list - -- Return Value: None - -- Available Range:Client, Designer, Gateway - -- Usage example: - - ```shell - system.iotdb.insert("IoTDB", "Sine", [system.date.now()],["measure1","measure2"],[["val1","val2"]]) - ``` - -2. system.iotdb.query: - - -- Script Description:Query the data written to the IoTDB instance - -- Script Definition: - - `system.iotdb.query(historian, sql)` - -- Parameter: - - - `str historian`:The name of the corresponding IoTDB Tag Historian Provider - - `str sql`:SQL statement to be queried - -- Return Value: - Query Results:`List>` - -- Available Range:Client, Designer, Gateway - -- Usage example: - - ```Python - system.iotdb.query("IoTDB", "select * from root.db.Sine where time > 1709563427247") - ``` - -## Ignition-IoTDB With JDBC - -### Introduce - - Ignition-IoTDB With JDBC provides a JDBC driver that allows users to connect and query the Ignition IoTDB database using standard JDBC APIs - -### Installation Steps - -Step 1: Enter the `Configuration` - `Databases` -`Drivers` module and create the `Translator` - -![](/img/Ignition-IoTDBWithJDBC-1.png) - -Step 2: Enter the `Configuration` - `Databases` - `Drivers` module, create a `JDBC Driver` , select the `Translator` configured in the previous step, and upload the downloaded `IoTDB JDBC`. Set the Classname to `org. apache. iotdb. 
jdbc.IoTDBDriver` - -![](/img/Ignition-IoTDBWithJDBC-2.png) - -Step 3: Enter the `Configuration` - `Databases` - `Connections` module, create a new `Connections` , select the`IoTDB Driver` created in the previous step for `JDBC Driver`, configure the relevant information, and save it to use - -![](/img/Ignition-IoTDBWithJDBC-3.png) - -### Instructions - -#### Data Writing - -Select the previously created `Connection` from the `Data Source` in the `Transaction Groups` - -- `Table name`needs to be set as the complete device path starting from root -- Uncheck `Automatically create table` -- `Store timestame to` configure as time - -Do not select other options, set the fields, and after `enabled` , the data will be installed and stored in the corresponding IoTDB - -![](/img/%E6%95%B0%E6%8D%AE%E5%86%99%E5%85%A5-1.png) - -#### Query - -- Select `Data Source` in the `Database Query Browser` and select the previously created `Connection` to write an SQL statement to query the data in IoTDB - -![](/img/%E6%95%B0%E6%8D%AE%E6%9F%A5%E8%AF%A2-ponz.png) - diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/NiFi-IoTDB.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/NiFi-IoTDB.md deleted file mode 100644 index 531c5119c..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/NiFi-IoTDB.md +++ /dev/null @@ -1,141 +0,0 @@ - -# Apache NiFi - -## Apache NiFi Introduction - -Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data. - -Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. - -Apache NiFi includes the following capabilities: - -* Browser-based user interface - * Seamless experience for design, control, feedback, and monitoring -* Data provenance tracking - * Complete lineage of information from beginning to end -* Extensive configuration - * Loss-tolerant and guaranteed delivery - * Low latency and high throughput - * Dynamic prioritization - * Runtime modification of flow configuration - * Back pressure control -* Extensible design - * Component architecture for custom Processors and Services - * Rapid development and iterative testing -* Secure communication - * HTTPS with configurable authentication strategies - * Multi-tenant authorization and policy management - * Standard protocols for encrypted communication including TLS and SSH - -## PutIoTDBRecord - -This is a processor that reads the content of the incoming FlowFile as individual records using the configured 'Record Reader' and writes them to Apache IoTDB using native interface. - -### Properties of PutIoTDBRecord - -| property | description | default value | necessary | -|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ------------- | --------- | -| Host | The host of IoTDB. | null | true | -| Port | The port of IoTDB. | 6667 | true | -| Username | Username to access the IoTDB. | null | true | -| Password | Password to access the IoTDB. | null | true | -| Prefix | The Prefix begin with root. that will be add to the tsName in data.
It can be updated by expression language. | null | true |
| Time | The name of the time field | null | true |
| Record Reader | Specifies the type of Record Reader controller service to use for parsing the incoming data and determining the schema. | null | true |
| Schema | The schema that IoTDB needs is not well supported by NiFi, so you can define the schema here. Besides, you can set the encoding type and compression type through this property. If you don't set this property, the inferred schema will be used. It can be updated by expression language. | null | false |
| Aligned | Whether to use the aligned interface. It can be updated by expression language. | false | false |
| MaxRowNumber | Specifies the max row number of each tablet. It can be updated by expression language. | 1024 | false |
Note: If there are incoming connections, then the query is created from incoming FlowFile's content otherwise"it is created from this property. | null | false | -| iotdb-query-chunk-size | Chunking can be used to return results in a stream of smaller batches (each has a partial results up to a chunk size) rather than as a single response. Chunking queries can return an unlimited number of rows. Note: Chunking is enable when result chunk size is greater than 0 | 0 | false | - - -## Relationships - -| relationship | description | -| ------------ | ---------------------------------------------------- | -| success | Data can be written correctly or flow file is empty. | -| failure | The shema or flow file is abnormal. | \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Spark-IoTDB.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Spark-IoTDB.md deleted file mode 100644 index 7e03da5c2..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Spark-IoTDB.md +++ /dev/null @@ -1,232 +0,0 @@ - - -# Apache Spark(IoTDB) - -## Supported Versions - -Supported versions of Spark and Scala are as follows: - -| Spark Version | Scala Version | -|----------------|---------------| -| `2.4.0-latest` | `2.11, 2.12` | - -## Precautions - -1. The current version of `spark-iotdb-connector` supports Scala `2.11` and `2.12`, but not `2.13`. -2. `spark-iotdb-connector` supports usage in Spark for both Java, Scala, and PySpark. - -## Deployment - -`spark-iotdb-connector` has two use cases: IDE development and `spark-shell` debugging. - -### IDE Development - -For IDE development, simply add the following dependency to the `pom.xml` file: - -``` xml - - org.apache.iotdb - - spark-iotdb-connector_2.12.10 - ${iotdb.version} - -``` - -### `spark-shell` Debugging - -To use `spark-iotdb-connector` in `spark-shell`, you need to download the `with-dependencies` version of the jar package -from the official website. After that, copy the jar package to the `${SPARK_HOME}/jars` directory. -Simply execute the following command: - -```shell -cp spark-iotdb-connector_2.12.10-${iotdb.version}.jar $SPARK_HOME/jars/ -``` - -In addition, to ensure that spark can use JDBC and IoTDB connections, you need to do the following: - -Run the following command to compile the IoTDB JDBC connector: - -```shell -mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies -``` - -The compiled jar package is located in the following directory: - -```shell -$IoTDB_HOME/iotdb-client/jdbc/target/iotdb-jdbc-{version}-SNAPSHOT-jar-with-dependencies.jar -``` - -At last, copy the jar package to the ${SPARK_HOME}/jars directory. 
Simply execute the following command: - -```shell -cp iotdb-jdbc-{version}-SNAPSHOT-jar-with-dependencies.jar $SPARK_HOME/jars/ -``` - -## Usage - -### Parameters - -| Parameter | Description | Default Value | Scope | Can be Empty | -|--------------|--------------------------------------------------------------------------------------------------------------|---------------|-------------|--------------| -| url | Specifies the JDBC URL of IoTDB | null | read, write | false | -| user | The username of IoTDB | root | read, write | true | -| password | The password of IoTDB | root | read, write | true | -| sql | Specifies the SQL statement for querying | null | read | true | -| numPartition | Specifies the partition number of the DataFrame when in read, and the write concurrency number when in write | 1 | read, write | true | -| lowerBound | The start timestamp of the query (inclusive) | 0 | read | true | -| upperBound | The end timestamp of the query (inclusive) | 0 | read | true | - -### Reading Data from IoTDB - -Here is an example that demonstrates how to read data from IoTDB into a DataFrame: - -```scala -import org.apache.iotdb.spark.db._ - -val df = spark.read.format("org.apache.iotdb.spark.db") - .option("user", "root") - .option("password", "root") - .option("url", "jdbc:iotdb://127.0.0.1:6667/") - .option("sql", "select ** from root") // query SQL - .option("lowerBound", "0") // lower timestamp bound - .option("upperBound", "100000000") // upper timestamp bound - .option("numPartition", "5") // number of partitions - .load - -df.printSchema() - -df.show() -``` - -### Writing Data to IoTDB - -Here is an example that demonstrates how to write data to IoTDB: - -```scala -// Construct narrow table data -val df = spark.createDataFrame(List( - (1L, "root.test.d0", 1, 1L, 1.0F, 1.0D, true, "hello"), - (2L, "root.test.d0", 2, 2L, 2.0F, 2.0D, false, "world"))) - -val dfWithColumn = df.withColumnRenamed("_1", "Time") - .withColumnRenamed("_2", "Device") - .withColumnRenamed("_3", "s0") - .withColumnRenamed("_4", "s1") - .withColumnRenamed("_5", "s2") - .withColumnRenamed("_6", "s3") - .withColumnRenamed("_7", "s4") - .withColumnRenamed("_8", "s5") - -// Write narrow table data -dfWithColumn - .write - .format("org.apache.iotdb.spark.db") - .option("url", "jdbc:iotdb://127.0.0.1:6667/") - .save - -// Construct wide table data -val df = spark.createDataFrame(List( - (1L, 1, 1L, 1.0F, 1.0D, true, "hello"), - (2L, 2, 2L, 2.0F, 2.0D, false, "world"))) - -val dfWithColumn = df.withColumnRenamed("_1", "Time") - .withColumnRenamed("_2", "root.test.d0.s0") - .withColumnRenamed("_3", "root.test.d0.s1") - .withColumnRenamed("_4", "root.test.d0.s2") - .withColumnRenamed("_5", "root.test.d0.s3") - .withColumnRenamed("_6", "root.test.d0.s4") - .withColumnRenamed("_7", "root.test.d0.s5") - -// Write wide table data -dfWithColumn.write.format("org.apache.iotdb.spark.db") - .option("url", "jdbc:iotdb://127.0.0.1:6667/") - .option("numPartition", "10") - .save -``` - -### Wide and Narrow Table Conversion - -Here are examples of how to convert between wide and narrow tables: - -* From wide to narrow - -```scala -import org.apache.iotdb.spark.db._ - -val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root.** where time < 1100 and time > 1000").load -val narrow_df = Transformer.toNarrowForm(spark, wide_df) -``` - -* From narrow to wide - -```scala -import org.apache.iotdb.spark.db._ - -val wide_df = Transformer.toWideForm(spark, 
narrow_df) -``` - -## Wide and Narrow Tables - -Using the TsFile structure as an example: there are three measurements in the TsFile pattern, -namely `Status`, `Temperature`, and `Hardware`. The basic information for each of these three measurements is as -follows: - -| Name | Type | Encoding | -|-------------|---------|----------| -| Status | Boolean | PLAIN | -| Temperature | Float | RLE | -| Hardware | Text | PLAIN | - -The existing data in the TsFile is as follows: - -* `d1:root.ln.wf01.wt01` -* `d2:root.ln.wf02.wt02` - -| time | d1.status | time | d1.temperature | time | d2.hardware | time | d2.status | -|------|-----------|------|----------------|------|-------------|------|-----------| -| 1 | True | 1 | 2.2 | 2 | "aaa" | 1 | True | -| 3 | True | 2 | 2.2 | 4 | "bbb" | 2 | False | -| 5 | False | 3 | 2.1 | 6 | "ccc" | 4 | True | - -The wide (default) table form is as follows: - -| Time | root.ln.wf02.wt02.temperature | root.ln.wf02.wt02.status | root.ln.wf02.wt02.hardware | root.ln.wf01.wt01.temperature | root.ln.wf01.wt01.status | root.ln.wf01.wt01.hardware | -|------|-------------------------------|--------------------------|----------------------------|-------------------------------|--------------------------|----------------------------| -| 1 | null | true | null | 2.2 | true | null | -| 2 | null | false | aaa | 2.2 | null | null | -| 3 | null | null | null | 2.1 | true | null | -| 4 | null | true | bbb | null | null | null | -| 5 | null | null | null | null | false | null | -| 6 | null | null | ccc | null | null | null | - -You can also use the narrow table format as shown below: - -| Time | Device | status | hardware | temperature | -|------|-------------------|--------|----------|-------------| -| 1 | root.ln.wf02.wt01 | true | null | 2.2 | -| 1 | root.ln.wf02.wt02 | true | null | null | -| 2 | root.ln.wf02.wt01 | null | null | 2.2 | -| 2 | root.ln.wf02.wt02 | false | aaa | null | -| 3 | root.ln.wf02.wt01 | true | null | 2.1 | -| 4 | root.ln.wf02.wt02 | true | bbb | null | -| 5 | root.ln.wf02.wt01 | false | null | null | -| 6 | root.ln.wf02.wt02 | null | ccc | null | \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Spark-TsFile.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Spark-TsFile.md deleted file mode 100644 index 151d81e14..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Spark-TsFile.md +++ /dev/null @@ -1,315 +0,0 @@ - - -# Apache Spark(TsFile) - -## About Spark-TsFile-Connector - -Spark-TsFile-Connector implements the support of Spark for external data sources of Tsfile type. This enables users to read, write and query Tsfile by Spark. - -With this connector, you can - -* load a single TsFile, from either the local file system or hdfs, into Spark -* load all files in a specific directory, from either the local file system or hdfs, into Spark -* write data from Spark into TsFile - -## System Requirements - -|Spark Version | Scala Version | Java Version | TsFile | -|:-------------: | :-------------: | :------------: |:------------: | -| `2.4.3` | `2.11.8` | `1.8` | `1.0.0`| - -> Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/iotdb/tree/master/tsfile. 
-> Currently we only support spark version 2.4.3 and there are some known issue on 2.4.7, do no use it - -## Quick Start -### Local Mode - -Start Spark with TsFile-Spark-Connector in local mode: - -``` -./ --jars tsfile-spark-connector.jar,tsfile-{version}-jar-with-dependencies.jar,hadoop-tsfile-{version}-jar-with-dependencies.jar -``` - -Note: - -* \ is the real path of your spark-shell. -* Multiple jar packages are separated by commas without any spaces. -* See https://github.com/apache/iotdb/tree/master/tsfile for how to get TsFile. - - -### Distributed Mode - -Start Spark with TsFile-Spark-Connector in distributed mode (That is, the spark cluster is connected by spark-shell): - -``` -. / --jars tsfile-spark-connector.jar,tsfile-{version}-jar-with-dependencies.jar,hadoop-tsfile-{version}-jar-with-dependencies.jar --master spark://ip:7077 -``` - -Note: - -* \ is the real path of your spark-shell. -* Multiple jar packages are separated by commas without any spaces. -* See https://github.com/apache/iotdb/tree/master/tsfile for how to get TsFile. - -## Data Type Correspondence - -| TsFile data type | SparkSQL data type| -| --------------| -------------- | -| BOOLEAN | BooleanType | -| INT32 | IntegerType | -| INT64 | LongType | -| FLOAT | FloatType | -| DOUBLE | DoubleType | -| TEXT | StringType | - -## Schema Inference - -The way to display TsFile is dependent on the schema. Take the following TsFile structure as an example: There are three measurements in the TsFile schema: status, temperature, and hardware. The basic information of these three measurements is listed: - - -|Name|Type|Encode| -|---|---|---| -|status|Boolean|PLAIN| -|temperature|Float|RLE| -|hardware|Text|PLAIN| - -The existing data in the TsFile are: - -ST 1 - -The corresponding SparkSQL table is: - -| time | root.ln.wf02.wt02.temperature | root.ln.wf02.wt02.status | root.ln.wf02.wt02.hardware | root.ln.wf01.wt01.temperature | root.ln.wf01.wt01.status | root.ln.wf01.wt01.hardware | -|------|-------------------------------|--------------------------|----------------------------|-------------------------------|--------------------------|----------------------------| -| 1 | null | true | null | 2.2 | true | null | -| 2 | null | false | aaa | 2.2 | null | null | -| 3 | null | null | null | 2.1 | true | null | -| 4 | null | true | bbb | null | null | null | -| 5 | null | null | null | null | false | null | -| 6 | null | null | ccc | null | null | null | - -You can also use narrow table form which as follows: (You can see part 6 about how to use narrow form) - -| time | device_name | status | hardware | temperature | -|------|-------------------------------|--------------------------|----------------------------|-------------------------------| -| 1 | root.ln.wf02.wt01 | true | null | 2.2 | -| 1 | root.ln.wf02.wt02 | true | null | null | -| 2 | root.ln.wf02.wt01 | null | null | 2.2 | -| 2 | root.ln.wf02.wt02 | false | aaa | null | -| 3 | root.ln.wf02.wt01 | true | null | 2.1 | -| 4 | root.ln.wf02.wt02 | true | bbb | null | -| 5 | root.ln.wf02.wt01 | false | null | null | -| 6 | root.ln.wf02.wt02 | null | ccc | null | - - - -## Scala API - -NOTE: Remember to assign necessary read and write permissions in advance. 
- -* Example 1: read from the local file system - -```scala -import org.apache.iotdb.spark.tsfile._ -val wide_df = spark.read.tsfile("test.tsfile") -wide_df.show - -val narrow_df = spark.read.tsfile("test.tsfile", true) -narrow_df.show -``` - -* Example 2: read from the hadoop file system - -```scala -import org.apache.iotdb.spark.tsfile._ -val wide_df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") -wide_df.show - -val narrow_df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) -narrow_df.show -``` - -* Example 3: read from a specific directory - -```scala -import org.apache.iotdb.spark.tsfile._ -val df = spark.read.tsfile("hdfs://localhost:9000/usr/hadoop") -df.show -``` - -Note 1: Global time ordering of all TsFiles in a directory is not supported now. - -Note 2: Measurements of the same name should have the same schema. - -* Example 4: query in wide form - -```scala -import org.apache.iotdb.spark.tsfile._ -val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") -df.createOrReplaceTempView("tsfile_table") -val newDf = spark.sql("select * from tsfile_table where `device_1.sensor_1`>0 and `device_1.sensor_2` < 22") -newDf.show -``` - -```scala -import org.apache.iotdb.spark.tsfile._ -val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") -df.createOrReplaceTempView("tsfile_table") -val newDf = spark.sql("select count(*) from tsfile_table") -newDf.show -``` - -* Example 5: query in narrow form - -```scala -import org.apache.iotdb.spark.tsfile._ -val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) -df.createOrReplaceTempView("tsfile_table") -val newDf = spark.sql("select * from tsfile_table where device_name = 'root.ln.wf02.wt02' and temperature > 5") -newDf.show -``` - -```scala -import org.apache.iotdb.spark.tsfile._ -val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) -df.createOrReplaceTempView("tsfile_table") -val newDf = spark.sql("select count(*) from tsfile_table") -newDf.show -``` - -* Example 6: write in wide form - -```scala -// we only support wide_form table to write -import org.apache.iotdb.spark.tsfile._ - -val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") -df.show -df.write.tsfile("hdfs://localhost:9000/output") - -val newDf = spark.read.tsfile("hdfs://localhost:9000/output") -newDf.show -``` - -* Example 7: write in narrow form - -```scala -// we only support wide_form table to write -import org.apache.iotdb.spark.tsfile._ - -val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) -df.show -df.write.tsfile("hdfs://localhost:9000/output", true) - -val newDf = spark.read.tsfile("hdfs://localhost:9000/output", true) -newDf.show -``` - - -Appendix A: Old Design of Schema Inference - -The way to display TsFile is related to TsFile Schema. Take the following TsFile structure as an example: There are three measurements in the Schema of TsFile: status, temperature, and hardware. The basic info of these three Measurements is: - - -|Name|Type|Encode| -|---|---|---| -|status|Boolean|PLAIN| -|temperature|Float|RLE| -|hardware|Text|PLAIN| - - -The existing data in the file are: - -ST 2 - -A set of time-series data - -There are two ways to show a set of time-series data: - -* the default way - -Two columns are created to store the full path of the device: time(LongType) and delta_object(StringType). - -- `time` : Timestamp, LongType -- `delta_object` : Delta_object ID, StringType - -Next, a column is created for each Measurement to store the specific data. 
The SparkSQL table structure is: - -|time(LongType)|delta\_object(StringType)|status(BooleanType)|temperature(FloatType)|hardware(StringType)| -|---|---|---|---|---| -|1| root.ln.wf01.wt01 |True|2.2|null| -|1| root.ln.wf02.wt02 |True|null|null| -|2| root.ln.wf01.wt01 |null|2.2|null| -|2| root.ln.wf02.wt02 |False|null|"aaa"| -|2| root.sgcc.wf03.wt01 |True|null|null| -|3| root.ln.wf01.wt01 |True|2.1|null| -|3| root.sgcc.wf03.wt01 |True|3.3|null| -|4| root.ln.wf01.wt01 |null|2.0|null| -|4| root.ln.wf02.wt02 |True|null|"bbb"| -|4| root.sgcc.wf03.wt01 |True|null|null| -|5| root.ln.wf01.wt01 |False|null|null| -|5| root.ln.wf02.wt02 |False|null|null| -|5| root.sgcc.wf03.wt01 |True|null|null| -|6| root.ln.wf02.wt02 |null|null|"ccc"| -|6| root.sgcc.wf03.wt01 |null|6.6|null| -|7| root.ln.wf01.wt01 |True|null|null| -|8| root.ln.wf02.wt02 |null|null|"ddd"| -|8| root.sgcc.wf03.wt01 |null|8.8|null| -|9| root.sgcc.wf03.wt01 |null|9.9|null| - - - -* unfold delta_object column - -Expand the device column by "." into multiple columns, ignoring the root directory "root". Convenient for richer aggregation operations. To use this display way, the parameter "delta\_object\_name" is set in the table creation statement (refer to Example 5 in Section 5.1 of this manual), as in this example, parameter "delta\_object\_name" is set to "root.device.turbine". The number of path layers needs to be one-to-one. At this point, one column is created for each layer of the device path except the "root" layer. The column name is the name in the parameter and the value is the name of the corresponding layer of the device. Next, one column is created for each Measurement to store the specific data. - -Then SparkSQL Table Structure is as follows: - -|time(LongType)| group(StringType)| field(StringType)| device(StringType)|status(BooleanType)|temperature(FloatType)|hardware(StringType)| -|---|---|---|---|---|---|---| -|1| ln | wf01 | wt01 |True|2.2|null| -|1| ln | wf02 | wt02 |True|null|null| -|2| ln | wf01 | wt01 |null|2.2|null| -|2| ln | wf02 | wt02 |False|null|"aaa"| -|2| sgcc | wf03 | wt01 |True|null|null| -|3| ln | wf01 | wt01 |True|2.1|null| -|3| sgcc | wf03 | wt01 |True|3.3|null| -|4| ln | wf01 | wt01 |null|2.0|null| -|4| ln | wf02 | wt02 |True|null|"bbb"| -|4| sgcc | wf03 | wt01 |True|null|null| -|5| ln | wf01 | wt01 |False|null|null| -|5| ln | wf02 | wt02 |False|null|null| -|5| sgcc | wf03 | wt01 |True|null|null| -|6| ln | wf02 | wt02 |null|null|"ccc"| -|6| sgcc | wf03 | wt01 |null|6.6|null| -|7| ln | wf01 | wt01 |True|null|null| -|8| ln | wf02 | wt02 |null|null|"ddd"| -|8| sgcc | wf03 | wt01 |null|8.8|null| -|9| sgcc | wf03 | wt01 |null|9.9|null| - - -TsFile-Spark-Connector displays one or more TsFiles as a table in SparkSQL By SparkSQL. It also allows users to specify a single directory or use wildcards to match multiple directories. If there are multiple TsFiles, the union of the measurements in all TsFiles will be retained in the table, and the measurement with the same name have the same data type by default. Note that if a situation with the same name but different data types exists, TsFile-Spark-Connector does not guarantee the correctness of the results. - -The writing process is to write a DataFrame as one or more TsFiles. By default, two columns need to be included: time and delta_object. The rest of the columns are used as Measurement. If user wants to write the second table structure back to TsFile, user can set the "delta\_object\_name" parameter(refer to Section 5.1 of Section 5.1 of this manual). 
- -Appendix B: Old Note -NOTE: Check the jar packages in the root directory of your Spark and replace libthrift-0.9.2.jar and libfb303-0.9.2.jar with libthrift-0.9.1.jar and libfb303-0.9.1.jar respectively. diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Telegraf-IoTDB.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Telegraf-IoTDB.md deleted file mode 100644 index cdb7475a4..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Telegraf-IoTDB.md +++ /dev/null @@ -1,110 +0,0 @@ - - -# Telegraf -Telegraf is an open-source agent that facilitates the collection, processing, and transmission of metric data. Developed by InfluxData. -Telegraf includes the following features: -* Plugin Architecture: Telegraf's strength lies in its extensive plugin ecosystem. It supports a wide range of input, output, and processor plugins, allowing seamless integration with various data sources and destinations. -* Data Collection: Telegraf excels in collecting metrics from diverse sources, such as system metrics, logs, databases, and more. Its versatility makes it suitable for monitoring applications, infrastructure, and IoT devices. -* Output Destinations: Once collected, data can be sent to various output destinations, including popular databases like InfluxDB. This flexibility makes Telegraf adaptable to different monitoring and analytics setups. -* Ease of Configuration: Telegraf's configuration is done using TOML files. This simplicity allows users to define inputs, outputs, and processors with ease, making customization straightforward. -* Community and Support: Being open-source, Telegraf benefits from an active community. Users can contribute plugins, report issues, and seek assistance through forums and documentation. - -# Telegraf IoTDB Output Plugin -This output plugin saves Telegraf metrics to an Apache IoTDB backend, supporting session connection and data insertion. - -## Precautions -1. Before using this plugin, please configure the IP address, port number, username, password and other information of the database server, as well as some data type conversion, time unit and other configurations. -2. The path should follow the rule in Chapter 'Syntax Rule' -3. See https://github.com/influxdata/telegraf/tree/master/plugins/outputs/iotdb for how to configure this plugin. - -## Example -Here is an example that demonstrates how to collect cpu data from Telegraf into IoTDB. -1. generate the configuration file by telegraf -``` -telegraf --sample-config --input-filter cpu --output-filter iotdb > cpu_iotdb.conf -``` -2. modify the default cpu inputs plugin configuration -``` -# Read metrics about cpu usage -[[inputs.cpu]] - ## Whether to report per-cpu stats or not - percpu = true - ## Whether to report total system cpu stats or not - totalcpu = true - ## If true, collect raw CPU time metrics - collect_cpu_time = false - ## If true, compute and report the sum of all non-idle CPU states - report_active = false - ## If true and the info is available then add core_id and physical_id tags - core_tags = false - name_override = "root.demo.telgraf.cpu" -``` -3. modify the IoTDB outputs plugin configuration -``` -# Save metrics to an IoTDB Database -[[outputs.iotdb]] - ## Configuration of IoTDB server connection - host = "127.0.0.1" - # port = "6667" - - ## Configuration of authentication - # user = "root" - # password = "root" - - ## Timeout to open a new session. - ## A value of zero means no timeout. 
- # timeout = "5s" - - ## Configuration of type conversion for 64-bit unsigned int - ## IoTDB currently DOES NOT support unsigned integers (version 13.x). - ## 32-bit unsigned integers are safely converted into 64-bit signed integers by the plugin, - ## however, this is not true for 64-bit values in general as overflows may occur. - ## The following setting allows to specify the handling of 64-bit unsigned integers. - ## Available values are: - ## - "int64" -- convert to 64-bit signed integers and accept overflows - ## - "int64_clip" -- convert to 64-bit signed integers and clip the values on overflow to 9,223,372,036,854,775,807 - ## - "text" -- convert to the string representation of the value - # uint64_conversion = "int64_clip" - - ## Configuration of TimeStamp - ## TimeStamp is always saved in 64bits int. timestamp_precision specifies the unit of timestamp. - ## Available value: - ## "second", "millisecond", "microsecond", "nanosecond"(default) - timestamp_precision = "millisecond" - - ## Handling of tags - ## Tags are not fully supported by IoTDB. - ## A guide with suggestions on how to handle tags can be found here: - ## https://iotdb.apache.org/UserGuide/Master/API/InfluxDB-Protocol.html - ## - ## Available values are: - ## - "fields" -- convert tags to fields in the measurement - ## - "device_id" -- attach tags to the device ID - ## - ## For Example, a metric named "root.sg.device" with the tags `tag1: "private"` and `tag2: "working"` and - ## fields `s1: 100` and `s2: "hello"` will result in the following representations in IoTDB - ## - "fields" -- root.sg.device, s1=100, s2="hello", tag1="private", tag2="working" - ## - "device_id" -- root.sg.device.private.working, s1=100, s2="hello" - convert_tags_to = "fields" -``` -4. run telegraf with this configuration file, after some time, the data can be found in IoTDB - diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Thingsboard.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Thingsboard.md deleted file mode 100644 index a67c7d65c..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Thingsboard.md +++ /dev/null @@ -1,99 +0,0 @@ - -# ThingsBoard - -## Product Overview - -1. Introduction to ThingsBoard - - ThingsBoard is an open-source IoT platform that enables rapid development, management, and expansion of IoT projects. For more detailed information, please refer to [ThingsBoard Official Website](https://thingsboard.io/docs/getting-started-guides/what-is-thingsboard/). - - ![](/img/ThingsBoard-en1.png) - -2. Introduction to ThingsBoard-IoTDB - - ThingsBoard IoTDB provides the ability to store data from ThingsBoard to IoTDB, and also supports reading data information from the `root.thingsboard` database in ThingsBoard. The detailed architecture diagram is shown in yellow in the following figure. - -### Relationship Diagram - - ![](/img/Thingsboard-2.png) - -## Installation Requirements - -| **Preparation Content** | **Version Requirements** | -| :---------------------------------------- | :----------------------------------------------------------- | -| JDK | JDK17 or above. Please refer to the downloads on [Oracle Official Website](https://www.oracle.com/java/technologies/downloads/) | -| IoTDB |IoTDB v1.3.0 or above. Please refer to the [Deployment guidance](../Deployment-and-Maintenance/IoTDB-Package.md) | -| ThingsBoard
(IoTDB adapted version) | Please contact Timecho staff to obtain the installation package. Detailed installation steps are provided below. | - -## Installation Steps - -Please refer to the installation steps on [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/),wherein: - -- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/)【 Step 2: ThingsBoard Service Installation 】 Use the installation package provided by your Timecho contact to install the software. Please note that the official ThingsBoard installation package does not support IoTDB. -- [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/) 【Step 3: Configure ThingsBoard Database - ThingsBoard Configuration】 In this step, you need to add environment variables according to the following content - -```Shell -# ThingsBoard original configuration -export SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/thingsboard -export SPRING_DATASOURCE_USERNAME=postgres -export SPRING_DATASOURCE_PASSWORD=PUT_YOUR_POSTGRESQL_PASSWORD_HERE ##Change password to pg - -# To use IoTDB, the following variables need to be modified -export DATABASE_TS_TYPE=iotdb ## Originally configured as SQL, change the variable value to iotdb - - -# To use IoTDB, the following variables need to be added -export DATABASE_TS_LATEST_TYPE=iotdb -export IoTDB_HOST=127.0.0.1 ## The IP address where iotdb is located -export IoTDB_PORT=6667 ## The port number for iotdb is 6667 by default -export IoTDB_USER=root ## The username for iotdb,defaults as root -export IoTDB_PASSWORD=root ## The password for iotdb,default as root -export IoTDB_CONNECTION_TIMEOUT=5000 ## IoTDB timeout setting -export IoTDB_FETCH_SIZE=1024 ## The number of data pulled in a single request is recommended to be set to 1024 -export IoTDB_MAX_SIZE=200 ## The maximum number of sessions in the session pool is recommended to be set to>=concurrent requests -export IoTDB_DATABASE=root.thingsboard ## Thingsboard data is written to the database stored in IoTDB, supporting customization -``` - -## Instructions - -1. Set up devices and connect datasource: Add a new device under "Entities" - "Devices" in Thingsboard and send data to the specified devices through gateway. - - ![](/img/Thingsboard-en2.png) - -2. Set rule chain: Set alarm rules for "SD-032F pump" in the rule chain library and set the rule chain as the root chain. - -
-  -  -
- - -3. View alarm records: The generated alarm records can be found under "Devices" - "Alarms. - - ![](/img/Thingsboard-en5.png) - -4. Data Visualization: Configure datasource and parameters for data visualization. - -
-  -  -
\ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Zeppelin-IoTDB_apache.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Zeppelin-IoTDB_apache.md deleted file mode 100644 index cd9f6bfd3..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Zeppelin-IoTDB_apache.md +++ /dev/null @@ -1,184 +0,0 @@ - - -# Apache Zeppelin - -## About Zeppelin - -Zeppelin is a web-based notebook that enables interactive data analytics. You can connect to data sources and perform interactive operations with SQL, Scala, etc. The operations can be saved as documents, just like Jupyter. Zeppelin has already supported many data sources, including Spark, ElasticSearch, Cassandra, and InfluxDB. Now, we have enabled Zeppelin to operate IoTDB via SQL. - -![iotdb-note-snapshot](/img/github/102752947-520a3e80-43a5-11eb-8fb1-8fac471c8c7e.png) - - - -## Zeppelin-IoTDB Interpreter - -### System Requirements - -| IoTDB Version | Java Version | Zeppelin Version | -| :-----------: | :-----------: | :--------------: | -| >=`0.12.0` | >=`1.8.0_271` | `>=0.9.0` | - -Install IoTDB: Reference to [IoTDB Quick Start](../Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md). Suppose IoTDB is placed at `$IoTDB_HOME`. - -Install Zeppelin: -> Method A. Download directly: You can download [Zeppelin](https://zeppelin.apache.org/download.html#) and unpack the binary package. [netinst](http://www.apache.org/dyn/closer.cgi/zeppelin/zeppelin-0.9.0/zeppelin-0.9.0-bin-netinst.tgz) binary package is recommended since it's relatively small by excluding irrelevant interpreters. -> -> Method B. Compile from source code: Reference to [build Zeppelin from source](https://zeppelin.apache.org/docs/latest/setup/basics/how_to_build.html). The command is `mvn clean package -pl zeppelin-web,zeppelin-server -am -DskipTests`. - -Suppose Zeppelin is placed at `$Zeppelin_HOME`. - -### Build Interpreter - -``` - cd $IoTDB_HOME - mvn clean package -pl iotdb-connector/zeppelin-interpreter -am -DskipTests -P get-jar-with-dependencies -``` - -The interpreter will be in the folder: - -``` - $IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar -``` - - - -### Install Interpreter - -Once you have built your interpreter, create a new folder under the Zeppelin interpreter directory and put the built interpreter into it. - -``` - cd $IoTDB_HOME - mkdir -p $Zeppelin_HOME/interpreter/iotdb - cp $IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar $Zeppelin_HOME/interpreter/iotdb -``` - -### Modify Configuration - -Enter `$Zeppelin_HOME/conf` and use template to create Zeppelin configuration file: - -```shell -cp zeppelin-site.xml.template zeppelin-site.xml -``` - -Open the zeppelin-site.xml file and change the `zeppelin.server.addr` item to `0.0.0.0` - -### Running Zeppelin and IoTDB - -Go to `$Zeppelin_HOME` and start Zeppelin by running: - -``` - ./bin/zeppelin-daemon.sh start -``` - -or in Windows: - -``` - .\bin\zeppelin.cmd -``` - -Go to `$IoTDB_HOME` and start IoTDB server: - -``` - # Unix/OS X - > nohup sbin/start-server.sh >/dev/null 2>&1 & - or - > nohup sbin/start-server.sh -c -rpc_port >/dev/null 2>&1 & - - # Windows - > sbin\start-server.bat -c -rpc_port -``` - - - -## Use Zeppelin-IoTDB - -Wait for Zeppelin server to start, then visit http://127.0.0.1:8080/ - -In the interpreter page: - -1. Click the `Create new node` button -2. Set the note name -3. Configure your interpreter - -Now you are ready to use your interpreter. 
- -![iotdb-create-note](/img/github/102752945-5171a800-43a5-11eb-8614-53b3276a3ce2.png) - -We provide some simple SQL to show the use of Zeppelin-IoTDB interpreter: - -```sql - CREATE DATABASE root.ln.wf01.wt01; - CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN; - CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=PLAIN; - CREATE TIMESERIES root.ln.wf01.wt01.hardware WITH DATATYPE=INT32, ENCODING=PLAIN; - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (1, 1.1, false, 11); - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (2, 2.2, true, 22); - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (3, 3.3, false, 33); - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (4, 4.4, false, 44); - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (5, 5.5, false, 55); - - - SELECT * - FROM root.ln.wf01.wt01 - WHERE time >= 1 - AND time <= 6; -``` - -The screenshot is as follows: - -![iotdb-note-snapshot2](/img/github/102752948-52a2d500-43a5-11eb-9156-0c55667eb4cd.png) - -You can also design more fantasy documents referring to [[1]](https://zeppelin.apache.org/docs/0.9.0/usage/display_system/basic.html) and others. - -The above demo notebook can be found at `$IoTDB_HOME/zeppelin-interpreter/Zeppelin-IoTDB-Demo.zpln`. - - - -## Configuration - -You can configure the connection parameters in http://127.0.0.1:8080/#/interpreter : - -![iotdb-configuration](/img/github/102752940-50407b00-43a5-11eb-94fb-3e3be222183c.png) - -The parameters you can configure are as follows: - -| Property | Default | Description | -| ---------------------------- | --------- | ------------------------------- | -| iotdb.host | 127.0.0.1 | IoTDB server host to connect to | -| iotdb.port | 6667 | IoTDB server port to connect to | -| iotdb.username | root | Username for authentication | -| iotdb.password | root | Password for authentication | -| iotdb.fetchSize | 10000 | Query fetch size | -| iotdb.zoneId | | Zone Id | -| iotdb.enable.rpc.compression | FALSE | Whether enable rpc compression | -| iotdb.time.display.type | default | The time format to display | - diff --git a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Zeppelin-IoTDB_timecho.md b/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Zeppelin-IoTDB_timecho.md deleted file mode 100644 index 134a17b1b..000000000 --- a/src/UserGuide/V1.3.0-2/Ecosystem-Integration/Zeppelin-IoTDB_timecho.md +++ /dev/null @@ -1,184 +0,0 @@ - - -# Apache Zeppelin - -## About Zeppelin - -Zeppelin is a web-based notebook that enables interactive data analytics. You can connect to data sources and perform interactive operations with SQL, Scala, etc. The operations can be saved as documents, just like Jupyter. Zeppelin has already supported many data sources, including Spark, ElasticSearch, Cassandra, and InfluxDB. Now, we have enabled Zeppelin to operate IoTDB via SQL. - -![iotdb-note-snapshot](/img/github/102752947-520a3e80-43a5-11eb-8fb1-8fac471c8c7e.png) - - - -## Zeppelin-IoTDB Interpreter - -### System Requirements - -| IoTDB Version | Java Version | Zeppelin Version | -| :-----------: | :-----------: | :--------------: | -| >=`0.12.0` | >=`1.8.0_271` | `>=0.9.0` | - -Install IoTDB: Reference to [IoTDB Quick Start](../Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md). Suppose IoTDB is placed at `$IoTDB_HOME`. - -Install Zeppelin: -> Method A. 
Download directly: You can download [Zeppelin](https://zeppelin.apache.org/download.html#) and unpack the binary package. [netinst](http://www.apache.org/dyn/closer.cgi/zeppelin/zeppelin-0.9.0/zeppelin-0.9.0-bin-netinst.tgz) binary package is recommended since it's relatively small by excluding irrelevant interpreters. -> -> Method B. Compile from source code: Reference to [build Zeppelin from source](https://zeppelin.apache.org/docs/latest/setup/basics/how_to_build.html). The command is `mvn clean package -pl zeppelin-web,zeppelin-server -am -DskipTests`. - -Suppose Zeppelin is placed at `$Zeppelin_HOME`. - -### Build Interpreter - -``` - cd $IoTDB_HOME - mvn clean package -pl iotdb-connector/zeppelin-interpreter -am -DskipTests -P get-jar-with-dependencies -``` - -The interpreter will be in the folder: - -``` - $IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar -``` - - - -### Install Interpreter - -Once you have built your interpreter, create a new folder under the Zeppelin interpreter directory and put the built interpreter into it. - -``` - cd $IoTDB_HOME - mkdir -p $Zeppelin_HOME/interpreter/iotdb - cp $IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar $Zeppelin_HOME/interpreter/iotdb -``` - -### Modify Configuration - -Enter `$Zeppelin_HOME/conf` and use template to create Zeppelin configuration file: - -```shell -cp zeppelin-site.xml.template zeppelin-site.xml -``` - -Open the zeppelin-site.xml file and change the `zeppelin.server.addr` item to `0.0.0.0` - -### Running Zeppelin and IoTDB - -Go to `$Zeppelin_HOME` and start Zeppelin by running: - -``` - ./bin/zeppelin-daemon.sh start -``` - -or in Windows: - -``` - .\bin\zeppelin.cmd -``` - -Go to `$IoTDB_HOME` and start IoTDB server: - -``` - # Unix/OS X - > nohup sbin/start-server.sh >/dev/null 2>&1 & - or - > nohup sbin/start-server.sh -c -rpc_port >/dev/null 2>&1 & - - # Windows - > sbin\start-server.bat -c -rpc_port -``` - - - -## Use Zeppelin-IoTDB - -Wait for Zeppelin server to start, then visit http://127.0.0.1:8080/ - -In the interpreter page: - -1. Click the `Create new node` button -2. Set the note name -3. Configure your interpreter - -Now you are ready to use your interpreter. 
- -![iotdb-create-note](/img/github/102752945-5171a800-43a5-11eb-8614-53b3276a3ce2.png) - -We provide some simple SQL to show the use of Zeppelin-IoTDB interpreter: - -```sql - CREATE DATABASE root.ln.wf01.wt01; - CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN; - CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=PLAIN; - CREATE TIMESERIES root.ln.wf01.wt01.hardware WITH DATATYPE=INT32, ENCODING=PLAIN; - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (1, 1.1, false, 11); - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (2, 2.2, true, 22); - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (3, 3.3, false, 33); - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (4, 4.4, false, 44); - - INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) - VALUES (5, 5.5, false, 55); - - - SELECT * - FROM root.ln.wf01.wt01 - WHERE time >= 1 - AND time <= 6; -``` - -The screenshot is as follows: - -![iotdb-note-snapshot2](/img/github/102752948-52a2d500-43a5-11eb-9156-0c55667eb4cd.png) - -You can also design more fantasy documents referring to [[1]](https://zeppelin.apache.org/docs/0.9.0/usage/display_system/basic.html) and others. - -The above demo notebook can be found at `$IoTDB_HOME/zeppelin-interpreter/Zeppelin-IoTDB-Demo.zpln`. - - - -## Configuration - -You can configure the connection parameters in http://127.0.0.1:8080/#/interpreter : - -![iotdb-configuration](/img/github/102752940-50407b00-43a5-11eb-94fb-3e3be222183c.png) - -The parameters you can configure are as follows: - -| Property | Default | Description | -| ---------------------------- | --------- | ------------------------------- | -| iotdb.host | 127.0.0.1 | IoTDB server host to connect to | -| iotdb.port | 6667 | IoTDB server port to connect to | -| iotdb.username | root | Username for authentication | -| iotdb.password | root | Password for authentication | -| iotdb.fetchSize | 10000 | Query fetch size | -| iotdb.zoneId | | Zone Id | -| iotdb.enable.rpc.compression | FALSE | Whether enable rpc compression | -| iotdb.time.display.type | default | The time format to display | - diff --git a/src/UserGuide/V1.3.0-2/FAQ/Frequently-asked-questions.md b/src/UserGuide/V1.3.0-2/FAQ/Frequently-asked-questions.md deleted file mode 100644 index 4355ef7e4..000000000 --- a/src/UserGuide/V1.3.0-2/FAQ/Frequently-asked-questions.md +++ /dev/null @@ -1,263 +0,0 @@ - - -# Frequently Asked Questions - -## General FAQ - -### How can I identify my version of IoTDB? - -There are several ways to identify the version of IoTDB that you are using: - -* Launch IoTDB's Command Line Interface: - -``` -> ./start-cli.sh -p 6667 -pw root -u root -h localhost - _____ _________ ______ ______ -|_ _| | _ _ ||_ _ `.|_ _ \ - | | .--.|_/ | | \_| | | `. \ | |_) | - | | / .'`\ \ | | | | | | | __'. - _| |_| \__. | _| |_ _| |_.' /_| |__) | -|_____|'.__.' |_____| |______.'|_______/ version x.x.x -``` - -* Check pom.xml file: - -``` -x.x.x -``` - -* Use JDBC API: - -``` -String iotdbVersion = tsfileDatabaseMetadata.getDatabaseProductVersion(); -``` - -* Use Command Line Interface: - -``` -IoTDB> show version -show version -+---------------+ -|version | -+---------------+ -|x.x.x | -+---------------+ -Total line number = 1 -It costs 0.241s -``` - -### Where can I find IoTDB logs? 
Suppose your root directory is:

```
$ pwd
/workspace/iotdb

$ ls -l
server/
cli/
pom.xml
Readme.md
...
```

Let `$IOTDB_HOME = /workspace/iotdb/server/target/iotdb-server-{project.version}`

Let `$IOTDB_CLI_HOME = /workspace/iotdb/cli/target/iotdb-cli-{project.version}`

By default, the logs are stored under `IOTDB_HOME/logs`. You can change the log level and storage path by configuring `logback.xml` under `IOTDB_HOME/conf`.

### Where can I find IoTDB data files?

By default, the data files (including TsFiles, metadata, and WAL files) are stored under `IOTDB_HOME/data/datanode`.

### How do I know how many time series are stored in IoTDB?

Use IoTDB's Command Line Interface:

```
IoTDB> show timeseries
```

The result contains a `Total timeseries number` entry, which is the number of time series stored in IoTDB.

The current version also supports counting time series directly. Use IoTDB's Command Line Interface:

```
IoTDB> count timeseries
```

If you are using Linux, you can use the following shell command:

```
> grep "0,root" $IOTDB_HOME/data/system/schema/mlog.txt | wc -l
> 6
```

### Can I use Hadoop and Spark to read TsFile in IoTDB?

Yes. IoTDB is tightly integrated with the open-source ecosystem. IoTDB supports [Hadoop](https://github.com/apache/iotdb-extras/tree/master/connectors/hadoop), [Spark](https://github.com/apache/iotdb-extras/tree/master/connectors/spark-iotdb-connector) and the [Grafana](https://github.com/apache/iotdb-extras/tree/master/connectors/grafana-connector) visualization tool.

### How does IoTDB handle duplicate points?

A data point is uniquely identified by a full time series path (e.g. `root.vehicle.d0.s0`) and a timestamp. If you submit a new point with the same path and timestamp as an existing point, IoTDB updates the value of that point instead of inserting a new one.

### How can I tell the data type of a specific time series?

Use the `SHOW TIMESERIES <timeseries path>` SQL statement in IoTDB's Command Line Interface.

For example, to see the type of all time series, the `<timeseries path>` should be `root.**`:

```
IoTDB> show timeseries root.**
```

To query a specific sensor, replace `<timeseries path>` with the sensor's full path:

```
IoTDB> show timeseries root.fit.d1.s1
```

You can also use wildcards in the time series path:

```
IoTDB> show timeseries root.fit.d1.*
```

### How can I change the time display format of IoTDB's CLI?

By default, IoTDB's CLI displays time in a readable format (e.g. `1970-01-01T08:00:00.001`). If you want to display time as raw timestamps or in another format, add the `-disableISO8601` parameter to the start command:

```
> $IOTDB_CLI_HOME/sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root -disableISO8601
```

### How to handle error `IndexOutOfBoundsException` from `org.apache.ratis.grpc.server.GrpcLogAppender`?

This is an internal error log from Ratis 2.4.1, one of our dependencies, and no impact on data writes or reads is expected.
It has been reported to the Ratis community and will be fixed in future releases.

### How to deal with estimated out-of-memory errors?
Reported error message:
```
301: There is not enough memory to execute current fragment instance, current remaining free memory is 86762854, estimated memory usage for current fragment instance is 270139392
```
Error analysis:
The `datanode_memory_proportion` parameter controls the share of memory allocated to queries, and the `chunk_timeseriesmeta_free_memory_proportion` parameter controls the share of that memory available for query execution.
By default, the memory allocated to queries is 30% of the heap memory, and the memory available for query execution is 20% of the query memory.
The error report shows that the current remaining memory available for query execution is 86762854 B ≈ 82.7 MB, while the query is estimated to need 270139392 B ≈ 257.6 MB of execution memory.

Some possible improvements:

- Without changing the default parameters, increase the heap memory of IoTDB to more than about 4.2 GB (≈ 4300 MB): 4300 MB * 30% * 20% = 258 MB > 257.6 MB, which fulfills the requirement.
- Change parameters such as `datanode_memory_proportion` so that the memory available for query execution is greater than 257.6 MB.
- Reduce the number of time series exported by the query.
- Add an `slimit` clause to the query statement, which also reduces the number of queried time series.
- Add `align by device`, which exports data in device order and reduces memory usage to the single-device level.

## FAQ for Cluster Setup

### Cluster StartUp and Stop

#### Failed to start ConfigNode for the first time, how to find the reason?

- Make sure that the data/confignode directory is cleared when starting the ConfigNode for the first time.
- Make sure that the ports used by the ConfigNode are not occupied and do not conflict with those of other ConfigNodes.
- Make sure that `cn_seed_config_node` is configured correctly and points to an alive ConfigNode. If the ConfigNode is started for the first time, make sure that `cn_seed_config_node` points to itself.
- Make sure that the configuration (consensus protocol and replica number) of the started ConfigNode is consistent with that of the `cn_seed_config_node` ConfigNode.

#### The ConfigNode started successfully, but why doesn't the node appear in the results of `show cluster`?

- Examine whether `cn_seed_config_node` points to the correct address. If `cn_seed_config_node` points to itself, a new ConfigNode cluster is started instead.

#### Failed to start DataNode for the first time, how to find the reason?

- Make sure that the data/datanode directory is cleared when starting the DataNode for the first time. If the start result is "Reject DataNode restart.", the data/datanode directory is probably not cleared.
- Make sure that the ports used by the DataNode are not occupied and do not conflict with those of other DataNodes.
- Make sure that `dn_seed_config_node` points to an alive ConfigNode.

#### Failed to remove DataNode, how to find the reason?

- Examine whether the parameter passed to `remove-datanode.sh` is correct; only `rpcIp:rpcPort` and `dataNodeId` are valid parameters.
- The removal can only be executed when the number of available DataNodes in the cluster is greater than max(schema_replication_factor, data_replication_factor).
- Removing a DataNode migrates the data on it to other alive DataNodes.
Data migration is performed Region by Region; if some Regions fail to migrate, the DataNode being removed will stay in the `Removing` status.
- If a DataNode is in the `Removing` status, the Regions on it will also be in the `Removing` or `Unknown` status, both of which are unavailable states. In addition, the DataNode being removed will not receive new write requests from clients.
  Users can use the command `set system status to running` to change the DataNode status from Removing back to Running;
  if users want to bring the Regions from Removing back to an available status, the command `migrate region from datanodeId1 to datanodeId2` can be used to migrate those Regions to other alive DataNodes.
  Besides, IoTDB will publish a `remove-datanode.sh -f` command in the next version, which removes DataNodes forcibly (Regions that failed to migrate will be discarded).

#### Can a DataNode that is down be removed?

- A DataNode that is down can be removed only when the replication factor of both schema and data is greater than 1.
  Besides, IoTDB will publish the `remove-datanode.sh -f` function in the next version.

#### What should be paid attention to when upgrading from 0.13 to 1.0?

- The file structures of 0.13 and 1.0 are different, so the data directory of 0.13 cannot be copied to 1.0 and used directly.
  If you want to load data from 0.13 into 1.0, you can use the LOAD function.
- The default RPC address of 0.13 is `0.0.0.0`, while the default RPC address of 1.0 is `127.0.0.1`.


### Cluster Restart

#### How to restart any ConfigNode in the cluster?

- First step: stop the process with `stop-confignode.sh` or by killing the PID of the ConfigNode.
- Second step: execute `start-confignode.sh` to restart the ConfigNode.

#### How to restart any DataNode in the cluster?

- First step: stop the process with `stop-datanode.sh` or by killing the PID of the DataNode.
- Second step: execute `start-datanode.sh` to restart the DataNode.

#### Is it possible to restart a ConfigNode using its old data directory after it has been removed?

- No. The result will be "Reject ConfigNode restart. Because there are no corresponding ConfigNode(whose nodeId=xx) in the cluster".

#### Is it possible to restart a DataNode using its old data directory after it has been removed?

- No. The result will be "Reject DataNode restart. Because there are no corresponding DataNode(whose nodeId=xx) in the cluster. Possible solutions are as follows:...".

#### Can `start-confignode.sh`/`start-datanode.sh` be executed successfully when the data directory of a given ConfigNode/DataNode is deleted without killing its PID?

- No. The result will be "The port is already occupied".

### Cluster Maintenance

#### How to find the reason when `show cluster` fails and error logs like "please check server status" are shown?

- Make sure that more than half of the ConfigNodes are alive.
- Make sure that the DataNode the client connects to is alive.

#### How to fix a DataNode whose disk files are broken?

- Use `remove-datanode.sh` to fix it. Removing the DataNode migrates the data on it to other alive DataNodes.
- IoTDB will publish node-fix tools in the next version.

#### How to decrease the memory usage of ConfigNode/DataNode?

- Adjust the ON_HEAP_MEMORY and OFF_HEAP_MEMORY options in conf/confignode-env.sh and conf/datanode-env.sh.
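For illustration, a sketch of what such an adjustment could look like in `conf/datanode-env.sh`. The variable names come from the answer above; the values are placeholders and must be sized to your hardware, and the node needs a restart to pick them up.

```shell
# conf/datanode-env.sh (the same variables exist in conf/confignode-env.sh)
# Illustrative values only -- size them to your machine, then restart the node.
ON_HEAP_MEMORY="8G"
OFF_HEAP_MEMORY="2G"
```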
diff --git a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Architecture.md b/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Architecture.md deleted file mode 100644 index ec9aa25ae..000000000 --- a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Architecture.md +++ /dev/null @@ -1,44 +0,0 @@ - - -# System Architecture - -Besides IoTDB engine, we also developed several components to provide better IoT service. All components are referred to below as the IoTDB suite, and IoTDB refers specifically to the IoTDB engine. - -IoTDB suite can provide a series of functions in the real situation such as data collection, data writing, data storage, data query, data visualization and data analysis. Figure 1.1 shows the overall application architecture brought by all the components of the IoTDB suite. - - - -As shown in Figure 1.1, users can use JDBC to import timeseries data collected by sensor on the device to local/remote IoTDB. These timeseries data may be system state data (such as server load and CPU memory, etc.), message queue data, timeseries data from applications, or other timeseries data in the database. Users can also write the data directly to the TsFile (local or on HDFS). - -TsFile could be written to the HDFS, thereby implementing data processing tasks such as abnormality detection and machine learning on the Hadoop or Spark data processing platform. - -For the data written to HDFS or local TsFile, users can use TsFile-Hadoop-Connector or TsFile-Spark-Connector to allow Hadoop or Spark to process data. - -The results of the analysis can be write back to TsFile in the same way. - -Also, IoTDB and TsFile provide client tools to meet the various needs of users in writing and viewing data in SQL form, script form and graphical form. - -IoTDB offers two deployment modes: standalone and cluster. In cluster deployment mode, IoTDB supports automatic failover, ensuring that the system can quickly switch to standby nodes in the event of a node failure. The switch time can be achieved in seconds, thereby minimizing system downtime and ensuring no data loss after the switch. When the faulty node returns to normal, the system will automatically reintegrate it into the cluster, ensuring the cluster's high availability and scalability. - -IoTDB also supports a read-write separation deployment mode, which can allocate read and write operations to different nodes, achieving load balancing and enhancing the system's concurrent processing capability. - -Through these features, IoTDB can avoid single-point performance bottlenecks and single-point failures (SPOF), offering a high-availability and reliable data storage and management solution. diff --git a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Features.md b/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Features.md deleted file mode 100644 index f7af71318..000000000 --- a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Features.md +++ /dev/null @@ -1,58 +0,0 @@ - - -# Features - - -* Flexible deployment. - -IoTDB provides users one-click installation tool on the cloud, once-decompressed-used terminal tool and the bridging tool between cloud platforms and terminal tools (Data Synchronization Tool). - -* Low storage cost. - -IoTDB can reach a high compression ratio of disk storage, which means IoTDB can store the same amount of data with less hardware disk cost. - -* Efficient directory structure. 
- -IoTDB supports efficient oganization for complex timeseries data structure from intelligent networking devices, oganization for timeseries data from devices of the same type, fuzzy searching strategy for massive and complex directory of timeseries data. -* High-throughput read and write. - -IoTDB supports millions of low-power devices' strong connection data access, high-speed data read and write for intelligent networking devices and mixed devices mentioned above. - -* Rich query semantics. - -IoTDB supports time alignment for timeseries data accross devices and sensors, computation in timeseries field (frequency domain transformation) and rich aggregation function support in time dimension. - -* Easy to get started. - -IoTDB supports SQL-Like language, JDBC standard API and import/export tools which are easy to use. - -* Intense integration with Open Source Ecosystem. - -IoTDB supports Hadoop, Spark, etc. analysis ecosystems and Grafana visualization tool. - -* Unified data access mode - -IoTDB eliminates the need for database partitioning or sharding and makes no distinction between historical and real-time databases. - -* High availability support - -IoTDB supports a HA distributed architecture, ensuring 7x24 uninterrupted real-time database services. Users can connect to any node within the cluster for system access. The system remains operational and unaffected even during physical node outages or network failures. As physical nodes are added, removed, or face performance issues, IoTDB automatically manages load balancing for both computational and storage resources. Furthermore, it's compatible with heterogeneous environments, allowing servers of varying types and capabilities to form a cluster, with load balancing optimized based on the specific configurations of each server. diff --git a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/IoTDB-Introduction_apache.md b/src/UserGuide/V1.3.0-2/IoTDB-Introduction/IoTDB-Introduction_apache.md deleted file mode 100644 index 1a1f23ec4..000000000 --- a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/IoTDB-Introduction_apache.md +++ /dev/null @@ -1,77 +0,0 @@ - - -# What is IoTDB - -Apache IoTDB is a low-cost, high-performance native temporal database for the Internet of Things. It can solve various problems encountered by enterprises when building IoT big data platforms to manage time-series data, such as complex application scenarios, large data volumes, high sampling frequencies, high amount of unaligned data, long data processing time, diverse analysis requirements, and high storage and operation costs. - -- Github repository link: https://github.com/apache/iotdb - -- Open source installation package download: https://iotdb.apache.org/zh/Download/ - -- Installation, deployment, and usage documentation: [QuickStart](../QuickStart/QuickStart_apache.md) - - -## Product Components - -IoTDB products consist of several components that help users efficiently manage and analyze the massive amount of time-series data generated by the IoT. - -
- Introduction-en-timecho.png - -
- -1. Time-series Database (Apache IoTDB): The core component for time-series data storage, it provides users with high-compression storage capabilities, rich time-series querying capabilities, real-time stream processing capabilities, and ensures high availability of data and high scalability of clusters. It also offers comprehensive security protection. Additionally, IoTDB provides users with a variety of application tools for easy configuration and management of the system; multi-language APIs and external system application integration capabilities, making it convenient for users to build business applications based on IoTDB. - -2. Time-series Data Standard File Format (Apache TsFile): This file format is specifically designed for time-series data and can efficiently store and query massive amounts of time-series data. Currently, the underlying storage files for modules such as IoTDB and AINode are supported by Apache TsFile. With TsFile, users can uniformly use the same file format for data management during the collection, management, application, and analysis phases, greatly simplifying the entire process from data collection to analysis, and improving the efficiency and convenience of time-series data management. - -3. Time-series Model Training and Inference Integrated Engine (IoTDB AINode): For intelligent analysis scenarios, IoTDB provides the AINode time-series model training and inference integrated engine, which offers a complete set of time-series data analysis tools. The underlying engine supports model training tasks and data management, including machine learning and deep learning. With these tools, users can conduct in-depth analysis of the data stored in IoTDB and extract its value. - - -## Product Features - -TimechoDB has the following advantages and characteristics: - -- Flexible deployment methods: Support for one-click cloud deployment, out-of-the-box use after unzipping at the terminal, and seamless connection between terminal and cloud (data cloud synchronization tool). - -- Low hardware cost storage solution: Supports high compression ratio disk storage, no need to distinguish between historical and real-time databases, unified data management. - -- Hierarchical sensor organization and management: Supports modeling in the system according to the actual hierarchical relationship of devices to achieve alignment with the industrial sensor management structure, and supports directory viewing, search, and other capabilities for hierarchical structures. - -- High throughput data reading and writing: supports access to millions of devices, high-speed data reading and writing, out of unaligned/multi frequency acquisition, and other complex industrial reading and writing scenarios. - -- Rich time series query semantics: Supports a native computation engine for time series data, supports timestamp alignment during queries, provides nearly a hundred built-in aggregation and time series calculation functions, and supports time series feature analysis and AI capabilities. 
- -- Highly available distributed system: Supports HA distributed architecture, the system provides 7*24 hours uninterrupted real-time database services, the failure of a physical node or network fault will not affect the normal operation of the system; supports the addition, deletion, or overheating of physical nodes, the system will automatically perform load balancing of computing/storage resources; supports heterogeneous environments, servers of different types and different performance can form a cluster, and the system will automatically load balance according to the configuration of the physical machine. - -- Extremely low usage and operation threshold: supports SQL like language, provides multi language native secondary development interface, and has a complete tool system such as console. - -- Rich ecological environment docking: Supports docking with big data ecosystem components such as Hadoop, Spark, and supports equipment management and visualization tools such as Grafana, Thingsboard, DataEase. - -## Commercial version - -Timecho provides the original commercial product TimechoDB based on the open source version of Apache IoTDB, providing enterprise level products and services for enterprises and commercial customers. It can solve various problems encountered by enterprises when building IoT big data platforms to manage time-series data, such as complex application scenarios, large data volumes, high sampling frequencies, high amount of unaligned data, long data processing time, diverse analysis requirements, and high storage and operation costs. - -Timecho provides a more diverse range of product features, stronger performance and stability, and a richer set of utility tools based on TimechoDB. It also offers comprehensive enterprise services to users, thereby providing commercial customers with more powerful product capabilities and a higher quality of development, operations, and usage experience. - -- Timecho Official website:https://www.timecho.com/ - -- TimechoDB installation, deployment and usage documentation:[QuickStart](https://www.timecho.com/docs/UserGuide/V1.3.0-2/QuickStart/QuickStart_timecho.html) diff --git a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/IoTDB-Introduction_timecho.md b/src/UserGuide/V1.3.0-2/IoTDB-Introduction/IoTDB-Introduction_timecho.md deleted file mode 100644 index d798b4e63..000000000 --- a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/IoTDB-Introduction_timecho.md +++ /dev/null @@ -1,267 +0,0 @@ - - -# What is TimechoDB - -TimechoDB is a low-cost, high-performance native temporal database for the Internet of Things, provided by Timecho based on the Apache IoTDB community version as an original commercial product. It can solve various problems encountered by enterprises when building IoT big data platforms to manage time-series data, such as complex application scenarios, large data volumes, high sampling frequencies, high amount of unaligned data, long data processing time, diverse analysis requirements, and high storage and operation costs. - -Timecho provides a more diverse range of product features, stronger performance and stability, and a richer set of utility tools based on TimechoDB. It also offers comprehensive enterprise services to users, thereby providing commercial customers with more powerful product capabilities and a higher quality of development, operations, and usage experience. 
- -- Download 、Deployment and Usage:[QuickStart](../QuickStart/QuickStart_timecho.md) - - -## Product Components - -Timecho products is composed of several components, covering the entire time-series data lifecycle from data collection, data management to data analysis & application, helping users efficiently manage and analyze the massive amount of time-series data generated by the IoT. - -
- Introduction-en-timecho-new.png - -
- -1. **Time-series database (TimechoDB, a commercial product based on Apache IoTDB provided by the original team)**: The core component of time-series data storage, which can provide users with high-compression storage capabilities, rich time-series query capabilities, real-time stream processing capabilities, while also having high availability of data and high scalability of clusters, and providing security protection. At the same time, TimechoDB also provides users with a variety of application tools for easy management of the system; multi-language API and external system application integration capabilities, making it convenient for users to build applications based on TimechoDB. - -2. **Time-series data standard file format (Apache TsFile, led and contributed by core team members of Timecho)**: This file format is a storage format specifically designed for time-series data, which can efficiently store and query massive amounts of time-series data. Currently, the underlying storage files of Timecho's collection, storage, and intelligent analysis modules are all supported by Apache TsFile. TsFile can be efficiently loaded into TimechoDB and can also be migrated out. Through TsFile, users can use the same file format for data management in the stages of collection, management, application & analysis, greatly simplifying the entire process from data collection to analysis, and improving the efficiency and convenience of time-series data management. - -3. **Time-series model training and inference integrated engine (AINode)**: For intelligent analysis scenarios, TimechoDB provides the AINode time-series model training and inference integrated engine, which offers a complete set of time-series data analysis tools, with the underlying model training engine supporting training tasks and data management, including machine learning, deep learning, etc. With these tools, users can conduct in-depth analysis of the data stored in TimechoDB and mine its value. - -4. **Data collection**: To more conveniently dock with various industrial collection scenarios, Timecho provides data collection access services, supporting multiple protocols and formats, which can access data generated by various sensors and devices, while also supporting features such as breakpoint resumption and network barrier penetration. It is more adapted to the characteristics of difficult configuration, slow transmission, and weak network in the industrial field collection process, making the user's data collection simpler and more efficient. - -## Product Features - -TimechoDB has the following advantages and characteristics: - -- Flexible deployment methods: Support for one-click cloud deployment, out-of-the-box use after unzipping at the terminal, and seamless connection between terminal and cloud (data cloud synchronization tool). - -- Low hardware cost storage solution: Supports high compression ratio disk storage, no need to distinguish between historical and real-time databases, unified data management. - -- Hierarchical sensor organization and management: Supports modeling in the system according to the actual hierarchical relationship of devices to achieve alignment with the industrial sensor management structure, and supports directory viewing, search, and other capabilities for hierarchical structures. 
- -- High throughput data reading and writing: supports access to millions of devices, high-speed data reading and writing, out of unaligned/multi frequency acquisition, and other complex industrial reading and writing scenarios. - -- Rich time series query semantics: Supports a native computation engine for time series data, supports timestamp alignment during queries, provides nearly a hundred built-in aggregation and time series calculation functions, and supports time series feature analysis and AI capabilities. - -- Highly available distributed system: Supports HA distributed architecture, the system provides 7*24 hours uninterrupted real-time database services, the failure of a physical node or network fault will not affect the normal operation of the system; supports the addition, deletion, or overheating of physical nodes, the system will automatically perform load balancing of computing/storage resources; supports heterogeneous environments, servers of different types and different performance can form a cluster, and the system will automatically load balance according to the configuration of the physical machine. - -- Extremely low usage and operation threshold: supports SQL like language, provides multi language native secondary development interface, and has a complete tool system such as console. - -- Rich ecological environment docking: Supports docking with big data ecosystem components such as Hadoop, Spark, and supports equipment management and visualization tools such as Grafana, Thingsboard, DataEase. - -## Enterprise characteristics - -### Higher level product features - -Building on the open-source version, TimechoDB offers a range of advanced product features, with native upgrades and optimizations at the kernel level for industrial production scenarios. These include multi-level storage, cloud-edge collaboration, visualization tools, and security enhancements, allowing users to focus more on business development without worrying too much about underlying logic. This simplifies and enhances industrial production, bringing more economic benefits to enterprises. For example: - - -- Dual Active Deployment:Dual active usually refers to two independent single machines (or clusters) that perform real-time mirror synchronization. Their configurations are completely independent and can simultaneously receive external writes. Each independent single machine (or clusters) can synchronize the data written to itself to another single machine (or clusters), and the data of the two single machines (or clusters) can achieve final consistency. - -- Data Synchronisation:Through the built-in synchronization module of the database, data can be aggregated from the station to the center, supporting various scenarios such as full aggregation, partial aggregation, and hierarchical aggregation. It can support both real-time data synchronization and batch data synchronization modes. Simultaneously providing multiple built-in plugins to support requirements such as gateway penetration, encrypted transmission, and compressed transmission in enterprise data synchronization applications. - -- Tiered Storage:Multi level storage: By upgrading the underlying storage capacity, data can be divided into different levels such as cold, warm, and hot based on factors such as access frequency and data importance, and stored in different media (such as SSD, mechanical hard drive, cloud storage, etc.). At the same time, the system also performs data scheduling during the query process. 
Thereby reducing customer data storage costs while ensuring data access speed. - -- Security Enhancements: Features like whitelists and audit logs strengthen internal management and reduce the risk of data breaches. - -The detailed functional comparison is as follows: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Category | Function | Apache IoTDB | TimechoDB |
| --- | --- | --- | --- |
| Deployment Mode | Stand-Alone Deployment | ✓ | ✓ |
| Deployment Mode | Distributed Deployment | ✓ | ✓ |
| Deployment Mode | Dual Active Deployment | × | ✓ |
| Deployment Mode | Container Deployment | Partial support | ✓ |
| Database Functionality | Sensor Management | ✓ | ✓ |
| Database Functionality | Write Data | ✓ | ✓ |
| Database Functionality | Query Data | ✓ | ✓ |
| Database Functionality | Continuous Query | ✓ | ✓ |
| Database Functionality | Trigger | ✓ | ✓ |
| Database Functionality | User Defined Function | ✓ | ✓ |
| Database Functionality | Permission Management | ✓ | ✓ |
| Database Functionality | Data Synchronisation | Only file synchronization, no built-in plugins | Real time synchronization + file synchronization, enriched with built-in plugins |
| Database Functionality | Stream Processing | Only framework, no built-in plugins | Framework + rich built-in plugins |
| Database Functionality | Tiered Storage | × | ✓ |
| Database Functionality | View | × | ✓ |
| Database Functionality | White List | × | ✓ |
| Database Functionality | Audit Log | × | ✓ |
| Supporting Tools | Workbench | × | ✓ |
| Supporting Tools | Cluster Management Tool | × | ✓ |
| Supporting Tools | System Monitor Tool | × | ✓ |
| Localization | Localization Compatibility Certification | × | ✓ |
| Technical Support | Best Practices | × | ✓ |
| Technical Support | Use Training | × | ✓ |
- -### More efficient/stable product performance - -TimechoDB has optimized stability and performance on the basis of the open source version. With technical support from the enterprise version, it can achieve more than 10 times performance improvement and has the performance advantage of timely fault recovery. - -### More User-Friendly Tool System - -TimechoDB will provide users with a simpler and more user-friendly tool system. Through products such as the Cluster Monitoring Panel (Grafana), Database Console (Workbench), and Cluster Management Tool (Deploy Tool, abbreviated as IoTD), it will help users quickly deploy, manage, and monitor database clusters, reduce the work/learning costs of operation and maintenance personnel, simplify database operation and maintenance work, and make the operation and maintenance process more convenient and efficient. - -- Cluster Monitoring Panel: Designed to address the monitoring issues of TimechoDB and its operating system, including operating system resource monitoring, TimechoDB performance monitoring, and hundreds of kernel monitoring indicators, to help users monitor the health status of the cluster and perform cluster tuning and operation. - -
(Cluster Monitoring Panel screenshots: Overall Overview, Operating System Resource Monitoring, TimechoDB Performance Monitoring)
- -- Database Console: Designed to provide a low threshold database interaction tool, it helps users perform metadata management, data addition, deletion, modification, query, permission management, system management, and other operations in a concise and clear manner through an interface console, simplifying the difficulty of database use and improving database efficiency. - - -
(Database Console screenshots: Home Page, Operate Metadata, SQL Query)
- - -- Cluster management tool: aimed at solving the operational difficulties of multi node distributed systems, mainly including cluster deployment, cluster start stop, elastic expansion, configuration updates, data export and other functions, so as to achieve one click instruction issuance for complex database clusters, greatly reducing management difficulty. - - -
- -### More professional enterprise technical services - -TimechoDB customers provide powerful original factory services, including but not limited to on-site installation and training, expert consultant consultation, on-site emergency assistance, software upgrades, online self-service, remote support, and guidance on using the latest development version. At the same time, in order to make TimechoDB more suitable for industrial production scenarios, we will recommend modeling solutions, optimize read-write performance, optimize compression ratios, recommend database configurations, and provide other technical support based on the actual data structure and read-write load of the enterprise. If encountering industrial customization scenarios that are not covered by some products, TimechoDB will provide customized development tools based on user characteristics. - -Compared to the open source version, TimechoDB provides a faster release frequency every 2-3 months. At the same time, it offers day level exclusive fixes for urgent customer issues to ensure stable production environments. - -### More compatible localization adaptation - -The TimechoDB code is self-developed and controllable, and is compatible with most mainstream information and creative products (CPU, operating system, etc.), and has completed compatibility certification with multiple manufacturers to ensure product compliance and security. \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Performance.md b/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Performance.md deleted file mode 100644 index a428d141b..000000000 --- a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Performance.md +++ /dev/null @@ -1,38 +0,0 @@ - - -# Performance - -This chapter introduces the performance characteristics of IoTDB from the perspectives of database connection, database read and write performance, and storage performance. -The test tool uses IoTDBBenchmark, an open source time series database benchmark tool. - -## Database connection - -- Support high concurrent connections, a single server can support tens of thousands of concurrent connections per second. - - -## Read and write performance - -- It has the characteristics of high write throughput, a single core can handle more than tens of thousands of write requests per second, and the write performance of a single server can reach tens of millions of points per second; the cluster can be linearly scaled, and the write performance of the cluster can reach hundreds of millions points/second. -- It has the characteristics of high query throughput and low query latency, a single server supports tens of millions of points/second query throughput, and can aggregate tens of billions of data points in milliseconds. -- -## Storage performance - -- Supports the storage of massive data, with the storage and processing capabilities of PB-level data. -- Support high compression ratio, lossless compression can reach 20 times compression ratio, lossy compression can reach 100 times compression ratio. \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Publication.md b/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Publication.md deleted file mode 100644 index 1f1832ef0..000000000 --- a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Publication.md +++ /dev/null @@ -1,42 +0,0 @@ - - -# Academic Achievement - -Apache IoTDB starts at Tsinghua University, School of Software. 
IoTDB is a database for managing large amount of time series data with columnar storage, data encoding, pre-computation, and index techniques. It has SQL-like interface to write millions of data points per second per node and is optimized to get query results in few seconds over trillions of data points. It can also be easily integrated with Apache Hadoop MapReduce and Apache Spark for analytics. - -The research papers related are as follows: -* [Apache IoTDB: A Time Series Database for IoT Applications](https://sxsong.github.io/doc/23sigmod-iotdb.pdf), Chen Wang, Jialin Qiao, Xiangdong Huang, Shaoxu Song, Haonan Hou, Tian Jiang, Lei Rui, Jianmin Wang, Jiaguang Sun. SIGMOD 2023. -* [Grouping Time Series for Efficient Columnar Storage](https://sxsong.github.io/doc/23sigmod-group.pdf), Chenguang Fang, Shaoxu Song, Haoquan Guan, Xiangdong Huang, Chen Wang, Jianmin Wang. SIGMOD 2023. -* [Learning Autoregressive Model in LSM-Tree based Store](https://sxsong.github.io/doc/23kdd.pdf), Yunxiang Su, Wenxuan Ma, Shaoxu Song. SIGMOD 2023. -* [TsQuality: Measuring Time Series Data Quality in Apache IoTDB](https://sxsong.github.io/doc/23vldb-qaulity.pdf), Yuanhui Qiu, Chenguang Fang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang. VLDB 2023. -* [Frequency Domain Data Encoding in Apache IoTDB](https://sxsong.github.io/doc/22vldb-frequency.pdf), Haoyu Wang, Shaoxu Song. VLDB 2023. -* [Non-Blocking Raft for High Throughput IoT Data](https://sxsong.github.io/doc/23icde-raft.pdf), Tian Jiang, Xiangdong Huang, Shaoxu Song, Chen Wang, Jianmin Wang, Ruibo Li, Jincheng Sun. ICDE 2023. -* [Backward-Sort for Time Series in Apache IoTDB](https://sxsong.github.io/doc/23icde-sort.pdf), Xiaojian Zhang, Hongyin Zhang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang. ICDE 2023. -* [Time Series Data Encoding for Efficient Storage: A Comparative Analysis in Apache IoTDB](https://sxsong.github.io/doc/22vldb-encoding.pdf), Jinzhao Xiao, Yuxiang Huang, Changyu Hu, Shaoxu Song, Xiangdong Huang, Jianmin Wang. VLDB 2022. -* [Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree](https://sxsong.github.io/doc/22icde-separation.pdf), Yuyuan Kang, Xiangdong Huang, Shaoxu Song, Lingzhe Zhang, Jialin Qiao, Chen Wang, Jianmin Wang, Julian Feinauer. ICDE 2022. -* [Dual-PISA: An index for aggregation operations on time series data](https://www.sciencedirect.com/science/article/pii/S0306437918305489), Jialin Qiao, Xiangdong Huang, Jianmin Wang, Raymond K Wong. IS 2020. -* [Apache IoTDB: time-series database for internet of things](http://www.vldb.org/pvldb/vol13/p2901-wang.pdf), Chen Wang, Xiangdong Huang, Jialin Qiao, Tian Jiang, Lei Rui, Jinrui Zhang, Rong Kang, Julian Feinauer, Kevin A. McGrail, Peng Wang, Jun Yuan, Jianmin Wang, Jiaguang Sun. VLDB 2020. -* [KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping](https://www.semanticscholar.org/paper/KV-match%3A-A-Subsequence-Matching-Approach-and-Time-Wu-Wang/9ed84cb15b7e5052028fc5b4d667248713ac8592), Jiaye Wu and Peng Wang and Chen Wang and Wei Wang and Jianmin Wang. ICDE 2019. -* [The Design of Apache IoTDB distributed framework](http://ndbc2019.sdu.edu.cn/info/1002/1044.htm), Tianan Li, Jianmin Wang, Xiangdong Huang, Yi Xu, Dongfang Mao, Jun Yuan. NDBC 2019. -* [Matching Consecutive Subpatterns over Streaming Time Series](https://link.springer.com/chapter/10.1007/978-3-319-96893-3_8), Rong Kang and Chen Wang and Peng Wang and Yuting Ding and Jianmin Wang. APWeb/WAIM 2018. 
-* [PISA: An Index for Aggregating Big Time Series Data](https://dl.acm.org/citation.cfm?id=2983775&dl=ACM&coll=DL), Xiangdong Huang and Jianmin Wang and Raymond K. Wong and Jinrui Zhang and Chen Wang. CIKM 2016. - diff --git a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Scenario.md b/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Scenario.md deleted file mode 100644 index c3d82e010..000000000 --- a/src/UserGuide/V1.3.0-2/IoTDB-Introduction/Scenario.md +++ /dev/null @@ -1,94 +0,0 @@ - - -# Scenario - -## Application 1: Internet of Vehicles - -### Background - -> - Challenge: a large number of vehicles and time series - -A car company has a huge business volume and needs to deal with a large number of vehicles and a large amount of data. It has hundreds of millions of data measurement points, over ten million new data points per second, millisecond-level collection frequency, posing high requirements on real-time writing, storage and processing of databases. - -In the original architecture, the HBase cluster was used as the storage database. The query delay was high, and the system maintenance was difficult and costly. The HBase cluster cannot meet the demand. On the contrary, IoTDB supports high-frequency data writing with millions of measurement points and millisecond-level query response speed. The efficient data processing capability allows users to obtain the required data quickly and accurately. Therefore, IoTDB is chosen as the data storage layer, which has a lightweight architecture, reduces operation and maintenance costs, and supports elastic expansion and contraction and high availability to ensure system stability and availability. - -### Architecture - -The data management architecture of the car company using IoTDB as the time-series data storage engine is shown in the figure below. - - -![img](/img/architecture1.png) - -The vehicle data is encoded based on TCP and industrial protocols and sent to the edge gateway, and the gateway sends the data to the message queue Kafka cluster, decoupling the two ends of production and consumption. Kafka sends data to Flink for real-time processing, and the processed data is written into IoTDB. Both historical data and latest data are queried in IoTDB, and finally the data flows into the visualization platform through API for application. - -## Application 2: Intelligent Operation and Maintenance - -### Background - -A steel factory aims to build a low-cost, large-scale access-capable remote intelligent operation and maintenance software and hardware platform, access hundreds of production lines, more than one million devices, and tens of millions of time series, to achieve remote coverage of intelligent operation and maintenance. - -There are many challenges in this process: - -> - Wide variety of devices, protocols, and data types -> - Time series data, especially high-frequency data, has a huge amount of data -> - The reading and writing speed of massive time series data cannot meet business needs -> - Existing time series data management components cannot meet various advanced application requirements - -After selecting IoTDB as the storage database of the intelligent operation and maintenance platform, it can stably write multi-frequency and high-frequency acquisition data, covering the entire steel process, and use a composite compression algorithm to reduce the data size by more than 10 times, saving costs. 
IoTDB also effectively supports downsampling query of historical data of more than 10 years, helping enterprises to mine data trends and assist enterprises in long-term strategic analysis. - -### Architecture - -The figure below shows the architecture design of the intelligent operation and maintenance platform of the steel plant. - -![img](/img/architecture2.jpg) - -## Application 3: Smart Factory - -### Background - -> - Challenge:Cloud-edge collaboration - -A cigarette factory hopes to upgrade from a "traditional factory" to a "high-end factory". It uses the Internet of Things and equipment monitoring technology to strengthen information management and services to realize the free flow of data within the enterprise and to help improve productivity and lower operating costs. - -### Architecture - -The figure below shows the factory's IoT system architecture. IoTDB runs through the three-level IoT platform of the company, factory, and workshop to realize unified joint debugging and joint control of equipment. The data at the workshop level is collected, processed and stored in real time through the IoTDB at the edge layer, and a series of analysis tasks are realized. The preprocessed data is sent to the IoTDB at the platform layer for data governance at the business level, such as device management, connection management, and service support. Eventually, the data will be integrated into the IoTDB at the group level for comprehensive analysis and decision-making across the organization. - -![img](/img/architecture3.jpg) - - -## Application 4: Condition monitoring - -### Background - -> - Challenge: Smart heating, cost reduction and efficiency increase - -A power plant needs to monitor tens of thousands of measuring points of main and auxiliary equipment such as fan boiler equipment, generators, and substation equipment. In the previous heating process, there was a lack of prediction of the heat supply in the next stage, resulting in ineffective heating, overheating, and insufficient heating. - -After using IoTDB as the storage and analysis engine, combined with meteorological data, building control data, household control data, heat exchange station data, official website data, heat source side data, etc., all data are time-aligned in IoTDB to provide reliable data basis to realize smart heating. At the same time, it also solves the problem of monitoring the working conditions of various important components in the relevant heating process, such as on-demand billing and pipe network,heating station, etc., to reduce manpower input. - -### Architecture - -The figure below shows the data management architecture of the power plant in the heating scene. - -![img](/img/architecture4.jpg) - diff --git a/src/UserGuide/V1.3.0-2/QuickStart/QuickStart.md b/src/UserGuide/V1.3.0-2/QuickStart/QuickStart.md deleted file mode 100644 index d94d5e0c9..000000000 --- a/src/UserGuide/V1.3.0-2/QuickStart/QuickStart.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -redirectTo: QuickStart_apache.html ---- - diff --git a/src/UserGuide/V1.3.0-2/QuickStart/QuickStart_apache.md b/src/UserGuide/V1.3.0-2/QuickStart/QuickStart_apache.md deleted file mode 100644 index ce71c069b..000000000 --- a/src/UserGuide/V1.3.0-2/QuickStart/QuickStart_apache.md +++ /dev/null @@ -1,80 +0,0 @@ - -# Quick Start - -This document will help you understand how to quickly get started with IoTDB. - -## How to install and deploy? - -This document will help you quickly install and deploy IoTDB. 
You can quickly locate the content you need to view through the following document links: - -1. Prepare the necessary machine resources:The deployment and operation of IoTDB require consideration of multiple aspects of machine resource configuration. Specific resource allocation can be viewed [Database Resources](../Deployment-and-Maintenance/Database-Resources.md) - -2. Complete system configuration preparation:The system configuration of IoTDB involves multiple aspects, and the key system configuration introductions can be viewed [System Requirements](../Deployment-and-Maintenance/Environment-Requirements.md) - -3. Get installation package:You can visit [Apache IoTDB official website](https://iotdb.apache.org/zh/Download/ ) Get the IoTDB installation package.The specific installation package structure can be viewed: [Obtain IoTDB](../Deployment-and-Maintenance/IoTDB-Package_apache.md) - -4. Install database: You can choose the following tutorials for installation and deployment based on the actual deployment architecture: - - - Stand-Alone Deployment: [Stand-Alone Deployment](../Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md) - - - Cluster Deployment: [Cluster Deployment](../Deployment-and-Maintenance/Cluster-Deployment_apache.md) - -> ❗️Attention: Currently, we still recommend installing and deploying directly on physical/virtual machines. If Docker deployment is required, please refer to: [Docker Deployment](../Deployment-and-Maintenance/Docker-Deployment_apache.md) - -## How to use it? - -1. Database modeling design: Database modeling is an important step in creating a database system, which involves designing the structure and relationships of data to ensure that the organization of data meets the specific application requirements. The following document will help you quickly understand the modeling design of IoTDB: - - - Introduction to the concept of timeseries: [Navigating Time Series Data](../Basic-Concept/Navigating_Time_Series_Data.md) - - - Introduction to Modeling Design: [Data Model](../Basic-Concept/Data-Model-and-Terminology.md) - - - SQL syntax introduction: [Operate Metadata](../User-Manual/Operate-Metadata_apache.md) - -2. Write Data: In terms of data writing, IoTDB provides multiple ways to insert real-time data. Please refer to the basic data writing operations for details [Write Data](../User-Manual/Write-Delete-Data.md) - -3. Query Data: IoTDB provides rich data query functions. Please refer to the basic introduction of data query [Query Data](../User-Manual/Query-Data.md) - -4. Other advanced features: In addition to common functions such as writing and querying in databases, IoTDB also supports "Data Synchronisation、Stream Framework、Database Administration " and other functions, specific usage methods can be found in the specific document: - - - Data Synchronisation: [Data Synchronisation](../User-Manual/Data-Sync_apache.md) - - - Stream Framework: [Stream Framework](../User-Manual/Streaming_apache.md) - - - Database Administration: [Database Administration](../User-Manual/Authority-Management.md) - -5. API: IoTDB provides multiple application programming interfaces (API) for developers to interact with IoTDB in their applications, and currently supports [Java Native API](../API/Programming-Java-Native-API.md)、[Python Native API](../API/Programming-Python-Native-API.md)、[C++ Native API](../API/Programming-Cpp-Native-API.md) ,For more API, please refer to the official website 【API】 and other chapters - -## What other convenient tools are available? 
- -In addition to its rich features, IoTDB also has a comprehensive range of tools in its surrounding system. This document will help you quickly use the peripheral tool system : - - - Benchmark Tool: IoT benchmark is a time series database benchmark testing tool developed based on Java and big data environments, developed and open sourced by the School of Software at Tsinghua University. It supports multiple writing and querying methods, can store test information and results for further query or analysis, and supports integration with Tableau to visualize test results. For specific usage instructions, please refer to: [Benchmark Tool](../Tools-System/Benchmark.md) - - - Data Import Export Script: Used to achieve the interaction between internal data and external files in IoTDB, suitable for batch operations of individual files or directory files. For specific usage instructions, please refer to: [Data Import Export Script](../Tools-System/Data-Import-Export-Tool.md) - - - TsFile Import Export Script: For different scenarios, IoTDB provides users with multiple ways to batch import data. For specific usage instructions, please refer to: [TsFile Import Export Script](../Tools-System/TsFile-Import-Export-Tool.md) - -## Encountering problems during use? - -If you encounter difficulties during installation or use, you can move to [Frequently Asked Questions](../FAQ/Frequently-asked-questions.md) View in the middle - diff --git a/src/UserGuide/V1.3.0-2/QuickStart/QuickStart_timecho.md b/src/UserGuide/V1.3.0-2/QuickStart/QuickStart_timecho.md deleted file mode 100644 index cea50ceb5..000000000 --- a/src/UserGuide/V1.3.0-2/QuickStart/QuickStart_timecho.md +++ /dev/null @@ -1,91 +0,0 @@ - -# Quick Start - -This document will help you understand how to quickly get started with IoTDB. - -## How to install and deploy? - -This document will help you quickly install and deploy IoTDB. You can quickly locate the content you need to view through the following document links: - -1. Prepare the necessary machine resources: The deployment and operation of IoTDB require consideration of multiple aspects of machine resource configuration. Specific resource allocation can be viewed [Database Resources](../Deployment-and-Maintenance/Database-Resources.md) - -2. Complete system configuration preparation: The system configuration of IoTDB involves multiple aspects, and the key system configuration introductions can be viewed [System Requirements](../Deployment-and-Maintenance/Environment-Requirements.md) - -3. Get installation package: You can contact Timecho Business to obtain the IoTDB installation package to ensure that the downloaded version is the latest and stable. The specific installation package structure can be viewed: [Obtain TimechoDB](../Deployment-and-Maintenance/IoTDB-Package_timecho.md) - -4. Install database and activate: You can choose the following tutorials for installation and deployment based on the actual deployment architecture: - - - Stand-Alone Deployment: [Stand-Alone Deployment](../Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md) - - - Cluster Deployment: [Cluster Deployment](../Deployment-and-Maintenance/Cluster-Deployment_timecho.md) - - - Dual Active Deployment: [Dual Active Deployment](../Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md) - -> ❗️Attention: Currently, we still recommend installing and deploying directly on physical/virtual machines. 
If Docker deployment is required, please refer to: [Docker Deployment](../Deployment-and-Maintenance/Docker-Deployment_timecho.md) - -5. Install database supporting tools: The enterprise version database provides a monitoring panel 、Workbench Supporting tools, etc,It is recommended to install IoTDB when deploying the enterprise version, which can help you use IoTDB more conveniently: - - - Monitoring panel:Provides over a hundred database monitoring metrics for detailed monitoring of IoTDB and its operating system, enabling system optimization, performance optimization, bottleneck discovery, and more. The installation steps can be viewed [Monitoring panel](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) - - - Workbench: It is the visual interface of IoTDB,Support providing through interface interaction Operate Metadata、Query Data、Data Visualization and other functions, help users use the database easily and efficiently, and the installation steps can be viewed [Workbench Deployment](../Deployment-and-Maintenance/workbench-deployment_timecho.md) - -## How to use it? - -1. Database modeling design: Database modeling is an important step in creating a database system, which involves designing the structure and relationships of data to ensure that the organization of data meets the specific application requirements. The following document will help you quickly understand the modeling design of IoTDB: - - - Introduction to the concept of timeseries:[Navigating Time Series Data](../Basic-Concept/Navigating_Time_Series_Data.md) - - - Introduction to Modeling Design: [Data Model](../Basic-Concept/Data-Model-and-Terminology.md) - - - SQL syntax introduction:[Operate Metadata](../User-Manual/Operate-Metadata_timecho.md) - -2. Write Data: In terms of data writing, IoTDB provides multiple ways to insert real-time data. Please refer to the basic data writing operations for details [Write Data](../User-Manual/Write-Delete-Data.md) - -3. Query Data: IoTDB provides rich data query functions. Please refer to the basic introduction of data query [Query Data](../User-Manual/Query-Data.md) - -4. Other advanced features: In addition to common functions such as writing and querying in databases, IoTDB also supports "Data Synchronisation、Stream Framework、Security Management、Database Administration、AI Capability"and other functions, specific usage methods can be found in the specific document: - - - Data Synchronisation: [Data Synchronisation](../User-Manual/Data-Sync_timecho.md) - - - Stream Framework: [Stream Framework](../User-Manual/Streaming_timecho.md) - - - Security Management: [Security Management](../User-Manual/Security-Management_timecho.md) - - - Database Administration: [Database Administration](../User-Manual/Authority-Management.md) - - - AI Capability :[AI Capability](../User-Manual/AINode_timecho.md) - -5. API: IoTDB provides multiple application programming interfaces (API) for developers to interact with IoTDB in their applications, and currently supports[ Java Native API](../API/Programming-Java-Native-API.md)、[Python Native API](../API/Programming-Python-Native-API.md)、[C++ Native API](../API/Programming-Cpp-Native-API.md)、[Go Native API](../API/Programming-Go-Native-API.md), For more API, please refer to the official website 【API】 and other chapters - -## What other convenient tools are available? - -In addition to its rich features, IoTDB also has a comprehensive range of tools in its surrounding system. 
## What other convenient tools are available?

In addition to its rich features, IoTDB is surrounded by a comprehensive tool ecosystem. This document will help you quickly start using these peripheral tools:

 - Benchmark Tool: IoT-benchmark is a time series database benchmarking tool based on Java and big data environments, developed and open-sourced by the School of Software at Tsinghua University. It supports multiple write and query modes, can store test configurations and results for later query or analysis, and integrates with Tableau to visualize test results. For specific usage instructions, please refer to: [Benchmark Tool](../Tools-System/Benchmark.md)

 - Data Import Export Script: Moves data between IoTDB and external files, and is suitable for batch operations on a single file or a whole directory of files. For specific usage instructions, please refer to: [Data Import Export Script](../Tools-System/Data-Import-Export-Tool.md)

 - TsFile Import Export Script: IoTDB provides several ways to batch import data for different scenarios. For specific usage instructions, please refer to: [TsFile Import Export Script](../Tools-System/TsFile-Import-Export-Tool.md)

## Encountering problems during use?

If you run into difficulties during installation or use, please see [Frequently Asked Questions](../FAQ/Frequently-asked-questions.md).
\ No newline at end of file
diff --git a/src/UserGuide/V1.3.0-2/Reference/Common-Config-Manual.md b/src/UserGuide/V1.3.0-2/Reference/Common-Config-Manual.md
deleted file mode 100644
index 2c84562db..000000000
--- a/src/UserGuide/V1.3.0-2/Reference/Common-Config-Manual.md
+++ /dev/null
@@ -1,2065 +0,0 @@

# Common Configuration

IoTDB common configuration files for ConfigNode and DataNode are under `conf`.

* `iotdb-common.properties`: IoTDB system configurations.

## Effective

Configuration parameters take effect in one of the following three ways:

+ **Only allowed to be modified in first start up:** Cannot be modified after the first start, otherwise the ConfigNode/DataNode cannot start.
+ **After restarting system:** Can be modified after the ConfigNode/DataNode first starts, but takes effect only after a restart.
+ **hot-load:** Can be modified while the ConfigNode/DataNode is running, and is triggered by sending the SQL command `load configuration` or `set configuration` to the IoTDB server through the client or a session.
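For example, a hot-load parameter can be updated without restarting the node. The sketch below assumes you are connected through the CLI; `enable_seq_space_compaction` is one of the parameters marked hot-load later in this manual, and the exact quoting of `set configuration` may vary between releases, so treat it as illustrative.

```sql
-- re-read the modifiable parameters from the properties files without restarting
load configuration;

-- individual hot-load parameters can also be changed at runtime with `set configuration`;
-- check the reference of your release for the exact syntax, e.g. (assumed form):
-- set configuration "enable_seq_space_compaction"="false";
```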
- -## Configuration File - -### Replication Configuration - -* config\_node\_consensus\_protocol\_class - -| Name | config\_node\_consensus\_protocol\_class | -|:-----------:|:-----------------------------------------------------------------------| -| Description | Consensus protocol of ConfigNode replicas, only support RatisConsensus | -| Type | String | -| Default | org.apache.iotdb.consensus.ratis.RatisConsensus | -| Effective | Only allowed to be modified in first start up | - -* schema\_replication\_factor - -| Name | schema\_replication\_factor | -|:-----------:|:-----------------------------------------------------------------| -| Description | Schema replication num | -| Type | int32 | -| Default | 1 | -| Effective | Take effect on **new created Databases** after restarting system | - -* schema\_region\_consensus\_protocol\_class - -| Name | schema\_region\_consensus\_protocol\_class | -|:-----------:|:--------------------------------------------------------------------------------------------------------------------------------------------:| -| Description | Consensus protocol of schema replicas,larger than 1 replicas could only use RatisConsensus | -| Type | String | -| Default | org.apache.iotdb.consensus.ratis.RatisConsensus | -| Effective | Only allowed to be modified in first start up | - -* data\_replication\_factor - -| Name | data\_replication\_factor | -|:-----------:|:-----------------------------------------------------------------| -| Description | Data replication num | -| Type | int32 | -| Default | 1 | -| Effective | Take effect on **new created Databases** after restarting system | - -* data\_region\_consensus\_protocol\_class - -| Name | data\_region\_consensus\_protocol\_class | -|:-----------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------| -| Description | Consensus protocol of data replicasa,larger than 1 replicas could use IoTConsensus or RatisConsensus | -| Type | String | -| Default | org.apache.iotdb.consensus.simple.SimpleConsensus | -| Effective | Only allowed to be modified in first start up | - -### Load balancing Configuration - -* series\_partition\_slot\_num - -| Name | series\_slot\_num | -|:-----------:|:----------------------------------------------| -| Description | Slot num of series partition | -| Type | int32 | -| Default | 10000 | -| Effective | Only allowed to be modified in first start up | - -* series\_partition\_executor\_class - -| Name | series\_partition\_executor\_class | -|:-----------:|:------------------------------------------------------------------| -| Description | Series partition hash function | -| Type | String | -| Default | org.apache.iotdb.commons.partition.executor.hash.BKDRHashExecutor | -| Effective | Only allowed to be modified in first start up | - -* schema\_region\_group\_extension\_policy - -| Name | schema\_region\_group\_extension\_policy | -|:-----------:|:------------------------------------------| -| Description | The extension policy of SchemaRegionGroup | -| Type | string | -| Default | AUTO | -| Effective | After restarting system | - -* default\_schema\_region\_group\_num\_per\_database - -| Name | default\_schema\_region\_group\_num\_per\_database | -|:-----------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| 
Description | The number of SchemaRegionGroups that each Database has when using the CUSTOM-SchemaRegionGroup extension policy. The least number of SchemaRegionGroups that each Database has when using the AUTO-SchemaRegionGroup extension policy. | -| Type | int | -| Default | 1 | -| Effective | After restarting system | - -* schema\_region\_per\_data\_node - -| Name | schema\_region\_per\_data\_node | -|:-----------:|:---------------------------------------------------------------------------| -| Description | The maximum number of SchemaRegion expected to be managed by each DataNode | -| Type | double | -| Default | 1.0 | -| Effective | After restarting system | - -* data\_region\_group\_extension\_policy - -| Name | data\_region\_group\_extension\_policy | -|:-----------:|:----------------------------------------| -| Description | The extension policy of DataRegionGroup | -| Type | string | -| Default | AUTO | -| Effective | After restarting system | - -* default\_data\_region\_group\_num\_per\_database - -| Name | default\_data\_region\_group\_num\_per\_database | -|:-----------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Description | The number of DataRegionGroups that each Database has when using the CUSTOM-DataRegionGroup extension policy. The least number of DataRegionGroups that each Database has when using the AUTO-DataRegionGroup extension policy. | -| Type | int | -| Default | 1 | -| Effective | After restarting system | - -* data\_region\_per\_processor - -| Name | data\_region\_per\_processor | -|:-----------:|:--------------------------------------------------------------------------| -| Description | The maximum number of DataRegion expected to be managed by each processor | -| Type | double | -| Default | 1.0 | -| Effective | After restarting system | - -* enable\_data\_partition\_inherit\_policy - -| Name | enable\_data\_partition\_inherit\_policy | -|:-----------:|:---------------------------------------------------| -| Description | Whether to enable the DataPartition inherit policy | -| Type | Boolean | -| Default | false | -| Effective | After restarting system | - -* leader\_distribution\_policy - -| Name | leader\_distribution\_policy | -|:-----------:|:--------------------------------------------------------| -| Description | The policy of cluster RegionGroups' leader distribution | -| Type | String | -| Default | MIN_COST_FLOW | -| Effective | After restarting system | - -* enable\_auto\_leader\_balance\_for\_ratis - -| Name | enable\_auto\_leader\_balance\_for\_ratis\_consensus | -|:-----------:|:-------------------------------------------------------------------| -| Description | Whether to enable auto leader balance for Ratis consensus protocol | -| Type | Boolean | -| Default | false | -| Effective | After restarting system | - -* enable\_auto\_leader\_balance\_for\_iot\_consensus - -| Name | enable\_auto\_leader\_balance\_for\_iot\_consensus | -|:-----------:|:----------------------------------------------------------------| -| Description | Whether to enable auto leader balance for IoTConsensus protocol | -| Type | Boolean | -| Default | true | -| Effective | After restarting system | - -### Cluster Management - -* time\_partition\_interval - -| Name | time\_partition\_interval | -|:-----------:|:--------------------------------------------------------------| -| Description 
| Time partition interval of data when ConfigNode allocate data | -| Type | Long | -| Unit | ms | -| Default | 604800000 | -| Effective | Only allowed to be modified in first start up | - -* heartbeat\_interval\_in\_ms - -| Name | heartbeat\_interval\_in\_ms | -|:-----------:|:----------------------------------------| -| Description | Heartbeat interval in the cluster nodes | -| Type | Long | -| Unit | ms | -| Default | 1000 | -| Effective | After restarting system | - -* disk\_space\_warning\_threshold - -| Name | disk\_space\_warning\_threshold | -|:-----------:|:--------------------------------| -| Description | Disk remaining threshold | -| Type | double(percentage) | -| Default | 0.05 | -| Effective | After restarting system | - -### Memory Control Configuration - -* datanode\_memory\_proportion - -|Name| datanode\_memory\_proportion | -|:---:|:-------------------------------------------------------------------------------------------------------------| -|Description| Memory Allocation Ratio: StorageEngine, QueryEngine, SchemaEngine, StreamingEngine, Consensus and Free Memory | -|Type| Ratio | -|Default| 3:3:1:1:1:1 | -|Effective| After restarting system | - -* schema\_memory\_allocate\_proportion - -|Name| schema\_memory\_allocate\_proportion | -|:---:|:----------------------------------------------------------------------------------------| -|Description| Schema Memory Allocation Ratio: SchemaRegion, SchemaCache, PartitionCache and LastCache | -|Type| Ratio | -|Default| 5:3:1:1 | -|Effective| After restarting system | - -* storage\_engine\_memory\_proportion - -|Name| storage\_engine\_memory\_proportion | -|:---:|:------------------------------------| -|Description| Memory allocation ratio in StorageEngine: Write, Compaction | -|Type| Ratio | -|Default| 8:2 | -|Effective| After restarting system | - -* write\_memory\_proportion - -|Name| write\_memory\_proportion | -|:---:|:----------------------------------------------------------------| -|Description| Memory allocation ratio in writing: Memtable, TimePartitionInfo | -|Type| Ratio | -|Default| 19:1 | -|Effective| After restarting system | - -* concurrent\_writing\_time\_partition - -|Name| concurrent\_writing\_time\_partition | -|:---:|:---| -|Description| This config decides how many time partitions in a database can be inserted concurrently
For example, your partitionInterval is 86400 and you want to insert data in 5 different days, | -|Type|int32| -|Default| 1 | -|Effective|After restarting system| - -* primitive\_array\_size - -| Name | primitive\_array\_size | -|:-----------:|:----------------------------------------------------------| -| Description | primitive array size (length of each array) in array pool | -| Type | Int32 | -| Default | 64 | -| Effective | After restart system | - -* chunk\_metadata\_size\_proportion - -|Name| chunk\_metadata\_size\_proportion | -|:---:|:------------------------------------| -|Description| size proportion for chunk metadata maintains in memory when writing tsfile | -|Type| Double | -|Default| 0.1 | -|Effective|After restart system| - -* flush\_proportion - -| Name | flush\_proportion | -|:-----------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Description | Ratio of write memory for invoking flush disk, 0.4 by default If you have extremely high write load (like batch=1000), it can be set lower than the default value like 0.2 | -| Type | Double | -| Default | 0.4 | -| Effective | After restart system | - -* buffered\_arrays\_memory\_proportion - -|Name| buffered\_arrays\_memory\_proportion | -|:---:|:---| -|Description| Ratio of write memory allocated for buffered arrays | -|Type| Double | -|Default| 0.6 | -|Effective|After restart system| - -* reject\_proportion - -|Name| reject\_proportion | -|:---:|:---| -|Description| Ratio of write memory for rejecting insertion | -|Type| Double | -|Default| 0.8 | -|Effective|After restart system| - -* write\_memory\_variation\_report\_proportion - -| Name | write\_memory\_variation\_report\_proportion | -| :---------: | :----------------------------------------------------------- | -| Description | if memory cost of data region increased more than proportion of allocated memory for write, report to system | -| Type | Double | -| Default | 0.001 | -| Effective | After restarting system | - -* check\_period\_when\_insert\_blocked - -|Name| check\_period\_when\_insert\_blocked | -|:---:|:----------------------------------------------------------------------------| -|Description| when an inserting is rejected, waiting period (in ms) to check system again | -|Type| Int32 | -|Default| 50 | -|Effective| After restart system | - -* io\_task\_queue\_size\_for\_flushing - -|Name| io\_task\_queue\_size\_for\_flushing | -|:---:|:----------------------------------------------| -|Description| size of ioTaskQueue. The default value is 10 | -|Type| Int32 | -|Default| 10 | -|Effective| After restart system | - -* enable\_query\_memory\_estimation - -|Name| enable\_query\_memory\_estimation | -|:---:|:----------------------------------| -|Description| If true, we will estimate each query's possible memory footprint before executing it and deny it if its estimated memory exceeds current free memory | -|Type| bool | -|Default| true | -|Effective|hot-load| - -* partition\_cache\_size - -|Name| partition\_cache\_size | -|:---:|:---| -|Description| The max num of partition info record cached on DataNode. 
| -|Type| Int32 | -|Default| 1000 | -|Effective|After restarting system| - -### Schema Engine Configuration - -* schema\_engine\_mode - -| Name | schema\_engine\_mode | -|:-----------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Description | Schema engine mode, supporting Memory and PBTree modes; PBTree mode support evict the timeseries schema temporarily not used in memory at runtime, and load it into memory from disk when needed. This parameter must be the same on all DataNodes in one cluster. | -| Type | string | -| Default | Memory | -| Effective | Only allowed to be modified in first start up | - -* mlog\_buffer\_size - -|Name| mlog\_buffer\_size | -|:---:|:---| -|Description| size of log buffer in each metadata operation plan(in byte) | -|Type|int32| -|Default| 1048576 | -|Effective|After restart system| - -* sync\_mlog\_period\_in\_ms - -| Name | sync\_mlog\_period\_in\_ms | -| :---------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Description | The cycle when metadata log is periodically forced to be written to disk(in milliseconds). If force_mlog_period_in_ms = 0 it means force metadata log to be written to disk after each refreshment | -| Type | Int64 | -| Default | 100 | -| Effective | After restarting system | - -* tag\_attribute\_flush\_interval - -|Name| tag\_attribute\_flush\_interval | -|:---:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -|Description| interval num for tag and attribute records when force flushing to disk. When a certain amount of tag and attribute records is reached, they will be force flushed to disk. It is possible to lose at most tag_attribute_flush_interval records | -|Type| int32 | -|Default| 1000 | -|Effective| Only allowed to be modified in first start up | - -* tag\_attribute\_total\_size - -|Name| tag\_attribute\_total\_size | -|:---:|:---| -|Description| The maximum persistence size of tags and attributes of each time series.| -|Type| int32 | -|Default| 700 | -|Effective|Only allowed to be modified in first start up| - -* schema\_region\_device\_node\_cache\_size - -|Name| schema\_region\_device\_node\_cache\_size | -|:---:|:--------------------------------| -|Description| The max num of device node, used for speeding up device query, cached in schemaRegion. 
| -|Type| Int32 | -|Default| 10000 | -|Effective|After restarting system| - -* max\_measurement\_num\_of\_internal\_request - -|Name| max\_measurement\_num\_of\_internal\_request | -|:---:|:--------------------------------| -|Description| When there's too many measurements in one create timeseries plan, the plan will be split to several sub plan, with measurement num no more than this param.| -|Type| Int32 | -|Default| 10000 | -|Effective|After restarting system| - -### Configurations for creating schema automatically - -* enable\_auto\_create\_schema - -| Name | enable\_auto\_create\_schema | -| :---------: | :---------------------------------------------------------------------------- | -| Description | whether auto create the time series when a non-existed time series data comes | -| Type | true or false | -| Default | true | -| Effective | After restarting system | - -* default\_storage\_group\_level - -| Name | default\_storage\_group\_level | -| :---------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Description | Database level when creating schema automatically is enabled. For example, if we receives a data point from root.sg0.d1.s2, we will set root.sg0 as the database if database level is 1. (root is level 0) | -| Type | integer | -| Default | 1 | -| Effective | After restarting system | - -* boolean\_string\_infer\_type - -| Name | boolean\_string\_infer\_type | -| :---------: | :------------------------------------------------------------ | -| Description | To which type the values "true" and "false" should be reslved | -| Type | BOOLEAN or TEXT | -| Default | BOOLEAN | -| Effective | After restarting system | - -* integer\_string\_infer\_type - -| Name | integer\_string\_infer\_type | -| :---------: | :---------------------------------------------------------------------- | -| Description | To which type an integer string like "67" in a query should be resolved | -| Type | INT32, INT64, DOUBLE, FLOAT or TEXT | -| Default | DOUBLE | -| Effective | After restarting system | - -* floating\_string\_infer\_type - -| Name | floating\_string\_infer\_type | -| :---------: | :------------------------------------------------------------------------------ | -| Description | To which type a floating number string like "6.7" in a query should be resolved | -| Type | DOUBLE, FLOAT or TEXT | -| Default | DOUBLE | -| Effective | After restarting system | - -* nan\_string\_infer\_type - -| Name | nan\_string\_infer\_type | -| :---------: |:----------------------------------------------------------| -| Description | To which type the value NaN in a query should be resolved | -| Type | DOUBLE, FLOAT or TEXT | -| Default | DOUBLE | -| Effective | After restarting system | - -### Query Configurations - -* read\_consistency\_level - -| Name | mpp\_data\_exchange\_core\_pool\_size | -|:-----------:|:---------------------------------------------| -| Description | The read consistency level,
1. strong(Default, read from the leader replica)
2. weak(Read from a random replica) | -| Type | string | -| Default | strong | -| Effective | After restarting system | - -* meta\_data\_cache\_enable - -|Name| meta\_data\_cache\_enable | -|:---:|:---| -|Description| Whether to cache meta data(BloomFilter, ChunkMetadata and TimeSeriesMetadata) or not.| -|Type|Boolean| -|Default| true | -|Effective| After restarting system| - -* chunk\_timeseriesmeta\_free\_memory\_proportion - -|Name| chunk\_timeseriesmeta\_free\_memory\_proportion | -|:---:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -|Description| Read memory Allocation Ratio: BloomFilterCache : ChunkCache : TimeSeriesMetadataCache : Coordinator : Operators : DataExchange : timeIndex in TsFileResourceList : others. | -|Default| 1 : 100 : 200 : 300 : 400 | -|Effective| After restarting system | - -* enable\_last\_cache - -|Name| enable\_last\_cache | -|:---:|:---| -|Description| Whether to enable LAST cache. | -|Type| Boolean | -|Default| true | -|Effective|After restarting system| - -* max\_deduplicated\_path\_num - -|Name| max\_deduplicated\_path\_num | -|:---:|:---| -|Description| allowed max numbers of deduplicated path in one query. | -|Type| Int32 | -|Default| 1000 | -|Effective|After restarting system| - -* mpp\_data\_exchange\_core\_pool\_size - -| Name | mpp\_data\_exchange\_core\_pool\_size | -|:-----------:|:---------------------------------------------| -| Description | Core size of ThreadPool of MPP data exchange | -| Type | int32 | -| Default | 10 | -| Effective | After restarting system | - -* mpp\_data\_exchange\_max\_pool\_size - -| Name | mpp\_data\_exchange\_max\_pool\_size | -| :---------: | :------------------------------------------ | -| Description | Max size of ThreadPool of MPP data exchange | -| Type | int32 | -| Default | 10 | -| Effective | After restarting system | - -* mpp\_data\_exchange\_keep\_alive\_time\_in\_ms - -|Name| mpp\_data\_exchange\_keep\_alive\_time\_in\_ms | -|:---:|:---| -|Description| Max waiting time for MPP data exchange | -|Type| long | -|Default| 1000 | -|Effective|After restarting system| - -* driver\_task\_execution\_time\_slice\_in\_ms - -| Name | driver\_task\_execution\_time\_slice\_in\_ms | -| :---------: | :------------------------------------------- | -| Description | Maximum execution time of a DriverTask | -| Type | int32 | -| Default | 100 | -| Effective | After restarting system | - -* max\_tsblock\_size\_in\_bytes - -| Name | max\_tsblock\_size\_in\_bytes | -| :---------: | :---------------------------- | -| Description | Maximum capacity of a TsBlock | -| Type | int32 | -| Default | 1024 * 1024 (1 MB) | -| Effective | After restarting system | - -* max\_tsblock\_line\_numbers - -| Name | max\_tsblock\_line\_numbers | -| :---------: | :------------------------------------------ | -| Description | Maximum number of lines in a single TsBlock | -| Type | int32 | -| Default | 1000 | -| Effective | After restarting system | - -* slow\_query\_threshold - -|Name| slow\_query\_threshold | -|:---:|:----------------------------------------| -|Description| Time cost(ms) threshold for slow query. | -|Type| Int32 | -|Default| 30000 | -|Effective| Trigger | - -* query\_timeout\_threshold - -|Name| query\_timeout\_threshold | -|:---:|:---| -|Description| The max executing time of query. 
unit: ms | -|Type| Int32 | -|Default| 60000 | -|Effective| After restarting system| - -* max\_allowed\_concurrent\_queries - -|Name| max\_allowed\_concurrent\_queries | -|:---:|:---| -|Description| The maximum allowed concurrently executing queries. | -|Type| Int32 | -|Default| 1000 | -|Effective|After restarting system| - -* query\_thread\_count - -|Name| query\_thread\_count | -|:---:|:---------------------------------------------------------------------------------------------------------------------| -|Description| How many threads can concurrently execute query statement. When <= 0, use CPU core number. | -|Type| Int32 | -|Default | CPU core number | -|Effective| After restarting system | - -* batch\_size - -|Name| batch\_size | -|:---:|:---| -|Description| The amount of data iterate each time in server (the number of data strips, that is, the number of different timestamps.) | -|Type| Int32 | -|Default| 100000 | -|Effective|After restarting system| - -### Storage Engine Configuration - -* timestamp\_precision - -| Name | timestamp\_precision | -| :----------: | :-------------------------- | -| Description | timestamp precision,support ms、us、ns | -| Type | String | -| Default | ms | -| Effective | Only allowed to be modified in first start up | - -* default\_ttl\_in\_ms - -| Name | default\_ttl\_in\_ms | -|:---:|:--------------| -|Description| Define the maximum age of data for which each tier is responsible | -|Type| long | -|Default| -1 | -|Effective| After restarting system | - -* max\_waiting\_time\_when\_insert\_blocked - -| Name | max\_waiting\_time\_when\_insert\_blocked | -| :---------: |:------------------------------------------------------------------------------| -| Description | When the waiting time(in ms) of an inserting exceeds this, throw an exception | -| Type | Int32 | -| Default | 10000 | -| Effective | After restarting system | - -* handle\_system\_error - -| Name | handle\_system\_error | -| :---------: |:-------------------------------------------------------| -| Description | What will the system do when unrecoverable error occurs| -| Type | String | -| Default | CHANGE\_TO\_READ\_ONLY | -| Effective | After restarting system | - -* write\_memory\_variation\_report\_proportion - -| Name | write\_memory\_variation\_report\_proportion | -| :---------: | :----------------------------------------------------------------------------------------------------------- | -| Description | if memory cost of data region increased more than proportion of allocated memory for write, report to system | -| Type | Double | -| Default | 0.001 | -| Effective | After restarting system | - -* enable\_timed\_flush\_seq\_memtable - -| Name | enable\_timed\_flush\_seq\_memtable | -|:-----------:|:------------------------------------------------| -| Description | whether to enable timed flush sequence memtable | -| Type | Boolean | -| Default | true | -| Effective | hot-load | - -* seq\_memtable\_flush\_interval\_in\_ms - -| Name | seq\_memtable\_flush\_interval\_in\_ms | -|:-----------:|:---------------------------------------------------------------------------------------------------------| -| Description | if a memTable's created time is older than current time minus this, the memtable will be flushed to disk | -| Type | int32 | -| Default | 10800000 | -| Effective | hot-load | - -* seq\_memtable\_flush\_check\_interval\_in\_ms - -|Name| seq\_memtable\_flush\_check\_interval\_in\_ms | -|:---:|:---| -|Description| the interval to check whether sequence memtables need flushing | 
-|Type|int32| -|Default| 600000 | -|Effective| hot-load | - -* enable\_timed\_flush\_unseq\_memtable - -|Name| enable\_timed\_flush\_unseq\_memtable | -|:---:|:---| -|Description| whether to enable timed flush unsequence memtable | -|Type|Boolean| -|Default| false | -|Effective| hot-load | - -* unseq\_memtable\_flush\_interval\_in\_ms - -| Name | unseq\_memtable\_flush\_interval\_in\_ms | -|:-----------:|:---------------------------------------------------------------------------------------------------------| -| Description | if a memTable's created time is older than current time minus this, the memtable will be flushed to disk | -| Type | int32 | -| Default | 600000 | -| Effective | hot-load | - -* unseq\_memtable\_flush\_check\_interval\_in\_ms - -|Name| unseq\_memtable\_flush\_check\_interval\_in\_ms | -|:---:|:---| -|Description| the interval to check whether unsequence memtables need flushing | -|Type|int32| -|Default| 30000 | -|Effective| hot-load | - -* tvlist\_sort\_algorithm - -|Name| tvlist\_sort\_algorithm | -|:---:|:--------------------------------------------------| -|Description| the sort algorithm used in the memtable's TVList | -|Type| String | -|Default| TIM | -|Effective| After restarting system | - -* avg\_series\_point\_number\_threshold - -|Name| avg\_series\_point\_number\_threshold | -|:---:|:-------------------------------------------------------| -|Description| max average number of point of each series in memtable | -|Type| int32 | -|Default| 100000 | -|Effective| After restarting system | - -* flush\_thread\_count - -|Name| flush\_thread\_count | -|:---:|:---| -|Description| The thread number used to perform the operation when IoTDB writes data in memory to disk. If the value is less than or equal to 0, then the number of CPU cores installed on the machine is used. The default is 0.| -|Type| int32 | -|Default| 0 | -|Effective|After restarting system| - -* enable\_partial\_insert - -|Name| enable\_partial\_insert | -|:---:|:---| -|Description| Whether continue to write other measurements if some measurements are failed in one insertion.| -|Type| Boolean | -|Default| true | -|Effective|After restarting system| - -* recovery\_log\_interval\_in\_ms - -|Name| recovery\_log\_interval\_in\_ms | -|:---:|:------------------------------------------------------------------------| -|Description| the interval to log recover progress of each region when starting iotdb | -|Type| Int32 | -|Default| 5000 | -|Effective| After restarting system | - -* 0.13\_data\_insert\_adapt - -|Name| 0.13\_data\_insert\_adapt | -|:---:|:----------------------------------------------------------------------| -|Description| if using v0.13 client to insert data, set this configuration to true. | -|Type| Boolean | -|Default| false | -|Effective| After restarting system | - - -* device\_path\_cache\_size - -| Name | device\_path\_cache\_size | -|:---------:|:--------------------------------------------------------------------------------------------------------------------------| -|Description| The max size of the device path cache. 
This cache is for avoiding initialize duplicated device id object in write process | -| Type | Int32 | -| Default | 500000 | -| Effective | After restarting system | - -* insert\_multi\_tablet\_enable\_multithreading\_column\_threshold - -| Name | insert\_multi\_tablet\_enable\_multithreading\_column\_threshold | -| :---------: | :--------------------------------------------------------------------------------------------- | -| Description | When the insert plan column count reaches the specified threshold, multi-threading is enabled. | -| Type | int32 | -| Default | 10 | -| Effective | After restarting system | - -### Compaction Configurations - -* enable\_seq\_space\_compaction - -| Name | enable\_seq\_space\_compaction | -| :---------: |:---------------------------------------------| -| Description | enable the compaction between sequence files | -| Type | Boolean | -| Default | true | -| Effective | hot-load | - -* enable\_unseq\_space\_compaction - -| Name | enable\_unseq\_space\_compaction | -| :---------: |:-----------------------------------------------| -| Description | enable the compaction between unsequence files | -| Type | Boolean | -| Default | true | -| Effective | hot-load | - -* enable\_cross\_space\_compaction - -| Name | enable\_cross\_space\_compaction | -| :---------: |:------------------------------------------------------------------| -| Description | enable the compaction between sequence files and unsequence files | -| Type | Boolean | -| Default | true | -| Effective | hot-load | - -* cross\_selector - -|Name| cross\_selector | -|:---:|:-------------------------------------------------| -|Description| the task selector type of cross space compaction | -|Type| String | -|Default| rewrite | -|Effective| After restart system | - -* cross\_performer - -|Name| cross\_performer | -|:---:|:---------------------------------------------------| -|Description| the task performer type of cross space compaction. The options are read_point and fast, read_point is the default and fast is still under test | -|Type| String | -|Default| read\_point | -|Effective| After restart system | - -* inner\_seq\_selector - -|Name| inner\_seq\_selector | -|:---:|:----------------------------------------------------------| -|Description| the task selector type of inner sequence space compaction | -|Type| String | -|Default| size\_tiered | -|Effective| After restart system | - -* inner\_seq\_performer - -|Name| inner\_seq\_peformer | -|:---:|:--------------------------------------------------------------------------------------------------------------------------------------------------------| -|Description| the task performer type of inner sequence space compaction. The options are read_chunk and fast, read_chunk is the default and fast is still under test | -|Type| String | -|Default| read\_chunk | -|Effective| After restart system | - -* inner\_unseq\_selector - -|Name| inner\_unseq\_selector | -|:---:|:------------------------------------------------------------| -|Description| the task selector type of inner unsequence space compaction | -|Type| String | -|Default| size\_tiered | -|Effective| After restart system | - -* inner\_unseq\_performer - -|Name| inner\_unseq\_peformer | -|:---:|:--------------------------------------------------------------| -|Description| the task performer type of inner unsequence space compaction. 
The options are read_point and fast, read_point is the default and fast is still under test | -|Type| String | -|Default| read\_point | -|Effective| After restart system | - -* compaction\_priority - -| Name | compaction\_priority | -| :---------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Description | Priority of compaction task. When it is BALANCE, system executes all types of compaction equally; when it is INNER\_CROSS, system takes precedence over executing inner space compaction task; when it is CROSS\_INNER, system takes precedence over executing cross space compaction task | -| Type | String | -| Default | BALANCE | -| Effective | After restart system | - -* target\_compaction\_file\_size - -| Name | target\_compaction\_file\_size | -| :---------: |:-----------------------------------| -| Description | The target file size in compaction | -| Type | Int64 | -| Default | 2147483648 | -| Effective | hot-load | - -* target\_chunk\_size - -| Name | target\_chunk\_size | -| :---------: | :--------------------------------- | -| Description | The target size of compacted chunk | -| Type | Int64 | -| Default | 1048576 | -| Effective | After restart system | - -* target\_chunk\_point\_num - -|Name| target\_chunk\_point\_num | -|:---:|:---| -|Description| The target point number of compacted chunk | -|Type| int32 | -|Default| 100000 | -|Effective|After restart system| - -* chunk\_size\_lower\_bound\_in\_compaction - -| Name | chunk\_size\_lower\_bound\_in\_compaction | -| :---------: |:----------------------------------------------------------------------------------------| -| Description | A source chunk will be deserialized in compaction when its size is less than this value | -| Type | Int64 | -| Default | 10240 | -| Effective | After restart system | - -* chunk\_point\_num\_lower\_bound\_in\_compaction - -|Name| chunk\_point\_num\_lower\_bound\_in\_compaction | -|:---:|:---------------------------------------------------------------------------------------------| -|Description| A source chunk will be deserialized in compaction when its point num is less than this value | -|Type| int32 | -|Default| 1000 | -|Effective| After restart system | - -* max\_inner\_compaction\_candidate\_file\_num - -|Name| max\_inner\_compaction\_candidate\_file\_num | -|:---:|:---------------------------------------------------------| -|Description| The max num of files encounter in inner space compaction | -|Type| int32 | -|Default| 30 | -|Effective| hot-load | - -* max\_cross\_compaction\_file\_num - -|Name| max\_cross\_compaction\_candidate\_file\_num | -|:---:|:---------------------------------------------------------| -|Description| The max num of files encounter in cross space compaction | -|Type| int32 | -|Default| 500 | -|Effective| hot-load | - -* max\_cross\_compaction\_file\_size - -|Name| max\_cross\_compaction\_candidate\_file\_size | -|:---:|:----------------------------------------------------------| -|Description| The max size of files encounter in cross space compaction | -|Type| Int64 | -|Default| 5368709120 | -|Effective| hot-load | - -* compaction\_thread\_count - -|Name| compaction\_thread\_count | -|:---:|:---------------------------------| -|Description| thread num to execute compaction | -|Type| int32 | -|Default| 10 | -|Effective| hot-load | - -* 
compaction\_schedule\_interval\_in\_ms - -| Name | compaction\_schedule\_interval\_in\_ms | -| :---------: | :------------------------------------- | -| Description | interval of scheduling compaction | -| Type | Int64 | -| Default | 60000 | -| Effective | After restart system | - -* compaction\_write\_throughput\_mb\_per\_sec - -|Name| compaction\_write\_throughput\_mb\_per\_sec | -|:---:|:-----------------------------------------------| -|Description| The write rate of all compaction tasks in MB/s | -|Type| int32 | -|Default| 16 | -|Effective| hot-load | - -* compaction\_read\_throughput\_mb\_per\_sec - -|Name| compaction\_read\_throughput\_mb\_per\_sec | -|:---:|:------------------------------------------------| -|Description| The read rate of all compaction tasks in MB/s, values less than or equal to 0 means no limit | -|Type| int32 | -|Default| 0 | -|Effective| hot-load | - -* compaction\_read\_operation\_per\_sec - -|Name| compaction\_read\_operation\_per\_sec | -|:---:|:---------------------------------------------------------------------------------------------------------------| -|Description| The read operation of all compaction tasks can reach per second, values less than or equal to 0 means no limit | -|Type| int32 | -|Default| 0 | -|Effective| hot-load | - -* sub\_compaction\_thread\_count - -|Name| sub\_compaction\_thread\_count | -|:---:|:--------------------------------------------------------------------------| -|Description| the number of sub-compaction threads to accelerate cross space compaction | -|Type| Int32 | -|Default| 4 | -|Effective| hot-load | - -* enable\_tsfile\_validation - -| Name | enable\_tsfile\_validation | -|:-----------:|:--------------------------------------------------------------------------| -| Description | Verify that TSfiles generated by Flush, Load, and Compaction are correct. | -| Type | boolean | -| Default | false | -| Effective | hot-load | - -* candidate\_compaction\_task\_queue\_size - -|Name| candidate\_compaction\_task\_queue\_size | -|:---:|:--------------------------------------------| -|Description| The size of candidate compaction task queue | -|Type| Int32 | -|Default| 50 | -|Effective| After restart system | - -* compaction\_schedule\_thread\_num - -|Name| compaction\_schedule\_thread\_num | -|:---:|:--------------------------------------------------------------------------| -|Description| The number of threads to be set up to select compaction task. | -|Type| Int32 | -|Default| 4 | -|Effective| hot-load | - -### Write Ahead Log Configuration - -* wal\_mode - -| Name | wal\_mode | -|:-----------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Description | The write mode of wal. For DISABLE mode, the system will disable wal. For SYNC mode, the system will submit wal synchronously, write request will not return until its wal is fsynced to the disk successfully. For ASYNC mode, the system will submit wal asynchronously, write request will return immediately no matter its wal is fsynced to the disk successfully. 
| -| Type | String | -| Default | ASYNC | -| Effective | After restart system | - -* max\_wal\_nodes\_num - -| Name | max\_wal\_nodes\_num | -|:-----------:|:---------------------------------------------------------------------------------------------------------------------------------------| -| Description | Max number of wal nodes, each node corresponds to one wal directory. The default value 0 means the number is determined by the system. | -| Type | int32 | -| Default | 0 | -| Effective | After restart system | - -* wal\_async\_mode\_fsync\_delay\_in\_ms - -| Name | wal\_async\_mode\_fsync\_delay\_in\_ms | -|:-----------:|:--------------------------------------------------------------------------------| -| Description | Duration a wal flush operation will wait before calling fsync in the async mode | -| Type | int32 | -| Default | 1000 | -| Effective | hot-load | - -* wal\_sync\_mode\_fsync\_delay\_in\_ms - -| Name | wal\_sync\_mode\_fsync\_delay\_in\_ms | -|:-----------:|:-------------------------------------------------------------------------------| -| Description | Duration a wal flush operation will wait before calling fsync in the sync mode | -| Type | int32 | -| Default | 3 | -| Effective | hot-load | - -* wal\_buffer\_size\_in\_byte - -| Name | wal\_buffer\_size\_in\_byte | -|:-----------:|:-----------------------------| -| Description | Buffer size of each wal node | -| Type | int32 | -| Default | 33554432 | -| Effective | After restart system | - -* wal\_buffer\_queue\_capacity - -| Name | wal\_buffer\_queue\_capacity | -|:-----------:|:-------------------------------------------| -| Description | Blocking queue capacity of each wal buffer | -| Type | int32 | -| Default | 500 | -| Effective | After restart system | - -* wal\_file\_size\_threshold\_in\_byte - -| Name | wal\_file\_size\_threshold\_in\_byte | -|:-----------:|:-------------------------------------| -| Description | Size threshold of each wal file | -| Type | int32 | -| Default | 31457280 | -| Effective | hot-load | - -* wal\_min\_effective\_info\_ratio - -| Name | wal\_min\_effective\_info\_ratio | -|:-----------:|:----------------------------------------------------| -| Description | Minimum ratio of effective information in wal files | -| Type | double | -| Default | 0.1 | -| Effective | hot-load | - -* wal\_memtable\_snapshot\_threshold\_in\_byte - -| Name | wal\_memtable\_snapshot\_threshold\_in\_byte | -|:-----------:|:----------------------------------------------------------------| -| Description | MemTable size threshold for triggering MemTable snapshot in wal | -| Type | int64 | -| Default | 8388608 | -| Effective | hot-load | - -* max\_wal\_memtable\_snapshot\_num - -| Name | max\_wal\_memtable\_snapshot\_num | -|:-----------:|:--------------------------------------| -| Description | MemTable's max snapshot number in wal | -| Type | int32 | -| Default | 1 | -| Effective | hot-load | - -* delete\_wal\_files\_period\_in\_ms - -| Name | delete\_wal\_files\_period\_in\_ms | -|:-----------:|:------------------------------------------------------------| -| Description | The period when outdated wal files are periodically deleted | -| Type | int64 | -| Default | 20000 | -| Effective | hot-load | - -### TsFile Configurations - -* group\_size\_in\_byte - -|Name|group\_size\_in\_byte| -|:---:|:---| -|Description|The data size written to the disk per time| -|Type|int32| -|Default| 134217728 | -|Effective|hot-load| - -* page\_size\_in\_byte - -|Name| page\_size\_in\_byte | -|:---:|:---| -|Description|The maximum 
size of a single page written in memory when each column in memory is written (in bytes)| -|Type|int32| -|Default| 65536 | -|Effective|hot-load| - -* max\_number\_of\_points\_in\_page - -|Name| max\_number\_of\_points\_in\_page | -|:---:|:-----------------------------------------------------------------------------------| -|Description| The maximum number of data points (timestamps - valued groups) contained in a page | -|Type| int32 | -|Default| 10000 | -|Effective| hot-load | - -* pattern\_matching\_threshold - -|Name| pattern\_matching\_threshold | -|:---:|:-----------------------------------| -|Description| Max matching time of regex pattern | -|Type| int32 | -|Default| 1000000 | -|Effective| hot-load | - -* max\_degree\_of\_index\_node - -|Name| max\_degree\_of\_index\_node | -|:---:|:---| -|Description|The maximum degree of the metadata index tree (that is, the max number of each node's children)| -|Type|int32| -|Default| 256 | -|Effective|Only allowed to be modified in first start up| - -* max\_string\_length - -|Name| max\_string\_length | -|:---:|:---| -|Description|The maximum length of a single string (number of character)| -|Type|int32| -|Default| 128 | -|Effective|hot-load| - -* value\_encoder - -| Name | value\_encoder | -| :---------: | :------------------------------------ | -| Description | Encoding type of value column | -| Type | Enum String: “TS_2DIFF”,“PLAIN”,“RLE” | -| Default | PLAIN | -| Effective | hot-load | - -* float\_precision - -|Name| float\_precision | -|:---:|:---| -|Description| The precision of the floating point number.(The number of digits after the decimal point) | -|Type|int32| -|Default| The default is 2 digits. Note: The 32-bit floating point number has a decimal precision of 7 bits, and the 64-bit floating point number has a decimal precision of 15 bits. If the setting is out of the range, it will have no practical significance. | -|Effective|hot-load| - -* compressor - -| Name | compressor | -|:-----------:|:-----------------------------------------------------------------------| -| Description | Data compression method; Time compression method in aligned timeseries | -| Type | Enum String : "UNCOMPRESSED", "SNAPPY", "LZ4", "ZSTD", "LZMA2" | -| Default | SNAPPY | -| Effective | hot-load | - -* bloomFilterErrorRate - -| Name | bloomFilterErrorRate | -| :---------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Description | The false positive rate of bloom filter in each TsFile. Bloom filter checks whether a given time series is in the tsfile before loading metadata. This can improve the performance of loading metadata and skip the tsfile that doesn't contain specified time series. If you want to learn more about its mechanism, you can refer to: [wiki page of bloom filter](https://en.wikipedia.org/wiki/Bloom_filter). 
| -| Type | float, (0, 1) | -| Default | 0.05 | -| Effective | After restarting system | - - -### Authorization Configuration - -* authorizer\_provider\_class - -| Name | authorizer\_provider\_class | -| :--------------------: | :------------------------------------------------------ | -| Description | the class name of the authorization service | -| Type | String | -| Default | org.apache.iotdb.commons.auth.authorizer.LocalFileAuthorizer | -| Effective | After restarting system | -| Other available values | org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer | - -* openID\_url - -| Name | openID\_url | -| :---------: | :----------------------------------------------- | -| Description | the openID server if OpenIdAuthorizer is enabled | -| Type | String (a http url) | -| Default | no | -| Effective | After restarting system | - -* iotdb\_server\_encrypt\_decrypt\_provider - -| Name | iotdb\_server\_encrypt\_decrypt\_provider | -| :---------: | :------------------------------------------------------------- | -| Description | The Class for user password encryption | -| Type | String | -| Default | org.apache.iotdb.commons.security.encrypt.MessageDigestEncrypt | -| Effective | Only allowed to be modified in first start up | - -* iotdb\_server\_encrypt\_decrypt\_provider\_parameter - -| Name | iotdb\_server\_encrypt\_decrypt\_provider\_parameter | -| :---------: | :--------------------------------------------------------------- | -| Description | Parameters used to initialize the user password encryption class | -| Type | String | -| Default | 空 | -| Effective | After restarting system | - -* author\_cache\_size - -| Name | author\_cache\_size | -| :---------: | :-------------------------- | -| Description | Cache size of user and role | -| Type | int32 | -| Default | 1000 | -| Effective | After restarting system | - -* author\_cache\_expire\_time - -| Name | author\_cache\_expire\_time | -| :---------: | :------------------------------------------------ | -| Description | Cache expire time of user and role, Unit: minutes | -| Type | int32 | -| Default | 30 | -| Effective | After restarting system | - -### UDF Configuration - -* udf\_initial\_byte\_array\_length\_for\_memory\_control - -| Name | udf\_initial\_byte\_array\_length\_for\_memory\_control | -| :---------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Description | Used to estimate the memory usage of text fields in a UDF query. It is recommended to set this value to be slightly larger than the average length of all texts. | -| Type | int32 | -| Default | 48 | -| Effective | After restarting system | - -* udf\_memory\_budget\_in\_mb - -| Name | udf\_memory\_budget\_in\_mb | -| :---------: | :--------------------------------------------------------------------------------------------------------- | -| Description | How much memory may be used in ONE UDF query (in MB). The upper limit is 20% of allocated memory for read. | -| Type | Float | -| Default | 30.0 | -| Effective | After restarting system | - -* udf\_reader\_transformer\_collector\_memory\_proportion - -| Name | udf\_reader\_transformer\_collector\_memory\_proportion | -| :---------: | :---------------------------------------------------------------------------------------------------------------------------------- | -| Description | UDF memory allocation ratio for reader, transformer and collector. 
The parameter form is a : b : c, where a, b, and c are integers. | -| Type | String | -| Default | 1:1:1 | -| Effective | After restarting system | - -* udf\_root\_dir - -| Name | udf\_root\_dir | -| :---------: | :------------------------ | -| Description | Root directory of UDF | -| Type | String | -| Default | ext/udf(Windows:ext\\udf) | -| Effective | After restarting system | - -* udf\_lib\_dir - -| Name | udf\_lib\_dir | -| :---------: | :--------------------------- | -| Description | UDF log and jar file dir | -| Type | String | -| Default | ext/udf(Windows:ext\\udf) | -| Effective | After restarting system | - -### Trigger Configuration - - -* trigger\_lib\_dir - -| Name | trigger\_lib\_dir | -| :---------: |:------------------------| -| Description | Trigger JAR file dir | -| Type | String | -| Default | ext/trigger | -| Effective | After restarting system | - -* stateful\_trigger\_retry\_num\_when\_not\_found - -| Name | stateful\_trigger\_retry\_num\_when\_not\_found | -| :---------: |:-----------------------------------------------------------------------------------| -| Description | How many times we will retry to found an instance of stateful trigger on DataNodes | -| Type | Int32 | -| Default | 3 | -| Effective | After restarting system | - - -### SELECT-INTO - -* into\_operation\_buffer\_size\_in\_byte - -| Name | into\_operation\_buffer\_size\_in\_byte | -| :---------: | :---------------------------------------------------------------------------------------------------------------------------------- | -| Description | When the select-into statement is executed, the maximum memory occupied by the data to be written (unit: Byte) | -| Type | int64 | -| Default | 100MB | -| Effective | hot-load | - - -* select\_into\_insert\_tablet\_plan\_row\_limit - -| Name | select\_into\_insert\_tablet\_plan\_row\_limit | -| :---------: | :---------------------------------------------------------------------------------------------------------------------------------- | -| Description | The maximum number of rows that can be processed in insert-tablet-plan when executing select-into statements. When <= 0, use 10000. | -| Type | int32 | -| Default | 10000 | -| Effective | hot-load | - -* into\_operation\_execution\_thread\_count - -| Name | into\_operation\_execution\_thread\_count | -| :---------: | :------------------------------------------------------------ | -| Description | The number of threads in the thread pool that execute insert-tablet tasks | -| Type | int32 | -| Default | 2 | -| Effective | After restarting system | - -### Continuous Query - -* continuous\_query\_execution\_thread - -| Name | continuous\_query\_execution\_thread | -| :---------: | :------------------------------------------------------------ | -| Description | How many threads will be set up to perform continuous queries | -| Type | int32 | -| Default | max(1, the / 2) | -| Effective | After restarting system | - -* continuous\_query\_min\_every\_interval - -| Name | continuous\_query\_min\_every\_interval | -| :---------: | :-------------------------------------------------- | -| Description | Minimum every interval to perform continuous query. 
| -| Type | duration | -| Default | 1s | -| Effective | After restarting system | - -### PIPE Configuration - -##### Version 1.3.0: - -* pipe_lib_dir - -| **Name** | **pipe_lib_dir** | -| ------------ | -------------------------- | -| Description | Directory for storing custom Pipe plugins | -| Type | string | -| Default Value | ext/pipe | -| Effective | Not currently supported for modification | - -* pipe_subtask_executor_max_thread_num - -| **Name** | **pipe_subtask_executor_max_thread_num** | -| ------------ | ------------------------------------------------------------ | -| Description | The maximum number of threads that can be used for processors and sinks in Pipe subtasks. The actual value will be the minimum of pipe_subtask_executor_max_thread_num and the maximum of 1 and half of the CPU core count. | -| Type | int | -| Default Value | 5 | -| Effective | After restarting system | - -* pipe_connector_timeout_ms - -| **Name** | **pipe_connector_timeout_ms** | -| ------------ | --------------------------------------------- | -| Description | The connection timeout for Thrift clients in milliseconds. | -| Type | int | -| Default Value | 900000 | -| Effective | After restarting system | - -* pipe_async_connector_selector_number - -| **Name** | **pipe_async_connector_selector_number** | -| ------------ | ------------------------------------------------------------ | -| Description | The maximum number of threads for processing execution results in the iotdb-thrift-async-connector plugin. | -| Type | int | -| Default Value | 1 | -| Effective | After restarting system | - -* pipe_async_connector_core_client_number - -| **Name** | **pipe_async_connector_core_client_number** | -| ------------ | ------------------------------------------------------------ | -| Description | The maximum number of clients that can be used in the iotdb-thrift-async-connector plugin. | -| Type | int | -| Default Value | 8 | -| Effective | After restarting system | - -* pipe_async_connector_max_client_number - -| **Name** | **pipe_async_connector_max_client_number** | -| ------------ | ------------------------------------------------------------ | -| Description | The maximum number of clients that can be used in the iotdb-thrift-async-connector plugin. | -| Type | int | -| Default Value | 16 | -| Effective | After restarting system | - -* pipe_air_gap_receiver_enabled - -| **Name** | **pipe_air_gap_receiver_enabled** | -| ------------ | ------------------------------------------------------------ | -| Description | Whether to enable receiving Pipe data through a gateway. The receiver can only return 0 or 1 in TCP mode to indicate whether the data was successfully received. | -| Type | Boolean | -| Default Value | false | -| Effective | After restarting system | - -* pipe_air_gap_receiver_enabled - -| **Name** | **pipe_air_gap_receiver_port** | -| ------------ | ------------------------------------ | -| Description | The port used by the server to receive Pipe data through a gateway. 
| -| Type | int | -| Default Value | 9780 | -| Effective | After restarting system | - -##### Version 1.3.1/2: - -* pipe_lib_dir - -| **Name** | **pipe_lib_dir** | -| ------------ | -------------------------- | -| Description | Directory for storing custom Pipe plugins | -| Type | string | -| Default Value | ext/pipe | -| Effective | Not currently supported for modification | - -* pipe_subtask_executor_max_thread_num - -| **Name** | **pipe_subtask_executor_max_thread_num** | -| ------------ | ------------------------------------------------------------ | -| Description | The maximum number of threads that can be used for processors and sinks in Pipe subtasks. The actual value will be the minimum of pipe_subtask_executor_max_thread_num and the maximum of 1 and half of the CPU core count. | -| Type | int | -| Default Value | 5 | -| Effective | After restarting system | - -* pipe_sink_timeout_ms - -| **Name** | **pipe_sink_timeout_ms** | -| ------------ | --------------------------------------------- | -| Description | The connection timeout for Thrift clients in milliseconds. | -| Type | int | -| Default Value | 900000 | -| Effective | After restarting system | - -* pipe_sink_selector_number - -| **Name** | **pipe_sink_selector_number** | -| ------------ | ------------------------------------------------------------ | -| Description | The maximum number of threads for processing execution results in the iotdb-thrift-async-sink plugin. It is recommended to set this value to be less than or equal to pipe_sink_max_client_number. | -| Type | int | -| Default Value | 4 | -| Effective | After restarting system | - -* pipe_sink_max_client_number - -| **Name** | **pipe_sink_max_client_number** | -| ------------ | ----------------------------------------------------------- | -| Description | The maximum number of clients that can be used in the iotdb-thrift-async-sink plugin. | -| Type | int | -| Default Value | 16 | -| Effective | After restarting system | - -* pipe_air_gap_receiver_enabled - -| **Name** | **pipe_air_gap_receiver_enabled** | -| ------------ | ------------------------------------------------------------ | -| Description | Whether to enable receiving Pipe data through a gateway. The receiver can only return 0 or 1 in TCP mode to indicate whether the data was successfully received. | -| Type | Boolean | -| Default Value | false | -| Effective | After restarting system | - -* pipe_air_gap_receiver_port - -| **Name** | **pipe_air_gap_receiver_port** | -| ------------ | ------------------------------------ | -| Description | The port used by the server to receive Pipe data through a gateway. 
| -| Type | int | -| Default Value | 9780 | -| Effective | After restarting system | - -### IOTConsensus Configuration - -* data_region_iot_max_log_entries_num_per_batch - -| Name | data_region_iot_max_log_entries_num_per_batch | -| :---------: | :------------------------------------------------ | -| Description | The maximum log entries num in IoTConsensus Batch | -| Type | int32 | -| Default | 1024 | -| Effective | After restarting system | - -* data_region_iot_max_size_per_batch - -| Name | data_region_iot_max_size_per_batch | -| :---------: | :------------------------------------- | -| Description | The maximum size in IoTConsensus Batch | -| Type | int32 | -| Default | 16MB | -| Effective | After restarting system | - -* data_region_iot_max_pending_batches_num - -| Name | data_region_iot_max_pending_batches_num | -| :---------: | :---------------------------------------------- | -| Description | The maximum pending batches num in IoTConsensus | -| Type | int32 | -| Default | 12 | -| Effective | After restarting system | - -* data_region_iot_max_memory_ratio_for_queue - -| Name | data_region_iot_max_memory_ratio_for_queue | -| :---------: | :------------------------------------------------- | -| Description | The maximum memory ratio for queue in IoTConsensus | -| Type | double | -| Default | 0.6 | -| Effective | After restarting system | - -### RatisConsensus Configuration - -* config\_node\_ratis\_log\_appender\_buffer\_size\_max - -| Name | config\_node\_ratis\_log\_appender\_buffer\_size\_max | -|:------:|:-----------------------------------------------| -| Description | confignode max payload size for a single log-sync-RPC from leader to follower | -| Type | int32 | -| Default | 4MB | -| Effective | After restarting system | - - -* schema\_region\_ratis\_log\_appender\_buffer\_size\_max - -| Name | schema\_region\_ratis\_log\_appender\_buffer\_size\_max | -|:------:|:-------------------------------------------------| -| Description | schema region max payload size for a single log-sync-RPC from leader to follower | -| Type | int32 | -| Default | 4MB | -| Effective | After restarting system | - -* data\_region\_ratis\_log\_appender\_buffer\_size\_max - -| Name | data\_region\_ratis\_log\_appender\_buffer\_size\_max | -|:------:|:-----------------------------------------------| -| Description | data region max payload size for a single log-sync-RPC from leader to follower | -| Type | int32 | -| Default | 4MB | -| Effective | After restarting system | - -* config\_node\_ratis\_snapshot\_trigger\_threshold - -| Name | config\_node\_ratis\_snapshot\_trigger\_threshold | -|:------:|:---------------------------------------------| -| Description | confignode trigger a snapshot when snapshot_trigger_threshold logs are written | -| Type | int32 | -| Default | 400,000 | -| Effective | After restarting system | - -* schema\_region\_ratis\_snapshot\_trigger\_threshold - -| Name | schema\_region\_ratis\_snapshot\_trigger\_threshold | -|:------:|:-----------------------------------------------| -| Description | schema region trigger a snapshot when snapshot_trigger_threshold logs are written | -| Type | int32 | -| Default | 400,000 | -| Effective | After restarting system | - -* data\_region\_ratis\_snapshot\_trigger\_threshold - -| Name | data\_region\_ratis\_snapshot\_trigger\_threshold | -|:------:|:---------------------------------------------| -| Description | data region trigger a snapshot when snapshot_trigger_threshold logs are written | -| Type | int32 | -| Default | 400,000 | -| 
Effective | After restarting system | - -* config\_node\_ratis\_log\_unsafe\_flush\_enable - -| Name | config\_node\_ratis\_log\_unsafe\_flush\_enable | -|:------:|:---------------------------------------------------| -| Description | confignode allows flushing Raft Log asynchronously | -| Type | boolean | -| Default | false | -| Effective | After restarting system | - -* schema\_region\_ratis\_log\_unsafe\_flush\_enable - -| Name | schema\_region\_ratis\_log\_unsafe\_flush\_enable | -|:------:|:------------------------------------------------------| -| Description | schema region allows flushing Raft Log asynchronously | -| Type | boolean | -| Default | false | -| Effective | After restarting system | - -* data\_region\_ratis\_log\_unsafe\_flush\_enable - -| Name | data\_region\_ratis\_log\_unsafe\_flush\_enable | -|:------:|:----------------------------------------------------| -| Description | data region allows flushing Raft Log asynchronously | -| Type | boolean | -| Default | false | -| Effective | After restarting system | - -* config\_node\_ratis\_log\_segment\_size\_max\_in\_byte - -| Name | config\_node\_ratis\_log\_segment\_size\_max\_in\_byte | -|:------:|:-----------------------------------------------| -| Description | confignode max capacity of a single Log segment file | -| Type | int32 | -| Default | 24MB | -| Effective | After restarting system | - -* schema\_region\_ratis\_log\_segment\_size\_max\_in\_byte - -| Name | schema\_region\_ratis\_log\_segment\_size\_max\_in\_byte | -|:------:|:-------------------------------------------------| -| Description | schema region max capacity of a single Log segment file | -| Type | int32 | -| Default | 24MB | -| Effective | After restarting system | - -* data\_region\_ratis\_log\_segment\_size\_max\_in\_byte - -| Name | data\_region\_ratis\_log\_segment\_size\_max\_in\_byte | -|:------:|:-----------------------------------------------| -| Description | data region max capacity of a single Log segment file | -| Type | int32 | -| Default | 24MB | -| Effective | After restarting system | - -* config\_node\_ratis\_grpc\_flow\_control\_window - -| Name | config\_node\_ratis\_grpc\_flow\_control\_window | -|:------:|:-----------------------------------------------------------------------------| -| Description | confignode flow control window for ratis grpc log appender | -| Type | int32 | -| Default | 4MB | -| Effective | After restarting system | - -* schema\_region\_ratis\_grpc\_flow\_control\_window - -| Name | schema\_region\_ratis\_grpc\_flow\_control\_window | -|:------:|:---------------------------------------------| -| Description | schema region flow control window for ratis grpc log appender | -| Type | int32 | -| Default | 4MB | -| Effective | After restarting system | - -* data\_region\_ratis\_grpc\_flow\_control\_window - -| Name | data\_region\_ratis\_grpc\_flow\_control\_window | -|:------:|:-------------------------------------------| -| Description | data region flow control window for ratis grpc log appender | -| Type | int32 | -| Default | 4MB | -| Effective | After restarting system | - -* config\_node\_ratis\_grpc\_leader\_outstanding\_appends\_max - -| Name | config\_node\_ratis\_grpc\_leader\_outstanding\_appends\_max | -| :---------: | :----------------------------------------------------- | -| Description | config node grpc pipeline concurrency threshold | -| Type | int32 | -| Default | 128 | -| Effective | After restarting system | - -* schema\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max - -| Name | 
schema\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max | -| :---------: | :------------------------------------------------------ | -| Description | schema region grpc pipeline concurrency threshold | -| Type | int32 | -| Default | 128 | -| Effective | After restarting system | - -* data\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max - -| Name | data\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max | -| :---------: | :---------------------------------------------------- | -| Description | data region grpc pipeline concurrency threshold | -| Type | int32 | -| Default | 128 | -| Effective | After restarting system | - -* config\_node\_ratis\_log\_force\_sync\_num - -| Name | config\_node\_ratis\_log\_force\_sync\_num | -| :---------: | :------------------------------------ | -| Description | config node fsync threshold | -| Type | int32 | -| Default | 128 | -| Effective | After restarting system | - -* schema\_region\_ratis\_log\_force\_sync\_num - -| Name | schema\_region\_ratis\_log\_force\_sync\_num | -| :---------: | :-------------------------------------- | -| Description | schema region fsync threshold | -| Type | int32 | -| Default | 128 | -| Effective | After restarting system | - -* data\_region\_ratis\_log\_force\_sync\_num - -| Name | data\_region\_ratis\_log\_force\_sync\_num | -| :---------: | :----------------------------------- | -| Description | data region fsync threshold | -| Type | int32 | -| Default | 128 | -| Effective | After restarting system | - -* config\_node\_ratis\_rpc\_leader\_election\_timeout\_min\_ms - -| Name | config\_node\_ratis\_rpc\_leader\_election\_timeout\_min\_ms | -|:------:|:-----------------------------------------------------| -| Description | confignode min election timeout for leader election | -| Type | int32 | -| Default | 2000ms | -| Effective | After restarting system | - -* schema\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms - -| Name | schema\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms | -|:------:|:-------------------------------------------------------| -| Description | schema region min election timeout for leader election | -| Type | int32 | -| Default | 2000ms | -| Effective | After restarting system | - -* data\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms - -| Name | data\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms | -|:------:|:-----------------------------------------------------| -| Description | data region min election timeout for leader election | -| Type | int32 | -| Default | 2000ms | -| Effective | After restarting system | - -* config\_node\_ratis\_rpc\_leader\_election\_timeout\_max\_ms - -| Name | config\_node\_ratis\_rpc\_leader\_election\_timeout\_max\_ms | -|:------:|:-----------------------------------------------------| -| Description | confignode max election timeout for leader election | -| Type | int32 | -| Default | 2000ms | -| Effective | After restarting system | - -* schema\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms - -| Name | schema\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms | -|:------:|:-------------------------------------------------------| -| Description | schema region max election timeout for leader election | -| Type | int32 | -| Default | 2000ms | -| Effective | After restarting system | - -* data\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms - -| Name | data\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms | -|:------:|:-----------------------------------------------------| -| 
Description | data region max election timeout for leader election | -| Type | int32 | -| Default | 2000ms | -| Effective | After restarting system | - -* config\_node\_ratis\_request\_timeout\_ms - -| Name | config\_node\_ratis\_request\_timeout\_ms | -|:------:|:-------------------------------------| -| Description | confignode ratis client retry threshold | -| Type | int32 | -| Default | 10s | -| Effective | After restarting system | - -* schema\_region\_ratis\_request\_timeout\_ms - -| Name | schema\_region\_ratis\_request\_timeout\_ms | -|:------:|:---------------------------------------| -| Description | schema region ratis client retry threshold | -| Type | int32 | -| Default | 10s | -| Effective | After restarting system | - -* data\_region\_ratis\_request\_timeout\_ms - -| Name | data\_region\_ratis\_request\_timeout\_ms | -|:------:|:-------------------------------------| -| Description | data region ratis client retry threshold | -| Type | int32 | -| Default | 10s | -| Effective | After restarting system | - -* config\_node\_ratis\_max\_retry\_attempts - -| Name | config\_node\_ratis\_max\_retry\_attempts | -|:------:|:-------------------------------------------| -| Description | confignode ratis client max retry attempts | -| Type | int32 | -| Default | 10 | -| Effective | After restarting system | - -* config\_node\_ratis\_initial\_sleep\_time\_ms - -| Name | config\_node\_ratis\_initial\_sleep\_time\_ms | -|:------:|:-------------------------------------------------| -| Description | confignode ratis client retry initial sleep time | -| Type | int32 | -| Default | 100ms | -| Effective | After restarting system | - -* config\_node\_ratis\_max\_sleep\_time\_ms - -| Name | config\_node\_ratis\_max\_sleep\_time\_ms | -|:------:|:---------------------------------------------| -| Description | confignode ratis client retry max sleep time | -| Type | int32 | -| Default | 10s | -| Effective | After restarting system | - -* schema\_region\_ratis\_max\_retry\_attempts - -| Name | schema\_region\_ratis\_max\_retry\_attempts | -|:------:|:---------------------------------------| -| Description | schema region ratis client max retry attempts | -| Type | int32 | -| Default | 10 | -| Effective | After restarting system | - -* schema\_region\_ratis\_initial\_sleep\_time\_ms - -| Name | schema\_region\_ratis\_initial\_sleep\_time\_ms | -|:------:|:------------------------------------------| -| Description | schema region ratis client retry initial sleep time | -| Type | int32 | -| Default | 100ms | -| Effective | After restarting system | - -* schema\_region\_ratis\_max\_sleep\_time\_ms - -| Name | schema\_region\_ratis\_max\_sleep\_time\_ms | -|:------:|:--------------------------------------| -| Description | schema region ratis client retry max sleep time | -| Type | int32 | -| Default | 10s | -| Effective | After restarting system | - -* data\_region\_ratis\_max\_retry\_attempts - -| Name | data\_region\_ratis\_max\_retry\_attempts | -|:------:|:-------------------------------------| -| Description | data region ratis client max retry attempts | -| Type | int32 | -| Default | 10 | -| Effective | After restarting system | - -* data\_region\_ratis\_initial\_sleep\_time\_ms - -| Name | data\_region\_ratis\_initial\_sleep\_time\_ms | -|:------:|:----------------------------------------| -| Description | data region ratis client retry initial sleep time | -| Type | int32 | -| Default | 100ms | -| Effective | After restarting system | - -* data\_region\_ratis\_max\_sleep\_time\_ms - -| Name | 
data\_region\_ratis\_max\_sleep\_time\_ms | -|:------:|:------------------------------------| -| Description | data region ratis client retry max sleep time | -| Type | int32 | -| Default | 10s | -| Effective | After restarting system | - -* config\_node\_ratis\_preserve\_logs\_num\_when\_purge - -| Name | config\_node\_ratis\_preserve\_logs\_num\_when\_purge | -|:------:|:---------------------------------------------------------------| -| Description | confignode preserves certain logs when take snapshot and purge | -| Type | int32 | -| Default | 1000 | -| Effective | After restarting system | - -* schema\_region\_ratis\_preserve\_logs\_num\_when\_purge - -| Name | schema\_region\_ratis\_preserve\_logs\_num\_when\_purge | -|:------:|:------------------------------------------------------------------| -| Description | schema region preserves certain logs when take snapshot and purge | -| Type | int32 | -| Default | 1000 | -| Effective | After restarting system | - -* data\_region\_ratis\_preserve\_logs\_num\_when\_purge - -| Name | data\_region\_ratis\_preserve\_logs\_num\_when\_purge | -|:------:|:----------------------------------------------------------------| -| Description | data region preserves certain logs when take snapshot and purge | -| Type | int32 | -| Default | 1000 | -| Effective | After restarting system | - -### Procedure Configuration - -* procedure\_core\_worker\_thread\_count - -| Name | procedure\_core\_worker\_thread\_count | -| :---------: | :--------------------------------- | -| Description | The number of worker thread count | -| Type | int32 | -| Default | 4 | -| Effective | After restarting system | - -* procedure\_completed\_clean\_interval - -| Name | procedure\_completed\_clean\_interval | -| :---------: | :--------------------------------------------------- | -| Description | Time interval of completed procedure cleaner work in | -| Type | int32 | -| Unit | second | -| Default | 30 | -| Effective | After restarting system | - -* procedure\_completed\_evict\_ttl - -| Name | procedure\_completed\_evict\_ttl | -| :---------: | :----------------------------- | -| Description | The ttl of completed procedure | -| Type | int32 | -| Unit | second | -| Default | 800 | -| Effective | After restarting system | - -### MQTT Broker Configuration - -* enable\_mqtt\_service - -| Name | enable\_mqtt\_service。 | -|:-----------:|:------------------------------------| -| Description | Whether to enable the MQTT service | -| Type | Boolean | -| Default | False | -| Effective | hot-load | - -* mqtt\_host - -| Name | mqtt\_host | -|:-----------:|:---------------------------------------------| -| Description | The host to which the MQTT service is bound | -| Type | String | -| Default | 0.0.0.0 | -| Effective | hot-load | - -* mqtt\_port - -| Name | mqtt\_port | -|:-----------:|:--------------------------------------------| -| Description | The port to which the MQTT service is bound | -| Type | int32 | -| Default | 1883 | -| Effective | hot-load | - -* mqtt\_handler\_pool\_size - -|Name| mqtt\_handler\_pool\_size | -|:---:|:------------------------------------------------------------| -|Description| The size of the handler pool used to process MQTT messages | -|Type| int32 | -|Default| 1 | -|Effective| hot-load | - -* mqtt\_payload\_formatter - -| Name | mqtt\_payload\_formatter | -|:-----------:|:-------------------------------| -| Description | MQTT message payload formatter | -| Type | String | -| Default | JSON | -| Effective | hot-load | - -* mqtt\_max\_message\_size - -| 
Name | mqtt\_max\_message\_size | -|:------:|:-----------------------------------------| -| Description | Maximum length of MQTT message in bytes | -| Type | int32 | -| Default | 1048576 | -| Effective | hot-load | - - diff --git a/src/UserGuide/V1.3.0-2/Reference/ConfigNode-Config-Manual.md b/src/UserGuide/V1.3.0-2/Reference/ConfigNode-Config-Manual.md deleted file mode 100644 index fb0b3b384..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/ConfigNode-Config-Manual.md +++ /dev/null @@ -1,223 +0,0 @@ - - -# ConfigNode Configuration - -IoTDB ConfigNode files are under `conf`. - -* `confignode-env.sh/bat`:Environment configurations, in which we could set the memory allocation of ConfigNode. - -* `iotdb-confignode.properties`:IoTDB system configurations. - -## Environment Configuration File(confignode-env.sh/bat) - -The environment configuration file is mainly used to configure the Java environment related parameters when ConfigNode is running, such as JVM related configuration. This part of the configuration is passed to the JVM when the ConfigNode starts. - -The details of each parameter are as follows: - -* MEMORY\_SIZE - -|Name|MEMORY\_SIZE| -|:---:|:---| -|Description|The memory size that IoTDB ConfigNode will use when startup | -|Type|String| -|Default|The default is three-tenths of the memory, with a maximum of 16G.| -|Effective|After restarting system| - -* ON\_HEAP\_MEMORY - -|Name|ON\_HEAP\_MEMORY| -|:---:|:---| -|Description|The heap memory size that IoTDB ConfigNode can use, Former Name: MAX\_HEAP\_SIZE | -|Type|String| -|Default| Calculate based on MEMORY\_SIZE.| -|Effective|After restarting system| - -* OFF\_HEAP\_MEMORY - -|Name|OFF\_HEAP\_MEMORY| -|:---:|:---| -|Description|The direct memory that IoTDB ConfigNode can use, Former Name: MAX\_DIRECT\_MEMORY\_SIZE | -|Type|String| -|Default| Calculate based on MEMORY\_SIZE.| -|Effective|After restarting system| - - -## ConfigNode Configuration File (iotdb-confignode.properties) - -The global configuration of cluster is in ConfigNode. 
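As a quick orientation before the parameter-by-parameter reference, the fragment below is a minimal sketch of `iotdb-confignode.properties` for a single-node deployment. It uses only parameters and default values that are documented in the following sections; adjust the addresses and ports for your own environment.

```properties
# Minimal iotdb-confignode.properties sketch (values are the documented defaults)
cn_internal_address=127.0.0.1
cn_internal_port=10710
cn_consensus_port=10720
# Seed ConfigNode that this ConfigNode contacts when joining the cluster;
# for the very first ConfigNode this points at its own address.
cn_seed_config_node=127.0.0.1:10710
```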
- -### Config Node RPC Configuration - -* cn\_internal\_address - -| Name | cn\_internal\_address | -|:-----------:|:------------------------------------| -| Description | ConfigNode internal service address | -| Type | String | -| Default | 127.0.0.1 | -| Effective | Only allowed to be modified in first start up | - -* cn\_internal\_port - -|Name| cn\_internal\_port | -|:---:|:---| -|Description| ConfigNode internal service port| -|Type| Short Int : [0,65535] | -|Default| 10710 | -|Effective|Only allowed to be modified in first start up| - -### Consensus - -* cn\_consensus\_port - -|Name| cn\_consensus\_port | -|:---:|:---| -|Description| ConfigNode data Consensus Port | -|Type| Short Int : [0,65535] | -|Default| 10720 | -|Effective|Only allowed to be modified in first start up| - -### Target Config Nodes - -* cn\_seed\_config\_node - -|Name| cn\_seed\_config\_node | -|:---:|:----------------------------------------------------------------------| -|Description| Target ConfigNode address, for current ConfigNode to join the cluster | -|Type| String | -|Default| 127.0.0.1:10710 | -|Effective| Only allowed to be modified in first start up | - -### Directory configuration - -* cn\_system\_dir - -|Name| cn\_system\_dir | -|:---:|:---| -|Description| ConfigNode system data dir | -|Type| String | -|Default| data/system(Windows:data\\system) | -|Effective|After restarting system| - -* cn\_consensus\_dir - -|Name| cn\_consensus\_dir | -|:---:|:---------------------------------------------------------------| -|Description| ConfigNode Consensus protocol data dir | -|Type| String | -|Default| data/confignode/consensus(Windows:data\\confignode\\consensus) | -|Effective| After restarting system | - -### Thrift RPC configuration - -* cn\_rpc\_thrift\_compression\_enable - -|Name| cn\_rpc\_thrift\_compression\_enable | -|:---:|:---| -|Description| Whether enable thrift's compression (using GZIP).| -|Type|Boolean| -|Default| false | -|Effective|After restarting system| - -* cn\_rpc\_advanced\_compression\_enable - -|Name| cn\_rpc\_advanced\_compression\_enable | -|:---:|:---| -|Description| Whether enable thrift's advanced compression.| -|Type|Boolean| -|Default| false | -|Effective|After restarting system| - -* cn\_rpc\_max\_concurrent\_client\_num - -|Name| cn\_rpc\_max\_concurrent\_client\_num | -|:---:|:---| -|Description| Max concurrent rpc connections| -|Type| Short Int : [0,65535] | -|Default| 65535 | -|Effective|After restarting system| - -* cn\_thrift\_max\_frame\_size - -|Name| cn\_thrift\_max\_frame\_size | -|:---:|:---| -|Description| Max size of bytes of each thrift RPC request/response| -|Type| Long | -|Unit|Byte| -|Default| 536870912 | -|Effective|After restarting system| - -* cn\_thrift\_init\_buffer\_size - -|Name| cn\_thrift\_init\_buffer\_size | -|:---:|:---| -|Description| Initial size of bytes of buffer that thrift used | -|Type| long | -|Default| 1024 | -|Effective|After restarting system| - -* cn\_connection\_timeout\_ms - -| Name | cn\_connection\_timeout\_ms | -|:-----------:|:-------------------------------------------------------| -| Description | Thrift socket and connection timeout between nodes | -| Type | int | -| Default | 60000 | -| Effective | After restarting system | - -* cn\_selector\_thread\_nums\_of\_client\_manager - -| Name 

| cn\_selector\_thread\_nums\_of\_client\_manager | -|:-----------:|:-------------------------------------------------------------------------------| -| Description | selector thread (TAsyncClientManager) nums for async thread in a clientManager | -| Type | int | -| Default | 1 | -| Effective | After restarting system | - -* cn\_core\_client\_count\_for\_each\_node\_in\_client\_manager - -| Name | cn\_core\_client\_count\_for\_each\_node\_in\_client\_manager | -|:------------:|:---------------------------------------------------------------| -| Description | Number of core clients routed to each node in a ClientManager | -| Type | int | -| Default | 200 | -| Effective | After restarting system | - -* cn\_max\_client\_count\_for\_each\_node\_in\_client\_manager - -| Name | cn\_max\_client\_count\_for\_each\_node\_in\_client\_manager | -|:--------------:|:-------------------------------------------------------------| -| Description | Number of max clients routed to each node in a ClientManager | -| Type | int | -| Default | 300 | -| Effective | After restarting system | - -### Metric Configuration diff --git a/src/UserGuide/V1.3.0-2/Reference/DataNode-Config-Manual.md b/src/UserGuide/V1.3.0-2/Reference/DataNode-Config-Manual.md deleted file mode 100644 index 172882761..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/DataNode-Config-Manual.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -redirectTo: DataNode-Config-Manual_apache.html ---- - diff --git a/src/UserGuide/V1.3.0-2/Reference/DataNode-Config-Manual_apache.md b/src/UserGuide/V1.3.0-2/Reference/DataNode-Config-Manual_apache.md deleted file mode 100644 index ea3f63288..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/DataNode-Config-Manual_apache.md +++ /dev/null @@ -1,500 +0,0 @@ - - -# DataNode Configuration Parameters - -We use the same configuration files for IoTDB DataNode and Standalone version, all under the `conf`. - -* `datanode-env.sh/bat`:Environment configurations, in which we could set the memory allocation of DataNode and Standalone. - -* `iotdb-datanode.properties`:IoTDB system configurations. - -## Hot Modification Configuration - -For the convenience of users, IoTDB provides users with hot modification function, that is, modifying some configuration parameters in `iotdb-datanode.properties` during the system operation and applying them to the system immediately. -In the parameters described below, these parameters whose way of `Effective` is `hot-load` support hot modification. - -Trigger way: The client sends the command(sql) `load configuration` or `set configuration` to the IoTDB server. - -## Environment Configuration File(datanode-env.sh/bat) - -The environment configuration file is mainly used to configure the Java environment related parameters when DataNode is running, such as JVM related configuration. This part of the configuration is passed to the JVM when the DataNode starts. 
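For example, to size the DataNode memory explicitly instead of relying on the auto-calculated defaults described below, the three memory variables can be overridden in `datanode-env.sh` (the sizes here are purely illustrative):

```sh
# Illustrative datanode-env.sh overrides -- choose sizes that fit your machine
MEMORY_SIZE=8G        # overall memory budget for this DataNode
ON_HEAP_MEMORY=6G     # JVM heap size (former name: MAX_HEAP_SIZE)
OFF_HEAP_MEMORY=2G    # direct memory size (former name: MAX_DIRECT_MEMORY_SIZE)
```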
- -The details of each parameter are as follows: - -* MEMORY\_SIZE - -|Name|MEMORY\_SIZE| -|:---:|:---| -|Description|The minimum heap memory size that IoTDB DataNode will use when startup | -|Type|String| -|Default| The default is a half of the memory.| -|Effective|After restarting system| - -* ON\_HEAP\_MEMORY - -|Name|ON\_HEAP\_MEMORY| -|:---:|:---| -|Description|The heap memory size that IoTDB DataNode can use, Former Name: MAX\_HEAP\_SIZE | -|Type|String| -|Default| Calculate based on MEMORY\_SIZE.| -|Effective|After restarting system| - -* OFF\_HEAP\_MEMORY - -|Name|OFF\_HEAP\_MEMORY| -|:---:|:---| -|Description|The direct memory that IoTDB DataNode can use, Former Name: MAX\_DIRECT\_MEMORY\_SIZE| -|Type|String| -|Default| Calculate based on MEMORY\_SIZE.| -|Effective|After restarting system| - -* JMX\_LOCAL - -|Name|JMX\_LOCAL| -|:---:|:---| -|Description|JMX monitoring mode, configured as yes to allow only local monitoring, no to allow remote monitoring| -|Type|Enum String: "true", "false"| -|Default|true| -|Effective|After restarting system| - -* JMX\_PORT - -|Name|JMX\_PORT| -|:---:|:---| -|Description|JMX listening port. Please confirm that the port is not a system reserved port and is not occupied| -|Type|Short Int: [0,65535]| -|Default|31999| -|Effective|After restarting system| - -* JMX\_IP - -|Name|JMX\_IP| -|:---:|:---| -|Description|JMX listening address. Only take effect if JMX\_LOCAL=false. 0.0.0.0 is never allowed| -|Type|String| -|Default|127.0.0.1| -|Effective|After restarting system| - -## JMX Authorization - -We **STRONGLY RECOMMENDED** you CHANGE the PASSWORD for the JMX remote connection. - -The user and passwords are in ${IOTDB\_CONF}/conf/jmx.password. - -The permission definitions are in ${IOTDB\_CONF}/conf/jmx.access. - -## DataNode/Standalone Configuration File (iotdb-datanode.properties) - -### Data Node RPC Configuration - -* dn\_rpc\_address - -|Name| dn\_rpc\_address | -|:---:|:-----------------------------------------------| -|Description| The client rpc service listens on the address. 
| -|Type| String | -|Default| 0.0.0.0 | -|Effective| After restarting system | - -* dn\_rpc\_port - -|Name| dn\_rpc\_port | -|:---:|:---| -|Description| The client rpc service listens on the port.| -|Type|Short Int : [0,65535]| -|Default| 6667 | -|Effective|After restarting system| - -* dn\_internal\_address - -|Name| dn\_internal\_address | -|:---:|:---| -|Description| DataNode internal service host/IP | -|Type| string | -|Default| 127.0.0.1 | -|Effective|Only allowed to be modified in first start up| - -* dn\_internal\_port - -|Name| dn\_internal\_port | -|:---:|:-------------------------------| -|Description| DataNode internal service port | -|Type| int | -|Default| 10730 | -|Effective| Only allowed to be modified in first start up | - -* dn\_mpp\_data\_exchange\_port - -|Name| mpp\_data\_exchange\_port | -|:---:|:---| -|Description| MPP data exchange port | -|Type| int | -|Default| 10740 | -|Effective|Only allowed to be modified in first start up| - -* dn\_schema\_region\_consensus\_port - -|Name| dn\_schema\_region\_consensus\_port | -|:---:|:---| -|Description| DataNode Schema replica communication port for consensus | -|Type| int | -|Default| 10750 | -|Effective|Only allowed to be modified in first start up| - -* dn\_data\_region\_consensus\_port - -|Name| dn\_data\_region\_consensus\_port | -|:---:|:---| -|Description| DataNode Data replica communication port for consensus | -|Type| int | -|Default| 10760 | -|Effective|Only allowed to be modified in first start up| - -* dn\_join\_cluster\_retry\_interval\_ms - -|Name| dn\_join\_cluster\_retry\_interval\_ms | -|:---:|:--------------------------------------------------------------------------| -|Description| The time of data node waiting for the next retry to join into the cluster | -|Type| long | -|Default| 5000 | -|Effective| After restarting system | - -### SSL Configuration - -* enable\_thrift\_ssl - -|Name| enable\_thrift\_ssl | -|:---:|:---------------------------| -|Description|When enable\_thrift\_ssl is configured as true, SSL encryption will be used for communication through dn\_rpc\_port | -|Type| Boolean | -|Default| false | -|Effective| After restarting system | - -* enable\_https - -|Name| enable\_https | -|:---:|:-------------------------| -|Description| REST Service Specifies whether to enable SSL configuration | -|Type| Boolean | -|Default| false | -|Effective| After restarting system | - -* key\_store\_path - -|Name| key\_store\_path | -|:---:|:-----------------| -|Description| SSL certificate path | -|Type| String | -|Default| "" | -|Effective| After restarting system | - -* key\_store\_pwd - -|Name| key\_store\_pwd | -|:---:|:----------------| -|Description| SSL certificate password | -|Type| String | -|Default| "" | -|Effective| After restarting system | - -### Target Config Nodes - -* dn\_seed\_config\_node - -|Name| dn\_seed\_config\_node | -|:---:|:------------------------------------------------| -|Description| ConfigNode Address for DataNode to join cluster | -|Type| String | -|Default| 127.0.0.1:10710 | -|Effective| Only allowed to be modified in first start up | - -### Connection Configuration - -* dn\_rpc\_thrift\_compression\_enable - -|Name| dn\_rpc\_thrift\_compression\_enable | -|:---:|:---| -|Description| Whether enable thrift's compression (using GZIP).| -|Type|Boolean| -|Default| false | -|Effective|After restarting system| - -* dn\_rpc\_advanced\_compression\_enable - -|Name| dn\_rpc\_advanced\_compression\_enable | -|:---:|:---| -|Description| Whether enable thrift's advanced compression.| 
-|Type|Boolean| -|Default| false | -|Effective|After restarting system| - -* dn\_rpc\_selector\_thread\_count - -|Name| dn\_rpc\_selector\_thread\_count | -|:---:|:-----------------------------------| -|Description| The number of rpc selector threads. | -|Type| int | -|Default| 1 | -|Effective| After restarting system | - -* dn\_rpc\_min\_concurrent\_client\_num - -|Name| dn\_rpc\_min\_concurrent\_client\_num | -|:---:|:-----------------------------------| -|Description| Minimum concurrent rpc connections | -|Type| Short Int : [0,65535] | -|Default| 1 | -|Effective| After restarting system | - -* dn\_rpc\_max\_concurrent\_client\_num - -|Name| dn\_rpc\_max\_concurrent\_client\_num | -|:---:|:---| -|Description| Max concurrent rpc connections| -|Type| Short Int : [0,65535] | -|Default| 65535 | -|Effective|After restarting system| - -* dn\_thrift\_max\_frame\_size - -|Name| dn\_thrift\_max\_frame\_size | -|:---:|:---| -|Description| Max size of bytes of each thrift RPC request/response| -|Type| Long | -|Unit|Byte| -|Default| 536870912 | -|Effective|After restarting system| - -* dn\_thrift\_init\_buffer\_size - -|Name| dn\_thrift\_init\_buffer\_size | -|:---:|:---| -|Description| Initial size of bytes of buffer that thrift used | -|Type| long | -|Default| 1024 | -|Effective|After restarting system| - -* dn\_connection\_timeout\_ms - -| Name | dn\_connection\_timeout\_ms | -|:-----------:|:---------------------------------------------------| -| Description | Thrift socket and connection timeout between nodes | -| Type | int | -| Default | 60000 | -| Effective | After restarting system | - -* dn\_core\_client\_count\_for\_each\_node\_in\_client\_manager - -| Name | dn\_core\_client\_count\_for\_each\_node\_in\_client\_manager | -|:------------:|:--------------------------------------------------------------| -| Description | Number of core clients routed to each node in a ClientManager | -| Type | int | -| Default | 200 | -| Effective | After restarting system | - -* dn\_max\_client\_count\_for\_each\_node\_in\_client\_manager - -| Name | dn\_max\_client\_count\_for\_each\_node\_in\_client\_manager | -|:--------------:|:-------------------------------------------------------------| -| Description | Number of max clients routed to each node in a ClientManager | -| Type | int | -| Default | 300 | -| Effective | After restarting system | - -### Directory Configuration - -* dn\_system\_dir - -| Name | dn\_system\_dir | -|:-----------:|:----------------------------------------------------------------------------| -| Description | The directories of system files. It is recommended to use an absolute path. | -| Type | String | -| Default | data/datanode/system (Windows: data\\datanode\\system) | -| Effective | After restarting system | - -* dn\_data\_dirs - -| Name | dn\_data\_dirs | -|:-----------:|:-----------------------------------------------------------------------------------------------------------------------------------------| -| Description | The directories of data files. Multiple directories are separated by comma. The starting directory of the relative path is related to the operating system. It is recommended to use an absolute path. If the path does not exist, the system will automatically create it. 

| -| Type | String[] | -| Default | data/datanode/data (Windows: data\\datanode\\data) | -| Effective | After restarting system | - -* dn\_multi\_dir\_strategy - -| Name | dn\_multi\_dir\_strategy | -|:-----------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Description | IoTDB's strategy for selecting directories for TsFile in tsfile_dir. You can use a simple class name or a full name of the class. The system provides the following three strategies:
1. SequenceStrategy: IoTDB selects the directory from tsfile\_dir in order, traverses all the directories in tsfile\_dir in turn, and keeps counting;
2. MaxDiskUsableSpaceFirstStrategy: IoTDB first selects the directory with the largest free disk space in tsfile\_dir;
You can complete a user-defined policy in the following ways:
1. Inherit the org.apache.iotdb.db.storageengine.rescon.disk.strategy.DirectoryStrategy class and implement your own strategy method;

2. Fill in this configuration item with the full class name of the implemented class (package name plus class name, e.g. UserDefineStrategyPackage);

3. Add the jar file to the project. | -| Type | String | -| Default | SequenceStrategy | -| Effective | hot-load | - -* dn\_consensus\_dir - -| Name | dn\_consensus\_dir | -|:-----------:|:-------------------------------------------------------------------------------| -| Description | The directories of consensus files. It is recommended to use an absolute path. | -| Type | String | -| Default | data/datanode/consensus | -| Effective | After restarting system | - -* dn\_wal\_dirs - -| Name | dn\_wal\_dirs | -|:-----------:|:-------------------------------------------------------------------------| -| Description | Write Ahead Log storage path. It is recommended to use an absolute path. | -| Type | String | -| Default | data/datanode/wal | -| Effective | After restarting system | - -* dn\_tracing\_dir - -| Name | dn\_tracing\_dir | -|:-----------:|:----------------------------------------------------------------------------| -| Description | The tracing root directory path. It is recommended to use an absolute path. | -| Type | String | -| Default | datanode/tracing | -| Effective | After restarting system | - -* dn\_sync\_dir - -| Name | dn\_sync\_dir | -|:-----------:|:--------------------------------------------------------------------------| -| Description | The directories of sync files. It is recommended to use an absolute path. | -| Type | String | -| Default | data/datanode/sync | -| Effective | After restarting system | - -### Metric Configuration - -## Enable GC log - -GC log is off by default. -For performance tuning, you may want to collect the GC info. - -To enable GC log, just add a parameter "printgc" when you start the DataNode. - -```bash -nohup sbin/start-datanode.sh printgc >/dev/null 2>&1 & -``` -Or -```cmd -sbin\start-datanode.bat printgc -``` - -GC log is stored at `IOTDB_HOME/logs/gc.log`. -There will be at most 10 gc.log.* files and each one can reach to 10MB. 
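Putting the directory parameters documented above together, an `iotdb-datanode.properties` fragment that spreads TsFiles across two disks might look like the following sketch (the paths are hypothetical):

```properties
# Hypothetical layout: TsFiles on two data disks, WAL on a separate path.
# dn_data_dirs takes a comma-separated list; dn_multi_dir_strategy decides
# which of those directories each new TsFile is written to.
dn_data_dirs=/data1/iotdb/data,/data2/iotdb/data
dn_multi_dir_strategy=MaxDiskUsableSpaceFirstStrategy
dn_wal_dirs=/data1/iotdb/wal
```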
- -### REST Service Configuration - -* enable\_rest\_service - -|Name| enable\_rest\_service | -|:---:|:--------------------------------------| -|Description| Whether to enable the Rest service | -|Type| Boolean | -|Default| false | -|Effective| After restarting system | - -* rest\_service\_port - -|Name| rest\_service\_port | -|:---:|:------------------| -|Description| The Rest service listens to the port number | -|Type| int32 | -|Default| 18080 | -|Effective| After restarting system | - -* enable\_swagger - -|Name| enable\_swagger | -|:---:|:-----------------------| -|Description| Whether to enable swagger to display rest interface information | -|Type| Boolean | -|Default| false | -|Effective| After restarting system | - -* rest\_query\_default\_row\_size\_limit - -|Name| rest\_query\_default\_row\_size\_limit | -|:---:|:------------------------------------------------------------------------------------------| -|Description| The maximum number of rows in a result set that can be returned by a query | -|Type| int32 | -|Default| 10000 | -|Effective| After restarting system | - -* cache\_expire - -|Name| cache\_expire | -|:---:|:--------------------------------------------------------| -|Description| Expiration time for caching customer login information | -|Type| int32 | -|Default| 28800 | -|Effective| After restarting system | - -* cache\_max\_num - -|Name| cache\_max\_num | -|:---:|:--------------| -|Description| The maximum number of users stored in the cache | -|Type| int32 | -|Default| 100 | -|Effective| After restarting system | - -* cache\_init\_num - -|Name| cache\_init\_num | -|:---:|:---------------| -|Description| Initial cache capacity | -|Type| int32 | -|Default| 10 | -|Effective| After restarting system | - - -* trust\_store\_path - -|Name| trust\_store\_path | -|:---:|:---------------| -|Description| keyStore Password (optional) | -|Type| String | -|Default| "" | -|Effective| After restarting system | - -* trust\_store\_pwd - -|Name| trust\_store\_pwd | -|:---:|:---------------------------------| -|Description| trustStore Password (Optional) | -|Type| String | -|Default| "" | -|Effective| After restarting system | - -* idle\_timeout - -|Name| idle\_timeout | -|:---:|:--------------| -|Description| SSL timeout duration, expressed in seconds | -|Type| int32 | -|Default| 5000 | -|Effective| After restarting system | - diff --git a/src/UserGuide/V1.3.0-2/Reference/DataNode-Config-Manual_timecho.md b/src/UserGuide/V1.3.0-2/Reference/DataNode-Config-Manual_timecho.md deleted file mode 100644 index 9b2164402..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/DataNode-Config-Manual_timecho.md +++ /dev/null @@ -1,592 +0,0 @@ - - -# DataNode Configuration Parameters - -We use the same configuration files for IoTDB DataNode and Standalone version, all under the `conf`. - -* `datanode-env.sh/bat`:Environment configurations, in which we could set the memory allocation of DataNode and Standalone. - -* `iotdb-datanode.properties`:IoTDB system configurations. - -## Hot Modification Configuration - -For the convenience of users, IoTDB provides users with hot modification function, that is, modifying some configuration parameters in `iotdb-datanode.properties` during the system operation and applying them to the system immediately. -In the parameters described below, these parameters whose way of `Effective` is `hot-load` support hot modification. - -Trigger way: The client sends the command(sql) `load configuration` or `set configuration` to the IoTDB server. 
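For example, after editing a parameter whose `Effective` mode is `hot-load` in `iotdb-datanode.properties`, the change can be applied from the CLI without restarting the node (shown below is the `load configuration` variant; the exact `set configuration` syntax depends on the IoTDB version):

```sql
-- Reload the local configuration file so that hot-loadable parameters take effect
load configuration
```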
- -## Environment Configuration File(datanode-env.sh/bat) - -The environment configuration file is mainly used to configure the Java environment related parameters when DataNode is running, such as JVM related configuration. This part of the configuration is passed to the JVM when the DataNode starts. - -The details of each parameter are as follows: - -* MEMORY\_SIZE - -|Name|MEMORY\_SIZE| -|:---:|:---| -|Description|The minimum heap memory size that IoTDB DataNode will use when startup | -|Type|String| -|Default| The default is a half of the memory.| -|Effective|After restarting system| - -* ON\_HEAP\_MEMORY - -|Name|ON\_HEAP\_MEMORY| -|:---:|:---| -|Description|The heap memory size that IoTDB DataNode can use, Former Name: MAX\_HEAP\_SIZE | -|Type|String| -|Default| Calculate based on MEMORY\_SIZE.| -|Effective|After restarting system| - -* OFF\_HEAP\_MEMORY - -|Name|OFF\_HEAP\_MEMORY| -|:---:|:---| -|Description|The direct memory that IoTDB DataNode can use, Former Name: MAX\_DIRECT\_MEMORY\_SIZE| -|Type|String| -|Default| Calculate based on MEMORY\_SIZE.| -|Effective|After restarting system| - -* JMX\_LOCAL - -|Name|JMX\_LOCAL| -|:---:|:---| -|Description|JMX monitoring mode, configured as yes to allow only local monitoring, no to allow remote monitoring| -|Type|Enum String: "true", "false"| -|Default|true| -|Effective|After restarting system| - -* JMX\_PORT - -|Name|JMX\_PORT| -|:---:|:---| -|Description|JMX listening port. Please confirm that the port is not a system reserved port and is not occupied| -|Type|Short Int: [0,65535]| -|Default|31999| -|Effective|After restarting system| - -* JMX\_IP - -|Name|JMX\_IP| -|:---:|:---| -|Description|JMX listening address. Only take effect if JMX\_LOCAL=false. 0.0.0.0 is never allowed| -|Type|String| -|Default|127.0.0.1| -|Effective|After restarting system| - -## JMX Authorization - -We **STRONGLY RECOMMENDED** you CHANGE the PASSWORD for the JMX remote connection. - -The user and passwords are in ${IOTDB\_CONF}/conf/jmx.password. - -The permission definitions are in ${IOTDB\_CONF}/conf/jmx.access. - -## DataNode/Standalone Configuration File (iotdb-datanode.properties) - -### Data Node RPC Configuration - -* dn\_rpc\_address - -|Name| dn\_rpc\_address | -|:---:|:-----------------------------------------------| -|Description| The client rpc service listens on the address. 
| -|Type| String | -|Default| 0.0.0.0 | -|Effective| After restarting system | - -* dn\_rpc\_port - -|Name| dn\_rpc\_port | -|:---:|:---| -|Description| The client rpc service listens on the port.| -|Type|Short Int : [0,65535]| -|Default| 6667 | -|Effective|After restarting system| - -* dn\_internal\_address - -|Name| dn\_internal\_address | -|:---:|:---| -|Description| DataNode internal service host/IP | -|Type| string | -|Default| 127.0.0.1 | -|Effective|Only allowed to be modified in first start up| - -* dn\_internal\_port - -|Name| dn\_internal\_port | -|:---:|:-------------------------------| -|Description| DataNode internal service port | -|Type| int | -|Default| 10730 | -|Effective| Only allowed to be modified in first start up | - -* dn\_mpp\_data\_exchange\_port - -|Name| mpp\_data\_exchange\_port | -|:---:|:---| -|Description| MPP data exchange port | -|Type| int | -|Default| 10740 | -|Effective|Only allowed to be modified in first start up| - -* dn\_schema\_region\_consensus\_port - -|Name| dn\_schema\_region\_consensus\_port | -|:---:|:---| -|Description| DataNode Schema replica communication port for consensus | -|Type| int | -|Default| 10750 | -|Effective|Only allowed to be modified in first start up| - -* dn\_data\_region\_consensus\_port - -|Name| dn\_data\_region\_consensus\_port | -|:---:|:---| -|Description| DataNode Data replica communication port for consensus | -|Type| int | -|Default| 10760 | -|Effective|Only allowed to be modified in first start up| - -* dn\_join\_cluster\_retry\_interval\_ms - -|Name| dn\_join\_cluster\_retry\_interval\_ms | -|:---:|:--------------------------------------------------------------------------| -|Description| The time of data node waiting for the next retry to join into the cluster | -|Type| long | -|Default| 5000 | -|Effective| After restarting system | - -### SSL Configuration - -* enable\_thrift\_ssl - -|Name| enable\_thrift\_ssl | -|:---:|:---------------------------| -|Description|When enable\_thrift\_ssl is configured as true, SSL encryption will be used for communication through dn\_rpc\_port | -|Type| Boolean | -|Default| false | -|Effective| After restarting system | - -* enable\_https - -|Name| enable\_https | -|:---:|:-------------------------| -|Description| REST Service Specifies whether to enable SSL configuration | -|Type| Boolean | -|Default| false | -|Effective| After restarting system | - -* key\_store\_path - -|Name| key\_store\_path | -|:---:|:-----------------| -|Description| SSL certificate path | -|Type| String | -|Default| "" | -|Effective| After restarting system | - -* key\_store\_pwd - -|Name| key\_store\_pwd | -|:---:|:----------------| -|Description| SSL certificate password | -|Type| String | -|Default| "" | -|Effective| After restarting system | - -### Target Config Nodes - -* dn\_seed\_config\_node - -|Name| dn\_seed\_config\_node | -|:---:|:------------------------------------------------| -|Description| ConfigNode Address for DataNode to join cluster | -|Type| String | -|Default| 127.0.0.1:10710 | -|Effective| Only allowed to be modified in first start up | - -### Connection Configuration - -* dn\_rpc\_thrift\_compression\_enable - -|Name| dn\_rpc\_thrift\_compression\_enable | -|:---:|:---| -|Description| Whether enable thrift's compression (using GZIP).| -|Type|Boolean| -|Default| false | -|Effective|After restarting system| - -* dn\_rpc\_advanced\_compression\_enable - -|Name| dn\_rpc\_advanced\_compression\_enable | -|:---:|:---| -|Description| Whether enable thrift's advanced compression.| 
-|Type|Boolean| -|Default| false | -|Effective|After restarting system| - -* dn\_rpc\_selector\_thread\_count - -|Name| dn\_rpc\_selector\_thread\_count | -|:---:|:-----------------------------------| -|Description| The number of rpc selector thread. | -|Type| int | -|Default| false | -|Effective| After restarting system | - -* dn\_rpc\_min\_concurrent\_client\_num - -|Name| dn\_rpc\_min\_concurrent\_client\_num | -|:---:|:-----------------------------------| -|Description| Minimum concurrent rpc connections | -|Type| Short Int : [0,65535] | -|Description| 1 | -|Effective| After restarting system | - -* dn\_rpc\_max\_concurrent\_client\_num - -|Name| dn\_rpc\_max\_concurrent\_client\_num | -|:---:|:---| -|Description| Max concurrent rpc connections| -|Type| Short Int : [0,65535] | -|Description| 65535 | -|Effective|After restarting system| - -* dn\_thrift\_max\_frame\_size - -|Name| dn\_thrift\_max\_frame\_size | -|:---:|:---| -|Description| Max size of bytes of each thrift RPC request/response| -|Type| Long | -|Unit|Byte| -|Default| 536870912 | -|Effective|After restarting system| - -* dn\_thrift\_init\_buffer\_size - -|Name| dn\_thrift\_init\_buffer\_size | -|:---:|:---| -|Description| Initial size of bytes of buffer that thrift used | -|Type| long | -|Default| 1024 | -|Effective|After restarting system| - -* dn\_connection\_timeout\_ms - -| Name | dn\_connection\_timeout\_ms | -|:-----------:|:---------------------------------------------------| -| Description | Thrift socket and connection timeout between nodes | -| Type | int | -| Default | 60000 | -| Effective | After restarting system | - -* dn\_core\_client\_count\_for\_each\_node\_in\_client\_manager - -| Name | dn\_core\_client\_count\_for\_each\_node\_in\_client\_manager | -|:------------:|:--------------------------------------------------------------| -| Description | Number of core clients routed to each node in a ClientManager | -| Type | int | -| Default | 200 | -| Effective | After restarting system | - -* dn\_max\_client\_count\_for\_each\_node\_in\_client\_manager - -| Name | dn\_max\_client\_count\_for\_each\_node\_in\_client\_manager | -|:--------------:|:-------------------------------------------------------------| -| Description | Number of max clients routed to each node in a ClientManager | -| Type | int | -| Default | 300 | -| Effective | After restarting system | - -### Dictionary Configuration - -* dn\_system\_dir - -| Name | dn\_system\_dir | -|:-----------:|:----------------------------------------------------------------------------| -| Description | The directories of system files. It is recommended to use an absolute path. | -| Type | String | -| Default | data/datanode/system (Windows: data\\datanode\\system) | -| Effective | After restarting system | - -* dn\_data\_dirs - -| Name | dn\_data\_dirs | -|:-----------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Description | The directories of data files. Multiple directories are separated by comma. The starting directory of the relative path is related to the operating system. It is recommended to use an absolute path. If the path does not exist, the system will automatically create it. 
| -| Type | String[] | -| Default | data/datanode/data (Windows: data\\datanode\\data) | -| Effective | After restarting system | - -* dn\_multi\_dir\_strategy - -| Name | dn\_multi\_dir\_strategy | -|:-----------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| Description | IoTDB's strategy for selecting directories for TsFile in tsfile_dir. You can use a simple class name or a full name of the class. The system provides the following three strategies:
1. SequenceStrategy: IoTDB selects the directory from tsfile\_dir in order, traverses all the directories in tsfile\_dir in turn, and keeps counting;
2. MaxDiskUsableSpaceFirstStrategy: IoTDB first selects the directory with the largest free disk space in tsfile\_dir;
You can complete a user-defined policy in the following ways:
1. Inherit the org.apache.iotdb.db.storageengine.rescon.disk.strategy.DirectoryStrategy class and implement your own strategy method;

2. Fill in this configuration item with the full class name of the implemented class (package name plus class name, e.g. UserDefineStrategyPackage);

3. Add the jar file to the project. | -| Type | String | -| Default | SequenceStrategy | -| Effective | hot-load | - -* dn\_consensus\_dir - -| Name | dn\_consensus\_dir | -|:-----------:|:-------------------------------------------------------------------------------| -| Description | The directories of consensus files. It is recommended to use an absolute path. | -| Type | String | -| Default | data/datanode/consensus | -| Effective | After restarting system | - -* dn\_wal\_dirs - -| Name | dn\_wal\_dirs | -|:-----------:|:-------------------------------------------------------------------------| -| Description | Write Ahead Log storage path. It is recommended to use an absolute path. | -| Type | String | -| Default | data/datanode/wal | -| Effective | After restarting system | - -* dn\_tracing\_dir - -| Name | dn\_tracing\_dir | -|:-----------:|:----------------------------------------------------------------------------| -| Description | The tracing root directory path. It is recommended to use an absolute path. | -| Type | String | -| Default | datanode/tracing | -| Effective | After restarting system | - -* dn\_sync\_dir - -| Name | dn\_sync\_dir | -|:-----------:|:--------------------------------------------------------------------------| -| Description | The directories of sync files. It is recommended to use an absolute path. | -| Type | String | -| Default | data/datanode/sync | -| Effective | After restarting system | - -### Metric Configuration - -## Enable GC log - -GC log is off by default. -For performance tuning, you may want to collect the GC info. - -To enable GC log, just add a parameter "printgc" when you start the DataNode. - -```bash -nohup sbin/start-datanode.sh printgc >/dev/null 2>&1 & -``` -Or -```cmd -sbin\start-datanode.bat printgc -``` - -GC log is stored at `IOTDB_HOME/logs/gc.log`. -There will be at most 10 gc.log.* files and each one can reach to 10MB. 
- -### REST Service Configuration - -* enable\_rest\_service - -|Name| enable\_rest\_service | -|:---:|:--------------------------------------| -|Description| Whether to enable the Rest service | -|Type| Boolean | -|Default| false | -|Effective| After restarting system | - -* rest\_service\_port - -|Name| rest\_service\_port | -|:---:|:------------------| -|Description| The Rest service listens to the port number | -|Type| int32 | -|Default| 18080 | -|Effective| After restarting system | - -* enable\_swagger - -|Name| enable\_swagger | -|:---:|:-----------------------| -|Description| Whether to enable swagger to display rest interface information | -|Type| Boolean | -|Default| false | -|Effective| After restarting system | - -* rest\_query\_default\_row\_size\_limit - -|Name| rest\_query\_default\_row\_size\_limit | -|:---:|:------------------------------------------------------------------------------------------| -|Description| The maximum number of rows in a result set that can be returned by a query | -|Type| int32 | -|Default| 10000 | -|Effective| After restarting system | - -* cache\_expire - -|Name| cache\_expire | -|:---:|:--------------------------------------------------------| -|Description| Expiration time for caching customer login information | -|Type| int32 | -|Default| 28800 | -|Effective| After restarting system | - -* cache\_max\_num - -|Name| cache\_max\_num | -|:---:|:--------------| -|Description| The maximum number of users stored in the cache | -|Type| int32 | -|Default| 100 | -|Effective| After restarting system | - -* cache\_init\_num - -|Name| cache\_init\_num | -|:---:|:---------------| -|Description| Initial cache capacity | -|Type| int32 | -|Default| 10 | -|Effective| After restarting system | - - -* trust\_store\_path - -|Name| trust\_store\_path | -|:---:|:---------------| -|Description| keyStore Password (optional) | -|Type| String | -|Default| "" | -|Effective| After restarting system | - -* trust\_store\_pwd - -|Name| trust\_store\_pwd | -|:---:|:---------------------------------| -|Description| trustStore Password (Optional) | -|Type| String | -|Default| "" | -|Effective| After restarting system | - -* idle\_timeout - -|Name| idle\_timeout | -|:---:|:--------------| -|Description| SSL timeout duration, expressed in seconds | -|Type| int32 | -|Default| 5000 | -|Effective| After restarting system | - -#### Storage engine configuration - -* dn\_default\_space\_move\_thresholds - -|Name| dn\_default\_space\_move\_thresholds | -|:---:|:--------------| -|Description| Version 1.3.0/1: Define the minimum remaining space ratio for each tier data catalogue; when the remaining space is less than this ratio, the data will be automatically migrated to the next tier; when the remaining storage space of the last tier falls below this threshold, the system will be set to READ_ONLY | -|Type| double | -|Default| 0.15 | -|Effective| hot-load | - - -* dn\_default\_space\_usage\_thresholds - -|Name| dn\_default\_space\_usage\_thresholds | -|:---:|:--------------| -|Description| Version 1.3.2: Define the minimum remaining space ratio for each tier data catalogue; when the remaining space is less than this ratio, the data will be automatically migrated to the next tier; when the remaining storage space of the last tier falls below this threshold, the system will be set to READ_ONLY | -|Type| double | -|Default| 0.85 | -|Effective| hot-load | - -* remote\_tsfile\_cache\_dirs - -|Name| remote\_tsfile\_cache\_dirs | -|:---:|:--------------| -|Description| Cache directory stored 
locally in the cloud | -|Type| string | -|Default| data/datanode/data/cache | -|Effective| After restarting system | - -* remote\_tsfile\_cache\_page\_size\_in\_kb - -|Name| remote\_tsfile\_cache\_page\_size\_in\_kb | -|:---:|:--------------| -|Description| Block size of locally cached files stored in the cloud | -|Type| int | -|Default| 20480 | -|Effective| After restarting system | - -* remote\_tsfile\_cache\_max\_disk\_usage\_in\_mb - -|Name| remote\_tsfile\_cache\_max\_disk\_usage\_in\_mb | -|:---:|:--------------| -|Description| Maximum Disk Occupancy Size for Cloud Storage Local Cache | -|Type| long | -|Default| 51200 | -|Effective| After restarting system | - -* object\_storage\_type - -|Name| object\_storage\_type | -|:---:|:--------------| -|Description| Cloud Storage Type | -|Type| string | -|Default| AWS_S3 | -|Effective| After restarting system | - -* object\_storage\_bucket - -|Name| object\_storage\_bucket | -|:---:|:--------------| -|Description| Name of cloud storage bucket | -|Type| string | -|Default| iotdb_data | -|Effective| After restarting system | - -* object\_storage\_endpoiont - -|Name| object\_storage\_endpoiont | -|:---:|:--------------| -|Description| endpoint of cloud storage | -|Type| string | -|Default| None | -|Effective| After restarting system | - -* object\_storage\_access\_key - -|Name| object\_storage\_access\_key | -|:---:|:--------------| -|Description| Authentication information stored in the cloud: key | -|Type| string | -|Default| None | -|Effective| After restarting system | - -* object\_storage\_access\_secret - -|Name| object\_storage\_access\_secret | -|:---:|:--------------| -|Description| Authentication information stored in the cloud: secret | -|Type| string | -|Default| None | -|Effective| After restarting system | diff --git a/src/UserGuide/V1.3.0-2/Reference/Function-and-Expression.md b/src/UserGuide/V1.3.0-2/Reference/Function-and-Expression.md deleted file mode 100644 index c208d6f17..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/Function-and-Expression.md +++ /dev/null @@ -1,3014 +0,0 @@ -# Function and Expression - - - -## Arithmetic Operators and Functions - -### Arithmetic Operators - -#### Unary Arithmetic Operators - -Supported operators: `+`, `-` - -Supported input data types: `INT32`, `INT64` and `FLOAT` - -Output data type: consistent with the input data type - -#### Binary Arithmetic Operators - -Supported operators: `+`, `-`, `*`, `/`, `%` - -Supported input data types: `INT32`, `INT64`, `FLOAT` and `DOUBLE` - -Output data type: `DOUBLE` - -Note: Only when the left operand and the right operand under a certain timestamp are not `null`, the binary arithmetic operation will have an output value. 
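In other words, a timestamp at which either operand is null produces no value for the expression. As an illustrative sketch (reusing the series names `s1` and `s2` from the example below; adjust the path to your own data), the following query lists exactly those rows, and the `s1 + s2` column stays empty for them:

```sql
select s1, s2, s1 + s2 from root.sg.d1 where s1 is null or s2 is null;
```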
- -#### Example - -```sql -select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 -``` - -Result: - -``` -+-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ -| Time|root.sg.d1.s1|-root.sg.d1.s1|root.sg.d1.s2|root.sg.d1.s2|root.sg.d1.s1 + root.sg.d1.s2|root.sg.d1.s1 - root.sg.d1.s2|root.sg.d1.s1 * root.sg.d1.s2|root.sg.d1.s1 / root.sg.d1.s2|root.sg.d1.s1 % root.sg.d1.s2| -+-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| -1.0| 1.0| 1.0| 2.0| 0.0| 1.0| 1.0| 0.0| -|1970-01-01T08:00:00.002+08:00| 2.0| -2.0| 2.0| 2.0| 4.0| 0.0| 4.0| 1.0| 0.0| -|1970-01-01T08:00:00.003+08:00| 3.0| -3.0| 3.0| 3.0| 6.0| 0.0| 9.0| 1.0| 0.0| -|1970-01-01T08:00:00.004+08:00| 4.0| -4.0| 4.0| 4.0| 8.0| 0.0| 16.0| 1.0| 0.0| -|1970-01-01T08:00:00.005+08:00| 5.0| -5.0| 5.0| 5.0| 10.0| 0.0| 25.0| 1.0| 0.0| -+-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ -Total line number = 5 -It costs 0.014s -``` - -### Arithmetic Functions - -Currently, IoTDB supports the following mathematical functions. The behavior of these mathematical functions is consistent with the behavior of these functions in the Java Math standard library. - -| Function Name | Allowed Input Series Data Types | Output Series Data Type | Required Attributes | Corresponding Implementation in the Java Standard Library | -| ------------- | ------------------------------- | ----------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -| SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sin(double) | -| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cos(double) | -| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tan(double) | -| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#asin(double) | -| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#acos(double) | -| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#atan(double) | -| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sinh(double) | -| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cosh(double) | -| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tanh(double) | -| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toDegrees(double) | -| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toRadians(double) | -| ABS | INT32 / INT64 / FLOAT / DOUBLE | Same type as the input series | / | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | -| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#signum(double) | -| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#ceil(double) | -| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#floor(double) | -| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | 'places' : Round the significant number, positive number is the significant number after the decimal point, negative number is the significant number of whole 
number | Math#rint(Math#pow(10,places))/Math#pow(10,places) | -| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#exp(double) | -| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log(double) | -| LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log10(double) | -| SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sqrt(double) | - -Example: - -``` sql -select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; -``` - -Result: - -``` -+-----------------------------+-------------------+-------------------+--------------------+-------------------+ -| Time| root.sg1.d1.s1|sin(root.sg1.d1.s1)| cos(root.sg1.d1.s1)|tan(root.sg1.d1.s1)| -+-----------------------------+-------------------+-------------------+--------------------+-------------------+ -|2020-12-10T17:11:49.037+08:00|7360723084922759782| 0.8133527237573284| 0.5817708713544664| 1.3980636773094157| -|2020-12-10T17:11:49.038+08:00|4377791063319964531|-0.8938962705202537| 0.4482738644511651| -1.994085181866842| -|2020-12-10T17:11:49.039+08:00|7972485567734642915| 0.9627757585308978|-0.27030138509681073|-3.5618602479083545| -|2020-12-10T17:11:49.040+08:00|2508858212791964081|-0.6073417341629443| -0.7944406950452296| 0.7644897069734913| -|2020-12-10T17:11:49.041+08:00|2817297431185141819|-0.8419358900502509| -0.5395775727782725| 1.5603611649667768| -+-----------------------------+-------------------+-------------------+--------------------+-------------------+ -Total line number = 5 -It costs 0.008s -``` - -#### ROUND -Example: -```sql -select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1 -``` - -```sql -+-----------------------------+-------------+--------------------+----------------------+-----------------------+ -| Time|root.db.d1.s4|ROUND(root.db.d1.s4)|ROUND(root.db.d1.s4,2)|ROUND(root.db.d1.s4,-1)| -+-----------------------------+-------------+--------------------+----------------------+-----------------------+ -|1970-01-01T08:00:00.001+08:00| 101.14345| 101.0| 101.14| 100.0| -|1970-01-01T08:00:00.002+08:00| 20.144346| 20.0| 20.14| 20.0| -|1970-01-01T08:00:00.003+08:00| 20.614372| 21.0| 20.61| 20.0| -|1970-01-01T08:00:00.005+08:00| 20.814346| 21.0| 20.81| 20.0| -|1970-01-01T08:00:00.006+08:00| 60.71443| 61.0| 60.71| 60.0| -|2023-03-13T16:16:19.764+08:00| 10.143425| 10.0| 10.14| 10.0| -+-----------------------------+-------------+--------------------+----------------------+-----------------------+ -Total line number = 6 -It costs 0.059s -``` - - - -## Comparison Operators and Functions - -### Basic comparison operators - -Supported operators `>`, `>=`, `<`, `<=`, `==`, `!=` (or `<>` ) - -Supported input data types: `INT32`, `INT64`, `FLOAT` and `DOUBLE` - -Note: It will transform all type to `DOUBLE` then do computation. 
- -Output data type: `BOOLEAN` - -**Example:** - -```sql -select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; -``` - -``` -IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; -+-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ -| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| -+-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| -|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| -|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| -|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| -|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| -|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| -+-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ -``` - -### `BETWEEN ... AND ...` operator - -|operator |meaning| -|-----------------------------|-----------| -|`BETWEEN ... AND ...` |within the specified range| -|`NOT BETWEEN ... AND ...` |Not within the specified range| - -**Example:** Select data within or outside the interval [36.5,40]: - -```sql -select temperature from root.sg1.d1 where temperature between 36.5 and 40; -``` - -```sql -select temperature from root.sg1.d1 where temperature not between 36.5 and 40; -``` - -### Fuzzy matching operator - -For TEXT type data, support fuzzy matching of data using `Like` and `Regexp` operators. - -|operator |meaning| -|-----------------------------|-----------| -|`LIKE` | matches simple patterns| -|`NOT LIKE` |cannot match simple pattern| -|`REGEXP` | Match regular expression| -|`NOT REGEXP` |Cannot match regular expression| - -Input data type: `TEXT` - -Return type: `BOOLEAN` - -#### Use `Like` for fuzzy matching - -**Matching rules:** - -- `%` means any 0 or more characters. -- `_` means any single character. - -**Example 1:** Query the data under `root.sg.d1` that contains `'cc'` in `value`. - -```shell -IoTDB> select * from root.sg.d1 where value like '%cc%' -+--------------------------+----------------+ -| Time|root.sg.d1.value| -+--------------------------+----------------+ -|2017-11-01T00:00:00.000+08:00| aabbccdd| -|2017-11-01T00:00:01.000+08:00| cc| -+--------------------------+----------------+ -Total line number = 2 -It costs 0.002s -``` - -**Example 2:** Query the data under `root.sg.d1` with `'b'` in the middle of `value` and any single character before and after. - -```shell -IoTDB> select * from root.sg.device where value like '_b_' -+--------------------------+----------------+ -| Time|root.sg.d1.value| -+--------------------------+----------------+ -|2017-11-01T00:00:02.000+08:00|abc| -+--------------------------+----------------+ -Total line number = 1 -It costs 0.002s -``` - -#### Use `Regexp` for fuzzy matching - -The filter condition that needs to be passed in is **Java standard library style regular expression**. 
- -**Common regular matching examples:** - -``` -All characters with a length of 3-20: ^.{3,20}$ -Uppercase English characters: ^[A-Z]+$ -Numbers and English characters: ^[A-Za-z0-9]+$ -Starting with a: ^a.* -``` - -**Example 1:** Query the string of 26 English characters for value under root.sg.d1. - -```shell -IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' -+--------------------------+----------------+ -| Time|root.sg.d1.value| -+--------------------------+----------------+ -|2017-11-01T00:00:00.000+08:00| aabbccdd| -|2017-11-01T00:00:01.000+08:00| cc| -+--------------------------+----------------+ -Total line number = 2 -It costs 0.002s -``` - -**Example 2:** Query root.sg.d1 where the value is a string consisting of 26 lowercase English characters and the time is greater than 100. - -```shell -IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 -+--------------------------+----------------+ -| Time|root.sg.d1.value| -+--------------------------+----------------+ -|2017-11-01T00:00:00.000+08:00| aabbccdd| -|2017-11-01T00:00:01.000+08:00| cc| -+--------------------------+----------------+ -Total line number = 2 -It costs 0.002s -``` - -**Example 3:** - -```sql -select b, b like '1%', b regexp '[0-2]' from root.test; -``` - -operation result -``` -+-----------------------------+-----------+------- ------------------+--------------------------+ -| Time|root.test.b|root.test.b LIKE '^1.*?$'|root.test.b REGEXP '[0-2]'| -+-----------------------------+-----------+------- ------------------+--------------------------+ -|1970-01-01T08:00:00.001+08:00| 111test111| true| true| -|1970-01-01T08:00:00.003+08:00| 333test333| false| false| -+-----------------------------+-----------+------- ------------------+--------------------------+ -``` - -### `IS NULL` operator - -|operator |meaning| -|-----------------------------|-----------| -|`IS NULL` |is a null value| -|`IS NOT NULL` |is not a null value| - -**Example 1:** Select data with empty values: - -```sql -select code from root.sg1.d1 where temperature is null; -``` - -**Example 2:** Select data with non-null values: - -```sql -select code from root.sg1.d1 where temperature is not null; -``` - -### `IN` operator - -|operator |meaning| -|-----------------------------|-----------| -|`IN` / `CONTAINS` | are the values ​​in the specified list| -|`NOT IN` / `NOT CONTAINS` |not a value in the specified list| - -Input data type: `All Types` - -return type `BOOLEAN` - -**Note: Please ensure that the values ​​in the collection can be converted to the type of the input data. 
** -> -> For example: -> -> `s1 in (1, 2, 3, 'test')`, the data type of `s1` is `INT32` -> -> We will throw an exception because `'test'` cannot be converted to type `INT32` - -**Example 1:** Select data with values ​​within a certain range: - -```sql -select code from root.sg1.d1 where code in ('200', '300', '400', '500'); -``` - -**Example 2:** Select data with values ​​outside a certain range: - -```sql -select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); -``` - -**Example 3:** - -```sql -select a, a in (1, 2) from root.test; -``` - -Output 2: -``` -+-----------------------------+-----------+------- -------------+ -| Time|root.test.a|root.test.a IN (1,2)| -+-----------------------------+-----------+------- -------------+ -|1970-01-01T08:00:00.001+08:00| 1| true| -|1970-01-01T08:00:00.003+08:00| 3| false| -+-----------------------------+-----------+------- -------------+ -``` - -### Condition Functions - -Condition functions are used to check whether timeseries data points satisfy some specific condition. - -They return BOOLEANs. - -Currently, IoTDB supports the following condition functions: - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| ------------- | ------------------------------- | --------------------------------------------- | ----------------------- | --------------------------------------------- | -| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`: a double type variate | BOOLEAN | Return `ts_value >= threshold`. | -| IN_RANGR | INT32 / INT64 / FLOAT / DOUBLE | `lower`: DOUBLE type
`upper`: DOUBLE type | BOOLEAN | Return `ts_value >= lower && value <= upper`. | - -Example Data: -``` -IoTDB> select ts from root.test; -+-----------------------------+------------+ -| Time|root.test.ts| -+-----------------------------+------------+ -|1970-01-01T08:00:00.001+08:00| 1| -|1970-01-01T08:00:00.002+08:00| 2| -|1970-01-01T08:00:00.003+08:00| 3| -|1970-01-01T08:00:00.004+08:00| 4| -+-----------------------------+------------+ -``` - -#### Test 1 -SQL: -```sql -select ts, on_off(ts, 'threshold'='2') from root.test; -``` - -Output: -``` -IoTDB> select ts, on_off(ts, 'threshold'='2') from root.test; -+-----------------------------+------------+-------------------------------------+ -| Time|root.test.ts|on_off(root.test.ts, "threshold"="2")| -+-----------------------------+------------+-------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1| false| -|1970-01-01T08:00:00.002+08:00| 2| true| -|1970-01-01T08:00:00.003+08:00| 3| true| -|1970-01-01T08:00:00.004+08:00| 4| true| -+-----------------------------+------------+-------------------------------------+ -``` - -#### Test 2 -Sql: -```sql -select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; -``` - -Output: -``` -IoTDB> select ts, in_range(ts,'lower'='2', 'upper'='3.1') from root.test; -+-----------------------------+------------+--------------------------------------------------+ -| Time|root.test.ts|in_range(root.test.ts, "lower"="2", "upper"="3.1")| -+-----------------------------+------------+--------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1| false| -|1970-01-01T08:00:00.002+08:00| 2| true| -|1970-01-01T08:00:00.003+08:00| 3| true| -|1970-01-01T08:00:00.004+08:00| 4| false| -+-----------------------------+------------+--------------------------------------------------+ -``` - - - -## Logical Operators - -### Unary Logical Operators - -Supported operator `!` - -Supported input data types: `BOOLEAN` - -Output data type: `BOOLEAN` - -Hint: the priority of `!` is the same as `-`. Remember to use brackets to modify priority. - -### Binary Logical Operators - -Supported operators AND:`and`,`&`, `&&`; OR:`or`,`|`,`||` - -Supported input data types: `BOOLEAN` - -Output data type: `BOOLEAN` - -Note: Only when the left operand and the right operand under a certain timestamp are both `BOOLEAN` type, the binary logic operation will have an output value. 
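Besides producing BOOLEAN result columns as in the example below, logical operators are commonly combined with comparisons in a WHERE clause to filter rows. A minimal sketch using the same series `root.test.a` and `root.test.b`:

```sql
select a, b from root.test where a > 10 and a <= b;
```

Only the timestamps for which both conditions evaluate to true are returned.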
- -**Example:** - -```sql -select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; -``` - -Output: -``` -IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; -+-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ -| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| -+-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| -|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| -|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| -|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| -|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| -|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| -+-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ -``` - - - -## Aggregate Functions - -Aggregate functions are many-to-one functions. They perform aggregate calculations on a set of values, resulting in a single aggregated result. - -All aggregate functions except `COUNT()`, `COUNT_IF()` ignore null values and return null when there are no input rows or all values are null. For example, `SUM()` returns null instead of zero, and `AVG()` does not include null values in the count. - -The aggregate functions supported by IoTDB are as follows: - -| Function Name | Description | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | -| ------------- | ------------------------------------------------------------ |-----------------------------------------------------| ------------------------------------------------------------ | ----------------------------------- | -| SUM | Summation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| COUNT | Counts the number of data points. | All data types | / | INT | -| AVG | Average. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| EXTREME | Finds the value with the largest absolute value. Returns a positive value if the maximum absolute value of positive and negative values is equal. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | -| MAX_VALUE | Find the maximum value. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | -| MIN_VALUE | Find the minimum value. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | -| FIRST_VALUE | Find the value with the smallest timestamp. | All data types | / | Consistent with input data type | -| LAST_VALUE | Find the value with the largest timestamp. | All data types | / | Consistent with input data type | -| MAX_TIME | Find the maximum timestamp. | All data Types | / | Timestamp | -| MIN_TIME | Find the minimum timestamp. | All data Types | / | Timestamp | -| COUNT_IF | Find the number of data points that continuously meet a given condition and the number of data points that meet the condition (represented by keep) meet the specified threshold. 
| BOOLEAN | `[keep >=/>/=/!=/<=/<]threshold`: the specified threshold or threshold condition; writing `threshold` alone is equivalent to `keep >= threshold`, and the type of `threshold` is `INT64`. `ignoreNull`: optional, default value is `true`; if the value is `true`, null values are ignored, meaning that a null value in the middle of a run is skipped without interrupting the continuity; if the value is `false`, null values are not ignored, meaning that a null value in the middle of a run breaks the continuity | INT64 | -| TIME_DURATION | Find the difference between the timestamp of the largest non-null value and the timestamp of the smallest non-null value in a column | All data Types | / | INT64 | -| MODE | Find the mode. Note: 1. Having too many different values in the input series risks a memory exception; 2. If all the elements have the same number of occurrences, i.e. there is no mode, return the value with the earliest time; 3. If there are several modes, return the one with the earliest time. | All data Types | / | Consistent with the input data type | -| STDDEV | Calculate the overall standard deviation of the data. Note:<br>
Missing points, null points and `NaN` in the input series will be ignored.| INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| COUNT_TIME | The number of timestamps in the query data set. When used with `align by device`, the result is the number of timestamps in the data set per device. | All data Types, the input parameter can only be `*` | / | INT64 | - - -### COUNT - -#### example - -```sql -select count(status) from root.ln.wf01.wt01; -``` -Result: - -``` -+-------------------------------+ -|count(root.ln.wf01.wt01.status)| -+-------------------------------+ -| 10080| -+-------------------------------+ -Total line number = 1 -It costs 0.016s -``` - -### COUNT_IF - -#### Grammar -```sql -count_if(predicate, [keep >=/>/=/!=/Note: count_if is not supported to use with SlidingWindow in group by time now - -#### example - -##### raw data - -``` -+-----------------------------+-------------+-------------+ -| Time|root.db.d1.s1|root.db.d1.s2| -+-----------------------------+-------------+-------------+ -|1970-01-01T08:00:00.001+08:00| 0| 0| -|1970-01-01T08:00:00.002+08:00| null| 0| -|1970-01-01T08:00:00.003+08:00| 0| 0| -|1970-01-01T08:00:00.004+08:00| 0| 0| -|1970-01-01T08:00:00.005+08:00| 1| 0| -|1970-01-01T08:00:00.006+08:00| 1| 0| -|1970-01-01T08:00:00.007+08:00| 1| 0| -|1970-01-01T08:00:00.008+08:00| 0| 0| -|1970-01-01T08:00:00.009+08:00| 0| 0| -|1970-01-01T08:00:00.010+08:00| 0| 0| -+-----------------------------+-------------+-------------+ -``` - -##### Not use `ignoreNull` attribute (Ignore Null) - -SQL: -```sql -select count_if(s1=0 & s2=0, 3), count_if(s1=1 & s2=0, 3) from root.db.d1 -``` - -Result: -``` -+--------------------------------------------------+--------------------------------------------------+ -|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3)|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3)| -+--------------------------------------------------+--------------------------------------------------+ -| 2| 1| -+--------------------------------------------------+-------------------------------------------------- -``` - -##### Use `ignoreNull` attribute - -SQL: -```sql -select count_if(s1=0 & s2=0, 3, 'ignoreNull'='false'), count_if(s1=1 & s2=0, 3, 'ignoreNull'='false') from root.db.d1 -``` - -Result: -``` -+------------------------------------------------------------------------+------------------------------------------------------------------------+ -|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")| -+------------------------------------------------------------------------+------------------------------------------------------------------------+ -| 1| 1| -+------------------------------------------------------------------------+------------------------------------------------------------------------+ -``` - -### TIME_DURATION -#### Grammar -```sql - time_duration(Path) -``` -#### Example -##### raw data -```sql -+----------+-------------+ -| Time|root.db.d1.s1| -+----------+-------------+ -| 1| 70| -| 3| 10| -| 4| 303| -| 6| 110| -| 7| 302| -| 8| 110| -| 9| 60| -| 10| 70| -|1677570934| 30| -+----------+-------------+ -``` -##### Insert sql -```sql -"CREATE DATABASE root.db", -"CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN tags(city=Beijing)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1, 2, 10, true)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(2, null, 20, true)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(3, 10, 0, null)", 
-"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(4, 303, 30, true)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(5, null, 20, true)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(6, 110, 20, true)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(7, 302, 20, true)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(8, 110, null, true)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(9, 60, 20, true)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(10,70, 20, null)", -"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1677570934, 30, 0, true)", -``` - -SQL: -```sql -select time_duration(s1) from root.db.d1 -``` - -Result: -``` -+----------------------------+ -|time_duration(root.db.d1.s1)| -+----------------------------+ -| 1677570933| -+----------------------------+ -``` -> Note: Returns 0 if there is only one data point, or null if the data point is null. - -### COUNT_TIME -#### Grammar -```sql - count_time(*) -``` -#### Example -##### raw data -``` -+----------+-------------+-------------+-------------+-------------+ -| Time|root.db.d1.s1|root.db.d1.s2|root.db.d2.s1|root.db.d2.s2| -+----------+-------------+-------------+-------------+-------------+ -| 0| 0| null| null| 0| -| 1| null| 1| 1| null| -| 2| null| 2| 2| null| -| 4| 4| null| null| 4| -| 5| 5| 5| 5| 5| -| 7| null| 7| 7| null| -| 8| 8| 8| 8| 8| -| 9| null| 9| null| null| -+----------+-------------+-------------+-------------+-------------+ -``` -##### Insert sql -```sql -CREATE DATABASE root.db; -CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; -CREATE TIMESERIES root.db.d1.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; -CREATE TIMESERIES root.db.d2.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; -CREATE TIMESERIES root.db.d2.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; -INSERT INTO root.db.d1(time, s1) VALUES(0, 0), (4,4), (5,5), (8,8); -INSERT INTO root.db.d1(time, s2) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8), (9,9); -INSERT INTO root.db.d2(time, s1) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8); -INSERT INTO root.db.d2(time, s2) VALUES(0, 0), (4,4), (5,5), (8,8); -``` - -Query-Example - 1: -```sql -select count_time(*) from root.db.** -``` - -Result -``` -+-------------+ -|count_time(*)| -+-------------+ -| 8| -+-------------+ -``` - -Query-Example - 2: -```sql -select count_time(*) from root.db.d1, root.db.d2 -``` - -Result -``` -+-------------+ -|count_time(*)| -+-------------+ -| 8| -+-------------+ -``` - -Query-Example - 3: -```sql -select count_time(*) from root.db.** group by([0, 10), 2ms) -``` - -Result -``` -+-----------------------------+-------------+ -| Time|count_time(*)| -+-----------------------------+-------------+ -|1970-01-01T08:00:00.000+08:00| 2| -|1970-01-01T08:00:00.002+08:00| 1| -|1970-01-01T08:00:00.004+08:00| 2| -|1970-01-01T08:00:00.006+08:00| 1| -|1970-01-01T08:00:00.008+08:00| 2| -+-----------------------------+-------------+ -``` - -Query-Example - 4: -```sql -select count_time(*) from root.db.** group by([0, 10), 2ms) align by device -``` - -Result -``` -+-----------------------------+----------+-------------+ -| Time| Device|count_time(*)| -+-----------------------------+----------+-------------+ -|1970-01-01T08:00:00.000+08:00|root.db.d1| 2| -|1970-01-01T08:00:00.002+08:00|root.db.d1| 1| -|1970-01-01T08:00:00.004+08:00|root.db.d1| 2| -|1970-01-01T08:00:00.006+08:00|root.db.d1| 1| -|1970-01-01T08:00:00.008+08:00|root.db.d1| 2| -|1970-01-01T08:00:00.000+08:00|root.db.d2| 2| -|1970-01-01T08:00:00.002+08:00|root.db.d2| 1| -|1970-01-01T08:00:00.004+08:00|root.db.d2| 2| 
-|1970-01-01T08:00:00.006+08:00|root.db.d2| 1| -|1970-01-01T08:00:00.008+08:00|root.db.d2| 1| -+-----------------------------+----------+-------------+ -``` - -> Note: -> 1. The parameter in count_time can only be *. -> 2. Count_time aggregation cannot be used with other aggregation functions. -> 3. Count_time aggregation used with having statement is not supported, and count_time aggregation can not appear in the having statement. -> 4. Count_time does not support use with group by level, group by tag. - - - -## String Processing - -### STRING_CONTAINS - -#### Function introduction - -This function checks whether the substring `s` exists in the string - -**Function name:** STRING_CONTAINS - -**Input sequence:** Only a single input sequence is supported, the type is TEXT. - -**parameter:** -+ `s`: The string to search for. - -**Output Sequence:** Output a single sequence, the type is BOOLEAN. - -#### Usage example - -``` sql -select s1, string_contains(s1, 's'='warn') from root.sg1.d4; -``` - -``` -+-----------------------------+--------------+-------------------------------------------+ -| Time|root.sg1.d4.s1|string_contains(root.sg1.d4.s1, "s"="warn")| -+-----------------------------+--------------+-------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| warn:-8721| true| -|1970-01-01T08:00:00.002+08:00| error:-37229| false| -|1970-01-01T08:00:00.003+08:00| warn:1731| true| -+-----------------------------+--------------+-------------------------------------------+ -Total line number = 3 -It costs 0.007s -``` - -### STRING_MATCHES - -#### Function introduction - -This function judges whether a string can be matched by the regular expression `regex`. - -**Function name:** STRING_MATCHES - -**Input sequence:** Only a single input sequence is supported, the type is TEXT. - -**parameter:** -+ `regex`: Java standard library-style regular expressions. - -**Output Sequence:** Output a single sequence, the type is BOOLEAN. - -#### Usage example - -```sql -select s1, string_matches(s1, 'regex'='[^\\s]+37229') from root.sg1.d4; -``` - -``` -+-----------------------------+--------------+------------------------------------------------------+ -| Time|root.sg1.d4.s1|string_matches(root.sg1.d4.s1, "regex"="[^\\s]+37229")| -+-----------------------------+--------------+------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| warn:-8721| false| -|1970-01-01T08:00:00.002+08:00| error:-37229| true| -|1970-01-01T08:00:00.003+08:00| warn:1731| false| -+-----------------------------+--------------+------------------------------------------------------+ -Total line number = 3 -It costs 0.007s -``` - -### Length - -#### Usage - -The function is used to get the length of input series. - -**Name:** LENGTH - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Output Series:** Output a single series. The type is INT32. - -**Note:** Returns NULL if input is NULL. 
- -#### Examples - -Input series: - -``` -+-----------------------------+--------------+ -| Time|root.sg1.d1.s1| -+-----------------------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| -|1970-01-01T08:00:00.002+08:00| 22test22| -+-----------------------------+--------------+ -``` - -SQL for query: - -```sql -select s1, length(s1) from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+----------------------+ -| Time|root.sg1.d1.s1|length(root.sg1.d1.s1)| -+-----------------------------+--------------+----------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| 6| -|1970-01-01T08:00:00.002+08:00| 22test22| 8| -+-----------------------------+--------------+----------------------+ -``` - -### Locate - -#### Usage - -The function is used to get the position of the first occurrence of substring `target` in input series. Returns -1 if there are no `target` in input. - -**Name:** LOCATE - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Parameter:** - -+ `target`: The substring to be located. -+ `reverse`: Indicates whether reverse locate is required. The default value is `false`, means left-to-right locate. - -**Output Series:** Output a single series. The type is INT32. - -**Note:** The index begins from 0. - -#### Examples - -Input series: - -``` -+-----------------------------+--------------+ -| Time|root.sg1.d1.s1| -+-----------------------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| -|1970-01-01T08:00:00.002+08:00| 22test22| -+-----------------------------+--------------+ -``` - -SQL for query: - -```sql -select s1, locate(s1, "target"="1") from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+------------------------------------+ -| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1")| -+-----------------------------+--------------+------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| 0| -|1970-01-01T08:00:00.002+08:00| 22test22| -1| -+-----------------------------+--------------+------------------------------------+ -``` - -Another SQL for query: - -```sql -select s1, locate(s1, "target"="1", "reverse"="true") from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+------------------------------------------------------+ -| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1", "reverse"="true")| -+-----------------------------+--------------+------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| 5| -|1970-01-01T08:00:00.002+08:00| 22test22| -1| -+-----------------------------+--------------+------------------------------------------------------+ -``` - -### StartsWith - -#### Usage - -The function is used to check whether input series starts with the specified prefix. - -**Name:** STARTSWITH - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Parameter:** -+ `target`: The prefix to be checked. - -**Output Series:** Output a single series. The type is BOOLEAN. - -**Note:** Returns NULL if input is NULL. 
- -#### Examples - -Input series: - -``` -+-----------------------------+--------------+ -| Time|root.sg1.d1.s1| -+-----------------------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| -|1970-01-01T08:00:00.002+08:00| 22test22| -+-----------------------------+--------------+ -``` - -SQL for query: - -```sql -select s1, startswith(s1, "target"="1") from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+----------------------------------------+ -| Time|root.sg1.d1.s1|startswith(root.sg1.d1.s1, "target"="1")| -+-----------------------------+--------------+----------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| true| -|1970-01-01T08:00:00.002+08:00| 22test22| false| -+-----------------------------+--------------+----------------------------------------+ -``` - -### EndsWith - -#### Usage - -The function is used to check whether input series ends with the specified suffix. - -**Name:** ENDSWITH - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Parameter:** -+ `target`: The suffix to be checked. - -**Output Series:** Output a single series. The type is BOOLEAN. - -**Note:** Returns NULL if input is NULL. - -#### Examples - -Input series: - -``` -+-----------------------------+--------------+ -| Time|root.sg1.d1.s1| -+-----------------------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| -|1970-01-01T08:00:00.002+08:00| 22test22| -+-----------------------------+--------------+ -``` - -SQL for query: - -```sql -select s1, endswith(s1, "target"="1") from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+--------------------------------------+ -| Time|root.sg1.d1.s1|endswith(root.sg1.d1.s1, "target"="1")| -+-----------------------------+--------------+--------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| true| -|1970-01-01T08:00:00.002+08:00| 22test22| false| -+-----------------------------+--------------+--------------------------------------+ -``` - -### Concat - -#### Usage - -The function is used to concat input series and target strings. - -**Name:** CONCAT - -**Input Series:** At least one input series. The data type is TEXT. - -**Parameter:** -+ `targets`: A series of K-V, key needs to start with `target` and be not duplicated, value is the string you want to concat. -+ `series_behind`: Indicates whether series behind targets. The default value is `false`. - -**Output Series:** Output a single series. The type is TEXT. - -**Note:** -+ If value of input series is NULL, it will be skipped. -+ We can only concat input series and `targets` separately. `concat(s1, "target1"="IoT", s2, "target2"="DB")` and - `concat(s1, s2, "target1"="IoT", "target2"="DB")` gives the same result. 
- -#### Examples - -Input series: - -``` -+-----------------------------+--------------+--------------+ -| Time|root.sg1.d1.s1|root.sg1.d1.s2| -+-----------------------------+--------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| null| -|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| -+-----------------------------+--------------+--------------+ -``` - -SQL for query: - -```sql -select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB") from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ -| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB")| -+-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| null| 1test1IoTDB| -|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 22test222222testIoTDB| -+-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ -``` - -Another SQL for query: - -```sql -select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB", "series_behind"="true") from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ -| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB", "series_behind"="true")| -+-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| null| IoTDB1test1| -|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| IoTDB22test222222test| -+-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ -``` - -### substring - -#### Usage - -Extracts a substring of a string, starting with the first specified character and stopping after the specified number of characters.The index start at 1. The value range of from and for is an INT32. - -**Name:** SUBSTRING - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Parameter:** -+ `from`: Indicates the start position of substring. -+ `for`: Indicates how many characters to stop after of substring. - -**Output Series:** Output a single series. The type is TEXT. - -**Note:** Returns NULL if input is NULL. 
- -#### Examples - -Input series: - -``` -+-----------------------------+--------------+ -| Time|root.sg1.d1.s1| -+-----------------------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| -|1970-01-01T08:00:00.002+08:00| 22test22| -+-----------------------------+--------------+ -``` - -SQL for query: - -```sql -select s1, substring(s1 from 1 for 2) from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+--------------------------------------+ -| Time|root.sg1.d1.s1|SUBSTRING(root.sg1.d1.s1 FROM 1 FOR 2)| -+-----------------------------+--------------+--------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| 1t| -|1970-01-01T08:00:00.002+08:00| 22test22| 22| -+-----------------------------+--------------+--------------------------------------+ -``` - -### replace - -#### Usage - -Replace a substring in the input sequence with the target substring. - -**Name:** REPLACE - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Parameter:** -+ first parameter: The target substring to be replaced. -+ second parameter: The substring to replace with. - -**Output Series:** Output a single series. The type is TEXT. - -**Note:** Returns NULL if input is NULL. - -#### Examples - -Input series: - -``` -+-----------------------------+--------------+ -| Time|root.sg1.d1.s1| -+-----------------------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| -|1970-01-01T08:00:00.002+08:00| 22test22| -+-----------------------------+--------------+ -``` - -SQL for query: - -```sql -select s1, replace(s1, 'es', 'tt') from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+-----------------------------------+ -| Time|root.sg1.d1.s1|REPLACE(root.sg1.d1.s1, 'es', 'tt')| -+-----------------------------+--------------+-----------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| 1tttt1| -|1970-01-01T08:00:00.002+08:00| 22test22| 22tttt22| -+-----------------------------+--------------+-----------------------------------+ -``` - -### Upper - -#### Usage - -The function is used to get the string of input series with all characters changed to uppercase. - -**Name:** UPPER - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Output Series:** Output a single series. The type is TEXT. - -**Note:** Returns NULL if input is NULL. - -#### Examples - -Input series: - -``` -+-----------------------------+--------------+ -| Time|root.sg1.d1.s1| -+-----------------------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| -|1970-01-01T08:00:00.002+08:00| 22test22| -+-----------------------------+--------------+ -``` - -SQL for query: - -```sql -select s1, upper(s1) from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+---------------------+ -| Time|root.sg1.d1.s1|upper(root.sg1.d1.s1)| -+-----------------------------+--------------+---------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| 1TEST1| -|1970-01-01T08:00:00.002+08:00| 22test22| 22TEST22| -+-----------------------------+--------------+---------------------+ -``` - -### Lower - -#### Usage - -The function is used to get the string of input series with all characters changed to lowercase. - -**Name:** LOWER - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Output Series:** Output a single series. The type is TEXT. - -**Note:** Returns NULL if input is NULL. 
- -#### Examples - -Input series: - -``` -+-----------------------------+--------------+ -| Time|root.sg1.d1.s1| -+-----------------------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1TEST1| -|1970-01-01T08:00:00.002+08:00| 22TEST22| -+-----------------------------+--------------+ -``` - -SQL for query: - -```sql -select s1, lower(s1) from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+---------------------+ -| Time|root.sg1.d1.s1|lower(root.sg1.d1.s1)| -+-----------------------------+--------------+---------------------+ -|1970-01-01T08:00:00.001+08:00| 1TEST1| 1test1| -|1970-01-01T08:00:00.002+08:00| 22TEST22| 22test22| -+-----------------------------+--------------+---------------------+ -``` - -### Trim - -#### Usage - -The function is used to get the string whose value is same to input series, with all leading and trailing space removed. - -**Name:** TRIM - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Output Series:** Output a single series. The type is TEXT. - -**Note:** Returns NULL if input is NULL. - -#### Examples - -Input series: - -``` -+-----------------------------+--------------+ -| Time|root.sg1.d1.s3| -+-----------------------------+--------------+ -|1970-01-01T08:00:00.002+08:00| 3querytest3| -|1970-01-01T08:00:00.003+08:00| 3querytest3 | -+-----------------------------+--------------+ -``` - -SQL for query: - -```sql -select s3, trim(s3) from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+--------------------+ -| Time|root.sg1.d1.s3|trim(root.sg1.d1.s3)| -+-----------------------------+--------------+--------------------+ -|1970-01-01T08:00:00.002+08:00| 3querytest3| 3querytest3| -|1970-01-01T08:00:00.003+08:00| 3querytest3 | 3querytest3| -+-----------------------------+--------------+--------------------+ -``` - -### StrCmp - -#### Usage - -The function is used to get the compare result of two input series. Returns `0` if series value are the same, a `negative integer` if value of series1 is smaller than series2, -a `positive integer` if value of series1 is more than series2. - -**Name:** StrCmp - -**Input Series:** Support two input series. Data types are all the TEXT. - -**Output Series:** Output a single series. The type is INT32. - -**Note:** Returns NULL either series value is NULL. 
- -#### Examples - -Input series: - -``` -+-----------------------------+--------------+--------------+ -| Time|root.sg1.d1.s1|root.sg1.d1.s2| -+-----------------------------+--------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| null| -|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| -+-----------------------------+--------------+--------------+ -``` - -SQL for query: - -```sql -select s1, s2, strcmp(s1, s2) from root.sg1.d1 -``` - -Output series: - -``` -+-----------------------------+--------------+--------------+--------------------------------------+ -| Time|root.sg1.d1.s1|root.sg1.d1.s2|strcmp(root.sg1.d1.s1, root.sg1.d1.s2)| -+-----------------------------+--------------+--------------+--------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1test1| null| null| -|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 66| -+-----------------------------+--------------+--------------+--------------------------------------+ -``` - - -### StrReplace - -#### Usage - -**This is not a built-in function and can only be used after registering the library-udf.** The function is used to replace the specific substring with given string. - -**Name:** STRREPLACE - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Parameter:** - -+ `target`: The target substring to be replaced. -+ `replace`: The string to be put on. -+ `limit`: The number of matches to be replaced which should be an integer no less than -1, - default to -1 which means all matches will be replaced. -+ `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0. -+ `reverse`: Whether to count all the matches reversely, default to 'false'. - -**Output Series:** Output a single series. The type is TEXT. 
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2021-01-01T00:00:01.000+08:00| A,B,A+,B-| -|2021-01-01T00:00:02.000+08:00| A,A+,A,B+| -|2021-01-01T00:00:03.000+08:00| B+,B,B| -|2021-01-01T00:00:04.000+08:00| A+,A,A+,A| -|2021-01-01T00:00:05.000+08:00| A,B-,B,B| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-----------------------------------------+ -| Time|strreplace(root.test.d1.s1, "target"=",",| -| | "replace"="/", "limit"="2")| -+-----------------------------+-----------------------------------------+ -|2021-01-01T00:00:01.000+08:00| A/B/A+,B-| -|2021-01-01T00:00:02.000+08:00| A/A+/A,B+| -|2021-01-01T00:00:03.000+08:00| B+/B/B| -|2021-01-01T00:00:04.000+08:00| A+/A/A+,A| -|2021-01-01T00:00:05.000+08:00| A/B-/B,B| -+-----------------------------+-----------------------------------------+ -``` - -Another SQL for query: - -```sql -select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-----------------------------------------------------+ -| Time|strreplace(root.test.d1.s1, "target"=",", "replace"= | -| | "|", "limit"="1", "offset"="1", "reverse"="true")| -+-----------------------------+-----------------------------------------------------+ -|2021-01-01T00:00:01.000+08:00| A,B/A+,B-| -|2021-01-01T00:00:02.000+08:00| A,A+/A,B+| -|2021-01-01T00:00:03.000+08:00| B+/B,B| -|2021-01-01T00:00:04.000+08:00| A+,A/A+,A| -|2021-01-01T00:00:05.000+08:00| A,B-/B,B| -+-----------------------------+-----------------------------------------------------+ -``` - -### RegexMatch - -#### Usage - -**This is not a built-in function and can only be used after registering the library-udf.** The function is used to fetch matched contents from text with given regular expression. - -**Name:** REGEXMATCH - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Parameter:** - -+ `regex`: The regular expression to match in the text. All grammars supported by Java are acceptable, - for example, `\d+\.\d+\.\d+\.\d+` is expected to match any IPv4 addresses. -+ `group`: The wanted group index in the matched result. - Reference to java.util.regex, group 0 is the whole pattern and - the next ones are numbered with the appearance order of left parentheses. - For example, the groups in `A(B(CD))` are: 0-`A(B(CD))`, 1-`B(CD)`, 2-`CD`. - -**Output Series:** Output a single series. The type is TEXT. - -**Note:** Those points with null values or not matched with the given pattern will not return any results. 
- -#### Examples - -Input series: - -``` -+-----------------------------+-------------------------------+ -| Time| root.test.d1.s1| -+-----------------------------+-------------------------------+ -|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| -|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| -|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| -|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| -|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| -+-----------------------------+-------------------------------+ -``` - -SQL for query: - -```sql -select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+----------------------------------------------------------------------+ -| Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")| -+-----------------------------+----------------------------------------------------------------------+ -|2021-01-01T00:00:01.000+08:00| 192.168.0.1| -|2021-01-01T00:00:02.000+08:00| 192.168.0.24| -|2021-01-01T00:00:03.000+08:00| 192.168.0.2| -|2021-01-01T00:00:04.000+08:00| 192.168.0.5| -|2021-01-01T00:00:05.000+08:00| 192.168.0.124| -+-----------------------------+----------------------------------------------------------------------+ -``` - -### RegexReplace - -#### Usage - -**This is not a built-in function and can only be used after registering the library-udf.** The function is used to replace the specific regular expression matches with given string. - -**Name:** REGEXREPLACE - -**Input Series:** Only support a single input series. The data type is TEXT. - -**Parameter:** - -+ `regex`: The target regular expression to be replaced. All grammars supported by Java are acceptable. -+ `replace`: The string to be put on and back reference notes in Java is also supported, - for example, '$1' refers to group 1 in the `regex` which will be filled with corresponding matched results. -+ `limit`: The number of matches to be replaced which should be an integer no less than -1, - default to -1 which means all matches will be replaced. -+ `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0. -+ `reverse`: Whether to count all the matches reversely, default to 'false'. - -**Output Series:** Output a single series. The type is TEXT. 
-
-#### Examples
-
-Input series:
-
-```
-+-----------------------------+-------------------------------+
-| Time| root.test.d1.s1|
-+-----------------------------+-------------------------------+
-|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]|
-|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]|
-|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]|
-|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]|
-|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]|
-+-----------------------------+-------------------------------+
-```
-
-SQL for query:
-
-```sql
-select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1
-```
-
-Output series:
-
-```
-+-----------------------------+-----------------------------------------------------------+
-| Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",|
-| | "replace"="cluster-$1", "limit"="1")|
-+-----------------------------+-----------------------------------------------------------+
-|2021-01-01T00:00:01.000+08:00| [cluster-1] [SUCCESS]|
-|2021-01-01T00:00:02.000+08:00| [cluster-24] [SUCCESS]|
-|2021-01-01T00:00:03.000+08:00| [cluster-2] [FAIL]|
-|2021-01-01T00:00:04.000+08:00| [cluster-5] [SUCCESS]|
-|2021-01-01T00:00:05.000+08:00| [cluster-124] [SUCCESS]|
-+-----------------------------+-----------------------------------------------------------+
-```
-
-### RegexSplit
-
-#### Usage
-
-**This is not a built-in function and can only be used after registering the library-udf.** The function is used to split text with a given regular expression and return a specific element.
-
-**Name:** REGEXSPLIT
-
-**Input Series:** Only support a single input series. The data type is TEXT.
-
-**Parameter:**
-
-+ `regex`: The regular expression used to split the text.
- All regular expression syntax supported by Java is acceptable; for example, `['"]` matches both `'` and `"`.
-+ `index`: The index of the desired element in the split result.
- It should be an integer no less than -1. The default is -1, which returns the length of the result array;
- any non-negative integer returns the element at that index, starting from 0.
-
-**Output Series:** Output a single series. The type is INT32 when `index` is -1 and TEXT when it is a valid index.
-
-**Note:** When `index` is out of the range of the result array, for example when `0,1,2` is split with `,` and `index` is set to 3,
-no result is returned for that record.
-
-#### Examples
-
-Input series:
-
-```
-+-----------------------------+---------------+
-| Time|root.test.d1.s1|
-+-----------------------------+---------------+
-|2021-01-01T00:00:01.000+08:00| A,B,A+,B-|
-|2021-01-01T00:00:02.000+08:00| A,A+,A,B+|
-|2021-01-01T00:00:03.000+08:00| B+,B,B|
-|2021-01-01T00:00:04.000+08:00| A+,A,A+,A|
-|2021-01-01T00:00:05.000+08:00| A,B-,B,B|
-+-----------------------------+---------------+
-```
-
-SQL for query:
-
-```sql
-select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1
-```
-
-Output series:
-
-```
-+-----------------------------+------------------------------------------------------+
-| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")|
-+-----------------------------+------------------------------------------------------+
-|2021-01-01T00:00:01.000+08:00| 4|
-|2021-01-01T00:00:02.000+08:00| 4|
-|2021-01-01T00:00:03.000+08:00| 3|
-|2021-01-01T00:00:04.000+08:00| 4|
-|2021-01-01T00:00:05.000+08:00| 4|
-+-----------------------------+------------------------------------------------------+
-```
-
-Another SQL for query:
-
-```sql
-select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1
-```
-
-Output series:
-
-```
-+-----------------------------+-----------------------------------------------------+
-| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")|
-+-----------------------------+-----------------------------------------------------+
-|2021-01-01T00:00:01.000+08:00| B-|
-|2021-01-01T00:00:02.000+08:00| B+|
-|2021-01-01T00:00:04.000+08:00| A|
-|2021-01-01T00:00:05.000+08:00| B|
-+-----------------------------+-----------------------------------------------------+
-```
-
-
-
-## Data Type Conversion Function
-
-IoTDB currently supports 6 data types: INT32, INT64, FLOAT, DOUBLE, BOOLEAN and TEXT. When querying or evaluating data, we may need to convert between data types, such as TEXT to INT32, or FLOAT to DOUBLE. IoTDB provides the cast function for such conversions.
-
-Syntax example:
-
-```sql
-SELECT cast(s1 as INT32) from root.sg
-```
-
-The syntax of the cast function is consistent with that of PostgreSQL. The data type specified after AS indicates the target type of the conversion. Currently, all six data types supported by IoTDB can be used in the cast function. The conversion rules are shown in the following table, where each row represents the original data type and each column represents the target data type:
-
-| | **INT32** | **INT64** | **FLOAT** | **DOUBLE** | **BOOLEAN** | **TEXT** |
-| ----------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | -------------------------------- |
-| **INT32** | No need to cast | Cast directly | Cast directly | Cast directly | !=0 : true
==0: false | String.valueOf() | -| **INT64** | Out of the range of INT32: throw Exception
Otherwise: Cast directly | No need to cast | Cast directly | Cast directly | !=0L : true
==0: false | String.valueOf() | -| **FLOAT** | Out of the range of INT32: throw Exception
Otherwise: Math.round() | Out of the range of INT64: throw Exception
Otherwise: Math.round() | No need to cast | Cast directly | !=0.0f : true
==0: false | String.valueOf() | -| **DOUBLE** | Out of the range of INT32: throw Exception
Otherwise: Math.round() | Out of the range of INT64: throw Exception
Otherwise: Math.round() | Out of the range of FLOAT: throw Exception
Otherwise: Cast directly | No need to cast | !=0.0 : true
==0: false | String.valueOf() | -| **BOOLEAN** | true: 1
false: 0 | true: 1L
false: 0 | true: 1.0f
false: 0 | true: 1.0
false: 0 | No need to cast | true: "true"
false: "false" | -| **TEXT** | Integer.parseInt() | Long.parseLong() | Float.parseFloat() | Double.parseDouble() | text.toLowerCase =="true" : true
text.toLowerCase =="false" : false
Otherwise: throw Exception | No need to cast | - -### Examples - -``` -// timeseries -IoTDB> show timeseries root.sg.d1.** -+-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ -| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters| -+-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ -|root.sg.d1.s3| null| root.sg| FLOAT| PLAIN| SNAPPY|null| null| null| null| -|root.sg.d1.s4| null| root.sg| DOUBLE| PLAIN| SNAPPY|null| null| null| null| -|root.sg.d1.s5| null| root.sg| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| -|root.sg.d1.s6| null| root.sg| TEXT| PLAIN| SNAPPY|null| null| null| null| -|root.sg.d1.s1| null| root.sg| INT32| PLAIN| SNAPPY|null| null| null| null| -|root.sg.d1.s2| null| root.sg| INT64| PLAIN| SNAPPY|null| null| null| null| -+-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ - -// data of timeseries -IoTDB> select * from root.sg.d1; -+-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ -| Time|root.sg.d1.s3|root.sg.d1.s4|root.sg.d1.s5|root.sg.d1.s6|root.sg.d1.s1|root.sg.d1.s2| -+-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| false| 10000| 0| 0| -|1970-01-01T08:00:00.001+08:00| 1.0| 1.0| false| 3| 1| 1| -|1970-01-01T08:00:00.002+08:00| 2.7| 2.7| true| TRue| 2| 2| -|1970-01-01T08:00:00.003+08:00| 3.33| 3.33| true| faLse| 3| 3| -+-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ - -// cast BOOLEAN to other types -IoTDB> select cast(s5 as INT32), cast(s5 as INT64),cast(s5 as FLOAT),cast(s5 as DOUBLE), cast(s5 as TEXT) from root.sg.d1 -+-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ -| Time|CAST(root.sg.d1.s5 AS INT32)|CAST(root.sg.d1.s5 AS INT64)|CAST(root.sg.d1.s5 AS FLOAT)|CAST(root.sg.d1.s5 AS DOUBLE)|CAST(root.sg.d1.s5 AS TEXT)| -+-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ -|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| -|1970-01-01T08:00:00.001+08:00| 0| 0| 0.0| 0.0| false| -|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| -|1970-01-01T08:00:00.003+08:00| 1| 1| 1.0| 1.0| true| -+-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ - -// cast TEXT to numeric types -IoTDB> select cast(s6 as INT32), cast(s6 as INT64), cast(s6 as FLOAT), cast(s6 as DOUBLE) from root.sg.d1 where time < 2 -+-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ -| Time|CAST(root.sg.d1.s6 AS INT32)|CAST(root.sg.d1.s6 AS INT64)|CAST(root.sg.d1.s6 AS FLOAT)|CAST(root.sg.d1.s6 AS DOUBLE)| -+-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ -|1970-01-01T08:00:00.000+08:00| 10000| 10000| 10000.0| 10000.0| -|1970-01-01T08:00:00.001+08:00| 3| 3| 3.0| 3.0| 
-+-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ - -// cast TEXT to BOOLEAN -IoTDB> select cast(s6 as BOOLEAN) from root.sg.d1 where time >= 2 -+-----------------------------+------------------------------+ -| Time|CAST(root.sg.d1.s6 AS BOOLEAN)| -+-----------------------------+------------------------------+ -|1970-01-01T08:00:00.002+08:00| true| -|1970-01-01T08:00:00.003+08:00| false| -+-----------------------------+------------------------------+ -``` - - - - -## Constant Timeseries Generating Functions - -The constant timeseries generating function is used to generate a timeseries in which the values of all data points are the same. - -The constant timeseries generating function accepts one or more timeseries inputs, and the timestamp set of the output data points is the union of the timestamp sets of the input timeseries. - -Currently, IoTDB supports the following constant timeseries generating functions: - -| Function Name | Required Attributes | Output Series Data Type | Description | -| ------------- | ------------------------------------------------------------ | -------------------------------------------- | ------------------------------------------------------------ | -| CONST | `value`: the value of the output data point
`type`: the type of the output data point, which can only be INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | Determined by the required attribute `type` | Output the user-specified constant timeseries according to the attributes `value` and `type`. |
-| PI | None | DOUBLE | Data point value: a `double` value of `π`, the ratio of the circumference of a circle to its diameter, which is equal to `Math.PI` in the *Java Standard Library*. |
-| E | None | DOUBLE | Data point value: a `double` value of `e`, the base of the natural logarithms, which is equal to `Math.E` in the *Java Standard Library*. |
-
-Example:
-
-``` sql
-select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1;
-```
-
-Result:
-
-```
-select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1;
-+-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+
-| Time|root.sg1.d1.s1|root.sg1.d1.s2|const(root.sg1.d1.s1, "value"="1024", "type"="INT64")|pi(root.sg1.d1.s2)|e(root.sg1.d1.s1, root.sg1.d1.s2)|
-+-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+
-|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 1024| 3.141592653589793| 2.718281828459045|
-|1970-01-01T08:00:00.001+08:00| 1.0| null| 1024| null| 2.718281828459045|
-|1970-01-01T08:00:00.002+08:00| 2.0| null| 1024| null| 2.718281828459045|
-|1970-01-01T08:00:00.003+08:00| null| 3.0| null| 3.141592653589793| 2.718281828459045|
-|1970-01-01T08:00:00.004+08:00| null| 4.0| null| 3.141592653589793| 2.718281828459045|
-+-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+
-Total line number = 5
-It costs 0.005s
-```
-
-
-
-## Selector Functions
-
-Currently, IoTDB supports the following selector functions:
-
-| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description |
-| ------------- | ------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ |
-| TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the largest values in a time series. |
-| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the smallest values in a time series. 
| - -Example: - -``` sql -select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; -``` - -Result: - -``` -+-----------------------------+--------------------+------------------------------+---------------------------------+ -| Time| root.sg1.d2.s1|top_k(root.sg1.d2.s1, "k"="2")|bottom_k(root.sg1.d2.s1, "k"="2")| -+-----------------------------+--------------------+------------------------------+---------------------------------+ -|2020-12-10T20:36:15.531+08:00| 1531604122307244742| 1531604122307244742| null| -|2020-12-10T20:36:15.532+08:00|-7426070874923281101| null| null| -|2020-12-10T20:36:15.533+08:00|-7162825364312197604| -7162825364312197604| null| -|2020-12-10T20:36:15.534+08:00|-8581625725655917595| null| -8581625725655917595| -|2020-12-10T20:36:15.535+08:00|-7667364751255535391| null| -7667364751255535391| -+-----------------------------+--------------------+------------------------------+---------------------------------+ -Total line number = 5 -It costs 0.006s -``` - - - -## Continuous Interval Functions - -The continuous interval functions are used to query all continuous intervals that meet specified conditions. -They can be divided into two categories according to return value: -1. Returns the start timestamp and time span of the continuous interval that meets the conditions (a time span of 0 means that only the start time point meets the conditions) -2. Returns the start timestamp of the continuous interval that meets the condition and the number of points in the interval (a number of 1 means that only the start time point meets the conditions) - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| ----------------- | ------------------------------------ | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | -| ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always 0(false), and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | -| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always not 0, and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | -| ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always 0(false). Data points number `n` satisfy `n >= min && n <= max` | -| NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always not 0(false). Data points number `n` satisfy `n >= min && n <= max` | - -### Demonstrate -Example data: -``` -IoTDB> select s1,s2,s3,s4,s5 from root.sg.d2; -+-----------------------------+-------------+-------------+-------------+-------------+-------------+ -| Time|root.sg.d2.s1|root.sg.d2.s2|root.sg.d2.s3|root.sg.d2.s4|root.sg.d2.s5| -+-----------------------------+-------------+-------------+-------------+-------------+-------------+ -|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| -|1970-01-01T08:00:00.001+08:00| 1| 1| 1.0| 1.0| true| -|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| -|1970-01-01T08:00:00.003+08:00| 0| 0| 0.0| 0.0| false| -|1970-01-01T08:00:00.004+08:00| 1| 1| 1.0| 1.0| true| -|1970-01-01T08:00:00.005+08:00| 0| 0| 0.0| 0.0| false| -|1970-01-01T08:00:00.006+08:00| 0| 0| 0.0| 0.0| false| -|1970-01-01T08:00:00.007+08:00| 1| 1| 1.0| 1.0| true| -+-----------------------------+-------------+-------------+-------------+-------------+-------------+ -``` - -Sql: -```sql -select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; -``` - -Result: -``` -+-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ -| Time|root.sg.d2.s1|zero_count(root.sg.d2.s1)|non_zero_count(root.sg.d2.s2)|zero_duration(root.sg.d2.s3)|non_zero_duration(root.sg.d2.s4)| -+-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0| 1| null| 0| null| -|1970-01-01T08:00:00.001+08:00| 1| null| 2| null| 1| -|1970-01-01T08:00:00.002+08:00| 1| null| null| null| null| -|1970-01-01T08:00:00.003+08:00| 0| 1| null| 0| null| -|1970-01-01T08:00:00.004+08:00| 1| null| 1| null| 0| -|1970-01-01T08:00:00.005+08:00| 0| 2| null| 1| null| -|1970-01-01T08:00:00.006+08:00| 0| null| null| null| null| -|1970-01-01T08:00:00.007+08:00| 1| null| 1| null| 0| -+-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ -``` - - - -## Variation Trend Calculation Functions - -Currently, IoTDB supports the following variation trend calculation functions: - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| ----------------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | -| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | / | INT64 | Calculates the difference between the time stamp of a data point and the time stamp of the previous data point. There is no corresponding output for the first data point. | -| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. 
|
-| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the absolute value of the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. |
-| DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the rate of change of a data point compared to the previous data point, the result is equal to DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. |
-| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the absolute value of the rate of change of a data point compared to the previous data point, the result is equal to NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. |
-| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`: optional, default is true. If it is true, a null previous data point is ignored and the search continues forward until the first non-null value is found. If it is false, a null previous data point is not ignored and the result is also null, because null is used in the subtraction | DOUBLE | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point, so its output is null |
-
-Example:
-
-``` sql
-select s1, time_difference(s1), difference(s1), non_negative_difference(s1), derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000;
-```
-
-Result:
-
-```
-+-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+
-| Time| root.sg1.d1.s1|time_difference(root.sg1.d1.s1)|difference(root.sg1.d1.s1)|non_negative_difference(root.sg1.d1.s1)|derivative(root.sg1.d1.s1)|non_negative_derivative(root.sg1.d1.s1)|
-+-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+
-|2020-12-10T17:11:49.037+08:00|7360723084922759782| 1| -8431715764844238876| 8431715764844238876| -8.4317157648442388E18| 8.4317157648442388E18|
-|2020-12-10T17:11:49.038+08:00|4377791063319964531| 1| -2982932021602795251| 2982932021602795251| -2.982932021602795E18| 2.982932021602795E18|
-|2020-12-10T17:11:49.039+08:00|7972485567734642915| 1| 3594694504414678384| 3594694504414678384| 3.5946945044146785E18| 3.5946945044146785E18|
-|2020-12-10T17:11:49.040+08:00|2508858212791964081| 1| -5463627354942678834| 5463627354942678834| -5.463627354942679E18| 5.463627354942679E18|
-|2020-12-10T17:11:49.041+08:00|2817297431185141819| 1| 308439218393177738| 308439218393177738| 3.0843921839317773E17| 3.0843921839317773E17|
-+-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+
-Total line number = 5
-It costs 0.014s
-```
-
-### Example
-
-#### RawData
-
-```
-+-----------------------------+------------+------------+
-| Time|root.test.s1|root.test.s2|
-+-----------------------------+------------+------------+
-|1970-01-01T08:00:00.001+08:00| 1| 1.0|
-|1970-01-01T08:00:00.002+08:00| 2| null|
-|1970-01-01T08:00:00.003+08:00| null| 3.0|
-|1970-01-01T08:00:00.004+08:00| 
4| null| -|1970-01-01T08:00:00.005+08:00| 5| 5.0| -|1970-01-01T08:00:00.006+08:00| null| 6.0| -+-----------------------------+------------+------------+ -``` - -#### Not use `ignoreNull` attribute (Ignore Null) - -SQL: -```sql -SELECT DIFF(s1), DIFF(s2) from root.test; -``` - -Result: -``` -+-----------------------------+------------------+------------------+ -| Time|DIFF(root.test.s1)|DIFF(root.test.s2)| -+-----------------------------+------------------+------------------+ -|1970-01-01T08:00:00.001+08:00| null| null| -|1970-01-01T08:00:00.002+08:00| 1.0| null| -|1970-01-01T08:00:00.003+08:00| null| 2.0| -|1970-01-01T08:00:00.004+08:00| 2.0| null| -|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.006+08:00| null| 1.0| -+-----------------------------+------------------+------------------+ -``` - -#### Use `ignoreNull` attribute - -SQL: -```sql -SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root.test; -``` - -Result: -``` -+-----------------------------+----------------------------------------+----------------------------------------+ -| Time|DIFF(root.test.s1, "ignoreNull"="false")|DIFF(root.test.s2, "ignoreNull"="false")| -+-----------------------------+----------------------------------------+----------------------------------------+ -|1970-01-01T08:00:00.001+08:00| null| null| -|1970-01-01T08:00:00.002+08:00| 1.0| null| -|1970-01-01T08:00:00.003+08:00| null| null| -|1970-01-01T08:00:00.004+08:00| null| null| -|1970-01-01T08:00:00.005+08:00| 1.0| null| -|1970-01-01T08:00:00.006+08:00| null| 1.0| -+-----------------------------+----------------------------------------+----------------------------------------+ -``` - - - -## Sample Functions - -### Equal Size Bucket Sample Function - -This function samples the input sequence in equal size buckets, that is, according to the downsampling ratio and downsampling method given by the user, the input sequence is equally divided into several buckets according to a fixed number of points. Sampling by the given sampling method within each bucket. -- `proportion`: sample ratio, the value range is `(0, 1]`. -#### Equal Size Bucket Random Sample -Random sampling is performed on the equally divided buckets. - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| -| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns a random sample of equal buckets that matches the sampling ratio | - -##### Example - -Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`. 
- -```sql -IoTDB> select temperature from root.ln.wf01.wt01; -+-----------------------------+-----------------------------+ -| Time|root.ln.wf01.wt01.temperature| -+-----------------------------+-----------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.001+08:00| 1.0| -|1970-01-01T08:00:00.002+08:00| 2.0| -|1970-01-01T08:00:00.003+08:00| 3.0| -|1970-01-01T08:00:00.004+08:00| 4.0| -|1970-01-01T08:00:00.005+08:00| 5.0| -|1970-01-01T08:00:00.006+08:00| 6.0| -|1970-01-01T08:00:00.007+08:00| 7.0| -|1970-01-01T08:00:00.008+08:00| 8.0| -|1970-01-01T08:00:00.009+08:00| 9.0| -|1970-01-01T08:00:00.010+08:00| 10.0| -|1970-01-01T08:00:00.011+08:00| 11.0| -|1970-01-01T08:00:00.012+08:00| 12.0| -|.............................|.............................| -|1970-01-01T08:00:00.089+08:00| 89.0| -|1970-01-01T08:00:00.090+08:00| 90.0| -|1970-01-01T08:00:00.091+08:00| 91.0| -|1970-01-01T08:00:00.092+08:00| 92.0| -|1970-01-01T08:00:00.093+08:00| 93.0| -|1970-01-01T08:00:00.094+08:00| 94.0| -|1970-01-01T08:00:00.095+08:00| 95.0| -|1970-01-01T08:00:00.096+08:00| 96.0| -|1970-01-01T08:00:00.097+08:00| 97.0| -|1970-01-01T08:00:00.098+08:00| 98.0| -|1970-01-01T08:00:00.099+08:00| 99.0| -+-----------------------------+-----------------------------+ -``` -Sql: -```sql -select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; -``` -Result: -```sql -+-----------------------------+-------------+ -| Time|random_sample| -+-----------------------------+-------------+ -|1970-01-01T08:00:00.007+08:00| 7.0| -|1970-01-01T08:00:00.014+08:00| 14.0| -|1970-01-01T08:00:00.020+08:00| 20.0| -|1970-01-01T08:00:00.035+08:00| 35.0| -|1970-01-01T08:00:00.047+08:00| 47.0| -|1970-01-01T08:00:00.059+08:00| 59.0| -|1970-01-01T08:00:00.063+08:00| 63.0| -|1970-01-01T08:00:00.079+08:00| 79.0| -|1970-01-01T08:00:00.086+08:00| 86.0| -|1970-01-01T08:00:00.096+08:00| 96.0| -+-----------------------------+-------------+ -Total line number = 10 -It costs 0.024s -``` - -#### Equal Size Bucket Aggregation Sample - -The input sequence is sampled by the aggregation sampling method, and the user needs to provide an additional aggregation function parameter, namely -- `type`: Aggregate type, which can be `avg` or `max` or `min` or `sum` or `extreme` or `variance`. By default, `avg` is used. `extreme` represents the value with the largest absolute value in the equal bucket. `variance` represents the variance in the sampling equal buckets. - -The timestamp of the sampling output of each bucket is the timestamp of the first point of the bucket. - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| -| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1`
`type`: The value types are `avg`, `max`, `min`, `sum`, `extreme`, `variance`, the default is `avg` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket aggregation samples that match the sampling ratio | - -##### Example - -Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`, and the test data is randomly sampled in equal buckets. - -Sql: -```sql -select equal_size_bucket_agg_sample(temperature, 'type'='avg','proportion'='0.1') as agg_avg, equal_size_bucket_agg_sample(temperature, 'type'='max','proportion'='0.1') as agg_max, equal_size_bucket_agg_sample(temperature,'type'='min','proportion'='0.1') as agg_min, equal_size_bucket_agg_sample(temperature, 'type'='sum','proportion'='0.1') as agg_sum, equal_size_bucket_agg_sample(temperature, 'type'='extreme','proportion'='0.1') as agg_extreme, equal_size_bucket_agg_sample(temperature, 'type'='variance','proportion'='0.1') as agg_variance from root.ln.wf01.wt01; -``` -Result: -```sql -+-----------------------------+-----------------+-------+-------+-------+-----------+------------+ -| Time| agg_avg|agg_max|agg_min|agg_sum|agg_extreme|agg_variance| -+-----------------------------+-----------------+-------+-------+-------+-----------+------------+ -|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| -|1970-01-01T08:00:00.010+08:00| 14.5| 19.0| 10.0| 145.0| 19.0| 8.25| -|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| 20.0| 245.0| 29.0| 8.25| -|1970-01-01T08:00:00.030+08:00| 34.5| 39.0| 30.0| 345.0| 39.0| 8.25| -|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| -|1970-01-01T08:00:00.050+08:00| 54.5| 59.0| 50.0| 545.0| 59.0| 8.25| -|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| 8.25| -|1970-01-01T08:00:00.070+08:00|74.50000000000001| 79.0| 70.0| 745.0| 79.0| 8.25| -|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 8.25| -|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 8.25| -+-----------------------------+-----------------+-------+-------+-------+-----------+------------+ -Total line number = 10 -It costs 0.044s -``` - -#### Equal Size Bucket M4 Sample - -The input sequence is sampled using the M4 sampling method. That is to sample the head, tail, min and max values for each bucket. - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| -| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket M4 samples that match the sampling ratio | - -##### Example - -Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`, and the test data is randomly sampled in equal buckets. 
- -Sql: -```sql -select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01; -``` -Result: -```sql -+-----------------------------+---------+ -| Time|M4_sample| -+-----------------------------+---------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.001+08:00| 1.0| -|1970-01-01T08:00:00.038+08:00| 38.0| -|1970-01-01T08:00:00.039+08:00| 39.0| -|1970-01-01T08:00:00.040+08:00| 40.0| -|1970-01-01T08:00:00.041+08:00| 41.0| -|1970-01-01T08:00:00.078+08:00| 78.0| -|1970-01-01T08:00:00.079+08:00| 79.0| -|1970-01-01T08:00:00.080+08:00| 80.0| -|1970-01-01T08:00:00.081+08:00| 81.0| -|1970-01-01T08:00:00.098+08:00| 98.0| -|1970-01-01T08:00:00.099+08:00| 99.0| -+-----------------------------+---------+ -Total line number = 12 -It costs 0.065s -``` - -#### Equal Size Bucket Outlier Sample - -This function samples the input sequence with equal number of bucket outliers, that is, according to the downsampling ratio given by the user and the number of samples in the bucket, the input sequence is divided into several buckets according to a fixed number of points. Sampling by the given outlier sampling method within each bucket. - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| -| EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | The value range of `proportion` is `(0, 1]`, the default is `0.1`
The value of `type` is `avg` or `stendis` or `cos` or `prenextdis`, the default is `avg`
The value of `number` should be greater than 0, the default is `3`| INT32 / INT64 / FLOAT / DOUBLE | Returns outlier samples in equal buckets that match the sampling ratio and the number of samples in the bucket | - -Parameter Description -- `proportion`: sampling ratio -- `number`: the number of samples in each bucket, default `3` -- `type`: outlier sampling method, the value is - - `avg`: Take the average of the data points in the bucket, and find the `top number` farthest from the average according to the sampling ratio - - `stendis`: Take the vertical distance between each data point in the bucket and the first and last data points of the bucket to form a straight line, and according to the sampling ratio, find the `top number` with the largest distance - - `cos`: Set a data point in the bucket as b, the data point on the left of b as a, and the data point on the right of b as c, then take the cosine value of the angle between the ab and bc vectors. The larger the angle, the more likely it is an outlier. Find the `top number` with the smallest cos value - - `prenextdis`: Let a data point in the bucket be b, the data point to the left of b is a, and the data point to the right of b is c, then take the sum of the lengths of ab and bc as the yardstick, the larger the sum, the more likely it is to be an outlier, and find the `top number` with the largest sum value - -##### Example - -Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`. Among them, in order to add outliers, we make the number modulo 5 equal to 0 increment by 100. - -```sql -IoTDB> select temperature from root.ln.wf01.wt01; -+-----------------------------+-----------------------------+ -| Time|root.ln.wf01.wt01.temperature| -+-----------------------------+-----------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.001+08:00| 1.0| -|1970-01-01T08:00:00.002+08:00| 2.0| -|1970-01-01T08:00:00.003+08:00| 3.0| -|1970-01-01T08:00:00.004+08:00| 4.0| -|1970-01-01T08:00:00.005+08:00| 105.0| -|1970-01-01T08:00:00.006+08:00| 6.0| -|1970-01-01T08:00:00.007+08:00| 7.0| -|1970-01-01T08:00:00.008+08:00| 8.0| -|1970-01-01T08:00:00.009+08:00| 9.0| -|1970-01-01T08:00:00.010+08:00| 10.0| -|1970-01-01T08:00:00.011+08:00| 11.0| -|1970-01-01T08:00:00.012+08:00| 12.0| -|1970-01-01T08:00:00.013+08:00| 13.0| -|1970-01-01T08:00:00.014+08:00| 14.0| -|1970-01-01T08:00:00.015+08:00| 115.0| -|1970-01-01T08:00:00.016+08:00| 16.0| -|.............................|.............................| -|1970-01-01T08:00:00.092+08:00| 92.0| -|1970-01-01T08:00:00.093+08:00| 93.0| -|1970-01-01T08:00:00.094+08:00| 94.0| -|1970-01-01T08:00:00.095+08:00| 195.0| -|1970-01-01T08:00:00.096+08:00| 96.0| -|1970-01-01T08:00:00.097+08:00| 97.0| -|1970-01-01T08:00:00.098+08:00| 98.0| -|1970-01-01T08:00:00.099+08:00| 99.0| -+-----------------------------+-----------------------------+ -``` -Sql: -```sql -select equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='avg', 'number'='2') as outlier_avg_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='stendis', 'number'='2') as outlier_stendis_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='cos', 'number'='2') as outlier_cos_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='prenextdis', 'number'='2') as outlier_prenextdis_sample from root.ln.wf01.wt01; -``` -Result: -```sql 
-+-----------------------------+------------------+----------------------+------------------+-------------------------+ -| Time|outlier_avg_sample|outlier_stendis_sample|outlier_cos_sample|outlier_prenextdis_sample| -+-----------------------------+------------------+----------------------+------------------+-------------------------+ -|1970-01-01T08:00:00.005+08:00| 105.0| 105.0| 105.0| 105.0| -|1970-01-01T08:00:00.015+08:00| 115.0| 115.0| 115.0| 115.0| -|1970-01-01T08:00:00.025+08:00| 125.0| 125.0| 125.0| 125.0| -|1970-01-01T08:00:00.035+08:00| 135.0| 135.0| 135.0| 135.0| -|1970-01-01T08:00:00.045+08:00| 145.0| 145.0| 145.0| 145.0| -|1970-01-01T08:00:00.055+08:00| 155.0| 155.0| 155.0| 155.0| -|1970-01-01T08:00:00.065+08:00| 165.0| 165.0| 165.0| 165.0| -|1970-01-01T08:00:00.075+08:00| 175.0| 175.0| 175.0| 175.0| -|1970-01-01T08:00:00.085+08:00| 185.0| 185.0| 185.0| 185.0| -|1970-01-01T08:00:00.095+08:00| 195.0| 195.0| 195.0| 195.0| -+-----------------------------+------------------+----------------------+------------------+-------------------------+ -Total line number = 10 -It costs 0.041s -``` - -### M4 Function - -M4 is used to sample the `first, last, bottom, top` points for each sliding window: - -- the first point is the point with the **m**inimal time; -- the last point is the point with the **m**aximal time; -- the bottom point is the point with the **m**inimal value (if there are multiple such points, M4 returns one of them); -- the top point is the point with the **m**aximal value (if there are multiple such points, M4 returns one of them). - -image - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| ------------- | ------------------------------- | ------------------------------------------------------------ | ------------------------------ | ------------------------------------------------------------ | -| M4 | INT32 / INT64 / FLOAT / DOUBLE | Different attributes used by the size window and the time window. The size window uses attributes `windowSize` and `slidingStep`. The time window uses attributes `timeInterval`, `slidingStep`, `displayWindowBegin`, and `displayWindowEnd`. More details see below. | INT32 / INT64 / FLOAT / DOUBLE | Returns the `first, last, bottom, top` points in each sliding window. M4 sorts and deduplicates the aggregated points within the window before outputting them. | - -#### Attributes - -**(1) Attributes for the size window:** - -+ `windowSize`: The number of points in a window. Int data type. **Required**. -+ `slidingStep`: Slide a window by the number of points. Int data type. Optional. If not set, default to the same as `windowSize`. - -image - -**(2) Attributes for the time window:** - -+ `timeInterval`: The time interval length of a window. Long data type. **Required**. -+ `slidingStep`: Slide a window by the time length. Long data type. Optional. If not set, default to the same as `timeInterval`. -+ `displayWindowBegin`: The starting position of the window (included). Long data type. Optional. If not set, default to Long.MIN_VALUE, meaning using the time of the first data point of the input time series as the starting position of the window. -+ `displayWindowEnd`: End time limit (excluded, essentially playing the same role as `WHERE time < displayWindowEnd`). Long data type. Optional. If not set, default to Long.MAX_VALUE, meaning there is no additional end time limit other than the end of the input time series itself. 
- -groupBy window - -#### Examples - -Input series: - -```sql -+-----------------------------+------------------+ -| Time|root.vehicle.d1.s1| -+-----------------------------+------------------+ -|1970-01-01T08:00:00.001+08:00| 5.0| -|1970-01-01T08:00:00.002+08:00| 15.0| -|1970-01-01T08:00:00.005+08:00| 10.0| -|1970-01-01T08:00:00.008+08:00| 8.0| -|1970-01-01T08:00:00.010+08:00| 30.0| -|1970-01-01T08:00:00.020+08:00| 20.0| -|1970-01-01T08:00:00.025+08:00| 8.0| -|1970-01-01T08:00:00.027+08:00| 20.0| -|1970-01-01T08:00:00.030+08:00| 40.0| -|1970-01-01T08:00:00.033+08:00| 9.0| -|1970-01-01T08:00:00.035+08:00| 10.0| -|1970-01-01T08:00:00.040+08:00| 20.0| -|1970-01-01T08:00:00.045+08:00| 30.0| -|1970-01-01T08:00:00.052+08:00| 8.0| -|1970-01-01T08:00:00.054+08:00| 18.0| -+-----------------------------+------------------+ -``` - -SQL for query1: - -```sql -select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1 -``` - -Output1: - -```sql -+-----------------------------+-----------------------------------------------------------------------------------------------+ -| Time|M4(root.vehicle.d1.s1, "timeInterval"="25", "displayWindowBegin"="0", "displayWindowEnd"="100")| -+-----------------------------+-----------------------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 5.0| -|1970-01-01T08:00:00.010+08:00| 30.0| -|1970-01-01T08:00:00.020+08:00| 20.0| -|1970-01-01T08:00:00.025+08:00| 8.0| -|1970-01-01T08:00:00.030+08:00| 40.0| -|1970-01-01T08:00:00.045+08:00| 30.0| -|1970-01-01T08:00:00.052+08:00| 8.0| -|1970-01-01T08:00:00.054+08:00| 18.0| -+-----------------------------+-----------------------------------------------------------------------------------------------+ -Total line number = 8 -``` - -SQL for query2: - -```sql -select M4(s1,'windowSize'='10') from root.vehicle.d1 -``` - -Output2: - -```sql -+-----------------------------+-----------------------------------------+ -| Time|M4(root.vehicle.d1.s1, "windowSize"="10")| -+-----------------------------+-----------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 5.0| -|1970-01-01T08:00:00.030+08:00| 40.0| -|1970-01-01T08:00:00.033+08:00| 9.0| -|1970-01-01T08:00:00.035+08:00| 10.0| -|1970-01-01T08:00:00.045+08:00| 30.0| -|1970-01-01T08:00:00.052+08:00| 8.0| -|1970-01-01T08:00:00.054+08:00| 18.0| -+-----------------------------+-----------------------------------------+ -Total line number = 7 -``` - -#### Suggested Use Cases - -**(1) Use Case: Extreme-point-preserving downsampling** - -As M4 aggregation selects the `first, last, bottom, top` points for each window, M4 usually preserves extreme points and thus patterns better than other downsampling methods such as Piecewise Aggregate Approximation (PAA). Therefore, if you want to downsample the time series while preserving extreme points, you may give M4 a try. - -**(2) Use case: Error-free two-color line chart visualization of large-scale time series through M4 downsampling** - -Referring to paper ["M4: A Visualization-Oriented Time Series Data Aggregation"](http://www.vldb.org/pvldb/vol7/p797-jugel.pdf), M4 is a downsampling method to facilitate large-scale time series visualization without deforming the shape in terms of a two-color line chart. 
-
-Given a chart of `w*h` pixels, suppose that the visualization time range of the time series is `[tqs,tqe)` and that (tqe-tqs) is divisible by w. The points that fall within the `i`-th time span `Ii=[tqs+(tqe-tqs)/w*(i-1),tqs+(tqe-tqs)/w*i)` will be drawn on the `i`-th pixel column, i=1,2,...,w. Therefore, from a visualization-driven perspective, use the SQL `"select M4(s1,'timeInterval'='(tqe-tqs)/w','displayWindowBegin'='tqs','displayWindowEnd'='tqe') from root.vehicle.d1"` to sample the `first, last, bottom, top` points for each time span. The resulting downsampled time series has no more than `4*w` points, a big reduction compared to the original large-scale time series. Meanwhile, the two-color line chart drawn from the reduced data is identical to that drawn from the original data (pixel-level consistency).
-
-To eliminate the hassle of hardcoding parameters, we recommend the following usage of Grafana's [template variable](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#global-variables) `$__interval_ms` when Grafana is used for visualization:
-
-```
-select M4(s1,'timeInterval'='$__interval_ms') from root.sg1.d1
-```
-
-where `timeInterval` is set as `(tqe-tqs)/w` automatically. Note that the time precision here is assumed to be milliseconds.
-
-#### Comparison with Other Functions
-
-| SQL | Whether M4 aggregation is supported | Sliding window type | Example | Docs |
-| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
-| 1. native built-in aggregate functions with Group By clause | No. Lacks `BOTTOM_TIME` and `TOP_TIME`, which are respectively the time of the points that have the minimum and maximum value. | Time Window | `select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d)` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#built-in-aggregate-functions
https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#downsampling-aggregate-query | -| 2. EQUAL_SIZE_BUCKET_M4_SAMPLE (built-in UDF) | Yes* | Size Window. `windowSize = 4*(int)(1/proportion)` | `select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Select-Expression.html#time-series-generating-functions | -| **3. M4 (built-in UDF)** | Yes* | Size Window, Time Window | (1) Size Window: `select M4(s1,'windowSize'='10') from root.vehicle.d1`
(2) Time Window: `select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1` | refer to this doc | -| 4. extend native built-in aggregate functions with Group By clause to support M4 aggregation | not implemented | not implemented | not implemented | not implemented | - -Further compare `EQUAL_SIZE_BUCKET_M4_SAMPLE` and `M4`: - -**(1) Different M4 aggregation definition:** - -For each window, `EQUAL_SIZE_BUCKET_M4_SAMPLE` extracts the top and bottom points from points **EXCLUDING** the first and last points. - -In contrast, `M4` extracts the top and bottom points from points **INCLUDING** the first and last points, which is more consistent with the semantics of `max_value` and `min_value` stored in metadata. - -It is worth noting that both functions sort and deduplicate the aggregated points in a window before outputting them to the collectors. - -**(2) Different sliding windows:** - -`EQUAL_SIZE_BUCKET_M4_SAMPLE` uses SlidingSizeWindowAccessStrategy and **indirectly** controls sliding window size by sampling proportion. The conversion formula is `windowSize = 4*(int)(1/proportion)`. - -`M4` supports two types of sliding window: SlidingSizeWindowAccessStrategy and SlidingTimeWindowAccessStrategy. `M4` **directly** controls the window point size or time length using corresponding parameters. - - - - -## Time Series Processing - -### CHANGE_POINTS - -#### Usage - -This function is used to remove consecutive identical values from an input sequence. -For example, input:`1,1,2,2,3` output:`1,2,3`. - -**Name:** CHANGE_POINTS - -**Input Series:** Support only one input series. - -**Parameters:** No parameters. - -#### Example - -Raw data: - -``` -+-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ -| Time|root.testChangePoints.d1.s1|root.testChangePoints.d1.s2|root.testChangePoints.d1.s3|root.testChangePoints.d1.s4|root.testChangePoints.d1.s5|root.testChangePoints.d1.s6| -+-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ -|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| -|1970-01-01T08:00:00.002+08:00| true| 2| 2| 2.0| 1.0| 2test2| -|1970-01-01T08:00:00.003+08:00| false| 1| 2| 1.0| 1.0| 2test2| -|1970-01-01T08:00:00.004+08:00| true| 1| 3| 1.0| 1.0| 1test1| -|1970-01-01T08:00:00.005+08:00| true| 1| 3| 1.0| 1.0| 1test1| -+-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ -``` - -SQL for query: - -```sql -select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ -| 
Time|change_points(root.testChangePoints.d1.s1)|change_points(root.testChangePoints.d1.s2)|change_points(root.testChangePoints.d1.s3)|change_points(root.testChangePoints.d1.s4)|change_points(root.testChangePoints.d1.s5)|change_points(root.testChangePoints.d1.s6)| -+-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| -|1970-01-01T08:00:00.002+08:00| null| 2| 2| 2.0| null| 2test2| -|1970-01-01T08:00:00.003+08:00| false| 1| null| 1.0| null| null| -|1970-01-01T08:00:00.004+08:00| true| null| 3| null| null| 1test1| -+-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ -``` - - - -## Lambda Expression - -### JEXL Function - -Java Expression Language (JEXL) is an expression language engine. We use JEXL to extend UDFs, which are implemented on the command line with simple lambda expressions. See the link for [operators supported in jexl lambda expressions](https://commons.apache.org/proper/commons-jexl/apidocs/org/apache/commons/jexl3/package-summary.html#customization). - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Series Data Type Description | -|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| -| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr` is a lambda expression that supports standard one or multi arguments in the form `x -> {...}` or `(x, y, z) -> {...}`, e.g. ` x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | Returns the input time series transformed by a lambda expression | - -##### Demonstrate -Example data: `root.ln.wf01.wt01.temperature`, `root.ln.wf01.wt01.st`, `root.ln.wf01.wt01.str` a total of `11` data. 
-
-```
-IoTDB> select * from root.ln.wf01.wt01;
-+-----------------------------+---------------------+--------------------+-----------------------------+
-| Time|root.ln.wf01.wt01.str|root.ln.wf01.wt01.st|root.ln.wf01.wt01.temperature|
-+-----------------------------+---------------------+--------------------+-----------------------------+
-|1970-01-01T08:00:00.000+08:00| str| 10.0| 0.0|
-|1970-01-01T08:00:00.001+08:00| str| 20.0| 1.0|
-|1970-01-01T08:00:00.002+08:00| str| 30.0| 2.0|
-|1970-01-01T08:00:00.003+08:00| str| 40.0| 3.0|
-|1970-01-01T08:00:00.004+08:00| str| 50.0| 4.0|
-|1970-01-01T08:00:00.005+08:00| str| 60.0| 5.0|
-|1970-01-01T08:00:00.006+08:00| str| 70.0| 6.0|
-|1970-01-01T08:00:00.007+08:00| str| 80.0| 7.0|
-|1970-01-01T08:00:00.008+08:00| str| 90.0| 8.0|
-|1970-01-01T08:00:00.009+08:00| str| 100.0| 9.0|
-|1970-01-01T08:00:00.010+08:00| str| 110.0| 10.0|
-+-----------------------------+---------------------+--------------------+-----------------------------+
-```
-Sql:
-```sql
-select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01;
-```
-
-Result:
-```
-+-----------------------------+-----+-----+-----+------+-----+--------+
-| Time|jexl1|jexl2|jexl3| jexl4|jexl5| jexl6|
-+-----------------------------+-----+-----+-----+------+-----+--------+
-|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 0.0| 0.0| 10.0| 10.0str|
-|1970-01-01T08:00:00.001+08:00| 2.0| 3.0| 1.0| 100.0| 21.0| 21.0str|
-|1970-01-01T08:00:00.002+08:00| 4.0| 6.0| 4.0| 200.0| 32.0| 32.0str|
-|1970-01-01T08:00:00.003+08:00| 6.0| 9.0| 9.0| 300.0| 43.0| 43.0str|
-|1970-01-01T08:00:00.004+08:00| 8.0| 12.0| 16.0| 400.0| 54.0| 54.0str|
-|1970-01-01T08:00:00.005+08:00| 10.0| 15.0| 25.0| 500.0| 65.0| 65.0str|
-|1970-01-01T08:00:00.006+08:00| 12.0| 18.0| 36.0| 600.0| 76.0| 76.0str|
-|1970-01-01T08:00:00.007+08:00| 14.0| 21.0| 49.0| 700.0| 87.0| 87.0str|
-|1970-01-01T08:00:00.008+08:00| 16.0| 24.0| 64.0| 800.0| 98.0| 98.0str|
-|1970-01-01T08:00:00.009+08:00| 18.0| 27.0| 81.0| 900.0|109.0|109.0str|
-|1970-01-01T08:00:00.010+08:00| 20.0| 30.0|100.0|1000.0|120.0|120.0str|
-+-----------------------------+-----+-----+-----+------+-----+--------+
-Total line number = 11
-It costs 0.118s
-```
-
-
-
-## Conditional Expressions
-
-### CASE
-
-The CASE expression is a kind of conditional expression that can be used to return different values based on specific conditions, similar to the if-else statements in other languages.
-
-The CASE expression consists of the following parts:
-
-- CASE keyword: Indicates the start of the CASE expression.
-- WHEN-THEN clauses: There may be multiple clauses used to define conditions and give results. This clause is divided into two parts, WHEN and THEN. The WHEN part defines the condition, and the THEN part defines the result expression. If the WHEN condition is true, the corresponding THEN result is returned.
-- ELSE clause: If none of the WHEN conditions is true, the result in the ELSE clause will be returned. The ELSE clause can be omitted.
-- END keyword: Indicates the end of the CASE expression.
-
-The CASE expression is a scalar operation that can be used in combination with any other scalar operation or aggregate function. 
-
-In the following text, all THEN parts and ELSE clauses will be collectively referred to as result clauses.
-
-#### Syntax
-
-The CASE expression supports two formats.
-
-- Format 1:
-  ```sql
-  CASE
-  WHEN condition1 THEN expression1
-  [WHEN condition2 THEN expression2] ...
-  [ELSE expression_end]
-  END
-  ```
-  The `condition`s are evaluated one by one.
-
-  The first `condition` that is true returns the corresponding `expression`.
-
-- Format 2:
-  ```sql
-  CASE caseValue
-  WHEN whenValue1 THEN expression1
-  [WHEN whenValue2 THEN expression2] ...
-  [ELSE expression_end]
-  END
-  ```
-  The `caseValue` is evaluated first, and then the `whenValue`s are evaluated one by one. The first `whenValue` that is equal to the `caseValue` returns the corresponding `expression`.
-
-  Format 2 is transformed into an equivalent Format 1 by IoTDB.
-
-  For example, the above SQL statement is transformed into:
-
-  ```sql
-  CASE
-  WHEN caseValue=whenValue1 THEN expression1
-  [WHEN caseValue=whenValue2 THEN expression2] ...
-  [ELSE expression_end]
-  END
-  ```
-
-If none of the conditions is true, or if none of the `whenValue`s matches the `caseValue`, the `expression_end` is returned.
-
-If there is no ELSE clause, `null` is returned.
-
-#### Notes
-
-- In Format 1, all WHEN clauses must return a BOOLEAN type.
-- In Format 2, all WHEN values must be comparable with the `caseValue`.
-- All result clauses in a CASE expression must satisfy certain conditions for their return value types:
-  - BOOLEAN types cannot coexist with other types and will cause an error if present.
-  - TEXT types cannot coexist with other types and will cause an error if present.
-  - The other four numeric types can coexist, and the final result will be of DOUBLE type, with possible precision loss during conversion.
-  - If necessary, you can use the CAST function to convert the result to a type that can coexist with others.
-- The CASE expression does not implement lazy evaluation, meaning that all clauses will be evaluated.
-- The CASE expression does not support mixing with UDFs.
-- Aggregate functions cannot be used within a CASE expression, but the result of a CASE expression can be used as input for an aggregate function.
-- When using the CLI, because the CASE expression string can be lengthy, it is recommended to provide an alias for the expression using AS.
-
-#### Using Examples
-
-##### Example 1
-
-The CASE expression can be used to analyze data in an intuitive way. For example:
-- The preparation of a certain chemical product requires that the temperature and pressure be within specific ranges.
-- During the preparation process, sensors will detect the temperature and pressure, forming two time series T (temperature) and P (pressure) in IoTDB.
-
-In this application scenario, the CASE expression can indicate which time parameters are appropriate, which are not, and why they are not.
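-
-The two series could be created beforehand with statements such as the following (a hedged sketch: the FLOAT data type and PLAIN encoding are assumptions chosen to match the sample values below):
-
-```sql
-create timeseries root.test1.T with datatype=FLOAT,encoding=PLAIN
-create timeseries root.test1.P with datatype=FLOAT,encoding=PLAIN
-```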
-
-data:
-```sql
-IoTDB> select * from root.test1
-+-----------------------------+------------+------------+
-|                         Time|root.test1.P|root.test1.T|
-+-----------------------------+------------+------------+
-|2023-03-29T11:25:54.724+08:00|   1000000.0|      1025.0|
-|2023-03-29T11:26:13.445+08:00|   1000094.0|      1040.0|
-|2023-03-29T11:27:36.988+08:00|   1000095.0|      1041.0|
-|2023-03-29T11:27:56.446+08:00|   1000095.0|      1059.0|
-|2023-03-29T11:28:20.838+08:00|   1200000.0|      1040.0|
-+-----------------------------+------------+------------+
-```
-
-SQL statements:
-```sql
-select T, P, case
-when 1000<T and T<1050 and 1000000<P and P<1100000 then "good!"
-when T<=1000 or T>=1050 then "bad temperature"
-when P<=1000000 or P>=1100000 then "bad pressure"
-end as `result`
-from root.test1
-```
-
-output:
-```
-+-----------------------------+------------+------------+---------------+
-|                         Time|root.test1.T|root.test1.P|         result|
-+-----------------------------+------------+------------+---------------+
-|2023-03-29T11:25:54.724+08:00|      1025.0|   1000000.0|   bad pressure|
-|2023-03-29T11:26:13.445+08:00|      1040.0|   1000094.0|          good!|
-|2023-03-29T11:27:36.988+08:00|      1041.0|   1000095.0|          good!|
-|2023-03-29T11:27:56.446+08:00|      1059.0|   1000095.0|bad temperature|
-|2023-03-29T11:28:20.838+08:00|      1040.0|   1200000.0|   bad pressure|
-+-----------------------------+------------+------------+---------------+
-```
-
-##### Example 2
-
-The CASE expression can achieve flexible result transformation, such as converting strings with a certain pattern to other strings.
-
-data:
-```sql
-IoTDB> select * from root.test2
-+-----------------------------+--------------+
-|                         Time|root.test2.str|
-+-----------------------------+--------------+
-|2023-03-27T18:23:33.427+08:00|         abccd|
-|2023-03-27T18:23:39.389+08:00|         abcdd|
-|2023-03-27T18:23:43.463+08:00|       abcdefg|
-+-----------------------------+--------------+
-```
-
-SQL statements:
-```sql
-select str, case
-when str like "%cc%" then "has cc"
-when str like "%dd%" then "has dd"
-else "no cc and dd" end as `result`
-from root.test2
-```
-
-output:
-```
-+-----------------------------+--------------+------------+
-|                         Time|root.test2.str|      result|
-+-----------------------------+--------------+------------+
-|2023-03-27T18:23:33.427+08:00|         abccd|      has cc|
-|2023-03-27T18:23:39.389+08:00|         abcdd|      has dd|
-|2023-03-27T18:23:43.463+08:00|       abcdefg|no cc and dd|
-+-----------------------------+--------------+------------+
-```
-
-##### Example 3: work with aggregation functions
-
-###### Valid: aggregation function ← CASE expression
-
-The CASE expression can be used as a parameter for aggregate functions. For example, used in conjunction with the COUNT function, it can compute statistics over multiple conditions at once.
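-
-The general shape of this pattern is sketched below (the series `root.test3.x` matches the data shown next, while the bucket boundaries in the sketch are illustrative):
-
-```sql
-select
-count(case when x<=1 then 1 end) as `(-inf,1]`,
-count(case when 1<x and x<=3 then 1 end) as `(1,3]`,
-count(case when x>3 then 1 end) as `(3,+inf)`
-from root.test3
-```
-
-Each CASE expression yields a non-null value only for rows satisfying its condition, so each COUNT tallies exactly one bucket.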
- -data: -```sql -IoTDB> select * from root.test3 -+-----------------------------+------------+ -| Time|root.test3.x| -+-----------------------------+------------+ -|2023-03-27T18:11:11.300+08:00| 0.0| -|2023-03-27T18:11:14.658+08:00| 1.0| -|2023-03-27T18:11:15.981+08:00| 2.0| -|2023-03-27T18:11:17.668+08:00| 3.0| -|2023-03-27T18:11:19.112+08:00| 4.0| -|2023-03-27T18:11:20.822+08:00| 5.0| -|2023-03-27T18:11:22.462+08:00| 6.0| -|2023-03-27T18:11:24.174+08:00| 7.0| -|2023-03-27T18:11:25.858+08:00| 8.0| -|2023-03-27T18:11:27.979+08:00| 9.0| -+-----------------------------+------------+ -``` - -SQL statements: - -```sql -select -count(case when x<=1 then 1 end) as `(-∞,1]`, -count(case when 1 select * from root.test4 -+-----------------------------+------------+ -| Time|root.test4.x| -+-----------------------------+------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| -|1970-01-01T08:00:00.002+08:00| 2.0| -|1970-01-01T08:00:00.003+08:00| 3.0| -|1970-01-01T08:00:00.004+08:00| 4.0| -+-----------------------------+------------+ -``` - -SQL statements: -```sql -select x, case x when 1 then "one" when 2 then "two" else "other" end from root.test4 -``` - -output: -``` -+-----------------------------+------------+-----------------------------------------------------------------------------------+ -| Time|root.test4.x|CASE WHEN root.test4.x = 1 THEN "one" WHEN root.test4.x = 2 THEN "two" ELSE "other"| -+-----------------------------+------------+-----------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| one| -|1970-01-01T08:00:00.002+08:00| 2.0| two| -|1970-01-01T08:00:00.003+08:00| 3.0| other| -|1970-01-01T08:00:00.004+08:00| 4.0| other| -+-----------------------------+------------+-----------------------------------------------------------------------------------+ -``` - -##### Example 5: type of return clauses - -The result clause of a CASE expression needs to satisfy certain type restrictions. - -In this example, we continue to use the data from Example 4. - -###### Invalid: BOOLEAN cannot coexist with other types - -SQL statements: -```sql -select x, case x when 1 then true when 2 then 2 end from root.test4 -``` - -output: -``` -Msg: 701: CASE expression: BOOLEAN and other types cannot exist at same time -``` - -###### Valid: Only BOOLEAN type exists - -SQL statements: -```sql -select x, case x when 1 then true when 2 then false end as `result` from root.test4 -``` - -output: -``` -+-----------------------------+------------+------+ -| Time|root.test4.x|result| -+-----------------------------+------------+------+ -|1970-01-01T08:00:00.001+08:00| 1.0| true| -|1970-01-01T08:00:00.002+08:00| 2.0| false| -|1970-01-01T08:00:00.003+08:00| 3.0| null| -|1970-01-01T08:00:00.004+08:00| 4.0| null| -+-----------------------------+------------+------+ -``` - -###### Invalid:TEXT cannot coexist with other types - -SQL statements: -```sql -select x, case x when 1 then 1 when 2 then "str" end from root.test4 -``` - -output: -``` -Msg: 701: CASE expression: TEXT and other types cannot exist at same time -``` - -###### Valid: Only TEXT type exists - -See in Example 1. 
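-
-If TEXT and numeric results really have to be combined, the CAST function mentioned in the notes above can first bring the numeric clause to TEXT. A hedged sketch (whether CAST is accepted inside a result clause should be verified against your IoTDB version):
-
-```sql
-select x, case x when 1 then cast(x as TEXT) when 2 then "str" end as `result` from root.test4
-```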
- -###### Valid: Numerical types coexist - -SQL statements: -```sql -select x, case x -when 1 then 1 -when 2 then 222222222222222 -when 3 then 3.3 -when 4 then 4.4444444444444 -end as `result` -from root.test4 -``` - -output: -``` -+-----------------------------+------------+-------------------+ -| Time|root.test4.x| result| -+-----------------------------+------------+-------------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| 1.0| -|1970-01-01T08:00:00.002+08:00| 2.0|2.22222222222222E14| -|1970-01-01T08:00:00.003+08:00| 3.0| 3.299999952316284| -|1970-01-01T08:00:00.004+08:00| 4.0| 4.44444465637207| -+-----------------------------+------------+-------------------+ -``` - diff --git a/src/UserGuide/V1.3.0-2/Reference/Keywords.md b/src/UserGuide/V1.3.0-2/Reference/Keywords.md deleted file mode 100644 index c098b3e99..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/Keywords.md +++ /dev/null @@ -1,227 +0,0 @@ - - -# Keywords - -Reserved words(Can not be used as identifier): - -- ROOT -- TIME -- TIMESTAMP - -Common Keywords: - -- ADD -- AFTER -- ALIAS -- ALIGN -- ALIGNED -- ALL -- ALTER -- ALTER_TIMESERIES -- ANY -- APPEND -- APPLY_TEMPLATE -- AS -- ASC -- ATTRIBUTES -- BEFORE -- BEGIN -- BLOCKED -- BOUNDARY -- BY -- CACHE -- CHILD -- CLEAR -- CLUSTER -- CONCAT -- CONFIGNODES -- CONFIGURATION -- CONTINUOUS -- COUNT -- CONTAIN -- CQ -- CQS -- CREATE -- CREATE_CONTINUOUS_QUERY -- CREATE_FUNCTION -- CREATE_ROLE -- CREATE_TIMESERIES -- CREATE_TRIGGER -- CREATE_USER -- DATA -- DATABASE -- DATABASES -- DATANODES -- DEACTIVATE -- DEBUG -- DELETE -- DELETE_ROLE -- DELETE_STORAGE_GROUP -- DELETE_TIMESERIES -- DELETE_USER -- DESC -- DESCRIBE -- DEVICE -- DEVICEID -- DEVICES -- DISABLE -- DISCARD -- DROP -- DROP_CONTINUOUS_QUERY -- DROP_FUNCTION -- DROP_TRIGGER -- END -- ENDTIME -- EVERY -- EXPLAIN -- FILL -- FILE -- FLUSH -- FOR -- FROM -- FULL -- FUNCTION -- FUNCTIONS -- GLOBAL -- GRANT -- GRANT_ROLE_PRIVILEGE -- GRANT_USER_PRIVILEGE -- GRANT_USER_ROLE -- GROUP -- HEAD -- HAVING -- INDEX -- INFO -- INSERT -- INSERT_TIMESERIES -- INTO -- KILL -- LABEL -- LAST -- LATEST -- LEVEL -- LIKE -- LIMIT -- LINEAR -- LINK -- LIST -- LIST_ROLE -- LIST_USER -- LOAD -- LOCAL -- LOCK -- MERGE -- METADATA -- MODIFY_PASSWORD -- NODES -- NONE -- NOW -- OF -- OFF -- OFFSET -- ON -- ORDER -- ONSUCCESS -- PARTITION -- PASSWORD -- PATHS -- PIPE -- PIPES -- PIPESINK -- PIPESINKS -- PIPESINKTYPE -- POLICY -- PREVIOUS -- PREVIOUSUNTILLAST -- PRIVILEGES -- PROCESSLIST -- PROPERTY -- PRUNE -- QUERIES -- QUERY -- RANGE -- READONLY -- READ_TEMPLATE -- READ_TEMPLATE_APPLICATION -- READ_TIMESERIES -- REGEXP -- REGIONID -- REGIONS -- REMOVE -- RENAME -- RESAMPLE -- RESOURCE -- REVOKE -- REVOKE_ROLE_PRIVILEGE -- REVOKE_USER_PRIVILEGE -- REVOKE_USER_ROLE -- ROLE -- RUNNING -- SCHEMA -- SELECT -- SERIESSLOTID -- SET -- SET_STORAGE_GROUP -- SETTLE -- SGLEVEL -- SHOW -- SLIMIT -- SOFFSET -- STORAGE -- START -- STARTTIME -- STATELESS -- STATEFUL -- STOP -- SYSTEM -- TAIL -- TAGS -- TASK -- TEMPLATE -- TIMEOUT -- TIMESERIES -- TIMESLOTID -- TO -- TOLERANCE -- TOP -- TRACING -- TRIGGER -- TRIGGERS -- TTL -- UNLINK -- UNLOAD -- UNSET -- UPDATE -- UPDATE_TEMPLATE -- UPSERT -- URI -- USER -- USING -- VALUES -- VERIFY -- VERSION -- VIEW -- WATERMARK_EMBEDDING -- WHERE -- WITH -- WITHOUT -- WRITABLE diff --git a/src/UserGuide/V1.3.0-2/Reference/Modify-Config-Manual.md b/src/UserGuide/V1.3.0-2/Reference/Modify-Config-Manual.md deleted file mode 100644 index 5a1650da7..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/Modify-Config-Manual.md +++ 
/dev/null @@ -1,71 +0,0 @@ - - -# Introduction to configuration item modification -## Method to modify -* Use sql statement to modify [recommended] -* Directly modify the configuration file [not recommended] -## Effective method -* Cannot be modified after the first startup. (first_start) -* Take effect after restart (restart) -* hot load (hot_reload) -# Modify configuration files directly -It can take effect by restarting or following the command -## Hot reload configuration command -Make changes to configuration items that support hot reloading take effect immediately. -For configuration items that have been modified in the configuration file, deleting or commenting them from the configuration file and then performing load configuration will restore the default values. -``` -load configuration -``` -# SetConfiguration statement -``` -set configuration "key1"="value1" "key2"="value2"... (on nodeId) -``` -### Example 1 -``` -set configuration "enable_cross_space_compaction"="false" -``` -To take effect permanently on all nodes in the cluster, set enable_cross_space_compaction to false and write it to iotdb-common.properties. -### Example 2 -``` -set configuration "enable_cross_space_compaction"="false" "enable_seq_space_compaction"="false" on 1 -``` -To take effect permanently on the node with nodeId 1, set enable_cross_space_compaction to false, set enable_seq_space_compaction to false, and write it to iotdb-common.properties. -### Example 3 -``` -set configuration "enable_cross_space_compaction"="false" "timestamp_precision"="ns" -``` -To take effect permanently on all nodes in the cluster, set enable_cross_space_compaction to false, timestamp_precision to ns, and write it to iotdb-common.properties. However, timestamp_precision is a configuration item that cannot be modified after the first startup, so the update of this configuration item will be ignored and the return is as follows. -``` -Msg: org.apache.iotdb.jdbc.IoTDBSQLException: 301: ignored config items: [timestamp_precision] -``` -Effective configuration item -Configuration items that support hot reloading and take effect immediately are marked with effectiveMode as hot_reload in the iotdb-common.properties.template file. - -Example -``` -# Used for indicate cluster name and distinguish different cluster. -# If you need to modify the cluster name, it's recommended to use 'set configuration "cluster_name=xxx"' sql. -# Manually modifying configuration file is not recommended, which may cause node restart fail. -# effectiveMode: hot_reload -# Datatype: string -cluster_name=defaultCluster -``` diff --git a/src/UserGuide/V1.3.0-2/Reference/Status-Codes.md b/src/UserGuide/V1.3.0-2/Reference/Status-Codes.md deleted file mode 100644 index 5dffc1ed5..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/Status-Codes.md +++ /dev/null @@ -1,178 +0,0 @@ - - -# Status Codes - -A sample solution as IoTDB requires registering the time series first before writing data is: - -``` -try { - writeData(); -} catch (SQLException e) { - // the most case is that the time series does not exist - if (e.getMessage().contains("exist")) { - //However, using the content of the error message is not so efficient - registerTimeSeries(); - //write data once again - writeData(); - } -} - -``` - -With Status Code, instead of writing codes like `if (e.getErrorMessage().contains("exist"))`, we can simply use `e.getErrorCode() == TSStatusCode.TIME_SERIES_NOT_EXIST_ERROR.getStatusCode()`. 
- -Here is a list of Status Code and related message: - -| Status Code | Status Type | Meanings | -|:------------|:---------------------------------------|:------------------------------------------------------------------------------------------| -| 200 | SUCCESS_STATUS | | -| 201 | INCOMPATIBLE_VERSION | Incompatible version | -| 202 | CONFIGURATION_ERROR | Configuration error | -| 203 | START_UP_ERROR | Meet error while starting | -| 204 | SHUT_DOWN_ERROR | Meet error while shutdown | -| 300 | UNSUPPORTED_OPERATION | Unsupported operation | -| 301 | EXECUTE_STATEMENT_ERROR | Execute statement error | -| 302 | MULTIPLE_ERROR | Meet error when executing multiple statements | -| 303 | ILLEGAL_PARAMETER | Parameter is illegal | -| 304 | OVERLAP_WITH_EXISTING_TASK | Current task has some conflict with existing tasks | -| 305 | INTERNAL_SERVER_ERROR | Internal server error | -| 306 | DISPATCH_ERROR | Meet error while dispatching | -| 400 | REDIRECTION_RECOMMEND | Recommend Client redirection | -| 500 | DATABASE_NOT_EXIST | Database does not exist | -| 501 | DATABASE_ALREADY_EXISTS | Database already exist | -| 502 | SERIES_OVERFLOW | Series number exceeds the threshold | -| 503 | TIMESERIES_ALREADY_EXIST | Timeseries already exists | -| 504 | TIMESERIES_IN_BLACK_LIST | Timeseries is being deleted | -| 505 | ALIAS_ALREADY_EXIST | Alias already exists | -| 506 | PATH_ALREADY_EXIST | Path already exists | -| 507 | METADATA_ERROR | Meet error when dealing with metadata | -| 508 | PATH_NOT_EXIST | Path does not exist | -| 509 | ILLEGAL_PATH | Illegal path | -| 510 | CREATE_TEMPLATE_ERROR | Create schema template error | -| 511 | DUPLICATED_TEMPLATE | Schema template is duplicated | -| 512 | UNDEFINED_TEMPLATE | Schema template is not defined | -| 513 | TEMPLATE_NOT_SET | Schema template is not set | -| 514 | DIFFERENT_TEMPLATE | Template is not consistent | -| 515 | TEMPLATE_IS_IN_USE | Template is in use | -| 516 | TEMPLATE_INCOMPATIBLE | Template is not compatible | -| 517 | SEGMENT_NOT_FOUND | Segment not found | -| 518 | PAGE_OUT_OF_SPACE | No enough space on schema page | -| 519 | RECORD_DUPLICATED | Record is duplicated | -| 520 | SEGMENT_OUT_OF_SPACE | No enough space on schema segment | -| 521 | PBTREE_FILE_NOT_EXISTS | PBTreeFile does not exist | -| 522 | OVERSIZE_RECORD | Size of record exceeds the threshold of page of PBTreeFile | -| 523 | PBTREE_FILE_REDO_LOG_BROKEN | PBTreeFile redo log has broken | -| 524 | TEMPLATE_NOT_ACTIVATED | Schema template is not activated | -| 526 | SCHEMA_QUOTA_EXCEEDED | Schema usage exceeds quota limit | -| 527 | MEASUREMENT_ALREADY_EXISTS_IN_TEMPLATE | Measurement already exists in schema template | -| 600 | SYSTEM_READ_ONLY | IoTDB system is read only | -| 601 | STORAGE_ENGINE_ERROR | Storage engine related error | -| 602 | STORAGE_ENGINE_NOT_READY | The storage engine is in recovery, not ready fore accepting read/write operation | -| 603 | DATAREGION_PROCESS_ERROR | DataRegion related error | -| 604 | TSFILE_PROCESSOR_ERROR | TsFile processor related error | -| 605 | WRITE_PROCESS_ERROR | Writing data related error | -| 606 | WRITE_PROCESS_REJECT | Writing data rejected error | -| 607 | OUT_OF_TTL | Insertion time is less than TTL time bound | -| 608 | COMPACTION_ERROR | Meet error while merging | -| 609 | ALIGNED_TIMESERIES_ERROR | Meet error in aligned timeseries | -| 610 | WAL_ERROR | WAL error | -| 611 | DISK_SPACE_INSUFFICIENT | Disk space is insufficient | -| 700 | SQL_PARSE_ERROR | Meet error while parsing SQL | -| 701 | SEMANTIC_ERROR | SQL 
semantic error | -| 702 | GENERATE_TIME_ZONE_ERROR | Meet error while generating time zone | -| 703 | SET_TIME_ZONE_ERROR | Meet error while setting time zone | -| 704 | QUERY_NOT_ALLOWED | Query statements are not allowed error | -| 705 | LOGICAL_OPERATOR_ERROR | Logical operator related error | -| 706 | LOGICAL_OPTIMIZE_ERROR | Logical optimize related error | -| 707 | UNSUPPORTED_FILL_TYPE | Unsupported fill type related error | -| 708 | QUERY_PROCESS_ERROR | Query process related error | -| 709 | MPP_MEMORY_NOT_ENOUGH | Not enough memory for task execution in MPP | -| 710 | CLOSE_OPERATION_ERROR | Meet error in close operation | -| 711 | TSBLOCK_SERIALIZE_ERROR | TsBlock serialization error | -| 712 | INTERNAL_REQUEST_TIME_OUT | MPP Operation timeout | -| 713 | INTERNAL_REQUEST_RETRY_ERROR | Internal operation retry failed | -| 714 | NO_SUCH_QUERY | Cannot find target query | -| 715 | QUERY_WAS_KILLED | Query was killed when execute | -| 800 | UNINITIALIZED_AUTH_ERROR | Failed to initialize auth module | -| 801 | WRONG_LOGIN_PASSWORD | Username or password is wrong | -| 802 | NOT_LOGIN | Not login | -| 803 | NO_PERMISSION | No permisstion to operate | -| 804 | USER_NOT_EXIST | User not exists | -| 805 | USER_ALREADY_EXIST | User already exists | -| 806 | USER_ALREADY_HAS_ROLE | User already has target role | -| 807 | USER_NOT_HAS_ROLE | User not has target role | -| 808 | ROLE_NOT_EXIST | Role not exists | -| 809 | ROLE_ALREADY_EXIST | Role already exists | -| 810 | ALREADY_HAS_PRIVILEGE | Already has privilege | -| 811 | NOT_HAS_PRIVILEGE | Not has privilege | -| 812 | CLEAR_PERMISSION_CACHE_ERROR | Failed to clear permission cache | -| 813 | UNKNOWN_AUTH_PRIVILEGE | Unknown auth privilege | -| 814 | UNSUPPORTED_AUTH_OPERATION | Unsupported auth operation | -| 815 | AUTH_IO_EXCEPTION | IO Exception in auth module | -| 900 | MIGRATE_REGION_ERROR | Error when migrate region | -| 901 | CREATE_REGION_ERROR | Create region error | -| 902 | DELETE_REGION_ERROR | Delete region error | -| 903 | PARTITION_CACHE_UPDATE_ERROR | Update partition cache failed | -| 904 | CONSENSUS_NOT_INITIALIZED | Consensus is not initialized and cannot provide service | -| 905 | REGION_LEADER_CHANGE_ERROR | Region leader migration failed | -| 906 | NO_AVAILABLE_REGION_GROUP | Cannot find an available region group | -| 907 | LACK_DATA_PARTITION_ALLOCATION | Lacked some data partition allocation result in the response | -| 1000 | DATANODE_ALREADY_REGISTERED | DataNode already registered in cluster | -| 1001 | NO_ENOUGH_DATANODE | The number of DataNode is not enough, cannot remove DataNode or create enough replication | -| 1002 | ADD_CONFIGNODE_ERROR | Add ConfigNode error | -| 1003 | REMOVE_CONFIGNODE_ERROR | Remove ConfigNode error | -| 1004 | DATANODE_NOT_EXIST | DataNode not exist error | -| 1005 | DATANODE_STOP_ERROR | DataNode stop error | -| 1006 | REMOVE_DATANODE_ERROR | Remove datanode failed | -| 1007 | REGISTER_DATANODE_WITH_WRONG_ID | The DataNode to be registered has incorrect register id | -| 1008 | CAN_NOT_CONNECT_DATANODE | Can not connect to DataNode | -| 1100 | LOAD_FILE_ERROR | Meet error while loading file | -| 1101 | LOAD_PIECE_OF_TSFILE_ERROR | Error when load a piece of TsFile when loading | -| 1102 | DESERIALIZE_PIECE_OF_TSFILE_ERROR | Error when deserialize a piece of TsFile | -| 1103 | SYNC_CONNECTION_ERROR | Sync connection error | -| 1104 | SYNC_FILE_REDIRECTION_ERROR | Sync TsFile redirection error | -| 1105 | SYNC_FILE_ERROR | Sync TsFile error | -| 1106 | CREATE_PIPE_SINK_ERROR | 
Failed to create a PIPE sink | -| 1107 | PIPE_ERROR | PIPE error | -| 1108 | PIPESERVER_ERROR | PIPE server error | -| 1109 | VERIFY_METADATA_ERROR | Meet error in validate timeseries schema | -| 1200 | UDF_LOAD_CLASS_ERROR | Error when loading UDF class | -| 1201 | UDF_DOWNLOAD_ERROR | DataNode cannot download UDF from ConfigNode | -| 1202 | CREATE_UDF_ON_DATANODE_ERROR | Error when create UDF on DataNode | -| 1203 | DROP_UDF_ON_DATANODE_ERROR | Error when drop a UDF on DataNode | -| 1300 | CREATE_TRIGGER_ERROR | ConfigNode create trigger error | -| 1301 | DROP_TRIGGER_ERROR | ConfigNode delete Trigger error | -| 1302 | TRIGGER_FIRE_ERROR | Error when firing trigger | -| 1303 | TRIGGER_LOAD_CLASS_ERROR | Error when load class of trigger | -| 1304 | TRIGGER_DOWNLOAD_ERROR | Error when download trigger from ConfigNode | -| 1305 | CREATE_TRIGGER_INSTANCE_ERROR | Error when create trigger instance | -| 1306 | ACTIVE_TRIGGER_INSTANCE_ERROR | Error when activate trigger instance | -| 1307 | DROP_TRIGGER_INSTANCE_ERROR | Error when drop trigger instance | -| 1308 | UPDATE_TRIGGER_LOCATION_ERROR | Error when move stateful trigger to new datanode | -| 1400 | NO_SUCH_CQ | CQ task does not exist | -| 1401 | CQ_ALREADY_ACTIVE | CQ is already active | -| 1402 | CQ_AlREADY_EXIST | CQ is already exist | -| 1403 | CQ_UPDATE_LAST_EXEC_TIME_ERROR | CQ update last execution time failed | - -> All exceptions are refactored in the latest version by extracting uniform message into exception classes. Different error codes are added to all exceptions. When an exception is caught and a higher-level exception is thrown, the error code will keep and pass so that users will know the detailed error reason. -A base exception class "ProcessException" is also added to be extended by all exceptions. - diff --git a/src/UserGuide/V1.3.0-2/Reference/UDF-Libraries.md b/src/UserGuide/V1.3.0-2/Reference/UDF-Libraries.md deleted file mode 100644 index 2867a78eb..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/UDF-Libraries.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -redirectTo: UDF-Libraries_apache.html ---- - \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Reference/UDF-Libraries_apache.md b/src/UserGuide/V1.3.0-2/Reference/UDF-Libraries_apache.md deleted file mode 100644 index 8bab853b8..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/UDF-Libraries_apache.md +++ /dev/null @@ -1,5245 +0,0 @@ - - -# UDF Libraries - -# UDF Libraries - -Based on the ability of user-defined functions, IoTDB provides a series of functions for temporal data processing, including data quality, data profiling, anomaly detection, frequency domain analysis, data matching, data repairing, sequence discovery, machine learning, etc., which can meet the needs of industrial fields for temporal data processing. - -> Note: The functions in the current UDF library only support millisecond level timestamp accuracy. - -## Installation steps - -1. Please obtain the compressed file of the UDF library JAR package that is compatible with the IoTDB version. - - | UDF installation package | Supported IoTDB versions | Download link | - | --------------- | ----------------- | ------------------------------------------------------------ | - | apache-UDF-1.3.3.zip | V1.3.3 and above |Please contact Timecho for assistance | - | apache-UDF-1.3.2.zip | V1.0.0~V1.3.2 | Please contact Timecho for assistance| - -2. Place the library-udf.jar file in the compressed file obtained in the directory `/ext/udf ` of all nodes in the IoTDB cluster -3. 
In the SQL operation interface of IoTDB's SQL command line terminal (CLI), execute the corresponding function registration statement as follows. -4. Batch registration: Two registration methods: registration script or SQL full statement -- Register Script - - Copy the registration script (register-UDF.sh or register-UDF.bat) from the compressed package to the `tools` directory of IoTDB as needed, and modify the parameters in the script (default is host=127.0.0.1, rpcPort=6667, user=root, pass=root); - - Start IoTDB service, run registration script to batch register UDF - -- All SQL statements - - Open the SQl file in the compressed package, copy all SQL statements, and in the SQL operation interface of IoTDB's SQL command line terminal (CLI), execute all SQl statements to batch register UDFs - - -## Data Quality - -### Completeness - -#### Registration statement - -```sql -create function completeness as 'org.apache.iotdb.library.dquality.UDTFCompleteness' -``` - -#### Usage - -This function is used to calculate the completeness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the completeness of each window will be output. - -**Name:** COMPLETENESS - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. -+ `downtime`: Whether the downtime exception is considered in the calculation of completeness. It is 'true' or 'false' (default). When considering the downtime exception, long-term missing data will be considered as downtime exception without any influence on completeness. - -**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. - -**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. - -#### Examples - -##### Default Parameters - -With default parameters, this function will regard all input data as the same window. 
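-
-The `downtime` option described above is independent of the window setting; enabling it would look like the following hedged sketch (output omitted here). The examples in this section leave it at its default of 'false'.
-
-```sql
-select completeness(s1, "downtime"="true") from root.test.d1
-```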
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+-----------------------------+ -| Time|completeness(root.test.d1.s1)| -+-----------------------------+-----------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.875| -+-----------------------------+-----------------------------+ -``` - -##### Specific Window Size - -When the window size is given, this function will divide the input data as multiple windows. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -|2020-01-01T00:00:32.000+08:00| 130.0| -|2020-01-01T00:00:34.000+08:00| 132.0| -|2020-01-01T00:00:36.000+08:00| 134.0| -|2020-01-01T00:00:38.000+08:00| 136.0| -|2020-01-01T00:00:40.000+08:00| 138.0| -|2020-01-01T00:00:42.000+08:00| 140.0| -|2020-01-01T00:00:44.000+08:00| 142.0| -|2020-01-01T00:00:46.000+08:00| 144.0| -|2020-01-01T00:00:48.000+08:00| 146.0| -|2020-01-01T00:00:50.000+08:00| 148.0| -|2020-01-01T00:00:52.000+08:00| 150.0| -|2020-01-01T00:00:54.000+08:00| 152.0| -|2020-01-01T00:00:56.000+08:00| 154.0| -|2020-01-01T00:00:58.000+08:00| 156.0| -|2020-01-01T00:01:00.000+08:00| 158.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------+ -| Time|completeness(root.test.d1.s1, "window"="15")| -+-----------------------------+--------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.875| -|2020-01-01T00:00:32.000+08:00| 1.0| -+-----------------------------+--------------------------------------------+ -``` - -### Consistency - -#### Registration statement - -```sql -create function consistency as 'org.apache.iotdb.library.dquality.UDTFConsistency' -``` - -#### Usage - -This function is used to calculate the consistency of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the consistency of each window will be output. - -**Name:** CONSISTENCY - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. - -**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. - -**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. - -#### Examples - -##### Default Parameters - -With default parameters, this function will regard all input data as the same window. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+----------------------------+ -| Time|consistency(root.test.d1.s1)| -+-----------------------------+----------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| -+-----------------------------+----------------------------+ -``` - -##### Specific Window Size - -When the window size is given, this function will divide the input data as multiple windows. 
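-
-As the parameter description notes, the window may also be given as a duration rather than a point count; a hedged sketch (output omitted) is shown below, while the example that follows uses a point count of 15.
-
-```sql
-select consistency(s1, "window"="10s") from root.test.d1
-```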
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -|2020-01-01T00:00:32.000+08:00| 130.0| -|2020-01-01T00:00:34.000+08:00| 132.0| -|2020-01-01T00:00:36.000+08:00| 134.0| -|2020-01-01T00:00:38.000+08:00| 136.0| -|2020-01-01T00:00:40.000+08:00| 138.0| -|2020-01-01T00:00:42.000+08:00| 140.0| -|2020-01-01T00:00:44.000+08:00| 142.0| -|2020-01-01T00:00:46.000+08:00| 144.0| -|2020-01-01T00:00:48.000+08:00| 146.0| -|2020-01-01T00:00:50.000+08:00| 148.0| -|2020-01-01T00:00:52.000+08:00| 150.0| -|2020-01-01T00:00:54.000+08:00| 152.0| -|2020-01-01T00:00:56.000+08:00| 154.0| -|2020-01-01T00:00:58.000+08:00| 156.0| -|2020-01-01T00:01:00.000+08:00| 158.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------+ -| Time|consistency(root.test.d1.s1, "window"="15")| -+-----------------------------+-------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| -|2020-01-01T00:00:32.000+08:00| 1.0| -+-----------------------------+-------------------------------------------+ -``` - -### Timeliness - -#### Registration statement - -```sql -create function timeliness as 'org.apache.iotdb.library.dquality.UDTFTimeliness' -``` - -#### Usage - -This function is used to calculate the timeliness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the timeliness of each window will be output. - -**Name:** TIMELINESS - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. - -**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. - -**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. - -#### Examples - -##### Default Parameters - -With default parameters, this function will regard all input data as the same window. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+---------------------------+ -| Time|timeliness(root.test.d1.s1)| -+-----------------------------+---------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| -+-----------------------------+---------------------------+ -``` - -##### Specific Window Size - -When the window size is given, this function will divide the input data as multiple windows. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -|2020-01-01T00:00:32.000+08:00| 130.0| -|2020-01-01T00:00:34.000+08:00| 132.0| -|2020-01-01T00:00:36.000+08:00| 134.0| -|2020-01-01T00:00:38.000+08:00| 136.0| -|2020-01-01T00:00:40.000+08:00| 138.0| -|2020-01-01T00:00:42.000+08:00| 140.0| -|2020-01-01T00:00:44.000+08:00| 142.0| -|2020-01-01T00:00:46.000+08:00| 144.0| -|2020-01-01T00:00:48.000+08:00| 146.0| -|2020-01-01T00:00:50.000+08:00| 148.0| -|2020-01-01T00:00:52.000+08:00| 150.0| -|2020-01-01T00:00:54.000+08:00| 152.0| -|2020-01-01T00:00:56.000+08:00| 154.0| -|2020-01-01T00:00:58.000+08:00| 156.0| -|2020-01-01T00:01:00.000+08:00| 158.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------+ -| Time|timeliness(root.test.d1.s1, "window"="15")| -+-----------------------------+------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| -|2020-01-01T00:00:32.000+08:00| 1.0| -+-----------------------------+------------------------------------------+ -``` - -### Validity - -#### Registration statement - -```sql -create function validity as 'org.apache.iotdb.library.dquality.UDTFValidity' -``` - -#### Usage - -This function is used to calculate the Validity of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the Validity of each window will be output. - -**Name:** VALIDITY - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. - -**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. - -**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. - -#### Examples - -##### Default Parameters - -With default parameters, this function will regard all input data as the same window. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+-------------------------+ -| Time|validity(root.test.d1.s1)| -+-----------------------------+-------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| -+-----------------------------+-------------------------+ -``` - -##### Specific Window Size - -When the window size is given, this function will divide the input data as multiple windows. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -|2020-01-01T00:00:32.000+08:00| 130.0| -|2020-01-01T00:00:34.000+08:00| 132.0| -|2020-01-01T00:00:36.000+08:00| 134.0| -|2020-01-01T00:00:38.000+08:00| 136.0| -|2020-01-01T00:00:40.000+08:00| 138.0| -|2020-01-01T00:00:42.000+08:00| 140.0| -|2020-01-01T00:00:44.000+08:00| 142.0| -|2020-01-01T00:00:46.000+08:00| 144.0| -|2020-01-01T00:00:48.000+08:00| 146.0| -|2020-01-01T00:00:50.000+08:00| 148.0| -|2020-01-01T00:00:52.000+08:00| 150.0| -|2020-01-01T00:00:54.000+08:00| 152.0| -|2020-01-01T00:00:56.000+08:00| 154.0| -|2020-01-01T00:00:58.000+08:00| 156.0| -|2020-01-01T00:01:00.000+08:00| 158.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 -``` - -Output series: - -``` -+-----------------------------+----------------------------------------+ -| Time|validity(root.test.d1.s1, "window"="15")| -+-----------------------------+----------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| -|2020-01-01T00:00:32.000+08:00| 1.0| -+-----------------------------+----------------------------------------+ -``` - - - - - -## Data Profiling - -### ACF - -#### Registration statement - -```sql -create function acf as 'org.apache.iotdb.library.dprofile.UDTFACF' -``` - -#### Usage - -This function is used to calculate the auto-correlation factor of the input time series, -which equals to cross correlation between the same series. -For more information, please refer to [XCorr](#XCorr) function. - -**Name:** ACF - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. -There are $2N-1$ data points in the series, and the values are interpreted in details in [XCorr](#XCorr) function. - -**Note:** - -+ `null` and `NaN` values in the input series will be ignored and treated as 0. 
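-
-Concretely, if the input contains $N$ points $x_1,\dots,x_N$ (with `null`/`NaN` replaced by 0), the value reported at lag $\tau$ appears to be
-
-$$acf(\tau)=\frac{1}{N}\sum_{i}x_i\,x_{i+\tau}$$
-
-This is inferred from the example below rather than stated by the library itself: the centre value there equals $(1^2+0^2+3^2+0^2+5^2)/5=7.0$ and the value two positions away equals $(1\times3+3\times5)/5=3.6$.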
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| 1| -|2020-01-01T00:00:02.000+08:00| null| -|2020-01-01T00:00:03.000+08:00| 3| -|2020-01-01T00:00:04.000+08:00| NaN| -|2020-01-01T00:00:05.000+08:00| 5| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 -``` - -Output series: - -``` -+-----------------------------+--------------------+ -| Time|acf(root.test.d1.s1)| -+-----------------------------+--------------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| -|1970-01-01T08:00:00.002+08:00| 0.0| -|1970-01-01T08:00:00.003+08:00| 3.6| -|1970-01-01T08:00:00.004+08:00| 0.0| -|1970-01-01T08:00:00.005+08:00| 7.0| -|1970-01-01T08:00:00.006+08:00| 0.0| -|1970-01-01T08:00:00.007+08:00| 3.6| -|1970-01-01T08:00:00.008+08:00| 0.0| -|1970-01-01T08:00:00.009+08:00| 1.0| -+-----------------------------+--------------------+ -``` - -### Distinct - -#### Registration statement - -```sql -create function distinct as 'org.apache.iotdb.library.dprofile.UDTFDistinct' -``` - -#### Usage - -This function returns all unique values in time series. - -**Name:** DISTINCT - -**Input Series:** Only support a single input series. The type is arbitrary. - -**Output Series:** Output a single series. The type is the same as the input. - -**Note:** - -+ The timestamp of the output series is meaningless. The output order is arbitrary. -+ Missing points and null points in the input series will be ignored, but `NaN` will not. -+ Case Sensitive. - - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s2| -+-----------------------------+---------------+ -|2020-01-01T08:00:00.001+08:00| Hello| -|2020-01-01T08:00:00.002+08:00| hello| -|2020-01-01T08:00:00.003+08:00| Hello| -|2020-01-01T08:00:00.004+08:00| World| -|2020-01-01T08:00:00.005+08:00| World| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select distinct(s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------+ -| Time|distinct(root.test.d2.s2)| -+-----------------------------+-------------------------+ -|1970-01-01T08:00:00.001+08:00| Hello| -|1970-01-01T08:00:00.002+08:00| hello| -|1970-01-01T08:00:00.003+08:00| World| -+-----------------------------+-------------------------+ -``` - -### Histogram - -#### Registration statement - -```sql -create function histogram as 'org.apache.iotdb.library.dprofile.UDTFHistogram' -``` - -#### Usage - -This function is used to calculate the distribution histogram of a single column of numerical data. - -**Name:** HISTOGRAM - -**Input Series:** Only supports a single input sequence, the type is INT32 / INT64 / FLOAT / DOUBLE - -**Parameters:** - -+ `min`: The lower limit of the requested data range, the default value is -Double.MAX_VALUE. -+ `max`: The upper limit of the requested data range, the default value is Double.MAX_VALUE, and the value of start must be less than or equal to end. -+ `count`: The number of buckets of the histogram, the default value is 1. It must be a positive integer. - -**Output Series:** The value of the bucket of the histogram, where the lower bound represented by the i-th bucket (index starts from 1) is $min+ (i-1)\cdot\frac{max-min}{count}$ and the upper bound is $min + i \cdot \frac{max-min}{count}$. 
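-
-For instance, with `min`=1, `max`=20 and `count`=10 (the setting used in the example below), the bucket width is $\frac{20-1}{10}=1.9$, so the 1st bucket covers $[1,2.9)$ and the 10th covers $[18.1,20]$.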
- -**Note:** - -+ If the value is lower than `min`, it will be put into the 1st bucket. If the value is larger than `max`, it will be put into the last bucket. -+ Missing points, null points and `NaN` in the input series will be ignored. - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:00.000+08:00| 1.0| -|2020-01-01T00:00:01.000+08:00| 2.0| -|2020-01-01T00:00:02.000+08:00| 3.0| -|2020-01-01T00:00:03.000+08:00| 4.0| -|2020-01-01T00:00:04.000+08:00| 5.0| -|2020-01-01T00:00:05.000+08:00| 6.0| -|2020-01-01T00:00:06.000+08:00| 7.0| -|2020-01-01T00:00:07.000+08:00| 8.0| -|2020-01-01T00:00:08.000+08:00| 9.0| -|2020-01-01T00:00:09.000+08:00| 10.0| -|2020-01-01T00:00:10.000+08:00| 11.0| -|2020-01-01T00:00:11.000+08:00| 12.0| -|2020-01-01T00:00:12.000+08:00| 13.0| -|2020-01-01T00:00:13.000+08:00| 14.0| -|2020-01-01T00:00:14.000+08:00| 15.0| -|2020-01-01T00:00:15.000+08:00| 16.0| -|2020-01-01T00:00:16.000+08:00| 17.0| -|2020-01-01T00:00:17.000+08:00| 18.0| -|2020-01-01T00:00:18.000+08:00| 19.0| -|2020-01-01T00:00:19.000+08:00| 20.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+---------------------------------------------------------------+ -| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| -+-----------------------------+---------------------------------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 2| -|1970-01-01T08:00:00.001+08:00| 2| -|1970-01-01T08:00:00.002+08:00| 2| -|1970-01-01T08:00:00.003+08:00| 2| -|1970-01-01T08:00:00.004+08:00| 2| -|1970-01-01T08:00:00.005+08:00| 2| -|1970-01-01T08:00:00.006+08:00| 2| -|1970-01-01T08:00:00.007+08:00| 2| -|1970-01-01T08:00:00.008+08:00| 2| -|1970-01-01T08:00:00.009+08:00| 2| -+-----------------------------+---------------------------------------------------------------+ -``` - -### Integral - -#### Registration statement - -```sql -create function integral as 'org.apache.iotdb.library.dprofile.UDAFIntegral' -``` - -#### Usage - -This function is used to calculate the integration of time series, -which equals to the area under the curve with time as X-axis and values as Y-axis. - -**Name:** INTEGRAL - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `unit`: The unit of time used when computing the integral. - The value should be chosen from "1S", "1s", "1m", "1H", "1d"(case-sensitive), - and each represents taking one millisecond / second / minute / hour / day as 1.0 while calculating the area and integral. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the integration. - -**Note:** - -+ The integral value equals to the sum of the areas of right-angled trapezoids consisting of each two adjacent points and the time-axis. - Choosing different `unit` implies different scaling of time axis, thus making it apparent to convert the value among those results with constant coefficient. - -+ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. - -#### Examples - -##### Default Parameters - -With default parameters, this function will take one second as 1.0. 
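-
-Written out, the trapezoidal sum described in the notes above is (a sketch, with the time differences measured in the chosen `unit`):
-
-$$\mathrm{integral}=\sum_{i=1}^{N-1}\frac{(x_i+x_{i+1})\,(t_{i+1}-t_i)}{2}$$
-
-The calculation expression given after the output below instantiates exactly this sum with one second taken as 1.0.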
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| 1| -|2020-01-01T00:00:02.000+08:00| 2| -|2020-01-01T00:00:03.000+08:00| 5| -|2020-01-01T00:00:04.000+08:00| 6| -|2020-01-01T00:00:05.000+08:00| 7| -|2020-01-01T00:00:08.000+08:00| 8| -|2020-01-01T00:00:09.000+08:00| NaN| -|2020-01-01T00:00:10.000+08:00| 10| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 -``` - -Output series: - -``` -+-----------------------------+-------------------------+ -| Time|integral(root.test.d1.s1)| -+-----------------------------+-------------------------+ -|1970-01-01T08:00:00.000+08:00| 57.5| -+-----------------------------+-------------------------+ -``` - -Calculation expression: -$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ - -##### Specific time unit - -With time unit specified as "1m", this function will take one minute as 1.0. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 -``` - -Output series: - -``` -+-----------------------------+-------------------------+ -| Time|integral(root.test.d1.s1)| -+-----------------------------+-------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.958| -+-----------------------------+-------------------------+ -``` - -Calculation expression: -$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ - -### IntegralAvg - -#### Registration statement - -```sql -create function integralavg as 'org.apache.iotdb.library.dprofile.UDAFIntegralAvg' -``` - -#### Usage - -This function is used to calculate the function average of time series. -The output equals to the area divided by the time interval using the same time `unit`. -For more information of the area under the curve, please refer to `Integral` function. - -**Name:** INTEGRALAVG - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the time-weighted average. - -**Note:** - -+ The time-weighted value equals to the integral value with any `unit` divided by the time interval of input series. - The result is irrelevant to the time unit used in integral, and it's consistent with the timestamp precision of IoTDB by default. - -+ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. - -+ If the input series is empty, the output value will be 0.0, but if there is only one data point, the value will equal to the input value. 
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| 1| -|2020-01-01T00:00:02.000+08:00| 2| -|2020-01-01T00:00:03.000+08:00| 5| -|2020-01-01T00:00:04.000+08:00| 6| -|2020-01-01T00:00:05.000+08:00| 7| -|2020-01-01T00:00:08.000+08:00| 8| -|2020-01-01T00:00:09.000+08:00| NaN| -|2020-01-01T00:00:10.000+08:00| 10| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 -``` - -Output series: - -``` -+-----------------------------+----------------------------+ -| Time|integralavg(root.test.d1.s1)| -+-----------------------------+----------------------------+ -|1970-01-01T08:00:00.000+08:00| 5.75| -+-----------------------------+----------------------------+ -``` - -Calculation expression: -$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ - -### Mad - -#### Registration statement - -```sql -create function mad as 'org.apache.iotdb.library.dprofile.UDAFMad' -``` - -#### Usage - -The function is used to compute the exact or approximate median absolute deviation (MAD) of a numeric time series. MAD is the median of the deviation of each element from the elements' median. - -Take a dataset $\{1,3,3,5,5,6,7,8,9\}$ as an instance. Its median is 5 and the deviation of each element from the median is $\{0,0,1,2,2,2,3,4,4\}$, whose median is 2. Therefore, the MAD of the original dataset is 2. - -**Name:** MAD - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameter:** - -+ `error`: The relative error of the approximate MAD. It should be within [0,1) and the default value is 0. Taking `error`=0.01 as an instance, suppose the exact MAD is $a$ and the approximate MAD is $b$, we have $0.99a \le b \le 1.01a$. With `error`=0, the output is the exact MAD. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the MAD. - -**Note:** Missing points, null points and `NaN` in the input series will be ignored. - -#### Examples - -##### Exact Query - -With the default `error`(`error`=0), the function queries the exact MAD. - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+------------+ -............ 
-Total line number = 20 -``` - -SQL for query: - -```sql -select mad(s1) from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------+ -| Time|median(root.test.s1, "error"="0")| -+-----------------------------+---------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -+-----------------------------+---------------------------------+ -``` - -##### Approximate Query - -By setting `error` within (0,1), the function queries the approximate MAD. - -SQL for query: - -```sql -select mad(s1, "error"="0.01") from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------+ -| Time|mad(root.test.s1, "error"="0.01")| -+-----------------------------+---------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.9900000000000001| -+-----------------------------+---------------------------------+ -``` - -### Median - -#### Registration statement - -```sql -create function median as 'org.apache.iotdb.library.dprofile.UDAFMedian' -``` - -#### Usage - -The function is used to compute the exact or approximate median of a numeric time series. Median is the value separating the higher half from the lower half of a data sample. - -**Name:** MEDIAN - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameter:** - -+ `error`: The rank error of the approximate median. It should be within [0,1) and the default value is 0. For instance, a median with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact median. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the median. - -#### Examples - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+------------+ -Total line number = 20 -``` - -SQL for query: - -```sql -select median(s1, "error"="0.01") from root.test -``` - -Output series: - -``` -+-----------------------------+------------------------------------+ -| Time|median(root.test.s1, "error"="0.01")| -+-----------------------------+------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -+-----------------------------+------------------------------------+ -``` - -### MinMax - -#### Registration statement - -```sql -create function minmax as 'org.apache.iotdb.library.dprofile.UDTFMinMax' -``` - -#### Usage - -This function is used to standardize the input series with min-max. Minimum value is transformed to 0; maximum value is transformed to 1. 
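-
-In other words, each value is mapped by the usual min-max transform
-
-$$y_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$
-
-where, in "batch" mode, the minimum and maximum are taken over the whole input series.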
-
-**Name:** MINMAX
-
-**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `compute`: When set to "batch", the min-max normalization is performed after importing all data points; when set to "stream", the minimum and maximum values must be provided. The default method is "batch".
-+ `min`: The minimum value when `compute` is set to "stream".
-+ `max`: The maximum value when `compute` is set to "stream".
-
-**Output Series:** Output a single series. The type is DOUBLE.
-
-#### Examples
-
-##### Batch computing
-
-Input series:
-
-```
-+-----------------------------+------------+
-|                         Time|root.test.s1|
-+-----------------------------+------------+
-|1970-01-01T08:00:00.100+08:00|         0.0|
-|1970-01-01T08:00:00.200+08:00|         0.0|
-|1970-01-01T08:00:00.300+08:00|         1.0|
-|1970-01-01T08:00:00.400+08:00|        -1.0|
-|1970-01-01T08:00:00.500+08:00|         0.0|
-|1970-01-01T08:00:00.600+08:00|         0.0|
-|1970-01-01T08:00:00.700+08:00|        -2.0|
-|1970-01-01T08:00:00.800+08:00|         2.0|
-|1970-01-01T08:00:00.900+08:00|         0.0|
-|1970-01-01T08:00:01.000+08:00|         0.0|
-|1970-01-01T08:00:01.100+08:00|         1.0|
-|1970-01-01T08:00:01.200+08:00|        -1.0|
-|1970-01-01T08:00:01.300+08:00|        -1.0|
-|1970-01-01T08:00:01.400+08:00|         1.0|
-|1970-01-01T08:00:01.500+08:00|         0.0|
-|1970-01-01T08:00:01.600+08:00|         0.0|
-|1970-01-01T08:00:01.700+08:00|        10.0|
-|1970-01-01T08:00:01.800+08:00|         2.0|
-|1970-01-01T08:00:01.900+08:00|        -2.0|
-|1970-01-01T08:00:02.000+08:00|         0.0|
-+-----------------------------+------------+
-```
-
-SQL for query:
-
-```sql
-select minmax(s1) from root.test
-```
-
-Output series:
-
-```
-+-----------------------------+--------------------+
-|                         Time|minmax(root.test.s1)|
-+-----------------------------+--------------------+
-|1970-01-01T08:00:00.100+08:00| 0.16666666666666666|
-|1970-01-01T08:00:00.200+08:00| 0.16666666666666666|
-|1970-01-01T08:00:00.300+08:00|                0.25|
-|1970-01-01T08:00:00.400+08:00| 0.08333333333333333|
-|1970-01-01T08:00:00.500+08:00| 0.16666666666666666|
-|1970-01-01T08:00:00.600+08:00| 0.16666666666666666|
-|1970-01-01T08:00:00.700+08:00|                 0.0|
-|1970-01-01T08:00:00.800+08:00|  0.3333333333333333|
-|1970-01-01T08:00:00.900+08:00| 0.16666666666666666|
-|1970-01-01T08:00:01.000+08:00| 0.16666666666666666|
-|1970-01-01T08:00:01.100+08:00|                0.25|
-|1970-01-01T08:00:01.200+08:00| 0.08333333333333333|
-|1970-01-01T08:00:01.300+08:00| 0.08333333333333333|
-|1970-01-01T08:00:01.400+08:00|                0.25|
-|1970-01-01T08:00:01.500+08:00| 0.16666666666666666|
-|1970-01-01T08:00:01.600+08:00| 0.16666666666666666|
-|1970-01-01T08:00:01.700+08:00|                 1.0|
-|1970-01-01T08:00:01.800+08:00|  0.3333333333333333|
-|1970-01-01T08:00:01.900+08:00|                 0.0|
-|1970-01-01T08:00:02.000+08:00| 0.16666666666666666|
-+-----------------------------+--------------------+
-```
-
-
-### MvAvg
-
-#### Registration statement
-
-```sql
-create function mvavg as 'org.apache.iotdb.library.dprofile.UDTFMvAvg'
-```
-
-#### Usage
-
-This function is used to calculate the moving average of the input series.
-
-**Name:** MVAVG
-
-**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `window`: Length of the moving window. Default value is 10.
-
-**Output Series:** Output a single series. The type is DOUBLE.
- -#### Examples - -##### Batch computing - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select mvavg(s1, "window"="3") from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------+ -| Time|mvavg(root.test.s1, "window"="3")| -+-----------------------------+---------------------------------+ -|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| -|1970-01-01T08:00:00.400+08:00| 0.0| -|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| -|1970-01-01T08:00:00.800+08:00| 0.0| -|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| -|1970-01-01T08:00:01.200+08:00| 0.0| -|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| -|1970-01-01T08:00:01.400+08:00| 0.0| -|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| -|1970-01-01T08:00:01.800+08:00| 4.0| -|1970-01-01T08:00:01.900+08:00| 0.0| -|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| -+-----------------------------+---------------------------------+ -``` - -### PACF - -#### Registration statement - -```sql -create function pacf as 'org.apache.iotdb.library.dprofile.UDTFPACF' -``` - -#### Usage - -This function is used to calculate partial autocorrelation of input series by solving Yule-Walker equation. For some cases, the equation may not be solved, and NaN will be output. - -**Name:** PACF - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `lag`: Maximum lag of pacf to calculate. The default value is $\min(10\log_{10}n,n-1)$, where $n$ is the number of data points. - -**Output Series:** Output a single series. The type is DOUBLE. 
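-
-For reference, the Yule-Walker equations relate the sample autocorrelations $\rho_j$ to the coefficients $\phi_{ki}$ of an order-$k$ autoregressive fit, and the partial autocorrelation at lag $k$ is the last coefficient $\phi_{kk}$:
-
-$$\rho_j = \sum_{i=1}^{k} \phi_{ki}\,\rho_{j-i}, \qquad j = 1, \dots, k$$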
-
-#### Examples
-
-##### Assigning maximum lag
-
-Input series:
-
-```
-+-----------------------------+---------------+
-|                         Time|root.test.d1.s1|
-+-----------------------------+---------------+
-|2020-01-01T00:00:01.000+08:00|              1|
-|2020-01-01T00:00:02.000+08:00|            NaN|
-|2020-01-01T00:00:03.000+08:00|              3|
-|2020-01-01T00:00:04.000+08:00|            NaN|
-|2020-01-01T00:00:05.000+08:00|              5|
-+-----------------------------+---------------+
-```
-
-SQL for query:
-
-```sql
-select pacf(s1, "lag"="5") from root.test.d1
-```
-
-Output series:
-
-```
-+-----------------------------+--------------------------------+
-|                         Time|pacf(root.test.d1.s1, "lag"="5")|
-+-----------------------------+--------------------------------+
-|2020-01-01T00:00:01.000+08:00|                             1.0|
-|2020-01-01T00:00:02.000+08:00|             -0.5744680851063829|
-|2020-01-01T00:00:03.000+08:00|              0.3172297297297296|
-|2020-01-01T00:00:04.000+08:00|             -0.2977686586304181|
-|2020-01-01T00:00:05.000+08:00|             -2.0609033521065867|
-+-----------------------------+--------------------------------+
-```
-
-### Percentile
-
-#### Registration statement
-
-```sql
-create function percentile as 'org.apache.iotdb.library.dprofile.UDAFPercentile'
-```
-
-#### Usage
-
-The function is used to compute the exact or approximate percentile of a numeric time series. A percentile is the value of the element at a given rank in the sorted series.
-
-**Name:** PERCENTILE
-
-**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameter:**
-
-+ `rank`: The rank percentage of the percentile. It should be within (0,1] and the default value is 0.5. For instance, a percentile with `rank`=0.5 is the median.
-+ `error`: The rank error of the approximate percentile. It should be within [0,1) and the default value is 0. For instance, a 0.5-percentile with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact percentile.
-
-**Output Series:** Output a single series whose type is the same as the input series. If `error`=0, there is only one data point in the series, whose timestamp is the timestamp of the first occurrence of the percentile value and whose value is the percentile; otherwise, the timestamp of the only data point is 0.
-
-**Note:** Missing points, null points and `NaN` in the input series will be ignored.
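-
-As a quick sanity check for the example below: the input has 20 points and `rank`=0.2, so the requested percentile is roughly the 4th smallest value,
-
-$$0.2 \times 20 = 4, \qquad \text{sorted values: } -2.0,\ -2.0,\ -1.0,\ -1.0,\ \dots \ \Rightarrow\ \text{percentile} = -1.0$$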
- -#### Examples - -Input series: - -``` -+-----------------------------+-------------+ -| Time|root.test2.s1| -+-----------------------------+-------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+-------------+ -Total line number = 20 -``` - -SQL for query: - -```sql -select percentile(s0, "rank"="0.2", "error"="0.01") from root.test -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------------------+ -| Time|percentile(root.test2.s1, "rank"="0.2", "error"="0.01")| -+-----------------------------+-------------------------------------------------------+ -|1970-01-01T08:00:00.000+08:00| -1.0| -+-----------------------------+-------------------------------------------------------+ -``` - -### Quantile - -#### Registration statement - -```sql -create function quantile as 'org.apache.iotdb.library.dprofile.UDAFQuantile' -``` - -#### Usage - -The function is used to compute the approximate quantile of a numeric time series. A quantile is value of element in the certain rank of the sorted series. - -**Name:** QUANTILE - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameter:** - -+ `rank`: The rank of the quantile. It should be (0,1] and the default value is 0.5. For instance, a quantile with `rank`=0.5 is the median. -+ `K`: The size of KLL sketch maintained in the query. It should be within [100,+inf) and the default value is 800. For instance, the 0.5-quantile computed by a KLL sketch with K=800 items is a value with rank quantile 0.49~0.51 with a confidence of at least 99%. The result will be more accurate as K increases. - -**Output Series:** Output a single series. The type is the same as input series. The timestamp of the only data point is 0. - -**Note:** Missing points, null points and `NaN` in the input series will be ignored. - -#### Examples - -Input series: - -``` -+-----------------------------+-------------+ -| Time|root.test1.s1| -+-----------------------------+-------------+ -|2021-03-17T10:32:17.054+08:00| 7| -|2021-03-17T10:32:18.054+08:00| 15| -|2021-03-17T10:32:19.054+08:00| 36| -|2021-03-17T10:32:20.054+08:00| 39| -|2021-03-17T10:32:21.054+08:00| 40| -|2021-03-17T10:32:22.054+08:00| 41| -|2021-03-17T10:32:23.054+08:00| 20| -|2021-03-17T10:32:24.054+08:00| 18| -+-----------------------------+-------------+ -............ 
-Total line number = 8 -``` - -SQL for query: - -```sql -select quantile(s1, "rank"="0.2", "K"="800") from root.test1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------+ -| Time|quantile(root.test1.s1, "rank"="0.2", "K"="800")| -+-----------------------------+------------------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 7.000000000000001| -+-----------------------------+------------------------------------------------+ -``` - -### Period - -#### Registration statement - -```sql -create function period as 'org.apache.iotdb.library.dprofile.UDAFPeriod' -``` - -#### Usage - -The function is used to compute the period of a numeric time series. - -**Name:** PERIOD - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is INT32. There is only one data point in the series, whose timestamp is 0 and value is the period. - -#### Examples - -Input series: - - -``` -+-----------------------------+---------------+ -| Time|root.test.d3.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| -|1970-01-01T08:00:00.002+08:00| 2.0| -|1970-01-01T08:00:00.003+08:00| 3.0| -|1970-01-01T08:00:00.004+08:00| 1.0| -|1970-01-01T08:00:00.005+08:00| 2.0| -|1970-01-01T08:00:00.006+08:00| 3.0| -|1970-01-01T08:00:00.007+08:00| 1.0| -|1970-01-01T08:00:00.008+08:00| 2.0| -|1970-01-01T08:00:00.009+08:00| 3.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select period(s1) from root.test.d3 -``` - -Output series: - -``` -+-----------------------------+-----------------------+ -| Time|period(root.test.d3.s1)| -+-----------------------------+-----------------------+ -|1970-01-01T08:00:00.000+08:00| 3| -+-----------------------------+-----------------------+ -``` - -### QLB - -#### Registration statement - -```sql -create function qlb as 'org.apache.iotdb.library.dprofile.UDTFQLB' -``` - -#### Usage - -This function is used to calculate Ljung-Box statistics $Q_{LB}$ for time series, and convert it to p value. - -**Name:** QLB - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters**: - -`lag`: max lag to calculate. Legal input shall be integer from 1 to n-2, where n is the sample number. Default value is n-2. - -**Output Series:** Output a single series. The type is DOUBLE. The output series is p value, and timestamp means lag. - -**Note:** If you want to calculate Ljung-Box statistics $Q_{LB}$ instead of p value, you may use ACF function. 
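-
-For reference, the Ljung-Box statistic at lag $h$ for a series of length $n$ with sample autocorrelations $\hat{\rho}_k$ is
-
-$$Q_{LB}(h) = n(n+2)\sum_{k=1}^{h}\frac{\hat{\rho}_k^2}{n-k}$$
-
-and the reported p value is its upper-tail probability under a $\chi^2$ distribution with $h$ degrees of freedom.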
- -#### Examples - -##### Using Default Parameter - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T00:00:00.100+08:00| 1.22| -|1970-01-01T00:00:00.200+08:00| -2.78| -|1970-01-01T00:00:00.300+08:00| 1.53| -|1970-01-01T00:00:00.400+08:00| 0.70| -|1970-01-01T00:00:00.500+08:00| 0.75| -|1970-01-01T00:00:00.600+08:00| -0.72| -|1970-01-01T00:00:00.700+08:00| -0.22| -|1970-01-01T00:00:00.800+08:00| 0.28| -|1970-01-01T00:00:00.900+08:00| 0.57| -|1970-01-01T00:00:01.000+08:00| -0.22| -|1970-01-01T00:00:01.100+08:00| -0.72| -|1970-01-01T00:00:01.200+08:00| 1.34| -|1970-01-01T00:00:01.300+08:00| -0.25| -|1970-01-01T00:00:01.400+08:00| 0.17| -|1970-01-01T00:00:01.500+08:00| 2.51| -|1970-01-01T00:00:01.600+08:00| 1.42| -|1970-01-01T00:00:01.700+08:00| -1.34| -|1970-01-01T00:00:01.800+08:00| -0.01| -|1970-01-01T00:00:01.900+08:00| -0.49| -|1970-01-01T00:00:02.000+08:00| 1.63| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select QLB(s1) from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+--------------------+ -| Time|QLB(root.test.d1.s1)| -+-----------------------------+--------------------+ -|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| -|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| -|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| -|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| -|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| -|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| -|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| -|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| -|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| -|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| -|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| -|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| -|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| -|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| -|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| -|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| -|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| -|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| -+-----------------------------+--------------------+ -``` - -### Resample - -#### Registration statement - -```sql -create function re_sample as 'org.apache.iotdb.library.dprofile.UDTFResample' -``` - -#### Usage - -This function is used to resample the input series according to a given frequency, -including up-sampling and down-sampling. -Currently, the supported up-sampling methods are -NaN (filling with `NaN`), -FFill (filling with previous value), -BFill (filling with next value) and -Linear (filling with linear interpolation). -Down-sampling relies on group aggregation, -which supports Max, Min, First, Last, Mean and Median. - -**Name:** RESAMPLE - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - - -+ `every`: The frequency of resampling, which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. -+ `interp`: The interpolation method of up-sampling, which is 'NaN', 'FFill', 'BFill' or 'Linear'. By default, NaN is used. -+ `aggr`: The aggregation method of down-sampling, which is 'Max', 'Min', 'First', 'Last', 'Mean' or 'Median'. By default, Mean is used. 
-+ `start`: The start time (inclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the first valid data point. -+ `end`: The end time (exclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the last valid data point. - -**Output Series:** Output a single series. The type is DOUBLE. It is strictly equispaced with the frequency `every`. - -**Note:** `NaN` in the input series will be ignored. - -#### Examples - -##### Up-sampling - -When the frequency of resampling is higher than the original frequency, up-sampling starts. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2021-03-06T16:00:00.000+08:00| 3.09| -|2021-03-06T16:15:00.000+08:00| 3.53| -|2021-03-06T16:30:00.000+08:00| 3.5| -|2021-03-06T16:45:00.000+08:00| 3.51| -|2021-03-06T17:00:00.000+08:00| 3.41| -+-----------------------------+---------------+ -``` - - -SQL for query: - -```sql -select resample(s1,'every'='5m','interp'='linear') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+----------------------------------------------------------+ -| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| -+-----------------------------+----------------------------------------------------------+ -|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| -|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| -|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| -|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| -|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| -|2021-03-06T16:25:00.000+08:00| 3.509999990463257| -|2021-03-06T16:30:00.000+08:00| 3.5| -|2021-03-06T16:35:00.000+08:00| 3.503333330154419| -|2021-03-06T16:40:00.000+08:00| 3.506666660308838| -|2021-03-06T16:45:00.000+08:00| 3.509999990463257| -|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| -|2021-03-06T16:55:00.000+08:00| 3.443333387374878| -|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| -+-----------------------------+----------------------------------------------------------+ -``` - -##### Down-sampling - -When the frequency of resampling is lower than the original frequency, down-sampling starts. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select resample(s1,'every'='30m','aggr'='first') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------------+ -| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| -+-----------------------------+--------------------------------------------------------+ -|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| -|2021-03-06T16:30:00.000+08:00| 3.5| -|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| -+-----------------------------+--------------------------------------------------------+ -``` - - - -##### Specify the time period - -The time period of resampling can be specified with `start` and `end`. -The period outside the actual time range will be interpolated. 
- -Input series is the same as above, the SQL for query is shown below: - -```sql -select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-----------------------------------------------------------------------+ -| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| -+-----------------------------+-----------------------------------------------------------------------+ -|2021-03-06T15:00:00.000+08:00| NaN| -|2021-03-06T15:30:00.000+08:00| NaN| -|2021-03-06T16:00:00.000+08:00| 3.309999942779541| -|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| -|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| -+-----------------------------+-----------------------------------------------------------------------+ -``` - -### Sample - -#### Registration statement - -```sql -create function sample as 'org.apache.iotdb.library.dprofile.UDTFSample' -``` - -#### Usage - -This function is used to sample the input series, -that is, select a specified number of data points from the input series and output them. -Currently, three sampling methods are supported: -**Reservoir sampling** randomly selects data points. -All of the points have the same probability of being sampled. -**Isometric sampling** selects data points at equal index intervals. -**Triangle sampling** assigns data points to the buckets based on the number of sampling. -Then it calculates the area of the triangle based on these points inside the bucket and selects the point with the largest area of the triangle. -For more detail, please read [paper](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf) - -**Name:** SAMPLE - -**Input Series:** Only support a single input series. The type is arbitrary. - -**Parameters:** - -+ `method`: The method of sampling, which is 'reservoir', 'isometric' or 'triangle'. By default, reservoir sampling is used. -+ `k`: The number of sampling, which is a positive integer. By default, it's 1. - -**Output Series:** Output a single series. The type is the same as the input. The length of the output series is `k`. Each data point in the output series comes from the input series. - -**Note:** If `k` is greater than the length of input series, all data points in the input series will be output. - -#### Examples - -##### Reservoir Sampling - -When `method` is 'reservoir' or the default, reservoir sampling is used. -Due to the randomness of this method, the output series shown below is only a possible result. 
- - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| 1.0| -|2020-01-01T00:00:02.000+08:00| 2.0| -|2020-01-01T00:00:03.000+08:00| 3.0| -|2020-01-01T00:00:04.000+08:00| 4.0| -|2020-01-01T00:00:05.000+08:00| 5.0| -|2020-01-01T00:00:06.000+08:00| 6.0| -|2020-01-01T00:00:07.000+08:00| 7.0| -|2020-01-01T00:00:08.000+08:00| 8.0| -|2020-01-01T00:00:09.000+08:00| 9.0| -|2020-01-01T00:00:10.000+08:00| 10.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select sample(s1,'method'='reservoir','k'='5') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------+ -| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| -+-----------------------------+------------------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 2.0| -|2020-01-01T00:00:03.000+08:00| 3.0| -|2020-01-01T00:00:05.000+08:00| 5.0| -|2020-01-01T00:00:08.000+08:00| 8.0| -|2020-01-01T00:00:10.000+08:00| 10.0| -+-----------------------------+------------------------------------------------------+ -``` - -##### Isometric Sampling - -When `method` is 'isometric', isometric sampling is used. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select sample(s1,'method'='isometric','k'='5') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------+ -| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| -+-----------------------------+------------------------------------------------------+ -|2020-01-01T00:00:01.000+08:00| 1.0| -|2020-01-01T00:00:03.000+08:00| 3.0| -|2020-01-01T00:00:05.000+08:00| 5.0| -|2020-01-01T00:00:07.000+08:00| 7.0| -|2020-01-01T00:00:09.000+08:00| 9.0| -+-----------------------------+------------------------------------------------------+ -``` - -### Segment - -#### Registration statement - -```sql -create function segment as 'org.apache.iotdb.library.dprofile.UDTFSegment' -``` - -#### Usage - -This function is used to segment a time series into subsequences according to linear trend, and returns linear fitted values of first values in each subsequence or every data point. - -**Name:** SEGMENT - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `output` :"all" to output all fitted points; "first" to output first fitted points in each subsequence. - -+ `error`: error allowed at linear regression. It is defined as mean absolute error of a subsequence. - -**Output Series:** Output a single series. The type is DOUBLE. - -**Note:** This function treat input series as equal-interval sampled. All data are loaded, so downsample input series first if there are too many data points. 
- -#### Examples - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.000+08:00| 5.0| -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 1.0| -|1970-01-01T08:00:00.300+08:00| 2.0| -|1970-01-01T08:00:00.400+08:00| 3.0| -|1970-01-01T08:00:00.500+08:00| 4.0| -|1970-01-01T08:00:00.600+08:00| 5.0| -|1970-01-01T08:00:00.700+08:00| 6.0| -|1970-01-01T08:00:00.800+08:00| 7.0| -|1970-01-01T08:00:00.900+08:00| 8.0| -|1970-01-01T08:00:01.000+08:00| 9.0| -|1970-01-01T08:00:01.100+08:00| 9.1| -|1970-01-01T08:00:01.200+08:00| 9.2| -|1970-01-01T08:00:01.300+08:00| 9.3| -|1970-01-01T08:00:01.400+08:00| 9.4| -|1970-01-01T08:00:01.500+08:00| 9.5| -|1970-01-01T08:00:01.600+08:00| 9.6| -|1970-01-01T08:00:01.700+08:00| 9.7| -|1970-01-01T08:00:01.800+08:00| 9.8| -|1970-01-01T08:00:01.900+08:00| 9.9| -|1970-01-01T08:00:02.000+08:00| 10.0| -|1970-01-01T08:00:02.100+08:00| 8.0| -|1970-01-01T08:00:02.200+08:00| 6.0| -|1970-01-01T08:00:02.300+08:00| 4.0| -|1970-01-01T08:00:02.400+08:00| 2.0| -|1970-01-01T08:00:02.500+08:00| 0.0| -|1970-01-01T08:00:02.600+08:00| -2.0| -|1970-01-01T08:00:02.700+08:00| -4.0| -|1970-01-01T08:00:02.800+08:00| -6.0| -|1970-01-01T08:00:02.900+08:00| -8.0| -|1970-01-01T08:00:03.000+08:00| -10.0| -|1970-01-01T08:00:03.100+08:00| 10.0| -|1970-01-01T08:00:03.200+08:00| 10.0| -|1970-01-01T08:00:03.300+08:00| 10.0| -|1970-01-01T08:00:03.400+08:00| 10.0| -|1970-01-01T08:00:03.500+08:00| 10.0| -|1970-01-01T08:00:03.600+08:00| 10.0| -|1970-01-01T08:00:03.700+08:00| 10.0| -|1970-01-01T08:00:03.800+08:00| 10.0| -|1970-01-01T08:00:03.900+08:00| 10.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select segment(s1, "error"="0.1") from root.test -``` - -Output series: - -``` -+-----------------------------+------------------------------------+ -| Time|segment(root.test.s1, "error"="0.1")| -+-----------------------------+------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 5.0| -|1970-01-01T08:00:00.200+08:00| 1.0| -|1970-01-01T08:00:01.000+08:00| 9.0| -|1970-01-01T08:00:02.000+08:00| 10.0| -|1970-01-01T08:00:03.000+08:00| -10.0| -|1970-01-01T08:00:03.200+08:00| 10.0| -+-----------------------------+------------------------------------+ -``` - -### Skew - -#### Registration statement - -```sql -create function skew as 'org.apache.iotdb.library.dprofile.UDAFSkew' -``` - -#### Usage - -This function is used to calculate the population skewness. - -**Name:** SKEW - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population skewness. - -**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
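-
-For reference, the population skewness is the third standardized central moment,
-
-$$g_1 = \frac{\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}}$$
-
-which, for the 20 points in the example below, evaluates to approximately $-0.9998$.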
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:00.000+08:00| 1.0| -|2020-01-01T00:00:01.000+08:00| 2.0| -|2020-01-01T00:00:02.000+08:00| 3.0| -|2020-01-01T00:00:03.000+08:00| 4.0| -|2020-01-01T00:00:04.000+08:00| 5.0| -|2020-01-01T00:00:05.000+08:00| 6.0| -|2020-01-01T00:00:06.000+08:00| 7.0| -|2020-01-01T00:00:07.000+08:00| 8.0| -|2020-01-01T00:00:08.000+08:00| 9.0| -|2020-01-01T00:00:09.000+08:00| 10.0| -|2020-01-01T00:00:10.000+08:00| 10.0| -|2020-01-01T00:00:11.000+08:00| 10.0| -|2020-01-01T00:00:12.000+08:00| 10.0| -|2020-01-01T00:00:13.000+08:00| 10.0| -|2020-01-01T00:00:14.000+08:00| 10.0| -|2020-01-01T00:00:15.000+08:00| 10.0| -|2020-01-01T00:00:16.000+08:00| 10.0| -|2020-01-01T00:00:17.000+08:00| 10.0| -|2020-01-01T00:00:18.000+08:00| 10.0| -|2020-01-01T00:00:19.000+08:00| 10.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select skew(s1) from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-----------------------+ -| Time| skew(root.test.d1.s1)| -+-----------------------------+-----------------------+ -|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| -+-----------------------------+-----------------------+ -``` - -### Spline - -#### Registration statement - -```sql -create function spline as 'org.apache.iotdb.library.dprofile.UDTFSpline' -``` - -#### Usage - -This function is used to calculate cubic spline interpolation of input series. - -**Name:** SPLINE - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `points`: Number of resampling points. - -**Output Series:** Output a single series. The type is DOUBLE. - -**Note**: Output series retains the first and last timestamps of input series. Interpolation points are selected at equal intervals. The function tries to calculate only when there are no less than 4 points in input series. 
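-
-For reference, a cubic spline through the sample points $(t_i, v_i)$ fits one cubic per interval $[t_i, t_{i+1}]$,
-
-$$S_i(t) = a_i + b_i(t - t_i) + c_i(t - t_i)^2 + d_i(t - t_i)^3$$
-
-with the coefficients chosen so that $S$, $S'$ and $S''$ are continuous at the interior knots; the boundary condition used by the implementation is not specified here.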
- -#### Examples - -##### Assigning number of interpolation points - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.2| -|1970-01-01T08:00:00.500+08:00| 1.7| -|1970-01-01T08:00:00.700+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 2.1| -|1970-01-01T08:00:01.100+08:00| 2.0| -|1970-01-01T08:00:01.200+08:00| 1.8| -|1970-01-01T08:00:01.300+08:00| 1.2| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 1.6| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select spline(s1, "points"="151") from root.test -``` - -Output series: - -``` -+-----------------------------+------------------------------------+ -| Time|spline(root.test.s1, "points"="151")| -+-----------------------------+------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| -|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| -|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| -|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| -|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| -|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| -|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| -|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| -|1970-01-01T08:00:00.090+08:00| 0.416700020313263| -|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| -|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| -|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| -|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| -|1970-01-01T08:00:00.140+08:00| 0.627200029373169| -|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| -|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| -|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| -|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| -|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| -|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| -|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| -|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| -|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| -|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| -|1970-01-01T08:00:00.250+08:00| 1.037500043710073| -|1970-01-01T08:00:00.260+08:00| 1.071200044631958| -|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| -|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| -|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| -|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| -|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| -|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| -|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| -|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| -|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| -|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| -|1970-01-01T08:00:00.370+08:00| 1.402300051335891| -|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| -|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| -|1970-01-01T08:00:00.400+08:00| 1.480000052054723| -|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| -|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| -|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| -|1970-01-01T08:00:00.440+08:00| 1.575200051755905| -|1970-01-01T08:00:00.450+08:00| 1.597500051409006| -|1970-01-01T08:00:00.460+08:00| 1.619200050938924| -|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| -|1970-01-01T08:00:00.480+08:00| 1.660800049600601| -|1970-01-01T08:00:00.490+08:00| 
1.680700048718055| -|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| -|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| -|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| -|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| -|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| -|1970-01-01T08:00:00.550+08:00| 1.790937543250622| -|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| -|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| -|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| -|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| -|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| -|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| -|1970-01-01T08:00:00.620+08:00| 1.902080033531194| -|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| -|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| -|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| -|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| -|1970-01-01T08:00:00.670+08:00| 1.96736751668245| -|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| -|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| -|1970-01-01T08:00:00.700+08:00| 2.0| -|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| -|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| -|1970-01-01T08:00:00.730+08:00| 2.027367479995188| -|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| -|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| -|1970-01-01T08:00:00.760+08:00| 2.049739960124489| -|1970-01-01T08:00:00.770+08:00| 2.056157453739342| -|1970-01-01T08:00:00.780+08:00| 2.06207994754791| -|1970-01-01T08:00:00.790+08:00| 2.067522441594897| -|1970-01-01T08:00:00.800+08:00| 2.072499935925006| -|1970-01-01T08:00:00.810+08:00| 2.07702743058294| -|1970-01-01T08:00:00.820+08:00| 2.081119925613404| -|1970-01-01T08:00:00.830+08:00| 2.0847924210611| -|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| -|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| -|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| -|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| -|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| -|1970-01-01T08:00:00.890+08:00| 2.098847405012549| -|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| -|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| -|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| -|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| -|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| -|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| -|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| -|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| -|1970-01-01T08:00:00.980+08:00| 2.083039929962151| -|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| -|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| -|1970-01-01T08:00:01.010+08:00| 2.06653244776129| -|1970-01-01T08:00:01.020+08:00| 2.060159954071038| -|1970-01-01T08:00:01.030+08:00| 2.053427460438006| -|1970-01-01T08:00:01.040+08:00| 2.046379966783517| -|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| -|1970-01-01T08:00:01.060+08:00| 2.031519979095454| -|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| -|1970-01-01T08:00:01.080+08:00| 2.015939990377423| -|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| -|1970-01-01T08:00:01.100+08:00| 2.0| -|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| -|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| -|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| -|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| -|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| -|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| -|1970-01-01T08:00:01.170+08:00| 
1.8835963418164114| -|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| -|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| -|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| -|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| -|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| -|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| -|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| -|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| -|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| -|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| -|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| -|1970-01-01T08:00:01.290+08:00| 1.251818225400506| -|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| -|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| -|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| -|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| -|1970-01-01T08:00:01.340+08:00| 1.043200033187868| -|1970-01-01T08:00:01.350+08:00| 1.016666692992053| -|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| -|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| -|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| -|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.410+08:00| 1.023999999165535| -|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| -|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| -|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| -|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| -|1970-01-01T08:00:01.460+08:00| 1.264000005722046| -|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| -|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| -|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| -|1970-01-01T08:00:01.500+08:00| 1.600000023841858| -+-----------------------------+------------------------------------+ -``` - -### Spread - -#### Registration statement - -```sql -create function spread as 'org.apache.iotdb.library.dprofile.UDAFSpread' -``` - -#### Usage - -This function is used to calculate the spread of time series, that is, the maximum value minus the minimum value. - -**Name:** SPREAD - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is 0 and value is the spread. - -**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
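-
-In the example below, the maximum and minimum of the valid points are $126.0$ and $100.0$, so
-
-$$\mathrm{spread} = 126.0 - 100.0 = 26.0$$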
-
-#### Examples
-
-Input series:
-
-```
-+-----------------------------+---------------+
-|                         Time|root.test.d1.s1|
-+-----------------------------+---------------+
-|2020-01-01T00:00:02.000+08:00|          100.0|
-|2020-01-01T00:00:03.000+08:00|          101.0|
-|2020-01-01T00:00:04.000+08:00|          102.0|
-|2020-01-01T00:00:06.000+08:00|          104.0|
-|2020-01-01T00:00:08.000+08:00|          126.0|
-|2020-01-01T00:00:10.000+08:00|          108.0|
-|2020-01-01T00:00:14.000+08:00|          112.0|
-|2020-01-01T00:00:15.000+08:00|          113.0|
-|2020-01-01T00:00:16.000+08:00|          114.0|
-|2020-01-01T00:00:18.000+08:00|          116.0|
-|2020-01-01T00:00:20.000+08:00|          118.0|
-|2020-01-01T00:00:22.000+08:00|          120.0|
-|2020-01-01T00:00:26.000+08:00|          124.0|
-|2020-01-01T00:00:28.000+08:00|          126.0|
-|2020-01-01T00:00:30.000+08:00|            NaN|
-+-----------------------------+---------------+
-```
-
-SQL for query:
-
-```sql
-select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30
-```
-
-Output series:
-
-```
-+-----------------------------+-----------------------+
-|                         Time|spread(root.test.d1.s1)|
-+-----------------------------+-----------------------+
-|1970-01-01T08:00:00.000+08:00|                   26.0|
-+-----------------------------+-----------------------+
-```
-
-
-
-### ZScore
-
-#### Registration statement
-
-```sql
-create function zscore as 'org.apache.iotdb.library.dprofile.UDTFZScore'
-```
-
-#### Usage
-
-This function is used to standardize the input series with z-score.
-
-**Name:** ZSCORE
-
-**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `compute`: When set to "batch", the standardization is performed after importing all data points; when set to "stream", the mean and standard deviation must be provided. The default method is "batch".
-+ `avg`: The mean value when `compute` is set to "stream".
-+ `sd`: The standard deviation when `compute` is set to "stream".
-
-**Output Series:** Output a single series. The type is DOUBLE.
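-
-In "batch" mode this is the usual transform
-
-$$z_i = \frac{x_i - \mu}{\sigma}$$
-
-with the mean $\mu$ and (population) standard deviation $\sigma$ taken over the whole series; in the example below, $\mu = 0.5$ and $\sigma \approx 2.419$, so the value $10.0$ maps to $(10.0 - 0.5)/2.419 \approx 3.93$.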
- -#### Examples - -##### Batch computing - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select zscore(s1) from root.test -``` - -Output series: - -``` -+-----------------------------+--------------------+ -| Time|zscore(root.test.s1)| -+-----------------------------+--------------------+ -|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| -|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| -|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| -|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| -|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| -|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| -|1970-01-01T08:00:00.700+08:00| -1.033622788243404| -|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| -|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| -|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| -|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| -|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| -|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| -|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| -|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| -|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| -|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| -|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| -|1970-01-01T08:00:01.900+08:00| -1.033622788243404| -|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| -+-----------------------------+--------------------+ -``` - - -## Anomaly Detection - -### IQR - -#### Registration statement - -```sql -create function iqr as 'org.apache.iotdb.library.anomaly.UDTFIQR' -``` - -#### Usage - -This function is used to detect anomalies based on IQR. Points distributing beyond 1.5 times IQR are selected. - -**Name:** IQR - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `method`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide upper and lower quantiles. The default method is "batch". -+ `q1`: The lower quantile when method is set to "stream". -+ `q3`: The upper quantile when method is set to "stream". - -**Output Series:** Output a single series. The type is DOUBLE. 
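-
-Read together with the note below, this is the standard IQR rule: a point is reported as an anomaly when it lies more than $1.5 \times IQR$ below the first quartile $Q_1$ or above the third quartile $Q_3$, i.e. outside the interval
-
-$$[\,Q_1 - 1.5\,IQR,\; Q_3 + 1.5\,IQR\,]$$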
- -**Note:** $IQR=Q_3-Q_1$ - -#### Examples - -##### Batch computing - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select iqr(s1) from root.test -``` - -Output series: - -``` -+-----------------------------+-----------------+ -| Time|iqr(root.test.s1)| -+-----------------------------+-----------------+ -|1970-01-01T08:00:01.700+08:00| 10.0| -+-----------------------------+-----------------+ -``` - -### KSigma - -#### Registration statement - -```sql -create function ksigma as 'org.apache.iotdb.library.anomaly.UDTFKSigma' -``` - -#### Usage - -This function is used to detect anomalies based on the Dynamic K-Sigma Algorithm. -Within a sliding window, the input value with a deviation of more than k times the standard deviation from the average will be output as anomaly. - -**Name:** KSIGMA - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `k`: How many times to multiply on standard deviation to define anomaly, the default value is 3. -+ `window`: The window size of Dynamic K-Sigma Algorithm, the default value is 10000. - -**Output Series:** Output a single series. The type is same as input series. - -**Note:** Only when is larger than 0, the anomaly detection will be performed. Otherwise, nothing will be output. 
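-
-In other words, within the current window with mean $\mu$ and standard deviation $\sigma$, a value $x$ is reported as an anomaly when
-
-$$|x - \mu| > k\,\sigma$$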
-
-#### Examples
-
-##### Assigning k
-
-Input series:
-
-```
-+-----------------------------+---------------+
-|                         Time|root.test.d1.s1|
-+-----------------------------+---------------+
-|2020-01-01T00:00:02.000+08:00|            0.0|
-|2020-01-01T00:00:03.000+08:00|           50.0|
-|2020-01-01T00:00:04.000+08:00|          100.0|
-|2020-01-01T00:00:06.000+08:00|          150.0|
-|2020-01-01T00:00:08.000+08:00|          200.0|
-|2020-01-01T00:00:10.000+08:00|          200.0|
-|2020-01-01T00:00:14.000+08:00|          200.0|
-|2020-01-01T00:00:15.000+08:00|          200.0|
-|2020-01-01T00:00:16.000+08:00|          200.0|
-|2020-01-01T00:00:18.000+08:00|          200.0|
-|2020-01-01T00:00:20.000+08:00|          150.0|
-|2020-01-01T00:00:22.000+08:00|          100.0|
-|2020-01-01T00:00:26.000+08:00|           50.0|
-|2020-01-01T00:00:28.000+08:00|            0.0|
-|2020-01-01T00:00:30.000+08:00|            NaN|
-+-----------------------------+---------------+
-```
-
-SQL for query:
-
-```sql
-select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30
-```
-
-Output series:
-
-```
-+-----------------------------+---------------------------------+
-|Time                         |ksigma(root.test.d1.s1,"k"="1.0")|
-+-----------------------------+---------------------------------+
-|2020-01-01T00:00:02.000+08:00|                              0.0|
-|2020-01-01T00:00:03.000+08:00|                             50.0|
-|2020-01-01T00:00:26.000+08:00|                             50.0|
-|2020-01-01T00:00:28.000+08:00|                              0.0|
-+-----------------------------+---------------------------------+
-```
-
-### LOF
-
-#### Registration statement
-
-```sql
-create function LOF as 'org.apache.iotdb.library.anomaly.UDTFLOF'
-```
-
-#### Usage
-
-This function is used to detect density anomalies in time series. Based on the k-th distance parameter and the local outlier factor (LOF) threshold, it judges whether a set of input values is a density anomaly and outputs the local outlier factor for each row of input values.
-
-**Name:** LOF
-
-**Input Series:** Multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `method`: The detection method. The default value is "default", which is used when the input data has multiple dimensions. The alternative is "series", in which a single input series is transformed into a high-dimensional one.
-+ `k`: Use the k-th distance to calculate the LOF. Default value is 3.
-+ `window`: The size of the window used to split the original data points. Default value is 10000.
-+ `windowsize`: The dimension that the series is transformed into when `method` is "series". The default value is 5.
-
-**Output Series:** Output a single series. The type is DOUBLE.
-
-**Note:** Incomplete rows will be ignored. They are neither calculated nor marked as anomaly.
- -#### Examples - -##### Using default parameters - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d1.s1|root.test.d1.s2| -+-----------------------------+---------------+---------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| -|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| -|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| -|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| -|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| -|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| null| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select lof(s1,s2) from root.test.d1 where time<1000 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------+ -| Time|lof(root.test.d1.s1, root.test.d1.s2)| -+-----------------------------+-------------------------------------+ -|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| -|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| -|1970-01-01T08:00:00.300+08:00| 2.838155437762879| -|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| -|1970-01-01T08:00:00.500+08:00| 2.73518261244453| -|1970-01-01T08:00:00.600+08:00| 2.371440975708148| -|1970-01-01T08:00:00.700+08:00| 2.73518261244453| -|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| -+-----------------------------+-------------------------------------+ -``` - -##### Diagnosing 1d timeseries - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.100+08:00| 1.0| -|1970-01-01T08:00:00.200+08:00| 2.0| -|1970-01-01T08:00:00.300+08:00| 3.0| -|1970-01-01T08:00:00.400+08:00| 4.0| -|1970-01-01T08:00:00.500+08:00| 5.0| -|1970-01-01T08:00:00.600+08:00| 6.0| -|1970-01-01T08:00:00.700+08:00| 7.0| -|1970-01-01T08:00:00.800+08:00| 8.0| -|1970-01-01T08:00:00.900+08:00| 9.0| -|1970-01-01T08:00:01.000+08:00| 10.0| -|1970-01-01T08:00:01.100+08:00| 11.0| -|1970-01-01T08:00:01.200+08:00| 12.0| -|1970-01-01T08:00:01.300+08:00| 13.0| -|1970-01-01T08:00:01.400+08:00| 14.0| -|1970-01-01T08:00:01.500+08:00| 15.0| -|1970-01-01T08:00:01.600+08:00| 16.0| -|1970-01-01T08:00:01.700+08:00| 17.0| -|1970-01-01T08:00:01.800+08:00| 18.0| -|1970-01-01T08:00:01.900+08:00| 19.0| -|1970-01-01T08:00:02.000+08:00| 20.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select lof(s1, "method"="series") from root.test.d1 where time<1000 -``` - -Output series: - -``` -+-----------------------------+--------------------+ -| Time|lof(root.test.d1.s1)| -+-----------------------------+--------------------+ -|1970-01-01T08:00:00.100+08:00| 3.77777777777778| -|1970-01-01T08:00:00.200+08:00| 4.32727272727273| -|1970-01-01T08:00:00.300+08:00| 4.85714285714286| -|1970-01-01T08:00:00.400+08:00| 5.40909090909091| -|1970-01-01T08:00:00.500+08:00| 5.94999999999999| -|1970-01-01T08:00:00.600+08:00| 6.43243243243243| -|1970-01-01T08:00:00.700+08:00| 6.79999999999999| -|1970-01-01T08:00:00.800+08:00| 7.0| -|1970-01-01T08:00:00.900+08:00| 7.0| -|1970-01-01T08:00:01.000+08:00| 6.79999999999999| -|1970-01-01T08:00:01.100+08:00| 6.43243243243243| -|1970-01-01T08:00:01.200+08:00| 5.94999999999999| -|1970-01-01T08:00:01.300+08:00| 5.40909090909091| -|1970-01-01T08:00:01.400+08:00| 4.85714285714286| -|1970-01-01T08:00:01.500+08:00| 4.32727272727273| -|1970-01-01T08:00:01.600+08:00| 
3.77777777777778|
-+-----------------------------+--------------------+
-```
-
-### MissDetect
-
-#### Registration statement
-
-```sql
-create function missdetect as 'org.apache.iotdb.library.anomaly.UDTFMissDetect'
-```
-
-#### Usage
-
-This function is used to detect missing anomalies.
-In some datasets, missing values are filled by linear interpolation.
-Thus, there are several long perfect linear segments.
-By discovering these perfect linear segments,
-missing anomalies are detected.
-
-**Name:** MISSDETECT
-
-**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameter:**
-
-`minlen`: The minimum length of the detected missing anomalies, which is an integer greater than or equal to 10. By default, it is 10.
-
-**Output Series:** Output a single series. The type is BOOLEAN. Each data point which is part of a missing anomaly will be labeled as true.
-
-#### Examples
-
-Input series:
-
-```
-+-----------------------------+---------------+
-|                         Time|root.test.d2.s2|
-+-----------------------------+---------------+
-|2021-07-01T12:00:00.000+08:00|            0.0|
-|2021-07-01T12:00:01.000+08:00|            1.0|
-|2021-07-01T12:00:02.000+08:00|            0.0|
-|2021-07-01T12:00:03.000+08:00|            1.0|
-|2021-07-01T12:00:04.000+08:00|            0.0|
-|2021-07-01T12:00:05.000+08:00|            0.0|
-|2021-07-01T12:00:06.000+08:00|            0.0|
-|2021-07-01T12:00:07.000+08:00|            0.0|
-|2021-07-01T12:00:08.000+08:00|            0.0|
-|2021-07-01T12:00:09.000+08:00|            0.0|
-|2021-07-01T12:00:10.000+08:00|            0.0|
-|2021-07-01T12:00:11.000+08:00|            0.0|
-|2021-07-01T12:00:12.000+08:00|            0.0|
-|2021-07-01T12:00:13.000+08:00|            0.0|
-|2021-07-01T12:00:14.000+08:00|            0.0|
-|2021-07-01T12:00:15.000+08:00|            0.0|
-|2021-07-01T12:00:16.000+08:00|            1.0|
-|2021-07-01T12:00:17.000+08:00|            0.0|
-|2021-07-01T12:00:18.000+08:00|            1.0|
-|2021-07-01T12:00:19.000+08:00|            0.0|
-|2021-07-01T12:00:20.000+08:00|            1.0|
-+-----------------------------+---------------+
-```
-
-SQL for query:
-
-```sql
-select missdetect(s2,'minlen'='10') from root.test.d2
-```
-
-Output series:
-
-```
-+-----------------------------+------------------------------------------+
-|                         Time|missdetect(root.test.d2.s2, "minlen"="10")|
-+-----------------------------+------------------------------------------+
-|2021-07-01T12:00:00.000+08:00|                                     false|
-|2021-07-01T12:00:01.000+08:00|                                     false|
-|2021-07-01T12:00:02.000+08:00|                                     false|
-|2021-07-01T12:00:03.000+08:00|                                     false|
-|2021-07-01T12:00:04.000+08:00|                                      true|
-|2021-07-01T12:00:05.000+08:00|                                      true|
-|2021-07-01T12:00:06.000+08:00|                                      true|
-|2021-07-01T12:00:07.000+08:00|                                      true|
-|2021-07-01T12:00:08.000+08:00|                                      true|
-|2021-07-01T12:00:09.000+08:00|                                      true|
-|2021-07-01T12:00:10.000+08:00|                                      true|
-|2021-07-01T12:00:11.000+08:00|                                      true|
-|2021-07-01T12:00:12.000+08:00|                                      true|
-|2021-07-01T12:00:13.000+08:00|                                      true|
-|2021-07-01T12:00:14.000+08:00|                                      true|
-|2021-07-01T12:00:15.000+08:00|                                      true|
-|2021-07-01T12:00:16.000+08:00|                                     false|
-|2021-07-01T12:00:17.000+08:00|                                     false|
-|2021-07-01T12:00:18.000+08:00|                                     false|
-|2021-07-01T12:00:19.000+08:00|                                     false|
-|2021-07-01T12:00:20.000+08:00|                                     false|
-+-----------------------------+------------------------------------------+
-```
-
-### Range
-
-#### Registration statement
-
-```sql
-create function range as 'org.apache.iotdb.library.anomaly.UDTFRange'
-```
-
-#### Usage
-
-This function is used to detect range anomalies of time series. According to the upper bound and lower bound parameters, the function judges whether an input value is beyond the range, i.e., a range anomaly, and a new time series of anomalies will be output.
- -**Name:** RANGE - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `lower_bound`:lower bound of range anomaly detection. -+ `upper_bound`:upper bound of range anomaly detection. - -**Output Series:** Output a single series. The type is the same as the input. - -**Note:** Only when `upper_bound` is larger than `lower_bound`, the anomaly detection will be performed. Otherwise, nothing will be output. - - - -#### Examples - -##### Assigning Lower and Upper Bound - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------------------+ -|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")| -+-----------------------------+------------------------------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -+-----------------------------+------------------------------------------------------------------+ -``` - -### TwoSidedFilter - -#### Registration statement - -```sql -create function twosidedfilter as 'org.apache.iotdb.library.anomaly.UDTFTwoSidedFilter' -``` - -#### Usage - -The function is used to filter anomalies of a numeric time series based on two-sided window detection. - -**Name:** TWOSIDEDFILTER - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE - -**Output Series:** Output a single series. The type is the same as the input. It is the input without anomalies. - -**Parameter:** - -- `len`: The size of the window, which is a positive integer. By default, it's 5. When `len`=3, the algorithm detects forward window and backward window with length 3 and calculates the outlierness of the current point. - -- `threshold`: The threshold of outlierness, which is a floating number in (0,1). By default, it's 0.3. The strict standard of detecting anomalies is in proportion to the threshold. 
- -#### Examples - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s0| -+-----------------------------+------------+ -|1970-01-01T08:00:00.000+08:00| 2002.0| -|1970-01-01T08:00:01.000+08:00| 1946.0| -|1970-01-01T08:00:02.000+08:00| 1958.0| -|1970-01-01T08:00:03.000+08:00| 2012.0| -|1970-01-01T08:00:04.000+08:00| 2051.0| -|1970-01-01T08:00:05.000+08:00| 1898.0| -|1970-01-01T08:00:06.000+08:00| 2014.0| -|1970-01-01T08:00:07.000+08:00| 2052.0| -|1970-01-01T08:00:08.000+08:00| 1935.0| -|1970-01-01T08:00:09.000+08:00| 1901.0| -|1970-01-01T08:00:10.000+08:00| 1972.0| -|1970-01-01T08:00:11.000+08:00| 1969.0| -|1970-01-01T08:00:12.000+08:00| 1984.0| -|1970-01-01T08:00:13.000+08:00| 2018.0| -|1970-01-01T08:00:37.000+08:00| 1484.0| -|1970-01-01T08:00:38.000+08:00| 1055.0| -|1970-01-01T08:00:39.000+08:00| 1050.0| -|1970-01-01T08:01:05.000+08:00| 1023.0| -|1970-01-01T08:01:06.000+08:00| 1056.0| -|1970-01-01T08:01:07.000+08:00| 978.0| -|1970-01-01T08:01:08.000+08:00| 1050.0| -|1970-01-01T08:01:09.000+08:00| 1123.0| -|1970-01-01T08:01:10.000+08:00| 1150.0| -|1970-01-01T08:01:11.000+08:00| 1034.0| -|1970-01-01T08:01:12.000+08:00| 950.0| -|1970-01-01T08:01:13.000+08:00| 1059.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test -``` - -Output series: - -``` -+-----------------------------+------------+ -| Time|root.test.s0| -+-----------------------------+------------+ -|1970-01-01T08:00:00.000+08:00| 2002.0| -|1970-01-01T08:00:01.000+08:00| 1946.0| -|1970-01-01T08:00:02.000+08:00| 1958.0| -|1970-01-01T08:00:03.000+08:00| 2012.0| -|1970-01-01T08:00:04.000+08:00| 2051.0| -|1970-01-01T08:00:05.000+08:00| 1898.0| -|1970-01-01T08:00:06.000+08:00| 2014.0| -|1970-01-01T08:00:07.000+08:00| 2052.0| -|1970-01-01T08:00:08.000+08:00| 1935.0| -|1970-01-01T08:00:09.000+08:00| 1901.0| -|1970-01-01T08:00:10.000+08:00| 1972.0| -|1970-01-01T08:00:11.000+08:00| 1969.0| -|1970-01-01T08:00:12.000+08:00| 1984.0| -|1970-01-01T08:00:13.000+08:00| 2018.0| -|1970-01-01T08:01:05.000+08:00| 1023.0| -|1970-01-01T08:01:06.000+08:00| 1056.0| -|1970-01-01T08:01:07.000+08:00| 978.0| -|1970-01-01T08:01:08.000+08:00| 1050.0| -|1970-01-01T08:01:09.000+08:00| 1123.0| -|1970-01-01T08:01:10.000+08:00| 1150.0| -|1970-01-01T08:01:11.000+08:00| 1034.0| -|1970-01-01T08:01:12.000+08:00| 950.0| -|1970-01-01T08:01:13.000+08:00| 1059.0| -+-----------------------------+------------+ -``` - -### Outlier - -#### Registration statement - -```sql -create function outlier as 'org.apache.iotdb.library.anomaly.UDTFOutlier' -``` - -#### Usage - -This function is used to detect distance-based outliers. For each point in the current window, if the number of its neighbors within the distance of neighbor distance threshold is less than the neighbor count threshold, the point in detected as an outlier. - -**Name:** OUTLIER - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `r`:the neighbor distance threshold. -+ `k`:the neighbor count threshold. -+ `w`:the window size. -+ `s`:the slide size. - -**Output Series:** Output a single series. The type is the same as the input. 
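-
-The rule can be sketched as follows; this is an illustrative Python version of the definition above, not the UDF implementation (for instance, the real function's handling of overlapping windows may differ):
-
-```python
-def distance_outliers(points, r, k, w, s):
-    """Flag points that have fewer than k neighbors within distance r
-    inside each window of size w sliding by s (deduplicated by timestamp)."""
-    flagged = {}
-    for start in range(0, max(len(points) - w + 1, 1), s):
-        window = points[start:start + w]
-        for i, (t, v) in enumerate(window):
-            neighbors = sum(1 for j, (_, u) in enumerate(window)
-                            if j != i and abs(v - u) <= r)
-            if neighbors < k:
-                flagged[t] = v
-    return sorted(flagged.items())
-```
-
-Applied to the example below with `r=5.0`, `k=4`, `w=10` and `s=5`, this flags exactly the two points 69.0 and 52.0.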
-
-#### Examples
-
-##### Assigning Parameters of Queries
-
-Input series:
-
-```
-+-----------------------------+------------+
-|                         Time|root.test.s1|
-+-----------------------------+------------+
-|2020-01-04T23:59:55.000+08:00|        56.0|
-|2020-01-04T23:59:56.000+08:00|        55.1|
-|2020-01-04T23:59:57.000+08:00|        54.2|
-|2020-01-04T23:59:58.000+08:00|        56.3|
-|2020-01-04T23:59:59.000+08:00|        59.0|
-|2020-01-05T00:00:00.000+08:00|        60.0|
-|2020-01-05T00:00:01.000+08:00|        60.5|
-|2020-01-05T00:00:02.000+08:00|        64.5|
-|2020-01-05T00:00:03.000+08:00|        69.0|
-|2020-01-05T00:00:04.000+08:00|        64.2|
-|2020-01-05T00:00:05.000+08:00|        62.3|
-|2020-01-05T00:00:06.000+08:00|        58.0|
-|2020-01-05T00:00:07.000+08:00|        58.9|
-|2020-01-05T00:00:08.000+08:00|        52.0|
-|2020-01-05T00:00:09.000+08:00|        62.3|
-|2020-01-05T00:00:10.000+08:00|        61.0|
-|2020-01-05T00:00:11.000+08:00|        64.2|
-|2020-01-05T00:00:12.000+08:00|        61.8|
-|2020-01-05T00:00:13.000+08:00|        64.0|
-|2020-01-05T00:00:14.000+08:00|        63.0|
-+-----------------------------+------------+
-```
-
-SQL for query:
-
-```sql
-select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test
-```
-
-Output series:
-
-```
-+-----------------------------+--------------------------------------------------------+
-|                         Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")|
-+-----------------------------+--------------------------------------------------------+
-|2020-01-05T00:00:03.000+08:00|                                                    69.0|
-|2020-01-05T00:00:08.000+08:00|                                                    52.0|
-+-----------------------------+--------------------------------------------------------+
-```
-
-
-### MasterTrain
-
-#### Usage
-
-This function is used to train the VAR model based on master data. The model is trained on learning samples consisting of p+1 consecutive non-error points.
-
-**Name:** MasterTrain
-
-**Input Series:** Support multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `p`: The order of the model.
-+ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule.
-
-**Output Series:** Output a single series. The type is the same as the input.
-
-**Installation**
-- Install IoTDB from branch `research/master-detector`.
-- Run `mvn spotless:apply`.
-- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`.
-- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`.
-- Start IoTDB server and run `create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'` in client.
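-
-For intuition, training a VAR(p) model amounts to a least-squares regression of each point on its previous p points. The numpy sketch below shows only that regression step; it is an illustration under stated assumptions, not the UDF's actual procedure, which additionally selects error-free learning samples with the help of the master data.
-
-```python
-import numpy as np
-
-def fit_var(samples: np.ndarray, p: int) -> np.ndarray:
-    """Least-squares fit of a VAR(p) model on samples of shape (T, d).
-    Returns the stacked lag-coefficient matrix of shape (d * p, d)."""
-    T, d = samples.shape
-    # Regress y_t on [y_{t-1}, ..., y_{t-p}] for t = p .. T-1.
-    X = np.hstack([samples[p - i - 1:T - i - 1] for i in range(p)])
-    Y = samples[p:]
-    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
-    return coef
-```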
- -#### Examples - -Input series: - -``` -+-----------------------------+------------+------------+--------------+--------------+ -| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| -+-----------------------------+------------+------------+--------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| -|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| -|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| -|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| -|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| -|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| -|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| -|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| -|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| -|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| -|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| -|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| -|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| -|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| -|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| -|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| -|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| -|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| -|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| -|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| -|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| -|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| -|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| -|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| -+-----------------------------+------------+------------+--------------+--------------+ -``` - -SQL for query: - -```sql -select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------------------------------------------------------------------+ -| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| -+-----------------------------+---------------------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| -|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| -|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| -|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| -|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| -|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| -|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| -|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| -+-----------------------------+---------------------------------------------------------------------------------------------+ -``` - -### MasterDetect - -#### Usage - -This function is used to detect 
time series and repair errors based on master data. The VAR model is trained by MasterTrain. - -**Name:** MasterDetect - -**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `p`: The order of the model. -+ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data. -+ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. -+ `eta`: The detection threshold. By default, it will be estimated based on the 3-sigma rule. -+ `output_type`: The type of output. 'repair' for repairing and 'anomaly' for anomaly detection. -+ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column. - -**Output Series:** Output a single series. The type is the same as the input. - -**Installation** -- Install IoTDB from branch `research/master-detector`. -- Run `mvn spotless:apply`. -- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`. -- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`. -- Start IoTDB server and run `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'` in client. - -#### Examples - -Input series: - -``` -+-----------------------------+------------+------------+--------------+--------------+--------------------+ -| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| -+-----------------------------+------------+------------+--------------+--------------+--------------------+ -|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| -|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| -|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| -|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| -|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| -|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| -|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| -|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| -|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| -|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| -|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| -|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| -|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| -|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| -|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| -|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| -|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| -|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| -|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| 
null| -|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| -|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| -|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| -|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| -|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| -|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| null| -+-----------------------------+------------+------------+--------------+--------------+--------------------+ -``` - -##### Repairing - -SQL for query: - -```sql -select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------------------------------------------+ -| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| -+-----------------------------+--------------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 116.327274| -|1970-01-01T08:00:00.002+08:00| 116.327305| -|1970-01-01T08:00:00.003+08:00| 116.3273291| -|1970-01-01T08:00:00.004+08:00| 116.327342| -|1970-01-01T08:00:00.005+08:00| 116.3273744| -|1970-01-01T08:00:00.006+08:00| 116.3274117| -|1970-01-01T08:00:00.007+08:00| 116.3274396| -|1970-01-01T08:00:00.008+08:00| 116.3274668| -|1970-01-01T08:00:00.009+08:00| 116.3275026| -|1970-01-01T08:00:00.010+08:00| 116.3274967| -|1970-01-01T08:00:00.011+08:00| 116.3274929| -|1970-01-01T08:00:00.012+08:00| 116.3274745| -|1970-01-01T08:00:00.013+08:00| 116.3275095| -|1970-01-01T08:00:00.014+08:00| 116.3274787| -|1970-01-01T08:00:00.015+08:00| 116.3274693| -|1970-01-01T08:00:00.016+08:00| 116.3274941| -|1970-01-01T08:00:00.017+08:00| 116.3275401| -|1970-01-01T08:00:00.018+08:00| 116.3275713| -|1970-01-01T08:00:00.019+08:00| 116.3276003| -|1970-01-01T08:00:00.020+08:00| 116.3276308| -|1970-01-01T08:00:00.021+08:00| 116.3276338| -|1970-01-01T08:00:00.022+08:00| 116.3276684| -|1970-01-01T08:00:00.023+08:00| 116.3277016| -|1970-01-01T08:00:00.024+08:00| 116.3277284| -|1970-01-01T08:00:00.025+08:00| 116.3277562| -+-----------------------------+--------------------------------------------------------------------------------------+ -``` - -##### Anomaly Detection - -SQL for query: - -```sql -select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------------------------------------------------------------+ -| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| -+-----------------------------+---------------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| false| -|1970-01-01T08:00:00.002+08:00| false| -|1970-01-01T08:00:00.003+08:00| false| -|1970-01-01T08:00:00.004+08:00| false| -|1970-01-01T08:00:00.005+08:00| true| -|1970-01-01T08:00:00.006+08:00| true| -|1970-01-01T08:00:00.007+08:00| false| -|1970-01-01T08:00:00.008+08:00| false| -|1970-01-01T08:00:00.009+08:00| false| -|1970-01-01T08:00:00.010+08:00| false| -|1970-01-01T08:00:00.011+08:00| false| -|1970-01-01T08:00:00.012+08:00| false| -|1970-01-01T08:00:00.013+08:00| false| -|1970-01-01T08:00:00.014+08:00| true| -|1970-01-01T08:00:00.015+08:00| false| 
-|1970-01-01T08:00:00.016+08:00| false| -|1970-01-01T08:00:00.017+08:00| false| -|1970-01-01T08:00:00.018+08:00| false| -|1970-01-01T08:00:00.019+08:00| false| -|1970-01-01T08:00:00.020+08:00| false| -|1970-01-01T08:00:00.021+08:00| false| -|1970-01-01T08:00:00.022+08:00| false| -|1970-01-01T08:00:00.023+08:00| false| -|1970-01-01T08:00:00.024+08:00| false| -|1970-01-01T08:00:00.025+08:00| false| -+-----------------------------+---------------------------------------------------------------------------------------+ -``` - - - -## Frequency Domain Analysis - -### Conv - -#### Registration statement - -```sql -create function conv as 'org.apache.iotdb.library.frequency.UDTFConv' -``` - -#### Usage - -This function is used to calculate the convolution, i.e. polynomial multiplication. - -**Name:** CONV - -**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Output:** Output a single series. The type is DOUBLE. It is the result of convolution whose timestamps starting from 0 only indicate the order. - -**Note:** `NaN` in the input series will be ignored. - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s1|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| -|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| -|1970-01-01T08:00:00.002+08:00| 1.0| null| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select conv(s1,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------+ -| Time|conv(root.test.d2.s1, root.test.d2.s2)| -+-----------------------------+--------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 7.0| -|1970-01-01T08:00:00.001+08:00| 2.0| -|1970-01-01T08:00:00.002+08:00| 7.0| -|1970-01-01T08:00:00.003+08:00| 2.0| -+-----------------------------+--------------------------------------+ -``` - -### Deconv - -#### Registration statement - -```sql -create function deconv as 'org.apache.iotdb.library.frequency.UDTFDeconv' -``` - -#### Usage - -This function is used to calculate the deconvolution, i.e. polynomial division. - -**Name:** DECONV - -**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `result`: The result of deconvolution, which is 'quotient' or 'remainder'. By default, the quotient will be output. - -**Output:** Output a single series. The type is DOUBLE. It is the result of deconvolving the second series from the first series (dividing the first series by the second series) whose timestamps starting from 0 only indicate the order. - -**Note:** `NaN` in the input series will be ignored. - -#### Examples - - -##### Calculate the quotient - -When `result` is 'quotient' or the default, this function calculates the quotient of the deconvolution. 
- -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s3|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| -|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| -|1970-01-01T08:00:00.002+08:00| 7.0| null| -|1970-01-01T08:00:00.003+08:00| 2.0| null| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select deconv(s3,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+----------------------------------------+ -| Time|deconv(root.test.d2.s3, root.test.d2.s2)| -+-----------------------------+----------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 1.0| -|1970-01-01T08:00:00.001+08:00| 0.0| -|1970-01-01T08:00:00.002+08:00| 1.0| -+-----------------------------+----------------------------------------+ -``` - -##### Calculate the remainder - -When `result` is 'remainder', this function calculates the remainder of the deconvolution. - -Input series is the same as above, the SQL for query is shown below: - - -```sql -select deconv(s3,s2,'result'='remainder') from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------------------+ -| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| -+-----------------------------+--------------------------------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 1.0| -|1970-01-01T08:00:00.001+08:00| 0.0| -|1970-01-01T08:00:00.002+08:00| 0.0| -|1970-01-01T08:00:00.003+08:00| 0.0| -+-----------------------------+--------------------------------------------------------------+ -``` - -### DWT - -#### Registration statement - -```sql -create function dwt as 'org.apache.iotdb.library.frequency.UDTFDWT' -``` - -#### Usage - -This function is used to calculate 1d discrete wavelet transform of a numerical series. - -**Name:** DWT - -**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `method`: The type of wavelet. May select 'Haar', 'DB4', 'DB6', 'DB8', where DB means Daubechies. User may offer coefficients of wavelet transform and ignore this parameter. Case ignored. -+ `coef`: Coefficients of wavelet transform. When providing this parameter, use comma ',' to split them, and leave no spaces or other punctuations. -+ `layer`: Times to transform. The number of output vectors equals $layer+1$. Default is 1. - -**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. - -**Note:** The length of input series must be an integer number power of 2. 
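-
-For the Haar case, a single transform layer is just pairwise sums and differences scaled by $\sqrt{2}$. The Python sketch below is an illustration of one Haar layer, not the UDF implementation; in the example that follows, the first half of the output holds the approximation coefficients and the second half the detail coefficients.
-
-```python
-import math
-
-def haar_dwt_layer(x):
-    """One layer of the Haar DWT: pairwise (sum, difference) / sqrt(2),
-    returned as approximation coefficients followed by detail coefficients."""
-    assert len(x) % 2 == 0, "the input length must be even"
-    approx = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
-    detail = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
-    return approx + detail
-
-# First pair of the example below: (0.0 + 0.2) / sqrt(2) ~= 0.1414
-print(haar_dwt_layer([0.0, 0.2, 1.5, 1.2, 0.6, 1.7, 0.8, 2.0,
-                      2.5, 2.1, 0.0, 2.0, 1.8, 1.2, 1.0, 1.6])[0])
-```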
- -#### Examples - - -##### Haar wavelet transform - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.100+08:00| 0.2| -|1970-01-01T08:00:00.200+08:00| 1.5| -|1970-01-01T08:00:00.300+08:00| 1.2| -|1970-01-01T08:00:00.400+08:00| 0.6| -|1970-01-01T08:00:00.500+08:00| 1.7| -|1970-01-01T08:00:00.600+08:00| 0.8| -|1970-01-01T08:00:00.700+08:00| 2.0| -|1970-01-01T08:00:00.800+08:00| 2.5| -|1970-01-01T08:00:00.900+08:00| 2.1| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 2.0| -|1970-01-01T08:00:01.200+08:00| 1.8| -|1970-01-01T08:00:01.300+08:00| 1.2| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 1.6| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select dwt(s1,"method"="haar") from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------+ -| Time|dwt(root.test.d1.s1, "method"="haar")| -+-----------------------------+-------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| -|1970-01-01T08:00:00.100+08:00| 1.909188342921157| -|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| -|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| -|1970-01-01T08:00:00.400+08:00| 3.252691126023161| -|1970-01-01T08:00:00.500+08:00| 1.414213562373095| -|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| -|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| -|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| -|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| -|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| -|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| -|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| -|1970-01-01T08:00:01.300+08:00| -1.414213562373095| -|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| -|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| -+-----------------------------+-------------------------------------+ -``` - -### FFT - -#### Registration statement - -```sql -create function fft as 'org.apache.iotdb.library.frequency.UDTFFFT' -``` - -#### Usage - -This function is used to calculate the fast Fourier transform (FFT) of a numerical series. - -**Name:** FFT - -**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `method`: The type of FFT, which is 'uniform' (by default) or 'nonuniform'. If the value is 'uniform', the timestamps will be ignored and all data points will be regarded as equidistant. Thus, the equidistant fast Fourier transform algorithm will be applied. If the value is 'nonuniform' (TODO), the non-equidistant fast Fourier transform algorithm will be applied based on timestamps. -+ `result`: The result of FFT, which is 'real', 'imag', 'abs' or 'angle', corresponding to the real part, imaginary part, magnitude and phase angle. By default, the magnitude will be output. -+ `compress`: The parameter of compression, which is within (0,1]. It is the reserved energy ratio of lossy compression. By default, there is no compression. - - -**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. The timestamps starting from 0 only indicate the order. - -**Note:** `NaN` in the input series will be ignored. - -#### Examples - - -##### Uniform FFT - -With the default `type`, uniform FFT is applied. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 2.902113| -|1970-01-01T08:00:01.000+08:00| 1.1755705| -|1970-01-01T08:00:02.000+08:00| -2.1755705| -|1970-01-01T08:00:03.000+08:00| -1.9021131| -|1970-01-01T08:00:04.000+08:00| 1.0| -|1970-01-01T08:00:05.000+08:00| 1.9021131| -|1970-01-01T08:00:06.000+08:00| 0.1755705| -|1970-01-01T08:00:07.000+08:00| -1.1755705| -|1970-01-01T08:00:08.000+08:00| -0.902113| -|1970-01-01T08:00:09.000+08:00| 0.0| -|1970-01-01T08:00:10.000+08:00| 0.902113| -|1970-01-01T08:00:11.000+08:00| 1.1755705| -|1970-01-01T08:00:12.000+08:00| -0.1755705| -|1970-01-01T08:00:13.000+08:00| -1.9021131| -|1970-01-01T08:00:14.000+08:00| -1.0| -|1970-01-01T08:00:15.000+08:00| 1.9021131| -|1970-01-01T08:00:16.000+08:00| 2.1755705| -|1970-01-01T08:00:17.000+08:00| -1.1755705| -|1970-01-01T08:00:18.000+08:00| -2.902113| -|1970-01-01T08:00:19.000+08:00| 0.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select fft(s1) from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+----------------------+ -| Time| fft(root.test.d1.s1)| -+-----------------------------+----------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| -|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| -|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| -|1970-01-01T08:00:00.004+08:00| 19.999999960195904| -|1970-01-01T08:00:00.005+08:00| 9.999999850988388| -|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| -|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| -|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| -|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| -|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| -|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| -|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| -|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| -|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| -|1970-01-01T08:00:00.015+08:00| 9.999999850988388| -|1970-01-01T08:00:00.016+08:00| 19.999999960195904| -|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| -|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| -|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| -+-----------------------------+----------------------+ -``` - -Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, there are peaks in $k=4$ and $k=5$ of the output. 
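-
-This can be cross-checked offline with numpy (an illustrative check only): for a pure tone of amplitude $A$ in a series of length $N$, the peak magnitude is about $NA/2$, which gives 20 at $k=4$ and 10 at $k=5$ here.
-
-```python
-import numpy as np
-
-t = np.arange(1, 21)
-y = np.sin(2 * np.pi * t / 4) + 2 * np.sin(2 * np.pi * t / 5)  # the example input
-
-magnitude = np.abs(np.fft.fft(y))
-print(np.round(magnitude, 6))  # ~20 at k=4 and ~10 at k=5 (mirrored at k=16 and k=15)
-```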
-
-##### Uniform FFT with Compression
-
-Input series is the same as above, the SQL for query is shown below:
-
-```sql
-select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1
-```
-
-Output series:
-
-```
-+-----------------------------+----------------------+----------------------+
-|                         Time|  fft(root.test.d1.s1,|  fft(root.test.d1.s1,|
-|                             |      "result"="real",|      "result"="imag",|
-|                             |    "compress"="0.99")|    "compress"="0.99")|
-+-----------------------------+----------------------+----------------------+
-|1970-01-01T08:00:00.000+08:00|                   0.0|                   0.0|
-|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8|
-|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7|
-|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8|  5.127422242345858E-8|
-|1970-01-01T08:00:00.004+08:00|    19.021130288047125|    -6.180339875198807|
-|1970-01-01T08:00:00.005+08:00|     9.999999850988388| 3.501852745067114E-16|
-|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8|
-+-----------------------------+----------------------+----------------------+
-```
-
-Note: Based on the conjugate symmetry of the Fourier transform result, only the first half of the compression result is reserved.
-According to the given parameter, data points are reserved from low frequency to high frequency until the reserved energy ratio exceeds it.
-The last data point is reserved to indicate the length of the series.
-
-### HighPass
-
-#### Registration statement
-
-```sql
-create function highpass as 'org.apache.iotdb.library.frequency.UDTFHighPass'
-```
-
-#### Usage
-
-This function performs high-pass filtering on the input series and extracts components above the cutoff frequency.
-The timestamps of input will be ignored and all data points will be regarded as equidistant.
-
-**Name:** HIGHPASS
-
-**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `wpass`: The normalized cutoff frequency, which is within (0,1). This parameter is required.
-
-**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input.
-
-**Note:** `NaN` in the input series will be ignored.
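-
-One way to picture the effect (an illustrative sketch only; the actual filter design used by the UDF is not specified here) is to zero the FFT bins whose normalized frequency lies below `wpass` and transform back:
-
-```python
-import numpy as np
-
-def highpass_sketch(x, wpass):
-    """Illustrative FFT-based high-pass filter: keep the bins at or above the
-    normalized cutoff (1.0 = Nyquist frequency), zero the rest, transform back."""
-    x = np.asarray(x, dtype=float)
-    spectrum = np.fft.fft(x)
-    normalized = np.abs(np.fft.fftfreq(len(x))) / 0.5  # 1.0 at the Nyquist frequency
-    spectrum[normalized < wpass] = 0.0
-    return np.real(np.fft.ifft(spectrum))
-```
-
-In the example below, `wpass=0.45` keeps the $sin(2\pi t/4)$ component (normalized frequency 0.5) and removes the $2sin(2\pi t/5)$ component (normalized frequency 0.4).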
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 2.902113| -|1970-01-01T08:00:01.000+08:00| 1.1755705| -|1970-01-01T08:00:02.000+08:00| -2.1755705| -|1970-01-01T08:00:03.000+08:00| -1.9021131| -|1970-01-01T08:00:04.000+08:00| 1.0| -|1970-01-01T08:00:05.000+08:00| 1.9021131| -|1970-01-01T08:00:06.000+08:00| 0.1755705| -|1970-01-01T08:00:07.000+08:00| -1.1755705| -|1970-01-01T08:00:08.000+08:00| -0.902113| -|1970-01-01T08:00:09.000+08:00| 0.0| -|1970-01-01T08:00:10.000+08:00| 0.902113| -|1970-01-01T08:00:11.000+08:00| 1.1755705| -|1970-01-01T08:00:12.000+08:00| -0.1755705| -|1970-01-01T08:00:13.000+08:00| -1.9021131| -|1970-01-01T08:00:14.000+08:00| -1.0| -|1970-01-01T08:00:15.000+08:00| 1.9021131| -|1970-01-01T08:00:16.000+08:00| 2.1755705| -|1970-01-01T08:00:17.000+08:00| -1.1755705| -|1970-01-01T08:00:18.000+08:00| -2.902113| -|1970-01-01T08:00:19.000+08:00| 0.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select highpass(s1,'wpass'='0.45') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-----------------------------------------+ -| Time|highpass(root.test.d1.s1, "wpass"="0.45")| -+-----------------------------+-----------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| -|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| -|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| -|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| -|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| -|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| -|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| -|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| -|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| -|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| -|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| -|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| -|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| -|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| -|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| -|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| -|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| -|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| -|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| -|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| -+-----------------------------+-----------------------------------------+ -``` - -Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=sin(2\pi t/4)$ after high-pass filtering. - -### IFFT - -#### Registration statement - -```sql -create function ifft as 'org.apache.iotdb.library.frequency.UDTFIFFT' -``` - -#### Usage - -This function treats the two input series as the real and imaginary part of a complex series, performs an inverse fast Fourier transform (IFFT), and outputs the real part of the result. -For the input format, please refer to the output format of `FFT` function. -Moreover, the compressed output of `FFT` function is also supported. - -**Name:** IFFT - -**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `start`: The start time of the output series with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is '1970-01-01 08:00:00'. -+ `interval`: The interval of the output series, which is a positive number with an unit. 
The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it is 1s. - -**Output:** Output a single series. The type is DOUBLE. It is strictly equispaced. The values are the results of IFFT. - -**Note:** If a row contains null points or `NaN`, it will be ignored. - -#### Examples - - -Input series: - -``` -+-----------------------------+----------------------+----------------------+ -| Time| root.test.d1.re| root.test.d1.im| -+-----------------------------+----------------------+----------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| -|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| -|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| -|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| -|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| -|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| -|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| -+-----------------------------+----------------------+----------------------+ -``` - - -SQL for query: - -```sql -select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------------------+ -| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| -| | "start"="2021-01-01 00:00:00")| -+-----------------------------+-------------------------------------------------------+ -|2021-01-01T00:00:00.000+08:00| 2.902112992431231| -|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| -|2021-01-01T00:02:00.000+08:00| -2.175570513757101| -|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| -|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| -|2021-01-01T00:05:00.000+08:00| 1.902113046743454| -|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| -|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| -|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| -|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| -|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| -|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| -|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| -|2021-01-01T00:13:00.000+08:00| -1.902113046743454| -|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| -|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| -|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| -|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| -|2021-01-01T00:18:00.000+08:00| -2.902112992431231| -|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| -+-----------------------------+-------------------------------------------------------+ -``` - -### LowPass - -#### Registration statement - -```sql -create function lowpass as 'org.apache.iotdb.library.frequency.UDTFLowPass' -``` - -#### Usage - -This function performs low-pass filtering on the input series and extracts components below the cutoff frequency. -The timestamps of input will be ignored and all data points will be regarded as equidistant. - -**Name:** LOWPASS - -**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `wpass`: The normalized cutoff frequency which values (0,1). This parameter cannot be lacked. - -**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input. - -**Note:** `NaN` in the input series will be ignored. 
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 2.902113| -|1970-01-01T08:00:01.000+08:00| 1.1755705| -|1970-01-01T08:00:02.000+08:00| -2.1755705| -|1970-01-01T08:00:03.000+08:00| -1.9021131| -|1970-01-01T08:00:04.000+08:00| 1.0| -|1970-01-01T08:00:05.000+08:00| 1.9021131| -|1970-01-01T08:00:06.000+08:00| 0.1755705| -|1970-01-01T08:00:07.000+08:00| -1.1755705| -|1970-01-01T08:00:08.000+08:00| -0.902113| -|1970-01-01T08:00:09.000+08:00| 0.0| -|1970-01-01T08:00:10.000+08:00| 0.902113| -|1970-01-01T08:00:11.000+08:00| 1.1755705| -|1970-01-01T08:00:12.000+08:00| -0.1755705| -|1970-01-01T08:00:13.000+08:00| -1.9021131| -|1970-01-01T08:00:14.000+08:00| -1.0| -|1970-01-01T08:00:15.000+08:00| 1.9021131| -|1970-01-01T08:00:16.000+08:00| 2.1755705| -|1970-01-01T08:00:17.000+08:00| -1.1755705| -|1970-01-01T08:00:18.000+08:00| -2.902113| -|1970-01-01T08:00:19.000+08:00| 0.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select lowpass(s1,'wpass'='0.45') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+----------------------------------------+ -| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| -+-----------------------------+----------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| -|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| -|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| -|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| -|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| -|1970-01-01T08:00:05.000+08:00| 1.902113046743454| -|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| -|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| -|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| -|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| -|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| -|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| -|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| -|1970-01-01T08:00:13.000+08:00| -1.902113046743454| -|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| -|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| -|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| -|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| -|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| -|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| -+-----------------------------+----------------------------------------+ -``` - -Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=2sin(2\pi t/5)$ after low-pass filtering. - - - -## Data Matching - -### Cov - -#### Registration statement - -```sql -create function cov as 'org.apache.iotdb.library.dmatch.UDAFCov' -``` - -#### Usage - -This function is used to calculate the population covariance. - -**Name:** COV - -**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population covariance. - -**Note:** - -+ If a row contains missing points, null points or `NaN`, it will be ignored; -+ If all rows are ignored, `NaN` will be output. 
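-
-As a cross-check, the population covariance of the pairwise-complete rows can be computed with numpy (illustrative only; `bias=True` selects the population rather than the sample covariance):
-
-```python
-import numpy as np
-
-# The rows of the example below in which neither value is null or NaN.
-s1 = np.array([100.0, 102.0, 104.0, 126.0, 108.0, 112.0,
-               114.0, 116.0, 118.0, 100.0, 124.0, 126.0])
-s2 = np.array([101.0, 101.0, 102.0, 102.0, 103.0, 104.0,
-               104.0, 105.0, 105.0, 106.0, 108.0, 108.0])
-
-print(np.cov(s1, s2, bias=True)[0, 1])  # ~12.2917, matching the example
-```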
- - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s1|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| -|2020-01-01T00:00:03.000+08:00| 101.0| null| -|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| -|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| -|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| -|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| -|2020-01-01T00:00:12.000+08:00| null| 103.0| -|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| -|2020-01-01T00:00:15.000+08:00| 113.0| null| -|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| -|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| -|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| -|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| -|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| -|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| -|2020-01-01T00:00:30.000+08:00| NaN| 108.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select cov(s1,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------+ -| Time|cov(root.test.d2.s1, root.test.d2.s2)| -+-----------------------------+-------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 12.291666666666666| -+-----------------------------+-------------------------------------+ -``` - -### DTW - -#### Registration statement - -```sql -create function dtw as 'org.apache.iotdb.library.dmatch.UDAFDtw' -``` - -#### Usage - -This function is used to calculate the DTW distance between two input series. - -**Name:** DTW - -**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the DTW distance. - -**Note:** - -+ If a row contains missing points, null points or `NaN`, it will be ignored; -+ If all rows are ignored, `0` will be output. 
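-
-The distance can be sketched with the classic dynamic-programming recurrence; the sketch below uses the absolute difference as the point-wise cost, which reproduces the example, although the UDF's internal details are not spelled out here.
-
-```python
-def dtw(a, b):
-    """Classic DTW distance with |x - y| as the point-wise cost."""
-    n, m = len(a), len(b)
-    inf = float("inf")
-    d = [[inf] * (m + 1) for _ in range(n + 1)]
-    d[0][0] = 0.0
-    for i in range(1, n + 1):
-        for j in range(1, m + 1):
-            cost = abs(a[i - 1] - b[j - 1])
-            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
-    return d[n][m]
-
-# Two constant series of length 20 at 1.0 and 2.0, as in the example below.
-print(dtw([1.0] * 20, [2.0] * 20))  # 20.0
-```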
- - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s1|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.016+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select dtw(s1,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------+ -| Time|dtw(root.test.d2.s1, root.test.d2.s2)| -+-----------------------------+-------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 20.0| -+-----------------------------+-------------------------------------+ -``` - -### Pearson - -#### Registration statement - -```sql -create function pearson as 'org.apache.iotdb.library.dmatch.UDAFPearson' -``` - -#### Usage - -This function is used to calculate the Pearson Correlation Coefficient. - -**Name:** PEARSON - -**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the Pearson Correlation Coefficient. - -**Note:** - -+ If a row contains missing points, null points or `NaN`, it will be ignored; -+ If all rows are ignored, `NaN` will be output. 
- - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s1|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| -|2020-01-01T00:00:03.000+08:00| 101.0| null| -|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| -|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| -|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| -|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| -|2020-01-01T00:00:12.000+08:00| null| 103.0| -|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| -|2020-01-01T00:00:15.000+08:00| 113.0| null| -|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| -|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| -|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| -|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| -|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| -|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| -|2020-01-01T00:00:30.000+08:00| NaN| 108.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select pearson(s1,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-----------------------------------------+ -| Time|pearson(root.test.d2.s1, root.test.d2.s2)| -+-----------------------------+-----------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| -+-----------------------------+-----------------------------------------+ -``` - -### PtnSym - -#### Registration statement - -```sql -create function ptnsym as 'org.apache.iotdb.library.dmatch.UDTFPtnSym' -``` - -#### Usage - -This function is used to find all symmetric subseries in the input whose degree of symmetry is less than the threshold. -The degree of symmetry is calculated by DTW. -The smaller the degree, the more symmetrical the series is. - -**Name:** PATTERNSYMMETRIC - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE - -**Parameter:** - -+ `window`: The length of the symmetric subseries. It's a positive integer and the default value is 10. -+ `threshold`: The threshold of the degree of symmetry. It's non-negative. Only the subseries whose degree of symmetry is below it will be output. By default, all subseries will be output. - - -**Output Series:** Output a single series. The type is DOUBLE. Each data point in the output series corresponds to a symmetric subseries. The output timestamp is the starting timestamp of the subseries and the output value is the degree of symmetry. 
- -#### Example - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s4| -+-----------------------------+---------------+ -|2021-01-01T12:00:00.000+08:00| 1.0| -|2021-01-01T12:00:01.000+08:00| 2.0| -|2021-01-01T12:00:02.000+08:00| 3.0| -|2021-01-01T12:00:03.000+08:00| 2.0| -|2021-01-01T12:00:04.000+08:00| 1.0| -|2021-01-01T12:00:05.000+08:00| 1.0| -|2021-01-01T12:00:06.000+08:00| 1.0| -|2021-01-01T12:00:07.000+08:00| 1.0| -|2021-01-01T12:00:08.000+08:00| 2.0| -|2021-01-01T12:00:09.000+08:00| 3.0| -|2021-01-01T12:00:10.000+08:00| 2.0| -|2021-01-01T12:00:11.000+08:00| 1.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------+ -| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| -+-----------------------------+------------------------------------------------------+ -|2021-01-01T12:00:00.000+08:00| 0.0| -|2021-01-01T12:00:07.000+08:00| 0.0| -+-----------------------------+------------------------------------------------------+ -``` - -### XCorr - -#### Registration statement - -```sql -create function xcorr as 'org.apache.iotdb.library.dmatch.UDTFXCorr' -``` - -#### Usage - -This function is used to calculate the cross correlation function of given two time series. -For discrete time series, cross correlation is given by -$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ -which represent the similarities between two series with different index shifts. - -**Name:** XCORR - -**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series with DOUBLE as datatype. -There are $2N-1$ data points in the series, the center of which represents the cross correlation -calculated with pre-aligned series(that is $CR(0)$ in the formula above), -and the previous(or post) values represent those with shifting the latter series forward(or backward otherwise) -until the two series are no longer overlapped(not included). -In short, the values of output series are given by(index starts from 1) -$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ -$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ - -**Note:** - -+ `null` and `NaN` values in the input series will be ignored and treated as 0. 
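-
-To make the indexing concrete, for two series of length $N=5$ the formulas above give
-
-$$OS[1] = CR(-4) = \frac{1}{5}S_1[1]S_2[5],\quad OS[5] = CR(0) = \frac{1}{5}\sum_{m=1}^{5}S_1[m]S_2[m],\quad OS[9] = CR(4) = \frac{1}{5}S_1[5]S_2[1]$$
-
-which matches the example below, where two series of length 5 produce $2N-1=9$ output points with $CR(0)$ at the 5th position.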
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d1.s1|root.test.d1.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:01.000+08:00| null| 6| -|2020-01-01T00:00:02.000+08:00| 2| 7| -|2020-01-01T00:00:03.000+08:00| 3| NaN| -|2020-01-01T00:00:04.000+08:00| 4| 9| -|2020-01-01T00:00:05.000+08:00| 5| 10| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 -``` - -Output series: - -``` -+-----------------------------+---------------------------------------+ -| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| -+-----------------------------+---------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 0.0| -|1970-01-01T08:00:00.002+08:00| 4.0| -|1970-01-01T08:00:00.003+08:00| 9.6| -|1970-01-01T08:00:00.004+08:00| 13.4| -|1970-01-01T08:00:00.005+08:00| 20.0| -|1970-01-01T08:00:00.006+08:00| 15.6| -|1970-01-01T08:00:00.007+08:00| 9.2| -|1970-01-01T08:00:00.008+08:00| 11.8| -|1970-01-01T08:00:00.009+08:00| 6.0| -+-----------------------------+---------------------------------------+ -``` - - - -## Data Repairing - -### TimestampRepair - -#### Registration statement - -```sql -create function timestamprepair as 'org.apache.iotdb.library.drepair.UDTFTimestampRepair' -``` - -#### Usage - -This function is used for timestamp repair. -According to the given standard time interval, -the method of minimizing the repair cost is adopted. -By fine-tuning the timestamps, -the original data with unstable timestamp interval is repaired to strictly equispaced data. -If no standard time interval is given, -this function will use the **median**, **mode** or **cluster** of the time interval to estimate the standard time interval. - -**Name:** TIMESTAMPREPAIR - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `interval`: The standard time interval whose unit is millisecond. It is a positive integer. By default, it will be estimated according to the given method. -+ `method`: The method to estimate the standard time interval, which is 'median', 'mode' or 'cluster'. This parameter is only valid when `interval` is not given. By default, median will be used. - -**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. - -#### Examples - -##### Manually Specify the Standard Time Interval - -When `interval` is given, this function repairs according to the given standard time interval. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s1| -+-----------------------------+---------------+ -|2021-07-01T12:00:00.000+08:00| 1.0| -|2021-07-01T12:00:10.000+08:00| 2.0| -|2021-07-01T12:00:19.000+08:00| 3.0| -|2021-07-01T12:00:30.000+08:00| 4.0| -|2021-07-01T12:00:40.000+08:00| 5.0| -|2021-07-01T12:00:50.000+08:00| 6.0| -|2021-07-01T12:01:01.000+08:00| 7.0| -|2021-07-01T12:01:11.000+08:00| 8.0| -|2021-07-01T12:01:21.000+08:00| 9.0| -|2021-07-01T12:01:31.000+08:00| 10.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select timestamprepair(s1,'interval'='10000') from root.test.d2 -``` - -Output series: - - -``` -+-----------------------------+----------------------------------------------------+ -| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| -+-----------------------------+----------------------------------------------------+ -|2021-07-01T12:00:00.000+08:00| 1.0| -|2021-07-01T12:00:10.000+08:00| 2.0| -|2021-07-01T12:00:20.000+08:00| 3.0| -|2021-07-01T12:00:30.000+08:00| 4.0| -|2021-07-01T12:00:40.000+08:00| 5.0| -|2021-07-01T12:00:50.000+08:00| 6.0| -|2021-07-01T12:01:00.000+08:00| 7.0| -|2021-07-01T12:01:10.000+08:00| 8.0| -|2021-07-01T12:01:20.000+08:00| 9.0| -|2021-07-01T12:01:30.000+08:00| 10.0| -+-----------------------------+----------------------------------------------------+ -``` - -##### Automatically Estimate the Standard Time Interval - -When `interval` is default, this function estimates the standard time interval. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select timestamprepair(s1) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+--------------------------------+ -| Time|timestamprepair(root.test.d2.s1)| -+-----------------------------+--------------------------------+ -|2021-07-01T12:00:00.000+08:00| 1.0| -|2021-07-01T12:00:10.000+08:00| 2.0| -|2021-07-01T12:00:20.000+08:00| 3.0| -|2021-07-01T12:00:30.000+08:00| 4.0| -|2021-07-01T12:00:40.000+08:00| 5.0| -|2021-07-01T12:00:50.000+08:00| 6.0| -|2021-07-01T12:01:00.000+08:00| 7.0| -|2021-07-01T12:01:10.000+08:00| 8.0| -|2021-07-01T12:01:20.000+08:00| 9.0| -|2021-07-01T12:01:30.000+08:00| 10.0| -+-----------------------------+--------------------------------+ -``` - -### ValueFill - -#### Registration statement - -```sql -create function valuefill as 'org.apache.iotdb.library.drepair.UDTFValueFill' -``` - -#### Usage - -This function is used to impute time series. Several methods are supported. - -**Name**: ValueFill -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, default "linear". - Method to use for imputation in series. "mean": use global mean value to fill holes; "previous": propagate last valid observation forward to next valid. "linear": simplest interpolation method; "likelihood":Maximum likelihood estimation based on the normal distribution of speed; "AR": auto regression; "MA": moving average; "SCREEN": speed constraint. - -**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. - -**Note:** AR method use AR(1) model. Input value should be auto-correlated, or the function would output a single point (0, 0.0). - -#### Examples - -##### Fill with linear - -When `method` is "linear" or the default, Screen method is used to impute. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| NaN| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| NaN| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| NaN| -|2020-01-01T00:00:22.000+08:00| NaN| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| 128.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select valuefill(s1) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-----------------------+ -| Time|valuefill(root.test.d2)| -+-----------------------------+-----------------------+ -|2020-01-01T00:00:02.000+08:00| NaN| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 108.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.7| -|2020-01-01T00:00:22.000+08:00| 121.3| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| 128.0| -+-----------------------------+-----------------------+ -``` - -##### Previous Fill - -When `method` is "previous", previous method is used. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select valuefill(s1,"method"="previous") from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------+ -| Time|valuefill(root.test.d2,"method"="previous")| -+-----------------------------+-------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| NaN| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 110.5| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 116.0| -|2020-01-01T00:00:22.000+08:00| 116.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| 128.0| -+-----------------------------+-------------------------------------------+ -``` - -### ValueRepair - -#### Registration statement - -```sql -create function valuerepair as 'org.apache.iotdb.library.drepair.UDTFValueRepair' -``` - -#### Usage - -This function is used to repair the value of the time series. -Currently, two methods are supported: -**Screen** is a method based on speed threshold, which makes all speeds meet the threshold requirements under the premise of minimum changes; -**LsGreedy** is a method based on speed change likelihood, which models speed changes as Gaussian distribution, and uses a greedy algorithm to maximize the likelihood. - - -**Name:** VALUEREPAIR - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. 
- -**Parameters:** - -+ `method`: The method used to repair, which is 'Screen' or 'LsGreedy'. By default, Screen is used. -+ `minSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds below it will be regarded as outliers. By default, it is the median minus 3 times of median absolute deviation. -+ `maxSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds above it will be regarded as outliers. By default, it is the median plus 3 times of median absolute deviation. -+ `center`: This parameter is only valid with LsGreedy. It is the center of the Gaussian distribution of speed changes. By default, it is 0. -+ `sigma`: This parameter is only valid with LsGreedy. It is the standard deviation of the Gaussian distribution of speed changes. By default, it is the median absolute deviation. - -**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. - -**Note:** `NaN` will be filled with linear interpolation before repairing. - -#### Examples - -##### Repair with Screen - -When `method` is 'Screen' or the default, Screen method is used. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 100.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select valuerepair(s1) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+----------------------------+ -| Time|valuerepair(root.test.d2.s1)| -+-----------------------------+----------------------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 106.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| 128.0| -+-----------------------------+----------------------------+ -``` - -##### Repair with LsGreedy - -When `method` is 'LsGreedy', LsGreedy method is used. 
-
-Input series is the same as above, the SQL for query is shown below:
-
-```sql
-select valuerepair(s1,'method'='LsGreedy') from root.test.d2
-```
-
-Output series:
-
-```
-+-----------------------------+-------------------------------------------------+
-|                         Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")|
-+-----------------------------+-------------------------------------------------+
-|2020-01-01T00:00:02.000+08:00|                                            100.0|
-|2020-01-01T00:00:03.000+08:00|                                            101.0|
-|2020-01-01T00:00:04.000+08:00|                                            102.0|
-|2020-01-01T00:00:06.000+08:00|                                            104.0|
-|2020-01-01T00:00:08.000+08:00|                                            106.0|
-|2020-01-01T00:00:10.000+08:00|                                            108.0|
-|2020-01-01T00:00:14.000+08:00|                                            112.0|
-|2020-01-01T00:00:15.000+08:00|                                            113.0|
-|2020-01-01T00:00:16.000+08:00|                                            114.0|
-|2020-01-01T00:00:18.000+08:00|                                            116.0|
-|2020-01-01T00:00:20.000+08:00|                                            118.0|
-|2020-01-01T00:00:22.000+08:00|                                            120.0|
-|2020-01-01T00:00:26.000+08:00|                                            124.0|
-|2020-01-01T00:00:28.000+08:00|                                            126.0|
-|2020-01-01T00:00:30.000+08:00|                                            128.0|
-+-----------------------------+-------------------------------------------------+
-```
-
-### MasterRepair
-
-#### Usage
-
-This function is used to clean time series with master data.
-
-**Name:** MasterRepair
-
-**Input Series:** Support multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `omega`: The window size. It is a non-negative integer whose unit is millisecond. By default, it will be estimated according to the distances of two tuples with various time differences.
-+ `eta`: The distance threshold. It is a positive number. By default, it will be estimated according to the distance distribution of tuples in windows.
-+ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data.
-+ `output_column`: The repaired column to output. It defaults to 1, which means the repair result of the first column is output.
-
-**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing.
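-
-The example below relies entirely on the default parameter estimation. When the window size, distance threshold and neighbor count are known in advance, they can be passed explicitly with the usual 'key'='value' syntax; the values in this sketch are placeholders rather than recommendations:
-
-```sql
-select MasterRepair(t1,t2,t3,m1,m2,m3,'omega'='2000','eta'='3.0','k'='5','output_column'='2') from root.test
-```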
- -#### Examples - -Input series: - -``` -+-----------------------------+------------+------------+------------+------------+------------+------------+ -| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| -+-----------------------------+------------+------------+------------+------------+------------+------------+ -|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| -|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| -|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| -|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| -|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| -|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| -|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| -|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| -+-----------------------------+------------+------------+------------+------------+------------+------------+ -``` - -SQL for query: - -```sql -select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test -``` - -Output series: - - -``` -+-----------------------------+-------------------------------------------------------------------------------------------+ -| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| -+-----------------------------+-------------------------------------------------------------------------------------------+ -|2021-07-01T12:00:01.000+08:00| 1704| -|2021-07-01T12:00:02.000+08:00| 1702| -|2021-07-01T12:00:03.000+08:00| 1702| -|2021-07-01T12:00:04.000+08:00| 1701| -|2021-07-01T12:00:07.000+08:00| 1703| -|2021-07-01T12:00:08.000+08:00| 1704| -|2021-07-01T12:01:09.000+08:00| 1705| -|2021-07-01T12:01:10.000+08:00| 1706| -+-----------------------------+-------------------------------------------------------------------------------------------+ -``` - -### SeasonalRepair - -#### Usage -This function is used to repair the value of the seasonal time series via decomposition. Currently, two methods are supported: **Classical** - detect irregular fluctuations through residual component decomposed by classical decomposition, and repair them through moving average; **Improved** - detect irregular fluctuations through residual component decomposed by improved decomposition, and repair them through moving median. - -**Name:** SEASONALREPAIR - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `method`: The decomposition method used to repair, which is 'Classical' or 'Improved'. By default, classical decomposition is used. -+ `period`: It is the period of the time series. -+ `k`: It is the range threshold of residual term, which limits the degree to which the residual term is off-center. By default, it is 9. -+ `max_iter`: It is the maximum number of iterations for the algorithm. By default, it is 10. - -**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. - -**Note:** `NaN` will be filled with linear interpolation before repairing. - -#### Examples - -##### Repair with Classical - -When `method` is 'Classical' or default value, classical decomposition method is used. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:04.000+08:00| 120.0| -|2020-01-01T00:00:06.000+08:00| 80.0| -|2020-01-01T00:00:08.000+08:00| 100.5| -|2020-01-01T00:00:10.000+08:00| 119.5| -|2020-01-01T00:00:12.000+08:00| 101.0| -|2020-01-01T00:00:14.000+08:00| 99.5| -|2020-01-01T00:00:16.000+08:00| 119.0| -|2020-01-01T00:00:18.000+08:00| 80.5| -|2020-01-01T00:00:20.000+08:00| 99.0| -|2020-01-01T00:00:22.000+08:00| 121.0| -|2020-01-01T00:00:24.000+08:00| 79.5| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------+ -| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| -+-----------------------------+--------------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:04.000+08:00| 120.0| -|2020-01-01T00:00:06.000+08:00| 80.0| -|2020-01-01T00:00:08.000+08:00| 100.5| -|2020-01-01T00:00:10.000+08:00| 119.5| -|2020-01-01T00:00:12.000+08:00| 87.0| -|2020-01-01T00:00:14.000+08:00| 99.5| -|2020-01-01T00:00:16.000+08:00| 119.0| -|2020-01-01T00:00:18.000+08:00| 80.5| -|2020-01-01T00:00:20.000+08:00| 99.0| -|2020-01-01T00:00:22.000+08:00| 121.0| -|2020-01-01T00:00:24.000+08:00| 79.5| -+-----------------------------+--------------------------------------------------+ -``` - -##### Repair with Improved -When `method` is 'Improved', improved decomposition method is used. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------------------------+ -| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| -+-----------------------------+-------------------------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:04.000+08:00| 120.0| -|2020-01-01T00:00:06.000+08:00| 80.0| -|2020-01-01T00:00:08.000+08:00| 100.5| -|2020-01-01T00:00:10.000+08:00| 119.5| -|2020-01-01T00:00:12.000+08:00| 81.5| -|2020-01-01T00:00:14.000+08:00| 99.5| -|2020-01-01T00:00:16.000+08:00| 119.0| -|2020-01-01T00:00:18.000+08:00| 80.5| -|2020-01-01T00:00:20.000+08:00| 99.0| -|2020-01-01T00:00:22.000+08:00| 121.0| -|2020-01-01T00:00:24.000+08:00| 79.5| -+-----------------------------+-------------------------------------------------------------+ -``` - - - -## Series Discovery - -### ConsecutiveSequences - -#### Registration statement - -```sql -create function consecutivesequences as 'org.apache.iotdb.library.series.UDTFConsecutiveSequences' -``` - -#### Usage - -This function is used to find locally longest consecutive subsequences in strictly equispaced multidimensional data. - -Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. - -Consecutive subsequence is the subsequence that is strictly equispaced with the standard time interval without any missing data. If a consecutive subsequence is not a proper subsequence of any consecutive subsequence, it is locally longest. 
- -**Name:** CONSECUTIVESEQUENCES - -**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. - -**Parameters:** - -+ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. - -**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a locally longest consecutive subsequence. The output timestamp is the starting timestamp of the subsequence and the output value is the number of data points in the subsequence. - -**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. - -#### Examples - -##### Manually Specify the Standard Time Interval - -It's able to manually specify the standard time interval by the parameter `gap`. It's notable that false parameter leads to false output. - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d1.s1|root.test.d1.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:40:00.000+08:00| 1.0| null| -|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------------------+ -| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, "gap"="5m")| -+-----------------------------+------------------------------------------------------------------+ -|2020-01-01T00:00:00.000+08:00| 3| -|2020-01-01T00:20:00.000+08:00| 4| -|2020-01-01T00:45:00.000+08:00| 2| -+-----------------------------+------------------------------------------------------------------+ -``` - - -##### Automatically Estimate the Standard Time Interval - -When `gap` is default, this function estimates the standard time interval by the mode of time intervals and gets the same results. Therefore, this usage is more recommended. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select consecutivesequences(s1,s2) from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------+ -| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| -+-----------------------------+------------------------------------------------------+ -|2020-01-01T00:00:00.000+08:00| 3| -|2020-01-01T00:20:00.000+08:00| 4| -|2020-01-01T00:45:00.000+08:00| 2| -+-----------------------------+------------------------------------------------------+ -``` - -### ConsecutiveWindows - -#### Registration statement - -```sql -create function consecutivewindows as 'org.apache.iotdb.library.series.UDTFConsecutiveWindows' -``` - -#### Usage - -This function is used to find consecutive windows of specified length in strictly equispaced multidimensional data. - -Strictly equispaced data is the data whose time intervals are strictly equal. 
Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. - -Consecutive window is the subsequence that is strictly equispaced with the standard time interval without any missing data. - -**Name:** CONSECUTIVEWINDOWS - -**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. - -**Parameters:** - -+ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. -+ `length`: The length of the window which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. - -**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a consecutive window. The output timestamp is the starting timestamp of the window and the output value is the number of data points in the window. - -**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. - -#### Examples - - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d1.s1|root.test.d1.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:40:00.000+08:00| 1.0| null| -|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------------------------+ -| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| -+-----------------------------+--------------------------------------------------------------------+ -|2020-01-01T00:00:00.000+08:00| 3| -|2020-01-01T00:20:00.000+08:00| 3| -|2020-01-01T00:25:00.000+08:00| 3| -+-----------------------------+--------------------------------------------------------------------+ -``` - - - -## Machine Learning - -### AR - -#### Registration statement - -```sql -create function ar as 'org.apache.iotdb.library.dlearn.UDTFAR' -``` - -#### Usage - -This function is used to learn the coefficients of the autoregressive models for a time series. - -**Name:** AR - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -- `p`: The order of the autoregressive model. Its default value is 1. - -**Output Series:** Output a single series. The type is DOUBLE. The first line corresponds to the first order coefficient, and so on. - -**Note:** - -- Parameter `p` should be a positive integer. -- Most points in the series should be sampled at a constant time interval. -- Linear interpolation is applied for the missing points in the series. 
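-
-As a reminder of what the coefficients describe (this is the standard AR definition, not something specific to this function), an AR($p$) model assumes
-
-$$X_t = \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t$$
-
-and the output series lists the estimated $\varphi_1, \dots, \varphi_p$ in order.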
- -#### Examples - -##### Assigning Model Order - -Input Series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d0.s0| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| -4.0| -|2020-01-01T00:00:02.000+08:00| -3.0| -|2020-01-01T00:00:03.000+08:00| -2.0| -|2020-01-01T00:00:04.000+08:00| -1.0| -|2020-01-01T00:00:05.000+08:00| 0.0| -|2020-01-01T00:00:06.000+08:00| 1.0| -|2020-01-01T00:00:07.000+08:00| 2.0| -|2020-01-01T00:00:08.000+08:00| 3.0| -|2020-01-01T00:00:09.000+08:00| 4.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select ar(s0,"p"="2") from root.test.d0 -``` - -Output Series: - -``` -+-----------------------------+---------------------------+ -| Time|ar(root.test.d0.s0,"p"="2")| -+-----------------------------+---------------------------+ -|1970-01-01T08:00:00.001+08:00| 0.9429| -|1970-01-01T08:00:00.002+08:00| -0.2571| -+-----------------------------+---------------------------+ -``` - -### Representation - -#### Usage - -This function is used to represent a time series. - -**Name:** Representation - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -- `tb`: The number of timestamp blocks. Its default value is 10. -- `vb`: The number of value blocks. Its default value is 10. - -**Output Series:** Output a single series. The type is INT32. The length is `tb*vb`. The timestamps starting from 0 only indicate the order. - -**Note:** - -- Parameters `tb` and `vb` should be positive integers. - -#### Examples - -##### Assigning Window Size and Dimension - -Input Series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d0.s0| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| -4.0| -|2020-01-01T00:00:02.000+08:00| -3.0| -|2020-01-01T00:00:03.000+08:00| -2.0| -|2020-01-01T00:00:04.000+08:00| -1.0| -|2020-01-01T00:00:05.000+08:00| 0.0| -|2020-01-01T00:00:06.000+08:00| 1.0| -|2020-01-01T00:00:07.000+08:00| 2.0| -|2020-01-01T00:00:08.000+08:00| 3.0| -|2020-01-01T00:00:09.000+08:00| 4.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select representation(s0,"tb"="3","vb"="2") from root.test.d0 -``` - -Output Series: - -``` -+-----------------------------+-------------------------------------------------+ -| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| -+-----------------------------+-------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1| -|1970-01-01T08:00:00.002+08:00| 1| -|1970-01-01T08:00:00.003+08:00| 0| -|1970-01-01T08:00:00.004+08:00| 0| -|1970-01-01T08:00:00.005+08:00| 1| -|1970-01-01T08:00:00.006+08:00| 1| -+-----------------------------+-------------------------------------------------+ -``` - -### RM - -#### Usage - -This function is used to calculate the matching score of two time series according to the representation. - -**Name:** RM - -**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -- `tb`: The number of timestamp blocks. Its default value is 10. -- `vb`: The number of value blocks. Its default value is 10. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the matching score. - -**Note:** - -- Parameters `tb` and `vb` should be positive integers. 
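-
-Both parameters are optional. A usage sketch with the default block numbers (`tb`=10, `vb`=10), on the same series as in the example below:
-
-```sql
-select rm(s0, s1) from root.test.d0
-```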
- -#### Examples - -##### Assigning Window Size and Dimension - -Input Series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d0.s0|root.test.d0.s1 -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| -|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| -|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| -|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| -|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| -|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| -|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| -|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| -|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 -``` - -Output Series: - -``` -+-----------------------------+-----------------------------------------------------+ -| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| -+-----------------------------+-----------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1.00| -+-----------------------------+-----------------------------------------------------+ -``` - diff --git a/src/UserGuide/V1.3.0-2/Reference/UDF-Libraries_timecho.md b/src/UserGuide/V1.3.0-2/Reference/UDF-Libraries_timecho.md deleted file mode 100644 index 02745cf67..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/UDF-Libraries_timecho.md +++ /dev/null @@ -1,5303 +0,0 @@ - - -# UDF Libraries - -Based on the ability of user-defined functions, IoTDB provides a series of functions for temporal data processing, including data quality, data profiling, anomaly detection, frequency domain analysis, data matching, data repairing, sequence discovery, machine learning, etc., which can meet the needs of industrial fields for temporal data processing. - -> Note: The functions in the current UDF library only support millisecond level timestamp accuracy. - -## Installation steps - -1. Please obtain the compressed file of the UDF library JAR package that is compatible with the IoTDB version. - - | UDF installation package | Supported IoTDB versions | Download link | - | --------------- | ----------------- | ------------------------------------------------------------ | - | TimechoDB-UDF-1.3.3.zip | V1.3.3 and above | Please contact Timecho for assistance | - | TimechoDB-UDF-1.3.2.zip | V1.0.0~V1.3.2 | Please contact Timecho for assistance| - -2. Place the `library-udf.jar` file in the compressed file obtained in the directory `/ext/udf ` of all nodes in the IoTDB cluster -3. In the SQL command line terminal (CLI) or visualization console (Workbench) SQL operation interface of IoTDB, execute the corresponding function registration statement as follows. -4. 
Batch registration: the UDFs can be registered either with the registration script or with the full set of SQL statements.
-- Register Script
-  - Copy the registration script (`register-UDF.sh` or `register-UDF.bat`) from the compressed package to the `tools` directory of IoTDB as needed, and modify the parameters in the script (the defaults are host=127.0.0.1, rpcPort=6667, user=root, pass=root);
-  - Start the IoTDB service and run the registration script to register the UDFs in batch.
-
-- All SQL statements
-  - Open the SQL file in the compressed package, copy all SQL statements, and execute them in the SQL command line terminal (CLI) of IoTDB or the SQL operation interface of the visualization console (Workbench) to register the UDFs in batch.
-
-
-## Data Quality
-
-### Completeness
-
-#### Registration statement
-
-```sql
-create function completeness as 'org.apache.iotdb.library.dquality.UDTFCompleteness'
-```
-
-#### Usage
-
-This function is used to calculate the completeness of time series. The input series are divided into several continuous and non-overlapping windows. The timestamp of the first data point and the completeness of each window will be output.
-
-**Name:** COMPLETENESS
-
-**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `window`: The size of each window. It is a positive integer or a positive number with a unit. The former is the number of data points in each window, and the number of data points in the last window may be less than it. The latter is the length of the window in time. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window.
-+ `downtime`: Whether the downtime exception is considered in the calculation of completeness. It is 'true' or 'false' (default). When the downtime exception is considered, long-term missing data will be treated as downtime and will not affect the completeness.
-
-**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1].
-
-**Note:** The calculation is performed only when the number of data points in the window exceeds 10. Otherwise, the window will be ignored and nothing will be output.
-
-#### Examples
-
-##### Default Parameters
-
-With default parameters, this function will regard all input data as the same window.
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+-----------------------------+ -| Time|completeness(root.test.d1.s1)| -+-----------------------------+-----------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.875| -+-----------------------------+-----------------------------+ -``` - -##### Specific Window Size - -When the window size is given, this function will divide the input data as multiple windows. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -|2020-01-01T00:00:32.000+08:00| 130.0| -|2020-01-01T00:00:34.000+08:00| 132.0| -|2020-01-01T00:00:36.000+08:00| 134.0| -|2020-01-01T00:00:38.000+08:00| 136.0| -|2020-01-01T00:00:40.000+08:00| 138.0| -|2020-01-01T00:00:42.000+08:00| 140.0| -|2020-01-01T00:00:44.000+08:00| 142.0| -|2020-01-01T00:00:46.000+08:00| 144.0| -|2020-01-01T00:00:48.000+08:00| 146.0| -|2020-01-01T00:00:50.000+08:00| 148.0| -|2020-01-01T00:00:52.000+08:00| 150.0| -|2020-01-01T00:00:54.000+08:00| 152.0| -|2020-01-01T00:00:56.000+08:00| 154.0| -|2020-01-01T00:00:58.000+08:00| 156.0| -|2020-01-01T00:01:00.000+08:00| 158.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------+ -| Time|completeness(root.test.d1.s1, "window"="15")| -+-----------------------------+--------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.875| -|2020-01-01T00:00:32.000+08:00| 1.0| -+-----------------------------+--------------------------------------------+ -``` - -### Consistency - -#### Registration statement - -```sql -create function consistency as 'org.apache.iotdb.library.dquality.UDTFConsistency' -``` - -#### Usage - -This function is used to calculate the consistency of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the consistency of each window will be output. - -**Name:** CONSISTENCY - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. - -**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. - -**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. - -#### Examples - -##### Default Parameters - -With default parameters, this function will regard all input data as the same window. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+----------------------------+ -| Time|consistency(root.test.d1.s1)| -+-----------------------------+----------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| -+-----------------------------+----------------------------+ -``` - -##### Specific Window Size - -When the window size is given, this function will divide the input data as multiple windows. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -|2020-01-01T00:00:32.000+08:00| 130.0| -|2020-01-01T00:00:34.000+08:00| 132.0| -|2020-01-01T00:00:36.000+08:00| 134.0| -|2020-01-01T00:00:38.000+08:00| 136.0| -|2020-01-01T00:00:40.000+08:00| 138.0| -|2020-01-01T00:00:42.000+08:00| 140.0| -|2020-01-01T00:00:44.000+08:00| 142.0| -|2020-01-01T00:00:46.000+08:00| 144.0| -|2020-01-01T00:00:48.000+08:00| 146.0| -|2020-01-01T00:00:50.000+08:00| 148.0| -|2020-01-01T00:00:52.000+08:00| 150.0| -|2020-01-01T00:00:54.000+08:00| 152.0| -|2020-01-01T00:00:56.000+08:00| 154.0| -|2020-01-01T00:00:58.000+08:00| 156.0| -|2020-01-01T00:01:00.000+08:00| 158.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------+ -| Time|consistency(root.test.d1.s1, "window"="15")| -+-----------------------------+-------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| -|2020-01-01T00:00:32.000+08:00| 1.0| -+-----------------------------+-------------------------------------------+ -``` - -### Timeliness - -#### Registration statement - -```sql -create function timeliness as 'org.apache.iotdb.library.dquality.UDTFTimeliness' -``` - -#### Usage - -This function is used to calculate the timeliness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the timeliness of each window will be output. - -**Name:** TIMELINESS - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. - -**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. - -**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. - -#### Examples - -##### Default Parameters - -With default parameters, this function will regard all input data as the same window. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+---------------------------+ -| Time|timeliness(root.test.d1.s1)| -+-----------------------------+---------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| -+-----------------------------+---------------------------+ -``` - -##### Specific Window Size - -When the window size is given, this function will divide the input data as multiple windows. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -|2020-01-01T00:00:32.000+08:00| 130.0| -|2020-01-01T00:00:34.000+08:00| 132.0| -|2020-01-01T00:00:36.000+08:00| 134.0| -|2020-01-01T00:00:38.000+08:00| 136.0| -|2020-01-01T00:00:40.000+08:00| 138.0| -|2020-01-01T00:00:42.000+08:00| 140.0| -|2020-01-01T00:00:44.000+08:00| 142.0| -|2020-01-01T00:00:46.000+08:00| 144.0| -|2020-01-01T00:00:48.000+08:00| 146.0| -|2020-01-01T00:00:50.000+08:00| 148.0| -|2020-01-01T00:00:52.000+08:00| 150.0| -|2020-01-01T00:00:54.000+08:00| 152.0| -|2020-01-01T00:00:56.000+08:00| 154.0| -|2020-01-01T00:00:58.000+08:00| 156.0| -|2020-01-01T00:01:00.000+08:00| 158.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------+ -| Time|timeliness(root.test.d1.s1, "window"="15")| -+-----------------------------+------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| -|2020-01-01T00:00:32.000+08:00| 1.0| -+-----------------------------+------------------------------------------+ -``` - -### Validity - -#### Registration statement - -```sql -create function validity as 'org.apache.iotdb.library.dquality.UDTFValidity' -``` - -#### Usage - -This function is used to calculate the Validity of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the Validity of each window will be output. - -**Name:** VALIDITY - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. - -**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. - -**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. - -#### Examples - -##### Default Parameters - -With default parameters, this function will regard all input data as the same window. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+-------------------------+ -| Time|validity(root.test.d1.s1)| -+-----------------------------+-------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| -+-----------------------------+-------------------------+ -``` - -##### Specific Window Size - -When the window size is given, this function will divide the input data as multiple windows. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -|2020-01-01T00:00:32.000+08:00| 130.0| -|2020-01-01T00:00:34.000+08:00| 132.0| -|2020-01-01T00:00:36.000+08:00| 134.0| -|2020-01-01T00:00:38.000+08:00| 136.0| -|2020-01-01T00:00:40.000+08:00| 138.0| -|2020-01-01T00:00:42.000+08:00| 140.0| -|2020-01-01T00:00:44.000+08:00| 142.0| -|2020-01-01T00:00:46.000+08:00| 144.0| -|2020-01-01T00:00:48.000+08:00| 146.0| -|2020-01-01T00:00:50.000+08:00| 148.0| -|2020-01-01T00:00:52.000+08:00| 150.0| -|2020-01-01T00:00:54.000+08:00| 152.0| -|2020-01-01T00:00:56.000+08:00| 154.0| -|2020-01-01T00:00:58.000+08:00| 156.0| -|2020-01-01T00:01:00.000+08:00| 158.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 -``` - -Output series: - -``` -+-----------------------------+----------------------------------------+ -| Time|validity(root.test.d1.s1, "window"="15")| -+-----------------------------+----------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| -|2020-01-01T00:00:32.000+08:00| 1.0| -+-----------------------------+----------------------------------------+ -``` - - - - - -## Data Profiling - -### ACF - -#### Registration statement - -```sql -create function acf as 'org.apache.iotdb.library.dprofile.UDTFACF' -``` - -#### Usage - -This function is used to calculate the auto-correlation factor of the input time series, -which equals to cross correlation between the same series. -For more information, please refer to [XCorr](#XCorr) function. - -**Name:** ACF - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. -There are $2N-1$ data points in the series, and the values are interpreted in details in [XCorr](#XCorr) function. - -**Note:** - -+ `null` and `NaN` values in the input series will be ignored and treated as 0. 
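-
-Since ACF is the cross correlation of a series with itself, substituting $S_1 = S_2 = S$ into the XCorr formula gives
-
-$$ACF(n) = \frac{1}{N} \sum_{m=1}^{N} S[m]S[m+n]$$
-
-so the output is symmetric around its center point, as the example below illustrates.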
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| 1| -|2020-01-01T00:00:02.000+08:00| null| -|2020-01-01T00:00:03.000+08:00| 3| -|2020-01-01T00:00:04.000+08:00| NaN| -|2020-01-01T00:00:05.000+08:00| 5| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 -``` - -Output series: - -``` -+-----------------------------+--------------------+ -| Time|acf(root.test.d1.s1)| -+-----------------------------+--------------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| -|1970-01-01T08:00:00.002+08:00| 0.0| -|1970-01-01T08:00:00.003+08:00| 3.6| -|1970-01-01T08:00:00.004+08:00| 0.0| -|1970-01-01T08:00:00.005+08:00| 7.0| -|1970-01-01T08:00:00.006+08:00| 0.0| -|1970-01-01T08:00:00.007+08:00| 3.6| -|1970-01-01T08:00:00.008+08:00| 0.0| -|1970-01-01T08:00:00.009+08:00| 1.0| -+-----------------------------+--------------------+ -``` - -### Distinct - -#### Registration statement - -```sql -create function distinct as 'org.apache.iotdb.library.dprofile.UDTFDistinct' -``` - -#### Usage - -This function returns all unique values in time series. - -**Name:** DISTINCT - -**Input Series:** Only support a single input series. The type is arbitrary. - -**Output Series:** Output a single series. The type is the same as the input. - -**Note:** - -+ The timestamp of the output series is meaningless. The output order is arbitrary. -+ Missing points and null points in the input series will be ignored, but `NaN` will not. -+ Case Sensitive. - - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s2| -+-----------------------------+---------------+ -|2020-01-01T08:00:00.001+08:00| Hello| -|2020-01-01T08:00:00.002+08:00| hello| -|2020-01-01T08:00:00.003+08:00| Hello| -|2020-01-01T08:00:00.004+08:00| World| -|2020-01-01T08:00:00.005+08:00| World| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select distinct(s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------+ -| Time|distinct(root.test.d2.s2)| -+-----------------------------+-------------------------+ -|1970-01-01T08:00:00.001+08:00| Hello| -|1970-01-01T08:00:00.002+08:00| hello| -|1970-01-01T08:00:00.003+08:00| World| -+-----------------------------+-------------------------+ -``` - -### Histogram - -#### Registration statement - -```sql -create function histogram as 'org.apache.iotdb.library.dprofile.UDTFHistogram' -``` - -#### Usage - -This function is used to calculate the distribution histogram of a single column of numerical data. - -**Name:** HISTOGRAM - -**Input Series:** Only supports a single input sequence, the type is INT32 / INT64 / FLOAT / DOUBLE - -**Parameters:** - -+ `min`: The lower limit of the requested data range, the default value is -Double.MAX_VALUE. -+ `max`: The upper limit of the requested data range, the default value is Double.MAX_VALUE, and the value of start must be less than or equal to end. -+ `count`: The number of buckets of the histogram, the default value is 1. It must be a positive integer. - -**Output Series:** The value of the bucket of the histogram, where the lower bound represented by the i-th bucket (index starts from 1) is $min+ (i-1)\cdot\frac{max-min}{count}$ and the upper bound is $min + i \cdot \frac{max-min}{count}$. 
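-
-For instance, with the parameters used in the example below (`min`=1, `max`=20, `count`=10), each bucket has width
-
-$$\frac{max-min}{count} = \frac{20-1}{10} = 1.9$$
-
-so the first bucket covers $[1, 2.9)$ and holds the values 1.0 and 2.0; this is why every bucket in that example contains exactly 2 points.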
- -**Note:** - -+ If the value is lower than `min`, it will be put into the 1st bucket. If the value is larger than `max`, it will be put into the last bucket. -+ Missing points, null points and `NaN` in the input series will be ignored. - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:00.000+08:00| 1.0| -|2020-01-01T00:00:01.000+08:00| 2.0| -|2020-01-01T00:00:02.000+08:00| 3.0| -|2020-01-01T00:00:03.000+08:00| 4.0| -|2020-01-01T00:00:04.000+08:00| 5.0| -|2020-01-01T00:00:05.000+08:00| 6.0| -|2020-01-01T00:00:06.000+08:00| 7.0| -|2020-01-01T00:00:07.000+08:00| 8.0| -|2020-01-01T00:00:08.000+08:00| 9.0| -|2020-01-01T00:00:09.000+08:00| 10.0| -|2020-01-01T00:00:10.000+08:00| 11.0| -|2020-01-01T00:00:11.000+08:00| 12.0| -|2020-01-01T00:00:12.000+08:00| 13.0| -|2020-01-01T00:00:13.000+08:00| 14.0| -|2020-01-01T00:00:14.000+08:00| 15.0| -|2020-01-01T00:00:15.000+08:00| 16.0| -|2020-01-01T00:00:16.000+08:00| 17.0| -|2020-01-01T00:00:17.000+08:00| 18.0| -|2020-01-01T00:00:18.000+08:00| 19.0| -|2020-01-01T00:00:19.000+08:00| 20.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+---------------------------------------------------------------+ -| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| -+-----------------------------+---------------------------------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 2| -|1970-01-01T08:00:00.001+08:00| 2| -|1970-01-01T08:00:00.002+08:00| 2| -|1970-01-01T08:00:00.003+08:00| 2| -|1970-01-01T08:00:00.004+08:00| 2| -|1970-01-01T08:00:00.005+08:00| 2| -|1970-01-01T08:00:00.006+08:00| 2| -|1970-01-01T08:00:00.007+08:00| 2| -|1970-01-01T08:00:00.008+08:00| 2| -|1970-01-01T08:00:00.009+08:00| 2| -+-----------------------------+---------------------------------------------------------------+ -``` - -### Integral - -#### Registration statement - -```sql -create function integral as 'org.apache.iotdb.library.dprofile.UDAFIntegral' -``` - -#### Usage - -This function is used to calculate the integration of time series, -which equals to the area under the curve with time as X-axis and values as Y-axis. - -**Name:** INTEGRAL - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `unit`: The unit of time used when computing the integral. - The value should be chosen from "1S", "1s", "1m", "1H", "1d"(case-sensitive), - and each represents taking one millisecond / second / minute / hour / day as 1.0 while calculating the area and integral. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the integration. - -**Note:** - -+ The integral value equals to the sum of the areas of right-angled trapezoids consisting of each two adjacent points and the time-axis. - Choosing different `unit` implies different scaling of time axis, thus making it apparent to convert the value among those results with constant coefficient. - -+ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. - -#### Examples - -##### Default Parameters - -With default parameters, this function will take one second as 1.0. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| 1| -|2020-01-01T00:00:02.000+08:00| 2| -|2020-01-01T00:00:03.000+08:00| 5| -|2020-01-01T00:00:04.000+08:00| 6| -|2020-01-01T00:00:05.000+08:00| 7| -|2020-01-01T00:00:08.000+08:00| 8| -|2020-01-01T00:00:09.000+08:00| NaN| -|2020-01-01T00:00:10.000+08:00| 10| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 -``` - -Output series: - -``` -+-----------------------------+-------------------------+ -| Time|integral(root.test.d1.s1)| -+-----------------------------+-------------------------+ -|1970-01-01T08:00:00.000+08:00| 57.5| -+-----------------------------+-------------------------+ -``` - -Calculation expression: -$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ - -##### Specific time unit - -With time unit specified as "1m", this function will take one minute as 1.0. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 -``` - -Output series: - -``` -+-----------------------------+-------------------------+ -| Time|integral(root.test.d1.s1)| -+-----------------------------+-------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.958| -+-----------------------------+-------------------------+ -``` - -Calculation expression: -$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ - -### IntegralAvg - -#### Registration statement - -```sql -create function integralavg as 'org.apache.iotdb.library.dprofile.UDAFIntegralAvg' -``` - -#### Usage - -This function is used to calculate the function average of time series. -The output equals to the area divided by the time interval using the same time `unit`. -For more information of the area under the curve, please refer to `Integral` function. - -**Name:** INTEGRALAVG - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the time-weighted average. - -**Note:** - -+ The time-weighted value equals to the integral value with any `unit` divided by the time interval of input series. - The result is irrelevant to the time unit used in integral, and it's consistent with the timestamp precision of IoTDB by default. - -+ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. - -+ If the input series is empty, the output value will be 0.0, but if there is only one data point, the value will equal to the input value. 
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| 1| -|2020-01-01T00:00:02.000+08:00| 2| -|2020-01-01T00:00:03.000+08:00| 5| -|2020-01-01T00:00:04.000+08:00| 6| -|2020-01-01T00:00:05.000+08:00| 7| -|2020-01-01T00:00:08.000+08:00| 8| -|2020-01-01T00:00:09.000+08:00| NaN| -|2020-01-01T00:00:10.000+08:00| 10| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 -``` - -Output series: - -``` -+-----------------------------+----------------------------+ -| Time|integralavg(root.test.d1.s1)| -+-----------------------------+----------------------------+ -|1970-01-01T08:00:00.000+08:00| 5.75| -+-----------------------------+----------------------------+ -``` - -Calculation expression: -$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ - -### Mad - -#### Registration statement - -```sql -create function mad as 'org.apache.iotdb.library.dprofile.UDAFMad' -``` - -#### Usage - -The function is used to compute the exact or approximate median absolute deviation (MAD) of a numeric time series. MAD is the median of the deviation of each element from the elements' median. - -Take a dataset $\{1,3,3,5,5,6,7,8,9\}$ as an instance. Its median is 5 and the deviation of each element from the median is $\{0,0,1,2,2,2,3,4,4\}$, whose median is 2. Therefore, the MAD of the original dataset is 2. - -**Name:** MAD - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameter:** - -+ `error`: The relative error of the approximate MAD. It should be within [0,1) and the default value is 0. Taking `error`=0.01 as an instance, suppose the exact MAD is $a$ and the approximate MAD is $b$, we have $0.99a \le b \le 1.01a$. With `error`=0, the output is the exact MAD. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the MAD. - -**Note:** Missing points, null points and `NaN` in the input series will be ignored. - -#### Examples - -##### Exact Query - -With the default `error`(`error`=0), the function queries the exact MAD. - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+------------+ -............ 
-Total line number = 20 -``` - -SQL for query: - -```sql -select mad(s1) from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------+ -| Time|median(root.test.s1, "error"="0")| -+-----------------------------+---------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -+-----------------------------+---------------------------------+ -``` - -##### Approximate Query - -By setting `error` within (0,1), the function queries the approximate MAD. - -SQL for query: - -```sql -select mad(s1, "error"="0.01") from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------+ -| Time|mad(root.test.s1, "error"="0.01")| -+-----------------------------+---------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.9900000000000001| -+-----------------------------+---------------------------------+ -``` - -### Median - -#### Registration statement - -```sql -create function median as 'org.apache.iotdb.library.dprofile.UDAFMedian' -``` - -#### Usage - -The function is used to compute the exact or approximate median of a numeric time series. Median is the value separating the higher half from the lower half of a data sample. - -**Name:** MEDIAN - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameter:** - -+ `error`: The rank error of the approximate median. It should be within [0,1) and the default value is 0. For instance, a median with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact median. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the median. - -#### Examples - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+------------+ -Total line number = 20 -``` - -SQL for query: - -```sql -select median(s1, "error"="0.01") from root.test -``` - -Output series: - -``` -+-----------------------------+------------------------------------+ -| Time|median(root.test.s1, "error"="0.01")| -+-----------------------------+------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -+-----------------------------+------------------------------------+ -``` - -### MinMax - -#### Registration statement - -```sql -create function minmax as 'org.apache.iotdb.library.dprofile.UDTFMinMax' -``` - -#### Usage - -This function is used to standardize the input series with min-max. Minimum value is transformed to 0; maximum value is transformed to 1. 
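-
-Concretely, the min-max transform (consistent with the example below) maps each point by
-
-$$y_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$
-
-where $x_{\min}$ and $x_{\max}$ are taken from the whole series in batch mode or from the `min`/`max` parameters in stream mode. In the batch example below, $x_{\min}=-2$ and $x_{\max}=10$, so the value 1.0 maps to $(1-(-2))/12=0.25$.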
-
-**Name:** MINMAX
-
-**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `compute`: When set to "batch", the minimum and maximum are computed over all imported data points; when set to "stream", the minimum and maximum values must be provided. The default method is "batch".
-+ `min`: The minimum value when method is set to "stream".
-+ `max`: The maximum value when method is set to "stream".
-
-**Output Series:** Output a single series. The type is DOUBLE.
-
-#### Examples
-
-##### Batch computing
-
-Input series:
-
-```
-+-----------------------------+------------+
-|                         Time|root.test.s1|
-+-----------------------------+------------+
-|1970-01-01T08:00:00.100+08:00|         0.0|
-|1970-01-01T08:00:00.200+08:00|         0.0|
-|1970-01-01T08:00:00.300+08:00|         1.0|
-|1970-01-01T08:00:00.400+08:00|        -1.0|
-|1970-01-01T08:00:00.500+08:00|         0.0|
-|1970-01-01T08:00:00.600+08:00|         0.0|
-|1970-01-01T08:00:00.700+08:00|        -2.0|
-|1970-01-01T08:00:00.800+08:00|         2.0|
-|1970-01-01T08:00:00.900+08:00|         0.0|
-|1970-01-01T08:00:01.000+08:00|         0.0|
-|1970-01-01T08:00:01.100+08:00|         1.0|
-|1970-01-01T08:00:01.200+08:00|        -1.0|
-|1970-01-01T08:00:01.300+08:00|        -1.0|
-|1970-01-01T08:00:01.400+08:00|         1.0|
-|1970-01-01T08:00:01.500+08:00|         0.0|
-|1970-01-01T08:00:01.600+08:00|         0.0|
-|1970-01-01T08:00:01.700+08:00|        10.0|
-|1970-01-01T08:00:01.800+08:00|         2.0|
-|1970-01-01T08:00:01.900+08:00|        -2.0|
-|1970-01-01T08:00:02.000+08:00|         0.0|
-+-----------------------------+------------+
-```
-
-SQL for query:
-
-```sql
-select minmax(s1) from root.test
-```
-
-Output series:
-
-```
-+-----------------------------+--------------------+
-|                         Time|minmax(root.test.s1)|
-+-----------------------------+--------------------+
-|1970-01-01T08:00:00.100+08:00| 0.16666666666666666|
-|1970-01-01T08:00:00.200+08:00| 0.16666666666666666|
-|1970-01-01T08:00:00.300+08:00|                0.25|
-|1970-01-01T08:00:00.400+08:00| 0.08333333333333333|
-|1970-01-01T08:00:00.500+08:00| 0.16666666666666666|
-|1970-01-01T08:00:00.600+08:00| 0.16666666666666666|
-|1970-01-01T08:00:00.700+08:00|                 0.0|
-|1970-01-01T08:00:00.800+08:00|  0.3333333333333333|
-|1970-01-01T08:00:00.900+08:00| 0.16666666666666666|
-|1970-01-01T08:00:01.000+08:00| 0.16666666666666666|
-|1970-01-01T08:00:01.100+08:00|                0.25|
-|1970-01-01T08:00:01.200+08:00| 0.08333333333333333|
-|1970-01-01T08:00:01.300+08:00| 0.08333333333333333|
-|1970-01-01T08:00:01.400+08:00|                0.25|
-|1970-01-01T08:00:01.500+08:00| 0.16666666666666666|
-|1970-01-01T08:00:01.600+08:00| 0.16666666666666666|
-|1970-01-01T08:00:01.700+08:00|                 1.0|
-|1970-01-01T08:00:01.800+08:00|  0.3333333333333333|
-|1970-01-01T08:00:01.900+08:00|                 0.0|
-|1970-01-01T08:00:02.000+08:00| 0.16666666666666666|
-+-----------------------------+--------------------+
-```
-
-
-
-### MvAvg
-
-#### Registration statement
-
-```sql
-create function mvavg as 'org.apache.iotdb.library.dprofile.UDTFMvAvg'
-```
-
-#### Usage
-
-This function is used to calculate the moving average of the input series.
-
-**Name:** MVAVG
-
-**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `window`: Length of the moving window. Default value is 10.
-
-**Output Series:** Output a single series. The type is DOUBLE.
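-
-A sketch of the computation, assuming a trailing window (which is consistent with the example that follows): with window length $w$, the output at the $i$-th timestamp ($i \geq w$) is
-
-$$y_i = \frac{1}{w}\sum_{j=i-w+1}^{i} x_j$$
-
-so with `window`=3 the first output point below is $(0.0+0.0+1.0)/3=0.3333\ldots$, stamped at the third input timestamp.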
- -#### Examples - -##### Batch computing - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select mvavg(s1, "window"="3") from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------+ -| Time|mvavg(root.test.s1, "window"="3")| -+-----------------------------+---------------------------------+ -|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| -|1970-01-01T08:00:00.400+08:00| 0.0| -|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| -|1970-01-01T08:00:00.800+08:00| 0.0| -|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| -|1970-01-01T08:00:01.200+08:00| 0.0| -|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| -|1970-01-01T08:00:01.400+08:00| 0.0| -|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| -|1970-01-01T08:00:01.800+08:00| 4.0| -|1970-01-01T08:00:01.900+08:00| 0.0| -|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| -+-----------------------------+---------------------------------+ -``` - -### PACF - -#### Registration statement - -```sql -create function pacf as 'org.apache.iotdb.library.dprofile.UDTFPACF' -``` - -#### Usage - -This function is used to calculate partial autocorrelation of input series by solving Yule-Walker equation. For some cases, the equation may not be solved, and NaN will be output. - -**Name:** PACF - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `lag`: Maximum lag of pacf to calculate. The default value is $\min(10\log_{10}n,n-1)$, where $n$ is the number of data points. - -**Output Series:** Output a single series. The type is DOUBLE. 
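-
-For reference, the standard definition behind the Yule-Walker approach: the partial autocorrelation at lag $k$ is the last coefficient $\phi_{kk}$ of the system
-
-$$\rho_j = \sum_{i=1}^{k}\phi_{ki}\,\rho_{j-i}, \qquad j=1,\dots,k$$
-
-where $\rho_j$ is the autocorrelation at lag $j$; when the system cannot be solved, NaN is output as noted above.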
-
-#### Examples
-
-##### Assigning maximum lag
-
-Input series:
-
-```
-+-----------------------------+---------------+
-|                         Time|root.test.d1.s1|
-+-----------------------------+---------------+
-|2020-01-01T00:00:01.000+08:00|              1|
-|2020-01-01T00:00:02.000+08:00|            NaN|
-|2020-01-01T00:00:03.000+08:00|              3|
-|2020-01-01T00:00:04.000+08:00|            NaN|
-|2020-01-01T00:00:05.000+08:00|              5|
-+-----------------------------+---------------+
-```
-
-SQL for query:
-
-```sql
-select pacf(s1, "lag"="5") from root.test.d1
-```
-
-Output series:
-
-```
-+-----------------------------+--------------------------------+
-|                         Time|pacf(root.test.d1.s1, "lag"="5")|
-+-----------------------------+--------------------------------+
-|2020-01-01T00:00:01.000+08:00|                             1.0|
-|2020-01-01T00:00:02.000+08:00|             -0.5744680851063829|
-|2020-01-01T00:00:03.000+08:00|              0.3172297297297296|
-|2020-01-01T00:00:04.000+08:00|             -0.2977686586304181|
-|2020-01-01T00:00:05.000+08:00|             -2.0609033521065867|
-+-----------------------------+--------------------------------+
-```
-
-### Percentile
-
-#### Registration statement
-
-```sql
-create function percentile as 'org.apache.iotdb.library.dprofile.UDAFPercentile'
-```
-
-#### Usage
-
-The function is used to compute the exact or approximate percentile of a numeric time series. A percentile is the value of the element at a given rank in the sorted series.
-
-**Name:** PERCENTILE
-
-**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameter:**
-
-+ `rank`: The rank percentage of the percentile. It should be within (0,1] and the default value is 0.5. For instance, a percentile with `rank`=0.5 is the median.
-+ `error`: The rank error of the approximate percentile. It should be within [0,1) and the default value is 0. For instance, a 0.5-percentile with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact percentile.
-
-**Output Series:** Output a single series. The type is the same as the input series. There is only one data point in the series, whose value is the percentile. If `error`=0, its timestamp is that of the first data point whose value equals the percentile; otherwise, the timestamp of the only data point is 0.
-
-**Note:** Missing points, null points and `NaN` in the input series will be ignored.
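-
-With the default `error`=0 the exact percentile can be queried directly; for instance, the statement below (a sketch against the input series shown in the example that follows) asks for the exact median, i.e. `rank`=0.5:
-
-```sql
-select percentile(s1, "rank"="0.5") from root.test2
-```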
- -#### Examples - -Input series: - -``` -+-----------------------------+-------------+ -| Time|root.test2.s1| -+-----------------------------+-------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+-------------+ -Total line number = 20 -``` - -SQL for query: - -```sql -select percentile(s0, "rank"="0.2", "error"="0.01") from root.test -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------------------+ -| Time|percentile(root.test2.s1, "rank"="0.2", "error"="0.01")| -+-----------------------------+-------------------------------------------------------+ -|1970-01-01T08:00:00.000+08:00| -1.0| -+-----------------------------+-------------------------------------------------------+ -``` - -### Quantile - -#### Registration statement - -```sql -create function quantile as 'org.apache.iotdb.library.dprofile.UDAFQuantile' -``` - -#### Usage - -The function is used to compute the approximate quantile of a numeric time series. A quantile is value of element in the certain rank of the sorted series. - -**Name:** QUANTILE - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameter:** - -+ `rank`: The rank of the quantile. It should be (0,1] and the default value is 0.5. For instance, a quantile with `rank`=0.5 is the median. -+ `K`: The size of KLL sketch maintained in the query. It should be within [100,+inf) and the default value is 800. For instance, the 0.5-quantile computed by a KLL sketch with K=800 items is a value with rank quantile 0.49~0.51 with a confidence of at least 99%. The result will be more accurate as K increases. - -**Output Series:** Output a single series. The type is the same as input series. The timestamp of the only data point is 0. - -**Note:** Missing points, null points and `NaN` in the input series will be ignored. - -#### Examples - -Input series: - -``` -+-----------------------------+-------------+ -| Time|root.test1.s1| -+-----------------------------+-------------+ -|2021-03-17T10:32:17.054+08:00| 7| -|2021-03-17T10:32:18.054+08:00| 15| -|2021-03-17T10:32:19.054+08:00| 36| -|2021-03-17T10:32:20.054+08:00| 39| -|2021-03-17T10:32:21.054+08:00| 40| -|2021-03-17T10:32:22.054+08:00| 41| -|2021-03-17T10:32:23.054+08:00| 20| -|2021-03-17T10:32:24.054+08:00| 18| -+-----------------------------+-------------+ -............ 
-Total line number = 8 -``` - -SQL for query: - -```sql -select quantile(s1, "rank"="0.2", "K"="800") from root.test1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------+ -| Time|quantile(root.test1.s1, "rank"="0.2", "K"="800")| -+-----------------------------+------------------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 7.000000000000001| -+-----------------------------+------------------------------------------------+ -``` - -### Period - -#### Registration statement - -```sql -create function period as 'org.apache.iotdb.library.dprofile.UDAFPeriod' -``` - -#### Usage - -The function is used to compute the period of a numeric time series. - -**Name:** PERIOD - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is INT32. There is only one data point in the series, whose timestamp is 0 and value is the period. - -#### Examples - -Input series: - - -``` -+-----------------------------+---------------+ -| Time|root.test.d3.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| -|1970-01-01T08:00:00.002+08:00| 2.0| -|1970-01-01T08:00:00.003+08:00| 3.0| -|1970-01-01T08:00:00.004+08:00| 1.0| -|1970-01-01T08:00:00.005+08:00| 2.0| -|1970-01-01T08:00:00.006+08:00| 3.0| -|1970-01-01T08:00:00.007+08:00| 1.0| -|1970-01-01T08:00:00.008+08:00| 2.0| -|1970-01-01T08:00:00.009+08:00| 3.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select period(s1) from root.test.d3 -``` - -Output series: - -``` -+-----------------------------+-----------------------+ -| Time|period(root.test.d3.s1)| -+-----------------------------+-----------------------+ -|1970-01-01T08:00:00.000+08:00| 3| -+-----------------------------+-----------------------+ -``` - -### QLB - -#### Registration statement - -```sql -create function qlb as 'org.apache.iotdb.library.dprofile.UDTFQLB' -``` - -#### Usage - -This function is used to calculate Ljung-Box statistics $Q_{LB}$ for time series, and convert it to p value. - -**Name:** QLB - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters**: - -`lag`: max lag to calculate. Legal input shall be integer from 1 to n-2, where n is the sample number. Default value is n-2. - -**Output Series:** Output a single series. The type is DOUBLE. The output series is p value, and timestamp means lag. - -**Note:** If you want to calculate Ljung-Box statistics $Q_{LB}$ instead of p value, you may use ACF function. 
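-
-For reference, the standard Ljung-Box statistic for sample size $n$ and lag $h$ is
-
-$$Q_{LB}(h) = n(n+2)\sum_{k=1}^{h}\frac{\hat{\rho}_k^2}{n-k}$$
-
-where $\hat{\rho}_k$ is the sample autocorrelation at lag $k$; the value reported by this function is the corresponding p value (under a $\chi^2$ distribution with $h$ degrees of freedom), with the lag carried in the output timestamp.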
- -#### Examples - -##### Using Default Parameter - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T00:00:00.100+08:00| 1.22| -|1970-01-01T00:00:00.200+08:00| -2.78| -|1970-01-01T00:00:00.300+08:00| 1.53| -|1970-01-01T00:00:00.400+08:00| 0.70| -|1970-01-01T00:00:00.500+08:00| 0.75| -|1970-01-01T00:00:00.600+08:00| -0.72| -|1970-01-01T00:00:00.700+08:00| -0.22| -|1970-01-01T00:00:00.800+08:00| 0.28| -|1970-01-01T00:00:00.900+08:00| 0.57| -|1970-01-01T00:00:01.000+08:00| -0.22| -|1970-01-01T00:00:01.100+08:00| -0.72| -|1970-01-01T00:00:01.200+08:00| 1.34| -|1970-01-01T00:00:01.300+08:00| -0.25| -|1970-01-01T00:00:01.400+08:00| 0.17| -|1970-01-01T00:00:01.500+08:00| 2.51| -|1970-01-01T00:00:01.600+08:00| 1.42| -|1970-01-01T00:00:01.700+08:00| -1.34| -|1970-01-01T00:00:01.800+08:00| -0.01| -|1970-01-01T00:00:01.900+08:00| -0.49| -|1970-01-01T00:00:02.000+08:00| 1.63| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select QLB(s1) from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+--------------------+ -| Time|QLB(root.test.d1.s1)| -+-----------------------------+--------------------+ -|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| -|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| -|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| -|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| -|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| -|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| -|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| -|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| -|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| -|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| -|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| -|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| -|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| -|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| -|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| -|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| -|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| -|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| -+-----------------------------+--------------------+ -``` - -### Resample - -#### Registration statement - -```sql -create function re_sample as 'org.apache.iotdb.library.dprofile.UDTFResample' -``` - -#### Usage - -This function is used to resample the input series according to a given frequency, -including up-sampling and down-sampling. -Currently, the supported up-sampling methods are -NaN (filling with `NaN`), -FFill (filling with previous value), -BFill (filling with next value) and -Linear (filling with linear interpolation). -Down-sampling relies on group aggregation, -which supports Max, Min, First, Last, Mean and Median. - -**Name:** RESAMPLE - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - - -+ `every`: The frequency of resampling, which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. -+ `interp`: The interpolation method of up-sampling, which is 'NaN', 'FFill', 'BFill' or 'Linear'. By default, NaN is used. -+ `aggr`: The aggregation method of down-sampling, which is 'Max', 'Min', 'First', 'Last', 'Mean' or 'Median'. By default, Mean is used. 
-+ `start`: The start time (inclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the first valid data point. -+ `end`: The end time (exclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the last valid data point. - -**Output Series:** Output a single series. The type is DOUBLE. It is strictly equispaced with the frequency `every`. - -**Note:** `NaN` in the input series will be ignored. - -#### Examples - -##### Up-sampling - -When the frequency of resampling is higher than the original frequency, up-sampling starts. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2021-03-06T16:00:00.000+08:00| 3.09| -|2021-03-06T16:15:00.000+08:00| 3.53| -|2021-03-06T16:30:00.000+08:00| 3.5| -|2021-03-06T16:45:00.000+08:00| 3.51| -|2021-03-06T17:00:00.000+08:00| 3.41| -+-----------------------------+---------------+ -``` - - -SQL for query: - -```sql -select resample(s1,'every'='5m','interp'='linear') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+----------------------------------------------------------+ -| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| -+-----------------------------+----------------------------------------------------------+ -|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| -|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| -|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| -|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| -|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| -|2021-03-06T16:25:00.000+08:00| 3.509999990463257| -|2021-03-06T16:30:00.000+08:00| 3.5| -|2021-03-06T16:35:00.000+08:00| 3.503333330154419| -|2021-03-06T16:40:00.000+08:00| 3.506666660308838| -|2021-03-06T16:45:00.000+08:00| 3.509999990463257| -|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| -|2021-03-06T16:55:00.000+08:00| 3.443333387374878| -|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| -+-----------------------------+----------------------------------------------------------+ -``` - -##### Down-sampling - -When the frequency of resampling is lower than the original frequency, down-sampling starts. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select resample(s1,'every'='30m','aggr'='first') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------------+ -| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| -+-----------------------------+--------------------------------------------------------+ -|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| -|2021-03-06T16:30:00.000+08:00| 3.5| -|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| -+-----------------------------+--------------------------------------------------------+ -``` - - - -##### Specify the time period - -The time period of resampling can be specified with `start` and `end`. -The period outside the actual time range will be interpolated. 
- -Input series is the same as above, the SQL for query is shown below: - -```sql -select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-----------------------------------------------------------------------+ -| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| -+-----------------------------+-----------------------------------------------------------------------+ -|2021-03-06T15:00:00.000+08:00| NaN| -|2021-03-06T15:30:00.000+08:00| NaN| -|2021-03-06T16:00:00.000+08:00| 3.309999942779541| -|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| -|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| -+-----------------------------+-----------------------------------------------------------------------+ -``` - -### Sample - -#### Registration statement - -```sql -create function sample as 'org.apache.iotdb.library.dprofile.UDTFSample' -``` - -#### Usage - -This function is used to sample the input series, -that is, select a specified number of data points from the input series and output them. -Currently, three sampling methods are supported: -**Reservoir sampling** randomly selects data points. -All of the points have the same probability of being sampled. -**Isometric sampling** selects data points at equal index intervals. -**Triangle sampling** assigns data points to the buckets based on the number of sampling. -Then it calculates the area of the triangle based on these points inside the bucket and selects the point with the largest area of the triangle. -For more detail, please read [paper](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf) - -**Name:** SAMPLE - -**Input Series:** Only support a single input series. The type is arbitrary. - -**Parameters:** - -+ `method`: The method of sampling, which is 'reservoir', 'isometric' or 'triangle'. By default, reservoir sampling is used. -+ `k`: The number of sampling, which is a positive integer. By default, it's 1. - -**Output Series:** Output a single series. The type is the same as the input. The length of the output series is `k`. Each data point in the output series comes from the input series. - -**Note:** If `k` is greater than the length of input series, all data points in the input series will be output. - -#### Examples - -##### Reservoir Sampling - -When `method` is 'reservoir' or the default, reservoir sampling is used. -Due to the randomness of this method, the output series shown below is only a possible result. 
- - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| 1.0| -|2020-01-01T00:00:02.000+08:00| 2.0| -|2020-01-01T00:00:03.000+08:00| 3.0| -|2020-01-01T00:00:04.000+08:00| 4.0| -|2020-01-01T00:00:05.000+08:00| 5.0| -|2020-01-01T00:00:06.000+08:00| 6.0| -|2020-01-01T00:00:07.000+08:00| 7.0| -|2020-01-01T00:00:08.000+08:00| 8.0| -|2020-01-01T00:00:09.000+08:00| 9.0| -|2020-01-01T00:00:10.000+08:00| 10.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select sample(s1,'method'='reservoir','k'='5') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------+ -| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| -+-----------------------------+------------------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 2.0| -|2020-01-01T00:00:03.000+08:00| 3.0| -|2020-01-01T00:00:05.000+08:00| 5.0| -|2020-01-01T00:00:08.000+08:00| 8.0| -|2020-01-01T00:00:10.000+08:00| 10.0| -+-----------------------------+------------------------------------------------------+ -``` - -##### Isometric Sampling - -When `method` is 'isometric', isometric sampling is used. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select sample(s1,'method'='isometric','k'='5') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------+ -| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| -+-----------------------------+------------------------------------------------------+ -|2020-01-01T00:00:01.000+08:00| 1.0| -|2020-01-01T00:00:03.000+08:00| 3.0| -|2020-01-01T00:00:05.000+08:00| 5.0| -|2020-01-01T00:00:07.000+08:00| 7.0| -|2020-01-01T00:00:09.000+08:00| 9.0| -+-----------------------------+------------------------------------------------------+ -``` - -### Segment - -#### Registration statement - -```sql -create function segment as 'org.apache.iotdb.library.dprofile.UDTFSegment' -``` - -#### Usage - -This function is used to segment a time series into subsequences according to linear trend, and returns linear fitted values of first values in each subsequence or every data point. - -**Name:** SEGMENT - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `output` :"all" to output all fitted points; "first" to output first fitted points in each subsequence. - -+ `error`: error allowed at linear regression. It is defined as mean absolute error of a subsequence. - -**Output Series:** Output a single series. The type is DOUBLE. - -**Note:** This function treat input series as equal-interval sampled. All data are loaded, so downsample input series first if there are too many data points. 
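-
-The example that follows returns only the first fitted value of each segment; per the `output` parameter described above, a fitted value for every input point can also be requested, e.g. (a sketch on the same series):
-
-```sql
-select segment(s1, "output"="all", "error"="0.1") from root.test
-```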
- -#### Examples - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.000+08:00| 5.0| -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 1.0| -|1970-01-01T08:00:00.300+08:00| 2.0| -|1970-01-01T08:00:00.400+08:00| 3.0| -|1970-01-01T08:00:00.500+08:00| 4.0| -|1970-01-01T08:00:00.600+08:00| 5.0| -|1970-01-01T08:00:00.700+08:00| 6.0| -|1970-01-01T08:00:00.800+08:00| 7.0| -|1970-01-01T08:00:00.900+08:00| 8.0| -|1970-01-01T08:00:01.000+08:00| 9.0| -|1970-01-01T08:00:01.100+08:00| 9.1| -|1970-01-01T08:00:01.200+08:00| 9.2| -|1970-01-01T08:00:01.300+08:00| 9.3| -|1970-01-01T08:00:01.400+08:00| 9.4| -|1970-01-01T08:00:01.500+08:00| 9.5| -|1970-01-01T08:00:01.600+08:00| 9.6| -|1970-01-01T08:00:01.700+08:00| 9.7| -|1970-01-01T08:00:01.800+08:00| 9.8| -|1970-01-01T08:00:01.900+08:00| 9.9| -|1970-01-01T08:00:02.000+08:00| 10.0| -|1970-01-01T08:00:02.100+08:00| 8.0| -|1970-01-01T08:00:02.200+08:00| 6.0| -|1970-01-01T08:00:02.300+08:00| 4.0| -|1970-01-01T08:00:02.400+08:00| 2.0| -|1970-01-01T08:00:02.500+08:00| 0.0| -|1970-01-01T08:00:02.600+08:00| -2.0| -|1970-01-01T08:00:02.700+08:00| -4.0| -|1970-01-01T08:00:02.800+08:00| -6.0| -|1970-01-01T08:00:02.900+08:00| -8.0| -|1970-01-01T08:00:03.000+08:00| -10.0| -|1970-01-01T08:00:03.100+08:00| 10.0| -|1970-01-01T08:00:03.200+08:00| 10.0| -|1970-01-01T08:00:03.300+08:00| 10.0| -|1970-01-01T08:00:03.400+08:00| 10.0| -|1970-01-01T08:00:03.500+08:00| 10.0| -|1970-01-01T08:00:03.600+08:00| 10.0| -|1970-01-01T08:00:03.700+08:00| 10.0| -|1970-01-01T08:00:03.800+08:00| 10.0| -|1970-01-01T08:00:03.900+08:00| 10.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select segment(s1, "error"="0.1") from root.test -``` - -Output series: - -``` -+-----------------------------+------------------------------------+ -| Time|segment(root.test.s1, "error"="0.1")| -+-----------------------------+------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 5.0| -|1970-01-01T08:00:00.200+08:00| 1.0| -|1970-01-01T08:00:01.000+08:00| 9.0| -|1970-01-01T08:00:02.000+08:00| 10.0| -|1970-01-01T08:00:03.000+08:00| -10.0| -|1970-01-01T08:00:03.200+08:00| 10.0| -+-----------------------------+------------------------------------+ -``` - -### Skew - -#### Registration statement - -```sql -create function skew as 'org.apache.iotdb.library.dprofile.UDAFSkew' -``` - -#### Usage - -This function is used to calculate the population skewness. - -**Name:** SKEW - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population skewness. - -**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
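-
-For reference, the population skewness (which the example below is consistent with) is
-
-$$g_1 = \frac{\tfrac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\Big(\tfrac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\Big)^{3/2}}$$
-
-where $\bar{x}$ is the mean of the $n$ valid points.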
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:00.000+08:00| 1.0| -|2020-01-01T00:00:01.000+08:00| 2.0| -|2020-01-01T00:00:02.000+08:00| 3.0| -|2020-01-01T00:00:03.000+08:00| 4.0| -|2020-01-01T00:00:04.000+08:00| 5.0| -|2020-01-01T00:00:05.000+08:00| 6.0| -|2020-01-01T00:00:06.000+08:00| 7.0| -|2020-01-01T00:00:07.000+08:00| 8.0| -|2020-01-01T00:00:08.000+08:00| 9.0| -|2020-01-01T00:00:09.000+08:00| 10.0| -|2020-01-01T00:00:10.000+08:00| 10.0| -|2020-01-01T00:00:11.000+08:00| 10.0| -|2020-01-01T00:00:12.000+08:00| 10.0| -|2020-01-01T00:00:13.000+08:00| 10.0| -|2020-01-01T00:00:14.000+08:00| 10.0| -|2020-01-01T00:00:15.000+08:00| 10.0| -|2020-01-01T00:00:16.000+08:00| 10.0| -|2020-01-01T00:00:17.000+08:00| 10.0| -|2020-01-01T00:00:18.000+08:00| 10.0| -|2020-01-01T00:00:19.000+08:00| 10.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select skew(s1) from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-----------------------+ -| Time| skew(root.test.d1.s1)| -+-----------------------------+-----------------------+ -|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| -+-----------------------------+-----------------------+ -``` - -### Spline - -#### Registration statement - -```sql -create function spline as 'org.apache.iotdb.library.dprofile.UDTFSpline' -``` - -#### Usage - -This function is used to calculate cubic spline interpolation of input series. - -**Name:** SPLINE - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `points`: Number of resampling points. - -**Output Series:** Output a single series. The type is DOUBLE. - -**Note**: Output series retains the first and last timestamps of input series. Interpolation points are selected at equal intervals. The function tries to calculate only when there are no less than 4 points in input series. 
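-
-As a further illustration of the single `points` parameter, a coarser resampling of the same series used in the example that follows could be requested with a smaller value (a sketch; the first and last input timestamps are still retained):
-
-```sql
-select spline(s1, "points"="16") from root.test
-```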
- -#### Examples - -##### Assigning number of interpolation points - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.2| -|1970-01-01T08:00:00.500+08:00| 1.7| -|1970-01-01T08:00:00.700+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 2.1| -|1970-01-01T08:00:01.100+08:00| 2.0| -|1970-01-01T08:00:01.200+08:00| 1.8| -|1970-01-01T08:00:01.300+08:00| 1.2| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 1.6| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select spline(s1, "points"="151") from root.test -``` - -Output series: - -``` -+-----------------------------+------------------------------------+ -| Time|spline(root.test.s1, "points"="151")| -+-----------------------------+------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| -|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| -|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| -|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| -|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| -|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| -|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| -|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| -|1970-01-01T08:00:00.090+08:00| 0.416700020313263| -|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| -|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| -|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| -|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| -|1970-01-01T08:00:00.140+08:00| 0.627200029373169| -|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| -|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| -|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| -|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| -|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| -|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| -|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| -|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| -|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| -|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| -|1970-01-01T08:00:00.250+08:00| 1.037500043710073| -|1970-01-01T08:00:00.260+08:00| 1.071200044631958| -|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| -|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| -|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| -|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| -|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| -|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| -|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| -|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| -|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| -|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| -|1970-01-01T08:00:00.370+08:00| 1.402300051335891| -|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| -|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| -|1970-01-01T08:00:00.400+08:00| 1.480000052054723| -|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| -|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| -|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| -|1970-01-01T08:00:00.440+08:00| 1.575200051755905| -|1970-01-01T08:00:00.450+08:00| 1.597500051409006| -|1970-01-01T08:00:00.460+08:00| 1.619200050938924| -|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| -|1970-01-01T08:00:00.480+08:00| 1.660800049600601| -|1970-01-01T08:00:00.490+08:00| 
1.680700048718055| -|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| -|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| -|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| -|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| -|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| -|1970-01-01T08:00:00.550+08:00| 1.790937543250622| -|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| -|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| -|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| -|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| -|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| -|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| -|1970-01-01T08:00:00.620+08:00| 1.902080033531194| -|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| -|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| -|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| -|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| -|1970-01-01T08:00:00.670+08:00| 1.96736751668245| -|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| -|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| -|1970-01-01T08:00:00.700+08:00| 2.0| -|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| -|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| -|1970-01-01T08:00:00.730+08:00| 2.027367479995188| -|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| -|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| -|1970-01-01T08:00:00.760+08:00| 2.049739960124489| -|1970-01-01T08:00:00.770+08:00| 2.056157453739342| -|1970-01-01T08:00:00.780+08:00| 2.06207994754791| -|1970-01-01T08:00:00.790+08:00| 2.067522441594897| -|1970-01-01T08:00:00.800+08:00| 2.072499935925006| -|1970-01-01T08:00:00.810+08:00| 2.07702743058294| -|1970-01-01T08:00:00.820+08:00| 2.081119925613404| -|1970-01-01T08:00:00.830+08:00| 2.0847924210611| -|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| -|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| -|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| -|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| -|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| -|1970-01-01T08:00:00.890+08:00| 2.098847405012549| -|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| -|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| -|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| -|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| -|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| -|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| -|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| -|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| -|1970-01-01T08:00:00.980+08:00| 2.083039929962151| -|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| -|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| -|1970-01-01T08:00:01.010+08:00| 2.06653244776129| -|1970-01-01T08:00:01.020+08:00| 2.060159954071038| -|1970-01-01T08:00:01.030+08:00| 2.053427460438006| -|1970-01-01T08:00:01.040+08:00| 2.046379966783517| -|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| -|1970-01-01T08:00:01.060+08:00| 2.031519979095454| -|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| -|1970-01-01T08:00:01.080+08:00| 2.015939990377423| -|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| -|1970-01-01T08:00:01.100+08:00| 2.0| -|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| -|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| -|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| -|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| -|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| -|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| -|1970-01-01T08:00:01.170+08:00| 
1.8835963418164114| -|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| -|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| -|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| -|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| -|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| -|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| -|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| -|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| -|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| -|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| -|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| -|1970-01-01T08:00:01.290+08:00| 1.251818225400506| -|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| -|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| -|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| -|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| -|1970-01-01T08:00:01.340+08:00| 1.043200033187868| -|1970-01-01T08:00:01.350+08:00| 1.016666692992053| -|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| -|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| -|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| -|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.410+08:00| 1.023999999165535| -|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| -|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| -|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| -|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| -|1970-01-01T08:00:01.460+08:00| 1.264000005722046| -|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| -|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| -|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| -|1970-01-01T08:00:01.500+08:00| 1.600000023841858| -+-----------------------------+------------------------------------+ -``` - -### Spread - -#### Registration statement - -```sql -create function spread as 'org.apache.iotdb.library.dprofile.UDAFSpread' -``` - -#### Usage - -This function is used to calculate the spread of time series, that is, the maximum value minus the minimum value. - -**Name:** SPREAD - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is 0 and value is the spread. - -**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
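-
-The computation is simply
-
-$$\mathrm{spread} = \max_i x_i - \min_i x_i$$
-
-taken over the valid points; in the example that follows, $126.0 - 100.0 = 26.0$.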
-
-#### Examples
-
-Input series:
-
-```
-+-----------------------------+---------------+
-|                         Time|root.test.d1.s1|
-+-----------------------------+---------------+
-|2020-01-01T00:00:02.000+08:00|          100.0|
-|2020-01-01T00:00:03.000+08:00|          101.0|
-|2020-01-01T00:00:04.000+08:00|          102.0|
-|2020-01-01T00:00:06.000+08:00|          104.0|
-|2020-01-01T00:00:08.000+08:00|          126.0|
-|2020-01-01T00:00:10.000+08:00|          108.0|
-|2020-01-01T00:00:14.000+08:00|          112.0|
-|2020-01-01T00:00:15.000+08:00|          113.0|
-|2020-01-01T00:00:16.000+08:00|          114.0|
-|2020-01-01T00:00:18.000+08:00|          116.0|
-|2020-01-01T00:00:20.000+08:00|          118.0|
-|2020-01-01T00:00:22.000+08:00|          120.0|
-|2020-01-01T00:00:26.000+08:00|          124.0|
-|2020-01-01T00:00:28.000+08:00|          126.0|
-|2020-01-01T00:00:30.000+08:00|            NaN|
-+-----------------------------+---------------+
-```
-
-SQL for query:
-
-```sql
-select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30
-```
-
-Output series:
-
-```
-+-----------------------------+-----------------------+
-|                         Time|spread(root.test.d1.s1)|
-+-----------------------------+-----------------------+
-|1970-01-01T08:00:00.000+08:00|                   26.0|
-+-----------------------------+-----------------------+
-```
-
-
-### ZScore
-
-#### Registration statement
-
-```sql
-create function zscore as 'org.apache.iotdb.library.dprofile.UDTFZScore'
-```
-
-#### Usage
-
-This function is used to standardize the input series with z-score.
-
-**Name:** ZSCORE
-
-**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `compute`: When set to "batch", the mean and standard deviation are computed from all imported data points before standardization; when set to "stream", the mean and standard deviation must be provided beforehand. The default value is "batch".
-+ `avg`: Mean value when `compute` is set to "stream".
-+ `sd`: Standard deviation when `compute` is set to "stream".
-
-**Output Series:** Output a single series. The type is DOUBLE.
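-
-For reference, the standardized value of a point $x_i$ is $z_i = (x_i - \mu) / \sigma$, where $\mu$ and $\sigma$ come from the whole series in "batch" mode or from the `avg` and `sd` parameters in "stream" mode. A streaming query is sketched below; the mean and standard deviation values are placeholders, and the parameter names are the ones listed above:
-
-```sql
-select zscore(s1, 'compute'='stream', 'avg'='0.5', 'sd'='2.0') from root.test
-```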
- -#### Examples - -##### Batch computing - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| -|1970-01-01T08:00:00.300+08:00| 1.0| -|1970-01-01T08:00:00.400+08:00| -1.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -|1970-01-01T08:00:00.600+08:00| 0.0| -|1970-01-01T08:00:00.700+08:00| -2.0| -|1970-01-01T08:00:00.800+08:00| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 1.0| -|1970-01-01T08:00:01.200+08:00| -1.0| -|1970-01-01T08:00:01.300+08:00| -1.0| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 0.0| -|1970-01-01T08:00:01.600+08:00| 0.0| -|1970-01-01T08:00:01.700+08:00| 10.0| -|1970-01-01T08:00:01.800+08:00| 2.0| -|1970-01-01T08:00:01.900+08:00| -2.0| -|1970-01-01T08:00:02.000+08:00| 0.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select zscore(s1) from root.test -``` - -Output series: - -``` -+-----------------------------+--------------------+ -| Time|zscore(root.test.s1)| -+-----------------------------+--------------------+ -|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| -|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| -|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| -|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| -|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| -|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| -|1970-01-01T08:00:00.700+08:00| -1.033622788243404| -|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| -|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| -|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| -|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| -|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| -|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| -|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| -|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| -|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| -|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| -|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| -|1970-01-01T08:00:01.900+08:00| -1.033622788243404| -|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| -+-----------------------------+--------------------+ -``` - - -## Anomaly Detection - -### IQR - -#### Registration statement - -```sql -create function iqr as 'org.apache.iotdb.library.anomaly.UDTFIQR' -``` - -#### Usage - -This function is used to detect anomalies based on IQR. Points distributing beyond 1.5 times IQR are selected. - -**Name:** IQR - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `method`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide upper and lower quantiles. The default method is "batch". -+ `q1`: The lower quantile when method is set to "stream". -+ `q3`: The upper quantile when method is set to "stream". - -**Output Series:** Output a single series. The type is DOUBLE. 
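-
-For the streaming case, the lower and upper quantiles have to be supplied explicitly. A sketch of such a query, using the parameter names listed above with placeholder quantile values, is:
-
-```sql
-select iqr(s1, 'method'='stream', 'q1'='-1.0', 'q3'='1.0') from root.test
-```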
-
-**Note:** $IQR=Q_3-Q_1$
-
-#### Examples
-
-##### Batch computing
-
-Input series:
-
-```
-+-----------------------------+------------+
-|                         Time|root.test.s1|
-+-----------------------------+------------+
-|1970-01-01T08:00:00.100+08:00|         0.0|
-|1970-01-01T08:00:00.200+08:00|         0.0|
-|1970-01-01T08:00:00.300+08:00|         1.0|
-|1970-01-01T08:00:00.400+08:00|        -1.0|
-|1970-01-01T08:00:00.500+08:00|         0.0|
-|1970-01-01T08:00:00.600+08:00|         0.0|
-|1970-01-01T08:00:00.700+08:00|        -2.0|
-|1970-01-01T08:00:00.800+08:00|         2.0|
-|1970-01-01T08:00:00.900+08:00|         0.0|
-|1970-01-01T08:00:01.000+08:00|         0.0|
-|1970-01-01T08:00:01.100+08:00|         1.0|
-|1970-01-01T08:00:01.200+08:00|        -1.0|
-|1970-01-01T08:00:01.300+08:00|        -1.0|
-|1970-01-01T08:00:01.400+08:00|         1.0|
-|1970-01-01T08:00:01.500+08:00|         0.0|
-|1970-01-01T08:00:01.600+08:00|         0.0|
-|1970-01-01T08:00:01.700+08:00|        10.0|
-|1970-01-01T08:00:01.800+08:00|         2.0|
-|1970-01-01T08:00:01.900+08:00|        -2.0|
-|1970-01-01T08:00:02.000+08:00|         0.0|
-+-----------------------------+------------+
-```
-
-SQL for query:
-
-```sql
-select iqr(s1) from root.test
-```
-
-Output series:
-
-```
-+-----------------------------+-----------------+
-|                         Time|iqr(root.test.s1)|
-+-----------------------------+-----------------+
-|1970-01-01T08:00:01.700+08:00|             10.0|
-+-----------------------------+-----------------+
-```
-
-### KSigma
-
-#### Registration statement
-
-```sql
-create function ksigma as 'org.apache.iotdb.library.anomaly.UDTFKSigma'
-```
-
-#### Usage
-
-This function is used to detect anomalies based on the Dynamic K-Sigma Algorithm.
-Within a sliding window, an input value that deviates from the average by more than k times the standard deviation is output as an anomaly.
-
-**Name:** KSIGMA
-
-**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `k`: The number of standard deviations a value must deviate from the window average to be regarded as an anomaly. The default value is 3.
-+ `window`: The window size of the Dynamic K-Sigma Algorithm. The default value is 10000.
-
-**Output Series:** Output a single series. The type is the same as the input series.
-
-**Note:** Only when `k` is larger than 0, the anomaly detection will be performed. Otherwise, nothing will be output.
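-
-The example below only assigns `k`; a query that also restricts the sliding window is sketched here, with a placeholder window size:
-
-```sql
-select ksigma(s1, 'k'='3', 'window'='1000') from root.test.d1
-```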
- -#### Examples - -##### Assigning k - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 0.0| -|2020-01-01T00:00:03.000+08:00| 50.0| -|2020-01-01T00:00:04.000+08:00| 100.0| -|2020-01-01T00:00:06.000+08:00| 150.0| -|2020-01-01T00:00:08.000+08:00| 200.0| -|2020-01-01T00:00:10.000+08:00| 200.0| -|2020-01-01T00:00:14.000+08:00| 200.0| -|2020-01-01T00:00:15.000+08:00| 200.0| -|2020-01-01T00:00:16.000+08:00| 200.0| -|2020-01-01T00:00:18.000+08:00| 200.0| -|2020-01-01T00:00:20.000+08:00| 150.0| -|2020-01-01T00:00:22.000+08:00| 100.0| -|2020-01-01T00:00:26.000+08:00| 50.0| -|2020-01-01T00:00:28.000+08:00| 0.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+---------------------------------+ -|Time |ksigma(root.test.d1.s1,"k"="3.0")| -+-----------------------------+---------------------------------+ -|2020-01-01T00:00:02.000+08:00| 0.0| -|2020-01-01T00:00:03.000+08:00| 50.0| -|2020-01-01T00:00:26.000+08:00| 50.0| -|2020-01-01T00:00:28.000+08:00| 0.0| -+-----------------------------+---------------------------------+ -``` - -### LOF - -#### Registration statement - -```sql -create function LOF as 'org.apache.iotdb.library.anomaly.UDTFLOF' -``` - -#### Usage - -This function is used to detect density anomaly of time series. According to k-th distance calculation parameter and local outlier factor (lof) threshold, the function judges if a set of input values is an density anomaly, and a bool mark of anomaly values will be output. - -**Name:** LOF - -**Input Series:** Multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `method`:assign a detection method. The default value is "default", when input data has multiple dimensions. The alternative is "series", when a input series will be transformed to high dimension. -+ `k`:use the k-th distance to calculate lof. Default value is 3. -+ `window`: size of window to split origin data points. Default value is 10000. -+ `windowsize`:dimension that will be transformed into when method is "series". The default value is 5. - -**Output Series:** Output a single series. The type is DOUBLE. - -**Note:** Incomplete rows will be ignored. They are neither calculated nor marked as anomaly. 
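-
-Besides the examples below, the neighborhood parameters can be assigned explicitly. A sketch with a placeholder value for `k` is:
-
-```sql
-select lof(s1, s2, 'k'='5') from root.test.d1
-```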
- -#### Examples - -##### Using default parameters - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d1.s1|root.test.d1.s2| -+-----------------------------+---------------+---------------+ -|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| -|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| -|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| -|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| -|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| -|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| -|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| -|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| -|1970-01-01T08:00:00.900+08:00| 0.0| null| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select lof(s1,s2) from root.test.d1 where time<1000 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------+ -| Time|lof(root.test.d1.s1, root.test.d1.s2)| -+-----------------------------+-------------------------------------+ -|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| -|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| -|1970-01-01T08:00:00.300+08:00| 2.838155437762879| -|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| -|1970-01-01T08:00:00.500+08:00| 2.73518261244453| -|1970-01-01T08:00:00.600+08:00| 2.371440975708148| -|1970-01-01T08:00:00.700+08:00| 2.73518261244453| -|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| -+-----------------------------+-------------------------------------+ -``` - -##### Diagnosing 1d timeseries - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.100+08:00| 1.0| -|1970-01-01T08:00:00.200+08:00| 2.0| -|1970-01-01T08:00:00.300+08:00| 3.0| -|1970-01-01T08:00:00.400+08:00| 4.0| -|1970-01-01T08:00:00.500+08:00| 5.0| -|1970-01-01T08:00:00.600+08:00| 6.0| -|1970-01-01T08:00:00.700+08:00| 7.0| -|1970-01-01T08:00:00.800+08:00| 8.0| -|1970-01-01T08:00:00.900+08:00| 9.0| -|1970-01-01T08:00:01.000+08:00| 10.0| -|1970-01-01T08:00:01.100+08:00| 11.0| -|1970-01-01T08:00:01.200+08:00| 12.0| -|1970-01-01T08:00:01.300+08:00| 13.0| -|1970-01-01T08:00:01.400+08:00| 14.0| -|1970-01-01T08:00:01.500+08:00| 15.0| -|1970-01-01T08:00:01.600+08:00| 16.0| -|1970-01-01T08:00:01.700+08:00| 17.0| -|1970-01-01T08:00:01.800+08:00| 18.0| -|1970-01-01T08:00:01.900+08:00| 19.0| -|1970-01-01T08:00:02.000+08:00| 20.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select lof(s1, "method"="series") from root.test.d1 where time<1000 -``` - -Output series: - -``` -+-----------------------------+--------------------+ -| Time|lof(root.test.d1.s1)| -+-----------------------------+--------------------+ -|1970-01-01T08:00:00.100+08:00| 3.77777777777778| -|1970-01-01T08:00:00.200+08:00| 4.32727272727273| -|1970-01-01T08:00:00.300+08:00| 4.85714285714286| -|1970-01-01T08:00:00.400+08:00| 5.40909090909091| -|1970-01-01T08:00:00.500+08:00| 5.94999999999999| -|1970-01-01T08:00:00.600+08:00| 6.43243243243243| -|1970-01-01T08:00:00.700+08:00| 6.79999999999999| -|1970-01-01T08:00:00.800+08:00| 7.0| -|1970-01-01T08:00:00.900+08:00| 7.0| -|1970-01-01T08:00:01.000+08:00| 6.79999999999999| -|1970-01-01T08:00:01.100+08:00| 6.43243243243243| -|1970-01-01T08:00:01.200+08:00| 5.94999999999999| -|1970-01-01T08:00:01.300+08:00| 5.40909090909091| -|1970-01-01T08:00:01.400+08:00| 4.85714285714286| -|1970-01-01T08:00:01.500+08:00| 4.32727272727273| -|1970-01-01T08:00:01.600+08:00| 
3.77777777777778| -+-----------------------------+--------------------+ -``` - -### MissDetect - -#### Registration statement - -```sql -create function missdetect as 'org.apache.iotdb.library.anomaly.UDTFMissDetect' -``` - -#### Usage - -This function is used to detect missing anomalies. -In some datasets, missing values are filled by linear interpolation. -Thus, there are several long perfect linear segments. -By discovering these perfect linear segments, -missing anomalies are detected. - -**Name:** MISSDETECT - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameter:** - -`error`: The minimum length of the detected missing anomalies, which is an integer greater than or equal to 10. By default, it is 10. - -**Output Series:** Output a single series. The type is BOOLEAN. Each data point which is miss anomaly will be labeled as true. - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s2| -+-----------------------------+---------------+ -|2021-07-01T12:00:00.000+08:00| 0.0| -|2021-07-01T12:00:01.000+08:00| 1.0| -|2021-07-01T12:00:02.000+08:00| 0.0| -|2021-07-01T12:00:03.000+08:00| 1.0| -|2021-07-01T12:00:04.000+08:00| 0.0| -|2021-07-01T12:00:05.000+08:00| 0.0| -|2021-07-01T12:00:06.000+08:00| 0.0| -|2021-07-01T12:00:07.000+08:00| 0.0| -|2021-07-01T12:00:08.000+08:00| 0.0| -|2021-07-01T12:00:09.000+08:00| 0.0| -|2021-07-01T12:00:10.000+08:00| 0.0| -|2021-07-01T12:00:11.000+08:00| 0.0| -|2021-07-01T12:00:12.000+08:00| 0.0| -|2021-07-01T12:00:13.000+08:00| 0.0| -|2021-07-01T12:00:14.000+08:00| 0.0| -|2021-07-01T12:00:15.000+08:00| 0.0| -|2021-07-01T12:00:16.000+08:00| 1.0| -|2021-07-01T12:00:17.000+08:00| 0.0| -|2021-07-01T12:00:18.000+08:00| 1.0| -|2021-07-01T12:00:19.000+08:00| 0.0| -|2021-07-01T12:00:20.000+08:00| 1.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select missdetect(s2,'minlen'='10') from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------+ -| Time|missdetect(root.test.d2.s2, "minlen"="10")| -+-----------------------------+------------------------------------------+ -|2021-07-01T12:00:00.000+08:00| false| -|2021-07-01T12:00:01.000+08:00| false| -|2021-07-01T12:00:02.000+08:00| false| -|2021-07-01T12:00:03.000+08:00| false| -|2021-07-01T12:00:04.000+08:00| true| -|2021-07-01T12:00:05.000+08:00| true| -|2021-07-01T12:00:06.000+08:00| true| -|2021-07-01T12:00:07.000+08:00| true| -|2021-07-01T12:00:08.000+08:00| true| -|2021-07-01T12:00:09.000+08:00| true| -|2021-07-01T12:00:10.000+08:00| true| -|2021-07-01T12:00:11.000+08:00| true| -|2021-07-01T12:00:12.000+08:00| true| -|2021-07-01T12:00:13.000+08:00| true| -|2021-07-01T12:00:14.000+08:00| true| -|2021-07-01T12:00:15.000+08:00| true| -|2021-07-01T12:00:16.000+08:00| false| -|2021-07-01T12:00:17.000+08:00| false| -|2021-07-01T12:00:18.000+08:00| false| -|2021-07-01T12:00:19.000+08:00| false| -|2021-07-01T12:00:20.000+08:00| false| -+-----------------------------+------------------------------------------+ -``` - -### Range - -#### Registration statement - -```sql -create function range as 'org.apache.iotdb.library.anomaly.UDTFRange' -``` - -#### Usage - -This function is used to detect range anomaly of time series. According to upper bound and lower bound parameters, the function judges if a input value is beyond range, aka range anomaly, and a new time series of anomaly will be output. 
- -**Name:** RANGE - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `lower_bound`:lower bound of range anomaly detection. -+ `upper_bound`:upper bound of range anomaly detection. - -**Output Series:** Output a single series. The type is the same as the input. - -**Note:** Only when `upper_bound` is larger than `lower_bound`, the anomaly detection will be performed. Otherwise, nothing will be output. - - - -#### Examples - -##### Assigning Lower and Upper Bound - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------------------+ -|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")| -+-----------------------------+------------------------------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -+-----------------------------+------------------------------------------------------------------+ -``` - -### TwoSidedFilter - -#### Registration statement - -```sql -create function twosidedfilter as 'org.apache.iotdb.library.anomaly.UDTFTwoSidedFilter' -``` - -#### Usage - -The function is used to filter anomalies of a numeric time series based on two-sided window detection. - -**Name:** TWOSIDEDFILTER - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE - -**Output Series:** Output a single series. The type is the same as the input. It is the input without anomalies. - -**Parameter:** - -- `len`: The size of the window, which is a positive integer. By default, it's 5. When `len`=3, the algorithm detects forward window and backward window with length 3 and calculates the outlierness of the current point. - -- `threshold`: The threshold of outlierness, which is a floating number in (0,1). By default, it's 0.3. The strict standard of detecting anomalies is in proportion to the threshold. 
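-
-Both parameters are optional; with the defaults (`len`=5, `threshold`=0.3), the call reduces to the following sketch:
-
-```sql
-select TwoSidedFilter(s0) from root.test
-```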
- -#### Examples - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s0| -+-----------------------------+------------+ -|1970-01-01T08:00:00.000+08:00| 2002.0| -|1970-01-01T08:00:01.000+08:00| 1946.0| -|1970-01-01T08:00:02.000+08:00| 1958.0| -|1970-01-01T08:00:03.000+08:00| 2012.0| -|1970-01-01T08:00:04.000+08:00| 2051.0| -|1970-01-01T08:00:05.000+08:00| 1898.0| -|1970-01-01T08:00:06.000+08:00| 2014.0| -|1970-01-01T08:00:07.000+08:00| 2052.0| -|1970-01-01T08:00:08.000+08:00| 1935.0| -|1970-01-01T08:00:09.000+08:00| 1901.0| -|1970-01-01T08:00:10.000+08:00| 1972.0| -|1970-01-01T08:00:11.000+08:00| 1969.0| -|1970-01-01T08:00:12.000+08:00| 1984.0| -|1970-01-01T08:00:13.000+08:00| 2018.0| -|1970-01-01T08:00:37.000+08:00| 1484.0| -|1970-01-01T08:00:38.000+08:00| 1055.0| -|1970-01-01T08:00:39.000+08:00| 1050.0| -|1970-01-01T08:01:05.000+08:00| 1023.0| -|1970-01-01T08:01:06.000+08:00| 1056.0| -|1970-01-01T08:01:07.000+08:00| 978.0| -|1970-01-01T08:01:08.000+08:00| 1050.0| -|1970-01-01T08:01:09.000+08:00| 1123.0| -|1970-01-01T08:01:10.000+08:00| 1150.0| -|1970-01-01T08:01:11.000+08:00| 1034.0| -|1970-01-01T08:01:12.000+08:00| 950.0| -|1970-01-01T08:01:13.000+08:00| 1059.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test -``` - -Output series: - -``` -+-----------------------------+------------+ -| Time|root.test.s0| -+-----------------------------+------------+ -|1970-01-01T08:00:00.000+08:00| 2002.0| -|1970-01-01T08:00:01.000+08:00| 1946.0| -|1970-01-01T08:00:02.000+08:00| 1958.0| -|1970-01-01T08:00:03.000+08:00| 2012.0| -|1970-01-01T08:00:04.000+08:00| 2051.0| -|1970-01-01T08:00:05.000+08:00| 1898.0| -|1970-01-01T08:00:06.000+08:00| 2014.0| -|1970-01-01T08:00:07.000+08:00| 2052.0| -|1970-01-01T08:00:08.000+08:00| 1935.0| -|1970-01-01T08:00:09.000+08:00| 1901.0| -|1970-01-01T08:00:10.000+08:00| 1972.0| -|1970-01-01T08:00:11.000+08:00| 1969.0| -|1970-01-01T08:00:12.000+08:00| 1984.0| -|1970-01-01T08:00:13.000+08:00| 2018.0| -|1970-01-01T08:01:05.000+08:00| 1023.0| -|1970-01-01T08:01:06.000+08:00| 1056.0| -|1970-01-01T08:01:07.000+08:00| 978.0| -|1970-01-01T08:01:08.000+08:00| 1050.0| -|1970-01-01T08:01:09.000+08:00| 1123.0| -|1970-01-01T08:01:10.000+08:00| 1150.0| -|1970-01-01T08:01:11.000+08:00| 1034.0| -|1970-01-01T08:01:12.000+08:00| 950.0| -|1970-01-01T08:01:13.000+08:00| 1059.0| -+-----------------------------+------------+ -``` - -### Outlier - -#### Registration statement - -```sql -create function outlier as 'org.apache.iotdb.library.anomaly.UDTFOutlier' -``` - -#### Usage - -This function is used to detect distance-based outliers. For each point in the current window, if the number of its neighbors within the distance of neighbor distance threshold is less than the neighbor count threshold, the point in detected as an outlier. - -**Name:** OUTLIER - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -+ `r`:the neighbor distance threshold. -+ `k`:the neighbor count threshold. -+ `w`:the window size. -+ `s`:the slide size. - -**Output Series:** Output a single series. The type is the same as the input. 
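-
-Stated formally: within a window $W$, a point $x$ is reported as an outlier when $|\{y \in W : |x-y| \le r\}| < k$, i.e. when fewer than `k` of the other points in the window lie within distance `r` of it.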
- -#### Examples - -##### Assigning Parameters of Queries - -Input series: - -``` -+-----------------------------+------------+ -| Time|root.test.s1| -+-----------------------------+------------+ -|2020-01-04T23:59:55.000+08:00| 56.0| -|2020-01-04T23:59:56.000+08:00| 55.1| -|2020-01-04T23:59:57.000+08:00| 54.2| -|2020-01-04T23:59:58.000+08:00| 56.3| -|2020-01-04T23:59:59.000+08:00| 59.0| -|2020-01-05T00:00:00.000+08:00| 60.0| -|2020-01-05T00:00:01.000+08:00| 60.5| -|2020-01-05T00:00:02.000+08:00| 64.5| -|2020-01-05T00:00:03.000+08:00| 69.0| -|2020-01-05T00:00:04.000+08:00| 64.2| -|2020-01-05T00:00:05.000+08:00| 62.3| -|2020-01-05T00:00:06.000+08:00| 58.0| -|2020-01-05T00:00:07.000+08:00| 58.9| -|2020-01-05T00:00:08.000+08:00| 52.0| -|2020-01-05T00:00:09.000+08:00| 62.3| -|2020-01-05T00:00:10.000+08:00| 61.0| -|2020-01-05T00:00:11.000+08:00| 64.2| -|2020-01-05T00:00:12.000+08:00| 61.8| -|2020-01-05T00:00:13.000+08:00| 64.0| -|2020-01-05T00:00:14.000+08:00| 63.0| -+-----------------------------+------------+ -``` - -SQL for query: - -```sql -select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------------+ -| Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")| -+-----------------------------+--------------------------------------------------------+ -|2020-01-05T00:00:03.000+08:00| 69.0| -+-----------------------------+--------------------------------------------------------+ -|2020-01-05T00:00:08.000+08:00| 52.0| -+-----------------------------+--------------------------------------------------------+ -``` - - -### MasterTrain - -#### Usage - -This function is used to train the VAR model based on master data. The model is trained on learning samples consisting of p+1 consecutive non-error points. - -**Name:** MasterTrain - -**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `p`: The order of the model. -+ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. - -**Output Series:** Output a single series. The type is the same as the input. - -**Installation** -- Install IoTDB from branch `research/master-detector`. -- Run `mvn spotless:apply`. -- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`. -- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`. -- Start IoTDB server and run `create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'` in client. 
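-
-For reference, a VAR model of order $p$ (the `p` parameter above) has the general form $x_t = A_1 x_{t-1} + \cdots + A_p x_{t-p} + \epsilon_t$, with vector observations $x_t$ and coefficient matrices $A_i$; this is why each learning sample consists of $p+1$ consecutive non-error points.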
- -#### Examples - -Input series: - -``` -+-----------------------------+------------+------------+--------------+--------------+ -| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| -+-----------------------------+------------+------------+--------------+--------------+ -|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| -|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| -|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| -|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| -|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| -|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| -|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| -|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| -|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| -|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| -|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| -|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| -|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| -|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| -|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| -|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| -|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| -|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| -|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| -|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| -|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| -|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| -|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| -|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| -+-----------------------------+------------+------------+--------------+--------------+ -``` - -SQL for query: - -```sql -select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------------------------------------------------------------------+ -| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| -+-----------------------------+---------------------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| -|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| -|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| -|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| -|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| -|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| -|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| -|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| -+-----------------------------+---------------------------------------------------------------------------------------------+ -``` - -### MasterDetect - -#### Usage - -This function is used to detect 
time series and repair errors based on master data. The VAR model is trained by MasterTrain. - -**Name:** MasterDetect - -**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `p`: The order of the model. -+ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data. -+ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. -+ `eta`: The detection threshold. By default, it will be estimated based on the 3-sigma rule. -+ `output_type`: The type of output. 'repair' for repairing and 'anomaly' for anomaly detection. -+ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column. - -**Output Series:** Output a single series. The type is the same as the input. - -**Installation** -- Install IoTDB from branch `research/master-detector`. -- Run `mvn spotless:apply`. -- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`. -- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`. -- Start IoTDB server and run `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'` in client. - -#### Examples - -Input series: - -``` -+-----------------------------+------------+------------+--------------+--------------+--------------------+ -| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| -+-----------------------------+------------+------------+--------------+--------------+--------------------+ -|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| -|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| -|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| -|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| -|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| -|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| -|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| -|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| -|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| -|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| -|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| -|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| -|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| -|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| -|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| -|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| -|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| -|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| -|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| 
null| -|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| -|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| -|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| -|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| -|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| -|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| null| -+-----------------------------+------------+------------+--------------+--------------+--------------------+ -``` - -##### Repairing - -SQL for query: - -```sql -select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------------------------------------------+ -| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| -+-----------------------------+--------------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 116.327274| -|1970-01-01T08:00:00.002+08:00| 116.327305| -|1970-01-01T08:00:00.003+08:00| 116.3273291| -|1970-01-01T08:00:00.004+08:00| 116.327342| -|1970-01-01T08:00:00.005+08:00| 116.3273744| -|1970-01-01T08:00:00.006+08:00| 116.3274117| -|1970-01-01T08:00:00.007+08:00| 116.3274396| -|1970-01-01T08:00:00.008+08:00| 116.3274668| -|1970-01-01T08:00:00.009+08:00| 116.3275026| -|1970-01-01T08:00:00.010+08:00| 116.3274967| -|1970-01-01T08:00:00.011+08:00| 116.3274929| -|1970-01-01T08:00:00.012+08:00| 116.3274745| -|1970-01-01T08:00:00.013+08:00| 116.3275095| -|1970-01-01T08:00:00.014+08:00| 116.3274787| -|1970-01-01T08:00:00.015+08:00| 116.3274693| -|1970-01-01T08:00:00.016+08:00| 116.3274941| -|1970-01-01T08:00:00.017+08:00| 116.3275401| -|1970-01-01T08:00:00.018+08:00| 116.3275713| -|1970-01-01T08:00:00.019+08:00| 116.3276003| -|1970-01-01T08:00:00.020+08:00| 116.3276308| -|1970-01-01T08:00:00.021+08:00| 116.3276338| -|1970-01-01T08:00:00.022+08:00| 116.3276684| -|1970-01-01T08:00:00.023+08:00| 116.3277016| -|1970-01-01T08:00:00.024+08:00| 116.3277284| -|1970-01-01T08:00:00.025+08:00| 116.3277562| -+-----------------------------+--------------------------------------------------------------------------------------+ -``` - -##### Anomaly Detection - -SQL for query: - -```sql -select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test -``` - -Output series: - -``` -+-----------------------------+---------------------------------------------------------------------------------------+ -| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| -+-----------------------------+---------------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| false| -|1970-01-01T08:00:00.002+08:00| false| -|1970-01-01T08:00:00.003+08:00| false| -|1970-01-01T08:00:00.004+08:00| false| -|1970-01-01T08:00:00.005+08:00| true| -|1970-01-01T08:00:00.006+08:00| true| -|1970-01-01T08:00:00.007+08:00| false| -|1970-01-01T08:00:00.008+08:00| false| -|1970-01-01T08:00:00.009+08:00| false| -|1970-01-01T08:00:00.010+08:00| false| -|1970-01-01T08:00:00.011+08:00| false| -|1970-01-01T08:00:00.012+08:00| false| -|1970-01-01T08:00:00.013+08:00| false| -|1970-01-01T08:00:00.014+08:00| true| -|1970-01-01T08:00:00.015+08:00| false| 
-|1970-01-01T08:00:00.016+08:00| false| -|1970-01-01T08:00:00.017+08:00| false| -|1970-01-01T08:00:00.018+08:00| false| -|1970-01-01T08:00:00.019+08:00| false| -|1970-01-01T08:00:00.020+08:00| false| -|1970-01-01T08:00:00.021+08:00| false| -|1970-01-01T08:00:00.022+08:00| false| -|1970-01-01T08:00:00.023+08:00| false| -|1970-01-01T08:00:00.024+08:00| false| -|1970-01-01T08:00:00.025+08:00| false| -+-----------------------------+---------------------------------------------------------------------------------------+ -``` - - - -## Frequency Domain Analysis - -### Conv - -#### Registration statement - -```sql -create function conv as 'org.apache.iotdb.library.frequency.UDTFConv' -``` - -#### Usage - -This function is used to calculate the convolution, i.e. polynomial multiplication. - -**Name:** CONV - -**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Output:** Output a single series. The type is DOUBLE. It is the result of convolution whose timestamps starting from 0 only indicate the order. - -**Note:** `NaN` in the input series will be ignored. - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s1|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| -|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| -|1970-01-01T08:00:00.002+08:00| 1.0| null| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select conv(s1,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------+ -| Time|conv(root.test.d2.s1, root.test.d2.s2)| -+-----------------------------+--------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 7.0| -|1970-01-01T08:00:00.001+08:00| 2.0| -|1970-01-01T08:00:00.002+08:00| 7.0| -|1970-01-01T08:00:00.003+08:00| 2.0| -+-----------------------------+--------------------------------------+ -``` - -### Deconv - -#### Registration statement - -```sql -create function deconv as 'org.apache.iotdb.library.frequency.UDTFDeconv' -``` - -#### Usage - -This function is used to calculate the deconvolution, i.e. polynomial division. - -**Name:** DECONV - -**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `result`: The result of deconvolution, which is 'quotient' or 'remainder'. By default, the quotient will be output. - -**Output:** Output a single series. The type is DOUBLE. It is the result of deconvolving the second series from the first series (dividing the first series by the second series) whose timestamps starting from 0 only indicate the order. - -**Note:** `NaN` in the input series will be ignored. - -#### Examples - - -##### Calculate the quotient - -When `result` is 'quotient' or the default, this function calculates the quotient of the deconvolution. 
- -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s3|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| -|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| -|1970-01-01T08:00:00.002+08:00| 7.0| null| -|1970-01-01T08:00:00.003+08:00| 2.0| null| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select deconv(s3,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+----------------------------------------+ -| Time|deconv(root.test.d2.s3, root.test.d2.s2)| -+-----------------------------+----------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 1.0| -|1970-01-01T08:00:00.001+08:00| 0.0| -|1970-01-01T08:00:00.002+08:00| 1.0| -+-----------------------------+----------------------------------------+ -``` - -##### Calculate the remainder - -When `result` is 'remainder', this function calculates the remainder of the deconvolution. - -Input series is the same as above, the SQL for query is shown below: - - -```sql -select deconv(s3,s2,'result'='remainder') from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------------------+ -| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| -+-----------------------------+--------------------------------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 1.0| -|1970-01-01T08:00:00.001+08:00| 0.0| -|1970-01-01T08:00:00.002+08:00| 0.0| -|1970-01-01T08:00:00.003+08:00| 0.0| -+-----------------------------+--------------------------------------------------------------+ -``` - -### DWT - -#### Registration statement - -```sql -create function dwt as 'org.apache.iotdb.library.frequency.UDTFDWT' -``` - -#### Usage - -This function is used to calculate 1d discrete wavelet transform of a numerical series. - -**Name:** DWT - -**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `method`: The type of wavelet. May select 'Haar', 'DB4', 'DB6', 'DB8', where DB means Daubechies. User may offer coefficients of wavelet transform and ignore this parameter. Case ignored. -+ `coef`: Coefficients of wavelet transform. When providing this parameter, use comma ',' to split them, and leave no spaces or other punctuations. -+ `layer`: Times to transform. The number of output vectors equals $layer+1$. Default is 1. - -**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. - -**Note:** The length of input series must be an integer number power of 2. 
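-
-Besides the named wavelets, the filter coefficients can be passed directly through `coef` (comma-separated, no spaces), and `layer` controls how many times the transform is applied. A sketch with placeholder coefficients is:
-
-```sql
-select dwt(s1, 'coef'='0.5,0.5', 'layer'='2') from root.test.d1
-```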
- -#### Examples - - -##### Haar wavelet transform - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.100+08:00| 0.2| -|1970-01-01T08:00:00.200+08:00| 1.5| -|1970-01-01T08:00:00.300+08:00| 1.2| -|1970-01-01T08:00:00.400+08:00| 0.6| -|1970-01-01T08:00:00.500+08:00| 1.7| -|1970-01-01T08:00:00.600+08:00| 0.8| -|1970-01-01T08:00:00.700+08:00| 2.0| -|1970-01-01T08:00:00.800+08:00| 2.5| -|1970-01-01T08:00:00.900+08:00| 2.1| -|1970-01-01T08:00:01.000+08:00| 0.0| -|1970-01-01T08:00:01.100+08:00| 2.0| -|1970-01-01T08:00:01.200+08:00| 1.8| -|1970-01-01T08:00:01.300+08:00| 1.2| -|1970-01-01T08:00:01.400+08:00| 1.0| -|1970-01-01T08:00:01.500+08:00| 1.6| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select dwt(s1,"method"="haar") from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------+ -| Time|dwt(root.test.d1.s1, "method"="haar")| -+-----------------------------+-------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| -|1970-01-01T08:00:00.100+08:00| 1.909188342921157| -|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| -|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| -|1970-01-01T08:00:00.400+08:00| 3.252691126023161| -|1970-01-01T08:00:00.500+08:00| 1.414213562373095| -|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| -|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| -|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| -|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| -|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| -|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| -|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| -|1970-01-01T08:00:01.300+08:00| -1.414213562373095| -|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| -|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| -+-----------------------------+-------------------------------------+ -``` - -### FFT - -#### Registration statement - -```sql -create function fft as 'org.apache.iotdb.library.frequency.UDTFFFT' -``` - -#### Usage - -This function is used to calculate the fast Fourier transform (FFT) of a numerical series. - -**Name:** FFT - -**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `method`: The type of FFT, which is 'uniform' (by default) or 'nonuniform'. If the value is 'uniform', the timestamps will be ignored and all data points will be regarded as equidistant. Thus, the equidistant fast Fourier transform algorithm will be applied. If the value is 'nonuniform' (TODO), the non-equidistant fast Fourier transform algorithm will be applied based on timestamps. -+ `result`: The result of FFT, which is 'real', 'imag', 'abs' or 'angle', corresponding to the real part, imaginary part, magnitude and phase angle. By default, the magnitude will be output. -+ `compress`: The parameter of compression, which is within (0,1]. It is the reserved energy ratio of lossy compression. By default, there is no compression. - - -**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. The timestamps starting from 0 only indicate the order. - -**Note:** `NaN` in the input series will be ignored. - -#### Examples - - -##### Uniform FFT - -With the default `type`, uniform FFT is applied. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 2.902113| -|1970-01-01T08:00:01.000+08:00| 1.1755705| -|1970-01-01T08:00:02.000+08:00| -2.1755705| -|1970-01-01T08:00:03.000+08:00| -1.9021131| -|1970-01-01T08:00:04.000+08:00| 1.0| -|1970-01-01T08:00:05.000+08:00| 1.9021131| -|1970-01-01T08:00:06.000+08:00| 0.1755705| -|1970-01-01T08:00:07.000+08:00| -1.1755705| -|1970-01-01T08:00:08.000+08:00| -0.902113| -|1970-01-01T08:00:09.000+08:00| 0.0| -|1970-01-01T08:00:10.000+08:00| 0.902113| -|1970-01-01T08:00:11.000+08:00| 1.1755705| -|1970-01-01T08:00:12.000+08:00| -0.1755705| -|1970-01-01T08:00:13.000+08:00| -1.9021131| -|1970-01-01T08:00:14.000+08:00| -1.0| -|1970-01-01T08:00:15.000+08:00| 1.9021131| -|1970-01-01T08:00:16.000+08:00| 2.1755705| -|1970-01-01T08:00:17.000+08:00| -1.1755705| -|1970-01-01T08:00:18.000+08:00| -2.902113| -|1970-01-01T08:00:19.000+08:00| 0.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select fft(s1) from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+----------------------+ -| Time| fft(root.test.d1.s1)| -+-----------------------------+----------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| -|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| -|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| -|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| -|1970-01-01T08:00:00.004+08:00| 19.999999960195904| -|1970-01-01T08:00:00.005+08:00| 9.999999850988388| -|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| -|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| -|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| -|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| -|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| -|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| -|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| -|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| -|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| -|1970-01-01T08:00:00.015+08:00| 9.999999850988388| -|1970-01-01T08:00:00.016+08:00| 19.999999960195904| -|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| -|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| -|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| -+-----------------------------+----------------------+ -``` - -Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, there are peaks in $k=4$ and $k=5$ of the output. 
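-
-The `result` parameter selects which component of the transform is returned; for instance, a sketch querying the phase angle instead of the default magnitude is:
-
-```sql
-select fft(s1, 'result'='angle') from root.test.d1
-```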
-
-##### Uniform FFT with Compression
-
-Input series is the same as above, the SQL for query is shown below:
-
-```sql
-select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1
-```
-
-Output series:
-
-```
-+-----------------------------+----------------------+----------------------+
-|                         Time|  fft(root.test.d1.s1,|  fft(root.test.d1.s1,|
-|                             |      "result"="real",|      "result"="imag",|
-|                             |    "compress"="0.99")|    "compress"="0.99")|
-+-----------------------------+----------------------+----------------------+
-|1970-01-01T08:00:00.000+08:00|                   0.0|                   0.0|
-|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8|
-|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7|
-|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8|  5.127422242345858E-8|
-|1970-01-01T08:00:00.004+08:00|    19.021130288047125|    -6.180339875198807|
-|1970-01-01T08:00:00.005+08:00|     9.999999850988388| 3.501852745067114E-16|
-|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8|
-+-----------------------------+----------------------+----------------------+
-```
-
-Note: Based on the conjugation of the Fourier transform result, only the first half of the compression result is reserved.
-According to the given parameter, data points are reserved from low frequency to high frequency until the reserved energy ratio exceeds it.
-The last data point is reserved to indicate the length of the series.
-
-### HighPass
-
-#### Registration statement
-
-```sql
-create function highpass as 'org.apache.iotdb.library.frequency.UDTFHighPass'
-```
-
-#### Usage
-
-This function performs high-pass filtering on the input series and extracts components above the cutoff frequency.
-The timestamps of input will be ignored and all data points will be regarded as equidistant.
-
-**Name:** HIGHPASS
-
-**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
-
-**Parameters:**
-
-+ `wpass`: The normalized cutoff frequency, which takes values in (0,1). This parameter is required.
-
-**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input.
-
-**Note:** `NaN` in the input series will be ignored.
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 2.902113| -|1970-01-01T08:00:01.000+08:00| 1.1755705| -|1970-01-01T08:00:02.000+08:00| -2.1755705| -|1970-01-01T08:00:03.000+08:00| -1.9021131| -|1970-01-01T08:00:04.000+08:00| 1.0| -|1970-01-01T08:00:05.000+08:00| 1.9021131| -|1970-01-01T08:00:06.000+08:00| 0.1755705| -|1970-01-01T08:00:07.000+08:00| -1.1755705| -|1970-01-01T08:00:08.000+08:00| -0.902113| -|1970-01-01T08:00:09.000+08:00| 0.0| -|1970-01-01T08:00:10.000+08:00| 0.902113| -|1970-01-01T08:00:11.000+08:00| 1.1755705| -|1970-01-01T08:00:12.000+08:00| -0.1755705| -|1970-01-01T08:00:13.000+08:00| -1.9021131| -|1970-01-01T08:00:14.000+08:00| -1.0| -|1970-01-01T08:00:15.000+08:00| 1.9021131| -|1970-01-01T08:00:16.000+08:00| 2.1755705| -|1970-01-01T08:00:17.000+08:00| -1.1755705| -|1970-01-01T08:00:18.000+08:00| -2.902113| -|1970-01-01T08:00:19.000+08:00| 0.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select highpass(s1,'wpass'='0.45') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-----------------------------------------+ -| Time|highpass(root.test.d1.s1, "wpass"="0.45")| -+-----------------------------+-----------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| -|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| -|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| -|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| -|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| -|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| -|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| -|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| -|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| -|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| -|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| -|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| -|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| -|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| -|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| -|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| -|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| -|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| -|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| -|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| -+-----------------------------+-----------------------------------------+ -``` - -Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=sin(2\pi t/4)$ after high-pass filtering. - -### IFFT - -#### Registration statement - -```sql -create function ifft as 'org.apache.iotdb.library.frequency.UDTFIFFT' -``` - -#### Usage - -This function treats the two input series as the real and imaginary part of a complex series, performs an inverse fast Fourier transform (IFFT), and outputs the real part of the result. -For the input format, please refer to the output format of `FFT` function. -Moreover, the compressed output of `FFT` function is also supported. - -**Name:** IFFT - -**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `start`: The start time of the output series with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is '1970-01-01 08:00:00'. -+ `interval`: The interval of the output series, which is a positive number with an unit. 
The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it is 1s. - -**Output:** Output a single series. The type is DOUBLE. It is strictly equispaced. The values are the results of IFFT. - -**Note:** If a row contains null points or `NaN`, it will be ignored. - -#### Examples - - -Input series: - -``` -+-----------------------------+----------------------+----------------------+ -| Time| root.test.d1.re| root.test.d1.im| -+-----------------------------+----------------------+----------------------+ -|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| -|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| -|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| -|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| -|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| -|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| -|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| -+-----------------------------+----------------------+----------------------+ -``` - - -SQL for query: - -```sql -select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------------------+ -| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| -| | "start"="2021-01-01 00:00:00")| -+-----------------------------+-------------------------------------------------------+ -|2021-01-01T00:00:00.000+08:00| 2.902112992431231| -|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| -|2021-01-01T00:02:00.000+08:00| -2.175570513757101| -|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| -|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| -|2021-01-01T00:05:00.000+08:00| 1.902113046743454| -|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| -|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| -|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| -|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| -|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| -|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| -|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| -|2021-01-01T00:13:00.000+08:00| -1.902113046743454| -|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| -|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| -|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| -|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| -|2021-01-01T00:18:00.000+08:00| -2.902112992431231| -|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| -+-----------------------------+-------------------------------------------------------+ -``` - -### LowPass - -#### Registration statement - -```sql -create function lowpass as 'org.apache.iotdb.library.frequency.UDTFLowPass' -``` - -#### Usage - -This function performs low-pass filtering on the input series and extracts components below the cutoff frequency. -The timestamps of input will be ignored and all data points will be regarded as equidistant. - -**Name:** LOWPASS - -**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `wpass`: The normalized cutoff frequency which values (0,1). This parameter cannot be lacked. - -**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input. - -**Note:** `NaN` in the input series will be ignored. 
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:00.000+08:00| 2.902113| -|1970-01-01T08:00:01.000+08:00| 1.1755705| -|1970-01-01T08:00:02.000+08:00| -2.1755705| -|1970-01-01T08:00:03.000+08:00| -1.9021131| -|1970-01-01T08:00:04.000+08:00| 1.0| -|1970-01-01T08:00:05.000+08:00| 1.9021131| -|1970-01-01T08:00:06.000+08:00| 0.1755705| -|1970-01-01T08:00:07.000+08:00| -1.1755705| -|1970-01-01T08:00:08.000+08:00| -0.902113| -|1970-01-01T08:00:09.000+08:00| 0.0| -|1970-01-01T08:00:10.000+08:00| 0.902113| -|1970-01-01T08:00:11.000+08:00| 1.1755705| -|1970-01-01T08:00:12.000+08:00| -0.1755705| -|1970-01-01T08:00:13.000+08:00| -1.9021131| -|1970-01-01T08:00:14.000+08:00| -1.0| -|1970-01-01T08:00:15.000+08:00| 1.9021131| -|1970-01-01T08:00:16.000+08:00| 2.1755705| -|1970-01-01T08:00:17.000+08:00| -1.1755705| -|1970-01-01T08:00:18.000+08:00| -2.902113| -|1970-01-01T08:00:19.000+08:00| 0.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select lowpass(s1,'wpass'='0.45') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+----------------------------------------+ -| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| -+-----------------------------+----------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| -|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| -|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| -|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| -|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| -|1970-01-01T08:00:05.000+08:00| 1.902113046743454| -|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| -|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| -|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| -|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| -|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| -|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| -|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| -|1970-01-01T08:00:13.000+08:00| -1.902113046743454| -|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| -|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| -|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| -|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| -|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| -|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| -+-----------------------------+----------------------------------------+ -``` - -Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=2sin(2\pi t/5)$ after low-pass filtering. - - -### Envelope - -#### Registration statement - -```sql -create function envelope as 'org.apache.iotdb.library.frequency.UDFEnvelopeAnalysis' -``` - -#### Usage - -This function achieves signal demodulation and envelope extraction by inputting a one-dimensional floating-point array and a user specified modulation frequency. The goal of demodulation is to extract the parts of interest from complex signals, making them easier to understand. For example, demodulation can be used to find the envelope of the signal, that is, the trend of amplitude changes. - -**Name:** Envelope - -**Input:** Only supports a single input sequence, with types INT32/INT64/FLOAT/DOUBLE - - -**Parameters:** - -+ `frequency`: Frequency (optional, positive number. 
If this parameter is not filled in, the system will infer the frequency based on the time interval corresponding to the sequence). -+ `amplification`: Amplification factor (optional, positive integer. The output of the Time column is a set of positive integers and does not output decimals. When the frequency is less than 1, this parameter can be used to amplify the frequency to display normal results). - -**Output:** -+ `Time`: The meaning of the value returned by this column is frequency rather than time. If the output format is time format (e.g. 1970-01-01T08:00: 19.000+08:00), please convert it to a timestamp value. - - -+ `Envelope(Path, 'frequency'='{frequency}')`:Output a single sequence of type DOUBLE, which is the result of envelope analysis. - -**Note:** When the values of the demodulated original sequence are discontinuous, this function will treat it as continuous processing. It is recommended that the analyzed time series be a complete time series of values. It is also recommended to specify a start time and an end time. - -#### Examples - -Input series: - - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s1| -+-----------------------------+---------------+ -|1970-01-01T08:00:01.000+08:00| 1.0 | -|1970-01-01T08:00:02.000+08:00| 2.0 | -|1970-01-01T08:00:03.000+08:00| 3.0 | -|1970-01-01T08:00:04.000+08:00| 4.0 | -|1970-01-01T08:00:05.000+08:00| 5.0 | -|1970-01-01T08:00:06.000+08:00| 6.0 | -|1970-01-01T08:00:07.000+08:00| 7.0 | -|1970-01-01T08:00:08.000+08:00| 8.0 | -|1970-01-01T08:00:09.000+08:00| 9.0 | -|1970-01-01T08:00:10.000+08:00| 10.0 | -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -set time_display_type=long; -select envelope(s1),envelope(s1,'frequency'='1000'),envelope(s1,'amplification'='10') from root.test.d1; -``` - -Output series: - - -``` -+----+-------------------------+---------------------------------------------+-----------------------------------------------+ -|Time|envelope(root.test.d1.s1)|envelope(root.test.d1.s1, "frequency"="1000")|envelope(root.test.d1.s1, "amplification"="10")| -+----+-------------------------+---------------------------------------------+-----------------------------------------------+ -| 0| 6.284350808484124| 6.284350808484124| 6.284350808484124| -| 100| 1.5581923657404393| 1.5581923657404393| null| -| 200| 0.8503211038340728| 0.8503211038340728| null| -| 300| 0.512808785945551| 0.512808785945551| null| -| 400| 0.26361156774506744| 0.26361156774506744| null| -|1000| null| null| 1.5581923657404393| -|2000| null| null| 0.8503211038340728| -|3000| null| null| 0.512808785945551| -|4000| null| null| 0.26361156774506744| -+----+-------------------------+---------------------------------------------+-----------------------------------------------+ - -``` - - -## Data Matching - -### Cov - -#### Registration statement - -```sql -create function cov as 'org.apache.iotdb.library.dmatch.UDAFCov' -``` - -#### Usage - -This function is used to calculate the population covariance. - -**Name:** COV - -**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population covariance. - -**Note:** - -+ If a row contains missing points, null points or `NaN`, it will be ignored; -+ If all rows are ignored, `NaN` will be output. 
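-
-For reference, writing the $n$ rows that remain after this filtering as $(x_i, y_i)$, the population covariance is
-
-$$cov(X,Y)=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$
-
-where $\bar{x}$ and $\bar{y}$ are the means of the remaining values.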
- - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s1|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| -|2020-01-01T00:00:03.000+08:00| 101.0| null| -|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| -|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| -|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| -|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| -|2020-01-01T00:00:12.000+08:00| null| 103.0| -|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| -|2020-01-01T00:00:15.000+08:00| 113.0| null| -|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| -|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| -|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| -|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| -|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| -|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| -|2020-01-01T00:00:30.000+08:00| NaN| 108.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select cov(s1,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------+ -| Time|cov(root.test.d2.s1, root.test.d2.s2)| -+-----------------------------+-------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 12.291666666666666| -+-----------------------------+-------------------------------------+ -``` - -### DTW - -#### Registration statement - -```sql -create function dtw as 'org.apache.iotdb.library.dmatch.UDAFDtw' -``` - -#### Usage - -This function is used to calculate the DTW distance between two input series. - -**Name:** DTW - -**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the DTW distance. - -**Note:** - -+ If a row contains missing points, null points or `NaN`, it will be ignored; -+ If all rows are ignored, `0` will be output. 
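-
-The local cost function is not spelled out here; classical DTW with an absolute-difference cost $d(x,y)=|x-y|$ is consistent with the example below, where 20 aligned pairs of $|1.0-2.0|$ yield a distance of 20. Under that assumption, the distance follows the standard recurrence
-
-$$D(i,j)=d(x_i,y_j)+\min\{D(i-1,j),D(i,j-1),D(i-1,j-1)\}$$
-
-with $D(0,0)=0$, $D(i,0)=D(0,j)=\infty$, and the reported value being $D(N,M)$ for aligned series of lengths $N$ and $M$.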
- - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s1|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.016+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| -|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select dtw(s1,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------+ -| Time|dtw(root.test.d2.s1, root.test.d2.s2)| -+-----------------------------+-------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 20.0| -+-----------------------------+-------------------------------------+ -``` - -### Pearson - -#### Registration statement - -```sql -create function pearson as 'org.apache.iotdb.library.dmatch.UDAFPearson' -``` - -#### Usage - -This function is used to calculate the Pearson Correlation Coefficient. - -**Name:** PEARSON - -**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the Pearson Correlation Coefficient. - -**Note:** - -+ If a row contains missing points, null points or `NaN`, it will be ignored; -+ If all rows are ignored, `NaN` will be output. 
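-
-For reference, over the $n$ rows that remain after this filtering, the Pearson Correlation Coefficient is
-
-$$r=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$
-
-where $\bar{x}$ and $\bar{y}$ are the means of the remaining values.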
- - -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d2.s1|root.test.d2.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| -|2020-01-01T00:00:03.000+08:00| 101.0| null| -|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| -|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| -|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| -|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| -|2020-01-01T00:00:12.000+08:00| null| 103.0| -|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| -|2020-01-01T00:00:15.000+08:00| 113.0| null| -|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| -|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| -|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| -|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| -|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| -|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| -|2020-01-01T00:00:30.000+08:00| NaN| 108.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select pearson(s1,s2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-----------------------------------------+ -| Time|pearson(root.test.d2.s1, root.test.d2.s2)| -+-----------------------------+-----------------------------------------+ -|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| -+-----------------------------+-----------------------------------------+ -``` - -### PtnSym - -#### Registration statement - -```sql -create function ptnsym as 'org.apache.iotdb.library.dmatch.UDTFPtnSym' -``` - -#### Usage - -This function is used to find all symmetric subseries in the input whose degree of symmetry is less than the threshold. -The degree of symmetry is calculated by DTW. -The smaller the degree, the more symmetrical the series is. - -**Name:** PATTERNSYMMETRIC - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE - -**Parameter:** - -+ `window`: The length of the symmetric subseries. It's a positive integer and the default value is 10. -+ `threshold`: The threshold of the degree of symmetry. It's non-negative. Only the subseries whose degree of symmetry is below it will be output. By default, all subseries will be output. - - -**Output Series:** Output a single series. The type is DOUBLE. Each data point in the output series corresponds to a symmetric subseries. The output timestamp is the starting timestamp of the subseries and the output value is the degree of symmetry. 
- -#### Example - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d1.s4| -+-----------------------------+---------------+ -|2021-01-01T12:00:00.000+08:00| 1.0| -|2021-01-01T12:00:01.000+08:00| 2.0| -|2021-01-01T12:00:02.000+08:00| 3.0| -|2021-01-01T12:00:03.000+08:00| 2.0| -|2021-01-01T12:00:04.000+08:00| 1.0| -|2021-01-01T12:00:05.000+08:00| 1.0| -|2021-01-01T12:00:06.000+08:00| 1.0| -|2021-01-01T12:00:07.000+08:00| 1.0| -|2021-01-01T12:00:08.000+08:00| 2.0| -|2021-01-01T12:00:09.000+08:00| 3.0| -|2021-01-01T12:00:10.000+08:00| 2.0| -|2021-01-01T12:00:11.000+08:00| 1.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------+ -| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| -+-----------------------------+------------------------------------------------------+ -|2021-01-01T12:00:00.000+08:00| 0.0| -|2021-01-01T12:00:07.000+08:00| 0.0| -+-----------------------------+------------------------------------------------------+ -``` - -### XCorr - -#### Registration statement - -```sql -create function xcorr as 'org.apache.iotdb.library.dmatch.UDTFXCorr' -``` - -#### Usage - -This function is used to calculate the cross correlation function of given two time series. -For discrete time series, cross correlation is given by -$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ -which represent the similarities between two series with different index shifts. - -**Name:** XCORR - -**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Output Series:** Output a single series with DOUBLE as datatype. -There are $2N-1$ data points in the series, the center of which represents the cross correlation -calculated with pre-aligned series(that is $CR(0)$ in the formula above), -and the previous(or post) values represent those with shifting the latter series forward(or backward otherwise) -until the two series are no longer overlapped(not included). -In short, the values of output series are given by(index starts from 1) -$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ -$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ - -**Note:** - -+ `null` and `NaN` values in the input series will be ignored and treated as 0. 
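-
-The following is a minimal, stand-alone Java sketch of the two formulas above (a reference computation for clarity, not the library's implementation). It assumes the two series have already been time-aligned and that `null`/`NaN` values have been replaced by 0, and it reproduces the output of the example that follows.
-
-``` java
-import java.util.Arrays;
-
-public class XCorrSketch {
-
-    // Computes OS[1..2N-1] exactly as defined by the two formulas above.
-    static double[] xcorr(double[] s1, double[] s2) {
-        int n = s1.length; // both series have length N after alignment
-        double[] out = new double[2 * n - 1];
-        for (int i = 1; i <= 2 * n - 1; i++) { // 1-based index, as in the formulas
-            double sum = 0;
-            if (i <= n) {
-                for (int m = 1; m <= i; m++) {
-                    sum += s1[m - 1] * s2[n - i + m - 1];
-                }
-            } else {
-                for (int m = 1; m <= 2 * n - i; m++) {
-                    sum += s1[i - n + m - 1] * s2[m - 1];
-                }
-            }
-            out[i - 1] = sum / n;
-        }
-        return out;
-    }
-
-    public static void main(String[] args) {
-        // Values from the example below, with null/NaN replaced by 0.
-        double[] s1 = {0, 2, 3, 4, 5};
-        double[] s2 = {6, 7, 0, 9, 10};
-        // Prints [0.0, 4.0, 9.6, 13.4, 20.0, 15.6, 9.2, 11.8, 6.0]
-        System.out.println(Arrays.toString(xcorr(s1, s2)));
-    }
-}
-```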
- -#### Examples - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d1.s1|root.test.d1.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:01.000+08:00| null| 6| -|2020-01-01T00:00:02.000+08:00| 2| 7| -|2020-01-01T00:00:03.000+08:00| 3| NaN| -|2020-01-01T00:00:04.000+08:00| 4| 9| -|2020-01-01T00:00:05.000+08:00| 5| 10| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 -``` - -Output series: - -``` -+-----------------------------+---------------------------------------+ -| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| -+-----------------------------+---------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 0.0| -|1970-01-01T08:00:00.002+08:00| 4.0| -|1970-01-01T08:00:00.003+08:00| 9.6| -|1970-01-01T08:00:00.004+08:00| 13.4| -|1970-01-01T08:00:00.005+08:00| 20.0| -|1970-01-01T08:00:00.006+08:00| 15.6| -|1970-01-01T08:00:00.007+08:00| 9.2| -|1970-01-01T08:00:00.008+08:00| 11.8| -|1970-01-01T08:00:00.009+08:00| 6.0| -+-----------------------------+---------------------------------------+ -``` - - - -## Data Repairing - -### TimestampRepair - -#### Registration statement - -```sql -create function timestamprepair as 'org.apache.iotdb.library.drepair.UDTFTimestampRepair' -``` - -#### Usage - -This function is used for timestamp repair. -According to the given standard time interval, -the method of minimizing the repair cost is adopted. -By fine-tuning the timestamps, -the original data with unstable timestamp interval is repaired to strictly equispaced data. -If no standard time interval is given, -this function will use the **median**, **mode** or **cluster** of the time interval to estimate the standard time interval. - -**Name:** TIMESTAMPREPAIR - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `interval`: The standard time interval whose unit is millisecond. It is a positive integer. By default, it will be estimated according to the given method. -+ `method`: The method to estimate the standard time interval, which is 'median', 'mode' or 'cluster'. This parameter is only valid when `interval` is not given. By default, median will be used. - -**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. - -#### Examples - -##### Manually Specify the Standard Time Interval - -When `interval` is given, this function repairs according to the given standard time interval. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s1| -+-----------------------------+---------------+ -|2021-07-01T12:00:00.000+08:00| 1.0| -|2021-07-01T12:00:10.000+08:00| 2.0| -|2021-07-01T12:00:19.000+08:00| 3.0| -|2021-07-01T12:00:30.000+08:00| 4.0| -|2021-07-01T12:00:40.000+08:00| 5.0| -|2021-07-01T12:00:50.000+08:00| 6.0| -|2021-07-01T12:01:01.000+08:00| 7.0| -|2021-07-01T12:01:11.000+08:00| 8.0| -|2021-07-01T12:01:21.000+08:00| 9.0| -|2021-07-01T12:01:31.000+08:00| 10.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select timestamprepair(s1,'interval'='10000') from root.test.d2 -``` - -Output series: - - -``` -+-----------------------------+----------------------------------------------------+ -| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| -+-----------------------------+----------------------------------------------------+ -|2021-07-01T12:00:00.000+08:00| 1.0| -|2021-07-01T12:00:10.000+08:00| 2.0| -|2021-07-01T12:00:20.000+08:00| 3.0| -|2021-07-01T12:00:30.000+08:00| 4.0| -|2021-07-01T12:00:40.000+08:00| 5.0| -|2021-07-01T12:00:50.000+08:00| 6.0| -|2021-07-01T12:01:00.000+08:00| 7.0| -|2021-07-01T12:01:10.000+08:00| 8.0| -|2021-07-01T12:01:20.000+08:00| 9.0| -|2021-07-01T12:01:30.000+08:00| 10.0| -+-----------------------------+----------------------------------------------------+ -``` - -##### Automatically Estimate the Standard Time Interval - -When `interval` is default, this function estimates the standard time interval. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select timestamprepair(s1) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+--------------------------------+ -| Time|timestamprepair(root.test.d2.s1)| -+-----------------------------+--------------------------------+ -|2021-07-01T12:00:00.000+08:00| 1.0| -|2021-07-01T12:00:10.000+08:00| 2.0| -|2021-07-01T12:00:20.000+08:00| 3.0| -|2021-07-01T12:00:30.000+08:00| 4.0| -|2021-07-01T12:00:40.000+08:00| 5.0| -|2021-07-01T12:00:50.000+08:00| 6.0| -|2021-07-01T12:01:00.000+08:00| 7.0| -|2021-07-01T12:01:10.000+08:00| 8.0| -|2021-07-01T12:01:20.000+08:00| 9.0| -|2021-07-01T12:01:30.000+08:00| 10.0| -+-----------------------------+--------------------------------+ -``` - -### ValueFill - -#### Registration statement - -```sql -create function valuefill as 'org.apache.iotdb.library.drepair.UDTFValueFill' -``` - -#### Usage - -This function is used to impute time series. Several methods are supported. - -**Name**: ValueFill -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, default "linear". - Method to use for imputation in series. "mean": use global mean value to fill holes; "previous": propagate last valid observation forward to next valid. "linear": simplest interpolation method; "likelihood":Maximum likelihood estimation based on the normal distribution of speed; "AR": auto regression; "MA": moving average; "SCREEN": speed constraint. - -**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. - -**Note:** AR method use AR(1) model. Input value should be auto-correlated, or the function would output a single point (0, 0.0). - -#### Examples - -##### Fill with linear - -When `method` is "linear" or the default, Screen method is used to impute. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| NaN| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| NaN| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| NaN| -|2020-01-01T00:00:22.000+08:00| NaN| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| 128.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select valuefill(s1) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-----------------------+ -| Time|valuefill(root.test.d2)| -+-----------------------------+-----------------------+ -|2020-01-01T00:00:02.000+08:00| NaN| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 108.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.7| -|2020-01-01T00:00:22.000+08:00| 121.3| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| 128.0| -+-----------------------------+-----------------------+ -``` - -##### Previous Fill - -When `method` is "previous", previous method is used. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select valuefill(s1,"method"="previous") from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------+ -| Time|valuefill(root.test.d2,"method"="previous")| -+-----------------------------+-------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| NaN| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 110.5| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 116.0| -|2020-01-01T00:00:22.000+08:00| 116.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| 128.0| -+-----------------------------+-------------------------------------------+ -``` - -### ValueRepair - -#### Registration statement - -```sql -create function valuerepair as 'org.apache.iotdb.library.drepair.UDTFValueRepair' -``` - -#### Usage - -This function is used to repair the value of the time series. -Currently, two methods are supported: -**Screen** is a method based on speed threshold, which makes all speeds meet the threshold requirements under the premise of minimum changes; -**LsGreedy** is a method based on speed change likelihood, which models speed changes as Gaussian distribution, and uses a greedy algorithm to maximize the likelihood. - - -**Name:** VALUEREPAIR - -**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. 
- -**Parameters:** - -+ `method`: The method used to repair, which is 'Screen' or 'LsGreedy'. By default, Screen is used. -+ `minSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds below it will be regarded as outliers. By default, it is the median minus 3 times of median absolute deviation. -+ `maxSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds above it will be regarded as outliers. By default, it is the median plus 3 times of median absolute deviation. -+ `center`: This parameter is only valid with LsGreedy. It is the center of the Gaussian distribution of speed changes. By default, it is 0. -+ `sigma`: This parameter is only valid with LsGreedy. It is the standard deviation of the Gaussian distribution of speed changes. By default, it is the median absolute deviation. - -**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. - -**Note:** `NaN` will be filled with linear interpolation before repairing. - -#### Examples - -##### Repair with Screen - -When `method` is 'Screen' or the default, Screen method is used. - -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 126.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 100.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| NaN| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select valuerepair(s1) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+----------------------------+ -| Time|valuerepair(root.test.d2.s1)| -+-----------------------------+----------------------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 106.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| 128.0| -+-----------------------------+----------------------------+ -``` - -##### Repair with LsGreedy - -When `method` is 'LsGreedy', LsGreedy method is used. 
- -Input series is the same as above, the SQL for query is shown below: - -```sql -select valuerepair(s1,'method'='LsGreedy') from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------------+ -| Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")| -+-----------------------------+-------------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:03.000+08:00| 101.0| -|2020-01-01T00:00:04.000+08:00| 102.0| -|2020-01-01T00:00:06.000+08:00| 104.0| -|2020-01-01T00:00:08.000+08:00| 106.0| -|2020-01-01T00:00:10.000+08:00| 108.0| -|2020-01-01T00:00:14.000+08:00| 112.0| -|2020-01-01T00:00:15.000+08:00| 113.0| -|2020-01-01T00:00:16.000+08:00| 114.0| -|2020-01-01T00:00:18.000+08:00| 116.0| -|2020-01-01T00:00:20.000+08:00| 118.0| -|2020-01-01T00:00:22.000+08:00| 120.0| -|2020-01-01T00:00:26.000+08:00| 124.0| -|2020-01-01T00:00:28.000+08:00| 126.0| -|2020-01-01T00:00:30.000+08:00| 128.0| -+-----------------------------+-------------------------------------------------+ -``` - -### MasterRepair - -#### Usage - -This function is used to clean time series with master data. - -**Name**: MasterRepair -**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `omega`: The window size. It is a non-negative integer whose unit is millisecond. By default, it will be estimated according to the distances of two tuples with various time differences. -+ `eta`: The distance threshold. It is a positive number. By default, it will be estimated according to the distance distribution of tuples in windows. -+ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple dis- tance of the k-th nearest neighbor in the master data. -+ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column. - -**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. 
- -#### Examples - -Input series: - -``` -+-----------------------------+------------+------------+------------+------------+------------+------------+ -| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| -+-----------------------------+------------+------------+------------+------------+------------+------------+ -|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| -|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| -|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| -|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| -|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| -|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| -|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| -|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| -+-----------------------------+------------+------------+------------+------------+------------+------------+ -``` - -SQL for query: - -```sql -select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test -``` - -Output series: - - -``` -+-----------------------------+-------------------------------------------------------------------------------------------+ -| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| -+-----------------------------+-------------------------------------------------------------------------------------------+ -|2021-07-01T12:00:01.000+08:00| 1704| -|2021-07-01T12:00:02.000+08:00| 1702| -|2021-07-01T12:00:03.000+08:00| 1702| -|2021-07-01T12:00:04.000+08:00| 1701| -|2021-07-01T12:00:07.000+08:00| 1703| -|2021-07-01T12:00:08.000+08:00| 1704| -|2021-07-01T12:01:09.000+08:00| 1705| -|2021-07-01T12:01:10.000+08:00| 1706| -+-----------------------------+-------------------------------------------------------------------------------------------+ -``` - -### SeasonalRepair - -#### Usage -This function is used to repair the value of the seasonal time series via decomposition. Currently, two methods are supported: **Classical** - detect irregular fluctuations through residual component decomposed by classical decomposition, and repair them through moving average; **Improved** - detect irregular fluctuations through residual component decomposed by improved decomposition, and repair them through moving median. - -**Name:** SEASONALREPAIR - -**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -+ `method`: The decomposition method used to repair, which is 'Classical' or 'Improved'. By default, classical decomposition is used. -+ `period`: It is the period of the time series. -+ `k`: It is the range threshold of residual term, which limits the degree to which the residual term is off-center. By default, it is 9. -+ `max_iter`: It is the maximum number of iterations for the algorithm. By default, it is 10. - -**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. - -**Note:** `NaN` will be filled with linear interpolation before repairing. - -#### Examples - -##### Repair with Classical - -When `method` is 'Classical' or default value, classical decomposition method is used. 
- -Input series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d2.s1| -+-----------------------------+---------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:04.000+08:00| 120.0| -|2020-01-01T00:00:06.000+08:00| 80.0| -|2020-01-01T00:00:08.000+08:00| 100.5| -|2020-01-01T00:00:10.000+08:00| 119.5| -|2020-01-01T00:00:12.000+08:00| 101.0| -|2020-01-01T00:00:14.000+08:00| 99.5| -|2020-01-01T00:00:16.000+08:00| 119.0| -|2020-01-01T00:00:18.000+08:00| 80.5| -|2020-01-01T00:00:20.000+08:00| 99.0| -|2020-01-01T00:00:22.000+08:00| 121.0| -|2020-01-01T00:00:24.000+08:00| 79.5| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------+ -| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| -+-----------------------------+--------------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:04.000+08:00| 120.0| -|2020-01-01T00:00:06.000+08:00| 80.0| -|2020-01-01T00:00:08.000+08:00| 100.5| -|2020-01-01T00:00:10.000+08:00| 119.5| -|2020-01-01T00:00:12.000+08:00| 87.0| -|2020-01-01T00:00:14.000+08:00| 99.5| -|2020-01-01T00:00:16.000+08:00| 119.0| -|2020-01-01T00:00:18.000+08:00| 80.5| -|2020-01-01T00:00:20.000+08:00| 99.0| -|2020-01-01T00:00:22.000+08:00| 121.0| -|2020-01-01T00:00:24.000+08:00| 79.5| -+-----------------------------+--------------------------------------------------+ -``` - -##### Repair with Improved -When `method` is 'Improved', improved decomposition method is used. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 -``` - -Output series: - -``` -+-----------------------------+-------------------------------------------------------------+ -| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| -+-----------------------------+-------------------------------------------------------------+ -|2020-01-01T00:00:02.000+08:00| 100.0| -|2020-01-01T00:00:04.000+08:00| 120.0| -|2020-01-01T00:00:06.000+08:00| 80.0| -|2020-01-01T00:00:08.000+08:00| 100.5| -|2020-01-01T00:00:10.000+08:00| 119.5| -|2020-01-01T00:00:12.000+08:00| 81.5| -|2020-01-01T00:00:14.000+08:00| 99.5| -|2020-01-01T00:00:16.000+08:00| 119.0| -|2020-01-01T00:00:18.000+08:00| 80.5| -|2020-01-01T00:00:20.000+08:00| 99.0| -|2020-01-01T00:00:22.000+08:00| 121.0| -|2020-01-01T00:00:24.000+08:00| 79.5| -+-----------------------------+-------------------------------------------------------------+ -``` - - - -## Series Discovery - -### ConsecutiveSequences - -#### Registration statement - -```sql -create function consecutivesequences as 'org.apache.iotdb.library.series.UDTFConsecutiveSequences' -``` - -#### Usage - -This function is used to find locally longest consecutive subsequences in strictly equispaced multidimensional data. - -Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. - -Consecutive subsequence is the subsequence that is strictly equispaced with the standard time interval without any missing data. If a consecutive subsequence is not a proper subsequence of any consecutive subsequence, it is locally longest. 
- -**Name:** CONSECUTIVESEQUENCES - -**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. - -**Parameters:** - -+ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. - -**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a locally longest consecutive subsequence. The output timestamp is the starting timestamp of the subsequence and the output value is the number of data points in the subsequence. - -**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. - -#### Examples - -##### Manually Specify the Standard Time Interval - -It's able to manually specify the standard time interval by the parameter `gap`. It's notable that false parameter leads to false output. - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d1.s1|root.test.d1.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:40:00.000+08:00| 1.0| null| -|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------------------+ -| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, "gap"="5m")| -+-----------------------------+------------------------------------------------------------------+ -|2020-01-01T00:00:00.000+08:00| 3| -|2020-01-01T00:20:00.000+08:00| 4| -|2020-01-01T00:45:00.000+08:00| 2| -+-----------------------------+------------------------------------------------------------------+ -``` - - -##### Automatically Estimate the Standard Time Interval - -When `gap` is default, this function estimates the standard time interval by the mode of time intervals and gets the same results. Therefore, this usage is more recommended. - -Input series is the same as above, the SQL for query is shown below: - -```sql -select consecutivesequences(s1,s2) from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+------------------------------------------------------+ -| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| -+-----------------------------+------------------------------------------------------+ -|2020-01-01T00:00:00.000+08:00| 3| -|2020-01-01T00:20:00.000+08:00| 4| -|2020-01-01T00:45:00.000+08:00| 2| -+-----------------------------+------------------------------------------------------+ -``` - -### ConsecutiveWindows - -#### Registration statement - -```sql -create function consecutivewindows as 'org.apache.iotdb.library.series.UDTFConsecutiveWindows' -``` - -#### Usage - -This function is used to find consecutive windows of specified length in strictly equispaced multidimensional data. - -Strictly equispaced data is the data whose time intervals are strictly equal. 
Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. - -Consecutive window is the subsequence that is strictly equispaced with the standard time interval without any missing data. - -**Name:** CONSECUTIVEWINDOWS - -**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. - -**Parameters:** - -+ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. -+ `length`: The length of the window which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. - -**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a consecutive window. The output timestamp is the starting timestamp of the window and the output value is the number of data points in the window. - -**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. - -#### Examples - - -Input series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d1.s1|root.test.d1.s2| -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:40:00.000+08:00| 1.0| null| -|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| -|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 -``` - -Output series: - -``` -+-----------------------------+--------------------------------------------------------------------+ -| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| -+-----------------------------+--------------------------------------------------------------------+ -|2020-01-01T00:00:00.000+08:00| 3| -|2020-01-01T00:20:00.000+08:00| 3| -|2020-01-01T00:25:00.000+08:00| 3| -+-----------------------------+--------------------------------------------------------------------+ -``` - - - -## Machine Learning - -### AR - -#### Registration statement - -```sql -create function ar as 'org.apache.iotdb.library.dlearn.UDTFAR' -``` - -#### Usage - -This function is used to learn the coefficients of the autoregressive models for a time series. - -**Name:** AR - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -- `p`: The order of the autoregressive model. Its default value is 1. - -**Output Series:** Output a single series. The type is DOUBLE. The first line corresponds to the first order coefficient, and so on. - -**Note:** - -- Parameter `p` should be a positive integer. -- Most points in the series should be sampled at a constant time interval. -- Linear interpolation is applied for the missing points in the series. 
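-
-For context, an autoregressive model of order $p$ has the form
-
-$$x_t=\varphi_1 x_{t-1}+\varphi_2 x_{t-2}+\cdots+\varphi_p x_{t-p}+\varepsilon_t$$
-
-and the $i$-th row of the output series is the estimate of $\varphi_i$.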
- -#### Examples - -##### Assigning Model Order - -Input Series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d0.s0| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| -4.0| -|2020-01-01T00:00:02.000+08:00| -3.0| -|2020-01-01T00:00:03.000+08:00| -2.0| -|2020-01-01T00:00:04.000+08:00| -1.0| -|2020-01-01T00:00:05.000+08:00| 0.0| -|2020-01-01T00:00:06.000+08:00| 1.0| -|2020-01-01T00:00:07.000+08:00| 2.0| -|2020-01-01T00:00:08.000+08:00| 3.0| -|2020-01-01T00:00:09.000+08:00| 4.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select ar(s0,"p"="2") from root.test.d0 -``` - -Output Series: - -``` -+-----------------------------+---------------------------+ -| Time|ar(root.test.d0.s0,"p"="2")| -+-----------------------------+---------------------------+ -|1970-01-01T08:00:00.001+08:00| 0.9429| -|1970-01-01T08:00:00.002+08:00| -0.2571| -+-----------------------------+---------------------------+ -``` - -### Representation - -#### Usage - -This function is used to represent a time series. - -**Name:** Representation - -**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -- `tb`: The number of timestamp blocks. Its default value is 10. -- `vb`: The number of value blocks. Its default value is 10. - -**Output Series:** Output a single series. The type is INT32. The length is `tb*vb`. The timestamps starting from 0 only indicate the order. - -**Note:** - -- Parameters `tb` and `vb` should be positive integers. - -#### Examples - -##### Assigning Window Size and Dimension - -Input Series: - -``` -+-----------------------------+---------------+ -| Time|root.test.d0.s0| -+-----------------------------+---------------+ -|2020-01-01T00:00:01.000+08:00| -4.0| -|2020-01-01T00:00:02.000+08:00| -3.0| -|2020-01-01T00:00:03.000+08:00| -2.0| -|2020-01-01T00:00:04.000+08:00| -1.0| -|2020-01-01T00:00:05.000+08:00| 0.0| -|2020-01-01T00:00:06.000+08:00| 1.0| -|2020-01-01T00:00:07.000+08:00| 2.0| -|2020-01-01T00:00:08.000+08:00| 3.0| -|2020-01-01T00:00:09.000+08:00| 4.0| -+-----------------------------+---------------+ -``` - -SQL for query: - -```sql -select representation(s0,"tb"="3","vb"="2") from root.test.d0 -``` - -Output Series: - -``` -+-----------------------------+-------------------------------------------------+ -| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| -+-----------------------------+-------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1| -|1970-01-01T08:00:00.002+08:00| 1| -|1970-01-01T08:00:00.003+08:00| 0| -|1970-01-01T08:00:00.004+08:00| 0| -|1970-01-01T08:00:00.005+08:00| 1| -|1970-01-01T08:00:00.006+08:00| 1| -+-----------------------------+-------------------------------------------------+ -``` - -### RM - -#### Usage - -This function is used to calculate the matching score of two time series according to the representation. - -**Name:** RM - -**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. - -**Parameters:** - -- `tb`: The number of timestamp blocks. Its default value is 10. -- `vb`: The number of value blocks. Its default value is 10. - -**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the matching score. - -**Note:** - -- Parameters `tb` and `vb` should be positive integers. 
- -#### Examples - -##### Assigning Window Size and Dimension - -Input Series: - -``` -+-----------------------------+---------------+---------------+ -| Time|root.test.d0.s0|root.test.d0.s1 -+-----------------------------+---------------+---------------+ -|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| -|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| -|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| -|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| -|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| -|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| -|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| -|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| -|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| -+-----------------------------+---------------+---------------+ -``` - -SQL for query: - -```sql -select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 -``` - -Output Series: - -``` -+-----------------------------+-----------------------------------------------------+ -| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| -+-----------------------------+-----------------------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 1.00| -+-----------------------------+-----------------------------------------------------+ -``` - diff --git a/src/UserGuide/V1.3.0-2/Reference/UDF-development.md b/src/UserGuide/V1.3.0-2/Reference/UDF-development.md deleted file mode 100644 index 0a3efb6bb..000000000 --- a/src/UserGuide/V1.3.0-2/Reference/UDF-development.md +++ /dev/null @@ -1,743 +0,0 @@ - # UDF development - -## 1. UDF development - -### 1.1 UDF Development Dependencies - -If you use [Maven](http://search.maven.org/), you can search for the development dependencies listed below from the [Maven repository](http://search.maven.org/) . Please note that you must select the same dependency version as the target IoTDB server version for development. - -``` xml - - org.apache.iotdb - udf-api - 1.0.0 - provided - -``` - -## 1.2 UDTF(User Defined Timeseries Generating Function) - -To write a UDTF, you need to inherit the `org.apache.iotdb.udf.api.UDTF` class, and at least implement the `beforeStart` method and a `transform` method. - -#### Interface Description: - -| Interface definition | Description | Required to Implement | -| :----------------------------------------------------------- | :----------------------------------------------------------- | ----------------------------------------------------- | -| void validate(UDFParameterValidator validator) throws Exception | This method is mainly used to validate `UDFParameters` and it is executed before `beforeStart(UDFParameters, UDTFConfigurations)` is called. | Optional | -| void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception | The initialization method to call the user-defined initialization behavior before a UDTF processes the input data. Every time a user executes a UDTF query, the framework will construct a new UDF instance, and `beforeStart` will be called. | Required | -| Object transform(Row row) throws Exception | This method is called by the framework. This data processing method will be called when you choose to use the `MappableRowByRowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `Row`, and the transformation result should be returned. | Required to implement at least one `transform` method | -| void transform(Column[] columns, ColumnBuilder builder) throws Exception | This method is called by the framework. 
This data processing method will be called when you choose to use the `MappableRowByRowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `Column[]`, and the transformation result should be output by `ColumnBuilder`. You need to call the data collection method provided by `builder` to determine the output data. | Required to implement at least one `transform` method | -| void transform(Row row, PointCollector collector) throws Exception | This method is called by the framework. This data processing method will be called when you choose to use the `RowByRowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `Row`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method | -| void transform(RowWindow rowWindow, PointCollector collector) throws Exception | This method is called by the framework. This data processing method will be called when you choose to use the `SlidingSizeWindowAccessStrategy` or `SlidingTimeWindowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `RowWindow`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method | -| void terminate(PointCollector collector) throws Exception | This method is called by the framework. This method will be called once after all `transform` calls have been executed. In a single UDF query, this method will and will only be called once. You need to call the data collection method provided by `collector` to determine the output data. | Optional | -| void beforeDestroy() | This method is called by the framework after the last input data is processed, and will only be called once in the life cycle of each UDF instance. | Optional | - -In the life cycle of a UDTF instance, the calling sequence of each method is as follows: - -1. void validate(UDFParameterValidator validator) throws Exception -2. void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception -3. `Object transform(Row row) throws Exception` or `void transform(Column[] columns, ColumnBuilder builder) throws Exception` or `void transform(Row row, PointCollector collector) throws Exception` or `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` -4. void terminate(PointCollector collector) throws Exception -5. void beforeDestroy() - -> Note that every time the framework executes a UDTF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDTF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDTF without considering the influence of concurrency and other factors. - -#### Detailed interface introduction: - -1. **void validate(UDFParameterValidator validator) throws Exception** - -The `validate` method is used to validate the parameters entered by the user. - -In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom verification. - -Please refer to the Javadoc for the usage of `UDFParameterValidator`. - - -2. 
**void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception** - -This method is mainly used to customize UDTF. In this method, the user can do the following things: - -1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user. -2. Set the strategy to access the raw data and set the output data type in UDTFConfigurations. -3. Create resources, such as establishing external connections, opening files, etc. - - -2.1 **UDFParameters** - -`UDFParameters` is used to parse UDF parameters in SQL statements (the part in parentheses after the UDF function name in SQL). The input parameters have two parts. The first part is data types of the time series that the UDF needs to process, and the second part is the key-value pair attributes for customization. Only the second part can be empty. - - -Example: - -``` sql -SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d; -``` - -Usage: - -``` java -void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { - String stringValue = parameters.getString("key1"); // iotdb - Float floatValue = parameters.getFloat("key2"); // 123.45 - Double doubleValue = parameters.getDouble("key3"); // null - int intValue = parameters.getIntOrDefault("key4", 678); // 678 - // do something - - // configurations - // ... -} -``` - - -2.2 **UDTFConfigurations** - -You must use `UDTFConfigurations` to specify the strategy used by UDF to access raw data and the type of output sequence. - -Usage: - -``` java -void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { - // parameters - // ... - - // configurations - configurations - .setAccessStrategy(new RowByRowAccessStrategy()) - .setOutputDataType(Type.INT32); -} -``` - -The `setAccessStrategy` method is used to set the UDF's strategy for accessing the raw data, and the `setOutputDataType` method is used to set the data type of the output sequence. - - 2.2.1 **setAccessStrategy** - - -Note that the raw data access strategy you set here determines which `transform` method the framework will call. Please implement the `transform` method corresponding to the raw data access strategy. Of course, you can also dynamically decide which strategy to set based on the attribute parameters parsed by `UDFParameters`. Therefore, two `transform` methods are also allowed to be implemented in one UDF. - -The following are the strategies you can set: - -| Interface definition | Description | The `transform` Method to Call | -| :-------------------------------- | :----------------------------------------------------------- | ------------------------------------------------------------ | -| MappableRowByRowStrategy | Custom scalar function
The framework will call the `transform` method once for each row of raw data input, with k columns of time series and 1 row of data as input, and 1 column of time series and 1 row of data as output. It can be used in any clause and expression where scalar functions appear, such as select clauses, where clauses, etc. | void transform(Column[] columns, ColumnBuilder builder) throws Exception or Object transform(Row row) throws Exception | -| RowByRowAccessStrategy | Customize time series generation function to process raw data line by line.
The framework will call the `transform` method once for each row of raw data input, inputting k columns of time series and 1 row of data, and outputting 1 column of time series and n rows of data.
When a sequence is input, the row serves as a data point for the input sequence.
When multiple sequences are input, after aligning the input sequences in time, each row serves as a data point for the input sequence.
(In a row of data, there may be a column with a `null` value, but not all columns are `null`) | void transform(Row row, PointCollector collector) throws Exception | -| SlidingTimeWindowAccessStrategy | Customize time series generation functions to process raw data in a sliding time window manner.
The framework will call the `transform` method once for each raw data input window, input k columns of time series m rows of data, and output 1 column of time series n rows of data.
A window may contain multiple rows of data, and after aligning the input sequence in time, each window serves as a data point for the input sequence.
(Each window may have i rows, and each row of data may have a column with a `null` value, but not all of them are `null`) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | -| SlidingSizeWindowAccessStrategy | Customize the time series generation function to process raw data in a fixed number of rows, meaning that each data processing window will contain a fixed number of rows of data (except for the last window).
The framework will call the `transform` method once for each raw data input window, input k columns of time series m rows of data, and output 1 column of time series n rows of data.
A window may contain multiple rows of data, and after aligning the input sequence in time, each window serves as a data point for the input sequence.
(Each window may have i rows, and each row of data may have a column with a `null` value, but not all of them are `null`) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | -| SessionTimeWindowAccessStrategy | Customize time series generation functions to process raw data in a session window format.
The framework will call the `transform` method once for each raw data input window, input k columns of time series m rows of data, and output 1 column of time series n rows of data.
A window may contain multiple rows of data, and after aligning the input sequence in time, each window serves as a data point for the input sequence.
(Each window may have i rows, and each row of data may have a column with a `null` value, but not all of them are `null`) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | -| StateWindowAccessStrategy | Customize time series generation functions to process raw data in a state window format.
The framework will call the `transform` method once for each raw data input window, inputting 1 column of time series m rows of data and outputting 1 column of time series n rows of data.
A window may contain multiple rows of data, and currently windows can only be opened over one physical quantity, that is, one column of data. | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | - - -#### Interface Description: - -- `MappableRowByRowAccessStrategy` and `RowByRowAccessStrategy`: neither strategy requires any constructor parameters. - -- `SlidingTimeWindowAccessStrategy` - -Window opening diagram: - - - -`SlidingTimeWindowAccessStrategy`: `SlidingTimeWindowAccessStrategy` has many constructors; you can pass 3 types of parameters to them: - -- Parameter 1: The display window on the time axis - -The first type of parameter is optional. If the parameters are not provided, the beginning time of the display window will be set to the same as the minimum timestamp of the query result set, and the ending time of the display window will be set to the same as the maximum timestamp of the query result set. - -- Parameter 2: Time interval for dividing the time axis (should be positive) -- Parameter 3: Time sliding step (not required to be greater than or equal to the time interval, but must be a positive number) - -The sliding step parameter is also optional. If the parameter is not provided, the sliding step will be set to the same as the time interval for dividing the time axis. - -The relationship between the three types of parameters can be seen in the figure below, and a short configuration sketch follows. Please see the Javadoc for more details. - -
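-For illustration only, here is a sketch of a `beforeStart` method that passes all three kinds of parameters to the four-argument constructor, reading them from user attributes. The attribute keys `time_interval`, `sliding_step`, `display_window_begin`, and `display_window_end` are just the placeholder names also used by the `Counter` example later in this section:
-
-```java
-void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception {
-  configurations
-      .setOutputDataType(Type.INT32)
-      .setAccessStrategy(new SlidingTimeWindowAccessStrategy(
-          parameters.getLong("time_interval"),         // time interval for dividing the time axis
-          parameters.getLong("sliding_step"),          // time sliding step
-          parameters.getLong("display_window_begin"),  // display window begin time
-          parameters.getLong("display_window_end"))); // display window end time
-}
-```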
- -> Note that the actual time interval of some of the last time windows may be less than the specified time interval parameter. In addition, there may be cases where the number of data rows in some time windows is 0. In these cases, the framework will also call the `transform` method for the empty windows. - -- `SlidingSizeWindowAccessStrategy` - -Window opening diagram: - - - -`SlidingSizeWindowAccessStrategy`: `SlidingSizeWindowAccessStrategy` has many constructors; you can pass 2 types of parameters to them: - -* Parameter 1: Window size. This parameter specifies the number of data rows contained in a data processing window. Note that the number of data rows in some of the last time windows may be less than the specified number of data rows. -* Parameter 2: Sliding step. This parameter means the number of rows between the first point of the next window and the first point of the current window. (This parameter is not required to be greater than or equal to the window size, but must be a positive number.) - -The sliding step parameter is optional. If the parameter is not provided, the sliding step will be set to the same as the window size. - -- `SessionTimeWindowAccessStrategy` - -Window opening diagram: **Time intervals less than or equal to the given minimum time interval `sessionGap` are assigned to one group.** - - - -`SessionTimeWindowAccessStrategy`: `SessionTimeWindowAccessStrategy` has many constructors; you can pass 2 types of parameters to them: - -- Parameter 1: The display window on the time axis. -- Parameter 2: The minimum time interval `sessionGap` between two adjacent windows. - -- `StateWindowAccessStrategy` - -Window opening diagram: **For numerical data, if the state difference is less than or equal to the given threshold `delta`, it will be assigned to one group.** - - - -`StateWindowAccessStrategy` has four constructors. - -- Constructor 1: For numerical data, there are 3 parameters: the start and end time of the display window on the time axis, and the threshold `delta` for the allowable change within a single window. -- Constructor 2: For text data and boolean data, there are 2 parameters: the start and end time of the display window on the time axis. For these two data types, the data within a single window is the same, so there is no need to provide an allowable change threshold. -- Constructor 3: For numerical data, there is 1 parameter: the threshold `delta` for the allowable change within a single window. The start time of the display window will be set to the smallest timestamp in the entire query result set, and the end time of the display window will be set to the largest timestamp in the entire query result set. -- Constructor 4: For text data and boolean data, no parameters are required. The start and end timestamps are determined as explained in Constructor 3. - -StateWindowAccessStrategy can only take one column as input for now. - -Please see the Javadoc for more details. - - 2.2.2 **setOutputDataType** - -Note that the type of output sequence you set here determines the type of data that the `PointCollector` can actually receive in the `transform` method.
The relationship between the output data type set in `setOutputDataType` and the actual data output type that `PointCollector` can receive is as follows: - -| Output Data Type Set in `setOutputDataType` | Data Type that `PointCollector` Can Receive | -| :------------------------------------------ | :----------------------------------------------------------- | -| INT32 | int | -| INT64 | long | -| FLOAT | float | -| DOUBLE | double | -| BOOLEAN | boolean | -| TEXT | java.lang.String and org.apache.iotdb.udf.api.type.Binary | - -The type of output time series of a UDTF is determined at runtime, which means that a UDTF can dynamically determine the type of output time series according to the type of input time series. -Here is a simple example: - -```java -void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { - // do something - // ... - - configurations - .setAccessStrategy(new RowByRowAccessStrategy()) - .setOutputDataType(parameters.getDataType(0)); -} -``` - -3. **Object transform(Row row) throws Exception** - -You need to implement this method or `transform(Column[] columns, ColumnBuilder builder) throws Exception` when you specify the strategy of UDF to read the original data as `MappableRowByRowAccessStrategy`. - -This method processes the raw data one row at a time. The raw data is input from `Row` and output by its return object. You must return only one object based on each input data point in a single `transform` method call, i.e., input and output are one-to-one. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. - -The following is a complete UDF example that implements the `Object transform(Row row) throws Exception` method. It is an adder that receives two columns of time series as input. - -```java -import org.apache.iotdb.udf.api.UDTF; -import org.apache.iotdb.udf.api.access.Row; -import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; -import org.apache.iotdb.udf.api.customizer.parameter.UDFParameterValidator; -import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; -import org.apache.iotdb.udf.api.customizer.strategy.MappableRowByRowAccessStrategy; -import org.apache.iotdb.udf.api.type.Type; - -public class Adder implements UDTF { - private Type dataType; - - @Override - public void validate(UDFParameterValidator validator) throws Exception { - validator - .validateInputSeriesNumber(2) - .validateInputSeriesDataType(0, Type.INT64) - .validateInputSeriesDataType(1, Type.INT64); - } - - @Override - public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { - dataType = parameters.getDataType(0); - configurations - .setAccessStrategy(new MappableRowByRowAccessStrategy()) - .setOutputDataType(dataType); - } - - @Override - public Object transform(Row row) throws Exception { - return row.getLong(0) + row.getLong(1); - } -} -``` - - - -4. **void transform(Column[] columns, ColumnBuilder builder) throws Exception** - -You need to implement this method or `Object transform(Row row) throws Exception` when you specify the strategy of UDF to read the original data as `MappableRowByRowAccessStrategy`. - -This method processes the raw data multiple rows at a time. Performance tests show that UDTFs that process multiple rows at once perform better than those that process one data point at a time.
The raw data is input from `Column[]` and output by `ColumnBuilder`. You must output a corresponding data point based on each input data point in a single `transform` method call, i.e., input and output are still one-to-one. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. - -The following is a complete UDF example that implements the `void transform(Column[] columns, ColumnBuilder builder) throws Exception` method. It is an adder that receives two columns of time series as input. - -```java -import org.apache.iotdb.tsfile.read.common.block.column.Column; -import org.apache.iotdb.tsfile.read.common.block.column.ColumnBuilder; -import org.apache.iotdb.udf.api.UDTF; -import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; -import org.apache.iotdb.udf.api.customizer.parameter.UDFParameterValidator; -import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; -import org.apache.iotdb.udf.api.customizer.strategy.MappableRowByRowAccessStrategy; -import org.apache.iotdb.udf.api.type.Type; - -public class Adder implements UDTF { - private Type type; - - @Override - public void validate(UDFParameterValidator validator) throws Exception { - validator - .validateInputSeriesNumber(2) - .validateInputSeriesDataType(0, Type.INT64) - .validateInputSeriesDataType(1, Type.INT64); - } - - @Override - public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { - type = parameters.getDataType(0); - configurations.setAccessStrategy(new MappableRowByRowAccessStrategy()).setOutputDataType(type); - } - - @Override - public void transform(Column[] columns, ColumnBuilder builder) throws Exception { - long[] inputs1 = columns[0].getLongs(); - long[] inputs2 = columns[1].getLongs(); - - int count = columns[0].getPositionCount(); - for (int i = 0; i < count; i++) { - builder.writeLong(inputs1[i] + inputs2[i]); - } - } -} -``` - -5. **void transform(Row row, PointCollector collector) throws Exception** - -You need to implement this method when you specify the strategy of UDF to read the original data as `RowByRowAccessStrategy`. - -This method processes the raw data one row at a time. The raw data is input from `Row` and output by `PointCollector`. You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. - -The following is a complete UDF example that implements the `void transform(Row row, PointCollector collector) throws Exception` method. It is an adder that receives two columns of time series as input. When two data points in a row are not `null`, this UDF will output the algebraic sum of these two data points. 
- -```java -import org.apache.iotdb.udf.api.UDTF; -import org.apache.iotdb.udf.api.access.Row; -import org.apache.iotdb.udf.api.collector.PointCollector; -import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; -import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; -import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy; -import org.apache.iotdb.udf.api.type.Type; - -public class Adder implements UDTF { - - @Override - public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { - configurations - .setOutputDataType(Type.INT64) - .setAccessStrategy(new RowByRowAccessStrategy()); - } - - @Override - public void transform(Row row, PointCollector collector) throws Exception { - if (row.isNull(0) || row.isNull(1)) { - return; - } - collector.putLong(row.getTime(), row.getLong(0) + row.getLong(1)); - } -} -``` - -6. **void transform(RowWindow rowWindow, PointCollector collector) throws Exception** - -You need to implement this method when you specify the strategy of UDF to read the original data as `SlidingTimeWindowAccessStrategy` or `SlidingSizeWindowAccessStrategy`. - -This method processes a batch of data in a fixed number of rows or a fixed time interval each time, and we call the container containing this batch of data a window. The raw data is input from `RowWindow` and output by `PointCollector`. `RowWindow` can help you access a batch of `Row`; it provides a set of interfaces for random access and iterative access to this batch of `Row`. You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. - -Below is a complete UDF example that implements the `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` method. It is a counter that receives any number of time series as input, and its function is to count and output the number of data rows in each time window within a specified time range. - -```java -import java.io.IOException; -import org.apache.iotdb.udf.api.UDTF; -import org.apache.iotdb.udf.api.access.Row; -import org.apache.iotdb.udf.api.access.RowWindow; -import org.apache.iotdb.udf.api.collector.PointCollector; -import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; -import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; -import org.apache.iotdb.udf.api.customizer.strategy.SlidingTimeWindowAccessStrategy; -import org.apache.iotdb.udf.api.type.Type; - -public class Counter implements UDTF { - - @Override - public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { - configurations - .setOutputDataType(Type.INT32) - .setAccessStrategy(new SlidingTimeWindowAccessStrategy( - parameters.getLong("time_interval"), - parameters.getLong("sliding_step"), - parameters.getLong("display_window_begin"), - parameters.getLong("display_window_end"))); - } - - @Override - public void transform(RowWindow rowWindow, PointCollector collector) throws Exception { - if (rowWindow.windowSize() != 0) { - collector.putInt(rowWindow.windowStartTime(), rowWindow.windowSize()); - } - } -} -``` - -7. **void terminate(PointCollector collector) throws Exception** - -In some scenarios, a UDF needs to traverse all the original data to calculate the final output data points. The `terminate` interface provides support for those scenarios.
- -This method is called after all `transform` calls are executed and before the `beforeDestroy` method is executed. You can implement the `transform` method to perform pure data processing (without outputting any data points), and implement the `terminate` method to output the processing results. - -The processing results need to be output by the `PointCollector`. You can output any number of data points in one `terminate` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. - -Below is a complete UDF example that implements the `void terminate(PointCollector collector) throws Exception` method. It takes one time series whose data type is `INT32` as input, and outputs the maximum value point of the series. - -```java -import java.io.IOException; -import org.apache.iotdb.udf.api.UDTF; -import org.apache.iotdb.udf.api.access.Row; -import org.apache.iotdb.udf.api.collector.PointCollector; -import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; -import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; -import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy; -import org.apache.iotdb.udf.api.type.Type; - -public class Max implements UDTF { - - private Long time; - private int value; - - @Override - public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { - configurations - .setOutputDataType(Type.INT32) - .setAccessStrategy(new RowByRowAccessStrategy()); - } - - @Override - public void transform(Row row, PointCollector collector) { - if (row.isNull(0)) { - return; - } - int candidateValue = row.getInt(0); - if (time == null || value < candidateValue) { - time = row.getTime(); - value = candidateValue; - } - } - - @Override - public void terminate(PointCollector collector) throws IOException { - if (time != null) { - collector.putInt(time, value); - } - } -} -``` - -8. **void beforeDestroy()** - -The method for terminating a UDF. - -This method is called by the framework. For a UDF instance, `beforeDestroy` will be called after the last record is processed. In the entire life cycle of the instance, `beforeDestroy` will only be called once. - - - -### 1.3 UDAF (User Defined Aggregation Function) - -A complete definition of UDAF involves two classes, `State` and `UDAF`. - -#### State Class - -To write your own `State`, you need to implement the `org.apache.iotdb.udf.api.State` interface. - -#### Interface Description: - -| Interface Definition | Description | Required to Implement | -| -------------------------------- | ------------------------------------------------------------ | --------------------- | -| void reset() | To reset the `State` object to its initial state, you need to fill in the initial values of the fields in the `State` class within this method as if you were writing a constructor. | Required | -| byte[] serialize() | Serializes `State` to binary data. This method is used for IoTDB internal `State` passing. Note that the order of serialization must be consistent with the following deserialization method. | Required | -| void deserialize(byte[] bytes) | Deserializes binary data to `State`. This method is used for IoTDB internal `State` passing. Note that the order of deserialization must be consistent with the serialization method above. | Required | - -#### Detailed interface introduction: - -1. 
**void reset()** - -This method resets the `State` to its initial state, you need to fill in the initial values of the fields in the `State` object in this method. For optimization reasons, IoTDB reuses `State` as much as possible internally, rather than creating a new `State` for each group, which would introduce unnecessary overhead. When `State` has finished updating the data in a group, this method is called to reset to the initial state as a way to process the next group. - -In the case of `State` for averaging (aka `avg`), for example, you would need the sum of the data, `sum`, and the number of entries in the data, `count`, and initialize both to 0 in the `reset()` method. - -```java -class AvgState implements State { - double sum; - - long count; - - @Override - public void reset() { - sum = 0; - count = 0; - } - - // other methods -} -``` - -2. **byte[] serialize()/void deserialize(byte[] bytes)** - -These methods serialize the `State` into binary data, and deserialize the `State` from the binary data. IoTDB, as a distributed database, involves passing data among different nodes, so you need to write these two methods to enable the passing of the State among different nodes. Note that the order of serialization and deserialization must be the consistent. - -In the case of `State` for averaging (aka `avg`), for example, you can convert the content of State to `byte[]` array and read out the content of State from `byte[]` array in any way you want, the following shows the code for serialization/deserialization using `ByteBuffer` introduced by Java8: - -```java -@Override -public byte[] serialize() { - ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES + Long.BYTES); - buffer.putDouble(sum); - buffer.putLong(count); - - return buffer.array(); -} - -@Override -public void deserialize(byte[] bytes) { - ByteBuffer buffer = ByteBuffer.wrap(bytes); - sum = buffer.getDouble(); - count = buffer.getLong(); -} -``` - - - -#### UDAF Classes - -To write a UDAF, you need to implement the `org.apache.iotdb.udf.api.UDAF` interface. - -#### Interface Description: - -| Interface definition | Description | Required to Implement | -| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------- | -| void validate(UDFParameterValidator validator) throws Exception | This method is mainly used to validate `UDFParameters` and it is executed before `beforeStart(UDFParameters, UDTFConfigurations)` is called. | Optional | -| void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception | Initialization method that invokes user-defined initialization behavior before UDAF processes the input data. Unlike UDTF, configuration is of type `UDAFConfiguration`. | Required | -| State createState() | To create a `State` object, usually just call the default constructor and modify the default initial value as needed. | Required | -| void addInput(State state, Column[] columns, BitMap bitMap) | Update `State` object according to the incoming data `Column[]` in batch, note that last column `columns[columns.length - 1]` always represents the time column. In addition, `BitMap` represents the data that has been filtered out before, you need to manually determine whether the corresponding data has been filtered out when writing this method. | Required | -| void combineState(State state, State rhs) | Merge `rhs` state into `state` state. 
In a distributed scenario, the same set of data may be distributed on different nodes; IoTDB generates a `State` object for the partial data on each node, and then calls this method to merge them into the complete `State`. | Required | -| void outputFinal(State state, ResultValue resultValue) | Computes the final aggregated result based on the data in `State`. Note that according to the semantics of the aggregation, only one value can be output per group. | Required | -| void beforeDestroy() | This method is called by the framework after the last input data is processed, and will only be called once in the life cycle of each UDF instance. | Optional | - -In the life cycle of a UDAF instance, the calling sequence of each method is as follows: - -1. State createState() -2. void validate(UDFParameterValidator validator) throws Exception -3. void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception -4. void addInput(State state, Column[] columns, BitMap bitMap) -5. void combineState(State state, State rhs) -6. void outputFinal(State state, ResultValue resultValue) -7. void beforeDestroy() - -Similar to UDTF, every time the framework executes a UDAF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDAF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDAF without considering the influence of concurrency and other factors. - -#### Detailed interface introduction: - - -1. **void validate(UDFParameterValidator validator) throws Exception** - -Same as in UDTF, the `validate` method is used to validate the parameters entered by the user. - -In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom verification. - -2. **void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception** - - The `beforeStart` method does the same things as its UDTF counterpart: - -1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user. -2. Set the output data type in UDAFConfigurations. -3. Create resources, such as establishing external connections, opening files, etc. - -The role of the `UDFParameters` type can be seen above. - -2.2 **UDAFConfigurations** - -The difference from UDTF is that UDAF uses `UDAFConfigurations` as the type of the `configurations` object. - -Currently, this class only supports setting the type of output data. - -```java -void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { - // parameters - // ... - - // configurations - configurations - .setOutputDataType(Type.INT32); -} -``` - -The relationship between the output type set in `setOutputDataType` and the type of data output that `ResultValue` can actually receive is as follows: - -| The output type set in `setOutputDataType` | The output type that `ResultValue` can actually receive | -| ------------------------------------------ | ------------------------------------------------------- | -| INT32 | int | -| INT64 | long | -| FLOAT | float | -| DOUBLE | double | -| BOOLEAN | boolean | -| TEXT | org.apache.iotdb.udf.api.type.Binary | - -The output type of the UDAF is determined at runtime. You can dynamically determine the output sequence type based on the input type.
- -Here is a simple example: - -```java -void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { - // do something - // ... - - configurations - .setOutputDataType(parameters.getDataType(0)); -} -``` - -3. **State createState()** - - -This method creates and initializes a `State` object for UDAF. Due to the limitations of the Java language, you can only call the default constructor for the `State` class. The default constructor assigns a default initial value to all the fields in the class, and if that initial value does not meet your requirements, you need to initialize them manually within this method. - -The following is an example that includes manual initialization. Suppose you want to implement an aggregate function that multiply all numbers in the group, then your initial `State` value should be set to 1, but the default constructor initializes it to 0, so you need to initialize `State` manually after calling the default constructor: - -```java -public State createState() { - MultiplyState state = new MultiplyState(); - state.result = 1; - return state; -} -``` - -4. **void addInput(State state, Column[] columns, BitMap bitMap)** - -This method updates the `State` object with the raw input data. For performance reasons, also to align with the IoTDB vectorized query engine, the raw input data is no longer a data point, but an array of columns ``Column[]``. Note that the last column (i.e. `columns[columns.length - 1]`) is always the time column, so you can also do different operations in UDAF depending on the time. - -Since the input parameter is not of a single data point type, but of multiple columns, you need to manually filter some of the data in the columns, which is why the third parameter, `BitMap`, exists. It identifies which of these columns have been filtered out, so you don't have to think about the filtered data in any case. - -Here's an example of `addInput()` that counts the number of items (aka count). It shows how you can use `BitMap` to ignore data that has been filtered out. Note that due to the limitations of the Java language, you need to do the explicit cast the `State` object from type defined in the interface to a custom `State` type at the beginning of the method, otherwise you won't be able to use the `State` object. - -```java -public void addInput(State state, Column[] columns, BitMap bitMap) { - CountState countState = (CountState) state; - - int count = columns[0].getPositionCount(); - for (int i = 0; i < count; i++) { - if (bitMap != null && !bitMap.isMarked(i)) { - continue; - } - if (!columns[0].isNull(i)) { - countState.count++; - } - } -} -``` - -5. **void combineState(State state, State rhs)** - - -This method combines two `State`s, or more precisely, updates the first `State` object with the second `State` object. IoTDB is a distributed database, and the data of the same group may be distributed on different nodes. For performance reasons, IoTDB will first aggregate some of the data on each node into `State`, and then merge the `State`s on different nodes that belong to the same group, which is what `combineState` does. - -Here's an example of `combineState()` for averaging (aka avg). Similar to `addInput`, you need to do an explicit type conversion for the two `State`s at the beginning. Also note that you are updating the value of the first `State` with the contents of the second `State`. 
- -```java -public void combineState(State state, State rhs) { - AvgState avgState = (AvgState) state; - AvgState avgRhs = (AvgState) rhs; - - avgState.count += avgRhs.count; - avgState.sum += avgRhs.sum; -} -``` - -6. **void outputFinal(State state, ResultValue resultValue)** - -This method works by calculating the final result from `State`. You need to access the various fields in `State`, derive the final result, and set the final result into the `ResultValue` object.IoTDB internally calls this method once at the end for each group. Note that according to the semantics of aggregation, the final result can only be one value. - -Here is another `outputFinal` example for averaging (aka avg). In addition to the forced type conversion at the beginning, you will also see a specific use of the `ResultValue` object, where the final result is set by `setXXX` (where `XXX` is the type name). - -```java -public void outputFinal(State state, ResultValue resultValue) { - AvgState avgState = (AvgState) state; - - if (avgState.count != 0) { - resultValue.setDouble(avgState.sum / avgState.count); - } else { - resultValue.setNull(); - } -} -``` - -7. **void beforeDestroy()** - - -The method for terminating a UDF. - -This method is called by the framework. For a UDF instance, `beforeDestroy` will be called after the last record is processed. In the entire life cycle of the instance, `beforeDestroy` will only be called once. - - -### 1.4 Maven Project Example - -If you use Maven, you can build your own UDF project referring to our **udf-example** module. You can find the project [here](https://github.com/apache/iotdb/tree/master/example/udf). - - -## 2. Contribute universal built-in UDF functions to iotdb - -This part mainly introduces how external users can contribute their own UDFs to the IoTDB community. - -#### 2.1 Prerequisites - -1. UDFs must be universal. - - The "universal" mentioned here refers to: UDFs can be widely used in some scenarios. In other words, the UDF function must have reuse value and may be directly used by other users in the community. - - If you are not sure whether the UDF you want to contribute is universal, you can send an email to `dev@iotdb.apache.org` or create an issue to initiate a discussion. - -2. The UDF you are going to contribute has been well tested and can run normally in the production environment. - - -#### 2.2 What you need to prepare - -1. UDF source code -2. Test cases -3. Instructions - -### 2.3 Contribution Content - -#### 2.3.1 UDF Source Code - -1. Create the UDF main class and related classes in `iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin` or in its subfolders. -2. Register your UDF in `iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin/BuiltinTimeSeriesGeneratingFunction.java`. - -#### 2.3.2 Test Cases - -At a minimum, you need to write integration tests for the UDF. - -You can add a test class in `integration-test/src/test/java/org/apache/iotdb/db/it/udf`. - - -#### 2.3.3 Instructions - -The instructions need to include: the name and the function of the UDF, the attribute parameters that must be provided when the UDF is executed, the applicable scenarios, and the usage examples, etc. - -The instructions for use should include both Chinese and English versions. Instructions for use should be added separately in `docs/zh/UserGuide/Operation Manual/DML Data Manipulation Language.md` and `docs/UserGuide/Operation Manual/DML Data Manipulation Language.md`. 
- -#### 2.3.4 Submit a PR - -When you have prepared the UDF source code, test cases, and instructions, you are ready to submit a Pull Request (PR) on [Github](https://github.com/apache/iotdb). You can refer to our code contribution guide to submit a PR: [Development Guide](https://iotdb.apache.org/Community/Development-Guide.html). - - -After the PR review is approved and merged, your UDF has already contributed to the IoTDB community! diff --git a/src/UserGuide/V1.3.0-2/SQL-Manual/SQL-Manual.md b/src/UserGuide/V1.3.0-2/SQL-Manual/SQL-Manual.md deleted file mode 100644 index 27410f434..000000000 --- a/src/UserGuide/V1.3.0-2/SQL-Manual/SQL-Manual.md +++ /dev/null @@ -1,1756 +0,0 @@ - - -# SQL Manual - -## DATABASE MANAGEMENT - -For more details, see document [Operate-Metadata](../User-Manual/Operate-Metadata.md). - -### Create Database - -```sql -IoTDB > create database root.ln -IoTDB > create database root.sgcc -``` - -### Show Databases - -```sql -IoTDB> SHOW DATABASES -IoTDB> SHOW DATABASES root.** -``` - -### Delete Database - -```sql -IoTDB > DELETE DATABASE root.ln -IoTDB > DELETE DATABASE root.sgcc -// delete all data, all timeseries and all databases -IoTDB > DELETE DATABASE root.** -``` - -### Count Databases - -```sql -IoTDB> count databases -IoTDB> count databases root.* -IoTDB> count databases root.sgcc.* -IoTDB> count databases root.sgcc -``` - -### Setting up heterogeneous databases (Advanced operations) - -#### Set heterogeneous parameters when creating a Database - -```sql -CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; -``` - -#### Adjust heterogeneous parameters at run time - -```sql -ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; -``` - -#### Show heterogeneous databases - -```sql -SHOW DATABASES DETAILS -``` - -### TTL - -#### Set TTL - -```sql -IoTDB> set ttl to root.ln 3600000 -IoTDB> set ttl to root.sgcc.** 3600000 -IoTDB> set ttl to root.** 3600000 -``` - -#### Unset TTL - -```sql -IoTDB> unset ttl to root.ln -IoTDB> unset ttl to root.sgcc.** -IoTDB> unset ttl to root.** -``` - -#### Show TTL - -```sql -IoTDB> SHOW ALL TTL -IoTDB> SHOW TTL ON StorageGroupNames -``` - -## DEVICE TEMPLATE - -For more details, see document [Operate-Metadata](../User-Manual/Operate-Metadata.md). - -![img](/img/%E6%A8%A1%E6%9D%BF.png) - -![img](/img/templateEN.jpg) - - -### Create Device Template - -**Example 1:** Create a template containing two non-aligned timeseires - -```shell -IoTDB> create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) -``` - -**Example 2:** Create a template containing a group of aligned timeseires - -```shell -IoTDB> create device template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla) -``` - -The` lat` and `lon` measurements are aligned. 
- -### Set Device Template - -```sql -IoTDB> set device template t1 to root.sg1.d1 -``` - -### Activate Device Template - -```sql -IoTDB> set device template t1 to root.sg1.d1 -IoTDB> set device template t2 to root.sg1.d2 -IoTDB> create timeseries using device template on root.sg1.d1 -IoTDB> create timeseries using device template on root.sg1.d2 -``` - -### Show Device Template - -```sql -IoTDB> show device templates -IoTDB> show nodes in device template t1 -IoTDB> show paths set device template t1 -IoTDB> show paths using device template t1 -``` - -### Deactivate Device Template - -```sql -IoTDB> delete timeseries of device template t1 from root.sg1.d1 -IoTDB> deactivate device template t1 from root.sg1.d1 -IoTDB> delete timeseries of device template t1 from root.sg1.*, root.sg2.* -IoTDB> deactivate device template t1 from root.sg1.*, root.sg2.* -``` - -### Unset Device Template - -```sql -IoTDB> unset device template t1 from root.sg1.d1 -``` - -### Drop Device Template - -```sql -IoTDB> drop device template t1 -``` - -### Alter Device Template - -```sql -IoTDB> alter device template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY) -``` - -## TIMESERIES MANAGEMENT - -For more details, see document [Operate-Metadata](../User-Manual/Operate-Metadata.md). - -### Create Timeseries - -```sql -IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN -IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE -IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN -IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN -IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN -IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE -``` - -- From v0.13, you can use a simplified version of the SQL statements to create timeseries: - -```sql -IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN -IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE -IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN -IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN -IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN -IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE -``` - -- Notice that when in the CREATE TIMESERIES statement the encoding method conflicts with the data type, the system gives the corresponding error prompt as shown below: - -```sql -IoTDB > create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF -error: encoding TS_2DIFF does not support BOOLEAN -``` - -### Create Aligned Timeseries - -```sql -IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) -``` - -### Delete Timeseries - -```sql -IoTDB> delete timeseries root.ln.wf01.wt01.status -IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware -IoTDB> delete timeseries root.ln.wf02.* -IoTDB> drop timeseries root.ln.wf02.* -``` - -### Show Timeseries - -```sql -IoTDB> show timeseries root.** -IoTDB> show timeseries root.ln.** -IoTDB> show timeseries root.ln.** limit 10 offset 10 -IoTDB> show timeseries root.ln.** where timeseries contains 'wf01.wt' -IoTDB> show 
timeseries root.ln.** where dataType=FLOAT -IoTDB> show timeseries root.ln.** where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; -``` - -### Count Timeseries - -```sql -IoTDB > COUNT TIMESERIES root.** -IoTDB > COUNT TIMESERIES root.ln.** -IoTDB > COUNT TIMESERIES root.ln.*.*.status -IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status -IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' -IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 -IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' -IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' -IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 -IoTDB > COUNT TIMESERIES root.** WHERE time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; -IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 -IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 -IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 -``` - -### Tag and Attribute Management - -```sql -create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) -``` - -* Rename the tag/attribute key - -```SQL -ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 -``` - -* Reset the tag/attribute value - -```SQL -ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 -``` - -* Delete the existing tag/attribute - -```SQL -ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 -``` - -* Add new tags - -```SQL -ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 -``` - -* Add new attributes - -```SQL -ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 -``` - -* Upsert alias, tags and attributes - -> add alias or a new key-value if the alias or key doesn't exist, otherwise, update the old one with new value. - -```SQL -ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag3=v3, tag4=v4) ATTRIBUTES(attr3=v3, attr4=v4) -``` - -* Show timeseries using tags. Use TAGS(tagKey) to identify the tags used as filter key - -```SQL -SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause -``` - -returns all the timeseries information that satisfy the where condition and match the pathPattern. SQL statements are as follows: - -```SQL -ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c -ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 -show timeseries root.ln.** where TAGS(unit)='c' -show timeseries root.ln.** where TAGS(description) contains 'test1' -``` - -- count timeseries using tags - -```SQL -COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause -COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause GROUP BY LEVEL= -``` - -returns all the number of timeseries that satisfy the where condition and match the pathPattern. 
SQL statements are as follows: - -```SQL -count timeseries -count timeseries root.** where TAGS(unit)='c' -count timeseries root.** where TAGS(unit)='c' group by level = 2 -``` - -create aligned timeseries - -```SQL -create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) -``` - -The execution result is as follows: - -```SQL -IoTDB> show timeseries -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| -|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -``` - -Support query: - -```SQL -IoTDB> show timeseries where TAGS(tag1)='v1' -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -``` - -The above operations are supported for timeseries tag, attribute updates, etc. - -## NODE MANAGEMENT - -For more details, see document [Operate-Metadata](../User-Manual/Operate-Metadata.md). - -### Show Child Paths - -```SQL -SHOW CHILD PATHS pathPattern -``` - -### Show Child Nodes - -```SQL -SHOW CHILD NODES pathPattern -``` - -### Count Nodes - -```SQL -IoTDB > COUNT NODES root.** LEVEL=2 -IoTDB > COUNT NODES root.ln.** LEVEL=2 -IoTDB > COUNT NODES root.ln.wf01.** LEVEL=3 -IoTDB > COUNT NODES root.**.temperature LEVEL=3 -``` - -### Show Devices - -```SQL -IoTDB> show devices -IoTDB> show devices root.ln.** -IoTDB> show devices root.ln.** where device contains 't' -IoTDB> show devices with database -IoTDB> show devices root.ln.** with database -``` - -### Count Devices - -```SQL -IoTDB> show devices -IoTDB> count devices -IoTDB> count devices root.ln.** -``` - -## INSERT & LOAD DATA - -### Insert Data - -For more details, see document [Write-Delete-Data](../User-Manual/Write-Delete-Data.md). 
- -#### Use of INSERT Statements - -- Insert Single Timeseries - -```sql -IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) -IoTDB > insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') -``` - -- Insert Multiple Timeseries - -```sql -IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (2, false, 'v2') -IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (3, false, 'v3'),(4, true, 'v4') -``` - -- Use the Current System Timestamp as the Timestamp of the Data Point - -```SQL -IoTDB > insert into root.ln.wf02.wt02(status, hardware) values (false, 'v2') -``` - -#### Insert Data Into Aligned Timeseries - -```SQL -IoTDB > create aligned timeseries root.sg1.d1(s1 INT32, s2 DOUBLE) -IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1) -IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(2, 2, 2), (3, 3, 3) -IoTDB > select * from root.sg1.d1 -``` - -### Load External TsFile Tool - -For more details, see document [Import-Export-Tool](../Tools-System/TsFile-Import-Export-Tool.md). - -#### Load with SQL - -1. Load a single tsfile by specifying a file path (absolute path). - -- `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` -- `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` -- `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` -- `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1 onSuccess=delete` - - -2. Load a batch of files by specifying a folder path (absolute path). - -- `load '/Users/Desktop/data'` -- `load '/Users/Desktop/data' sglevel=1` -- `load '/Users/Desktop/data' onSuccess=delete` -- `load '/Users/Desktop/data' sglevel=1 onSuccess=delete` - -#### Load with Script - -``` -./load-rewrite.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root -``` - -## DELETE DATA - -For more details, see document [Write-Delete-Data](../User-Manual/Write-Delete-Data.md). - -### Delete Single Timeseries - -```sql -IoTDB > delete from root.ln.wf02.wt02.status where time<=2017-11-01T16:26:00; -IoTDB > delete from root.ln.wf02.wt02.status where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; -IoTDB > delete from root.ln.wf02.wt02.status where time < 10 -IoTDB > delete from root.ln.wf02.wt02.status where time <= 10 -IoTDB > delete from root.ln.wf02.wt02.status where time < 20 and time > 10 -IoTDB > delete from root.ln.wf02.wt02.status where time <= 20 and time >= 10 -IoTDB > delete from root.ln.wf02.wt02.status where time > 20 -IoTDB > delete from root.ln.wf02.wt02.status where time >= 20 -IoTDB > delete from root.ln.wf02.wt02.status where time = 20 -IoTDB > delete from root.ln.wf02.wt02.status where time > 4 or time < 0 -Msg: 303: Check metadata error: For delete statement, where clause can only contain atomic -expressions like : time > XXX, time <= XXX, or two atomic expressions connected by 'AND' -IoTDB > delete from root.ln.wf02.wt02.status -``` - -### Delete Multiple Timeseries - -```sql -IoTDB > delete from root.ln.wf02.wt02 where time <= 2017-11-01T16:26:00; -IoTDB > delete from root.ln.wf02.wt02.* where time <= 2017-11-01T16:26:00; -IoTDB> delete from root.ln.wf03.wt02.status where time < now() -Msg: The statement is executed successfully. -``` - -### Delete Time Partition (experimental) - -```sql -IoTDB > DELETE PARTITION root.ln 0,1,2 -``` - -## QUERY DATA - -For more details, see document [Query-Data](../User-Manual/Query-Data.md). - -```sql -SELECT [LAST] selectExpr [, selectExpr] ... - [INTO intoItem [, intoItem] ...] 
- FROM prefixPath [, prefixPath] ... - [WHERE whereCondition] - [GROUP BY { - ([startTime, endTime), interval [, slidingStep]) | - LEVEL = levelNum [, levelNum] ... | - TAGS(tagKey [, tagKey] ... ) | - VARIATION(expression[,delta][,ignoreNull=true/false]) | - CONDITION(expression,[keep>/>=/=/ select temperature from root.ln.wf01.wt01 where time < 2017-11-01T00:08:00.000 -``` - -#### Select Multiple Columns of Data Based on a Time Interval - -```sql -IoTDB > select status, temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; -``` - -#### Select Multiple Columns of Data for the Same Device According to Multiple Time Intervals - -```sql -IoTDB > select status,temperature from root.ln.wf01.wt01 where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); -``` - -#### Choose Multiple Columns of Data for Different Devices According to Multiple Time Intervals - -```sql -IoTDB > select wf01.wt01.status,wf02.wt02.hardware from root.ln where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); -``` - -#### Order By Time Query - -```sql -IoTDB > select * from root.ln.** where time > 1 order by time desc limit 10; -``` - -### `SELECT` CLAUSE - -#### Use Alias - -```sql -IoTDB > select s1 as temperature, s2 as speed from root.ln.wf01.wt01; -``` - -#### Nested Expressions - -##### Nested Expressions with Time Series Query - -```sql -IoTDB > select a, - b, - ((a + 1) * 2 - 1) % 2 + 1.5, - sin(a + sin(a + sin(b))), - -(a + b) * (sin(a + b) * sin(a + b) + cos(a + b) * cos(a + b)) + 1 -from root.sg1; - -IoTDB > select (a + b) * 2 + sin(a) from root.sg - -IoTDB > select (a + *) / 2 from root.sg1 - -IoTDB > select (a + b) * 3 from root.sg, root.ln -``` - -##### Nested Expressions query with aggregations - -```sql -IoTDB > select avg(temperature), - sin(avg(temperature)), - avg(temperature) + 1, - -sum(hardware), - avg(temperature) + sum(hardware) -from root.ln.wf01.wt01; - -IoTDB > select avg(*), - (avg(*) + 1) * 3 / 2 -1 -from root.sg1 - -IoTDB > select avg(temperature), - sin(avg(temperature)), - avg(temperature) + 1, - -sum(hardware), - avg(temperature) + sum(hardware) as custom_sum -from root.ln.wf01.wt01 -GROUP BY([10, 90), 10ms); -``` - -#### Last Query - -```sql -IoTDB > select last status from root.ln.wf01.wt01 -IoTDB > select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00 -IoTDB > select last * from root.ln.wf01.wt01 order by timeseries desc; -IoTDB > select last * from root.ln.wf01.wt01 order by dataType desc; -``` - -### `WHERE` CLAUSE - -#### Time Filter - -```sql -IoTDB > select s1 from root.sg1.d1 where time > 2022-01-01T00:05:00.000; -IoTDB > select s1 from root.sg1.d1 where time = 2022-01-01T00:05:00.000; -IoTDB > select s1 from root.sg1.d1 where time >= 2022-01-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; -``` - -#### Value Filter - -```sql -IoTDB > select temperature from root.sg1.d1 where temperature > 36.5; -IoTDB > select status from root.sg1.d1 where status = true; -IoTDB > select temperature from root.sg1.d1 where temperature between 36.5 and 40; -IoTDB > select temperature from root.sg1.d1 where temperature not between 36.5 and 40; -IoTDB > select code from root.sg1.d1 where code in ('200', '300', '400', '500'); -IoTDB > select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); -IoTDB > select code 
from root.sg1.d1 where temperature is null; -IoTDB > select code from root.sg1.d1 where temperature is not null; -``` - -#### Fuzzy Query - -- Fuzzy matching using `Like` - -```sql -IoTDB > select * from root.sg.d1 where value like '%cc%' -IoTDB > select * from root.sg.device where value like '_b_' -``` - -- Fuzzy matching using `Regexp` - -```sql -IoTDB > select * from root.sg.d1 where value regexp '^[A-Za-z]+$' -IoTDB > select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 -``` - -### `GROUP BY` CLAUSE - -- Aggregate By Time without Specifying the Sliding Step Length - -```sql -IoTDB > select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d); -``` - -- Aggregate By Time Specifying the Sliding Step Length - -```sql -IoTDB > select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d); -``` - -- Aggregate by Natural Month - -```sql -IoTDB > select count(status) from root.ln.wf01.wt01 group by([2017-11-01T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); -IoTDB > select count(status) from root.ln.wf01.wt01 group by([2017-10-31T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); -``` - -- Left Open And Right Close Range - -```sql -IoTDB > select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d); -``` - -- Aggregation By Variation - -```sql -IoTDB > select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6) -IoTDB > select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, ignoreNull=false) -IoTDB > select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, 4) -IoTDB > select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6+s5, 10) -``` - -- Aggregation By Condition - -```sql -IoTDB > select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=true) -IoTDB > select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=false) -``` - -- Aggregation By Session - -```sql -IoTDB > select __endTime,count(*) from root.** group by session(1d) -IoTDB > select __endTime,sum(hardware) from root.ln.wf02.wt01 group by session(50s) having sum(hardware)>0 align by device -``` - -- Aggregation By Count - -```sql -IoTDB > select count(charging_stauts), first_value(soc) from root.sg group by count(charging_status,5) -IoTDB > select count(charging_stauts), first_value(soc) from root.sg group by count(charging_status,5,ignoreNull=false) -``` - -- Aggregation By Level - -```sql -IoTDB > select count(status) from root.** group by level = 1 -IoTDB > select count(status) from root.** group by level = 3 -IoTDB > select count(status) from root.** group by level = 1, 3 -IoTDB > select max_value(temperature) from root.** group by level = 0 -IoTDB > select count(*) from root.ln.** group by level = 2 -``` - -- Aggregate By Time with Level Clause - -```sql -IoTDB > select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d), level=1; -IoTDB > select count(status) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d), level=1; -``` - -- Aggregation query by one single tag - -```sql -IoTDB > SELECT AVG(temperature) FROM root.factory1.** GROUP BY TAGS(city); -``` - -- Aggregation query by multiple tags - -```sql 
-IoTDB > SELECT avg(temperature) FROM root.factory1.** GROUP BY TAGS(city, workshop); -``` - -- Downsampling Aggregation by tags based on Time Window - -```sql -IoTDB > SELECT avg(temperature) FROM root.factory1.** GROUP BY ([1000, 10000), 5s), TAGS(city, workshop); -``` - -### `HAVING` CLAUSE - -Correct: - -```sql -IoTDB > select count(s1) from root.** group by ([1,11),2ms), level=1 having count(s2) > 1 -IoTDB > select count(s1), count(s2) from root.** group by ([1,11),2ms) having count(s2) > 1 align by device -``` - -Incorrect: - -```sql -IoTDB > select count(s1) from root.** group by ([1,3),1ms) having sum(s1) > s1 -IoTDB > select count(s1) from root.** group by ([1,3),1ms) having s1 > 1 -IoTDB > select count(s1) from root.** group by ([1,3),1ms), level=1 having sum(d1.s1) > 1 -IoTDB > select count(d1.s1) from root.** group by ([1,3),1ms), level=1 having sum(s1) > 1 -``` - -### `FILL` CLAUSE - -#### `PREVIOUS` Fill - -```sql -IoTDB > select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous); -``` - -#### `PREVIOUS` FILL and specify the fill timeout threshold -```sql -select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous, 2m); -``` - -#### `LINEAR` Fill - -```sql -IoTDB > select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(linear); -``` - -#### Constant Fill - -```sql -IoTDB > select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(2.0); -IoTDB > select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(true); -``` - -### `LIMIT` and `SLIMIT` CLAUSES (PAGINATION) - -#### Row Control over Query Results - -```sql -IoTDB > select status, temperature from root.ln.wf01.wt01 limit 10 -IoTDB > select status, temperature from root.ln.wf01.wt01 limit 5 offset 3 -IoTDB > select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time< 2017-11-01T00:12:00.000 limit 2 offset 3 -IoTDB > select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) limit 5 offset 3 -``` - -#### Column Control over Query Results - -```sql -IoTDB > select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 -IoTDB > select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 1 -IoTDB > select max_value(*) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) slimit 1 soffset 1 -``` - -#### Row and Column Control over Query Results - -```sql -IoTDB > select * from root.ln.wf01.wt01 limit 10 offset 100 slimit 2 soffset 0 -``` - -### `ORDER BY` CLAUSE - -#### Order by in ALIGN BY TIME mode - -```sql -IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time desc; -``` - -#### Order by in ALIGN BY DEVICE mode - -```sql -IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 order by device desc,time asc align by device; -IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time asc,device desc align by device; -IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; -IoTDB > select count(*) from root.ln.** group by 
((2017-11-01T00:00:00.000+08:00,2017-11-01T00:03:00.000+08:00],1m) order by device asc,time asc align by device -``` - -#### Order by arbitrary expressions - -```sql -IoTDB > select score from root.** order by score desc align by device -IoTDB > select score,total from root.one order by base+score+bonus desc -IoTDB > select score,total from root.one order by total desc -IoTDB > select base, score, bonus, total from root.** order by total desc NULLS Last, - score desc NULLS Last, - bonus desc NULLS Last, - time desc align by device -IoTDB > select min_value(total) from root.** order by min_value(total) asc align by device -IoTDB > select min_value(total),max_value(base) from root.** order by max_value(total) desc align by device -IoTDB > select score from root.** order by device asc, score desc, time asc align by device -``` - -### `ALIGN BY` CLAUSE - -#### Align by Device - -```sql -IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; -``` - -### `INTO` CLAUSE (QUERY WRITE-BACK) - -```sql -IoTDB > select s1, s2 into root.sg_copy.d1(t1), root.sg_copy.d2(t1, t2), root.sg_copy.d1(t2) from root.sg.d1, root.sg.d2; -IoTDB > select count(s1 + s2), last_value(s2) into root.agg.count(s1_add_s2), root.agg.last_value(s2) from root.sg.d1 group by ([0, 100), 10ms); -IoTDB > select s1, s2 into root.sg_copy.d1(t1, t2), root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; -IoTDB > select s1 + s2 into root.expr.add(d1s1_d1s2), root.expr.add(d2s1_d2s2) from root.sg.d1, root.sg.d2 align by device; -``` - -- Using variable placeholders: - -```sql -IoTDB > select s1, s2 -into root.sg_copy.d1(::), root.sg_copy.d2(s1), root.sg_copy.d1(${3}), root.sg_copy.d2(::) -from root.sg.d1, root.sg.d2; - -IoTDB > select d1.s1, d1.s2, d2.s3, d3.s4 -into ::(s1_1, s2_2), root.sg.d2_2(s3_3), root.${2}_copy.::(s4) -from root.sg; - -IoTDB > select * into root.sg_bk.::(::) from root.sg.**; - -IoTDB > select s1, s2, s3, s4 -into root.backup_sg.d1(s1, s2, s3, s4), root.backup_sg.d2(::), root.sg.d3(backup_${4}) -from root.sg.d1, root.sg.d2, root.sg.d3 -align by device; - -IoTDB > select avg(s1), sum(s2) + sum(s3), count(s4) -into root.agg_${2}.::(avg_s1, sum_s2_add_s3, count_s4) -from root.** -align by device; - -IoTDB > select * into ::(backup_${4}) from root.sg.** align by device; - -IoTDB > select s1, s2 into root.sg_copy.d1(t1, t2), aligned root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; -``` -## Maintennance -Generate the corresponding query plan: -``` -explain select s1,s2 from root.sg.d1 -``` -Execute the corresponding SQL, analyze the execution and output: -``` -explain analyze select s1,s2 from root.sg.d1 order by s1 -``` -## OPERATOR - -For more details, see document [Operator-and-Expression](../User-Manual/Operator-and-Expression.md). - -### Arithmetic Operators - -For details and examples, see the document [Arithmetic Operators and Functions](../Reference/Function-and-Expression.md#arithmetic-operators-and-functions). - -```sql -select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 -``` - -### Comparison Operators - -For details and examples, see the document [Comparison Operators and Functions](../Reference/Function-and-Expression.md#comparison-operators-and-functions). - -```sql -# Basic comparison operators -select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; - -# `BETWEEN ... 
AND ...` operator -select temperature from root.sg1.d1 where temperature between 36.5 and 40; -select temperature from root.sg1.d1 where temperature not between 36.5 and 40; - -# Fuzzy matching operator: Use `Like` for fuzzy matching -select * from root.sg.d1 where value like '%cc%' -select * from root.sg.device where value like '_b_' - -# Fuzzy matching operator: Use `Regexp` for fuzzy matching -select * from root.sg.d1 where value regexp '^[A-Za-z]+$' -select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 -select b, b like '1%', b regexp '[0-2]' from root.test; - -# `IS NULL` operator -select code from root.sg1.d1 where temperature is null; -select code from root.sg1.d1 where temperature is not null; - -# `IN` operator -select code from root.sg1.d1 where code in ('200', '300', '400', '500'); -select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); -select a, a in (1, 2) from root.test; -``` - -### Logical Operators - -For details and examples, see the document [Logical Operators](../Reference/Function-and-Expression.md#logical-operators). - -```sql -select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; -``` - -## BUILT-IN FUNCTIONS - -For more details, see document [Operator-and-Expression](../Reference/Function-and-Expression.md). - -### Aggregate Functions - -For details and examples, see the document [Aggregate Functions](../Reference/Function-and-Expression.md#aggregate-functions). - -```sql -select count(status) from root.ln.wf01.wt01; - -select count_if(s1=0 & s2=0, 3), count_if(s1=1 & s2=0, 3) from root.db.d1; -select count_if(s1=0 & s2=0, 3, 'ignoreNull'='false'), count_if(s1=1 & s2=0, 3, 'ignoreNull'='false') from root.db.d1; - -select time_duration(s1) from root.db.d1; -``` - -### Arithmetic Functions - -For details and examples, see the document [Arithmetic Operators and Functions](../Reference/Function-and-Expression.md#arithmetic-operators-and-functions). - -```sql -select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; -select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1; -``` - -### Comparison Functions - -For details and examples, see the document [Comparison Operators and Functions](../Reference/Function-and-Expression.md#comparison-operators-and-functions). - -```sql -select ts, on_off(ts, 'threshold'='2') from root.test; -select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; -``` - -### String Processing Functions - -For details and examples, see the document [String Processing](../Reference/Function-and-Expression.md#string-processing). 
- -```sql -select s1, string_contains(s1, 's'='warn') from root.sg1.d4; -select s1, string_matches(s1, 'regex'='[^\\s]+37229') from root.sg1.d4; -select s1, length(s1) from root.sg1.d1 -select s1, locate(s1, "target"="1") from root.sg1.d1 -select s1, locate(s1, "target"="1", "reverse"="true") from root.sg1.d1 -select s1, startswith(s1, "target"="1") from root.sg1.d1 -select s1, endswith(s1, "target"="1") from root.sg1.d1 -select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB") from root.sg1.d1 -select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB", "series_behind"="true") from root.sg1.d1 -select s1, substring(s1 from 1 for 2) from root.sg1.d1 -select s1, replace(s1, 'es', 'tt') from root.sg1.d1 -select s1, upper(s1) from root.sg1.d1 -select s1, lower(s1) from root.sg1.d1 -select s3, trim(s3) from root.sg1.d1 -select s1, s2, strcmp(s1, s2) from root.sg1.d1 -select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1 -select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1 -select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1 -select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1 -select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1 -select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 -``` - -### Data Type Conversion Function - -For details and examples, see the document [Data Type Conversion Function](../Reference/Function-and-Expression.md#data-type-conversion-function). - -```sql -SELECT cast(s1 as INT32) from root.sg -``` - -### Constant Timeseries Generating Functions - -For details and examples, see the document [Constant Timeseries Generating Functions](../Reference/Function-and-Expression.md#constant-timeseries-generating-functions). - -```sql -select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; -``` - -### Selector Functions - -For details and examples, see the document [Selector Functions](../Reference/Function-and-Expression.md#selector-functions). - -```sql -select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; -``` - -### Continuous Interval Functions - -For details and examples, see the document [Continuous Interval Functions](../Reference/Function-and-Expression.md#continuous-interval-functions). - -```sql -select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; -``` - -### Variation Trend Calculation Functions - -For details and examples, see the document [Variation Trend Calculation Functions](../Reference/Function-and-Expression.md#variation-trend-calculation-functions). - -```sql -select s1, time_difference(s1), difference(s1), non_negative_difference(s1), derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000; - -SELECT DIFF(s1), DIFF(s2) from root.test; -SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root.test; -``` - -### Sample Functions - -For details and examples, see the document [Sample Functions](../Reference/Function-and-Expression.md#sample-functions). 
- -```sql -select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; -select equal_size_bucket_agg_sample(temperature, 'type'='avg','proportion'='0.1') as agg_avg, equal_size_bucket_agg_sample(temperature, 'type'='max','proportion'='0.1') as agg_max, equal_size_bucket_agg_sample(temperature,'type'='min','proportion'='0.1') as agg_min, equal_size_bucket_agg_sample(temperature, 'type'='sum','proportion'='0.1') as agg_sum, equal_size_bucket_agg_sample(temperature, 'type'='extreme','proportion'='0.1') as agg_extreme, equal_size_bucket_agg_sample(temperature, 'type'='variance','proportion'='0.1') as agg_variance from root.ln.wf01.wt01; -select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01; -select equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='avg', 'number'='2') as outlier_avg_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='stendis', 'number'='2') as outlier_stendis_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='cos', 'number'='2') as outlier_cos_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='prenextdis', 'number'='2') as outlier_prenextdis_sample from root.ln.wf01.wt01; - -select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1 -select M4(s1,'windowSize'='10') from root.vehicle.d1 -``` - -### Change Points Function - -For details and examples, see the document [Time-Series](../Reference/Function-and-Expression.md#time-series-processing). - -```sql -select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 -``` - -## DATA QUALITY FUNCTION LIBRARY - -For more details, see document [Operator-and-Expression](../Reference/UDF-Libraries.md#). - -### Data Quality - -For details and examples, see the document [Data-Quality](../Reference/UDF-Libraries.md#data-quality). - -```sql -# Completeness -select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 - -# Consistency -select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 - -# Timeliness -select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 - -# Validity -select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 -select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 - -# Accuracy -select Accuracy(t1,t2,t3,m1,m2,m3) from root.test -``` - -### Data Profiling - -For details and examples, see the document [Data-Profiling](../Reference/UDF-Libraries.md#data-profiling). 
- -```sql -# ACF -select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 - -# Distinct -select distinct(s2) from root.test.d2 - -# Histogram -select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 - -# Integral -select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 -select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 - -# IntegralAvg -select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 - -# Mad -select mad(s0) from root.test -select mad(s0, "error"="0.01") from root.test - -# Median -select median(s0, "error"="0.01") from root.test - -# MinMax -select minmax(s1) from root.test - -# Mode -select mode(s2) from root.test.d2 - -# MvAvg -select mvavg(s1, "window"="3") from root.test - -# PACF -select pacf(s1, "lag"="5") from root.test - -# Percentile -select percentile(s0, "rank"="0.2", "error"="0.01") from root.test - -# Quantile -select quantile(s0, "rank"="0.2", "K"="800") from root.test - -# Period -select period(s1) from root.test.d3 - -# QLB -select QLB(s1) from root.test.d1 - -# Resample -select resample(s1,'every'='5m','interp'='linear') from root.test.d1 -select resample(s1,'every'='30m','aggr'='first') from root.test.d1 -select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 - -# Sample -select sample(s1,'method'='reservoir','k'='5') from root.test.d1 -select sample(s1,'method'='isometric','k'='5') from root.test.d1 - -# Segment -select segment(s1, "error"="0.1") from root.test - -# Skew -select skew(s1) from root.test.d1 - -# Spline -select spline(s1, "points"="151") from root.test - -# Spread -select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 - -# Stddev -select stddev(s1) from root.test.d1 - -# ZScore -select zscore(s1) from root.test -``` - -### Anomaly Detection - -For details and examples, see the document [Anomaly-Detection](../Reference/UDF-Libraries.md#anomaly-detection). - -```sql -# IQR -select iqr(s1) from root.test - -# KSigma -select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 - -# LOF -select lof(s1,s2) from root.test.d1 where time<1000 -select lof(s1, "method"="series") from root.test.d1 where time<1000 - -# MissDetect -select missdetect(s2,'minlen'='10') from root.test.d2 - -# Range -select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 - -# TwoSidedFilter -select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test - -# Outlier -select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test - -# MasterTrain -select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test - -# MasterDetect -select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test -select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test -``` - -### Frequency Domain - -For details and examples, see the document [Frequency-Domain](../Reference/UDF-Libraries.md#frequency-domain-analysis). 
- -```sql -# Conv -select conv(s1,s2) from root.test.d2 - -# Deconv -select deconv(s3,s2) from root.test.d2 -select deconv(s3,s2,'result'='remainder') from root.test.d2 - -# DWT -select dwt(s1,"method"="haar") from root.test.d1 - -# FFT -select fft(s1) from root.test.d1 -select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1 - -# HighPass -select highpass(s1,'wpass'='0.45') from root.test.d1 - -# IFFT -select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 - -# LowPass -select lowpass(s1,'wpass'='0.45') from root.test.d1 - -# Envelope -select envelope(s1) from root.test.d1 -``` - -### Data Matching - -For details and examples, see the document [Data-Matching](../Reference/UDF-Libraries.md#data-matching). - -```sql -# Cov -select cov(s1,s2) from root.test.d2 - -# DTW -select dtw(s1,s2) from root.test.d2 - -# Pearson -select pearson(s1,s2) from root.test.d2 - -# PtnSym -select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 - -# XCorr -select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 -``` - -### Data Repairing - -For details and examples, see the document [Data-Repairing](../Reference/UDF-Libraries.md#data-repairing). - -```sql -# TimestampRepair -select timestamprepair(s1,'interval'='10000') from root.test.d2 -select timestamprepair(s1) from root.test.d2 - -# ValueFill -select valuefill(s1) from root.test.d2 -select valuefill(s1,"method"="previous") from root.test.d2 - -# ValueRepair -select valuerepair(s1) from root.test.d2 -select valuerepair(s1,'method'='LsGreedy') from root.test.d2 - -# MasterRepair -select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test - -# SeasonalRepair -select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 -select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 -``` - -### Series Discovery - -For details and examples, see the document [Series-Discovery](../Reference/UDF-Libraries.md#series-discovery). - -```sql -# ConsecutiveSequences -select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 -select consecutivesequences(s1,s2) from root.test.d1 - -# ConsecutiveWindows -select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 -``` - -### Machine Learning - -For details and examples, see the document [Machine-Learning](../Reference/UDF-Libraries.md#machine-learning). - -```sql -# AR -select ar(s0,"p"="2") from root.test.d0 - -# Representation -select representation(s0,"tb"="3","vb"="2") from root.test.d0 - -# RM -select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 -``` - -## LAMBDA EXPRESSION - -For details and examples, see the document [Lambda](../Reference/Function-and-Expression.md#lambda-expression). - -```sql -select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01;``` -``` - -## CONDITIONAL EXPRESSION - -For details and examples, see the document [Conditional Expressions](../Reference/Function-and-Expression.md#conditional-expressions). 
- -```sql -select T, P, case -when 1000=1050 then "bad temperature" -when P<=1000000 or P>=1100000 then "bad pressure" -end as `result` -from root.test1 - -select str, case -when str like "%cc%" then "has cc" -when str like "%dd%" then "has dd" -else "no cc and dd" end as `result` -from root.test2 - -select -count(case when x<=1 then 1 end) as `(-∞,1]`, -count(case when 1 -[RESAMPLE - [EVERY ] - [BOUNDARY ] - [RANGE [, end_time_offset]] -] -[TIMEOUT POLICY BLOCKED|DISCARD] -BEGIN - SELECT CLAUSE - INTO CLAUSE - FROM CLAUSE - [WHERE CLAUSE] - [GROUP BY([, ]) [, level = ]] - [HAVING CLAUSE] - [FILL ({PREVIOUS | LINEAR | constant} (, interval=DURATION_LITERAL)?)] - [LIMIT rowLimit OFFSET rowOffset] - [ALIGN BY DEVICE] -END -``` - -### Configuring execution intervals - -```sql -CREATE CONTINUOUS QUERY cq1 -RESAMPLE EVERY 20s -BEGIN -SELECT max_value(temperature) - INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) - FROM root.ln.*.* - GROUP BY(10s) -END -``` - -### Configuring time range for resampling - -```sql -CREATE CONTINUOUS QUERY cq2 -RESAMPLE RANGE 40s -BEGIN - SELECT max_value(temperature) - INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) - FROM root.ln.*.* - GROUP BY(10s) -END -``` - -### Configuring execution intervals and CQ time ranges - -```sql -CREATE CONTINUOUS QUERY cq3 -RESAMPLE EVERY 20s RANGE 40s -BEGIN - SELECT max_value(temperature) - INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) - FROM root.ln.*.* - GROUP BY(10s) - FILL(100.0) -END -``` - -### Configuring end_time_offset for CQ time range - -```sql -CREATE CONTINUOUS QUERY cq4 -RESAMPLE EVERY 20s RANGE 40s, 20s -BEGIN - SELECT max_value(temperature) - INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) - FROM root.ln.*.* - GROUP BY(10s) - FILL(100.0) -END -``` - -### CQ without group by clause - -```sql -CREATE CONTINUOUS QUERY cq5 -RESAMPLE EVERY 20s -BEGIN - SELECT temperature + 1 - INTO root.precalculated_sg.::(temperature) - FROM root.ln.*.* - align by device -END -``` - -### CQ Management - -#### Listing continuous queries - -```sql -SHOW (CONTINUOUS QUERIES | CQS) -``` - -#### Dropping continuous queries - -```sql -DROP (CONTINUOUS QUERY | CQ) -``` - -#### Altering continuous queries - -CQs can't be altered once they're created. To change a CQ, you must `DROP` and re`CREATE` it with the updated settings. - -## USER-DEFINED FUNCTION (UDF) - -For more details, see document [Operator-and-Expression](../Reference/UDF-Libraries.md). - -### UDF Registration - -```sql -CREATE FUNCTION AS (USING URI URI-STRING)? 
-``` - -### UDF Deregistration - -```sql -DROP FUNCTION -``` - -### UDF Queries - -```sql -SELECT example(*) from root.sg.d1 -SELECT example(s1, *) from root.sg.d1 -SELECT example(*, *) from root.sg.d1 - -SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1; -SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1; - -SELECT s1, s2, example(s1, s2) FROM root.sg.d1; -SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN; -SELECT s1 * example(* / s1 + s2) FROM root.sg.d1; -SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1; -``` - -### Show All Registered UDFs - -```sql -SHOW FUNCTIONS -``` - -## ADMINISTRATION MANAGEMENT - -For more details, see document [Operator-and-Expression](../User-Manual/Operator-and-Expression.md). - -### SQL Statements - -- Create user (Requires MANAGE_USER permission) - -```SQL -CREATE USER -eg: CREATE USER user1 'passwd' -``` - -- Delete user (Requires MANAGE_USER permission) - -```sql -DROP USER -eg: DROP USER user1 -``` - -- Create role (Requires MANAGE_ROLE permission) - -```sql -CREATE ROLE -eg: CREATE ROLE role1 -``` - -- Delete role (Requires MANAGE_ROLE permission) - -```sql -DROP ROLE -eg: DROP ROLE role1 -``` - -- Grant role to user (Requires MANAGE_ROLE permission) - -```sql -GRANT ROLE TO -eg: GRANT ROLE admin TO user1 -``` - -- Revoke role from user(Requires MANAGE_ROLE permission) - -```sql -REVOKE ROLE FROM -eg: REVOKE ROLE admin FROM user1 -``` - -- List all user (Requires MANAGE_USER permission) - -```sql -LIST USER -``` - -- List all role (Requires MANAGE_ROLE permission) - -```sql -LIST ROLE -``` - -- List all users granted specific role.(Requires MANAGE_USER permission) - -```sql -LIST USER OF ROLE -eg: LIST USER OF ROLE roleuser -``` - -- List all role granted to specific user. 
- -```sql -LIST ROLE OF USER -eg: LIST ROLE OF USER tempuser -``` - -- List all privileges of user - -```sql -LIST PRIVILEGES OF USER ; -eg: LIST PRIVILEGES OF USER tempuser; -``` - -- List all privileges of role - -```sql -LIST PRIVILEGES OF ROLE ; -eg: LIST PRIVILEGES OF ROLE actor; -``` - -- Update password - -```sql -ALTER USER SET PASSWORD ; -eg: ALTER USER tempuser SET PASSWORD 'newpwd'; -``` - -### Authorization and Deauthorization - - -```sql -GRANT ON TO ROLE/USER [WITH GRANT OPTION]; -eg: GRANT READ ON root.** TO ROLE role1; -eg: GRANT READ_DATA, WRITE_DATA ON root.t1.** TO USER user1; -eg: GRANT READ_DATA, WRITE_DATA ON root.t1.**,root.t2.** TO USER user1; -eg: GRANT MANAGE_ROLE ON root.** TO USER user1 WITH GRANT OPTION; -eg: GRANT ALL ON root.** TO USER user1 WITH GRANT OPTION; -``` - -```sql -REVOKE ON FROM ROLE/USER ; -eg: REVOKE READ ON root.** FROM ROLE role1; -eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.** FROM USER user1; -eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.**, root.t2.** FROM USER user1; -eg: REVOKE MANAGE_ROLE ON root.** FROM USER user1; -eg: REVOKE ALL ON ROOT.** FROM USER user1; -``` - - -#### Delete Time Partition (experimental) - -``` -Eg: IoTDB > DELETE PARTITION root.ln 0,1,2 -``` - -#### Continuous Query,CQ - -``` -Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END -``` - -#### Maintenance Command - -- FLUSH - -``` -Eg: IoTDB > flush -``` - -- MERGE - -``` -Eg: IoTDB > MERGE -Eg: IoTDB > FULL MERGE -``` - -- CLEAR CACHE - -```sql -Eg: IoTDB > CLEAR CACHE -``` - -- START REPAIR DATA - -```sql -Eg: IoTDB > START REPAIR DATA -``` - -- STOP REPAIR DATA - -```sql -Eg: IoTDB > STOP REPAIR DATA -``` - -- SET SYSTEM TO READONLY / WRITABLE - -``` -Eg: IoTDB > SET SYSTEM TO READONLY / WRITABLE -``` - -- Query abort - -``` -Eg: IoTDB > KILL QUERY 1 -``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Tools-System/Benchmark.md b/src/UserGuide/V1.3.0-2/Tools-System/Benchmark.md deleted file mode 100644 index f63405b7c..000000000 --- a/src/UserGuide/V1.3.0-2/Tools-System/Benchmark.md +++ /dev/null @@ -1,336 +0,0 @@ - - -# Benchmark Tool - -IoT-benchmark is a time-series database benchmarking tool based on Java and big data environment, developed and open sourced by School of Software Tsinghua University. It is easy to use, supports multiple writing and query methods, supports storing test information and results for further query or analysis, and supports integration with Tableau to visualize test results. - -Figure 1-1 below includes the test benchmark process and other extended functions. These processes can be unified by IoT-benchmark. IoT Benchmark supports a variety of workloads, including **pure write, pure query, write query mixed**, etc., supports **software and hardware system monitoring, test metric measurement** and other monitoring functions, and also realizes **initializing the database automatically, test data analysis and system parameter optimization** functions. - -![img](/img/benchmark-English1.png) - - -Figure 1-1 - -Referring to the YCSB test tool's design idea of separating the three components of workload generation, performance metric measurement and database interface, the modular design of IoT-benchmark is shown in Figure 1-2. Different from the YCSB-based test tool system, IoT-benchmark adds a system monitoring module to support the persistence of test data and system monitoring data. 
In addition, some special load testing functions especially designed for time series data scenarios have been added, such as supporting batch writing and multiple out-of-sequence data writing modes for IoT scenarios. - -![img](/img/benchmark-%20English2.png) - - -Figure 1-2 - -Currently IoT-benchmark supports the following time series databases, versions and connection methods: - -| Database | Version | Connection method | -| --------------- | ------- | -------------------------------------------------------- | -| InfluxDB | v1.x, v2.0 | SDK | -| TimescaleDB | -- | jdbc | -| OpenTSDB | -- | Http Request | -| QuestDB | v6.0.7 | jdbc | -| TDengine | v2.2.0.2 | jdbc | -| VictoriaMetrics | v1.64.0 | Http Request | -| KairosDB | -- | Http Request | -| IoTDB | v1.x
v0.13 | jdbc、sessionByTablet、sessionByRecord、sessionByRecords | - -Table 1-1 Comparison of big data test benchmarks - -## Software Installation and Environment Setup - -### Prerequisites - -1. Java 8 -2. Maven 3.6+ -3. The corresponding appropriate version of the database, such as Apache IoTDB 1.0 - -### How to Get IoT Benchmark - -- **Get the binary package**: Enter https://github.com/thulab/iot-benchmark/releases to download the required installation package. Download it as a compressed file, select a folder to decompress and use it. -- Compiled from source (can be tested with Apache IoTDB 1.0): - - The first step (compile the latest IoTDB Session package): Enter the official website https://github.com/apache/iotdb/tree/rel/1.0 to download the IoTDB source code, and run the command `mvn clean package install -pl session -am -DskipTests` in the root directory to compiles the latest package for IoTDB Session. - - The second step (compile the IoTDB Benchmark test package): Enter the official website https://github.com/thulab/iot-benchmark to download the source code, run `mvn clean package install -pl iotdb-1.0 -am -DskipTests` in the root directory to compile Apache IoTDB version 1.0 test package. The relative path between the test package and the root directory is `./iotdb-1.0/target/iotdb-1.0-0.0.1/iotdb-1.0-0.0.1`. - -### IoT Benchmark's Test Package Structure - -The directory structure of the test package is shown in Figure 1-3 below. The test configuration file is conf/config.properties, and the test startup scripts are benchmark\.sh (Linux & MacOS) and benchmark.bat (Windows). The detailed usage of the files is shown in Table 1-2. - -![](/img/bm3.png) - -Figure 1-3 List of files and folders - -| Name | File | Usage | -| ---------------- | ----------------- | -------------------------------- | -| benchmark.bat | - | Startup script on Windows | -| benchmark\.sh | - | Startup script on Linux/Mac | -| conf | config.properties | Test scenario configuration file | -| logback.xml | - | Log output configuration file | -| lib | - | Dependency library | -| LICENSE | - | License file | -| bin | startup\.sh | Init script folder | -| ser-benchmark\.sh | - | Monitor mode startup script | - -Table 1-2 Usage list of files and folders - -### IoT Benchmark Execution Test - -1. Modify the configuration file according to the test requirements. For the main parameters, see next chapter. The corresponding configuration file is conf/config.properties. For example, to test Apache IoTDB 1.0, you need to modify DB_SWITCH=IoTDB-100-SESSION_BY_TABLET. -2. Start the time series database under test. -3. Running. -4. Start IoT-benchmark to execute the test. Observe the status of the time series database and IoT-benchmark under test during execution, and view the results and analyze the test process after execution. - -### IoT Benchmark Results Interpretation - -All the log files of the test are stored in the logs folder, and the test results are stored in the data/csvOutput folder after the test is completed. For example, after the test, we get the following result matrix: - -![](/img/bm4.png) - -- Result Matrix - - OkOperation: successful operations - - OkPoint: For write operations, it is the number of points successfully written; for query operations, it is the number of points successfully queried. 
- - FailOperation: failed operations - - FailPoint: For write operations, it is the number of write failure points -- Latency(mx) Matrix - - AVG: average operation time - - MIN: minimum operation time - - Pn: the quantile value of the overall distribution of operations, for example, P25 is the lower quartile. - -## Main Parameters - -This chapter mainly explains the purpose and configuration method of the main parameters. - -### Working Mode and Operation Proportion - -- The working mode parameter "BENCHMARK_WORK_MODE" can be selected as "default mode" and "server monitoring"; the "server monitoring" mode can be started directly by executing the ser-benchmark\.sh script, and the script will automatically modify this parameter. "Default mode" is a commonly used test mode, combined with the configuration of the OPERATION_PROPORTION parameter to achieve the definition of test operation proportions of "pure write", "pure query" and "read-write mix". - -- When running ServerMode to monitor the operating environment of the time series database under test, IoT-benchmark relies on sysstat software related commands; if MySQL or IoTDB is selected for persistent test process data, this type of database needs to be installed; the recording mode of ServerMode and CSV can only be used in the Linux system to record relevant system information during the test. Therefore, we recommend using MacOs or Linux system. This article uses Linux (Centos7) system as an example. If you use Windows system, you can use the benchmark.bat script in the conf folder to start IoT-benchmark. - -Table 1-3 Test mode - -| Mode Name | BENCHMARK_WORK_MODE | Description | -| ------------ | ------------------- | ------------------------------------------------------------ | -| default mode | testWithDefaultPath | Supports mixed workloads with multiple read and write operations | -| server mode | serverMODE | Server resource usage monitoring mode (running in this mode is started by the ser-benchmark\.sh script, no need to manually configure this parameter) | - -### Server Connection Information - -After the working mode is specified, how to inform IoT-benchmark of the information of the time series database under test? Currently, the type of the time-series database under test is informed through "DB_SWITCH"; the network address of the time-series database under test is informed through "HOST"; the network port of the time-series database under test is informed through "PORT"; the login user name of the time-series database under test is informed through "USERNAME"; "PASSWORD" informs the password of the login user of the time series database under test; informs the name of the time series database under test through "DB_NAME"; informs the connection authentication token of the time series database under test through "TOKEN" (used by InfluxDB 2.0). - -### Write Scene Setup Parameters - -Table 1-4 Write scene setup parameters - -| Parameter Name | Type | Example | Description | -| -------------------------- | --------- | ------------------------- | ------------------------------------------------------------ | -| CLIENT_NUMBER | Integer | 100 | Total number of clients | -| GROUP_NUMBER | Integer | 20 | Number of storage groups; only for IoTDB. 
| -| DEVICE_NUMBER | Integer | 100 | Total number of devices | -| SENSOR_NUMBER | Integer | 300 | Total number of sensors per device | -| INSERT_DATATYPE_PROPORTION | String | 1:1:1:1:1:1 | the data type proportion of the device, BOOLEAN:INT32:INT64:FLOAT:DOUBLE:TEXT | -| POINT_STEP | Integer | 1000 | Timestamp interval, that is, the fixed length between two timestamps of generated data. | -| OP_MIN_INTERVAL | Integer | 0 | Minimum operation execution interval: if the operation time is greater than this value, execute the next one immediately, otherwise wait (OP_MIN_INTERVAL-actual execution time) ms; if it is 0, the parameter will not take effect; if it is -1, its value is consistent with POINT_STEP. | -| IS_OUT_OF_ORDER | Boolean | false | Whether to write out of order | -| OUT_OF_ORDER_RATIO | Float | 0.3 | Ratio of data written out of order | -| BATCH_SIZE_PER_WRITE | Integer | 1 | Number of data rows written in batches (how many rows of data are written at a time) | -| START_TIME | Timestamp | 2022-10-30T00:00:00+08:00 | The start timestamp of writing data; use this timestamp as the starting point to start the simulation to create the data timestamp. | -| LOOP | Integer | 86400 | Total number of operations: Each type of operation will be divided according to the ratio defined by OPERATION_PROPORTION | -| OPERATION_PROPORTION | String | 1:0:0:0:0:0:0:0:0:0:0 | The ratio of each operation. Write:Q1:Q2:Q3:Q4:Q5:Q6:Q7:Q8:Q9:Q10, please note the use of English colons. Each term in the scale is an integer. | - -According to the configuration parameters in Table 1-4, the test scenario can be described as follows: write 30,000 (100 devices, 300 sensors for each device) time series sequential data for a day on October 30, 2022 to the time series database under test, in total 2.592 billion data points. The 300 sensor data types of each device are 50 Booleans, 50 integers, 50 long integers, 50 floats, 50 doubles, and 50 characters. If we change the value of IS_OUT_OF_ORDER in the table to true, then the scenario is: write 30,000 time series data on October 30, 2022 to the measured time series database, and there are 30% out of order data ( arrives in the time series database later than other data points whose generation time is later than itself). - -### Query Scene Setup Parameters - -Table 1-5 Query scene setup parameters - -| Parameter Name | Type | Example | Description | -| -------------------- | ------- | --------------------- | ------------------------------------------------------------ | -| QUERY_DEVICE_NUM | Integer | 2 | The number of devices involved in the query in each query statement. | -| QUERY_SENSOR_NUM | Integer | 2 | The number of sensors involved in the query in each query statement. | -| QUERY_AGGREGATE_FUN | String | count | Aggregate functions used in aggregate queries, such as count, avg, sum, max_time, etc. | -| STEP_SIZE | Integer | 1 | The change step of the starting time point of the time filter condition, if set to 0, the time filter condition of each query is the same, unit: POINT_STEP. | -| QUERY_INTERVAL | Integer | 250000 | The time interval between the start time and the end time in the start and end time query, and the time interval in Group By. | -| QUERY_LOWER_VALUE | Integer | -5 | Parameters for conditional query clauses, where xxx > QUERY_LOWER_VALUE. | -| GROUP_BY_TIME_UNIT | Integer | 20000 | The size of the group in the Group By statement. | -| LOOP | Integer | 10 | Total number of operations. 
Each type of operation will be divided according to the ratio defined by OPERATION_PROPORTION. | -| OPERATION_PROPORTION | String | 0:0:0:0:0:0:0:0:0:0:1 | Write:Q1:Q2:Q3:Q4:Q5:Q6:Q7:Q8:Q9:Q10 | - -Table 1-6 Query types and example SQL - -| Id | Query Type | IoTDB Example SQL | -| ---- | ---------------------------------------------------- | ------------------------------------------------------------ | -| Q1 | exact point query | select v1 from root.db.d1 where time = ? | -| Q2 | time range query | select v1 from root.db.d1 where time > ? and time < ? | -| Q3 | time range query with value filtering | select v1 from root.db.d1 where time > ? and time < ? and v1 > ? | -| Q4 | time range aggregation query | select count(v1) from root.db.d1 where and time > ? and time < ? | -| Q5 | full time range aggregate query with value filtering | select count(v1) from root.db.d1 where v1 > ? | -| Q6 | time range aggregation query with value filtering | select count(v1) from root.db.d1 where v1 > ? and time > ? and time < ? | -| Q7 | time grouping aggregation query | select count(v1) from root.db.d1 group by ([?, ?), ?, ?) | -| Q8 | latest point query | select last v1 from root.db.d1 | -| Q9 | reverse order time range query | select v1 from root.sg.d1 where time > ? and time < ? order by time desc | -| Q10 | reverse order time range query with value filtering | select v1 from root.sg.d1 where time > ? and time < ? and v1 > ? order by time desc | - -According to the configuration parameters in Table 1-5, the test scenario can be described as follows: Execute 10 reverse order time range queries with value filtering for 2 devices and 2 sensors from the time series database under test. The SQL statement is: `select s_0,s_31from data where time >2022-10-30T00:00:00+08:00 and time < 2022-10-30T00:04:10+08:00 and s_0 > -5 and device in d_21,d_46 order by time desc`. - -### Persistence of Test Process and Test Results - -IoT-benchmark currently supports persisting the test process and test results to IoTDB, MySQL, and CSV through the configuration parameter "TEST_DATA_PERSISTENCE"; writing to MySQL and CSV can define the upper limit of the number of rows in the sub-database and sub-table, such as "RECORD_SPLIT=true, RECORD_SPLIT_MAX_LINE=10000000" means that each database table or CSV file is divided and stored according to the total number of 10 million rows; if the records are recorded to MySQL or IoTDB, database link information needs to be provided, including "TEST_DATA_STORE_IP" the IP address of the database, "TEST_DATA_STORE_PORT" the port number of the database, "TEST_DATA_STORE_DB" the name of the database, "TEST_DATA_STORE_USER" the database user name, and "TEST_DATA_STORE_PW" the database user password. - -If we set "TEST_DATA_PERSISTENCE=CSV", we can see the newly generated data folder under the IoT-benchmark root directory during and after the test execution, which contains the csv folder to record the test process; the csvOutput folder to record the test results . If we set "TEST_DATA_PERSISTENCE=MySQL", it will create a data table named "testWithDefaultPath_tested database name_remarks_test start time" in the specified MySQL database before the test starts to record the test process; it will record the test process in the "CONFIG" data table (create the table if it does not exist), write the configuration information of this test; when the test is completed, the result of this test will be written in the data table named "FINAL_RESULT" (create the table if it does not exist). 
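As a minimal sketch, the persistence-related section of `conf/config.properties` could look as follows. The MySQL address, database name and credentials below are placeholders (the same values appear in the use case of the next chapter); adjust them to your own environment:

```
# Persist the test process and results to MySQL (CSV and IoTDB are also supported)
TEST_DATA_PERSISTENCE=MySQL
# Split the recorded data into tables/files of at most 10 million rows each
RECORD_SPLIT=true
RECORD_SPLIT_MAX_LINE=10000000
# Connection information of the database that stores the test records (placeholders)
TEST_DATA_STORE_IP=172.21.4.5
TEST_DATA_STORE_PORT=3306
TEST_DATA_STORE_DB=demo
TEST_DATA_STORE_USER=root
TEST_DATA_STORE_PW=admin
```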
- -## Use Case - -We take the application of CRRC Qingdao Sifang Vehicle Research Institute Co., Ltd. as an example, and refer to the scene described in "Apache IoTDB in Intelligent Operation and Maintenance Platform Storage" for practical operation instructions. - -Test objective: Simulate the actual needs of switching time series databases in the scene of CRRC Qingdao Sifang Institute, and compare the performance of the expected IoTDB and KairosDB used by the original system. - -Test environment: In order to ensure that the impact of other irrelevant services and processes on database performance and the mutual influence between different databases are eliminated during the experiment, the local databases in this experiment are deployed and run on multiple independent virtual servers with the same resource configuration. Therefore, this experiment set up 4 Linux (CentOS7 /x86) virtual machines, and deployed IoT-benchmark, IoTDB database, KairosDB database, and MySQL database on them respectively. The specific resource configuration of each virtual machine is shown in Table 2-1. The specific usage of each virtual machine is shown in Table 2-2. - -Table 2-1 Virtual machine configuration information - -| Hardware Configuration Information | Value | -| ---------------------------------- | ------- | -| OS system | CentOS7 | -| number of CPU cores | 16 | -| memory | 32G | -| hard disk | 200G | -| network | Gigabit | - -Table 2-2 Virtual machine usage - -| IP | Usage | -| ---------- | ------------- | -| 172.21.4.2 | IoT-benchmark | -| 172.21.4.3 | Apache-iotdb | -| 172.21.4.4 | KaiosDB | -| 172.21.4.5 | MySQL | - -### Write Test - -Scenario description: Create 100 clients to simulate 100 trains, each train has 3000 sensors, the data type is DOUBLE, the data time interval is 500ms (2Hz), and they are sent sequentially. Referring to the above requirements, we need to modify the IoT-benchmark configuration parameters as listed in Table 2-3. - -Table 2-3 Configuration parameter information - -| Parameter Name | IoTDB Value | KairosDB Value | -| -------------------------- | --------------------------- | -------------- | -| DB_SWITCH | IoTDB-013-SESSION_BY_TABLET | KairosDB | -| HOST | 172.21.4.3 | 172.21.4.4 | -| PORT | 6667 | 8080 | -| BENCHMARK_WORK_MODE | testWithDefaultPath | | -| OPERATION_PROPORTION | 1:0:0:0:0:0:0:0:0:0:0 | | -| CLIENT_NUMBER | 100 | | -| GROUP_NUMBER | 10 | | -| DEVICE_NUMBER | 100 | | -| SENSOR_NUMBER | 3000 | | -| INSERT_DATATYPE_PROPORTION | 0:0:0:0:1:0 | | -| POINT_STEP | 500 | | -| OP_MIN_INTERVAL | 0 | | -| IS_OUT_OF_ORDER | false | | -| BATCH_SIZE_PER_WRITE | 1 | | -| LOOP | 10000 | | -| TEST_DATA_PERSISTENCE | MySQL | | -| TEST_DATA_STORE_IP | 172.21.4.5 | | -| TEST_DATA_STORE_PORT | 3306 | | -| TEST_DATA_STORE_DB | demo | | -| TEST_DATA_STORE_USER | root | | -| TEST_DATA_STORE_PW | admin | | -| REMARK | demo | | - -First, start the tested time series databases Apache-IoTDB and KairosDB on 172.21.4.3 and 172.21.4.4 respectively, and then start server resource monitoring through the ser-benchamrk\.sh script on 172.21.4.2, 172.21.4.3 and 172.21.4.4 (Figure 2-1). Then modify the conf/config.properties files in the iotdb-0.13-0.0.1 and kairosdb-0.0.1 folders in 172.21.4.2 according to Table 2-3 to meet the test requirements. Use benchmark\.sh to start the writing test of Apache-IoTDB and KairosDB successively. 
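A rough sketch of this sequence on the command line (paths simplified; it assumes the scripts are run from the corresponding unpacked test package folders and that both configuration files have already been edited according to Table 2-3):

```shell
# Start server resource monitoring on 172.21.4.2, 172.21.4.3 and 172.21.4.4
./ser-benchmark.sh

# On 172.21.4.2: run the write test for each database with its own configuration
cd iotdb-0.13-0.0.1
./benchmark.sh        # Apache-IoTDB scenario (Table 2-3)

cd ../kairosdb-0.0.1
./benchmark.sh        # KairosDB scenario (Table 2-3)
```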
- -![](/img/bm5.png) - -Figure 2-1 Server monitoring tasks - -For example, if we first start the test on KairosDB, IoT-benchmark will create a CONFIG data table in the MySQL database to store the configuration information of this test (Figure 2-2), and there will be a log output of the current test progress during the test execution (Figure 2-3) . When the test is completed, the test result will be output (Figure 2-3), and the result will be written into the FINAL_RESULT data table (Figure 2-4). - -![](/img/bm6.png) - -Figure 2-2 Test configuration information table - -![](/img/bm7.png) -![](/img/bm8.png) -![](/img/bm9.png) -![](/img/bm10.png) - -Figure 2-3 Test progress and results - -![](/img/bm11.png) - -Figure 2-4 Test result table - -Afterwards, we will start the test on Apache-IoTDB. The same IoT-benchmark will write the test configuration information in the MySQL database CONFIG data table. During the test execution, there will be a log to output the current test progress. When the test is completed, the test result will be output, and the result will be written into the FINAL_RESULT data table. - -According to the test result information, we know that under the same configuration the write delay times of Apache-IoTDB and KairosDB are 55.98ms and 1324.45ms respectively; the write throughputs are 5,125,600.86 points/second and 224,819.01 points/second respectively; the tests were executed respectively 585.30 seconds and 11777.99 seconds. And KairosDB has a write failure. After investigation, it is found that the data disk usage has reached 100%, and there is no disk space to continue receiving data. However, Apache-IoTDB has no write failure, and the disk space occupied after all data is written is only 4.7G (as shown in Figure 2-5); Apache-IoTDB is better than KairosDB in terms of write throughput and disk occupation. Of course, there will be other tests in the follow-up to observe and compare from various aspects, such as query performance, file compression ratio, data security, etc. - -![](/img/bm12.png) - -Figure 2-5 Disk usage - -So what is the resource usage of each server during the test? What is the specific performance of each write operation? At this time, we can visualize the data in the server monitoring table and test process recording table by installing and using Tableau. The use of Tableau will not be introduced in this article. After connecting to the data table for test data persistence, the specific results are as follows (taking Apache-IoTDB as an example): - -![](/img/bm13.png) -![](/img/bm14.png) - -Figure 2-6 Visualization of testing process in Tableau - -### Query Test - -Scenario description: In the writing test scenario, 10 clients are simulated to perform all types of query tasks on the data stored in the time series database Apache-IoTDB. The configuration is as follows. 
- -Table 2-4 Configuration parameter information - -| Parameter Name | Example | -| -------------------- | --------------------- | -| CLIENT_NUMBER | 10 | -| QUERY_DEVICE_NUM | 2 | -| QUERY_SENSOR_NUM | 2 | -| QUERY_AGGREGATE_FUN | count | -| STEP_SIZE | 1 | -| QUERY_INTERVAL | 250000 | -| QUERY_LOWER_VALUE | -5 | -| GROUP_BY_TIME_UNIT | 20000 | -| LOOP | 30 | -| OPERATION_PROPORTION | 0:1:1:1:1:1:1:1:1:1:1 | - -Results: - -![img](/img/bm15.png) - -Figure 2-7 Query test results - -### Description of Other Parameters - -In the previous chapters, the write performance comparison between Apache-IoTDB and KairosDB was performed, but if the user wants to perform a simulated real write rate test, how to configure it? How to control if the test time is too long? Are there any regularities in the generated simulated data? If the IoT-Benchmark server configuration is low, can multiple machines be used to simulate pressure output? - -Table 2-5 Configuration parameter information - -| Scenario | Parameter | Value | Notes | -| ------------------------------------------------------------ | -------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -| Simulate real write rate | OP_INTERVAL | -1 | You can also enter an integer to control the operation interval. | -| Specify test duration (1 hour) | TEST_MAX_TIME | 3600000 | The unit is ms; the LOOP execution time needs to be greater than this value. | -| Define the law of simulated data: support all data types, and the number is evenly classified; support five data distributions, and the number is evenly distributed; the length of the string is 10; the number of decimal places is 2. | INSERT_DATATYPE_PROPORTION | 1:1:1:1:1:1 | Data type distribution proportion | -| LINE_RATIO | 1 | linear | | -| SIN_RATIO | 1 | Fourier function | | -| SQUARE_RATIO | 1 | Square wave | | -| RANDOM_RATIO | 1 | Random number | | -| CONSTANT_RATIO | 1 | Constant | | -| STRING_LENGTH | 10 | String length | | -| DOUBLE_LENGTH | 2 | Decimal places | | -| Three machines simulate data writing of 300 devices | BENCHMARK_CLUSTER | true | Enable multi-benchmark mode | -| BENCHMARK_INDEX | 0, 1, 3 | Take the writing parameters in the [write test](./Benchmark.md#write-test) as an example: No. 0 is responsible for writing data of device numbers 0-99; No. 1 is responsible for writing data of device numbers 100-199; No. 2 is responsible for writing data of device numbers 200-299. | | \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Tools-System/CLI.md b/src/UserGuide/V1.3.0-2/Tools-System/CLI.md deleted file mode 100644 index d5034ea71..000000000 --- a/src/UserGuide/V1.3.0-2/Tools-System/CLI.md +++ /dev/null @@ -1,295 +0,0 @@ - - -# Command Line Interface (CLI) - - -IoTDB provides Cli/shell tools for users to interact with IoTDB server in command lines. This document shows how Cli/shell tool works and the meaning of its parameters. - -> Note: In this document, \$IOTDB\_HOME represents the path of the IoTDB installation directory. - -## Installation - -If you use the source code version of IoTDB, then under the root path of IoTDB, execute: - -```shell -> mvn clean package -pl iotdb-client/cli -am -DskipTests -P get-jar-with-dependencies -``` - -After build, the IoTDB Cli will be in the folder "cli/target/iotdb-cli-{project.version}". - -If you download the binary version, then the Cli can be used directly in sbin folder. 
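For the binary distribution, a minimal sketch of locating the CLI (the archive name is an assumption based on the release naming; use the file you actually downloaded):

```shell
# Unpack the downloaded binary distribution and locate the CLI start scripts
unzip apache-iotdb-1.3.0-all-bin.zip
cd apache-iotdb-1.3.0-all-bin
ls sbin | grep start-cli    # start-cli.sh for Linux/MacOS, start-cli.bat for Windows
```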
- -## Running - -### Running Cli - -After installation, there is a default user in IoTDB: `root`, and the -default password is `root`. Users can use this username to try IoTDB Cli/Shell tool. The cli startup script is the `start-cli` file under the \$IOTDB\_HOME/bin folder. When starting the script, you need to specify the IP and PORT. (Make sure the IoTDB cluster is running properly when you use Cli/Shell tool to connect to it.) - -Here is an example where the cluster is started locally and the user has not changed the running port. The default rpc port is -6667
-If you need to connect to a remote DataNode, or if the RPC port of the DataNode has been changed, set the corresponding IP and RPC port with -h and -p.
-You also can set your own environment variable at the front of the start script ("/sbin/start-cli.sh" for linux and "/sbin/start-cli.bat" for windows) - -The Linux and MacOS system startup commands are as follows: - -```shell -Shell > bash sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root -``` - -The Windows system startup commands are as follows: - -```shell -Shell > sbin\start-cli.bat -h 127.0.0.1 -p 6667 -u root -pw root -``` - -After operating these commands, the cli can be started successfully. The successful status will be as follows: - -``` - _____ _________ ______ ______ -|_ _| | _ _ ||_ _ `.|_ _ \ - | | .--.|_/ | | \_| | | `. \ | |_) | - | | / .'`\ \ | | | | | | | __'. - _| |_| \__. | _| |_ _| |_.' /_| |__) | -|_____|'.__.' |_____| |______.'|_______/ version - - -Successfully login at 127.0.0.1:6667 -IoTDB> -``` - -Enter ```quit``` or `exit` can exit Cli. - -### Cli Parameters - -| Parameter name | Parameter type | Required | Description | Example | -| :--------------------------- | :------------------------- | :------- | :----------------------------------------------------------- | :------------------ | -| -disableISO8601 | No parameters | No | If this parameter is set, IoTDB will print the timestamp in digital form | -disableISO8601 | -| -h <`host`> | string, no quotation marks | Yes | The IP address of the IoTDB server | -h 10.129.187.21 | -| -help | No parameters | No | Print help information for IoTDB | -help | -| -p <`rpcPort`> | int | Yes | The rpc port number of the IoTDB server. IoTDB runs on rpc port 6667 by default | -p 6667 | -| -pw <`password`> | string, no quotation marks | No | The password used for IoTDB to connect to the server. If no password is entered, IoTDB will ask for password in Cli command | -pw root | -| -u <`username`> | string, no quotation marks | Yes | User name used for IoTDB to connect the server | -u root | -| -maxPRC <`maxPrintRowCount`> | int | No | Set the maximum number of rows that IoTDB returns | -maxPRC 10 | -| -e <`execute`> | string | No | manipulate IoTDB in batches without entering cli input mode | -e "show databases" | -| -c | empty | No | If the server enables `rpc_thrift_compression_enable=true`, then cli must use `-c` | -c | - -Following is a cli command which connects the host with IP -10.129.187.21, rpc port 6667, username "root", password "root", and prints the timestamp in digital form. The maximum number of lines displayed on the IoTDB command line is 10. - -The Linux and MacOS system startup commands are as follows: - -```shell -Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 -``` - -The Windows system startup commands are as follows: - -```shell -Shell > sbin\start-cli.bat -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 -``` - -### CLI Special Command - -Special commands of Cli are below. - -| Command | Description / Example | -| :-------------------------- | :------------------------------------------------------ | -| `set time_display_type=xxx` | eg. long, default, ISO8601, yyyy-MM-dd HH:mm:ss | -| `show time_display_type` | show time display type | -| `set time_zone=xxx` | eg. 
+08:00, Asia/Shanghai |
-| `show time_zone` | show cli time zone |
-| `set fetch_size=xxx` | set fetch size when querying data from server |
-| `show fetch_size` | show fetch size |
-| `set max_display_num=xxx` | set max lines for cli to output, -1 means unlimited |
-| `help` | Get hints for CLI special commands |
-| `exit/quit` | Exit CLI |
-
-### Note on using the CLI with OpenID Connect Auth enabled on Server side
-
-The following walkthrough uses Keycloak as the OpenID Connect (OIDC) authentication service.
-
-#### Configuration
-
-The configuration is located in iotdb-common.properties. Setting `authorizer_provider_class` to `org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer` enables the OpenID service; the default value `org.apache.iotdb.db.auth.authorizer.LocalFileAuthorizer` means the OpenID service is not enabled.
-
-```
-authorizer_provider_class=org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer
-```
-
-If the OpenID service is enabled, `openID_url` is required; its value has the form http://ip:port/realms/{realmsName}
-
-```
-openID_url=http://127.0.0.1:8080/realms/iotdb/
-```
-
-#### Keycloak configuration
-
-1. Download Keycloak (this tutorial uses version 21.1.0) and start it from keycloak/bin
-
-```shell
-Shell >cd bin
-Shell >./kc.sh start-dev
-```
-
-2. Open the URL (https://ip:port) to log in to Keycloak; on the first login you need to create a user
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/login_keycloak.png?raw=true)
-
-3. Click Administration Console
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/AdministrationConsole.png?raw=true)
-
-4. In the master menu on the left, click Create realm and enter a Realm name to create a new realm
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/add_Realm_1.jpg?raw=true)
-
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/add_Realm_2.jpg?raw=true)
-
-
-5. Click the Clients menu on the left to create a client
-
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/client.jpg?raw=true)
-
-6. Click the Users menu on the left to create a user
-
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/user.jpg?raw=true)
-
-7. Click the newly created user ID, open the Credentials tab, enter the password and turn off the Temporary option. The Keycloak configuration is now complete
-
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/pwd.jpg?raw=true)
-
-8. To create a role, click Roles on the left menu and then click the Create Role button
-
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/add_role1.jpg?raw=true)
-
-9. Enter `iotdb_admin` as the Role Name and click the save button. Tip: the role name here must be exactly `iotdb_admin`; otherwise, even after a successful login, you will not have permission to query, insert, create databases, add users or roles, or use other IoTDB functions
-
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/add_role2.jpg?raw=true)
-
-10. Click the Users menu on the left, then click the user in the user list and add the newly created `iotdb_admin` role to that user
-
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/add_role3.jpg?raw=true)
-
-11. Select Role Mappings, and select the `iotdb_admin` role in Assign Role
-
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/add_role4.jpg?raw=true)
-
-![avatar](/img/UserGuide/CLI/Command-Line-Interface/add_role5.jpg?raw=true)
-
-
-Tip: if a user's roles are adjusted, you need to regenerate the token and log in to IoTDB again for the change to take effect
-
-The above steps provide one way to log in to IoTDB through Keycloak.
For more ways, please refer to keycloak configuration - -If OIDC is enabled on server side then no username / passwort is needed but a valid Access Token from the OIDC Provider. -So as username you use the token and the password has to be empty, e.g. - -```shell -Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u {my-access-token} -pw "" -``` - -Among them, you need to replace {my access token} (note, including {}) with your token, that is, the value corresponding to access_token. The password is empty and needs to be confirmed again. - -![avatar](/img/UserGuide/CLI/Command-Line-Interface/iotdbpw.jpeg?raw=true) - - -How to get the token is dependent on your OpenID Connect setup and not covered here. -In the simplest case you can get this via the command line with the `passwort-grant`. -For example, if you use keycloack as OIDC and you have a realm with a client `iotdb` defined as public you could use -the following `curl` command to fetch a token (replace all `{}` with appropriate values). - -```shell -curl -X POST "https://{your-keycloack-server}/realms/{your-realm}/protocol/openid-connect/token" \ - -H "Content-Type: application/x-www-form-urlencoded" \ - -d "username={username}" \ - -d "password={password}" \ - -d 'grant_type=password' \ - -d "client_id=iotdb-client" -``` - -The response looks something like - -```json -{"access_token":"eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJxMS1XbTBvelE1TzBtUUg4LVNKYXAyWmNONE1tdWNXd25RV0tZeFpKNG93In0.eyJleHAiOjE1OTAzOTgwNzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNjA0ZmYxMDctN2NiNy00NTRmLWIwYmQtY2M2ZDQwMjFiNGU4IiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiYWNjb3VudCIsInN1YiI6ImJhMzJlNDcxLWM3NzItNGIzMy04ZGE2LTZmZThhY2RhMDA3MyIsInR5cCI6IkJlYXJlciIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsImFjciI6IjEiLCJhbGxvd2VkLW9yaWdpbnMiOlsibG9jYWxob3N0OjgwODAiXSwicmVhbG1fYWNjZXNzIjp7InJvbGVzIjpbIm9mZmxpbmVfYWNjZXNzIiwidW1hX2F1dGhvcml6YXRpb24iLCJpb3RkYl9hZG1pbiJdfSwicmVzb3VyY2VfYWNjZXNzIjp7ImFjY291bnQiOnsicm9sZXMiOlsibWFuYWdlLWFjY291bnQiLCJtYW5hZ2UtYWNjb3VudC1saW5rcyIsInZpZXctcHJvZmlsZSJdfX0sInNjb3BlIjoiZW1haWwgcHJvZmlsZSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJ1c2VyIn0.nwbrJkWdCNjzFrTDwKNuV5h9dDMg5ytRKGOXmFIajpfsbOutJytjWTCB2WpA8E1YI3KM6gU6Jx7cd7u0oPo5syHhfCz119n_wBiDnyTZkFOAPsx0M2z20kvBLN9k36_VfuCMFUeddJjO31MeLTmxB0UKg2VkxdczmzMH3pnalhxqpnWWk3GnrRrhAf2sZog0foH4Ae3Ks0lYtYzaWK_Yo7E4Px42-gJpohy3JevOC44aJ4auzJR1RBj9LUbgcRinkBy0JLi6XXiYznSC2V485CSBHW3sseXn7pSXQADhnmGQrLfFGO5ZljmPO18eFJaimdjvgSChsrlSEmTDDsoo5Q","expires_in":300,"refresh_expires_in":1800,"refresh_token":"eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJhMzZlMGU0NC02MWNmLTQ5NmMtOGRlZi03NTkwNjQ5MzQzMjEifQ.eyJleHAiOjE1OTAzOTk1NzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNmMxNTBiY2EtYmE5NC00NTgxLWEwODEtYjI2YzhhMmI5YmZmIiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwic3ViIjoiYmEzMmU0NzEtYzc3Mi00YjMzLThkYTYtNmZlOGFjZGEwMDczIiwidHlwIjoiUmVmcmVzaCIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsInNjb3BlIjoiZW1haWwgcHJvZmlsZSJ9.ayNpXdNX28qahodX1zowrMGiUCw2AodlHBQFqr8Ui7c","token_type":"bearer","not-before-policy":0,"session_state":"060d2862-14ed-42fe-baf7-8d1f784657f1","scope":"email profile"} -``` - -The interesting part here is the access token with the key `access_token`. 
-This has to be passed as username (with parameter `-u`) and empty password to the CLI. - -### Batch Operation of Cli - --e parameter is designed for the Cli/shell tool in the situation where you would like to manipulate IoTDB in batches through scripts. By using the -e parameter, you can operate IoTDB without entering the cli's input mode. - -In order to avoid confusion between statements and other parameters, the current version only supports the -e parameter as the last parameter. - -The usage of -e parameter for Cli/shell is as follows: - -The Linux and MacOS system commands: - -```shell -Shell > bash sbin/start-cli.sh -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} -``` - -The Windows system commands: - -```shell -Shell > sbin\start-cli.bat -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} -``` - -In the Windows environment, the SQL statement of the -e parameter needs to use ` `` ` to replace `" "` - -In order to better explain the use of -e parameter, take following as an example(On linux system). - -Suppose you want to create a database root.demo to a newly launched IoTDB, create a timeseries root.demo.s1 and insert three data points into it. With -e parameter, you could write a shell like this: - -```shell -# !/bin/bash - -host=127.0.0.1 -rpcPort=6667 -user=root -pass=root - -bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "create database root.demo" -bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "create timeseries root.demo.s1 WITH DATATYPE=INT32, ENCODING=RLE" -bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(1,10)" -bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(2,11)" -bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(3,12)" -bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "select s1 from root.demo" -``` - -The results are shown in the figure, which are consistent with the Cli and jdbc operations. - -```shell - Shell > bash ./shell.sh -+-----------------------------+------------+ -| Time|root.demo.s1| -+-----------------------------+------------+ -|1970-01-01T08:00:00.001+08:00| 10| -|1970-01-01T08:00:00.002+08:00| 11| -|1970-01-01T08:00:00.003+08:00| 12| -+-----------------------------+------------+ -Total line number = 3 -It costs 0.267s -``` - -It should be noted that the use of the -e parameter in shell scripts requires attention to the escaping of special characters. diff --git a/src/UserGuide/V1.3.0-2/Tools-System/Data-Import-Export-Tool.md b/src/UserGuide/V1.3.0-2/Tools-System/Data-Import-Export-Tool.md deleted file mode 100644 index 3d082fe8f..000000000 --- a/src/UserGuide/V1.3.0-2/Tools-System/Data-Import-Export-Tool.md +++ /dev/null @@ -1,278 +0,0 @@ - - -# Data Import Export Script - -IoTDB provides data import and export scripts (tools/export-data, tools/import-data, supported in versions 1.3.2 and above; for historical versions, tools/export-csv, tools/import-csv scripts can be used, see the reference link for usage [Document](./TsFile-Import-Export-Tool.md) ), which are used to facilitate the interaction between IoTDB internal data and external files, suitable for batch operations of single files or directories. 
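-
-As a quick orientation, a minimal export-then-import round trip might look like the following sketch (the connection parameters, query, and paths are illustrative only; the full option lists are described in the sections below):
-
-```Bash
-# export the result of a query to CSV files under ./data/
->tools/export-data.sh -h 127.0.0.1 -p 6667 -u root -pw root -t ./data/ -q "select * from root.demo.**"
-# re-import the exported files, writing rows that fail to import to ./failed/
->tools/import-data.sh -h 127.0.0.1 -p 6667 -u root -pw root -s ./data/ -fd ./failed/
-```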
-
-
-## Supported Data Formats
-
-- **CSV**: a plain text format for storing structured data; it must follow the CSV format specified below.
-- **SQL**: files containing custom SQL statements.
-
-## export-data Script (Data Export)
-
-### Command
-
-```Bash
-# Unix/OS X
->tools/export-data.sh -h <host> -p <rpcPort> -u <user> -pw <password> -t <targetDir> [-tf <timeFormat> -datatype <true/false> -q <query> -s <sqlFile> -tfn <targetFileName> -lpf <linesPerFile> -type <csv/sql> -aligned <true/false>]
-
-# Windows
->tools\export-data.bat -h <host> -p <rpcPort> -u <user> -pw <password> -t <targetDir> [-tf <timeFormat> -datatype <true/false> -q <query> -s <sqlFile> -tfn <targetFileName> -lpf <linesPerFile> -type <csv/sql> -aligned <true/false>]
-```
-
-Parameter Introduction:
-
-| Parameter | Definition | Required | Default |
-|:----------|:-----------|:---------|:--------|
-| -h | Database IP address | No | 127.0.0.1 |
-| -p | Database port | No | 6667 |
-| -u | Database connection username | No | root |
-| -pw | Database connection password | No | root |
-| -t | Output path for the exported CSV or SQL files (the parameter for V1.3.2 is `-td`) | Yes | |
-| -datatype | Whether to print the corresponding data type after each time series in the CSV file header, options are true or false | No | true |
-| -q | Directly specify the query statement to execute (currently only some statements are supported, see the table below for details). Note: exactly one of -q and -s must be provided; if both are provided, -q takes effect. For detailed examples of supported SQL statements, please refer to the "SQL Statement Support Rules" below. | No | |
-| -s | Specify an SQL file, which may contain one or more SQL statements. Multiple SQL statements should be separated by newlines. Each SQL statement corresponds to one or more output CSV or SQL files. Note: exactly one of -q and -s must be provided; if both are provided, -q takes effect. For detailed examples of supported SQL statements, please refer to the "SQL Statement Support Rules" below. | No | |
-| -type | Specify the type of the exported file, options are csv or sql | No | csv |
-| -tf | Specify the time format. The time format must comply with the [ISO 8601](https://calendars.wikia.org/wiki/ISO_8601) standard or be a timestamp. Note: only effective when -type is csv | No | yyyy-MM-dd HH:mm:ss.SSSz |
-| -lpf | Specify the maximum number of lines per exported dump file (the parameter for V1.3.2 is `-linesPerFile`) | No | 10000 |
-| -timeout | Specify the timeout for session queries in milliseconds | No | -1 |
-
-SQL Statement Support Rules:
-
-1. Only query statements are supported; non-query statements (such as metadata management, system management, etc.) are not supported. For unsupported SQL, the program automatically skips the statement and prints an error message.
-2. In the current version, only raw data can be exported. Queries containing group by, aggregate functions, UDFs, or arithmetic operators cannot be exported as SQL. When exporting raw data from multiple devices, please use an align by device clause. Detailed examples are as follows:
-
-| Query type | Supported for Export | Example |
-|------------|----------------------|---------|
-| Raw data single device query | Supported | select * from root.s_0.d_0 |
-| Raw data multi-device query (align by device) | Supported | select * from root.** align by device |
-| Raw data multi-device query (without align by device) | Unsupported | select * from root.**
select * from root.s_0.* | - -### Running Examples - -- Export all data within a certain SQL execution range to a CSV file. -```Bash - # Unix/OS X - >tools/export-data.sh -t ./data/ -q 'select * from root.stock.**' - # Windows - >tools/export-data.bat -t ./data/ -q 'select * from root.stock.**' - ``` - -- Export Results - ```Bash - Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice - 2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 - 2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 - ``` -- All data within the scope of all SQL statements in the SQL file is exported to CSV files. - ```Bash - # Unix/OS X - >tools/export-data.sh -t ./data/ -s export.sql - # Windows - >tools/export-data.bat -t ./data/ -s export.sql - ``` - -- Contents of export.sql File (Pointed to by -s Parameter) - ```SQL - select * from root.stock.** limit 100 - select * from root.db.** limit 100 - ``` - -- Export Result File 1 - ```Bash - Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice - 2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 - 2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 - ``` - -- Export Result File 2 - ```Bash - Time,root.db.Random.RandomBoolean - 2024-07-22T17:16:05.820+08:00,true - 2024-07-22T17:16:02.597+08:00,false - ``` -- Export Data in SQL File to SQL Statements with Aligned Format - ```Bash - # Unix/OS X - >tools/export-data.sh -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true - # Windows - >tools/export-data.bat -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true - ``` - -- Export Results - ```Bash - INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249629831,0.62308747,2.0,0.012206747854849653,-6.0,false,0.14164352); - INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249630834,0.7520042,3.0,0.22760657101910464,-5.0,true,0.089064896); - INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249631835,0.3981064,3.0,0.6254559288663467,-6.0,false,0.9767922); - ``` -- Export Data in a Certain SQL Execution Range to a CSV File with Specified Time Format and Data Types - ```Bash - # Unix/OS X - >tools/export-data.sh -t ./data/ -tf 'yyyy-MM-dd HH:mm:ss' -datatype true -q "select * from root.stock.**" -type csv - # Windows - >tools/export-data.bat -t ./data/ -tf 'yyyy-MM-dd HH:mm:ss' -datatype true -q "select * from root.stock.**" -type csv - ``` - -- Export Results - ```Bash - Time,root.stock.Legacy.0700HK.L1_BidPrice(DOUBLE),root.stock.Legacy.0700HK.Type(DOUBLE),root.stock.Legacy.0700HK.L1_BidSize(DOUBLE),root.stock.Legacy.0700HK.Domain(DOUBLE),root.stock.Legacy.0700HK.L1_BuyNo(BOOLEAN),root.stock.Legacy.0700HK.L1_AskPrice(DOUBLE) - 2024-07-30 10:33:55,0.44574088,3.0,0.21476832811611501,-4.0,true,0.5951748 - 2024-07-30 10:33:56,0.6880933,3.0,0.6289119476165305,-5.0,false,0.114634395 - ``` - -## import-data Script (Data Import) - -### Import File Examples - 
-#### CSV File Example - -Note that before importing CSV data, special characters need to be handled as follows: - -1. If the text type field contains special characters such as `,`, it should be escaped with `\`. -2. You can import times in formats like `yyyy-MM-dd'T'HH:mm:ss`, `yyyy-MM-dd HH:mm:ss`, or `yyyy-MM-dd'T'HH:mm:ss.SSSZ`. -3. The time column `Time` should always be in the first column. - -Example 1: Time Aligned, No Data Types in Header - -```SQL -Time,root.test.t1.str,root.test.t2.str,root.test.t2.var -1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 -1970-01-01T08:00:00.002+08:00,"123",, -``` - -Example 2: Time Aligned, Data Types in Header(Text type data supports double quotation marks and non double quotation marks) - -```SQL -Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.var(INT32) -1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 -1970-01-01T08:00:00.002+08:00,123,hello world,123 -1970-01-01T08:00:00.003+08:00,"123",, -1970-01-01T08:00:00.004+08:00,123,,12 -``` -Example 3: Device Aligned, No Data Types in Header - -```SQL -Time,Device,str,var -1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", -1970-01-01T08:00:00.002+08:00,root.test.t1,"123", -1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 -``` - -Example 4: Device Aligned, Data Types in Header (Text type data supports double quotation marks and non double quotation marks) - -```SQL -Time,Device,str(TEXT),var(INT32) -1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", -1970-01-01T08:00:00.002+08:00,root.test.t1,"123", -1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 -1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123 -``` - -#### SQL File Example - -> For unsupported SQL, illegal SQL, or failed SQL executions, they will be placed in the failed directory under the failed file (default to filename.failed). - -```SQL -INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728578812,0.21911979,4.0,0.7129878488375604,-5.0,false,0.65362453); -INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728579812,0.35814416,3.0,0.04674720094979623,-5.0,false,0.9365247); -INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728580813,0.20012152,3.0,0.9910098187911393,-4.0,true,0.70040536); -INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728581814,0.034122765,4.0,0.9313345284181858,-4.0,true,0.9945297); -``` - -### Command - -```Bash -# Unix/OS X ->tools/import-data.sh -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] - -# Windows ->tools\import-data.bat -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] -``` - -> Although IoTDB has the ability to infer types, it is still recommended to create metadata before importing data to avoid unnecessary type conversion errors. 
For example: - -```SQL -CREATE DATABASE root.fit.d1; -CREATE DATABASE root.fit.d2; -CREATE DATABASE root.fit.p; -CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE; -CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN; -CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE; -CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE; -CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE; -``` - -Parameter Introduction: - -| Parameter | Definition | Required | Default | -|:----------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------|:-------------------------------------------------| -| -h | Database IP address | No | 127.0.0.1 | -| -p | Database port | No | 6667 | -| -u | Database connection username | No | root | -| -pw | Database connection password | No | root | -| -s | Specify the data you want to import, here you can specify a file or folder. If a folder is specified, all files with the suffix CSV or SQL in the folder will be imported in bulk.(The parameter for V1.3.2 is `-f`) | Yes | | -| -fd | Specify the directory to store the failed SQL files. If this parameter is not specified, the failed files will be saved to the directory of the source data.
Note: unsupported SQL, illegal SQL, and SQL that fails to execute are written to the failed file in the failed directory (the default file name is the source file name with the suffix `.failed`) | No | Add the suffix `.failed` to the source file name |
-| -aligned | Specify whether to use the `aligned` insert interface, options are true or false. This parameter only takes effect when the imported file is a CSV file | No | false |
-| -batch | Specify the number of data points inserted per batch (minimum value is 1, maximum value is Integer.MAX_VALUE). If the program reports `org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`, you can adjust this parameter appropriately. | No | `100000` |
-| -tp | Specify the time precision, options are `ms` (milliseconds), `ns` (nanoseconds), and `us` (microseconds) | No | `ms` |
-| -lpf | Specify the maximum number of lines written to each failed-import file (the parameter for V1.3.2 is `-linesPerFailedFile`) | No | 10000 |
-| -typeInfer | Used to specify type inference rules. For example:
Note: Used to specify type inference rules.`srcTsDataType` include `boolean`,`int`,`long`,`float`,`double`,`NaN`.`dstTsDataType` include `boolean`,`int`,`long`,`float`,`double`,`text`.when`srcTsDataType`is`boolean`, `dstTsDataType`can only be`boolean`or`text`.when`srcTsDataType`is`NaN`, `dstTsDataType`can only be`float`, `double`or`text`.when`srcTsDataType`is numeric, the precision of `dstTsDataType`needs to be higher than that of `srcTsDataType`.For example:`-typeInfer boolean=text,float=double` | No | | - -### Running Examples - -- Import the `dump0_0.sql` data in the current data directory into the local IoTDB database. - -```Bash -# Unix/OS X ->tools/import-data.sh -s ./data/dump0_0.sql -# Windows ->tools/import-data.bat -s ./data/dump0_0.sql -``` - -- Import all data in the current data directory in an aligned manner into the local IoTDB database. - -```Bash -# Unix/OS X ->tools/import-data.sh -s ./data/ -fd ./failed/ -aligned true -# Windows ->tools/import-data.bat -s ./data/ -fd ./failed/ -aligned true -``` - -- Import the `dump0_0.csv` data in the current data directory into the local IoTDB database. - -```Bash -# Unix/OS X ->tools/import-data.sh -s ./data/dump0_0.csv -fd ./failed/ -# Windows ->tools/import-data.bat -s ./data/dump0_0.csv -fd ./failed/ -``` - -- Import the `dump0_0.csv` data in the current data directory in an aligned manner, batch import 100,000 records into the IoTDB database on the host with IP `192.168.100.1`, record failures in the current `failed` directory, and limit each file to 1,000 records. - -```Bash -# Unix/OS X ->tools/import-data.sh -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 -# Windows ->tools/import-data.bat -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 -``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Tools-System/Maintenance-Tool_apache.md b/src/UserGuide/V1.3.0-2/Tools-System/Maintenance-Tool_apache.md deleted file mode 100644 index c3f1a1f5f..000000000 --- a/src/UserGuide/V1.3.0-2/Tools-System/Maintenance-Tool_apache.md +++ /dev/null @@ -1,228 +0,0 @@ - -# Maintenance Tool -## IoTDB Data Directory Overview Tool - -IoTDB data directory overview tool is used to print an overview of the IoTDB data directory structure. The location is tools/tsfile/print-iotdb-data-dir. - -### Usage - -- For Windows: - -```bash -.\print-iotdb-data-dir.bat () -``` - -- For Linux or MacOs: - -```shell -./print-iotdb-data-dir.sh () -``` - -Note: if the storage path of the output overview file is not set, the default relative path "IoTDB_data_dir_overview.txt" will be used. - -### Example - -Use Windows in this example: - -`````````````````````````bash -.\print-iotdb-data-dir.bat D:\github\master\iotdb\data\datanode\data -```````````````````````` -Starting Printing the IoTDB Data Directory Overview -```````````````````````` -output save path:IoTDB_data_dir_overview.txt -data dir num:1 -143 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-common.properties, use the default configs. 
-|============================================================== -|D:\github\master\iotdb\data\datanode\data -|--sequence -| |--root.redirect0 -| | |--1 -| | | |--0 -| |--root.redirect1 -| | |--2 -| | | |--0 -| |--root.redirect2 -| | |--3 -| | | |--0 -| |--root.redirect3 -| | |--4 -| | | |--0 -| |--root.redirect4 -| | |--5 -| | | |--0 -| |--root.redirect5 -| | |--6 -| | | |--0 -| |--root.sg1 -| | |--0 -| | | |--0 -| | | |--2760 -|--unsequence -|============================================================== -````````````````````````` - -## TsFile Sketch Tool - -TsFile sketch tool is used to print the content of a TsFile in sketch mode. The location is tools/tsfile/print-tsfile. - -### Usage - -- For Windows: - -``` -.\print-tsfile-sketch.bat () -``` - -- For Linux or MacOs: - -``` -./print-tsfile-sketch.sh () -``` - -Note: if the storage path of the output sketch file is not set, the default relative path "TsFile_sketch_view.txt" will be used. - -### Example - -Use Windows in this example: - -`````````````````````````bash -.\print-tsfile.bat D:\github\master\1669359533965-1-0-0.tsfile D:\github\master\sketch.txt -```````````````````````` -Starting Printing the TsFile Sketch -```````````````````````` -TsFile path:D:\github\master\1669359533965-1-0-0.tsfile -Sketch save path:D:\github\master\sketch.txt -148 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-common.properties, use the default configs. --------------------------------- TsFile Sketch -------------------------------- -file path: D:\github\master\1669359533965-1-0-0.tsfile -file length: 2974 - - POSITION| CONTENT - -------- ------- - 0| [magic head] TsFile - 6| [version number] 3 -||||||||||||||||||||| [Chunk Group] of root.sg1.d1, num of Chunks:3 - 7| [Chunk Group Header] - | [marker] 0 - | [deviceID] root.sg1.d1 - 20| [Chunk] of root.sg1.d1.s1, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] - | [chunk header] marker=5, measurementID=s1, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE - | [page] UncompressedSize:862, CompressedSize:860 - 893| [Chunk] of root.sg1.d1.s2, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] - | [chunk header] marker=5, measurementID=s2, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE - | [page] UncompressedSize:862, CompressedSize:860 - 1766| [Chunk] of root.sg1.d1.s3, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] - | [chunk header] marker=5, measurementID=s3, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE - | [page] UncompressedSize:862, CompressedSize:860 -||||||||||||||||||||| [Chunk Group] of root.sg1.d1 ends - 2656| [marker] 2 - 2657| [TimeseriesIndex] of root.sg1.d1.s1, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] - | [ChunkIndex] offset=20 - 2728| [TimeseriesIndex] of root.sg1.d1.s2, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 
count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] - | [ChunkIndex] offset=893 - 2799| [TimeseriesIndex] of root.sg1.d1.s3, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] - | [ChunkIndex] offset=1766 - 2870| [IndexOfTimerseriesIndex Node] type=LEAF_MEASUREMENT - | - | -||||||||||||||||||||| [TsFileMetadata] begins - 2891| [IndexOfTimerseriesIndex Node] type=LEAF_DEVICE - | - | - | [meta offset] 2656 - | [bloom filter] bit vector byte array length=31, filterSize=256, hashFunctionSize=5 -||||||||||||||||||||| [TsFileMetadata] ends - 2964| [TsFileMetadataSize] 73 - 2968| [magic tail] TsFile - 2974| END of TsFile ----------------------------- IndexOfTimerseriesIndex Tree ----------------------------- - [MetadataIndex:LEAF_DEVICE] - └──────[root.sg1.d1,2870] - [MetadataIndex:LEAF_MEASUREMENT] - └──────[s1,2657] ----------------------------------- TsFile Sketch End ---------------------------------- -````````````````````````` - -Explanations: - -- Separated by "|", the left is the actual position in the TsFile, and the right is the summary content. -- "||||||||||||||||||||" is the guide information added to enhance readability, not the actual data stored in TsFile. -- The last printed "IndexOfTimerseriesIndex Tree" is a reorganization of the metadata index tree at the end of the TsFile, which is convenient for intuitive understanding, and again not the actual data stored in TsFile. - -## TsFile Resource Sketch Tool - -TsFile resource sketch tool is used to print the content of a TsFile resource file. The location is tools/tsfile/print-tsfile-resource-files. - -### Usage - -- For Windows: - -```bash -.\print-tsfile-resource-files.bat -``` - -- For Linux or MacOs: - -``` -./print-tsfile-resource-files.sh -``` - -### Example - -Use Windows in this example: - -`````````````````````````bash -.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 -```````````````````````` -Starting Printing the TsFileResources -```````````````````````` -147 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-common.properties, use the default configs. -230 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-common.properties, use default configuration -231 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-common.properties from any of the known sources. -233 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-datanode.properties, use default configuration -237 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-datanode.properties from any of the known sources. - -Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... - -Resource plan index range [9223372036854775807, -9223372036854775808] -device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) - -Analyzing the resource file folder D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 finished. 
-````````````````````````` - -`````````````````````````bash -.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource -```````````````````````` -Starting Printing the TsFileResources -```````````````````````` -178 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-common.properties, use default configuration -186 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-common.properties, use the default configs. -187 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-common.properties from any of the known sources. -188 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-datanode.properties, use default configuration -192 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-datanode.properties from any of the known sources. -Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... - -Resource plan index range [9223372036854775807, -9223372036854775808] -device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) - -Analyzing the resource file D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource finished. -````````````````````````` diff --git a/src/UserGuide/V1.3.0-2/Tools-System/Maintenance-Tool_timecho.md b/src/UserGuide/V1.3.0-2/Tools-System/Maintenance-Tool_timecho.md deleted file mode 100644 index 4c6186c7b..000000000 --- a/src/UserGuide/V1.3.0-2/Tools-System/Maintenance-Tool_timecho.md +++ /dev/null @@ -1,957 +0,0 @@ - -# Maintenance Tool - -## IoTDB-OpsKit - -The IoTDB OpsKit is an easy-to-use operation and maintenance tool (enterprise version tool). -It is designed to solve the operation and maintenance problems of multiple nodes in the IoTDB distributed system. -It mainly includes cluster deployment, cluster start and stop, elastic expansion, configuration update, data export and other functions, thereby realizing one-click command issuance for complex database clusters, which greatly Reduce management difficulty. -This document will explain how to remotely deploy, configure, start and stop IoTDB cluster instances with cluster management tools. - -### Environment dependence - -This tool is a supporting tool for TimechoDB(Enterprise Edition based on IoTDB). You can contact your sales representative to obtain the tool download method. - -The machine where IoTDB is to be deployed needs to rely on jdk 8 and above, lsof, netstat, and unzip functions. If not, please install them yourself. You can refer to the installation commands required for the environment in the last section of the document. - -Tip: The IoTDB cluster management tool requires an account with root privileges - -### Deployment method - -#### Download and install - -This tool is a supporting tool for TimechoDB(Enterprise Edition based on IoTDB). You can contact your salesperson to obtain the tool download method. - -Note: Since the binary package only supports GLIBC2.17 and above, the minimum version is Centos7. 
-
-* After entering the following command in the iotdb-opskit directory:
-
-```bash
-bash install-iotdbctl.sh
-```
-
-the iotdbctl keyword can be used in subsequent shells, for example to run the environment check required before deployment:
-
-```bash
-iotdbctl cluster check example
-```
-
-* You can also directly use <iotdbctl absolute path>/sbin/iotdbctl without activating iotdbctl to execute commands, such as checking the environment required before deployment:
-
-```bash
-<iotdbctl absolute path>/sbin/iotdbctl cluster check example
-```
-
-### Introduction to cluster configuration files
-
-* There is a cluster configuration yaml file in the `iotdbctl/config` directory. The yaml file name is the cluster name, and there can be multiple yaml files. To make configuration easier, a `default_cluster.yaml` example is provided under the `iotdbctl/config` directory.
-* The yaml configuration consists of five major parts: `global`, `confignode_servers`, `datanode_servers`, `grafana_server`, and `prometheus_server`.
-* `global` holds the general configuration, mainly the machine username and password, the local IoTDB installation files, the JDK configuration, and so on. A `default_cluster.yaml` sample is provided in the `iotdbctl/config` directory;
-  users can copy it, rename it to their own cluster name, and follow the instructions inside to configure the IoTDB cluster. In the `default_cluster.yaml` sample, all uncommented items are required, and the commented ones are optional.
-
-For example, to run the check command against `default_cluster.yaml`, execute `iotdbctl cluster check default_cluster`.
-For more detailed commands, please refer to the command list below.
-
-
-| parameter name | parameter description | required |
-|----------------|------------------------|----------|
-| iotdb\_zip\_dir | IoTDB deployment distribution directory; if the value is empty, the distribution will be downloaded from the address specified by `iotdb_download_url` | NO |
-| iotdb\_download\_url | IoTDB download address; if `iotdb_zip_dir` has no value, the distribution is downloaded from this address | NO |
-| jdk\_tar\_dir | Local jdk directory; this jdk will be uploaded and deployed to the target node | NO |
-| jdk\_deploy\_dir | Remote jdk deployment directory; the jdk will be deployed to this directory, which together with the following `jdk_dir_name` parameter forms the complete jdk deployment directory, that is, `<jdk_deploy_dir>/<jdk_dir_name>` | NO |
-| jdk\_dir\_name | The directory name after the jdk is decompressed, defaults to jdk_iotdb | NO |
-| iotdb\_lib\_dir | The IoTDB lib directory or an IoTDB lib compressed package (only .zip format is supported); it is used only for IoTDB upgrades and is commented out by default. If you need to upgrade, uncomment it and modify the path. If you use a zip file, please compress the iotdb/lib directory with the zip command, such as zip -r lib.zip apache-iotdb-1.2.0/lib/* | NO |
-| user | Username for ssh login to the deployment machine | YES |
-| password | The password for ssh login. If no password is specified, pkey is used to log in instead; in that case, please ensure that passwordless ssh login between the nodes has been configured.
| NO | -| pkey | Key login: If password has a value, password is used first, otherwise pkey is used to log in. | NO | -| ssh\_port | ssh port | YES | -| deploy\_dir | IoTDB deployment directory, IoTDB will be deployed to this directory and the following `iotdb_dir_name` parameter will form a complete IoTDB deployment directory, that is, `/` | YES | -| iotdb\_dir\_name | The directory name after decompression of IoTDB is iotdb by default. | NO | -| datanode-env.sh | Corresponding to `iotdb/config/datanode-env.sh`, when `global` and `confignode_servers` are configured at the same time, the value in `confignode_servers` is used first | NO | -| confignode-env.sh | Corresponding to `iotdb/config/confignode-env.sh`, the value in `datanode_servers` is used first when `global` and `datanode_servers` are configured at the same time | NO | -| iotdb-system.properties | Corresponds to `/config/iotdb-system.properties` | NO | -| cn\_internal\_address | The cluster configuration address points to the surviving ConfigNode, and it points to confignode_x by default. When `global` and `confignode_servers` are configured at the same time, the value in `confignode_servers` is used first, corresponding to `cn_internal_address` in `iotdb/config/iotdb-system.properties` | YES | -| dn\_internal\_address | The cluster configuration address points to the surviving ConfigNode, and points to confignode_x by default. When configuring values for `global` and `datanode_servers` at the same time, the value in `datanode_servers` is used first, corresponding to `dn_internal_address` in `iotdb/config/iotdb-system.properties` | YES | - -Among them, datanode-env.sh and confignode-env.sh can be configured with extra parameters extra_opts. When this parameter is configured, corresponding values will be appended after datanode-env.sh and confignode-env.sh. Refer to default_cluster.yaml for configuration examples as follows: -datanode-env.sh: -extra_opts: | -IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:+UseG1GC" -IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:MaxGCPauseMillis=200" - -* `confignode_servers` is the configuration for deploying IoTDB Confignodes, in which multiple Confignodes can be configured - By default, the first started ConfigNode node node1 is regarded as the Seed-ConfigNode - -| parameter name | parameter describe | required | -|-----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| -| name | Confignode name | YES | -| deploy\_dir | IoTDB config node deployment directory | YES | -| cn\_internal\_address | Corresponds to iotdb/internal communication address, corresponding to `cn_internal_address` in `iotdb/config/iotdb-system.properties` | YES | -| cn_internal_address | The cluster configuration address points to the surviving ConfigNode, and it points to confignode_x by default. 
When `global` and `confignode_servers` are configured at the same time, the value in `confignode_servers` is used first, corresponding to `cn_internal_address` in `iotdb/config/iotdb-system.properties` | YES | -| cn\_internal\_port | Internal communication port, corresponding to `cn_internal_port` in `iotdb/config/iotdb-system.properties` | YES | -| cn\_consensus\_port | Corresponds to `cn_consensus_port` in `iotdb/config/iotdb-system.properties` | NO | -| cn\_data\_dir | Corresponds to `cn_consensus_port` in `iotdb/config/iotdb-system.properties` Corresponds to `cn_data_dir` in `iotdb/config/iotdb-system.properties` | YES | -| iotdb-system.properties | Corresponding to `iotdb/config/iotdb-system.properties`, when configuring values in `global` and `confignode_servers` at the same time, the value in confignode_servers will be used first. | NO | - -* datanode_servers 是部署IoTDB Datanodes配置,里面可以配置多个Datanode - -| parameter name | parameter describe | required | -|-------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| -| name | Datanode name | YES | -| deploy\_dir | IoTDB data node deployment directory | YES | -| dn\_rpc\_address | The datanode rpc address corresponds to `dn_rpc_address` in `iotdb/config/iotdb-system.properties` | YES | -| dn\_internal\_address | Internal communication address, corresponding to `dn_internal_address` in `iotdb/config/iotdb-system.properties` | YES | -| dn\_seed\_config\_node | The cluster configuration address points to the surviving ConfigNode, and points to confignode_x by default. When configuring values for `global` and `datanode_servers` at the same time, the value in `datanode_servers` is used first, corresponding to `dn_seed_config_node` in `iotdb/config/iotdb-system.properties`. | YES | -| dn\_rpc\_port | Datanode rpc port address, corresponding to `dn_rpc_port` in `iotdb/config/iotdb-system.properties` | YES | -| dn\_internal\_port | Internal communication port, corresponding to `dn_internal_port` in `iotdb/config/iotdb-system.properties` | YES | -| iotdb-system.properties | Corresponding to `iotdb/config/iotdb-system.properties`, when configuring values in `global` and `datanode_servers` at the same time, the value in `datanode_servers` will be used first. 
| NO | - -* grafana_server is the configuration related to deploying Grafana - -| parameter name | parameter describe | required | -|--------------------|-------------------------------------------------------------|-----------| -| grafana\_dir\_name | Grafana decompression directory name(default grafana_iotdb) | NO | -| host | Server ip deployed by grafana | YES | -| grafana\_port | The port of grafana deployment machine, default 3000 | NO | -| deploy\_dir | grafana deployment server directory | YES | -| grafana\_tar\_dir | Grafana compressed package location | YES | -| dashboards | dashboards directory | NO | - -* prometheus_server 是部署Prometheus 相关配置 - -| parameter name | parameter describe | required | -|--------------------------------|----------------------------------------------------|----------| -| prometheus\_dir\_name | prometheus decompression directory name, default prometheus_iotdb | NO | -| host | Server IP deployed by prometheus | YES | -| prometheus\_port | The port of prometheus deployment machine, default 9090 | NO | -| deploy\_dir | prometheus deployment server directory | YES | -| prometheus\_tar\_dir | prometheus compressed package path | YES | -| storage\_tsdb\_retention\_time | The number of days to save data is 15 days by default | NO | -| storage\_tsdb\_retention\_size | The data size that can be saved by the specified block defaults to 512M. Please note the units are KB, MB, GB, TB, PB, and EB. | NO | - -If metrics are configured in `iotdb-system.properties` and `iotdb-system.properties` of config/xxx.yaml, the configuration will be automatically put into promethues without manual modification. - -Note: How to configure the value corresponding to the yaml key to contain special characters such as: etc. It is recommended to use double quotes for the entire value, and do not use paths containing spaces in the corresponding file paths to prevent abnormal recognition problems. - -### scenes to be used - -#### Clean data - -* Cleaning up the cluster data scenario will delete the data directory in the IoTDB cluster and `cn_system_dir`, `cn_consensus_dir`, `cn_consensus_dir` configured in the yaml file - `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs` and `ext` directories. -* First execute the stop cluster command, and then execute the cluster cleanup command. - -```bash -iotdbctl cluster stop default_cluster -iotdbctl cluster clean default_cluster -``` - -#### Cluster destruction - -* The cluster destruction scenario will delete `data`, `cn_system_dir`, `cn_consensus_dir`, in the IoTDB cluster - `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs`, `ext`, `IoTDB` deployment directory, - grafana deployment directory and prometheus deployment directory. -* First execute the stop cluster command, and then execute the cluster destruction command. - - -```bash -iotdbctl cluster stop default_cluster -iotdbctl cluster destroy default_cluster -``` - -#### Cluster upgrade - -* To upgrade the cluster, you first need to configure `iotdb_lib_dir` in config/xxx.yaml as the directory path where the jar to be uploaded to the server is located (for example, iotdb/lib). -* If you use zip files to upload, please use the zip command to compress the iotdb/lib directory, such as zip -r lib.zip apache-iotdb-1.2.0/lib/* -* Execute the upload command and then execute the restart IoTDB cluster command to complete the cluster upgrade. 
- -```bash -iotdbctl cluster dist-lib default_cluster -iotdbctl cluster restart default_cluster -``` - -#### hot deployment - -* First modify the configuration in config/xxx.yaml. -* Execute the distribution command, and then execute the hot deployment command to complete the hot deployment of the cluster configuration - -```bash -iotdbctl cluster dist-conf default_cluster -iotdbctl cluster reload default_cluster -``` - -#### Cluster expansion - -* First modify and add a datanode or confignode node in config/xxx.yaml. -* Execute the cluster expansion command - -```bash -iotdbctl cluster scaleout default_cluster -``` - -#### Cluster scaling - -* First find the node name or ip+port to shrink in config/xxx.yaml (where confignode port is cn_internal_port, datanode port is rpc_port) -* Execute cluster shrink command - -```bash -iotdbctl cluster scalein default_cluster -``` - -#### Using cluster management tools to manipulate existing IoTDB clusters - -* Configure the server's `user`, `passwod` or `pkey`, `ssh_port` -* Modify the IoTDB deployment path in config/xxx.yaml, `deploy_dir` (IoTDB deployment directory), `iotdb_dir_name` (IoTDB decompression directory name, the default is iotdb) - For example, if the full path of IoTDB deployment is `/home/data/apache-iotdb-1.1.1`, you need to modify the yaml files `deploy_dir:/home/data/` and `iotdb_dir_name:apache-iotdb-1.1.1` -* If the server is not using java_home, modify `jdk_deploy_dir` (jdk deployment directory) and `jdk_dir_name` (the directory name after jdk decompression, the default is jdk_iotdb). If java_home is used, there is no need to modify the configuration. - For example, the full path of jdk deployment is `/home/data/jdk_1.8.2`, you need to modify the yaml files `jdk_deploy_dir:/home/data/`, `jdk_dir_name:jdk_1.8.2` -* Configure `cn_internal_address`, `dn_internal_address` -* Configure `cn_internal_address`, `cn_internal_port`, `cn_consensus_port`, `cn_system_dir`, in `iotdb-system.properties` in `confignode_servers` - If the values in `cn_consensus_dir` and `iotdb-system.properties` are not the default for IoTDB, they need to be configured, otherwise there is no need to configure them. -* Configure `dn_rpc_address`, `dn_internal_address`, `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir` in `iotdb-system.properties` in `datanode_servers` -* Execute initialization command - -```bash -iotdbctl cluster init default_cluster -``` - -#### Deploy IoTDB, Grafana and Prometheus - -* Configure `iotdb-system.properties` to open the metrics interface -* Configure the Grafana configuration. If there are multiple `dashboards`, separate them with commas. The names cannot be repeated or they will be overwritten. -* Configure the Prometheus configuration. If the IoTDB cluster is configured with metrics, there is no need to manually modify the Prometheus configuration. The Prometheus configuration will be automatically modified according to which node is configured with metrics. -* Start the cluster - -```bash -iotdbctl cluster start default_cluster -``` - -For more detailed parameters, please refer to the cluster configuration file introduction above - -### Command - -The basic usage of this tool is: -```bash -iotdbctl cluster [params (Optional)] -``` -* key indicates a specific command. - -* cluster name indicates the cluster name (that is, the name of the yaml file in the `iotdbctl/config` file). - -* params indicates the required parameters of the command (optional). 
- -* For example, the command format to deploy the default_cluster cluster is: - -```bash -iotdbctl cluster deploy default_cluster -``` - -* The functions and parameters of the cluster are listed as follows: - -| command | description | parameter | -|-----------------|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| check | check whether the cluster can be deployed | Cluster name list | -| clean | cleanup-cluster | cluster-name | -| deploy/dist-all | deploy cluster | Cluster name, -N, module name (optional for iotdb, grafana, prometheus), -op force (optional) | -| list | cluster status list | None | -| start | start cluster | Cluster name, -N, node name (nodename, grafana, prometheus optional) | -| stop | stop cluster | Cluster name, -N, node name (nodename, grafana, prometheus optional), -op force (nodename, grafana, prometheus optional) | -| restart | restart cluster | Cluster name, -N, node name (nodename, grafana, prometheus optional), -op force (nodename, grafana, prometheus optional) | -| show | view cluster information. The details field indicates the details of the cluster information. | Cluster name, details (optional) | -| destroy | destroy cluster | Cluster name, -N, module name (iotdb, grafana, prometheus optional) | -| scaleout | cluster expansion | Cluster name | -| scalein | cluster shrink | Cluster name, -N, cluster node name or cluster node ip+port | -| reload | hot loading of cluster configuration files | Cluster name | -| dist-conf | cluster configuration file distribution | Cluster name | -| dumplog | Back up specified cluster logs | Cluster name, -N, cluster node name -h Back up to target machine ip -pw Back up to target machine password -p Back up to target machine port -path Backup directory -startdate Start time -enddate End time -loglevel Log type -l transfer speed | -| dumpdata | Backup cluster data | Cluster name, -h backup to target machine ip -pw backup to target machine password -p backup to target machine port -path backup directory -startdate start time -enddate end time -l transmission speed | -| dist-lib | lib package upgrade | Cluster name | -| init | When an existing cluster uses the cluster deployment tool, initialize the cluster configuration | Cluster name | -| status | View process status | Cluster name | -| activate | Activate cluster | Cluster name | -| health_check | health check | Cluster name, -N, nodename (optional) | -| backup | Activate cluster | Cluster name,-N nodename (optional) | -| importschema | Activate cluster | Cluster name,-N nodename -param paramters | -| exportschema | Activate cluster | Cluster name,-N nodename -param paramters | - -### Detailed command execution process - -The following commands are executed using default_cluster.yaml as an example, and users can modify them to their own cluster files to execute - -#### Check cluster deployment environment commands - -```bash -iotdbctl cluster check default_cluster -``` - -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` - -* Verify that the target node is able to log in via SSH - -* Verify whether the JDK version on the corresponding node meets IoTDB jdk1.8 and above, and whether 
-
-### Detailed command execution process
-
-The following commands use default_cluster.yaml as an example; replace the cluster name with your own cluster configuration file as needed.
-
-#### Check cluster deployment environment commands
-
-```bash
-iotdbctl cluster check default_cluster
-```
-
-* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers`
-
-* Verify that the target nodes can be logged in to via SSH
-
-* Verify that the JDK version on each node is jdk1.8 or above as required by IoTDB, and that unzip, lsof, and netstat are installed on the server.
-
-* If you see the prompt `Info:example check successfully!`, the server already meets the installation requirements.
-  If `Error:example check fail!` is output, some conditions are not met; check the Error log printed above (for example: `Error:Server (ip:172.20.31.76) iotdb port(10713) is listening`) and fix the reported problem.
-  If the jdk check fails, you can configure a jdk1.8 or above version in the yaml file yourself for deployment without affecting subsequent use.
-  If the lsof, netstat or unzip check fails, you need to install the missing tool on the server yourself.
-
-#### Deploy cluster command
-
-```bash
-iotdbctl cluster deploy default_cluster
-```
-
-* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers`
-
-* Upload the IoTDB compressed package and the jdk compressed package according to the node information in `confignode_servers` and `datanode_servers` (if the `jdk_tar_dir` and `jdk_deploy_dir` values are configured in yaml)
-
-* Generate and upload `iotdb-system.properties` according to the yaml file node configuration information
-
-```bash
-iotdbctl cluster deploy default_cluster -op force
-```
-
-Note: This command forces the deployment; it deletes the existing deployment directory and redeploys.
-
-*Deploy a single module*
-```bash
-# Deploy grafana module
-iotdbctl cluster deploy default_cluster -N grafana
-# Deploy the prometheus module
-iotdbctl cluster deploy default_cluster -N prometheus
-# Deploy the iotdb module
-iotdbctl cluster deploy default_cluster -N iotdb
-```
-
-#### Start cluster command
-
-```bash
-iotdbctl cluster start default_cluster
-```
-
-* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers`
-
-* Start the confignodes sequentially in the order given in `confignode_servers` in the yaml configuration file and check whether each confignode is normal according to its process id; the first confignode is the seed ConfigNode
-
-* Start the datanodes sequentially in the order given in `datanode_servers` in the yaml configuration file and check whether each datanode is normal according to its process id.
-
-* After confirming that the processes exist by process id, check through the cli whether each service in the cluster list is normal. If the cli connection fails, retry every 10s until it succeeds, up to 5 times
-
-*Start a single node command*
-```bash
-#Start according to the IoTDB node name
-iotdbctl cluster start default_cluster -N datanode_1
-#Start according to IoTDB cluster ip+port, where port corresponds to cn_internal_port of confignode and rpc_port of datanode.
-iotdbctl cluster start default_cluster -N 192.168.1.5:6667
-#Start grafana
-iotdbctl cluster start default_cluster -N grafana
-#Start prometheus
-iotdbctl cluster start default_cluster -N prometheus
-```
-
-* Find the yaml file in the default location based on cluster-name
-
-* Find the node location information based on the provided node name or ip:port. If the started node is `data_node`, the ip uses `dn_rpc_address` in the yaml file, and the port uses `dn_rpc_port` in datanode_servers in the yaml file.
- If the started node is `config_node`, the ip uses `cn_internal_address` in confignode_servers in the yaml file, and the port uses `cn_internal_port` - -* start the node - -Note: Since the cluster deployment tool only calls the start-confignode.sh and start-datanode.sh scripts in the IoTDB cluster, -When the actual output result fails, it may be that the cluster has not started normally. It is recommended to use the status command to check the current cluster status (iotdbctl cluster status xxx) - - -#### View IoTDB cluster status command - -```bash -iotdbctl cluster show default_cluster -#View IoTDB cluster details -iotdbctl cluster show default_cluster details -``` -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` - -* Execute `show cluster details` through cli on datanode in turn. If one node is executed successfully, it will not continue to execute cli on subsequent nodes and return the result directly. - -#### Stop cluster command - - -```bash -iotdbctl cluster stop default_cluster -``` -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` - -* According to the datanode node information in `datanode_servers`, stop the datanode nodes in order according to the configuration. - -* Based on the confignode node information in `confignode_servers`, stop the confignode nodes in sequence according to the configuration - -*force stop cluster command* - -```bash -iotdbctl cluster stop default_cluster -op force -``` -Will directly execute the kill -9 pid command to forcibly stop the cluster - -*Stop single node command* - -```bash -#Stop by IoTDB node name -iotdbctl cluster stop default_cluster -N datanode_1 -#Stop according to IoTDB cluster ip+port (ip+port is to get the only node according to ip+dn_rpc_port in datanode or ip+cn_internal_port in confignode to get the only node) -iotdbctl cluster stop default_cluster -N 192.168.1.5:6667 -#Stop grafana -iotdbctl cluster stop default_cluster -N grafana -#Stop prometheus -iotdbctl cluster stop default_cluster -N prometheus -``` - -* Find the yaml file in the default location based on cluster-name - -* Find the corresponding node location information based on the provided node name or ip:port. If the stopped node is `data_node`, the ip uses `dn_rpc_address` in the yaml file, and the port uses `dn_rpc_port` in datanode_servers in the yaml file. - If the stopped node is `config_node`, the ip uses `cn_internal_address` in confignode_servers in the yaml file, and the port uses `cn_internal_port` - -* stop the node - -Note: Since the cluster deployment tool only calls the stop-confignode.sh and stop-datanode.sh scripts in the IoTDB cluster, in some cases the iotdb cluster may not be stopped. - - -#### Clean cluster data command - -```bash -iotdbctl cluster clean default_cluster -``` - -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` - -* Based on the information in `confignode_servers` and `datanode_servers`, check whether there are still services running, - If any service is running, the cleanup command will not be executed. - -* Delete the data directory in the IoTDB cluster and the `cn_system_dir`, `cn_consensus_dir`, configured in the yaml file - `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs` and `ext` directories. 
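-
-Because `clean` refuses to run while any service of the cluster is still alive, a safe wipe of test data usually chains the commands above. The following is only a sketch for the default_cluster example, assuming the data can really be discarded:
-
-```bash
-# Stop the whole cluster, confirm the processes are gone, then delete the data directories.
-# WARNING: clean removes the data directory together with the configured system/consensus directories.
-iotdbctl cluster stop default_cluster
-iotdbctl cluster status default_cluster   # every node should now be reported as stopped
-iotdbctl cluster clean default_cluster
-```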
- - - -#### Restart cluster command - -```bash -iotdbctl cluster restart default_cluster -``` -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` - -* Execute the above stop cluster command (stop), and then execute the start cluster command (start). For details, refer to the above start and stop commands. - -*Force restart cluster command* - -```bash -iotdbctl cluster restart default_cluster -op force -``` -Will directly execute the kill -9 pid command to force stop the cluster, and then start the cluster - - -*Restart a single node command* - -```bash -#Restart datanode_1 according to the IoTDB node name -iotdbctl cluster restart default_cluster -N datanode_1 -#Restart confignode_1 according to the IoTDB node name -iotdbctl cluster restart default_cluster -N confignode_1 -#Restart grafana -iotdbctl cluster restart default_cluster -N grafana -#Restart prometheus -iotdbctl cluster restart default_cluster -N prometheus -``` - -#### Cluster shrink command - -```bash -#Scale down by node name -iotdbctl cluster scalein default_cluster -N nodename -#Scale down according to ip+port (ip+port obtains the only node according to ip+dn_rpc_port in datanode, and obtains the only node according to ip+cn_internal_port in confignode) -iotdbctl cluster scalein default_cluster -N ip:port -``` -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` - -* Determine whether there is only one confignode node and datanode to be reduced. If there is only one left, the reduction cannot be performed. - -* Then get the node information to shrink according to ip:port or nodename, execute the shrink command, and then destroy the node directory. If the shrink node is `data_node`, use `dn_rpc_address` in the yaml file for ip, and use `dn_rpc_address` in the port. `dn_rpc_port` in datanode_servers in yaml file. - If the shrinking node is `config_node`, the ip uses `cn_internal_address` in confignode_servers in the yaml file, and the port uses `cn_internal_port` - - -Tip: Currently, only one node scaling is supported at a time - -#### Cluster expansion command - -```bash -iotdbctl cluster scaleout default_cluster -``` -* Modify the config/xxx.yaml file to add a datanode node or confignode node - -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` - -* Find the node to be expanded, upload the IoTDB compressed package and jdb package (if the `jdk_tar_dir` and `jdk_deploy_dir` values ​​are configured in yaml) and decompress it - -* Generate and upload `iotdb-system.properties` according to the yaml file node configuration information - -* Execute the command to start the node and verify whether the node is started successfully - -Tip: Currently, only one node expansion is supported at a time - -#### destroy cluster command -```bash -iotdbctl cluster destroy default_cluster -``` - -* cluster-name finds the yaml file in the default location - -* Check whether the node is still running based on the node node information in `confignode_servers`, `datanode_servers`, `grafana`, and `prometheus`. 
- Stop the destroy command if any node is running - -* Delete `data` in the IoTDB cluster and `cn_system_dir`, `cn_consensus_dir` configured in the yaml file - `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs`, `ext`, `IoTDB` deployment directory, - grafana deployment directory and prometheus deployment directory - -*Destroy a single module* - -```bash -# Destroy grafana module -iotdbctl cluster destroy default_cluster -N grafana -# Destroy prometheus module -iotdbctl cluster destroy default_cluster -N prometheus -# Destroy iotdb module -iotdbctl cluster destroy default_cluster -N iotdb -``` - -#### Distribute cluster configuration commands - -```bash -iotdbctl cluster dist-conf default_cluster -``` - -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` - -* Generate and upload `iotdb-system.properties` to the specified node according to the node configuration information of the yaml file - -#### Hot load cluster configuration command - -```bash -iotdbctl cluster reload default_cluster -``` -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` - -* Execute `load configuration` in the cli according to the node configuration information of the yaml file. - -#### Cluster node log backup -```bash -iotdbctl cluster dumplog default_cluster -N datanode_1,confignode_1 -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/logs' -logs '/root/data/db/iotdb/logs' -``` - -* Find the yaml file in the default location based on cluster-name - -* This command will verify the existence of datanode_1 and confignode_1 according to the yaml file, and then back up the log data of the specified node datanode_1 and confignode_1 to the specified service `192.168.9.48` port 36000 according to the configured start and end dates (startdate<=logtime<=enddate) The data backup path is `/iotdb/logs`, and the IoTDB log storage path is `/root/data/db/iotdb/logs` (not required, if you do not fill in -logs xxx, the default is to backup logs from the IoTDB installation path /logs ) - -| command | description | required | -|------------|-------------------------------------------------------------------------|----------| -| -h | backup data server ip | NO | -| -u | backup data server username | NO | -| -pw | backup data machine password | NO | -| -p | backup data machine port(default 22) | NO | -| -path | path to backup data (default current path) | NO | -| -loglevel | Log levels include all, info, error, warn (default is all) | NO | -| -l | speed limit (default 1024 speed limit range 0 to 104857601 unit Kbit/s) | NO | -| -N | multiple configuration file cluster names are separated by commas. 
| YES | -| -startdate | start time (including default 1970-01-01) | NO | -| -enddate | end time (included) | NO | -| -logs | IoTDB log storage path, the default is ({iotdb}/logs)) | NO | - -#### Cluster data backup -```bash -iotdbctl cluster dumpdata default_cluster -granularity partition -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/datas' -``` -* This command will obtain the leader node based on the yaml file, and then back up the data to the /iotdb/datas directory on the 192.168.9.48 service based on the start and end dates (startdate<=logtime<=enddate) - -| command | description | required | -|--------------|-------------------------------------------------------------------------|----------| -| -h | backup data server ip | NO | -| -u | backup data server username | NO | -| -pw | backup data machine password | NO | -| -p | backup data machine port(default 22) | NO | -| -path | path to backup data (default current path) | NO | -| -granularity | partition | YES | -| -l | speed limit (default 1024 speed limit range 0 to 104857601 unit Kbit/s) | NO | -| -startdate | start time (including default 1970-01-01) | YES | -| -enddate | end time (included) | YES | - -#### Cluster upgrade -```bash -iotdbctl cluster dist-lib default_cluster -``` -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` - -* Upload lib package - -Note that after performing the upgrade, please restart IoTDB for it to take effect. - -#### Cluster initialization -```bash -iotdbctl cluster init default_cluster -``` -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` -* Initialize cluster configuration - -#### View cluster process status -```bash -iotdbctl cluster status default_cluster -``` - -* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` -* Display the survival status of each node in the cluster - -#### Cluster authorization activation - -Cluster activation is activated by entering the activation code by default, or by using the - op license_path activated through license path - -* Default activation method -```bash -iotdbctl cluster activate default_cluster -``` -* Find the yaml file in the default location based on `cluster-name` and obtain the `confignode_servers` configuration information -* Obtain the machine code inside -* Waiting for activation code input - -```bash -Machine code: -Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== -JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== -Please enter the activation code: -JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= -Activation successful -``` -* Activate a node - -```bash -iotdbctl cluster activate default_cluster -N confignode1 -``` - -* Activate through license path - -```bash -iotdbctl cluster activate default_cluster -op license_path -``` -* Find the yaml file in the default location based on `cluster-name` and obtain the `confignode_servers` configuration information -* Obtain the machine code inside -* Waiting for activation code input - -```bash -Machine code: -Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== 
-JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== -Please enter the activation code: -JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= -Activation successful -``` -* Activate a node - -```bash -iotdbctl cluster activate default_cluster -N confignode1 -op license_path -``` - -#### Cluster Health Check -```bash -iotdbctl cluster health_check default_cluster -``` -* Locate the yaml file in the default location based on the cluster-name to retrieve confignode_servers and datanode_servers configuration information. -* Execute health_check.sh on each node. -* Single Node Health Check -```bash -iotdbctl cluster health_check default_cluster -N datanode_1 -``` -* Locate the yaml file in the default location based on the cluster-name to retrieve datanode_servers configuration information. -* Execute health_check.sh on datanode1. - -#### Cluster Shutdown Backup - -```bash -iotdbctl cluster backup default_cluster -``` -* Locate the yaml file in the default location based on the cluster-name to retrieve confignode_servers and datanode_servers configuration information. -* Execute backup.sh on each node - -* Single Node Backup - -```bash -iotdbctl cluster backup default_cluster -N datanode_1 -``` - -* Locate the yaml file in the default location based on the cluster-name to retrieve datanode_servers configuration information. -* Execute backup.sh on datanode1. - Note: Multi-node deployment on a single machine only supports quick mode. - -#### Cluster Metadata Import -```bash -iotdbctl cluster importschema default_cluster -N datanode1 -param "-s ./dump0.csv -fd ./failed/ -lpf 10000" -``` -* Locate the yaml file in the default location based on the cluster-name to retrieve datanode_servers configuration information. -* Execute metadata import with import-schema.sh on datanode1. -* Parameters for -param are as follows: - -| command | description | required | -|------------|-------------------------------------------------------------------------|----------| -| -s | Specify the data file to be imported. You can specify a file or a directory. If a directory is specified, all files with a .csv extension in the directory will be imported in bulk. | YES | -| -fd | Specify a directory to store failed import files. If this parameter is not specified, failed files will be saved in the source data directory with the extension .failed added to the original filename. | No | -| -lpf | Specify the number of lines written to each failed import file. The default is 10000.| NO | - -#### Cluster Metadata Export - -```bash -iotdbctl cluster exportschema default_cluster -N datanode1 -param "-t ./ -pf ./pattern.txt -lpf 10 -t 10000" -``` - -* Locate the yaml file in the default location based on the cluster-name to retrieve datanode_servers configuration information. -* Execute metadata export with export-schema.sh on datanode1. -* Parameters for -param are as follows: - -| command | description | required | -|-------------|-------------------------------------------------------------------------|----------| -| -t | Specify the output path for the exported CSV file. | YES | -| -path | Specify the path pattern for exporting metadata. If this parameter is specified, the -s parameter will be ignored. Example: root.stock.** | NO | -| -pf | If -path is not specified, this parameter must be specified. It designates the file path containing the metadata paths to be exported, supporting txt file format. 
Each path to be exported is on a new line.| NO | -| -lpf | Specify the maximum number of lines for the exported dump file. The default is 10000.| NO | -| -timeout | Specify the timeout for session queries in milliseconds.| NO | - - -### Introduction to Cluster Deployment Tool Samples - -In the cluster deployment tool installation directory config/example, there are three yaml examples. If necessary, you can copy them to config and modify them. - -| name | description | -|-----------------------------|------------------------------------------------| -| default\_1c1d.yaml | 1 confignode and 1 datanode configuration example | -| default\_3c3d.yaml | 3 confignode and 3 datanode configuration samples | -| default\_3c3d\_grafa\_prome | 3 confignode and 3 datanode, Grafana, Prometheus configuration examples | - - -## IoTDB Data Directory Overview Tool - -IoTDB data directory overview tool is used to print an overview of the IoTDB data directory structure. The location is tools/tsfile/print-iotdb-data-dir. - -### Usage - -- For Windows: - -```bash -.\print-iotdb-data-dir.bat () -``` - -- For Linux or MacOs: - -```shell -./print-iotdb-data-dir.sh () -``` - -Note: if the storage path of the output overview file is not set, the default relative path "IoTDB_data_dir_overview.txt" will be used. - -### Example - -Use Windows in this example: - -`````````````````````````bash -.\print-iotdb-data-dir.bat D:\github\master\iotdb\data\datanode\data -```````````````````````` -Starting Printing the IoTDB Data Directory Overview -```````````````````````` -output save path:IoTDB_data_dir_overview.txt -data dir num:1 -143 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-common.properties, use the default configs. -|============================================================== -|D:\github\master\iotdb\data\datanode\data -|--sequence -| |--root.redirect0 -| | |--1 -| | | |--0 -| |--root.redirect1 -| | |--2 -| | | |--0 -| |--root.redirect2 -| | |--3 -| | | |--0 -| |--root.redirect3 -| | |--4 -| | | |--0 -| |--root.redirect4 -| | |--5 -| | | |--0 -| |--root.redirect5 -| | |--6 -| | | |--0 -| |--root.sg1 -| | |--0 -| | | |--0 -| | | |--2760 -|--unsequence -|============================================================== -````````````````````````` - -## TsFile Sketch Tool - -TsFile sketch tool is used to print the content of a TsFile in sketch mode. The location is tools/tsfile/print-tsfile. - -### Usage - -- For Windows: - -``` -.\print-tsfile-sketch.bat () -``` - -- For Linux or MacOs: - -``` -./print-tsfile-sketch.sh () -``` - -Note: if the storage path of the output sketch file is not set, the default relative path "TsFile_sketch_view.txt" will be used. - -### Example - -Use Windows in this example: - -`````````````````````````bash -.\print-tsfile.bat D:\github\master\1669359533965-1-0-0.tsfile D:\github\master\sketch.txt -```````````````````````` -Starting Printing the TsFile Sketch -```````````````````````` -TsFile path:D:\github\master\1669359533965-1-0-0.tsfile -Sketch save path:D:\github\master\sketch.txt -148 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-common.properties, use the default configs. 
--------------------------------- TsFile Sketch -------------------------------- -file path: D:\github\master\1669359533965-1-0-0.tsfile -file length: 2974 - - POSITION| CONTENT - -------- ------- - 0| [magic head] TsFile - 6| [version number] 3 -||||||||||||||||||||| [Chunk Group] of root.sg1.d1, num of Chunks:3 - 7| [Chunk Group Header] - | [marker] 0 - | [deviceID] root.sg1.d1 - 20| [Chunk] of root.sg1.d1.s1, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] - | [chunk header] marker=5, measurementID=s1, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE - | [page] UncompressedSize:862, CompressedSize:860 - 893| [Chunk] of root.sg1.d1.s2, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] - | [chunk header] marker=5, measurementID=s2, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE - | [page] UncompressedSize:862, CompressedSize:860 - 1766| [Chunk] of root.sg1.d1.s3, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] - | [chunk header] marker=5, measurementID=s3, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE - | [page] UncompressedSize:862, CompressedSize:860 -||||||||||||||||||||| [Chunk Group] of root.sg1.d1 ends - 2656| [marker] 2 - 2657| [TimeseriesIndex] of root.sg1.d1.s1, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] - | [ChunkIndex] offset=20 - 2728| [TimeseriesIndex] of root.sg1.d1.s2, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] - | [ChunkIndex] offset=893 - 2799| [TimeseriesIndex] of root.sg1.d1.s3, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] - | [ChunkIndex] offset=1766 - 2870| [IndexOfTimerseriesIndex Node] type=LEAF_MEASUREMENT - | - | -||||||||||||||||||||| [TsFileMetadata] begins - 2891| [IndexOfTimerseriesIndex Node] type=LEAF_DEVICE - | - | - | [meta offset] 2656 - | [bloom filter] bit vector byte array length=31, filterSize=256, hashFunctionSize=5 -||||||||||||||||||||| [TsFileMetadata] ends - 2964| [TsFileMetadataSize] 73 - 2968| [magic tail] TsFile - 2974| END of TsFile ----------------------------- IndexOfTimerseriesIndex Tree ----------------------------- - [MetadataIndex:LEAF_DEVICE] - └──────[root.sg1.d1,2870] - [MetadataIndex:LEAF_MEASUREMENT] - └──────[s1,2657] ----------------------------------- TsFile Sketch End ---------------------------------- -````````````````````````` - -Explanations: - -- Separated by "|", the left is the actual position in the TsFile, and the right is the summary content. 
-- "||||||||||||||||||||" is the guide information added to enhance readability, not the actual data stored in TsFile. -- The last printed "IndexOfTimerseriesIndex Tree" is a reorganization of the metadata index tree at the end of the TsFile, which is convenient for intuitive understanding, and again not the actual data stored in TsFile. - -## TsFile Resource Sketch Tool - -TsFile resource sketch tool is used to print the content of a TsFile resource file. The location is tools/tsfile/print-tsfile-resource-files. - -### Usage - -- For Windows: - -```bash -.\print-tsfile-resource-files.bat -``` - -- For Linux or MacOs: - -``` -./print-tsfile-resource-files.sh -``` - -### Example - -Use Windows in this example: - -`````````````````````````bash -.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 -```````````````````````` -Starting Printing the TsFileResources -```````````````````````` -147 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-common.properties, use the default configs. -230 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-common.properties, use default configuration -231 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-common.properties from any of the known sources. -233 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-datanode.properties, use default configuration -237 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-datanode.properties from any of the known sources. -Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... - -Resource plan index range [9223372036854775807, -9223372036854775808] -device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) - -Analyzing the resource file folder D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 finished. -````````````````````````` - -`````````````````````````bash -.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource -```````````````````````` -Starting Printing the TsFileResources -```````````````````````` -178 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-common.properties, use default configuration -186 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-common.properties, use the default configs. -187 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-common.properties from any of the known sources. -188 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-datanode.properties, use default configuration -192 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-datanode.properties from any of the known sources. -Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... 
- -Resource plan index range [9223372036854775807, -9223372036854775808] -device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) - -Analyzing the resource file D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource finished. -````````````````````````` diff --git a/src/UserGuide/V1.3.0-2/Tools-System/Monitor-Tool_apache.md b/src/UserGuide/V1.3.0-2/Tools-System/Monitor-Tool_apache.md deleted file mode 100644 index ec3b8048d..000000000 --- a/src/UserGuide/V1.3.0-2/Tools-System/Monitor-Tool_apache.md +++ /dev/null @@ -1,180 +0,0 @@ - - -# Monitor Tool - -## Prometheus - -### The mapping from metric type to prometheus format - -> For metrics whose Metric Name is name and Tags are K1=V1, ..., Kn=Vn, the mapping is as follows, where value is a -> specific value - -| Metric Type | Mapping | -| ---------------- | ------------------------------------------------------------ | -| Counter | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | -| AutoGauge、Gauge | name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | -| Histogram | name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | -| Rate | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m1"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m5"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m15"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="mean"} value | -| Timer | name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | - -### Config File - -1) Taking DataNode as an example, modify the iotdb-datanode.properties configuration file as follows: - -```properties -dn_metric_reporter_list=PROMETHEUS -dn_metric_level=CORE -dn_metric_prometheus_reporter_port=9091 -``` - -Then you can get metrics data as follows - -2) Start IoTDB DataNodes -3) Open a browser or use ```curl``` to visit ```http://servier_ip:9091/metrics```, you can get the following metric - data: - -``` -... -# HELP file_count -# TYPE file_count gauge -file_count{name="wal",} 0.0 -file_count{name="unseq",} 0.0 -file_count{name="seq",} 2.0 -... -``` - -### Prometheus + Grafana - -As shown above, IoTDB exposes monitoring metrics data in the standard Prometheus format to the outside world. Prometheus -can be used to collect and store monitoring indicators, and Grafana can be used to visualize monitoring indicators. - -The following picture describes the relationships among IoTDB, Prometheus and Grafana - -![iotdb_prometheus_grafana](/img/UserGuide/System-Tools/Metrics/iotdb_prometheus_grafana.png) - -1. Along with running, IoTDB will collect its metrics continuously. -2. Prometheus scrapes metrics from IoTDB at a constant interval (can be configured). -3. Prometheus saves these metrics to its inner TSDB. -4. Grafana queries metrics from Prometheus at a constant interval (can be configured) and then presents them on the - graph. - -So, we need to do some additional works to configure and deploy Prometheus and Grafana. - -For instance, you can config your Prometheus as follows to get metrics data from IoTDB: - -```yaml -job_name: pull-metrics -honor_labels: true -honor_timestamps: true -scrape_interval: 15s -scrape_timeout: 10s -metrics_path: /metrics -scheme: http -follow_redirects: true -static_configs: - - targets: - - localhost:9091 -``` - -The following documents may help you have a good journey with Prometheus and Grafana. - -[Prometheus getting_started](https://prometheus.io/docs/prometheus/latest/getting_started/) - -[Prometheus scrape metrics](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) - -[Grafana getting_started](https://grafana.com/docs/grafana/latest/getting-started/getting-started/) - -[Grafana query metrics from Prometheus](https://prometheus.io/docs/visualization/grafana/#grafana-support-for-prometheus) - -## Apache IoTDB Dashboard - -`Apache IoTDB Dashboard` is available as a supplement to IoTDB Enterprise Edition, designed for unified centralized operations and management. With it, multiple clusters can be monitored through a single panel. You can access the Dashboard's Json file by contacting Commerce. - - -![Apache IoTDB Dashboard](/img/%E7%9B%91%E6%8E%A7%20default%20cluster.png) - -![Apache IoTDB Dashboard](/img/%E7%9B%91%E6%8E%A7%20cluster2.png) - - - -### Cluster Overview - -Including but not limited to: - -- Total cluster CPU cores, memory space, and hard disk space. -- Number of ConfigNodes and DataNodes in the cluster. -- Cluster uptime duration. -- Cluster write speed. -- Current CPU, memory, and disk usage across all nodes in the cluster. -- Information on individual nodes. - -![](/img/%E7%9B%91%E6%8E%A7%20%E6%A6%82%E8%A7%88.png) - - -### Data Writing - -Including but not limited to: - -- Average write latency, median latency, and the 99% percentile latency. -- Number and size of WAL files. -- Node WAL flush SyncBuffer latency. 
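-
-Before relying on these dashboard panels, you can spot-check that write-related series are actually exported by a DataNode. The snippet below is only an illustrative filter: the exact metric names depend on the IoTDB version and the configured metric level.
-
-```bash
-# List a few write/WAL-related series from a DataNode's Prometheus endpoint.
-# 127.0.0.1:9091 stands for the dn_metric_prometheus_reporter_port configured above;
-# the grep pattern is illustrative rather than an exhaustive list of metric names.
-curl -s http://127.0.0.1:9091/metrics | grep -iE 'wal|write' | head -n 20
-```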
- -![](/img/%E7%9B%91%E6%8E%A7%20%E5%86%99%E5%85%A5.png) - -### Data Querying - -Including but not limited to: - -- Node query load times for time series metadata. -- Node read duration for time series. -- Node edit duration for time series metadata. -- Node query load time for Chunk metadata list. -- Node edit duration for Chunk metadata. -- Node filtering duration based on Chunk metadata. -- Average time to construct a Chunk Reader. - -![](/img/%E7%9B%91%E6%8E%A7%20%E6%9F%A5%E8%AF%A2.png) - -### Storage Engine - -Including but not limited to: - -- File count and sizes by type. -- The count and size of TsFiles at various stages. -- Number and duration of various tasks. - -![](/img/%E7%9B%91%E6%8E%A7%20%E5%AD%98%E5%82%A8%E5%BC%95%E6%93%8E.png) - -### System Monitoring - -Including but not limited to: - -- System memory, swap memory, and process memory. -- Disk space, file count, and file sizes. -- JVM GC time percentage, GC occurrences by type, GC volume, and heap memory usage across generations. -- Network transmission rate, packet sending rate - -![](/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E5%86%85%E5%AD%98%E4%B8%8E%E7%A1%AC%E7%9B%98.png) - -![](/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9Fjvm.png) - -![](/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E7%BD%91%E7%BB%9C.png) diff --git a/src/UserGuide/V1.3.0-2/Tools-System/Monitor-Tool_timecho.md b/src/UserGuide/V1.3.0-2/Tools-System/Monitor-Tool_timecho.md deleted file mode 100644 index d15b5d0e6..000000000 --- a/src/UserGuide/V1.3.0-2/Tools-System/Monitor-Tool_timecho.md +++ /dev/null @@ -1,180 +0,0 @@ - - -# Monitor Tool - -## Prometheus - -### The mapping from metric type to prometheus format - -> For metrics whose Metric Name is name and Tags are K1=V1, ..., Kn=Vn, the mapping is as follows, where value is a -> specific value - -| Metric Type | Mapping | -| ---------------- | ------------------------------------------------------------ | -| Counter | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | -| AutoGauge、Gauge | name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | -| Histogram | name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | -| Rate | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m1"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m5"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m15"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="mean"} value | -| Timer | name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | - -### Config File - -1) Taking DataNode as an example, modify the iotdb-datanode.properties configuration file as follows: - -```properties -dn_metric_reporter_list=PROMETHEUS -dn_metric_level=CORE -dn_metric_prometheus_reporter_port=9091 -``` - -Then you can get metrics data as follows - -2) Start IoTDB DataNodes -3) Open a browser or use ```curl``` to visit ```http://servier_ip:9091/metrics```, you can get the following metric - data: - -``` -... -# HELP file_count -# TYPE file_count gauge -file_count{name="wal",} 0.0 -file_count{name="unseq",} 0.0 -file_count{name="seq",} 2.0 -... -``` - -### Prometheus + Grafana - -As shown above, IoTDB exposes monitoring metrics data in the standard Prometheus format to the outside world. Prometheus -can be used to collect and store monitoring indicators, and Grafana can be used to visualize monitoring indicators. - -The following picture describes the relationships among IoTDB, Prometheus and Grafana - -![iotdb_prometheus_grafana](/img/UserGuide/System-Tools/Metrics/iotdb_prometheus_grafana.png) - -1. Along with running, IoTDB will collect its metrics continuously. -2. Prometheus scrapes metrics from IoTDB at a constant interval (can be configured). -3. Prometheus saves these metrics to its inner TSDB. -4. Grafana queries metrics from Prometheus at a constant interval (can be configured) and then presents them on the - graph. - -So, we need to do some additional works to configure and deploy Prometheus and Grafana. - -For instance, you can config your Prometheus as follows to get metrics data from IoTDB: - -```yaml -job_name: pull-metrics -honor_labels: true -honor_timestamps: true -scrape_interval: 15s -scrape_timeout: 10s -metrics_path: /metrics -scheme: http -follow_redirects: true -static_configs: - - targets: - - localhost:9091 -``` - -The following documents may help you have a good journey with Prometheus and Grafana. - -[Prometheus getting_started](https://prometheus.io/docs/prometheus/latest/getting_started/) - -[Prometheus scrape metrics](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) - -[Grafana getting_started](https://grafana.com/docs/grafana/latest/getting-started/getting-started/) - -[Grafana query metrics from Prometheus](https://prometheus.io/docs/visualization/grafana/#grafana-support-for-prometheus) - -## Apache IoTDB Dashboard - -We introduce the Apache IoTDB Dashboard, designed for unified centralized operations and management. With it, multiple clusters can be monitored through a single panel. - -![Apache IoTDB Dashboard](/img/%E7%9B%91%E6%8E%A7%20default%20cluster.png) - -![Apache IoTDB Dashboard](/img/%E7%9B%91%E6%8E%A7%20cluster2.png) - - -You can access the Dashboard's Json file in the enterprise edition. - -### Cluster Overview - -Including but not limited to: - -- Total cluster CPU cores, memory space, and hard disk space. -- Number of ConfigNodes and DataNodes in the cluster. -- Cluster uptime duration. -- Cluster write speed. -- Current CPU, memory, and disk usage across all nodes in the cluster. -- Information on individual nodes. - -![](/img/%E7%9B%91%E6%8E%A7%20%E6%A6%82%E8%A7%88.png) - - -### Data Writing - -Including but not limited to: - -- Average write latency, median latency, and the 99% percentile latency. -- Number and size of WAL files. -- Node WAL flush SyncBuffer latency. 
- -![](/img/%E7%9B%91%E6%8E%A7%20%E5%86%99%E5%85%A5.png) - -### Data Querying - -Including but not limited to: - -- Node query load times for time series metadata. -- Node read duration for time series. -- Node edit duration for time series metadata. -- Node query load time for Chunk metadata list. -- Node edit duration for Chunk metadata. -- Node filtering duration based on Chunk metadata. -- Average time to construct a Chunk Reader. - -![](/img/%E7%9B%91%E6%8E%A7%20%E6%9F%A5%E8%AF%A2.png) - -### Storage Engine - -Including but not limited to: - -- File count and sizes by type. -- The count and size of TsFiles at various stages. -- Number and duration of various tasks. - -![](/img/%E7%9B%91%E6%8E%A7%20%E5%AD%98%E5%82%A8%E5%BC%95%E6%93%8E.png) - -### System Monitoring - -Including but not limited to: - -- System memory, swap memory, and process memory. -- Disk space, file count, and file sizes. -- JVM GC time percentage, GC occurrences by type, GC volume, and heap memory usage across generations. -- Network transmission rate, packet sending rate - -![](/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E5%86%85%E5%AD%98%E4%B8%8E%E7%A1%AC%E7%9B%98.png) - -![](/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9Fjvm.png) - -![](/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E7%BD%91%E7%BB%9C.png) diff --git a/src/UserGuide/V1.3.0-2/Tools-System/TsFile-Import-Export-Tool.md b/src/UserGuide/V1.3.0-2/Tools-System/TsFile-Import-Export-Tool.md deleted file mode 100644 index 1fbb9a519..000000000 --- a/src/UserGuide/V1.3.0-2/Tools-System/TsFile-Import-Export-Tool.md +++ /dev/null @@ -1,428 +0,0 @@ - - -# TsFile Import Export Script - -For different scenarios, IoTDB provides users with a variety of operation methods for batch importing data. This chapter introduces the two most commonly used methods for importing in the form of CSV text and importing in the form of TsFile files. - -## TsFile Load And Export Script - -### TsFile Load Tool - -#### Introduction - -The load external tsfile tool allows users to load tsfiles, delete a tsfile, or move a tsfile to target directory from the running Apache IoTDB instance. Alternatively, you can use scripts to load tsfiles into IoTDB, for more information. - -#### Load with SQL - -The user sends specified commands to the Apache IoTDB system through the Cli tool or JDBC to use the tool. - -##### Load Tsfiles - -The command to load tsfiles is `load [sglevel=int][verify=true/false][onSuccess=delete/none]`. - -This command has two usages: - -1. Load a single tsfile by specifying a file path (absolute path). - -The first parameter indicates the path of the tsfile to be loaded. This command has three options: sglevel, verify, onSuccess. - -SGLEVEL option. If the database correspond to the tsfile does not exist, the user can set the level of database through the fourth parameter. By default, it uses the database level which is set in `iotdb-common.properties`. - -VERIFY option. If this parameter is true, All timeseries in this loading tsfile will be compared with the timeseries in IoTDB. If existing a measurement which has different datatype with the measurement in IoTDB, the loading process will be stopped and exit. If consistence can be promised, setting false for this parameter will be a better choice. - -ONSUCCESS option. The default value is DELETE, which means the processing method of successfully loaded tsfiles, and DELETE means after the tsfile is successfully loaded, it will be deleted. 
NONE means after the tsfile is successfully loaded, it will be remained in the origin dir. - -If the `.resource` file corresponding to the file exists, it will be loaded into the data directory and engine of the Apache IoTDB. Otherwise, the corresponding `.resource` file will be regenerated from the tsfile file. - -Examples: - -* `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` -* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true` -* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false` -* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` -* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` -* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true sglevel=1` -* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1` -* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true onSuccess=none` -* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1 onSuccess=delete` - -2. Load a batch of files by specifying a folder path (absolute path). - -The first parameter indicates the path of the tsfile to be loaded. The options above also works for this command. - -Examples: - -* `load '/Users/Desktop/data'` -* `load '/Users/Desktop/data' verify=false` -* `load '/Users/Desktop/data' verify=true` -* `load '/Users/Desktop/data' verify=true sglevel=1` -* `load '/Users/Desktop/data' verify=false sglevel=1 onSuccess=delete` - -**NOTICE**: When `$IOTDB_HOME$/conf/iotdb-common.properties` has `enable_auto_create_schema=true`, it will automatically create metadata in TSFILE, otherwise it will not be created automatically. - -#### Load with Script - -Run rewrite-tsfile.bat if you are in a Windows environment, or rewrite-tsfile.sh if you are on Linux or Unix. - -```bash -./load-tsfile.bat -f filePath [-h host] [-p port] [-u username] [-pw password] [--sgLevel int] [--verify true/false] [--onSuccess none/delete] --f File/Directory to be load, required --h IoTDB Host address, optional field, 127.0.0.1 by default --p IoTDB port, optional field, 6667 by default --u IoTDB user name, optional field, root by default --pw IoTDB password, optional field, root by default ---sgLevel Sg level of loading Tsfile, optional field, default_storage_group_level in iotdb-common.properties by default ---verify Verify schema or not, optional field, True by default ---onSuccess Delete or remain origin TsFile after loading, optional field, none by default -``` - -##### Example - -Assuming that an IoTDB instance is running on server 192.168.0.101:6667, you want to load all TsFile files from the locally saved TsFile backup folder D:\IoTDB\data into this IoTDB instance. - -First move to the folder `$IOTDB_HOME/tools/`, open the command line, and execute - -```bash -./load-rewrite.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root -``` - -After waiting for the script execution to complete, you can check that the data in the IoTDB instance has been loaded correctly. - -##### Q&A - -- Cannot find or load the main class - - It may be because the environment variable $IOTDB_HOME is not set, please set the environment variable and try again -- -f option must be set! - - The input command is missing the -f field (file or folder path to be loaded) or the -u field (user name), please add it and re-execute -- What if the execution crashes in the middle and you want to reload? 
- - You re-execute the command just now, reloading the data will not affect the correctness after loading - -TsFile can help you export the result set in the format of TsFile file to the specified path by executing the sql, command line sql, and sql file. - -### TsFile Export Tool - -#### Syntax - -```shell -# Unix/OS X -> tools/export-tsfile.sh -h -p -u -pw -td [-f -q -s ] - -# Windows -> tools\export-tsfile.bat -h -p -u -pw -td [-f -q -s ] -``` - -* `-h `: - - The host address of the IoTDB service. -* `-p `: - - The port number of the IoTDB service. -* `-u `: - - The username of the IoTDB service. -* `-pw `: - - Password for IoTDB service. -* `-td `: - - Specify the output path for the exported TsFile file. -* `-f `: - - For the file name of the exported TsFile file, just write the file name, and cannot include the file path and suffix. If the sql file or console input contains multiple sqls, multiple files will be generated in the order of sql. - - Example: There are three SQLs in the file or command line, and -f param is "dump", then three TsFile files: dump0.tsfile、dump1.tsfile、dump2.tsfile will be generated in the target path. -* `-q `: - - Directly specify the query statement you want to execute in the command. - - Example: `select * from root.** limit 100` -* `-s `: - - Specify a SQL file that contains one or more SQL statements. If an SQL file contains multiple SQL statements, the SQL statements should be separated by newlines. Each SQL statement corresponds to an output TsFile file. -* `-t `: - - Specifies the timeout period for session queries, in milliseconds - - -In addition, if you do not use the `-s` and `-q` parameters, after the export script is started, you need to enter the query statement as prompted by the program, and different query results will be saved to different TsFile files. - -#### Example - -```shell -# Unix/OS X -> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -# or -> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" -# Or -> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -# Or -> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -# Or -> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 - -# Windows -> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -# Or -> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" -# Or -> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -# Or -> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -# Or -> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 -``` - -#### Q&A - -- It is recommended not to execute the write data command at the same time when loading data, which may lead to insufficient memory in the JVM. - -## CSV Tool - -The CSV tool can help you import data in CSV format to IoTDB or export data from IoTDB to a CSV file. - -### Usage of export-csv.sh - -#### Syntax - -```shell -# Unix/OS X -> tools/export-csv.sh -h -p -u -pw -td [-tf -datatype -q -s -linesPerFile ] - -# Windows -> tools\export-csv.bat -h -p -u -pw -td [-tf -datatype -q -s -linesPerFile ] -``` - -Description: - -* `-datatype`: - - true (by default): print the data type of timesries in the head line of CSV file. 
i.e., `Time, root.sg1.d1.s1(INT32), root.sg1.d1.s2(INT64)`. - - false: only print the timeseries name in the head line of the CSV file. i.e., `Time, root.sg1.d1.s1 , root.sg1.d1.s2` -* `-q `: - - specifying a query command that you want to execute - - example: `select * from root.** limit 100`, or `select * from root.** limit 100 align by device` -* `-s `: - - specifying a SQL file which can consist of more than one sql. If there are multiple SQLs in one SQL file, the SQLs should be separated by line breaks. And, for each SQL, a output CSV file will be generated. -* `-td `: - - specifying the directory that the data will be exported -* `-tf `: - - specifying a time format that you want. The time format have to obey [ISO 8601](https://calendars.wikia.org/wiki/ISO_8601) standard. If you want to save the time as the timestamp, then setting `-tf timestamp` - - example: `-tf yyyy-MM-dd\ HH:mm:ss` or `-tf timestamp` -* `-linesPerFile `: - - Specifying lines of each dump file, `10000` is default. - - example: `-linesPerFile 1` -* `-t `: - - Specifies the timeout period for session queries, in milliseconds - - -More, if you don't use one of `-s` and `-q`, you need to enter some queries after running the export script. The results of the different query will be saved to different CSV files. - -#### Example - -```shell -# Unix/OS X -> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -# Or -> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -# or -> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" -# Or -> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt -# Or -> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -# Or -> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -# Or -> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 - -# Windows -> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -# Or -> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -# or -> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" -# Or -> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt -# Or -> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -# Or -> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -# Or -> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 -``` - -#### Sample SQL file - -```sql -select * from root.**; -select * from root.** align by device; -``` - -The result of `select * from root.**` - -```sql -Time,root.ln.wf04.wt04.status(BOOLEAN),root.ln.wf03.wt03.hardware(TEXT),root.ln.wf02.wt02.status(BOOLEAN),root.ln.wf02.wt02.hardware(TEXT),root.ln.wf01.wt01.hardware(TEXT),root.ln.wf01.wt01.status(BOOLEAN) -1970-01-01T08:00:00.001+08:00,true,"v1",true,"v1",v1,true -1970-01-01T08:00:00.002+08:00,true,"v1",,,,true -``` - -The result of `select * from root.** align by device` - -```sql -Time,Device,hardware(TEXT),status(BOOLEAN) -1970-01-01T08:00:00.001+08:00,root.ln.wf01.wt01,"v1",true -1970-01-01T08:00:00.002+08:00,root.ln.wf01.wt01,,true 
-1970-01-01T08:00:00.001+08:00,root.ln.wf02.wt02,"v1",true -1970-01-01T08:00:00.001+08:00,root.ln.wf03.wt03,"v1", -1970-01-01T08:00:00.002+08:00,root.ln.wf03.wt03,"v1", -1970-01-01T08:00:00.001+08:00,root.ln.wf04.wt04,,true -1970-01-01T08:00:00.002+08:00,root.ln.wf04.wt04,,true -``` - -The data of boolean type signed by `true` and `false` without double quotes. And the text data will be enclosed in double quotes. - -#### Note - -Note that if fields exported by the export tool have the following special characters: - -1. `,`: the field will be escaped by `\`. - -### Usage of import-csv.sh - -#### Create Metadata (optional) - -```sql -CREATE DATABASE root.fit.d1; -CREATE DATABASE root.fit.d2; -CREATE DATABASE root.fit.p; -CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE; -CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN; -CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE; -CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE; -CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE; -``` - -IoTDB has the ability of type inference, so it is not necessary to create metadata before data import. However, we still recommend creating metadata before importing data using the CSV import tool, as this can avoid unnecessary type conversion errors. - -#### Sample CSV File to Be Imported - -The data aligned by time, and headers without data type. - -```sql -Time,root.test.t1.str,root.test.t2.str,root.test.t2.int -1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 -1970-01-01T08:00:00.002+08:00,"123",, -``` - -The data aligned by time, and headers with data type.(Text type data supports double quotation marks and no double quotation marks) - -```sql -Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.int(INT32) -1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 -1970-01-01T08:00:00.002+08:00,123,hello world,123 -1970-01-01T08:00:00.003+08:00,"123",, -1970-01-01T08:00:00.004+08:00,123,,12 -``` - -The data aligned by device, and headers without data type. - -```sql -Time,Device,str,int -1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", -1970-01-01T08:00:00.002+08:00,root.test.t1,"123", -1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 -``` - -The data aligned by device, and headers with data type.(Text type data supports double quotation marks and no double quotation marks) - -```sql -Time,Device,str(TEXT),int(INT32) -1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", -1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123 -1970-01-01T08:00:00.003+08:00,root.test.t1,,123 -``` - -#### Syntax - -```shell -# Unix/OS X -> tools/import-csv.sh -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] -# Windows -> tools\import-csv.bat -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] -``` - -Description: - -* `-f`: - - the CSV file that you want to import, and it could be a file or a folder. If a folder is specified, all TXT and CSV files in the folder will be imported in batches. - - example: `-f filename.csv` - -* `-fd`: - - specifying a directory to save files which save failed lines. If you don't use this parameter, the failed file will be saved at original directory, and the filename will be the source filename with suffix `.failed`. - - example: `-fd ./failed/` - -* `-aligned`: - - whether to use the aligned interface? The option `false` is default. 
- - example: `-aligned true` - -* `-batch`: - - specifying the point's number of a batch. If the program throw the exception `org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`, you can lower this parameter as appropriate. - - example: `-batch 100000`, `100000` is the default value. - -* `-tp `: - - specifying a time precision. Options includes `ms`(millisecond), `ns`(nanosecond), and `us`(microsecond), `ms` is default. - -* `-typeInfer `: - - specifying rules of type inference. - - Option `srcTsDataType` includes `boolean`,`int`,`long`,`float`,`double`,`NaN`. - - Option `dstTsDataType` includes `boolean`,`int`,`long`,`float`,`double`,`text`. - - When `srcTsDataType` is `boolean`, `dstTsDataType` should be between `boolean` and `text`. - - When `srcTsDataType` is `NaN`, `dstTsDataType` should be among `float`, `double` and `text`. - - When `srcTsDataType` is Numeric type, `dstTsDataType` precision should be greater than `srcTsDataType`. - - example: `-typeInfer boolean=text,float=double` - -* `-linesPerFailedFile `: - - Specifying lines of each failed file, `10000` is default. - - example: `-linesPerFailedFile 1` - -#### Example - -```sh -# Unix/OS X -> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -# or -> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -# or -> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -# or -> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double -# or -> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 - -# Windows -> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -# or -> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -# or -> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -# or -> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double -# or -> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 - -``` - -#### Note - -Note that the following special characters in fields need to be checked before importing: - -1. `,` : fields containing `,` should be escaped by `\`. -2. you can input time format like `yyyy-MM-dd'T'HH:mm:ss`, `yyy-MM-dd HH:mm:ss`, or `yyyy-MM-dd'T'HH:mm:ss.SSSZ`. -3. the `Time` column must be the first one. \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/Tools-System/Workbench_timecho.md b/src/UserGuide/V1.3.0-2/Tools-System/Workbench_timecho.md deleted file mode 100644 index 18955836a..000000000 --- a/src/UserGuide/V1.3.0-2/Tools-System/Workbench_timecho.md +++ /dev/null @@ -1,30 +0,0 @@ -# WorkBench -## Product Introduction -IoTDB Visualization Console is an extension component developed for industrial scenarios based on the IoTDB Enterprise Edition time series database. It integrates real-time data collection, storage, and analysis, aiming to provide users with efficient and reliable real-time data storage and query solutions. 
It features lightweight, high performance, and ease of use, seamlessly integrating with the Hadoop and Spark ecosystems. It is suitable for high-speed writing and complex analytical queries of massive time series data in industrial IoT applications. - -## Instructions for Use -| **Functional Module** | **Functional Description** | -| ---------------------- | ------------------------------------------------------------ | -| Instance Management | Support unified management of connected instances, support creation, editing, and deletion, while visualizing the relationships between multiple instances, helping customers manage multiple database instances more clearly | -| Home | Support viewing the service running status of each node in the database instance (such as activation status, running status, IP information, etc.), support viewing the running monitoring status of clusters, ConfigNodes, and DataNodes, monitor the operational health of the database, and determine if there are any potential operational issues with the instance. | -| Measurement Point List | Support directly viewing the measurement point information in the instance, including database information (such as database name, data retention time, number of devices, etc.), and measurement point information (measurement point name, data type, compression encoding, etc.), while also supporting the creation, export, and deletion of measurement points either individually or in batches. | -| Data Model | Support viewing hierarchical relationships and visually displaying the hierarchical model. | -| Data Query | Support interface-based query interactions for common data query scenarios, and enable batch import and export of queried data. | -| Statistical Query | Support interface-based query interactions for common statistical data scenarios, such as outputting results for maximum, minimum, average, and sum values. | -| SQL Operations | Support interactive SQL operations on the database through a graphical user interface, allowing for the execution of single or multiple statements, and displaying and exporting the results. | -| Trend | Support one-click visualization to view the overall trend of data, draw real-time and historical data for selected measurement points, and observe the real-time and historical operational status of the measurement points. | -| Analysis | Support visualizing data through different analysis methods (such as FFT) for visualization. | -| View | Support viewing information such as view name, view description, result measuring points, and expressions through the interface. Additionally, enable users to quickly create, edit, and delete views through interactive interfaces. | -| Data synchronization | Support the intuitive creation, viewing, and management of data synchronization tasks between databases. Enable direct viewing of task running status, synchronized data, and target addresses. Users can also monitor changes in synchronization status in real-time through the interface. | -| Permission management | Support interface-based control of permissions for managing and controlling database user access and operations. | -| Audit logs | Support detailed logging of user operations on the database, including Data Definition Language (DDL), Data Manipulation Language (DML), and query operations. Assist users in tracking and identifying potential security threats, database errors, and misuse behavior. 
| - -Main feature showcase -* Home -![首页.png](/img/%E9%A6%96%E9%A1%B5.png) -* Measurement Point List -![测点列表.png](/img/workbench-en-bxzk.png) -* Data Query -![数据查询.png](/img/%E6%95%B0%E6%8D%AE%E6%9F%A5%E8%AF%A2.png) -* Trend -![历史趋势.png](/img/%E5%8E%86%E5%8F%B2%E8%B6%8B%E5%8A%BF.png) \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/User-Manual/AINode_timecho.md b/src/UserGuide/V1.3.0-2/User-Manual/AINode_timecho.md deleted file mode 100644 index ba4f69d38..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/AINode_timecho.md +++ /dev/null @@ -1,655 +0,0 @@ - - -# AI Capability(AINode) - -AINode is the third internal node after ConfigNode and DataNode in Apache IoTDB, which extends the capability of machine learning analysis of time series by interacting with DataNode and ConfigNode of IoTDB cluster, supports the introduction of pre-existing machine learning models from the outside to be registered, and uses the registered models in the It supports the process of introducing existing machine learning models from outside for registration, and using the registered models to complete the time series analysis tasks on the specified time series data through simple SQL statements, which integrates the model creation, management and inference in the database engine. At present, we have provided machine learning algorithms or self-developed models for common timing analysis scenarios (e.g. prediction and anomaly detection). - -The system architecture is shown below: -::: center - -::: -The responsibilities of the three nodes are as follows: - -- **ConfigNode**: responsible for storing and managing the meta-information of the model; responsible for distributed node management. -- **DataNode**: responsible for receiving and parsing SQL requests from users; responsible for storing time-series data; responsible for preprocessing computation of data. -- **AINode**: responsible for model file import creation and model inference. - -## Advantageous features - -Compared with building a machine learning service alone, it has the following advantages: - -- **Simple and easy to use**: no need to use Python or Java programming, the complete process of machine learning model management and inference can be completed using SQL statements. Creating a model can be done using the CREATE MODEL statement, and using a model for inference can be done using the CALL INFERENCE (...) statement, making it simpler and more convenient to use. - - -- **Avoid Data Migration**: With IoTDB native machine learning, data stored in IoTDB can be directly applied to the inference of machine learning models without having to move the data to a separate machine learning service platform, which accelerates data processing, improves security, and reduces costs. - -![](/img/AInode1.png) - -- **Built-in Advanced Algorithms**: supports industry-leading machine learning analytics algorithms covering typical timing analysis tasks, empowering the timing database with native data analysis capabilities. Such as: - - **Time Series Forecasting**: learns patterns of change from past time series; thus outputs the most likely prediction of future series based on observations at a given past time. - - **Anomaly Detection for Time Series**: detects and identifies outliers in a given time series data, helping to discover anomalous behaviour in the time series. 
- - **Annotation for Time Series (Time Series Annotation)**: Adds additional information or markers, such as event occurrence, outliers, trend changes, etc., to each data point or specific time period to better understand and analyse the data. - - - -## Basic Concepts - -- **Model**: a machine learning model that takes time-series data as input and outputs the results or decisions of an analysis task. Model is the basic management unit of AINode, which supports adding (registration), deleting, checking, and using (inference) of models. -- **Create**: Load externally designed or trained model files or algorithms into MLNode for unified management and use by IoTDB. -- **Inference**: The process of using the created model to complete the timing analysis task applicable to the model on the specified timing data. -- **Built-in capabilities**: AINode comes with machine learning algorithms or home-grown models for common timing analysis scenarios (e.g., prediction and anomaly detection). - -::: center - -:::: - -## Installation and Deployment - -The deployment of AINode can be found in the document [Deployment Guidelines](../Deployment-and-Maintenance/AINode_Deployment_timecho.md#AINode-部署) . - - -## Usage Guidelines - -AINode provides model creation and deletion process for deep learning models related to timing data. Built-in models do not need to be created and deleted, they can be used directly, and the built-in model instances created after inference is completed will be destroyed automatically. - -### Registering Models - -A trained deep learning model can be registered by specifying the vector dimensions of the model's inputs and outputs, which can be used for model inference. - -Models that meet the following criteria can be registered in AINode: -1. Models trained on PyTorch 2.1.0 and 2.2.0 versions supported by AINode should avoid using features from versions 2.2.0 and above. -2. AINode supports models stored using PyTorch JIT, and the model file needs to include the parameters and structure of the model. -3. The input sequence of the model can contain one or more columns, and if there are multiple columns, they need to correspond to the model capability and model configuration file. -4. The input and output dimensions of the model must be clearly defined in the `config.yaml` configuration file. When using the model, it is necessary to strictly follow the input-output dimensions defined in the `config.yaml` configuration file. If the number of input and output columns does not match the configuration file, it will result in errors. - -The following is the SQL syntax definition for model registration. - -```SQL -create model using uri -``` - -The specific meanings of the parameters in the SQL are as follows: - -- model_name: a globally unique identifier for the model, which cannot be repeated. The model name has the following constraints: - - - Identifiers [ 0-9 a-z A-Z _ ] (letters, numbers, underscores) are allowed. - - Length is limited to 2-64 characters - - Case sensitive - -- uri: resource path to the model registration file, which should contain the **model weights model.pt file and the model's metadata description file config.yaml**. 
- - - Model weight file: the weight file obtained after the training of the deep learning model is completed, currently supporting pytorch training of the .pt file - - - yaml metadata description file: parameters related to the model structure that need to be provided when the model is registered, which must contain the input and output dimensions of the model for model inference: - - - | **Parameter name** | **Parameter description** | **Example** | - | ------------ | ---------------------------- | -------- | - | input_shape | Rows and columns of model inputs for model inference | [96,2] | - | output_shape | rows and columns of model outputs, for model inference | [48,2] | - - - In addition to model inference, the data types of model input and output can be specified: - - - | **Parameter name** | **Parameter description** | **Example** | - | ----------- | ------------------ | --------------------- | - | input_type | model input data type | ['float32','float32'] | - | output_type | data type of the model output | ['float32','float32'] | - - - In addition to this, additional notes can be specified for display during model management - - - | **Parameter name** | **Parameter description** | **Examples** | - | ---------- | ---------------------------------------------- | ------------------------------------------- | - | attributes | optional, user-defined model notes for model display | 'model_type': 'dlinear','kernel_size': '25' | - - -In addition to registration of local model files, registration can also be done by specifying remote resource paths via URIs, using open source model repositories (e.g. HuggingFace). - -#### Example - -In the current example folder, it contains model.pt and config.yaml files, model.pt is the training get, and the content of config.yaml is as follows: - -```YAML -configs. - # Required options - input_shape: [96, 2] # The model receives data in 96 rows x 2 columns. - output_shape: [48, 2] # Indicates that the model outputs 48 rows x 2 columns. - - # Optional Default is all float32 and the number of columns is the number of columns in the shape. - input_type: ["int64", "int64"] # Input data type, need to match the number of columns. - output_type: ["text", "int64"] #Output data type, need to match the number of columns. - -attributes: # Optional user-defined notes for the input. - 'model_type': 'dlinear' - 'kernel_size': '25' -``` - -Specify this folder as the load path to register the model. - -```SQL -IoTDB> create model dlinear_example using uri "file://. /example" -``` - -Alternatively, you can download the corresponding model file from huggingFace and register it. - -```SQL -IoTDB> create model dlinear_example using uri "https://huggingface.com/IoTDBML/dlinear/" -``` - -After the SQL is executed, the registration process will be carried out asynchronously, and you can view the registration status of the model through the model showcase (see the Model Showcase section), and the time consumed for successful registration is mainly affected by the size of the model file. - -Once the model registration is complete, you can call specific functions and perform model inference by using normal queries. - -### Viewing Models - -Successfully registered models can be queried for model-specific information through the show models command. The SQL definition is as follows: - -```SQL -show models - -show models -``` - -In addition to displaying information about all models directly, you can specify a model id to view information about a specific model. 
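For instance, to display only the model registered above, the model id can be appended to the statement (a minimal sketch, assuming the `dlinear_example` model from the registration example):

```SQL
IoTDB> show models dlinear_example
```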
The results of the model show contain the following information: - -| **ModelId** | **State** | **Configs** | **Attributes** | -| ------------ | ------------------------------------- | ---------------------------------------------- | -------------- | -| Model Unique Identifier | Model Registration Status (LOADING, ACTIVE, DROPPING) | InputShape, outputShapeInputTypes, outputTypes | Model Notes | - -State is used to show the current state of model registration, which consists of the following three stages - -- **LOADING**: The corresponding model meta information has been added to the configNode, and the model file is being transferred to the AINode node. -- **ACTIVE**: The model has been set up and the model is in the available state -- **DROPPING**: Model deletion is in progress, model related information is being deleted from configNode and AINode. -- **UNAVAILABLE**: Model creation failed, you can delete the failed model_name by drop model. - -#### Example - -```SQL -IoTDB> show models - - -+---------------------+--------------------------+-----------+----------------------------+-----------------------+ -| ModelId| ModelType| State| Configs| Notes| -+---------------------+--------------------------+-----------+----------------------------+-----------------------+ -| dlinear_example| USER_DEFINED| ACTIVE| inputShape:[96,2]| | -| | | | outputShape:[48,2]| | -| | | | inputDataType:[float,float]| | -| | | |outputDataType:[float,float]| | -| _STLForecaster| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| -| _NaiveForecaster| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| -| _ARIMA| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| -|_ExponentialSmoothing| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| -| _GaussianHMM|BUILT_IN_ANOMALY_DETECTION| ACTIVE| |Built-in model in IoTDB| -| _GMMHMM|BUILT_IN_ANOMALY_DETECTION| ACTIVE| |Built-in model in IoTDB| -| _Stray|BUILT_IN_ANOMALY_DETECTION| ACTIVE| |Built-in model in IoTDB| -+---------------------+--------------------------+-----------+------------------------------------------------------------+-----------------------+ -``` - -We have registered the corresponding model earlier, you can view the model status through the corresponding designation, active indicates that the model is successfully registered and can be used for inference. - -### Delete Model - -For a successfully registered model, the user can delete it via SQL. In addition to deleting the meta information on the configNode, this operation also deletes all the related model files under the AINode. The SQL is as follows: - -```SQL -drop model -``` - -You need to specify the model model_name that has been successfully registered to delete the corresponding model. Since model deletion involves the deletion of data on multiple nodes, the operation will not be completed immediately, and the state of the model at this time is DROPPING, and the model in this state cannot be used for model inference. 
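For example, the user-defined model registered earlier can be removed as follows (assuming the `dlinear_example` model id from the registration example):

```SQL
IoTDB> drop model dlinear_example
```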
- -### Using Built-in Model Reasoning - -The SQL syntax is as follows: - - -```SQL -call inference(,sql[,=]) -``` - -Built-in model inference does not require a registration process, the inference function can be used by calling the inference function through the call keyword, and its corresponding parameters are described as follows: - -- **built_in_model_name**: built-in model name -- **parameterName**: parameter name -- **parameterValue**: parameter value - -#### Built-in Models and Parameter Descriptions - -The following machine learning models are currently built-in, please refer to the following links for detailed parameter descriptions. - -| Model | built_in_model_name | Task type | Parameter description | -| -------------------- | --------------------- | -------- | ------------------------------------------------------------ | -| Arima | _Arima | Forecast | [Arima Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.arima.ARIMA.html?highlight=Arima) | -| STLForecaster | _STLForecaster | Forecast | [STLForecaster Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.trend.STLForecaster.html#sktime.forecasting.trend.STLForecaster) | -| NaiveForecaster | _NaiveForecaster | Forecast | [NaiveForecaster Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.naive.NaiveForecaster.html#naiveforecaster) | -| ExponentialSmoothing | _ExponentialSmoothing | Forecast | [ExponentialSmoothing 参Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.exp_smoothing.ExponentialSmoothing.html) | -| GaussianHMM | _GaussianHMM | Annotation | [GaussianHMMParameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.annotation.hmm_learn.gaussian.GaussianHMM.html) | -| GMMHMM | _GMMHMM | Annotation | [GMMHMM参数说明](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.annotation.hmm_learn.gmm.GMMHMM.html) | -| Stray | _Stray | Anomaly detection | [Stray Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.annotation.stray.STRAY.html) | - - -#### Example - -The following is an example of an operation using built-in model inference. The built-in Stray model is used for anomaly detection algorithm. The input is `[144,1]` and the output is `[144,1]`. We use it for reasoning through SQL. - -```SQL -IoTDB> select * from root.eg.airline -+-----------------------------+------------------+ -| Time|root.eg.airline.s0| -+-----------------------------+------------------+ -|1949-01-31T00:00:00.000+08:00| 224.0| -|1949-02-28T00:00:00.000+08:00| 118.0| -|1949-03-31T00:00:00.000+08:00| 132.0| -|1949-04-30T00:00:00.000+08:00| 129.0| -...... -|1960-09-30T00:00:00.000+08:00| 508.0| -|1960-10-31T00:00:00.000+08:00| 461.0| -|1960-11-30T00:00:00.000+08:00| 390.0| -|1960-12-31T00:00:00.000+08:00| 432.0| -+-----------------------------+------------------+ -Total line number = 144 - -IoTDB> call inference(_Stray, "select s0 from root.eg.airline", k=2) -+-------+ -|output0| -+-------+ -| 0| -| 0| -| 0| -| 0| -...... 
|      1|
|      1|
|      0|
|      0|
|      0|
|      0|
+-------+
Total line number = 144
```

### Reasoning with Deep Learning Models

The SQL syntax is as follows:

```SQL
call inference(<model_name>,sql[,window=<window_function>])


window_function:
    head(window_size)
    tail(window_size)
    count(window_size,sliding_step)
```

After a model has been registered, inference can be performed by calling the inference function with the `call` keyword. The parameters are as follows:

- **model_name**: the name of a registered model.
- **sql**: a SQL query statement whose result is used as the input of the model for inference. The numbers of rows and columns in the query result must match the sizes specified in the model's config. (It is not recommended to use a `SELECT *` clause here, because `*` does not sort the columns in IoTDB, so the column order is undefined; use an explicit list such as `SELECT s0,s1` to ensure the column order matches what the model input expects.)
- **window_function**: a window function that can be used during inference. Three window functions are currently provided to assist model inference:
  - **head(window_size)**: takes the first window_size points of the data for model inference; this window can be used for data cropping.
  ![](/img/AINode-call1.png)

  - **tail(window_size)**: takes the last window_size points of the data for model inference; this window can be used for data cropping.
  ![](/img/AINode-call2.png)

  - **count(window_size, sliding_step)**: a sliding window based on the number of points; the data in each window is run through the model separately. As shown in the illustration below, a window function with window_size 2 splits the input dataset into three windows, and each window produces its own inference result. This window can be used for continuous inference.
  ![](/img/AINode-call3.png)

**Explanation 1**: The window can be used to solve the problem of cropping rows when the result of the SQL query does not match the number of input rows the model requires. Note that when the number of columns does not match, or the number of rows is smaller than the model requires, inference cannot proceed and an error message is returned.

**Explanation 2**: In deep learning applications, timestamp-derived features (the time column in the data) are often used as covariates in generative tasks and are fed into the model to enhance it, but the time column is generally not included in the model's output. To keep the implementation general, the inference results correspond only to the real output of the model; if the model does not output a time column, it is not included in the results.


#### Example

The following is an example of inference with a deep learning model, using the `dlinear` prediction model with input `[96,2]` and output `[48,2]` registered above; we invoke it via SQL.
- -```Shell -IoTDB> select s1,s2 from root.** -+-----------------------------+-------------------+-------------------+ -| Time| root.eg.etth.s0| root.eg.etth.s1| -+-----------------------------+-------------------+-------------------+ -|1990-01-01T00:00:00.000+08:00| 0.7855| 1.611| -|1990-01-02T00:00:00.000+08:00| 0.7818| 1.61| -|1990-01-03T00:00:00.000+08:00| 0.7867| 1.6293| -|1990-01-04T00:00:00.000+08:00| 0.786| 1.637| -|1990-01-05T00:00:00.000+08:00| 0.7849| 1.653| -|1990-01-06T00:00:00.000+08:00| 0.7866| 1.6537| -|1990-01-07T00:00:00.000+08:00| 0.7886| 1.662| -...... -|1990-03-31T00:00:00.000+08:00| 0.7585| 1.678| -|1990-04-01T00:00:00.000+08:00| 0.7587| 1.6763| -|1990-04-02T00:00:00.000+08:00| 0.76| 1.6813| -|1990-04-03T00:00:00.000+08:00| 0.7669| 1.684| -|1990-04-04T00:00:00.000+08:00| 0.7645| 1.677| -|1990-04-05T00:00:00.000+08:00| 0.7625| 1.68| -|1990-04-06T00:00:00.000+08:00| 0.7617| 1.6917| -+-----------------------------+-------------------+-------------------+ -Total line number = 96 - -IoTDB> call inference(dlinear_example,"select s0,s1 from root.**") -+--------------------------------------------+-----------------------------+ -| _result_0| _result_1| -+--------------------------------------------+-----------------------------+ -| 0.726302981376648| 1.6549958229064941| -| 0.7354921698570251| 1.6482787370681763| -| 0.7238251566886902| 1.6278168201446533| -...... -| 0.7692174911499023| 1.654654049873352| -| 0.7685555815696716| 1.6625318765640259| -| 0.7856493592262268| 1.6508299350738525| -+--------------------------------------------+-----------------------------+ -Total line number = 48 -``` - -#### Example of using the tail/head window function - -When the amount of data is variable and you want to take the latest 96 rows of data for inference, you can use the corresponding window function tail. head function is used in a similar way, except that it takes the earliest 96 points. - -```Shell -IoTDB> select s1,s2 from root.** -+-----------------------------+-------------------+-------------------+ -| Time| root.eg.etth.s0| root.eg.etth.s1| -+-----------------------------+-------------------+-------------------+ -|1988-01-01T00:00:00.000+08:00| 0.7355| 1.211| -...... -|1990-01-01T00:00:00.000+08:00| 0.7855| 1.611| -|1990-01-02T00:00:00.000+08:00| 0.7818| 1.61| -|1990-01-03T00:00:00.000+08:00| 0.7867| 1.6293| -|1990-01-04T00:00:00.000+08:00| 0.786| 1.637| -|1990-01-05T00:00:00.000+08:00| 0.7849| 1.653| -|1990-01-06T00:00:00.000+08:00| 0.7866| 1.6537| -|1990-01-07T00:00:00.000+08:00| 0.7886| 1.662| -...... -|1990-03-31T00:00:00.000+08:00| 0.7585| 1.678| -|1990-04-01T00:00:00.000+08:00| 0.7587| 1.6763| -|1990-04-02T00:00:00.000+08:00| 0.76| 1.6813| -|1990-04-03T00:00:00.000+08:00| 0.7669| 1.684| -|1990-04-04T00:00:00.000+08:00| 0.7645| 1.677| -|1990-04-05T00:00:00.000+08:00| 0.7625| 1.68| -|1990-04-06T00:00:00.000+08:00| 0.7617| 1.6917| -+-----------------------------+-------------------+-------------------+ -Total line number = 996 - -IoTDB> call inference(dlinear_example,"select s0,s1 from root.**",window=tail(96)) -+--------------------------------------------+-----------------------------+ -| _result_0| _result_1| -+--------------------------------------------+-----------------------------+ -| 0.726302981376648| 1.6549958229064941| -| 0.7354921698570251| 1.6482787370681763| -| 0.7238251566886902| 1.6278168201446533| -...... 
-| 0.7692174911499023| 1.654654049873352| -| 0.7685555815696716| 1.6625318765640259| -| 0.7856493592262268| 1.6508299350738525| -+--------------------------------------------+-----------------------------+ -Total line number = 48 -``` - -#### Example of using the count window function - -This window is mainly used for computational tasks. When the task's corresponding model can only handle a fixed number of rows of data at a time, but the final desired outcome is multiple sets of prediction results, this window function can be used to perform continuous inference using a sliding window of points. Suppose we now have an anomaly detection model `anomaly_example(input: [24,2], output[1,1])`, which generates a 0/1 label for every 24 rows of data. An example of its use is as follows: - -```Shell -IoTDB> select s1,s2 from root.** -+-----------------------------+-------------------+-------------------+ -| Time| root.eg.etth.s0| root.eg.etth.s1| -+-----------------------------+-------------------+-------------------+ -|1990-01-01T00:00:00.000+08:00| 0.7855| 1.611| -|1990-01-02T00:00:00.000+08:00| 0.7818| 1.61| -|1990-01-03T00:00:00.000+08:00| 0.7867| 1.6293| -|1990-01-04T00:00:00.000+08:00| 0.786| 1.637| -|1990-01-05T00:00:00.000+08:00| 0.7849| 1.653| -|1990-01-06T00:00:00.000+08:00| 0.7866| 1.6537| -|1990-01-07T00:00:00.000+08:00| 0.7886| 1.662| -...... -|1990-03-31T00:00:00.000+08:00| 0.7585| 1.678| -|1990-04-01T00:00:00.000+08:00| 0.7587| 1.6763| -|1990-04-02T00:00:00.000+08:00| 0.76| 1.6813| -|1990-04-03T00:00:00.000+08:00| 0.7669| 1.684| -|1990-04-04T00:00:00.000+08:00| 0.7645| 1.677| -|1990-04-05T00:00:00.000+08:00| 0.7625| 1.68| -|1990-04-06T00:00:00.000+08:00| 0.7617| 1.6917| -+-----------------------------+-------------------+-------------------+ -Total line number = 96 - -IoTDB> call inference(anomaly_example,"select s0,s1 from root.**",window=count(24,24)) -+-------------------------+ -| _result_0| -+-------------------------+ -| 0| -| 1| -| 1| -| 0| -+-------------------------+ -Total line number = 4 -``` - -In the result set, each row's label corresponds to the output of the anomaly detection model after inputting each group of 24 rows of data. - -## Privilege Management - -When using AINode related functions, the authentication of IoTDB itself can be used to do a permission management, users can only use the model management related functions when they have the USE_MODEL permission. When using the inference function, the user needs to have the permission to access the source sequence corresponding to the SQL of the input model. - -| Privilege Name | Privilege Scope | Administrator User (default ROOT) | Normal User | Path Related | -| --------- | --------------------------------- | ---------------------- | -------- | -------- | -| USE_MODEL | create model/show models/drop model | √ | √ | x | -| READ_DATA| call inference | √ | √|√ | - -## Practical Examples - -### Power Load Prediction - -In some industrial scenarios, there is a need to predict power loads, which can be used to optimise power supply, conserve energy and resources, support planning and expansion, and enhance power system reliability. - -The data for the test set of ETTh1 that we use is [ETTh1](/img/ETTh1.csv). - - -It contains power data collected at 1h intervals, and each data consists of load and oil temperature as High UseFul Load, High UseLess Load, Middle UseLess Load, Low UseFul Load, Low UseLess Load, Oil Temperature. 
- -On this dataset, the model inference function of IoTDB-ML can predict the oil temperature in the future period of time through the relationship between the past values of high, middle and low use loads and the corresponding time stamp oil temperature, which empowers the automatic regulation and monitoring of grid transformers. - -#### Step 1: Data Import - -Users can import the ETT dataset into IoTDB using `import-csv.sh` in the tools folder - -``Bash -bash . /import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ... /... /ETTh1.csv -`` - -#### Step 2: Model Import - -We can enter the following SQL in iotdb-cli to pull a trained model from huggingface for registration for subsequent inference. - -```SQL -create model dlinear using uri 'https://huggingface.co/hvlgo/dlinear/tree/main' -``` - -This model is trained on the lighter weight deep model DLinear, which is able to capture as many trends within a sequence and relationships between variables as possible with relatively fast inference, making it more suitable for fast real-time prediction than other deeper models. - -#### Step 3: Model inference - -```Shell -IoTDB> select s0,s1,s2,s3,s4,s5,s6 from root.eg.etth LIMIT 96 -+-----------------------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+ -| Time|root.eg.etth.s0|root.eg.etth.s1|root.eg.etth.s2|root.eg.etth.s3|root.eg.etth.s4|root.eg.etth.s5|root.eg.etth.s6| -+-----------------------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+ -|2017-10-20T00:00:00.000+08:00| 10.449| 3.885| 8.706| 2.025| 2.041| 0.944| 8.864| -|2017-10-20T01:00:00.000+08:00| 11.119| 3.952| 8.813| 2.31| 2.071| 1.005| 8.442| -|2017-10-20T02:00:00.000+08:00| 9.511| 2.88| 7.533| 1.564| 1.949| 0.883| 8.16| -|2017-10-20T03:00:00.000+08:00| 9.645| 2.21| 7.249| 1.066| 1.828| 0.914| 7.949| -...... -|2017-10-23T20:00:00.000+08:00| 8.105| 0.938| 4.371| -0.569| 3.533| 1.279| 9.708| -|2017-10-23T21:00:00.000+08:00| 7.167| 1.206| 4.087| -0.462| 3.107| 1.432| 8.723| -|2017-10-23T22:00:00.000+08:00| 7.1| 1.34| 4.015| -0.32| 2.772| 1.31| 8.864| -|2017-10-23T23:00:00.000+08:00| 9.176| 2.746| 7.107| 1.635| 2.65| 1.097| 9.004| -+-----------------------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+ -Total line number = 96 - -IoTDB> call inference(dlinear_example, "select s0,s1,s2,s3,s4,s5,s6 from root.eg.etth", window=head(96)) -+-----------+----------+----------+------------+---------+----------+----------+ -| output0| output1| output2| output3| output4| output5| output6| -+-----------+----------+----------+------------+---------+----------+----------+ -| 10.319546| 3.1450553| 7.877341| 1.5723765|2.7303758| 1.1362307| 8.867775| -| 10.443649| 3.3286757| 7.8593454| 1.7675098| 2.560634| 1.1177158| 8.920919| -| 10.883752| 3.2341104| 8.47036| 1.6116762|2.4874182| 1.1760603| 8.798939| -...... -| 8.0115595| 1.2995274| 6.9900327|-0.098746896| 3.04923| 1.176214| 9.548782| -| 8.612427| 2.5036244| 5.6790237| 0.66474205|2.8870275| 1.2051733| 9.330128| -| 10.096699| 3.399722| 6.9909| 1.7478468|2.7642853| 1.1119363| 9.541455| -+-----------+----------+----------+------------+---------+----------+----------+ -Total line number = 48 -``` - -We compare the results of the prediction of the oil temperature with the real results, and we can get the following image. 
- -The data before 10/24 00:00 represents the past data input to the model, the blue line after 10/24 00:00 is the oil temperature forecast result given by the model, and the red line is the actual oil temperature data from the dataset (used for comparison). - -![](/img/AINode-analysis1.png) - -As can be seen, we have used the relationship between the six load information and the corresponding time oil temperatures for the past 96 hours (4 days) to model the possible changes in this data for the oil temperature for the next 48 hours (2 days) based on the inter-relationships between the sequences learned previously, and it can be seen that the predicted curves maintain a high degree of consistency in trend with the actual results after visualisation. - -### Power Prediction - -Power monitoring of current, voltage and power data is required in substations for detecting potential grid problems, identifying faults in the power system, effectively managing grid loads and analysing power system performance and trends. - -We have used the current, voltage and power data in a substation to form a dataset in a real scenario. The dataset consists of data such as A-phase voltage, B-phase voltage, and C-phase voltage collected every 5 - 6s for a time span of nearly four months in the substation. - -The test set data content is [data](/img/data.csv). - -On this dataset, the model inference function of IoTDB-ML can predict the C-phase voltage in the future period through the previous values and corresponding timestamps of A-phase voltage, B-phase voltage and C-phase voltage, empowering the monitoring management of the substation. - -#### Step 1: Data Import - -Users can import the dataset using `import-csv.sh` in the tools folder - -```Bash -bash ./import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ... /... /data.csv -``` - -#### Step 2: Model Import - -We can select built-in models or registered models in IoTDB CLI for subsequent inference. - -We use the built-in model STLForecaster for prediction. STLForecaster is a time series forecasting method based on the STL implementation in the statsmodels library. - -#### Step 3: Model Inference - -```Shell -IoTDB> select * from root.eg.voltage limit 96 -+-----------------------------+------------------+------------------+------------------+ -| Time|root.eg.voltage.s0|root.eg.voltage.s1|root.eg.voltage.s2| -+-----------------------------+------------------+------------------+------------------+ -|2023-02-14T20:38:32.000+08:00| 2038.0| 2028.0| 2041.0| -|2023-02-14T20:38:38.000+08:00| 2014.0| 2005.0| 2018.0| -|2023-02-14T20:38:44.000+08:00| 2014.0| 2005.0| 2018.0| -...... -|2023-02-14T20:47:52.000+08:00| 2024.0| 2016.0| 2027.0| -|2023-02-14T20:47:57.000+08:00| 2024.0| 2016.0| 2027.0| -|2023-02-14T20:48:03.000+08:00| 2024.0| 2016.0| 2027.0| -+-----------------------------+------------------+------------------+------------------+ -Total line number = 96 - -IoTDB> call inference(_STLForecaster, "select s0,s1,s2 from root.eg.voltage", window=head(96),predict_length=48) -+---------+---------+---------+ -| output0| output1| output2| -+---------+---------+---------+ -|2026.3601|2018.2953|2029.4257| -|2019.1538|2011.4361|2022.0888| -|2025.5074|2017.4522|2028.5199| -...... 
- -|2022.2336|2015.0290|2025.1023| -|2015.7241|2008.8975|2018.5085| -|2022.0777|2014.9136|2024.9396| -|2015.5682|2008.7821|2018.3458| -+---------+---------+---------+ -Total line number = 48 -``` - -Comparing the predicted results of the C-phase voltage with the real results, we can get the following image. - -The data before 02/14 20:48 represents the past data input to the model, the blue line after 02/14 20:48 is the predicted result of phase C voltage given by the model, while the red line is the actual phase C voltage data from the dataset (used for comparison). - -![](/img/AINode-analysis2.png) - -It can be seen that we used the voltage data from the past 10 minutes and, based on the previously learned inter-sequence relationships, modeled the possible changes in the phase C voltage data for the next 5 minutes. The visualized forecast curve shows a certain degree of synchronicity with the actual results in terms of trend. - -### Anomaly Detection - -In the civil aviation and transport industry, there exists a need for anomaly detection of the number of passengers travelling on an aircraft. The results of anomaly detection can be used to guide the adjustment of flight scheduling to make the organisation more efficient. - -Airline Passengers is a time-series dataset that records the number of international air passengers between 1949 and 1960, sampled at one-month intervals. The dataset contains a total of one time series. The dataset is [airline](/img/airline.csv). -On this dataset, the model inference function of IoTDB-ML can empower the transport industry by capturing the changing patterns of the sequence in order to detect anomalies at the sequence time points. - -#### Step 1: Data Import - -Users can import the dataset using `import-csv.sh` in the tools folder - -``Bash -bash . /import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ... /... /data.csv -`` - -#### Step 2: Model Inference - -IoTDB has some built-in machine learning algorithms that can be used directly, a sample prediction using one of the anomaly detection algorithms is shown below: - -```Shell -IoTDB> select * from root.eg.airline -+-----------------------------+------------------+ -| Time|root.eg.airline.s0| -+-----------------------------+------------------+ -|1949-01-31T00:00:00.000+08:00| 224.0| -|1949-02-28T00:00:00.000+08:00| 118.0| -|1949-03-31T00:00:00.000+08:00| 132.0| -|1949-04-30T00:00:00.000+08:00| 129.0| -...... -|1960-09-30T00:00:00.000+08:00| 508.0| -|1960-10-31T00:00:00.000+08:00| 461.0| -|1960-11-30T00:00:00.000+08:00| 390.0| -|1960-12-31T00:00:00.000+08:00| 432.0| -+-----------------------------+------------------+ -Total line number = 144 - -IoTDB> call inference(_Stray, "select s0 from root.eg.airline", k=2) -+-------+ -|output0| -+-------+ -| 0| -| 0| -| 0| -| 0| -...... -| 1| -| 1| -| 0| -| 0| -| 0| -| 0| -+-------+ -Total line number = 144 -``` - -We plot the results detected as anomalies to get the following image. Where the blue curve is the original time series and the time points specially marked with red dots are the time points that the algorithm detects as anomalies. - -![](/img/s6.png) - -It can be seen that the Stray model has modelled the input sequence changes and successfully detected the time points where anomalies occur. 
\ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Authority-Management.md b/src/UserGuide/V1.3.0-2/User-Manual/Authority-Management.md deleted file mode 100644 index 0724d13c9..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Authority-Management.md +++ /dev/null @@ -1,519 +0,0 @@ - - -# Database Administration - -IoTDB provides permission management operations, offering users the ability to manage permissions for data and cluster systems, ensuring data and system security. - -This article introduces the basic concepts of the permission module in IoTDB, including user definition, permission management, authentication logic, and use cases. In the JAVA programming environment, you can use the [JDBC API](https://chat.openai.com/API/Programming-JDBC.md) to execute permission management statements individually or in batches. - -## Basic Concepts - -### User - -A user is a legitimate user of the database. Each user corresponds to a unique username and has a password as a means of authentication. Before using the database, a person must provide a valid (i.e., stored in the database) username and password for a successful login. - -### Permission - -The database provides various operations, but not all users can perform all operations. If a user can perform a certain operation, they are said to have permission to execute that operation. Permissions are typically limited in scope by a path, and [path patterns](https://chat.openai.com/Basic-Concept/Data-Model-and-Terminology.md) can be used to manage permissions flexibly. - -### Role - -A role is a collection of multiple permissions and has a unique role name as an identifier. Roles often correspond to real-world identities (e.g., a traffic dispatcher), and a real-world identity may correspond to multiple users. Users with the same real-world identity often have the same permissions, and roles are abstractions for unified management of such permissions. - -### Default Users and Roles - -After installation and initialization, IoTDB includes a default user: root, with the default password root. This user is an administrator with fixed permissions, which cannot be granted or revoked and cannot be deleted. There is only one administrator user in the database. - -A newly created user or role does not have any permissions initially. - -## User Definition - -Users with MANAGE_USER and MANAGE_ROLE permissions or administrators can create users or roles. Creating a user must meet the following constraints. - -### Username Constraints - -4 to 32 characters, supports the use of uppercase and lowercase English letters, numbers, and special characters (`!@#$%^&*()_+-=`). - -Users cannot create users with the same name as the administrator. - -### Password Constraints - -4 to 32 characters, can use uppercase and lowercase letters, numbers, and special characters (`!@#$%^&*()_+-=`). Passwords are encrypted by default using MD5. - -### Role Name Constraints - -4 to 32 characters, supports the use of uppercase and lowercase English letters, numbers, and special characters (`!@#$%^&*()_+-=`). - -Users cannot create roles with the same name as the administrator. - - - -## Permission Management - -IoTDB primarily has two types of permissions: series permissions and global permissions. - -### Series Permissions - -Series permissions constrain the scope and manner in which users access data. IOTDB support authorization for both absolute paths and prefix-matching paths, and can be effective at the timeseries granularity. 
- -The table below describes the types and scope of these permissions: - - - -| Permission Name | Description | -|-----------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| READ_DATA | Allows reading time series data under the authorized path. | -| WRITE_DATA | Allows reading time series data under the authorized path.
Allows inserting and deleting time series data under the authorized path.
Allows importing and loading data under the authorized path. When importing data, you need the WRITE_DATA permission for the corresponding path. When automatically creating databases or time series, you need MANAGE_DATABASE and WRITE_SCHEMA permissions. | -| READ_SCHEMA | Allows obtaining detailed information about the metadata tree under the authorized path,
including databases, child paths, child nodes, devices, time series, templates, views, etc. | -| WRITE_SCHEMA | Allows obtaining detailed information about the metadata tree under the authorized path.
Allows creating, deleting, and modifying time series, templates, views, etc. under the authorized path. When creating or modifying views, it checks the WRITE_SCHEMA permission for the view path and READ_SCHEMA permission for the data source. When querying and inserting data into views, it checks the READ_DATA and WRITE_DATA permissions for the view path.
Allows setting, unsetting, and viewing TTL under the authorized path.
Allows attaching or detaching templates under the authorized path. | - - -### Global Permissions - -Global permissions constrain the database functions that users can use and restrict commands that change the system and task state. Once a user obtains global authorization, they can manage the database. -The table below describes the types of system permissions: - - -| Permission Name | Description | -|:---------------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| MANAGE_DATABASE | Allow users to create and delete databases. | -| MANAGE_USER | Allow users to create, delete, modify, and view users. | -| MANAGE_ROLE | Allow users to create, delete, modify, and view roles.
Allow users to grant/revoke roles to/from other users. | -| USE_TRIGGER | Allow users to create, delete, and view triggers.
Independent of data source permission checks for triggers. | -| USE_UDF | Allow users to create, delete, and view user-defined functions.
Independent of data source permission checks for user-defined functions. | -| USE_CQ | Allow users to create, delete, and view continuous queries.
Independent of data source permission checks for continuous queries. | -| USE_PIPE | Allow users to create, start, stop, delete, and view pipelines.
Allow users to create, delete, and view pipeline plugins.
Independent of data source permission checks for pipelines. | -| EXTEND_TEMPLATE | Permission to automatically create templates. | -| MAINTAIN | Allow users to query and cancel queries.
Allow users to view variables.
Allow users to view cluster status. | -| USE_MODEL | Allow users to create, delete and view deep learning model. | -Regarding template permissions: - -1. Only administrators are allowed to create, delete, modify, query, mount, and unmount templates. -2. To activate a template, you need to have WRITE_SCHEMA permission for the activation path. -3. If automatic creation is enabled, writing to a non-existent path that has a template mounted will automatically extend the template and insert data. Therefore, one needs EXTEND_TEMPLATE permission and WRITE_DATA permission for writing to the sequence. -4. To deactivate a template, WRITE_SCHEMA permission for the mounted template path is required. -5. To query paths that use a specific metadata template, you needs READ_SCHEMA permission for the paths; otherwise, it will return empty results. - - - -### Granting and Revoking Permissions - -In IoTDB, users can obtain permissions through three methods: - -1. Granted by administrator, who has control over the permissions of other users. -2. Granted by a user allowed to authorize permissions, and this user was assigned the grant option keyword when obtaining the permission. -3. Granted a certain role by administrator or a user with MANAGE_ROLE, thereby obtaining permissions. - -Revoking a user's permissions can be done through the following methods: - -1. Revoked by administrator. -2. Revoked by a user allowed to authorize permissions, and this user was assigned the grant option keyword when obtaining the permission. -3. Revoked from a user's role by administrator or a user with MANAGE_ROLE, thereby revoking the permissions. - -- When granting permissions, a path must be specified. Global permissions need to be specified as root.**, while series-specific permissions must be absolute paths or prefix paths ending with a double wildcard. -- When granting user/role permissions, you can specify the "with grant option" keyword for that permission, which means that the user can grant permissions on their authorized paths and can also revoke permissions on other users' authorized paths. For example, if User A is granted read permission for `group1.company1.**` with the grant option keyword, then A can grant read permissions to others on any node or series below `group1.company1`, and can also revoke read permissions on any node below `group1.company1` for other users. -- When revoking permissions, the revocation statement will match against all of the user's permission paths and clear the matched permission paths. For example, if User A has read permission for `group1.company1.factory1`, when revoking read permission for `group1.company1.**`, it will remove A's read permission for `group1.company1.factory1`. - - - -## Authentication - -User permissions mainly consist of three parts: permission scope (path), permission type, and the "with grant option" flag: - -``` -userTest1: - root.t1.** - read_schema, read_data - with grant option - root.** - write_schema, write_data - with grant option -``` - -Each user has such a permission access list, identifying all the permissions they have acquired. You can view their permissions by using the command `LIST PRIVILEGES OF USER `. - -When authorizing a path, the database will match the path with the permissions. For example, when checking the read_schema permission for `root.t1.t2`, it will first match with the permission access list `root.t1.**`. If it matches successfully, it will then check if that path contains the permission to be authorized. 
If not, it continues to the next path-permission match until a match is found or all matches are exhausted. - -When performing authorization for multiple paths, such as executing a multi-path query task, the database will only present data for which the user has permissions. Data for which the user does not have permissions will not be included in the results, and information about these paths without permissions will be output to the alert messages. - -Please note that the following operations require checking multiple permissions: - -1. Enabling the automatic sequence creation feature requires not only write permission for the corresponding sequence when a user inserts data into a non-existent sequence but also metadata modification permission for the sequence. - -2. When executing the "select into" statement, it is necessary to check the read permission for the source sequence and the write permission for the target sequence. It should be noted that the source sequence data may only be partially accessible due to insufficient permissions, and if the target sequence has insufficient write permissions, an error will occur, terminating the task. - -3. View permissions and data source permissions are independent. Performing read and write operations on a view will only check the permissions of the view itself and will not perform permission validation on the source path. - - -## Function Syntax and Examples - -IoTDB provides composite permissions for user authorization: - -| Permission Name | Permission Scope | -|-----------------|--------------------------| -| ALL | All permissions | -| READ | READ_SCHEMA, READ_DATA | -| WRITE | WRITE_SCHEMA, WRITE_DATA | - -Composite permissions are not specific permissions themselves but a shorthand way to denote a combination of permissions, with no difference from directly specifying the corresponding permission names. - -The following series of specific use cases will demonstrate the usage of permission statements. Non-administrator users executing the following statements require obtaining the necessary permissions, which are indicated after the operation description. - -### User and Role Related - -- Create user (Requires MANAGE_USER permission) - -```SQL -CREATE USER -eg: CREATE USER user1 'passwd' -``` - -- Delete user (Requires MANAGE_USER permission) - -```sql -DROP USER -eg: DROP USER user1 -``` - -- Create role (Requires MANAGE_ROLE permission) - -```sql -CREATE ROLE -eg: CREATE ROLE role1 -``` - -- Delete role (Requires MANAGE_ROLE permission) - -```sql -DROP ROLE -eg: DROP ROLE role1 -``` - -- Grant role to user (Requires MANAGE_ROLE permission) - -```sql -GRANT ROLE TO -eg: GRANT ROLE admin TO user1 -``` - -- Revoke role from user(Requires MANAGE_ROLE permission) - -```sql -REVOKE ROLE FROM -eg: REVOKE ROLE admin FROM user1 -``` - -- List all user (Requires MANAGE_USER permission) - -```sql -LIST USER -``` - -- List all role (Requires MANAGE_ROLE permission) - -```sql -LIST ROLE -``` - -- List all users granted specific role.(Requires MANAGE_USER permission) - -```sql -LIST USER OF ROLE -eg: LIST USER OF ROLE roleuser -``` - -- List all role granted to specific user. - - Users can list their own roles, but listing roles of other users requires the MANAGE_ROLE permission. - -```sql -LIST ROLE OF USER -eg: LIST ROLE OF USER tempuser -``` - -- List all privileges of user - -Users can list their own privileges, but listing privileges of other users requires the MANAGE_USER permission. 
- -```sql -LIST PRIVILEGES OF USER ; -eg: LIST PRIVILEGES OF USER tempuser; -``` - -- List all privileges of role - -Users can list the permission information of roles they have, but listing permissions of other roles requires the MANAGE_ROLE permission. - -```sql -LIST PRIVILEGES OF ROLE ; -eg: LIST PRIVILEGES OF ROLE actor; -``` - -- Update password - -Users can update their own password, but updating passwords of other users requires the MANAGE_USER permission. - -```sql -ALTER USER SET PASSWORD ; -eg: ALTER USER tempuser SET PASSWORD 'newpwd'; -``` - -### Authorization and Deauthorization - -Users can use authorization statements to grant permissions to other users. The syntax is as follows: - -```sql -GRANT ON TO ROLE/USER [WITH GRANT OPTION]; -eg: GRANT READ ON root.** TO ROLE role1; -eg: GRANT READ_DATA, WRITE_DATA ON root.t1.** TO USER user1; -eg: GRANT READ_DATA, WRITE_DATA ON root.t1.**,root.t2.** TO USER user1; -eg: GRANT MANAGE_ROLE ON root.** TO USER user1 WITH GRANT OPTION; -eg: GRANT ALL ON root.** TO USER user1 WITH GRANT OPTION; -``` - -Users can use deauthorization statements to revoke permissions from others. The syntax is as follows: - -```sql -REVOKE ON FROM ROLE/USER ; -eg: REVOKE READ ON root.** FROM ROLE role1; -eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.** FROM USER user1; -eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.**, root.t2.** FROM USER user1; -eg: REVOKE MANAGE_ROLE ON root.** FROM USER user1; -eg: REVOKE ALL ON ROOT.** FROM USER user1; -``` - -- **When non-administrator users execute authorization/deauthorization statements, they need to have \ permissions on \, and these permissions must be marked with WITH GRANT OPTION.** - -- When granting or revoking global permissions or when the statement contains global permissions (expanding ALL includes global permissions), you must specify the path as root**. For example, the following authorization/deauthorization statements are valid: - - ```sql - GRANT MANAGE_USER ON root.** TO USER user1; - GRANT MANAGE_ROLE ON root.** TO ROLE role1 WITH GRANT OPTION; - GRANT ALL ON root.** TO role role1 WITH GRANT OPTION; - REVOKE MANAGE_USER ON root.** FROM USER user1; - REVOKE MANAGE_ROLE ON root.** FROM ROLE role1; - REVOKE ALL ON root.** FROM ROLE role1; - ``` - - The following statements are invalid: - - ```sql - GRANT READ, MANAGE_ROLE ON root.t1.** TO USER user1; - GRANT ALL ON root.t1.t2 TO USER user1 WITH GRANT OPTION; - REVOKE ALL ON root.t1.t2 FROM USER user1; - REVOKE READ, MANAGE_ROLE ON root.t1.t2 FROM ROLE ROLE1; - ``` - -- \ must be a full path or a matching path ending with a double wildcard. The following paths are valid: - - ```sql - root.** - root.t1.t2.** - root.t1.t2.t3 - ``` - - The following paths are invalid: - - ```sql - root.t1.* - root.t1.**.t2 - root.t1*.t2.t3 - ``` - - - -## Examples - - Based on the described [sample data](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt), IoTDB's sample data may belong to different power generation groups such as ln, sgcc, and so on. Different power generation groups do not want other groups to access their database data, so we need to implement data isolation at the group level. - -#### Create Users -Use `CREATE USER ` to create users. For example, we can create two users for the ln and sgcc groups with the root user, who has all permissions, and name them ln_write_user and sgcc_write_user. It is recommended to enclose the username in backticks. 
The SQL statements are as follows: -```SQL -CREATE USER `ln_write_user` 'write_pwd' -CREATE USER `sgcc_write_user` 'write_pwd' -``` - -Now, using the SQL statement to display users: - -```sql -LIST USER -``` - -We can see that these two users have been created, and the result is as follows: - -```sql -IoTDB> CREATE USER `ln_write_user` 'write_pwd' -Msg: The statement is executed successfully. -IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' -Msg: The statement is executed successfully. -IoTDB> LIST USER; -+---------------+ -| user| -+---------------+ -| ln_write_user| -| root| -|sgcc_write_user| -+---------------+ -Total line number = 3 -It costs 0.012s -``` - -#### Granting Permissions to Users - -At this point, although two users have been created, they do not have any permissions, so they cannot operate on the database. For example, if we use the ln_write_user to write data to the database, the SQL statement is as follows: - -```sql -INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -``` - -At this point, the system does not allow this operation, and an error is displayed: - -```sql -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -Msg: 803: No permissions for this operation, please add privilege WRITE_DATA on [root.ln.wf01.wt01.status] -``` - -Now, we will grant each user write permissions to the corresponding paths using the root user. - -We use the `GRANT ON TO USER ` statement to grant permissions to users, for example: - -```sql -GRANT WRITE_DATA ON root.ln.** TO USER `ln_write_user` -GRANT WRITE_DATA ON root.sgcc1.**, root.sgcc2.** TO USER `sgcc_write_user` -``` - -The execution status is as follows: - -```sql -IoTDB> GRANT WRITE_DATA ON root.ln.** TO USER `ln_write_user` -Msg: The statement is executed successfully. -IoTDB> GRANT WRITE_DATA ON root.sgcc1.**, root.sgcc2.** TO USER `sgcc_write_user` -Msg: The statement is executed successfully. -``` - -Then, using ln_write_user, try to write data again: - -```sql -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: The statement is executed successfully. -``` - -#### Revoking User Permissions - -After granting user permissions, we can use the `REVOKE ON FROM USER ` to revoke the permissions granted to users. For example, using the root user to revoke the permissions of ln_write_user and sgcc_write_user: - -```sql -REVOKE WRITE_DATA ON root.ln.** FROM USER `ln_write_user` -REVOKE WRITE_DATA ON root.sgcc1.**, root.sgcc2.** FROM USER `sgcc_write_user` -``` - - -The execution status is as follows: - -```sql -IoTDB> REVOKE WRITE_DATA ON root.ln.** FROM USER `ln_write_user` -Msg: The statement is executed successfully. -IoTDB> REVOKE WRITE_DATA ON root.sgcc1.**, root.sgcc2.** FROM USER `sgcc_write_user` -Msg: The statement is executed successfully. -``` - -After revoking the permissions, ln_write_user no longer has the permission to write data to root.ln.**: - -```sql -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: 803: No permissions for this operation, please add privilege WRITE_DATA on [root.ln.wf01.wt01.status] -``` - -## Other Explanations - -Roles are collections of permissions, and both permissions and roles are attributes of users. In other words, a role can have multiple permissions, and a user can have multiple roles and permissions (referred to as the user's self-permissions). - -Currently, in IoTDB, there are no conflicting permissions. 
Therefore, the actual permissions a user has are the union of their self-permissions and the permissions of all their roles. In other words, to determine if a user can perform a certain operation, it's necessary to check whether their self-permissions or the permissions of all their roles allow that operation. Self-permissions, role permissions, and the permissions of multiple roles a user has may contain the same permission, but this does not have any impact. - -It's important to note that if a user has a certain permission (corresponding to operation A) on their own, and one of their roles has the same permission, revoking the permission from the user alone will not prevent the user from performing operation A. To prevent the user from performing operation A, you need to revoke the permission from both the user and the role, or remove the user from the role that has the permission. Similarly, if you only revoke the permission from the role, it won't prevent the user from performing operation A if they have the same permission on their own. - -At the same time, changes to roles will be immediately reflected in all users who have that role. For example, adding a certain permission to a role will immediately grant that permission to all users who have that role, and removing a certain permission will cause those users to lose that permission (unless the user has it on their own). - - - -## Upgrading from a previous version - -Before version 1.3, there were many different permission types. In 1.3 version's implementation, we have streamlined the permission types. - -The permission paths in version 1.3 of the database must be either full paths or matching paths ending with a double wildcard. During system upgrades, any invalid permission paths and permission types will be automatically converted. The first invalid node on the path will be replaced with "**", and any unsupported permission types will be mapped to the permissions supported by the current system. 
- -| Permission | Path | Mapped-Permission | Mapped-path | -|-------------------|-----------------|-------------------|---------------| -| CREATE_DATBASE | root.db.t1.* | MANAGE_DATABASE | root.** | -| INSERT_TIMESERIES | root.db.t2.*.t3 | WRITE_DATA | root.db.t2.** | -| CREATE_TIMESERIES | root.db.t2*c.t3 | WRITE_SCHEMA | root.db.** | -| LIST_ROLE | root.** | (ignore) | | - - - -You can refer to the table below for a comparison of permission types between the old and new versions (where "--IGNORE" indicates that the new version ignores that permission): - -| Permission Name | Path-Related | New Permission Name | Path-Related | -|---------------------------|--------------|---------------------|--------------| -| CREATE_DATABASE | YES | MANAGE_DATABASE | NO | -| INSERT_TIMESERIES | YES | WRITE_DATA | YES | -| UPDATE_TIMESERIES | YES | WRITE_DATA | YES | -| READ_TIMESERIES | YES | READ_DATA | YES | -| CREATE_TIMESERIES | YES | WRITE_SCHEMA | YES | -| DELETE_TIMESERIES | YES | WRITE_SCHEMA | YES | -| CREATE_USER | NO | MANAGE_USER | NO | -| DELETE_USER | NO | MANAGE_USER | NO | -| MODIFY_PASSWORD | NO | -- IGNORE | | -| LIST_USER | NO | -- IGNORE | | -| GRANT_USER_PRIVILEGE | NO | -- IGNORE | | -| REVOKE_USER_PRIVILEGE | NO | -- IGNORE | | -| GRANT_USER_ROLE | NO | MANAGE_ROLE | NO | -| REVOKE_USER_ROLE | NO | MANAGE_ROLE | NO | -| CREATE_ROLE | NO | MANAGE_ROLE | NO | -| DELETE_ROLE | NO | MANAGE_ROLE | NO | -| LIST_ROLE | NO | -- IGNORE | | -| GRANT_ROLE_PRIVILEGE | NO | -- IGNORE | | -| REVOKE_ROLE_PRIVILEGE | NO | -- IGNORE | | -| CREATE_FUNCTION | NO | USE_UDF | NO | -| DROP_FUNCTION | NO | USE_UDF | NO | -| CREATE_TRIGGER | YES | USE_TRIGGER | NO | -| DROP_TRIGGER | YES | USE_TRIGGER | NO | -| START_TRIGGER | YES | USE_TRIGGER | NO | -| STOP_TRIGGER | YES | USE_TRIGGER | NO | -| CREATE_CONTINUOUS_QUERY | NO | USE_CQ | NO | -| DROP_CONTINUOUS_QUERY | NO | USE_CQ | NO | -| ALL | NO | All privilegs | | -| DELETE_DATABASE | YES | MANAGE_DATABASE | NO | -| ALTER_TIMESERIES | YES | WRITE_SCHEMA | YES | -| UPDATE_TEMPLATE | NO | -- IGNORE | | -| READ_TEMPLATE | NO | -- IGNORE | | -| APPLY_TEMPLATE | YES | WRITE_SCHEMA | YES | -| READ_TEMPLATE_APPLICATION | NO | -- IGNORE | | -| SHOW_CONTINUOUS_QUERIES | NO | -- IGNORE | | -| CREATE_PIPEPLUGIN | NO | USE_PIPE | NO | -| DROP_PIPEPLUGINS | NO | USE_PIPE | NO | -| SHOW_PIPEPLUGINS | NO | -- IGNORE | | -| CREATE_PIPE | NO | USE_PIPE | NO | -| START_PIPE | NO | USE_PIPE | NO | -| STOP_PIPE | NO | USE_PIPE | NO | -| DROP_PIPE | NO | USE_PIPE | NO | -| SHOW_PIPES | NO | -- IGNORE | | -| CREATE_VIEW | YES | WRITE_SCHEMA | YES | -| ALTER_VIEW | YES | WRITE_SCHEMA | YES | -| RENAME_VIEW | YES | WRITE_SCHEMA | YES | -| DELETE_VIEW | YES | WRITE_SCHEMA | YES | diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Data-Sync_apache.md b/src/UserGuide/V1.3.0-2/User-Manual/Data-Sync_apache.md deleted file mode 100644 index 691dfd4e3..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Data-Sync_apache.md +++ /dev/null @@ -1,542 +0,0 @@ - - -# Data Synchronisation - -Data synchronization is a typical requirement in industrial Internet of Things (IoT). Through data synchronization mechanisms, it is possible to achieve data sharing between IoTDB, and to establish a complete data link to meet the needs for internal and external network data interconnectivity, edge-cloud synchronization, data migration, and data backup. 
- -## Function Overview - -### Data Synchronization - -A data synchronization task consists of three stages: - -![](/img/sync_en_01.png) - -- Source Stage:This part is used to extract data from the source IoTDB, defined in the source section of the SQL statement. -- Process Stage:This part is used to process the data extracted from the source IoTDB, defined in the processor section of the SQL statement. -- Sink Stage:This part is used to send data to the target IoTDB, defined in the sink section of the SQL statement. - -By declaratively configuring the specific content of the three parts through SQL statements, flexible data synchronization capabilities can be achieved. Currently, data synchronization supports the synchronization of the following information, and you can select the synchronization scope when creating a synchronization task (the default is data.insert, which means synchronizing newly written data): - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Synchronization Scope | | Synchronization Content Description |
| --------------------- | ---------- | ------------------------------------------------------------------ |
| all | | All scopes |
| data (Data) | insert | Synchronize newly written data |
| | delete | Synchronize deleted data |
| schema | database | Synchronize database creation, modification or deletion operations |
| | timeseries | Synchronize the definition and attributes of time series |
| | TTL | Synchronize the data retention time |
| auth | - | Synchronize user permissions and access control |
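For illustration, the sketch below shows how a scope other than the default `data.insert` might be selected when creating a task. It relies on the `inclusion` source parameter described in the parameter reference later on this page; the pipe name and target address are placeholders.

```SQL
-- Minimal sketch: synchronize full data, schema and auth changes instead of
-- only newly written data. Pipe name and node-urls value are placeholders.
create pipe scope_demo
with source (
  'inclusion' = 'all'            -- see the synchronization scope table above
)
with sink (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668' -- target DataNode data service port
)
```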
- -### Functional limitations and instructions - -The schema and auth synchronization functions have the following limitations: - -- When using schema synchronization, it is required that the consensus protocol of `Schema region` and `ConfigNode` must be the default ratis protocol, that is: In the `iotdb-common.properties` configuration file, both the `config_node_consensus_protocol_class` and `schema_region_consensus_protocol_class` configuration items are set to `org.apache.iotdb.consensus.ratis.RatisConsensus`. - -- To prevent potential conflicts, please turn off the automatic creation of metadata on the receiving end when enabling schema synchronization. You can do this by setting the `enable_auto_create_schema` configuration in the `iotdb-common.properties` configuration file to false. - -- When schema synchronization is enabled, the use of custom plugins is not supported. - -- During data synchronization tasks, please avoid performing any deletion operations to prevent inconsistent states between the two ends. - -## Usage Instructions - -Data synchronization tasks have three states: RUNNING, STOPPED, and DROPPED. The task state transitions are shown in the following diagram: - -V1.3.0 and earlier versions: - -After creation, it will not start immediately and needs to execute the `START PIPE` statement to start the task. - -![](/img/sync_en_02.png) - -V1.3.1 and later versions: - -After creation, the task will start directly, and when the task stops abnormally, the system will automatically attempt to restart the task. - -![](/img/Data-Sync02.png) - -Provide the following SQL statements for state management of synchronization tasks. - -### Create Task - -Use the `CREATE PIPE` statement to create a data synchronization task. The `PipeId` and `sink` attributes are required, while `source` and `processor` are optional. When entering the SQL, note that the order of the `SOURCE` and `SINK` plugins cannot be swapped. - -The SQL example is as follows: - -```SQL -CREATE PIPE -- PipeId is the name that uniquely identifies the task. --- Data extraction plugin, optional plugin -WITH SOURCE ( - [ = ,], -) --- Data processing plugin, optional plugin -WITH PROCESSOR ( - [ = ,], -) --- Data connection plugin, required plugin -WITH SINK ( - [ = ,], -) -``` - -### Start Task - -Start processing data: - - -```SQL -START PIPE -``` - -### Stop Task - -Stop processing data: - -```SQL -STOP PIPE -``` - -### Delete Task - -Deletes the specified task: - -```SQL -DROP PIPE -``` - -Deleting a task does not require stopping the synchronization task first. 
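As a quick recap of the statements above, the following sketch walks a single task through its lifecycle; the pipe name and target address are placeholders.

```SQL
-- Create a task (only the sink is required); on V1.3.1+ it starts running
-- immediately, on V1.3.0 it stays stopped until START PIPE is executed.
create pipe demo_pipe
with sink (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668'
)

STOP PIPE demo_pipe    -- pause data processing
START PIPE demo_pipe   -- resume data processing
DROP PIPE demo_pipe    -- remove the task (stopping it first is not required)
```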
### View Task

View all tasks:

```SQL
SHOW PIPES
```

To view a specified task:

```SQL
SHOW PIPE
```

Example of the SHOW PIPES result for a pipe:

```SQL
+--------------------------------+-----------------------+-------+---------------+--------------------+------------------------------------------------------------+----------------+
|                              ID|           CreationTime|  State|     PipeSource|       PipeProcessor|                                                     PipeSink|ExceptionMessage|
+--------------------------------+-----------------------+-------+---------------+--------------------+------------------------------------------------------------+----------------+
|3421aacb16ae46249bac96ce4048a220|2024-08-13T09:55:18.717|RUNNING|             {}|                  {}|{{sink=iotdb-thrift-sink, sink.ip=127.0.0.1, sink.port=6668}}|                |
+--------------------------------+-----------------------+-------+---------------+--------------------+------------------------------------------------------------+----------------+
```

The meanings of each column are as follows:

- **ID**: The unique identifier for the synchronization task
- **CreationTime**: The time when the synchronization task was created
- **State**: The state of the synchronization task
- **PipeSource**: The source of the synchronized data stream
- **PipeProcessor**: The processing logic of the synchronized data stream during transmission
- **PipeSink**: The destination of the synchronized data stream
- **ExceptionMessage**: Displays the exception information of the synchronization task

### Synchronization Plugins

To make the overall architecture more flexible to match different synchronization scenario requirements, we support plugin assembly within the synchronization task framework. The system comes with some pre-installed common plugins that you can use directly. At the same time, you can also customize processor plugins and sink plugins, and load them into the IoTDB system for use. 
You can view the plugins in the system (including custom and built-in plugins) with the following statement: - -```SQL -SHOW PIPEPLUGINS -``` - -The return result is as follows (version 1.3.2): - -```SQL -IoTDB> SHOW PIPEPLUGINS -+---------------------+----------+-------------------------------------------------------------------------------------------+----------------------------------------------------+ -| PluginName|PluginType| ClassName| PluginJar| -+---------------------+----------+-------------------------------------------------------------------------------------------+----------------------------------------------------+ -| DO-NOTHING-PROCESSOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.processor.donothing.DoNothingProcessor| | -| DO-NOTHING-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.donothing.DoNothingConnector| | -| IOTDB-SOURCE| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.extractor.iotdb.IoTDBExtractor| | -| IOTDB-THRIFT-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftConnector| | -|IOTDB-THRIFT-SSL-SINK| Builtin|org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftSslConnector| | -+---------------------+----------+-------------------------------------------------------------------------------------------+----------------------------------------------------+ -``` - - -Detailed introduction of pre-installed plugins is as follows (for detailed parameters of each plugin, please refer to the [Parameter Description](#reference-parameter-description) section): - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Type | Custom Plugin | Plugin Name | Description | Applicable Version |
| ---------------- | ------------- | --------------------- | ------------------------------------------------------------ | ------------------ |
| source plugin | Not Supported | iotdb-source | The default extractor plugin, used to extract historical or real-time data from IoTDB | 1.2.x |
| processor plugin | Supported | do-nothing-processor | The default processor plugin, which does not process the incoming data | 1.2.x |
| sink plugin | Supported | do-nothing-sink | Does not process the data that is sent out | 1.2.x |
| | | iotdb-thrift-sink | The default sink plugin (V1.3.1+), used for data transfer between IoTDB (V1.2.0+) and IoTDB (V1.2.0+). It uses the Thrift RPC framework to transfer data, with a multi-threaded async non-blocking IO model and high transfer performance, especially suitable for scenarios where the target end is distributed | 1.2.x |
| | | iotdb-thrift-ssl-sink | Used for data transfer between IoTDB (V1.3.1+) and IoTDB (V1.2.0+). It uses the Thrift RPC framework to transfer data, with a single-threaded sync blocking IO model, suitable for scenarios with higher security requirements | 1.3.1+ |
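To show how the plugin names in the table above map onto the `WITH SOURCE` / `WITH PROCESSOR` / `WITH SINK` clauses, here is a hedged sketch that names each built-in plugin explicitly. The `'processor'` key is assumed by analogy with the `'source'` and `'sink'` keys, and the pipe name and address are placeholders.

```SQL
-- Sketch: wiring the built-in plugins explicitly. Omitting the source and
-- processor clauses falls back to these defaults anyway.
create pipe builtin_demo
with source (
  'source' = 'iotdb-source'
)
with processor (
  'processor' = 'do-nothing-processor'   -- assumed key name
)
with sink (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668'         -- placeholder target DataNode address
)
```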
- -For importing custom plugins, please refer to the [Stream Processing](./Streaming_apache.md#custom-stream-processing-plugin-management) section. - -## Use examples - -### Full data synchronisation - -This example is used to demonstrate the synchronisation of all data from one IoTDB to another IoTDB with the data link as shown below: - -![](/img/pipe1.jpg) - -In this example, we can create a synchronization task named A2B to synchronize the full data from A IoTDB to B IoTDB. The iotdb-thrift-sink plugin (built-in plugin) for the sink is required. The URL of the data service port of the DataNode node on the target IoTDB needs to be configured through node-urls, as shown in the following example statement: - -```SQL -create pipe A2B -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -``` - -### Partial data synchronization - -This example is used to demonstrate the synchronisation of data from a certain historical time range (8:00pm 23 August 2023 to 8:00pm 23 October 2023) to another IoTDB, the data link is shown below: - -![](/img/pipe2.jpg) - -In this example, we can create a synchronization task named A2B. First, we need to define the range of data to be transferred in the source. Since the data being transferred is historical data (historical data refers to data that existed before the creation of the synchronization task), we need to configure the start-time and end-time of the data and the transfer mode mode. The URL of the data service port of the DataNode node on the target IoTDB needs to be configured through node-urls. - -The detailed statements are as follows: - -```SQL -create pipe A2B -WITH SOURCE ( - 'source'= 'iotdb-source', - 'realtime.mode' = 'stream' -- The extraction mode for newly inserted data (after pipe creation) - 'start-time' = '2023.08.23T08:00:00+00:00', -- The start event time for synchronizing all data, including start-time - 'end-time' = '2023.10.23T08:00:00+00:00' -- The end event time for synchronizing all data, including end-time -) -with SINK ( - 'sink'='iotdb-thrift-async-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -``` - -### Edge-cloud data transfer - -This example is used to demonstrate the scenario where data from multiple IoTDB is transferred to the cloud, with data from clusters B, C, and D all synchronized to cluster A, as shown in the figure below: - -![](/img/sync_en_03.png) - -In this example, to synchronize the data from clusters B, C, and D to A, the pipe between BA, CA, and DA needs to configure the `path` to limit the range, and to keep the edge and cloud data consistent, the pipe needs to be configured with `inclusion=all` to synchronize full data and metadata. 
The detailed statement is as follows: - -On B IoTDB, execute the following statement to synchronize data from B to A: - -```SQL -create pipe BA -with source ( - 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth - 'path'='root.db.**', -- Limit the range -) -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -) -``` - -On C IoTDB, execute the following statement to synchronize data from C to A: - -```SQL -create pipe CA -with source ( - 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth - 'path'='root.db.**', -- Limit the range -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -) -``` - -On D IoTDB, execute the following statement to synchronize data from D to A: - -```SQL -create pipe DA -with source ( - 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth - 'path'='root.db.**', -- Limit the range -) -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -) -``` - -### Cascading data transfer - -This example is used to demonstrate the scenario where data is transferred in a cascading manner between multiple IoTDB, with data from cluster A synchronized to cluster B, and then to cluster C, as shown in the figure below: - -![](/img/sync_en_04.png) - -In this example, to synchronize the data from cluster A to C, the `forwarding-pipe-requests` needs to be set to `true` between BC. The detailed statement is as follows: - -On A IoTDB, execute the following statement to synchronize data from A to B: - -```SQL -create pipe AB -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -) -``` - -On B IoTDB, execute the following statement to synchronize data from B to C: - -```SQL -create pipe BC -with source ( - 'forwarding-pipe-requests' = 'true' -- Whether to forward data written by other Pipes -) -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6669', -- The URL of the data service port of the DataNode node on the target IoTDB -) -) -``` - - -### Encrypted Synchronization (V1.3.1+) - -IoTDB supports the use of SSL encryption during the synchronization process, ensuring the secure transfer of data between different IoTDB instances. By configuring SSL-related parameters, such as the certificate address and password (`ssl.trust-store-path`)、(`ssl.trust-store-pwd`), data can be protected by SSL encryption during the synchronization process. - -For example, to create a synchronization task named A2B: - -```SQL -create pipe A2B -with sink ( - 'sink'='iotdb-thrift-ssl-sink', - 'node-urls'='127.0.0.1:6667', -- The URL of the data service port of the DataNode node on the target IoTDB - 'ssl.trust-store-path'='pki/trusted', -- The trust store certificate path required to connect to the target DataNode - 'ssl.trust-store-pwd'='root' -- The trust store certificate password required to connect to the target DataNode -) -``` - -## Reference: Notes - -You can adjust the parameters for data synchronization by modifying the IoTDB configuration file (`iotdb-common.properties`), such as the directory for storing synchronized data. 
The complete configuration is as follows: - -V1.3.0/1/2: - -```Properties -#################### -### Pipe Configuration -#################### - -# Uncomment the following field to configure the pipe lib directory. -# For Windows platform -# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is -# absolute. Otherwise, it is relative. -# pipe_lib_dir=ext\\pipe -# For Linux platform -# If its prefix is "/", then the path is absolute. Otherwise, it is relative. -# pipe_lib_dir=ext/pipe - -# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. -# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). -# pipe_subtask_executor_max_thread_num=5 - -# The connection timeout (in milliseconds) for the thrift client. -# pipe_sink_timeout_ms=900000 - -# The maximum number of selectors that can be used in the sink. -# Recommend to set this value to less than or equal to pipe_sink_max_client_number. -# pipe_sink_selector_number=4 - -# The maximum number of clients that can be used in the sink. -# pipe_sink_max_client_number=16 - -``` - -## Reference: parameter description - -### source parameter(V1.3.0) - -| key | value | value range | required or not | default value | -| :------------------------------ | :----------------------------------------------------------- | :------------------------------------- | :------- | :------------- | -| source | iotdb-source | String: iotdb-source | required | - | -| source.pattern | Used to filter the path prefix of time series | String: any time series prefix | optional | root | -| source.history.enable | Whether to send historical data | Boolean: true / false | optional | true | -| source.history.start-time | The start event time for synchronizing historical data, including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional | Long.MIN_VALUE | -| source.history.end-time | The end event time for synchronizing historical data, including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional | Long.MAX_VALUE | -| source.realtime.mode | Extraction mode for newly inserted data (after pipe creation) | String: batch | Optional | batch | -| source.forwarding-pipe-requests | Whether to forward data written by other pipes (usually data synchronization) | Boolean: true | Optional | true | -| source.history.loose-range | When transferring tsfile, whether to relax the historical data (before pipe creation) range. "": Do not relax the range, select data strictly according to the set conditions "time": Relax the time range to avoid splitting TsFile, which can improve synchronization efficiency | String: "" / "time" | optional | Empty String | - -> 💎 **Explanation: Difference between Historical Data and Real-time Data** -> - **Historical Data**: All data with arrival time < the current system time when the pipe is created is called historical data. -> - **Real-time Data**:All data with arrival time >= the current system time when the pipe is created is called real-time data. -> - **Full Data**: Full data = Historical data + Real-time data -> -> 💎 **Explanation: Differences between Stream and Batch Data Extraction Modes** -> - **stream (recommended)**: In this mode, tasks process and send data in real-time. It is characterized by high timeliness and low throughput. -> - **batch**: In this mode, tasks process and send data in batches (according to the underlying data files). It is characterized by low timeliness and high throughput. 
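As a hedged illustration of the prefixed V1.3.0 parameter style listed above (later versions drop the prefixes), the sketch below combines a few of the documented keys; the pipe name, path prefix, and address are placeholders.

```SQL
-- Sketch: V1.3.0-style keys, each carrying its stage prefix.
create pipe v130_demo
with source (
  'source' = 'iotdb-source',
  'source.pattern' = 'root.db',          -- only series under this prefix
  'source.history.enable' = 'true'       -- also send historical data
)
with sink (
  'sink' = 'iotdb-thrift-sink',
  'sink.node-urls' = '127.0.0.1:6668',
  'sink.batch.enable' = 'true'           -- batched log transmission
)
```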
- -### source Parameter(V1.3.1) - -> In versions 1.3.1 and above, the parameters no longer require additional source, processor, and sink prefixes. - -| key | value | value range | required or not | default value | -| :----------------------- | :----------------------------------------------------------- | :------------------------------------- | :------- | :------------- | -| source | iotdb-source | String: iotdb-source | Required | - | -| pattern | Used to filter the path prefix of time series | String: any time series prefix | Optional | root | -| start-time | The start event time for synchronizing all data, including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MIN_VALUE | -| end-time | The end event time for synchronizing all data, including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MAX_VALUE | -| realtime.mode | Extraction mode for newly inserted data (after pipe creation) | String: batch | Optional | batch | -| forwarding-pipe-requests | Whether to forward data written by other pipes (usually data synchronization) | Boolean: true | Optional | true | -| history.loose-range | When transferring tsfile, whether to relax the historical data (before pipe creation) range. "": Do not relax the range, select data strictly according to the set conditions "time": Relax the time range to avoid splitting TsFile, which can improve synchronization efficiency | String: "" / "time" | Optional | Empty String | - -> 💎 **Explanation**:To maintain compatibility with lower versions, history.enable, history.start-time, history.end-time, realtime.enable can still be used, but they are not recommended in the new version. -> -> 💎 **Explanation: Differences between Stream and Batch Data Extraction Modes** -> - **stream (recommended)**: In this mode, tasks process and send data in real-time. It is characterized by high timeliness and low throughput. -> - **batch**: In this mode, tasks process and send data in batches (according to the underlying data files). It is characterized by low timeliness and high throughput. - -### source Parameter(V1.3.2) - -> In versions 1.3.1 and above, the parameters no longer require additional source, processor, and sink prefixes. 
- -| key | value | value range | required or not | default value | -| :----------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :------------- | -| source | iotdb-source | String: iotdb-source | Required | - | -| inclusion | Used to specify the range of data to be synchronized in the data synchronization task, including data, schema, and auth | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | Optional | data.insert | -| inclusion.exclusion | Used to exclude specific operations from the range specified by inclusion, reducing the amount of data synchronized | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | Optional | - | -| path | Used to filter the path pattern schema of time series and data to be synchronized / schema synchronization can only use pathpath is exact matching, parameters must be prefix paths or complete paths, i.e., cannot contain `"*"`, at most one `"**"` at the end of the path parameter | String:IoTDB pattern | Optional | root.** | -| pattern | Used to filter the path prefix of time series | String: Optional | Optional | root | -| start-time | The start event time for synchronizing all data, including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MIN_VALUE | -| end-time | The end event time for synchronizing all data, including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MAX_VALUE | -| realtime.mode | Extraction mode for newly inserted data (after pipe creation) | String: batch | Optional | batch | -| forwarding-pipe-requests | Whether to forward data written by other pipes (usually data synchronization) | Boolean: true | Optional | true | -| history.loose-range | When transferring tsfile, whether to relax the historical data (before pipe creation) range. "": Do not relax the range, select data strictly according to the set conditions "time": Relax the time range to avoid splitting TsFile, which can improve synchronization efficiency | String: "" 、 "time" | Optional | "" | -| mods.enable | Whether to send the mods file of tsfile | Boolean: true / false | Optional | false | - -> 💎 **Explanation**:To maintain compatibility with lower versions, history.enable, history.start-time, history.end-time, realtime.enable can still be used, but they are not recommended in the new version. -> -> 💎 **Explanation: Differences between Stream and Batch Data Extraction Modes** -> - **stream (recommended)**: In this mode, tasks process and send data in real-time. It is characterized by high timeliness and low throughput. -> - **batch**: In this mode, tasks process and send data in batches (according to the underlying data files). It is characterized by low timeliness and high throughput. - -### sink parameter - -> In versions 1.3.1 and above, the parameters no longer require additional source, processor, and sink prefixes. 
- -#### iotdb-thrift-sink( V1.3.0/1/2) - - -| key | value | value Range | required or not | Default Value | -| :--------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | -| sink | iotdb-thrift-sink or iotdb-thrift-async-sink | String: iotdb-thrift-sink or iotdb-thrift-async-sink | Required | | -| sink.node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB (please note that synchronization tasks do not support forwarding to its own service) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - | -| sink.batch.enable | Whether to enable batched log transmission mode to improve transmission throughput and reduce IOPS | Boolean: true, false | Optional | true | -| sink.batch.max-delay-seconds | Effective when batched log transmission mode is enabled, it represents the maximum waiting time for a batch of data before sending (unit: s) | Integer | Optional | 1 | -| sink.batch.size-bytes | Effective when batched log transmission mode is enabled, it represents the maximum batch size for a batch of data (unit: byte) | Long | Optional | 16*1024*1024 | - - -#### iotdb-thrift-ssl-sink( V1.3.1/2) - -| key | value | value Range | required or not | Default Value | -| :---------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | -| sink | iotdb-thrift-ssl-sink | String: iotdb-thrift-ssl-sink | Required | - | -| node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB (please note that synchronization tasks do not support forwarding to its own service) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - | -| batch.enable | Whether to enable batched log transmission mode to improve transmission throughput and reduce IOPS | Boolean: true, false | Optional | true | -| batch.max-delay-seconds | Effective when batched log transmission mode is enabled, it represents the maximum waiting time for a batch of data before sending (unit: s) | Integer | Optional | 1 | -| batch.size-bytes | Effective when batched log transmission mode is enabled, it represents the maximum batch size for a batch of data (unit: byte) | Long | Optional | 16*1024*1024 | -| ssl.trust-store-path | The trust store certificate path required to connect to the target DataNode | String: certificate directory name, when configured as a relative directory, it is relative to the IoTDB root directory. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667'| Required | - | -| ssl.trust-store-pwd | The trust store certificate password required to connect to the target DataNode | Integer | Required | - | - diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Data-Sync_timecho.md b/src/UserGuide/V1.3.0-2/User-Manual/Data-Sync_timecho.md deleted file mode 100644 index 47671ea71..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Data-Sync_timecho.md +++ /dev/null @@ -1,613 +0,0 @@ - - -# Data Sync - -Data synchronization is a typical requirement in industrial Internet of Things (IoT). Through data synchronization mechanisms, it is possible to achieve data sharing between IoTDB, and to establish a complete data link to meet the needs for internal and external network data interconnectivity, edge-cloud synchronization, data migration, and data backup. 
- -## Function Overview - -### Data Synchronization - -A data synchronization task consists of three stages: - -![](/img/sync_en_01.png) - -- Source Stage:This part is used to extract data from the source IoTDB, defined in the source section of the SQL statement. -- Process Stage:This part is used to process the data extracted from the source IoTDB, defined in the processor section of the SQL statement. -- Sink Stage:This part is used to send data to the target IoTDB, defined in the sink section of the SQL statement. - -By declaratively configuring the specific content of the three parts through SQL statements, flexible data synchronization capabilities can be achieved. Currently, data synchronization supports the synchronization of the following information, and you can select the synchronization scope when creating a synchronization task (the default is data.insert, which means synchronizing newly written data): - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Synchronization Scope | | Synchronization Content Description |
| --------------------- | ---------- | ------------------------------------------------------------------ |
| all | | All scopes |
| data (Data) | insert | Synchronize newly written data |
| | delete | Synchronize deleted data |
| schema | database | Synchronize database creation, modification or deletion operations |
| | timeseries | Synchronize the definition and attributes of time series |
| | TTL | Synchronize the data retention time |
| auth | - | Synchronize user permissions and access control |
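As an illustration, the sketch below narrows the scope using the `inclusion` and `inclusion.exclusion` source parameters documented in the source parameter reference; whether `data.delete` is accepted in exactly this dotted form is an assumption inferred from the default value `data.insert`, and the pipe name and target address are placeholders.

```SQL
-- Sketch: synchronize everything except delete operations.
create pipe scope_demo
with source (
  'inclusion' = 'all',
  'inclusion.exclusion' = 'data.delete'  -- assumed dotted form, cf. default data.insert
)
with sink (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668'         -- placeholder target DataNode address
)
```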
- -### Functional limitations and instructions - -The schema and auth synchronization functions have the following limitations: - -- When using schema synchronization, it is required that the consensus protocol of `Schema region` and `ConfigNode` must be the default ratis protocol, that is: In the `iotdb-common.properties` configuration file, both the `config_node_consensus_protocol_class` and `schema_region_consensus_protocol_class` configuration items are set to `org.apache.iotdb.consensus.ratis.RatisConsensus`. - -- To prevent potential conflicts, please turn off the automatic creation of metadata on the receiving end when enabling schema synchronization. You can do this by setting the `enable_auto_create_schema` configuration in the `iotdb-common.properties` configuration file to false. - -- When schema synchronization is enabled, the use of custom plugins is not supported. - -- In a dual-active cluster, schema synchronization should avoid simultaneous operations on both ends. - -- During data synchronization tasks, please avoid performing any deletion operations to prevent inconsistent states between the two ends. - -## Usage Instructions - -Data synchronization tasks have three states: RUNNING, STOPPED, and DROPPED. The task state transitions are shown in the following diagram: - - -V1.3.0 and earlier versions: - -After creation, it will not start immediately and needs to execute the `START PIPE` statement to start the task. - -![](/img/sync_en_02.png) - -V1.3.1 and later versions: - -After creation, the task will start directly, and when the task stops abnormally, the system will automatically attempt to restart the task. - -![](/img/Data-Sync02.png) - -Provide the following SQL statements for state management of synchronization tasks. - -### Create Task - -Use the `CREATE PIPE` statement to create a data synchronization task. The `PipeId` and `sink` attributes are required, while `source` and `processor` are optional. When entering the SQL, note that the order of the `SOURCE` and `SINK` plugins cannot be swapped. - -The SQL example is as follows: - -```SQL -CREATE PIPE -- PipeId is the name that uniquely identifies the task. --- Data extraction plugin, optional plugin -WITH SOURCE ( - [ = ,], -) --- Data processing plugin, optional plugin -WITH PROCESSOR ( - [ = ,], -) --- Data connection plugin, required plugin -WITH SINK ( - [ = ,], -) -``` - -### Start Task - -Start processing data: - -```SQL -START PIPE -``` - -### Stop Task - -Stop processing data: - -```SQL -STOP PIPE -``` - -### Delete Task - -Deletes the specified task: - -```SQL -DROP PIPE -``` - -Deleting a task does not require stopping the synchronization task first. 
- -### View Task - -View all tasks: - -```SQL -SHOW PIPES -``` - -To view a specified task: - -```SQL -SHOW PIPE -``` - -Example of the show pipes result for a pipe: - -```SQL -+--------------------------------+-----------------------+-------+---------------+--------------------+------------------------------------------------------------+----------------+ -| ID| CreationTime| State| PipeSource| PipeProcessor| PipeSink|ExceptionMessage| -+--------------------------------+-----------------------+-------+---------------+--------------------+------------------------------------------------------------+----------------+ -|3421aacb16ae46249bac96ce4048a220|2024-08-13T09:55:18.717|RUNNING| {}| {}|{{sink=iotdb-thrift-sink, sink.ip=127.0.0.1, sink.port=6668}}| | -+--------------------------------+-----------------------+-------+---------------+--------------------+------------------------------------------------------------+----------------+ -``` - -The meanings of each column are as follows: - -- **ID**:The unique identifier for the synchronization task -- **CreationTime**:The time when the synchronization task was created -- **State**:The state of the synchronization task -- **PipeSource**:The source of the synchronized data stream -- **PipeProcessor**:The processing logic of the synchronized data stream during transmission -- **PipeSink**:The destination of the synchronized data stream -- **ExceptionMessage**:Displays the exception information of the synchronization task - - -### Synchronization Plugins - -To make the overall architecture more flexible to match different synchronization scenario requirements, we support plugin assembly within the synchronization task framework. The system comes with some pre-installed common plugins that you can use directly. At the same time, you can also customize processor plugins and Sink plugins, and load them into the IoTDB system for use. 
You can view the plugins in the system (including custom and built-in plugins) with the following statement: - -```SQL -SHOW PIPEPLUGINS -``` - -The return result is as follows (version 1.3.2): - -```SQL -IoTDB> SHOW PIPEPLUGINS -+---------------------+----------+-------------------------------------------------------------------------------------------+----------------------------------------------------+ -| PluginName|PluginType| ClassName| PluginJar| -+---------------------+----------+-------------------------------------------------------------------------------------------+----------------------------------------------------+ -| DO-NOTHING-PROCESSOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.processor.donothing.DoNothingProcessor| | -| DO-NOTHING-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.donothing.DoNothingConnector| | -| IOTDB-AIR-GAP-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.airgap.IoTDBAirGapConnector| | -| IOTDB-SOURCE| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.extractor.iotdb.IoTDBExtractor| | -| IOTDB-THRIFT-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftConnector| | -|IOTDB-THRIFT-SSL-SINK| Builtin|org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftSslConnector| | -+---------------------+----------+-------------------------------------------------------------------------------------------+----------------------------------------------------+ -``` - -Detailed introduction of pre-installed plugins is as follows (for detailed parameters of each plugin, please refer to the [Parameter Description](#reference-parameter-description) section): - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| Type | Custom Plugin | Plugin Name | Description | Applicable Version |
| ---------------- | ------------- | --------------------- | ------------------------------------------------------------ | ------------------ |
| source plugin | Not Supported | iotdb-source | The default extractor plugin, used to extract historical or real-time data from IoTDB | 1.2.x |
| processor plugin | Supported | do-nothing-processor | The default processor plugin, which does not process the incoming data | 1.2.x |
| sink plugin | Supported | do-nothing-sink | Does not process the data that is sent out | 1.2.x |
| | | iotdb-thrift-sink | The default sink plugin (V1.3.1+), used for data transfer between IoTDB (V1.2.0+) and IoTDB (V1.2.0+). It uses the Thrift RPC framework to transfer data, with a multi-threaded async non-blocking IO model and high transfer performance, especially suitable for scenarios where the target end is distributed | 1.2.x |
| | | iotdb-air-gap-sink | Used for data synchronization across unidirectional data diodes from IoTDB (V1.2.0+) to IoTDB (V1.2.0+). Supported diode models include Nanrui Syskeeper 2000, etc. | 1.2.x |
| | | iotdb-thrift-ssl-sink | Used for data transfer between IoTDB (V1.3.1+) and IoTDB (V1.2.0+). It uses the Thrift RPC framework to transfer data, with a single-threaded sync blocking IO model, suitable for scenarios with higher security requirements | 1.3.1+ |
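One possible use of the `do-nothing-sink` plugin from the table above is a dry run that exercises the source side without sending anything to a remote cluster. This is only a sketch, under the assumption that the plugin needs no parameters beyond its name; the pipe name is a placeholder.

```SQL
-- Sketch: a throwaway pipe whose sink discards everything it receives.
create pipe dry_run_demo
with sink (
  'sink' = 'do-nothing-sink'   -- assumed to require no further parameters
)
```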
- -For importing custom plugins, please refer to the [Stream Processing](./Streaming_timecho.md#custom-stream-processing-plugin-management) section. - -## Use examples - -### Full data synchronisation - -This example is used to demonstrate the synchronisation of all data from one IoTDB to another IoTDB with the data link as shown below: - -![](/img/pipe1.jpg) - -In this example, we can create a synchronization task named A2B to synchronize the full data from A IoTDB to B IoTDB. The iotdb-thrift-sink plugin (built-in plugin) for the sink is required. The URL of the data service port of the DataNode node on the target IoTDB needs to be configured through node-urls, as shown in the following example statement: - -```SQL -create pipe A2B -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -``` - -### Partial data synchronization - -This example is used to demonstrate the synchronisation of data from a certain historical time range (8:00pm 23 August 2023 to 8:00pm 23 October 2023) to another IoTDB, the data link is shown below: - -![](/img/pipe2.jpg) - -In this example, we can create a synchronization task named A2B. First, we need to define the range of data to be transferred in the source. Since the data being transferred is historical data (historical data refers to data that existed before the creation of the synchronization task), we need to configure the start-time and end-time of the data and the transfer mode mode. The URL of the data service port of the DataNode node on the target IoTDB needs to be configured through node-urls. - -The detailed statements are as follows: - -```SQL -create pipe A2B -WITH SOURCE ( - 'source'= 'iotdb-source', - 'realtime.mode' = 'stream' -- The extraction mode for newly inserted data (after pipe creation) - 'start-time' = '2023.08.23T08:00:00+00:00', -- The start event time for synchronizing all data, including start-time - 'end-time' = '2023.10.23T08:00:00+00:00' -- The end event time for synchronizing all data, including end-time -) -with SINK ( - 'sink'='iotdb-thrift-async-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -``` - -### Bidirectional data transfer - -This example is used to demonstrate the scenario where two IoTDB act as active-active pairs, with the data link shown in the figure below: - -![](/img/pipe3.jpg) - -In this example, to avoid infinite data loops, the `forwarding-pipe-requests` parameter on A and B needs to be set to `false`, indicating that data transmitted from another pipe is not forwarded, and to keep the data consistent on both sides, the pipe needs to be configured with `inclusion=all` to synchronize full data and metadata. 
- -The detailed statement is as follows: - -On A IoTDB, execute the following statement: - -```SQL -create pipe AB -with source ( - 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth - 'forwarding-pipe-requests' = 'false' -- Do not forward data written by other Pipes -) -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -``` - -On B IoTDB, execute the following statement: - -```SQL -create pipe BA -with source ( - 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth - 'forwarding-pipe-requests' = 'false' -- Do not forward data written by other Pipes -) -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6667', -- The URL of the data service port of the DataNode node on the target IoTDB -) -``` - -### Edge-cloud data transfer - -This example is used to demonstrate the scenario where data from multiple IoTDB is transferred to the cloud, with data from clusters B, C, and D all synchronized to cluster A, as shown in the figure below: - -![](/img/sync_en_03.png) - -In this example, to synchronize the data from clusters B, C, and D to A, the pipe between BA, CA, and DA needs to configure the `path` to limit the range, and to keep the edge and cloud data consistent, the pipe needs to be configured with `inclusion=all` to synchronize full data and metadata. The detailed statement is as follows: - -On B IoTDB, execute the following statement to synchronize data from B to A: - -```SQL -create pipe BA -with source ( - 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth - 'path'='root.db.**', -- Limit the range -) -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -) -``` - -On C IoTDB, execute the following statement to synchronize data from C to A: - -```SQL -create pipe CA -with source ( - 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth - 'path'='root.db.**', -- Limit the range -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -) -``` - -On D IoTDB, execute the following statement to synchronize data from D to A: - -```SQL -create pipe DA -with source ( - 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth - 'path'='root.db.**', -- Limit the range -) -with sink ( - 'sink'='iotdb-thrift-sink', - 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB -) -) -``` - -### Cascading data transfer - -This example is used to demonstrate the scenario where data is transferred in a cascading manner between multiple IoTDB, with data from cluster A synchronized to cluster B, and then to cluster C, as shown in the figure below: - -![](/img/sync_en_04.png) - -In this example, to synchronize the data from cluster A to C, the `forwarding-pipe-requests` needs to be set to `true` between BC. 
The detailed statement is as follows:
-
-On A IoTDB, execute the following statement to synchronize data from A to B:
-
-```SQL
-create pipe AB
-with sink (
-    'sink'='iotdb-thrift-sink',
-    'node-urls' = '127.0.0.1:6668' -- The URL of the data service port of the DataNode node on the target IoTDB
-)
-```
-
-On B IoTDB, execute the following statement to synchronize data from B to C:
-
-```SQL
-create pipe BC
-with source (
-    'forwarding-pipe-requests' = 'true' -- Whether to forward data written by other Pipes
-)
-with sink (
-    'sink'='iotdb-thrift-sink',
-    'node-urls' = '127.0.0.1:6669' -- The URL of the data service port of the DataNode node on the target IoTDB
-)
-```
-
-### Cross-gate data transfer
-
-This example is used to demonstrate the scenario where data from one IoTDB is synchronized to another IoTDB through a unidirectional gateway, as shown in the figure below:
-
-![](/img/pipe5.jpg)
-
-In this example, the iotdb-air-gap-sink plugin needs to be used in the sink task (currently only some gateway models are supported; please contact Timecho staff to confirm the specific model). After configuring the gateway, execute the following statement on A IoTDB, filling in node-urls with the URL of the data service port of the DataNode node on the target IoTDB configured on the gateway, as detailed below:
-
-```SQL
-create pipe A2B
-with sink (
-    'sink'='iotdb-air-gap-sink',
-    'node-urls' = '10.53.53.53:9780' -- The URL of the data service port of the DataNode node on the target IoTDB
-)
-```
-
-### Encrypted Synchronization (V1.3.1+)
-
-IoTDB supports the use of SSL encryption during the synchronization process, ensuring the secure transfer of data between different IoTDB instances. By configuring SSL-related parameters, such as the certificate path (`ssl.trust-store-path`) and password (`ssl.trust-store-pwd`), data can be protected by SSL encryption during synchronization.
-
-For example, to create a synchronization task named A2B:
-
-```SQL
-create pipe A2B
-with sink (
-    'sink'='iotdb-thrift-ssl-sink',
-    'node-urls'='127.0.0.1:6667', -- The URL of the data service port of the DataNode node on the target IoTDB
-    'ssl.trust-store-path'='pki/trusted', -- The trust store certificate path required to connect to the target DataNode
-    'ssl.trust-store-pwd'='root' -- The trust store certificate password required to connect to the target DataNode
-)
-```
-
-## Reference: Notes
-
-You can adjust the parameters for data synchronization by modifying the IoTDB configuration file (`iotdb-common.properties`), such as the directory for storing synchronized data. The complete configuration is as follows:
-
-V1.3.0/1/2:
-
-```Properties
-####################
-### Pipe Configuration
-####################
-
-# Uncomment the following field to configure the pipe lib directory.
-# For Windows platform
-# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is
-# absolute. Otherwise, it is relative.
-# pipe_lib_dir=ext\\pipe
-# For Linux platform
-# If its prefix is "/", then the path is absolute. Otherwise, it is relative.
-# pipe_lib_dir=ext/pipe
-
-# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor.
-# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)).
-# pipe_subtask_executor_max_thread_num=5
-
-# The connection timeout (in milliseconds) for the thrift client.
-# pipe_sink_timeout_ms=900000 - -# The maximum number of selectors that can be used in the sink. -# Recommend to set this value to less than or equal to pipe_sink_max_client_number. -# pipe_sink_selector_number=4 - -# The maximum number of clients that can be used in the sink. -# pipe_sink_max_client_number=16 - -# Whether to enable receiving pipe data through air gap. -# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. -# pipe_air_gap_receiver_enabled=false - -# The port for the server to receive pipe data through air gap. -# pipe_air_gap_receiver_port=9780 -``` - -## Reference: parameter description - -### source parameter(V1.3.0) - -| key | value | value range | required or not | default value | -| :------------------------------ | :----------------------------------------------------------- | :------------------------------------- | :------- | :------------- | -| source | iotdb-source | String: iotdb-source | required | - | -| source.pattern | Used to filter the path prefix of time series | String: any time series prefix | optional | root | -| source.history.enable | Whether to send historical data | Boolean: true / false | optional | true | -| source.history.start-time | The start event time for synchronizing historical data, including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional | Long.MIN_VALUE | -| source.history.end-time | The end event time for synchronizing historical data, including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional | Long.MAX_VALUE | -| source.realtime.enable | Whether to send real-time data | Boolean: true / false | optional | true | -| source.realtime.mode | The extraction mode for newly inserted data (after pipe creation) | String: stream, batch | optional | stream | -| source.forwarding-pipe-requests | Whether to forward data written by other Pipes (usually data synchronization) | Boolean: true, false | optional | true | -| source.history.loose-range | When transferring tsfile, whether to relax the historical data (before pipe creation) range. "": Do not relax the range, select data strictly according to the set conditions "time": Relax the time range to avoid splitting TsFile, which can improve synchronization efficiency | String: "" / "time" | optional | Empty String | - -> 💎 **Explanation: Difference between Historical Data and Real-time Data** -> - **Historical Data**: All data with arrival time < the current system time when the pipe is created is called historical data. -> - **Real-time Data**:All data with arrival time >= the current system time when the pipe is created is called real-time data. -> - **Full Data**: Full data = Historical data + Real-time data -> -> 💎 **Explanation: Differences between Stream and Batch Data Extraction Modes** -> - **stream (recommended)**: In this mode, tasks process and send data in real-time. It is characterized by high timeliness and low throughput. -> - **batch**: In this mode, tasks process and send data in batches (according to the underlying data files). It is characterized by low timeliness and high throughput. - -### source parameter(V1.3.1) - -> In versions 1.3.1 and above, the parameters no longer require additional source, processor, and sink prefixes. 
- -| key | value | value range | required or not | default value | -| :----------------------- | :----------------------------------------------------------- | :------------------------------------- | :------- | :------------- | -| source | iotdb-source | String: iotdb-source | Required | - | -| pattern | Used to filter the path prefix of time series | String: any time series prefix | Optional | root | -| start-time | The start event time for synchronizing all data, including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MIN_VALUE | -| end-time | The end event time for synchronizing all data, including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MAX_VALUE | -| realtime.mode | The extraction mode for newly inserted data (after pipe creation) | String: stream, batch | Optional | stream | -| forwarding-pipe-requests | Whether to forward data written by other Pipes (usually data synchronization) | Boolean: true, false | Optional | true | -| history.loose-range | When transferring tsfile, whether to relax the historical data (before pipe creation) range. "": Do not relax the range, select data strictly according to the set conditions "time": Relax the time range to avoid splitting TsFile, which can improve synchronization efficiency | String: "" / "time" | Optional | Empty String | - -> 💎 **Explanation**:To maintain compatibility with lower versions, history.enable, history.start-time, history.end-time, realtime.enable can still be used, but they are not recommended in the new version. -> -> 💎 **Explanation: Differences between Stream and Batch Data Extraction Modes** -> - **stream (recommended)**: In this mode, tasks process and send data in real-time. It is characterized by high timeliness and low throughput. -> - **batch**: In this mode, tasks process and send data in batches (according to the underlying data files). It is characterized by low timeliness and high throughput. - -### source parameter(V1.3.2) - -> In versions 1.3.1 and above, the parameters no longer require additional source, processor, and sink prefixes. 
-
-| key | value | value range | required or not | default value |
-| :--- | :--- | :--- | :--- | :--- |
-| source | iotdb-source | String: iotdb-source | Required | - |
-| inclusion | Used to specify the range of data to be synchronized in the data synchronization task, including data, schema, and auth | String: all, data(insert,delete), schema(database,timeseries,ttl), auth | Optional | data.insert |
-| inclusion.exclusion | Used to exclude specific operations from the range specified by inclusion, reducing the amount of data synchronized | String: all, data(insert,delete), schema(database,timeseries,ttl), auth | Optional | - |
-| path | Used to filter the path pattern of the time series and schema to be synchronized; schema synchronization can only use path. The path is matched exactly: the parameter must be a prefix path or a complete path, i.e., it cannot contain `"*"` and may contain at most one `"**"`, only at the end of the path parameter | String: IoTDB pattern | Optional | root.** |
-| pattern | Used to filter the path prefix of time series | String: any time series prefix | Optional | root |
-| start-time | The start event time for synchronizing all data, including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MIN_VALUE |
-| end-time | The end event time for synchronizing all data, including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MAX_VALUE |
-| realtime.mode | The extraction mode for newly inserted data (after pipe creation) | String: stream, batch | Optional | stream |
-| forwarding-pipe-requests | Whether to forward data written by other Pipes (usually data synchronization) | Boolean: true, false | Optional | true |
-| history.loose-range | When transferring a tsfile, whether to relax the historical data (before pipe creation) range. "": do not relax the range, select data strictly according to the set conditions; "time": relax the time range to avoid splitting TsFiles, which can improve synchronization efficiency | String: "" / "time" | Optional | "" |
-| mods.enable | Whether to send the mods file of the tsfile | Boolean: true / false | Optional | false |
-
-> 💎 **Explanation**: To maintain compatibility with lower versions, history.enable, history.start-time, history.end-time, and realtime.enable can still be used, but they are not recommended in the new version.
->
-> 💎 **Explanation: Differences between Stream and Batch Data Extraction Modes**
-> - **stream (recommended)**: In this mode, tasks process and send data in real-time. It is characterized by high timeliness and low throughput.
-> - **batch**: In this mode, tasks process and send data in batches (according to the underlying data files). It is characterized by low timeliness and high throughput.
-
-### sink parameter
-
-> In versions 1.3.1 and above, the parameters no longer require additional source, processor, and sink prefixes.
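-
-For reference, a minimal sketch of how the sink parameters described in the tables below might be combined in a `create pipe` statement, assuming version 1.3.1+ where the `sink.` prefix can be omitted. The pipe name, address, and batch values are only illustrative, not recommendations:
-
-```SQL
-create pipe A2B_batched
-with sink (
-    'sink'='iotdb-thrift-sink',
-    'node-urls'='127.0.0.1:6668',   -- data service port of a DataNode on the target IoTDB
-    'batch.enable'='true',          -- enable batched log transmission
-    'batch.max-delay-seconds'='5',  -- wait at most 5 s before sending a batch
-    'batch.size-bytes'='8388608'    -- flush a batch once it reaches 8 MB
-)
-```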
- -#### iotdb-thrift-sink( V1.3.0/1/2) - - -| key | value | value Range | required or not | Default Value | -| :--------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | -| sink | iotdb-thrift-sink or iotdb-thrift-async-sink | String: iotdb-thrift-sink or iotdb-thrift-async-sink | Required | | -| sink.node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB (please note that synchronization tasks do not support forwarding to its own service) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - | -| sink.batch.enable | Whether to enable batched log transmission mode to improve transmission throughput and reduce IOPS | Boolean: true, false | Optional | true | -| sink.batch.max-delay-seconds | Effective when batched log transmission mode is enabled, it represents the maximum waiting time for a batch of data before sending (unit: s) | Integer | Optional | 1 | -| sink.batch.size-bytes | Effective when batched log transmission mode is enabled, it represents the maximum batch size for a batch of data (unit: byte) | Long | Optional | 16*1024*1024 | - -#### iotdb-air-gap-sink( V1.3.0/1/2) - -| key | value | value Range | required or not | Default Value | -| :--------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | -| sink | iotdb-air-gap-sink | String: iotdb-air-gap-sink | Required | - | -| sink.node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB | String. Example: :'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - | -| sink.air-gap.handshake-timeout-ms | The timeout duration of the handshake request when the sender and receiver first attempt to establish a connection, unit: ms | Integer | Optional | 5000 | - - -#### iotdb-thrift-ssl-sink( V1.3.1/2) - -| key | value | value Range | required or not | Default Value | -| :---------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | -| sink | iotdb-thrift-ssl-sink | String: iotdb-thrift-ssl-sink | Required | - | -| node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB (please note that synchronization tasks do not support forwarding to its own service) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - | -| batch.enable | Whether to enable batched log transmission mode to improve transmission throughput and reduce IOPS | Boolean: true, false | Optional | true | -| batch.max-delay-seconds | Effective when batched log transmission mode is enabled, it represents the maximum waiting time for a batch of data before sending (unit: s) | Integer | Optional | 1 | -| batch.size-bytes | Effective when batched log transmission mode is enabled, it represents the maximum batch size for a batch of data (unit: byte) | Long | Optional | 16*1024*1024 | -| ssl.trust-store-path | The trust store certificate path required to connect to the target DataNode | String: certificate directory name, when configured as a relative directory, it is relative to the IoTDB root directory. 
Example: 'pki/trusted' | Required | - |
-| ssl.trust-store-pwd | The trust store certificate password required to connect to the target DataNode | String | Required | - |
diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Database-Programming.md b/src/UserGuide/V1.3.0-2/User-Manual/Database-Programming.md
deleted file mode 100644
index 9791a13ee..000000000
--- a/src/UserGuide/V1.3.0-2/User-Manual/Database-Programming.md
+++ /dev/null
@@ -1,1038 +0,0 @@
-
-
-# Database Programming
-
-## TRIGGER
-
-### Instructions
-
-The trigger provides a mechanism for listening to changes in time series data. With user-defined logic, tasks such as alerting and data forwarding can be conducted.
-
-The trigger is implemented based on the reflection mechanism. Users can monitor data changes by implementing the Java interfaces. IoTDB allows users to dynamically register and drop triggers without restarting the server.
-
-The document will help you learn to define and manage triggers.
-
-#### Pattern for Listening
-
-A single trigger can be used to listen for data changes in a time series that match a specific pattern. For example, a trigger can listen for the data changes of time series `root.sg.a`, or time series that match the pattern `root.sg.*`. When you register a trigger, you can specify the path pattern that the trigger listens on through an SQL statement.
-
-#### Trigger Type
-
-There are currently two types of triggers, and you can specify the type through an SQL statement when registering a trigger:
-
-- Stateful triggers: The execution logic of this type of trigger may depend on data from multiple insertion statements. The framework will aggregate the data written by different nodes into the same trigger instance for calculation to retain context information. This type of trigger is usually used for sampling or statistical data aggregation over a period of time. Only one node in the cluster holds an instance of a stateful trigger.
-- Stateless triggers: The execution logic of the trigger is only related to the current input data. The framework does not need to aggregate the data of different nodes into the same trigger instance. This type of trigger is usually used for calculation on single rows of data and anomaly detection. Each node in the cluster holds an instance of a stateless trigger.
-
-#### Trigger Event
-
-There are currently two trigger events for the trigger, and other trigger events will be expanded in the future. When you register a trigger, you can specify the trigger event through an SQL statement:
-
-- BEFORE INSERT: Fires before the data is persisted. **Please note that currently the trigger does not support data cleaning and will not change the data to be persisted itself.**
-- AFTER INSERT: Fires after the data is persisted.
-
-### How to Implement a Trigger
-
-You need to implement the trigger by writing a Java class, where the dependency shown below is required. If you use [Maven](http://search.maven.org/), you can search for it directly from the [Maven repository](http://search.maven.org/).
-
-#### Dependency
-
-```xml
-<dependency>
-  <groupId>org.apache.iotdb</groupId>
-  <artifactId>iotdb-server</artifactId>
-  <version>1.0.0</version>
-  <scope>provided</scope>
-</dependency>
-```
-
-Note that the dependency version should correspond to the target server version.
-
-#### Interface Description
-
-To implement a trigger, you need to implement the `org.apache.iotdb.trigger.api.Trigger` interface.
- -```java -import org.apache.iotdb.trigger.api.enums.FailureStrategy; -import org.apache.iotdb.tsfile.write.record.Tablet; - -public interface Trigger { - - /** - * This method is mainly used to validate {@link TriggerAttributes} before calling {@link - * Trigger#onCreate(TriggerAttributes)}. - * - * @param attributes TriggerAttributes - * @throws Exception e - */ - default void validate(TriggerAttributes attributes) throws Exception {} - - /** - * This method will be called when creating a trigger after validation. - * - * @param attributes TriggerAttributes - * @throws Exception e - */ - default void onCreate(TriggerAttributes attributes) throws Exception {} - - /** - * This method will be called when dropping a trigger. - * - * @throws Exception e - */ - default void onDrop() throws Exception {} - - /** - * When restarting a DataNode, Triggers that have been registered will be restored and this method - * will be called during the process of restoring. - * - * @throws Exception e - */ - default void restore() throws Exception {} - - /** - * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} - * is the default strategy. - * - * @return {@link FailureStrategy} - */ - default FailureStrategy getFailureStrategy() { - return FailureStrategy.OPTIMISTIC; - } - - /** - * @param tablet see {@link Tablet} for detailed information of data structure. Data that is - * inserted will be constructed as a Tablet and you can define process logic with {@link - * Tablet}. - * @return true if successfully fired - * @throws Exception e - */ - default boolean fire(Tablet tablet) throws Exception { - return true; - } -} -``` - -This class provides two types of programming interfaces: **Lifecycle related interfaces** and **data change listening related interfaces**. All the interfaces in this class are not required to be implemented. When the interfaces are not implemented, the trigger will not respond to the data changes. You can implement only some of these interfaces according to your needs. - -Descriptions of the interfaces are as followed. - -##### Lifecycle Related Interfaces - -| Interface | Description | -| ------------------------------------------------------------ | ------------------------------------------------------------ | -| *default void validate(TriggerAttributes attributes) throws Exception {}* | When you creates a trigger using the `CREATE TRIGGER` statement, you can specify the parameters that the trigger needs to use, and this interface will be used to verify the correctness of the parameters。 | -| *default void onCreate(TriggerAttributes attributes) throws Exception {}* | This interface is called once when you create a trigger using the `CREATE TRIGGER` statement. During the lifetime of each trigger instance, this interface will be called only once. This interface is mainly used for the following functions: helping users to parse custom attributes in SQL statements (using `TriggerAttributes`). You can create or apply for resources, such as establishing external links, opening files, etc. | -| *default void onDrop() throws Exception {}* | This interface is called when you drop a trigger using the `DROP TRIGGER` statement. During the lifetime of each trigger instance, this interface will be called only once. This interface mainly has the following functions: it can perform the operation of resource release and can be used to persist the results of trigger calculations. 
| -| *default void restore() throws Exception {}* | When the DataNode is restarted, the cluster will restore the trigger instance registered on the DataNode, and this interface will be called once for stateful trigger during the process. After the DataNode where the stateful trigger instance is located goes down, the cluster will restore the trigger instance on another available DataNode, calling this interface once in the process. This interface can be used to customize recovery logic. | - -##### Data Change Listening Related Interfaces - -###### Listening Interface - -```java -/** - * @param tablet see {@link Tablet} for detailed information of data structure. Data that is - * inserted will be constructed as a Tablet and you can define process logic with {@link - * Tablet}. - * @return true if successfully fired - * @throws Exception e - */ - default boolean fire(Tablet tablet) throws Exception { - return true; - } -``` - -When the data changes, the trigger uses the Tablet as the unit of firing operation. You can obtain the metadata and data of the corresponding sequence through Tablet, and then perform the corresponding trigger operation. If the fire process is successful, the return value should be true. If the interface returns false or throws an exception, we consider the trigger fire process as failed. When the trigger fire process fails, we will perform corresponding operations according to the listening strategy interface. - -When performing an INSERT operation, for each time series in it, we will detect whether there is a trigger that listens to the path pattern, and then assemble the time series data that matches the path pattern listened by the same trigger into a new Tablet for trigger fire interface. Can be understood as: - -```java -Map> pathToTriggerListMap => Map -``` - -**Note that currently we do not make any guarantees about the order in which triggers fire.** - -Here is an example: - -Suppose there are three triggers, and the trigger event of the triggers are all BEFORE INSERT: - -- Trigger1 listens on `root.sg.*` -- Trigger2 listens on `root.sg.a` -- Trigger3 listens on `root.sg.b` - -Insertion statement: - -```sql -insert into root.sg(time, a, b) values (1, 1, 1); -``` - -The time series `root.sg.a` matches Trigger1 and Trigger2, and the sequence `root.sg.b` matches Trigger1 and Trigger3, then: - -- The data of `root.sg.a` and `root.sg.b` will be assembled into a new tablet1, and Trigger1.fire(tablet1) will be executed at the corresponding Trigger Event. -- The data of `root.sg.a` will be assembled into a new tablet2, and Trigger2.fire(tablet2) will be executed at the corresponding Trigger Event. -- The data of `root.sg.b` will be assembled into a new tablet3, and Trigger3.fire(tablet3) will be executed at the corresponding Trigger Event. - -###### Listening Strategy Interface - -When the trigger fails to fire, we will take corresponding actions according to the strategy set by the listening strategy interface. You can set `org.apache.iotdb.trigger.api.enums.FailureStrategy`. There are currently two strategies, optimistic and pessimistic: - -- Optimistic strategy: The trigger that fails to fire does not affect the firing of subsequent triggers, nor does it affect the writing process, that is, we do not perform additional processing on the sequence involved in the trigger failure, only log the failure to record the failure, and finally inform user that data insertion is successful, but the trigger fire part failed. 
-- Pessimistic strategy: The failure trigger affects the processing of all subsequent Pipelines, that is, we believe that the firing failure of the trigger will cause all subsequent triggering processes to no longer be carried out. If the trigger event of the trigger is BEFORE INSERT, then the insertion will no longer be performed, and the insertion failure will be returned directly. - -```java - /** - * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} - * is the default strategy. - * - * @return {@link FailureStrategy} - */ - default FailureStrategy getFailureStrategy() { - return FailureStrategy.OPTIMISTIC; - } -``` - -#### Example - -If you use [Maven](http://search.maven.org/), you can refer to our sample project **trigger-example**. - -You can find it [here](https://github.com/apache/iotdb/tree/master/example/trigger). - -Here is the code from one of the sample projects: - -```java -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. - */ - -package org.apache.iotdb.trigger; - -import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerConfiguration; -import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerEvent; -import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerHandler; -import org.apache.iotdb.trigger.api.Trigger; -import org.apache.iotdb.trigger.api.TriggerAttributes; -import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; -import org.apache.iotdb.tsfile.write.record.Tablet; -import org.apache.iotdb.tsfile.write.schema.MeasurementSchema; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.io.IOException; -import java.util.HashMap; -import java.util.List; - -public class ClusterAlertingExample implements Trigger { - private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class); - - private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler(); - - private final AlertManagerConfiguration alertManagerConfiguration = - new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts"); - - private String alertname; - - private final HashMap labels = new HashMap<>(); - - private final HashMap annotations = new HashMap<>(); - - @Override - public void onCreate(TriggerAttributes attributes) throws Exception { - alertname = "alert_test"; - - labels.put("series", "root.ln.wf01.wt01.temperature"); - labels.put("value", ""); - labels.put("severity", ""); - - annotations.put("summary", "high temperature"); - annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}"); - - alertManagerHandler.open(alertManagerConfiguration); - } - - @Override - public void onDrop() throws IOException { - alertManagerHandler.close(); - } - - @Override 
- public boolean fire(Tablet tablet) throws Exception { - List measurementSchemaList = tablet.getSchemas(); - for (int i = 0, n = measurementSchemaList.size(); i < n; i++) { - if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) { - // for example, we only deal with the columns of Double type - double[] values = (double[]) tablet.values[i]; - for (double value : values) { - if (value > 100.0) { - LOGGER.info("trigger value > 100"); - labels.put("value", String.valueOf(value)); - labels.put("severity", "critical"); - AlertManagerEvent alertManagerEvent = - new AlertManagerEvent(alertname, labels, annotations); - alertManagerHandler.onEvent(alertManagerEvent); - } else if (value > 50.0) { - LOGGER.info("trigger value > 50"); - labels.put("value", String.valueOf(value)); - labels.put("severity", "warning"); - AlertManagerEvent alertManagerEvent = - new AlertManagerEvent(alertname, labels, annotations); - alertManagerHandler.onEvent(alertManagerEvent); - } - } - } - } - return true; - } -} -``` - -### Trigger Management - -You can create and drop a trigger through an SQL statement, and you can also query all registered triggers through an SQL statement. - -**We recommend that you stop insertion while creating triggers.** - -#### Create Trigger - -Triggers can be registered on arbitrary path patterns. The time series registered with the trigger will be listened to by the trigger. When there is data change on the series, the corresponding fire method in the trigger will be called. - -Registering a trigger can be done as follows: - -1. Implement a Trigger class as described in the How to implement a Trigger chapter, assuming the class's full class name is `org.apache.iotdb.trigger.ClusterAlertingExample` -2. Package the project into a JAR package. -3. Register the trigger with an SQL statement. During the creation process, the `validate` and `onCreate` interfaces of the trigger will only be called once. For details, please refer to the chapter of How to implement a Trigger. - -The complete SQL syntax is as follows: - -```sql -// Create Trigger -createTrigger - : CREATE triggerType TRIGGER triggerName=identifier triggerEventClause ON pathPattern AS className=STRING_LITERAL uriClause? triggerAttributeClause? - ; - -triggerType - : STATELESS | STATEFUL - ; - -triggerEventClause - : (BEFORE | AFTER) INSERT - ; - -uriClause - : USING URI uri - ; - -uri - : STRING_LITERAL - ; - -triggerAttributeClause - : WITH LR_BRACKET triggerAttribute (COMMA triggerAttribute)* RR_BRACKET - ; - -triggerAttribute - : key=attributeKey operator_eq value=attributeValue - ; -``` - -Below is the explanation for the SQL syntax: - -- triggerName: The trigger ID, which is globally unique and used to distinguish different triggers, is case-sensitive. -- triggerType: Trigger types are divided into two categories, STATELESS and STATEFUL. -- triggerEventClause: when the trigger fires, BEFORE INSERT and AFTER INSERT are supported now. -- pathPattern:The path pattern the trigger listens on, can contain wildcards * and **. -- className:The class name of the Trigger class. -- jarLocation: Optional. When this option is not specified, by default, we consider that the DBA has placed the JAR package required to create the trigger in the trigger_root_dir directory (configuration item, default is IOTDB_HOME/ext/trigger) of each DataNode node. When this option is specified, we will download and distribute the file resource corresponding to the URI to the trigger_root_dir/install directory of each DataNode. 
-- triggerAttributeClause: It is used to specify the parameters that need to be set when the trigger instance is created. This part is optional in the SQL syntax.
-
-Here is an example SQL statement to help you understand:
-
-```sql
-CREATE STATELESS TRIGGER triggerTest
-BEFORE INSERT
-ON root.sg.**
-AS 'org.apache.iotdb.trigger.ClusterAlertingExample'
-USING URI '/jar/ClusterAlertingExample.jar'
-WITH (
-  "name" = "trigger",
-  "limit" = "100"
-)
-```
-
-The above SQL statement creates a trigger named triggerTest:
-
-- The trigger is stateless.
-- Fires before insertion.
-- Listens on the path pattern root.sg.**
-- The implemented trigger class is named `org.apache.iotdb.trigger.ClusterAlertingExample`
-- The JAR package URI is '/jar/ClusterAlertingExample.jar'
-- When creating the trigger instance, two parameters, name and limit, are passed in.
-
-#### Drop Trigger
-
-The trigger can be dropped by specifying the trigger ID. During the process of dropping the trigger, the `onDrop` interface of the trigger will be called only once.
-
-The SQL syntax is:
-
-```sql
-// Drop Trigger
-dropTrigger
-  : DROP TRIGGER triggerName=identifier
-;
-```
-
-Here is an example statement:
-
-```sql
-DROP TRIGGER triggerTest1
-```
-
-The above statement will drop the trigger with ID triggerTest1.
-
-#### Show Trigger
-
-You can query information about triggers that exist in the cluster through an SQL statement.
-
-The SQL syntax is as follows:
-
-```sql
-SHOW TRIGGERS
-```
-
-The result set format of this statement is as follows:
-
-| TriggerName | Event | Type | State | PathPattern | ClassName | NodeId |
-| ------------ | ---------------------------- | -------------------- | ------------------------------------------- | ----------- | --------------------------------------- | --------------------------------------- |
-| triggerTest1 | BEFORE_INSERT / AFTER_INSERT | STATELESS / STATEFUL | INACTIVE / ACTIVE / DROPPING / TRANSFERRING | root.** | org.apache.iotdb.trigger.TriggerExample | ALL(STATELESS) / DATA_NODE_ID(STATEFUL) |
-
-#### Trigger State
-
-During the process of creating and dropping triggers in the cluster, we maintain the states of the triggers. The following is a description of these states:
-
-| State | Description | Is it recommended to insert data? |
-| ------------ | ------------------------------------------------------------ | --------------------------------- |
-| INACTIVE | The intermediate state of executing `CREATE TRIGGER`: the cluster has just recorded the trigger information on the ConfigNode, and the trigger has not been activated on any DataNode. | NO |
-| ACTIVE | Status after successful execution of `CREATE TRIGGER`: the trigger is available on all DataNodes in the cluster. | YES |
-| DROPPING | Intermediate state of executing `DROP TRIGGER`: the cluster is in the process of dropping the trigger. | NO |
-| TRANSFERRING | The cluster is migrating the location of this trigger instance. | NO |
-
-### Notes
-
-- The trigger takes effect from the time of registration and does not process existing historical data. **That is, only insertion requests that occur after the trigger is successfully registered will be listened to by the trigger.**
-- The fire process of a trigger is currently synchronous, so you need to ensure the efficiency of the trigger, otherwise the writing performance may be greatly affected. **You need to guarantee concurrency safety of triggers yourself**.
-- Please do not register too many triggers in the cluster.
This is because the trigger information is fully stored on the ConfigNode, and a copy of the information is kept on every DataNode.
-- **It is recommended to stop writing when registering triggers**. Registering a trigger is not an atomic operation. When registering a trigger, there will be an intermediate state in which some nodes in the cluster have registered the trigger, and some nodes have not yet registered successfully. To avoid write requests on some nodes being listened to by triggers while not being listened to on others, we recommend not performing writes while registering triggers.
-- When the node holding a stateful trigger instance goes down, we will try to restore the corresponding instance on another node. During the recovery process, we will call the restore interface of the trigger class once.
-- The trigger JAR package has a size limit: it must be less than min(`config_node_ratis_log_appender_buffer_size_max`, 2G), where `config_node_ratis_log_appender_buffer_size_max` is a configuration item. For the specific meaning, please refer to the IoTDB configuration item description.
-- **It is better not to have classes with the same full class name but different function implementations in different JAR packages.** For example, trigger1 and trigger2 correspond to resources trigger1.jar and trigger2.jar respectively. If both JAR packages contain an `org.apache.iotdb.trigger.example.AlertListener` class, when `CREATE TRIGGER` uses this class, the system will randomly load the class from one of the JAR packages, which will eventually lead to inconsistent trigger behavior and other issues.
-
-### Configuration Parameters
-
-| Parameter | Meaning |
-| ------------------------------------------------- | ------------------------------------------------------------ |
-| *trigger_lib_dir* | Directory to save the trigger jar packages |
-| *stateful\_trigger\_retry\_num\_when\_not\_found* | How many times we will retry to find an instance of a stateful trigger on DataNodes if it is not found |
-
-## CONTINUOUS QUERY (CQ)
-
-### Introduction
-
-Continuous queries (CQ) are queries that run automatically and periodically on real-time data and store the query results in other specified time series.
-
-Users can implement sliding window streaming computing through continuous queries, such as calculating the hourly average temperature of a sequence and writing it into a new sequence. Users can customize the `RESAMPLE` clause to create different sliding windows, which can achieve a certain degree of tolerance for out-of-order data.
-
-### Syntax
-
-```sql
-CREATE (CONTINUOUS QUERY | CQ) <cq_id>
-[RESAMPLE
-  [EVERY <every_interval>]
-  [BOUNDARY <execution_boundary_time>]
-  [RANGE <start_time_offset>[, end_time_offset]]
-]
-[TIMEOUT POLICY BLOCKED|DISCARD]
-BEGIN
-  SELECT CLAUSE
-    INTO CLAUSE
-    FROM CLAUSE
-    [WHERE CLAUSE]
-    [GROUP BY(<group_by_interval>[, <sliding_step>]) [, level = <level>]]
-    [HAVING CLAUSE]
-    [FILL {PREVIOUS | LINEAR | constant}]
-    [LIMIT rowLimit OFFSET rowOffset]
-    [ALIGN BY DEVICE]
-END
-```
-
-> Note:
->
-> 1. If there are any time filters in the WHERE clause, IoTDB will throw an error, because IoTDB automatically generates a time range for the query each time it is executed.
-> 2. The GROUP BY TIME clause is also different: it doesn't contain its original first parameter, the display window [start_time, end_time), again because IoTDB automatically generates a time range for the query each time it is executed.
-> 3. If there is no GROUP BY TIME clause in the query, the EVERY clause is required, otherwise IoTDB will throw an error.
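-
-As a quick, minimal sketch of the syntax above, the hourly-average use case mentioned in the introduction could look roughly like this (the target path `root.stats` and the 1-hour windows are only illustrative); fully worked, runnable examples follow below:
-
-```sql
-CREATE CONTINUOUS QUERY cq_hourly_avg
-RESAMPLE EVERY 1h RANGE 1h
-BEGIN
-  SELECT avg(temperature)
-    INTO root.stats.::(avg_temperature)
-    FROM root.ln.*.*
-    GROUP BY(1h)
-END
-```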
-
-#### Descriptions of parameters in CQ syntax
-
-- `<cq_id>` specifies the globally unique id of the CQ.
-- `<every_interval>` specifies the query execution time interval. We currently support the units ns, us, ms, s, m, h, d, w, and its value should not be lower than the minimum threshold configured by the user, which is `continuous_query_min_every_interval`. It's an optional parameter; the default value is set to `group_by_interval` in the group by clause.
-- `<start_time_offset>` specifies the start time of each query execution as `now()-<start_time_offset>`. We currently support the units ns, us, ms, s, m, h, d, w. It's an optional parameter; the default value is set to `every_interval` in the resample clause.
-- `<end_time_offset>` specifies the end time of each query execution as `now()-<end_time_offset>`. We currently support the units ns, us, ms, s, m, h, d, w. It's an optional parameter; the default value is set to `0`.
-- `<execution_boundary_time>` is a date that represents the execution time of a certain cq task.
-  - `<execution_boundary_time>` can be earlier than, equal to, or later than the **current time**.
-  - This parameter is optional. If not specified, it is equal to `BOUNDARY 0`.
-  - **The start time of the first time window** is `<execution_boundary_time> - <start_time_offset>`.
-  - **The end time of the first time window** is `<execution_boundary_time> - <end_time_offset>`.
-  - The **time range** of the `i (1 <= i)th` window is `[<execution_boundary_time> - <start_time_offset> + (i - 1) * <every_interval>, <execution_boundary_time> - <end_time_offset> + (i - 1) * <every_interval>)`.
-  - If the **current time** is earlier than or equal to `execution_boundary_time`, then the first execution moment of the continuous query is `execution_boundary_time`.
-  - If the **current time** is later than `execution_boundary_time`, then the first execution moment of the continuous query is the first `execution_boundary_time + i * <every_interval>` that is later than or equal to the current time.
-
-> - `<every_interval>`, `<start_time_offset>` and `<group_by_interval>` should all be greater than `0`.
-> - The value of `<group_by_interval>` should be less than or equal to the value of `<start_time_offset>`, otherwise the system will throw an error.
-> - Users should specify the appropriate `<start_time_offset>` and `<every_interval>` according to actual needs.
-> - If `<start_time_offset>` is greater than `<every_interval>`, there will be partial data overlap in each query window.
-> - If `<start_time_offset>` is less than `<every_interval>`, there may be uncovered data between each query window.
-> - `start_time_offset` should be larger than `end_time_offset`, otherwise the system will throw an error.
-
-##### `<start_time_offset>` == `<every_interval>`
-
-![1](/img/UserGuide/Process-Data/Continuous-Query/pic1.png?raw=true)
-
-##### `<start_time_offset>` > `<every_interval>`
-
-![2](/img/UserGuide/Process-Data/Continuous-Query/pic2.png?raw=true)
-
-##### `<start_time_offset>` < `<every_interval>`
-
-![3](/img/UserGuide/Process-Data/Continuous-Query/pic3.png?raw=true)
-
-##### `<end_time_offset>` is not zero
-
-![](/img/UserGuide/Process-Data/Continuous-Query/pic4.png?raw=true)
-
-- `TIMEOUT POLICY` specifies how we deal with a cq task whose previous execution has not finished when the next execution time arrives. The default value is `BLOCKED`.
-  - `BLOCKED` means that we will block and wait to do the current cq execution task until the previous time interval cq task finishes. If using the `BLOCKED` policy, all the time intervals will be executed, but it may fall behind the latest time interval.
-  - `DISCARD` means that we just discard the current cq execution task, wait for the next execution time and do the next time interval cq task. If using the `DISCARD` policy, some time intervals won't be executed when the execution time of one cq task is longer than `<every_interval>`. However, once a cq task is executed, it will use the latest time interval, so it can catch up at the sacrifice of some time intervals being discarded.
-
-### Examples of CQ
-
-The examples below use the following sample data. It's a real-time data stream and we can assume that the data arrives on time.
- -```` -+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ -| Time|root.ln.wf02.wt02.temperature|root.ln.wf02.wt01.temperature|root.ln.wf01.wt02.temperature|root.ln.wf01.wt01.temperature| -+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ -|2021-05-11T22:18:14.598+08:00| 121.0| 72.0| 183.0| 115.0| -|2021-05-11T22:18:19.941+08:00| 0.0| 68.0| 68.0| 103.0| -|2021-05-11T22:18:24.949+08:00| 122.0| 45.0| 11.0| 14.0| -|2021-05-11T22:18:29.967+08:00| 47.0| 14.0| 59.0| 181.0| -|2021-05-11T22:18:34.979+08:00| 182.0| 113.0| 29.0| 180.0| -|2021-05-11T22:18:39.990+08:00| 42.0| 11.0| 52.0| 19.0| -|2021-05-11T22:18:44.995+08:00| 78.0| 38.0| 123.0| 52.0| -|2021-05-11T22:18:49.999+08:00| 137.0| 172.0| 135.0| 193.0| -|2021-05-11T22:18:55.003+08:00| 16.0| 124.0| 183.0| 18.0| -+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ -```` - -#### Configuring execution intervals - -Use an `EVERY` interval in the `RESAMPLE` clause to specify the CQ’s execution interval, if not specific, default value is equal to `group_by_interval`. - -```sql -CREATE CONTINUOUS QUERY cq1 -RESAMPLE EVERY 20s -BEGIN -SELECT max_value(temperature) - INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) - FROM root.ln.*.* - GROUP BY(10s) -END -``` - -`cq1` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. - -`cq1` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq1` runs a single query that covers the time range for the current time bucket, that is, the 20-second time bucket that intersects with `now()`. - -Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq1` running at DataNode if you set log level to DEBUG: - -```` -At **2021-05-11T22:18:40.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. -`cq1` generate 2 lines: -> -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -> -At **2021-05-11T22:19:00.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. 
-`cq1` generate 2 lines: -> -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| -|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -> -```` - -`cq1` won't deal with data that is before the current time window which is `2021-05-11T22:18:20.000+08:00`, so here are the results: - -```` -> SELECT temperature_max from root.ln.*.*; -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| -|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -```` - -#### Configuring time range for resampling - -Use `start_time_offset` in the `RANGE` clause to specify the start time of the CQ’s time range, if not specific, default value is equal to `EVERY` interval. - -```sql -CREATE CONTINUOUS QUERY cq2 -RESAMPLE RANGE 40s -BEGIN - SELECT max_value(temperature) - INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) - FROM root.ln.*.* - GROUP BY(10s) -END -``` - -`cq2` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. - -`cq2` executes at 10-second intervals, the same interval as the `group_by_interval`. Every 10 seconds, `cq2` runs a single query that covers the time range between `now()` minus the `start_time_offset` and `now()` , that is, the time range between 40 seconds prior to `now()` and `now()`. - -Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq2` running at DataNode if you set log level to DEBUG: - -```` -At **2021-05-11T22:18:40.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. 
-`cq2` generate 4 lines: -> -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:00.000+08:00| NULL| NULL| NULL| NULL| -|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -> -At **2021-05-11T22:18:50.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:10, 2021-05-11T22:18:50)`. -`cq2` generate 4 lines: -> -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -> -At **2021-05-11T22:19:00.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. -`cq2` generate 4 lines: -> -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| -|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -> -```` - -`cq2` won't write lines that are all null. Notice `cq2` will also calculate the results for some time interval many times. 
Here are the results: - -```` -> SELECT temperature_max from root.ln.*.*; -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| -|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -```` - -#### Configuring execution intervals and CQ time ranges - -Use an `EVERY` interval and `RANGE` interval in the `RESAMPLE` clause to specify the CQ’s execution interval and the length of the CQ’s time range. And use `fill()` to change the value reported for time intervals with no data. - -```sql -CREATE CONTINUOUS QUERY cq3 -RESAMPLE EVERY 20s RANGE 40s -BEGIN - SELECT max_value(temperature) - INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) - FROM root.ln.*.* - GROUP BY(10s) - FILL(100.0) -END -``` - -`cq3` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. Where possible, it writes the value `100.0` for time intervals with no results. - -`cq3` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq3` runs a single query that covers the time range between `now()` minus the `start_time_offset` and `now()`, that is, the time range between 40 seconds prior to `now()` and `now()`. - -Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq3` running at DataNode if you set log level to DEBUG: - -```` -At **2021-05-11T22:18:40.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. 
-`cq3` generate 4 lines: -> -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| -|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -> -At **2021-05-11T22:19:00.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. -`cq3` generate 4 lines: -> -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| -|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -> -```` - -Notice that `cq3` will calculate the results for some time interval many times, so here are the results: - -```` -> SELECT temperature_max from root.ln.*.*; -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| -|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| -|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -```` - -#### Configuring end_time_offset for CQ time range - -Use an `EVERY` interval and `RANGE` interval in the RESAMPLE clause to specify the CQ’s execution interval and the length of the CQ’s time range. And use `fill()` to change the value reported for time intervals with no data. 
- -```sql -CREATE CONTINUOUS QUERY cq4 -RESAMPLE EVERY 20s RANGE 40s, 20s -BEGIN - SELECT max_value(temperature) - INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) - FROM root.ln.*.* - GROUP BY(10s) - FILL(100.0) -END -``` - -`cq4` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. Where possible, it writes the value `100.0` for time intervals with no results. - -`cq4` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq4` runs a single query that covers the time range between `now()` minus the `start_time_offset` and `now()` minus the `end_time_offset`, that is, the time range between 40 seconds prior to `now()` and 20 seconds prior to `now()`. - -Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq4` running at DataNode if you set log level to DEBUG: - -```` -At **2021-05-11T22:18:40.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:20)`. -`cq4` generate 2 lines: -> -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| -|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -> -At **2021-05-11T22:19:00.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. 
-`cq4` generate 2 lines: -> -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -> -```` - -Notice that `cq4` will calculate the results for all time intervals only once after a delay of 20 seconds, so here are the results: - -```` -> SELECT temperature_max from root.ln.*.*; -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| -|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| -|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| -|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| -+-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ -```` - -#### CQ without group by clause - -Use an `EVERY` interval in the `RESAMPLE` clause to specify the CQ’s execution interval and the length of the CQ’s time range. - -```sql -CREATE CONTINUOUS QUERY cq5 -RESAMPLE EVERY 20s -BEGIN - SELECT temperature + 1 - INTO root.precalculated_sg.::(temperature) - FROM root.ln.*.* - align by device -END -``` - -`cq5` calculates the `temperature + 1` under the `root.ln` prefix path and stores the results in the `root.precalculated_sg` database. Sensors use the same prefix path as the corresponding sensor. - -`cq5` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq5` runs a single query that covers the time range for the current time bucket, that is, the 20-second time bucket that intersects with `now()`. - -Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq5` running at DataNode if you set log level to DEBUG: - -```` -At **2021-05-11T22:18:40.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. 
-`cq5` generate 16 lines: -> -+-----------------------------+-------------------------------+-----------+ -| Time| Device|temperature| -+-----------------------------+-------------------------------+-----------+ -|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| -|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| -|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| -|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| -|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| -|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| -|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| -|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| -|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| -|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| -|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| -|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| -|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| -|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| -|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| -|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| -+-----------------------------+-------------------------------+-----------+ -> -At **2021-05-11T22:19:00.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. -`cq5` generate 12 lines: -> -+-----------------------------+-------------------------------+-----------+ -| Time| Device|temperature| -+-----------------------------+-------------------------------+-----------+ -|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| -|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| -|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| -|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| -|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| -|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| -|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| -|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| -|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| -|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| -|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| -|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| -+-----------------------------+-------------------------------+-----------+ -> -```` - -`cq5` won't deal with data that is before the current time window which is `2021-05-11T22:18:20.000+08:00`, so here are the results: - -```` -> SELECT temperature from root.precalculated_sg.*.* align by device; -+-----------------------------+-------------------------------+-----------+ -| Time| Device|temperature| -+-----------------------------+-------------------------------+-----------+ -|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| -|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| -|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| -|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| -|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| -|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| 
-|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| -|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| -|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| -|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| -|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| -|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| -|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| -|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| -|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| -|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| -|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| -|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| -|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| -|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| -|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| -|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| -|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| -|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| -|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| -|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| -|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| -|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| -+-----------------------------+-------------------------------+-----------+ -```` - -### CQ Management - -#### Listing continuous queries - -List every CQ on the IoTDB Cluster with: - -```sql -SHOW (CONTINUOUS QUERIES | CQS) -``` - -`SHOW (CONTINUOUS QUERIES | CQS)` order results by `cq_id`. - -##### Examples - -```sql -SHOW CONTINUOUS QUERIES; -``` - -we will get: - -| cq_id | query | state | -| :---------- | ------------------------------------------------------------ | ------ | -| s1_count_cq | CREATE CQ s1_count_cq
BEGIN
SELECT count(s1)
INTO root.sg_count.d.count_s1
FROM root.sg.d
GROUP BY(30m)
END | active | - - -#### Dropping continuous queries - -Drop a CQ with a specific `cq_id`: - -```sql -DROP (CONTINUOUS QUERY | CQ) -``` - -DROP CQ returns an empty result. - -##### Examples - -Drop the CQ named `s1_count_cq`: - -```sql -DROP CONTINUOUS QUERY s1_count_cq; -``` - -#### Altering continuous queries - -CQs can't be altered once they're created. To change a CQ, you must `DROP` and re`CREATE` it with the updated settings. - - -### CQ Use Cases - -#### Downsampling and Data Retention - -Use CQs with `TTL` set on database in IoTDB to mitigate storage concerns. Combine CQs and `TTL` to automatically downsample high precision data to a lower precision and remove the dispensable, high precision data from the database. - -#### Recalculating expensive queries - -Shorten query runtimes by pre-calculating expensive queries with CQs. Use a CQ to automatically downsample commonly-queried, high precision data to a lower precision. Queries on lower precision data require fewer resources and return faster. - -> Pre-calculate queries for your preferred graphing tool to accelerate the population of graphs and dashboards. - -#### Substituting for sub-query - -IoTDB does not support sub queries. We can get the same functionality by creating a CQ as a sub query and store its result into other time series and then querying from those time series again will be like doing nested sub query. - -##### Example - -IoTDB does not accept the following query with a nested sub query. The query calculates the average number of non-null values of `s1` at 30 minute intervals: - -```sql -SELECT avg(count_s1) from (select count(s1) as count_s1 from root.sg.d group by([0, now()), 30m)); -``` - -To get the same results: - -**Create a CQ** - -This step performs the nested sub query in from clause of the query above. The following CQ automatically calculates the number of non-null values of `s1` at 30 minute intervals and writes those counts into the new `root.sg_count.d.count_s1` time series. - -```sql -CREATE CQ s1_count_cq -BEGIN - SELECT count(s1) - INTO root.sg_count.d(count_s1) - FROM root.sg.d - GROUP BY(30m) -END -``` - -**Query the CQ results** - -Next step performs the avg([...]) part of the outer query above. - -Query the data in the time series `root.sg_count.d.count_s1` to calculate the average of it: - -```sql -SELECT avg(count_s1) from root.sg_count.d; -``` - - -### System Parameter Configuration - -| Name | Description | Data Type | Default Value | -| :------------------------------------------ | ------------------------------------------------------------ | --------- | ------------- | -| `continuous_query_submit_thread_count` | The number of threads in the scheduled thread pool that submit continuous query tasks periodically | int32 | 2 | -| `continuous_query_min_every_interval_in_ms` | The minimum value of the continuous query execution time interval | duration | 1000 | diff --git a/src/UserGuide/V1.3.0-2/User-Manual/IoTDB-View_timecho.md b/src/UserGuide/V1.3.0-2/User-Manual/IoTDB-View_timecho.md deleted file mode 100644 index 161890722..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/IoTDB-View_timecho.md +++ /dev/null @@ -1,548 +0,0 @@ - - -# View - -## Sequence View Application Background - -## Application Scenario 1 Time Series Renaming (PI Asset Management) - -In practice, the equipment collecting data may be named with identification numbers that are difficult to be understood by human beings, which brings difficulties in querying to the business layer. 
- -The Sequence View, on the other hand, is able to re-organise the management of these sequences and access them using a new model structure without changing the original sequence content and without the need to create new or copy sequences. - -**For example**: a cloud device uses its own NIC MAC address to form entity numbers and stores data by writing the following time sequence:`root.db.0800200A8C6D.xvjeifg`. - -It is difficult for the user to understand. However, at this point, the user is able to rename it using the sequence view feature, map it to a sequence view, and use `root.view.device001.temperature` to access the captured data. - -### Application Scenario 2 Simplifying business layer query logic - -Sometimes users have a large number of devices that manage a large number of time series. When conducting a certain business, the user wants to deal with only some of these sequences. At this time, the focus of attention can be picked out by the sequence view function, which is convenient for repeated querying and writing. - -**For example**: Users manage a product assembly line with a large number of time series for each segment of the equipment. The temperature inspector only needs to focus on the temperature of the equipment, so he can extract the temperature-related sequences and compose the sequence view. - -### Application Scenario 3 Auxiliary Rights Management - -In the production process, different operations are generally responsible for different scopes. For security reasons, it is often necessary to restrict the access scope of the operations staff through permission management. - -**For example**: The safety management department now only needs to monitor the temperature of each device in a production line, but these data are stored in the same database with other confidential data. At this point, it is possible to create a number of new views that contain only temperature-related time series on the production line, and then to give the security officer access to only these sequence views, thus achieving the purpose of permission restriction. - -### Motivation for designing sequence view functionality - -Combining the above two types of usage scenarios, the motivations for designing sequence view functionality, are: - -1. time series renaming. -2. to simplify the query logic at the business level. -3. Auxiliary rights management, open data to specific users through the view. - -## Sequence View Concepts - -### Terminology Concepts - -Concept: If not specified, the views specified in this document are **Sequence Views**, and new features such as device views may be introduced in the future. - -### Sequence view - -A sequence view is a way of organising the management of time series. - -In traditional relational databases, data must all be stored in a table, whereas in time series databases such as IoTDB, it is the sequence that is the storage unit. Therefore, the concept of sequence views in IoTDB is also built on sequences. - -A sequence view is a virtual time series, and each virtual time series is like a soft link or shortcut that maps to a sequence or some kind of computational logic external to a certain view. In other words, a virtual sequence either maps to some defined external sequence or is computed from multiple external sequences. - -Users can create views using complex SQL queries, where the sequence view acts as a stored query statement, and when data is read from the view, the stored query statement is used as the source of the data in the FROM clause. 
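As a concrete sketch of these two cases, reusing series that appear elsewhere on this page (the full creation syntax is described in the functionality section below), a sequence view can either alias one existing series or wrap a computation over several series:

```SQL
-- Alias view: a one-to-one mapping onto an existing series (writable)
CREATE VIEW root.view.device001.temperature
AS
  root.db.0800200A8C6D.xvjeifg

-- Computed view: derived from other series (read-only)
CREATE VIEW root.db.d01.avg_temperature
AS
  SELECT (temperature01 + temperature02) / 2
  FROM root.db.d01
```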
- -### Alias Sequences - -There is a special class of beings in a sequence view that satisfy all of the following conditions: - -1. the data source is a single time series -2. there is no computational logic -3. no filtering conditions (e.g., no WHERE clause restrictions). - -Such a sequence view is called an **alias sequence**, or alias sequence view. A sequence view that does not fully satisfy all of the above conditions is called a non-alias sequence view. The difference between them is that only aliased sequences support write functionality. - -** All sequence views, including aliased sequences, do not currently support Trigger functionality. ** - -### Nested Views - -A user may want to select a number of sequences from an existing sequence view to form a new sequence view, called a nested view. - -**The current version does not support the nested view feature**. - -### Some constraints on sequence views in IoTDB - -#### Constraint 1 A sequence view must depend on one or several time series - -A sequence view has two possible forms of existence: - -1. it maps to a time series -2. it is computed from one or more time series. - -The former form of existence has been exemplified in the previous section and is easy to understand; the latter form of existence here is because the sequence view allows for computational logic. - -For example, the user has installed two thermometers in the same boiler and now needs to calculate the average of the two temperature values as a measurement. The user has captured the following two sequences: `root.db.d01.temperature01`, `root.db.d01.temperature02`. - -At this point, the user can use the average of the two sequences as one sequence in the view: `root.db.d01.avg_temperature`. - -This example will 3.1.2 expand in detail. - -#### Restriction 2 Non-alias sequence views are read-only - -Writing to non-alias sequence views is not allowed. - -Only aliased sequence views are supported for writing. - -#### Restriction 3 Nested views are not allowed - -It is not possible to select certain columns in an existing sequence view to create a sequence view, either directly or indirectly. - -An example of this restriction will be given in 3.1.3. - -#### Restriction 4 Sequence view and time series cannot be renamed - -Both sequence views and time series are located under the same tree, so they cannot be renamed. - -The name (path) of any sequence should be uniquely determined. - -#### Restriction 5 Sequence views share timing data with time series, metadata such as labels are not shared - -Sequence views are mappings pointing to time series, so they fully share timing data, with the time series being responsible for persistent storage. - -However, their metadata such as tags and attributes are not shared. - -This is because the business query, view-oriented users are concerned about the structure of the current view, and if you use group by tag and other ways to do the query, obviously want to get the view contains the corresponding tag grouping effect, rather than the time series of the tag grouping effect (the user is not even aware of those time series). - -## Sequence view functionality - -### Creating a view - -Creating a sequence view is similar to creating a time series, the difference is that you need to specify the data source, i.e., the original sequence, through the AS keyword. 
- -#### SQL for creating a view - -User can select some sequences to create a view: - -```SQL -CREATE VIEW root.view.device.status -AS - SELECT s01 - FROM root.db.device -``` - -It indicates that the user has selected the sequence `s01` from the existing device `root.db.device`, creating the sequence view `root.view.device.status`. - -The sequence view can exist under the same entity as the time series, for example: - -```SQL -CREATE VIEW root.db.device.status -AS - SELECT s01 - FROM root.db.device -``` - -Thus, there is a virtual copy of `s01` under `root.db.device`, but with a different name `status`. - -It can be noticed that the sequence views in both of the above examples are aliased sequences, and we are giving the user a more convenient way of creating a sequence for that sequence: - -```SQL -CREATE VIEW root.view.device.status -AS - root.db.device.s01 -``` - -#### Creating views with computational logic - -Following the example in section 2.2 Limitations 1: - -> A user has installed two thermometers in the same boiler and now needs to calculate the average of the two temperature values as a measurement. The user has captured the following two sequences: `root.db.d01.temperature01`, `root.db.d01.temperature02`. -> -> At this point, the user can use the two sequences averaged as one sequence in the view: `root.view.device01.avg_temperature`. - -If the view is not used, the user can query the average of the two temperatures like this: - -```SQL -SELECT (temperature01 + temperature02) / 2 -FROM root.db.d01 -``` - -And if using a sequence view, the user can create a view this way to simplify future queries: - -```SQL -CREATE VIEW root.db.d01.avg_temperature -AS - SELECT (temperature01 + temperature02) / 2 - FROM root.db.d01 -``` - -The user can then query it like this: - -```SQL -SELECT avg_temperature FROM root.db.d01 -``` - -#### Nested sequence views not supported - -Continuing with the example from 3.1.2, the user now wants to create a new view using the sequence view `root.db.d01.avg_temperature`, which is not allowed. We currently do not support nested views, whether it is an aliased sequence or not. - -For example, the following SQL statement will report an error: - -```SQL -CREATE VIEW root.view.device.avg_temp_copy -AS - root.db.d01.avg_temperature -- Not supported. 
Nested views are not allowed -``` - -#### Creating multiple sequence views at once - -If only one sequence view can be specified at a time which is not convenient for the user to use, then multiple sequences can be specified at a time, for example: - -```SQL -CREATE VIEW root.db.device.status, root.db.device.sub.hardware -AS - SELECT s01, s02 - FROM root.db.device -``` - -此外,上述写法可以做简化: - -```SQL -CREATE VIEW root.db.device(status, sub.hardware) -AS - SELECT s01, s02 - FROM root.db.device -``` - -Both statements above are equivalent to the following typing: - -```SQL -CREATE VIEW root.db.device.status -AS - SELECT s01 - FROM root.db.device; - -CREATE VIEW root.db.device.sub.hardware -AS - SELECT s02 - FROM root.db.device -``` - -is also equivalent to the following: - -```SQL -CREATE VIEW root.db.device.status, root.db.device.sub.hardware -AS - root.db.device.s01, root.db.device.s02 - --- or - -CREATE VIEW root.db.device(status, sub.hardware) -AS - root.db.device(s01, s02) -``` - -##### The mapping relationships between all sequences are statically stored - -Sometimes, the SELECT clause may contain a number of statements that can only be determined at runtime, such as below: - -```SQL -SELECT s01, s02 -FROM root.db.d01, root.db.d02 -``` - -The number of sequences that can be matched by the above statement is uncertain and is related to the state of the system. Even so, the user can use it to create views. - -However, it is important to note that the mapping relationship between all sequences is stored statically (fixed at creation)! Consider the following example: - -The current database contains only three sequences `root.db.d01.s01`, `root.db.d02.s01`, `root.db.d02.s02`, and then the view is created: - -```SQL -CREATE VIEW root.view.d(alpha, beta, gamma) -AS - SELECT s01, s02 - FROM root.db.d01, root.db.d02 -``` - -The mapping relationship between time series is as follows: - -| sequence number | time series | sequence view | -| ---- | ----------------- | ----------------- | -| 1 | `root.db.d01.s01` | root.view.d.alpha | -| 2 | `root.db.d02.s01` | root.view.d.beta | -| 3 | `root.db.d02.s02` | root.view.d.gamma | - -After that, if the user adds the sequence `root.db.d01.s02`, it does not correspond to any view; then, if the user deletes `root.db.d01.s01`, the query for `root.view.d.alpha` will report an error directly, and it will not correspond to `root.db.d01.s02` either. - -Please always note that inter-sequence mapping relationships are stored statically and solidly. - -#### Batch Creation of Sequence Views - -There are several existing devices, each with a temperature value, for example: - -1. root.db.d1.temperature -2. root.db.d2.temperature -3. ... - -There may be many other sequences stored under these devices (e.g. `root.db.d1.speed`), but for now it is possible to create a view that contains only the temperature values for these devices, without relation to the other sequences:. - -```SQL -CREATE VIEW root.db.view(${2}_temperature) -AS - SELECT temperature FROM root.db.* -``` - -This is modelled on the query writeback (`SELECT INTO`) convention for naming rules, which uses variable placeholders to specify naming rules. See also: [QUERY WRITEBACK (SELECT INTO)](../User-Manual/Query-Data.md#into-clause-query-write-back) - -Here `root.db.*.temperature` specifies what time series will be included in the view; and `${2}` specifies from which node in the time series the name is extracted to name the sequence view. 
- -Here, `${2}` refers to level 2 (starting at 0) of `root.db.*.temperature`, which is the result of the `*` match; and `${2}_temperature` is the result of the match and `temperature` spliced together with underscores to make up the node names of the sequences under the view. - -The above statement for creating a view is equivalent to the following writeup: - -```SQL -CREATE VIEW root.db.view(${2}_${3}) -AS - SELECT temperature from root.db.* -``` - -The final view contains these sequences: - -1. root.db.view.d1_temperature -2. root.db.view.d2_temperature -3. ... - -Created using wildcards, only static mapping relationships at the moment of creation will be stored. - -#### SELECT clauses are somewhat limited when creating views - -The SELECT clause used when creating a serial view is subject to certain restrictions. The main restrictions are as follows: - -1. the `WHERE` clause cannot be used. -2. `GROUP BY` clause cannot be used. -3. `MAX_VALUE` and other aggregation functions cannot be used. - -Simply put, after `AS` you can only use `SELECT ... FROM ... ` and the results of this query must form a time series. - -### View Data Queries - -For the data query functions that can be supported, the sequence view and time series can be used indiscriminately with identical behaviour when performing time series data queries. - -**The types of queries that are not currently supported by the sequence view are as follows:** - -1. **align by device query -2. **group by tags query - -Users can also mix time series and sequence view queries in the same SELECT statement, for example: - -```SQL -SELECT temperature01, temperature02, avg_temperature -FROM root.db.d01 -WHERE temperature01 < temperature02 -``` - -However, if the user wants to query the metadata of the sequence, such as tag, attributes, etc., the query is the result of the sequence view, not the result of the time series referenced by the sequence view. - -In addition, for aliased sequences, if the user wants to get information about the time series such as tags, attributes, etc., the user needs to query the mapping of the view columns to find the corresponding time series, and then query the time series for the tags, attributes, etc. The method of querying the mapping of the view columns will be explained in section 3.5. - -### Modify Views - -The modification operations supported by the view include: modifying its calculation logic,modifying tag/attributes/aliases, and deleting. - -#### Modify view data source - -```SQL -ALTER VIEW root.view.device.status -AS - SELECT s01 - FROM root.ln.wf.d01 -``` - -#### Modify the view's calculation logic - -```SQL -ALTER VIEW root.db.d01.avg_temperature -AS - SELECT (temperature01 + temperature02 + temperature03) / 3 - FROM root.db.d01 -``` - -#### Tag point management - -- Add a new -tag -```SQL -ALTER view root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 -``` - -- Add a new attribute - -```SQL -ALTER view root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 -``` - -- rename tag or attribute - -```SQL -ALTER view root.turbine.d1.s1 RENAME tag1 TO newTag1 -``` - -- Reset the value of a tag or attribute - -```SQL -ALTER view root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 -``` - -- Delete an existing tag or attribute - -```SQL -ALTER view root.turbine.d1.s1 DROP tag1, tag2 -``` - -- Update insert aliases, tags and attributes - -> If the alias, tag or attribute did not exist before, insert it, otherwise, update the old value with the new one. 
- -```SQL -ALTER view root.turbine.d1.s1 UPSERT TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4) -``` - -#### Deleting Views - -Since a view is a sequence, a view can be deleted as if it were a time series. - - -```SQL -DELETE VIEW root.view.device.avg_temperatue -``` - -### View Synchronisation - - -#### If the dependent original sequence is deleted - -When the sequence view is queried (when the sequence is parsed), **the empty result set** is returned if the dependent time series does not exist. - -This is similar to the feedback for querying a non-existent sequence, but with a difference: if the dependent time series cannot be parsed, the empty result set is the one that contains the table header as a reminder to the user that the view is problematic. - -Additionally, when the dependent time series is deleted, no attempt is made to find out if there is a view that depends on the column, and the user receives no warning. - -#### Data Writes to Non-Aliased Sequences Not Supported - -Writes to non-alias sequences are not supported. - -Please refer to the previous section 2.1.6 Restrictions2 for more details. - -#### Metadata for sequences is not shared - -Please refer to the previous section 2.1.6 Restriction 5 for details. - -### View Metadata Queries - -View metadata query specifically refers to querying the metadata of the view itself (e.g., how many columns the view has), as well as information about the views in the database (e.g., what views are available). - -#### Viewing Current View Columns - -The user has two ways of querying: - -1. a query using `SHOW TIMESERIES`, which contains both time series and series views. This query contains both the time series and the sequence view. However, only some of the attributes of the view can be displayed. -2. a query using `SHOW VIEW`, which contains only the sequence view. It displays the complete properties of the sequence view. - -Example: - -```Shell -IoTDB> show timeseries; -+--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| -+--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -|root.db.device.s01 | null| root.db| INT32| RLE| SNAPPY|null| null| null| null| BASE| -+--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -|root.db.view.status | null| root.db| INT32| RLE| SNAPPY|null| null| null| null| VIEW| -+--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -|root.db.d01.temp01 | null| root.db| FLOAT| RLE| SNAPPY|null| null| null| null| BASE| -+--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -|root.db.d01.temp02 | null| root.db| FLOAT| RLE| SNAPPY|null| null| null| null| BASE| -+--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -|root.db.d01.avg_temp| null| root.db| FLOAT| null| null|null| null| null| null| VIEW| -+--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -Total line number = 5 -It costs 0.789s -IoTDB> -``` - -The last column `ViewType` shows the type of the sequence, the time series is BASE and the sequence view is VIEW. 
- -In addition, some of the sequence view properties will be missing, for example `root.db.d01.avg_temp` is calculated from temperature averages, so the `Encoding` and `Compression` properties are null values. - -In addition, the query results of the `SHOW TIMESERIES` statement are divided into two main parts. - -1. information about the timing data, such as data type, compression, encoding, etc. -2. other metadata information, such as tag, attribute, database, etc. - -For the sequence view, the temporal data information presented is the same as the original sequence or null (e.g., the calculated average temperature has a data type but no compression method); the metadata information presented is the content of the view. - -To learn more about the view, use `SHOW ``VIEW`. The `SHOW ``VIEW` shows the source of the view's data, etc. - -```Shell -IoTDB> show VIEW root.**; -+--------------------+--------+--------+----+----------+--------+-----------------------------------------+ -| Timeseries|Database|DataType|Tags|Attributes|ViewType| SOURCE| -+--------------------+--------+--------+----+----------+--------+-----------------------------------------+ -|root.db.view.status | root.db| INT32|null| null| VIEW| root.db.device.s01| -+--------------------+--------+--------+----+----------+--------+-----------------------------------------+ -|root.db.d01.avg_temp| root.db| FLOAT|null| null| VIEW|(root.db.d01.temp01+root.db.d01.temp02)/2| -+--------------------+--------+--------+----+----------+--------+-----------------------------------------+ -Total line number = 2 -It costs 0.789s -IoTDB> -``` - -The last column, `SOURCE`, shows the data source for the sequence view, listing the SQL statement that created the sequence. - -##### About Data Types - -Both of the above queries involve the data type of the view. The data type of a view is inferred from the original time series type of the query statement or alias sequence that defines the view. This data type is computed in real time based on the current state of the system, so the data type queried at different moments may be changing. - -## FAQ - -#### Q1: I want the view to implement the function of type conversion. For example, a time series of type int32 was originally placed in the same view as other series of type int64. I now want all the data queried through the view to be automatically converted to int64 type. - -> Ans: This is not the function of the sequence view. But the conversion can be done using `CAST`, for example: - -```SQL -CREATE VIEW root.db.device.int64_status -AS - SELECT CAST(s1, 'type'='INT64') from root.db.device -``` - -> This way, a query for `root.view.status` will yield a result of type int64. -> -> Please note in particular that in the above example, the data for the sequence view is obtained by `CAST` conversion, so `root.db.device.int64_status` is not an aliased sequence, and thus **not supported for writing**. - -#### Q2: Is default naming supported? Select a number of time series and create a view; but I don't specify the name of each series, it is named automatically by the database? - -> Ans: Not supported. Users must specify the naming explicitly. - -#### Q3: In the original system, create time series `root.db.device.s01`, you can find that database `root.db` is automatically created and device `root.db.device` is automatically created. Next, deleting the time series `root.db.device.s01` reveals that `root.db.device` was automatically deleted, while `root.db` remained. 
Will this mechanism be followed for creating views? What are the considerations? - -> Ans: Keep the original behaviour unchanged, the introduction of view functionality will not change these original logics. - -#### Q4: Does it support sequence view renaming? - -> A: Renaming is not supported in the current version, you can create your own view with new name to put it into use. \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Maintennance.md b/src/UserGuide/V1.3.0-2/User-Manual/Maintennance.md deleted file mode 100644 index d0d4c8326..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Maintennance.md +++ /dev/null @@ -1,371 +0,0 @@ - - - - -# Maintennance - -## Explain/Explain Analyze Statements - -The purpose of query analysis is to assist users in understanding the execution mechanism and performance bottlenecks of queries, thereby facilitating query optimization and performance enhancement. This is crucial not only for the efficiency of query execution but also for the user experience of applications and the efficient utilization of resources. For effective query analysis, IoTDB versions V1.3.2 and above offer the query analysis statements: Explain and Explain Analyze. - -- Explain Statement: The Explain statement allows users to preview the execution plan of a query SQL, including how IoTDB organizes data retrieval and processing. - -- Explain Analyze Statement: The Explain Analyze statement builds upon the Explain statement by incorporating performance analysis, fully executing the SQL, and displaying the time and resource consumption during the query execution process. This provides IoTDB users with detailed information to deeply understand the details of the query and to perform query optimization. Compared to other common IoTDB troubleshooting methods, Explain Analyze imposes no deployment burden and can analyze a single SQL statement, which can better pinpoint issues. - -The comparison of various methods is as follows: - -| Method | Installation Difficulty | Business Impact | Functional Scope | -| :------------------ | :----------------------------------------------------------- | :------------------------------------------------ | :----------------------------------------------------------- | -| Explain Analyze Statement | Low. No additional components are needed; it's a built-in SQL statement of IoTDB. | Low. It only affects the single query being analyzed, with no impact on other online loads. | Supports distributed systems, and can track a single SQL statement. | -| Monitoring Panel | Medium. Requires the installation of the IoTDB monitoring panel tool (an enterprise version tool) and the activation of the IoTDB monitoring service. | Medium. The IoTDB monitoring service's recording of metrics will introduce additional latency. | Supports distributed systems, but only analyzes the overall query load and time consumption of the database. | -| Arthas Sampling | Medium. Requires the installation of the Java Arthas tool (Arthas cannot be directly installed in some intranets, and sometimes a restart of the application is needed after installation). | High. CPU sampling may affect the response speed of online business. | Does not support distributed systems and only analyzes the overall query load and time consumption of the database. | - -### Explain Statement - -#### Syntax - -The Explain command enables users to view the execution plan of a SQL query. The execution plan is presented in the form of operators, describing how IoTDB will execute the query. 
The syntax is as follows, where SELECT_STATEMENT is the SQL statement related to the query: - -```SQL -EXPLAIN -``` - -The results returned by Explain include information such as data access strategies, whether filter conditions are pushed down, and how the query plan is distributed across different nodes, providing users with a means to visualize the internal execution logic of the query. - -#### Example - -```SQL - -# Insert data - -insert into root.explain.data(timestamp, column1, column2) values(1710494762, "hello", "explain") - -# Execute explain statement - -explain select * from root.explain.data -``` - -Executing the above SQL will yield the following results. It is evident that IoTDB uses two SeriesScan nodes to retrieve the data for column1 and column2, and finally connects them through a fullOuterTimeJoin. - -```Plain -+-----------------------------------------------------------------------+ -| distribution plan| -+-----------------------------------------------------------------------+ -| ┌───────────────────┐ | -| │FullOuterTimeJoin-3│ | -| │Order: ASC │ | -| └───────────────────┘ | -| ┌─────────────────┴─────────────────┐ | -| │ │ | -|┌─────────────────────────────────┐ ┌─────────────────────────────────┐| -|│SeriesScan-4 │ │SeriesScan-5 │| -|│Series: root.explain.data.column1│ │Series: root.explain.data.column2│| -|│Partition: 3 │ │Partition: 3 │| -|└─────────────────────────────────┘ └─────────────────────────────────┘| -+-----------------------------------------------------------------------+ -``` - -### Explain Analyze Statement - -#### Syntax - -Explain Analyze is a performance analysis SQL that comes with the IoTDB query engine. Unlike Explain, it executes the corresponding query plan and collects execution information, which can be used to track the specific performance distribution of a query, for observing resources, performance tuning, and anomaly analysis. The syntax is as follows: - -```SQL -EXPLAIN ANALYZE [VERBOSE] -``` - -Where SELECT_STATEMENT corresponds to the query statement that needs to be analyzed; VERBOSE prints detailed analysis results, and when VERBOSE is not filled in, EXPLAIN ANALYZE will omit some information. - -In the EXPLAIN ANALYZE result set, the following information is included: - - -![explain-analyze-1.png](/img/explain-analyze-1.png) - - -- QueryStatistics contains statistical information at the query level, mainly including the time spent in the planning and parsing phase, Fragment metadata, and other information. -- FragmentInstance is an encapsulation of the query plan on a node by IoTDB. Each node will output a Fragment information in the result set, mainly including FragmentStatistics and operator information. FragmentStatistics contains statistical information of the Fragment, including total actual time (wall time), TsFile involved, scheduling information, etc. At the same time, the statistical information of the plan nodes under this Fragment will be displayed in a hierarchical way of the node tree, mainly including: CPU running time, the number of output data rows, the number of times the specified interface is called, the memory occupied, and the custom information exclusive to the node. - -#### Special Instructions - -1. Simplification of Explain Analyze Statement Results - -Since the Fragment will output all the node information executed in the current node, when a query involves too many series, each node is output, which will cause the result set returned by Explain Analyze to be too large. 
Therefore, when the same type of node exceeds 10, the system will automatically merge all the same types of nodes under the current Fragment, and the merged statistical information is also accumulated. Some custom information that cannot be merged will be directly discarded (as shown in the figure below). - -![explain-analyze-2.png](/img/explain-analyze-2.png) - -Users can also modify the configuration item `merge_threshold_of_explain_analyze` in `iotdb-common.properties` to set the threshold for triggering the merge of nodes. This parameter supports hot loading. - -2. Use of Explain Analyze Statement in Query Timeout Scenarios - -Explain Analyze itself is a special query. When the execution times out, it cannot be analyzed with the Explain Analyze statement. In order to be able to investigate the cause of the timeout through the analysis results even when the query times out, Explain Analyze also provides a timing log mechanism (no user configuration is required), which will output the current results of Explain Analyze in the form of text to a special log at a certain time interval. When the query times out, users can go to `logs/log_explain_analyze.log` to check the corresponding log for investigation. - -The time interval of the log is calculated based on the query timeout time to ensure that at least two result records will be saved before the timeout. - -#### Example - -Here is an example of Explain Analyze: - -```SQL - -# Insert data - -insert into root.explain.analyze.data(timestamp, column1, column2, column3) values(1710494762, "hello", "explain", "analyze") -insert into root.explain.analyze.data(timestamp, column1, column2, column3) values(1710494862, "hello2", "explain2", "analyze2") -insert into root.explain.analyze.data(timestamp, column1, column2, column3) values(1710494962, "hello3", "explain3", "analyze3") - -# Execute explain analyze statement - -explain analyze select column2 from root.explain.analyze.data order by column1 -``` - -The output is as follows: - - -```Plain -+-------------------------------------------------------------------------------------------------+ -| Explain Analyze| -+-------------------------------------------------------------------------------------------------+ -|Analyze Cost: 1.739 ms | -|Fetch Partition Cost: 0.940 ms | -|Fetch Schema Cost: 0.066 ms | -|Logical Plan Cost: 0.000 ms | -|Logical Optimization Cost: 0.000 ms | -|Distribution Plan Cost: 0.000 ms | -|Fragment Instances Count: 1 | -| | -|FRAGMENT-INSTANCE[Id: 20240315_115800_00030_1.2.0][IP: 127.0.0.1][DataRegion: 4][State: FINISHED]| -| Total Wall Time: 25 ms | -| Cost of initDataQuerySource: 0.175 ms | -| Seq File(unclosed): 0, Seq File(closed): 1 | -| UnSeq File(unclosed): 0, UnSeq File(closed): 0 | -| ready queued time: 0.280 ms, blocked queued time: 2.456 ms | -| [PlanNodeId 10]: IdentitySinkNode(IdentitySinkOperator) | -| CPU Time: 0.780 ms | -| output: 1 rows | -| HasNext() Called Count: 3 | -| Next() Called Count: 2 | -| Estimated Memory Size: : 1245184 | -| [PlanNodeId 5]: TransformNode(TransformOperator) | -| CPU Time: 0.764 ms | -| output: 1 rows | -| HasNext() Called Count: 3 | -| Next() Called Count: 2 | -| Estimated Memory Size: : 1245184 | -| [PlanNodeId 4]: SortNode(SortOperator) | -| CPU Time: 0.721 ms | -| output: 1 rows | -| HasNext() Called Count: 3 | -| Next() Called Count: 2 | -| sortCost/ns: 1125 | -| sortedDataSize: 272 | -| prepareCost/ns: 610834 | -| [PlanNodeId 3]: FullOuterTimeJoinNode(FullOuterTimeJoinOperator) | -| CPU Time: 0.706 ms | -| output: 1 rows | 
-| HasNext() Called Count: 5 | -| Next() Called Count: 1 | -| [PlanNodeId 7]: SeriesScanNode(SeriesScanOperator) | -| CPU Time: 1.085 ms | -| output: 1 rows | -| HasNext() Called Count: 2 | -| Next() Called Count: 1 | -| SeriesPath: root.explain.analyze.data.column2 | -| [PlanNodeId 8]: SeriesScanNode(SeriesScanOperator) | -| CPU Time: 1.091 ms | -| output: 1 rows | -| HasNext() Called Count: 2 | -| Next() Called Count: 1 | -| SeriesPath: root.explain.analyze.data.column1 | -+-------------------------------------------------------------------------------------------------+ -``` - -Example of Partial Results After Triggering Merge: - -```Plain -Analyze Cost: 143.679 ms -Fetch Partition Cost: 22.023 ms -Fetch Schema Cost: 63.086 ms -Logical Plan Cost: 0.000 ms -Logical Optimization Cost: 0.000 ms -Distribution Plan Cost: 0.000 ms -Fragment Instances Count: 2 - -FRAGMENT-INSTANCE[Id: 20240311_041502_00001_1.2.0][IP: 192.168.130.9][DataRegion: 14] - Total Wall Time: 39964 ms - Cost of initDataQuerySource: 1.834 ms - Seq File(unclosed): 0, Seq File(closed): 3 - UnSeq File(unclosed): 0, UnSeq File(closed): 0 - ready queued time: 504.334 ms, blocked queued time: 25356.419 ms - [PlanNodeId 20793]: IdentitySinkNode(IdentitySinkOperator) Count: * 1 - CPU Time: 24440.724 ms - input: 71216 rows - HasNext() Called Count: 35963 - Next() Called Count: 35962 - Estimated Memory Size: : 33882112 - [PlanNodeId 10385]: FullOuterTimeJoinNode(FullOuterTimeJoinOperator) Count: * 8 - CPU Time: 41437.708 ms - input: 243011 rows - HasNext() Called Count: 41965 - Next() Called Count: 41958 - Estimated Memory Size: : 33882112 - [PlanNodeId 11569]: SeriesScanNode(SeriesScanOperator) Count: * 1340 - CPU Time: 1397.822 ms - input: 134000 rows - HasNext() Called Count: 2353 - Next() Called Count: 1340 - Estimated Memory Size: : 32833536 - [PlanNodeId 20778]: ExchangeNode(ExchangeOperator) Count: * 7 - CPU Time: 109.245 ms - input: 71891 rows - HasNext() Called Count: 1431 - Next() Called Count: 1431 - -FRAGMENT-INSTANCE[Id: 20240311_041502_00001_1.3.0][IP: 192.168.130.9][DataRegion: 11] - Total Wall Time: 39912 ms - Cost of initDataQuerySource: 15.439 ms - Seq File(unclosed): 0, Seq File(closed): 2 - UnSeq File(unclosed): 0, UnSeq File(closed): 0 - ready queued time: 152.988 ms, blocked queued time: 37775.356 ms - [PlanNodeId 20786]: IdentitySinkNode(IdentitySinkOperator) Count: * 1 - CPU Time: 2020.258 ms - input: 48800 rows - HasNext() Called Count: 978 - Next() Called Count: 978 - Estimated Memory Size: : 42336256 - [PlanNodeId 20771]: FullOuterTimeJoinNode(FullOuterTimeJoinOperator) Count: * 8 - CPU Time: 5255.307 ms - input: 195800 rows - HasNext() Called Count: 2455 - Next() Called Count: 2448 - Estimated Memory Size: : 42336256 - [PlanNodeId 11867]: SeriesScanNode(SeriesScanOperator) Count: * 1680 - CPU Time: 1248.080 ms - input: 168000 rows - HasNext() Called Count: 3198 - Next() Called Count: 1680 - Estimated Memory Size: : 41287680 - -...... -``` - - -### Common Issues - -#### What is the difference between WALL TIME and CPU TIME? - -CPU time, also known as processor time or CPU usage time, refers to the actual time the CPU is occupied with computation during the execution of a program, indicating the actual consumption of processor resources by the program. - -Wall time, also known as real time or physical time, refers to the total time from the start to the end of a program's execution, including all waiting times. - -1. 
Scenarios where WALL TIME < CPU TIME: For example, a query slice is finally executed in parallel by the scheduler using two threads. In the real physical world, 10 seconds have passed, but the two threads may have occupied two CPU cores and run for 10 seconds each, so the CPU time would be 20 seconds, while the wall time would be 10 seconds. - -2. Scenarios where WALL TIME > CPU TIME: Since there may be multiple queries running in parallel within the system, but the number of execution threads and memory is fixed, - 1. So when a query slice is blocked by some resources (such as not having enough memory for data transfer or waiting for upstream data), it will be put into the Blocked Queue. At this time, the query slice will not occupy CPU time, but the WALL TIME (real physical time) is still advancing. - 2. Or when the query thread resources are insufficient, for example, there are currently 16 query threads in total, but there are 20 concurrent query slices within the system. Even if all queries are not blocked, only 16 query slices can run in parallel at the same time, and the other four will be put into the READY QUEUE, waiting to be scheduled for execution. At this time, the query slice will not occupy CPU time, but the WALL TIME (real physical time) is still advancing. - -#### Is there any additional overhead with Explain Analyze, and is the measured time different from when the query is actually executed? - -Almost none, because the explain analyze operator is executed by a separate thread to collect the statistical information of the original query, and these statistical information, even if not explain analyze, the original query will also generate, but no one goes to get it. And explain analyze is a pure next traversal of the result set, which will not be printed, so there will be no significant difference from the actual execution time of the original query. - -#### What are the main indicators to focus on for IO time consumption? - -The main indicators that may involve IO time consumption are loadTimeSeriesMetadataDiskSeqTime, loadTimeSeriesMetadataDiskUnSeqTime, and construct[NonAligned/Aligned]ChunkReadersDiskTime. - -The loading of TimeSeriesMetadata statistics is divided into sequential and unaligned files, but the reading of Chunks is not temporarily separated, but the proportion of sequential and unaligned can be calculated based on the proportion of TimeSeriesMetadata. - -#### Can the impact of unaligned data on query performance be demonstrated with some indicators? - -There are mainly two impacts of unaligned data: - -1. An additional merge sort needs to be done in memory (it is generally believed that this time consumption is relatively short, after all, it is a pure memory CPU operation) - -2. Unaligned data will generate overlapping time ranges between data blocks, making statistical information unusable - 1. Unable to directly skip the entire chunk that does not meet the value filtering requirements using statistical information - 1. Generally, the user's query only includes time filtering conditions, so there will be no impact - 2. Unable to directly calculate the aggregate value using statistical information without reading the data - -At present, there is no effective observation method for the performance impact of unaligned data alone, unless a query is executed when there is unaligned data, and then executed again after the unaligned data is merged, in order to compare. 
- -Because even if this part of the unaligned data is entered into the sequence, IO, compression, and decoding are also required. This time cannot be reduced, and it will not be reduced just because the unaligned data has been merged into the unaligned. - -#### Why is there no output in the log_explain_analyze.log when the query times out during the execution of explain analyze? - -During the upgrade, only the lib package was replaced, and the conf/logback-datanode.xml was not replaced. It needs to be replaced, and there is no need to restart (the content of this file can be hot loaded). After waiting for about 1 minute, re-execute explain analyze verbose. - - -### Practical Case Studies - -#### Case Study 1: The query involves too many files, and disk IO becomes a bottleneck, causing the query speed to slow down. - -![explain-analyze-3.png](/img/explain-analyze-3.png) - -The total query time is 938 ms, of which the time to read the index area and data area from the files accounts for 918 ms, involving a total of 289 files. Assuming the query involves N TsFiles, the theoretical time for the first query (not hitting the cache) is cost = N * (t_seek + t_index + t_seek + t_chunk). Based on experience, the time for a single seek on an HDD disk is about 5-10ms, so the more files involved in the query, the greater the query delay will be. - -The final optimization plan is: - -1. Adjust the merge parameters to reduce the number of files - -2. Replace HDD with SSD to reduce the latency of a single disk IO - - -#### Case Study 2: The execution of the like predicate is slow, causing the query to time out - -When executing the following SQL, the query times out (the default timeout is 60 seconds) - -```SQL -select count(s1) as total from root.db.d1 where s1 like '%XXXXXXXX%' -``` - -When executing explain analyze verbose, even if the query times out, the intermediate collection results will be output to log_explain_analyze.log every 15 seconds. The last two outputs obtained from log_explain_analyze.log are as follows: - -![explain-analyze-4.png](/img/explain-analyze-4.png) - -![explain-analyze-5.png](/img/explain-analyze-5.png) - -Observing the results, we found that it is because the query did not add a time condition, involving too much data, and the time of constructAlignedChunkReadersDiskTime and pageReadersDecodeAlignedDiskTime has been increasing, which means that new chunks are being read all the time. However, the output information of AlignedSeriesScanNode has always been 0, because the operator only gives up the time slice and updates the information when at least one line of data that meets the condition is output. Looking at the total reading time (loadTimeSeriesMetadataAlignedDiskSeqTime + loadTimeSeriesMetadataAlignedDiskUnSeqTime + constructAlignedChunkReadersDiskTime + pageReadersDecodeAlignedDiskTime = about 13.4 seconds), the other time (60s - 13.4 = 46.6) should all be spent on executing the filtering condition (the execution of the like predicate is very time-consuming). - -The final optimization plan is: Add a time filtering condition to avoid a full table scan. - -## Start/Stop Repair Data Statements -Used to repair the unsorted data generate by system bug.(Supported version: 1.3.1 and later) -### START REPAIR DATA - -Start a repair task to scan all files created before current time. -The repair task will scan all tsfiles and repair some bad files. 
- -```sql -IoTDB> START REPAIR DATA -IoTDB> START REPAIR DATA ON LOCAL -IoTDB> START REPAIR DATA ON CLUSTER -``` - -### STOP REPAIR DATA - -Stop the running repair task. To restart the stopped task. -If there is a stopped repair task, it can be restart and recover the repair progress by executing SQL `START REPAIR DATA`. - -```sql -IoTDB> STOP REPAIR DATA -IoTDB> STOP REPAIR DATA ON LOCAL -IoTDB> STOP REPAIR DATA ON CLUSTER -``` diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Operate-Metadata.md b/src/UserGuide/V1.3.0-2/User-Manual/Operate-Metadata.md deleted file mode 100644 index 4eb80c594..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Operate-Metadata.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -redirectTo: Operate-Metadata_apache.html ---- - diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Operate-Metadata_apache.md b/src/UserGuide/V1.3.0-2/User-Manual/Operate-Metadata_apache.md deleted file mode 100644 index 570c4b952..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Operate-Metadata_apache.md +++ /dev/null @@ -1,1278 +0,0 @@ - - -# Operate Metadata - -## Database Management - -### Create Database - -According to the storage model we can set up the corresponding database. Two SQL statements are supported for creating databases, as follows: - -``` -IoTDB > create database root.ln -IoTDB > create database root.sgcc -``` - -We can thus create two databases using the above two SQL statements. - -It is worth noting that 1 database is recommended. - -When the path itself or the parent/child layer of the path is already created as database, the path is then not allowed to be created as database. For example, it is not feasible to create `root.ln.wf01` as database when two databases `root.ln` and `root.sgcc` exist. The system gives the corresponding error prompt as shown below: - -``` -IoTDB> CREATE DATABASE root.ln.wf01 -Msg: 300: root.ln has already been created as database. -IoTDB> create database root.ln.wf01 -Msg: 300: root.ln has already been created as database. -``` - -The LayerName of database can only be chinese or english characters, numbers, underscores, dots and backticks. If you want to set it to pure numbers or contain backticks or dots, you need to enclose the database name with backticks (` `` `). In ` `` `,2 backticks represents one, i.e. ` ```` ` represents `` ` ``. - -Besides, if deploy on Windows system, the LayerName is case-insensitive, which means it's not allowed to create databases `root.ln` and `root.LN` at the same time. - -### Show Databases - -After creating the database, we can use the [SHOW DATABASES](../SQL-Manual/SQL-Manual.md) statement and [SHOW DATABASES \](../SQL-Manual/SQL-Manual.md) to view the databases. The SQL statements are as follows: - -``` -IoTDB> SHOW DATABASES -IoTDB> SHOW DATABASES root.** -``` - -The result is as follows: - -``` -+-------------+----+-------------------------+-----------------------+-----------------------+ -|database| ttl|schema_replication_factor|data_replication_factor|time_partition_interval| -+-------------+----+-------------------------+-----------------------+-----------------------+ -| root.sgcc|null| 2| 2| 604800| -| root.ln|null| 2| 2| 604800| -+-------------+----+-------------------------+-----------------------+-----------------------+ -Total line number = 2 -It costs 0.060s -``` - -### Delete Database - -User can use the `DELETE DATABASE ` statement to delete all databases matching the pathPattern. Please note the data in the database will also be deleted. 
- -``` -IoTDB > DELETE DATABASE root.ln -IoTDB > DELETE DATABASE root.sgcc -// delete all data, all timeseries and all databases -IoTDB > DELETE DATABASE root.** -``` - -### Count Databases - -User can use the `COUNT DATABASE ` statement to count the number of databases. It is allowed to specify `PathPattern` to count the number of databases matching the `PathPattern`. - -SQL statement is as follows: - -``` -IoTDB> count databases -IoTDB> count databases root.* -IoTDB> count databases root.sgcc.* -IoTDB> count databases root.sgcc -``` - -The result is as follows: - -``` -+-------------+ -| database| -+-------------+ -| root.sgcc| -| root.turbine| -| root.ln| -+-------------+ -Total line number = 3 -It costs 0.003s - -+-------------+ -| database| -+-------------+ -| 3| -+-------------+ -Total line number = 1 -It costs 0.003s - -+-------------+ -| database| -+-------------+ -| 3| -+-------------+ -Total line number = 1 -It costs 0.002s - -+-------------+ -| database| -+-------------+ -| 0| -+-------------+ -Total line number = 1 -It costs 0.002s - -+-------------+ -| database| -+-------------+ -| 1| -+-------------+ -Total line number = 1 -It costs 0.002s -``` - -### Setting up heterogeneous databases (Advanced operations) - -Under the premise of familiar with IoTDB metadata modeling, -users can set up heterogeneous databases in IoTDB to cope with different production needs. - -Currently, the following database heterogeneous parameters are supported: - -| Parameter | Type | Description | -| ------------------------- | ------- | --------------------------------------------- | -| TTL | Long | TTL of the Database | -| SCHEMA_REPLICATION_FACTOR | Integer | The schema replication number of the Database | -| DATA_REPLICATION_FACTOR | Integer | The data replication number of the Database | -| SCHEMA_REGION_GROUP_NUM | Integer | The SchemaRegionGroup number of the Database | -| DATA_REGION_GROUP_NUM | Integer | The DataRegionGroup number of the Database | - -Note the following when configuring heterogeneous parameters: - -+ TTL and TIME_PARTITION_INTERVAL must be positive integers. -+ SCHEMA_REPLICATION_FACTOR and DATA_REPLICATION_FACTOR must be smaller than or equal to the number of deployed DataNodes. -+ The function of SCHEMA_REGION_GROUP_NUM and DATA_REGION_GROUP_NUM are related to the parameter `schema_region_group_extension_policy` and `data_region_group_extension_policy` in iotdb-common.properties configuration file. Take DATA_REGION_GROUP_NUM as an example: - If `data_region_group_extension_policy=CUSTOM` is set, DATA_REGION_GROUP_NUM serves as the number of DataRegionGroups owned by the Database. - If `data_region_group_extension_policy=AUTO`, DATA_REGION_GROUP_NUM is used as the lower bound of the DataRegionGroup quota owned by the Database. That is, when the Database starts writing data, it will have at least this number of DataRegionGroups. - -Users can set any heterogeneous parameters when creating a Database, or adjust some heterogeneous parameters during a stand-alone/distributed IoTDB run. - -#### Set heterogeneous parameters when creating a Database - -The user can set any of the above heterogeneous parameters when creating a Database. The SQL statement is as follows: - -``` -CREATE DATABASE prefixPath (WITH databaseAttributeClause (COMMA? databaseAttributeClause)*)? 
-``` - -For example: - -``` -CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; -``` - -#### Adjust heterogeneous parameters at run time - -Users can adjust some heterogeneous parameters during the IoTDB runtime, as shown in the following SQL statement: - -``` -ALTER DATABASE prefixPath WITH databaseAttributeClause (COMMA? databaseAttributeClause)* -``` - -For example: - -``` -ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; -``` - -Note that only the following heterogeneous parameters can be adjusted at runtime: - -+ SCHEMA_REGION_GROUP_NUM -+ DATA_REGION_GROUP_NUM - -#### Show heterogeneous databases - -The user can query the specific heterogeneous configuration of each Database, and the SQL statement is as follows: - -``` -SHOW DATABASES DETAILS prefixPath? -``` - -For example: - -``` -IoTDB> SHOW DATABASES DETAILS -+--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ -|Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval|SchemaRegionGroupNum|MinSchemaRegionGroupNum|MaxSchemaRegionGroupNum|DataRegionGroupNum|MinDataRegionGroupNum|MaxDataRegionGroupNum| -+--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ -|root.db1| null| 1| 3| 604800000| 0| 1| 1| 0| 2| 2| -|root.db2|86400000| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| -|root.db3| null| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| -+--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ -Total line number = 3 -It costs 0.058s -``` - -The query results in each column are as follows: - -+ The name of the Database -+ The TTL of the Database -+ The schema replication number of the Database -+ The data replication number of the Database -+ The time partition interval of the Database -+ The current SchemaRegionGroup number of the Database -+ The required minimum SchemaRegionGroup number of the Database -+ The permitted maximum SchemaRegionGroup number of the Database -+ The current DataRegionGroup number of the Database -+ The required minimum DataRegionGroup number of the Database -+ The permitted maximum DataRegionGroup number of the Database - -### TTL - -IoTDB supports storage-level TTL settings, which means it is able to delete old data automatically and periodically. The benefit of using TTL is that hopefully you can control the total disk space usage and prevent the machine from running out of disks. Moreover, the query performance may downgrade as the total number of files goes up and the memory usage also increase as there are more files. Timely removing such files helps to keep at a high query performance level and reduce memory usage. - -The default unit of TTL is milliseconds. If the time precision in the configuration file changes to another, the TTL is still set to milliseconds. 
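Because the TTL value is always interpreted in milliseconds, converting a desired retention period is just a multiplication. A minimal sketch for a 7-day retention on the database `root.ln` from the earlier examples, using the `set ttl to` statement described in the next subsection:

```sql
// 7 days = 7 * 24 * 3600 * 1000 = 604800000 ms
IoTDB> set ttl to root.ln 604800000
```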
- -#### Set TTL - -The SQL Statement for setting TTL is as follow: - -``` -IoTDB> set ttl to root.ln 3600000 -``` - -This example means that for data in `root.ln`, only 3600000 ms, that is, the latest 1 hour will remain, the older one is removed or made invisible. - -``` -IoTDB> set ttl to root.sgcc.** 3600000 -``` - -It supports setting TTL for databases in a path. This example represents setting TTL for all databases in the `root.sgcc` path. - -``` -IoTDB> set ttl to root.** 3600000 -``` - -This example represents setting TTL for all databases. - -#### Unset TTL - -To unset TTL, we can use follwing SQL statement: - -``` -IoTDB> unset ttl to root.ln -``` - -After unset TTL, all data will be accepted in `root.ln`. - -``` -IoTDB> unset ttl to root.sgcc.** -``` - -Unset the TTL setting for all databases in the `root.sgcc` path. - -``` -IoTDB> unset ttl to root.** -``` - -Unset the TTL setting for all databases. - -#### Show TTL - -To Show TTL, we can use following SQL statement: - -``` -IoTDB> SHOW ALL TTL -IoTDB> SHOW TTL ON DataBaseNames -``` - -The SHOW ALL TTL example gives the TTL for all databases. -The SHOW TTL ON root.ln,root.sgcc,root.DB example shows the TTL for the three storage -groups specified. -Note: the TTL for databases that do not have a TTL set will display as null. - -``` -IoTDB> show all ttl -+----------+-------+ -| database|ttl(ms)| -+---------+-------+ -| root.ln|3600000| -|root.sgcc| null| -| root.DB|3600000| -+----------+-------+ -``` - -## Device Template - -IoTDB supports the device template function, enabling different entities of the same type to share metadata, reduce the memory usage of metadata, and simplify the management of numerous entities and measurements. - - -### Create Device Template - -The SQL syntax for creating a metadata template is as follows: - -```sql -CREATE DEVICE TEMPLATE ALIGNED? '(' [',' ]+ ')' -``` - -**Example 1:** Create a template containing two non-aligned timeseries - -```shell -IoTDB> create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) -``` - -**Example 2:** Create a template containing a group of aligned timeseries - -```shell -IoTDB> create device template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla) -``` - -The` lat` and `lon` measurements are aligned. - -![img](/img/%E6%A8%A1%E6%9D%BF.png) - -![img](/img/templateEN.jpg) - -### Set Device Template - -After a device template is created, it should be set to specific path before creating related timeseries or insert data. - -**It should be ensured that the related database has been set before setting template.** - -**It is recommended to set device template to database path. It is not suggested to set device template to some path above database** - -**It is forbidden to create timeseries under a path setting s tedeviceplate. Device template shall not be set on a prefix path of an existing timeseries.** - -The SQL Statement for setting device template is as follow: - -```shell -IoTDB> set device template t1 to root.sg1.d1 -``` - -### Activate Device Template - -After setting the device template, with the system enabled to auto create schema, you can insert data into the timeseries. For example, suppose there's a database root.sg1 and t1 has been set to root.sg1.d1, then timeseries like root.sg1.d1.temperature and root.sg1.d1.status are available and data points can be inserted. 
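A minimal sketch of this implicit activation (assuming automatic schema creation is enabled and template `t1` has already been set on `root.sg1.d1` as above); the first insert activates the template, after which the series can be queried:

```sql
// the first insert into root.sg1.d1 activates template t1
IoTDB> insert into root.sg1.d1(timestamp, temperature, status) values (1, 36.5, true)
// root.sg1.d1.temperature and root.sg1.d1.status now exist and hold the inserted point
IoTDB> select temperature, status from root.sg1.d1
```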
- - -**Attention**: Before inserting data or the system not enabled to auto create schema, timeseries defined by the device template will not be created. You can use the following SQL statement to create the timeseries or activate the templdeviceate, act before inserting data: - -```shell -IoTDB> create timeseries using device template on root.sg1.d1 -``` - -**Example:** Execute the following statement - -```shell -IoTDB> set device template t1 to root.sg1.d1 -IoTDB> set device template t2 to root.sg1.d2 -IoTDB> create timeseries using device template on root.sg1.d1 -IoTDB> create timeseries using device template on root.sg1.d2 -``` - -Show the time series: - -```sql -show timeseries root.sg1.** -```` - -```shell -+-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| -+-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -|root.sg1.d1.temperature| null| root.sg1| FLOAT| RLE| SNAPPY|null| null| null| null| -| root.sg1.d1.status| null| root.sg1| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| -| root.sg1.d2.lon| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| -| root.sg1.d2.lat| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| -+-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -``` - -Show the devices: - -```sql -show devices root.sg1.** -```` - -```shell -+---------------+---------+ -| devices|isAligned| -+---------------+---------+ -| root.sg1.d1| false| -| root.sg1.d2| true| -+---------------+---------+ -```` - -### Show Device Template - -- Show all device templates - -The SQL statement looks like this: - -```shell -IoTDB> show device templates -``` - -The execution result is as follows: - -```shell -+-------------+ -|template name| -+-------------+ -| t2| -| t1| -+-------------+ -``` - -- Show nodes under in device template - -The SQL statement looks like this: - -```shell -IoTDB> show nodes in device template t1 -``` - -The execution result is as follows: - -```shell -+-----------+--------+--------+-----------+ -|child nodes|dataType|encoding|compression| -+-----------+--------+--------+-----------+ -|temperature| FLOAT| RLE| SNAPPY| -| status| BOOLEAN| PLAIN| SNAPPY| -+-----------+--------+--------+-----------+ -``` - -- Show the path prefix where a device template is set - -```shell -IoTDB> show paths set device template t1 -``` - -The execution result is as follows: - -```shell -+-----------+ -|child paths| -+-----------+ -|root.sg1.d1| -+-----------+ -``` - -- Show the path prefix where a device template is used (i.e. the time series has been created) - -```shell -IoTDB> show paths using device template t1 -``` - -The execution result is as follows: - -```shell -+-----------+ -|child paths| -+-----------+ -|root.sg1.d1| -+-----------+ -``` - -### Deactivate device Template - -To delete a group of timeseries represented by device template, namely deactivate the device template, use the following SQL statement: - -```shell -IoTDB> delete timeseries of device template t1 from root.sg1.d1 -``` - -or - -```shell -IoTDB> deactivate device template t1 from root.sg1.d1 -``` - -The deactivation supports batch process. 
- -```shell -IoTDB> delete timeseries of device template t1 from root.sg1.*, root.sg2.* -``` - -or - -```shell -IoTDB> deactivate device template t1 from root.sg1.*, root.sg2.* -``` - -If the template name is not provided in sql, all template activation on paths matched by given path pattern will be removed. - -### Unset Device Template - -The SQL Statement for unsetting device template is as follow: - -```shell -IoTDB> unset device template t1 from root.sg1.d1 -``` - -**Attention**: It should be guaranteed that none of the timeseries represented by the target device template exists, before unset it. It can be achieved by deactivation operation. - -### Drop Device Template - -The SQL Statement for dropping device template is as follow: - -```shell -IoTDB> drop device template t1 -``` - -**Attention**: Dropping an already set template is not supported. - -### Alter Device Template - -In a scenario where measurements need to be added, you can modify the template to add measurements to all devicesdevice using the device template. - -The SQL Statement for altering device template is as follow: - -```shell -IoTDB> alter device template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY) -``` - -**When executing data insertion to devices with device template set on related prefix path and there are measurements not present in this device template, the measurements will be auto added to this device template.** - -## Timeseries Management - -### Create Timeseries - -According to the storage model selected before, we can create corresponding timeseries in the two databases respectively. The SQL statements for creating timeseries are as follows: - -``` -IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN -IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE -IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN -IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN -IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN -IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE -``` - -From v0.13, you can use a simplified version of the SQL statements to create timeseries: - -``` -IoTDB > create timeseries root.ln.wf01.wt01.status BOOLEAN encoding=PLAIN -IoTDB > create timeseries root.ln.wf01.wt01.temperature FLOAT encoding=RLE -IoTDB > create timeseries root.ln.wf02.wt02.hardware TEXT encoding=PLAIN -IoTDB > create timeseries root.ln.wf02.wt02.status BOOLEAN encoding=PLAIN -IoTDB > create timeseries root.sgcc.wf03.wt01.status BOOLEAN encoding=PLAIN -IoTDB > create timeseries root.sgcc.wf03.wt01.temperature FLOAT encoding=RLE -``` - -Notice that when in the CREATE TIMESERIES statement the encoding method conflicts with the data type, the system gives the corresponding error prompt as shown below: - -``` -IoTDB > create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF -error: encoding TS_2DIFF does not support BOOLEAN -``` - -Please refer to [Encoding](../Basic-Concept/Encoding-and-Compression.md) for correspondence between data type and encoding. 
- -### Create Aligned Timeseries - -The SQL statement for creating a group of timeseries are as follows: - -``` -IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) -``` - -You can set different datatype, encoding, and compression for the timeseries in a group of aligned timeseries - -It is also supported to set an alias, tag, and attribute for aligned timeseries. - -### Delete Timeseries - -To delete the timeseries we created before, we are able to use `(DELETE | DROP) TimeSeries ` statement. - -The usage are as follows: - -``` -IoTDB> delete timeseries root.ln.wf01.wt01.status -IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware -IoTDB> delete timeseries root.ln.wf02.* -IoTDB> drop timeseries root.ln.wf02.* -``` - -### Show Timeseries - -* SHOW LATEST? TIMESERIES pathPattern? whereClause? limitClause? - - There are four optional clauses added in SHOW TIMESERIES, return information of time series - -Timeseries information includes: timeseries path, alias of measurement, database it belongs to, data type, encoding type, compression type, tags and attributes. - -Examples: - -* SHOW TIMESERIES - - presents all timeseries information in JSON form - -* SHOW TIMESERIES <`PathPattern`> - - returns all timeseries information matching the given <`PathPattern`>. SQL statements are as follows: - -``` -IoTDB> show timeseries root.** -IoTDB> show timeseries root.ln.** -``` - -The results are shown below respectively: - -``` -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| -| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| null| null| null| null| -| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -Total line number = 7 -It costs 0.016s - -+-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| -+-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|null| null| null| 
null| -| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| -|root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY|null| null| null| null| -| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| -+-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -Total line number = 4 -It costs 0.004s -``` - -* SHOW TIMESERIES LIMIT INT OFFSET INT - - returns all the timeseries information start from the offset and limit the number of series returned. For example, - -``` -show timeseries root.ln.** limit 10 offset 10 -``` - -* SHOW TIMESERIES WHERE TIMESERIES contains 'containStr' - - The query result set is filtered by string fuzzy matching based on the names of the timeseries. For example: - -``` -show timeseries root.ln.** where timeseries contains 'wf01.wt' -``` - -The result is shown below: - -``` -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -Total line number = 2 -It costs 0.016s -``` - -* SHOW TIMESERIES WHERE DataType=type - - The query result set is filtered by data type. 
For example: - -``` -show timeseries root.ln.** where dataType=FLOAT -``` - -The result is shown below: - -``` -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| -| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -Total line number = 3 -It costs 0.016s - -``` - - -* SHOW LATEST TIMESERIES - - all the returned timeseries information should be sorted in descending order of the last timestamp of timeseries - -It is worth noting that when the queried path does not exist, the system will return no timeseries. - - -### Count Timeseries - -IoTDB is able to use `COUNT TIMESERIES ` to count the number of timeseries matching the path. SQL statements are as follows: - -* `WHERE` condition could be used to fuzzy match a time series name with the following syntax: `COUNT TIMESERIES WHERE TIMESERIES contains 'containStr'`. -* `WHERE` condition could be used to filter result by data type with the syntax: `COUNT TIMESERIES WHERE DataType='`. -* `WHERE` condition could be used to filter result by tags with the syntax: `COUNT TIMESERIES WHERE TAGS(key)='value'` or `COUNT TIMESERIES WHERE TAGS(key) contains 'value'`. -* `LEVEL` could be defined to show count the number of timeseries of each node at the given level in current Metadata Tree. This could be used to query the number of sensors under each device. The grammar is: `COUNT TIMESERIES GROUP BY LEVEL=`. 
- - -``` -IoTDB > COUNT TIMESERIES root.** -IoTDB > COUNT TIMESERIES root.ln.** -IoTDB > COUNT TIMESERIES root.ln.*.*.status -IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status -IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' -IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 -IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' -IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' -IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 -``` - -For example, if there are several timeseries (use `show timeseries` to show all timeseries): - -``` -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| -| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| {"unit":"c"}| null| null| null| -| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| {"description":"test1"}| null| null| null| -| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -Total line number = 7 -It costs 0.004s -``` - -Then the Metadata Tree will be as below: - -
- -As can be seen, `root` is considered as `LEVEL=0`. So when you enter statements such as: - -``` -IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 -IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 -IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 -``` - -You will get following results: - -``` -+------------+-----------------+ -| column|count(timeseries)| -+------------+-----------------+ -| root.sgcc| 2| -|root.turbine| 1| -| root.ln| 4| -+------------+-----------------+ -Total line number = 3 -It costs 0.002s - -+------------+-----------------+ -| column|count(timeseries)| -+------------+-----------------+ -|root.ln.wf02| 2| -|root.ln.wf01| 2| -+------------+-----------------+ -Total line number = 2 -It costs 0.002s - -+------------+-----------------+ -| column|count(timeseries)| -+------------+-----------------+ -|root.ln.wf01| 2| -+------------+-----------------+ -Total line number = 1 -It costs 0.002s -``` - -> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. - -### Active Timeseries Query -By adding WHERE time filter conditions to the existing SHOW/COUNT TIMESERIES, we can obtain time series with data within the specified time range. - -An example usage is as follows: -``` -IoTDB> insert into root.sg.data(timestamp, s1,s2) values(15000, 1, 2); -IoTDB> insert into root.sg.data2(timestamp, s1,s2) values(15002, 1, 2); -IoTDB> insert into root.sg.data3(timestamp, s1,s2) values(16000, 1, 2); -IoTDB> show timeseries; -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -| root.sg.data.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -| root.sg.data.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data3.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data3.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data2.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data2.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ - -IoTDB> show timeseries where time >= 15000 and time < 16000; -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -| root.sg.data.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -| root.sg.data.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data2.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data2.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ - -``` -Regarding the definition of active time series, data that can be queried normally is considered active, meaning time series that have been inserted but deleted are not included. 
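The same time filter also works with COUNT TIMESERIES. A sketch of the expected result for the data inserted above (only the series under root.sg.data and root.sg.data2 have points in the range, giving 4 series); the exact output layout may differ slightly:

```
IoTDB> count timeseries where time >= 15000 and time < 16000;
+-----------------+
|count(timeseries)|
+-----------------+
|                4|
+-----------------+
```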
-### Tag and Attribute Management - -We can also add an alias, extra tag and attribute information while creating one timeseries. - -The differences between tag and attribute are: - -* Tag could be used to query the path of timeseries, we will maintain an inverted index in memory on the tag: Tag -> Timeseries -* Attribute could only be queried by timeseries path : Timeseries -> Attribute - -The SQL statements for creating timeseries with extra tag and attribute information are extended as follows: - -``` -create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) -``` - -The `temprature` in the brackets is an alias for the sensor `s1`. So we can use `temprature` to replace `s1` anywhere. - -> IoTDB also supports using AS function to set alias. The difference between the two is: the alias set by the AS function is used to replace the whole time series name, temporary and not bound with the time series; while the alias mentioned above is only used as the alias of the sensor, which is bound with it and can be used equivalent to the original sensor name. - -> Notice that the size of the extra tag and attribute information shouldn't exceed the `tag_attribute_total_size`. - -We can update the tag information after creating it as following: - -* Rename the tag/attribute key - -``` -ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 -``` - -* Reset the tag/attribute value - -``` -ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 -``` - -* Delete the existing tag/attribute - -``` -ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 -``` - -* Add new tags - -``` -ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 -``` - -* Add new attributes - -``` -ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 -``` - -* Upsert alias, tags and attributes - -> add alias or a new key-value if the alias or key doesn't exist, otherwise, update the old one with new value. - -``` -ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag3=v3, tag4=v4) ATTRIBUTES(attr3=v3, attr4=v4) -``` - -* Show timeseries using tags. Use TAGS(tagKey) to identify the tags used as filter key - -``` -SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause -``` - -returns all the timeseries information that satisfy the where condition and match the pathPattern. 
SQL statements are as follows: - -``` -ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c -ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 -show timeseries root.ln.** where TAGS(unit)='c' -show timeseries root.ln.** where TAGS(description) contains 'test1' -``` - -The results are shown below respectly: - -``` -+--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| -+--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ -|root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|{"unit":"c"}| null| null| null| -+--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ -Total line number = 1 -It costs 0.005s - -+------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| -+------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ -|root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|{"description":"test1"}| null| null| null| -+------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ -Total line number = 1 -It costs 0.004s -``` - -- count timeseries using tags - -``` -COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause -COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause GROUP BY LEVEL= -``` - -returns all the number of timeseries that satisfy the where condition and match the pathPattern. SQL statements are as follows: - -``` -count timeseries -count timeseries root.** where TAGS(unit)='c' -count timeseries root.** where TAGS(unit)='c' group by level = 2 -``` - -The results are shown below respectly : - -``` -IoTDB> count timeseries -+-----------------+ -|count(timeseries)| -+-----------------+ -| 6| -+-----------------+ -Total line number = 1 -It costs 0.019s -IoTDB> count timeseries root.** where TAGS(unit)='c' -+-----------------+ -|count(timeseries)| -+-----------------+ -| 2| -+-----------------+ -Total line number = 1 -It costs 0.020s -IoTDB> count timeseries root.** where TAGS(unit)='c' group by level = 2 -+--------------+-----------------+ -| column|count(timeseries)| -+--------------+-----------------+ -| root.ln.wf02| 2| -| root.ln.wf01| 0| -|root.sgcc.wf03| 0| -+--------------+-----------------+ -Total line number = 3 -It costs 0.011s -``` - -> Notice that, we only support one condition in the where clause. Either it's an equal filter or it is an `contains` filter. In both case, the property in the where condition must be a tag. 
- -create aligned timeseries - -``` -create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) -``` - -The execution result is as follows: - -``` -IoTDB> show timeseries -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| -|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -``` - -Support query: - -``` -IoTDB> show timeseries where TAGS(tag1)='v1' -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -``` - -The above operations are supported for timeseries tag, attribute updates, etc. - -## Node Management - -### Show Child Paths - -``` -SHOW CHILD PATHS pathPattern -``` - -Return all child paths and their node types of all the paths matching pathPattern. - -node types: ROOT -> DB INTERNAL -> DATABASE -> INTERNAL -> DEVICE -> TIMESERIES - - -Example: - -* return the child paths of root.ln:show child paths root.ln - -``` -+------------+----------+ -| child paths|node types| -+------------+----------+ -|root.ln.wf01| INTERNAL| -|root.ln.wf02| INTERNAL| -+------------+----------+ -Total line number = 2 -It costs 0.002s -``` - -> get all paths in form of root.xx.xx.xx:show child paths root.xx.xx - -### Show Child Nodes - -``` -SHOW CHILD NODES pathPattern -``` - -Return all child nodes of the pathPattern. - -Example: - -* return the child nodes of root:show child nodes root - -``` -+------------+ -| child nodes| -+------------+ -| ln| -+------------+ -``` - -* return the child nodes of root.ln:show child nodes root.ln - -``` -+------------+ -| child nodes| -+------------+ -| wf01| -| wf02| -+------------+ -``` - -### Count Nodes - -IoTDB is able to use `COUNT NODES LEVEL=` to count the number of nodes at - the given level in current Metadata Tree considering a given pattern. IoTDB will find paths that - match the pattern and counts distinct nodes at the specified level among the matched paths. - This could be used to query the number of devices with specified measurements. 
The usage are as - follows: - -``` -IoTDB > COUNT NODES root.** LEVEL=2 -IoTDB > COUNT NODES root.ln.** LEVEL=2 -IoTDB > COUNT NODES root.ln.wf01.** LEVEL=3 -IoTDB > COUNT NODES root.**.temperature LEVEL=3 -``` - -As for the above mentioned example and Metadata tree, you can get following results: - -``` -+------------+ -|count(nodes)| -+------------+ -| 4| -+------------+ -Total line number = 1 -It costs 0.003s - -+------------+ -|count(nodes)| -+------------+ -| 2| -+------------+ -Total line number = 1 -It costs 0.002s - -+------------+ -|count(nodes)| -+------------+ -| 1| -+------------+ -Total line number = 1 -It costs 0.002s - -+------------+ -|count(nodes)| -+------------+ -| 2| -+------------+ -Total line number = 1 -It costs 0.002s -``` - -> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. - -### Show Devices - -* SHOW DEVICES pathPattern? (WITH DATABASE)? devicesWhereClause? limitClause? - -Similar to `Show Timeseries`, IoTDB also supports two ways of viewing devices: - -* `SHOW DEVICES` statement presents all devices' information, which is equal to `SHOW DEVICES root.**`. -* `SHOW DEVICES ` statement specifies the `PathPattern` and returns the devices information matching the pathPattern and under the given level. -* `WHERE` condition supports `DEVICE contains 'xxx'` to do a fuzzy query based on the device name. - -SQL statement is as follows: - -``` -IoTDB> show devices -IoTDB> show devices root.ln.** -IoTDB> show devices root.ln.** where device contains 't' -``` - -You can get results below: - -``` -+-------------------+---------+ -| devices|isAligned| -+-------------------+---------+ -| root.ln.wf01.wt01| false| -| root.ln.wf02.wt02| false| -|root.sgcc.wf03.wt01| false| -| root.turbine.d1| false| -+-------------------+---------+ -Total line number = 4 -It costs 0.002s - -+-----------------+---------+ -| devices|isAligned| -+-----------------+---------+ -|root.ln.wf01.wt01| false| -|root.ln.wf02.wt02| false| -+-----------------+---------+ -Total line number = 2 -It costs 0.001s -``` - -`isAligned` indicates whether the timeseries under the device are aligned. - -To view devices' information with database, we can use `SHOW DEVICES WITH DATABASE` statement. - -* `SHOW DEVICES WITH DATABASE` statement presents all devices' information with their database. -* `SHOW DEVICES WITH DATABASE` statement specifies the `PathPattern` and returns the - devices' information under the given level with their database information. - -SQL statement is as follows: - -``` -IoTDB> show devices with database -IoTDB> show devices root.ln.** with database -``` - -You can get results below: - -``` -+-------------------+-------------+---------+ -| devices| database|isAligned| -+-------------------+-------------+---------+ -| root.ln.wf01.wt01| root.ln| false| -| root.ln.wf02.wt02| root.ln| false| -|root.sgcc.wf03.wt01| root.sgcc| false| -| root.turbine.d1| root.turbine| false| -+-------------------+-------------+---------+ -Total line number = 4 -It costs 0.003s - -+-----------------+-------------+---------+ -| devices| database|isAligned| -+-----------------+-------------+---------+ -|root.ln.wf01.wt01| root.ln| false| -|root.ln.wf02.wt02| root.ln| false| -+-----------------+-------------+---------+ -Total line number = 2 -It costs 0.001s -``` - -### Count Devices - -* COUNT DEVICES / - -The above statement is used to count the number of devices. 
At the same time, it is allowed to specify `PathPattern` to count the number of devices matching the `PathPattern`. - -SQL statement is as follows: - -``` -IoTDB> show devices -IoTDB> count devices -IoTDB> count devices root.ln.** -``` - -You can get results below: - -``` -+-------------------+---------+ -| devices|isAligned| -+-------------------+---------+ -|root.sgcc.wf03.wt03| false| -| root.turbine.d1| false| -| root.ln.wf02.wt02| false| -| root.ln.wf01.wt01| false| -+-------------------+---------+ -Total line number = 4 -It costs 0.024s - -+--------------+ -|count(devices)| -+--------------+ -| 4| -+--------------+ -Total line number = 1 -It costs 0.004s - -+--------------+ -|count(devices)| -+--------------+ -| 2| -+--------------+ -Total line number = 1 -It costs 0.004s -``` - -### Active Device Query -Similar to active timeseries query, we can add time filter conditions to device viewing and statistics to query active devices that have data within a certain time range. The definition of active here is the same as for active time series. An example usage is as follows: -``` -IoTDB> insert into root.sg.data(timestamp, s1,s2) values(15000, 1, 2); -IoTDB> insert into root.sg.data2(timestamp, s1,s2) values(15002, 1, 2); -IoTDB> insert into root.sg.data3(timestamp, s1,s2) values(16000, 1, 2); -IoTDB> show devices; -+-------------------+---------+ -| devices|isAligned| -+-------------------+---------+ -| root.sg.data| false| -| root.sg.data2| false| -| root.sg.data3| false| -+-------------------+---------+ - -IoTDB> show devices where time >= 15000 and time < 16000; -+-------------------+---------+ -| devices|isAligned| -+-------------------+---------+ -| root.sg.data| false| -| root.sg.data2| false| -+-------------------+---------+ - -IoTDB> count devices where time >= 15000 and time < 16000; -+--------------+ -|count(devices)| -+--------------+ -| 2| -+--------------+ -``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Operate-Metadata_timecho.md b/src/UserGuide/V1.3.0-2/User-Manual/Operate-Metadata_timecho.md deleted file mode 100644 index 33fb8ed19..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Operate-Metadata_timecho.md +++ /dev/null @@ -1,1280 +0,0 @@ - - -# Operate Metadata - -## Database Management - -### Create Database - -According to the storage model we can set up the corresponding database. Two SQL statements are supported for creating databases, as follows: - -``` -IoTDB > create database root.ln -IoTDB > create database root.sgcc -``` - -We can thus create two databases using the above two SQL statements. - -It is worth noting that 1 database is recommended. - -When the path itself or the parent/child layer of the path is already created as database, the path is then not allowed to be created as database. For example, it is not feasible to create `root.ln.wf01` as database when two databases `root.ln` and `root.sgcc` exist. The system gives the corresponding error prompt as shown below: - -``` -IoTDB> CREATE DATABASE root.ln.wf01 -Msg: 300: root.ln has already been created as database. -IoTDB> create database root.ln.wf01 -Msg: 300: root.ln has already been created as database. -``` - -The LayerName of database can only be chinese or english characters, numbers, underscores, dots and backticks. If you want to set it to pure numbers or contain backticks or dots, you need to enclose the database name with backticks (` `` `). In ` `` `,2 backticks represents one, i.e. ` ```` ` represents `` ` ``. 
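As a minimal illustration of the quoting rule above (the node names are hypothetical): a purely numeric node name must be wrapped in backticks, and a literal backtick inside a name is written as two backticks:

```sql
// the node name 123 is pure digits, so it must be quoted
IoTDB> create database root.`123`
// creates a database whose last node is named a`b
IoTDB> create database root.`a``b`
```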
- -Besides, if deploy on Windows system, the LayerName is case-insensitive, which means it's not allowed to create databases `root.ln` and `root.LN` at the same time. - -### Show Databases - -After creating the database, we can use the [SHOW DATABASES](../SQL-Manual/SQL-Manual.md) statement and [SHOW DATABASES \](../SQL-Manual/SQL-Manual.md) to view the databases. The SQL statements are as follows: - -``` -IoTDB> SHOW DATABASES -IoTDB> SHOW DATABASES root.** -``` - -The result is as follows: - -``` -+-------------+----+-------------------------+-----------------------+-----------------------+ -|database| ttl|schema_replication_factor|data_replication_factor|time_partition_interval| -+-------------+----+-------------------------+-----------------------+-----------------------+ -| root.sgcc|null| 2| 2| 604800| -| root.ln|null| 2| 2| 604800| -+-------------+----+-------------------------+-----------------------+-----------------------+ -Total line number = 2 -It costs 0.060s -``` - -### Delete Database - -User can use the `DELETE DATABASE ` statement to delete all databases matching the pathPattern. Please note the data in the database will also be deleted. - -``` -IoTDB > DELETE DATABASE root.ln -IoTDB > DELETE DATABASE root.sgcc -// delete all data, all timeseries and all databases -IoTDB > DELETE DATABASE root.** -``` - -### Count Databases - -User can use the `COUNT DATABASE ` statement to count the number of databases. It is allowed to specify `PathPattern` to count the number of databases matching the `PathPattern`. - -SQL statement is as follows: - -``` -IoTDB> count databases -IoTDB> count databases root.* -IoTDB> count databases root.sgcc.* -IoTDB> count databases root.sgcc -``` - -The result is as follows: - -``` -+-------------+ -| database| -+-------------+ -| root.sgcc| -| root.turbine| -| root.ln| -+-------------+ -Total line number = 3 -It costs 0.003s - -+-------------+ -| database| -+-------------+ -| 3| -+-------------+ -Total line number = 1 -It costs 0.003s - -+-------------+ -| database| -+-------------+ -| 3| -+-------------+ -Total line number = 1 -It costs 0.002s - -+-------------+ -| database| -+-------------+ -| 0| -+-------------+ -Total line number = 1 -It costs 0.002s - -+-------------+ -| database| -+-------------+ -| 1| -+-------------+ -Total line number = 1 -It costs 0.002s -``` - -### Setting up heterogeneous databases (Advanced operations) - -Under the premise of familiar with IoTDB metadata modeling, -users can set up heterogeneous databases in IoTDB to cope with different production needs. - -Currently, the following database heterogeneous parameters are supported: - -| Parameter | Type | Description | -| ------------------------- | ------- | --------------------------------------------- | -| TTL | Long | TTL of the Database | -| SCHEMA_REPLICATION_FACTOR | Integer | The schema replication number of the Database | -| DATA_REPLICATION_FACTOR | Integer | The data replication number of the Database | -| SCHEMA_REGION_GROUP_NUM | Integer | The SchemaRegionGroup number of the Database | -| DATA_REGION_GROUP_NUM | Integer | The DataRegionGroup number of the Database | - -Note the following when configuring heterogeneous parameters: - -+ TTL and TIME_PARTITION_INTERVAL must be positive integers. -+ SCHEMA_REPLICATION_FACTOR and DATA_REPLICATION_FACTOR must be smaller than or equal to the number of deployed DataNodes. 
-+ The function of SCHEMA_REGION_GROUP_NUM and DATA_REGION_GROUP_NUM are related to the parameter `schema_region_group_extension_policy` and `data_region_group_extension_policy` in iotdb-common.properties configuration file. Take DATA_REGION_GROUP_NUM as an example: - If `data_region_group_extension_policy=CUSTOM` is set, DATA_REGION_GROUP_NUM serves as the number of DataRegionGroups owned by the Database. - If `data_region_group_extension_policy=AUTO`, DATA_REGION_GROUP_NUM is used as the lower bound of the DataRegionGroup quota owned by the Database. That is, when the Database starts writing data, it will have at least this number of DataRegionGroups. - -Users can set any heterogeneous parameters when creating a Database, or adjust some heterogeneous parameters during a stand-alone/distributed IoTDB run. - -#### Set heterogeneous parameters when creating a Database - -The user can set any of the above heterogeneous parameters when creating a Database. The SQL statement is as follows: - -``` -CREATE DATABASE prefixPath (WITH databaseAttributeClause (COMMA? databaseAttributeClause)*)? -``` - -For example: - -``` -CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; -``` - -#### Adjust heterogeneous parameters at run time - -Users can adjust some heterogeneous parameters during the IoTDB runtime, as shown in the following SQL statement: - -``` -ALTER DATABASE prefixPath WITH databaseAttributeClause (COMMA? databaseAttributeClause)* -``` - -For example: - -``` -ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; -``` - -Note that only the following heterogeneous parameters can be adjusted at runtime: - -+ SCHEMA_REGION_GROUP_NUM -+ DATA_REGION_GROUP_NUM - -#### Show heterogeneous databases - -The user can query the specific heterogeneous configuration of each Database, and the SQL statement is as follows: - -``` -SHOW DATABASES DETAILS prefixPath? 
-``` - -For example: - -``` -IoTDB> SHOW DATABASES DETAILS -+--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ -|Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval|SchemaRegionGroupNum|MinSchemaRegionGroupNum|MaxSchemaRegionGroupNum|DataRegionGroupNum|MinDataRegionGroupNum|MaxDataRegionGroupNum| -+--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ -|root.db1| null| 1| 3| 604800000| 0| 1| 1| 0| 2| 2| -|root.db2|86400000| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| -|root.db3| null| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| -+--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ -Total line number = 3 -It costs 0.058s -``` - -The query results in each column are as follows: - -+ The name of the Database -+ The TTL of the Database -+ The schema replication number of the Database -+ The data replication number of the Database -+ The time partition interval of the Database -+ The current SchemaRegionGroup number of the Database -+ The required minimum SchemaRegionGroup number of the Database -+ The permitted maximum SchemaRegionGroup number of the Database -+ The current DataRegionGroup number of the Database -+ The required minimum DataRegionGroup number of the Database -+ The permitted maximum DataRegionGroup number of the Database - -### TTL - -IoTDB supports storage-level TTL settings, which means it is able to delete old data automatically and periodically. The benefit of using TTL is that hopefully you can control the total disk space usage and prevent the machine from running out of disks. Moreover, the query performance may downgrade as the total number of files goes up and the memory usage also increase as there are more files. Timely removing such files helps to keep at a high query performance level and reduce memory usage. - -The default unit of TTL is milliseconds. If the time precision in the configuration file changes to another, the TTL is still set to milliseconds. - -#### Set TTL - -The SQL Statement for setting TTL is as follow: - -``` -IoTDB> set ttl to root.ln 3600000 -``` - -This example means that for data in `root.ln`, only 3600000 ms, that is, the latest 1 hour will remain, the older one is removed or made invisible. - -``` -IoTDB> set ttl to root.sgcc.** 3600000 -``` - -It supports setting TTL for databases in a path. This example represents setting TTL for all databases in the `root.sgcc` path. - -``` -IoTDB> set ttl to root.** 3600000 -``` - -This example represents setting TTL for all databases. - -#### Unset TTL - -To unset TTL, we can use follwing SQL statement: - -``` -IoTDB> unset ttl to root.ln -``` - -After unset TTL, all data will be accepted in `root.ln`. - -``` -IoTDB> unset ttl to root.sgcc.** -``` - -Unset the TTL setting for all databases in the `root.sgcc` path. - -``` -IoTDB> unset ttl to root.** -``` - -Unset the TTL setting for all databases. - -#### Show TTL - -To Show TTL, we can use following SQL statement: - -``` -IoTDB> SHOW ALL TTL -IoTDB> SHOW TTL ON DataBaseNames -``` - -The SHOW ALL TTL example gives the TTL for all databases. 
-The SHOW TTL ON root.ln,root.sgcc,root.DB example shows the TTL for the three databases specified.
-Note: the TTL for databases that do not have a TTL set will display as null.
-
-```
-IoTDB> show all ttl
-+---------+-------+
-| database|ttl(ms)|
-+---------+-------+
-|  root.ln|3600000|
-|root.sgcc|   null|
-|  root.DB|3600000|
-+---------+-------+
-```
-
-## Device Template
-
-IoTDB supports the device template function, enabling different entities of the same type to share metadata, reduce the memory usage of metadata, and simplify the management of numerous entities and measurements.
-
-### Create Device Template
-
-The SQL syntax for creating a device template is as follows:
-
-```sql
-CREATE DEVICE TEMPLATE <templateName> ALIGNED? '(' <measurementId> <attributeClauses> [',' <measurementId> <attributeClauses>]+ ')'
-```
-
-**Example 1:** Create a template containing two non-aligned timeseries
-
-```shell
-IoTDB> create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY)
-```
-
-**Example 2:** Create a template containing a group of aligned timeseries
-
-```shell
-IoTDB> create device template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla)
-```
-
-The `lat` and `lon` measurements are aligned.
-
-![img](/img/%E6%A8%A1%E6%9D%BF.png)
-
-![img](/img/templateEN.jpg)
-
-### Set Device Template
-
-After a device template is created, it should be set to a specific path before creating related timeseries or inserting data.
-
-**It should be ensured that the related database has been set before setting a template.**
-
-**It is recommended to set the device template to a database path. It is not suggested to set a device template to a path above the database level.**
-
-**It is forbidden to create timeseries under a path that has a device template set. A device template shall not be set on a prefix path of an existing timeseries.**
-
-The SQL statement for setting a device template is as follows:
-
-```shell
-IoTDB> set device template t1 to root.sg1.d1
-```
-
-### Activate Device Template
-
-After setting the device template, with the system enabled to auto create schema, you can insert data into the timeseries. For example, suppose there's a database root.sg1 and t1 has been set to root.sg1.d1, then timeseries like root.sg1.d1.temperature and root.sg1.d1.status are available and data points can be inserted.
-
-**Attention**: Before data is inserted, or if auto create schema is not enabled, the timeseries defined by the device template will not be created. 
You can use the following SQL statement to create the timeseries or activate the templdeviceate, act before inserting data: - -```shell -IoTDB> create timeseries using device template on root.sg1.d1 -``` - -**Example:** Execute the following statement - -```shell -IoTDB> set device template t1 to root.sg1.d1 -IoTDB> set device template t2 to root.sg1.d2 -IoTDB> create timeseries using device template on root.sg1.d1 -IoTDB> create timeseries using device template on root.sg1.d2 -``` - -Show the time series: - -```sql -show timeseries root.sg1.** -```` - -```shell -+-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| -+-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -|root.sg1.d1.temperature| null| root.sg1| FLOAT| RLE| SNAPPY|null| null| null| null| -| root.sg1.d1.status| null| root.sg1| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| -| root.sg1.d2.lon| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| -| root.sg1.d2.lat| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| -+-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -``` - -Show the devices: - -```sql -show devices root.sg1.** -```` - -```shell -+---------------+---------+ -| devices|isAligned| -+---------------+---------+ -| root.sg1.d1| false| -| root.sg1.d2| true| -+---------------+---------+ -```` - -### Show Device Template - -- Show all device templates - -The SQL statement looks like this: - -```shell -IoTDB> show device templates -``` - -The execution result is as follows: - -```shell -+-------------+ -|template name| -+-------------+ -| t2| -| t1| -+-------------+ -``` - -- Show nodes under in device template - -The SQL statement looks like this: - -```shell -IoTDB> show nodes in device template t1 -``` - -The execution result is as follows: - -```shell -+-----------+--------+--------+-----------+ -|child nodes|dataType|encoding|compression| -+-----------+--------+--------+-----------+ -|temperature| FLOAT| RLE| SNAPPY| -| status| BOOLEAN| PLAIN| SNAPPY| -+-----------+--------+--------+-----------+ -``` - -- Show the path prefix where a device template is set - -```shell -IoTDB> show paths set device template t1 -``` - -The execution result is as follows: - -```shell -+-----------+ -|child paths| -+-----------+ -|root.sg1.d1| -+-----------+ -``` - -- Show the path prefix where a device template is used (i.e. the time series has been created) - -```shell -IoTDB> show paths using device template t1 -``` - -The execution result is as follows: - -```shell -+-----------+ -|child paths| -+-----------+ -|root.sg1.d1| -+-----------+ -``` - -### Deactivate device Template - -To delete a group of timeseries represented by device template, namely deactivate the device template, use the following SQL statement: - -```shell -IoTDB> delete timeseries of device template t1 from root.sg1.d1 -``` - -or - -```shell -IoTDB> deactivate device template t1 from root.sg1.d1 -``` - -The deactivation supports batch process. 
-
-```shell
-IoTDB> delete timeseries of device template t1 from root.sg1.*, root.sg2.*
-```
-
-or
-
-```shell
-IoTDB> deactivate device template t1 from root.sg1.*, root.sg2.*
-```
-
-If the template name is not provided in the SQL statement, all template activations on paths matched by the given path pattern will be removed.
-
-### Unset Device Template
-
-The SQL statement for unsetting a device template is as follows:
-
-```shell
-IoTDB> unset device template t1 from root.sg1.d1
-```
-
-**Attention**: It should be guaranteed that none of the timeseries represented by the target device template exists before unsetting it. This can be achieved by the deactivation operation.
-
-### Drop Device Template
-
-The SQL statement for dropping a device template is as follows:
-
-```shell
-IoTDB> drop device template t1
-```
-
-**Attention**: Dropping an already set template is not supported.
-
-### Alter Device Template
-
-In a scenario where measurements need to be added, you can modify the template to add measurements to all devices using the device template.
-
-The SQL statement for altering a device template is as follows:
-
-```shell
-IoTDB> alter device template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY)
-```
-
-**When data is inserted to a device whose prefix path has a device template set, and the insertion contains measurements not present in that device template, those measurements are automatically added to the device template.**
-
-## Timeseries Management
-
-### Create Timeseries
-
-According to the storage model selected before, we can create corresponding timeseries in the two databases respectively. The SQL statements for creating timeseries are as follows:
-
-```
-IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN
-IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE
-IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN
-IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN
-IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN
-IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE
-```
-
-From v0.13, you can use a simplified version of the SQL statements to create timeseries:
-
-```
-IoTDB > create timeseries root.ln.wf01.wt01.status BOOLEAN encoding=PLAIN
-IoTDB > create timeseries root.ln.wf01.wt01.temperature FLOAT encoding=RLE
-IoTDB > create timeseries root.ln.wf02.wt02.hardware TEXT encoding=PLAIN
-IoTDB > create timeseries root.ln.wf02.wt02.status BOOLEAN encoding=PLAIN
-IoTDB > create timeseries root.sgcc.wf03.wt01.status BOOLEAN encoding=PLAIN
-IoTDB > create timeseries root.sgcc.wf03.wt01.temperature FLOAT encoding=RLE
-```
-
-Notice that when the encoding method specified in the CREATE TIMESERIES statement conflicts with the data type, the system gives the corresponding error prompt as shown below:
-
-```
-IoTDB > create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF
-error: encoding TS_2DIFF does not support BOOLEAN
-```
-
-Please refer to [Encoding](../Basic-Concept/Encoding-and-Compression.md) for the correspondence between data types and encodings.
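-
-A compressor can also be specified in the same statement. A minimal sketch, assuming a hypothetical `humidity` measurement under the existing `root.ln.wf02.wt02` device and SNAPPY as the desired compressor (the `compression` keyword is used the same way as in the tag/attribute example later in this chapter):
-
-```
-IoTDB > create timeseries root.ln.wf02.wt02.humidity with datatype=FLOAT, encoding=RLE, compression=SNAPPY
-```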
- -### Create Aligned Timeseries - -The SQL statement for creating a group of timeseries are as follows: - -``` -IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) -``` - -You can set different datatype, encoding, and compression for the timeseries in a group of aligned timeseries - -It is also supported to set an alias, tag, and attribute for aligned timeseries. - -### Delete Timeseries - -To delete the timeseries we created before, we are able to use `(DELETE | DROP) TimeSeries ` statement. - -The usage are as follows: - -``` -IoTDB> delete timeseries root.ln.wf01.wt01.status -IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware -IoTDB> delete timeseries root.ln.wf02.* -IoTDB> drop timeseries root.ln.wf02.* -``` - -### Show Timeseries - -* SHOW LATEST? TIMESERIES pathPattern? whereClause? limitClause? - - There are four optional clauses added in SHOW TIMESERIES, return information of time series - -Timeseries information includes: timeseries path, alias of measurement, database it belongs to, data type, encoding type, compression type, tags and attributes. - -Examples: - -* SHOW TIMESERIES - - presents all timeseries information in JSON form - -* SHOW TIMESERIES <`PathPattern`> - - returns all timeseries information matching the given <`PathPattern`>. SQL statements are as follows: - -``` -IoTDB> show timeseries root.** -IoTDB> show timeseries root.ln.** -``` - -The results are shown below respectively: - -``` -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| -| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| null| null| null| null| -| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -Total line number = 7 -It costs 0.016s - -+-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| -+-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|null| null| null| 
null| -| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| -|root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY|null| null| null| null| -| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| -+-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ -Total line number = 4 -It costs 0.004s -``` - -* SHOW TIMESERIES LIMIT INT OFFSET INT - - returns all the timeseries information start from the offset and limit the number of series returned. For example, - -``` -show timeseries root.ln.** limit 10 offset 10 -``` - -* SHOW TIMESERIES WHERE TIMESERIES contains 'containStr' - - The query result set is filtered by string fuzzy matching based on the names of the timeseries. For example: - -``` -show timeseries root.ln.** where timeseries contains 'wf01.wt' -``` - -The result is shown below: - -``` -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -Total line number = 2 -It costs 0.016s -``` - -* SHOW TIMESERIES WHERE DataType=type - - The query result set is filtered by data type. 
For example:
-
-```
-show timeseries root.ln.** where dataType=FLOAT
-```
-
-The result is shown below:
-
-```
-+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+
-| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters|
-+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+
-|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null|
-| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null|
-| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null|
-+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+
-Total line number = 3
-It costs 0.016s
-
-```
-
-* SHOW LATEST TIMESERIES
-
-  the returned timeseries information is sorted in descending order of the last timestamp of each timeseries
-
-It is worth noting that when the queried path does not exist, the system will return no timeseries.
-
-### Count Timeseries
-
-IoTDB is able to use `COUNT TIMESERIES <Path>` to count the number of timeseries matching the path. SQL statements are as follows:
-
-* `WHERE` condition could be used to fuzzy match a time series name with the following syntax: `COUNT TIMESERIES <Path> WHERE TIMESERIES contains 'containStr'`.
-* `WHERE` condition could be used to filter result by data type with the syntax: `COUNT TIMESERIES <Path> WHERE DATATYPE='<dataType>'`.
-* `WHERE` condition could be used to filter result by tags with the syntax: `COUNT TIMESERIES <Path> WHERE TAGS(key)='value'` or `COUNT TIMESERIES <Path> WHERE TAGS(key) contains 'value'`.
-* `LEVEL` could be defined to count the number of timeseries of each node at the given level in the current Metadata Tree. This could be used to query the number of sensors under each device. The grammar is: `COUNT TIMESERIES <Path> GROUP BY LEVEL=<INTEGER>`.
- - -``` -IoTDB > COUNT TIMESERIES root.** -IoTDB > COUNT TIMESERIES root.ln.** -IoTDB > COUNT TIMESERIES root.ln.*.*.status -IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status -IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' -IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 -IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' -IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' -IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 -``` - -For example, if there are several timeseries (use `show timeseries` to show all timeseries): - -``` -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| -| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| {"unit":"c"}| null| null| null| -| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| {"description":"test1"}| null| null| null| -| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| -| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| -+-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ -Total line number = 7 -It costs 0.004s -``` - -Then the Metadata Tree will be as below: - -
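-
-Sketched from the timeseries listed above (the tree below is illustrative, not CLI output), the Metadata Tree looks roughly like this:
-
-```
-root
-├── ln
-│   ├── wf01 ── wt01 ── {status, temperature}
-│   └── wf02 ── wt02 ── {status, hardware}
-├── sgcc
-│   └── wf03 ── wt01 ── {status, temperature}
-└── turbine
-    └── d1 ── s1
-```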
- -As can be seen, `root` is considered as `LEVEL=0`. So when you enter statements such as: - -``` -IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 -IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 -IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 -``` - -You will get following results: - -``` -+------------+-----------------+ -| column|count(timeseries)| -+------------+-----------------+ -| root.sgcc| 2| -|root.turbine| 1| -| root.ln| 4| -+------------+-----------------+ -Total line number = 3 -It costs 0.002s - -+------------+-----------------+ -| column|count(timeseries)| -+------------+-----------------+ -|root.ln.wf02| 2| -|root.ln.wf01| 2| -+------------+-----------------+ -Total line number = 2 -It costs 0.002s - -+------------+-----------------+ -| column|count(timeseries)| -+------------+-----------------+ -|root.ln.wf01| 2| -+------------+-----------------+ -Total line number = 1 -It costs 0.002s -``` - -> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. - -### Active Timeseries Query -By adding WHERE time filter conditions to the existing SHOW/COUNT TIMESERIES, we can obtain time series with data within the specified time range. - -It is important to note that in metadata queries with time filters, views are not considered; only the time series actually stored in the TsFile are taken into account. - -An example usage is as follows: -``` -IoTDB> insert into root.sg.data(timestamp, s1,s2) values(15000, 1, 2); -IoTDB> insert into root.sg.data2(timestamp, s1,s2) values(15002, 1, 2); -IoTDB> insert into root.sg.data3(timestamp, s1,s2) values(16000, 1, 2); -IoTDB> show timeseries; -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -| root.sg.data.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -| root.sg.data.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data3.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data3.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data2.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data2.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ - -IoTDB> show timeseries where time >= 15000 and time < 16000; -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ -| root.sg.data.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -| root.sg.data.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data2.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -|root.sg.data2.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| -+----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ - -``` 
-Regarding the definition of active time series, data that can be queried normally is considered active, meaning time series that have been inserted but deleted are not included. -### Tag and Attribute Management - -We can also add an alias, extra tag and attribute information while creating one timeseries. - -The differences between tag and attribute are: - -* Tag could be used to query the path of timeseries, we will maintain an inverted index in memory on the tag: Tag -> Timeseries -* Attribute could only be queried by timeseries path : Timeseries -> Attribute - -The SQL statements for creating timeseries with extra tag and attribute information are extended as follows: - -``` -create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) -``` - -The `temprature` in the brackets is an alias for the sensor `s1`. So we can use `temprature` to replace `s1` anywhere. - -> IoTDB also supports using AS function to set alias. The difference between the two is: the alias set by the AS function is used to replace the whole time series name, temporary and not bound with the time series; while the alias mentioned above is only used as the alias of the sensor, which is bound with it and can be used equivalent to the original sensor name. - -> Notice that the size of the extra tag and attribute information shouldn't exceed the `tag_attribute_total_size`. - -We can update the tag information after creating it as following: - -* Rename the tag/attribute key - -``` -ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 -``` - -* Reset the tag/attribute value - -``` -ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 -``` - -* Delete the existing tag/attribute - -``` -ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 -``` - -* Add new tags - -``` -ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 -``` - -* Add new attributes - -``` -ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 -``` - -* Upsert alias, tags and attributes - -> add alias or a new key-value if the alias or key doesn't exist, otherwise, update the old one with new value. - -``` -ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag3=v3, tag4=v4) ATTRIBUTES(attr3=v3, attr4=v4) -``` - -* Show timeseries using tags. Use TAGS(tagKey) to identify the tags used as filter key - -``` -SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause -``` - -returns all the timeseries information that satisfy the where condition and match the pathPattern. 
SQL statements are as follows: - -``` -ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c -ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 -show timeseries root.ln.** where TAGS(unit)='c' -show timeseries root.ln.** where TAGS(description) contains 'test1' -``` - -The results are shown below respectly: - -``` -+--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| -+--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ -|root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|{"unit":"c"}| null| null| null| -+--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ -Total line number = 1 -It costs 0.005s - -+------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| -+------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ -|root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|{"description":"test1"}| null| null| null| -+------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ -Total line number = 1 -It costs 0.004s -``` - -- count timeseries using tags - -``` -COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause -COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause GROUP BY LEVEL= -``` - -returns all the number of timeseries that satisfy the where condition and match the pathPattern. SQL statements are as follows: - -``` -count timeseries -count timeseries root.** where TAGS(unit)='c' -count timeseries root.** where TAGS(unit)='c' group by level = 2 -``` - -The results are shown below respectly : - -``` -IoTDB> count timeseries -+-----------------+ -|count(timeseries)| -+-----------------+ -| 6| -+-----------------+ -Total line number = 1 -It costs 0.019s -IoTDB> count timeseries root.** where TAGS(unit)='c' -+-----------------+ -|count(timeseries)| -+-----------------+ -| 2| -+-----------------+ -Total line number = 1 -It costs 0.020s -IoTDB> count timeseries root.** where TAGS(unit)='c' group by level = 2 -+--------------+-----------------+ -| column|count(timeseries)| -+--------------+-----------------+ -| root.ln.wf02| 2| -| root.ln.wf01| 0| -|root.sgcc.wf03| 0| -+--------------+-----------------+ -Total line number = 3 -It costs 0.011s -``` - -> Notice that, we only support one condition in the where clause. Either it's an equal filter or it is an `contains` filter. In both case, the property in the where condition must be a tag. 
- -create aligned timeseries - -``` -create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) -``` - -The execution result is as follows: - -``` -IoTDB> show timeseries -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| -|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -``` - -Support query: - -``` -IoTDB> show timeseries where TAGS(tag1)='v1' -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| -+--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ -``` - -The above operations are supported for timeseries tag, attribute updates, etc. - -## Node Management - -### Show Child Paths - -``` -SHOW CHILD PATHS pathPattern -``` - -Return all child paths and their node types of all the paths matching pathPattern. - -node types: ROOT -> DB INTERNAL -> DATABASE -> INTERNAL -> DEVICE -> TIMESERIES - - -Example: - -* return the child paths of root.ln:show child paths root.ln - -``` -+------------+----------+ -| child paths|node types| -+------------+----------+ -|root.ln.wf01| INTERNAL| -|root.ln.wf02| INTERNAL| -+------------+----------+ -Total line number = 2 -It costs 0.002s -``` - -> get all paths in form of root.xx.xx.xx:show child paths root.xx.xx - -### Show Child Nodes - -``` -SHOW CHILD NODES pathPattern -``` - -Return all child nodes of the pathPattern. - -Example: - -* return the child nodes of root:show child nodes root - -``` -+------------+ -| child nodes| -+------------+ -| ln| -+------------+ -``` - -* return the child nodes of root.ln:show child nodes root.ln - -``` -+------------+ -| child nodes| -+------------+ -| wf01| -| wf02| -+------------+ -``` - -### Count Nodes - -IoTDB is able to use `COUNT NODES LEVEL=` to count the number of nodes at - the given level in current Metadata Tree considering a given pattern. IoTDB will find paths that - match the pattern and counts distinct nodes at the specified level among the matched paths. - This could be used to query the number of devices with specified measurements. 
The usage are as - follows: - -``` -IoTDB > COUNT NODES root.** LEVEL=2 -IoTDB > COUNT NODES root.ln.** LEVEL=2 -IoTDB > COUNT NODES root.ln.wf01.** LEVEL=3 -IoTDB > COUNT NODES root.**.temperature LEVEL=3 -``` - -As for the above mentioned example and Metadata tree, you can get following results: - -``` -+------------+ -|count(nodes)| -+------------+ -| 4| -+------------+ -Total line number = 1 -It costs 0.003s - -+------------+ -|count(nodes)| -+------------+ -| 2| -+------------+ -Total line number = 1 -It costs 0.002s - -+------------+ -|count(nodes)| -+------------+ -| 1| -+------------+ -Total line number = 1 -It costs 0.002s - -+------------+ -|count(nodes)| -+------------+ -| 2| -+------------+ -Total line number = 1 -It costs 0.002s -``` - -> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. - -### Show Devices - -* SHOW DEVICES pathPattern? (WITH DATABASE)? devicesWhereClause? limitClause? - -Similar to `Show Timeseries`, IoTDB also supports two ways of viewing devices: - -* `SHOW DEVICES` statement presents all devices' information, which is equal to `SHOW DEVICES root.**`. -* `SHOW DEVICES ` statement specifies the `PathPattern` and returns the devices information matching the pathPattern and under the given level. -* `WHERE` condition supports `DEVICE contains 'xxx'` to do a fuzzy query based on the device name. - -SQL statement is as follows: - -``` -IoTDB> show devices -IoTDB> show devices root.ln.** -IoTDB> show devices root.ln.** where device contains 't' -``` - -You can get results below: - -``` -+-------------------+---------+ -| devices|isAligned| -+-------------------+---------+ -| root.ln.wf01.wt01| false| -| root.ln.wf02.wt02| false| -|root.sgcc.wf03.wt01| false| -| root.turbine.d1| false| -+-------------------+---------+ -Total line number = 4 -It costs 0.002s - -+-----------------+---------+ -| devices|isAligned| -+-----------------+---------+ -|root.ln.wf01.wt01| false| -|root.ln.wf02.wt02| false| -+-----------------+---------+ -Total line number = 2 -It costs 0.001s -``` - -`isAligned` indicates whether the timeseries under the device are aligned. - -To view devices' information with database, we can use `SHOW DEVICES WITH DATABASE` statement. - -* `SHOW DEVICES WITH DATABASE` statement presents all devices' information with their database. -* `SHOW DEVICES WITH DATABASE` statement specifies the `PathPattern` and returns the - devices' information under the given level with their database information. - -SQL statement is as follows: - -``` -IoTDB> show devices with database -IoTDB> show devices root.ln.** with database -``` - -You can get results below: - -``` -+-------------------+-------------+---------+ -| devices| database|isAligned| -+-------------------+-------------+---------+ -| root.ln.wf01.wt01| root.ln| false| -| root.ln.wf02.wt02| root.ln| false| -|root.sgcc.wf03.wt01| root.sgcc| false| -| root.turbine.d1| root.turbine| false| -+-------------------+-------------+---------+ -Total line number = 4 -It costs 0.003s - -+-----------------+-------------+---------+ -| devices| database|isAligned| -+-----------------+-------------+---------+ -|root.ln.wf01.wt01| root.ln| false| -|root.ln.wf02.wt02| root.ln| false| -+-----------------+-------------+---------+ -Total line number = 2 -It costs 0.001s -``` - -### Count Devices - -* COUNT DEVICES / - -The above statement is used to count the number of devices. 
At the same time, it is allowed to specify `PathPattern` to count the number of devices matching the `PathPattern`. - -SQL statement is as follows: - -``` -IoTDB> show devices -IoTDB> count devices -IoTDB> count devices root.ln.** -``` - -You can get results below: - -``` -+-------------------+---------+ -| devices|isAligned| -+-------------------+---------+ -|root.sgcc.wf03.wt03| false| -| root.turbine.d1| false| -| root.ln.wf02.wt02| false| -| root.ln.wf01.wt01| false| -+-------------------+---------+ -Total line number = 4 -It costs 0.024s - -+--------------+ -|count(devices)| -+--------------+ -| 4| -+--------------+ -Total line number = 1 -It costs 0.004s - -+--------------+ -|count(devices)| -+--------------+ -| 2| -+--------------+ -Total line number = 1 -It costs 0.004s -``` - -### Active Device Query -Similar to active timeseries query, we can add time filter conditions to device viewing and statistics to query active devices that have data within a certain time range. The definition of active here is the same as for active time series. An example usage is as follows: -``` -IoTDB> insert into root.sg.data(timestamp, s1,s2) values(15000, 1, 2); -IoTDB> insert into root.sg.data2(timestamp, s1,s2) values(15002, 1, 2); -IoTDB> insert into root.sg.data3(timestamp, s1,s2) values(16000, 1, 2); -IoTDB> show devices; -+-------------------+---------+ -| devices|isAligned| -+-------------------+---------+ -| root.sg.data| false| -| root.sg.data2| false| -| root.sg.data3| false| -+-------------------+---------+ - -IoTDB> show devices where time >= 15000 and time < 16000; -+-------------------+---------+ -| devices|isAligned| -+-------------------+---------+ -| root.sg.data| false| -| root.sg.data2| false| -+-------------------+---------+ - -IoTDB> count devices where time >= 15000 and time < 16000; -+--------------+ -|count(devices)| -+--------------+ -| 2| -+--------------+ -``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Operator-and-Expression.md b/src/UserGuide/V1.3.0-2/User-Manual/Operator-and-Expression.md deleted file mode 100644 index 64626eaf5..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Operator-and-Expression.md +++ /dev/null @@ -1,599 +0,0 @@ - - -# Operator and Expression - -This chapter describes the operators and functions supported by IoTDB. IoTDB provides a wealth of built-in operators and functions to meet your computing needs, and supports extensions through the [User-Defined Function](../Reference/UDF-Libraries.md). - -A list of all available functions, both built-in and custom, can be displayed with `SHOW FUNCTIONS` command. - -See the documentation [Select-Expression](../Reference/Function-and-Expression.md#selector-functions) for the behavior of operators and functions in SQL. - -## OPERATORS - -### Arithmetic Operators - -| Operator | Meaning | -| -------- | ------------------------- | -| `+` | positive (unary operator) | -| `-` | negative (unary operator) | -| `*` | multiplication | -| `/` | division | -| `%` | modulo | -| `+` | addition | -| `-` | subtraction | - -For details and examples, see the document [Arithmetic Operators and Functions](../Reference/Function-and-Expression.md#arithmetic-functions). - -### Comparison Operators - -| Operator | Meaning | -| ------------------------- | ------------------------------------ | -| `>` | greater than | -| `>=` | greater than or equal to | -| `<` | less than | -| `<=` | less than or equal to | -| `==` | equal to | -| `!=` / `<>` | not equal to | -| `BETWEEN ... 
AND ...` | within the specified range | -| `NOT BETWEEN ... AND ...` | not within the specified range | -| `LIKE` | match simple pattern | -| `NOT LIKE` | cannot match simple pattern | -| `REGEXP` | match regular expression | -| `NOT REGEXP` | cannot match regular expression | -| `IS NULL` | is null | -| `IS NOT NULL` | is not null | -| `IN` / `CONTAINS` | is a value in the specified list | -| `NOT IN` / `NOT CONTAINS` | is not a value in the specified list | - -For details and examples, see the document [Comparison Operators and Functions](../Reference/Function-and-Expression.md#comparison-operators-and-functions). - -### Logical Operators - -| Operator | Meaning | -| --------------------------- | --------------------------------- | -| `NOT` / `!` | logical negation (unary operator) | -| `AND` / `&` / `&&` | logical AND | -| `OR`/ | / || | logical OR | - -For details and examples, see the document [Logical Operators](../Reference/Function-and-Expression.md#logical-operators). - -### Operator Precedence - -The precedence of operators is arranged as shown below from high to low, and operators on the same row have the same precedence. - -```sql -!, - (unary operator), + (unary operator) -*, /, DIV, %, MOD --, + -=, ==, <=>, >=, >, <=, <, <>, != -LIKE, REGEXP, NOT LIKE, NOT REGEXP -BETWEEN ... AND ..., NOT BETWEEN ... AND ... -IS NULL, IS NOT NULL -IN, CONTAINS, NOT IN, NOT CONTAINS -AND, &, && -OR, |, || -``` - -## BUILT-IN FUNCTIONS - -The built-in functions can be used in IoTDB without registration, and the functions in the data quality function library need to be registered by referring to the registration steps in the next chapter before they can be used. - -### Aggregate Functions - -| Function Name | Description | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | -| ------------- | ------------------------------------------------------------ | ------------------------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------| -| SUM | Summation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| COUNT | Counts the number of data points. | All types | / | INT | -| AVG | Average. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| STDDEV | Alias for STDDEV_SAMP. Return the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| STDDEV_POP | Return the population standard deviation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| STDDEV_SAMP | Return the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| VARIANCE | Alias for VAR_SAMP. Return the sample variance. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| VAR_POP | Return the population variance. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| VAR_SAMP | Return the sample variance. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | -| EXTREME | Finds the value with the largest absolute value. Returns a positive value if the maximum absolute value of positive and negative values is equal. 
| INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | -| MAX_VALUE | Find the maximum value. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | -| MIN_VALUE | Find the minimum value. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | -| FIRST_VALUE | Find the value with the smallest timestamp. | All data types | / | Consistent with input data type | -| LAST_VALUE | Find the value with the largest timestamp. | All data types | / | Consistent with input data type | -| MAX_TIME | Find the maximum timestamp. | All data Types | / | Timestamp | -| MIN_TIME | Find the minimum timestamp. | All data Types | / | Timestamp | -| COUNT_IF | Find the number of data points that continuously meet a given condition and the number of data points that meet the condition (represented by keep) meet the specified threshold. | BOOLEAN | `[keep >=/>/=/!=/= threshold` if `threshold` is used alone, type of `threshold` is `INT64` `ignoreNull`:Optional, default value is `true`;If the value is `true`, null values are ignored, it means that if there is a null value in the middle, the value is ignored without interrupting the continuity. If the value is `true`, null values are not ignored, it means that if there are null values in the middle, continuity will be broken | INT64 | -| TIME_DURATION | Find the difference between the timestamp of the largest non-null value and the timestamp of the smallest non-null value in a column | All data Types | / | INT64 | -| MODE | Find the mode. Note: 1.Having too many different values in the input series risks a memory exception; 2.If all the elements have the same number of occurrences, that is no Mode, return the value with earliest time; 3.If there are many Modes, return the Mode with earliest time. | All data Types | / | Consistent with the input data type | -| MAX_BY | MAX_BY(x, y) returns the value of x corresponding to the maximum value of the input y. MAX_BY(time, x) returns the timestamp when x is at its maximum value. | The first input x can be of any type, while the second input y must be of type INT32, INT64, FLOAT, or DOUBLE. | / | Consistent with the data type of the first input x. | -| MIN_BY | MIN_BY(x, y) returns the value of x corresponding to the minimum value of the input y. MIN_BY(time, x) returns the timestamp when x is at its minimum value. | The first input x can be of any type, while the second input y must be of type INT32, INT64, FLOAT, or DOUBLE. | / | Consistent with the data type of the first input x. | - -For details and examples, see the document [Aggregate Functions](../Reference/Function-and-Expression.md#aggregate-functions). 
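-
-As a quick illustration, aggregate functions are used directly in the SELECT clause. A minimal sketch against the `root.ln.wf01.wt01` timeseries created earlier in this chapter (results depend on the data actually inserted):
-
-```
-IoTDB > select count(status), avg(temperature), max_value(temperature) from root.ln.wf01.wt01
-IoTDB > select max_by(time, temperature), min_time(temperature) from root.ln.wf01.wt01
-```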
- -### Arithmetic Functions - -| Function Name | Allowed Input Series Data Types | Output Series Data Type | Required Attributes | Corresponding Implementation in the Java Standard Library | -| ------------- | ------------------------------- | ----------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -| SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sin(double) | -| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cos(double) | -| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tan(double) | -| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#asin(double) | -| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#acos(double) | -| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#atan(double) | -| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sinh(double) | -| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cosh(double) | -| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tanh(double) | -| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toDegrees(double) | -| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toRadians(double) | -| ABS | INT32 / INT64 / FLOAT / DOUBLE | Same type as the input series | / | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | -| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#signum(double) | -| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#ceil(double) | -| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#floor(double) | -| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | 'places' : Round the significant number, positive number is the significant number after the decimal point, negative number is the significant number of whole number | Math#rint(Math#pow(10,places))/Math#pow(10,places) | -| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#exp(double) | -| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log(double) | -| LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log10(double) | -| SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sqrt(double) | - -For details and examples, see the document [Arithmetic Operators and Functions](../Reference/Function-and-Expression.md#arithmetic-operators-and-functions). - -### Comparison Functions - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| ------------- | ------------------------------- | ----------------------------------------- | ----------------------- | --------------------------------------------- | -| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`: a double type variate | BOOLEAN | Return `ts_value >= threshold`. | -| IN_RANGR | INT32 / INT64 / FLOAT / DOUBLE | `lower`: DOUBLE type `upper`: DOUBLE type | BOOLEAN | Return `ts_value >= lower && value <= upper`. | - -For details and examples, see the document [Comparison Operators and Functions](../Reference/Function-and-Expression.md#comparison-operators-and-functions). 
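-
-A minimal sketch of applying these functions in a query, again assuming the `root.ln.wf01.wt01` timeseries from this chapter; the `threshold` attribute of `ON_OFF` is passed in the `'key'='value'` form used for function attributes:
-
-```
-IoTDB > select temperature, abs(temperature), sin(temperature) from root.ln.wf01.wt01
-IoTDB > select on_off(temperature, 'threshold'='20') from root.ln.wf01.wt01
-```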
- -### String Processing Functions - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| --------------- | ------------------------------- | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | -| STRING_CONTAINS | TEXT | `s`: string to search for | BOOLEAN | Checks whether the substring `s` exists in the string. | -| STRING_MATCHES | TEXT | `regex`: Java standard library-style regular expressions. | BOOLEAN | Judges whether a string can be matched by the regular expression `regex`. | -| LENGTH | TEXT | / | INT32 | Get the length of input series. | -| LOCATE | TEXT | `target`: The substring to be located.
`reverse`: Indicates whether reverse locate is required. The default value is `false`, means left-to-right locate. | INT32 | Get the position of the first occurrence of substring `target` in input series. Returns -1 if there are no `target` in input. | -| STARTSWITH | TEXT | `target`: The prefix to be checked. | BOOLEAN | Check whether input series starts with the specified prefix `target`. | -| ENDSWITH | TEXT | `target`: The suffix to be checked. | BOOLEAN | Check whether input series ends with the specified suffix `target`. | -| CONCAT | TEXT | `targets`: a series of K-V, key needs to start with `target` and be not duplicated, value is the string you want to concat.
`series_behind`: Indicates whether series behind targets. The default value is `false`. | TEXT | Concatenate input string and `target` string. | -| SUBSTRING | TEXT | `from`: Indicates the start position of substring.
`for`: Indicates how many characters to stop after of substring. | TEXT | Extracts a substring of a string, starting with the first specified character and stopping after the specified number of characters.The index start at 1. | -| REPLACE | TEXT | first parameter: The target substring to be replaced.
second parameter: The substring to replace with. | TEXT | Replace a substring in the input sequence with the target substring. | -| UPPER | TEXT | / | TEXT | Get the string of input series with all characters changed to uppercase. | -| LOWER | TEXT | / | TEXT | Get the string of input series with all characters changed to lowercase. | -| TRIM | TEXT | / | TEXT | Get the string whose value is same to input series, with all leading and trailing space removed. | -| STRCMP | TEXT | / | TEXT | Get the compare result of two input series. Returns `0` if series value are the same, a `negative integer` if value of series1 is smaller than series2,
a `positive integer` if value of series1 is more than series2. | - - -For details and examples, see the document [String Processing](../Reference/Function-and-Expression.md#string-processing). - -### Data Type Conversion Function - -| Function Name | Required Attributes | Output Series Data Type | Description | -| ------------- | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | -| CAST | `type`: Output data type, INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | determined by `type` | Convert the data to the type specified by the `type` parameter. | - -For details and examples, see the document [Data Type Conversion Function](../Reference/Function-and-Expression.md#data-type-conversion-function). - -### Constant Timeseries Generating Functions - -| Function Name | Required Attributes | Output Series Data Type | Description | -| ------------- | ------------------------------------------------------------ | -------------------------------------------- | ------------------------------------------------------------ | -| CONST | `value`: the value of the output data point `type`: the type of the output data point, it can only be INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | Determined by the required attribute `type` | Output the user-specified constant timeseries according to the attributes `value` and `type`. | -| PI | None | DOUBLE | Data point value: a `double` value of `π`, the ratio of the circumference of a circle to its diameter, which is equals to `Math.PI` in the *Java Standard Library*. | -| E | None | DOUBLE | Data point value: a `double` value of `e`, the base of the natural logarithms, which is equals to `Math.E` in the *Java Standard Library*. | - -For details and examples, see the document [Constant Timeseries Generating Functions](../Reference/Function-and-Expression.md#constant-timeseries-generating-functions). - -### Selector Functions - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| ------------- | ------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | -| TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the largest values in a time series. | -| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the smallest values in a time series. | - -For details and examples, see the document [Selector Functions](../Reference/Function-and-Expression.md#selector-functions). 
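-
-A minimal sketch combining a string processing function, the type conversion function, and a selector function, with their required attributes passed in the `'key'='value'` form and assuming the timeseries used earlier in this chapter:
-
-```
-IoTDB > select string_contains(hardware, 's'='v1') from root.ln.wf02.wt02
-IoTDB > select cast(temperature, 'type'='INT32') from root.ln.wf01.wt01
-IoTDB > select top_k(temperature, 'k'='3') from root.ln.wf01.wt01
-```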
- -### Continuous Interval Functions - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| ----------------- | ------------------------------------ | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | -| ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always 0(false), and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | -| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always not 0, and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | -| ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always 0(false). Data points number `n` satisfy `n >= min && n <= max` | -| NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always not 0(false). Data points number `n` satisfy `n >= min && n <= max` | - -For details and examples, see the document [Continuous Interval Functions](../Reference/Function-and-Expression.md#continuous-interval-functions). - -### Variation Trend Calculation Functions - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| ----------------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | -| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | / | INT64 | Calculates the difference between the time stamp of a data point and the time stamp of the previous data point. There is no corresponding output for the first data point. | -| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. | -| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the absolute value of the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. | -| DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the rate of change of a data point compared to the previous data point, the result is equals to DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. | -| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the absolute value of the rate of change of a data point compared to the previous data point, the result is equals to NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE. 
There is no corresponding output for the first data point. |
| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`: optional, default is true. If it is true, a null previous data point is ignored and the search continues forward to the first non-null value. If it is false, a null previous data point is not ignored; the result is also null because null is used for subtraction | DOUBLE | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point, so the output is null |

For details and examples, see the document [Variation Trend Calculation Functions](../Reference/Function-and-Expression.md#variation-trend-calculation-functions).

### Sample Functions

| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description |
| -------------------------------- | ------------------------------- | ------------------------------------------------------------ | ------------------------------ | ------------------------------------------------------------ |
| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns a random sample of equal buckets that matches the sampling ratio |
| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1`
`type`: The value types are `avg`, `max`, `min`, `sum`, `extreme`, `variance`, the default is `avg` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket aggregation samples that match the sampling ratio | -| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket M4 samples that match the sampling ratio | -| EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | The value range of `proportion` is `(0, 1]`, the default is `0.1`
The value of `type` is `avg` or `stendis` or `cos` or `prenextdis`, the default is `avg`
The value of `number` should be greater than 0, the default is `3` | INT32 / INT64 / FLOAT / DOUBLE | Returns outlier samples in equal buckets that match the sampling ratio and the number of samples in the bucket | -| M4 | INT32 / INT64 / FLOAT / DOUBLE | Different attributes used by the size window and the time window. The size window uses attributes `windowSize` and `slidingStep`. The time window uses attributes `timeInterval`, `slidingStep`, `displayWindowBegin`, and `displayWindowEnd`. More details see below. | INT32 / INT64 / FLOAT / DOUBLE | Returns the `first, last, bottom, top` points in each sliding window. M4 sorts and deduplicates the aggregated points within the window before outputting them. | - -For details and examples, see the document [Sample Functions](../Reference/Function-and-Expression.md#sample-functions). - -### Change Points Function - -| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | -| ------------- | ------------------------------- | ------------------- | ----------------------------- | ----------------------------------------------------------- | -| CHANGE_POINTS | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Remove consecutive identical values from an input sequence. | - -For details and examples, see the document [Time-Series](../Reference/Function-and-Expression.md#time-series-processing). - -## DATA QUALITY FUNCTION LIBRARY - -### About - -For applications based on time series data, data quality is vital. **UDF Library** is IoTDB User Defined Functions (UDF) about data quality, including data profiling, data quality evalution and data repairing. It effectively meets the demand for data quality in the industrial field. - -### Quick Start - -The functions in this function library are not built-in functions, and must be loaded into the system before use. - -1. [Download](https://archive.apache.org/dist/iotdb/1.0.1/apache-iotdb-1.0.1-library-udf-bin.zip) the JAR with all dependencies and the script of registering UDF. -2. Copy the JAR package to `ext\udf` under the directory of IoTDB system (Please put JAR to this directory of all DataNodes if you use Cluster). -3. Run `sbin\start-confignode.bat` and then `sbin\start-datanode.bat` (for Windows) or `sbin\start-confignode.sh` and `sbin\start-datanode.sh` (for Linux or MacOS) to start IoTDB server. -4. Copy the script to the directory of IoTDB system (under the root directory, at the same level as `sbin`), modify the parameters in the script if needed and run it to register UDF. - -### Implemented Functions - -1. Data Quality related functions, such as `Completeness`. For details and examples, see the document [Data-Quality](../Reference/UDF-Libraries.md#data-quality). -2. Data Profiling related functions, such as `ACF`. For details and examples, see the document [Data-Profiling](../Reference/UDF-Libraries.md#data-profiling). -3. Anomaly Detection related functions, such as `IQR`. For details and examples, see the document [Anomaly-Detection](../Reference/UDF-Libraries.md#anomaly-detection). -4. Frequency Domain Analysis related functions, such as `Conv`. For details and examples, see the document [Frequency-Domain](../Reference/UDF-Libraries.md#frequency-domain-analysis). -5. Data Matching related functions, such as `DTW`. For details and examples, see the document [Data-Matching](../Reference/UDF-Libraries.md#data-matching). -6. Data Repairing related functions, such as `TimestampRepair`. 
For details and examples, see the document [Data-Repairing](../Reference/UDF-Libraries.md#timestamprepair).
7. Series Discovery related functions, such as `ConsecutiveSequences`. For details and examples, see the document [Series-Discovery](../Reference/UDF-Libraries.md).
8. Machine Learning related functions, such as `AR`. For details and examples, see the document [Machine-Learning](../Reference/UDF-Libraries.md#machine-learning).

## LAMBDA EXPRESSION

| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Series Data Type Description |
| ------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------- | ------------------------------------------------------------ |
| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr` is a lambda expression that supports one or more arguments in the form `x -> {...}` or `(x, y, z) -> {...}`, e.g. `x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | Returns the input time series transformed by a lambda expression |

For details and examples, see the document [Lambda](../Reference/Function-and-Expression.md#lambda-expression).

## CONDITIONAL EXPRESSION

| Expression Name | Description |
| --------------- | -------------------- |
| `CASE` | similar to "if else" |

For details and examples, see the document [Conditional Expressions](../Reference/Function-and-Expression.md#conditional-expressions).

## SELECT EXPRESSION

The `SELECT` clause specifies the output of the query, consisting of several `selectExpr`. Each `selectExpr` defines one or more columns in the query result.

**`selectExpr` is an expression consisting of time series path suffixes, constants, functions, and operators. That is, `selectExpr` can contain:**

- Time series path suffix (wildcards are supported)
- Operator
  - Arithmetic operators
  - Comparison operators
  - Logical operators
- Function
  - Aggregate functions
  - Time series generation functions (including built-in functions and user-defined functions)
- Constant

### Use Alias

Because of IoTDB's unique data model, every sensor in a query result is prefixed with additional path information such as the device name. When only one specific device is queried, these repeated prefixes are redundant and make the result set harder to read. In this case, the `AS` keyword provided by IoTDB can be used to assign an alias to a time series selected in the query.

For example:

```sql
select s1 as temperature, s2 as speed from root.ln.wf01.wt01;
```

The result set is:

| Time | temperature | speed |
| ---- | ----------- | ----- |
| ... | ... | ... |

### Operator

See this documentation for a list of operators supported in IoTDB.

### Function

#### Aggregate Functions

Aggregate functions are many-to-one functions. They perform aggregate calculations on a set of values, resulting in a single aggregated result.

**A query that contains an aggregate function is called an aggregate query**, otherwise, it is called a time series query.

> Please note that mixed use of `Aggregate Query` and `Timeseries Query` is not allowed. Below are examples for queries that are not allowed.
-> -> ``` -> select a, count(a) from root.sg -> select sin(a), count(a) from root.sg -> select a, count(a) from root.sg group by ([10,100),10ms) -> ``` - -For the aggregation functions supported by IoTDB, see the document [Aggregate Functions](../Reference/Function-and-Expression.md#aggregate-functions). - - -#### Time Series Generation Function - -A time series generation function takes several raw time series as input and produces a list of time series as output. Unlike aggregate functions, time series generators have a timestamp column in their result sets. - -All time series generation functions accept * as input, and all can be mixed with raw time series queries. - -##### Built-in Time Series Generation Functions - -See this documentation for a list of built-in functions supported in IoTDB. - -##### User-Defined Time Series Generation Functions - -IoTDB supports function extension through User Defined Function (click for [User-Defined Function](./Database-Programming.md#udtfuser-defined-timeseries-generating-function)) capability. - -### Nested Expressions - -IoTDB supports the calculation of arbitrary nested expressions. Since time series query and aggregation query can not be used in a query statement at the same time, we divide nested expressions into two types, which are nested expressions with time series query and nested expressions with aggregation query. - -The following is the syntax definition of the `select` clause: - -```sql -selectClause - : SELECT resultColumn (',' resultColumn)* - ; - -resultColumn - : expression (AS ID)? - ; - -expression - : '(' expression ')' - | '-' expression - | expression ('*' | '/' | '%') expression - | expression ('+' | '-') expression - | functionName '(' expression (',' expression)* functionAttribute* ')' - | timeSeriesSuffixPath - | number - ; -``` - -#### Nested Expressions with Time Series Query - -IoTDB supports the calculation of arbitrary nested expressions consisting of **numbers, time series, time series generating functions (including user-defined functions) and arithmetic expressions** in the `select` clause. 
- -##### Example - -Input1: - -```sql -select a, - b, - ((a + 1) * 2 - 1) % 2 + 1.5, - sin(a + sin(a + sin(b))), - -(a + b) * (sin(a + b) * sin(a + b) + cos(a + b) * cos(a + b)) + 1 -from root.sg1; -``` - -Result1: - -``` -+-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Time|root.sg1.a|root.sg1.b|((((root.sg1.a + 1) * 2) - 1) % 2) + 1.5|sin(root.sg1.a + sin(root.sg1.a + sin(root.sg1.b)))|(-root.sg1.a + root.sg1.b * ((sin(root.sg1.a + root.sg1.b) * sin(root.sg1.a + root.sg1.b)) + (cos(root.sg1.a + root.sg1.b) * cos(root.sg1.a + root.sg1.b)))) + 1| -+-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ -|1970-01-01T08:00:00.010+08:00| 1| 1| 2.5| 0.9238430524420609| -1.0| -|1970-01-01T08:00:00.020+08:00| 2| 2| 2.5| 0.7903505371876317| -3.0| -|1970-01-01T08:00:00.030+08:00| 3| 3| 2.5| 0.14065207680386618| -5.0| -|1970-01-01T08:00:00.040+08:00| 4| null| 2.5| null| null| -|1970-01-01T08:00:00.050+08:00| null| 5| null| null| null| -|1970-01-01T08:00:00.060+08:00| 6| 6| 2.5| -0.7288037411970916| -11.0| -+-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ -Total line number = 6 -It costs 0.048s -``` - -Input2: - -```sql -select (a + b) * 2 + sin(a) from root.sg -``` - -Result2: - -``` -+-----------------------------+----------------------------------------------+ -| Time|((root.sg.a + root.sg.b) * 2) + sin(root.sg.a)| -+-----------------------------+----------------------------------------------+ -|1970-01-01T08:00:00.010+08:00| 59.45597888911063| -|1970-01-01T08:00:00.020+08:00| 100.91294525072763| -|1970-01-01T08:00:00.030+08:00| 139.01196837590714| -|1970-01-01T08:00:00.040+08:00| 180.74511316047935| -|1970-01-01T08:00:00.050+08:00| 219.73762514629607| -|1970-01-01T08:00:00.060+08:00| 259.6951893788978| -|1970-01-01T08:00:00.070+08:00| 300.7738906815579| -|1970-01-01T08:00:00.090+08:00| 39.45597888911063| -|1970-01-01T08:00:00.100+08:00| 39.45597888911063| -+-----------------------------+----------------------------------------------+ -Total line number = 9 -It costs 0.011s -``` - -Input3: - -```sql -select (a + *) / 2 from root.sg1 -``` - -Result3: - -``` -+-----------------------------+-----------------------------+-----------------------------+ -| Time|(root.sg1.a + root.sg1.a) / 2|(root.sg1.a + root.sg1.b) / 2| -+-----------------------------+-----------------------------+-----------------------------+ -|1970-01-01T08:00:00.010+08:00| 1.0| 1.0| -|1970-01-01T08:00:00.020+08:00| 2.0| 2.0| -|1970-01-01T08:00:00.030+08:00| 3.0| 3.0| -|1970-01-01T08:00:00.040+08:00| 4.0| null| -|1970-01-01T08:00:00.060+08:00| 6.0| 6.0| -+-----------------------------+-----------------------------+-----------------------------+ -Total line number = 5 -It costs 0.011s -``` - -Input4: - -```sql -select (a + b) * 3 from root.sg, root.ln -``` - -Result4: - 
-``` -+-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ -| Time|(root.sg.a + root.sg.b) * 3|(root.sg.a + root.ln.b) * 3|(root.ln.a + root.sg.b) * 3|(root.ln.a + root.ln.b) * 3| -+-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ -|1970-01-01T08:00:00.010+08:00| 90.0| 270.0| 360.0| 540.0| -|1970-01-01T08:00:00.020+08:00| 150.0| 330.0| 690.0| 870.0| -|1970-01-01T08:00:00.030+08:00| 210.0| 450.0| 570.0| 810.0| -|1970-01-01T08:00:00.040+08:00| 270.0| 240.0| 690.0| 660.0| -|1970-01-01T08:00:00.050+08:00| 330.0| null| null| null| -|1970-01-01T08:00:00.060+08:00| 390.0| null| null| null| -|1970-01-01T08:00:00.070+08:00| 450.0| null| null| null| -|1970-01-01T08:00:00.090+08:00| 60.0| null| null| null| -|1970-01-01T08:00:00.100+08:00| 60.0| null| null| null| -+-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ -Total line number = 9 -It costs 0.014s -``` - -##### Explanation - -- Only when the left operand and the right operand under a certain timestamp are not `null`, the nested expressions will have an output value. Otherwise this row will not be included in the result. - - In Result1 of the Example part, the value of time series `root.sg.a` at time 40 is 4, while the value of time series `root.sg.b` is `null`. So at time 40, the value of nested expressions `(a + b) * 2 + sin(a)` is `null`. So in Result2, this row is not included in the result. -- If one operand in the nested expressions can be translated into multiple time series (For example, `*`), the result of each time series will be included in the result (Cartesian product). Please refer to Input3, Input4 and corresponding Result3 and Result4 in Example. - -##### Note - -> Please note that Aligned Time Series has not been supported in Nested Expressions with Time Series Query yet. An error message is expected if you use it with Aligned Time Series selected in a query statement. - -#### Nested Expressions Query with Aggregations - -IoTDB supports the calculation of arbitrary nested expressions consisting of **numbers, aggregations and arithmetic expressions** in the `select` clause. - -##### Example - -Aggregation query without `GROUP BY`. 
- -Input1: - -```sql -select avg(temperature), - sin(avg(temperature)), - avg(temperature) + 1, - -sum(hardware), - avg(temperature) + sum(hardware) -from root.ln.wf01.wt01; -``` - -Result1: - -``` -+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ -|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|avg(root.ln.wf01.wt01.temperature) + sum(root.ln.wf01.wt01.hardware)| -+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ -| 15.927999999999999| -0.21826546964855045| 16.927999999999997| -7426.0| 7441.928| -+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ -Total line number = 1 -It costs 0.009s -``` - -Input2: - -```sql -select avg(*), - (avg(*) + 1) * 3 / 2 -1 -from root.sg1 -``` - -Result2: - -``` -+---------------+---------------+-------------------------------------+-------------------------------------+ -|avg(root.sg1.a)|avg(root.sg1.b)|(avg(root.sg1.a) + 1) * 3 / 2 - 1 |(avg(root.sg1.b) + 1) * 3 / 2 - 1 | -+---------------+---------------+-------------------------------------+-------------------------------------+ -| 3.2| 3.4| 5.300000000000001| 5.6000000000000005| -+---------------+---------------+-------------------------------------+-------------------------------------+ -Total line number = 1 -It costs 0.007s -``` - -Aggregation with `GROUP BY`. 
- -Input3: - -```sql -select avg(temperature), - sin(avg(temperature)), - avg(temperature) + 1, - -sum(hardware), - avg(temperature) + sum(hardware) as custom_sum -from root.ln.wf01.wt01 -GROUP BY([10, 90), 10ms); -``` - -Result3: - -``` -+-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ -| Time|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|custom_sum| -+-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ -|1970-01-01T08:00:00.010+08:00| 13.987499999999999| 0.9888207947857667| 14.987499999999999| -3211.0| 3224.9875| -|1970-01-01T08:00:00.020+08:00| 29.6| -0.9701057337071853| 30.6| -3720.0| 3749.6| -|1970-01-01T08:00:00.030+08:00| null| null| null| null| null| -|1970-01-01T08:00:00.040+08:00| null| null| null| null| null| -|1970-01-01T08:00:00.050+08:00| null| null| null| null| null| -|1970-01-01T08:00:00.060+08:00| null| null| null| null| null| -|1970-01-01T08:00:00.070+08:00| null| null| null| null| null| -|1970-01-01T08:00:00.080+08:00| null| null| null| null| null| -+-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ -Total line number = 8 -It costs 0.012s -``` - -##### Explanation - -- Only when the left operand and the right operand under a certain timestamp are not `null`, the nested expressions will have an output value. Otherwise this row will not be included in the result. But for nested expressions with `GROUP BY` clause, it is better to show the result of all time intervals. Please refer to Input3 and corresponding Result3 in Example. -- If one operand in the nested expressions can be translated into multiple time series (For example, `*`), the result of each time series will be included in the result (Cartesian product). Please refer to Input2 and corresponding Result2 in Example. \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Query-Data.md b/src/UserGuide/V1.3.0-2/User-Manual/Query-Data.md deleted file mode 100644 index 0681da225..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Query-Data.md +++ /dev/null @@ -1,3011 +0,0 @@ - -# Query Data -## OVERVIEW - -### Syntax Definition - -In IoTDB, `SELECT` statement is used to retrieve data from one or more selected time series. Here is the syntax definition of `SELECT` statement: - -```sql -SELECT [LAST] selectExpr [, selectExpr] ... - [INTO intoItem [, intoItem] ...] - FROM prefixPath [, prefixPath] ... - [WHERE whereCondition] - [GROUP BY { - ([startTime, endTime), interval [, slidingStep]) | - LEVEL = levelNum [, levelNum] ... | - TAGS(tagKey [, tagKey] ... ) | - VARIATION(expression[,delta][,ignoreNull=true/false]) | - CONDITION(expression,[keep>/>=/=/ 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; -``` - -which means: - -The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". The SQL statement requires that the status and temperature sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" be selected. 
- -The execution result of this SQL statement is as follows: - -``` -+-----------------------------+------------------------+-----------------------------+ -| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| -+-----------------------------+------------------------+-----------------------------+ -|2017-11-01T00:06:00.000+08:00| false| 20.71| -|2017-11-01T00:07:00.000+08:00| false| 21.45| -|2017-11-01T00:08:00.000+08:00| false| 22.58| -|2017-11-01T00:09:00.000+08:00| false| 20.98| -|2017-11-01T00:10:00.000+08:00| true| 25.52| -|2017-11-01T00:11:00.000+08:00| false| 22.91| -+-----------------------------+------------------------+-----------------------------+ -Total line number = 6 -It costs 0.018s -``` - -#### Select Multiple Columns of Data for the Same Device According to Multiple Time Intervals - -IoTDB supports specifying multiple time interval conditions in a query. Users can combine time interval conditions at will according to their needs. For example, the SQL statement is: - -```sql -select status,temperature from root.ln.wf01.wt01 where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); -``` - -which means: - -The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature"; the statement specifies two different time intervals, namely "2017-11-01T00:05:00.000 to 2017-11-01T00:12:00.000" and "2017-11-01T16:35:00.000 to 2017-11-01T16:37:00.000". The SQL statement requires that the values of selected timeseries satisfying any time interval be selected. - -The execution result of this SQL statement is as follows: - -``` -+-----------------------------+------------------------+-----------------------------+ -| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| -+-----------------------------+------------------------+-----------------------------+ -|2017-11-01T00:06:00.000+08:00| false| 20.71| -|2017-11-01T00:07:00.000+08:00| false| 21.45| -|2017-11-01T00:08:00.000+08:00| false| 22.58| -|2017-11-01T00:09:00.000+08:00| false| 20.98| -|2017-11-01T00:10:00.000+08:00| true| 25.52| -|2017-11-01T00:11:00.000+08:00| false| 22.91| -|2017-11-01T16:35:00.000+08:00| true| 23.44| -|2017-11-01T16:36:00.000+08:00| false| 21.98| -|2017-11-01T16:37:00.000+08:00| false| 21.93| -+-----------------------------+------------------------+-----------------------------+ -Total line number = 9 -It costs 0.018s -``` - - -#### Choose Multiple Columns of Data for Different Devices According to Multiple Time Intervals - -The system supports the selection of data in any column in a query, i.e., the selected columns can come from different devices. For example, the SQL statement is: - -```sql -select wf01.wt01.status,wf02.wt02.hardware from root.ln where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); -``` - -which means: - -The selected timeseries are "the power supply status of ln group wf01 plant wt01 device" and "the hardware version of ln group wf02 plant wt02 device"; the statement specifies two different time intervals, namely "2017-11-01T00:05:00.000 to 2017-11-01T00:12:00.000" and "2017-11-01T16:35:00.000 to 2017-11-01T16:37:00.000". The SQL statement requires that the values of selected timeseries satisfying any time interval be selected. 
- -The execution result of this SQL statement is as follows: - -``` -+-----------------------------+------------------------+--------------------------+ -| Time|root.ln.wf01.wt01.status|root.ln.wf02.wt02.hardware| -+-----------------------------+------------------------+--------------------------+ -|2017-11-01T00:06:00.000+08:00| false| v1| -|2017-11-01T00:07:00.000+08:00| false| v1| -|2017-11-01T00:08:00.000+08:00| false| v1| -|2017-11-01T00:09:00.000+08:00| false| v1| -|2017-11-01T00:10:00.000+08:00| true| v2| -|2017-11-01T00:11:00.000+08:00| false| v1| -|2017-11-01T16:35:00.000+08:00| true| v2| -|2017-11-01T16:36:00.000+08:00| false| v1| -|2017-11-01T16:37:00.000+08:00| false| v1| -+-----------------------------+------------------------+--------------------------+ -Total line number = 9 -It costs 0.014s -``` - -#### Order By Time Query - -IoTDB supports the 'order by time' statement since 0.11, it's used to display results in descending order by time. -For example, the SQL statement is: - -```sql -select * from root.ln.** where time > 1 order by time desc limit 10; -``` - -The execution result of this SQL statement is as follows: - -``` -+-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ -| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| -+-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ -|2017-11-07T23:59:00.000+08:00| v1| false| 21.07| false| -|2017-11-07T23:58:00.000+08:00| v1| false| 22.93| false| -|2017-11-07T23:57:00.000+08:00| v2| true| 24.39| true| -|2017-11-07T23:56:00.000+08:00| v2| true| 24.44| true| -|2017-11-07T23:55:00.000+08:00| v2| true| 25.9| true| -|2017-11-07T23:54:00.000+08:00| v1| false| 22.52| false| -|2017-11-07T23:53:00.000+08:00| v2| true| 24.58| true| -|2017-11-07T23:52:00.000+08:00| v1| false| 20.18| false| -|2017-11-07T23:51:00.000+08:00| v1| false| 22.24| false| -|2017-11-07T23:50:00.000+08:00| v2| true| 23.7| true| -+-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ -Total line number = 10 -It costs 0.016s -``` - -### Execution Interface - -In IoTDB, there are two ways to execute data query: - -- Execute queries using IoTDB-SQL. -- Efficient execution interfaces for common queries, including time-series raw data query, last query, and aggregation query. - -#### Execute queries using IoTDB-SQL - -Data query statements can be used in SQL command-line terminals, JDBC, JAVA / C++ / Python / Go and other native APIs, and RESTful APIs. - -- Execute the query statement in the SQL command line terminal: start the SQL command line terminal, and directly enter the query statement to execute, see [SQL command line terminal](../Tools-System/CLI.md). - -- Execute query statements in JDBC, see [JDBC](../API/Programming-JDBC.md) for details. - -- Execute query statements in native APIs such as JAVA / C++ / Python / Go. For details, please refer to the relevant documentation in the Application Programming Interface chapter. The interface prototype is as follows: - - ````java - SessionDataSet executeQueryStatement(String sql) - ```` - -- Used in RESTful API, see [HTTP API V1](../API/RestServiceV1.md) or [HTTP API V2](../API/RestServiceV2.md) for details. 
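As a minimal sketch of calling the `executeQueryStatement` prototype listed above through the Java native API (the connection parameters are placeholders, and exact package names may differ slightly between versions):

```java
import org.apache.iotdb.isession.SessionDataSet;
import org.apache.iotdb.session.Session;

public class QueryExample {
  public static void main(String[] args) throws Exception {
    // Assumed connection parameters for a local IoTDB instance
    Session session =
        new Session.Builder().host("127.0.0.1").port(6667).username("root").password("root").build();
    session.open(false);

    // Execute an IoTDB-SQL query and iterate over the result set
    SessionDataSet dataSet =
        session.executeQueryStatement("select status, temperature from root.ln.wf01.wt01");
    dataSet.setFetchSize(1024);
    while (dataSet.hasNext()) {
      System.out.println(dataSet.next()); // each element is a RowRecord
    }
    dataSet.closeOperationHandle();

    session.close();
  }
}
```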
- -#### Efficient execution interfaces - -The native APIs provide efficient execution interfaces for commonly used queries, which can save time-consuming operations such as SQL parsing. include: - -* Time-series raw data query with time range: - - The specified query time range is a left-closed right-open interval, including the start time but excluding the end time. - -```java -SessionDataSet executeRawDataQuery(List paths, long startTime, long endTime); -``` - -* Last query: - - Query the last data, whose timestamp is greater than or equal LastTime. - -```java -SessionDataSet executeLastDataQuery(List paths, long LastTime); -``` - -* Aggregation query: - - Support specified query time range: The specified query time range is a left-closed right-open interval, including the start time but not the end time. - - Support GROUP BY TIME. - -```java -SessionDataSet executeAggregationQuery(List paths, List aggregations); - -SessionDataSet executeAggregationQuery( - List paths, List aggregations, long startTime, long endTime); - -SessionDataSet executeAggregationQuery( - List paths, - List aggregations, - long startTime, - long endTime, - long interval); - -SessionDataSet executeAggregationQuery( - List paths, - List aggregations, - long startTime, - long endTime, - long interval, - long slidingStep); -``` - -## `SELECT` CLAUSE -The `SELECT` clause specifies the output of the query, consisting of several `selectExpr`. Each `selectExpr` defines one or more columns in the query result. For select expression details, see document [Operator-and-Expression](./Operator-and-Expression.md). - -- Example 1: - -```sql -select temperature from root.ln.wf01.wt01 -``` - -- Example 2: - -```sql -select status, temperature from root.ln.wf01.wt01 -``` - -### Last Query - -The last query is a special type of query in Apache IoTDB. It returns the data point with the largest timestamp of the specified time series. In other word, it returns the latest state of a time series. This feature is especially important in IoT data analysis scenarios. To meet the performance requirement of real-time device monitoring systems, Apache IoTDB caches the latest values of all time series to achieve microsecond read latency. - -The last query is to return the most recent data point of the given timeseries in a three column format. - -The SQL syntax is defined as: - -```sql -select last [COMMA ]* from < PrefixPath > [COMMA < PrefixPath >]* [ORDER BY TIMESERIES (DESC | ASC)?] -``` - -which means: Query and return the last data points of timeseries prefixPath.path. - -- Only time filter is supported in \. Any other filters given in the \ will give an exception. When the cached most recent data point does not satisfy the criterion specified by the filter, IoTDB will have to get the result from the external storage, which may cause a decrease in performance. - -- The result will be returned in a four column table format. - - ``` - | Time | timeseries | value | dataType | - ``` - - **Note:** The `value` colum will always return the value as `string` and thus also has `TSDataType.TEXT`. Therefore, the column `dataType` is returned also which contains the _real_ type how the value should be interpreted. - -- We can use `TIME/TIMESERIES/VALUE/DATATYPE (DESC | ASC)` to specify that the result set is sorted in descending/ascending order based on a particular column. When the value column contains multiple types of data, the sorting is based on the string representation of the values. 
- -**Example 1:** get the last point of root.ln.wf01.wt01.status: - -``` -IoTDB> select last status from root.ln.wf01.wt01 -+-----------------------------+------------------------+-----+--------+ -| Time| timeseries|value|dataType| -+-----------------------------+------------------------+-----+--------+ -|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.status|false| BOOLEAN| -+-----------------------------+------------------------+-----+--------+ -Total line number = 1 -It costs 0.000s -``` - -**Example 2:** get the last status and temperature points of root.ln.wf01.wt01, whose timestamp larger or equal to 2017-11-07T23:50:00。 - -``` -IoTDB> select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00 -+-----------------------------+-----------------------------+---------+--------+ -| Time| timeseries| value|dataType| -+-----------------------------+-----------------------------+---------+--------+ -|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| -|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| -+-----------------------------+-----------------------------+---------+--------+ -Total line number = 2 -It costs 0.002s -``` - -**Example 3:** get the last points of all sensor in root.ln.wf01.wt01, and order the result by the timeseries column in descending order - -``` -IoTDB> select last * from root.ln.wf01.wt01 order by timeseries desc; -+-----------------------------+-----------------------------+---------+--------+ -| Time| timeseries| value|dataType| -+-----------------------------+-----------------------------+---------+--------+ -|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| -|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| -+-----------------------------+-----------------------------+---------+--------+ -Total line number = 2 -It costs 0.002s -``` - -**Example 4:** get the last points of all sensor in root.ln.wf01.wt01, and order the result by the dataType column in descending order - -``` -IoTDB> select last * from root.ln.wf01.wt01 order by dataType desc; -+-----------------------------+-----------------------------+---------+--------+ -| Time| timeseries| value|dataType| -+-----------------------------+-----------------------------+---------+--------+ -|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| -|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| -+-----------------------------+-----------------------------+---------+--------+ -Total line number = 2 -It costs 0.002s -``` - -## `WHERE` CLAUSE - -In IoTDB query statements, two filter conditions, **time filter** and **value filter**, are supported. - -The supported operators are as follows: - -- Comparison operators: greater than (`>`), greater than or equal ( `>=`), equal ( `=` or `==`), not equal ( `!=` or `<>`), less than or equal ( `<=`), less than ( `<`). -- Logical operators: and ( `AND` or `&` or `&&`), or ( `OR` or `|` or `||`), not ( `NOT` or `!`). -- Range contains operator: contains ( `IN` ). -- String matches operator: `LIKE`, `REGEXP`. - -### Time Filter - -Use time filters to filter data for a specific time range. For supported formats of timestamps, please refer to [Timestamp](../Basic-Concept/Data-Type.md) . - -An example is as follows: - -1. Select data with timestamp greater than 2022-01-01T00:05:00.000: - - ```sql - select s1 from root.sg1.d1 where time > 2022-01-01T00:05:00.000; - ```` - -2. 
Select data with timestamp equal to 2022-01-01T00:05:00.000: - - ```sql - select s1 from root.sg1.d1 where time = 2022-01-01T00:05:00.000; - ```` - -3. Select the data in the time interval [2017-11-01T00:05:00.000, 2017-11-01T00:12:00.000): - - ```sql - select s1 from root.sg1.d1 where time >= 2022-01-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; - ```` - -Note: In the above example, `time` can also be written as `timestamp`. - -### Value Filter - -Use value filters to filter data whose data values meet certain criteria. **Allow** to use a time series not selected in the select clause as a value filter. - -An example is as follows: - -1. Select data with a value greater than 36.5: - - ```sql - select temperature from root.sg1.d1 where temperature > 36.5; - ```` - -2. Select data with value equal to true: - - ```sql - select status from root.sg1.d1 where status = true; - ```` - -3. Select data for the interval [36.5,40] or not: - - ```sql - select temperature from root.sg1.d1 where temperature between 36.5 and 40; - ```` - - ```sql - select temperature from root.sg1.d1 where temperature not between 36.5 and 40; - ```` - -4. Select data with values within a specific range: - - ```sql - select code from root.sg1.d1 where code in ('200', '300', '400', '500'); - ```` - -5. Select data with values outside a certain range: - - ```sql - select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); - ```` - -6. Select data with values is null: - - ```sql - select code from root.sg1.d1 where temperature is null; - ```` - -7. Select data with values is not null: - - ```sql - select code from root.sg1.d1 where temperature is not null; - ```` - -### Fuzzy Query - -Fuzzy query is divided into Like statement and Regexp statement, both of which can support fuzzy matching of TEXT type data. - -Like statement: - -#### Fuzzy matching using `Like` - -In the value filter condition, for TEXT type data, use `Like` and `Regexp` operators to perform fuzzy matching on data. - -**Matching rules:** - -- The percentage (`%`) wildcard matches any string of zero or more characters. -- The underscore (`_`) wildcard matches any single character. - -**Example 1:** Query data containing `'cc'` in `value` under `root.sg.d1`. - -``` -IoTDB> select * from root.sg.d1 where value like '%cc%' -+-----------------------------+----------------+ -| Time|root.sg.d1.value| -+-----------------------------+----------------+ -|2017-11-01T00:00:00.000+08:00| aabbccdd| -|2017-11-01T00:00:01.000+08:00| cc| -+-----------------------------+----------------+ -Total line number = 2 -It costs 0.002s -``` - -**Example 2:** Query data that consists of 3 characters and the second character is `'b'` in `value` under `root.sg.d1`. - -``` -IoTDB> select * from root.sg.device where value like '_b_' -+-----------------------------+----------------+ -| Time|root.sg.d1.value| -+-----------------------------+----------------+ -|2017-11-01T00:00:02.000+08:00| abc| -+-----------------------------+----------------+ -Total line number = 1 -It costs 0.002s -``` - -#### Fuzzy matching using `Regexp` - -The filter conditions that need to be passed in are regular expressions in the Java standard library style. 
- -**Examples of common regular matching:** - -``` -All characters with a length of 3-20: ^.{3,20}$ -Uppercase english characters: ^[A-Z]+$ -Numbers and English characters: ^[A-Za-z0-9]+$ -Beginning with a: ^a.* -``` - -**Example 1:** Query a string composed of 26 English characters for the value under root.sg.d1 - -``` -IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' -+-----------------------------+----------------+ -| Time|root.sg.d1.value| -+-----------------------------+----------------+ -|2017-11-01T00:00:00.000+08:00| aabbccdd| -|2017-11-01T00:00:01.000+08:00| cc| -+-----------------------------+----------------+ -Total line number = 2 -It costs 0.002s -``` - -**Example 2:** Query root.sg.d1 where the value value is a string composed of 26 lowercase English characters and the time is greater than 100 - -``` -IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 -+-----------------------------+----------------+ -| Time|root.sg.d1.value| -+-----------------------------+----------------+ -|2017-11-01T00:00:00.000+08:00| aabbccdd| -|2017-11-01T00:00:01.000+08:00| cc| -+-----------------------------+----------------+ -Total line number = 2 -It costs 0.002s -``` - -## `GROUP BY` CLAUSE - -IoTDB supports using `GROUP BY` clause to aggregate the time series by segment and group. - -Segmented aggregation refers to segmenting data in the row direction according to the time dimension, aiming at the time relationship between different data points in the same time series, and obtaining an aggregated value for each segment. Currently only **group by time**、**group by variation**、**group by condition**、**group by session** and **group by count** is supported, and more segmentation methods will be supported in the future. - -Group aggregation refers to grouping the potential business attributes of time series for different time series. Each group contains several time series, and each group gets an aggregated value. Support **group by path level** and **group by tag** two grouping methods. - -### Aggregate By Segment - -#### Aggregate By Time - -Aggregate by time is a typical query method for time series data. Data is collected at high frequency and needs to be aggregated and calculated at certain time intervals. For example, to calculate the daily average temperature, the sequence of temperature needs to be segmented by day, and then calculated. average value. - -Aggregate by time refers to a query method that uses a lower frequency than the time frequency of data collection, and is a special case of segmented aggregation. For example, the frequency of data collection is one second. If you want to display the data in one minute, you need to use time aggregagtion. - -This section mainly introduces the related examples of time aggregation, using the `GROUP BY` clause. IoTDB supports partitioning result sets according to time interval and customized sliding step. And by default results are sorted by time in ascending order. - -The GROUP BY statement provides users with three types of specified parameters: - -* Parameter 1: The display window on the time axis -* Parameter 2: Time interval for dividing the time axis(should be positive) -* Parameter 3: Time sliding step (optional and defaults to equal the time interval if not set) - -The actual meanings of the three types of parameters are shown in Figure below. -Among them, the parameter 3 is optional. - -
*(Figure: the display window, time interval and sliding step parameters of the `GROUP BY` clause.)*
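Schematically, the three parameters map onto the clause as follows; this is a general form with placeholder names (`startTime`, `endTime`, `interval`, `slidingStep`) rather than a runnable statement:

```sql
select count(status) from root.ln.wf01.wt01
group by ([startTime, endTime), interval, slidingStep);
```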
- - -There are three typical examples of frequency reduction aggregation: - -##### Aggregate By Time without Specifying the Sliding Step Length - -The SQL statement is: - -```sql -select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d); -``` - -which means: - -Since the sliding step length is not specified, the `GROUP BY` statement by default set the sliding step the same as the time interval which is `1d`. - -The fist parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-11-01T00:00:00, 2017-11-07T23:00:00). - -The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. Taking this parameter (1d) as time interval and startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [0,1d), [1d, 2d), [2d, 3d), etc. - -Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of [2017-11-01T00:00:00, 2017-11-07 T23:00:00]), and map these data to the previously segmented time axis (in this case there are mapped data in every 1-day period from 2017-11-01T00:00:00 to 2017-11-07T23:00:00:00). - -Since there is data for each time period in the result range to be displayed, the execution result of the SQL statement is shown below: - -``` -+-----------------------------+-------------------------------+----------------------------------------+ -| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| -+-----------------------------+-------------------------------+----------------------------------------+ -|2017-11-01T00:00:00.000+08:00| 1440| 26.0| -|2017-11-02T00:00:00.000+08:00| 1440| 26.0| -|2017-11-03T00:00:00.000+08:00| 1440| 25.99| -|2017-11-04T00:00:00.000+08:00| 1440| 26.0| -|2017-11-05T00:00:00.000+08:00| 1440| 26.0| -|2017-11-06T00:00:00.000+08:00| 1440| 25.99| -|2017-11-07T00:00:00.000+08:00| 1380| 26.0| -+-----------------------------+-------------------------------+----------------------------------------+ -Total line number = 7 -It costs 0.024s -``` - -##### Aggregate By Time Specifying the Sliding Step Length - -The SQL statement is: - -```sql -select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d); -``` - -which means: - -Since the user specifies the sliding step parameter as 1d, the `GROUP BY` statement will move the time interval `1 day` long instead of `3 hours` as default. - -That means we want to fetch all the data of 00:00:00 to 02:59:59 every day from 2017-11-01 to 2017-11-07. - -The first parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-11-01T00:00:00, 2017-11-07T23:00:00). - -The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. Taking this parameter (3h) as time interval and the startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [2017-11-01T00:00:00, 2017-11-01T03:00:00), [2017-11-02T00:00:00, 2017-11-02T03:00:00), [2017-11-03T00:00:00, 2017-11-03T03:00:00), etc. 
- -The third parameter of the `GROUP BY` statement above is the sliding step for each time interval moving. - -Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of [2017-11-01T00:00:00, 2017-11-07T23:00:00]), and map these data to the previously segmented time axis (in this case there are mapped data in every 3-hour period for each day from 2017-11-01T00:00:00 to 2017-11-07T23:00:00:00). - -Since there is data for each time period in the result range to be displayed, the execution result of the SQL statement is shown below: - -``` -+-----------------------------+-------------------------------+----------------------------------------+ -| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| -+-----------------------------+-------------------------------+----------------------------------------+ -|2017-11-01T00:00:00.000+08:00| 180| 25.98| -|2017-11-02T00:00:00.000+08:00| 180| 25.98| -|2017-11-03T00:00:00.000+08:00| 180| 25.96| -|2017-11-04T00:00:00.000+08:00| 180| 25.96| -|2017-11-05T00:00:00.000+08:00| 180| 26.0| -|2017-11-06T00:00:00.000+08:00| 180| 25.85| -|2017-11-07T00:00:00.000+08:00| 180| 25.99| -+-----------------------------+-------------------------------+----------------------------------------+ -Total line number = 7 -It costs 0.006s -``` - -The sliding step can be smaller than the interval, in which case there is overlapping time between the aggregation windows (similar to a sliding window). - -The SQL statement is: - -```sql -select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-01 10:00:00), 4h, 2h); -``` - -The execution result of the SQL statement is shown below: - -``` -+-----------------------------+-------------------------------+----------------------------------------+ -| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| -+-----------------------------+-------------------------------+----------------------------------------+ -|2017-11-01T00:00:00.000+08:00| 180| 25.98| -|2017-11-01T02:00:00.000+08:00| 180| 25.98| -|2017-11-01T04:00:00.000+08:00| 180| 25.96| -|2017-11-01T06:00:00.000+08:00| 180| 25.96| -|2017-11-01T08:00:00.000+08:00| 180| 26.0| -+-----------------------------+-------------------------------+----------------------------------------+ -Total line number = 5 -It costs 0.006s -``` - -##### Aggregate by Natural Month - -The SQL statement is: - -```sql -select count(status) from root.ln.wf01.wt01 group by([2017-11-01T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); -``` - -which means: - -Since the user specifies the sliding step parameter as `2mo`, the `GROUP BY` statement will move the time interval `2 months` long instead of `1 month` as default. - -The first parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-11-01T00:00:00, 2019-11-07T23:00:00). - -The start time is 2017-11-01T00:00:00. The sliding step will increment monthly based on the start date, and the 1st day of the month will be used as the time interval's start time. - -The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. 
Taking this parameter (1mo) as time interval and the startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [2017-11-01T00:00:00, 2017-12-01T00:00:00), [2018-02-01T00:00:00, 2018-03-01T00:00:00), [2018-05-03T00:00:00, 2018-06-01T00:00:00)), etc. - -The third parameter of the `GROUP BY` statement above is the sliding step for each time interval moving. - -Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of (2017-11-01T00:00:00, 2019-11-07T23:00:00], and map these data to the previously segmented time axis (in this case there are mapped data of the first month in every two month period from 2017-11-01T00:00:00 to 2019-11-07T23:00:00). - -The SQL execution result is: - -``` -+-----------------------------+-------------------------------+ -| Time|count(root.ln.wf01.wt01.status)| -+-----------------------------+-------------------------------+ -|2017-11-01T00:00:00.000+08:00| 259| -|2018-01-01T00:00:00.000+08:00| 250| -|2018-03-01T00:00:00.000+08:00| 259| -|2018-05-01T00:00:00.000+08:00| 251| -|2018-07-01T00:00:00.000+08:00| 242| -|2018-09-01T00:00:00.000+08:00| 225| -|2018-11-01T00:00:00.000+08:00| 216| -|2019-01-01T00:00:00.000+08:00| 207| -|2019-03-01T00:00:00.000+08:00| 216| -|2019-05-01T00:00:00.000+08:00| 207| -|2019-07-01T00:00:00.000+08:00| 199| -|2019-09-01T00:00:00.000+08:00| 181| -|2019-11-01T00:00:00.000+08:00| 60| -+-----------------------------+-------------------------------+ -``` - -The SQL statement is: - -```sql -select count(status) from root.ln.wf01.wt01 group by([2017-10-31T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); -``` - -which means: - -Since the user specifies the sliding step parameter as `2mo`, the `GROUP BY` statement will move the time interval `2 months` long instead of `1 month` as default. - -The first parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-10-31T00:00:00, 2019-11-07T23:00:00). - -Different from the previous example, the start time is set to 2017-10-31T00:00:00. The sliding step will increment monthly based on the start date, and the 31st day of the month meaning the last day of the month will be used as the time interval's start time. If the start time is set to the 30th date, the sliding step will use the 30th or the last day of the month. - -The start time is 2017-10-31T00:00:00. The sliding step will increment monthly based on the start time, and the 1st day of the month will be used as the time interval's start time. - -The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. Taking this parameter (1mo) as time interval and the startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [2017-10-31T00:00:00, 2017-11-31T00:00:00), [2018-02-31T00:00:00, 2018-03-31T00:00:00), [2018-05-31T00:00:00, 2018-06-31T00:00:00), etc. - -The third parameter of the `GROUP BY` statement above is the sliding step for each time interval moving. 
- -Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of [2017-10-31T00:00:00, 2019-11-07T23:00:00) and map these data to the previously segmented time axis (in this case there are mapped data of the first month in every two month period from 2017-10-31T00:00:00 to 2019-11-07T23:00:00). - -The SQL execution result is: - -``` -+-----------------------------+-------------------------------+ -| Time|count(root.ln.wf01.wt01.status)| -+-----------------------------+-------------------------------+ -|2017-10-31T00:00:00.000+08:00| 251| -|2017-12-31T00:00:00.000+08:00| 250| -|2018-02-28T00:00:00.000+08:00| 259| -|2018-04-30T00:00:00.000+08:00| 250| -|2018-06-30T00:00:00.000+08:00| 242| -|2018-08-31T00:00:00.000+08:00| 225| -|2018-10-31T00:00:00.000+08:00| 216| -|2018-12-31T00:00:00.000+08:00| 208| -|2019-02-28T00:00:00.000+08:00| 216| -|2019-04-30T00:00:00.000+08:00| 208| -|2019-06-30T00:00:00.000+08:00| 199| -|2019-08-31T00:00:00.000+08:00| 181| -|2019-10-31T00:00:00.000+08:00| 69| -+-----------------------------+-------------------------------+ -``` - -##### Left Open And Right Close Range - -The SQL statement is: - -```sql -select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d); -``` - -In this sql, the time interval is left open and right close, so we won't include the value of timestamp 2017-11-01T00:00:00 and instead we will include the value of timestamp 2017-11-07T23:00:00. - -We will get the result like following: - -``` -+-----------------------------+-------------------------------+ -| Time|count(root.ln.wf01.wt01.status)| -+-----------------------------+-------------------------------+ -|2017-11-02T00:00:00.000+08:00| 1440| -|2017-11-03T00:00:00.000+08:00| 1440| -|2017-11-04T00:00:00.000+08:00| 1440| -|2017-11-05T00:00:00.000+08:00| 1440| -|2017-11-06T00:00:00.000+08:00| 1440| -|2017-11-07T00:00:00.000+08:00| 1440| -|2017-11-07T23:00:00.000+08:00| 1380| -+-----------------------------+-------------------------------+ -Total line number = 7 -It costs 0.004s -``` - -#### Aggregation By Variation - -IoTDB supports grouping by continuous stable values through the `GROUP BY VARIATION` statement. - -Group-By-Variation wil set the first point in group as the base point, -then if the difference between the new data and base point is small than or equal to delta, -the data point will be grouped together and execute aggregation query (The calculation of difference and the meaning of delte are introduced below). The groups won't overlap and there is no fixed start time and end time. -The syntax of clause is as follows: - -```sql -group by variation(controlExpression[,delta][,ignoreNull=true/false]) -``` - -The different parameters mean: - -* controlExpression - -The value that is used to calculate difference. It can be any columns or the expression of them. - -* delta - -The threshold that is used when grouping. The difference of controlExpression between the first data point and new data point should less than or equal to delta. -When delta is zero, all the continuous data with equal expression value will be grouped into the same group. - -* ignoreNull - -Used to specify how to deal with the data when the value of controlExpression is null. 
When ignoreNull is false, null will be treated as a new value; when ignoreNull is true, the data point will be skipped directly.

The supported return types of controlExpression and how null values are handled when ignoreNull is false are shown in the following table:

| delta | Return Type Supported By controlExpression | The Handling of null when ignoreNull is False |
| -------- | ------------------------------------------ | ------------------------------------------------------------ |
| delta!=0 | INT32、INT64、FLOAT、DOUBLE | If the processing group doesn't contain null, a null value is treated as infinity/infinitesimal and ends the current group.
Continuous null values are treated as stable values and assigned to the same group. | -| delta=0 | TEXT、BINARY、INT32、INT64、FLOAT、DOUBLE | Null is treated as a new value in a new group and continuous nulls belong to the same group. | - -groupByVariation - -##### Precautions for Use - -1. The result of controlExpression should be a unique value. If multiple columns appear after using wildcard stitching, an error will be reported. -2. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. -3. Each device is grouped separately when used with `ALIGN BY DEVICE`. -4. Delta is zero and ignoreNull is true by default. -5. Currently `GROUP BY VARIATION` is not supported with `GROUP BY LEVEL`. - -Using the raw data below, several examples of `GROUP BY VARIAITON` queries will be given. - -``` -+-----------------------------+-------+-------+-------+--------+-------+-------+ -| Time| s1| s2| s3| s4| s5| s6| -+-----------------------------+-------+-------+-------+--------+-------+-------+ -|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| -|1970-01-01T08:00:00.010+08:00| null| 19.0| 10.0| 145.0| 19.0| 8.25| -|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| null| 245.0| 29.0| null| -|1970-01-01T08:00:00.030+08:00| 34.5| null| 30.0| 345.0| null| null| -|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| -|1970-01-01T08:00:00.050+08:00| null| 59.0| 50.0| 545.0| 59.0| 6.25| -|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| null| -|1970-01-01T08:00:00.070+08:00| 74.5| 79.0| null| null| 79.0| 3.25| -|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 3.25| -|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 3.25| -|1970-01-01T08:00:00.150+08:00| 66.5| 77.0| 90.0| 945.0| 99.0| 9.25| -+-----------------------------+-------+-------+-------+--------+-------+-------+ -``` - -##### delta = 0 - -The sql is shown below: - -```sql -select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6) -``` - -Get the result below which ignores the row with null value in `s6`. - -``` -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.040+08:00| 24.5| 3| 50.0| -|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| -|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| -|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -``` - -when ignoreNull is false, the row with null value in `s6` will be considered. - -```sql -select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, ignoreNull=false) -``` - -Get the following result. 
- -``` -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| -|1970-01-01T08:00:00.020+08:00|1970-01-01T08:00:00.030+08:00| 29.5| 1| 30.0| -|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.040+08:00| 44.5| 1| 40.0| -|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| -|1970-01-01T08:00:00.060+08:00|1970-01-01T08:00:00.060+08:00| 64.5| 1| 60.0| -|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| -|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -``` - -##### delta !=0 - -The sql is shown below: - -```sql -select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, 4) -``` - -Get the result below: - -``` -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.050+08:00| 24.5| 4| 100.0| -|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| -|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -``` - -The sql is shown below: - -```sql -select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6+s5, 10) -``` - -Get the result below: - -``` -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| -|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.050+08:00| 44.5| 2| 90.0| -|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.080+08:00| 79.5| 2| 80.0| -|1970-01-01T08:00:00.090+08:00|1970-01-01T08:00:00.150+08:00| 80.5| 2| 180.0| -+-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ -``` - -#### Aggregation By Condition - -When you need to filter the data according to a specific condition and group the continuous ones for an aggregation query. -`GROUP BY CONDITION` is suitable for you.The rows which don't meet the given condition will be simply ignored because they don't belong to any group. -Its syntax is defined below: - -```sql -group by condition(predict,[keep>/>=/=/<=/<]threshold,[,ignoreNull=true/false]) -``` - -* predict - -Any legal expression return the type of boolean for filtering in grouping. - -* [keep>/>=/=/<=/<]threshold - -Keep expression is used to specify the number of continuous rows that meet the `predict` condition to form a group. Only the number of rows in group satisfy the keep condition, the result of group will be output. 
-Keep expression consists of a 'keep' string and a threshold of type `long` or a single 'long' type data. - -* ignoreNull=true/false - -Used to specify how to handle data rows that encounter null predict, skip the row when it's true and end current group when it's false. - -##### Precautions for Use - -1. keep condition is required in the query, but you can omit the 'keep' string and given a `long` number which defaults to 'keep=long number' condition. -2. IgnoreNull defaults to true. -3. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. -4. Each device is grouped separately when used with `ALIGN BY DEVICE`. -5. Currently `GROUP BY CONDITION` is not supported with `GROUP BY LEVEL`. - -For the following raw data, several query examples are given below: - -``` -+-----------------------------+-------------------------+-------------------------------------+------------------------------------+ -| Time|root.sg.beijing.car01.soc|root.sg.beijing.car01.charging_status|root.sg.beijing.car01.vehicle_status| -+-----------------------------+-------------------------+-------------------------------------+------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 14.0| 1| 1| -|1970-01-01T08:00:00.002+08:00| 16.0| 1| 1| -|1970-01-01T08:00:00.003+08:00| 16.0| 0| 1| -|1970-01-01T08:00:00.004+08:00| 16.0| 0| 1| -|1970-01-01T08:00:00.005+08:00| 18.0| 1| 1| -|1970-01-01T08:00:00.006+08:00| 24.0| 1| 1| -|1970-01-01T08:00:00.007+08:00| 36.0| 1| 1| -|1970-01-01T08:00:00.008+08:00| 36.0| null| 1| -|1970-01-01T08:00:00.009+08:00| 45.0| 1| 1| -|1970-01-01T08:00:00.010+08:00| 60.0| 1| 1| -+-----------------------------+-------------------------+-------------------------------------+------------------------------------+ -``` - -The sql statement to query data with at least two continuous row shown below: - -```sql -select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=true) -``` - -Get the result below: - -``` -+-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ -| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| -+-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| -|1970-01-01T08:00:00.005+08:00| 10| 5| 60.0| -+-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ -``` - -When ignoreNull is false, the null value will be treated as a row that doesn't meet the condition. - -```sql -select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=false) -``` - -Get the result below, the original group is split. 
- -``` -+-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ -| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| -+-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ -|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| -|1970-01-01T08:00:00.005+08:00| 7| 3| 36.0| -|1970-01-01T08:00:00.009+08:00| 10| 2| 60.0| -+-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ -``` - -#### Aggregation By Session - -`GROUP BY SESSION` can be used to group data according to the interval of the time. Data with a time interval less than or equal to the given threshold will be assigned to the same group. -For example, in industrial scenarios, devices don't always run continuously, `GROUP BY SESSION` will group the data generated by each access session of the device. -Its syntax is defined as follows: - -```sql -group by session(timeInterval) -``` - -* timeInterval - -A given interval threshold to create a new group of data when the difference between the time of data is greater than the threshold. - -The figure below is a grouping diagram under `GROUP BY SESSION`. - -groupBySession - -##### Precautions for Use - -1. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. -2. Each device is grouped separately when used with `ALIGN BY DEVICE`. -3. Currently `GROUP BY SESSION` is not supported with `GROUP BY LEVEL`. 
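-
-As a quick illustration before the detailed examples below, `timeInterval` can be written with any supported time unit. A minimal sketch (assuming the `root.ln.wf02.wt01` series used later in this section), in which any two adjacent points more than 10 minutes apart start a new group:
-
-```sql
-select count(temperature) from root.ln.wf02.wt01 group by session(10m)
-```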
- -For the raw data below, a few query examples are given: - -``` -+-----------------------------+-----------------+-----------+--------+------+ -| Time| Device|temperature|hardware|status| -+-----------------------------+-----------------+-----------+--------+------+ -|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01| 35.7| 11| false| -|1970-01-01T08:00:02.000+08:00|root.ln.wf02.wt01| 35.8| 22| true| -|1970-01-01T08:00:03.000+08:00|root.ln.wf02.wt01| 35.4| 33| false| -|1970-01-01T08:00:04.000+08:00|root.ln.wf02.wt01| 36.4| 44| false| -|1970-01-01T08:00:05.000+08:00|root.ln.wf02.wt01| 36.8| 55| false| -|1970-01-01T08:00:10.000+08:00|root.ln.wf02.wt01| 36.8| 110| false| -|1970-01-01T08:00:20.000+08:00|root.ln.wf02.wt01| 37.8| 220| true| -|1970-01-01T08:00:30.000+08:00|root.ln.wf02.wt01| 37.5| 330| false| -|1970-01-01T08:00:40.000+08:00|root.ln.wf02.wt01| 37.4| 440| false| -|1970-01-01T08:00:50.000+08:00|root.ln.wf02.wt01| 37.9| 550| false| -|1970-01-01T08:01:40.000+08:00|root.ln.wf02.wt01| 38.0| 110| false| -|1970-01-01T08:02:30.000+08:00|root.ln.wf02.wt01| 38.8| 220| true| -|1970-01-01T08:03:20.000+08:00|root.ln.wf02.wt01| 38.6| 330| false| -|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01| 38.4| 440| false| -|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01| 38.3| 550| false| -|1970-01-01T08:06:40.000+08:00|root.ln.wf02.wt01| null| 0| null| -|1970-01-01T08:07:50.000+08:00|root.ln.wf02.wt01| null| 0| null| -|1970-01-01T08:08:00.000+08:00|root.ln.wf02.wt01| null| 0| null| -|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01| 38.2| 110| false| -|1970-01-02T08:08:02.000+08:00|root.ln.wf02.wt01| 37.5| 220| true| -|1970-01-02T08:08:03.000+08:00|root.ln.wf02.wt01| 37.4| 330| false| -|1970-01-02T08:08:04.000+08:00|root.ln.wf02.wt01| 36.8| 440| false| -|1970-01-02T08:08:05.000+08:00|root.ln.wf02.wt01| 37.4| 550| false| -+-----------------------------+-----------------+-----------+--------+------+ -``` - -TimeInterval can be set by different time units, the sql is shown below: - -```sql -select __endTime,count(*) from root.** group by session(1d) -``` - -Get the result: - -``` -+-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ -| Time| __endTime|count(root.ln.wf02.wt01.temperature)|count(root.ln.wf02.wt01.hardware)|count(root.ln.wf02.wt01.status)| -+-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ -|1970-01-01T08:00:01.000+08:00|1970-01-01T08:08:00.000+08:00| 15| 18| 15| -|1970-01-02T08:08:01.000+08:00|1970-01-02T08:08:05.000+08:00| 5| 5| 5| -+-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ -``` - -It can be also used with `HAVING` and `ALIGN BY DEVICE` clauses. 
-
-```sql
-select __endTime,sum(hardware) from root.ln.wf02.wt01 group by session(50s) having sum(hardware)>0 align by device
-```
-
-Get the result below:
-
-```
-+-----------------------------+-----------------+-----------------------------+-------------+
-|                         Time|           Device|                    __endTime|sum(hardware)|
-+-----------------------------+-----------------+-----------------------------+-------------+
-|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01|1970-01-01T08:03:20.000+08:00|       2475.0|
-|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:04:20.000+08:00|        440.0|
-|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:05:20.000+08:00|        550.0|
-|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01|1970-01-02T08:08:05.000+08:00|       1650.0|
-+-----------------------------+-----------------+-----------------------------+-------------+
-```
-
-#### Aggregation By Count
-
-`GROUP BY COUNT` can aggregate data points according to the number of points: it groups a fixed number of continuous data points together for an aggregation query.
-Its syntax is defined as follows:
-
-```sql
-group by count(controlExpression, size[,ignoreNull=true/false])
-```
-
-* controlExpression
-
-The object to count during processing; it can be any column or an expression over columns.
-
-* size
-
-The number of data points in a group; every `size` continuous points are divided into the same group.
-
-* ignoreNull=true/false
-
-Whether to ignore data points whose `controlExpression` is null. When ignoreNull is true, data points with a null `controlExpression` are skipped during counting.
-
-##### Precautions for Use
-
-1. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet.
-2. Each device is grouped separately when used with `ALIGN BY DEVICE`.
-3. Currently `GROUP BY COUNT` is not supported with `GROUP BY LEVEL`.
-4. When the final number of data points in a group is less than `size`, the result of the group will not be output.
-
-For the data below, some examples will be given.
-
-```
-+-----------------------------+-----------+-----------------------+
-|                         Time|root.sg.soc|root.sg.charging_status|
-+-----------------------------+-----------+-----------------------+
-|1970-01-01T08:00:00.001+08:00|       14.0|                      1|
-|1970-01-01T08:00:00.002+08:00|       16.0|                      1|
-|1970-01-01T08:00:00.003+08:00|       16.0|                      0|
-|1970-01-01T08:00:00.004+08:00|       16.0|                      0|
-|1970-01-01T08:00:00.005+08:00|       18.0|                      1|
-|1970-01-01T08:00:00.006+08:00|       24.0|                      1|
-|1970-01-01T08:00:00.007+08:00|       36.0|                      1|
-|1970-01-01T08:00:00.008+08:00|       36.0|                   null|
-|1970-01-01T08:00:00.009+08:00|       45.0|                      1|
-|1970-01-01T08:00:00.010+08:00|       60.0|                      1|
-+-----------------------------+-----------+-----------------------+
-```
-
-The sql is shown below:
-
-```sql
-select count(charging_status), first_value(soc) from root.sg group by count(charging_status,5)
-```
-
-Get the result below. In the second group, from 1970-01-01T08:00:00.006+08:00 to 1970-01-01T08:00:00.010+08:00, only four points are included, which is less than `size`, so it won't be output.
- -``` -+-----------------------------+-----------------------------+--------------------------------------+ -| Time| __endTime|first_value(root.sg.beijing.car01.soc)| -+-----------------------------+-----------------------------+--------------------------------------+ -|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00| 14.0| -+-----------------------------+-----------------------------+--------------------------------------+ -``` - -When `ignoreNull=false` is used to take null value into account. There will be two groups with 5 points in the resultSet, which is shown as follows: - -```sql -select count(charging_stauts), first_value(soc) from root.sg group by count(charging_status,5,ignoreNull=false) -``` - -Get the results: - -``` -+-----------------------------+-----------------------------+--------------------------------------+ -| Time| __endTime|first_value(root.sg.beijing.car01.soc)| -+-----------------------------+-----------------------------+--------------------------------------+ -|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00| 14.0| -|1970-01-01T08:00:00.006+08:00|1970-01-01T08:00:00.010+08:00| 24.0| -+-----------------------------+-----------------------------+--------------------------------------+ -``` - -### Aggregate By Group - -#### Aggregation By Level - -Aggregation by level statement is used to group the query result whose name is the same at the given level. - -- Keyword `LEVEL` is used to specify the level that need to be grouped. By convention, `level=0` represents *root* level. -- All aggregation functions are supported. When using five aggregations: sum, avg, min_value, max_value and extreme, please make sure all the aggregated series have exactly the same data type. Otherwise, it will generate a syntax error. - -**Example 1:** there are multiple series named `status` under different databases, like "root.ln.wf01.wt01.status", "root.ln.wf02.wt02.status", and "root.sgcc.wf03.wt01.status". If you need to count the number of data points of the `status` sequence under different databases, use the following query: - -```sql -select count(status) from root.** group by level = 1 -``` - -Result: - -``` -+-------------------------+---------------------------+ -|count(root.ln.*.*.status)|count(root.sgcc.*.*.status)| -+-------------------------+---------------------------+ -| 20160| 10080| -+-------------------------+---------------------------+ -Total line number = 1 -It costs 0.003s -``` - -**Example 2:** If you need to count the number of data points under different devices, you can specify level = 3, - -```sql -select count(status) from root.** group by level = 3 -``` - -Result: - -``` -+---------------------------+---------------------------+ -|count(root.*.*.wt01.status)|count(root.*.*.wt02.status)| -+---------------------------+---------------------------+ -| 20160| 10080| -+---------------------------+---------------------------+ -Total line number = 1 -It costs 0.003s -``` - -**Example 3:** Attention,the devices named `wt01` under databases `ln` and `sgcc` are grouped together, since they are regarded as devices with the same name. 
If you need to further count the number of data points in different devices under different databases, you can use the following query: - -```sql -select count(status) from root.** group by level = 1, 3 -``` - -Result: - -``` -+----------------------------+----------------------------+------------------------------+ -|count(root.ln.*.wt01.status)|count(root.ln.*.wt02.status)|count(root.sgcc.*.wt01.status)| -+----------------------------+----------------------------+------------------------------+ -| 10080| 10080| 10080| -+----------------------------+----------------------------+------------------------------+ -Total line number = 1 -It costs 0.003s -``` - -**Example 4:** Assuming that you want to query the maximum value of temperature sensor under all time series, you can use the following query statement: - -```sql -select max_value(temperature) from root.** group by level = 0 -``` - -Result: - -``` -+---------------------------------+ -|max_value(root.*.*.*.temperature)| -+---------------------------------+ -| 26.0| -+---------------------------------+ -Total line number = 1 -It costs 0.013s -``` - -**Example 5:** The above queries are for a certain sensor. In particular, **if you want to query the total data points owned by all sensors at a certain level**, you need to explicitly specify `*` is selected. - -```sql -select count(*) from root.ln.** group by level = 2 -``` - -Result: - -``` -+----------------------+----------------------+ -|count(root.*.wf01.*.*)|count(root.*.wf02.*.*)| -+----------------------+----------------------+ -| 20160| 20160| -+----------------------+----------------------+ -Total line number = 1 -It costs 0.013s -``` - -##### Aggregate By Time with Level Clause - -Level could be defined to show count the number of points of each node at the given level in current Metadata Tree. - -This could be used to query the number of points under each device. - -The SQL statement is: - -Get time aggregation by level. - -```sql -select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d), level=1; -``` - -Result: - -``` -+-----------------------------+-------------------------+ -| Time|COUNT(root.ln.*.*.status)| -+-----------------------------+-------------------------+ -|2017-11-02T00:00:00.000+08:00| 1440| -|2017-11-03T00:00:00.000+08:00| 1440| -|2017-11-04T00:00:00.000+08:00| 1440| -|2017-11-05T00:00:00.000+08:00| 1440| -|2017-11-06T00:00:00.000+08:00| 1440| -|2017-11-07T00:00:00.000+08:00| 1440| -|2017-11-07T23:00:00.000+08:00| 1380| -+-----------------------------+-------------------------+ -Total line number = 7 -It costs 0.006s -``` - -Time aggregation with sliding step and by level. - -```sql -select count(status) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d), level=1; -``` - -Result: - -``` -+-----------------------------+-------------------------+ -| Time|COUNT(root.ln.*.*.status)| -+-----------------------------+-------------------------+ -|2017-11-01T00:00:00.000+08:00| 180| -|2017-11-02T00:00:00.000+08:00| 180| -|2017-11-03T00:00:00.000+08:00| 180| -|2017-11-04T00:00:00.000+08:00| 180| -|2017-11-05T00:00:00.000+08:00| 180| -|2017-11-06T00:00:00.000+08:00| 180| -|2017-11-07T00:00:00.000+08:00| 180| -+-----------------------------+-------------------------+ -Total line number = 7 -It costs 0.004s -``` - -#### Aggregation By Tags - -IotDB allows you to do aggregation query with the tags defined in timeseries through `GROUP BY TAGS` clause as well. 
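-
-The general shape of the clause mirrors the other `GROUP BY` variants above; a sketch of the syntax (the concrete examples below use the tag keys `city` and `workshop`):
-
-```sql
-group by tags(tagKey1[, tagKey2, ...])
-```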
- -Firstly, we can put these example data into IoTDB, which will be used in the following feature introduction. - -These are the temperature data of the workshops, which belongs to the factory `factory1` and locates in different cities. The time range is `[1000, 10000)`. - -The device node of the timeseries path is the ID of the device. The information of city and workshop are modelled in the tags `city` and `workshop`. -The devices `d1` and `d2` belong to the workshop `d1` in `Beijing`. -`d3` and `d4` belong to the workshop `w2` in `Beijing`. -`d5` and `d6` belong to the workshop `w1` in `Shanghai`. -`d7` belongs to the workshop `w2` in `Shanghai`. -`d8` and `d9` are under maintenance, and don't belong to any workshops, so they have no tags. - - -```SQL -CREATE DATABASE root.factory1; -create timeseries root.factory1.d1.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1); -create timeseries root.factory1.d2.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1); -create timeseries root.factory1.d3.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2); -create timeseries root.factory1.d4.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2); -create timeseries root.factory1.d5.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1); -create timeseries root.factory1.d6.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1); -create timeseries root.factory1.d7.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w2); -create timeseries root.factory1.d8.temperature with datatype=FLOAT; -create timeseries root.factory1.d9.temperature with datatype=FLOAT; - -insert into root.factory1.d1(time, temperature) values(1000, 104.0); -insert into root.factory1.d1(time, temperature) values(3000, 104.2); -insert into root.factory1.d1(time, temperature) values(5000, 103.3); -insert into root.factory1.d1(time, temperature) values(7000, 104.1); - -insert into root.factory1.d2(time, temperature) values(1000, 104.4); -insert into root.factory1.d2(time, temperature) values(3000, 103.7); -insert into root.factory1.d2(time, temperature) values(5000, 103.3); -insert into root.factory1.d2(time, temperature) values(7000, 102.9); - -insert into root.factory1.d3(time, temperature) values(1000, 103.9); -insert into root.factory1.d3(time, temperature) values(3000, 103.8); -insert into root.factory1.d3(time, temperature) values(5000, 102.7); -insert into root.factory1.d3(time, temperature) values(7000, 106.9); - -insert into root.factory1.d4(time, temperature) values(1000, 103.9); -insert into root.factory1.d4(time, temperature) values(5000, 102.7); -insert into root.factory1.d4(time, temperature) values(7000, 106.9); - -insert into root.factory1.d5(time, temperature) values(1000, 112.9); -insert into root.factory1.d5(time, temperature) values(7000, 113.0); - -insert into root.factory1.d6(time, temperature) values(1000, 113.9); -insert into root.factory1.d6(time, temperature) values(3000, 113.3); -insert into root.factory1.d6(time, temperature) values(5000, 112.7); -insert into root.factory1.d6(time, temperature) values(7000, 112.3); - -insert into root.factory1.d7(time, temperature) values(1000, 101.2); -insert into root.factory1.d7(time, temperature) values(3000, 99.3); -insert into root.factory1.d7(time, temperature) values(5000, 100.1); -insert into root.factory1.d7(time, temperature) values(7000, 99.8); - -insert into root.factory1.d8(time, temperature) values(1000, 50.0); -insert into root.factory1.d8(time, temperature) values(3000, 52.1); -insert 
into root.factory1.d8(time, temperature) values(5000, 50.1); -insert into root.factory1.d8(time, temperature) values(7000, 50.5); - -insert into root.factory1.d9(time, temperature) values(1000, 50.3); -insert into root.factory1.d9(time, temperature) values(3000, 52.1); -``` - -##### Aggregation query by one single tag - -If the user wants to know the average temperature of each workshop, he can query like this - -```SQL -SELECT AVG(temperature) FROM root.factory1.** GROUP BY TAGS(city); -``` - -The query will calculate the average of the temperatures of those timeseries which have the same tag value of the key `city`. -The results are - -``` -+--------+------------------+ -| city| avg(temperature)| -+--------+------------------+ -| Beijing|104.04666697184244| -|Shanghai|107.85000076293946| -| NULL| 50.84999910990397| -+--------+------------------+ -Total line number = 3 -It costs 0.231s -``` - -From the results we can see that the differences between aggregation by tags query and aggregation by time or level query are: - -1. Aggregation query by tags will no longer remove wildcard to raw timeseries, but do the aggregation through the data of multiple timeseries, which have the same tag value. -2. Except for the aggregate result column, the result set contains the key-value column of the grouped tag. The column name is the tag key, and the values in the column are tag values which present in the searched timeseries. - If some searched timeseries doesn't have the grouped tag, a `NULL` value in the key-value column of the grouped tag will be presented, which means the aggregation of all the timeseries lacking the tagged key. - -##### Aggregation query by multiple tags - -Except for the aggregation query by one single tag, aggregation query by multiple tags in a particular order is allowed as well. - -For example, a user wants to know the average temperature of the devices in each workshop. -As the workshop names may be same in different city, it's not correct to aggregated by the tag `workshop` directly. -So the aggregation by the tag `city` should be done first, and then by the tag `workshop`. - -SQL - -```SQL -SELECT avg(temperature) FROM root.factory1.** GROUP BY TAGS(city, workshop); -``` - -The results - -``` -+--------+--------+------------------+ -| city|workshop| avg(temperature)| -+--------+--------+------------------+ -| NULL| NULL| 50.84999910990397| -|Shanghai| w1|113.01666768391927| -| Beijing| w2| 104.4000004359654| -|Shanghai| w2|100.10000038146973| -| Beijing| w1|103.73750019073486| -+--------+--------+------------------+ -Total line number = 5 -It costs 0.027s -``` - -We can see that in a multiple tags aggregation query, the result set will output the key-value columns of all the grouped tag keys, which have the same order with the one in `GROUP BY TAGS`. - -##### Downsampling Aggregation by tags based on Time Window - -Downsampling aggregation by time window is one of the most popular features in a time series database. IoTDB supports to do aggregation query by tags based on time window. - -For example, a user wants to know the average temperature of the devices in each workshop, in every 5 seconds, in the range of time `[1000, 10000)`. 
- -SQL - -```SQL -SELECT avg(temperature) FROM root.factory1.** GROUP BY ([1000, 10000), 5s), TAGS(city, workshop); -``` - -The results - -``` -+-----------------------------+--------+--------+------------------+ -| Time| city|workshop| avg(temperature)| -+-----------------------------+--------+--------+------------------+ -|1970-01-01T08:00:01.000+08:00| NULL| NULL| 50.91999893188476| -|1970-01-01T08:00:01.000+08:00|Shanghai| w1|113.20000076293945| -|1970-01-01T08:00:01.000+08:00| Beijing| w2| 103.4| -|1970-01-01T08:00:01.000+08:00|Shanghai| w2| 100.1999994913737| -|1970-01-01T08:00:01.000+08:00| Beijing| w1|103.81666692097981| -|1970-01-01T08:00:06.000+08:00| NULL| NULL| 50.5| -|1970-01-01T08:00:06.000+08:00|Shanghai| w1| 112.6500015258789| -|1970-01-01T08:00:06.000+08:00| Beijing| w2| 106.9000015258789| -|1970-01-01T08:00:06.000+08:00|Shanghai| w2| 99.80000305175781| -|1970-01-01T08:00:06.000+08:00| Beijing| w1| 103.5| -+-----------------------------+--------+--------+------------------+ -``` - -Comparing to the pure tag aggregations, this kind of aggregation will divide the data according to the time window specification firstly, and do the aggregation query by the multiple tags in each time window secondly. -The result set will also contain a time column, which have the same meaning with the time column of the result in downsampling aggregation query by time window. - -##### Limitation of Aggregation by Tags - -As this feature is still under development, some queries have not been completed yet and will be supported in the future. - -> 1. Temporarily not support `HAVING` clause to filter the results. -> 2. Temporarily not support ordering by tag values. -> 3. Temporarily not support `LIMIT`,`OFFSET`,`SLIMIT`,`SOFFSET`. -> 4. Temporarily not support `ALIGN BY DEVICE`. -> 5. Temporarily not support expressions as aggregation function parameter,e.g. `count(s+1)`. -> 6. Not support the value filter, which stands the same with the `GROUP BY LEVEL` query. - -## `HAVING` CLAUSE - -If you want to filter the results of aggregate queries, -you can use the `HAVING` clause after the `GROUP BY` clause. - -> NOTE: -> -> 1.The expression in HAVING clause must consist of aggregate values; the original sequence cannot appear alone. -> The following usages are incorrect: -> -> ```sql -> select count(s1) from root.** group by ([1,3),1ms) having sum(s1) > s1 -> select count(s1) from root.** group by ([1,3),1ms) having s1 > 1 -> ``` -> -> 2.When filtering the `GROUP BY LEVEL` result, the PATH in `SELECT` and `HAVING` can only have one node. -> The following usages are incorrect: -> -> ```sql -> select count(s1) from root.** group by ([1,3),1ms), level=1 having sum(d1.s1) > 1 -> select count(d1.s1) from root.** group by ([1,3),1ms), level=1 having sum(s1) > 1 -> ``` - -Here are a few examples of using the 'HAVING' clause to filter aggregate results. 
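-
-For contrast with the incorrect usages above, here is a minimal valid sketch in which the `HAVING` condition references only an aggregate value:
-
-```sql
-select count(s1) from root.** group by ([1,3),1ms) having count(s1) > 1
-```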
- -Aggregation result 1: - -``` -+-----------------------------+---------------------+---------------------+ -| Time|count(root.test.*.s1)|count(root.test.*.s2)| -+-----------------------------+---------------------+---------------------+ -|1970-01-01T08:00:00.001+08:00| 4| 4| -|1970-01-01T08:00:00.003+08:00| 1| 0| -|1970-01-01T08:00:00.005+08:00| 2| 4| -|1970-01-01T08:00:00.007+08:00| 3| 2| -|1970-01-01T08:00:00.009+08:00| 4| 4| -+-----------------------------+---------------------+---------------------+ -``` - -Aggregation result filtering query 1: - -```sql - select count(s1) from root.** group by ([1,11),2ms), level=1 having count(s2) > 1 -``` - -Filtering result 1: - -``` -+-----------------------------+---------------------+ -| Time|count(root.test.*.s1)| -+-----------------------------+---------------------+ -|1970-01-01T08:00:00.001+08:00| 4| -|1970-01-01T08:00:00.005+08:00| 2| -|1970-01-01T08:00:00.009+08:00| 4| -+-----------------------------+---------------------+ -``` - -Aggregation result 2: - -``` -+-----------------------------+-------------+---------+---------+ -| Time| Device|count(s1)|count(s2)| -+-----------------------------+-------------+---------+---------+ -|1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| -|1970-01-01T08:00:00.003+08:00|root.test.sg1| 1| 0| -|1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| -|1970-01-01T08:00:00.007+08:00|root.test.sg1| 2| 1| -|1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| -|1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| -|1970-01-01T08:00:00.003+08:00|root.test.sg2| 0| 0| -|1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| -|1970-01-01T08:00:00.007+08:00|root.test.sg2| 1| 1| -|1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| -+-----------------------------+-------------+---------+---------+ -``` - -Aggregation result filtering query 2: - -```sql - select count(s1), count(s2) from root.** group by ([1,11),2ms) having count(s2) > 1 align by device -``` - -Filtering result 2: - -``` -+-----------------------------+-------------+---------+---------+ -| Time| Device|count(s1)|count(s2)| -+-----------------------------+-------------+---------+---------+ -|1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| -|1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| -|1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| -|1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| -|1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| -|1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| -+-----------------------------+-------------+---------+---------+ -``` - -## `FILL` CLAUSE - -### Introduction - -When executing some queries, there may be no data for some columns in some rows, and data in these locations will be null, but this kind of null value is not conducive to data visualization and analysis, and the null value needs to be filled. - -In IoTDB, users can use the FILL clause to specify the fill mode when data is missing. Fill null value allows the user to fill any query result with null values according to a specific method, such as taking the previous value that is not null, or linear interpolation. The query result after filling the null value can better reflect the data distribution, which is beneficial for users to perform data analysis. - -### Syntax Definition - -**The following is the syntax definition of the `FILL` clause:** - -```sql -FILL '(' PREVIOUS | LINEAR | constant ')' -``` - -**Note:** - -- We can specify only one fill method in the `FILL` clause, and this method applies to all columns of the result set. 
-- Null value fill is not compatible with version 0.13 and previous syntax (`FILL(([(, , )?])+)`) is not supported anymore. - -### Fill Methods - -**IoTDB supports the following three fill methods:** - -- `PREVIOUS`: Fill with the previous non-null value of the column. -- `LINEAR`: Fill the column with a linear interpolation of the previous non-null value and the next non-null value of the column. -- Constant: Fill with the specified constant. - -**Following table lists the data types and supported fill methods.** - -| Data Type | Supported Fill Methods | -| :-------- | :---------------------- | -| boolean | previous, value | -| int32 | previous, linear, value | -| int64 | previous, linear, value | -| float | previous, linear, value | -| double | previous, linear, value | -| text | previous, value | - -**Note:** For columns whose data type does not support specifying the fill method, we neither fill it nor throw exception, just keep it as it is. - -**For examples:** - -If we don't use any fill methods: - -```sql -select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000; -``` - -the original result will be like: - -``` -+-----------------------------+-------------------------------+--------------------------+ -| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:37:00.000+08:00| 21.93| true| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:38:00.000+08:00| null| false| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:39:00.000+08:00| 22.23| null| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:40:00.000+08:00| 23.43| null| -+-----------------------------+-------------------------------+--------------------------+ -Total line number = 4 -``` - -#### `PREVIOUS` Fill - -**For null values in the query result set, fill with the previous non-null value of the column.** - -**Note:** If the first value of this column is null, we will keep first value as null and won't fill it until we meet first non-null value - -For example, with `PREVIOUS` fill, the SQL is as follows: - -```sql -select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous); -``` - -result will be like: - -``` -+-----------------------------+-------------------------------+--------------------------+ -| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:37:00.000+08:00| 21.93| true| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:38:00.000+08:00| 21.93| false| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:39:00.000+08:00| 22.23| false| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:40:00.000+08:00| 23.43| false| -+-----------------------------+-------------------------------+--------------------------+ -Total line number = 4 -``` - -**While using `FILL(PREVIOUS)`, you can specify a time interval. 
If the interval between the timestamp of the current null value and the timestamp of the previous non-null value exceeds the specified time interval, no filling will be performed.** - -> 1. In the case of FILL(LINEAR) and FILL(CONSTANT), if the second parameter is specified, an exception will be thrown -> 2. The interval parameter only supports integers - For example, the raw data looks like this: - -```sql -select s1 from root.db.d1 -``` -``` -+-----------------------------+-------------+ -| Time|root.db.d1.s1| -+-----------------------------+-------------+ -|2023-11-08T16:41:50.008+08:00| 1.0| -+-----------------------------+-------------+ -|2023-11-08T16:46:50.011+08:00| 2.0| -+-----------------------------+-------------+ -|2023-11-08T16:48:50.011+08:00| 3.0| -+-----------------------------+-------------+ -``` - -We want to group the data by 1 min time interval: - -```sql -select avg(s1) - from root.db.d1 - group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) -``` -``` -+-----------------------------+------------------+ -| Time|avg(root.db.d1.s1)| -+-----------------------------+------------------+ -|2023-11-08T16:40:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:41:00.008+08:00| 1.0| -+-----------------------------+------------------+ -|2023-11-08T16:42:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:43:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:44:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:45:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:46:00.008+08:00| 2.0| -+-----------------------------+------------------+ -|2023-11-08T16:47:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:48:00.008+08:00| 3.0| -+-----------------------------+------------------+ -|2023-11-08T16:49:00.008+08:00| null| -+-----------------------------+------------------+ -``` - -After grouping, we want to fill the null value: - -```sql -select avg(s1) - from root.db.d1 - group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) - FILL(PREVIOUS); -``` -``` -+-----------------------------+------------------+ -| Time|avg(root.db.d1.s1)| -+-----------------------------+------------------+ -|2023-11-08T16:40:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:41:00.008+08:00| 1.0| -+-----------------------------+------------------+ -|2023-11-08T16:42:00.008+08:00| 1.0| -+-----------------------------+------------------+ -|2023-11-08T16:43:00.008+08:00| 1.0| -+-----------------------------+------------------+ -|2023-11-08T16:44:00.008+08:00| 1.0| -+-----------------------------+------------------+ -|2023-11-08T16:45:00.008+08:00| 1.0| -+-----------------------------+------------------+ -|2023-11-08T16:46:00.008+08:00| 2.0| -+-----------------------------+------------------+ -|2023-11-08T16:47:00.008+08:00| 2.0| -+-----------------------------+------------------+ -|2023-11-08T16:48:00.008+08:00| 3.0| -+-----------------------------+------------------+ -|2023-11-08T16:49:00.008+08:00| 3.0| -+-----------------------------+------------------+ -``` - -we also don't want the null value to be filled if it keeps null for 2 min. 
- -```sql -select avg(s1) -from root.db.d1 -group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) - FILL(PREVIOUS, 2m); -``` -``` -+-----------------------------+------------------+ -| Time|avg(root.db.d1.s1)| -+-----------------------------+------------------+ -|2023-11-08T16:40:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:41:00.008+08:00| 1.0| -+-----------------------------+------------------+ -|2023-11-08T16:42:00.008+08:00| 1.0| -+-----------------------------+------------------+ -|2023-11-08T16:43:00.008+08:00| 1.0| -+-----------------------------+------------------+ -|2023-11-08T16:44:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:45:00.008+08:00| null| -+-----------------------------+------------------+ -|2023-11-08T16:46:00.008+08:00| 2.0| -+-----------------------------+------------------+ -|2023-11-08T16:47:00.008+08:00| 2.0| -+-----------------------------+------------------+ -|2023-11-08T16:48:00.008+08:00| 3.0| -+-----------------------------+------------------+ -|2023-11-08T16:49:00.008+08:00| 3.0| -+-----------------------------+------------------+ -``` - -#### `LINEAR` Fill - -**For null values in the query result set, fill the column with a linear interpolation of the previous non-null value and the next non-null value of the column.** - -**Note:** - -- If all the values before current value are null or all the values after current value are null, we will keep current value as null and won't fill it. -- If the column's data type is boolean/text, we neither fill it nor throw exception, just keep it as it is. - -Here we give an example of filling null values using the linear method. The SQL statement is as follows: - -For example, with `LINEAR` fill, the SQL is as follows: - -```sql -select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(linear); -``` - -result will be like: - -``` -+-----------------------------+-------------------------------+--------------------------+ -| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:37:00.000+08:00| 21.93| true| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:38:00.000+08:00| 22.08| false| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:39:00.000+08:00| 22.23| null| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:40:00.000+08:00| 23.43| null| -+-----------------------------+-------------------------------+--------------------------+ -Total line number = 4 -``` - -#### Constant Fill - -**For null values in the query result set, fill with the specified constant.** - -**Note:** - -- When using the ValueFill, IoTDB neither fill the query result if the data type is different from the input constant nor throw exception, just keep it as it is. 
- - | Constant Value Data Type | Support Data Type | - | :----------------------- | :-------------------------------------- | - | `BOOLEAN` | `BOOLEAN` `TEXT` | - | `INT64` | `INT32` `INT64` `FLOAT` `DOUBLE` `TEXT` | - | `DOUBLE` | `FLOAT` `DOUBLE` `TEXT` | - | `TEXT` | `TEXT` | - -- If constant value is larger than Integer.MAX_VALUE, IoTDB neither fill the query result if the data type is int32 nor throw exception, just keep it as it is. - -For example, with `FLOAT` constant fill, the SQL is as follows: - -```sql -select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(2.0); -``` - -result will be like: - -``` -+-----------------------------+-------------------------------+--------------------------+ -| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:37:00.000+08:00| 21.93| true| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:38:00.000+08:00| 2.0| false| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:39:00.000+08:00| 22.23| null| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:40:00.000+08:00| 23.43| null| -+-----------------------------+-------------------------------+--------------------------+ -Total line number = 4 -``` - -For example, with `BOOLEAN` constant fill, the SQL is as follows: - -```sql -select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(true); -``` - -result will be like: - -``` -+-----------------------------+-------------------------------+--------------------------+ -| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:37:00.000+08:00| 21.93| true| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:38:00.000+08:00| null| false| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:39:00.000+08:00| 22.23| true| -+-----------------------------+-------------------------------+--------------------------+ -|2017-11-01T16:40:00.000+08:00| 23.43| true| -+-----------------------------+-------------------------------+--------------------------+ -Total line number = 4 -``` - -## `LIMIT` and `SLIMIT` CLAUSES (PAGINATION) - -When the query result set has a large amount of data, it is not conducive to display on one page. You can use the `LIMIT/SLIMIT` clause and the `OFFSET/SOFFSET` clause to control paging. - -- The `LIMIT` and `SLIMIT` clauses are used to control the number of rows and columns of query results. -- The `OFFSET` and `SOFFSET` clauses are used to control the starting position of the result display. - -### Row Control over Query Results - -By using LIMIT and OFFSET clauses, users control the query results in a row-related manner. We demonstrate how to use LIMIT and OFFSET clauses through the following examples. - -* Example 1: basic LIMIT clause - -The SQL statement is: - -```sql -select status, temperature from root.ln.wf01.wt01 limit 10 -``` - -which means: - -The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". 
The SQL statement requires the first 10 rows of the query result. - -The result is shown below: - -``` -+-----------------------------+------------------------+-----------------------------+ -| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| -+-----------------------------+------------------------+-----------------------------+ -|2017-11-01T00:00:00.000+08:00| true| 25.96| -|2017-11-01T00:01:00.000+08:00| true| 24.36| -|2017-11-01T00:02:00.000+08:00| false| 20.09| -|2017-11-01T00:03:00.000+08:00| false| 20.18| -|2017-11-01T00:04:00.000+08:00| false| 21.13| -|2017-11-01T00:05:00.000+08:00| false| 22.72| -|2017-11-01T00:06:00.000+08:00| false| 20.71| -|2017-11-01T00:07:00.000+08:00| false| 21.45| -|2017-11-01T00:08:00.000+08:00| false| 22.58| -|2017-11-01T00:09:00.000+08:00| false| 20.98| -+-----------------------------+------------------------+-----------------------------+ -Total line number = 10 -It costs 0.000s -``` - -* Example 2: LIMIT clause with OFFSET - -The SQL statement is: - -```sql -select status, temperature from root.ln.wf01.wt01 limit 5 offset 3 -``` - -which means: - -The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". The SQL statement requires rows 3 to 7 of the query result be returned (with the first row numbered as row 0). - -The result is shown below: - -``` -+-----------------------------+------------------------+-----------------------------+ -| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| -+-----------------------------+------------------------+-----------------------------+ -|2017-11-01T00:03:00.000+08:00| false| 20.18| -|2017-11-01T00:04:00.000+08:00| false| 21.13| -|2017-11-01T00:05:00.000+08:00| false| 22.72| -|2017-11-01T00:06:00.000+08:00| false| 20.71| -|2017-11-01T00:07:00.000+08:00| false| 21.45| -+-----------------------------+------------------------+-----------------------------+ -Total line number = 5 -It costs 0.342s -``` - -* Example 3: LIMIT clause combined with WHERE clause - -The SQL statement is: - - -```sql -select status,temperature from root.ln.wf01.wt01 where time > 2024-07-07T00:05:00.000 and time< 2024-07-12T00:12:00.000 limit 5 offset 3 -``` - -which means: - -The selected equipment is the ln group wf01 factory wt01 equipment; The selected time series are "state" and "temperature". The SQL statement requires the return of the status and temperature sensor values between the time "2024-07-07T00:05:00.000" and "2024-07-12T00:12:00.0000" on lines 3 to 7 (the first line is numbered as line 0). 
- - -The result is shown below: - -``` -+-----------------------------+------------------------+-----------------------------+ -| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| -+-----------------------------+------------------------+-----------------------------+ -|2024-07-09T17:32:11.943+08:00| true| 24.941973| -|2024-07-09T17:32:12.944+08:00| true| 20.05108| -|2024-07-09T17:32:13.945+08:00| true| 20.541632| -|2024-07-09T17:32:14.945+08:00| null| 23.09016| -|2024-07-09T17:32:14.946+08:00| true| null| -+-----------------------------+------------------------+-----------------------------+ -Total line number = 5 -It costs 0.070s -``` - -* Example 4: LIMIT clause combined with GROUP BY clause - -The SQL statement is: - -```sql -select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) limit 5 offset 3 -``` - -which means: - -The SQL statement clause requires rows 3 to 7 of the query result be returned (with the first row numbered as row 0). - -The result is shown below: - -``` -+-----------------------------+-------------------------------+----------------------------------------+ -| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| -+-----------------------------+-------------------------------+----------------------------------------+ -|2017-11-04T00:00:00.000+08:00| 1440| 26.0| -|2017-11-05T00:00:00.000+08:00| 1440| 26.0| -|2017-11-06T00:00:00.000+08:00| 1440| 25.99| -|2017-11-07T00:00:00.000+08:00| 1380| 26.0| -+-----------------------------+-------------------------------+----------------------------------------+ -Total line number = 4 -It costs 0.016s -``` - -### Column Control over Query Results - -By using SLIMIT and SOFFSET clauses, users can control the query results in a column-related manner. We will demonstrate how to use SLIMIT and SOFFSET clauses through the following examples. - -* Example 1: basic SLIMIT clause - -The SQL statement is: - -```sql -select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 -``` - -which means: - -The selected device is ln group wf01 plant wt01 device; the selected timeseries is the first column under this device, i.e., the power supply status. The SQL statement requires the status sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" be selected. - -The result is shown below: - -``` -+-----------------------------+-----------------------------+ -| Time|root.ln.wf01.wt01.temperature| -+-----------------------------+-----------------------------+ -|2017-11-01T00:06:00.000+08:00| 20.71| -|2017-11-01T00:07:00.000+08:00| 21.45| -|2017-11-01T00:08:00.000+08:00| 22.58| -|2017-11-01T00:09:00.000+08:00| 20.98| -|2017-11-01T00:10:00.000+08:00| 25.52| -|2017-11-01T00:11:00.000+08:00| 22.91| -+-----------------------------+-----------------------------+ -Total line number = 6 -It costs 0.000s -``` - -* Example 2: SLIMIT clause with SOFFSET - -The SQL statement is: - -```sql -select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 1 -``` - -which means: - -The selected device is ln group wf01 plant wt01 device; the selected timeseries is the second column under this device, i.e., the temperature. The SQL statement requires the temperature sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" be selected. 
- -The result is shown below: - -``` -+-----------------------------+------------------------+ -| Time|root.ln.wf01.wt01.status| -+-----------------------------+------------------------+ -|2017-11-01T00:06:00.000+08:00| false| -|2017-11-01T00:07:00.000+08:00| false| -|2017-11-01T00:08:00.000+08:00| false| -|2017-11-01T00:09:00.000+08:00| false| -|2017-11-01T00:10:00.000+08:00| true| -|2017-11-01T00:11:00.000+08:00| false| -+-----------------------------+------------------------+ -Total line number = 6 -It costs 0.003s -``` - -* Example 3: SLIMIT clause combined with GROUP BY clause - -The SQL statement is: - -```sql -select max_value(*) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) slimit 1 soffset 1 -``` - -The result is shown below: - -``` -+-----------------------------+-----------------------------------+ -| Time|max_value(root.ln.wf01.wt01.status)| -+-----------------------------+-----------------------------------+ -|2017-11-01T00:00:00.000+08:00| true| -|2017-11-02T00:00:00.000+08:00| true| -|2017-11-03T00:00:00.000+08:00| true| -|2017-11-04T00:00:00.000+08:00| true| -|2017-11-05T00:00:00.000+08:00| true| -|2017-11-06T00:00:00.000+08:00| true| -|2017-11-07T00:00:00.000+08:00| true| -+-----------------------------+-----------------------------------+ -Total line number = 7 -It costs 0.000s -``` - -### Row and Column Control over Query Results - -In addition to row or column control over query results, IoTDB allows users to control both rows and columns of query results. Here is a complete example with both LIMIT clauses and SLIMIT clauses. - -The SQL statement is: - -```sql -select * from root.ln.wf01.wt01 limit 10 offset 100 slimit 2 soffset 0 -``` - -which means: - -The selected device is ln group wf01 plant wt01 device; the selected timeseries is columns 0 to 1 under this device (with the first column numbered as column 0). The SQL statement clause requires rows 100 to 109 of the query result be returned (with the first row numbered as row 0). - -The result is shown below: - -``` -+-----------------------------+-----------------------------+------------------------+ -| Time|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| -+-----------------------------+-----------------------------+------------------------+ -|2017-11-01T01:40:00.000+08:00| 21.19| false| -|2017-11-01T01:41:00.000+08:00| 22.79| false| -|2017-11-01T01:42:00.000+08:00| 22.98| false| -|2017-11-01T01:43:00.000+08:00| 21.52| false| -|2017-11-01T01:44:00.000+08:00| 23.45| true| -|2017-11-01T01:45:00.000+08:00| 24.06| true| -|2017-11-01T01:46:00.000+08:00| 22.6| false| -|2017-11-01T01:47:00.000+08:00| 23.78| true| -|2017-11-01T01:48:00.000+08:00| 24.72| true| -|2017-11-01T01:49:00.000+08:00| 24.68| true| -+-----------------------------+-----------------------------+------------------------+ -Total line number = 10 -It costs 0.009s -``` - -### Error Handling - -If the parameter N/SN of LIMIT/SLIMIT exceeds the size of the result set, IoTDB returns all the results as expected. 
For example, the query result of the original SQL statement consists of six rows, and we select the first 100 rows through the LIMIT clause: - -```sql -select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 100 -``` - -The result is shown below: - -``` -+-----------------------------+------------------------+-----------------------------+ -| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| -+-----------------------------+------------------------+-----------------------------+ -|2017-11-01T00:06:00.000+08:00| false| 20.71| -|2017-11-01T00:07:00.000+08:00| false| 21.45| -|2017-11-01T00:08:00.000+08:00| false| 22.58| -|2017-11-01T00:09:00.000+08:00| false| 20.98| -|2017-11-01T00:10:00.000+08:00| true| 25.52| -|2017-11-01T00:11:00.000+08:00| false| 22.91| -+-----------------------------+------------------------+-----------------------------+ -Total line number = 6 -It costs 0.005s -``` - -If the parameter N/SN of LIMIT/SLIMIT clause exceeds the allowable maximum value (N/SN is of type int64), the system prompts errors. For example, executing the following SQL statement: - -```sql -select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 9223372036854775808 -``` - -The SQL statement will not be executed and the corresponding error prompt is given as follows: - -``` -Msg: 416: Out of range. LIMIT : N should be Int64. -``` - -If the parameter N/SN of LIMIT/SLIMIT clause is not a positive intege, the system prompts errors. For example, executing the following SQL statement: - -```sql -select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 13.1 -``` - -The SQL statement will not be executed and the corresponding error prompt is given as follows: - -``` -Msg: 401: line 1:129 mismatched input '.' expecting {, ';'} -``` - -If the parameter OFFSET of LIMIT clause exceeds the size of the result set, IoTDB will return an empty result set. For example, executing the following SQL statement: - -```sql -select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 2 offset 6 -``` - -The result is shown below: - -``` -+----+------------------------+-----------------------------+ -|Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| -+----+------------------------+-----------------------------+ -+----+------------------------+-----------------------------+ -Empty set. -It costs 0.005s -``` - -If the parameter SOFFSET of SLIMIT clause is not smaller than the number of available timeseries, the system prompts errors. For example, executing the following SQL statement: - -```sql -select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 2 -``` - -The SQL statement will not be executed and the corresponding error prompt is given as follows: - -``` -Msg: 411: Meet error in query process: The value of SOFFSET (2) is equal to or exceeds the number of sequences (2) that can actually be returned. -``` - -## `ORDER BY` CLAUSE - -### Order by in ALIGN BY TIME mode - -The result set of IoTDB is in ALIGN BY TIME mode by default and `ORDER BY TIME` clause can also be used to specify the ordering of timestamp. 
The SQL statement is: - -```sql -select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time desc; -``` - -Results: - -``` -+-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ -| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| -+-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ -|2017-11-01T00:01:00.000+08:00| v2| true| 24.36| true| -|2017-11-01T00:00:00.000+08:00| v2| true| 25.96| true| -|1970-01-01T08:00:00.002+08:00| v2| false| null| null| -|1970-01-01T08:00:00.001+08:00| v1| true| null| null| -+-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ -``` - -### Order by in ALIGN BY DEVICE mode - -When querying in ALIGN BY DEVICE mode, `ORDER BY` clause can be used to specify the ordering of result set. - -ALIGN BY DEVICE mode supports four kinds of clauses with two sort keys which are `Device` and `Time`. - -1. ``ORDER BY DEVICE``: sort by the alphabetical order of the device name. The devices with the same column names will be clustered in a group view. - -2. ``ORDER BY TIME``: sort by the timestamp, the data points from different devices will be shuffled according to the timestamp. - -3. ``ORDER BY DEVICE,TIME``: sort by the alphabetical order of the device name. The data points with the same device name will be sorted by timestamp. - -4. ``ORDER BY TIME,DEVICE``: sort by timestamp. The data points with the same time will be sorted by the alphabetical order of the device name. - -> To make the result set more legible, when `ORDER BY` clause is not used, default settings will be provided. -> The default ordering clause is `ORDER BY DEVICE,TIME` and the default ordering is `ASC`. - -When `Device` is the main sort key, the result set is sorted by device name first, then by timestamp in the group with the same device name, the SQL statement is: - -```sql -select * from root.ln.** where time <= 2017-11-01T00:01:00 order by device desc,time asc align by device; -``` - -The result shows below: - -``` -+-----------------------------+-----------------+--------+------+-----------+ -| Time| Device|hardware|status|temperature| -+-----------------------------+-----------------+--------+------+-----------+ -|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| -|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| -|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| -|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| -|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| -|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| -+-----------------------------+-----------------+--------+------+-----------+ -``` - -When `Time` is the main sort key, the result set is sorted by timestamp first, then by device name in data points with the same timestamp. 
The SQL statement is: - -```sql -select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time asc,device desc align by device; -``` - -The result shows below: - -``` -+-----------------------------+-----------------+--------+------+-----------+ -| Time| Device|hardware|status|temperature| -+-----------------------------+-----------------+--------+------+-----------+ -|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| -|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| -|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| -|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| -|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| -|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| -+-----------------------------+-----------------+--------+------+-----------+ -``` - -When `ORDER BY` clause is not used, sort in default way, the SQL statement is: - -```sql -select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; -``` - -The result below indicates `ORDER BY DEVICE ASC,TIME ASC` is the clause in default situation. -`ASC` can be omitted because it's the default ordering. - -``` -+-----------------------------+-----------------+--------+------+-----------+ -| Time| Device|hardware|status|temperature| -+-----------------------------+-----------------+--------+------+-----------+ -|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| -|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| -|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| -|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| -|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| -|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| -+-----------------------------+-----------------+--------+------+-----------+ -``` - -Besides,`ALIGN BY DEVICE` and `ORDER BY` clauses can be used with aggregate query,the SQL statement is: - -```sql -select count(*) from root.ln.** group by ((2017-11-01T00:00:00.000+08:00,2017-11-01T00:03:00.000+08:00],1m) order by device asc,time asc align by device -``` - -The result shows below: - -``` -+-----------------------------+-----------------+---------------+-------------+------------------+ -| Time| Device|count(hardware)|count(status)|count(temperature)| -+-----------------------------+-----------------+---------------+-------------+------------------+ -|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| 1| 1| -|2017-11-01T00:02:00.000+08:00|root.ln.wf01.wt01| null| 0| 0| -|2017-11-01T00:03:00.000+08:00|root.ln.wf01.wt01| null| 0| 0| -|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| 1| 1| null| -|2017-11-01T00:02:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| -|2017-11-01T00:03:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| -+-----------------------------+-----------------+---------------+-------------+------------------+ -``` - -### Order by arbitrary expressions - -In addition to the predefined keywords "Time" and "Device" in IoTDB, `ORDER BY` can also be used to sort by any expressions. - -When sorting, `ASC` or `DESC` can be used to specify the sorting order, and `NULLS` syntax is supported to specify the priority of NULL values in the sorting. By default, `NULLS FIRST` places NULL values at the top of the result, and `NULLS LAST` ensures that NULL values appear at the end of the result. If not specified in the clause, the default order is ASC with NULLS LAST. 
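As a quick illustration of these modifiers (written against the sample data introduced just below, and assuming the series `score` from that sample), the following sketch sorts by score in descending order but lists devices whose score is NULL first instead of last:

```sql
select score from root.** order by score desc nulls first align by device
```

Replacing `nulls first` with `nulls last`, or omitting it, restores the default placement of NULL values described above.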
- -Here are several examples of queries for sorting arbitrary expressions using the following data: - -``` -+-----------------------------+-------------+-------+-------+--------+-------+ -| Time| Device| base| score| bonus| total| -+-----------------------------+-------------+-------+-------+--------+-------+ -|1970-01-01T08:00:00.000+08:00| root.one| 12| 50.0| 45.0| 107.0| -|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0| 105.0| -|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0| 103.0| -|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| -|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| -|1970-01-01T08:00:00.010+08:00| root.three| 9| null| 24.0| 33.0| -|1970-01-01T08:00:00.020+08:00| root.three| 8| null| 22.5| 30.5| -|1970-01-01T08:00:00.030+08:00| root.three| 7| null| 23.5| 30.5| -|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| -|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| -|1970-01-01T08:00:00.030+08:00| root.five| 7| 53.0| 44.0| 104.0| -|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0| 102.0| -+-----------------------------+-------------+-------+-------+--------+-------+ -``` - -When you need to sort the results based on the base score score, you can use the following SQL: - -```Sql -select score from root.** order by score desc align by device -``` - -This will give you the following results: - -``` -+-----------------------------+---------+-----+ -| Time| Device|score| -+-----------------------------+---------+-----+ -|1970-01-01T08:00:00.040+08:00|root.five| 54.0| -|1970-01-01T08:00:00.030+08:00|root.five| 53.0| -|1970-01-01T08:00:00.000+08:00| root.one| 50.0| -|1970-01-02T08:00:00.000+08:00| root.one| 50.0| -|1970-01-03T08:00:00.000+08:00| root.one| 50.0| -|1970-01-01T08:00:00.000+08:00| root.two| 50.0| -|1970-01-01T08:00:00.010+08:00| root.two| 50.0| -|1970-01-01T08:00:00.010+08:00|root.four| 32.0| -|1970-01-01T08:00:00.020+08:00|root.four| 32.0| -|1970-01-01T08:00:00.020+08:00| root.two| 10.0| -+-----------------------------+---------+-----+ -``` - -If you want to sort the results based on the total score, you can use an expression in the `ORDER BY` clause to perform the calculation: - -```Sql -select score,total from root.one order by base+score+bonus desc -``` - -This SQL is equivalent to: - -```Sql -select score,total from root.one order by total desc -``` - -Here are the results: - -``` -+-----------------------------+--------------+--------------+ -| Time|root.one.score|root.one.total| -+-----------------------------+--------------+--------------+ -|1970-01-01T08:00:00.000+08:00| 50.0| 107.0| -|1970-01-02T08:00:00.000+08:00| 50.0| 105.0| -|1970-01-03T08:00:00.000+08:00| 50.0| 103.0| -+-----------------------------+--------------+--------------+ -``` - -If you want to sort the results based on the total score and, in case of tied scores, sort by score, base, bonus, and submission time in descending order, you can specify multiple layers of sorting using multiple expressions: - -```Sql -select base, score, bonus, total from root.** order by total desc NULLS Last, - score desc NULLS Last, - bonus desc NULLS Last, - time desc align by device -``` - -Here are the results: - -``` -+-----------------------------+----------+----+-----+-----+-----+ -| Time| Device|base|score|bonus|total| -+-----------------------------+----------+----+-----+-----+-----+ -|1970-01-01T08:00:00.000+08:00| root.one| 12| 50.0| 45.0|107.0| -|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0|105.0| 
-|1970-01-01T08:00:00.030+08:00| root.five| 7| 53.0| 44.0|104.0| -|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0|103.0| -|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0|102.0| -|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| -|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| -|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| -|1970-01-01T08:00:00.000+08:00| root.two| 9| 50.0| 15.0| 74.0| -|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| -|1970-01-01T08:00:00.010+08:00|root.three| 9| null| 24.0| 33.0| -|1970-01-01T08:00:00.030+08:00|root.three| 7| null| 23.5| 30.5| -|1970-01-01T08:00:00.020+08:00|root.three| 8| null| 22.5| 30.5| -+-----------------------------+----------+----+-----+-----+-----+ -``` - -In the `ORDER BY` clause, you can also use aggregate query expressions. For example: - -```Sql -select min_value(total) from root.** order by min_value(total) asc align by device -``` - -This will give you the following results: - -``` -+----------+----------------+ -| Device|min_value(total)| -+----------+----------------+ -|root.three| 30.5| -| root.two| 33.0| -| root.four| 85.0| -| root.five| 102.0| -| root.one| 103.0| -+----------+----------------+ -``` - -When specifying multiple columns in the query, the unsorted columns will change order along with the rows and sorted columns. The order of rows when the sorting columns are the same may vary depending on the specific implementation (no fixed order). For example: - -```Sql -select min_value(total),max_value(base) from root.** order by max_value(total) desc align by device -``` - -This will give you the following results: -· - -``` -+----------+----------------+---------------+ -| Device|min_value(total)|max_value(base)| -+----------+----------------+---------------+ -| root.one| 103.0| 12| -| root.five| 102.0| 7| -| root.four| 85.0| 9| -| root.two| 33.0| 9| -|root.three| 30.5| 9| -+----------+----------------+---------------+ -``` - -You can use both `ORDER BY DEVICE,TIME` and `ORDER BY EXPRESSION` together. For example: - -```Sql -select score from root.** order by device asc, score desc, time asc align by device -``` - -This will give you the following results: - -``` -+-----------------------------+---------+-----+ -| Time| Device|score| -+-----------------------------+---------+-----+ -|1970-01-01T08:00:00.040+08:00|root.five| 54.0| -|1970-01-01T08:00:00.030+08:00|root.five| 53.0| -|1970-01-01T08:00:00.010+08:00|root.four| 32.0| -|1970-01-01T08:00:00.020+08:00|root.four| 32.0| -|1970-01-01T08:00:00.000+08:00| root.one| 50.0| -|1970-01-02T08:00:00.000+08:00| root.one| 50.0| -|1970-01-03T08:00:00.000+08:00| root.one| 50.0| -|1970-01-01T08:00:00.000+08:00| root.two| 50.0| -|1970-01-01T08:00:00.010+08:00| root.two| 50.0| -|1970-01-01T08:00:00.020+08:00| root.two| 10.0| -+-----------------------------+---------+-----+ -``` - -## `ALIGN BY` CLAUSE - -In addition, IoTDB supports another result set format: `ALIGN BY DEVICE`. - -### Align by Device - -The `ALIGN BY DEVICE` indicates that the deviceId is considered as a column. Therefore, there are totally limited columns in the dataset. - -> NOTE: -> -> 1.You can see the result of 'align by device' as one relational table, `Time + Device` is the primary key of this Table. -> -> 2.The result is order by `Device` firstly, and then by `Time` order. 
- -The SQL statement is: - -```sql -select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; -``` - -The result shows below: - -``` -+-----------------------------+-----------------+-----------+------+--------+ -| Time| Device|temperature|status|hardware| -+-----------------------------+-----------------+-----------+------+--------+ -|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| 25.96| true| null| -|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| 24.36| true| null| -|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| null| true| v1| -|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| null| false| v2| -|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| null| true| v2| -|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| null| true| v2| -+-----------------------------+-----------------+-----------+------+--------+ -Total line number = 6 -It costs 0.012s -``` - -### Ordering in ALIGN BY DEVICE - -ALIGN BY DEVICE mode arranges according to the device first, and sort each device in ascending order according to the timestamp. The ordering and priority can be adjusted through `ORDER BY` clause. - -## `INTO` CLAUSE (QUERY WRITE-BACK) - -The `SELECT INTO` statement copies data from query result set into target time series. - -The application scenarios are as follows: - -- **Implement IoTDB internal ETL**: ETL the original data and write a new time series. -- **Query result storage**: Persistently store the query results, which acts like a materialized view. -- **Non-aligned time series to aligned time series**: Rewrite non-aligned time series into another aligned time series. - -### SQL Syntax - -#### Syntax Definition - -**The following is the syntax definition of the `select` statement:** - -```sql -selectIntoStatement -: SELECT - resultColumn [, resultColumn] ... - INTO intoItem [, intoItem] ... - FROM prefixPath [, prefixPath] ... - [WHERE whereCondition] - [GROUP BY groupByTimeClause, groupByLevelClause] - [FILL {PREVIOUS | LINEAR | constant}] - [LIMIT rowLimit OFFSET rowOffset] - [ALIGN BY DEVICE] -; - -intoItem -: [ALIGNED] intoDevicePath '(' intoMeasurementName [',' intoMeasurementName]* ')' - ; -``` - -#### `INTO` Clause - -The `INTO` clause consists of several `intoItem`. - -Each `intoItem` consists of a target device and a list of target measurements (similar to the `INTO` clause in an `INSERT` statement). - -Each target measurement and device form a target time series, and an `intoItem` contains a series of time series. For example: `root.sg_copy.d1(s1, s2)` specifies two target time series `root.sg_copy.d1.s1` and `root.sg_copy.d1.s2`. - -The target time series specified by the `INTO` clause must correspond one-to-one with the columns of the query result set. The specific rules are as follows: - -- **Align by time** (default): The number of target time series contained in all `intoItem` must be consistent with the number of columns in the query result set (except the time column) and correspond one-to-one in the order from left to right in the header. -- **Align by device** (using `ALIGN BY DEVICE`): the number of target devices specified in all `intoItem` is the same as the number of devices queried (i.e., the number of devices matched by the path pattern in the `FROM` clause), and One-to-one correspondence according to the output order of the result set device. -
The number of measurements specified for each target device should be consistent with the number of columns in the query result set (except for the time and device columns). It should be in one-to-one correspondence from left to right in the header. - -For examples: - -- **Example 1** (aligned by time) - -```shell -IoTDB> select s1, s2 into root.sg_copy.d1(t1), root.sg_copy.d2(t1, t2), root.sg_copy.d1(t2) from root.sg.d1, root.sg.d2; -+--------------+-------------------+--------+ -| source column| target timeseries| written| -+--------------+-------------------+--------+ -| root.sg.d1.s1| root.sg_copy.d1.t1| 8000| -+--------------+-------------------+--------+ -| root.sg.d2.s1| root.sg_copy.d2.t1| 10000| -+--------------+-------------------+--------+ -| root.sg.d1.s2| root.sg_copy.d2.t2| 12000| -+--------------+-------------------+--------+ -| root.sg.d2.s2| root.sg_copy.d1.t2| 10000| -+--------------+-------------------+--------+ -Total line number = 4 -It costs 0.725s -``` - -This statement writes the query results of the four time series under the `root.sg` database to the four specified time series under the `root.sg_copy` database. Note that `root.sg_copy.d2(t1, t2)` can also be written as `root.sg_copy.d2(t1), root.sg_copy.d2(t2)`. - -We can see that the writing of the `INTO` clause is very flexible as long as the combined target time series is not repeated and corresponds to the query result column one-to-one. - -> In the result set displayed by `CLI`, the meaning of each column is as follows: -> -> - The `source column` column represents the column name of the query result. -> - `target timeseries` represents the target time series for the corresponding column to write. -> - `written` indicates the amount of data expected to be written. - - -- **Example 2** (aligned by time) - -```shell -IoTDB> select count(s1 + s2), last_value(s2) into root.agg.count(s1_add_s2), root.agg.last_value(s2) from root.sg.d1 group by ([0, 100), 10ms); -+--------------------------------------+-------------------------+--------+ -| source column| target timeseries| written| -+--------------------------------------+-------------------------+--------+ -| count(root.sg.d1.s1 + root.sg.d1.s2)| root.agg.count.s1_add_s2| 10| -+--------------------------------------+-------------------------+--------+ -| last_value(root.sg.d1.s2)| root.agg.last_value.s2| 10| -+--------------------------------------+-------------------------+--------+ -Total line number = 2 -It costs 0.375s -``` - -This statement stores the results of an aggregated query into the specified time series. 
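Because the target series created by a query write-back are ordinary time series, the persisted results can be read back with a normal query afterwards. A minimal sketch, assuming the statement of Example 2 above has already been executed:

```sql
select * from root.agg.**
```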
- -- **Example 3** (aligned by device) - -```shell -IoTDB> select s1, s2 into root.sg_copy.d1(t1, t2), root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; -+--------------+--------------+-------------------+--------+ -| source device| source column| target timeseries| written| -+--------------+--------------+-------------------+--------+ -| root.sg.d1| s1| root.sg_copy.d1.t1| 8000| -+--------------+--------------+-------------------+--------+ -| root.sg.d1| s2| root.sg_copy.d1.t2| 11000| -+--------------+--------------+-------------------+--------+ -| root.sg.d2| s1| root.sg_copy.d2.t1| 12000| -+--------------+--------------+-------------------+--------+ -| root.sg.d2| s2| root.sg_copy.d2.t2| 9000| -+--------------+--------------+-------------------+--------+ -Total line number = 4 -It costs 0.625s -``` - -This statement also writes the query results of the four time series under the `root.sg` database to the four specified time series under the `root.sg_copy` database. However, in ALIGN BY DEVICE, the number of `intoItem` must be the same as the number of queried devices, and each queried device corresponds to one `intoItem`. - -> When aligning the query by device, the result set displayed by `CLI` has one more column, the `source device` column indicating the queried device. - -- **Example 4** (aligned by device) - -```shell -IoTDB> select s1 + s2 into root.expr.add(d1s1_d1s2), root.expr.add(d2s1_d2s2) from root.sg.d1, root.sg.d2 align by device; -+--------------+--------------+------------------------+--------+ -| source device| source column| target timeseries| written| -+--------------+--------------+------------------------+--------+ -| root.sg.d1| s1 + s2| root.expr.add.d1s1_d1s2| 10000| -+--------------+--------------+------------------------+--------+ -| root.sg.d2| s1 + s2| root.expr.add.d2s1_d2s2| 10000| -+--------------+--------------+------------------------+--------+ -Total line number = 2 -It costs 0.532s -``` - -This statement stores the result of evaluating an expression into the specified time series. - -#### Using variable placeholders - -In particular, We can use variable placeholders to describe the correspondence between the target and query time series, simplifying the statement. The following two variable placeholders are currently supported: - -- Suffix duplication character `::`: Copy the suffix (or measurement) of the query device, indicating that from this layer to the last layer (or measurement) of the device, the node name (or measurement) of the target device corresponds to the queried device The node name (or measurement) is the same. -- Single-level node matcher `${i}`: Indicates that the current level node name of the target sequence is the same as the i-th level node name of the query sequence. For example, for the path `root.sg1.d1.s1`, `${1}` means `sg1`, `${2}` means `d1`, and `${3}` means `s1`. - -When using variable placeholders, there must be no ambiguity in the correspondence between `intoItem` and the columns of the query result set. The specific cases are classified as follows: - -##### ALIGN BY TIME (default) - -> Note: The variable placeholder **can only describe the correspondence between time series**. If the query includes aggregation and expression calculation, the columns in the query result cannot correspond to a time series, so neither the target device nor the measurement can use variable placeholders. 
- -###### (1) The target device does not use variable placeholders & the target measurement list uses variable placeholders - -**Limitations:** - -1. In each `intoItem`, the length of the list of physical quantities must be 1.
(If the length can be greater than 1, e.g. `root.sg1.d1(::, s1)`, it is not possible to determine which columns match `::`) -2. The number of `intoItem` is 1, or the same as the number of columns in the query result set.
(When the length of each target measurement list is 1, if there is only one `intoItem`, it means that all the query sequences are written to the same device; if the number of `intoItem` is consistent with the query sequence, it is expressed as each query time series specifies a target device; if `intoItem` is greater than one and less than the number of query sequences, it cannot be a one-to-one correspondence with the query sequence) - -**Matching method:** Each query time series specifies the target device, and the target measurement is generated from the variable placeholder. - -**Example:** - -```sql -select s1, s2 -into root.sg_copy.d1(::), root.sg_copy.d2(s1), root.sg_copy.d1(${3}), root.sg_copy.d2(::) -from root.sg.d1, root.sg.d2; -```` - -This statement is equivalent to: - -```sql -select s1, s2 -into root.sg_copy.d1(s1), root.sg_copy.d2(s1), root.sg_copy.d1(s2), root.sg_copy.d2(s2) -from root.sg.d1, root.sg.d2; -```` - -As you can see, the statement is not very simplified in this case. - -###### (2) The target device uses variable placeholders & the target measurement list does not use variable placeholders - -**Limitations:** The number of target measurements in all `intoItem` is the same as the number of columns in the query result set. - -**Matching method:** The target measurement is specified for each query time series, and the target device is generated according to the target device placeholder of the `intoItem` where the corresponding target measurement is located. - -**Example:** - -```sql -select d1.s1, d1.s2, d2.s3, d3.s4 -into ::(s1_1, s2_2), root.sg.d2_2(s3_3), root.${2}_copy.::(s4) -from root.sg; -```` - -###### (3) The target device uses variable placeholders & the target measurement list uses variable placeholders - -**Limitations:** There is only one `intoItem`, and the length of the list of measurement list is 1. - -**Matching method:** Each query time series can get a target time series according to the variable placeholder. - -**Example:** - -```sql -select * into root.sg_bk.::(::) from root.sg.**; -```` - -Write the query results of all time series under `root.sg` to `root.sg_bk`, the device name suffix and measurement remain unchanged. - -##### ALIGN BY DEVICE - -> Note: The variable placeholder **can only describe the correspondence between time series**. If the query includes aggregation and expression calculation, the columns in the query result cannot correspond to a specific physical quantity, so the target measurement cannot use variable placeholders. - -###### (1) The target device does not use variable placeholders & the target measurement list uses variable placeholders - -**Limitations:** In each `intoItem`, if the list of measurement uses variable placeholders, the length of the list must be 1. - -**Matching method:** Each query time series specifies the target device, and the target measurement is generated from the variable placeholder. - -**Example:** - -```sql -select s1, s2, s3, s4 -into root.backup_sg.d1(s1, s2, s3, s4), root.backup_sg.d2(::), root.sg.d3(backup_${4}) -from root.sg.d1, root.sg.d2, root.sg.d3 -align by device; -```` - -###### (2) The target device uses variable placeholders & the target measurement list does not use variable placeholders - -**Limitations:** There is only one `intoItem`. 
(If there are multiple `intoItem` with placeholders, we will not know which source devices each `intoItem` needs to match) - -**Matching method:** Each query device obtains a target device according to the variable placeholder, and the target measurement written in each column of the result set under each device is specified by the target measurement list. - -**Example:** - -```sql -select avg(s1), sum(s2) + sum(s3), count(s4) -into root.agg_${2}.::(avg_s1, sum_s2_add_s3, count_s4) -from root.** -align by device; -```` - -###### (3) The target device uses variable placeholders & the target measurement list uses variable placeholders - -**Limitations:** There is only one `intoItem` and the length of the target measurement list is 1. - -**Matching method:** Each query time series can get a target time series according to the variable placeholder. - -**Example:** - -```sql -select * into ::(backup_${4}) from root.sg.** align by device; -```` - -Write the query result of each time series in `root.sg` to the same device, and add `backup_` before the measurement. - -#### Specify the target time series as the aligned time series - -We can use the `ALIGNED` keyword to specify the target device for writing to be aligned, and each `intoItem` can be set independently. - -**Example:** - -```sql -select s1, s2 into root.sg_copy.d1(t1, t2), aligned root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; -``` - -This statement specifies that `root.sg_copy.d1` is an unaligned device and `root.sg_copy.d2` is an aligned device. - -#### Unsupported query clauses - -- `SLIMIT`, `SOFFSET`: The query columns are uncertain, so they are not supported. -- `LAST`, `GROUP BY TAGS`, `DISABLE ALIGN`: The table structure is inconsistent with the writing structure, so it is not supported. - -#### Other points to note - -- For general aggregation queries, the timestamp is meaningless, and the convention is to use 0 to store. -- When the target time-series exists, the data type of the source column and the target time-series must be compatible. About data type compatibility, see the document [Data Type](../Basic-Concept/Data-Type.md#Data Type Compatibility). -- When the target time series does not exist, the system automatically creates it (including the database). -- When the queried time series does not exist, or the queried sequence does not have data, the target time series will not be created automatically. - -### Application examples - -#### Implement IoTDB internal ETL - -ETL the original data and write a new time series. 
- -```shell -IOTDB > SELECT preprocess_udf(s1, s2) INTO ::(preprocessed_s1, preprocessed_s2) FROM root.sg.* ALIGN BY DEIVCE; -+--------------+-------------------+---------------------------+--------+ -| source device| source column| target timeseries| written| -+--------------+-------------------+---------------------------+--------+ -| root.sg.d1| preprocess_udf(s1)| root.sg.d1.preprocessed_s1| 8000| -+--------------+-------------------+---------------------------+--------+ -| root.sg.d1| preprocess_udf(s2)| root.sg.d1.preprocessed_s2| 10000| -+--------------+-------------------+---------------------------+--------+ -| root.sg.d2| preprocess_udf(s1)| root.sg.d2.preprocessed_s1| 11000| -+--------------+-------------------+---------------------------+--------+ -| root.sg.d2| preprocess_udf(s2)| root.sg.d2.preprocessed_s2| 9000| -+--------------+-------------------+---------------------------+--------+ -``` - -#### Query result storage - -Persistently store the query results, which acts like a materialized view. - -```shell -IOTDB > SELECT count(s1), last_value(s1) INTO root.sg.agg_${2}(count_s1, last_value_s1) FROM root.sg1.d1 GROUP BY ([0, 10000), 10ms); -+--------------------------+-----------------------------+--------+ -| source column| target timeseries| written| -+--------------------------+-----------------------------+--------+ -| count(root.sg.d1.s1)| root.sg.agg_d1.count_s1| 1000| -+--------------------------+-----------------------------+--------+ -| last_value(root.sg.d1.s2)| root.sg.agg_d1.last_value_s2| 1000| -+--------------------------+-----------------------------+--------+ -Total line number = 2 -It costs 0.115s -``` - -#### Non-aligned time series to aligned time series - -Rewrite non-aligned time series into another aligned time series. - -**Note:** It is recommended to use the `LIMIT & OFFSET` clause or the `WHERE` clause (time filter) to batch data to prevent excessive data volume in a single operation. - -```shell -IOTDB > SELECT s1, s2 INTO ALIGNED root.sg1.aligned_d(s1, s2) FROM root.sg1.non_aligned_d WHERE time >= 0 and time < 10000; -+--------------------------+----------------------+--------+ -| source column| target timeseries| written| -+--------------------------+----------------------+--------+ -| root.sg1.non_aligned_d.s1| root.sg1.aligned_d.s1| 10000| -+--------------------------+----------------------+--------+ -| root.sg1.non_aligned_d.s2| root.sg1.aligned_d.s2| 10000| -+--------------------------+----------------------+--------+ -Total line number = 2 -It costs 0.375s -``` - -### User Permission Management - -The user must have the following permissions to execute a query write-back statement: - -* All `WRITE_SCHEMA` permissions for the source series in the `select` clause. -* All `WRITE_DATA` permissions for the target series in the `into` clause. - -For more user permissions related content, please refer to [Account Management Statements](./Authority-Management.md). - -### Configurable Properties - -* `select_into_insert_tablet_plan_row_limit`: The maximum number of rows can be processed in one insert-tablet-plan when executing select-into statements. 10000 by default. 
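Relatedly, the `LIMIT & OFFSET` batching recommended earlier for large write-backs interacts with this row limit: each statement processes at most the configured number of rows per insert-tablet-plan. A minimal sketch of offset-based batching, reusing the series names from the non-aligned example above (the batch size of 10000 is only illustrative):

```sql
SELECT s1, s2 INTO ALIGNED root.sg1.aligned_d(s1, s2) FROM root.sg1.non_aligned_d LIMIT 10000 OFFSET 0;
SELECT s1, s2 INTO ALIGNED root.sg1.aligned_d(s1, s2) FROM root.sg1.non_aligned_d LIMIT 10000 OFFSET 10000;
```

Each statement copies at most one batch of rows; the offset is advanced until a statement reports that nothing was written.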
diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Security-Management_timecho.md b/src/UserGuide/V1.3.0-2/User-Manual/Security-Management_timecho.md
deleted file mode 100644
index b0c19a135..000000000
--- a/src/UserGuide/V1.3.0-2/User-Manual/Security-Management_timecho.md
+++ /dev/null
@@ -1,144 +0,0 @@

# Security Management

## White List

**function description**

Controls which client addresses are allowed to connect to IoTDB.

**configuration file**

conf/iotdb-common.properties

conf/white.list

**configuration item**

iotdb-common.properties:

Decide whether to enable the white list.

```YAML
# Whether to enable white list
enable_white_list=true
```

white.list:

Decide which IP addresses can connect to IoTDB.

```YAML
# Comments are supported
# Supports precise matching, one IP per line
10.2.3.4

# Supports * wildcards, one IP per line
10.*.1.3
10.100.0.*
```

**note**

1. If the white list itself is cancelled via the session client, the current connection is not immediately disconnected; it is rejected the next time a connection is created.
2. If white.list is modified directly, it takes effect within one minute. If it is modified via the session client, it takes effect immediately, updating both the values in memory and the white.list file on disk.
3. If the white list function is enabled but the white.list file does not exist, the DB service starts successfully, but all connections are rejected.
4. If the white.list file is deleted while the DB service is running, all connections are denied after at most one minute.
5. The configuration item that enables or disables the white list function can be hot loaded.
6. When the white list is modified through the Java native interface, the modification must be performed by the root user; modifications by non-root users are rejected. The modified content must be legal, otherwise a StatementExecutionException is thrown.

![](/img/%E7%99%BD%E5%90%8D%E5%8D%95.png)

## Audit log

### Background of the function

The audit log is the operation record of a database. Operations such as additions, deletions, modifications, and queries performed by users can be traced through the audit log function to ensure information security. With the audit log function of IoTDB, the following scenarios can be achieved:

- Decide whether to record audit logs according to the source of the connection (human operation or not). For example, data written by a hardware collector (a non-human operation) does not need audit logs, while data operated on by ordinary users through tools such as the CLI or Workbench (a human operation) does.
- Filter out system-level write operations, such as those recorded by the IoTDB monitoring system itself.

#### Scene Description

##### Logging all operations (add, delete, change, check) of all users

The audit log function traces all user operations in the database. The recorded information should include data operations (add, delete, query), metadata operations (add, modify, delete, query), and client login information (user name, IP address).

Client Sources:
- CLI, Workbench, Zeppelin, Grafana, and requests sent through protocols such as Session/JDBC/MQTT

![](/img/%E5%AE%A1%E8%AE%A1%E6%97%A5%E5%BF%97.png)

##### Audit logging can be turned off for some user connections

No audit logs are required for data written by the hardware collector via Session/JDBC/MQTT if it is a non-human action.
### Function Definition

It is available through the following configurations:

- Decide whether to enable the audit function or not
- Decide where to output the audit logs, supporting output to one or more of:
  1. log file
  2. IoTDB storage
- Decide whether to block the native interface writes to prevent recording too many audit logs, which would affect performance.
- Decide the content categories of the audit log, supporting recording of one or more of:
  1. data addition and deletion operations
  2. data and metadata query operations
  3. metadata adding, modifying, and deleting operations

#### configuration item

In iotdb-common.properties, change the following configurations:

```YAML
####################
### Audit log Configuration
####################

# whether to enable the audit log.
# Datatype: Boolean
# enable_audit_log=false

# Output location of audit logs
# Datatype: String
# IOTDB: the stored time series is: root.__system.audit._{user}
# LOGGER: log_audit.log in the log directory
# audit_log_storage=IOTDB,LOGGER

# whether to enable audit log for DML operations of data
# whether to enable audit log for DDL operations of schema
# whether to enable audit log for QUERY operations of data and schema
# Datatype: String
# audit_log_operation=DML,DDL,QUERY

# whether the local write api records audit logs
# Datatype: Boolean
# This contains Session insert api: insertRecord(s), insertTablet(s), insertRecordsOfOneDevice
# MQTT insert api
# RestAPI insert api
# This parameter overrides the DML setting in audit_log_operation
# enable_audit_log_for_native_insert_api=true
```

diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Streaming_apache.md b/src/UserGuide/V1.3.0-2/User-Manual/Streaming_apache.md
deleted file mode 100644
index be8d1c08b..000000000
--- a/src/UserGuide/V1.3.0-2/User-Manual/Streaming_apache.md
+++ /dev/null
@@ -1,800 +0,0 @@

# Stream Processing

The IoTDB stream processing framework allows users to implement customized stream processing logic, which can monitor and capture storage engine changes, transform the changed data, and push the transformed data outward.

We call a data flow processing task a Pipe. A stream processing task (Pipe) contains three subtasks:

- Source task
- Processor task
- Sink task

The stream processing framework allows users to customize the processing logic of the three subtasks using the Java language and process data in a UDF-like manner.
In a Pipe, the three subtasks mentioned above are executed and implemented by three types of plugins. Data flows through these three plugins sequentially for processing:
Pipe Source is used to extract data, Pipe Processor is used to process data, Pipe Sink is used to send data, and the final data will be sent to an external system.

**The model for a Pipe task is as follows:**

![pipe.png](/img/1706778988482.jpg)

A data stream processing task essentially describes the attributes of the Pipe Source, Pipe Processor, and Pipe Sink plugins.

Users can configure the specific attributes of these three subtasks declaratively using SQL statements. By combining different attributes, flexible data ETL (Extract, Transform, Load) capabilities can be achieved.

Using the stream processing framework, it is possible to build a complete data pipeline to fulfill various requirements such as *edge-to-cloud synchronization, remote disaster recovery, and read/write load balancing across multiple databases*.
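For instance, a pipe that simply forwards local writes to another IoTDB instance might be declared roughly as follows. This is only a sketch: the plugin names and attribute keys shown here (`iotdb-source`, `do-nothing-processor`, `iotdb-thrift-sink`, `sink.node-urls`) are assumptions modeled on commonly available built-in plugins, and the plugins actually usable depend on your deployment:

```sql
CREATE PIPE my_sync_pipe
WITH SOURCE ('source' = 'iotdb-source')
WITH PROCESSOR ('processor' = 'do-nothing-processor')
WITH SINK ('sink' = 'iotdb-thrift-sink', 'sink.node-urls' = '127.0.0.1:6668')
```

Dropping the pipe later (with a `DROP PIPE` statement) stops all three subtasks.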
## Custom Stream Processing Plugin Development

### Programming development dependencies

It is recommended to use Maven to build the project. Add the following dependencies in the `pom.xml` file. Please make sure to choose dependencies with the same version as the IoTDB server version.

```xml
<dependency>
    <groupId>org.apache.iotdb</groupId>
    <artifactId>pipe-api</artifactId>
    <version>1.3.1</version>
    <scope>provided</scope>
</dependency>
```

### Event-Driven Programming Model

The design of user programming interfaces for stream processing plugins follows the principles of the event-driven programming model. In this model, events serve as the abstraction of data in the user programming interface. The programming interface is decoupled from the specific execution method, allowing the focus to be on describing how the system expects events (data) to be processed upon arrival.

In the user programming interface of stream processing plugins, events abstract the write operations of database data. Events are captured by the local stream processing engine and passed sequentially through the three stages of stream processing, namely Pipe Source, Pipe Processor, and Pipe Sink plugins. User logic is triggered and executed within these three plugins.

To accommodate both low-latency stream processing in low-load scenarios and high-throughput stream processing in high-load scenarios at the edge, the stream processing engine dynamically chooses the processing objects from operation logs and data files. Therefore, the user programming interface for stream processing requires the user to provide the handling logic for two types of events: TabletInsertionEvent for operation log write events and TsFileInsertionEvent for data file write events.

#### **TabletInsertionEvent**

The TabletInsertionEvent is a high-level data abstraction for user write requests, which provides the ability to manipulate the underlying data of the write request by providing a unified operation interface.

For different database deployments, the underlying storage structure corresponding to the operation log write event is different. For stand-alone deployment scenarios, the operation log write event is an encapsulation of write-ahead log (WAL) entries; for distributed deployment scenarios, the operation log write event is an encapsulation of individual node consensus protocol operation log entries.

For write operations generated by different write request interfaces of the database, the data structure of the request corresponding to the operation log write event is also different. IoTDB provides many write interfaces such as InsertRecord, InsertRecords, InsertTablet, InsertTablets, and so on; each kind of write request uses a completely different serialisation method and generates different binary entries.

The existence of operation log write events provides users with a unified view of data operations, which shields the implementation differences of the underlying data structures, greatly reduces the programming threshold for users, and improves the ease of use of the functionality.

```java
/** TabletInsertionEvent is used to define the event of data insertion. */
public interface TabletInsertionEvent extends Event {

  /**
   * The consumer processes the data row by row and collects the results by RowCollector.
   *
   * @return {@code Iterable<TabletInsertionEvent>} a list of new TabletInsertionEvent contains the
   *     results collected by the RowCollector
   */
  Iterable<TabletInsertionEvent> processRowByRow(BiConsumer<Row, RowCollector> consumer);

  /**
   * The consumer processes the Tablet directly and collects the results by RowCollector.
   *
   * @return {@code Iterable<TabletInsertionEvent>} a list of new TabletInsertionEvent contains the
   *     results collected by the RowCollector
   */
  Iterable<TabletInsertionEvent> processTablet(BiConsumer<Tablet, RowCollector> consumer);
}
```

#### **TsFileInsertionEvent**

The TsFileInsertionEvent represents a high-level abstraction of the database's disk flush operation and is a collection of multiple TabletInsertionEvents.

IoTDB's storage engine is based on the LSM (Log-Structured Merge) structure. When data is written, the write operations are first flushed to log-structured files, while the written data is also stored in memory. When the memory reaches its capacity limit, a flush operation is triggered, converting the data in memory into a database file while deleting the previously written log entries. During the conversion from memory data to database file data, two compression processes, encoding compression and universal compression, are applied. As a result, the data in the database file occupies less space compared to the original data in memory.

In extreme network conditions, directly transferring data files is more cost-effective than transmitting individual write operations. It consumes lower network bandwidth and achieves faster transmission speed. However, there is no such thing as a free lunch. Performing calculations on data in the disk file incurs additional costs for file I/O compared to performing calculations directly on data in memory. Nevertheless, the coexistence of disk data files and memory write operations permits dynamic trade-offs and adjustments. It is based on this observation that the data file write event is introduced into the event model of the plugin.

In summary, the data file write event appears in the event stream of stream processing plugins in the following two scenarios:

1. Historical data extraction: Before a stream processing task starts, all persisted write data exists in the form of TsFiles. When collecting historical data at the beginning of a stream processing task, the historical data is abstracted as TsFileInsertionEvent.

2. Real-time data extraction: During the execution of a stream processing task, if the speed of processing the log entries representing real-time operations is slower than the rate of write requests, the unprocessed log entries will be persisted to disk in the form of TsFiles. When these data are extracted by the stream processing engine, they are abstracted as TsFileInsertionEvent.

```java
/**
 * TsFileInsertionEvent is used to define the event of writing TsFile. Event data stores in disks,
 * which is compressed and encoded, and requires IO cost for computational processing.
 */
public interface TsFileInsertionEvent extends Event {

  /**
   * The method is used to convert the TsFileInsertionEvent into several TabletInsertionEvents.
   *
   * @return {@code Iterable<TabletInsertionEvent>} the list of TabletInsertionEvent
   */
  Iterable<TabletInsertionEvent> toTabletInsertionEvents();
}
```

### Custom Stream Processing Plugin Programming Interface Definition

Based on the custom stream processing plugin programming interface, users can easily write data extraction plugins, data processing plugins, and data sending plugins, allowing the stream processing functionality to adapt flexibly to various industrial scenarios.
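Once a custom plugin implementing one of these interfaces has been packaged into a JAR, it normally has to be registered under an alias before a pipe can reference it. The statement below is an assumed sketch only: the alias, class name, and JAR URI are placeholders, and the exact registration syntax should be checked against the plugin-management section of the documentation for your IoTDB version:

```sql
CREATE PIPEPLUGIN my_processor
AS 'com.example.pipe.MyProcessorPlugin'
USING URI 'https://example.com/my-pipe-plugins.jar'
```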
-#### Data Extraction Plugin Interface - -Data extraction is the first stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data extraction plugin (PipeSource) serves as a bridge between the stream processing engine and the storage engine. It captures various data write events by listening to the behavior of the storage engine. -```java -/** - * PipeSource - * - *

PipeSource is responsible for capturing events from sources. - * - *

Various data sources can be supported by implementing different PipeSource classes. - * - *

The lifecycle of a PipeSource is as follows: - * - *

    - *
  • When a collaboration task is created, the KV pairs of `WITH SOURCE` clause in SQL are - * parsed and the validation method {@link PipeSource#validate(PipeParameterValidator)} will - * be called to validate the parameters. - *
  • Before the collaboration task starts, the method {@link - * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} will be called to - * config the runtime behavior of the PipeSource. - *
  • Then the method {@link PipeSource#start()} will be called to start the PipeSource. - *
  • While the collaboration task is in progress, the method {@link PipeSource#supply()} will be - * called to capture events from sources and then the events will be passed to the - * PipeProcessor. - *
  • The method {@link PipeSource#close()} will be called when the collaboration task is - * cancelled (the `DROP PIPE` command is executed). - *
- */ -public interface PipeSource extends PipePlugin { - - /** - * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link - * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. - * - * @param validator the validator used to validate {@link PipeParameters} - * @throws Exception if any parameter is not valid - */ - void validate(PipeParameterValidator validator) throws Exception; - - /** - * This method is mainly used to customize PipeSource. In this method, the user can do the - * following things: - * - *
    - *
  • Use PipeParameters to parse key-value pair attributes entered by the user. - *
  • Set the running configurations in PipeSourceRuntimeConfiguration. - *
- * - *

This method is called after the method {@link PipeSource#validate(PipeParameterValidator)} - * is called. - * - * @param parameters used to parse the input parameters entered by the user - * @param configuration used to set the required properties of the running PipeSource - * @throws Exception the user can throw errors if necessary - */ - void customize(PipeParameters parameters, PipeSourceRuntimeConfiguration configuration) - throws Exception; - - /** - * Start the source. After this method is called, events should be ready to be supplied by - * {@link PipeSource#supply()}. This method is called after {@link - * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. - * - * @throws Exception the user can throw errors if necessary - */ - void start() throws Exception; - - /** - * Supply single event from the source and the caller will send the event to the processor. - * This method is called after {@link PipeSource#start()} is called. - * - * @return the event to be supplied. the event may be null if the source has no more events at - * the moment, but the source is still running for more events. - * @throws Exception the user can throw errors if necessary - */ - Event supply() throws Exception; -} -``` - -#### Data Processing Plugin Interface - -Data processing is the second stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data processing plugin (PipeProcessor) is primarily used for filtering and transforming the various events captured by the data extraction plugin (PipeSource). - -```java -/** - * PipeProcessor - * - *

PipeProcessor is used to filter and transform the Event formed by the PipeSource. - * - *

The lifecycle of a PipeProcessor is as follows: - * - *

    - *
  • When a collaboration task is created, the KV pairs of `WITH PROCESSOR` clause in SQL are - * parsed and the validation method {@link PipeProcessor#validate(PipeParameterValidator)} - * will be called to validate the parameters. - *
  • Before the collaboration task starts, the method {@link - * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} will be called - * to config the runtime behavior of the PipeProcessor. - *
  • While the collaboration task is in progress: - *
      - *
    • PipeSource captures the events and wraps them into three types of Event instances. - *
    • PipeProcessor processes the events and then passes them to the PipeSink. The - * following 3 methods will be called: {@link - * PipeProcessor#process(TabletInsertionEvent, EventCollector)}, {@link - * PipeProcessor#process(TsFileInsertionEvent, EventCollector)} and {@link - * PipeProcessor#process(Event, EventCollector)}. - *
    • PipeSink serializes the events into binaries and sends them to sinks. - *
    - *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link - * PipeProcessor#close() } method will be called. - *
- */ -public interface PipeProcessor extends PipePlugin { - - /** - * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link - * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} is called. - * - * @param validator the validator used to validate {@link PipeParameters} - * @throws Exception if any parameter is not valid - */ - void validate(PipeParameterValidator validator) throws Exception; - - /** - * This method is mainly used to customize PipeProcessor. In this method, the user can do the - * following things: - * - *
    - *
  • Use PipeParameters to parse key-value pair attributes entered by the user. - *
  • Set the running configurations in PipeProcessorRuntimeConfiguration. - *
- * - *

This method is called after the method {@link - * PipeProcessor#validate(PipeParameterValidator)} is called and before the beginning of the - * events processing. - * - * @param parameters used to parse the input parameters entered by the user - * @param configuration used to set the required properties of the running PipeProcessor - * @throws Exception the user can throw errors if necessary - */ - void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) - throws Exception; - - /** - * This method is called to process the TabletInsertionEvent. - * - * @param tabletInsertionEvent TabletInsertionEvent to be processed - * @param eventCollector used to collect result events after processing - * @throws Exception the user can throw errors if necessary - */ - void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector) - throws Exception; - - /** - * This method is called to process the TsFileInsertionEvent. - * - * @param tsFileInsertionEvent TsFileInsertionEvent to be processed - * @param eventCollector used to collect result events after processing - * @throws Exception the user can throw errors if necessary - */ - default void process(TsFileInsertionEvent tsFileInsertionEvent, EventCollector eventCollector) - throws Exception { - for (final TabletInsertionEvent tabletInsertionEvent : - tsFileInsertionEvent.toTabletInsertionEvents()) { - process(tabletInsertionEvent, eventCollector); - } - } - - /** - * This method is called to process the Event. - * - * @param event Event to be processed - * @param eventCollector used to collect result events after processing - * @throws Exception the user can throw errors if necessary - */ - void process(Event event, EventCollector eventCollector) throws Exception; -} -``` - -#### Data Sending Plugin Interface - -Data sending is the third stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data sending plugin (PipeSink) is responsible for sending the various events processed by the data processing plugin (PipeProcessor). It serves as the network implementation layer of the stream processing framework and should support multiple real-time communication protocols and connectors in its interface. - -```java -/** - * PipeSink - * - *

PipeSink is responsible for sending events to sinks. - * - *

Various network protocols can be supported by implementing different PipeSink classes. - * - *

The lifecycle of a PipeSink is as follows: - * - *

    - *
  • When a collaboration task is created, the KV pairs of `WITH SINK` clause in SQL are - * parsed and the validation method {@link PipeSink#validate(PipeParameterValidator)} will be - * called to validate the parameters. - *
  • Before the collaboration task starts, the method {@link PipeSink#customize(PipeParameters, - * PipeSinkRuntimeConfiguration)} will be called to configure the runtime behavior of the - * PipeSink and the method {@link PipeSink#handshake()} will be called to create a connection - * with sink. - *
  • While the collaboration task is in progress: - *
      - *
    • PipeSource captures the events and wraps them into three types of Event instances. - *
    • PipeProcessor processes the events and then passes them to the PipeSink. - *
    • PipeSink serializes the events into binaries and sends them to sinks. The following 3 - * methods will be called: {@link PipeSink#transfer(TabletInsertionEvent)}, {@link - * PipeSink#transfer(TsFileInsertionEvent)} and {@link PipeSink#transfer(Event)}. - *
    - *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link - * PipeSink#close() } method will be called. - *
- * - *

In addition, the method {@link PipeSink#heartbeat()} will be called periodically to check - * whether the connection with sink is still alive. The method {@link PipeSink#handshake()} will be - * called to create a new connection with the sink when the method {@link PipeSink#heartbeat()} - * throws exceptions. - */ -public interface PipeSink extends PipePlugin { - - /** - * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link - * PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called. - * - * @param validator the validator used to validate {@link PipeParameters} - * @throws Exception if any parameter is not valid - */ - void validate(PipeParameterValidator validator) throws Exception; - - /** - * This method is mainly used to customize PipeSink. In this method, the user can do the following - * things: - * - *

    - *
  • Use PipeParameters to parse key-value pair attributes entered by the user. - *
  • Set the running configurations in PipeSinkRuntimeConfiguration. - *
- * - *

This method is called after the method {@link PipeSink#validate(PipeParameterValidator)} is - * called and before the method {@link PipeSink#handshake()} is called. - * - * @param parameters used to parse the input parameters entered by the user - * @param configuration used to set the required properties of the running PipeSink - * @throws Exception the user can throw errors if necessary - */ - void customize(PipeParameters parameters, PipeSinkRuntimeConfiguration configuration) - throws Exception; - - /** - * This method is used to create a connection with sink. This method will be called after the - * method {@link PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called or - * will be called when the method {@link PipeSink#heartbeat()} throws exceptions. - * - * @throws Exception if the connection is failed to be created - */ - void handshake() throws Exception; - - /** - * This method will be called periodically to check whether the connection with sink is still - * alive. - * - * @throws Exception if the connection dies - */ - void heartbeat() throws Exception; - - /** - * This method is used to transfer the TabletInsertionEvent. - * - * @param tabletInsertionEvent TabletInsertionEvent to be transferred - * @throws PipeConnectionException if the connection is broken - * @throws Exception the user can throw errors if necessary - */ - void transfer(TabletInsertionEvent tabletInsertionEvent) throws Exception; - - /** - * This method is used to transfer the TsFileInsertionEvent. - * - * @param tsFileInsertionEvent TsFileInsertionEvent to be transferred - * @throws PipeConnectionException if the connection is broken - * @throws Exception the user can throw errors if necessary - */ - default void transfer(TsFileInsertionEvent tsFileInsertionEvent) throws Exception { - try { - for (final TabletInsertionEvent tabletInsertionEvent : - tsFileInsertionEvent.toTabletInsertionEvents()) { - transfer(tabletInsertionEvent); - } - } finally { - tsFileInsertionEvent.close(); - } - } - - /** - * This method is used to transfer the generic events, including HeartbeatEvent. - * - * @param event Event to be transferred - * @throws PipeConnectionException if the connection is broken - * @throws Exception the user can throw errors if necessary - */ - void transfer(Event event) throws Exception; -} -``` - -## Custom Stream Processing Plugin Management - -To ensure the flexibility and usability of user-defined plugins in production environments, the system needs to provide the capability to dynamically manage plugins. This section introduces the management statements for stream processing plugins, which enable the dynamic and unified management of plugins. - -### Load Plugin Statement - -In IoTDB, to dynamically load a user-defined plugin into the system, you first need to implement a specific plugin class based on PipeSource, PipeProcessor, or PipeSink. Then, you need to compile and package the plugin class into an executable jar file. Finally, you can use the loading plugin management statement to load the plugin into IoTDB. - -The syntax of the loading plugin management statement is as follows: - -```sql -CREATE PIPEPLUGIN -AS -USING -``` - -For example, if a user implements a data processing plugin with the fully qualified class name "edu.tsinghua.iotdb.pipe.ExampleProcessor" and packages it into a jar file, which is stored at "https://example.com:8080/iotdb/pipe-plugin.jar", and the user wants to use this plugin in the stream processing engine, marking the plugin as "example". 
The creation statement for this data processing plugin is as follows: - -```sql -CREATE PIPEPLUGIN example -AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' -USING URI -``` - -### Delete Plugin Statement - -When user no longer wants to use a plugin and needs to uninstall the plugin from the system, you can use the Remove plugin statement as shown below. -```sql -DROP PIPEPLUGIN -``` - -### Show Plugin Statement - -User can also view the plugin in the system on need. The statement to view plugin is as follows. -```sql -SHOW PIPEPLUGINS -``` - -## System Pre-installed Stream Processing Plugin - -### Pre-built Source Plugin - -#### iotdb-source - -Function: Extract historical or realtime data inside IoTDB into pipe. - - -| key | value | value range | required or optional with default | -|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|-----------------------------------| -| source | iotdb-source | String: iotdb-source | required | -| source.pattern | path prefix for filtering time series | String: any time series prefix | optional: root | -| source.history.start-time | start of synchronizing historical data event time,including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | -| source.history.end-time | end of synchronizing historical data event time,including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | -| start-time(V1.3.1+) | start of synchronizing all data event time,including start-time. Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | -| end-time(V1.3.1+) | end of synchronizing all data event time,including end-time. Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | - -> 🚫 **source.pattern Parameter Description** -> -> * Pattern should use backquotes to modify illegal characters or illegal path nodes, for example, if you want to filter root.\`a@b\` or root.\`123\`, you should set the pattern to root.\`a@b\` or root.\`123\`(Refer specifically to [Timing of single and double quotes and backquotes](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) -> * In the underlying implementation, when pattern is detected as root (default value) or a database name, synchronization efficiency is higher, and any other format will reduce performance. -> * The path prefix does not need to form a complete path. For example, when creating a pipe with the parameter 'source.pattern'='root.aligned.1': -> -> * root.aligned.1TS -> * root.aligned.1TS.\`1\` -> * root.aligned.100TS -> -> the data will be synchronized; -> -> * root.aligned.\`123\` -> -> the data will not be synchronized. - -> ❗️**start-time, end-time parameter description of source** -> -> * start-time, end-time should be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00. However, version 1.3.1+ supports timeStamp format like 1706704494000. - -> ✅ **A piece of data from production to IoTDB contains two key concepts of time** -> -> * **event time:** the time when the data is actually produced (or the generation time assigned to the data by the data production system, which is a time item in the data point), also called the event time. -> * **arrival time:** the time the data arrived in the IoTDB system. 
-> -> The out-of-order data we often refer to refers to data whose **event time** is far behind the current system time (or the maximum **event time** that has been dropped) when the data arrives. On the other hand, whether it is out-of-order data or sequential data, as long as they arrive newly in the system, their **arrival time** will increase with the order in which the data arrives at IoTDB. - -> 💎 **the work of iotdb-source can be split into two stages** -> -> 1. Historical data extraction: All data with **arrival time** < **current system time** when creating the pipe is called historical data -> 2. Realtime data extraction: All data with **arrival time** >= **current system time** when the pipe is created is called realtime data -> -> The historical data transmission phase and the realtime data transmission phase are executed serially. Only when the historical data transmission phase is completed, the realtime data transmission phase is executed.** - -### Pre-built Processor Plugin - -#### do-nothing-processor - -Function: Do not do anything with the events passed in by the source. - - -| key | value | value range | required or optional with default | -|-----------|----------------------|------------------------------|-----------------------------------| -| processor | do-nothing-processor | String: do-nothing-processor | required | -### Pre-built Sink Plugin - -#### do-nothing-sink - -Function: Does not do anything with the events passed in by the processor. - - -| key | value | value range | required or optional with default | -|------|-----------------|-------------------------|-----------------------------------| -| sink | do-nothing-sink | String: do-nothing-sink | required | - -## Stream Processing Task Management - -### Create Stream Processing Task - -A stream processing task can be created using the `CREATE PIPE` statement, a sample SQL statement is shown below: - -```sql -CREATE PIPE -- PipeId is the name that uniquely identifies the sync task -WITH SOURCE ( - -- Default IoTDB Data Extraction Plugin - 'source' = 'iotdb-source', - -- Path prefix, only data that can match the path prefix will be extracted for subsequent processing and delivery - 'source.pattern' = 'root.timecho', - -- Whether to extract historical data - 'source.history.enable' = 'true', - -- Describes the time range of the historical data being extracted, indicating the earliest possible time - 'source.history.start-time' = '2011.12.03T10:15:30+01:00', - -- Describes the time range of the extracted historical data, indicating the latest time - 'source.history.end-time' = '2022.12.03T10:15:30+01:00', - -- Whether to extract realtime data - 'source.realtime.enable' = 'true', -) -WITH PROCESSOR ( - -- Default data processing plugin, means no processing - 'processor' = 'do-nothing-processor', -) -WITH SINK ( - -- IoTDB data sending plugin with target IoTDB - 'sink' = 'iotdb-thrift-sink', - -- Data service for one of the DataNode nodes on the target IoTDB ip - 'sink.ip' = '127.0.0.1', - -- Data service port of one of the DataNode nodes of the target IoTDB - 'sink.port' = '6667', -) -``` - -**To create a stream processing task it is necessary to configure the PipeId and the parameters of the three plugin sections:** - - -| configuration item | description | Required or not | default implementation | Default implementation description | Whether to allow custom implementations | 
-|--------------------|-------------------------------------------------------------------------------------|---------------------------------|------------------------|-----------------------------------------------------------------------------------------------|-----------------------------------------| -| pipeId | Globally uniquely identifies the name of a sync task | required | - | - | - | -| source | pipe Source plugin, for extracting synchronized data at the bottom of the database | Optional | iotdb-source | Integrate all historical data of the database and subsequent realtime data into the sync task | no | -| processor | Pipe Processor plugin, for processing data | Optional | do-nothing-processor | no processing of incoming data | yes | -| sink | Pipe Sink plugin,for sending data | required | - | - | yes | - -In the example, the iotdb-source, do-nothing-processor, and iotdb-thrift-sink plugins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plugins, **see the section "System pre-built data synchronisation plugins" **. See the "System Pre-installed Stream Processing Plugin" section**. - -**An example of a minimalist CREATE PIPE statement is as follows:** - -```sql -CREATE PIPE -- PipeId is a name that uniquely identifies the task. -WITH SINK ( - -- IoTDB data sending plugin with target IoTDB - 'sink' = 'iotdb-thrift-sink', - -- Data service for one of the DataNode nodes on the target IoTDB ip - 'sink.ip' = '127.0.0.1', - -- Data service port of one of the DataNode nodes of the target IoTDB - 'sink.port' = '6667', -) -``` - -The expressed semantics are: synchronise the full amount of historical data and subsequent arrivals of realtime data from this database instance to the IoTDB instance with target 127.0.0.1:6667. - -**Note:** - -- SOURCE and PROCESSOR are optional, if no configuration parameters are filled in, the system will use the corresponding default implementation. -- The SINK is a mandatory configuration that needs to be declared in the CREATE PIPE statement for configuring purposes. -- The SINK exhibits self-reusability. For different tasks, if their SINK possesses identical KV properties (where the value corresponds to every key), **the system will ultimately create only one instance of the SINK** to achieve resource reuse for connections. - - - For example, there are the following pipe1, pipe2 task declarations: - - ```sql - CREATE PIPE pipe1 - WITH SINK ( - 'sink' = 'iotdb-thrift-sink', - 'sink.thrift.host' = 'localhost', - 'sink.thrift.port' = '9999', - ) - - CREATE PIPE pipe2 - WITH SINK ( - 'sink' = 'iotdb-thrift-sink', - 'sink.thrift.port' = '9999', - 'sink.thrift.host' = 'localhost', - ) - ``` - -- Since they have identical SINK declarations (**even if the order of some properties is different**), the framework will automatically reuse the SINK declared by them. Hence, the SINK instances for pipe1 and pipe2 will be the same. -- Please note that we should avoid constructing application scenarios that involve data cycle sync (as it can result in an infinite loop): - -- IoTDB A -> IoTDB B -> IoTDB A -- IoTDB A -> IoTDB A - -### Start Stream Processing Task - -After the successful execution of the CREATE PIPE statement, task-related instances will be created. However, the overall task's running status will be set to STOPPED(V1.3.0), meaning the task will not immediately process data. In version 1.3.1 and later, the status of the task will be set to RUNNING after CREATE. 
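To check which initial state a newly created task is in on your version, you can list the tasks and inspect the `State` column. This is a minimal sketch using the `SHOW PIPES` statement described later in this document:

```sql
-- On V1.3.0 a newly created task is listed with State = STOPPED;
-- on V1.3.1 and later it is listed with State = RUNNING
SHOW PIPES
```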
- -You can use the START PIPE statement to make the stream processing task start processing data: -```sql -START PIPE -``` - -### Stop Stream Processing Task - -Use the STOP PIPE statement to stop the stream processing task from processing data: - -```sql -STOP PIPE -``` - -### Delete Stream Processing Task - -If a stream processing task is in the RUNNING state, you can use the DROP PIPE statement to stop it and delete the entire task: - -```sql -DROP PIPE -``` - -Before deleting a stream processing task, there is no need to execute the STOP operation. - -### Show Stream Processing Task - -Use the SHOW PIPES statement to view all stream processing tasks: -```sql -SHOW PIPES -``` - -The query results are as follows: - -```sql -+-----------+-----------------------+-------+----------+-------------+--------+----------------+ -| ID| CreationTime | State|PipeSource|PipeProcessor|PipeSink|ExceptionMessage| -+-----------+-----------------------+-------+----------+-------------+--------+----------------+ -|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| {}| -+-----------+-----------------------+-------+----------+-------------+--------+----------------+ -|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| -+-----------+-----------------------+-------+----------+-------------+--------+----------------+ -``` - -You can use `` to specify the status of a stream processing task you want to see: -```sql -SHOW PIPE -``` - -Additionally, the WHERE clause can be used to determine if the Pipe Sink used by a specific \ is being reused. - -```sql -SHOW PIPES -WHERE SINK USED BY -``` - -### Stream Processing Task Running Status Migration - -A stream processing task status can transition through several states during the lifecycle of a data synchronization pipe: - -- **RUNNING:** The pipe is actively processing data - - After the successful creation of a pipe, its initial state is set to RUNNING (V1.3.1+) -- **STOPPED:** The pipe is in a stopped state. It can have the following possibilities: - - After the successful creation of a pipe, its initial state is set to RUNNING (V1.3.0) - - The user manually pauses a pipe that is in normal running state, transitioning its status from RUNNING to STOPPED - - If a pipe encounters an unrecoverable error during execution, its status automatically changes from RUNNING to STOPPED. -- **DROPPED:** The pipe is permanently deleted - -The following diagram illustrates the different states and their transitions: - -![state migration diagram](/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) - -## Authority Management - -### Stream Processing Task - -| Authority Name | Description | -|----------------|---------------------------------| -| USE_PIPE | Register task,path-independent | -| USE_PIPE | Start task,path-independent | -| USE_PIPE | Stop task,path-independent | -| USE_PIPE | Uninstall task,path-independent | -| USE_PIPE | Query task,path-independent | -### Stream Processing Task Plugin - - -| Authority Name | Description | -|----------------|---------------------------------------------------------| -| USE_PIPE | Register stream processing task plugin,path-independent | -| USE_PIPE | Delete stream processing task plugin,path-independent | -| USE_PIPE | Query stream processing task plugin,path-independent | - -## Configure Parameters - -In iotdb-common.properties : - -V1.3.0: -```Properties -#################### -### Pipe Configuration -#################### - -# Uncomment the following field to configure the pipe lib directory. 
-# For Windows platform -# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is -# absolute. Otherwise, it is relative. -# pipe_lib_dir=ext\\pipe -# For Linux platform -# If its prefix is "/", then the path is absolute. Otherwise, it is relative. -# pipe_lib_dir=ext/pipe - -# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. -# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). -# pipe_subtask_executor_max_thread_num=5 - -# The connection timeout (in milliseconds) for the thrift client. -# pipe_connector_timeout_ms=900000 - -# The maximum number of selectors that can be used in the async connector. -# pipe_async_connector_selector_number=1 - -# The core number of clients that can be used in the async connector. -# pipe_async_connector_core_client_number=8 - -# The maximum number of clients that can be used in the async connector. -# pipe_async_connector_max_client_number=16 -``` - -V1.3.1+: -```Properties -#################### -### Pipe Configuration -#################### - -# Uncomment the following field to configure the pipe lib directory. -# For Windows platform -# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is -# absolute. Otherwise, it is relative. -# pipe_lib_dir=ext\\pipe -# For Linux platform -# If its prefix is "/", then the path is absolute. Otherwise, it is relative. -# pipe_lib_dir=ext/pipe - -# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. -# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). -# pipe_subtask_executor_max_thread_num=5 - -# The connection timeout (in milliseconds) for the thrift client. -# pipe_sink_timeout_ms=900000 - -# The maximum number of selectors that can be used in the sink. -# Recommend to set this value to less than or equal to pipe_sink_max_client_number. -# pipe_sink_selector_number=4 - -# The maximum number of clients that can be used in the sink. -# pipe_sink_max_client_number=16 -``` \ No newline at end of file diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Streaming_timecho.md b/src/UserGuide/V1.3.0-2/User-Manual/Streaming_timecho.md deleted file mode 100644 index e3c6a0b12..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Streaming_timecho.md +++ /dev/null @@ -1,854 +0,0 @@ - - -# Stream Processing - -The IoTDB stream processing framework allows users to implement customized stream processing logic, which can monitor and capture storage engine changes, transform changed data, and push transformed data outward. - -We call a data flow processing task a Pipe. A stream processing task (Pipe) contains three subtasks: - -- Source task -- Processor task -- Sink task - -The stream processing framework allows users to customize the processing logic of three subtasks using Java language and process data in a UDF-like manner. -In a Pipe, the above three subtasks are executed by three plugins respectively, and the data will be processed by these three plugins in turn: -Pipe Source is used to extract data, Pipe Processor is used to process data, Pipe Sink is used to send data, and the final data will be sent to an external system. - -**The model of the Pipe task is as follows:** - -![pipe.png](/img/1706778988482.jpg) - -Describing a data flow processing task essentially describes the properties of Pipe Source, Pipe Processor and Pipe Sink plugins. 
-Users can declaratively configure the specific attributes of the three subtasks through SQL statements, and achieve flexible data ETL capabilities by combining different attributes. - -Using the stream processing framework, a complete data link can be built to meet the needs of end-side-cloud synchronization, off-site disaster recovery, and read-write load sub-library*. - -## Custom stream processing plugin development - -### Programming development dependencies - -It is recommended to use maven to build the project and add the following dependencies in `pom.xml`. Please be careful to select the same dependency version as the IoTDB server version. - -```xml - - org.apache.iotdb - pipe-api - 1.3.1 - provided - -``` - -### Event-driven programming model - -The user programming interface design of the stream processing plugin refers to the general design concept of the event-driven programming model. Events are data abstractions in the user programming interface, and the programming interface is decoupled from the specific execution method. It only needs to focus on describing the processing method expected by the system after the event (data) reaches the system. - -In the user programming interface of the stream processing plugin, events are an abstraction of database data writing operations. The event is captured by the stand-alone stream processing engine, and is passed to the PipeSource plugin, PipeProcessor plugin, and PipeSink plugin in sequence according to the three-stage stream processing process, and triggers the execution of user logic in the three plugins in turn. - -In order to take into account the low latency of stream processing in low load scenarios on the end side and the high throughput of stream processing in high load scenarios on the end side, the stream processing engine will dynamically select processing objects in the operation logs and data files. Therefore, user programming of stream processing The interface requires users to provide processing logic for the following two types of events: operation log writing event TabletInsertionEvent and data file writing event TsFileInsertionEvent. - -#### **Operation log writing event (TabletInsertionEvent)** - -The operation log write event (TabletInsertionEvent) is a high-level data abstraction for user write requests. It provides users with the ability to manipulate the underlying data of write requests by providing a unified operation interface. - -For different database deployment methods, the underlying storage structures corresponding to operation log writing events are different. For stand-alone deployment scenarios, the operation log writing event is an encapsulation of write-ahead log (WAL) entries; for a distributed deployment scenario, the operation log writing event is an encapsulation of a single node consensus protocol operation log entry. - -For write operations generated by different write request interfaces in the database, the data structure of the request structure corresponding to the operation log write event is also different. IoTDB provides numerous writing interfaces such as InsertRecord, InsertRecords, InsertTablet, InsertTablets, etc. Each writing request uses a completely different serialization method, and the generated binary entries are also different. 
- -The existence of operation log writing events provides users with a unified view of data operations, which shields the implementation differences of the underlying data structure, greatly reduces the user's programming threshold, and improves the ease of use of the function. - -```java -/** TabletInsertionEvent is used to define the event of data insertion. */ -public interface TabletInsertionEvent extends Event { - - /** - * The consumer processes the data row by row and collects the results by RowCollector. - * - * @return {@code Iterable} a list of new TabletInsertionEvent contains the - * results collected by the RowCollector - */ - Iterable processRowByRow(BiConsumer consumer); - - /** - * The consumer processes the Tablet directly and collects the results by RowCollector. - * - * @return {@code Iterable} a list of new TabletInsertionEvent contains the - * results collected by the RowCollector - */ - Iterable processTablet(BiConsumer consumer); -} -``` - -#### **Data file writing event (TsFileInsertionEvent)** - -The data file writing event (TsFileInsertionEvent) is a high-level abstraction of the database file writing operation. It is a data collection of several operation log writing events (TabletInsertionEvent). - -The storage engine of IoTDB is LSM structured. When data is written, the writing operation will first be placed into a log-structured file, and the written data will be stored in the memory at the same time. When the memory reaches the control upper limit, the disk flushing behavior will be triggered, that is, the data in the memory will be converted into a database file, and the previously prewritten operation log will be deleted. When the data in the memory is converted into the data in the database file, it will undergo two compression processes: encoding compression and general compression. Therefore, the data in the database file takes up less space than the original data in the memory. - -In extreme network conditions, directly transmitting data files is more economical than transmitting data writing operations. It will occupy lower network bandwidth and achieve faster transmission speeds. Of course, there is no free lunch. Computing and processing data in files requires additional file I/O costs compared to directly computing and processing data in memory. However, it is precisely the existence of two structures, disk data files and memory write operations, with their own advantages and disadvantages, that gives the system the opportunity to make dynamic trade-offs and adjustments. It is based on this observation that data files are introduced into the plugin's event model. Write event. - -To sum up, the data file writing event appears in the event stream of the stream processing plugin, and there are two situations: - -(1) Historical data extraction: Before a stream processing task starts, all written data that has been placed on the disk will exist in the form of TsFile. After a stream processing task starts, when collecting historical data, the historical data will be abstracted using TsFileInsertionEvent; - -(2) Real-time data extraction: When a stream processing task is in progress, when the real-time processing speed of operation log write events in the data stream is slower than the write request speed, after a certain progress, the operation log write events that cannot be processed in the future will be persisted. to disk and exists in the form of TsFile. 
After this data is extracted by the stream processing engine, TsFileInsertionEvent will be used as an abstraction. - -```java -/** - * TsFileInsertionEvent is used to define the event of writing TsFile. Event data stores in disks, - * which is compressed and encoded, and requires IO cost for computational processing. - */ -public interface TsFileInsertionEvent extends Event { - - /** - * The method is used to convert the TsFileInsertionEvent into several TabletInsertionEvents. - * - * @return {@code Iterable} the list of TabletInsertionEvent - */ - Iterable toTabletInsertionEvents(); -} -``` - -### Custom stream processing plugin programming interface definition - -Based on the custom stream processing plugin programming interface, users can easily write data extraction plugins, data processing plugins and data sending plugins, so that the stream processing function can be flexibly adapted to various industrial scenarios. - -#### Data extraction plugin interface - -Data extraction is the first stage of the three stages of stream processing data from data extraction to data sending. The data extraction plugin (PipeSource) is the bridge between the stream processing engine and the storage engine. It monitors the behavior of the storage engine, -Capture various data write events. - -```java -/** - * PipeSource - * - *

PipeSource is responsible for capturing events from sources. - * - *

Various data sources can be supported by implementing different PipeSource classes. - * - *

The lifecycle of a PipeSource is as follows: - * - *

    - *
  • When a collaboration task is created, the KV pairs of `WITH SOURCE` clause in SQL are - * parsed and the validation method {@link PipeSource#validate(PipeParameterValidator)} will - * be called to validate the parameters. - *
  • Before the collaboration task starts, the method {@link - * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} will be called to - * configure the runtime behavior of the PipeSource. - *
  • Then the method {@link PipeSource#start()} will be called to start the PipeSource. - *
  • While the collaboration task is in progress, the method {@link PipeSource#supply()} will be - * called to capture events from sources and then the events will be passed to the - * PipeProcessor. - *
  • The method {@link PipeSource#close()} will be called when the collaboration task is - * cancelled (the `DROP PIPE` command is executed). - *
- */ -public interface PipeSource extends PipePlugin { - - /** - * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link - * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. - * - * @param validator the validator used to validate {@link PipeParameters} - * @throws Exception if any parameter is not valid - */ - void validate(PipeParameterValidator validator) throws Exception; - - /** - * This method is mainly used to customize PipeSource. In this method, the user can do the - * following things: - * - *
    - *
  • Use PipeParameters to parse key-value pair attributes entered by the user. - *
  • Set the running configurations in PipeSourceRuntimeConfiguration. - *
- * - *

This method is called after the method {@link PipeSource#validate(PipeParameterValidator)} - * is called. - * - * @param parameters used to parse the input parameters entered by the user - * @param configuration used to set the required properties of the running PipeSource - * @throws Exception the user can throw errors if necessary - */ - void customize(PipeParameters parameters, PipeSourceRuntimeConfiguration configuration) - throws Exception; - - /** - * Start the Source. After this method is called, events should be ready to be supplied by - * {@link PipeSource#supply()}. This method is called after {@link - * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. - * - * @throws Exception the user can throw errors if necessary - */ - void start() throws Exception; - - /** - * Supply single event from the Source and the caller will send the event to the processor. - * This method is called after {@link PipeSource#start()} is called. - * - * @return the event to be supplied. the event may be null if the Source has no more events at - * the moment, but the Source is still running for more events. - * @throws Exception the user can throw errors if necessary - */ - Event supply() throws Exception; -} -``` - -#### Data processing plugin interface - -Data processing is the second stage of the three stages of stream processing data from data extraction to data sending. The data processing plugin (PipeProcessor) is mainly used to filter and transform the data captured by the data extraction plugin (PipeSource). -various events. - -```java -/** - * PipeProcessor - * - *

PipeProcessor is used to filter and transform the Event formed by the PipeSource. - * - *

The lifecycle of a PipeProcessor is as follows: - * - *

    - *
  • When a collaboration task is created, the KV pairs of `WITH PROCESSOR` clause in SQL are - * parsed and the validation method {@link PipeProcessor#validate(PipeParameterValidator)} - * will be called to validate the parameters. - *
  • Before the collaboration task starts, the method {@link - * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} will be called - * to configure the runtime behavior of the PipeProcessor. - *
  • While the collaboration task is in progress: - *
      - *
    • PipeSource captures the events and wraps them into three types of Event instances. - *
    • PipeProcessor processes the events and then passes them to the PipeSink. The - * following 3 methods will be called: {@link - * PipeProcessor#process(TabletInsertionEvent, EventCollector)}, {@link - * PipeProcessor#process(TsFileInsertionEvent, EventCollector)} and {@link - * PipeProcessor#process(Event, EventCollector)}. - *
    • PipeSink serializes the events into binaries and sends them to sinks. - *
    - *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link - * PipeProcessor#close() } method will be called. - *
- */ -public interface PipeProcessor extends PipePlugin { - - /** - * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link - * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} is called. - * - * @param validator the validator used to validate {@link PipeParameters} - * @throws Exception if any parameter is not valid - */ - void validate(PipeParameterValidator validator) throws Exception; - - /** - * This method is mainly used to customize PipeProcessor. In this method, the user can do the - * following things: - * - *
    - *
  • Use PipeParameters to parse key-value pair attributes entered by the user. - *
  • Set the running configurations in PipeProcessorRuntimeConfiguration. - *
- * - *

This method is called after the method {@link - * PipeProcessor#validate(PipeParameterValidator)} is called and before the beginning of the - * events processing. - * - * @param parameters used to parse the input parameters entered by the user - * @param configuration used to set the required properties of the running PipeProcessor - * @throws Exception the user can throw errors if necessary - */ - void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) - throws Exception; - - /** - * This method is called to process the TabletInsertionEvent. - * - * @param tabletInsertionEvent TabletInsertionEvent to be processed - * @param eventCollector used to collect result events after processing - * @throws Exception the user can throw errors if necessary - */ - void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector) - throws Exception; - - /** - * This method is called to process the TsFileInsertionEvent. - * - * @param tsFileInsertionEvent TsFileInsertionEvent to be processed - * @param eventCollector used to collect result events after processing - * @throws Exception the user can throw errors if necessary - */ - default void process(TsFileInsertionEvent tsFileInsertionEvent, EventCollector eventCollector) - throws Exception { - for (final TabletInsertionEvent tabletInsertionEvent : - tsFileInsertionEvent.toTabletInsertionEvents()) { - process(tabletInsertionEvent, eventCollector); - } - } - - /** - * This method is called to process the Event. - * - * @param event Event to be processed - * @param eventCollector used to collect result events after processing - * @throws Exception the user can throw errors if necessary - */ - void process(Event event, EventCollector eventCollector) throws Exception; -} -``` - -#### Data sending plugin interface - -Data sending is the third stage of the three stages of stream processing data from data extraction to data sending. The data sending plugin (PipeSink) is mainly used to send data processed by the data processing plugin (PipeProcessor). -Various events, it serves as the network implementation layer of the stream processing framework, and the interface should allow access to multiple real-time communication protocols and multiple sinks. - -```java -/** - * PipeSink - * - *

PipeSink is responsible for sending events to sinks. - * - *

Various network protocols can be supported by implementing different PipeSink classes. - * - *

The lifecycle of a PipeSink is as follows: - * - *

    - *
  • When a collaboration task is created, the KV pairs of `WITH SINK` clause in SQL are - * parsed and the validation method {@link PipeSink#validate(PipeParameterValidator)} will be - * called to validate the parameters. - *
  • Before the collaboration task starts, the method {@link PipeSink#customize(PipeParameters, - * PipeSinkRuntimeConfiguration)} will be called to configure the runtime behavior of the - * PipeSink and the method {@link PipeSink#handshake()} will be called to create a connection - * with sink. - *
  • While the collaboration task is in progress: - *
      - *
    • PipeSource captures the events and wraps them into three types of Event instances. - *
    • PipeProcessor processes the events and then passes them to the PipeSink. - *
    • PipeSink serializes the events into binaries and sends them to sinks. The following 3 - * methods will be called: {@link PipeSink#transfer(TabletInsertionEvent)}, {@link - * PipeSink#transfer(TsFileInsertionEvent)} and {@link PipeSink#transfer(Event)}. - *
    - *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link - * PipeSink#close() } method will be called. - *
- * - *

In addition, the method {@link PipeSink#heartbeat()} will be called periodically to check - * whether the connection with sink is still alive. The method {@link PipeSink#handshake()} will be - * called to create a new connection with the sink when the method {@link PipeSink#heartbeat()} - * throws exceptions. - */ -public interface PipeSink extends PipePlugin { - - /** - * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link - * PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called. - * - * @param validator the validator used to validate {@link PipeParameters} - * @throws Exception if any parameter is not valid - */ - void validate(PipeParameterValidator validator) throws Exception; - - /** - * This method is mainly used to customize PipeSink. In this method, the user can do the following - * things: - * - *

    - *
  • Use PipeParameters to parse key-value pair attributes entered by the user. - *
  • Set the running configurations in PipeSinkRuntimeConfiguration. - *
- * - *

This method is called after the method {@link PipeSink#validate(PipeParameterValidator)} is - * called and before the method {@link PipeSink#handshake()} is called. - * - * @param parameters used to parse the input parameters entered by the user - * @param configuration used to set the required properties of the running PipeSink - * @throws Exception the user can throw errors if necessary - */ - void customize(PipeParameters parameters, PipeSinkRuntimeConfiguration configuration) - throws Exception; - - /** - * This method is used to create a connection with sink. This method will be called after the - * method {@link PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called or - * will be called when the method {@link PipeSink#heartbeat()} throws exceptions. - * - * @throws Exception if the connection is failed to be created - */ - void handshake() throws Exception; - - /** - * This method will be called periodically to check whether the connection with sink is still - * alive. - * - * @throws Exception if the connection dies - */ - void heartbeat() throws Exception; - - /** - * This method is used to transfer the TabletInsertionEvent. - * - * @param tabletInsertionEvent TabletInsertionEvent to be transferred - * @throws PipeConnectionException if the connection is broken - * @throws Exception the user can throw errors if necessary - */ - void transfer(TabletInsertionEvent tabletInsertionEvent) throws Exception; - - /** - * This method is used to transfer the TsFileInsertionEvent. - * - * @param tsFileInsertionEvent TsFileInsertionEvent to be transferred - * @throws PipeConnectionException if the connection is broken - * @throws Exception the user can throw errors if necessary - */ - default void transfer(TsFileInsertionEvent tsFileInsertionEvent) throws Exception { - try { - for (final TabletInsertionEvent tabletInsertionEvent : - tsFileInsertionEvent.toTabletInsertionEvents()) { - transfer(tabletInsertionEvent); - } - } finally { - tsFileInsertionEvent.close(); - } - } - - /** - * This method is used to transfer the generic events, including HeartbeatEvent. - * - * @param event Event to be transferred - * @throws PipeConnectionException if the connection is broken - * @throws Exception the user can throw errors if necessary - */ - void transfer(Event event) throws Exception; -} -``` - -## Custom stream processing plugin management - -In order to ensure the flexibility and ease of use of user-defined plugins in actual production, the system also needs to provide the ability to dynamically and uniformly manage plugins. -The stream processing plugin management statements introduced in this chapter provide an entry point for dynamic unified management of plugins. - -### Load plugin statement - -In IoTDB, if you want to dynamically load a user-defined plugin in the system, you first need to implement a specific plugin class based on PipeSource, PipeProcessor or PipeSink. -Then the plugin class needs to be compiled and packaged into a jar executable file, and finally the plugin is loaded into IoTDB using the management statement for loading the plugin. - -The syntax of the management statement for loading the plugin is shown in the figure. - -```sql -CREATE PIPEPLUGIN -AS -USING -``` - -Example: If you implement a data processing plugin named edu.tsinghua.iotdb.pipe.ExampleProcessor, and the packaged jar package is pipe-plugin.jar, you want to use this plugin in the stream processing engine, and mark the plugin as example. 
There are two ways to use the plugin package, one is to upload to the URI server, and the other is to upload to the local directory of the cluster. - -Method 1: Upload to the URI server - -Preparation: To register in this way, you need to upload the JAR package to the URI server in advance and ensure that the IoTDB instance that executes the registration statement can access the URI server. For example https://example.com:8080/iotdb/pipe-plugin.jar . - -SQL: - -```sql -CREATE PIPEPLUGIN example -AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' -USING URI -``` - -Method 2: Upload the data to the local directory of the cluster - -Preparation: To register in this way, you need to place the JAR package in any path on the machine where the DataNode node is located, and we recommend that you place the JAR package in the /ext/pipe directory of the IoTDB installation path (the installation package is already in the installation package, so you do not need to create a new one). For example: iotdb-1.x.x-bin/ext/pipe/pipe-plugin.jar. **(Note: If you are using a cluster, you will need to place the JAR package under the same path as the machine where each DataNode node is located)** - -SQL: - -```sql -CREATE PIPEPLUGIN example -AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' -USING URI -``` - -### Delete plugin statement - -When the user no longer wants to use a plugin and needs to uninstall the plugin from the system, he can use the delete plugin statement as shown in the figure. - -```sql -DROP PIPEPLUGIN -``` - -### View plugin statements - -Users can also view plugins in the system on demand. View the statement of the plugin as shown in the figure. -```sql -SHOW PIPEPLUGINS -``` - -## System preset stream processing plugin - -### Pre-built Source Plugin - -#### iotdb-source - -Function: Extract historical or realtime data inside IoTDB into pipe. - - -| key | value | value range | required or optional with default | -|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|-----------------------------------| -| source | iotdb-source | String: iotdb-source | required | -| source.pattern | path prefix for filtering time series | String: any time series prefix | optional: root | -| source.history.start-time | start of synchronizing historical data event time,including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | -| source.history.end-time | end of synchronizing historical data event time,including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | -| source.forwarding-pipe-requests | Whether to forward data written by another Pipe (usually Data Sync) | Boolean: true, false | optional:true | -| start-time(V1.3.1+) | start of synchronizing all data event time,including start-time. Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | -| end-time(V1.3.1+) | end of synchronizing all data event time,including end-time. 
Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | -| source.realtime.mode | Extraction mode for real-time data | String: hybrid, stream, batch | optional:hybrid | -| source.forwarding-pipe-requests | Whether to forward data written by another Pipe (usually Data Sync) | Boolean: true, false | optional:true | - -> 🚫 **source.pattern Parameter Description** -> -> * Pattern should use backquotes to modify illegal characters or illegal path nodes, for example, if you want to filter root.\`a@b\` or root.\`123\`, you should set the pattern to root.\`a@b\` or root.\`123\`(Refer specifically to [Timing of single and double quotes and backquotes](https://iotdb.apache.org/Download/)) -> * In the underlying implementation, when pattern is detected as root (default value) or a database name, synchronization efficiency is higher, and any other format will reduce performance. -> * The path prefix does not need to form a complete path. For example, when creating a pipe with the parameter 'source.pattern'='root.aligned.1': - > - > * root.aligned.1TS - > * root.aligned.1TS.\`1\` -> * root.aligned.100TS - > - > the data will be synchronized; - > - > * root.aligned.\`1\` -> * root.aligned.\`123\` - > - > the data will not be synchronized. - -> ❗️**start-time, end-time parameter description of source** -> -> * start-time, end-time should be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00. However, version 1.3.1+ supports timeStamp format like 1706704494000. - -> ✅ **A piece of data from production to IoTDB contains two key concepts of time** -> -> * **event time:** The time when the data is actually produced (or the generation time assigned to the data by the data production system, which is the time item in the data point), also called event time. -> * **arrival time:** The time when data arrives in the IoTDB system. -> -> The out-of-order data we often refer to refers to data whose **event time** is far behind the current system time (or the maximum **event time** that has been dropped) when the data arrives. On the other hand, whether it is out-of-order data or sequential data, as long as they arrive newly in the system, their **arrival time** will increase with the order in which the data arrives at IoTDB. - -> 💎 **The work of iotdb-source can be split into two stages** -> -> 1. Historical data extraction: All data with **arrival time** < **current system time** when creating the pipe is called historical data -> 2. Realtime data extraction: All data with **arrival time** >= **current system time** when the pipe is created is called realtime data -> -> The historical data transmission phase and the realtime data transmission phase are executed serially. Only when the historical data transmission phase is completed, the realtime data transmission phase is executed.** - -> 📌 **source.realtime.mode: Data extraction mode** -> -> * log: In this mode, the task only uses the operation log for data processing and sending -> * file: In this mode, the task only uses data files for data processing and sending. -> * hybrid: This mode takes into account the characteristics of low latency but low throughput when sending data one by one in the operation log, and the characteristics of high throughput but high latency when sending in batches of data files. It can automatically operate under different write loads. Switch the appropriate data extraction method. 
First, adopt the data extraction method based on operation logs to ensure low sending delay. When a data backlog occurs, it will automatically switch to the data extraction method based on data files to ensure high sending throughput. When the backlog is eliminated, it will automatically switch back to the data extraction method based on data files. The data extraction method of the operation log avoids the problem of difficulty in balancing data sending delay or throughput using a single data extraction algorithm. - -> 🍕 **source.forwarding-pipe-requests: Whether to allow forwarding data transmitted from another pipe** -> -> * If you want to use pipe to build data synchronization of A -> B -> C, then the pipe of B -> C needs to set this parameter to true, so that the data written by A to B through the pipe in A -> B can be forwarded correctly. to C -> * If you want to use pipe to build two-way data synchronization (dual-active) of A \<-> B, then the pipes of A -> B and B -> A need to set this parameter to false, otherwise the data will be endless. inter-cluster round-robin forwarding - -### Preset processor plugin - -#### do-nothing-processor - -Function: No processing is done on the events passed in by the source. - - -| key | value | value range | required or optional with default | -|-----------|----------------------|------------------------------|-----------------------------------| -| processor | do-nothing-processor | String: do-nothing-processor | required | - -### Preset sink plugin - -#### do-nothing-sink - -Function: No processing is done on the events passed in by the processor. - -| key | value | value range | required or optional with default | -|------|-----------------|-------------------------|-----------------------------------| -| sink | do-nothing-sink | String: do-nothing-sink | required | - -## Stream processing task management - -### Create a stream processing task - -Use the `CREATE PIPE` statement to create a stream processing task. 
Taking the creation of a data synchronization stream processing task as an example, the sample SQL statement is as follows: - -```sql -CREATE PIPE -- PipeId is the name that uniquely identifies the sync task -WITH SOURCE ( - -- Default IoTDB Data Extraction Plugin - 'source' = 'iotdb-source', - -- Path prefix, only data that can match the path prefix will be extracted for subsequent processing and delivery - 'source.pattern' = 'root.timecho', - -- Whether to extract historical data - 'source.history.enable' = 'true', - -- Describes the time range of the historical data being extracted, indicating the earliest possible time - 'source.history.start-time' = '2011.12.03T10:15:30+01:00', - -- Describes the time range of the extracted historical data, indicating the latest time - 'source.history.end-time' = '2022.12.03T10:15:30+01:00', - -- Whether to extract realtime data - 'source.realtime.enable' = 'true', -) -WITH PROCESSOR ( - -- Default data processing plugin, means no processing - 'processor' = 'do-nothing-processor', -) -WITH SINK ( - -- IoTDB data sending plugin with target IoTDB - 'sink' = 'iotdb-thrift-sink', - -- Data service for one of the DataNode nodes on the target IoTDB ip - 'sink.ip' = '127.0.0.1', - -- Data service port of one of the DataNode nodes of the target IoTDB - 'sink.port' = '6667', -) -``` - -**When creating a stream processing task, you need to configure the PipeId and the parameters of the three plugin parts:** - -| Configuration | Description | Required or not | Default implementation | Default implementation description | Default implementation description | -|---------------|-----------------------------------------------------------------------------------------------------|---------------------------------|------------------------|---------------------------------------------------------------------------------------------------------------------------|------------------------------------| -| PipeId | A globally unique name that identifies a stream processing | Required | - | - | - | -| source | Pipe Source plugin, responsible for extracting stream processing data at the bottom of the database | Optional | iotdb-source | Integrate the full historical data of the database and subsequent real-time data arriving into the stream processing task | No | -| processor | Pipe Processor plugin, responsible for processing data | Optional | do-nothing-processor | Does not do any processing on the incoming data | Yes | -| sink | Pipe Sink plugin, responsible for sending data | Required | - | - | Yes | - -In the example, the iotdb-source, do-nothing-processor and iotdb-thrift-sink plugins are used to build the data flow processing task. IoTDB also has other built-in stream processing plugins, **please check the "System Preset Stream Processing plugin" section**. - -**A simplest example of the CREATE PIPE statement is as follows:** - -```sql -CREATE PIPE -- PipeId is a name that uniquely identifies the stream processing task -WITH SINK ( - -- IoTDB data sending plugin, the target is IoTDB - 'sink' = 'iotdb-thrift-sink', - --The data service IP of one of the DataNode nodes in the target IoTDB - 'sink.ip' = '127.0.0.1', - -- The data service port of one of the DataNode nodes in the target IoTDB - 'sink.port' = '6667', -) -``` - -The semantics expressed are: synchronize all historical data in this database instance and subsequent real-time data arriving to the IoTDB instance with the target 127.0.0.1:6667. - -**Notice:** - -- SOURCE and PROCESSOR are optional configurations. 
If you do not fill in the configuration parameters, the system will use the corresponding default implementation. -- SINK is a required configuration and needs to be configured declaratively in the CREATE PIPE statement -- SINK has self-reuse capability. For different stream processing tasks, if their SINKs have the same KV attributes (the keys corresponding to the values of all attributes are the same), then the system will only create one SINK instance in the end to realize the duplication of connection resources. - - - For example, there are the following declarations of two stream processing tasks, pipe1 and pipe2: - - ```sql - CREATE PIPE pipe1 - WITH SINK ( - 'sink' = 'iotdb-thrift-sink', - 'sink.ip' = 'localhost', - 'sink.port' = '9999', - ) - - CREATE PIPE pipe2 - WITH SINK ( - 'sink' = 'iotdb-thrift-sink', - 'sink.port' = '9999', - 'sink.ip' = 'localhost', - ) - ``` - -- Because their declarations of SINK are exactly the same (**even if the order of declaration of some attributes is different**), the framework will automatically reuse the SINKs they declared, and ultimately the SINKs of pipe1 and pipe2 will be the same instance. . -- When the source is the default iotdb-source, and source.forwarding-pipe-requests is the default value true, please do not build an application scenario that includes data cycle synchronization (it will cause an infinite loop): - - - IoTDB A -> IoTDB B -> IoTDB A - - IoTDB A -> IoTDB A - -### Start the stream processing task - -After the CREATE PIPE statement is successfully executed, the stream processing task-related instance will be created, but the running status of the entire stream processing task will be set to STOPPED(V1.3.0), that is, the stream processing task will not process data immediately. In version 1.3.1 and later, the status of the task will be set to RUNNING after CREATE. - -You can use the START PIPE statement to cause a stream processing task to start processing data: - -```sql -START PIPE -``` - -### Stop the stream processing task - -Use the STOP PIPE statement to stop the stream processing task from processing data: - -```sql -STOP PIPE -``` - -### Delete stream processing tasks - -Use the DROP PIPE statement to stop the stream processing task from processing data (when the stream processing task status is RUNNING), and then delete the entire stream processing task: - -```sql -DROP PIPE -``` - -Users do not need to perform a STOP operation before deleting the stream processing task. - -### Display stream processing tasks - -Use the SHOW PIPES statement to view all stream processing tasks: - -```sql -SHOW PIPES -``` - -The query results are as follows: - -```sql -+-----------+-----------------------+-------+----------+-------------+--------+----------------+ -| ID| CreationTime| State|PipeSource|PipeProcessor|PipeSink|ExceptionMessage| -+-----------+-----------------------+-------+----------+-------------+--------+----------------+ -|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| {}| -+-----------+-----------------------+-------+----------+-------------+--------+----------------+ -|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| -+-----------+-----------------------+-------+----------+-------------+--------+----------------+ -``` - -You can use `` to specify the status of a stream processing task you want to see: - -```sql -SHOW PIPE -``` - -You can also use the where clause to determine whether the Pipe Sink used by a certain \ is reused. 
- -```sql -SHOW PIPES -WHERE SINK USED BY -``` - -### Stream processing task running status migration - -A stream processing pipe will pass through various states during its managed life cycle: - -- **RUNNING:** pipe is working properly - - When a pipe is successfully created, its initial state is RUNNING.(V1.3.1+) -- **STOPPED:** The pipe is stopped. When the pipeline is in this state, there are several possibilities: - - When a pipe is successfully created, its initial state is STOPPED.(V1.3.0) - - The user manually pauses a pipe that is in normal running status, and its status will passively change from RUNNING to STOPPED. - - When an unrecoverable error occurs during the running of a pipe, its status will automatically change from RUNNING to STOPPED -- **DROPPED:** The pipe task was permanently deleted - -The following diagram shows all states and state transitions: - -![State migration diagram](/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) - -## authority management - -### Stream processing tasks - - -| Permission name | Description | -|-----------------|------------------------------------------------------------| -| USE_PIPE | Register a stream processing task. The path is irrelevant. | -| USE_PIPE | Start the stream processing task. The path is irrelevant. | -| USE_PIPE | Stop the stream processing task. The path is irrelevant. | -| USE_PIPE | Offload stream processing tasks. The path is irrelevant. | -| USE_PIPE | Query stream processing tasks. The path is irrelevant. | - -### Stream processing task plugin - - -| Permission name | Description | -|-----------------|----------------------------------------------------------------------| -| USE_PIPE | Register stream processing task plugin. The path is irrelevant. | -| USE_PIPE | Uninstall the stream processing task plugin. The path is irrelevant. | -| USE_PIPE | Query stream processing task plugin. The path is irrelevant. | - -## Configuration parameters - -In iotdb-common.properties: - -V1.3.0+: -```Properties -#################### -### Pipe Configuration -#################### - -# Uncomment the following field to configure the pipe lib directory. -# For Windows platform -# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is -# absolute. Otherwise, it is relative. -# pipe_lib_dir=ext\\pipe -# For Linux platform -# If its prefix is "/", then the path is absolute. Otherwise, it is relative. -# pipe_lib_dir=ext/pipe - -# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. -# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). -# pipe_subtask_executor_max_thread_num=5 - -# The connection timeout (in milliseconds) for the thrift client. -# pipe_connector_timeout_ms=900000 - -# The maximum number of selectors that can be used in the async connector. -# pipe_async_connector_selector_number=1 - -# The core number of clients that can be used in the async connector. -# pipe_async_connector_core_client_number=8 - -# The maximum number of clients that can be used in the async connector. -# pipe_async_connector_max_client_number=16 - -# Whether to enable receiving pipe data through air gap. -# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. -# pipe_air_gap_receiver_enabled=false - -# The port for the server to receive pipe data through air gap. 
-# pipe_air_gap_receiver_port=9780 -``` - -V1.3.1+: -```Properties -# Uncomment the following field to configure the pipe lib directory. -# For Windows platform -# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is -# absolute. Otherwise, it is relative. -# pipe_lib_dir=ext\\pipe -# For Linux platform -# If its prefix is "/", then the path is absolute. Otherwise, it is relative. -# pipe_lib_dir=ext/pipe - -# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. -# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). -# pipe_subtask_executor_max_thread_num=5 - -# The connection timeout (in milliseconds) for the thrift client. -# pipe_sink_timeout_ms=900000 - -# The maximum number of selectors that can be used in the sink. -# Recommend to set this value to less than or equal to pipe_sink_max_client_number. -# pipe_sink_selector_number=4 - -# The maximum number of clients that can be used in the sink. -# pipe_sink_max_client_number=16 - -# Whether to enable receiving pipe data through air gap. -# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. -# pipe_air_gap_receiver_enabled=false - -# The port for the server to receive pipe data through air gap. -# pipe_air_gap_receiver_port=9780 -``` diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Syntax-Rule.md b/src/UserGuide/V1.3.0-2/User-Manual/Syntax-Rule.md deleted file mode 100644 index 30ac234cc..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Syntax-Rule.md +++ /dev/null @@ -1,293 +0,0 @@ - - -# Syntax Rule - -## Literal Values - -This section describes how to write literal values in IoTDB. These include strings, numbers, timestamp values, boolean values, and NULL. - -### String Literals - -in IoTDB, **A string is a sequence of bytes or characters, enclosed within either single quote (`'`) or double quote (`"`) characters.** Examples: - -```js -'a string' -"another string" -``` - -#### Usage Scenarios - -Usages of string literals: - -- Values of `TEXT` type data in `INSERT` or `SELECT` statements - - ```sql - # insert - insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') - insert into root.ln.wf02.wt02(timestamp,hardware) values(2, '\\') - - +-----------------------------+--------------------------+ - | Time|root.ln.wf02.wt02.hardware| - +-----------------------------+--------------------------+ - |1970-01-01T08:00:00.001+08:00| v1| - +-----------------------------+--------------------------+ - |1970-01-01T08:00:00.002+08:00| \\| - +-----------------------------+--------------------------+ - - # select - select code from root.sg1.d1 where code in ('string1', 'string2'); - ``` - -- Used in`LOAD` / `REMOVE` / `SETTLE` instructions to represent file path. - - ```sql - # load - LOAD 'examplePath' - - # remove - REMOVE 'examplePath' - - # SETTLE - SETTLE 'examplePath' - ``` - -- Password fields in user management statements - - ```sql - # write_pwd is the password - CREATE USER ln_write_user 'write_pwd' - ``` - -- Full Java class names in UDF and trigger management statements - - ```sql - # Trigger example. Full java class names after 'AS' should be string literals. - CREATE TRIGGER `alert-listener-sg1d1s1` - AFTER INSERT - ON root.sg1.d1.s1 - AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' - WITH ( - 'lo' = '0', - 'hi' = '100.0' - ) - - # UDF example. Full java class names after 'AS' should be string literals. 
- CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' - ``` - -- `AS` function provided by IoTDB can assign an alias to time series selected in query. Alias can be constant(including string) or identifier. - - ```sql - select s1 as 'temperature', s2 as 'speed' from root.ln.wf01.wt01; - - # Header of dataset - +-----------------------------+-----------|-----+ - | Time|temperature|speed| - +-----------------------------+-----------|-----+ - ``` - -- The key/value of an attribute can be String Literal and identifier, more details can be found at **key-value pair** part. - - -#### How to use quotation marks in String Literals - -There are several ways to include quote characters within a string: - - - `'` inside a string quoted with `"` needs no special treatment and need not be doubled or escaped. In the same way, `"` inside a string quoted with `'` needs no special treatment. - - A `'` inside a string quoted with `'` may be written as `''`. -- A `"` inside a string quoted with `"` may be written as `""`. - -The following examples demonstrate how quoting and escaping work: - -```js -'string' // string -'"string"' // "string" -'""string""' // ""string"" -'''string' // 'string - -"string" // string -"'string'" // 'string' -"''string''" // ''string'' -"""string" // "string -``` - -### Numeric Literals - -Number literals include integer (exact-value) literals and floating-point (approximate-value) literals. - -Integers are represented as a sequence of digits. Numbers may be preceded by `-` or `+` to indicate a negative or positive value, respectively. Examples: `1`, `-1`. - -Numbers with fractional part or represented in scientific notation with a mantissa and exponent are approximate-value numbers. Examples: `.1`, `3.14`, `-2.23`, `+1.70`, `1.2E3`, `1.2E-3`, `-1.2E3`, `-1.2E-3`. - -The `INT32` and `INT64` data types are integer types and calculations are exact. - -The `FLOAT` and `DOUBLE` data types are floating-point types and calculations are approximate. - -An integer may be used in floating-point context; it is interpreted as the equivalent floating-point number. - -### Timestamp Literals - -The timestamp is the time point at which data is produced. It includes absolute timestamps and relative timestamps in IoTDB. For information about timestamp support in IoTDB, see [Data Type Doc](../Basic-Concept/Data-Type.md). - -Specially, `NOW()` represents a constant timestamp that indicates the system time at which the statement began to execute. - -### Boolean Literals - -The constants `TRUE` and `FALSE` evaluate to 1 and 0, respectively. The constant names can be written in any lettercase. - -### NULL Values - -The `NULL` value means “no data.” `NULL` can be written in any lettercase. - -## Identifier - -### Usage scenarios - -Certain objects within IoTDB, including `TRIGGER`, `FUNCTION`(UDF), `CONTINUOUS QUERY`, `SCHEMA TEMPLATE`, `USER`, `ROLE`,`Pipe`,`PipeSink`,`alias` and other object names are known as identifiers. - -### Constraints - -Below are basic constraints of identifiers, specific identifiers may have other constraints, for example, `user` should consists of more than 4 characters. - -- Permitted characters in unquoted identifiers: - - [0-9 a-z A-Z _ ] (letters, digits and underscore) - - ['\u2E80'..'\u9FFF'] (UNICODE Chinese characters) -- Identifiers may begin with a digit, unquoted identifiers can not be a real number. -- Identifiers are case sensitive. -- Key words can be used as an identifier. 
- -**You need to quote the identifier with back quote(`) in the following cases:** - -- Identifier contains special characters. -- Identifier that is a real number. - -### How to use quotations marks in quoted identifiers - -`'` and `"` can be used directly in quoted identifiers. - -` may be written as `` in quoted identifiers. See the example below: - -```sql -# create template t1't"t -create device template `t1't"t` -(temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) - -# create template t1`t -create device template `t1``t` -(temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) -``` - -### Examples - -Examples of case in which quoted identifier is used : - -- Trigger name should be quoted in cases described above : - - ```sql - # create trigger named alert.`listener-sg1d1s1 - CREATE TRIGGER `alert.``listener-sg1d1s1` - AFTER INSERT - ON root.sg1.d1.s1 - AS 'org.apache.iotdb.db.storageengine.trigger.example.AlertListener' - WITH ( - 'lo' = '0', - 'hi' = '100.0' - ) - ``` - -- UDF name should be quoted in cases described above : - - ```sql - # create a funciton named 111, 111 is a real number. - CREATE FUNCTION `111` AS 'org.apache.iotdb.udf.UDTFExample' - ``` - -- Template name should be quoted in cases described above : - - ```sql - # create a template named 111, 111 is a real number. - create device template `111` - (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) - ``` - -- User and Role name should be quoted in cases described above, blank space is not allow in User and Role name whether quoted or not : - - ```sql - # create user special`user. - CREATE USER `special``user.` 'write_pwd' - - # create role 111 - CREATE ROLE `111` - ``` - -- Continuous query name should be quoted in cases described above : - - ```sql - # create continuous query test.cq - CREATE CONTINUOUS QUERY `test.cq` - BEGIN - SELECT max_value(temperature) - INTO temperature_max - FROM root.ln.*.* - GROUP BY time(10s) - END - ``` - -- Pipe、PipeSink should be quoted in cases described above : - - ```sql - # create PipeSink test.*1 - CREATE PIPESINK `test.*1` AS IoTDB ('ip' = '输入你的IP') - - # create Pipe test.*2 - CREATE PIPE `test.*2` TO `test.*1` FROM - (select ** from root WHERE time>=yyyy-mm-dd HH:MM:SS) WITH 'SyncDelOp' = 'true' - ``` - -- `AS` function provided by IoTDB can assign an alias to time series selected in query. Alias can be constant(including string) or identifier. - - ```sql - select s1 as temperature, s2 as speed from root.ln.wf01.wt01; - - # Header of result dataset - +-----------------------------+-----------|-----+ - | Time|temperature|speed| - +-----------------------------+-----------|-----+ - ``` - -- The key/value of an attribute can be String Literal and identifier, more details can be found at **key-value pair** part. - -- Nodes except database in the path are allowed to contain the "*" symbol, when using this symbol it is required to enclose the node in backquotes, e.g., root.db.`*`, but this usage is only recommended when the path cannot avoid containing the "*" symbol. - -## KeyWords Words - -Keywords are words that have significance in SQL. Keywords can be used as an identifier. Certain keywords, such as TIME/TIMESTAMP and ROOT, are reserved and cannot use as identifiers. - -[Keywords](../Reference/Keywords.md) shows the keywords in IoTDB. 
- -## Detailed Definitions of Lexical and Grammar - -Please read the lexical and grammar description files in our code repository: - -Lexical file: `antlr/src/main/antlr4/org/apache/iotdb/db/qp/sql/IoTDBSqlLexer.g4` - -Grammer file: `antlr/src/main/antlr4/org/apache/iotdb/db/qp/sql/IoTDBSqlParser.g4` diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Tiered-Storage_timecho.md b/src/UserGuide/V1.3.0-2/User-Manual/Tiered-Storage_timecho.md deleted file mode 100644 index 20f7e7d9d..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Tiered-Storage_timecho.md +++ /dev/null @@ -1,97 +0,0 @@ - - -# Tiered Storage -## Overview - -The Tiered storage functionality allows users to define multiple layers of storage, spanning across multiple types of storage media (Memory mapped directory, SSD, rotational hard discs or cloud storage). While memory and cloud storage is usually singular, the local file system storages can consist of multiple directories joined together into one tier. Meanwhile, users can classify data based on its hot or cold nature and store data of different categories in specified "tier". Currently, IoTDB supports the classification of hot and cold data through TTL (Time to live / age) of data. When the data in one tier does not meet the TTL rules defined in the current tier, the data will be automatically migrated to the next tier. - -## Parameter Definition - -To enable tiered storage in IoTDB, you need to configure the following aspects: - -1. configure the data catalogue and divide the data catalogue into different tiers -2. configure the TTL of the data managed in each tier to distinguish between hot and cold data categories managed in different tiers. -3. configure the minimum remaining storage space ratio for each tier so that when the storage space of the tier triggers the threshold, the data of the tier will be automatically migrated to the next tier (optional). - -The specific parameter definitions and their descriptions are as follows. - -| Configuration | Default | Description | Constraint | -| ---------------------------------------- | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| dn_data_dirs | data/datanode/data | specify different storage directories and divide the storage directories into tiers | Each level of storage uses a semicolon to separate, and commas to separate within a single level; cloud (OBJECT_STORAGE) configuration can only be used as the last level of storage and the first level can't be used as cloud storage; a cloud object at most; the remote storage directory is denoted by OBJECT_STORAGE | -| default_ttl_in_ms | -1 | Define the maximum age of data for which each tier is responsible | Each level of storage is separated by a semicolon; the number of levels should match the number of levels defined by dn_data_dirs;"-1" means "unlimited". 
| -| dn_default_space_move_thresholds(V1.3.0/1) | 0.15 | Define the minimum remaining space ratio for each tier data catalogue; when the remaining space is less than this ratio, the data will be automatically migrated to the next tier; when the remaining storage space of the last tier falls below this threshold, the system will be set to READ_ONLY | Each level of storage is separated by a semicolon; the number of levels should match the number of levels defined by dn_data_dirs | -| dn_default_space_usage_thresholds(V1.3.2) | 0.85 | Define the minimum remaining space ratio for each tier data catalogue; when the remaining space is less than this ratio, the data will be automatically migrated to the next tier; when the remaining storage space of the last tier falls below this threshold, the system will be set to READ_ONLY | Each level of storage is separated by a semicolon; the number of levels should match the number of levels defined by dn_data_dirs | -| object_storage_type | AWS_S3 | Cloud Storage Type | IoTDB currently only supports AWS S3 as a remote storage type, and this parameter can't be modified | -| object_storage_bucket | iotdb_data | Name of cloud storage bucket | Bucket definition in AWS S3; no need to configure if remote storage is not used | -| object_storage_endpoiont | | endpoint of cloud storage | endpoint of AWS S3;If remote storage is not used, no configuration required | -| object_storage_access_key | | Authentication information stored in the cloud: key | AWS S3 credential key;If remote storage is not used, no configuration required | -| object_storage_access_secret | | Authentication information stored in the cloud: secret | AWS S3 credential secret;If remote storage is not used, no configuration required | -| remote_tsfile_cache_dirs | data/datanode/data/cache | Cache directory stored locally in the cloud | If remote storage is not used, no configuration required | -| remote_tsfile_cache_page_size_in_kb | 20480 |Block size of locally cached files stored in the cloud | If remote storage is not used, no configuration required | -| remote_tsfile_cache_max_disk_usage_in_mb | 51200 | Maximum Disk Occupancy Size for Cloud Storage Local Cache | If remote storage is not used, no configuration required | - -## local tiered storag configuration example - -The following is an example of a local two-level storage configuration. 
- -```JavaScript -//Required configuration items -dn_data_dirs=/data1/data;/data2/data,/data3/data; -default_ttl_in_ms=86400000;-1 -dn_default_space_move_thresholds=0.2;0.1 -``` - -In this example, two levels of storage are configured, specifically: - -| **tier** | **data path** | **data range** | **threshold for minimum remaining disk space** | -| -------- | -------------------------------------- | --------------- | ------------------------ | -| tier 1 | path 1:/data1/data | data for last 1 day | 20% | -| tier 2 | path 2:/data2/data path 2:/data3/data | data from 1 day ago | 10% | - -## remote tiered storag configuration example - -The following takes three-level storage as an example: - -```JavaScript -//Required configuration items -dn_data_dirs=/data1/data;/data2/data,/data3/data;OBJECT_STORAGE -default_ttl_in_ms=86400000;864000000;-1 -dn_default_space_move_thresholds=0.2;0.15;0.1 -object_storage_name=AWS_S3 -object_storage_bucket=iotdb -object_storage_endpoiont= -object_storage_access_key= -object_storage_access_secret= - -// Optional configuration items -remote_tsfile_cache_dirs=data/datanode/data/cache -remote_tsfile_cache_page_size_in_kb=20971520 -remote_tsfile_cache_max_disk_usage_in_mb=53687091200 -``` - -In this example, a total of three levels of storage are configured, specifically: - -| **tier** | **data path** | **data range** | **threshold for minimum remaining disk space** | -| -------- | -------------------------------------- | ---------------------------- | ------------------------ | -| tier1 | path 1:/data1/data | data for last 1 day | 20% | -| tier2 | path 1:/data2/data path 2:/data3/data | data from past 1 day to past 10 days | 15% | -| tier3 | Remote AWS S3 Storage | data from 10 days ago | 10% | diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Trigger.md b/src/UserGuide/V1.3.0-2/User-Manual/Trigger.md deleted file mode 100644 index 7c4e163fb..000000000 --- a/src/UserGuide/V1.3.0-2/User-Manual/Trigger.md +++ /dev/null @@ -1,466 +0,0 @@ - - -# TRIGGER - -## Instructions - -The trigger provides a mechanism for listening to changes in time series data. With user-defined logic, tasks such as alerting and data forwarding can be conducted. - -The trigger is implemented based on the reflection mechanism. Users can monitor data changes by implementing the Java interfaces. IoTDB allows users to dynamically register and drop triggers without restarting the server. - -The document will help you learn to define and manage triggers. - -### Pattern for Listening - -A single trigger can be used to listen for data changes in a time series that match a specific pattern. For example, a trigger can listen for the data changes of time series `root.sg.a`, or time series that match the pattern `root.sg.*`. When you register a trigger, you can specify the path pattern that the trigger listens on through an SQL statement. - -### Trigger Type - -There are currently two types of triggers, and you can specify the type through an SQL statement when registering a trigger: - -- Stateful triggers: The execution logic of this type of trigger may depend on data from multiple insertion statement . The framework will aggregate the data written by different nodes into the same trigger instance for calculation to retain context information. This type of trigger is usually used for sampling or statistical data aggregation for a period of time. information. Only one node in the cluster holds an instance of a stateful trigger. 
-- Stateless triggers: The execution logic of the trigger is only related to the current input data. The framework does not need to aggregate the data of different nodes into the same trigger instance. This type of trigger is usually used for calculation of single row data and abnormal detection. Each node in the cluster holds an instance of a stateless trigger. - -### Trigger Event - -There are currently two trigger events for the trigger, and other trigger events will be expanded in the future. When you register a trigger, you can specify the trigger event through an SQL statement: - -- BEFORE INSERT: Fires before the data is persisted. **Please note that currently the trigger does not support data cleaning and will not change the data to be persisted itself.** -- AFTER INSERT: Fires after the data is persisted. - -## How to Implement a Trigger - -You need to implement the trigger by writing a Java class, where the dependency shown below is required. If you use [Maven](http://search.maven.org/), you can search for them directly from the [Maven repository](http://search.maven.org/). - -### Dependency - -```xml - - org.apache.iotdb - iotdb-server - 1.0.0 - provided - -``` - -Note that the dependency version should be correspondent to the target server version. - -### Interface Description - -To implement a trigger, you need to implement the `org.apache.iotdb.trigger.api.Trigger` class. - -```java -import org.apache.iotdb.trigger.api.enums.FailureStrategy; -import org.apache.iotdb.tsfile.write.record.Tablet; - -public interface Trigger { - - /** - * This method is mainly used to validate {@link TriggerAttributes} before calling {@link - * Trigger#onCreate(TriggerAttributes)}. - * - * @param attributes TriggerAttributes - * @throws Exception e - */ - default void validate(TriggerAttributes attributes) throws Exception {} - - /** - * This method will be called when creating a trigger after validation. - * - * @param attributes TriggerAttributes - * @throws Exception e - */ - default void onCreate(TriggerAttributes attributes) throws Exception {} - - /** - * This method will be called when dropping a trigger. - * - * @throws Exception e - */ - default void onDrop() throws Exception {} - - /** - * When restarting a DataNode, Triggers that have been registered will be restored and this method - * will be called during the process of restoring. - * - * @throws Exception e - */ - default void restore() throws Exception {} - - /** - * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} - * is the default strategy. - * - * @return {@link FailureStrategy} - */ - default FailureStrategy getFailureStrategy() { - return FailureStrategy.OPTIMISTIC; - } - - /** - * @param tablet see {@link Tablet} for detailed information of data structure. Data that is - * inserted will be constructed as a Tablet and you can define process logic with {@link - * Tablet}. - * @return true if successfully fired - * @throws Exception e - */ - default boolean fire(Tablet tablet) throws Exception { - return true; - } -} -``` - -This class provides two types of programming interfaces: **Lifecycle related interfaces** and **data change listening related interfaces**. All the interfaces in this class are not required to be implemented. When the interfaces are not implemented, the trigger will not respond to the data changes. You can implement only some of these interfaces according to your needs. - -Descriptions of the interfaces are as followed. 
- -#### Lifecycle Related Interfaces - -| Interface | Description | -| ------------------------------------------------------------ | ------------------------------------------------------------ | -| *default void validate(TriggerAttributes attributes) throws Exception {}* | When you creates a trigger using the `CREATE TRIGGER` statement, you can specify the parameters that the trigger needs to use, and this interface will be used to verify the correctness of the parameters。 | -| *default void onCreate(TriggerAttributes attributes) throws Exception {}* | This interface is called once when you create a trigger using the `CREATE TRIGGER` statement. During the lifetime of each trigger instance, this interface will be called only once. This interface is mainly used for the following functions: helping users to parse custom attributes in SQL statements (using `TriggerAttributes`). You can create or apply for resources, such as establishing external links, opening files, etc. | -| *default void onDrop() throws Exception {}* | This interface is called when you drop a trigger using the `DROP TRIGGER` statement. During the lifetime of each trigger instance, this interface will be called only once. This interface mainly has the following functions: it can perform the operation of resource release and can be used to persist the results of trigger calculations. | -| *default void restore() throws Exception {}* | When the DataNode is restarted, the cluster will restore the trigger instance registered on the DataNode, and this interface will be called once for stateful trigger during the process. After the DataNode where the stateful trigger instance is located goes down, the cluster will restore the trigger instance on another available DataNode, calling this interface once in the process. This interface can be used to customize recovery logic. | - -#### Data Change Listening Related Interfaces - -##### Listening Interface - -```java -/** - * @param tablet see {@link Tablet} for detailed information of data structure. Data that is - * inserted will be constructed as a Tablet and you can define process logic with {@link - * Tablet}. - * @return true if successfully fired - * @throws Exception e - */ - default boolean fire(Tablet tablet) throws Exception { - return true; - } -``` - -When the data changes, the trigger uses the Tablet as the unit of firing operation. You can obtain the metadata and data of the corresponding sequence through Tablet, and then perform the corresponding trigger operation. If the fire process is successful, the return value should be true. If the interface returns false or throws an exception, we consider the trigger fire process as failed. When the trigger fire process fails, we will perform corresponding operations according to the listening strategy interface. - -When performing an INSERT operation, for each time series in it, we will detect whether there is a trigger that listens to the path pattern, and then assemble the time series data that matches the path pattern listened by the same trigger into a new Tablet for trigger fire interface. 
Can be understood as: - -```java -Map> pathToTriggerListMap => Map -``` - -**Note that currently we do not make any guarantees about the order in which triggers fire.** - -Here is an example: - -Suppose there are three triggers, and the trigger event of the triggers are all BEFORE INSERT: - -- Trigger1 listens on `root.sg.*` -- Trigger2 listens on `root.sg.a` -- Trigger3 listens on `root.sg.b` - -Insertion statement: - -```sql -insert into root.sg(time, a, b) values (1, 1, 1); -``` - -The time series `root.sg.a` matches Trigger1 and Trigger2, and the sequence `root.sg.b` matches Trigger1 and Trigger3, then: - -- The data of `root.sg.a` and `root.sg.b` will be assembled into a new tablet1, and Trigger1.fire(tablet1) will be executed at the corresponding Trigger Event. -- The data of `root.sg.a` will be assembled into a new tablet2, and Trigger2.fire(tablet2) will be executed at the corresponding Trigger Event. -- The data of `root.sg.b` will be assembled into a new tablet3, and Trigger3.fire(tablet3) will be executed at the corresponding Trigger Event. - -##### Listening Strategy Interface - -When the trigger fails to fire, we will take corresponding actions according to the strategy set by the listening strategy interface. You can set `org.apache.iotdb.trigger.api.enums.FailureStrategy`. There are currently two strategies, optimistic and pessimistic: - -- Optimistic strategy: The trigger that fails to fire does not affect the firing of subsequent triggers, nor does it affect the writing process, that is, we do not perform additional processing on the sequence involved in the trigger failure, only log the failure to record the failure, and finally inform user that data insertion is successful, but the trigger fire part failed. -- Pessimistic strategy: The failure trigger affects the processing of all subsequent Pipelines, that is, we believe that the firing failure of the trigger will cause all subsequent triggering processes to no longer be carried out. If the trigger event of the trigger is BEFORE INSERT, then the insertion will no longer be performed, and the insertion failure will be returned directly. - -```java - /** - * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} - * is the default strategy. - * - * @return {@link FailureStrategy} - */ - default FailureStrategy getFailureStrategy() { - return FailureStrategy.OPTIMISTIC; - } -``` - -### Example - -If you use [Maven](http://search.maven.org/), you can refer to our sample project **trigger-example**. - -You can find it [here](https://github.com/apache/iotdb/tree/master/example/trigger). - -Here is the code from one of the sample projects: - -```java -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- */ - -package org.apache.iotdb.trigger; - -import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerConfiguration; -import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerEvent; -import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerHandler; -import org.apache.iotdb.trigger.api.Trigger; -import org.apache.iotdb.trigger.api.TriggerAttributes; -import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; -import org.apache.iotdb.tsfile.write.record.Tablet; -import org.apache.iotdb.tsfile.write.schema.MeasurementSchema; - -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.io.IOException; -import java.util.HashMap; -import java.util.List; - -public class ClusterAlertingExample implements Trigger { - private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class); - - private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler(); - - private final AlertManagerConfiguration alertManagerConfiguration = - new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts"); - - private String alertname; - - private final HashMap labels = new HashMap<>(); - - private final HashMap annotations = new HashMap<>(); - - @Override - public void onCreate(TriggerAttributes attributes) throws Exception { - alertname = "alert_test"; - - labels.put("series", "root.ln.wf01.wt01.temperature"); - labels.put("value", ""); - labels.put("severity", ""); - - annotations.put("summary", "high temperature"); - annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}"); - - alertManagerHandler.open(alertManagerConfiguration); - } - - @Override - public void onDrop() throws IOException { - alertManagerHandler.close(); - } - - @Override - public boolean fire(Tablet tablet) throws Exception { - List measurementSchemaList = tablet.getSchemas(); - for (int i = 0, n = measurementSchemaList.size(); i < n; i++) { - if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) { - // for example, we only deal with the columns of Double type - double[] values = (double[]) tablet.values[i]; - for (double value : values) { - if (value > 100.0) { - LOGGER.info("trigger value > 100"); - labels.put("value", String.valueOf(value)); - labels.put("severity", "critical"); - AlertManagerEvent alertManagerEvent = - new AlertManagerEvent(alertname, labels, annotations); - alertManagerHandler.onEvent(alertManagerEvent); - } else if (value > 50.0) { - LOGGER.info("trigger value > 50"); - labels.put("value", String.valueOf(value)); - labels.put("severity", "warning"); - AlertManagerEvent alertManagerEvent = - new AlertManagerEvent(alertname, labels, annotations); - alertManagerHandler.onEvent(alertManagerEvent); - } - } - } - } - return true; - } -} -``` - -## Trigger Management - -You can create and drop a trigger through an SQL statement, and you can also query all registered triggers through an SQL statement. - -**We recommend that you stop insertion while creating triggers.** - -### Create Trigger - -Triggers can be registered on arbitrary path patterns. The time series registered with the trigger will be listened to by the trigger. When there is data change on the series, the corresponding fire method in the trigger will be called. - -Registering a trigger can be done as follows: - -1. Implement a Trigger class as described in the How to implement a Trigger chapter, assuming the class's full class name is `org.apache.iotdb.trigger.ClusterAlertingExample` -2. 
Package the project into a JAR package. -3. Register the trigger with an SQL statement. During the creation process, the `validate` and `onCreate` interfaces of the trigger will only be called once. For details, please refer to the chapter of How to implement a Trigger. - -The complete SQL syntax is as follows: - -```sql -// Create Trigger -createTrigger - : CREATE triggerType TRIGGER triggerName=identifier triggerEventClause ON pathPattern AS className=STRING_LITERAL uriClause? triggerAttributeClause? - ; - -triggerType - : STATELESS | STATEFUL - ; - -triggerEventClause - : (BEFORE | AFTER) INSERT - ; - -uriClause - : USING URI uri - ; - -uri - : STRING_LITERAL - ; - -triggerAttributeClause - : WITH LR_BRACKET triggerAttribute (COMMA triggerAttribute)* RR_BRACKET - ; - -triggerAttribute - : key=attributeKey operator_eq value=attributeValue - ; -``` - -Below is the explanation for the SQL syntax: - -- triggerName: The trigger ID, which is globally unique and used to distinguish different triggers, is case-sensitive. -- triggerType: Trigger types are divided into two categories, STATELESS and STATEFUL. -- triggerEventClause: when the trigger fires, BEFORE INSERT and AFTER INSERT are supported now. -- pathPattern:The path pattern the trigger listens on, can contain wildcards * and **. -- className:The class name of the Trigger class. -- jarLocation: Optional. When this option is not specified, by default, we consider that the DBA has placed the JAR package required to create the trigger in the trigger_root_dir directory (configuration item, default is IOTDB_HOME/ext/trigger) of each DataNode node. When this option is specified, we will download and distribute the file resource corresponding to the URI to the trigger_root_dir/install directory of each DataNode. -- triggerAttributeClause: It is used to specify the parameters that need to be set when the trigger instance is created. This part is optional in the SQL syntax. - -Here is an example SQL statement to help you understand: - -```sql -CREATE STATELESS TRIGGER triggerTest -BEFORE INSERT -ON root.sg.** -AS 'org.apache.iotdb.trigger.ClusterAlertingExample' -USING URI '/jar/ClusterAlertingExample.jar' -WITH ( - "name" = "trigger", - "limit" = "100" -) -``` - -The above SQL statement creates a trigger named triggerTest: - -- The trigger is stateless. -- Fires before insertion. -- Listens on path pattern root.sg.** -- The implemented trigger class is named `org.apache.iotdb.trigger.ClusterAlertingExample` -- The JAR package URI is http://jar/ClusterAlertingExample.jar -- When creating the trigger instance, two parameters, name and limit, are passed in. - -### Drop Trigger - -The trigger can be dropped by specifying the trigger ID. During the process of dropping the trigger, the `onDrop` interface of the trigger will be called only once. - -The SQL syntax is: - -```sql -// Drop Trigger -dropTrigger - : DROP TRIGGER triggerName=identifier -; -``` - -Here is an example statement: - -```sql -DROP TRIGGER triggerTest1 -``` - -The above statement will drop the trigger with ID triggerTest1. - -### Show Trigger - -You can query information about triggers that exist in the cluster through an SQL statement. 
The SQL syntax is as follows:

```sql
SHOW TRIGGERS
```

The result set format of this statement is as follows:

| TriggerName  | Event                        | Type                 | State                                       | PathPattern | ClassName                                | NodeId                                  |
| ------------ | ---------------------------- | -------------------- | ------------------------------------------- | ----------- | ---------------------------------------- | --------------------------------------- |
| triggerTest1 | BEFORE_INSERT / AFTER_INSERT | STATELESS / STATEFUL | INACTIVE / ACTIVE / DROPPING / TRANSFERRING | root.**     | org.apache.iotdb.trigger.TriggerExample  | ALL(STATELESS) / DATA_NODE_ID(STATEFUL) |

### Trigger State

During the process of creating and dropping triggers in the cluster, we maintain the states of the triggers. The following is a description of these states:

| State        | Description                                                  | Is it recommended to insert data? |
| ------------ | ------------------------------------------------------------ | --------------------------------- |
| INACTIVE     | The intermediate state of executing `CREATE TRIGGER`: the cluster has just recorded the trigger information on the ConfigNode, and the trigger has not been activated on any DataNode. | NO |
| ACTIVE       | The state after successful execution of `CREATE TRIGGER`: the trigger is available on all DataNodes in the cluster. | YES |
| DROPPING     | The intermediate state of executing `DROP TRIGGER`: the cluster is in the process of dropping the trigger. | NO |
| TRANSFERRING | The cluster is migrating the location of this trigger instance. | NO |

## Notes

- The trigger takes effect from the time of registration and does not process existing historical data. **That is, only insertion requests that occur after the trigger is successfully registered will be listened to by the trigger.**
- The fire process of the trigger is currently synchronous, so you need to ensure the efficiency of the trigger, otherwise the writing performance may be greatly affected. **You need to guarantee the concurrency safety of triggers yourself.**
- Please do not register too many triggers in the cluster, because the trigger information is fully stored on the ConfigNode and a copy of the information is kept on every DataNode.
- **It is recommended to stop writing when registering triggers.** Registering a trigger is not an atomic operation. When registering a trigger, there will be an intermediate state in which some nodes in the cluster have registered the trigger and some nodes have not yet registered it successfully. To avoid write requests being listened to by triggers on some nodes but not on others, we recommend not performing writes while registering triggers.
- When the node holding a stateful trigger instance goes down, we will try to restore the corresponding instance on another node. During the recovery process, the restore interface of the trigger class will be called once.
- The trigger JAR package has a size limit: it must be less than min(`config_node_ratis_log_appender_buffer_size_max`, 2G), where `config_node_ratis_log_appender_buffer_size_max` is a configuration item. For its specific meaning, please refer to the IoTDB configuration item description.
- **It is better not to have classes with the same full class name but different function implementations in different JAR packages.** For example, trigger1 and trigger2 correspond to resources trigger1.jar and trigger2.jar respectively.
If two JAR packages contain an `org.apache.iotdb.trigger.example.AlertListener` class, when `CREATE TRIGGER` uses this class the system will randomly load the class from one of the JAR packages, which will eventually lead to inconsistent trigger behavior and other issues.

## Configuration Parameters

| Parameter                                          | Meaning                                                      |
| -------------------------------------------------- | ------------------------------------------------------------ |
| *trigger_lib_dir*                                  | Directory to save the trigger jar package                    |
| *stateful\_trigger\_retry\_num\_when\_not\_found*  | How many times we will retry to find an instance of a stateful trigger on DataNodes if it is not found |

diff --git a/src/UserGuide/V1.3.0-2/User-Manual/User-defined-function.md b/src/UserGuide/V1.3.0-2/User-Manual/User-defined-function.md
deleted file mode 100644
index 4f03857dd..000000000
--- a/src/UserGuide/V1.3.0-2/User-Manual/User-defined-function.md
+++ /dev/null
@@ -1,214 +0,0 @@
# USER-DEFINED FUNCTION (UDF)

## 1. UDF Introduction

UDF (User Defined Function) refers to user-defined functions. IoTDB provides a variety of built-in time series processing functions and also supports extending custom functions to meet more computing needs.

In IoTDB, you can expand two types of UDF:
| UDF Class | AccessStrategy | Description |
| --------- | -------------- | ----------- |
| UDTF      | MAPPABLE_ROW_BY_ROW | Custom scalar function: input k columns of time series and 1 row of data, output 1 column of time series and 1 row of data; can be used in any clause and expression where a scalar function can appear, such as the select clause, where clause, etc. |
| UDTF      | ROW_BY_ROW / SLIDING_TIME_WINDOW / SLIDING_SIZE_WINDOW / SESSION_TIME_WINDOW / STATE_WINDOW | Custom time series generation function: input k columns of time series and m rows of data, output 1 column of time series and n rows of data; the number of input rows m can be different from the number of output rows n; can only be used in SELECT clauses. |
| UDAF      | -              | Custom aggregation function: input k columns of time series and m rows of data, output 1 column of time series and 1 row of data; can be used in any clause and expression where an aggregation function can appear, such as the select clause, having clause, etc. |
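As a concrete illustration of the UDTF rows in the table above, the following is a minimal sketch of a row-by-row UDTF, based on the UDF API referenced by the Development Guide (packages under `org.apache.iotdb.udf.api`). The class name `NegateExampleUDTF` and the choice of negating INT32 inputs are illustrative only, and the exact package paths may differ between IoTDB versions.

```java
package org.apache.iotdb.udf;

import org.apache.iotdb.udf.api.UDTF;
import org.apache.iotdb.udf.api.access.Row;
import org.apache.iotdb.udf.api.collector.PointCollector;
import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
import org.apache.iotdb.udf.api.type.Type;

/** Illustrative UDTF (hypothetical name): emits the negation of each INT32 input point. */
public class NegateExampleUDTF implements UDTF {

  @Override
  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
    // Consume the input row by row and declare INT32 as the output data type.
    configurations.setAccessStrategy(new RowByRowAccessStrategy()).setOutputDataType(Type.INT32);
  }

  @Override
  public void transform(Row row, PointCollector collector) throws Exception {
    // One output point per input row, keeping the original timestamp.
    collector.putInt(row.getTime(), -row.getInt(0));
  }
}
```

Once packaged into a JAR and registered with `CREATE FUNCTION` (see section 3.1 below), such a function can be called in a SELECT clause like any built-in function.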
- -### 1.1 UDF usage - -The usage of UDF is similar to that of regular built-in functions, and can be directly used in SELECT statements like calling regular functions. - -#### 1.Basic SQL syntax support - -* Support `SLIMIT` / `SOFFSET` -* Support `LIMIT` / `OFFSET` -* Support queries with value filters -* Support queries with time filters - - -#### 2. Queries with * in SELECT Clauses - -Assume that there are 2 time series (`root.sg.d1.s1` and `root.sg.d1.s2`) in the system. - -* **`SELECT example(*) from root.sg.d1`** - -Then the result set will include the results of `example (root.sg.d1.s1)` and `example (root.sg.d1.s2)`. - -* **`SELECT example(s1, *) from root.sg.d1`** - -Then the result set will include the results of `example(root.sg.d1.s1, root.sg.d1.s1)` and `example(root.sg.d1.s1, root.sg.d1.s2)`. - -* **`SELECT example(*, *) from root.sg.d1`** - -Then the result set will include the results of `example(root.sg.d1.s1, root.sg.d1.s1)`, `example(root.sg.d1.s2, root.sg.d1.s1)`, `example(root.sg.d1.s1, root.sg.d1.s2)` and `example(root.sg.d1.s2, root.sg.d1.s2)`. - -#### 3. Queries with Key-value Attributes in UDF Parameters - -You can pass any number of key-value pair parameters to the UDF when constructing a UDF query. The key and value in the key-value pair need to be enclosed in single or double quotes. Note that key-value pair parameters can only be passed in after all time series have been passed in. Here is a set of examples: - - Example: -``` sql -SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1; -SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1; -``` - -#### 4. Nested Queries - - Example: -``` sql -SELECT s1, s2, example(s1, s2) FROM root.sg.d1; -SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN; -SELECT s1 * example(* / s1 + s2) FROM root.sg.d1; -SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1; -``` - -## 2. UDF Development - -You can refer to UDF development:[Development Guide](../Reference/UDF-development.md) - -## 3. UDF management - -### 3.1 UDF Registration - -The process of registering a UDF in IoTDB is as follows: - -1. Implement a complete UDF class, assuming the full class name of this class is `org.apache.iotdb.udf.ExampleUDTF`. -2. Convert the project into a JAR package. If using Maven to manage the project, you can refer to the [Maven project example](https://github.com/apache/iotdb/tree/master/example/udf) above. -3. Make preparations for registration according to the registration mode. For details, see the following example. -4. You can use following SQL to register UDF. - -```sql -CREATE FUNCTION AS (USING URI URI-STRING) -``` - -#### Example: register UDF named `example`, you can choose either of the following two registration methods - -#### Method 1: Manually place the jar package - -Prepare: -When registering using this method, it is necessary to place the JAR package in advance in the `ext/udf` directory of all nodes in the cluster (which can be configured). - -Registration statement: - -```sql -CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' -``` - -#### Method 2: Cluster automatically installs jar packages through URI - -Prepare: -When registering using this method, it is necessary to upload the JAR package to the URI server in advance and ensure that the IoTDB instance executing the registration statement can access the URI server. 
- -Registration statement: - -```sql -CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' USING URI 'http://jar/example.jar' -``` - -IoTDB will download JAR packages and synchronize them to the entire cluster. - -#### Note - -1. Since UDF instances are dynamically loaded through reflection technology, you do not need to restart the server during the UDF registration process. - -2. UDF function names are not case-sensitive. - -3. Please ensure that the function name given to the UDF is different from all built-in function names. A UDF with the same name as a built-in function cannot be registered. - -4. We recommend that you do not use classes that have the same class name but different function logic in different JAR packages. For example, in `UDF(UDAF/UDTF): udf1, udf2`, the JAR package of udf1 is `udf1.jar` and the JAR package of udf2 is `udf2.jar`. Assume that both JAR packages contain the `org.apache.iotdb.udf.ExampleUDTF` class. If you use two UDFs in the same SQL statement at the same time, the system will randomly load either of them and may cause inconsistency in UDF execution behavior. - -### 3.2 UDF Deregistration - -The SQL syntax is as follows: - -```sql -DROP FUNCTION -``` - -Example: Uninstall the UDF from the above example: - -```sql -DROP FUNCTION example -``` - - - -### 3.3 Show All Registered UDFs - -``` sql -SHOW FUNCTIONS -``` - -### 3.4 UDF configuration - -- UDF configuration allows configuring the storage directory of UDF in `iotdb-common.properties` - ``` Properties -# UDF lib dir - -udf_lib_dir=ext/udf -``` - -- -When using custom functions, there is a message indicating insufficient memory. Change the following configuration parameters in `iotdb-common.properties` and restart the service. - - ``` Properties - -# Used to estimate the memory usage of text fields in a UDF query. -# It is recommended to set this value to be slightly larger than the average length of all text -# effectiveMode: restart -# Datatype: int -udf_initial_byte_array_length_for_memory_control=48 - -# How much memory may be used in ONE UDF query (in MB). -# The upper limit is 20% of allocated memory for read. -# effectiveMode: restart -# Datatype: float -udf_memory_budget_in_mb=30.0 - -# UDF memory allocation ratio. -# The parameter form is a:b:c, where a, b, and c are integers. -# effectiveMode: restart -udf_reader_transformer_collector_memory_proportion=1:1:1 -``` - -### 3.5 UDF User Permissions - - -When users use UDF, they will be involved in the `USE_UDF` permission, and only users with this permission are allowed to perform UDF registration, uninstallation, and query operations. - -For more user permissions related content, please refer to [Account Management Statements](./Authority-Management.md). - - -## 4. UDF Libraries - -Based on the ability of user-defined functions, IoTDB provides a series of functions for temporal data processing, including data quality, data profiling, anomaly detection, frequency domain analysis, data matching, data repairing, sequence discovery, machine learning, etc., which can meet the needs of industrial fields for temporal data processing. - -You can refer to the [UDF Libraries](../Reference/UDF-Libraries.md)document to find the installation steps and registration statements for each function, to ensure that all required functions are registered correctly. - - -## 5. Common problem: - -Q1: How to modify the registered UDF? 
-
-A1: Assume that the name of the UDF is `example` and the full class name is `org.apache.iotdb.udf.ExampleUDTF`, which is packaged in `example.jar`.
-
-1. Unload the registered function by executing `DROP FUNCTION example`.
-2. Delete `example.jar` under `iotdb-server-1.0.0-all-bin/ext/udf`.
-3. Modify the logic in `org.apache.iotdb.udf.ExampleUDTF` and repackage it. The name of the JAR package can still be `example.jar`.
-4. Upload the new JAR package to `iotdb-server-1.0.0-all-bin/ext/udf`.
-5. Load the new UDF by executing `CREATE FUNCTION example AS "org.apache.iotdb.udf.ExampleUDTF"`.
-
diff --git a/src/UserGuide/V1.3.0-2/User-Manual/Write-Delete-Data.md b/src/UserGuide/V1.3.0-2/User-Manual/Write-Delete-Data.md
deleted file mode 100644
index 44e341fbd..000000000
--- a/src/UserGuide/V1.3.0-2/User-Manual/Write-Delete-Data.md
+++ /dev/null
@@ -1,280 +0,0 @@
-
-
-# Write & Delete Data
-## CLI INSERT
-
-IoTDB provides users with a variety of ways to insert real-time data, such as directly inputting an [INSERT SQL statement](../SQL-Manual/SQL-Manual.md#insert-data) in the [Client/Shell tools](../Tools-System/CLI.md), or using [Java JDBC](../API/Programming-JDBC.md) to perform single or batch execution of [INSERT SQL statements](../SQL-Manual/SQL-Manual.md).
-
-NOTE: This section mainly introduces the use of the [INSERT SQL statement](../SQL-Manual/SQL-Manual.md#insert-data) for real-time data import.
-
-Writing data with a repeated timestamp overwrites the original data at that timestamp, which can be regarded as an update.
-
-### Use of INSERT Statements
-
-The [INSERT SQL statement](../SQL-Manual/SQL-Manual.md#insert-data) is used to insert data into one or more previously created timeseries. Each inserted data point consists of a [timestamp](../Basic-Concept/Data-Model-and-Terminology.md) and a sensor acquisition value (see [Data Type](../Basic-Concept/Data-Type.md)).
-
-**Schema-less writing**: When metadata is not defined, data can be written directly through an insert statement; the required metadata is automatically recognized and registered in the database, achieving automatic modeling.
-
-This section takes two timeseries, `root.ln.wf02.wt02.status` and `root.ln.wf02.wt02.hardware`, as an example; their data types are BOOLEAN and TEXT, respectively.
-
-The sample code for single-column data insertion is as follows:
-
-```sql
-IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true)
-IoTDB > insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1')
-```
-
-The example above inserts the long integer timestamp 1 and the value `true` into the timeseries `root.ln.wf02.wt02.status`, and the long integer timestamp 1 and the value `'v1'` into the timeseries `root.ln.wf02.wt02.hardware`. When execution succeeds, the elapsed time is shown, indicating that the data insertion has completed.
-
-> Note: In IoTDB, TEXT type data can be enclosed in either single or double quotation marks. The insertion statements in this section use single quotation marks for TEXT type data.
-
-The INSERT statement also supports inserting multi-column data at the same time point. The sample code for inserting the values of the two timeseries at time point 2 is as follows:
-
-```sql
-IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (2, false, 'v2')
-```
-
-In addition, the INSERT statement supports inserting multiple rows at once. The sample code for inserting two rows is as follows:
-
-```sql
-IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (3, false, 'v3'),(4, true, 'v4')
-```
-
-After inserting the data, we can query it with a simple SELECT statement:
-
-```sql
-IoTDB > select * from root.ln.wf02.wt02 where time < 5
-```
-
-The result is shown below. It confirms that both the single-column and multi-column insertion statements were executed correctly.
-
-```
-+-----------------------------+--------------------------+------------------------+
-|                         Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|
-+-----------------------------+--------------------------+------------------------+
-|1970-01-01T08:00:00.001+08:00|                        v1|                    true|
-|1970-01-01T08:00:00.002+08:00|                        v2|                   false|
-|1970-01-01T08:00:00.003+08:00|                        v3|                   false|
-|1970-01-01T08:00:00.004+08:00|                        v4|                    true|
-+-----------------------------+--------------------------+------------------------+
-Total line number = 4
-It costs 0.004s
-```
-
-In addition, we can omit the timestamp column, in which case the system uses the current system time as the timestamp of the data point. The sample code is as follows:
-
-```sql
-IoTDB > insert into root.ln.wf02.wt02(status, hardware) values (false, 'v2')
-```
-
-**Note:** Timestamps must be specified when inserting multiple rows of data in a single SQL statement.
-
-### Insert Data Into Aligned Timeseries
-
-To insert data into a group of aligned timeseries, we only need to add the `ALIGNED` keyword to the SQL statement; everything else is the same.
-
-The sample code is as follows:
-
-```sql
-IoTDB > create aligned timeseries root.sg1.d1(s1 INT32, s2 DOUBLE)
-IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)
-IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(2, 2, 2), (3, 3, 3)
-IoTDB > select * from root.sg1.d1
-```
-
-The result is shown below. It confirms that the insertion statements were executed correctly.
-
-```
-+-----------------------------+--------------+--------------+
-|                         Time|root.sg1.d1.s1|root.sg1.d1.s2|
-+-----------------------------+--------------+--------------+
-|1970-01-01T08:00:00.001+08:00|             1|           1.0|
-|1970-01-01T08:00:00.002+08:00|             2|           2.0|
-|1970-01-01T08:00:00.003+08:00|             3|           3.0|
-+-----------------------------+--------------+--------------+
-Total line number = 3
-It costs 0.004s
-```
-
-## NATIVE API WRITE
-
-The native API (Session) is the most widely used set of IoTDB APIs. It comprises multiple interfaces adapted to different data collection scenarios, offers high performance, and supports multiple languages.
-
-### Multi-language API write
-
-#### Java
-
-Before writing via the Java API, you need to establish a connection; refer to [Java Native API](../API/Programming-Java-Native-API.md), then to the [Java Data Manipulation Interface (DML)](../API/Programming-Java-Native-API.md#insert) for the write methods. An illustrative sketch is shown below.
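-
-The following is a minimal, illustrative sketch of a single-record write through the Session API. It assumes the `org.apache.iotdb:iotdb-session` dependency is on the classpath and uses placeholder connection parameters; exact package names and method overloads may differ between client versions, so check the pages linked above.
-
-```java
-import java.util.Arrays;
-import java.util.List;
-
-import org.apache.iotdb.session.Session;
-import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
-
-public class SessionWriteExample {
-    public static void main(String[] args) throws Exception {
-        // Placeholder connection parameters; adjust host, port and credentials to your deployment.
-        Session session = new Session.Builder()
-                .host("127.0.0.1")
-                .port(6667)
-                .username("root")
-                .password("root")
-                .build();
-        session.open(false);
-
-        // Write one row to root.ln.wf02.wt02 (status BOOLEAN, hardware TEXT), matching the CLI examples above.
-        List<String> measurements = Arrays.asList("status", "hardware");
-        List<TSDataType> types = Arrays.asList(TSDataType.BOOLEAN, TSDataType.TEXT);
-        List<Object> values = Arrays.asList(true, "v1");
-        session.insertRecord("root.ln.wf02.wt02", 1L, measurements, types, values);
-
-        session.close();
-    }
-}
-```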
-
-#### Python
-
-Refer to the [Python Data Manipulation Interface (DML)](../API/Programming-Python-Native-API.md#insert)
-
-#### C++
-
-Refer to the [C++ Data Manipulation Interface (DML)](../API/Programming-Cpp-Native-API.md#insert)
-
-#### Go
-
-Refer to the [Go Native API](../API/Programming-Go-Native-API.md)
-
-## REST API WRITE
-
-Refer to [insertTablet (v1)](../API/RestServiceV1.md#inserttablet) or [insertTablet (v2)](../API/RestServiceV2.md#inserttablet)
-
-Example:
-
-```json
-{
-    "timestamps": [
-        1,
-        2,
-        3
-    ],
-    "measurements": [
-        "temperature",
-        "status"
-    ],
-    "data_types": [
-        "FLOAT",
-        "BOOLEAN"
-    ],
-    "values": [
-        [
-            1.1,
-            2.2,
-            3.3
-        ],
-        [
-            false,
-            true,
-            true
-        ]
-    ],
-    "is_aligned": false,
-    "device": "root.ln.wf01.wt01"
-}
-```
-
-## MQTT WRITE
-
-Refer to the [Built-in MQTT Service](../API/Programming-MQTT.md#built-in-mqtt-service)
-
-## BATCH DATA LOAD
-
-IoTDB provides a variety of methods for importing data in batches in different scenarios. This section describes the two most common ones: importing data in CSV format and in TsFile format.
-
-### TsFile Batch Load
-
-TsFile is the time series file format used in IoTDB. You can directly import one or more TsFile files containing time series into another running IoTDB instance through tools such as the CLI. For details, see the [Import-Export-Tool](../Tools-System/TsFile-Import-Export-Tool.md).
-
-### CSV Batch Load
-
-CSV stores table data in plain text. You can write multiple formatted rows into a CSV file and import them into IoTDB in batches. Before importing data, you are advised to create the corresponding metadata in IoTDB. Don't worry if you forget: IoTDB can automatically infer the corresponding data type for each CSV column, as long as each column holds a single data type. Besides a single file, the tool also supports importing a folder of CSV files and setting optimization parameters such as time precision. For details, see the [Import-Export-Tool](../Tools-System/Data-Import-Export-Tool.md).
-
-## DELETE
-
-Users can delete data that meets the deletion condition in the specified timeseries by using the [DELETE statement](../SQL-Manual/SQL-Manual.md#delete-data). When deleting data, users can select one or more timeseries paths, prefix paths, or paths with a star to delete data within a certain time interval.
-
-In a Java programming environment, you can use [Java JDBC](../API/Programming-JDBC.md) to execute single or batch DELETE statements.
-
-### Delete Single Timeseries
-
-Taking the ln group as an example, consider the following usage scenario:
-
-Before 2017-11-01 16:26:00, the power supply status of the wt02 device in the wf02 plant contains many erroneous segments. The data cannot be analyzed correctly, and the erroneous data affects correlation analysis with other devices, so the data before this time point needs to be deleted. The SQL statement for this operation is:
-
-```sql
-delete from root.ln.wf02.wt02.status where time<=2017-11-01T16:26:00;
-```
-
-If we only want to delete the data of 2017 that lies before 2017-11-01 16:26:00, the SQL statement is:
-
-```sql
-delete from root.ln.wf02.wt02.status where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00;
-```
-
-IoTDB supports deleting a range of data points from a timeseries. Users can write SQL expressions as follows to specify the deletion interval:
-
-```sql
-delete from root.ln.wf02.wt02.status where time < 10
-delete from root.ln.wf02.wt02.status where time <= 10
-delete from root.ln.wf02.wt02.status where time < 20 and time > 10
-delete from root.ln.wf02.wt02.status where time <= 20 and time >= 10
-delete from root.ln.wf02.wt02.status where time > 20
-delete from root.ln.wf02.wt02.status where time >= 20
-delete from root.ln.wf02.wt02.status where time = 20
-```
-
-Please note that multiple intervals connected by an "OR" expression are not supported in a delete statement:
-
-```
-delete from root.ln.wf02.wt02.status where time > 4 or time < 0
-Msg: 303: Check metadata error: For delete statement, where clause can only contain atomic
-expressions like : time > XXX, time <= XXX, or two atomic expressions connected by 'AND'
-```
-
-If no WHERE clause is specified in a delete statement, all the data in the timeseries will be deleted:
-
-```sql
-delete from root.ln.wf02.wt02.status
-```
-
-### Delete Multiple Timeseries
-
-If both the power supply status and the hardware version of the wt02 device in the wf02 plant of the ln group need to be deleted before 2017-11-01 16:26:00, [a prefix path with broader meaning or a path with a star](../Basic-Concept/Data-Model-and-Terminology.md) can be used to delete the data. The SQL statement for this operation is:
-
-```sql
-delete from root.ln.wf02.wt02 where time <= 2017-11-01T16:26:00;
-```
-
-or
-
-```sql
-delete from root.ln.wf02.wt02.* where time <= 2017-11-01T16:26:00;
-```
-
-It should be noted that when the deleted path does not exist, IoTDB does not report that the path does not exist; it reports that the execution was successful, because SQL is a declarative language. Unless there is a syntax error, insufficient permission, or a similar problem, such a statement is not considered an error, as shown below:
-
-```
-IoTDB> delete from root.ln.wf03.wt02.status where time < now()
-Msg: The statement is executed successfully.
-```
-
-### Delete Time Partition (experimental)
-
-You may delete all data in a time partition of a database using the following syntax:
-
-```sql
-DELETE PARTITION root.ln 0,1,2
-```
-
-The `0,1,2` above are the ids of the partitions to be deleted. You can find them in the IoTDB data folders, or convert a timestamp to a partition id manually using `timestamp / partitionInterval` (flooring); the `partitionInterval` should be set in your configuration (if time partitioning is supported in your version).
-
-Please note that this function is experimental and mainly intended for development; use it with care.
-
diff --git a/src/UserGuide/V1.3.0-2/UserGuideReadme.md b/src/UserGuide/V1.3.0-2/UserGuideReadme.md
deleted file mode 100644
index 5ab5a5640..000000000
--- a/src/UserGuide/V1.3.0-2/UserGuideReadme.md
+++ /dev/null
@@ -1,31 +0,0 @@
-
-# IoTDB User Guide Toc
-
-We keep introducing more features into IoTDB. Therefore, different released versions have their own user guide documents.
-
-The "In Progress Version" corresponds to the master branch of IoTDB's source code repository.
-Other documents are for previously released versions of IoTDB.
-
-- [In progress version](https://iotdb.apache.org/UserGuide/Master/QuickStart/QuickStart_apache.html)
-- [Version 1.0.x](https://iotdb.apache.org/UserGuide/V1.0.x/QuickStart/QuickStart_apache.html)
-- [Version 0.13.x](https://iotdb.apache.org/UserGuide/V0.13.x/QuickStart/QuickStart_apache.html)
-