diff --git a/src/.vuepress/public/img/scenario-aerospace-en.png b/src/.vuepress/public/img/scenario-aerospace-en.png
new file mode 100644
index 000000000..7d3e8c488
Binary files /dev/null and b/src/.vuepress/public/img/scenario-aerospace-en.png differ
diff --git a/src/.vuepress/public/img/scenario-energy-en.png b/src/.vuepress/public/img/scenario-energy-en.png
new file mode 100644
index 000000000..e4ebce4cb
Binary files /dev/null and b/src/.vuepress/public/img/scenario-energy-en.png differ
diff --git a/src/.vuepress/public/img/scenario-iot-en.png b/src/.vuepress/public/img/scenario-iot-en.png
new file mode 100644
index 000000000..f229cee57
Binary files /dev/null and b/src/.vuepress/public/img/scenario-iot-en.png differ
diff --git a/src/.vuepress/public/img/scenario-steel-en.png b/src/.vuepress/public/img/scenario-steel-en.png
new file mode 100644
index 000000000..3457ed904
Binary files /dev/null and b/src/.vuepress/public/img/scenario-steel-en.png differ
diff --git a/src/.vuepress/public/img/scenario-transportation-en.png b/src/.vuepress/public/img/scenario-transportation-en.png
new file mode 100644
index 000000000..48f322d59
Binary files /dev/null and b/src/.vuepress/public/img/scenario-transportation-en.png differ
diff --git a/src/UserGuide/Master/Table/API/Programming-JDBC.md b/src/UserGuide/Master/Table/API/Programming-JDBC.md
new file mode 100644
index 000000000..8779ec5f8
--- /dev/null
+++ b/src/UserGuide/Master/Table/API/Programming-JDBC.md
@@ -0,0 +1,187 @@
+
+
+The IoTDB JDBC driver provides a standardized way to interact with the IoTDB database, allowing users to execute SQL statements from Java programs to manage databases and time-series data. It supports connecting to the database; creating, querying, updating, and deleting data; and batch insertion and querying of time-series data.
+
+**Note:** The current JDBC implementation is designed primarily for integration with third-party tools. High-performance writing **may not be achieved** when using JDBC for insert operations. For Java applications, the **Java Native API** is recommended for optimal performance.
+
+## Prerequisites
+
+### **Environment Requirements**
+
+- **JDK:** Version 1.8 or higher
+- **Maven:** Version 3.6 or higher
+
+### **Adding Maven Dependencies**
+
+Add the following dependency to your Maven `pom.xml` file:
+
+```XML
+<dependencies>
+    <dependency>
+      <groupId>com.timecho.iotdb</groupId>
+      <artifactId>iotdb-session</artifactId>
+      <version>2.0.1.1</version>
+    </dependency>
+</dependencies>
+```
+
+## Read and Write Operations
+
+**Write Operations:** Perform database operations such as inserting data, creating databases, and creating time series using the `execute` method.
+
+**Read Operations:** Execute queries using the `executeQuery` method and retrieve results via the `ResultSet` object.
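+
+As a minimal sketch of both paths (the connection parameters and the `demo` database name are illustrative, not fixed values; the full runnable example appears in the Sample Code section below):
+
+```Java
+Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
+try (Connection connection =
+        DriverManager.getConnection(
+            "jdbc:iotdb://127.0.0.1:6667?sql_dialect=table", "root", "root");
+    Statement statement = connection.createStatement()) {
+  // Write path: DDL/DML statements go through execute().
+  statement.execute("CREATE DATABASE IF NOT EXISTS demo");
+  statement.execute("USE demo");
+  // Read path: queries go through executeQuery() and are consumed via ResultSet.
+  try (ResultSet resultSet = statement.executeQuery("SHOW TABLES")) {
+    while (resultSet.next()) {
+      System.out.println(resultSet.getString(1));
+    }
+  }
+}
+```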
+ +### Method Overview + +| **Method Name** | **Description** | **Parameters** | **Return Value** | +| ------------------------------------------------------------ | ----------------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------- | +| Class.forName(String driver) | Loads the JDBC driver class | `driver`: Name of the JDBC driver class | `Class`: Loaded class object | +| DriverManager.getConnection(String url, String username, String password) | Establishes a database connection | `url`: Database URL `username`: Username `password`: Password | `Connection`: Database connection object | +| Connection.createStatement() | Creates a `Statement` object for executing SQL statements | None | `Statement`: SQL execution object | +| Statement.execute(String sql) | Executes a non-query SQL statement | `sql`: SQL statement to execute | `boolean`: Indicates if a `ResultSet` is returned | +| Statement.executeQuery(String sql) | Executes a query SQL statement and retrieves the result set | `sql`: SQL query statement | `ResultSet`: Query result set | +| ResultSet.getMetaData() | Retrieves metadata of the result set | None | `ResultSetMetaData`: Metadata object | +| ResultSet.next() | Moves to the next row in the result set | None | `boolean`: Whether the move was successful | +| ResultSet.getString(int columnIndex) | Retrieves the string value of a specified column | `columnIndex`: Column index (starting from 1) | `String`: Column value | + +## Sample Code + +**Note:** When using the Table Model, you must specify the `sql_dialect` parameter as `table` in the URL. Example: + +```Java +String url = "jdbc:iotdb://127.0.0.1:6667?sql_dialect=table"; +``` + +You can find the full example code at [GitHub Repository](https://github.com/apache/iotdb/blob/master/example/jdbc/src/main/java/org/apache/iotdb/TableModelJDBCExample.java). + +Here is an excerpt of the sample code: + +```Java +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iotdb; + +import org.apache.iotdb.jdbc.IoTDBSQLException; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.sql.DriverManager; +import java.sql.ResultSet; +import java.sql.ResultSetMetaData; +import java.sql.SQLException; +import java.sql.Statement; + +public class TableModelJDBCExample { + + private static final Logger LOGGER = LoggerFactory.getLogger(TableModelJDBCExample.class); + + public static void main(String[] args) throws ClassNotFoundException, SQLException { + Class.forName("org.apache.iotdb.jdbc.IoTDBDriver"); + + // don't specify database in url + try (Connection connection = + DriverManager.getConnection( + "jdbc:iotdb://127.0.0.1:6667?sql_dialect=table", "root", "root"); + Statement statement = connection.createStatement()) { + + statement.execute("CREATE DATABASE test1"); + statement.execute("CREATE DATABASE test2"); + + statement.execute("use test2"); + + // or use full qualified table name + statement.execute( + "create table test1.table1(region_id STRING ID, plant_id STRING ID, device_id STRING ID, model STRING ATTRIBUTE, temperature FLOAT MEASUREMENT, humidity DOUBLE MEASUREMENT) with (TTL=3600000)"); + + statement.execute( + "create table table2(region_id STRING ID, plant_id STRING ID, color STRING ATTRIBUTE, temperature FLOAT MEASUREMENT, speed DOUBLE MEASUREMENT) with (TTL=6600000)"); + + // show tables from current database + try (ResultSet resultSet = statement.executeQuery("SHOW TABLES")) { + ResultSetMetaData metaData = resultSet.getMetaData(); + System.out.println(metaData.getColumnCount()); + while (resultSet.next()) { + System.out.println(resultSet.getString(1) + ", " + resultSet.getInt(2)); + } + } + + // show tables by specifying another database + // using SHOW tables FROM + try (ResultSet resultSet = statement.executeQuery("SHOW TABLES FROM test1")) { + ResultSetMetaData metaData = resultSet.getMetaData(); + System.out.println(metaData.getColumnCount()); + while (resultSet.next()) { + System.out.println(resultSet.getString(1) + ", " + resultSet.getInt(2)); + } + } + + } catch (IoTDBSQLException e) { + LOGGER.error("IoTDB Jdbc example error", e); + } + + // specify database in url + try (Connection connection = + DriverManager.getConnection( + "jdbc:iotdb://127.0.0.1:6667/test1?sql_dialect=table", "root", "root"); + Statement statement = connection.createStatement()) { + // show tables from current database test1 + try (ResultSet resultSet = statement.executeQuery("SHOW TABLES")) { + ResultSetMetaData metaData = resultSet.getMetaData(); + System.out.println(metaData.getColumnCount()); + while (resultSet.next()) { + System.out.println(resultSet.getString(1) + ", " + resultSet.getInt(2)); + } + } + + // change database to test2 + statement.execute("use test2"); + + try (ResultSet resultSet = statement.executeQuery("SHOW TABLES")) { + ResultSetMetaData metaData = resultSet.getMetaData(); + System.out.println(metaData.getColumnCount()); + while (resultSet.next()) { + System.out.println(resultSet.getString(1) + ", " + resultSet.getInt(2)); + } + } + } + } +} +``` \ No newline at end of file diff --git a/src/UserGuide/Master/Table/API/Programming-Java-Native-API.md b/src/UserGuide/Master/Table/API/Programming-Java-Native-API.md new file mode 100644 index 000000000..c97f50769 --- /dev/null +++ b/src/UserGuide/Master/Table/API/Programming-Java-Native-API.md @@ -0,0 +1,610 @@ + + +IoTDB provides a Java native client driver and a session pool management mechanism. 
These tools enable developers to interact with IoTDB using object-oriented APIs, allowing time-series objects to be assembled and inserted directly into the database without constructing SQL statements. For multi-threaded database operations, the `ITableSessionPool` is recommended to maximize efficiency.
+
+## Prerequisites
+
+### Environment Requirements
+
+- **JDK**: Version 1.8 or higher
+- **Maven**: Version 3.6 or higher
+
+### Adding Maven Dependencies
+
+```XML
+<dependencies>
+    <dependency>
+      <groupId>com.timecho.iotdb</groupId>
+      <artifactId>iotdb-session</artifactId>
+      <version>2.0.1.1</version>
+    </dependency>
+</dependencies>
+```
+
+## Read and Write Operations
+
+### ITableSession Interface
+
+The `ITableSession` interface defines basic operations for interacting with IoTDB, including data insertion, query execution, and session closure. Note that this interface is **not thread-safe**.
+
+#### Method Overview
+
+| **Method Name** | **Description** | **Parameters** | **Return Value** | **Exceptions** |
+| --------------- | --------------- | -------------- | ---------------- | -------------- |
+| insert(Tablet tablet) | Inserts a `Tablet` containing time-series data into the database. | `tablet`: The `Tablet` object to be inserted. | None | `StatementExecutionException`, `IoTDBConnectionException` |
+| executeNonQueryStatement(String sql) | Executes non-query SQL statements such as DDL or DML commands. | `sql`: The SQL statement to execute. | None | `StatementExecutionException`, `IoTDBConnectionException` |
+| executeQueryStatement(String sql) | Executes a query SQL statement and returns a `SessionDataSet` containing the query results. | `sql`: The SQL query statement to execute. | `SessionDataSet` | `StatementExecutionException`, `IoTDBConnectionException` |
+| executeQueryStatement(String sql, long timeoutInMs) | Executes a query SQL statement with a specified timeout in milliseconds. | `sql`: The SQL query statement. `timeoutInMs`: Query timeout in milliseconds. | `SessionDataSet` | `StatementExecutionException` |
+| close() | Closes the session and releases resources. | None | None | `IoTDBConnectionException` |
+
+#### Sample Code
+
+```Java
+/**
+ * This interface defines a session for interacting with IoTDB tables.
+ * It supports operations such as data insertion, executing queries, and closing the session.
+ * Implementations of this interface are expected to manage connections and ensure
+ * proper resource cleanup.
+ *
+ * <p>Each method may throw exceptions to indicate issues such as connection errors or
+ * execution failures.
+ *
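+ * <p>For example, a minimal usage sketch (the node address, SQL, database, and
+ * tablet names are illustrative):
+ *
+ * <pre>{@code
+ * try (ITableSession session =
+ *     new TableSessionBuilder().nodeUrls(Collections.singletonList("127.0.0.1:6667")).build()) {
+ *   session.executeNonQueryStatement("USE db1");
+ *   session.insert(tablet); // a previously assembled Tablet
+ *   SessionDataSet dataSet = session.executeQueryStatement("SELECT * FROM table1");
+ *   while (dataSet.hasNext()) {
+ *     System.out.println(dataSet.next());
+ *   }
+ * }
+ * }</pre>
+ *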
+ * <p>
Since this interface extends {@link AutoCloseable}, it is recommended to use + * try-with-resources to ensure the session is properly closed. + */ +public interface ITableSession extends AutoCloseable { + + /** + * Inserts a {@link Tablet} into the database. + * + * @param tablet the tablet containing time-series data to be inserted. + * @throws StatementExecutionException if an error occurs while executing the statement. + * @throws IoTDBConnectionException if there is an issue with the IoTDB connection. + */ + void insert(Tablet tablet) throws StatementExecutionException, IoTDBConnectionException; + + /** + * Executes a non-query SQL statement, such as a DDL or DML command. + * + * @param sql the SQL statement to execute. + * @throws IoTDBConnectionException if there is an issue with the IoTDB connection. + * @throws StatementExecutionException if an error occurs while executing the statement. + */ + void executeNonQueryStatement(String sql) throws IoTDBConnectionException, StatementExecutionException; + + /** + * Executes a query SQL statement and returns the result set. + * + * @param sql the SQL query statement to execute. + * @return a {@link SessionDataSet} containing the query results. + * @throws StatementExecutionException if an error occurs while executing the statement. + * @throws IoTDBConnectionException if there is an issue with the IoTDB connection. + */ + SessionDataSet executeQueryStatement(String sql) + throws StatementExecutionException, IoTDBConnectionException; + + /** + * Executes a query SQL statement with a specified timeout and returns the result set. + * + * @param sql the SQL query statement to execute. + * @param timeoutInMs the timeout duration in milliseconds for the query execution. + * @return a {@link SessionDataSet} containing the query results. + * @throws StatementExecutionException if an error occurs while executing the statement. + * @throws IoTDBConnectionException if there is an issue with the IoTDB connection. + */ + SessionDataSet executeQueryStatement(String sql, long timeoutInMs) + throws StatementExecutionException, IoTDBConnectionException; + + /** + * Closes the session, releasing any held resources. + * + * @throws IoTDBConnectionException if there is an issue with closing the IoTDB connection. + */ + @Override + void close() throws IoTDBConnectionException; +} +``` + +### TableSessionBuilder Class + +The `TableSessionBuilder` class is a builder for configuring and creating instances of the `ITableSession` interface. It allows developers to set connection parameters, query parameters, and security features. + +#### Parameter Configuration + +| **Parameter** | **Description** | **Default Value** | +|-----------------------------------------------------| ------------------------------------------------------------ | ------------------------------------------------- | +| nodeUrls(List\ nodeUrls) | Sets the list of IoTDB cluster node URLs. | `Collections.singletonList("``localhost:6667``")` | +| username(String username) | Sets the username for the connection. | `"root"` | +| password(String password) | Sets the password for the connection. | `"root"` | +| database(String database) | Sets the target database name. | `null` | +| queryTimeoutInMs(long queryTimeoutInMs) | Sets the query timeout in milliseconds. | `60000` (1 minute) | +| fetchSize(int fetchSize) | Sets the fetch size for query results. | `5000` | +| zoneId(ZoneId zoneId) | Sets the timezone-related `ZoneId`. 
| `ZoneId.systemDefault()` |
+| thriftDefaultBufferSize(int thriftDefaultBufferSize) | Sets the default buffer size for the Thrift client (in bytes). | `1024` (1 KB) |
+| thriftMaxFrameSize(int thriftMaxFrameSize) | Sets the maximum frame size for the Thrift client (in bytes). | `64 * 1024 * 1024` (64 MB) |
+| enableRedirection(boolean enableRedirection) | Enables or disables redirection for cluster nodes. | `true` |
+| enableAutoFetch(boolean enableAutoFetch) | Enables or disables automatic fetching of available DataNodes. | `true` |
+| maxRetryCount(int maxRetryCount) | Sets the maximum number of connection retry attempts. | `60` |
+| retryIntervalInMs(long retryIntervalInMs) | Sets the interval between retry attempts (in milliseconds). | `500` (500 milliseconds) |
+| useSSL(boolean useSSL) | Enables or disables SSL for secure connections. | `false` |
+| trustStore(String keyStore) | Sets the path to the trust store for SSL connections. | `null` |
+| trustStorePwd(String keyStorePwd) | Sets the password for the SSL trust store. | `null` |
+| enableCompression(boolean enableCompression) | Enables or disables RPC compression for the connection. | `false` |
+| connectionTimeoutInMs(int connectionTimeoutInMs) | Sets the connection timeout in milliseconds. | `0` (no timeout) |
+
+#### Sample Code
+
+```Java
+/**
+ * A builder class for constructing instances of {@link ITableSession}.
+ *
+ * <p>This builder provides a fluent API for configuring various options such as connection
+ * settings, query parameters, and security features.
+ *
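+ * <p>For example, a session can be assembled as follows (the node address,
+ * credentials, and database name are illustrative):
+ *
+ * <pre>{@code
+ * ITableSession session =
+ *     new TableSessionBuilder()
+ *         .nodeUrls(Collections.singletonList("127.0.0.1:6667"))
+ *         .username("root")
+ *         .password("root")
+ *         .database("db1")
+ *         .build();
+ * }</pre>
+ *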
+ * <p>
All configurations have reasonable default values, which can be overridden as needed. + */ +public class TableSessionBuilder { + + /** + * Builds and returns a configured {@link ITableSession} instance. + * + * @return a fully configured {@link ITableSession}. + * @throws IoTDBConnectionException if an error occurs while establishing the connection. + */ + public ITableSession build() throws IoTDBConnectionException; + + /** + * Sets the list of node URLs for the IoTDB cluster. + * + * @param nodeUrls a list of node URLs. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue Collection.singletonList("localhost:6667") + */ + public TableSessionBuilder nodeUrls(List nodeUrls); + + /** + * Sets the username for the connection. + * + * @param username the username. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue "root" + */ + public TableSessionBuilder username(String username); + + /** + * Sets the password for the connection. + * + * @param password the password. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue "root" + */ + public TableSessionBuilder password(String password); + + /** + * Sets the target database name. + * + * @param database the database name. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue null + */ + public TableSessionBuilder database(String database); + + /** + * Sets the query timeout in milliseconds. + * + * @param queryTimeoutInMs the query timeout in milliseconds. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 60000 (1 minute) + */ + public TableSessionBuilder queryTimeoutInMs(long queryTimeoutInMs); + + /** + * Sets the fetch size for query results. + * + * @param fetchSize the fetch size. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 5000 + */ + public TableSessionBuilder fetchSize(int fetchSize); + + /** + * Sets the {@link ZoneId} for timezone-related operations. + * + * @param zoneId the {@link ZoneId}. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue ZoneId.systemDefault() + */ + public TableSessionBuilder zoneId(ZoneId zoneId); + + /** + * Sets the default init buffer size for the Thrift client. + * + * @param thriftDefaultBufferSize the buffer size in bytes. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 1024 (1 KB) + */ + public TableSessionBuilder thriftDefaultBufferSize(int thriftDefaultBufferSize); + + /** + * Sets the maximum frame size for the Thrift client. + * + * @param thriftMaxFrameSize the maximum frame size in bytes. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 64 * 1024 * 1024 (64 MB) + */ + public TableSessionBuilder thriftMaxFrameSize(int thriftMaxFrameSize); + + /** + * Enables or disables redirection for cluster nodes. + * + * @param enableRedirection whether to enable redirection. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue true + */ + public TableSessionBuilder enableRedirection(boolean enableRedirection); + + /** + * Enables or disables automatic fetching of available DataNodes. + * + * @param enableAutoFetch whether to enable automatic fetching. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue true + */ + public TableSessionBuilder enableAutoFetch(boolean enableAutoFetch); + + /** + * Sets the maximum number of retries for connection attempts. 
+ * + * @param maxRetryCount the maximum retry count. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 60 + */ + public TableSessionBuilder maxRetryCount(int maxRetryCount); + + /** + * Sets the interval between retries in milliseconds. + * + * @param retryIntervalInMs the interval in milliseconds. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 500 milliseconds + */ + public TableSessionBuilder retryIntervalInMs(long retryIntervalInMs); + + /** + * Enables or disables SSL for secure connections. + * + * @param useSSL whether to enable SSL. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue false + */ + public TableSessionBuilder useSSL(boolean useSSL); + + /** + * Sets the trust store path for SSL connections. + * + * @param keyStore the trust store path. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue null + */ + public TableSessionBuilder trustStore(String keyStore); + + /** + * Sets the trust store password for SSL connections. + * + * @param keyStorePwd the trust store password. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue null + */ + public TableSessionBuilder trustStorePwd(String keyStorePwd); + + /** + * Enables or disables rpc compression for the connection. + * + * @param enableCompression whether to enable compression. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue false + */ + public TableSessionBuilder enableCompression(boolean enableCompression); + + /** + * Sets the connection timeout in milliseconds. + * + * @param connectionTimeoutInMs the connection timeout in milliseconds. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 0 (no timeout) + */ + public TableSessionBuilder connectionTimeoutInMs(int connectionTimeoutInMs); +} +``` + +## Session Pool + +### ITableSessionPool Interface + +The `ITableSessionPool` interface manages a pool of `ITableSession` instances, enabling efficient reuse of connections and proper cleanup of resources. + +#### Method Overview + +| **Method Name** | **Description** | **Return Value** | **Exceptions** | +| --------------- | ---------------------------------------------------------- | ---------------- | -------------------------- | +| getSession() | Acquires a session from the pool for database interaction. | `ITableSession` | `IoTDBConnectionException` | +| close() | Closes the session pool and releases resources.。 | None | None | + +#### Sample Code + +```Java +/** + * This interface defines a pool for managing {@link ITableSession} instances. + * It provides methods to acquire a session from the pool and to close the pool. + * + *
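+ * <p>A usage sketch (assuming a {@code pool} built with {@code TableSessionPoolBuilder});
+ * closing the acquired session is expected to return it to the pool rather than
+ * tear down the underlying connection:
+ *
+ * <pre>{@code
+ * try (ITableSession session = pool.getSession()) {
+ *   session.executeNonQueryStatement("USE db1");
+ * }
+ * }</pre>
+ *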
+ * <p>
The implementation should handle the lifecycle of sessions, ensuring efficient + * reuse and proper cleanup of resources. + */ +public interface ITableSessionPool { + + /** + * Acquires an {@link ITableSession} instance from the pool. + * + * @return an {@link ITableSession} instance for interacting with the IoTDB. + * @throws IoTDBConnectionException if there is an issue obtaining a session from the pool. + */ + ITableSession getSession() throws IoTDBConnectionException; + + /** + * Closes the session pool, releasing any held resources. + * + *
+ * <p>
Once the pool is closed, no further sessions can be acquired. + */ + void close(); +} +``` + +### TableSessionPoolBuilder Class + +The `TableSessionPoolBuilder` class is a builder for configuring and creating `ITableSessionPool` instances, supporting options like connection settings and pooling behavior. + +#### Parameter Configuration + +| **Parameter** | **Description** | **Default Value** | +|---------------------------------------------------------------| ------------------------------------------------------------ | --------------------------------------------- | +| nodeUrls(List\ nodeUrls) | Sets the list of IoTDB cluster node URLs. | `Collections.singletonList("localhost:6667")` | +| maxSize(int maxSize) | Sets the maximum size of the session pool, i.e., the maximum number of sessions allowed in the pool. | `5` | +| user(String user) | Sets the username for the connection. | `"root"` | +| password(String password) | Sets the password for the connection. | `"root"` | +| database(String database) | Sets the target database name. | `"root"` | +| queryTimeoutInMs(long queryTimeoutInMs) | Sets the query timeout in milliseconds. | `60000`(1 minute) | +| fetchSize(int fetchSize) | Sets the fetch size for query results. | `5000` | +| zoneId(ZoneId zoneId) | Sets the timezone-related `ZoneId`. | `ZoneId.systemDefault()` | +| waitToGetSessionTimeoutInMs(long waitToGetSessionTimeoutInMs) | Sets the timeout duration (in milliseconds) for acquiring a session from the pool. | `30000`(30 seconds) | +| thriftDefaultBufferSize(int thriftDefaultBufferSize) | Sets the default buffer size for the Thrift client (in bytes). | `1024`(1KB) | +| thriftMaxFrameSize(int thriftMaxFrameSize) | Sets the maximum frame size for the Thrift client (in bytes). | `64 * 1024 * 1024`(64MB) | +| enableCompression(boolean enableCompression) | Enables or disables compression for the connection. | `false` | +| enableRedirection(boolean enableRedirection) | Enables or disables redirection for cluster nodes. | `true` | +| connectionTimeoutInMs(int connectionTimeoutInMs) | Sets the connection timeout in milliseconds. | `10000` (10 seconds) | +| enableAutoFetch(boolean enableAutoFetch) | Enables or disables automatic fetching of available DataNodes. | `true` | +| maxRetryCount(int maxRetryCount) | Sets the maximum number of connection retry attempts. | `60` | +| retryIntervalInMs(long retryIntervalInMs) | Sets the interval between retry attempts (in milliseconds). | `500` (500 milliseconds) | +| useSSL(boolean useSSL) | Enables or disables SSL for secure connections. | `false` | +| trustStore(String keyStore) | Sets the path to the trust store for SSL connections. | `null` | +| trustStorePwd(String keyStorePwd) | Sets the password for the SSL trust store. | `null` | + +#### Sample Code + +```Java +/** + * A builder class for constructing instances of {@link ITableSessionPool}. + * + *
+ * <p>This builder provides a fluent API for configuring a session pool, including
+ * connection settings, session parameters, and pool behavior.
+ *
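+ * <p>For example (the node address, credentials, and pool size are illustrative):
+ *
+ * <pre>{@code
+ * ITableSessionPool pool =
+ *     new TableSessionPoolBuilder()
+ *         .nodeUrls(Collections.singletonList("127.0.0.1:6667"))
+ *         .user("root")
+ *         .password("root")
+ *         .maxSize(5)
+ *         .build();
+ * }</pre>
+ *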
+ * <p>
All configurations have reasonable default values, which can be overridden as needed. + */ +public class TableSessionPoolBuilder { + + /** + * Builds and returns a configured {@link ITableSessionPool} instance. + * + * @return a fully configured {@link ITableSessionPool}. + */ + public ITableSessionPool build(); + + /** + * Sets the list of node URLs for the IoTDB cluster. + * + * @param nodeUrls a list of node URLs. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue Collection.singletonList("localhost:6667") + */ + public TableSessionPoolBuilder nodeUrls(List nodeUrls); + + /** + * Sets the maximum size of the session pool. + * + * @param maxSize the maximum number of sessions allowed in the pool. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 5 + */ + public TableSessionPoolBuilder maxSize(int maxSize); + + /** + * Sets the username for the connection. + * + * @param user the username. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue "root" + */ + public TableSessionPoolBuilder user(String user); + + /** + * Sets the password for the connection. + * + * @param password the password. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue "root" + */ + public TableSessionPoolBuilder password(String password); + + /** + * Sets the target database name. + * + * @param database the database name. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue "root" + */ + public TableSessionPoolBuilder database(String database); + + /** + * Sets the query timeout in milliseconds. + * + * @param queryTimeoutInMs the query timeout in milliseconds. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 60000 (1 minute) + */ + public TableSessionPoolBuilder queryTimeoutInMs(long queryTimeoutInMs); + + /** + * Sets the fetch size for query results. + * + * @param fetchSize the fetch size. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 5000 + */ + public TableSessionPoolBuilder fetchSize(int fetchSize); + + /** + * Sets the {@link ZoneId} for timezone-related operations. + * + * @param zoneId the {@link ZoneId}. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue ZoneId.systemDefault() + */ + public TableSessionPoolBuilder zoneId(ZoneId zoneId); + + /** + * Sets the timeout for waiting to acquire a session from the pool. + * + * @param waitToGetSessionTimeoutInMs the timeout duration in milliseconds. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 30000 (30 seconds) + */ + public TableSessionPoolBuilder waitToGetSessionTimeoutInMs(long waitToGetSessionTimeoutInMs); + + /** + * Sets the default buffer size for the Thrift client. + * + * @param thriftDefaultBufferSize the buffer size in bytes. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 1024 (1 KB) + */ + public TableSessionPoolBuilder thriftDefaultBufferSize(int thriftDefaultBufferSize); + + /** + * Sets the maximum frame size for the Thrift client. + * + * @param thriftMaxFrameSize the maximum frame size in bytes. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 64 * 1024 * 1024 (64 MB) + */ + public TableSessionPoolBuilder thriftMaxFrameSize(int thriftMaxFrameSize); + + /** + * Enables or disables compression for the connection. 
+ * + * @param enableCompression whether to enable compression. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue false + */ + public TableSessionPoolBuilder enableCompression(boolean enableCompression); + + /** + * Enables or disables redirection for cluster nodes. + * + * @param enableRedirection whether to enable redirection. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue true + */ + public TableSessionPoolBuilder enableRedirection(boolean enableRedirection); + + /** + * Sets the connection timeout in milliseconds. + * + * @param connectionTimeoutInMs the connection timeout in milliseconds. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 10000 (10 seconds) + */ + public TableSessionPoolBuilder connectionTimeoutInMs(int connectionTimeoutInMs); + + /** + * Enables or disables automatic fetching of available DataNodes. + * + * @param enableAutoFetch whether to enable automatic fetching. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue true + */ + public TableSessionPoolBuilder enableAutoFetch(boolean enableAutoFetch); + + /** + * Sets the maximum number of retries for connection attempts. + * + * @param maxRetryCount the maximum retry count. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 60 + */ + public TableSessionPoolBuilder maxRetryCount(int maxRetryCount); + + /** + * Sets the interval between retries in milliseconds. + * + * @param retryIntervalInMs the interval in milliseconds. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 500 milliseconds + */ + public TableSessionPoolBuilder retryIntervalInMs(long retryIntervalInMs); + + /** + * Enables or disables SSL for secure connections. + * + * @param useSSL whether to enable SSL. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue false + */ + public TableSessionPoolBuilder useSSL(boolean useSSL); + + /** + * Sets the trust store path for SSL connections. + * + * @param keyStore the trust store path. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue null + */ + public TableSessionPoolBuilder trustStore(String keyStore); + + /** + * Sets the trust store password for SSL connections. + * + * @param keyStorePwd the trust store password. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue null + */ + public TableSessionPoolBuilder trustStorePwd(String keyStorePwd); +} +``` \ No newline at end of file diff --git a/src/UserGuide/Master/Table/API/Programming-Python-Native-API.md b/src/UserGuide/Master/Table/API/Programming-Python-Native-API.md new file mode 100644 index 000000000..5ed77d6da --- /dev/null +++ b/src/UserGuide/Master/Table/API/Programming-Python-Native-API.md @@ -0,0 +1,448 @@ + + +IoTDB provides a Python native client driver and a session pool management mechanism. These tools allow developers to interact with IoTDB in a programmatic and efficient manner. Using the Python API, developers can encapsulate time-series data into objects (e.g., `Tablet`, `NumpyTablet`) and insert them into the database directly, without the need to manually construct SQL statements. For multi-threaded operations, the `TableSessionPool` is recommended to optimize resource utilization and enhance performance. 
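+
+As a quick preview before the details, a typical single-session round trip looks like the following sketch (the import path mirrors the session-pool example later on this page, and the database name is illustrative):
+
+```Python
+from iotdb.table_session import TableSession, TableSessionConfig
+
+config = TableSessionConfig(node_urls=["127.0.0.1:6667"], username="root", password="root")
+session = TableSession(config)
+session.execute_non_query_statement("CREATE DATABASE IF NOT EXISTS db1")
+session.execute_non_query_statement('USE "db1"')
+res = session.execute_query_statement("SHOW TABLES")
+while res.has_next():
+    print(res.next())
+session.close()
+```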
+ +## Prerequisites + +To use the IoTDB Python API, install the required package using pip: + +```Java +pip3 install apache-iotdb +``` + +## Read and Write Operations + +### TableSession + +`TableSession` is a core class in IoTDB, enabling users to interact with the IoTDB database. It provides methods to execute SQL statements, insert data, and manage database sessions. + +#### Method Overview + +| **Method Name** | **Descripton** | **Parameter Type** | **Return Type** | +| --------------------------- | ----------------------------------------------------- | ------------------------------------ | ---------------- | +| insert | Inserts data into the database. | tablet: `Union[Tablet, NumpyTablet]` | None | +| execute_non_query_statement | Executes non-query SQL statements like DDL/DML. | sql: `str` | None | +| execute_query_statement | Executes a query SQL statement and retrieves results. | sql: `str` | `SessionDataSet` | +| close | Closes the session and releases resources. | None | None | + +#### Sample Code + +```Python +class TableSession(object): +def insert(self, tablet: Union[Tablet, NumpyTablet]): + """ + Insert data into the database. + + Parameters: + tablet (Tablet | NumpyTablet): The tablet containing the data to be inserted. + Accepts either a `Tablet` or `NumpyTablet`. + + Raises: + IoTDBConnectionException: If there is an issue with the database connection. + """ + pass + +def execute_non_query_statement(self, sql: str): + """ + Execute a non-query SQL statement. + + Parameters: + sql (str): The SQL statement to execute. Typically used for commands + such as INSERT, DELETE, or UPDATE. + + Raises: + IoTDBConnectionException: If there is an issue with the database connection. + """ + pass + +def execute_query_statement(self, sql: str, timeout_in_ms: int = 0) -> "SessionDataSet": + """ + Execute a query SQL statement and return the result set. + + Parameters: + sql (str): The SQL query to execute. + timeout_in_ms (int, optional): Timeout for the query in milliseconds. Defaults to 0, + which means no timeout. + + Returns: + SessionDataSet: The result set of the query. + + Raises: + IoTDBConnectionException: If there is an issue with the database connection. + """ + pass + +def close(self): + """ + Close the session and release resources. + + Raises: + IoTDBConnectionException: If there is an issue closing the connection. + """ + pass +``` + +### TableSessionConfig + +`TableSessionConfig` is a configuration class that sets parameters for creating a `TableSession` instance, defining essential settings for connecting to the IoTDB database. + +#### Parameter Configuration + +| **Parameter** | **Description** | **Type** | **Default Value** | +| ------------------ | ------------------------------------- | -------- | ------------------------- | +| node_urls | List of database node URLs. | `list` | `["localhost:6667"]` | +| username | Username for the database connection. | `str` | `"root"` | +| password | Password for the database connection. | `str` | `"root"` | +| database | Target database to connect to. | `str` | `None` | +| fetch_size | Number of rows to fetch per query. | `int` | `5000` | +| time_zone | Default session time zone. | `str` | `Session.DEFAULT_ZONE_ID` | +| enable_compression | Enable data compression. | `bool` | `False` | + +#### Sample Code + +```Python +class TableSessionConfig(object): + """ + Configuration class for a TableSession. + + This class defines various parameters for connecting to and interacting + with the IoTDB tables. 
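+
+    Example (a minimal sketch; the values shown are illustrative):
+        config = TableSessionConfig(
+            node_urls=["127.0.0.1:6667"],
+            username="root",
+            password="root",
+            database="db1",
+        )
+        session = TableSession(config)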
+ """ + + def __init__( + self, + node_urls: list = None, + username: str = Session.DEFAULT_USER, + password: str = Session.DEFAULT_PASSWORD, + database: str = None, + fetch_size: int = 5000, + time_zone: str = Session.DEFAULT_ZONE_ID, + enable_compression: bool = False, + ): + """ + Initialize a TableSessionConfig object with the provided parameters. + + Parameters: + node_urls (list, optional): A list of node URLs for the database connection. + Defaults to ["localhost:6667"]. + username (str, optional): The username for the database connection. + Defaults to "root". + password (str, optional): The password for the database connection. + Defaults to "root". + database (str, optional): The target database to connect to. Defaults to None. + fetch_size (int, optional): The number of rows to fetch per query. Defaults to 5000. + time_zone (str, optional): The default time zone for the session. + Defaults to Session.DEFAULT_ZONE_ID. + enable_compression (bool, optional): Whether to enable data compression. + Defaults to False. + """ +``` + +**Note:** After using a `TableSession`, make sure to call the `close` method to release resources. + +## Session Pool + +### TableSessionPool + +`TableSessionPool` is a session pool management class designed for creating and managing `TableSession` instances. It provides functionality to retrieve sessions from the pool and close the pool when it is no longer needed. + +#### Method Overview + +| **Method Name** | **Description** | **Return Type** | **Exceptions** | +| --------------- | ------------------------------------------------------ | --------------- | -------------- | +| get_session | Retrieves a new `TableSession` instance from the pool. | `TableSession` | None | +| close | Closes the session pool and releases all resources. | None | None | + +#### Sample Code + +```Java +def get_session(self) -> TableSession: + """ + Retrieve a new TableSession instance. + + Returns: + TableSession: A new session object configured with the session pool. + + Notes: + The session is initialized with the underlying session pool for managing + connections. Ensure proper usage of the session's lifecycle. + """ + +def close(self): + """ + Close the session pool and release all resources. + + This method closes the underlying session pool, ensuring that all + resources associated with it are properly released. + + Notes: + After calling this method, the session pool cannot be used to retrieve + new sessions, and any attempt to do so may raise an exception. + """ +``` + +### TableSessionPoolConfig + +`TableSessionPoolConfig` is a configuration class used to define parameters for initializing and managing a `TableSessionPool` instance. It specifies the settings needed for efficient session pool management in IoTDB. + +#### Parameter Configuration + +| **Paramater** | **Description** | **Type** | **Default Value** | +| ------------------ | ------------------------------------------------------------ | -------- | -------------------------- | +| node_urls | List of IoTDB cluster node URLs. | `list` | None | +| max_pool_size | Maximum size of the session pool, i.e., the maximum number of sessions allowed in the pool. | `int` | `5` | +| username | Username for the connection. | `str` | `Session.DEFAULT_USER` | +| password | Password for the connection. | `str` | `Session.DEFAULT_PASSWORD` | +| database | Target database to connect to. 
| `str` | None | +| fetch_size | Fetch size for query results | `int` | `5000` | +| time_zone | Timezone-related `ZoneId` | `str` | `Session.DEFAULT_ZONE_ID` | +| enable_redirection | Whether to enable redirection. | `bool` | `False` | +| enable_compression | Whether to enable data compression. | `bool` | `False` | +| wait_timeout_in_ms | Sets the connection timeout in milliseconds. | `int` | `10000` | +| max_retry | Maximum number of connection retry attempts. | `int` | `3` | + +#### Sample Code + +```Java +class TableSessionPoolConfig(object): + """ + Configuration class for a TableSessionPool. + + This class defines the parameters required to initialize and manage + a session pool for interacting with the IoTDB database. + """ + def __init__( + self, + node_urls: list = None, + max_pool_size: int = 5, + username: str = Session.DEFAULT_USER, + password: str = Session.DEFAULT_PASSWORD, + database: str = None, + fetch_size: int = 5000, + time_zone: str = Session.DEFAULT_ZONE_ID, + enable_redirection: bool = False, + enable_compression: bool = False, + wait_timeout_in_ms: int = 10000, + max_retry: int = 3, + ): + """ + Initialize a TableSessionPoolConfig object with the provided parameters. + + Parameters: + node_urls (list, optional): A list of node URLs for the database connection. + Defaults to None. + max_pool_size (int, optional): The maximum number of sessions in the pool. + Defaults to 5. + username (str, optional): The username for the database connection. + Defaults to Session.DEFAULT_USER. + password (str, optional): The password for the database connection. + Defaults to Session.DEFAULT_PASSWORD. + database (str, optional): The target database to connect to. Defaults to None. + fetch_size (int, optional): The number of rows to fetch per query. Defaults to 5000. + time_zone (str, optional): The default time zone for the session pool. + Defaults to Session.DEFAULT_ZONE_ID. + enable_redirection (bool, optional): Whether to enable redirection. + Defaults to False. + enable_compression (bool, optional): Whether to enable data compression. + Defaults to False. + wait_timeout_in_ms (int, optional): The maximum time (in milliseconds) to wait for a session + to become available. Defaults to 10000. + max_retry (int, optional): The maximum number of retry attempts for operations. Defaults to 3. + + """ +``` + +**Notes:** + +- Ensure that `TableSession` instances retrieved from the `TableSessionPool` are properly closed after use. +- After closing the `TableSessionPool`, it will no longer be possible to retrieve new sessions. + +## Sample Code + +**Session** Example: You can find the full example code at [GitHub Repository](https://github.com/apache/iotdb/blob/master/iotdb-client/client-py/table_model_session_example.py). + +**Session Pool** Example: You can find the full example code at [GitHub Repository](https://github.com/apache/iotdb/blob/master/iotdb-client/client-py/table_model_session_pool_example.py). + +Here is an excerpt of the sample code: + +```Java +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +import threading + +import numpy as np + +from iotdb.table_session_pool import TableSessionPool, TableSessionPoolConfig +from iotdb.utils.IoTDBConstants import TSDataType +from iotdb.utils.NumpyTablet import NumpyTablet +from iotdb.utils.Tablet import ColumnType, Tablet + + +def prepare_data(): + print("create database") + # Get a session from the pool + session = session_pool.get_session() + session.execute_non_query_statement("CREATE DATABASE IF NOT EXISTS db1") + session.execute_non_query_statement('USE "db1"') + session.execute_non_query_statement( + "CREATE TABLE table0 (id1 string id, attr1 string attribute, " + + "m1 double " + + "measurement)" + ) + session.execute_non_query_statement( + "CREATE TABLE table1 (id1 string id, attr1 string attribute, " + + "m1 double " + + "measurement)" + ) + + print("now the tables are:") + # show result + res = session.execute_query_statement("SHOW TABLES") + while res.has_next(): + print(res.next()) + + session.close() + + +def insert_data(num: int): + print("insert data for table" + str(num)) + # Get a session from the pool + session = session_pool.get_session() + column_names = [ + "id1", + "attr1", + "m1", + ] + data_types = [ + TSDataType.STRING, + TSDataType.STRING, + TSDataType.DOUBLE, + ] + column_types = [ColumnType.ID, ColumnType.ATTRIBUTE, ColumnType.MEASUREMENT] + timestamps = [] + values = [] + for row in range(15): + timestamps.append(row) + values.append(["id:" + str(row), "attr:" + str(row), row * 1.0]) + tablet = Tablet( + "table" + str(num), column_names, data_types, values, timestamps, column_types + ) + session.insert(tablet) + session.execute_non_query_statement("FLush") + + np_timestamps = np.arange(15, 30, dtype=np.dtype(">i8")) + np_values = [ + np.array(["id:{}".format(i) for i in range(15, 30)]), + np.array(["attr:{}".format(i) for i in range(15, 30)]), + np.linspace(15.0, 29.0, num=15, dtype=TSDataType.DOUBLE.np_dtype()), + ] + + np_tablet = NumpyTablet( + "table" + str(num), + column_names, + data_types, + np_values, + np_timestamps, + column_types=column_types, + ) + session.insert(np_tablet) + session.close() + + +def query_data(): + # Get a session from the pool + session = session_pool.get_session() + + print("get data from table0") + res = session.execute_query_statement("select * from table0") + while res.has_next(): + print(res.next()) + + print("get data from table1") + res = session.execute_query_statement("select * from table0") + while res.has_next(): + print(res.next()) + + session.close() + + +def delete_data(): + session = session_pool.get_session() + session.execute_non_query_statement("drop database db1") + print("data has been deleted. 
now the databases are:") + res = session.execute_query_statement("show databases") + while res.has_next(): + print(res.next()) + session.close() + + +# Create a session pool +username = "root" +password = "root" +node_urls = ["127.0.0.1:6667", "127.0.0.1:6668", "127.0.0.1:6669"] +fetch_size = 1024 +database = "db1" +max_pool_size = 5 +wait_timeout_in_ms = 3000 +config = TableSessionPoolConfig( + node_urls=node_urls, + username=username, + password=password, + database=database, + max_pool_size=max_pool_size, + fetch_size=fetch_size, + wait_timeout_in_ms=wait_timeout_in_ms, +) +session_pool = TableSessionPool(config) + +prepare_data() + +insert_thread1 = threading.Thread(target=insert_data, args=(0,)) +insert_thread2 = threading.Thread(target=insert_data, args=(1,)) + +insert_thread1.start() +insert_thread2.start() + +insert_thread1.join() +insert_thread2.join() + +query_data() +delete_data() +session_pool.close() +print("example is finished!") +``` + diff --git a/src/UserGuide/Master/Table/IoTDB-Introduction/Release-history_timecho.md b/src/UserGuide/Master/Table/IoTDB-Introduction/Release-history_timecho.md new file mode 100644 index 000000000..9cf91d467 --- /dev/null +++ b/src/UserGuide/Master/Table/IoTDB-Introduction/Release-history_timecho.md @@ -0,0 +1,299 @@ + + +### TimechoDB (Database Core) + +#### **V1.3.4.1** + +> **Release Date**: January 8, 2025 +> +> **Download**: Please contact the Timecho team for download. + +Version V1.3.4.1 introduces a pattern-matching function and further optimizes the data subscription mechanism for improved stability. The `import-data` and `export-data` scripts have been enhanced to support additional data types. The `import-data` and `export-data` scripts have been unified, now supporting the import and export of `TsFile`, `CSV`, and `SQL` formats. Meanwhile, comprehensive improvements have been made to database monitoring, performance, and stability. The specific release contents are as follows: + +- **Query** **Module**: Users can configure UDF, PipePlugin, Trigger, and AINode settings and load JAR packages via a URI. +- **System Module**: + - Expansion of UDF, + - Added `pattern_match` function for pattern matching. +- **Data Synchronization**: Supports specifying authentication information on the sender side. +- **Ecosystem Integration**: Kubernetes Operator compatibility. +- **Scripts & Tools**: + - `import-data`/`export-data` scripts now support new data types (strings, large binary objects, dates, timestamps). + - Unified import/export compatibility for TsFile, CSV, and SQL formats. + +#### **V1.3.3.3** + +> **Release Date**: October 31, 2024 +> +> **Download**: Please contact the Timecho team for download. + +Version V1.3.3.3 adds the following features: optimization of restart and recovery performance to reduce startup time; the `DataNode `actively listens for and loads `TsFile` data; addition of observability indicators; once the sender transfers files to a specified directory, the receiver automatically loads them into IoTDB.; the `Alter Pipe` supports the `Alter Source` capability. At the same time, comprehensive improvements have been made to database monitoring, performance, and stability. The specific release contents are as follows: + +- **Data Synchronization**: + - Automatic data type conversion on the receiver side. + - Enhanced observability with ops/latency metrics for internal interfaces. + - OPC-UA-SINK plugin now supports CS mode and non-anonymous access. 
+- **Data Subscription**: SDK supports `CREATE IF NOT EXISTS` and `DROP IF EXISTS` interfaces.
+- **Stream Processing**: `ALTER PIPE` supports `ALTER SOURCE` capability.
+- **System Module**: Added latency monitoring for REST modules.
+- **Scripts & Tools**:
+  - Auto-loading `TsFile` from specified directories.
+  - `import-tsfile` script supports remote server execution.
+  - Added Kubernetes Helm support.
+  - Python client now supports new data types (strings, large binary objects, dates, timestamps).
+
+#### **V1.3.3.2**
+
+> **Release Date**: August 15, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+Version V1.3.3.2 adds reporting of the time spent reading `mods` files, the memory used for merge-sorting out-of-order data during ingestion, and dispatch latency. It also allows the time partition origin to be adjusted through configuration, and supports automatically terminating a subscription once the pipe has finished processing historical data, alongside performance improvements from module-level memory control. The specific release contents are as follows:
+
+- **Query Module**:
+  - `EXPLAIN ANALYZE` now reports time spent reading mods files.
+  - Metrics for merge-sort memory usage and dispatch latency.
+- **Storage Module**: Added configurable time partition origin adjustment.
+- **Stream Processing**: Auto-terminate subscriptions based on pipe history markers.
+- **Data Synchronization**: RPC compression now supports configurable levels.
+- **Scripts & Tools**: Metadata export excludes only `root.__system`, not similar prefixes.
+
+#### **V1.3.3.1**
+
+> **Release Date**: July 12, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+Version V1.3.3.1 adds a throttling mechanism to multi-tier storage and lets the sender's sink specify username and password authentication for the receiver. Unclear WARN logs on the data synchronization receiver side have been cleaned up, restart-recovery performance has been improved to shorten startup time, and the configuration files have been merged. The specific release contents are as follows:
+
+- **Storage Module**: Rate-limiting added to multi-tier storage.
+- **Data Synchronization**: Sender-side username/password authentication for receivers.
+- **System Module**:
+  - Merged configuration files into `iotdb-system.properties`.
+  - Optimized restart recovery time.
+- **Query Module**:
+  - Improved filter performance for aggregation and WHERE clauses.
+  - Java Session client distributes SQL query requests evenly to all nodes.
+
+#### **V1.3.2.2**
+
+> **Release Date**: June 4, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.3.2.2 version introduces the Explain Analyze statement for analyzing the execution time of a single `SQL` query, a User-Defined Aggregate Function (`UDAF`) framework, automatic data deletion when disk space reaches a set threshold, schema synchronization, counting data points in specified paths, and `SQL` script import/export functionality. The cluster management tool now supports rolling upgrades and plugin deployment across the entire cluster. Comprehensive improvements have also been made to database monitoring, performance, and stability. The specific release content is as follows:
+
+**Storage Module:**
+
+- Improved write performance of the `insertRecords` interface.
+- Added `SpaceTL` functionality to automatically delete data when disk space reaches a set threshold.
+
+**Query Module:**
+
+- Added the `Explain Analyze` statement to monitor the execution time of each stage of a single SQL query.
+- Introduced a User-Defined Aggregate Function (UDAF) framework.
+- Added envelope demodulation analysis in UDF.
+- Added `MaxBy/MinBy` functions to return the corresponding timestamp while obtaining the maximum/minimum value.
+- Improved performance of value filter queries.
+
+**Data Synchronization:**
+
+- Path matching now supports wildcards.
+- Schema synchronization is now supported, including time series and related attributes, permissions, and other settings.
+
+**Stream Processing:**
+
+- Added the `Alter Pipe` statement to support hot updates of Pipe task plugins.
+
+**System Module:**
+
+- Enhanced system data point counting to include statistics for `load TsFile`.
+
+**Scripts and Tools:**
+
+- Added a local upgrade backup tool that uses hard links to back up existing data.
+- Introduced `export-data`/`import-data` scripts to support data export as `CSV`, `TsFile`, or `SQL` statements.
+- The Windows environment now supports distinguishing `ConfigNode`, `DataNode`, and `Cli` by window name.
+
+#### **V1.3.1.4**
+
+> **Release Date**: April 23, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.3.1.4 release introduces several new features and enhancements, including the ability to view system activation status, built-in variance and standard deviation aggregate functions, timeout settings for the built-in `Fill` statement, and a `TsFile` repair command. Additionally, one-click scripts for collecting instance information and starting/stopping the cluster have been added. The usability and performance of views and stream processing have also been optimized. The specific release content is as follows:
+
+**Query Module:**
+
+- The `Fill` clause now supports setting a fill timeout threshold; no fill will occur if the time threshold is exceeded.
+- The `REST API` (V2 version) now returns column types.
+
+**Data Synchronization:**
+
+- Simplified the way to specify time ranges for data synchronization by directly setting start and end times.
+- Data synchronization now supports the `SSL` transport protocol (via the `iotdb-thrift-ssl-sink` plugin).
+
+**System Module:**
+
+- Added the ability to query cluster activation information using SQL.
+- Added transmission rate control during data migration in multi-tier storage.
+- Enhanced system observability (added divergence monitoring for cluster nodes and observability for the distributed task scheduling framework).
+- Optimized the default log output strategy.
+
+**Scripts and Tools:**
+
+- Added one-click scripts to start and stop the cluster (`start-all/stop-all.sh & start-all/stop-all.bat`).
+- Added one-click scripts to collect instance information (`collect-info.sh & collect-info.bat`).
+
+#### **V1.3.0.4**
+
+> **Release Date**: January 3, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.3.0.4 release introduces `AINode`, a new native machine learning framework, a comprehensive upgrade of the permission module to support sequence-granularity permissions, and numerous detail optimizations for views and stream processing. These enhancements further improve usability, version stability, and overall performance.
+The specific release content is as follows:
+
+**Query Module:**
+
+- Added the `AINode` native machine learning module.
+- Optimized the performance of the `show path` statement to reduce response time.
+- Non-writable view sequences now support `LAST` queries.
+- Optimized the accuracy of data point monitoring statistics.
+
+**Security Module:**
+
+- Upgraded the permission module to support permission settings at the time-series granularity.
+- Added `SSL` communication encryption between clients and servers.
+
+**Stream Processing:**
+
+- Added multiple new metrics for monitoring in the stream processing module.
+
+#### **V1.2.0.1**
+
+> **Release Date**: June 30, 2023
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.2.0.1 release introduces several new features, including a new stream processing framework, dynamic templates, and built-in query functions such as `substring`, `replace`, and `round`. It also enhances the functionality of built-in statements like `show region`, `show timeseries`, and `show variable`, as well as the Session interface. Additionally, it optimizes built-in monitoring items and their implementation, and fixes several product bugs and performance issues. The specific release content is as follows:
+
+**Stream Processing:**
+
+- Added a new stream processing framework.
+
+**Schema Module:**
+
+- Added dynamic template expansion functionality.
+
+**Storage Module:**
+
+- Added SPRINTZ and RLBE encoding, as well as the LZMA2 compression algorithm.
+
+**Query Module:**
+
+- Added built-in scalar functions: `cast`, `round`, `substr`, `replace`.
+- Added built-in aggregate functions: `time_duration`, `mode`.
+- SQL statements now support `CASE WHEN` syntax.
+- SQL statements now support `ORDER BY` expressions.
+
+**Interface Module:**
+
+- Python API now supports connecting to multiple distributed nodes.
+- Python client now supports write redirection.
+- Session API added an interface for creating sequences in batches using templates.
+
+#### **V1.1.0.1**
+
+> **Release Date**: April 3, 2023
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.1.0.1 release introduces several new features, including support for `GROUP BY VARIATION`, `GROUP BY CONDITION`, and useful functions like `DIFF` and `COUNT_IF`. It also introduces the pipeline execution engine to further improve query speed. Additionally, it fixes several issues related to last query alignment, `LIMIT` and `OFFSET` functionality, metadata template errors after restart, and sequence creation errors after deleting all databases. The specific release content is as follows:
+
+**Query Module:**
+
+- `ALIGN BY DEVICE` statements now support `ORDER BY TIME`.
+- Added support for the `SHOW QUERIES` command.
+- Added support for the `KILL QUERY` command.
+
+**System Module:**
+
+- `SHOW REGIONS` now supports specifying a particular database.
+- Added the `SHOW VARIABLES` SQL command to display current cluster parameters.
+- Aggregation queries now support `GROUP BY VARIATION`.
+- `SELECT INTO` now supports explicit data type conversion.
+- Implemented the built-in scalar function `DIFF`.
+- `SHOW REGIONS` now displays creation time.
+- Implemented the built-in aggregate function `COUNT_IF`.
+- Aggregation queries now support `GROUP BY CONDITION`.
+- Added support for modifying `dn_rpc_port` and `dn_rpc_address`.
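+
+To make a few of these query additions concrete, the following is a brief, illustrative sketch of the new statements (the series `root.db.d1.temperature` and the query ID are purely illustrative; refer to the query documentation for exact semantics):
+
+```SQL
+-- List currently running queries, then terminate one by its ID
+SHOW QUERIES
+KILL QUERY '20221205_114444_00003_5'
+
+-- COUNT_IF: count runs of at least 2 consecutive rows satisfying the predicate
+SELECT COUNT_IF(temperature > 30, 2) FROM root.db.d1
+
+-- GROUP BY VARIATION: start a new group whenever the value deviates by more than 0.5
+SELECT __endTime, AVG(temperature) FROM root.db.d1 GROUP BY VARIATION(temperature, 0.5)
+```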
+
+#### **V1.0.0.1**
+
+> **Release Date**: December 3, 2022
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.0.0.1 release focuses on fixing issues related to partition computation and query execution, undeleted historical snapshots, data query problems, and SessionPool memory usage. It also introduces several new features, such as support for `SHOW VARIABLES`, `EXPLAIN ALIGN BY DEVICE`, and enhanced functionality for ExportCSV/ExportTsFile/MQTT. Additionally, it improves the cluster startup/shutdown process, changes the default internal ports of the IoTDB cluster, and adds the `cluster_name` attribute to distinguish clusters. The specific release content is as follows:
+
+**System Module:**
+
+- Added support for distributed high-availability architecture.
+- Added support for multi-replica storage.
+- If a port is already in use, the node startup process will be terminated.
+- Added cluster management SQL.
+- Added functional management for starting, stopping, and removing ConfigNodes and DataNodes.
+- Configurable consensus protocol framework and multiple consensus protocols: Simple, IoTConsensus, Ratis.
+- Added multi-replica management for data, schema, and ConfigNodes.
+
+**Query Module:**
+
+- Added support for the large-scale parallel processing framework MPP, providing distributed read/write capabilities.
+
+**Stream Processing Module:**
+
+- Added support for the stream processing framework.
+- Added support for data synchronization between clusters.
+
+### Workbench (Console Tool)
+
+| Version | Key New Features                          | Supported IoTDB Versions |
+| :------ | :---------------------------------------- | :----------------------- |
+| V1.5.1  | AI analysis, pattern matching             | V1.3.2+                  |
+| V1.4.0  | Tree model visualization, English UI      | V1.3.2+                  |
+| V1.3.1  | Enhanced analysis templates               | V1.3.2+                  |
+| V1.3.0  | Database configuration tools              | V1.3.2+                  |
+| V1.2.6  | Improved permission controls              | V1.3.1+                  |
+| V1.2.5  | Template caching, UI optimizations        | V1.3.0+                  |
+| V1.2.4  | Data import/export, time alignment        | V1.2.2+                  |
+| V1.2.3  | Activation details, analysis tools        | V1.2.2+                  |
+| V1.2.2  | Enhanced measurement point descriptions   | V1.2.2+                  |
+| V1.2.1  | Sync monitoring panel, Prometheus alerts  | V1.2.2+                  |
+| V1.2.0  | Major Workbench upgrade                   | V1.2.0+                  |
\ No newline at end of file
diff --git a/src/UserGuide/Master/Table/IoTDB-Introduction/Scenario.md b/src/UserGuide/Master/Table/IoTDB-Introduction/Scenario.md
new file mode 100644
index 000000000..6709c7009
--- /dev/null
+++ b/src/UserGuide/Master/Table/IoTDB-Introduction/Scenario.md
@@ -0,0 +1,80 @@
+
+
+## Scenario 1: Energy & Power
+
+#### **Background**
+
+By collecting, storing, and analyzing massive time-series data from power generation, transmission, storage, and consumption processes—combined with real-time monitoring, accurate forecasting, and intelligent scheduling of power systems—enterprises can significantly improve energy efficiency, reduce operational costs, ensure the safety and sustainability of energy production, and maintain the stable operation of power grids.
+
+#### **Architecture**
+
+IoTDB provides a self-hosted time-series database solution with high availability, efficient data synchronization across networks, and optimized performance for large-scale data ingestion and querying.
It enables power enterprises to handle large-scale time-series data efficiently, supporting real-time anomaly detection, forecasting models, and intelligent scheduling for both traditional and renewable energy sources. + +![](/img/scenario-energy-en.png) + +## Scenario 2: Aerospace + +#### **Background** + +With the rapid evolution of aerospace technology, digital transformation has become essential to improving flight safety and system performance. The aerospace industry generates vast amounts of time-series data throughout the lifecycle of aircraft, rockets, and satellites—from design and manufacturing to testing and operation. Managing and analyzing telemetry data in real time is critical for mission reliability, system optimization, and failure prevention. + +#### **Architecture** + +IoTDB’s high-performance time-series data processing capabilities enable real-time telemetry analysis, low-bandwidth data synchronization, and seamless offline data migration. Its flexible deployment and resource-efficient architecture provide a reliable foundation for aerospace enterprises, facilitating intelligent monitoring, rapid fault diagnosis, and continuous optimization of critical systems. + +![](/img/scenario-aerospace-en.png) + +## Scenario 3: Transportation + +#### **Background** + +The rapid growth of the transportation industry has heightened demand for diversified data management, particularly in critical hubs like railways and subways, where real-time, reliable, and precise data is essential. By leveraging multi-dimensional operational, condition, and geospatial data from trains, subways, ships, and vehicles, enterprises can enable intelligent scheduling, fault prediction, route optimization, and efficient maintenance. These capabilities not only improve operational efficiency but also reduce management costs. + +#### **Architecture** + +IoTDB’s high-throughput time-series database supports low-latency queries, high concurrency, and efficient processing of multi-source heterogeneous data. It provides a scalable foundation for intelligent transportation systems, enabling real-time analytics for vehicle monitoring, traffic flow optimization, and predictive fault detection across large-scale transportation networks. + +![](/img/scenario-transportation-en.png) + +## Scenario 4: Steel & Metallurgy + +#### **Background** + +Facing increasing market competition and stringent environmental regulations, the steel and metallurgy industry is undergoing digital transformation. Industrial IoT platforms play a crucial role in optimizing production efficiency, improving product quality, and reducing energy consumption. Real-time data collection and analysis across smelting equipment, production lines, and supply chains enable intelligent monitoring, predictive maintenance, and precise process control. + +#### **Architecture** + +IoTDB’s powerful data storage and computing capabilities provide cross-platform compatibility, lightweight deployment options, and robust integration with industrial automation systems. Its ability to efficiently handle high-frequency time-series data empowers steel and metallurgy enterprises to implement smart manufacturing solutions and accelerate digitalization. + +![](/img/scenario-steel-en.png) + +## Scenario 5: IoT + +#### **Background** + +The Internet of Things (IoT) is driving digital transformation across industries by enabling real-time device connectivity and intelligent management. 
+As IoT deployments scale, enterprises require a time-series data management system capable of processing vast data streams from edge devices to the cloud. Ensuring high-performance data storage, fast querying, and reliable synchronization is crucial for applications such as equipment monitoring, anomaly detection, and predictive maintenance.
+
+#### **Architecture**
+
+As an IoT-native high-performance time-series database, IoTDB supports end-to-end data synchronization and analysis from edge devices to the cloud. With high-concurrency processing capabilities, it meets the demands of large-scale device connectivity. IoTDB provides flexible data solutions to unlock deeper insights from operational data, improve efficiency, and drive comprehensive IoT business growth.
+
+![](/img/scenario-iot-en.png)
\ No newline at end of file
diff --git a/src/UserGuide/Master/Table/IoTDB-Introduction/What-is-timechodb_timecho.md b/src/UserGuide/Master/Table/IoTDB-Introduction/What-is-timechodb_timecho.md
new file mode 100644
index 000000000..0113ec4b5
--- /dev/null
+++ b/src/UserGuide/Master/Table/IoTDB-Introduction/What-is-timechodb_timecho.md
@@ -0,0 +1,297 @@
+
+
+TimechoDB is a high-performance, cost-efficient, and IoT-native time-series database developed by Timecho. As an enterprise-grade extension of Apache IoTDB, it is designed to tackle the complexities of managing large-scale time-series data in IoT environments. These challenges include high-frequency data sampling, massive data volumes, out-of-order data, extended processing times, diverse analytical demands, and high storage and maintenance costs.
+
+TimechoDB enhances Apache IoTDB with superior functionality, optimized performance, enterprise-grade reliability, and an intuitive toolset, enabling industrial users to streamline data operations and unlock deeper insights.
+
+- [Quick Start](../QuickStart/QuickStart_timecho.md): Download, Deploy, and Use
+
+## TimechoDB Data Management Solution
+
+The Timecho ecosystem provides an integrated **collect-store-use** solution, covering the complete lifecycle of time-series data, from acquisition to analysis.
+
+![](/img/Introduction-en-timecho-new.png)
+
+Key components include:
+
+1. **Time-Series Database (TimechoDB)**:
+   1. The primary storage and processing engine for time-series data, based on Apache IoTDB.
+   2. Offers **high compression, advanced query capabilities, real-time stream processing, high availability, and scalability**.
+   3. Provides **security features, multi-language APIs, and seamless integration with external systems**.
+2. **Time-Series Standard File Format (Apache TsFile)**:
+   1. A high-performance storage format originally developed by Timecho’s core contributors.
+   2. Enables **efficient compression and fast querying**.
+   3. Powers TimechoDB’s **data collection, storage, and analysis pipeline**, ensuring unified data management.
+3. **Time-Series AI Engine (AINode)**:
+   1. Integrates **machine learning and deep learning** for time-series analytics.
+   2. Extracts actionable insights directly from TimechoDB-stored data.
+4. **Data Collection Framework**:
+   1. Supports **various industrial protocols, resumable transfers, and network barrier penetration**.
+   2. Facilitates **reliable data acquisition in challenging industrial environments**.
+
+## TimechoDB Architecture
+
+The diagram below illustrates a common cluster deployment (3 ConfigNodes, 3 DataNodes) of TimechoDB:
+
+![](/img/Cluster-Concept03.png)
+
+### Key Features
+
+TimechoDB offers the following advantages:
+
+**Flexible Deployment:**
+
+- Supports one-click cloud deployment, on-premise installation, and seamless terminal-cloud synchronization.
+- Adapts to hybrid, edge, and cloud-native architectures.
+
+**Cost-Efficient Storage:**
+
+- Utilizes high compression ratio storage, eliminating the need for separate real-time and historical databases.
+- Supports unified data management across different time horizons.
+
+**Hierarchical Data Organization:**
+
+- Mirrors real-world industrial structures through hierarchical measurement point modeling.
+- Enables directory-based navigation, search, and retrieval.
+
+**High-Throughput Read & Write:**
+
+- Optimized for millions of concurrent device connections.
+- Handles multi-frequency and out-of-order data ingestion with high efficiency.
+
+**Advanced Time-Series Query Semantics:**
+
+- Features a native time-series computation engine with built-in timestamp alignment.
+- Provides nearly 100 aggregation and analytical functions, enabling AI-powered time-series insights.
+
+**Enterprise-Grade High Availability:**
+
+- Distributed HA architecture ensures 24/7 real-time database services.
+- Automated resource balancing when nodes are added, removed, or overloaded.
+- Supports heterogeneous clusters with varying hardware configurations.
+
+**Operational Simplicity:**
+
+- Standard SQL query syntax for ease of use.
+- Multi-language APIs for flexible development.
+- Comes with a comprehensive toolset, including an intuitive management console.
+
+**Robust Ecosystem Integration:**
+
+- Seamlessly integrates with big data frameworks (Hadoop, Spark) and visualization tools (Grafana, ThingsBoard, DataEase).
+- Supports device management for industrial IoT environments.
+
+### Enterprise-level Enhancements
+
+TimechoDB extends the open-source version with advanced industrial-grade capabilities, including tiered storage, cloud-edge collaboration, visualization tools, and security upgrades.
+
+**Dual-Active Deployment:**
+
+- Implements active-active high availability, ensuring continuous operations.
+- Two independent clusters perform real-time bidirectional synchronization.
+- Both systems accept external writes and maintain eventual consistency.
+
+**Seamless Data Synchronization:**
+
+- Built-in synchronization module supports real-time and batch data aggregation from field devices to central hubs.
+- Supports full, partial, and cascading aggregation.
+- Includes enterprise-ready plugins for cross air-gap transmission, encrypted transmission, and compression.
+
+**Tiered Storage:**
+
+- Dynamically categorizes data into hot, warm, and cold tiers.
+- Efficiently balances SSD, HDD, and cloud storage utilization.
+- Automatically optimizes data access speed and storage costs.
+
+**Enhanced Security:**
+
+- Implements whitelist-based access control and audit logging.
+- Strengthens data governance and risk mitigation.
+
+**Feature Comparison**:
+
+| Category               | Function                                 | Apache IoTDB                                   | TimechoDB                                                                |
+| :--------------------- | :--------------------------------------- | :--------------------------------------------- | :----------------------------------------------------------------------- |
+| Deployment Mode        | Stand-Alone Deployment                   | √                                              | √                                                                        |
+|                        | Distributed Deployment                   | √                                              | √                                                                        |
+|                        | Dual Active Deployment                   | ×                                              | √                                                                        |
+|                        | Container Deployment                     | Partial support                                | √                                                                        |
+| Database Functionality | Sensor Management                        | √                                              | √                                                                        |
+|                        | Write Data                               | √                                              | √                                                                        |
+|                        | Query Data                               | √                                              | √                                                                        |
+|                        | Continuous Query                         | √                                              | √                                                                        |
+|                        | Trigger                                  | √                                              | √                                                                        |
+|                        | User Defined Function                    | √                                              | √                                                                        |
+|                        | Permission Management                    | √                                              | √                                                                        |
+|                        | Data Synchronization                     | Only file synchronization, no built-in plugins | Real-time synchronization + file synchronization, with built-in plugins  |
+|                        | Stream Processing                        | Only framework, no built-in plugins            | Framework + rich built-in plugins                                        |
+|                        | Tiered Storage                           | ×                                              | √                                                                        |
+|                        | View                                     | ×                                              | √                                                                        |
+|                        | White List                               | ×                                              | √                                                                        |
+|                        | Audit Log                                | ×                                              | √                                                                        |
+| Supporting Tools       | Workbench                                | ×                                              | √                                                                        |
+|                        | Cluster Management Tool                  | ×                                              | √                                                                        |
+|                        | System Monitor Tool                      | ×                                              | √                                                                        |
+| Localization           | Localization Compatibility Certification | ×                                              | √                                                                        |
+| Technical Support      | Best Practices                           | ×                                              | √                                                                        |
+|                        | Use Training                             | ×                                              | √                                                                        |
+
+#### Higher Efficiency and Stability
+
+TimechoDB achieves up to 10x performance improvements over Apache IoTDB in mission-critical workloads, and provides rapid fault recovery for industrial environments.
+
+#### Comprehensive Management Tools
+
+TimechoDB simplifies deployment, monitoring, and maintenance through an intuitive toolset:
+
+- **Cluster Monitoring Dashboard**:
+  - Real-time insights into IoTDB and underlying OS health.
+  - 100+ performance metrics for in-depth monitoring and optimization.
+
+  ![](/img/Introduction01.png)
+
+  ![](/img/Introduction02.png)
+
+  ![](/img/Introduction03.png)
+
+- **Database Console**:
+  - Simplifies interaction with an intuitive GUI for metadata management, SQL execution, user permissions, and system configuration.
+- **Cluster Management Tool**:
+  - Provides **one-click operations** for cluster deployment, scaling, start/stop, and configuration updates.
+
+#### Professional Enterprise Technical Services
+
+TimechoDB offers **vendor-backed enterprise services** to support industrial-scale deployments:
+
+- **On-Site Installation & Training**: Hands-on guidance for fast adoption.
+- **Expert Consulting & Advisory**: Performance tuning and best practices.
+- **Emergency Support & Remote Assistance**: Minimized downtime for mission-critical operations.
+- **Custom Development & Optimization**: Tailored solutions for unique industrial use cases.
+
+Compared to the open-source version’s 2-3 month release cycle, TimechoDB delivers faster updates and same-day critical issue resolutions, ensuring production stability.
+
+#### Ecosystem Compatibility & Compliance
+
+TimechoDB is self-developed, supports mainstream CPUs & operating systems, and meets industry compliance standards, making it a reliable choice for enterprise IoT deployments.
\ No newline at end of file
diff --git a/src/UserGuide/Master/Table/Technical-Insider/Cluster-data-partitioning.md b/src/UserGuide/Master/Table/Technical-Insider/Cluster-data-partitioning.md
new file mode 100644
index 000000000..2a3f54fe7
--- /dev/null
+++ b/src/UserGuide/Master/Table/Technical-Insider/Cluster-data-partitioning.md
@@ -0,0 +1,125 @@
+
+
+This document introduces the partitioning and load balancing strategies in IoTDB. Based on the characteristics of time-series data, IoTDB partitions it along the series and time dimensions. Combining a series partition with a time partition creates a partition, the basic unit of division. To enhance throughput and reduce management costs, these partitions are evenly allocated to RegionGroups, which serve as the unit of replication. The RegionGroup's Regions then determine the storage location, with the leader Region managing the primary load. During this process, the Region placement strategy determines which nodes will host the replicas, while the leader selection strategy designates which Region will act as the leader.
+
+### Partitioning Strategy and Partition Allocation
+
+IoTDB implements a tailored partitioning algorithm for time-series data. Based on this, the partition information cached on the ConfigNode and DataNode is not only easy to manage but also clearly distinguishes between hot and cold data. Subsequently, balanced partitions are evenly distributed across the RegionGroups in the cluster to achieve storage balance.
+
+#### Partitioning Strategy
+
+IoTDB maps each sensor in a production environment to a time series.
+It then uses a **series partitioning algorithm** to partition the time series for schema management and a **time partitioning algorithm** to manage the data. The figure below illustrates how IoTDB partitions time-series data.
+
+![](/img/partition_table_en.png)
+
+##### Partitioning Algorithms
+
+Since a large number of devices and sensors are typically deployed in production environments, IoTDB uses a series partitioning algorithm to ensure that the size of partition information remains manageable. As the generated time series are associated with timestamps, IoTDB uses a time partitioning algorithm to clearly distinguish between hot and cold partitions.
+
+###### Series Partitioning Algorithm
+
+By default, IoTDB limits the number of series partitions to 1,000 and configures the series partitioning algorithm as a **hash partitioning algorithm**. This provides the following benefits:
+
+- The number of series partitions is a fixed constant, ensuring stable mapping between series and series partitions. Thus, IoTDB does not require frequent data migration.
+- The load on series partitions is relatively balanced, as the number of series partitions is much smaller than the number of sensors deployed in production environments.
+
+Furthermore, if the actual load in the production environment can be estimated more accurately, the series partitioning algorithm can be configured as a custom hash or list partitioning algorithm to achieve a more uniform load distribution across all series partitions.
+
+###### Time Partitioning Algorithm
+
+The time partitioning algorithm converts a given timestamp into the corresponding time partition using the following formula:
+
+$$\left\lfloor\frac{\text{Timestamp} - \text{StartTimestamp}}{\text{TimePartitionInterval}}\right\rfloor$$
+
+In this formula, $\text{StartTimestamp}$ and $\text{TimePartitionInterval}$ are configurable parameters to adapt to different production environments. $\text{StartTimestamp}$ represents the start time of the first time partition, while $\text{TimePartitionInterval}$ defines the duration of each time partition. By default, $\text{TimePartitionInterval}$ is set to seven days.
+
+##### Schema Partitioning
+
+Since the series partitioning algorithm evenly partitions the time series, each series partition corresponds to a schema partition. These schema partitions are then evenly distributed across **SchemaRegionGroups** to achieve balanced schema distribution.
+
+##### Data Partitioning
+
+Data partitions are created by combining series partitions and time partitions. Since the series partitioning algorithm evenly partitions the time series, the load of data partitions within a specific time partition remains balanced. These data partitions are then evenly distributed across **DataRegionGroups** to achieve balanced data distribution.
+
+#### Partition Allocation
+
+IoTDB uses RegionGroups to achieve elastic storage for time-series data. The number of RegionGroups in the cluster is determined by the total resources of all DataNodes. Since the number of RegionGroups is dynamic, IoTDB can easily scale. Both SchemaRegionGroups and DataRegionGroups follow the same partition allocation algorithm, which evenly divides all series partitions. The figure below illustrates the partition allocation process, where dynamically expanding RegionGroups match the continuously expanding time series and cluster.
+
+![](/img/partition_allocation_en.png)
+
+##### RegionGroup Expansion
+
+The number of RegionGroups is given by the following formula:
+
+$$\text{RegionGroupNumber} = \left\lfloor\frac{\sum_{i=1}^{\text{DataNodeNumber}} \text{RegionNumber}_i}{\text{ReplicationFactor}}\right\rfloor$$
+
+In this formula, $\text{RegionNumber}_i$ represents the number of Regions expected to be hosted on the $i$-th DataNode, and $\text{ReplicationFactor}$ denotes the number of Regions within each RegionGroup. Both $\text{RegionNumber}_i$ and $\text{ReplicationFactor}$ are configurable parameters. $\text{RegionNumber}_i$ can be determined based on the available hardware resources (e.g., CPU cores, memory size) on the $i$-th DataNode to adapt to different physical servers. $\text{ReplicationFactor}$ can be adjusted to ensure different levels of fault tolerance.
+
+##### Allocation Strategy
+
+Both the SchemaRegionGroup and the DataRegionGroup follow the same allocation algorithm: splitting all series partitions evenly. As a result, each SchemaRegionGroup holds the same number of schema partitions, ensuring balanced schema storage. Similarly, for each time partition, each DataRegionGroup acquires the data partitions corresponding to the series partitions it holds. Consequently, the data partitions within a time partition are evenly distributed across all DataRegionGroups, ensuring balanced data storage in each time partition.
+
+Notably, IoTDB effectively leverages the characteristics of time-series data. When the TTL (Time to Live) is configured, IoTDB enables migration-free elastic storage for time-series data. This feature facilitates cluster expansion while minimizing the impact on online operations. The figures above illustrate an instance of this feature: newly created data partitions are evenly allocated to each DataRegionGroup, and expired data is automatically archived. As a result, the cluster's storage will eventually remain balanced.
+
+### Load Balancing Strategies
+
+To improve cluster availability and performance, IoTDB employs carefully designed storage balancing and computation balancing algorithms.
+
+#### Storage Balancing
+
+The number of Regions held by a DataNode reflects its storage load. If the number of Regions varies significantly between DataNodes, the DataNode with more Regions may become a storage bottleneck. Although a simple Round Robin placement algorithm can achieve storage balancing by ensuring each DataNode holds an equal number of Regions, it reduces the cluster's fault tolerance, as shown below:
+
+![](/img/placement_en.png)
+
+- Assume the cluster has 4 DataNodes, 4 RegionGroups, and a replication factor of 2.
+- Place the 2 Regions of RegionGroup $r_1$ on DataNodes $n_1$ and $n_2$.
+- Place the 2 Regions of RegionGroup $r_2$ on DataNodes $n_3$ and $n_4$.
+- Place the 2 Regions of RegionGroup $r_3$ on DataNodes $n_1$ and $n_3$.
+- Place the 2 Regions of RegionGroup $r_4$ on DataNodes $n_2$ and $n_4$.
+
+In this scenario, if DataNode $n_2$ fails, the load previously handled by DataNode $n_2$ would be transferred solely to DataNode $n_1$, potentially overloading it.
+
+To address this issue, IoTDB employs a Region placement algorithm that not only evenly distributes Regions across all DataNodes but also ensures that each DataNode can offload its storage to sufficient other DataNodes in the event of a failure. As a result, the cluster achieves balanced storage distribution and a high level of fault tolerance, ensuring its availability.
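+
+As a quick sanity check, the placement example above is consistent with the RegionGroup sizing formula from the previous subsection. Assuming each of the 4 DataNodes is expected to host $\text{RegionNumber}_i = 2$ Regions (a value inferred from the figure, not stated explicitly in the text):
+
+$$\text{RegionGroupNumber} = \left\lfloor\frac{2 + 2 + 2 + 2}{2}\right\rfloor = \left\lfloor\frac{8}{2}\right\rfloor = 4,$$
+
+which matches the 4 RegionGroups used in the scenario.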
+
+#### Computation Balancing
+
+The number of leader Regions held by a DataNode reflects its computing load. If the difference in the number of leaders across DataNodes is relatively large, the DataNode with more leaders is likely to become a computing bottleneck. If the leader selection process is conducted using a naive greedy algorithm, the result may be an unbalanced leader distribution when the Regions are fault-tolerantly placed, as demonstrated below:
+
+![](/img/selection_en.png)
+
+- Assume the cluster has 4 DataNodes, 4 RegionGroups, and a replication factor of 2.
+- Select the Region of RegionGroup $r_5$ on DataNode $n_5$ as the leader.
+- Select the Region of RegionGroup $r_6$ on DataNode $n_7$ as the leader.
+- Select the Region of RegionGroup $r_7$ on DataNode $n_7$ as the leader.
+- Select the Region of RegionGroup $r_8$ on DataNode $n_8$ as the leader.
+
+Note that the above steps strictly follow the greedy algorithm. However, by step 3, selecting the leader of RegionGroup $r_7$ on either DataNode $n_5$ or $n_7$ would result in an uneven leader distribution. The root cause is that each greedy selection step lacks a global perspective, ultimately leading to a local optimum.
+
+To address this issue, IoTDB adopts a **leader selection algorithm** that continuously balances the distribution of leaders across the cluster. As a result, the cluster achieves balanced computation load distribution, ensuring its performance.
+
+### Source Code
+
+- [Data Partitioning](https://github.com/apache/iotdb/tree/master/iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/partition)
+- [Partition Allocation](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/partition)
+- [Region Placement](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/region)
+- [Leader Selection](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/router/leader)
\ No newline at end of file
diff --git a/src/UserGuide/Master/Table/Technical-Insider/Encoding-and-Compression.md b/src/UserGuide/Master/Table/Technical-Insider/Encoding-and-Compression.md
new file mode 100644
index 000000000..d1546bead
--- /dev/null
+++ b/src/UserGuide/Master/Table/Technical-Insider/Encoding-and-Compression.md
@@ -0,0 +1,126 @@
+
+
+IoTDB employs various encoding and compression techniques to enhance storage efficiency and reduce I/O operations during data writing and reading. Below is a detailed explanation of the supported encoding and compression methods.
+
+## **Encoding Methods**
+
+IoTDB supports multiple encoding methods tailored for different data types to optimize storage and performance.
+
+1. PLAIN
+
+The default encoding method, meaning no encoding is applied. It supports multiple data types and offers high time efficiency in encoding and decoding, but relatively low storage efficiency.
+
+2. TS_2DIFF
+
+Second-order differential encoding (TS_2DIFF) is suitable for encoding monotonically increasing or decreasing sequences. It is not ideal for encoding data with significant fluctuations.
+
+3. RLE
+
+Run-Length Encoding (RLE) is ideal for sequences where certain values appear consecutively. It is not effective for sequences where most consecutive values differ.
+
+RLE can also encode floating-point numbers, but the decimal precision must be specified when the time series is created. It is suitable for storing sequences in which the same floating-point values appear consecutively, and is not recommended for sequences requiring high decimal precision or those with large fluctuations.
+
+> Both RLE and TS_2DIFF encoding for `float` and `double` have precision limitations, with a default of two decimal places. GORILLA encoding is recommended instead.
+
+4. GORILLA
+
+A lossless encoding method suitable for sequences where consecutive values are close to each other. It is not effective for data with large fluctuations.
+
+Currently, there are two versions of the GORILLA encoding implementation; it is recommended to use `GORILLA` instead of `GORILLA_V1` (which is deprecated).
+
+Usage restrictions:
+
+- When using GORILLA encoding for `INT32` data, ensure that the sequence does not contain values equal to `Integer.MIN_VALUE`.
+- When using GORILLA encoding for `INT64` data, ensure that the sequence does not contain values equal to `Long.MIN_VALUE`.
+
+5. DICTIONARY
+
+A lossless encoding method suitable for data with low cardinality (i.e., a limited number of unique values). It is not recommended for high-cardinality data.
+
+6. ZIGZAG
+
+Maps signed integers to unsigned integers, making it suitable for small integer values.
+
+7. CHIMP
+
+A lossless encoding method designed for streaming floating-point data compression. It is efficient for sequences with small variations and low random noise.
+
+Usage restrictions:
+
+- When using CHIMP encoding for `INT32` data, ensure that the sequence does not contain values equal to `Integer.MIN_VALUE`.
+- When using CHIMP encoding for `INT64` data, ensure that the sequence does not contain values equal to `Long.MIN_VALUE`.
+
+8. SPRINTZ
+
+A lossless encoding method combining prediction, ZigZag encoding, bit packing, and run-length encoding. It is best suited for time-series data with small absolute differences (i.e., low fluctuation) and is not effective for data with large variations.
+
+9. RLBE
+
+A lossless encoding method combining differential encoding, bit packing, run-length encoding, Fibonacci encoding, and concatenation. It is suitable for time-series data with a small and steadily increasing trend but is not effective for highly fluctuating data.
+
+### **Data Types and Supported Encoding Methods**
+
+The following table summarizes the recommended and supported encoding methods for each data type:
+
+| **Data Type** | **Recommended Encoding** | **Supported Encoding Methods**                              |
+| :------------ | :----------------------- | :---------------------------------------------------------- |
+| BOOLEAN       | RLE                      | PLAIN, RLE                                                   |
+| INT32         | TS_2DIFF                 | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE  |
+| DATE          | TS_2DIFF                 | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE  |
+| INT64         | TS_2DIFF                 | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE  |
+| TIMESTAMP     | TS_2DIFF                 | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE  |
+| FLOAT         | GORILLA                  | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE          |
+| DOUBLE        | GORILLA                  | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE          |
+| TEXT          | PLAIN                    | PLAIN, DICTIONARY                                            |
+| STRING        | PLAIN                    | PLAIN, DICTIONARY                                            |
+| BLOB          | PLAIN                    | PLAIN                                                        |
+
+**Error Handling**: If the data type entered by the user does not match the specified encoding method, the system will display an error message.
+For example:
+
+```Plain
+IoTDB> create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF
+Msg: 507: encoding TS_2DIFF does not support BOOLEAN
+```
+
+## **Compression Methods**
+
+When the time series is written and encoded as binary data according to the specified type, IoTDB applies compression techniques to further enhance storage efficiency. While both encoding and compression aim to optimize storage, encoding techniques are typically designed for specific data types (e.g., second-order differential encoding is only suitable for INT32 or INT64, and storing floating-point numbers requires multiplying them by 10ⁿ to convert them into integers) before converting the data into a binary stream. Compression methods like SNAPPY operate on the binary stream, making them independent of the data type.
+
+### **Supported Compression Methods**
+
+IoTDB allows specifying the compression method of a column when creating a time series. Currently, IoTDB supports the following compression methods:
+
+- UNCOMPRESSED
+- SNAPPY
+- LZ4 (Recommended)
+- GZIP
+- ZSTD
+- LZMA2
+
+### **Compression Ratio Statistics**
+
+IoTDB provides compression ratio statistics to monitor the effectiveness of compression. The statistics are stored in `data/datanode/system/compression_ratio`:
+
+- `ratio_sum`: The total sum of memtable compression ratios.
+- `memtable_flush_time`: The total number of memtable flushes.
+
+The average compression ratio can be calculated as `Average Compression Ratio = ratio_sum / memtable_flush_time`.
\ No newline at end of file
diff --git a/src/UserGuide/Master/Table/User-Manual/Data-Sync_timecho.md b/src/UserGuide/Master/Table/User-Manual/Data-Sync_timecho.md
new file mode 100644
index 000000000..90bc5bc4f
--- /dev/null
+++ b/src/UserGuide/Master/Table/User-Manual/Data-Sync_timecho.md
@@ -0,0 +1,525 @@
+
+
+Data synchronization is a typical requirement in the Industrial Internet of Things (IIoT). Through data synchronization mechanisms, data sharing between IoTDB instances can be achieved, enabling the establishment of a complete data pipeline to meet needs such as internal and external network data exchange, edge-to-cloud synchronization, data migration, and data backup.
+
+# Functional Overview
+
+## Data Synchronization
+
+A data synchronization task consists of three stages:
+
+![](/img/en_dataSync01.png)
+
+- Source Stage: This stage is used to extract data from the source IoTDB, defined in the `source` section of the SQL statement.
+- Process Stage: This stage is used to process the data extracted from the source IoTDB, defined in the `processor` section of the SQL statement.
+- Sink Stage: This stage is used to send data to the target IoTDB, defined in the `sink` section of the SQL statement.
+
+By declaratively configuring these three parts in an SQL statement, flexible data synchronization capabilities can be achieved.
+
+## Functional Limitations and Notes
+
+- Supports data synchronization from IoTDB version 1.x series to version 2.x and later.
+- Does not support data synchronization from IoTDB version 2.x series to version 1.x series.
+- When performing data synchronization tasks, avoid executing any deletion operations to prevent inconsistencies between the two ends.
+
+# Usage Instructions
+
+A data synchronization task can be in one of three states: RUNNING, STOPPED, and DROPPED. The state transitions of the task are illustrated in the diagram below:
+
+![](/img/Data-Sync02.png)
+
+After creation, the task will start directly.
+Additionally, if the task stops due to an exception, the system will automatically attempt to restart it.
+
+We provide the following SQL statements for managing the state of synchronization tasks.
+
+## Create a Task
+
+Use the `CREATE PIPE` statement to create a data synchronization task. Among the following attributes, `PipeId` and `sink` are required, while `source` and `processor` are optional. Note that the order of the `SOURCE` and `SINK` plugins cannot be swapped when writing the SQL.
+
+SQL Example:
+
+```SQL
+CREATE PIPE [IF NOT EXISTS] <PipeId> -- PipeId is a unique name identifying the task
+-- Data extraction plugin (optional)
+WITH SOURCE (
+  [<parameter> = <value>,],
+)
+-- Data processing plugin (optional)
+WITH PROCESSOR (
+  [<parameter> = <value>,],
+)
+-- Data transmission plugin (required)
+WITH SINK (
+  [<parameter> = <value>,],
+)
+```
+
+**IF NOT EXISTS Semantics**: Ensures that the creation command is executed only if the specified Pipe does not exist, preventing errors caused by attempting to create an already existing Pipe.
+
+## Start a Task
+
+After creation, the task directly enters the RUNNING state and does not require manual startup. However, if the task is stopped using the `STOP PIPE` statement, you need to manually start it using the `START PIPE` statement. If the task stops due to an exception, it will automatically restart to resume data processing:
+
+```SQL
+START PIPE <PipeId>
+```
+
+## Stop a Task
+
+To stop data processing:
+
+```SQL
+STOP PIPE <PipeId>
+```
+
+## Delete a Task
+
+To delete a specified task:
+
+```SQL
+DROP PIPE [IF EXISTS] <PipeId>
+```
+
+**IF EXISTS Semantics**: Ensures that the deletion command is executed only if the specified Pipe exists, preventing errors caused by attempting to delete a non-existent Pipe. **Note**: Deleting a task does not require stopping the synchronization task first.
+
+## View Tasks
+
+To view all tasks:
+
+```SQL
+SHOW PIPES
+```
+
+To view a specific task:
+
+```SQL
+SHOW PIPE <PipeId>
+```
+
+Example Output of `SHOW PIPES`:
+
+```SQL
++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+
+|                              ID|           CreationTime|  State|PipeSource|PipeProcessor|                                                    PipeSink|ExceptionMessage|RemainingEventCount|EstimatedRemainingSeconds|
++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+
+|59abf95db892428b9d01c5fa318014ea|2024-06-17T14:03:44.189|RUNNING|        {}|           {}|{sink=iotdb-thrift-sink, sink.ip=127.0.0.1, sink.port=6668}|                |                128|                     1.03|
++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+
+```
+
+**Column Descriptions**:
+
+- **ID**: Unique identifier of the synchronization task.
+- **CreationTime**: Time when the task was created.
+- **State**: Current state of the task.
+- **PipeSource**: Source of the data stream.
+- **PipeProcessor**: Processing logic applied during data transmission.
+- **PipeSink**: Destination of the data stream.
+- **ExceptionMessage**: Displays exception information for the task.
+- **RemainingEventCount** (statistics may have delays): Number of remaining events, including data and metadata synchronization events, as well as system and user-defined events.
+- **EstimatedRemainingSeconds** (statistics may have delays): Estimated remaining time to complete the transmission based on the current event count and pipe processing rate.
+
+## Synchronization Plugins
+
+To make the architecture more flexible and adaptable to different synchronization scenarios, IoTDB supports plugin assembly in the synchronization task framework. The system provides some common pre-installed plugins, and you can also customize `processor` and `sink` plugins and load them into the IoTDB system.
+
+To view the plugins available in the system (including custom and built-in plugins), use the following statement:
+
+```SQL
+SHOW PIPEPLUGINS
+```
+
+Example Output:
+
+```SQL
+IoTDB> SHOW PIPEPLUGINS
++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+
+|                    PluginName|PluginType|                                                                                           ClassName|                                           PluginJar|
++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+
+|          DO-NOTHING-PROCESSOR|   Builtin|                org.apache.iotdb.commons.pipe.plugin.builtin.processor.donothing.DoNothingProcessor|                                                    |
+|               DO-NOTHING-SINK|   Builtin|                org.apache.iotdb.commons.pipe.plugin.builtin.connector.donothing.DoNothingConnector|                                                    |
+|            IOTDB-AIR-GAP-SINK|   Builtin|           org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.airgap.IoTDBAirGapConnector|                                                    |
+|                  IOTDB-SOURCE|   Builtin|                        org.apache.iotdb.commons.pipe.plugin.builtin.extractor.iotdb.IoTDBExtractor|                                                    |
+|             IOTDB-THRIFT-SINK|   Builtin|           org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftConnector|                                                    |
+|         IOTDB-THRIFT-SSL-SINK|   Builtin|        org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftSslConnector|                                                    |
++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+
+```
+
+Detailed introduction of pre-installed plugins is as follows (for detailed parameters of each plugin, please refer to the [Parameter Description](../Reference/System-Config-Manual.md) section):
+
+| **Type**         | **Custom Plugin** | **Plugin Name**         | **Description**                                              |
+| :--------------- | :---------------- | :---------------------- | :----------------------------------------------------------- |
+| Source Plugin    | Not Supported     | `iotdb-source`          | Default extractor plugin for extracting historical or real-time data from IoTDB. |
+| Processor Plugin | Supported         | `do-nothing-processor`  | Default processor plugin that does not process incoming data. |
+| Sink Plugin      | Supported         | `do-nothing-sink`       | Does not process outgoing data.                              |
+|                  |                   | `iotdb-thrift-sink`     | Default sink plugin for data transmission between IoTDB instances (V2.0.0+). Uses Thrift RPC framework with a multi-threaded async non-blocking IO model, ideal for distributed target scenarios. |
+|                  |                   | `iotdb-air-gap-sink`    | Used for cross-unidirectional data gate synchronization between IoTDB instances (V2.0.0+). Supports gate models like NARI Syskeeper 2000. |
+|                  |                   | `iotdb-thrift-ssl-sink` | Used for data transmission between IoTDB instances (V2.0.0+). Uses Thrift RPC framework with a multi-threaded sync blocking IO model, suitable for high-security scenarios. |
+
+# Usage Examples
+
+## Full Data Synchronization
+
+This example demonstrates synchronizing all data from one IoTDB to another. The data pipeline is shown below:
+
+![](/img/e1.png)
+
+In this example, we create a synchronization task named `A2B` to synchronize all data from IoTDB A to IoTDB B. The `iotdb-thrift-sink` plugin (built-in) is used, and the `node-urls` parameter is configured with the URL of the DataNode service port on the target IoTDB.
+
+SQL Example:
+
+```SQL
+CREATE PIPE A2B
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6668' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+## Partial Data Synchronization
+
+This example demonstrates synchronizing data within a specific historical time range (from August 23, 2023, 8:00 to October 23, 2023, 8:00) to another IoTDB. The data pipeline is shown below:
+
+![](/img/e2.png)
+
+In this example, we create a synchronization task named `A2B`. First, we define the data range in the `source` configuration. Since we are synchronizing historical data (data that existed before the task was created), we need to configure the start time (`start-time`), end time (`end-time`), and the streaming mode (`mode.streaming`). The `node-urls` parameter is configured with the URL of the DataNode service port on the target IoTDB.
+
+SQL Example:
+
+```SQL
+CREATE PIPE A2B
+WITH SOURCE (
+  'source' = 'iotdb-source',
+  'mode.streaming' = 'true',  -- Extraction mode for newly inserted data (after the pipe is created):
+                              -- whether to extract data in streaming mode (if set to false, batch mode is used).
+  'start-time' = '2023.08.23T08:00:00+00:00',  -- The event time at which data synchronization starts (inclusive).
+  'end-time' = '2023.10.23T08:00:00+00:00'  -- The event time at which data synchronization ends (inclusive).
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-async-sink',
+  'node-urls' = '127.0.0.1:6668'  -- The URL of the DataNode's data service port in the target IoTDB instance.
+)
+```
+
+## Bidirectional Data Transmission
+
+This example demonstrates a scenario where two IoTDB instances act as dual-active systems. The data pipeline is shown below:
+
+![](/img/e3.png)
+
+To avoid infinite data loops, the `source.mode.double-living` parameter must be set to `true` on both IoTDB A and B, indicating that data forwarded from another pipe will not be retransmitted.
+
+SQL Example: On IoTDB A:
+
+```SQL
+CREATE PIPE AB
+WITH SOURCE (
+  'source.mode.double-living' = 'true' -- Do not forward data from other pipes
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6668' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+On IoTDB B:
+
+```SQL
+CREATE PIPE BA
+WITH SOURCE (
+  'source.mode.double-living' = 'true' -- Do not forward data from other pipes
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6667' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+## Edge-to-Cloud Data Transmission
+
+This example demonstrates synchronizing data from multiple IoTDB clusters (B, C, D) to a central IoTDB cluster (A). The data pipeline is shown below:
+
+![](/img/sync_en_03.png)
+
+To synchronize data from clusters B, C, and D to cluster A, the `database-name` and `table-name` parameters are used to restrict the data range.
+
+SQL Example: On IoTDB B:
+
+```SQL
+CREATE PIPE BA
+WITH SOURCE (
+  'database-name' = 'db_b.*', -- Restrict the database scope
+  'table-name' = '.*' -- Match all tables
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6667' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+On IoTDB C:
+
+```SQL
+CREATE PIPE CA
+WITH SOURCE (
+  'database-name' = 'db_c.*', -- Restrict the database scope
+  'table-name' = '.*' -- Match all tables
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6668' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+On IoTDB D:
+
+```SQL
+CREATE PIPE DA
+WITH SOURCE (
+  'database-name' = 'db_d.*', -- Restrict the database scope
+  'table-name' = '.*' -- Match all tables
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6669' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+## Cascaded Data Transmission
+
+This example demonstrates cascading data transmission from IoTDB A to IoTDB B and then to IoTDB C. The data pipeline is shown below:
+
+![](/img/sync_en_04.png)
+
+To synchronize data from cluster A to cluster C, the `source.mode.double-living` parameter is set to `true` in the pipe between B and C.
+
+SQL Example: On IoTDB A:
+
+```SQL
+CREATE PIPE AB
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6668' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+On IoTDB B:
+
+```SQL
+CREATE PIPE BC
+WITH SOURCE (
+  'source.mode.double-living' = 'true' -- Do not forward data from other pipes
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6669' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+## Air-Gapped Data Transmission
+
+This example demonstrates synchronizing data from one IoTDB to another through a unidirectional air gap. The data pipeline is shown below:
+
+![](/img/e5.png)
+
+In this example, the `iotdb-air-gap-sink` plugin is used (it currently supports specific air gap models; contact the Timecho team for details). After configuring the air gap, execute the following statement on IoTDB A, where `node-urls` is the URL of the DataNode service port on the target IoTDB.
+
+SQL Example:
+
+```SQL
+CREATE PIPE A2B
+WITH SINK (
+  'sink' = 'iotdb-air-gap-sink',
+  'node-urls' = '10.53.53.53:9780' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+## Compressed Synchronization
+
+IoTDB supports specifying data compression methods during synchronization. The `compressor` parameter can be configured to enable real-time data compression and transmission. Supported algorithms include `snappy`, `gzip`, `lz4`, `zstd`, and `lzma2`. Multiple algorithms can be combined and applied in the configured order. The `rate-limit-bytes-per-second` parameter (supported in V1.3.3 and later) limits the maximum number of bytes transmitted per second (calculated after compression). If set to a value less than 0, there is no limit.
+
+**SQL Example**:
+
+```SQL
+CREATE PIPE A2B
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6668', -- URL of the DataNode service port on the target IoTDB
+  'compressor' = 'snappy,lz4', -- Compression algorithms
+  'rate-limit-bytes-per-second' = '1048576' -- Maximum bytes allowed per second
+)
+```
+
+## Encrypted Synchronization
+
+IoTDB supports SSL encryption during synchronization to securely transmit data between IoTDB instances.
By configuring SSL-related parameters such as the certificate path (`ssl.trust-store-path`) and password (`ssl.trust-store-pwd`), data can be protected by SSL encryption during synchronization. + +**SQL Example**: + +```SQL +CREATE PIPE A2B +WITH SINK ( + 'sink' = 'iotdb-thrift-ssl-sink', + 'node-urls' = '127.0.0.1:6667', -- URL of the DataNode service port on the target IoTDB + 'ssl.trust-store-path' = 'pki/trusted', -- Path to the trust store certificate + 'ssl.trust-store-pwd' = 'root' -- Password for the trust store certificate +) +``` + +# Reference: Notes + +You can adjust the parameters for data synchronization by modifying the IoTDB configuration file (`iotdb-system.properties`), such as the directory for storing synchronized data. The complete configuration is as follows: + +```Properties +# pipe_receiver_file_dir +# If this property is unset, system will save the data in the default relative path directory under the IoTDB folder(i.e., %IOTDB_HOME%/${cn_system_dir}/pipe/receiver). +# If it is absolute, system will save the data in the exact location it points to. +# If it is relative, system will save the data in the relative path directory it indicates under the IoTDB folder. +# Note: If pipe_receiver_file_dir is assigned an empty string(i.e.,zero-size), it will be handled as a relative path. +# effectiveMode: restart +# For windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is absolute. Otherwise, it is relative. +# pipe_receiver_file_dir=data\\confignode\\system\\pipe\\receiver +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_receiver_file_dir=data/confignode/system/pipe/receiver + +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# effectiveMode: first_start +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# effectiveMode: restart +# Datatype: int +pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# effectiveMode: restart +# Datatype: int +pipe_sink_timeout_ms=900000 + +# The maximum number of selectors that can be used in the sink. +# Recommend to set this value to less than or equal to pipe_sink_max_client_number. +# effectiveMode: restart +# Datatype: int +pipe_sink_selector_number=4 + +# The maximum number of clients that can be used in the sink. +# effectiveMode: restart +# Datatype: int +pipe_sink_max_client_number=16 + +# Whether to enable receiving pipe data through air gap. +# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. +# effectiveMode: restart +# Datatype: Boolean +pipe_air_gap_receiver_enabled=false + +# The port for the server to receive pipe data through air gap. +# Datatype: int +# effectiveMode: restart +pipe_air_gap_receiver_port=9780 + +# The total bytes that all pipe sinks can transfer per second. +# When given a value less than or equal to 0, it means no limit. 
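+# Example (hypothetical value): cap the total sink throughput at 10 MiB/s:
+# pipe_all_sinks_rate_limit_bytes_per_second=10485760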
+# default value is -1, which means no limit.
+# effectiveMode: hot_reload
+# Datatype: double
+pipe_all_sinks_rate_limit_bytes_per_second=-1
+```
+
+# Reference: Parameter Description
+
+## Source Parameters
+
+| **Parameter** | **Description** | **Value Range** | **Required** | **Default Value** |
+| :------------ | :-------------- | :-------------- | :----------- | :---------------- |
+| source | iotdb-source | String: iotdb-source | Yes | - |
+| mode.streaming | This parameter specifies the source of time-series data capture. It takes effect when `mode.snapshot` is set to `false`, determining the capture source for `data.insert` in `inclusion`. Two capture strategies are available: - **true**: Dynamically selects the capture type. The system adapts to downstream processing speed, choosing between capturing each write request or only capturing TsFile file sealing requests. When downstream processing is fast, write requests are prioritized to reduce latency; when processing is slow, only file sealing requests are captured to prevent processing backlogs. This mode suits most scenarios, optimizing the balance between processing latency and throughput. - **false**: Uses a fixed batch capture approach, capturing only TsFile file sealing requests. This mode is suitable for resource-constrained applications, reducing system load. **Note**: Snapshot data captured when the pipe starts will only be provided for downstream processing as files. | Boolean: true / false | No | true |
+| mode.strict | Determines whether to strictly filter data when using the `time`, `path`, `database-name`, or `table-name` parameters: - **true**: Strict filtering. The system will strictly filter captured data according to the specified conditions, ensuring that only matching data is selected. - **false**: Non-strict filtering. Some extra data may be included during the selection process to optimize performance and reduce CPU and I/O consumption. | Boolean: true / false | No | true |
+| mode.snapshot | This parameter determines the data capture mode, affecting the `data` in `inclusion`. Two modes are available: - **true**: Static data capture. A one-time data snapshot is taken when the pipe starts. Once the snapshot data is fully consumed, the pipe automatically terminates (executing `DROP PIPE` SQL automatically). - **false**: Dynamic data capture. In addition to capturing snapshot data when the pipe starts, it continuously captures subsequent data changes. The pipe remains active to process the dynamic data stream. | Boolean: true / false | No | false |
+| database-name | When the user connects with `sql_dialect` set to `table`, this parameter can be specified. Determines the scope of data capture, affecting the `data` in `inclusion`. Specifies the database name to filter. It can be a specific database name or a Java-style regular expression to match multiple databases. By default, all databases are matched. | String: Database name or database regular expression pattern string, which can match uncreated or non-existent databases. | No | ".*" |
+| table-name | When the user connects with `sql_dialect` set to `table`, this parameter can be specified. Determines the scope of data capture, affecting the `data` in `inclusion`. Specifies the table name to filter. It can be a specific table name or a Java-style regular expression to match multiple tables. By default, all tables are matched. | String: Data table name or data table regular expression pattern string, which can be uncreated or non-existent tables. | No | ".*" |
+| start-time | Determines the scope of data capture, affecting the `data` in `inclusion`. Data with an event time **greater than or equal to** this parameter will be selected for stream processing in the pipe. | Long: [Long.MIN_VALUE, Long.MAX_VALUE] (bare Unix timestamp) or String: ISO format timestamp supported by IoTDB | No | Long: [Long.MIN_VALUE, Long.MAX_VALUE] (bare Unix timestamp) |
+| end-time | Determines the scope of data capture, affecting the `data` in `inclusion`. Data with an event time **less than or equal to** this parameter will be selected for stream processing in the pipe. | Long: [Long.MIN_VALUE, Long.MAX_VALUE] (bare Unix timestamp) or String: ISO format timestamp supported by IoTDB | No | Long: [Long.MIN_VALUE, Long.MAX_VALUE] (bare Unix timestamp) |
+| forwarding-pipe-requests | Specifies whether to forward data that was itself synchronized via a pipe to external clusters. Typically used for setting up **active-active clusters**. In active-active cluster mode, this parameter should be set to `false` to prevent **infinite circular synchronization**. | Boolean: true / false | No | true |
+
+> 💎 **Note:** The two values of the data capture mode `mode.streaming` differ as follows:
+>
+> - true (recommended): the task processes and sends data in real time, giving high timeliness but lower throughput.
+> - false: the task processes and sends data in batches (following the underlying data files), giving lower timeliness but high throughput.
+
+## Sink Parameters
+
+#### iotdb-thrift-sink
+
+| **Parameter** | **Description** | Value Range | Required | Default Value |
+| :-------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :------------ |
+| sink | iotdb-thrift-sink or iotdb-thrift-async-sink | String: iotdb-thrift-sink or iotdb-thrift-async-sink | Yes | - |
+| node-urls | URLs of the DataNode service ports on the target IoTDB. (Note that the synchronization task does not support forwarding to its own service.) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Yes | - |
+| user/username | Username for connecting to the target IoTDB. Must have appropriate permissions. | String | No | root |
+| password | Password for the username. | String | No | root |
+| batch.enable | Enables batch mode for log transmission to improve throughput and reduce IOPS. | Boolean: true, false | No | true |
+| batch.max-delay-seconds | Maximum delay (in seconds) for batch transmission. | Integer | No | 1 |
+| batch.size-bytes | Maximum batch size (in bytes) for batch transmission. | Long | No | 16*1024*1024 |
+| compressor | The selected RPC compression algorithm. Multiple algorithms can be configured and will be applied in sequence for each request. | String: snappy / gzip / lz4 / zstd / lzma2 | No | "" |
+| compressor.zstd.level | When the selected RPC compression algorithm is zstd, this parameter additionally configures the compression level of the zstd algorithm. | Int: [-131072, 22] | No | 3 |
+| rate-limit-bytes-per-second | The maximum number of bytes allowed to be transmitted per second, calculated on the compressed (post-compression) size. A value less than 0 means no limit. | Double: [Double.MIN_VALUE, Double.MAX_VALUE] | No | -1 |
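+
+For illustration, here is a minimal sketch of how the source and sink parameters above combine in a `CREATE PIPE` statement issued over JDBC (the pipe name, source database, and target address are placeholders, and a table-model connection is assumed as in the JDBC documentation):
+
+```Java
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.Statement;
+
+public class CreatePipeExample {
+  public static void main(String[] args) throws Exception {
+    Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
+    try (Connection connection =
+            DriverManager.getConnection(
+                "jdbc:iotdb://127.0.0.1:6667?sql_dialect=table", "root", "root");
+        Statement statement = connection.createStatement()) {
+      // Capture data from database "db1" and forward it to a remote DataNode
+      // through the iotdb-thrift-sink, with batch transmission enabled.
+      statement.execute(
+          "CREATE PIPE my_pipe"
+              + " WITH SOURCE ('source'='iotdb-source', 'database-name'='db1')"
+              + " WITH SINK ('sink'='iotdb-thrift-sink',"
+              + " 'node-urls'='192.168.0.2:6667', 'batch.enable'='true')");
+    }
+  }
+}
+```
+
+Parameters that are not set fall back to the defaults listed in the tables, so a sink typically only needs `sink` and `node-urls`.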
+
+#### iotdb-air-gap-sink
+
+| **Parameter** | **Description** | Value Range | Required | Default Value |
+| :--------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :------------ |
+| sink | iotdb-air-gap-sink | String: iotdb-air-gap-sink | Yes | - |
+| node-urls | URLs of the DataNode service ports on the target IoTDB. (Note that the synchronization task does not support forwarding to its own service.) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Yes | - |
+| user/username | Username for connecting to the target IoTDB. Must have appropriate permissions. | String | No | root |
+| password | Password for the username. | String | No | root |
+| compressor | The selected RPC compression algorithm. Multiple algorithms can be configured and will be applied in sequence for each request. | String: snappy / gzip / lz4 / zstd / lzma2 | No | "" |
+| compressor.zstd.level | When the selected RPC compression algorithm is zstd, this parameter additionally configures the compression level of the zstd algorithm. | Int: [-131072, 22] | No | 3 |
+| rate-limit-bytes-per-second | The maximum number of bytes allowed to be transmitted per second, calculated on the compressed (post-compression) size. A value less than 0 means no limit. | Double: [Double.MIN_VALUE, Double.MAX_VALUE] | No | -1 |
+| air-gap.handshake-timeout-ms | The timeout duration (in milliseconds) for the handshake requests when the sender and receiver attempt to establish a connection for the first time. | Integer | No | 5000 |
+
+#### iotdb-thrift-ssl-sink
+
+| **Parameter** | **Description** | Value Range | Required | Default Value |
+| :-------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :------------ |
+| sink | iotdb-thrift-ssl-sink | String: iotdb-thrift-ssl-sink | Yes | - |
+| node-urls | URLs of the DataNode service ports on the target IoTDB. (Note that the synchronization task does not support forwarding to its own service.) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Yes | - |
+| user/username | Username for connecting to the target IoTDB. Must have appropriate permissions. | String | No | root |
+| password | Password for the username. | String | No | root |
+| batch.enable | Enables batch mode for log transmission to improve throughput and reduce IOPS. | Boolean: true, false | No | true |
+| batch.max-delay-seconds | Maximum delay (in seconds) for batch transmission. | Integer | No | 1 |
+| batch.size-bytes | Maximum batch size (in bytes) for batch transmission. | Long | No | 16*1024*1024 |
+| compressor | The selected RPC compression algorithm. Multiple algorithms can be configured and will be applied in sequence for each request. | String: snappy / gzip / lz4 / zstd / lzma2 | No | "" |
+| compressor.zstd.level | When the selected RPC compression algorithm is zstd, this parameter additionally configures the compression level of the zstd algorithm. | Int: [-131072, 22] | No | 3 |
+| rate-limit-bytes-per-second | Maximum bytes allowed per second for transmission (calculated after compression). A value less than 0 means no limit. | Double: [Double.MIN_VALUE, Double.MAX_VALUE] | No | -1 |
+| ssl.trust-store-path | Path to the trust store certificate for the SSL connection. | String | Yes | - |
+| ssl.trust-store-pwd | Password for the trust store certificate. | String | Yes | - |
\ No newline at end of file
diff --git a/src/UserGuide/Master/Table/User-Manual/Tiered-Storage_timecho.md b/src/UserGuide/Master/Table/User-Manual/Tiered-Storage_timecho.md
new file mode 100644
index 000000000..406798d64
--- /dev/null
+++ b/src/UserGuide/Master/Table/User-Manual/Tiered-Storage_timecho.md
@@ -0,0 +1,97 @@
+
+
+## Overview
+
+The **tiered storage** feature enables users to manage multiple types of storage media efficiently. Users can configure different storage media types within IoTDB and classify them into distinct storage tiers. In IoTDB, tiered storage is implemented by managing multiple directories. Users can group multiple storage directories into the same category and designate them as a **storage tier**. Additionally, data can be classified based on its "hotness" or "coldness" and stored accordingly in designated tiers.
+
+Currently, IoTDB supports hot and cold data classification based on the **Time-To-Live (TTL)** parameter. When data in a tier no longer meets the defined TTL rules, it is automatically migrated to the next tier.
+
+## **Parameter Definitions**
+
+To enable multi-level storage in IoTDB, the following configurations are required:
+
+1. Configure data directories and assign them to different tiers.
+2. Set a TTL for each tier to distinguish the hot and cold data managed by different tiers.
+3. Configure the minimum remaining storage space ratio for each tier (optional). If the available space in a tier falls below the defined threshold, data will be migrated to the next tier automatically.
+
+The specific parameter definitions and their descriptions are as follows.
+
+| **Parameter** | **Default Value** | **Description** | **Constraints** |
+| :----------------------------------------- | :------------------------- |:----------------|:----------------|
+| `dn_data_dirs` | `data/datanode/data` | Specifies storage directories grouped into tiers. | Tiers are separated by `;`, directories within the same tier are separated by `,`.<br>Cloud storage (e.g., AWS S3) can only be the last tier.<br>Use `OBJECT_STORAGE` to denote cloud storage.<br>Only one cloud storage bucket is allowed. |
+| `tier_ttl_in_ms` | `-1` | Defines the TTL (in milliseconds) for each tier to determine the data range it manages. | Tiers are separated by `;`.<br>The number of tiers must match `dn_data_dirs`.<br>`-1` means "no limit". |
+| `dn_default_space_usage_thresholds` | `0.85` | Defines the minimum remaining space threshold (as a ratio) for each tier. When a tier's remaining space falls below this threshold, data is migrated to the next tier.<br>When the remaining space of the last tier falls below the threshold, the node enters `READ_ONLY` mode. | Tiers are separated by `;`.<br>The number of tiers must match `dn_data_dirs`. |
+| `object_storage_type` | `AWS_S3` | Cloud storage type. | Only `AWS_S3` is supported. |
+| `object_storage_bucket` | `iotdb_data` | Cloud storage bucket name. | Required only if cloud storage is used. |
+| `object_storage_endpoint` | (Empty) | Cloud storage endpoint. | Required only if cloud storage is used. |
+| `object_storage_access_key` | (Empty) | Cloud storage access key. | Required only if cloud storage is used. |
+| `object_storage_access_secret` | (Empty) | Cloud storage access secret. | Required only if cloud storage is used. |
+| `remote_tsfile_cache_dirs` | `data/datanode/data/cache` | Local cache directory for cloud storage. | Required only if cloud storage is used. |
+| `remote_tsfile_cache_page_size_in_kb` | `20480` | Page size (in KB) for cloud storage local cache. | Required only if cloud storage is used. |
+| `remote_tsfile_cache_max_disk_usage_in_mb` | `51200` | Maximum disk space (in MB) allocated for cloud storage local cache. | Required only if cloud storage is used. |
+
+## Local Tiered Storage Example
+
+The following is an example of a **two-tier local storage configuration**:
+
+```Properties
+# Mandatory configurations
+dn_data_dirs=/data1/data;/data2/data,/data3/data
+tier_ttl_in_ms=86400000;-1
+dn_default_space_usage_thresholds=0.2;0.1
+```
+
+**Tier Details:**
+
+| **Tier** | **Storage Directories** | **Data Range** | **Remaining Space Threshold** |
+| :------- | :--------------------------- | :-------------------- | :---------------------------- |
+| Tier 1 | `/data1/data` | Last 1 day of data | 20% |
+| Tier 2 | `/data2/data`, `/data3/data` | Data older than 1 day | 10% |
+
+## Cloud-based Tiered Storage Example
+
+The following is an example of a **three-tier configuration with cloud storage**:
+
+```Properties
+# Mandatory configurations
+dn_data_dirs=/data1/data;/data2/data,/data3/data;OBJECT_STORAGE
+tier_ttl_in_ms=86400000;864000000;-1
+dn_default_space_usage_thresholds=0.2;0.15;0.1
+object_storage_type=AWS_S3
+object_storage_bucket=iotdb
+object_storage_endpoint=
+object_storage_access_key=
+object_storage_access_secret=
+
+# Optional configurations
+remote_tsfile_cache_dirs=data/datanode/data/cache
+remote_tsfile_cache_page_size_in_kb=20480
+remote_tsfile_cache_max_disk_usage_in_mb=51200
+```
+
+**Tier Details:**
+
+| **Tier** | **Storage Directories** | **Data Range** | **Remaining Space Threshold** |
+| :------- | :--------------------------- | :----------------------------- | :---------------------------- |
+| Tier 1 | `/data1/data` | Last 1 day of data | 20% |
+| Tier 2 | `/data2/data`, `/data3/data` | Data from 1 day to 10 days ago | 15% |
+| Tier 3 | AWS S3 Cloud Storage | Data older than 10 days | 10% |
\ No newline at end of file
diff --git a/src/UserGuide/latest-Table/API/Programming-JDBC.md b/src/UserGuide/latest-Table/API/Programming-JDBC.md
new file mode 100644
index 000000000..8779ec5f8
--- /dev/null
+++ b/src/UserGuide/latest-Table/API/Programming-JDBC.md
@@ -0,0 +1,187 @@
+
+
+The IoTDB JDBC provides a standardized way to interact with the IoTDB database, allowing users to execute SQL statements from Java programs for managing databases and time-series data. It supports operations such as connecting to the database, creating, querying, updating, and deleting data, as well as batch insertion and querying of time-series data.
+
+**Note:** The current JDBC implementation is designed primarily for integration with third-party tools. High-performance writing **may not be achieved** when using JDBC for insert operations. For Java applications, it is recommended to use the **Java Native API** for optimal performance.
+
+## Prerequisites
+
+### **Environment Requirements**
+
+- **JDK:** Version 1.8 or higher
+- **Maven:** Version 3.6 or higher
+
+### **Adding Maven Dependencies**
+
+Add the following dependency to your Maven `pom.xml` file:
+
+```XML
+<dependencies>
+    <dependency>
+        <groupId>com.timecho.iotdb</groupId>
+        <artifactId>iotdb-session</artifactId>
+        <version>2.0.1.1</version>
+    </dependency>
+</dependencies>
+```
+
+## Read and Write Operations
+
+**Write Operations:** Perform database operations such as inserting data, creating databases, and creating time-series using the `execute` method.
+
+**Read Operations:** Execute queries using the `executeQuery` method and retrieve results via the `ResultSet` object.
+
+### Method Overview
+
+| **Method Name** | **Description** | **Parameters** | **Return Value** |
+| ------------------------------------------------------------ | ----------------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------- |
+| Class.forName(String driver) | Loads the JDBC driver class | `driver`: Name of the JDBC driver class | `Class`: Loaded class object |
+| DriverManager.getConnection(String url, String username, String password) | Establishes a database connection | `url`: Database URL `username`: Username `password`: Password | `Connection`: Database connection object |
+| Connection.createStatement() | Creates a `Statement` object for executing SQL statements | None | `Statement`: SQL execution object |
+| Statement.execute(String sql) | Executes a non-query SQL statement | `sql`: SQL statement to execute | `boolean`: Indicates if a `ResultSet` is returned |
+| Statement.executeQuery(String sql) | Executes a query SQL statement and retrieves the result set | `sql`: SQL query statement | `ResultSet`: Query result set |
+| ResultSet.getMetaData() | Retrieves metadata of the result set | None | `ResultSetMetaData`: Metadata object |
+| ResultSet.next() | Moves to the next row in the result set | None | `boolean`: Whether the move was successful |
+| ResultSet.getString(int columnIndex) | Retrieves the string value of a specified column | `columnIndex`: Column index (starting from 1) | `String`: Column value |
+
+## Sample Code
+
+**Note:** When using the Table Model, you must specify the `sql_dialect` parameter as `table` in the URL. Example:
+
+```Java
+String url = "jdbc:iotdb://127.0.0.1:6667?sql_dialect=table";
+```
+
+You can find the full example code at [GitHub Repository](https://github.com/apache/iotdb/blob/master/example/jdbc/src/main/java/org/apache/iotdb/TableModelJDBCExample.java).
+
+Here is an excerpt of the sample code:
+
+```Java
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb; + +import org.apache.iotdb.jdbc.IoTDBSQLException; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.Connection; +import java.sql.DriverManager; +import java.sql.ResultSet; +import java.sql.ResultSetMetaData; +import java.sql.SQLException; +import java.sql.Statement; + +public class TableModelJDBCExample { + + private static final Logger LOGGER = LoggerFactory.getLogger(TableModelJDBCExample.class); + + public static void main(String[] args) throws ClassNotFoundException, SQLException { + Class.forName("org.apache.iotdb.jdbc.IoTDBDriver"); + + // don't specify database in url + try (Connection connection = + DriverManager.getConnection( + "jdbc:iotdb://127.0.0.1:6667?sql_dialect=table", "root", "root"); + Statement statement = connection.createStatement()) { + + statement.execute("CREATE DATABASE test1"); + statement.execute("CREATE DATABASE test2"); + + statement.execute("use test2"); + + // or use full qualified table name + statement.execute( + "create table test1.table1(region_id STRING ID, plant_id STRING ID, device_id STRING ID, model STRING ATTRIBUTE, temperature FLOAT MEASUREMENT, humidity DOUBLE MEASUREMENT) with (TTL=3600000)"); + + statement.execute( + "create table table2(region_id STRING ID, plant_id STRING ID, color STRING ATTRIBUTE, temperature FLOAT MEASUREMENT, speed DOUBLE MEASUREMENT) with (TTL=6600000)"); + + // show tables from current database + try (ResultSet resultSet = statement.executeQuery("SHOW TABLES")) { + ResultSetMetaData metaData = resultSet.getMetaData(); + System.out.println(metaData.getColumnCount()); + while (resultSet.next()) { + System.out.println(resultSet.getString(1) + ", " + resultSet.getInt(2)); + } + } + + // show tables by specifying another database + // using SHOW tables FROM + try (ResultSet resultSet = statement.executeQuery("SHOW TABLES FROM test1")) { + ResultSetMetaData metaData = resultSet.getMetaData(); + System.out.println(metaData.getColumnCount()); + while (resultSet.next()) { + System.out.println(resultSet.getString(1) + ", " + resultSet.getInt(2)); + } + } + + } catch (IoTDBSQLException e) { + LOGGER.error("IoTDB Jdbc example error", e); + } + + // specify database in url + try (Connection connection = + DriverManager.getConnection( + "jdbc:iotdb://127.0.0.1:6667/test1?sql_dialect=table", "root", "root"); + Statement statement = connection.createStatement()) { + // show tables from current database test1 + try (ResultSet resultSet = statement.executeQuery("SHOW TABLES")) { + ResultSetMetaData metaData = resultSet.getMetaData(); + System.out.println(metaData.getColumnCount()); + while (resultSet.next()) { + System.out.println(resultSet.getString(1) + ", " + resultSet.getInt(2)); + } + } + + // change database to test2 + statement.execute("use test2"); + + try (ResultSet resultSet = statement.executeQuery("SHOW TABLES")) { + ResultSetMetaData metaData = resultSet.getMetaData(); + System.out.println(metaData.getColumnCount()); + while (resultSet.next()) { + System.out.println(resultSet.getString(1) + ", " + resultSet.getInt(2)); + } + } + } + } +} +``` \ 
No newline at end of file
diff --git a/src/UserGuide/latest-Table/API/Programming-Java-Native-API.md b/src/UserGuide/latest-Table/API/Programming-Java-Native-API.md
new file mode 100644
index 000000000..c97f50769
--- /dev/null
+++ b/src/UserGuide/latest-Table/API/Programming-Java-Native-API.md
@@ -0,0 +1,610 @@
+
+
+IoTDB provides a Java native client driver and a session pool management mechanism. These tools enable developers to interact with IoTDB using object-oriented APIs, allowing time-series objects to be directly assembled and inserted into the database without constructing SQL statements. It is recommended to use the `ITableSessionPool` for multi-threaded database operations to maximize efficiency.
+
+## Prerequisites
+
+### Environment Requirements
+
+- **JDK**: Version 1.8 or higher
+- **Maven**: Version 3.6 or higher
+
+### Adding Maven Dependencies
+
+```XML
+<dependencies>
+    <dependency>
+        <groupId>com.timecho.iotdb</groupId>
+        <artifactId>iotdb-session</artifactId>
+        <version>2.0.1.1</version>
+    </dependency>
+</dependencies>
+```
+
+## Read and Write Operations
+
+### ITableSession Interface
+
+The `ITableSession` interface defines basic operations for interacting with IoTDB, including data insertion, query execution, and session closure. Note that this interface is **not thread-safe**.
+
+#### Method Overview
+
+| **Method Name** | **Description** | **Parameters** | **Return Value** | **Exceptions** |
+| --------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ---------------- | --------------------------------------------------------- |
+| insert(Tablet tablet) | Inserts a `Tablet` containing time-series data into the database. | `tablet`: The `Tablet` object to be inserted. | None | `StatementExecutionException`, `IoTDBConnectionException` |
+| executeNonQueryStatement(String sql) | Executes non-query SQL statements such as DDL or DML commands. | `sql`: The SQL statement to execute. | None | `StatementExecutionException`, `IoTDBConnectionException` |
+| executeQueryStatement(String sql) | Executes a query SQL statement and returns a `SessionDataSet` containing the query results. | `sql`: The SQL query statement to execute. | `SessionDataSet` | `StatementExecutionException`, `IoTDBConnectionException` |
+| executeQueryStatement(String sql, long timeoutInMs) | Executes a query SQL statement with a specified timeout in milliseconds. | `sql`: The SQL query statement. `timeoutInMs`: Query timeout in milliseconds. | `SessionDataSet` | `StatementExecutionException`, `IoTDBConnectionException` |
+| close() | Closes the session and releases resources. | None | None | `IoTDBConnectionException` |
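+
+As a quick orientation before the interface definition below, the following sketch shows the methods above in use (node address and credentials are placeholders; exact package names may vary between releases, so treat the imports as assumptions):
+
+```Java
+import java.util.Collections;
+
+import org.apache.iotdb.isession.ITableSession;
+import org.apache.iotdb.isession.SessionDataSet;
+import org.apache.iotdb.session.TableSessionBuilder;
+
+public class TableSessionUsageExample {
+  public static void main(String[] args) throws Exception {
+    // Build a session against a local single-node instance; see the
+    // TableSessionBuilder parameter table below for all options.
+    try (ITableSession session =
+        new TableSessionBuilder()
+            .nodeUrls(Collections.singletonList("127.0.0.1:6667"))
+            .username("root")
+            .password("root")
+            .build()) {
+      // Non-query statements cover DDL and DML commands.
+      session.executeNonQueryStatement("CREATE DATABASE IF NOT EXISTS db1");
+      session.executeNonQueryStatement("USE db1");
+      // Query statements return a SessionDataSet to iterate over.
+      SessionDataSet dataSet = session.executeQueryStatement("SHOW TABLES");
+      while (dataSet.hasNext()) {
+        System.out.println(dataSet.next());
+      }
+    } // try-with-resources closes the session, as recommended below.
+  }
+}
+```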
+
+#### Sample Code
+
+```Java
+/**
+ * This interface defines a session for interacting with IoTDB tables.
+ * It supports operations such as data insertion, executing queries, and closing the session.
+ * Implementations of this interface are expected to manage connections and ensure
+ * proper resource cleanup.
+ *
+ * <p>
Each method may throw exceptions to indicate issues such as connection errors or + * execution failures. + * + *
Since this interface extends {@link AutoCloseable}, it is recommended to use + * try-with-resources to ensure the session is properly closed. + */ +public interface ITableSession extends AutoCloseable { + + /** + * Inserts a {@link Tablet} into the database. + * + * @param tablet the tablet containing time-series data to be inserted. + * @throws StatementExecutionException if an error occurs while executing the statement. + * @throws IoTDBConnectionException if there is an issue with the IoTDB connection. + */ + void insert(Tablet tablet) throws StatementExecutionException, IoTDBConnectionException; + + /** + * Executes a non-query SQL statement, such as a DDL or DML command. + * + * @param sql the SQL statement to execute. + * @throws IoTDBConnectionException if there is an issue with the IoTDB connection. + * @throws StatementExecutionException if an error occurs while executing the statement. + */ + void executeNonQueryStatement(String sql) throws IoTDBConnectionException, StatementExecutionException; + + /** + * Executes a query SQL statement and returns the result set. + * + * @param sql the SQL query statement to execute. + * @return a {@link SessionDataSet} containing the query results. + * @throws StatementExecutionException if an error occurs while executing the statement. + * @throws IoTDBConnectionException if there is an issue with the IoTDB connection. + */ + SessionDataSet executeQueryStatement(String sql) + throws StatementExecutionException, IoTDBConnectionException; + + /** + * Executes a query SQL statement with a specified timeout and returns the result set. + * + * @param sql the SQL query statement to execute. + * @param timeoutInMs the timeout duration in milliseconds for the query execution. + * @return a {@link SessionDataSet} containing the query results. + * @throws StatementExecutionException if an error occurs while executing the statement. + * @throws IoTDBConnectionException if there is an issue with the IoTDB connection. + */ + SessionDataSet executeQueryStatement(String sql, long timeoutInMs) + throws StatementExecutionException, IoTDBConnectionException; + + /** + * Closes the session, releasing any held resources. + * + * @throws IoTDBConnectionException if there is an issue with closing the IoTDB connection. + */ + @Override + void close() throws IoTDBConnectionException; +} +``` + +### TableSessionBuilder Class + +The `TableSessionBuilder` class is a builder for configuring and creating instances of the `ITableSession` interface. It allows developers to set connection parameters, query parameters, and security features. + +#### Parameter Configuration + +| **Parameter** | **Description** | **Default Value** | +|-----------------------------------------------------| ------------------------------------------------------------ | ------------------------------------------------- | +| nodeUrls(List\ nodeUrls) | Sets the list of IoTDB cluster node URLs. | `Collections.singletonList("``localhost:6667``")` | +| username(String username) | Sets the username for the connection. | `"root"` | +| password(String password) | Sets the password for the connection. | `"root"` | +| database(String database) | Sets the target database name. | `null` | +| queryTimeoutInMs(long queryTimeoutInMs) | Sets the query timeout in milliseconds. | `60000` (1 minute) | +| fetchSize(int fetchSize) | Sets the fetch size for query results. | `5000` | +| zoneId(ZoneId zoneId) | Sets the timezone-related `ZoneId`. 
| `ZoneId.systemDefault()` |
+| thriftDefaultBufferSize(int thriftDefaultBufferSize) | Sets the default buffer size for the Thrift client (in bytes). | `1024` (1 KB) |
+| thriftMaxFrameSize(int thriftMaxFrameSize) | Sets the maximum frame size for the Thrift client (in bytes). | `64 * 1024 * 1024` (64 MB) |
+| enableRedirection(boolean enableRedirection) | Enables or disables redirection for cluster nodes. | `true` |
+| enableAutoFetch(boolean enableAutoFetch) | Enables or disables automatic fetching of available DataNodes. | `true` |
+| maxRetryCount(int maxRetryCount) | Sets the maximum number of connection retry attempts. | `60` |
+| retryIntervalInMs(long retryIntervalInMs) | Sets the interval between retry attempts (in milliseconds). | `500` (500 milliseconds) |
+| useSSL(boolean useSSL) | Enables or disables SSL for secure connections. | `false` |
+| trustStore(String keyStore) | Sets the path to the trust store for SSL connections. | `null` |
+| trustStorePwd(String keyStorePwd) | Sets the password for the SSL trust store. | `null` |
+| enableCompression(boolean enableCompression) | Enables or disables RPC compression for the connection. | `false` |
+| connectionTimeoutInMs(int connectionTimeoutInMs) | Sets the connection timeout in milliseconds. | `0` (no timeout) |
+
+#### Sample Code
+
+```Java
+/**
+ * A builder class for constructing instances of {@link ITableSession}.
+ *
+ * <p>
This builder provides a fluent API for configuring various options such as connection + * settings, query parameters, and security features. + * + *
All configurations have reasonable default values, which can be overridden as needed. + */ +public class TableSessionBuilder { + + /** + * Builds and returns a configured {@link ITableSession} instance. + * + * @return a fully configured {@link ITableSession}. + * @throws IoTDBConnectionException if an error occurs while establishing the connection. + */ + public ITableSession build() throws IoTDBConnectionException; + + /** + * Sets the list of node URLs for the IoTDB cluster. + * + * @param nodeUrls a list of node URLs. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue Collection.singletonList("localhost:6667") + */ + public TableSessionBuilder nodeUrls(List nodeUrls); + + /** + * Sets the username for the connection. + * + * @param username the username. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue "root" + */ + public TableSessionBuilder username(String username); + + /** + * Sets the password for the connection. + * + * @param password the password. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue "root" + */ + public TableSessionBuilder password(String password); + + /** + * Sets the target database name. + * + * @param database the database name. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue null + */ + public TableSessionBuilder database(String database); + + /** + * Sets the query timeout in milliseconds. + * + * @param queryTimeoutInMs the query timeout in milliseconds. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 60000 (1 minute) + */ + public TableSessionBuilder queryTimeoutInMs(long queryTimeoutInMs); + + /** + * Sets the fetch size for query results. + * + * @param fetchSize the fetch size. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 5000 + */ + public TableSessionBuilder fetchSize(int fetchSize); + + /** + * Sets the {@link ZoneId} for timezone-related operations. + * + * @param zoneId the {@link ZoneId}. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue ZoneId.systemDefault() + */ + public TableSessionBuilder zoneId(ZoneId zoneId); + + /** + * Sets the default init buffer size for the Thrift client. + * + * @param thriftDefaultBufferSize the buffer size in bytes. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 1024 (1 KB) + */ + public TableSessionBuilder thriftDefaultBufferSize(int thriftDefaultBufferSize); + + /** + * Sets the maximum frame size for the Thrift client. + * + * @param thriftMaxFrameSize the maximum frame size in bytes. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue 64 * 1024 * 1024 (64 MB) + */ + public TableSessionBuilder thriftMaxFrameSize(int thriftMaxFrameSize); + + /** + * Enables or disables redirection for cluster nodes. + * + * @param enableRedirection whether to enable redirection. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue true + */ + public TableSessionBuilder enableRedirection(boolean enableRedirection); + + /** + * Enables or disables automatic fetching of available DataNodes. + * + * @param enableAutoFetch whether to enable automatic fetching. + * @return the current {@link TableSessionBuilder} instance. + * @defaultValue true + */ + public TableSessionBuilder enableAutoFetch(boolean enableAutoFetch); + + /** + * Sets the maximum number of retries for connection attempts. 
+   *
+   * @param maxRetryCount the maximum retry count.
+   * @return the current {@link TableSessionBuilder} instance.
+   * @defaultValue 60
+   */
+  public TableSessionBuilder maxRetryCount(int maxRetryCount);
+
+  /**
+   * Sets the interval between retries in milliseconds.
+   *
+   * @param retryIntervalInMs the interval in milliseconds.
+   * @return the current {@link TableSessionBuilder} instance.
+   * @defaultValue 500 milliseconds
+   */
+  public TableSessionBuilder retryIntervalInMs(long retryIntervalInMs);
+
+  /**
+   * Enables or disables SSL for secure connections.
+   *
+   * @param useSSL whether to enable SSL.
+   * @return the current {@link TableSessionBuilder} instance.
+   * @defaultValue false
+   */
+  public TableSessionBuilder useSSL(boolean useSSL);
+
+  /**
+   * Sets the trust store path for SSL connections.
+   *
+   * @param keyStore the trust store path.
+   * @return the current {@link TableSessionBuilder} instance.
+   * @defaultValue null
+   */
+  public TableSessionBuilder trustStore(String keyStore);
+
+  /**
+   * Sets the trust store password for SSL connections.
+   *
+   * @param keyStorePwd the trust store password.
+   * @return the current {@link TableSessionBuilder} instance.
+   * @defaultValue null
+   */
+  public TableSessionBuilder trustStorePwd(String keyStorePwd);
+
+  /**
+   * Enables or disables RPC compression for the connection.
+   *
+   * @param enableCompression whether to enable compression.
+   * @return the current {@link TableSessionBuilder} instance.
+   * @defaultValue false
+   */
+  public TableSessionBuilder enableCompression(boolean enableCompression);
+
+  /**
+   * Sets the connection timeout in milliseconds.
+   *
+   * @param connectionTimeoutInMs the connection timeout in milliseconds.
+   * @return the current {@link TableSessionBuilder} instance.
+   * @defaultValue 0 (no timeout)
+   */
+  public TableSessionBuilder connectionTimeoutInMs(int connectionTimeoutInMs);
+}
+```
+
+## Session Pool
+
+### ITableSessionPool Interface
+
+The `ITableSessionPool` interface manages a pool of `ITableSession` instances, enabling efficient reuse of connections and proper cleanup of resources.
+
+#### Method Overview
+
+| **Method Name** | **Description** | **Return Value** | **Exceptions** |
+| --------------- | ---------------------------------------------------------- | ---------------- | -------------------------- |
+| getSession() | Acquires a session from the pool for database interaction. | `ITableSession` | `IoTDBConnectionException` |
+| close() | Closes the session pool and releases resources. | None | None |
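+
+Before the interface source below, here is a minimal usage sketch (node address and credentials are placeholders; the import paths are assumptions that may differ between releases). Closing a borrowed session is expected to return it to the pool rather than tear down the connection:
+
+```Java
+import java.util.Collections;
+
+import org.apache.iotdb.isession.ITableSession;
+import org.apache.iotdb.isession.pool.ITableSessionPool;
+import org.apache.iotdb.session.pool.TableSessionPoolBuilder;
+
+public class TableSessionPoolUsageExample {
+  public static void main(String[] args) throws Exception {
+    // One pool is shared by all threads; see the TableSessionPoolBuilder
+    // parameter table below for the available options.
+    ITableSessionPool sessionPool =
+        new TableSessionPoolBuilder()
+            .nodeUrls(Collections.singletonList("127.0.0.1:6667"))
+            .user("root")
+            .password("root")
+            .maxSize(5)
+            .build();
+    try {
+      // Borrow a session; closing it hands it back to the pool.
+      try (ITableSession session = sessionPool.getSession()) {
+        session.executeNonQueryStatement("CREATE DATABASE IF NOT EXISTS db1");
+      }
+    } finally {
+      // Shut the pool down once no more sessions are needed.
+      sessionPool.close();
+    }
+  }
+}
+```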
+
+#### Sample Code
+
+```Java
+/**
+ * This interface defines a pool for managing {@link ITableSession} instances.
+ * It provides methods to acquire a session from the pool and to close the pool.
+ *
+ * <p>
The implementation should handle the lifecycle of sessions, ensuring efficient + * reuse and proper cleanup of resources. + */ +public interface ITableSessionPool { + + /** + * Acquires an {@link ITableSession} instance from the pool. + * + * @return an {@link ITableSession} instance for interacting with the IoTDB. + * @throws IoTDBConnectionException if there is an issue obtaining a session from the pool. + */ + ITableSession getSession() throws IoTDBConnectionException; + + /** + * Closes the session pool, releasing any held resources. + * + *
Once the pool is closed, no further sessions can be acquired. + */ + void close(); +} +``` + +### TableSessionPoolBuilder Class + +The `TableSessionPoolBuilder` class is a builder for configuring and creating `ITableSessionPool` instances, supporting options like connection settings and pooling behavior. + +#### Parameter Configuration + +| **Parameter** | **Description** | **Default Value** | +|---------------------------------------------------------------| ------------------------------------------------------------ | --------------------------------------------- | +| nodeUrls(List\ nodeUrls) | Sets the list of IoTDB cluster node URLs. | `Collections.singletonList("localhost:6667")` | +| maxSize(int maxSize) | Sets the maximum size of the session pool, i.e., the maximum number of sessions allowed in the pool. | `5` | +| user(String user) | Sets the username for the connection. | `"root"` | +| password(String password) | Sets the password for the connection. | `"root"` | +| database(String database) | Sets the target database name. | `"root"` | +| queryTimeoutInMs(long queryTimeoutInMs) | Sets the query timeout in milliseconds. | `60000`(1 minute) | +| fetchSize(int fetchSize) | Sets the fetch size for query results. | `5000` | +| zoneId(ZoneId zoneId) | Sets the timezone-related `ZoneId`. | `ZoneId.systemDefault()` | +| waitToGetSessionTimeoutInMs(long waitToGetSessionTimeoutInMs) | Sets the timeout duration (in milliseconds) for acquiring a session from the pool. | `30000`(30 seconds) | +| thriftDefaultBufferSize(int thriftDefaultBufferSize) | Sets the default buffer size for the Thrift client (in bytes). | `1024`(1KB) | +| thriftMaxFrameSize(int thriftMaxFrameSize) | Sets the maximum frame size for the Thrift client (in bytes). | `64 * 1024 * 1024`(64MB) | +| enableCompression(boolean enableCompression) | Enables or disables compression for the connection. | `false` | +| enableRedirection(boolean enableRedirection) | Enables or disables redirection for cluster nodes. | `true` | +| connectionTimeoutInMs(int connectionTimeoutInMs) | Sets the connection timeout in milliseconds. | `10000` (10 seconds) | +| enableAutoFetch(boolean enableAutoFetch) | Enables or disables automatic fetching of available DataNodes. | `true` | +| maxRetryCount(int maxRetryCount) | Sets the maximum number of connection retry attempts. | `60` | +| retryIntervalInMs(long retryIntervalInMs) | Sets the interval between retry attempts (in milliseconds). | `500` (500 milliseconds) | +| useSSL(boolean useSSL) | Enables or disables SSL for secure connections. | `false` | +| trustStore(String keyStore) | Sets the path to the trust store for SSL connections. | `null` | +| trustStorePwd(String keyStorePwd) | Sets the password for the SSL trust store. | `null` | + +#### Sample Code + +```Java +/** + * A builder class for constructing instances of {@link ITableSessionPool}. + * + *
This builder provides a fluent API for configuring a session pool, including + * connection settings, session parameters, and pool behavior. + * + *
All configurations have reasonable default values, which can be overridden as needed. + */ +public class TableSessionPoolBuilder { + + /** + * Builds and returns a configured {@link ITableSessionPool} instance. + * + * @return a fully configured {@link ITableSessionPool}. + */ + public ITableSessionPool build(); + + /** + * Sets the list of node URLs for the IoTDB cluster. + * + * @param nodeUrls a list of node URLs. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue Collection.singletonList("localhost:6667") + */ + public TableSessionPoolBuilder nodeUrls(List nodeUrls); + + /** + * Sets the maximum size of the session pool. + * + * @param maxSize the maximum number of sessions allowed in the pool. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 5 + */ + public TableSessionPoolBuilder maxSize(int maxSize); + + /** + * Sets the username for the connection. + * + * @param user the username. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue "root" + */ + public TableSessionPoolBuilder user(String user); + + /** + * Sets the password for the connection. + * + * @param password the password. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue "root" + */ + public TableSessionPoolBuilder password(String password); + + /** + * Sets the target database name. + * + * @param database the database name. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue "root" + */ + public TableSessionPoolBuilder database(String database); + + /** + * Sets the query timeout in milliseconds. + * + * @param queryTimeoutInMs the query timeout in milliseconds. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 60000 (1 minute) + */ + public TableSessionPoolBuilder queryTimeoutInMs(long queryTimeoutInMs); + + /** + * Sets the fetch size for query results. + * + * @param fetchSize the fetch size. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 5000 + */ + public TableSessionPoolBuilder fetchSize(int fetchSize); + + /** + * Sets the {@link ZoneId} for timezone-related operations. + * + * @param zoneId the {@link ZoneId}. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue ZoneId.systemDefault() + */ + public TableSessionPoolBuilder zoneId(ZoneId zoneId); + + /** + * Sets the timeout for waiting to acquire a session from the pool. + * + * @param waitToGetSessionTimeoutInMs the timeout duration in milliseconds. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 30000 (30 seconds) + */ + public TableSessionPoolBuilder waitToGetSessionTimeoutInMs(long waitToGetSessionTimeoutInMs); + + /** + * Sets the default buffer size for the Thrift client. + * + * @param thriftDefaultBufferSize the buffer size in bytes. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 1024 (1 KB) + */ + public TableSessionPoolBuilder thriftDefaultBufferSize(int thriftDefaultBufferSize); + + /** + * Sets the maximum frame size for the Thrift client. + * + * @param thriftMaxFrameSize the maximum frame size in bytes. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 64 * 1024 * 1024 (64 MB) + */ + public TableSessionPoolBuilder thriftMaxFrameSize(int thriftMaxFrameSize); + + /** + * Enables or disables compression for the connection. 
+ * + * @param enableCompression whether to enable compression. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue false + */ + public TableSessionPoolBuilder enableCompression(boolean enableCompression); + + /** + * Enables or disables redirection for cluster nodes. + * + * @param enableRedirection whether to enable redirection. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue true + */ + public TableSessionPoolBuilder enableRedirection(boolean enableRedirection); + + /** + * Sets the connection timeout in milliseconds. + * + * @param connectionTimeoutInMs the connection timeout in milliseconds. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 10000 (10 seconds) + */ + public TableSessionPoolBuilder connectionTimeoutInMs(int connectionTimeoutInMs); + + /** + * Enables or disables automatic fetching of available DataNodes. + * + * @param enableAutoFetch whether to enable automatic fetching. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue true + */ + public TableSessionPoolBuilder enableAutoFetch(boolean enableAutoFetch); + + /** + * Sets the maximum number of retries for connection attempts. + * + * @param maxRetryCount the maximum retry count. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 60 + */ + public TableSessionPoolBuilder maxRetryCount(int maxRetryCount); + + /** + * Sets the interval between retries in milliseconds. + * + * @param retryIntervalInMs the interval in milliseconds. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue 500 milliseconds + */ + public TableSessionPoolBuilder retryIntervalInMs(long retryIntervalInMs); + + /** + * Enables or disables SSL for secure connections. + * + * @param useSSL whether to enable SSL. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue false + */ + public TableSessionPoolBuilder useSSL(boolean useSSL); + + /** + * Sets the trust store path for SSL connections. + * + * @param keyStore the trust store path. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue null + */ + public TableSessionPoolBuilder trustStore(String keyStore); + + /** + * Sets the trust store password for SSL connections. + * + * @param keyStorePwd the trust store password. + * @return the current {@link TableSessionPoolBuilder} instance. + * @defaultValue null + */ + public TableSessionPoolBuilder trustStorePwd(String keyStorePwd); +} +``` \ No newline at end of file diff --git a/src/UserGuide/latest-Table/API/Programming-Python-Native-API.md b/src/UserGuide/latest-Table/API/Programming-Python-Native-API.md new file mode 100644 index 000000000..5ed77d6da --- /dev/null +++ b/src/UserGuide/latest-Table/API/Programming-Python-Native-API.md @@ -0,0 +1,448 @@ + + +IoTDB provides a Python native client driver and a session pool management mechanism. These tools allow developers to interact with IoTDB in a programmatic and efficient manner. Using the Python API, developers can encapsulate time-series data into objects (e.g., `Tablet`, `NumpyTablet`) and insert them into the database directly, without the need to manually construct SQL statements. For multi-threaded operations, the `TableSessionPool` is recommended to optimize resource utilization and enhance performance. 
+
+## Prerequisites
+
+To use the IoTDB Python API, install the required package using pip:
+
+```Bash
+pip3 install apache-iotdb
+```
+
+## Read and Write Operations
+
+### TableSession
+
+`TableSession` is a core class in IoTDB, enabling users to interact with the IoTDB database. It provides methods to execute SQL statements, insert data, and manage database sessions.
+
+#### Method Overview
+
+| **Method Name** | **Description** | **Parameter Type** | **Return Type** |
+| --------------------------- | ----------------------------------------------------- | ------------------------------------ | ---------------- |
+| insert | Inserts data into the database. | tablet: `Union[Tablet, NumpyTablet]` | None |
+| execute_non_query_statement | Executes non-query SQL statements like DDL/DML. | sql: `str` | None |
+| execute_query_statement | Executes a query SQL statement and retrieves results. | sql: `str` | `SessionDataSet` |
+| close | Closes the session and releases resources. | None | None |
+
+#### Sample Code
+
+```Python
+class TableSession(object):
+    def insert(self, tablet: Union[Tablet, NumpyTablet]):
+        """
+        Insert data into the database.
+
+        Parameters:
+            tablet (Tablet | NumpyTablet): The tablet containing the data to be inserted.
+                Accepts either a `Tablet` or `NumpyTablet`.
+
+        Raises:
+            IoTDBConnectionException: If there is an issue with the database connection.
+        """
+        pass
+
+    def execute_non_query_statement(self, sql: str):
+        """
+        Execute a non-query SQL statement.
+
+        Parameters:
+            sql (str): The SQL statement to execute. Typically used for commands
+                such as INSERT, DELETE, or UPDATE.
+
+        Raises:
+            IoTDBConnectionException: If there is an issue with the database connection.
+        """
+        pass
+
+    def execute_query_statement(self, sql: str, timeout_in_ms: int = 0) -> "SessionDataSet":
+        """
+        Execute a query SQL statement and return the result set.
+
+        Parameters:
+            sql (str): The SQL query to execute.
+            timeout_in_ms (int, optional): Timeout for the query in milliseconds. Defaults to 0,
+                which means no timeout.
+
+        Returns:
+            SessionDataSet: The result set of the query.
+
+        Raises:
+            IoTDBConnectionException: If there is an issue with the database connection.
+        """
+        pass
+
+    def close(self):
+        """
+        Close the session and release resources.
+
+        Raises:
+            IoTDBConnectionException: If there is an issue closing the connection.
+        """
+        pass
+```
+
+### TableSessionConfig
+
+`TableSessionConfig` is a configuration class that sets parameters for creating a `TableSession` instance, defining essential settings for connecting to the IoTDB database.
+
+#### Parameter Configuration
+
+| **Parameter** | **Description** | **Type** | **Default Value** |
+| ------------------ | ------------------------------------- | -------- | ------------------------- |
+| node_urls | List of database node URLs. | `list` | `["localhost:6667"]` |
+| username | Username for the database connection. | `str` | `"root"` |
+| password | Password for the database connection. | `str` | `"root"` |
+| database | Target database to connect to. | `str` | `None` |
+| fetch_size | Number of rows to fetch per query. | `int` | `5000` |
+| time_zone | Default session time zone. | `str` | `Session.DEFAULT_ZONE_ID` |
+| enable_compression | Enable data compression. | `bool` | `False` |
+
+#### Sample Code
+
+```Python
+class TableSessionConfig(object):
+    """
+    Configuration class for a TableSession.
+
+    This class defines various parameters for connecting to and interacting
+    with the IoTDB tables.
+ """ + + def __init__( + self, + node_urls: list = None, + username: str = Session.DEFAULT_USER, + password: str = Session.DEFAULT_PASSWORD, + database: str = None, + fetch_size: int = 5000, + time_zone: str = Session.DEFAULT_ZONE_ID, + enable_compression: bool = False, + ): + """ + Initialize a TableSessionConfig object with the provided parameters. + + Parameters: + node_urls (list, optional): A list of node URLs for the database connection. + Defaults to ["localhost:6667"]. + username (str, optional): The username for the database connection. + Defaults to "root". + password (str, optional): The password for the database connection. + Defaults to "root". + database (str, optional): The target database to connect to. Defaults to None. + fetch_size (int, optional): The number of rows to fetch per query. Defaults to 5000. + time_zone (str, optional): The default time zone for the session. + Defaults to Session.DEFAULT_ZONE_ID. + enable_compression (bool, optional): Whether to enable data compression. + Defaults to False. + """ +``` + +**Note:** After using a `TableSession`, make sure to call the `close` method to release resources. + +## Session Pool + +### TableSessionPool + +`TableSessionPool` is a session pool management class designed for creating and managing `TableSession` instances. It provides functionality to retrieve sessions from the pool and close the pool when it is no longer needed. + +#### Method Overview + +| **Method Name** | **Description** | **Return Type** | **Exceptions** | +| --------------- | ------------------------------------------------------ | --------------- | -------------- | +| get_session | Retrieves a new `TableSession` instance from the pool. | `TableSession` | None | +| close | Closes the session pool and releases all resources. | None | None | + +#### Sample Code + +```Java +def get_session(self) -> TableSession: + """ + Retrieve a new TableSession instance. + + Returns: + TableSession: A new session object configured with the session pool. + + Notes: + The session is initialized with the underlying session pool for managing + connections. Ensure proper usage of the session's lifecycle. + """ + +def close(self): + """ + Close the session pool and release all resources. + + This method closes the underlying session pool, ensuring that all + resources associated with it are properly released. + + Notes: + After calling this method, the session pool cannot be used to retrieve + new sessions, and any attempt to do so may raise an exception. + """ +``` + +### TableSessionPoolConfig + +`TableSessionPoolConfig` is a configuration class used to define parameters for initializing and managing a `TableSessionPool` instance. It specifies the settings needed for efficient session pool management in IoTDB. + +#### Parameter Configuration + +| **Paramater** | **Description** | **Type** | **Default Value** | +| ------------------ | ------------------------------------------------------------ | -------- | -------------------------- | +| node_urls | List of IoTDB cluster node URLs. | `list` | None | +| max_pool_size | Maximum size of the session pool, i.e., the maximum number of sessions allowed in the pool. | `int` | `5` | +| username | Username for the connection. | `str` | `Session.DEFAULT_USER` | +| password | Password for the connection. | `str` | `Session.DEFAULT_PASSWORD` | +| database | Target database to connect to. 
+| fetch_size | Fetch size for query results. | `int` | `5000` |
+| time_zone | Timezone-related `ZoneId`. | `str` | `Session.DEFAULT_ZONE_ID` |
+| enable_redirection | Whether to enable redirection. | `bool` | `False` |
+| enable_compression | Whether to enable data compression. | `bool` | `False` |
+| wait_timeout_in_ms | Maximum time (in milliseconds) to wait for a session to become available. | `int` | `10000` |
+| max_retry | Maximum number of connection retry attempts. | `int` | `3` |
+
+#### Sample Code
+
+```Python
+class TableSessionPoolConfig(object):
+    """
+    Configuration class for a TableSessionPool.
+
+    This class defines the parameters required to initialize and manage
+    a session pool for interacting with the IoTDB database.
+    """
+    def __init__(
+        self,
+        node_urls: list = None,
+        max_pool_size: int = 5,
+        username: str = Session.DEFAULT_USER,
+        password: str = Session.DEFAULT_PASSWORD,
+        database: str = None,
+        fetch_size: int = 5000,
+        time_zone: str = Session.DEFAULT_ZONE_ID,
+        enable_redirection: bool = False,
+        enable_compression: bool = False,
+        wait_timeout_in_ms: int = 10000,
+        max_retry: int = 3,
+    ):
+        """
+        Initialize a TableSessionPoolConfig object with the provided parameters.
+
+        Parameters:
+            node_urls (list, optional): A list of node URLs for the database connection.
+                Defaults to None.
+            max_pool_size (int, optional): The maximum number of sessions in the pool.
+                Defaults to 5.
+            username (str, optional): The username for the database connection.
+                Defaults to Session.DEFAULT_USER.
+            password (str, optional): The password for the database connection.
+                Defaults to Session.DEFAULT_PASSWORD.
+            database (str, optional): The target database to connect to. Defaults to None.
+            fetch_size (int, optional): The number of rows to fetch per query. Defaults to 5000.
+            time_zone (str, optional): The default time zone for the session pool.
+                Defaults to Session.DEFAULT_ZONE_ID.
+            enable_redirection (bool, optional): Whether to enable redirection.
+                Defaults to False.
+            enable_compression (bool, optional): Whether to enable data compression.
+                Defaults to False.
+            wait_timeout_in_ms (int, optional): The maximum time (in milliseconds) to wait for a session
+                to become available. Defaults to 10000.
+            max_retry (int, optional): The maximum number of retry attempts for operations. Defaults to 3.
+        """
+```
+
+**Notes:**
+
+- Ensure that `TableSession` instances retrieved from the `TableSessionPool` are properly closed after use.
+- After closing the `TableSessionPool`, it will no longer be possible to retrieve new sessions.
+
+## Sample Code
+
+**Session** Example: You can find the full example code at [GitHub Repository](https://github.com/apache/iotdb/blob/master/iotdb-client/client-py/table_model_session_example.py).
+
+**Session Pool** Example: You can find the full example code at [GitHub Repository](https://github.com/apache/iotdb/blob/master/iotdb-client/client-py/table_model_session_pool_example.py).
+
+Here is an excerpt of the sample code:
+
+```Python
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+import threading
+
+import numpy as np
+
+from iotdb.table_session_pool import TableSessionPool, TableSessionPoolConfig
+from iotdb.utils.IoTDBConstants import TSDataType
+from iotdb.utils.NumpyTablet import NumpyTablet
+from iotdb.utils.Tablet import ColumnType, Tablet
+
+
+def prepare_data():
+    print("create database")
+    # Get a session from the pool
+    session = session_pool.get_session()
+    session.execute_non_query_statement("CREATE DATABASE IF NOT EXISTS db1")
+    session.execute_non_query_statement('USE "db1"')
+    session.execute_non_query_statement(
+        "CREATE TABLE table0 (id1 string id, attr1 string attribute, "
+        + "m1 double "
+        + "measurement)"
+    )
+    session.execute_non_query_statement(
+        "CREATE TABLE table1 (id1 string id, attr1 string attribute, "
+        + "m1 double "
+        + "measurement)"
+    )
+
+    print("now the tables are:")
+    # show result
+    res = session.execute_query_statement("SHOW TABLES")
+    while res.has_next():
+        print(res.next())
+
+    session.close()
+
+
+def insert_data(num: int):
+    print("insert data for table" + str(num))
+    # Get a session from the pool
+    session = session_pool.get_session()
+    column_names = [
+        "id1",
+        "attr1",
+        "m1",
+    ]
+    data_types = [
+        TSDataType.STRING,
+        TSDataType.STRING,
+        TSDataType.DOUBLE,
+    ]
+    column_types = [ColumnType.ID, ColumnType.ATTRIBUTE, ColumnType.MEASUREMENT]
+    timestamps = []
+    values = []
+    for row in range(15):
+        timestamps.append(row)
+        values.append(["id:" + str(row), "attr:" + str(row), row * 1.0])
+    tablet = Tablet(
+        "table" + str(num), column_names, data_types, values, timestamps, column_types
+    )
+    session.insert(tablet)
+    session.execute_non_query_statement("FLUSH")
+
+    np_timestamps = np.arange(15, 30, dtype=np.dtype(">i8"))
+    np_values = [
+        np.array(["id:{}".format(i) for i in range(15, 30)]),
+        np.array(["attr:{}".format(i) for i in range(15, 30)]),
+        np.linspace(15.0, 29.0, num=15, dtype=TSDataType.DOUBLE.np_dtype()),
+    ]
+
+    np_tablet = NumpyTablet(
+        "table" + str(num),
+        column_names,
+        data_types,
+        np_values,
+        np_timestamps,
+        column_types=column_types,
+    )
+    session.insert(np_tablet)
+    session.close()
+
+
+def query_data():
+    # Get a session from the pool
+    session = session_pool.get_session()
+
+    print("get data from table0")
+    res = session.execute_query_statement("select * from table0")
+    while res.has_next():
+        print(res.next())
+
+    print("get data from table1")
+    res = session.execute_query_statement("select * from table1")
+    while res.has_next():
+        print(res.next())
+
+    session.close()
+
+
+def delete_data():
+    session = session_pool.get_session()
+    session.execute_non_query_statement("drop database db1")
+    print("data has been deleted. now the databases are:")
diff --git a/src/UserGuide/latest-Table/IoTDB-Introduction/Release-history_timecho.md b/src/UserGuide/latest-Table/IoTDB-Introduction/Release-history_timecho.md
new file mode 100644
index 000000000..9cf91d467
--- /dev/null
+++ b/src/UserGuide/latest-Table/IoTDB-Introduction/Release-history_timecho.md
@@ -0,0 +1,299 @@
+
+
+### TimechoDB (Database Core)
+
+#### **V1.3.4.1**
+
+> **Release Date**: January 8, 2025
+>
+> **Download**: Please contact the Timecho team for download.
+
+Version V1.3.4.1 introduces a pattern-matching function and further optimizes the data subscription mechanism for improved stability. The `import-data` and `export-data` scripts have been enhanced to support additional data types and unified to support importing and exporting `TsFile`, `CSV`, and `SQL` formats. Meanwhile, comprehensive improvements have been made to database monitoring, performance, and stability. The specific release contents are as follows:
+
+- **Query Module**: Users can configure UDF, PipePlugin, Trigger, and AINode settings and load JAR packages via a URI.
+- **System Module**:
+  - Expanded the set of built-in UDFs.
+  - Added the `pattern_match` function for pattern matching.
+- **Data Synchronization**: Supports specifying authentication information on the sender side.
+- **Ecosystem Integration**: Kubernetes Operator compatibility.
+- **Scripts & Tools**:
+  - `import-data`/`export-data` scripts now support new data types (strings, large binary objects, dates, timestamps).
+  - Unified import/export compatibility for TsFile, CSV, and SQL formats.
+
+#### **V1.3.3.3**
+
+> **Release Date**: October 31, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+Version V1.3.3.3 adds the following features: optimized restart and recovery performance to reduce startup time; the `DataNode` actively listens for and loads `TsFile` data; additional observability metrics; automatic loading on the receiver side once the sender transfers files to a specified directory; and `ALTER PIPE` support for the `ALTER SOURCE` capability. At the same time, comprehensive improvements have been made to database monitoring, performance, and stability. The specific release contents are as follows:
+
+- **Data Synchronization**:
+  - Automatic data type conversion on the receiver side.
+  - Enhanced observability with ops/latency metrics for internal interfaces.
+  - OPC-UA-SINK plugin now supports CS mode and non-anonymous access.
+- **Data Subscription**: SDK supports `CREATE IF NOT EXISTS` and `DROP IF EXISTS` interfaces.
+- **Stream Processing**: `ALTER PIPE` supports the `ALTER SOURCE` capability.
+- **System Module**: Added latency monitoring for REST modules.
+- **Scripts & Tools**:
+  - Auto-loading `TsFile` from specified directories.
+  - `import-tsfile` script supports remote server execution.
+  - Added Kubernetes Helm support.
+  - Python client now supports new data types (strings, large binary objects, dates, timestamps).
+
+#### **V1.3.3.2**
+
+> **Release Date**: August 15, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+Version V1.3.3.2 can now report the time spent reading `mods` files, the memory used for merge-sorting out-of-order data during ingestion, and dispatch latency. It also allows the time partition origin to be adjusted via configuration, and supports automatically terminating subscriptions once the end-marker of historical pipe data has been processed. Module-level memory control brings further performance improvements. The specific release contents are as follows:
+
+- **Query Module**:
+  - `EXPLAIN ANALYZE` now reports time spent reading mods files.
+  - Metrics for merge-sort memory usage and dispatch latency.
+- **Storage Module**: Added configurable time partition origin adjustment.
+- **Stream Processing**: Auto-terminate subscriptions based on pipe history markers.
+- **Data Synchronization**: RPC compression now supports configurable levels.
+- **Scripts & Tools**: Metadata export excludes only `root.__system`, not similar prefixes.
+
+#### **V1.3.3.1**
+
+> **Release Date**: July 12, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+In version V1.3.3.1, a throttling mechanism is added to multi-tier storage. Data synchronization supports specifying username and password authentication for the receiver at the sender's sink. Some unclear WARN logs on the data synchronization receiver side are optimized, restart-recovery performance is enhanced, and startup time is reduced. Meanwhile, the configuration files are merged. The specific release contents are as follows:
+
+- **Storage Module**: Rate-limiting added to multi-tier storage.
+- **Data Synchronization**: Sender-side username/password authentication for receivers.
+- **System Module**:
+  - Merged configuration files into `iotdb-system.properties`.
+  - Optimized restart recovery time.
+- **Query Module**:
+  - Improved filter performance for aggregation and WHERE clauses.
+  - Java Session client distributes SQL query requests evenly to all nodes.
+
+#### **V1.3.2.2**
+
+> **Release Date**: June 4, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.3.2.2 version introduces the Explain Analyze statement for analyzing the execution time of a single `SQL` query, a User-Defined Aggregate Function (`UDAF`) framework, automatic data deletion when disk space reaches a set threshold, schema synchronization, counting data points in specified paths, and `SQL` script import/export functionality. The cluster management tool now supports rolling upgrades and plugin deployment across the entire cluster. Comprehensive improvements have also been made to database monitoring, performance, and stability. The specific release content is as follows:
+
+**Storage Module:**
+
+- Improved write performance of the `insertRecords` interface.
+- Added `SpaceTL` functionality to automatically delete data when disk space reaches a set threshold.
+
+**Query Module:**
+
+- Added the `Explain Analyze` statement to monitor the execution time of each stage of a single SQL query.
+- Introduced a User-Defined Aggregate Function (UDAF) framework.
+- Added envelope demodulation analysis in UDF.
+- Added `MaxBy`/`MinBy` functions to return the corresponding timestamp while obtaining the maximum/minimum value.
+- Improved performance of value filter queries.
+
+**Data Synchronization:**
+
+- Path matching now supports wildcards.
+- Schema synchronization is now supported, including time series and related attributes, permissions, and other settings.
+
+**Stream Processing:**
+
+- Added the `Alter Pipe` statement to support hot updates of Pipe task plugins.
+
+**System Module:**
+
+- Enhanced system data point counting to include statistics for `load TsFile` operations.
+
+**Scripts and Tools:**
+
+- Added a local upgrade backup tool that uses hard links to back up existing data.
+- Introduced `export-data`/`import-data` scripts to support data export in `CSV` or `TsFile` format, or as `SQL` statements.
+- The Windows environment now supports distinguishing `ConfigNode`, `DataNode`, and `Cli` by window name.
+
+#### **V1.3.1.4**
+
+> **Release Date**: April 23, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.3.1.4 release introduces several new features and enhancements, including the ability to view system activation status, built-in variance and standard deviation aggregate functions, timeout settings for the built-in `Fill` statement, and a `TsFile` repair command. Additionally, one-click scripts for collecting instance information and starting/stopping the cluster have been added. The usability and performance of views and stream processing have also been optimized. The specific release content is as follows:
+
+**Query Module:**
+
+- The `Fill` clause now supports setting a fill timeout threshold; no fill will occur if the time threshold is exceeded.
+- The `REST API` (V2 version) now returns column types.
+
+**Data Synchronization:**
+
+- Simplified the way to specify time ranges for data synchronization by directly setting start and end times.
+- Data synchronization now supports the `SSL` transport protocol (via the `iotdb-thrift-ssl-sink` plugin).
+
+**System Module:**
+
+- Added the ability to query cluster activation information using SQL.
+- Added transmission rate control during data migration in multi-tier storage.
+- Enhanced system observability (added divergence monitoring for cluster nodes and observability for the distributed task scheduling framework).
+- Optimized the default log output strategy.
+
+**Scripts and Tools:**
+
+- Added one-click scripts to start and stop the cluster (`start-all.sh`/`stop-all.sh` and `start-all.bat`/`stop-all.bat`).
+- Added one-click scripts to collect instance information (`collect-info.sh` and `collect-info.bat`).
+
+#### **V1.3.0.4**
+
+> **Release Date**: January 3, 2024
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.3.0.4 release introduces a new native machine learning framework, `AINode`, a comprehensive upgrade of the permission module to support sequence-granularity permissions, and numerous detail optimizations for views and stream processing. These enhancements further improve usability, version stability, and overall performance. The specific release content is as follows:
+
+**Query Module:**
+
+- Added the `AINode` native machine learning module.
+- Optimized the performance of the `show paths` statement to reduce response time.
+
+**Security Module:**
+
+- Upgraded the permission module to support permission settings at the time-series granularity.
+- Added `SSL` communication encryption between clients and servers.
+
+**Stream Processing:**
+
+- Added multiple new metrics for monitoring in the stream processing module.
+
+**Query Module:**
+
+- Non-writable view sequences now support `LAST` queries.
+- Optimized the accuracy of data point monitoring statistics.
+
+#### **V1.2.0.1**
+
+> **Release Date**: June 30, 2023
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.2.0.1 release introduces several new features, including a new stream processing framework, dynamic templates, and built-in query functions such as `substring`, `replace`, and `round`. It also enhances the functionality of built-in statements like `show region`, `show timeseries`, and `show variable`, as well as the Session interface. Additionally, it optimizes built-in monitoring items and their implementation, and fixes several product bugs and performance issues. The specific release content is as follows:
+
+**Stream Processing:**
+
+- Added a new stream processing framework.
+
+**Schema Module:**
+
+- Added dynamic template expansion functionality.
+
+**Storage Module:**
+
+- Added SPRINTZ and RLBE encoding, as well as the LZMA2 compression algorithm.
+
+**Query Module:**
+
+- Added built-in scalar functions: `cast`, `round`, `substr`, `replace`.
+- Added built-in aggregate functions: `time_duration`, `mode`.
+- SQL statements now support `CASE WHEN` syntax.
+- SQL statements now support `ORDER BY` expressions.
+
+**Interface Module:**
+
+- Python API now supports connecting to multiple distributed nodes.
+- Python client now supports write redirection.
+- The Session API added an interface for creating time series in batches using templates.
+
+#### **V1.1.0.1**
+
+> **Release Date**: April 3, 2023
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.1.0.1 release introduces several new features, including support for `GROUP BY VARIATION`, `GROUP BY CONDITION`, and useful functions like `DIFF` and `COUNT_IF`. It also introduces the pipeline execution engine to further improve query speed. Additionally, it fixes several issues related to last query alignment, `LIMIT` and `OFFSET` functionality, metadata template errors after restart, and sequence creation errors after deleting all databases. The specific release content is as follows:
+
+**Query Module:**
+
+- `ALIGN BY DEVICE` statements now support `ORDER BY TIME`.
+- Added support for the `SHOW QUERIES` command.
+- Added support for the `KILL QUERY` command.
+
+**System Module:**
+
+- `SHOW REGIONS` now supports specifying a particular database.
+- Added the `SHOW VARIABLES` SQL command to display current cluster parameters.
+- Aggregation queries now support `GROUP BY VARIATION`.
+- `SELECT INTO` now supports explicit data type conversion.
+- Implemented the built-in scalar function `DIFF`.
+- `SHOW REGIONS` now displays creation time.
+- Implemented the built-in aggregate function `COUNT_IF`.
+- Aggregation queries now support `GROUP BY CONDITION`.
+- Added support for modifying `dn_rpc_port` and `dn_rpc_address`.
+
+#### **V1.0.0.1**
+
+> **Release Date**: December 3, 2022
+>
+> **Download**: Please contact the Timecho team for download.
+
+The V1.0.0.1 release focuses on fixing issues related to partition computation and query execution, undeleted historical snapshots, data query problems, and SessionPool memory usage. It also introduces several new features, such as support for `SHOW VARIABLES`, `EXPLAIN ALIGN BY DEVICE`, and enhanced functionality for ExportCSV/ExportTsFile/MQTT. Additionally, it improves the cluster startup/shutdown process, changes the default internal ports of the IoTDB cluster, and adds the `cluster_name` attribute to distinguish clusters. The specific release content is as follows:
+
+**System Module:**
+
+- Added support for distributed high-availability architecture.
+- Added support for multi-replica storage.
+- If a port is already in use, the node startup process will be terminated.
+- Added cluster management SQL.
+- Added functional management for starting, stopping, and removing ConfigNodes and DataNodes.
+- Configurable consensus protocol framework and multiple consensus protocols: Simple, IoTConsensus, Ratis.
+- Added multi-replica management for data, schema, and ConfigNodes.
+
+**Query Module:**
+
+- Added support for the large-scale parallel processing framework MPP, providing distributed read/write capabilities.
+
+**Stream Processing Module:**
+
+- Added support for the stream processing framework.
+- Added support for data synchronization between clusters.
+
+### Workbench (Console Tool)
+
+| Version | Key New Features | Supported IoTDB Versions |
+| :------ | :--------------------------------------- | :----------------------- |
+| V1.5.1 | AI analysis, pattern matching | V1.3.2+ |
+| V1.4.0 | Tree model visualization, English UI | V1.3.2+ |
+| V1.3.1 | Enhanced analysis templates | V1.3.2+ |
+| V1.3.0 | Database configuration tools | V1.3.2+ |
+| V1.2.6 | Improved permission controls | V1.3.1+ |
+| V1.2.5 | Template caching, UI optimizations | V1.3.0+ |
+| V1.2.4 | Data import/export, time alignment | V1.2.2+ |
+| V1.2.3 | Activation details, analysis tools | V1.2.2+ |
+| V1.2.2 | Enhanced measurement point descriptions | V1.2.2+ |
+| V1.2.1 | Sync monitoring panel, Prometheus alerts | V1.2.2+ |
+| V1.2.0 | Major Workbench upgrade | V1.2.0+ |
\ No newline at end of file
diff --git a/src/UserGuide/latest-Table/IoTDB-Introduction/Scenario.md b/src/UserGuide/latest-Table/IoTDB-Introduction/Scenario.md
new file mode 100644
index 000000000..6709c7009
--- /dev/null
+++ b/src/UserGuide/latest-Table/IoTDB-Introduction/Scenario.md
@@ -0,0 +1,80 @@
+
+
+## Scenario 1: Energy & Power
+
+#### **Background**
+
+By collecting, storing, and analyzing massive time-series data from power generation, transmission, storage, and consumption processes—combined with real-time monitoring, accurate forecasting, and intelligent scheduling of power systems—enterprises can significantly improve energy efficiency, reduce operational costs, ensure the safety and sustainability of energy production, and maintain the stable operation of power grids.
+
+#### **Architecture**
+
+IoTDB provides a self-hosted time-series database solution with high availability, efficient data synchronization across networks, and optimized performance for large-scale data ingestion and querying.
It enables power enterprises to handle large-scale time-series data efficiently, supporting real-time anomaly detection, forecasting models, and intelligent scheduling for both traditional and renewable energy sources. + +![](/img/scenario-energy-en.png) + +## Scenario 2: Aerospace + +#### **Background** + +With the rapid evolution of aerospace technology, digital transformation has become essential to improving flight safety and system performance. The aerospace industry generates vast amounts of time-series data throughout the lifecycle of aircraft, rockets, and satellites—from design and manufacturing to testing and operation. Managing and analyzing telemetry data in real time is critical for mission reliability, system optimization, and failure prevention. + +#### **Architecture** + +IoTDB’s high-performance time-series data processing capabilities enable real-time telemetry analysis, low-bandwidth data synchronization, and seamless offline data migration. Its flexible deployment and resource-efficient architecture provide a reliable foundation for aerospace enterprises, facilitating intelligent monitoring, rapid fault diagnosis, and continuous optimization of critical systems. + +![](/img/scenario-aerospace-en.png) + +## Scenario 3: Transportation + +#### **Background** + +The rapid growth of the transportation industry has heightened demand for diversified data management, particularly in critical hubs like railways and subways, where real-time, reliable, and precise data is essential. By leveraging multi-dimensional operational, condition, and geospatial data from trains, subways, ships, and vehicles, enterprises can enable intelligent scheduling, fault prediction, route optimization, and efficient maintenance. These capabilities not only improve operational efficiency but also reduce management costs. + +#### **Architecture** + +IoTDB’s high-throughput time-series database supports low-latency queries, high concurrency, and efficient processing of multi-source heterogeneous data. It provides a scalable foundation for intelligent transportation systems, enabling real-time analytics for vehicle monitoring, traffic flow optimization, and predictive fault detection across large-scale transportation networks. + +![](/img/scenario-transportation-en.png) + +## Scenario 4: Steel & Metallurgy + +#### **Background** + +Facing increasing market competition and stringent environmental regulations, the steel and metallurgy industry is undergoing digital transformation. Industrial IoT platforms play a crucial role in optimizing production efficiency, improving product quality, and reducing energy consumption. Real-time data collection and analysis across smelting equipment, production lines, and supply chains enable intelligent monitoring, predictive maintenance, and precise process control. + +#### **Architecture** + +IoTDB’s powerful data storage and computing capabilities provide cross-platform compatibility, lightweight deployment options, and robust integration with industrial automation systems. Its ability to efficiently handle high-frequency time-series data empowers steel and metallurgy enterprises to implement smart manufacturing solutions and accelerate digitalization. + +![](/img/scenario-steel-en.png) + +## Scenario 5: IoT + +#### **Background** + +The Internet of Things (IoT) is driving digital transformation across industries by enabling real-time device connectivity and intelligent management. 
As IoT deployments scale, enterprises require a time-series data management system capable of processing vast data streams from edge devices to the cloud. Ensuring high-performance data storage, fast querying, and reliable synchronization is crucial for applications such as equipment monitoring, anomaly detection, and predictive maintenance.
+
+#### **Architecture**
+
+As an IoT-native high-performance time-series database, IoTDB supports end-to-end data synchronization and analysis from edge devices to the cloud. With high-concurrency processing capabilities, it meets the demands of large-scale device connectivity. IoTDB provides flexible data solutions to unlock deeper insights from operational data, improve efficiency, and drive comprehensive IoT business growth.
+
+![](/img/scenario-iot-en.png)
\ No newline at end of file
diff --git a/src/UserGuide/latest-Table/IoTDB-Introduction/What-is-timechodb_timecho.md b/src/UserGuide/latest-Table/IoTDB-Introduction/What-is-timechodb_timecho.md
new file mode 100644
index 000000000..0113ec4b5
--- /dev/null
+++ b/src/UserGuide/latest-Table/IoTDB-Introduction/What-is-timechodb_timecho.md
@@ -0,0 +1,297 @@
+
+
+TimechoDB is a high-performance, cost-efficient, and IoT-native time-series database developed by Timecho. As an enterprise-grade extension of Apache IoTDB, it is designed to tackle the complexities of managing large-scale time-series data in IoT environments. These challenges include high-frequency data sampling, massive data volumes, out-of-order data, extended processing times, diverse analytical demands, and high storage and maintenance costs.
+
+TimechoDB enhances Apache IoTDB with superior functionality, optimized performance, enterprise-grade reliability, and an intuitive toolset, enabling industrial users to streamline data operations and unlock deeper insights.
+
+- [Quick Start](../QuickStart/QuickStart_timecho.md): Download, Deploy, and Use
+
+## TimechoDB Data Management Solution
+
+The Timecho ecosystem provides an integrated **collect-store-use** solution, covering the complete lifecycle of time-series data, from acquisition to analysis.
+
+![](/img/Introduction-en-timecho-new.png)
+
+Key components include:
+
+1. **Time-Series Database (TimechoDB)**:
+   1. The primary storage and processing engine for time-series data, based on Apache IoTDB.
+   2. Offers **high compression, advanced query capabilities, real-time stream processing, high availability, and scalability**.
+   3. Provides **security features, multi-language APIs, and seamless integration with external systems**.
+2. **Time-Series Standard File Format (Apache TsFile)**:
+   1. A high-performance storage format originally developed by Timecho’s core contributors.
+   2. Enables **efficient compression and fast querying**.
+   3. Powers TimechoDB’s **data collection, storage, and analysis pipeline**, ensuring unified data management.
+3. **Time-Series AI Engine (AINode)**:
+   1. Integrates **machine learning and deep learning** for time-series analytics.
+   2. Extracts actionable insights directly from TimechoDB-stored data.
+4. **Data Collection Framework**:
+   1. Supports **various industrial protocols, resumable transfers, and network barrier penetration**.
+   2. Facilitates **reliable data acquisition in challenging industrial environments**.
+
+## TimechoDB Architecture
+
+The diagram below illustrates a common cluster deployment (3 ConfigNodes, 3 DataNodes) of TimechoDB:
+
+![](/img/Cluster-Concept03.png)
+
+### Key Features
+
+TimechoDB offers the following advantages:
+
+**Flexible Deployment:**
+
+- Supports one-click cloud deployment, on-premise installation, and seamless terminal-cloud synchronization.
+- Adapts to hybrid, edge, and cloud-native architectures.
+
+**Cost-Efficient Storage:**
+
+- Utilizes high compression ratio storage, eliminating the need for separate real-time and historical databases.
+- Supports unified data management across different time horizons.
+
+**Hierarchical Data Organization:**
+
+- Mirrors real-world industrial structures through hierarchical measurement point modeling.
+- Enables directory-based navigation, search, and retrieval.
+
+**High-Throughput Read & Write:**
+
+- Optimized for millions of concurrent device connections.
+- Handles multi-frequency and out-of-order data ingestion with high efficiency.
+
+**Advanced Time-Series Query Semantics:**
+
+- Features a native time-series computation engine with built-in timestamp alignment.
+- Provides nearly 100 aggregation and analytical functions, enabling AI-powered time-series insights.
+
+**Enterprise-Grade High Availability:**
+
+- Distributed HA architecture ensures 24/7 real-time database services.
+- Automated resource balancing when nodes are added, removed, or overloaded.
+- Supports heterogeneous clusters with varying hardware configurations.
+
+**Operational Simplicity:**
+
+- Standard SQL query syntax for ease of use.
+- Multi-language APIs for flexible development.
+- Comes with a comprehensive toolset, including an intuitive management console.
+
+**Robust Ecosystem Integration:**
+
+- Seamlessly integrates with big data frameworks (Hadoop, Spark) and visualization tools (Grafana, ThingsBoard, DataEase).
+- Supports device management for industrial IoT environments.
+
+### Enterprise-level Enhancements
+
+TimechoDB extends the open-source version with advanced industrial-grade capabilities, including tiered storage, cloud-edge collaboration, visualization tools, and security upgrades.
+
+**Dual-Active Deployment:**
+
+- Implements active-active high availability, ensuring continuous operations.
+- Two independent clusters perform real-time bidirectional synchronization.
+- Both systems accept external writes and maintain eventual consistency.
+
+**Seamless Data Synchronization:**
+
+- Built-in synchronization module supports real-time and batch data aggregation from field devices to central hubs.
+- Supports full, partial, and cascading aggregation.
+- Includes enterprise-ready plugins for cross air-gap transmission, encrypted transmission, and compression.
+
+**Tiered Storage:**
+
+- Dynamically categorizes data into hot, warm, and cold tiers.
+- Efficiently balances SSD, HDD, and cloud storage utilization.
+- Automatically optimizes data access speed and storage costs.
+
+**Enhanced Security:**
+
+- Implements whitelist-based access control and audit logging.
+- Strengthens data governance and risk mitigation.
+
+**Feature Comparison**:
+
+| **Function** | | **Apache IoTDB** | **TimechoDB** |
+| :--------------------- | :--------------------------------------- | :--------------------------------------------- | :-------------------------------------------------------------------------------- |
+| Deployment Mode | Stand-Alone Deployment | √ | √ |
+| | Distributed Deployment | √ | √ |
+| | Dual Active Deployment | × | √ |
+| | Container Deployment | Partial support | √ |
+| Database Functionality | Sensor Management | √ | √ |
+| | Write Data | √ | √ |
+| | Query Data | √ | √ |
+| | Continuous Query | √ | √ |
+| | Trigger | √ | √ |
+| | User Defined Function | √ | √ |
+| | Permission Management | √ | √ |
+| | Data Synchronisation | Only file synchronization, no built-in plugins | Real time synchronization + file synchronization, enriched with built-in plugins |
+| | Stream Processing | Only framework, no built-in plugins | Framework + rich built-in plugins |
+| | Tiered Storage | × | √ |
+| | View | × | √ |
+| | White List | × | √ |
+| | Audit Log | × | √ |
+| Supporting Tools | Workbench | × | √ |
+| | Cluster Management Tool | × | √ |
+| | System Monitor Tool | × | √ |
+| Localization | Localization Compatibility Certification | × | √ |
+| Technical Support | Best Practices | × | √ |
+| | Use Training | × | √ |
+
+#### Higher Efficiency and Stability
+
+TimechoDB achieves up to 10x performance improvements over Apache IoTDB in mission-critical workloads, and provides rapid fault recovery for industrial environments.
+
+#### Comprehensive Management Tools
+
+TimechoDB simplifies deployment, monitoring, and maintenance through an intuitive toolset:
+
+- **Cluster Monitoring Dashboard**:
+  - Real-time insights into IoTDB and underlying OS health.
+  - 100+ performance metrics for in-depth monitoring and optimization.
+
+  ![](/img/Introduction01.png)
+
+  ![](/img/Introduction02.png)
+
+  ![](/img/Introduction03.png)
+
+- **Database Console**:
+  - Simplifies interaction with an intuitive GUI for metadata management, SQL execution, user permissions, and system configuration.
+- **Cluster Management Tool**:
+  - Provides **one-click operations** for cluster deployment, scaling, start/stop, and configuration updates.
+
+#### Professional Enterprise Technical Services
+
+TimechoDB offers **vendor-backed enterprise services** to support industrial-scale deployments:
+
+- **On-Site Installation & Training**: Hands-on guidance for fast adoption.
+- **Expert Consulting & Advisory**: Performance tuning and best practices.
+- **Emergency Support & Remote Assistance**: Minimized downtime for mission-critical operations.
+- **Custom Development & Optimization**: Tailored solutions for unique industrial use cases.
+
+Compared to the open-source version’s 2-3 month release cycle, TimechoDB delivers faster updates and same-day critical issue resolutions, ensuring production stability.
+
+#### Ecosystem Compatibility & Compliance
+
+TimechoDB is self-developed, supports mainstream CPUs & operating systems, and meets industry compliance standards, making it a reliable choice for enterprise IoT deployments.
\ No newline at end of file
diff --git a/src/UserGuide/latest-Table/Technical-Insider/Cluster-data-partitioning.md b/src/UserGuide/latest-Table/Technical-Insider/Cluster-data-partitioning.md
new file mode 100644
index 000000000..2a3f54fe7
--- /dev/null
+++ b/src/UserGuide/latest-Table/Technical-Insider/Cluster-data-partitioning.md
@@ -0,0 +1,125 @@
+
+
+
+This document introduces the partitioning strategies and load balancing strategies in IoTDB. According to the characteristics of time series data, IoTDB partitions it along the series and time dimensions. Combining a series partition with a time partition yields a data partition, the basic unit of division. To enhance throughput and reduce management costs, these partitions are evenly allocated to RegionGroups, which serve as the unit of replication. The RegionGroup's Regions then determine the storage location, with the leader Region handling the primary load. During this process, the Region placement strategy determines which nodes will host the replicas, while the leader selection strategy designates which Region will act as the leader.
+
+### Partitioning Strategy and Partition Allocation
+
+IoTDB implements a tailored partitioning algorithm for time-series data. Based on this, the partition information cached on the ConfigNode and DataNode is not only easy to manage but also clearly distinguishes between hot and cold data. Subsequently, balanced partitions are evenly distributed across the RegionGroups in the cluster to achieve storage balance.
+
+#### Partitioning Strategy
+
+IoTDB maps each sensor in a production environment to a time series.
It then uses a **series partitioning algorithm** to partition the time series for schema management and a **time partitioning algorithm** to manage the data. The figure below illustrates how IoTDB partitions time-series data.
+
+![](/img/partition_table_en.png)
+
+##### Partitioning Algorithms
+
+Since a large number of devices and sensors are typically deployed in production environments, IoTDB uses a series partitioning algorithm to ensure that the size of partition information remains manageable. As the generated time series are associated with timestamps, IoTDB uses a time partitioning algorithm to clearly distinguish between hot and cold partitions.
+
+###### Series Partitioning Algorithm
+
+By default, IoTDB limits the number of series partitions to 1,000 and configures the series partitioning algorithm as a **hash partitioning algorithm**. This provides the following benefits:
+
+- The number of series partitions is a fixed constant, ensuring stable mapping between series and series partitions. Thus, IoTDB does not require frequent data migration.
+- The load on series partitions is relatively balanced, as the number of series partitions is much smaller than the number of sensors deployed in production environments.
+
+Furthermore, if the actual load in the production environment can be estimated more accurately, the series partitioning algorithm can be configured as a custom hash or list partitioning algorithm to achieve a more uniform load distribution across all series partitions.
+
+###### Time Partitioning Algorithm
+
+The time partitioning algorithm converts a given timestamp into the corresponding time partition using the following formula:
+
+$$\left\lfloor\frac{\text{Timestamp} - \text{StartTimestamp}}{\text{TimePartitionInterval}}\right\rfloor$$
+
+In this formula, $\text{StartTimestamp}$ and $\text{TimePartitionInterval}$ are configurable parameters to adapt to different production environments. $\text{StartTimestamp}$ represents the start time of the first time partition, while $\text{TimePartitionInterval}$ defines the duration of each time partition. By default, $\text{TimePartitionInterval}$ is set to seven days.
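+
+As a quick illustration of this formula, the following Python sketch computes a time partition ID; the helper name is illustrative, while the default interval mirrors the seven-day value mentioned above:
+
+```Python
+# A minimal sketch of the time partitioning formula:
+#   partition_id = floor((Timestamp - StartTimestamp) / TimePartitionInterval)
+
+WEEK_IN_MS = 7 * 24 * 60 * 60 * 1000  # default TimePartitionInterval (seven days)
+
+def time_partition_id(timestamp_ms, start_timestamp_ms=0, interval_ms=WEEK_IN_MS):
+    """Map an event timestamp (in milliseconds) to its time partition ID."""
+    # Python's floor division matches the floor in the formula,
+    # including for timestamps earlier than the start timestamp.
+    return (timestamp_ms - start_timestamp_ms) // interval_ms
+
+# Two timestamps eight days apart fall into adjacent partitions.
+assert time_partition_id(0) == 0
+assert time_partition_id(8 * 24 * 60 * 60 * 1000) == 1
+```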
+
+##### Schema Partitioning
+
+Since the series partitioning algorithm evenly partitions the time series, each series partition corresponds to a schema partition. These schema partitions are then evenly distributed across **SchemaRegionGroups** to achieve balanced schema distribution.
+
+##### Data Partitioning
+
+Data partitions are created by combining series partitions and time partitions. Since the series partitioning algorithm evenly partitions the time series, the load of data partitions within a specific time partition remains balanced. These data partitions are then evenly distributed across **DataRegionGroups** to achieve balanced data distribution.
+
+#### Partition Allocation
+
+IoTDB uses RegionGroups to achieve elastic storage for time-series data. The number of RegionGroups in the cluster is determined by the total resources of all DataNodes. Since the number of RegionGroups is dynamic, IoTDB can easily scale. Both SchemaRegionGroups and DataRegionGroups follow the same partition allocation algorithm, which evenly divides all series partitions. The figure below illustrates the partition allocation process, where dynamically expanding RegionGroups match the continuously expanding time series and cluster.
+
+![](/img/partition_allocation_en.png)
+
+##### RegionGroup Expansion
+
+The number of RegionGroups is given by the following formula:
+
+$$\text{RegionGroupNumber} = \left\lfloor\frac{\sum_{i=1}^{\text{DataNodeNumber}} \text{RegionNumber}_i}{\text{ReplicationFactor}}\right\rfloor$$
+
+In this formula, $\text{RegionNumber}_i$ represents the number of Regions expected to be hosted on the $i$-th DataNode, and $\text{ReplicationFactor}$ denotes the number of Regions within each RegionGroup. Both $\text{RegionNumber}_i$ and $\text{ReplicationFactor}$ are configurable parameters. $\text{RegionNumber}_i$ can be determined based on the available hardware resources (e.g., CPU cores, memory size) on the $i$-th DataNode to adapt to different physical servers. $\text{ReplicationFactor}$ can be adjusted to ensure different levels of fault tolerance.
+
+##### Allocation Strategy
+
+Both the SchemaRegionGroup and the DataRegionGroup follow the same allocation algorithm: splitting all series partitions evenly. As a result, each SchemaRegionGroup holds the same number of schema partitions, ensuring balanced schema storage. Similarly, for each time partition, each DataRegionGroup acquires the data partitions corresponding to the series partitions it holds. Consequently, the data partitions within a time partition are evenly distributed across all DataRegionGroups, ensuring balanced data storage in each time partition.
+
+Notably, IoTDB effectively leverages the characteristics of time series data. When the TTL (Time to Live) is configured, IoTDB enables migration-free elastic storage for time series data. This feature facilitates cluster expansion while minimizing the impact on online operations. The figures above illustrate an instance of this feature: newly created data partitions are evenly allocated across the DataRegionGroups, and expired data are automatically archived. As a result, the cluster's storage eventually remains balanced.
+
+### Load Balancing Strategies
+
+To improve cluster availability and performance, IoTDB employs carefully designed storage balancing and computation balancing algorithms.
+
+#### Storage Balancing
+
+The number of Regions held by a DataNode reflects its storage load. If the number of Regions varies significantly between DataNodes, the DataNode with more Regions may become a storage bottleneck. Although a simple Round Robin placement algorithm can achieve storage balancing by ensuring each DataNode holds an equal number of Regions, it reduces the cluster's fault tolerance, as shown below:
+
+![](/img/placement_en.png)
+
+- Assume the cluster has 4 DataNodes, 4 RegionGroups, and a replication factor of 2.
+- Place the 2 Regions of RegionGroup $r_1$ on DataNodes $n_1$ and $n_2$.
+- Place the 2 Regions of RegionGroup $r_2$ on DataNodes $n_3$ and $n_4$.
+- Place the 2 Regions of RegionGroup $r_3$ on DataNodes $n_1$ and $n_3$.
+- Place the 2 Regions of RegionGroup $r_4$ on DataNodes $n_2$ and $n_4$.
+
+In this scenario, if DataNode $n_2$ fails, the load previously handled by DataNode $n_2$ would be transferred solely to DataNode $n_1$, potentially overloading it.
+
+To address this issue, IoTDB employs a Region placement algorithm that not only evenly distributes Regions across all DataNodes but also ensures that each DataNode can offload its storage to enough other DataNodes in the event of a failure. As a result, the cluster achieves balanced storage distribution and a high level of fault tolerance, ensuring its availability.
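+
+To tie the RegionGroup sizing formula above to this placement example, here is a small Python sketch; the per-DataNode Region counts are illustrative values, not defaults:
+
+```Python
+# A minimal sketch of the RegionGroup sizing formula:
+#   RegionGroupNumber = floor(sum(RegionNumber_i) / ReplicationFactor)
+
+def region_group_number(regions_per_datanode, replication_factor):
+    """Expected number of RegionGroups the cluster can host."""
+    return sum(regions_per_datanode) // replication_factor
+
+# Example: 4 DataNodes, each expected to host 2 Regions, replication factor 2
+# -> 4 RegionGroups, matching the placement example above.
+assert region_group_number([2, 2, 2, 2], replication_factor=2) == 4
+```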
+
+#### Computation Balancing
+
+The number of leader Regions held by a DataNode reflects its computing load. If the difference in the number of leaders across DataNodes is relatively large, the DataNode with more leaders is likely to become a computing bottleneck. If the leader selection process is conducted using a naive greedy algorithm, the result may be an unbalanced leader distribution even when the Regions are fault-tolerantly placed, as demonstrated below:
+
+![](/img/selection_en.png)
+
+- Assume the cluster has 4 DataNodes, 4 RegionGroups, and a replication factor of 2.
+- Select the Region of RegionGroup $r_5$ on DataNode $n_5$ as the leader.
+- Select the Region of RegionGroup $r_6$ on DataNode $n_7$ as the leader.
+- Select the Region of RegionGroup $r_7$ on DataNode $n_7$ as the leader.
+- Select the Region of RegionGroup $r_8$ on DataNode $n_8$ as the leader.
+
+Note that the above steps strictly follow the greedy algorithm. However, by step 3, selecting the leader of RegionGroup $r_7$ on either DataNode $n_5$ or $n_7$ would result in uneven leader distribution. The root cause is that each greedy selection step lacks a global perspective, ultimately leading to a local optimum.
+
+To address this issue, IoTDB adopts a **leader selection algorithm** that continuously balances the distribution of leaders across the cluster. As a result, the cluster achieves balanced computation load distribution, ensuring its performance.
+
+### Source Code
+
+- [Data Partitioning](https://github.com/apache/iotdb/tree/master/iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/partition)
+- [Partition Allocation](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/partition)
+- [Region Placement](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/region)
+- [Leader Selection](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/router/leader)
\ No newline at end of file
diff --git a/src/UserGuide/latest-Table/Technical-Insider/Encoding-and-Compression.md b/src/UserGuide/latest-Table/Technical-Insider/Encoding-and-Compression.md
new file mode 100644
index 000000000..d1546bead
--- /dev/null
+++ b/src/UserGuide/latest-Table/Technical-Insider/Encoding-and-Compression.md
@@ -0,0 +1,126 @@
+
+
+IoTDB employs various encoding and compression techniques to enhance storage efficiency and reduce I/O operations during data writing and reading. Below is a detailed explanation of the supported encoding and compression methods.
+
+## **Encoding Methods**
+
+IoTDB supports multiple encoding methods tailored for different data types to optimize storage and performance.
+
+1. PLAIN
+
+The default encoding method, meaning no encoding is applied. It supports multiple data types and offers high time efficiency for encoding and decoding, but relatively low space efficiency.
+
+2. TS_2DIFF
+
+Second-order differential encoding (TS_2DIFF) is suitable for encoding monotonically increasing or decreasing sequences. It is not ideal for encoding data with significant fluctuations.
+
+3. RLE
+
+Run-Length Encoding (RLE) is ideal for sequences where certain values appear consecutively. It is not effective for sequences where most consecutive values differ.
+
+RLE can also be used to encode floating-point numbers; however, the decimal precision must be specified when creating the time series.
It is suitable for storing sequences in which floating-point values repeat consecutively, but it is not recommended for sequences requiring high decimal precision or with large fluctuations.
+
+> Both RLE and TS_2DIFF encoding for `float` and `double` have precision limitations, with a default of two decimal places. GORILLA encoding is recommended instead.
+
+4. GORILLA
+
+A lossless encoding method suitable for sequences where consecutive values are close to each other. It is not effective for data with large fluctuations.
+
+Currently, there are two versions of the GORILLA encoding implementation; it is recommended to use `GORILLA` instead of the deprecated `GORILLA_V1`.
+
+Usage restrictions:
+
+- When using GORILLA encoding for `INT32` data, ensure that the sequence does not contain values equal to `Integer.MIN_VALUE`.
+- When using GORILLA encoding for `INT64` data, ensure that the sequence does not contain values equal to `Long.MIN_VALUE`.
+
+5. DICTIONARY
+
+A lossless encoding method suitable for data with low cardinality (i.e., a limited number of unique values). It is not recommended for high-cardinality data.
+
+6. ZIGZAG
+
+Maps signed integers to unsigned integers, making it suitable for small integer values.
+
+7. CHIMP
+
+A lossless encoding method designed for streaming floating-point data compression. It is efficient for sequences with small variations and low random noise.
+
+Usage restrictions:
+
+- When using CHIMP encoding for `INT32` data, ensure that the sequence does not contain values equal to `Integer.MIN_VALUE`.
+- When using CHIMP encoding for `INT64` data, ensure that the sequence does not contain values equal to `Long.MIN_VALUE`.
+
+8. SPRINTZ
+
+A lossless encoding method combining prediction, ZigZag encoding, bit packing, and run-length encoding. It is best suited for time-series data with small absolute differences (i.e., low fluctuation) and is not effective for data with large variations.
+
+9. RLBE
+
+A lossless encoding method combining differential encoding, bit packing, run-length encoding, Fibonacci encoding, and concatenation. It is suitable for time-series data with a small and steadily increasing trend but is not effective for highly fluctuating data.
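+
+As a concrete illustration of one of these codecs, here is a Python sketch of the ZigZag mapping described in item 6 (32-bit variant); it demonstrates the idea only and is not IoTDB's internal implementation:
+
+```Python
+# A conceptual sketch of ZigZag encoding: map signed integers to unsigned
+# integers so values near zero get small codes (0->0, -1->1, 1->2, -2->3, ...).
+
+def zigzag_encode(n):
+    """32-bit ZigZag: interleave negative and non-negative integers."""
+    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF
+
+def zigzag_decode(z):
+    """Invert the mapping: recover the original signed integer."""
+    return (z >> 1) ^ -(z & 1)
+
+# Round-trip check for a handful of small and large values.
+for v in (0, -1, 1, -2, 2, 12345, -12345):
+    assert zigzag_decode(zigzag_encode(v)) == v
+```
+
+Because the resulting unsigned codes are small for values near zero, they pack into few bits, which is why ZigZag pairs well with bit packing in SPRINTZ.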
+
+### **Data Types and Supported Encoding Methods**
+
+The following table summarizes the recommended and supported encoding methods for each data type:
+
+| **Data Type** | **Recommended Encoding** | **Supported Encoding Methods** |
+| :------------ | :----------------------- | :---------------------------------------------------------- |
+| BOOLEAN | RLE | PLAIN, RLE |
+| INT32 | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE |
+| DATE | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE |
+| INT64 | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE |
+| TIMESTAMP | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE |
+| FLOAT | GORILLA | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE |
+| DOUBLE | GORILLA | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE |
+| TEXT | PLAIN | PLAIN, DICTIONARY |
+| STRING | PLAIN | PLAIN, DICTIONARY |
+| BLOB | PLAIN | PLAIN |
+
+**Error Handling**: If the data type entered by the user does not match the specified encoding method, the system will display an error message. For example:
+
+```Plain
+IoTDB> create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF
+Msg: 507: encoding TS_2DIFF does not support BOOLEAN
+```
+
+## **Compression Methods**
+
+When time series are written and encoded as binary data according to the specified type, IoTDB applies compression techniques to further enhance storage efficiency. While both encoding and compression aim to optimize storage, encoding techniques are typically designed for specific data types (e.g., second-order differential encoding is only suitable for INT32 or INT64, and storing floating-point numbers requires multiplying them by 10ⁿ to convert them into integers) before converting the data into a binary stream. Compression methods like SNAPPY operate on the binary stream, making them independent of the data type.
+
+### **Supported Compression Methods**
+
+IoTDB allows specifying the compression method of a column when creating a time series. Currently, IoTDB supports the following compression methods:
+
+- UNCOMPRESSED
+- SNAPPY
+- LZ4 (Recommended)
+- GZIP
+- ZSTD
+- LZMA2
+
+### **Compression Ratio Statistics**
+
+IoTDB provides compression ratio statistics to monitor the effectiveness of compression. The statistics are stored in `data/datanode/system/compression_ratio`:
+
+- `ratio_sum`: The total sum of memtable compression ratios.
+- `memtable_flush_time`: The total number of memtable flushes.
+
+The average compression ratio can be calculated as: `Average Compression Ratio = ratio_sum / memtable_flush_time`
\ No newline at end of file
diff --git a/src/UserGuide/latest-Table/User-Manual/Data-Sync_timecho.md b/src/UserGuide/latest-Table/User-Manual/Data-Sync_timecho.md
new file mode 100644
index 000000000..90bc5bc4f
--- /dev/null
+++ b/src/UserGuide/latest-Table/User-Manual/Data-Sync_timecho.md
@@ -0,0 +1,525 @@
+
+
+Data synchronization is a typical requirement in the Industrial Internet of Things (IIoT). Through data synchronization mechanisms, data sharing between IoTDB instances can be achieved, enabling the establishment of a complete data pipeline to meet needs such as internal and external network data exchange, edge-to-cloud synchronization, data migration, and data backup.
+
+# Functional Overview
+
+## Data Synchronization
+
+A data synchronization task consists of three stages:
+
+![](/img/en_dataSync01.png)
+
+- Source Stage: This stage is used to extract data from the source IoTDB, defined in the `source` section of the SQL statement.
+- Process Stage: This stage is used to process the data extracted from the source IoTDB, defined in the `processor` section of the SQL statement.
+- Sink Stage: This stage is used to send data to the target IoTDB, defined in the `sink` section of the SQL statement.
+
+By declaratively configuring these three parts in an SQL statement, flexible data synchronization capabilities can be achieved.
+
+## Functional Limitations and Notes
+
+- Supports data synchronization from IoTDB version 1.x series to version 2.x and later.
+- Does not support data synchronization from IoTDB version 2.x series to version 1.x series.
+- When performing data synchronization tasks, avoid executing any deletion operations to prevent inconsistencies between the two ends.
+
+# Usage Instructions
+
+A data synchronization task can be in one of three states: RUNNING, STOPPED, and DROPPED. The state transitions of the task are illustrated in the diagram below:
+
+![](/img/Data-Sync02.png)
+
+After creation, the task will start directly.
Additionally, if the task stops due to an exception, the system will automatically attempt to restart it.
+
+We provide the following SQL statements for managing the state of synchronization tasks.
+
+## Create a Task
+
+Use the `CREATE PIPE` statement to create a data synchronization task. Among the following attributes, `PipeId` and `sink` are required, while `source` and `processor` are optional. Note that the order of the `SOURCE` and `SINK` plugins cannot be swapped when writing the SQL.
+
+SQL Example:
+
+```SQL
+CREATE PIPE [IF NOT EXISTS] <PipeId> -- PipeId is a unique name identifying the task
+-- Data extraction plugin (optional)
+WITH SOURCE (
+  [<parameter> = <value>,],
+)
+-- Data processing plugin (optional)
+WITH PROCESSOR (
+  [<parameter> = <value>,],
+)
+-- Data transmission plugin (required)
+WITH SINK (
+  [<parameter> = <value>,],
+)
+```
+
+**IF NOT EXISTS Semantics**: Ensures that the creation command is executed only if the specified Pipe does not exist, preventing errors caused by attempting to create an already existing Pipe.
+
+## Start a Task
+
+After creation, the task directly enters the RUNNING state and does not require manual startup. However, if the task is stopped using the `STOP PIPE` statement, you need to manually start it using the `START PIPE` statement. If the task stops due to an exception, it will automatically restart to resume data processing:
+
+```SQL
+START PIPE <PipeId>
+```
+
+## Stop a Task
+
+To stop data processing:
+
+```SQL
+STOP PIPE <PipeId>
+```
+
+## Delete a Task
+
+To delete a specified task:
+
+```SQL
+DROP PIPE [IF EXISTS] <PipeId>
+```
+
+**IF EXISTS Semantics**: Ensures that the deletion command is executed only if the specified Pipe exists, preventing errors caused by attempting to delete a non-existent Pipe. **Note**: Deleting a task does not require stopping the synchronization task first.
+
+## View Tasks
+
+To view all tasks:
+
+```SQL
+SHOW PIPES
+```
+
+To view a specific task:
+
+```SQL
+SHOW PIPE <PipeId>
+```
+
+Example Output of `SHOW PIPES`:
+
+```SQL
++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+
+|                              ID|           CreationTime|  State|PipeSource|PipeProcessor|                                                    PipeSink|ExceptionMessage|RemainingEventCount|EstimatedRemainingSeconds|
++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+
+|59abf95db892428b9d01c5fa318014ea|2024-06-17T14:03:44.189|RUNNING|        {}|           {}| {sink=iotdb-thrift-sink, sink.ip=127.0.0.1, sink.port=6668}|                |                128|                     1.03|
++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+
+```
+
+**Column Descriptions**:
+
+- **ID**: Unique identifier of the synchronization task.
+- **CreationTime**: Time when the task was created.
+- **State**: Current state of the task.
+- **PipeSource**: Source of the data stream.
+- **PipeProcessor**: Processing logic applied during data transmission.
+- **PipeSink**: Destination of the data stream.
+- **ExceptionMessage**: Displays exception information for the task.
+- **RemainingEventCount** (statistics may have delays): Number of remaining events, including data and metadata synchronization events, as well as system and user-defined events.
+- **EstimatedRemainingSeconds** (statistics may have delays): Estimated remaining time to complete the transmission based on the current event count and pipe processing rate.
+
+## Synchronization Plugins
+
+To make the architecture more flexible and adaptable to different synchronization scenarios, IoTDB supports plugin assembly in the synchronization task framework. The system provides some common pre-installed plugins, and you can also customize `processor` and `sink` plugins and load them into the IoTDB system.
+
+To view the plugins available in the system (including custom and built-in plugins), use the following statement:
+
+```SQL
+SHOW PIPEPLUGINS
+```
+
+Example Output:
+
+```SQL
+IoTDB> SHOW PIPEPLUGINS
++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+
+|                    PluginName|PluginType|                                                                                           ClassName|                                           PluginJar|
++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+
+|          DO-NOTHING-PROCESSOR|   Builtin|                org.apache.iotdb.commons.pipe.plugin.builtin.processor.donothing.DoNothingProcessor|                                                    |
+|               DO-NOTHING-SINK|   Builtin|                org.apache.iotdb.commons.pipe.plugin.builtin.connector.donothing.DoNothingConnector|                                                    |
+|            IOTDB-AIR-GAP-SINK|   Builtin|           org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.airgap.IoTDBAirGapConnector|                                                    |
+|                  IOTDB-SOURCE|   Builtin|                        org.apache.iotdb.commons.pipe.plugin.builtin.extractor.iotdb.IoTDBExtractor|                                                    |
+|             IOTDB-THRIFT-SINK|   Builtin|           org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftConnector|                                                    |
+|         IOTDB-THRIFT-SSL-SINK|   Builtin|        org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftSslConnector|                                                    |
++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+
+```
+
+Detailed introduction of pre-installed plugins is as follows (for detailed parameters of each plugin, please refer to the [Parameter Description](../Reference/System-Config-Manual.md) section):
+
+| **Type** | **Custom Plugin** | **Plugin Name** | **Description** |
+| :--------------- | :---------------- | :---------------------- | :----------------------------------------------------------- |
+| Source Plugin | Not Supported | `iotdb-source` | Default extractor plugin for extracting historical or real-time data from IoTDB. |
+| Processor Plugin | Supported | `do-nothing-processor` | Default processor plugin that does not process incoming data. |
+| Sink Plugin | Supported | `do-nothing-sink` | Does not process outgoing data. |
+| | | `iotdb-thrift-sink` | Default sink plugin for data transmission between IoTDB instances (V2.0.0+). Uses the Thrift RPC framework with a multi-threaded async non-blocking IO model, ideal for distributed target scenarios. |
+| | | `iotdb-air-gap-sink` | Used for cross-unidirectional data gate synchronization between IoTDB instances (V2.0.0+). Supports gate models like NARI Syskeeper 2000. |
+| | | `iotdb-thrift-ssl-sink` | Used for data transmission between IoTDB instances (V2.0.0+). Uses the Thrift RPC framework with a multi-threaded sync blocking IO model, suitable for high-security scenarios. |
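+
+Since pipes are managed entirely through SQL, any client that can execute statements can administer them. As a hedged illustration, the following sketch reuses the Python table session pool shown earlier in this documentation; the connection details and pipe definition are placeholders:
+
+```Python
+# A minimal sketch: creating and inspecting a pipe from the Python client.
+from iotdb.table_session_pool import TableSessionPool, TableSessionPoolConfig
+
+config = TableSessionPoolConfig(
+    node_urls=["127.0.0.1:6667"], username="root", password="root"
+)
+session_pool = TableSessionPool(config)
+session = session_pool.get_session()
+try:
+    # Same CREATE PIPE statement as in the SQL examples below.
+    session.execute_non_query_statement(
+        "CREATE PIPE IF NOT EXISTS A2B "
+        "WITH SINK ('sink'='iotdb-thrift-sink', 'node-urls'='127.0.0.1:6668')"
+    )
+    res = session.execute_query_statement("SHOW PIPES")
+    while res.has_next():
+        print(res.next())
+finally:
+    session.close()
+session_pool.close()
+```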
+
+# Usage Examples
+
+## Full Data Synchronization
+
+This example demonstrates synchronizing all data from one IoTDB to another. The data pipeline is shown below:
+
+![](/img/e1.png)
+
+In this example, we create a synchronization task named `A2B` to synchronize all data from IoTDB A to IoTDB B. The `iotdb-thrift-sink` plugin (built-in) is used, and the `node-urls` parameter is configured with the URL of the DataNode service port on the target IoTDB.
+
+SQL Example:
+
+```SQL
+CREATE PIPE A2B
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6668' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+## Partial Data Synchronization
+
+This example demonstrates synchronizing data within a specific historical time range (from August 23, 2023, 8:00 to October 23, 2023, 8:00) to another IoTDB. The data pipeline is shown below:
+
+![](/img/e2.png)
+
+In this example, we create a synchronization task named `A2B`. First, we define the data range in the `source` configuration. Since we are synchronizing historical data (data that existed before the task was created), we need to configure the start time (`start-time`), end time (`end-time`), and the streaming mode (`mode.streaming`). The `node-urls` parameter is configured with the URL of the DataNode service port on the target IoTDB.
+
+SQL Example:
+
+```SQL
+CREATE PIPE A2B
+WITH SOURCE (
+  'source' = 'iotdb-source',
+  'mode.streaming' = 'true', -- Extraction mode for newly inserted data (after the pipe is created):
+                             -- whether to extract data in streaming mode (if set to false, batch mode is used).
+  'start-time' = '2023.08.23T08:00:00+00:00', -- The event time at which data synchronization starts (inclusive).
+  'end-time' = '2023.10.23T08:00:00+00:00' -- The event time at which data synchronization ends (inclusive).
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-async-sink',
+  'node-urls' = '127.0.0.1:6668' -- The URL of the DataNode's data service port in the target IoTDB instance.
+)
+```
+
+## Bidirectional Data Transmission
+
+This example demonstrates a scenario where two IoTDB instances act as dual-active systems. The data pipeline is shown below:
+
+![](/img/e3.png)
+
+To avoid infinite data loops, the `source.mode.double-living` parameter must be set to `true` on both IoTDB A and B, indicating that data forwarded from another pipe will not be retransmitted.
+
+SQL Example: On IoTDB A:
+
+```SQL
+CREATE PIPE AB
+WITH SOURCE (
+  'source.mode.double-living' = 'true' -- Do not forward data from other pipes
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6668' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+On IoTDB B:
+
+```SQL
+CREATE PIPE BA
+WITH SOURCE (
+  'source.mode.double-living' = 'true' -- Do not forward data from other pipes
+)
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6667' -- URL of the DataNode service port on the target IoTDB
+)
+```
+
+## Edge-to-Cloud Data Transmission
+
+This example demonstrates synchronizing data from multiple IoTDB clusters (B, C, D) to a central IoTDB cluster (A). The data pipeline is shown below:
+
+![](/img/sync_en_03.png)
+
+To synchronize data from clusters B, C, and D to cluster A, the `database-name` and `table-name` parameters are used to restrict the data range.
## Edge-to-Cloud Data Transmission

This example demonstrates synchronizing data from multiple edge IoTDB clusters (B, C, and D) to a central IoTDB cluster (A). The data pipeline is shown below:

![](/img/sync_en_03.png)

To synchronize data from clusters B, C, and D to cluster A, the `database-name` and `table-name` parameters are used to restrict the data range.

SQL Example: On IoTDB B:

```SQL
CREATE PIPE BA
WITH SOURCE (
  'database-name' = 'db_b.*', -- Restrict the database scope
  'table-name' = '.*'         -- Match all tables
)
WITH SINK (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6667' -- URL of the DataNode service port on the target IoTDB
)
```

On IoTDB C:

```SQL
CREATE PIPE CA
WITH SOURCE (
  'database-name' = 'db_c.*', -- Restrict the database scope
  'table-name' = '.*'         -- Match all tables
)
WITH SINK (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668' -- URL of the DataNode service port on the target IoTDB
)
```

On IoTDB D:

```SQL
CREATE PIPE DA
WITH SOURCE (
  'database-name' = 'db_d.*', -- Restrict the database scope
  'table-name' = '.*'         -- Match all tables
)
WITH SINK (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6669' -- URL of the DataNode service port on the target IoTDB
)
```

## Cascaded Data Transmission

This example demonstrates cascading data transmission from IoTDB A to IoTDB B and then on to IoTDB C. The data pipeline is shown below:

![](/img/sync_en_04.png)

To propagate data from cluster A through B to cluster C, the `source.mode.double-living` parameter is set to `true` in the pipe between B and C.

SQL Example: On IoTDB A:

```SQL
CREATE PIPE AB
WITH SINK (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668' -- URL of the DataNode service port on the target IoTDB
)
```

On IoTDB B:

```SQL
CREATE PIPE BC
WITH SOURCE (
  'source.mode.double-living' = 'true' -- Required so that data A forwards to B is synchronized on to C
)
WITH SINK (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6669' -- URL of the DataNode service port on the target IoTDB
)
```

## Air-Gapped Data Transmission

This example demonstrates synchronizing data from one IoTDB instance to another through a unidirectional air gap. The data pipeline is shown below:

![](/img/e5.png)

In this example, the `iotdb-air-gap-sink` plugin is used (it currently supports specific air gap device models; contact the Timecho team for details). After configuring the air gap device, execute the following statement on IoTDB A, where `node-urls` is the URL of the DataNode service port on the target IoTDB.

SQL Example:

```SQL
CREATE PIPE A2B
WITH SINK (
  'sink' = 'iotdb-air-gap-sink',
  'node-urls' = '10.53.53.53:9780' -- URL of the DataNode service port on the target IoTDB
)
```

## Compressed Synchronization

IoTDB supports specifying a data compression method for synchronization. Configure the `compressor` parameter to compress data in real time during transmission. Supported algorithms are `snappy`, `gzip`, `lz4`, `zstd`, and `lzma2`; multiple algorithms can be combined and are applied in the configured order. The `rate-limit-bytes-per-second` parameter (supported in V1.3.3 and later) limits the maximum number of bytes transmitted per second, calculated after compression; a value less than 0 means no limit.

**SQL Example**:

```SQL
CREATE PIPE A2B
WITH SINK (
  'node-urls' = '127.0.0.1:6668', -- URL of the DataNode service port on the target IoTDB
  'compressor' = 'snappy,lz4',    -- Compression algorithms, applied in order
  'rate-limit-bytes-per-second' = '1048576' -- Maximum bytes allowed per second
)
```
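With the configuration above, outbound data is compressed first with snappy and then with lz4, and transmission is capped at 1,048,576 bytes per second, i.e. 1 MiB/s measured on the compressed stream.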
## Encrypted Synchronization

IoTDB supports SSL encryption during synchronization to transmit data securely between IoTDB instances. By configuring SSL-related parameters such as the certificate path (`ssl.trust-store-path`) and password (`ssl.trust-store-pwd`), data is protected by SSL encryption during synchronization.

**SQL Example**:

```SQL
CREATE PIPE A2B
WITH SINK (
  'sink' = 'iotdb-thrift-ssl-sink',
  'node-urls' = '127.0.0.1:6667',         -- URL of the DataNode service port on the target IoTDB
  'ssl.trust-store-path' = 'pki/trusted', -- Path to the trust store certificate
  'ssl.trust-store-pwd' = 'root'          -- Password for the trust store certificate
)
```

# Reference: Notes

You can adjust data synchronization behavior by modifying the IoTDB configuration file (`iotdb-system.properties`), for example the directory used to store synchronized data. The complete configuration is as follows:

```Properties
# pipe_receiver_file_dir
# If this property is unset, system will save the data in the default relative path directory under the IoTDB folder (i.e., %IOTDB_HOME%/${cn_system_dir}/pipe/receiver).
# If it is absolute, system will save the data in the exact location it points to.
# If it is relative, system will save the data in the relative path directory it indicates under the IoTDB folder.
# Note: If pipe_receiver_file_dir is assigned an empty string (i.e., zero-size), it will be handled as a relative path.
# effectiveMode: restart
# For windows platform
# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is absolute. Otherwise, it is relative.
# pipe_receiver_file_dir=data\\confignode\\system\\pipe\\receiver
# For Linux platform
# If its prefix is "/", then the path is absolute. Otherwise, it is relative.
pipe_receiver_file_dir=data/confignode/system/pipe/receiver

####################
### Pipe Configuration
####################

# Uncomment the following field to configure the pipe lib directory.
# effectiveMode: first_start
# For Windows platform
# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is
# absolute. Otherwise, it is relative.
# pipe_lib_dir=ext\\pipe
# For Linux platform
# If its prefix is "/", then the path is absolute. Otherwise, it is relative.
pipe_lib_dir=ext/pipe

# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor.
# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)).
# effectiveMode: restart
# Datatype: int
pipe_subtask_executor_max_thread_num=5

# The connection timeout (in milliseconds) for the thrift client.
# effectiveMode: restart
# Datatype: int
pipe_sink_timeout_ms=900000

# The maximum number of selectors that can be used in the sink.
# Recommend to set this value to less than or equal to pipe_sink_max_client_number.
# effectiveMode: restart
# Datatype: int
pipe_sink_selector_number=4

# The maximum number of clients that can be used in the sink.
# effectiveMode: restart
# Datatype: int
pipe_sink_max_client_number=16

# Whether to enable receiving pipe data through air gap.
# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully.
# effectiveMode: restart
# Datatype: Boolean
pipe_air_gap_receiver_enabled=false

# The port for the server to receive pipe data through air gap.
# Datatype: int
# effectiveMode: restart
pipe_air_gap_receiver_port=9780

# The total bytes that all pipe sinks can transfer per second.
# When given a value less than or equal to 0, it means no limit.
# Default value is -1, which means no limit.
# effectiveMode: hot_reload
# Datatype: double
pipe_all_sinks_rate_limit_bytes_per_second=-1
```
# Reference: Parameter Description

## Source Parameters

| **Parameter** | **Description** | **Value Range** | **Required** | **Default Value** |
| :----------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :----------- | :---------------- |
| source | iotdb-source | String: iotdb-source | Yes | - |
| mode.streaming | Specifies how time-series data is captured. Applies when `mode.snapshot` is set to `false`, and determines the capture behavior for `data.insert` in `inclusion`. Two strategies are available: **true**: dynamically selects the capture type. The system adapts to downstream processing speed, choosing between capturing each write request and capturing only TsFile sealing requests. When downstream processing is fast, write requests are prioritized to reduce latency; when it is slow, only file sealing requests are captured to prevent backlogs. This mode suits most scenarios and balances processing latency against throughput. **false**: uses a fixed batch capture approach, capturing only TsFile sealing requests. Suitable for resource-constrained deployments, as it reduces system load. **Note**: snapshot data captured when the pipe starts is always provided downstream as files. | Boolean: true / false | No | true |
| mode.strict | Whether to filter data strictly when the `time`, `path`, `database-name`, or `table-name` parameters are used: **true**: strict filtering; captured data is filtered exactly by the specified conditions, so only matching data is selected. **false**: non-strict filtering; some extra data may be included during selection to improve performance and reduce CPU and I/O consumption. | Boolean: true / false | No | true |
| mode.snapshot | Determines the data capture mode, affecting `data` in `inclusion`. Two modes are available: **true**: static data capture; a one-time snapshot is taken when the pipe starts, and once the snapshot data has been fully consumed, the pipe terminates automatically (an implicit `DROP PIPE` is executed). **false**: dynamic data capture; in addition to the snapshot taken at startup, subsequent data changes are captured continuously and the pipe remains active. | Boolean: true / false | No | false |
| database-name | Available when the client connects with `sql_dialect` set to `table`. Determines the scope of data capture, affecting `data` in `inclusion`. Specifies the database name to filter on; can be a specific database name or a Java-style regular expression matching multiple databases. All databases are matched by default. | String: a database name or database regular expression pattern, which may match not-yet-created or non-existent databases | No | ".*" |
| table-name | Available when the client connects with `sql_dialect` set to `table`. Determines the scope of data capture, affecting `data` in `inclusion`. Specifies the table name to filter on; can be a specific table name or a Java-style regular expression matching multiple tables. All tables are matched by default. | String: a table name or table regular expression pattern, which may match not-yet-created or non-existent tables | No | ".*" |
| start-time | Determines the scope of data capture, affecting `data` in `inclusion`. Data with an event time **greater than or equal to** this parameter is selected for stream processing in the pipe. | Long: [Long.MIN_VALUE, Long.MAX_VALUE] (bare Unix timestamp) or String: an ISO-format timestamp supported by IoTDB | No | Long.MIN_VALUE |
| end-time | Determines the scope of data capture, affecting `data` in `inclusion`. Data with an event time **less than or equal to** this parameter is selected for stream processing in the pipe. | Long: [Long.MIN_VALUE, Long.MAX_VALUE] (bare Unix timestamp) or String: an ISO-format timestamp supported by IoTDB | No | Long.MAX_VALUE |
| forwarding-pipe-requests | Whether to forward data that arrived via another pipe to external clusters. Typically relevant when setting up **active-active clusters**; in active-active mode it should be set to `false` to prevent **infinite circular synchronization**. | Boolean: true / false | No | true |

> 💎 **Note:** The difference between setting the data extraction mode `mode.streaming` to true or false:
>
> - **true** (recommended): The task processes and sends data in real time. This gives high timeliness and lower throughput.
> - **false**: The task processes and sends data in batches (following the underlying data files). This gives lower timeliness and high throughput.
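To illustrate how these source parameters combine, the following sketch creates a one-shot snapshot pipe limited to a single database. The pipe name, database pattern, and target address are illustrative:

```SQL
CREATE PIPE snapshot_db1
WITH SOURCE (
  'source' = 'iotdb-source',
  'mode.snapshot' = 'true',    -- Take a one-time snapshot; the pipe drops itself once it is consumed
  'database-name' = 'db1',     -- Capture only this database (a Java-style regex is also accepted)
  'start-time' = '2023.08.23T08:00:00+00:00' -- Only data with event time at or after this instant
)
WITH SINK (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668' -- URL of the DataNode service port on the target IoTDB
)
```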
## Sink Parameters

#### iotdb-thrift-sink

| **Parameter** | **Description** | **Value Range** | **Required** | **Default Value** |
| :-------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :------------ |
| sink | iotdb-thrift-sink or iotdb-thrift-async-sink | String: iotdb-thrift-sink or iotdb-thrift-async-sink | Yes | - |
| node-urls | URLs of the DataNode service ports on the target IoTDB (note that a synchronization task cannot forward to its own service). | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669' or '127.0.0.1:6667' | Yes | - |
| user/username | Username for connecting to the target IoTDB; must have the appropriate permissions. | String | No | root |
| password | Password for the username. | String | No | root |
| batch.enable | Enables batch mode for log transmission to improve throughput and reduce IOPS. | Boolean: true / false | No | true |
| batch.max-delay-seconds | Maximum delay (in seconds) before a batch is transmitted. | Integer | No | 1 |
| batch.size-bytes | Maximum batch size (in bytes) for batch transmission. | Long | No | 16*1024*1024 |
| compressor | The RPC compression algorithm(s) to use. Multiple algorithms can be configured and are applied in order for each request. | String: snappy / gzip / lz4 / zstd / lzma2 | No | "" |
| compressor.zstd.level | When zstd is the selected RPC compression algorithm, this parameter additionally configures the zstd compression level. | Int: [-131072, 22] | No | 3 |
| rate-limit-bytes-per-second | Maximum number of bytes allowed to be transmitted per second, calculated on the compressed bytes (i.e., after compression). A value less than 0 means no limit. | Double: [Double.MIN_VALUE, Double.MAX_VALUE] | No | -1 |
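As a usage sketch for the batching and compression parameters above (the values here are illustrative, not recommendations):

```SQL
CREATE PIPE A2B
WITH SINK (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668',
  'batch.enable' = 'true',
  'batch.max-delay-seconds' = '5', -- Flush a pending batch at least every 5 seconds
  'batch.size-bytes' = '8388608',  -- ...or as soon as it reaches 8 MiB
  'compressor' = 'zstd',
  'compressor.zstd.level' = '3'    -- Default zstd level
)
```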
#### iotdb-air-gap-sink

| **Parameter** | **Description** | **Value Range** | **Required** | **Default Value** |
| :--------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :------------ |
| sink | iotdb-air-gap-sink | String: iotdb-air-gap-sink | Yes | - |
| node-urls | URLs of the DataNode service ports on the target IoTDB (note that a synchronization task cannot forward to its own service). | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669' or '127.0.0.1:6667' | Yes | - |
| user/username | Username for connecting to the target IoTDB; must have the appropriate permissions. | String | No | root |
| password | Password for the username. | String | No | root |
| compressor | The RPC compression algorithm(s) to use. Multiple algorithms can be configured and are applied in order for each request. | String: snappy / gzip / lz4 / zstd / lzma2 | No | "" |
| compressor.zstd.level | When zstd is the selected RPC compression algorithm, this parameter additionally configures the zstd compression level. | Int: [-131072, 22] | No | 3 |
| rate-limit-bytes-per-second | Maximum number of bytes allowed to be transmitted per second, calculated on the compressed bytes (i.e., after compression). A value less than 0 means no limit. | Double: [Double.MIN_VALUE, Double.MAX_VALUE] | No | -1 |
| air-gap.handshake-timeout-ms | Timeout duration (in milliseconds) for handshake requests when the sender and receiver first establish a connection. | Integer | No | 5000 |

#### iotdb-thrift-ssl-sink

| **Parameter** | **Description** | **Value Range** | **Required** | **Default Value** |
| :-------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :------------ |
| sink | iotdb-thrift-ssl-sink | String: iotdb-thrift-ssl-sink | Yes | - |
| node-urls | URLs of the DataNode service ports on the target IoTDB (note that a synchronization task cannot forward to its own service). | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669' or '127.0.0.1:6667' | Yes | - |
| user/username | Username for connecting to the target IoTDB; must have the appropriate permissions. | String | No | root |
| password | Password for the username. | String | No | root |
| batch.enable | Enables batch mode for log transmission to improve throughput and reduce IOPS. | Boolean: true / false | No | true |
| batch.max-delay-seconds | Maximum delay (in seconds) before a batch is transmitted. | Integer | No | 1 |
| batch.size-bytes | Maximum batch size (in bytes) for batch transmission. | Long | No | 16*1024*1024 |
| compressor | The RPC compression algorithm(s) to use. Multiple algorithms can be configured and are applied in order for each request. | String: snappy / gzip / lz4 / zstd / lzma2 | No | "" |
| compressor.zstd.level | When zstd is the selected RPC compression algorithm, this parameter additionally configures the zstd compression level. | Int: [-131072, 22] | No | 3 |
| rate-limit-bytes-per-second | Maximum number of bytes allowed to be transmitted per second, calculated on the compressed bytes (i.e., after compression). A value less than 0 means no limit. | Double: [Double.MIN_VALUE, Double.MAX_VALUE] | No | -1 |
| ssl.trust-store-path | Path to the trust store certificate for the SSL connection. | String. Example: 'pki/trusted' | Yes | - |
| ssl.trust-store-pwd | Password for the trust store certificate. | String | Yes | - |
\ No newline at end of file
diff --git a/src/UserGuide/latest-Table/User-Manual/Tiered-Storage_timecho.md b/src/UserGuide/latest-Table/User-Manual/Tiered-Storage_timecho.md
new file mode 100644
index 000000000..406798d64
--- /dev/null
+++ b/src/UserGuide/latest-Table/User-Manual/Tiered-Storage_timecho.md
@@ -0,0 +1,97 @@

## Overview

The **tiered storage** feature lets users manage multiple types of storage media efficiently. Users can configure different storage media types in IoTDB and group them into distinct storage tiers. In IoTDB, tiered storage is implemented by managing multiple directories: users group several storage directories into the same category and designate them as a **storage tier**. Data can then be classified by its "hotness" or "coldness" and stored in the appropriate tier.

Currently, IoTDB classifies hot and cold data based on the **Time-To-Live (TTL)** parameter. When data in a tier no longer satisfies that tier's TTL rule, it is automatically migrated to the next tier.

## Parameter Definitions

To enable tiered storage in IoTDB, the following configurations are required:

1. Configure the data directories and assign them to different tiers.
2. Set a TTL for each tier to distinguish the hot and cold data it manages.
3. Optionally, configure a minimum remaining storage space ratio for each tier; if the available space in a tier falls below this threshold, data is automatically migrated to the next tier.

The specific parameters and their descriptions are as follows.

| **Parameter** | **Default Value** | **Description** | **Constraints** |
| :----------------------------------------- | :------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- |
| `dn_data_dirs` | `data/datanode/data` | Specifies the storage directories and groups them into tiers. | Tiers are separated by `;`; directories within the same tier are separated by `,`. Cloud storage (e.g., AWS S3) can only be the last tier; use `OBJECT_STORAGE` to denote it. Only one cloud storage bucket is allowed. |
| `tier_ttl_in_ms` | `-1` | Defines the TTL (in milliseconds) for each tier, i.e., the data range each tier manages. | Tiers are separated by `;`. The number of tiers must match `dn_data_dirs`. `-1` means "no limit". |
| `dn_default_space_usage_thresholds` | `0.85` | Defines the minimum remaining space threshold (as a ratio) for each tier. When a tier's remaining space falls below this threshold, data is migrated to the next tier. When the last tier falls below the threshold, the system enters `READ_ONLY` mode. | Tiers are separated by `;`. The number of tiers must match `dn_data_dirs`. |
| `object_storage_type` | `AWS_S3` | Cloud storage type. | Only `AWS_S3` is supported. |
| `object_storage_bucket` | `iotdb_data` | Cloud storage bucket name. | Required only if cloud storage is used. |
| `object_storage_endpoiont` | (Empty) | Cloud storage endpoint. | Required only if cloud storage is used. |
| `object_storage_access_key` | (Empty) | Cloud storage access key. | Required only if cloud storage is used. |
| `object_storage_access_secret` | (Empty) | Cloud storage access secret. | Required only if cloud storage is used. |
| `remote_tsfile_cache_dirs` | `data/datanode/data/cache` | Local cache directory for cloud storage. | Required only if cloud storage is used. |
| `remote_tsfile_cache_page_size_in_kb` | `20480` | Page size (in KB) of the local cache for cloud storage. | Required only if cloud storage is used. |
| `remote_tsfile_cache_max_disk_usage_in_mb` | `51200` | Maximum disk space (in MB) allocated to the local cache for cloud storage. | Required only if cloud storage is used. |

## Local Tiered Storage Example

The following is an example of a **two-tier local storage configuration**:

```Properties
# Mandatory configurations
dn_data_dirs=/data1/data;/data2/data,/data3/data
tier_ttl_in_ms=86400000;-1
dn_default_space_usage_thresholds=0.2;0.1
```

**Tier Details:**

| **Tier** | **Storage Directories** | **Data Range** | **Remaining Space Threshold** |
| :------- | :--------------------------- | :-------------------- | :---------------------------- |
| Tier 1 | `/data1/data` | Last 1 day of data (86400000 ms) | 20% |
| Tier 2 | `/data2/data`, `/data3/data` | Data older than 1 day | 10% |

## Cloud-based Tiered Storage Example

The following is an example of a **three-tier configuration with cloud storage** (the optional cache values below match the declared KB/MB units, i.e., a 20 MB cache page and 50 GB of cache disk space):

```Properties
# Mandatory configurations
dn_data_dirs=/data1/data;/data2/data,/data3/data;OBJECT_STORAGE
tier_ttl_in_ms=86400000;864000000;-1
dn_default_space_usage_thresholds=0.2;0.15;0.1
object_storage_type=AWS_S3
object_storage_bucket=iotdb
object_storage_endpoiont=
object_storage_access_key=
object_storage_access_secret=

# Optional configurations
remote_tsfile_cache_dirs=data/datanode/data/cache
remote_tsfile_cache_page_size_in_kb=20480
remote_tsfile_cache_max_disk_usage_in_mb=51200
```

**Tier Details:**

| **Tier** | **Storage Directories** | **Data Range** | **Remaining Space Threshold** |
| :------- | :--------------------------- | :----------------------------- | :---------------------------- |
| Tier 1 | `/data1/data` | Last 1 day of data (86400000 ms) | 20% |
| Tier 2 | `/data2/data`, `/data3/data` | Data from 1 day to 10 days old (864000000 ms) | 15% |
| Tier 3 | AWS S3 cloud storage | Data older than 10 days | 10% |