diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt
index 226dba80..b173c873 100644
--- a/.github/actions/spelling/allow.txt
+++ b/.github/actions/spelling/allow.txt
@@ -691,6 +691,8 @@ ipv
 netdev
 somaxconn
 JVM
-
+analyticdb
+ttl
+XUANWU
diff --git a/docs/connectors/cloud-databases/aliyun-adb-mysql.md b/docs/connectors/cloud-databases/aliyun-adb-mysql.md
index f715bdee..ee893348 100644
--- a/docs/connectors/cloud-databases/aliyun-adb-mysql.md
+++ b/docs/connectors/cloud-databases/aliyun-adb-mysql.md
@@ -1,56 +1,106 @@
 # Aliyun ADB MySQL
+Aliyun AnalyticDB for MySQL (Aliyun ADB MySQL) is a cloud-native real-time data warehouse that is fully compatible with the MySQL protocol. It offers millisecond-level writes and sub-second query performance. TapData supports using it as either a source or target database to build low-latency real-time analytical views or data services, accelerating multi-source data integration and business decision-making, and helping enterprises establish a unified analytics platform.
+## Supported Versions
-Please follow the instructions below to successfully add and use Aliyun ADB MySQL database in TapData Cloud.
+The Aliyun ADB MySQL cluster version must be 3.0, with a kernel version of 3.2.1.0 or later.
-## Supported Versions
+## Supported Operations
+
+**DML Operations**: INSERT, UPDATE, DELETE
+
+## Prerequisites
+
+1. Log in to the Alibaba Cloud console and create a [database account](https://www.alibabacloud.com/help/en/analyticdb/analyticdb-for-mysql/user-guide/create-database-accounts) for data synchronization.
+
+   :::tip
+
+   - To ensure sufficient privileges for data synchronization, select **High Privilege Account** as the account type during creation.
+   - For fine-grained permission control, grant the source database **read access to the sync tables**, and grant the target database **read/write access**.
+     For detailed syntax, see [GRANT Syntax Guide](https://www.alibabacloud.com/help/en/analyticdb/analyticdb-for-mysql/developer-reference/grant).
+
+   :::
+
+2. Enable public access. If your TapData instance is on the same private network as the ADB MySQL cluster, you can skip this step.
+
+   1. In the left sidebar, select **Database Connections**.
+
+   2. Click **Enable Public Address**.
+
+   3. In the pop-up window, add the public IP address of your TapData service to the whitelist.
+
+      For TapData Cloud, the fixed whitelist IPs are 47.93.190.224 and 47.242.251.110.
+
+   4. Click **OK**.
+
+3. If you need to connect via the public network, [apply for a public endpoint](https://www.alibabacloud.com/help/en/analyticdb/analyticdb-for-mysql/user-guide/apply-for-or-release-a-public-endpoint).
+
+4. (Optional) To enable incremental sync from Aliyun ADB MySQL, configure Binlog for each table to be synced:
+
+   1. Enable Binlog:
+
+      ```sql
+      -- Replace table_name with your actual table name
+      ALTER TABLE table_name BINLOG=true;
+      ```
-Aliyun ADB MySQL 5.0, 5.1, 5.5, 5.6, 5.7, 8.x
+      :::tip
-## Prerequisites (As a Source)
+      [XUANWU_V2](https://www.alibabacloud.com/help/en/analyticdb/analyticdb-for-mysql/developer-reference/table-engine) tables do not support enabling Binlog.
-Enable binlog for Aliyun ADB MySQL.
+      :::
-> Cascade deletes, such as those generated by the database, are not recorded in the binlog and are therefore not supported.
+   2. Adjust the Binlog retention time to prevent premature cleanup and ensure incremental sync works properly.
-## Create Aliyun ADB MySQL Account
+      You can check the current setting with:
-For MySQL 8 and later versions, the encryption method for passwords is different. Please use the corresponding method for the version to set the password; otherwise, incremental synchronization may not work.
+      ```sql
+      SHOW CREATE TABLE source_table;
+      ```
-**3.3.1 For 5.x versions**
+      Set the retention:
-```sql
-create user 'username'@'localhost' identified by 'password';
-```
+      ```sql
+      ALTER TABLE table_name binlog_ttl='1d';
+      ```
-**3.3.2 For 8.x versions**
+      `binlog_ttl` formats:
-```sql
--- Create user
-create user 'username'@'localhost' identified with mysql_native_password by 'password';
--- Change password
-alter user 'username'@'localhost' identified with mysql_native_password by 'password';
-```
+      - Milliseconds: digits only (e.g., `60` = 60 ms)
+      - Seconds: digits + `s` (e.g., `30s`)
+      - Hours: digits + `h` (e.g., `2h`)
+      - Days: digits + `d` (e.g., `1d`)
-## Grant Account Permissions
+## Connect to Aliyun ADB MySQL
-Grant SELECT permission for a specific database:
+1. Log in to TapData Platform.
-```sql
-GRANT SELECT, SHOW VIEW, CREATE ROUTINE, LOCK TABLES ON . TO 'tapdata' IDENTIFIED BY 'password';
-```
+2. In the left sidebar, click **Connections**.
-Grant global permissions:
+3. Click **Create** on the right side.
-```sql
-GRANT RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'tapdata' IDENTIFIED BY 'password';
-```
+4. In the pop-up dialog, search for and select **Aliyun ADB MySQL**.
-### Constraint Description
+5. On the connection page, fill in the cluster connection details as follows:
-When syncing from Aliyun ADB MySQL to other heterogeneous databases, if the source Aliyun ADB MySQL has table cascade settings, data updates and deletions triggered by such cascades will not be propagated to the target. If you need to build cascade handling capability on the target side, depending on the target, you can achieve this type of data synchronization through triggers or other means.
+   ![Connection Example](../../images/aliyun_adb_mysql_connection_settings.png)
-### About Update Events
+   - **Basic Settings**
+     - **Name**: Enter a unique name with business significance.
+     - **Type**: Supports using Aliyun ADB MySQL as either a source or target database.
+     - **Host**: The internal or public host address.
+     - **Port**: Database port (default is **3306**).
+     - **Database**: The name of the database. One connection maps to one database; use multiple connections for multiple databases.
+     - **Username**: The database username.
+     - **Password**: The database password.
+     - **Connection Parameters**: Optional.
+     - **Timezone**: Defaults to the database's timezone; it can be adjusted as needed.
+   - **Advanced Settings**
+     - **Contain Table**: The default option is **All**, which includes all tables. Alternatively, select **Custom** and specify the desired tables, separating their names with commas (,).
+     - **Exclude Tables**: When this switch is enabled, you can specify tables to exclude; separate multiple table names with commas (,).
+     - **Agent Settings**: Defaults to **Platform automatic allocation**; you can also manually specify an agent.
+     - **Model Load Time**: If there are fewer than 10,000 models in the data source, their schema is refreshed every hour. If the number of models exceeds 10,000, the refresh takes place daily at the time you specify.
+     - **Enable Heartbeat Table**: For source/target connections, enables automatic creation of a `_tapdata_heartbeat_table` that updates every 10 seconds (requires permissions). Used to monitor connection and task health.
-AliYun ADB MySQL update events cannot update primary keys. Therefore, when writing updates, it's necessary to determine whether the primary key value before and after the update is the same. If they are the same, the primary key needs to be removed for the update. If they are different, the update needs to be split into a delete and an insert operation.
\ No newline at end of file
+6. Click **Test**. Once successful, click **Save**.
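As an illustration of the `binlog_ttl` formats listed in the Aliyun ADB MySQL prerequisites, the following Python sketch converts a retention value into milliseconds. The helper `binlog_ttl_to_ms` is hypothetical and is not part of TapData or AnalyticDB; it assumes that only the four documented forms (bare digits, `s`, `h`, `d`) are valid and rejects everything else.

```python
import re

# Milliseconds per unit, limited to the formats listed in the documentation.
# An empty suffix means the value is already in milliseconds.
_UNIT_MS = {"": 1, "s": 1_000, "h": 3_600_000, "d": 86_400_000}


def binlog_ttl_to_ms(value: str) -> int:
    """Convert a binlog_ttl string such as '60', '30s', '2h' or '1d' to milliseconds.

    Hypothetical helper for illustration only; any other suffix raises ValueError.
    """
    match = re.fullmatch(r"([0-9]+)([shd]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported binlog_ttl format: {value!r}")
    digits, unit = match.groups()
    return int(digits) * _UNIT_MS[unit]
```

For example, `binlog_ttl_to_ms("1d")` evaluates to `86400000`, which can be compared against the longest pause you expect an incremental sync task to tolerate.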
\ No newline at end of file
diff --git a/docs/connectors/cloud-databases/polardb-mysql.md b/docs/connectors/cloud-databases/polardb-mysql.md
index e67b3a93..58a60ad6 100644
--- a/docs/connectors/cloud-databases/polardb-mysql.md
+++ b/docs/connectors/cloud-databases/polardb-mysql.md
@@ -1,77 +1,81 @@
 # PolarDB MySQL
+Alibaba Cloud PolarDB for MySQL is a cloud-native distributed database that is fully compatible with MySQL. It offers high availability, high concurrency, and elastic scalability. TapData supports using PolarDB for MySQL as both a source and a target database. It is ideal for scenarios such as on-premises to cloud migration, cross-region disaster recovery synchronization, off-cloud backups, and real-time data services, helping enterprises build a flexible, efficient, and unified data flow architecture.
+## Supported Versions
-Follow the instructions below to successfully add and use PolarDB MySQL database in TapData Cloud.
+All versions of PolarDB MySQL are supported.
-## Supported Versions
+## Supported Sync Operations
+
+- **DML**: INSERT, UPDATE, DELETE
-PolarDB MySQL 5.7.x, 8.0.x
+  :::tip
-## As a Data Source
+  When PolarDB MySQL is used as a target database, you can configure write policies through advanced settings in the task node: for insert conflicts, you can choose to update or discard; for update failures, you can choose to insert or just log the errors.
-* Enable the Binlog feature on the source database.
+  :::
-- Create an Account
-  For MySQL 8 and later, password encryption is different. Make sure to use the corresponding method for your version to set the password; otherwise, incremental synchronization may fail. Use the following commands to confirm whether supplemental logging is enabled.
+- **DDL**: ADD COLUMN, CHANGE COLUMN (auto-increment not supported), DROP COLUMN, RENAME COLUMN
+## Prerequisites
-**For 5.x Versions**
+1. Log in to the Alibaba Cloud console and [create a database account](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/create-and-manage-database-accounts) for data synchronization.
-```sql
-CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';
-```
+   :::tip
-**For 8.x Versions**
+   - To ensure sufficient privileges for synchronization, select **High-Privilege Account** when creating the user.
+   - For fine-grained permission control, grant the source database **read access to the sync tables**, and grant the target database **read/write access**. For detailed syntax, see [Account Permission](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/account-permissions).
-```sql
--- Create the user
-CREATE USER 'username'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
--- Change the password
-ALTER USER 'username'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
-```
+   :::
-### Granting Permissions
+2. Enable public access if needed. If your TapData service is deployed in the same VPC as the PolarDB for MySQL cluster, this step can be skipped.
-Grant `SELECT` permissions for a specific database:
+   1. In the left navigation, go to **Database Connections**.
+   2. Click **Enable Public Address**.
+   3. In the dialog, add the public IP address of the TapData service to the whitelist.
+   4. Click **OK**.
-```sql
-GRANT SELECT, SHOW VIEW, CREATE ROUTINE, LOCK TABLES ON . TO 'tapdata' IDENTIFIED BY 'password';
-```
+3. If you need to connect via the public network, [apply for a public endpoint](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/apply-for-a-cluster-endpoint-or-a-primary-endpoint#35097e34565yw).
-Grant global privileges:
+4. *(Optional)* To enable incremental data reading from PolarDB for MySQL, [enable Binary Logging](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/enable-binary-logging) (Binlog).
-```sql
-GRANT RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'tapdata' IDENTIFIED BY 'password';
-```
+   :::tip
-### Constraint Explanation
+   It is recommended to set the [Binlog retention period](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/enable-binary-logging#7962e330893uy) to at least **7 days**. This helps prevent loss of incremental change data and ensures that incremental sync can proceed normally.
-When synchronizing from MySQL to other heterogeneous databases, if the source MySQL database has table-level cascade settings, data updates and deletes triggered by this cascade will not be propagated to the target. If you need to build cascading processing capabilities on the target side, you can use triggers or other methods to achieve this type of data synchronization.
+   :::
-## As a Target
+## Connect to PolarDB MySQL
-Grant full privileges for a specific database:
+1. Log in to TapData Platform.
-```sql
-GRANT ALL PRIVILEGES ON . TO 'tapdata' IDENTIFIED BY 'password';
-```
+2. In the left navigation menu, click **Connections**.
-Grant global privileges:
+3. Click **Create** on the right side of the page.
-```sql
-GRANT PROCESS ON *.* TO 'tapdata' IDENTIFIED BY 'password';
-```
+4. In the dialog box, search for and select **PolarDB MySQL**.
-### Common Errors
+5. On the connection setup page, fill in the connection details as described below:
-"Unknown error 1044"
+   ![PolarDB MySQL Connection](../../images/aliyun_polardb_mysql_connection_settings.png)
-If permissions are granted correctly but you're still unable to pass the test connection through TapData, you can use the following steps to check and fix the issue:
+   - **Connection Settings**
+     - **Name**: Enter a unique name with business significance.
+     - **Type**: Supports using PolarDB MySQL as either a source or target database.
+     - **Host**: The **primary address** of your PolarDB MySQL cluster (either internal or public).
+     - **Port**: Default is **3306**.
+     - **Database**: Name of the database to connect to. One connection maps to one database. For multiple databases, create separate connections.
+     - **Username**: A high-privilege user account.
+     - **Password**: Password for the above account.
+     - **Connection Parameters**: Optional; leave blank unless needed.
+     - **Time Zone**: Default is the database time zone; can be manually set if needed.
+   - **Advanced Settings**
+     - **CDC Log Caching**: Enables [shared mining](../../operational-data-hub/advanced/share-mining.md) of the source database's incremental logs. This allows multiple tasks to share the same incremental log mining process, reducing duplicate reads and minimizing the impact of incremental synchronization on the source database. After enabling this feature, you need to select an external storage to store the incremental log information.
+     - **Contain Table**: The default option is **All**, which includes all tables. Alternatively, select **Custom** and specify the desired tables, separating their names with commas (,).
+     - **Exclude Tables**: When this switch is enabled, you can specify tables to exclude; separate multiple table names with commas (,).
+     - **Agent Settings**: Defaults to **Platform automatic allocation**; you can also manually specify an agent.
+     - **Model Load Time**: If there are fewer than 10,000 models in the data source, their schema is refreshed every hour. If the number of models exceeds 10,000, the refresh takes place daily at the time you specify.
+     - **Enable Heartbeat Table**: For source/target connections, enables automatic creation of a `_tapdata_heartbeat_table` that updates every 10 seconds (requires permissions). Used to monitor connection and task health.
-```sql
-SELECT host, user, Grant_priv, Super_priv FROM mysql.user WHERE user='username';
--- Check if the value of the Grant_priv field is 'Y'
--- If not, execute the following command
-UPDATE mysql.user SET Grant_priv='Y' WHERE user='username'; FLUSH PRIVILEGES;
-```
+6. Click **Test**. If successful, click **Save**.
diff --git a/docs/connectors/supported-data-sources.md b/docs/connectors/supported-data-sources.md
index aa5fa012..99ab320b 100644
--- a/docs/connectors/supported-data-sources.md
+++ b/docs/connectors/supported-data-sources.md
@@ -472,7 +472,6 @@ The beta version of the data sources is in public preview and has passed the bas
 ➖
 XLS/XLSX, file locations supported include local, FTP, SFTP, SMB, S3FS, OSS
-
 Huawei Cloud GaussDB
 ✅
@@ -482,6 +481,7 @@ The beta version of the data sources is in public preview and has passed the bas
 ✅
 Enterprise version 2.8 (primary-standby), supports Standby version 8.1 for on-prem deployment
+
 HubSpot
 ✅
 ➖
@@ -747,9 +747,9 @@ The Alpha version of the data sources is in public preview and has passed the ba
 ✅
 ✅
 ➖
+ ✅
 ➖
- ➖
- 5.0, 5.1, 5.5, 5.6, 5.7, 8.x
+ 3.x
 Aliyun AnalyticDB PostgreSQL
@@ -809,10 +809,10 @@ The Alpha version of the data sources is in public preview and has passed the ba
 PolarDB MySQL
 ✅
 ✅
- ➖
 ✅
- ➖
- 5.6, 5.7, 8.0
+ ✅
+ ✅
+ All
 PolarDB PostgreSQL
diff --git a/docs/images/aliyun_adb_mysql_connection_settings.png b/docs/images/aliyun_adb_mysql_connection_settings.png
new file mode 100644
index 00000000..96d97716
Binary files /dev/null and b/docs/images/aliyun_adb_mysql_connection_settings.png differ
diff --git a/docs/images/aliyun_polardb_mysql_connection_settings.png b/docs/images/aliyun_polardb_mysql_connection_settings.png
new file mode 100644
index 00000000..203ceb4e
Binary files /dev/null and b/docs/images/aliyun_polardb_mysql_connection_settings.png differ
diff --git a/docs/what-is-tapdata.md b/docs/what-is-tapdata.md
index 02575b92..9debfcee 100644
--- a/docs/what-is-tapdata.md
+++ b/docs/what-is-tapdata.md
@@ -18,21 +18,21 @@ Transform all your data assets—legacy CRM, ERP, databases, and SaaS—into a s
 ## When to use TapData
-- **Build Operational Data Hub**: A modern, centralized data integration architecture.
-- **Active Master Data Management (MDM):** Unify customers, products, and transactions into a single source of truth.
-- **Real-Time Data Integration**: Change Data Capture (CDC) based database replications and transformations, in cloud or on-prem.
-- **Real-Time Single View**: Quickly build up-to-date, analysis-ready wide tables for customer or product.
+- **[Build Operational Data Hub](operational-data-hub/plan-data-platform.md)**: A modern, centralized data integration architecture.
+- **[Active Master Data Management (MDM)](operational-data-hub/mdm-layer/prepare-and-transform.md):** Unify customers, products, and transactions into a single source of truth.
+- **[Real-Time Data Integration](introduction/change-data-capture-mechanism.md)**: Change Data Capture (CDC) based database replications and transformations, in cloud or on-prem.
+- **[Real-Time Single View](getting-started/build-real-time-materialized-view.md)**: Quickly build up-to-date, analysis-ready wide tables for customer or product.
 - **Query Acceleration:** Power complex analytics with incremental materialized views—no performance hit on production databases.
 - **Microservices Data Sync:** Keep distributed services and APIs in sync with live, event-driven data.
 ## Key Features
-- **50+ Pre-Built CDC Connectors**: Oracle, DB2, Sybase, MSSQL, PostgreSQL, MySQL and cloud variations
-- **Sub-Second Data Capture & Sync**: Instant detect and capture changes from source databases and sync to destination
-- **Real-Time Materialized Views**: continuously refreshed materialized views, always in sync with source tables.
+- **[50+ Pre-Built CDC Connectors](connectors/supported-data-sources.md)**: Oracle, DB2, Sybase, MSSQL, PostgreSQL, MySQL and cloud variations
+- **[Sub-Second Data Capture & Sync](introduction/change-data-capture-mechanism.md)**: Instantly detect and capture changes from source databases and sync them to the destination
+- **[Real-Time Materialized Views](data-transformation/create-views/overview.md)**: Continuously refreshed materialized views, always in sync with source tables.
 - **Flexible Architecture**: Supports point-to-point, hub-and-spoke, REST API, and event streaming.
-- **Developer Friendly**: Visual drag-and-drop pipelines, plus Python SDK for data engineers.
-- **Enterprise-Grade**: Scales to millions of daily transactions, with built-in monitoring and full data lineage.
+- **Developer Friendly**: Visual [drag-and-drop pipelines](getting-started/build-real-time-materialized-view.md), plus [Python SDK](experimental/tapflow/introduction.md) for data engineers.
+- **Enterprise-Grade**: Scales to millions of daily transactions, with [built-in monitoring](data-replication/monitor-task.md) and [full data lineage](operational-data-hub/fdm-layer/explore-fdm-tables.md).
 ## Why Real-Time Operational Data Platform Matters