Merged
4 changes: 3 additions & 1 deletion .github/actions/spelling/allow.txt
Original file line number Diff line number Diff line change
Expand Up @@ -691,6 +691,8 @@ ipv
netdev
somaxconn
JVM

analyticdb
ttl
XUANWU


114 changes: 82 additions & 32 deletions docs/connectors/cloud-databases/aliyun-adb-mysql.md
@@ -1,56 +1,106 @@
# Aliyun ADB MySQL

Aliyun AnalyticDB for MySQL (Aliyun ADB MySQL) is a cloud-native real-time data warehouse that is fully compatible with the MySQL protocol. It offers millisecond-level writes and sub-second query performance. TapData supports using it as either a source or target database to build low-latency real-time analytical views or data services, accelerating multi-source data integration and business decision-making, and helping enterprises establish a unified analytics platform.

## Supported Versions

Please follow the instructions below to successfully add and use Aliyun ADB MySQL database in TapData Cloud.
The Aliyun ADB MySQL cluster version must be 3.0, with a kernel version of 3.2.1.0 or later.

## Supported Versions
## Supported Operations

**DML Operations**: INSERT, UPDATE, DELETE

## Prerequisites

1. Log in to the Alibaba Cloud console and create a [database account](https://www.alibabacloud.com/help/en/analyticdb/analyticdb-for-mysql/user-guide/create-database-accounts) for data synchronization.

:::tip

- To ensure sufficient privileges for data synchronization, please select **High Privilege Account** as the account type during creation.
- For fine-grained permission control, grant the source database **read access to the sync tables**, and grant the target database **read/write access**.
For detailed syntax, see [GRANT Syntax Guide](https://www.alibabacloud.com/help/en/analyticdb/analyticdb-for-mysql/developer-reference/grant).

:::

2. Enable public access. If your TapData instance is on the same private network as the ADB MySQL cluster, you can skip this step.

1. In the left sidebar, select **Database Connections**.

2. Click **Enable Public Address**.

3. In the pop-up window, add the public IP address of your TapData service to the whitelist.

For TapData Cloud, the fixed whitelist IPs are 47.93.190.224 and 47.242.251.110.

4. Click **OK**.

3. If you need to connect via the public network, [apply for a public endpoint](https://www.alibabacloud.com/help/en/analyticdb/analyticdb-for-mysql/user-guide/apply-for-or-release-a-public-endpoint).

4. (Optional) To enable incremental sync from Aliyun ADB MySQL, configure Binlog for each table to be synced:

1. Enable Binlog:

```sql
-- Replace table_name with your actual table name
ALTER TABLE table_name BINLOG=true;
```

Aliyun ADB MySQL 5.0, 5.1, 5.5, 5.6, 5.7, 8.x
:::tip

## Prerequisites (As a Source)
[XUANWU_V2](https://www.alibabacloud.com/help/en/analyticdb/analyticdb-for-mysql/developer-reference/table-engine) tables do not support enabling Binlog.

Enable binlog for Aliyun ADB MySQL.
:::

> Cascade deletes, such as those generated by the database, are not recorded in the binlog and are therefore not supported.
2. Adjust the Binlog retention time to prevent premature cleanup and ensure that incremental sync works properly.

## Create Aliyun ADB MySQL Account
You can check the current setting with:

For MySQL 8 and later versions, the encryption method for passwords is different. Please use the corresponding method for the version to set the password; otherwise, incremental synchronization may not work.
```sql
SHOW CREATE TABLE source_table;
```

**3.3.1 For 5.x versions**
Set the retention:

```sql
create user 'username'@'localhost' identified by 'password';
```
```sql
ALTER TABLE table_name binlog_ttl='1d';
```

**3.3.2 For 8.x versions**
`binlog_ttl` formats:

```sql
-- Create user
create user 'username'@'localhost' identified with mysql_native_password by 'password';
-- Change password
alter user 'username'@'localhost' identified with mysql_native_password by 'password';
```
- Milliseconds: digits only (e.g., `60` = 60 ms)
- Seconds: digits + `s` (e.g., `30s`)
- Hours: digits + `h` (e.g., `2h`)
- Days: digits + `d` (e.g., `1d`)
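
When many tables need Binlog enabled, generating the statements can reduce copy-and-paste mistakes. The following is a minimal Python sketch that validates a `binlog_ttl` value against the four formats listed above and builds the two `ALTER TABLE` statements shown earlier for each table. The helper names `binlog_ttl_to_ms` and `build_binlog_statements` are hypothetical and not part of TapData or AnalyticDB, and the table-name check assumes plain unquoted identifiers.

```python
import re

# Milliseconds per unit, limited to the four formats listed above.
_UNIT_MS = {"": 1, "s": 1_000, "h": 3_600_000, "d": 86_400_000}
_TTL_PATTERN = re.compile(r"([0-9]+)([shd]?)")
# Assumption: table names are simple unquoted identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def binlog_ttl_to_ms(value: str) -> int:
    """Convert a binlog_ttl string such as '30s' or '1d' to milliseconds."""
    match = _TTL_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported binlog_ttl format: {value!r}")
    digits, unit = match.groups()
    return int(digits) * _UNIT_MS[unit]


def build_binlog_statements(tables, ttl):
    """Return the statements that enable Binlog and set retention per table."""
    ttl = ttl.strip()
    binlog_ttl_to_ms(ttl)  # raises ValueError if the format is not recognized
    statements = []
    for table in tables:
        if _IDENTIFIER.fullmatch(table) is None:
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(f"ALTER TABLE {table} BINLOG=true;")
        statements.append(f"ALTER TABLE {table} binlog_ttl='{ttl}';")
    return statements


if __name__ == "__main__":
    # 'orders' and 'customers' are placeholder table names.
    for statement in build_binlog_statements(["orders", "customers"], "1d"):
        print(statement)
```

Review the generated statements before running them against the cluster, because XUANWU_V2 tables do not support Binlog.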

## Grant Account Permissions
## Connect to Aliyun ADB MySQL

Grant SELECT permission for a specific database:
1. Log in to TapData Platform.

```sql
GRANT SELECT, SHOW VIEW, CREATE ROUTINE, LOCK TABLES ON <DATABASE_NAME>.<TABLE_NAME> TO 'tapdata' IDENTIFIED BY 'password';
```
2. In the left sidebar, click **Connections**.

Grant global permissions:
3. Click **Create** on the right side.

```sql
GRANT RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'tapdata' IDENTIFIED BY 'password';
```
4. In the pop-up dialog, search for and select **Aliyun ADB MySQL**.

### Constraint Description
5. On the connection page, fill in the cluster connection details as follows:

When syncing from Aliyun ADB MySQL to other heterogeneous databases, if the source Aliyun ADB MySQL has table cascade settings, data updates and deletions triggered by such cascades will not be propagated to the target. If you need to build cascade handling capability on the target side, depending on the target, you can achieve this type of data synchronization through triggers or other means.
![Connection Example](../../images/aliyun_adb_mysql_connection_settings.png)

### About Update Events
- **Basic Settings**
- **Name**: Enter a unique name with business significance.
- **Type**: Supports using Aliyun ADB MySQL as either a source or target database.
- **Host**: The internal or public host address.
- **Port**: Database port (default is **3306**).
- **Database**: The name of the database. One connection maps to one database; use multiple connections for multiple databases.
- **Username**: The database username.
- **Password**: The database password.
- **Connection Parameters**: Optional.
- **Timezone**: Defaults to the database's timezone; it can be adjusted as needed.
- **Advanced Settings**
- **Contain Table**: The default option is **All**, which includes all tables. Alternatively, you can select **Custom** and manually specify the desired tables by separating their names with commas (,).
- **Exclude Tables**: Once the switch is enabled, you have the option to specify tables to be excluded. You can do this by listing the table names separated by commas (,) in case there are multiple tables to be excluded.
- **Agent Settings**: Defaults to **Platform automatic allocation**; you can also manually specify an agent.
- **Model Load Time**: If the data source contains fewer than 10,000 models, their schema is refreshed every hour. If the number of models exceeds 10,000, the refresh takes place daily at the time you specify.
- **Enable Heartbeat Table**: For source/target connections, enables automatic creation of a `_tapdata_heartbeat_table` that updates every 10 seconds (requires permissions). Used to monitor connection and task health.

AliYun ADB MySQL update events cannot update primary keys. Therefore, when writing updates, it's necessary to determine whether the primary key value before and after the update is the same. If they are the same, the primary key needs to be removed for the update. If they are different, the update needs to be split into a delete and an insert operation.
6. Click **Test**. Once successful, click **Save**.
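
The primary-key rule for update events described above (under "About Update Events") can be sketched in Python as follows. This is only an illustration of the stated rule, not TapData's actual implementation: the function name `plan_update_writes`, the tuple layout of the returned operations, and the dictionary row format are all assumptions made for the example.

```python
def plan_update_writes(primary_keys, before, after):
    """Plan the writes for one update event.

    primary_keys: list of primary-key column names.
    before / after: dicts holding the row before and after the update.
    Returns a list of (operation, key, values) tuples (hypothetical layout).
    """
    key_before = {col: before[col] for col in primary_keys}
    key_after = {col: after[col] for col in primary_keys}

    if key_before == key_after:
        # Primary key unchanged: drop the key columns from the SET list
        # and use them only to locate the row.
        changes = {col: val for col, val in after.items()
                   if col not in primary_keys}
        return [("update", key_before, changes)]

    # Primary key changed: split the update into a delete plus an insert.
    return [
        ("delete", key_before, None),
        ("insert", None, dict(after)),
    ]


if __name__ == "__main__":
    print(plan_update_writes(["id"], {"id": 1, "name": "a"}, {"id": 1, "name": "b"}))
    print(plan_update_writes(["id"], {"id": 1, "name": "a"}, {"id": 2, "name": "a"}))
```

Splitting a key change into a delete followed by an insert keeps the target consistent even though the update statement itself cannot modify the primary key.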
98 changes: 51 additions & 47 deletions docs/connectors/cloud-databases/polardb-mysql.md
@@ -1,77 +1,81 @@
# PolarDB MySQL

Alibaba Cloud PolarDB for MySQL is a cloud-native distributed database that is fully compatible with MySQL. It offers high availability, high concurrency, and elastic scalability. TapData supports using PolarDB for MySQL as both a source and a target database. It is well suited to scenarios such as on-premises to cloud migration, cross-region disaster recovery synchronization, cloud-off backups, and real-time data services, helping enterprises build a flexible, efficient, and unified data flow architecture.

## Supported Versions

Follow the instructions below to successfully add and use PolarDB MySQL database in TapData Cloud.
All versions of PolarDB MySQL are supported.

## Supported Versions
## Supported Sync Operations

- **DML**: INSERT, UPDATE, DELETE

PolarDB MySQL 5.7.x, 8.0.x
:::tip

## As a Data Source
When PolarDB MySQL is used as a target database, you can configure write policies through advanced settings in the task node: for insert conflicts, you can choose to update or discard; for update failures, you can choose to insert or just log the errors.

* Enable the Binlog feature on the source database.
:::

- Create an Account
- **DDL**: ADD COLUMN, CHANGE COLUMN (auto-increment not supported), DROP COLUMN, RENAME COLUMN

For MySQL 8 and later, password encryption is different. Make sure to use the corresponding method for your version to set the password; otherwise, incremental synchronization may fail. Use the following commands to confirm whether supplemental logging is enabled.
## Prerequisites

**For 5.x Versions**
1. Log in to the Alibaba Cloud console and [create a database account](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/create-and-manage-database-accounts) for data synchronization.

```sql
CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';
```
:::tip

**For 8.x Versions**
- To ensure sufficient privileges for synchronization, select High-Privilege Account when creating the user.
- For fine-grained permission control, grant the source database **read access to the sync tables**, and grant the target database **read/write access**. For detailed syntax, see [Account Permission](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/account-permissions).

```sql
-- Create the user
CREATE USER 'username'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
-- Change the password
ALTER USER 'username'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
```
:::

### Granting Permissions
2. Enable public access if needed. If your TapData service is deployed in the same VPC as the PolarDB for MySQL cluster, you can skip this step.

Grant `SELECT` permissions for a specific database:
1. In the left navigation, go to **Database Connections**.
2. Click **Enable Public Address**.
3. In the dialog, add the public IP address of the TapData service to the whitelist.
4. Click **OK**.

```sql
GRANT SELECT, SHOW VIEW, CREATE ROUTINE, LOCK TABLES ON <DATABASE_NAME>.<TABLE_NAME> TO 'tapdata' IDENTIFIED BY 'password';
```
3. If you need to connect via the public network, [apply for a public endpoint](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/apply-for-a-cluster-endpoint-or-a-primary-endpoint#35097e34565yw).

Grant global privileges:
4. *(Optional)* To enable incremental data reading from PolarDB for MySQL, [enable Binary Logging](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/enable-binary-logging) (Binlog).

```sql
GRANT RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'tapdata' IDENTIFIED BY 'password';
```
:::tip

### Constraint Explanation
It is recommended to set the [Binlog retention period](https://www.alibabacloud.com/help/en/polardb/polardb-for-mysql/user-guide/enable-binary-logging#7962e330893uy) to at least **7 days**. This helps prevent loss of incremental change data and ensures that incremental sync can proceed normally.

When synchronizing from MySQL to other heterogeneous databases, if the source MySQL database has table-level cascade settings, data updates and deletes triggered by this cascade will not be propagated to the target. If you need to build cascading processing capabilities on the target side, you can use triggers or other methods to achieve this type of data synchronization.
:::

## As a Target
## Connect to PolarDB MySQL

Grant full privileges for a specific database:
1. Log in to TapData Platform.

```sql
GRANT ALL PRIVILEGES ON <DATABASE_NAME>.<TABLE_NAME> TO 'tapdata' IDENTIFIED BY 'password';
```
2. In the left navigation menu, click **Connections**.

Grant global privileges:
3. Click **Create** on the right side of the page.

```sql
GRANT PROCESS ON *.* TO 'tapdata' IDENTIFIED BY 'password';
```
4. In the dialog box, search for and select **PolarDB MySQL**.

### Common Errors
5. On the connection setup page, fill in the connection details as described below:

"Unknown error 1044"
![PolarDB MySQL Connection](../../images/aliyun_polardb_mysql_connection_settings.png)

If permissions are granted correctly but you're still unable to pass the test connection through TapData, you can use the following steps to check and fix the issue:
- **Connection Settings**
- **Name**: Enter a unique name with business significance.
- **Type**: Supports using PolarDB MySQL as either a source or target database.
- **Host**: The **primary address** of your PolarDB MySQL cluster (either internal or public).
- **Port**: Default is **3306**.
- **Database**: Name of the database to connect to. One connection maps to one database. For multiple databases, create separate connections.
- **Username**: A high-privilege user account.
- **Password**: Password for the above account.
- **Connection Parameters**: Optional; leave blank unless needed.
- **Time Zone**: Default is the database time zone; can be manually set if needed.
- **Advanced Settings**
- **CDC Log Caching**: Enables [shared mining](../../operational-data-hub/advanced/share-mining.md) of the source database's incremental logs. Multiple tasks can then share the same incremental log mining process, which reduces duplicate reads and minimizes the impact of incremental synchronization on the source database. After enabling this feature, you need to select an external storage to hold the incremental log information.
- **Contain Table**: The default option is **All**, which includes all tables. Alternatively, you can select **Custom** and manually specify the desired tables by separating their names with commas (,).
- **Exclude Tables**: Once the switch is enabled, you have the option to specify tables to be excluded. You can do this by listing the table names separated by commas (,) in case there are multiple tables to be excluded.
- **Agent Settings**: Defaults to **Platform automatic allocation**; you can also manually specify an agent.
- **Model Load Time**: If the data source contains fewer than 10,000 models, their schema is refreshed every hour. If the number of models exceeds 10,000, the refresh takes place daily at the time you specify.
- **Enable Heartbeat Table**: For source/target connections, enables automatic creation of a `_tapdata_heartbeat_table` that updates every 10 seconds (requires permissions). Used to monitor connection and task health.

```sql
SELECT host, user, Grant_priv, Super_priv FROM mysql.user WHERE user='username';
-- Check if the value of the Grant_priv field is 'Y'
-- If not, execute the following command
UPDATE mysql.user SET Grant_priv='Y' WHERE user='username'; FLUSH PRIVILEGES;
```
6. Click **Test**. If successful, click **Save**.
12 changes: 6 additions & 6 deletions docs/connectors/supported-data-sources.md
Expand Up @@ -472,7 +472,6 @@ The beta version of the data sources is in public preview and has passed the bas
<td>➖</td>
<td>XLS/XLSX, file locations supported include local, FTP, SFTP, SMB, S3FS, OSS</td>
</tr>
<tr>
<tr>
<td>Huawei Cloud GaussDB</td>
<td>✅</td>
Expand All @@ -482,6 +481,7 @@ The beta version of the data sources is in public preview and has passed the bas
<td>✅</td>
<td>Enterprise version 2.8 (primary-standby), supports Standby version 8.1 for on-prem deployment</td>
</tr>
<tr>
<td>HubSpot</td>
<td>✅</td>
<td>➖</td>
Expand Down Expand Up @@ -747,9 +747,9 @@ The Alpha version of the data sources is in public preview and has passed the ba
<td>✅</td>
<td>✅</td>
<td>➖</td>
<td>✅</td>
<td>➖</td>
<td>➖</td>
<td>5.0, 5.1, 5.5, 5.6, 5.7, 8.x</td>
<td>3.x</td>
</tr>
<tr>
<td>Aliyun AnalyticDB PostgreSQL</td>
Expand Down Expand Up @@ -809,10 +809,10 @@ The Alpha version of the data sources is in public preview and has passed the ba
<td>PolarDB MySQL</td>
<td>✅</td>
<td>✅</td>
<td>➖</td>
<td>✅</td>
<td>➖</td>
<td>5.6, 5.7, 8.0</td>
<td>✅</td>
<td>✅</td>
<td>All</td>
</tr>
<tr>
<td>PolarDB PostgreSQL</td>
Expand Down
18 changes: 9 additions & 9 deletions docs/what-is-tapdata.md
Expand Up @@ -18,21 +18,21 @@ Transform all your data assets—legacy CRM, ERP, databases, and SaaS—into a s

## When to use TapData

- **Build Operational Data Hub**: A modern, centralized data integration architecture.
- **Active Master Data Management (MDM):** Unify customers, products, and transactions into a single source of truth.
- **Real-Time Data Integration**: Change Data Capture (CDC) based database replications and transformations, in cloud or on-prem.
- **Real-Time Single View**: Quickly build up-to-date, analysis-ready wide tables for customer or product.
- **[Build Operational Data Hub](operational-data-hub/plan-data-platform.md)**: A modern, centralized data integration architecture.
- **[Active Master Data Management (MDM)](operational-data-hub/mdm-layer/prepare-and-transform.md):** Unify customers, products, and transactions into a single source of truth.
- **[Real-Time Data Integration](introduction/change-data-capture-mechanism.md)**: Change Data Capture (CDC) based database replications and transformations, in cloud or on-prem.
- **[Real-Time Single View](getting-started/build-real-time-materialized-view.md)**: Quickly build up-to-date, analysis-ready wide tables for customer or product.
- **Query Acceleration:** Power complex analytics with incremental materialized views—no performance hit on production databases.
- **Microservices Data Sync:** Keep distributed services and APIs in sync with live, event-driven data.

## Key Features

- **50+ Pre-Built CDC Connectors**: Oracle, DB2, Sybase, MSSQL, PostgreSQL, MySQL and cloud variations
- **Sub-Second Data Capture & Sync**: Instantly detect and capture changes from source databases and sync them to the destination
- **Real-Time Materialized Views**: Continuously refreshed materialized views, always in sync with source tables.
- **[50+ Pre-Built CDC Connectors](connectors/supported-data-sources.md)**: Oracle, DB2, Sybase, MSSQL, PostgreSQL, MySQL and cloud variations
- **[Sub-Second Data Capture & Sync](introduction/change-data-capture-mechanism.md)**: Instantly detect and capture changes from source databases and sync them to the destination
- **[Real-Time Materialized Views](data-transformation/create-views/overview.md)**: Continuously refreshed materialized views, always in sync with source tables.
- **Flexible Architecture**: Supports point-to-point, hub-and-spoke, REST API, and event streaming.
- **Developer Friendly**: Visual drag-and-drop pipelines, plus Python SDK for data engineers.
- **Enterprise-Grade**: Scales to millions of daily transactions, with built-in monitoring and full data lineage.
- **Developer Friendly**: Visual [drag-and-drop pipelines](getting-started/build-real-time-materialized-view.md), plus [Python SDK](experimental/tapflow/introduction.md) for data engineers.
- **Enterprise-Grade**: Scales to millions of daily transactions, with [built-in monitoring](data-replication/monitor-task.md) and [full data lineage](operational-data-hub/fdm-layer/explore-fdm-tables.md).

## Why Real-Time Operational Data Platform Matters

Expand Down