6 changes: 6 additions & 0 deletions docs/case-practices/best-practice/data-sync.md
@@ -14,6 +14,12 @@ Analyzing the data sources is fundamental to data synchronization. It helps asse
| **Primary Keys/Unique Indexes** | Primary keys or unique indexes play a crucial role in synchronization performance and data consistency. If absent, special configurations may be needed for these tables in subsequent task settings. |
| **Target Database Type** | Confirm the type of target database. For heterogeneous data synchronization, ensure data type compatibility. For more information, see [Data Type Support](../../faq/no-supported-data-type.md). |

:::caution Tables Without Primary Keys

For tables without primary keys or unique identifiers, TapData cannot distinguish between duplicate rows. When the source updates or deletes some duplicate rows, all identical rows in the target will be affected. To ensure accurate synchronization, add a primary key or unique index to source tables, or select column combinations that can uniquely identify rows. Also avoid creating tables with many identical rows.

:::
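
As an illustration of the caution above, two common ways to give such a table a usable row identifier are a surrogate primary key or a composite unique index. The table and column names below are hypothetical, and the syntax shown is PostgreSQL-style; adapt it to your source database:

```sql
-- Option 1: add a surrogate primary key to a table that lacks one
ALTER TABLE orders_history ADD COLUMN id BIGSERIAL PRIMARY KEY;

-- Option 2: add a unique index over a column combination that uniquely identifies rows
CREATE UNIQUE INDEX uq_orders_history ON orders_history (order_no, line_no);
```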


## Configure and Optimize Tasks

2 changes: 1 addition & 1 deletion docs/connectors/on-prem-databases/oceanbase-oracle.md
@@ -85,7 +85,7 @@ import TabItem from '@theme/TabItem';

1. [Deploy OBProxy](https://en.oceanbase.com/docs/common-odp-doc-en-10000000002135940): the proxy layer that handles client connections and load balancing.

- 2. [Deploy oblogproxy](https://en.oceanbase.com/docs/community-obd-en-10000000002136450): the incremental log proxy that connects to OceanBase and fetches CDC logs.
+ 2. Contact OceanBase technical support to obtain and install the enterprise edition **obcdc** component for transaction log conversion.

3. Contact the [Tapdata team](../../appendix/support.md) to obtain the `OB-Log-Decoder` installation package.

6 changes: 6 additions & 0 deletions docs/connectors/on-prem-databases/postgresql.md
@@ -143,6 +143,10 @@ When using PostgreSQL as the target database or obtaining incremental data via t
CREATE PUBLICATION dbz_publication_root FOR ALL TABLES WITH (PUBLISH_VIA_PARTITION_ROOT = TRUE);
CREATE PUBLICATION dbz_publication FOR ALL TABLES;
```
:::tip
When creating the connection, selecting the **Pgoutput** plugin enables **Partial Publication**, which removes the need to set `REPLICA IDENTITY FULL` for tables without primary keys when handling update and delete operations. Note that the synchronization account needs the `CREATE PUBLICATION` permission and `OWNER` privileges on the target tables.
:::
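
Without Partial Publication, the common fallback under a global publication is to have PostgreSQL log full row images for tables without primary keys, so that update and delete events carry enough data to identify rows. A sketch with a hypothetical table name:

```sql
-- Required under a global publication so UPDATE/DELETE events carry the full old row
ALTER TABLE public.no_pk_table REPLICA IDENTITY FULL;
```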

- [Decoderbufs](https://github.com/debezium/postgres-decoderbufs): Suitable for PostgreSQL 9.6 and above, uses Google Protocol Buffers to parse WAL logs but requires more complex configuration.
- [Walminer](https://gitee.com/movead/XLogMiner/tree/master/): Does not rely on logical replication and does not require setting `wal_level` to `logical` or adjusting replication slot configuration, but requires superuser permissions.
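
For the plugins that do rely on logical decoding (Pgoutput and Decoderbufs), the server must run with logical WAL enabled. A quick check and change might look like the following; the `ALTER SYSTEM` change only takes effect after a server restart:

```sql
SHOW wal_level;                        -- must return 'logical' for logical decoding
ALTER SYSTEM SET wal_level = logical;  -- requires a PostgreSQL restart to apply
```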

@@ -428,6 +432,7 @@ To further enhance the security of the data pipeline, you can enable SSL (Secure
* **User**: Database username.
* **Password**: Password corresponding to the database username.
* **Log Plugin Name**: To read data changes from PostgreSQL and achieve incremental data synchronization, you need to follow the guidance in the [Preparation](#prerequisites) section to select and install the appropriate plugin.
* **Partial Publication**: Available only when **Pgoutput** is selected as the log plugin. This creates an individual publication for each table, removing the global-publication restriction that requires `REPLICA IDENTITY FULL` for tables without primary keys when handling update and delete operations. Requires a database account with the `CREATE PUBLICATION` permission and `OWNER` privileges on the target tables.
* **Advanced Settings**
* **ExtParams**: Additional connection parameters, default is empty.
* **Timezone**: Defaults to timezone 0. You can also specify it manually according to business needs. Configuring a different timezone will affect timezone-related fields, such as DATE, TIMESTAMP, TIMESTAMP WITH TIME ZONE, etc.
@@ -457,6 +462,7 @@ When configuring data synchronization/conversion tasks, you can use PostgreSQL a
* **Hash Sharding**: When enabled, all table data will be split into multiple shards based on hash values during the full synchronization phase, allowing concurrent data reading. This significantly improves reading performance but also increases the database load. The maximum number of shards can be manually set after enabling this option.
* **Partition Table CDC Root Table**: Supported only in PostgreSQL 13 and above, and when selecting the pgoutput log plugin. When enabled, only CDC events for root tables will be detected; when disabled, only CDC events for child tables will be detected.
* **Max Queue Size**: Specifies the queue size for reading incremental data in PostgreSQL. The default value is **8000**. If the downstream synchronization is slow or individual table records are too large, consider lowering this value.
* **Split Update Unique Key**: Enabled by default. When updating unique key fields, this splits UPDATE into DELETE + INSERT events to improve target compatibility. Disable this if you need to preserve original UPDATE events (e.g., for auditing or change tracking).
* As a Target Node
* **Ignore NotNull**: Off by default. When enabled, NOT NULL constraints are ignored when creating tables in the target database.
* **Specify Table Owner**: When synchronizing to PostgreSQL, you can specify the owner of automatically created tables. Ensure that the account used for data synchronization has the necessary permissions. If not, log in to the database as an administrator and execute `ALTER USER <tapdataUser> INHERIT;` and `GRANT <tableOwner> TO <tapdataUser>;`.
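
The two statements mentioned above, filled in with hypothetical account names (`tapdata` as the synchronization account, `app_owner` as the table owner), would be run by an administrator as:

```sql
ALTER USER tapdata INHERIT;   -- let tapdata inherit privileges from granted roles
GRANT app_owner TO tapdata;   -- grant the table owner's role to the sync account
```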
@@ -57,6 +57,10 @@ This guide uses CentOS 7 as an example to demonstrate the deployment process.

For example: `tar -zxvf tapdata-release-v2.14.tar.gz && cd tapdata`

:::tip
If you need to copy the extracted program files to another directory for deployment, use the `cp -a` command to copy the entire directory. Avoid using the `*` wildcard to match files, as this may omit hidden files and cause startup failures.
:::
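
To illustrate why, a quick sketch with hypothetical paths: a `*` wildcard does not match dot-files, so a hidden file such as a `.env` is silently left behind, while `cp -a` copies everything and preserves attributes:

```shell
mkdir -p /tmp/tapdata-src /tmp/copy-star /tmp/copy-a
touch /tmp/tapdata-src/tapdata /tmp/tapdata-src/.env   # one visible and one hidden file
cp /tmp/tapdata-src/* /tmp/copy-star/                  # wildcard: .env is NOT matched
cp -a /tmp/tapdata-src/. /tmp/copy-a/                  # copies everything, attrs intact
ls -A /tmp/copy-star   # tapdata only -- .env was skipped
ls -A /tmp/copy-a      # tapdata and .env
```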

5. Prepare the License file.

1. Execute the following command to obtain the SID information required for the application.
@@ -82,6 +82,9 @@ The following operations need to be **performed separately on each of the three
# Extract the installation package (replace the package name with the actual name)
tar -zxvf installation-package-name -C /data/tapdata
```
:::tip
If you need to copy the extracted program files to another directory for deployment, use the `cp -a` command to copy the entire directory. Avoid using the `*` wildcard to match files, as this may omit hidden files and cause startup failures.
:::

4. Navigate to the extracted directory and run the `./tapdata start` command to start the TapData deployment process. Follow the command line prompts to set up TapData's login address, API service port, MongoDB connection authentication, and other settings. An example setup is provided below:
