From 49b648dad097ca2fc5f9152f88444521a54b3598 Mon Sep 17 00:00:00 2001
From: Walter <guichenchen@outlook.com>
Date: Sat, 29 Nov 2025 15:34:37 +0800
Subject: [PATCH] Update documentation for API responses, terms, and operations

---
 docs/introduction/terms.md                   | 15 ++++-
 docs/platform-ops/operation.md               | 58 +++++++++++++++++++-
 docs/publish-apis/query/query-via-restful.md | 20 +++++++
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/docs/introduction/terms.md b/docs/introduction/terms.md
index e0e3bbc5..de3431fc 100644
--- a/docs/introduction/terms.md
+++ b/docs/introduction/terms.md
@@ -76,4 +76,17 @@ A lightweight runtime component that executes pipelines. It connects to data sou
 
 ## TCM (TapData Control Manager)
 
-The centralized management plane for pipeline orchestration, configuration, monitoring, and deployment. Users interact with TCM to create, modify, and observe pipelines.
\ No newline at end of file
+The centralized management plane for pipeline orchestration, configuration, monitoring, and deployment. Users interact with TCM to create, modify, and observe pipelines.
+
+
+## QPS
+
+Queries Per Second. In TapData, QPS is the average number of change events a sync task processes per second, which shows how fast data is replicated from the source to the target.
+
+## Incremental Validation
+
+While a task is running, TapData compares randomly sampled target rows with the source to confirm that they match. The check runs for as long as the sync is active. See [Incremental Data Check](../data-replication/incremental-check.md).
+
+## API Server
+
+TapData’s built-in data publishing layer. It exposes a table as a [RESTful API endpoint](../publish-apis/README.md), so teams can share clean, governed data with mobile apps, third-party systems, or any HTTP client.
\ No newline at end of file
diff --git a/docs/platform-ops/operation.md b/docs/platform-ops/operation.md
index 6ec20d50..992770e2 100644
--- a/docs/platform-ops/operation.md
+++ b/docs/platform-ops/operation.md
@@ -377,4 +377,60 @@ a data replication task is used for scenarios that only synchronize incremental
 * [Data Services](../publish-apis/README.md)
     * Deleting or taking an API offline will render it unavailable.
 * [System Management](../system-admin/other-settings/system-settings.md)
-    * When [managing a cluster](../system-admin/manage-cluster.md), only perform close or restart operations on related services when they are experiencing anomalies.
\ No newline at end of file
+    * When [managing a cluster](../system-admin/manage-cluster.md), stop or restart the related services only when they are behaving abnormally.
+
+## How to run a TapData health check
+
+Use this checklist to confirm TapData is running normally.
+
+1. Log in to TapData.
+
+2. In the left menu, choose **System Management > Cluster Management** and verify the [component status](../system-admin/manage-cluster.md):
+   - TapData Manager, Engine, and API Server are all **Running**.
+   - CPU and memory usage are below 70%.
+
+3. Open **Data Replication** or **Data Transformation** and scan the task list:
+   - Every task should show **Running**.
+   - Click a task name and check its [metrics](../data-replication/monitor-task.md): lag is within an acceptable range, and QPS is above 0 while the source is producing changes.
+
+   If a task is unhealthy:
+   - **Read the error log** at the bottom of the monitoring page and follow its suggestions. See [Troubleshooting](../platform-ops/troubleshooting/README.md).
+   - **Test the connection**: open **Connections**, click **Test** for the related source or target, and fix any authentication or network issues.
+   - **Check incremental lag**: if QPS stays unusually high for more than 30 minutes, the source may be in a batch-processing window; consider scaling the task. If the target receives no changes, verify the CDC prerequisites (for example, MySQL `binlog_format=ROW`). Primary-key conflicts in the log usually indicate a configuration change.
+
+Still stuck? [Contact support](../appendix/support.md).
+
+
+## How to handle TapData alerts
+
+TapData sends alerts by [email](../case-practices/best-practice/alert-via-qqmail.md). Use the alert’s subject line to find the matching playbook below.
+
+**Task-state alerts**
+
+| Alert | What it means | What to do |
+| --- | --- | --- |
+| **Task error** | The task stopped on an error; replication is down. | Open the task, check **Logs**, fix the issue, and restart the task. Escalate if the error persists. |
+| **Full load finished** | The full (initial) data copy is complete. | Informational. Run a data validation task if you need to verify consistency. |
+| **Incremental started** | The task is now streaming changes. | Informational. |
+| **Task stopped** | A user stopped the task. | Restart it if the stop was accidental. |
+
+**Replication-lag alert**
+
+Replication lag has exceeded the threshold you configured. Open the task monitoring page and look for:
+
+- **Slow source reads** – “Read time” is high; ask the DBA to check source load and network.
+- **Slow target writes / high QPS** – raise “Incremental read size” (up to 1,000) and “Batch write size” (up to 10,000), and keep Agent memory usage below 70%.
+- **False lag** – QPS is 0 but lag keeps climbing, typically because the source has no new changes; enable the [heartbeat table](../case-practices/best-practice/heart-beat-task.md) on the source.
+- **Slow engine** – “Process time” keeps rising; optimise the JS code or open a ticket.
+
+**Validation & performance alerts**
+
+| Alert | What it means | What to do |
+| --- | --- | --- |
+| **Validation diff** | Incremental validation found mismatched data. | If auto-repair is enabled, no action is needed. Otherwise, open the task and click **Repair**. |
+| **Data-source node slow** | Source or target latency is high. | If a replication-lag alert also fired, follow the matching lag playbook above (slow source reads or slow target writes); otherwise, keep monitoring and involve the DBA if lag appears. |
+| **Process node slow** | A JS processing node is the bottleneck. | Optimise the node logic, and open a ticket if lag follows. |
+| **Validation job error** | The validation task failed. | Replication is not affected; restart the validation task, and escalate if it keeps failing. |
+| **Count diff limit exceeded** | Row counts between source and target don’t match. | **Full-sync task**: switch to full-field compare to pinpoint rows. **Incremental task**: wait 1–2 lag cycles and re-validate; repair if the gap remains. |
+| **Field diff limit exceeded** | Field-level values between source and target don’t match. | Same as **Count diff limit exceeded**. |
+| **Task retry limit** | The task reached its retry limit and still failed. | Open the task and follow the error message; escalate if you can’t resolve it. |
\ No newline at end of file
diff --git a/docs/publish-apis/query/query-via-restful.md b/docs/publish-apis/query/query-via-restful.md
index e843c35b..fe1a816f 100644
--- a/docs/publish-apis/query/query-via-restful.md
+++ b/docs/publish-apis/query/query-via-restful.md
@@ -57,3 +57,23 @@ If you'd prefer to use an external tool or automate API testing, [Postman](https
 6. Click **Send**. You’ll get a real-time response from the API.
 
    ![Query Result](../../images/restful_api_query_result.png)
+
+
+## Common response codes
+
+| HTTP status | Message | Meaning |
+| --- | --- | --- |
+| 200 | OK | Request succeeded |
+| 401 | Unauthorized error: token expired | The access token has expired; generate a new token and retry |
+| 404 | Not Found error: endpoint not found | The API does not exist or has not finished publishing; check the URL or wait until publishing completes |
+| 429 | Rate limit exceeded. Maximum \${api limit} requests per second allowed | The request rate exceeded the API’s per-second limit; retry later or raise the limit in the API settings |
+
+## FAQ
+
+* Q: The API takes too long to return data or times out
+
+  A: Add indexes on the columns used in `WHERE` filters, `ORDER BY` clauses, and joins. If the delay persists, enable response caching or increase the query timeout in the API settings.
+
+* Q: The response payload doesn’t contain the data I expect
+
+  A: Check the data source model and the underlying table. Make sure the data is up to date and that any field-merging logic behaves as intended.
\ No newline at end of file