From 402046154e247384d718cd738de2bc7778d5ad40 Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Tue, 28 Jan 2025 16:26:42 +0100
Subject: [PATCH 01/11] [ODM-12343] Add draft for clickhouse rebalancing documentation

---
 docs/home/clickhouse/rebalancing.md | 38 +++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 docs/home/clickhouse/rebalancing.md

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
new file mode 100644
index 00000000..01a8f9a0
--- /dev/null
+++ b/docs/home/clickhouse/rebalancing.md
@@ -0,0 +1,38 @@
+# Clickhouse rebalancing
+
+Clickhouse rebalancing is mostly manual process, due to clickhouse limitations.
+Becasuse of this we developed a tool that will help you to make shard rebalancing easier.
+
+## Sequence of actions
+
+1) Make sure there is no ODM tasks in running state, wait until all of them finished. It's an important step to keep data consistent in Clickhouse.
+
+2) Configure ODM to be in clickhouse read-only mode:
+
+    ```shell
+    export ODM_CORE_URL=:
+    clickhouse-helper odm readonly --set-value=true
+    ```
+
+    Note: Read-only mode doesn't affect schema migration.
+
+3) Redeploy `core` and `applications` services with new database in clickhouse.
+    1) Update required values in helm values.
+    2) Run helm upgrade.
+
+4) Clone data from the previous database to the new one.
+
+    ```shell
+    export CH_SOURCE_URL=:
+    export CH_DESTINATION_URL=:
+    export CH_SOURCE_DATABASE=genestack
+    export CH_DESTINATION_DATABASE=genestack_new
+    clickhouse-helper ch clone
+    ```
+
+5) Configure ODM to be in clickhouse read-write mode:
+
+    ```shell
+    export ODM_CORE_URL=:
+    clickhouse-helper odm readonly --set-value=false
+    ```

From 9215f9049a9c87d5ba63e71bfa5ddaa921352d51 Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 14:01:53 +0100
Subject: [PATCH 02/11] [ODM-12343] Add clickhouse-new-database.yaml file

---
 .../clickhouse/files/clickhouse-new-database.yaml | 14 ++++++++++++++
 docs/home/clickhouse/rebalancing.md               |  4 ++--
 2 files changed, 16 insertions(+), 2 deletions(-)
 create mode 100644 docs/home/clickhouse/files/clickhouse-new-database.yaml

diff --git a/docs/home/clickhouse/files/clickhouse-new-database.yaml b/docs/home/clickhouse/files/clickhouse-new-database.yaml
new file mode 100644
index 00000000..d2118603
--- /dev/null
+++ b/docs/home/clickhouse/files/clickhouse-new-database.yaml
@@ -0,0 +1,14 @@
+core:
+  files:
+    "/var/lib/genestack/properties/application.yaml":
+      backend:
+        clickhouse:
+          main:
+            url: "jdbc:clickhouse://{{ include \"odm.clickhouseHosts\" (dict \"port\" 8123 \"global\" $) }}/genestack_new?socket_timeout=1800000&dataTransferTimeout=1800000&maxQuerySize=20971520&createDatabaseIfNotExist=true&load_balancing_policy=roundRobin"
+applications:
+  files:
+    "/var/lib/genestack/properties/application.yaml":
+      frontend:
+        clickhouse:
+          main:
+            url: "jdbc:clickhouse://{{ include \"odm.clickhouseHosts\" (dict \"port\" 8123 \"global\" $) }}/genestack_new?socket_timeout=1800000&dataTransferTimeout=1800000&maxQuerySize=20971520&createDatabaseIfNotExist=true&load_balancing_policy=roundRobin"
diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index 01a8f9a0..dfcc0fe9 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -10,14 +10,14 @@ Becasuse of this we developed a tool that will help you to make shard rebalancin
 2) Configure ODM to be in clickhouse read-only mode:
 
     ```shell
-    export ODM_CORE_URL=:
+    export ODM_CORE_URL=http://:
     clickhouse-helper odm readonly --set-value=true
     ```
 
     Note: Read-only mode doesn't affect schema migration.
 
 3) Redeploy `core` and `applications` services with new database in clickhouse.
-    1) Update required values in helm values.
+    1) Update required values in helm values. View the values file patch [example](files/clickhouse-new-database.yaml) using `genestack_new` database name.
     2) Run helm upgrade.
 
 4) Clone data from the previous database to the new one.

From 1a7b78e0691633a878a55de6ca43cc10814364cd Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 16:38:45 +0100
Subject: [PATCH 03/11] [ODM-12343] Fix it

---
 docs/home/clickhouse/rebalancing.md | 16 ++++++++--------
 mkdocs.yml                          |  2 ++
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index dfcc0fe9..01d6fd3b 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -5,9 +5,9 @@ Becasuse of this we developed a tool that will help you to make shard rebalancin
 
 ## Sequence of actions
 
-1) Make sure there is no ODM tasks in running state, wait until all of them finished. It's an important step to keep data consistent in Clickhouse.
+1. Make sure there is no ODM tasks in running state, wait until all of them finished. It's an important step to keep data consistent in Clickhouse.
 
-2) Configure ODM to be in clickhouse read-only mode:
+2. Configure ODM to be in clickhouse read-only mode:
 
     ```shell
     export ODM_CORE_URL=http://:
@@ -16,11 +16,11 @@ Becasuse of this we developed a tool that will help you to make shard rebalancin
 
     Note: Read-only mode doesn't affect schema migration.
 
-3) Redeploy `core` and `applications` services with new database in clickhouse.
-    1) Update required values in helm values. View the values file patch [example](files/clickhouse-new-database.yaml) using `genestack_new` database name.
-    2) Run helm upgrade.
+3. Redeploy `core` and `applications` services with new database in clickhouse.
+    a) Update required values in helm values. View the values file patch [example](files/clickhouse-new-database.yaml) using `genestack_new` database name.
+    b) Run helm upgrade.
 
-4) Clone data from the previous database to the new one.
+4. Clone data from the previous database to the new one.
 
     ```shell
     export CH_SOURCE_URL=:
@@ -28,9 +28,9 @@ Becasuse of this we developed a tool that will help you to make shard rebalancin
     export CH_SOURCE_DATABASE=genestack
     export CH_DESTINATION_DATABASE=genestack_new
     clickhouse-helper ch clone
-    ```
+    ```
 
-5) Configure ODM to be in clickhouse read-write mode:
+5. Configure ODM to be in clickhouse read-write mode:
 
     ```shell
     export ODM_CORE_URL=:
diff --git a/mkdocs.yml b/mkdocs.yml
index f1cb6c28..5f093781 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -21,6 +21,8 @@ nav:
       - Microsoft Azure: home/single-sign-on/scim/azure.md
   - Helm:
     - How to deploy: home/helm/how-to-deploy.md
+  - Clickhouse:
+    - Rebalancing: home/clickhouse/rebalancing.md
   - Troubleshooting:
     - AWS S3: home/troubleshooting/aws-s3.md
     - Azure SSO: home/troubleshooting/azure-sso.md

From c2e18c8e6dee21e5b23528df6a371746462e2794 Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 16:42:51 +0100
Subject: [PATCH 04/11] [ODM-12343] Enhance doc

---
 docs/home/clickhouse/rebalancing.md | 76 +++++++++++++++++++----------
 1 file changed, 49 insertions(+), 27 deletions(-)

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index 01d6fd3b..f43ca49b 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -1,38 +1,60 @@
-# Clickhouse rebalancing
+# ClickHouse Rebalancing
 
-Clickhouse rebalancing is mostly manual process, due to clickhouse limitations.
-Becasuse of this we developed a tool that will help you to make shard rebalancing easier.
+Rebalancing shards in ClickHouse is primarily a manual process due to inherent [limitations](https://clickhouse.com/docs/en/guides/sre/scaling-clusters) in ClickHouse. To simplify this process, we have developed a tool to assist with shard rebalancing.
 
-## Sequence of actions
+## Prerequisites
 
-1. Make sure there is no ODM tasks in running state, wait until all of them finished. It's an important step to keep data consistent in Clickhouse.
+- Ensure there are no running ODM tasks. Wait for all tasks to complete before proceeding. This step is crucial to maintain data consistency in ClickHouse.
 
-2. Configure ODM to be in clickhouse read-only mode:
+## Steps for Rebalancing
 
-    ```shell
-    export ODM_CORE_URL=http://:
-    clickhouse-helper odm readonly --set-value=true
-    ```
+### 1. Enable ClickHouse Read-Only Mode in ODM
 
-    Note: Read-only mode doesn't affect schema migration.
+Set ODM to read-only mode to prevent any write operations during the rebalancing process. This does not affect schema migrations.
 
-3. Redeploy `core` and `applications` services with new database in clickhouse.
-    a) Update required values in helm values. View the values file patch [example](files/clickhouse-new-database.yaml) using `genestack_new` database name.
-    b) Run helm upgrade.
+```shell
+export ODM_CORE_URL=http://:
+clickhouse-helper odm readonly --set-value=true
+```
 
-4. Clone data from the previous database to the new one.
+### 2. Redeploy Services with the New ClickHouse Database
 
-    ```shell
-    export CH_SOURCE_URL=:
-    export CH_DESTINATION_URL=:
-    export CH_SOURCE_DATABASE=genestack
-    export CH_DESTINATION_DATABASE=genestack_new
-    clickhouse-helper ch clone
-    ```
+Update your Helm values to point to the new ClickHouse database and redeploy the `core` and `applications` services.
 
-5. Configure ODM to be in clickhouse read-write mode:
+#### a) Update Helm Values
 
-    ```shell
-    export ODM_CORE_URL=:
-    clickhouse-helper odm readonly --set-value=false
-    ```
+Refer to the example values file patch for guidance: [clickhouse-new-database.yaml](files/clickhouse-new-database.yaml). Use the `genestack_new` database name.
+
+#### b) Perform Helm Upgrade
+
+Run the following command to apply the changes:
+
+```shell
+helm upgrade -f values.yaml
+```
+
+### 3. Clone Data to the New Database
+
+Copy data from the old database to the new one using the `clickhouse-helper` tool.
+
+```shell
+export CH_SOURCE_URL=:
+export CH_DESTINATION_URL=:
+export CH_SOURCE_DATABASE=genestack
+export CH_DESTINATION_DATABASE=genestack_new
+clickhouse-helper ch clone
+```
+
+### 4. Disable ClickHouse Read-Only Mode in ODM
+
+Once the data cloning is complete, re-enable write operations in ODM.
+
+```shell
+export ODM_CORE_URL=:
+clickhouse-helper odm readonly --set-value=false
+```
+
+## Notes
+
+- Ensure all steps are followed in sequence to avoid data inconsistencies.
+- The `clickhouse-helper` tool is essential for simplifying the rebalancing process.

From 2237669cf4fdb11cd47941df565ee25d9d0c3513 Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 16:45:14 +0100
Subject: [PATCH 05/11] [ODM-12343] Add versions in doc

---
 docs/home/clickhouse/rebalancing.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index f43ca49b..53f6807e 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -5,6 +5,8 @@ Rebalancing shards in ClickHouse is primarily a manual process due to inherent [
 ## Prerequisites
 
 - Ensure there are no running ODM tasks. Wait for all tasks to complete before proceeding. This step is crucial to maintain data consistency in ClickHouse.
+- Make sure ODM version is 1.60 or higher.
+- Make sure `clickhouse-helper` version is higher than 0.30.0.
 
 ## Steps for Rebalancing
 

From dd3a50cd3238b4c0a7e0947f3bc088396fd627ee Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 17:24:21 +0100
Subject: [PATCH 06/11] [ODM-12343] Update Clone Data to the New Database

---
 docs/home/clickhouse/rebalancing.md | 32 +++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index 53f6807e..211384bc 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -16,7 +16,7 @@ Set ODM to read-only mode to prevent any write operations during the rebalancing
 
 ```shell
 export ODM_CORE_URL=http://:
-clickhouse-helper odm readonly --set-value=true
+docker run clickhouse-helper odm readonly --set-value=true
 ```
 
 ### 2. Redeploy Services with the New ClickHouse Database
@@ -37,15 +37,29 @@ helm upgrade -f values.yaml
 
 ### 3. Clone Data to the New Database
 
-Copy data from the old database to the new one using the `clickhouse-helper` tool.
+Use the `clickhouse-helper` tool to copy data from the old database to the new one. Both `CH_SOURCE_URL` and `CH_DESTINATION_URL` can accept multiple nodes separated by a comma (`,`), for example, `localhost:9000,localhost:19000`. It is recommended to include all nodes in the cluster.
 
-```shell
-export CH_SOURCE_URL=:
-export CH_DESTINATION_URL=:
-export CH_SOURCE_DATABASE=genestack
-export CH_DESTINATION_DATABASE=genestack_new
-clickhouse-helper ch clone
-```
+Follow these steps:
+
+1. Set the source and destination ClickHouse server URLs:
+
+    ```shell
+    export CH_SOURCE_URL=:
+    export CH_DESTINATION_URL=:
+    ```
+
+2. Set the source and destination database names:
+
+    ```shell
+    export CH_SOURCE_DATABASE=genestack
+    export CH_DESTINATION_DATABASE=genestack_new
+    ```
+
+3. Run the `clickhouse-helper` to clone the data:
+
+    ```shell
+    clickhouse-helper ch clone
+    ```
 
 ### 4. Disable ClickHouse Read-Only Mode in ODM
 

From 5e2efccd0997349f8b949892d4a06f42b0d3661c Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 17:37:02 +0100
Subject: [PATCH 07/11] [ODM-12343] Update doc after review

---
 docs/home/clickhouse/rebalancing.md | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index 211384bc..bdaf8554 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -5,9 +5,14 @@ Rebalancing shards in ClickHouse is primarily a manual process due to inherent [
 ## Prerequisites
 
 - Ensure there are no running ODM tasks. Wait for all tasks to complete before proceeding. This step is crucial to maintain data consistency in ClickHouse.
+- Make sure that you have enought free space in clickhouse cluster, all rebalanced data should be disctibuted equally between nodes.
 - Make sure ODM version is 1.60 or higher.
 - Make sure `clickhouse-helper` version is higher than 0.30.0.
 
+## Just to be sure
+
+You can use [sanity check](../troubleshooting/sanity-check.md) just to doublecheck that data is consistent in ODM.
+
 ## Steps for Rebalancing
 
 ### 1. Enable ClickHouse Read-Only Mode in ODM
@@ -16,7 +21,10 @@ Set ODM to read-only mode to prevent any write operations during the rebalancing
 
 ```shell
 export ODM_CORE_URL=http://:
-docker run clickhouse-helper odm readonly --set-value=true
+docker run \
+    --env ODM_CORE_URL=${ODM_CORE_URL} \
+    091468197733.dkr.ecr.us-east-1.amazonaws.com/genestack/clickhouse-helper \
+    odm readonly --set-value=true
 ```
 
 ### 2. Redeploy Services with the New ClickHouse Database
@@ -58,7 +66,13 @@ Follow these steps:
 3. Run the `clickhouse-helper` to clone the data:
 
     ```shell
-    clickhouse-helper ch clone
+    docker run \
+        --env CH_SOURCE_URL=${CH_SOURCE_URL} \
+        --env CH_DESTINATION_URL=${CH_DESTINATION_URL} \
+        --env CH_SOURCE_DATABASE=${CH_SOURCE_DATABASE} \
+        --env CH_DESTINATION_DATABASE=${CH_DESTINATION_DATABASE} \
+        091468197733.dkr.ecr.us-east-1.amazonaws.com/genestack/clickhouse-helper \
+        ch clone
     ```
 
 ### 4. Disable ClickHouse Read-Only Mode in ODM
@@ -66,11 +80,15 @@ Follow these steps:
 Once the data cloning is complete, re-enable write operations in ODM.
 
 ```shell
-export ODM_CORE_URL=:
-clickhouse-helper odm readonly --set-value=false
+export ODM_CORE_URL=http://:
+docker run \
+    --env ODM_CORE_URL=${ODM_CORE_URL} \
+    091468197733.dkr.ecr.us-east-1.amazonaws.com/genestack/clickhouse-helper \
+    odm readonly --set-value=false
 ```
 
 ## Notes
 
 - Ensure all steps are followed in sequence to avoid data inconsistencies.
 - The `clickhouse-helper` tool is essential for simplifying the rebalancing process.
+- Remember to delete the old database from ClickHouse after the rebalancing process is complete.

From ccace6236400e435736c5bb5f45e95bcfba9a93a Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 17:38:05 +0100
Subject: [PATCH 08/11] [ODM-12343] Update Prerequisites

---
 docs/home/clickhouse/rebalancing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index bdaf8554..8e34480e 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -5,7 +5,7 @@ Rebalancing shards in ClickHouse is primarily a manual process due to inherent [
 ## Prerequisites
 
 - Ensure there are no running ODM tasks. Wait for all tasks to complete before proceeding. This step is crucial to maintain data consistency in ClickHouse.
-- Make sure that you have enought free space in clickhouse cluster, all rebalanced data should be disctibuted equally between nodes.
+- Ensure that there is enough free space in the ClickHouse cluster. All rebalanced data should be distributed equally across the nodes.
 - Make sure ODM version is 1.60 or higher.
 - Make sure `clickhouse-helper` version is higher than 0.30.0.
 

From ef25a1c83a78e827d4356a967447d68241549b7c Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 17:42:22 +0100
Subject: [PATCH 09/11] [ODM-12343] Make recommendation bold

---
 docs/home/clickhouse/rebalancing.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index 8e34480e..57519bea 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -45,7 +45,7 @@ helm upgrade -f values.yaml
 
 ### 3. Clone Data to the New Database
 
-Use the `clickhouse-helper` tool to copy data from the old database to the new one. Both `CH_SOURCE_URL` and `CH_DESTINATION_URL` can accept multiple nodes separated by a comma (`,`), for example, `localhost:9000,localhost:19000`. It is recommended to include all nodes in the cluster.
+Use the `clickhouse-helper` tool to copy data from the old database to the new one. Both `CH_SOURCE_URL` and `CH_DESTINATION_URL` can accept multiple nodes separated by a comma (`,`), for example, `localhost:9000,localhost:19000`. **It is recommended to include all nodes in the cluster**.
 
 Follow these steps:
 

From 819f65ff4eaeabc0401d9ba6ce0a2b0f397edb97 Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 18:00:58 +0100
Subject: [PATCH 10/11] [ODM-12343] Add clickhouse client command

---
 docs/home/clickhouse/rebalancing.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index 57519bea..363c4f5a 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -92,3 +92,7 @@ docker run \
 - Ensure all steps are followed in sequence to avoid data inconsistencies.
 - The `clickhouse-helper` tool is essential for simplifying the rebalancing process.
 - Remember to delete the old database from ClickHouse after the rebalancing process is complete.
+
+    ```shell
+    clickhouse-client --host --port -q "DROP DATABASE genestack"
+    ```

From 426ddaa011d8dc5c445ae3d11f3590e3d0e2a049 Mon Sep 17 00:00:00 2001
From: Oleg Kunitsyn
Date: Thu, 30 Jan 2025 18:07:15 +0100
Subject: [PATCH 11/11] [ODM-12343] Remove exact query

---
 docs/home/clickhouse/rebalancing.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
index 363c4f5a..e9674083 100644
--- a/docs/home/clickhouse/rebalancing.md
+++ b/docs/home/clickhouse/rebalancing.md
@@ -92,7 +92,4 @@ docker run \
 - Ensure all steps are followed in sequence to avoid data inconsistencies.
 - The `clickhouse-helper` tool is essential for simplifying the rebalancing process.
 - Remember to delete the old database from ClickHouse after the rebalancing process is complete.
-
-    ```shell
-    clickhouse-client --host --port -q "DROP DATABASE genestack"
-    ```
+  It can be done with `clickhouse-client` command-line tool.
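
Review note on the version prerequisites added in PATCH 05 (ODM 1.60 or higher, `clickhouse-helper` higher than 0.30.0): the patches do not show how either version is obtained, so the sketch below only illustrates the comparison itself. The function names `version_ge`, `version_gt`, and `check_prerequisites` are hypothetical and not part of `clickhouse-helper`; the sketch also assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the version prerequisites; not part of clickhouse-helper.
# Assumption: `sort -V` (version sort) is available.

# version_ge A B: succeeds when version A is greater than or equal to version B.
version_ge() {
    # The smaller of the two versions sorts first; A >= B when that is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# version_gt A B: succeeds when version A is strictly greater than version B.
version_gt() {
    [ "$1" != "$2" ] && version_ge "$1" "$2"
}

# check_prerequisites ODM_VERSION HELPER_VERSION
# Returns non-zero and prints a message when a documented minimum is not met.
check_prerequisites() {
    version_ge "$1" "1.60" || { echo "ODM $1 is older than 1.60" >&2; return 1; }
    version_gt "$2" "0.30.0" || { echo "clickhouse-helper $2 is not newer than 0.30.0" >&2; return 1; }
}
```

Calling `check_prerequisites 1.61 0.31.0` before enabling read-only mode would succeed, while `check_prerequisites 1.59 0.31.0` would fail with a message on stderr; the version strings here are placeholders to be supplied by the operator.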