diff --git a/docs/home/clickhouse/files/clickhouse-new-database.yaml b/docs/home/clickhouse/files/clickhouse-new-database.yaml
new file mode 100644
index 00000000..d2118603
--- /dev/null
+++ b/docs/home/clickhouse/files/clickhouse-new-database.yaml
@@ -0,0 +1,14 @@
+core:
+  files:
+    "/var/lib/genestack/properties/application.yaml":
+      backend:
+        clickhouse:
+          main:
+            url: "jdbc:clickhouse://{{ include \"odm.clickhouseHosts\" (dict \"port\" 8123 \"global\" $) }}/genestack_new?socket_timeout=1800000&dataTransferTimeout=1800000&maxQuerySize=20971520&createDatabaseIfNotExist=true&load_balancing_policy=roundRobin"
+applications:
+  files:
+    "/var/lib/genestack/properties/application.yaml":
+      frontend:
+        clickhouse:
+          main:
+            url: "jdbc:clickhouse://{{ include \"odm.clickhouseHosts\" (dict \"port\" 8123 \"global\" $) }}/genestack_new?socket_timeout=1800000&dataTransferTimeout=1800000&maxQuerySize=20971520&createDatabaseIfNotExist=true&load_balancing_policy=roundRobin"
diff --git a/docs/home/clickhouse/rebalancing.md b/docs/home/clickhouse/rebalancing.md
new file mode 100644
index 00000000..e9674083
--- /dev/null
+++ b/docs/home/clickhouse/rebalancing.md
@@ -0,0 +1,95 @@
+# ClickHouse Rebalancing
+
+Rebalancing shards in ClickHouse is primarily a manual process because of inherent [limitations](https://clickhouse.com/docs/en/guides/sre/scaling-clusters) in ClickHouse. To simplify this process, we have developed a tool to assist with shard rebalancing.
+
+## Prerequisites
+
+- Ensure there are no running ODM tasks. Wait for all tasks to complete before proceeding. This step is crucial to maintain data consistency in ClickHouse.
+- Ensure that there is enough free space in the ClickHouse cluster: the data is cloned into a new database and distributed equally across the nodes.
+- Make sure the ODM version is 1.60 or higher.
+- Make sure the `clickhouse-helper` version is higher than 0.30.0.
+
+## Just to be sure
+
+You can run the [sanity check](../troubleshooting/sanity-check.md) to double-check that the data in ODM is consistent.
+
+## Steps for Rebalancing
+
+### 1. Enable ClickHouse Read-Only Mode in ODM
+
+Set ODM to read-only mode to prevent any write operations during the rebalancing process. This does not affect schema migrations.
+
+```shell
+export ODM_CORE_URL=http://:
+docker run \
+  --env ODM_CORE_URL=${ODM_CORE_URL} \
+  091468197733.dkr.ecr.us-east-1.amazonaws.com/genestack/clickhouse-helper \
+  odm readonly --set-value=true
+```
+
+### 2. Redeploy Services with the New ClickHouse Database
+
+Update your Helm values to point to the new ClickHouse database and redeploy the `core` and `applications` services.
+
+#### a) Update Helm Values
+
+Refer to the example values file patch for guidance: [clickhouse-new-database.yaml](files/clickhouse-new-database.yaml). Use the `genestack_new` database name.
+
+#### b) Perform Helm Upgrade
+
+Run the following command to apply the changes:
+
+```shell
+helm upgrade -f values.yaml
+```
+
+### 3. Clone Data to the New Database
+
+Use the `clickhouse-helper` tool to copy data from the old database to the new one. Both `CH_SOURCE_URL` and `CH_DESTINATION_URL` accept multiple nodes separated by commas (`,`), for example, `localhost:9000,localhost:19000`. **It is recommended to include all nodes in the cluster**.
+
+Follow these steps:
+
+1. Set the source and destination ClickHouse server URLs:
+
+   ```shell
+   export CH_SOURCE_URL=:
+   export CH_DESTINATION_URL=:
+   ```
+
+2. Set the source and destination database names:
+
+   ```shell
+   export CH_SOURCE_DATABASE=genestack
+   export CH_DESTINATION_DATABASE=genestack_new
+   ```
+
+3. Run `clickhouse-helper` to clone the data:
+
+   ```shell
+   docker run \
+     --env CH_SOURCE_URL=${CH_SOURCE_URL} \
+     --env CH_DESTINATION_URL=${CH_DESTINATION_URL} \
+     --env CH_SOURCE_DATABASE=${CH_SOURCE_DATABASE} \
+     --env CH_DESTINATION_DATABASE=${CH_DESTINATION_DATABASE} \
+     091468197733.dkr.ecr.us-east-1.amazonaws.com/genestack/clickhouse-helper \
+     ch clone
+   ```
+
+### 4. Disable ClickHouse Read-Only Mode in ODM
+
+Once the data cloning is complete, re-enable write operations in ODM.
+
+```shell
+export ODM_CORE_URL=http://:
+docker run \
+  --env ODM_CORE_URL=${ODM_CORE_URL} \
+  091468197733.dkr.ecr.us-east-1.amazonaws.com/genestack/clickhouse-helper \
+  odm readonly --set-value=false
+```
+
+## Notes
+
+- Ensure all steps are followed in sequence to avoid data inconsistencies.
+- The `clickhouse-helper` tool is essential for simplifying the rebalancing process.
+- Remember to delete the old database from ClickHouse after the rebalancing process is complete.
+  This can be done with the `clickhouse-client` command-line tool.
diff --git a/mkdocs.yml b/mkdocs.yml
index f1cb6c28..5f093781 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -21,6 +21,8 @@ nav:
       - Microsoft Azure: home/single-sign-on/scim/azure.md
   - Helm:
     - How to deploy: home/helm/how-to-deploy.md
+  - ClickHouse:
+    - Rebalancing: home/clickhouse/rebalancing.md
   - Troubleshooting:
     - AWS S3: home/troubleshooting/aws-s3.md
     - Azure SSO: home/troubleshooting/azure-sso.md
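
The clean-up mentioned in the Notes of `rebalancing.md` (deleting the old database with `clickhouse-client`) could look roughly like the sketch below. It assumes the old database is named `genestack`, as in step 3. `CH_HOST` and `DRY_RUN` are hypothetical variable names introduced only for this example and are not part of `clickhouse-helper`. By default the function only prints the command, so it can be reviewed before anything is dropped.

```shell
#!/bin/sh
# Sketch only: build (and optionally run) the clean-up statement for the old database.
# CH_HOST and DRY_RUN are hypothetical names used for this illustration.

drop_old_database() {
    db="$1"                        # database to remove, e.g. genestack
    host="${CH_HOST:-localhost}"   # ClickHouse node to connect to
    sql="DROP DATABASE IF EXISTS ${db}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: print the command instead of executing it.
        echo "clickhouse-client --host ${host} --query \"${sql}\""
    else
        clickhouse-client --host "${host}" --query "${sql}"
    fi
}

# Dry run against the default host; set DRY_RUN=0 to actually drop the database.
drop_old_database genestack
```

On a multi-node cluster the statement may need to be run on every node, or with an `ON CLUSTER` clause, depending on how the databases were created. Run it only after confirming that ODM works against `genestack_new`.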