diff --git a/doc/integration/clickhouse.md b/doc/integration/clickhouse.md index b9cbf346fb097dedcee9060c298f99ac9cf22555..538407877cd43ecb85748d9f68f50eb9d7814395 100644 --- a/doc/integration/clickhouse.md +++ b/doc/integration/clickhouse.md @@ -2,8 +2,7 @@ stage: Analytics group: Platform Insights info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments -gitlab_dedicated: yes -title: ClickHouse integration guidelines +title: ClickHouse --- {{< details >}} @@ -14,16 +13,12 @@ title: ClickHouse integration guidelines {{< /details >}} -{{< alert type="note" >}} +ClickHouse is a secondary data store for GitLab that enables advanced analytical features such as GitLab Duo, SDLC trends, and CI Analytics. Only specific data is stored in ClickHouse for these analytics purposes. +{{< alert type="warning" >}} For more information on plans for ClickHouse support for GitLab Self-Managed, see [epic 51](https://gitlab.com/groups/gitlab-org/architecture/gitlab-data-analytics/-/epics/51). -{{< /alert >}} - -{{< alert type="note" >}} - For more information about ClickHouse support for GitLab Dedicated, see [ClickHouse for GitLab Dedicated](../subscriptions/gitlab_dedicated/_index.md#clickhouse-cloud). - {{< /alert >}} [ClickHouse](https://clickhouse.com) is an open-source column-oriented database management system. It can efficiently filter, aggregate, and query across large data sets. @@ -37,36 +32,67 @@ Alternatively, you can [bring your own ClickHouse](https://clickhouse.com/docs/e ## Supported ClickHouse versions | First GitLab version | ClickHouse versions | Comment | -|----------------------|---------------------|---------| -| 17.7.0 | 23.x (24.x, 25.x) | For using ClickHouse 24.x and 25.x see the [workaround section](#database-schema-migrations-on-gitlab-1800-and-earlier). | -| 18.1.0 | 23.x, 24.x, 25.x | | -| 18.5.0 | 23.x, 24.x, 25.x | Experimental support for `Replicated` database engine. | +|---------------------|---------------------|---------| +| 17.7.0 | 23.x (24.x, 25.x) | For using ClickHouse 24.x and 25.x see the [workaround section](#database-schema-migrations-on-gitlab-1800-and-earlier). | +| 18.1.0 | 23.x, 24.x, 25.x | | +| 18.5.0 | 23.x, 24.x, 25.x | Experimental support for Replicated database engine. | -{{< alert type="note" >}} +ClickHouse Cloud is supported. Compatibility is generally ensured with the latest major GitLab release and newer versions. -[ClickHouse Cloud](https://clickhouse.com/cloud) is supported. Compatibility is generally ensured with the latest major GitLab release and newer versions. +## Requirements -{{< /alert >}} +For detailed system requirements and sizing recommendations, see [issue 548450](https://gitlab.com/gitlab-org/gitlab/-/issues/548450). ## Set up ClickHouse -To set up ClickHouse with GitLab: +Choose your deployment type based on your operational requirements: -1. [Run ClickHouse Cluster and configure database](#run-and-configure-clickhouse). -1. [Configure GitLab connection to ClickHouse](#configure-the-gitlab-connection-to-clickhouse). -1. [Run ClickHouse migrations](#run-clickhouse-migrations). +- **[ClickHouse Cloud](#set-up-clickhouse-cloud)** (Recommended): Fully managed service with automatic upgrades, backups, and scaling. +- **[Self-managed ClickHouse (BYOC)](#set-up-self-managed-clickhouse-byoc)**: Complete control over your infrastructure and configuration. -### Run and configure ClickHouse +### Set up ClickHouse Cloud -When you run ClickHouse on a hosted server, various data points might impact the resource consumption, like the number -of builds that run on your instance each month, the selected hardware, the data center choice to host ClickHouse, and more. -Regardless, the cost should not be significant. +Prerequisites: -To create the necessary user and database objects: +- ClickHouse Cloud account +- Network connectivity from your GitLab instance to ClickHouse Cloud +- Administrator access to your GitLab instance + +To set up ClickHouse Cloud with GitLab: + +1. [Create and configure your ClickHouse Cloud service](#create-clickhouse-cloud-service). +1. [Create the GitLab database and user](#create-database-and-user-cloud). +1. [Configure the GitLab connection](#configure-gitlab-connection-cloud). +1. [Verify the connection](#verify-connection-cloud). +1. [Run ClickHouse migrations](#run-migrations-cloud). +1. [Enable ClickHouse for Analytics](#enable-clickhouse-for-analytics). + +#### Create ClickHouse Cloud service + +1. Sign in to [ClickHouse Cloud](https://clickhouse.cloud). +1. Select **New Service**. +1. Choose your service tier: + - **Development**: For testing and development environments. + - **Production**: For production workloads with high availability. +1. Select your cloud provider and region. Choose a region close to your GitLab instance for optimal performance. +1. Configure your service name and settings. +1. Select **Create Service**. +1. Once provisioned, note your connection details from the service dashboard: + - Host + - Port (usually `9440` for secure connections) + - Username + - Password -1. Generate a secure password and save it. -1. Sign in to the ClickHouse SQL console. -1. Execute the following command. Replace `PASSWORD_HERE` with the generated password. +{{< alert type="note" >}} +**Auto-upgrade capability**: ClickHouse Cloud automatically handles version upgrades and security patches. Enterprise plan customers can schedule upgrade windows to control when upgrades occur and avoid unexpected service interruptions during business hours. +{{< /alert >}} + +#### Create database and user (Cloud) + +1. In the ClickHouse Cloud console, select your service. +1. Select **SQL Console**. +1. Generate a secure password for the GitLab user and save it. +1. Execute the following SQL commands, replacing `PASSWORD_HERE` with your generated password: ```sql CREATE DATABASE gitlab_clickhouse_main_production; @@ -77,11 +103,11 @@ To create the necessary user and database objects: GRANT gitlab_app TO gitlab; ``` -### Configure the GitLab connection to ClickHouse +#### Configure GitLab connection (Cloud) -{{< tabs >}} +::Tabs -{{< tab title="Linux package" >}} +:::TabTitle Linux package To provide GitLab with ClickHouse credentials: @@ -89,7 +115,7 @@ To provide GitLab with ClickHouse credentials: ```ruby gitlab_rails['clickhouse_databases']['main']['database'] = 'gitlab_clickhouse_main_production' - gitlab_rails['clickhouse_databases']['main']['url'] = 'https://example.com/path' + gitlab_rails['clickhouse_databases']['main']['url'] = 'https://your-service.clickhouse.cloud:9440' gitlab_rails['clickhouse_databases']['main']['username'] = 'gitlab' gitlab_rails['clickhouse_databases']['main']['password'] = 'PASSWORD_HERE' # replace with the actual password ``` @@ -100,9 +126,7 @@ To provide GitLab with ClickHouse credentials: sudo gitlab-ctl reconfigure ``` -{{< /tab >}} - -{{< tab title="Helm chart (Kubernetes)" >}} +:::TabTitle Helm chart (Kubernetes) 1. Save the ClickHouse password as a Kubernetes Secret: @@ -123,12 +147,12 @@ To provide GitLab with ClickHouse credentials: clickhouse: enabled: true main: - username: default + username: gitlab password: secret: gitlab-clickhouse-password key: main_password database: gitlab_clickhouse_main_production - url: 'http://example.com' + url: 'https://your-service.clickhouse.cloud:9440' ``` 1. Save the file and apply the new values: @@ -137,86 +161,519 @@ To provide GitLab with ClickHouse credentials: helm upgrade -f gitlab_values.yaml gitlab gitlab/gitlab ``` -{{< /tab >}} +::EndTabs -{{< /tabs >}} +#### Verify connection (Cloud) To verify that your connection is set up successfully: -1. Sign in to [Rails console](../administration/operations/rails_console.md#starting-a-rails-console-session) +1. Sign in to the [Rails console](../administration/operations/rails_console.md#starting-a-rails-console-session). 1. Execute the following command: ```ruby ClickHouse::Client.select('SELECT 1', :main) ``` - If successful, the command returns `[{"1"=>1}]` + If successful, the command returns `[{"1"=>1}]`. -### Run ClickHouse migrations +If the connection fails, verify: -{{< tabs >}} +- ClickHouse Cloud service is running and accessible. +- Network connectivity from GitLab to ClickHouse Cloud. Check firewalls and security groups. +- Connection URL includes the correct host and port. +- Credentials are correct. -{{< tab title="Linux package" >}} +#### Run migrations (Cloud) -To create the required database objects execute: +::Tabs + +:::TabTitle Linux package + +To create the required database objects, execute: ```shell sudo gitlab-rake gitlab:clickhouse:migrate ``` -{{< /tab >}} +:::TabTitle Helm chart (Kubernetes) -{{< tab title="Helm chart (Kubernetes)" >}} +Migrations are executed automatically using the GitLab-Migrations chart. -Migrations are executed automatically using the [GitLab-Migrations chart](https://docs.gitlab.com/charts/charts/gitlab/migrations/#clickhouse-optional). - -Alternatively, you can run migrations by executing the following command in the [Toolbox pod](https://docs.gitlab.com/charts/charts/gitlab/toolbox/): +Alternatively, you can run migrations by executing the following command in the Toolbox pod: ```shell gitlab-rake gitlab:clickhouse:migrate ``` -{{< /tab >}} +::EndTabs -{{< /tabs >}} +### Set up self-managed ClickHouse (BYOC) -### Enable ClickHouse for Analytics +Prerequisites: + +- ClickHouse instance installed and running +- Compatible ClickHouse version. See [Supported ClickHouse versions](#supported-clickhouse-versions). +- Network connectivity from your GitLab instance to ClickHouse +- Administrator access to both ClickHouse and GitLab + +To set up self-managed ClickHouse with GitLab: + +1. [Verify ClickHouse installation](#verify-clickhouse-installation). +1. [Create the GitLab database and user](#create-database-and-user-byoc). +1. Optional. [Configure high availability](#configure-high-availability) (for HA deployments). +1. Optional. [Configure load balancer](#configure-load-balancer) (for HA deployments). +1. [Configure the GitLab connection](#configure-gitlab-connection-byoc). +1. [Verify the connection](#verify-connection-byoc). +1. [Run ClickHouse migrations](#run-migrations-byoc). +1. [Enable ClickHouse for Analytics](#enable-clickhouse-for-analytics). -Now that your GitLab instance is connected to ClickHouse, you can enable features to use ClickHouse by [enabling ClickHouse for Analytics](../administration/analytics.md). +{{< alert type="warning" >}} +**Manual upgrades required**: For self-managed ClickHouse, you are responsible for planning and executing version upgrades, security patches, and backups. See [Upgrade ClickHouse](#upgrade-clickhouse) for guidance. +{{< /alert >}} + +#### Verify ClickHouse installation + +Before configuring the database, verify ClickHouse is installed and accessible: + +1. Check ClickHouse is running: -## `Replicated` database engine + ```shell + clickhouse-client --query "SELECT version()" + ``` -{{< history >}} + Expected output: Version number (for example, `24.3.1.12`) -- [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/560927) as an experiment in GitLab 18.5. +1. Verify you can connect with credentials: -{{< /history >}} + ```shell + clickhouse-client --host your-clickhouse-host --port 9000 --user default --password 'your-password' + ``` -For a multi-node, high-availability setup, GitLab supports the `Replicated` table engine in ClickHouse. +If ClickHouse is not installed, see: + +- [ClickHouse official installation guide](https://clickhouse.com/docs/en/install) +- [ClickHouse recommendations for GitLab Self-Managed](https://clickhouse.com/docs/guides/sizing-and-hardware-recommendations) + +#### Create database and user (BYOC) + +To create the necessary user and database objects: + +1. Generate a secure password and save it. +1. Sign in to the ClickHouse SQL console or use `clickhouse-client`. +1. Execute the following commands, replacing `PASSWORD_HERE` with the generated password: + + ```sql + CREATE DATABASE gitlab_clickhouse_main_production; + CREATE USER gitlab IDENTIFIED WITH sha256_password BY 'PASSWORD_HERE'; + CREATE ROLE gitlab_app; + GRANT SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE ON gitlab_clickhouse_main_production.* TO gitlab_app; + GRANT SELECT ON information_schema.* TO gitlab_app; + GRANT gitlab_app TO gitlab; + ``` + +#### Configure high availability + +{{< alert type="note" >}} +**Status:** Experiment +**Introduced:** GitLab 18.5 +{{< /alert >}} + +For a multi-node, high-availability setup, GitLab supports the Replicated table engine in ClickHouse. Prerequisites: -- A cluster must be defined in the `remote_servers` [configuration section](https://clickhouse.com/docs/architecture/cluster-deployment#configure-clickhouse-servers). -- The following [macros](https://clickhouse.com/docs/architecture/cluster-deployment#macros-config-explanation) must be configured: +- ClickHouse cluster with multiple nodes (minimum 3 nodes recommended) +- A cluster must be defined in the `remote_servers` configuration section +- The following macros must be configured in your ClickHouse configuration: - `cluster` - `shard` - `replica` -When configuring the database, you must run the statements with the `ON CLUSTER` clause. -In the following example, replace `CLUSTER_NAME_HERE` with your cluster's name: +When configuring the database for HA, you must run the statements with the `ON CLUSTER` clause. In the following example, replace `CLUSTER_NAME_HERE` with your cluster's name: + +```sql +CREATE DATABASE gitlab_clickhouse_main_production ON CLUSTER CLUSTER_NAME_HERE ENGINE = Replicated('/clickhouse/databases/{cluster}/gitlab_clickhouse_main_production', '{shard}', '{replica}'); +CREATE USER gitlab IDENTIFIED WITH sha256_password BY 'PASSWORD_HERE' ON CLUSTER CLUSTER_NAME_HERE; +CREATE ROLE gitlab_app ON CLUSTER CLUSTER_NAME_HERE; +GRANT SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE ON gitlab_clickhouse_main_production.* TO gitlab_app ON CLUSTER CLUSTER_NAME_HERE; +GRANT SELECT ON information_schema.* TO gitlab_app ON CLUSTER CLUSTER_NAME_HERE; +GRANT gitlab_app TO gitlab ON CLUSTER CLUSTER_NAME_HERE; +``` + +For more information, see [ClickHouse Replicated database engine documentation](https://clickhouse.com/docs/en/engines/database-engines/replicated). + +#### Configure load balancer + +For HA deployments, configure a load balancer to distribute requests across ClickHouse nodes. + +The GitLab application communicates with the ClickHouse cluster through the HTTP/HTTPS interface. You should use an HTTP proxy or load balancer to distribute requests across cluster nodes. + +Recommended load balancer options: + +- [chproxy](https://www.chproxy.org/) - ClickHouse-specific HTTP proxy with built-in caching and routing +- HAProxy - General-purpose TCP/HTTP load balancer +- NGINX - Web server with load balancing capabilities +- Cloud provider load balancers (AWS Application Load Balancer, GCP Load Balancer, Azure Load Balancer) + +Basic chproxy configuration example: + +```yaml +server: + http: + listen_addr: ":8080" + +clusters: + - name: "clickhouse_cluster" + nodes: [ + "http://ch-node1:8123", + "http://ch-node2:8123", + "http://ch-node3:8123" + ] + +users: + - name: "gitlab" + password: "your_secure_password" + to_cluster: "clickhouse_cluster" + to_user: "gitlab" +``` + +When using a load balancer, configure GitLab to connect to the load balancer URL instead of individual ClickHouse nodes. + +For more information, see [chproxy documentation](https://www.chproxy.org/). + +#### Configure GitLab connection (BYOC) + +::Tabs + +:::TabTitle Linux package + +To provide GitLab with ClickHouse credentials: + +1. Edit `/etc/gitlab/gitlab.rb`: + + ```ruby + gitlab_rails['clickhouse_databases']['main']['database'] = 'gitlab_clickhouse_main_production' + gitlab_rails['clickhouse_databases']['main']['url'] = 'https://your-clickhouse-host:8443' # Use load balancer URL for HA deployments + gitlab_rails['clickhouse_databases']['main']['username'] = 'gitlab' + gitlab_rails['clickhouse_databases']['main']['password'] = 'PASSWORD_HERE' # replace with the actual password + ``` + +1. Save the file and reconfigure GitLab: + + ```shell + sudo gitlab-ctl reconfigure + ``` + +:::TabTitle Helm chart (Kubernetes) + +1. Save the ClickHouse password as a Kubernetes Secret: + + ```shell + kubectl create secret generic gitlab-clickhouse-password --from-literal="main_password=PASSWORD_HERE" + ``` + +1. Export the Helm values: + + ```shell + helm get values gitlab > gitlab_values.yaml + ``` + +1. Edit `gitlab_values.yaml`: + + ```yaml + global: + clickhouse: + enabled: true + main: + username: gitlab + password: + secret: gitlab-clickhouse-password + key: main_password + database: gitlab_clickhouse_main_production + url: 'https://your-clickhouse-host:8443' # Use load balancer URL for HA deployments + ``` + +1. Save the file and apply the new values: + + ```shell + helm upgrade -f gitlab_values.yaml gitlab gitlab/gitlab + ``` + +::EndTabs + +{{< alert type="note" >}} +**TLS/SSL configuration**: For production deployments, configure TLS/SSL on your ClickHouse instance and use `https://` URLs. See [ClickHouse TLS/SSL configuration](https://clickhouse.com/docs/guides/sre/configuring-ssl) for details. +{{< /alert >}} + +#### Verify connection (BYOC) + +To verify that your connection is set up successfully: + +1. Sign in to the [Rails console](../administration/operations/rails_console.md#starting-a-rails-console-session). +1. Execute the following command: + + ```ruby + ClickHouse::Client.select('SELECT 1', :main) + ``` + + If successful, the command returns `[{"1"=>1}]`. + +If the connection fails, verify: + +- ClickHouse service is running on all nodes. +- Network connectivity from GitLab to ClickHouse. Check firewalls and security groups. +- Connection URL is correct (host, port, protocol). +- Credentials are correct. +- For HA setups: Load balancer is properly configured and routing requests. + +#### Run migrations (BYOC) + +::Tabs + +:::TabTitle Linux package + +To create the required database objects, execute: + +```shell +sudo gitlab-rake gitlab:clickhouse:migrate +``` + +:::TabTitle Helm chart (Kubernetes) + +Migrations are executed automatically using the GitLab-Migrations chart. + +Alternatively, you can run migrations by executing the following command in the Toolbox pod: + +```shell +gitlab-rake gitlab:clickhouse:migrate +``` + +::EndTabs + +### Enable ClickHouse for Analytics + +After your GitLab instance is connected to ClickHouse, you can enable features that use ClickHouse: + +Prerequisites: + +- You must have administrator access to the instance. +- ClickHouse connection is configured and verified. +- Migrations have been successfully completed. + +To enable ClickHouse for Analytics: + +1. On the left sidebar, at the bottom, select **Admin**. +1. Select **Settings > General**. +1. Expand **ClickHouse**. +1. Select **Enable ClickHouse for Analytics**. +1. Select **Save changes**. + +### Disable ClickHouse for Analytics + +To disable ClickHouse for Analytics: + +Prerequisites: + +- You must have administrator access to the instance. + +To disable: + +1. On the left sidebar, at the bottom, select **Admin**. +1. Select **Settings > General**. +1. Expand **ClickHouse**. +1. Clear the **Enable ClickHouse for Analytics** checkbox. +1. Select **Save changes**. + +{{< alert type="note" >}} +Disabling ClickHouse for Analytics stops GitLab from querying ClickHouse but does not delete any data from your ClickHouse instance. Analytics features that rely on ClickHouse will fall back to alternative data sources or become unavailable. +{{< /alert >}} + +## Upgrade ClickHouse + +### ClickHouse Cloud + +ClickHouse Cloud automatically handles version upgrades and security patches. No manual intervention is required. + +**Upgrade behavior:** + +- **Development and Production plans**: Upgrades are applied automatically during scheduled maintenance windows. +- **Enterprise plans**: You can schedule custom upgrade windows to control when upgrades occur. + +To view or schedule upgrade windows (Enterprise plans): + +1. Sign in to [ClickHouse Cloud](https://clickhouse.cloud). +1. Select your service. +1. Go to **Settings > Maintenance**. +1. Configure your preferred maintenance window. + +{{< alert type="note" >}} +ClickHouse Cloud notifies you in advance of upcoming upgrades. Review the [ClickHouse Cloud changelog](https://clickhouse.com/docs/cloud/changes) to stay informed about new features and changes. +{{< /alert >}} + +### Self-managed ClickHouse (BYOC) + +For self-managed ClickHouse, you are responsible for planning and executing version upgrades. + +Prerequisites: + +- You must have administrator access to the ClickHouse instance. +- Back up your data before upgrading. See [Disaster recovery](#disaster-recovery). + +Before upgrading: + +1. Review the [ClickHouse release notes](https://clickhouse.com/docs/category/release-notes) for breaking changes. +1. Check [compatibility](#supported-clickhouse-versions) with your GitLab version. +1. Test the upgrade in a non-production environment. +1. Plan for potential downtime, or use a rolling upgrade strategy for HA clusters. + +To upgrade ClickHouse: + +1. For single-node deployments, follow the [ClickHouse upgrade documentation](https://clickhouse.com/docs/manage/updates). +1. For HA cluster deployments, perform a rolling upgrade to minimize downtime: + - Upgrade one node at a time. + - Wait for the node to rejoin the cluster. + - Verify cluster health before proceeding to the next node. + +{{< alert type="warning" >}} +Always ensure the ClickHouse version remains compatible with your GitLab version. See [Supported ClickHouse versions](#supported-clickhouse-versions) for the compatibility matrix. Incompatible versions may cause indexing to pause and features to fail. +{{< /alert >}} + +For detailed upgrade procedures, see the [ClickHouse documentation on updates](https://clickhouse.com/docs/manage/updates). + +## Operations + +### Check migration status + +Prerequisites: + +- You must have administrator access to the instance. + +To check the status of ClickHouse migrations: + +1. On the left sidebar, at the bottom, select **Admin**. +1. Select **Settings > General**. +1. Expand **ClickHouse**. +1. Review the **Migration status** section if available. + +Alternatively, check for pending migrations using the Rails console: + +```ruby +# Sign in to Rails console +# Run this to check migrations +ClickHouse::MigrationSupport::Migrator.new(:main).pending_migrations +``` + +### Retry failed migrations + +If a ClickHouse migration fails: + +1. Check the logs for error details. ClickHouse-related errors are logged in the GitLab application logs. +1. Address the underlying issue (for example, insufficient memory, connectivity problems). +1. Retry the migration: + + ```shell + # For installations that use the Linux package + sudo gitlab-rake gitlab:clickhouse:migrate + + # For self-compiled installations + bundle exec rake gitlab:clickhouse:migrate RAILS_ENV=production + ``` + +{{< alert type="note" >}} +Migrations are designed to be idempotent and safe to retry. If a migration fails partway through, running it again will resume from where it left off or skip already-completed steps. +{{< /alert >}} + +## ClickHouse Rake tasks - ```sql - CREATE DATABASE gitlab_clickhouse_main_production ON CLUSTER CLUSTER_NAME_HERE ENGINE = Replicated('/clickhouse/databases/{cluster}/gitlab_clickhouse_main_production', '{shard}', '{replica}') - CREATE USER gitlab IDENTIFIED WITH sha256_password BY 'PASSWORD_HERE' ON CLUSTER CLUSTER_NAME_HERE; - CREATE ROLE gitlab_app ON CLUSTER CLUSTER_NAME_HERE; - GRANT SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE ON gitlab_clickhouse_main_production.* TO gitlab_app ON CLUSTER CLUSTER_NAME_HERE; - GRANT SELECT ON information_schema.* TO gitlab_app ON CLUSTER CLUSTER_NAME_HERE; - GRANT gitlab_app TO gitlab ON CLUSTER CLUSTER_NAME_HERE; - ``` +GitLab provides several Rake tasks for managing your ClickHouse database. -### Load balancer considerations +The following Rake tasks are available: -The GitLab application communicates with the ClickHouse cluster through the HTTP/HTTPS interface. Consider using an HTTP proxy for load balancing requests to the ClickHouse cluster, such as [`chproxy`](https://www.chproxy.org/). +| Task | Description | +|------|-------------| +| [`sudo gitlab-rake gitlab:clickhouse:migrate`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/tasks/gitlab/click_house/migration.rake) | Runs all pending ClickHouse migrations to create or update database schema. | +| [`sudo gitlab-rake gitlab:clickhouse:drop`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/tasks/gitlab/click_house/migration.rake) | Drops all ClickHouse databases. Use with extreme caution as this deletes all data. | +| [`sudo gitlab-rake gitlab:clickhouse:create`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/tasks/gitlab/click_house/migration.rake) | Creates ClickHouse databases if they don't exist. | +| [`sudo gitlab-rake gitlab:clickhouse:setup`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/tasks/gitlab/click_house/migration.rake) | Creates databases and runs all migrations. Equivalent to running `create` and `migrate` tasks. | +| [`sudo gitlab-rake gitlab:clickhouse:schema:dump`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/tasks/gitlab/click_house/migration.rake) | Dumps the current database schema to a file for backup or version control. | +| [`sudo gitlab-rake gitlab:clickhouse:schema:load`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/tasks/gitlab/click_house/migration.rake) | Loads the database schema from a dump file. | + +{{< alert type="note" >}} +For self-compiled installations, use `bundle exec rake` instead of `sudo gitlab-rake` and add `RAILS_ENV=production` to the end of the command. +{{< /alert >}} + +### Common task examples + +#### Verify ClickHouse connection and schema + +To verify your ClickHouse connection is working: + +```shell +# For installations that use the Linux package +sudo gitlab-rake gitlab:clickhouse:info + +# For self-compiled installations +bundle exec rake gitlab:clickhouse:info RAILS_ENV=production +``` + +This task outputs debugging information about the ClickHouse connection and configuration. + +#### Re-run all migrations + +To run all pending migrations: + +```shell +# For installations that use the Linux package +sudo gitlab-rake gitlab:clickhouse:migrate + +# For self-compiled installations +bundle exec rake gitlab:clickhouse:migrate RAILS_ENV=production +``` + +#### Reset the database + +{{< alert type="warning" >}} +This deletes all data in your ClickHouse database. Use only in development or when troubleshooting. +{{< /alert >}} + +To drop and recreate the database: + +```shell +# For installations that use the Linux package +sudo gitlab-rake gitlab:clickhouse:drop +sudo gitlab-rake gitlab:clickhouse:setup + +# For self-compiled installations +bundle exec rake gitlab:clickhouse:drop RAILS_ENV=production +bundle exec rake gitlab:clickhouse:setup RAILS_ENV=production +``` + +### Environment variables + +You can use environment variables to control Rake task behavior: + +| Environment Variable | Data Type | Description | +|---------------------|-----------|-------------| +| `VERBOSE` | Boolean | Set to `true` to see detailed output during migrations. Example: `VERBOSE=true sudo gitlab-rake gitlab:clickhouse:migrate` | + +## Performance tuning + +{{< alert type="note" >}} +For resource sizing and deployment recommendations based on your user count, see [System requirements](#system-requirements). +{{< /alert >}} + +For information about ClickHouse architecture and performance tuning, see the [ClickHouse documentation on architecture](https://clickhouse.com/docs/architecture/introduction). + +### Query optimization + +ClickHouse is optimized for analytical queries. For best performance: + +- Use date/time range filters when possible (ClickHouse partitions by time). +- Limit result sets with `LIMIT` clauses. +- Use materialized views for frequently-run aggregations. +- Monitor slow queries in the ClickHouse query log. + +## Disaster recovery ### Backup and Restore @@ -265,13 +722,13 @@ To enable this: 1. Configure the `prometheus` section in your `config.xml` to expose metrics on a dedicated port (default is `9363`). ```xml - - /metrics - 9363 - true - true - true - + + /metrics + 9363 + true + true + true + ``` 1. Configure Prometheus or a similar compatible server to scrape `http://:9363/metrics`. @@ -322,13 +779,13 @@ For self-managed instances, ensure the `query_log` configuration parameter is en 1. Verify that the `query_log` section exists in your `config.xml` or `users.xml`: ```xml - - system - query_log
- toYYYYMM(event_date) - 7500 - event_date + INTERVAL 30 DAY -
+ + system + query_log
+ toYYYYMM(event_date) + 7500 + event_date + INTERVAL 30 DAY +
``` 1. Once enabled, all executed queries are recorded in the `system.query_log` table, allowing for audit trail. @@ -444,25 +901,26 @@ HA setup becomes cost effective only at 10k users or above. ## Glossary -- Cluster: A collection of nodes (servers) that work together to store and process data. -- MergeTree: [`MergeTree`](https://clickhouse.com/docs/engines/table-engines/mergetree-family/mergetree) is a table engine in ClickHouse designed for high data ingest rates and large data volumes. +- **Cluster**: A collection of nodes (servers) that work together to store and process data. +- **MergeTree**: [`MergeTree`](https://clickhouse.com/docs/engines/table-engines/mergetree-family/mergetree) is a table engine in ClickHouse designed for high data ingest rates and large data volumes. It is the core storage engine in ClickHouse, providing features such as columnar storage, custom partitioning, sparse primary indexes, and support for background data merges. -- Parts: A physical file on a disk that stores a portion of the table's data. +- **Parts**: A physical file on a disk that stores a portion of the table's data. A part is different from a partition, which is a logical division of a table's data that is created using a partition key. -- Replica: A copy of the data stored in a ClickHouse database. +- **Replica**: A copy of the data stored in a ClickHouse database. You can have any number of replicas of the same data for redundancy and reliability. Replicas are used in conjunction with the ReplicatedMergeTree table engine, which enables ClickHouse to keep multiple copies of data in sync across different servers. -- Shard: A subset of data. +- **Shard**: A subset of data. ClickHouse always has at least one shard for your data. If you do not split the data across multiple servers, your data is stored in one shard. Sharding data across multiple servers can be used to divide the load if you exceed the capacity of a single server. -- TTL: Time To Live (TTL) is a ClickHouse feature that automatically moves, deletes, or rolls up columns/rows after a certain time period. +- **TTL (Time To Live)**: Time To Live (TTL) is a ClickHouse feature that automatically moves, deletes, or rolls up columns/rows after a certain time period. This allows you to manage storage more efficiently because you can delete, move, or archive the data that you no longer need to access frequently. ## Troubleshooting ### Database schema migrations on GitLab 18.0.0 and earlier +{{< alert variant="warning" >}} On GitLab 18.0.0 and earlier, running database schema migrations for ClickHouse may fail for ClickHouse 24.x and 25.x with the following error message: ```plaintext @@ -470,10 +928,11 @@ Code: 344. DB::Exception: Projection is fully supported in ReplacingMergeTree wi ``` Without running all migrations, the ClickHouse integration will not work. +{{< /alert >}} To work around this issue and run the migrations: -1. Sign in to [Rails console](../administration/operations/rails_console.md#starting-a-rails-console-session) +1. Sign in to the [Rails console](../administration/operations/rails_console.md#starting-a-rails-console-session). 1. Execute the following command: ```ruby