From a295611ed3af4c10f86e05446d7bdf1086329d3e Mon Sep 17 00:00:00 2001
From: Nailia Iskhakova
Date: Mon, 9 Aug 2021 19:25:25 +0300
Subject: [PATCH 1/4] Add 2k hybrid documentation

Signed-off-by: Nailia Iskhakova
---
 .../reference_architectures/2k_users.md | 135 ++++++++++++++++++
 .../reference_architectures/3k_users.md |   2 +-
 .../reference_architectures/index.md    |   1 +
 3 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index e11627b0c8fea7..f742a152a67e53 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -967,6 +967,141 @@ Read:
 - The [Gitaly and NFS deprecation notice](../gitaly/index.md#nfs-deprecation-notice).
 - About the [correct mount options to use](../nfs.md#upgrade-to-gitaly-cluster-or-disable-caching-if-experiencing-data-loss).

+## Cloud Native Hybrid reference architecture with Helm Charts (alternative)
+
+As an alternative approach, you can also run select components of GitLab as Cloud Native
+in Kubernetes via our official [Helm Charts](https://docs.gitlab.com/charts/).
+In this setup, we support running the equivalent of GitLab Rails and Sidekiq nodes
+in a Kubernetes cluster, named Webservice and Sidekiq respectively. In addition,
+the following other supporting services are supported: NGINX, Task Runner, Migrations,
+Prometheus, and Grafana.
+
+Hybrid installations leverage the benefits of both cloud native and traditional
+compute deployments. With this, _stateless_ components can benefit from cloud native
+workload management benefits while _stateful_ components are deployed in compute VMs
+with Omnibus to benefit from increased permanence.
+
+NOTE:
+This is an **advanced** setup. Running services in Kubernetes is well known
+to be complex. **This setup is only recommended** if you have strong working
+knowledge and experience in Kubernetes. The rest of this
+section assumes this.
+
+### Cluster topology
+
+The following tables and diagram detail the hybrid environment using the same formats
+as the normal environment above.
+
+First are the components that run in Kubernetes. The recommendation at this time is to
+use Google Cloud’s Kubernetes Engine (GKE) and associated machine types, but the memory
+and CPU requirements should translate to most other providers. We hope to update this in the
+future with further specific cloud provider details.
+
+| Service | Nodes<sup>1</sup> | Configuration | GCP | Allocatable CPUs and Memory |
+|-------------------------------------------------------|-------------------|------------------------|-----------------|-----------------------------|
+| Webservice | 3 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | 23.7 vCPU, 16.9 GB memory |
+| Sidekiq | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory |
+| Supporting services such as NGINX, Prometheus | 2 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | 1.9 vCPU, 5.5 GB memory |
+
+
+
+1. Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**.
+   In production deployments there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices.
+
+
+Next are the backend components that run on static compute VMs via Omnibus (or External PaaS
+services where applicable):
+
+| Service | Nodes | Configuration | GCP |
+|--------------------------------------------|-------|-------------------------|------------------|
+| PostgreSQL<sup>1</sup> | 1 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
+| Redis<sup>2</sup> | 1 | 1 vCPU, 3.75 GB memory | `n1-standard-1` |
+| Gitaly | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Object storage<sup>3</sup> | n/a | n/a | n/a |
+
+
+
+1. Can be optionally run on reputable third-party external PaaS PostgreSQL solutions. Google Cloud SQL and AWS RDS are known to work, however Azure Database for PostgreSQL is [not recommended](https://gitlab.com/gitlab-org/quality/reference-architectures/-/issues/61) due to performance issues. Consul is primarily used for PostgreSQL high availability so can be ignored when using a PostgreSQL PaaS setup. However it is also used optionally by Prometheus for Omnibus auto host discovery.
+2. Can be optionally run on reputable third-party external PaaS Redis solutions. Google Memorystore and AWS Elasticache are known to work.
+3. Should be run on reputable third-party object storage (storage PaaS) for cloud implementations. Google Cloud Storage and AWS S3 are known to work.
+
+
+NOTE:
+For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices.
+
+```plantuml
+@startuml 2k
+
+card "Kubernetes via Helm Charts" as kubernetes {
+  card "**External Load Balancer**" as elb #6a9be7
+
+  together {
+    collections "**Webservice** x3" as gitlab #32CD32
+    collections "**Sidekiq** x2" as sidekiq #ff8dd1
+  }
+
+  card "**Prometheus + Grafana**" as monitor #7FFFD4
+  card "**Supporting Services**" as support
+}
+
+card "**Gitaly**" as gitaly #FF8C00
+card "**PostgreSQL**" as postgres #4EA7FF
+card "**Redis**" as redis #FF6347
+cloud "**Object Storage**" as object_storage #white
+
+elb -[#6a9be7]-> gitlab
+elb -[#6a9be7]--> monitor
+
+gitlab -[#32CD32]--> gitaly
+gitlab -[#32CD32]--> postgres
+gitlab -[#32CD32]-> object_storage
+gitlab -[#32CD32]--> redis
+
+sidekiq -[#ff8dd1]--> gitaly
+sidekiq -[#ff8dd1]-> object_storage
+sidekiq -[#ff8dd1]---> postgres
+sidekiq -[#ff8dd1]---> redis
+
+monitor .[#7FFFD4]u-> gitlab
+monitor .[#7FFFD4]-> gitaly
+monitor .[#7FFFD4]-> postgres
+monitor .[#7FFFD4,norank]--> redis
+monitor .[#7FFFD4,norank]u--> elb
+
+@enduml
+```
+
+### Resource usage settings
+
+The following formulas help when calculating how many pods may be deployed within resource constraints.
+The [2k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/2k.yaml)
+documents how to apply the calculated configuration to the Helm Chart.
+
+#### Webservice
+
+Webservice pods typically need about 1 vCPU and 1.25 GB of memory _per worker_.
+Each Webservice pod consumes roughly 2 vCPUs and 2.5 GB of memory using
+the [recommended topology](#cluster-topology) because two worker processes
+are created by default and each pod has other small processes running.
+
+For 2,000 users we recommend a total Puma worker count of around 12.
+With the [provided recommendations](#cluster-topology) this allows the deployment of up to 6
+Webservice pods with 2 workers per pod and 2 pods per node. Expand available resources using
+the ratio of 1 vCPU to 1.25 GB of memory _per each worker process_ for each additional
+Webservice pod.
+
+For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+
+#### Sidekiq
+
+Sidekiq pods should generally have 1 vCPU and 2 GB of memory.
+
+[The provided starting point](#cluster-topology) allows the deployment of up to
+2 Sidekiq pods. Expand available resources using the 1 vCPU to 2GB memory
+ratio for each additional pod.
+
+For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
+
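As a rough illustration of the sizing guidance in the Resource usage settings added above, the following Python sketch checks whether the recommended pod counts fit within the allocatable node pool resources listed under Cluster topology. The function and variable names are hypothetical and are not part of GitLab or the Helm Charts. The per-pod figures are the approximate values quoted in the added text, and real scheduling depends on factors this sketch ignores, such as per-node packing and system pods.

```python
import math


def pods_for_workers(total_workers: int, workers_per_pod: int) -> int:
    """Return how many pods are needed to host the requested worker count."""
    return math.ceil(total_workers / workers_per_pod)


def fits(pods: int, cpu_per_pod: float, mem_gb_per_pod: float,
         alloc_cpu: float, alloc_mem_gb: float) -> bool:
    """Check that the pods' combined CPU and memory stay within the pool totals."""
    return (pods * cpu_per_pod <= alloc_cpu
            and pods * mem_gb_per_pod <= alloc_mem_gb)


# Webservice: about 12 Puma workers in total, 2 workers per pod,
# roughly 2 vCPU and 2.5 GB of memory per pod (figures from the text).
webservice_pods = pods_for_workers(total_workers=12, workers_per_pod=2)
webservice_fits = fits(webservice_pods, 2.0, 2.5, alloc_cpu=23.7, alloc_mem_gb=16.9)

# Sidekiq: 2 pods at roughly 1 vCPU and 2 GB of memory each (figures from the text).
sidekiq_pods = 2
sidekiq_fits = fits(sidekiq_pods, 1.0, 2.0, alloc_cpu=3.9, alloc_mem_gb=11.8)

print(f"Webservice pods: {webservice_pods}, fits: {webservice_fits}")
print(f"Sidekiq pods: {sidekiq_pods}, fits: {sidekiq_fits}")
```

With these figures, memory rather than CPU is the limiting resource for Webservice: a seventh pod would need 17.5 GB against 16.9 GB allocatable.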
 Back to setup components
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index 1d3c57e5509267..f4ae01c7442b7f 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -2150,7 +2150,7 @@ services where applicable):
 | PostgreSQL<sup>1</sup> | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` |
 | PgBouncer<sup>1</sup> | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
 | Internal load balancing node<sup>3</sup> | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
-| Gitaly | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
+| Gitaly | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` |
 | Praefect | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
 | Praefect PostgreSQL<sup>1</sup> | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` |
 | Object storage<sup>4</sup> | n/a | n/a | n/a |
diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 2965b1202db91e..74d8bf39d03e12 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -71,6 +71,7 @@ The following reference architectures are available:
 The following Cloud Native Hybrid reference architectures, where select recommended components can be run in Kubernetes, are available:

+- [Up to 2,000 users](2k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
 - [Up to 3,000 users](3k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
 - [Up to 5,000 users](5k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
 - [Up to 10,000 users](10k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative)
-- 
GitLab

From f68db9ee5610cd2bff4ad3fb6d75bc56d530edad Mon Sep 17 00:00:00 2001
From: Nailia Iskhakova
Date: Mon, 9 Aug 2021 21:01:20 +0300
Subject: [PATCH 2/4] Mention that 2k hybrid is not HA

Signed-off-by: Nailia Iskhakova
---
 doc/administration/reference_architectures/2k_users.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index f742a152a67e53..0c7146b0d16722 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -981,6 +981,8 @@ compute deployments. With this, _stateless_ components can benefit from cloud na
 workload management benefits while _stateful_ components are deployed in compute VMs
 with Omnibus to benefit from increased permanence.

+Please note that 2,000 reference architecture is not highly-available setup. To achieve HA, you can follow a modified [3K reference architecture](3k_users.md#supported-modifications-for-lower-user-counts-ha).
+
 NOTE:
 This is an **advanced** setup. Running services in Kubernetes is well known
 to be complex. **This setup is only recommended** if you have strong working
-- 
GitLab

From 96e4af60f79e0decc9ab302c7cd1ab7f3405e4bc Mon Sep 17 00:00:00 2001
From: Nailia Iskhakova
Date: Tue, 17 Aug 2021 16:22:07 +0300
Subject: [PATCH 3/4] Update link to 3k hybrid

Signed-off-by: Nailia Iskhakova
---
 doc/administration/reference_architectures/2k_users.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index 0c7146b0d16722..e17c5ebac623ce 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -981,7 +981,7 @@ compute deployments. With this, _stateless_ components can benefit from cloud na
 workload management benefits while _stateful_ components are deployed in compute VMs
 with Omnibus to benefit from increased permanence.

-Please note that 2,000 reference architecture is not highly-available setup. To achieve HA, you can follow a modified [3K reference architecture](3k_users.md#supported-modifications-for-lower-user-counts-ha).
+Please note that 2,000 reference architecture is not highly-available setup. To achieve HA, you can follow a modified [3K reference architecture](3k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative).

 NOTE:
 This is an **advanced** setup. Running services in Kubernetes is well known
-- 
GitLab

From 5bd65b305ef2acead75a69891fb3e4d63c0aa776 Mon Sep 17 00:00:00 2001
From: Nailia Iskhakova
Date: Wed, 18 Aug 2021 19:42:02 +0300
Subject: [PATCH 4/4] Add 2k hybrid documentation

---
 doc/administration/reference_architectures/2k_users.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index e17c5ebac623ce..0af4dbc8a7f344 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -981,7 +981,7 @@ compute deployments. With this, _stateless_ components can benefit from cloud na
 workload management benefits while _stateful_ components are deployed in compute VMs
 with Omnibus to benefit from increased permanence.

-Please note that 2,000 reference architecture is not highly-available setup. To achieve HA, you can follow a modified [3K reference architecture](3k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative).
+The 2,000 reference architecture is not a highly-available setup. To achieve HA, you can follow a modified [3K reference architecture](3k_users.md#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative).

 NOTE:
 This is an **advanced** setup. Running services in Kubernetes is well known
-- 
GitLab