From 4e7bd89338605f3acdc1c0e915d603effaaaf446 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Wed, 17 Apr 2024 16:33:09 +0100 Subject: [PATCH 01/20] Adjust Reference Architecture Cloud Native Webservice sizings --- .../reference_architectures/10k_users.md | 76 ++++++++++--------- .../reference_architectures/25k_users.md | 72 ++++++++++-------- .../reference_architectures/2k_users.md | 68 +++++++++-------- .../reference_architectures/3k_users.md | 72 ++++++++++-------- .../reference_architectures/50k_users.md | 68 ++++++++++------- .../reference_architectures/5k_users.md | 72 ++++++++++-------- .../reference_architectures/index.md | 12 ++- 7 files changed, 247 insertions(+), 193 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index e6acf7840bb084..dab3e149b2f8e5 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -61,7 +61,7 @@ specifically the [Before you start](index.md#before-you-start) and [Deciding whi NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 10k @@ -2268,16 +2268,19 @@ as the typical environment above. First are the components that run in Kubernetes. These run across several node groups, although you can change the overall makeup as desired as long as the minimum CPU and Memory requirements are observed. -| Service Node Group | Nodes | Configuration | GCP | AWS | Min Allocatable CPUs and Memory | -|---------------------|-------|-------------------------|-----------------|--------------|---------------------------------| -| Webservice | 4 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | 127.5 vCPU, 118 GB memory | -| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 15.5 vCPU, 50 GB memory | -| Supporting services | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 7.75 vCPU, 25 GB memory | +| Component Node Group | Target Node Pool Totals | GCP Example | AWS Example | +|----------------------|-------------------------|-----------------|--------------| +| Webservice | 80 vCPU
100 GB memory (request)
140 GB memory (limit) | 3 x `n1-standard-32` | 3 x `c5.9xlarge` | +| Sidekiq | 12.6 vCPU
28 GB memory (request)
56 GB memory (limit) | 4 x `n1-standard-4` | 4 x `m5.xlarge` | +| Supporting services | 4 vCPU
15 GB memory | 2 x `n1-standard-4` | 2 x `m5.xlarge` | - For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) - [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. -- Nodes configuration is shown as it is forced to ensure pod vCPU / memory ratios and avoid scaling during **performance testing**. - - In production deployments, there is no need to assign pods to specific nodes. A minimum of three nodes per node group in three different availability zones is strongly recommended to align with resilient cloud architecture practices. +[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. +- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy. +- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account. +- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account. +- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices. +- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance. Next are the backend components that run on static compute VMs using the Linux package (or External PaaS services where applicable): @@ -2312,7 +2315,7 @@ services where applicable): NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 10k @@ -2322,7 +2325,7 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x4" as gitlab #32CD32 + collections "**Webservice** x3" as gitlab #32CD32 collections "**Sidekiq** x4" as sidekiq #ff8dd1 } @@ -2384,38 +2387,39 @@ consul .[#e76a9b]--> redis @enduml ``` -### Resource usage settings +### Kubernetes component targets -The following formulas help when calculating how many pods may be deployed within resource constraints. 
-The [10k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/10k.yaml) -documents how to apply the calculated configuration to the Helm Chart. +The following section details the targets used for the GitLab components deployed in Kubernetes. #### Webservice -Webservice pods typically need about 1 CPU and 1.25 GB of memory _per worker_. -Each Webservice pod consumes roughly 4 CPUs and 5 GB of memory using -the [recommended topology](#cluster-topology) because four worker processes -are created by default and each pod has other small processes running. +Each Webservice pod (Puma and Workhorse) is recommended to be run with the following configuration: -For 200 RPS or 10,000 users we recommend a total Puma worker count of around 80. -With the [provided recommendations](#cluster-topology) this allows the deployment of up to 20 -Webservice pods with 4 workers per pod and 5 pods per node. Expand available resources using -the ratio of 1 CPU to 1.25 GB of memory _per each worker process_ for each additional -Webservice pod. +- 4 Puma Workers +- 4 vCPU +- 5 GB memory (request) +- 7 GB memory (limit) -For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). +For 200 RPS or 10,000 users we recommend a total Puma worker count of around 80 so in turn it's recommended to run at +least 20 Webservice pods. + +For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). #### Sidekiq -Sidekiq pods should generally have 0.9 CPU and 2 GB of memory. +Each Sidekiq pod is recommended to be run with the following configuration: + +- 1 Sidekiq worker +- 900m vCPU +- 2 GB memory (request) +- 4 GB memory (limit) -[The provided starting point](#cluster-topology) allows the deployment of up to -14 Sidekiq pods. Expand available resources using the 0.9 CPU to 2 GB memory -ratio for each additional pod. +Similar to the standard deployment above, an initial target of 14 Sidekiq workers has been used here. +Additional workers may be required depending on your specific workflow. -For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). +For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). -#### Supporting +### Supporting The Supporting Node Pool is designed to house all supporting deployments that don't need to be on the Webservice and Sidekiq pools. @@ -2423,16 +2427,14 @@ on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). -If you wish to make any additional deployments, such as for Monitoring, it's recommended +If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed specifically to accommodate several additional deployments. However, if your deployments don't fit into the -pool as given, you can increase the node pool accordingly. - -## Secrets +pool as given, you can increase the node pool accordingly. 
Conversely, if the pool in your use case is over-provisioned you can reduce accordingly.
 
-When setting up a Cloud Native Hybrid environment, it's worth noting that several secrets should be synced from backend VMs from the `/etc/gitlab/gitlab-secrets.json` file into Kubernetes.
+### Example config file
 
-For this setup specifically, the [GitLab Rails](https://docs.gitlab.com/charts/installation/secrets.html#gitlab-rails-secret) and [GitLab Shell](https://docs.gitlab.com/charts/installation/secrets.html#gitlab-rails-secret) secrets should be synced.
+An example for the GitLab Helm Charts targeting the above 200 RPS or 10,000 reference architecture configuration [can be found in the Charts project](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/10k.yaml).
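To make the 200 RPS / 10,000 user targets above concrete, the following is a minimal sketch of how they could be expressed as Helm values. The key names follow the GitLab charts documentation, but treat the exact structure as an assumption and verify it against the linked `10k.yaml` example values file before use.

```yaml
# Sketch of Helm values matching the 10k Webservice and Sidekiq targets above.
# Assumption: key layout mirrors the published examples/ref/10k.yaml; verify before use.
gitlab:
  webservice:
    workerProcesses: 4   # 4 Puma workers per pod
    minReplicas: 20      # 20 pods x 4 workers = ~80 Puma workers total
    maxReplicas: 20
    resources:
      requests:
        cpu: 4           # 4 vCPU per pod
        memory: 5G       # request: 5 GB per pod
      limits:
        memory: 7G       # limit: 7 GB per pod
  sidekiq:
    minReplicas: 14      # initial target of 14 Sidekiq workers
    maxReplicas: 14
    resources:
      requests:
        cpu: 900m        # 900m vCPU per pod
        memory: 2G
      limits:
        memory: 4G
```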
diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index 316d874ae921f2..f4dbf6480d4aec 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -61,7 +61,7 @@ specifically the [Before you start](index.md#before-you-start) and [Deciding whi NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 25k @@ -2274,16 +2274,19 @@ as the typical environment above. First are the components that run in Kubernetes. These run across several node groups, although you can change the overall makeup as desired as long as the minimum CPU and Memory requirements are observed. -| Service Node Group | Nodes | Configuration | GCP | AWS | Min Allocatable CPUs and Memory | -|---------------------|-------|-------------------------|-----------------|--------------|---------------------------------| -| Webservice | 7 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | 223 vCPU, 206.5 GB memory | -| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 15.5 vCPU, 50 GB memory | -| Supporting services | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 7.75 vCPU, 25 GB memory | +| Component Node Group | Target Node Pool Totals | GCP Example | AWS Example | +|----------------------|-------------------------|-----------------|--------------| +| Webservice | 140 vCPU
175 GB memory (request)
245 GB memory (limit) | 5 x `n1-standard-32` | 5 x `c5.9xlarge` | +| Sidekiq | 12.6 vCPU
28 GB memory (request)
56 GB memory (limit) | 4 x `n1-standard-4` | 4 x `m5.xlarge` | +| Supporting services | 4 vCPU
15 GB memory | 2 x `n1-standard-4` | 2 x `m5.xlarge` | - For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) - [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. -- Nodes configuration is shown as it is forced to ensure pod vCPU / memory ratios and avoid scaling during **performance testing**. - - In production deployments, there is no need to assign pods to specific nodes. A minimum of three nodes per node group in three different availability zones is strongly recommended to align with resilient cloud architecture practices. +[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. +- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy. +- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account. +- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account. +- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices. +- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance. Next are the backend components that run on static compute VMs using the Linux package (or External PaaS services where applicable): @@ -2317,7 +2320,7 @@ services where applicable): NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 25k @@ -2327,7 +2330,7 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x7" as gitlab #32CD32 + collections "**Webservice** x5" as gitlab #32CD32 collections "**Sidekiq** x4" as sidekiq #ff8dd1 } @@ -2389,36 +2392,37 @@ consul .[#e76a9b]--> redis @enduml ``` -### Resource usage settings +### Kubernetes component targets -The following formulas help when calculating how many pods may be deployed within resource constraints. 
-The [25k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/25k.yaml) -documents how to apply the calculated configuration to the Helm Chart. +The following section details the targets used for the GitLab components deployed in Kubernetes. #### Webservice -Webservice pods typically need about 1 CPU and 1.25 GB of memory _per worker_. -Each Webservice pod consumes roughly 4 CPUs and 5 GB of memory using -the [recommended topology](#cluster-topology) because four worker processes -are created by default and each pod has other small processes running. +Each Webservice pod (Puma and Workhorse) is recommended to be run with the following configuration: -For 500 RPS or 25,000 users we recommend a total Puma worker count of around 140. -With the [provided recommendations](#cluster-topology) this allows the deployment of up to 35 -Webservice pods with 4 workers per pod and 5 pods per node. Expand available resources using -the ratio of 1 CPU to 1.25 GB of memory _per each worker process_ for each additional -Webservice pod. +- 4 Puma Workers +- 4 vCPU +- 5 GB memory (request) +- 7 GB memory (limit) -For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). +For 500 RPS or 25,000 users we recommend a total Puma worker count of around 140 so in turn it's recommended to run at +least 35 Webservice pods. + +For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). #### Sidekiq -Sidekiq pods should generally have 0.9 CPU and 2 GB of memory. +Each Sidekiq pod is recommended to be run with the following configuration: + +- 1 Sidekiq worker +- 900m vCPU +- 2 GB memory (request) +- 4 GB memory (limit) -[The provided starting point](#cluster-topology) allows the deployment of up to -14 Sidekiq pods. Expand available resources using the 0.9 CPU to 2 GB memory -ratio for each additional pod. +Similar to the standard deployment above, an initial target of 14 Sidekiq workers has been used here. +Additional workers may be required depending on your specific workflow. -For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). +For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). ### Supporting @@ -2428,10 +2432,14 @@ on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). -If you wish to make any additional deployments, such as for Monitoring, it's recommended +If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed specifically to accommodate several additional deployments. However, if your deployments don't fit into the -pool as given, you can increase the node pool accordingly. +pool as given, you can increase the node pool accordingly. Conversely, if the pool in your use case is over-provisioned you can reduce accordingly. 
+
+### Example config file
+
+An example for the GitLab Helm Charts targeting the above 500 RPS or 25,000 reference architecture configuration [can be found in the Charts project](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/25k.yaml).
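For illustration, one way to provision the 25k AWS example node pools is with an `eksctl` cluster config. The cluster name, region, and node-pool label below are hypothetical; the instance types and counts mirror the table above.

```yaml
# Illustrative eksctl config for the 25k AWS example node pools.
# The cluster name, region, and gitlab.io/node-pool label are assumptions.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: gitlab-25k        # hypothetical cluster name
  region: us-east-1       # hypothetical region
managedNodeGroups:
  - name: webservice
    instanceType: c5.9xlarge
    desiredCapacity: 5    # 5 x c5.9xlarge per the table above
    labels: {gitlab.io/node-pool: webservice}
  - name: sidekiq
    instanceType: m5.xlarge
    desiredCapacity: 4    # 4 x m5.xlarge
    labels: {gitlab.io/node-pool: sidekiq}
  - name: supporting
    instanceType: m5.xlarge
    desiredCapacity: 2    # 2 x m5.xlarge
    labels: {gitlab.io/node-pool: supporting}
```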
diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index 0dcf15ccc67e6c..b7d5e20c1ec17d 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -1118,16 +1118,19 @@ as the typical environment above. First are the components that run in Kubernetes. These run across several node groups, although you can change the overall makeup as desired as long as the minimum CPU and Memory requirements are observed. -| Service Node Group | Nodes | Configuration | GCP | AWS | Min Allocatable CPUs and Memory | -|---------------------|-------|------------------------|-----------------|--------------|---------------------------------| -| Webservice | 3 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | 23.7 vCPU, 16.9 GB memory | -| Sidekiq | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 7.8 vCPU, 25.9 GB memory | -| Supporting services | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | 1.9 vCPU, 5.5 GB memory | +| Component Node Group | Target Node Pool Totals | GCP Example | AWS Example | +|----------------------|-------------------------|-----------------|--------------| +| Webservice | 12 vCPU
15 GB memory (request)
21 GB memory (limit) | 3 x `n1-standard-8` | 3 x `c5.2xlarge` | +| Sidekiq | 3.6 vCPU
8 GB memory (request)
16 GB memory (limit) | 2 x `n1-standard-4` | 2 x `m5.xlarge` | +| Supporting services | 4 vCPU
15 GB memory | 2 x `n1-standard-2` | 2 x `m5.large` | - For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) - [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. -- Nodes configuration is shown as it is forced to ensure pod vCPU / memory ratios and avoid scaling during **performance testing**. - - In production deployments, there is no need to assign pods to specific nodes. A minimum of three nodes per node group in three different availability zones is strongly recommended to align with resilient cloud architecture practices. +[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. +- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy. +- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account. +- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account. +- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices. +- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance. Next are the backend components that run on static compute VMs using the Linux package (or External PaaS services where applicable): @@ -1149,7 +1152,7 @@ services where applicable): NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 2k @@ -1186,36 +1189,37 @@ sidekiq -[#ff8dd1]--> redis @enduml ``` -### Resource usage settings +### Kubernetes component targets -The following formulas help when calculating how many pods may be deployed within resource constraints. -The [2k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/2k.yaml) -documents how to apply the calculated configuration to the Helm Chart. +The following section details the targets used for the GitLab components deployed in Kubernetes. 
#### Webservice -Webservice pods typically need about 1 CPU and 1.25 GB of memory _per worker_. -Each Webservice pod consumes roughly 4 CPUs and 5 GB of memory using -the [recommended topology](#cluster-topology) because two worker processes -are created by default and each pod has other small processes running. +Each Webservice pod (Puma and Workhorse) is recommended to be run with the following configuration: -For 40 RPS or 2,000 users we recommend a total Puma worker count of around 12. -With the [provided recommendations](#cluster-topology) this allows the deployment of up to 3 -Webservice pods with 4 workers per pod and 1 pod per node. Expand available resources using -the ratio of 1 CPU to 1.25 GB of memory _per each worker process_ for each additional -Webservice pod. +- 4 Puma Workers +- 4 vCPU +- 5 GB memory (request) +- 7 GB memory (limit) -For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). +For 40 RPS or 2,000 users we recommend a total Puma worker count of around 12 so in turn it's recommended to run at +least 3 Webservice pods. + +For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). #### Sidekiq -Sidekiq pods should generally have 0.9 CPU and 2 GB of memory. +Each Sidekiq pod is recommended to be run with the following configuration: + +- 1 Sidekiq worker +- 900m vCPU +- 2 GB memory (request) +- 4 GB memory (limit) -[The provided starting point](#cluster-topology) allows the deployment of up to -4 Sidekiq pods. Expand available resources using the 0.9 CPU to 2 GB memory -ratio for each additional pod. +Similar to the standard deployment above, an initial target of 4 Sidekiq workers has been used here. +Additional workers may be required depending on your specific workflow. -For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). +For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). ### Supporting @@ -1225,10 +1229,14 @@ on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). -If you wish to make any additional deployments, such as for Monitoring, it's recommended +If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed specifically to accommodate several additional deployments. However, if your deployments don't fit into the -pool as given, you can increase the node pool accordingly. +pool as given, you can increase the node pool accordingly. Conversely, if the pool in your use case is over-provisioned you can reduce accordingly. + +### Example config file + +An example for the GitLab Helm Charts for the above 40 RPS or 2,000 reference architecture configuration [can be found in the Charts project](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/2k.yaml).
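As a sketch of the multi-zone guidance in the notes above, a `topologySpreadConstraints` block like the following can be added to a pod spec so replicas spread across availability zones. The `app: webservice` label is a placeholder for whatever selector your deployment actually uses.

```yaml
# Illustrative pod-spec fragment: spread replicas across availability zones.
# The app: webservice label is a placeholder selector.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # standard well-known node label
    whenUnsatisfiable: ScheduleAnyway          # prefer, but do not hard-fail, the spread
    labelSelector:
      matchLabels:
        app: webservice
```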
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index cf78fe53f52bb4..5c1e664e047dce 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -59,7 +59,7 @@ For a full list of reference architectures, see NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 3k @@ -2256,16 +2256,19 @@ as the typical environment above. First are the components that run in Kubernetes. These run across several node groups, although you can change the overall makeup as desired as long as the minimum CPU and Memory requirements are observed. -| Service Node Group | Nodes | Configuration | GCP | AWS | Min Allocatable CPUs and Memory | -|---------------------|-------|-------------------------|-----------------|--------------|---------------------------------| -| Webservice | 2 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` | 31.8 vCPU, 24.8 GB memory | -| Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 11.8 vCPU, 38.9 GB memory | -| Supporting services | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | 3.9 vCPU, 11.8 GB memory | +| Component Node Group | Target Node Pool Totals | GCP Example | AWS Example | +|----------------------|-------------------------|-----------------|--------------| +| Webservice | 16 vCPU
20 GB memory (request)
28 GB memory (limit) | 3 x `n1-standard-8` | 3 x `c5.2xlarge` | +| Sidekiq | 7.2 vCPU
16 GB memory (request)
32 GB memory (limit) | 3 x `n1-standard-4` | 2 x `m5.xlarge` | +| Supporting services | 4 vCPU
15 GB memory | 2 x `n1-standard-2` | 2 x `m5.large` | - For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) - [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. -- Nodes configuration is shown as it is forced to ensure pod vCPU / memory ratios and avoid scaling during **performance testing**. - - In production deployments, there is no need to assign pods to specific nodes. A minimum of three nodes per node group in three different availability zones is strongly recommended to align with resilient cloud architecture practices. +[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. +- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy. +- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account. +- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account. +- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices. +- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance. Next are the backend components that run on static compute VMs using the Linux package (or External PaaS services where applicable): @@ -2298,7 +2301,7 @@ services where applicable): NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 3k @@ -2308,7 +2311,7 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x2" as gitlab #32CD32 + collections "**Webservice** x3" as gitlab #32CD32 collections "**Sidekiq** x3" as sidekiq #ff8dd1 } @@ -2367,36 +2370,37 @@ consul .[#e76a9b]--> redis @enduml ``` -### Resource usage settings +### Kubernetes component targets -The following formulas help when calculating how many pods may be deployed within resource constraints. 
-The [3k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/3k.yaml) -documents how to apply the calculated configuration to the Helm Chart. +The following section details the targets used for the GitLab components deployed in Kubernetes. #### Webservice -Webservice pods typically need about 1 CPU and 1.25 GB of memory _per worker_. -Each Webservice pod consumes roughly 4 CPUs and 5 GB of memory using -the [recommended topology](#cluster-topology) because four worker processes -are created by default and each pod has other small processes running. +Each Webservice pod (Puma and Workhorse) is recommended to be run with the following configuration: -For 60 RPS or 3,000 users we recommend a total Puma worker count of around 16. -With the [provided recommendations](#cluster-topology) this allows the deployment of up to 4 -Webservice pods with 4 workers per pod and 2 pods per node. Expand available resources using -the ratio of 1 CPU to 1.25 GB of memory _per each worker process_ for each additional -Webservice pod. +- 4 Puma Workers +- 4 vCPU +- 5 GB memory (request) +- 7 GB memory (limit) -For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). +For 60 RPS or 3,000 users we recommend a total Puma worker count of around 16 so in turn it's recommended to run at +least 4 Webservice pods. + +For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). #### Sidekiq -Sidekiq pods should generally have 0.9 CPU and 2 GB of memory. +Each Sidekiq pod is recommended to be run with the following configuration: + +- 1 Sidekiq worker +- 900m vCPU +- 2 GB memory (request) +- 4 GB memory (limit) -[The provided starting point](#cluster-topology) allows the deployment of up to -8 Sidekiq pods. Expand available resources using the 0.9 CPU to 2 GB memory -ratio for each additional pod. +Similar to the standard deployment above, an initial target of 8 Sidekiq workers has been used here. +Additional workers may be required depending on your specific workflow. -For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). +For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). ### Supporting @@ -2406,10 +2410,14 @@ on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). -If you wish to make any additional deployments, such as for Monitoring, it's recommended +If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed specifically to accommodate several additional deployments. However, if your deployments don't fit into the -pool as given, you can increase the node pool accordingly. +pool as given, you can increase the node pool accordingly. Conversely, if the pool in your use case is over-provisioned you can reduce accordingly. 
+ +### Example config file + +An example for the GitLab Helm Charts for the above 60 RPS or 3,000 reference architecture configuration [can be found in the Charts project](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/3k.yaml).
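As one hedged reading of the autoscaling note above: with a 3k target of 4 Webservice pods and 8 Sidekiq workers, a 75% floor works out to HPA minimums of 3 and 6 respectively. The `minReplicas`/`maxReplicas` keys are assumed to be the chart's HPA bounds; verify against the charts documentation.

```yaml
# Sketch: a 75% autoscaling floor for the 3k targets (4 Webservice pods, 8 Sidekiq workers).
# Assumption: minReplicas/maxReplicas are the chart's HPA bounds.
gitlab:
  webservice:
    minReplicas: 3   # 75% of the 4-pod target
    maxReplicas: 4
  sidekiq:
    minReplicas: 6   # 75% of the 8-worker target
    maxReplicas: 8
```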
diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index e9ec5832673c7c..87bf696c51625d 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -60,7 +60,7 @@ specifically the [Before you start](index.md#before-you-start) and [Deciding whi NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 50k @@ -2294,10 +2294,19 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 15.5 vCPU, 50 GB memory | | Supporting services | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 7.75 vCPU, 25 GB memory | +| Component Node Group | Target Node Pool Totals | GCP Example | AWS Example | +|----------------------|-------------------------|-----------------|--------------| +| Webservice | 308 vCPU
385 GB memory (request)
539 GB memory (limit) | 11 x `n1-standard-32` | 11 x `c5.9xlarge` | +| Sidekiq | 12.6 vCPU
28 GB memory (request)
56 GB memory (limit) | 4 x `n1-standard-4` | 4 x `m5.xlarge` | +| Supporting services | 4 vCPU
15 GB memory | 2 x `n1-standard-4` | 2 x `m5.xlarge` | + - For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) - [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. -- Nodes configuration is shown as it is forced to ensure pod vCPU / memory ratios and avoid scaling during **performance testing**. - - In production deployments, there is no need to assign pods to specific nodes. A minimum of three nodes per node group in three different availability zones is strongly recommended to align with resilient cloud architecture practices. +[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. +- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy. +- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account. +- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account. +- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices. +- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance. Next are the backend components that run on static compute VMs using the Linux package (or External PaaS services where applicable): @@ -2331,7 +2340,7 @@ services where applicable): NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 50k @@ -2341,7 +2350,7 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x16" as gitlab #32CD32 + collections "**Webservice** x11" as gitlab #32CD32 collections "**Sidekiq** x4" as sidekiq #ff8dd1 } @@ -2403,36 +2412,37 @@ consul .[#e76a9b]--> redis @enduml ``` -### Resource usage settings +### Kubernetes component targets -The following formulas help when calculating how many pods may be deployed within resource constraints. 
-The [50k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/50k.yaml)
-documents how to apply the calculated configuration to the Helm Chart.
+The following section details the targets used for the GitLab components deployed in Kubernetes.

#### Webservice

-Webservice pods typically need about 1 CPU and 1.25 GB of memory _per worker_.
-Each Webservice pod consumes roughly 4 CPUs and 5 GB of memory using
-the [recommended topology](#cluster-topology) because four worker processes
-are created by default and each pod has other small processes running.
+Each Webservice pod (Puma and Workhorse) is recommended to be run with the following configuration:
+
+- 4 Puma Workers
+- 4 vCPU
+- 5 GB memory (request)
+- 7 GB memory (limit)

-For 1000 RPS or 50,000 users we recommend a total Puma worker count of around 320.
-With the [provided recommendations](#cluster-topology) this allows the deployment of up to 80
-Webservice pods with 4 workers per pod and 5 pods per node. Expand available resources using
-the ratio of 1 CPU to 1.25 GB of memory _per each worker process_ for each additional
-Webservice pod.
+For 1000 RPS or 50,000 users we recommend a total Puma worker count of around 308 so in turn it's recommended to run at
+least 77 Webservice pods.

-For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).

#### Sidekiq

-Sidekiq pods should generally have 0.9 CPU and 2 GB of memory.
+Each Sidekiq pod is recommended to be run with the following configuration:
+
+- 1 Sidekiq worker
+- 900m vCPU
+- 2 GB memory (request)
+- 4 GB memory (limit)

-[The provided starting point](#cluster-topology) allows the deployment of up to
-14 Sidekiq pods. Expand available resources using the 0.9 CPU to 2 GB memory
-ratio for each additional pod.
+Similar to the standard deployment above, an initial target of 14 Sidekiq workers has been used here.
+Additional workers may be required depending on your specific workflow.

-For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
+For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).

### Supporting

The Supporting Node Pool is designed to house all supporting deployments that don't need to be
on the Webservice and Sidekiq pools.

This includes various deployments related to the Cloud Provider's implementation and supporting
GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/).

-If you wish to make any additional deployments, such as for Monitoring, it's recommended
+If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended
to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed
specifically to accommodate several additional deployments. However, if your deployments don't fit into the
-pool as given, you can increase the node pool accordingly.
+pool as given, you can increase the node pool accordingly. Conversely, if the pool in your use case is over-provisioned you can reduce accordingly. 
+
+### Example config file
+
+An example for the GitLab Helm Charts targeting the above 1000 RPS or 50,000 reference architecture configuration [can be found in the Charts project](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/50k.yaml).
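For the Cluster Autoscaler guidance above, node pools can expose scaling bounds. A hypothetical `eksctl` node group fragment for the 50k Webservice pool with a roughly 75% node floor (8 of 11 nodes) might look like the following; installing and configuring Cluster Autoscaler itself is a separate step.

```yaml
# Hypothetical eksctl node group fragment for the 50k Webservice pool with autoscaling bounds.
# minSize approximates the 75% floor (8 of 11 nodes).
managedNodeGroups:
  - name: webservice
    instanceType: c5.9xlarge
    desiredCapacity: 11
    minSize: 8           # ~75% of the 11-node target
    maxSize: 11
    iam:
      withAddonPolicies:
        autoScaler: true   # grants the IAM permissions Cluster Autoscaler needs
```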
diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index 71bc40cd9ae3e9..dc5578d84a710d 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -59,7 +59,7 @@ specifically the [Before you start](index.md#before-you-start) and [Deciding whi NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 5k @@ -2231,16 +2231,19 @@ as the typical environment above. First are the components that run in Kubernetes. These run across several node groups, although you can change the overall makeup as desired as long as the minimum CPU and Memory requirements are observed. -| Service Node Group | Nodes | Configuration | GCP | AWS | Min Allocatable CPUs and Memory | -|-------------------- |-------|-------------------------|-----------------|--------------|---------------------------------| -| Webservice | 5 | 16 vCPU, 14.4 GB memory | `n1-highcpu-16` | `c5.4xlarge` | 79.5 vCPU, 62 GB memory | -| Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 11.8 vCPU, 38.9 GB memory | -| Supporting services | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | 3.9 vCPU, 11.8 GB memory | +| Component Node Group | Target Node Pool Totals | GCP Example | AWS Example | +|----------------------|-------------------------|-----------------|--------------| +| Webservice | 36 vCPU
45 GB memory (request)
63 GB memory (limit) | 3 x `n1-standard-16` | 3 x `c5.4xlarge` | +| Sidekiq | 7.2 vCPU
16 GB memory (request)
32 GB memory (limit) | 3 x `n1-standard-4` | 2 x `m5.xlarge` | +| Supporting services | 4 vCPU
15 GB memory | 2 x `n1-standard-2` | 2 x `m5.large` | - For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) - [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. -- Nodes configuration is shown as it is forced to ensure pod vCPU / memory ratios and avoid scaling during **performance testing**. - - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. +[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. +- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy. +- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account. +- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account. +- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices. +- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance. Next are the backend components that run on static compute VMs using the Linux package (or External PaaS services where applicable): @@ -2273,7 +2276,7 @@ services where applicable): NOTE: -For all PaaS solutions that involve configuring instances, it is strongly recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. +For all PaaS solutions that involve configuring instances, it's recommended to implement a minimum of three nodes in three different availability zones to align with resilient cloud architecture practices. ```plantuml @startuml 5k @@ -2283,7 +2286,7 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x5" as gitlab #32CD32 + collections "**Webservice** x3" as gitlab #32CD32 collections "**Sidekiq** x3" as sidekiq #ff8dd1 } @@ -2342,36 +2345,37 @@ consul .[#e76a9b]--> redis @enduml ``` -### Resource usage settings +### Kubernetes component targets -The following formulas help when calculating how many pods may be deployed within resource constraints. 
-The [5k reference architecture example values file](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/5k.yaml) -documents how to apply the calculated configuration to the Helm Chart. +The following section details the targets used for the GitLab components deployed in Kubernetes. #### Webservice -Webservice pods typically need about 1 CPU and 1.25 GB of memory _per worker_. -Each Webservice pod consumes roughly 4 CPUs and 5 GB of memory using -the [recommended topology](#cluster-topology) because four worker processes -are created by default and each pod has other small processes running. +Each Webservice pod (Puma and Workhorse) is recommended to be run with the following configuration: -For 100 RPS or 5,000 users we recommend a total Puma worker count of around 40. -With the [provided recommendations](#cluster-topology) this allows the deployment of up to 10 -Webservice pods with 4 workers per pod and 2 pods per node. Expand available resources using -the ratio of 1 CPU to 1.25 GB of memory _per each worker process_ for each additional -Webservice pod. +- 4 Puma Workers +- 4 vCPU +- 5 GB memory (request) +- 7 GB memory (limit) -For further information on resource usage, see the [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). +For 100 RPS or 5,000 users we recommend a total Puma worker count of around 36 so in turn it's recommended to run at +least 9 Webservice pods. + +For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). #### Sidekiq -Sidekiq pods should generally have 0.9 CPU and 2 GB of memory. +Each Sidekiq pod is recommended to be run with the following configuration: + +- 1 Sidekiq worker +- 900m vCPU +- 2 GB memory (request) +- 4 GB memory (limit) -[The provided starting point](#cluster-topology) allows the deployment of up to -8 Sidekiq pods. Expand available resources using the 0.9 CPU to 2 GB memory -ratio for each additional pod. +Similar to the standard deployment above, an initial target of 8 Sidekiq workers has been used here. +Additional workers may be required depending on your specific workflow. -For further information on resource usage, see the [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). +For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). ### Supporting @@ -2381,10 +2385,14 @@ on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). -If you wish to make any additional deployments, such as for Monitoring, it's recommended +If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed specifically to accommodate several additional deployments. However, if your deployments don't fit into the -pool as given, you can increase the node pool accordingly. +pool as given, you can increase the node pool accordingly. Conversely, if the pool in your use case is over-provisioned you can reduce accordingly. 
+
+### Example config file
+
+An example for the GitLab Helm Charts targeting the above 100 RPS or 5,000 reference architecture configuration [can be found in the Charts project](https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/examples/ref/5k.yaml).
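To illustrate keeping extra deployments on the Supporting pool, a hypothetical monitoring Deployment could be pinned with a `nodeSelector` that matches a label you have applied to that pool's nodes. The `gitlab.io/node-pool` label key and the image are placeholders, not values from the GitLab charts.

```yaml
# Hypothetical extra deployment pinned to the Supporting pool via a node label.
# The gitlab.io/node-pool label and the image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-agent
spec:
  replicas: 1
  selector:
    matchLabels: {app: monitoring-agent}
  template:
    metadata:
      labels: {app: monitoring-agent}
    spec:
      nodeSelector:
        gitlab.io/node-pool: supporting   # schedule onto Supporting pool nodes only
      containers:
        - name: agent
          image: example.org/monitoring-agent:latest   # placeholder image
          resources:
            requests: {cpu: 100m, memory: 128Mi}
```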
diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md index b8a705bb5388ba..9d1e6829337805 100644 --- a/doc/administration/reference_architectures/index.md +++ b/doc/administration/reference_architectures/index.md @@ -386,7 +386,7 @@ Additionally, the following cloud provider services are recommended for use as p Database - 🟢   Cloud SQL + 🟢   Cloud SQL1 🟢   RDS 🟢   Azure Database for PostgreSQL Flexible Server @@ -401,6 +401,12 @@ Additionally, the following cloud provider services are recommended for use as p + + +1. The [Enterprise Plus edition](https://cloud.google.com/sql/docs/editions-intro) for GCP Cloud SQL is generally recommended for optimal performance. This recommendation is especially so for larger environments (500 RPS / 25k users or higher). Max connections may need to be adjusted higher than the service's defaults depending on workload. +2. It's strongly recommended deploying the [Premium tier of Azure Cache for Redis](https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-overview#service-tiers) to ensure good performance. + + ### Recommendation notes for the database services [When selecting to use an external database service](../postgresql/external.md), it should run a standard, performant, and [supported version](../../install/requirements.md#postgresql-requirements). @@ -409,9 +415,9 @@ If you choose to use a third party external service: 1. Note that the HA Linux package PostgreSQL setup encompasses PostgreSQL, PgBouncer and Consul. All of these components would no longer be required when using a third party external service. 1. The number of nodes required to achieve HA may differ depending on the service compared to the Linux package and doesn't need to match accordingly. -1. However, if [Database Load Balancing](../postgresql/database_load_balancing.md) via Read Replicas is desired for further improved performance it's recommended to follow the node count for the Reference Architecture. +1. It's recommended in general to enable Read Replicas for [Database Load Balancing](../postgresql/database_load_balancing.md) if possible, matching the node counts for the standard Linux package deployment. This recommendation is especially so for larger environments (over 200 RPS / 10k users). 1. Ensure that if a pooler is offered as part of the service that it can handle the total load without bottlenecking. - For example, Azure Database for PostgreSQL Flexible Server can optionally deploy a PgBouncer pooler in front of the Database, but PgBouncer is single threaded, so this in turn may cause bottlenecking. However, if using Database Load Balancing, this could be enabled on each node in distributed fashion to compensate. +For example, Azure Database for PostgreSQL Flexible Server can optionally deploy a PgBouncer pooler in front of the Database, but PgBouncer is single threaded, so this in turn may cause bottlenecking. However, if using Database Load Balancing, this could be enabled on each node in distributed fashion to compensate. 1. If [GitLab Geo](../geo/index.md) is to be used the service will need to support Cross Region replication. 
### Recommendation notes for the Redis services -- GitLab From f67133cd34dfe0e532e6f3402f8d7f02479ce562 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Wed, 17 Apr 2024 17:07:32 +0100 Subject: [PATCH 02/20] Add history line --- doc/administration/reference_architectures/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md index 9d1e6829337805..7d362ef373959d 100644 --- a/doc/administration/reference_architectures/index.md +++ b/doc/administration/reference_architectures/index.md @@ -769,6 +769,7 @@ You can find a full history of changes [on the GitLab project](https://gitlab.co **2024:** +- [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/149878): Updated recommended sizings for Webservice nodes for Cloud Native Hybrids on GCP. - [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/149528): Updated 20 RPS / 1,000 User architecture specs to follow recommended memory target of 16 GB. - [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/148313): Updated Reference Architecture titles to include RPS for further clarity and to help right sizing. - [2024-02](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/145436): Updated recommended sizings for Load Balancer nodes if deployed on VMs. Also added notes on network bandwidth considerations. -- GitLab From 6f519749b8fd8fbbafbc9a3b905178f3be3dc33a Mon Sep 17 00:00:00 2001 From: Grant Young Date: Thu, 18 Apr 2024 13:12:29 +0100 Subject: [PATCH 03/20] Remove old table --- doc/administration/reference_architectures/50k_users.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index 87bf696c51625d..ca5a452c2dca5f 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -2288,12 +2288,6 @@ as the typical environment above. First are the components that run in Kubernetes. These run across several node groups, although you can change the overall makeup as desired as long as the minimum CPU and Memory requirements are observed. -| Service Node Group | Nodes | Configuration | GCP | AWS | Min Allocatable CPUs and Memory | -|---------------------|-------|-------------------------|-----------------|--------------|---------------------------------| -| Webservice | 16 | 32 vCPU, 28.8 GB memory | `n1-highcpu-32` | `c5.9xlarge` | 510 vCPU, 472 GB memory | -| Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 15.5 vCPU, 50 GB memory | -| Supporting services | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | 7.75 vCPU, 25 GB memory | - | Component Node Group | Target Node Pool Totals | GCP Example | AWS Example | |----------------------|-------------------------|-----------------|--------------| | Webservice | 308 vCPU
385 GB memory (request)
539 GB memory (limit) | 11 x `n1-standard-32` | 11 x `c5.9xlarge` | -- GitLab From 881bc7a410c3272ae846acd3676b74758692c84e Mon Sep 17 00:00:00 2001 From: Grant Young Date: Thu, 18 Apr 2024 13:14:28 +0100 Subject: [PATCH 04/20] Align numbers for 3k and 5k --- doc/administration/reference_architectures/3k_users.md | 2 +- doc/administration/reference_architectures/5k_users.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index 5c1e664e047dce..2423a7f7f2805e 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -2259,7 +2259,7 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements | Component Node Group | Target Node Pool Totals | GCP Example | AWS Example | |----------------------|-------------------------|-----------------|--------------| | Webservice | 16 vCPU
20 GB memory (request)
28 GB memory (limit) | 3 x `n1-standard-8` | 3 x `c5.2xlarge` | -| Sidekiq | 7.2 vCPU
16 GB memory (request)
32 GB memory (limit) | 3 x `n1-standard-4` | 2 x `m5.xlarge` | +| Sidekiq | 7.2 vCPU
16 GB memory (request)
32 GB memory (limit) | 3 x `n1-standard-4` | 3 x `m5.xlarge` | | Supporting services | 4 vCPU
15 GB memory | 2 x `n1-standard-2` | 2 x `m5.large` | - For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index dc5578d84a710d..e7528e045ae047 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -2234,7 +2234,7 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements | Component Node Group | Target Node Pool Totals | GCP Example | AWS Example | |----------------------|-------------------------|-----------------|--------------| | Webservice | 36 vCPU
45 GB memory (request)
63 GB memory (limit) | 3 x `n1-standard-16` | 3 x `c5.4xlarge` | -| Sidekiq | 7.2 vCPU
16 GB memory (request)
32 GB memory (limit) | 3 x `n1-standard-4` | 2 x `m5.xlarge` | +| Sidekiq | 7.2 vCPU
16 GB memory (request)
32 GB memory (limit) | 3 x `n1-standard-4` | 3 x `m5.xlarge` | | Supporting services | 4 vCPU
15 GB memory | 2 x `n1-standard-2` | 2 x `m5.large` |

- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results)
-- GitLab


From c7eaacf86ee8ea3244eeffd98b88ec07294dcc01 Mon Sep 17 00:00:00 2001
From: Grant Young
Date: Thu, 18 Apr 2024 13:15:19 +0100
Subject: [PATCH 05/20] Adjust text for autoscaling

---
 doc/administration/reference_architectures/10k_users.md | 2 +-
 doc/administration/reference_architectures/25k_users.md | 2 +-
 doc/administration/reference_architectures/2k_users.md | 2 +-
 doc/administration/reference_architectures/3k_users.md | 2 +-
 doc/administration/reference_architectures/50k_users.md | 2 +-
 doc/administration/reference_architectures/5k_users.md | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md
index dab3e149b2f8e5..5a07d8cece893d 100644
--- a/doc/administration/reference_architectures/10k_users.md
+++ b/doc/administration/reference_architectures/10k_users.md
@@ -2280,7 +2280,7 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
-- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance.
+- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended to target a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.

 Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):

diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md
index f4dbf6480d4aec..7f182aae67c441 100644
--- a/doc/administration/reference_architectures/25k_users.md
+++ b/doc/administration/reference_architectures/25k_users.md
@@ -2286,7 +2286,7 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. 
Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
-- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance.
+- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended to target a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.

 Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):

diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index b7d5e20c1ec17d..e6b4f1c581d933 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -1130,7 +1130,7 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
-- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance.
+- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended to target a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.

 Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):

diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index 2423a7f7f2805e..1ea662b1e25351 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -2268,7 +2268,7 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account. 
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
-- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance.
+- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended to target a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.

 Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):

diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index ca5a452c2dca5f..32454c68d7803b 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -2300,7 +2300,7 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
-- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance.
+- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended to target a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.

 Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):

diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md
index e7528e045ae047..56e53491c0780d 100644
--- a/doc/administration/reference_architectures/5k_users.md
+++ b/doc/administration/reference_architectures/5k_users.md
@@ -2243,7 +2243,7 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. 
Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
-- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% to ensure ongoing performance.
+- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended to target a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.

 Next are the backend components that run on static compute VMs using the Linux package (or External PaaS
services where applicable):
-- GitLab


From b2cdc024de375d01c06b54feef2ff85bde2e5d25 Mon Sep 17 00:00:00 2001
From: Grant Young
Date: Thu, 18 Apr 2024 15:45:02 +0100
Subject: [PATCH 06/20] Correct 3k sizings

---
 doc/administration/reference_architectures/3k_users.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index 1ea662b1e25351..ebc29f239fd0af 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -2258,7 +2258,7 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

 | Component Node Group | Target Node Pool Totals | GCP Example | AWS Example |
 |----------------------|-------------------------|-----------------|--------------|
-| Webservice | 16 vCPU
20 GB memory (request)
28 GB memory (limit) | 3 x `n1-standard-8` | 3 x `c5.2xlarge` | +| Webservice | 16 vCPU
20 GB memory (request)
28 GB memory (limit) | 2 x `n1-standard-16` | 2 x `c5.4xlarge` | | Sidekiq | 7.2 vCPU
16 GB memory (request)
32 GB memory (limit) | 3 x `n1-standard-4` | 3 x `m5.xlarge` | | Supporting services | 4 vCPU
15 GB memory | 2 x `n1-standard-2` | 2 x `m5.large` | @@ -2311,7 +2311,7 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x3" as gitlab #32CD32 + collections "**Webservice** x2" as gitlab #32CD32 collections "**Sidekiq** x3" as sidekiq #ff8dd1 } -- GitLab From 61916abb8e0d5da82216ba97f9341c4edf600670 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Thu, 18 Apr 2024 16:08:53 +0100 Subject: [PATCH 07/20] Adjust RA CNH Diagrams --- doc/administration/reference_architectures/10k_users.md | 6 +++--- doc/administration/reference_architectures/25k_users.md | 6 +++--- doc/administration/reference_architectures/2k_users.md | 6 +++--- doc/administration/reference_architectures/3k_users.md | 6 +++--- doc/administration/reference_architectures/50k_users.md | 6 +++--- doc/administration/reference_architectures/5k_users.md | 6 +++--- 6 files changed, 18 insertions(+), 18 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index 5a07d8cece893d..d7db11aa56efe5 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -2325,11 +2325,11 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x3" as gitlab #32CD32 - collections "**Sidekiq** x4" as sidekiq #ff8dd1 + collections "**Webservice**" as gitlab #32CD32 + collections "**Sidekiq**" as sidekiq #ff8dd1 } - card "**Supporting Services** x2" as support + card "**Supporting Services**" as support } card "**Internal Load Balancer**" as ilb #9370DB diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index 7f182aae67c441..a6680b94d395d6 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -2330,11 +2330,11 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x5" as gitlab #32CD32 - collections "**Sidekiq** x4" as sidekiq #ff8dd1 + collections "**Webservice**" as gitlab #32CD32 + collections "**Sidekiq**" as sidekiq #ff8dd1 } - card "**Supporting Services** x2" as support + card "**Supporting Services**" as support } card "**Internal Load Balancer**" as ilb #9370DB diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index e6b4f1c581d933..553ced4afb14a6 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -1162,11 +1162,11 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x3" as gitlab #32CD32 - collections "**Sidekiq** x2" as sidekiq #ff8dd1 + collections "**Webservice**" as gitlab #32CD32 + collections "**Sidekiq**" as sidekiq #ff8dd1 } - collections "**Supporting Services** x2" as support + collections "**Supporting Services**" as support } card "**Gitaly**" as gitaly #FF8C00 diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index ebc29f239fd0af..4512a2b680ade1 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ 
b/doc/administration/reference_architectures/3k_users.md @@ -2311,11 +2311,11 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x2" as gitlab #32CD32 - collections "**Sidekiq** x3" as sidekiq #ff8dd1 + collections "**Webservice**" as gitlab #32CD32 + collections "**Sidekiq**" as sidekiq #ff8dd1 } - card "**Supporting Services** x2" as support + card "**Supporting Services**" as support } card "**Internal Load Balancer**" as ilb #9370DB diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index 32454c68d7803b..b84b56c1774640 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -2344,11 +2344,11 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x11" as gitlab #32CD32 - collections "**Sidekiq** x4" as sidekiq #ff8dd1 + collections "**Webservice**" as gitlab #32CD32 + collections "**Sidekiq**" as sidekiq #ff8dd1 } - card "**Supporting Services** x2" as support + card "**Supporting Services**" as support } card "**Internal Load Balancer**" as ilb #9370DB diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index 56e53491c0780d..12644e6141e26f 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -2286,11 +2286,11 @@ card "Kubernetes via Helm Charts" as kubernetes { card "**External Load Balancer**" as elb #6a9be7 together { - collections "**Webservice** x3" as gitlab #32CD32 - collections "**Sidekiq** x3" as sidekiq #ff8dd1 + collections "**Webservice**" as gitlab #32CD32 + collections "**Sidekiq**" as sidekiq #ff8dd1 } - card "**Supporting Services** x2" as support + card "**Supporting Services**" as support } card "**Internal Load Balancer**" as ilb #9370DB -- GitLab From d7f0f752285af8d0aa3dffca3cf2505168741c86 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Thu, 18 Apr 2024 16:12:56 +0100 Subject: [PATCH 08/20] Adjust text --- doc/administration/reference_architectures/10k_users.md | 2 +- doc/administration/reference_architectures/25k_users.md | 2 +- doc/administration/reference_architectures/2k_users.md | 2 +- doc/administration/reference_architectures/3k_users.md | 2 +- doc/administration/reference_architectures/50k_users.md | 2 +- doc/administration/reference_architectures/5k_users.md | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index d7db11aa56efe5..751c732ecc022c 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -2425,7 +2425,7 @@ The Supporting Node Pool is designed to house all supporting deployments that do on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting -GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). +GitLab deployments such as [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). 
If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index a6680b94d395d6..07c12a409658ca 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -2430,7 +2430,7 @@ The Supporting Node Pool is designed to house all supporting deployments that do on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting -GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). +GitLab deployments such as [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index 553ced4afb14a6..4a9b3281326ac7 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -1227,7 +1227,7 @@ The Supporting Node Pool is designed to house all supporting deployments that do on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting -GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). +GitLab deployments such as [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index 4512a2b680ade1..74ff0d61622574 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -2408,7 +2408,7 @@ The Supporting Node Pool is designed to house all supporting deployments that do on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting -GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). +GitLab deployments such as [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). 
If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index b84b56c1774640..f08699c2ba2ac0 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -2444,7 +2444,7 @@ The Supporting Node Pool is designed to house all supporting deployments that do on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting -GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). +GitLab deployments such as [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index 12644e6141e26f..86364d06afb61d 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -2383,7 +2383,7 @@ The Supporting Node Pool is designed to house all supporting deployments that do on the Webservice and Sidekiq pools. This includes various deployments related to the Cloud Provider's implementation and supporting -GitLab deployments such as NGINX or [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). +GitLab deployments such as [GitLab Shell](https://docs.gitlab.com/charts/charts/gitlab/gitlab-shell/). If you wish to make any additional deployments such as Container Registry, Pages or Monitoring, it's recommended to deploy these in this pool where possible and not in the Webservice or Sidekiq pools, as the Supporting pool has been designed -- GitLab From 7089b160b64c3e5c927396fef15fecae48c82888 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Fri, 19 Apr 2024 11:11:22 +0100 Subject: [PATCH 09/20] Tweak ASG guidance --- doc/administration/reference_architectures/10k_users.md | 5 +++-- doc/administration/reference_architectures/25k_users.md | 5 +++-- doc/administration/reference_architectures/2k_users.md | 3 ++- doc/administration/reference_architectures/3k_users.md | 5 +++-- doc/administration/reference_architectures/50k_users.md | 5 +++-- doc/administration/reference_architectures/5k_users.md | 5 +++-- doc/administration/reference_architectures/index.md | 6 +++--- 7 files changed, 20 insertions(+), 14 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index 751c732ecc022c..c9384c3a5f6440 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -56,8 +56,9 @@ specifically the [Before you start](index.md#before-you-start) and [Deciding whi Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. 6. 
Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
   However, if you have [large monorepos](index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-7. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
-   However, for GitLab Rails certain processes like [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) should be run on only one node.
+6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
+   However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
+   such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.

NOTE:

diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md
index 07c12a409658ca..fa78da0c3a4510 100644
--- a/doc/administration/reference_architectures/25k_users.md
+++ b/doc/administration/reference_architectures/25k_users.md
@@ -56,8 +56,9 @@ specifically the [Before you start](index.md#before-you-start) and [Deciding whi
   Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
   However, if you have [large monorepos](index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-7. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
-   However, for GitLab Rails certain processes like [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) should be run on only one node.
+6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
+   However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
+   such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.

NOTE:

diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index 4a9b3281326ac7..c4ffb7e6ef3a44 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -46,7 +46,8 @@ For a full list of reference architectures, see
   However, if you have large monorepos (larger than several gigabytes) this can **significantly** impact Git and Gitaly performance and an increase of specifications will likely be required. Refer to [large monorepos](index.md#large-monorepos) for more information.
6. 
Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
-   However, for GitLab Rails certain processes like [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) should be run on only one node.
+   However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
+   such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.

NOTE:

diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index 74ff0d61622574..8cdf28df21e60e 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -54,8 +54,9 @@ For a full list of reference architectures, see
   Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
1. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
   However, if you have [large monorepos](index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-1. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
-   However, for GitLab Rails certain processes like [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) should be run on only one node.
+1. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
+   However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
+   such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.

NOTE:

diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index f08699c2ba2ac0..420948b0f39f94 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -55,8 +55,9 @@ specifically the [Before you start](index.md#before-you-start) and [Deciding whi
   Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
   However, if you have [large monorepos](index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-7. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes). 
-   However, for GitLab Rails certain processes like [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) should be run on only one node.
+6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
+   However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
+   such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.

NOTE:

diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md
index 86364d06afb61d..301dc3c7557570 100644
--- a/doc/administration/reference_architectures/5k_users.md
+++ b/doc/administration/reference_architectures/5k_users.md
@@ -54,8 +54,9 @@ specifically the [Before you start](index.md#before-you-start) and [Deciding whi
   Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`.
6. Gitaly specifications are based on high percentiles of both usage patterns and repository sizes in good health.
   However, if you have [large monorepos](index.md#large-monorepos) (larger than several gigabytes) or [additional workloads](index.md#additional-workloads) these can *significantly* impact Git and Gitaly performance and further adjustments will likely be required.
-7. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
-   However, for GitLab Rails certain processes like [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) should be run on only one node.
+6. Can be placed in Auto Scaling Groups (ASGs) as the component doesn't store any [stateful data](index.md#autoscaling-of-stateful-nodes).
+   However, [Cloud Native Hybrid setups](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) are generally preferred as certain components
+   such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.

NOTE:

diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 7d362ef373959d..5d515fb313182f 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -474,12 +474,12 @@ This also applies to other third-party stateful components such as Postgres and

 #### Autoscaling of stateful nodes

 As a general guidance, only _stateless_ components of GitLab can be run in Autoscaling groups, namely GitLab Rails
-and Sidekiq.
-
-Other components that have state, such as Gitaly, are not supported in this fashion (for more information, see [issue 2997](https://gitlab.com/gitlab-org/gitaly/-/issues/2997)).
+and Sidekiq. Other components that have state, such as Gitaly, are not supported in this fashion (for more information, see [issue 2997](https://gitlab.com/gitlab-org/gitaly/-/issues/2997)). 
This also applies to other third-party stateful components such as Postgres and Redis, but you can explore other third-party solutions for those components if desired such as supported Cloud Provider services unless called out specifically as unsupported.

+However, [Cloud Native Hybrid setups](#cloud-native-hybrid) are generally preferred over ASGs as certain components such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.
+
 #### Spreading one environment over multiple data centers

 Deploying one GitLab environment over multiple data centers is not supported due to potential split brain edge cases
-- GitLab


From d4c406ca0a1ebbc17a203e87b81d11eaa4f6052a Mon Sep 17 00:00:00 2001
From: Grant Young
Date: Thu, 25 Apr 2024 14:11:56 +0100
Subject: [PATCH 10/20] Fix broken link

---
 doc/administration/reference_architectures/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 5d515fb313182f..bd4897b4350f54 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -478,7 +478,7 @@ and Sidekiq. Other components that have state, such as Gitaly, are not supported

 This also applies to other third-party stateful components such as Postgres and Redis, but you can explore other third-party solutions for those components if desired such as supported Cloud Provider services unless called out specifically as unsupported.

-However, [Cloud Native Hybrid setups](#cloud-native-hybrid) are generally preferred over ASGs as certain components such as [migrations](#gitlab-rails-post-configuration) and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.
+However, [Cloud Native Hybrid setups](#cloud-native-hybrid) are generally preferred over ASGs as certain components such as database migrations and [Mailroom](../incoming_email.md) can only be run on one node, which is handled better in Kubernetes.

 #### Spreading one environment over multiple data centers
-- GitLab


From 6eff0799d0f746751b24e15297729e20bc3ff9e6 Mon Sep 17 00:00:00 2001
From: Grant Young
Date: Fri, 26 Apr 2024 10:37:02 +0100
Subject: [PATCH 11/20] Add new guidance on NGinx pods

---
 .../reference_architectures/10k_users.md | 10 ++++++++--
 .../reference_architectures/25k_users.md | 10 ++++++++--
 doc/administration/reference_architectures/2k_users.md | 10 ++++++++--
 doc/administration/reference_architectures/3k_users.md | 10 ++++++++--
 .../reference_architectures/50k_users.md | 10 ++++++++--
 doc/administration/reference_architectures/5k_users.md | 10 ++++++++--
 doc/administration/reference_architectures/index.md | 2 +-
 7 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md
index c9384c3a5f6440..e3ddcc3c17b4f5 100644
--- a/doc/administration/reference_architectures/10k_users.md
+++ b/doc/administration/reference_architectures/10k_users.md
@@ -2404,7 +2404,7 @@ Each Webservice pod (Puma and Workhorse) is recommended to be run with the follo

 For 200 RPS or 10,000 users we recommend a total Puma worker count of around 80 so in turn it's recommended to run at
least 20 Webservice pods. 
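To make the pod arithmetic above concrete, a hedged sketch of how the 200 RPS / 10,000 user target could map onto the chart's `workerProcesses`, `minReplicas`, and `maxReplicas` settings, with the earlier 75% autoscaling floor guidance applied. Values are illustrative rather than tested settings:

```yaml
# 80 Puma workers at 4 workers per pod = 20 Webservice pods.
gitlab:
  webservice:
    workerProcesses: 4
    maxReplicas: 20   # full 200 RPS / 10,000 user target
    minReplicas: 15   # ~75% floor to preserve ongoing performance
```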
-For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). +For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). + +##### NGinx + +It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. + +Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. #### Sidekiq @@ -2418,7 +2424,7 @@ Each Sidekiq pod is recommended to be run with the following configuration: Similar to the standard deployment above, an initial target of 14 Sidekiq workers has been used here. Additional workers may be required depending on your specific workflow. -For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). +For further information on Sidekiq resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). ### Supporting diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index fa78da0c3a4510..8cd2aecd7bdb98 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -2409,7 +2409,13 @@ Each Webservice pod (Puma and Workhorse) is recommended to be run with the follo For 500 RPS or 25,000 users we recommend a total Puma worker count of around 140 so in turn it's recommended to run at least 35 Webservice pods. -For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). +For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). + +##### NGinx + +It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. + +Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. #### Sidekiq @@ -2423,7 +2429,7 @@ Each Sidekiq pod is recommended to be run with the following configuration: Similar to the standard deployment above, an initial target of 14 Sidekiq workers has been used here. Additional workers may be required depending on your specific workflow. -For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). +For further information on Sidekiq resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). 
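As an illustration of the NGinx guidance above, a hedged sketch of the bundled controller's settings. `controller.kind: DaemonSet` follows the upstream `ingress-nginx` chart that the GitLab chart bundles, and the `workload: webservice` node label is a hypothetical example:

```yaml
# Hedged sketch: run the bundled NGINX controller as a DaemonSet on the
# Webservice node pool (assumes those nodes carry the label below).
nginx-ingress:
  controller:
    kind: DaemonSet
    nodeSelector:
      workload: webservice
```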
### Supporting diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index c4ffb7e6ef3a44..351c340dd55add 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -1206,7 +1206,13 @@ Each Webservice pod (Puma and Workhorse) is recommended to be run with the follo For 40 RPS or 2,000 users we recommend a total Puma worker count of around 12 so in turn it's recommended to run at least 3 Webservice pods. -For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). +For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). + +##### NGinx + +It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. + +Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. #### Sidekiq @@ -1220,7 +1226,7 @@ Each Sidekiq pod is recommended to be run with the following configuration: Similar to the standard deployment above, an initial target of 4 Sidekiq workers has been used here. Additional workers may be required depending on your specific workflow. -For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). +For further information on Sidekiq resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). ### Supporting diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index 8cdf28df21e60e..2a7260d9ecc988 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -2387,7 +2387,13 @@ Each Webservice pod (Puma and Workhorse) is recommended to be run with the follo For 60 RPS or 3,000 users we recommend a total Puma worker count of around 16 so in turn it's recommended to run at least 4 Webservice pods. -For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). +For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). + +##### NGinx + +It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. + +Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. 
#### Sidekiq

@@ -2401,7 +2407,7 @@ Each Sidekiq pod is recommended to be run with the following configuration:

 Similar to the standard deployment above, an initial target of 8 Sidekiq workers has been used here.
 Additional workers may be required depending on your specific workflow.

-For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
+For further information on Sidekiq resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).

 ### Supporting

diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index 420948b0f39f94..026c1a27d7a74b 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -2423,7 +2423,7 @@ Each Webservice pod (Puma and Workhorse) is recommended to be run with the follo

 For 1000 RPS or 50,000 users we recommend a total Puma worker count of around 308 so in turn it's recommended to run at
least 77 Webservice pods.

-For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+
+##### NGinx
+
+It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have.
+
+Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic.

 #### Sidekiq

@@ -2437,7 +2443,7 @@ Each Sidekiq pod is recommended to be run with the following configuration:

 Similar to the standard deployment above, an initial target of 8 Sidekiq workers has been used here.
 Additional workers may be required depending on your specific workflow.

-For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).
+For further information on Sidekiq resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources).

 ### Supporting

diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md
index 301dc3c7557570..ee9cd71011bb19 100644
--- a/doc/administration/reference_architectures/5k_users.md
+++ b/doc/administration/reference_architectures/5k_users.md
@@ -2362,7 +2362,7 @@ Each Webservice pod (Puma and Workhorse) is recommended to be run with the follo

 For 100 RPS or 5,000 users we recommend a total Puma worker count of around 36 so in turn it's recommended to run at
least 9 Webservice pods.

-For further information on resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
+For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). 
+ +##### NGinx + +It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. + +Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. #### Sidekiq @@ -2376,7 +2382,7 @@ Each Sidekiq pod is recommended to be run with the following configuration: Similar to the standard deployment above, an initial target of 8 Sidekiq workers has been used here. Additional workers may be required depending on your specific workflow. -For further information on resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). +For further information on Sidekiq resource usage, see the Charts documentation on [Sidekiq resources](https://docs.gitlab.com/charts/charts/gitlab/sidekiq/#resources). ### Supporting diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md index bd4897b4350f54..a35ac47fde80e4 100644 --- a/doc/administration/reference_architectures/index.md +++ b/doc/administration/reference_architectures/index.md @@ -769,7 +769,7 @@ You can find a full history of changes [on the GitLab project](https://gitlab.co **2024:** -- [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/149878): Updated recommended sizings for Webservice nodes for Cloud Native Hybrids on GCP. +- [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/149878): Updated recommended sizings for Webservice nodes for Cloud Native Hybrids on GCP. Also adjusted NGinx pod recommendation to be run on Webservice node pool as a DaemonSet. - [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/149528): Updated 20 RPS / 1,000 User architecture specs to follow recommended memory target of 16 GB. - [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/148313): Updated Reference Architecture titles to include RPS for further clarity and to help right sizing. - [2024-02](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/145436): Updated recommended sizings for Load Balancer nodes if deployed on VMs. Also added notes on network bandwidth considerations. -- GitLab From 8a14c54ad2e5992cf17c69f2c70c8c67e7a5e926 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Fri, 26 Apr 2024 10:57:53 +0100 Subject: [PATCH 12/20] Add more context to Cost to Run section --- doc/administration/reference_architectures/index.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md index a35ac47fde80e4..b65792c126510d 100644 --- a/doc/administration/reference_architectures/index.md +++ b/doc/administration/reference_architectures/index.md @@ -633,11 +633,16 @@ table.test-coverage th { ## Cost to run -As a starting point, the following table details initial costs for the different reference architectures across GCP, AWS, and Azure through the Linux package. +As a starting point, the following table details initial costs for the different reference architectures across GCP, AWS, and Azure through the Linux package via each cloud provider's official calculator. 
-NOTE:
-Due to the nature of Cloud Native Hybrid, it's not possible to give a static cost calculation.
-Bare-metal costs are also not included here as it varies widely depending on each configuration.
+However, please be aware of the following caveats:
+
+- These are only rough estimates for the Linux package environments.
+- They do not take into account dynamic elements such as disk, network, or object storage.
+- Due to the nature of Cloud Native Hybrid, it's not possible to give a static cost calculation for that deployment.
+- Bare-metal costs are also not included here as they vary widely depending on each configuration.
+
+Due to the above, it's strongly recommended taking these calculators and adjusting them to match your specific setup and usage as closely as possible to get a more accurate estimate.

-- 
GitLab


From 5981d693e5aeeb4f823223c447ed8f6f925b81a6 Mon Sep 17 00:00:00 2001
From: Grant Young
Date: Fri, 26 Apr 2024 11:04:52 +0100
Subject: [PATCH 13/20] Lint fix for proper name

---
 doc/administration/reference_architectures/10k_users.md | 6 +++---
 doc/administration/reference_architectures/25k_users.md | 6 +++---
 doc/administration/reference_architectures/2k_users.md  | 6 +++---
 doc/administration/reference_architectures/3k_users.md  | 6 +++---
 doc/administration/reference_architectures/50k_users.md | 6 +++---
 doc/administration/reference_architectures/5k_users.md  | 6 +++---
 doc/administration/reference_architectures/index.md     | 2 +-
 7 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md
index e3ddcc3c17b4f5..15894adb47e928 100644
--- a/doc/administration/reference_architectures/10k_users.md
+++ b/doc/administration/reference_architectures/10k_users.md
@@ -2406,11 +2406,11 @@ least 20 Webservice pods.

For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).

-##### NGinx
+##### NGINX

-It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have.
+It's also recommended deploying the NGINX controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have.

-Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic.
+Note that this isn't a strict requirement. The NGINX controller pods can be deployed as desired as long as they have enough resources to handle the web traffic.

#### Sidekiq

diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md
index 8cd2aecd7bdb98..236cabebed3978 100644
--- a/doc/administration/reference_architectures/25k_users.md
+++ b/doc/administration/reference_architectures/25k_users.md
@@ -2411,11 +2411,11 @@ least 35 Webservice pods.

For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources).
-##### NGinx +##### NGINX -It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. +It's also recommended deploying the NGINX controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. -Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. +Note that this isn't a strict requirement. The NGINX controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. #### Sidekiq diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index 351c340dd55add..faeb3f60a86e8d 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -1208,11 +1208,11 @@ least 3 Webservice pods. For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). -##### NGinx +##### NGINX -It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. +It's also recommended deploying the NGINX controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. -Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. +Note that this isn't a strict requirement. The NGINX controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. #### Sidekiq diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index 2a7260d9ecc988..b9e77c04ad2803 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -2389,11 +2389,11 @@ least 4 Webservice pods. For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). -##### NGinx +##### NGINX -It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. +It's also recommended deploying the NGINX controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. 
-Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. +Note that this isn't a strict requirement. The NGINX controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. #### Sidekiq diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index 026c1a27d7a74b..1e05cdbaa3ef06 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -2425,11 +2425,11 @@ least 77 Webservice pods. For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). -##### NGinx +##### NGINX -It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. +It's also recommended deploying the NGINX controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. -Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. +Note that this isn't a strict requirement. The NGINX controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. #### Sidekiq diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index ee9cd71011bb19..f509f33de008b8 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -2364,11 +2364,11 @@ least 9 Webservice pods. For further information on Webservice resource usage, see the Charts documentation on [Webservice resources](https://docs.gitlab.com/charts/charts/gitlab/webservice/#resources). -##### NGinx +##### NGINX -It's also recommended deploying the NGinx controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. +It's also recommended deploying the NGINX controller pods across the Webservice nodes as a DaemonSet. This is to allow the controllers to scale dynamically with the Webservice pods they serve as well as take advantage of the higher network bandwidth larger machine types typically have. -Note that this isn't a strict requirement. The NGinx controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. +Note that this isn't a strict requirement. The NGINX controller pods can be deployed as desired as long as they have enough resources to handle the web traffic. 
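+
+As a rough sketch, with the GitLab chart this can be done by switching the bundled `nginx-ingress` subchart's controller to a DaemonSet (the `workload` node label below is a hypothetical example for the Webservice node pool; adjust to match your own labels):
+
+```yaml
+nginx-ingress:
+  controller:
+    kind: DaemonSet
+    nodeSelector:
+      workload: webservice  # hypothetical label on the Webservice node pool
+```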
#### Sidekiq diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md index b65792c126510d..34c3d7f6912752 100644 --- a/doc/administration/reference_architectures/index.md +++ b/doc/administration/reference_architectures/index.md @@ -774,7 +774,7 @@ You can find a full history of changes [on the GitLab project](https://gitlab.co **2024:** -- [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/149878): Updated recommended sizings for Webservice nodes for Cloud Native Hybrids on GCP. Also adjusted NGinx pod recommendation to be run on Webservice node pool as a DaemonSet. +- [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/149878): Updated recommended sizings for Webservice nodes for Cloud Native Hybrids on GCP. Also adjusted NGINX pod recommendation to be run on Webservice node pool as a DaemonSet. - [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/149528): Updated 20 RPS / 1,000 User architecture specs to follow recommended memory target of 16 GB. - [2024-04](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/148313): Updated Reference Architecture titles to include RPS for further clarity and to help right sizing. - [2024-02](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/145436): Updated recommended sizings for Load Balancer nodes if deployed on VMs. Also added notes on network bandwidth considerations. -- GitLab From 1d8d45296bb3e27c39959305bb72a55f9c1cde43 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Tue, 30 Apr 2024 13:22:34 +0100 Subject: [PATCH 14/20] Adjust intro text --- doc/administration/reference_architectures/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md index 34c3d7f6912752..ea770588dff733 100644 --- a/doc/administration/reference_architectures/index.md +++ b/doc/administration/reference_architectures/index.md @@ -12,7 +12,7 @@ DETAILS: **Offering:** Self-managed The GitLab Reference Architectures have been designed and tested by the -GitLab Test Platform and Support teams to provide scalable recommended deployments for target loads. +GitLab Test Platform and Support teams to provide scalable recommended deployments as starting points for target loads. 
## Available reference architectures -- GitLab From 9a66005072a1eac6804cf7d8b52c3be1be061fe8 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Tue, 30 Apr 2024 16:56:35 +0100 Subject: [PATCH 15/20] Adjust section title in main doc --- .../reference_architectures/10k_users.md | 4 ++-- .../reference_architectures/1k_users.md | 2 +- .../reference_architectures/25k_users.md | 4 ++-- .../reference_architectures/2k_users.md | 2 +- .../reference_architectures/3k_users.md | 2 +- .../reference_architectures/50k_users.md | 4 ++-- .../reference_architectures/5k_users.md | 4 ++-- .../reference_architectures/index.md | 16 +++++++++------- 8 files changed, 20 insertions(+), 18 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index 15894adb47e928..a6326480c71b44 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -17,13 +17,13 @@ For a full list of reference architectures, see NOTE: Before deploying this architecture it's recommended to read through the [main documentation](index.md) first, -specifically the [Before you start](index.md#before-you-start) and [Deciding which architecture to use](index.md#deciding-which-architecture-to-use) sections. +specifically the [Before you start](index.md#before-you-start) and [Deciding which architecture to use](index.md#deciding-which-architecture-to-start-with) sections. > - **Target load:** API: 200 RPS, Web: 20 RPS, Git (Pull): 20 RPS, Git (Push): 4 RPS > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) > - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) -> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-use) +> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-start-with) | Service | Nodes | Configuration | GCP | AWS | Azure | |------------------------------------------|-------|-------------------------|------------------|----------------|-----------| diff --git a/doc/administration/reference_architectures/1k_users.md b/doc/administration/reference_architectures/1k_users.md index d86d2daaf29c8e..1628c929555238 100644 --- a/doc/administration/reference_architectures/1k_users.md +++ b/doc/administration/reference_architectures/1k_users.md @@ -21,7 +21,7 @@ For a full list of reference architectures, see > - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid:** No. For a cloud native hybrid environment, you > can follow a [modified hybrid reference architecture](#cloud-native-hybrid-reference-architecture-with-helm-charts). -> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-use). +> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-start-with). 
| Users | Configuration | GCP | AWS | Azure | |--------------|----------------------|----------------|--------------|----------| diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index 236cabebed3978..4e4e754fb7d1a9 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -17,13 +17,13 @@ For a full list of reference architectures, see NOTE: Before deploying this architecture it's recommended to read through the [main documentation](index.md) first, -specifically the [Before you start](index.md#before-you-start) and [Deciding which architecture to use](index.md#deciding-which-architecture-to-use) sections. +specifically the [Before you start](index.md#before-you-start) and [Deciding which architecture to use](index.md#deciding-which-architecture-to-start-with) sections. > - **Target load:** API: 500 RPS, Web: 50 RPS, Git (Pull): 50 RPS, Git (Push): 10 RPS > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) > - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) -> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-use) +> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-start-with) | Service | Nodes | Configuration | GCP | AWS | Azure | |------------------------------------------|-------|-------------------------|------------------|--------------|-----------| diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index faeb3f60a86e8d..7d06e39f0abfda 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -20,7 +20,7 @@ For a full list of reference architectures, see > follow a modified [3K or 60 RPS reference architecture](3k_users.md#supported-modifications-for-lower-user-counts-ha). > - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) -> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-use). +> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-start-with). 
| Service | Nodes | Configuration | GCP | AWS | Azure | |------------------------------------|-------|------------------------|-----------------|--------------|----------| diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index b9e77c04ad2803..2f7c3247ce7c83 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -23,7 +23,7 @@ For a full list of reference architectures, see > - **High Availability:** Yes, although [Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution > - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) -> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-use). +> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-start-with). | Service | Nodes | Configuration | GCP | AWS | Azure | |-------------------------------------------|-------|-----------------------|-----------------|--------------|----------| diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index 1e05cdbaa3ef06..36f8a8fc0bb0f9 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -17,13 +17,13 @@ For a full list of reference architectures, see NOTE: Before deploying this architecture it's recommended to read through the [main documentation](index.md) first, -specifically the [Before you start](index.md#before-you-start) and [Deciding which architecture to use](index.md#deciding-which-architecture-to-use) sections. +specifically the [Before you start](index.md#before-you-start) and [Deciding which architecture to use](index.md#deciding-which-architecture-to-start-with) sections. 
> - **Target load:** API: 1000 RPS, Web: 100 RPS, Git (Pull): 100 RPS, Git (Push): 20 RPS > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) > - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) -> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-use) +> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-start-with) | Service | Nodes | Configuration | GCP | AWS | Azure | |------------------------------------------|-------|-------------------------|------------------|---------------|-----------| diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index f509f33de008b8..8656323611816a 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -17,13 +17,13 @@ For a full list of reference architectures, see NOTE: Before deploying this architecture it's recommended to read through the [main documentation](index.md) first, -specifically the [Before you start](index.md#before-you-start) and [Deciding which architecture to use](index.md#deciding-which-architecture-to-use) sections. +specifically the [Before you start](index.md#before-you-start) and [Deciding which architecture to use](index.md#deciding-which-architecture-to-start-with) sections. > - **Target load:** API: 100 RPS, Web: 10 RPS, Git (Pull): 10 RPS, Git (Push): 2 RPS > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) > - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) -> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-use) +> - **Unsure which Reference Architecture to use?** [Go to this guide for more info](index.md#deciding-which-architecture-to-start-with) | Service | Nodes | Configuration | GCP | AWS | Azure | |-------------------------------------------|-------|-------------------------|-----------------|--------------|----------| diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md index ea770588dff733..dc71c839d8ec41 100644 --- a/doc/administration/reference_architectures/index.md +++ b/doc/administration/reference_architectures/index.md @@ -56,19 +56,19 @@ Running any application in production is complex, and the same applies for GitLa As such, it's recommended that you have a working knowledge of running and maintaining applications in production when deciding on going down this route. If you aren't in this position, our [Professional Services](https://about.gitlab.com/services/#implementation-services) team offers implementation services, but for those who want a more managed solution long term, it's recommended to instead explore our other offerings such as [GitLab SaaS](../../subscriptions/gitlab_com/index.md) or [GitLab Dedicated](../../subscriptions/gitlab_dedicated/index.md). 
-If Self Managed is the approach you're considering, it's strongly encouraged to read through this page in full, in particular the [Deciding which architecture to use](#deciding-which-architecture-to-use), [Large monorepos](#large-monorepos) and [Additional workloads](#additional-workloads) sections.
+If Self Managed is the approach you're considering, it's strongly encouraged to read through this page in full, in particular the [Deciding which architecture to use](#deciding-which-architecture-to-start-with), [Large monorepos](#large-monorepos) and [Additional workloads](#additional-workloads) sections.

-## Deciding which architecture to use
+## Deciding which architecture to start with

-The Reference Architectures are designed to strike a balance between two important factors--performance and resilience.
+The Reference Architectures are designed to strike a balance between three important factors--performance, resilience, and costs.

-While they are designed to make it easier to set up GitLab at scale, it can still be a challenge to know which one meets your requirements.
+While they are designed to make it easier to set up GitLab at scale, it can still be a challenge to know which one meets your requirements and where to start accordingly.

As a general guide, **the more performant and/or resilient you want your environment to be, the more complex it is**.

-This section explains the designs you can choose from. It begins with the least complexity, goes to the most, and ends with a decision tree.
+This section explains the things to consider when picking a Reference Architecture to start with.

-### Expected Load (RPS or user count)
+### Expected Load

The first thing to check is the peak load your environment is expected to serve.

Each architecture is described in terms of peak Requests per Second (RPS) or user count load. As detailed under the "Testing Methodology" section on each page, each architecture is tested
-against its listed RPS for each endpoint type (API, Web, Git), which is the typical peak load of the given user count, both manual and automated, with headroom.
+against its listed RPS for each endpoint type (API, Web, Git), which is the typical peak load of the given user count, both manual and automated.

It's strongly recommended finding out what peak RPS your environment will be expected to handle across endpoint types, through existing metrics (such as [Prometheus](../monitoring/prometheus/gitlab_metrics.md))
or estimates, and to select the corresponding architecture as this is the most objective.

+#### If in doubt, pick the closest user count and scale accordingly
+
If it's not possible for you to find out the expected peak RPS then it's recommended to select based on user count to start and then monitor the environment
-closely to confirm the RPS, whether the architecture is performing and adjust accordingly is necessary.
+closely to confirm the RPS, check whether the architecture is performing as expected, and [scale accordingly](#scaling-an-environment) as necessary.
### Standalone (non-HA)

-- 
GitLab


From 415630970e98e5ac92c9330e2920af2d1cb51e20 Mon Sep 17 00:00:00 2001
From: Grant Young
Date: Thu, 2 May 2024 17:28:05 +0100
Subject: [PATCH 16/20] Further tweaks and polish

---
 .../reference_architectures/10k_users.md |  4 ++--
 .../reference_architectures/1k_users.md  |  4 ++--
 .../reference_architectures/25k_users.md |  4 ++--
 .../reference_architectures/2k_users.md  |  4 ++--
 .../reference_architectures/3k_users.md  |  4 ++--
 .../reference_architectures/50k_users.md |  4 ++--
 .../reference_architectures/5k_users.md  |  4 ++--
 .../reference_architectures/index.md     | 18 +++++++++---------
 8 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md
index a6326480c71b44..eb9b9d0f42b7ad 100644
--- a/doc/administration/reference_architectures/10k_users.md
+++ b/doc/administration/reference_architectures/10k_users.md
@@ -10,7 +10,7 @@ DETAILS:
**Tier:** Premium, Ultimate
**Offering:** Self-managed

-This page describes the GitLab reference architecture designed to target a peak load of 200 requests per second (RPS), the typical peak load of up to 10,000 users, both manual and automated, based on real data with headroom added.
+This page describes the GitLab reference architecture designed to target a peak load of 200 requests per second (RPS), the typical peak load of up to 10,000 users, both manual and automated, based on real data.

For a full list of reference architectures, see
[Available reference architectures](index.md#available-reference-architectures).

@@ -166,7 +166,7 @@ against the following endpoint throughput targets:
- Git (Push): 4 RPS

The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads along with additional substantial headroom added.
+including CI and other workloads.

If you have metrics to suggest that you regularly have higher throughput against the above endpoint targets, [large monorepos](index.md#large-monorepos), or notable [additional workloads](index.md#additional-workloads), these can notably impact the performance of the environment and [further adjustments may be required](index.md#scaling-an-environment).

diff --git a/doc/administration/reference_architectures/1k_users.md b/doc/administration/reference_architectures/1k_users.md
index 1628c929555238..f3380347bf5dd9 100644
--- a/doc/administration/reference_architectures/1k_users.md
+++ b/doc/administration/reference_architectures/1k_users.md
@@ -10,7 +10,7 @@ DETAILS:
**Tier:** Free, Premium, Ultimate
**Offering:** Self-managed

-This page describes the GitLab reference architecture designed to target a peak load of 20 requests per second (RPS), the typical peak load of up to 1,000 users, both manual and automated, based on real data with headroom added.
+This page describes the GitLab reference architecture designed to target a peak load of 20 requests per second (RPS), the typical peak load of up to 1,000 users, both manual and automated, based on real data.

For a full list of reference architectures, see
[Available reference architectures](index.md#available-reference-architectures).

@@ -89,7 +89,7 @@ against the following endpoint throughput targets:
- Git (Push): 1 RPS

The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads along with additional substantial headroom added.
+including CI and other workloads.

If you have metrics to suggest that you regularly have higher throughput against the above endpoint targets, [large monorepos](index.md#large-monorepos), or notable [additional workloads](index.md#additional-workloads), these can notably impact the performance of the environment and [further adjustments may be required](index.md#scaling-an-environment).

diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md
index 236cabebed3978..a350a7ffdf568d 100644
--- a/doc/administration/reference_architectures/25k_users.md
+++ b/doc/administration/reference_architectures/25k_users.md
@@ -10,7 +10,7 @@ DETAILS:
**Tier:** Premium, Ultimate
**Offering:** Self-managed

-This page describes the GitLab reference architecture designed to target a peak load of 500 requests per second (RPS) - The typical peak load of up to 25,000 users, both manual and automated, based on real data with headroom added.
+This page describes the GitLab reference architecture designed to target a peak load of 500 requests per second (RPS), the typical peak load of up to 25,000 users, both manual and automated, based on real data.

For a full list of reference architectures, see
[Available reference architectures](index.md#available-reference-architectures).

@@ -166,7 +166,7 @@ against the following endpoint throughput targets:
- Git (Push): 10 RPS

The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads along with additional substantial headroom added.
+including CI and other workloads.

If you have metrics to suggest that you regularly have higher throughput against the above endpoint targets, [large monorepos](index.md#large-monorepos), or notable [additional workloads](index.md#additional-workloads), these can notably impact the performance of the environment and [further adjustments may be required](index.md#scaling-an-environment).

diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index 7d06e39f0abfda..b1a8ded88e43c2 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -10,7 +10,7 @@ DETAILS:
**Tier:** Free, Premium, Ultimate
**Offering:** Self-managed

-This page describes the GitLab reference architecture designed to target a peak load of 40 requests per second (RPS), the typical peak load of up to 2,000 users, both manual and automated, based on real data with headroom added.
+This page describes the GitLab reference architecture designed to target a peak load of 40 requests per second (RPS), the typical peak load of up to 2,000 users, both manual and automated, based on real data.

For a full list of reference architectures, see
[Available reference architectures](index.md#available-reference-architectures).

@@ -109,7 +109,7 @@ against the following endpoint throughput targets:
- Git (Push): 1 RPS

The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads along with additional substantial headroom added.
+including CI and other workloads.
If you have metrics to suggest that you regularly have higher throughput against the above endpoint targets, [large monorepos](index.md#large-monorepos), or notable [additional workloads](index.md#additional-workloads), these can notably impact the performance of the environment and [further adjustments may be required](index.md#scaling-an-environment).

diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index 2f7c3247ce7c83..5ac125771befa0 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -10,7 +10,7 @@ DETAILS:
**Tier:** Premium, Ultimate
**Offering:** Self-managed

-This page describes the GitLab reference architecture designed to target a peak load of 60 requests per second (RPS), the typical peak load of up to 3,000 users, both manual and automated, based on real data with headroom added.
+This page describes the GitLab reference architecture designed to target a peak load of 60 requests per second (RPS), the typical peak load of up to 3,000 users, both manual and automated, based on real data.

This architecture is the smallest one available with HA built in. If you require HA but have a lower user count or total load, the [Supported Modifications for lower user counts](#supported-modifications-for-lower-user-counts-ha)

@@ -161,7 +161,7 @@ against the following endpoint throughput targets:
- Git (Push): 1 RPS

The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads along with additional substantial headroom added.
+including CI and other workloads.

If you have metrics to suggest that you regularly have higher throughput against the above endpoint targets, [large monorepos](index.md#large-monorepos), or notable [additional workloads](index.md#additional-workloads), these can notably impact the performance of the environment and [further adjustments may be required](index.md#scaling-an-environment).

diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index 36f8a8fc0bb0f9..f07456564d22da 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -10,7 +10,7 @@ DETAILS:
**Tier:** Premium, Ultimate
**Offering:** Self-managed

-This page describes the GitLab reference architecture designed to target a peak load of 1000 requests per second (RPS), the typical peak load of up to 50,000 users, both manual and automated, based on real data with headroom added.
+This page describes the GitLab reference architecture designed to target a peak load of 1000 requests per second (RPS), the typical peak load of up to 50,000 users, both manual and automated, based on real data.

For a full list of reference architectures, see
[Available reference architectures](index.md#available-reference-architectures).

@@ -165,7 +165,7 @@ against the following endpoint throughput targets:
- Git (Push): 20 RPS

The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads along with additional substantial headroom added.
+including CI and other workloads.
If you have metrics to suggest that you regularly have higher throughput against the above endpoint targets, [large monorepos](index.md#large-monorepos), or notable [additional workloads](index.md#additional-workloads), these can notably impact the performance of the environment and [further adjustments may be required](index.md#scaling-an-environment).

diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md
index 8656323611816a..13e8d0b81a4db3 100644
--- a/doc/administration/reference_architectures/5k_users.md
+++ b/doc/administration/reference_architectures/5k_users.md
@@ -10,7 +10,7 @@ DETAILS:
**Tier:** Premium, Ultimate
**Offering:** Self-managed

-This page describes the GitLab reference architecture designed to target a peak load of 100 requests per second (RPS) - The typical peak load of up to 5,000 users, both manual and automated, based on real data with headroom added.
+This page describes the GitLab reference architecture designed to target a peak load of 100 requests per second (RPS), the typical peak load of up to 5,000 users, both manual and automated, based on real data.

For a full list of reference architectures, see
[Available reference architectures](index.md#available-reference-architectures).

@@ -161,7 +161,7 @@ against the following endpoint throughput targets:
- Git (Push): 2 RPS

The above targets were selected based on real customer data of total environmental loads corresponding to the user count,
-including CI and other workloads along with additional substantial headroom added.
+including CI and other workloads.

If you have metrics to suggest that you regularly have higher throughput against the above endpoint targets, [large monorepos](index.md#large-monorepos), or notable [additional workloads](index.md#additional-workloads), these can notably impact the performance of the environment and [further adjustments may be required](index.md#scaling-an-environment).

diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index dc71c839d8ec41..8d949cfbf48b40 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -12,16 +12,16 @@ DETAILS:
**Offering:** Self-managed

The GitLab Reference Architectures have been designed and tested by the
-GitLab Test Platform and Support teams to provide scalable recommended deployments as starting points for target loads.
+GitLab Test Platform and Support teams to provide recommended scalable and elastic deployments as starting points for target loads.

## Available reference architectures

The following Reference Architectures are available as recommended starting points for your environment.

-The architectures are named in terms of peak load, based on user count or Requests per Second (RPS). Where the latter has been calculated based on average real data of the former with headroom added.
+The architectures are named in terms of peak load, based on user count or Requests per Second (RPS), where the latter has been calculated based on average real data.

NOTE:
-Each architecture has been designed to be [scalable and can be adjusted accordingly if required](#scaling-an-environment) by your specific workload. This may be likely in known heavy scenarios such as using [large monorepos](#large-monorepos) or notable [additional workloads](#additional-workloads).
+Each architecture has been designed to be [scalable and elastic](#scaling-an-environment).
As such, they can be adjusted accordingly if required by your specific workload. This is more likely in known heavy scenarios such as using [large monorepos](#large-monorepos) or notable [additional workloads](#additional-workloads).

For details about what each Reference Architecture has been tested against, see the "Testing Methodology" section of each page.

@@ -73,7 +73,7 @@ This section explains the things to consider when picking a Reference Architectu

The first thing to check is the peak load your environment is expected to serve.

Each architecture is described in terms of peak Requests per Second (RPS) or user count load. As detailed under the "Testing Methodology" section on each page, each architecture is tested
-against its listed RPS for each endpoint type (API, Web, Git), which is the typical peak load of the given user count, both manual and automated, with headroom.
+against its listed RPS for each endpoint type (API, Web, Git), which is the typical peak load of the given user count, both manual and automated.

It's strongly recommended finding out what peak RPS your environment will be expected to handle across endpoint types, through existing metrics (such as [Prometheus](../monitoring/prometheus/gitlab_metrics.md))
or estimates, and to select the corresponding architecture as this is the most objective.

@@ -269,7 +269,7 @@ the following guidance is followed to ensure the best chance of good performance

### Additional workloads

These reference architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab
-setups with good headroom in mind to cover most scenarios.
+setups based on real data.

However, additional workloads can multiply the impact of operations by triggering follow-up actions. You may need to adjust the suggested specifications to compensate if you use, for example:

@@ -309,12 +309,12 @@ We don’t recommend the use of round-robin algorithms as they are known to not

The total network bandwidth available to a load balancer when deployed on a machine can vary notably across Cloud Providers. In particular some Cloud Providers, like [AWS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html), may operate on a burst system with credits to determine the bandwidth at any time.

-The network bandwidth your environment's load balancers will require is dependent on numerous factors such as data shape and workload. The recommended base sizes for each Reference Architecture class have been selected to give a good level of bandwidth with adequate headroom but in some scenarios, such as consistent clones of [large monorepos](#large-monorepos), the sizes may need to be adjusted accordingly.
+The network bandwidth your environment's load balancers will require is dependent on numerous factors such as data shape and workload. The recommended base sizes for each Reference Architecture class have been selected based on read data but in some scenarios, such as consistent clones of [large monorepos](#large-monorepos), the sizes may need to be adjusted accordingly.

### No swap

Swap is not recommended in the reference architectures. It's a failsafe that impacts performance greatly. The
-reference architectures are designed to have memory headroom to avoid needing swap.
+reference architectures are designed to have enough memory in most cases to avoid needing swap.
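+
+A quick sketch for verifying this on a node (standard Linux tooling, nothing GitLab-specific):
+
+```shell
+swapon --show  # no output means no active swap devices
+free -h        # the Swap row should read 0B
+```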
### Praefect PostgreSQL

@@ -531,7 +531,7 @@ per 1,000 users:
- Git (Pull): 2 RPS
- Git (Push): 0.4 RPS (rounded to the nearest integer)

-The above RPS targets were selected based on real customer data of total environmental loads corresponding to the user count, including CI and other workloads along with additional substantial headroom added.
+The above RPS targets were selected based on real customer data of total environmental loads corresponding to the user count, including CI and other workloads.

### How to interpret the results

@@ -743,7 +743,7 @@ You should take an iterative approach when scaling downwards, however, to ensure

In some cases scaling a component significantly may result in knock-on effects for downstream components, impacting performance. The Reference Architectures were designed with balance in mind to ensure components that depend on each other are congruent in terms of specs. As such you may find when notably scaling a component that its increase may result in additional throughput being passed to the other components it depends on and that they, in turn, may need to be scaled as well.

NOTE:
-As a general rule most components have good headroom to accommodate an upstream component being scaled, so this is typically on a case by case basis and specific to what has been changed. It's recommended for you to reach out to our [Support team](https://about.gitlab.com/support/) before you make any significant changes to the environment.
+The Reference Architectures have been designed to have a decent level elasticity to accommodate an upstream component being scaled. However, it's still generally recommended for you to reach out to our [Support team](https://about.gitlab.com/support/) before you make any significant changes to the environment to be safe.

The following components can impact others when they have been significantly scaled:

-- 
GitLab


From fe6ae13a5debfc8374484fd58df4b79480dc1ac7 Mon Sep 17 00:00:00 2001
From: Grant Young
Date: Thu, 2 May 2024 16:44:51 +0000
Subject: [PATCH 17/20] Apply 2 suggestion(s) to 1 file(s)

Co-authored-by: Kassandra Svoboda

---
 doc/administration/reference_architectures/index.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 8d949cfbf48b40..ee1ffcd4f6ec7c 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -309,7 +309,7 @@ We don’t recommend the use of round-robin algorithms as they are known to not

The total network bandwidth available to a load balancer when deployed on a machine can vary notably across Cloud Providers. In particular some Cloud Providers, like [AWS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html), may operate on a burst system with credits to determine the bandwidth at any time.

-The network bandwidth your environment's load balancers will require is dependent on numerous factors such as data shape and workload. The recommended base sizes for each Reference Architecture class have been selected based on read data but in some scenarios, such as consistent clones of [large monorepos](#large-monorepos), the sizes may need to be adjusted accordingly.
+The network bandwidth your environment's load balancers will require is dependent on numerous factors such as data shape and workload. The recommended base sizes for each Reference Architecture class have been selected based on real data but in some scenarios, such as consistent clones of [large monorepos](#large-monorepos), the sizes may need to be adjusted accordingly.

### No swap

@@ -743,7 +743,7 @@ You should take an iterative approach when scaling downwards, however, to ensure

In some cases scaling a component significantly may result in knock-on effects for downstream components, impacting performance. The Reference Architectures were designed with balance in mind to ensure components that depend on each other are congruent in terms of specs. As such you may find when notably scaling a component that its increase may result in additional throughput being passed to the other components it depends on and that they, in turn, may need to be scaled as well.

NOTE:
-The Reference Architectures have been designed to have a decent level elasticity to accommodate an upstream component being scaled. However, it's still generally recommended for you to reach out to our [Support team](https://about.gitlab.com/support/) before you make any significant changes to the environment to be safe.
+The Reference Architectures have been designed to have elasticity to accommodate an upstream component being scaled. However, it's still generally recommended for you to reach out to our [Support team](https://about.gitlab.com/support/) before you make any significant changes to the environment to be safe.

The following components can impact others when they have been significantly scaled:

-- 
GitLab


From 371fa37749b423ae87bfe70551f6632380a675d6 Mon Sep 17 00:00:00 2001
From: Grant Young
Date: Fri, 3 May 2024 10:07:24 +0100
Subject: [PATCH 18/20] Expand Prometheus query samples

---
 doc/administration/monitoring/prometheus/index.md   | 12 ++++++++++--
 doc/administration/reference_architectures/index.md |  2 +-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/doc/administration/monitoring/prometheus/index.md b/doc/administration/monitoring/prometheus/index.md
index 1bddbbc25c2812..0af13624b6ec05 100644
--- a/doc/administration/monitoring/prometheus/index.md
+++ b/doc/administration/monitoring/prometheus/index.md
@@ -371,12 +371,20 @@ to work with the collected data where you can visualize the output.

For a more fully featured dashboard, Grafana can be used and has [official support for Prometheus](https://prometheus.io/docs/visualization/grafana/).

-Sample Prometheus queries:
+## Sample Prometheus queries
+
+Below are some sample Prometheus queries that can be used.
+
+NOTE:
+These are only examples and may not work on all setups. Further adjustments may be required.
-- **% Memory available:** `((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) or ((node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes)) * 100`
- **% CPU utilization:** `1 - avg without (mode,cpu) (rate(node_cpu_seconds_total{mode="idle"}[5m]))`
+- **% Memory available:** `((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) or ((node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes)) * 100`
- **Data transmitted:** `rate(node_network_transmit_bytes_total{device!="lo"}[5m])`
- **Data received:** `rate(node_network_receive_bytes_total{device!="lo"}[5m])`
+- **Disk read IOPS:** `sum by (instance) (rate(node_disk_reads_completed_total[1m]))`
+- **Disk write IOPS:** `sum by (instance) (rate(node_disk_writes_completed_total[1m]))`
+- **RPS via GitLab transaction count:** `sum(irate(gitlab_transaction_duration_seconds_count{controller!~'HealthController|MetricsController|'}[1m])) by (controller, action)`

## Prometheus as a Grafana data source

diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index ee1ffcd4f6ec7c..05bd5fbb2287df 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -75,7 +75,7 @@ The first thing to check is what the expected peak load is your environment woul

Each architecture is described in terms of peak Requests per Second (RPS) or user count load. As detailed under the "Testing Methodology" section on each page, each architecture is tested
against its listed RPS for each endpoint type (API, Web, Git), which is the typical peak load of the given user count, both manual and automated.

-It's strongly recommended finding out what peak RPS your environment will be expected to handle across endpoint types, through existing metrics (such as [Prometheus](../monitoring/prometheus/gitlab_metrics.md))
+It's strongly recommended finding out what peak RPS your environment will be expected to handle across endpoint types, through existing metrics (such as [Prometheus](../monitoring/prometheus/index.md#sample-prometheus-queries))
or estimates, and to select the corresponding architecture as this is the most objective.
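+
+For example, a rough sketch of estimating peak RPS over the last week via the Prometheus HTTP API (`prometheus.example.com` is a placeholder, and this assumes the GitLab transaction metrics referenced in the sample queries are being scraped):
+
+```shell
+curl --get "http://prometheus.example.com/api/v1/query" \
+  --data-urlencode 'query=max_over_time(sum(irate(gitlab_transaction_duration_seconds_count[1m]))[7d:1m])'
+```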
#### If in doubt, pick the closest user count and scale accordingly -- GitLab From ea1ffc5b1468df5b587d8aefd7e3350d9ca68185 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Tue, 7 May 2024 08:53:00 +0000 Subject: [PATCH 19/20] Apply 7 suggestion(s) to 7 file(s) Co-authored-by: Achilleas Pipinellis --- doc/administration/reference_architectures/10k_users.md | 4 ++-- doc/administration/reference_architectures/25k_users.md | 4 ++-- doc/administration/reference_architectures/2k_users.md | 4 ++-- doc/administration/reference_architectures/3k_users.md | 4 ++-- doc/administration/reference_architectures/50k_users.md | 4 ++-- doc/administration/reference_architectures/5k_users.md | 4 ++-- doc/administration/reference_architectures/index.md | 2 +- 7 files changed, 13 insertions(+), 13 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index eb9b9d0f42b7ad..4d604cce9e9c22 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -2278,8 +2278,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements - For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy. -- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account. -- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account. +- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account. +- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account. - In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices. - Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance. 
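+
+As an illustrative sketch only, a 75% floor against this architecture's 20 Webservice pod target could look like the following HorizontalPodAutoscaler (the Deployment name is hypothetical; match it to your actual release):
+
+```yaml
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: gitlab-webservice
+spec:
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: gitlab-webservice-default  # hypothetical Deployment name
+  minReplicas: 15  # 75% floor of the 20 pod target
+  maxReplicas: 20
+  metrics:
+    - type: Resource
+      resource:
+        name: cpu
+        target:
+          type: Utilization
+          averageUtilization: 75
+```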
diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md
index a350a7ffdf568d..cf9163484a3e15 100644
--- a/doc/administration/reference_architectures/25k_users.md
+++ b/doc/administration/reference_architectures/25k_users.md
@@ -2284,8 +2284,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results)
[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
-- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
-- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
+- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
+- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.
diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index b1a8ded88e43c2..7a814ea2dd21f1 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -1128,8 +1128,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results)
[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
-- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
-- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
+- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
+- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index 5ac125771befa0..bcdbe585981151 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -2266,8 +2266,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results)
[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
-- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
-- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
+- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
+- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.
diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index f07456564d22da..7be79661846146 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -2298,8 +2298,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results)
[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
-- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
-- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
+- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
+- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.
diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md
index 13e8d0b81a4db3..25eec25617d94d 100644
--- a/doc/administration/reference_architectures/5k_users.md
+++ b/doc/administration/reference_architectures/5k_users.md
@@ -2241,8 +2241,8 @@ the overall makeup as desired as long as the minimum CPU and Memory requirements

- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results)
[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary.
- GCP and AWS examples of how to reach the Target Node Pool Total are given for convenience. These sizes are used in performance testing but following the example is not required. Different node pool designs can be used as desired as long as the targets are met, and all pods can deploy.
-- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources will be required for the chosen Kubernetes provider's system processes - The examples as given take this into account.
-- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes will also require resources - The examples as given take this into account.
+- The [Webservice](#webservice) and [Sidekiq](#sidekiq) target node pool totals are given for GitLab components only. Additional resources are required for the chosen Kubernetes provider's system processes. The given examples take this into account.
+- The [Supporting](#supporting) target node pool total is given generally to accommodate several resources for supporting the GitLab deployment as well as any additional deployments you may wish to make depending on your requirements. Similar to the other node pools, the chosen Kubernetes provider's system processes also require resources. The given examples take this into account.
- In production deployments, it's not required to assign pods to specific nodes. However, it is recommended to have several nodes in each pool spread across different availability zones to align with resilient cloud architecture practices.
- Enabling autoscaling, such as Cluster Autoscaler, for efficiency reasons is encouraged, but it's generally recommended targeting a floor of 75% for Webservice and Sidekiq pods to ensure ongoing performance.
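The 75% floor for Webservice and Sidekiq pods can also be pinned at deploy time. A hedged sketch follows: the `minReplicas`/`maxReplicas` keys follow the GitLab Helm chart's horizontal autoscaling settings and should be verified against your chart version, and the replica counts are illustrative only rather than taken from any sizing above.

```shell
# Sketch only: pins HPA floors at roughly 75% of illustrative pod targets
# when deploying the GitLab chart. The release name, namespace, and counts
# are assumptions; verify the value keys against your chart version.
helm upgrade --install gitlab gitlab/gitlab \
  --namespace gitlab \
  --set gitlab.webservice.minReplicas=12 \
  --set gitlab.webservice.maxReplicas=16 \
  --set gitlab.sidekiq.minReplicas=3 \
  --set gitlab.sidekiq.maxReplicas=4
```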
diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 05bd5fbb2287df..3a4797a634d05b 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -12,7 +12,7 @@ DETAILS:
**Offering:** Self-managed

The GitLab Reference Architectures have been designed and tested by the
-GitLab Test Platform and Support teams to provide recommend scalable and elastic deployments as starting points for target loads.
+GitLab Test Platform and Support teams to provide recommended scalable and elastic deployments as starting points for target loads.

## Available reference architectures

--
GitLab

From 5b786cfb2894f1e37d379bdb130804a1ea96ffd0 Mon Sep 17 00:00:00 2001
From: Grant Young
Date: Tue, 7 May 2024 10:11:22 +0100
Subject: [PATCH 20/20] Adjust scaling text

---
 .../reference_architectures/index.md | 23 +++++++++----------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md
index 3a4797a634d05b..419a10c1aa4c23 100644
--- a/doc/administration/reference_architectures/index.md
+++ b/doc/administration/reference_architectures/index.md
@@ -711,20 +711,9 @@ Maintaining a Reference Architecture environment is generally the same as any ot

In this section you'll find links to documentation for relevant areas as well as any specific Reference Architecture notes.

-### Upgrades
-
-Upgrades for a Reference Architecture environment is the same as any other GitLab environment.
-The main [Upgrade GitLab](../../update/index.md) section has detailed steps on how to approach this.
-
-[Zero-downtime upgrades](#zero-downtime-upgrades) are also available.
-
-NOTE:
-You should upgrade a Reference Architecture in the same order as you created it.
-
### Scaling an environment

-The Reference Architectures have been designed to support scaling in various ways depending on your use case and circumstances.
-This can be done iteratively or wholesale to the next size of architecture depending on if metrics suggest a component is being exhausted.
+The Reference Architectures have been designed as a starting point and are elastic and scalable throughout. It's likely that you'll want to adjust the environment after deployment to fit your specific needs, such as additional performance capacity or reduced costs. This is expected, and scaling can be done iteratively or wholesale to the next size of architecture, depending on whether metrics suggest a component is being exhausted.

NOTE:
If you're seeing a component continuously exhausting its given resources, it's strongly recommended that you reach out to our [Support team](https://about.gitlab.com/support/) before performing any scaling. This is especially so if you're planning to scale any component significantly.

@@ -762,6 +751,16 @@ documentation for each as follows

- [Postgres to multi-node Postgres w/ Consul + PgBouncer](../postgresql/moving.md)
- [Gitaly to Gitaly Cluster w/ Praefect](../gitaly/index.md#migrate-to-gitaly-cluster)

+### Upgrades
+
+Upgrades for a Reference Architecture environment are the same as for any other GitLab environment.
+The main [Upgrade GitLab](../../update/index.md) section has detailed steps on how to approach this.
+
+[Zero-downtime upgrades](#zero-downtime-upgrades) are also available.
+
+NOTE:
+You should upgrade a Reference Architecture in the same order as you created it.
+
### Monitoring

There are numerous options available to monitor your infrastructure, as well as [GitLab itself](../monitoring/index.md), and you should refer to your selected monitoring solution's documentation for more information.

--
GitLab
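Related to the monitoring note above, a deployed pool can be checked against the Target Node Pool Totals directly from the cluster. A minimal sketch, assuming GKE's node pool label and an illustrative pool name (other providers label nodes differently):

```shell
# Sketch only: lists allocatable CPU and memory for each node in a pool,
# for comparison against the Target Node Pool Totals. The pool name and
# the GKE label key are assumptions.
kubectl get nodes \
  -l cloud.google.com/gke-nodepool=gitlab-webservice \
  -o custom-columns='NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
```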