diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index a5c60af47b183bbb83c822ff716078e4897ebe09..87fa0ff236e53bbc4b56892c7727ed3ca3fdd561 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -12,7 +12,7 @@ full list of reference architectures, see > - **Supported users (approximate):** 10,000 > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator#id=e77713f6-dc0b-4bb3-bcef-cea904ac8efd) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested daily with the [GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 200 RPS, Web: 20 RPS, Git (Pull): 20 RPS, Git (Push): 4 RPS @@ -2234,7 +2234,7 @@ future with further specific cloud provider details. | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/1k_users.md b/doc/administration/reference_architectures/1k_users.md index ed6fbe84a4804c969661418e1e765ab22e230033..0d0e7681ffdb26b4bcb7e8d05a0b0dc39022fd95 100644 --- a/doc/administration/reference_architectures/1k_users.md +++ b/doc/administration/reference_architectures/1k_users.md @@ -18,6 +18,7 @@ many organizations. > - **Supported users (approximate):** 1,000 > - **High Availability:** No. For a highly-available environment, you can > follow a modified [3K reference architecture](3k_users.md#supported-modifications-for-lower-user-counts-ha). +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid:** No. For a cloud native hybrid environment, you > can follow a [modified hybrid reference architecture](#cloud-native-hybrid-reference-architecture-with-helm-charts). > - **Performance tested daily with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index 8cc355db9513659b91964b0e98a93be9c5bcd479..f133e48008e881fb6413aa84e0796929d893df37 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -12,7 +12,7 @@ full list of reference architectures, see > - **Supported users (approximate):** 25,000 > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator#id=925386e1-c01c-4c0a-8d7d-ebde1824b7b0) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested weekly with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 500 RPS, Web: 50 RPS, Git (Pull): 50 RPS, Git (Push): 10 RPS @@ -2232,7 +2232,7 @@ future with further specific cloud provider details. | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index 467c64b8279d4f537c5afdb36f0422cb06a375e0..f6c484b08b12774339fecd89ab5c764bce0f3567 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -13,7 +13,7 @@ For a full list of reference architectures, see > - **Supported users (approximate):** 2,000 > - **High Availability:** No. For a highly-available environment, you can > follow a modified [3K reference architecture](3k_users.md#supported-modifications-for-lower-user-counts-ha). -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator#id=84d11491-d72a-493c-a16e-650931faa658) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested daily with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 40 RPS, Web: 4 RPS, Git (Pull): 4 RPS, Git (Push): 1 RPS @@ -1022,7 +1022,7 @@ future with further specific cloud provider details. | Sidekiq | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | 1.9 vCPU, 5.5 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index 01d9987739bb767def67c8729eaef9e9629447bc..0aadfa4e6220f3d5a8a0b72c0bcd61b6452206d1 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -22,7 +22,7 @@ For a full list of reference architectures, see > - **Supported users (approximate):** 3,000 > - **High Availability:** Yes, although [Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator/#id=ac4838e6-9c40-4a36-ac43-6d1bc1843e08) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested weekly with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 60 RPS, Web: 6 RPS, Git (Pull): 6 RPS, Git (Push): 1 RPS @@ -2191,7 +2191,7 @@ future with further specific cloud provider details. | Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | 11.8 vCPU, 38.9 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index d5bb9c4ad6459d2b1b1fc555882555ea1a4aa2b0..365a634cec55a9eeeb5e63b5c7da15e3e3a60b1d 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -12,7 +12,7 @@ full list of reference architectures, see > - **Supported users (approximate):** 50,000 > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator/#id=8006396b-88ee-40cd-a1c8-77cdefa4d3c8) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested weekly with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 1000 RPS, Web: 100 RPS, Git (Pull): 100 RPS, Git (Push): 20 RPS @@ -2248,7 +2248,7 @@ future with further specific cloud provider details. | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index 33ca4e4899f4b51282faadc70a28e44556fbe700..5e029de517dfb8430cef34a79fe82f16f02f9ae4 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -19,7 +19,7 @@ costly-to-operate environment by using the > - **Supported users (approximate):** 5,000 > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator/#id=8742e8ea-c08f-4e0a-b058-02f3a1c38a2f) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested weekly with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 100 RPS, Web: 10 RPS, Git (Pull): 10 RPS, Git (Push): 2 RPS @@ -2167,7 +2167,7 @@ future with further specific cloud provider details. | Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | 11.8 vCPU, 38.9 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md index bd796600564c9b08f1c66694bb2aa2399f549a7d..2a9e4cc0e7a7caf9a0c6f46ddf42da19058cda4b 100644 --- a/doc/administration/reference_architectures/index.md +++ b/doc/administration/reference_architectures/index.md @@ -83,6 +83,203 @@ to get assistance from Support with troubleshooting the [2,000 users](2k_users.m and higher reference architectures. [Read more about our definition of scaled architectures](https://about.gitlab.com/support/#definition-of-scaled-architecture). +### Validation and test results + +The [Quality Engineering - Enablement team](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/) does regular smoke and performance tests for the reference architectures to ensure they remain compliant. + +- Testing occurs against all reference architectures and cloud providers in an automated and ad-hoc fashion. This is done by two tools: + - The [GitLab Environment Toolkit](https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit) for building the environments. + - The [GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance) for performance testing. +- Network latency on the test environments between components on all Cloud Providers were measured at <5ms. Note that this is shared as an observation and not as an implicit recommendation. +- We aim to have a "test smart" approach where architectures tested have a good range that can also apply to others. Testing focuses on 10k Omnibus on GCP as the testing has shown this is a good bellwether for the other architectures and cloud providers as well as Cloud Native Hybrids. +- Testing is done publicly and all results are shared. + +The following table details the testing done against the reference architectures along with the frequency and results. Additional testing is continuously evaluated, and the table is updated accordingly. + + + +
| Reference Architecture |
+ GCP (* also proxy for Bare-Metal) | +AWS | +Azure | +|||
|---|---|---|---|---|---|---|
| Omnibus | +Cloud Native Hybrid | +Omnibus | +Cloud Native Hybrid | +Omnibus | +Cloud Native Hybrid | +|
| 1k | +Daily (to be moved to Weekly) | ++ | + | + | + | + |
| 2k | +Daily (to be moved to Weekly) | ++ | + | + | + | + |
| 3k | +Weekly | ++ | + | + | + | + |
| 5k | +Weekly | ++ | + | + | + | + |
| 10k | +Daily | +Ad-Hoc | +Ad-Hoc (inc Cloud Services) | +Ad-Hoc | +Ad-Hoc | ++ |
| 25k | +Weekly | ++ | + | + | Ad-Hoc | ++ |
| 50k | +Weekly | ++ | Ad-Hoc (inc Cloud Services) | ++ | + | + |
| Reference Architecture |
+ GCP | +AWS | +Azure | +|||
|---|---|---|---|---|---|---|
| Omnibus | +Cloud Native Hybrid | +Omnibus | +Cloud Native Hybrid | +Omnibus | +Cloud Native Hybrid | +|
| 1k | +Calculated cost | ++ | + | + | + | + |
| 2k | +Calculated cost | ++ | + | + | + | + |
| 3k | +Calculated cost | ++ | + | + | + | + |
| 5k | +Calculated cost | ++ | + | + | + | + |
| 10k | +Calculated cost | ++ | + | + | + | + |
| 25k | +Calculated cost | ++ | + | + | + | + |
| 50k | +Calculated cost | ++ | + | + | + | + |