diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index a5c60af47b183bbb83c822ff716078e4897ebe09..87fa0ff236e53bbc4b56892c7727ed3ca3fdd561 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -12,7 +12,7 @@ full list of reference architectures, see > - **Supported users (approximate):** 10,000 > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator#id=e77713f6-dc0b-4bb3-bcef-cea904ac8efd) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested daily with the [GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 200 RPS, Web: 20 RPS, Git (Pull): 20 RPS, Git (Push): 4 RPS @@ -2234,7 +2234,7 @@ future with further specific cloud provider details. | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/1k_users.md b/doc/administration/reference_architectures/1k_users.md index ed6fbe84a4804c969661418e1e765ab22e230033..0d0e7681ffdb26b4bcb7e8d05a0b0dc39022fd95 100644 --- a/doc/administration/reference_architectures/1k_users.md +++ b/doc/administration/reference_architectures/1k_users.md @@ -18,6 +18,7 @@ many organizations. > - **Supported users (approximate):** 1,000 > - **High Availability:** No. For a highly-available environment, you can > follow a modified [3K reference architecture](3k_users.md#supported-modifications-for-lower-user-counts-ha). +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid:** No. For a cloud native hybrid environment, you > can follow a [modified hybrid reference architecture](#cloud-native-hybrid-reference-architecture-with-helm-charts). > - **Performance tested daily with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index 8cc355db9513659b91964b0e98a93be9c5bcd479..f133e48008e881fb6413aa84e0796929d893df37 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -12,7 +12,7 @@ full list of reference architectures, see > - **Supported users (approximate):** 25,000 > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator#id=925386e1-c01c-4c0a-8d7d-ebde1824b7b0) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested weekly with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 500 RPS, Web: 50 RPS, Git (Pull): 50 RPS, Git (Push): 10 RPS @@ -2232,7 +2232,7 @@ future with further specific cloud provider details. | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index 467c64b8279d4f537c5afdb36f0422cb06a375e0..f6c484b08b12774339fecd89ab5c764bce0f3567 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -13,7 +13,7 @@ For a full list of reference architectures, see > - **Supported users (approximate):** 2,000 > - **High Availability:** No. For a highly-available environment, you can > follow a modified [3K reference architecture](3k_users.md#supported-modifications-for-lower-user-counts-ha). -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator#id=84d11491-d72a-493c-a16e-650931faa658) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested daily with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 40 RPS, Web: 4 RPS, Git (Pull): 4 RPS, Git (Push): 1 RPS @@ -1022,7 +1022,7 @@ future with further specific cloud provider details. | Sidekiq | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | 1.9 vCPU, 5.5 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index 01d9987739bb767def67c8729eaef9e9629447bc..0aadfa4e6220f3d5a8a0b72c0bcd61b6452206d1 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -22,7 +22,7 @@ For a full list of reference architectures, see > - **Supported users (approximate):** 3,000 > - **High Availability:** Yes, although [Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator/#id=ac4838e6-9c40-4a36-ac43-6d1bc1843e08) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested weekly with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 60 RPS, Web: 6 RPS, Git (Pull): 6 RPS, Git (Push): 1 RPS @@ -2191,7 +2191,7 @@ future with further specific cloud provider details. | Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | 11.8 vCPU, 38.9 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index d5bb9c4ad6459d2b1b1fc555882555ea1a4aa2b0..365a634cec55a9eeeb5e63b5c7da15e3e3a60b1d 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -12,7 +12,7 @@ full list of reference architectures, see > - **Supported users (approximate):** 50,000 > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator/#id=8006396b-88ee-40cd-a1c8-77cdefa4d3c8) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested weekly with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 1000 RPS, Web: 100 RPS, Git (Pull): 100 RPS, Git (Push): 20 RPS @@ -2248,7 +2248,7 @@ future with further specific cloud provider details. | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | 15.5 vCPU, 50 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 4 vCPU, 15 GB memory | `n1-standard-4` | 7.75 vCPU, 25 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index 33ca4e4899f4b51282faadc70a28e44556fbe700..5e029de517dfb8430cef34a79fe82f16f02f9ae4 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -19,7 +19,7 @@ costly-to-operate environment by using the > - **Supported users (approximate):** 5,000 > - **High Availability:** Yes ([Praefect](#configure-praefect-postgresql) needs a third-party PostgreSQL solution for HA) -> - **Estimated Costs:** [GCP](https://cloud.google.com/products/calculator/#id=8742e8ea-c08f-4e0a-b058-02f3a1c38a2f) +> - **Estimated Costs:** [See cost table](index.md#cost-to-run) > - **Cloud Native Hybrid Alternative:** [Yes](#cloud-native-hybrid-reference-architecture-with-helm-charts-alternative) > - **Performance tested weekly with the [GitLab Performance Tool (GPT)](https://gitlab.com/gitlab-org/quality/performance)**: > - **Test requests per second (RPS) rates:** API: 100 RPS, Web: 10 RPS, Git (Pull): 10 RPS, Git (Push): 2 RPS @@ -2167,7 +2167,7 @@ future with further specific cloud provider details. | Sidekiq | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | 11.8 vCPU, 38.9 GB memory | | Supporting services such as NGINX, Prometheus | 2 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | 3.9 vCPU, 11.8 GB memory | -- For this setup, we **recommend** and regularly [test](index.md#testing-process-and-results) +- For this setup, we **recommend** and regularly [test](index.md#validation-and-test-results) [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) and [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/). Other Kubernetes services may also work, but your mileage may vary. - Nodes configuration is shown as it is forced to ensure pod vcpu / memory ratios and avoid scaling during **performance testing**. - In production deployments, there is no need to assign pods to nodes. A minimum of three nodes in three different availability zones is strongly recommended to align with resilient cloud architecture practices. diff --git a/doc/administration/reference_architectures/index.md b/doc/administration/reference_architectures/index.md index bd796600564c9b08f1c66694bb2aa2399f549a7d..2a9e4cc0e7a7caf9a0c6f46ddf42da19058cda4b 100644 --- a/doc/administration/reference_architectures/index.md +++ b/doc/administration/reference_architectures/index.md @@ -83,6 +83,203 @@ to get assistance from Support with troubleshooting the [2,000 users](2k_users.m and higher reference architectures. [Read more about our definition of scaled architectures](https://about.gitlab.com/support/#definition-of-scaled-architecture). +### Validation and test results + +The [Quality Engineering - Enablement team](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/) does regular smoke and performance tests for the reference architectures to ensure they remain compliant. + +- Testing occurs against all reference architectures and cloud providers in an automated and ad-hoc fashion. This is done by two tools: + - The [GitLab Environment Toolkit](https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit) for building the environments. + - The [GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance) for performance testing. +- Network latency on the test environments between components on all Cloud Providers were measured at <5ms. Note that this is shared as an observation and not as an implicit recommendation. +- We aim to have a "test smart" approach where architectures tested have a good range that can also apply to others. Testing focuses on 10k Omnibus on GCP as the testing has shown this is a good bellwether for the other architectures and cloud providers as well as Cloud Native Hybrids. +- Testing is done publicly and all results are shared. + +The following table details the testing done against the reference architectures along with the frequency and results. Additional testing is continuously evaluated, and the table is updated accordingly. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Reference
Architecture
GCP (* also proxy for Bare-Metal)AWSAzure
OmnibusCloud Native HybridOmnibusCloud Native HybridOmnibusCloud Native Hybrid
1kDaily (to be moved to Weekly)
2kDaily (to be moved to Weekly)
3kWeekly
5kWeekly
10kDailyAd-HocAd-Hoc (inc Cloud Services)Ad-HocAd-Hoc
25kWeeklyAd-Hoc
50kWeeklyAd-Hoc (inc Cloud Services)
+ +The Standard Reference Architectures are designed to be platform agnostic, with everything being run on VMs via [Omnibus GitLab](https://docs.gitlab.com/omnibus/). While testing occurs primarily on GCP, ad-hoc testing has shown that they perform similarly on equivalently specced hardware on other Cloud Providers or if run on premises (bare-metal). + +### Cost to run + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Reference
Architecture
GCPAWSAzure
OmnibusCloud Native HybridOmnibusCloud Native HybridOmnibusCloud Native Hybrid
1kCalculated cost
2kCalculated cost
3kCalculated cost
5kCalculated cost
10kCalculated cost
25kCalculated cost
50kCalculated cost
+ ## Availability Components GitLab comes with the following components for your use, listed from least to @@ -191,33 +388,3 @@ The reference architectures for user counts [3,000](3k_users.md) and up support In the specific case you have the requirement to achieve HA but have a lower user count, select modifications to the [3,000 user](3k_users.md) architecture are supported. For more details, [refer to this section in the architecture's documentation](3k_users.md#supported-modifications-for-lower-user-counts-ha). - -## Testing process and results - -The [Quality Engineering - Enablement team](https://about.gitlab.com/handbook/engineering/quality/quality-engineering/) does regular smoke and performance tests for the reference architectures to ensure they remain compliant. - -In this section, we detail some of the process as well as the results. - -Note the following about the testing process: - -- Testing occurs against all main reference architectures and cloud providers in an automated and ad-hoc fashion. - This is achieved through two tools built by the team: - - The [GitLab Environment Toolkit](https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit) for building the environments. - - The [GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance) for performance testing. -- Network latency on the test environments between components on all Cloud Providers were measured at <5ms. Note that this is shared as an observation and not as an implicit recommendation. -- We aim to have a "test smart" approach where architectures tested have a good range that can also apply to others. Testing focuses on 10k Omnibus on GCP as the testing has shown this is a good bellwether for the other architectures and cloud providers as well as Cloud Native Hybrids. -- Testing is done publicly and all results are shared. - -Τhe following table details the testing done against the reference architectures along with the frequency and results. Note that this list above is non exhaustive. Additional testing is continuously evaluated and iterated on, and the table is updated accordingly. - -| Reference
Architecture
Size | Bare-Metal | GCP | AWS | Azure | -|-----------------------------|------------|-----|-----|-------| -| 1k | Refer to GCP1 | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/1k)1 | - | - | -| 2k | Refer to GCP1 | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/2k)1 | - | - | -| 3k | Refer to GCP1 | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/3k)1 | - | - | -| 5k | Refer to GCP1 | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/5k)1 | - | - | -| 10k | Refer to GCP1 | [Standard - Daily](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/10k)1
[Standard (inc Cloud Services) - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k)
[Cloud Native Hybrid - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k-Cloud-Native-Hybrid) | [Standard (inc Cloud Services) - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k)
[Cloud Native Hybrid - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k-Cloud-Native-Hybrid) | [Standard - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/10k) | -| 25k | Refer to GCP1 | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/25k)1 | - | [Standard - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/25k) | -| 50k | Refer to GCP1 | [Standard - Weekly](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/50k)1 | [Standard (inc Cloud Services) - Ad-Hoc](https://gitlab.com/gitlab-org/quality/performance/-/wikis/Past-Results/50k) | - | - -1. The Standard Reference Architectures are designed to be platform agnostic, with everything being run on VMs via [Omnibus GitLab](https://docs.gitlab.com/omnibus/). While testing occurs primarily on GCP, ad-hoc testing has shown that they perform similarly on equivalently specced hardware on other Cloud Providers or if run on premises (bare-metal).