From d8d992fdd3b0ee460df0cf03a1967f89b503483e Mon Sep 17 00:00:00 2001 From: Grant Young Date: Wed, 2 Nov 2022 11:51:45 +0000 Subject: [PATCH 1/8] Add Ref Arch docs notes about additional workloads Also notes about large repos and load balancing --- .../reference_architectures/10k_users.md | 138 ++++++++++----- .../reference_architectures/1k_users.md | 15 +- .../reference_architectures/25k_users.md | 135 ++++++++++---- .../reference_architectures/2k_users.md | 166 +++++++++++------- .../reference_architectures/3k_users.md | 129 ++++++++++---- .../reference_architectures/50k_users.md | 138 ++++++++++----- .../reference_architectures/5k_users.md | 139 ++++++++++----- 7 files changed, 587 insertions(+), 273 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index 45939b48f7888e..765fa49548265f 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -158,10 +158,23 @@ Any "burstable" instance types are not recommended due to inconsistent performan ### Supported infrastructure -As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. However, this does not constitute a guarantee for every potential permutation. +As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, +or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. +However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. 
+### Additional Workloads + +The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with +good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, +such as security software, you may still need to adjust the specs accordingly to compensate. + +This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). + +As a general rule it's recommended to have robust monitoring in place to measure the impact of +any additional workloads to inform any changes needed to be made. + ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -241,8 +254,7 @@ In a multi-node GitLab configuration, you'll need a load balancer to route traffic to the application servers. The specifics on which load balancer to use or its exact configuration is beyond the scope of GitLab documentation. We assume that if you're managing multi-node systems like GitLab, you already have a load -balancer of choice and that the routing methods used are distributing calls evenly -between all nodes. Some load balancer examples include HAProxy (open-source), +balancer of choice. Some load balancer examples include HAProxy (open-source), F5 Big-IP LTM, and Citrix Net Scaler. This documentation outline the ports and protocols needed for use with GitLab. @@ -250,47 +262,13 @@ This architecture has been tested and validated with [HAProxy](https://www.hapro as the load balancer. Although other load balancers with similar feature sets could also be used, those load balancers have not been validated. -The next question is how you will handle SSL in your environment. -There are several different options: - -- [The application node terminates SSL](#application-node-terminates-ssl). 
-- [The load balancer terminates SSL without backend SSL](#load-balancer-terminates-ssl-without-backend-ssl) - and communication is not secure between the load balancer and the application node. -- [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) - and communication is *secure* between the load balancer and the application node. +### Balancing Algorithm -### Application node terminates SSL +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. -Configure your load balancer to pass connections on port 443 as `TCP` rather -than `HTTP(S)` protocol. This will pass the connection to the application node's -NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. - -See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. - -### Load balancer terminates SSL without backend SSL - -Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. -The load balancer will then be responsible for managing SSL certificates and -terminating SSL. - -Since communication between the load balancer and GitLab will not be secure, -there is some additional configuration needed. See the -[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) -for details. - -### Load balancer terminates SSL with backend SSL - -Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. -The load balancers will be responsible for managing SSL certificates that -end users will see. - -Traffic will also be secure between the load balancers and NGINX in this -scenario. There is no need to add configuration for proxied SSL since the -connection will be secure all the way. 
However, configuration will need to be -added to GitLab to configure SSL certificates. See -the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice. ### Readiness checks @@ -351,6 +329,50 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`. | ------- | ------------ | -------- | | 443 | 22 | TCP | +### SSL + +The next question is how you will handle SSL in your environment. +There are several different options: + +- [The application node terminates SSL](#application-node-terminates-ssl). +- [The load balancer terminates SSL without backend SSL](#load-balancer-terminates-ssl-without-backend-ssl) + and communication is not secure between the load balancer and the application node. +- [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) + and communication is *secure* between the load balancer and the application node. + +#### Application node terminates SSL + +Configure your load balancer to pass connections on port 443 as `TCP` rather +than `HTTP(S)` protocol. This will pass the connection to the application node's +NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. + +See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. + +#### Load balancer terminates SSL without backend SSL + +Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. +The load balancer will then be responsible for managing SSL certificates and +terminating SSL. + +Since communication between the load balancer and GitLab will not be secure, +there is some additional configuration needed. 
See the +[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) +for details. + +#### Load balancer terminates SSL with backend SSL + +Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. +The load balancers will be responsible for managing SSL certificates that +end users will see. + +Traffic will also be secure between the load balancers and NGINX in this +scenario. There is no need to add configuration for proxied SSL since the +connection will be secure all the way. However, configuration will need to be +added to GitLab to configure SSL certificates. See +the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. +
Back to setup components @@ -415,8 +437,14 @@ backend praefect ``` Refer to your preferred Load Balancer's documentation for further guidance. -Also ensure that the routing methods used are distributing calls evenly across -all nodes. + +### Balancing Algorithm + +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. + +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice.
@@ -1475,9 +1503,14 @@ The [Gitaly](../gitaly/index.md) server nodes that make up the cluster have requirements that are dependent on data and load. NOTE: -The Reference Architecture specs have been designed with good headroom in mind -but for Gitaly, increased specs or additional -Gitaly Cluster arrays may be required for notably large data sets or load. +Increased specs for Gitaly nodes may be required in some circumstances such as +significantly large repositories or if any [additional workloads](#additional-workloads), +such as [Server Hooks](../server_hooks.md), have been added. + +NOTE: +Large repositories not following best practices can impact performance notably. +Specific guidance for these can be found in the +[Managing Large Repositories](#managing-large-repositories) section below. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1857,6 +1890,17 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). +### Managing Large Repositories + +Large repositories or monorepos can notably impact performance of Git itself as well as the environment +if best practices aren't being followed. + +It's strongly recommended to review large repositories to ensure they maintain good repo health, +such as the storage of binary files in Git LFS. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. +
Back to setup components diff --git a/doc/administration/reference_architectures/1k_users.md b/doc/administration/reference_architectures/1k_users.md index a8e0e23512f933..7e88109219ff08 100644 --- a/doc/administration/reference_architectures/1k_users.md +++ b/doc/administration/reference_architectures/1k_users.md @@ -82,10 +82,23 @@ Any "burstable" instance types are not recommended due to inconsistent performan ### Supported infrastructure -As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP, Azure) and their services, or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. However, this does not constitute a guarantee for every potential permutation. +As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, +or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. +However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. +### Additional Workloads + +The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with +good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, +such as security software, you may still need to adjust the specs accordingly to compensate. + +This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). + +As a general rule it's recommended to have robust monitoring in place to measure the impact of +any additional workloads to inform any changes needed to be made. 
+ ### Swap In addition to the stated configurations, we recommend having at least 2 GB of diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index 7d67ac48b73c63..1157e373822977 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -158,10 +158,23 @@ Any "burstable" instance types are not recommended due to inconsistent performan ### Supported infrastructure -As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. However, this does not constitute a guarantee for every potential permutation. +As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP, Azure) and their services, +or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. +However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. +### Additional Workloads + +The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with +good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, +such as security software, you may still need to adjust the specs accordingly to compensate. + +This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). + +As a general rule it's recommended to have robust monitoring in place to measure the impact of +any additional workloads to inform any changes needed to be made. 
+ ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -243,8 +256,7 @@ In a multi-node GitLab configuration, you'll need a load balancer to route traffic to the application servers. The specifics on which load balancer to use or its exact configuration is beyond the scope of GitLab documentation. We assume that if you're managing multi-node systems like GitLab, you already have a load -balancer of choice and that the routing methods used are distributing calls evenly -between all nodes. Some load balancer examples include HAProxy (open-source), +balancer of choice. Some load balancer examples include HAProxy (open-source), F5 Big-IP LTM, and Citrix Net Scaler. This documentation outline the ports and protocols needed for use with GitLab. @@ -261,38 +273,13 @@ There are several different options: - [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) and communication is *secure* between the load balancer and the application node. -### Application node terminates SSL +### Balancing Algorithm -Configure your load balancer to pass connections on port 443 as `TCP` rather -than `HTTP(S)` protocol. This will pass the connection to the application node's -NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. -See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. - -### Load balancer terminates SSL without backend SSL - -Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. -The load balancer will then be responsible for managing SSL certificates and -terminating SSL. 
- -Since communication between the load balancer and GitLab will not be secure, -there is some additional configuration needed. See the -[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) -for details. - -### Load balancer terminates SSL with backend SSL - -Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. -The load balancers will be responsible for managing SSL certificates that -end users will see. - -Traffic will also be secure between the load balancers and NGINX in this -scenario. There is no need to add configuration for proxied SSL since the -connection will be secure all the way. However, configuration will need to be -added to GitLab to configure SSL certificates. See the -[HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice. ### Readiness checks @@ -353,6 +340,50 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`. | ------- | ------------ | -------- | | 443 | 22 | TCP | +### SSL + +The next question is how you will handle SSL in your environment. +There are several different options: + +- [The application node terminates SSL](#application-node-terminates-ssl). +- [The load balancer terminates SSL without backend SSL](#load-balancer-terminates-ssl-without-backend-ssl) + and communication is not secure between the load balancer and the application node. +- [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) + and communication is *secure* between the load balancer and the application node. + +#### Application node terminates SSL + +Configure your load balancer to pass connections on port 443 as `TCP` rather +than `HTTP(S)` protocol. 
This will pass the connection to the application node's +NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. + +See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. + +#### Load balancer terminates SSL without backend SSL + +Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. +The load balancer will then be responsible for managing SSL certificates and +terminating SSL. + +Since communication between the load balancer and GitLab will not be secure, +there is some additional configuration needed. See the +[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) +for details. + +#### Load balancer terminates SSL with backend SSL + +Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. +The load balancers will be responsible for managing SSL certificates that +end users will see. + +Traffic will also be secure between the load balancers and NGINX in this +scenario. There is no need to add configuration for proxied SSL since the +connection will be secure all the way. However, configuration will need to be +added to GitLab to configure SSL certificates. See +the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. +
Back to setup components @@ -417,8 +448,20 @@ backend praefect ``` Refer to your preferred Load Balancer's documentation for further guidance. -Also ensure that the routing methods used are distributing calls evenly across -all nodes. + +### Balancing Algorithm + +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. + +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice. + +
@@ -1478,9 +1521,14 @@ The [Gitaly](../gitaly/index.md) server nodes that make up the cluster have requirements that are dependent on data and load. NOTE: -The Reference Architecture specs have been designed with good headroom in mind -but for Gitaly, increased specs or additional -Gitaly Cluster arrays may be required for notably large data sets or load. +Increased specs for Gitaly nodes may be required in some circumstances such as +significantly large repositories or if any [additional workloads](#additional-workloads), +such as [Server Hooks](../server_hooks.md), have been added. + +NOTE: +Large repositories not following best practices can impact performance notably. +Specific guidance for these can be found in the +[Managing Large Repositories](#managing-large-repositories) section below. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1860,6 +1908,17 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). +### Managing Large Repositories + +Large repositories or monorepos can notably impact performance of Git itself as well as the environment +if best practices aren't being followed. + +It's strongly recommended to review large repositories to ensure they maintain good repo health, +such as the storage of binary files in Git LFS. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. +
Back to setup components diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index 61ea435f63fca8..a38ac473abdbe5 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -94,10 +94,23 @@ Any "burstable" instance types are not recommended due to inconsistent performan ### Supported infrastructure -As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP, Azure) and their services, or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. However, this does not constitute a guarantee for every potential permutation. +As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, +or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. +However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. +### Additional Workloads + +The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with +good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, +such as security software, you may still need to adjust the specs accordingly to compensate. + +This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). + +As a general rule it's recommended to have robust monitoring in place to measure the impact of +any additional workloads to inform any changes needed to be made. 
+ ## Setup components To set up GitLab and its components to accommodate up to 2,000 users: @@ -127,8 +140,7 @@ In a multi-node GitLab configuration, you'll need a load balancer to route traffic to the application servers. The specifics on which load balancer to use or its exact configuration is beyond the scope of GitLab documentation. We assume that if you're managing multi-node systems like GitLab, you already have a load -balancer of choice and that the routing methods used are distributing calls evenly -between all nodes. Some load balancer examples include HAProxy (open-source), +balancer of choice. Some load balancer examples include HAProxy (open-source), F5 Big-IP LTM, and Citrix Net Scaler. This documentation outline the ports and protocols needed for use with GitLab. @@ -145,36 +157,13 @@ several different options: - [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) and communication is *secure* between the load balancer and the application node. -### Application node terminates SSL - -Configure your load balancer to pass connections on port 443 as `TCP` instead -of `HTTP(S)`. This will pass the connection unaltered to the application node's -NGINX service, which has the SSL certificate and listens to port 443. - -For details about managing SSL certificates and configuring NGINX, see the -[HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) - -### Load balancer terminates SSL without backend SSL - -Configure your load balancer to use the `HTTP(S)` protocol instead of `TCP`. -The load balancer will be responsible for both managing SSL certificates and -terminating SSL. - -Due to communication between the load balancer and GitLab not being secure, -you'll need to complete some additional configuration. For details, see the -[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination). 
- -### Load balancer terminates SSL with backend SSL +### Balancing Algorithm -Configure your load balancers (or single balancer, if you have only one) to use -the `HTTP(S)` protocol rather than `TCP`. The load balancers will be -responsible for the managing SSL certificates for end users. +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. -Traffic will be secure between the load balancers and NGINX in this scenario, -and there's no need to add a configuration for proxied SSL. However, you'll -need to add a configuration to GitLab to configure SSL certificates. For -details about managing SSL certificates and configuring NGINX, see the -[HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html). +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice. ### Readiness checks @@ -186,56 +175,99 @@ connect. ### Ports -The basic load balancer ports you should use are described in the following -table: +The basic ports to be used are shown in the table below. -| Port | Backend Port | Protocol | +| LB Port | Backend Port | Protocol | | ------- | ------------ | ------------------------ | | 80 | 80 | HTTP (*1*) | | 443 | 443 | TCP or HTTPS (*1*) (*2*) | | 22 | 22 | TCP | -- (*1*): [Web terminal](../../ci/environments/index.md#web-terminals-deprecated) support - requires your load balancer to correctly handle WebSocket connections. - When using HTTP or HTTPS proxying, your load balancer must be configured - to pass through the `Connection` and `Upgrade` hop-by-hop headers. For - details, see the [web terminal](../integration/terminal.md) integration guide. -- (*2*): When using the HTTPS protocol for port 443, you'll need to add an SSL - certificate to the load balancers. If you need to terminate SSL at the - GitLab application server, use the TCP protocol. 
+- (*1*): [Web terminal](../../ci/environments/index.md#web-terminals-deprecated) support requires + your load balancer to correctly handle WebSocket connections. When using + HTTP or HTTPS proxying, this means your load balancer must be configured + to pass through the `Connection` and `Upgrade` hop-by-hop headers. See the + [web terminal](../integration/terminal.md) integration guide for + more details. +- (*2*): When using HTTPS protocol for port 443, you will need to add an SSL + certificate to the load balancers. If you wish to terminate SSL at the + GitLab application server instead, use TCP protocol. If you're using GitLab Pages with custom domain support you will need some -additional port configurations. GitLab Pages requires a separate virtual IP -address. Configure DNS to point the `pages_external_url` from -`/etc/gitlab/gitlab.rb` to the new virtual IP address. For more information, -see the [GitLab Pages documentation](../pages/index.md). +additional port configurations. +GitLab Pages requires a separate virtual IP address. Configure DNS to point the +`pages_external_url` from `/etc/gitlab/gitlab.rb` at the new virtual IP address. See the +[GitLab Pages documentation](../pages/index.md) for more information. -| Port | Backend Port | Protocol | +| LB Port | Backend Port | Protocol | | ------- | ------------- | --------- | | 80 | Varies (*1*) | HTTP | | 443 | Varies (*1*) | TCP (*2*) | - (*1*): The backend port for GitLab Pages depends on the `gitlab_pages['external_http']` and `gitlab_pages['external_https']` - settings. For details, see the [GitLab Pages documentation](../pages/index.md). -- (*2*): Port 443 for GitLab Pages must use the TCP protocol. Users can - configure custom domains with custom SSL, which wouldn't be possible if SSL - was terminated at the load balancer. + setting. See [GitLab Pages documentation](../pages/index.md) for more details. +- (*2*): Port 443 for GitLab Pages should always use the TCP protocol. 
Users can + configure custom domains with custom SSL, which would not be possible + if SSL was terminated at the load balancer. #### Alternate SSH Port Some organizations have policies against opening SSH port 22. In this case, -it may be helpful to configure an alternate SSH hostname that instead allows -users to use SSH over port 443. An alternate SSH hostname requires a new -virtual IP address compared to the previously described GitLab HTTP -configuration. +it may be helpful to configure an alternate SSH hostname that allows users +to use SSH on port 443. An alternate SSH hostname will require a new virtual IP address +compared to the other GitLab HTTP configuration above. -Configure DNS for an alternate SSH hostname, such as `altssh.gitlab.example.com`: +Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`. | LB Port | Backend Port | Protocol | | ------- | ------------ | -------- | | 443 | 22 | TCP | +### SSL + +The next question is how you will handle SSL in your environment. +There are several different options: + +- [The application node terminates SSL](#application-node-terminates-ssl). +- [The load balancer terminates SSL without backend SSL](#load-balancer-terminates-ssl-without-backend-ssl) + and communication is not secure between the load balancer and the application node. +- [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) + and communication is *secure* between the load balancer and the application node. + +#### Application node terminates SSL + +Configure your load balancer to pass connections on port 443 as `TCP` rather +than `HTTP(S)` protocol. This will pass the connection to the application node's +NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. + +See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. 
+ +#### Load balancer terminates SSL without backend SSL + +Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. +The load balancer will then be responsible for managing SSL certificates and +terminating SSL. + +Since communication between the load balancer and GitLab will not be secure, +there is some additional configuration needed. See the +[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) +for details. + +#### Load balancer terminates SSL with backend SSL + +Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. +The load balancers will be responsible for managing SSL certificates that +end users will see. + +Traffic will also be secure between the load balancers and NGINX in this +scenario. There is no need to add configuration for proxied SSL since the +connection will be secure all the way. However, configuration will need to be +added to GitLab to configure SSL certificates. See +the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. +
Back to setup components @@ -407,9 +439,14 @@ are supported and can be added if needed. specifically the number of projects and those projects' sizes. NOTE: -The Reference Architecture specs have been designed with good headroom in mind -but for Gitaly, increased specs or switching to Gitaly Cluster -may be required for notably large data sets or load. +Increased specs for Gitaly nodes may be required in some circumstances such as +significantly large repositories or if any [additional workloads](#additional-workloads), +such as [Server Hooks](../server_hooks.md), have been added. + +NOTE: +Large repositories not following best practices can impact performance notably. +Specific guidance for these can be found in the +[Managing Large Repositories](#managing-large-repositories) section below. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -574,6 +611,17 @@ To configure Gitaly with TLS: 1. Delete `gitaly['listen_addr']` to allow only encrypted connections. 1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure). +### Managing Large Repositories + +Large repositories or monorepos can notably impact performance of Git itself as well as the environment +if best practices aren't being followed. + +It's strongly recommended to review large repositories to ensure they maintain good repo health, +such as the storage of binary files in Git LFS. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. +
Back to setup components diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index 7484fafe1b00e9..6608789f226ec5 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -164,10 +164,23 @@ Any "burstable" instance types are not recommended due to inconsistent performan ### Supported infrastructure -As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. However, this does not constitute a guarantee for every potential permutation. +As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, +or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. +However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. +### Additional Workloads + +The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with +good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, +such as security software, you may still need to adjust the specs accordingly to compensate. + +This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). + +As a general rule it's recommended to have robust monitoring in place to measure the impact of +any additional workloads to inform any changes needed to be made. 
+ ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -244,8 +257,7 @@ In a multi-node GitLab configuration, you'll need a load balancer to route traffic to the application servers. The specifics on which load balancer to use or its exact configuration is beyond the scope of GitLab documentation. We assume that if you're managing multi-node systems like GitLab, you already have a load -balancer of choice and that the routing methods used are distributing calls evenly -between all nodes. Some load balancer examples include HAProxy (open-source), +balancer of choice. Some load balancer examples include HAProxy (open-source), F5 Big-IP LTM, and Citrix Net Scaler. This documentation outline the ports and protocols needed for use with GitLab. @@ -262,38 +274,13 @@ There are several different options: - [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) and communication is *secure* between the load balancer and the application node. -### Application node terminates SSL - -Configure your load balancer to pass connections on port 443 as `TCP` rather -than `HTTP(S)` protocol. This will pass the connection to the application node's -NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. - -See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. - -### Load balancer terminates SSL without backend SSL - -Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. -The load balancer will then be responsible for managing SSL certificates and -terminating SSL. - -Since communication between the load balancer and GitLab will not be secure, -there is some additional configuration needed. 
See the -[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) -for details. - -### Load balancer terminates SSL with backend SSL +### Balancing Algorithm -Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. -The load balancers will be responsible for managing SSL certificates that -end users will see. +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. -Traffic will also be secure between the load balancers and NGINX in this -scenario. There is no need to add configuration for proxied SSL since the -connection will be secure all the way. However, configuration will need to be -added to GitLab to configure SSL certificates. See the -[HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice. ### Readiness checks @@ -354,6 +341,50 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`. | ------- | ------------ | -------- | | 443 | 22 | TCP | +### SSL + +The next question is how you will handle SSL in your environment. +There are several different options: + +- [The application node terminates SSL](#application-node-terminates-ssl). +- [The load balancer terminates SSL without backend SSL](#load-balancer-terminates-ssl-without-backend-ssl) + and communication is not secure between the load balancer and the application node. +- [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) + and communication is *secure* between the load balancer and the application node. 
+ +#### Application node terminates SSL + +Configure your load balancer to pass connections on port 443 as `TCP` rather +than `HTTP(S)` protocol. This will pass the connection to the application node's +NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. + +See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. + +#### Load balancer terminates SSL without backend SSL + +Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. +The load balancer will then be responsible for managing SSL certificates and +terminating SSL. + +Since communication between the load balancer and GitLab will not be secure, +there is some additional configuration needed. See the +[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) +for details. + +#### Load balancer terminates SSL with backend SSL + +Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. +The load balancers will be responsible for managing SSL certificates that +end users will see. + +Traffic will also be secure between the load balancers and NGINX in this +scenario. There is no need to add configuration for proxied SSL since the +connection will be secure all the way. However, configuration will need to be +added to GitLab to configure SSL certificates. See +the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. +
Back to setup components @@ -418,8 +449,14 @@ backend praefect ``` Refer to your preferred Load Balancer's documentation for further guidance. -Also ensure that the routing methods used are distributing calls evenly across -all nodes. + +### Balancing Algorithm + +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. + +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice.
 @@ -1418,9 +1455,14 @@ The [Gitaly](../gitaly/index.md) server nodes that make up the cluster have requirements that are dependent on data and load. NOTE: -The Reference Architecture specs have been designed with good headroom in mind -but for Gitaly, increased specs or additional -Gitaly Cluster arrays may be required for notably large data sets or load. +Increased specs for Gitaly nodes may be required in some circumstances such as +significantly large repositories or if any [additional workloads](#additional-workloads), +such as [Server Hooks](../server_hooks.md), have been added. + +NOTE: +Large repositories not following best practices can impact performance notably. +Specific guidance for these can be found in the +[Managing Large Repositories](#managing-large-repositories) section below. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1799,6 +1841,17 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). +### Managing Large Repositories + +Large repositories or monorepos can notably impact performance of Git itself as well as the environment +if best practices aren't being followed. + +It's strongly recommended to review large repositories to ensure they maintain good repo health, +such as the storage of binary files in Git LFS. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. +
Back to setup components diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index 88fc3649b3f151..42faa5b4c45e36 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -158,10 +158,23 @@ Any "burstable" instance types are not recommended due to inconsistent performan ### Supported infrastructure -As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. However, this does not constitute a guarantee for every potential permutation. +As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP, Azure) and their services, +or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. +However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. +### Additional Workloads + +The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with +good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, +such as security software, you may still need to adjust the specs accordingly to compensate. + +This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). + +As a general rule it's recommended to have robust monitoring in place to measure the impact of +any additional workloads to inform any changes needed to be made. 
+ ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -250,8 +263,7 @@ In a multi-node GitLab configuration, you'll need a load balancer to route traffic to the application servers. The specifics on which load balancer to use or its exact configuration is beyond the scope of GitLab documentation. We assume that if you're managing multi-node systems like GitLab, you already have a load -balancer of choice and that the routing methods used are distributing calls evenly -between all nodes. Some load balancer examples include HAProxy (open-source), +balancer of choice. Some load balancer examples include HAProxy (open-source), F5 Big-IP LTM, and Citrix Net Scaler. This documentation outline the ports and protocols needed for use with GitLab. @@ -259,47 +271,13 @@ This architecture has been tested and validated with [HAProxy](https://www.hapro as the load balancer. Although other load balancers with similar feature sets could also be used, those load balancers have not been validated. -The next question is how you will handle SSL in your environment. -There are several different options: - -- [The application node terminates SSL](#application-node-terminates-ssl). -- [The load balancer terminates SSL without backend SSL](#load-balancer-terminates-ssl-without-backend-ssl) - and communication is not secure between the load balancer and the application node. -- [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) - and communication is *secure* between the load balancer and the application node. - -### Application node terminates SSL +### Balancing Algorithm -Configure your load balancer to pass connections on port 443 as `TCP` rather -than `HTTP(S)` protocol. This will pass the connection to the application node's -NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. 
+We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. -See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. - -### Load balancer terminates SSL without backend SSL - -Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. -The load balancer will then be responsible for managing SSL certificates and -terminating SSL. - -Since communication between the load balancer and GitLab will not be secure, -there is some additional configuration needed. See the -[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) -for details. - -### Load balancer terminates SSL with backend SSL - -Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. -The load balancers will be responsible for managing SSL certificates that -end users will see. - -Traffic will also be secure between the load balancers and NGINX in this -scenario. There is no need to add configuration for proxied SSL since the -connection will be secure all the way. However, configuration will need to be -added to GitLab to configure SSL certificates. See the -[HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice. ### Readiness checks @@ -360,6 +338,50 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`. | ------- | ------------ | -------- | | 443 | 22 | TCP | +### SSL + +The next question is how you will handle SSL in your environment. 
+There are several different options: + +- [The application node terminates SSL](#application-node-terminates-ssl). +- [The load balancer terminates SSL without backend SSL](#load-balancer-terminates-ssl-without-backend-ssl) + and communication is not secure between the load balancer and the application node. +- [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) + and communication is *secure* between the load balancer and the application node. + +#### Application node terminates SSL + +Configure your load balancer to pass connections on port 443 as `TCP` rather +than `HTTP(S)` protocol. This will pass the connection to the application node's +NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. + +See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. + +#### Load balancer terminates SSL without backend SSL + +Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. +The load balancer will then be responsible for managing SSL certificates and +terminating SSL. + +Since communication between the load balancer and GitLab will not be secure, +there is some additional configuration needed. See the +[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) +for details. + +#### Load balancer terminates SSL with backend SSL + +Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. +The load balancers will be responsible for managing SSL certificates that +end users will see. + +Traffic will also be secure between the load balancers and NGINX in this +scenario. There is no need to add configuration for proxied SSL since the +connection will be secure all the way. However, configuration will need to be +added to GitLab to configure SSL certificates. 
See +the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. +
Back to setup components @@ -424,8 +446,14 @@ backend praefect ``` Refer to your preferred Load Balancer's documentation for further guidance. -Also ensure that the routing methods used are distributing calls evenly across -all nodes. + +### Balancing Algorithm + +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. + +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice.
 @@ -1488,9 +1516,14 @@ The [Gitaly](../gitaly/index.md) server nodes that make up the cluster have requirements that are dependent on data and load. NOTE: -The Reference Architecture specs have been designed with good headroom in mind -but for Gitaly, increased specs or additional -Gitaly Cluster arrays may be required for notably large data sets or load. +Increased specs for Gitaly nodes may be required in some circumstances such as +significantly large repositories or if any [additional workloads](#additional-workloads), +such as [Server Hooks](../server_hooks.md), have been added. + +NOTE: +Large repositories not following best practices can impact performance notably. +Specific guidance for these can be found in the +[Managing Large Repositories](#managing-large-repositories) section below. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1870,6 +1903,17 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). +### Managing Large Repositories + +Large repositories or monorepos can notably impact performance of Git itself as well as the environment +if best practices aren't being followed. + +It's strongly recommended to review large repositories to ensure they maintain good repo health, +such as the storage of binary files in Git LFS. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. +
Back to setup components diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index c8cf35a2e59d2e..ff5d08fe1d2214 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -161,10 +161,23 @@ Any "burstable" instance types are not recommended due to inconsistent performan ### Supported infrastructure -As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. However, this does not constitute a guarantee for every potential permutation. +As a general guidance, GitLab should run on most infrastructure such as reputable Cloud Providers (AWS, GCP) and their services, +or self managed (ESXi) that meet both the specs detailed above, as well as any requirements in this section. +However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. +### Additional Workloads + +The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with +good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, +such as security software, you may still need to adjust the specs accordingly to compensate. + +This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). + +As a general rule it's recommended to have robust monitoring in place to measure the impact of +any additional workloads to inform any changes needed to be made. 
+ ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -237,12 +250,11 @@ The following list includes descriptions of each server and its assigned IP: ## Configure the external load balancer -In a multi-node GitLab configuration, you need a load balancer to route +In a multi-node GitLab configuration, you'll need a load balancer to route traffic to the application servers. The specifics on which load balancer to use or its exact configuration is beyond the scope of GitLab documentation. We assume that if you're managing multi-node systems like GitLab, you already have a load -balancer of choice and that the routing methods used are distributing calls evenly -between all nodes. Some load balancer examples include HAProxy (open-source), +balancer of choice. Some load balancer examples include HAProxy (open-source), F5 Big-IP LTM, and Citrix Net Scaler. This documentation outline the ports and protocols needed for use with GitLab. @@ -259,45 +271,20 @@ There are several different options: - [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) and communication is *secure* between the load balancer and the application node. -### Application node terminates SSL - -Configure your load balancer to pass connections on port 443 as `TCP` rather -than `HTTP(S)` protocol. This passes the connection to the application node's -NGINX service untouched. NGINX has the SSL certificate and listen on port 443. - -See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. - -### Load balancer terminates SSL without backend SSL - -Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. -The load balancer is then responsible for managing SSL certificates and -terminating SSL. 
- -Since communication between the load balancer and GitLab is not secure, -there is some additional configuration needed. See the -[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) -for details. - -### Load balancer terminates SSL with backend SSL +### Balancing Algorithm -Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. -The load balancers are responsible for managing SSL certificates that -end users see. +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. -Traffic is also secure between the load balancers and NGINX in this -scenario. There is no need to add configuration for proxied SSL since the -connection is secure all the way. However, configuration needs to be -added to GitLab to configure SSL certificates. See the -[HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) -for details on managing SSL certificates and configuring NGINX. +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice. ### Readiness checks Ensure the external load balancer only routes to working services with built in monitoring endpoints. The [readiness checks](../../user/admin_area/monitoring/health_check.md) all require [additional configuration](../monitoring/ip_allowlist.md) -on the nodes being checked, otherwise, the external load balancer is not able to +on the nodes being checked, otherwise, the external load balancer will not be able to connect. ### Ports @@ -316,11 +303,11 @@ The basic ports to be used are shown in the table below. to pass through the `Connection` and `Upgrade` hop-by-hop headers. See the [web terminal](../integration/terminal.md) integration guide for more details. 
-- (*2*): When using HTTPS protocol for port 443, you need to add an SSL +- (*2*): When using HTTPS protocol for port 443, you will need to add an SSL certificate to the load balancers. If you wish to terminate SSL at the GitLab application server instead, use TCP protocol. -If you're using GitLab Pages with custom domain support you need some +If you're using GitLab Pages with custom domain support you will need some additional port configurations. GitLab Pages requires a separate virtual IP address. Configure DNS to point the `pages_external_url` from `/etc/gitlab/gitlab.rb` at the new virtual IP address. See the @@ -342,7 +329,7 @@ GitLab Pages requires a separate virtual IP address. Configure DNS to point the Some organizations have policies against opening SSH port 22. In this case, it may be helpful to configure an alternate SSH hostname that allows users -to use SSH on port 443. An alternate SSH hostname requires a new virtual IP address +to use SSH on port 443. An alternate SSH hostname will require a new virtual IP address compared to the other GitLab HTTP configuration above. Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`. @@ -351,6 +338,50 @@ Configure DNS for an alternate SSH hostname such as `altssh.gitlab.example.com`. | ------- | ------------ | -------- | | 443 | 22 | TCP | +### SSL + +The next question is how you will handle SSL in your environment. +There are several different options: + +- [The application node terminates SSL](#application-node-terminates-ssl). +- [The load balancer terminates SSL without backend SSL](#load-balancer-terminates-ssl-without-backend-ssl) + and communication is not secure between the load balancer and the application node. +- [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) + and communication is *secure* between the load balancer and the application node. 
+ +#### Application node terminates SSL + +Configure your load balancer to pass connections on port 443 as `TCP` rather +than `HTTP(S)` protocol. This will pass the connection to the application node's +NGINX service untouched. NGINX will have the SSL certificate and listen on port 443. + +See the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. + +#### Load balancer terminates SSL without backend SSL + +Configure your load balancer to use the `HTTP(S)` protocol rather than `TCP`. +The load balancer will then be responsible for managing SSL certificates and +terminating SSL. + +Since communication between the load balancer and GitLab will not be secure, +there is some additional configuration needed. See the +[proxied SSL documentation](https://docs.gitlab.com/omnibus/settings/ssl.html#configure-a-reverse-proxy-or-load-balancer-ssl-termination) +for details. + +#### Load balancer terminates SSL with backend SSL + +Configure your load balancers to use the 'HTTP(S)' protocol rather than 'TCP'. +The load balancers will be responsible for managing SSL certificates that +end users will see. + +Traffic will also be secure between the load balancers and NGINX in this +scenario. There is no need to add configuration for proxied SSL since the +connection will be secure all the way. However, configuration will need to be +added to GitLab to configure SSL certificates. See +the [HTTPS documentation](https://docs.gitlab.com/omnibus/settings/ssl.html) +for details on managing SSL certificates and configuring NGINX. +
Back to setup components @@ -415,8 +446,14 @@ backend praefect ``` Refer to your preferred Load Balancer's documentation for further guidance. -Also ensure that the routing methods used are distributing calls evenly across -all nodes. + +### Balancing Algorithm + +We recommend that a least connection based load balancing algorithm or equivalent +is used wherever possible to ensure equal spread of calls to the nodes and good performance. + +We don't recommend the use specifically of round-robin algorithms as they are known to not +spread connections equally in practice.
@@ -1415,9 +1452,14 @@ The [Gitaly](../gitaly/index.md) server nodes that make up the cluster have requirements that are dependent on data and load. NOTE: -The Reference Architecture specs have been designed with good headroom in mind -but for Gitaly, increased specs or additional -Gitaly Cluster arrays may be required for notably large data sets or load. +Increased specs for Gitaly nodes may be required in some circumstances such as +significantly large repositories or if any [additional workloads](#additional-workloads), +such as [Server Hooks](../server_hooks.md), have been added. + +NOTE: +Large repositories not following best practices can impact performance notably. +Specific guidance for these can be found in the +[Managing Large Repositories](#managing-large-repositories) section below. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1795,6 +1837,17 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). +### Managing Large Repositories + +Large repositories or monorepos can notably impact performance of Git itself as well as the environment +if best practices aren't being followed. + +It's strongly recommended to review large repositories to ensure they maintain good repo health, +such as the storage of binary files in Git LFS. + +Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. +
Back to setup components -- GitLab From 57c9e27e5e463a5808323a1298dca6f82c40cec0 Mon Sep 17 00:00:00 2001 From: Grant Young Date: Mon, 7 Nov 2022 11:53:40 +0000 Subject: [PATCH 2/8] Strengthen large repo guidance further --- .../reference_architectures/10k_users.md | 44 ++++++++++++------- .../reference_architectures/25k_users.md | 44 ++++++++++++------- .../reference_architectures/2k_users.md | 38 +++++++++------- .../reference_architectures/3k_users.md | 44 ++++++++++++------- .../reference_architectures/50k_users.md | 44 ++++++++++++------- .../reference_architectures/5k_users.md | 44 ++++++++++++------- 6 files changed, 168 insertions(+), 90 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index 765fa49548265f..e28cc90ca3e394 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -28,7 +28,7 @@ full list of reference architectures, see | Internal load balancing node3 | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | -| Gitaly5 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | +| Gitaly56 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | | Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | @@ -50,6 +50,7 @@ full list of reference architectures, see - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. 
More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. NOTE: @@ -175,6 +176,23 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. +### Large Repositories + +The Reference Architectures were tested with repositories of varying sizes that follow best practices. + +However, large repositories or monorepos (several gigabytes or more) can **significantly** impact the performance +of Git, and in turn the environment itself (primarily Gitaly), if best practices aren't followed, such as when +binary or blob files are stored directly in Git rather than in LFS. This is due to numerous actions happening under the hood across the whole repo, which can be +impactful on their own but especially so at larger scales, where CPU and memory resource requirements can jump notably. + +As such, large repositories come with a notable cost and typically require more resources to handle. +It's therefore **strongly** recommended to review large repositories to ensure they maintain good repo health, +such as by storing binary files in Git LFS, and to reduce their size wherever possible.
If this is not possible, +Gitaly may require more resources as a result. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. + ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -1196,6 +1214,12 @@ NOTE: Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). For implementations with sharded Gitaly, use the same Gitaly specs. Follow the [separate Gitaly documentation](../gitaly/configure_gitaly.md) instead of this section. +NOTE: +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. + The recommended cluster setup includes the following components: - 3 Gitaly nodes: Replicated storage of Git repositories. @@ -1508,9 +1532,10 @@ significantly large repositories or if any [additional workloads](#additional-wo such as [Server Hooks](../server_hooks.md), have been added. NOTE: -Large repositories not following best practices can impact performance notably. -Specific guidance for these can be found in the -[Managing Large Repositories](#managing-large-repositories) section below. +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info.
Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1890,17 +1915,6 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). -### Managing Large Repositories - -Large repositories or monorepos can notably impact performance of Git itself as well as the environment -if best practices aren't being followed. - -It's strongly recommended to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS. - -Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) -for more information and guidance. -
Back to setup components diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index 1157e373822977..28fc7ff2e464ae 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -28,7 +28,7 @@ full list of reference architectures, see | Internal load balancing node3 | 1 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | | Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | -| Gitaly5 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | +| Gitaly56 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | | Praefect5 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | @@ -50,6 +50,7 @@ full list of reference architectures, see - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. 
However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. NOTE: @@ -175,6 +176,23 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. +### Large Repositories + +The Reference Architectures were tested with repositories of varying sizes that follow best practices. + +However, large repositories or monorepos (several gigabytes or more) can **significantly** impact the performance +of Git, and in turn the environment itself (primarily Gitaly), if best practices aren't followed, such as when +binary or blob files are stored directly in Git rather than in LFS. This is due to numerous actions happening under the hood across the whole repo, which can be +impactful on their own but especially so at larger scales, where CPU and memory resource requirements can jump notably. + +As such, large repositories come with a notable cost and typically require more resources to handle. +It's therefore **strongly** recommended to review large repositories to ensure they maintain good repo health, +such as by storing binary files in Git LFS, and to reduce their size wherever possible. If this is not possible, +Gitaly may require more resources as a result. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. + ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -1216,6 +1234,12 @@ NOTE: Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). For implementations with sharded Gitaly, use the same Gitaly specs. Follow the [separate Gitaly documentation](../gitaly/configure_gitaly.md) instead of this section. +NOTE: +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. + The recommended cluster setup includes the following components: - 3 Gitaly nodes: Replicated storage of Git repositories. @@ -1526,9 +1550,10 @@ significantly large repositories or if any [additional workloads](#additional-wo such as [Server Hooks](../server_hooks.md), have been added. NOTE: -Large repositories not following best practices can impact performance notably. -Specific guidance for these can be found in the -[Managing Large Repositories](#managing-large-repositories) section below. +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1908,17 +1933,6 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). 
-### Managing Large Repositories - -Large repositories or monorepos can notably impact performance of Git itself as well as the environment -if best practices aren't being followed. - -It's strongly recommended to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS. - -Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) -for more information and guidance. -
Back to setup components diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index a38ac473abdbe5..4f4329b82b6703 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -25,7 +25,7 @@ For a full list of reference architectures, see | Load balancer3 | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` | | PostgreSQL1 | 1 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | `D2s v3` | | Redis2 | 1 | 1 vCPU, 3.75 GB memory | `n1-standard-1` | `m5.large` | `D2s v3` | -| Gitaly | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` | +| Gitaly5 | 1 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | `D4s v3` | | GitLab Rails | 2 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | `F8s v2` | | Monitoring node | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | `F2s v2` | | Object storage4 | - | - | - | - | - | @@ -42,6 +42,7 @@ For a full list of reference architectures, see 3. Can be optionally run on reputable third-party load balancing services (LB PaaS). See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. +5. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: @@ -111,6 +112,23 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. +### Large Repositories + +The Reference Architectures were tested with repositories of varying sizes that follow best practices. + +However, large repositories or monorepos (several gigabytes or more) can **significantly** impact the performance +of Git, and in turn the environment itself (primarily Gitaly), if best practices aren't followed, such as when +binary or blob files are stored directly in Git rather than in LFS. This is due to numerous actions happening under the hood across the whole repo, which can be +impactful on their own but especially so at larger scales, where CPU and memory resource requirements can jump notably. + +As such, large repositories come with a notable cost and typically require more resources to handle. +It's therefore **strongly** recommended to review large repositories to ensure they maintain good repo health, +such as by storing binary files in Git LFS, and to reduce their size wherever possible. If this is not possible, +Gitaly may require more resources as a result. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. + ## Setup components To set up GitLab and its components to accommodate up to 2,000 users: @@ -444,9 +462,10 @@ significantly large repositories or if any [additional workloads](#additional-wo such as [Server Hooks](../server_hooks.md), have been added. NOTE: -Large repositories not following best practices can impact performance notably. -Specific guidance for these can be found in the -[Managing Large Repositories](#managing-large-repositories) section below. +Gitaly has been designed and tested with repositories of varying sizes that follow best practices.
+However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -611,17 +630,6 @@ To configure Gitaly with TLS: 1. Delete `gitaly['listen_addr']` to allow only encrypted connections. 1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure). -### Managing Large Repositories - -Large repositories or monorepos can notably impact performance of Git itself as well as the environment -if best practices aren't being followed. - -It's strongly recommended to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS. - -Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) -for more information and guidance. -
Back to setup components diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index 6608789f226ec5..dad315ba2919db 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -37,7 +37,7 @@ For a full list of reference architectures, see | PostgreSQL1 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | | PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Internal load balancing node3 | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | -| Gitaly5 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | +| Gitaly56 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Sidekiq | 4 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | @@ -59,6 +59,7 @@ For a full list of reference architectures, see - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. 
Refer to the [Large Repositories](#large-repositories) for more info. NOTE: @@ -181,6 +182,23 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. +### Large Repositories + +The Reference Architectures were tested with repositories of varying sizes that follow best practices. + +However, large repositories or monorepos (several gigabytes or more) can **significantly** impact the performance +of Git, and in turn the environment itself (primarily Gitaly), if best practices aren't followed, such as when +binary or blob files are stored directly in Git rather than in LFS. This is due to numerous actions happening under the hood across the whole repo, which can be +impactful on their own but especially so at larger scales, where CPU and memory resource requirements can jump notably. + +As such, large repositories come with a notable cost and typically require more resources to handle. +It's therefore **strongly** recommended to review large repositories to ensure they maintain good repo health, +such as by storing binary files in Git LFS, and to reduce their size wherever possible. If this is not possible, +Gitaly may require more resources as a result. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. + ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -1151,6 +1169,12 @@ NOTE: Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). For implementations with sharded Gitaly, use the same Gitaly specs.
Follow the [separate Gitaly documentation](../gitaly/configure_gitaly.md) instead of this section. +NOTE: +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. + The recommended cluster setup includes the following components: - 3 Gitaly nodes: Replicated storage of Git repositories. @@ -1460,9 +1484,10 @@ significantly large repositories or if any [additional workloads](#additional-wo such as [Server Hooks](../server_hooks.md), have been added. NOTE: -Large repositories not following best practices can impact performance notably. -Specific guidance for these can be found in the -[Managing Large Repositories](#managing-large-repositories) section below. +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1841,17 +1866,6 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). -### Managing Large Repositories - -Large repositories or monorepos can notably impact performance of Git itself as well as the environment -if best practices aren't being followed. - -It's strongly recommended to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS. 
- -Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) -for more information and guidance. -
Back to setup components diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index 42faa5b4c45e36..c86aa450c4b83b 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -28,7 +28,7 @@ full list of reference architectures, see | Internal load balancing node3 | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | | Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | -| Gitaly5 | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` | `m5.16xlarge` | +| Gitaly56 | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` | `m5.16xlarge` | | Praefect5 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Sidekiq | 4 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | @@ -50,6 +50,7 @@ full list of reference architectures, see - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. 
However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. NOTE: @@ -175,6 +176,23 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. +### Large Repositories + +The Reference Architectures were tested with repositories of varying sizes that follow best practices. + +However, large repositories or monorepos (several gigabytes or more) can **significantly** impact the performance +of Git, and in turn the environment itself (primarily Gitaly), if best practices aren't followed, such as when +binary or blob files are stored directly in Git rather than in LFS. This is due to numerous actions happening under the hood across the whole repo, which can be +impactful on their own but especially so at larger scales, where CPU and memory resource requirements can jump notably. + +As such, large repositories come with a notable cost and typically require more resources to handle. +It's therefore **strongly** recommended to review large repositories to ensure they maintain good repo health, +such as by storing binary files in Git LFS, and to reduce their size wherever possible. If this is not possible, +Gitaly may require more resources as a result. + +Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. + ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -1209,6 +1227,12 @@ NOTE: Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management.
Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). For implementations with sharded Gitaly, use the same Gitaly specs. Follow the [separate Gitaly documentation](../gitaly/configure_gitaly.md) instead of this section. +NOTE: +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. + The recommended cluster setup includes the following components: - 3 Gitaly nodes: Replicated storage of Git repositories. @@ -1521,9 +1545,10 @@ significantly large repositories or if any [additional workloads](#additional-wo such as [Server Hooks](../server_hooks.md), have been added. NOTE: -Large repositories not following best practices can impact performance notably. -Specific guidance for these can be found in the -[Managing Large Repositories](#managing-large-repositories) section below. +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1903,17 +1928,6 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). 
-### Managing Large Repositories - -Large repositories or monorepos can notably impact performance of Git itself as well as the environment -if best practices aren't being followed. - -It's strongly recommended to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS. - -Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) -for more information and guidance. -
Back to setup components diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index ff5d08fe1d2214..781c5b20356ed4 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -34,7 +34,7 @@ costly-to-operate environment by using the | PostgreSQL1 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Internal load balancing node3 | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | -| Gitaly5 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | +| Gitaly56 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | | Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Sidekiq | 4 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | @@ -56,6 +56,7 @@ costly-to-operate environment by using the - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. 
Refer to the [Large Repositories](#large-repositories) for more info. NOTE: @@ -178,6 +179,23 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. +### Large Repositories + +The Reference Architectures were tested with repositories of varying sizes that follow best practices. + +However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance +of Git and in turn the environment itself (primarily Gitaly) if best practices aren't being followed such as not storing +binary or blob files in LFS. This is due to numerous actions happening under the hood across the whole repo that can be +impactful alone but especially so at larger scales where CPU and Memory resource requirements can jump notably. + +As such, large repositories come with notable cost and typically will require more resources to handle. +It's therefore **strongly** recommended then to review large repositories to ensure they maintain good repo health, +such as the storage of binary files in Git LFS, and reduce their size wherever possible. If this is not possible +Gitaly may require more resources as a result. + +Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) +for more information and guidance. + ### Praefect PostgreSQL It's worth noting that at this time [Praefect requires its own database server](../gitaly/praefect.md#postgresql) and @@ -1147,6 +1165,12 @@ NOTE: Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). For implementations with sharded Gitaly, use the same Gitaly specs. 
Follow the [separate Gitaly documentation](../gitaly/configure_gitaly.md) instead of this section. +NOTE: +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. + The recommended cluster setup includes the following components: - 3 Gitaly nodes: Replicated storage of Git repositories. @@ -1457,9 +1481,10 @@ significantly large repositories or if any [additional workloads](#additional-wo such as [Server Hooks](../server_hooks.md), have been added. NOTE: -Large repositories not following best practices can impact performance notably. -Specific guidance for these can be found in the -[Managing Large Repositories](#managing-large-repositories) section below. +Gitaly has been designed and tested with repositories of varying sizes that follow best practices. +However, large repositories or monorepos not following these practices can significantly +impact Gitaly performance and requirements. +Refer to the [Large Repositories](#large-repositories) for more info. Due to Gitaly having notable input and output requirements, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs @@ -1837,17 +1862,6 @@ If you find that the environment's Sidekiq job processing is slow with long queu more nodes can be added as required. You can also tune your Sidekiq nodes to run [multiple Sidekiq processes](../operations/extra_sidekiq_processes.md). -### Managing Large Repositories - -Large repositories or monorepos can notably impact performance of Git itself as well as the environment -if best practices aren't being followed. - -It's strongly recommended to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS. 
- -Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) -for more information and guidance. -
Back to setup components -- GitLab From 2e3eab07a255aceac71566f615b9558ca73869cb Mon Sep 17 00:00:00 2001 From: Grant Young Date: Mon, 7 Nov 2022 12:07:32 +0000 Subject: [PATCH 3/8] Also add large repo notes for Hybrids --- doc/administration/reference_architectures/10k_users.md | 5 +++-- doc/administration/reference_architectures/25k_users.md | 5 +++-- doc/administration/reference_architectures/2k_users.md | 2 +- doc/administration/reference_architectures/3k_users.md | 5 +++-- doc/administration/reference_architectures/50k_users.md | 5 +++-- doc/administration/reference_architectures/5k_users.md | 5 +++-- 6 files changed, 16 insertions(+), 11 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index e28cc90ca3e394..40c059a5553c67 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -50,7 +50,7 @@ full list of reference architectures, see - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. -6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. 
Refer to the [Large Repositories](#large-repositories) for more info. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. NOTE: @@ -2337,7 +2337,7 @@ services where applicable): | Internal load balancing node3 | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | -| Gitaly5 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | +| Gitaly56 | 3 | 16 vCPU, 60 GB memory | `n1-standard-16` | `m5.4xlarge` | | Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Object storage4 | - | - | - | - | @@ -2355,6 +2355,7 @@ services where applicable): - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. 
However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. NOTE: diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index 28fc7ff2e464ae..d12dcf2a99c64d 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -50,7 +50,7 @@ full list of reference architectures, see - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. -6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: @@ -2356,7 +2356,7 @@ services where applicable): | Internal load balancing node3 | 1 | 4 vCPU, 3.6GB memory | `n1-highcpu-4` | `c5.xlarge` | | Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | -| Gitaly5 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | +| Gitaly56 | 3 | 32 vCPU, 120 GB memory | `n1-standard-32` | `m5.8xlarge` | | Praefect5 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Object storage4 | - | - | - | - | @@ -2374,6 +2374,7 @@ services where applicable): - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index 4f4329b82b6703..09b73f6934480c 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -42,7 +42,7 @@ For a full list of reference architectures, see 3. Can be optionally run on reputable third-party load balancing services (LB PaaS). See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. -5. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. +5. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index dad315ba2919db..dcfa142254c8f3 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -59,7 +59,7 @@ For a full list of reference architectures, see - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. -6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: @@ -2327,7 +2327,7 @@ services where applicable): | PostgreSQL1 | 3 | 2 vCPU, 7.5 GB memory | `n1-standard-2` | `m5.large` | | PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Internal load balancing node3 | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | -| Gitaly5 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | +| Gitaly56 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Object storage4 | - | - | - | - | @@ -2345,6 +2345,7 @@ services where applicable): - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index c86aa450c4b83b..7c66d09c37b727 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -50,7 +50,7 @@ full list of reference architectures, see - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. -6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: @@ -2358,7 +2358,7 @@ services where applicable): | Internal load balancing node3 | 1 | 8 vCPU, 7.2 GB memory | `n1-highcpu-8` | `c5.2xlarge` | | Redis/Sentinel - Cache2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | Redis/Sentinel - Persistent2 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | -| Gitaly5 | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` | `m5.16xlarge` | +| Gitaly56 | 3 | 64 vCPU, 240 GB memory | `n1-standard-64` | `m5.16xlarge` | | Praefect5 | 3 | 4 vCPU, 3.6 GB memory | `n1-highcpu-4` | `c5.xlarge` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Object storage4 | - | - | - | - | @@ -2376,6 +2376,7 @@ services where applicable): - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index 781c5b20356ed4..8e4bcd1dd63947 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -56,7 +56,7 @@ costly-to-operate environment by using the - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. -6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: @@ -2301,7 +2301,7 @@ services where applicable): | PostgreSQL1 | 3 | 4 vCPU, 15 GB memory | `n1-standard-4` | `m5.xlarge` | | PgBouncer1 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Internal load balancing node3 | 1 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | -| Gitaly5 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | +| Gitaly56 | 3 | 8 vCPU, 30 GB memory | `n1-standard-8` | `m5.2xlarge` | | Praefect5 | 3 | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Praefect PostgreSQL1 | 1+ | 2 vCPU, 1.8 GB memory | `n1-highcpu-2` | `c5.large` | | Object storage4 | - | - | - | - | @@ -2319,6 +2319,7 @@ services where applicable): - [Google Cloud Load Balancing](https://cloud.google.com/load-balancing) and [Amazon Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) are known to work. 4. Should be run on reputable Cloud Provider or Self Managed solutions. More information can be found in the [Configure the object storage](#configure-the-object-storage) section. 5. Gitaly Cluster provides the benefits of fault tolerance, but comes with additional complexity of setup and management. Review the existing [technical limitations and considerations before deploying Gitaly Cluster](../gitaly/index.md#before-deploying-gitaly-cluster). If you want sharded Gitaly, use the same specs listed above for `Gitaly`. +6. Gitaly has been designed and tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos that don't follow these practices can significantly impact Gitaly requirements. Refer to the [Large Repositories](#large-repositories) for more info. 
NOTE: -- GitLab From df263e9541bdd04e6e32cbcaa4130d7051b38e6e Mon Sep 17 00:00:00 2001 From: Achilleas Pipinellis Date: Tue, 8 Nov 2022 16:57:22 +0000 Subject: [PATCH 4/8] Apply 33 suggestion(s) to 7 file(s) --- .../reference_architectures/10k_users.md | 20 +++++++++---------- .../reference_architectures/1k_users.md | 6 +++--- .../reference_architectures/25k_users.md | 20 +++++++++---------- .../reference_architectures/2k_users.md | 4 ++-- .../reference_architectures/3k_users.md | 20 +++++++++---------- .../reference_architectures/50k_users.md | 20 +++++++++---------- .../reference_architectures/5k_users.md | 14 ++++++------- 7 files changed, 52 insertions(+), 52 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index 40c059a5553c67..c3c3787d349ccb 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -165,15 +165,15 @@ However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. -### Additional Workloads +### Additional workloads The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, such as security software, you may still need to adjust the specs accordingly to compensate. -This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). +This also applies for some GitLab features where it's possible to run custom scripts, for example [server hooks](../server_hooks.md). 
-As a general rule it's recommended to have robust monitoring in place to measure the impact of +As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. ### Large Repositories @@ -280,12 +280,12 @@ This architecture has been tested and validated with [HAProxy](https://www.hapro as the load balancer. Although other load balancers with similar feature sets could also be used, those load balancers have not been validated. -### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. -We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice. ### Readiness checks @@ -456,12 +456,12 @@ backend praefect Refer to your preferred Load Balancer's documentation for further guidance. -### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection-based load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. -We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice.
@@ -1529,7 +1529,7 @@ requirements that are dependent on data and load. NOTE: Increased specs for Gitaly nodes may be required in some circumstances such as significantly large repositories or if any [additional workloads](#additional-workloads), -such as [Server Hooks](../server_hooks.md), have been added. +such as [server hooks](../server_hooks.md), have been added. NOTE: Gitaly has been designed and tested with repositories of varying sizes that follow best practices. diff --git a/doc/administration/reference_architectures/1k_users.md b/doc/administration/reference_architectures/1k_users.md index 7e88109219ff08..2a9636b6e05e59 100644 --- a/doc/administration/reference_architectures/1k_users.md +++ b/doc/administration/reference_architectures/1k_users.md @@ -88,15 +88,15 @@ However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. -### Additional Workloads +### Additional workloads The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, such as security software, you may still need to adjust the specs accordingly to compensate. -This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). +This also applies for some GitLab features where it's possible to run custom scripts, for example [server hooks](../server_hooks.md). -As a general rule it's recommended to have robust monitoring in place to measure the impact of +As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. 
### Swap diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index d12dcf2a99c64d..df9467ae8bb7c8 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -165,15 +165,15 @@ However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. -### Additional Workloads +### Additional workloads The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, such as security software, you may still need to adjust the specs accordingly to compensate. -This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). +This also applies for some GitLab features where it's possible to run custom scripts, for example [server hooks](../server_hooks.md). -As a general rule it's recommended to have robust monitoring in place to measure the impact of +As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. ### Large Repositories @@ -291,12 +291,12 @@ There are several different options: - [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) and communication is *secure* between the load balancer and the application node. 
-### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. -We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice. ### Readiness checks @@ -467,12 +467,12 @@ backend praefect Refer to your preferred Load Balancer's documentation for further guidance. -### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. -We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice.
@@ -1547,7 +1547,7 @@ requirements that are dependent on data and load. NOTE: Increased specs for Gitaly nodes may be required in some circumstances such as significantly large repositories or if any [additional workloads](#additional-workloads), -such as [Server Hooks](../server_hooks.md), have been added. +such as [server hooks](../server_hooks.md), have been added. NOTE: Gitaly has been designed and tested with repositories of varying sizes that follow best practices. diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index 09b73f6934480c..6e72c0674cfce7 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -175,7 +175,7 @@ several different options: - [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) and communication is *secure* between the load balancer and the application node. -### Balancing Algorithm +### Balancing algorithm We recommend that a least connection based load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. @@ -459,7 +459,7 @@ specifically the number of projects and those projects' sizes. NOTE: Increased specs for Gitaly nodes may be required in some circumstances such as significantly large repositories or if any [additional workloads](#additional-workloads), -such as [Server Hooks](../server_hooks.md), have been added. +such as [server hooks](../server_hooks.md), have been added. NOTE: Gitaly has been designed and tested with repositories of varying sizes that follow best practices. 
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index dcfa142254c8f3..30826d6e79b487 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -171,15 +171,15 @@ However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. -### Additional Workloads +### Additional workloads The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, such as security software, you may still need to adjust the specs accordingly to compensate. -This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). +This also applies for some GitLab features where it's possible to run custom scripts, for example [server hooks](../server_hooks.md). -As a general rule it's recommended to have robust monitoring in place to measure the impact of +As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. ### Large Repositories @@ -292,12 +292,12 @@ There are several different options: - [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) and communication is *secure* between the load balancer and the application node. -### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. 
-We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice. ### Readiness checks @@ -468,12 +468,12 @@ backend praefect Refer to your preferred Load Balancer's documentation for further guidance. -### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. -We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice.
@@ -1481,7 +1481,7 @@ requirements that are dependent on data and load. NOTE: Increased specs for Gitaly nodes may be required in some circumstances such as significantly large repositories or if any [additional workloads](#additional-workloads), -such as [Server Hooks](../server_hooks.md), have been added. +such as [server hooks](../server_hooks.md), have been added. NOTE: Gitaly has been designed and tested with repositories of varying sizes that follow best practices. diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index 7c66d09c37b727..281dce78a3d258 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -165,15 +165,15 @@ However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. -### Additional Workloads +### Additional workloads The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, such as security software, you may still need to adjust the specs accordingly to compensate. -This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). +This also applies for some GitLab features where it's possible to run custom scripts, for example [server hooks](../server_hooks.md). -As a general rule it's recommended to have robust monitoring in place to measure the impact of +As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. 
### Large Repositories @@ -289,12 +289,12 @@ This architecture has been tested and validated with [HAProxy](https://www.hapro as the load balancer. Although other load balancers with similar feature sets could also be used, those load balancers have not been validated. -### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. -We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice. ### Readiness checks @@ -465,12 +465,12 @@ backend praefect Refer to your preferred Load Balancer's documentation for further guidance. -### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. -We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice.
@@ -1542,7 +1542,7 @@ requirements that are dependent on data and load. NOTE: Increased specs for Gitaly nodes may be required in some circumstances such as significantly large repositories or if any [additional workloads](#additional-workloads), -such as [Server Hooks](../server_hooks.md), have been added. +such as [server hooks](../server_hooks.md), have been added. NOTE: Gitaly has been designed and tested with repositories of varying sizes that follow best practices. diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index 8e4bcd1dd63947..63177080805a82 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -289,12 +289,12 @@ There are several different options: - [The load balancer terminates SSL with backend SSL](#load-balancer-terminates-ssl-with-backend-ssl) and communication is *secure* between the load balancer and the application node. -### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. -We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice. ### Readiness checks @@ -465,12 +465,12 @@ backend praefect Refer to your preferred Load Balancer's documentation for further guidance. -### Balancing Algorithm +### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. 
-We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice.
@@ -1478,7 +1478,7 @@ requirements that are dependent on data and load. NOTE: Increased specs for Gitaly nodes may be required in some circumstances such as significantly large repositories or if any [additional workloads](#additional-workloads), -such as [Server Hooks](../server_hooks.md), have been added. +such as [server hooks](../server_hooks.md), have been added. NOTE: Gitaly has been designed and tested with repositories of varying sizes that follow best practices. -- GitLab From 2cc1990890ba3cffa0db34eeade11ef4122c4277 Mon Sep 17 00:00:00 2001 From: Achilleas Pipinellis Date: Tue, 8 Nov 2022 16:58:02 +0000 Subject: [PATCH 5/8] Apply 4 suggestion(s) to 2 file(s) --- doc/administration/reference_architectures/2k_users.md | 10 +++++----- doc/administration/reference_architectures/5k_users.md | 6 +++--- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index 6e72c0674cfce7..a9285f53e4ac18 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -101,15 +101,15 @@ However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. -### Additional Workloads +### Additional workloads The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, such as security software, you may still need to adjust the specs accordingly to compensate. -This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). 
+This also applies for some GitLab features where it's possible to run custom scripts, for example [server hooks](../server_hooks.md). -As a general rule it's recommended to have robust monitoring in place to measure the impact of +As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. ### Large Repositories @@ -177,10 +177,10 @@ several different options: ### Balancing algorithm -We recommend that a least connection based load balancing algorithm or equivalent +We recommend that a least-connection load balancing algorithm or equivalent is used wherever possible to ensure equal spread of calls to the nodes and good performance. -We don't recommend the use specifically of round-robin algorithms as they are known to not +We don't recommend the use of round-robin algorithms as they are known to not spread connections equally in practice. ### Readiness checks diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index 63177080805a82..a3bb4e3f6cfbd4 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -168,15 +168,15 @@ However, this does not constitute a guarantee for every potential permutation. See [Recommended cloud providers and services](index.md#recommended-cloud-providers-and-services) for more information. -### Additional Workloads +### Additional workloads The Reference Architectures have been [designed and tested](index.md#validation-and-test-results) for standard GitLab setups with good headroom in mind to cover most scenarios. However, if any additional workloads are being added on the nodes, such as security software, you may still need to adjust the specs accordingly to compensate. 
-This also applies for some GitLab features where it's possible to run custom scripts, for example [Server Hooks](../server_hooks.md). +This also applies for some GitLab features where it's possible to run custom scripts, for example [server hooks](../server_hooks.md). -As a general rule it's recommended to have robust monitoring in place to measure the impact of +As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. ### Large Repositories -- GitLab From b43908200a07572d214d6819df7ba17b3fbb08bf Mon Sep 17 00:00:00 2001 From: Grant Young Date: Tue, 8 Nov 2022 17:18:11 +0000 Subject: [PATCH 6/8] Adjust large repo guidance further --- .../reference_architectures/10k_users.md | 23 +++++++++++-------- .../reference_architectures/25k_users.md | 23 +++++++++++-------- .../reference_architectures/2k_users.md | 23 +++++++++++-------- .../reference_architectures/3k_users.md | 23 +++++++++++-------- .../reference_architectures/50k_users.md | 23 +++++++++++-------- .../reference_architectures/5k_users.md | 23 +++++++++++-------- 6 files changed, 84 insertions(+), 54 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index c3c3787d349ccb..7997bff5a03c61 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -176,19 +176,24 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. -### Large Repositories +### Large repositories The Reference Architectures were tested with repositories of varying sizes that follow best practices. 
However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance -of Git and in turn the environment itself (primarily Gitaly) if best practices aren't being followed such as not storing -binary or blob files in LFS. This is due to numerous actions happening under the hood across the whole repo that can be -impactful alone but especially so at larger scales where CPU and Memory resource requirements can jump notably. - -As such, large repositories come with notable cost and typically will require more resources to handle. -It's therefore **strongly** recommended then to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS, and reduce their size wherever possible. If this is not possible -Gitaly may require more resources as a result. +of Git and in turn the environment itself if best practices aren't being followed such as not storing +binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging +when they are not optimised. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) +taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and +CI pipelines alike. + +As such, large repositories come with notable cost and typically will require more resources to handle, +significantly so in some cases. It's therefore **strongly** recommended then to review large repositories +to ensure they maintain good repo health and reduce their size wherever possible. + +NOTE: +If best practices aren't followed and large repositories are present on the environment, +increased Gitaly specs may be required to ensure stable performance. Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) for more information and guidance. 
diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index df9467ae8bb7c8..3c938054189f71 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -176,19 +176,24 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. -### Large Repositories +### Large repositories The Reference Architectures were tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance -of Git and in turn the environment itself (primarily Gitaly) if best practices aren't being followed such as not storing -binary or blob files in LFS. This is due to numerous actions happening under the hood across the whole repo that can be -impactful alone but especially so at larger scales where CPU and Memory resource requirements can jump notably. - -As such, large repositories come with notable cost and typically will require more resources to handle. -It's therefore **strongly** recommended then to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS, and reduce their size wherever possible. If this is not possible -Gitaly may require more resources as a result. +of Git and in turn the environment itself if best practices aren't being followed such as not storing +binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging +when they are not optimised. 
Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) +taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and +CI pipelines alike. + +As such, large repositories come with notable cost and typically will require more resources to handle, +significantly so in some cases. It's therefore **strongly** recommended then to review large repositories +to ensure they maintain good repo health and reduce their size wherever possible. + +NOTE: +If best practices aren't followed and large repositories are present on the environment, +increased Gitaly specs may be required to ensure stable performance. Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) for more information and guidance. diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md index a9285f53e4ac18..ce87ff7ca602e9 100644 --- a/doc/administration/reference_architectures/2k_users.md +++ b/doc/administration/reference_architectures/2k_users.md @@ -112,19 +112,24 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. -### Large Repositories +### Large repositories The Reference Architectures were tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance -of Git and in turn the environment itself (primarily Gitaly) if best practices aren't being followed such as not storing -binary or blob files in LFS. 
This is due to numerous actions happening under the hood across the whole repo that can be -impactful alone but especially so at larger scales where CPU and Memory resource requirements can jump notably. - -As such, large repositories come with notable cost and typically will require more resources to handle. -It's therefore **strongly** recommended then to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS, and reduce their size wherever possible. If this is not possible -Gitaly may require more resources as a result. +of Git and in turn the environment itself if best practices aren't being followed such as not storing +binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging +when they are not optimised. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) +taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and +CI pipelines alike. + +As such, large repositories come with notable cost and typically will require more resources to handle, +significantly so in some cases. It's therefore **strongly** recommended then to review large repositories +to ensure they maintain good repo health and reduce their size wherever possible. + +NOTE: +If best practices aren't followed and large repositories are present on the environment, +increased Gitaly specs may be required to ensure stable performance. Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) for more information and guidance. 
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md index 30826d6e79b487..0b38c545573c1d 100644 --- a/doc/administration/reference_architectures/3k_users.md +++ b/doc/administration/reference_architectures/3k_users.md @@ -182,19 +182,24 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. -### Large Repositories +### Large repositories The Reference Architectures were tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance -of Git and in turn the environment itself (primarily Gitaly) if best practices aren't being followed such as not storing -binary or blob files in LFS. This is due to numerous actions happening under the hood across the whole repo that can be -impactful alone but especially so at larger scales where CPU and Memory resource requirements can jump notably. - -As such, large repositories come with notable cost and typically will require more resources to handle. -It's therefore **strongly** recommended then to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS, and reduce their size wherever possible. If this is not possible -Gitaly may require more resources as a result. +of Git and in turn the environment itself if best practices aren't being followed such as not storing +binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging +when they are not optimised. 
Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) +taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and +CI pipelines alike. + +As such, large repositories come with notable cost and typically will require more resources to handle, +significantly so in some cases. It's therefore **strongly** recommended then to review large repositories +to ensure they maintain good repo health and reduce their size wherever possible. + +NOTE: +If best practices aren't followed and large repositories are present on the environment, +increased Gitaly specs may be required to ensure stable performance. Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) for more information and guidance. diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md index 281dce78a3d258..f267d49bf9eb7e 100644 --- a/doc/administration/reference_architectures/50k_users.md +++ b/doc/administration/reference_architectures/50k_users.md @@ -176,19 +176,24 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. -### Large Repositories +### Large repositories The Reference Architectures were tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance -of Git and in turn the environment itself (primarily Gitaly) if best practices aren't being followed such as not storing -binary or blob files in LFS. 
This is due to numerous actions happening under the hood across the whole repo that can be -impactful alone but especially so at larger scales where CPU and Memory resource requirements can jump notably. - -As such, large repositories come with notable cost and typically will require more resources to handle. -It's therefore **strongly** recommended then to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS, and reduce their size wherever possible. If this is not possible -Gitaly may require more resources as a result. +of Git and in turn the environment itself if best practices aren't being followed such as not storing +binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging +when they are not optimised. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) +taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and +CI pipelines alike. + +As such, large repositories come with notable cost and typically will require more resources to handle, +significantly so in some cases. It's therefore **strongly** recommended then to review large repositories +to ensure they maintain good repo health and reduce their size wherever possible. + +NOTE: +If best practices aren't followed and large repositories are present on the environment, +increased Gitaly specs may be required to ensure stable performance. Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) for more information and guidance. 
diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md index a3bb4e3f6cfbd4..0e2efb1d19036f 100644 --- a/doc/administration/reference_architectures/5k_users.md +++ b/doc/administration/reference_architectures/5k_users.md @@ -179,19 +179,24 @@ This also applies for some GitLab features where it's possible to run custom scr As a general rule, it's recommended to have robust monitoring in place to measure the impact of any additional workloads to inform any changes needed to be made. -### Large Repositories +### Large repositories The Reference Architectures were tested with repositories of varying sizes that follow best practices. However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance -of Git and in turn the environment itself (primarily Gitaly) if best practices aren't being followed such as not storing -binary or blob files in LFS. This is due to numerous actions happening under the hood across the whole repo that can be -impactful alone but especially so at larger scales where CPU and Memory resource requirements can jump notably. - -As such, large repositories come with notable cost and typically will require more resources to handle. -It's therefore **strongly** recommended then to review large repositories to ensure they maintain good repo health, -such as the storage of binary files in Git LFS, and reduce their size wherever possible. If this is not possible -Gitaly may require more resources as a result. +of Git and in turn the environment itself if best practices aren't being followed such as not storing +binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging +when they are not optimised. 
Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) +taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and +CI pipelines alike. + +As such, large repositories come with notable cost and typically will require more resources to handle, +significantly so in some cases. It's therefore **strongly** recommended then to review large repositories +to ensure they maintain good repo health and reduce their size wherever possible. + +NOTE: +If best practices aren't followed and large repositories are present on the environment, +increased Gitaly specs may be required to ensure stable performance. Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md) for more information and guidance. -- GitLab From 27e1840872fa4878cafd116d4234946e802fe5db Mon Sep 17 00:00:00 2001 From: Grant Young Date: Tue, 8 Nov 2022 17:26:09 +0000 Subject: [PATCH 7/8] Spelling lint fix --- doc/administration/reference_architectures/10k_users.md | 2 +- doc/administration/reference_architectures/25k_users.md | 2 +- doc/administration/reference_architectures/2k_users.md | 2 +- doc/administration/reference_architectures/3k_users.md | 2 +- doc/administration/reference_architectures/50k_users.md | 2 +- doc/administration/reference_architectures/5k_users.md | 2 +- 6 files changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md index 7997bff5a03c61..1e472360706a11 100644 --- a/doc/administration/reference_architectures/10k_users.md +++ b/doc/administration/reference_architectures/10k_users.md @@ -183,7 +183,7 @@ The Reference Architectures were tested with repositories of varying sizes that However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance of Git and in 
turn the environment itself if best practices aren't being followed such as not storing binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging -when they are not optimised. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) +when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and CI pipelines alike. diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md index 3c938054189f71..1ccaf7d06dd9f1 100644 --- a/doc/administration/reference_architectures/25k_users.md +++ b/doc/administration/reference_architectures/25k_users.md @@ -183,7 +183,7 @@ The Reference Architectures were tested with repositories of varying sizes that However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance of Git and in turn the environment itself if best practices aren't being followed such as not storing binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging -when they are not optimised. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) +when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles) taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and CI pipelines alike. 
diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index ce87ff7ca602e9..599fdc1bc7fbec 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -119,7 +119,7 @@ The Reference Architectures were tested with repositories of varying sizes that
 However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance
 of Git and in turn the environment itself if best practices aren't being followed such as not storing
 binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging
-when they are not optimised. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
+when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
 taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and
 CI pipelines alike.

diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index 0b38c545573c1d..963f4dfef920ae 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -189,7 +189,7 @@ The Reference Architectures were tested with repositories of varying sizes that
 However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance
 of Git and in turn the environment itself if best practices aren't being followed such as not storing
 binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging
-when they are not optimised.
Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
+when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
 taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and
 CI pipelines alike.

diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index f267d49bf9eb7e..79e51f5fb05043 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -183,7 +183,7 @@ The Reference Architectures were tested with repositories of varying sizes that
 However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance
 of Git and in turn the environment itself if best practices aren't being followed such as not storing
 binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging
-when they are not optimised. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
+when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
 taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and
 CI pipelines alike.

diff --git a/doc/administration/reference_architectures/5k_users.md b/doc/administration/reference_architectures/5k_users.md
index 0e2efb1d19036f..c9e70018f9ed74 100644
--- a/doc/administration/reference_architectures/5k_users.md
+++ b/doc/administration/reference_architectures/5k_users.md
@@ -186,7 +186,7 @@ The Reference Architectures were tested with repositories of varying sizes that
 However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance
 of Git and in turn the environment itself if best practices aren't being followed such as not storing
 binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging
-when they are not optimised. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
+when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
 taking longer and consuming high CPU / Memory resources or Git checkouts taking longer that affect both users and
 CI pipelines alike.

-- 
GitLab

From 9f4e71c8185e2e17c488bba4d4270421edeb32c6 Mon Sep 17 00:00:00 2001
From: Achilleas Pipinellis
Date: Wed, 9 Nov 2022 13:16:13 +0000
Subject: [PATCH 8/8] Apply 9 suggestion(s) to 5 file(s)

---
 doc/administration/reference_architectures/10k_users.md | 4 ++--
 doc/administration/reference_architectures/25k_users.md | 4 ++--
 doc/administration/reference_architectures/2k_users.md | 4 ++--
 doc/administration/reference_architectures/3k_users.md | 4 ++--
 doc/administration/reference_architectures/50k_users.md | 2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/doc/administration/reference_architectures/10k_users.md b/doc/administration/reference_architectures/10k_users.md
index 1e472360706a11..8734d8e1503fea 100644
--- a/doc/administration/reference_architectures/10k_users.md
+++ b/doc/administration/reference_architectures/10k_users.md
@@ -180,7 +180,7 @@ any additional workloads to inform any changes needed to be made.

 The Reference Architectures were tested with repositories of varying sizes that follow best practices.

-However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance
+However, large repositories or monorepos (several gigabytes or more) can **significantly** impact the performance
 of Git and in turn the environment itself if best practices aren't being followed such as not storing
 binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging
 when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
@@ -195,7 +195,7 @@ NOTE:
 If best practices aren't followed and large repositories are present on the environment,
 increased Gitaly specs may be required to ensure stable performance.

-Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
+Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
 for more information and guidance.

 ### Praefect PostgreSQL
diff --git a/doc/administration/reference_architectures/25k_users.md b/doc/administration/reference_architectures/25k_users.md
index 1ccaf7d06dd9f1..b9d13165743d3d 100644
--- a/doc/administration/reference_architectures/25k_users.md
+++ b/doc/administration/reference_architectures/25k_users.md
@@ -180,7 +180,7 @@ any additional workloads to inform any changes needed to be made.

 The Reference Architectures were tested with repositories of varying sizes that follow best practices.

-However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance
+However, large repositories or monorepos (several gigabytes or more) can **significantly** impact the performance
 of Git and in turn the environment itself if best practices aren't being followed such as not storing
 binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging
 when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
@@ -195,7 +195,7 @@ NOTE:
 If best practices aren't followed and large repositories are present on the environment,
 increased Gitaly specs may be required to ensure stable performance.

-Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
+Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
 for more information and guidance.

 ### Praefect PostgreSQL
diff --git a/doc/administration/reference_architectures/2k_users.md b/doc/administration/reference_architectures/2k_users.md
index 599fdc1bc7fbec..003cb4a203bb2e 100644
--- a/doc/administration/reference_architectures/2k_users.md
+++ b/doc/administration/reference_architectures/2k_users.md
@@ -116,7 +116,7 @@ any additional workloads to inform any changes needed to be made.

 The Reference Architectures were tested with repositories of varying sizes that follow best practices.

-However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance
+However, large repositories or monorepos (several gigabytes or more) can **significantly** impact the performance
 of Git and in turn the environment itself if best practices aren't being followed such as not storing
 binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging
 when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
@@ -131,7 +131,7 @@ NOTE:
 If best practices aren't followed and large repositories are present on the environment,
 increased Gitaly specs may be required to ensure stable performance.

-Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
+Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
 for more information and guidance.

 ## Setup components
diff --git a/doc/administration/reference_architectures/3k_users.md b/doc/administration/reference_architectures/3k_users.md
index 963f4dfef920ae..ccde764386d160 100644
--- a/doc/administration/reference_architectures/3k_users.md
+++ b/doc/administration/reference_architectures/3k_users.md
@@ -186,7 +186,7 @@ any additional workloads to inform any changes needed to be made.

 The Reference Architectures were tested with repositories of varying sizes that follow best practices.

-However, large repositories or monorepos (Several gigabytes or more) can **significantly** impact the performance
+However, large repositories or monorepos (several gigabytes or more) can **significantly** impact the performance
 of Git and in turn the environment itself if best practices aren't being followed such as not storing
 binary or blob files in LFS. Repositories are at the core of any environment the consequences can be wide-ranging
 when they are not optimized. Some examples of this impact include [Git packing operations](https://git-scm.com/book/en/v2/Git-Internals-Packfiles)
@@ -201,7 +201,7 @@ NOTE:
 If best practices aren't followed and large repositories are present on the environment,
 increased Gitaly specs may be required to ensure stable performance.

-Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
+Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
 for more information and guidance.

 ### Praefect PostgreSQL
diff --git a/doc/administration/reference_architectures/50k_users.md b/doc/administration/reference_architectures/50k_users.md
index 79e51f5fb05043..a4dcaa6402fb7b 100644
--- a/doc/administration/reference_architectures/50k_users.md
+++ b/doc/administration/reference_architectures/50k_users.md
@@ -195,7 +195,7 @@ NOTE:
 If best practices aren't followed and large repositories are present on the environment,
 increased Gitaly specs may be required to ensure stable performance.

-Refer the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
+Refer to the [Managing large repositories documentation](../../user/project/repository/managing_large_repositories.md)
 for more information and guidance.

 ### Praefect PostgreSQL
-- 
GitLab