From 83d94d1d23d7bc07ebe1e925ea83c058fc917bf3 Mon Sep 17 00:00:00 2001 From: Mike Kozono Date: Fri, 30 Jun 2023 13:30:55 -1000 Subject: [PATCH 1/9] Document back up of large reference architectures --- .../backup_restore/backup_gitlab.md | 31 +++++- .../backup_large_reference_architectures.md | 99 +++++++++++++++++++ 2 files changed, 128 insertions(+), 2 deletions(-) create mode 100644 doc/administration/backup_restore/backup_large_reference_architectures.md diff --git a/doc/administration/backup_restore/backup_gitlab.md b/doc/administration/backup_restore/backup_gitlab.md index 24b7b453517a69..ab220346f9a5af 100644 --- a/doc/administration/backup_restore/backup_gitlab.md +++ b/doc/administration/backup_restore/backup_gitlab.md @@ -18,15 +18,18 @@ As a rough guideline, if you are using a [1k reference architecture](../referenc ## Scaling backups -As the volume of GitLab data grows, the [backup command](#backup-command) takes longer to execute. At some point, the execution time becomes impractical. For example, it can take 24 hours or more. +As the volume of GitLab data grows, the [backup command](#backup-command) takes longer to execute. [Backup options](#backup-options) such as [back up Git repositories concurrently](#back-up-git-repositories-concurrently) and [incremental repository backups](#incremental-repository-backups) can help to reduce execution time. At some point, the backup command becomes impractical by itself. For example, it can take 24 hours or more. -For more information, see [alternative backup strategies](#alternative-backup-strategies). +In some cases, a different architecture may be warranted to allow backups to scale. For more information, see [alternative backup strategies](#alternative-backup-strategies). + +If you are using a GitLab reference architecture, see [Back up and restore large reference architectures](backup_large_reference_architectures.md). ## What data needs to be backed up? 
- [PostgreSQL databases](#postgresql-databases) - [Git repositories](#git-repositories) - [Blobs](#blobs) +- [Container Registry](#container-registry) - [Configuration files](#storing-configuration-files) - [Other data](#other-data) @@ -87,6 +90,30 @@ The [backup command](#backup-command) doesn't back up blobs that aren't stored o - [Amazon S3 backups](https://docs.aws.amazon.com/aws-backup/latest/devguide/s3-backups.html) - [Google Cloud Storage Transfer Service](https://cloud.google.com/storage-transfer-service) and [Google Cloud Storage Object Versioning](https://cloud.google.com/storage/docs/object-versioning) +### Container Registry + +[GitLab Container Registry](../packages/container_registry.md) storage can be configured in either: + +- The file system in a specific location. +- An [Object Storage](../object_storage.md) solution. Object Storage solutions can be: + - Cloud based like Amazon S3 and Google Cloud Storage. + - Hosted by you (like MinIO). + - A Storage Appliance that exposes an Object Storage-compatible API. + +The Container Registry stores two categories of data in its configured storage backend. + +- Blobs, also known as images +- Repositories, also known as metadata + +The back up command will back up both categories of data when they are stored in the default location on the file system. + +#### Object storage + +The [backup command](#backup-command) doesn't back up blobs that aren't stored on the file system. If you're using [object storage](../object_storage.md), be sure to enable backups with your object storage provider. 
For example, see: + +- [Amazon S3 backups](https://docs.aws.amazon.com/aws-backup/latest/devguide/s3-backups.html) +- [Google Cloud Storage Transfer Service](https://cloud.google.com/storage-transfer-service) and [Google Cloud Storage Object Versioning](https://cloud.google.com/storage/docs/object-versioning) + ### Storing configuration files WARNING: diff --git a/doc/administration/backup_restore/backup_large_reference_architectures.md b/doc/administration/backup_restore/backup_large_reference_architectures.md new file mode 100644 index 00000000000000..79fa86484c7a6a --- /dev/null +++ b/doc/administration/backup_restore/backup_large_reference_architectures.md @@ -0,0 +1,99 @@ +--- +stage: Systems +group: Geo +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments +--- + +# Back up and restore large reference architectures **(FREE SELF)** + +This document describes how to: + +- [Configure daily backups](#configure-daily-backups) +- Take a backup now (planned) +- Restore a backup (planned) + +This document is intended for environments using: + +- [GitLab reference architectures 3,000 users and up](../reference_architectures/index.md) +- Highly-automated deployment tooling such as [GitLab Environment Toolkit](https://gitlab.com/gitlab-org/gitlab-environment-toolkit) +- [Amazon RDS](https://aws.amazon.com/rds/) for PostgreSQL data +- [Amazon S3](https://aws.amazon.com/s3/) for object storage +- [Object storage](../object_storage.md) to store everything possible, including [blobs](backup_gitlab.md#blobs) and [Container Registry](backup_gitlab.md#container-registry) + +## Configure daily backups + +### Configure backup of PostgreSQL and object storage data + +The [backup command](backup_gitlab.md) uses `pg_dump`, which is [not appropriate for databases over 100 GB](backup_gitlab.md#postgresql-databases). 
You must choose a PostgreSQL solution which has native, robust backup capabilities. + +[Object storage](../object_storage.md), [not NFS](../nfs.md) is recommended for storing GitLab data, including [blobs](backup_gitlab.md#blobs) and [Container registry](backup_gitlab.md#container-registry). + +1. [Configure AWS Backup](https://docs.aws.amazon.com/aws-backup/latest/devguide/creating-a-backup-plan.html) to back up both RDS and S3 data. For maximum protection, [configure continuous backups as well as snapshot backups](https://docs.aws.amazon.com/aws-backup/latest/devguide/point-in-time-recovery.html). +1. Configure AWS Backup to copy backups to a separate region. When AWS takes a backup, the backup can only be restored in the region the backup is stored. +1. After AWS Backup has run at least one scheduled backup, then you can [create an on-demand backup](https://docs.aws.amazon.com/aws-backup/latest/devguide/recov-point-create-on-demand-backup.html) as needed. + +### Configure backup of Git repositories + +NOTE: +There is a feature proposal to add the ability to back up repositories directly from Gitaly to object storage. See [epic #10077](https://gitlab.com/groups/gitlab-org/-/epics/10077). + +::Tabs + +:::TabTitle Linux package (Omnibus) + +We will continue to use the [backup command](backup_gitlab.md#backup-command) to back up Git repositories. + +If utilization is low enough, you can run it from an existing GitLab Rails node. Otherwise, spin up another node. + +:::TabTitle Cloud native hybrid + +[The `backup-utility` command in a `toolbox` pod will fail when there is a large amount of data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1352989908). In this case, you must run the [backup command](backup_gitlab.md#backup-command) to back up Git repositories, and you need to run it in a VM running the GitLab Linux package. + +1. Spin up a VM with 8 vCPU and 7.2 GB memory. This node will be used to back up Git repositories. 
Note: + - [A Praefect node cannot be used to back up Git data at this time](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1385950340). +1. Configure the node as another **GitLab Rails** node as defined in your [reference architecture](../reference_architectures/index.md). Use the [GitLab Environment Toolkit `gitlab_rails.yml` playbook](https://gitlab.com/gitlab-org/gitlab-environment-toolkit/-/blob/2.8.5/ansible/playbooks/gitlab_rails.yml). As with other GitLab Rails nodes, this node must have access to your main Postgres database as well as to Gitaly Cluster. + +::EndTabs + +1. The backup node will copy all of the environment's Git data, so ensure that it has enough attached storage. For example, you need at least as much storage as one node in a Gitaly Cluster. Without Gitaly Cluster, you need at least as much storage as all Gitaly nodes. Keep in mind that Git repository backups can be significantly larger than Gitaly storage usage because forks are deduplicated in Gitaly but not in backups. +1. SSH into the GitLab Rails node. +1. [Configure uploading backups to remote cloud storage](backup_gitlab.md#upload-backups-to-a-remote-cloud-storage). +1. [Configure AWS Backup](https://docs.aws.amazon.com/aws-backup/latest/devguide/creating-a-backup-plan.html) for this bucket. Or use a bucket in the same account and region as your production data object storage buckets, and ensure this bucket is included in your [preexisting AWS Backup](#configure-backup-of-postgresql-and-object-storage-data). +1. Run the [backup command](backup_gitlab.md#backup-command), skipping PostgreSQL data: + + ```shell + sudo gitlab-backup create SKIP=db + ``` + + The resulting tar file will include only the Git repositories and some metadata. Blobs such as uploads, artifacts, and LFS do not need to be explicitly skipped, because the command does not back up object storage by default. 
The tar file will be created in the [`/var/opt/gitlab/backups` directory](https://docs.gitlab.com/omnibus/settings/backups.html#creating-an-application-backup) and [the filename will end in `_gitlab_backup.tar`](backup_gitlab.md#backup-timestamp). + + Since we configured uploading backups to remote cloud storage, the tar file will be uploaded to the remote region and deleted from disk. + +1. Note the [timestamp](backup_gitlab.md#backup-timestamp) of the backup file for the next step. For example, if the backup name is `1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar`, the timestamp is `1493107454_2018_04_25_10.6.4-ce`. +1. Run the [backup command](backup_gitlab.md#backup-command) again, this time specifying [incremental backup of Git repositories](backup_gitlab.md#incremental-repository-backups), and the timestamp of the source backup file. Using the example timestamp from the previous step, the command is: + + ```shell + sudo gitlab-backup create SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=1493107454_2018_04_25_10.6.4-ce + ``` + +1. Check that the incremental backup succeeded and was uploaded to object storage. +1. [Configure cron to make daily backups](backup_gitlab.md#configuring-cron-to-make-daily-backups). Edit the crontab for the `root` user: + + ```shell + sudo su - + crontab -e + ``` + +1. There, add the following line to schedule the backup for every day at 2 AM: + + ```plaintext + 0 2 * * * /opt/gitlab/bin/gitlab-backup create SKIP=db INCREMENTAL=yes PREVIOUS_BACKUP=1493107454_2018_04_25_10.6.4-ce CRON=1 + ``` + +### Configure backup of configuration files + +We strongly recommend using rigorous automation tools such as [Terraform](https://www.terraform.io/) and [Ansible](https://www.ansible.com/) to administer large GitLab environments. [GitLab Environment Toolkit](https://gitlab.com/gitlab-org/gitlab-environment-toolkit) is a good example. You may choose to build up your own deployment tool and use it as a reference. 
+ +Following this approach, your configuration files and secrets should already exist in secure, canonical locations outside of the production VMs or pods. This document does not cover backing up that data. + +As an example, you can store secrets in [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) and pull them into your [Terraform configuration files](https://gitlab.com/gitlab-org/gitlab-environment-toolkit/-/blob/main/docs/environment_provision.md#terraform-data-sources). [AWS Secrets Manager can be configured to replicate to multiple regions](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create-manage-multi-region-secrets.html). -- GitLab From 613467f94514988646f4ab966c5007d57a6735c4 Mon Sep 17 00:00:00 2001 From: Michael Kozono Date: Sat, 15 Jul 2023 05:21:09 +0000 Subject: [PATCH 2/9] Apply tweaks --- doc/administration/backup_restore/backup_gitlab.md | 4 ++-- .../backup_restore/backup_large_reference_architectures.md | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/administration/backup_restore/backup_gitlab.md b/doc/administration/backup_restore/backup_gitlab.md index ab220346f9a5af..a1d1c66e750f21 100644 --- a/doc/administration/backup_restore/backup_gitlab.md +++ b/doc/administration/backup_restore/backup_gitlab.md @@ -20,9 +20,9 @@ As a rough guideline, if you are using a [1k reference architecture](../referenc As the volume of GitLab data grows, the [backup command](#backup-command) takes longer to execute. [Backup options](#backup-options) such as [back up Git repositories concurrently](#back-up-git-repositories-concurrently) and [incremental repository backups](#incremental-repository-backups) can help to reduce execution time. At some point, the backup command becomes impractical by itself. For example, it can take 24 hours or more. -In some cases, a different architecture may be warranted to allow backups to scale. 
For more information, see [alternative backup strategies](#alternative-backup-strategies). +In some cases, architecture changes may be warranted to allow backups to scale. If you are using a GitLab reference architecture, see [Back up and restore large reference architectures](backup_large_reference_architectures.md). -If you are using a GitLab reference architecture, see [Back up and restore large reference architectures](backup_large_reference_architectures.md). +For more information, see [alternative backup strategies](#alternative-backup-strategies). ## What data needs to be backed up? diff --git a/doc/administration/backup_restore/backup_large_reference_architectures.md b/doc/administration/backup_restore/backup_large_reference_architectures.md index 79fa86484c7a6a..4182c89870153c 100644 --- a/doc/administration/backup_restore/backup_large_reference_architectures.md +++ b/doc/administration/backup_restore/backup_large_reference_architectures.md @@ -14,7 +14,7 @@ This document describes how to: This document is intended for environments using: -- [GitLab reference architectures 3,000 users and up](../reference_architectures/index.md) +- [Linux package (Omnibus) and cloud-native hybrid reference architectures 3,000 users and up](../reference_architectures/index.md) - Highly-automated deployment tooling such as [GitLab Environment Toolkit](https://gitlab.com/gitlab-org/gitlab-environment-toolkit) - [Amazon RDS](https://aws.amazon.com/rds/) for PostgreSQL data - [Amazon S3](https://aws.amazon.com/s3/) for object storage -- GitLab From bbbc769ef1d07c1c27cf9e200fc1ac23677c940e Mon Sep 17 00:00:00 2001 From: Hayley Swimelar Date: Mon, 17 Jul 2023 22:29:51 +0000 Subject: [PATCH 3/9] Remove unhelpful content --- doc/administration/backup_restore/backup_gitlab.md | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/doc/administration/backup_restore/backup_gitlab.md b/doc/administration/backup_restore/backup_gitlab.md index 
a1d1c66e750f21..1e8b58f3befd8e 100644 --- a/doc/administration/backup_restore/backup_gitlab.md +++ b/doc/administration/backup_restore/backup_gitlab.md @@ -100,12 +100,7 @@ The [backup command](#backup-command) doesn't back up blobs that aren't stored o - Hosted by you (like MinIO). - A Storage Appliance that exposes an Object Storage-compatible API. -The Container Registry stores two categories of data in its configured storage backend. - -- Blobs, also known as images -- Repositories, also known as metadata - -The back up command will back up both categories of data when they are stored in the default location on the file system. +The back up command will back up registry data when they are stored in the default location on the file system. #### Object storage -- GitLab From 480de99b384a0ef0eee70cfea8615973cf815d94 Mon Sep 17 00:00:00 2001 From: Achilleas Pipinellis Date: Tue, 25 Jul 2023 11:01:43 +0000 Subject: [PATCH 4/9] Apply 4 suggestion(s) to 1 file(s) --- .../backup_large_reference_architectures.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/administration/backup_restore/backup_large_reference_architectures.md b/doc/administration/backup_restore/backup_large_reference_architectures.md index 4182c89870153c..f71bb9dbf2ce45 100644 --- a/doc/administration/backup_restore/backup_large_reference_architectures.md +++ b/doc/administration/backup_restore/backup_large_reference_architectures.md @@ -35,7 +35,7 @@ The [backup command](backup_gitlab.md) uses `pg_dump`, which is [not appropriate ### Configure backup of Git repositories NOTE: -There is a feature proposal to add the ability to back up repositories directly from Gitaly to object storage. See [epic #10077](https://gitlab.com/groups/gitlab-org/-/epics/10077). +There is a feature proposal to add the ability to back up repositories directly from Gitaly to object storage. See [epic 10077](https://gitlab.com/groups/gitlab-org/-/epics/10077). 
::Tabs @@ -47,10 +47,10 @@ If utilization is low enough, you can run it from an existing GitLab Rails node. :::TabTitle Cloud native hybrid -[The `backup-utility` command in a `toolbox` pod will fail when there is a large amount of data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1352989908). In this case, you must run the [backup command](backup_gitlab.md#backup-command) to back up Git repositories, and you need to run it in a VM running the GitLab Linux package. +[The `backup-utility` command in a `toolbox` pod fails when there is a large amount of data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1352989908). In this case, you must run the [backup command](backup_gitlab.md#backup-command) to back up Git repositories, and you must run it in a VM running the GitLab Linux package: -1. Spin up a VM with 8 vCPU and 7.2 GB memory. This node will be used to back up Git repositories. Note: - - [A Praefect node cannot be used to back up Git data at this time](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1385950340). +1. Spin up a VM with 8 vCPU and 7.2 GB memory. This node will be used to back up Git repositories. Note that + [a Praefect node cannot be used to back up Git data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1385950340). 1. Configure the node as another **GitLab Rails** node as defined in your [reference architecture](../reference_architectures/index.md). Use the [GitLab Environment Toolkit `gitlab_rails.yml` playbook](https://gitlab.com/gitlab-org/gitlab-environment-toolkit/-/blob/2.8.5/ansible/playbooks/gitlab_rails.yml). As with other GitLab Rails nodes, this node must have access to your main Postgres database as well as to Gitaly Cluster. ::EndTabs @@ -58,7 +58,7 @@ If utilization is low enough, you can run it from an existing GitLab Rails node. 1. The backup node will copy all of the environment's Git data, so ensure that it has enough attached storage. 
For example, you need at least as much storage as one node in a Gitaly Cluster. Without Gitaly Cluster, you need at least as much storage as all Gitaly nodes. Keep in mind that Git repository backups can be significantly larger than Gitaly storage usage because forks are deduplicated in Gitaly but not in backups. 1. SSH into the GitLab Rails node. 1. [Configure uploading backups to remote cloud storage](backup_gitlab.md#upload-backups-to-a-remote-cloud-storage). -1. [Configure AWS Backup](https://docs.aws.amazon.com/aws-backup/latest/devguide/creating-a-backup-plan.html) for this bucket. Or use a bucket in the same account and region as your production data object storage buckets, and ensure this bucket is included in your [preexisting AWS Backup](#configure-backup-of-postgresql-and-object-storage-data). +1. [Configure AWS Backup](https://docs.aws.amazon.com/aws-backup/latest/devguide/creating-a-backup-plan.html) for this bucket, or use a bucket in the same account and region as your production data object storage buckets, and ensure this bucket is included in your [preexisting AWS Backup](#configure-backup-of-postgresql-and-object-storage-data). 1. 
Run the [backup command](backup_gitlab.md#backup-command), skipping PostgreSQL data: ```shell -- GitLab From 5a73a4a3f94ef136cb6ccc53f6e7038664bf81b3 Mon Sep 17 00:00:00 2001 From: Achilleas Pipinellis Date: Tue, 25 Jul 2023 11:04:23 +0000 Subject: [PATCH 5/9] Apply 2 suggestion(s) to 1 file(s) --- .../backup_large_reference_architectures.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/doc/administration/backup_restore/backup_large_reference_architectures.md b/doc/administration/backup_restore/backup_large_reference_architectures.md index f71bb9dbf2ce45..2ade7adec9b6ab 100644 --- a/doc/administration/backup_restore/backup_large_reference_architectures.md +++ b/doc/administration/backup_restore/backup_large_reference_architectures.md @@ -37,21 +37,19 @@ The [backup command](backup_gitlab.md) uses `pg_dump`, which is [not appropriate NOTE: There is a feature proposal to add the ability to back up repositories directly from Gitaly to object storage. See [epic 10077](https://gitlab.com/groups/gitlab-org/-/epics/10077). -::Tabs +- Linux package (Omnibus): -:::TabTitle Linux package (Omnibus) + We will continue to use the [backup command](backup_gitlab.md#backup-command) to back up Git repositories. -We will continue to use the [backup command](backup_gitlab.md#backup-command) to back up Git repositories. + If utilization is low enough, you can run it from an existing GitLab Rails node. Otherwise, spin up another node. -If utilization is low enough, you can run it from an existing GitLab Rails node. Otherwise, spin up another node. +- Cloud native hybrid: -:::TabTitle Cloud native hybrid + [The `backup-utility` command in a `toolbox` pod fails when there is a large amount of data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1352989908). 
In this case, you must run the [backup command](backup_gitlab.md#backup-command) to back up Git repositories, and you must run it in a VM running the GitLab Linux package: -[The `backup-utility` command in a `toolbox` pod fails when there is a large amount of data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1352989908). In this case, you must run the [backup command](backup_gitlab.md#backup-command) to back up Git repositories, and you must run it in a VM running the GitLab Linux package: - -1. Spin up a VM with 8 vCPU and 7.2 GB memory. This node will be used to back up Git repositories. Note that - [a Praefect node cannot be used to back up Git data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1385950340). -1. Configure the node as another **GitLab Rails** node as defined in your [reference architecture](../reference_architectures/index.md). Use the [GitLab Environment Toolkit `gitlab_rails.yml` playbook](https://gitlab.com/gitlab-org/gitlab-environment-toolkit/-/blob/2.8.5/ansible/playbooks/gitlab_rails.yml). As with other GitLab Rails nodes, this node must have access to your main Postgres database as well as to Gitaly Cluster. + 1. Spin up a VM with 8 vCPU and 7.2 GB memory. This node will be used to back up Git repositories. Note that + [a Praefect node cannot be used to back up Git data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1385950340). + 1. Configure the node as another **GitLab Rails** node as defined in your [reference architecture](../reference_architectures/index.md). Use the [GitLab Environment Toolkit `gitlab_rails.yml` playbook](https://gitlab.com/gitlab-org/gitlab-environment-toolkit/-/blob/2.8.5/ansible/playbooks/gitlab_rails.yml). As with other GitLab Rails nodes, this node must have access to your main Postgres database as well as to Gitaly Cluster. 
::EndTabs -- GitLab From c049791f0f81a176686d52ecfe073cc49bd74e5e Mon Sep 17 00:00:00 2001 From: Achilleas Pipinellis Date: Tue, 25 Jul 2023 11:04:46 +0000 Subject: [PATCH 6/9] Apply 1 suggestion(s) to 1 file(s) --- .../backup_restore/backup_large_reference_architectures.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/doc/administration/backup_restore/backup_large_reference_architectures.md b/doc/administration/backup_restore/backup_large_reference_architectures.md index 2ade7adec9b6ab..cc9a2be8eab0c8 100644 --- a/doc/administration/backup_restore/backup_large_reference_architectures.md +++ b/doc/administration/backup_restore/backup_large_reference_architectures.md @@ -51,8 +51,6 @@ There is a feature proposal to add the ability to back up repositories directly [a Praefect node cannot be used to back up Git data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1385950340). 1. Configure the node as another **GitLab Rails** node as defined in your [reference architecture](../reference_architectures/index.md). Use the [GitLab Environment Toolkit `gitlab_rails.yml` playbook](https://gitlab.com/gitlab-org/gitlab-environment-toolkit/-/blob/2.8.5/ansible/playbooks/gitlab_rails.yml). As with other GitLab Rails nodes, this node must have access to your main Postgres database as well as to Gitaly Cluster. -::EndTabs - 1. The backup node will copy all of the environment's Git data, so ensure that it has enough attached storage. For example, you need at least as much storage as one node in a Gitaly Cluster. Without Gitaly Cluster, you need at least as much storage as all Gitaly nodes. Keep in mind that Git repository backups can be significantly larger than Gitaly storage usage because forks are deduplicated in Gitaly but not in backups. 1. SSH into the GitLab Rails node. 1. [Configure uploading backups to remote cloud storage](backup_gitlab.md#upload-backups-to-a-remote-cloud-storage). 
-- GitLab From 38a26a27bad97642fc338087516fc20713cfb029 Mon Sep 17 00:00:00 2001 From: Achilleas Pipinellis Date: Tue, 25 Jul 2023 11:05:36 +0000 Subject: [PATCH 7/9] Apply 1 suggestion(s) to 1 file(s) --- .../backup_restore/backup_large_reference_architectures.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/doc/administration/backup_restore/backup_large_reference_architectures.md b/doc/administration/backup_restore/backup_large_reference_architectures.md index cc9a2be8eab0c8..cc6891c491005a 100644 --- a/doc/administration/backup_restore/backup_large_reference_architectures.md +++ b/doc/administration/backup_restore/backup_large_reference_architectures.md @@ -51,7 +51,10 @@ There is a feature proposal to add the ability to back up repositories directly [a Praefect node cannot be used to back up Git data](https://gitlab.com/gitlab-org/gitlab/-/issues/396343#note_1385950340). 1. Configure the node as another **GitLab Rails** node as defined in your [reference architecture](../reference_architectures/index.md). Use the [GitLab Environment Toolkit `gitlab_rails.yml` playbook](https://gitlab.com/gitlab-org/gitlab-environment-toolkit/-/blob/2.8.5/ansible/playbooks/gitlab_rails.yml). As with other GitLab Rails nodes, this node must have access to your main Postgres database as well as to Gitaly Cluster. -1. The backup node will copy all of the environment's Git data, so ensure that it has enough attached storage. For example, you need at least as much storage as one node in a Gitaly Cluster. Without Gitaly Cluster, you need at least as much storage as all Gitaly nodes. Keep in mind that Git repository backups can be significantly larger than Gitaly storage usage because forks are deduplicated in Gitaly but not in backups. +The backup node will copy all of the environment's Git data, so ensure that it has enough attached storage. For example, you need at least as much storage as one node in a Gitaly Cluster. 
Without Gitaly Cluster, you need at least as much storage as all Gitaly nodes. Keep in mind that Git repository backups can be significantly larger than Gitaly storage usage because forks are deduplicated in Gitaly but not in backups. + +To back up the Git repositories: + 1. SSH into the GitLab Rails node. 1. [Configure uploading backups to remote cloud storage](backup_gitlab.md#upload-backups-to-a-remote-cloud-storage). 1. [Configure AWS Backup](https://docs.aws.amazon.com/aws-backup/latest/devguide/creating-a-backup-plan.html) for this bucket, or use a bucket in the same account and region as your production data object storage buckets, and ensure this bucket is included in your [preexisting AWS Backup](#configure-backup-of-postgresql-and-object-storage-data). -- GitLab From 1434b421f23f1ec68ea680942370fcc0827a5223 Mon Sep 17 00:00:00 2001 From: Achilleas Pipinellis Date: Tue, 25 Jul 2023 11:06:16 +0000 Subject: [PATCH 8/9] Apply 1 suggestion(s) to 1 file(s) --- doc/administration/backup_restore/backup_gitlab.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/administration/backup_restore/backup_gitlab.md b/doc/administration/backup_restore/backup_gitlab.md index 1e8b58f3befd8e..5f396947341a90 100644 --- a/doc/administration/backup_restore/backup_gitlab.md +++ b/doc/administration/backup_restore/backup_gitlab.md @@ -100,7 +100,7 @@ The [backup command](#backup-command) doesn't back up blobs that aren't stored o - Hosted by you (like MinIO). - A Storage Appliance that exposes an Object Storage-compatible API. -The back up command will back up registry data when they are stored in the default location on the file system. +The backup command backs up registry data when they are stored in the default location on the file system. 
#### Object storage -- GitLab From 4481ca482839a5b9b1d8eecfe035d2d29b83aa1e Mon Sep 17 00:00:00 2001 From: Achilleas Pipinellis Date: Tue, 25 Jul 2023 20:10:54 +0000 Subject: [PATCH 9/9] Apply 1 suggestion(s) to 1 file(s) --- .../backup_restore/backup_large_reference_architectures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/administration/backup_restore/backup_large_reference_architectures.md b/doc/administration/backup_restore/backup_large_reference_architectures.md index cc6891c491005a..b8983613f56a3d 100644 --- a/doc/administration/backup_restore/backup_large_reference_architectures.md +++ b/doc/administration/backup_restore/backup_large_reference_architectures.md @@ -26,7 +26,7 @@ This document is intended for environments using: The [backup command](backup_gitlab.md) uses `pg_dump`, which is [not appropriate for databases over 100 GB](backup_gitlab.md#postgresql-databases). You must choose a PostgreSQL solution which has native, robust backup capabilities. -[Object storage](../object_storage.md), [not NFS](../nfs.md) is recommended for storing GitLab data, including [blobs](backup_gitlab.md#blobs) and [Container registry](backup_gitlab.md#container-registry). +[Object storage](../object_storage.md) ([not NFS](../nfs.md)) is recommended for storing GitLab data, including [blobs](backup_gitlab.md#blobs) and [Container registry](backup_gitlab.md#container-registry). 1. [Configure AWS Backup](https://docs.aws.amazon.com/aws-backup/latest/devguide/creating-a-backup-plan.html) to back up both RDS and S3 data. For maximum protection, [configure continuous backups as well as snapshot backups](https://docs.aws.amazon.com/aws-backup/latest/devguide/point-in-time-recovery.html). 1. Configure AWS Backup to copy backups to a separate region. When AWS takes a backup, the backup can only be restored in the region the backup is stored. -- GitLab
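
Supplementary note on the timestamp step in the new `backup_large_reference_architectures.md` page: the value passed to `PREVIOUS_BACKUP` is the backup filename with the `_gitlab_backup.tar` suffix removed. The following is a minimal sketch of deriving it in a POSIX shell, assuming the filename follows that pattern. The function name `backup_timestamp` is hypothetical and is not part of GitLab or the `gitlab-backup` tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of GitLab): strip the directory and the
# `_gitlab_backup.tar` suffix from a backup filename to get its timestamp.
backup_timestamp() {
  file=$(basename "$1")
  printf '%s\n' "${file%_gitlab_backup.tar}"
}

# Example using the backup name quoted in the documentation.
backup_timestamp /var/opt/gitlab/backups/1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar
```

The printed value could then be supplied as `PREVIOUS_BACKUP=...` in the incremental backup command shown in the patch. Because the tar file is deleted from disk after upload, the path here is only illustrative; the same function works on an object name listed from the remote bucket.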