
Server-side backup metrics

Add Prometheus metrics to track server-side backups.

Proposed metrics

The table below outlines metrics we could add. These were tested locally using the GDK. Each metric can also carry gl_project_path as a label to identify particularly large or troublesome repositories.

Metric Example Notes
Backup duration by phase

image.png

A rolling average of the time spent in each phase of a backup. Backups have four phases:

  • writing refs
  • writing the bundle
  • writing custom hooks
  • committing the manifest
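As an illustration of per-phase timing, here is a minimal stdlib-only Python sketch. It is not the Prometheus client and not the actual implementation. The phase label names, `timed_phase`, `average_seconds`, and `backup_repository` are all hypothetical, and it computes a plain mean where a dashboard would use a rolling window.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical label values, one per phase listed above.
PHASES = ("write_refs", "write_bundle", "write_custom_hooks", "commit_manifest")

# Per-phase running totals: sum of durations and number of observations.
phase_seconds_sum = defaultdict(float)
phase_count = defaultdict(int)


@contextmanager
def timed_phase(phase):
    """Record how long the wrapped block takes under the given phase label."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Runs even if the phase raises, so failed phases are timed too.
        phase_seconds_sum[phase] += time.monotonic() - start
        phase_count[phase] += 1


def average_seconds(phase):
    """Mean duration of a phase over all recorded observations."""
    if phase_count[phase] == 0:
        return 0.0
    return phase_seconds_sum[phase] / phase_count[phase]


def backup_repository():
    # Each phase body is a stand-in; a real backup would do the work here.
    for phase in PHASES:
        with timed_phase(phase):
            pass
```

Keeping a sum and a count per phase is the same pair of values a histogram or summary metric exposes, so an average over any window can be derived later at query time.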

BackupRepository RPC response codes

image.png

Rate of RPC responses grouped by response code. BackupRepository emits the following codes:

  • OK
  • NotFound (for skipped backups)
  • Internal (for errors)
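The mapping from backup outcome to response code could be counted along these lines. This is a stdlib-only Python sketch under stated assumptions: `RepositorySkipped`, `record_backup`, and `rpc_responses` are hypothetical names, and how a skipped backup is actually signalled is not specified in this issue.

```python
from collections import Counter


class RepositorySkipped(Exception):
    """Hypothetical signal that a backup was skipped."""


# Running count of responses, keyed by response code.
rpc_responses = Counter()


def record_backup(backup_fn):
    """Run one backup and count the resulting response code."""
    try:
        backup_fn()
        code = "OK"
    except RepositorySkipped:
        code = "NotFound"  # skipped backups
    except Exception:
        code = "Internal"  # any other error
    rpc_responses[code] += 1
    return code
```

A rate per code can then be derived from how these counts change between two points in time.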

BackupRepository RPC response time

image.png

A rolling average of the RPC response time, which closely approximates the time taken to back up a single repository.

Bundle upload rate

image.png

Upload rate in MB/s of bundle files into object storage.
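For reference, an upload rate in MB/s can be derived from two samples of a running byte counter. The Python sketch below assumes 1 MB = 1,000,000 bytes and uses a hypothetical `rate_mb_per_s` helper; the actual metric may be computed differently.

```python
def rate_mb_per_s(sample_a, sample_b):
    """Rate between two (timestamp_seconds, total_bytes_uploaded) samples."""
    (t0, b0), (t1, b1) = sample_a, sample_b
    if t1 <= t0:
        raise ValueError("second sample must be later than the first")
    # Bytes uploaded in the interval, divided by its length, converted to MB.
    return (b1 - b0) / (t1 - t0) / 1_000_000
```

For example, `rate_mb_per_s((0, 0), (10, 50_000_000))` gives 5.0, that is, 50 MB uploaded over 10 seconds.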

Bundle uploads by size

image.png

A running count of bundles uploaded, grouped by size. Each row shows the number of bundles whose size falls within that bucket. For example, 186 bundles smaller than 10 MB were uploaded.
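Counting bundles by size bucket might look like the following stdlib-only Python sketch. The bucket boundaries and labels are hypothetical (only the 10 MB bucket appears in the example above), as are the `observe_bundle` and `bundles_by_size` names.

```python
import bisect
from collections import Counter

MB = 1_000_000

# Hypothetical exclusive upper bounds for each bucket, in bytes.
BUCKET_BOUNDS = [10 * MB, 100 * MB, 1000 * MB]
# One label per bucket, plus a final label for everything above the last bound.
BUCKET_LABELS = ["<10MB", "<100MB", "<1GB", ">=1GB"]

# Running count of uploaded bundles, keyed by bucket label.
bundles_by_size = Counter()


def observe_bundle(size_bytes):
    """Count one uploaded bundle in the bucket its size falls into."""
    # bisect_right returns the index of the first bound strictly greater
    # than size_bytes, so a 10 MB bundle lands in "<100MB", not "<10MB".
    index = bisect.bisect_right(BUCKET_BOUNDS, size_bytes)
    label = BUCKET_LABELS[index]
    bundles_by_size[label] += 1
    return label
```

Each bundle is counted in exactly one bucket here, matching the per-row description above. Prometheus histograms instead use cumulative buckets, where each bucket also includes all smaller observations.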

Not sure how useful this graph will be in practice.

Implementation plan

Edited by James Liu