
CI: improve observability of job caches

What

  1. Computes size information for all caches. Sizes are for uncompressed caches, so they differ from the sizes shown in the GCP buckets.
  2. Prints this cache size information in the job logs.
  3. Sends the data to Datadog.
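Steps 1 and 2 could be sketched roughly as below. This is a minimal illustration, not the MR's script: the function names are hypothetical, and it assumes GNU coreutils `du`, whose `-b` flag reports the apparent size in bytes. Because it measures the extracted files on disk, its numbers will not match the archive sizes in the GCP buckets.

```shell
#!/bin/sh
# Hypothetical helpers sketching steps 1 and 2 (names are illustrative only).

# Print the uncompressed size of a cache directory in bytes, or 0 if it is absent.
cache_size_bytes() {
  if [ -d "$1" ]; then
    # GNU du: -s summarises the whole tree, -b reports apparent size in bytes.
    du -sb "$1" | cut -f1
  else
    echo 0
  fi
}

# Render a byte count with binary units (B, KiB, MiB, ...) and one decimal place.
human_size() {
  awk -v b="$1" 'BEGIN {
    split("B KiB MiB GiB TiB", u, " ")
    i = 1
    while (b >= 1024 && i < 5) { b /= 1024; i++ }
    printf "%.1f %s\n", b, u[i]
  }'
}

# Log one line per cache: name, path, human-readable size and exact byte count.
print_cache_size() {
  bytes=$(cache_size_bytes "$2")
  printf '%s (%s): %s (%s bytes)\n' "$1" "$2" "$(human_size "$bytes")" "$bytes"
}
```

For example, `print_cache_size example_cache /no/such/dir` logs `example_cache (/no/such/dir): 0.0 B (0 bytes)`, which matches the "0 if it does not exist" behaviour described under How.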

Why

  • To improve observability of cache usage:
    • cache-related problems (e.g. OOMs) should be easier to detect
    • debugging these problems should be easier
    • when experimenting (e.g. trying to optimise job speed through caching), the results of the trials should be easier to interpret reliably.
  • This should in the end improve the reliability and speed of pipelines.
  • NB: this comes at a small increase in wall time:
    • 3-5s are added per job, so ~20-25s per pipeline
    • this seems a very reasonable trade-off given the gains in observability.

How

  • We add [datadog_send_job_cache_info.sh] that:
    • computes the size of the caches (0 if it does not exist)
    • prints them in the log, both human-readable and in bytes
    • generates Datadog tags with cache sizes in bytes (human-readable values would have been harder to manage in Datadog, as units could be either GB or MB)
    • if [datadog-ci] is installed, sends the data to Datadog
  • We modify the job definition so that [datadog_send_job_cache_info.sh] is called in the [before_script] and [after_script] steps.
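The tag-and-send part (byte-valued tags, uploaded only when datadog-ci is available) could look roughly like the sketch below. It assumes datadog-ci's `tag` subcommand with `--level job` and repeated `--tags key:value` options; check the README of the installed datadog-ci version before relying on this. The `cache_size_bytes.<phase>.<cache>` tag naming and the phase argument (so the before_script and after_script runs produce distinct tags) are hypothetical, not taken from the MR.

```shell
#!/bin/sh
# Hypothetical sketch of the Datadog part of the script.
# ASSUMPTION: tag keys follow "cache_size_bytes.<phase>.<cache>"; the MR's naming may differ.

# Read "<cache> <bytes>" lines on stdin and print one "key:value" tag per line.
# $1 is the phase (e.g. before/after), so the two runs do not produce identical keys.
build_cache_tags() {
  while read -r name bytes; do
    [ -n "$name" ] || continue
    printf 'cache_size_bytes.%s.%s:%s\n' "$1" "$name" "${bytes:-0}"
  done
}

# Attach the tags to the current CI job, but only if datadog-ci is installed.
send_cache_tags() {
  if ! command -v datadog-ci >/dev/null 2>&1; then
    echo "datadog-ci not found; cache sizes were only logged" >&2
    return 0
  fi
  phase="$1"
  set --
  # Tags contain no whitespace, so plain word splitting is safe here.
  for tag in $(build_cache_tags "$phase"); do
    set -- "$@" --tags "$tag"
  done
  datadog-ci tag --level job "$@"
}
```

For example, `printf 'example_cache 4096\n' | send_cache_tags after` would tag the job with `cache_size_bytes.after.example_cache:4096`, or just log a notice when datadog-ci is missing, so jobs without the CLI are not failed by the observability step.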

Manually testing the MR

Checklist

  • Document the interface of any function added or modified (see the coding guidelines)
  • Document any change to the user interface, including configuration parameters (see node configuration)
  • Provide automatic testing (see the testing guide).
  • For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
  • Select suitable reviewers using the Reviewers field below.
  • Select as Assignee the next person who should take action on that MR
Edited by Bruno B
