Provide date limit option for project imports

This was broken out of #37058 (closed)

Proposal

In an effort to reduce the time it takes to import large projects, we're looking at the problem from all angles, including the product itself. One option we identified that could significantly reduce import time is limiting imported relations by age.

For a large project, we took measurements and found that evicting relations (MRs, issues, notes, etc.) older than ~1.5 years reduces the import size by ~50%, which should come with a similar reduction in processing time.

This would require a change in the import UI: perhaps only when we detect a very large import would we give the user the option to import only recent relations. This is based on the assumption that the value of imported data, and the likelihood of anyone interacting with it, diminishes with age.

Measurements

Measurements were done against a real-world customer export with 2.5GB of relational metadata in JSON.

Impact (size reduction) as a function of age:

Evicted age     project.json size   Reduction
>= 3 months     165.9M              93%
>= 6 months     417.5M              83%
>= 12 months    838.8M              66%
>= 24 months    1.6G                36%
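
As a quick sanity check, the Reduction column can be reproduced from the reported sizes. This is only a sketch: it assumes the 2.5GB baseline means 2500 MB and that 1.6G means 1600 MB, and the names `BASELINE_MB`, `remaining_mb` and `reduction_percent` are made up for illustration.

```python
BASELINE_MB = 2500.0  # assumption: "2.5GB" read as 2500 MB

# Remaining project.json size in MB, keyed by eviction threshold in months.
# Values are taken from the table; 1.6G is assumed to mean 1600 MB.
remaining_mb = {3: 165.9, 6: 417.5, 12: 838.8, 24: 1600.0}


def reduction_percent(remaining, baseline=BASELINE_MB):
    """Share of the baseline that was removed, as a whole-number percentage."""
    return round(100 * (1 - remaining / baseline))


reductions = {months: reduction_percent(mb) for months, mb in remaining_mb.items()}
```

Under those assumptions the computed values match the percentages reported in the table.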

Percentiles:

Percentile   created_at
p50          2018-06-12
p75          2019-03-05
p90          2019-07-29
p95          2019-09-12
p99          2019-10-14

That is, 50% of the relations in the original project JSON are older than ~1.5 years, which makes age an effective lever, assuming the project used is reasonably representative of others in terms of activity.
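
For anyone wanting to repeat this kind of analysis on another export, here is a minimal sketch of how the percentiles and the share of relations older than a cutoff could be computed. It uses the nearest-rank percentile definition, which is an assumption (the method used for the measurements above is not stated), and the sample dates are invented rather than taken from the customer export.

```python
import math
from datetime import date


def percentile(sorted_dates, p):
    """Nearest-rank percentile: the smallest value with at least p% of items at or below it."""
    if not sorted_dates:
        raise ValueError("no dates given")
    rank = max(1, math.ceil(p / 100 * len(sorted_dates)))
    return sorted_dates[rank - 1]


def share_older_than(dates, cutoff):
    """Fraction of relations created strictly before the cutoff date."""
    return sum(1 for d in dates if d < cutoff) / len(dates)


# Hypothetical created_at values, sorted ascending (not the measured data).
created = sorted([
    date(2017, 1, 10), date(2017, 11, 2), date(2018, 3, 15), date(2018, 6, 12),
    date(2019, 3, 5), date(2019, 7, 29), date(2019, 9, 12), date(2019, 10, 14),
])

p50 = percentile(created, 50)
older = share_older_than(created, date(2018, 7, 1))
```

In this toy sample, half of the relations predate the cutoff, so evicting by that cutoff would drop roughly half of the records (record count is only a proxy for size in bytes).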

Caveats

A few considerations here:

  • the actual git repo data would remain untouched; just the project metadata would be affected
  • evicting strictly by age means we might create dangling references; care needs to be taken not to compromise the integrity of the imported data
  • we don't currently know whether this is a feature users would find desirable; more research would be necessary (ideally this would not need to be used at all; it would be a measure of last resort if no other optimizations are good enough)
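
To make the dangling-reference concern concrete, here is a rough sketch of one possible approach: evict by age, but retain any older record that a retained record still points to. The record shape (`id`, `created_at`, `parent_id`) and the function `evict_by_age` are hypothetical and do not reflect the actual import schema or importer code.

```python
from datetime import datetime, timedelta


def evict_by_age(relations, now, max_age):
    """Keep relations newer than max_age, plus any older parents they reference.

    `relations` is a list of dicts with hypothetical keys:
    "id", "created_at" (datetime) and "parent_id" (an id or None).
    """
    by_id = {r["id"]: r for r in relations}
    cutoff = now - max_age
    keep = {r["id"] for r in relations if r["created_at"] >= cutoff}

    # Walk up from every retained relation so that no kept record
    # points at a parent that was evicted.
    pending = list(keep)
    while pending:
        parent_id = by_id[pending.pop()]["parent_id"]
        if parent_id is not None and parent_id in by_id and parent_id not in keep:
            keep.add(parent_id)
            pending.append(parent_id)

    # Preserve the original ordering of the export.
    return [r for r in relations if r["id"] in keep]


now = datetime(2019, 11, 1)
relations = [
    {"id": "issue-1", "created_at": datetime(2017, 5, 1), "parent_id": None},
    {"id": "note-1", "created_at": datetime(2019, 10, 1), "parent_id": "issue-1"},
    {"id": "issue-2", "created_at": datetime(2017, 6, 1), "parent_id": None},
    {"id": "note-2", "created_at": datetime(2017, 6, 2), "parent_id": "issue-2"},
]
kept = evict_by_age(relations, now, timedelta(days=365))
kept_ids = [r["id"] for r in kept]
```

Here the old `issue-1` survives because the recent `note-1` references it, while `issue-2` and `note-2` are both evicted. Retaining parents trades some of the size reduction for integrity; the alternative of dropping children whose parent was evicted would save more space but lose recent activity.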