DeleteDiffFilesWorker doesn't work as expected for diffs in external storage
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
Summary
This worker wipes MergeRequestDiffFile
records from a MergeRequestDiff
in some circumstances. However, it was introduced prior to external merge request diff storage.
When it is given a merge request diff that is stored externally, it removes the MergeRequestDiffFile
rows, but it should also ensure that the file associated with MergeRequestDiff#external_diff
is deleted, and that stored_externally
and external_diff_store
are set to false
and LOCAL
, respectively.
Steps to reproduce
- Run
DeleteDiffFilesWorker
against any merge request diff sat in object storage - Note that the external diff file is still accessible
What is the current bug behavior?
Data attached to MergeRequestDiff is not cleaned as expected
What is the expected correct behavior?
Data should be removed
Output of checks
This bug happens on GitLab.com
Possible fixes
The actual fix isn't too bad - we should move the logic inside this worker to the clean!
state transition for the MergeRequestDiff
, and have it take account of the
What to do about pre-existing data is a harder question. When the MergeRequestDiff
record is removed (say, along with the project), the associated file is destroyed, so it's not technically a resource leak. However, I'd feel better if we scanned through all MergeRequestDiff
records with state: without_file, stored_externally: true
and cleaned things up eagerly, to free up object storage space.