DeleteDiffFilesWorker doesn't work as expected for diffs in external storage
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
Summary
This worker wipes MergeRequestDiffFile records from a MergeRequestDiff in some circumstances. However, it was introduced prior to external merge request diff storage.
When it is given a merge request diff that is stored externally, it removes the MergeRequestDiffFile rows, but it should also ensure that the file associated with MergeRequestDiff#external_diff is deleted, and that stored_externally and external_diff_store are set to false and LOCAL, respectively.
Steps to reproduce
- Run
DeleteDiffFilesWorkeragainst any merge request diff sat in object storage - Note that the external diff file is still accessible
What is the current bug behavior?
Data attached to MergeRequestDiff is not cleaned as expected
What is the expected correct behavior?
Data should be removed
Output of checks
This bug happens on GitLab.com
Possible fixes
The actual fix isn't too bad - we should move the logic inside this worker to the clean! state transition for the MergeRequestDiff, and have it take account of the
What to do about pre-existing data is a harder question. When the MergeRequestDiff record is removed (say, along with the project), the associated file is destroyed, so it's not technically a resource leak. However, I'd feel better if we scanned through all MergeRequestDiff records with state: without_file, stored_externally: true and cleaned things up eagerly, to free up object storage space.