housekeeping: Add git refs verify after reference repacking
What does this MR do?
This MR integrates the `git refs verify` command into the housekeeping process to detect potential reference database corruption. It:
- Adds a new `verifyRefs` method that executes `git refs verify` after successful reference packing
- Logs any verification failures with repository context
- Adds a new Prometheus counter metric (`gitaly_housekeeping_failed_verify_refs`) to track and alert on verification failures
- Includes comprehensive tests for both success and failure scenarios
Why was this MR needed?
As we prepare to roll out the reftable backend to production, we need additional safety mechanisms to detect corruption in the new backend. The `git refs verify` command checks whether the reference database is consistent, giving us an early warning of potential issues.
Implementation details
- The verification runs after successful reference packing in the `packRefsIfNeeded` method
- Verification failures are logged but don't interrupt the housekeeping process
- A new Prometheus counter tracks failures with the storage name as a label
- Verification errors include detailed stderr output for better diagnostics
- Tests verify both success and failure scenarios, including proper logging and metrics
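For reviewers who want the control flow at a glance, here is a minimal, standard-library-only sketch of the behavior described above. It is not the code in this MR: `packThenVerify`, `runner`, `fakeRunner`, `failureCounter`, and `errSimulated` are hypothetical names, the in-memory counter stands in for the Prometheus counter, and the real implementation goes through Gitaly's own command and logging helpers rather than `os/exec` and `log`.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log"
	"os/exec"
	"strings"
	"sync"
)

// errSimulated is a hypothetical error used to fake a failing verification.
var errSimulated = errors.New("simulated verification failure")

// failureCounter is an in-memory stand-in for a Prometheus counter
// labelled by storage name.
type failureCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[string]int)}
}

// Inc records one failed verification for the given storage.
func (c *failureCounter) Inc(storage string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[storage]++
}

// Value returns the number of failures recorded for the given storage.
func (c *failureCounter) Value(storage string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[storage]
}

// runner abstracts the verification command so it can be faked in tests.
type runner func(repoPath string) (stderr string, err error)

// gitRefsVerify runs `git -C <repoPath> refs verify` and captures stderr.
func gitRefsVerify(repoPath string) (string, error) {
	var stderr bytes.Buffer
	cmd := exec.Command("git", "-C", repoPath, "refs", "verify")
	cmd.Stderr = &stderr
	err := cmd.Run()
	return stderr.String(), err
}

// fakeRunner returns a runner that always yields the given result.
func fakeRunner(stderr string, err error) runner {
	return func(string) (string, error) { return stderr, err }
}

// verifyRefs wraps a failed run in an error that carries the stderr output.
func verifyRefs(run runner, repoPath string) error {
	stderr, err := run(repoPath)
	if err != nil {
		return fmt.Errorf("verifying references: %w, stderr: %q", err, strings.TrimSpace(stderr))
	}
	return nil
}

// packThenVerify verifies references only after packing succeeded. A failed
// verification is logged and counted, but it does not fail housekeeping.
func packThenVerify(pack func() error, run runner, storage, repoPath string, failures *failureCounter) error {
	if err := pack(); err != nil {
		return fmt.Errorf("packing references: %w", err)
	}
	if err := verifyRefs(run, repoPath); err != nil {
		failures.Inc(storage)
		log.Printf("refs verification failed: storage=%s repo=%s err=%v", storage, repoPath, err)
	}
	return nil
}

func main() {
	failures := newFailureCounter()
	pack := func() error { return nil } // pretend packing succeeded
	err := packThenVerify(pack, fakeRunner("simulated stderr", errSimulated), "default", "/path/to/repo.git", failures)
	fmt.Println("housekeeping error:", err)
	fmt.Println("recorded failures:", failures.Value("default"))
}
```

Passing `gitRefsVerify` instead of `fakeRunner(...)` runs the real command against a repository, provided the installed Git version ships `git refs verify`. Injecting the runner is one way to test the success and failure paths without a corrupted repository on disk.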
Questions
- I've set the storage label for the Prometheus counter metric on failed ref verifications. Is this sufficient?
- I have implemented unit tests to ensure the functionality works as expected. Are these sufficient, or should additional tests be considered?
- The failure of the refs verification is only logged, and a Prometheus metric counter is incremented. The housekeeping process continues regardless of the verification failure. Is this the expected behavior?
What are the relevant issue numbers?
Fixes #6531 (closed)