
Improve logging during recovery process

In #6333 (closed) we added a new replay command that recovers WAL entries on top of a disk snapshot, along with some logging during the process. To give more helpful updates while it runs, we should improve these logs as follows:

  • Switch to using logger instead of fmt
  • Currently, if the command is started while Gitaly is running, it fails with errors saying the database is already in use, but it is not clear to the user that the cause is the running Gitaly process. Let's improve the message so that it tells the user to stop Gitaly first.
  • Currently, once log appending starts, nothing is logged until all entries have been appended. If the backlog of log entries is large, the user might think the process is frozen or not working. We should instead log progress more frequently, either after each entry or after every X entries.
  • There are multiple scenarios where we might abort the process with an error. Currently we just return those errors, but we should make them easy for a sysadmin to understand, with advice and context on how to resolve the problem where possible. We could also differentiate errors that are safe to retry from errors that require manual intervention before trying again. This hint would give sysadmins better control over the process, rather than just retrying and hoping it works.
Edited by Mustafa Bayar