
Optimize transaction committed log retention

Extracted from this suggestion

In Add pack-refs housekeeping task support to the ... (!6551 - merged), we introduced a mechanism to keep track of committed log entries after they are admitted by the manager. Those entries are used for conflict checks, verification, result merges, etc. They are organized as an in-memory linked list. Leading entries are truncated once no remaining transaction uses them as its snapshot.
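To make the current behaviour concrete, here is a minimal sketch of such a list, assuming each entry carries a count of the transactions that use it as their snapshot. The names `committedLog`, `committedEntry`, `snapshotReaders`, `pin` and `release` are hypothetical and chosen for illustration; they are not taken from the merge request.

```go
package main

import "container/list"

// committedEntry is a hypothetical in-memory record of one committed log entry.
type committedEntry struct {
	lsn uint64
	// snapshotReaders counts the active transactions that use this
	// entry as their snapshot (assumed bookkeeping for this sketch).
	snapshotReaders int
}

// committedLog keeps committed entries in ascending LSN order.
type committedLog struct {
	entries *list.List // elements are *committedEntry
}

func newCommittedLog() *committedLog {
	return &committedLog{entries: list.New()}
}

// append records a newly committed entry at the tail of the list.
func (l *committedLog) append(lsn uint64) *committedEntry {
	e := &committedEntry{lsn: lsn}
	l.entries.PushBack(e)
	return e
}

// pin marks the entry as the snapshot of one more active transaction.
func (l *committedLog) pin(e *committedEntry) {
	e.snapshotReaders++
}

// release is called when a transaction finishes. It drops the
// transaction's reference and truncates whatever became unreferenced.
func (l *committedLog) release(e *committedEntry) {
	e.snapshotReaders--
	l.truncate()
}

// truncate removes leading entries until it reaches one that is still
// referenced. Everything behind a referenced entry is retained, which
// is why a long-running task keeps the list growing.
func (l *committedLog) truncate() {
	for front := l.entries.Front(); front != nil; front = l.entries.Front() {
		if front.Value.(*committedEntry).snapshotReaders > 0 {
			return
		}
		l.entries.Remove(front)
	}
}
```

In this sketch a single pinned entry near the head keeps every later entry alive, no matter how many transactions commit in the meantime.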

That linked list currently stores the content of each log entry. Although each entry does not contain heavy data, entries might be kept for a significant amount of time. For example, a repacking task might run for hours before being applied, and we need to retain all entries from the time the task starts until then. On a busy repository, the list can grow large in the meantime. Thus, memory is not the optimal place for it.

One point to consider is that our database already contains all the log records. In the future, we'll need to retain these logs for an extended period anyway for the purposes of archiving and replication. By monitoring the lowest stable log sequence number (LSN), we can determine which logs can be safely discarded. If we also factor active transactions into this low watermark, we can postpone the deletion of the logs until those transactions are complete and use the stored logs for conflict verification. The same process would also ensure that logs are not discarded before they have been archived and replicated.
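A rough sketch of how such a low watermark could be computed follows. It assumes the watermark is the minimum over the applied position, the archived and replicated positions, and the snapshot LSNs of all active transactions, and that entries at or below the watermark are safe to delete. The type `logConsumers` and the functions `lowWatermark` and `prunableLSNs` are hypothetical names for illustration only.

```go
package main

// logConsumers is a hypothetical summary of everything that may still
// need a log entry. All fields are assumptions made for this sketch.
type logConsumers struct {
	appliedLSN         uint64   // last entry applied to the repository
	archivedLSN        uint64   // last entry confirmed as archived
	replicatedLSN      uint64   // last entry confirmed as replicated
	activeSnapshotLSNs []uint64 // snapshot LSN of each active transaction
}

// lowWatermark returns the highest LSN that no consumer still needs.
// A transaction with snapshot LSN s only needs entries after s for its
// conflict checks, so s itself is allowed to bound the watermark.
func lowWatermark(c logConsumers) uint64 {
	watermark := c.appliedLSN
	positions := append([]uint64{c.archivedLSN, c.replicatedLSN}, c.activeSnapshotLSNs...)
	for _, lsn := range positions {
		if lsn < watermark {
			watermark = lsn
		}
	}
	return watermark
}

// prunableLSNs lists the stored entries, starting at oldest, that can
// be deleted under the current watermark.
func prunableLSNs(oldest uint64, c logConsumers) []uint64 {
	watermark := lowWatermark(c)
	var prunable []uint64
	for lsn := oldest; lsn <= watermark; lsn++ {
		prunable = append(prunable, lsn)
	}
	return prunable
}
```

With this shape, a long-running repacking task simply holds the watermark at its snapshot LSN, and the entries it needs stay in the database rather than in memory until it completes.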