Journal flushing
Posted Apr 29, 2021 17:28 UTC (Thu) by tytso (subscriber, #9993)
In reply to: Journal flushing by Wol
Parent article: Preventing information leaks from ext4 filesystems
In general, file systems should not take ages to checkpoint a transaction. On average, we will force a journal commit every 5 seconds (or perhaps sooner if fsync is called). As part of the commit processing, once the commit block is written, all of the metadata buffers will be marked dirty, and 30 seconds later, the normal buffer cache writeback will start. Hence in practice, the only writes that we need to do when doing a full checkpoint are the metadata blocks that were touched in the last 30 seconds, plus any pending writeback.
Also, normally, we don't actually try to checkpoint all completed transactions. If we checkpoint a transaction that completed, say, 50 seconds ago, and all of its dirty buffers have been written back (or taken over by a newer transaction), then we don't need to do any I/O before we declare, "yep, this transaction no longer needs to be kept in the journal; we can reuse that space for newer transactions". So normally, we just checkpoint whichever old transactions we can without needing to do any I/O, and that lets us move the tail of the journal without any synchronous I/O.
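To make the no-I/O case concrete, here is a toy model in Python. It is only a sketch and not the jbd2 code: the `Transaction` class, its fields, and `checkpoint_without_io` are hypothetical names made up for illustration, and it assumes the journal is a simple queue, oldest first, whose tail can only advance past transactions with no dirty buffers left.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tid: int                                  # transaction ID (illustrative)
    blocks: int                               # journal blocks this transaction occupies
    dirty: set = field(default_factory=set)   # metadata buffers not yet written back


def checkpoint_without_io(journal: deque) -> int:
    """Drop fully written-back transactions from the tail of the journal.

    `journal` holds Transaction objects, oldest first. Returns the number
    of journal blocks freed. No writes are issued: we stop at the first
    transaction that still has dirty buffers.
    """
    freed = 0
    while journal and not journal[0].dirty:
        freed += journal.popleft().blocks
    return freed


journal = deque([
    Transaction(1, 10),                   # already written back
    Transaction(2, 20),                   # already written back
    Transaction(3, 5, {"inode table"}),   # still has a dirty buffer
    Transaction(4, 7),                    # clean, but sits behind transaction 3
])
freed = checkpoint_without_io(journal)
```

In this model the first two transactions are released, freeing 30 blocks, and the tail stops at transaction 3. Transaction 4 stays in the queue even though it is clean, which is a simplification of how the real tail pointer behaves.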
Now, if the journal is too small, or the block device is too slow, it's possible that this will not free enough space in the journal for newer transactions. In that case, we will need to do a synchronous checkpoint where we actually force the buffers to be written back so we can free space in the journal. While this is happening, file system mutations are blocked, so this is terrible for file system performance. Fortunately, this rarely happens, and in modern versions of mke2fs, we create file systems with larger journals to prevent this from happening.
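The slow path can be sketched the same way. Again this is a toy model under stated assumptions, not the kernel's implementation: `make_room` is a hypothetical function, the journal is a queue of `(blocks, dirty_buffers)` pairs, oldest first, and counting dirty buffers stands in for the synchronous writes that would block file system mutations.

```python
from collections import deque


def make_room(journal: deque, free_blocks: int, needed: int):
    """Free journal space until `needed` blocks are available.

    `journal` holds (blocks, dirty_buffers) pairs, oldest first.
    Returns (free_blocks, forced_writes), where forced_writes counts the
    buffers that had to be written back synchronously to free the space.
    """
    forced_writes = 0
    while free_blocks < needed and journal:
        blocks, dirty = journal.popleft()
        forced_writes += len(dirty)   # stand-in for synchronous writeback
        free_blocks += blocks         # space is reusable only after writeback
    return free_blocks, forced_writes


# A new transaction needs 30 blocks but only 4 are free.
journal = deque([(10, set()), (20, {"bitmap", "inode table"}), (5, {"dir block"})])
result = make_room(journal, free_blocks=4, needed=30)
```

Here the oldest transaction is released for free, but the second one forces two buffer writes before its 20 blocks can be reused. A larger journal makes it less likely that `free_blocks < needed` is ever true by the time the old transactions still have dirty buffers.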
The only time we need to do a full, synchronous checkpoint, to completely empty the journal, is when we unmount the file system, or if we are trying to freeze the file system in order to take a snapshot. But even then, it's perfectly safe, because we don't actually truncate the journal until the metadata writeback is complete.