Deferring mtime and ctime updates
Posted Aug 23, 2013 13:41 UTC (Fri) by bfields (subscriber, #19510)
In reply to: Deferring mtime and ctime updates by jlayton
Parent article: Deferring mtime and ctime updates
Yeah, exactly. At the time of a crash the in-memory change attribute may be well ahead of the on-disk one. When the client resends the uncommitted data after the reboot, it probably doesn't send exactly the same number and sequence of write RPCs, so as the server processes those resends it could reuse old change attribute values for different data.
I don't know if the problem would be easy to hit in practice.
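To make the failure mode concrete, here is a toy simulation. It is only a sketch: the `ToyServer` class and its methods are made up for illustration and model nothing beyond a per-file counter that is bumped on every write and is only persisted on sync.

```python
class ToyServer:
    """Hypothetical model: one file, one change attribute, lazy persistence."""

    def __init__(self):
        self.mem_attr = 0      # in-memory change attribute
        self.disk_attr = 0     # last value that reached the disk
        self.mem_data = b""    # file contents including uncommitted writes
        self.disk_data = b""   # file contents that reached the disk

    def write(self, payload):
        # Every write bumps the in-memory change attribute.
        self.mem_data += payload
        self.mem_attr += 1
        return self.mem_attr

    def crash_and_reboot(self):
        # Everything not yet on disk is lost, including the counter value.
        self.mem_attr = self.disk_attr
        self.mem_data = self.disk_data


def simulate():
    server = ToyServer()

    # Before the crash: three small uncommitted writes.
    before = {}
    for chunk in (b"a", b"b", b"c"):
        attr = server.write(chunk)
        before[attr] = server.mem_data

    server.crash_and_reboot()

    # After the reboot the client resends the same bytes,
    # but grouped into a different sequence of writes.
    after = {}
    for chunk in (b"ab", b"c"):
        attr = server.write(chunk)
        after[attr] = server.mem_data

    return before, after


before, after = simulate()
for attr in sorted(set(before) & set(after)):
    print(attr, before[attr], after[attr])
```

A client that cached the file contents under change attribute 1 before the crash would see the same attribute value again afterwards and have no reason to revalidate, even though the data behind it differs.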
For a fix: we'd rather not invalidate all caches on every boot. We can't know which inodes are affected, since that would require a disk write before allowing any pages to be dirtied. And especially if multiple reboots and network partitions are possible, I don't think we even know which boots are affected (maybe this is a boot after a clean shutdown, but we still have a back-in-time change attribute left over from a previous crash).
Maybe a simple fix would be: instead of making the change attribute a plain 64-bit counter, put the current Unix time in the top 32 bits and a counter in the bottom 32 bits. Print a warning and congratulations to the log the first time anyone manages to sustain more than 4 billion writes in a second....
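A rough sketch of that layout, in Python rather than kernel C just to keep it short. The `ChangeAttr` class and `bump()` are hypothetical names, and two behaviours are my own assumptions that the idea above doesn't specify: on counter overflow the value borrows from the next second, and if the clock steps backwards the old seconds field is kept so the value never decreases.

```python
import logging
import time

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1
SECONDS_MASK = 0xFFFFFFFF


class ChangeAttr:
    """Hypothetical 64-bit change attribute: seconds on top, counter below."""

    def __init__(self, value=0, clock=time.time):
        self.value = value    # e.g. the (possibly stale) value read from disk
        self._clock = clock   # injectable so the behaviour can be tested

    def bump(self):
        now = int(self._clock()) & SECONDS_MASK
        secs = self.value >> COUNTER_BITS
        counter = self.value & COUNTER_MASK

        if now > secs:
            # New second: start a fresh counter under the new timestamp.
            secs, counter = now, 0
        elif counter == COUNTER_MASK:
            # Assumption: on overflow, warn and borrow from the next second.
            logging.warning("change attribute counter overflowed, congratulations")
            secs, counter = secs + 1, 0
        else:
            # Same second, or clock went backwards (assumption: keep old secs).
            counter += 1

        self.value = (secs << COUNTER_BITS) | counter
        return self.value


# Before the crash: two writes in second 100, never written to disk.
pre = ChangeAttr(clock=lambda: 100)
v1 = pre.bump()
v2 = pre.bump()

# After the reboot: the counter restarts from a stale value, a second later.
post = ChangeAttr(value=0, clock=lambda: 101)
v3 = post.bump()
print(v1 < v2 < v3)
```

As long as the clock has advanced by at least a second across the reboot, anything handed out afterwards is larger than anything handed out before the crash, whether or not it ever reached the disk. That does assume the clock is sane at boot, which is worth keeping in mind.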