Following up on file-position locking

By Jonathan Corbet
August 11, 2023

LWN recently covered a discussion on file-position locking that demonstrated the hazards that can result from unexpected concurrency. It turns out that this discussion had not yet fully run its course. Since that article was written, additional changes intended to address a performance regression evolved into a core virtual filesystem (VFS) layer API change to carry out some much-delayed housecleaning.

At the end of the previous article, a change had been merged into the mainline to unconditionally take the file-position lock (which ensures that only one thread is manipulating the current file read/write position at any given time). The article noted that the performance impact of this change had not been measured. That changed on August 3, when Mateusz Guzik reported that there was indeed a performance change — specifically, a 5% regression on a test he had run. VFS layer maintainer Christian Brauner initially discounted the report, but also said that the problem, if it truly existed, could be mitigated by only taking the position lock for directories (and not for regular files). The original locking problem had only affected directory reads, so the fix is only needed there as well.

Linus Torvalds was quick to respond to the report, though, posting a patch that implemented the directory-only behavior; it worked by doing the locking explicitly in the implementation of the directory-related system calls. Brauner pointed out that this approach missed the case of calling lseek() on a directory; "that's the heinous part". Since the whole purpose of lseek() is to manipulate the file position, that was indeed a bit of an oversight.

Torvalds acknowledged the omission, but additionally realized that there are some filesystems that support read() and write() on directories as well. "They may be broken, but people actually did do things like that historically, maybe there's a reason adfs and ceph allow it". Those cases, too, would need to be addressed for a complete fix. To create that fix, he put together a new patch that changes __fdget_pos() to simply test whether the object in question is a directory, and to always take the lock in that case. Brauner was still unconvinced about the need for the patch, but agreed that Torvalds's approach would work; that patch was duly applied to the mainline for 6.5-rc5.

After that happened, Brauner suggested a slightly different approach. __fdget_pos() takes a file structure pointer as an argument; to determine whether that file corresponds to an open directory, the code must traverse the f_inode pointer to get to the associated inode structure. A simpler test, he said, might be to test whether the given file provides either of the iterate() or iterate_shared() operations, which are only provided for directories.

Once upon a time, the kernel's file_operations structure included a member called readdir() that, as might be expected, would be provided by filesystems to support reading directory contents. The 3.11 release in 2013 saw that member renamed to iterate(), with a different API. This function had exclusive access to the inode, which would be write-locked before the call was made. That limited performance for filesystems that were able to support multiple, concurrent read operations on the same directory. To speed things up, a new iterate_shared() variant was added for the 4.7 release in 2016. This version is called with a shared (read) lock on the inode, so multiple calls can be running concurrently.

At the time, the documentation was updated with the admonition: "Old method is only used if the new one is absent; eventually it will be removed. Switch while you still can; the old one won't stay." Few people who are familiar with the kernel development process will be shocked to learn that, in fact, iterate() did stay, and that it is still present in current kernels, seven years later.

Torvalds, having been "shamed" by this reminder that iterate() still exists, decided to finish the job. With a new patch set, he eliminated the iterate() function. For the filesystems that were still using it (ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, procfs, and vboxfs), he created a new wrapper that can be provided as iterate_shared(); it drops the read lock, then acquires a full write lock before calling into the filesystem. The exclusive locking, in other words, has been pushed down a level and only appears in the few remaining places where it is needed.

Torvalds had intended to apply this change during the 6.6 merge window; it is, after all, rather late in the 6.5 cycle for core changes of this nature. One of the advantages of being Linus Torvalds, though, is that you can break the rules when it seems appropriate; he took that approach in this case and merged the change into the mainline, also for the 6.5-rc5 release; it was quickly followed by a change from Brauner to change the position-locking test to just look for the existence of an iterate_shared() function.

Naturally, none of that work updated the associated documentation, giving the tireless documentation maintainer something to do. Otherwise, though, perhaps this little story has finally come to a real conclusion, with code that is more correct and a little bit of longstanding technical debt removed. Of course, it is once again true that no performance results have been posted, so the possibility of a third installment cannot be entirely ruled out.

Index entries for this article
Kernel	Filesystems/Virtual filesystem layer

to post comments

Following up on file-position locking

Posted Aug 11, 2023 15:45 UTC (Fri) by dullfire (guest, #111432) [Link] (17 responses)

to any kernel documentation maintainers: your work is much appreciated. Of the projects whose source I dig through regularly, Linux has the best documentation.

Following up on file-position locking

Posted Aug 11, 2023 17:02 UTC (Fri) by willy (subscriber, #9762) [Link] (16 responses)

I appreciate that! On the other hand that makes me sad because I think the Linux documentation is terrible. That means other projects are worse :-(

Following up on file-position locking

Posted Aug 12, 2023 9:54 UTC (Sat) by jengelh (subscriber, #33263) [Link]

Ironic, though, how underdocumented(*) systems have been running the world :-/

* concerning distribution in electronic form

Following up on file-position locking

Posted Aug 12, 2023 23:45 UTC (Sat) by tialaramex (subscriber, #21167) [Link] (14 responses)

Would things get better if Rust for Linux stuff uses Rust's inherent markdown documentation style where comments live next to the related type/ function/ whatever? And indeed then if the C code began to do likewise, subject to tooling ?

One of the reasons to like this sort of style is that the programmer changing make_doodads() so that now the doodads it makes are crispy rather than chewy, is right there in the file where the docs say /// The doodads are chewy -- and so is much more likely (though of course never certain) to also s/chewy/crispy/ than when those docs live out of sight in some documentation file. Any reviewers are likewise looking at the changed file, where it maybe still says // The doodads are chewy -- and might object that this now seems wrong.

The other reason likely doesn't apply to the kernel, Rust's code examples are tested by cargo test, if we wrote an alleged example of how to make_doodads then it ought to work, so the test harness will attempt to do exactly what you wrote in the example and if you broke it your tests fail. I can't imagine that working in kernel context without an investment of effort that seems disproportionate to the gain.

Following up on file-position locking

Posted Aug 12, 2023 23:58 UTC (Sat) by willy (subscriber, #9762) [Link] (10 responses)

I've ranted about things I hate about kernel-doc before. The first is the style that it makes you write in.

/**
* page_frobnicate - Frobnicate a page.
* @page: The page.
* @flags: How to frobnicate the page.
*
* ...
*
* Context: Can be called from interrupt context unless LOCK is set in @flags.
* Return: 0 on success or a negative errno if frobnication failed.
*/

It's so stilted and repetitive. Yes, it's a lot like java-doc. We can do better.

It doesn't use rst. So kernel-doc is actually a completely separate language from the rst files in the Documentation directory.

The biggest problem I have with Rust doc is that it uses Markdown instead of ReStructuredText. There was even a bug filed against Rust pointing out this was a big mistake, and the developers acknowledged that, but said it was too late to fix it. Shades of Makefiles having to use tabs instead of spaces.

So it feels like taking a big step backwards to convert the rst files to markdown, but it's equally a terrible idea to have a mixture of rst and md inside the kernel docs. I think the onus here is actually on the Rust project to fix their mistake first.

Following up on file-position locking

Posted Aug 13, 2023 3:58 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (2 responses)

I didn't realise Rust people considered ReST (and Asciidoc and several other alternatives). I think you may wait a long time if you're expecting the choice to change but maybe I'll be proved wrong.

That current kernel-doc style is a huge turn-off. Saying the obvious is exactly the kind of thing which makes developers hate writing docs but also makes users hate reading them. There is no overlap between the set of people who need a document explaining that the parameter named "page" is the page, and the set of people who read kernel docs.

Presumably kernel-doc pre-dates the choice of ReST ?

Following up on file-position locking

Posted Aug 13, 2023 4:32 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

I'll say that, at least for myself, Markdown is far easier to write and read. Remembering what extensions work where sucks, but the core is solid. ReST does make better *rendered* docs because it has a bit more control for custom things like cross-linking and such without relying on a sufficiently sophisticated renderer, but I find it to be a bit heavy in the syntax department for my tastes (especially having to match section titles with the appropriate symbol of the right length too). It's not like either make it easy to properly typeset things like LaTeX2ε anyways (IIRC, it was impossible to get the subscript italics as part of a single word without extensions).

Following up on file-position locking

Posted Aug 13, 2023 8:43 UTC (Sun) by Sesse (subscriber, #53779) [Link]

Without having ever used kernel-doc, so just commenting generally: The biggest problem with writing the obvious is that it takes the place of real documentation. If you put in a big blurb saying nothing useful, it still looks optically like you documented it, and so you don't worry about it from there.

(That big blurb saying nothing useful can also be the BSD or GPL header :-) )

Markdown and reST

Posted Aug 13, 2023 16:43 UTC (Sun) by geofft (subscriber, #59789) [Link] (1 responses)

This is getting a little bikesheddy, but at work we've been trying to encourage people to use reST because it's supposed to be better suited for technical documentation, and we've largely concluded in practice that Markdown is better. There are things you just cannot do in any reasonable way in reST, such as putting a hyperlink inside bold text or monospace/code text inside a hyperlink: https://docutils.sourceforge.io/FAQ.html#is-nested-inline... This is a natural pattern when writing any significant amount of technical documentation. (It's also hard to overcome the fact that Markdown is natural to a lot of developeers and reStructuredText isn't - every time I have to write a link in reST I have to go look it up.)

There is a project called MyST (see e.g. https://jupyterbook.org/en/stable/content/myst.html) which extends Markdown syntax so it can map onto the reStructuredText data model, allowing you to use arbitrary reST directives/roles and the ability to link to high-level names instead of specific file paths. I think if you're interested in the more robust technical feature set of reST but wish people would write more docs, you should take a look at MyST.

(Disclaimed, I was involved with a lot of the pre-merge Rust-for-Linux work but I've been uninvolved for long enough that I don't even know what they're using for inline documentation these days....)

Markdown and reST

Posted Aug 13, 2023 21:49 UTC (Sun) by jengelh (subscriber, #33263) [Link]

>every time I have to write a link in reST I have to go look it up

The feeling is mutual: Everytime I have to write a link in MD, I have to look it up. Was it (text)[link], [text](link), (link)[text] or [link](text)? The RST syntax with `text before and URLs in angle brackets <http://en.wikipedia.org/wiki/Bracket>`_ on the other hand is reminiscient of what (I would argue) people use in plaintext mail every now and then, minus the ``_ bugs-on-a-windshield that RST added.

Following up on file-position locking

Posted Aug 13, 2023 19:59 UTC (Sun) by NYKevin (subscriber, #129325) [Link] (3 responses)

> I think the onus here is actually on the Rust project to fix their mistake first.

This is unreasonable. You cannot unilaterally declare the use of Markdown "wrong" merely because you personally dislike it. If Rust wants to use Markdown, they can use Markdown.

Imagine if somebody filed a bug against your project saying "You shouldn't use Make [or Ant, or Maven, or...], because Bazel is much more expressive and powerful." You would WONTFIX it in a heartbeat, wouldn't you? This falls into the same category.

Following up on file-position locking

Posted Aug 14, 2023 0:34 UTC (Mon) by willy (subscriber, #9762) [Link] (2 responses)

Bazel is awful. I'd rather not use software than try to build anything else ever again using Bazel.

But it seemed to me from the threads I read a while back that Rust were still trying to make a decision on switching to RST. What I was saying was that I'd rather wait than write Markdown now and switch it back to RST later. So your characterization of my position is entirely unwarranted.

Following up on file-position locking

Posted Aug 14, 2023 12:07 UTC (Mon) by jkingweb (subscriber, #113039) [Link]

> What I was saying was that I'd rather wait than write Markdown now and switch it back to RST later. So your characterization of my position is entirely unwarranted.

Respectfully, I think NYKevin was completely reasonable in interpreting your words at face value. If you meant to say X, don't say Y.

Following up on file-position locking

Posted Aug 14, 2023 15:15 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

I feel like getting docs is the first step. Delaying it just continues the situation you yourself[1] have complained about. One can use `pandoc` to transform the comments if/when alternate syntaxes become supported (or at least get the bulk of the work done).

[1]https://lwn.net/Articles/941311/

Following up on file-position locking

Posted Aug 15, 2023 11:20 UTC (Tue) by sima (subscriber, #160698) [Link]

OG kerneldoc is indeed garbage, but a pile of things have been improved to make it at least bearable (in my opinion, but I'm biased because I pushed for a lot of these). One is that in the text blocks you can use rst syntax (except titles and a few other things that would collide with the autogenerated stuff from the boilerplate parts of kerneldoc, for DOC: section there's no restrictions since that's just an include really). The automatic hyperlinking that kerneldoc has has been extended to .rst files, which helps with making .rst less verbose and makes it a lot easier to shuffle pieces back&forth. Also the parameter text can be multiline, and for structs you can split them up and put them right in front of each member declaration, which helps a ton with complex stuff (like vtables). I think the end result is a lot more bearable than what we started out with before Jon Corbet took over as maintainer and all these efforts to improve the kernel docs haven't landed yet.

Following up on file-position locking

Posted Aug 27, 2023 12:07 UTC (Sun) by NAR (subscriber, #1313) [Link] (2 responses)

Even documentation right next to the code can go stale. At work I've seen it on more occasions than I can count that e.g. new flags or even new function parameters were not added to the documentation. The best effort I've seen to fix these kind of errors is the Elixir DocTest, where the examples in the documentation are actually executed by the unit test framework, so for example if a new function parameter is added (without a default value), the unit test(!) will catch if the documentation was not upgraded. It's far from 100% and there are code where unit testing is complicated, but it's still useful.

Following up on file-position locking

Posted Aug 29, 2023 13:15 UTC (Tue) by timon (subscriber, #152974) [Link]

The Rust toolchain does run code examples in documentation comments.

Following up on file-position locking

Posted Aug 31, 2023 5:49 UTC (Thu) by jezuch (subscriber, #52988) [Link]

Even Java is getting code snippets in javadoc: https://openjdk.org/jeps/413