LWN.net Weekly Edition for September 26, 2024
Welcome to the LWN.net Weekly Edition for September 26, 2024
This edition contains the following feature content:
- Linus and Dirk on succession, Rust, and more: the Open Source Summit tradition continues in Vienna.
- KDE sets its goals through 2026: where the KDE desktop project would like to go.
- Kangrejos 2024: the next set of articles from the annual Rust-for-Linux conference:
- Best practices for error handling in kernel Rust: Dirk Behme asked whether there were ways that the kernel could improve error handling in Rust components.
- Resources for learning Rust for kernel development: the Rust-for-Linux project discusses collecting learning materials.
- What the Nova driver needs: Danilo Krummrich explains what is needed for his new work on the Nova driver.
- The 2024 Maintainers Summit: reporting from the annual gathering of top kernel subsystem maintainers:
- Regression tracking: the kernel's regression tracker is unfunded and unsure about continuing; is this work valuable and how can the task be supported?
- Considering kernel pass-through interfaces: what position should the kernel community take toward device drivers that simply pass commands through, unmediated, to a device?
- Tools for kernel developers: the current status and future direction for kernel-development tools.
- Committing to Rust in the kernel: has the Rust experiment succeeded, and where does it go from here?
- The 6.12 merge window begins: the first set of changes pulled into the mainline for the next major kernel release.
- RPM 4.20 is coming: what to expect from the last RPM 4.x release.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Linus and Dirk on succession, Rust, and more
The "Linus and Dirk show" has been a fixture at Open Source Summit for as long as the conference has existed; it started back when the conference was called LinuxCon. Since Linus Torvalds famously does not like to give talks, as he said during this year's edition at Open Source Summit Europe (OSSEU) in Vienna, Austria, he and Dirk Hohndel have been sitting down for an informal chat on a wide range of topics as a keynote session. That way, Torvalds does not need to prepare, but also does not know what topics will be brought up, which makes it "so much more fun for one of us", Hohndel said with a grin. The topics this time ranged from the just-released 6.11 kernel and the upcoming Linux 6.12, through Rust for the kernel, to the recurring topic of succession and the graying of Linux maintainers.
After Torvalds suggested that they had been doing these talks for 20 years (Hohndel pointed out that the tradition began in 2012), the conversation turned to the weather—a common topic after the surprisingly horrible weather in much of Europe due to Storm Boris. In a massive understatement, Hohndel said that it had been a "wee bit windy" the day before the conference started (September 15), a day that he and Torvalds had hoped to use for sightseeing; it was not a day to stray far from one's hotel. It did give Torvalds plenty of time to do the 6.11 kernel release, which they discussed next.
6.11 and 6.12
Hohndel asked what was interesting in 6.11, but Torvalds replied that, like almost every other kernel release over the last 15 years or so, it was not particularly exciting, which is exactly how it is supposed to be. The release signals the opening of the two-week merge window, of course, which Torvalds chose to start while he was on the road for OSSEU (as well as the Maintainers Summit and Linux Plumbers Conference). The merge window is when "we're getting all the new code for the next release and that's the fun part" for him. Hohndel noted that Torvalds had been pulling patches on his laptop backstage while they were waiting for the session, "so it is just-in-time delivery of the Linux kernel".
Hohndel said that the bulk of what is being pulled these days seems to be drivers of various sorts, which Torvalds agreed is the case. More than half of the kernel is drivers, which has generally been true over the years, because that is "literally the point of a kernel" since it is meant to "abstract out all the hardware details". So, much of the code flowing in is meant to enable new hardware or to fix the hardware support already in the kernel.
For Torvalds, the surprise is that, after working on Linux for a third of a century now, there are still plenty of changes to the core kernel that are being made. Half of what he merged that day were low-level changes to the virtual filesystem (VFS) layer and there have been lots of discussions lately in the area of memory management. Those core changes are ultimately being driven by expansion in the hardware base, but also by new users, with new ways to use the kernel.
The extensible scheduler class (sched_ext), which allows scheduling decisions to be made by BPF programs, will be coming in 6.12, he said, in answer to a question from Hohndel. Torvalds had not yet merged it, but it was in his queue (and was merged later in the week). Some of the core kernel maintainers are at the conferences this week, so he got a lot of early pull requests the previous week, because they did not want to deal with them during their travel. "I'm stuck with that part", he said to laughter.
The conclusion of the 20-year project to get the realtime patches upstream was another thing that would be part of the next kernel, Torvalds confirmed. (In fact, he pulled the enablement patches on September 20, the day after receiving them in a rather different form.) People think that kernel development is rapid because of the pace of releases, but any given feature "may have been developed over months or years or, in some cases, decades". That development happens in the open on the mailing lists, but people typically do not see the background work that goes into a new feature that seems to just appear in the kernel. But, he agreed, the realtime patches are an outlier in terms of development time, in part because they "touched every single area in the kernel", so there is a lot of convincing and coordination that needed to be done; he knows of no other similar project out there.
Rust
Hohndel said that one of the topics that has been generating a lot of discussion in the community recently is "obviously Rust". He noted that one of the Rust-for-Linux maintainers stepped down citing "'non-technical nonsense' as the reason"; beyond that, there have been problems getting the Apple graphics driver written in Rust merged. He asked: "why is this so hard?"
"I actually enjoy it, I enjoy arguments", Torvalds said; one of the nice things about the Rust effort is that it "has livened up some of the discussions". Some of those arguments "get nasty" and people do "decide that this is not worth my time", but it is interesting and "kind of shows how much people care". He is not sure why Rust, in particular, has been so contentious, however.
The whole "Rust versus C discussion is taking almost religious overtones" that remind him of the vi versus Emacs fights from when he was young, which, Hohndel reminded, still go on. "C is, in the end, a very simple language", Torvalds said, which is why he and lots of other programmers enjoy it. The other side of that, though, is that it is also easy to make mistakes in C. Rust is quite different and there are those who do not like that difference; "that's OK".
There is no one who understands the entire kernel, he said; he relies heavily on the maintainers of various subsystems, since there are only a few areas that he gets personally involved in. There are C people who do not know Rust and the reverse is true as well, which is also fine. One of the nice things about the kernel is that people can specialize: some people care about drivers, others about specific architectures, or still others who like filesystems. "And that's how it should be."
There are, obviously, some people who do not like "the notion of Rust and having Rust encroach on their area". He finds it all interesting, however. People have "even talked about the Rust integration being a failure", which is way too early to determine. Even if that happens, "and I don't think it will, that's how you learn". He sees the Rust effort as a "positive, even if the arguments are not necessarily always".
Torvalds noted that kernel C is "not normal C"; there are a lot of rules on how the code can be written and there are tools to help detect when things go awry. There is memory-safety infrastructure within the kernel project that is not part of the C language, but it has been built up incrementally over the years, which allowed it to avoid any major outcry. The "Rust change is, obviously, a much bigger and a much more in-your-face" thing.
Hohndel agreed that it was too early to say that Rust in the kernel was a failure, but that he had been hearing about efforts to build a Rust kernel from the bottom up as an alternative. He wondered if that was a potential outcome if there continues to be a struggle to get Rust into Linux; "an alternative universe" could perhaps arise from Redox, Maestro, or some other Rust kernel. In terms of languages for building a kernel, Torvalds said, there is not a lot of choice; unless you are going to write in assembly, you have to choose one of the C-like languages—or Rust. Linux is not everywhere, these days, in part because it has gotten "very big" over the last three decades; some developers are looking for something smaller, safer, and "not quite as fully-fledged", which is an area where Rust kernels could perhaps make an impact.
Hohndel disagreed somewhat with that, though he did agree that there are "deeply embedded" use cases where Linux is not used; but for general-purpose systems, "it is everywhere". He noted that most 5G modem chips have a complete Linux distribution running inside; "in your iPhone is a chip that runs, as its firmware, Linux". Torvalds reminded attendees of the old "joke" about "world domination", but "that joke became reality and isn't funny any more", he said to laughter.
Torvalds seemed optimistic that "some clueless young person will decide 'how hard can it be?'" and start their own operating system in Rust or some other language. If they keep at it "for many, many decades", they may get somewhere; "I am looking forward to seeing that". Hohndel clarified that by "clueless", Torvalds was referring to his younger self; "Oh, absolutely, yeah, you have to be all kinds of stupid to say 'I can do this'", he said to more laughter. He could not have done it without the "literally tens of thousands of other people"; the "only reason I ever started was that I didn't know how hard it would be, but that's what makes it fun".
Gray hair and burnout
The kernel Maintainers Summit was being held the next day, Hohndel said, and he expected the topic of burnout to come up. Maintainers are "an aging group", many with less hair, or hair that is not "the right color"—though Torvalds interjected: "gray is the right color". Meanwhile, kernel development is showing no signs of slowing down, Hohndel said; in fact, it is "accelerating in many ways, Rust being one of them". He wondered, as maintainers get older and burnout becomes more widespread, whether there is a need to talk about a "mini-Linus" who would be a successor.
"We have been talking about that forever", Torvalds replied, "some people are probably still disappointed that I'm still here". While it is definitely true that kernel maintainers are aging, the positive spin on that is that he does not know of many projects where maintainers—and not just him—have stuck around for more than three decades. The idea that people "burn out and go away" is true, but that is the norm for most projects; the fact that some stick around on the kernel project for decades is unusual, "and I think that's, to some degree, a good sign".
On the other hand, new developers may look at the project and not really see a place for themselves when they see people who have been with the project for a long time, he said. The kernel project is unlike other open-source projects, since the number of developers just seems to grow; there is a "fairly healthy developer subsystem" in the kernel. "The whole 'monkey dance' about 'developers, developers, developers'—we've got them"; he does not see the presence of some graying developers as a "huge problem".
Hohndel said that he was not claiming that the older maintainers were a problem, per se, just that it indicates things will have to change down the road. Torvalds has been doing Linux for 33 years, but Hohndel suggested that in another 33, he would not be—"possibly" was the reply. The current backup is Greg Kroah-Hartman, "who has even less hair than the two of us" and is around the same age as they are, Hohndel said. "How do we get the next generation to gain the experience" needed so that they can take over Torvalds's role in "10, 15, 20, 30 years", he wondered.
Since the kernel project has so many developers, it has always had a lot of competent people that could step up if needed, Torvalds said. Hohndel had mentioned Kroah-Hartman, but he has not always been the backup, Torvalds said. "Before Greg, there were Andrews [Morton] and Alans [Cox], and after Greg there will be Shannons and Steves, who knows?" It comes down to a matter of trust, he said; there will need to be a person or group of people that the community can trust.
Part of being trusted is having been around for "long enough that people know how you work", but that does not have to be 30 years. There are top-level maintainers for major subsystems who got there in just a few years, he said. In truth, being a maintainer is "not as glorious as it is sometimes conceived"; there are maintainers "who would be more than happy to get new people in to come and help".
Starting out
When they got their start in open source, that world was a smaller and simpler place, Hohndel said; these days, everything is "hype versus reality", and he was "not just talking about AI", with projects focused on making "the quick buck, about the quick exit, versus things that are making a difference". If Torvalds were starting today, did he think it "would be easy to find interesting, rewarding, long-term useful projects"?
Torvalds replied: "I don't think that has ever been easy". But setting up an open-source project is much easier these days; "do a few web clicks and you have a GitHub repository where you can start doing open source". In addition, you do not have to "explain why you are doing open source, because people take that for granted now". That means there are a lot of small projects out there "that you would never have seen 30 years ago".
That said, "it was never easy to find something meaningful that you could spend decades doing", which is still true today. You have to come up with an idea that you are interested in, Torvalds said, "but at the same time, you are not the only one interested in it". People often say "do what you love", but if that is "something that nobody else cares about, you are not going to create the next big, successful open-source project".
Finding something meaningful is particularly hard in the tech industry, where there is so much hype. "Everybody is following everybody else like lemmings off a cliff trying to chase the next big thing." Torvalds does not think that is a successful strategy; instead, "find something that isn't what everybody else does and excel at that". Hohndel interjected that they were out of time; he had hoped to end with something inspirational along the lines of "Linus telling the community where to go and make a difference", but, what he got was lemmings falling off a cliff, he said with a chuckle. That way, though, the session itself ended with laughter.
[ I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Vienna for Open Source Summit Europe. ]
KDE sets its goals through 2026
Almost a decade ago KDE e.V., the non-profit organization that supports KDE, started a process for selecting goals to help the community unite behind a common vision for where the project should go in the near future. KDE recently wrapped up its 2022-2024 cycle and announced the goals for 2024-2026 at Akademy on September 7, in Würzburg, Germany. This time around, KDE will be looking to streamline its application-development experience, improve support for input devices, and bring in new contributors.
Evolving KDE
The goal-setting practice got its start in 2015 when Lydia Pintscher, then president of KDE e.V., blogged about a plan for a yearly process of gathering community input, defining goals, planning ways to achieve them, and measuring success. She said that setting goals for the project was important to help KDE "get a better understanding of where we are, where we want to go and how we want to get there".
In 2017, Pintscher announced a formal process for setting goals and providing sponsorship from KDE e.V. in the form of support for sprints, presentation slots at Akademy, and more. Goals could be about "anything you consider important – it doesn't have to be about writing code".
Setting goals
Contributors have about a month from the announcement of a cycle to come up with proposals. The proposals are then posted on KDE's Phabricator instance. (Phabricator is a now-defunct code-collaboration tool that was open-sourced by Facebook; KDE is phasing it out in favor of GitLab.) The format is somewhat free-form, but it follows a rough template. It includes a detailed description of the goal (or a problem that the goal will solve), a high-level plan, what's needed to accomplish the goal, the champion or champions for the goal (if any), and a list of contributors willing to work toward the goal. All of that information is important for the next step, the discussion cycle.
During that phase, proposals are batted about and refined based on community input. The community is encouraged to submit proposals even if the proposal does not have a champion yet, with the idea that champions may be found during the discussion period.
The list is then winnowed down to the proposals that are eligible for voting. A proposal might not be deemed eligible for voting if it lacks a champion, is too narrowly focused to be a goal, or provides too little information to be considered. For the current cycle, more than 40 proposals were submitted, with only ten declared ready for voting.
Voting on the proposals is open to KDE contributors with more than 10 "actions" on Invent, KDE's GitLab instance. This year, voting took place from August 15 through August 31. Finally, the top three are announced at Akademy (and via KDE's mailing list and blog). Then the work begins.
For the 2018-2020 cycle, the project chose streamlining onboarding of new contributors, improving user privacy, and improving the usability and productivity of KDE. The 2020-2022 cycle focused on consistency, applications, and finalizing the transition to Wayland. The goals for the most recently completed cycle, 2022-2024, focused on accessibility, sustainable software, and automating and systematizing KDE processes to improve the project's institutional memory.
We are the champions
In prior cycles, each goal had an individual champion who was responsible for the goal from proposal to completion. This time around, Pintscher wrote, goals will be driven by a team of champions instead of an individual champion. She suggested that one person should focus on the goal's vision, another person would work on steering the technical implementation, and a third person would work on promotion. That team structure was merely a suggestion; the team could delegate responsibility in whatever way made sense, so long as there was a team. The plan was to put less pressure on the champions doing the work themselves and more emphasis on them "driving the goal forward through others". It makes sense to expand the number of people responsible for championing major initiatives, rather than putting all the responsibility on one person. As with most community open-source projects, KDE is not overburdened with contributors, and it's important to help them avoid burnout.
KDE e.V. would work with champions to help with fundraising for projects to support the goals.
The winning goals were unveiled at a session at Akademy by Pintscher (video here), and later sent to the kde-community mailing list by Farid Abdelnour. The first winner is "streamlined application development experience", championed by Nate Graham and Nicolas Fella. The second is "we care about your input", championed by Gernot Schiller, Jakob Petsovits, and Joshua Goins. Finally, the third winning goal is "KDE needs you!", which has three champions: Aniqa Khokhar, Johnny Jazeix, and Paul Brown.
Streamlined application-development experience
The streamlining proposal is a good example of how goals are refined over the discussion period. The original text for the goal sparked a lengthy discussion, with Julius Enriquez pointing out that some of its suggestions overlapped with a KDE apps initiative by Carl Schwan. Fella suggested rewriting the proposal around making sure "our application development story is as good as possible", whether they are first-party applications from the KDE project itself or third-party applications targeting KDE.
Ultimately, Fella's rewritten version of the goal was adopted, leading him and Graham to sign on as champions. The final proposal says that previous goals have helped to improve application delivery and design, and to improve the consistency of the Plasma desktop—but KDE has fallen short of improving the application-development story:
There are many cases where our application design is inconsistent, either because no unified idea has been established or older applications have not been updated to new design ideas. The introduction of convergent design as a goal introduced new complexity to this. We tried many ideas and approaches, but often didn't converge (pun intended) on a common design.
To that end, the plan is to improve KDE's human interface guidelines (HIG) and design documentation for applications. Some of that work had begun before the goal was proposed: Graham announced a rewrite of the guidelines in June 2024.
The goal goes much farther than that, though; it sets its sights on improving developer documentation, establishing "a culture of continuous review and improvement", and making it possible to build KDE applications with languages other than C++, such as Rust or Python.
We care about your input
KDE has decent support for basic input devices, such as mice, touchpads, and (of course) keyboards. However, users need support for a much wider array of devices or input types, including drawing tablets, touchscreens, speech-to-text input, and more. Those devices are often under-supported on Linux since the manufacturers usually do not do the work of providing software for platforms other than Windows and sometimes macOS. Users tend not to care about where the fault lies, however, only that their devices work as expected with all the bells and whistles. In addition, the proposal acknowledges that KDE still needs to close the gap between Wayland and X11 support.
The plan is to start by gathering information about missing features, bugs, wishlist items, and GitLab issues to compile a wiki page that tracks work to be done. The Wayland showstopper page that was used for the Wayland goal in the 2020-2022 cycle is given as an example of this. The next steps will be to perform research on input topics and blog about the findings "to educate users and developers, spur discussions, and inspire contributions from the wider community". The team also plans to reach out to users with "special hardware" who can help test patches, and to coordinate developers interested in working on improving input-device support in KDE.
Some of the ideas that are already on the table include disabling a touchpad when a mouse is plugged in, allowing customization of three-finger and four-finger touchpad gestures, and making it easy to re-bind keyboard keys globally, for example by letting users remap the Copilot key that is included on PCs shipping with Windows 11 to something useful.
Even more ambitiously, the plan includes a number of ideas around input-method editor (IME) support in KDE. Generally, an IME is software that allows users to input text through methods other than a traditional keyboard. This covers input methods such as speech-to-text, language translation, and handwriting digitization when writing with a stylus on a touchscreen laptop.
Plasma is included as the default desktop of SteamOS, a Linux distribution that is designed for portable gaming PCs such as the Steam Deck. SteamOS can be used on other devices, however, and KDE may not have support for third-party game controllers that users choose for those devices. The proposal notes that navigating the desktop and applications on SteamOS without the Steam Deck's trackpads "is not currently a pleasant experience". Therefore, the team has set a target of making game controllers a primary input device for KDE. This would entail making it possible to navigate the desktop using only "arrow keys, a Back button, an A (enter/select) button and a menu button".
And, of course, an input goal would not be complete without addressing accessibility. The proposal notes that some of the input focus areas "are directly relevant to accessibility concerns". It notes Matt Campbell's work on the Newton accessibility architecture for Wayland as work that "KDE needs to stay on top of" to ensure that screen readers work well with KDE software "regardless of whether users access it via mouse, keyboard, touchscreen, tablet or game controller".
What does success look like for this goal? The proposal says that success is when "we're happy enough with the input stack to move on to other undertakings instead". Some of the specific indicators of success include: at least one new IME input method, a virtual keyboard that can fully replace a physical keyboard for users with a touchscreen or mouse input only, and new reports of "missing or unintuitive functionality become exceedingly rare" for mice and keyboards.
KDE needs you
Finally, the "KDE needs you!" goal focuses on formalizing and
improving KDE's processes for recruiting active contributors, with an
emphasis on "active". The proposal explains that many of KDE's
projects are maintained by only one or two people, leaving aside
"flyby contributions (which cannot be relied upon for continuous
and stable development)
". Even a core project like Plasma, it says,
only has between eight and ten contributors; attracting new
contributors is a matter of survival for KDE.
KDE already has a mentorship team that is meant to help new contributors learn how to productively contribute to KDE. The proposal recommends expanding that team's charter, increasing its resources, and turning it into a recruitment team, "since the main purpose of the mentorship programmes is recruitment anyway". The proposal aims to recruit more aggressively over social media, through direct contacts, and more. It also recommends adding a paid contractor to manage the recruitment team, if necessary and if resources allow.
The proposal identifies academic institutions and companies as potential sources for new contributors. It suggests trying to persuade educational institutions to direct students to do internships or final projects with KDE. Companies might want to be involved, it says, to help train their employees on working with open source.
In addition to the recruitment team mentoring new contributors, the proposal suggests that KDE's "harried developers" should "slow down development and spend time making it easier to bring new contributors into the fold":
It's often argued that developers don't have the time to mentor and train new contributors, but why not? There are no quarterly targets to meet, no board of directors demanding more and more output. Development could slow down by 1/4 for, say, two years and KDE would not be affected in any significant way. So instead of 100 changes listed in the changelog, there are now 75, so what? This is not a big deal folks.
The slowdown would be temporary, according to the proposal, as more contributors are trained and become productive. Success will be easy to identify by counting heads. The target for the goal is to increase the number of regular contributors to some of KDE's core projects by 50% in the next two years.
Cumulative culture
One might wonder about the results of the program so far, and whether the effort of goal-setting has panned out for the project. Looking at prior cycles, the practice does seem to be working for the KDE community. The onboarding project, for example, delivered a number of improvements to KDE's Bugzilla during the cycle to help new contributors file better bugs, and helped fill out KDE's developer documentation.
The Wayland goal didn't quite succeed in hitting its target of letting KDE use Wayland by default by the end of the cycle, but it laid the groundwork for KDE to make the switch in 2023, ahead of the KDE Plasma 6 release in 2024. That is in keeping with KDE's philosophy of cumulative culture: every cycle "represents a new layer of accumulated wisdom, i.e. new features and more stability". As long as the process continues to improve KDE in the long run, goals don't have to hit 100 percent (though that would be nice); they just have to drive KDE forward.
Kangrejos 2024
Kangrejos is the annual conference about the Rust-for-Linux project. In 2024, it was held in Copenhagen on September 7 and 8. Topics included current challenges faced by developers using Rust in the kernel, future directions of development, and status updates on people's work. Our coverage is a work in progress; articles from the conference will be listed here.
- A discussion of Rust safety documentation: Benno Lossin continues his work to standardize safety comments on kernel code.
- Best practices for error handling in kernel Rust: Dirk Behme asked whether there were ways that the kernel could improve error handling in Rust components.
- Resources for learning Rust for kernel development: the Rust-for-Linux project discusses collecting learning materials.
- What the Nova driver needs: Danilo Krummrich explains what is needed for his new work on the Nova driver.
- Getting PCI driver abstractions upstream: What is required to have a completely safe PCI driver?
- Coccinelle for Rust: next steps toward being able to use Coccinelle with Rust code.
- BTF, Rust, and the kernel toolchain: how evolutions in BTF could impact the Rust-for-Linux project.
- Smart pointers for the kernel: new Rust features to improve smart pointer ergonomics.
- Efficient Rust tracepoints: integrating Rust with the kernel's tracepoint mechanism.
- Improving bindgen for the kernel: John Baublitz explains the improvements that he has made to bindgen.
- FFI type mismatches in Rust for Linux: there are a few subtle mismatches between Rust and C code; how can they be fixed?
- Zapping pointers out of thin air: Paul McKenney discusses the work being done to make C++ concurrency safer, and how it might affect Rust.
- Using LKMM atomics in Rust: Rust is unaware of the Linux kernel memory model; how can this be rectified?
Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our coverage of Kangrejos.
Best practices for error handling in kernel Rust
Dirk Behme led a session discussing the use of Rust's question-mark operator in the kernel at Kangrejos 2024. He was particularly concerned with the concept of "silent" errors that don't print any messages to the console. Other attendees were less convinced that this was a problem, but his presentation sparked a lot of discussion about whether the Rust-for-Linux project could improve error handling in kernel Rust code.
He opened the session by giving a simplified example, based on some actual debugging he had done. In short, he had a Rust function that returns an error. In Rust errors are just normal values that are returned to the caller of a function like any other value. The caller can either do something explicit with the error — or return early and pass the error on to its caller using the question-mark operator. Code like foo()? calls foo() and then, if it returns an Err value, immediately returns the error to the calling function.
This chain has to end at some point. In user space, the ultimate endpoint is the return value from the program's main function. If main() returns an error, the default behavior is to print it to stderr and exit with a nonzero exit code:
$ ./failing-rust-program
Error: ParseIntError { kind: InvalidDigit }
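For illustration, a minimal user-space program of that kind could look like the sketch below; when main() returns an Err, the Rust runtime prints the error's Debug representation to stderr and exits with a nonzero code.
use std::num::ParseIntError;

// Produces output like the example above when the parse fails.
fn main() -> Result<(), ParseIntError> {
    let n: i32 = "not-a-number".parse()?; // ? propagates the ParseIntError
    println!("{n}");
    Ok(())
}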
In the kernel, there is no such facility, so an unhandled error might not result in anything being printed to the kernel log, which is not ideal. Behme looked at a few alternatives. One suggestion was that the coding standards for kernel Rust could encourage people to use .expect() instead of the question-mark operator, which does show a message in the log. It also currently crashes the kernel, however, which is never the right solution.
let value = fallible_operation()
.expect("the operation not to fail in this case. If it does, BUG_ON()");
An alternative solution, he mentioned, could be adapted from ScopeGuard. He asked Andreas Hindborg to explain in more detail. Unlike the question-mark operator or .expect(), ScopeGuard is not built into the Rust language, Hindborg explained. Rather, it's part of the kernel crate. A ScopeGuard is a wrapper around a function pointer that calls the function whenever the ScopeGuard stops being referenced. It's useful for handling cleanup when a function has more complicated cleanup than just freeing memory or unlocking locks (which can be handled by Rust's Drop mechanism). The user can also "disarm" the ScopeGuard to prevent it from firing on code paths that handle cleanup themselves (or don't need to do it for some reason):
// Create an anonymous function to perform cleanup
let guard = ScopeGuard::new(|| {
... // Cleanup code
});
...
if some_condition() {
return; // guard goes out of scope, runs cleanup code
}
...
// When the function exits normally, the guard can be disarmed
guard.disarm();
return;
Behme suggested that they might want to adapt this to create an error type that prints a log message to the console when an error of that type is dropped. So if a programmer calls a fallible function, they can either handle the error themselves (preventing it from being dropped), pass it on to their caller (likewise preventing it from being dropped), or forget to handle it — causing the value to be dropped and an error to be printed to the console. A downside of this approach, he thought, was that he wasn't certain how to make it work with line information — after all, the place where the error is dropped is likely to be far away from where it was created, so the error type would need to capture and store the original source information.
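A minimal user-space sketch of the idea might look like this; the type and its methods are hypothetical (not an existing kernel API), and Rust's #[track_caller] attribute is one way to capture the creation site, addressing the line-information concern.
// An error value that logs itself if dropped without being handled.
struct LoggedError {
    errno: i32,
    file: &'static str,
    line: u32,
}

impl LoggedError {
    // #[track_caller] lets the constructor record where the error
    // was created, not where it was eventually dropped.
    #[track_caller]
    fn new(errno: i32) -> Self {
        let loc = core::panic::Location::caller();
        LoggedError { errno, file: loc.file(), line: loc.line() }
    }

    // Handling the error consumes it without running the Drop logging.
    fn handle(self) -> i32 {
        let errno = self.errno;
        core::mem::forget(self); // skip the Drop implementation
        errno
    }
}

impl Drop for LoggedError {
    fn drop(&mut self) {
        // pr_err!() in the kernel; println! stands in for it here.
        println!("unhandled error {} at {}:{}", self.errno, self.file, self.line);
    }
}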
For guidance, Behme suggested looking at what C does. In C, not all errors print log messages, but not all errors are silent either. It can be decided on a case-by-case basis. Kernel Rust code could potentially do the same, but then contributors lose some of the convenience of having a generic solution. He asked what the attendees would prefer.
Miguel Ojeda asked when they actually want errors to be printed — does it make sense to have a special debugging mode that prints more errors? Or to select individual functions for tracing? In either case, it would be better to enable that without requiring source-level changes for tracing.
Behme was skeptical about the idea of having a special debugging mode; he felt that developers often don't actually run with modes like that turned on, and that customers with a production image don't either. So adding a special mode would not really help.
Benno Lossin asked what information Behme wanted for debugging — did he want just the file and line number, or would he like to show a whole backtrace? Behme answered that obviously the best approach would be "an AI that tells me where the bug is". Failing that, more information is better — and just one line would be better than being completely silent. "I'll take what I can get".
Lossin mentioned that in one of his user-space projects he embeds a backtrace in his error type by using one of the internal details of the question-mark operator: when a function returns one type of error, but uses the question-mark operator on another, the operator tries to convert between the types using the From trait. If no such conversion is possible, it's a compile-time error.
So in Lossin's project, he has one error type for all of his code, and when some external library returns an error, the From implementation captures a backtrace and stores it. Behme thought that sounded potentially useful, and that having full backtraces available would be helpful.
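In user space, that pattern might look something like the following sketch (the names are hypothetical, and std::backtrace is not available in the kernel):
use std::backtrace::Backtrace;
use std::num::ParseIntError;

// One project-wide error type wrapping errors from external code.
#[derive(Debug)]
struct MyError {
    source: ParseIntError,
    backtrace: Backtrace,
}

// The ? operator calls From::from() when the error types differ, so
// the backtrace is captured at the point where ? fires.
impl From<ParseIntError> for MyError {
    fn from(source: ParseIntError) -> Self {
        MyError { source, backtrace: Backtrace::capture() }
    }
}

fn parse(s: &str) -> Result<i32, MyError> {
    Ok(s.parse::<i32>()?) // ParseIntError converted via From
}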
Alice Ryhl wasn't sure that they should want errors in drivers to print to the console, however. She pointed out that some "error" conditions are actually fairly normal, such as temporary allocation failures. Logging about these whenever they happen could spam the kernel log. Behme responded that Ryhl's driver has about 200 places that use the question-mark operator — did she really want all of those to be silent?
Ryhl explained that some of them are not silent — she has explicit logging code where needed. But also, she said, the utility of the question-mark operator is that it makes it easy to bubble-up errors to higher levels that can handle them with more context. So not every invocation of the question-mark operator really needs to log.
Paul McKenney agreed, saying that it was possible to have too much output, especially in embedded systems. He noted that kernel developers have spent 20 years adding and removing debug info from various places in the kernel in order to try and get the balance correct.
Behme noted that he would even be happy with the conclusion that Rust should just handle each case manually, like C, as long as they started forbidding the question-mark operator so that people actually have to think about it.
Boqun Feng thought that dynamic BPF tracepoints could perhaps help with debugging things like this; then there would be no need to log anything until the user attaches a function for debugging.
Greg Kroah-Hartman disagreed, saying that whatever logging they decided on was something that would run all of the time. It's often not possible to tell the customer to rerun something with debugging turned on. He had some thoughts of his own on the question, however: traditionally in the kernel, it's the function that creates the error that's responsible for logging it, he said. If there's a memory error, you don't print it at every level bubbling up.
Behme suggested that would mean the question-mark operator should only be used after checking that the function that creates the error does log it, then.
Carlos Bilbao was confused about how Behme was seeing errors not being reported, anyway — in Rust, the Result type, which represents either an error or a normal return value, has the #[must_use] attribute. So the compiler will emit a warning if the code does not do something with an error.
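A minimal demonstration of that behavior (with a hypothetical function) is sketched below; note that an explicit let _ = silences the warning, and the error, which is one way reports can still be lost.
fn fallible() -> Result<(), i32> {
    Err(-22)
}

fn main() {
    fallible();         // warning: unused `Result` that must be used
    let _ = fallible(); // compiles silently -- and the error is discarded
}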
Lossin thought that the C approach might not make sense for Rust, because idiomatic Rust often has many small helper functions, and it could make sense to centralize error reporting more than that. He went on to suggest adding an extension to Result that can attach log messages to errors — like the popular user-space anyhow crate does:
some_fallible_operation().log("if it fails, this is printed")?;
// and the error is still bubbled up with ?
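Such an extension could be sketched as follows (hypothetical code, not an existing kernel API):
trait LogResult<T, E> {
    fn log(self, msg: &str) -> Result<T, E>;
}

impl<T, E: core::fmt::Debug> LogResult<T, E> for Result<T, E> {
    fn log(self, msg: &str) -> Result<T, E> {
        if let Err(ref e) = self {
            // pr_err!() in the kernel; eprintln! stands in for it here.
            eprintln!("{msg}: {e:?}");
        }
        self // the error is still returned, so ? can bubble it up
    }
}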
Gary Guo pointed out that you actually want more information than just "there was an error", so you can't really get away from the need for custom error-printing code. Hindborg noted that different use cases warrant different amounts of logging.
The eventual conclusion was that the question-mark operator was not the problem per se, but rather a lack of standardized error handling and logging in kernel Rust. Additional libraries or simple documentation that helps address the issue could be useful. The balance between performance, ease of debugging, and error handling remains one that requires human judgement.
[ Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our coverage of Kangrejos. ]
Resources for learning Rust for kernel development
Dirk Behme led a second session at Kangrejos 2024, back-to-back with his session on error handling, about providing better guidance for users of the kernel's Rust abstractions. Just after that, Carlos Bilbao and Miguel Ojeda had their own time slot dedicated to collecting resources that could be of use to someone trying to come up to speed on kernel development in Rust. The attendees provided a lot of guidance in both sessions, and discussed what they could do to make things easier for people coming from non-Rust backgrounds.
Behme opened the session by noting that "most of you are special" — that the attendees were, by and large, already knowledgeable about Rust. They have written drivers, seen that abstractions were missing, and written the abstractions as well. So nearly everyone in the room was an expert who knew all of the details of how Rust works in the Linux kernel. Behme isn't a computer-science person, though; his background is in electrical engineering.
He put up a picture of Linux Device Drivers, 3rd edition, asking: does there also need to be a book about Rust kernel abstractions? Rust is said to have a steep learning curve — and Rust-for-Linux goes even further, since it involves writing low-level code in a particular style and the kernel is always under heavy development.
To illustrate his point, Behme put up some examples of beginners asking about writing kernel Rust. One person was having trouble writing a module. Alice Ryhl had replied to them that the abstraction they were using had changed its API, and explained how to adapt their module. This isn't an uncommon problem — others have also reported needing time to adapt, he said. Behme himself took some time to figure out the devicetree abstraction — about a week. He said that this wasn't a complaint, just an example of how learning the necessary prerequisites can be hard, and how the project could have better learning materials.
Andreas Hindborg said that when an abstraction goes into a kernel tree, the requirement is for there to be a user of that abstraction — so there should be an example right there in the tree. In practice, he said, the abstractions that do go in also tend to have good examples in the documentation. So the project certainly intends for the type of learning material Behme was asking for to exist.
Miguel Ojeda pointed out that there may be books about Linux device drivers, but that it's still early days for Rust in the kernel. It took time for those books to be written, he said. "We were thinking about writing a book," he continued, but it was just too much work right now.
Behme replied that he did think that the project was doing a good job with documentation, but that it was not enough. At his work, he had asked about whether they could start using Rust-for-Linux soon; his manager said no, not for technical reasons, but for social reasons — the learning curve from C to Rust was too steep for most of the engineers at his company.
One audience member asked whether that difficult curve was due to the language, or the Rust-for-Linux project. Behme said that the main concern was Rust, but that the project adds complexity on top of that. He gave the example of looking at some Rust code and seeing that it called spin_lock(), but "why the hell was there no unlock"? (Answer: the spin_lock() Rust abstraction returns a guard object that automatically releases the lock when it is dropped — either explicitly by the programmer, or implicitly at the end of the function.) There are examples for these things, but the underlying reasoning is different from C, and that takes time to learn.
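The pattern looks something like this minimal user-space sketch (using std's Mutex; the kernel's lock types differ in detail but follow the same guard idea):
use std::sync::Mutex;

fn update(counter: &Mutex<u64>) {
    let mut guard = counter.lock().unwrap(); // locking returns a guard
    *guard += 1;
    // No explicit unlock: the lock is released when `guard` is dropped
    // at the end of the scope.
}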
Paul McKenney noted that modern kernel C code actually has similar lock guards now, and that maybe this would make Rust's use of lock-guard objects less counterintuitive. Ryhl wondered whether having translations for common conventions between languages would be helpful.
Hindborg agreed that it takes time to learn a language, and that Rust is fairly difficult to learn as imperative languages go. But "you need to invest time in learning that; it doesn't come for free". Once you have put the time in, there are substantial benefits. He suggested that Behme tell his manager that, noting that Google claimed a 3x productivity increase with Rust. He also said that while Rust-for-Linux does add some additional details on top of plain Rust, that's nothing new for the kernel — C in the kernel is pretty different from C in user space.
Ryhl asked what it takes to teach other developers to write Rust, in the other attendees' experience. She noted that there would be an upcoming talk at RustConf on that topic, actually. Another attendee pointed out that Rust-for-Linux patches go via the mailing lists, just like any kernel patch — so the documentation that justifies or explains a change is often there. Behme asked if it would be possible to get something like a set of release notes for each kernel, talking about what changed.
Hindborg replied that all of the changes are there in Git. Ojeda suggested that when Behme saw a change to an API, he should go look at the corresponding commit.
Greg Kroah-Hartman pointed out that the kernel's C developers don't provide internal kernel-change information — so why should the Rust developers do that? On a related note, he advised against using Linux Device Drivers, because it is now seriously out of date. Ojeda agreed, noting that the changes Behme had highlighted were all to internal Rust APIs — and just like other kernel interfaces, there is no guarantee that they can't change at any time.
Richard Weinberger thought Behme had a good point, however — he noted that most people writing device drivers are electrical engineers, not computer scientists. From that point of view, Rust looks "outlandish and hostile". If you know OCaml and Haskell, Rust looks awesome, he said. The Rust-for-Linux developers should be careful not to assume that kernel hackers who only know C have the same positive impression.
Hindborg replied that he understood Weinberger's point, but that he was himself an electrical engineer who learned C and then Rust. It's not impossible, he said, and you can expect people to learn new tools.
Yes, but you need to give them motivation to do so, Weinberger responded. Benno Lossin said that as a writer of Rust documentation, it's often hard to know what a beginner won't understand. If you're coming at it from the Rust side, the reason that there's no corresponding unlock() in the code is pretty clear. We need to listen to the people coming from the kernel side who have problems in order to improve our documentation, he said. He asked Behme to write down some of the problems he had encountered, so they could turn it into some documentation. Behme agreed.
Lossin also agreed that the changes to the Rust APIs were frustrating, but that they could become better over time — there's a preexisting plan to split up some of the functionality of the kernel crate into smaller crates that should therefore see less frequent changes. He said that there are so many people working on the kernel crate right now that it's hard for anyone to track all of the changes.
Gary Guo thought that Rust would actually shine for developers writing device drivers. It's unrealistic to expect every engineer to understand C's many sharp edges, he said. In Rust, the APIs are not all there yet, but it's possible that they could only ever need to write safe Rust code. So there's actually value in getting less experienced engineers to write Rust — the compiler will help them write fewer bugs.
Simona Vetter said that, in her experience, the average kernel C developer doesn't understand C either. There are, in theory, five people who could write a bug-free driver, and in practice zero. It's practically impossible to write bug-free kernel code, she stated.
Behme replied that, in industry, you have to write drivers. So getting acceptable drivers out of real engineers is a requirement. Vetter thought that Rust could actually be helpful with that — her hope is that new engineers could just type out "random code" in Rust and get a correct driver out, which would never happen in C.
Hindborg thought that was an interesting observation. He predicted that lots of people would be angry if their employer told them to write Rust, because nobody likes being told off by the compiler. But despite that, when the necessary libraries are in place, perhaps we can just never compile a buggy driver.
Ryhl noted that she has seen other people contribute to her driver without adding any abstractions. So in at least one domain, things have gotten to the point that things are mostly stable.
Collecting resources
At that point, it was officially time to go to the next session. But, luckily, the next session was scheduled to be a roundup of different educational materials, with the aim of producing a recommended list for learning Rust in the kernel.
Ojeda asked that people list the resources they had found most helpful while learning Rust. For his own part, he found an online book from Brown University's Cognitive Engineering Lab, which features an interactive borrow-checker visualization, helpful.
Guo joked that the best way to learn Rust was to learn C++, hate it, and then learn Rust. Lots of concepts map, he said, but Rust is much better. Adrian Taylor suggested the New Rustacean podcast. He thought that audio was a weird way to learn a programming language, but he liked it in this case. He also suggested a series of articles, "Learn Rust the Dangerous Way", which shows the incremental conversion of a C program to Rust.
Kroah-Hartman said that the Linux Foundation has a free online course for learning Rust. Hindborg said that Google had a free five-day course as well, "Comprehensive Rust".
Lossin gave a more general recommendation — read blogs. There are lots of good posts on advanced topics, he said. He particularly liked Amos Wenger's explanation of Pin. Ojeda suggested the Master's thesis "You Can't Spell Trust Without Rust" by Aria Desires as a good resource for advanced topics as well. She also wrote "Learn Rust With Entirely Too Many Linked Lists", which nobody recommended at the time, but which is also intended as an introduction to Rust for programmers with existing C experience.
With the resources collected, the discussion turned to what to do with them. Bilbao said that the project should make a distinction between people just starting out, and people who have been writing Rust in Linux for some time — they have different needs. He suggested using the Rust-for-Linux web site as a central location for hosting good blog posts, but also thought that it was important that the project be "serious" about ensuring things are well documented.
Lossin noted that there is already a linter rule requiring that all public items (functions and types) be documented. There was a brief discussion of current kernel conventions for documenting C code.
Vetter ended up pointing out one problem with kernel-doc, the tool that checks whether C code is documented. It doesn't complain when there are no comments, but it does complain when there is one comment and some still missing. This makes people not want to add documentation where it doesn't already exist. Rust is ahead, she said, because just requiring that documentation exists, even if people don't put effort into it, makes it easier to improve later. She pointed out that the overlap between people who are good at writing complicated code and people who are good at writing documentation is often not big — so it's okay to encourage people to collaborate.
In all, there was a clear consensus that the Rust-for-Linux project could make it easier for people to get up to speed with the knowledge necessary to write Rust in the kernel. So the project will continue to encourage good documentation standards, centralize learning resources, and work with other kernel developers who bring up pain points to figure out what else needs to be covered.
[ Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our coverage of Kangrejos. ]
What the Nova GPU driver needs
In March, Danilo Krummrich announced the new Nova GPU driver — a successor to Nouveau for controlling NVIDIA GPUs. At Kangrejos 2024, Krummrich gave a presentation about what it is, why it's needed, and where it's going next. Hearing about the needs of the driver provoked extended discussion on related topics, including what level of safety is reasonable to expect from drivers, given that they must interact with the hardware.
Krummrich started off by covering the motivation for the new driver. He had been working on multiple drivers that required a particular component, GPUVM, the dedicated memory-management system for some GPUs — but that component only had one core contributor. There are a few reasons that nobody had stepped up to help: it was reverse-engineered from the hardware, so there was little documentation, and the hardware itself is complicated. Now, Krummrich needed to add another complication: the new GPU System Processor (GSP), a processor intended for low-latency GPU configuration that is included in some recent NVIDIA GPUs.
The parts of the driver outside of GPUVM and the GSP aren't as bad, Krummrich said. The DRM layer does not suffer as much from missing documentation — but some driver code is still hard to understand. For example, the addition of virtual memory management for Vulkan saw the page table for GPU memory implemented separately from the memory management, which doesn't work out well; in theory, lockups are possible, he explained. Ultimately, he determined it was better to have a clean cut between the GSP code and the legacy code.
And, if a clean cut is necessary, Rust is a good choice, he explained. Using Rust has the normal benefits for memory safety that people talk about, but there are other, more specific, reasons to pick it. The GSP firmware interface is unstable — the firmware generally works by placing messages in queues in shared memory, but the details are not guaranteed to remain the same from one version to another. This is partly because of how NVIDIA distributes changes: it can bundle a new firmware and new driver together, so it doesn't need to keep the interface stable.
That approach doesn't work for the upstream kernel, Krummrich said. Once the kernel supports a firmware version, that support must be maintained. This is, of course, possible in C, but it becomes "really messy". Not every version changes everything, so there's lots of common code, with occasional version-specific hacks.
In C, this caused the code to slowly devolve into "macro hell". He hopes that with Rust, he and his collaborators can do something better — Rust's procedural macros are a lot more flexible, understandable, and maintainable. His proposed approach is to generate Rust structures from NVIDIA's C headers, and then generate separate code for each version implementing a common interface. Then the right version can be picked at run time.
Discussion
Writing a complex graphics driver is a big task, though, so Krummrich has a plan to tackle it a step at a time. He noted that Rust drivers so far have faced a chicken-and-egg problem: abstractions and the users of those abstractions both need to be merged at the same time. Asahi Linux had problems with that. So, for Nova, Krummrich wants to keep it simple and start with a stub driver, then take the time to work things out and improve it incrementally.
Paul McKenney agreed that getting work upstreamed can be difficult, but wondered if there was some way to get access to in-progress work before it was upstreamed. Krummrich said that there are some staging branches in the Rust-for-Linux tree and in the DRM tree. He maintains a branch that merges them all together on top of the latest kernel.
Maciej Falkowski asked to hear more details about how versioning using procedural macros works. Krummrich explained that the exact details are somewhat in flux, but that conceptually, for each function, they annotate parts of the code as only applying to a specific version. Then a macro picks and chooses code blocks to assemble the source for each version, and generates a trait implementation using those.
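Conceptually, the generated result might resemble this user-space sketch (all names and version numbers here are hypothetical; the real macros generate an implementation per supported firmware version):
// A common interface implemented once per firmware version.
trait GspInterface {
    fn send_message(&self, msg: &[u8]);
}

struct GspV535;
struct GspV550;

impl GspInterface for GspV535 {
    fn send_message(&self, _msg: &[u8]) { /* v535 queue layout */ }
}

impl GspInterface for GspV550 {
    fn send_message(&self, _msg: &[u8]) { /* v550 queue layout */ }
}

// Pick the implementation matching the firmware actually loaded.
fn probe(version: u32) -> Box<dyn GspInterface> {
    match version {
        535 => Box::new(GspV535),
        _ => Box::new(GspV550),
    }
}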
Benno Lossin asked what Krummrich thought the core problems he was facing were, and how the other people in the room could help — other than reviewing, which he knows always needs more people. The best thing would be to help get abstractions upstream, Krummrich said. "So, reviewing," Gary Guo replied. Krummrich agreed, to scattered chuckles.
Alice Ryhl had questions about the chicken-and-egg problem he had described; she hasn't run into it. For her, telling maintainers "here's this abstraction, I'll upstream the user later" has worked just fine. Krummrich said that for a few patch series, he had been explicitly asked to show a user. Ryhl suggested that it might help to reference specific patches that are not upstream yet.
Miguel Ojeda asked how much safe and unsafe code had been required so far. Krummrich thought it was a fair question, but that the answer might be misleading without the context of which features have actually been implemented yet. The current Nova stub can represent PCI buses, read and write some values, and use those to initialize the GPU, he said. It can also use a DRM abstraction to create an object representing a message queue, but it does not actually allocate memory or communicate with the GPU yet. And, in its current state, it does not use any unsafe code.
Lossin asked how much unsafe code Krummrich expected to need when the driver was complete. Ideally, unsafe code would only be needed for the firmware interface, where there is shared memory, Krummrich answered.
Carlos Bilbao was surprised that it was possible to get even that far with only safe code — doesn't the driver need to memory-map registers? When the code sets up the PCI interface, the size of the base address registers (BARs) is known either at compile time or at run time from the PCI subsystem, Krummrich explained. So the Nova driver says "I need N bytes out of a PCI BAR", and this allocation either fails or succeeds. If it succeeds, it can be wrapped in a structure with bounds checking that ensures that once the device is unbound, the memory can no longer be accessed. That's a shared abstraction in the kernel crate, so the driver itself never needs unsafe code. As long as the abstraction is sound, the whole thing should be memory safe, albeit with an obvious caveat: the GPU can do whatever it likes in response to changes to memory-mapped registers, potentially including things that subvert Rust's guarantees.
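A minimal sketch of such a wrapper, assuming a compile-time size and leaving out the device-unbind handling that the real kernel-crate abstraction provides, might look like this (invented names, not the actual kernel API):

pub struct Bar<const SIZE: usize> {
    base: *mut u8, // start of the mapped region; construction elided
}

impl<const SIZE: usize> Bar<SIZE> {
    pub fn read32(&self, offset: usize) -> Option<u32> {
        // Refuse any access that is misaligned or out of bounds.
        if offset % 4 != 0 || offset.checked_add(4)? > SIZE {
            return None;
        }
        // SAFETY: the offset was checked against the SIZE-byte mapping
        // above; the only unsafe code lives here, inside the abstraction.
        Some(unsafe { core::ptr::read_volatile(self.base.add(offset) as *const u32) })
    }
}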
Andreas Hindborg said that he thought they had debated whether the bound on the size of a PCI BAR should be known at run time or compile time. On the GPU, the offset can't be known at compile time, Krummrich said, because the GPU itself tells you how big the allocation should be. Hindborg pointed out that this means trusting the device to give accurate answers.
If the device gives a random address that makes no sense, it will fail the bounds check on the PCI BAR itself, Krummrich replied. Greg Kroah-Hartman noted that this may be the case, but we're still trusting the hardware. Hindborg said he was convinced of the safety of the approach, however, pointing out that they know the size of the PCI BAR. "Sometimes," Kroah-Hartman ominously objected. Krummrich was of the opinion that if the device was going to lie about the size of the PCI BAR, "we can't do anything about that".
Kroah-Hartman warned that this wasn't hypothetical — there are devices that lie about the size of their BARs, and use that to take over your kernel. That's why the kernel has trusted and untrusted modes for PCI (and USB, as well) — do you trust the hardware or not? But he admitted that they do need to trust the hardware at some point.
This kicked off an extensive discussion about whether writing a filesystem that does not trust its underlying block device is possible, including whether there was a way to track data that has not been validated in the type system. After the session, Lossin followed up on that discussion by posting a patch set introducing an abstraction for tracking unvalidated data.
The conclusion in the room was that compile-time tracking like that would be useful, but that it certainly would not come without some work. What is potentially possible soon (with good API design) is making filesystem drivers that, when used with malicious filesystems, don't allow kernel exploits — even if there's no practical way to prevent them from returning arbitrary bad data back to user space.
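The general shape of such an abstraction is easy to sketch; the following is a hypothetical illustration, not the API from Lossin's patch set. Untrusted data is wrapped in a type that offers no direct access, and the only way to get the value out is through an explicit validation step:

// Data from an untrusted source (a block device, say) starts out wrapped.
pub struct Unvalidated<T>(T);

impl<T> Unvalidated<T> {
    pub fn new(raw: T) -> Self {
        Self(raw)
    }

    // The inner value can only be extracted by running a validator, so
    // unchecked data cannot accidentally be used as if it were trusted.
    pub fn validate<E>(self, check: impl FnOnce(&T) -> Result<(), E>) -> Result<T, E> {
        check(&self.0)?;
        Ok(self.0)
    }
}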
[ Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our coverage of Kangrejos. ]
The 2024 Maintainers Summit
The kernel Maintainers Summit is an annual, invitation-only gathering of a few dozen of the top kernel subsystem maintainers. The 2024 Summit was held on September 17 in Vienna, Austria, immediately prior to the Linux Plumbers Conference. LWN was represented there by Jonathan Corbet (the kernel's documentation maintainer). The topics discussed at this year's gathering included:
- Regression tracking: the kernel's regression tracker is unfunded and unsure about continuing; is this work valuable and how can the task be supported?
- Considering kernel pass-through interfaces: what position should the kernel community take toward device drivers that simply pass commands through, unmediated, to a device?
- Tools for kernel developers: the current status and future direction for kernel-development tools.
- Committing to Rust in the kernel: has the Rust experiment succeeded, and where does it go from here?
[ Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event. ]
The uncertain future of kernel regression tracking
Tracking of regressions seems like an important task for any project; there is no other way to ensure that known problems are fixed. At the 2024 Maintainers Summit, though, Thorsten Leemhuis, who has been doing that work for the kernel, expressed some doubts about whether it is worth continuing. The result was an energetic session on how regression tracking should be done better, and how this work should be supported.
Leemhuis began by saying that he is thinking about giving up on regression tracking. The funding that was supporting this work has gone away. On top of that, this work has resulted in a number of "annoying" discussions with maintainers who do not appreciate being nagged about open regressions. He does not really even know what Linus Torvalds expects with regard to regression tracking and fixes. Burnout is a problem for many maintainers, and being pressed to fix regressions can make it worse; it is a problem for Leemhuis as well.
He made a request for some basic guidance regarding the expectations in this area. His reports to Torvalds on open regressions often get no reply at all; Torvalds answered that he tends not to answer email unless it is truly necessary.
A lack of support from developers and maintainers is making the regression-tracking task harder. There is a mailing list for discussions on regressions, but almost nobody adds it to the CC list. He understands why; few people want to feel that he is watching them, but including that list would help a lot. The use of Closes tags on patches that fix regressions would also be useful.
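Such a tag ties a fix back to the report that prompted it; the trailer block of a fix commit might end like this (the addresses here are illustrative):

Reported-by: Some Reporter <reporter@example.com>
Closes: https://lore.kernel.org/r/...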
Leemhuis said that he often gets responses along the lines of "you are not my manager" when he reminds maintainers of regressions in their subsystem. It makes him feel bad; "nobody likes cops" and he is not enjoying being one.
With regard to tools, dealing with the kernel bugzilla remains annoying. Additionally, he cannot add bugzilla participants to email CC lists due to GDPR rules; email addresses provided there are personally identifying information, and the site's policy does not allow exposing them in that way. Some subsystems, he said, are using him as a go-between with bugzilla, but he is not paid to do that work.
His regzbot regression-tracking tool works, he said, but he has never been a good programmer and would like some help with it. Still, he has added GitHub support to regzbot, which can now track regressions reported there. In the absence of funding, though, he is unlikely to continue this work.
He mentioned the possibility of maintaining a separate Git tree for regression fixes that could be sent upstream if subsystem maintainers do not send fixes themselves. Torvalds said that he has pulled that sort of fixes branch in the past but did not like it; that is not the right way to bypass maintainers. But perhaps it is necessary, he said, when regressions are not being fixed. Jens Axboe suggested that this sort of bypass should be normal practice for any regressions that are still present when the development cycle reaches -rc6.
Fix or revert
The discussion had mostly focused on reverting changes that cause regressions, but Torvalds pointed out that there is often a known fix for any given bug, so a revert is not always the right approach. Fixes are better than reverts, but he does not want anybody to rely on third parties getting fixes upstream. Will Deacon said that maintainers, in general, strongly prefer to not have regressions in their subsystems and they do not just ignore them. It can take a while to get a fix in, though; fixes need testing and integration work. It can be annoying to be nagged by Leemhuis while this work is ongoing.
Leemhuis responded that he does not normally bother maintainers if he is able to see any sort of activity aimed at a problem. He does get a bit more aggressive if a regression has found its way into the stable trees, though; in that case, the fix cannot be backported until it hits the mainline, so there is an additional reason to hurry. Torvalds said that his normal deadline for fixes is -rc6; if nothing has arrived by then he will consider other actions.
Ted Ts'o said that there can be a few failure points in this whole process. One is when maintainers acknowledge a regression, but hold off on applying a fix. A different sort of failure happens with sporadic maintainers, who may not be paying attention at all at the time. He once fell down on a regression that, unbeknownst to him, had found its way into a released kernel. Since he did not know that real users were affected, he did not prioritize a quick fix; better reporting would have helped in that case, he said.
Ts'o also noted that this is the first he has heard that the regression-tracking work is no longer funded; fixing that should be "a no-brainer", he said. Thomas Gleixner added that the kernel needs to apply some sort of "sustainability tax" so that this work (and more, including addressing technical debt) can be supported. Ts'o said that developers should push their employers to support cleanup work, but Gleixner was pessimistic about that approach; proper funding and dedicated people are what is needed.
Kees Cook asked Leemhuis whether he wants to continue this work if funding can be found; if so, what other changes would he like to see? Leemhuis said that he would like to continue, but that he wants to see some agreed-upon guidelines about regression response and what his role should be. Deacon said that there are two jobs involved: tracking regressions (which everybody wants more of), and chasing developers for fixes, which is less popular. Perhaps, he suggested, the responsibilities should be separated.
Rafael Wysocki said that, when he did regression tracking years ago, he only did the tracking part. Torvalds said that he likes it when somebody else is the bad guy for once. But, he said, adding more people to the task is not going to improve fix times. He suggested working more on automated tracking and nagging; people respond less emotionally to an automated message. But, he said, it would also be good to just impose a policy saying that, after a given number of weeks without a fix, a patch causing a regression will simply be reverted.
Guidelines
With regard to guidelines, Ts'o suggested that, normally, a fix should land in linux-next within a week of a regression report. Jason Gunthorpe said that, sometimes, a regression has wide-ranging impact, causing automated testing to fail. Those regressions need to be fixed quickly; that can involve escalating a situation to Torvalds, which nobody likes. Torvalds, though, said that escalation is the right thing to do; he is happy to aggressively revert changes that break testing. Having a patch reverted that way does not need to carry a stigma; it is simply something that happens at times.
Dan Williams suggested that a fully automatic process for reverting buggy commits would help to reduce the stigma associated with reversion; Torvalds agreed, saying that would make things less personal. Deacon suggested putting the responsibility on maintainers to do those reverts; Torvalds said he would like that, but he feared that each additional person in the chain would add latency before a fix gets to the mainline. Alexei Starovoitov said that, in some subsystems, reverts are just standard procedure; Torvalds worried that such a policy could encourage people to apply non-ready patches, secure in the knowledge that they can be quickly pulled back out if it turns out to be a bad idea. Dave Airlie said that sometimes reverting a patch is not simple; other changes may depend on it. Torvalds answered that the process can never be entirely automatic.
Leemhuis said that the most important task is to establish some real deadlines for regression response; Torvalds suggested writing a documentation patch and getting comments. Leemhuis also asked about whether there is too much immediate backporting of fixes that land in the mainline during the merge window, risking backporting regressions as well. The problem there, Torvalds answered, is that linux-next is not getting enough testing; many of those regressions should never get to the mainline in the first place.
When asked which subsystems handle regressions well, Leemhuis mentioned the tip tree (which handles x86 and many core-kernel patches) and the block subsystem. Gleixner (the "t" in "tip") said that this responsiveness comes at a price; dealing with regressions takes a lot of time. More problematic, Leemhuis said, are often the "sub-subsystems" that go through more than one level of maintainer. The top-level maintainer may understand the situation, but low-level maintainers may be too far removed from Torvalds to share the same priorities. As a result, fixes can sit in linux-next for too long before landing upstream.
Torvalds asserted that linux-next is useful for changes aimed at the next merge window, but that it is less useful for fixes. The needed test coverage, he said, just isn't there, so running fixes through linux-next is just a waste of time. Arnd Bergmann protested that he is indeed running daily build tests on linux-next and letting maintainers know when things break. Many of the problems he reports move into the mainline unfixed, though.
As the session closed, the maintainers in the room affirmed that they find the regression-tracking work useful, and that they would like it to continue.
Considering kernel pass-through interfaces
The kernel normally sits firmly between user space and the system's peripheral devices, and provides a standard interface to those devices. At times, though, a more direct interface to a device is desired — but such interfaces can be controversial. At the 2024 Maintainers Summit, the assembled developers considered a specific case — the proposed fwctl subsystem — as well as the role of such drivers in general.
Fwctl comes out of a longstanding disagreement over a driver initially called mlx5ctl; its purpose is to allow a user-space utility to adjust any of hundreds of tunable parameters within mlx5 devices (which implement InfiniBand, RDMA, and other protocols). Proponents say that this interface is necessary to configure the device properly; opponents say that it is a way to bypass the kernel and that device-independent interfaces should be designed for this task instead. The discussion went on over the course of a year or so, with no resolution in sight.
The session was led by Dan Williams and fwctl developer Jason Gunthorpe. Williams started by saying that he had two objectives in mind. One was to try to heal the community; he has been watching two developers he respects in conflict and the conversation deteriorating over time. Additionally, there were junior developers finding themselves caught in the middle who might just end up leaving the community. But Williams also had a more personal goal: he, too, is working on an API for device provisioning and feels the need to allow direct access to vendor commands to get the job done.
The argument against fwctl, he said, is that this kind of configuration should be done with the network subsystem's devlink interface instead; fwctl is seen as a shortcut to avoid creating a proper interface. Allowing fwctl, it is feared, would reduce the motivation to improve devlink. Proponents of fwctl, instead, say that devlink does not provide the needed functionality, and that insisting on it is forcing these interfaces to be maintained outside of the mainline instead.
Inspired by lockdown
Gunthorpe explained that the kernel lockdown feature disables access to /dev/mem, which is the interface by which these devices had traditionally been controlled. This interface provides direct access to the system's I/O memory, which presents no end of potential security problems. Fwctl is an attempt to keep the configuration software working in a locked-down world. He is well familiar with the devlink interface, but there is no standard behind the configuration of these complex devices; every manufacturer creates its own set of knobs that is hard to bring into a common interface.
Even so, the original plan for mlx5ctl had been to use devlink, but that quickly led to having to argue about 300 different parameters, each of which had to be standardized and approved independently. Additionally, devlink does not provide access to debugging information, while mlx5ctl and fwctl provide access to that sort of data. There is no alternative proposal out there, he said, that provides the needed access.
Devlink, he said, has its origins in the "WiFi debacle", where every driver provided a different configuration interface. It cleaned up that situation and led to the conclusion that providing direct access to device-configuration interfaces was a mistake. That conclusion made sense in the networking context, but there are no standards for these complex, server-oriented devices, so devlink is not a good fit.
Williams said that he would like the group to agree that non-generic device commands exist, and that users need access to those commands. For the CXL devices he works with, the policy has been that no such vendor commands would be enabled, that any needed functionality must be incorporated into the standards instead. "That has happened zero times", he said; making it harder to solve problems does not force vendors to come and talk to us. A different policy is called for here, he said. Gunthorpe added that generic interfaces are the right solution for WiFi configuration, but they are a poor match to key-value stores buried in device firmware.
There are, Williams continued, classes of device commands that the kernel just does not care about. These include device-specific configuration and access to debugging information. He asked the group to agree that, in such cases, the kernel should not stand in the way; the alternative is that vendors just push distributors to ship out-of-tree code instead.
Security boundaries
Dave Airlie worried that such interfaces could facilitate the compromise of the whole system; almost all firmware has that ability somewhere (by writing arbitrary system memory, for example). Since locked-down systems are involved, security is clearly a concern; given the low quality of most firmware implementations, he is worried about providing this kind of access. Will vendors provide assurances that their systems cannot be used to compromise the kernel?
Gunthorpe replied that devlink can be used now to flash new firmware, so all bets are off when using that interface too. Linus Torvalds, more bluntly, said that "lockdown is a joke", a public-relations feature that lets kernel developers make a show of being careful. With regard to fwctl, he said that he does not see what the argument is about; forcing the use of devlink has no benefits for either the kernel or the hardware vendors. He added that, if somebody is running as root, they can do what they want with the hardware; "we are not doing DRM".
Ted Ts'o said "if you don't trust the hardware, you might as well just go home". Arnd Bergmann asked why users wanting to run these configuration utilities don't just turn off lockdown. Gunthorpe answered that customers (especially governments) often require it to be enabled. Fwctl was created partially to support this kind of deployment; it includes a long list of rules on what commands are allowed to do, so that they do not compromise system security.
Damien Le Moal agreed that existing commands provide plenty of ways for somebody to damage their system; if they do so, it is their fault. But, he said, a device driver's job is to configure hardware properly; he wondered why this additional interface was needed. Gunthorpe answered that modern devices are hugely complex and must be configured to work within the environment in which they are used. Vendors can ship special configurations suited to the needs of the largest customers — the Googles and Metas of the world. Smaller customers, though, must customize their devices in the field; that is where this kind of interface is needed.
Kees Cook agreed that, in the end, there is no alternative to trusting the hardware. Existing ways of configuring hardware using /dev/mem are opaque; the fwctl interface is better, he said. Gunthorpe agreed that fwctl is far better for the reverse-engineering of hardware; it also makes it easier for the kernel to block or modify specific commands if they turn out to be problematic. Williams added that vendors can normally be trusted to abide by the system's security boundaries.
Ts'o pointed out that the SG_IO operation (which allows arbitrary commands to be sent to SCSI devices) has been supported by the kernel since the early days; it provides the same capabilities as fwctl. Perhaps, he said, SG_IO is only "grandfathered because CD burners", but it would be good to know what the policy is. Having distributors applying out-of-tree patches seems like a worse outcome than just including fwctl, he said.
Gunthorpe said that he is trying to create a general policy for this kind of interface. Torvalds said that users have to be able to run device-specific commands; SG_IO is a good example of the sort of capability that the kernel has always supported. It is fine for the kernel to apply a root-only policy to commands it does not recognize, but he has no interest in saying that the owner of a machine cannot manage their hardware.
The rule, he said, is that developers should try to prevent each device from implementing its own pass-through command; instead, there should be some sort of baseline (such as fwctl) for this access. Permissible commands should only change the device in question, but touch no other part of the system. There should be "no random DMA" operations. In the end, though, the kernel has to trust the hardware.
Will Deacon asked how fwctl would interact with common interfaces; will it be possible to spot commands that are common between devices and standardize them? Gunthorpe answered that common interfaces provide a better interface for users. Some of that needs to be provided by user-space utilities, though; it is easier to create shared interfaces there.
Cook said that having arbitrary applications accessing device memory via /dev/mem is bad; there is no way for the kernel to impose a policy in that setting. Torvalds answered: "we call that X11". Cook said that he wants to see documentation of the commands supported by a device and what they do. Airlie answered that there will only be a single command, analogous to SG_IO, that passes operations through to the device. Torvalds said that this interface is still better than using /dev/mem.
Cross-subsystem disagreement
Williams raised another aspect of this debate — that of cross-subsystem nacks. The fwctl patches have been blocked by the networking subsystem maintainer, even though that work is not a part of his domain. How far, Williams asked, can a maintainer's veto extend beyond their own subsystem? Airlie answered that this was Torvalds's problem. Gunthorpe said that there is a precedent for how to respond to this sort of block: the RDMA subsystem was started as a response to the blocking of support for TCP offload engines in the networking subsystem. That block was the right decision for networking, but there is still a place for TCP offloading in the kernel. RDMA is widely used and supported by open-source software now; it is, he said, a great success story. Fwctl could be a similar story.
There seemed to be a clear consensus in the room that work on fwctl should proceed and find its way into the mainline; Gunthorpe was asked about how he planned to do that. He answered that he likes to see three independent users before merging a new subsystem; that helps to show that the interfaces are correct. A proposed third fwctl user had shown up in his inbox that morning, adding to the existing RDMA and CXL users. For now, he plans to focus on getting the drivers into good shape, and expects to send out a pull request in roughly six months. An implementation will ship to Mellanox users sooner, though.
Six months may seem like a long time; Gunthorpe said that he has been taking it slowly and carefully because of the pushback he has been receiving. He respects the people who have opposed this work and wants to show that respect by doing the job properly. Even so, he expects the nack from the networking subsystem to persist; it is "the right position" for that subsystem to have, he said, but he would like to have some peace and get this work done. The session closed with him saying that he would have further discussions with the developers involved.
Tools for kernel developers
Konstantin Ryabitsev started a session on development tooling at the 2024 Maintainers Summit by saying that he does not want to be a "wrecking ball". If a given workflow is working for people, he does not want to try to force any sort of change. That said, he has ideas for how he can continue his work on providing better tooling for the development community.
The use of the b4 tool is increasing, he said, and the new features for patch preparation and submission have been well received. He is working on a b4 review command that will ease the manual process of sending Acked-by and Reviewed-by tags for patches written by others. Jason Gunthorpe asked whether it will be possible to send free-form remarks along with the tags; the answer was "yes".
Another area of work is the "bugspray" bot that is intended to integrate the kernel bugzilla instance with the project's mailing lists and Git repositories. Use of bugspray within a given subsystem will require maintainer buy-in; those who do not like bugzilla are free to ignore it. Since bugspray can look at a report and try to figure out which subsystems are involved, Ryabitsev is thinking about removing all of the various subsystem components from the bugzilla server. Many of those components have not been used in years, but it is hard to tell which ones are active, and users can have difficulties choosing the correct one. Instead, users would just file bugs for the kernel and the bot would figure it out.
One thing that is needed, he said, is volunteers to be the first point of contact for bug reports. Thorsten Leemhuis said that he would be willing to do that if he could find funding for the work.
Ted Ts'o said that he needs the ability to move bugs between subsystems; often, as developers dig into why something is going wrong, they conclude that the bug is not where it first seemed to be. So, to be useful, bugspray needs to make it possible to reassign a bug.
Leemhuis pointed out a problem with the existing bugzilla instance. Email addresses, which must be provided to file a bug, are considered to be private information. Using a bug submitter's address to include them in email conversations is thus a GDPR violation, and evidently somebody has complained. Linus Torvalds said that there needs to be a public notice on the bugzilla site that email addresses are public; a similar thing was done with the kernel's developer certificate of origin years ago.
With regard to the lore email archive, Ryabitsev is working on providing pre-filtered email inboxes for specific subsystems. The Linux Foundation continues to fund work on the public-inbox archive system that underlies lore. There is experimental work on an automated "what's new" summary generator that could provide an overview of what is happening in a given subsystem. This feature uses a large language model to generate the summaries, and is "hit and miss" for now, he said.
There are early trials underway to provide development-forge services using Forgejo; nothing is publicly available yet.
What Ryabitsev wanted more than anything else was to find out if the maintainers in the room support the work that he is doing. Along with everything described above, that work includes maintaining the lore email archives, writing workflow documentation, helping new maintainers get started with the tools and processes, deploying new services, and maintaining the kernel keyring. The answer to that question was clear: the development community truly appreciates this work, and would like to see a lot more of it.
As things wound down, Leemhuis said that he would like to get his regzbot regression-tracking tool on the list of supported systems. It can do things that bugzilla cannot, he said, including monitoring outside trackers for information on regressions. There was some unfocused talk on the maintenance of the patchwork system and whether it still needs to exist. Ts'o asked for an authentication mechanism that would allow automated systems (such as continuous-integration testers) to get past kernel.org's increasingly fortified bot defenses.
At the close of the session, Ryabitsev said that he was happy to continue working to support the community, but that he could use more help. Keeping the email archives going, in particular, is a surprisingly labor-intensive task that makes it hard to get anything else done.
Committing to Rust in the kernel
The project to enable the writing of kernel code in Rust has been underway for several years, and each kernel release includes more Rust code. Even so, some developers have expressed frustration at the time it takes to get new functionality merged, and an air of uncertainty still hangs over the project. At the 2024 Maintainers Summit, Miguel Ojeda led a discussion on the status of Rust in the kernel and whether the time had come to stop considering it an experimental project. Not all of the questions were answered, but it seems clear that Rust in the kernel will continue steaming ahead.
Ojeda started with the topic of the flexibility needed from the kernel's subsystem maintainers. Two years ago, before the initial Rust support was pulled into the kernel, he had requested that flexibility because there would be the need to change some core APIs at times to fit Rust code in. The need for that flexibility is being felt now, he said.
There are some clear differences in the expectations around Rust in the kernel, he continued. He has read through thousands of comments and emails on recent events, and has seen a wide range of opinions on the state of the Rust-for-Linux project and where it is headed. It would be a good thing to converge on a common understanding of what the goals are. People and companies want to invest in Rust for the kernel, but they are unsure about its future.
Jason Gunthorpe said that he, like many other kernel developers, has not participated in this work so far. The project was intended to demonstrate that Rust is suitable for kernel usage; he is waiting for the decision on the outcome. Dave Airlie said that the experiment is not complete, but Greg Kroah-Hartman said that it is clear at this point that Rust in the kernel is viable. Part of the reason for the apparent slow pace of the work, he said, is that the Rust developers are concentrating on device drivers; since drivers must interface with many other kernel subsystems, there is a lot of support code that must be merged. That takes time.
Gunthorpe said that he would like to see a clear message that Rust is a success before jumping into it; he also is unable to work with Rust until a suitable compiler is available in RHEL. Airlie said that perhaps, for Gunthorpe, the time had not yet come.
Tooling and help
Arnd Bergmann said that there was no doubt that drivers written in Rust would be better than those in C, but he wondered how long it would take to merge support in all the necessary subsystems, and when the tooling would be widely available. When, he asked, will he be able to build kernel code with a Rust compiler shipped by his distribution? Ojeda answered that multiple compiler versions are now supported by the kernel code, and that suitable compilers are available from many community-oriented distributions. Airlie said that it is too soon to ask the Rust community for a completely stable compiler to build kernel code with; there just is not yet enough Rust code in the kernel to make that happen.
Linus Torvalds admonished the group that he did not want to talk about every subsystem supporting Rust at this time; getting support into some of them is sufficient for now. When Airlie asked what would happen when some subsystem blocks progress, Torvalds answered "that's my job". Christian Brauner said that the binder driver is motivating much of the subsystem work now, including the somewhat contentious filesystem abstractions. That code is being reviewed now. Airlie added that the first real driver to be merged will be a sort of inflection point, after which the pace will pick up; the next challenge after that will be the creation of Rust infrastructure that is callable from C.
Will Deacon asked Ojeda about the support that the Rust community was offering to kernel developers; Ojeda answered that he has been building a team of experts to help where needed. Some of these people are core Rust developers who know the language thoroughly; they can help to review patches even if they lack deep kernel experience.
Torvalds pointed out that there are kernel features that are currently incompatible with Rust; that is impeding Rust support overall. He mentioned modversions in particular; that problem is being worked on. The list of blocking features is getting shorter, he said, but it still includes kernel features that people need.
Managing expectations
Dan Williams pointed out that he once spent two years just getting a new mmap() flag merged. It is necessary to manage expectations on the Rust side, he said; merging all of that code will be a slow process. Ojeda acknowledged this point, but said that the companies funding the Rust work are not seeing it going upstream; that is making them reluctant to continue that funding going forward.
Brauner said that nobody has ever declared that the filesystem abstractions would not be merged; the discussion is all about the details of how that will happen.
Ted Ts'o said that the Rust developers have been trying to avoid scaring kernel maintainers, and have been saying that "all you need is to learn a little Rust". But a little Rust is not enough to understand filesystem abstractions, which have to deal with that subsystem's complex locking rules. There is a need for documentation and tutorials on how to write filesystem code in idiomatic Rust. He said that he has a lot to learn; he is willing to do that, but needs help on what to learn. (See this article for a discussion of how the Rust-for-Linux developers are working to meet this need).
Torvalds said that it is not necessary to understand Rust to let it into a subsystem; after all, he said, nobody understands the memory-management subsystem, but everybody is able to work with it. I pointed out that the Rust developers are not just creating subsystem bindings; they are trying to create inherently safe interfaces, and that often requires changes on the C side. That increases the impact on subsystems, but also makes the C code better. Airlie added that the Rust developers have to bring maintainers along with them, or the maintainers will not understand what is happening.
Deacon raised the question of refactoring on the C side. Changing C interfaces will often have implications for the Rust code and may break it; somebody will then have to fix the problems. Torvalds said that, for now, breaking the Rust code is permissible, but that will change at some point in the future. Kroah-Hartman said that the Rust developers can take responsibility for the maintenance of the abstractions they add.
Steam right ahead
Torvalds said that nothing depends on Rust in the kernel now, and nothing will for some time yet. What is important is to make forward progress, so developers should "steam right ahead" and not worry about these problems for now. It is enough to get things working, even though the details are not right. Once users are depending on Rust code, it will be necessary to worry more, he said, but kernel developers should not fail by being too careful now.
Thomas Gleixner said that the Rust developers are careful about documenting their code, and he is not frightened by the prospect of refactoring it. If he does not understand something, he will simply send an email to the developer, just as he does with C code. Torvalds added that Rust has a lot to offer, and the kernel should try to take advantage of it. Kroah-Hartman said that it could eliminate entire classes of bugs in the kernel.
Deacon asked how many developers are working on the Rust side now; Ojeda answered that there are currently six or seven people, most of whom are "real Rust experts". The strongest kernel expertise in the group had belonged to Wedson Almeida Filho, who recently left the project. That was a real loss, but Ojeda is working to recruit others.
Gleixner said that, 20 years ago, there had been a lot of fear and concern surrounding the realtime kernel work; he is seeing the same thing now with regard to Rust. We cannot let that fear drive things, he said. Torvalds said that Rust has been partially integrated for two years. That is nothing, he said; the project to build the kernel with Clang took a decade, and that was the same old language.
Julia Lawall asked what happens when things change on the C side; how much will leak through into the Rust code? Bergmann said that reviewing Rust abstractions for a C subsystem without knowing Rust is not difficult; he can reach a point where he understands the code, but would not feel able to change it.
Torvalds said that the community can play around with Rust for a few years. Gunthorpe, though, said that it would be good to get something into production; that would give the project some needed momentum. The binder driver might be a good choice. Ojeda said that would help to justify more support from companies. As the session closed, though, the primary outcome may well have been expressed by Torvalds, who suggested telling people that getting kernel Rust up to production levels will happen, but it will take years.
The 6.12 merge window begins
As of this writing, 6,778 non-merge changesets have been pulled into the mainline kernel for the 6.12 release — over half of the work that had been staged in linux-next prior to the opening of the merge window. There has been a lot of refactoring and cleanup work this time around, but also some significant changes. Read on for a summary of the first half of the 6.12 merge window.
The most significant changes pulled to date include:
Architecture-specific
- The Arm "permission overlay extension" feature is now supported, making memory protection keys available on that architecture.
- There are now separate configuration options for each x86 Spectre mitigation technique, allowing kernels to be customized to specific processor models.
- The LoongArch, 64-bit Arm, PowerPC, and s390 architectures have all gained support for the vDSO implementation of the getrandom() system call.
Core kernel
- Io_uring operations can now have absolute timeouts, along with the relative timeouts that were already supported.
- The remaining pieces of the deadline server mechanism have been merged. Deadline servers replace realtime throttling with a special server task running under the deadline scheduler; it ensures that normal-priority tasks get a small chance to run even if a realtime task is monopolizing the CPUs.
- Also completed in this cycle was the EEVDF scheduler, which replaces the completely fair scheduler and, with luck, provides better response times.
- Some of the preliminary work needed for the merging of the extensible scheduling class (sched_ext) has landed. The pull request for sched_ext itself has also been sent, but has not been acted upon as of this writing; it seems likely to be pulled before the merge window closes.
- A simple series allowing realtime preemption to be configured in mainline kernels has been merged. This change marks a milestone in a 20-year development effort to bring realtime response to a general-purpose kernel.
Filesystems and block I/O
- There is a new fcntl() operation (F_CREATED_QUERY) that allows an application to determine whether a file opened with O_CREAT was actually created (rather than already existing).
- The name_to_handle_at() system call has gained the ability to provide unique, 64-bit mount IDs, eliminating a racy workaround needed until now; see this commit for some more information.
- The size of struct file within the kernel has been reduced from 232 bytes to 184; that will provide significant memory savings on systems running file-heavy workloads. See this commit for a description of how that reduction was accomplished.
- It is no longer possible to mount a filesystem on top of any of the ephemeral files in /proc — the files under /proc/PID/fd, for example. Allowing such mounts makes little sense and can be a security problem, so it was removed as a bug; see this commit for more information.
- The namespace filesystem (nsfs) has gained the ability to provide more information about mount namespaces; see this commit for details.
- The EROFS filesystem can now mount filesystems directly from images stored in files; see this commit for more details.
- The XFS filesystem has gained two ioctl() commands that will exchange the contents of two files. XFS_IOC_START_COMMIT sets up the exchange, while XFS_IOC_COMMIT_RANGE actually effects the exchange, but only if the second file has not changed in the meantime. This commit contains a man page for these operations.
Hardware support
- GPIO and pin control: Analog Devices ADP5585 GPIO controllers.
- Input: Goodix GT7986U SPI HID touchscreens.
- Miscellaneous: Rockchip true random number generators, Arm NI-700 performance-monitoring units, Mobileye EyeQ reset controllers, Nuvoton MA35D1 SDHCI controllers, Analog Devices ADP5585 pulse-width modulators, and Microsoft Surface thermal sensors.
- Networking: AMCC QT2025 PHYs (implemented in Rust), Rockchip CAN-FD controllers, Realtek Automotive Switch 9054/9068/9072/9075/9068/9071 PCIe Interfaces, OPEN Alliance TC6 10BASE-T1x MAC-PHYs, and Microchip LAN8650/1 Rev.B0/B1 MACPHY Ethernet chips.
- Sound: MediaTek MT6357 codecs.
Networking
- The device memory TCP patch set has been merged. It provides an optimized data-transfer path for applications that are transferring data between the network and a peripheral device without the need to go through the CPU.
Security-related
- The FOLL_FORCE removal patch has been merged. This internal kernel flag had been used by /proc/PID/mem, making it an attractive target for attackers. Its removal can break some systems, so it is not enabled by default; the proc_mem.force_override= command-line parameter can be used to change that. See this commit for a bit more information.
- The security-module subsystem now uses static calls for almost all callbacks, improving both performance and security.
- The Integrity Policy Enforcement security module has been added. According to the merge message: "the basic motivation behind IPE is to provide a mechanism such that administrators can restrict execution to only those binaries which come from integrity protected storage". See this documentation commit for more information.
Virtualization and containers
- 64-bit Arm kernels can now run as a guest on protected KVM systems.
Internal kernel changes
- msleep() has long added an extra jiffy (scheduler clock tick) to the requested sleep time to ensure that the caller did not wake too soon. That padding has not been necessary to meet that requirement for some time, so it has been removed in 6.12.
- The final set of printk() improvements has been merged. This was the last significant piece of the realtime preemption patch set that remained out of tree.
- The interface to kmem_cache_create() has changed significantly. It now accepts a pointer to a kmem_cache_args structure describing how the cache should be created; at this time, that structure can be best seen in this commit. Thanks to some macro magic, calls to the older API still work as expected, but those calls can be expected to be migrated to the new API over time.
The 6.12 merge window will likely remain open through September 29. Once it closes, LWN will be back with a summary of the rest of the changes merged for the next major kernel release.
RPM 4.20 is coming
The RPM Package Manager (RPM) project is nearing the release of RPM 4.20, the last major planned update for the RPM 4.x series. It has few user-facing changes, but several additions and enhancements for developers—as well as some small incompatibilities that will likely require RPM packagers to revise their spec files. 4.20 will be rolling out to many users soon, in Fedora 41, which is scheduled for October. RPM 6.0 is already in the works, with a new package format, and it opens the door to the use of C++ in the RPM codebase.
An RPM release consists of the command-line suite of tools for installing, managing, removing, and creating RPM packages. It also includes RPM plugins and librpm, which provides the RPM API to user-facing tools like DNF, Zypper, and others. In addition, the project maintains the RPM package format and spec file format documentation, which is updated with each RPM release. RPM spec files are text files, with the .spec extension, that describe how to build an RPM. They are included with a package's source RPM (.src.rpm), along with the original source code and patches to be applied to the software.
Many Linux users interact directly with RPM rarely, if at all. Desktop users can do all their software management on RPM-based systems, like Fedora and openSUSE, with tools that use RPM behind the scenes, such as DNF, Zypper, or GNOME Software. Packagers, on the other hand, spend quite a bit of time with RPM, and the bulk of the features in 4.20 are designed to make it easier to build and maintain RPMs.
Append and prepend
RPM spec files define package-build steps in scriptlet sections such as %prep (get sources ready), %setup (create build directories), %patch (apply patches), and so forth. With 4.20, RPM has added the ability to append (-a) and prepend (-p) additional commands to these sections. This will allow packagers to insert instructions before or after each build step. For example, one might add an %install -a section to remove an unwanted file after the install step runs. The append and prepend directives are useful to RPM in any case, but are particularly useful in conjunction with one of RPM 4.20's more interesting new features: declarative builds.
Declarative builds
This feature is meant to allow upstream projects and distributions to provide build-system macros for common build processes, such as creating Python packages, Ruby gems, or Rust crates. Packagers can use this feature to declare a desired build system for a package with a single stanza, rather than each packager having to describe the build process on their own. The example given in the request for enhancement (RFE) is building projects that use GNU Autotools. When compiling from source, that would usually involve the familiar steps:
$ ./configure
$ make
# make install
In an RPM spec file, that would usually look something like this:
%prep
%autosetup

%build
%configure
%make_build

%install
%make_install
With this release, that can be condensed to a single line:
BuildSystem: autotools
Declarative builds should reduce the amount of redundant boilerplate that developers have to add to package spec files—but it does not preclude tweaking things if necessary. Declarative builds can be modified using the BuildOption tag or by prepending and appending %install -a, %build -p, etc., sections. See the documentation and example macros for more on declarative builds.
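For example (an illustrative sketch, not taken from the RPM documentation), a package could declare the autotools build system and still append an extra command to the install step:

BuildSystem: autotools

# Appended to the end of the generated %install section:
%install -a
rm -f %{buildroot}%{_libdir}/*.la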
Public plugin API
RPM plugins provide support for features that are not suitable for all platforms. For example, Fedora users may want the SELinux plugin, but it is not suitable for systems with AppArmor, such as openSUSE. RPM has also been ported to non-Linux systems, and many of RPM's plugins are Linux-specific.
Work began on RPM's plugin system in 2012, and RPM shipped a plugin system in the 4.12 release in 2014, but the API has been considered subject to change and kept internal-only ever since. Not that it has changed: the RPM plugin interface is basically unmodified since the 4.12.0 release. In 2023, RPM developer Panu Matilainen said that the API should have been public to begin with and committed the change to make it so. "We've procrastinated on making this API public for about ten years now, and in the meanwhile there has been exactly one disruptive change to the API."
Making the API public will make it possible for others to develop RPM plugins with confidence that subsequent RPM releases won't break the plugins. There is still work to be done for RPM to verify that plugins are compatible with the version of RPM being used, but that does not look likely to ship in 4.20.
Isolation
When RPMs are installed or removed, they can run scriptlets that make updates to the system. For example, a package may need to restart a service or make updates to the system's GNOME settings. Running these scriptlets is necessary, but Johannes Segitz noted that they can pose a security challenge, since packagers may "naively" place code in /tmp that could be exploited for privilege escalation; he submitted a pull request with a proof-of-concept to provide a private /tmp directory for scriptlets.
Matilainen broke that out as a plugin and added Linux-specific functionality to implement optional filesystem and network isolation using namespaces. Specifically, the unshare plugin shipped with RPM 4.20 can mount paths privately during scriptlet execution, such as /tmp and /home, so that scriptlets do not have access to the system /tmp or user home directories. The directories to be isolated are specified in /usr/lib/rpm/macros.d/macros.transaction_unshare as a colon-separated list of directories.
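An entry along these lines (the macro name here is illustrative; see the file shipped with 4.20 for the exact spelling) would give scriptlets private versions of both directories:

# In /usr/lib/rpm/macros.d/macros.transaction_unshare:
%__transaction_unshare_paths /tmp:/home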
The plugin also allows disabling network access, so that the scriptlet cannot upload or download anything during execution.
Goodbye %patchN
Packagers often add patches to the original upstream source code of a project. RPM has a macro for this, %patch, which has accepted several variations for some time: "%patch N", "%patch -P N", or "%patchN" allow packagers to specify one or more patches to be applied at build time. However, in RPM 4.19, the %patchN syntax was deprecated in favor of %patch N or %patch -P N. Matilainen said on the fedora-devel list that the reason for its deprecation was "it's a mind-bogglingly bad syntax for what it does, and prevents making %prep like any other section in the spec". He did not elaborate on how it prevented this, except to say "don't ask, you don't want to know".
The RPM developers may be happy to shed the %patchN syntax, but it is widely used. Dominique Leuenberger posted a message to the opensuse-factory mailing list in February about work needed to enable 4.20, saying that more than 2,066 packages used that syntax. According to the Fedora 41 change proposal for RPM 4.20, more than 1,000 spec files in Fedora's repository were using that syntax as of April. No doubt, many packagers still have work to do to update their package specs.
Miscellaneous
There are, of course, plenty of smaller changes that users and developers may (or may not) notice. For instance, the rpmkeys command, which is used to manage and provide information about RPM signing keys, now has options to list installed keys (--list) and to delete keys (--delete). Previously rpmkeys could only import keys and verify package signatures.
RPM has also added JSON as a query format. It joins the XML query format, which Matilainen noted is "an eyesore". For example, developers who want machine-readable access to all of the information about an RPM can use the following to get full output in JSON format:
$ rpm -q --json <packagename>
Packagers who would like to enable reproducible builds will be interested in the new %build_mtime_policy macro. This allows the packager to set a timestamp policy for the package, to help ensure that file timestamps remain the same across builds. See the documentation for more information.
Beyond 4.20
The RPM project is now in its 29th year, with the first commit (to a CVS repository) made on November 27, 1995, by either Marc Ewing or Erik Troan. The RPM timeline notes that the first commit was added as root, thus the identity of the original author is lost to history. So it goes. The project has pushed out substantial feature releases to the 4.x series about once a year since the project rebooted in 2006. According to the RPM roadmap, RPM 6.0 will be released sometime in the third quarter of 2025, which will coincide with the 30th anniversary of RPM. Those wondering about the jump from RPM 4.20 to 6.0 may not have lived through (or have forgotten) the fork that prompted the RPM.org reboot, relaunch of the fork, the fork's RPM 5.0.0 release, and subsequent drama in 2011.
The fork is showing few signs of life at present. The project site is still up but its news page has no updates after the 2009 release of 5.2, though there appears to have been a 5.4.17 release in 2016. The distributions that had switched to the fork, such as OpenMandriva Lx and OpenEmbedded, have switched back to the RPM.org version.
At any rate, it seems likely the RPM maintainers thought the best course was to avoid any confusion with the fork's 5.x releases and will skip straight to 6.0. The version jump will introduce the RPM v6 package format, which is described as a face-lift for the format rather than a full redesign. The goals are to shed some compatibility baggage, drop obsolete crypto algorithms (MD5, SHA1, and DSA1), and use 64-bit sizes in all headers.
RPM v6 will stop using cpio as the archive format for its file archive (payload) and start using the new format, which supports files larger than 4GB. If 4GB sounds excessive for an RPM, it's worth noting that the Chromium source RPM for Fedora 40 weighs in at a hefty 3.8GB. The Firefox source RPM is a comparatively svelte 826MB. If there are no packages that exceed the 4GB limit today, it seems likely that there will be before long.
To prepare for the new format, RPM 4.20 ships with a new utility, rpm2archive, which replaces the now-outdated rpm2cpio utility. Historically, rpm2cpio has been used to convert RPMs to cpio files, which can be unpacked with cpio. The rpm2archive utility converts RPMs to a gzip-compressed tar archive that can be manipulated with tar instead.
The move to the v6 format should not be disruptive to distributions still on RPM 4.20 when 6.0 is released. In the first draft Matilainen sent to the rpm-maint mailing list, he said that the last RPM 4.x version would be able to read and install v6 packages and that RPM v6 would be able to read and install v4 packages. This should mean that RPM 4.20 will be able to work with v6 packages, since no further major 4.x releases are on the RPM roadmap.
6.0 and C++
In March, Matilainen started a discussion on GitHub about RPM in C++. He said that the project had been "dreaming about richer data structures than C has to offer" since the RPM reboot, but his early experiments with C++ in 2010 left him with a "resounding 'ugh no' conclusion". Recently, he took another look at C++ and "woke up to a language that seems almost like a distant cousin to the C++ I cursed at in 2010 (and before)".
With RPM 6.0 on the horizon, it was a good opportunity to enable the use of C++ in the RPM codebase. This is not a rewrite, he said, just an implementation detail. After 4.20 branched, Matilainen merged a commit that made it possible to build RPM with a C++ compiler and start switching to C++. Currently RPM's Python bindings, its plugins, and the low-level engine for RPM's ndb database are not planned for conversion.
In the initial discussion, Neal H. Walfield wondered whether it would make sense to wait a few more years and port RPM to Rust instead of C++. Matilainen said that Rust is not an option, but if someone wants to rewrite RPM in Rust from scratch "in another 15 years when I'm retired", they are welcome to do so.
The final RPM 4.20 release is expected before the end of September. The most recent release as of this writing is 4.19.94 (aka 4.20 RC2), released on September 10. That release is already available in Fedora 41 Beta, which was released on September 17. It will also likely turn up in the openSUSE Tumbleweed rolling-release distribution early in 2025 (RPM 4.19 entered Tumbleweed in February of this year), though it's unclear whether it will make it into openSUSE Leap 16.
There are quite a few other changes arriving in RPM 4.20; see the draft release notes for a comprehensive list of changes and bug fixes.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: pmcd review; Vanilla OS; GNOME 47; HarfBuzz 10.0; Hy 1.0; OpenSSH 9.9; Quotes; ...
- Announcements: Newsletters, conferences, security updates, patches, and more.