Best practices for error handling in kernel Rust

By Daroc Alden
September 19, 2024

Kangrejos 2024

Dirk Behme led a session discussing the use of Rust's question-mark operator in the kernel at Kangrejos 2024. He was particularly concerned with the concept of "silent" errors that don't print any messages to the console. Other attendees were less convinced that this was a problem, but his presentation sparked a lot of discussion about whether the Rust-for-Linux project could improve error handling in kernel Rust code.

He opened the session by giving a simplified example, based on some actual debugging he had done. In short, he had a Rust function that returns an error. In Rust errors are just normal values that are returned to the caller of a function like any other value. The caller can either do something explicit with the error — or return early and pass the error on to its caller using the question-mark operator. Code like foo()? calls foo() and then, if it returns an Err value, immediately returns the error to the calling function.

This chain has to end at some point. In user space, the ultimate endpoint is the return value from the program's main function. If main() returns an error, the default behavior is to print it to stderr and exit with a nonzero exit code:

    $ ./failing-rust-program
    Error: ParseIntError { kind: InvalidDigit }

In the kernel, there's no such facility. Therefore an unhandled error might not result in anything being printed to the kernel log, which is not ideal. Behme looked at a few alternatives. One suggested alternative was that the coding standards for kernel Rust could encourage people to use .expect() instead of the question-mark operator, which does show a message in the log. It also currently crashes the kernel, however, which is never the right solution.

    let value = fallible_operation()
       .expect("the operation not to fail in this case. If it does, BUG_ON()");

An alternative solution, he mentioned, could be adapted from ScopeGuard. He asked Andreas Hindborg to explain in more detail. Unlike the question-mark operator or .expect(), ScopeGuard is not built into the Rust language, Hindborg explained. Rather, it's part of the kernel crate. A ScopeGuard is a wrapper around a function pointer that calls the function whenever the ScopeGuard stops being referenced. It's useful for handling cleanup when a function has more complicated cleanup than just freeing memory or unlocking locks (which can be handled by Rust's Drop mechanism). The user can also "disarm" the ScopeGuard to prevent it from firing on code paths that handle cleanup themselves (or don't need to do it for some reason):

    // Create an anonymous function to perform cleanup
    let guard = ScopeGuard::new(|| {
      ... // Cleanup code
    });

    ...

    if some_condition() {
        return; // guard goes out of scope, runs cleanup code
    }

    ...

    // When the function exits normally, the guard can be disarmed
    guard.disarm();
    return;

Behme suggested that they might want to adapt this to create an error type that prints a log message to the console when an error of that type is dropped. So if a programmer calls a fallible function, they can either handle the error themselves (preventing it from being dropped), pass it on to their caller (likewise preventing it from being dropped), or forget to handle it — causing the value to be dropped and print an error to the console. He thought a downside of this approach is that he wasn't certain how to make it work with line information — after all, the place where the error is dropped is likely to be far away from where it was created, so the error type would need to capture and store the original source information.

For guidance, Behme suggested looking at what C does. In C, not all errors print log messages, but not all errors are silent either. It can be decided on a case-by-case basis. Kernel Rust code could potentially do the same, but then contributors lose some of the convenience of having a generic solution. He asked what the attendees would prefer.

Miguel Ojeda asked when they actually want errors to be printed — does it make sense to have a special debugging mode that prints more errors? Or to select individual functions for tracing? In either case, it would be better to enable that without requiring source-level changes for tracing.

Behme was skeptical about the idea of having a special debugging mode; he felt that developers often don't actually run with modes like that turned on, and that customers with a production image don't either. So adding a special mode would not really help.

Benno Lossin asked what information Behme wanted for debugging — did he want just the file and line number, or would he like to show a whole backtrace? Behme answered that obviously the best approach would be "an AI that tells me where the bug is". Failing that, more information is better — and just one line would be better than being completely silent. "I'll take what I can get".

Lossin mentioned that in one of his user-space projects he embeds a backtrace in his error type by using one of the internal details of the question-mark operator: when a function returns one type of error, but uses the question-mark operator on another, the operator tries to convert between the types using the From trait. If no such conversion is possible, it's a compile-time error.

So in Lossin's project, he has one error type for all of his code, and when some external library returns an error, the From implementation captures a backtrace and stores it. Behme thought that sounded potentially useful, and that having full backtraces available would be helpful.

Alice Ryhl wasn't sure that they should want errors in drivers to print to the console, however. She pointed out that some "error" conditions are actually fairly normal, such as temporary allocation failures. Logging about these whenever they happen could spam the kernel log. Behme responded that Ryhl's driver has about 200 places that use the question-mark operator — did she really want all of those to be silent?

Ryhl explained that some of them are not silent — she has explicit logging code where needed. But also, she said, the utility of the question-mark operator is that it makes it easy to bubble-up errors to higher levels that can handle them with more context. So not every invocation of the question-mark operator really needs to log.

Paul McKenney agreed, saying that it was possible to have too much output, especially in embedded systems. He noted that kernel developers have spent 20 years adding and removing debug info from various places in the kernel in order to try and get the balance correct.

Behme noted that he would even be happy with the conclusion that Rust should just handle each case manually, like C, as long as they started forbidding the question-mark operator so that people actually have to think about it.

Boqun Feng thought that dynamic BPF tracepoints could maybe provide some help with debugging things like this, therefore there was no need to log until the user attaches a function for debugging.

Greg Kroah-Hartman disagreed, saying that whatever logging they decided on was something that would run all of the time. It's often not possible to tell the customer to rerun something with debugging turned on. He had some thoughts of his own on the question, however: traditionally in the kernel, it's the function that creates the error that's responsible for logging it, he said. If there's a memory error, you don't print it at every level bubbling up.

Behme suggested that would mean the question-mark operator should only be used after checking that the function that creates the error does log it, then.

Carlos Bilbao was confused about how Behme was seeing errors not being reported, anyway — in Rust, the Result type, which represents either an error or a normal return value, has the #[must_use] attribute. So the compiler will emit a warning if the code does not do something with an error.

Lossin thought that the C approach might not make sense for Rust, because idiomatic Rust often has many small helper functions, and it could make sense to centralize error reporting more than that. He went on to suggest adding an extension to Result that can attach log messages to errors — like the popular user-space anyhow crate does:

    some_fallible_operation().log("if it fails, this is printed")?;
    // and the error is still bubbled up with ?

Gary Guo pointed out that you actually want more information than just "there was an error", so you can't really get away from the need for custom error-printing code. Hindborg noted that different use cases warrant different amounts of logging.

The eventual conclusion was that the question-mark operator was not the problem per se, but rather a lack of standardized error handling and logging in kernel Rust. Additional libraries or simple documentation that helps address the issue could be useful. The balance between performance, ease of debugging, and error handling remains one that requires human judgement.

[ Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our coverage of Kangrejos. ]

to post comments

Exceptions?

Posted Sep 19, 2024 18:03 UTC (Thu) by zorro (subscriber, #45643) [Link] (25 responses)

Isn't there a way to emulate try/catch in Rust? It baffles me a bit that Rust didn't choose exceptions (the C# kind) for error handling.

Printed in large friendly letters on its cover

Posted Sep 19, 2024 18:19 UTC (Thu) by tux3 (subscriber, #101245) [Link] (2 responses)

A few people have tried. One kind is syntactic sugar around the regular Error type. There have been proposals for try {} blocks and a throw (a.k.a "yeet") keyword.

A more direct equivalent of exceptions in Rust is panics, those unwind the stack and look for landing pads similar to many other languages, but it is very much not discouraged and un-ergonomic to use Rust panics in this way.

And then there is pretty much the reverse of what you want — it fills me with so many mixed emotions that I can't help but mention it — someone who emulated classic Rust errors with exceptions (panics), that's the iex crate: https://docs.rs/iex/latest/iex/ (see also the wonderful blog post: https://purplesyringa.moe/blog/you-might-want-to-use-pani...)

Printed in large friendly letters on its cover

Posted Sep 21, 2024 14:20 UTC (Sat) by zorro (subscriber, #45643) [Link]

Thanks for the links, interesting reading for sure

Printed in large friendly letters on its cover

Posted Sep 21, 2024 14:32 UTC (Sat) by adobriyan (subscriber, #30858) [Link]

> someone who emulated classic Rust errors with exceptions (panics), that's the iex crate

It is never late to return to the dark side with "throw std::expected<T, U>;".

Exceptions?

Posted Sep 19, 2024 18:36 UTC (Thu) by shahms (subscriber, #8877) [Link]

It does not surprise me at all. Invisible control flow is hard for humans and compilers to reason about. As you can see from the above discussion, there are even people for whom Rust's `?` is insufficient.

Exceptions?

Posted Sep 19, 2024 19:09 UTC (Thu) by khim (subscriber, #9252) [Link]

Exception handling is, actually, somewhat weak in Rust as Graydon quite explicitly have told us… the only issue with your proposal is simply the fact that C# exceptions (how they are different from C++ exceptions, BTW?) are most definitely not better.

If you want safety and robustness then the last thing that you need are hidden return paths that are entirely invisible, when you look on your code.

Exceptions?

Posted Sep 19, 2024 20:35 UTC (Thu) by sunshowers (guest, #170655) [Link] (2 responses)

Error handling is somewhere between 30% and 90% of high-quality software, and exceptions are too magical to permit a long-term state where errors are handled perfectly. Even ? is kind of a crude approximation of good error handling. Many serious programs (e.g. compilers) want to collect errors from various subroutines rather than bailing on the first one.

Errors as part of a returned sum type

Posted Sep 20, 2024 9:06 UTC (Fri) by farnz (subscriber, #17727) [Link] (1 responses)

One benefit of errors being part of a returned sum type, as in Rust, is that you can call functions of type Result<T, Error> and yourself return something like Result<T, Vec<Error>>, where I'm assuming that Error contains all the information you need to produce a good error report to the user.

Add in the fact that Rust has support for turning an iterator over items of type Result<T, E> into either of Result<Collection<T>, E> (collect all good results, fail on first error) or Collection<Result<T, E>> (collect all results, good and bad), and you get a powerful way to start out with simpler error handling (fail on first error collected, ignore the rest), and refactor later to consider all the errors once you've got the happy path working.

And fundamentally, I think the biggest strength Rust's approach to error handling has is that you can use ? and something like eyre or anyhow to implement "careless" error handling, then later refactor in stages to high-quality error handling. You can even focus your efforts on the places where "careless" error handling is causing most trouble.

Errors as part of a returned sum type

Posted Sep 21, 2024 1:48 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

> Add in the fact that Rust has support for turning an iterator over items of type Result<T, E> into either of Result<Collection<T>, E> (collect all good results, fail on first error) or Collection<Result<T, E>> (collect all results, good and bad), and you get a powerful way to start out with simpler error handling (fail on first error collected, ignore the rest), and refactor later to consider all the errors once you've got the happy path working.

You can also go the other way. A Result<T, E> implements IntoIterator<T> (yielding a single value if Ok(), or no value if Err()). So in cases where T is a container and you want to handle any error as if it was just an empty container, you can call .into_iter().flatten() and then proceed with more iterator calculus. Similarly, if you have an Iterator<T> and a FnMut(T) -> Result<U, E>, you can call .flat_map() on the iterator and only keep the U values (throwing away all of the E values).

In practice, both of these operations make *significantly* more sense if you are using Option instead of Result, which supports exactly the same thing, but now you're just throwing away Nones instead of actual error values. For some reason, that feels much less evil to me, even though Option<T> is isomorphic to Result<T, ()>.

Exceptions?

Posted Sep 20, 2024 6:17 UTC (Fri) by tchernobog (subscriber, #73595) [Link] (10 responses)

It was a conscious decision.

Exceptions are not immediately visible from the function signature, and as such an opaque feature of virtually any languages implementing them. In other words, the caller cannot know what they need to catch without reading the whole implementation. This makes it hard to perform proper static verification from the compiler too.

A few have tried to enforce throw declarations in the signature, e.g. Java, only to later on give up for anything derived from the class of runtime exceptions.

And then you go back not to know you have to handle errors at all. It is one of the reasons why a lot of code with e.g. functional safety in mind in C++ compile with -fno-exceptions.

The Rust model of making it part of the return type is much more robust.

In my opinion, exceptions are a feature of languages that didn't know better when they were designed, and should stay a thing of the past.

Exceptions?

Posted Sep 20, 2024 13:37 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (6 responses)

Making code readable is still a valuable thing today though.

Exceptions?

Posted Sep 20, 2024 14:43 UTC (Fri) by ebee_matteo (guest, #165284) [Link] (5 responses)

Yes, of course. My view is that non-trivial code tends to be far *less* readable when using exceptions.

I sometimes get the feeling that people in favor of exceptions are just experienced developers in programming languages that use them, but have had limited exposures to other programming languages with alternative error handling facilities.

Exceptions?

Posted Sep 21, 2024 10:57 UTC (Sat) by LtWorf (subscriber, #124958) [Link]

I do Python but also C/C++. Recently some go as well.

Exceptions?

Posted Sep 21, 2024 14:39 UTC (Sat) by zorro (subscriber, #45643) [Link] (3 responses)

I think that depends on the application.

If I'm developing a web service that talks to a database, then there is nothing I can do with errors returned by the database, except log the error and return a failure message to the client. Any code devoted to propagating such errors up the call stack is just noise.

This is generally true for all errors that are caused by bugs or the non-availability of a critical resource without which the code cannot function. For some applications this is the vast majority of error cases, and (unchecked) exceptions work very well for them.

Exceptions?

Posted Sep 22, 2024 10:35 UTC (Sun) by farnz (subscriber, #17727) [Link] (2 responses)

My experience is that the ? operator Rust uses for this is a good compromise; you have to mark everything that can propagate an error, but it then just calmly propagates without issue.

Otherwise, I find that you eventually hit a case where you're struggling to debug a failure because a call used to be infallible and fully local, but as the system has evolved, someone has changed it to be fallible; the "ceremony" provided by the ? operator is sufficient to both make this a deliberate decision on someone's part (not a bad mistake that slips past review), and to prompt you into investigating why the "infallible" function fails.

Exceptions?

Posted Sep 22, 2024 12:46 UTC (Sun) by zorro (subscriber, #45643) [Link] (1 responses)

But isn't the point of the article that you are losing valuable debugging information when propagating errors using the ? operator?

With (C#) exceptions you can log a nice stacktrace as a starting point (and often even end point) for debugging.

What's present in a Rust error

Posted Sep 22, 2024 14:39 UTC (Sun) by farnz (subscriber, #17727) [Link]

Capturing a backtrace is a decision made at the time you convert from the called function's error type to your error type; you can capture a backtrace (and if you use thiserror to derive the std::error::Error trait for your type, it'll capture a backtrace when you put a field of type std::backtrace::Backtrace in your error type, and both eyre and anyhow capture backtraces for you), or you can decide that you're not capturing a backtrace, and avoid paying the cost of capturing a backtrace.

The article's point is that you need to think about exactly what you capture in the error type, and what you log at the point of error, in a Rust program; in kernel C, it's simple because the error is almost always a negative integer, and thus your only choice is to log at the point of error, but in kernel Rust, you have more options - and you need to think through what you're doing and get it right.

Exceptions?

Posted Sep 21, 2024 4:54 UTC (Sat) by zorro (subscriber, #45643) [Link] (2 responses)

How are Rust errors conceptually different from Java checked exceptions, though? Checked exceptions force you to either handle errors as they occur, or pass them to the caller, just like Rust errors. Most people will agree that Java checked exceptions were a disaster.

I do like that panic/recover facility of Rust, it seems to match the failfast philosophy underlying robust software development very well.

Rust Error versus Java checked exceptions

Posted Sep 21, 2024 11:16 UTC (Sat) by farnz (subscriber, #17727) [Link]

The biggest differences I've found between Rust errors and Java's checked exceptions as I experienced them in Java 1.2 days (so some of the Java issues may have been fixed in later versions) are:

It is in practice trivial to define my error type as "all errors that the functions I call can produce, plus the ones I create myself"; at one level, there's existing crates like eyre and anyhow that provide "general" error types, and at another, I can define my own error type with From implementations to convert from your error type to my error type in a meaningful fashion - which will get used by ? for bubbling the error up. In contrast, in Java, I found it very hard to define my exceptions as "the exception specification of foo, plus my added exceptions" without copying-and-pasting foo's exception specification and keeping it in sync manually.
The requirement to handle failure at the call site (even if I just use ? to immediately bubble it up to the next level up) means that it's simple to see where an error can come from; I found in Java that I'd be surprised by an exception from a function that I thought (erroneously) was infallible.

Exceptions?

Posted Oct 6, 2024 10:09 UTC (Sun) by ssokolow (guest, #94568) [Link]

Checked exceptions are a sidecar bolted onto the type system as a whole. An aftermarket verifier for a conceptually independent mechanism of control flow and data return.

Rust's error handling is just returning a value, completely within the type system. The compiler finally having support for sum types (data-bearing enums/tagged unions) as well as product types (structs and classes) is a very powerful thing... especially when paired with a little syntactic sugar like the ? operator, which more or less boils down to something like this:

let thing = match thing {
    Some(x) => x,
    Err(e) => return Err(e.into()),
}

In the error case, call whichever Into implementation matches the error side of the return type in the function signature, wrap it back up in the Err variant of the new Result type, and return it. The only thing magical about it is that the try_trait_v2 language feature hasn't been stabilized, so only the standard library can implement ? for new types in stable Rust at the moment.

Exceptions?

Posted Sep 20, 2024 21:28 UTC (Fri) by riking (subscriber, #95706) [Link]

Unwinding panics are banned in kernel Rust.

Implicit stack-unwinding exceptions are terrible

Posted Sep 22, 2024 9:49 UTC (Sun) by summentier (subscriber, #100638) [Link] (4 responses)

Implicit stack-unwinding exceptions of the Ada/C++/Java/Python variety are, to my mind, a straight-forward language design mistake, certainly in this age of nontrivial codebases and nontrivial parallelism. Rust did well by not including them.

1. Exception safety together with RAII seems simple but is actually fiendishly difficult to get right – Stroustup's paper on execption safety in something as basic as an array has 17 pages, auxiliary base types, etc.

2. Add parallelism to the mix, and you have to reintroduce a monad (`Result`, `std::future`) anyway. Oh, by the way, are you sure your network of threads is notified and unwound in the correct order by the default behaviour?

3. Are you sure the calling code knows which exceptions to expect? Are you sure you know which ones you are passing along without even knowing from third-party libraries? Are you sure none of these libraries are adding or changing expections in new versions? The function signature is certainly of no help to you there...

4. As the C++ FAQ points out, exceptions allow you to factor out code to helper functions. Sure, but are the exceptions the helper raises still relevant to the caller of the upper-level function? Is, e.g., "log2 expects positive argument" something you want to expose from a config file parser rather than "Invalid entry: expect n to be positive"? Something you want to guarantee and maintain?

In an ideal world, I would like to have a "decay" mechanism, where Results, when assigned to a variable that is not Result, are unpacked and a panic raised if that fails, just to free up some mental space of new programmers or computational scientists who do not want to think about error paths all the time. Other than that, I think Rust gets this right.

Implicit stack-unwinding exceptions are terrible

Posted Sep 22, 2024 13:37 UTC (Sun) by zorro (subscriber, #45643) [Link] (1 responses)

As to your first and second points, the whole purpose of Rust is for the compiler to check that your (parallel) code is safe and free from race conditions. From a code safety perspective, propagating errors explicitly using ? is really not that different from propagating errors implicitly using an exception*

As to your third point, fail-fast philosophy ([1]) is that you do not need to know what exceptions to expect as they are always "fatal" (leading to an abort of the transaction, after which the system continues as if the transaction never happened). An exception that you *can* handle is not an error and should not be modeled as an exception. *In that sense, Rust's Result type makes sense.* But at some point, if all my code can do with an error is to pass it to the caller then it may as well be an exception as it is extremely unlikely that the calling code will be able to do anything else with it either.

As to your final point, I don't think it is only new programmers and computational scientists that don't want to think about exceptions. I don't want to think about them either, and I've been programming for more than 25 years. In fact, all catch blocks in my C# code are required to have a comment explaining why the exception must be caught and handled (rather than implicitly propagating it) and why it is not possible to prevent the exception from happening in the first place.

[1] "Need Robust Software? Make It Fragile" (https://www.yegor256.com/2015/08/25/fail-fast.html).

*One area in which I think exceptions are problematic is in asynchronous code, but I'm a "keep-it-simple" kind of guy and avoid asynchronous code like the plague.

Implicit stack-unwinding exceptions are terrible

Posted Sep 22, 2024 13:54 UTC (Sun) by mb (subscriber, #50428) [Link]

>if all my code can do with an error is to pass it to the caller then it may as well be an exception as it is
>extremely unlikely that the calling code will be able to do anything else with it either.

That is pretty much not true in general and especially not true for kernel code.
There almost always is *some* point in the call chain that can handle the error. And it almost never is at the tip. It's somewhere in the midde (e.g. re-queue, re-try, try something else, etc...) or at the bottom (fail syscall).

Exceptions are a nightmare.
They are invisible in almost all of the program code.

Exceptions just require that every path eventually has a catch-all block in addition to the known-error-catch-blocks, because "huh, *that* exception was possible??".

Implicit stack-unwinding exceptions are terrible

Posted Oct 6, 2024 9:59 UTC (Sun) by ssokolow (guest, #94568) [Link] (1 responses)

There's an essay from 2005 named Exception Handling Considered Harmful which basically agrees. It argues two points:

That, in real-world use, exception handling introduces a hidden control flow path at essentially every line of code, which is unreasonable and potentially impossible for even experts to manage correctly.
That the implicit SQL-esque transaction-rollback nature of exception handling through stack unwinding does not map well to multi-threaded execution.

Implicit stack-unwinding exceptions are terrible

Posted Oct 6, 2024 10:19 UTC (Sun) by Wol (subscriber, #4433) [Link]

I'm trying to remember how PLP (and by implication, PL/1) handled exceptions.

Iirc, by default, when an exception triggered, it "return"ed to where it came from. It was standard practice to register a longjmp if you wanted it to return somewhere else (which could be anywhere in the call stack). So in practice, exceptions were simply a good mechanism for "on error goto cleanup", but were rather more powerful if required.

I'm guessing this "default to return to call site" meant that they weren't quite so bad as the exceptions being discussed here ...

Cheers,
Wol

Result is more flexible than just the question mark

Posted Sep 19, 2024 18:38 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (11 responses)

> He opened the session by giving a simplified example, based on some actual debugging he had done. In short, he had a Rust function that returns an error. In Rust errors are just normal values that are returned to the caller of a function like any other value. The caller can either do something explicit with the error — or return early and pass the error on to its caller using the question-mark operator. Code like foo()? calls foo() and then, if it returns an Err value, immediately returns the error to the calling function.

For those less familiar with Rust, I'd like to point out that Result has a ton of convenience methods for doing more complicated transforms, if you need them. Here are some simple examples:

* and_then() - If the Result is an Ok(), calls a function on it and return the Result from that function, otherwise return the Result as-is.
* or_else() - The same as and_then(), but transforms the Err() case instead of the Ok() case.
* map() - The same as and_then(), but the called function is infallible (does not return a Result), so an Ok() is always turned into another Ok().
* unwrap_or_else() - Returns the value inside the Ok(), or calls a function to compute such a value if the Result is an Err().
* iter() - Maps an Ok() into a single-element iterator, maps Err() into an empty iterator. Most useful if the Ok() type is some kind of iterable (you can then call flatten() or flat_map() and continue with other Iterator generic methods).
* inspect() and inspect_err() - Calls a function on the value inside the Ok() or Err() (respectively), if the Result matches that variant, or else does nothing.

Since most of these methods return another Result, you can combine these methods with each other or with the question mark to build up a more complicated chain of logic.

You always have the option of writing out an explicit match block, but that is often needlessly verbose. I would suggest considering it if you're calling more than two or three consecutive Result methods, just to make things a little easier to read. As you might expect, Option has a similar menagerie of convenience methods, and both types have conversions to each other.

Result is more flexible than just the question mark

Posted Sep 19, 2024 19:53 UTC (Thu) by adobriyan (subscriber, #30858) [Link] (10 responses)

So we're supposed to use

.map_err(|e| {printk!("XXX"); e})?

for printk style debugging?

Result is more flexible than just the question mark

Posted Sep 19, 2024 20:13 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

No, you are meant to use inspect_err() for that. The difference is that inspect_err() is not allowed to change the Err, just "inspect" it. This also means you don't need the return expression at the end.

Result is more flexible than just the question mark

Posted Sep 19, 2024 20:37 UTC (Thu) by adobriyan (subscriber, #30858) [Link]

Yes, that's even better.

Result is more flexible than just the question mark

Posted Sep 20, 2024 6:54 UTC (Fri) by atnot (guest, #124910) [Link] (5 responses)

Theoretically, but personally I never do something like that in practice.

Result implements Debug if the types it wraps implement Debug, which means you can just print the whole Result. If it's an error, it prints the error, if it's ok it prints the return. In C you usually need separate code paths for the error and success cases because you don't know whether the result of the function is valid without extra information, but in Rust that's not really the case that often.

It's also a nice motivation to keep your error messages very thorough and helpful! If you're putting it in a debug print, probably makes sense to put it into the error value permanently too :)

Result is more flexible than just the question mark

Posted Sep 20, 2024 8:17 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (4 responses)

Does the kernel mandate that your types must impl Debug if anybody else can see them? I can imagine the kernel as the sort of place which views even the minimal ceremony of #[derive(Debug)] as an imposition or has an insistence that they can do better in terms of the code emitted (but then they don't, so it goes unimplemented).

Result is more flexible than just the question mark

Posted Sep 20, 2024 19:37 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (3 responses)

It is generally considered polite to impl Debug on all public types. The compiler even has an off-by-default warning for failing to do so: https://doc.rust-lang.org/rustc/lints/listing/allowed-by-...

More pragmatically, if you impl std::error::Error (or core::error::Error in nostd environments), then you must also impl Display and Debug to satisfy its trait bounds. Strictly speaking, Result does not require the Err type to impl that trait, but it is the usual way of doing things.

Result is more flexible than just the question mark

Posted Sep 21, 2024 6:02 UTC (Sat) by mgb (guest, #3226) [Link] (2 responses)

"It is generally considered polite to ..."

I'm a C++ and LISP programmer (and many older languages) who has read the Rust Programming Language book. I often see assertions on LWN that X is polite or Y is the norm or Z is typical in Rust. Is there a good resource for learning this layer of Rust?

Result is more flexible than just the question mark

Posted Sep 23, 2024 3:22 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

In this particular case, it is documented at https://doc.rust-lang.org/std/fmt/index.html#fmtdisplay-v...

> fmt::Debug implementations should be implemented for all public types. Output will typically represent the internal state as faithfully as possible. The purpose of the Debug trait is to facilitate debugging Rust code. In most cases, using #[derive(Debug)] is sufficient and recommended.

As for more general advice, I don't have a specific reference, but the single broadest rule of thumb that I tend to follow is this: Avoid making assumptions about your callers and callees, within reason. I can also give you a bunch of special cases of this rule to help you see what it means in general:

* Where practical, prefer to take generic arguments which match some trait, rather than requiring an exact type. If the trait in question is not in std, nor in some directly-relevant library that you are compatible with, then you *may* want to define your own trait and let the caller impl it, but this is considerably more of a judgment call and I don't think a blanket rule can tell you whether it's a good idea or a bad idea.
* &T is the lowest common denominator of short-lived borrows. In theory, you could (following the first rule) insist on taking everything as Deref<T> or Borrow<Q> where T: Borrow<Q> instead of &T. In practice, it turns out that most callers really can come up with a &T if you ask for one, so the added flexibility is overkill.
* Borrow is occasionally useful if you know that your callers may want to pass a &str *and* they will want T to be String and not str in that case. This mostly only comes up with (owning) containers that offer fast key lookup, such as HashMap or any tree-like structure, but it is just about the only exception I can think of where &T is not good enough.
* Do not take &Option<T> as an argument. Always take Option<&T> instead. This is because the former requires the caller to move the T into an Option if it was not already in an Option (e.g. it's in a Box or Arc), and that's annoying. The latter can be constructed directly from &T and it is very cheap to do so because it's literally just a nullable pointer at the ABI level. A similar rule applies to Result and all other simple wrapper types.
* Do not mention smart pointers as an argument type, unless they are actually necessary. Box<T> means "I'm taking ownership of a heap-allocated T." Arc<T> means "I'm taking a long-lived borrow of a heap-allocated, reference-counted T." &Arc<T> means "I'm taking at least a short-lived borrow, but I might promote it to a long-lived borrow if I so choose," which is ever so slightly more efficient if the short-lived borrow case really happens at least some of the time. &Box<T>, Box<&T>, and Arc<&T> are all nonsense and should not be used.
* Use lifetime parameters on structs with caution. If a struct has a lifetime parameter, it will be harder for callers to reason about when it may be used and how it relates to the data which it has borrowed. This is mostly harmless on private types (or where visibility otherwise means that the caller does not have to think about it), and may be acceptable in cases where the lifetime restriction is "obvious" and straightforward. If it takes more than one or two sentences to explain, it's probably too complicated.
* Do not use type-unsafe sentinel values. Don't use integers to indicate which of several states we are in, use an enum. Don't use -1 to indicate "not found," use Option<usize> (or some other Option<T> as applicable). Etc. Generally, do not assume that the caller has carefully written out all of the if statements that your documentation says they're supposed to write - instead, make them write an exhaustive match block that cannot forget possible values.
* If you are implementing From, TryFrom, or a similar conversion trait, it's OK to mention From and Into in your where clause (to indicate that a type conversion is possible whenever some generic parameter allows a corresponding conversion). Otherwise, it may indicate that you are trying to be "too flexible." Let the caller write foo.into() if they want a type conversion to happen - do not do it for them, unless silent type conversion is appropriate for that context (which is sometimes the case, but you should think about it carefully). Note: IntoIterator is not the same thing as Into<Iterator>. It is entirely unproblematic to accept IntoIterator, because that is a collection-specific trait, rather than a generic "convert anything into anything" trait.

Another way of phrasing this: Think about how you would go about using a given API in different use cases (maybe even sketch out some example code, which can also serve as doctests). If the API is awkward or fiddly, then think about why that is the case, and whether you can find a way to make things easier for the caller.

Result is more flexible than just the question mark

Posted Sep 23, 2024 6:44 UTC (Mon) by mgb (guest, #3226) [Link]

Much to think about there. Thank you.

Result is more flexible than just the question mark

Posted Sep 20, 2024 7:00 UTC (Fri) by mb (subscriber, #50428) [Link] (1 responses)

I don't know if it's available in the kernel, but in general one would use:

https://doc.rust-lang.org/std/macro.dbg.html

Result is more flexible than just the question mark

Posted Sep 20, 2024 11:17 UTC (Fri) by farnz (subscriber, #17727) [Link]

The kernel has its own dbg! macro which uses pr_info to put things in the log.

Why the controversy?

Posted Sep 19, 2024 19:05 UTC (Thu) by roc (subscriber, #30627) [Link] (16 responses)

The ? operator is just a more convenient and safer equivalent of "if (result < 0) return result;" (or "goto out;") which the kernel does all over the place. There's nothing wrong with it.

> Carlos Bilbao was confused about how Behme was seeing errors not being reported, anyway — in Rust, the Result type, which represents either an error or a normal return value, has the #[must_use] attribute.

I was wondering this myself. Unlike in C, you can't silently discard errors in Rust without triggering that warning. So the original motivation for this discussion, as described, does not actually make sense.

The question of whether and where to log errors seems very simple: Rust should log errors wherever the equivalent C code would.

I don't understand why there is a controversy here.

Why the controversy?

Posted Sep 19, 2024 19:54 UTC (Thu) by ringerc (subscriber, #3071) [Link]

Agreed.

If someone tried to mandate golang-style verbose explicit error handling there would probably be a rebellion about verbose code and demands for some kind of "?" operator to abbreviate the common case of propagating the error.

Why the controversy?

Posted Sep 19, 2024 21:29 UTC (Thu) by fishface60 (subscriber, #88700) [Link] (3 responses)

> I don't understand why there is a controversy here.

I think it's because kernel code's errno-based error handling means that you don't get very much context propagated back up your call stack, so if something goes wrong and you propagate the error up the call stack instead of reporting it then and there, you don't have the context to give a useful error log.

As a result they've got a convention of logging as soon as an error occurs, which lacks the context of _why_ it was called, but is otherwise has a reasonable amount of context to log what happened.

In rust with anyhow/eyre/miette you have the option of attaching extra context to an error as you propagate it, which gives more information as it propagates up the stack, but at some point you need to decide when something is caught and logged.

If you've got a standard of not logging at every level of the stack, logging when the error first occurs seems like a good way to ensure it happens.

I think Rust has an opportunity to encode this standard into the type system, by having functions which can fail badly enough to need immediate logging return a type that wraps Result, and the Result can't be propagated without first logging the error, though some other approach might be necessary to ensure the context is propagated down through the call stack rather than on its way back up.

Why the controversy?

Posted Sep 20, 2024 14:10 UTC (Fri) by stressinduktion (subscriber, #46452) [Link] (2 responses)

> In rust with anyhow/eyre/miette you have the option of attaching extra context to an error as you propagate it, which gives more information as it propagates up the stack, but at some point you need to decide when something is caught and logged.

This might be problematic in some kernel code, as good error messages might use format strings to print good error messages, but that requires memory allocation in error paths. E.g. printk does a lot of heavy lifting to be usable in contexts where memory allocation should be avoided. Hanging on to error strings for longer as needed when bubbling them up might be problematic. This might be solvable by deferring the formatting, but might become a complex subsystem in itself.

Why the controversy?

Posted Sep 20, 2024 17:45 UTC (Fri) by sunshowers (guest, #170655) [Link]

Generally speaking, in Rust you can model your error types such that conversions to strings are delayed as much as possible. Rust's enums tend to support that use case well since you can use lots of enums to correspond to each case.

Why the controversy?

Posted Sep 20, 2024 20:05 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

> This might be solvable by deferring the formatting, but might become a complex subsystem in itself.

Rust handles this complexity by biting the bullet and using dynamic dispatch. See for example https://doc.rust-lang.org/std/fmt/struct.DebugStruct.html... (specifically the "&dyn Debug" argument). Since vtables are preallocated at compile-time, there's no need to allocate at runtime, so this just adds the overhead of a little bit of extra pointer chasing. The vtables are also quite small since Debug only has one method (and zero supertraits). The alternative would be monomorphizing all of these formatting methods separately for every Debug type that uses them (which turns out to be a lot of types in practice), and then (probably) dead-code-eliminating all or nearly all of those monomorphizations since Debug is not meant to be used for purposes other than debugging, so this is exactly the sort of use case where dynamic dispatch is the most sensible option.

(I dunno, maybe the compiler is smart enough to deduce that we have an isolated graph of Debug-related functions that only call each other and are not reachable from main(), and thereby avoid monomorphizing them in the first place. But it seems like that would be much harder than just doing complete monomorphization of every function first and then dead code elimination as a separate pass, so it is probably not reasonable for Rust-the-language to demand that rustc-the-implementation deal with a hypothetical monomorphized Debug/Formatter implementation within a reasonable compile time.)

Why the controversy?

Posted Sep 24, 2024 6:08 UTC (Tue) by dirklwn (subscriber, #80581) [Link] (10 responses)

> I don't understand why there is a controversy here.

The main topic is the different behavior between Rust (kernel guys would say "in user space") and Rust for Linux. With Rust in user space you will get a notification if something using the ?-operator does fail. In the kernel with Rust for Linux you will not (in the trivial case, if you don't have done any provisions to catch it somewhere else, see the discussion above).

Taking a quite trivial example from the Rust for Linux in tree rust_minimal.rs sample [1]

numbers.push(72, GFP_KERNEL)?;
numbers.push(108, GFP_KERNEL)?;
numbers.push(200, GFP_KERNEL)?;

in case of a failure in any of these 3 lines you won't get any notification at all. So from observer (console) point of view there no difference at all if this sample is executed successfully or not. I wonder if that is what is intended here and everybody is aware of that?

Yes, using the ?-operator is quite convenient, results in nice code and is encouraged to be used in a lot of Rust documentation. Even in the kernel. That is totally fine.

But for Rust for Linux I would think we should put at least a big red attention sign somewhere "be aware of the difference and make sure your error handling is as you really want it to be".

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

Why the controversy?

Posted Sep 24, 2024 6:19 UTC (Tue) by mb (subscriber, #50428) [Link] (1 responses)

>you won't get any notification at all.

Well, the module would fail to load. With a message, right?

I don't see how this is any different from the usual pattern

foo = kmalloc(...);
if (!foo) goto out;

Why the controversy?

Posted Sep 24, 2024 8:20 UTC (Tue) by dirklwn (subscriber, #80581) [Link]

> Well, the module would fail to load. With a message, right?

The test setup was built-in.

Why the controversy?

Posted Sep 24, 2024 7:11 UTC (Tue) by roc (subscriber, #30627) [Link] (7 responses)

> With Rust in user space you will get a notification if something using the ?-operator does fail.

No you won't, unless you write code to do that.

Why the controversy?

Posted Sep 24, 2024 8:25 UTC (Tue) by dirklwn (subscriber, #80581) [Link] (6 responses)

> No you won't, unless you write code to do that.

This even more simplified example:

pub const EINVAL: u32 = 22;

fn larger_three(val: u32) -> Result<(), u32> {

if val > 3 {
return Err(EINVAL);
}

return Ok(())
}

fn main() -> Result<(), u32>{
println!("Hello Rust!");

larger_three(1)?;
larger_three(4)?;
larger_three(3)?;

return Ok(())
}

with a cargo run results in

Hello Rust!
Error: 22

Why the controversy?

Posted Sep 24, 2024 8:42 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

That's because you've added code (the scaffolding around main) to explicitly print the error.

If I rewrite that to not have the scaffolding you added, I get:

pub const EINVAL: u32 = 22;

fn larger_three(val: u32) -> Result<(), u32> {
    if val > 3 {
        return Err(EINVAL);
    }
    return Ok(())
}

fn main() {
    println!("Hello Rust!");

    larger_three(1)?;
    larger_three(4)?;
    larger_three(3)?;

    return Ok(())
}

Which doesn't compile; if I refactor a little to be more typical userspace Rust code where you're thinking about error handling, I might get (for example):

pub const EINVAL: u32 = 22;

fn larger_three(val: u32) -> Result<(), u32> {
    if val > 3 {
        return Err(EINVAL);
    }
    return Ok(())
}

fn load() -> Result<(), u32> {
    println!("Hello Rust!");

    larger_three(1)?;
    larger_three(4)?;
    larger_three(3)?;

    return Ok(())
}

fn main() {
    if let Err(e) = load() {
        eprintln!("load attempt failed: {e:?}");
        panic!("Did not successfully load the system");
    }
}

This is not hypothetical code, BTW; a significant amount of my employer's codebase has main functions like this, because we don't want the ? operator to be used blithely during startup - it's something that exists to pass errors out, and once you get to main, we don't want you passing errors out blindly, we want you to handle them.

Why the controversy?

Posted Sep 24, 2024 8:59 UTC (Tue) by dirklwn (subscriber, #80581) [Link] (1 responses)

> a significant amount of my employer's codebase has main functions like this, because we don't want the ? operator to be used blithely during startup - it's something that exists to pass errors out, and once you get to main, we don't want you passing errors out blindly, we want you to handle them.

Yes, this is totally fine. Thanks!

I think if we manage to document that somehow for example as "error handling good practices" then we are on a better way, already.

Unfortunately I have the impression that the rust_minimal.rs example is used by several people as a starting point and with this might be a little misleading regarding proper error handling. Maybe an update of rust_minimal.rs like in your example above would be good. as well.

Why the controversy?

Posted Sep 24, 2024 9:40 UTC (Tue) by farnz (subscriber, #17727) [Link]

I can't face the effort it'd take to get set up to deal with LKML patch submission again for something this small, but the following diff would refactor it to be closer to what I'd write in userspace Rust; I'll leave it to you to ensure this compiles and works, then push it through the process.


diff --git a/samples/rust/rust_minimal.rs b/samples/rust/rust_minimal.rs
index 2a9eaab62d1c..f13a4548dd26 100644
--- a/samples/rust/rust_minimal.rs
+++ b/samples/rust/rust_minimal.rs
@@ -16,8 +16,8 @@ struct RustMinimal {
     numbers: Vec<;i32>,
 }
 
-impl kernel::Module for RustMinimal {
-    fn init(_module: &'static ThisModule) -> Result<;Self> {
+impl RustMinimal {
+    fn new(_module: &'static ThisModule) -> Result<;Self> {
         pr_info!("Rust minimal sample (init)\n");
         pr_info!("Am I built-in? {}\n", !cfg!(MODULE));
 
@@ -30,6 +30,16 @@ fn init(_module: &'static ThisModule) -> Result<;Self> {
     }
 }
 
+impl kernel::Module for RustMinimal {
+    fn init(module: &'static ThisModule) -> Result<;Self> {
+        let res = Self::new(module);
+        if let Some(err) = &res {
+            pr_err("Failed to create RustMinimal: {err:?}");
+        }
+        res
+    }
+}
+
 impl Drop for RustMinimal {
     fn drop(&mut self) {
         pr_info!("My numbers are {:?}\n", self.numbers);

Why the controversy?

Posted Sep 24, 2024 10:46 UTC (Tue) by roc (subscriber, #30627) [Link] (2 responses)

That "Error: 22" message is printed when the `main` function returns an error. It has nothing to do with the ? operator and nothing do with with kernel vs user space.

Why the controversy?

Posted Sep 24, 2024 11:36 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

Isn't all this just missing the point?

The point as I understand it is that adding a "throw error" deep in the call stack will cause the compiler to object blue murder unless that error is explicitly handled.

Yes you can just add a question mark in the caller routine to say "punt this error up the stack".

But as soon as somebody starts digging through the patches to debug a problem, the fact that "git diff" or whatever shows that "somebody added a ? to this subroutine call", red flags will start going up everywhere.

So the naive C way of just ignoring the error is impossible. You can handle the error deep in the bowels. You can propagate it up the stack with the ? Either way you're forced to do *something* rather than leaving it behind as a landmine for the next programmer. That's the important point - the compiler won't let you do anything else. (And there's no difference between user space and kernel space in this regard - nor should there be.)

And all that stuff about main not having a ? is just coding standards to say "don't let the entire program punt it back up to the command-line / xterm / konsole / whatever to handle/drop aka terminate with an unacknowledged error.

Cheers,
Wol

Why the controversy?

Posted Sep 24, 2024 20:53 UTC (Tue) by roc (subscriber, #30627) [Link]

Yes, that's right.