Making EPERM friendlier

By Michael Kerrisk
January 19, 2013

Error reporting from the kernel (and low-level system libraries such as the C library) has been a primitive affair since the earliest UNIX systems. One of the consequences of this is that end users and system administrators often encounter error messages that provide quite limited information about the cause of the error, making it difficult to diagnose the underlying problem. Some recent discussions on the libc-alpha and Linux kernel mailing lists were started by developers who would like to improve this state of affairs by having the kernel provide more detailed error information to user space.

The traditional UNIX (and Linux) method of error reporting is via the (per-thread) global errno variable. The C library wrapper functions that invoke system calls indicate an error by returning -1 as the function result and setting errno to a positive integer value that identifies the cause of the error.
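The convention can be illustrated with a minimal C sketch. The helper name open_error_code() is invented for this example; open(), close(), and strerror() are the standard calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to open `path` read-only. Returns 0 on success, or the errno value
 * that open() set on failure. (The helper name is illustrative only.)
 */
int open_error_code(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        /* The wrapper returned -1; errno now identifies the cause.
         * Copy it at once, since fprintf() below may itself change it. */
        int saved = errno;

        fprintf(stderr, "open(\"%s\"): %s\n", path, strerror(saved));
        return saved;
    }

    close(fd);
    return 0;
}
```

All that the caller learns from a failure here is a single small integer, which strerror() maps to a fixed string.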

The fact that errno is a global variable is a source of complications for user-space programs. Because each system call may overwrite the global value, it is sometimes necessary to save a copy of the value if it needs to be preserved while making another system call. The fact that errno is global also means that signal handlers that make system calls must save a copy of errno on entry to the handler and restore it on exit, to prevent the possibility of overwriting an errno value that had previously been set in the main program.
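The save-and-restore idiom for signal handlers might look like the following sketch. The function names are made up for illustration, and the failing write() to file descriptor -1 merely stands in for whatever system calls a real handler would make.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Handler that makes a failing system call but leaves errno untouched. */
static void careful_handler(int sig)
{
    int saved_errno = errno;            /* save on entry */

    (void)sig;
    /* write() to descriptor -1 always fails and sets errno to EBADF. */
    if (write(-1, "x", 1) == -1) {
        /* errno has been overwritten at this point */
    }

    errno = saved_errno;                /* restore on exit */
}

/*
 * Set errno to `initial`, deliver SIGUSR1 to ourselves, and return the
 * errno value that the main program sees afterwards.
 */
int errno_after_signal(int initial)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = careful_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    errno = initial;
    raise(SIGUSR1);     /* raise() does not return until the handler has run */
    return errno;
}
```

Without the two assignments in the handler, the main program would find EBADF in errno instead of the value it had set.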

Another problem with errno is that the information it reports is rather minimal: one of somewhat more than one hundred integer codes. Given that the kernel provides hundreds of system calls, many of which have multiple error cases, the mapping of errors to errno values inevitably means a loss of information.

That loss of information can be particularly acute when it comes to certain commonly used errno values. In a message to the libc-alpha mailing list, Dan Walsh explained the problem for two errors that are frequently encountered by end users:

Traditionally, if a process attempts a forbidden operation, errno for that thread is set to EACCES or EPERM, and a call to strerror() returns a localized version of "Permission Denied" or "Operation not permitted". This string appears throughout textual uis and syslogs. For example, it will show up in command-line tools, in exceptions within scripting languages, etc.

Those two errors have been defined on UNIX systems since early times. POSIX defines EACCES as "an attempt was made to access a file in a way forbidden by its file access permissions" and EPERM as "an attempt was made to perform an operation limited to processes with appropriate privileges or to the owner of a file or other resource". These definitions were fairly comprehensible on early UNIX systems, where the kernel was much less complex, the only method of controlling file access was via classical rwx file permissions, and the only kind of privilege separation was via user and group IDs and superuser versus non-superuser. However, life is rather more complex on modern UNIX systems.

In all, EPERM and EACCES are returned by more than 3000 locations across the Linux 3.7 kernel source code. However, it is not so much the number of return paths yielding these errors that is the problem. Rather, the problem for end users is determining the underlying cause of the errors. The possible causes are many, including denial of file access because of insufficient (classical) file permissions or because of permissions in an ACL, lack of the right capability, denial of an operation by a Linux Security Module or by the seccomp mechanism, and any of a number of other reasons. Dan summarized the problem faced by the end user:

As we continue to add mechanisms for the Kernel to deny permissions, the Administrator/User is faced with just a message that says "Permission Denied" Then if the administrator is lucky enough or skilled enough to know where to look, he might be able to understand why the process was denied access.

Dan's mail linked to a wiki page ("Friendly EPERM") with a proposal on how to deal with the problem. That proposal involves changes to both the kernel and the GNU C library (glibc). The kernel changes would add a mechanism for exposing a "failure cookie" to user space that would provide more detailed information about the error delivered in errno. On the glibc side, strerror() and related calls (e.g., perror()) would access the failure cookie in order to obtain information that could be used to provide a more detailed error message to the user.

Roland McGrath was quick to point out that the solution is not so simple. The problem is that it is quite common for applications to call strerror() only some time after a failed system call, or to do things such as saving errno in a temporary location and then restoring it later. In the meantime, the application is likely to have performed further system calls that may have changed the value of the failure cookie.

Roland went on to identify some of the problems inherent in trying to extend existing standardized interfaces in order to provide useful error information to end users:

It is indeed an unfortunate limitation of POSIX-like interfaces that error reporting is limited to a single integer. But it's very deeply ingrained in the fundamental structure of all Unix-like interfaces.

Frankly, I don't see any practical way to achieve what you're after. In most cases, you can't even add new different errno codes for different kinds of permission errors, because POSIX specifies the standard code for certain errors and you'd break both standards compliance and all applications that test for standard errno codes to treat known classes of errors in particular ways.

In response, Eric Paris, one of the other proponents of the failure-cookie idea, acknowledged Roland's points, noting that since the standard APIs can't be extended, changes would be required to each application that wanted to take advantage of any additional error information provided by the kernel.

Eric subsequently posted a note to the kernel mailing list with a proposal on the kernel changes required to support improved error reporting. In essence, he proposes exposing some form of binary structure to user space that describes the cause of the last EPERM or EACCES error returned to the process by the kernel. That structure might, for example, be exposed via a thread-specific file in the /proc filesystem.

The structure would take the form of an initial field that indicates the subsystem that triggered the error—for example, capabilities, SELinux, or file permissions—followed by a union of substructures that provide subsystem-specific detail on the circumstances that triggered the error. Thus, for a file permissions error, the substructure might return the effective user and group ID of the process, the file user ID and group ID, and the file permission bits. At the user-space level, the binary structure could be read and translated to human-readable strings, perhaps via a glibc function that Eric suggested might be named something like get_extended_error_info().
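As a rough sketch of what such a structure and its translation to text could look like: every name below (the enum, the struct, its fields, and describe_extended_error()) is hypothetical. Eric's posting did not define a concrete layout, and this is not an existing kernel or glibc interface.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical tag identifying the subsystem that denied the operation. */
enum denial_source {
    DENIAL_CAPABILITY,
    DENIAL_SELINUX,
    DENIAL_FILE_PERMS
};

/* Hypothetical layout: a tag followed by a union of per-subsystem detail. */
struct extended_error {
    enum denial_source source;
    union {
        struct {
            int cap;                /* number of the missing capability */
        } capability;
        struct {
            uid_t  euid;            /* effective IDs of the process */
            gid_t  egid;
            uid_t  file_uid;        /* ownership and mode of the file */
            gid_t  file_gid;
            mode_t file_mode;
        } file_perms;
    } u;
};

/* Translate the binary structure into a human-readable string. */
int describe_extended_error(const struct extended_error *e,
                            char *buf, size_t len)
{
    switch (e->source) {
    case DENIAL_CAPABILITY:
        return snprintf(buf, len, "missing capability %d",
                        e->u.capability.cap);
    case DENIAL_FILE_PERMS:
        return snprintf(buf, len,
                        "process euid=%lu egid=%lu; file uid=%lu gid=%lu mode=%04o",
                        (unsigned long)e->u.file_perms.euid,
                        (unsigned long)e->u.file_perms.egid,
                        (unsigned long)e->u.file_perms.file_uid,
                        (unsigned long)e->u.file_perms.file_gid,
                        (unsigned int)e->u.file_perms.file_mode);
    default:
        return snprintf(buf, len, "denied by subsystem %d (no detail)",
                        (int)e->source);
    }
}
```

The tag tells the reader which member of the union is valid, so new subsystems could be added later without disturbing existing ones.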

Each of the kernel call sites that returned an EPERM or EACCES error would then need to be patched to update this information. But, patching all of those call sites would not be necessary to make the feature useful. As Eric noted:

But just getting extended denial information in a couple of the hot spots would be a huge win. Put it in capable(), LSM hooks, the open() syscall and path walk code.

There were various comments on Eric's proposal. In response to concerns from Stephen Smalley that this feature might leak information (such as file attributes) that could be considered sensitive in systems with a strict security policy (enforced by an LSM), Eric responded that the system could provide a sysctl to disable the feature:

I know many people are worried about information leaks, so I'll right up front say lets add the sysctl to disable the interface for those who are concerned about the metadata information leak. But for most of us I want that data right when it happens, where it happens, so It can be exposed, used, and acted upon by the admin trying to troubleshoot why the shit just hit the fan.

Reasoning that it's best to use an existing format and its tools rather than inventing a new format for error reporting, Casey Schaufler suggested that audit records should be used instead:

the string returned by get_extended_error_info() ought to be the audit record the system call would generate, regardless of whether the audit system would emit it or not. If the audit record doesn't have the information you need we should fix the audit system to provide it. Any bit of the information in the audit record might be relevant, and your admin or developer might need to see it.

Eric expressed concerns that copying an audit record to the process's task_struct would carry more of a performance hit than copying a few integers to that structure, concluding:

I don't see a problem storing the last audit record if it exists, but I don't like making audit part of the normal workflow. I'd do it if others like that though.

Jakub Jelinek wondered which system call Eric's mechanism should return information about, and whether its state would be reset if a subsequent system call succeeded. In many cases, there is no one-to-one mapping between C library calls and system calls, so that some library functions may make one system call, save errno, then make some other system call (that may or may not also fail), and then restore the first system call's errno before returning to the caller. Other C library functions themselves set errno. "So, when would it be safe to call this new get_extended_error_info function and how to determine to which syscall it was relevant?"

Eric's opinion was that the mechanism should return information about the last kernel system call. "It would be really neat for libc to have a way to save and restore the extended errno information, maybe even supply its own if it made the choice in userspace, but that sounds really hard for the first pass."

However, there are problems with such a bare-bones approach. If the value returned by get_extended_error_info() corresponds to the last system call, rather than the errno value actually returned to user space, this risks confusing user-space applications (and users). Carlos O'Donell, who had earlier raised some of the same questions as Jakub and pointed out the need to properly handle the extended error information when a signal handler interrupts the main program, agreed with Casey's assessment that get_extended_error_info() should always return a value that corresponds to the current content of errno. That implies the need for a user-space function that can save and restore the extended error information.
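A user-space save-and-restore facility of that kind might be shaped roughly as follows. This is purely a sketch: no such interface exists, all of the names are invented, and a thread-local string stands in for whatever extended information the kernel would actually provide.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define DETAIL_MAX 128

/* Stand-in for the per-thread extended information (hypothetical). */
static _Thread_local char current_detail[DETAIL_MAX];

/* Snapshot of errno together with its matching extended information. */
struct error_state {
    int  err;
    char detail[DETAIL_MAX];
};

/* Hypothetical: record detail for the most recent failure. */
void set_error_detail(const char *text)
{
    snprintf(current_detail, DETAIL_MAX, "%s", text);
}

const char *get_error_detail(void)
{
    return current_detail;
}

/* Save errno and the extended information as one unit... */
void save_error_state(struct error_state *s)
{
    s->err = errno;
    memcpy(s->detail, current_detail, DETAIL_MAX);
}

/* ...and restore them together, so that they always correspond. */
void restore_error_state(const struct error_state *s)
{
    memcpy(current_detail, s->detail, DETAIL_MAX);
    errno = s->err;
}
```

A library function or signal handler would call save_error_state() before making further system calls and restore_error_state() before returning, just as is done with errno alone today.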

Finally, David Gilbert suggested that it would be useful to broaden Eric's proposal to handle errors beyond EPERM and EACCES. "I've wasted way too much time trying to figure out why mmap (for example) has given me an EINVAL; there are just too many holes you can fall into."

In the last few days, discussion in the thread has gone quiet. However, it's clear that Dan and Eric have identified a very real and practical problem (and one that has been identified by others in the past). The solution would probably need to address the concerns raised in the discussion—most notably the need to have get_extended_error_info() always correspond to the current value of errno—and might possibly also be generalized beyond EPERM and EACCES. However, that should all be feasible, assuming someone takes on the (not insignificant) work of fleshing out the design and implementing it. If they do, the lives of system administrators and end users should become considerably easier when it comes to diagnosing the causes of software error reports.



log why the permission is denied

Posted Jan 19, 2013 3:45 UTC (Sat) by dlang (guest, #313) [Link] (45 responses)

If the kernel logs this information, it can then be mined out of the logs.

If errors are rare, this is easy (look for the error with grep, or just look at the logs and notice the error)

if errors are common you have a bigger problem (both in running your system, and in finding the error :-) but finding what error message is unusual in a pile of common error messages is a common problem when dealing with logs.

log why the permission is denied

Posted Jan 19, 2013 10:16 UTC (Sat) by gdt (subscriber, #6284) [Link] (1 responses)

Remote logging, access permissions, information leakage, file formats, high cost of error path allowing denial of service.

log why the permission is denied

Posted Jan 19, 2013 10:35 UTC (Sat) by dlang (guest, #313) [Link]

> Remote logging,

routine, or are you listing this as an advantage? In any case it's far easier with logs than with the other options being listed.

> access permissions

configurable by the sysadmin

> information leakage

configurable by the sysadmin, just like all other information in the logs. This is even ignoring the sysctl to disable it that was mentioned.

> file formats

Yes, this is a wonderful advantage, the data can be put in whatever file format the sysadmin wants.

> high cost of error path allowing denial of service.

Only if you configure it to be a denial of service. Again, this is up to the sysadmin; some admins may want to run a system so locked down that if the log cannot be written they want the system to stop. Most admins won't want this, and this behavior is configurable in the logging daemons.

everything you mention is either a solved problem, or a strong advantage of having this information in logs rather than in some temporary memory structure that requires that applications be modified to gather the information (and in almost every case, that gathered information ends up in the logs from the application)

Logs already contain sensitive information, in fact, any substantial body of logs is going to contain user passwords, from the simple fact that it's valuable to track failed login attempts and _someone_ will get out of sync with the software and type their password in the userid field, followed pretty quickly afterwards with a successful login by that same user.

This is part of the reason that system logs (at least authentication related logs) need to have their access restricted to the admins. I don't see any reason that this extra information about denied access would be any different.

And I flatly reject the concept that the reason for denying access needs to be kept secret from the sysadmin who's running the box (who may need to grant the access)

log why the permission is denied

Posted Jan 20, 2013 0:13 UTC (Sun) by dvdeug (subscriber, #10998) [Link] (25 responses)

That seems like a solution for a different problem. I want the program I'm using to tell me what's wrong; whether I'm using rm or Konqueror, it would be nice to know why I can't delete a file right then and there.

log why the permission is denied

Posted Jan 20, 2013 0:30 UTC (Sun) by dlang (guest, #313) [Link] (24 responses)

that will require modifying every application to know about these linux-only error messages. I just don't see that happening. Some apps will be modified, but unless you get other *nix developers to adopt the same error reporting mechanism (with at least similar error codes) you are not going to get very far with this.

If you were to get all the *BSDs to sign on, that would probably be enough.

log why the permission is denied

Posted Jan 20, 2013 0:50 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (22 responses)

It's high time to ditch POSIX's idiocies and start evolving a better standard. BSDs can either join or continue using the rapidly obsoleting standard. And the good thing is that it can be done incrementally; extended error reporting even in a few programs would really help.

log why the permission is denied

Posted Jan 20, 2013 0:54 UTC (Sun) by ebiederm (subscriber, #35028) [Link] (6 responses)

Evolution is fine but backwards compatibility needs to be retained.

log why the permission is denied

Posted Jan 20, 2013 1:03 UTC (Sun) by dlang (guest, #313) [Link] (5 responses)

people who want to ditch POSIX compatibility offend me the same way that Big Media does.

Both groups benefit from being able to use the work that came before them, and both groups want to prevent others from benefiting from the work that they are doing (or are arrogant enough to believe that what they are doing is perfect and there will never be any need to build on what they are doing)

I fully expect people to take offence at this comparison, but after you calm down a bit, think about it and you will hopefully be a bit uncomfortable at how close the comparison matches.

log why the permission is denied

Posted Jan 20, 2013 1:13 UTC (Sun) by dvdeug (subscriber, #10998) [Link] (2 responses)

Linux doesn't run on VAXes. Furthermore, Linux supports those who break ix86 compatibility with their fancy new chipsets. It neither supports the old standard nor enforces the new one.

Nobody is obliged to let history chain them in place; if you think you can do better than POSIX, you have the right to try and it's sad that you'll have to endure abuse to do so. Everyone benefits from the work that came before them; *nix systems have been harvesting features from Windows and Apple for years, and designing systems that don't work on those OSes, with no apology.

log why the permission is denied

Posted Jan 20, 2013 21:51 UTC (Sun) by deater (subscriber, #11746) [Link] (1 responses)

> Linux doesn't run on VAXes.

Of course it does! I've successfully run it under the simh emulator, enough to port my assembly-language version of linux_logo to it (http://www.deater.net/weave/vmwprod/asm/ll/ll.html).

Sadly it seems like development has died off at some point, and the top hit for a website doesn't have much info.
http://vax-linux.org/

Mailing list still active as of Dec 2012

Posted Jan 21, 2013 14:59 UTC (Mon) by jjs (guest, #10315) [Link]

http://vax-linux.org/pipermail/vax-linux/2012-December/00...

According to that (if I read it correctly), they have it running on 2.6.18 kernel, at least to the CLI.

log why the permission is denied

Posted Jan 20, 2013 1:21 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

Nobody prevents you from, say, reimplementing cgroups on BSD to make systemd possible. That's the crucial difference.

I'm lurking on a lot of mailing lists and too often I see responses that can be summed up as: "It's not POSIX! Burn the heretic!" Meanwhile, competitors who don't care about POSIX beyond the very basics eat up their marketshare.

log why the permission is denied

Posted Jan 20, 2013 9:33 UTC (Sun) by alankila (guest, #47141) [Link]

There is nobody preventing anything. It's just that if a standard is insufficient to meet a new task, you either evolve that standard, or make a new standard. The alternative equals irrelevance and death, so seems like no-brainer to me.

log why the permission is denied

Posted Jan 20, 2013 1:07 UTC (Sun) by dvdeug (subscriber, #10998) [Link] (14 responses)

You're running a system largely written in C/C++ on (most likely) x86 systems; programming languages with compatibility back to the early 1970s on chips with compatibility back to the early 1970s. Backwards compatibility is a huge winner, and breaking old code is so bad that Linux is bound to remain backwards compatible.

log why the permission is denied

Posted Jan 20, 2013 1:09 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (13 responses)

We don't need to be backwards-incompatible. It's fine to have POSIX compat layer and API, but it's a good time to start developing a new API.

log why the permission is denied

Posted Jan 20, 2013 1:14 UTC (Sun) by mpr22 (subscriber, #60784) [Link]

Don't let us stop you.

log why the permission is denied

Posted Jan 20, 2013 1:21 UTC (Sun) by dvdeug (subscriber, #10998) [Link] (1 responses)

There are always impedance mismatches. VMS's POSIX always had serious problems because VMS is not Unix. A program that doesn't know about the filesystem versioning can't do the right thing about it. If you start from the POSIX level, you have a hard time introducing features like that in the first place, because nothing will handle them correctly.

I believe that file names should be valid strings of Unicode characters. But if you do that, there's going to be edge problems where POSIX programs can't access certain files, can't create certain files for reasons inexplicable to them, or the POSIX filename-native filename mapping is confusing. The question is going to be is it worth it?

log why the permission is denied

Posted Jan 20, 2013 1:23 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

Mac OS X shows that enforcing UTF-8 (and normalizing filenames) is totally fine for all real-world practical purposes.

log why the permission is denied

Posted Jan 20, 2013 1:25 UTC (Sun) by dlang (guest, #313) [Link] (9 responses)

the ash-heap of history is littered by companies and organizations that have decided that everyone else was wrong, and they knew better and could design such a great system that everyone else would abandon what they use to jump on board.

You act as if the POSIX (and Single Unix Specification) standard is something handed down from on high that hasn't changed in 20 years.

The last revision to POSIX and SUS took place within the last couple of years, and the next one will take place within the next few years.

These standards work by looking at the things that people are developing, and getting consensus between the different developers as to what they can agree on. They then have those developers go and implement what they are proposing, and it only gets into the standard after there are running implementations.

by definition this means that they encourage new, non-standard, things to be developed and deployed (they can't add something to the standard if it hasn't been deployed yet)

The problem isn't with the idea of enhancing things, it's with the idea that standards don't matter, nobody else matters, only develop for yourself and to #$% with everyone else.

log why the permission is denied

Posted Jan 20, 2013 1:31 UTC (Sun) by dvdeug (subscriber, #10998) [Link] (6 responses)

The ashheap of history is littered with *nix companies. And Microsoft is still out there. History's statement on the matter tells me that you've got to know when to hold them, know when to fold them, know when to walk away, and know when to run. Neither standards nor standard-free innovation is a guarantee of anything.

90% of companies and organizations fail quickly no matter what they do.

log why the permission is denied

Posted Jan 21, 2013 23:13 UTC (Mon) by cmccabe (guest, #60281) [Link] (5 responses)

Microsoft's Win32 uses simple integer error codes.

http://msdn.microsoft.com/en-us/library/cc231199.aspx

So this kind of rant is offtopic in more ways than one...

log why the permission is denied

Posted Jan 22, 2013 17:51 UTC (Tue) by ssmith32 (subscriber, #72404) [Link] (1 responses)

And windows has one of my favorite error codes/constant names!

ERROR_SUCCESS, of course :)

-stu

log why the permission is denied

Posted Jan 23, 2013 0:36 UTC (Wed) by marcH (subscriber, #57642) [Link]

... possibly brought by the same people who gave us the Start->Shut Down menu item.

PS: I thought Windows 7 got rid of the "Start" name but I just found it is still showing as a tooltip.

log why the permission is denied

Posted Jan 24, 2013 11:19 UTC (Thu) by sorokin (guest, #88478) [Link] (2 responses)

No. Since introduction of COM it uses IErrorInfo in addition to HRESULT.

log why the permission is denied

Posted Jan 30, 2013 5:53 UTC (Wed) by cmccabe (guest, #60281) [Link] (1 responses)

COM is a userspace thing, similar to DBus or CORBA. We are discussing kernel APIs here.

log why the permission is denied

Posted Jan 30, 2013 7:33 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Not really, COM is also used in kernel. IErrorInfo also supports marshalling across processes.

log why the permission is denied

Posted Jan 20, 2013 1:36 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> the ash-heap of history is littered by companies and organizations that have decided that everyone else was wrong, and they knew better and could design such a great system that everyone else would abandon what they use to jump on board.
And with even more companies that decided to "stick to standards" and stop innovating (e.g. basically all commercial UNIX vendors).

> You act as if the POSIX (and Single Unix Specification) standard is something handed down from on high that hasn't changed in 20 years.
Yep. Not much has changed in important areas, changes are mostly cosmetic (and yes, we've actually paid for copies of official POSIX standards).

For example, my another pet peeve - signals are useless for library writers because there's no mechanism to allocate/reserve them or to pass parameters to a signal handler.

> The last revision to POSIX and SUS took place within the last couple of years, and the next one will take place within the next few years.
Will it include cgroups, namespaces, kqueue? No?

> The problem isn't with the idea of enhancing things, it's with the idea that standards don't matter, nobody else matters, only develop for yourself and to #$% with everyone else.
And yet, the recent history shows us that this very attitude works. Most "community projects" end up dead after extensive bike-shedding flamewars.

log why the permission is denied

Posted Jan 20, 2013 14:17 UTC (Sun) by RobSeace (subscriber, #4435) [Link]

> For example, my another pet peeve - signals are useless for library
> writers because there's no mechanism to allocate/reserve them or to pass
> parameters to a signal handler.

You may wish to look into sigaction(SA_SIGINFO) and sigqueue() used with POSIX.1b real-time signals... That at least solves your second issue... As for your first, I'd think just using sigaction() to peek at the current handler would tell you if a signal is currently already in use or not...
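For readers unfamiliar with those calls, here is a small sketch of both suggestions: passing an integer to a handler with sigqueue() and SA_SIGINFO, and querying a signal's current disposition with sigaction(). The function names are invented for the example, and the use of SIGRTMIN in a single-threaded process is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t received_value = -1;

/* Three-argument handler, selected by the SA_SIGINFO flag. */
static void on_rt_signal(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    received_value = info->si_value.sival_int;  /* value sent by sigqueue() */
}

/* Returns 1 if `sig` has a non-default disposition, 0 if not, -1 on error. */
int signal_in_use(int sig)
{
    struct sigaction cur;

    if (sigaction(sig, NULL, &cur) == -1)   /* NULL new action: query only */
        return -1;
    if (cur.sa_flags & SA_SIGINFO)
        return 1;
    return cur.sa_handler != SIG_DFL;
}

/* Queue SIGRTMIN to ourselves carrying `value`; return what the handler saw. */
int send_value_to_self(int value)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    union sigval sv;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_rt_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, NULL) == -1)
        return -1;

    /* Block the signal so that it stays pending until sigsuspend() below. */
    sigemptyset(&block);
    sigaddset(&block, SIGRTMIN);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    sv.sival_int = value;
    if (sigqueue(getpid(), SIGRTMIN, sv) == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    wait_mask = old;
    sigdelset(&wait_mask, SIGRTMIN);
    sigsuspend(&wait_mask);                 /* returns after a handler has run */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return received_value;
}
```

Note that signal_in_use() only reports the disposition at the moment of the call; it does not reserve the signal, so two libraries could still race for the same number.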

log why the permission is denied

Posted Jan 20, 2013 1:00 UTC (Sun) by dvdeug (subscriber, #10998) [Link]

No, it would only require modifying applications that care to produce better error messages. Any solution that gets better error messages to the users is going to require modifying applications.

With all due respect to *BSD maintainers, I don't see it. Lots of stuff uses udev, despite it being Linux-only. If you want to provide decent error messages to the majority of your users, you'll support it; if you don't care, then you won't.

log why the permission is denied

Posted Jan 20, 2013 9:58 UTC (Sun) by epa (subscriber, #39769) [Link] (10 responses)

The idea that grepping through log files to diagnose a cryptic one-byte error is 'easy' is rationalizing the flaws in the current system to such an extent as to leave reality far behind.

Many of the original UNIX designers clearly did not share the extreme conservatism of some of their followers. Plan 9 introduced errstr, an error string set by system calls and maintained in parallel with the old errno. That seems like a simple and elegant solution which Linux could also adopt.

log why the permission is denied

Posted Jan 20, 2013 21:52 UTC (Sun) by dlang (guest, #313) [Link] (1 responses)

getting permission denied info in log files is useful for far more than "diagnosing cryptic one-byte error codes"

it lets the admin of the box see all the access that was denied. This can frequently identify 'bad actors' (unless they know the system intimately, they will have to poke around a bit before they find the hole they can get through)

And if you have a lot of permission denied errors, you would want to fix the software that's generating them to do something different.

all of this without any need to tie it in to a specific return code.

It happens to also give you a way to get more detail on the specific error (when you can tie the error to a specific time), and it nicely addresses the fact that you may not want the user to know all the details of why the permission was denied, but you do want to let the admin know.

log why the permission is denied

Posted Jan 21, 2013 5:53 UTC (Mon) by epa (subscriber, #39769) [Link]

It's not about 'the admin of the box' - that is no longer even a concept that makes sense in many use cases such as phones, or even a corporate environment where the IT department might have better things to do than investigate every 'permission denied' error returned to every user. It is about giving the application the details it needs to report a clear error message to the user. Otherwise why have error returns at all? Every syscall could just return 0 or 1, and if you want more info the administrator can easily grep the log files....

Returning meaningful error indicators to userspace does not preclude writing to a log file as well. In some cases, yes, security requires giving a terse 'permission denied' error with no further details. That situation is not the norm.

log why the permission is denied

Posted Jan 20, 2013 23:09 UTC (Sun) by skissane (subscriber, #38675) [Link] (6 responses)

Returning an error string has a problem - it doesn't internationalize well. I think it is better to define a catalog of error numbers; each error number has attached the number and types of allowed parameters and the English text. Additional files can contain translations to other languages. The kernel then just makes available to user-space a buffer containing the error code and its parameters - it is up to user space to do the message formatting. You'd need to make sure user space is using the same message catalog as the kernel - but that should not be too hard.
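A minimal sketch of that catalog idea follows, assuming that each message takes at most one string parameter. The error IDs, catalog contents, and function names are all invented for illustration; nothing like this exists in the kernel.

```c
#include <stdio.h>
#include <string.h>

/* One catalog entry: a numeric error ID and its message text, which may
 * contain a single "%s" placeholder for the parameter. */
struct catalog_entry {
    int         code;
    const char *text;
};

/* Hypothetical catalogs; a translation is just another table. */
const struct catalog_entry english_catalog[] = {
    { 1001, "File %s not found" },
    { 1002, "Access to %s denied by file mode" },
};

const struct catalog_entry german_catalog[] = {
    { 1001, "Datei %s nicht gefunden" },
};

/* What the kernel would export in this scheme: a code and its parameter. */
struct error_report {
    int         code;
    const char *arg;
};

/* User space does the formatting; the numeric ID is always included. */
int format_report(const struct catalog_entry *catalog, size_t n,
                  const struct error_report *r, char *buf, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        const char *slot;

        if (catalog[i].code != r->code)
            continue;

        slot = strstr(catalog[i].text, "%s");
        if (slot == NULL)
            return snprintf(buf, len, "[%d] %s", r->code, catalog[i].text);

        /* Splice the argument in by hand rather than passing catalog
         * text to printf() as a format string. */
        return snprintf(buf, len, "[%d] %.*s%s%s", r->code,
                        (int)(slot - catalog[i].text), catalog[i].text,
                        r->arg, slot + 2);
    }
    return snprintf(buf, len, "[%d] (no catalog entry)", r->code);
}
```

Because the numeric ID survives translation, a message shown in one language can still be matched to its entry in another.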

log why the permission is denied

Posted Jan 21, 2013 0:10 UTC (Mon) by ebiederm (subscriber, #35028) [Link] (2 responses)

If you are giving the English text you might as well use the English text as your key in your map for your lookups. People have been doing that successfully for 20+ years.

log why the permission is denied

Posted Jan 25, 2013 4:17 UTC (Fri) by skissane (subscriber, #38675) [Link] (1 responses)

Well, you can't just use the English text as-is. You actually have to use the English text with substitution variables, e.g. "File %s not found", and then pass the substitution variables separately - so passing a single string variable from user-space doesn't work very well, you'd need to do something like pass a structure containing the format string and the arguments to go with it...

And, while this approach is popular with e.g. the gettext API, I see a couple of drawbacks:

Assigning numeric error IDs helps: if a non-English user needs help from an English-speaking resource, the English-speaker can quickly identify the error if a numeric code is included; otherwise, off to Google translate.
What if the English message in the kernel has bad spelling or grammar? Well, you don't want to change it now, it's part of the user-space ABI. If the kernel were to pass a numeric error code to user space, the English text can be corrected much more easily.

log why the permission is denied

Posted Jan 25, 2013 4:25 UTC (Fri) by dlang (guest, #313) [Link]

actually google does amazingly well with just pasting in the full error text. I used to carefully modify the messages when searching for them, then I saw a co-worker just pasting in the full error and it sure saves time :-)

log why the permission is denied

Posted Jan 21, 2013 10:44 UTC (Mon) by micka (subscriber, #38720) [Link] (2 responses)

Internationalized error messages are a terrible thing when trying to debug by searching on the web.

I don't think english speaking people can really see what the problem is here : when the error message is by default internationalized, you can't "google" it, it will only return a handful of results, all of them by someone asking about the same problem you have, with no answers.

When you got this sort of error message and you must perform a search, you know you must reproduce the problem with i18n disabled. Sometimes it's as simple as

LANG=C <myprogram> <myparams>

but sometimes (I mostly have the problem when on windows, I don't know this system much and have no clue how i18n works there), that doesn't work.

The worst I have seen is the oracle database server (a really bad software anyway) giving you i18n'ed error messages ; you can't change the language on the client, you must do it on the server (if you are allowed to do so) !

log why the permission is denied

Posted Jan 21, 2013 11:06 UTC (Mon) by epa (subscriber, #39769) [Link]

I think the error string returned by the OS should be understandable by a programmer, but not necessarily shown to the user unchanged; it can be looked up in a translations table in a similar way to how errno is looked up. (There needs to be an agreed way to escape filenames, etc, so you don't get pathological cases when somebody creates a filename which looks like a fragment of error message.)

Any translated or 'friendly' message should be accompanied by a 'more details' button which gives the original string you can Google for.

log why the permission is denied

Posted Jan 25, 2013 4:07 UTC (Fri) by skissane (subscriber, #38675) [Link]

About Oracle RDBMS error messages, they all have numeric error codes, so even if the message text is non-English, you can just Google the code.

Your comment "you can't change the language on the client, you must do it on the server (if you are allowed to do so)" doesn't appear to be true:

SQL> select * from fred;
select * from fred
*
ERROR at line 1:
ORA-00942: table or view does not exist

SQL> alter session set nls_language = german;

Session altered.

SQL> select * from fred;
select * from fred
*
ERROR at line 1:
ORA-00942: Tabelle oder View nicht vorhanden

(Disclosure: I work for Oracle; these are my personal opinions, not my employer's.)

log why the permission is denied

Posted Feb 2, 2013 18:06 UTC (Sat) by quanstro (guest, #77996) [Link]

Nitpick: errstr, accessed by rerrstr(2) and werrstr(2), replaces
errno in Plan 9; they are not maintained in parallel.

log why the permission is denied

Posted Jan 21, 2013 8:40 UTC (Mon) by jorgegv (subscriber, #60484) [Link] (5 responses)

If you strace the simplest command, say 'ls -l', you'll find hundreds of system calls which return an error and are ignored by the 'ls' process. E.g.: trying to open the dozen or so possible locations of locale files, ld.so files, etc.

If all of those error-returning calls each get a line or two sent to syslog with the reasons for the denial, you'll fill your log partition pretty quickly, quite apart from the huge workload the syslog process would have.

The policy ('I want this error logged' or 'I don't want this error logged') belongs in userspace, not kernel.

log why the permission is denied

Posted Jan 21, 2013 9:07 UTC (Mon) by dlang (guest, #313) [Link] (4 responses)

syslog is userspace, and it can handle large volumes of logs and filter them efficiently ;-)

that said, if the logs are not written sanely, filtering them can be expensive, but given that we are talking about adding the logging now, we should be able to make this be something easy to filter.

log why the permission is denied

Posted Jan 21, 2013 10:16 UTC (Mon) by meuh (guest, #22042) [Link] (3 responses)

Why not introduce a non-POSIX, Linux specific way to turn on error logging for "critical" code path, the same way one can enable logging in Bourne shell: set -x, then disable after the "critical" code path: set +x.

Sadly, I'm not really convinced myself: even if there's only one POSIX function in the "critical" section, this call could be translated into multiple library function calls and syscalls. And this is going to create some annoyance when reading logs.

log why the permission is denied

Posted Jan 21, 2013 10:26 UTC (Mon) by meuh (guest, #22042) [Link] (2 responses)

And let's be funnier: add SIG_ERRNO, a signal triggered when a syscall returns an error code, and extend siginfo_t to hold the error description.
Then have the kernel export, as part of the VDSO, an error decoder library.
But to be expandable, a user space library might be better.

log why the permission is denied

Posted Jan 21, 2013 23:19 UTC (Mon) by cmccabe (guest, #60281) [Link] (1 responses)

Not enterprisey enough.

We need ErrnoKit, a DBUS-enabled daemon that sends XML messages to a Mono process, which logs them in a custom binary format to the GNOME3 registry. Then they can be retrieved by the client application through SOAP requests to a CORBA object broker.

log why the permission is denied

Posted Jan 22, 2013 9:17 UTC (Tue) by niner (guest, #26151) [Link]

You're missing the most important piece: the error messages need to get stored in the Cloud! There the localization can be crowd sourced and error message profiles be monetized. I think, we should put this project up on kickstarter immediately...

Making EPERM friendlier

Posted Jan 19, 2013 5:03 UTC (Sat) by luto (subscriber, #39314) [Link]

This sounds like the kind of change that needs to be done with care to avoid introducing new security issues. Crypto code that gives too detailed failure information is often completely broken [1], and I can imagine other places in the kernel where the ability to distinguish reasons for EPERM could be dangerous.

[1] http://en.wikipedia.org/wiki/Padding_oracle_attack

Making EPERM friendlier

Posted Jan 19, 2013 5:24 UTC (Sat) by josh (subscriber, #17465) [Link] (5 responses)

Perhaps this could also make another error case more friendly: running a program can mysteriously fail with ENOENT if its ELF interpreter doesn't exist, which proves mystifying at first since the program itself does exist.
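A rough user-space heuristic for this case can be sketched as follows. It is not the Fedora bash patch mentioned in this thread; `classify_exec_enoent()` and the enum are hypothetical names, and the check is racy since the file could appear or vanish between the failed exec and the test. The assumption is simply that if the path exists yet exec reported ENOENT, the missing file is most likely the interpreter (the ELF interpreter or a `#!` interpreter).

```c
#include <errno.h>
#include <unistd.h>

enum exec_enoent_cause {
    CAUSE_PROGRAM_MISSING,      /* the path itself does not exist */
    CAUSE_INTERPRETER_MISSING   /* path exists, so its interpreter is the likely culprit */
};

/* Call after execve(path, ...) has failed with errno == ENOENT. */
enum exec_enoent_cause classify_exec_enoent(const char *path)
{
    int saved_errno = errno;    /* access() may overwrite errno */
    enum exec_enoent_cause cause =
        access(path, F_OK) == 0 ? CAUSE_INTERPRETER_MISSING
                                : CAUSE_PROGRAM_MISSING;
    errno = saved_errno;        /* leave the caller's errno untouched */
    return cause;
}
```

A shell or launcher could use the result to print something like "required interpreter not found" instead of the bare "No such file or directory".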

Making EPERM friendlier

Posted Jan 19, 2013 11:23 UTC (Sat) by khim (subscriber, #9252) [Link]

It's only mystifying the very first time: after you spend a day or two trying to understand what's wrong with your file (redownload it, unpack it again, try to give it 777 permissions, etc.) and finally ask on a mailing list and get the answer "oh, well, it's obvious: you need the 32-bit compatibility subsystem; the file which is not found is actually /lib/ld-linux.so.2, not the file which you are trying to run"... it's frustrating enough that you remember that fiasco for a looong time.

Making EPERM friendlier

Posted Jan 20, 2013 8:17 UTC (Sun) by geofft (subscriber, #59789) [Link] (3 responses)

Fedora solves (-ish) this with a patch to bash to figure out if that's the cause of the ENOENT and give you a better error message. Other distros should pick up the patch.

Making EPERM friendlier

Posted Jan 21, 2013 9:36 UTC (Mon) by pabs (subscriber, #43278) [Link] (2 responses)

... or Fedora could send the patch upstream?

Making EPERM friendlier

Posted Jan 21, 2013 9:49 UTC (Mon) by rahulsundaram (subscriber, #21946) [Link]

Upstream knows and seems reluctant IIRC

Making EPERM friendlier

Posted Jan 21, 2013 10:49 UTC (Mon) by micka (subscriber, #38720) [Link]

I think they're trying, but doing something less hackish, not in bash but at a lower level. But they're mostly at RFC stage at the moment (see article above...)

Making EPERM friendlier

Posted Jan 19, 2013 10:39 UTC (Sat) by bokr (guest, #58369) [Link] (1 responses)

Will strace be updated to use/show the friendlier info?

Making EPERM friendlier

Posted Jan 21, 2013 19:21 UTC (Mon) by dave_malcolm (subscriber, #15013) [Link]

I hadn't thought of that when I wrote the proposal, but I like that idea, and I've added it to the "Scope" section of the wiki page. Thanks!

Making EPERM friendlier

Posted Jan 19, 2013 16:18 UTC (Sat) by apoelstra (subscriber, #75205) [Link]

I'm glad that kernel developers are taking this seriously. It's gotten to the point that the first response to a "permission denied" error is to start chmod'ing things to 777, disabling the LSM, running things as root, moving important paths to world-writable directories, or other extreme measures simply because there are too many things to check, and if you check them all carefully you might not disable the message (and therefore not find the cause).

People arguing about the security implications here should be arguing about how to secure their logs, not how to sanitize them.

Making EPERM friendlier

Posted Jan 19, 2013 20:05 UTC (Sat) by dkg (subscriber, #55359) [Link] (1 responses)

This is definitely potentially a two-edged sword. It's worth noting CVE-2013-0157 (aka debian bug 697464) is a recent and simple example of a way in which more-detailed error reporting causes a data leak that might not be acceptable on some systems.

I'm grateful to see the additional error reporting (i do think that obscure errors limit the usability of our systems) but there are some tricky tradeoffs that need to be balanced to do it right.

Making EPERM friendlier

Posted Jan 20, 2013 0:23 UTC (Sun) by dvdeug (subscriber, #10998) [Link]

Actually, that bug shows the same problem happening with
$ mount --guess-fstype /root/.ssh/../../dev/sda1
Even the error-reporting case only looks like an error-reporting problem, because with
$ mount /root/.ssh/../../dev/cdrom
mounting the cdrom confirms the existence of /root/.ssh as much as an error message would.

Making EPERM friendlier

Posted Jan 20, 2013 6:17 UTC (Sun) by wahern (subscriber, #37304) [Link] (3 responses)

This isn't about making error messages more friendly, this is about providing a hack because most software is broken. If software reported and recorded errors precisely when they happened, this wouldn't be an issue. The application making the request has much more relevant contextual information than the operating system does. And context that is inaccessible to the application is usually supposed to be that way for security. You're not supposed to know which policy blocked your request. (Things may be different if you're a sysadmin, but you want a separate, protected channel for that information.)

Yes, it's tedious to deal with errors early, but everything is tedious in C. It's the nature of the language. C isn't a RAD environment.

Let's not pretend that this proposal is a better errno. It's a work-around for broken software. It's equivalent to wrapping every error in an exception, and pretending that exceptions fix the tedium, instead of what they usually do--kick the can down the road.

Now, that doesn't make it a bad proposal, per se, just not what it's advertised as.

As for dealing with errno munging, the simplest answer is to not use errno. Capture the errno value immediately after a system call fails. And stop writing library APIs which write through errno; instead, return a friggin' int directly. Why people use kernel error reporting semantics as a prototype, I'll never understand. When I see application routines which return -1 to signal an error, I want to tear my hair out.
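The style argued for above might look like this in practice. `open_readonly()` is a hypothetical helper written for illustration: it captures errno immediately after the failing call and returns the code directly, so callers never have to read the global.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success or a positive errno value.
 * On success the descriptor is stored in *fd_out; on failure *fd_out
 * is left unchanged. */
int open_readonly(const char *path, int *fd_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;   /* captured before any other call can overwrite it */
    *fd_out = fd;
    return 0;
}
```

A caller writes `int err = open_readonly(path, &fd); if (err) ...` and can log, clean up, or call other functions without worrying about the value being overwritten.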

Making EPERM friendlier

Posted Jan 20, 2013 6:34 UTC (Sun) by dlang (guest, #313) [Link]

you are complaining about a completely different problem.

The problem that's being addressed here is that a well written application that is going to tell the user what went wrong only knows that permission was denied.

The application cannot provide any more information to the user, because it doesn't have the information.

And there is no sane way for the admin/support person to figure out _why_ the permission was denied.

In the 'old days', this was fairly simple, there was only one place to check (the rwx permissions).

However today, it's much harder.

When you don't give admins sane ways to figure out the cause of the permission problems within the context of the more complex security model, the result is going to be that admins disable the more complex security model. A secure system that doesn't get the job done is worthless.

Making EPERM friendlier

Posted Jan 20, 2013 9:40 UTC (Sun) by alankila (guest, #47141) [Link] (1 responses)

I know this take on the issue isn't going to be popular here, but I like exceptions as an error-reporting system. It is imho a hell of a lot better than a single integer. The information contained in the stack trace allows me to infer the state of the system and generally determine the point of failure, or the offending piece of code I wrote.

If only C, or the userspace-kernel API could have something like that...

Making EPERM friendlier

Posted Jan 21, 2013 8:40 UTC (Mon) by epa (subscriber, #39769) [Link]

An exception is essentially an alternative return value from the function. (The *implementation* of exception handling in languages such as C++ is something quite different, but that is not relevant here). I agree that it is great to have a structured object giving details on the error. However that may be a little ambitious - and in 20 years we might be having arguments about how the error details don't fit into the C structure defined way back in 2013. I suggest that a string, such as Plan 9's errstr, is a reasonable compromise between a single int (clearly inadequate) and a complex structured exception object (too complex to be widely adopted at the kernel level).
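A user-space approximation of the string approach can be sketched with a per-thread buffer. This is not Plan 9's implementation; `set_err_text()` and `get_err_text()` are hypothetical names, and the 128-byte limit is an arbitrary assumption.

```c
#include <stdarg.h>
#include <stdio.h>

/* One error string per thread, much as errno is per-thread. */
static _Thread_local char err_text[128];

/* Record a formatted error description for the calling thread.
 * Returns what vsnprintf() returns, so a value >= sizeof err_text
 * tells the caller that the message was truncated. */
int set_err_text(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(err_text, sizeof err_text, fmt, ap);
    va_end(ap);
    return n;
}

/* Read back the most recent description for the calling thread. */
const char *get_err_text(void)
{
    return err_text;
}
```

A failing routine would call something like `set_err_text("open %s: %s", path, reason)` before returning its error code, and the caller would fetch the text only if it wants more than the integer.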

Making EPERM (un)friendlier

Posted Jan 20, 2013 8:29 UTC (Sun) by akeane (guest, #85436) [Link] (1 responses)

POSIX returns too much information as it is, EPERM, ENOENT, ETHIS, ETHAT; it's too confusing!

The only jmp codes that should be popped off the stack should be EFAIL and SSUCCESS, computers are binary after all, if my processor doesn't need more than two states to let me play Doom, then I fail to see why the arrogant C library and POSIX standard should need more. It's complexity for its own sake...

Maybe as a compromise, some kind soul should hack the C library so it associates an "E" number with a NULL terminated series of bytes which can then be written to a terminal or dot matrix printer.

Making EPERM (un)friendlier

Posted Jan 25, 2013 19:36 UTC (Fri) by nix (subscriber, #2304) [Link]

Sounds like you want the Hurd, with its -EIEIO and -EGREGIOUS errnos. :)

Making EPERM friendlier

Posted Jan 20, 2013 14:08 UTC (Sun) by justincormack (subscriber, #70439) [Link] (4 responses)

I don't see why we can't just add more error codes. There are only about 132 error codes at the moment, and return values are 32 bit.

A high bit mask with more detailed information would suffice, so you get the traditional error code in the low 8 bits, then information about the error location in the kernel in other bits. The kernel could export a map of more detailed information, so you could match up (and document) the reasons.

Obviously this is a breaking change, so your binary might have to set some flag to get the extended bits from the kernel.
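The bit layout described above can be sketched as follows. The `ext_*` helpers and the 8/24 split are assumptions made for illustration; no such encoding exists in Linux.

```c
#include <errno.h>
#include <stdint.h>

#define EXT_ERRNO_MASK   0xffu   /* low 8 bits: traditional errno */
#define EXT_DETAIL_SHIFT 8       /* remaining bits: hypothetical detail code */

/* Combine a traditional errno with a detail code. */
static inline uint32_t ext_pack(int err, uint32_t detail)
{
    return (detail << EXT_DETAIL_SHIFT) | ((uint32_t)err & EXT_ERRNO_MASK);
}

/* Recover the traditional errno value. */
static inline int ext_errno(uint32_t value)
{
    return (int)(value & EXT_ERRNO_MASK);
}

/* Recover the detail code. */
static inline uint32_t ext_detail(uint32_t value)
{
    return value >> EXT_DETAIL_SHIFT;
}
```

Note that `ext_pack(EPERM, 42)` is not equal to `EPERM`, so any code that compares the raw value against an errno constant stops matching; that is why the extended bits would have to be opt-in.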

Making EPERM friendlier

Posted Jan 20, 2013 15:44 UTC (Sun) by andreasb (guest, #80258) [Link] (1 responses)

Setting a flag wouldn't work, I think. The main application might set the flag, but some library it has linked in might also make a syscall and get confused by the errno values.

Making EPERM friendlier

Posted Jan 20, 2013 18:59 UTC (Sun) by akeane (guest, #85436) [Link]

It would if the flag was a #define directive the C lib header files picked up and you had a set of parallel syscalls in the kernel, behold:

_open is the normal one
__open_ret_32 is the OMG MORE ERROR CODEZ!!!

So, you need a set of extra syscalls in the kernel to add more info to the ret value, (luckily this will add even more lines of code and complexity to the kernel, what could go wrong? yay!)

and a switch in the C lib:

cc -o my_earthly_soul p_audio.cs -DOMGMOREERRNOSSUCKA

But this is assuming that anybody actually goes around checking error codes in this modern era; no one really bothers anyway; if it's a real problem and not just the kernel nagging at you then something else will break properly later on and you get a nice SEGV which you can blame on a third party device driver.

It also becomes increasingly difficult to add additional lines of error checking code when you reach a certain age, and your monocle has seen better days (also you waste valuable bytes on your winchester disk)

I stand by my assertion that only two ERR codes are necessary in your typical unix warez:

fd = open("~/Music/a-dreadful-din.mp1");

#ifdef YOUNG_PERSON
if(fd == E:-) )
return cool;

if(fd == E:-( ))
return opens_gonna_hate;
#endif

#ifdef MOI

if(fd == EAKEANE) /* Clearly a measure of success */
{
/* Remove rubbish modern so-called "music" */
unlink("~/Music");

/* Check for errors from unlink? nah... */
return heh!;
}

if(fd == EGETOFFMYLAWN)
unlink("~"); /* There's probably some bad music there somewhere */

#endif

Making EPERM friendlier

Posted Jan 20, 2013 19:03 UTC (Sun) by jreiser (subscriber, #11027) [Link] (1 responses)

The current scheme used by the linux kernel for error return codes from a syscall allows only 4095 error codes (0xFFF...F001 through 0xFFF...FFFF) because any other bit pattern could be a legitimate non-error return value.
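The range check described above can be sketched like this. The bound of 4095 matches the kernel's `MAX_ERRNO`; the helper names `raw_ret_is_error()` and `raw_ret_to_errno()` are made up for this example.

```c
#include <stdbool.h>

#define MAX_ERRNO 4095   /* same bound the kernel uses */

/* A raw syscall return value is an error only if it lies in
 * [-4095, -1]; viewed as unsigned, those are the top 4095 values. */
static bool raw_ret_is_error(long ret)
{
    return (unsigned long)ret >= (unsigned long)-MAX_ERRNO;
}

/* Convert an error return (already checked) to a positive errno. */
static int raw_ret_to_errno(long ret)
{
    return (int)-ret;
}
```

Anything outside that window, including large "negative-looking" values such as high addresses returned by mmap(), is treated as a successful result, which is why the error space cannot simply be widened.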

Making EPERM friendlier

Posted Jan 20, 2013 20:02 UTC (Sun) by justincormack (subscriber, #70439) [Link]

Good point. That's probably still enough to distinguish all the error points for a particular syscall, but it is more of a squeeze.

A solved problem

Posted Jan 20, 2013 23:50 UTC (Sun) by imunsie (guest, #68550) [Link] (2 responses)

This problem is already solved by Peter Miller's libexplain:

http://libexplain.sourceforge.net/

A solved problem

Posted Jan 21, 2013 0:15 UTC (Mon) by ebiederm (subscriber, #35028) [Link]

Thanks for the link. I don't know if it fully solves the problem but that certainly looks like the proper form of a solution.

A solved problem

Posted Jan 21, 2013 21:59 UTC (Mon) by PaulWay (guest, #45600) [Link]

Heh - I see our minds think alike, Ian :-)

Paul

Making EPERM friendlier

Posted Jan 21, 2013 2:18 UTC (Mon) by PaulWay (guest, #45600) [Link]

It would be an error (heh) to not mention libexplain in this conversation - http://libexplain.sourceforge.net/. Peter has done a huge amount of work in trying to back-track why problems have occurred and explain the problem intelligibly to the user.

I don't think this is the basis of any in-Kernel expanded messaging, but I do think that the knowledge that Peter has picked up and put in libexplain of why things go wrong and what various codes mean is a useful reference when trying to build a system that improves on the current error reporting.

Hope this helps,

Paul

Making EPERM friendlier

Posted Jan 21, 2013 10:17 UTC (Mon) by etienne (guest, #25256) [Link] (6 responses)

I do not get why there is a problem.
Right now errno is an address of an errno_t area of memory.
Why not increase a bit the area to write:
- a signature confirming the extended errno_t
- the size of this errno_t
- what service created the error
- a better description of the error
- a serial number?
It would be fully backward compatible.

Making EPERM friendlier

Posted Jan 21, 2013 10:24 UTC (Mon) by mpr22 (subscriber, #60784) [Link] (5 responses)

Congratulations, you just broke switch (errno) { /* ... */ }.

Making EPERM friendlier

Posted Jan 21, 2013 11:45 UTC (Mon) by johill (subscriber, #25196) [Link] (4 responses)

Not really. As far as I understand he's basically saying

struct ext_err_no {
  int code;  /* the traditional errno value */
  // ... extended info ...
};

struct ext_err_no errno_storage;

/* errno must expand to a modifiable int lvalue, not a pointer */
#define errno (errno_storage.code)

(I'd guess this could be made to work and still be compliant)

The question of course is how to determine that the extended info is there?

Making EPERM friendlier

Posted Jan 21, 2013 18:47 UTC (Mon) by dtlin (subscriber, #36537) [Link] (3 responses)

I'm pretty sure etienne was not suggesting that because, as the article explains, this breaks existing programs.

do_something_that_might_fail();
{ /* maybe inside an interrupt, logging routine,
   * or anything else that happens between where
   * the error occurs and its consumer */
  errno_t saved_errno = errno;
  do_something_else();  /* might change errno */
  errno = saved_errno;  /* so put errno back */
}

Now you've just clobbered a single field in errno_storage, which might have been saving information from a different call. If you make the variable errno encompass the entire extended storage space, then that code is fine, but then you can't treat it as an int, which is certainly used in many places too.
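The clobbering can be made concrete with a small sketch. Everything here is hypothetical: the struct layout, the `SRC_*` values, and the `my_errno` macro stand in for an extended errno that does not exist.

```c
#include <errno.h>

/* Hypothetical "which subsystem refused the request" values. */
enum err_source { SRC_NONE, SRC_VFS, SRC_LSM };

/* Hypothetical extended storage: the integer plus one extra field. */
struct ext_err_no {
    int code;               /* what plain errno would expose */
    enum err_source source; /* extended information */
};

static struct ext_err_no errno_storage;
#define my_errno (errno_storage.code)

/* Stand-in for a failing call that fills in both fields. */
static void fail(int code, enum err_source source)
{
    errno_storage.code = code;
    errno_storage.source = source;
}

/* Existing code that only knows about the integer. */
static void legacy_save_restore(void)
{
    int saved = my_errno;
    fail(ENOENT, SRC_VFS);  /* an unrelated call fails in between */
    my_errno = saved;       /* restores the code, but not the source */
}
```

After `fail(EACCES, SRC_LSM)` followed by `legacy_save_restore()`, the code reads EACCES again but the source still says SRC_VFS, so the extended information now describes a different failure than the integer does.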

Making EPERM friendlier

Posted Jan 21, 2013 18:56 UTC (Mon) by apoelstra (subscriber, #75205) [Link] (2 responses)

Suppose you took errno to be an index into a giant array of extended error structures. Then that should be okay?

Making EPERM friendlier

Posted Jan 22, 2013 7:22 UTC (Tue) by itvirta (guest, #49997) [Link] (1 responses)

That would only work if the two functions returned distinct error codes; if they both return EPERM, then the same problem appears.

Btw, this is the first I've heard of errno_t, apparently on the systems I checked, errno is defined as just (extern) int errno. Where does errno_t come from?

Making EPERM friendlier

Posted Jan 22, 2013 14:00 UTC (Tue) by etienne (guest, #25256) [Link]

> Where does errno_t come from?

I do not remember where I have seen it first, probably someone defined it locally when going from 32 bits int to 64 bits int, but a bit of internet search leads to:
In the world of Standard C, the type 'errno_t' is defined by TR24731-1 (see http://stackoverflow.com/questions/372980/ for more information) and you have to 'activate it' by defining '__STDC_WANT_LIB_EXT1__'.

Note that increasing the size of the memory referenced by errno is fully backward compatible with already compiled software. My comment was a bit early on Monday morning: to activate a "bigger" errno you would need to define something like '__STDC_WANT_BIG_ERRNO__' before including "errno.h", and check a 32-bit signature just after the old standard code in case you are running on a libc which does not provide the big errno.

Making EPERM friendlier

Posted Jan 22, 2013 0:00 UTC (Tue) by marcH (subscriber, #57642) [Link] (1 responses)

Think errno sucks? Then just consider firewalls for a second.

Making EPERM friendlier

Posted Jan 23, 2013 9:37 UTC (Wed) by ymmv (subscriber, #4375) [Link]

Parents do as well.

Never had a detailed answer when I pretty please asked for a flame thrower.

Making EPERM friendlier

Posted Jan 22, 2013 11:51 UTC (Tue) by vonbrand (subscriber, #4458) [Link] (2 responses)

If errno is an integer, why not just use the full range? I.e., low 8 bits contain "traditional" errno, high 24 bits (uint32_t should be plenty... famous last words) contain details for whoever is groping for them. Frob perror(3) to use the full range on some feature macro, i.e., LINUX_VERSION >= 0x030900. Or am I missing something critical here?

Making EPERM friendlier

Posted Jan 22, 2013 12:00 UTC (Tue) by andresfreund (subscriber, #69562) [Link]

Heaps of existing switch(errno) kind of code?

Making EPERM friendlier

Posted Jan 23, 2013 1:57 UTC (Wed) by gdt (subscriber, #6284) [Link]

The critical thing you are missing is that a single error code is part of the problem.

What an application wants to know is: what happened, what should I tell the user, and what should I do next?

Take what to do next: Does it matter? Should you loop around and retry? Should you back up to the file selection interface and retry? Should you terminate cleanly (ie, telling the user WTF just happened, at the cost of more resources)? Is it so severe you should throw up your hands and exit uncleanly to give the system the best chance to bounce back?

If you go and add a bazillion more integer values then the "what to do next" problem becomes a bazillion times harder. One thing which a redesign should do is to stop the overloading of errno with meaning.
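The "what to do next" decision is typically hand-coded today as a mapping like the one below. The three categories and the assignment of codes to them are assumptions chosen for illustration; real applications draw these lines differently.

```c
#include <errno.h>

enum next_step {
    STEP_RETRY,     /* transient: loop around and try again */
    STEP_ASK_USER,  /* go back to the user, e.g. the file selection interface */
    STEP_GIVE_UP    /* report the error and terminate cleanly */
};

/* Map an errno value to a recovery action. */
enum next_step next_step_for(int err)
{
    switch (err) {
    case EINTR:
    case EAGAIN:
        return STEP_RETRY;
    case ENOENT:
    case EACCES:
    case EPERM:
        return STEP_ASK_USER;
    default:
        return STEP_GIVE_UP;
    }
}
```

Every newly introduced error value lands in the `default` branch until someone decides where it belongs, which illustrates why merely adding more integers makes this mapping harder to maintain.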

Making EPERM friendlier, even for C++ programs

Posted Jan 22, 2013 20:33 UTC (Tue) by scripter (subscriber, #2654) [Link] (4 responses)

While I welcome better error messages, it reminds me that there are challenges getting at the root cause in languages besides C.

For example, C++ streams haven't provided programmers with a standardized way to get at errno/strerror. There's a proposal to fix it:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n...

For g++, there's a non-standard workaround: use C calls and convert the file descriptor into a C++ stream using __gnu_cxx::stdio_filebuf<char>:

http://stackoverflow.com/questions/2746168

It's also nice to get good error feedback in higher-level languages like Java.

I get frustrated when applications swallow error messages and provide high-level "Something went wrong" messages because it makes it difficult to find and fix the root problem. I suppose that's why tools like strace and ltrace are so useful.

Making EPERM friendlier, even for C++ programs

Posted Feb 2, 2013 12:43 UTC (Sat) by MrWim (subscriber, #47432) [Link] (3 responses)

This is also a frustration of mine. Most exception propagation schemes make only two options easy:

- Allow the exception to propagate (e.g. "cannot convert 'abc' to int"), so that the UI can tell the user exactly that but nothing more about the context.
- Catch all exceptions and throw another one with context but not the underlying cause (e.g. "Loading simulation failed").

It is far too difficult to provide an error message like "Loading simulation failed because cell B74 of sim.csv contains 'abc' when it should contain a number". It would be nice if it were possible to attach more and more context to an exception as it propagates up the stack. Java has exception.getCause() and C++'s boost::exception has the ability to attach more data to an exception.

I prefer boost's approach but it still sucks as:

- You still need a try...catch block. It would be much nicer to have some sort of RAII style context built up on the stack which would be unrolled.
- To use it all the code you call must throw boost exceptions.
- It is difficult to serialize/deserialize if you want to transport it between threads.

I don't know if other languages have solved this in a nicer way.

Making EPERM friendlier, even for C++ programs

Posted Feb 2, 2013 23:22 UTC (Sat) by etienne (guest, #25256) [Link] (1 responses)

Maybe you want something like the lower quartet/byte indicate the error, the function up the stack adds its quartet/byte (shifted by 8 bits) to tell when that error happens, and go up for next quartet/byte. The problem then is to document all the error codes you show to the final user...

Making EPERM friendlier, even for C++ programs

Posted Feb 3, 2013 0:37 UTC (Sun) by nix (subscriber, #2304) [Link]

That doesn't let you include more than a few functions' worth of info, and *also* doesn't let you include a string with variable components.

Doing this gets even more painful when you consider localization. Even GNU gettext, with its support for %s-style elements whose order depends on language, would have trouble here, I fear.

Making EPERM friendlier, even for C++ programs

Posted Feb 3, 2013 19:16 UTC (Sun) by MrWim (subscriber, #47432) [Link]

Not relevant for this discussion perhaps but maybe such a system would be possible in C++. See my code-sketch: https://gist.github.com/4697481 . Even so it would probably be difficult to make this efficient enough and malloc failure friendly that people would actually want to use it.