Toward race-free process signaling

Posted Dec 7, 2018 8:20 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)
In reply to: Toward race-free process signaling by epa
Parent article: Toward race-free process signaling

This won't work. Or more precisely, it'll have all the same drawbacks.

Consider the current use-case:
- list processes
- get process pid
- kill process by pid

The new use-case will be:
- list processes
- get process pid
- get long pid by pid
- kill process by long pid

The race condition is still there: by the time you look up the long pid, the short pid may already belong to a different process. You'll need to fix all the APIs to use long pids in the first place.
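The current "list, get pid, kill" sequence can be sketched in Python on Linux. The helpers below are hypothetical names that scan /proc/<pid>/comm; the docstring of kill_by_name marks the window in which a pid can be recycled. Inserting a "get long pid by pid" step inside that window would not close it.

```python
import os
import signal


def find_pids_by_name(name):
    """Steps 1 and 2: list processes and collect pids whose comm matches."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited while we were scanning.
            continue
    return pids


def kill_by_name(name, sig=signal.SIGTERM):
    """Step 3: signal every matching pid.

    RACE: between find_pids_by_name() and os.kill(), a matching process may
    exit and its pid may be reused by an unrelated process, which would then
    receive the signal instead.
    """
    killed = []
    for pid in find_pids_by_name(name):
        try:
            os.kill(pid, sig)
            killed.append(pid)
        except ProcessLookupError:
            pass  # already gone
    return killed
```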

Posted Dec 7, 2018 9:02 UTC (Fri) by epa (subscriber, #39769)

Indeed, every system call that takes or returns a pid will need a version that uses a long pid instead. So the new fork() will return the long pid directly, and so on. There is no need for a separate, race-prone lookup from short pid to long pid (which is not a 1-1 mapping anyway).

Posted Dec 7, 2018 9:05 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

But at that point it makes sense to just use file descriptors instead of long pids. File descriptors are way better for many reasons: they can be securely sent over Unix sockets, they can be inherited by subprocesses, and so on.
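Sending a descriptor over a Unix socket uses SCM_RIGHTS ancillary data. A minimal Python (3.9+) sketch follows; send_fd and recv_fd are hypothetical helper names, and a pipe stands in for a process descriptor because the mechanism is the same for any fd. The receiver gets a new fd number that refers to the same open file.

```python
import socket


def send_fd(sock, fd):
    """Send one file descriptor over a connected AF_UNIX socket."""
    # At least one byte of ordinary data must accompany the ancillary data.
    socket.send_fds(sock, [b"F"], [fd])


def recv_fd(sock):
    """Receive one file descriptor; the kernel installs it under a new number."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]
```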

Posted Dec 7, 2018 11:22 UTC (Fri) by epa (subscriber, #39769)

I guess the only thing you can't do with a file descriptor is type it in on the command line. So shell scripts and the like would still be prone to race conditions.

(A numeric 64-bit pid can be sent over sockets and told to a subprocess, of course.)

Posted Dec 7, 2018 14:01 UTC (Fri) by ebiederm (subscriber, #35028)

A 64-bit pid is long enough that you can't reliably type it on the command line; even 32 bits are a problem.

This is part of the reason why pids, which have a 32-bit type, are limited to 16 bits by default.

Posted Dec 7, 2018 17:18 UTC (Fri) by smurf (subscriber, #17840)

These days, that's the only reason to do this. People not running 10-year-old code are unlikely to be affected by >16-bit PIDs.

I habitually set maxpid to 99999. Anything unlikely to run more than 1000 processes, like a Raspberry Pi, gets 9999.
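On Linux the knob behind "maxpid" is kernel.pid_max (set, for example, with sysctl -w kernel.pid_max=99999 as root); its value is one more than the largest pid the kernel will allocate. A small Python sketch for checking how many digits the current setting allows; read_pid_max and max_pid_digits are hypothetical helpers.

```python
def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the kernel's current pid_max; pids are allocated below it."""
    with open(path) as f:
        return int(f.read().strip())


def max_pid_digits(pid_max):
    """Decimal digits in the largest pid the kernel can hand out."""
    return len(str(pid_max - 1))
```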

Posted Dec 7, 2018 17:53 UTC (Fri) by zdzichu (subscriber, #17118)

I think the bigger the maxpid, the better (and safer). Short pids encourage manual typing, which is error-prone. Big pids more or less force copy-pasting, which is safer (modulo pid reuse).

Posted Dec 7, 2018 19:46 UTC (Fri) by epa (subscriber, #39769)

Sorry for being unclear. I didn’t mean literally typing in the number (I would cut and paste anyway). I was illustrating the general point that a process id is just a number, with no special magic, and can be handled by any programming language including shell scripts. It can be saved to a file, passed on the command line, even sent over TCP/IP if necessary.

Existing code which works with 15-bit process ids could normally work on 64-bit ones with no change, or at most a change of type from int to long in strongly typed languages. File descriptors are great, but they form their own closed world and need a new set of APIs. They cannot just be treated as an opaque number or a string of text.

Posted May 6, 2019 3:02 UTC (Mon) by cyphar (subscriber, #110703)

In many cases, /proc/self/fd/... is a neat way to "type an fd on the command-line".
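As a sketch of that trick: a program that only accepts file names can be handed an already-open descriptor by passing it through at the same number and naming it /proc/self/fd/N. run_with_fd_path is a hypothetical helper; Linux and Python 3 are assumed.

```python
import subprocess


def run_with_fd_path(argv_prefix, fd):
    """Run a command that takes a file *name*, handing it an open fd instead.

    pass_fds keeps the fd at the same number in the child, so
    /proc/self/fd/N inside the child names the very same open file.
    """
    path = f"/proc/self/fd/{fd}"
    return subprocess.run(argv_prefix + [path], pass_fds=(fd,),
                          capture_output=True, check=True)
```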

Posted May 6, 2019 12:22 UTC (Mon) by smurf (subscriber, #17840)

Your favorite shell's autocomplete mechanism should be able to understand PIDs too.

It's still somewhat dangerous to actually use that, though. The probability that mistyping the first four digits and pressing TAB gives you an entirely unrelated process shouldn't be underestimated.

Posted Dec 7, 2018 16:03 UTC (Fri) by dw (subscriber, #12017)

My understanding is that this is an attempt to fix an edge case in code that does not keep track of its own children correctly. The problem is one of:

1) Child exits, crap parent kills unrelated process because it wasn't paying attention
2) /etc/init.d/postfox stop, crap init script kills unrelated process due to stale PID file.

No solution presented thus far actually solves case 1). The old API will continue to exist in perpetuity, and any new API will only ever see limited uptake, due to portability concerns or simple lack of effort to port everything over. There is a limit to the value of any solution, because it is unlikely to see wholesale uptake. A simple solution therefore seems preferable.

The file descriptor solution does not meaningfully solve case 2): there is still a race when the init script opens /proc/blah/pid, and it must somehow verify that the descriptor it received refers to the daemon it is trying to kill, so some "is this really the process I want?" code is still necessary.

The FD solution also creates a world of security pain that doesn't match the typical UNIX file model, because the kernel object in question can change its security identity over time (for example, when the process execs a setuid binary).

The cookie-based solution does not entail updating every single API: the original problem is only about signal delivery, and thus only affects kill() and possibly clone().

A cookie-based solution allows the identifier to persist on disk easily. Consider two new system calls:

- pid_to_handle(pid_t pid, struct pid_handle *handle) -- accepts pid==0 or pid==child pid. In the 0 case, the handle of the current process is returned. Otherwise, the call returns -1 if pid is not a child of the current process.

- kill_by_handle(pid_t pid, struct pid_handle *handle, int sig) -- works identically to kill(), except that the handle must match. No other restriction is placed on the caller.

After calling clone(), the parent uses pid_to_handle() to retrieve the handle before calling waitpid(). For daemonizing processes, the child must invoke it on itself, since any handle the parent could receive would be for the intermediate daemonizing process, which exits almost immediately.
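To make the cookie idea concrete, here is a toy user-space model in Python of the two proposed calls. Nothing here is a real kernel interface: ToyPidTable, PidHandle and the cookie counter are hypothetical. The point is that a (pid, cookie) pair is just two integers, so it can be written to a PID file, yet it stops matching as soon as the pid is recycled.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class PidHandle:
    pid: int
    cookie: int  # hypothetical: unique for each process lifetime


class ToyPidTable:
    """User-space model of the proposed pid_to_handle()/kill_by_handle()."""

    def __init__(self, pid_max=4):
        self.pid_max = pid_max
        self.live = {}      # pid -> (cookie, parent pid)
        self.signals = []   # (pid, sig) pairs actually delivered
        self._cookies = itertools.count(1)
        self._next = 1

    def spawn(self, parent=0):
        # Allocate the next free pid, wrapping around like real pids do.
        for _ in range(self.pid_max):
            pid = self._next
            self._next = self._next % self.pid_max + 1
            if pid not in self.live:
                self.live[pid] = (next(self._cookies), parent)
                return pid
        raise RuntimeError("pid space exhausted")

    def exit(self, pid):
        del self.live[pid]

    def pid_to_handle(self, caller, pid):
        if pid == 0:
            pid = caller
        elif pid not in self.live or self.live[pid][1] != caller:
            return None  # the proposal returns -1: not a child of the caller
        return PidHandle(pid, self.live[pid][0])

    def kill_by_handle(self, handle, sig):
        entry = self.live.get(handle.pid)
        if entry is None or entry[0] != handle.cookie:
            return -1  # stale handle: pid gone or reused, nothing signalled
        self.signals.append((handle.pid, sig))
        return 0
```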

Posted Dec 7, 2018 17:30 UTC (Fri) by smurf (subscriber, #17840)

> A cookie-based solution allows the identifier to persist on disk easily.

A pid-plus-verifiable-identifier approach solves this problem just as well.

> /etc/init.d/postfox stop, crap init script kills unrelated process due to stale PID file.

This is why sane init systems tend to not use PID files.

> After calling clone(), pid_to_handle() is used by the parent prior to waitpid() to retrieve the handle.

This entails a race. You should not be required to assume that your thread is the only one calling waitpid(-1). clone2() needs to return the handle atomically.

Posted Dec 7, 2018 17:37 UTC (Fri) by dw (subscriber, #12017)

> This entails a race. You should not be required to assume that your thread is the only one calling waitpid(-1). clone2() needs to return the handle atomically.

That's a fair point, but multi-threaded software with competing threads calling waitpid(-1) is no less buggy IMHO than software with competing threads, say, closing random file descriptors or creating new ones without CLOEXEC: the problem is simply moved. It's just one of many single-thread-centric interfaces an MT app must give up. In particular, it is a class of problem that is not fixed by modifying clone(); an MT app exhibiting this behaviour has bigger problems than race-free child signalling.

Posted Dec 8, 2018 2:12 UTC (Sat) by wahern (subscriber, #37304)

> > /etc/init.d/postfox stop, crap init script kills unrelated process due to stale PID file.
>
> This is why sane init systems tend to not use PID files.

If the service takes a POSIX lock on the PID file (rather than merely writing its PID into it), the PID can be queried atomically. You can't *use* it atomically, but that's because the only way to atomically send a signal to an individual process is if you're its parent and aren't using SA_NOCLDWAIT.
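A sketch of the lock-based query in Python: the service keeps a whole-file fcntl() lock on its PID file, and the manager asks the kernel with F_GETLK which pid holds it. The struct flock layout below is an assumption valid for 64-bit Linux (x86-64, arm64); hold_pidfile_lock, lock_holder and Flock are hypothetical names. Note the classic POSIX-lock pitfall: closing any fd for the file in the holding process drops its lock.

```python
import ctypes
import fcntl
import os


class Flock(ctypes.Structure):
    # ASSUMPTION: struct flock layout on 64-bit Linux (x86-64, arm64).
    _fields_ = [
        ("l_type", ctypes.c_short),
        ("l_whence", ctypes.c_short),
        ("l_start", ctypes.c_int64),
        ("l_len", ctypes.c_int64),
        ("l_pid", ctypes.c_int32),
    ]


def hold_pidfile_lock(path):
    """Service side: write our pid and hold a whole-file POSIX lock.

    Keep the returned fd open for the life of the service.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fails if already held
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


def lock_holder(path):
    """Manager side: return the pid holding the lock, or None.

    Call this from a process other than the holder. The answer is atomic,
    but by the time you act on it the holder may already have exited.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        query = Flock(l_type=fcntl.F_WRLCK, l_whence=os.SEEK_SET,
                      l_start=0, l_len=0)  # l_len=0: through end of file
        raw = fcntl.fcntl(fd, fcntl.F_GETLK, bytes(query))
        result = Flock.from_buffer_copy(raw)
        return None if result.l_type == fcntl.F_UNLCK else result.l_pid
    finally:
        os.close(fd)
```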

If the child disassociates from the service manager, then you need to rely on either process groups or cgroups. While process groups are atomic (a beneficial inheritance from legacy TTY and batch job management), the cgroups approach still involves reading PIDs from a file, which has the same TOCTTOU race.
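The process-group variant can be sketched in Python: start the service as leader of a new session and process group, then signal the whole group in one kill(-pgid) call. start_in_own_group and signal_group are hypothetical helper names.

```python
import os
import signal
import subprocess


def start_in_own_group(argv, **popen_kwargs):
    """Start a service as the leader of a new session and process group.

    Descendants inherit the process group unless they call setsid() or
    setpgid() themselves, which is the disassociation case described above.
    """
    return subprocess.Popen(argv, start_new_session=True, **popen_kwargs)


def signal_group(proc, sig=signal.SIGTERM):
    """Signal every member of the group with one kill(-pgid, sig) call.

    The pgid equals the leader's pid, and the kernel does not hand that
    number to a new process while the unreaped leader or any other group
    member still exists.
    """
    os.killpg(proc.pid, sig)
```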

Basically, on Linux I think it's still impossible to write a service manager that isn't susceptible to the classic PID file race while also being able to accurately signal individual wayward processes. (And to be fair, I don't think it's possible on any other Unix-like system, at least not using published and supported interfaces.) You could use cgroups and PID namespaces to minimize collateral damage, but it's still fundamentally a hack. You could use a seccomp policy to prevent disassociation from the process group, but you still couldn't target *individual* processes in the group.

To safely signal individual processes there's really no substitute for process descriptors. A larger PID namespace that doesn't recycle PIDs isn't any better, even as an expediency: in both cases you still need to add a bevy of new syscalls and additional bookkeeping in the kernel. While PIDs may seem easier to use from the shell, the shell is perfectly capable of juggling and passing around descriptors (e.g. exec 8</proc/PID). Nor is the necessary kernel bookkeeping any smaller for wider PIDs, because, as with the shell, all the infrastructure for descriptors already exists and is easily applied. The benefit of descriptors, however, is that they give processes a handle to query process state, like exit status, as well as a channel for reliable delivery of lifetime events (e.g. fork), so that a service manager could manage process trees in a straightforward, race-free manner. That may not happen immediately, but if you're going to add new syscalls, why pick the dead-end solution?

Posted Dec 8, 2018 2:41 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

> Basically, on Linux I think it's still impossible to write a service manager that isn't susceptible to the classic PID file race while also being able to accurately signal individual wayward processes.

You can do that with cgroups, but it does require some trickery:
- Put the process in a cgroup.
- SIGSTOP it.
- Inspect the cgroup to make sure the process is still the correct one.
- Send the signal.
- SIGCONT it.
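The steps above can be sketched in Python under the assumption that "the correct one" is judged by comparing /proc/<pid>/cgroup with an expected value; signal_if_in_cgroup and read_cgroup are hypothetical names. Step 1 still signals a bare pid, so this narrows the race rather than removing it.

```python
import os
import signal


def read_cgroup(pid):
    """Return the cgroup membership lines for pid from /proc/<pid>/cgroup."""
    with open(f"/proc/{pid}/cgroup") as f:
        return f.read()


def signal_if_in_cgroup(pid, expected_cgroup, sig):
    """Stop, verify, signal, continue. Returns True if sig was sent."""
    os.kill(pid, signal.SIGSTOP)                 # 1. stop whoever has this pid
    try:
        if read_cgroup(pid) != expected_cgroup:  # 2. is it still ours?
            return False
        os.kill(pid, sig)                        # 3. deliver the real signal
        return True
    finally:
        try:
            os.kill(pid, signal.SIGCONT)         # 4. let it run again
        except ProcessLookupError:
            pass  # it already exited and was reaped
```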

Posted Dec 8, 2018 2:43 UTC (Sat) by dw (subscriber, #12017)

If you're willing to risk sending SIGSTOP to a random process, as done here, there is no value to cgroups or indeed any API change whatsoever.

Posted Dec 8, 2018 8:39 UTC (Sat) by nopsled (guest, #129072)

No need for SIGSTOP or anything else; just use the freezer (which is coming to cgroup v2; patches have already been posted).
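For reference, the cgroup v2 freezer that later shipped (Linux 5.2) is driven through two files in the group's directory: writing 1 to cgroup.freeze requests the freeze, and cgroup.events shows "frozen 1" once it has taken effect. A Python sketch, assuming cgroup v2 is mounted and the caller may write to the group; freeze_cgroup and thaw_cgroup are hypothetical helpers.

```python
import os
import time


def freeze_cgroup(cgroup_dir, timeout=5.0):
    """Request a freeze and wait until cgroup.events reports "frozen 1".

    A real manager could wait for a file-modified event on cgroup.events
    instead of sleeping in a loop.
    """
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write("1")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open(os.path.join(cgroup_dir, "cgroup.events")) as f:
            if "frozen 1" in f.read().splitlines():
                return True
        time.sleep(0.01)
    return False


def thaw_cgroup(cgroup_dir):
    """Let the tasks in the group run again."""
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write("0")
```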

Posted Dec 8, 2018 9:02 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

The last time I tried that (3-4 years ago) it resulted in unrecoverable system lockups. So I kinda hesitate to recommend it.