New system calls: pidfd_open() and close_range()

Posted May 24, 2019 13:26 UTC (Fri) by cyphar (subscriber, #110703)
In reply to: New system calls: pidfd_open() and close_range() by rweikusat2
Parent article: New system calls: pidfd_open() and close_range()

We are in strong agreement over the pid reuse -- that's literally the whole point of this interface. What I'm saying is that you:

1. Do a pidfd_open(); *then*
2. Check that it is the process you want; and *then*
3. Operate on the pidfd, which you now know references the process you want.

If the process dies after (1), then you will get -ESRCH on all operations. If (1) opened the wrong process, you will detect it during (2) and can error out. Thus (3) will only ever operate on the correct process: because you hold a stable reference that won't be recycled, you aren't subject to pid-recycling problems. This is not possible with the original pid-based interfaces, because any operation in (3) would be using a pid that might have been recycled, which makes the check in (2) worthless.
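A minimal sketch of the ESRCH property described above, assuming Linux with the pidfd_open() and pidfd_send_signal() system calls, invoked through syscall(2) because older C libraries have no wrappers. The helper name stale_pidfd_gives_esrch() and the fallback syscall numbers are assumptions for illustration. It takes a pidfd for a child, lets the child die and be reaped (freeing its pid number for reuse), and checks that step (3) then fails with ESRCH instead of reaching whatever process might now own that pid.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed fallback numbers (the values shared by most architectures)
 * for C library headers that predate these system calls. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Hypothetical helper: returns 0 if signalling through a stale pidfd
 * failed with ESRCH, 1 if pidfd_open() is unavailable here, -1 otherwise. */
int stale_pidfd_gives_esrch(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {               /* child: just wait to be killed */
        pause();
        _exit(0);
    }

    /* Step 1: take a reference to the (known-correct) child. */
    int pidfd = (int)syscall(SYS_pidfd_open, child, 0);
    if (pidfd < 0) {
        int err = errno;
        kill(child, SIGKILL);
        waitpid(child, NULL, 0);
        /* ENOSYS: old kernel; EPERM: e.g. blocked by a seccomp filter. */
        return (err == ENOSYS || err == EPERM) ? 1 : -1;
    }

    /* The process dies and is reaped: its pid number may now be recycled. */
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);

    /* Step 3: the pidfd still names the dead process, so this fails with
     * ESRCH rather than signalling a newcomer that reused the pid. */
    long ret = syscall(SYS_pidfd_send_signal, pidfd, SIGTERM, NULL, 0);
    int err = errno;
    close(pidfd);
    return (ret == -1 && err == ESRCH) ? 0 : -1;
}
```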

Please note that all of the above is also true of the current pidfd interface, which works by opening /proc/$pid and calling pidfd_send_signal(2) (merged in 5.1). pidfd_open(2) is nothing more radical than that; it just offers a way of using pidfds without needing procfs to be mounted and usable.
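For comparison, a sketch of the 5.1-style interface mentioned here, where the pidfd is simply a descriptor for the /proc/$pid directory. The helper name procfs_pidfd_signal() and the fallback syscall number are assumptions. Opening /proc/$pid by number is itself subject to pid reuse; the descriptor only becomes a stable reference once it is open, which is why the check in step (2) has to come after it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424   /* assumed common syscall number */
#endif

/* Hypothetical helper: open /proc/<pid> as a pidfd and signal through it.
 * Returns 0 on success, -1 with errno set on failure. */
int procfs_pidfd_signal(pid_t pid, int sig)
{
    char path[32];
    snprintf(path, sizeof path, "/proc/%d", (int)pid);

    /* Once open, this directory fd stays tied to this particular process,
     * even if the pid number is later reused. */
    int pidfd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (pidfd < 0)
        return -1;

    /* sig == 0 performs only the existence/permission check. */
    long ret = syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0);
    int err = errno;
    close(pidfd);
    errno = err;
    return ret == 0 ? 0 : -1;
}
```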


New system calls: pidfd_open() and close_range()

Posted May 24, 2019 14:44 UTC (Fri) by rweikusat2 (subscriber, #117920)

I was asking how this "check that you got the right process" is supposed to happen.

New system calls: pidfd_open() and close_range()

Posted May 24, 2019 15:44 UTC (Fri) by cyphar (subscriber, #110703)

I guess I misunderstood your question (I thought you were talking about the race condition, not how the check might work).

This depends strongly on what the application is doing. If the application is pkill(1), then you open the process's command line with openat(pidfd, "cmdline", ...) and check the name of the command. If you're an init system, you might check that the ppid is 1, that it's in the right cgroup, that it has the right exe magiclink, that you haven't received a death signal from that service, and so on.
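A sketch of a few of these checks, assuming the 5.1-style pidfd (a descriptor for the /proc/$pid directory), so that openat() and readlinkat() resolve relative to the pinned process. All helper names are hypothetical. It reads `comm` (the short process name that pkill matches by default) rather than `cmdline`, compares the `exe` magic link, and parses the PPid field of `status`; an init system would call ppid_is(fd, 1).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Open the /proc/<pid> directory; the fd stays tied to that process. */
int open_proc_dir(pid_t pid)
{
    char path[32];
    snprintf(path, sizeof path, "/proc/%d", (int)pid);
    return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/* pkill-style check: does the short command name equal `name`? */
bool comm_matches(int procfd, const char *name)
{
    char buf[64];
    int fd = openat(procfd, "comm", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return false;
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (n <= 0)
        return false;
    buf[n] = '\0';
    buf[strcspn(buf, "\n")] = '\0';   /* comm ends in a newline */
    return strcmp(buf, name) == 0;
}

/* init-style check: does the "exe" magic link point at `path`? */
bool exe_matches(int procfd, const char *path)
{
    char buf[PATH_MAX];
    ssize_t n = readlinkat(procfd, "exe", buf, sizeof buf - 1);
    if (n < 0)
        return false;
    buf[n] = '\0';
    return strcmp(buf, path) == 0;
}

/* init-style check: is the parent pid equal to `ppid` (e.g. 1)? */
bool ppid_is(int procfd, pid_t ppid)
{
    int fd = openat(procfd, "status", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return false;
    FILE *f = fdopen(fd, "r");
    if (!f) {
        close(fd);
        return false;
    }
    char line[256];
    int val = 0;
    bool found = false;
    while (!found && fgets(line, sizeof line, f))
        found = sscanf(line, "PPid: %d", &val) == 1;
    fclose(f);                        /* also closes fd */
    return found && val == (int)ppid;
}
```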

In addition, the benefit of pidfds is that they can be passed around or even persisted (through bind-mounts), so if you are in a scenario where you are sure the pidfd is correct (for instance, you are the parent process and spawned the child), you can pass the pidfd to another process, which can operate on it even though it could not (by itself) get a pidfd it was sure about. A toy use case might be a container runtime bind-mounting the pidfd of each container's pid1 (after spawning that pid1) and then using that pidfd after the fact to operate on the container's namespaces.
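A sketch of the "passed around" part, assuming the usual Unix-domain socket mechanism (SCM_RIGHTS ancillary data), which carries a pidfd like any other descriptor. send_fd() and recv_fd() are hypothetical helper names. The receiver gets a new descriptor referring to the same open file, and so to the same process, without ever looking up a pid itself.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd (e.g. a pidfd) over a connected AF_UNIX socket. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                  /* at least one data byte is required */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                           /* correctly aligned control buffer */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    memset(&ctl, 0, sizeof ctl);

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf,
    };
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf,
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (!cm || cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```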

New system calls: pidfd_open() and close_range()

Posted May 24, 2019 18:31 UTC (Fri) by rweikusat2 (subscriber, #117920)

Short version of the answer: It can't be done. This was my impression as well.

New system calls: pidfd_open() and close_range()

Posted May 24, 2019 19:59 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

What do you mean by "it can't be done"? Cyphar provided the exact way it can be done.

New system calls: pidfd_open() and close_range()

Posted May 25, 2019 15:43 UTC (Sat) by rweikusat2 (subscriber, #117920)

"It can't be done".

Various heuristics can be applied here in order to find a process which is (according to someone's opinion, at least) very likely to be the process whose pid was originally acquired, but this doesn't mean that it actually is this process. The only way to know that a pid refers to a certain process is to acquire it from the parent and 'somehow' ensure that the exit status won't be reaped until one is done using the pid. If this is guaranteed, jumping through the pidfd hoop is useless: the pid could be used instead (for signalling, at least).

New system calls: pidfd_open() and close_range()

Posted May 25, 2019 23:09 UTC (Sat) by quotemstr (subscriber, #45331)

Cyphar really did describe exactly how it should be done, as Cyberax noted. Either provide a *precise timeline* showing a scenario under which Cyphar's approach fails or admit that you're wrong. Your comment makes unsubstantiated claims that are simply incorrect. Coordination with parental reaping is unnecessary.

New system calls: pidfd_open() and close_range()

Posted May 26, 2019 20:34 UTC (Sun) by rweikusat2 (subscriber, #117920)

The original statement was that it would be possible to check that a pid which was used for pidfd_open indeed referred to the process it was intended to refer to when the open was done. This is impossible unless the process doing the open is the parent or communicates with the parent. In particular, it cannot be done by looking at the content of /proc files related to the process now using the pid, as there's nothing specific to a particular process in there: all /proc directories for processes running ls -l /proc/self/ will contain the same metainformation (except the timestamps, but these depend on the system clock and could also repeat in sufficiently adverse circumstances).

New system calls: pidfd_open() and close_range()

Posted May 26, 2019 20:46 UTC (Sun) by quotemstr (subscriber, #45331)

All you've done is repeat your initial assertion. You're wrong. Either admit that or provide a *SPECIFIC* and *CONCRETE* counter-example showing the race you're claiming exists.

New system calls: pidfd_open() and close_range()

Posted May 26, 2019 21:33 UTC (Sun) by rweikusat2 (subscriber, #117920)

I'm sorry but if you don't understand that, I obviously won't be able to explain it to you.

New system calls: pidfd_open() and close_range()

Posted May 26, 2019 3:28 UTC (Sun) by mjg59 (subscriber, #23239)

The behaviour of pill(1) is well documented - it kills processes that have specified attributes. You take a reference to a pid and then start examining those attributes. If they all match, you send an appropriate signal to that pid. Where's the race?
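The procedure described here maps onto a short loop. A sketch, assuming Linux 5.3 or later, where pidfd_open() was eventually merged and returns a descriptor that is not a /proc directory, so attributes are read through /proc/<pid> by path. pkill_sketch() and comm_is() are hypothetical names, only the process name is compared, and the fallback syscall numbers are assumptions. The point is the ordering: take the reference, examine, then signal through the reference. If the pinned process was reaped at any time before the final call, the call fails with ESRCH; if it succeeds, the process was never reaped, so /proc/<pid> could not have belonged to anyone else while it was being read.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424   /* assumed common syscall numbers */
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Read /proc/<pid>/comm by path and compare it with `name`. */
static bool comm_is(long pid, const char *name)
{
    char path[64], buf[64];
    snprintf(path, sizeof path, "/proc/%ld/comm", pid);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return false;
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (n <= 0)
        return false;
    buf[n] = '\0';
    buf[strcspn(buf, "\n")] = '\0';
    return strcmp(buf, name) == 0;
}

/* Hypothetical pkill: signal every other process named `name`. Returns the
 * number signalled, or -1 with errno ENOSYS if pidfds are unavailable. */
int pkill_sketch(const char *name, int sig)
{
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;
    int signalled = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        char *end;
        long pid = strtol(de->d_name, &end, 10);
        if (*end != '\0' || pid <= 0 || pid == (long)getpid())
            continue;                  /* not a process directory, or us */

        /* 1. Pin whatever process currently has this pid. */
        int pidfd = (int)syscall(SYS_pidfd_open, (pid_t)pid, 0);
        if (pidfd < 0) {
            if (errno == ENOSYS || errno == EPERM) {  /* unsupported/filtered */
                closedir(proc);
                errno = ENOSYS;
                return -1;
            }
            continue;                  /* e.g. ESRCH: already gone */
        }

        /* 2. Examine its attributes, then 3. signal through the pidfd,
         * which fails with ESRCH if the pinned process has been reaped. */
        if (comm_is(pid, name) &&
            syscall(SYS_pidfd_send_signal, pidfd, sig, NULL, 0) == 0)
            signalled++;
        close(pidfd);
    }
    closedir(proc);
    return signalled;
}
```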

New system calls: pidfd_open() and close_range()

Posted May 26, 2019 20:42 UTC (Sun) by rweikusat2 (subscriber, #117920)

It's inherent in the semantics of the command: pkill (I'm assuming this was a typo) kills some processes which are currently running and have certain attributes, namely, all it happens to find. This may include processes which were started after the pkill (and thus very likely shouldn't have been killed by it), but it may as well not (they might be started such that pkill won't find them). Arguably, a pidfd_open in pkill would stop it from possibly killing processes it should never have killed because they didn't match the specification. I didn't understand that.

New system calls: pidfd_open() and close_range()

Posted May 26, 2019 22:46 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

Now you're just being silly.

pkill() has a contract: it kills processes by specified attributes. Right now it can kill a random process due to the PID-based race condition. There's still a race condition: pkill is not guaranteed to kill processes that are launched concurrently with it.

The new pkill() would _always_ kill the right processes.