New system calls: pidfd_open() and close_range()

Posted May 23, 2019 15:02 UTC (Thu) by mezcalero (subscriber, #45103)
Parent article: New system calls: pidfd_open() and close_range()

That syscall prototype with the "unsigned int" used for fds looks like the kernel-side prototype. In userspace fds are generally "int". (Isn't it great that the Linux kernel side and userspace disagree on such a fundamental type? Yay for Linux!). Hence I presume the MAXINT in the article should actually be a UINT_MAX, i.e. the same as (unsigned int) -1...

I must say, for the purposes I have in the codebases I care for (systemd…) the range thing is a bit weird though: usually we want to close everything but a few arbitrary fds, and then rearrange those fds to specific positions. For that, close_range() is not particularly useful, as it requires you to first sort the list of fds to keep open and then find all the ranges between them. This means closing everything is O(n*log(n)) (in the worst case, where the fds to keep open are spread over the entire range), for n being the number of fds to keep open. This is only marginally better than enumerating /proc/self/fd/ and closing everything found there, which is O(m) for m being the number of fds previously open. Marginally better since usually n ≪ m.

I personally would much prefer a prototype like:

int close_except(const int *fds, size_t n_fds);

i.e. just specify the fds you want to keep open explicitly, regardless of order, which is trivially easy...
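
For illustration, here is a minimal sketch of the sort-and-find-gaps work that emulating close_except() on top of close_range() would involve. The names fd_range and ranges_to_close() are hypothetical, and the function only computes the ranges; each one would then be handed to a close_range() call, whose exact signature is not assumed here.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* An inclusive range of descriptors [first, last] (hypothetical type). */
struct fd_range { int first; int last; };

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Sort `keep` in place and write the gaps between the kept descriptors
 * into `out`, which must have room for n + 1 entries.
 * Returns the number of ranges written.
 */
size_t ranges_to_close(int *keep, size_t n, struct fd_range *out) {
    size_t n_out = 0;
    int next = 0;                       /* lowest fd not yet accounted for */

    qsort(keep, n, sizeof *keep, cmp_int);      /* the O(n*log(n)) step */
    for (size_t i = 0; i < n; ++i) {
        if (keep[i] < next)             /* duplicate or negative entry */
            continue;
        if (keep[i] > next)
            out[n_out++] = (struct fd_range){ next, keep[i] - 1 };
        if (keep[i] == INT_MAX)         /* nothing above this to close */
            return n_out;
        next = keep[i] + 1;
    }
    out[n_out++] = (struct fd_range){ next, INT_MAX };
    return n_out;
}
```

For the keep list {26, 5, 4} this yields three ranges: 0 to 3, 6 to 25, and 27 to INT_MAX.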

And I think not only systemd would benefit from such a close_except() call, but also everything else that invokes something with fds set up in a special way, for example popen() and friends.

Lennart



New system calls: pidfd_open() and close_range()

Posted May 23, 2019 16:10 UTC (Thu) by tao (subscriber, #17563) [Link]

Yeah, close_except() certainly seems like a more desirable behaviour.

New system calls: pidfd_open() and close_range()

Posted May 23, 2019 16:22 UTC (Thu) by Sesse (subscriber, #53779) [Link] (7 responses)

Can't you rearrange before close instead of after? Say that you want fds 5, 26 and 4, and then close the rest:

dup2(5, 0);
dup2(26, 1);
dup2(4, 2);
close_range(3, MAXINT);

(If you also wanted to keep e.g. 1, you would need some extra fiddling, but that goes for close_except(), too.)

New system calls: pidfd_open() and close_range()

Posted May 24, 2019 12:49 UTC (Fri) by mezcalero (subscriber, #45103) [Link] (6 responses)

Well, for stdin/stdout/stderr that'll work, but it falls apart if the fds you want to keep and maybe later rearrange are already in the range you want to rearrange them to... The socket activation protocol systemd implements (i.e. LISTEN_FDS=) supports large numbers of fds, and systemd might create them a long time before actually activating things, hence they likely are in the range we want to move fds to.
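
To make the collision concrete: if fd 4 has to end up at position 3 and fd 3 at position 4, a plain dup2(4, 3) destroys fd 3 before it has been moved. One way around this, sketched below under the assumption that the targets are a contiguous block starting at `base`, is to park a copy of every fd above the target range first and only then dup2() them into place. rearrange_fds() is a hypothetical helper, not systemd's actual code, and EINTR handling is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Move fds[0..n-1] to base, base+1, ..., base+n-1.
 * On success returns 0 and updates fds[] to the new numbers.
 * On failure returns -1 with errno set; some targets may already
 * have been overwritten at that point.
 */
int rearrange_fds(int *fds, size_t n, int base) {
    if (n == 0)
        return 0;
    int *parked = malloc(n * sizeof *parked);
    if (!parked)
        return -1;

    size_t n_parked = 0;
    int rc = -1;

    /* Pass 1: copy every fd to a number above the whole target range. */
    for (; n_parked < n; ++n_parked) {
        int tmp = fcntl(fds[n_parked], F_DUPFD_CLOEXEC, base + (int)n);
        if (tmp < 0)
            goto out;
        parked[n_parked] = tmp;
    }
    /* Pass 2: the target slots can now be overwritten safely.
     * dup2() leaves close-on-exec off on the target descriptor. */
    for (size_t i = 0; i < n; ++i)
        if (dup2(parked[i], base + (int)i) < 0)
            goto out;
    for (size_t i = 0; i < n; ++i)
        fds[i] = base + (int)i;
    rc = 0;
out: {
        int saved = errno;              /* close() may clobber errno */
        for (size_t i = 0; i < n_parked; ++i)
            close(parked[i]);
        free(parked);
        errno = saved;
    }
    return rc;
}
```

Originals that lie outside the target range stay open; in this sketch they are left for a later close of everything from base + n upwards.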

New system calls: pidfd_open() and close_range()

Posted May 24, 2019 12:53 UTC (Fri) by Sesse (subscriber, #53779) [Link]

But that's still true even with your proposed close_except()?

New system calls: pidfd_open() and close_range()

Posted May 30, 2019 8:44 UTC (Thu) by mina86 (guest, #68442) [Link] (4 responses)

int close_except(int *fds_to_keep, size_t n) {
    qsort(fds_to_keep, n, sizeof *fds_to_keep, int_cmp);
    int fd = 0;
    for (size_t i = 0; i < n; ++i)
        if (fds_to_keep[i] != fd) dup2(fds_to_keep[i], fd++);
    return close_range(fd, INT_MAX);
}
Handling of EBADF and EINTR left as exercise to the reader.

New system calls: pidfd_open() and close_range()

Posted May 30, 2019 10:31 UTC (Thu) by Jandar (subscriber, #85683) [Link] (3 responses)

> Handling of EBADF and EINTR left as exercise to the reader.

And the important exercise is fixing the bug ;-). The fd's have to be moved back to their original value.

Closing every fd I don't know about is only a band-aid to fix buggy software (own or third-party library). The correct solution is using CLOEXEC, which has been available for ages.
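
As a minimal sketch of the CLOEXEC approach: the flag can be requested atomically at open() time with O_CLOEXEC, or retrofitted onto an existing descriptor with fcntl(). The helper names open_cloexec() and set_cloexec() are hypothetical, and the open wrapper assumes `flags` does not include O_CREAT (which would need a mode argument).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open a file so the descriptor is closed automatically on exec. */
int open_cloexec(const char *path, int flags) {
    return open(path, flags | O_CLOEXEC);
}

/* Set close-on-exec on a descriptor obtained elsewhere; 0 or -1. */
int set_cloexec(int fd) {
    int fl = fcntl(fd, F_GETFD);
    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFD, fl | FD_CLOEXEC) < 0 ? -1 : 0;
}
```

The second form leaves a window between creating the descriptor and setting the flag, so in a multi-threaded program another thread may fork and exec in between; the first form avoids that.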

New system calls: pidfd_open() and close_range()

Posted May 30, 2019 10:49 UTC (Thu) by mina86 (guest, #68442) [Link]

> And the important exercise is fixing the bug ;-). The fd's have to be moved back to their original value.

Or start referring to the file descriptors by their new numbers (though even then the function lacks a way to communicate all the remappings) to save a bunch of syscalls.

New system calls: pidfd_open() and close_range()

Posted Sep 7, 2019 17:59 UTC (Sat) by ceplm (subscriber, #41334) [Link] (1 responses)

> Closing every fd I don't know about is only a band-aid to fix buggy software (own or third-party library). The correct solution is using CLOEXEC, which has been available for ages.

No, it is also useful for programs where the author doesn’t have full control. See https://bugs.python.org/issue11284. I could have switched all Python open() calls to use CLOEXEC (that’s essentially what happened in Python 3.4 as an implementation of https://www.python.org/dev/peps/pep-0446/), but that doesn’t save me from C extensions which just use open(2) with default values and create inheritable file descriptors on POSIX platforms.

And walking through /proc/self/fd/ is horribly slow (think about FreeBSD’s MAXFD=655000), and mostly not async-signal-safe.

New system calls: pidfd_open() and close_range()

Posted Sep 14, 2019 20:54 UTC (Sat) by Jandar (subscriber, #85683) [Link]

There are libraries (I have encountered one) opening a file-descriptor which has to remain open across exec to be used by the same library in (grand)child processes.

Meddling with unknown resources is never good practice, except perhaps, as I said, as a band-aid to fix buggy software. The C extension in your example is exactly such buggy software that you are applying a band-aid to.

I hope I never see close_range() in production code. Using it is a confession that one has given up on producing/using good software.

New system calls: pidfd_open() and close_range()

Posted May 23, 2019 20:07 UTC (Thu) by scientes (guest, #83068) [Link] (3 responses)

What about having an open() flag that doesn't guarantee the lowest syscall number? That is also O(n) for no good reason.

O_ANYFD

New system calls: pidfd_open() and close_range()

Posted May 24, 2019 6:14 UTC (Fri) by smurf (subscriber, #17840) [Link] (1 responses)

You mean the lowest FD number.

Well, that's O(n) over a bitmap, plus there's a CPU instruction to find the first free bit, so the cost is reasonably low no matter how large N is. You could even cache the first free FD or two. I doubt that there'd be any measurable impact of O_ANYFD, esp. compared to the cost of looking up the target of the open().
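
A rough userspace sketch of that bitmap scan, assuming one bit per descriptor with a set bit meaning "in use". This is not the kernel's actual code; lowest_free_fd() is a hypothetical name, and __builtin_ctzl() is a GCC/Clang builtin that typically compiles to a single find-first-set-bit instruction.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the lowest clear bit in `map`, or `nbits`
 * if every descriptor in the table is in use.
 */
size_t lowest_free_fd(const unsigned long *map, size_t nbits) {
    size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

    for (size_t w = 0; w < nwords; ++w) {
        unsigned long free_bits = ~map[w];
        if (free_bits == 0)             /* full word: skip all its fds at once */
            continue;
        size_t bit = w * BITS_PER_WORD + (size_t)__builtin_ctzl(free_bits);
        return bit < nbits ? bit : nbits;
    }
    return nbits;
}
```

The loop is still linear in the table size, but it advances a whole machine word per iteration, which is why the constant factor is so small.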

New system calls: pidfd_open() and close_range()

Posted May 24, 2019 22:50 UTC (Fri) by dezgeg (guest, #92243) [Link]

See https://lwn.net/Articles/787473/, specifically this quote: "Jan Kara wondered if the file-descriptor table bouncing could be handled by allocating file descriptors in a way that causes separate threads to use different parts of the table."

New system calls: pidfd_open() and close_range()

Posted May 25, 2019 0:33 UTC (Sat) by wahern (subscriber, #37304) [Link]

FWIW, fcntl F_DUPFD (or F_DUPFD_CLOEXEC) can be used to get a higher descriptor: it returns a descriptor greater than or equal to the fcntl argument. Not ideal because you have to open and close the original descriptor, but technically sufficient to implement your suggested interface. One still has to figure out what the highest descriptor is and ensure contiguous allocation of related descriptors, but so too with O_ANYFD.
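
A minimal sketch of that open, duplicate, close sequence. The wrapper name open_at_or_above() is hypothetical, and it assumes `flags` does not include O_CREAT.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path` and return a close-on-exec descriptor numbered
 * min_fd or higher, or -1 with errno set.
 */
int open_at_or_above(const char *path, int flags, int min_fd) {
    int fd = open(path, flags | O_CLOEXEC);
    if (fd < 0)
        return -1;
    if (fd >= min_fd)                   /* already high enough */
        return fd;

    /* Lowest free descriptor that is >= min_fd. */
    int high = fcntl(fd, F_DUPFD_CLOEXEC, min_fd);
    int saved = errno;                  /* close() may clobber errno */
    close(fd);
    errno = saved;
    return high;
}
```

The extra fcntl() and close() calls are the cost mentioned above: three syscalls where a dedicated open() flag would need one.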

