The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 2:03 UTC (Tue) by josh (subscriber, #17465)
Parent article: The curious case of O_DIRECTORY|O_CREAT

> The expectation often is that this particular combination would create and open a directory.

Given the indication that this combination's behavior can be changed/fixed, is there some strong reason *not* to make it successfully create a directory? That seems like *useful behavior*: create a directory and atomically return an fd for that directory.

Would that break some existing software? It doesn't sound like it would, given

> "I think we can pretty much assume that there are no actual users of it, and we might as well clean up the semantics properly"

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 3:53 UTC (Tue) by brauner (subscriber, #109349)

I already proposed that when I fixed the bug:

"(As a sidenote, posix made an interpretation change a long time ago to
at least allow for O_DIRECTORY | O_CREAT to create a directory (see [3]).

But that's a whole different can of worms and I haven't spent any
thoughts even on feasibility. And even if we should probably get through
a couple of kernels with O_DIRECTORY | O_CREAT failing with EINVAL first.)"
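To see what a given kernel does with the combination today, a probe along these lines may help. It is a minimal sketch assuming Linux; `probe_dir_creat` is a made-up name. Going by the quote above, a kernel carrying the fix should report EINVAL; older kernels reportedly created a regular file instead (and since 5.7 also failed with ENOTDIR). None of these behaviors creates a directory.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical probe: call openat(parentfd, name, O_DIRECTORY | O_CREAT) for
 * a name that must not exist yet. Returns 0 if the call succeeded, otherwise
 * the errno it failed with. *made_dir is set to 1 if a directory appeared
 * under `name`. Anything the call left behind is removed again.
 */
static int probe_dir_creat(int parentfd, const char *name, int *made_dir)
{
    int err = 0;
    int fd = openat(parentfd, name, O_DIRECTORY | O_CREAT | O_CLOEXEC, 0700);
    if (fd < 0)
        err = errno;
    else
        close(fd);

    struct stat st;
    *made_dir = 0;
    if (fstatat(parentfd, name, &st, AT_SYMLINK_NOFOLLOW) == 0) {
        *made_dir = S_ISDIR(st.st_mode);
        /* Clean up whatever was created (a regular file on some kernels). */
        unlinkat(parentfd, name, *made_dir ? AT_REMOVEDIR : 0);
    }
    return err;
}
```

Reporting the errno rather than asserting a particular one keeps the probe usable across kernel versions.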

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 4:11 UTC (Tue) by josh (subscriber, #17465)

Thank you! Sorry I missed that.

But in any case, Linus pointed out open's hard-to-extend semantics (the same ones that motivated O_TMPFILE), which make it less worthwhile to attempt to make this work.

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 8:40 UTC (Tue) by smcv (subscriber, #53363)

Perhaps openat2() (which always rejected unknown flags) could interpret O_DIRECTORY|O_CREAT as "open a directory, creating it if it doesn't exist", even if plain open() and openat() don't? Or would it be more confusing for openat2() and openat() to differ on that?
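The premise here is that openat2() validates its flags instead of ignoring unknown bits. A minimal sketch of that check, assuming Linux 5.6 or later: it calls the raw syscall with a bit no open flag uses (1ULL << 40, chosen arbitrarily) and expects EINVAL. `try_openat2` and `struct open_how_compat` are hypothetical local names; the struct mirrors the three 64-bit fields of the kernel's `struct open_how` so the sketch does not need `<linux/openat2.h>`, and the fallback syscall number 437 is an assumption that holds on the common architectures.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437 /* assumption: generic number (x86-64, arm64, ...) */
#endif

/* Hypothetical local copy of the kernel's struct open_how layout. */
struct open_how_compat {
    uint64_t flags;
    uint64_t mode;    /* must be 0 unless O_CREAT or O_TMPFILE is set */
    uint64_t resolve; /* RESOLVE_* flags; 0 means none */
};

/* Call openat2() with the given flags; return the fd, or -errno on failure. */
static int try_openat2(int dirfd, const char *path, uint64_t flags)
{
    struct open_how_compat how = { .flags = flags, .mode = 0, .resolve = 0 };
    long fd = syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
    return fd < 0 ? -errno : (int)fd;
}
```

By contrast, open() and openat() silently mask off bits they do not recognize, so a program cannot rely on a new flag there failing cleanly on an older kernel.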

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 9:36 UTC (Tue) by brauner (subscriber, #109349)

openat2() can very well differ from the other variants. But if we wanted it to be the only open* syscall that can open and create a directory, then we should add an openat2()-specific flag instead of reusing O_DIRECTORY | O_CREAT. Otherwise userspace might rightly be confused as to why O_DIRECTORY|O_CREAT works with openat2() but not with the other open() variants. Reusing O_DIRECTORY|O_CREAT really only makes sense if we provide consistent behavior across all open syscalls, imho.

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 9:49 UTC (Tue) by josh (subscriber, #17465)

Yeah, agreed; might as well make O_CREATE_DIR.

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 11:58 UTC (Tue) by mathstuf (subscriber, #69389)

I like how you give "create" back its "E" but then steal "ectory" again.

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 12:54 UTC (Tue) by brauner (subscriber, #109349)

The API giveth, and the API taketh away.

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 20:44 UTC (Tue) by Villemoes (subscriber, #91911)

O_LORD_WONT_YOU_BUY_ME_PONIES

open(, O_DIRECTORY|O_CREAT) should create and open a file descriptor to a directory

Posted Apr 11, 2023 16:39 UTC (Tue) by meuh (guest, #22042)

Yes please

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 16:19 UTC (Tue) by NYKevin (subscriber, #129325)

You don't, strictly speaking, need it. mkdirat(2) and openat(2) should be able to resolve any race condition that might otherwise be possible. Here's a possible sequence of events:

0. Thread A opens the parent directory. This may or may not race with something, but let's call it out of scope for now and just assume that it happens.
1. Thread A calls mkdirat and creates /path/to/foo/
2. Thread B renames it to /path/to/bar/ (or removes it or whatever)
3. Now if thread A tries to openat /path/to/foo/, it gets ENOENT, so it would know to start over and try again.

Alternatively:

2. Thread B renames it to /path/to/bar/
3. Thread B creates a new /path/to/foo/
4. Thread A successfully openats /path/to/foo/. It's a different directory than the one that it created, but from thread A's perspective, this makes no practical difference. One newly-created directory is just as good as another, right?

Alternatively alternatively:

3. Thread B creates /path/to/foo as a symlink to something else.
4. Thread A tries to openat /path/to/foo with O_NOFOLLOW (without the trailing slash, since a trailing slash makes the kernel follow a symlink regardless), but it fails. A knows that something is wrong and aborts the operation.
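The sequence above can be sketched in C roughly as follows. This is a minimal illustration assuming Linux and that `parentfd` was obtained in step 0; `mkdir_and_open` and its retry limit are hypothetical. It starts over only on ENOENT (the renamed-or-removed case) and treats every other openat() failure, such as finding a symlink, as fatal.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `name` under parentfd, then open it.
 * `name` is a single component with no trailing slash (a trailing slash
 * would make the kernel follow a symlink even with O_NOFOLLOW).
 * Returns a directory fd, or -1 with errno set.
 */
static int mkdir_and_open(int parentfd, const char *name, mode_t mode,
                          int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        /* Step 1: create the directory; any failure (including EEXIST) ends it. */
        if (mkdirat(parentfd, name, mode) < 0)
            return -1;

        /* Step 3/4: open it. O_DIRECTORY rejects non-directories and
         * O_NOFOLLOW rejects a symlink planted under that name. */
        int fd = openat(parentfd, name,
                        O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
        if (fd >= 0)
            return fd; /* a directory named `name`, not necessarily ours */
        if (errno != ENOENT)
            return -1; /* symlink or something else: abort */
        /* ENOENT: renamed or removed in between; start over. */
    }
    errno = EAGAIN;
    return -1;
}
```

As the thread goes on to point out, a successful return only proves that *some* directory was opened; checking what it is, for instance with fstat(), is a separate step.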

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 28, 2023 18:35 UTC (Tue) by zev (subscriber, #88455)

> One newly-created directory is just as good as another, right?

Not if they have different metadata (permissions or ownership, which could arise with multiple processes accessing /tmp, say). Also, is there any guarantee B even left it empty and didn't also start populating it with things that would conflict with A's plans for it?

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 29, 2023 0:46 UTC (Wed) by NYKevin (subscriber, #129325)

> Not if they have different metadata (permissions or ownership, which could arise with multiple processes accessing /tmp, say).

A can fstat it after opening it, if A cares.
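What "fstat it after opening it" might look like, as a sketch: `open_dir_if_ours` is a hypothetical helper, and "ours" is assumed to mean owned by the effective UID with no permission bits outside `max_mode`. Checking the opened fd rather than the path ties the answer to the very directory A will go on to use.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `name` under parentfd as a directory (never
 * following a symlink) and keep the fd only if fstat() on that same fd shows
 * it is owned by us and grants no permission bits beyond `max_mode`.
 * Returns the fd, or -1 with errno set (EPERM for "not ours").
 */
static int open_dir_if_ours(int parentfd, const char *name, mode_t max_mode)
{
    int fd = openat(parentfd, name,
                    O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    /* O_DIRECTORY already guarantees S_ISDIR; check owner and mode bits. */
    if (st.st_uid != geteuid() || (st.st_mode & 07777 & ~max_mode) != 0) {
        close(fd);
        errno = EPERM;
        return -1;
    }
    return fd;
}
```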

> Also, is there any guarantee B even left it empty and didn't also start populating it with things that would conflict with A's plans for it?

If the permissions on the original dir allow it, then B can do this anyway. If not, then see previous reply.

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 29, 2023 2:12 UTC (Wed) by interalia (subscriber, #26615)

I could be wrong, but I imagine that even if A managed to create the directory atomically first, process B could create files in there before A gets control again. So there's no guarantee that the newly created directory is still empty by the time A reads it. Most programs probably just don't care, of course, as long as the directory serves their purposes.

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 29, 2023 5:37 UTC (Wed) by NYKevin (subscriber, #129325)

It is possible for A to use unlinkat(2) with AT_REMOVEDIR to remove the directory immediately after creating it. In principle, I imagine that you might be able to keep using the other *at() syscalls on the unlinked directory until you close it, thus creating a "true" temporary directory, but I have not tried it.

That still doesn't work because you race against someone else opening the directory before you can unlink it, but it's a neat party trick (if it works).

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 30, 2023 2:20 UTC (Thu) by josh (subscriber, #17465)

Sadly, that doesn't work. Once you've unlinked a directory, attempting to create something in that directory produces ENOENT.
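A small sketch to reproduce this on Linux: `create_in_unlinked_dir` (a hypothetical name) creates a scratch directory, keeps an fd to it, removes it with unlinkat(AT_REMOVEDIR), and then tries mkdirat() through the fd. Per the comment above, that last step should fail with ENOENT.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create parentfd/name, hold an fd to it, remove it,
 * then try to create "child" inside it via the fd.
 * Returns 0 if that creation worked, otherwise the errno it failed with.
 */
static int create_in_unlinked_dir(int parentfd, const char *name)
{
    if (mkdirat(parentfd, name, 0700) < 0)
        return errno;

    int dfd = openat(parentfd, name,
                     O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
    int saved = dfd < 0 ? errno : 0;

    /* Remove the (still empty) directory; dfd keeps the inode alive. */
    unlinkat(parentfd, name, AT_REMOVEDIR);
    if (dfd < 0)
        return saved;

    int result = 0;
    if (mkdirat(dfd, "child", 0700) < 0)
        result = errno; /* expected: ENOENT */
    else
        unlinkat(dfd, "child", AT_REMOVEDIR);
    close(dfd);
    return result;
}
```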

The curious case of O_DIRECTORY|O_CREAT

Posted Mar 30, 2023 6:50 UTC (Thu) by donald.buczek (subscriber, #112892)

Yes, anonymous directories would be nice to have. Of course, with `dfd = mkdirat(parentfd, optional_name, mode, O_TMPDIR)` instead of a non-atomic `mkdir()` + `unlinkat()` combination. But I assume that would be far from trivial to implement, because filesystems are not prepared to handle trees of unlinked directories.

If you have the necessary privileges and want a directory tree and its contents to be cleaned up by the kernel once the last reference is gone, you can use a lazily unmounted filesystem on top of a lazily detached loop device, backed by an unlinked file, like this:

root@theinternet:~# fallocate -l 10G /tmp/x.dat
root@theinternet:~# losetup --find --show /tmp/x.dat
/dev/loop0
root@theinternet:~# mkfs.ext4 -q /dev/loop0
root@theinternet:~# mount /dev/loop0 /mnt
root@theinternet:~# cd /mnt
root@theinternet:/mnt# umount -l /mnt
root@theinternet:/mnt# losetup -d /dev/loop0
root@theinternet:/mnt# rm /tmp/x.dat
root@theinternet:/mnt# ls -l
total 16
drwx------ 2 root root 16384 Mar 30 08:43 lost+found
root@theinternet:/mnt# df /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 268304384 44856232 223448152 17% /
root@theinternet:/mnt# cd
root@theinternet:~# df /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 268304384 44786364 223518020 17% /

But there is no way to create this stack atomically.

Another ugliness here is that the system will unnecessarily flush modified data during the (automatic) unmount.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds