A filesystem for namespaces

By Jonathan Corbet
December 3, 2021
It is natural, when looking at the kernel development process, to focus on patches that find their way to acceptance and become a part of future kernels. But there can be value in looking at work that doesn't clear the bar; in failing, these patches often reveal things about the kernel and the community that creates it. Such is the case with the proof-of-concept namespacefs patch series recently posted by Yordan Karadzhov. One should not expect to see namespacefs in a future kernel but, in failing, this work showed a real use case and why it is hard to satisfy that use case in the kernel.

Namespacefs is, as one might expect, a virtual filesystem implemented by the kernel. Its job is to display the hierarchy of namespaces running on the system; this information reflects the hierarchy of containers that are running. By using namespacefs, administrators can more readily see what is happening on their systems; it is also meant to facilitate complicated use cases like tracing multiple containers and watching how they interact.

The initial implementation was limited to the PID and time namespaces. One can use it to traverse the hierarchy of PID namespaces (time namespaces are not hierarchical) and obtain the list of processes running in each. Other types of namespaces are not supported in this posting, but the intent was seemingly to add that support in a future version if namespacefs looked like the right solution to the problem.
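Part of the nesting that namespacefs set out to display is already visible through the NSpid field of /proc/<pid>/status. According to proc(5), that field lists a process's PID in each PID namespace it belongs to, starting from the namespace of the procfs mount and moving inward. The Python sketch below is an illustration only and is not part of the namespacefs series; the function names are hypothetical. It parses that field to work out how deeply a process is nested.

```python
from __future__ import annotations


def parse_nspid(status_text: str) -> list[int]:
    """Return the NSpid values from the text of /proc/<pid>/status.

    The leftmost value is the PID as seen from the PID namespace of the
    procfs mount; each following value is the PID one namespace further
    in, ending with the PID in the process's own namespace.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    raise ValueError("no NSpid line (the field appeared in Linux 4.1)")


def nesting_depth(status_text: str) -> int:
    """Number of PID-namespace levels below the procfs mount's namespace."""
    return len(parse_nspid(status_text)) - 1


def read_status(pid: int) -> str:
    """Read /proc/<pid>/status for a live process (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="utf-8") as f:
        return f.read()
```

For a status file containing the line "NSpid:\t4242\t17\t1", parse_nspid() returns [4242, 17, 1]: the process is PID 1 in its own namespace, which sits two levels below the namespace that mounted /proc. This gives depth but not the identity of each level, which is part of what namespacefs tried to supply.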

As Karadzhov wrote:

Being able to see the structure of the namespaces can be very useful in the context of the containerized workloads. This will provide universal methods for detecting, examining and monitoring all sorts of containers running on the system, without relying on any specific user-space software.

Much of this information is available in user space now in the form of directories under /proc, but there are some missing pieces and that information is not organized in a way that shows the actual namespace hierarchy. Container-orchestration systems can also provide a view of the containers they manage, of course, but they don't provide a solution for the general case. Namespacefs was meant to make this information readily available regardless of which orchestration systems are in use.
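To see what "not organized" means in practice, consider the /proc/<pid>/ns/ entries. Each one is a symbolic link whose target names the namespace type and an inode number, such as pid:[4026531836]. A rough Python sketch (helper names are hypothetical) can group processes by PID namespace from those links. The result is still a flat mapping, though; nothing in it says which namespace is nested inside which.

```python
from __future__ import annotations

import os
import re
from collections import defaultdict

# Targets of /proc/<pid>/ns/* links look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<type>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link target: {target!r}")
    return match.group("type"), int(match.group("inode"))


def group_by_namespace(links: dict[int, str]) -> dict[int, list[int]]:
    """Map each namespace inode to the sorted PIDs whose link points at it."""
    groups: dict[int, list[int]] = defaultdict(list)
    for pid, target in links.items():
        _, inode = parse_ns_link(target)
        groups[inode].append(pid)
    return {inode: sorted(pids) for inode, pids in groups.items()}


def read_pid_ns_links(proc: str = "/proc") -> dict[int, str]:
    """Read /proc/<pid>/ns/pid for every process we can inspect (Linux only)."""
    links: dict[int, str] = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            links[int(entry)] = os.readlink(os.path.join(proc, entry, "ns", "pid"))
        except OSError:
            # The process exited, or we lack permission to inspect it.
            continue
    return links
```

On a live system, group_by_namespace(read_pid_ns_links()) gives one bucket of PIDs per PID namespace. Reconstructing the parent/child relationships between those buckets takes additional work.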

There were a few objections raised to this work, starting with the fact that a namespace's entry in namespacefs needs to have a name. There are currently no names associated with namespaces, so namespacefs uses the number of the inode that is attached to each namespace inside the kernel. Eric Biederman was quick to criticize that approach, saying: "It is not correct to use inode numbers as the actual names for namespaces". He went on to say that there was nothing else that could be used as names for namespaces either, and that the entire idea was unworkable.
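The inode numbers that namespacefs used as names are the same ones that user space already sees. The namespaces(7) man page says that two processes are in the same namespace when their /proc/<pid>/ns/<type> links have the same device ID and inode number as reported by stat(2). The minimal Python sketch below makes that comparison; the function names are illustrative. The kernel assigns these numbers itself, so the key identifies a namespace only on the running system.

```python
from __future__ import annotations

import os


def ns_key(path: str) -> tuple[int, int]:
    """Return (st_dev, st_ino) for a namespace link such as /proc/self/ns/pid.

    os.stat() follows the symbolic link to the namespace file itself, so
    the result identifies the namespace rather than the link.
    """
    st = os.stat(path)
    return st.st_dev, st.st_ino


def same_namespace(path_a: str, path_b: str) -> bool:
    """True if both paths refer to the same namespace (see namespaces(7))."""
    return ns_key(path_a) == ns_key(path_b)
```

For example, same_namespace("/proc/self/ns/pid", "/proc/1/ns/pid") should be true for a process that shares PID 1's PID namespace and has permission to inspect PID 1.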

There are, it seems, a couple of problems with using inode numbers as names for namespaces. One of those, which Biederman spelled out later, is that there is no way to recreate the namespace hierarchy at a later time with the same names. That, he said, would break any system that uses, for example, CRIU to checkpoint and restart containers, perhaps as part of a live-migration scheme. The only way to handle this properly, he said, is to create a namespace for namespace names, and that has proved to be a hard problem in the past.

The CRIU issue is only relevant if containers that may be checkpointed will use namespacefs. As both Karadzhov and Steve Rostedt pointed out, that is unlikely; the whole point of namespacefs is to show the situation on a specific machine. There is no reason for anybody to want to move namespacefs — or any container making use of namespacefs — across machines or even to checkpoint it. It is, of course, hazardous to assume that nobody will want to use a feature in a certain way in the future but, in the absence of a surprising use case, the naming problem may not be an issue in actual use.

An arguably deeper problem, though, is that namespacefs can be seen as an attempt to recognize containers in the kernel, but the kernel has (by design) no concept of a "container". The kernel, instead, provides a set of pieces that user-space systems can assemble into varying types of containers. The namespacefs patches use PID namespaces as the objects around which the hierarchy is built and ignore user namespaces entirely. Biederman, in his initial response, criticized that decision, saying that "there is definitely no meaningful hierarchy without the user namespace". Not all containers use user namespaces, though, and those namespaces lack the process-ID information that Karadzhov's patch was meant to expose in the first place.

But, as James Bottomley pointed out, not all containers use PID namespaces either. Trying to identify containers without PID namespaces in namespacefs is not going to lead to much joy.
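The boundary problem can be made concrete. Suppose each process is described by the inodes of the namespaces it holds. Grouping by one namespace type can then give a different answer than grouping by another. The Python sketch below uses made-up inode numbers for a hypothetical host. One process has its own network namespace but shares the host's PID namespace, so a view built on PID namespaces cannot tell it apart from the host.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical sample data: pid -> {namespace type: namespace inode}.
# PID 300 has its own network namespace but shares the host PID namespace;
# PIDs 400 and 401 are in a separate PID namespace but share the host network.
PROCESSES: dict[int, dict[str, int]] = {
    1: {"pid": 1000, "net": 2000},
    300: {"pid": 1000, "net": 2001},
    400: {"pid": 1001, "net": 2000},
    401: {"pid": 1001, "net": 2000},
}


def partition(processes: dict[int, dict[str, int]], ns_type: str) -> list[list[int]]:
    """Group PIDs that share the namespace of the given type."""
    groups: dict[int, list[int]] = defaultdict(list)
    for pid, namespaces in processes.items():
        groups[namespaces[ns_type]].append(pid)
    return sorted(sorted(pids) for pids in groups.values())
```

Here partition(PROCESSES, "pid") yields [[1, 300], [400, 401]], while partition(PROCESSES, "net") yields [[1, 400, 401], [300]]. Neither grouping is more correct than the other, and choosing one of them as the definition of "the container" is exactly the policy decision the kernel avoids.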

The end result is that it appears difficult to implement something like namespacefs in the kernel without introducing some sort of concept of what a container is. There is no more appetite for doing that now than there has been in past years; the lack of a container abstraction in the kernel is seen as having enabled a great deal of innovation on the user-space side. For this reason alone, namespacefs would be a hard sell in the kernel community.

It also appears, though, that it should be possible to get the required information entirely in user space by digging through a lot of /proc files. If there is information that is missing, it can be added to /proc rather than introducing an entirely new filesystem. So that is the approach that Karadzhov will take to solve this problem. Another proof of concept will be put together to show how it would work.
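The following is a rough illustration of that user-space route. It is not Karadzhov's follow-up proof of concept, whose details the article does not give. The sketch takes one /proc/<pid>/ns/pid link, walks upward with the NS_GET_PARENT ioctl from ioctl_ns(2), and records child-to-parent edges. A pure helper then renders the edges as an indented tree. The request code is computed as <linux/nsfs.h> defines it, _IO(0xb7, 0x2), and all function names are hypothetical. ioctl_ns(2) documents EINVAL for non-hierarchical types such as time namespaces, and EPERM once the walk reaches the top of the caller's view.

```python
from __future__ import annotations

import errno
import fcntl
import os

NS_GET_PARENT = (0xB7 << 8) | 0x2  # _IO(NSIO, 0x2) from <linux/nsfs.h>


def parent_edges(ns_path: str) -> dict[int, int | None]:
    """Walk from one PID namespace up to the top visible to the caller.

    Returns {namespace inode: parent inode, or None at the top}.
    """
    edges: dict[int, int | None] = {}
    fd = os.open(ns_path, os.O_RDONLY)
    while True:
        child = os.fstat(fd).st_ino
        try:
            parent_fd = fcntl.ioctl(fd, NS_GET_PARENT)
        except OSError as exc:
            os.close(fd)
            if exc.errno != errno.EPERM:  # EPERM: no parent visible to us
                raise
            edges[child] = None
            return edges
        os.close(fd)
        fd = parent_fd
        edges[child] = os.fstat(fd).st_ino


def render_tree(edges: dict[int, int | None]) -> list[str]:
    """Render {child: parent} edges as indented lines, roots first."""
    children: dict[int | None, list[int]] = {}
    for child, parent in edges.items():
        children.setdefault(parent, []).append(child)
    lines: list[str] = []

    def visit(node: int, depth: int) -> None:
        lines.append("  " * depth + f"pid:[{node}]")
        for kid in sorted(children.get(node, [])):
            visit(kid, depth + 1)

    for root in sorted(children.get(None, [])):
        visit(root, 0)
    return lines
```

To cover a whole system, one would merge the parent_edges() results for every visible /proc/<pid>/ns/pid link with dict.update() before calling render_tree(). The labels are still kernel-assigned inode numbers, so this approach does not solve the naming problem described above.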

If that implementation turns out to be difficult or impossible to do efficiently, then there might be an argument for reconsidering namespacefs. Otherwise, though, a mechanism like namespacefs seems unlikely to make it into the kernel. That particular effort may not have led directly to the desired result, but it did create a discussion that coalesced on a seemingly better solution and, in the process, highlighted some of the constraints brought by the kernel's lack of a container concept. A reluctance to implement policy is generally a good thing, but it can end up making certain kinds of problems harder for users to solve.

Index entries for this article
Kernel: Containers
Kernel: Namespaces


A filesystem for namespaces

Posted Dec 3, 2021 18:45 UTC (Fri) by snajpa (subscriber, #73467) (12 responses)

> There is no more appetite for doing that now than there has been in past years; the lack of a container abstraction in the kernel is seen as having enabled a great deal of innovation on the user-space side. For this reason alone, namespacefs would be a hard sell in the kernel community.

Surely, yes. All that innovation would be _entirely_blocked_ by having an _optional_ directly-in-kernel container abstraction, which could just mean processes couldn't shake it off once contained in it, the container could have a sound name and the community could finally start working on direct in-kernel /proc & /sys abstraction for the namespaces & cgroups the container has. That'd be a nightmare world to live in :)

[if not obvious, I've always found this a bit funny, at times a bit infuriating, but mostly funny as hell :D]

A filesystem for namespaces

Posted Dec 3, 2021 20:07 UTC (Fri) by smurf (subscriber, #17840)

The point is that you cannot have a "container abstraction", optional or otherwise, without forcing disparate namespaces to have common boundaries.

Not having an enforced boundary in namespace A just because you happen to have one in namespace B, however, is exactly what enables many of these user-space innovations.

A filesystem for namespaces

Posted Dec 4, 2021 9:39 UTC (Sat) by NYKevin (subscriber, #129325) (10 responses)

Once you have a kernel-provided abstraction, it inevitably becomes the center of gravity about which all userspace implementations must orbit, either by explicitly supporting it, or by explicitly refusing to support it. Everyone who wants to make a userspace container system has to choose whether and how to support your abstraction, and if they choose not to support it, they have to document that fact and (probably) justify it to people who want the abstraction supported. If your abstraction is good, then there is nothing wrong with this, of course, but it is equally a problem if your abstraction is leaky or only appropriate for some use cases. Therefore, before providing a new abstraction, you should be satisfied that it is a good, or at least reasonable, abstraction to provide, that you can commit to supporting it indefinitely, and that it will not sow discord amongst the existing implementations and use cases.

Another perspective: The people who adopted Unicode early (Microsoft and Sun, mostly) got stuck on UTF-16, and now they can't transition to UTF-8 without breaking backwards compatibility. It may be better to give the technology a bit longer to mature before you start kernelizing things that are already very well supported in userspace at the moment.

A filesystem for namespaces

Posted Dec 4, 2021 22:56 UTC (Sat) by pm215 (subscriber, #98099)

On the other hand, it's possible to be late in recognizing and providing a key abstraction in the kernel -- I think you could make a case for threads being an example of that. We still have some legacy warts as a result that userspace has to jump through silly hoops to work around, like the setuid syscall being per-thread, not per-process.

(I have no dog in the container API question, the analogy just seemed interesting to me.)

A filesystem for namespaces

Posted Dec 5, 2021 14:04 UTC (Sun) by snajpa (subscriber, #73467)

I believe this argument was valid in the beginning. Now everyone uses only the bits they need/like, so we're already in the future :) Not having the container abstraction has managed to get us the innovation many have imagined it would and most importantly, show us these ways of thinking clearly, with pretty exciting example use-cases. Now various system management daemons and even desktop environments use it, etc...

That's not going to go anywhere. My point is that _now_ it may finally be the right time to introduce the container abstraction. We know that we don't want to force it onto everyone + not everyone has the same understanding of what a container even is/means.

Is it really so hard to imagine an "uber" namespace/cgroup/something, 'struct ve' like OpenVZ has, or 'struct jail' like FreeBSD (which already supports nesting btw), only with the upstream Linux innovation of the various cg/ns being optional? We've gotten here by a different approach, yes. That's good! Now we can do better than anyone has ever done so far.

I think that's rather exciting. I don't understand this - almost dogmatic - dismissal of the idea as whole.

A filesystem for namespaces

Posted Dec 6, 2021 0:05 UTC (Mon) by marcH (subscriber, #57642) (7 responses)

> The people who adopted Unicode early (Microsoft and Sun, mostly) got stuck on UTF-16, and now they can't transition to UTF-8 without breaking backwards compatibility.

Good example but just for Unicode completeness I think you mean: "They can't transition to UCS-4".

I don't think anyone intentionally opted for UTF-16, because it sucks as you alluded to. All APIs support either encoded strings (e.g.: UTF-8) or decoded strings (e.g.: UCS-4) whereas UTF-16 is "mostly-decoded for awkward backward compatibility". Not encoded but "mostly decoded" - and a likely infinite source of bugs...

I'm relatively confident what happened is: early Unicode adopters chose:
- UCS-2 for decoded strings / wchar_t
- UTF-8 for encoded strings but easily overridden by user preference.

Later, Unicode said "Wait, in fact we have more than 65536 characters. Sorry about that". Enter the UTF-16 hack replacing UCS-2 for _mostly decoded_ strings. The encoded APIs have not changed and still default to "UTF-8 easily overridden by user preference."
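The "UTF-16 hack" is the surrogate-pair mechanism: a code point above U+FFFF is split across two 16-bit code units. This is why code that treats one unit as one character goes wrong. A short Python illustration of the arithmetic follows; the function name is hypothetical.

```python
from __future__ import annotations


def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need surrogates")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


emoji = "\U0001F600"
print([hex(unit) for unit in to_surrogates(ord(emoji))])  # ['0xd83d', '0xde00']
# One code point, but two UTF-16 code units:
print(len(emoji), len(emoji.encode("utf-16-le")) // 2)    # 1 2
```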

A filesystem for namespaces

Posted Dec 6, 2021 6:53 UTC (Mon) by NYKevin (subscriber, #129325) (5 responses)

As I understand the history here, UTF-8 did not exist at this point. Your options were UCS-2, UCS-4, or "ANSI" (i.e. various legacy non-Unicode codepages such as the venerable Windows-1252). Sun and Microsoft opted for UCS-2, other vendors either picked UCS-4 or ignored the problem. Then the Unicode people realized that UCS-2 was too small to encode everything, and introduced surrogates (creating UTF-16, and renaming UCS-4 to UTF-32 for consistency).

Much later, UTF-8 was introduced as a hack to make non-Unicode aware APIs (i.e. APIs which are incompatible with both UTF-16 and UTF-32, usually because they assumed "no embedded nulls") handle Unicode transparently, or at least in a way that was not entirely wrong. The alternative would have been to introduce wchar_t versions of the entire POSIX API, which would have sucked (and is exactly what Windows ended up doing for backcompat with "ANSI" programs).
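The "no embedded nulls" property can be checked directly. UTF-8 leaves ASCII bytes unchanged and never produces a zero byte except for U+0000 itself. UTF-16 text that is mostly ASCII, by contrast, is full of zero bytes that a C string API would treat as terminators. The small Python check below is offered only as an illustration; the helper name is hypothetical.

```python
def has_embedded_nul(data: bytes) -> bool:
    """True if a C string API would stop early on this byte string."""
    return b"\x00" in data


text = "naïve café 日本"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(has_embedded_nul(utf8))   # False: UTF-8 never emits 0x00 for non-NUL text
print(has_embedded_nul(utf16))  # True: 'n' becomes b"n\x00" in UTF-16LE
# ASCII stays byte-for-byte identical in UTF-8:
print("POSIX path".encode("utf-8") == b"POSIX path")  # True
```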

A filesystem for namespaces

Posted Dec 6, 2021 7:14 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

> As I understand the history here, UTF-8 did not exist at this point.

UTF-1 did exist, and it was backwards (though not forwards) compatible with ASCII.

Pretty much the first major adopter of Unicode was NT. And at that time they simply didn't have anybody fluent in Chinese on the team who would point out that there's no way in hell 2^16 characters are going to be enough.

There was not that much discussion about Unicode at that time at all, you can try searching comp.* hierarchy and barely anything comes up. So it's no wonder that the NT development team decided to go with 16-bit encoding. And the rest was history.

A filesystem for namespaces

Posted Dec 6, 2021 19:07 UTC (Mon) by mpr22 (subscriber, #60784) (3 responses)

> Much later, UTF-8 was introduced

Plan 9 adopted UTF-8 in 1992. (Ken Thompson invented it on a New Jersey diner placemat in September and added it to Plan 9 the next day.)

Windows NT 3.1 (which used UCS-2 but didn't properly support it) was released in 1993.

JDK Beta was released in 1995.

UTF-16 was introduced to support Unicode 2.0, which was published in 1996.

A filesystem for namespaces

Posted Dec 6, 2021 19:30 UTC (Mon) by NYKevin (subscriber, #129325) (1 responses)

You see, this is what happens when you base your entire knowledge of history on oral tradition and random articles on Hacker News: You get all the dates wrong and your version of events is completely incorrect. Sorry for posting misinformation.

A filesystem for namespaces

Posted Dec 7, 2021 19:06 UTC (Tue) by atnot (guest, #124910)

I don't think it has to be wrong. Just that something was introduced at some point doesn't mean it was actually widely used.

Although I suspect that old code that assumed a fixed-length encoding was probably a big factor, considering that is still a leading reason for poor Unicode support today.

A filesystem for namespaces

Posted Dec 9, 2021 11:02 UTC (Thu) by Karellen (subscriber, #67644)

> Windows NT 3.1 (which used UCS-2 but didn't properly support it) was released in 1993.

But NT development, which is where core unicode-everywhere support was first implemented, started in 1989. Even though an NT product hadn't been released yet, they were 3 years into development using UCS-2 before UTF-8 was invented - and there was no guarantee that it would end up being the winner at the time. Even most Linux distros, with 8-bit char string APIs in the kernel, were still primarily using language-specific character encodings until the 2000s.

A filesystem for namespaces

Posted Dec 6, 2021 15:44 UTC (Mon) by marcH (subscriber, #57642)

Found this useful thread in the meantime: https://news.ycombinator.com/item?id=20600195 "The tragedy of UCS-2"

A filesystem for namespaces

Posted Dec 3, 2021 23:12 UTC (Fri) by dullfire (guest, #111432)

It sounds kind of like the "namespacefs" concept would be best served as a FUSE filesystem, since its boundaries/hierarchy and such are at least partly defined by the container implementation.

Off the top of my head I didn't notice anything in the article that would require the implementation be in-kernel.

A filesystem for namespaces

Posted Dec 10, 2021 14:59 UTC (Fri) by smitty_one_each (subscriber, #28989)

> That particular effort may not have led directly to the desired result, but it did create a discussion that coalesced on a seemingly better solution and, in the process, highlighted some of the constraints brought by the kernel's lack of a container concept.

Requirements development: the bargain may be Nietzschean, but what a bargain!


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds