A filesystem for namespaces

Posted Dec 6, 2021 6:53 UTC (Mon) by NYKevin (subscriber, #129325)
In reply to: A filesystem for namespaces by marcH
Parent article: A filesystem for namespaces

As I understand the history here, UTF-8 did not exist at this point. Your options were UCS-2, UCS-4, or "ANSI" (i.e. various legacy non-Unicode codepages such as the venerable Windows-1252). Sun and Microsoft opted for UCS-2, other vendors either picked UCS-4 or ignored the problem. Then the Unicode people realized that UCS-2 was too small to encode everything, and introduced surrogates (creating UTF-16, and renaming UCS-4 to UTF-32 for consistency).

Much later, UTF-8 was introduced as a hack to make non-Unicode-aware APIs (i.e. APIs which are incompatible with both UTF-16 and UTF-32, usually because they assumed "no embedded nulls") handle Unicode transparently, or at least in a way that was not entirely wrong. The alternative would have been to introduce wchar_t versions of the entire POSIX API, which would have sucked (and is exactly what Windows ended up doing for backcompat with "ANSI" programs).
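
To illustrate the "no embedded nulls" point, here is a minimal Python sketch (not from any real API; `c_string_view` is a hypothetical helper that mimics how a NUL-terminated C interface would read a buffer). It suggests why UTF-16 and UTF-32 data cannot pass through such interfaces unchanged, while UTF-8 data can:

```python
# Hypothetical sample text; any string with ASCII characters would do.
text = "héllo/wörld"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")


def c_string_view(buf: bytes) -> bytes:
    """Return the bytes a NUL-terminated C API would see: everything before the first zero byte."""
    return buf.split(b"\x00", 1)[0]


# Which encodings produce zero bytes for this (NUL-free) text?
has_nul = {
    name: b"\x00" in buf
    for name, buf in [("utf-8", utf8), ("utf-16-le", utf16), ("utf-32-le", utf32)]
}

print(has_nul)
print(c_string_view(utf8) == utf8)  # UTF-8 survives intact
print(c_string_view(utf16))         # UTF-16 is cut off after the first byte of "h"
```

In UTF-8 the ASCII bytes, including the `/` path separator, keep their usual values and never appear inside a multi-byte sequence, which is roughly why byte-oriented APIs can carry it without modification.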


A filesystem for namespaces

Posted Dec 6, 2021 7:14 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> As I understand the history here, UTF-8 did not exist at this point.
UTF-1 did exist, and it was backwards (though not forwards) compatible with ASCII.

Pretty much the first major adopter of Unicode was NT. And at that time they simply didn't have anybody fluent in Chinese on the team who could have pointed out that there's no way in hell 2^16 characters were going to be enough.

There was not much discussion about Unicode at that time at all; try searching the comp.* hierarchy and barely anything comes up. So it's no wonder that the NT development team decided to go with a 16-bit encoding. And the rest is history.

A filesystem for namespaces

Posted Dec 6, 2021 19:07 UTC (Mon) by mpr22 (subscriber, #60784) [Link] (3 responses)

> Much later, UTF-8 was introduced

Plan 9 adopted UTF-8 in 1992. (Ken Thompson invented it on a New Jersey diner placemat in September and added it to Plan 9 the next day.)

Windows NT 3.1 (which used UCS-2 but didn't properly support it) was released in 1993.

JDK Beta was released in 1995.

UTF-16 was introduced to support Unicode 2.0, which was published in 1996.

A filesystem for namespaces

Posted Dec 6, 2021 19:30 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (1 responses)

You see, this is what happens when you base your entire knowledge of history on oral tradition and random articles on Hacker News: You get all the dates wrong and your version of events is completely incorrect. Sorry for posting misinformation.

A filesystem for namespaces

Posted Dec 7, 2021 19:06 UTC (Tue) by atnot (guest, #124910) [Link]

I don't think it has to be wrong. Just because something was introduced at some point doesn't mean it was actually widely used.

I do suspect, though, that old code that assumed a fixed-length encoding was a big factor, considering that this is still a leading reason for poor Unicode support today.
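
As a rough illustration of how the fixed-length assumption breaks, here is a small Python sketch (`utf16_code_units` is a hypothetical helper written for this example, not a library function). It encodes a string into UTF-16 code units by hand and shows that one character outside the 16-bit range takes two units, so code that treats "one unit = one character" miscounts and can cut a character in half:

```python
def utf16_code_units(s):
    """Encode a string into a list of UTF-16 code units, using surrogate pairs above U+FFFF."""
    units = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x10000:
            # Fits in a single 16-bit unit (the old UCS-2 range).
            units.append(cp)
        else:
            # Split the 20-bit offset into a high and a low surrogate.
            offset = cp - 0x10000
            units.append(0xD800 + (offset >> 10))
            units.append(0xDC00 + (offset & 0x3FF))
    return units


text = "a\U0001F600"  # "a" followed by one character beyond U+FFFF
units = utf16_code_units(text)

print(len(text))                 # number of code points
print([hex(u) for u in units])   # number of code units is larger

# Naive truncation to "two characters" leaves a lone high surrogate at the end.
truncated = units[:2]
ends_in_lone_surrogate = 0xD800 <= truncated[-1] <= 0xDBFF
print(ends_in_lone_surrogate)
```

Code written against UCS-2 only ever takes the first branch, which is presumably why retrofitting surrogate handling onto it has been so error-prone.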

A filesystem for namespaces

Posted Dec 9, 2021 11:02 UTC (Thu) by Karellen (subscriber, #67644) [Link]

> Windows NT 3.1 (which used UCS-2 but didn't properly support it) was released in 1993.

But NT development, which is where core Unicode-everywhere support was first implemented, started in 1989. Even though an NT product hadn't been released yet, they were 3 years into development using UCS-2 before UTF-8 was invented, and there was no guarantee at the time that it would end up being the winner. Even most Linux distros, with 8-bit char string APIs in the kernel, were still primarily using language-specific character encodings until the 2000s.