
Little things that matter in language design


Posted Jun 8, 2013 8:15 UTC (Sat) by mgedmin (guest, #34497)
Parent article: Little things that matter in language design

Didn't Plan 9 have a "rune" type for 32-bit Unicode characters? Go was created by the same people who created Plan 9, wasn't it?



Little things that matter in language design

Posted Jun 8, 2013 10:22 UTC (Sat) by lsl (guest, #86508) [Link]

Yes, that's where the rune type comes from. Also, the gc toolchain for Go is a direct descendant of the Plan 9 compilers.

Little things that matter in language design

Posted Jun 9, 2013 12:15 UTC (Sun) by tialaramex (subscriber, #21167) [Link] (33 responses)

The article doesn't mention this, but allowing Unicode beyond ASCII in identifiers means you need to do a bunch of extra work that might tempt you to throw in case insensitivity while you're at it.

Unicode is fairly insistent that, for example, although it provides two separate ways to "spell" the e-acute in café (for compatibility reasons), these two spellings are equivalent, and an equality test between them should pass. For this purpose it provides UAX #15, which specifies four distinct normalisation forms, each of which turns equivalent strings into codepoint-identical ones.

If you don't do this normalisation step you can end up with a confusing situation where when the programmer types a symbol (in their text editor which happens to emit pre-combined characters) the toolchain can't match it to a visually and lexicographically identical character mentioned in another file which happened to be written with separate combining characters. This would obviously be very frustrating.

On the other hand, to completely fulfil Unicode's intentions either your language runtime or any binary you compile that does a string comparison needs to embed many kilobytes (perhaps megabytes) of Unicode tables in order to perform the normalisation steps correctly.
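For illustration, Python's standard unicodedata module shows both the mismatch and the effect of normalisation (any of the four forms would do for this example; NFC is used here):

```python
import unicodedata

# Two spellings of "café": precomposed é (U+00E9) vs. e + combining acute (U+0301)
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# Visually identical, yet a naive comparison fails
print(precomposed == decomposed)  # False

# After normalising both to NFC they become codepoint-identical
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)  # True
```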

Little things that matter in language design

Posted Jun 9, 2013 12:32 UTC (Sun) by mpr22 (subscriber, #60784) [Link] (20 responses)

Case-insensitivity, Unicode, interoperation between Turks and non-Turks. Pick two.

Little things that matter in language design

Posted Jun 10, 2013 0:23 UTC (Mon) by dvdeug (subscriber, #10998) [Link] (19 responses)

How do you get case-insensitivity and interoperation between Turks and non-Turks? It's not a Unicode problem; Turks want i (ordinary i) to uppercase to İ (I with a dot), and non-Turks don't. Short of making a special Turkish i and I - which comes with its own problems, and which nobody does - that's going to be a problem.
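The clash shows up concretely in, for example, Python, whose str.upper()/str.lower() are locale-independent; this small sketch just illustrates the standard Unicode case mappings involved:

```python
# Locale-independent casing, as most languages provide it:
print("i".upper())   # "I" -- right for English, wrong for Turkish,
                     # where i should uppercase to İ (dotted capital I)
print("ı".upper())   # "I" -- Turkish dotless ı does uppercase to plain I

# Per Unicode's SpecialCasing, İ lowercases to "i" plus a combining
# dot above (U+0307), not to a plain one-codepoint "i"
lowered = "İ".lower()
print(lowered == "i\u0307", len(lowered))
```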

Little things that matter in language design

Posted Jun 11, 2013 9:56 UTC (Tue) by khim (subscriber, #9252) [Link] (18 responses)

Since the offer was "pick two" and you've decided to throw Unicode out, the solution is obvious.

Short of making a special Turkish i and I, which comes with its own problems and nobody does, that's going to be a problem.

Sure, but it is a way to achieve case-insensitivity and interoperation between Turks and non-Turks.

Little things that matter in language design

Posted Jun 11, 2013 20:04 UTC (Tue) by dvdeug (subscriber, #10998) [Link] (17 responses)

Just because someone tells you "cheap, fast, or good - pick two" doesn't mean it's true.

Creating a new character set only achieves interoperation in a theoretical way, since nobody is using it. You've not thrown out just Unicode; you've thrown out any character set that has seen actual use for Turkish.

Even if you do, and get everyone to use it, how much bad data is going to get created? Imagine a keyboard with three i keys; we'd get plenty of data with the wrong i or the wrong I. You've also created a whole new set of spoofing characters; Microsoft had better race to get Microsoft.com (with a Turkish i), as should everyone else with an i in their name.

Little things that matter in language design

Posted Jun 11, 2013 22:04 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (16 responses)

Ukrainian has its own letter 'i' which is distinct from ASCII 'i'. It works just fine.

It would be another story if dotless 'i' was the only unique letter in Turkish, but it's not. There also are: Ç, Ğ, I, İ, Ö, Ş, and Ü.

Little things that matter in language design

Posted Jun 12, 2013 5:12 UTC (Wed) by dvdeug (subscriber, #10998) [Link] (15 responses)

That's because, unlike Turkish, Ukrainian also has a complete alphabet that's distinct from ASCII. People who use languages written in Cyrillic have to switch their whole keyboard back and forth to write in English, unlike people who type in Turkish.

Even then, it doesn't work just fine. There are rules against registering mixed-script domain names, and web browsers will display microsoft.com differently from mіcrosoft.com because they detect the mixed script. Other places without that special code will provide no hint that the two aren't the same.

Having different characters with the same glyphs in the same script is even more problematic, because that special code won't work; there's no way a program could tell that microsoft.com (with a Turkish i) was a spoofing attempt.

Little things that matter in language design

Posted Jun 12, 2013 13:57 UTC (Wed) by khim (subscriber, #9252) [Link] (14 responses)

I don't really see your point: you are still trying to explain why dropping Unicode for the sake of keeping case-insensitivity and interoperation between Turks and non-Turks is a dumb choice. Yes, it's dumb; people usually pick some other pair. But that does not change the fact that it may work just fine (for a certain definition of "just").

Little things that matter in language design

Posted Jun 12, 2013 19:01 UTC (Wed) by dvdeug (subscriber, #10998) [Link] (13 responses)

I'm explaining that it's not a real option for anyone who doesn't control their own universe. It's not just Unicode; it's every Turkish character set ever. It's what Turkish keyboards give you. Nobody ever picks that pair because it's not a real option.

Little things that matter in language design

Posted Jun 14, 2013 21:40 UTC (Fri) by khim (subscriber, #9252) [Link] (12 responses)

Yet this is how the problem used to be solved for Russian. Early computers in the USSR only had Russian letters, which were distinct from the Latin ones. And they, too, had this upcase problem (the upcase of the Russian "у" was "У", and of the Latin "y" was "Y"). It's not clear why the Turks cannot adopt the same solution. Well, "for historical reasons", probably - but that's still a "Unicode" choice.

Little things that matter in language design

Posted Jun 14, 2013 23:33 UTC (Fri) by dvdeug (subscriber, #10998) [Link] (11 responses)

I don't think you can call choosing any currently existing Turkish character set a "Unicode" choice. If we're going to dismiss history and how Turks currently use their computers, we could go further and change their whole writing system.

Russian is written in the Cyrillic alphabet, unlike Turkish which is written in the Latin alphabet. It's not written in the Latin alphabet by accident; it was changed from the Arabic alphabet in 1927 in an attempt to modernize the country and attach themselves politically and culturally to the successful West. Separating the Turkish alphabet from the Latin is not a neutral act, particularly when you don't do the same to the French or Romanian.

Little things that matter in language design

Posted Jun 15, 2013 14:19 UTC (Sat) by khim (subscriber, #9252) [Link] (10 responses)

Separating the Turkish alphabet from the Latin is not a neutral act, particularly when you don't do the same to the French or Romanian.

Sure. But this is what Unicode is all about. Unicode didn't happen in one step. Early character encodings were... strange (from today's point of view). Not just on Russian computers; on US-based computers, too (think EBCDIC and all those strange symbols used by APL). Eventually some groups of symbols were put together and some other symbols were separated. Not just Cyrillic, but Greek (a script as closely related to Cyrillic as Turkish is related to Romanian), etc. Why are Telugu and Kannada separated while Chinese and Japanese Han characters are merged? If we want to make upcase/lowercase functions locale-independent, we can do with Turkish (French, Romanian, etc.) what was done with Telugu and Kannada.

Little things that matter in language design

Posted Jun 15, 2013 14:52 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

The relationship between the Turkish variant of the Latin alphabet and some other random European variant of the Latin alphabet more closely resembles the relationship between the Serbian and Russian variants of the Cyrillic alphabet than the relationship between the Cyrillic alphabet and the Greek alphabet.

Little things that matter in language design

Posted Jun 15, 2013 22:52 UTC (Sat) by dvdeug (subscriber, #10998) [Link] (6 responses)

"Unicode didn't happen in one step" is blaming Unicode for the entire history of computing.

If you don't care whether the Turks are going to use your character set, go ahead and tell them to use ASCII. If you choose to separate their alphabet from the Latin one, you're going to have the problem that they consider their alphabet part of the extended Latin alphabet, and they're not going to find that an acceptable solution. If you choose to separate out the alphabets of thousands of languages (even though the English alphabet is a superset of the French and Latin ones), you might mollify the Turks, but nobody is going to use your character set.

In reality, Turkish support requires locale-sensitive casing functions because every other solution has serious technical and often political problems, as well as not being compatible with existing systems, including keyboards.

Little things that matter in language design

Posted Jun 16, 2013 3:30 UTC (Sun) by hummassa (guest, #307) [Link] (2 responses)

...
> In reality, Turkish support requires locale-sensitive casing functions
...

Let's be plain: there are no "casing functions" that are not locale-sensitive. The Turkish dotted and dotless "i"s are one example, the German vs. Austrian "ß" is another, etc. And don't get me started on collation order. If someone is going to try to simplify computation by giving each locale its own alphabet, I wish them good luck with their newnicode. The real Unicode thankfully does not work that way. Usually, at least. :-D
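The German ß makes the same point without leaving Python's standard library: locale-independent upper()/lower() are not even inverses of each other, which is why case-insensitive matching needs a separate operation (casefold() here):

```python
# Uppercasing ß expands it to "SS"; lowercasing cannot undo that
word = "straße"
print(word.upper())                  # "STRASSE"
print(word.upper().lower())          # "strasse" -- the ß is gone
print(word.upper().lower() == word)  # False

# casefold() exists precisely because case-insensitive comparison
# needs more than a round trip through upper()/lower()
print(word.casefold() == "strasse".casefold())  # True
```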

Little things that matter in language design

Posted Jun 16, 2013 8:21 UTC (Sun) by khim (subscriber, #9252) [Link] (1 responses)

If one is going to try to facilitate computations by separating each locale to an alphabet, I wish good luck with its newnicode. The real Unicode thankfully does not work that way. Usually, at least. :-D

Well, that's certainly a pity: Unicode was developed to fit in 16 bits and thus merged many scripts (it assumed that language information would be carried "on the side" and/or would matter less than the glyphs themselves). They have failed (today there are over 90,000 glyphs in Unicode), yet as a result we cannot properly work with English+Turkish (or even German+Austrian) texts, as you've correctly pointed out.

Today we are stuck: yes, it's not perfect, and this decision certainly made life harder, not easier, but it'll be hard to replace it with anything else at this point. It's a similar story to QWERTY. The numerous problems which stem from that old decision are considered minor enough that it'll be hard to switch. But note that the most popular OS does exactly that for CJK. It's slowly but surely being replaced by Unicode-based OSes (such as Android), thus in the end Unicode is probably inevitable, but that does not mean you cannot achieve interoperability with Turkish people and working upcase/lowercase simultaneously. You can - Unicode prevents that, nothing else.

Little things that matter in language design

Posted Jun 16, 2013 10:33 UTC (Sun) by dvdeug (subscriber, #10998) [Link]

Let's note that you want every one of 5,000 different languages to have its own code page; your comment about German+Austrian implies that you want every subdialect to have its own code page. And that's not approaching the question of how you want to deal with sometimes wildly different orthographies for one language.

"this decision certainly made life harder, not easier"

There's no certainly about it. To type "mv Das_Boot_German.avi Boata_filmoj" in your system you'd have to change keyboards several times: from whatever language mv is in, to German, to English, possibly to whatever language you count avi as, then to Esperanto. Right now, you can type that from any keyboard that supports the ISO standard 26-letter alphabet. You couldn't search a document for Bremen without knowing whether someone considered it a German word or an English word, and e = mc², originally written by a German speaker but understood worldwide, would get an arbitrary language tag. While there are some Cyrillic and Greek look-alikes for Latin-script words, you would explode that problem: "go" could be encoded any number of ways, and any non-English speaker would have to switch their keyboard to go to lwn.net or google.com or any other English-named site.

"note that the most popular OS does exactly that for CJK."

Note that the article you link to does not say Tron is the most popular OS, and that it does not do exactly that for CJK, because Chinese is not one language; it's a rather messy collection of languages. Tron forces Cantonese to be written in the same script as Mandarin and Jinyu. Note also that Tron treats Turkish exactly the same way Unicode does, as it's a copy of Unicode everywhere but the Han characters.

"You can - Unicode prevents that, nothing else."

If by Unicode you mean every character set ever used for Turkish (including Tron), sure. I've never seen a fully worked-out draft of a character set that fits your specifications. It's never very convincing, is it, when someone claims something would clearly be easier, yet it's never been tried.

Little things that matter in language design

Posted Jun 16, 2013 6:35 UTC (Sun) by micka (subscriber, #38720) [Link] (2 responses)

> even though the English alphabet is a superset of the French and Latin

I suppose you mean "subset"? As in, the English alphabet is strictly included in the French alphabet (it's French without é, è, à, ...) and in the Latin alphabet (where I see no difference)?

Little things that matter in language design

Posted Jun 16, 2013 9:17 UTC (Sun) by dvdeug (subscriber, #10998) [Link] (1 responses)

English uses a lot of diacritics on characters if you look hard enough. Façade and résumé are completely standard spellings; coöperate is still used, by the New Yorker, for example. I don't know that there are any words where ÿ is used, so superset might be too strong, but it's certainly not a subset.

(If we're strictly speaking of the alphabet, neither of them count accents, so both French and English have the same 26 letters for the alphabet.)

Little things that matter in language design

Posted Jun 16, 2013 10:49 UTC (Sun) by micka (subscriber, #38720) [Link]

Depending on your sources, the French alphabet has either 26 letters, leaving out diacritics, or 42 letters, counting diacritics and ligatures (œ and æ) separately (much the same as ß, I suppose). Even the French and English versions of the "French alphabet" article on Wikipedia give different counts (an error, or a cultural difference in specialist vocabulary? I know, for example, that "ring" in mathematics has related but different definitions for French and American mathematicians).

The Spanish alphabet is more consistently considered to have 27 letters, even though ñ could be considered an n with a diacritic. And in the past even some combinations of letters (from the point of view of the Latin alphabet) were considered separate letters.

And I'm not even talking about http://en.wikipedia.org/wiki/Alphabet_%28computer_science%29 (where each diacritic variant would be considered a different letter).

Little things that matter in language design

Posted Jun 16, 2013 5:36 UTC (Sun) by viro (subscriber, #7872) [Link] (1 responses)

You can easily have a text in English with quoted sentences in French or in Turkish, using the same font. Try the same with e.g. Russian and Greek and see if you will be able to read the result[1]. The Turkish and French alphabets are Latin with some diacritics added; current Cyrillic is much more distant from Greek than that, as you bloody well know.

[1] lowercase glyphs aside, (И, Н) and (Η, Ν) alone are enough to render the result unreadable (the shift happened circa the 16th century, IIRC; at some point the counterparts of both Eta and Nu got the slant of their middle strokes changed in the same way, turning 'Ν' into 'Н' and 'Η' into 'И')

Little things that matter in language design

Posted Jun 16, 2013 7:58 UTC (Sun) by khim (subscriber, #9252) [Link]

You can easily have a text in English with quoted sentences in French or in Turkish, using the same font. Try the same with e.g. Russian and Greek and see if you will be able to read the result[1].

Of course you could. What's the problem? You'll probably be forced to read the Greek letter-by-letter, but an English-speaking person will mangle French or Turkish, too. It's not as if mere resemblance between letters of the alphabet is what matters in this case: English and French may use similar-looking characters, but they use them to encode radically different consonants, vowels and words.

[1] lowercase glyphs aside, (И, Н) and (Η, Ν) alone are enough to render the result unreadable (the shift happened circa the 16th century, IIRC; at some point the counterparts of both Eta and Nu got the slant of their middle strokes changed in the same way, turning 'Ν' into 'Н' and 'Η' into 'И')

If you don't know which language is used, you cannot read a word, period. Identical-looking words in French and Turkish will have radically different pronunciations and will be, in fact, different words.

Little things that matter in language design

Posted Jun 9, 2013 20:09 UTC (Sun) by khim (subscriber, #9252) [Link] (11 responses)

For this purpose it provides UAX #15

Which nobody uses in programming languages, for performance reasons.

If you don't do this normalisation step you can end up with a confusing situation where when the programmer types a symbol (in their text editor which happens to emit pre-combined characters) the toolchain can't match it to a visually and lexicographically identical character mentioned in another file which happened to be written with separate combining characters. This would obviously be very frustrating.

It's not as frustrating as you think. They don't type ı followed by ˙, they just type i. And the same with other cases. Any other approach is crazy. Why? Well, because many programming languages will show ı combined with ˙ as "ı˙", not as "ı̇".

You may say that ı˙ is not the canonical representation of "i". OK. "и" plus " ̆" is the canonical representation of "й". Try this for size:

$ cat test.c
#include <stdio.h>

int main(void) {
    /* UTF-8 bytes for "и" (0xD0 0xB8) plus combining breve (0xCC 0x86),
       compared with the precomposed "й" (0xD0 0xB9) */
    printf("%c%c%c%c == %c%c\n", 0xD0, 0xB8, 0xCC, 0x86, 0xD0, 0xB9);
    return 0;
}
$ gcc test.c -o test
$ ./test | tee test.txt
й == й

Not sure about you, but on my system these two symbols only look similar when copy-pasted into a browser - and then only in the main window (if I copy-paste them into the "location" bar they suddenly look different!). And of course these two symbols look different in GNOME Terminal, gEdit, Emacs and other tools!

Thus, in the end you have two choices:

  1. Compare strings as sequences of bytes. Result: simple, clean, robust code, but the toolchain can't match a symbol to a visually and lexicographically identical character mentioned in another file.
  2. Compare strings as UAX #15 says. Result: a huge pile of complicated code, and the toolchain can match a symbol to a visually and lexicographically different character mentioned in another file.

Frankly, I don't see the second alternative as superior.
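A minimal Python sketch of the two comparison strategies, using the same "й" example and only the standard unicodedata module:

```python
import unicodedata

composed = "\u0439"          # precomposed й
decomposed = "\u0438\u0306"  # и followed by a combining breve

# Choice 1: compare the raw byte sequences -- simple, but the two
# spellings of the same letter do not match
bytes_equal = composed.encode("utf-8") == decomposed.encode("utf-8")
print(bytes_equal)  # False

# Choice 2: normalise to NFC first -- the spellings match, at the cost
# of shipping the Unicode composition tables
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))
print(nfc_equal)  # True
```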

Little things that matter in language design

Posted Jun 9, 2013 20:48 UTC (Sun) by hummassa (guest, #307) [Link] (7 responses)

> Which nobody uses in programming languages because of performance reason.

(UAX-15). I use it. Perl offers NFC, NFD, NFKC, NFKD without a huge perceivable (to me) performance penalty. AFAICT MySQL uses it, too.

> It's not as frustrating as you think. They don't type ı followed by ˙, they just type i. And the same with other cases. Any other approach is crazy. Why? Well, because many programming languages will show ı combined with ˙ as "ı˙", not as "ı̇".

This silly example tells me you don't have diacritics in your name, do you? Sometimes the "ã" in my last name is on one of the Alt-Gr keys. Sometimes I have to enter it via vi digraphs, either as "a" + "~" or "~" + "a". Sometimes I click "a", "combine", "~" or "~", "combine", "a". Or "~" (it's a combining key on my current keyboard by default, so that if I want to type a plain tilde I have to follow it with a space or type it twice) followed by "a".

> й == й
> Not sure about you but on my system these two symbols only look similar when copy-pasted in browser - and then only in the main window (if I copy-paste them to "location" line they suddenly looks differently!). And of course these two symbols are different in GNOME terminal, gEdit, Emacs and other tools!

it seems to me that your system is misconfigured. I could not see the difference between "й" and "й" in my computer, be it in Chrome's main window, location bar, gvim, or in yakuake's konsole window.

> Frankly I don't see second alternative as superior.

UAX #15 is important. People sometimes type their names with or without diacritics (André versus Andre). Some names are in different databases with variant -- and database/time/platform dependent -- spellings. On some keyboards a "ç" c-cedilla is a single character; on others you press the cedilla dead key and then "c"; and on others you type, for instance, the acute dead key followed by "c" (as on the keyboard I'm typing on right now). Sometimes you have to say your name over the phone and the person on the other side of the call must be able to search the database by the perceived name. Someone could have entered "fi" and another person is searching by "fi".

So, sometimes your "second alternative" is the only viable alternative. Anyway, the programming language should support "compare bytes" and "compare runes/characters" as two different use cases.

Little things that matter in language design

Posted Jun 9, 2013 21:14 UTC (Sun) by khim (subscriber, #9252) [Link] (2 responses)

Anyway, the programming language should support "compare bytes" and "compare runes/characters" as two different use cases.

I may be mistaken, but it looks like you are discussing a completely different problem. Both tialaramex and I are talking about programming languages themselves.

(UAX-15). I use it. Perl offers NFC, NFD, NFKC, NFKD without a huge perceivable (to me) performance penalty.

Really? Let me check:
$ cat test.pl
use utf8;

$й="This is test";

print "Combined version works: \"$й\"\n";
print "Decomposed version does not work: \"$й\"\n";
$ perl test.pl
Combined version works: "This is test"
Decomposed version does not work: ""

Am I missing something? What should I add to my program to make sure I can refer to $й as $й?

it seems to me that your system is misconfigured. I could not see the difference between "й" and "й" in my computer, be it in Chrome's main window, location bar, gvim, or in yakuake's konsole window.

Of course not! You've replaced all occurrences of "й" with "й" - of course there will be no difference! Not sure why you did that (perhaps your browser did it for you?), but if you do a "view source" on my message you'll see a difference; if you do the same with your message, both cases are byte-for-byte identical. It would be a little strange to see different symbols in such a case.

UAX15 is important.

Sure. In databases, search systems and so on (where fuzzy matching is better than no matching) it's important. In programming languages? Not so much. Most of the time when a language tries to save programmers from themselves, it just makes them miserable in the long (and even medium) term.

Little things that matter in language design

Posted Jun 10, 2013 16:12 UTC (Mon) by jzbiciak (guest, #5246) [Link]

Wow... Abusing the difference between й and й (and other cases of such fun) would make for some great obfuscated code. Or better yet, subtly malicious code.

Little things that matter in language design

Posted Jun 10, 2013 17:27 UTC (Mon) by hummassa (guest, #307) [Link]

> I may be mistaken, but it looks like you are discussing a completely different problem. Both tialaramex and I are talking about programming languages themselves.

You are right about this and I apologize for any confusion.

Little things that matter in language design

Posted Jun 9, 2013 23:38 UTC (Sun) by wahern (subscriber, #37304) [Link] (3 responses)

Perl6 also has NFG, which is probably the best normalization form out of all of them, although non-standard. It's not really even just a normalization form, but addresses issues of representation and comparison at the implementation level.

Using NFG solves all the low-level problems, including identifiers in source code, by getting rid of combining sequences altogether. Frankly I don't understand why it hasn't become more common. Maybe because most people just don't care about Unicode. Every individual has come to terms with the little issues with their locale. It's only when you look at all of them from 10,000 feet that you can see the cluster f*ck of problems. But few people look at it from 10,000 feet.

Little things that matter in language design

Posted Jun 11, 2013 1:07 UTC (Tue) by dvdeug (subscriber, #10998) [Link] (2 responses)

NFG isn't a normalization form at all. It doesn't get rid of combining sequences at all; it just invents dummy characters to hide combining sequences from the user. It's not that hard to generate a billion different combining sequences and potentially DoS any system using NFG. Ultimately, it's a lot of complexity for most systems that doesn't gain you that much over NFC.

Little things that matter in language design

Posted Jun 13, 2013 1:19 UTC (Thu) by wahern (subscriber, #37304) [Link] (1 responses)

You can DoS any system that doesn't use the correct algorithms. There are ways of implementing NFG that don't require storing every cluster ever encountered.

And it's not like existing systems don't have their own issues. The nice thing about NFG is that all the complexity is placed at the edges, in the I/O layers. All the other code, including the rapidly developed code that is usually poorly scrutinized for errors, is provided a much safer and more convenient interface for manipulation of I18N text. NFG isn't more complex to implement than any other system that provides absolute grapheme indexing. It's grapheme indexing that is the most intuitive, because it's the model everybody has been using for generations.

But most languages merely aim for half measures, and are content leaving applications to deal w/ all the corner cases. This is why UTF-8 is so popular. And it is the best solution when your goal is pushing all the complexity onto the application.

Little things that matter in language design

Posted Jun 14, 2013 0:22 UTC (Fri) by dvdeug (subscriber, #10998) [Link]

This is the 21st century; for the most part, I don't index anything. I have iterators to do that work for me, and arbitrary cursors when I need a location. If I want to work with graphemes, I can step between graphemes. If I want to work with words, I can step between words.

Grapheme indexing is not what everybody has been using for generations. In the 60 years of computing history, people working with scripts more complex than ASCII or Chinese have handled them in a number of ways, including character sets that explicitly encoded combining characters (like ISO/IEC 6937) and the use of BS with ASCII to overstrike characters like ^ onto the previous character.

UTF-8 is so popular because for many purposes it's 1/4 the size of UTF-32, and for normal data never worse than 3/4 the size. And as long as you're messing with ASCII, you can generally ignore the differences. If people want UTF-32, it's easy to find.
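The size claims are easy to check in, say, Python, where encode() gives the byte length directly (a quick illustration, not tied to any particular system):

```python
# ASCII text: UTF-8 uses 1 byte per character, UTF-32 always uses 4
ascii_text = "hello world"
utf8_len = len(ascii_text.encode("utf-8"))       # 11 bytes
utf32_len = len(ascii_text.encode("utf-32-le"))  # 44 bytes -- 4x larger

# Cyrillic text: 2 bytes per character in UTF-8, still 4 in UTF-32
cyrillic_text = "привет"
utf8_ru = len(cyrillic_text.encode("utf-8"))       # 12 bytes
utf32_ru = len(cyrillic_text.encode("utf-32-le"))  # 24 bytes

print(utf8_len, utf32_len, utf8_ru, utf32_ru)
```

(The "-le" suffix just avoids counting the byte-order mark that plain "utf-32" prepends.)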

Little things that matter in language design

Posted Jun 10, 2013 16:58 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (2 responses)

The idea that programming languages don't use UAX #15 for symbol matching due to performance problems would be an easier sell if UAX #15 came anywhere near the difficulty of something like C++ symbol mangling.

You seem to be suffering some quite serious display problems with non-ASCII text on your system, I don't know what to suggest other than maybe you can find someone to help figure out what you did wrong, or upgrade to something a bit more modern. I've seen glitches like those you describe but mostly quite some years ago. Your example program displays two visually identical characters on my system but I can believe your system doesn't do this, only I would point out that it's /a bug/.

Even allowing for that, your last paragraph is hard to understand. Are you claiming that because some symbols on your system are rendered incorrectly depending on how they were encoded, those symbols are _different_ lexicographically, and everybody else (who can't see these erroneous display differences) should accept that?

Little things that matter in language design

Posted Jun 11, 2013 9:07 UTC (Tue) by etienne (guest, #25256) [Link] (1 responses)

Just a $0.02:
> You seem to be suffering some quite serious display problems with non-ASCII text on your system

It seems (some) people want to use a fixed-width font to write programs, mostly because some Quality Enhancement Program declared the TAB character obsolete, and the SPACE character's width is not constant in variable-width-font editors.
Most programming languages need indentation...
With non-ASCII chars in a fixed-width font - if you even get the char's shape in the font you are using - the only solution is probably to start drawing each char every N (constant) pixels and let the ends of wide chars overlap the beginning of the next char...

Little things that matter in language design

Posted Jun 11, 2013 10:13 UTC (Tue) by mpr22 (subscriber, #60784) [Link]

I use a fixed-width font to write code chiefly out of pure inertia: most of my coding is done in text editors running in character-cell terminals. Code written in Inform 7 is an exception (the Inform 7 IDE's editor uses a proportional font by default, and the IDE is so well-adapted to the needs of typical Inform 7 programming that not using it is silly), but Inform 7 statements look like (somewhat stilted) English prose so I don't mind so much.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds