

Little things that matter in language design

Posted Jun 14, 2013 0:22 UTC (Fri) by dvdeug (subscriber, #10998)
In reply to: Little things that matter in language design by wahern
Parent article: Little things that matter in language design

This is the 21st century; for the most part, I don't index anything. I have iterators to do that work for me, and arbitrary cursors when I need a location. If I want to work with graphemes, I can step between graphemes. If I want to work with words, I can step between words.
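As a rough illustration of stepping between graphemes instead of indexing code points, here is a minimal Python sketch. The function name `approx_graphemes` is hypothetical, and it only approximates grapheme clusters: it treats a cluster as a base character followed by any combining marks (general categories Mn, Mc, Me). It deliberately ignores the rest of the Unicode UAX #29 rules (CR LF, Hangul syllables, ZWJ emoji sequences, regional-indicator pairs).

```python
import unicodedata
from typing import Iterator

# General categories treated as "attach to the previous character".
# Simplification: real UAX #29 segmentation uses Grapheme_Cluster_Break properties.
COMBINING_CATEGORIES = ("Mn", "Mc", "Me")


def approx_graphemes(text: str) -> Iterator[str]:
    """Yield approximate grapheme clusters (base char + trailing combining marks)."""
    start = 0
    for i in range(1, len(text) + 1):
        # A boundary falls before index i unless text[i] is a combining mark.
        if i == len(text) or unicodedata.category(text[i]) not in COMBINING_CATEGORIES:
            yield text[start:i]
            start = i


word = "cafe\u0301"  # "café" spelled with a combining acute accent
print(len(word))                          # counts code points
print(len(list(approx_graphemes(word))))  # counts approximate graphemes
```

Here `len(word)` counts 5 code points while the iterator yields 4 clusters, and the caller never needs an index into the string. For full extended-grapheme segmentation in Python, the third-party `regex` module's `\X` pattern is one option.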

Grapheme indexing is not what everybody has been using for generations. Over 60 years of computing history, people working with scripts more complex than ASCII or Chinese have handled them in a number of ways, including character sets that explicitly encoded combining characters (such as ISO/IEC 6937) and using BS (backspace) in ASCII to overstrike a character like ^ onto the previous character.

UTF-8 is so popular because for many purposes it's a quarter the size of UTF-32, and for normal data it's never worse than 3/4 the size, since every character in the Basic Multilingual Plane takes at most 3 bytes in UTF-8 versus 4 in UTF-32. And as long as you're working with ASCII, you can generally ignore the differences. If people want UTF-32, it's easy to find.
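The size ratios are easy to check with a short Python sketch. The helper `encoded_sizes` is a hypothetical name; it uses the `utf-32-le` codec so the count excludes the 4-byte byte-order mark that plain `utf-32` would prepend.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-32 bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))


# ASCII, BMP CJK text, and a supplementary-plane emoji.
for sample in ["hello", "日本語", "😀"]:
    u8, u32 = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 {u8} bytes, UTF-32 {u32} bytes, ratio {u8 / u32:.2f}")
```

ASCII comes out at a ratio of 0.25 and the CJK sample at 0.75; only characters outside the Basic Multilingual Plane, such as most emoji, reach parity at 4 bytes each.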




Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds