- From: John Cowan <cowan@mercury.ccil.org>
- Date: Mon, 18 Nov 2013 13:41:53 -0500
- To: Pete Cordell <petejson@codalogic.com>
- Cc: Tim Bray <tbray@textuality.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, "Henry S. Thompson" <ht@inf.ed.ac.uk>, IETF Discussion <ietf@ietf.org>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, www-tag@w3.org
Pete Cordell scripsit:
> Do you mean that the presence of a UTF-8 BOM sequence doesn't prove
> that it's not Windows cp-1252 or do you mean you can tell apart a
> UTF-8 and cp-1252 file without BOMs?
I meant the latter, but the former is true, too. A plain text document
beginning "ï»¿" in Windows-1252 will appear to begin with a UTF-8 BOM
in the absence of out-of-band information.
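
As a small illustration of that ambiguity (a sketch in Python, not taken
from any tool mentioned in this thread), the same three bytes, EF BB BF,
decode to a single BOM code point under UTF-8 and to three printable
characters under Windows-1252:

```python
import codecs

# The UTF-8 encoding of U+FEFF, i.e. the bytes EF BB BF.
bom = codecs.BOM_UTF8

# Read as UTF-8, the bytes are one code point, U+FEFF.
as_utf8 = bom.decode("utf-8")

# Read as Windows-1252, the same bytes are three characters:
# U+00EF (i with diaeresis), U+00BB (right guillemet), U+00BF (inverted ?).
as_cp1252 = bom.decode("cp1252")

# ascii() keeps the output readable on any terminal encoding.
print(ascii(as_utf8))
print(ascii(as_cp1252))
```

Nothing in the bytes themselves says which reading was intended.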
> If the latter, do the relevant tools take the time to distinguish
> the 2 without BOMs?
Some tools do, some don't. The IRC client I use, XChat, attempts to
convert input as UTF-8, and if that fails, converts it as Latin-1.
I have not yet seen it produce mojibake.
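
The fallback described above might be sketched as follows. This is an
illustration in Python, not XChat's actual code, and the function name
decode_with_fallback is made up for the example. The approach relies on
strict UTF-8 decoding rejecting most byte sequences that are not UTF-8,
whereas Latin-1 maps every byte to a character and so never fails.

```python
def decode_with_fallback(data: bytes) -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else as Latin-1.

    Hypothetical helper for illustration only.
    """
    try:
        # bytes.decode() is strict by default: invalid UTF-8 raises.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 assigns a character to every byte value, so this
        # branch cannot raise.
        return data.decode("latin-1")


# "caf\u00e9" is "cafe" with an acute accent on the final letter.
word = "caf\u00e9"
from_utf8 = decode_with_fallback(word.encode("utf-8"))
from_latin1 = decode_with_fallback(word.encode("latin-1"))
```

Both calls return the same string here, because the lone byte E9 in the
Latin-1 input is not valid UTF-8 and triggers the fallback. Latin-1 text
that happens to form valid UTF-8 sequences would still be misread, but
such text is rare in practice.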
--
John Cowan cowan@ccil.org http://www.ccil.org/~cowan
Most languages are dramatically underdescribed, and at least one is
dramatically overdescribed. Still other languages are simultaneously
overdescribed and underdescribed. Welsh pertains to the third category.
--Alan King
Received on Monday, 18 November 2013 18:42:29 UTC