- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Mon, 18 Nov 2013 14:48:19 +0100
- To: ht@inf.ed.ac.uk (Henry S. Thompson)
- Cc: Martin J. Dürst <duerst@it.aoyama.ac.jp>, IETF Discussion <ietf@ietf.org>, JSON WG <json@ietf.org>, Anne van Kesteren <annevk@annevk.nl>, www-tag@w3.org, es-discuss <es-discuss@mozilla.org>
* Henry S. Thompson wrote:
>I'm curious to know what level you're invoking the parser at. As
>implied by my previous post about the Python 'requests' package, it
>handles application/json resources by stripping any initial BOM it
>finds -- you can try this with
>
>>>> import requests
>>>> r=requests.get("http://www.ltg.ed.ac.uk/ov-test/b16le.json")
>>>> r.json()
The Perl code was
  perl -MJSON -MEncode -e "my $s = encode_utf8(chr 0xFEFF) . '[]'; JSON->new->decode($s)"
The Python code was
  import json
  json.loads(u"\uFEFF[]".encode('utf-8'))
The Go code was
  package main

  import "encoding/json"
  import "fmt"

  func main() {
    r := "\uFEFF[]"
    var f interface{}
    err := json.Unmarshal([]byte(r), &f)
    fmt.Println(err)
  }
In other words, each test passes a UTF-8 encoded byte string directly to
the byte string parsing part of the JSON implementation. RFC 4627 is the
only specification for the application/json on-the-wire format, and it
does not mention Unicode signatures at all. Looking for certain byte
sequences at the beginning and treating them as a Unicode signature is
the same as looking for `/* ... */` and treating it as a comment: both
are extensions that the specification does not define.
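
To make the distinction concrete, here is a minimal sketch in Python 3
(the test above was run as Python 2). It contrasts a parser that sees
the bytes exactly as they arrived with one that first looks for a
Unicode signature and drops it. The function names are hypothetical
and are not part of any library; only json.loads and the 'utf-8' and
'utf-8-sig' codecs come from the standard library.

```python
import json

# The same input as in the tests above: U+FEFF followed by "[]",
# encoded as UTF-8 (bytes EF BB BF 5B 5D).
BOM_PREFIXED = "\ufeff[]".encode("utf-8")


def parse_strict(data: bytes):
    """Decode as plain UTF-8, so a leading U+FEFF reaches the JSON parser."""
    return json.loads(data.decode("utf-8"))


def parse_lenient(data: bytes):
    """Decode with 'utf-8-sig', which drops a leading UTF-8 signature."""
    return json.loads(data.decode("utf-8-sig"))


def strict_accepts(data: bytes) -> bool:
    """Report whether parse_strict accepts the input (hypothetical helper)."""
    try:
        parse_strict(data)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```

With these definitions, strict_accepts(BOM_PREFIXED) is False while
parse_lenient(BOM_PREFIXED) returns an empty list. The lenient variant
is the extra signature-sniffing step; whether a given client performs
it is a property of that client, not of RFC 4627.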
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Monday, 18 November 2013 13:48:50 UTC