- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Tue, 28 Apr 2009 16:31:28 +0100
- To: www-tag@w3.org
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
There are currently five documents in this space (that I am aware of):
[URI] The current RFC governing URIs:
http://tools.ietf.org/html/rfc3986
[IRI] The current RFC governing IRIs:
http://tools.ietf.org/html/rfc3987
[IRI-BIS] The most recent draft of a planned update for the RFC
governing IRIs:
http://tools.ietf.org/html/draft-duerst-iri-bis-04
[LEIRI] A W3C Note defining Legacy Extended IRIs (extracted from [IRI-BIS]):
http://www.w3.org/TR/leiri/
[WEBADDR] A preliminary draft of a possible RFC for Web Addresses
(extracted from HTML5 [1]):
http://www.w3.org/html/wg/href/draft.html
[not yet in RFC format; converted version expected RSN]
On the TAG telcon of 2009-04-17, there was some sense that this is too
many specs in the same space. . .
To give some context to, and perhaps stimulate, an effort to
rationalize this set of documents, here's _my_ understanding of how
we got here.
[URI] is the mature stage of a spec which has been revised a number
of times. It carries a certain amount of historical baggage with
it, particularly its restriction to 7-bit characters, but that also
ensures wide interoperability and preserves access to legacy
applications.
[IRI] was intended to address the needs of the expanding Internet
and Web community, allowing most of Unicode into most parts of IRIs.
Rather than require upgrades in a wide range of applications and
uses, it did not set up IRIs as a _replacement_ for URIs across the
board, but as a _complement_ to URIs. It therefore included an
explicit transcoding algorithm for converting IRIs to URIs.
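To make the idea of that conversion concrete, here is a minimal
sketch in Python. It is not the full algorithm from [IRI]: it
assumes the input is already a Unicode string, skips the
normalization step, and ignores the option of converting the host
part with IDNA. The function name iri_to_uri is mine, not taken
from any spec or library.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character.

    Simplified sketch only: no normalization, no IDNA handling of the
    host part, and no validation of the input against the IRI grammar.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII characters are copied through unchanged.
            out.append(ch)
        else:
            # Each UTF-8 byte of a non-ASCII character becomes %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://example.org/r\u00e9sum\u00e9"))
```

Because ASCII is passed through untouched, a string that is already
a URI comes out unchanged, which is what lets IRIs complement URIs
rather than replace them.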
[IRI-BIS] was initiated by the editors of [IRI] to correct several
errata to [IRI] and to address the exclusion from [IRI] of certain
characters and character ranges.
[LEIRI] had its origins in the XML family of W3C specifications.
The XML specification itself [2] and a number of other XML-related
specifications (including XML Base, XML Schema, XPointer Framework
and XML Signature) all appeal to a process for converting arbitrary
strings which are intended to identify web resources into URIs.
They all incorporate more-or-less identical prose, excerpted from
the XLink specification [3], which specifies how this is to be done.
The XML Core WG has long been unhappy with this state of affairs,
and the impending release of new editions of several of these specs
encouraged the WG to try to establish a single normative reference
for the concept of strings for identifying web resources in XML
documents and a process for converting them to URIs, one which
acknowledged and built on the IRI specification.
After drafting a document to serve this purpose, discussion with the
editors of [IRI-BIS] convinced all concerned that since a new
version of the IRI spec was already in progress, the best thing to
do, to respect precedent and to avoid unnecessary proliferation, was
to include the relevant definitions in [IRI-BIS], and in fact that
has been done [4]. Once it became apparent, however, that the
progress of [IRI-BIS] to Draft Standard status was likely to be
considerably delayed for reasons outside its editors' control, the
Core WG, with the agreement and co-operation of the editors of
[IRI-BIS], published [LEIRI] as a Working Group Note, so that the
re-issue of new editions of the relevant XML-family specs could go
ahead. The intention is to issue a revision of [LEIRI] replacing
its contents with a reference to [IRI-BIS] as soon as [IRI-BIS]
becomes a Draft Standard.
[WEBADDR] had in some ways a similar origin to [LEIRI], starting out
as a section of the HTML5 spec which addressed the process by which
existing browsers process strings to produce URIs which can be
dereferenced. It differs from [LEIRI] in the exact set of
characters which it escapes, and in the special handling it mandates
for the encoding of characters in the 'query' part of a URI.
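As an illustration of why the 'query' handling matters, here is a
rough Python sketch. It assumes, as I understand the HTML5 text,
that the special handling consists of escaping non-ASCII characters
in the query using the character encoding of the containing
document rather than always UTF-8. The function name and its
parameters are hypothetical, and the sketch does not reproduce the
exact escape set of either [LEIRI] or [WEBADDR].

```python
def escape_query(query: str, document_encoding: str = "utf-8") -> str:
    """Percent-encode non-ASCII characters of a query string.

    Assumption: bytes are produced with the document's encoding.
    A character the encoding cannot represent raises
    UnicodeEncodeError here; this sketch does not handle that case.
    """
    out = []
    for ch in query:
        if ord(ch) < 0x80:
            # ASCII characters are left as they are in this sketch.
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode(document_encoding))
    return "".join(out)


print(escape_query("q=caf\u00e9"))                # UTF-8 document
print(escape_query("q=caf\u00e9", "iso-8859-1"))  # Latin-1 document
```

The same input string yields different URIs depending on the
encoding of the document it appears in, which is not the case for
the UTF-8-only conversion in [IRI].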
I am sure that the above summaries can be improved. In particular it
would be helpful to have clear statements from their respective
authors/owners as to what the _requirements_ for the three new
documents ([IRI-BIS], [LEIRI] and [WEBADDR]) are. Only after we have
those would it make sense to turn to the question of whether we can
merge some or all of them.
ht
[1] http://dev.w3.org/html5/spec/Overview.html#urls
[2] http://www.w3.org/TR/xml/#dt-sysid
[3] http://www.w3.org/TR/2001/REC-xlink-20010627/#link-locators
[4] http://tools.ietf.org/html/draft-duerst-iri-bis-04#section-7
- --
Henry S. Thompson, School of Informatics, University of Edinburgh
Half-time member of W3C Team
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
iD8DBQFJ9yFQkjnJixAXWBoRAq5tAJwMb/0jpU6XwLbYNqyt2s4uNwTcQACdHx4B
F/J04oFFOeDHZLTT9Y0qkT0=
=f6+L
-----END PGP SIGNATURE-----
Received on Tuesday, 28 April 2009 15:32:03 UTC