- From: Larry Masinter <masinter@adobe.com>
- Date: Tue, 14 Jul 2009 15:45:59 -0700
- To: Doug Schepers <schepers@w3.org>
- CC: "www-tag@w3.org WG" <www-tag@w3.org>
(I'm cc'ing www-tag rather than public-html since my
comments are relative to the TAG versioning issue, and
I don't have much confidence that the HTML-WG wants to
do more about versioning than get a report from the TAG.)
Doug, I like this approach. I think it fits into the general
direction we're taking in the TAG to deal with "versioning".
It's not clear to me whether the global indicator that the
language "version" you want has strict parsing belongs as a
separate attribute. It's not clear whether the situation where
old user agents would ignore parsing="strict" and allow
non-strict content to be parsed would lead to the rise of
material that is marked "strict" but in fact only works with
"loose".
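To make that concern concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the sample document, the two function names, and the use of Python's XML parser as a stand-in for whatever a strict HTML parser would actually be.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: labelled strict, but the <p> element is never closed.
MISLABELLED = '<html parsing="strict"><body><p>hello</body></html>'


def legacy_ua_renders(markup: str) -> bool:
    """An old user agent does not know @parsing, so it error-corrects everything."""
    return True


def strict_aware_ua_renders(markup: str) -> bool:
    """A newer user agent honours parsing="strict" and refuses malformed input."""
    try:
        ET.fromstring(markup)  # stand-in for a strict (XML-style) parser
    except ET.ParseError:
        return False
    return True
```

An author who only tests in the old user agent sees the page render and never learns that the "strict" label is false; the page then breaks in any user agent that honours the attribute.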
I also wonder whether there are more kinds of strictness than
just "parsing" strictness that are also useful, and how many
strict/non-strict version indicators we would need to capture
them.
For example, are there "strict" versions of JavaScript APIs which
don't allow access to document.root, so as to allow for mashups?
Just trying to get the issues out on the table, with the idea that
a holistic approach to versioning might also deal with strict vs
loose parsing.
Larry
--
http://larry.masinter.net
-----Original Message-----
From: tag-request@w3.org [mailto:tag-request@w3.org] On Behalf Of Doug Schepers
Sent: Tuesday, July 14, 2009 12:16 AM
To: public-html@w3.org
Subject: Proposal: @parsing="loose | strict"
Hi, HTML WG-
There are advantages and disadvantages to both the strict ("draconian")
and error-correcting parsing of markup. HTML evolved to have loose
parsing with undefined and browser-specific error correction, and XML
was designed with well-defined strict parsing (probably as a
reaction to the chaotic HTML approach).
We have come full circle on the matter, and the HTML5 spec marries many
of the advantages of both approaches, by offering a well-defined
error-correction model. This has the advantage that it is sometimes
easier to author (though it can make debugging more difficult), the more
profound advantage that it hides problems from the reader, and the even
more important advantage that it is more or less how browsers already
parse HTML documents.
However, it cannot gracefully address all the situations in which strict
parsing is an advantage:
* For authoring, it is often useful to know when you have validity or
well-formedness errors, which helps debug script and CSS, and doing this
on the fly in the browser is faster and easier while developing than
reiterative validation with a separate tool;
* Strict markup works predictably for mashups and mixtures of different
markup languages;
* Draconian error handling enforces structure and content models for
mission-critical applications, such as the canonical "financial
transactions" example, where the reader *wants* to know about problems
in the markup [1], and for use cases with a low tolerance for
potential errors (such as in government and some industries).
To meet this need, I propose a new attribute, 'parsing', which, when
placed on the document root, defines the type of parsing which a UA must
use when parsing the document. The values would be "loose" and
"strict", with loose parsing as the default (an omitted @parsing
attribute would result in loose parsing).
When the parsing is loose, the error-correction algorithms defined in
HTML5 must be applied; when the parsing is strict, there must be no
error-correction (as is commonly the case for XHTML in most browsers).
This way, authors could optionally enforce strictness when they want or
need to, and then change/remove the value when they are ready for
publication, or when the needs change. It is possible that there would
be instances where strict parsing makes it out of development and into
production code, but this would have relatively few negative
consequences (the kind of author who uses this would probably produce
strict code anyway, and would know it if they didn't), and would be
easily corrected. And, quite frankly, some people simply prefer
stricter parsing for aesthetic or other reasons, and this would provide them
with that option while not imposing it on others.
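As a rough illustration of the dispatch described above, here is a minimal sketch in Python. The function and class names are hypothetical, Python's html.parser is only a tolerant stand-in and not the HTML5 error-correction algorithm, and xml.etree stands in for a strict parser. Treating unrecognized values as "loose" is an assumption of this sketch; the proposal only says that an omitted attribute means loose.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _RootSniffer(HTMLParser):
    """Records the attributes of the first start tag (the document root)."""

    def __init__(self):
        super().__init__()
        self.root_attrs = None

    def handle_starttag(self, tag, attrs):
        if self.root_attrs is None:
            self.root_attrs = dict(attrs)


def requested_mode(markup: str) -> str:
    """Return "strict" or "loose" based on @parsing on the root element."""
    sniffer = _RootSniffer()
    sniffer.feed(markup)
    sniffer.close()
    attrs = sniffer.root_attrs or {}
    # Omitted attribute means loose; unknown values also fall back to loose
    # (the fallback for unknown values is an assumption of this sketch).
    value = (attrs.get("parsing") or "loose").strip().lower()
    return "strict" if value == "strict" else "loose"


def parse_document(markup: str) -> str:
    """Parse with the requested mode and return the mode that was used."""
    mode = requested_mode(markup)
    if mode == "strict":
        ET.fromstring(markup)  # no error correction: raises ET.ParseError
    else:
        HTMLParser().feed(markup)  # tolerant parse: malformed input is accepted
    return mode
```

With this sketch, '<html parsing="strict"><p>hi</html>' raises ET.ParseError because the p element is never closed, while the same markup with parsing="loose", or with no attribute at all, is accepted.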
Had this option been available in XML from the beginning, many problems
and community schisms might have been avoided. I believe that presenting
the option for strict parsing may change how the various communities
approach HTML5, and avoid further schisms. I see this as having
relatively low costs for the specification, and very little
implementation cost, since browsers will already have both modes (even
IE has a built-in XML parser, though it doesn't use it for XHTML).
Please correct me if my assumption here is wrong.
I also believe that this is backwards-compatible, since the default will
be loose parsing as is already applied, and forwards-compatible, since
any alternate future parsing models (such as the proposed XML2 or XML5,
or some use case we don't see today) can be specified as the value for
@parsing in a later specification without changing how it would be used
as defined in HTML5. It may lay the groundwork for a new formulation of
error-correcting XML, as Anne proposed.
I'm hoping that the dust has settled sufficiently on the parsing
debate that we can hold a logical discussion of this proposal on its merits.
(Meta: I chose the keywords of the attribute and values for brevity, and
I'm not at all married to them; treat them as placeholders for the
purposes of discussing this proposal; another option might be something
like @error-correction="true | false". Please don't suggest different
names quite yet unless they represent a functional difference to this
proposal. Also, I've BCC'ed the TAG just so they know.)
[1] http://www.tbray.org/ongoing/When/200x/2004/01/11/PostelPilgrim
Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
Received on Tuesday, 14 July 2009 22:46:40 UTC