The Tao of HTML 5

This is a very old article. It has been imported from older blogging software, and the formatting, images, etc may have been lost. Some links may be broken. Some of the information may no longer be correct. Opinions expressed in this article may no longer be held.

On the 10th of June 1215, a group of English barons invaded London and five days later forced King John to attach his seal to the Magna Carta in Runnymede, on the border of modern-day Surrey and Berkshire. (In those days it was customary to attach one’s seal to an agreement rather than sign it. However, the fact that it was not signed has led to a popular misconception that King John was illiterate, when in fact he was not.)

The Magna Carta was a key document in English constitutional law, establishing certain rights (such as habeas corpus) for the King’s subjects, and limiting the rights of the King; importantly, requiring the King to obey “the law of the land”. The Magna Carta is widely regarded as a major influence on world constitutional law, and in particular greatly influenced the United States constitution and Bill of Rights. Three clauses of the Magna Carta remain in force in English law today:

the freedom of the Church of England;
the “ancient liberties” of the City of London; and
the right to due process.

Although the other clauses have been repealed, they have strongly influenced the Acts of Parliament that replaced them.

(As it had been sealed under duress, Pope Innocent III gave his blessing for King John to renounce the Magna Carta. King John’s successor, Henry III, reaffirmed it.)

Fast-forward nearly 800 years, and we see that Apple, Mozilla and Opera, key members of WHATWG, are pressuring the W3C to bless HTML5 as the successor to current versions of (X)HTML. I’ve been watching the development of the W3C’s XHTML 2.0 and WHATWG’s alternative markup format for several years, and thought I’d share my thoughts on them.

XHTML 2.0

When the W3C commenced work on this standard, it decided that it would allow itself to significantly break backwards compatibility in a way that previous (X)HTML standards hadn’t. In a way, this was needed. There are many aspects of the current standards that are regarded by many as flawed. Examples include:

too much use of empty elements such as <br> and <hr> to specify divisions in content;
remaining legacy presentational elements, such as <b> and <i>; and
lack of a method to specify metadata which applies to part of a document as against the whole document.

Early drafts included several big wins.

New <quote> Element

The <q> element in earlier versions of (X)HTML was a source of frustration to many authors. Although the standards said that authors should not include quote marks around the quoted portions of text, and that user agents should add them automatically in a manner fitting the surrounding text's locale, in practice many user agents did not insert quote marks, and when they did, they often inserted the wrong type. To make matters worse, many user agents also lacked support for the parts of CSS that affect quoting. Using a complicated system of hacks, it was possible to use <q> in a manner that worked in most browsers, but due to the difficulty, most authors did not use it.

The new draft swept these problems aside by replacing the <q> element with <quote> and specifying that user agents must not automatically quote its contents, and that it was the responsibility of authors to do so, either directly or via stylesheets. This would enable the element to be handled correctly by an XML+stylesheets user agent without even having to know anything in particular about XHTML.

Removal of <b> and <i>

This was easily predicted. Although neither of these elements was formally deprecated in previous specifications, they were widely considered to be throw-backs to HTML's distant past, having no place in a fully-semantic markup language. On the W3C's HTML mailing list, numerous edge cases were proposed where they (and <i> in particular) could be considered justified: ships' names, words written in foreign languages and Linnean taxonomy terms. However, at least in my own opinion, these are not valid arguments to keep <i>, but rather arguments in favour of elements to mark up Linnean taxonomy, &c. Early drafts of XHTML 2.0 removed these elements, along with other presentational elements.

An Improved Mechanism for Marking up Line Breaks

The <br> element was replaced with an <l> element. While <br> is an empty element sitting between two lines of text, <l> instead surrounds a line.
Compare:

    <p>I think that I shall never see<br />
    A poem as lovely as a tree.</p>

    <p><l>I think that I shall never see</l>
    <l>A poem as lovely as a tree.</l></p>

There is something wonderfully symmetrical about the proposed new method.

A Whole New Paradigm for Headings

In one of the more radical changes, a new paradigm was introduced for headings. What would in HTML 4 have been:

    <h1>Main Heading</h1>
    <p>Foo.</p>
    <h2>Subheading</h2>
    <p>Bar.</p>
    <h3>Third Level Heading</h3>
    <p>Baz.</p>
    <h2>Another Subheading</h2>
    <p>Quux.</p>

became the following under the new system:

    <section>
      <h>Main Heading</h>
      <p>Foo.</p>
      <section>
        <h>Subheading</h>
        <p>Bar.</p>
        <section>
          <h>Third Level Heading</h>
          <p>Baz.</p>
        </section>
      </section>
      <section>
        <h>Another Subheading</h>
        <p>Quux.</p>
      </section>
    </section>

or maybe:

    <h>Main Heading</h>
    <section>
      <p>Foo.</p>
      <h>Subheading</h>
      <section>
        <p>Bar.</p>
        <h>Third Level Heading</h>
        <section>
          <p>Baz.</p>
        </section>
      </section>
      <h>Another Subheading</h>
      <section>
        <p>Quux.</p>
      </section>
    </section>

The W3C was never really clear on whether a heading should be inside its section or outside. Consistency with other elements would lead one to assume that it should be inside, but the examples given in the drafts seemed to suggest that the reverse might be the case. Either behaviour has advantages over the older system:

it allows more than six levels of heading; and
it makes it easier to transclude a document or section of a document into another file without having to manually adjust heading levels.

However, for "backwards compatibility" (despite the fact that XHTML 2 isn't supposed to be backwards compatible), the <h1> to <h6> elements are kept. Their exact relation to the new system is never explained.

Navigation Lists

In addition to the unordered, ordered and definition lists of previous specifications, XHTML 2.0 introduced a method for marking up nested navigation menus, with a suggestion that user agents implement them as drop-down menus. As nested drop-down menus are a common navigation feature on websites, often involving convoluted CSS and JavaScript, it is immediately clear why this would be a popular suggestion. An example of their use follows:

    <nl>
      <label>Contents</label>
      <li href="#introduction">Introduction</li>
      <li>
        <nl>
          <label>Terms</label>
          <li href="#may">May</li>
          <li href="#must">Must</li>
          <li href="#should">Should</li>
        </nl>
      </li>
      <li href="#conformance">Conformance</li>
      <li href="#references">References</li>
      ...
    </nl>

Notice in the above example that the href attribute is applied directly to the <li> element, and no <a> element is needed. That's because...

Everything's a Link!

Well, not everything. Your mouse wouldn't have a place to rest. But virtually every element is allowed to take an href attribute and become linkified. <a> still exists, but there's nothing special about it anymore. With href inevitably comes type to specify what sort of thing is at the other end of the link. (Is it a JPEG image? An MP3 audio file?) And that's not all...

Everything's an Image... err... Object.

Having distributed the special powers of the <a> element, in the next draft, the W3C did the same to and