IE-friendly HTML5

I’m learning about HTML5 at the moment, and my current homework assignment is to retrofit a page in such a way that it uses valid HTML5 but also works in IE6:

  1. Make it validate as HTML5
  2. Make sure you honor the semantics of HTML5 as well
  3. Make sure that whatever you use would also work in IE6 and up

Sounds fine in theory, but that last one’s actually a bit tricky, because IE6 and IE7 don’t know about HTML5 elements.

HTML5 + IE6 + IE7 = :-(

So if you do something like this:

[file lang="html" link="on"]code-snippets/ie-friendly-html5-1.txt[/file]

then IE thinks you mean

[file lang="html" link="on"]code-snippets/ie-friendly-html5-2.txt[/file]

which is totally not what you mean, and your lovely styled HTML5 header will now collapse because it’s basically empty.

Known workarounds: CSS or JavaScript

There are workarounds, but to be honest, I’m not about to put my fellow developers through that kind of CSS hell to maintain a project that I’ve built. The popular JavaScript solution doesn’t appeal to me either, because I don’t like to rely on something as easy to disable as JavaScript.

There’s also Chrome Frame, which is currently still in beta, and relies on users having adequate permissions, or the technical know-how to install it.

A PHP workaround

So I got my hands dirty with a bit of PHP to come up with something that, if not immediately usable on a busy high-traffic site, is at least going to let me do my homework with the minimum of fuss:

A function

[file lang="php" link="on"]code-snippets/pseudo_html5.inc.php.txt[/file]

An implementation

[file lang="php" link="on"]code-snippets/pseudo_html5_buffer.txt[/file]

How it works

This is pretty straightforward, and it would have to be, because I’m not a PHP-coder. Tracking down some appropriate Regex took a while and it’s possible that there is a cleaner solution out there, but it seems robust as-is.

  1. There is an array of HTML5 elements, and the HTML4 elements they should map to.
  2. There is a dash of Regex to find the HTML5 tags that need to be replaced.
  3. When the page is served, the page HTML is rendered into a buffer.
  4. We then loop through our array of HTML5 tags and replace any matches in the buffered HTML.
  5. Then we empty the buffer and output the updated HTML.
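As a rough, self-contained illustration of those five steps (this is not the contents of the linked snippets), the whole thing could be sketched like this. The function name `pseudo_html5_sketch`, the tag map and the sample markup are all assumptions made up for the example.

```php
<?php
// Hypothetical map of HTML5 elements to the HTML4 elements they fall back to.
// The real list would depend on which elements the page actually uses.
$html5_to_html4 = array(
    'header'  => 'div',
    'footer'  => 'div',
    'nav'     => 'div',
    'section' => 'div',
    'article' => 'div',
    'aside'   => 'div',
    'time'    => 'span',
);

// Replace opening and closing HTML5 tags with their HTML4 equivalents.
// Attributes (including the mirrored class names) are left untouched.
function pseudo_html5_sketch($html, array $map)
{
    foreach ($map as $html5 => $html4) {
        // Match "<tag" or "</tag" only when followed by whitespace, "/" or ">",
        // so "<header>" is rewritten but "<head>" is not.
        $pattern = '#(</?)' . preg_quote($html5, '#') . '(?=[\s/>])#i';
        $html = preg_replace($pattern, '${1}' . $html4, $html);
    }
    return $html;
}

// Render the page into a buffer instead of sending it straight to the browser.
ob_start();
echo '<header class="header"><nav class="nav">Menu</nav></header>';

// Empty the buffer, rewrite the tags, then output the updated HTML.
$html = ob_get_clean();
echo pseudo_html5_sketch($html, $html5_to_html4);
// Outputs: <div class="header"><div class="nav">Menu</div></div>
```

A plain regex replacement like this knows nothing about context, so it would also rewrite tag-like text inside scripts or comments. That is acceptable for a proof-of-concept, but worth keeping in mind.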

Only for IE6-7

To prevent a redundant function call for all of our HTML5-ready visitors, the replacement only runs for IE6 and IE7:

  1. We test whether the page is being viewed in IE6 or IE7 by checking the user-agent string.
  2. If the browser is IE6 or IE7, we run the function. If it’s not, we don’t.
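A minimal sketch of that check, assuming the usual `MSIE 6.0` / `MSIE 7.0` tokens in the user-agent string. The helper name `is_ie6_or_ie7` is hypothetical, and the Opera exclusion is an extra precaution of this sketch, since older Opera versions could also identify themselves as MSIE.

```php
<?php
// Hypothetical helper: true when the user-agent string looks like IE6 or IE7.
function is_ie6_or_ie7($user_agent)
{
    // Older Opera versions could include "MSIE" in their user-agent string,
    // so rule those out first.
    if (stripos($user_agent, 'Opera') !== false) {
        return false;
    }
    // Look for the "MSIE 6." or "MSIE 7." version token.
    return preg_match('/MSIE [67]\./', $user_agent) === 1;
}

// The header may be missing entirely, so fall back to an empty string.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Only when this is true does the page need to be buffered and rewritten;
// every other browser gets the real HTML5 untouched.
$needs_rewrite = is_ie6_or_ie7($ua);
```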

Caveats

Did I mention that I’m not a PHP-coder? I’m sure this could be done better by someone who is, but, regardless, it works as a proof-of-concept.

Some things to be wary of with this approach:

  1. You certainly wouldn’t want the overhead of running the replacement every time the page was viewed.
    My thinking is that the rewriting would be triggered by a user viewing the page: if the page was already cached, the cached version would be used. If not, the tag rewriting would occur and the page would then be cached. Or maybe a cron job could be added to automate this dual-browser caching.
  2. The user agent string can easily be changed in many popular browsers. My personal feeling is that this is most likely something that only a power user (or developer) would do. Regardless, the worst possible outcome for these users is that they will experience well-structured pseudo HTML5 instead of fully semantic HTML5.
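The caching idea from the first caveat could be sketched roughly as follows. This is only an illustration under assumptions: the function name `cached_variant`, the file naming scheme and the `'pseudo'` variant label are all made up for the example, and nothing here handles cache expiry.

```php
<?php
// Hypothetical file cache: one cached copy per page and per variant
// ("html5" for capable browsers, "pseudo" for IE6 and IE7).
// $render is a callable that returns the HTML for this variant.
function cached_variant($cache_dir, $page_id, $variant, $render)
{
    // $variant is chosen by our own code, never taken from user input.
    $file = rtrim($cache_dir, '/') . '/' . md5($page_id) . '.' . $variant . '.html';

    // Already cached: serve the stored copy and skip the rewriting entirely.
    if (is_file($file)) {
        return file_get_contents($file);
    }

    // Not cached yet: build the HTML once, store it, then return it.
    $html = call_user_func($render);
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

// Usage: the expensive rewriting only happens on the first request.
$html = cached_variant(sys_get_temp_dir(), '/example-page', 'pseudo', function () {
    return '<div class="header">Rewritten for IE6 and IE7</div>';
});
echo $html;
```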

Feel free to flame me in the comments with any other downsides that I’ve missed ;-)

HTML/CSS Syntax

To overcome browser differences in the stylability of HTML5 elements, I’ve used CSS class names which mirror the HTML5 tag names.

While this makes the markup a little more bulky, it means that CSS can be authored once and applied to all browsers, making it easier to maintain.

It also has the advantage that the class names are visible to and usable by all developers working on a project, not just those focussing on IE compatibility.

And finally, it is a ‘means to an end’, the ‘end’ being the improved document semantics available in browsers which support HTML5, and a graceful fallback in those that don’t.

HTML5

[file lang="html" link="on"]code-snippets/ie-friendly-html5-3.txt[/file]

Pseudo HTML5

[file lang="html" link="on"]code-snippets/ie-friendly-html5-4.txt[/file]

Shared CSS

[file lang="css" link="on"]code-snippets/ie-friendly-html5-5.txt[/file]

Validation

Because div and span elements still exist in HTML5, both the ‘real’ HTML5 and the ‘pseudo’ HTML5 will validate as HTML5.

A demo

If you made it down this far, it’s only fair that you should see a demo of this in action.

Cheers.

Hacking CSS

I’m having a quieter-than-usual week at work this week (which is just as well considering that I spent 90 minutes of today with no internet connection thanks to a ghost in my machine/router). So it was a bit of a shock to the system to have to swing into action mid-afternoon to tackle a bug that was crashing IE6 when it attempted to load a previously-tested-and-working web page that was supposed to go live the very next day.

My first thought was that maybe an automatic Windows Update had somehow adversely affected my install of IE6. That’s not usually my first thought, but for a previously tested and working HTML page to suddenly start crashing all by itself? Nup, no way was it my CSS. To be honest, it was just as unlikely that a Windows Update was at fault, but it was worth a shot.

So I went into my XP Control Panel (“Add or Remove Programs”) and Googled the last couple of Windows updates.

Security Update for Windows XP (KB937143), installed on 15/08/2007, turned up some worrying info – apparently some guy in Germany had noticed that IE6 had started to crash “while accessing at least one site”, after installing the GDR version of the update. Maybe I was on the right track?

After a bit more Googling, I arrived at the MS support page for MS07-045: Cumulative Security Update for Internet Explorer.

Completely missing the ‘IT professionals’ link at the top of the page (maybe that says something) and finding nothing useful anywhere else, I ended up Googling the update name and got to a page about Microsoft IE Crafted CSS Unspecified Memory Corruption, which told me of
“a flaw that may allow a malicious user to gain the same user rights as the logged in user. The issue is triggered when IE parses certain strings in CSS. It is possible for a malacious person to construct a specially crafted website which could remotely execute code on the visitor’s computer.”

I wondered who these ‘malacious’ people were? And why didn’t they just stick to the accepted label of ‘malicious’? Clearly they were social outcasts, determined to make my life harder by forcing me to rewrite my CSS so that it didn’t look like it was attempting to trick people into giving up their Facebook passwords.

A bit more Googling turned up a bit more info from this page, which explained that “The specific vulnerability exists due to improper parsing of HTML CSS ‘float’ properties. By ordering specially crafted ‘div’ tags in a web page, memory corruption can occur leading to remote code execution.”

To cut a long story short, the page in question was debugged after I followed up the notion of improperly parsed ‘float’ properties and ended up at an old post on Eric Meyer’s site. This kicked my brain into revisiting the CSS, and after some playing around, I resolved the issue by adding a couple of lines of otherwise unnecessary CSS, to prevent a floated list from bringing the house down in IE6.

I still don’t know whether Microsoft or KB937143 were at fault, or whether the bug was always there and just waiting for the slightest encouragement, but the afternoon’s debugging has made me feel that much less secure about developing CSS for Internet Explorer.