Website Page Code Validation Guide




Here is a two-page guide to basic page code validation. Validating your website pages is the first step toward web standards compliance and a quality website.

Part 1 - The background to page code validation; web standards compliance; quirks mode; HTML or xHTML  [this page]
Part 2 - How to validate web pages

Part 1 - The Background to Validation

Web standards

Web standards are the 'rules and regs' that lay out the way web page code is written and how browsers interpret it. In the end this determines how websites work. It seems reasonable that if you write some code that a browser will be trying to turn into a web page, then the code should be normal and correct language for that type of code - ie in a standard format and without errors. If you want to write French then it is no good if one word in three is misspelt or in Greek.

So getting the basic page code right is the first step toward web standards compliance. After this come accessibility and semantic markup - but let's start out simple. Standards compliance means that everybody sees the same page; people with older and less powerful equipment are not excluded; people with disabilities are not excluded; pages work correctly on any computer, on any platform; and pages don't behave in strange ways and freeze or lock up your computer. If you ignore web standards then anything can happen, and often does. Ignoring them also shows that an easy, quick and cheap route has been taken in order to publish the content, and a website built on that policy will have low quality as its main feature.

Complying with accepted web standards is the foundation of a quality policy. It is impossible to maintain any form of quality control if you ignore web standards. In every case where you find evidence of disregard for standards, you will also find all sorts of other faults and evidence of low quality. Quality starts with standards compliance and in our experience, where it is ignored, you will face other more serious problems.

Standards compliance is an important test of both basic website construction knowledge, and perhaps more importantly, attitude.

Central control

If you have a type of code, such as PHP for example, there needs to be someone somewhere who decides how the code is written and what it does. This seems reasonable and logical. For PHP, then, there is a central organisation that does this for all PHP coders. In fact they couldn't work without this central body; it would be chaos and anarchy otherwise.

So you can perhaps see that a central body to decide standards for code is not only a good idea - it's essential. The central body having been established, it wouldn't be much use if all their work was ignored. So it is up to us to follow their recommendations and ensure that our code is written correctly.

For web page code the central authority is the W3C, the World Wide Web Consortium. They are universally recognised as the central authority for web standards, and they decide how code is written and interpreted by browsers. The Director is Tim Berners-Lee, who invented the World Wide Web, including the first web browser and the HTML code that web pages are written in.

They have provided some useful tools for us to establish whether or not our code is correct. The most popular of these is the basic page code validator at:
http://validator.w3.org

Code validation

If you enter the domain name of your site in the text box there (eg www.a3webtech.com), then the test will tell you if your index page (front page or home page) code is correct or not; and if faulty, what the errors are.

If you are lucky the page will pass with a green icon and a congratulatory message. The reality is though that you will have some errors to fix.
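If you would rather run this check from a script, the sketch below asks the validator to fetch a page, just as typing its address into the form does, and reads back the verdict. It assumes the classic validator's `uri` query parameter and its `X-W3C-Validator-Status`, `X-W3C-Validator-Errors` and `X-W3C-Validator-Warnings` response headers, as described in the validator's API notes; the function names and the User-Agent string are hypothetical, chosen for illustration.

```python
import urllib.parse
import urllib.request

# Assumption: the classic W3C Markup Validator reports its verdict in
# X-W3C-Validator-* response headers (per its API documentation).
VALIDATOR_URL = "https://validator.w3.org/check"


def summarise(headers) -> dict:
    """Pull the verdict and counts out of any dict-like headers object."""
    return {
        # Expected values: Valid, Invalid or Abort.
        "status": headers.get("X-W3C-Validator-Status", "Unknown"),
        "errors": int(headers.get("X-W3C-Validator-Errors", 0)),
        "warnings": int(headers.get("X-W3C-Validator-Warnings", 0)),
    }


def check_page(page_url: str) -> dict:
    """Ask the validator to fetch and check page_url, then summarise the result."""
    query = urllib.parse.urlencode({"uri": page_url})
    request = urllib.request.Request(
        f"{VALIDATOR_URL}?{query}",
        headers={"User-Agent": "validation-sketch/0.1"},  # hypothetical identifier
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return summarise(response.headers)
```

For example, `check_page("http://www.a3webtech.com/")` should report a status of Valid with zero errors for a page that earns the green pass. Reading only the headers keeps the sketch small; the full error list is still best read in the validator's own web page.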

The first thing to note is that there are many code variations that browsers can read - but you have to tell them which one you are using. In other words, if you have written in English, you say "This page is in English."

The equivalent statement in web page code is the Doctype, or document type declaration. The most common are HTML or xHTML, and there are Strict and Loose (aka Transitional) versions of each. If possible you should always use the Strict type, as it is true and correct code. The Transitional type is a relaxed specification that still allows older presentational markup such as font tags; browsers render it in 'almost standards' mode, or in quirks mode if the Doctype leaves out the DTD web address. HTML Transitional is a low-quality option that is used when there is no alternative, or when a shortcut must be taken. It is not equal to HTML Strict and is an 'economy' alternative.

Quirks mode

This is a backward-compatibility mode, designed so that pages written for older or faulty browsers, such as Internet Explorer 5, still display as their authors intended. Some commands and many errors are ignored, and an 'average' page results. The page may not display correctly, and it is a cheat method. However, it can be used in cases where a web application creates its own version of code that is difficult or impossible to correct perfectly.

If you do not include a Doctype, or if the Doctype is incorrectly formatted, a browser will be put into quirks mode; in some versions of Internet Explorer, an XML prolog or anything else placed before the Doctype will do the same. In this mode the browser does not follow the rules strictly, but tolerates errors and imitates the behaviour of older browsers.

Here is an HTML Doctype:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">


It should be the first item in the source code. Nothing should come before it.

Here is an xHTML Doctype:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


If you place an XML prolog before it then some browsers will be thrown into quirks mode. In particular IE6 will be, but most others won't. An XML prolog looks like this:

<?xml version="1.0" encoding="utf-8"?>

Here is the previous xHTML Doctype with an XML prolog, which will put IE6 into quirks mode:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


However, in theory quirks mode can be triggered in IE simply by putting anything at all before the Doctype, such as a commented-out line. So if your page crashes in IE6, you can always try putting it into quirks mode by one of these tricks - it might make the page work. Here is a commented-out ('invisible') line:

<!-- quirks mode for Internet Explorer -->

Here is an HTML Doctype with a quirks mode comment for IE browsers:

<!-- quirks mode for Internet Explorer -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">


This might assist faulty browsers to read a page correctly when otherwise they cannot cope.

Browser functional modes
Modern correctly-functioning browsers have either two or three operational modes:
  • standards mode (ie standards-compliant, strict) - pages are interpreted strictly according to the rules. This is how a page will be read if the correct Doctype is used. In theory, then, a page will always look the same on any browser, on any platform. It will always be interpreted correctly.
  • quirks mode - a mode that attempts to read the page as an old and faulty browser would, and introduces various quirky ways of interpreting the page that these browsers might have used.
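  • almost standards mode (the HTML specification calls it limited-quirks mode; it is sometimes referred to as transitional, and occasionally, wrongly, as strict) - a halfway mode between the main two, which differs from standards mode mainly in how images inside table cells are laid out.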
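The way a browser chooses between these modes, often called Doctype sniffing, can be sketched in Python. This is a minimal sketch assuming a simplified subset of the rules in the HTML specification: only the Doctypes shown on this page are recognised, and the function name `rendering_mode` and its `ie6_rules` flag are hypothetical, added to model the IE tricks described above.

```python
import re

# Simplified subset of browser Doctype sniffing (assumption: only the
# Doctypes discussed on this page are handled; real browsers check a
# much longer list of public identifiers).
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?)?\s*>',
    re.IGNORECASE,
)
# These public identifiers always give "almost standards" mode.
ALMOST_STANDARDS_IDS = (
    "-//w3c//dtd xhtml 1.0 transitional//",
    "-//w3c//dtd xhtml 1.0 frameset//",
)
# These give "almost standards" with a DTD URL, quirks mode without one.
URL_DEPENDENT_IDS = (
    "-//w3c//dtd html 4.01 transitional//",
    "-//w3c//dtd html 4.01 frameset//",
)


def rendering_mode(source: str, ie6_rules: bool = False) -> str:
    """Return 'standards', 'almost standards' or 'quirks' for a page."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        return "quirks"  # no Doctype, or one this sketch cannot parse
    if ie6_rules and source[:match.start()].strip():
        # IE6 drops into quirks mode if anything at all, such as an
        # XML prolog or a comment, comes before the Doctype.
        return "quirks"
    public_id = (match.group(1) or "").lower()
    system_id = match.group(2)  # the DTD web address, if present
    if public_id.startswith(ALMOST_STANDARDS_IDS):
        return "almost standards"
    if public_id.startswith(URL_DEPENDENT_IDS):
        return "almost standards" if system_id else "quirks"
    # Strict Doctypes (and, in this sketch only, anything unrecognised;
    # real browsers send old Doctypes such as HTML 3.2 to quirks mode).
    return "standards"
```

Run against the HTML Doctype with the quirks-mode comment shown earlier, this returns "standards" by default but "quirks" with `ie6_rules=True`, mirroring the Internet Explorer trick.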

Browser checks

If you build web pages you need more than one browser and more than one screen. Otherwise, some people may not see what you see. One way to check your pages in a variety of other browsers is to use an online checker such as this one:
http://browsershots.org

When building pages it is preferable to check them in a standards-compliant browser such as Firefox. This means that as many other people as possible will see the page correctly. If you cannot really comprehend this and think only IE versions are important, then perhaps you should take a look at the intro page of Browsershots.org, and you will see that the web is a big place and IE doesn't really have too important a role - and it is diminishing every year. It is wrong to build a page for faulty browsers because people with proper, correctly-functioning equipment may be penalised.

Guide to Browsershots.org

It will probably help you if we include a brief guide to using Browsershots. This is a fine site and extremely useful. Please donate to them or use their sponsors if possible.

1. Enter your site name last of all - this removes the temptation to hit 'Go' without altering the defaults. If you do hit Go with the defaults, it will order a screenshot of every browser on every platform in existence. Only a developer would know why this is the default configuration; the rest of the world would assume that - for various reasons - IE6 on Windows would be the single, default browser selected.

2. First go down to the foot of the browser table, and select NONE.

3. Then go up to the browser list and check off the ones you need to see. Resist the temptation to view 93 of them because the operation will time out or take 4 days.

4. For normal, everyday web design work I suggest you check these ones:

On Linux:
  • Firefox 3.5
On Windows:
  • Firefox 3.0
  • Firefox 3.5
  • MSIE 6.0 [Internet Explorer 6]
  • MSIE 7.0
  • MSIE 8.0
  • Opera 9.64
On MacOS:
  • Firefox 3.5
  • Safari 4.0

5. Enter your site name and hit Go at the right. Your request stays open for 30 minutes, and in busy periods not all the screenshots will be ready within that time. The screenshots load automatically and the page refreshes as they arrive. If one or more is delayed, you need to come back before the 30 minutes are up and extend the time, or you'll lose the request and have to start again.

The browsers listed above are all you need to worry about. You may be interested to know that K-Meleon (a lightweight Windows browser built on the same Gecko engine as Firefox) and Camino (the Mac browser) both have more users than some of the other choices here, and so probably does Google Chrome. As a K-Meleon user, I'm afraid it is worth including in your Browsershots check, as it crashes the occasional page that is fine in other browsers.

HTML or xHTML?

Which one to choose? First check your page source code for a Doctype. If it's there, use that one. Modern web applications tend to use xHTML (eXtensible HTML), but straight web pages are more likely to use HTML.

If you have a choice, and are using flat web pages (ie not a CMS etc), I recommend you use HTML and not xHTML. That is for several reasons such as:

1. It's much easier to code for HTML, as it is more basic and there are far more resources available.
2. If you are reading this and need the advice, you are not an expert - so use HTML.
3. There is no difference whatsoever in practical terms.
4. Static web pages are best coded in HTML not xHTML.
5. xHTML is best for dynamic (database-driven) applications.

Anyway - if there is no Doctype, move to the next stage:

HTML source code check
HTML allows uppercase text in the code, ie in tag and attribute names.

HTML may not use CSS fully and may have font tags within the page code. The following example has tag names in a mix of upper and lower case, a font tag, and an attribute value without quotes. The line break tag is clearly in HTML format, with no closing slash:

<FONT CLASS=Arial-16pxb><b>Stuff</b><BR>

This is clearly HTML.

xHTML source code check
xHTML uses lowercase for all element and attribute names. The only uppercase in xHTML code is in the Doctype (the DOCTYPE keyword and the public identifier), which is not seen as part of the page code itself.

xHTML uses a specific type of tag closure, like this:

<img src="/images/boat.gif" alt="Nice Boat Picture" />
<br />  


This is clearly xHTML.

The closing slash comes at the END of the tag, after the attributes; in HTML these tags take no slash at all. You can also see the space before the slash, which helps older browsers read the tag correctly. The capitals in 'Nice Boat Picture' are in text that 'prints' - it's on-page text and not part of the code. That alt text is shown instead of the image, or before it loads, and/or on the status bar. In IE browsers it shows on hover - which is a useful (non-standard) feature and probably the only good point they have.

So: first check the Doctype, then look for clues in the code. It will be very hard to get a document to validate against the wrong Doctype, as the errors will be more numerous and harder to correct - lots more work.

First, look at whether the code text includes capital letters. If it does, it's HTML and not xHTML. Then place an HTML Strict Doctype on the page and try to get it to validate. If there are errors, ask the validator to try against a Transitional Doctype. If the results are better (as they are quite likely to be), you can decide whether you wish to take the easy way out. If so, change to a Transitional Doctype and continue. Correct the errors one by one and you'll see the total dropping away quickly. Many errors cascade - that is to say, if you fix three, another ten may disappear because they were caused by the first three. So a page with 100 errors may need far fewer than 100 fixes to correct it, which is good news if you're faced with a lot!

We advise that if your code has no Doctype, and has no xHTML features such as the obvious closing tags, then go for an HTML Doctype. Try Strict first, then Transitional if you want an easier job. Strict is the better and more correct choice.
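The checks described above can be roughed out as a small Python function. This is a sketch under stated assumptions: `guess_markup_flavour` is a hypothetical helper, it looks for capitals in tag names only (not attribute names), and it is a first guess to choose a Doctype with, not a substitute for running the page through the validator.

```python
import re

# Hypothetical helper: a rough first guess, not a substitute for validation.
DOCTYPE_ID_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)
TAG_NAME_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")
SELF_CLOSING_RE = re.compile(r"<[A-Za-z][^<>]*/>")


def guess_markup_flavour(source: str) -> str:
    """Guess whether a page was written as 'html' or 'xhtml'."""
    # 1. An existing Doctype decides the matter.
    doctype = DOCTYPE_ID_RE.search(source)
    if doctype:
        return "xhtml" if "xhtml" in doctype.group(1).lower() else "html"
    # 2. Uppercase tag names are not allowed in xHTML.
    tag_names = TAG_NAME_RE.findall(source)
    if any(name != name.lower() for name in tag_names):
        return "html"
    # 3. xHTML-style closures such as <br /> point to xHTML.
    if SELF_CLOSING_RE.search(source):
        return "xhtml"
    # 4. Otherwise follow the advice above and default to HTML.
    return "html"
```

The order of the checks follows the advice on this page: trust an existing Doctype first, then capitals, then closing slashes, and fall back to HTML when nothing points to xHTML.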

Strict v Transitional Doctype

The best code always uses the Strict Doctype, because only pages that validate to a Strict Doctype are fully correct code. The Loose or Transitional Doctype is a 'nearly-right', second choice. It is widely used of course, and that's fine, as at least the code is known and correct according to a set of standards if it validates to the Transitional type. It's just that Strict is perfect code.

Strict v Transitional Doctype for SEO
In theory, there is no difference between these two for SEO purposes, ie (in this case) attraction to search engines. In fact, search engine staff disclaim all effects of web standards on search results in any case. It may be that the staff concerned believe this, or it may be politic to make such statements - but practice disproves theory here. All tests we have run show that a quality site makes a huge difference in search. It could of course be that all the other factors in quality are having the effect, and standards compliance was not an element - but this is also disproved by trial and testing. Standards compliance works.

However, the difference between a page that validates to Strict and one that validates to Transitional is much harder to quantify. For all practical purposes you would have to say this is not a profitable area for work - there are a hundred other places where the same time input would produce better results. It's absolutely true that sites with crystal-clear, sparkling clean code can place very much higher than they should - but the time input has to be accounted for, and if a page would need more than a quick going-over to pass as Strict, it is best to leave it with its green pass for the Transitional Doctype.

HTML5 code validation

My advice at Q1 2012 is to ignore HTML5 for the time being. Right now it is essentially an experimental system - even a 'concept' system - and its specification is still a draft, so the rules for writing it are not settled yet, never mind validating it. The W3C's HTML5 validator is itself experimental, as is the whole code system. At this stage people are even making up their own code to go into it, and some of it passes validation even though by any other standard it is faulty. To be brutally honest, the W3C management of the HTML5 implementation is best described as incompetent. As a guide to the situation, old web pages with 100 errors will validate as correct or nearly correct at the W3C validator when checked as HTML5.

If it has any uses at all currently, it is for building web forms, as these are now far easier: the browser can check user input itself (for example with the required attribute or type="email") instead of relying on JavaScript validation. Otherwise, you can ignore HTML5 until it stabilises and/or the W3C wakes up to the fact that it has badly mismanaged the introductory process - if not the entire concept. Apparently you can write reams of faulty code, or invent your own, and it will still validate OK, which does not seem an ideal situation. The basic issue is that the rules or conventions are not set out in full, so that, for example, you can leave tags open or close them; use a mix of upper and lower case; or put quotes around attribute values or not, as you please; and it will still validate correctly. Plus, nobody knows exactly what is supposed to be done in many areas, as there are no conventions yet. In fact, as a code system, it is an unholy mess. If you want to take it further then take a look at these links - but be advised that HTML5 is not a usable code system at present.

http://www.impressivewebs.com/html5-syntax-style/
http://www.impressivewebs.com/understanding-html5-validation/

As for its use within CMS/ecommerce/forum software, our main concern here - let's just hope no software authors take the bait until the gazillion issues have been resolved. Any page code generated in this style now runs the risk of being unmaintainable. HTML5 has no real rules as yet and is the Wild West, and at Q1 2012 you should probably avoid it unless you need a specific feature such as the improved HTML forms.

Code validator for IE6

Code validation for Firefox
There is no such thing as a code validator for Internet Explorer or Firefox etc. Code is either right or wrong, and which browser is chosen to view the resulting page is irrelevant.

You should validate your web page code at the W3C validator. This tells you if it is good code or not. You can then see how different browsers will render that code, using Browsershots for example, as described elsewhere on this page. Simple code will be displayed correctly in all correctly-functioning browsers. More complex code may have inter-browser compatibility issues, as the exact way to render it may be interpreted with slight variations. This can be checked in those browsers, or at Browsershots.

Some browsers are so poor that they cannot cope with standard, ordinary code, and chief among them is IE6, Internet Explorer 6. This browser has so many faults and security exploits that it should have been upgraded to a later version many years ago, or preferably blocked in your firewall and a more sensible choice of browser used. IE6 is a 'rogue' browser as its performance is so poor in so many areas. It has to have a special CSS file provided for its exclusive use, since it cannot render standard CSS correctly.

The IE6 cut-off date

Enough is enough. Internet Explorer 6 was obsolete by 2004, and has been a dangerous and low-quality choice for many years. Developers have been wrestling with this issue for a long time, and several have decided that from January 2010 no more effort can be expended on this Stone Age relic. Support for it will be discontinued after this date, including by us.

The history of web page code

Modern page code has evolved and no longer looks or acts like that of ten years ago. The main difference is that in the past a web page comprised HTML alone. Then along came CSS, which was used to alter display properties, and this CSS was written into the page itself, either in style attributes on individual tags (inline CSS) or in a style block in the page head.

Now, a page is composed of HTML (or xHTML) and CSS is placed in an external stylesheet - a far more efficient method. The HTML comprises the content and layout skeleton, the CSS is an added layer that tells a browser how to display the page in detail. The more display instructions transferred from HTML to the CSS the better.

In addition there is another factor: the layout scheme of the code. In the 90s the structure of a page, its overall layout, was usually determined by tables and their cells. However, tables were never intended for this purpose; they were designed for tabular data, ie charts, tables and forms. Using them for page layout results in a clunky, slow and bloated page that has numerous technical restrictions and cannot be viewed identically by all visitors everywhere. This is because of the limitations of the structure and its interpretation by a wide range of browsers, which, after all, are trying to make the page look like some form of chart or table if you use this structure.

In the early 2000s a new method became popular, layers (divs) and CSS, for structure and display. This method is called the tableless layout. It was so obviously superior that by 2002 cells and tables were almost obsolete. For the next three years, a half-and-half system was popular, where layers were mixed with tables, for web applications (or coders) who couldn't handle the new methods. This resulted in a mixed divs and tables layout.

From 2007, it was seen that the only viable way to code pages was with layers (divs) and CSS. Nowadays this is seen as a crucial part of web page construction and is used in all forms of page building - by hand, in wysiwyg editors, or by web applications such as CMS. Only the crudest or hard-coded old apps that can't be changed still use tables for layout. Hand-coders should have changed to divs and CSS by now: in 2009 it is seven years since tables were seen to be becoming obsolete, and by 2002 the most advanced web authoring applications had already changed over completely to divs and CSS.

Actually, table layout is not something to worry about when you are validating a page, as there may be nothing you can do about it. In any case it does not affect the validation itself. It has a big effect in other areas, but you can still get a green 'Passed' confirmation even if your web page building application uses an obsolete method of page layout.

Why validate?

Good question. It's the same answer as why, if you speak German, most words had better be in that language or you won't be correctly understood. If 33% of your speech is Bengali then you are not speaking any known language.

For us there are two more good reasons: it's a community thing. If you belong, you wear the same uniform. If you don't belong, don't wear it - but don't ask for the benefits of the community. It's entirely your choice.

And in addition, clean websites with perfect code are very successful commercially. That may influence us slightly!

However, you will find all sorts of arguments against this, and here are some excellent ones:

1. Search engines strip out all HTML tags and other code before they read your site text. So the code is irrelevant. So search success has nothing to do with site code. So website code has no bearing on commercial success.

2. Senior search engine staff have clearly stated that poor site code does not result in any form of penalty. This is logical because the majority of the world's websites have code errors, and cannot be ignored simply because of this factor.

3. The best-known SEO experts have stated that perfect code is not necessary for top results.

These are all excellent arguments and cannot be countered.

There is only one problem with this: all our own tests, research and livesite results have produced data that contradicts these statements.

In our own tests, our own research and our own website results, every single result or test we have ever run tells us that perfect site code is a factor in search results. It doesn't matter who says different, because we know what works. All we do is get top results; we don't have to act as company representatives in public, or appear as media stars. We just get results.

If you frequently read in books that A = B; if you go to lectures and are repeatedly told that A = B; if you take the advice of the experts, who tell you that A = B; then it is reasonable to assume that A = B.

But if in your daily life, you find that whatever you do, however you arrange it, whatever the circumstances, A always = C; then what do you do? Do you believe the people who tell you that A = B? Or do you believe the evidence of your own eyes on a daily basis, that A = C?

Sorry, but we know that A, here, equals C...

All other things being equal, a site with perfect code will trounce a site with poor code. You could get George W and Tony Blair to stand up on stage and tell us that clean code was of no relevance; but, much as we respect them, in this particular instance we would be forced, unwillingly, to disbelieve them - because we know the truth, strangely enough, is different.



Make sure your web code validates: it's the first step toward quality. If quality is of no interest to you, then simply forget about validation - it's not for you.
 
Web Business Managers