Website Page Code Validation Guide




Here is a two-page guide to basic page code validation. Validating your website pages is the first step toward web standards compliance and a quality website.

Part 1 - The background to page code validation; web standards compliance; quirks mode; HTML or xHTML  [this page.]
Part 2 - How to validate web pages.

Part 1 - The Background to Validation

Web standards

Web standards are the 'rules and regs' that lay out the way web page code is written and how browsers interpret it. In the end this determines how websites work. It seems reasonable that if you write some code that a browser will be trying to turn into a web page, then the code should be normal and correct language for that type of code - ie in a standard format and without errors. If you want to write French then it is no good if one word in three is misspelt or in Greek.

So getting the basic page code right is the first step toward web standards compliance. After this come accessibility and semantic markup - but let's start out simple. Standards compliance means that everybody sees the same page; people with older and less powerful equipment are not excluded; people with disabilities are not excluded; pages work correctly on any computer, on any platform; and pages don't behave in strange ways and freeze or lock up your computer. If you ignore web standards then anything can happen, and often does. Ignoring them also shows that an easy, quick and cheap route has been taken to publish the content, and a website built on that policy is marked above all by low quality.

Complying with accepted web standards is the foundation of a quality policy. It is impossible to maintain any form of quality control if you ignore web standards. In every case where you find evidence of disregard for standards, you will also find all sorts of other faults and evidence of low quality. Quality starts with standards compliance and in our experience, where it is ignored, you will face other more serious problems.

Standards compliance is an important test of both basic website construction knowledge, and perhaps more importantly, attitude.

Central control

If you have a type of code, such as PHP for example, there needs to be someone somewhere who decides how the code is written and what it does. This seems reasonable and logical. For PHP, then, there is a central organisation that does this job for all PHP coders. In fact they couldn't work without this central body; it would be chaos and anarchy otherwise.

So you can perhaps see that a central body to decide standards for code is not only a good idea - it's essential. Once such a body exists, its work is not much use if everyone ignores it. So it is up to us to follow its recommendations and ensure that our code is written correctly.

For web page code the central authority is the W3C, the World Wide Web Consortium. It is universally recognised as the central authority for web standards, and it decides how code is written and interpreted by browsers. Its founder and Director at the time of writing is Tim Berners-Lee, who invented the World Wide Web.

They have provided some useful tools for us to establish whether or not our code is correct. The most popular of these is the basic page code validator at:
http://validator.w3.org

Code validation

If you enter the domain name of your site in the text box there (eg www.a3webtech.com), then the test will tell you if your index page (front page or home page) code is correct or not; and if faulty, what the errors are.
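For anyone who wants to script this check rather than use the text box, here is a minimal Python sketch. It assumes the newer Nu HTML Checker at validator.w3.org/nu/, which takes the page address in a `doc` parameter and returns a JSON report when `out=json` is added. The helper names `checker_url` and `count_errors` are hypothetical, and the sample report is hand-written, not real checker output.

```python
import json
from urllib.parse import urlencode

# Assumption: the Nu HTML Checker endpoint, which validates the page named
# in the "doc" parameter and returns JSON when "out=json" is given.
NU_CHECKER = "https://validator.w3.org/nu/"


def checker_url(page_url: str) -> str:
    """Build the address that asks the checker to validate page_url (hypothetical helper)."""
    return NU_CHECKER + "?" + urlencode({"doc": page_url, "out": "json"})


def count_errors(report_json: str) -> int:
    """Count the messages of type "error" in a JSON report (hypothetical helper)."""
    report = json.loads(report_json)
    return sum(1 for msg in report.get("messages", []) if msg.get("type") == "error")


if __name__ == "__main__":
    print(checker_url("http://www.a3webtech.com/"))
    # Hand-written sample in the report's shape, not real checker output.
    sample = '{"messages": [{"type": "error", "message": "Example error."}, {"type": "info", "message": "Example note."}]}'
    print(count_errors(sample))  # prints 1
```

To run a real check, fetch the address returned by `checker_url` with any HTTP client and pass the response body to `count_errors`. The sketch leaves out the network call so that it runs offline.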

If you are lucky the page will pass with a green icon and a congratulatory message. In reality, though, you will probably have some errors to fix.

The first thing to note is that there are many code variations that browsers can read - but you have to tell them which one you are using. In other words, if you have written in English, you say "This page is in English."

The equivalent statement in web page code is the Doctype, or document type declaration. The most common are HTML and xHTML, and there are Strict and Transitional versions of each. If possible you should always use the Strict type, as it describes true and correct code. The Transitional type is a relaxed specification: without the DTD address (the URL part of the Doctype) it puts browsers into quirks mode, and even with it most browsers use a halfway 'almost standards' mode rather than full standards mode. The HTML Transitional type is a low-quality mode that is used when there is no alternative, or when a shortcut must be taken. It is not equal to HTML Strict and is an 'economy' alternative.

Quirks mode

This is a backward-compatibility mode, designed so that pages written for older or faulty browsers such as Internet Explorer 5 or 6 still display roughly as their authors intended. Some commands and many errors are ignored, and an 'average' page results. The page may not display correctly and it is a cheat method. However, it can be used in cases where a web application creates its own version of code that is difficult or impossible to correct perfectly.

If you do not include a Doctype; or if the formatting of this statement is incorrect; or if there is an XML prolog; or if there is anything before the Doctype - then a browser may be put into quirks mode. In this mode the browser tolerates many code errors, as long as they are of a kind it has been built to accommodate.
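The triggers listed above can be checked mechanically. This is a rough Python sketch, not any browser's real algorithm: the helper `quirks_triggers` is hypothetical, it only recognises a Doctype beginning `<!DOCTYPE html` (in any case), and it reports whatever comes before that Doctype. Which browsers actually react to each trigger varies; Internet Explorer is the most sensitive.

```python
import re

# Any Doctype that starts "<!DOCTYPE html", in upper or lower case.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\b", re.IGNORECASE)
# XML prologs and comments: the two common things found before a Doctype.
PROLOG_OR_COMMENT_RE = re.compile(r"<\?xml.*?\?>|<!--.*?-->", re.DOTALL)


def quirks_triggers(source: str) -> list[str]:
    """List reasons why the start of a page might trigger quirks mode (hypothetical helper)."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        return ["no recognisable Doctype"]
    # A byte-order mark is an encoding marker, not page content.
    before = source[: match.start()].lstrip("\ufeff")
    reasons = []
    if "<?xml" in before:
        reasons.append("XML prolog before the Doctype")
    if "<!--" in before:
        reasons.append("comment before the Doctype")
    # Whatever is left once prologs, comments and whitespace are removed.
    if PROLOG_OR_COMMENT_RE.sub("", before).strip():
        reasons.append("other content before the Doctype")
    return reasons


if __name__ == "__main__":
    page = ('<?xml version="1.0" encoding="utf-8"?>\n'
            '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
            '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')
    print(quirks_triggers(page))  # ['XML prolog before the Doctype']
```

An empty list means the Doctype is the first thing in the source, which is what the text recommends.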

Here is an HTML Doctype:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">


It should be the first item in the source code. Nothing should come before it.

Here is an xHTML Doctype:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


If you place an XML prolog before it then some browsers will be thrown into quirks mode. In practice this affects IE6; IE7 and later ignore the prolog when choosing a mode, and other browsers are not affected. An XML prolog looks like this:

<?xml version="1.0" encoding="utf-8"?>

Here is the previous xHTML Doctype with an XML prolog that means IE6 will go into quirks mode:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">


However, in theory quirks mode can be triggered in IE simply by putting anything at all before the Doctype, such as a commented-out line. So if your page crashes in IE6, you can always try putting it into quirks mode by one of these tricks - it might make the page work. Here is a commented-out ('invisible') line:

<!-- quirks mode for Internet Explorer -->

Here is an HTML Doctype with a quirks mode comment for IE browsers:

<!-- quirks mode for Internet Explorer -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">


This might assist faulty browsers to read a page correctly when otherwise they cannot cope.


Browser functional modes
Modern correctly-functioning browsers have either two or three operational modes:
  • standards mode (ie standards-compliant, strict) - pages are interpreted strictly according to the rules. This is how a page will be read if the correct Doctype is used. In theory, then, a page will always look the same on any browser, on any platform. It will always be interpreted correctly.

  • quirks mode - a mode that attempts to read the page as an old and faulty browser would, and introduces various quirky ways of interpreting the page that these browsers might have used.

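  • almost standards mode (also called transitional mode; occasionally, and incorrectly, called strict mode, which is a bad misnomer) - a halfway mode between the main two, used for example when a Transitional Doctype includes its DTD address.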
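Browsers choose between these modes from the Doctype alone. The Python sketch below follows a small subset of the Doctype rules in the WHATWG HTML standard, which calls almost standards mode 'limited-quirks' mode. The function `browser_mode` is hypothetical; the real rules also list dozens of older public identifiers that trigger quirks mode, and this sketch treats all of those as standards mode and ignores anything before the Doctype.

```python
import re

# Captures the root name, the public identifier and the optional system
# identifier (the DTD address) of a Doctype.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(\w+)(?:\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?)?',
    re.IGNORECASE,
)
# Identifiers that get almost standards mode with or without the DTD address.
XHTML_LOOSE = ("-//w3c//dtd xhtml 1.0 transitional//", "-//w3c//dtd xhtml 1.0 frameset//")
# Identifiers whose mode depends on whether the DTD address is present.
HTML4_LOOSE = ("-//w3c//dtd html 4.01 transitional//", "-//w3c//dtd html 4.01 frameset//")


def browser_mode(source: str) -> str:
    """Return 'standards', 'almost standards' or 'quirks' (simplified, hypothetical helper)."""
    match = DOCTYPE_RE.search(source)
    if match is None or match.group(1).lower() != "html":
        return "quirks"
    public_id = (match.group(2) or "").lower()
    has_dtd_address = match.group(3) is not None
    if public_id.startswith(XHTML_LOOSE):
        return "almost standards"
    if public_id.startswith(HTML4_LOOSE):
        return "almost standards" if has_dtd_address else "quirks"
    # Simplification: every other identifier is treated as standards mode.
    return "standards"


if __name__ == "__main__":
    print(browser_mode('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'))  # quirks
```

The HTML 4.01 Transitional case shows why the DTD address matters: the same public identifier gives quirks mode without it and almost standards mode with it.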

Browser checks

If you build web pages you need more than one browser and more than one screen. Otherwise, some people may not see what you see. One way to check your pages in a variety of other browsers is to use an online checker such as this one:
http://browsershots.org

When building pages it is preferable to check them in a standards-compliant browser such as Firefox. This means that as many other people as possible will see the page correctly. If you cannot really comprehend this and think only IE versions are important, then perhaps you should take a look at the intro page of Browsershots.org, and you will see that the web is a big place and IE doesn't really have too important a role - and it is diminishing every year. It is wrong to build a page for faulty browsers because people with proper, correctly-functioning equipment may be penalised.

HTML or xHTML?

Which one to choose? First check your page source code for a Doctype. If it's there, use that one. Modern web applications tend to use xHTML (Extensible HyperText Markup Language). If you have a choice, at this time (2008), I recommend you use HTML and not xHTML, for several reasons:

1. It's much easier to code for HTML, as it is more basic and there are far more resources available.
2. If you are reading this and need the advice, you are not an expert, so use HTML.
3. There is no difference whatsoever in practical terms.
4. Static web pages are best coded in HTML not xHTML.
5. xHTML is best for dynamic (database-driven) applications.


Anyway - if there is no Doctype, move to the next stage:

HTML source code check
HTML allows uppercase text in the code.

HTML may not use CSS fully and may have font tags within the page code. In the following example there are uppercase letters; and a mix of upper and lower case. The bold tag is closed in HTML style. The line break tag is clearly in HTML format:

<FONT CLASS=Arial-16pxb><b>Stuff</b><BR>


xHTML source code check
xHTML element and attribute names are all lowercase. The only uppercase in the markup itself is in the Doctype (the keywords DOCTYPE and PUBLIC and the identifier text), which is not seen as part of the page code.

xHTML uses a specific type of tag closure, like this:

<img src="/images/boat.gif" alt="Nice Boat Picture" />
<br />  


The closing slash comes at the end of the tag, after the attributes. In HTML these tags have no slash at all. You can also see the space before the slash, which lets older browsers read the tag without problems. The capital letters in "Nice Boat Picture" are in text that 'prints' - it's on-page text and not part of the code. That alt text phrase shows instead of the image, or before it loads, and/or on the status bar. In IE browsers it shows on hover - which is a useful (non-standard) feature and probably the only good point they have.
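Because xHTML is a form of XML, one quick way to see whether markup uses xHTML-style tag closure is to give it to an XML parser, which rejects unclosed tags such as `<br>` outright. This minimal Python sketch uses the standard library's ElementTree parser; the helper name `is_well_formed_xml` is hypothetical. It checks well-formedness only, not validity against a Doctype: `<BR />` would pass, and named entities such as `&nbsp;` are reported as errors because the parser does not read the DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the markup parses as XML, as xHTML must (hypothetical helper)."""
    try:
        # Wrap in one root element so several sibling tags can be checked together.
        ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed_xml('<img src="/images/boat.gif" alt="Nice Boat Picture" />'))  # True
    print(is_well_formed_xml("<b>Stuff</b><BR>"))  # False: <BR> is never closed
```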


So: first check the Doctype. Then look for clues in the code. It will be very hard to get a document to validate to the wrong Doctype as the errors will be more numerous, and harder to correct. Lots more work.

First, look at whether the code text includes capital letters. If it does, it's HTML and not xHTML. Then, place an HTML Strict Doctype on the page and try to get it to validate. If there are errors, ask the validator to try against a Transitional Doctype. If the results are better (as they are quite likely to be), then you can decide whether you wish to take the easy way out. If so, change to a Transitional Doctype and continue. Correct the errors one by one and you'll see the total dropping away quickly. Many errors cascade - that is to say, if you fix three, another ten may disappear because they were caused by the first three. So a page with 100 errors may need far fewer than 100 fixes. Which is good news if you're faced with a lot!

We advise that if your code has no Doctype, and has no xHTML features such as the obvious closing tags, then go for an HTML Doctype. Try Strict first, then Transitional if you want an easier job. Strict is the better and more correct choice.
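The checks in the last few paragraphs can be sketched as a small Python function. It is a heuristic under stated assumptions, not a validator: the name `guess_markup` is hypothetical, it reads the Doctype's public identifier if there is one, and otherwise looks only at tag names for capital letters and at the source for `/>` tag closures.

```python
import re

PUBLIC_ID_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)
# The name in an opening or closing tag; "<!" and "<?" are not matched.
TAG_NAME_RE = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)")


def guess_markup(source: str) -> str:
    """Guess the markup type: Doctype first, then clues in the code (hypothetical helper)."""
    match = PUBLIC_ID_RE.search(source)
    if match:
        public_id = match.group(1).upper()
        family = "xHTML" if "XHTML" in public_id else "HTML"
        level = "Strict"
        for word in ("Transitional", "Frameset"):
            if word.upper() in public_id:
                level = word
        return f"{family} {level} (from Doctype)"
    # No Doctype: capital letters in tag names point to HTML.
    if any(name != name.lower() for name in TAG_NAME_RE.findall(source)):
        return "HTML (uppercase tag names)"
    # xHTML-style closures such as <br /> point to xHTML.
    if "/>" in source:
        return "xHTML (self-closing tags)"
    return "HTML (no xHTML features found)"


if __name__ == "__main__":
    print(guess_markup("<FONT CLASS=Arial-16pxb><b>Stuff</b><BR>"))  # HTML (uppercase tag names)
```

The order of the checks mirrors the advice above: an existing Doctype wins, capitals rule out xHTML, and only then do self-closing tags count as evidence for xHTML.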

Modern code

Modern page code has evolved and no longer looks or acts like that of ten years ago. The main difference is that in the past a web page comprised HTML alone. Then, CSS was used to alter display properties - and this CSS was included in the page itself, either in style attributes on individual tags (inline CSS) or in a style block in the page head.

Now, a page is composed of HTML (or xHTML) and CSS placed in an external stylesheet - a far more efficient method. The HTML comprises the content and instructions, the CSS is an added layer that tells a browser how to display the page. The more display instructions transferred from the HTML to the CSS the better.
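One rough way to see how many display instructions are still in the HTML rather than in an external stylesheet is to count them. This Python sketch uses the standard library's `html.parser`; the class and function names are hypothetical, and the four things it counts (font tags, style attributes, style blocks and stylesheet links) are an assumed, incomplete list of presentational markup.

```python
from html.parser import HTMLParser


class PresentationCounter(HTMLParser):
    """Count display instructions written into the HTML itself (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = {"font tags": 0, "style attributes": 0,
                       "style blocks": 0, "external stylesheets": 0}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        attributes = dict(attrs)
        if tag == "font":
            self.counts["font tags"] += 1
        elif tag == "style":
            self.counts["style blocks"] += 1
        elif tag == "link" and (attributes.get("rel") or "").lower() == "stylesheet":
            self.counts["external stylesheets"] += 1
        if "style" in attributes:
            self.counts["style attributes"] += 1


def count_presentation(source: str) -> dict:
    """Feed a page to the counter and return the totals (hypothetical helper)."""
    counter = PresentationCounter()
    counter.feed(source)
    counter.close()
    return counter.counts


if __name__ == "__main__":
    page = '<head><link rel="stylesheet" href="/site.css" /></head><body><p style="color:red">Hi</p></body>'
    print(count_presentation(page))
```

A page that has moved its display rules into CSS should show zeros everywhere except "external stylesheets".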

In addition there is another factor: the codebase layout scheme. In the 90s the structure of a page, its overall layout, was usually determined by tables and their cells. However, tables were never intended for this purpose and were originally for tabular data, ie charts, tables and forms. Using them for page layout results in a clunky, slow and bloated page that has numerous technical restrictions and cannot be viewed identically by all visitors everywhere. This is because of the limitations of the structure, and its interpretation by a wide range of browsers - which after all, are trying to make the page look like some form of chart or table if you use this structure.

In the early 2000s a new method became popular: layers and CSS for structure and display. This method is called the tableless layout. It was so obviously superior that by 2002 cells and tables were obsolete and the only viable way to code pages became the use of layers (divs) and CSS. Nowadays this is seen as a crucial part of web page construction and is used in all forms of page building - by manual means, wysiwyg editors, or by web applications such as CMS.

You need not worry about this when validating a page, as there is nothing you can do about it at that stage. In any case it does not affect the validation itself. It has a big effect in other areas, but you can still get a green 'Passed' confirmation even if your web page building application uses an obsolete page layout scheme.

Why validate?

Good question. It's the same answer as why, if you speak German, most words had better be in that language or you won't be correctly understood. If 33% of your speech is Bengali then you are not speaking any known language.

For us there are two more good reasons: it's a community thing. If you belong, you wear the same uniform. If you don't belong, don't wear it - but don't ask for the benefits of the community. It's entirely your choice.

And in addition, clean websites with perfect code are very successful commercially. That may influence us slightly!

However, you will find all sorts of arguments against this, and here are some excellent ones:

1. Search engines strip out all HTML tags and other code before they read your site text. So the code is irrelevant. So search success has nothing to do with site code. So website code has no bearing on commercial success.

2. Senior search engine staff have clearly stated that poor site code does not result in any form of penalty. This is logical because the majority of the world's websites have code errors, and search engines cannot ignore them all simply because of this factor.

3. The best-known SEO experts have stated that perfect code is not necessary for top results.

These are all excellent arguments and cannot be countered. There is only one problem with this: all our own tests, research and live-site results have produced data that contradicts these statements.

If you frequently read in books that A = B; if you go to lectures and are repeatedly told that A = B; if you take the advice of the experts, who tell you that A = B; then it is reasonable to assume that A = B.

But if in your daily life, you find that whatever you do, however you arrange it, whatever the circumstances, A always = C; then what do you do? Do you believe the people who tell you that A = B? Or do you believe the evidence of your own eyes on a daily basis, that A = C?

In our own tests; our own research; our own website results: every single result or test we have ever run tells us that perfect site code is a factor in search results. It doesn't matter who says different because we know what works. All we do is get results, we don't act as public speakers or media stars. We just get results.

All other things being equal, a site with perfect code will trounce a site with poor code. You could get George W and Tony Blair to stand up on stage and tell us that clean code was of no relevance; but, much as we respect them, in this particular instance we would be forced, unwillingly, to disbelieve them - because we know the truth is different.



Make sure your page code validates: it's the first step.

 