Installing the W3C Markup Validator on Apache under Windows

About this guide

This guide to installing the W3C Markup Validator was contributed by David Tibbe. It is also available in German.

See the generic installation guide for instructions on how to install the Markup Validator on other platforms and links to other platform-specific guides.

Needed programs and other prerequisites

The first prerequisite to run the validator on your system is a Web server. This guide will assume that the server is already installed, and its configuration will not be discussed here. The server used in this guide is Apache 2, and the instructions should be applicable to most versions of Apache.

The Markup Validator itself is basically a script written in Perl, so you will need that, too. ActivePerl (version 5.8) is one of the options and, thanks to its installer, should not be difficult to set up.

Of course, you will also need the Validator itself. It is available for download as two tar archives: the validator itself (~300 kB) and a collection of DTDs (~400 kB).

The validator relies on a number of Perl libraries, or "modules". ActiveState maintains a list of all modules available for ActivePerl. The list also states whether a module is "Core" (which means it is built in) or has to be downloaded.
For the validator, the following modules are required:

...and the following are optional:

Each of them is available as an individual package from ActiveState. They are also bundled, ready to install, in a single zip file (courtesy of the guide's author).

Finally, you will need some calm and patience. A complete installation of the validator (including Apache and Perl) will take about an hour if you do not have much experience.

Directory Structure

It is worth thinking about which directories the programs should be installed in. Just clicking "Next" through every installation routine is not a very good idea.

One suggested layout is a directory C:\www that holds all programs related to the Web server, each in its own subfolder. For instance, Apache is installed in C:\www\Apache2, Perl in C:\www\perl, the Perl modules are unzipped to C:\www\ppm, the validator itself goes in C:\www\validator and the DTD collection in C:\www\sgml-lib.

The rest of this guide will assume that these paths are being used. If you want to use another directory structure, you will have to adapt the instructions and sample configuration to your own setup.
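
Before continuing, the suggested layout can be checked with a few lines of script. The following is a minimal Python sketch and not part of the validator. The function name missing_directories is made up for illustration, and the folder names are the ones suggested in this guide (ppm being the folder later added as the local PPM repository).

```python
from pathlib import Path

# Subfolders of C:\www suggested in this guide (adapt to your own layout).
EXPECTED_SUBFOLDERS = ["Apache2", "perl", "ppm", "validator", "sgml-lib"]


def missing_directories(base, subfolders=EXPECTED_SUBFOLDERS):
    """Return the names of the expected subfolders that do not exist under base."""
    base_path = Path(base)
    return [name for name in subfolders if not (base_path / name).is_dir()]
```

Calling missing_directories(r"C:\www") returns an empty list once every folder is in place. Otherwise it returns the names that are still missing.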

Installation of the Programs

Next, you will be installing all the programs and libraries one after the other.

The Apache Web server should be installed and successfully started first. As mentioned, this will not be explained here, but you will find many how-tos all over the net, e.g. in the documentation on the Apache site. The only important thing to know is that the installation routine creates a subfolder Apache2 by itself. Therefore, you will want to choose C:\www as the installation directory, and Apache will be installed to C:\www\Apache2.

Adding the Perl Modules

Since Perl 5.8.8 build 817.91, PPM has had a graphical user interface. It is quite easy to use; if you do have problems, have a look at its documentation.

You can add the modules from the downloaded package or directly from the web. If you want to install them from a local repository, you have to add it first (if you don't want to do so, just skip this step).

That can be done in the preferences dialogue (Edit -> Preferences -> Repositories). Click the folder icon, select C:\www\ppm and name the repository "Local", for example. Click "Add" and then "OK".

When you type the first letters of the desired package, the long list shrinks until only matching packages are shown. Select the one you want to install and choose "Install..." from the context menu. When you have done that for all packages, click the little green arrow at the top of the window.

In the small status window at the bottom you will see the progress and result of each installation. Exit PPM when all packages have been installed.

Configuration of the Apache Web server

The first file to edit is httpd.conf, located in C:\www\Apache2\conf. It is the central configuration file of your Apache installation. Making a backup before editing it is recommended.

The validator pages are assembled using Server Side Includes (SSI). Therefore, Apache has to load the required module. The modules are loaded in "Section 1: Global Environment". SSI needs mod_include in order to work. The line

LoadModule include_module modules/mod_include.so

has to be uncommented (by deleting the # at the beginning of the line), or added in full if it is not present at all.
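
Whether the line is really active can be checked mechanically. The following Python sketch is an illustration only: the function name is_module_loaded is hypothetical, and it works purely on the text of httpd.conf without asking Apache itself.

```python
import re


def is_module_loaded(conf_text, module_name):
    """Return True if conf_text has an uncommented LoadModule line for module_name."""
    # A leading '#' makes the line a comment, so only spaces or tabs are
    # allowed in front of the directive. Directive names are case-insensitive.
    pattern = re.compile(
        r"^[ \t]*(?i:LoadModule)[ \t]+" + re.escape(module_name) + r"[ \t]+\S+",
        re.MULTILINE,
    )
    return pattern.search(conf_text) is not None
```

After reading C:\www\Apache2\conf\httpd.conf into a string, is_module_loaded(text, "include_module") should return True once the # has been removed.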

The next step is to create a virtual host. It is needed because the validator should run from a different directory and be logically separated from the default host. At the end of httpd.conf, in "Section 3: Virtual Hosts", add the following lines:

NameVirtualHost 127.0.0.2:80

<VirtualHost 127.0.0.2:80>
    ServerName validator.example.org
    DocumentRoot "C:/www/validator/htdocs"

    ErrorLog logs/error_validator.log
    CustomLog logs/access_validator.log common

    ScriptAlias /cgi-bin "C:/www/validator/httpd/cgi-bin"
    ScriptAlias /check "C:/www/validator/httpd/cgi-bin/check"

    AddType text/html .html
    AddOutputFilter INCLUDES .html

    <Directory "C:/www/validator/htdocs">
        Options ExecCGI Includes Indexes MultiViews
        AddEncoding x-gzip .gz
        <Files *.js.gz>
            ForceType application/javascript
        </Files>
        <Files *.css.gz>
            ForceType text/css
        </Files>
        AllowOverride None
        Order deny,allow
        Allow from localhost
    </Directory>
    
    <Directory "C:/www/validator/httpd/cgi-bin">
        Options ExecCGI Includes Indexes MultiViews
        AllowOverride None
        Order deny,allow
        Allow from localhost
    </Directory>
    
</VirtualHost>

The httpd.conf may be split into more than one file. The other files are then located in the C:\www\Apache2\conf\extra folder. If there is a file named httpd-vhosts.conf, edit that one instead and make sure that it is included from httpd.conf. That is, httpd.conf must contain the following Include line without a # in front of it:

# Virtual hosts
Include conf/extra/httpd-vhosts.conf

The meaning of every line in the virtual host configuration will not be discussed here. If you are interested, have a look at the Apache manual or at one of the many how-tos that Google finds. Just a few short notes: The first line specifies the IP address the validator should run at. It is a loopback address, so the validator will be accessible only from your own machine. The following lines specify the name of the host, the locations of the log files and some "shortcuts" for the cgi-bin directory and the check script. The AddType and AddOutputFilter lines make Apache parse HTML files for SSI directives. The last two sections set permissions for the directories used.
The files error_validator.log and access_validator.log can be found in the C:\www\Apache2\logs directory. They log every request and every error occurring on this host, and give you helpful hints when something goes wrong.

Finally, Apache has to be restarted so that the changes take effect. You can do that with the shortcut in the Apache program group (Start, Programs, Apache HTTP Server, Control Apache Server, Restart). A DOS box will appear briefly; when it disappears, Apache has been restarted.

When you open http://127.0.0.2/ in your browser, you should see the well-known site from http://validator.w3.org/. In the Apache configuration file, a name was defined for the virtual host (validator.example.org), but it cannot be resolved yet. That is changed in the next step.

Adaptation of the hosts-File

The hosts file can be seen as a local DNS configuration. It is located at C:\windows\hosts on Win9x and at C:\Windows\system32\drivers\etc\hosts on WinXP. The file may be missing, with a file named hosts.sam present instead. In that case, rename hosts.sam to hosts, that is, delete the file extension together with its leading dot.

When you open it in an editor, you will find a leading comment, followed by the line

127.0.0.1 localhost

That line means that the name localhost is resolved to 127.0.0.1 (i.e. opening http://localhost/ in your browser results in a request to http://127.0.0.1/).

Edit the file to the following:

127.0.0.1 localhost
127.0.0.1 www.example.org
127.0.0.2 validator.example.org

After these changes, the server is available at http://localhost/ and also at http://www.example.org/. Requests for http://validator.example.org/ are directed to http://127.0.0.2/.
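
To see which address a name will map to after the edit, the hosts file can be read with a small script. This is a rough Python sketch under simple assumptions: the function name parse_hosts is invented for illustration, and when a name appears more than once the first entry is kept, which is assumed here to match the resolver's behaviour.

```python
def parse_hosts(text):
    """Map each host name found in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Everything after '#' is a comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment, or address without a name
        address, names = fields[0], fields[1:]
        for name in names:
            # Assumption: the first entry for a name wins.
            mapping.setdefault(name.lower(), address)
    return mapping
```

Applied to the three lines shown above, the result maps validator.example.org to 127.0.0.2 and the other two names to 127.0.0.1.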

The server configuration is now finished. But if you try to validate a page, you will get an "Internal Server Error", because the check script has not been configured yet.

Configuration of the Validator

The directory C:\www\validator\htdocs\config contains a file called validator.conf. Open it in an editor. Lines beginning with # are comments.

For the SGML library, declare C:/www/sgml-lib. Note that forward slashes are used here instead of Windows-style backslashes.

The last option that has to be modified is Allow Private IPs = { no | yes }. It must be set to "yes". Otherwise, you cannot validate files served from the local PC: for security reasons, the validator refuses such requests with an access error.

After that, the configuration file will look like this:

#
# Main Configuration File for the W3C Markup Validation Service.
#
# See 'perldoc Config::General' for the syntax, and be aware that the
# 'SplitPolicy' is 'equalsign', ie. keys and values are separated by '\s*=\s*',
# and that 'InterPolateVars' is in effect.
#

#
# Base Path for Markup Validator files.
#
# You MUST set these unless you use the default locations for the files.
# e.g. the config files in "/etc/w3c/" and everything else in
# "/usr/local/validator/".
#
# Make sure all file paths below do NOT end with a slash

<Paths>
  #
  # Base path.  Defaults to the value of the W3C_VALIDATOR_HOME environment
  # variable or /usr/local/validator if the variable does not exist.
  #Base = /usr/local/validator

  #
  # Location of template files
  Templates = $Base/share/templates

  # configuration file for HTML::Tidy Module, if available
  TidyConf = $Base/htdocs/config/tidy.conf

  <SGML>
    #
    # The SGML Library Path.
    Library = C:/www/sgml-lib
  </SGML>
</Paths>

#
# This controls whether the debugging options are allowed to be enabled. 
Allow Debug = yes

#
# This lets you permanently enable the debugging options. Can be overridden
# with CGI options (unlike "Allow Debug" above).
Enable Debug = no

#
# Whether private RFC1918 addresses are allowed.
Allow Private IPs = yes

#
# Enable (or not) the web service API for this validator
# see http://validator.w3.org/docs/api.html
Enable SOAP = yes


#
# Whether the validator will check its own output.
# 0 means it will refuse to check its own output, 1 means it will but it will
# refuse to check the results of it checking itself. Etc.
Max Recursion = 0

#
# Protocols the validator is allowed to use for retrieving documents.
# The default is to allow http and https.
<Protocols>
  Allow = data,http,https
</Protocols>

#
# Email address of the maintainer of this service.
Maintainer = www-validator@w3.org

# Localization
# only English available for now
Languages = en



#
# Mapping tables etc...
#

#
# Main document Type Registry; contains all information on the types
# of documents we support and how they are processed.
<Types>
  Include types.conf
</Types>

#
# Mapping of charset names to their IANA names and how iconv(3) knows them.
<Charsets>
  Include charset.cfg
</Charsets>

#
# Map MIME Media Type to Parse Mode mapping.
<MIME>
  text/xml              = XML
  image/svg             = XML
  image/svg+xml         = XML
  application/smil      = XML
  application/xml       = XML
  text/html             = TBD
  text/vnd.wap.wml      = XML
  application/xhtml+xml = XML
  application/mathml+xml = XML
</MIME>
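
The two edited settings can be double-checked with a short script. The Python sketch below is deliberately simplified and is not a replacement for Config::General: it ignores Include lines and variable interpolation, and the function name read_settings and the dotted key format are assumptions made for this example.

```python
def read_settings(text):
    """Collect 'key = value' pairs, prefixing each key with its enclosing <Block> names."""
    settings = {}
    blocks = []  # names of the blocks currently open, outermost first
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("</"):
            if blocks:
                blocks.pop()
        elif line.startswith("<"):
            blocks.append(line.strip("<>").strip())
        elif "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            settings[".".join(blocks + [key])] = value
    return settings
```

With the file above, read_settings(text)["Paths.SGML.Library"] is C:/www/sgml-lib and read_settings(text)["Allow Private IPs"] is yes.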

The validator has now been configured. It is not ready for use yet, however: some lines in the validator script itself have to be changed first.

Adaptation of the check-Script

The following changes are necessary because the script is written for a Unix server, where some things differ from a Windows system.

Open the check script, located in the directory C:\www\validator\httpd\cgi-bin, in an editor. No line numbers are given in the following steps, because they might differ in later versions. Instead, each step quotes the comment or code that precedes the relevant lines in the script, so that you can find your way.

The very first line of the script has to be changed to

#!c:/www/perl/bin/perl.exe

The original line gives the path to the Perl interpreter in Unix style, so it has to be changed to Windows style. The -T parameter is dropped in the process.
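
If you prefer to script this change, for example to repeat it after an upgrade, the replacement can be sketched as follows. This is a hedged Python illustration, not part of the validator; the function name rewrite_shebang is hypothetical, and it only touches the first line when that line starts with #!.

```python
def rewrite_shebang(script_text, interpreter="c:/www/perl/bin/perl.exe"):
    """Replace the first line of script_text with '#!' + interpreter if it is a shebang."""
    first, newline, rest = script_text.partition("\n")
    if not first.startswith("#!"):
        return script_text  # no interpreter line to replace
    # Preserve a Windows-style line ending if the file uses one.
    ending = "\r" if first.endswith("\r") else ""
    return "#!" + interpreter + ending + newline + rest
```

Any options on the old line, such as -T, disappear because the whole first line is replaced.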

Further down, the script is told where to find the configuration file. That is done after the comment in these lines:

#
# Read Config Files.
eval {
  my %config_opts = (
     -ConfigFile => ($ENV{W3C_VALIDATOR_CFG} || '/etc/w3c/validator.conf'),

We do not define an environment variable but specify the full path to the file instead. So the lines have to be changed to:

#
# Read Config Files.
eval {
  my %config_opts = (
     -ConfigFile => 'C:/www/validator/htdocs/config/validator.conf',

For some configurations, the check script needs to know where its root directory is. It tries to read it from the environment variable W3C_VALIDATOR_HOME and uses a default directory if the variable is not set. That is done in these lines:

          Paths => {
            Base => ($ENV{W3C_VALIDATOR_HOME} || '/usr/local/validator'),
          },

So we hard-code our own path in that line:

          Paths => {
            Base => 'C:/www/validator',
          },

After saving the script, you can use the validator at http://validator.example.org/ just as you know it from http://validator.w3.org/.

That is it: your own validator is working.

Hints

Future versions of the validator may require additional Perl modules. They can be installed with PPM. Such a case is easy to recognise: when you try to run the script, you will get output like

Can't locate Config/General.pm in @INC (@INC contains: C:/www/perl/lib C:/www/perl/site/lib .)
    at C:/www/validator/httpd/cgi-bin/check line 46.
BEGIN failed--compilation aborted at C:/www/validator/httpd/cgi-bin/check line 46.

The message shows that the missing module is Config::General (the file Config/General.pm), which has to be installed.
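
The translation from the file name in the error message to the module name can be automated. The following Python sketch is illustrative only; the function name missing_module is made up, and the pattern assumes the message has the "Can't locate ... in @INC" form shown above.

```python
import re

# Matches the file path reported by Perl, without the trailing ".pm".
_CANT_LOCATE = re.compile(r"Can't locate (\S+)\.pm in @INC")


def missing_module(error_text):
    """Return the Perl module name from a "Can't locate ..." message, or None."""
    match = _CANT_LOCATE.search(error_text)
    if match is None:
        return None
    # Perl stores the module Config::General in the file Config/General.pm.
    return match.group(1).replace("/", "::")
```

The returned name is the one to look for in PPM.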

Windows XP with Service Pack 2 installed might have problems with the loopback address 127.0.0.2. The problem and its solution are described at http://support.microsoft.com/default.aspx?kbid=884020.

If you have any further questions, or suggestions for improving this guide, you can use our feedback channels.