Source Code
getpage.pl source
analpage.pl source
sitemap.pl source
dandy404.pl source
dwd.pm source
dwd.conf source

Web Applications

dandy404.pl - Custom 404 Error

When a webserver receives a request for a URL which it cannot match, it returns a 404 status code in the HTTP response. The browser then displays a default 404 page: Page not found! This is rarely helpful or informative, since the missing page usually speaks for itself. It is possible to configure the server to return a specific 404 error page, which certainly looks better than the default but still does not really help.

The solution demonstrated here is rather different. Instead of directing the 404 error to a specific page, we can just as easily invoke a CGI program which may be able to remedy the missing page problem.

The previous application, Site Mapper, produced a map of the website's links. A by-product of this is that a list of all possible links within the website can be generated too. This has many practical uses, as we shall see...

There are two common reasons for invalid request URLs: either the filename is correct but the path is wrong, or part of the URL has been mistyped. In both cases this program should be able to find the correct URL and direct the user to it. Naturally, if the request URL is utterly wrong, perhaps for a page which simply does not exist, then a fix is not possible.

The basic program function is discussed below; refer to the links in the sidebar for the source code. To see the program in action, follow this deliberately invalidated link back to this page:

Program Function

The Apache error handler for the website, contained within the .htaccess file in the top-level directory, should be modified to direct 404 errors to the new program:

ErrorDocument 404 /cgi-bin/dandy404.pl

When the program is called it acquires the request URL from the CGI environment and then compares it against a separate list of all possible URLs within this website stored in a text file on the webserver.
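As a rough illustration of this first step, here is a minimal sketch, not the code from dandy404.pl. It assumes that Apache, when it hands a 404 to an ErrorDocument script, exposes the originally requested path in the REDIRECT_URL environment variable, with REQUEST_URI as a fallback. The subroutine names requested_path and load_url_list are hypothetical.

```perl
use strict;
use warnings;

# Return the path the visitor originally asked for.
# Assumption: REDIRECT_URL holds the original path when the script is
# reached through ErrorDocument; REQUEST_URI is used as a fallback.
sub requested_path {
    my ($env) = @_;
    my $path = $env->{REDIRECT_URL} // $env->{REQUEST_URI} // '';
    $path =~ s/[?#].*\z//s;          # drop any query string or fragment
    return $path;
}

# Read the list of known URLs: one per line, blank lines ignored.
sub load_url_list {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @urls;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;          # strip newline and trailing whitespace
        push @urls, $line if length $line;
    }
    close $fh;
    return @urls;
}
```

In a live CGI call this would be used as requested_path(\%ENV), with the list file located by joining the document root and the configured path.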

The possible URLs are then ranked with the best matches first and a 404 page can be returned with the most likely links displayed within it.
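The page does not say how the matches are scored, so the following is only one plausible approach and should not be read as the algorithm used by dandy404.pl. It uses Levenshtein edit distance to catch mistyped URLs, and treats a matching filename under a different path as a near-perfect match, which covers the two failure cases described earlier. The names edit_distance, basename_of and rank_urls are hypothetical, and lower scores mean better matches.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic Levenshtein edit distance between two strings.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Everything after the final slash, e.g. "sitemap.html".
sub basename_of {
    my ($url) = @_;
    my ($name) = $url =~ m{([^/]*)\z};
    return $name;
}

# Return a list of { url, score } hashes, best (lowest score) first.
sub rank_urls {
    my ($request, @known) = @_;
    my $want = lc(basename_of($request));
    my @scored = map {
        my $d = edit_distance(lc($request), lc($_));
        # Right filename, wrong path: cap the score so it ranks near the top.
        $d = min($d, 1) if length($want) && lc(basename_of($_)) eq $want;
        +{ url => $_, score => $d };
    } @known;
    return sort { $a->{score} <=> $b->{score} or $a->{url} cmp $b->{url} } @scored;
}
```

Comparing every known URL against the request is quadratic in string length per comparison, which should be acceptable for a site with a few hundred pages but may need a cheaper pre-filter on a larger one.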

It is possible to output a simple redirection page and take the user directly to the suggested page. However, this may not always be desirable. In my view it is better to advise the user that an error has occurred and allow them to make an informed decision as to which page to go to instead.
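The advisory approach can be sketched as below. This is an illustrative outline only, with hypothetical subroutine names and a deliberately bare HTML layout. It emits a Status header so that the response remains a genuine 404 when the script is reached through ErrorDocument, and it escapes the request URL before echoing it back, since that value is supplied by the visitor.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attribute values.
sub html_escape {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build the whole CGI response: headers, a blank line, then the HTML body.
sub build_404_response {
    my ($request, @suggestions) = @_;
    my $items = join '', map {
        my $u = html_escape($_);
        qq{<li><a href="$u">$u</a></li>\n};
    } @suggestions;
    my $body = "<html><head><title>Page not found</title></head><body>\n"
             . "<h1>Page not found</h1>\n"
             . '<p>No page exists at ' . html_escape($request) . ".</p>\n"
             . ($items ? "<p>Did you mean:</p>\n<ul>\n$items</ul>\n" : '')
             . "</body></html>\n";
    # The Status header tells the server to keep the 404 status code.
    return "Status: 404 Not Found\r\nContent-Type: text/html\r\n\r\n$body";
}
```

The script would simply print the returned string. An automatic redirect would instead send a Location header, at the cost of hiding the error from the user.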

The referer [sic] value is also checked. This indicates where the invalid URL came from, and hence what action is required or possible.

If the referer value is blank then this indicates that the URL has either been typed into the address bar or it is being requested by something other than a regular browser. If the referer value is populated then it will indicate which website the URL came from, your website or an external one. In both cases a broken link is the cause, but clearly the action to be taken is different. The program can be configured to log all calls that have a referer defined.
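The three cases can be told apart with a small helper along these lines. This is a sketch under stated assumptions: the subroutine name classify_referer is hypothetical, the referer is taken to arrive in the standard CGI variable HTTP_REFERER, and the site's own host name is passed in by the caller (for instance from SERVER_NAME).

```perl
use strict;
use warnings;

# Classify where the bad URL came from:
#   'none'     - no referer: typed address, bookmark, or a non-browser client
#   'internal' - a page on this site holds the broken link
#   'external' - a page on another site holds the broken link
sub classify_referer {
    my ($referer, $our_host) = @_;
    return 'none' unless defined($referer) && $referer =~ /\S/;
    my ($host) = $referer =~ m{^https?://([^/:?#]+)}i;
    return 'external' unless defined $host;    # unparseable: treat as foreign
    return lc($host) eq lc($our_host) ? 'internal' : 'external';
}
```

A call such as classify_referer($ENV{HTTP_REFERER}, $ENV{SERVER_NAME}) would then decide whether the entry is worth logging and whether the fix lies with you or with another webmaster.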

Program Options

The program itself is fairly simple, with essentially a single mode of operation. It will modify its behaviour slightly depending on how well it matches the request URL and whether or not there is a defined referer value. It will also note when it is invoked directly, with no attendant error.

The program does have a few settings which are contained within the dwd.conf external configuration file. These are as follows, and explained below:

dandy404urllist     /src/url.dat
dandy404displayno   8
dandy404logfile     /src/404errors.dat
dandy404log404      1

dandy404urllist: The path from the document root to the text file which contains the list of allowed URLs for your website, one per line.

dandy404displayno: Defines the number of matching URLs that will be displayed. If the last link in this list matches the request equally as well as links that would otherwise be omitted, those links are displayed as well. The remaining possible URLs are also added to the 404 page, but these are hidden using DHTML and only revealed on demand by clicking an associated link.

dandy404logfile: The path from the document root to a log file. If logging is enabled then all 404 errors with a referer value will be logged with the invalid URL and the referring page, so that you can fix the link or contact the relevant webmaster.

dandy404log404: A simple boolean; 1 enables logging, 0 turns it off.
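To show how these settings might be consumed, here is a minimal sketch. It is not the parser from dwd.pm: it assumes only the "name, whitespace, value" layout shown above, and the handling of # comment lines is a guess. The second routine illustrates the tie rule for dandy404displayno, given matches already ranked best first with a numeric score where lower is better. Both subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Parse "name value" lines into a hash.
# Assumption: blank lines and lines starting with # can be skipped.
sub parse_conf {
    my ($text) = @_;
    my %conf;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#|\z)/;
        my ($key, $value) = $line =~ /^\s*(\S+)\s+(.*?)\s*\z/ or next;
        $conf{$key} = $value;
    }
    return %conf;
}

# Split ranked matches into those shown at once and those kept hidden,
# extending the shown set through any run of equal scores at the cut-off.
sub split_matches {
    my ($limit, @ranked) = @_;
    my $cut = $limit < @ranked ? $limit : scalar @ranked;
    $cut++ while $cut > 0 && $cut < @ranked
              && $ranked[$cut]{score} == $ranked[$cut - 1]{score};
    return ([ @ranked[0 .. $cut - 1] ], [ @ranked[$cut .. $#ranked] ]);
}
```

With a limit of 2 and scores of 0, 1, 1, 1 and 3, the first four matches would be shown and the last one hidden, because the third and fourth tie with the second.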

Program Availability

You are welcome to take a copy of this program and use it within your own website, provided that such use is entirely non-commercial. The design and structure should allow you to easily modify the output to match your own website; see the CGI Scripting pages for more details on how to do this.

You will need to copy three files: dandy404.pl, dwd.pm and dwd.conf. Open each of these and copy-paste its contents directly into a text file.

You can modify the program to run in any way that you want, or pick apart the key routines to write your own version. In addition, the core subroutines used by this program form the basis of many other useful website utilities...
