CGI Scripting
Demo Source Code
Example #1 source
Example #2 source
Example #3 source
dwd.pm (#3) source
Example #4 source
dwd.pm (#4) source
dwd.conf source


CGI Environment - Example #4

The previous article briefly discussed 'out-sourcing' the program environment code to a shared subroutine.

This provides a simple yet robust framework for each program within a consistent and stable environment. The resulting program script is simpler, leaving you to concentrate on the primary task: the web application...

The main change is the addition of a new subroutine: initprogenv()
This takes care of a number of key program initialisation tasks, each of which is discussed in more detail below.

These ideas have been implemented in the next version of our example program, Temperature Converter v4; see the sidebar links for the v4 source code.

Initialising an Environment

When your program starts you want it to 'know' everything it needs as soon as possible. It needs to pick up all the various background data, as well as its arguments, before performing any necessary checks and only then executing its primary function.

As well as being picked up, this background data must be held in a form that any part of the program can access as required: it needs to be portable! This is especially true if a lot of functionality has been out-sourced to a shared module, as we are demonstrating here.

The solution has already been touched upon: store the data within a hash, which can easily be passed by reference to any part of the subsequent code.

In practice a single hash has its drawbacks: different parts of the data need to be handled differently, so it is more convenient to use two hashes with distinct functions:
%G (G for Global) holds all of the global scalar values, generally file locations, titles, strings, code-fragments and so on.
%arg (arg for argument) holds the program input, strictly those values passed to the program from the CGI input; these will be values from a web-form or passed in via the request URL.

In this revised version the rest of the initialisation has been passed to a separate subroutine within the module: initprogenv(). This takes references to the two hashes and to an array containing the names of all expected parameters.

By using a shared subroutine like this we ensure that all of the programs within the website are initialised in the same way, have access to the same data and function in a consistent manner. It also dramatically reduces the volume of code within the program: apart from a few initialisation lines, the rest is purely program-specific function.

Once the initialisation routine has completed, all that remains is to define any values that are specific to the program (in this case the return and associated links), and then we can go straight into the main program function.
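As a rough illustration of the calling side, the start of a program might look something like the sketch below. This is not the v4 source: the stand-in initprogenv(), the parameter names 'temp' and 'scale', and the 'return_link' key and path are all assumptions made so that the fragment runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the shared module's initprogenv(), so that this sketch is
# self-contained. The real routine lives in the shared module and does more.
sub initprogenv
  {my ($G,$arg,$paras) = @_;
  $$G{'progname'} = defined $ENV{'SCRIPT_NAME'} ? $ENV{'SCRIPT_NAME'} : '';
  foreach my $p (@$paras) {$$arg{$p} = '';}   # the real version reads CGI input here
  }

my (%G, %arg);
my @parameters = ('temp', 'scale');           # hypothetical parameter names

# Pass references so that the subroutine fills the caller's own hashes
initprogenv(\%G, \%arg, \@parameters);

# Program-specific values are added once the shared set-up is done
$G{'return_link'} = '/index.html';            # hypothetical key and path

# ... the main program function follows, reading from %G and %arg
```

Because only references are passed, any later subroutine can be handed the same two hashes and will see exactly the same data.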

The initialisation functions contained within initprogenv() are shown here and discussed in more detail below...

sub initprogenv
  {my ($G,$arg,$paras) = @_;
  #  Preload Global hash
  $$G{'doc_root'}    = $ENV{'DOCUMENT_ROOT'};
  $$G{'progname'}    = $ENV{'SCRIPT_NAME'};
  $$G{'domainpref'}  = "http://$ENV{'HTTP_HOST'}";
  $$G{'timestamp'}   = scalar localtime();
  $$G{'referer'}     = $ENV{'HTTP_REFERER'};
  $$G{'corefile'}    = "$$G{'doc_root'}/src/dwd.conf";
  #  Load external file configuration
  load_hash_from_file($G,$$G{'corefile'});
  #  Prepend doc_root path to SSI file paths
  foreach my $k (sort keys %$G)
    {if(($k =~ m/ssi$/) && ($$G{$k} =~ m/\.ssi$/))
      {$$G{$k} = "$$G{'doc_root'}$$G{$k}";}
    }
  #  Read each expected CGI parameter, defaulting to an empty string
  foreach my $p (@$paras) {$$arg{$p} = nvl(param($p),"");}
  #  Perform security checks if required...
  #  ...
  }

Note! There are a number of subroutines mentioned here which have been 'casually' added to the module. Within the scope of this simple example they are of little value; however, they play a key part in the other programs used within this site, and are just a few of a collection of highly portable, and therefore useful, subroutines that I have to hand. Here, as always, good coding ethics prevent me from reinventing the wheel and writing them twice! Always write code with re-use in mind!

Loading External Configuration Values

So far the only external values that we have needed to pass to the programs are the locations of the various SSI files. Within the scope of this very simple example program there is little need for anything more.

However, in more complex 'real-world' programs there will be a need to pass other values: pre-set names and data, other datafiles, switches, flags and so on. These could all be hard-coded into the initprogenv() subroutine, and for a single program that would be easiest. To make all of the programs as portable and as manageable as possible, though, the best way is to move these values to an external text file which is read by the program on start-up.

The file contains a series of tags and values; these are parsed into the %G global hash with the tags as the hash keys. This is taken care of by another subroutine: load_hash_from_file(). Reading data from files in this way is a common task and so is worth having as a dedicated subroutine, although obviously this simple example doesn't do it justice!
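For readers who want a feel for what such a loader involves, here is a minimal sketch. It is not the dwd.pm routine: the "tag = value" layout, the "#" comment lines and the sample tags are all assumptions, and the real dwd.conf format may well differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sketch of a tag/value loader. The "tag = value" layout and "#" comments
# are assumptions about the file format.
sub load_hash_from_file
  {my ($hash,$file) = @_;
  open(my $fh, '<', $file) or return 0;       # unreadable file: report failure
  while (my $line = <$fh>)
    {chomp $line;
    next if $line =~ m/^\s*(#|$)/;            # skip comments and blank lines
    if ($line =~ m/^\s*(\w+)\s*=\s*(.*?)\s*$/)
      {$$hash{$1} = $2;}                      # the tag becomes the hash key
    }
  close($fh);
  return 1;
  }

# Usage: write a small sample file, then load it into %G
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "# site settings (sample data)\n";
print $tmp_fh "header_ssi = /ssi/header.ssi\n";
print $tmp_fh "site_title = Demo Site\n";
close($tmp_fh);

my %G;
load_hash_from_file(\%G, $tmp_name);
print "$_ => $G{$_}\n" foreach sort keys %G;
```

Returning 0 for an unreadable file leaves the decision about what to do next with the caller, which seems a reasonable choice for a routine shared by many programs.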

Some of the values are SSI file locations. Once the main file has been loaded, a quick loop through all of the global values finds each of these and prepends the document root path (doc_root) to provide a full file path.

There are also a number of other useful values added to the global hash which allow all subsequent routines to 'know' things such as the program name, the execution timestamp and so on. These values are not used in this demonstration but are generally useful in many other programs and are included here as an example.

Reading of CGI Parameters

This important task now goes almost unnoticed in the initialisation routine: blink and you'll miss it.

All of the expected parameters are defined at the start of the program in the @parameters array. The following line within the initprogenv() subroutine loops through these values and reads them from the CGI environment using the subroutine 'param', which itself is part of the CGI.pm module that is also included in the program.

foreach my $p (@$paras) {$$arg{$p} = nvl(param($p),"");}

nvl() is another useful function: it goes through each of its arguments in turn and returns the first one that has a value; if none of them do, it returns an empty string. This means that all of the main program arguments will always have a defined value, even if they are not supplied by the calling form or URL. This simple default easily avoids undefined value issues within your programs!
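A function with this behaviour can be sketched in a few lines. This is not the module's own code; in particular, treating an empty string as 'no value' is an assumption about what the real routine does.

```perl
use strict;
use warnings;

# Sketch of nvl(): return the first argument that is defined and is not an
# empty string, or "" if there is none. Treating "" as "no value" is an
# assumption.
sub nvl
  {foreach my $v (@_)
    {return $v if defined($v) && length($v);}
  return "";
  }

print nvl(undef, "", "celsius"), "\n";   # prints "celsius"
print nvl(0, "fallback"), "\n";          # prints "0": zero counts as a value
print "[", nvl(undef), "]\n";            # prints "[]"
```

Testing with defined() and length(), rather than simple truth, means that a legitimate input of 0 is not mistaken for a missing value.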

Security and Anti-abuse Features...

Because the initprogenv() subroutine is executed by all of the programs on start-up it is a good place to add any checking or security related routines.

Within this example we have little need for this, so for now there is just a commented line in the code; on a larger or more complex site, however, it becomes relevant. There are several specific security issues that you may need to safeguard against, and each can be addressed by adding the required code here.

Out Of Context Program Call
CGI programs are most usually invoked from webpage forms; however, it is also possible to call the program directly by typing its name and arguments into the request URL. Many webpages use JavaScript to pre-process or validate the entries before submission, and many entries are restricted to a select range of values within the form. Directly invoking the program via the URL bypasses this and increases the risk of 'dirty' input data: the program may be running outside of its intended scope.

The solution is to check the referer value, another CGI environment variable (the mis-spelling is intentional!). This will contain the URL of the page from which the program was called, and will normally be the URL of the form itself. If, however, the program was invoked from the command line or the address bar of the browser, then the referer value will be blank. This is easily tested for. The response, if any, will of course depend on the requirements of the program and the website in general; for many simple programs there is no reason to disallow this behaviour at all.
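One way such a test might look is sketched here. It uses the 'referer' and 'domainpref' values that initprogenv() already stores in %G; the subroutine name called_from_own_site() and the example.com addresses are hypothetical.

```perl
use strict;
use warnings;

# Returns 1 if the referer begins with this site's own address, else 0.
# A blank referer (for example a call typed into the address bar) gives 0.
# The subroutine name is hypothetical.
sub called_from_own_site
  {my ($referer,$domainpref) = @_;
  return 0 unless defined($referer) && length($referer);
  return ($referer =~ m{^\Q$domainpref\E(?:/|\z)}) ? 1 : 0;
  }

# Sample values standing in for those set by initprogenv()
my %G = ('domainpref' => 'http://www.example.com',
         'referer'    => 'http://www.example.com/convert.html');

if (called_from_own_site($G{'referer'}, $G{'domainpref'}))
  {print "called from our own form\n";}
else
  {print "direct or external call\n";}
```

The \Q...\E quoting stops the dots in the domain name from acting as wildcards, and the trailing (?:/|\z) prevents a look-alike host such as www.example.com.elsewhere.net from passing the test.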

Resource Theft/Abuse
In many ways this is a similar issue to the 'Out Of Context Program Call' discussed above; the difference here is that someone is deliberately calling the program externally, either to steal your resources or to be a nuisance.

This is especially true of programs that can send email or make postings to the site; if left unguarded, a simple website email form becomes a spam-portal!

As before, the first step is simple: check the referer value for 'unusual' calls, and check the input to make sure it has no unexpected inclusions such as embedded HTML, multiple email addresses or email source code. You can also check other values, such as cookies or the IP address of the call, to identify valid or, more importantly, invalid traffic. Bear in mind that the referer is supplied by the client and can be forged, so treat it as one filter among several rather than as proof.
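The input checks mentioned above could be sketched along the following lines for a single-address email field. The subroutine name suspicious_email_field() and the three particular tests are illustrative assumptions, not a complete filter.

```perl
use strict;
use warnings;

# Returns 1 if a value intended to hold ONE email address looks suspicious.
# The name and the choice of tests are illustrative only.
sub suspicious_email_field
  {my ($value) = @_;
  return 1 if $value =~ m/[\r\n]/;        # line breaks could carry extra mail headers
  return 1 if $value =~ m/<[^>]*>/;       # embedded HTML tag
  my $at_count = ($value =~ tr/@//);      # count the '@' characters
  return 1 if $at_count > 1;              # more than one address
  return 0;
  }

foreach my $input ('user@example.com',
                   'a@example.com,b@example.com',
                   "user\@example.com\nBcc: list\@example.net")
  {my $shown = $input;
  $shown =~ s/\n/\\n/g;                   # make the line break visible when printed
  print suspicious_email_field($input) ? "reject: $shown\n" : "accept: $shown\n";
  }
```

Rejecting anything that does not fit the expected shape is generally safer than trying to strip out the offending parts and carrying on.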

Password Controlled Access
Using .htaccess files on an Apache webserver allows certain directories and all sub-directories beneath them to be password controlled. This method defines a 'realm', and when a password is successfully given it lets the browser know that the authentication for this realm has been done. This is a tried and tested method that works well for static pages.

Unfortunately CGI resources are not usually stored within that part of the filesystem: the /cgi-bin will nearly always be outside of the realm, and so this methodology does not protect CGI resources. Not only that, but the CGI environment does not know the status of any previous authentication and so cannot tell whether you have previously logged into the protected area.

There is of course a solution (the exact technical details will be discussed elsewhere on this site). For now, suffice it to say that once access has been made to a protected static page, and the authentication established, a cookie can be written. It is this cookie that the CGI programs can test for to see if the call is permitted.

The initprogenv() subroutine is the place to make this test and to re-direct the output if it fails.
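In outline, such a test might look like the sketch below. The cookie name 'dwd_auth', the placeholder value 'ok' and the redirect address are all hypothetical, and a real site would need a cookie value that cannot simply be guessed. The sketch reads the raw HTTP_COOKIE environment variable so that it runs without any modules; a program that already loads CGI.pm could use that module's cookie() and redirect() functions instead.

```perl
use strict;
use warnings;

# Pull one named cookie out of a raw Cookie header such as "a=1; b=2".
sub cookie_value
  {my ($header,$name) = @_;
  foreach my $pair (split /;\s*/, (defined $header ? $header : ''))
    {my ($k,$v) = split /=/, $pair, 2;
    return $v if defined($k) && defined($v) && $k eq $name;
    }
  return "";
  }

# Build the CGI response headers that send the browser elsewhere.
sub redirect_response
  {my ($url) = @_;
  return "Status: 302 Found\nLocation: $url\n\n";
  }

my $cookie_name = 'dwd_auth';                            # hypothetical cookie name
my $login_page  = 'http://www.example.com/protected/';   # hypothetical address

if (cookie_value($ENV{'HTTP_COOKIE'}, $cookie_name) ne 'ok')   # 'ok' is a placeholder
  {print redirect_response($login_page);
  # a real program would stop here rather than run its main function
  }
```

Sending the visitor to the protected static page means the server's own password prompt does the authentication, after which the cookie can be written as described above.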

All of the ideas given here are fairly simple, yet combined according to the needs of your website they provide a very robust environment and a stable platform upon which to base the rest of your code. Once you have these routines established and working you can forget about them to concentrate on the 'proper' coding for the dynamic pages themselves.
