CGI Scripting
CGI Environment - Example #4
The previous article contained a brief discussion relating to the 'out-sourcing' of the program environment code to a shared subroutine.
This provides a simple yet robust framework for each program within a consistent and stable environment. The resulting program script is simpler, leaving you to concentrate on the primary task: the web application...
The main change is the addition of a new subroutine, initprogenv(), which takes care of a number of key program initialisation tasks, each discussed in more detail below:
- Initialising - setting up the program environment...
- Configuration and loading of external values such as files...
- Reading of CGI parameters - URL/form inputs etc...
- Security and anti-abuse features...
These ideas have been implemented in the next version of our example program, Temperature Converter v4; see the sidebar links for the v4 source code.
Initialising an Environment
When your program starts you want it to 'know' everything it needs as soon as possible: it must pick up all of the background data and its arguments, perform any necessary checks, and only then execute its primary function.
This background data must also be held in a form that any part of the program can access as required; it needs to be portable! This is especially true when a lot of functionality has been out-sourced to a shared module, as we are demonstrating here.
The solution has already been touched upon: store the data within a hash, which can easily be passed by reference to any part of the subsequent code.
In practice a single hash has its drawbacks: different parts of the data need to be handled differently, so it is more convenient to use two hashes with distinct functions:
- %G (G for Global) holds all of the global scalar values, generally file locations, titles, strings, code fragments and so on.
- %arg (arg for argument) holds the program input, strictly those values passed to the program as CGI input; these will be values from a web form or passed in via the request URL.
In this revised version the rest of the initialisation has been moved to a separate subroutine within the module, initprogenv(), which takes references to the two hashes and to an array containing the names of all expected parameters.
By using a shared subroutine like this we ensure that all of the programs within the website are initialised in the same way, have access to the same data and function in a consistent manner. It also dramatically reduces the volume of code within each program: apart from a few initialisation lines, the rest is purely program-specific function.
Once the initialisation routine has completed, all that remains is to define any values specific to the program (in this case the return and associated links), and then we can go straight into the main program function.
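As a rough sketch of the calling side, assuming the shared module exports initprogenv(), a program might be laid out as below. So that it runs on its own, the module's routine is replaced here by a cut-down stand-in; the parameter names (temp, scale) and the link keys and values are hypothetical.

```perl
use strict;
use warnings;

# Cut-down STAND-IN for the module's initprogenv(), for illustration only:
# the real routine also loads the configuration file and reads CGI input.
sub initprogenv {
    my ($G, $arg, $paras) = @_;               # references to the caller's data
    $$G{'progname'}  = $ENV{'SCRIPT_NAME'} // '';
    $$G{'timestamp'} = scalar localtime();
    $$arg{$_} = '' foreach @$paras;           # every expected parameter defined
    return;
}

# Expected CGI parameter names (hypothetical)
my @parameters = qw(temp scale);

my (%G, %arg);
initprogenv(\%G, \%arg, \@parameters);        # fills %G and %arg in place

# Program-specific values (hypothetical keys and link targets)
$G{'return_link'} = '/index.html';
$G{'return_text'} = 'Back to the index';

# ...the main program function starts here, using $arg{'temp'} and so on.
```

Because the hashes are passed by reference, anything initprogenv() writes through $G and $arg is already in %G and %arg when it returns; no copying or return values are needed.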
The initialisation functions contained within initprogenv() are shown here and discussed in more detail below...
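sub initprogenv
{my ($G,$arg,$paras) = @_;   # refs to %G, %arg and the expected parameter names
# Preload Global hash from the CGI environment
$$G{'doc_root'} = $ENV{'DOCUMENT_ROOT'};
$$G{'progname'} = $ENV{'SCRIPT_NAME'};
$$G{'domainpref'} = "http://$ENV{'HTTP_HOST'}";
$$G{'timestamp'} = scalar localtime();
$$G{'referer'} = $ENV{'HTTP_REFERER'};
$$G{'corefile'} = "$$G{'doc_root'}/src/dwd.conf";
# Load external file configuration (tags become %G keys)
load_hash_from_file($G,$$G{'corefile'});
# Prepend doc_root path to SSI file paths
foreach my $k (sort keys %$G)
 {if(($k =~ m/ssi$/) && ($$G{$k} =~ m/\.ssi$/))
   {$$G{$k} = "$$G{'doc_root'}$$G{$k}";}
 }
# Read each expected CGI parameter (param() comes from CGI.pm).
# scalar() asks param() for a single value: in list context it can return
# several, and newer CGI.pm versions warn about that. nvl() turns a
# missing parameter into "".
foreach my $p (@$paras) {$$arg{$p} = nvl(scalar(param($p)),"");}
# Perform security checks if required...
# ...
}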
Note! A number of the subroutines mentioned here have been 'casually' added to the module. Within the scope of this simple example they are of little value; however, they play a key part in the other programs used within this site, and they are just a few of a collection of highly portable, and therefore useful, subroutines that I have to hand. Here, as always, good coding ethics prevent me from reinventing the wheel and writing them twice! Always write code with re-use in mind!
Loading External Configuration Values
So far the only external values that we have needed to pass to the programs are the locations of the various SSI files. Within the scope of this very simple example program there is little need for anything more.
However, in more complex 'real-world' programs there will be a need to pass other values: pre-set names and data, other data files, switches, flags and so on. These could all be hard-coded into the initprogenv() subroutine, and for a single program that would be easiest, but to keep all of the programs as portable and manageable as possible it is best to move these values to an external text file which the program reads on start-up.
The file contains a series of tags and values; these are parsed into the %G global hash, with the tags as the hash keys. This is taken care of by another subroutine, load_hash_from_file(). Reading data from files in this way is a common method, and because it is used a lot it is worth having as a dedicated subroutine; obviously this simple example doesn't do it justice!
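The article does not show load_hash_from_file() itself, so here is a minimal sketch of what such a routine might look like. The file format is an assumption: one tag = value pair per line, with blank lines and # comments ignored, so a line such as header_ssi = /ssi/header.ssi would become $G{'header_ssi'} (and, since both key and value end in "ssi", would then have the document root prepended). Returning a count of the pairs loaded is also an assumption.

```perl
use strict;
use warnings;

# Sketch of a tag/value loader. ASSUMED file format (not specified in the
# article): one "tag = value" pair per line; blank lines and lines starting
# with '#' are ignored. A repeated tag overwrites the earlier value.
sub load_hash_from_file {
    my ($hash, $file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ m/^\s*(#|$)/;                 # comment or blank line
        if ($line =~ m/^\s*(\w+)\s*=\s*(.*?)\s*$/) {   # tag = value (trimmed)
            $$hash{$1} = $2;
            $count++;
        }
    }
    close($fh);
    return $count;                                     # pairs loaded
}
```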
Some of the values are SSI file locations. Once the configuration file has been loaded, a quick loop through all of the global values finds each of these (any key ending in 'ssi' whose value ends in '.ssi') and prepends the document root path to give a full file path.
There are also a number of other useful values added to the global hash which allow all subsequent routines to 'know' things such as the program name, the execution timestamp and so on. These values are not used in this demonstration but are generally useful in many other programs and included here as an example.
Reading of CGI Parameters
This important task now goes almost unnoticed in the initialisation routine; blink and you'll miss it.
All of the expected parameters are defined at the start of the program in the @parameters array, which is passed to initprogenv() by reference. The foreach loop over these names within initprogenv() reads each value from the CGI environment using the subroutine param(), which is part of the CGI.pm module that is also included in the program.
nvl() is another useful function: it goes through each of its arguments in turn and returns the first one that has a value; if none do, it returns an empty string. This means that all of the main program arguments always have a defined value, even if they are not supplied by the calling form or URL. This simple default avoids undefined-value issues within your programs!
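A minimal sketch of nvl() together with the parameter loop. It assumes that "has a value" means "is defined" (as with the NVL() function in SQL), so an empty string passed in is returned as-is. fake_param() is a hypothetical stand-in for CGI.pm's param(), used only so that the example runs without a web server.

```perl
use strict;
use warnings;

# Return the first defined argument, or "" if none is defined.
# ASSUMPTION: "has a value" is read as "is defined"; an empty string
# argument is therefore returned rather than skipped.
sub nvl {
    foreach my $v (@_) {
        return $v if defined $v;
    }
    return "";
}

# Hypothetical stand-in for CGI.pm's param(): looks names up in a fixed
# set of "submitted" inputs and returns undef for anything missing.
my %submitted = (temp => '100');
sub fake_param { my ($name) = @_; return $submitted{$name}; }

my @parameters = qw(temp scale);
my %arg;
foreach my $p (@parameters) { $arg{$p} = nvl(fake_param($p), ""); }
# $arg{'temp'} is '100'; $arg{'scale'} is '' rather than undef
```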
Security and Anti-abuse Features...
Because the initprogenv() subroutine is executed by every program on start-up, it is a good place to add any checking or security-related routines.
Within this example we have little need for this, so for now there is just a comment in the code; on a larger or more complex site, however, it becomes relevant. There are several specific security issues that you may need to safeguard against, and each can be handled by adding the required code here.
Out Of Context Program Call
CGI programs are most often invoked from webpage forms, but it is also possible to call a program directly by typing its name and arguments into the request URL. Many webpages use JavaScript to pre-process or validate entries before submission, and many entries are restricted to a select range of values within the form. Invoking the program directly via the URL bypasses all of this and increases the risk of 'dirty' input data; the program may then be running outside of its intended scope.
The solution is to check the referer value (HTTP_REFERER), another CGI environment variable (the misspelling is not a typo here; it is how the HTTP specification spells it!). This contains the URL of the page from which the program was called, which will normally be the URL of the form itself. If, however, the program was invoked from the command line or from the browser's address bar, the referer value will be blank. This is easily tested for, although the referer is supplied by the browser and can be omitted or faked, so treat it as a hint rather than proof. The response, if any, will of course depend on the requirements of the program and the website in general; for many simple programs there is no reason to disallow this behaviour at all.
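A minimal sketch of the blank-referer test. The helper name is_out_of_context() is hypothetical, and the environment is passed in as a hash reference (normally \%ENV) so that the check can be exercised outside a web server; what to do when it returns true is left to the program.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) when the request carries no referer, as
# when the URL is typed into the address bar or the script is run from
# the command line. HTTP_REFERER comes from the client, so this is a
# hint, not proof.
sub is_out_of_context {
    my ($env) = @_;
    my $referer = $$env{'HTTP_REFERER'};
    return (!defined $referer || $referer eq '') ? 1 : 0;
}

my $direct_call = is_out_of_context(\%ENV);
```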
Resource Theft/Abuse
In many ways this is similar to the 'Out of Context Call' discussed above; the difference here is that someone is deliberately calling the program externally, either to steal your resources or simply to be a nuisance.
This is especially true of programs that can send email or make postings to the site; if left unguarded, a simple website email form becomes a spam-portal!
As before, the solution is simple: check the referer value for 'unusual' calls, and check the input to make sure it has no unexpected inclusions such as embedded HTML, multiple email addresses or email source code. You can also check other values, such as cookies or the IP address of the call, to identify valid or, more importantly, invalid traffic.
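These checks might be sketched as two hypothetical helpers: one confirming that the referer points at the site's own host (taken from HTTP_HOST), and one flagging suspicious input. The patterns are deliberately simple assumptions, not a complete defence. The line-break test is meant for single-line fields such as an email address or subject, since a message body will legitimately contain line breaks.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the referer URL starts with this site's host.
sub referer_is_local {
    my ($env) = @_;
    my $referer = $$env{'HTTP_REFERER'} // '';
    my $host    = $$env{'HTTP_HOST'}    // '';
    return 0 if $host eq '';
    # Host must be followed by a path, query, fragment or end of string,
    # so "www.example.com.evil.net" does not match "www.example.com".
    return ($referer =~ m{^https?://\Q$host\E(?:[/?#]|\z)}i) ? 1 : 0;
}

# Hypothetical helper for single-line fields of an email form.
sub input_looks_abusive {
    my ($value) = @_;
    return 1 if $value =~ m/[\r\n]/;     # line breaks: possible header injection
    return 1 if $value =~ m/<[^>]*>/;    # embedded HTML tags
    my @addresses = ($value =~ m/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g);
    return 1 if @addresses > 1;          # more than one email address
    return 0;
}
```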
Password Controlled Access
Using .htaccess files on an Apache webserver allows certain directories, and all sub-directories beneath them, to be password controlled. This method defines a 'realm', and when a password is successfully given the browser knows that authentication for this realm has been done. This is a tried and tested method that works well for static pages.
Unfortunately CGI resources are not usually stored within that part of the filesystem: the /cgi-bin directory will nearly always be outside the realm, so this method does not protect CGI resources. Not only that, but the CGI environment does not know the status of any previous authentication, and so cannot tell whether you have previously logged into the protected area or not.
There is of course a solution (the exact technical details will be discussed elsewhere on this site), but for now suffice it to say that once a protected static page has been accessed and authentication established, a cookie can be written. It is this cookie that the CGI programs can test for to see if the call is permitted.
The initprogenv() subroutine is the place to make this test, and to redirect the output if it fails.
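The exact method is left for elsewhere on the site, so the following is only a sketch under stated assumptions: the cookie name members_auth and the way the valid token is obtained are hypothetical, and a real site needs a token that cannot simply be guessed or copied. The raw HTTP_COOKIE header is parsed by hand here (CGI.pm's cookie() could be used instead), and a failed check would send a Location header in place of the page.

```perl
use strict;
use warnings;

# Parse a raw HTTP_COOKIE header ("name1=value1; name2=value2") into a hash ref.
sub parse_cookies {
    my ($header) = @_;
    my %cookies;
    foreach my $raw (split /;/, $header // '') {
        (my $pair = $raw) =~ s/^\s+|\s+$//g;     # trim surrounding spaces
        next if $pair eq '';
        my ($name, $value) = split /=/, $pair, 2;
        $cookies{$name} = $value // '';
    }
    return \%cookies;
}

# Hypothetical check: 1 if the (assumed) members_auth cookie holds the token.
sub is_authorised {
    my ($env, $valid_token) = @_;
    return 0 unless defined $valid_token && $valid_token ne '';
    my $token = parse_cookies($$env{'HTTP_COOKIE'})->{'members_auth'};
    return (defined $token && $token eq $valid_token) ? 1 : 0;
}

# CGI redirect header, printed instead of the page when the check fails.
sub redirect_header {
    my ($url) = @_;
    return "Location: $url\r\n\r\n";
}
```

Inside initprogenv() the program would print redirect_header() with the URL of the login page and exit whenever is_authorised() returns 0.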
All of the ideas given here are fairly simple, yet combined according to the needs of your website they provide a very robust environment and a stable platform upon which to base the rest of your code. Once you have these routines established and working you can forget about them to concentrate on the 'proper' coding for the dynamic pages themselves.