Drag and Drop CGI | 3 | WebReference

# Drag and Drop CGI | 3

## Configuring and Installing the ICE Scripts

Configuring ICE is largely a matter of setting the directory paths and Web server aliases in the configuration sections of the two script files. The scripts will work perfectly well without a thesaurus file (which we'll show you how to create later in this chapter), so don't feel pressured to come up with valid alternates for the words in your pages at this point (e.g., "physician" for "doctor"). Follow these steps to install and configure ICE.

### Step 1: Configuration

We start our configuration party with the indexing script. Load the script, called /scripts/ice/ice-idx.pl, from the CD-ROM into your editor. We'll point out the lines that need to be changed by their line numbers in the original file, since ICE doesn't follow the configuration block format of the other scripts in the book. Be sure to turn line-wrapping off in your editor so that your line numbers match ours.

## Hair Saver

Keep in mind that as you edit the script files, adding and deleting lines, the lines below the added or deleted lines will change numbers. We configure the items in order from top to bottom, so just look down a few lines from the last item to find the next. In other words, the normal process of customizing your script will cause your line numbers to differ from those in the original file.

As always, you need to check the first line of the file (e.g., #!/usr/bin/perl) for the correct path to your Perl interpreter. Check your ISP/SA questionnaire and modify this line as needed.

Listing 12.2 shows a section of ice-idx.pl starting at line 19.

Listing 12.2 Configuration portion of ice-idx.pl

#--- start of configuration --- put your changes here ---

# NOTE: Depending on your Perl implementation, you may

# have to use different path separators in the following

# paths when you are on a Macintosh or PC system. In that

# case, a path may look like e.g. "usr:foo:bar" (Mac), or

# "\\usr\\foo\\bar" resp. '\usr\foo\bar' (PC).

# The physical directory/directories to scan for html-files.

# It's better to supply a trailing "/" for each directory,

# since otherwise automounting may not work.

# Example:

#  @SEARCHDIRS=('/usr/www/dir','/tmp/html','/usr/foo/html-dir');

@SEARCHDIRS=(

  '/tmp/',

);

# Location of the index file.

# Example:

#  $INDEXFILE='/usr/local/httpd/index.idx';

$INDEXFILE='/tmp/index.idx';

# The ICE indexer will support full international characters by

# converting them to their html equivalent if $ISO is set.

# This has a slightly negative impact on the indexing speed, so

# set it to "y" only if you index files with 8 bit international

# characters. OTHERWISE DON'T! iso2html seems to cause a memory

# leak, causing the indexer to run forever. I'm working on it.

$ISO="n";

# Type of system (for figuring out the path delimiting character)

# that ice-idx.pl runs on. Select one of "UNIX", "MAC", or "PC"

$TYPE="UNIX";

# Minimum length of word to be indexed

$MINLEN=3;

#--- end of configuration --- don't change anything below ---


Scan down through the file to line 31. You should see the Perl array variable @SEARCHDIRS. This variable defines an array of directories to search for HTML files. Files in subdirectories below these directories will also be indexed. Unless you are indexing multiple Web sites, you will probably have only one entry here. Enter the full path to your root HTML directory in line 32, which currently reads:

'/tmp/',


(Check your ISP/SA questionnaire if you don't have the path memorized.) The edited block should look something like this:

@SEARCHDIRS=(

'/home/users/joeuser/public_html',

);


Windows users will need to include the disk drive letter and use the backslash character, in this manner:

@SEARCHDIRS=(
'd:\httpd\htdocs\joeuser',
);
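A note on quoting, since Windows paths are full of backslashes: in a single-quoted Perl string a backslash stays literal unless it comes before another backslash or a single quote, while in a double-quoted string sequences such as \h are treated as escapes. The short script below is our own illustration (the path is just the example above). It shows the two equivalent ways to write the path and the one trap with single quotes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single quotes: a backslash is literal unless it comes before
# another backslash or a single quote.
my $single = 'd:\httpd\htdocs\joeuser';

# Double quotes: each backslash must be doubled, or Perl treats
# sequences such as \h and \j as (unrecognized) escapes.
my $double = "d:\\httpd\\htdocs\\joeuser";

# The trap: a single-quoted path cannot end in one backslash,
# because \' escapes the closing quote. Double it instead.
my $trailing = 'd:\httpd\htdocs\\';

print "single:   $single\n";
print "same as double-quoted version? ", ($single eq $double ? "yes" : "no"), "\n";
print "trailing: $trailing\n";
```

The trailing-backslash case matters if you decide to follow the ICE author's advice about ending each directory with a separator.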


The author of ICE recommends putting the trailing slash on the directory name to force UNIX to mount a remote disk if needed. We don't recommend you do this, since it's unlikely you'll need to mount any disks on your Web server. Try running the script without the trailing slash. If ice-idx.pl complains about not being able to find the files, you can add it.
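If ice-idx.pl can't find your files, it's worth confirming that each @SEARCHDIRS entry exists and is readable by the account the script runs under before you start adding trailing slashes. The following is a minimal sketch of such a check. missing_dirs is a hypothetical helper of ours, not part of ICE, and the directory shown is only the example path from above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the entries that are not existing, readable directories.
# An empty result means every entry looks usable.
sub missing_dirs {
    my @dirs = @_;
    # -r _ reuses the file status fetched by the preceding -d test.
    return grep { !(-d $_ && -r _) } @dirs;
}

# Replace with the values from your own @SEARCHDIRS.
my @SEARCHDIRS = ('/home/users/joeuser/public_html');

my @bad = missing_dirs(@SEARCHDIRS);
if (@bad) {
    print "Not found or not readable: $_\n" for @bad;
} else {
    print "All search directories are readable.\n";
}
```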

Normally, one directory and its subdirectories should be all you'll need to index. To include more directories, simply place each directory's full path on a line by itself. Each directory must be enclosed within single quotation marks and have a comma after the closing mark, as follows:

@SEARCHDIRS=(

'/home/users/joeuser/public_html',

'/home/companies/joescorp/public_html/prod_specs',

);
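To picture what "files in subdirectories will also be indexed" means, here is a rough sketch that walks every @SEARCHDIRS entry with Perl's standard File::Find module and lists the .html and .htm files it finds. It is not ICE's own scanning code (we assume ICE picks files by extension in a similar way), and html_files is a made-up name. Still, running it against your edited list is a quick way to see which pages are in scope.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Collect every .html/.htm file under the given directories,
# descending into subdirectories.
sub html_files {
    my @dirs = @_;
    return () unless @dirs;
    my @found;
    find(
        sub {
            # $_ is the bare file name; $File::Find::name is the full path.
            push @found, $File::Find::name if -f $_ && /\.html?$/i;
        },
        @dirs
    );
    return sort @found;
}

my @SEARCHDIRS = (
    '/home/users/joeuser/public_html',
    '/home/companies/joescorp/public_html/prod_specs',
);

# Only pass directories that exist, so find() has nothing to complain about.
print "$_\n" for html_files(grep { -d $_ } @SEARCHDIRS);
```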


Next, you set the location of the index file. We normally put the index file in the cgi-bin/ directory along with the scripts, but you can put it anywhere you like. You'll need to edit line 38 in the file to change the $INDEXFILE variable from

$INDEXFILE='/tmp/index.idx';


to point to a convenient location such as

$INDEXFILE = '/home/users/joeuser/public_html/cgi-bin/index.idx';

Be sure to put single quotation marks around the file specification and end the line with a semicolon to keep Perl happy.

The next three variables have default settings that you can safely leave alone if they suit you.

• Line 46 contains a variable called $ISO. This variable controls whether extended ISO-Latin characters (such as umlauts in German) are converted to their HTML equivalents. If your Web pages contain these characters, set this variable to y; otherwise, leave it as is.
• Line 50 contains the variable $TYPE. This variable defines the type of directory separator used in file paths. The three options are "UNIX", "MAC", and "PC". Change this variable to match your system if you aren't using UNIX.
• Line 53 contains the $MINLEN variable. This variable sets the length of the smallest word to be included in the index. Setting it to a larger value will produce slightly smaller indexes, since more of the short words, such as conjunctions, won't be included. We recommend you leave this value as is unless you have a compelling reason to change it.

When you've finished editing the script, save it to a convenient location for upload to your Web server.
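The $MINLEN rule is easy to picture with a toy example. The sketch below is our own illustration, not ICE's tokenizer. It assumes words are runs of letters, digits, and underscores, lowercases them, and drops anything shorter than $MINLEN before it would reach the index.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $MINLEN = 3;

# Break text into lowercase words (runs of \w characters) and keep
# only those at least $minlen characters long.
sub indexable_words {
    my ($text, $minlen) = @_;
    my @words = map { lc($_) } ($text =~ /(\w+)/g);
    return grep { length($_) >= $minlen } @words;
}

print join(' ', indexable_words("An index of the words in a page", $MINLEN)), "\n";
```

With $MINLEN at 3, the sample sentence keeps index, the, words, and page; setting it to 4 would also drop "the".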
Next, you configure the actual search script, ice-form.cgi, located on the CD-ROM as /scripts/ice/ice-form.pl. After loading the script into your editor, check the first line of the file and set the correct path to your Perl interpreter.

Listing 12.3 shows the configuration section of the ice-form.pl file.

Listing 12.3 Configuration section of ice-form.pl

#!/usr/local/bin/perl

#

# ice-form.pl -- cgi compliant ICE search interface // Jun 24 1996

#

# ICE Version 1.31

# (C) Christian Neuss (http://www.informatik.th-darmstadt.de/~neuss)

#--- start of configuration --- put your changes here ---

# Title or name of your server:

#   Example: local($title)="ICE Indexing Gateway";

local($title)="ICE Indexing Gateway";

# search directories to present in the search dialogue

#   Example:

# local(@directories)=(

#    "Image Communication Information Board (/icib)",

#    "WISE (/some/where/wise)"

# );

local(@directories)=(

    "Image Communication Information Board (/icib)",

    "WISE (/www/projects/wise)",

    "Multimedia Survey (/www/projects/mms)",

    "Department A2 (/www/igd-a2)",

    "Department A8 (/www/igd-a8)",

    "Department A9 (/www/igd-a9)",

    "DZSIM (/www/projects/dzsim)",

    "CSCW Laboratory (/www/projects/cscw-lab)",

    "Software Catalog (/www/projects/sw-catalog)",

    "WWW-Schulung (/www/igd-a3/schulung)",

    "DZSIM (/www/projects/dzsim)",

    "ZGDV User Interface GROUP (/www/zgdv-uig)"

);

# Location of the indexfile:

#   Note: under Windows or Windows NT, add the drive letter

#   Example: $indexfile='/usr/local/etc/httpd/index/index.idx';

$indexfile='/tmp/index.idx';

# Location of the thesaurus data file:

#   Example: $thesfile='/igd/a3/home1/neuss/Perl/thes.dat';

$thesfile='/igd/a3/home1/neuss/Perl/thes.dat';

# URL Mappings (a.k.a Aliases) that your server does

# map "/" to some path to reflect a "document root"

#   Example

#   %urltopath = (

#   '/projects',   '/usr/stud/proj',

#   '/people',   '/usr3/webstuff/staff',

#   '/',      '/usr3/webstuff/documents',

#   );

#

%urltopath = (

  '/',         '',

);

#--- end of configuration --- you don't have to change anything below ---

The first configuration item, after the location of Perl, is located at line 11 in the file:

local($title)="ICE Indexing Gateway";

The $title variable defines the title of the search form and the <H1> heading that appears on the search form and results pages. If you modify the title, make sure to keep the double quotation marks in place before and after the string and to end the line with a semicolon.

The next item provides a way for the user to search only a portion of the Web site (by default, all directories are searched). The lines starting at line 20 look like this:

local(@directories)=(

    "Image Communication Information Board (/icib)",

    "WISE (/www/projects/wise)",

    "Multimedia Survey (/www/projects/mms)",

    "Department A2 (/www/igd-a2)",

    "Department A8 (/www/igd-a8)",

    "Department A9 (/www/igd-a9)",

    "DZSIM (/www/projects/dzsim)",

    "CSCW Laboratory (/www/projects/cscw-lab)",

    "Software Catalog (/www/projects/sw-catalog)",

    "WWW-Schulung (/www/igd-a3/schulung)",

    "DZSIM (/www/projects/dzsim)",

    "ZGDV User Interface GROUP (/www/zgdv-uig)"

);


First, you should delete all the lines between local (@directories) = and ); unless your server happens to have exactly the same directory setup (not very likely). More likely, you'll need to enter your own list of names and directories relative to the server root. For example, if the URL of your home page is

http://www.myco.com/


you may have several directories within your Web site to keep things organized, something like this:

http://www.myco.com/products/

http://www.myco.com/news/

http://www.myco.com/downloads/

http://www.myco.com/company_info/

http://www.myco.com/people/openings/


To offer users the option of selectively searching each of these directories, you would create the following list:

local (@directories) = (

'Product Information (/products)',

'MyCo in the News (/news)',

'Free Downloads! (/downloads)',

'All about MyCo (/company_info)',

'Help Wanted (/people/openings)',

);


You don't need to provide an entry for the whole site because that's included by default. Notice that only the part of the URL after the domain name is included. You must include that part of the line shown in parentheses. The script reads this information to create the links returned by the search engine. As always, be sure to enclose the text within single quotation marks and separate the entries with a comma. Also make sure you include both parentheses. Later, you will define the mapping between the URL and the actual file location on the server. If you have only a few files, or if they're all located in a single directory, you can comment out this variable. Just place a # in front of everything from the local (@directories)= line to the closing ); line.
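Because the script reads the parenthesized part of each entry, a missing or mistyped parenthesis is an easy mistake to make. The following sketch shows one way to split an entry into its label and URL path and to flag entries that lack the parentheses. parse_entry is a hypothetical helper, and ICE's own parsing may differ in detail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Entries in the same "Label (/url-path)" form as @directories.
my @directories = (
    'Product Information (/products)',
    'MyCo in the News (/news)',
    'Help Wanted (/people/openings)',
    'Free Downloads! /downloads',    # deliberately missing its parentheses
);

# Split an entry into (label, path), taking the path from the last
# pair of parentheses. Returns an empty list if there are none.
sub parse_entry {
    my ($entry) = @_;
    return $entry =~ /^\s*(.*?)\s*\(([^()]*)\)\s*$/ ? ($1, $2) : ();
}

for my $entry (@directories) {
    my ($label, $path) = parse_entry($entry);
    if (defined $path) {
        print "$label => $path\n";
    } else {
        print "Check this entry, no (/path) found: $entry\n";
    }
}
```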

Next, you specify the location of the index file. Line 38 in the original file looks like this:

$indexfile='/tmp/index.idx';

You must make this variable match the entry you put in the ice-idx.pl script. If they don't match, you won't get much searching done.

Line 43 contains the location of the thesaurus file:

$thesfile='/igd/a3/home1/neuss/Perl/thes.dat';

As with the index file location, this variable gives the full path to the file. You can comment out this line if you don't have a thesaurus. We cover creating a thesaurus later in the chapter. For now, comment out the line so that you can run some tests.
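Since a mismatch between the two index file settings silently breaks searching, you may want to check them mechanically. The sketch below is a hypothetical helper, not part of ICE. It assumes both edited scripts are in the current directory and that each value is a single-quoted string on an uncommented line, as in the listings above, and it compares $INDEXFILE in ice-idx.pl with $indexfile in ice-form.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the value from the first uncommented line of the form
#   $NAME = '...';
# in a script's source text, or undef if there is none.
sub setting_from_source {
    my ($source, $name) = @_;
    return $source =~ /^\s*\$\Q$name\E\s*=\s*'([^']*)'/m ? $1 : undef;
}

# Read a whole file into one string.
sub read_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    local $/;    # undefined record separator: read everything at once
    return scalar <$fh>;
}

# Assumes both edited scripts sit in the current directory.
my ($idx_script, $form_script) = ('ice-idx.pl', 'ice-form.pl');
if (-e $idx_script && -e $form_script) {
    my $written = setting_from_source(read_file($idx_script),  'INDEXFILE');
    my $read    = setting_from_source(read_file($form_script), 'indexfile');
    if (defined $written && defined $read && $written eq $read) {
        print "Index file settings match: $written\n";
    } else {
        print "Mismatch: ice-idx.pl writes ", $written // '(not found)',
              ", ice-form.pl reads ", $read // '(not found)', "\n";
    }
}
```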

Last, but certainly not least, is the most important configuration item: the mapping between your URLs and the actual directory paths on the Web server. If you make a mistake here, the search will probably appear to work, but all the returned links will be incorrect. This will frustrate your users and do nothing for your budding reputation as a Perl-meister.

Starting at line 46 we see the following lines in the script:

#   Example

#   %urltopath = (

#   '/projects',   '/usr/stud/proj',

#   '/people',   '/usr3/webstuff/staff',

#   '/',      '/usr3/webstuff/documents',

#   );

#

%urltopath = (

  '/',         '',

);


You must have at least one entry in the %urltopath variable; namely, the path to your root Web directory. A typical entry might look like this:

%urltopath = (

'/~joeuser', '/home/users/joeuser/public_html'

);


This would work if you have a typical "user" Web page with a URL that looks like this:

http://www.myisp.com/~joeuser/

In contrast, a virtual domain Web page with a URL like

http://www.myco.com/

might have an entry like

%urltopath = (

'/', '/home/corp/myco/public_html'

);


In this case, the Web server has an "alias" set up so that the root of myco.com ('/') is the file path above ('/home/corp/myco/public_html'). If you have a dedicated Web server, your URL will look like the virtual domain example. However, you will more likely have an entry like this:

%urltopath = (

'/', '/usr/local/httpd/htdocs'

);


where the path is the directory where the Web server expects to find the root HTML files. This directory is set within the server configuration files.

You can create other entries in the %urltopath variable if your server has other aliases set up. For example, you might have an entry like this:

%urltopath = (

'/', '/usr/local/httpd/htdocs',

'/products', '/usr/marketing/prod/web_stuff'

);


Notice that the '/products' directory has a very different path on the server than the home page does. It might even be on a different machine, with the remote disk mounted via a network file system. Most people will probably be able to get away with just the root directory entry. But if parts of your Web site are spread around your server, you can get ICE to recognize them by configuring the %urltopath variable.
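Conceptually, %urltopath lets the search script turn a file path found in the index back into a URL. The sketch below illustrates one way to do that translation, using the two-entry example above: find the longest server directory that prefixes the file path and swap it for its URL alias. path_to_url is our own hypothetical function, and ICE's internal code may work differently, but it makes clear why a wrong entry produces wrong links.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %urltopath = (
    '/',         '/usr/local/httpd/htdocs',
    '/products', '/usr/marketing/prod/web_stuff',
);

# Map a file path to a URL path by replacing the longest matching
# server directory with its URL alias. Returns undef if none match.
sub path_to_url {
    my ($file, $map) = @_;
    my ($best_url, $best_len) = (undef, -1);
    for my $url (keys %$map) {
        (my $dir = $map->{$url}) =~ s{/+$}{};    # ignore a trailing slash
        if (index($file, "$dir/") == 0 && length($dir) > $best_len) {
            ($best_url, $best_len) = ($url, length $dir);
        }
    }
    return unless defined $best_url;
    my $rest = substr($file, $best_len);         # starts with "/"
    (my $prefix = $best_url) =~ s{/+$}{};        # so '/' becomes ''
    return $prefix . $rest;
}

for my $file ('/usr/local/httpd/htdocs/index.html',
              '/usr/marketing/prod/web_stuff/widgets.html') {
    my $url = path_to_url($file, \%urltopath);
    print "$file -> ", (defined $url ? $url : 'no mapping'), "\n";
}
```

Choosing the longest matching prefix matters when one aliased directory sits inside another; otherwise the root entry could claim every file. In this sketch, the default '/' => '' entry maps every file path to an identical URL path, which is one way to see why that placeholder entry needs replacing.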

## Hair Saver

Be sure to include all of the paths to the parts of your Web site in the ice-idx.pl @SEARCHDIRS variable. Otherwise, the documents in those directories won't be indexed and all your hard work editing %urltopath will be for nought.
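The Hair Saver above boils down to a simple rule: every directory named in %urltopath should be one of the @SEARCHDIRS entries or lie below one. Here is a minimal sketch of a cross-check using the example settings from this section. unindexed_aliases and strip_trailing_slash are hypothetical helpers, and you would paste in your own values from both scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample values; paste in your own settings from both scripts.
my @SEARCHDIRS = ('/usr/local/httpd/htdocs');
my %urltopath = (
    '/',         '/usr/local/httpd/htdocs',
    '/products', '/usr/marketing/prod/web_stuff',
);

# Remove trailing slashes so '/a/b/' and '/a/b' compare equal.
sub strip_trailing_slash {
    my ($path) = @_;
    $path =~ s{/+$}{};
    return $path;
}

# List the %urltopath entries whose directory is neither a search
# directory nor inside one, i.e. pages that would never be indexed.
sub unindexed_aliases {
    my ($searchdirs, $map) = @_;
    my @search = map { strip_trailing_slash($_) } @$searchdirs;
    my @missing;
    for my $url (sort keys %$map) {
        my $dir = strip_trailing_slash($map->{$url});
        my $covered = grep { $dir eq $_ || index($dir, "$_/") == 0 } @search;
        push @missing, "$url => $map->{$url}" unless $covered;
    }
    return @missing;
}

print "Not indexed: $_\n" for unindexed_aliases(\@SEARCHDIRS, \%urltopath);
```

With the sample settings, it reports the /products alias, because /usr/marketing/prod/web_stuff is not under /usr/local/httpd/htdocs.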

When you are finished editing both ice-idx.pl and ice-form.pl, save them in a convenient location for upload to the Web server. Remember, you may have to rename ice-form.pl to ice-form.cgi, depending on the file extension requirements for your Web server. Check your ISP/SA questionnaire if in doubt.

That concludes the basic configuration of the two ICE scripts. We cover advanced configuration of the input and output of ICE, along with creating a thesaurus file, in the "Advanced ICE Configuration" section later in the chapter. For some quick gratification, move on to installing and testing the scripts in their current state.