##########################################################
#
# xref
#
# An automatic cross-reference linking script for Web
# pages.
#
##########################################################
Version: 0.53
Date: 11/14/2008
Author: D.M. Ragle, dragle@quinstreet.com
Site: http://webref.com/programming/perl/xref
README.TXT CONTENTS
-------------------
I. Requirements
II. Setup Instructions
III. Retrieving Terms
IV. Notes
===========================================================
I. Requirements
To set up and use this system, you'll need Perl 5.6.1 or
later installed on your server (5.8 or later is
recommended), and access to place scripts and files into
your cgi-bin.
You will also need the following Perl modules installed on
your system:
CGI
HTML::Template
Date::Format
Time::Local
All but the HTML::Template module are typically installed
with your Perl distribution; HTML::Template (and any others
that may be missing) you can pick up from CPAN.
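If you're not sure which of these modules your server
already has, a short script can tell you. The following is
only a sketch: the check_modules helper is my own
illustration and is not part of xref. It tries to load each
module named above and reports the ones that fail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Modules listed in the Requirements section of this README.
my @required = qw(CGI HTML::Template Date::Format Time::Local);

# check_modules is a hypothetical helper (not part of xref).
# It tries to load each module and records whether that worked.
sub check_modules {
    my @modules = @_;
    my %status;
    foreach my $module (@modules) {
        # String eval, so the module name is resolved at run time.
        $status{$module} = eval "require $module; 1" ? 1 : 0;
    }
    return \%status;
}

my $status = check_modules(@required);
foreach my $module (@required) {
    printf "%-16s %s\n", $module,
        $status->{$module} ? 'ok' : 'MISSING (install from CPAN)';
}
```

Run it from your server's command prompt with the same perl
binary that will run xref; a different perl may have a
different set of modules installed.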
xref will work under both a standard and a mod_perl-enabled
Web server.
To view a list of changes from version to version, read
the CHANGES.TXT file.
Additional notes and information for xref can be found at
http://webref.com/programming/perl/xref.
===========================================================
II. Setup Instructions
It's presumed that you've already unzipped the xref.zip
distribution file. If not, do so now in its own directory.
You will then have the following files:
| xref.js
| init_js.tmpl
| term_select.tmpl
| init_js_opt.tmpl
| term_select_opt.tmpl
| README.TXT
| CHANGES.TXT
To set up the system, follow these steps. Unless otherwise
specified, the actual names of files and directories that
you create are up to you; but I'll use a typical naming
structure here that I recommend you go with unless you
have a specific reason not to. I recommend that you read
all of these steps in their entirety before beginning the
actual work.
The following steps assume you're a new user installing
xref for the first time. Even if you're upgrading, I
recommend that you browse these instructions, since some
features, configuration steps, and parameters may be new
to this version (and you may need or wish to use them in
your existing setup).
1. Create an "xref" directory within your cgi-bin.
2. Edit the xref.js file. Change the first line of the
file to point to the actual perl binary on your server.
You can find this location by typing:
which perl
at your server's command prompt. For example, if "which perl"
responds with "/usr/bin/perl", then you would change the top
line of the script to be this:
#!/usr/bin/perl -T
(That's the default setting, by the way.)
3. Edit the get_data subroutine in the xref.js file so
it includes and/or properly retrieves your terms. Instructions
for doing this can be found in Section III of this README
("Retrieving Terms").
4. Copy or FTP the xref.js file you edited in steps 2 and 3
into the xref directory of your cgi-bin that you created
in step 1. The script should be flagged with read/execute
rights; something like 0755.
5. Copy or FTP the init_js.tmpl and term_select.tmpl files
into the xref directory you created in step 1. Both of
these files should be marked world readable but unwritable
(something like 0644). Or, if you prefer an obfuscated/
space optimized version of the JavaScript, then upload
term_select_opt.tmpl and init_js_opt.tmpl, instead. If
you choose to upload the optimized versions, then you'll
need to rename them to term_select.tmpl and init_js.tmpl
for them to be recognized by xref.
6. To activate the script, you must include it in the Web
pages that you want it to automatically create links
within. You can do this by adding this command to each
target Web page:
replacing "http://example.com" with your actual domain
name. Affiliate sites can also access the script by
including the same statement in their pages; they needn't
have their own local copy of the script!
===========================================================
III. Retrieving Terms
The default version of xref.js that ships with this distribution
contains a statically defined list of terms. And unless you
particularly like fruits (and want the script to automatically
link terms like Apples, Bananas, and Pears), then you'll need
to adjust the script to retrieve and/or use your own term list.
The place to do this is in the get_data subroutine of the
script. Within that function is a block of code that looks
like this:
# BEGIN CUSTOM....
# The default setup: The terms
# are defined statically
$master_data = {
    '1' => { 'title'       => 'Apple',
             'URL'         => 'http://en.wikipedia.org/wiki/Apple',
             'description' => 'The pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae.' },
    ...
    '10' => { 'title'       => 'Pears',
              'URL'         => 'http://en.wikipedia.org/wiki/Pear',
              'description' => 'A pomaceous fruit produced by a tree of genus Pyrus.' },
};
# END CUSTOM
To use your own term list, you'll need to replace the
block above with your own code that fills in the master
data hash in the same manner that this section does. How
you do that is limited only by your knowledge of Perl. You
could simply define a static list of terms, similar to the
fruity list above; or you could dynamically retrieve the
terms from a database or a file. Examples of both of these
possibilities will be given later in this README.
However you do it, the term list you create must be in
exactly the same format as above; a hash of hashes, where
each sub-hash contains 'title', 'URL', and 'description'
fields. Each of these fields must be utf-8 encoded. If
you're unfamiliar with the creation of hashes, you may
want to review this primer on the topic:
http://webref.com/programming/perl/nested/
Here's what needs to go in each of the fields:
title
The exact term/keywords that you will be
matching in the Web page
description
A plain (and typically brief) description or
definition of the term
URL
The URL that will be used for the link created
for this term. The URL will be used exactly as
is; so if it needs to be absolute (recommended),
be sure to make it absolute.
Each record in the hash is keyed by a unique id. This ID
must be alphanumeric (upper and lower case letters,
underscores, and numbers are allowed). For consistency,
it's strongly recommended that you use the unique key
for the term from your DB, if possible. If not using a DB,
just start with a key of 1 for the first item and count
up from there.
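Before wiring in your own list, it can be worth checking it
against the rules just described. Here's a minimal sketch of
such a check. The check_master_data helper and the sample
records are my own illustration, not code from xref.js; the
sketch assumes that every record should carry all three
fields.

```perl
use strict;
use warnings;

# check_master_data is a hypothetical helper (not part of xref).
# It returns a list of problems found in a term hash; the list
# is empty when the hash follows the rules described above.
sub check_master_data {
    my ($master_data) = @_;
    my @problems;
    foreach my $key (sort keys %$master_data) {
        # Keys may contain only letters, numbers, and underscores.
        push @problems, "key '$key' is not alphanumeric"
            unless $key =~ /^[A-Za-z0-9_]+$/;
        my $record = $master_data->{$key};
        unless (ref $record eq 'HASH') {
            push @problems, "record '$key' is not a hash";
            next;
        }
        # Each record needs a title, a URL, and a description.
        foreach my $field (qw(title URL description)) {
            push @problems, "record '$key' has no '$field'"
                unless defined $record->{$field};
        }
    }
    return @problems;
}

# Sample data: one good record and one deliberately broken one.
my %sample = (
    '1' => { 'title'       => 'Apple',
             'URL'         => 'http://en.wikipedia.org/wiki/Apple',
             'description' => 'The pomaceous fruit of the apple tree.' },
    'bad-key' => { 'title' => 'Pears' },
);

print "$_\n" for check_master_data(\%sample);
```

The broken record is reported for its key (the hyphen is not
allowed) and for its missing URL and description.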
The get_data subroutine is passed a variable that will
indicate what data must be returned to the script.
This parameter, called $data_set, will be set to either
'init' or 'terms'. You can use this parameter to limit
the data you need to retrieve (useful for DB calls, so
you can limit the data you request from the DB to only
what you need). If $data_set is equal to 'init', then
only the title is required to be in the hash. If
$data_set is equal to 'terms', then the title, URL, and
description are required (and of course the key is always
required). Also, if $data_set is equal to 'terms', then
the variable $params->{'term_list'} will include a comma-
delimited list of keys that the terms function needs (so
you can limit your selection to only those records, if you
like). get_data automatically extracts those keys into
@key_list; so you can use either $params->{'term_list'}
or @key_list as per your needs.
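To make the two cases concrete, here's a rough sketch of how
$data_set and the key list could drive the selection. It
runs on its own: %all_terms and the select_terms helper are
hypothetical stand-ins of mine for your real term source and
for the custom block in get_data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the term source (a static list here).
my %all_terms = (
    '1' => { 'title'       => 'Apple',
             'URL'         => 'http://en.wikipedia.org/wiki/Apple',
             'description' => 'The pomaceous fruit of the apple tree.' },
    '2' => { 'title'       => 'Orange',
             'URL'         => 'http://en.wikipedia.org/wiki/Orange_(fruit)',
             'description' => 'A hybrid of ancient cultivated origin.' },
);

# select_terms is a hypothetical helper (not part of xref).
sub select_terms {
    my ($data_set, @key_list) = @_;
    my %selected;
    if ($data_set eq 'init') {
        # 'init' only needs the title of every term.
        foreach my $key (keys %all_terms) {
            $selected{$key} = { 'title' => $all_terms{$key}{'title'} };
        }
    }
    elsif ($data_set eq 'terms') {
        # 'terms' needs full records, but only for the requested keys.
        foreach my $key (@key_list) {
            next unless exists $all_terms{$key};
            $selected{$key} = { %{ $all_terms{$key} } };
        }
    }
    return \%selected;
}

# Simulate a 'terms' request whose term_list is "2,7".
my $master_data = select_terms('terms', split(/,/, '2,7'));
print join(',', sort keys %$master_data), "\n";
```

Key 7 doesn't exist in the sample list, so only record 2
comes back. Unknown keys are skipped rather than treated as
errors; that is a choice made for this sketch.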
When deciding how to retrieve your terms, remember that the
xref script may be called repeatedly on your server. While
not graceful in appearance, a statically defined list (like our fruits)
may provide you with best performance, since it avoids an
I/O call with each script hit. On the other hand, a static
list may require more memory than necessary, since you'll
need to include the full list (including URLs and descriptions)
even though you may not need them. If you have a fast DB
with good caching, then a dynamic retrieval against it may
very well be the best choice of all.
EXAMPLES
========
The following are two potential examples of how you might
retrieve the data dynamically in your own version of
xref.
MySQL
-----
The first example retrieves the data from a MySQL table.
Assuming you have a "Terms" table in your DB that looks
like this:
mysql> describe Terms;
+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| term_key   | int(10)      | NO   | PRI | NULL    | auto_increment |
| term_title | varchar(30)  | NO   |     |         |                |
| term_url   | varchar(256) | NO   |     |         |                |
| term_desc  | varchar(256) | YES  |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+
You could use code like this to retrieve it (be sure to
fill in your own user, host, and DB name, and password!):
# BEGIN CUSTOM....
use DBI;

my $user = 'read';
my $pass = 'read_pass';
my $host = 'localhost';
my $db   = 'TermDefs';

# Connect to the DB
my $dbh = DBI->connect("DBI:mysql:hostname=$host;database=$db",
                       $user, $pass);

# Clear the master hash
$master_data = {};

# Retrieve the terms data,
# and fill in master_data
if ($data_set eq 'init') {
    my $sth = $dbh->prepare('SELECT term_key, term_title FROM Terms');
    $sth->execute();
    while (my $row = $sth->fetchrow_hashref) {
        $master_data->{$row->{'term_key'}} = {
            'title' => $row->{'term_title'}
        };
    }
}
elsif (($data_set eq 'terms') &&
       ($params->{'term_list'})) {
    my $sth = $dbh->prepare('SELECT term_key, term_title,
                                    term_url, term_desc
                             FROM Terms
                             WHERE term_key=?');
    foreach my $search_key (@key_list) {
        $sth->execute($search_key);
        if (my $row = $sth->fetchrow_hashref) {
            $master_data->{$row->{'term_key'}} = {
                'title'       => $row->{'term_title'},
                'URL'         => $row->{'term_url'},
                'description' => $row->{'term_desc'} || ''
            };
        }
    }
}
# END CUSTOM
Of course, you might also need to tweak the field names,
select statements, etc. to match your own DB definitions.
You can assume that entries in the @key_list include
only letters, numbers, and underscores; since this is
enforced in the get_params section of the script.
You might also consider LIMITing the init results
(see the SQL LIMIT clause if you don't know what
I'm talking about) if your term list is very large
(say, more than 1,000 entries).
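As a variation on the queries above, here's a sketch of two
query builders: one that fetches all requested keys in a
single statement using an IN list of placeholders, and one
that caps the init query with LIMIT. Both helpers are
hypothetical (they're mine, not part of xref), and they only
build the SQL text, so you can try them without a database.
The table and column names are those of the Terms table
described above.

```perl
use strict;
use warnings;

# build_terms_query is a hypothetical helper (not part of xref).
# It returns SQL with one "?" placeholder per requested key.
sub build_terms_query {
    my @key_list = @_;
    return unless @key_list;
    my $placeholders = join ', ', ('?') x @key_list;
    return "SELECT term_key, term_title, term_url, term_desc "
         . "FROM Terms WHERE term_key IN ($placeholders)";
}

# build_init_query is also hypothetical. It caps the number of
# rows that the init query may return.
sub build_init_query {
    my ($max_rows) = @_;
    $max_rows = int($max_rows);   # only a plain integer goes into the SQL
    return "SELECT term_key, term_title FROM Terms LIMIT $max_rows";
}

my @key_list = (3, 8, 21);
print build_terms_query(@key_list), "\n";
print build_init_query(1000), "\n";
```

With DBI you would pass the first string to $dbh->prepare,
call $sth->execute(@key_list), and then loop over
$sth->fetchrow_hashref just as the init branch above does.
Whether one IN query beats several single-key queries
depends on your DB, so treat this as something to measure
rather than as a rule.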
XML
---
While much shorter than the DB example above, retrieving
XML data (especially via XML::Simple, as we do here) might
not be as efficient as a DB call, or even a static list.
Nonetheless, it may be the most reasonable method for you
to retrieve your terms; especially if your list is
fairly small. For this example, I'll assume you have your
XML file set up as follows:
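<terms>
  <term>
    <term_key>1</term_key>
    <title>Apple</title>
    <URL>http://en.wikipedia.org/wiki/Apple</URL>
    <description>The pomaceous fruit of the apple tree, species Malus
      domestica in the rose family Rosaceae.</description>
  </term>
  <term>
    <term_key>2</term_key>
    <title>Orange</title>
    <URL>http://en.wikipedia.org/wiki/Orange_(fruit)</URL>
    <description>A hybrid of ancient cultivated origin, possibly between
      pomelo (Citrus maxima) and tangerine (Citrus reticulata).</description>
  </term>
</terms>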
And here's the code:
# BEGIN CUSTOM....
use XML::Simple qw(XMLin);
my $xml = XMLin('terms.xml', forcearray => ['term'],
keyattr => ['term_key']);
$master_data = ($xml->{'term'}) ? $xml->{'term'} : {};
# END CUSTOM
Again, depending on how you have your xml file defined,
you may need to change the forcearray or keyattr
definitions, or map the resulting hash to the fields needed
in $master_data. See perldoc XML::Simple for more information.
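If your file uses different element names, the mapping
mentioned above might look something like the sketch below.
The field names name, link, and summary are invented for
illustration, as is the remap_terms helper; $parsed stands
in for the hash of terms that you'd get back from your
parser.

```perl
use strict;
use warnings;

# Hypothetical parsed data: a hash of terms keyed by ID, using
# element names (name/link/summary) that xref doesn't know.
my $parsed = {
    '1' => { 'name'    => 'Apple',
             'link'    => 'http://en.wikipedia.org/wiki/Apple',
             'summary' => 'The pomaceous fruit of the apple tree.' },
    '2' => { 'name'    => 'Orange',
             'link'    => 'http://en.wikipedia.org/wiki/Orange_(fruit)' },
};

# Field name xref expects => (assumed) field name in the file.
my %field_map = (
    'title'       => 'name',
    'URL'         => 'link',
    'description' => 'summary',
);

# remap_terms is a hypothetical helper (not part of xref).
sub remap_terms {
    my ($source, $map) = @_;
    my %master;
    foreach my $key (keys %$source) {
        foreach my $field (keys %$map) {
            my $value = $source->{$key}{ $map->{$field} };
            # Missing fields become empty strings.
            $master{$key}{$field} = defined $value ? $value : '';
        }
    }
    return \%master;
}

my $master_data = remap_terms($parsed, \%field_map);
print $master_data->{'1'}{'title'}, "\n";
```

Filling missing fields with an empty string mirrors what the
MySQL example does for a NULL description.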
===========================================================
IV. Notes
1. The script will only match terms that are themselves the
immediate children of specific HTML elements; these key
elements include:
div
span
strong
b
em
i
p
Additionally, the script won't include links within
already existing links of a page. These checks are included
in an attempt to include links in "logical" areas of the
page; i.e., areas where the user would expect text links to
appear. This may cause the script to appear to be broken;
because it may ignore terms that you believe should be
automatically linked. If this is a problem for your
particular implementation of xref, then you'll need to
adjust this line of code in init_js.tmpl to include the
additional tags that you would like considered:
this.rAllowedElements=/^(DIV|SPAN|P|STRONG|B|EM|I)$/i;
Remember that this check is made against the immediate parent
of a text node; it's not made against the elements of the
page as a whole. In other words, if your entire page is
enclosed within a table, xref can still find the terms within
it (provided your table has divs, spans, paragraphs, etc.
within it).
2. Be especially careful when using the scan_now option;
specifically, when using scan_now I strongly recommend that
you place the script at the very end of your Web pages,
just before the closing