##########################################################
#
# xref
#
# An automatic cross-reference linking script for Web
# pages.
#
##########################################################

Version: 0.53
Date:    11/14/2008
Author:  D.M. Ragle, dragle@internet.com
Site:    http://webref.com/programming/perl/xref

README.TXT CONTENTS
-------------------

  I.   Requirements
  II.  Setup Instructions
  III. Retrieving Terms
  IV.  Notes

===========================================================
I. Requirements

To set up and use this system, you'll need Perl 5.6.1 or later
installed on your server (5.8 or later is recommended), and
access to place scripts and files into your cgi-bin.

You will also need the following Perl modules installed on your
system:

    CGI
    HTML::Template
    Date::Format
    Time::Local

All but the HTML::Template module are typically installed with
your Perl distribution; HTML::Template (and any others that may
be missing) you can pick up from CPAN.

xref will work in both a standard and mod_perl enabled Web
server.

To view a list of changes from version to version, read the
CHANGES.TXT file.

Additional notes and information for xref can be found at
http://webref.com/programming/perl/xref.

===========================================================
II. Setup Instructions

It's presumed that you've already unzipped the xref.zip
distribution file. If not, do so now in its own directory. You
will then have the following files:

    xref.js
    init_js.tmpl
    term_select.tmpl
    init_js_opt.tmpl
    term_select_opt.tmpl
    README.TXT
    CHANGES.TXT

To set up the system, follow these steps. Unless otherwise
specified, the actual names of files and directories that you
create are up to you; but I'll use a typical naming structure
here that I recommend you go with unless you have a specific
reason not to. I recommend that you read all of these steps in
their entirety before beginning the actual work.

The following steps assume you're a new user, installing xref
for the first time.
Even if you're upgrading, I recommend that you browse these
instructions, since some features, configuration steps, and
parameters may be new to this version (and you may need or wish
to use these new features/parameters in your existing setup).

1. Create an "xref" directory within your cgi-bin.

2. Edit the xref.js file. Change the first line of the file to
   point to the actual perl binary on your server. You can find
   this location by typing:

       which perl

   at your server's command prompt. For example, if "which perl"
   responds with "/usr/bin/perl", then you would change the top
   line of the script to be this:

       #!/usr/bin/perl -T

   (That's the default setting, by the way.)

3. Edit the get_data subroutine in the xref.js file so it
   includes and/or properly retrieves your terms. Instructions
   for doing this can be found in Section III of this README
   ("Retrieving Terms").

4. Copy or FTP the xref.js file you edited in steps 2 and 3 into
   the xref directory of your cgi-bin that you created in
   step 1. The script should be flagged with read/execute
   rights; something like 0755.

5. Copy or FTP the init_js.tmpl and term_select.tmpl files into
   the xref directory you created in step 1. Both of these files
   should be marked world readable but not world writable
   (something like 0644).

   Or, if you prefer an obfuscated/space-optimized version of
   the JavaScript, upload term_select_opt.tmpl and
   init_js_opt.tmpl instead. If you choose to upload the
   optimized versions, you'll need to rename them to
   term_select.tmpl and init_js.tmpl for them to be recognized
   by xref.

6. To activate the script, you must include it in the Web pages
   that you want it to automatically create links within. You
   can do this by adding a script element to each target Web
   page whose src points at the copy of xref.js in your cgi-bin
   (typically http://example.com/cgi-bin/xref/xref.js),
   replacing "http://example.com" with your actual domain name.

   Affiliate sites can also access the script by including the
   same statement in their pages; they needn't have their own
   local copy of the script!
===========================================================
III. Retrieving Terms

The default version of xref.js that ships with this
distribution contains a statically defined list of terms.
Unless you particularly like fruits (and want the script to
automatically link terms like Apples, Bananas, and Pears),
you'll need to adjust the script to retrieve and/or use your own
term list.

The place to do this is in the get_data subroutine of the
script. Within that function is a block of code that looks like
this:

    # BEGIN CUSTOM....

    # The default setup: The terms
    # are defined statically
    $master_data = {
        '1' => {
            'title'       => 'Apple',
            'URL'         => 'http://en.wikipedia.org/wiki/Apple',
            'description' => 'The pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae.'
        },
        ...
        '10' => {
            'title'       => 'Pears',
            'URL'         => 'http://en.wikipedia.org/wiki/Pear',
            'description' => 'A pomaceous fruit produced by a tree of genus Pyrus.'
        },
    };

    # END CUSTOM

To use your own term list, you'll need to replace the block
above with your own code that fills in the master data hash in
the same manner that this section does. How you do that is
limited only by your knowledge of Perl. You could simply define
a static list of terms, similar to the fruity list above; or you
could dynamically retrieve the terms from a database or a file.
Examples of dynamic retrieval (from a MySQL database and from an
XML file) are given later in this README.

However you do it, the term list you create must be in exactly
the same format as above: a hash of hashes, where each sub-hash
contains 'title', 'URL', and 'description' fields. Each of these
fields must be utf-8 encoded.
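
As a rough illustration of the format described above, here's a
minimal sketch that checks a term hash for the three required
fields before you hand it to xref. The check_master_data
subroutine is hypothetical (it is not part of xref), and the two
sample records are trimmed stand-ins for your own terms.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of xref): returns a list of problems
# found in a term hash laid out as a hash of hashes.
sub check_master_data {
    my ($data) = @_;
    my @problems;
    foreach my $key (sort keys %$data) {
        my $record = $data->{$key};
        # Every record must itself be a hash reference
        if (ref($record) ne 'HASH') {
            push @problems, "record $key is not a hash reference";
            next;
        }
        # Every record needs all three fields
        foreach my $field (qw(title URL description)) {
            push @problems, "record $key is missing '$field'"
                unless defined $record->{$field};
        }
    }
    return @problems;
}

# Sample data (an assumption for illustration only)
my $master_data = {
    '1' => {
        'title'       => 'Apple',
        'URL'         => 'http://en.wikipedia.org/wiki/Apple',
        'description' => 'The pomaceous fruit of the apple tree.',
    },
    '2' => {
        'title' => 'Orange',
        'URL'   => 'http://en.wikipedia.org/wiki/Orange_(fruit)',
        # 'description' left out on purpose to trigger a report
    },
};

my @problems = check_master_data($master_data);
print "$_\n" foreach @problems;
```

Run against the sample data, this reports that record 2 is
missing its description. A check like this probably belongs in a
separate test script rather than in xref.js itself, since it's
only useful while you're building your list.
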
If you're unfamiliar with the creation of hashes, you may want
to review this primer on the topic:

    http://webref.com/programming/perl/nested/

Here's what needs to go in each of the fields:

    title        The exact term/keywords that you will be
                 matching in the Web page

    description  A plain (and typically brief) description or
                 definition of the term

    URL          The URL that will be used for the link created
                 for this term. The URL will be used exactly as
                 is; so if it needs to be absolute
                 (recommended), be sure to make it absolute.

Each record in the hash is keyed by a unique ID. This ID may
contain only letters (upper and lower case), numbers, and
underscores. For consistency, it's strongly recommended that you
use the unique key for the term from your DB, if possible. If
not using a DB, just start with a key of 1 for the first item
and count up from there.

The get_data subroutine is passed a variable that indicates what
data must be returned to the script. This parameter, called
$data_set, will be set to either 'init' or 'terms'. You can use
this parameter to limit the data you need to retrieve (useful
for DB calls, so you can limit the data you request from the DB
to only what you need).

If $data_set is equal to 'init', then only the title is required
to be in the hash. If $data_set is equal to 'terms', then the
title, URL, and description are required (and of course the key
is always required).

Also, if $data_set is equal to 'terms', then the variable
$params->{'term_list'} will include a comma-delimited list of
keys that the terms function needs (so you can limit your
selection to only those records, if you like). get_data
automatically extracts those keys into @key_list; so you can use
either $params->{'term_list'} or @key_list as per your needs.

When deciding how to retrieve your terms, remember that the xref
script may be called repeatedly on your server.
While not graceful in appearance, a statically defined list
(like our fruits) may give you the best performance, since it
avoids an I/O call with each script hit. On the other hand, a
static list may require more memory than necessary, since you'll
need to include the full list (including URLs and descriptions)
even though you may not need them. If you have a fast DB with
good caching, then a dynamic retrieval against it may very well
be the best choice of all.

EXAMPLES
========

The following are two potential examples of how you might
retrieve the data dynamically in your own version of xref.

MySQL
-----

The first example retrieves the data from a MySQL table.
Assuming you have a "Terms" table in your DB that looks like
this:

    mysql> describe Terms;
    +------------+--------------+------+-----+---------+----------------+
    | Field      | Type         | Null | Key | Default | Extra          |
    +------------+--------------+------+-----+---------+----------------+
    | term_key   | int(10)      | NO   | PRI | NULL    | auto_increment |
    | term_title | varchar(30)  | NO   |     |         |                |
    | term_url   | varchar(256) | NO   |     |         |                |
    | term_desc  | varchar(256) | YES  |     | NULL    |                |
    +------------+--------------+------+-----+---------+----------------+

You could use code like this to retrieve it (be sure to fill in
your own user, password, host, and DB name!):

    # BEGIN CUSTOM....

    use DBI;

    my $user = 'read';
    my $pass = 'read_pass';
    my $host = 'localhost';
    my $db   = 'TermDefs';

    # Connect to the DB
    my $dbh = DBI->connect("DBI:mysql:hostname=$host;database=$db",
                           $user, $pass);

    # Clear the master hash
    $master_data = {};

    # Retrieve the terms data,
    # and fill in master_data
    if ($data_set eq 'init') {
        # 'init' only needs the key and title of every term
        my $sth = $dbh->prepare(
            'SELECT term_key, term_title FROM Terms'
        );
        $sth->execute();
        while (my $row = $sth->fetchrow_hashref) {
            $master_data->{$row->{'term_key'}} = {
                'title' => $row->{'term_title'}
            };
        }
    }
    elsif (($data_set eq 'terms') && ($params->{'term_list'})) {
        # 'terms' needs the full record, but only for the
        # requested keys
        my $sth = $dbh->prepare(
            'SELECT term_key, term_title, term_url, term_desc '
          . 'FROM Terms WHERE term_key=?'
        );
        foreach my $search_key (@key_list) {
            $sth->execute($search_key);
            if (my $row = $sth->fetchrow_hashref) {
                $master_data->{$row->{'term_key'}} = {
                    'title'       => $row->{'term_title'},
                    'URL'         => $row->{'term_url'},
                    'description' => $row->{'term_desc'} || ''
                };
            }
        }
    }

    # END CUSTOM

Of course, you might also need to tweak the field names, select
statements, etc. to match your own DB definitions. You can
assume that entries in @key_list include only letters, numbers,
and underscores, since this is enforced in the get_params
section of the script. You might also consider LIMITing the init
results (see the SQL LIMIT clause if you don't know what I'm
talking about) if your term list is very large (say, more than
1,000 entries).

XML
---

While much shorter than the DB example above, retrieving XML
data (especially via XML::Simple, as we do here) might not be as
efficient as a DB call, or even a static list. Nonetheless, it
may be the most reasonable method for you to retrieve your
terms, especially if your list is fairly small.

For this example, I'll assume you have your XML file set up as
follows:

    <terms>
        <term>
            <term_key>1</term_key>
            <title>Apple</title>
            <URL>http://en.wikipedia.org/wiki/Apple</URL>
            <description>The pomaceous fruit of the apple tree, species Malus domestica in the rose family Rosaceae.</description>
        </term>
        <term>
            <term_key>2</term_key>
            <title>Orange</title>
            <URL>http://en.wikipedia.org/wiki/Orange_(fruit)</URL>
            <description>A hybrid of ancient cultivated origin, possibly between pomelo (Citrus maxima) and tangerine (Citrus reticulata).</description>
        </term>
    </terms>

And here's the code:

    # BEGIN CUSTOM....

    use XML::Simple qw(XMLin);

    my $xml = XMLin('terms.xml',
                    forcearray => ['term'],
                    keyattr    => ['term_key']);

    $master_data = ($xml->{'term'}) ? $xml->{'term'} : {};

    # END CUSTOM

Again, depending on how you have your XML file defined, you may
need to change the forcearray or keyattr definitions, or map the
resulting hash to the fields needed in $master_data. See perldoc
XML::Simple for more information.

===========================================================
IV. Notes

1. The script will only match terms that are themselves the
   immediate children of specific HTML elements; these key
   elements include:

       div
       span
       strong
       b
       em
       i
       p

   Additionally, the script won't include links within already
   existing links of a page. These checks are included in an
   attempt to include links in "logical" areas of the page;
   i.e., areas where the user would expect text links to appear.

   This may cause the script to appear to be broken, because it
   may ignore terms that you believe should be automatically
   linked. If this is a problem for your particular
   implementation of xref, then you'll need to adjust this line
   of code in init_js.tmpl to include the additional tags that
   you would like considered:

       this.rAllowedElements=/^(DIV|SPAN|P|STRONG|B|EM|I)$/i;

   Remember that this check is made against the immediate parent
   of a text node; it's not made against the elements of the
   page as a whole. In other words, if your entire page is
   enclosed within a table, xref can still find the terms within
   it (provided your table has divs, spans, paragraphs, etc.
   within it).

2. Be especially careful when using the scan_now option;
   specifically, when using scan_now I strongly recommend that
   you place the script at the very end of your Web pages, just
   before the closing </body> tag.
   Internet Explorer 7, in particular, is very sensitive to
   dynamically inserted scripts when they occur between or
   within certain page elements, and at times during our testing
   would even go so far as to refuse to display the Web page at
   all! Placing the script at the end of the page seemed to
   provide consistently good (i.e., no crashes, no errors)
   behavior from all browsers. And if you don't use scan_now,
   the script seems quite stable anywhere on the page.