Friendly 404 Errors - WebReference Update - 011004 | WebReference


((((((((((((((((( WEBREFERENCE UPDATE NEWSLETTER )))))))))))))))))
October 4, 2001


This week we show how to make your 404s more friendly. All sites experience some degree of link rot. When users try to go to a non-existent page, they typically see an unhelpful error message. Two Michigan State University techies show you how to make a better 404 page by using an informative script.

http://www.webreference.com *- link to us today
http://www.webreference.com/new/ *- newsletter home

New this week on WebReference.com and the Web:

1. FEATURE: Helping Lost Site Visitors: The Error 404 Handler
2. OTHER VOICES:
   * Web Development in the "Real World"
   * Introducing MovableType
   * A Cold Look at Chilled Speech
3. NET NEWS:
   * W3C Extends Comment Period on Patent Policy

Like what you see? Get our front page e-mailed to you every business day with our HTML newsletter. Just send an e-mail to:

subscribe-html@webreference.com

or for this text newsletter:

subscribe@webreference.com

Spread the word! Feel free to send a copy of this newsletter to your friends and colleagues, and while you're at it, snap a link to WebReference.com.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1. FEATURE: Helping Lost Site Visitors: The Error 404 Handler

The savvy webmaster tries hard to make his or her site visually appealing and easy for visitors to navigate. But it's also important to be friendly when users visit pages that don't exist on your site.

We refer, of course, to the dreaded Error 404. This is the standard error code that Web servers issue when they are asked to fetch a URL that doesn't map to an existing resource on the server.

The simplest case is that the URL refers to a static file that no longer exists. Such "link rot" occurs when sites are reorganized or when old content is deleted. After a page is removed, users continue to follow old bookmarks, old search engine pointers, old links from other sites, or even old links as printed in a magazine or book. URLs may also be invalidated for database-driven content; for instance, the exact syntax required to retrieve a given virtual page may change subtly due to a change in underlying technology. Finally, sometimes site visitors will guess at a URL on your site, assuming that www.acme.com/about is a valid path. Any of these cases leads to an Error 404.

When a Web server receives a request for a non-existent page, by default it returns an Error 404 status code to the client and offers explanatory text. The default text is often terse and not very helpful:

HTTP Error 404

404 Not Found

The Web server cannot find the file or script you asked for. Please check the URL to ensure that the path is correct.

Please contact the server's administrator if this problem persists.

It's easy to give the lost visitor a more helpful response - an Error 404 page that offers a friendlier message as well as some choices to help users find whatever they were looking for. Both Apache and IIS make it straightforward to deliver friendly, useful responses when URLs go awry.

Most major sites that deal with the public offer such Error 404 handlers, including WebReference.com. At Michigan State University, we've implemented one that provides the user these options:

* Easy access to campus search engines.
* Ability to click parts of the directory structure embedded in the URL, in the hopes of finding a valid parent page.
* A simple search form with the URL as specified. (This helps users who don't realize they can type over the Address box in the browser.)
* A Back hyperlink. (This helps users who have the navigation toolbar turned off. Yes, there are such users!)
* Phone numbers for university help desks.
* A link to the campus home page.

In our message to the user, we do not use the term "Error 404" or even the word "Error." We simply say the Web page was not found, and offer some possible reasons. We do, however, return the Error 404 status to the client computer, so that spiders won't be tempted to index the error page.
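As a rough illustration of that last point, here is a minimal sketch in Python (not the language of the MSU handler) of a CGI-style response that shows a friendly message while still sending the 404 status. The function name build_not_found_response and the message wording are assumptions made for this example.

```python
import html


def build_not_found_response(requested_url: str) -> str:
    """Return a CGI response: a 404 status header, then a friendly HTML body."""
    safe_url = html.escape(requested_url)  # never echo raw request data into HTML
    body = (
        "<html><head><title>Web Page Not Found</title></head><body>\n"
        "<h1>Web Page Not Found</h1>\n"
        f"<p>We could not find {safe_url}. "
        "The page may have moved or been removed.</p>\n"
        "</body></html>\n"
    )
    # The CGI "Status:" header tells the server which HTTP status to send,
    # so spiders still see a 404 even though the page never says "Error".
    headers = "Status: 404 Not Found\r\nContent-Type: text/html\r\n\r\n"
    return headers + body


if __name__ == "__main__":
    print(build_not_found_response("/unit/wharton/thispartisnotfound"), end="")
```

The visible text stays friendly, while the header keeps the response honest for spiders and other automated clients.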

In our case, we also offer an image of the university mascot, Sparty, shrugging his shoulders or pointing at a laptop displaying the words "Web Page Not Found." The goal is to show the mascot empathizing with the user in a lighthearted, friendly fashion. To see the error handler in action, try this URL:

http://www.msu.edu/unit/wharton/thispartisnotfound

Other webmasters may want to incorporate some or all of these elements in their Error 404 handlers. We've packaged Unix/Apache and Windows NT/2000 IIS versions of the error handler for others to use. (You'll have to use your own creative image; Sparty is a registered trademark of the university.)

>How To Use the Error 404 Handler

Both the Apache Web server (http://www.apache.org) and Microsoft's Internet Information Services (IIS) offer mechanisms for delivering custom error responses. Not surprisingly, implementing an Error 404 handler is very different on the two platforms, but the concept is the same.

It's your choice whether to author an HTML page or a script to give the user a meaningful response. We suggest a script instead of a static page. A script lets you analyze environment variables so that your message is tailored to the path the user followed - just as our handler does at MSU.

Two main steps are involved when employing an Error 404 handler for your site. First, you must write a script that returns a useful page to the user, most likely including some analysis on several environment variables. Then you must tell the Web server to process the script of your choice rather than sending the default response to clients.


>Writing Your Error Handling Script

For Apache, we wrote a standard CGI (http://www.w3.org/CGI/) script in Perl (http://www.cpan.org/). The following environment variables are among the most useful in this context:

* HTTP_REFERER - If the client was sent to the invalid URL by a link or other mechanism, this variable will usually hold the URL of the page they came from.
* HTTP_HOST - The hostname the client requested, normally the full hostname of the machine your Web server is running on. Using this variable allows you to put the same script on several installations, or to change your hostname without breaking the code.
* REDIRECT_URL - The URL path the client attempted to retrieve, causing the Error 404. (So HTTP_HOST + REDIRECT_URL = text in the browser's Address box.)

For example, code that will give the user a link back to the page they came from would look something like this:

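# Link back to the referring page, or fall back to the browser's history.
# (The anchor markup is an assumption; adapt it to your own page.)
my $referer = $ENV{'HTTP_REFERER'} || "";
if ($referer eq "") {
    print "<a href=\"javascript:history.back()\">Back</a>";
} else {
    print "<a href=\"$referer\">$referer</a>";
}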
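The same idea can be sketched in Python, for readers who do not use Perl. This is an illustration under stated assumptions rather than the authors' code: the function names back_link and requested_address are hypothetical, and the environment is passed in as a mapping so the logic can be tried outside a Web server. The sketch also HTML-escapes the referrer before echoing it, which is a sensible precaution whenever request data ends up in a page.

```python
import html
import os
from typing import Mapping


def back_link(env: Mapping[str, str] = os.environ) -> str:
    """Return an HTML link to the referring page, or a plain Back link."""
    referer = env.get("HTTP_REFERER", "")
    if not referer:
        # No referrer was sent (typed URL, bookmark, or privacy setting).
        return '<a href="javascript:history.back()">Back</a>'
    safe = html.escape(referer, quote=True)
    return f'<a href="{safe}">{safe}</a>'


def requested_address(env: Mapping[str, str] = os.environ) -> str:
    """HTTP_HOST + REDIRECT_URL: the address the visitor asked for, minus the scheme."""
    return env.get("HTTP_HOST", "") + env.get("REDIRECT_URL", "")
```

For example, requested_address({"HTTP_HOST": "www.msu.edu", "REDIRECT_URL": "/unit/wharton"}) returns "www.msu.edu/unit/wharton".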

You can give users the ability to traverse the directory structure embedded in the URL by splitting REDIRECT_URL on the '/' character:

@dirs = split /\//, $ENV{'REDIRECT_URL'};

Then it is trivial to construct all the necessary links by looping through the @dirs array.
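One way that loop might look, sketched in Python rather than Perl. The helper names parent_links and render_links are hypothetical, and the sketch assumes that the final path segment is the missing page itself, so only its parent directories become links.

```python
import html
from typing import List, Tuple


def parent_links(redirect_url: str) -> List[Tuple[str, str]]:
    """Return (label, href) pairs for each parent directory in the URL path."""
    links = []
    href = ""
    # Skip empty pieces, such as the one before the leading '/'.
    for part in [p for p in redirect_url.split("/") if p]:
        href += "/" + part
        links.append((part, href + "/"))
    return links[:-1]  # the last piece is the missing page itself, so drop it


def render_links(redirect_url: str) -> str:
    """Render the parent directories as anchors separated by ' / '."""
    return " / ".join(
        f'<a href="{html.escape(href)}">{html.escape(label)}</a>'
        for label, href in parent_links(redirect_url)
    )
```

Each link points one level further down the path, so a visitor can climb back up until a valid parent page turns up.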

You could certainly use a similar solution for IIS, but the easiest way to set up an error handler on a Microsoft server is to use Active Server Pages (http://msdn.microsoft.com/asp). Either way, the important thing to note is that IIS provides different environment variables to retrieve the data you are interested in:

* HTTP_REFERER - Same as Apache.
* QUERY_STRING - IIS stores the requested URL that caused the Error 404 in the query string. The whole URL, including the hostname, is included here.
* SERVER_PORT - The TCP port the Web server is running on. This is only important if you are running on a port other than the standard HTTP port 80.

For example, this ASP code will extract the URL that was requested and split it into the individual directories, with the hostname being the first element in the array:

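<%
' Opening line restored from the description that follows (an assumption):
' read the requested URL from the query string.
url = Request.ServerVariables("QUERY_STRING")

' Strip everything up to and including the "//" that follows the scheme.
If Len(url) > 0 Then
    url = Right(url, Len(url) - Instr(url, "//") - 1)
Else
    Response.End
End If

' Split what is left into the hostname followed by each directory.
If Instr(url, "/") > 0 Then
    directories = Split(url, "/")
Else
    Redim directories(0)
    directories(0) = url
End If
%>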

First, the URL is extracted from the server variable QUERY_STRING. Then everything through the 'http://' prefix is stripped off. Finally, the string is split on the '/' character to give you the hostname and each directory.
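The same three steps can be sketched in Python to make the string handling easier to try out. The function name split_iis_query is hypothetical, and the sample value in the usage note, with its "404;" prefix and port number, is an assumption about how IIS formats the query string for a custom error URL. Unlike the ASP version, this sketch leaves the string untouched when no "//" is found.

```python
from typing import List


def split_iis_query(query_string: str) -> List[str]:
    """Split an IIS custom-error query string into [hostname, dir1, dir2, ...]."""
    if not query_string:
        return []
    marker = query_string.find("//")
    # Drop everything through the "//" that follows the scheme, if present.
    url = query_string[marker + 2:] if marker != -1 else query_string
    return url.split("/")
```

Under that assumption, split_iis_query("404;http://www.msu.edu:80/unit/wharton/missing.htm") yields the hostname (with port) first, followed by "unit", "wharton", and "missing.htm".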

>Configuring the Server to Run Your Error Handler Script

To set up Apache to run a script when an Error 404 is encountered:

* Open the configuration file, httpd.conf, and find the section labeled "customizable error responses." (If your configuration file does not contain this comment, the exact location is not important, as long as you are in the section of the file for the 'Main' server directives.)

* Add the line:

  ErrorDocument 404 /cgi-bin/errorhandler

  The path in this example assumes you are executing a script called 'errorhandler' in the 'cgi-bin' directory on the same Web server. It is also possible to redirect to a different Web server, to serve a static page, or to send a text message.

* Restart Apache. This can usually be done at the command line by typing:

  apachectl restart
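The alternatives to a local script mentioned in the ErrorDocument step might look like the following in httpd.conf. This is a sketch: the paths, the hostname www.example.com, and the message text are placeholders, and you would keep just one ErrorDocument 404 line. The quoting of the text message follows newer Apache releases; Apache 1.3 expects only the opening quotation mark.

```apache
# A static page on the same server
ErrorDocument 404 /errors/notfound.html

# A page on a different Web server (Apache sends the client a redirect)
ErrorDocument 404 http://www.example.com/notfound.html

# A plain text message
ErrorDocument 404 "Sorry, we could not find that page."
```

Note that when the target is a full URL, the client receives a redirect instead of the original 404 status, so spiders may end up indexing the error page. A local path avoids that.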

To configure IIS in the same way: (Note: The directions that follow are for Windows 2000 and IIS 5.0. If you are running NT or an older version of IIS, required steps may vary somewhat.)

* Open Internet Services Manager.
* Find the website you want to put the error handler in, right-click on its icon, and click "Properties" on the menu.
* Go to the Custom Errors tab.
* Find 404 in the HTTP Error column of the listbox and select it.
* Click the "Edit Properties" button below the listbox to bring up the Error Mapping Properties dialog.
* Change the Message Type to URL. (Alternatively, you can specify a file to send when an Error 404 occurs.)
* Type in the path from the root of your website to the error handler script. For example:

  /errorhandler/errorhandler.asp

* Click OK on the Error Mapping Properties dialog, then click OK on the Properties dialog for your website.

An error handling script is no different than any other script you might write. You can do pretty much anything that suits your individual needs. For example, we present the user with a form that allows them to submit a query to one of our campus search engines. As a result, we actually had to write a second script to process the results of that form.

You may want to consider providing an error handler for other conditions, such as Error 403 (Forbidden).

The important thing to remember is that any friendly message, especially one that shares the look and feel of the rest of your website, is a great way to improve the user's experience. Novice users are less likely to become frustrated when you give them an easy way to get back on track. The end result is happier site visitors who stay around to find more of your content.

For sample source code and related examples, follow these links:

http://search.msu.edu/services/errorhandler/
http://www.msu.edu/unit/wharton/thispartisnotfound
http://www.plinko.net/404/

# # # #

About the authors: Mathew R. Shuster (http://www.egr.msu.edu/~shusterm) is a senior studying management and computer science at Michigan State University and a student employee in the MSU Computer Laboratory. Richard W. Wiggins (http://richardwiggins.com) is a senior information technologist at MSU.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2. OTHER VOICES: Web Development in the "Real World", Introducing MovableType, A Cold Look at Chilled Speech

>Web Development in the "Real World"

Don Makoviney interviews a cross-section of developers from around the country to discover how they've changed with the Web.

http://www.makovision.com/articles/2001/10/interview_1.asp
Makovision.com, Oct. 2001

>Introducing MovableType

There's a new content management/blogging system for the masses in town, named MovableType. WriteTheWeb asked its founders to explain what they're up to.

http://www.movabletype.org/
http://writetheweb.com/read.php?item=115
WriteTheWeb, Oct. 2, 2001

>A Cold Look at Chilled Speech

When he finished his manuscript on copyright protections in the digital age last Thanksgiving, Siva Vaidhyanathan knew some things would change before its Oct. 1 publication date. Then came a Tuesday in September.

http://www.wired.com/news/privacy/0,1848,47195,00.html
Wired.com, Oct. 3, 2001

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3. NET NEWS: W3C Extends Comment Period on Patent Policy

>W3C Extends Comment Period on Patent Policy

Following a storm of last minute comments, the standards body said it will extend the review period on its Patent Policy Framework until Oct. 11.

http://www.webstandards.org/
http://www.internetnews.com/dev-news/article/0,,10_896021,00.html
Internetnews.com, Oct. 2, 2001

That's it for this week, see you next time.

Andrew King
Newsletter Editor, WebReference.com
aking@internet.com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This newsletter is published by Jupitermedia Corp
http://internet.com - The Internet & IT Network
Copyright (c) 2001 Jupitermedia Corp. All rights reserved.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For information on reprinting or linking to internet.com content:
http://internet.com/corporate/permissions.html
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~