Back to Basics: META Tags, Part 2
by Scott Clark
Using META Tags
On to more important issues, like how to actually implement META tags in your Web pages. If you've ever had readers tell you that they're seeing an old version of your page when you know that you've updated it, you may want to make sure that their browser isn't caching the Web pages. Using META tags, you can tell the browser not to cache files, and/or when to request a newer version of the page. In this article, we'll cover some of the META tags, their uses, and how to implement them.
The Expires tag tells the browser the date and time when the document will be considered "expired." In Netscape Navigator, a request for a document whose time has "expired" will initiate a new network request for the document. An illegal Expires date such as "0" is interpreted by the browser as "immediately." Dates must be given in GMT, in the RFC 1123 format used by HTTP:
<META HTTP-EQUIV="expires" CONTENT="Wed, 26 Feb 1997 08:21:57 GMT">
The Pragma tag is another way to control browser caching. To use this tag, the value must be "no-cache". When it is included in a document, it prevents Netscape Navigator from caching the page locally.
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
These two tags can be used together to keep your content current, but beware. Many users have reported that Microsoft's Internet Explorer ignores these META tag instructions and caches the files anyway. So far, nobody has been able to supply a fix for this "bug." As of the release of MSIE 4.01, the problem still existed.
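As a minimal sketch, the two cache-control tags can sit side by side in the head of a page. The title and body text are placeholders, and the Expires value of "0" relies on the "immediately" interpretation described earlier:

```html
<HTML>
<HEAD>
<TITLE>Frequently updated page (placeholder title)</TITLE>
<!-- Treat the page as already expired, so a revisit triggers a new request -->
<META HTTP-EQUIV="expires" CONTENT="0">
<!-- Ask the browser not to keep a local cached copy -->
<META HTTP-EQUIV="Pragma" CONTENT="no-cache">
</HEAD>
<BODY>
<P>Content that changes often goes here.</P>
</BODY>
</HTML>
```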
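The Refresh tag specifies the time in seconds before the Web browser reloads the document automatically. Alternatively, it can specify a different URL for the browser to load: the delay is followed by a semicolon and URL= with the new address. In this example, a delay of 0 seconds sends the browser to the new URL immediately: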
<META HTTP-EQUIV="Refresh" CONTENT="0;URL=http://www.newurl.com">
Be sure to place quotation marks around the entire value of the CONTENT attribute, or the page will not reload at all.
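For the reload-only case, the URL part is simply left out. The sketch below assumes a page that should refresh itself every five minutes; the 300-second figure, the title and the body text are arbitrary choices made for illustration:

```html
<HTML>
<HEAD>
<TITLE>Auto-reloading page (placeholder title)</TITLE>
<!-- Reload this same document every 300 seconds; no URL is given -->
<META HTTP-EQUIV="Refresh" CONTENT="300">
<!-- To wait 10 seconds and then load another page instead, the value
     would be written as CONTENT="10;URL=http://www.newurl.com" -->
</HEAD>
<BODY>
<P>This page reloads itself every five minutes.</P>
</BODY>
</HTML>
```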
The Set-Cookie tag is one method of setting a "cookie" in the user's Web browser. If you use an expiration date, the cookie is considered permanent and will be saved to disk (until it expires); otherwise, it will be considered valid only for the current session and will be erased when the Web browser is closed.
<META HTTP-EQUIV="Set-Cookie" CONTENT="cookievalue=xxx;expires=Wednesday, 21-Oct-98 16:14:21 GMT; path=/">
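For comparison, a session-only cookie simply omits the expires part. This is a sketch based on the behavior described above; the cookie name and value are the same placeholders used in the original example, and the title and body text are made up:

```html
<HTML>
<HEAD>
<TITLE>Session cookie example (placeholder title)</TITLE>
<!-- No expires field: the cookie lasts only until the browser is closed -->
<META HTTP-EQUIV="Set-Cookie" CONTENT="cookievalue=xxx; path=/">
</HEAD>
<BODY>
<P>Page content goes here.</P>
</BODY>
</HTML>
```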
The Window-target tag specifies the "named window" of the current page, and can be used to prevent a page from appearing inside another framed page. Usually this means that the Web browser will force the page to go to the top frameset.
<META HTTP-EQUIV="Window-target" CONTENT="_top">
Although you may not have heard of PICS-Label, you probably will soon. At the same time that the Communications Decency Act was struck down, the World Wide Web Consortium (W3C) was working to develop a standard for labeling online content (see www.w3.org/PICS/). This standard became the Platform for Internet Content Selection (PICS). The W3C's standard left the actual creation of labels to the "labeling services."
Anything that has a URL can be labeled, and labels can be assigned in two ways. In the first, a third-party labeling service rates the site, and the labels are stored at the labeling bureau, which resides on the Web server of the labeling service. In the second, the developer or Web site host contacts a rating service, fills out the proper forms, and places the HTML META tag information that the service provides on their pages. One such free service is the PICS-Label generator that Vancouver-Webpages provides. It is based on the Vancouver Webpages Canadian PICS ratings, version 1.0, and can be used as a guideline for creating your own PICS-Label META tag.
Although PICS-Label was designed as a ratings label, it also has other uses, including code signing, privacy, and intellectual property rights management. PICS uses what are called generic and specific labels. Generic labels apply to every document whose URL begins with a specific string of characters, while specific labels apply only to a given file. A typical PICS-Label for an entire site would look like this:
<META http-equiv="PICS-Label" content='(PICS-1.1 "http://vancouver-webpages.com/VWP1.0/" l gen true comment "VWP1.0" by "firstname.lastname@example.org" on "1997.10.28T12:34-0800" for "http://www.hisdomain.com/" r (P 2 S 0 SF -2 V 0 Tol -2 Com 0 Env -2 MC -3 Gam -1 Can 0 Edu -1 ))'>
Keyword and Description attributes
Chances are that if you manually code your Web pages, you're aware of the "keywords" and "description" attributes. These allow the search engines to easily index your page using the keywords you specifically give them, along with a description of the site that you yourself get to write. Couldn't be simpler, right? You use the keywords attribute to tell the search engines which keywords to use, like this:
<META NAME="keywords" CONTENT="life, universe, mankind, plants, relationships, the meaning of life, science">
By the way, don't think you can spike the keywords by using the same word repeated over and over, as most search engines have refined their spiders to ignore such spam. Using the META description attribute, you add your own description for your page:
<META NAME="description" CONTENT="This page is about the meaning of life, the universe, mankind and plants.">
Make sure that you use several of your keywords in your description. While you are at it, you may want to include the same description enclosed in comment tags, just for the spiders that do not look at META tags. To do that, just use the regular comment tags, like this:
<!-- This page is about the meaning of life, the universe, mankind and plants. -->
More about search engines can be found in WebDeveloper.com's special report.
ROBOTs in the mist
On the other hand, there are probably some of you who do not wish your pages to be indexed by the spiders at all. Worse yet, you may not have access to the robots.txt file. The robots META attribute was designed with this problem in mind.
<META NAME="robots" CONTENT="all | none | index | noindex | follow | nofollow">
The default for the robots attribute is "all". This allows all of the files to be indexed and all of the links to be followed. "None" tells the spider not to index any files, and not to follow the hyperlinks on the page to other pages. "Index" indicates that this page may be indexed by the spider, while "follow" means that the spider is free to follow the links from this page to other pages. The inverse is also true, thus this META tag:
<META NAME="robots" CONTENT="noindex">
would tell the spider not to index this page, but would allow it to follow subsidiary links and index those pages. "Nofollow" would allow the page itself to be indexed, but the links could not be followed. As you can see, the robots attribute can be very useful for Web developers. For more information about the robots attribute, visit the W3C's robot paper.
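The values can also be combined in a single tag by separating them with commas. The lines below are a sketch of the common combinations; a given page would carry only one of them:

```html
<!-- Index this page, but do not follow its links -->
<META NAME="robots" CONTENT="index, nofollow">

<!-- Neither index this page nor follow its links (same effect as "none") -->
<META NAME="robots" CONTENT="noindex, nofollow">

<!-- Index this page and follow its links (same effect as the default, "all") -->
<META NAME="robots" CONTENT="index, follow">
```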
Placement of META tags
META tags should always be placed in the head of the HTML document between the actual <HEAD> tags, before the BODY tag. This is very important with framed pages, as a lot of developers tend to forget to include them on individual framed pages. Remember, if you only use META tags on the frameset pages, you'll be missing a large number of potential hits.
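A rough sketch of that placement for a framed site follows. The file names (nav.html, main.html), frame names, titles and text are all hypothetical; the point is only that the frameset document and each framed page carry their own META tags inside the HEAD. First, the frameset document:

```html
<HTML>
<HEAD>
<TITLE>The Meaning of Life (hypothetical frameset page)</TITLE>
<!-- META tags go here, inside HEAD, before the FRAMESET -->
<META NAME="keywords" CONTENT="life, universe, mankind, plants">
<META NAME="description" CONTENT="This page is about the meaning of life, the universe, mankind and plants.">
</HEAD>
<FRAMESET COLS="20%,80%">
<FRAME SRC="nav.html" NAME="nav">
<FRAME SRC="main.html" NAME="main">
</FRAMESET>
</HTML>
```

Then one of the framed pages (main.html in this sketch), which repeats the tags so that it can be indexed on its own:

```html
<HTML>
<HEAD>
<TITLE>The Meaning of Life (hypothetical framed page)</TITLE>
<!-- The framed page gets its own META tags, again inside HEAD -->
<META NAME="keywords" CONTENT="life, universe, mankind, plants">
<META NAME="description" CONTENT="This page is about the meaning of life, the universe, mankind and plants.">
</HEAD>
<BODY>
<P>Main content goes here.</P>
</BODY>
</HTML>
```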
Originally written on WebDeveloper.com in October 1997. Revised: May 17, 2000