HTML Unleashed PRE. Strategies for Indexing and Search Engines: The META tag | WebReference

HTML Unleashed PRE: Strategies for Indexing and Search Engines

The META tag

Getting back to HTML, you might wonder what the syntax is for adding keywords to a document.  Of course, the text of a page is the primary source of searchable material, but you may also need to add certain keywords without altering the page content.  (Changing text color to make keywords invisible in the body of a document is a really ugly trick; please never resort to it!)

The META tag serves this purpose (as well as several others).  "Meta" comes from a Greek word meaning "after" or "beyond," and the META tag was intended to carry all sorts of meta-information, that is, information about information.  You should understand that using META to specify keywords is not prescribed by the HTML standard; it is only one of the widely accepted uses of the tag.

A META tag usually takes the following form:

  <META name="..." content="...">

As you can see, the names of the META tag attributes are rather generic, which allows you to use the tag to express virtually any information that may be represented as a name-value pair.  For example, you could use META tags to supply information about yourself (name="author"), the program you used to create the HTML file (name="generator"), and so on.
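Because a META tag is just a name-value pair, generating one is straightforward.  The following Python sketch builds a tag from a name and a value, escaping quotes and ampersands so that a value cannot break out of the content attribute.  The function make_meta and the sample values are hypothetical, not part of any tool mentioned here:

```python
from html import escape


def make_meta(name: str, content: str) -> str:
    """Return a META tag carrying one name-value pair (hypothetical helper)."""
    # escape(..., quote=True) turns &, <, > and both quote characters into
    # entities, so the values cannot terminate the attributes early.
    return '<META name="{}" content="{}">'.format(
        escape(name, quote=True), escape(content, quote=True)
    )


# Hypothetical metadata for an example page.
for name, value in [("author", "Jane Doe"), ("generator", "My HTML Editor 1.0")]:
    print(make_meta(name, value))
```

The same helper works for any name, since the META tag itself places no restriction on which names you use.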

Here's how the META tag is used for introducing your document to search engines:

  <META name="keywords"
     content="searching, search engines, keywords, HTML">
  <META name="description"
     content="A description of major web search engines, spiders,
              and search-friendly HTML authoring">

These tags should be placed within the HEAD element.  In the content of the tag with the name="keywords" attribute, keywords and phrases can be separated by commas for better readability, although spiders usually ignore the separators.  The maximum number of keywords depends on the search engine in question; for some engines, 25 words or 200 characters have been quoted as the upper limit.
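To see what a spider might make of the keywords tag, here is a minimal Python sketch built on the standard html.parser module.  It assumes a spider that simply treats commas as word separators and checks the result against the 25-word and 200-character limits quoted above; the names MetaCollector, keywords_of, and within_limits are hypothetical, and real engines apply their own rules:

```python
from html.parser import HTMLParser

# Limits quoted for some engines; an assumption, since each engine sets its own.
MAX_KEYWORDS = 25
MAX_KEYWORD_CHARS = 200


class MetaCollector(HTMLParser):
    """Collect the name/content pair of every META tag in a document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too.
        if tag == "meta":
            attrs = dict(attrs)
            if attrs.get("name"):
                self.meta[attrs["name"].lower()] = attrs.get("content") or ""


def keywords_of(html_text):
    """Return the words a spider might extract from the keywords tag."""
    collector = MetaCollector()
    collector.feed(html_text)
    content = collector.meta.get("keywords", "")
    # Commas only aid readability; treat them like any other whitespace.
    return content.replace(",", " ").split()


def within_limits(words):
    """Check the word count and the length of the words rejoined by single spaces."""
    return len(words) <= MAX_KEYWORDS and len(" ".join(words)) <= MAX_KEYWORD_CHARS


page = """<HTML><HEAD>
<META name="keywords"
   content="searching, search engines, keywords, HTML">
</HEAD><BODY></BODY></HTML>"""

words = keywords_of(page)
print(words, within_limits(words))
```

Note that under this model "search engines" becomes two separate words, which is exactly why the commas make no difference to such a spider.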

Hopefully, the keywords thus specified will be added to the searchable representation of the document in the engine's database, and the description will be stored as the summary displayed for the document in a list of results.  In the absence of a description, most search engines use the first lines of text on the page instead.
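The fallback behavior can be sketched as follows.  This is an assumption about how an engine might choose a summary, not a description of any particular engine: the hypothetical result_summary function returns the description if one exists and otherwise takes the opening text of the page, cut at a word boundary within an arbitrary 150-character limit:

```python
def result_summary(meta, body_text, max_chars=150):
    """Pick the text shown for a page in a results list (hypothetical model).

    meta      -- dict of META name -> content, e.g. {"description": "..."}
    body_text -- the plain text of the page body
    max_chars -- assumed cutoff for the fallback summary
    """
    # Collapse line breaks and runs of spaces inside the description.
    description = " ".join(meta.get("description", "").split())
    if description:
        return description
    # No description: fall back to the opening text of the page.
    text = " ".join(body_text.split())
    if len(text) <= max_chars:
        return text
    # Cut at the last space within the limit so no word is chopped in half.
    cut = text.rfind(" ", 0, max_chars + 1)
    return text[:cut if cut > 0 else max_chars] + "..."


print(result_summary({"description": "A description of major web search engines"}, ""))
print(result_summary({}, "Hello   world\nthis is text", max_chars=11))
```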

Another use of the META tag is to exclude a page from spiders' attention.  By adding the following tag,

  <META name="robots" content="noindex">

you instruct any spider that encounters your page to skip it without indexing it.
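A spider that honors this convention has to look for the tag before deciding whether to index a page.  The sketch below, again using Python's html.parser, shows one way it might do so; the class and function names are hypothetical, and matching the directive case-insensitively is an assumption about spider behavior:

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Record the directives of any <META name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Attribute names arrive lowercased; values are lowercased here (assumption).
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for directive in (attrs.get("content") or "").split(","):
                self.directives.add(directive.strip().lower())


def may_index(html_text):
    """Return False if the page asks not to be indexed."""
    reader = RobotsMetaReader()
    reader.feed(html_text)
    return "noindex" not in reader.directives


print(may_index('<HTML><HEAD><META name="robots" content="noindex"></HEAD></HTML>'))
```

A spider that ignores the convention simply never runs a check like this, which is why the tag alone is not a guarantee.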

However, not all spiders support this convention.  A more reliable solution is to add a robots.txt file to the root directory of your web server, listing the files and directories that must be excluded from indexing.  For example, your robots.txt might contain these lines:

  User-agent: *
  Disallow: /dont_index_me.html
  Disallow: /hidden_dir/

With these lines, no robot that follows the robots exclusion convention will scan the dont_index_me.html document or any document in the /hidden_dir/ directory.  For more information on robots exclusion, refer to http://info.webcrawler.com/mak/projects/robots/exclusion.html.
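Python's standard library includes a parser for robots.txt files, urllib.robotparser, which you can use to check how a well-behaved robot would interpret the example above.  The robot name ExampleBot and the host www.example.com are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt lines from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dont_index_me.html
Disallow: /hidden_dir/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse lines directly instead of fetching a URL

for path in ["/dont_index_me.html", "/hidden_dir/page.html", "/index.html"]:
    # can_fetch(useragent, url) applies the rules for the matching User-agent.
    print(path, rp.can_fetch("ExampleBot", "http://www.example.com" + path))
```

Since the rules apply to "User-agent: *", any robot name gives the same answers: the first two paths are refused and /index.html is allowed.  Keep in mind that robots.txt is advisory; it works only for robots that choose to read and obey it.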

 

Created: Sept. 19, 1997
Revised: Sept. 19, 1997

URL: http://www.webreference.com/dlab/books/html-pre/43-2-2.html