
HTML Unleashed PRE: Strategies for Indexing and Search Engines

How to Design for Search Engines

 
 

If you've read the previous chapter, you might have noticed the similarity between search spiders and people with disabilities: neither has access to anything but the text content of web pages.  Therefore, most of the HTML authoring recommendations from the chapter on disabilities apply to search-friendly design as well.

Providing text-only alternatives for every piece of information on your page is an obvious requirement because spiders only scan plain text (although, unfortunately, not all of them index alt texts of images).  Making your content fully comprehensible in text-only form may be difficult (it's like trying to persuade somebody by letter rather than in person, without the powerful "multimedia" of gestures and facial expressions), but it's really rewarding in the long run.

Preserving the logical flow of the text, rather than sacrificing it for the sake of layout tricks, is also very important.  This improves the chances that spiders will extract a good summary of your document, and it makes the text more suitable for automatic processing or categorizing.

Similarly, logical markup is an important requirement if you care about someone being able to use your document in any way, not just read it in a graphical browser.  Besides the spiders of the major search engines, a great number of various robots and indexers wander along the roads of the Web, and many of them rely on the logical tags, such as H1, for figuring out the structure of your data.
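To get a feel for what a text-only indexer has to work with, it can help to strip a page down to its text yourself.  The following is a minimal sketch in Python using the standard html.parser module; the class name TextOnlyView and the sample page are hypothetical, and real spiders certainly differ in which tags and attributes they honor.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects roughly what a text-only indexer could read from a page.

    Hypothetical helper for illustration; not any real spider's algorithm.
    """

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.text = []        # text fragments, in document order
        self.headings = []    # (tag, text) pairs found in heading elements
        self.missing_alt = 0  # images that carry no usable alt text
        self._heading = None  # heading tag currently open, if any
        self._skip = 0        # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.HEADINGS:
            self._heading = tag
            self.headings.append((tag, ""))
        elif tag == "img":
            alt = (dict(attrs).get("alt") or "").strip()
            if alt:
                self.text.append(alt)   # assumes the spider indexes alt text
            else:
                self.missing_alt += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == self._heading:
            self._heading = None

    def handle_data(self, data):
        chunk = " ".join(data.split())  # collapse runs of whitespace
        if not chunk or self._skip:
            return
        self.text.append(chunk)
        if self._heading:
            tag, so_far = self.headings[-1]
            self.headings[-1] = (tag, (so_far + " " + chunk).strip())


page = """<html><body>
<h1>Search Engine Design</h1>
<img src="logo.gif" alt="Company logo"><img src="spacer.gif">
<p>Plain text is what spiders read.</p>
</body></html>"""

view = TextOnlyView()
view.feed(page)
view.close()
print(view.headings)        # headings a structure-aware robot could use
print(" ".join(view.text))  # the page as plain text
print(view.missing_alt)     # images that contribute nothing to the index
```

If the plain-text output reads poorly or the heading list is empty, the page is probably leaning on layout rather than logical markup.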

 
 
 

Keyword Strategies

 
 

All searches on the Web are done via keywords, so probably the most important requirement is to make sure that your documents contain all the keywords that are likely to be used to find them.  Two distinct strategies can be outlined in this respect.

  1. The first idea that comes to mind is simple: The more keywords you cram into a page, the better.  Indeed, you can never predict what particular keywords will come to users' minds, so it's always a good idea to think about all possible synonyms, variants, generic inclusive terms, subterms, and related concepts for all the main subjects of your discourse.

    Besides, remember that the keywords can be entered in a different grammatical form, such as plural instead of singular for nouns.  Of the major search engines, only Alta Vista provides the "wildcard" notation to look for "table" or "tables" by specifying "table*".  So, you'd better see to it yourself by including both forms in your document.  (This problem is especially serious for languages other than English; for example, a verb in Russian may have up to 235 distinct forms.  Therefore, most Russian search engines, such as Aport mentioned earlier, by default employ word inflection algorithms that automatically match all word forms.)

    Finally, if your main keyword is a relatively common word (such as "search"), it is likely that practiced search users will employ the phrase searching feature to query for word combinations (such as "search engines") rather than single words.  Therefore, make sure that your document contains the most common collocations of the main keyword with closely related nouns, adjectives, verbs, and so on.

  2. However, you might also consider the opposite of the strategy of maximizing "keyword coverage" just described.  Remember that one of the factors in results ranking, as implemented by major search engines, is frequency, which is computed as the number of keyword occurrences divided by the document size.

    One consequence of this calculation is that if two documents contain the same keyword (located at the same distance from the top of document), the one that is smaller in size will get a higher ranking.  This gives you a clue: Select one of the root (introductory) pages on your site and try to make it as compact and concise as possible, so that it presents just the essence of your content with only the most common keywords.  This page will get a boost with respect to searches for these keywords, thereby attracting more hits to the entire site.
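The frequency factor from the second strategy is easy to estimate for your own pages.  The sketch below is in Python and assumes that document size is measured in words, which the text above does not specify; the function name keyword_frequency and the sample strings are hypothetical, and actual engines combine frequency with other factors.

```python
import re


def keyword_frequency(text, keyword):
    """Occurrences of keyword divided by document size.

    Assumption: document size is counted in words (an engine might
    count bytes or characters instead).
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


# Two pages with the same number of keyword occurrences, different sizes.
compact_page = "Search tips: how to search the Web"
long_page = (compact_page
             + " and many other loosely related topics covered at length here")

compact_score = keyword_frequency(compact_page, "search")
long_score = keyword_frequency(long_page, "search")

# The smaller page scores higher for the same keyword count.
print(compact_score > long_score)
```

Comparing the score of a compact root page against that of a longer page on the same topic shows how much of a boost trimming might buy.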

Thus, the best you can do is combine these two approaches by setting up both sorts of pages on your site: those with maximum keyword coverage and those with maximum relevance with respect to main keywords.
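For the keyword-coverage pages, a simple check can list which word forms and collocations are still absent from the text.  This is only a sketch in Python: the function missing_terms and the term list are hypothetical, and you have to supply the variants yourself, since the code does no inflection of its own.

```python
import re


def _normalize(text):
    """Lowercase the text and reduce it to space-separated words."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def missing_terms(text, terms):
    """Return the terms (single words or phrases) not found in text.

    Matching is on whole words, so "table" does not count as "tables".
    """
    padded = " " + _normalize(text) + " "
    return [t for t in terms if " " + _normalize(t) + " " not in padded]


page_text = "This table lists the major search engines."
wanted = ["table", "tables", "search engine", "search engines"]

missing = missing_terms(page_text, wanted)
print(missing)  # forms and phrases still to be worked into the page
```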

By the way, these two keyword strategies correspond to the two types of search queries, specific and general searches.  Some search engine users are looking for very specific information; they use rare keywords, phrase searches, and various advanced features such as Boolean operators.  It's these "power users" that your keyword-rich pages should appeal to.

Other users, however, just need to find a good resource covering some fairly general topic; they enter a couple of simple keywords, get an avalanche of results, and browse the first several links found.  For such general searches, web directories (such as Yahoo) usually perform better than search engines; however, a lot of users still employ search engines for the task.  The relevance boosting technique described above could be useful in attracting such users to your site.

You might be interested to see what keywords are entered most frequently by search engine users, to better align your keyword spectrum with the public preferences.  Unfortunately, this information (which would be immensely interesting from other viewpoints as well) is considered top-secret by major search engines; they never reveal their "top ten search words" lists, for (rather well-grounded) fear of spamming.

WebCrawler allows only a peek at the flow of search queries in real time, as they're entered on the search page.  However, minor search engines are usually less obsessed with confidentiality, and some of them show their search statistics (for example, a Russian search engine called Rambler presents its list of the top 100 search words).

The final piece of advice concerning keywords is rather obvious: Always check your spelling.  Spiders, in contrast to human readers, cannot "overlook" spelling errors, and you risk missing a good share of your potential audience by misspelling some important keyword.  This is especially relevant given that in most cases you add your keywords to a META tag after the document itself is written, edited, and probably spell-checked.
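One inexpensive safeguard is to compare the words in the META keywords against the words in the page itself and flag any that are close to, but not identical with, a word in the text.  The Python sketch below is a heuristic, not a spell checker: the names MetaAndText and suspicious_keywords, the comma-separated keyword list, and the similarity cutoff of 0.8 are all assumptions made for illustration.

```python
import difflib
import re
from html.parser import HTMLParser


class MetaAndText(HTMLParser):
    """Gathers META keywords and the set of words used in the page text."""

    def __init__(self):
        super().__init__()
        self.keywords = []       # entries from <meta name="keywords">
        self.page_words = set()  # words found in the text of the page
        self._skip = 0           # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Assumption: keywords are separated by commas.
            self.keywords += [k.strip().lower()
                              for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.page_words.update(re.findall(r"[a-z']+", data.lower()))


def suspicious_keywords(html, cutoff=0.8):
    """Map each META keyword word that looks like a misspelling of a page word."""
    parser = MetaAndText()
    parser.feed(html)
    parser.close()
    candidates = sorted(parser.page_words)
    report = {}
    for keyword in parser.keywords:
        for word in keyword.split():
            if word in parser.page_words:
                continue
            close = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
            if close:
                report[word] = close[0]
    return report


sample = ('<html><head><meta name="keywords" '
          'content="search engines, spyder, indexing"></head>'
          '<body><p>How a spider handles indexing for search engines.</p>'
          '</body></html>')

report = suspicious_keywords(sample)
print(report)  # keyword words that resemble, but do not match, a page word
```

A flagged word is only a hint to look again; a keyword may differ from the page text on purpose, for example as a synonym.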

 

Produced by Dmitry Kirsanov
Copyright Sams.net Publishing and



Created: Sept. 19, 1997
Revised: Sept. 19, 1997

URL: http://www.webreference.com/dlab/books/html-pre/43-2-1.html