HTML Unleashed PRE. Strategies for Indexing and Search Engines: How to Design for Search Engines
How to Design for Search Engines
If you've read the previous chapter, you might have noticed the similarity between search spiders and people with disabilities: neither can access anything but the text-only content of web pages. Therefore, most of the HTML authoring recommendations from the chapter on disabilities apply to search-friendly design as well.
Providing a text alternative for every piece of information on your page is an obvious requirement, because spiders scan only plain text (although, unfortunately, not all of them index the ALT text of images). Making your content fully comprehensible in text-only form may be difficult (it's like trying to persuade somebody by letter rather than in person, without the powerful "multimedia" of gestures and facial expressions), but it is rewarding in the long run.
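As a minimal sketch of this advice, the fragment below pairs each image with a text alternative. The file names, company name, and wording are hypothetical and serve only to illustrate the pattern; whether a given spider actually indexes ALT text varies, as noted above.

```html
<!-- Hypothetical file names and wording; only the pattern matters. -->

<!-- A logo: the ALT text says what the image stands for. -->
<IMG SRC="logo.gif" ALT="Acme Widgets home page" WIDTH="120" HEIGHT="40">

<!-- A chart carries information, so the same facts are also
     stated in ordinary text that any spider can read. -->
<IMG SRC="sales-chart.gif" ALT="Chart of widget sales by quarter">
<P>Widget sales grew in every quarter of the year; the chart
above shows the same figures graphically.</P>
```

Because not every spider indexes ALT text, repeating the essential facts in the body text is the safer choice for anything you want to be found.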
Preserving the logical flow of the text, rather than sacrificing it for the sake of layout tricks, is also very important. This improves the chances that spiders will extract a good summary of your document, and it makes the text more suitable for automatic processing or categorizing.
Similarly, logical markup is an important requirement if you care about someone being able to use your document in any way, not just read it in a graphical browser. Besides the spiders of the major search engines, a great number of various robots and indexers wander along the roads of the Web, and many of them rely on the logical tags, such as H1, for figuring out the structure of your data.
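To illustrate the difference, here is a small sketch contrasting physical and logical markup for the same heading. The heading text is invented for the example, and how a particular robot treats H1 is an assumption, since it depends on the robot.

```html
<!-- Physical markup: looks like a heading in a graphical browser,
     but tells a robot nothing about the document's structure. -->
<P><FONT SIZE="+2"><B>Choosing a Widget</B></FONT></P>

<!-- Logical markup: a robot that relies on logical tags can
     recognize the top-level heading and the subsection below it. -->
<H1>Choosing a Widget</H1>
<H2>Widget Sizes</H2>
<P>Widgets come in <EM>three</EM> sizes.</P>
```

Both versions may look similar on screen; only the second exposes the outline of the document to software that cannot see the rendering.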
All searches on the Web are done by keyword, so probably the most important requirement is to make sure that your documents contain all the keywords likely to be used to find them. Two distinct strategies can be outlined in this respect.
Thus, the best you can do is combine these two approaches by setting up both sorts of pages on your site: those with maximum keyword coverage and those with maximum relevance with respect to the main keywords.
By the way, these two keyword strategies correspond to the two types of search queries, specific and general searches. Some search engine users are looking for very specific information; they use rare keywords, phrase searches, and various advanced features such as Boolean operators. It's these "power users" that your keyword-rich pages should appeal to.
Other users, however, just need to find a good resource covering some fairly general topic; they enter a couple of simple keywords, get an avalanche of results, and browse the first several links found. For such general searches, web directories (such as Yahoo) usually perform better than search engines; however, a lot of users still employ search engines for the task. The relevance boosting technique described above could be useful in attracting such users to your site.
You might be interested to see what keywords are entered most frequently by search engine users, to better align your keyword spectrum with public preferences. Unfortunately, this information (which would be immensely interesting from other viewpoints as well) is considered top-secret by the major search engines; they never reveal their "top ten search words" lists, for (rather well-grounded) fear of spamming.
WebCrawler allows only a peek at the flow of search queries in real time, as they're entered on the search page. However, minor search engines are usually less obsessed with confidentiality, and some of them show their search statistics (for example, a Russian search engine called Rambler presents its list of the top 100 search words).
The final piece of advice concerning keywords is rather obvious: Always check your spelling. Spiders, in contrast to human readers, cannot "overlook" spelling errors, and you risk missing a good share of your potential audience by misspelling some important keyword. This is especially relevant because in most cases you add your keywords to a META tag after the document itself is written, edited, and probably spell-checked.
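For reference, a keywords META tag might look like the sketch below. The title and keyword values are hypothetical, and whether a given engine takes the tag into account is an assumption, since support differs from engine to engine.

```html
<HEAD>
<TITLE>Choosing a Widget</TITLE>
<!-- Hypothetical keywords. This text never appears on the rendered
     page, so it is easy to skip when spell-checking the document. -->
<META NAME="keywords" CONTENT="widgets, widget sizes, widget catalog">
<META NAME="description" CONTENT="A guide to choosing the right widget size.">
</HEAD>
```

Since the CONTENT values are invisible in the browser, it is worth proofreading them separately in the page source.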
Revised: Sept. 19, 1997