AlltheWeb versus Google - WebReference Update - 020620 | WebReference

((((((((((((((((( WEBREFERENCE UPDATE NEWSLETTER ))))))))))))))))) June 20, 2002

___________________________ Sponsors ________________________________
This newsletter sponsored by:
Informative Graphics
Instant Messaging Planet Fall 2002 Conference & Expo
_____________________________________________________________________

This week former WebReference expert columnist Rich Wiggins investigates AlltheWeb's claim that it has surpassed Google in index size. The search engine leapfrog continues. In other voices, MIT's Technology Review investigates bad software, and the WDVL looks at how rich media has evolved. In other news, Microsoft says it will reinstate Java in Windows XP this summer, and an anti-spam start-up wants your help.

New this week on WebReference and the Web:

1. FEATURE: AlltheWeb versus Google
2. OTHER VOICES:
   * Why Software Is So Bad
   * The Evolution of Rich Media
3. NET NEWS:
   * Microsoft to reinstate Java in Windows
   * Start-up wants your help to fight spam

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1. FEATURE: AlltheWeb Claims to Have Surpassed Google in Size of Index

Does size matter? One search technology company seems to think so, announcing that their Web index is now bigger than Google's.

Norwegian search technology company Fast Search & Transfer (FAST) claims that its search engine, AlltheWeb, now has an index larger than Google's. FAST says its index now comprises some 2.1 billion pages, whereas Google currently claims an index size of 2,073,418,204 pages.

FAST issued a press release touting the achievement, and also noting that its index now covers Adobe Acrobat formatted content. Dr. John M. Lervik, chief executive officer of FAST, said, "The index size of AlltheWeb demonstrates the unique scalability of our search products, while support for multiple file and language types displays our flexibility and universal search abilities. The integration of breaking news highlights our real-time indexing capabilities, while our advanced linguistics and categorization techniques automatically interpret the correct meaning of queries and connect users to the most relevant information."

It is a testament to Google's perceived dominance in the field that rivals issue press releases comparing themselves to the leading Web search engine. See, for instance, the August 2001 announcement of Teoma (InfoToday NewsBreak, August 20, 2001). In this case, however, FAST is not announcing an assault on the champ. Jami Axelrod, Senior Product Marketing Manager for FAST, says "Google is a consumer brand. AlltheWeb is our showcase - our technology sandbox." FAST sells search technology for intranet and extranet applications, and AlltheWeb exposes the technology to potential customers as well as the public at large.

I asked Axelrod if she had any sample searches that showed off AlltheWeb's prowess relative to Google. Indeed she did:

* The classic foil of many a search engine, "to be or not to be," yields nasty warnings from Google: 'The word "or" was ignored in your query - for search results including one term or another, use capitalized "OR" between words' and 'The following words are very common and were not included in your search: to be to be.' The resulting Google hit list is uncharacteristically full of irrelevant documents. AlltheWeb doesn't choke on the stop words, and does deliver relevant content.

* A search for "new york restaurants 57," Axelrod claims, does a better job on AlltheWeb of finding restaurants on 57th Street. (In practice it appears that both search engines find many false matches due to out-of-context uses of the number 57.)

* A search for a word with variant meanings showcases AlltheWeb's "Fast Topics" functionality. For instance, searching for "saturn" yields topical breakdowns for the automobile, the Sega video game, the planet, and astrological links. The topical breakdowns are somewhat reminiscent of Northern Light's Custom Search Folders, except that the FAST application relies on Open Directory Project categories. (Also, alas, Northern Light is no longer in the Web search business.)
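The stop-word behavior in the first sample search above can be illustrated with a minimal sketch. The stop-word list and both function names here are hypothetical, chosen only to mirror the warnings quoted above; neither engine's actual query processing is documented in this article.

```python
# Hypothetical stop-word list, chosen to mirror the quoted warnings
# ("or" ignored; "to be to be" not included in the search).
STOP_WORDS = {"to", "be", "or"}


def strip_stop_words(query: str) -> list[str]:
    """Drop very common words before searching."""
    return [term for term in query.lower().split() if term not in STOP_WORDS]


def keep_all_terms(query: str) -> list[str]:
    """Keep every word, so the whole phrase can still be matched."""
    return query.lower().split()


if __name__ == "__main__":
    query = "to be or not to be"
    print(strip_stop_words(query))  # only "not" survives
    print(keep_all_terms(query))    # all six words survive
```

With the stop words stripped, the only remaining term is "not", which would explain a hit list full of irrelevant documents.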

I asked Axelrod if she was sure that FAST and Google counted URLs the same way. She said "We did the best we could to come up with an apples-to-apples comparison." She noted that FAST technology de-duplicates before URLs go in the index, whereas Google de-duplicates at search time; she questioned whether Google's document count is therefore inflated, noting "We throw out three times as many pages as we index due to the duplicates we find."
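The difference between the two de-duplication strategies, and why it affects the reported count, can be sketched as follows. This is a toy illustration, not either company's method: the exact-match fingerprint (a hash of whitespace-normalized, lowercased text) and all function names are assumptions, and real engines presumably use more sophisticated near-duplicate detection.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Assumed duplicate test: hash of lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_index(pages):
    """De-duplicate at index time: duplicates never enter the index.

    pages is an iterable of (url, text) pairs.
    Returns (index, discarded), where index maps fingerprint -> first URL seen.
    """
    index = {}
    discarded = 0
    for url, text in pages:
        key = fingerprint(text)
        if key in index:
            discarded += 1  # thrown out, so it is not counted in the index size
        else:
            index[key] = url
    return index, discarded


def dedupe_hits(hits):
    """De-duplicate at search time: the index keeps every page,
    and duplicates are filtered only from the result list."""
    seen = set()
    unique_urls = []
    for url, text in hits:
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            unique_urls.append(url)
    return unique_urls


if __name__ == "__main__":
    pages = [("a", "Hello  World"), ("b", "hello world"), ("c", "Other page")]
    index, discarded = build_index(pages)
    print(len(index), discarded)   # index-time count excludes the duplicate
    print(len(pages), dedupe_hits(pages))  # search-time index still holds all pages
```

Under the first strategy the reported index size counts only unique pages; under the second, the stored page count includes duplicates even though users never see them.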

Asked to clarify what URL paths AlltheWeb follows, FAST engineer Frode Lundgren indicated that, while crawling, the spider will follow links from a static page to a dynamic one but, to avoid looping situations, will not follow links from one dynamic page to another.
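The crawl rule Lundgren describes can be written as a small predicate. This is a sketch under stated assumptions: the test for what counts as a "dynamic" page (a query string or a script-like file extension) is a guess, the function names are hypothetical, and allowing links from a dynamic page back to a static one is an assumption the article does not confirm.

```python
from urllib.parse import urlparse

# Assumed markers of a dynamically generated page.
DYNAMIC_EXTENSIONS = (".cgi", ".php", ".asp", ".jsp")


def is_dynamic(url: str) -> bool:
    """Guess whether a URL is dynamic: it has a query string or a script extension."""
    parsed = urlparse(url)
    return bool(parsed.query) or parsed.path.lower().endswith(DYNAMIC_EXTENSIONS)


def should_follow(source_url: str, target_url: str) -> bool:
    """Follow every link except dynamic -> dynamic, which risks endless loops."""
    return not (is_dynamic(source_url) and is_dynamic(target_url))


if __name__ == "__main__":
    static_page = "http://example.com/index.html"
    dynamic_page = "http://example.com/search?q=saturn"
    other_dynamic = "http://example.com/list.php"
    print(should_follow(static_page, dynamic_page))    # static -> dynamic
    print(should_follow(dynamic_page, other_dynamic))  # dynamic -> dynamic
```

Refusing dynamic-to-dynamic hops bounds the crawl depth into generated content to one step, at the cost of missing pages reachable only through chains of dynamic links.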

FAST's recent customers include the U.S. Government (for its citizen portal), eBay, Reuters (for a real-time news filter), and IBM. The company offers both hosted (ASP) and software licensing models to its customers.

Google officials, true to form, refused to take the bait from a competitor's comparison. Spokesperson Nate Tyler said "Google welcomes any company's efforts to enhance web [sic] search and hope that it will raise the awareness of the value of search engines in people's day-to-day lives."

It will be interesting to see if the index claims cause Google and other companies to update their reported index size more frequently. Close observers of the major contenders note that the odometer is manually updated, like the old McDonald's signs. Will we soon see animated odometers rolling over in real time as the indexes grow?

# # # #

About the author: Richard Wiggins is an author and speaker specializing in Internet topics. Wiggins writes for national publications such as New Media, Searcher, and Internet World. He serves on the editorial board of First Monday, a peer-reviewed e-journal about the Internet. He is author of the first book on Web publishing, "The Internet for Everyone: A Guide for Users and Providers" (McGraw-Hill, 1995). He is a former WebReference expert columnist.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2. OTHER VOICES: Why Software Is So Bad, The Evolution of Rich Media

* Why Software Is So Bad

For years we've tolerated buggy, bloated, badly organized computer programs. But soon, we'll innovate, litigate and regulate them into reliability.
MIT's Technology Review, July/August 2002

* The Evolution of Rich Media

Digital content is changing the face of application development. Conventional business platforms are not capable of managing the complex structure and relationships of digital assets. A rich-media platform is proposed.
WDVL, June 19, 2002

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3. NET NEWS: Microsoft to reinstate Java in Windows, Start-up wants your help to fight spam

* Microsoft to reinstate Java in Windows

In an about-face, Microsoft said Tuesday that it will reinstate the ability to run Java programs in Windows XP.
June 18, 2002

* Start-up wants your help to fight spam

Ordinary Web surfers could play a major role in stemming the rising tide of junk e-mail crippling the Net, if a new anti-spam company hits its mark.
June 19, 2002

That's it for this Thursday, see you next time.

Andrew King
Newsletter Editor, aking at internet dot com


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Advertising: If you are interested in advertising in our newsletters, call Claudia at 1-203-662-2863.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For details on becoming a Commerce Partner, contact David Arganbright on 1-203-662-2858.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This newsletter is published by Jupitermedia Corp - The Internet & IT Network
Copyright (c) 2002 Jupitermedia Corp. All rights reserved.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~