AlltheWeb versus Google - WebReference Update - 020620 | WebReference

((((((((((((((((( WEBREFERENCE UPDATE NEWSLETTER ))))))))))))))))) June 20, 2002

___________________________ Sponsors ________________________________

This newsletter sponsored by:
Informative Graphics
Instant Messaging Planet Fall 2002 Conference & Expo
_____________________________________________________________________

This week former WebReference expert columnist Rich Wiggins investigates AlltheWeb's claim that it has surpassed Google in index size. The search engine leapfrog continues. In Other Voices, MIT's Technology Review examines why software is so bad, and the WDVL looks at how rich media has evolved. In Net News, Microsoft says it will reinstate Java in Windows XP this summer, and an anti-spam start-up wants your help.

http://www.webreference.com *- link to us today
http://www.webreference.com/new/ *- newsletter home
http://www.webreference.com/new/submit.html *- submit article

New this week on WebReference.com and the Web:

1. FEATURE: AlltheWeb versus Google
2. OTHER VOICES:
   * Why Software Is So Bad
   * The Evolution of Rich Media
3. NET NEWS:
   * Microsoft to reinstate Java in Windows
   * Start-up wants your help to fight spam

Like what you see? Get our front page e-mailed to you every business day with our HTML newsletter. Just send an e-mail to:

mailto:subscribe-html@webreference.com

or for this text newsletter:

mailto:subscribe@webreference.com

Spread the word! Feel free to send a copy of this newsletter to your friends and colleagues, and while you're at it, snap a link to WebReference.com.

/-------------------------------------------------------------------\

"Because Net-It Central runs itself, I have time to work on four other major IT projects. With the automated process in Net-It Central for posting new materials, we all have more time to do other things." - Devon Johnston, Web Manager, DuPont Photomasks Inc.

2,000 Employees & 11 DuPont Photomasks facilities across North America, Asia and Europe are now able to communicate quickly and easily thanks to Net-It Central.

Free Evaluation: http://www.netit.com/WbRf.htm

\--------------------------------------------------------------adv.-/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1. FEATURE: AlltheWeb Claims to Have Surpassed Google in Size of Index

Does size matter? One search technology company seems to think so, announcing that their Web index is now bigger than Google's.

Norwegian search technology company Fast Search & Transfer (FAST) claims that their search engine, AlltheWeb.com, now has an index larger than that of Google. FAST says their index now comprises some 2.1 billion pages, whereas Google currently claims an index size of 2,073,418,204.

FAST issued a press release touting the achievement, and also noting that their index now covers Adobe Acrobat formatted content. In the release, Dr. John M. Lervik, chief executive officer of FAST, said: "The index size of AlltheWeb demonstrates the unique scalability of our search products, while support for multiple file and language types displays our flexibility and universal search abilities. The integration of breaking news highlights our real-time indexing capabilities, while our advanced linguistics and categorization techniques automatically interpret the correct meaning of queries and connect users to the most relevant information."

It is a testament to Google's perceived dominance in the field that rivals issue press releases comparing themselves to the leading Web search engine. See, for instance, the August 2001 announcement of Teoma (InfoToday NewsBreak, August 20, 2001, http://www.infotoday.com/newsbreaks/nb010820-2.htm ). In this case, however, FAST is not announcing an assault on the champ. Jami Axelrod, Senior Product Marketing Manager for FAST, says "Google is a consumer brand. AlltheWeb is our showcase - our technology sandbox." FAST sells search technology for intranet and extranet applications, and AlltheWeb exposes the technology to potential customers as well as the public at large.

I asked Axelrod if she had any sample searches that showed off AlltheWeb's prowess relative to Google. Indeed she did:

* The classic foil of many a search engine, "to be or not to be," yields two nasty warnings from Google. The first reads: 'The word "or" was ignored in your query - for search results including one term or another, use capitalized "OR" between words.' The second reads: 'The following words are very common and were not included in your search: to be to be.' The resulting Google hit list is uncharacteristically full of irrelevant documents. AlltheWeb doesn't choke on the stop words, and does deliver relevant content.

* A search for "new york restaurants 57," claims Axelrod, does a better job on AlltheWeb of finding restaurants on 57th Street. (In practice it appears that both search engines find many false matches due to out-of-context uses of the number 57.)

* A search for a word with variant meanings showcases AlltheWeb's "Fast Topics" functionality. For instance, searching for "saturn" yields topical breakdowns for the automobile, the Sega video game, the planet, and astrological links. The topical breakdowns are somewhat reminiscent of Northern Light's Custom Search Folders, except the FAST application relies on Open Directory Project categories. (Also, alas, Northern Light is no longer in the Web search business.)
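The stop-word behavior in the first sample search can be illustrated with a small sketch. This is not either engine's actual code: the stop-word list below is an assumption chosen to mirror the warnings Google displayed, and the function names are hypothetical. It shows why dropping common words leaves only "not" to match on, while matching the full phrase keeps the query meaningful.

```python
# Assumed stop-word list, chosen only to mirror the warnings quoted in the article.
STOP_WORDS = {"to", "be", "or"}


def strip_stop_words(query):
    """Return the query terms that survive stop-word removal."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


def keyword_match(terms, document):
    """True if every remaining term appears somewhere in the document."""
    words = set(document.lower().split())
    return all(term in words for term in terms)


def phrase_match(query, document):
    """True if the full query, stop words included, appears as a phrase."""
    q = query.lower().split()
    d = document.lower().split()
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


query = "to be or not to be"
relevant = "To be or not to be that is the question"
irrelevant = "Do not feed the animals"

remaining = strip_stop_words(query)
print(remaining)
print(keyword_match(remaining, irrelevant), phrase_match(query, irrelevant))
print(keyword_match(remaining, relevant), phrase_match(query, relevant))
```

With the stop words removed, the unrelated sentence matches as readily as the Hamlet line; the phrase match accepts only the latter.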

I asked Axelrod if she was sure that FAST and Google counted URLs the same way. She said "We did the best we could to come up with an apples-to-apples comparison." She noted that FAST technology de-duplicates before URLs go in the index, whereas Google de-dups at search time; she questioned whether Google's document count is not therefore falsely inflated, noting "We throw out three times as many pages as we index due to the duplicates we find."
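To see why the timing of de-duplication affects a reported document count, consider the following minimal sketch. It is an illustration only, not FAST's or Google's implementation: the fingerprinting scheme (a SHA-1 hash of whitespace- and case-normalized text), the function names, and the example URLs are all assumptions.

```python
import hashlib


def fingerprint(text):
    """Hash a normalized copy of the text so trivially different copies collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def index_time_dedup(pages):
    """Keep only the first URL seen for each distinct body before indexing."""
    index = {}
    for url, body in pages:
        index.setdefault(fingerprint(body), url)
    return index


def query_time_dedup(pages, term):
    """Store every URL; collapse duplicates only when answering a query."""
    seen, hits = set(), []
    for url, body in pages:
        fp = fingerprint(body)
        if term in body.lower() and fp not in seen:
            seen.add(fp)
            hits.append(url)
    return hits


# Hypothetical crawl: three copies of one document plus one unique page.
pages = [
    ("http://a.example/hamlet", "To be or not to be"),
    ("http://b.example/mirror", "to be   or not to be"),
    ("http://c.example/copy", "TO BE OR NOT TO BE"),
    ("http://d.example/saturn", "Saturn is a planet"),
]

deduped = index_time_dedup(pages)
print(len(pages), len(deduped))
print(query_time_dedup(pages, "not to be"))
```

Both approaches show the searcher a single hit for the duplicated text, but an index that stores every URL would report four documents where the de-duplicated index reports two.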

Asked to clarify which URL paths AlltheWeb follows, FAST engineer Frode Lundgren indicated that while crawling, the spider will follow links from a static page to a dynamic one but, to avoid looping situations, will not follow links from a dynamic page to another dynamic page.
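The link-following rule Lundgren describes can be sketched as a simple predicate. The test for what counts as a "dynamic" page (a query string or a script-like file extension) is an assumption made for illustration, as are the function names; the article does not say how FAST's crawler classifies pages, nor how it treats links from a dynamic page to a static one, which this sketch allows.

```python
from urllib.parse import urlparse

# Assumed heuristic: these extensions, or any query string, mark a page as dynamic.
DYNAMIC_SUFFIXES = (".cgi", ".asp", ".php", ".jsp", ".pl")


def is_dynamic(url):
    """Guess whether a URL is generated dynamically."""
    parts = urlparse(url)
    return bool(parts.query) or parts.path.lower().endswith(DYNAMIC_SUFFIXES)


def should_follow(source_url, target_url):
    """Follow every link except one leading from a dynamic page to a dynamic page."""
    return not (is_dynamic(source_url) and is_dynamic(target_url))


static_page = "http://example.com/index.html"
search_page = "http://example.com/search?q=saturn"
next_page = "http://example.com/search?q=saturn&page=2"

print(should_follow(static_page, search_page))  # static -> dynamic
print(should_follow(search_page, next_page))    # dynamic -> dynamic
```

Refusing dynamic-to-dynamic hops keeps a crawler from wandering through an endless chain of generated pages, such as calendar or session-ID links, at the cost of missing content reachable only through such chains.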

FAST's recent customers include the U.S. Government (for its citizen portal FirstGov.gov), eBay, Reuters (for a real-time news filter) and IBM. The company offers both hosted (ASP) and software licensing models to its customers.

Google officials, true to form, refused to take the bait from a competitor's comparison. Spokesperson Nate Tyler said "Google welcomes any company's efforts to enhance web [sic] search and hope that it will raise the awareness of the value of search engines in people's day-to-day lives."

It will be interesting to see if the index claims cause Google and other companies to update their reported index size more frequently. Close observers of the major contenders note that the odometer is manually updated, like the old McDonald's signs. Will we soon see animated odometers rolling over in real time as the indexes grow?

# # # #

About the author: Richard Wiggins is an author and speaker specializing in Internet topics. Wiggins writes for national publications such as New Media, Searcher, and Internet World. He serves on the editorial board of First Monday, a peer-reviewed e-journal about the Internet. He is author of the first book on Web publishing, "The Internet for Everyone: A Guide for Users and Providers" (McGraw-Hill, 1995). He is a former WebReference.com expert columnist. His site is at http://www.richardwiggins.com and he can be reached at: mailto:rich@richardwiggins.com.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2. OTHER VOICES: Why Software Is So Bad, The Evolution of Rich Media

* Why Software Is So Bad

For years we've tolerated buggy, bloated, badly organized computer programs. But soon, we'll innovate, litigate and regulate them into reliability.
http://www.technologyreview.com/articles/mann0702.asp
MIT's Technology Review, July/August, 2002

* The Evolution of Rich Media

Digital content is changing the face of application development. Conventional business platforms are not capable of managing the complex structure and relationships of digital assets. A rich-media platform is proposed.
http://wdvl.internet.com/Multimedia/RichMedia/
WDVL.com, June 19, 2002

/-------------------------------------------------------------------\

Don't miss Instant Messaging Planet Fall 2002 Conference & Expo coming to San Francisco on Sept 9-10. The first focused event of its kind, Instant Messaging Planet is the only industry forum that will bring together IM experts and professionals in order to exchange "IM in the Enterprise" ideas, strategies and IM success stories. For more information and to register today, visit http://www.intmediaevents.com/im/fall02.

\--------------------------------------------------------------adv.-/

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3. NET NEWS: Microsoft to reinstate Java in Windows, Start-up wants your help to fight spam

* Microsoft to reinstate Java in Windows

In an about-face, Microsoft said Tuesday that it will reinstate the ability to run Java programs in Windows XP.
http://news.com.com/2100-1001-937053.html
News.com, June 18, 2002

* Start-up wants your help to fight spam

Ordinary Web surfers could play a major role in stemming the rising tide of junk e-mail crippling the Net, if a new anti-spam company hits its mark.
http://news.com.com/2100-1023-937300.html
News.com, June 19, 2002

That's it for this Thursday, see you next time.

Andrew King
Newsletter Editor, WebReference.com
aking at internet dot com

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This newsletter is published by Jupitermedia Corp
http://internet.com - The Internet & IT Network
Copyright (c) 2002 Jupitermedia Corp. All rights reserved.

For information on reprinting or linking to internet.com content:
http://internet.com/corporate/permissions.html
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~