Internet Tutorials | John Faughnan | Robert Elson

Search Engines: Google, AltaVista and Profusion

  • Introduction
  • Google
  • AltaVista
  • Basic Searches
  • Adding Specificity
  • Profusion
  • Footnotes

    Introduction

    There are about 800 million web pages that are publicly accessible (Feb. 1999 [1]). This count excludes, for example, the New York Times and the Encyclopedia Britannica! Only the publicly accessible pages are ones that a search engine can find for you.

    Until 1999 the best search engine was AltaVista. It had about the widest coverage (16% [1],[2]), very good performance, and a powerful but slightly complex search language. When AltaVista failed, Profusion was a great way to try every other useful search engine.

    Then came Google, and everybody else became history.

    This document introduces Google, and explains how to use AltaVista and Profusion on the occasions when Google doesn't succeed.

    Google

    Google, at http://www.google.com, is the best search engine yet built. It excels at finding things that can be named. If you want to find a combination of things that are hard to name (like a line from a poem whose name you forget), use AltaVista.

    Google is very simple to use. You enter the name of what you're looking for. If the name has more than one word, put quotes (") around the words (e.g. "Lyme arthritis"). That's about all you'll ever need.
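
    If you ever want to build such a search from a script, the quoting rule above can be sketched as follows. This is only an illustration: the function name google_query_url is made up for this example, and the /search?q= layout of the URL is an assumption that does not come from this tutorial.

```python
from urllib.parse import urlencode


def google_query_url(name):
    """Build a search URL for a name, quoting multi-word names as a phrase."""
    # A name with more than one word is wrapped in double quotes.
    query = f'"{name}"' if len(name.split()) > 1 else name
    # Assumed URL layout; urlencode percent-encodes the quotes and spaces.
    return "http://www.google.com/search?" + urlencode({"q": query})


print(google_query_url("Lyme arthritis"))
# http://www.google.com/search?q=%22Lyme+arthritis%22
```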

    Google is somewhat magical [3].

    AltaVista

    AltaVista is harder to use than Google, and search results often have much more 'garbage' in them. However it indexes almost twice as many pages[1] and it can be used when you can't name what you're looking for, but you know some words that are likely to be on the page that you want.

    These notes are based on AltaVista's online documentation for "simple queries". The query lines set on their own below are what you'd type into the entry form on the AltaVista page. Try out the examples!

    Basic Searches

    paris "petite galerie" louvre

    Finds documents containing as many of these words and phrases as possible, ranked so that documents with the most matches are presented first. A phrase is any string of adjacent words. The preferred way to link words into a phrase is to use quotes.
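
    To make the ranking idea concrete, here is a toy sketch in Python. It is not AltaVista's actual algorithm, which this tutorial does not describe. It simply counts how many of the query's words and phrases occur in each document and sorts by that count. The function names are hypothetical.

```python
import re
import shlex


def count_matches(query, text):
    """Count how many query words or quoted phrases occur in the text."""
    # shlex.split keeps a quoted phrase together as one term.
    terms = shlex.split(query)
    return sum(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None
        for term in terms
    )


def rank(query, documents):
    """Return the documents with at least one match, best match first."""
    matched = [doc for doc in documents if count_matches(query, doc) > 0]
    return sorted(matched, key=lambda doc: count_matches(query, doc), reverse=True)
```

    A document containing all of paris, "petite galerie" and louvre would sort ahead of one that only mentions paris and louvre.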

    Lower-case search will find matches of capitalized words also. For example, paris will find matches for paris, Paris, and PARIS.

    Capital letters in a search will force an exact case match on the entire word. For example, submitting a query for parIS will search only for matches of parIS. (Don't be surprised if there are none.)
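
    The two case rules above fit in a few lines of Python. This is a minimal sketch of the rule as stated here, not AltaVista's code, and term_matches is a hypothetical name.

```python
import re


def term_matches(term, text):
    """Apply the case rule: all-lower-case terms ignore case, others match exactly."""
    # Any capital letter in the term forces an exact case match on the whole word.
    flags = re.IGNORECASE if term.islower() else 0
    return re.search(r"\b" + re.escape(term) + r"\b", text, flags) is not None
```

    With this rule, paris matches a page containing PARIS, but parIS matches only a page containing parIS.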

    +noir +film -"pinot noir"

    Matches may be required or prohibited. Precede a required word or phrase with + and a prohibited one with -. This query finds documents that contain both film and noir, but do not contain the phrase pinot noir.
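
    The + and - rules can be sketched as a simple filter. The code below is an illustration under the assumption that matching is done on whole words and ignores case. The names contains and passes are hypothetical.

```python
import re
import shlex


def contains(text, term):
    """True if the word or phrase occurs in the text (case ignored)."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None


def passes(query, text):
    """Check the required (+) and prohibited (-) terms of a query against a text."""
    # shlex.split turns -"pinot noir" into the single token '-pinot noir'.
    for token in shlex.split(query):
        if token.startswith("+") and not contains(text, token[1:]):
            return False  # a required term is missing
        if token.startswith("-") and contains(text, token[1:]):
            return False  # a prohibited term is present
    return True
```

    For the query above, a page about "a film noir retrospective" passes, while a page about "pinot noir in film" is rejected.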

    quilt*

    This query matches pages that contain at least one word beginning with quilt, such as quilt, quilts, quilting, quilted or quilter. Hint: The *-notation is also useful for searching for variant spellings.
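
    One way to picture the *-notation is as a translation into a regular expression. The sketch below assumes that * stands for any run of letters or digits. That is my reading of the description above, not a statement of AltaVista's exact rules, and wildcard_to_regex is a hypothetical name.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern such as quilt* into a compiled regular expression."""
    # Escape the pattern, then let the escaped * match any run of word characters.
    body = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


matcher = wildcard_to_regex("quilt*")
print(matcher.findall("Quilts, quilting and a quilter; no kilts."))
# ['Quilts', 'quilting', 'quilter']
```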

    Adding Specificity

    It is possible to restrict searches to certain portions of documents by using the following syntax. The keyword (link, title, image,...) should be in lower-case, and immediately followed by a colon.

    title:"The Wall Street Journal"

    Matches pages with the phrase The Wall Street Journal in the title. This is very useful!

    url:home.html

    Matches pages with the words home and html together in the page's URL. Equivalent to url:"home html".

    host:digital.com

    Matches pages with the phrase digital.com in the host name of the Web server.
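
    The three restrictions above can be imitated on a single page record. This is a rough sketch only: the page is a hypothetical dictionary with "title" and "url" entries, punctuation is treated as a word separator (which is why url:home.html behaves like url:"home html"), and field_match is a made-up name.

```python
import re
from urllib.parse import urlparse


def words(text):
    """Split text into lower-case words, treating punctuation as separators."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(haystack, needle):
    """True if the words of needle appear next to each other in haystack."""
    h, n = words(haystack), words(needle)
    return any(h[i:i + len(n)] == n for i in range(len(h) - len(n) + 1))


def field_match(query, page):
    """Evaluate a title:, url: or host: query against one page."""
    field, _, value = query.partition(":")
    value = value.strip('"')
    if field == "title":
        target = page["title"]
    elif field == "url":
        target = page["url"]
    elif field == "host":
        # Only the host name part of the URL is searched.
        target = urlparse(page["url"]).hostname or ""
    else:
        raise ValueError("unsupported keyword: " + field)
    return contains_phrase(target, value)
```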

    Profusion

    If Google and AltaVista both fail, I usually use the Profusion MetaSearch engine <http://www.profusion.com/>. This tool combines searches across multiple engines.
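
    The idea of a metasearch can be sketched in a few lines. How Profusion actually merges results is not described here, so the interleave-and-drop-duplicates strategy below is purely an assumption, and the engines are hypothetical functions that each return a ranked list of URLs.

```python
from itertools import zip_longest


def metasearch(query, engines):
    """Query every engine and interleave the results, dropping duplicate URLs.

    engines maps an engine name to a function that takes a query string
    and returns a ranked list of URLs (a hypothetical interface).
    """
    result_lists = [search(query) for search in engines.values()]
    merged, seen = [], set()
    # Take the first hit of every engine, then the second of every engine, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

    Interleaving keeps each engine's top hits near the top of the combined list, which is one simple way to avoid favoring any single engine.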

    Footnotes

    [1] Lawrence S, Giles CL. Accessibility of Information on the web. Nature July 8, 1999.
    [2] Search engines used to cover about half of the public web. At the rate they're falling behind they will soon drop below 10%.
    [3] Google uses principles similar to neural network design. Simplistically, it finds the "best" pages by looking at the pages that the "best" pages link to. If this seems circular, don't worry, it is. But that's how neural networks (such as the human mind) function. The resulting intelligence is sometimes unnerving.
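
    For the curious, the circular idea in [3] can be made concrete. The sketch below is a simplified version of the link-analysis method publicly known as PageRank, not Google's actual implementation. The damping value of 0.85, the fixed number of passes and the name rank_pages are assumptions made for this example. Every page starts with an equal score, and on each pass a page hands its score to the pages it links to.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages so that a page linked to by high-scoring pages scores high.

    links maps each page to the list of pages it links to. Every linked
    page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small base score regardless of its links.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            # A page with no outgoing links spreads its score over all pages.
            receivers = targets if targets else pages
            share = damping * score[page] / len(receivers)
            for target in receivers:
                new_score[target] += share
        score = new_score
    return score
```

    The circularity resolves itself because the scores are recomputed repeatedly until they settle. In a tiny web where pages a and b both link to c, and c links back to a, page c ends up with the highest score and b, which nobody links to, with the lowest.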

    Last Revised: 01 Feb 2002. Author: John G. Faughnan M.D. and Robert Elson M.D. Disclaimer: The views and opinions expressed in this and related pages are strictly those of the page authors. Anyone may link to or print out any of these pages.