Searching the Internet

General idea:

  1. First look for a category in a directory like Yahoo or Google Directory. [Human-built, limited coverage, nicely structured]
  2. If no category, use the flat list of hits from a search engine like Google or Alta Vista. [Machine-built, wide coverage, but unstructured]
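
The two steps above can be sketched in a few lines of Python. This is only an illustration: the `DIRECTORY` and `PAGES` data, the category path, and the function name `find_starting_point` are all made up here, and real directories and search engines are far more elaborate.

```python
# Hypothetical, hand-made data for illustration only.
# A "directory": a human-built map from topic to a category path.
DIRECTORY = {
    "shakespeare": "Arts > Literature > Shakespeare",
}

# A "search engine" collection: machine-gathered pages, no structure.
PAGES = {
    "page-a": "notes on shakespeare and his sonnets",
    "page-b": "history of a small local drama society",
    "page-c": "local drama society performs shakespeare",
}


def find_starting_point(topic):
    """Step 1: look for a directory category. Step 2: fall back to flat hits."""
    key = topic.lower()
    if key in DIRECTORY:
        return ("category", DIRECTORY[key])
    # No category: return every page that contains all the query words.
    words = key.split()
    hits = sorted(
        name for name, text in PAGES.items()
        if all(w in text.split() for w in words)
    )
    return ("hits", hits)


print(find_starting_point("Shakespeare"))
print(find_starting_point("drama society"))
```

A well-known topic comes back as a single, structured category; an obscure or local one comes back as an unstructured list of pages that you have to sift through yourself.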

For a topic about which there is a lot of information, like "Shakespeare":

For more obscure, or more localised, topics:

Directories v. Search engines

Directories - Hand-built. Hierarchical structure. Information is nicely organised.

Search engines - Machine-built. Unordered list of sites. Much more disorganised. But because they are machine-built, they index millions more pages than a directory. Directories like Yahoo, because they are built by hand, will always lag behind.

Directories - Only list the home page of each site.

Search engines - May list every single page on a site. Sometimes this is an advantage, sometimes a huge disadvantage.

Directories - Best for well-known, universal topics, where you want a good place to start on the Web. Also good for finding pages to link to as starting points for your readers.

Search engines - Best for obscure, once-off, heavy-duty, user-driven searches.

Tips on Yahoo

Yahoo itself can be confusing. Type "CGI" and you get a page of hits that mixes categories and actual sites:

What we really want to know, though, is whether there is a dedicated category for CGI. Near the top of that list of hits is the category we are looking for:

This category page, rather than the search results, is where we want to start our exploration of CGI. It is also the kind of page that is good to link to if you want to give your users a starting point for CGI.

Tips on search engines

Read the help page. Use all options.

Alta Vista has Boolean logic:

valera AND collins
valera OR collins
title:"de valera"

url:dcu.ie AND linux
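
To see what these queries mean, here is a minimal sketch in Python that treats each query term as a set of matching documents, so that AND becomes set intersection and OR becomes set union. The tiny `DOCS` collection (including its page paths and text) and the helper names `word`, `title` and `url` are invented for illustration; this is not how Alta Vista itself is implemented.

```python
# Hypothetical mini-collection; the pages and their text are invented.
DOCS = [
    {"url": "http://example.org/history/treaty",
     "title": "De Valera and the Treaty",
     "text": "de valera opposed the treaty collins signed it"},
    {"url": "http://example.org/bio/collins",
     "title": "Michael Collins",
     "text": "collins led the negotiations"},
    {"url": "http://www.dcu.ie/linux-help",
     "title": "Linux help",
     "text": "installing linux on campus machines"},
    {"url": "http://www.dcu.ie/courses",
     "title": "Courses",
     "text": "list of courses"},
]


def word(w):
    """Set of document numbers whose text contains the word w."""
    return {i for i, d in enumerate(DOCS) if w.lower() in d["text"].lower().split()}


def title(phrase):
    """Set of document numbers whose title contains the phrase."""
    return {i for i, d in enumerate(DOCS) if phrase.lower() in d["title"].lower()}


def url(fragment):
    """Set of document numbers whose URL contains the fragment."""
    return {i for i, d in enumerate(DOCS) if fragment.lower() in d["url"].lower()}


both = word("valera") & word("collins")      # valera AND collins
either = word("valera") | word("collins")    # valera OR collins
in_title = title("de valera")                # title:"de valera"
dcu_linux = url("dcu.ie") & word("linux")    # url:dcu.ie AND linux

print(both, either, in_title, dcu_linux)
```

AND narrows a search (fewer, more specific hits), while OR widens it. Restricting a term to the title or the URL is another way to narrow the results.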


  1. Find on the Web:
    1. Pi to 1000 digits
    2. Darwin's grandfather's birthday
    3. The Treaty Debates of 1921-22
    4. The teachings of Scientology
    5. What movies are on in Dublin now
    6. A graph of e^x
    7. Aerial pictures of the North Korean concentration camps
    8. Copies of any 18th or 19th century newspaper
    9. Transcripts of any 18th or 19th century trial
    10. U.S. Election ads from any election before 2004
    11. List of pirate radio stations in Ireland

  2. For each item above, what is the best hit to link to? For example: an authoritative site, a site that links to other sites, or a site that will still exist in 10 years' time.

  3. Suggest something that cannot be found on the Web.

  4. Suggest something that cannot be found on the Web for market reasons.

  5. Suggest something that cannot be found on the Web for copyright reasons.

  6. Suggest something that cannot be found on the Web for privacy reasons.

  7. Suggest something that cannot be found on the Web for logistical reasons.

  8. Suggest something that cannot be found on the Web, and that we could be waiting 500 years to see online.

