Dr. Mark Humphrys

School of Computing. Dublin City University.



Search engine


Overview

  1. Write an offline search engine that searches your web pages and produces an offline output web page where you can click on links.
  2. The search engine will be an offline version of my online search engine.
  3. You may not have web pages, so we will test it on a sample corpus of the works of Shakespeare in one of my directories.

Test corpus

  1. We will test it on the works of Shakespeare from the University of Adelaide.
  2. I have a copy here:
    file:///users/gdf1/mhtest09/share/shakespeare/home.html

  3. Note I am sharing this through the shared file system, not through http.
    Permissions need to be:
    drwx--x--x    /users/gdf1/mhtest09
    drwxr-xr-x    /users/gdf1/mhtest09/share
    (Q. Why?)

  4. Note you may have to paste the file:// link into your browser address bar.
    Firefox does not allow links from http:// to file://
    Once you are in file:// mode however, you can follow links from file:// to file://
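The permissions listed in item 3 can be set with chmod. A minimal sketch, assuming you want to share a directory of your own in the same way: share_dir is a hypothetical helper name, and the modes 711 and 755 are just the octal forms of drwx--x--x and drwxr-xr-x.

```shell
#!/bin/sh
# share_dir TOP: hypothetical helper, not part of the assignment.
# Sets the two permission patterns shown above on TOP and TOP/share.
share_dir() {
    top=$1
    chmod 711 "$top"          # drwx--x--x
    chmod 755 "$top/share"    # drwxr-xr-x
    ls -ld "$top" "$top/share"
}
```

For example, share_dir "$HOME" would apply these modes to your home directory and its share subdirectory, then list both so you can check the result.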


For pass mark

  1. Call it gweb ("grep web").
    gweb string
  2. It searches the test corpus for the input string:
     
       cd /users/gdf1/mhtest09/share/shakespeare
       grep -i string  */*html    
    

  3. Use <pre> and sed as in the online version.

  4. The script sends its final output into an (offline) output web page:
    $HOME/tmp/gweb.output.html
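The pass-mark steps above can be sketched as follows. This is one possible shape, not the reference solution. Assumptions: the online version is not shown here, so the sed step (escaping &, < and > so that HTML inside a hit is displayed as text within <pre>) is a guess at what it does; the CORPUS and OUT variables and their environment overrides are hypothetical, added only to make the script easy to test.

```shell
#!/bin/sh
# gweb: sketch of the pass-mark version.
# CORPUS and OUT are hypothetical variables; override them to test elsewhere.
CORPUS="${CORPUS:-/users/gdf1/mhtest09/share/shakespeare}"
OUT="${OUT:-$HOME/tmp/gweb.output.html}"

gweb() {
    mkdir -p "$(dirname "$OUT")"
    {
        echo "<html><body><pre>"
        # Run grep inside the corpus directory so file names are relative.
        # sed escapes &, < and > so tags in the hits show as plain text.
        ( cd "$CORPUS" && grep -i -- "$1" */*html ) |
            sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
        echo "</pre></body></html>"
    } > "$OUT"
}

# Only search when a string was given on the command line.
if [ "$#" -gt 0 ]; then
    gweb "$1"
fi
```

After running gweb northumberland, open the output file in your browser through a file:// address.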

For full marks

The above is for a pass mark. For full marks, make the files clickable.
  1. The basic grep above gives output like this:
    file.html: hit

  2. To make the files clickable, pipe the output to a second script, which does this:
     
    
    while IFS= read -r line
    do
      # grep output is "file:hit"; split at the first colon
      file=$(echo "$line" | cut -f1  -d:)
       hit=$(echo "$line" | cut -f2- -d:)

      echo "<a href=\"$file\">$file</a>: $hit <br>"
    done
      
    

  3. You can now click on hits in the output page to see them (offline).
  4. You will have to adjust the href address if you are to click on links to my files from an output file in your directory.
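One way to adjust the href, as item 4 requires, is to prefix each file name with the corpus location as a file:// address. A sketch under assumptions: linkify and BASE are hypothetical names, and shell parameter expansion is used here in place of cut to split each line at the first colon.

```shell
#!/bin/sh
# BASE is a hypothetical variable: the corpus directory as a file:// address.
BASE="${BASE:-file:///users/gdf1/mhtest09/share/shakespeare}"

# linkify: read "file:hit" lines on stdin, write one HTML link per line.
linkify() {
    while IFS= read -r line
    do
        file=${line%%:*}    # text before the first colon
        hit=${line#*:}      # text after the first colon
        printf '<a href="%s/%s">%s</a>: %s <br>\n' "$BASE" "$file" "$file" "$hit"
    done
}
```

The grep output would be piped through it, for example: grep -i string */*html | linkify. Because the href is absolute, the links work from an output page in any directory.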


Test

  1. For example:
    gweb northumberland
    will show all lines in the corpus where "northumberland" appears, regardless of case.



On Internet since 1987.