- Write an offline search engine to search your web pages and produce an offline output web page where you can click on links.
- The search engine will be an offline version of
my online search engine.
- You may not have web pages, so we will test it on a sample corpus of the works of Shakespeare
in one of my directories.
- We will test it on the works of Shakespeare from the University of Adelaide.
I have a copy here:
- Note: you may have to paste the file:// link into your browser address bar.
Firefox and Chrome do not allow links from http:// to file://.
Once you are in file:// mode, however, you can follow links from file:// to file://.
- Note: I am sharing this through the shared file system, not through http.
Permissions need to be:
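Because Firefox and Chrome will not follow http:// links to file://, it can help to build the file:// URL of a local page yourself and paste it into the address bar. The following is a minimal sketch, assuming a Bash shell and a path containing no characters that need percent-encoding (spaces, `#`, `?`). The function name `file_url` and the file name `gweb-output.html` are hypothetical:

```shell
#!/bin/bash
# Print an absolute file:// URL for a local file, suitable for pasting
# into the browser address bar.
# Assumption: the path has no characters that need percent-encoding.
file_url() {
  local f=$1
  local dir
  # Resolve the directory part to an absolute, physical path.
  # Only the directory has to exist, not the file itself.
  dir=$(cd "$(dirname "$f")" && pwd -P) || return 1
  printf 'file://%s/%s\n' "$dir" "$(basename "$f")"
}

# Example: a hypothetical output page in the current directory.
file_url ./gweb-output.html
```

The sketch resolves the path with `cd` and `pwd -P` rather than `realpath`, since `realpath` may be missing on older systems.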
For a pass mark
- Call it gweb ("grep web").
- It searches the test corpus for the input string:
grep -i string */*html
as in the online version.
N.B. You must find and delete the parts of the online version that are irrelevant to the offline version.
- The script sends its final output into an (offline) output web page:
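To see the shape of what `grep -i string */*html` returns, here is a minimal sketch that builds a throwaway two-file corpus in a temporary directory and searches it. The directory names, file contents and the `search_demo` function are hypothetical stand-ins, not the real Shakespeare corpus:

```shell
#!/bin/bash
# Build a tiny hypothetical corpus in a temp directory (NOT the real
# Shakespeare corpus), search it the same way, then clean up.
search_demo() {
  local tmp
  tmp=$(mktemp -d) || return 1
  mkdir -p "$tmp/henry4" "$tmp/richard2"
  printf 'Enter NORTHUMBERLAND\nExit all\n' > "$tmp/henry4/act1.html"
  printf 'My lord of Northumberland\n' > "$tmp/richard2/act2.html"

  # -i makes the match case-insensitive. Because the glob expands to
  # more than one file, grep prefixes each hit with "filename:".
  ( cd "$tmp" && grep -i northumberland */*html )

  rm -rf "$tmp"
}

search_demo
```

On this toy corpus it prints `henry4/act1.html:Enter NORTHUMBERLAND` and `richard2/act2.html:My lord of Northumberland`. The `filename:` prefix is the colon that the full-marks step below has to split on.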
For full marks
The above is for a pass mark.
For full marks, make the files clickable.
- The basic grep above gives output like this:
- To make the files clickable, pipe the output to a second script, which does this:
while read line
do
  file=`echo "$line" | [CUT BEFORE THE COLON]`
  hit=`echo "$line" | [CUT AFTER THE COLON]`
  echo "[LINKABLE FILENAME]: $hit <br>"
done
The bits in capital letters inside square brackets you need to work out yourself!
- You can now click on hits in the output page to see them (offline).
- You will have to adjust the href address if you want to follow links to my files
from an output file in your own directory.
- For example:
will show all lines in the corpus where "northumberland" appears in any case.
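The second script is a filter: it reads whatever is piped into it one line at a time with `while read line`. As a hedged illustration of that loop shape only, here is a sketch of a generic filter that numbers each piped line and appends `<br>`. It deliberately does not split on the colon, since that part is left as the exercise. The function name `number_lines` is hypothetical. It uses `IFS= read -r`, a common hardening of plain `read line`: `IFS=` keeps leading and trailing spaces, and `-r` stops backslashes in the input from being treated as escapes.

```shell
#!/bin/bash
# Generic line-by-line filter: the same loop shape as the second script,
# but it only numbers each input line and appends <br>.
number_lines() {
  local n=0 line
  # IFS= preserves leading/trailing whitespace; -r keeps backslashes literal.
  while IFS= read -r line
  do
    n=$((n + 1))
    printf '%d: %s <br>\n' "$n" "$line"
  done
}

# Usage: pipe any line-oriented output into it.
printf 'first hit\nsecond hit\n' | number_lines
```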