Dr. Mark Humphrys

School of Computing. Dublin City University.


Link checker


Write a Java program to:
  1. Take a URL as a command-line argument.
  2. Catch errors if the URL is malformed or cannot be found.
  3. If the URL is good, download the page.
  4. Extract all links in the page.
    • See Parsing HTML with Java.
  5. Find all broken links.
    • For this exercise, we will narrowly define a "broken" link as any link with an HTTP return code of 404, or a link that times out.
    • For timeout settings see Networking Properties.
  6. Output is a web page:
    • Output the list of broken links to a web page that you can browse (offline) and click on the links.
    • Use this for debugging. If your program claims the link is broken, you can test it here.
    • Do not bother listing any links to Google.
    • Only list URLs that return code 404 or time out. Do not list other URLs.
    • Remove all duplicates.
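
The steps above could be sketched roughly as follows. This is only an outline under stated assumptions, not the expected solution: the class name LinkCheckerSketch, the 5-second timeout, the use of HEAD requests, and the regular expression standing in for a proper HTML parser are all choices made for illustration. Timeouts are set per connection here, rather than through the system properties described in Networking Properties.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; an outline only, not the expected solution.
class LinkCheckerSketch {

    // Assumed value in milliseconds; the exercise does not fix a timeout.
    static final int TIMEOUT_MS = 5000;

    // Simplification: a regex stands in for a real HTML parser.
    static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Open an HTTP(S) connection with connect and read timeouts set.
    static HttpURLConnection open(URI uri) throws IOException {
        URLConnection c = uri.toURL().openConnection();
        if (!(c instanceof HttpURLConnection)) {
            throw new MalformedURLException("Not an HTTP(S) URL: " + uri);
        }
        HttpURLConnection conn = (HttpURLConnection) c;
        conn.setConnectTimeout(TIMEOUT_MS);
        conn.setReadTimeout(TIMEOUT_MS);
        return conn;
    }

    // Download the page body, assuming it is UTF-8 text.
    static String download(URI page) throws IOException {
        HttpURLConnection conn = open(page);
        int code = conn.getResponseCode();
        if (code != HttpURLConnection.HTTP_OK) {
            throw new IOException("HTTP " + code + " for " + page);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Collect absolute http/https links; the set drops duplicates and keeps order.
    static Set<String> extractLinks(String html, URI base) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI link = base.resolve(m.group(1).trim());
                String scheme = link.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(link.toString());
                }
            } catch (IllegalArgumentException e) {
                // href is not a valid URI; this sketch simply skips it
            }
        }
        return links;
    }

    // The exercise's narrow definition: only 404 counts as a broken status.
    static boolean isBrokenStatus(int code) {
        return code == HttpURLConnection.HTTP_NOT_FOUND;
    }

    // Broken means 404 or a timeout; any other failure is not reported.
    static boolean isBroken(String link) {
        try {
            HttpURLConnection conn = open(URI.create(link));
            conn.setRequestMethod("HEAD"); // assumption: skip downloading the body
            return isBrokenStatus(conn.getResponseCode());
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java LinkCheckerSketch <url>");
            return;
        }
        try {
            URI page = new URI(args[0]);
            for (String link : extractLinks(download(page), page)) {
                if (isBroken(link)) {
                    System.out.println("BROKEN: " + link);
                }
            }
        } catch (URISyntaxException | IllegalArgumentException | MalformedURLException e) {
            System.err.println("Bad URL: " + e.getMessage());
        } catch (IOException e) {
            System.err.println("Could not fetch page: " + e.getMessage());
        }
    }
}
```

With Java 11 or later this can be run directly, for example: java LinkCheckerSketch.java http://humphrysfamilytree.com/links.html. Some servers answer HEAD differently from GET, so a link reported here should still be checked by hand in the browser.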



Test on these URLs:

Your final output should demonstrate your program working on these URLs:

http://computing.dcu.ie/~humphrys/computers.internet.links.html
http://computing.dcu.ie/~humphrys/news.links.html
http://humphrysfamilytree.com/links.html
http://humphrysfamilytree.com/sources.html
http://humphrysfamilytree.com/sources.local.html


To hand up:

See What to hand up. Include a printout of the output table when run on the URLs above.
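
One way the output table might be produced is sketched below. The class name BrokenLinkReport, the file name broken.html, the sample data in main, and the rule that a link counts as a Google link if its host name contains "google" are all assumptions for illustration; the exercise does not define them.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical class name; one possible shape for the output page.
class BrokenLinkReport {

    // Escape characters that are special in HTML text and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Assumption: a link is "to Google" if its host name contains "google".
    static boolean isGoogle(String link) {
        try {
            String host = URI.create(link).getHost();
            return host != null && host.toLowerCase().contains("google");
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URI; treat as non-Google
        }
    }

    // Build a browsable page: one clickable row per unique, non-Google link.
    static String build(String pageUrl, Collection<String> brokenLinks) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>Broken links</title></head><body>\n");
        html.append("<h1>Broken links in ").append(escape(pageUrl)).append("</h1>\n");
        html.append("<table border=\"1\">\n");
        // LinkedHashSet removes duplicates while keeping first-seen order.
        for (String link : new LinkedHashSet<>(brokenLinks)) {
            if (isGoogle(link)) {
                continue;
            }
            String safe = escape(link);
            html.append("<tr><td><a href=\"").append(safe).append("\">")
                .append(safe).append("</a></td></tr>\n");
        }
        html.append("</table>\n</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data, only to show the call.
        List<String> broken = List.of(
                "http://example.com/missing.html",
                "http://example.com/missing.html",
                "http://www.google.com/nothing");
        Files.writeString(Path.of("broken.html"), build("http://example.com/", broken));
    }
}
```

Opening broken.html in a browser then gives a page where each reported link can be clicked and checked by hand.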



On Internet since 1987.