Dr. Mark Humphrys

School of Computing. Dublin City University.



My "404 Not Found" Handler

Our main webserver www.computing.dcu.ie is a UNIX/Linux server with a case-sensitive file system.
As a result, a direct reference to a file with the wrong case will fail and give "404 Not Found".

Here are three different approaches to this, taken by three different kinds of user:


User | Relationship between URL and file system | What happens if case is wrong | Example of correct case | Example of incorrect case
Sysadmins | Indirect. Content management system. URL seems to be address of an object in its database. | Content management system can implement its own case-insensitive addressing. | Correct | Incorrect
Normal user's personal webspace | Direct | 404 Not Found | Correct | Incorrect
My personal webspace | Direct | 404 triggered, but I use .htaccess to re-direct 404 to my own program to do case-insensitive matching. | Correct | Incorrect




How to do 404 redirection

The Apache web server can be configured so that a "404 Not Found" error runs a program, rather than just outputting a standard error page.

The program could do a case-insensitive search of the valid URLs to find a match for the bad URL.
For example, it could pre-build a list of all files, and then run   grep -i   on that list with the bad URL string.
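A minimal sketch of that idea in Python, assuming the web root is a local directory such as ~/public_html and that each file's URL is simply a fixed prefix plus its path relative to that root. The names build_url_list and find_matches, and the prefix "/~user", are hypothetical, for illustration only. The search is a plain case-insensitive substring test, roughly what grep -i -F does (plain grep -i would treat the bad URL as a regular expression).

```python
import os


def build_url_list(web_root, url_prefix):
    """Walk web_root and return the URL path of every file under it, sorted."""
    urls = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            rel_path = os.path.relpath(os.path.join(dirpath, name), web_root)
            # URLs always use "/" even if the local path separator differs.
            urls.append(url_prefix.rstrip("/") + "/" + rel_path.replace(os.sep, "/"))
    return sorted(urls)


def find_matches(bad_url, url_list):
    """Return every URL containing bad_url, ignoring case (like grep -i -F)."""
    needle = bad_url.lower()
    return [url for url in url_list if needle in url.lower()]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        web_root = os.path.expanduser("~/public_html")  # assumed location
        for url in find_matches(sys.argv[1], build_url_list(web_root, "/~user")):
            print(url)
```

Walking the whole tree on every 404 would be slow on a large site, which is why the list is worth building once in advance and only searched at request time.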




My 404 handler

So this is what I do to implement an error-tolerant web server. I put a .htaccess file in:
$HOME/public_html/.htaccess
This .htaccess file has ErrorDocument lines that redirect 404s (and 403s) to a program:
ErrorDocument 404  /cgi-bin/user/prog
ErrorDocument 403  /cgi-bin/user/prog
The program it redirects to is a CGI script.
The CGI script looks at the REDIRECT_URL environment variable (the URL that failed) and does a case-insensitive, partial-line match on a pre-built list of all URLs.
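A rough sketch of such a CGI script in Python, as an illustration rather than the actual handler used here. The list file name url_list.txt (one URL per line), the function names, and the rule "exact case-insensitive match first, otherwise any line containing the bad URL" are all assumptions made for the example.

```python
import html
import os

URL_LIST_FILE = "url_list.txt"  # hypothetical pre-built list, one URL per line


def load_url_list(path):
    """Read the pre-built URL list, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def candidates(bad_url, url_list):
    """Exact case-insensitive match if any, else lines containing bad_url."""
    needle = bad_url.lower()
    exact = [u for u in url_list if u.lower() == needle]
    if exact:
        return exact
    return [u for u in url_list if needle in u.lower()]


def render_page(bad_url, matches):
    """Build an HTML page listing the suggested URLs, escaping everything."""
    body = f"<p>Not found: {html.escape(bad_url)}</p>\n"
    if matches:
        items = "\n".join(
            f'<li><a href="{html.escape(u)}">{html.escape(u)}</a></li>' for u in matches
        )
        body += f"<p>Did you mean:</p>\n<ul>\n{items}\n</ul>\n"
    else:
        body += "<p>No similar pages found.</p>\n"
    return "<html><body>\n" + body + "</body></html>\n"


def main():
    # Apache sets REDIRECT_URL when it internally redirects to an ErrorDocument.
    bad_url = os.environ.get("REDIRECT_URL", "")
    matches = candidates(bad_url, load_url_list(URL_LIST_FILE)) if bad_url else []
    print("Content-type: text/html")
    print()  # blank line ends the CGI header block
    print(render_page(bad_url, matches), end="")


if __name__ == "__main__":
    main()
```

Escaping the bad URL before echoing it back matters here, since REDIRECT_URL is whatever the visitor typed.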
e.g. Try some URLs:

public_html may need to be readable by other users for this to work, e.g.:

drwx---r-x  
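If you want to set or check this mode from a script, here is a small Python sketch. 0o705 is the octal form of rwx---r-x; the names WEB_DIR_MODE, describe and make_web_readable are hypothetical. From a shell, chmod 705 ~/public_html does the same thing.

```python
import os
import stat

WEB_DIR_MODE = 0o705  # owner rwx, group nothing, others r-x


def describe(mode_bits):
    """Render permission bits for a directory the way `ls -l` shows them."""
    return stat.filemode(stat.S_IFDIR | mode_bits)


def make_web_readable(path):
    """Apply WEB_DIR_MODE to path and return its new ls-style mode string."""
    os.chmod(path, WEB_DIR_MODE)
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    print(describe(WEB_DIR_MODE))  # drwx---r-x
```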

This works for my sub-site only: http://computing.dcu.ie/~humphrys/*
It won't work for misspellings higher up in the site:




Note - May need to override browser error message

Problem: With some versions of IE, if IE receives a return code of 404, it may override the server's error handling and show its own useless error message instead. See the "Show friendly HTTP error messages" setting.
The user can turn this off in Tools-Options-Advanced. But obviously you can't get every user to do that.
So to get my script's page displayed, I tell IE that it is not an error, i.e. the first two lines output by the script are:
Status: 200
Content-type: text/html
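A small sketch of building that header block in Python, assuming the script writes its headers to standard output as CGI requires. The function name cgi_header is hypothetical, and unlike the two lines above it adds the reason phrase (e.g. "200 OK"), which is the form the CGI specification (RFC 3875) gives for the Status field. Passing 404 instead of 200 is exactly the trade-off described next.

```python
from http import HTTPStatus


def cgi_header(status=HTTPStatus.OK, content_type="text/html"):
    """Return the CGI header block: Status line, Content-type, blank line."""
    status = HTTPStatus(status)
    return f"Status: {status.value} {status.phrase}\nContent-type: {content_type}\n\n"


if __name__ == "__main__":
    # 200 stops some IE versions replacing this page with their own;
    # 404 would let spiders see that the link is broken.
    print(cgi_header(HTTPStatus.OK), end="")
    print("<html><body><p>Page not found.</p></body></html>")
```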

Returning 200 does cause problems, though: spiders do not realise the link is broken, because as far as they can tell everything worked just fine. So, for example, all error URLs will be archived in the Internet Archive as well as all real URLs, since the archive cannot tell they are just error screens.





On Internet since 1987.