Dr. Mark Humphrys

School of Computing. Dublin City University.

How I stop spam

I have a simple approach to the problem of junk email or "spam". It really helps.

The problem

I have a website on various topics of research where I actively want random strangers to be able to email me. Every week or month I hear from some stranger (whose address I could not have known in advance) who has information that I want on one of my specialised topics.

Not publishing my email address is not an option. The problem, of course, is that publishing my email address on my web pages means I get spam.

Abandoned tactic: Refuse-lists

So how can we filter spam?

One could have a personal "refuse-list" filter that refuses email from certain places (but there are new ones all the time) or email that contains certain terms, like "Make Money Fast" or "Viagra" (but lots of spam will always slip through the net). I have given up on refuse-lists because they only capture a small minority of my spam.

The best refuse strategy I used was one Netscape recommended: filter out all Bcc messages. At one time this caught the majority of my spam. But now it only captures a minority of my spam - perhaps the spammers have become wise to this.

Shared network refuse-lists (built collectively by many people, like online updates of lists of viruses) are more promising than personal (individual-built) refuse-lists. But my institution uses a shared network refuse-list and lots of spam still gets through.

New tactic: Accept-lists

The new tactic is to refuse by default, i.e. only allow through email that meets some acceptance criterion.

  1. Inbox is Trash by default

    The first strategy is very simple - don't have a system where I have to laboriously delete incoming junk email to get it out of my Inbox. I just scan the Subject lines, and if it looks like junk I leave it there unread in the Inbox. Anything that looks like real email is moved to my "Real Inbox" which is where I actually work. Periodically I delete everything in the "Inbox". In other words, the Inbox is a Trash file by default. I do my real work somewhere else.

  2. Accept-list

    The next step is to construct the "accept-list". Email from these addresses gets moved automatically by my email client to the "Real Inbox". As well as a program to move the email, I also want a program to automatically build the accept-list.

    I have written a short program that, every time I run it, extracts every email address from the From: and To: lines of all my non-trash mailboxes (i.e. email that I have kept for some reason), plus every email address found in my personal files on disk. It then writes a set of filter rules for my email client, Mozilla Thunderbird (I previously used this system with Netscape Mail). In my case, the accept-list holds about 1500 addresses, which translates into about 300 Mozilla Thunderbird filter rules of the form:

    name="rule N"
    action="Move to folder"
    condition="OR (from,contains,address1) OR (from,contains,address2) OR (from,contains,address3) OR (from,contains,address4) OR (from,contains,address5)"

    Now anybody on the accept-list gets into the "Real Inbox"; anyone else is left with all the junk in the "Inbox".
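
A minimal Python sketch of how such a rule-building program might look. It is not the original program: the function names `extract_addresses` and `build_rules` are hypothetical, it reads only header lines (not personal files on disk), and it emits rules in the simplified three-line form shown above rather than a complete Thunderbird filter file.

```python
from email.utils import getaddresses


def extract_addresses(header_lines):
    """Collect unique, lower-cased addresses from From: and To: header lines."""
    wanted = [
        line.split(":", 1)[1]
        for line in header_lines
        if line.lower().startswith(("from:", "to:"))
    ]
    # getaddresses() parses comma-separated header values into (name, address) pairs.
    return sorted({addr.lower() for _name, addr in getaddresses(wanted) if "@" in addr})


def build_rules(addresses, per_rule=5):
    """Group addresses into rules of `per_rule` addresses each (simplified form)."""
    rules = []
    for start in range(0, len(addresses), per_rule):
        chunk = addresses[start:start + per_rule]
        condition = " ".join(f"OR (from,contains,{addr})" for addr in chunk)
        rules.append(
            f'name="rule {start // per_rule + 1}"\n'
            'action="Move to folder"\n'
            f'condition="{condition}"'
        )
    return rules


if __name__ == "__main__":
    # Illustrative headers only; a real run would read them from saved mailboxes.
    headers = [
        "From: Alice <alice@example.org>",
        "To: bob@example.com, Carol <carol@example.net>",
    ]
    for rule in build_rules(extract_addresses(headers)):
        print(rule)
```

With five addresses per rule, 1500 addresses produce 300 rules, which matches the figures above.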

  3. Password

    The final step is for new, random people emailing me. I tell them to use a password, which identifies them as a human and gets them into the "Real Inbox". A human will follow the instructions; a program won't. See the FAQ.
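
The combined test for steps 2 and 3 can be sketched in Python as follows. This is only an illustration: the text does not say where the password goes, so the sketch assumes it appears in the Subject line, and the function name `goes_to_real_inbox` and the sample password are hypothetical.

```python
def goes_to_real_inbox(from_addr, subject, accept_list, password):
    """Return True if a message should be moved to the "Real Inbox".

    Assumption: the password is expected somewhere in the Subject line.
    """
    sender = from_addr.lower()
    # Step 2: mirror the "from contains address" test of the filter rules.
    if any(entry.lower() in sender for entry in accept_list):
        return True
    # Step 3: a stranger who followed the instructions and used the password.
    return password.lower() in subject.lower()


if __name__ == "__main__":
    accept = ["alice@example.org"]
    print(goes_to_real_inbox("Alice <alice@example.org>", "Hello", accept, "swordfish"))
    print(goes_to_real_inbox("new@example.net", "swordfish: a question", accept, "swordfish"))
    print(goes_to_real_inbox("bulk@example.net", "Make Money Fast", accept, "swordfish"))
```

Anything for which this returns False simply stays in the "Inbox" with the junk.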

  4. Local email is in Trash by default

    How could a spammer defeat this system, if a lot of people use it? The spammer can't write a program to follow password instructions in general. So how about the spammer forging names on your accept-list? The first problem for them is that they don't have your accept-list. So what rule could their program follow? It seems the only rule they could use that would work with everyone would be to forge a name at your local institution. In fact, spammers already do this.

    The solution (which I adopt) is to treat all mail from my local institution as suspect, and filter it separately so that it is in a Trash file by default.

    But we can go further and stop such forgeries altogether. The institution can detect them if it refuses to let its users send email to local addresses via outside ISPs. To be precise, the institution can't stop such email being sent to a 3rd party, but it can stop it being delivered to one of the institution's own users. All email from one local user to another must be sent through the institution's mailhost: the institution will not accept from elsewhere (e.g. an ISP) any email with a From: line claiming to be from one of the institution's users. So forged email from one local user to another local user can be detected, which leaves the spam programs with no other option that will work for everyone.
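
The mailhost rule just described can be sketched in Python. This shows the decision only, not a real mail server configuration: the function name `mailhost_accepts`, the `arrived_from_outside` flag and the domain `example.edu` are all assumptions made for illustration.

```python
def mailhost_accepts(from_addr, arrived_from_outside, local_domain):
    """Refuse outside mail whose From: line claims a local sender.

    `arrived_from_outside` is True when the message was handed to the
    institution's mailhost by some other site (e.g. an ISP).
    """
    domain = from_addr.lower().rsplit("@", 1)[-1]
    local = local_domain.lower()
    claims_local = domain == local or domain.endswith("." + local)
    return not (claims_local and arrived_from_outside)


if __name__ == "__main__":
    # A forged "local" sender arriving from outside is refused.
    print(mailhost_accepts("colleague@example.edu", True, "example.edu"))
    # A genuine local sender using the institution's own mailhost is accepted.
    print(mailhost_accepts("colleague@example.edu", False, "example.edu"))
    # Ordinary outside mail is unaffected by this rule.
    print(mailhost_accepts("stranger@example.org", True, "example.edu"))
```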

  5. Postmaster email is in Trash by default

    Finally, how about spam forged to come from me? This generates junk replies from postmasters (to an email I never sent). So I separate all email from a postmaster or similar so that it is in a Trash file by default.
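
Steps 4 and 5 amount to sorting two kinds of sender into their own trash-by-default folders. A minimal Python sketch, under stated assumptions: the folder names "Postmaster" and "Local", the list of postmaster-like names, the domain `example.edu` and the order of the two checks are all illustrative choices, not taken from the actual filter rules.

```python
# Assumption: these local parts cover "a postmaster or similar".
POSTMASTER_NAMES = ("postmaster", "mailer-daemon")


def suspect_folder(from_addr, local_domain):
    """Return the trash-by-default folder for a sender, or None if neither applies."""
    local_part, _sep, domain = from_addr.lower().partition("@")
    local = local_domain.lower()
    if local_part in POSTMASTER_NAMES:
        return "Postmaster"  # step 5: bounces for mail I never sent
    if domain == local or domain.endswith("." + local):
        return "Local"       # step 4: local senders are easy to forge
    return None


if __name__ == "__main__":
    for sender in ("MAILER-DAEMON@mail.example.net",
                   "colleague@cs.example.edu",
                   "stranger@example.org"):
        print(sender, "->", suspect_folder(sender, "example.edu"))
```

Mail that lands in neither folder is then handled by the accept-list and password rules as before.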

It works!

This system works, so far. Mozilla Thunderbird with hundreds of filter rules runs fine.
Every day, when I check my email:

I started on email in 1987.
Starting around 1995, my email became increasingly unusable because of spam.
I set up this system in 2002, and my email was suddenly usable again, just like in the early 1990s.


In 2003 my institution installed the spam filter SpamAssassin (which includes a Bayesian classifier) at the server level.

I separate out the SpamAssassin hits:

name="spamassassin rule"
action="Move to folder"
condition="OR (subject,contains,*****SPAM*****)"
and, combined with my system, life is very calm and quiet now!

I still get tons of spam of course (more than ever, in fact), but it's now 95-100 percent filtered correctly into the right mailboxes, so it's no work at all.

2003-04 study

In a 2003-04 study of my email (940 emails over a 17 day period), I got the following results:

Update 2006:

The absolute numbers have increased (now 300 spams a day), but the percentages are still roughly as above:
95 percent of my email is spam.
SpamAssassin catches 80 percent of the spam.
SpamAssassin plus my system catches 95-100 percent (most days 100 percent).
Life is still peaceful, as it has been since 2002.
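
As a rough check on what the 2006 percentages mean per day, the arithmetic can be sketched in Python. This is a back-of-envelope estimate only: the inputs are the rounded figures quoted above, the lower bound of 95 percent is used for the combined system, and the function and key names are illustrative.

```python
def daily_estimate(spam_per_day=300, spam_share=0.95,
                   spamassassin_rate=0.80, combined_rate=0.95):
    """Estimate daily email volumes from the rounded 2006 figures."""
    total = spam_per_day / spam_share  # spam is 95 percent of all email
    return {
        "total_email": round(total),
        "real_email": round(total - spam_per_day),
        "spam_missed_by_spamassassin": round(spam_per_day * (1 - spamassassin_rate)),
        "spam_missed_by_both": round(spam_per_day * (1 - combined_rate)),
    }


if __name__ == "__main__":
    for key, value in daily_estimate().items():
        print(key, value)
```

On these figures there are about 316 emails a day, of which about 16 are real. SpamAssassin alone would leave about 60 spams a day to deal with by hand; the combined system leaves at most about 15, and on most days none.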
