Wednesday 11 February 2015

One weird old trick to fend off spam

I don't know about you, but for me spam is a real issue. If you're using Gmail, or Hotmail, or something like that, then they're probably despamming for you and you don't even know about it. But despammers are never perfect; they make two kinds of mistake.

1) Labelling something as spam that isn't.
2) Not labelling something as spam that is.

The second sort of mistake is mildly annoying, but the first kind of mistake means you could miss an important email.

So I run my own mail server, and do my own despamming.

I do this in several stages. Stage one is to do with the mail servers that I announce to the world. I run six mail servers. The first one I call, imaginatively, mail1. That is where your mailer should deliver any mail for me. The second exists in case the first one crashes and I don't notice, and I call that, guess what, mail2.

I don't read my mail on those servers; I have another server, my mail processor, that collects any mail from mail1 and mail2 using IMAP. It also visits various other email addresses that I have, mostly set up donkey's years ago (a donkey year is about 20 years). Mail1 has priority 10, and mail2 has priority 20. So your mailer knows that it should preferentially deliver to mail1, because when it gets the list of my mail servers, it can parse the information that comes back and see the priorities.

I also have four more mail servers, and you can probably guess the names. These have priority 200, 300, 400 and 500. So really, they shouldn't get any mail unless both mail1 and mail2 are out of action, and if that's happened, then it'll be caused by a total comms outage, so mail3 to mail6 won't be accessible either.

But they do get mail. Lots and lots of mail. And every single email they get is spam. What's happening, I think, is this. People who spew out spam don't really care about doing things right. When you do "dig mx" on a domain name (that's asking for the list of mail servers) the list comes back in a random order; you're supposed to read it to decide which is the highest priority server (the one with the lowest number; in my case, mail1). But spammers just send their spam to the first on the list; this is slightly quicker for them. And that's why everything I get on mail3 to mail6 is spam. And mail3 to mail6 is actually just one server with different names, so I don't even have to run extra servers.

I'd guess that this simple trick fends off about 2/3 of the spam sent to me (and you could add a hundred more servers to fend off a lot more spam, all actually just the one server).
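To make the trick concrete, here is a minimal Python sketch of the two behaviours described above: a well-behaved mailer that picks the lowest priority number, and a lazy sender that takes whatever comes first. The hostnames under example.org are placeholders, and the function names are made up for illustration. The simulation assumes the answer really is shuffled uniformly and that every spammer is lazy; under those assumptions the share landing on the four decoys comes out near 4/6, which sits comfortably with the "about 2/3" guess.

```python
import random

# Hypothetical MX records as (priority, hostname) pairs. The priorities are
# the ones given in the post; the hostnames are placeholders.
MX_RECORDS = [
    (10, "mail1.example.org"),
    (20, "mail2.example.org"),
    (200, "mail3.example.org"),
    (300, "mail4.example.org"),
    (400, "mail5.example.org"),
    (500, "mail6.example.org"),
]


def well_behaved_target(records):
    """Pick the server with the lowest priority number, as a proper mailer should."""
    return min(records, key=lambda record: record[0])[1]


def lazy_target(records):
    """Take whichever record happens to come first in the answer."""
    return records[0][1]


def decoy_share(records, decoys, trials=10_000, seed=0):
    """Estimate how often a lazy sender hits a decoy if the order is shuffled."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shuffled = records[:]
        rng.shuffle(shuffled)
        if lazy_target(shuffled) in decoys:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    decoys = {host for priority, host in MX_RECORDS if priority >= 200}
    print("proper mailer delivers to:", well_behaved_target(MX_RECORDS))
    print("share of lazy senders hitting a decoy:", decoy_share(MX_RECORDS, decoys))
```

Adding more decoy names pushes that share up, which is the point of the "hundred more servers" remark.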

On the server that collects all my email together from various sources, I run the despammer what I wrote. That also does a few simple things before doing the complicated thing.

It has a look at the subject and the header. If this isn't using the Roman character set (maybe it's Chinese, or Russian) then I'm not going to be able to understand it anyway. So it's spam, and it's put into a "non-roman" mailbox. If it wasn't actually spam, tough, I can't read Chinese.
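The post doesn't show how the character-set test is done, so the following is only one possible sketch in Python. It decodes the subject with the standard library's email.header module and then counts how many letters have a Unicode name beginning with "LATIN". The function names and the 0.5 threshold are assumptions made for illustration.

```python
import unicodedata
from email.header import decode_header, make_header


def decoded_subject(raw_subject):
    """Decode RFC 2047 encoded words (=?utf-8?b?...?=) into plain text."""
    return str(make_header(decode_header(raw_subject)))


def is_mostly_roman(text, threshold=0.5):
    """Return True if at least `threshold` of the letters are Latin script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return True  # nothing to judge by, so leave it to the later stages
    latin = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("LATIN")
    )
    return latin / len(letters) >= threshold


def mailbox_for_subject(raw_subject):
    """Return "non-roman" for subjects that fail the test, otherwise None."""
    if is_mostly_roman(decoded_subject(raw_subject)):
        return None
    return "non-roman"
```

A ratio rather than an all-or-nothing test means an accented name or a stray symbol in an otherwise readable subject doesn't send the mail to the "non-roman" mailbox.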

It has a look at who the email is addressed to. If that isn't one of a limited list (or if there's no-one that it's addressed to) then it's spam, and it's put into a "not-me" mailbox.

How many people was it addressed to? If someone sends an email to six people, of which I'm one, then I doubt if I want to read it - spam, into the "spam" mailbox.
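The two addressing checks above could be sketched like this in Python, using the standard library's email package. The addresses, the cut-off of five, and the decision to count both To and Cc are assumptions; the post only says that the list is limited and that six recipients is too many.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical values: placeholders, not the author's real addresses or limit.
MY_ADDRESSES = {"me@example.org", "old-me@example.net"}
MAX_RECIPIENTS = 5


def recipients(msg):
    """Collect the lower-cased addresses from the To and Cc headers."""
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    return [addr.lower() for _name, addr in getaddresses(headers) if addr]


def addressing_verdict(msg):
    """Return the mailbox suggested by the addressing checks, or None."""
    rcpts = recipients(msg)
    if not any(addr in MY_ADDRESSES for addr in rcpts):
        return "not-me"  # also covers mail with no recipients at all
    if len(rcpts) > MAX_RECIPIENTS:
        return "spam"
    return None


if __name__ == "__main__":
    raw = "To: me@example.org\nSubject: lunch?\n\nFriday at one?\n"
    print(addressing_verdict(message_from_string(raw)))
```

Returning None for mail that passes keeps the function easy to chain with whatever check comes next.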

If it passes all those tests, then it's put through my despammer program, which looks for things like "make money fast" and "prescription".
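The despammer itself isn't shown in the post, so this is not it. As a rough idea of what "looking for things like" a phrase might involve, here is a minimal Python sketch that matches phrases while ignoring case and odd spacing. The phrase list holds only the two examples quoted above, and the function names are invented for illustration.

```python
import re

# The two phrases quoted in the post; the real list is not shown there.
SPAM_PHRASES = ("make money fast", "prescription")


def normalise(text):
    """Lower-case the text and collapse runs of whitespace to single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def matched_phrases(text, phrases=SPAM_PHRASES):
    """Return the phrases that occur in the text, ignoring case and spacing."""
    haystack = normalise(text)
    return [phrase for phrase in phrases if phrase in haystack]


if __name__ == "__main__":
    print(matched_phrases("MAKE   Money\nFast, no PRESCRIPTION needed"))
    print(matched_phrases("Lunch on Friday?"))
```

Plain substring matching like this will also flag a genuine note from a pharmacy, which is one reason to keep a "spam" mailbox to check rather than deleting outright.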

And if it passes all that, then I have one final weird old trick - I sort the mail into alphabetical order. Most people view their mail sorted into date order, and that makes a lot of sense. But for despamming, you want alphabetical order. When you see fifteen emails with the subject "Breakthrough Baldness Cure" you can swiftly delete them all. More importantly, when you see seven emails all entitled "Outstanding Invoice" then you know they're spam.
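Most mail clients will sort by subject at the click of a column header, but the idea is simple enough to sketch in Python. The helper names below are made up, and the duplicate counter is an extra beyond what the post describes: it just automates the "seven emails with the same title" observation.

```python
from collections import Counter


def sort_by_subject(subjects):
    """Return the subjects in case-insensitive alphabetical order."""
    return sorted(subjects, key=str.casefold)


def repeated_subjects(subjects, minimum=2):
    """Count subjects that appear at least `minimum` times, ignoring case."""
    counts = Counter(subject.strip().casefold() for subject in subjects)
    return {subject: n for subject, n in counts.items() if n >= minimum}


if __name__ == "__main__":
    inbox = [
        "Outstanding Invoice",
        "Lunch on Friday?",
        "outstanding invoice",
        "Breakthrough Baldness Cure",
    ]
    for subject in sort_by_subject(inbox):
        print(subject)
    print(repeated_subjects(inbox))
```

Sorting with str.casefold keeps "Outstanding Invoice" and "outstanding invoice" next to each other, which is what makes the duplicates jump out.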

So when I read my email, nearly all the spam has been fended off, and what wasn't fended off is more clearly spam because of the alphabetical order. Then, when I've dealt with the mail, I check the "non-roman" and the other mailboxes, just in case there's something in one of them that I ought to read ... but there very rarely is.

2 comments:

  1. I've found that greylisting works unreasonably well:

    http://en.wikipedia.org/wiki/Greylisting

    Have you tried it?

  2. I just read up about it - nice idea. I've never tried it.
