More SPAM Fighting

More progress fighting SPAM both on MediaWiki and via email.

I've been trying to find a way to combat SPAM on the Free Geek Arkansas Wiki. Unfortunately, the bots targeting MediaWiki these days seem rather sophisticated. I turned off anonymous editing and later required email confirmation before editing. Both of these helped, although I would prefer to be able to run the wiki without either. Still, I was getting too much SPAM: an average of 3 or 4 new, orphaned pages created by new, email-confirmed users.

I installed the ConfirmEdit extension and turned on the SimpleCaptcha mode. Somewhat to my surprise, this has not cut down on the SPAM rate at all. I tried to make it very easy on real users; they only get the CAPTCHA when trying to create a new account. Either the bots simply submit the form repeatedly until they get the CAPTCHA right, or this bot has code to parse out the CAPTCHA and solve it, which I suppose isn't too hard. Anyway, it didn't work, so today I started looking for something else.

I found some relevant MediaWiki documentation and saw a few features that could be helpful, but the one that stuck out was using a DNSBL. I've noticed that when I clean up the SPAM, the rate goes down for a while. It's likely confirmation bias, but I think it is correlated with the fact that I ban the IPs used to SPAM the wiki for about a month. That said, I don't want to permanently ban any IPs. IPs get reused, moved, etc., and don't really identify the guilty parties. A DNSBL seems like it might be a good compromise.
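For anyone unfamiliar with how a DNSBL lookup works, here is a minimal Python sketch of the query-name convention for IPv4: the octets of the address are reversed and prepended to the blacklist's zone, and a listed address resolves (typically to something in 127.0.0.0/8) while an unlisted one does not. The function name and the zone dnsbl.example.org are placeholders of mine, not a real list or anything MediaWiki exposes.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the hostname a DNSBL client would look up for an IPv4 address.

    `zone` is a hypothetical blacklist zone; substitute a real one.
    """
    # Validate the input; raises ValueError for anything that is not IPv4.
    addr = ipaddress.IPv4Address(ip)
    # DNSBLs expect the octets in reverse order, followed by the zone.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address, used purely for illustration.
    print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

An actual check would then resolve that name (for example with socket.gethostbyname) and treat a successful answer as "listed" and a lookup failure as "not listed".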

I used a Wikipedia article to decide which blacklists to use, and it was quite helpful. I knew I wanted to support services with automatic delisting (unlike some of the SORBS lists; SORBS is somewhat infamous for nearly extortionate practices once you are listed, even if it was clearly a misconfiguration). I also wanted to choose something that focused on HTTP SPAMmers, not email SPAM sources. I finally settled on a few and wrote a configuration. I also disabled the ConfirmEdit extension, since it wasn't helping. If SPAM goes down, I may open up the wiki some more. Here's part of my LocalSettings.php:

# Lock down anonymous editing
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['*']['writeapi'] = false;

# Require email confirmation before editing, an attempt to stop some SPAM-bots
$wgEmailConfirmToEdit = true;

# 2011-01-19: Didn't affect SPAM, disabling.
# Settings for ConfirmEdit extension
# $wgCaptchaClass = 'SimpleCaptcha';
# $ceAllowConfirmedEmail = true;
# $wgGroupPermissions['emailconfirmed']['skipcaptcha'] = true;

# DNSBL stuff -- might catch SPAM
$wgEnableDnsBlacklist = true;
$wgDnsBlacklistUrls = array(
        // '', -- SMTP-oriented, but highly accurate.
);

I also run a mail server. In addition to all the steps documented in How I Handle SPAM, I added some DNSBL rules there as well. There are quite a few more DNSBLs that are suitable for email, and I am particularly fond of the SpamCop services, so I read through the documentation and created a configuration like this:

$ cat /etc/exim4/conf.d/acl/25_local-config_check_connect
    dnslists = \ : \ : \ : \ : \ : \ : \ : \ : \ : \ : \
    message = Blacklisted by DNSBL @ $dnslist_domain
    log_message = $dnslist_matched @ $dnslist_domain: $dnslist_text: $dnslist_value

$ cat /etc/exim4/conf.d/local/acl_check_data
  message = This message contains "$malware_name" (malware).
  malware = */defer_ok
  delay = 2m

# I hope the new header is seen by the spam condition
  dnslists =
  add_header = X-Unlikely-Server: $dnslist_matched @ $dnslist_domain: $dnslist_text: $dnslist_value

  spam = Debian-exim:true/defer_ok
  message = This message is ${spam_score_int}% SPAM.
  add_header = X-Spam-Score: $spam_score ($spam_bar)
  condition = ${if >= {$spam_score_int}{1} {1}{0}}
  set acl_m_spam_delay = ${if < {$spam_score_int}{300} {$spam_score_int}{300}}
  delay = ${acl_m_spam_delay}s
  condition = ${if >= {$spam_score_int}{10} {1}{0}}
  add_header = X-Spam-Report: $spam_report
  condition = ${if >= {$spam_score_int}{100} {1}{0}}
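
To make the delay arithmetic above easier to follow, here is a small Python sketch of what I understand those conditions to do. Exim's $spam_score_int is the SpamAssassin score multiplied by ten, so a score of 4.2 becomes 42; the ACL delays the sender by that many seconds, capped at 300, and only when the value is at least 1. The function name and the cap parameter are my own illustration, not anything Exim provides.

```python
def spam_delay_seconds(spam_score_int: int, cap: int = 300) -> int:
    """Mirror the ACL's delay rule for a given $spam_score_int value.

    spam_score_int is the spam score times ten (e.g. 42 for a score of 4.2).
    """
    # The ACL only applies the delay when the score is at least 1 (0.1 points).
    if spam_score_int < 1:
        return 0
    # Otherwise wait one second per tenth of a point, but never more than `cap`.
    return min(spam_score_int, cap)


if __name__ == "__main__":
    for value in (-15, 0, 42, 450):
        print(value, "->", spam_delay_seconds(value), "seconds")
```

The cap matters because SMTP clients will eventually time out; five minutes is long enough to annoy a bot without holding a connection open indefinitely.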

So far, it has stopped one SPAMmer at connect time. I'm going to monitor things, but I'm expecting a further reduction in mail SPAM.