If there is a way to make money online by abusing a service, you can guarantee spammers will give it a try. They have so little regard for the service that they will club it to death without a second thought, all to increase their chances of making a fast buck.
Somewhat ironically, spammers do have their benefits: at some point you will be forced to harden your systems, both against spammers and against Things That Can Go Wrong. Let me tell you how this worked out for one of my apps, TweetingMachine.
TweetingMachine’s back-end architecture is extremely basic: all contact with Twitter’s API is done via PHP shell scripts, started once a minute via cron. Rather than using a proper message queue when I first developed it, I foolishly wrote my own job queue on top of MySQL, which, unsurprisingly, is the bane of TweetingMachine’s existence to this day.
The job queue works well enough, but if there’s a sudden large influx of messages, it gets a bit confused, stalls, and spawns lots of processes that sit there doing nothing but reading the same record from the database, over and over again. It doesn’t take long for this to wipe the smile off my server’s face.
Thanks to TweetingMachine’s hastily written code, it was trivially easy for a spammer to trigger this kind of stall, but here’s where the good part of being under attack comes in: I was forced to write code that would eventually be of use to legitimate users.
I now have several jobs that run every now and again, looking for broken data or potential errors and fixing them up. They didn’t take that long to write, either. But the spammers forced me to write them.
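To give a rough idea of what such a clean-up job can look like, here is a minimal sketch in Python. It is not TweetingMachine's actual code (which is PHP against MySQL): the `jobs` table, its `status` and `started_at` columns, the five-minute cutoff and the use of SQLite as a stand-in database are all assumptions made for illustration. The idea is simply that any job marked as running for too long is presumed stuck and handed back to the queue.

```python
import sqlite3
import time

STALE_AFTER_SECONDS = 300  # assumed cutoff; tune to how long a real job takes


def requeue_stalled_jobs(conn, now=None, stale_after=STALE_AFTER_SECONDS):
    """Return stuck 'running' jobs to 'pending' and report how many were reset."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE jobs SET status = 'pending', started_at = NULL "
        "WHERE status = 'running' AND started_at < ?",
        (now - stale_after,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Hypothetical schema, created in memory purely for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, started_at REAL)"
    )
    # One job that claims to have been running for an hour.
    conn.execute(
        "INSERT INTO jobs (status, started_at) VALUES ('running', ?)",
        (time.time() - 3600,),
    )
    print(requeue_stalled_jobs(conn), "stalled job(s) requeued")
```

Run from cron every few minutes, something along these lines would quietly undo the damage of a stalled worker without anyone having to notice it first.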
This also affects small things; when you’re first launching, have you always made sure that the relevant services (such as Apache, MySQL etc.) start up after a reboot? It doesn’t take long, but it’s another item on the list of “will do later,” frequently overlooked. You shouldn’t have to restart your server; but it’s a great comfort knowing that you have the option if necessary.
Regarding detecting spammers, I highly recommend Stop Forum Spam’s API. I have a cron job that checks new registrations every fifteen minutes, and marks any relevant accounts as being spammy. I have another service that checks new registrations against previously-marked accounts, and also bans as appropriate.
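For anyone who wants to try the same thing, the check might be sketched like this in Python. Treat it as an illustration under stated assumptions rather than my production code: the endpoint URL, the `f=json` parameter and the `appears` and `frequency` response fields reflect my understanding of Stop Forum Spam's API and should be checked against its current documentation, and the function names and the `min_frequency` threshold are made up for the example.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against Stop Forum Spam's current documentation.
API_URL = "https://api.stopforumspam.org/api"


def build_query_url(ip=None, email=None):
    """Build a lookup URL for whichever details of the registration we have."""
    params = {"f": "json"}  # assumed way of requesting a JSON response
    if ip:
        params["ip"] = ip
    if email:
        params["email"] = email
    return API_URL + "?" + urllib.parse.urlencode(params)


def looks_spammy(result, min_frequency=1):
    """Decide from a decoded response whether the IP or email is a known spammer.

    min_frequency is a hypothetical threshold: how many reports are needed
    before the account is marked as spammy.
    """
    for field in ("ip", "email"):
        entry = result.get(field)
        if not isinstance(entry, dict):
            continue  # field not queried, or in a shape this sketch ignores
        appears = int(entry.get("appears", 0))
        frequency = int(entry.get("frequency", 0))
        if appears and frequency >= min_frequency:
            return True
    return False


def check_registration(ip, email, timeout=10):
    """Query the API for one new registration (performs a network request)."""
    url = build_query_url(ip, email)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return looks_spammy(json.load(resp))
```

Keeping the decision in `looks_spammy`, separate from the network call, makes it easy to raise the threshold later or to test the logic against a saved response.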
So far, I’ve only had one problem spammer. After several bans, I noticed he was using multiple IPs, so I wrote some generic spammer detection for the tweets he was entering. As a result, he and several new spammers have now been banned automatically.
I’m convinced that I have it relatively easy when it comes to spammers; but when that first wave arrives, be sure to take advantage of it to harden your systems.
Nice weblog! I’ve been a regular since your first entry was published on reddit. By the way any plans to publish the current status of the Tweeting Machine? Number of active users, income, etc…
Really good point; I never considered this while developing. Stop Forum Spam’s API is really good.