Today one out of four emails on the internet is an image spam message. This is not because spammers have just discovered the marketing benefits of captivating colors. Rather, spammers have identified a weakness in many anti-spam systems and are moving diligently to exploit it fully.
Two ways of getting past a defense system are to overwhelm it and to evade it. For years spammers have used a combination of these approaches to get past anti-spam systems. Last month we talked about how spam volumes were overwhelming email networks. Now we will explore the spammers' newly popular evasion method: image spam. Image spam is a spam message that uses graphics rather than plain text to display its content to the recipient. Moving the content from the text part of a message into an embedded image essentially moves it into the blind spot of spam filtering systems. The spammers have leapfrogged from hiding their text within other text to moving it to a place that most anti-spam systems cannot reach. Much image spam advertises things like stocks, which do not require the recipient to click on anything. However, image spam is increasingly being used to advertise a wide range of offerings by including a website address, email address or phone number in the image as well.
Several approaches have been tried to defend against image spam. One of the early approaches was to extract the text from the image using optical character recognition (OCR) and go back to traditional content filtering. This amounts to chasing after the text. Attempting to use OCR against image spam is like taking a knife to a gun fight: it is simply an inferior technology. Now that OCR is being used, spammers aim to make their messages readable by humans but undecipherable by software. This is the same goal as the CAPTCHA technology used by e-commerce sites to block automated registrations. Spammers use the same text obfuscation techniques to hide their text from OCR. In effect, the spammers are using the security community's own technology against us. In the best case, if OCR succeeds in extracting the text, you are back to content-filtering-based spam detection, and the message may still contain misspellings, Bayesian poisoning and other evasive techniques. This is why using OCR is like taking a detour only to go backwards.
A different approach that has proved effective is message fingerprinting. Message fingerprinting creates signatures of email messages, including the body and attachments, and tracks messages as they move across the internet. It is able to identify recurring messages from known spammers and unknown senders alike. Newer approaches focus on creating fingerprints specifically for image files. This is necessary because normal message fingerprinting proceeds from left to right, the way English text is read. Images, however, are composed in blocks, not line by line from left to right. Secure Computing Research has developed image fingerprinting technology that applies image normalization, such as edge detection and thresholding, in order to ignore variations and focus on the recurring parts.
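
To make the idea of normalization concrete, here is a minimal sketch in Python. It is not Secure Computing's algorithm, whose details are not described here. It is only an illustration, assuming a grayscale image supplied as a list of rows of 0-255 values. It applies thresholding (edge detection is omitted for brevity) and then summarizes the image block by block, so that small pixel-level variations do not change the resulting fingerprint. The function names, the cutoff of 128 and the 2x2 grid are all assumptions chosen for the example.

```python
def threshold(pixels, cutoff=128):
    """Normalize a grayscale image: 1 for dark pixels (likely text), 0 for background."""
    return [[1 if p < cutoff else 0 for p in row] for row in pixels]


def block_signature(binary, grid=2):
    """Summarize the image in grid x grid blocks: 1 if a block is mostly dark, else 0."""
    height, width = len(binary), len(binary[0])
    bits = []
    for by in range(grid):
        for bx in range(grid):
            y0, y1 = by * height // grid, (by + 1) * height // grid
            x0, x1 = bx * width // grid, (bx + 1) * width // grid
            cells = [binary[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            bits.append(1 if sum(cells) * 2 > len(cells) else 0)
    return bits


def fingerprint(pixels, grid=2, cutoff=128):
    """Threshold first, then reduce to a short block-based signature."""
    return tuple(block_signature(threshold(pixels, cutoff), grid))


def hamming(a, b):
    """Number of positions at which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 8x8 test images: a dark region in the top-left corner.
base = [[20 if (y < 4 and x < 4) else 230 for x in range(8)] for y in range(8)]

# The same image with two speckles added, as a spammer might do to vary each copy.
noisy = [row[:] for row in base]
noisy[0][0] = 200
noisy[7][7] = 30

# A genuinely different image: the dark region is in the bottom-right corner.
other = [[20 if (y >= 4 and x >= 4) else 230 for x in range(8)] for y in range(8)]

print(fingerprint(base) == fingerprint(noisy))        # speckles are ignored
print(hamming(fingerprint(base), fingerprint(other)))  # different layouts differ
```

A byte-for-byte hash of the two speckled images would differ completely, while the block signature stays the same. A production system would need a far finer grid and more robust normalization, but the principle is the same: compare what recurs, not the exact bytes.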
The current statistics suggest a continued increase in the amount of image spam. From May to November, image spam grew from 10 percent to 30 percent of all spam, and overall spam volumes increased by over 50 percent. Additionally, spammers are only beginning to experiment with the freedom that they have in formatting image spam. We see animated GIFs being used to hide the text across multiple frames. We see several small images being pieced together to create a larger spam message. The good news is that as the spammers use more advanced techniques to hide their text, their images stand out further from the images found in legitimate email. The security community must remain focused on leveraging these properties to better identify the spammers.
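
As a rough illustration of what leveraging these properties could look like, the Python sketch below flags the two tricks mentioned above: animated images and messages assembled from many small images. Everything in it is hypothetical. The `ImageAttachment` record, the 100-pixel size limit and the minimum of four tiles are assumptions made for the example, not measured values, and a flag is only a signal to weigh alongside other evidence, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class ImageAttachment:
    """Hypothetical summary of one image found in a message."""
    width: int
    height: int
    frame_count: int = 1  # more than 1 means an animated image such as a GIF


def structural_flags(attachments, small_side=100, min_tiles=4):
    """Return the names of the structural oddities found in a message's images.

    small_side and min_tiles are assumed thresholds chosen for illustration.
    """
    flags = []
    # Text hidden across frames requires an image with more than one frame.
    if any(a.frame_count > 1 for a in attachments):
        flags.append("animated")
    # A message pieced together from tiles carries many small images.
    small = [a for a in attachments if max(a.width, a.height) <= small_side]
    if len(small) >= min_tiles:
        flags.append("tiled")
    return flags


print(structural_flags([ImageAttachment(400, 300, frame_count=5)]))
print(structural_flags([ImageAttachment(80, 80) for _ in range(6)]))
print(structural_flags([ImageAttachment(1024, 768)]))
```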