Semalt infecting computers to spam the web

All over the web people are complaining about the new SEO startup Semalt from Ukraine. Nabble investigated their methods and discovered they employ malware to crawl the web and spam server logs, potentially ruining your Google Analytics data.

Update / July 31 — Also make sure to read the second part, where we prove that Soundfrost violates its users’ privacy: Semalt & Soundfrost caught spying

So what is this all about? According to their own website, they are 

[…] a professional webmaster analytics tool that opens the door to new opportunities for the market monitoring, yours and your competitors’ positions tracking and comprehensible analytics business information.

http://semalt.com/what-is-semalt.php

That sounds nice, but the complaints focus on something else: referral spam used by Semalt to drive traffic to their website, apparently to get users to sign up for their € 14.65 / month service.

The spam pollutes Google Analytics data because every crawler request carries an HTTP Referer header containing the URL semalt.semalt.com/crawler.php (which redirects to semalt.com).

We also became increasingly annoyed, as the current workarounds involve applying Google Analytics filters or editing the .htaccess file (don’t use the tool provided by Semalt; it’s known not to work). Since we maintain dozens of websites on different domains, with different Google Analytics accounts, this is a time-consuming task. We wanted to solve the problem in one go, but reaching out to Semalt via Twitter didn’t help much.
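
For sites served by Node.js, the idea behind the .htaccess workaround fits in a few lines: reject any request whose Referer header points to semalt.com or one of its subdomains. This is a minimal sketch, assuming a plain Node.js `http` server; the blocklist and the names `isSpamReferer` and `handler` are our own illustrations, not part of any existing tool.

```javascript
// Hypothetical blocklist: semalt.com plus all of its subdomains,
// e.g. semalt.semalt.com as seen in the crawler's Referer header.
const BLOCKED_DOMAINS = ["semalt.com"];

// True when the Referer's hostname is a blocked domain or a subdomain of one.
function isSpamReferer(referer) {
  if (!referer) return false;
  let host;
  try {
    host = new URL(referer).hostname.toLowerCase();
  } catch (err) {
    return false; // malformed Referer: not a URL, so let it through
  }
  return BLOCKED_DOMAINS.some(
    (domain) => host === domain || host.endsWith("." + domain)
  );
}

// Node.js http request handler: refuse spam-referred requests with
// 403 Forbidden before any page (or analytics snippet) is served.
function handler(req, res) {
  if (isSpamReferer(req.headers["referer"])) {
    res.writeHead(403, { "Content-Type": "text/plain" });
    res.end("Forbidden");
    return;
  }
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end("<p>Hello, human visitor</p>");
}
```

To try it, start a server with `require("http").createServer(handler).listen(8080)` and request a page with `curl -e http://semalt.semalt.com/crawler.php http://localhost:8080/`, which should be answered with a 403. Matching on the hostname suffix (rather than a substring of the whole header) avoids blocking unrelated domains such as notsemalt.com.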

Then a blog post about Semalt by The New Frontier caught our eye. In particular, its list of IP addresses that the crawler requests originate from (which we confirmed with data from our own Google Analytics accounts) looked suspicious.

Ordinarily, Internet based companies heavily engaged in the use of automation leverage only a handful of cloud networks to host their programs.  The presence of more than 50 distinct networks linked to the crawler.php index - including those dedicated to residential and mobile use - would suggest that Semalt’s spidering resources are built on the back of a large scale botnet as opposed to a legitimate telecommunications infrastructure.

http://thenewfr0ntier.blogspot.nl/2014/03/anyone-running-blogger-or-wordpress.html

Then Twitter user thomas.b mentioned several domains associated with an IP address that semalt.com uses, one of them offering a piece of software called Soundfrost.

Is this software related to Semalt in any way, and, more importantly, does it act as a hidden crawler running on the computers of unsuspecting users who installed Soundfrost? The answer is: yes, it does.

Upon installing Soundfrost and monitoring the outgoing traffic of the system, we discovered that Soundfrost is indeed the culprit.

Analysis of the Soundfrost installation (we got our hands dirty with version 3.8.2.0 downloaded from soundfrost.org) shows that it places several executable files at various locations:

  • %PROGRAMFILES%\SoundFrost\SoundFrost.exe
  • %PROGRAMFILES%\SoundFrost\SoundFrostService.exe
  • %PROGRAMFILES%\WiredTools\WiredTools.exe
  • %LOCALAPPDATA%\ContentAgent.exe
  • %LOCALAPPDATA%\ContentFinder.exe

As soon as the main program (SoundFrost.exe) is started, all five executables show up as running in the Task Manager. When the main window is closed, only SoundFrost.exe is terminated; the other four programs keep running in the background.
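
To check whether a Windows machine carries these files, the paths listed above can be expanded and tested for existence. A minimal Node.js sketch, assuming it runs on the machine in question; `expandWinEnv` and `findSoundfrostFiles` are hypothetical helper names, and only the exact locations we found for version 3.8.2.0 are checked.

```javascript
const fs = require("fs");

// The five executables placed by the Soundfrost 3.8.2.0 installer.
const SOUNDFROST_PATHS = [
  "%PROGRAMFILES%\\SoundFrost\\SoundFrost.exe",
  "%PROGRAMFILES%\\SoundFrost\\SoundFrostService.exe",
  "%PROGRAMFILES%\\WiredTools\\WiredTools.exe",
  "%LOCALAPPDATA%\\ContentAgent.exe",
  "%LOCALAPPDATA%\\ContentFinder.exe",
];

// Replaces each %NAME% with env[NAME]; unknown variables are left untouched.
function expandWinEnv(template, env) {
  return template.replace(/%([^%]+)%/g, (match, name) =>
    env[name] !== undefined ? env[name] : match
  );
}

// Returns the expanded paths that actually exist on this machine.
function findSoundfrostFiles(env = process.env) {
  return SOUNDFROST_PATHS
    .map((p) => expandWinEnv(p, env))
    .filter((p) => fs.existsSync(p));
}
```

Calling `findSoundfrostFiles()` returns an empty array when none of these files are present at those locations; a non-empty result lists the ones that are.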

What are they doing? Could they be the Semalt crawlers? Let’s have a detailed look.

We monitored the traffic to and from those programs, and when inspecting the TCP traffic of ContentFinder.exe we discovered the following:

  1. The program makes a request for the following URL: http://b[2DIGITS].openfrost.net/get_link.php.
  2. A ‘302 Moved Temporarily’ is returned with a new location in the form of http://server[2DIGITS].openfrost.com/get_link.php?newagent=1, which the program requests immediately.
  3. This server (running nginx/PHP) responds with a plaintext URL in the form of http://semalt.semalt.com/semalt.php?u=[RANDOMSITE]
  4. The program makes a request for the following URL: http://semalt.semalt.com/semalt.php?u=[RANDOMSITE]
  5. This server responds with an HTML document containing JavaScript that redirects to yet another URL by dispatching a synthetic mouse click event:
    <html>
    <head>
    <title>...</title>
    <meta HTTP-EQUIV="Content-Type" content="text/html; charset=windows-1251">

    <script language="JavaScript">
    window.onload = function() {
    var myEvt = document.createEvent('MouseEvents');
    myEvt.initEvent('click', true, true);
    document.getElementById('myLink').dispatchEvent(myEvt);
    }
    </script>
    </head>
    <body>
    <a id="myLink" href="http://semalt.semalt.com/crawler.php?u=[RANDOMSITE]">Redirecting ...</a>
    </body>
    </html>
  6. Next, the program requests this URL (http://semalt.semalt.com/crawler.php?u=[RANDOMSITE]), but it’s not clear whether it does so by executing the page’s JavaScript, as a browser would, or by parsing the returned HTML.
  7. Similar HTML is returned, except that this time the href attribute of the ‘myLink’ element contains the URL of [RANDOMSITE].
  8. In the last step, [RANDOMSITE] is honored with a visit carrying the offending HTTP Referer header, which points to semalt.semalt.com/crawler.php.
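
The chain above can be replayed offline to see where the spam Referer comes from. This is a minimal sketch that uses an in-memory table of canned responses instead of the real servers: b01/server01 stand in for the [2DIGITS] placeholders, example.org stands in for [RANDOMSITE], and the HTML is trimmed down to the myLink anchor. Extracting the link with a regular expression corresponds to the "parsing the returned HTML" possibility from step 6; we don’t know which approach ContentFinder.exe really takes.

```javascript
// Hypothetical stand-ins: "01" for [2DIGITS], example.org for [RANDOMSITE].
const SITE = "http://example.org/";
const CRAWLER = "http://semalt.semalt.com/crawler.php?u=" + SITE;
const SEMALT = "http://semalt.semalt.com/semalt.php?u=" + SITE;

// Canned responses keyed by URL, trimmed from what we observed (steps 1-7).
const RESPONSES = {
  "http://b01.openfrost.net/get_link.php": {
    status: 302,
    location: "http://server01.openfrost.com/get_link.php?newagent=1",
  },
  "http://server01.openfrost.com/get_link.php?newagent=1": {
    status: 200,
    body: SEMALT, // plaintext URL
  },
  [SEMALT]: {
    status: 200,
    body: '<a id="myLink" href="' + CRAWLER + '">Redirecting ...</a>',
  },
  [CRAWLER]: {
    status: 200,
    body: '<a id="myLink" href="' + SITE + '">Redirecting ...</a>',
  },
};

// Pulls the href of the 'myLink' anchor out of the returned HTML.
function extractMyLink(html) {
  const m = html.match(/<a\s+id="myLink"\s+href="([^"]+)"/);
  return m ? m[1] : null;
}

// Follows the chain and records each request with the Referer it carries.
function replayChain(startUrl) {
  const requests = [];
  let url = startUrl;
  let referer = null; // absent or unknown for the first requests
  while (url) {
    requests.push({ url, referer });
    const res = RESPONSES[url];
    if (!res) break; // not in the table: this is the final target
    if (res.status === 302) {
      url = res.location; // step 2: a redirect leaves the Referer unchanged
    } else if (!res.body.startsWith("<")) {
      url = res.body.trim(); // step 3: the body is a bare URL
    } else {
      referer = url; // a clicked link sends the current page as Referer
      url = extractMyLink(res.body); // steps 5-7: the scripted click
    }
  }
  return requests;
}
```

Running `replayChain("http://b01.openfrost.net/get_link.php")` yields five requests, and the last one, to example.org, carries the crawler.php URL as its Referer, which is exactly the value that ends up in your analytics.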

The program ContentAgent.exe behaves similarly, except for small differences in the TCP and HTTP details and the URLs (e.g. in step 1 it requests http://b[2DIGITS].openfrost.net/getnew.php using HTTP HEAD instead of HTTP GET).

We can now also explain the high concentration of visits from Brazil mentioned by The New Frontier, because malware statistics for these two programs show they are mainly distributed in that country.

Some frequent IP addresses for the *.openfrost.net and *.openfrost.com servers include:

  • 217.23.11.108
  • 217.23.7.19
  • 217.23.7.130
  • 217.23.7.180
  • 109.236.86.209

All of these IP addresses belong to netblocks owned by the Dutch company Worldstream Internet Solutions, which also hosts the main semalt.com domain.
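
Network administrators who want to spot infected machines can match outgoing connections against the host patterns from steps 1 and 2 and the IP addresses above. A minimal sketch; the `{ host, ip }` record shape is a hypothetical format for entries from your own proxy or firewall log, and the IP list only contains the addresses we observed, so it will go out of date.

```javascript
// IP addresses observed for the *.openfrost.net / *.openfrost.com servers.
const OPENFROST_IPS = new Set([
  "217.23.11.108",
  "217.23.7.19",
  "217.23.7.130",
  "217.23.7.180",
  "109.236.86.209",
]);

// Host patterns from steps 1 and 2: exactly two digits after "b" / "server".
const OPENFROST_HOSTS = [
  /^b\d{2}\.openfrost\.net$/i,
  /^server\d{2}\.openfrost\.com$/i,
];

// Flags a connection record whose IP or hostname matches the lists above.
function isOpenfrostTraffic({ host = "", ip = "" }) {
  return OPENFROST_IPS.has(ip) || OPENFROST_HOSTS.some((re) => re.test(host));
}
```

For example, `isOpenfrostTraffic({ host: "b12.openfrost.net", ip: "203.0.113.5" })` returns true because the hostname matches, even though the IP is not on the list.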

This puts the following tweet from a Semalt representative in an entirely different light, doesn’t it? :)

So, I tried it and talked about it, now I want my money back.

Testing environment:

  • Virtualized Windows 7 Home Premium 32bit SP 1
  • VMware Workstation 9.0.2

Tools used:

Also note that other software out there could behave in the same way.

Update / July 30 — We had just finished writing this article when Google announced it is introducing a new toggle to filter out bots and spiders. Whether Semalt is covered by this filter is unclear at this moment.

Update / August 1 — Please leave a comment or send us an e-mail if you have any additional information you’d like to share. We have published a list of Semalt related IP addresses and domains anyone can contribute to: https://docs.google.com/spreadsheets/d/16nVC8ZR84lPyiiHwd-Hfe6QAdtmy9StbTzkfGN7An-s/edit?usp=sharing

Update / August 8 — We’ve created a simple PHP package to block referrer spammers such as Semalt from visiting your site: https://github.com/nabble/semalt-blocker

Joram van den Boezem
google.com/+JoramvandenBoezem | joram@nabble.nl
