Before I dive into my inaugural Moz blog post, let me make one thing clear: I'm not a blackhat. We don't do blackhat work for clients, and frankly I don't do much of it in my free time either. I just wanted to clear that up before I start talking about the shadier things online... again!

Before I get into what analytics spam is, here's a really quick run-down of how a JavaScript-based analytics system works.

Overview of JavaScript-Based Analytics

JavaScript-based analytics is a very common method of tracking visitors to a website; it's how Google Analytics works.

To install Google Analytics, all you need to do is place a small snippet of code on each page of your site that you want to track. The snippet loads a JavaScript file hosted by Google. That script sets a cookie in the visitor's browser, and through a mix of JavaScript, voodoo, and that cookie, the visitor is tracked as they navigate around your site.

(Note: I wanted to insert a handy little flow chart of how JavaScript analytics works but couldn't find one! If anyone knows of one, please drop a link in the comments and I'll add it to the post. There's a more in-depth explanation of how it works here, though.)
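In lieu of a flow chart, here's a minimal sketch of the cookie part, written in Python for illustration. In a real tracker this logic runs as JavaScript in the browser and reads and writes `document.cookie`; the cookie name, lifetime and ID format below are assumptions, not how Google Analytics actually does it. The idea is simply: if the visitor already has an ID cookie, reuse it so this page view is tied to their earlier ones; if not, mint a new random ID and store it.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

# Hypothetical cookie name: real trackers use their own names and formats.
VISITOR_COOKIE = "visitor_id"
TWO_YEARS = 60 * 60 * 24 * 730  # cookie lifetime in seconds (an assumption)


def get_or_set_visitor_id(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, set_cookie).

    If the visitor already carries our cookie, reuse its ID and return
    None for set_cookie. Otherwise mint a random ID and return the
    cookie string the browser should store.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if VISITOR_COOKIE in jar:
        return jar[VISITOR_COOKIE].value, None

    new_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[VISITOR_COOKIE] = new_id
    out[VISITOR_COOKIE]["path"] = "/"
    out[VISITOR_COOKIE]["max-age"] = TWO_YEARS
    return new_id, out[VISITOR_COOKIE].OutputString()


# First page view: no cookie yet, so a new ID is issued.
vid, set_cookie = get_or_set_visitor_id("")
# Second page view: the browser sends the cookie back and the ID is reused.
same_vid, _ = get_or_set_visitor_id("visitor_id=" + vid)
print(vid == same_vid)  # True
```

Note that the cookie lives entirely on the visitor's side, so whatever presents it (a browser or a script) decides which ID gets reported.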

So, what exactly do we mean by 'tracked'? Well, each time a visitor comes to your site, information like the following is logged:
  • What operating system the visitor is using
  • What browser the visitor is using
  • What screen resolution the visitor has
  • Where the visitor came from
  • How long they spend on the site during that session
  • Which pages they visit
Along with a bunch of other stuff.
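Under the hood, each of those data points travels to the analytics server as part of a request the tracking script makes on every page view. The sketch below is a hypothetical version of that request: the endpoint, the field names and the session logic are illustrative assumptions, not Google's actual parameters. What it shows is that every field in the payload is supplied by the visitor's browser, and that "time on site" is worked out afterwards from the gaps between hits.

```python
from dataclasses import dataclass, asdict
from typing import Sequence
from urllib.parse import urlencode

# Hypothetical collection endpoint; each real analytics product has its own.
COLLECT_URL = "https://analytics.example.com/collect"


@dataclass
class PageviewHit:
    visitor_id: str  # read from the first-party cookie
    page: str        # path of the page being viewed
    referrer: str    # where the visitor came from (document.referrer)
    user_agent: str  # browser and OS are later parsed out of this string
    screen: str      # screen resolution, e.g. "1280x800"
    language: str    # browser language, e.g. "en-gb"


def beacon_url(hit: PageviewHit) -> str:
    """Encode the hit as the query string of a small beacon request."""
    return COLLECT_URL + "?" + urlencode(asdict(hit))


def time_on_site(hit_timestamps: Sequence[float]) -> float:
    """Seconds between the first and last hit in a session.

    Time on the final page is unknown (no later hit follows it), so a
    single-page visit records zero seconds.
    """
    if not hit_timestamps:
        return 0.0
    return max(hit_timestamps) - min(hit_timestamps)


hit = PageviewHit("abc123", "/blog/post", "https://example.org/",
                  "Mozilla/5.0 (Windows NT 5.1) Firefox/3.5", "1280x800", "en-gb")
print(beacon_url(hit))
print(time_on_site([100.0, 130.0, 190.0]))  # 90.0
```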

I'm sure I'm preaching to the converted here: you all know what Google Analytics is and how it works, so I'll shut up about that now.

It's important to highlight, however, that one difference between JS analytics and log-file-based analytics is that JS analytics won't track visits from search engine robots. Log-file analytics typically do record these visits (which can really skew your data if you're not careful!). The reason is that these robots don't execute JavaScript. We all know this from SEO 101, right? "Don't put your navigation in JavaScript" is one of the basics, because search engine robots don't execute JavaScript and so can't see those links. (In some cases search engine robots can execute limited JavaScript, but they don't execute analytics scripts.) For the same reason, robot visits don't show up in your JS analytics either.
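To see the robot traffic a log file picks up, here's a rough sketch that splits lines in the common Apache/NGINX "combined" log format into crawler and non-crawler hits by user-agent. The list of bot markers is a small hand-picked assumption, not an exhaustive list, and user-agent matching only catches robots that identify themselves honestly.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Combined Log Format: IP, identd, user, [time], "request", status, size,
# "referrer", "user agent".
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hand-picked crawler substrings (an assumption; far from exhaustive).
BOT_MARKERS = ("googlebot", "bingbot", "slurp", "baiduspider", "yandexbot")


def is_crawler(agent: str) -> bool:
    """True if the user-agent string contains a known crawler marker."""
    agent = agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def split_log(lines: Iterable[str]) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Return (human_hits, bot_hits); lines that don't parse are skipped."""
    humans, bots = [], []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        (bots if is_crawler(m["agent"]) else humans).append(m.groupdict())
    return humans, bots


sample = [
    '66.249.66.1 - - [10/Oct/2009:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Oct/2009:13:56:01 +0000] "GET /about HTTP/1.1" 200 1024 '
    '"https://example.org/" "Mozilla/5.0 (Windows NT 6.1) Firefox/3.5"',
]
humans, bots = split_log(sample)
print(len(humans), len(bots))  # 1 1
```

A JS-based tracker never sees the Googlebot line at all, which is exactly the difference described above.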

Introducing Analytics Spam

So now that you're up to speed and in the right frame of mind, let me tell you two things:
  1. The information that analytics picks up about a visitor can be faked.
  2. It is possible to build robots that crawl websites and DO execute JavaScript.
In theory, there's nothing to stop you from building a robot that visits people's sites, executes JavaScript, and looks just like a real visitor. You could even make it blend in with their other traffic by faking browser, OS, language, etc. in the appropriate percentages, based on traffic to your own sites. You could also make the time on site realistic and crawl around the site to avoid a 100% bounce rate. All in all, this would be incredibly difficult to detect.

Why would you want to do this? Here are just a few uses:
  • Inflating traffic to your site in a natural way to increase valuation
  • Screwing with a competitor's analytics
  • Ego-baiting webmasters into visiting your spam pages (they spot your site in their referrers and click through to see who's linking to them)
The hardest part to fake would be the IP address, though even this isn't impossible by any means.

An Example of Analytics Spam in Action

For anyone who thinks this will never happen, or that it wouldn't work, check out this traffic to my personal blog:

[Screenshot: analytics spam in action]

The site in question is: https://iamready4u.synthasite.com/

If you visit the site, you'll see it has auto-generated content and what looks like an auto-generated link to my site. I have a hard time believing that this is genuine traffic (though I'm not 100% sure either way, which is what makes the whole prospect even more troubling).

Is This the Future of Web Spam?

Analytics spam has existed for a while for log-file-based analytics, because some analytics packages link to their top referrers from a public, indexable page, which means you can spam them for links. Being able to spam JavaScript-based analytics too, though, opens up a whole new world, particularly given how widespread Google Analytics is.

Could this be the next phase of spam? Will we see Google taking preventative measures against 'spam traffic'? Is Google already taking them? Personally, I doubt much is being done about it, as it's probably a very minor problem, but it wouldn't surprise me if this became a bigger issue in the next 12 months.

As a closing thought, there's not a lot you can do to protect yourself from this. The best tactic I've come up with is running two types of analytics side by side (preferably logging traffic via different methods, e.g., one JavaScript-based and one log-file-based) and comparing the results for anything funky. If done well, fake traffic would be hard to detect, but in reality most spammers will be lazy and won't randomise their traffic enough, so if you look closely it'll stand out.
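Here's one rough way to put that cross-check into code, assuming you've already exported per-referrer visit counts from both systems, plus per-visit details for any referrer you're suspicious of. The function names, the 50% tolerance and the 90% "same fingerprint" threshold are arbitrary placeholders to tune against your own traffic, not established rules. The first check flags referrers the two systems disagree about; the second flags referrers whose visits are suspiciously uniform.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compare_sources(js_counts: Dict[str, int], log_counts: Dict[str, int],
                    tolerance: float = 0.5) -> Dict[str, Tuple[int, int]]:
    """Flag referrers whose JS and log-file visit counts differ by more than
    `tolerance`, measured as a fraction of the larger of the two counts."""
    flagged = {}
    for ref in set(js_counts) | set(log_counts):
        js, log = js_counts.get(ref, 0), log_counts.get(ref, 0)
        larger = max(js, log)
        if larger and abs(js - log) / larger > tolerance:
            flagged[ref] = (js, log)
    return flagged


def looks_unrandomised(visits: List[dict], min_visits: int = 20,
                       max_share: float = 0.9) -> bool:
    """True if one (user agent, screen resolution) combination accounts for
    more than `max_share` of a referrer's visits. Small samples are skipped."""
    if len(visits) < min_visits:
        return False
    combos = Counter((v["user_agent"], v["screen"]) for v in visits)
    top_count = combos.most_common(1)[0][1]
    return top_count / len(visits) > max_share


js = {"example.org": 100, "spam.example": 40}
logs = {"example.org": 95, "spam.example": 0}
print(compare_sources(js, logs))  # {'spam.example': (40, 0)}

same_fingerprint = [{"user_agent": "UA-1", "screen": "1024x768"}] * 25
print(looks_unrandomised(same_fingerprint))  # True
```

A referrer that shows up only in the JavaScript numbers is worth a closer look, since a real visit should also leave a line in the server log, though caching and dropped log lines can cause honest gaps too.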