Before I launch into what analytics spam is, I'll give you a really quick run-down of how a JavaScript-based analytics system works.
Overview of JavaScript-Based Analytics
JavaScript-based analytics is a very common method of tracking visitors to a website; it's how Google Analytics works.
To install Google Analytics, all you need to do is place a small snippet of code on each page of your site that you want to track. The code is JavaScript, hosted by Google. It sets a cookie in the visitor's browser, and through a mix of JavaScript, voodoo, and that cookie, the visitor is then tracked as they navigate around your site.
(Note: I wanted to insert a handy little flow chart of how JavaScript analytics works but couldn't find one! If anyone knows of one, please drop a link in the comments and I'll add it to the post. There's a bit more in-depth explanation of how it works here, though.)
So, what do we mean exactly by 'tracked'? Well, each time a visitor comes to your site, the following kind of information is logged:
- What operating system the visitor is using
- What browser the visitor is using
- What screen resolution the visitor has
- Where the visitor came from
- How long they spend on the site for that session
- Which pages they visit
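To make that list concrete, here is a minimal Python sketch of the kind of per-pageview record a JavaScript tracker might send home, and how time on site can be estimated from it. The field names and the PageView, session_durations and pages_per_session helpers are hypothetical illustrations, not Google Analytics' actual data format. Estimating duration as last hit minus first hit is one common approach; in this sketch it means time spent on the final page isn't counted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PageView:
    # Hypothetical fields loosely mirroring the list above
    session_id: str        # ties hits together (via the tracking cookie)
    timestamp: datetime    # when the page was viewed
    page: str              # which page they visited
    referrer: str          # where the visitor came from
    browser: str
    operating_system: str
    screen_resolution: str


def session_durations(views):
    """Estimate seconds on site per session as last hit minus first hit."""
    times = defaultdict(list)
    for v in views:
        times[v.session_id].append(v.timestamp)
    return {sid: (max(ts) - min(ts)).total_seconds() for sid, ts in times.items()}


def pages_per_session(views):
    """List the pages each session visited, in time order."""
    pages = defaultdict(list)
    for v in sorted(views, key=lambda v: v.timestamp):
        pages[v.session_id].append(v.page)
    return dict(pages)
```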
I'm sure I'm preaching to the converted here - you all know what Google Analytics is and how it works, so I'll shut up about that now.
It's important to highlight, however, that one of the differences between JS analytics and log-file-based analytics is that JS analytics won't track visits from search engine robots. Log-file analytics typically do show these visits (which can really skew your data if you're not careful!). The reason is that these robots don't execute JavaScript. We all know this from SEO 101, right? "Don't put your navigation in JavaScript" is one of the basics, precisely because search engine robots don't execute JavaScript and hence can't see those links. (In some cases search engine robots can execute limited JavaScript, but they certainly don't execute analytics scripts.) For the same reason, their visits don't show up in your analytics, either.
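A rough way to see the robot traffic that JavaScript analytics never records is to split your raw server log by user agent. This sketch assumes Apache's "combined" log format, and KNOWN_BOTS is a short, illustrative list of crawler user-agent substrings, not a complete one.

```python
import re

# Apache "combined" format: ip, identity, user, [time], "request", status, size, "referrer", "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative, incomplete list of crawler user-agent substrings
KNOWN_BOTS = ("Googlebot", "Slurp", "msnbot", "bingbot")


def parse_line(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def split_humans_and_bots(lines):
    """Separate hits from known crawlers and everything else."""
    humans, bots = [], []
    for line in lines:
        hit = parse_line(line)
        if hit is None:
            continue  # skip malformed lines
        agent = hit["agent"].lower()
        if any(bot.lower() in agent for bot in KNOWN_BOTS):
            bots.append(hit)
        else:
            humans.append(hit)
    return humans, bots
```

A robot that executes JavaScript and sends an ordinary browser user agent would land in humans here, which is exactly the gap the rest of this post is about.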
Introducing Analytics Spam
So now that you're up to speed and in the right frame of mind, let me tell you two things:
- The information that analytics picks up about a visitor can be faked.
- It is possible to build robots that crawl websites and DO execute JavaScript.
Why would you want to do this? Here are just a few uses:
- Inflating traffic to your site in a natural-looking way to increase its valuation
- Screwing with a competitor's analytics
- Generating ego-bait visits: fake referrals from spam pages that show up in webmasters' analytics and tempt them to click through
An Example of Analytics Spam in Action
For anyone that thinks this is never going to happen or thinks it doesn't work, check out this traffic to my personal blog:
The site in question is: https://iamready4u.synthasite.com/
If you visit the site you'll see it has auto-generated content and what looks like an auto-generated link to my site. I have a hard time believing that this is genuine traffic (though I'm not 100% sure either way, which is what makes the whole prospect even more troubling).
Is This the Future of Web Spam?
Analytics spam has existed for a while for log-file-based analytics, because some analytics packages link to their top referrers from a public, indexable page, which means you can spam for links. Being able to spam JavaScript-based analytics too, however, opens up a whole new world, particularly given the prevalence of Google Analytics.
Could this be the next phase of spam? Will we be seeing Google taking preventative measures against 'spam traffic'? Is Google already taking preventative measures? Personally I doubt much is being done about it as it's probably a very minor problem, but it wouldn't surprise me if this became a bigger issue in the next 12 months.
As a closing thought, there's not really a lot you can do to protect yourself from this. The best tactic I've come up with is running two types of analytics side by side (preferably logging traffic via different methods, e.g., one JavaScript-based and one log-file-based) and comparing the results for anything funky. If done well, fake traffic would be hard to detect, but in reality most spammers will be lazy and won't randomise their traffic enough, so if you look closely it'll stand out.
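Here is a minimal sketch of that cross-check in Python, assuming you can export daily visit counts from each system (js_counts and log_counts are hypothetical inputs). It flags days where the JavaScript/log ratio drifts far from its usual (median) value, and dominant_share measures how concentrated a single attribute is, which is one way lazy, un-randomised fake traffic might stand out. The tolerance is an arbitrary choice.

```python
from collections import Counter
from statistics import median


def flag_divergent_days(js_counts, log_counts, tolerance=0.5):
    """Flag days whose JS/log visit ratio strays far from the median ratio.

    js_counts, log_counts: dicts mapping day -> visit count from each system.
    tolerance: allowed relative deviation from the median (arbitrary).
    """
    ratios = {day: js_counts[day] / log_counts[day]
              for day in js_counts if log_counts.get(day)}
    if not ratios:
        return []
    typical = median(ratios.values())
    return sorted(day for day, r in ratios.items()
                  if abs(r - typical) > tolerance * typical)


def dominant_share(values):
    """Fraction of hits taken by the single most common value, e.g. a screen resolution."""
    if not values:
        return 0.0
    return Counter(values).most_common(1)[0][1] / len(values)
```

For example, if 90% of a day's visits report an 800 x 600 screen resolution, dominant_share returns 0.9, which is worth a closer look.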
I love it. Soon my competitors will think all their visitors have 800 x 600 screen resolutions and will redesign accordingly.
This could be my ticket outta here...
I can think of a few ways this could be used too, but I'll refrain from digressing into exploitation and concentrate on a couple of things to watch out for, where it can cost you $$ if you're a victim:
- If you use an analytics system like WebTrends OnDemand or Omniture's latest and greatest system, with packages priced by the number of pageviews you get... it could cost you money if you're getting spammed.
- As was mentioned before, if someone hits your landing pages or re-uses your tracking code for conversions, it could cost you money in PPC or CPM.
- If you are paying for ad space on a site that has been hit with this ugly beast, you could end up paying for pageviews you didn't really get.
- If your in-house SEO is more blackhat than you knew and is spamming their own site to justify their work... it will cost you their salary until you fire their lying arse.
- I feel like I should start putting "You might be a redneck" after my comments.
Since a lot of us here are SEOs, it's likely that not many people have thought of it from the perspective of an employer who doesn't understand SEO and employs someone in-house. Excellent point!
Referrer URL spamming is one of the first blackhat things I came across and learned about in the late 90's and it still works today to some extent. I don't think that I'd go to the trouble of executing JS though - easier just to bypass it and load the tracking pixels directly. :)
Good point about the tracking pixel - you have all the info you need to do that, don't you....
I think you're right evilgreenmonkey and, as expected a bit better at being blackhat than I am ;-)
Even on the white-hat side, trying to interpret the difference between log-file analytics and JS-based analytics is tricky business. I realized yesterday that my blog gets about 30-50 visitors/day according to Google Analytics, but 500+ according to my log files. Some of that's Yahoo! Slurp, one of the biggest pigs on the internet, but much of the rest is bot attacks, comment spammers, etc. Looking at the difference between the two types of analytics can really help separate your human and robotic traffic.
Clever article, Tom. I also know of several SEOs who base their fee structure on page views. These guys are all ethical, but I would imagine that unethical SEOs could capitalize on this technique in yet another way...
And if you can fake the referrer too, you can make a competitor think that their paid-for advertising is having a really low conversion rate.
...or an employer/client think they have a really high ROI on their ad spend.
I think that for media owners - online job sites in particular spring to mind - there could be quite a bit to gain from inflating traffic as often their rates are based on traffic metrics.
Anyone selling advertising on a CPM basis could also benefit.
Some sites get themselves ABCe certified (in the UK at least - not sure what the equivalent is elsewhere) - does anyone know if they check for traffic spam?
Excellent post :)
If part of the metric for a site's value is its traffic, then this would be a very handy trick. Specify traffic as opposed to members or participation and you're good to go.
I once saw loads of traffic coming to a particular page from stumbleupon. It turned out that a site was using my Analytics code on their site and Google was reporting the visits quite happily.
It's only when you dig deep that you can find the domain.
So to inflate somebody's visitors, just use their Google Analytics code and copy a page of theirs. Stick it in an iframe to be really clever; then you get the referrer too.
I use a number of filters on Google Analytics to split up visits to the real site into one profile, and visits to other sites (includes search engine caches, and so on) into a different profile. With staff having a cookie to ID them, they can also be filtered out of the stats.
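The profile split described in that comment can be sketched in plain Python over raw hit records; this is not Google Analytics' filter configuration. OWN_HOSTS, the "staff" cookie name and the hit dictionary layout are all hypothetical. Hits whose hostname isn't yours, such as a copied page carrying your tracking code, end up in a separate bucket instead of inflating the main numbers.

```python
# Hypothetical hostnames for "the real site"; replace with your own.
OWN_HOSTS = {"example.com", "www.example.com"}
STAFF_COOKIE = "staff"  # hypothetical cookie name set on staff browsers


def split_profiles(hits, own_hosts=OWN_HOSTS, staff_cookie=STAFF_COOKIE):
    """Route each hit to 'main', 'other_sites' or 'staff'.

    Each hit is assumed to be a dict with a 'hostname' key and an
    optional 'cookies' dict.
    """
    profiles = {"main": [], "other_sites": [], "staff": []}
    for hit in hits:
        if staff_cookie in hit.get("cookies", {}):
            profiles["staff"].append(hit)
        elif hit["hostname"].lower() in own_hosts:
            profiles["main"].append(hit)
        else:
            # e.g. search engine caches, or someone else's page using your code
            profiles["other_sites"].append(hit)
    return profiles
```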
Great post Tom.
I've seen quite a lot of this on one of the sites I used to work on. Seemed to be predominantly viagra, porn and poker sites that were doing the spam.
Are there any combined analytics packages (JavaScript and log-file based) that correlate the data?
I believe Unica's Affinium NetInsight does that.
I think what you might be saying is "don't believe everything you read on the internet". I wonder how prevalent this kind of tactic is for people inflating their own traffic - for advertisers / purchasers etc.
I wish I could think of better defences. I think it is hard to spoof IP addresses for this kind of purpose, so that is probably a primary defence (certainly at a network level - e.g. Google could watch out for unusual behaviour). Statistical analysis? I think that would have too many false positives...
Hmmm
Hard to spoof IPs, but easy to switch... If Google watched this behavior on a network level, it would amount to regional discrimination. I imagine that eventually a lot of traffic from Russia/India (for example) would be dismissed, and the accuracy of the data would suffer.
Ouch! The only way to deal with this is to log IP activity and remove suspicious behaviour. You could (if you were particularly mean) distribute an attack like this across multiple IPs by writing a nasty bit of adware, but why would someone want to waste their time and resources doing this? JavaScript-based analytics packages are ultimately flawed. Engines like Searchme execute JavaScript, often with hilarious consequences. I suppose the best solution is to get smarter with in-house analytics and log files.
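Logging IP activity and removing suspicious behaviour could start with something as simple as a per-IP rate check. In this sketch, hits is assumed to be a list of (ip, timestamp) pairs from your logs, and the one-minute window and 30-hit limit are arbitrary examples.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def suspicious_ips(hits, window=timedelta(minutes=1), max_hits=30):
    """Return IPs that made more than max_hits requests within any sliding window.

    hits: iterable of (ip, datetime) pairs. Thresholds are arbitrary examples.
    """
    recent = defaultdict(deque)  # ip -> timestamps inside the current window
    flagged = set()
    for ip, ts in sorted(hits, key=lambda h: h[1]):
        q = recent[ip]
        q.append(ts)
        while ts - q[0] > window:  # drop hits older than the window
            q.popleft()
        if len(q) > max_hits:
            flagged.add(ip)
    return flagged
```

As the comments point out, switching IPs or spreading the traffic across many machines would slip past a check like this, so treat it as a first filter only.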
Interesting stuff Tom, I'd guess that the current key benefit would be traffic inflation rather than spamming people's figures - though the approach of injecting a load of fake visits into a competitor's analytics certainly highlights the more Machiavellian side of online business...
Perhaps you've seen into the future of Google's analytical plans and the next gen Analytics will be a combined JS/Log File package (all presented in pretty graphs and tables so we won't have to work too hard).
For everyone bishing about blackhat information on SEOmoz, I'll throw in my tiny whitehat tip:
To keep from clouding your own analytics data, browse your clients' sites with Firefox + NoScript. You can allow scripts globally if you'd like, and block only google-analytics.com. Alternatively, you can browse without JavaScript.
*finger twirl* woohoo! </sarcasm>
Those of you who are a bit more advanced could hide a cookie somewhere on the site and serve the analytics script to only those without it. In this way you can remove employees, clients, investors, etc. from your analytics data.
Perhaps you could try giving these analytics spammers a similar cookie through a hidden link. Of course, this would require the spammer to enable and accept your cookie. It won't work for every case, but it might help in some.
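The cookie-based exclusion described above might look like this on the server side, assuming pages are rendered by Python code that can read the request's Cookie header. The cookie name no_analytics and the placeholder snippet are hypothetical; the real snippet would be whatever your analytics provider gives you.

```python
from http.cookies import SimpleCookie

EXCLUDE_COOKIE = "no_analytics"  # hypothetical cookie name
# Placeholder; in practice this is your analytics provider's tracking snippet.
ANALYTICS_SNIPPET = '<script src="/analytics-placeholder.js"></script>'


def analytics_tag(cookie_header):
    """Return the tracking snippet unless the request carries the exclusion cookie."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")  # tolerate a missing Cookie header
    return "" if EXCLUDE_COOKIE in cookies else ANALYTICS_SNIPPET
```

Handing the same cookie to a bot through a hidden link, as suggested above, would exclude it the same way, provided the bot accepts and returns cookies.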
Can't you do IP exclusions in Google Analytics (not sure on other packages) to not count visits from staff?
@Matthew: yes, you have this option in Analytics.
"If you want to exclude internal traffic from appearing in your reports, you can filter out a specific IP address or a range of IP addresses." : just follow the link!
You can also serve an "employee cookie" and filter all that traffic out too (though there is a typo in the example code for this in the Google help files).
The IP exclusion doesn’t work if one doesn’t have static IP address. In India internet connection with dynamic IP address is quite common (Like the one I am using now). For such cases one has to use the cookie based approach. As an extreme measure I also create one more profile in Google Analytics, where I just exclude the whole city where I live.
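An IP-range exclusion like the one discussed here can be expressed with Python's standard ipaddress module. The networks below are documentation and private example ranges standing in for your own; Google Analytics itself expresses IP filters differently, so this only shows the matching logic.

```python
import ipaddress

# Example ranges standing in for your office or ISP address pool (hypothetical).
EXCLUDED_NETWORKS = [ipaddress.ip_network(n) for n in ("203.0.113.0/24", "10.0.0.0/8")]


def is_excluded(ip, networks=EXCLUDED_NETWORKS):
    """True if the visitor's IP falls inside any excluded network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

With a dynamic IP, widening the range to cover your ISP's address pool works, at the cost of also dropping real visitors who share that pool, much like excluding a whole city.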
Great post, Tom. This brings up another possibility: faking user behavior in Google's SERPs. Set up bots that query Google for your keywords, click on your URL, and then spend lots of time on your site. It might require you to set up multiple Google puppet accounts, but if done right... you could theoretically increase your rankings by fooling Google into thinking that everyone loves your page. Now if only I knew how to program bots... damn.
Good point. But maybe that's why Google doesn't pay that much attention to usage data. Here is the post on the SEOBook blog, which says:
Peter Norvig - Google Does Not Directly Use Search Usage Data in Relevancy Algorithms
And yes, you wrote,
Now if only I knew how to program bots... damn.
Now I don’t really think that programming bots should be that difficult for somebody as ‘creative as Darren Slatten’
Tom,
Thanks for sharing some new devious and nefarious spamming tactics with us.
I can think of many ways this tactic could be exploited to the benefit of a spammer, both in capitalizing on his client's unawareness of traffic sources (i.e. just showing raw traffic data), as well as duping his competitors.
Fortunately, neither I nor anyone else who reads this would ever consider doing so, but it's nice to be aware of.
Stay tuned for my next post: "Spamming Google Analytics For Fun & Profit!"
Thanks Sean - I'll try and focus on some more whitehat stuff soon. Promise!
I think it teaches an interesting lesson though that you should never trust analytics too much, always cross-reference and verify.
Tom -
I say write whatever you find of value. Whitehat, blackhat - doesn't matter.
The thing about stuff like this is that it's good to be aware of it. As many people have said, it's important to know what the blackhats are up to, particularly if you're a whitehat.
You know the old saying - "Keep your friends close and your enemies closer".
This is a very informative piece and I found it of significant value, especially in the event that the tactic becomes more pervasive.
Good work.
Thanks Sean - I really respect the community here, so positive feedback is always gratefully received :-)
Which is why I still have trouble understanding what the fuss was about after SMX... surely that sentence sums it all up pretty damn well :\
Jane, Read this.
Read it last night; totally agree :)
agree as well and yet I still think you're a troll :P
The thing about stuff like this is that it's good to be aware of it.
What a nice excuse! I just love it.
This is just a continuation of referrer spam, which people used to get backlinks from indexed web statistics.
But still, Google Analytics is a real killer app. Best value. Most web pages need nothing more to get optimized.
Jaak, https://seoapplied.blogspot.com/
Just wondering: has this increased over the past year?
Strangely enough, my site today was showing almost double the average reported 'users online'. This went on all day, and I was wondering whether somebody was trying to mess with my analytics or just with the site... Still not sure what's going on. :(
1) Is it possible to find out which referrers may be doing this?
2) Is there a code or a plug-in that can be used in the analytics code to filter out fake referrers?
3) Is this very specific to Google Analytics, or is it prevalent in some of the high-end analytics solutions like Omniture and WebTrends too?
I guess Google can filter out spammy referrers easily by adding up the total number of referrals sent from a domain across the Analytics network and cross-referencing that with PR or Alexa rank to determine whether traffic in = traffic out. Even easier would be to look for spikes in referrer traffic, as "ego bait" spam needs to be near the top of the referrers list to get a click-through.
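The "look for spikes" idea can be tried on a single site's own data, even without network-wide numbers. This sketch compares each referrer's visits today with its own historical daily average; history, today, factor and min_visits are hypothetical inputs and arbitrary thresholds.

```python
from statistics import mean


def referrer_spikes(history, today, factor=5.0, min_visits=20):
    """Flag referrers whose visits today far exceed their own historical average.

    history: dict referrer -> list of past daily visit counts
    today:   dict referrer -> today's visit count
    factor, min_visits: arbitrary thresholds for illustration.
    """
    spikes = []
    for ref, count in today.items():
        past = history.get(ref, [])
        baseline = mean(past) if past else 0.0
        # max(..., 1.0) keeps brand-new referrers from being compared to zero
        if count >= min_visits and count > factor * max(baseline, 1.0):
            spikes.append(ref)
    return sorted(spikes)
```

A referrer with no history that suddenly sends dozens of visits, like the auto-generated site in the post, is flagged; a large but steady referrer is not.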
Very good post, Tom, especially since there are many "small" businesses that look at those statistics every day (it is a free tool, after all!) and could really be thinking of investing more money on the web based on wrong figures...
On a less diabolical level analytics spam can be used to make that advertiser/affiliate think you're sending them that much more traffic. (I've never done this I swear....)
Seems like you could avoid IP issues by putting the js on a network of link farms or even through XSS exploits. Then whenever a visitor hits a site with the js it fakes some visits to a few other sites, using their IP address.
I'd love to comment more on how this works, but it would be sooo libelous... That, and I'm not sure whether the guy who got fired by a client after I showed them what was happening has been googling me since. (He tried to DoS the client's webserver; however, I had put some code on the server from a previous entity I worked with (read: the bank's security analyst) that does some tracing to narrow down the information range and notifies the ISP's admin to generate log files.) So let's say there are criminal activities I cannot comment on due to NDA and, umm, just sound judgement.
However, since I'll leave the name out (so this won't be a plug), there are certain JavaScript analytics programs other than Google Analytics that send you an email if this happens.
but awatson.. (cough splog, cough same counter on 1000 different splogs.. cough, pay per visit, cough, get money and run)
DISCLAIMER: paisley hasn't violated rules since 1996... well, search engine rules at least. I did get stopped for 110 in a 55 in Dallas today. :(
At Nielsen Online (I used to work there) they do have fraud detection algorithms to check for robots and spiders. Using various metrics, they try to catch these and filter out those records. At the moment it seems pretty effective, but you're right: people could develop spam spiders that do a far better job of looking like 'real' people.
Thank you for the excellent post. I am going back to my own log files as I was seeing some unusual traffic from a strange place and didn't know what to make of it. Now I will look again.
This was also enlightening as to why someone might deploy some of these tactics. I can never understand all the effort that goes into spam bots and what there is to gain but perhaps what I am seeing is just the side effect and the target is the traffic. I had never thought of it that way.
This is very interesting information that you wrote about in this post. Despite the fact that you are not a black hatter, most of the tips you displayed are black hat tactics. Hopefully you understand that spamming is wrong because it completely violates people's privacy, but that aside, this was extremely enjoyable to read. This blog is always nicely written and I look forward to reading more of your great work.
Hi NSS,
Glad you enjoyed it - I know that this is pretty black hat but more white hat stuff will be coming your way soon! Check out my PPC video if you haven't already. That's solid white hat :-)