The spam in Google Analytics (GA) is becoming a serious issue. Due to a deluge of referral spam from social buttons, adult sites, and many, many other sources, people are starting to become overwhelmed by all the filters they are setting up to manage the useless data they are receiving.
The good news is, there is no need to panic. In this post, I'm going to focus on the most common mistakes people make when fighting spam in GA, and explain an efficient way to prevent it.
But first, let's make sure we understand how spam works. A couple of months ago, Jared Gardner wrote an excellent article explaining what referral spam is, including its intended purpose. He also pointed out some great examples of referral spam.
Types of spam
The spam in Google Analytics can be categorized into two types: ghosts and crawlers.
Ghosts
The vast majority of spam is this type. They are called ghosts because they never access your site. It is important to keep this in mind, as it's key to creating a more efficient solution for managing spam.
As unusual as it sounds, this type of spam doesn't have any interaction with your site at all. You may wonder how that is possible since one of the main purposes of GA is to track visits to our sites.
They do it by using the Measurement Protocol, which allows people to send data directly to Google Analytics' servers. Using this method, and probably randomly generated tracking codes (UA-XXXXX-1) as well, the spammers leave a "visit" with fake data, without even knowing who they are hitting.
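To make this concrete, here is a minimal sketch (in Python) of the kind of hit a ghost spammer can assemble with the Measurement Protocol. The tracking ID, client ID, and referrer below are invented for illustration; the point is that nothing in this request ever touches the target website:

```python
from urllib.parse import urlencode

# Parameters a ghost spammer might send to GA's /collect endpoint.
# The tracking ID is guessed or randomly generated; the spammer
# never visits the site it belongs to.
payload = urlencode({
    "v": "1",                          # Measurement Protocol version
    "tid": "UA-123456-1",              # guessed tracking ID (fake example)
    "cid": "555",                      # arbitrary client ID
    "t": "pageview",                   # hit type
    "dr": "http://spam-site.example",  # fake referrer shown in reports
    # the hostname parameter ("dh") is often omitted or faked entirely
})

endpoint = "https://www.google-analytics.com/collect"
print(payload)
# Sending it is just one HTTP POST, e.g.:
#   urllib.request.urlopen(endpoint, payload.encode())
```

This is also why .htaccess rules can't stop ghosts: the request goes straight to Google's servers, never to yours.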
Crawlers
This type of spam, the opposite of ghost spam, does access your site. As the name implies, these spam bots crawl your pages, ignoring rules like those found in robots.txt that are supposed to stop them from reading your site. When they exit your site, they leave a record on your reports that appears similar to a legitimate visit.
Crawlers are harder to identify because they know their targets and use real data. But it is also true that new ones seldom appear. So if you detect a referral in your analytics that looks suspicious, researching it on Google or checking it against this list might help you answer the question of whether or not it is spammy.
Most common mistakes made when dealing with spam in GA
I've been following this issue closely for the last few months. According to the comments people have made on my articles and conversations I've found in discussion forums, there are primarily three mistakes people make when dealing with spam in Google Analytics.
Mistake #1. Blocking ghost spam from the .htaccess file
One of the biggest mistakes people make is trying to block Ghost Spam from the .htaccess file.
For those who are not familiar with this file, one of its main functions is to allow/block access to your site. Now we know that ghosts never reach your site, so adding them here won't have any effect and will only add useless lines to your .htaccess file.
Ghost spam usually shows up for a few days and then disappears. As a result, sometimes people think that they successfully blocked it from here when really it's just a coincidence of timing.
Then when the spammers later return, they get worried because the solution is not working anymore, and they think the spammer somehow bypassed the barriers they set up.
The truth is, the .htaccess file can only effectively block crawlers such as buttons-for-website.com and a few others since these access your site. Most of the spam can't be blocked using this method, so there is no other option than using filters to exclude them.
Mistake #2. Using the referral exclusion list to stop spam
Another error is trying to use the referral exclusion list to stop the spam. The name may confuse you, but this list is not intended to exclude referrals in the way we want to for the spam. It has other purposes.
For example, when a customer buys something, sometimes they get redirected to a third-party page for payment. After making a payment, they're redirected back to your website, and GA records that as a new referral. It is appropriate to use the referral exclusion list to prevent this from happening.
If you try to use the referral exclusion list to manage spam, however, the referral part will be stripped since there is no preexisting record. As a result, a direct visit will be recorded, and you will have a bigger problem than the one you started with: you will still have spam, and direct visits are harder to track.
Mistake #3. Worrying that bounce rate changes will affect rankings
When people see that the bounce rate changes drastically because of the spam, they start worrying about the impact that it will have on their rankings in the SERPs.
This is another mistake commonly made. With or without spam, Google doesn't take into consideration Google Analytics metrics as a ranking factor. Here is an explanation about this from Matt Cutts, the former head of Google's web spam team.
And if you think about it, Cutts' explanation makes sense: although many people have GA, not everyone uses it.
Assuming your site has been hacked
Another common concern when people see strange landing pages coming from spam on their reports is that they have been hacked.
The page that the spam shows on the reports doesn't exist, and if you try to open it, you will get a 404 page. Your site hasn't been compromised.
But you have to make sure the page doesn't exist. Because there are cases (not spam) where some sites have a security breach and get injected with pages full of bad keywords to defame the website.
What should you worry about?
Now that we've discarded security issues and their effects on rankings, the only thing left to worry about is your data. The fake trail that the spam leaves behind pollutes your reports.
It might have greater or lesser impact depending on your site traffic, but everyone is susceptible to the spam.
Small and midsize sites are the most easily impacted - not only because a big part of their traffic can be spam, but also because usually these sites are self-managed and sometimes don't have the support of an analyst or a webmaster.
Big sites with a lot of traffic can also be impacted by spam, and although the impact can be insignificant, invalid traffic means inaccurate reports no matter the size of the website. As an analyst, you should be able to explain what's going on even in the most granular reports.
You only need one filter to deal with ghost spam
Usually it is recommended to add the referral to an exclusion filter after it is spotted. Although this is useful for a quick action against the spam, it has three big disadvantages.
- Making filters every week for every new spam detected is tedious and time-consuming, especially if you manage many sites. Plus, by the time you apply the filter, and it starts working, you already have some affected data.
- Some of the spammers use direct visits along with the referrals.
- These direct hits won't be stopped by the filter, so even if you are excluding the referral, you will still be receiving invalid traffic, which explains why some people have seen an unusual spike in direct traffic.
Luckily, there is a good way to prevent all these problems. Most of the spam (the ghost variety) works by hitting random GA tracking IDs, meaning the offender doesn't really know who the target is; for that reason, either the hostname is not set or a fake one is used. (See report below)
You can see that they use some weird names or don't even bother to set one. Although there are some known names in the list, these can be easily added by the spammer.
On the other hand, valid traffic will always use a real hostname. In most cases, this will be the domain. But it can also result from paid services, translation services, or any other place where you've inserted GA tracking code.
Based on this, we can make a filter that will include only hits that use real hostnames. This will automatically exclude all hits from ghost spam, whether it shows up as a referral, keyword, or pageview; or even as a direct visit.
To create this filter, you will need to find the report of hostnames. Here's how:
- Go to the Reporting tab in GA
- Click on Audience in the lefthand panel
- Expand Technology and select Network
- At the top of the report, click on Hostname
You will see a list of all hostnames, including the ones that the spam uses. Make a list of all the valid hostnames you find, as follows:
- yourmaindomain.com
- blog.yourmaindomain.com
- es.yourmaindomain.com
- payingservice.com
- translatetool.com
- anotheruseddomain.com
For small to medium sites, this list of hostnames will likely consist of the main domain and a couple of subdomains. After you are sure you got all of them, create a regular expression similar to this one:
yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com
You don't need to put all of your subdomains in the regular expression. The main domain will match all of them. If you don't have a view set up without filters, create one now.
Then create a Custom Filter.
Make sure you select INCLUDE, then select "Hostname" in the Filter Field, and copy your expression into the Filter Pattern box.
You might want to verify the filter before saving to check that everything is okay. Once you're ready, save it, and apply the filter to all the views you want (except the view without filters).
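If GA's built-in verification is inconclusive (a common occurrence with small data samples), you can sanity-check the expression offline first. This is a rough sketch in Python using the placeholder hostnames from the list above; GA applies the Filter Pattern as a partial-match regular expression, which re.search mirrors:

```python
import re

# The same expression you would paste into the Filter Pattern box.
valid_hosts = re.compile(
    r"yourmaindomain\.com|anotheruseddomain\.com|"
    r"payingservice\.com|translatetool\.com"
)

hits = [
    "yourmaindomain.com",       # main domain -> kept
    "blog.yourmaindomain.com",  # subdomain   -> kept (main domain matches)
    "translatetool.com",        # service     -> kept
    "(not set)",                # ghost spam  -> dropped
    "free-share-buttons.com",   # ghost spam  -> dropped
]

# An INCLUDE filter keeps only hits whose hostname matches the pattern.
kept = [h for h in hits if valid_hosts.search(h)]
print(kept)
```

Note that because the match is partial, the main domain also covers all of its subdomains, exactly as described above.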
This single filter will get rid of future occurrences of ghost spam that use invalid hostnames, and it doesn't require much maintenance. But it's important that every time you add your tracking code to any service, you add it to the end of the filter.
Now you should only need to take care of the crawler spam. Since crawlers access your site, you can block them by adding these lines to the .htaccess file:
## STOP REFERRER SPAM
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]
RewriteRule .* - [F]
It is important to note that this file is very sensitive, and misplacing a single character in it can bring down your entire site. Therefore, make sure you create a backup copy of your .htaccess file prior to editing it.
If you don't feel comfortable messing around with your .htaccess file, you can alternatively make an expression with all the crawlers, then add it to an exclude filter by Campaign Source.
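If you take the filter route, a quick sketch (in Python, assuming only the two crawler domains named above) shows how to build that Campaign Source expression from a plain list, with re.escape keeping the dots literal:

```python
import re

# Known crawler-spam domains; extend this list as new crawlers appear.
# (Only these two are named in the post; any others are up to you.)
crawlers = ["semalt.com", "buttons-for-website.com"]

# re.escape makes each domain match literally, then "|" joins the
# alternatives into one expression for an EXCLUDE filter.
pattern = "|".join(re.escape(d) for d in crawlers)
spam_filter = re.compile(pattern)

print(pattern)
```

Paste the printed pattern into an exclude filter on Campaign Source; when a new crawler shows up, append it to the list and regenerate the expression.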
Implement these combined solutions, and you will worry much less about spam contaminating your analytics data. This will have the added benefit of freeing up more time for you to spend actually analyzing your valid data.
After stopping spam, you can also get clean reports from the historical data by using the same expressions in an Advanced Segment to exclude all the spam.
Bonus resources to help you manage spam
If you still need more information to help you understand and deal with the spam on your GA reports, you can read my main article on the subject here: https://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/.
Additional information on how to stop spam can be found at these URLs:
In closing, I am eager to hear your ideas on this serious issue. Please share them in the comments below.
(Editor's Note: All images featured in this post were created by the author.)
Great timely article. We have seen a huge increase in ghost referrer spam in the last year. You can also setup a retroactive filter via table filter/custom segment as a view level filter will not go back in time. Great article!
Great tip Kevin, a custom segment would work perfectly to filter historical data.
Kevin,
Thanks for sharing. That's totally right: using the same expressions for the filters in an advanced segment will help you see clean historical data. It is also an excellent way to test your filters (no matter if they are for spam or another purpose).
You can also find a link just before the Bonus Resources with a guide to clean historical data with an advanced segment.
I agree that this is the best working solution at the moment. However, I've noticed the Ghost Spam is getting smarter. We're seeing the spam hitting the correct hostname increasingly often. I'm afraid this fix will only be temporary.
Agreed, this fix is already past its best, unfortunately. You can get a bit closer by filtering for screen resolution also, but that still won't catch everything.
Yup, came here to point this out. There is no "one filter solution" to this.
Thank you for commenting WinPratt.
This post is mainly focused on ghost spam; it is the most aggressive and most common type of the last few months, and it has hit many, many people.
What you are talking about is crawler spam. This type of spam has existed much longer and, as its name suggests, it crawls your site; therefore, it knows who it is hitting and records a valid hostname. As mentioned at the end of the article, you have two options for crawler spam: a filter or a server-side solution like the .htaccess file.
There are one or two new ghost spammers each week, while new crawlers show up every few weeks, sometimes months. You can check the list I mention in the article.
I manage several GA properties, and there are always hundreds and even thousands of hits from ghost spam, while there are only a couple dozen from crawler spam. The reason is that the latter requires more effort and resources than the method used by ghost spam.
Of course, there might be exceptions and the spammers may find other ways, but as for now the valid hostname filter will still take care of most of the spam, and if you combine it with a crawler filter you will be good to go.
You can find my main article on the Bonus resources part that will walk you through all the steps for both filters and other recommended settings to ensure that only valid traffic is passing.
Hope this helps you understand the difference between spam types.
You can use the measurement protocol to override hostnames. Trust me, the hostname filter solution has been around forever, and many of the bots have since moved beyond it.
Granted, you're right that simply setting up a hostname inclusion filter will get rid of most of it. But there's a really simple way to prove that "ghost spam" can match hostnames: check an analytics profile for a domain that's not online anymore. You'll see some that match the hostname.
That's right, you can use the measurement protocol to override almost any dimension; even the hostname can be faked, and they do it using known names like Amazon or Google. But since they are targeting random tracking IDs, it is almost always an invalid hostname; most of the time they don't even bother to fill it in and leave it as (not set).
To have a valid hostname, the ghost spammer must first know who it is hitting. And to do that, it has to crawl your site; therefore, it is not ghost spam anymore. Of course, there could be exceptions where someone is hit directly.
To be 100% sure whether the spammer is a crawler or a ghost, the best way is to check your access log; if you don't find the hit there, then it's a ghost. You can also see a demonstration of this in the article I mention above.
As you correctly state and in a normal case, the valid hostname filter will take care of most of the spam, leaving you with a small percent to deal with, using any of the other methods suggested.
I'm not saying that you don't have to put some time into it; I'm just saying that using these methods will save you a lot of time compared with other solutions.
Thank you for commenting Garfield
What do we do for (not set) hostnames?
Which filter do we use in this situation to control the fake traffic in Analytics?
Regards
Mohit
Mohit,
(not set) is filled in by GA when a value is not present, so it's not possible to filter this string directly. The valid hostname solution explained in the article should take care of these hits, or alternatively you can filter using another dimension like Campaign Source.
That's cool, Carlos! :)
I'd like to add one more tool, which I got from a Twitter thread. It works great to filter the referral spam.
Here is the link: https://referrer-spam.help/
I hope it'll be useful for the community.
Cheers!
I applied this filter, Hardik. It works for me. What Carlos says here makes sense, and I'm gonna add that too.
Thanks Hardik, I had added the filter Carlos describes for my sites; it works for the time being, but we noticed that some real hostnames are also used by ghost spam. I have used the tool you suggested, and it's mind-blowing.
Thank you for the tool recommendation Hardik, I'm trying this out right now! How long does it usually take to see the filters post to your analytics account? These should be visible under the account filters correct?
Here is a nice open source list of almost 550 spam domains: https://github.com/ddofborg/analytics-ghost-spam-list
Thank you for sharing Dennis, the filter explained in this article will take care of almost all of them :)
Voted up because this an excellent post! How do I know? I read the article, went, "Oh crap!", and then immediately went into my company's Google Analytics and created the filter. Boom, done. Problem solved.
Thanks, Carlos!
You are most welcome Samuel,
I'm glad you could apply it and make it work for your Company.
Thank you for the support!
Those spam referral sites are so annoying! Thanks for this fix.
Thank you for your article, it's just what I wanted. I don't have much traffic, but this spam is very annoying anyway.
Great article, Carlos!
And very useful for me. I've been trying to stop spam traffic in my analytics without success. Around 33% of my traffic may be spam. So, I've just set up the filters. Let's hope they work!
Thanks and count on my vote!
Best regards
A few caveats that I don't see mentioned:
1) This solution doesn't work retroactively. I think that this makes the title over-promise vs. expectations for many people. Yes, you can apply an advanced segment, but this often results in sampling over longer periods of time.
2) This takes maintenance and it is really tough to be proactive. You put this one filter in place, and then a new spam piece pops up tomorrow. It's whack-a-mole. Many MP spam hits come in and leave within a day or two, so this solution takes a ton of active maintenance, not just "1 filter".
3) That filter will eventually run into character limits and become more than one filter. Nit-picking a little, but that's the case.
I have long maintained that the only solution to this problem is a Google-led data annulment. I have faith that they will take this step on our behalf.
Jeff,
I guess you are thinking of a different approach, where you use one expression with all the referrals and have to keep adding to it every time a new one shows up. As you mention, this is inefficient, and you will need more than one filter when you reach the character limit. But this is not what I'm trying to show.
The approach I'm using, and what is explained in this post, is that instead of chasing the ghost spam and excluding it after it shows up, you don't let it in from the beginning: you allow only valid traffic with a filter based on your valid hostnames. And since ghost spam uses a fake/invalid hostname, it will automatically be left out.
This filter requires minimal maintenance, and for some sites even none, since it is unlikely that you will add hostnames frequently.
Although the post is focused on ghost spam (the most aggressive and most common), there is another type of spam, the crawlers; these use a valid hostname since, as the name implies, they crawl your site. They are much less frequent and need a different filter. I also talk about them near the end of the post.
You are right about filters only working from the moment they are applied onward; the same valid hostname expression can be applied in an advanced segment. I didn't go too deep into this in the post, but the last link before the Bonus Resources directs to another of my posts on the topic.
By the way, I read your article almost as soon as you published it, and I left a comment on it. I recommend the article and the blog; I've found valuable information there.
Thank you for commenting.
100% agree it's up to the Google Analytics team to fix this issue. I'm growing less confident that they will, since they haven't exactly been proactive or even publicly vocal about the issue, at least that I can recall.
Nice post Carlos!
The parts explaining the types of spam and most common mistakes were very useful.
I totally get that filtering out all new spam on a regular basis is quite tedious, but I think your suggestion to include only the known hosts isn't ideal either. If you do this, you will also be filtering out good future referrals. So, say your client affiliates with some sites after the filter has been set up; traffic from these new affiliate sites will also be filtered out. It's a good solution, but you will still need to update the inclusion list on a regular basis.
Modestos,
That's right; as I mentioned, it is important to update the hostname expression when needed, for example, every time you add the tracking code to a new service. However, this is far less frequent than updating the filters for each new spammer. Some websites won't have to update the expression at all.
Another advantage is that the hostname solution is preventive, while adding filters for each spammer is corrective, and even if you act fast you will already have some hits. But the one I think is most important is that it will also stop the fake direct traffic that some of the spammers leave (probably as a misconfiguration when trying to spam), which otherwise is really hard to detect.
Either way you have to take care of the spam, and choosing the best way depends on the website characteristics.
Thank you for commenting.
Seriously Carlos is the man. I invested in his managed service where he installs the filters for you and reports back with valid hostnames and all the filters applied. He also checks in 2 weeks later to make sure everything is working properly. If you can't be bothered to do this I highly recommend just having him set it up for you.
Thanks Luiz!, as always it's a pleasure working with you.
Thank you for this information Carlos, I was always updating the .htaccess file and Analytics and also installed a WordPress plugin for referrer spam; your solution is simpler and easier :)
Thanks a lot!
With little tweaks we were able to get rid of all spammy GA traffic!
Wow, this post is just what I was waiting for.
Thank you, Carlos. I realized we have spam in GA and now we're able to fix it.
Hi There, Carlos, Thanks for this information. I do have a question, though. In your instructions about adding the hostname filter to filter out ghost referrers, you say, "This single filter will get rid of future occurrences of ghost spam that use invalid hostnames, and it doesn't require much maintenance. But it's important that every time you add your tracking code to any service, you add it to the end of the filter."
I don't know what that means. Does it mean, for instance, GA tracking codes and FB pixels? It would help to give examples of what tracking code you are speaking about, what service, and explain what you mean by "add it to the end of the filter." Thanks very much for the clarification! Pamela
Hi Pamela, Google Analytics can be integrated with many services by simply adding the tracking code. Each time you add this code (UA-1111111-2) to a service, that service will become a valid hostname. For example, if you integrate your GA with YouTube, every time someone watches one of your videos, it will be recorded in your GA, but with the hostname youtube.com.
Thank you so much for this! Really interesting article and awesome solution! It worked for me! :)
Awesome, that's exactly what I needed!
Br
Superb approach, I love this article.
Thanks again, keep posting. Regards from Professional web design wefixtexas.
Thanks for your article, very useful!
Carlos
Thanks for sharing your solution with us.
I've been fighting the spam on my blog for a very long time, and if I block one of them, I get three new ones the next day :/
Such a lovely piece of writing. I was doing this manually, and it takes ages for 13 sites, but this was an easy setup. Thanks Carlos.
Hey Carlos,
Thanks for the love! I learned more about ghost referrals through the comments on my article, which I had to direct people to on a regular basis. Glad to see you have made a guide for how to deal with the ghosts! I now see Google announced they are working on a solution for this as well.
Have a good one!
Great article Carlos, I have spent lots of time creating a separate filter for every referral site, and I found your article very useful.
Great article. I set aside time every week or so to check out the referral traffic and plot the data into my traffic chart to see if anything suspicious is actively bloating my stats. I quite enjoy hunting them down, although I have a ton of filters. So, thanks for the other pieces of advice; this will be a great help.
Thank you for this great tut. Ghost spam is really an annoying thing. If I try to verify the inclusion filter, I get this error: “This filter would not have changed your data. Either the filter configuration is incorrect, or the set of sampled data is too small.“ Is it because of the hyphen in my domain, for example:
Filter Pattern
firstname-name\.de|translate.googleusercontent\.com
Thank you
Hi @Thunderbirdshunder, this is a common message when using the verification feature. This feature takes a small sample of data from previous days, and sometimes it doesn't find a match in that data. As an alternative, you can test your filter using a quick segment. Here is an example: https://www.ohow.co/verify-your-filters-to-protect-your-data-in-google-analytics/
But if you followed the instructions carefully, you should be fine.
Perfect timing on this article! I've seen a huge spike in weird referring traffic with a few of the same examples I saw above. Thank you!!
You're the man, Carlos!
This article helped me out big time back when I first saw it on ohow. I HATE bad data, so it was really frustrating and time consuming to try all kind of weird fixes, that would only work temporarily and need constant modifications.
Glad this got posted on Moz, I'm sure this will help a lot more people out, so we can all get back to working with REAL data.
Thanks for the support!
I think everyone feels that frustration, especially in the last few months, when you could find one or two new spammers every week and keeping up with the filters was unbearable.
Luckily, I've been using this solution since the beginning of the year, and I've also been sharing it since then. I'm glad you found out about it and hope other people benefit from it too.
Hi, nice article. I, like Peter, am using an ID ending higher than -1. It has been 5 months that I've been safe from ghost spam. I know this can end anytime, but until now it is working...
Great article, however, I got so tired of dealing with the spam / ghost traffic through GA, and setting up filters, that I completely dumped GA for Clicky analytics. So much happier now. Clicky has already done their homework and have blocked this junk from accessing a site and messing up traffic reporting.
Thank you!
Referral spam sites are very annoying...
Hi Carlos,
Thanks for sharing this article. Today a client sees ghost spam with hostname translate.googleusercontent.com. See: screenshot
If I understand you correctly I should not filter out the translate.googleusercontent.com from the result because this is a legit hostname. Is that correct?
Thanks for clearing this up for me.
Cheers, Daniel
Thank you for sharing Strila,
This hostname is used when someone translates one of your pages. But now it is also used by some spammers; you can find it on sexyali and a few others.
So I don't recommend adding it to your valid hostname filter unless you have a significant amount of real hits because your pages are translated constantly.
In that case, you can treat these spammers as crawler spam.
Thanks! Can I safely say here that the only hostname to use is that of the domain? screenshot of hostnames
Excellent article, thank you. I am learning SEO in a big way right now, but I'm stuck on how to add my regular expression to a custom report like you mention. I would appreciate your help. Thanks!
Thanks Mark,
The expression should be used in a filter; you can find filters in the administration panel of Google Analytics. There is a link to my main article on this subject in the Bonus resources section at the end of this post, where you will find a more detailed walkthrough. Hope it helps.
Hi Carlos,
Would this filter not just end up excluding traffic from host names you haven't added to the filter that are genuine?
TWSI, that's right; if you don't add a hostname, it won't be included. That is why it is important to get a list of all your hostnames and add them to the expression. If you later add a new hostname, you should add it to the filter too.
The good thing is that this doesn't happen too often on most sites; some of them won't need to update the filter at all.
If you are not comfortable applying the filter directly to your main view, you can create a test view first and, once you are convinced that the filter is correctly configured, move it over.
Great suggestion! I will give it a try. We've been getting tons of ghost spam. Thanks for sharing.
Greetings, I've tried all of these techniques...neither Include nor exclude filters seem to work. The spam referrals still show up in my analytics. Please advise.
@rocket, if you followed all the steps carefully, you will get rid of all ghost spam; a different filter will be needed for crawlers. If you need a more detailed walkthrough, you can find my main article in the Bonus resources section at the end of this post.
Bravo! Every time I read one of your posts they are spot on!
Yes, in the bigger picture we are waiting for Google to protect our Analytics and IDs, but remember: unless Google sees something affecting "them" making money, "we" the actual users of the internet should not hold our breath. Taking action, even if it is only a temporary or partial solution, is a must-do for us now instead of waiting for the 800 lb gorilla to wake up and swat the pesky flies for us.
This article helps the majority of us out here who are running businesses and don't have the time to stay on top of the crazy, bad, and aggravating things that hackers and kiddie scripters with no sense of moral boundaries are doing every day! Most of us who are not paid to be techies every day are just trying to stay alive in the aftermath of Google's Panda & Penguin expedition to channel them more money.
Thanks Carlos, keep up the good work!
You are most welcome, I'm happy that you found value in my articles.
Thanks for the support.
Carlos, if I ever meet you in person I would like to shake your hand and buy you a drink.
I've been blocking semalt spin-offs for what seems like forever, and I was beginning to develop a nervous twitch from the fact that a few spam domains were continuing to get through my htaccess block. Thank the god of glazed doughnuts I stumbled across this: https://www.ohow.co/what-is-referrer-spam-how-stop-it-guide/ and read about ghost spam for the very first time (how on earth did I miss the term "ghost spam" for this long?!). I swiftly Googled "ghost spam" and whaddya know - this very Moz post is sitting at the top. Just sitting there like "Yeah, I've been here all along. You took your time.".
Now it all makes sense. I've been trying to do battle with ghosts (spam).
No longer do I need to endlessly trawl stackoverflow forums trying to understand why my htaccess code is working for some referral spam but not others.
Gone are the long nights looking wistfully out of the window and longing for an apocalyptic rain of fire and brimstone to take out all the darn referral spam.
I can now hold my head high and smile knowing that there is hope. There is still hope.
Thank you from the bottom of my SEO heart.
Sara!
I'm very, very glad that the articles helped you understand more about this issue. I hope they also helped you stop this threat.
Thanks for those well written and lovely words.
This was a huge issue for us in early spring. Initially I jumped to the conclusion that all of our sites were being hacked before I educated myself further and realized what was happening. Great article that I can use as a reference when clients ask about their referral data. When blocking crawler spam via htaccess, I assume this is a perpetual change as new bots pop up?
Steve, you can think of the lines in the htaccess file as filters (for the way they're used here); you can add and remove lines as needed. Each line blocks a specific spammer, so for example, this line will only block referrals containing buttons-for-website[.]com while it is in the file.
<code>RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]
RewriteRule .* - [F]</code>
If crawler spam with a different name shows up, you will need to add another line to your htaccess file.
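For readers who want to see the logic those htaccess lines implement, here is a rough Python sketch (the blocklist entries are just examples, not an endorsement of any particular list); it mimics checking the Referer header against a set of patterns and deciding whether to refuse the request, which is what `RewriteRule .* - [F]` does:

```python
import re

# Example blocklist mirroring htaccess RewriteCond lines; each entry is a
# regex matched case-insensitively against the Referer header ([NC] flag).
BLOCKED_REFERRERS = [
    r"buttons-for-website\.com",
    r"semalt\.com",
]

def is_blocked(referrer: str) -> bool:
    """True if the referrer matches any pattern, i.e. the request would
    get the 403 Forbidden response produced by RewriteRule .* - [F]."""
    return any(re.search(p, referrer, re.IGNORECASE) for p in BLOCKED_REFERRERS)
```

Note that blocking this way only helps against crawler spam; ghosts never send a request to your server, so there is no referrer to check.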
Great, timely article. We have seen a huge increase in ghost referrer spam in the last year. You can also set up a retroactive filter via a table filter/custom segment, since a view-level filter will not go back in time.
Hello again, could someone please lend me a hand? Someone who got this to work correctly? Maybe we could Skype and share screens and you could help walk me through this, please? I am doing something wrong. Here is my Skype ID if you could add me to your contacts so we can share screens: puttmaster. Thanks so much to anyone who is willing to help me with this; it is very frustrating. Thanks in advance.
Steve, check my answer on your previous comment and see if that helps.
No, these are new hits. I am setting the date range to start the day after I added the filter, so it is all new data. Thanks in advance for the help, Carlos. Maybe we could get on Skype and share a screen?
Steve, you can find my main article on the subject at the end of the post, in the Bonus resources section. I just updated it; you will find a detailed walkthrough, with screenshots, for this type of spam and others.
If someone new links to you and starts sending you valid traffic, did you just inadvertently filter them out because they are not on your filter list?
Benham, I think you are confusing how the filter works: you should include your valid hostnames, not the referrals.
Every real referral will arrive on one of your hostnames, so you won't be excluding those, only the spam that carries an invalid hostname.
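A quick way to sanity-check a valid-hostname expression before pasting it into GA is to run it against some sample hostnames with the same regex semantics the filter uses. A small Python sketch (the domain names below are placeholders; substitute your own):

```python
import re

# Placeholder expression: list YOUR real hostnames, separated by "|",
# exactly as you would type them into the GA include-hostname filter field.
VALID_HOSTNAMES = r"yourdomain\.com|yourstore\.com"

def keep_hit(hostname: str) -> bool:
    """An INCLUDE filter keeps only hits whose hostname matches the
    expression; everything else, including ghost spam's fake hostnames
    and referral-looking hostnames, is dropped."""
    return re.search(VALID_HOSTNAMES, hostname, re.IGNORECASE) is not None
```

This makes the point above concrete: a legitimate referral is a *source*, but the hit itself still lands on one of your hostnames, so it passes the filter.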
Worked like a charm!
Thanks Carlos, will give this a try. Is there any word from Google on referrer spam? I've looked and haven't found anything from them. You'd think they'd have an interest - even make it a priority - to address one of their major platforms being rendered useless by their nemesis (spam).
I've seen a few posts from them in the Google Analytics group on G+. I think they are addressing the issue; if not, this would probably be a lot worse. The truth is that this method will become less and less efficient for the spammers now that people are aware of it.
Thank you for commenting @dino64
Great post Carlos, we're up to 3 filters now on average combating spam sites. Wish they would get rid of the 250-character limit.
I guess you are using the Campaign source filter. Don't add more than one INCLUDE hostname filter or else it will not work.
Hi Carlos,
Thanks for the article! I have a similar method (though just with GA filters, no .htaccess) that I've turned into a free service here: https://www.quantable.com/analytics/google-analyti... for anyone that wants to automate adding the kind of filters you are talking about.
It was really helpful. Ty so much!
The one thing I am confused about in this post is what happens to valid hostnames that show up in the future? Won't these be filtered out since they are not on the INCLUDE list of valid hostnames?
Will,
The hostname filter, like any other filter, should be updated when necessary, as mentioned in the article. You may wonder what the benefit of this filter is if you have to update it. The truth is that most websites rarely change hostnames, and if they do, it will still take far less time than updating filters for every new spammer.
Thank you for commenting
I was really worried about the spam referral visits. Thanks Carlos for your valuable post. Before, I was using the .htaccess code to block the spam sites, which I got from a Moz blog post by Jared Gardner published on March 18th, 2015. But I think the .htaccess blocking only stops a few spam sites, not all of them.
I have one doubt, Carlos: will these spam bot attacks affect the CPU usage of the server? I have this problem on 2-3 projects. The server admins are regularly contacting me and saying that my websites increase the CPU usage of their servers. Do you have any suggestions? Do these spam bot visits increase the CPU usage of our servers?
Brahmadas, you should consider the different types of spam: ghost spam never touches your server, so it can't affect CPU usage, while crawler spam does actually hit your site.
You can find more details about this in my main article on spam in GA; the link is in the Bonus resources section at the end of the post.
Hmm, would it be useful to have a web app where you provide the CSV export and it generates the regex to be included in Analytics? I would consider creating one if I get some interest...
Amazing article Carlos.
Just wondering what the objective of spammers can be here (in the case of ghosts), other than spoiling your numbers?
Thanks Nitin,
The spammers have different motives: some want to promote a service; others will redirect you to an online store, and through an affiliate program they get a commission if you buy something in the following days. You would be amazed at the amount of traffic they get, so I guess some visitors fall into their nets.
We also noticed unusual traffic from the "com, ip-pool.com, linode.com, thousandeyes.com, axcelx.net, dimenoc.com, svwh.net, tux-support.com" network domains.
This spam traffic caused a huge increase in direct traffic and in the bounce rate of our home page in May and June 2015.
Kamgir, you should focus on the hostnames, not the networks, since you can have both spam and valid traffic from the same network. When you are in the Network report, at the top you can see a small link that says Hostname; click there.
Thanks for commenting.
Many times ghost spam uses our own hostname... so how much time must we spend checking referral sites every day? A year ago, when classic GA migrated to Universal, the first spambot, Darodar, started this war game... and Google, a big, big, big company, doesn't have a good solution? Bah...
Marco,
I know spammers are making it hard. In a perfect scenario we would only focus on analyzing our data, but if you follow these steps you won't need to check your referrals every day; you will take care of all ghost spam.
You will only need to check for crawler spam, but these are far less frequent than ghosts, so checking once a week or every two weeks should be enough.
Loved this article and thank you for answering the "Why are they doing this?" question here in the comments. It's been something that has been bothering us for a while and we weren't sure if it was something we were doing to attract these hits or if it was purely random, this article is both helpful AND gives us and our clients peace of mind.
Thank you for putting this together!
Thanks so much for the article!
I've been looking for the best solution to get rid of referral spam for a while. I'll give this a go!
I'm not surprised your post has made it to YouMoz. I followed it some time back, when all the Russian spammy referrals started to appear in my Analytics data. Your filters and crawler controls have significantly helped me clean up my data.
Great work Carlos :) and thanks for sharing.
Tim
Thank you for the support Tim, glad you found this solution early and got rid of the spammers before it exploded.
This will help me a lot. I had been searching the internet for a while for a way to deal with ghost spam.
Thank You
God Bless
Hi Carlos,
Nice explanation. This way you can block them only in your Google Analytics, not on your website. We don't know what type of information they are getting/taking from our websites while visiting, and they also reduce server speed. I think blocking them through the .htaccess file is the best way to stop them from visiting your website.
Another way is to track these websites' IPs and block them on your server; the problem in such cases is that they use random IPs with different hostnames. So it is a lengthy method, but very effective.
Hi Vishal,
As I mention above, we can split the spam in GA into two types: ghosts and crawlers. The post is mainly about the first; ghosts can't be blocked using any server-side solution like the htaccess file, since they never interact with your page in any form, so filters are the only solution. They are also the vast majority.
On the other hand, crawlers do access your site and therefore can be blocked, as you mention. However, these configuration files are powerful, and a misconfiguration can bring down the site, so for people who don't feel comfortable editing them (probably not your case), it is better to use filters as well. There is also a section at the end of the post that talks about this.
Thanks for reading and commenting
Completely agree with you, Carlos!
Ghost spam can't be blocked. I have seen cases where referral spam shows visits in Google Analytics even without the tracking code being added to any website; maybe they are targeting random web properties. Things will get harder for sure in the future.
Have a great weekend. :)
Thanks Vishal,
That was one of the things that led me to investigate this issue further. I was also surprised to find "visits" on my inactive tracking codes; I also use that to demonstrate that there is no real interaction with the site when someone insists on using server-side solutions.
The way they do it is most probably as you mention: they just generate random numbers in the tracking code format.
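To make the mechanics concrete, here is an illustrative Python construction of the kind of Measurement Protocol payload a ghost hit carries. The parameter names (v, tid, cid, t, dh, dp, dr) are the documented Measurement Protocol fields, but every value below is invented, and nothing is sent; it only shows why no request ever reaches the spammed site:

```python
from urllib.parse import urlencode

# An illustrative ghost-spam hit: every value here is fabricated by the
# spammer, including the tracking ID, which can simply be generated at random.
payload = urlencode({
    "v": "1",                    # Measurement Protocol version
    "tid": "UA-1234567-1",       # randomly generated tracking ID
    "cid": "555",                # arbitrary client ID
    "t": "pageview",             # hit type
    "dh": "fake-hostname.com",   # fabricated hostname (why the filter works)
    "dp": "/",                   # page path
    "dr": "spammy-referral.com", # the "referral" the victim sees in reports
})
# A spammer would POST this to Google Analytics' collection endpoint;
# the website being spammed never receives any request at all.
```

Because the spammer fabricates the `dh` hostname blindly, it almost never matches the victim's real hostnames, which is exactly what the valid-hostname filter exploits.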
Hi Carlos,
That will be a really great investigation for sure. If spammers are getting tracking codes by any other method, things are going to get very nasty! The Google Analytics team must execute a plan to stop them. Please share your experiment results once you are done.
Hey Carlos, many thanks for this post.
I am also facing this same issue. The link guardlink.org is also present in my GA account. Thanks man, this will help!
Cheers!
This is so helpful! I wish I would have known this information several months ago.
Thank you!
Hi Carlos,
Thanks for the great post; we have implemented your suggestions. Not sure if this has been mentioned (apologies if it has): when setting up the view, ensure all your goals are copied across, since GA doesn't do this by default. Also, ensure your new view is linked up to AdWords; otherwise you'll get 'not set' data pulling through.
Hi Carlos Escalera,
Great information, thanks for the update.
I also faced spammy visits, 3 to 4 months ago, from the spammy domains you show in your blog's Google Analytics screenshots, and I did everything you describe.
To stop referral spam visits, I created a referral spam filter for every domain that had referral spam visits in my Google Analytics dashboard. Visits from all other domains are now blocked, but visits from domains like "www.event-tracking.com" and "www.Get-Free-Traffic-Now.com" are still present in my Google Analytics. What should I do to stop getting visits from these domains?
I was also getting direct spam visits from many countries where my website and my business have no presence. To stop visits from those countries, I created a country spam filter in my Google Analytics. Can you tell me any other option to stop receiving these direct visits?
Vishal,
If you create the valid hostname filter explained in the article, you should be able to get rid of event-tracking and free-traffic-now, since these two are ghost spam. You will only need to take care of crawler spam with a Campaign Source filter.
As for the direct visits, there might be 3 different reasons related to the spam.
Try checking those things; normally it is one of the first two, and the solution for those is not complicated.
Hi Carlos Escalera,
Thanks for the update. As per your instructions, I will do everything you mention in your blog. Let's see what happens.
Hi Carlos
Thank you. I've been trying to figure out a way of getting this spam out of my GA for a while. I had tried a few methods but none had ever worked, and now I know why: I was doing it wrong. I will be implementing this later today.
Thanks
Andy
One setting to check in Analytics is blocking known bots. https://plus.google.com/+GoogleAnalytics/posts/2tJ...
You'd think they'd just do this automatically, but on second thought, if you're following best practices and using multiple views, your setup will want an unfiltered view, then a view with only the automatic filters, then (for example) a view with both automatic and manual filters.
Douglas,
This is excellent advice! Always keep an unfiltered view, and check the box to exclude known bots.
Thank you for sharing.
Great article Carlos. I've always wondered how/why spam websites show up in reporting. My guess was that they wanted free traffic to their sites from curious GA clickers, although I realised that they are usually 404'd, as you suggest.
My only query relates to the setup of the filter. Is this created through the 'custom reports' section under the customization tab on the dashboard?
Thanks,
Brodie
That's right, Brodie. I guess some of the spammers get banned from their hosting service, or they just stop using a domain in favor of a new one. From what I've been seeing, most ghost spam shows up for only around 2 or 3 weeks, and then they change the domain.
The filters are created at a view level, you can find them on the administration panel of Google Analytics.
Thanks for commenting
Really nice article. One question: what do these spam senders gain by sending fake visits to a website? Is it because of our social media activities?
Thank you for commenting Hitesh,
The reason is simpler than that: they just want to get attention and drive traffic to their website. The final purpose varies; sometimes they try to sell an SEO service, sometimes they send you to an affiliate program.
Thanks for replying... exactly what was on my mind. Nice post.
This article is brilliant. Spam...is not.
Great read Carlos, I'll be implementing those tips for sure!
Hi Carlos,
Thank you so much for sharing this filter. I was searching this type of filter for a long time. I have tried some steps to filter my spam data, but this one is the best.
Thanks.
Hi Carlos, thanks for such an informative post!
One problem I have, which doesn't seem to be affecting others here, is I don't seem to have the option of 'include' on the filter. When I choose custom filter it only has the option for 'exclude'. Any help would be appreciated!
Thanks
Alan
Alan,
Sometimes the rest of the options are hidden by the size of the screen; try scrolling to the bottom, and you should see something like this: https://i.imgur.com/sqEJvMi.png
I am really, really happy with this post. Actually, I was also facing the same issue with spam traffic/referrals, and many times I filtered it in Analytics, but new spam kept coming again and again. From this post I learned lots of new things about blocking and stopping spam. This post is very nice, because until now I didn't know what kinds of spam there were, or what mistakes people make fighting it.
Thanks for this.
The most informative article I have read in recent times. Thanks Carlos. Honestly, I was clueless about ghost spam and often used to wonder why the hell I got such traffic, but not anymore. I have already implemented it on one of my sites. I will keep you posted with queries and progress.
Hello, thank you very much for sharing.
I've tried on my website and it works correctly.
It is important to create a new view to never lose unfiltered information.
Regards!
That's right, Alejandra, it's important to always have a view with raw data. In fact, this should be treated as a best practice even if you create filters for other purposes: always have an unfiltered view.
Thanks for commenting.
This post couldn't have come at a better time, fantastic article!
I don't understand what spammers have to gain from ghost spam, but at least I can now filter it from my Analytics!
Thank you.
Thomas,
People are curious by nature, and I include myself. They want to know what's going on on their websites and will probably check every reference; the spammers use names/URLs that will catch their eye. Some will search for the name and get directed to the spammer's website; that's the first step, luring people onto their site.
From there the intentions differ: some try to sell services, and some get a commission through an affiliate program if someone buys something.
Thank you for the comment
I also found a crazy way to stop crawlers and ghosts: they spam only properties ending in -1. If you create a new property ending in -2, -3, or higher, you're safe. For now.
Thank you so much Carlos for the post. Yes, it is a serious issue, and I had been looking for a solution for a long time. I am using a filter at present and still getting the ghost traffic. Now I will follow your suggestion; I hope I can exclude ghost spam from my GA.
I read the original article a month back and implemented it on all my GA properties. I didn't even bother to block the crawlers, as just the host filter has taken care of 99% of the spam. Simple and effective, I highly recommend.
My website has also been getting spammed for the last 2 months.
However, I've noticed the Ghost Spam is getting smarter.
We're seeing the spam hitting the correct hostname increasingly often. I'm afraid this fix will only be temporary.
Kane,
The type of spam you are referring to is crawler spam. It is also covered in the article, although not in detail. Crawler spam existed long before ghost spam showed up; people barely noticed because the impact was minimal, and although this type of spam has increased, its volume is not comparable to ghost spam, simply because the spammer needs more resources.
As you mention, these people are always trying to find new ways to spam, but for now the valid hostname solution is still, in my opinion, the best option to control this threat in GA.
Thanks for commenting
Hi Carlos,
This is the main thread when we look at referral traffic and see a lot of spam referrals in GA. I tried to block those IPs via the .htaccess file but didn't succeed in stopping them. I found a lot of new material through your post and hope I can now avoid those spam referrals and get correct statistics.
Can you please check the first link, the one to Jared Gardner's "excellent article" on referral spam traffic? The link is not working. Can you fix it so I can find the article? I hope to find some more tips there.
Thanks,
Satish
Thank you, Satish, for letting me know about the link; I will get it fixed. There is also a link to my main article on the subject. It is long, but if you want to know more about this issue, you will find everything explained in detail there.
You're most welcome, Carlos.
I will definitely go through your article, because I am facing a lot of issues with spam referrals, mainly from social button websites.
I hope I will get some ideas from your article on how to remove those spam referrals.
Thanks.
Great article!
Ghost spam is a real problem for those of us working hard to position our websites well. The first time it happened, I spent hours trying to solve the problem; then I discovered how to filter it. I'm not a professional; everything I've learned is thanks to the internet and this blog. I have to say that until now, what I did was check from time to time and add new spam sites. Now I'll build the filter as you describe in the post, which will save me a lot of time that I can invest in other things for my site.
Thanks for sharing,
Anyone else seeing semalt variations, rankings-analytics.com and video--production.com getting past this hostname filter?
Brian, the two referrals you mention are crawler spam; the hostname filter stops only ghost spam. To deal with crawlers you have two options: blocking them from your htaccess file or adding a Campaign Source filter.
There is a brief mention of this in the article; if you need more details, you can check the complete guide in the Bonus resources section at the end of the post.
Thank you for commenting.
Hello, I am trying to apply the filters as demonstrated in your post, but when I click "Verify filter" it says the filter would not have changed my data and might be invalid. Am I doing something wrong? I am setting up an exclusion filter on campaign source for the bots, and I also set up the include-hostname filter; although it was valid, it had not changed any data in the "Verify filter" preview (see how your data would have been affected), and all the spam was still there.
The verification tool uses a small sample of data; sometimes when you verify and there is nothing to exclude in that sample, you will get this message. You can also get this message if the filter is not configured correctly. A common mistake in the include filter is adding a pipe "|" at the end, which invalidates the filter since it will then include everything.
If you think your expression is correct, you can alternatively test it with an advanced segment; the advantage is that you can select any time frame.
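The trailing-pipe mistake is easy to demonstrate: an empty alternative at the end of a regex matches every string, so an include filter built from it lets everything through. A minimal Python check (hostnames are placeholders):

```python
import re

correct = r"yourdomain\.com|yourstore\.com"    # placeholder hostnames
broken  = r"yourdomain\.com|yourstore\.com|"   # note the trailing "|"

# The empty alternative after the final "|" matches any string at all,
# so an include filter using `broken` would also include every spam hostname.
broken_matches_spam  = re.search(broken, "random-spam-host.net") is not None
correct_rejects_spam = re.search(correct, "random-spam-host.net") is None
```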
This post was very helpful to me because I learned a lot of things I didn't know. Ghost spam is a real problem, and your solution is the best way to fight it. Thank you!
This is a great little post; there is so much contradictory info out there about dealing with ghost referrals. I decided to test the waters and started creating a filter, but there are over 200 hostnames labeled (not set). If I can't capture those, will this effort make a difference?
That's right, @bcval, creating filters for each spammer would be a never-ending job. The advantage of this filter is that it catches all ghost spam, no matter what name it uses or where it shows up in your reports.
I'm afraid these recommended solutions, while useful for the tech savvy, are above the heads of mere laymen like me. Here's the problem I've been experiencing:
I seem to have a problem with referrer or ghost spam with my blogger.com blog. Each day, my analytics chart shows regularly spaced spikes, like fingers in a hand, indicating between 30-38 hits - no more, no less. And the same posts are called up in the same numbers each day. But the Traffic Sources tables indicating Referring Sites and Referring Url's show overwhelmingly sites and url's as being from Google and LinkedIn - not the usual suspects (semalt, etc.). I posed this problem in the blogger.com forum, but got no help there. I signed up for referrerspam.com in connection with my Google Analytics account with the hope of filtering out these spam hits, but, so far, it's had no effect. So, I'm utterly stumped on this one. My blog analytics are totally screwed up. I can't figure out where this referrer or ghost spam is coming from much less how to deal with it. Any insights and advice for a layman would be most welcome.
Is it safe to filter all traffic with hostname '(not set)'? Or are there times that genuine traffic might have hostname '(not set)'?
Hi Josh, in most cases it is totally safe to exclude it. There is a very small possibility that real traffic has a (not set) hostname, especially when there is something wrong with the tracking code; in that case, the underlying issue should be fixed.
If you want to be 100% sure that the (not set) hostname doesn't come from real traffic, you can add Source as a second dimension (in the hostname report) and check whether anything valid shows up.
One note: it is not possible to exclude (not set) values directly in a filter, since this value is set by GA once the visit arrives. To properly exclude it, you should use the valid hostname filter explained in the article.
Hope it helps :)
Thanks for the response Carlos.
Good to know that having no hostname is extremely rare for genuine traffic.
"One note, it is no possible to exclude (not set) values directly in a filter, since this value is set by GA once the visit arrives." - Ah. There's a gotcha! Good to know, thanks!
"To properly exclude it you should use the valid hostname filter explained in the article." - I have set that up, but it's not catching the ones that don't have a hostname. It's catching "website-clicks-xyz.co.uk" and "trafficmonsoon.co.uk" etc... but there are still some coming through with "(not set)" which is what I'm trying to exclude.
Thanks,
Josh
Josh, when you create an include filter, GA will only keep the entries that match the filter's expression. If the expression is built correctly, entries without hostname information (i.e., (not set)) won't find any match, so they will be left out.
Three things to check:
If you need more info, you can follow my detailed guide mentioned at the end of the post.
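The behavior described above can be simulated in a few lines of Python: an include filter never names (not set); entries that carry it simply fail to match the expression and fall out (the hostnames below are placeholders):

```python
import re

VALID = r"yourdomain\.com"  # placeholder: use your own hostname expression

hits = [
    {"hostname": "www.yourdomain.com"},    # real traffic
    {"hostname": "trafficmonsoon.co.uk"},  # ghost spam with a fake hostname
    {"hostname": "(not set)"},             # hit that carried no hostname
]

# The include filter keeps only matching entries; "(not set)" is dropped
# without ever appearing in the filter expression itself.
kept = [h for h in hits if re.search(VALID, h["hostname"], re.IGNORECASE)]
```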
Hi Carlos,
Sorry, I completely misread the article and created an exclude filter, excluding the hostnames that had previously spammed us to stop them spamming us again. Your approach is a much better solution. I really should stop rushing through these articles! I'd also completely misunderstood the purpose of the hostname; this makes so much more sense now. Thanks!
Hey,
In my GA the main hostname is my website, expogini.com. Top 3 hostnames in the last 7 days:
1. expogini.com: 1,253 sessions (48.21%)
2. www.expogini.com: 666 sessions (25.63%)
3. www.foxnews.com: 250 sessions (9.62%)
I am getting spam from the following sites:
Top Social Traffic:
reddit.com / referral
lifehacĸer.com / referral
addons.mozilla.org / referral
thenextweb.com / referral
abc.xyz / referral
boltalko.xyz / referral
biteg.xyz / referral
arendovalka.xyz / referral
brateg.xyz / referral
buketeg.xyz / referral
bukleteg.xyz / referral
begalka.xyz / referral
bezlimitko.xyz / referral
budilneg.xyz / referral
72539869-1.compliance-andrew.xyz / referral
72539869-1.compliance-don.xyz / referral
72539869-1.compliance-elena.xyz / referral
abcdefh.xyz / referral
advokateg.xyz / referral
alfabot.xyz / referral
I don't know how to solve this.
Hi there Amit! Because this blog post is now quite old, you may not receive a response to your question. I'd recommend asking in the Q&A forum, where it'll get better visibility and there are more experts to crowdsource a solution for you. :) Hope that helps!
Thanks @Felicia. Even if the post is old I try to answer, since the topic is still hot, especially these last few days. I'm also checking whether I can update it.
@Amit, from the list you posted I think the only valid one is the first, which I believe is your domain; the rest are fake hostnames used by the spammers to confuse people.
As a general rule, if you didn't add the tracking code to a site, then it is not a valid hostname. There are a few exceptions, like Google Translate.
I'm getting my own site as the hostname for both the spam traffic and the organic traffic. I know that because the direct / (none) sessions include 3K+ pageviews and organic includes 3K+, while under Audience > Technology > Network > Hostname it shows 7K+. So if I block my own site URL (as hostname), will it filter out the organic traffic too?
[Link removed by editor.]
Hi @Thomasbaruah, you should never exclude your own domain. If there is spam using your domain as the hostname, which is rare, then you should apply a source filter for that spam. You can check my main article, linked at the end of the post, for instructions for this scenario.
Great tips, but I actually want to permanently block some sites from sending traffic to my site. How can I do that?
Hi Ejaz, most of the spam in Analytics never passes through your site; it is injected directly into GA, so it isn't necessary to block it from your server.
There is a small amount of crawler spam that does pass through your site, but the volume is insignificant and it barely uses any resources, so filtering it from your data in Analytics should be enough.
But if you still want to block them from your server, you can check this post:
https://www.ohow.co/how-to-block-unwanted-crawlers...
what if the hostname is (not set)?
Hi Luca, this hostname comes from spam; by creating the include-valid-hostnames filter you will exclude it automatically.
Very comprehensive! I also came up with a similar approach when I found I had a lot of hostname spam, see: https://tomachi.co/filtering-hostname-spam-google-analytics/
Thank you for this solution! I have applied it to a filter in our view as well as created a custom segment using the exact same regular expression. When I look at Source/Medium traffic under this custom segment I am thrilled that the spam doesn't appear. However the weird thing is when I go to Adwords - Campaigns in GA all of my traffic disappears under this custom segment. When I view Adwords-Campaigns without the custom segment the data reappears. Any ideas on why a custom segment including known hostnames would affect my GA Adwords reports? Thank you!
Hi Karen, this data is imported from AdWords and doesn't have a hostname, so the segment will exclude it. Just remove the segment when you are analyzing AdWords reports.
As for the filters, you shouldn't worry, because as I mentioned, this data is pulled from AdWords, so it won't be affected.
I'm almost certain that if any blogger wants a solution for all of his blogging problems and challenges, then the Moz blog is the one place where he can find it. That's how I found the one for ghost spam. I'd seen it recently in my Google Analytics data and tried to fix it with the help of a friend, and until now I thought it was resolved. But this post made me rethink many points that I had ignored earlier and helped clarify many doubts. Now I know the hostname filter technique, which I'll use to resolve the issue once and for all. Thanks for sharing this wonderful remedy.
Glad to hear you learned so much here. Moz is an incredible source of knowledge for any website owner; no matter if you are a beginner or a pro, you can always find something.
Thanks Carlos. I linked to your piece in today's blog post: https://daveruch.com/advice/artist-website-use-google-analytics/
Thank you Dave :)
I am really green, technically speaking, and apologize in advance. I've been trying to filter out spam referral traffic as I see it. I think my GA is a mess right now, and I want to clean it up and get true numbers. I need to create an expression and don't follow your example. The only valid hostname in my GA is hostessatheart. Can you tell me how to write it? Thank you for your help!
That is perfectly normal; if you only see your domain as a valid hostname, then just add that to the filter. Some sites may have 10 hostnames while others have only one; it depends on the complexity of the site.
Hi! First of all, thanks for your help! However, whatever I do, those spammy links won't disappear from our GA. That vitaly dude and some others have been around for a while, and I think there are more of them now. May I ask what I should do?
Thanks!
Hi MilanaCB, this filter will take care of all ghost spam, including the vitaly spam and anything else that doesn't carry one of the hostnames you include.
However, it only works going forward. To clean historical data you should create an advanced segment with the same valid-hostname expression. You can find an example here: https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/#3-Cleaning-yourHistorical-Data
Thanks for the post. So we can have only two filters: one to exclude home IP traffic, and a second to include valid hostnames.
I want to ask: if by any chance we missed a valid hostname in the list, how can we confirm ghost spam?
Right now, I am checking for 100% bounce rates and known spam hostnames.
Hi Deepika, to confirm whether it is ghost spam, check the hostname: if you see weird names, random characters, or even known names totally unrelated to your site, then it is ghost spam.
The filter will only work on ghost spam. If what you are seeing is direct traffic with a valid hostname, it is most likely a bot problem, and you should try a different approach.
If you can identify the source of the bot, you should block it. If that isn't possible, you can try excluding the traffic with an advanced segment by looking for patterns in the data the bot leaves behind. Here are some common causes of bot direct traffic: https://www.ohow.co/common-causes-of-unnatural-spikes-in-direct-traffic-google-analytics/
As some have already pointed out, much of the ghost traffic now uses the correct hostname. As it turns out, that is the hostname from the URL submitted in the pageview call, so all the spammers need is a map of tracking IDs to domains, which isn't that hard to build.
Anything that doesn't require them to crawl your site is economical for them, and it turns into a cat-and-mouse game.
I'm currently experimenting with custom dimensions to set a security code in the tracking data:
1) Define a custom dimension in the tracking code setup; call it 'verified traffic'.
2) Modify your tracking code snippet to set a known random value for that dimension on all your sessions.
3) Create a filter that only includes traffic carrying this custom dimension value.
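To make step 3 concrete: GA's include filter runs on Google's side, but its logic boils down to a simple predicate. This is an illustrative sketch only; SECRET_TOKEN, dimension1, and the hit objects are hypothetical placeholders, not real GA API objects. Ghost hits sent via the Measurement Protocol never carry the token, because the spammer has never loaded your page.

```javascript
// Hypothetical sketch of the include-filter logic from step 3.
// SECRET_TOKEN stands in for whatever random value your tracking
// snippet sets on the 'verified traffic' custom dimension.
const SECRET_TOKEN = '8f3a91c2';

function isVerified(hit) {
  return hit.dimension1 === SECRET_TOKEN;
}

const realHit  = { page: '/home', dimension1: SECRET_TOKEN }; // set by your snippet
const ghostHit = { page: '/fake-referral' };                  // Measurement Protocol spam

console.log(isVerified(realHit));  // true  → kept by the include filter
console.log(isVerified(ghostHit)); // false → dropped
```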
Of course, they might eventually find a way around that too, but not until the vast majority of sites do the exact same thing.
Since my comment above, I have implemented and tested the custom dimension technique, and it works quite well. For the time being, it is the one filter that should always work, as it relies on matching data known only to the site and the filter.
Instructions on my blog: Hardening Filter for GA Ghost Spam
So far, I've checked all the spam with a valid hostname and found all of it in the physical access log, so they actually crawled the site.
Thanks for sharing your solution, Jan; a very good approach to dealing with the spam.
Thanks a tonne, jklier, for your blog! It was exactly the problem I had: valid hostnames. But I tried your filter solution and it didn't work; all the visits showed the dimension value I used for verified traffic! Please advise!
That's a brilliant guide, Carlos, but I just need to confirm whether we can use the exclude option with spam domains in the regular expression. The point is that there is always a chance a new, legitimate referrer appears, and if we forget to add it to our list, it can cause problems in the historical data. Please share your thoughts on this option.
Salman,
Sure, you can use the exclude option if, for example, you make many changes to your hostnames; not common, but it can happen.
Just make sure you use Campaign Source as the filter field; don't use Referral or Hostname for the exclude filter.
Whatever your choice for filtering the spam, remember to always keep an unfiltered view. That way, every time you add or modify a filter you can check that everything is OK, and you will also have a backup of your data.
Thank you for commenting.
Hey Carlos,
What does having your own domain name in the hostname mean? I am under the impression that the hostname (in the referral source) should be any site other than your own, so I am actually curious. Please advise on how I should look at the data.
Total Sessions = 758
1 - MyDomain dot com = 670
2 - (not set) = 74
3 - others = 14
Susanta,
The hostname is the place where the visit arrives, and that's normally the domain where you inserted the tracking code. Other valid hostnames will be other places where you inserted the same code, and probably some cache and translation services. The rest, like (not set) and fake hostnames, are coming from spam.
Thank you for commenting.
As others have pointed out, it's not as simple as one filter. We have shared a list of 16 effective filters, as well as a tool to test whether your filters are working, on our blog: https://www.referralfilter.com/blog/google-analytics-spam-filters-july-2015
Thanks, Matt, for sharing.
In fact, at the moment, if you follow this article, you will need one filter for ghost spam (the most aggressive kind, and the one covered in this article) and one for crawlers. Two filters should do the trick.
Just an FYI: if you have enough valid hostnames, as we did on one of our older sites, and you end up needing to create a second filter, it breaks GA and tracks nothing. This works only if your regex fits within Google's maximum character count for a single rule.
You should almost certainly be able to use regular expressions (regex) to represent large groups of hostnames and get everything under a single rule, @jestep. For example, you can cover a domain and all of its subdomains with one expression.
I have seen others advise using this form of regular expression:
.*yourmaindomain.com|.*anotheruseddomain.com|.*payingservice.com|.*translatetool.com
Is one better than the other for this particular filter? I have tried to do a little research on regex, but I'm not an expert yet.
Try to be as explicit in your expression as possible, Aaron; .*\.yourdomain\.com is better than leaving .* open without the escaped dot.
The version I wrote restricts matches to www.yourdomain.com or somesubdomain.yourdomain.com, while the version you listed leaves a vulnerability open for www2yourdomain.com (note that www2 there is part of the root domain).
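To see the difference concretely, here is a quick sketch. Plain JavaScript regexes stand in for GA's filter matching (which, like .test() here, is partial and unanchored for these simple patterns), and yourdomain.com is of course a placeholder:

```javascript
// Comparing the open wildcard with the dot-escaped version.
const open   = /.*yourdomain.com/;    // unescaped "." matches any character
const strict = /.*\.yourdomain\.com/; // requires a literal dot before the domain

const spoofed = 'www2yourdomain.com'; // look-alike root domain

console.log(open.test(spoofed));                // true  → lets the spoof through
console.log(strict.test(spoofed));              // false → spoof rejected
console.log(strict.test('www.yourdomain.com')); // true
// Caveat: strict does not match the bare root "yourdomain.com"; if GA
// reports that as a hostname, add it as its own alternative.
console.log(strict.test('yourdomain.com'));     // false
```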
Thanks Paul, Aaron, Drew; all your suggestions should work fine. Normally the hostname expression won't reach 255 characters, and if it does, you must not add a second filter, as @jestep pointed out.
Instead, you should optimize your expression. You can even remove the .com and special characters. For example, if you have the following hostnames,
instead of adding them all to the expression (which would be more than 140 characters), you can optimize it like this:
blueduck|greenduck
This expression will include all the hostnames above (note that there is no need to add wildcards). Of course, it will also allow anything containing "blueduck" or "greenduck", but considering it is highly unlikely that spammers would use these terms, there shouldn't be any problem.
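A quick check of the shortened expression against a few sample hostnames (all hypothetical; a plain JavaScript regex mimics GA's partial, unanchored filter matching here):

```javascript
// The optimized expression from the example above: no wildcards or
// escaping needed, because the matching is partial (unanchored).
const valid = /blueduck|greenduck/;

console.log(valid.test('www.blueduck.com'));   // true  → kept
console.log(valid.test('shop.greenduck.org')); // true  → kept
console.log(valid.test('darodar.com'));        // false → ghost hostname, dropped
console.log(valid.test('(not set)'));          // false → dropped
```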
And what about direct traffic? In my account I have source '(direct) / (none)' and hostname '(not set)'. If I set up a filter with a regular expression similar to this one: yourmaindomain\.com|anotheruseddomain\.com|payingservice\.com|translatetool\.com, won't I exclude my direct traffic?
Rafał,
One of the advantages of using an INCLUDE filter based on your hostnames is that all ghost spam, which uses an invalid hostname, will be automatically excluded no matter where it shows up.
Ghost spam can show up in practically every dimension; so far I've detected it in several of them.
In theory, they can use any dimension to show up.
Hi Carlos,
I'm really frustrated with such sites. Clients keep asking why they get a big amount of traffic but still no leads; now I can show them why this happens. Thanks for the help. I would like to add one more irritating spam site: erot.co.
Keep helping and sharing such posts.
Thanks
Sorry to say this, but I've been trying a lot, and this article again offers only a less-than-perfect solution.
The only thing that ever worked, and worked perfectly, was using another GA tracking code. These guys always target -1, so by simply switching to -2 you automatically exclude all referral spam without any filters that can go wrong and screw up your analytics.
Obviously this might not be an option for a lot of sites, because it means you'll start from zero.
That being said, a site with 1 million visits a month can usually ignore referral spam, whereas a new site/blog/whatever won't mind trashing its current analytics history.
Obviously they might start targeting differently in the near future, so changing the tracking code might not work anymore. As of now, I have not found any referral spam that has actually been crawling websites for info; they are just guessing. So outguess them and stop fiddling around with GA filters.
Noerml,
The near future arrived one or even two months ago. I did use and recommend this method back then; you can see a section about it in my main guide (there's a link in the Bonus Resources part).
It is true that you are still less vulnerable with a higher tracking ID, but you are not safe anymore. In fact, the nastiest spammers, like the share-buttons group, hit randomly, no matter if you have an ID ending in -15.
I agree with you that there is no perfect solution, but in my opinion this is so far the best way to deal with it. I've applied it to more than 150 accounts over the last few months, and as the title states, one filter deals with all ghost spam.
Thank you for commenting
As our website runs on NGINX, we can't add the code to a .htaccess file.
Is someone able to expand on...
"...make an expression with all the crawlers, then add it to an exclude filter by Campaign Source"
Christian,
For a detailed guide to creating the expression for crawlers, check my article linked in the Bonus Resources part at the end of the post.
I'm no expert on Nginx, but if you want to block crawler spam on Nginx, here are a couple of resources that might help you.
As I always recommend when working with delicate files like the .htaccess, please make a backup first.
I also use Nginx. You have to set things up in the Nginx configuration; I use an extra file called spam.conf, and where it goes depends on your Nginx setup. I haven't updated it in a while, but this is what it has:
if ($http_referer ~ "(semalt\.com|buttons-for-website\.com|simple-share-buttons\.com|humanorightswatch\.org|aliexpress\.com|best-seo-solution\.com|o-o-6-o-o\.com|googlsucks\.com|best-seo-offer\.com|4webmasters\.org|darodar\.com|torture\.ml|free-share-buttons\.com|buttons-for-your-website\.com|www\.get-free-traffic-now\.com|buy-cheap-online\.info|hulfingtonpost\.com|depositfiles-porn\.ga|social-buttons\.com|theguardlan\.com|guardlink\.org|www\.event-tracking\.com|free-social-buttons\.com|ilovevitaly\.com|trafficmonetize\.org)") {
set $prohibited "1";
}
if ($prohibited) {
return 403;
}
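For anyone adapting Jan's snippet: the `if` block matches the `Referer` header against a PCRE alternation and returns 403 for hits from the listed domains. As a rough sanity check, here is a trimmed subset of that pattern tried against a few referers; a plain JavaScript regex stands in for Nginx's PCRE, which evaluates these simple alternations the same way:

```javascript
// Trimmed subset of the referer pattern from the spam.conf above.
// The real config carries the full domain list.
const blocked = /(semalt\.com|darodar\.com|ilovevitaly\.com|free-share-buttons\.com)/;

console.log(blocked.test('http://semalt.com/crawler')); // true  → return 403
console.log(blocked.test('https://moz.com/blog'));      // false → allowed
console.log(blocked.test(''));                          // false → empty referer allowed
```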
Hi
Very good article, thanks!
I wanted to ask whether I set everything up properly.
After using the filter on my domain, I got this result:
https://snap.ashampoo.com/98YQNetW
Is it all right?
Greetings
Kamil
This would be a great question to ask in our Q&A forum :) https://moz.com/community/q
Maybe yes, but I would need a Moz Pro account, and I don't want one.
Kamil,
You should probably add googleusercontent.com to the expression; it is used when someone views any of your pages from Google's cache or translates it. You can discard it if you think it is not relevant; the same goes for the localhost IP. The rest should be fine.
I forgot to turn on notification.
Google should do something about this. For the past six months, no matter how hard I filter, some of it gets through.
Well, ghost spam was a very challenging task for us, and I was searching for articles on it. A few weeks ago I read another article on this topic, and when I saw this one on Moz, I tweeted that post a few days ago. I have been following Moz for a long time, but I'm not impressed with this article because it is an exact copy of that one. It is a good article, but I am not happy with the approach. https://megalytic.com/blog/how-to-filter-out-fake-...
Afra, I made this article by combining a few of the articles from my blog with research I've done over the past months about this issue. There are other articles around, some of them very good. I haven't read the one you mention, but I have read others from that site and they were good, so probably this one is too.
Thank you for commenting.
If you're in the US and doing local SEO, or SEO for a website where international traffic is irrelevant, just set your referral traffic filter to exclude referral traffic from outside the US.
I managed to block nearly 100% of my referral spam traffic by excluding referrals from outside the US, whether they actually hit my website or not.
Edit: for those who actually need to account for international traffic, realize that most of this spam comes from Russia and Ukraine. Consider blocking those countries entirely.
Scott,
That used to be useful for local businesses a few months ago, but now things have changed. You can see an example in this screenshot:
https://imgur.com/aiDw26b
While most of the spam was coming from Russia at the beginning, it is now spread across different countries. And it doesn't mean the spammers are in those countries; it means they can use practically any dimension at their convenience.
Thanks for sharing.
Would blocking sites in the htaccess file just register them as a different type of visit?
If you block them in the .htaccess file, the few crawler spammers (not the ones covered in this article, by the way) will never be allowed to reach your page, and so they can never trigger the Google Analytics tracking. So no, they couldn't appear in your data at all.
Peter,
It's true that with higher IDs you are less vulnerable; I was also using that for some accounts. But a couple of months ago, I guess, the spammers noticed it and started hitting higher IDs.
Another disadvantage of changing the tracking ID is that you will have to split your data, since the new property will start from zero.
If someone is just starting out, it is not a bad idea to use a higher ID; they will still be less vulnerable. I'm glad you haven't been hit.
Thank you for sharing
Just wondering how this technique differs from enabling the "Bot Filtering - Exclude all hits from known bots and spiders" function in Google Analytics under "View > View Settings".
Damon, the "Bot Filtering" box is for totally different sources, and those are usually "good bots". You should also check this box; I even think it should be checked by default.
None of the ghost spammers are included in that list, which is why a different solution is needed.
Thanks for commenting.
Is it possible for this filter to deindex pages? A load of pages dropped out of the index around the same time I applied this filter.
It's worth noting that I only added the filter in Google Analytics; I didn't change anything in the .htaccess file.
No, it's not possible for this filter to deindex pages; Google Analytics filters only change what shows in your reports and have no effect on search indexing.
Referral spam is one of the most annoying things. Thank you for writing up the solution.
You've got a space in the URL to the original post that covered this... first link in this post...
Thank you Jeff, fixed it
I can't say it enough: ghost spam is the worst. Thank you for creating this article. I have been using referral exclusion filters, about 12 of them actually.
I deleted all my other filters thinking, "Hooray, I only need to set up one filter!" Then I found that about 95% of my referral spam uses my own domain as the hostname. Major bummer!
Hi Nick,
There are two types of spam, crawler and ghost; both are mentioned in the post, but the focus is on ghosts. The valid hostname filter will take care of all ghost spam without the need to add filters every time. That is the most common type of spam for most people. It seems your case is different.
I also mention crawler spam at the end of the post, but I don't go too deep into it. If you want more details about it and how to stop it, check the article I mention at the end of the post in the Bonus Resources part; it is a comprehensive guide that walks you through everything related to this issue.
Thank you for commenting.
It seems like I am still seeing a problem; I'm not sure how to describe it.
This is what I am seeing:
floating-share-buttons.com / referral
google / organic
(direct) / (none)
yahoo / organic
bing / organic
bing.com / referral
webcrawler.com / referral
get-free-social-traffic.com / referral
yellowpages.com / referral
site2.free-floating-buttons.com / referral
I did everything you describe in your article, but it does not seem to be working correctly for me. Do you have any ideas?
Thanks
Steve, are you seeing new hits after you applied the filters, or do you mean in the historical data? Filters only work going forward. For historical data, you should use segments; you can find a link at the end of the article with more information about how to use segments against spam.
Thanks for sharing. I was using mybkdrive.com; it helped me a lot in regaining lost visitors through accurate tracking.
It seems I was too impatient. Now it works perfectly. Thank you.
Sorry something went wrong. This reply belongs to my own post.