(Image created by the author)
The Bot Bandits Are Out of Control
I’ve always known that bots crawl my websites and the sites of all my fellow developers, but I was unaware that bots now make more visits than people do to most websites. Yep, they officially overtook us in 2012, and bots now dominate website visits. Egad, it’s Star Wars run amok!
Before we become alarmed, though, let’s look at a few facts that demonstrate the preponderance of bots in our midst.
The bots are coming. The bots are coming. The bots are here!
(Image source)
Incapsula’s 2013 bot traffic report states that “Bot visits are up 21% to represent 61.5% of all website traffic.” If bots are preponderant, what does that mean for us?
For those of you just tuning in, preponderance means “the quality or fact of being greater in number, quantity, or importance.” That means the bots are “more important than humans” in determining the value of websites to potential readers.
A quick look at antonyms for preponderance reveals that our plight is worse than expected. Antonyms for preponderance include disadvantage, inferiority, subordination, subservience, surrender and weakness.
All is not lost, however. Not all bots are bad. In fact, in the wild and woolly world of SEO, Googlebots are actually our friends. A "Googlebot" is Google's web crawling bot, also known as a "spider," that crawls the Internet in search of new pages and websites to add to Google's index.
Googlebots: Our Ally in the Bot Wars
If we think of the web as an ever-growing library with no central filing system, we can understand exactly what a Googlebot wants. A Googlebot’s mission is to crawl this library and create a filing system. Bots need to be able to quickly and easily crawl sites. When a Googlebot arrives at your site, its first point of access is your site’s robots.txt file, so make sure that file is easy for bots to fetch and read. The less time Googlebots spend on irrelevant portions of your site, the better. At the same time, be sure you have not inadvertently siloed or blocked pages that should be crawlable.
(Image source)
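To check that a robots.txt file is not accidentally blocking pages you care about, a small script can help. The following is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the blocked_urls helper are all made up for illustration; swap in your own file and page list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace with your site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /blog/
"""

# Hypothetical list of pages that must stay crawlable.
IMPORTANT_URLS = [
    "https://example.com/",
    "https://example.com/blog/bot-bandits/",
    "https://example.com/contact/",
]


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules forbid for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for url in blocked_urls(ROBOTS_TXT, IMPORTANT_URLS):
        print("Blocked for Googlebot:", url)
```

In this made-up example the "Disallow: /blog/" line hides the blog post from Googlebot, which is exactly the kind of inadvertent block worth catching early.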
Next, Googlebots use the sitemap.xml file to discover all areas of your site. The first rule of thumb is this: keep it simple. Googlebots do not crawl DHTML, Flash, Ajax or JavaScript as well as they crawl HTML. Since Google has been less than forthcoming about how its bots crawl JavaScript and Ajax, avoid using this code for your site’s most important elements. Next, use internal linking to create a smart, logical structure that will help the bots efficiently crawl your site. To check the integrity of your internal linking structure, go to Google Webmaster Tools -> Search Traffic -> Internal Links. The top-linked pages should be your site’s most important pages. If they aren’t, you need to rethink your linking structure.
So, how do you know if the Googlebots are happy? You can analyze Googlebot’s performance on your site by checking for crawl errors. Simply go to Webmaster Tools -> Crawl and check the diagnostic report for potential site errors, URL errors, crawl stats, site maps and blocked URLs.
The Enemy in our Midst: Bandit Bots
Googlebots aren't the only bots visiting your site. In fact, over 38% of the bots crawling our sites are up to no good. So not only are we outnumbered, but nearly 2 out of every 5 bots visiting your site are trying to steal information, exploit security loopholes and pretend to be something they are not.
We'll call these evil bots "bandit bots".
So, what are we to do?
As an SEO provider and website developer, I could protest. I could blog my little heart out and get a few friends to join me. Or I could buckle down and take responsibility for my own little corner of the web and fight back against the bandit bots.
Let's do this together.
Bandit Bots: What They Are and How to Fight Back
The bad guys come in four flavors. Learn which bots to watch out for and how to fight back.
Scrapers
These bandit bots steal and duplicate content, as well as email addresses. Scraper bots normally focus on retrieving data from a specific website. They also try to collect personal information from directories or message boards. While scraper bots target a variety of different verticals, common industries include online directories, airlines, e-commerce sites and online property sites. Scraper bots will also use your content to intercept web traffic. Additionally, multiple pieces of scraped content can be scrambled together to make new content and allow them to avoid duplicate content penalties.
What’s at risk: Scrapers grab your RSS feed so they know when you publish content. However, if you don't know that your site is being attacked by scrapers, you may not realize there's a problem. In the eyes of Google, however, ignorance is no excuse. Your website could be hit by severe penalties for duplicate content and even fail to appear in search engine rankings.
How to fight back: Be proactive and attentive to your site, thus increasing the likelihood that you can take action before severe damage is done.
There are two good ways to identify if your site is the victim of a scraper attack. One option is to use a duplicate-content detection service like Copyscape to see if any duplicate content comes up.
(Image created by the author)
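If you already have a suspect page in hand and want a rough sense of how much of your text it copies, you can compare the two texts yourself. The sketch below is one common approach (word "shingles" compared with Jaccard similarity). It is an illustration only and not how Copyscape or any other service works internally; the function names and the shingle size of 5 are my own assumptions.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(original, suspect, size=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(original, size), shingles(suspect, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1.0 suggests the suspect page is close to a straight copy, while scrambled or partially rewritten content will score lower, so treat any threshold as a judgment call.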
A second option for alerting you that content might have been stolen from your site is to use trackbacks within your own content. In general, it's good SEO to include one or two internal site links within your written content. When you include these links, be sure to activate WordPress's trackback feature. In the trackback field on your blog’s entry page, simply enter the URL of the article you are referencing. (In this case, it will be one on your own website, not another site.)
(Image created by the author)
You can manually look at your trackbacks to see what sites are using your links. If you find that your content has been re-posted without your permission on a spam site, file a DMCA complaint with Google.
Finally, if you know the IP address from which scraper bots are operating, you can block them from your feed directly. Add the following code to your .htaccess file. Learn how to edit your .htaccess file. (See editing your .htaccess file on WordPress.)
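# Turn on the rewrite engine (requires mod_rewrite)
RewriteEngine on
# Match only this exact IP: dots are escaped and the pattern is anchored
RewriteCond %{REMOTE_ADDR} ^69\.16\.226\.12$
# Redirect every request from that IP to the replacement feed
RewriteRule ^(.*)$ https://newfeedurl.com/feed [R=302,L]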
In this example, 69.16.226.12 is the IP address you want to redirect, and https://newfeedurl.com/feed is the custom content you want to send it.
Warning! Be very careful editing this file. It could break your site if done incorrectly. If you are unsure of how to edit this file, ask for help from a web developer.
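If you need to redirect more than one scraper IP, writing the conditions by hand gets error-prone. Here is a hedged sketch of a small Python helper that generates the lines for you. The feed_redirect_rules function and the second IP address (taken from a range reserved for documentation) are assumptions for illustration. Review the output before pasting it into .htaccess.

```python
import re


def feed_redirect_rules(ip_addresses, target_url):
    """Build .htaccess lines redirecting the listed IPs to target_url."""
    if not ip_addresses:
        return ""
    lines = ["RewriteEngine on"]
    for index, ip in enumerate(ip_addresses):
        # Escape the dots so "." matches a literal dot, and anchor both ends.
        pattern = "^" + re.escape(ip) + "$"
        # Conditions are ANDed by default, so all but the last need [OR].
        flag = " [OR]" if index < len(ip_addresses) - 1 else ""
        lines.append("RewriteCond %{REMOTE_ADDR} " + pattern + flag)
    lines.append("RewriteRule ^(.*)$ " + target_url + " [R=302,L]")
    return "\n".join(lines)


if __name__ == "__main__":
    # 203.0.113.7 is a placeholder address, not a known scraper.
    print(feed_redirect_rules(["69.16.226.12", "203.0.113.7"],
                              "https://newfeedurl.com/feed"))
```

The [OR] flag matters: without it, Apache would require a single request to come from every listed IP at once, and the rule would never fire.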
Hacking Tools
Hacking bandit bots target credit cards and other personal information by injecting or distributing malware to hijack a site or server. Hacker bots also try to deface sites and delete critical content.
What’s at risk: It goes without saying that should your site be the victim of a hacking bot, your customers could lose serious confidence in the security of your site for e-commerce transactions.
How to fight back: Most of the attacked sites are victims of "drive-by hackings," which are site hackings done randomly and with little regard for the impacted business. To prevent your site from becoming a hacking victim, make a few basic modifications to your .htaccess file, which is typically found in the public_html directory. This is a great starter list of common hacking bots. Copy and paste this list into the .htaccess file to block any of these bots from accessing your site. You can add bots, remove bots and otherwise modify the list as necessary.
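Before deploying any blocklist, it is worth testing it against user-agent strings from your own logs so you don't lock out legitimate visitors. The following is a minimal sketch of that check in Python. The three bot names are placeholders I invented, not entries from any real list, and is_blocked_agent is a hypothetical helper.

```python
import re

# Placeholder names only; substitute the entries from the list you actually use.
BLOCKED_AGENT_FRAGMENTS = ["EvilScraper", "BadBot", "SiteRipper"]

# One case-insensitive pattern that matches any of the fragments.
BLOCKED_AGENT_RE = re.compile(
    "|".join(re.escape(name) for name in BLOCKED_AGENT_FRAGMENTS),
    re.IGNORECASE,
)


def is_blocked_agent(user_agent):
    """True when the User-Agent header contains any blocked fragment."""
    return bool(user_agent) and BLOCKED_AGENT_RE.search(user_agent) is not None
```

Keep in mind that a user-agent string is supplied by the visitor and is trivial to fake, so this kind of matching only stops bots that announce themselves honestly.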
Spammers
Spam bots load sites with garbage to discourage legitimate visits, turn targeted sites into link farms and bait unsuspecting visitors with malware/phishing links. Spam bots also participate in high volume spamming in order to cause a website to be blacklisted in search results and destroy your brand’s online reputation.
What’s at risk: Failure to protect your site from spammers can cause your website to be blacklisted, destroying all your hard work at building a credible online presence.
How to fight back: Real-time malicious traffic detection is critical to your site’s security, but most of us don't have the time to simply sit around and monitor our site's traffic patterns. The key is to automate this process.
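As a rough idea of what automating traffic monitoring can look like, the sketch below scans web server access log lines and flags any IP that exceeds a request limit within a single minute. It assumes the common Apache/Nginx "combined" log format; the noisy_clients function name and the default limit of 60 requests per minute are arbitrary choices of mine, not recommendations from any tool.

```python
import re
from collections import Counter

# Matches the start of a "combined" log line: the client IP comes first
# and the timestamp sits inside square brackets.
LOG_LINE_RE = re.compile(r"^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]")


def noisy_clients(log_lines, max_requests_per_minute=60):
    """Return {ip: peak requests in one minute} for IPs above the limit."""
    per_minute = Counter()
    for line in log_lines:
        match = LOG_LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        # "10/Oct/2014:13:55:36 +0000" -> "10/Oct/2014:13:55" (minute bucket)
        minute = match.group("time")[:17]
        per_minute[(match.group("ip"), minute)] += 1

    peaks = {}
    for (ip, _minute), count in per_minute.items():
        if count > max_requests_per_minute:
            peaks[ip] = max(count, peaks.get(ip, 0))
    return peaks
```

You could run something like this from a scheduled job and review the flagged IPs before blocking anything, since a busy office network or a legitimate crawler can also trip a simple threshold.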
If you're using WordPress, one of the first steps to fighting back against spam bots is to stop spam in the first place. Start by installing Akismet; it is on all my personal sites as well as the sites I manage for my clients. Next, install a trusted security plugin and set up automatic backups of your database.
(Image created by the author)
Require legitimate registration with CAPTCHAs for all visitors who want to make comments or replies. Finally, follow wordpress.org to learn what’s new in the world of security.
Click Frauders
Click fraud bots make PPC ads meaningless by “clicking” on the ads so many times you effectively spend your entire advertising budget, but receive no real clicks from interested customers. Not only do these attacks drain your ad budget, they also hurt your ad relevance score for whatever program you may be using. Google AdWords and Facebook ads are the most frequent targets of these attacks.
What’s at risk: Click fraud bots waste your ad budget with meaningless clicks and prevent interested customers from actually clicking on your ad. Worse, your Ad Relevance score will plummet, destroying your credibility and making it difficult to compete for quality customers in the future.
How to fight back: If your WordPress site is being targeted by click fraud bots, immediately download and install the Google AdSense Click Fraud monitoring plugin. The plugin counts all clicks on your ads. Should the clicks exceed a specified number, the IP address of the clicking bot (or human user) is blocked. The plugin also blocks a list of specific IP addresses. Note that the plugin is for AdSense publishers to install on their own websites; AdWords advertisers cannot install it to protect their own ad budgets.
(Image created by the author)
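The counting-and-blocking logic described above is simple enough to sketch. To be clear, this is not the plugin's actual code; the ClickGuard class, its method names and the click limit are assumptions made up to illustrate the idea in Python.

```python
from collections import defaultdict


class ClickGuard:
    """Counts ad clicks per IP and blocks an IP once it exceeds max_clicks."""

    def __init__(self, max_clicks=3, blocked_ips=()):
        self.max_clicks = max_clicks
        self.blocked = set(blocked_ips)  # IPs blocked up front
        self.clicks = defaultdict(int)   # running click count per IP

    def register_click(self, ip):
        """Record a click; return True if counted, False if the IP is blocked."""
        if ip in self.blocked:
            return False
        self.clicks[ip] += 1
        if self.clicks[ip] > self.max_clicks:
            self.blocked.add(ip)  # over the limit: block from now on
            return False
        return True

    def should_show_ads(self, ip):
        """Ads are hidden from any IP on the blocked list."""
        return ip not in self.blocked
```

This sketch never resets its counters, so a real implementation would likely want a time window (for example, clicks per day) to avoid permanently blocking a genuine repeat visitor.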
When defending a website from hacker bots, it takes a concentrated effort to thwart their attacks. While the above steps are important and useful, there are some attacks, like coordinated DDoS, that you simply cannot fight off on your own. Fortunately, a number of tech security companies specialize in anti-DDoS tools and services. If you suspect your site (or one of your clients’ sites) is being targeted for DDoS, such companies can be key to a successful defense.
I recommend following wordpress.org to learn what’s new in the world of security.
Summary
Giving honest Googlebots what they want is quite simple. Develop strong, relevant content and publish regularly. Combatting the fake Googlebots and other bot bandits is a bit tougher. Like many things in life, it requires diligence and hard work.
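One way to tell a fake Googlebot from a real one is the reverse-then-forward DNS check that Google describes for verifying its crawler: look up the hostname for the visiting IP, confirm it belongs to googlebot.com or google.com, then confirm that hostname resolves back to the same IP. Below is a minimal Python sketch of that idea. The is_real_googlebot function and its injectable lookup parameters are my own illustration, and the default forward lookup used here handles IPv4 addresses only.

```python
import socket


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check a claimed Googlebot IP with a reverse, then forward, DNS lookup."""
    # The lookups are injectable so the logic can be tested without a network.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # The hostname must resolve back to the same IP, or it could be spoofed.
        return ip in forward_lookup(hostname)
    except OSError:
        # DNS failures (no reverse record, lookup error) count as "not verified".
        return False
```

DNS lookups are slow, so if you try something like this on a live site, cache the results rather than checking every request.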
Thanks Brian for this post.
Yesterday when I checked my blog I saw that 95% of traffic is coming from BOTs which includes ilovevitaly, darodar, button-for-website.com etc which are still coming even though I ticked in analytics to prevent visits from bots & spammers.
It is really frustrating to see only bots & bots in your traffic report. I hope Google should do something to eradicate these so called bots/spammers who do this.
Didn't know about trackbacks, thanks for the tip.
You're welcome Hyderali.
Yes, it can be extremely frustrating dealing with Google on these issues. In the past, I've used iThemes Security (formerly Better WP Security) and was satisfied with the results, for the most part. Recently I upped my security game by signing up with Incapsula. They have a pretty robust security solution without forcing me to take out a loan to afford it. Free was okay for starting out but at some point as web traffic increases, improved SERP's drive visitors - my site became a target. I just can't afford any downtime.
Yes, Kristi rocks!
Thanks Brian, such a timely post for me. For too many of these issues there aren't any great solutions which is really sad. What I'm missing is referrer spam though, e.g. darodar or iloveitaly. Do you have any advice on that? Everything I find looks like I have to spend hours on fixing this..
Hi,
Below are the two perfect guides on -
1. Removing Referral Spam from Google Analytics
2. Geek guide to removing referrer spam in Google Analytics
Both are super easy to understand & you don't need any help also.
Thanks Hyderali, those are two really good references for webmasters. Cleaning up the garbage comments on Wordpress every day gets really old. The "geek guide" is an interesting guide and helpful piece of information.
You're welcome AwesomeEves! Thank you for reading it and commenting! Other than the solutions I mentioned in the post, here are some other paid solutions: SiteLock, Incapsula or Sucuri. I can't afford to manually manage these issues anymore; it gets exhausting.
Nice read, Brian! Thumbs up!
Thanks a lot grobro!
Pretty cool and useful article overall, Brian. Another benefit of stopping bots is preventing your competition from stealing your backlinks ;)
That said, none of the commercially available WP security products on the market really do a proper job of combatting bad bot behavior. The problem is they can switch IPs so easily that black listing is futile.
Once you've had your clients show up on SAPE a few times you start taking this seriously =D
We've spun together a plugin that detects bad bots via a honey trap then places them in a penalty box. We customize the parameters for bad behavior as well. Cuts out a TON of the issues. Highly recommend doing something similar.
Thank you, Kyle - I'm glad you found it useful.
Sounds like a great custom plugin.
As my traffic and brand have grown, my site has become a bigger target for bots. So I've had to upgrade to a professional service. The plugins were great to begin with.
Great info Brian! I use the trackback approach to find out if my content is getting scraped - it's amazing how often that happens.
Thank you Kristi! Much appreciated. That is a great way to discover scrapers. What do you use to combat against comment spam?
Shouldn’t one of the main defenses be to limit the number of requests made on PHP files? Those bandit bots can max out a server’s memory in no time if enough of them get through. A strong firewall may be able to detect and block malicious bots by identifying their user-agent name, URL slugs or strange query strings. Will it offer 100% protection? Who knows? It’s just one more tool that you can place in your bag of tricks. Thanks for the additional information.
Hi Brian, informative post to say the least! I am having this problem where someone is adding /?kw=example.com after my domain name and submitting this url to addthis and other social platforms. Should I be worried about this behavior and how can I stop them? Thank You!
A good article . thumbs up
I would agree!
Thank you Victor! Glad you liked it.
Whenever someone asks a blogger how much traffic they have I wonder if they mean before they blocked bots or after. Smaller blogs will be saddened to learn that 80+% of their traffic was bots. When you start blocking them your traffic will drop to actual humans and Google crawlers. It may take some time to get used to the fact that you didn't have the traffic levels you thought you did.
Yes, I agree Gail, this is usually a tough adjustment for webmasters and business owners to go through. It's even worse if they are monitoring the traffic in Awstats.
Thanks Brian for sharing this information.
But it is awfully WordPress focused. How about adding some edits with resources for sites that are not on WP?
You're welcome Gregory. I appreciate your feedback and request. I am planning part two, which will expand on other CMS's, such as Joomla and Drupal - along with basic HTML sites. Part 3 will cover e-commerce such as Magento and etc.
Guys, don't forget about Cloudflare - it's easy to set up, free, speeds up your site and it can get rid of all the unwanted bots.
For the past month or so, my e-commerce site (on Yahoo! Store platform) homepage has been receiving approximately 1000 direct bot visits daily. They are all bouncing almost immediately (average time on site is .3 seconds) and they're coming in from a broad range of geo regions, ISP's, browsers, etc, thus making them very difficult to filter out. Is there anything I can do to combat this?
Andrew - I would recommend going to an enterprise level security service. I am using Incapsula, but there are plenty of them out there to choose from. Here is a post that has some options and how to choose. Good luck and let me know what you end up doing.
Great post !!!
To keep spammers off my website, I also use Akismet. At first, when you have little traffic, you don't mind deleting those annoying spam messages manually, but once you start to get significant traffic it becomes quite annoying, as you need much more time to detect and delete them. So far I have not found any bots, but we have to be aware ...
Thank you, Tino! Yup, that's how it goes. The more traffic a site gets the more that site becomes a target for bots and spammers. I switched from using Akismet to using CAPTCHA.
Equally annoying is the tactic of flooding your website Analytics as a marketing tactic (e.g. Semalt). Filters that were successful before at removing these bots seem to now be ineffective.
Yes - it's a very slimy marketing tactic!
Nice technical article. Thanks for sharing.
Very interesting post. I have a few WP sites and the only threat i noticed was when i installed the Wordfence plugin. It alerted me many times about bots trying to log into my sites using brute force. I realized that i had to rename the wp-admin folder to something else and since then, the login attempts stopped.
Thank you, anikilator. Wordfence is a good plugin. It can definitely keep you busy following all of its recommendations.
ILoveItaly, Darodar, Buttons-for-websites and the others aren't bots that visit your website. They make "fake" visits by cheating Google Analytics.
There are two ways. The first is to cycle through tracking IDs from UA-100000-1 to UA-10000000-1 and do the dirty job. The other is to use scrapers to crawl the Alexa Top 1000000 and recrawl the sites to extract each Analytics ID (UA-XXXXXX-XX). It's funny, but those sites only infect properties ending in -1; if you create a property ending in -2, -3 or another number, you will be safe for a while.
Since this is related to Analytics, you can't fix it by disabling bot traffic. You need to make a modification in your Analytics account to skip such traffic. A very quick and BAD way is to block it there with the "Referral exclusion list". This isn't recommended, since that list is a workaround for other issues, like PayPal payments when ordering on your site. If you wish to remove the spam, use filters: Admin -> View -> Filters -> Custom -> Exclude -> Referral.
If you have custom analytics, you're safe. This includes KISSMetrics, Piwik, MixPanel, Woopra, Adobe Analytics or other tools. But there are amazing tools such as PhantomJS, so this statement won't be valid for long.
A point of clarification: Moz Analytics is not custom analytics. We don't have our own analytics program, but we do integrate with GA.
Sorry for that i will remove it from original posting. I'm not Moz customer and i can guess only what's going on Analytics.
Hi Peter,
If I understand correctly, you are saying that this problem only occurs with Google Analytics and it's not occurring with Adobe Analytics and the other mentioned tools. Correct?
Can you clarify why?
Best,
Steven.
Please note ...
Almost all sophisticated malicious bots cloak themselves through VPN proxies and user agent request headers to emulate having authority... .htaccess can only block 'dumb' bots... not ones smart enough to cloak their identities ... in short, any robust robot can circumvent .htaccess with a simple domain ping script...
In order to prevent scraping and malicious bots you need a robust solution... the strongest level of protection would be on the network itself... I'd suggest looking into an enterprise level load balancer if you really need a comprehensive on-premise solution.
I asked my friend Ed, who is a professional developer / programmer / code GEEK - to chime in. Thank you, Ed for dropping some knowledge on the topic. To your point - is exactly why I left the free methods of securing my site and have gone with Incapsula. So far so good.
Thanks Brian, for this important and helpful post. I was seeing some irrelevant referral traffic coming onto some of my sites since last couple of days. Now I have clear clues how I can block them from GA tool.
I have one quick question to you. I am not a developer but I have some basic knowledge on programming and web security. I believe hacking is done on database-driven dynamic or CMS sites but Is there any possibilities of hacking of any static site?
Monitoring own site, Google Webmaster and Google Analytics is the savior. Truly said, Ignorance can't be an Excuse to Google. Thanks for sharing the information and the solutions.
Regards
Soumya Roy
You're quite welcome Soumya! Glad you found value in it!
You can find answers here https://security.stackexchange.com/questions/74360/can-my-raw-html-website-get-hacked and or here https://stackoverflow.com/questions/878430/can-an-apache-served-pure-html-website-be-hacked. Those two links should help you find the answers you're looking for. let me know if you have any more questions.
Thanks Brian, this is a timely post for me too as I'm struggling with bots and a lot of referrer spam. Anyway, I've been using Akismet against spammers for some time and it works pretty well.
You're welcome Gabriele! Thank you for your comment. Which solution are you thinking of going with?
I'm going to talk with my devs about the htaccess one against scrapers ;)
Really solid advice Brian! I would be absolutely lost without Akismet - it's a complete lifesaver with four blogs and a few other websites. Without it, I'd probably just have to turn off commenting on my blog altogether.
Thank you very much TheBarefootNomad! I appreciate your feedback! No kidding - I am also using a combination of WordPress plugins, Really Simple CAPTCHA and Social Login which cut down on comment spam quite a bit. Spammers don't generally login with social to spam, lol.
As a moderator at Inbound, I can assure you that spammers DO login with disposable social accounts to spam.
The automated spamming software does?
Not yet. Here is procedure on Inbound.
Hi Peter - I'm very aware of the manual spammers using social login, but I was referring to spam automation. Using social login on my site is one line of defense against automated comment spam. The added lines of defense like Captcha help to fend off the manual spammers.
Yes... bots can maintain session state, create disposable accounts, emulate users ... pretty much anything you can do online can be scripted....
Good stuff!
Another post here: https://plus.google.com/+GoogleAnalytics/posts/2tJ...
Also Alex Moss built a Semalt blocker plugin for WP if anyone needs: https://wordpress.org/plugins/semalt/
Thank you - Gareth! Glad you found value in it. Thank you for posting that from Google Analytics.
I've heard about that. Thanks for the reminder. I'll have to download it. Here's two new ones that showed up in my analytics last month hulfingtonpost.com and blackhatworth.com. They just don't stop.
Great post, well done. Enjoyed reading it, it did make me want to go watch robot wars again though...!
Thank you - Chris, glad you liked it! Netflix!
Hi Brian great post!
I was somehow aware of the situation but the way you described was both fun and informative. I just could not quit reading it.
One thing that I would like to add (please forgive me, this is probably out of the scope of this post): whenever you are editing a .htaccess file, make sure to stay out of the #BEGIN WordPress and #END WordPress comment tags. Whatever is written between these two comment tags is for WordPress configuration purposes that you or some plugins might need, for example if you change the default Permalinks settings.
Hi Raul - thank you! I'm glad you enjoyed it and thank you for your comment.
Yes, you are on the money with that. People need to be very careful editing the .htaccess file.
Excellent coverage, Brian. The bots really are a nuisance, and people have different reasons for using them so they don't waste time doing what can be automated.
Unfortunately, it adds load to the target site.
Thank you - Alan! I'm glad you found value in it. Yes, it's a double edged sword.
Brian, this was a great read - but one thing threw me off. In regard to the section on click frauders, you discuss AdWords click fraud (people clicking on the Google ads you've created and eating up your PPC budget) - but the plugin you mentioned is to block AdSense click fraud (those who click other people's ads that you've published to your site).
Thank you - Jeffrey! I'm glad you enjoyed it and thank you for your feedback. The click fraud plugin is relevant to Google AdWords because the ads that get fed through the AdSense code are from Google AdWords (find out more). Using the plugin prevents Google AdWords customers from getting their budget sucked up by fraudulent clicks, which can result in banning the AdSense customer. What I think you're saying is that the AdWords customer has no control in this but rather the AdSense customer is in full control. Is that what you are alluding to?
Yes, that's correct. What I was trying to say is that you begin by talking about how click fraud wastes your budget because people are clicking on your ads, then you give that plugin as a way to fight back. Problem is, that plugin doesn't actually give advertisers any control over people clicking their ads and running up their budget, it just gives a website owner control of people clicking OTHER people's AdSense display ads that are published on your website.
I only mentioned it because I got my hopes up thinking there was a plugin that may help prevent click-frauders from running up my own PPC budget, as that's what you talked about in that section.
Now that you've pointed it out Jeffrey - I see what you mean. I wrote it in such a way that makes it seem like it gives the Google AdWords customer the control over the plugin. I should've added a disclaimer that states the control is in the hands of the webmaster (AdSense customer). Let me see what I can do. Thank you very much for the heads up Jeffrey and I'm sorry for the miscommunication.
No worries Brian - just wanted to help. I appreciate you taking the time to respond!
No problem, Jeffrey. Have a great weekend!
how to stop people by linking bad links through automatic softwares how can I stop them google disawow tool works after 100 years till then rankings get slow here is my website any good suggestions
I have just started a movie site with WordPress. Now I want to block all bat robots that sends me fake traffic again and again to take my site down.
Please tell me how do I block all BAD ROBOTS using robots.txt or .htaccess file?
Please help!
Thanks for the .htaccess share. I was actually just about to try and pull a list of spam bots and you've saved me the time and fishing.
Great article, many thanks.
Really great information, thank you so much
Great article, From few days I saw most of traffic is coming from BOT’s. on my blog
Thanks for this helpful article. It has helped me tremendously.
very cool post I wanted to know more about the bot thingie..
Nice Tips Brian
Really Appreciate :)