[Image: Googlebots (created by the author)]

The Bot Bandits Are Out of Control

I’ve always known that bots crawl my websites and the sites of all my fellow developers, but I was unaware that bots now make more visits than people do to most websites. Yep, they officially overtook us in 2012, and bots now dominate website visits. Egad, it’s Star Wars run amok!

Before we become alarmed, though, let’s look at a few facts that demonstrate the preponderance of bots in our midst.

The bots are coming. The bots are coming. The bots are here!

[Image (source)]

Incapsula’s 2013 bot traffic report states that “Bot visits are up 21% to represent 61.5% of all website traffic.” If bots are preponderant, what does that mean for us?

For those of you just tuning in, preponderance means “the quality or fact of being greater in number, quantity, or importance.” That means the bots are “more important than humans” in determining the value of websites to potential readers.

A quick look at antonyms for preponderance reveals that our plight is worse than expected. Antonyms for preponderance include disadvantage, inferiority, subordination, subservience, surrender and weakness.

All is not lost, however. Not all bots are bad. In fact, in the wild and woolly world of SEO, Googlebots are actually our friends. A Googlebot is Google’s web-crawling bot, also known as a “spider,” which scours the Internet in search of new pages and websites to add to Google’s index.

Googlebots: Our Ally in the Bot Wars

If we think of the web as an ever-growing library with no central filing system, we can understand exactly what a Googlebot wants: its mission is to crawl this library and create the filing system. To do that, bots need to be able to crawl sites quickly and easily. When a Googlebot arrives at your site, its first point of access is your site’s robots.txt file, so make sure that file is accessible and accurate. The less time Googlebots spend on irrelevant portions of your site, the better. At the same time, be sure you have not inadvertently siloed off or blocked pages that should be crawlable.
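To make that concrete, here is a minimal robots.txt sketch that keeps crawlers out of low-value areas and points them at your sitemap. The paths and domain are placeholders, so adapt them to your own site’s structure:

User-agent: *
# Keep bots out of admin screens and internal search results
Disallow: /wp-admin/
Disallow: /search/
# Placeholder: allow the AJAX endpoint many WordPress themes rely on
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.example.com/sitemap.xml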

[Image: web crawler (source)]

Next, Googlebots use the sitemap.xml file to discover all areas of your site. The first rule of thumb is this: keep it simple. Googlebots do not crawl DHTML, Flash, Ajax or JavaScript as well as they crawl HTML, and since Google has been less than forthcoming about how its bots handle JavaScript and Ajax, avoid using those technologies for your site’s most important elements. Next, use internal linking to create a smart, logical structure that helps the bots crawl your site efficiently. To check the integrity of your internal linking structure, go to Google Webmaster Tools -> Search Traffic -> Internal Links. The top-linked pages should be your site’s most important pages. If they aren’t, you need to rethink your linking structure.
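As for the sitemap itself, a bare-bones sitemap.xml looks something like the following. The URLs and dates are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want discovered -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2014-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/blog/</loc>
    <lastmod>2014-01-10</lastmod>
  </url>
</urlset>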

So, how do you know if the Googlebots are happy? You can analyze Googlebot’s performance on your site by checking for crawl errors. Simply go to Webmaster Tools -> Crawl and check the diagnostic reports for site errors, URL errors, crawl stats, sitemaps and blocked URLs.

The Enemy in our Midst: Bandit Bots

Googlebots aren't the only bots visiting your site. In fact, over 38% of the bots crawling our sites are up to no good. So not only are we outnumbered, but nearly 2 out of every 5 bot visits to your site come from bots trying to steal information, exploit security loopholes and pretend to be something they are not.

We'll call these evil bots "bandit bots".

So, what are we to do?

As an SEO provider and website developer, I could protest. I could blog my little heart out and get a few friends to join me. Or I could buckle down and take responsibility for my own little corner of the web and fight back against the bandit bots. 

Let's do this together.

Bandit Bots: What They Are and How to Fight Back

[Image: Terminator robot (source)]

The bad guys come in four flavors. Learn which bots to watch out for and how to fight back.

Scrapers

These bandit bots steal and duplicate content, as well as email addresses. Scraper bots normally focus on retrieving data from a specific website, and they also try to collect personal information from directories and message boards. While scrapers target a variety of verticals, common targets include online directories, airlines, e-commerce sites and online property sites. Scrapers will also use your stolen content to intercept web traffic that should have come to you. Additionally, they can scramble multiple pieces of scraped content together into “new” content, which helps them avoid duplicate-content penalties.

What’s at risk: Scrapers grab your RSS feed so they know when you publish content. However, if you don't know that your site is being attacked by scrapers, you may not realize there's a problem. In the eyes of Google, however, ignorance is no excuse. Your website could be hit by severe penalties for duplicate content and even fail to appear in search engine rankings.

How to fight back: Be proactive and attentive to your site so you can take action before severe damage is done.

There are two good ways to identify whether your site is the victim of a scraper attack. One option is to use a duplicate-content detection service like Copyscape to see if any duplicate content turns up.

[Image: Copyscape plagiarism checker (created by the author)]

A second option for alerting you that content might have been stolen from your site is to use trackbacks within your own content. In general, it's good SEO to include one or two internal site links within your written content. When you include these links, be sure to activate WordPress's trackback feature: in the trackback field on your blog’s entry page, simply enter the URL of the article you are referencing. (In this case, it will be a page on your own website, not another site.)

[Image: WordPress Add New Post screen (created by the author)]

You can manually look at your trackbacks to see which sites are using your links. If you find that your content has been re-posted without your permission on a spam site, file a DMCA complaint with Google.

Finally, if you know the IP address from which scraper bots are operating, you can redirect them away from your real feed. Add the following code to your .htaccess file to serve that address a decoy feed instead. (If you are unfamiliar with this file, first learn how to edit your .htaccess file; WordPress has documentation on the topic.)

RewriteEngine on
# Match the scraper's IP address (dots escaped, string anchored)
RewriteCond %{REMOTE_ADDR} ^69\.16\.226\.12$
# Redirect every request from that address to the decoy feed
RewriteRule ^(.*)$ https://newfeedurl.com/feed [R=302,L]

In this example, 69.16.226.12 is the scraper's IP address and https://newfeedurl.com/feed is the decoy feed you want to serve it.

Warning! Be very careful editing this file. It could break your site if done incorrectly. If you are unsure of how to edit this file, ask for help from a web developer.
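If you would rather shut a scraper out entirely than serve it decoy content, a deny rule works too. This sketch uses Apache 2.2 syntax; Apache 2.4 replaced these directives with Require:

# Apache 2.2: refuse every request from the scraper's address
Order Allow,Deny
Allow from all
Deny from 69.16.226.12

# Apache 2.4 equivalent:
# <RequireAll>
#   Require all granted
#   Require not ip 69.16.226.12
# </RequireAll>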

Hacking Tools

Hacking bandit bots target credit cards and other personal information by injecting or distributing malware to hijack a site or server. Hacker bots also try to deface sites and delete critical content.

What’s at risk: It goes without saying that should your site be the victim of a hacking bot, your customers could lose serious confidence in the security of your site for e-commerce transactions. 

How to fight back: Most attacked sites are victims of “drive-by hackings”: hackings done at random, with little regard for the impacted business. To prevent your site from becoming a victim, make a few basic modifications to your .htaccess file, which is typically found in the public_html directory. There are good starter lists of common hacking bots available online; copy and paste one into your .htaccess file to block those bots from accessing your site. You can add bots, remove bots and otherwise modify the list as necessary. A sketch of the format follows.
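To give a feel for the format, here is a short sketch that denies requests whose user-agent string matches a few well-known attack and scraping tools. The names shown are examples only, not a complete list:

RewriteEngine on
# Refuse requests from user agents matching known attack tools (examples only)
RewriteCond %{HTTP_USER_AGENT} (HTTrack|libwww-perl|Nikto|sqlmap) [NC]
RewriteRule .* - [F,L]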

Spammers

Spam bots load sites with garbage to discourage legitimate visits, turn targeted sites into link farms and bait unsuspecting visitors with malware and phishing links. Spam bots also engage in high-volume spamming intended to get a website blacklisted in search results and destroy your brand’s online reputation.

What’s at risk: Failure to protect your site from spammers can cause your website to be blacklisted, destroying all your hard work at building a credible online presence. 

How to fight back: Real-time malicious traffic detection is critical to your site’s security, but most of us don't have the time to simply sit around and monitor our site's traffic patterns. The key is to automate this process.

If you're using WordPress, the first step in fighting back against spam bots is to stop spam from getting through at all. Start by installing Akismet; it is on all my personal sites as well as the sites I manage for my clients. Next, install a trusted security plugin and set up automatic backups of your database.

[Image: WordPress security plugins (created by the author)]

Require legitimate registration with CAPTCHAs for all visitors who want to make comments or replies. Another layer worth adding, sketched below, is to reject comment submissions that arrive without a referer header, a telltale sign of a bot posting directly to wp-comments-post.php instead of using your comment form. Finally, follow wordpress.org to learn what’s new in the world of security.
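Here is that sketch for your .htaccess file; replace example.com with your own domain:

RewriteEngine on
# Block POSTs to the comment handler that do not come from your own site
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} wp-comments-post\.php [NC]
RewriteCond %{HTTP_REFERER} !example\.com [NC]
RewriteRule .* - [F,L]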

Click Frauders

Click fraud bots make PPC ads meaningless by “clicking” on the ads so many times you effectively spend your entire advertising budget, but receive no real clicks from interested customers. Not only do these attacks drain your ad budget, they also hurt your ad relevance score for whatever program you may be using. Google AdWords and Facebook ads are the most frequent targets of these attacks.

What’s at risk: Click fraud bots waste your ad budget with meaningless clicks and prevent interested customers from actually clicking on your ad. Worse, your ad relevance score will plummet, destroying your credibility and making it difficult to compete for quality customers in the future.

How to fight back: If your WordPress site is being targeted by click fraud bots, immediately download and install the Google AdSense Click Fraud monitoring plugin. The plugin counts all clicks on your ads; should the clicks from one visitor exceed a specified number, that visitor's IP address (bot or human) is blocked. The plugin can also block a list of specific IP addresses. Note that the plugin is for AdSense publishers to install on their own websites; AdWords advertisers cannot use it.

[Image: AdSense Click Fraud plugin (created by the author)]

Defending a website from bandit bots takes a concerted effort. While the steps above are important and useful, there are some attacks, like a coordinated DDoS, that you simply cannot fight off on your own. Fortunately, a number of tech security companies specialize in anti-DDoS tools and services. If you suspect your site (or one of your clients’ sites) is being targeted for DDoS, such companies can be key to a successful defense.


Summary

Giving honest Googlebots what they want is quite simple: develop strong, relevant content and publish regularly. Combating the fake Googlebots and other bandit bots is a bit tougher. Like many things in life, it requires diligence and hard work.