Many of you saw this post from seoco.co.uk this morning (or its Sphinn thread) about our Web 2.0 Awards being removed from Google's index. We noticed the same thing late last night and spent some time this morning going through what could have happened. We were relatively sure that we must have inadvertently linked to a bad neighborhood; the page is very link-heavy and includes some lesser-known sites as well as big guns like Google Blog Search and Last.fm.

For those of you who haven't heard about it, this is what we saw this morning:



That pointless little bar at the bottom of the screen that we constantly tell people not to worry about had gone from full of green (7/10) to sadly gray overnight. The story at Google was even worse.



The page you see ranking first is the full list of winners and honorable mentions, but it is the "shortened" version. The main page, https://moz.com/web.2.0, was gone.

To make a long story short, Rand got in touch with Google this morning and was advised that changing the URL so it doesn't end in ".0" would be a wise decision. Google would prefer not to make an official or public comment, but they did give us permission to share this tidbit. Naturally, we dug deeper, and found that it's not just inadvisable but literally impossible to get a URL indexed in Google's engine if it ends with a .0 (similar to how Google won't index URLs with file extensions such as .exe or .tgz).

Whilst there is plenty of evidence that URLs ending in .0 often belong to spam pages (wild guess here, but let's say there are 800,000 or so URLs on the web ending in a ".0" and maybe, oh... I don't know, 0.5% of them are worth indexing), I'm not sure that this is a good metric by which to determine an immediate penalty. Some other decent pages have been hit in a similar way, including https://en.wikipedia.org/wiki/Windows_1.0, which enjoys a healthy number of backlinks but won't appear in Google. The page at https://en.wikipedia.org/wiki/Web_2.0 appears in Google's index as https://en.wikipedia.org/wiki/Web_2. None of the URLs which redirect to include a trailing slash are flagged.

Becoming more fascinated by this, we did some investigating. What we discovered was that this penalty is indeed limited to the number zero. URLs ending in .n, where "n" is any other number, are not removed. If Google finds a version of the page that resolves with a trailing slash, you'll avoid the penalty. In one instance, a page that resolved with underscores in place of the full stops was indexed.
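The pattern described above can be sketched as a small Python check. This is only an illustration of what we observed, not Google's actual filter: the function name `ends_in_dot_zero` is made up for this example, and the rule it encodes (path ends in ".0" with no trailing slash) is our reading of the evidence.

```python
from urllib.parse import urlsplit


def ends_in_dot_zero(url: str) -> bool:
    """Return True if the URL's path ends in '.0' with no trailing slash.

    Hypothetical helper: it mirrors the pattern observed in this post,
    not any documented Google rule.
    """
    # urlsplit only finds the path reliably when a scheme is present,
    # so add one for bare URLs such as "drupal.org/drupal-5.0".
    if "://" not in url:
        url = "http://" + url
    return urlsplit(url).path.endswith(".0")


if __name__ == "__main__":
    for candidate in [
        "en.wikipedia.org/wiki/Windows_1.0",        # matches the pattern
        "en.wikipedia.org/wiki/Web_2.0/",           # trailing slash
        "drupal.org/drupal-5.0-beta1",              # does not end in .0
        "www.mythtv.org/wiki/index.php/Opensuse_10.3",  # ends in .3
    ]:
        print(candidate, ends_in_dot_zero(candidate))
```

Only the path is examined, so a query string or fragment after the ".0" does not change the result.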

Below is an assortment of URLs which are indexed in Yahoo! (and many also in Live), but which show no PageRank and do not appear in Google's index. Below those, I've listed very similar pages that are indexed, but which do not end in .0.

Out of Google's Index (but in Yahoo!):
  • en.wikipedia.org/wiki/Windows_1.0
  • en.wikipedia.org/wiki/Web_2.0
  • https://en.wikipedia.org/wiki/Die_Hard_4.0
  • drupal.org/drupal-5.0
  • keznews.com/3799_Vista_Transformation_Pack_8.0_Final_-_VTP_8.0
  • en.wikipedia.org/wiki/BASIC_8.0
  • drupal.org/drupal-6.0
  • en.opensuse.org/OpenSUSE_11.0
  • www.shopping.com/xGS-Illustrator_11.0
  • www.mythtv.org/wiki/index.php/Opensuse_11.0
  • www.shopping.com/xGS-Suse_9.0
  • en.wikipedia.org/wiki/Mac_OS_X_10.0
  • en.opensuse.org/Bugs:Most_Annoying_Bugs_10.0

In the index:
  • en.wikipedia.org/wiki/Web_2
  • drupal.org/drupal-5.0-beta1
  • https://keznews.com/3799_Vista_Transformation_Pack_8_0_Final_-_VTP_8_0
  • drupal.org/drupal-6.0-beta1
  • www.mythtv.org/wiki/index.php/Opensuse_10.3
  • www.mythtv.org/wiki/index.php/Opensuse_10.2
  • en.opensuse.org/Bugs:Most_Annoying_Bugs_10.3

This page has PageRank (it shows a PR 3), but didn't show up in a Google search: https://www.fileplanet.com/62709/60000/fileinfo/WinZip-9.0-
The same URL without the trailing hyphen, https://www.fileplanet.com/62709/60000/fileinfo/WinZip-9.0, is not indexed and has no PageRank. Call this duplicate content if you will, but it still shows the same trend in action.

You'll notice some interesting things, such as the fact that en.opensuse.org/Bugs:Most_Annoying_Bugs_10.3 is indexed but en.opensuse.org/Bugs:Most_Annoying_Bugs_10.0 is not.

Quite simply, making sure a page resolves with a trailing slash will avoid this problem. I'm of the opinion that this is a pretty silly thing to penalise for without some sort of human review, but it's important that we pick up on things like this so that we can avoid such "false positive" penalties. Make sure to add "check for URLs ending in .0" to your next checklist for site reviews and please, do share in the comments if you've found any other filename extensions that exhibit similar behaviour in any of the engines.
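For the site-review checklist, something like the following rough sketch could flag affected URLs in a crawl export and propose the trailing-slash form. The function name `audit_urls` and the example.com addresses are invented for illustration, and the input is assumed to be full URLs including the scheme. The script only suggests an alternative; your server still has to serve or redirect to it.

```python
from urllib.parse import urlsplit, urlunsplit


def audit_urls(urls):
    """Return (url, suggested_url) pairs for URLs whose path ends in '.0'.

    Hypothetical checklist helper; expects full URLs with a scheme.
    """
    flagged = []
    for url in urls:
        parts = urlsplit(url)
        if parts.path.endswith(".0"):
            # Append a slash to the path only, keeping any query string
            # or fragment where it was.
            suggested = urlunsplit(parts._replace(path=parts.path + "/"))
            flagged.append((url, suggested))
    return flagged


if __name__ == "__main__":
    # Made-up URLs standing in for a crawl export.
    crawl = [
        "https://example.com/releases/2.0",
        "https://example.com/releases/2.0-beta1",
        "https://example.com/docs/5.0?ref=home",
        "https://example.com/docs/5.3",
    ]
    for original, suggestion in audit_urls(crawl):
        print(original, "->", suggestion)
```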

UPDATE:
en.wikipedia.org/wiki/SAML_1.1 also seems to be suffering from a penalty, so it will be useful to go through some more URLs that end in .n to gauge whether or not they're penalised. Most of the examples we saw that didn't involve a zero had not been hit in any way. I'd love to know how extensive this filter really is.