I veered left, but it was too late. A wall of fire sprang up in front of me, blocking my path. I turned around and there he was: Googlebot, my old nemesis and lord of the search underworld. Even as the flames engulfed me, damning me to supplemental hell, his cold, metallic laugh froze me to my very soul…

Layers of Google Hell

Ok, maybe I'm exaggerating just a little. It would probably be more accurate to compare Google's supplemental index to a virtual Purgatory, a place where morally ambiguous pages go to wander for eternity. I've personally been stuck in this purgatory for well over a year with an e-commerce client. They have a data-driven site that's been around since about 1999, and, admittedly, we've only recently started paying close attention to SEO best practices. Roughly six months ago, I realized that, of the 32,000 pages we had in Google's index, all but 7 were in supplemental. Now, before I offend Matt and Vanessa, I should add that I do believe them when they say that the supplemental index isn't a "penalty box." Unfortunately, when 99.98% of your pages are stuck in supplemental, there are, in my experience, very real consequences for your Google rankings.

Yesterday, I logged into Google Webmaster Tools and finally saw the magic words: "Results 1-10 of about 24,700." Translation: our content has finally made it to the show. So, in celebration, I'd like to share what I learned during this Dante-esque, six-month journey through hell/purgatory. First, a few details: the site is essentially a search engine for training events, powered by ColdFusion/SQL. Many of our problems were architectural; it's a good site with solid content, we've never used black-hat tactics, and I don't think Google was penalizing us in any way. We simply made a lot of small mistakes that created a very spider-unfriendly environment. What follows is a laundry list of just about everything I tried. This is not a list of suggestions; I'll try to explain what I think worked and what didn't, but I thought walking through the whole process might be informative:

  1. Created XML sitemap. Fresh from SES Chicago, I excitedly put a sampling of our main pages into a sitemaps.org-style XML file (there's a sample of the format after this list). It didn't hurt, but the impact was negligible. 
  2. Added Custom Page TITLEs. By far, our biggest problem was our use of a universal header/footer across the site, including the META tags. Realizing the error of our ways, I started creating unique TITLE tags for the main search results and event details pages. 
  3. Added Custom META descriptions. When custom titles didn't do the trick, I began populating custom META description tags, starting with the database-driven pages (there's a simplified example after this list). It took about 1-2 months to roll out custom tags across the majority of the site.
  4. Fixed 404 Headers. Another technological problem: our 404s were redirecting in such a way that Google saw them as legitimate pages (200s). I fixed this, which started culling bad pages from the index; the culling became noticeable within about two weeks, and it was the first change with an impact I could directly verify (there's a one-line sketch of the fix after this list).
  5. Created Data-not-found 404s. Although this is somewhat unique to our site, we have an error page for events that have passed or no longer exist. There's no point having that page in the index, so I modified it to return a 404. The user experience didn't change (visitors still got a specialized error message and search options), but the spiders were now able to disregard the page.
  6. Re-created sitemap.xml. Reading about Google's crackdown on indexing pages that are themselves just search results, I rebuilt our sitemap file to contain direct links to all of our event brochures (the real "meat" of the site).
  7. Added robots.txt. Yes, I didn't have one before this because, frankly, I didn't think I should be blocking anything. Unfortunately, due to the highly dynamic nature of the site, the index was carrying as many as 10 duplicates of some pages (same page, slightly different URL). I started by purging printable versions of pages (links with "?print=1", for example) and moved out from there (a sample of the patterns appears after this list). Results were noticeable within two weeks, much like the 404s.
  8. Added NOODP, NOYDIR tags. This helped with our outdated description on Yahoo, but had no effect on Google, which wasn't using our Open Directory information anyway.
  9. Created Shorter, Friendlier URLs. This was a biggie. Being a dynamic, ColdFusion site, we were using way too many URL parameters (e.g. "/event.cfm?search=seomoz&awesomeness=1000&whitehat=on"). I had been avoiding the re-engineering, but decided to simplify the most important pages, the event brochures, to a format that looked like "/event/seomoz" (there's a rough sketch of the rewrite after this list).
  10. Revealed More Data to Spiders. One of my concerns was that the spiders were only seeing search results 10 at a time and wouldn't follow very many "Next" links before giving up. I added specialized code to detect spiders and show them results in batches of 100+ (there's a sketch of this after the list, too).
  11. Changed Home-page Title. Going through the index, it occurred to me that just about every major page started with the same word and then a preposition (e.g. "Events on", "Events by", etc.). I decided to flip some of the word-order on the home-page TITLE tag, just to shake things up.
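
For steps 1 and 6, the sitemap file itself is nothing fancy. Here's a stripped-down sample of the sitemaps.org format; the domain and date are placeholders, and the real file carries one <url> entry per event brochure:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/event/seomoz</loc>
        <lastmod>2007-06-01</lastmod>
        <changefreq>weekly</changefreq>
      </url>
      <!-- ...one <url> block for each event brochure... -->
    </urlset>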
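
Steps 2, 3, and 8 all come down to what goes in the <head> of each page. For the database-driven event pages, the ColdFusion ends up looking roughly like this; the query and column names are simplified stand-ins, not our actual schema:

    <cfoutput>
    <head>
      <title>#eventQuery.title# - #eventQuery.city#, #DateFormat(eventQuery.startDate, "mmmm d, yyyy")#</title>
      <meta name="description" content="#HTMLEditFormat(eventQuery.summary)#">
      <!--- Step 8: ask the engines to skip the DMOZ / Yahoo! Directory descriptions --->
      <meta name="robots" content="noodp, noydir">
    </head>
    </cfoutput>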
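
Steps 4 and 5 essentially boil down to a single ColdFusion tag (this is a simplified sketch, not our exact template). The page still renders the custom "event not found" message and search options for human visitors; the only change is that the HTTP status now tells the spiders to drop it:

    <!--- At the top of the "event not found" template: return a real 404 status,
          then go on to display the friendly error page as usual --->
    <cfheader statuscode="404" statustext="Not Found">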
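
For step 7, the robots.txt entries are just pattern blocks for the duplicate URLs, starting with the printable versions. One caveat: the * wildcard is an extension honored by Googlebot and the other major crawlers, not part of the original robots.txt spec, so obscure bots may ignore it:

    User-agent: *
    # Keep the printable duplicates out of the index
    Disallow: /*print=1
    # ...plus similar patterns for the other duplicate-producing parameters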
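
Step 9 is mostly re-engineering inside the application, but the visible part is a one-to-one mapping between the short URL and the old parameter-driven one. I won't pretend this snippet is exactly what we run; it's an Apache-style sketch of the mapping for anyone with mod_rewrite available (IIS rewrite filters use a slightly different syntax, and the parameter name here is made up):

    # Hypothetical rewrite: serve /event/seomoz from the underlying dynamic template
    RewriteEngine On
    RewriteRule ^event/([a-z0-9-]+)/?$ /event.cfm?name=$1 [L,QSA]

Whatever the mechanism, it's also worth 301-redirecting the old parameter-heavy URLs to the new ones, or you've just created yet another set of duplicates.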
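
The "specialized code" in step 10 is nothing more exotic than a user-agent check that bumps up the page size of the search results. Here's a rough ColdFusion sketch; the variable name is made up, and the bots see exactly the same results a user would, just more of them per page:

    <!--- Let known crawlers page through results in bigger chunks --->
    <cfset resultsPerPage = 10>
    <cfif REFindNoCase("googlebot|slurp|msnbot", CGI.HTTP_USER_AGENT) GT 0>
        <cfset resultsPerPage = 100>
    </cfif>
    <!--- resultsPerPage then feeds the query paging (startRow / maxRows) --->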

Sorry, I realize this is getting a bit lengthy, but I felt there was some value in laying out the whole process. Steps 9-11 all happened shortly before we escaped supplemental, so it's a bit hard to separate their impact, but it's my belief that #9 made a big difference. I also think that culling the bad data (via both #5 and #7) had a major effect. Ideally, instead of 32,000 indexed pages, our site would have something like 2,500. It sounds odd to be actively removing pages from the index, but giving Google better-quality results and aggressively removing duplicates was, in my opinion, a large part of our success. We're down to about 24,000 pages in the index, and I plan to keep trimming.

Of course, the effects of escaping supplemental on our search rankings remain to be seen, but I'm optimistic. Ultimately, I think this process took so long (and was so monumentally frustrating) because I was undoing damage we had slowly inflicted through inept spider diplomacy over the past 3-5 years. Now that we've dug out, I think we'll actually get ahead of the game, making our search results better for Google, for end-users, and for our bottom line. I hope this is informative, and I'd love to hear from others who have gone through the same struggle.