It's an exciting day at SEOmoz - Linkscape's index has updated with fresh data crawled in the past 30 days. This update also gives us a chance to show off lots of interesting data points around the web's usage of search-specific tags and directives. Let's dive in!

The Canonical URL Tag Grows in Popularity

Rel Canonical is here to stay. Websites have been steadily adopting the tag since its announcement, and this index has the highest number and percentage of URLs employing it to date.

The overall numbers are still small. The canonical URL tag appears on less than half of 1% of all pages, and I suspect duplicate content is much more prevalent, which gives SEOs a lot of opportunity to help sites apply this directive. Don't forget that you CAN use the tag on the original version of the page, too.
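If you want to check whether a page already carries the tag, here's a minimal sketch using only Python's standard library. The `CanonicalFinder` class, the `find_canonical` helper and the example.com URL are all made up for illustration; this isn't how Linkscape detects the tag, just one simple way to look for it.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and we keep the first canonical we see
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before comparing
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page: the original version points at itself as the canonical URL
page = """
<html><head>
<link rel="canonical" href="http://www.example.com/page">
</head><body>Hello</body></html>
"""
print(find_canonical(page))
```

A page without the tag returns `None`, which makes it easy to flag candidates across a list of URLs you've already fetched.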

Usage of Nofollowed Links Falls

It would appear that the nofollow directive is falling out of favor, as evidenced by the chart below:

Nofollow use is down, both on external links and internal links, though it's taken more of a hit on internal links. Perhaps that's a sign more SEOs are getting internal nofollows removed after Google's announcement on the topic.

May 2010's Linkscape Index Stats

Linkscape's index this month has the largest number of unique root domains we've ever indexed and has improved in quality in several other ways as well. For example, some of you reported link spammers that were highly effective in gaming page/domain authority scores, and those should be fixed in OSE.

  • Pages: 41,202,970,156 (41 Billion)
  • Subdomains: 289,291,281 (289 Million)
  • Root Domains: 85,725,739 (85 Million)
  • Links: 424,255,504,138 (424 Billion)

You can see a chart of growth in the number of root domains (e.g. *.domain.com) below:

This shows the progress we've been making in reaching more new sites and getting a broader picture of the web. We've taken to heart the feedback that it's frustrating when we don't have any data on a site, and we're expanding our reach accordingly (these numbers may also show that there are lots more websites getting registered and earning links).

I've also embedded a chart below showing Linkscape's raw index URL count:

You'll notice that at the beginning of this year, we ramped up index size at the request of our users. Unfortunately, we found that this didn't correlate well to quality or usefulness in every case, so we've been refining our crawl selection and metrics before we attempt to scale up again. We do plan to grow the index again, but we're much more concerned with the value of the links and pages we report back, so we won't grow just for the sake of numbers - as Danny Sullivan and Google themselves have pointed out many times, size ≠ quality.

Changes to How OSE & Linkscape Define "Followed" vs. "Nofollowed"

Based on some more feedback from users and API partners, we've made a change to how we define "followed" and "nofollowed" links through our API, and you'll see this in Open Site Explorer. Our friends noted that links containing the rel="nofollow" attribute aren't the only ones that don't pass link juice, so we've gone ahead and made two buckets as below:

Followed:

  • 301 redirects
  • normal HTML links
  • pages that meta refresh (Google appears to treat these like 301s)
  • pages with rel="canonical" directives to another URL

Nofollowed:

  • links marked with rel="nofollow"
  • links on pages with the meta robots "nofollow" directive
  • feed autodiscovery links for blogs/RSS feeds (we're fairly sure Google doesn't treat these as juice-passing links)
  • 302 redirects

If you're using the API to pull in link data, you'll see these new delineations, which should also help with previous disparities in link count numbers (because adding followed+nofollowed previously didn't include some of these other types of links).
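To make the two buckets concrete, here's a rough sketch of how you might apply the same split to your own link data. The link-type labels (`html_link`, `302_redirect` and so on) and the function names are invented for this example; they are not the field names or flags the API returns, so map them to whatever your data actually contains.

```python
from collections import Counter

# Hypothetical labels for the link types listed above (not real API values)
FOLLOWED_TYPES = {
    "301_redirect",
    "html_link",
    "meta_refresh",
    "rel_canonical",
}
NOFOLLOWED_TYPES = {
    "rel_nofollow",
    "meta_robots_nofollow",
    "feed_autodiscovery",
    "302_redirect",
}


def bucket(link_type):
    """Return 'followed' or 'nofollowed' for one of the labels above."""
    if link_type in FOLLOWED_TYPES:
        return "followed"
    if link_type in NOFOLLOWED_TYPES:
        return "nofollowed"
    raise ValueError(f"unknown link type: {link_type}")


def count_buckets(link_types):
    """Tally how many links fall into each bucket."""
    return Counter(bucket(t) for t in link_types)


# Made-up sample of links pointing at a page
sample = ["html_link", "rel_nofollow", "302_redirect", "301_redirect"]
counts = count_buckets(sample)
print(counts["followed"], counts["nofollowed"])
```

Because every link type lands in exactly one bucket, followed plus nofollowed adds up to the total link count, which is the disparity the new definitions are meant to clear up.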

Some News on the SEOmoz API

We are proud to announce the release of a Linkscape Ruby gem. This gem contains all of the code we used to access the Site Intelligence API and power Open Site Explorer. If you were looking for a time to get started with our API, this bit of sample code should make it even easier. For more information about the gem, check out the Ruby section of the Sample Code page here.

We're also making it easy to track future updates via the Linkscape Schedule in our API wiki. If you haven't yet checked out the API, now's the time - you can build remarkable things for on-site analysis, link data extraction or anything else that requires trillions of links :-)

A Fond Farewell to Nick Gerner

Unfortunately, I've got some sad news to report as well. Nick Gerner, who helped to create Linkscape in 2008, is leaving the team next week. He's been an incredible engineer and a good friend to everyone here at SEOmoz and many of our colleagues in the community as well. We wish him well and can't wait to see what he does next (he's assured us it's something exciting in the startup world).

If you've been connecting with Nick regarding the API, you can send those requests to Sarah Bird, and feel free to pass any direct questions about Linkscape to sitesupport, where Ben, Chas & Phil are helping to improve the index and our tools on that front.

Looking forward to the discussion - hope this weekend post doesn't intrude on too much family time. Don't forget to have a great Mother's Day!