It's an exciting day at SEOmoz - Linkscape's index has updated with fresh data crawled in the past 30 days. This update also gives us a chance to show off lots of interesting data points around the web's usage of search-specific tags and directives. Let's dive in!
The Canonical URL Tag Grows in Popularity
Rel Canonical is here to stay. Websites have been growing in their adoption of the tag since its announcement, and this index has the highest number and percentage of URLs employing it to date.
The overall numbers are still small. Canonical URLs are on less than half of 1% of all pages, and I suspect duplicate content is much more prevalent, thus giving SEOs a lot of opportunity to help sites apply this directive. Don't forget that you CAN use the tag on the original version of the page, too.
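If you want to check whether a given page already declares a canonical URL, here is a minimal sketch in Python using only the standard library. It is not how Linkscape detects the tag; the class and function names (`CanonicalFinder`, `find_canonical`) and the example.com URL are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> seen in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and we keep the first match only.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel can hold several space-separated tokens, so split before testing.
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page used purely as an example.
page = '<html><head><link rel="canonical" href="http://www.example.com/page"></head></html>'
print(find_canonical(page))
```

A page that points the tag at its own URL (the "original version" case mentioned above) would simply return that same URL.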
Usage of Nofollowed Links Falls
It would appear that the nofollow directive is falling out of favor, as evidenced by the chart below:
Nofollow use is down, both on external links and internal links, though it's taken more of a hit on internal links. Perhaps that's a sign more SEOs are getting internal nofollows removed after Google's announcement on the topic.
May 2010's Linkscape Index Stats
Linkscape's index this month has the largest number of unique, root domains we've ever indexed and has improved quality in several other ways as well. For example, some of you reported some link spammers that were highly effective in gaming page/domain authority scores, and those should be fixed in OSE.
- Pages: 41,202,970,156 (41 Billion)
- Subdomains: 289,291,281 (289 Million)
- Root Domains: 85,725,739 (85 Million)
- Links: 424,255,504,138 (424 Billion)
You can see a chart of growth in the number of root domains (e.g. *.domain.com) below:
This shows the progress we've made in reaching more new sites and getting a broader picture of the web. We've taken to heart the feedback that it's frustrating when we don't have any data on a site, and we're expanding our reach accordingly (these numbers may also show that lots more websites are getting registered and earning links).
I've also embedded a chart below showing Linkscape's raw index URL count:
You'll notice that at the beginning of this year, we ramped up index size at the request of our users. Unfortunately, we found that this didn't correlate well to quality or usefulness in every case, so we've been refining our crawl selection and metrics before we attempt to scale up again. We do plan to grow the index again, but we're much more concerned with the value of the links and pages we report back, so we won't grow just for the sake of numbers - as Danny Sullivan and Google themselves have pointed out many times, size ≠ quality.
Changes to How OSE & Linkscape Define "Followed" vs. "Nofollowed"
Based on some more feedback from users and API partners, we've made a change to how we define "followed" and "nofollowed" links through our API, and you'll see this in Open Site Explorer. Our friends noted that links containing the rel="nofollow" attribute aren't the only ones that don't pass link juice, so we've gone ahead and made two buckets as below:
Followed:
- 301 redirects
- normal HTML links
- pages that meta refresh (Google appears to treat these like 301s)
- pages with rel="canonical" directives to another URL
Nofollowed:
- links marked with rel="nofollow"
- links on pages with the meta robots "nofollow" directive
- feed autodiscovery links for blogs/RSS feeds (we're fairly sure Google doesn't treat these as juice-passing links)
- 302 redirects
If you're using the API to pull in link data, you'll see these new delineations, which should also help with previous disparities in link count numbers (because adding followed+nofollowed previously didn't include some of these other types of links).
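To make the two buckets concrete, here is a small Python sketch of the grouping described above. The link-type labels (such as `redirect_301` or `feed_autodiscovery`) are hypothetical names chosen for this example, not the actual values the API returns, and the function names are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical labels for the link types listed above; the real API
# may name or encode these differently.
FOLLOWED = {"html_link", "redirect_301", "meta_refresh", "rel_canonical"}
NOFOLLOWED = {
    "rel_nofollow",
    "meta_robots_nofollow",
    "feed_autodiscovery",
    "redirect_302",
}


def bucket(link_type: str) -> str:
    """Map a link type to 'followed' or 'nofollowed'."""
    if link_type in FOLLOWED:
        return "followed"
    if link_type in NOFOLLOWED:
        return "nofollowed"
    raise ValueError(f"unknown link type: {link_type}")


def count_buckets(link_types: Iterable[str]) -> Dict[str, int]:
    """Count how many links fall into each bucket."""
    counts = Counter(bucket(t) for t in link_types)
    return {"followed": counts["followed"], "nofollowed": counts["nofollowed"]}


sample = ["html_link", "rel_nofollow", "meta_refresh", "feed_autodiscovery", "redirect_302"]
print(count_buckets(sample))
```

Because every link type lands in exactly one bucket, followed plus nofollowed adds up to the total, which is the point of the change.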
Some News on the SEOmoz API
We are proud to announce the release of a Linkscape Ruby gem. This gem contains all of the code we used to access the Site Intelligence API and power Open Site Explorer. If you were looking for a time to get started with our API, this bit of sample code should make it even easier. For more information about the gem, check out the Ruby section of the Sample Code page here.
We're also making it easy to track future updates via the Linkscape Schedule in our API wiki. If you haven't yet checked out the API, now's the time - you can build remarkable things for on-site analysis, link data extraction or anything else that requires trillions of links :-)
A Fond Farewell to Nick Gerner
Unfortunately, I've got some sad news to report as well. Nick Gerner, who helped to create Linkscape in 2008, is leaving the team next week. He's been an incredible engineer and a good friend to everyone here at SEOmoz and many of our colleagues in the community as well. We wish him well and can't wait to see what he does next (he's assured us it's something exciting in the startup world).
If you've been connecting with Nick regarding the API, you can send those requests to Sarah Bird and feel free to pass any direct questions about Linkscape to sitesupport where Ben, Chas & Phil are helping to improve the index and our tools on that front.
Looking forward to the discussion - hope this weekend post doesn't intrude on too much family time. Don't forget to have a great Mother's Day!
It's been a blast working at SEOmoz. And I'm excited to see so many new hands getting involved in Linkscape (both inside and outside SEOmoz). I'm confident it will continue to be a useful source of data and intelligence, and will continue to improve.
And don't worry, once I stop being staff, all my mozPoints will count to rankings here and on https://www.mozpoints.com/.
So I've got lots of incentive to continue to participate :)
Sorry to see you leave; you were a lot of help with SEO Site Tools.
Thanks a lot and good luck...
What would make the % of nofollows over time graph more useful is if it also had the number of links over time graphed with it. I'd like to see if that number went down just the same as nofollows. If so, then nofollows didn't really go down, since we would have to look at the percent of nofollows of the overall links and its change over time - or is that what this graph already shows?
Hi Sean - yeah, it's already showing percentage, not absolute numbers, so we're controlling for variations in the number of links in our index.
I do think it would be valuable to show the percentage of root and subdomains employing nofollows as well as just pages, though.
Just wanted to say - best of luck Nick - your work has been amazing and I wish you all the best of luck for the future!
Nice post, thanks!
Ahem, I have always assumed that a meta refresh is treated as a 301 redirect only if set to 0 seconds; if >1 it's treated as a 302. An important distinction.
Awesome article, Rand. In a way I think it's good to see that a lot of the web is taking the initiative to prevent canonicalisation problems and also reducing the overuse of rel=nofollow on their websites (a lot of the time it is unnecessary and is sometimes downright rude).
Unfortunate to see Nick Gerner leaving your ranks, hopefully he'll go on to do great(er) things!
Ah nice will have to do an update to those who did the test using Dr Pete's OpenSite Explorer link profile
https://www.seomoz.org/blog/link-profiling-with-open-site-explorer
Sorry to hear Nick is moving on, but good idea to move to a startup and get them focused from day 1 being #1
Sorry to hear that Nick's leaving - look forward to hearing what he's up to next.
Very useful info on rel=canonical BTW.
To deal with the duplicate content issue we give 301 redirect in the .htaccess from the non www version to the www version of the site. And on the home page we give a canonical tag to specify
domainname.com
www.domainname.com
domainname.com/index.html
www.domainname.com/index.html are the same.
and also set the preferred domain as www.domainname.com in the webmaster tools.
But whenever we have used the on page canonical tag the no. of indexed pages of the site in google go on decreasing. So we have no other option but to remove the canonical tag.
If we do not use the canonical tag then the indexing improves again.
Why does that happen?
WMT (not surprisingly) was giving me duplicate warnings for listing pages (page2, page3 etc). So I thought Canonical tag to the rescue. Unfortunately I saw the same - a seemingly dramatic drop in GoogleBots' visiting activity and a marked drop in indexed pages.
I couldn't nail it down to anything else I had changed. Removed the tag, and the trend reversed - more spidering, and more pages in the index.
The listing pages themselves aren't important of course, and don't need to rank for anything. What they link to on the other hand is another matter entirely.
We also had a significant decrease in traffic following using canonical and found the culprit to be pagination. We (unfortunately) have significant pagination and found that canonical affects the bot crawling deeper into the pagination.
I saw that exact trend too. I got worried when WMT threw warnings saying that "it may be crawling unnecessary pages" due to the high number of pagination/search result filters on my sites.
I ended up reverting back to the original structure, and decided to let Googlebot figure it out and do what they want with it. Seems to be doing better now. Besides, WMT even says that the warnings won't affect crawl rates or indexation, which to me means I can ignore them until they say otherwise.
Rand I remember in one of the WBFs you had mentioned that the canonical tag had made one of your websites go haywire too.
Could you please throw some light on this.
I just noticed that you have not used the canonical tag on www.seomoz.org and www.seomoz.org/blog .
I have added this comment as a reply just to maintain the link to the topic.
Hi there,
Just wondering where in WMT you see the "it may be crawling unnecessary pages" warning? We have a massive site and we are undergoing a redesign at the moment, so we just want to make sure the crawl of the new site is as efficient as possible. So I'm trying to identify the crawling of unnecessary pages. Other than the HTML suggestions, I'm not sure how to find these in WMT.
thanks,
Suzanne
In my opinion, the increase in canonical tag usage is also caused by the latest version of WordPress, which specifies the canonical tag on all pages.
Sorry to hear we're losing Nick, and best of luck to him in his next endeavor.
Just a note, because it does come up in Q&A a lot. Remember that you can use rel=canonical cross-domain now. Google has officially endorsed it for "legitimate" uses.
As with many here, sorry you are leaving Nick. Linkscape is a really great piece of work. Best of luck on your new adventure.
Since I started working in SEO, Linkscape has been one of the most useful tools I've ever used, and I'm so glad it's having the success it deserves and that it is getting better with time.
And I thank Nick because he has been part of its success, and wish him all the best for his new adventure.
I know that many other news are going to come in the next few months on tools side of SEOmoz, as you are announcing them from time to time here and there.
This is good news for a good weekend... let me go back to my baby "orcs" ;)... and a happy Mother's Day to all the SEO Moms
Definitely a great tool and thanks for the timely updates!
I am very curious to see what these "Weapons of Mass Awesomeness" are that lie ahead!
Great post as usual Rand.
Linkscape has come a long way since back when it first was launched, that's for sure.
I'm happy (for you Nick) that you're leaving for a new, ground floor opportunity.
I'm sad for us 'cuz I think you are a brilliant engineer that we've really benefited from.
Best wishes for your future success Nick. Don't be a stranger, and in your spare time (hah, hah) come back and leave a comment or two every now and again.
What a great tool, thanks for the update and improvements.