In the last 10 months, we've taken a number of dramatic steps to improve the link information available to webmasters & SEOs. Today, I'm pleased to announce even more progress in that direction, as well as to walk through the impressive store of data now accessible.
Sections in this post:
- Linkscape's Web Index Over Time
- Upgrades to Link Data & Metrics
- Tools to View Link Information
- The Future of Link Data
Linkscape's Web Index Over Time
When Linkscape first launched last October, it featured ~30 Billion URLs - impressive, but much smaller than the depth we've reached today. Today we're announcing our July index update (technically launched late last week) with 48.5 Billion URLs, slightly smaller than our last index, but with a greater focus on quality and less spam/junk.
# of Links in Each Index - March-July 2009
# of Pages in Each Index - March-July 2009
# of Root & Sub Domains in Each Index - March-July 2009
Perhaps not surprisingly, though, the more we crawl, the more evident it becomes that much of the web isn't worth indexing or serving. So while these numbers are meaningful to an extent, most of our work doesn't change them (and some of it decreases them), yet that work should still be improving the quality of our index.
July Linkscape Update: Upgrades to Link Data & Metrics
July's index is the first to feature several important upgrades:
#1 - The "Via 301" Link Flag
When requesting link data for a site or page, we'll now show you important links that point to URLs that 301 redirect to that location. I still recall early feedback from Danny Sullivan, who was very upset that Linkscape didn't show him many of what he considered the "most important links" to SearchEngineLand.com. As it turned out, a large number of those pointed to www.searchengineland.com (which 301 redirects), hence the confusion. When deciding on a 301 strategy, people sometimes run reports on a 301'd URL to see only the links pointing through it. That still works; now those links also appear on reports for the target of the 301.
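Here's a minimal Python sketch of the "via 301" idea. The data model (a list of `(source, target)` link pairs plus a dict of known 301s) and the function name `links_with_via_301` are hypothetical illustrations, not Linkscape's actual implementation. Each link is credited to the URL its redirect chain finally reaches and flagged with the redirecting URL it passed through. It also stays listed on the 301'd URL's own report.

```python
from collections import defaultdict

def links_with_via_301(links, redirects):
    """Group inbound links by the URL they finally reach after 301s.

    links: iterable of (source_url, target_url) pairs.
    redirects: dict mapping a URL to the URL it 301-redirects to.
    Returns {url: [(source_url, via_301_url_or_None), ...]}.
    """
    report = defaultdict(list)
    for source, target in links:
        final, seen = target, {target}
        # Follow the redirect chain, stopping if it loops back on itself.
        while final in redirects and redirects[final] not in seen:
            final = redirects[final]
            seen.add(final)
        if final == target:
            report[target].append((source, None))
        else:
            # Credit the redirect target, flagged with the URL passed through...
            report[final].append((source, target))
            # ...and keep listing the link on the 301'd URL's own report.
            report[target].append((source, None))
    return dict(report)
```

Take the SearchEngineLand case, assuming `http://www.searchengineland.com/` 301s to `http://searchengineland.com/`. A link pointing at the www URL shows up on the non-www report, with the www URL as its "via 301" flag. The `seen` set just keeps a looping redirect chain from running forever.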
#2 - MozRank "Evaporation" through NoFollows
The mozRank algorithm now "evaporates" link juice through nofollowed links, in much the same way Google described its change to PageRank. For those wondering why the SEO world didn't notice the nofollow change, there's some fairly compelling information in the correlation data between mozRank & Google's toolbar PageRank:
Note that mozRank shows finer gradations than the toolbar (e.g. 5.57 vs. just "5"). Because of that, even a "perfect" correlation would still produce an average error of about 0.25. The MAE (mean absolute error) remains remarkably close to that floor, so changing the nofollow treatment clearly had only a very slight impact.
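Both ideas in this subsection can be made concrete with a short Python sketch.

- `rank_with_evaporation` is a generic PageRank-style iteration, not mozRank's actual (unpublished) formula. Each page splits its score across all of its outgoing links, and the share assigned to a nofollowed link is dropped rather than passed on.
- `mae_floor` checks the 0.25 figure. It assumes the toolbar rounds a continuous score to the nearest whole number and that scores are spread evenly.

```python
def rank_with_evaporation(out_links, damping=0.85, iterations=50):
    """PageRank-style scores where nofollowed links 'evaporate' their share.

    out_links: {page: [(target, is_nofollow), ...]}
    A page's score is split across ALL of its outgoing links; the share
    assigned to a nofollowed link is dropped instead of passed on.
    """
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(target for target, _ in targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in out_links.items():
            if not targets:
                continue  # dangling pages simply leak their score in this sketch
            share = damping * rank[page] / len(targets)
            for target, is_nofollow in targets:
                if not is_nofollow:
                    new_rank[target] += share
        rank = new_rank
    return rank


def mae_floor(samples=100_000):
    """Average |score - round(score)| for scores spread evenly over [0, 10).

    Models a toolbar that shows only whole numbers (assumed: nearest-integer
    rounding) against an idealized continuous score that matches it perfectly.
    """
    total = 0.0
    for i in range(samples):
        score = 10.0 * (i + 0.5) / samples  # midpoints avoid exact .5 ties
        total += abs(score - round(score))
    return total / samples


# Page A links to B (followed) and C (nofollowed): C receives nothing from A,
# and B gets only half of A's passable score rather than all of it.
scores = rank_with_evaporation({"A": [("B", False), ("C", True)]})
print(round(mae_floor(), 4))  # 0.25
```

Under those assumptions, a perfect underlying match still leaves an average gap of a quarter point. If the toolbar truncated instead of rounding, the same calculation would give 0.5.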
#3 - Canonical URL Tags Now Indexed
Although this began in our last index, it's worth noting that canonical URL tags are being picked up and indexed; we count around 35 million of them. Until it becomes more evident exactly how the different search engines treat the tags, we're holding off on anything drastic, like always trusting the tag in our canonicalization code. This means that unless URLs are canonicalized for other reasons, we still produce separate reports for different URLs. You may, however, see some "canonical tag" links in a few places.
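Picking up the tag itself is the easy part. Here's a minimal sketch using Python's standard-library `html.parser`. It records the canonical URL as a hint and deliberately doesn't merge URLs, in keeping with the cautious approach above. The names `CanonicalFinder` and `find_canonical` are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]

def find_canonical(html):
    """Return the canonical URL hint, or None; the page is not merged."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical
```

Self-closing `<link ... />` tags also work, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.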
#4 - Large Sites Have More Consistent Link Data
Although we're still a few updates away from crawling as deeply as we'd like on large sites, this latest index shows considerably more and better data about "important pages" on "important domains." Some of our users noticed that although we often had a number of pages from large sites, they were frequently not the top-level or most linked-to pages - this fix works to address that. Future indices will multiply this capacity considerably.
#5 - Blogscape's Data Helping Linkscape Stay Fresher
One of the best features of newer Linkscape indices is their inclusion of fresher link data from the blogosphere and the "fresh web" (social media sites like Twitter, Facebook, LinkedIn, web forums and others that push data out via feeds). Linkscape is now sucking down link data from Blogscape's fresh crawl of the web (updated from 10 million+ feeds every 3 hours) and pushing it out in index updates. There's still a delay between Linkscape updates, but the link data produced is now considerably better at showing important links from the fresh web.
Tools to View Link Information
This may be a bit overwhelming, but it's also very, very cool :-) As you probably know, Linkscape data is infiltrating all sorts of tubes on the Internet. Here's a smattering:
Quirk's SearchStatus Bar
The good folks over at Quirk.biz have baked mozRank into their SearchStatus Firefox extension:
SEOmoz's MozBar
If you haven't yet installed the MozBar, I highly recommend it. I'm also very, very excited for the upgrade coming out a few weeks from today. In fact, I'm so excited, I'm leaking a spliced up screenshot (because the 800 pixel wide bar won't fit in this 600 pixel wide post):
As we noted above, knowing the number of linking root domains is critical to SEO link analysis, so we're packing it into the new release. That "analyze page" button is going to be seriously awesome, too. Sadly, as I mentioned in my previous post about SEO operators, we've been asked by Google to remove PageRank from our toolbar, but there are lots of other third-party extensions that can provide it, like the above SearchStatus bar.
The Free Linkscape API
Our free API serves millions of requests every month, spreading link data far and wide. If you have an application, an internal tool, or hate manually importing data (like I do), check out the API and Nick's post on the subject.
Top Pages on a Domain
One of my very favorite tools on the web for SEO (and Richard Baxter's too!), Top Pages lets you enter any domain or subdomain and see the pages on it that have received the largest number of links from unique root domains. The signal to noise ratio is fantastic and it's remarkably useful for both internal analysis (Do I have opportunities I'm not executing on? Where do I have some spare link juice? What pages might perform best for given keywords?) and competitive information (What is my competition doing that's bringing them links?).
Smashing Magazine has done some serious Linkbait!
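The core idea behind Top Pages can be sketched in a few lines of Python: count unique linking root domains rather than raw links. This isn't the tool's actual code. The input format, the function names, and the naive root-domain rule are assumptions for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def root_domain(url):
    """Naive root domain: the last two labels of the hostname.

    Assumption for illustration only; real code should use the Public
    Suffix List so hosts like www.example.co.uk are handled correctly.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def top_pages(links, limit=10):
    """Rank a site's pages by the number of unique external root domains
    linking to them. links: iterable of (source_url, target_url) pairs."""
    domains_by_page = defaultdict(set)
    for source, target in links:
        source_root = root_domain(source)
        if source_root != root_domain(target):  # ignore internal links
            domains_by_page[target].add(source_root)
    ranked = sorted(domains_by_page.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(page, len(domains)) for page, domains in ranked[:limit]]
```

Counting domains instead of links is what keeps the signal-to-noise ratio high. A single site with a thousand sitewide footer links counts once, not a thousand times.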
Backlink Anchor Text Analysis Tool
When you need to see anchor text distribution across thousands of links in a few seconds, there's nothing else like the Backlink Anchor Text Analysis tool. Upgraded this Spring to show Linkscape data, it features sub-30-second runtimes and phenomenal comprehensiveness.
Poor Dave... His friends aren't using good keywords to link to him. Here you go, buddy - UK SEO
If you'd like even more functionality (particularly the ability to choose a subdomain, root domain or individual URL), the labs version of this tool is also quite excellent.
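At its heart, an anchor text distribution is a frequency count. Here's a minimal sketch, assuming you already have the anchor text of each inbound link as a list of strings. The light normalization (lowercasing and collapsing whitespace) and the function name are hypothetical choices, not necessarily what the tool does.

```python
from collections import Counter

def anchor_text_distribution(anchors, top=5):
    """Return (anchor, count, share_of_links) for the most common anchors."""
    # Treat "SEOmoz", "seomoz " and "SEOMOZ" as the same anchor.
    normalized = [" ".join(a.lower().split()) for a in anchors]
    counts = Counter(normalized)
    total = sum(counts.values())
    return [(text, n, n / total) for text, n in counts.most_common(top)]
```

`Counter.most_common` orders anchors with equal counts by first appearance, so the output is stable for ties.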
Linkscape Data Visualization Tool
The most recent addition to the Labs family, Nick's amazing visualizer tool helps show exactly where strengths and weaknesses exist by comparing many of the data points Linkscape calculates on a scale using Ben's preliminary rank modeling:
Everybody loves a good radar chart
Basic Linkscape Reports
The classic Linkscape reports still provide a great depth of data and metrics, but you need to know where to look (we obviously have some usability work to do). The juiciest stuff is in the "data detail" tab:
Wow... Twitter gets a LOT of links
Advanced Linkscape Reports
For digging deep into the links that point to a page/site and the associated metrics, advanced reports are still the best source of access.
I've got more to write about Oyster.com in the near future (and not just because their namesake is delectable)
The Future of Link Data
There's clearly been a lot of exciting progress made, but it doesn't hold a candle to what's possible. Marketers need data, and SEOmoz's obligation (and mission) is to answer that call. What's been done to date hasn't been easy, and what lies ahead is even harder, particularly making many pieces of incredibly complex information simple and actionable. But if we wanted easy, crawling the web and building query-independent search ranking metrics probably wasn't the way to go :-)
Some of the biggest things we're thinking about for the future include:
- Crawling deeper and producing more frequent index updates
- Showing historical link information (this one is especially challenging because of index and web size fluctuations)
- Illustrating more about internal link architectures on a site and providing recommendations for improvement
- Building ranking models that predict actions that will drive up organic rankings
- Visualizing important data about links, pages, keywords and global metrics
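One simple yardstick for historical link data is to express a site's link count relative to the size of the index it was measured in. The sketch below assumes you know each index's total link count. The function name and the numbers in the example are hypothetical. It only addresses index-size swings, not changes in which URLs an index chooses to include.

```python
def links_per_billion(history):
    """Normalize a site's link count by the size of each index snapshot.

    history: list of (index_label, site_links, index_total_links) tuples.
    Returns (index_label, site links per billion links in that index).
    """
    return [
        (label, site_links * 1_000_000_000 / index_total)
        for label, site_links, index_total in history
        if index_total > 0  # skip snapshots with no size information
    ]

# Hypothetical snapshots: the raw count drops, but so does the index.
snapshots = [("June", 1_200, 60_000_000_000), ("July", 1_000, 50_000_000_000)]
print(links_per_billion(snapshots))  # [('June', 20.0), ('July', 20.0)]
```

Here the raw count falls from 1,200 to 1,000, but the site's share of the index holds steady. That points to the index shrinking rather than the site losing links.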
Again, I'll share a brief taste of what's ahead (remember, these are just concept wireframes):
The future looks bright indeed.
As always, we rely on the feedback of our members and the SEO community to help us improve the information provided. Please leave any requests or questions in the comments or send them over to [email protected].
How are you guys distinguishing between spam/junk and quality? Are you using the Spam Detection Algorithm created by Nick, Danny, and Ben? If it's super secret, I understand. Share whatever details you can. Thanks!
That "spam detection tool" only looks at the domain name itself. Much of our junk control is about carefully choosing what to crawl and index at all. You've got to have some links (internal or otherwise), and your domain had better have some external links.
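The two rules in that reply translate into a simple filter. This is a sketch only: the `PageStats` fields and the threshold of one external link are assumptions for illustration. The actual selection criteria aren't described beyond this reply.

```python
from dataclasses import dataclass

@dataclass
class PageStats:
    url: str
    inbound_links: int          # internal or external links to this URL
    domain_external_links: int  # external links to the URL's domain

def crawl_candidates(pages, min_domain_external_links=1):
    """Keep URLs that have at least one inbound link and whose domain has
    at least `min_domain_external_links` external links (hypothetical rule)."""
    return [
        p.url
        for p in pages
        if p.inbound_links > 0
        and p.domain_external_links >= min_domain_external_links
    ]
```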
Awesome news Rand - great to see you guys pushing forward. The tools are already awesome and so all these new exciting updates are bonus!
One thing I'd love to see is a combination of the top pages tool and the linkscape data - being able to see which page on your domain a link points to easily is something that's missing at the moment. Even if it was just a field in the linkscape export for linked-to URL that would be cool. I'm sure you're already on this though :-)
That sounds like a good feature! +1
And just for extra points, the ability to organise top pages data by server header response on the linked to URL...
That would be a sweet tool.
Nice review.
Small suggestion -
The graphs would be easier to read if the Y axis were labeled something like "(in billions)" and showed the numbers 1, 2, 3 instead of a 1 followed by nine 0s.
Good point Rajat! I should talk to Google's Charts team on that front :-)
Or you could use a MS product like Word or Excel to make your graphs. ;)
Good eye Raj, was thinking the same thing, thumbs up
That top pages tool is a must have for every SEO.
I remember very well getting a sneaky preview of some of those wireframes at SMX and feeling really excited. For me, better visualisation and UI is definitely the way forward for Linkscape.
Jumped out of bed to post this comment. Just going to check out the 301 data and then off again (it's getting late in the UK...)
I'm super excited about the upcoming visualization stuff too. Trying to incorporate much more of that into my client reports these days, but it also helps make the data more understandable for myself!
Question, Rand: do you make note of URLs you don't have data on so you can crawl them in future updates? A lot of my clients' sites aren't in your index yet, which makes it rather hard to use Linkscape regularly :)
I'm going to have to have a chat with the higher-ups at work with nice budgets. These tools look amazing!
Here's a simple suggestion for the mozbar - add a friggen link to /blog and /ugc in the dropdown list under my username.
That's a good point actually - it's easy to forget about the static links but I actually use the link to the Q&A section all the time.
That's what I end up using too (dashboard page takes too long to load IMO) but it's still an extra click on the site when I really wanted to get to /blog or /ugc.
Congratulations on the update! I look forward to using the tools, and to the new Moz Toolbar when it comes out for Firefox 3.5.
Wow, that is impressive. I cancelled my Pro account back when I wasn't getting much out of that tool. It might be time to return!
All this data is making me hungry.
If I could choose PRO over feeding needy children - I'd so be down with it.
Wow, exciting stuff guys! Sounds like the SEOmoz team has been busy. Can't wait to see what else you have up your sleeves.
WOW! This tool is mind boggling with the data it returns. Congratulations on being a pioneer in this industry.
It would also be good to dive more into the subject of domain names. Then I wouldn't have to use different tools for that and could stay on SEOmoz for all my needs : )
One thing I noticed about Linkscape and the SEOmoz Toolbar is their inconsistency with SSL. It happens with a lot of the tools here.
I was just testing out Linkscape again and ran two basic reports.
Without posting the actual domains, this is what I got:
https://www.domain-a.com shows it redirects to https://www.domain-a.com/store and prompted me to see if I wanted to run the report on that URL instead.
https://www.domain-b.com is a redirect but it doesn't show that. It redirects to https://www.domain-b.com but gives me zero ability to check it.
For these tools to work as accurately as possible there needs to be support for SSL.
From google:
Results 1 - 10 of about 417,000,000 for inurl:https://
Hopefully 417 million is reason enough for some upgrades :)
Yeah - we've been thinking about ways to deal with https URLs. It's challenging for a number of reasons: many sites don't want those URLs crawled, many block access, and many have canonicalization issues around them. It's certainly something we want to address in the future, though; comprehensiveness of the index is a major goal.
I always like to approach problems with logical solutions.
We have two ways to look at this subject.
1. (My personal view) The only major difference between http and https urls is the SSL. When indexing the internet it is up to the owners of websites to ensure the https versions of their websites are set up properly.
It's up to them to block indexing if they would like to. It is up to them to ensure canonicalization issues do not arise.
I know I do, it is just proper practice to do so.
2. Treat https as something vastly different and let whoever is doing the indexing compensate for domain owners.
This, however, doesn't seem like the right route. If a website has pages blocked or has canonicalization issues, the tools should report it; that would help fix those issues.
PS. I like your new profile avatar; very "just got back from a modeling gig in Italy."
Nick's visualizer tool is amazing indeed!
Yes this is one of my fav new tools also. Seeing the domains linking compared to external links is powerful ;)
Those new visual tools for important data are much needed. I really struggle explaining SEO and the link data to clients - any extra help from nice graphs etc is going to help.
I agree, SEO-Doctor. Explaining to clients the data and metrics used to improve their rankings is much more difficult than doing the work. Good graphics that help convey the concepts to non-SEO experts are invaluable.
Rand, a comment on how to address, "Showing historical link information (this one is especially challenging because of index and web size fluctuations)"
You could maintain a bar chart to show this data broken into three basic pieces:
1. Good clean Linkscape lovin links. These are the ones that you track and that most likely matter in the scheme of website rankings.
2. Site-side changes. 404s mostly, but maybe some other server errors and what not as well.
3. Spammy links and what have you. Things that you've scrubbed out of the index.
Do this and you could see trends in how changes to a website have affected the business, and increases or decreases in the percentage of spam, while also tracking overall quality links.
The issue isn't so much displaying the data as the fluctuation in index sizes and focus. For example, in Yahoo! Site Explorer (or Google Webmaster Tools), if you track the topline number of external links over time, it looks ridiculous. It will bounce from 100K to 40K to 250K and back. Obviously, you're not gaining and losing that many links every few days or weeks, but those indices are constantly changing which URLs stay in and which fall out. So the counts are practically useless unless you have some yardstick to compare them to and understand what types of things have changed. Unlike visitor counts, there's no clean methodology for showing this data in a useful way, so we have to create one...
Agreed that Google and Yahoo's numbers fluctuate quite a bit, but the idea of using a stacked bar chart would be to qualify it beyond the top line. Like so:
https://web3.twitpic.com/img/17425729-57030a6cea90038d72dabf378df490c9.4a5d0e8c-full.gif
In that loose example Solid could represent links that meet a few qualifiers or are above a certain mozRank passed, or whatever, Broken can be links that point at broken or missing on-site pages, and Spammyish can be the stuff that is still in the fluctuation category.
The main premise would be to visually represent the type of links to acquire or create, with the rest of the bars pointing out areas that could use improvement but aren't nearly as precise.
https://www.twitpic.com/adhs1
Better link
That's a pretty good idea - still not easy by any means, but great thinking :-)
If you ever have more thoughts you want to share, the team would love to hear them - just email [email protected].
This stuff looks great. I'll have to bother my boss about buying a pro membership.
I wonder if I should say "Yeah I'm third" or write a real comment. Well, I'll go with the second option.
I'm quite impressed by the new twist you and your team are putting on things, Rand; this new way of showing metrics will be "the buzz" of the SEO industry.
I do wonder, though, whether there will be some way to order links by how much benefit they bring (relevancy, PR, popularity, you name it).
And I just got my Firefox mozzed up; it's the first time I've heard about the mozbar. Good luck all, cheers.
*edit: darn, I'm not second anymore.
You know I love all the upcoming stuff. I'm looking forward to the toolbar update mainly because that's coming first, and I use it heavily every day. Nice work :)
Holy lots of data Batman!
Now all that's needed is a widget front-end with daily/weekly trending and other small bits of condensed information. Too much information is overkill for us. Lately, simple, more intuitive "snippets" are all we can afford to spend time digesting.
It seems that not going Pro with seomoz at this point is just plain silly if you're at all serious about SEO.
I'll make the leap in the next couple of weeks. It'll easily pay for itself many times over each month with so many data hungry clients.
I like to think of the membership fees at SEOmoz and other trade sites that provide value as the rent I pay for taking up space in this industry. If you're going to live here, you ought to share in the rent.
"Top pages tool" is a real winner. I must sign up to the PRO account; the amount of data here is quite remarkable.
If my employer understood the importance of SEO in real terms, rather than what they imagine SEO to be, and the tremendous effect it can have on a business, they would be buying me “SEOmoz PRO” to help the business and build our knowledge base. Oh well, they will never learn.
They will learn because you will teach them...
*creepy SEOmoz-ish laugh*
It is like banging my head against a wall. To the bigwigs, money is too precious for them to see that, for a small amount of it, more can be achieved with the tools than without them.
As I always say:
"Don't wait for the rain to be over, it's all about learning to dance in the rain!!!"
It's raining and I am still waiting! NIGHTMARE!!!!!!
That was a booboo!
"Don't wait for the STORM to be over, it's all about learning to dance in the rain!!!"
That's better.
I just simply love mozbar, so excited there is an upgrade incoming
Great Job :D
The SEOmoz tools are getting so good! I'm really looking forward to seeing the c-block data. I have been looking for a tool that can give me that information.
I'm sure glad I locked in my SEOmoz pro membership at a very good rate. I have an email that says I can renew at that rate indefinitely, and I'm storing it in a very safe place.
Great to see a new update for the mozbar, but can we get something cool, like an option to flag a domain for crawling within the bar, and then a countdown that shows when new fresh data will be displayed for the page/domain we are viewing.
So are the toolbars only for firefox? Or is there a stand alone app I can use. I have been toying with the idea of going Pro, but I hate firefox. It's too much of a resource hog for me...
Yeah, it's only for firefox for the moment although Rob recently wrote a post on how to get SEO "plugins" working for chrome and it runs off some of the pro tools which is pretty awesome!
https://www.distilled.co.uk/blog/reputation-monitor/google-chrome-plugins-for-seo/
Yeah I use Chrome it's lightweight and perfect for my laptop. I'll def. look into it, I am pretty sure I'm gonna go Pro soon it's just a matter of time now heh...
Most Developers I know use Firefox...
Nevertheless, I would appreciate a plugin for Safari, as it's the browser I use most of the time. Any chance of that coming soon?
We don't have plans for a Safari plug-in right now, but it's something we'll definitely consider as demand rises. We might also look at a solution like StumbleUpon's recent ability to show data in the window as you surf.
Yeah I have never really liked firefox, never liked the feel of it, and it's a major resource hog, also has crashed on me countless times... To this day Chrome has never crashed *knocks on wood, crosses fingers, etc.*
Everybody should just make plug-ins for Chrome and make me the happiest person ever. =)
Rand, I'm sorry if this is a stupid question, but I do believe it's a real question for many users...
Personally, I just like using Linkscape better 9 times out of 10, but what is the real benefit of using Linkscape over Majestic SEO?
For my humble uses the benefits that I like are
- linkscape seems to generally have a more up to date index
- filters out a lot of crap, but could still filter out more :-)
- Is quick, and tends to provide me really good data for North American Sites that I work with.
Bottom line: are there any gaps that a system like Majestic SEO fills right now that are of actionable use to an SEO?
Having a far larger index is clearly of massive benefit, but if the index is outdated and the results returned need huge amounts of time to filter correctly, how actionable is data of unknown quality in a real-world work environment...
This is just one that has been bugging me for a bit and I thought maybe this post would be the best time and place to ask.
Thanks in advance.
Marc
I can only speak for our strategy: We want to make the kind of data available from Linkscape available in a variety of contexts. More and more we're providing actionable insights and suggestions, rather than raw data. This is a commitment as difficult to deliver on as crawling and indexing the web.
So we think we provide a lot of technology and expertise at SEOmoz that you can't get elsewhere.
That certainly works for me....
You guys continually impress me. These are amazing tools. If you're going to be coming down here to Panama any time soon, hit me up.
By the way, I STILL haven't received a replacement for the first disc of the training series I ordered, which broke on the way. What do I need to do to get it sent again?
Thanks!
Mackenze
I spent a couple of years out of the SEO world, came back, and found cool tools like this. Great job.
Great tools.
The only thing that worries me is that for some websites, mozRank and PageRank differ greatly. These are Dutch government-related websites and other well-known Dutch institutions (that are not engaged in SEO practices whatsoever).
I don't know but a MozRank 4.5 for a highly trusted PageRank 8 website is a bit strange don't you think?
Hi bertifuel,
Can you share the sites here? If not, email [email protected] - I'm sure they'd be interested to hear your experience.
In my experience this kind of thing happens most often when you're looking at a non-canonical version of a page. e.g.
Google might show the same toolbar PR for these URLs:
www.domain.com
www.domain.com/index.php
while Linkscape will show a separate mozRank for each URL (and usually the index.php version will be lower), so check that you're looking at the strongest page.
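To compare like with like, you can collapse default-document variants before looking up metrics. This is a minimal sketch under stated assumptions:

- The set of default document names is hypothetical.
- URLs are assumed to include a scheme.
- `/index.php` is assumed to serve the same page as `/`, which is exactly the kind of assumption Linkscape doesn't make automatically.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of "default document" file names to collapse.
DEFAULT_DOCUMENTS = {"index.php", "index.html", "index.htm", "default.aspx"}

def collapse_default_document(url):
    """Map 'http://www.domain.com/index.php' style URLs onto the directory URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    # Lowercase the host, keep the query string, and drop any #fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))
```

With this, `http://www.domain.com` and `http://www.domain.com/index.php` both map to `http://www.domain.com/`. You can then check which variant carries the stronger mozRank.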
This is all so incredible. Wish I could give this post ten thumbs up. The mozbar is a daily part of my life online...really looking forward to the update!
It's time to get an account, then... amazing additions... I just wish the price were a bit lower...