You may recall Rand's recent post about prioritizing the best pages to crawl, and mine about churn in the web. We've applied some of the principles from these posts to our own crawling and indexing. Rand discussed how crawlers might discover good content on a domain by selecting well-linked-to entry points:
In the past, we selected pages to crawl based purely on mozRank. That turned out to favor some unsavory elements (you know who you are :P). Now we look at each domain and determine how authoritative it is. From there, we select pages using the principle illustrated above: highly linked-to pages (the homepage, category pages, important pieces of deep content) link to other important pages we should crawl. Intuition and experience tell us this makes our crawler behave much like a search engine's would.
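To make the selection principle concrete, here is a minimal sketch. It assumes we already have an authority score per domain and an inlink count per page. Both are hypothetical inputs, and the post doesn't say how Linkscape actually computes or combines them. Each domain gets a crawl budget proportional to its authority, and that budget is spent on the domain's most linked-to pages first. This only picks entry points; a real crawler would then follow links outward from them.

```python
from collections import defaultdict


def allocate_crawl_budget(domain_authority, total_budget):
    """Split a page budget across domains in proportion to each domain's authority.

    domain_authority: {domain: authority_score} (hypothetical scores).
    Budgets are floored to whole pages, so a few pages may go unallocated.
    """
    total = sum(domain_authority.values())
    if total <= 0:
        return {domain: 0 for domain in domain_authority}
    return {domain: int(total_budget * authority / total)
            for domain, authority in domain_authority.items()}


def select_entry_points(pages, domain_authority, total_budget):
    """Pick the most linked-to pages of each domain, up to that domain's budget.

    pages: iterable of (domain, url, inlink_count) tuples (hypothetical format).
    Returns the URLs to crawl first, grouped by domain.
    """
    budget = allocate_crawl_budget(domain_authority, total_budget)
    by_domain = defaultdict(list)
    for domain, url, inlinks in pages:
        by_domain[domain].append((inlinks, url))
    selected = []
    for domain, candidates in by_domain.items():
        # Most linked-to first; ties broken alphabetically for a stable order.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        # Domains without an authority score get no budget.
        selected.extend(url for _, url in candidates[:budget.get(domain, 0)])
    return selected
```

For example, with authority scores `{"a.com": 3, "b.com": 1}` and a budget of 4 pages, `a.com` gets three slots (its homepage, a category page and a deep page) while `b.com` gets only its homepage. A low-authority domain can't crowd out a high-authority one just by having a page with a lot of inlinks.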
In a past post, I discussed the importance of fresh data. After all, if 25% of pages on the web disappear after one month, data collected two or more months ago just isn't actionable. From now on, we're focusing on that first bar in the graph above. By the time our data approaches that second bar (meaning most of it is out of date), we should have an index update for you. If and when we show you historical data, we'll mark it as such.
What this means for you is that all our tools powered by Linkscape will provide fresher, more relevant data, and we'll have better coverage than ever. This includes things like:
As well as products and tools developed outside SEOmoz using either the free or paid API: There are plenty more. In fact, you could build one too!
Because I know how much everyone likes numbers, here are some stats from our latest index:
- URLs: 43,813,674,337
- Subdomains: 251,428,688
- Root Domains: 69,881,887
- Links: 9,204,328,536,611
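For a rough sense of scale, the totals above can be turned into simple averages. This only divides the reported figures. It says nothing about how links or URLs are distributed, which is almost certainly very skewed.

```python
# Totals as reported for the latest index.
urls = 43_813_674_337
subdomains = 251_428_688
root_domains = 69_881_887
links = 9_204_328_536_611


def index_ratios(urls, subdomains, root_domains, links):
    """Plain averages derived from the index totals (no distribution info)."""
    return {
        "links_per_url": links / urls,
        "urls_per_root_domain": urls / root_domains,
        "subdomains_per_root_domain": subdomains / root_domains,
    }


for name, value in index_ratios(urls, subdomains, root_domains, links).items():
    print(f"{name}: {value:,.1f}")
```

That works out to roughly 210 links per URL, about 627 URLs per root domain and about 3.6 subdomains per root domain.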
Our next update is scheduled for March 11. But we'll update the index before then if the data is ready early :)
As always, keep the feedback coming. With our own toolset relying on this data, and dozens of partners using our API to develop their own applications, it's critical that we hear what you guys think.
NOTE: we're still updating the top 500 list at the moment. We'll tweet when that's ready.
It all began because of the free Linkscape API...
I'm about to release my SEO plugin for Google Chrome...
To get the beta, go to https://j.mp/chromeSEO
It features an enhanced Yahoo Site Explorer with Linkscape data... https://twitpic.com/13foaj
Enhanced Google Webmaster Tools with Linkscape data... https://twitpic.com/13evmw
Social media info about a page in Google Analytics https://twitpic.com/13mss3
Plus a ton of other cool features to analyze on-page SEO and such.
The keyword and suggestion sections are still a work in progress, but I'd love feedback... I'm @cartercole
A youMoz post with all the features and the official release will follow soon!
I've tried this extension, and it's pretty slick :)
You're a great guy!
Thanks... I'll go check out the beta and let you know if I find any problems/issues/bugs.
Just checked out your chromeSEO extension. Awesome job!
Lots of info, and a great UI! A+
Thanks for the update! You guys rock.
I've checked OSE for some UK sites, and I am really glad to see the index has much better coverage now. (Some sites used to have a Domain Authority of 0, and they now have proper figures.)
I'm sure Ben will be very pleased to hear it. I'll pass the comment along.
Really great work guys! Love me some fresh data.
Way to go, Nick. Linkscape (and its step... sister? brother? cousin? Open Site Explorer) are getting more and more reliable. It's tools like this that made me bite the bullet and become a Pro member.
Thanks for all the sweat y'all put into it.
I definitely agree on that one, but what I appreciate most is that you guys don't just sit back and rest on the numerous laurels you've gathered in the past; you keep working to improve the services you offer. Thanks!
I'm also really looking forward to the updated Top 500 list, so we can see which company has had the edge over others during the past period.
The new top 500 is up! Enjoy :-)
Thank you for all the support :)
great...
I agree, Open Site Explorer was also a big factor in my upgrading to the Pro level. It's a great tool and saves so much time. It was an easy choice.
It's awesome that you guys just refreshed the data. Some of my websites weren't indexed in the last one, so it's good to see fresh data in Linkscape & Open Site Explorer now.
We've been hearing this a lot lately :)
We're all really proud of the work that went into the latest index, especially the work done by Chas.
From the home screen of the Linkscape tool, it would be great to see the date of the last index update.
I have a hard time knowing when I need to run my reports again. I usually just wait for a post like this, but sometimes I miss them.
That's a good suggestion. We're tossing around the idea of having a calendar widget of some sort. What do you think?
I'm not sure if this is common, but when analyzing competitors' backlinks in OSE, I often find very spammy, unauthoritative-looking websites near the top of the list. And that's with the list sorted by Page Authority.
If you have examples, send them on to us (PM, twitter @gerner, whatever)
Domain authority of 86 - www.image#rent%a&car.com/blog/ - remove the #%&
Give me a break lol
I have the same results. A lot of fishing sites, casinos, and still some poker sites are coming across as high authority for my competitors, but it seems like those sites are just selling backlinks to anyone with a buck.
Glad to see the more frequent updates! Keep up the good work.
What do you think will be the shortest amount of time for the linkscape index to be updated in the future? How close to real-time do you think you will be able to get?
The linkscape tools look to be spitting out much more relevant data with these new update/improvements.
Good question! Our best near-term goal is around once every 2 or 3 weeks. We're not quite there yet, but soon we hope!
At the moment, the biggest bottleneck is getting fresh raw crawl data. There are only 604,800 seconds in a week, and sustaining a high fetch rate (many pages per second) from a single server isn't an option for most crawlers.
We can update more frequently, but in terms of freshness it's likely we'll always have data that is two or three weeks old, and some data that's older even than that.
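A back-of-the-envelope calculation shows why. Suppose, purely for illustration, that every URL in the latest index (43,813,674,337 from the stats in the post) had to be re-fetched within one week. The sketch computes the sustained fetch rate that implies, and how many machines it would take at a hypothetical per-server rate of 100 pages per second. That rate is an assumption, not a figure from the post.

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800


def required_fetch_rate(pages, days=7):
    """Sustained pages per second needed to fetch `pages` URLs in `days` days."""
    return pages / (days * 24 * 60 * 60)


def servers_needed(pages, pages_per_server_per_second, days=7):
    """Crawler machines needed at a given (hypothetical) per-server fetch rate."""
    return math.ceil(required_fetch_rate(pages, days) / pages_per_server_per_second)


index_urls = 43_813_674_337  # URL count from the latest index stats
print(f"{required_fetch_rate(index_urls):,.0f} pages/s")  # 72,443 pages/s
print(servers_needed(index_urls, 100), "servers at 100 pages/s each")  # 725
```

Even at that assumed per-server rate, a one-week full recrawl needs hundreds of machines running flat out. This is why some of the data will always be a couple of weeks old.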
Hey Nick, wasn't the Linkscape update supposed to be rolled out on the 21st? (If my memory serves me correctly.)
If so, way to get the Linkscape update out earlier than expected!
One heck of a job! :P
Yes! We did update well ahead of schedule :) Like I mentioned above, if we get the data ready sooner, we'll push it out early.
Thanks for noticing. This is largely due to hard work by Chas on improving the reliability of our processes.
Great to see the improvements, and I'm ready to have fun with them.
I think freshness is something we all need as the "speed" of the Internet keeps increasing, and it's especially needed for clearing out the web's rubbish.
Linkscape and Open Site Explorer are also becoming a great marketing instrument for dispelling the last pre-contract doubts many clients have... especially when I use Linkscape-fed tools such as the Lab's Linkscape Visualization and Comparison.
These improvements will make it an even stronger marketing tool (apart from being a great tool to work with).
I'm already imagining the great idea behind the next new SEOmoz tool coming later this year (yes, this is a request for spoilers).
This is great news. I've got a lot of faith in the applications that SEOmoz are producing... They're only getting better. I have a massive feeling that SEOmoz are going to become the standard for SEOs around the world... They're already on the way.
Great to see the updates and improvements put into place. These tools are going to help my SEO team a hell of a lot more.
That's darn-good news Nick. Thank you! Now I will have to spend countless hours checking site data! :P
Cheers!
Open Site Explorer is a great tool. Keep the updates coming, SEOmoz!
Great stuff you guys!
I was actually wondering when the next update would be, and here it is. It's also great to know you guys are working more on excluding spammy sites; this will definitely help with analyzing link juice and all.
More grease Seomoz, more grease
Ignore my previous post, I misread it. Looking forward to seeing Open Site Explorer updated.
Is there a history of how many links a page had, i.e., a Linkscape history?
We don't have a tool that does that today. But you could certainly build your own using our API. Every month or couple of weeks you could pull the links you've got at that moment and build the history up over time.
This is certainly something we're hard at work on though :)
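Here is a minimal sketch of the "build the history yourself" approach, using Python's built-in `sqlite3`. It only stores and reads back snapshots. The link count itself would come from whatever API call you make. The post doesn't name an endpoint, so any `fetch_link_count(url)` helper you pair it with is hypothetical and left for you to write against the API.

```python
import sqlite3
from datetime import date


def open_history(path=":memory:"):
    """Open (or create) a small SQLite store of per-URL link-count snapshots."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS link_snapshots (
               url TEXT NOT NULL,
               snapshot_date TEXT NOT NULL,  -- ISO 8601, so text order = date order
               link_count INTEGER NOT NULL,
               PRIMARY KEY (url, snapshot_date)
           )"""
    )
    return conn


def record_snapshot(conn, url, link_count, snapshot_date=None):
    """Store the link count seen for `url` on `snapshot_date` (default: today)."""
    day = (snapshot_date or date.today()).isoformat()
    # Re-running on the same day overwrites that day's value.
    conn.execute(
        "INSERT OR REPLACE INTO link_snapshots VALUES (?, ?, ?)",
        (url, day, link_count),
    )
    conn.commit()


def link_history(conn, url):
    """Return [(date, link_count), ...] for `url`, oldest first."""
    rows = conn.execute(
        "SELECT snapshot_date, link_count FROM link_snapshots "
        "WHERE url = ? ORDER BY snapshot_date",
        (url,),
    )
    return [(date.fromisoformat(day), count) for day, count in rows]
```

Run `record_snapshot(conn, url, fetch_link_count(url))` on a schedule (say, every couple of weeks, after each index update) against a file-backed database such as `open_history("links.db")`. The history then accumulates from that point on. It can't recover counts from before you started recording.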
Do I understand correctly: if I haven't recorded it myself, I can't know how many links I had 5 months ago? Then how does https://www.majesticseo.com/ have all that data?! Can you please explain in detail? I'm really keen to build such a tool.
The challenge with historical link data is to illustrate meaningful change over time. Because so much of what's out there is junk (temp redirects, spam, session ids, etc.) it's easy to get a spike that doesn't mean anything. Watching changes in the top links over time might be more valuable. So grabbing what you can get from the free API or OSE and comparing that might work.
If you build it, more power to you :) Let me know about it too!
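One way to "watch changes in the top links" is to diff only the strongest links between two snapshots, so junk churn in the long tail doesn't show up as a spike. This is a sketch under assumptions: each snapshot is a list of (linking URL, authority score) pairs you've saved yourself from the free API or OSE. The score could be Page Authority or mozRank, whichever you recorded, and the cutoff `n=25` is arbitrary.

```python
def top_links(links, n):
    """Source URLs of the n highest-authority links.

    links: iterable of (source_url, authority) pairs from one snapshot.
    Ties are broken alphabetically so the cutoff is deterministic.
    """
    ranked = sorted(links, key=lambda item: (-item[1], item[0]))
    return {url for url, _ in ranked[:n]}


def compare_top_links(old_snapshot, new_snapshot, n=25):
    """Which of the top-n linking URLs were gained, lost, or kept between snapshots."""
    old_top = top_links(old_snapshot, n)
    new_top = top_links(new_snapshot, n)
    return {
        "gained": sorted(new_top - old_top),
        "lost": sorted(old_top - new_top),
        "kept": sorted(old_top & new_top),
    }
```

For instance, comparing `[("a", 90), ("b", 80), ("c", 10)]` with `[("a", 91), ("d", 85), ("c", 10)]` at `n=2` reports `d` as gained, `b` as lost and `a` as kept. The low-value link `c` never enters the comparison.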
This would be very helpful, especially to beginners like myself. Thanks for the update.