The Linkscape web index updated this past Wednesday, December 8. You can see new link data for sites and pages via Open Site Explorer, the mozBar, the Web App and Linkscape Classic. This is also one of our freshest and fastest index updates (from the prior update on November 16, it's only been 22 days), so the links you'll see were crawled at the end of November and processing occurred in the beginning of December.
This index is slightly smaller than others, but oddly, it appears to be due to some extra decay on the web; we've requested as many pages as normal but just haven't seen as many still alive. Perhaps the holidays are the time to put old sites on ice, as last year appeared to show a somewhat similar pattern (tough to confirm whether this is Linkscape's crawling behavior vs. a true representation of the web, though).
Index 34's stats are:
- 39,821,634,471 (39.8 Billion) Pages
- 420,451,251 (420 Million) Subdomains
- 104,307,322 (104 Million) Root Domains
- 387,736,255,184 (388 Billion) Links
- 2.08% of All Links are Nofollowed (down 0.02% from November)
- 57.66% are internal (up from 56.99% in November)
- 42.34% are external (down from 43.01% in November)
- 5.91% of pages have rel=canonical (up from 5.88% in November)
- 62.11 links/page on average (down from 62.28 in October)
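A few of the figures above can be cross-checked with simple arithmetic. The sketch below is only a sanity check on the published numbers, not a description of how Linkscape computes its stats. The variable names are illustrative.

```python
from datetime import date

# Figures copied from the Index 34 list above.
subdomains = 420_451_251
root_domains = 104_307_322
internal_pct = 57.66
external_pct = 42.34

# Average number of subdomains per root domain in this index.
subdomains_per_root = subdomains / root_domains

# Internal and external links should account for every link.
link_split_total = internal_pct + external_pct

# Dates of the previous and current index updates, as given in the post.
days_between_updates = (date(2010, 12, 8) - date(2010, 11, 16)).days

print(f"Subdomains per root domain: {subdomains_per_root:.2f}")
print(f"Internal + external share: {link_split_total:.2f}%")
print(f"Days since previous update: {days_between_updates}")
```

The ratio works out to roughly four subdomains per root domain, and the gap between updates matches the 22 days mentioned at the top of the post.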
Since it's the end of the year, I thought it would be interesting to take a look back at some of the Linkscape data from the beginning of 2010 up through this latest update. Just remember that this comes from Linkscape's indices only, so the entire web isn't represented, though a substantive portion is included (many tens of billions of pages per index).
Rel=Canonical Use in 2010
Links / Page in 2010
Nofollow Distribution in 2010
Quantity of Subdomains per Root Domain in 2010
Average Percent of Links to a Site that Came from the Same C-Block of IP Addresses in 2010
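For readers unfamiliar with the term, a C-block is commonly taken to mean the group of IPv4 addresses sharing the same first three octets (a /24 network). The post does not say how Linkscape computes this metric, so the following is only a minimal sketch under that assumption. The function names and the sample addresses (drawn from reserved documentation ranges) are hypothetical.

```python
import ipaddress

def c_block(ip: str) -> str:
    """Return the /24 network (first three octets) that contains an IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def same_c_block_share(target_ip: str, linking_ips: list[str]) -> float:
    """Percent of linking hosts whose IP falls in the target's C-block.

    Assumption: "same C-block" means the linking host shares a /24 with the
    site being linked to. Linkscape's actual definition may differ.
    """
    if not linking_ips:
        return 0.0
    target = c_block(target_ip)
    same = sum(1 for ip in linking_ips if c_block(ip) == target)
    return 100.0 * same / len(linking_ips)

# Hypothetical sample: two of the four linking hosts share the target's /24.
share = same_c_block_share(
    "192.0.2.10",
    ["192.0.2.200", "198.51.100.7", "192.0.2.1", "203.0.113.9"],
)
print(f"{share:.1f}% of links come from the same C-block")
```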
While we're excited with how far Linkscape has come in 2010, there's a lot more progress to be made and a lot of effort going into it. In 2011, expect to see even faster updates, recursive crawling (meaning new pages on the web go into indices at an increased clip), much larger indices (25-50%+ larger) to reach deep down into those corners of the web we miss today, and more views on the data.
In the meantime, we invite you to use the tools above and the Awesome FREE SEOmoz API, and tell us what else you'd like to see in the future.
Finally, SEOmoz PRO is getting nominations for two categories of TechCrunch's Crunchies awards. It would mean a lot to us if you'd drop by and support our nod for Best Internet Application and Best Technology Achievement. TC takes one vote/day until midnight on December 24, so if you feel inclined, feel free to vote daily. Thanks!
Good luck in the Crunchies (you got my vote)!
Side note: Speaking of crunchies, I ate an entire box of "Cap'n Crunch Oops! All Berries" last week. Why didn't they think of this sooner? :)
hahaha this just made my morning. :)
Thanks for the quick update. I would love to see this each month - esp nice for young sites.
I totally agree. It would be nice to have regular updates so we can see how the new sites we manage are performing. Thanks for the quick update, by the way.
Will definitely be voting for you guys at TechCrunch!
I agree!
Thanks for sharing the data & voted :-)
"tell us what else you'd like to see in the future"
API Requests:
Keep up the good work :o)
Would it be possible to include link age in the Linkscape data? Maybe you could graph new links acquired over time by domain? This could be a very powerful tool in the SEO armory.
Good work on the update. Any theories on those spikes on the Rel=Canonical and links/page graphs?
My guess is that we crawled some weirder areas of the web that were biased toward those metrics, and smoothed out over time. I doubt a substantive percent of pages actually threw them up and then took them off (but hard to say for sure).
I was wondering why people kept tweeting me to vote. I thought you could only vote once. Snap! All those missed opportunity days!
I'm going to go vote now and I'll bookmark it so I can vote daily.
Hi Rand and SEOmoz folks!
As far as I remember, this is the first time you've shown stats about C-blocks... Am I right?
Do you have plans to show this info (# of backlinks from a C-block) in the OSE report? I think data about anchor text + C-block IPs is really useful for understanding why some URLs rank well or poorly.
That data is already in the API and should make its way into the web app very soon, too. There's an update to OSE coming in March/April and at that time, you'll see the c-block metrics there, as well.
Great news Rand. Look forward to the update in the Spring!
Thanks Rand, just voted!
SEOmoz is the best. Between your toolbar and your mozbot, we're finding ways to make our sites better in the SEO world.
Thanks so much!
Hi
I noticed recently that some key links are missing in OSE when I looked at a few competitor sites e.g. links from Yahoo and DMOZ that I recall were there a good few months ago but seem to have disappeared from OSE. On checking in Yahoo Site Explorer the links were indeed still listed.
Also, on one of my own sites there is at least one key link, which I have had for over 6 months, that is not listed in OSE. It does show up in Yahoo Site Explorer and Google Webmaster Tools.
I'm afraid I haven't had time to quantify this fully so it could just be some kind of blip or one-off. It would be useful if anyone experienced the same thing or could help me with this one since OSE is my preferred tool if I can be confident of the information.
Thanks
I'm assuming the ~4 subdomains per root domain is thrown off by the Blogspots, Tumblrs and Wordpress.coms of the world?
What do you think about removing those kinds of domains from these stats and showing us a new graph?
E.g. wordpress.com, blogspot.com, tumblr.com, etc.
Yeah - sorta tough to pull out sites manually or even programmatically prior to making these calculations, but I agree those could be inflating the count; we just don't know by how much.
I have some new ideas for tools using the API that I am working on with a buddy. I'll let you know what we come up with. This is truly a new look at the web that we haven't ever seen before!
Wow, Really impressed that there was only 22 days between updates. Go SEOmoz!!
Thanks! Our dev team is moving toward shrinking that even more!
This index update is very useful for us... The one point I hope you'll fix ASAP is the quality of links crawled in foreign languages...
Currently my team and I think the anchor report is not always relevant for our client's website (a French website), but when we're working on English websites we get very, very relevant link data...
Maybe in the future we'll have a filter by language or country inside OSE or Linkscape? Another point is the difference between uppercase and lowercase versions of the same anchor text... For example, for Linkscape "Chaussure" and "chaussure" are not the same anchor text... To me this level of detail is not necessary; it's better to have the number of links for [chaussure+Chaussure]
Anyway OSE and Linkscape are my favorite links tools
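The case-merging idea in the comment above can be approximated on the user's side once anchor counts have been exported. This is a minimal sketch, not a feature of OSE or Linkscape. The function name and the sample counts are made up for illustration.

```python
from collections import Counter

def merge_anchor_counts(anchor_counts: dict[str, int]) -> Counter:
    """Combine link counts for anchors that differ only by letter case."""
    merged = Counter()
    for anchor, count in anchor_counts.items():
        # casefold() is a more aggressive lower() intended for caseless matching.
        merged[anchor.strip().casefold()] += count
    return merged

# Hypothetical export: anchor text mapped to number of links.
sample = {"Chaussure": 12, "chaussure": 30, "CHAUSSURE": 3, "bottes": 5}
merged = merge_anchor_counts(sample)

for anchor, count in merged.most_common():
    print(f"{anchor}: {count}")
```

Note that this only folds case. Accented and unaccented spellings would still be counted as separate anchors.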
Totally hear you.
Index size is an issue and right now, crawl selection biases toward English-language results. In March or April, we'll be releasing our first version of a much larger web crawl (could be later if testing reveals flaws), and you should see substantially more coverage across International sites/domains.
Interesting to see the decline in the use of nofollow on internal pages. Do you think this was in response to insights released by the engines around PageRank sculpting?
You deserve a TechCrunch award! I'll try and vote as often as possible before Christmas.
thanks Rand for the quick update. See you Tuesday :)
Looks like we will have weekly updates soon :-)