If you've been following my posts on Linkscape's index, you know that we've been aiming for fresher, better, and larger indices over the past few months, but have run into some very tough challenges. It turns out that indexing the web, canonicalizing millions of pages, and calculating a link graph with quality metrics is super-hard; who knew? :-)
As part of those efforts, we've been working toward an experimental index that leverages a more search-engine-style crawler: one that crawls fresh pages/sites more often and stale stuff less frequently. That index, however, is taking its sweet time (and we're doing a lot of babysitting and monitoring to make sure it's smooth). Our tentative plan is to launch that index in the next 2 weeks, but since our last index was at the very end of November, we felt a new one with fresher data was warranted. Hence, last night, we launched an interim index with the following metrics:
- 36,660,519,013 (36 billion) URLs
- 427,626,242 (427 million) Subdomains
- 128,149,029 (128 million) Root Domains
- 387,656,119,262 (387 billion) Links
Followed vs. Nofollowed
- 2.05% of all links found were nofollowed
- 55.00% of nofollowed links are internal, 45.00% are external
- Rel Canonical - 10.57% of all pages now employ a rel=canonical tag
The average page has 69.12 links on it (a negligible change from the last index):
- 57.76 internal links on average
- 11.36 external links on average
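For the technically curious, the scheduling idea behind that experimental crawler (hit pages that change often more frequently, back off on pages that don't) can be sketched as a priority queue keyed by each URL's next due time. This is purely an illustrative sketch of the general technique, not Linkscape's actual implementation; all names and intervals here are made up:

```python
import heapq
import time

class RecrawlScheduler:
    """Toy adaptive recrawl scheduler (illustrative only): URLs whose
    content changed get recrawled sooner; stable URLs drift toward
    longer intervals."""

    MIN_INTERVAL = 60 * 60            # 1 hour floor (made-up bound)
    MAX_INTERVAL = 60 * 60 * 24 * 30  # 30 day ceiling (made-up bound)

    def __init__(self):
        self._queue = []  # heap of (next_due_time, url, interval)

    def schedule(self, url, interval):
        heapq.heappush(self._queue, (time.time() + interval, url, interval))

    def pop_due(self):
        """Return (url, interval) for the most overdue URL, or None."""
        if self._queue and self._queue[0][0] <= time.time():
            _, url, interval = heapq.heappop(self._queue)
            return url, interval
        return None

    def record_crawl(self, url, interval, changed):
        # Multiplicative update: halve the wait when the page changed,
        # double it when it did not, clamped to the bounds above.
        interval = interval / 2 if changed else interval * 2
        interval = max(self.MIN_INTERVAL, min(self.MAX_INTERVAL, interval))
        self.schedule(url, interval)
```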
This index is smaller than our last few, but the numbers look reasonably solid and the data's from the first few weeks of December, so it should be helpful to all you link builders and analyzers. Do be aware, though, that this update is likely to last only a couple of weeks before we replace it with our new version, for which we have high expectations (but we don't want to promise the moon just yet).
Also noteworthy - last night, when the index first launched, we experienced some wackiness with Page and Domain Authority scores. Those should have largely settled down to normalcy now, but if you see anything odd, please let us know.
Wait, so tracking & analyzing trillions of pages & links is difficult? ;)
Yesterday morning was fun; I spent an hour writing a panicky email to clients warning them that their huge declines in Domain Authority were likely a bug, and that the fact that Google, Facebook, Amazon, and a number of other big players had all dropped from 100 to the mid-80s might be indicative of something squiffy!
EDIT: Rand, you didn't just get PA and DA the wrong way around, like with the data on the Agency pricing infographic, did you? :P
Ha! Thankfully, we have the real geniuses working on that stuff :-)
"we experienced some wackiness with Page and Domain Authority scores"
Yesterday I was quite surprised to see a PA of 100 for one of our clients' homepages. :-) This morning I checked again and it had gone back to 83 (which is normal). It would have been a nice change to start 2012 with. :-D
Good work! Thanks for always working to improve the tools for all of us!
Quick question Rand...
You mention an additional update to follow soon; is that expected before the one on the calendar here: https://seomoz.zendesk.com/entries/345964-linkscape-update-schedule - 1st Feb?
Cheers :)
Mike.
Our Domain Authority dropped by 1 point overnight, am I doing something wrong?
Almost certainly not :-) When the index rolled out, we did some re-factoring of DA/PA, which is likely the cause of the change. Even if your DA went down from a previous index, you should compare against your competitors, as DA/PA are scaled to fit the whole web's metrics, and thus link "inflation" or refitting of the algo against Google's rankings can change things from index to index.
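To make the "scaled to fit the whole web" point concrete, here's a toy illustration (not our actual model!) of how a percentile-style rescaling can lower a site's score even when its own links haven't changed:

```python
from bisect import bisect_left

def scaled_score(raw_metric, web_sample):
    """Toy 0-100 score: a URL's rank within a sample of the whole
    web's raw link metrics. Purely illustrative, not the DA/PA model."""
    sample = sorted(web_sample)
    return 100.0 * bisect_left(sample, raw_metric) / len(sample)

# The site's raw metric (900) is unchanged between indices, but the
# rest of the web gained links, so its scaled score still drops.
old_index_sample = [10, 50, 200, 400, 4000]
new_index_sample = [20, 100, 950, 1200, 7000]
print(scaled_score(900, old_index_sample))  # 80.0
print(scaled_score(900, new_index_sample))  # 40.0
```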
We appreciate this last update before you implement the new index. It certainly will make the transition to the new index much smoother.
Thanks again; we are awaiting the new implementation with a lot of anticipation. Bring it on.
I'm puzzled as to why Linkscape says we have 14 backlinks while Google Webmaster Tools says we have over 500.
Moz crawls between 30% and 60% of what Google does at any given time (this index is smaller, so it's likely closer to that 30% figure). With the next index, we hope to be more like 60%. That said, we do tend to crawl high-quality stuff very consistently. The other explanation is that you acquired those links in the last couple of weeks, while this past index was processing (after the crawl portion stopped). I'd look at the new index in a couple of weeks to see if we catch more of the links you see in GWMT.
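As a rough back-of-the-envelope on those coverage figures (made-up inputs, just to show the reasoning): even at the low end of our crawl coverage you'd expect to see far more than 14 of those 500 links, which is why the recency of the links matters as much as index size:

```python
gwt_links = 500   # links reported in Google Webmaster Tools
coverage = 0.30   # low-end share of Google's index we crawl
print(gwt_links * coverage)  # 150.0 -- well above the 14 observed,
# so recently acquired links (crawled after our cutoff) likely
# account for most of the remaining gap
```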
Hi Rand,
There is also an issue with your API not returning Page & Domain Authority. I checked again today and it's still the same.
For a couple of weeks now, I have not been getting the Page Authority (upa) and Domain Authority (pda) values from the lsapi.seomoz.com/linkscape/url-metrics/... API call.
They are not returned broken or anything... they are not returned at all.
Thanks
Hey Ravang,
Thanks for letting us know! I haven't heard of users having any issues, but I'll have the team check it out. It would be awesome if you could send me the URL you are querying the API with - feel free to send over to [email protected] and I can track down what is going on!
Thanks!
Carin
Hi Carin,
I sent you the email with all the details. This is very frustrating because we've had a ticket open for a week now and there's been no complete answer, and our app is having issues because of this bug on your side.
I hope I'll get a quick response and a solution to this.
Thanks
Hey there!
Just wanted to add a quick response in case any other users are seeing this issue. The metrics can be queried directly - here is a link to a forum post Ravang was able to dig up:
https://seomoz.zendesk.com/entries/20129156-return-data-is-missing-fields
The engineers are confirming whether or not this metric is included in the default returned metrics for the URLMetrics call.
Thanks!
Carin
Hey Ravang,
Just about to respond to your email, but thought I would also post here to clarify for any other curious readers :)
I think there is a bit of confusion regarding automatically returned metrics for URLMetrics. In our API Wiki, the URLMetrics page lays out a table of the free API fields. Those are all returned automatically except Page Authority and Domain Authority. Page Authority was missing from this documentation, so I just added it for clarification.
From your email, it looks like you were hoping to return PA, DA, and all external links (page to page). Unfortunately, these need to be specifically requested in the Cols parameter, as PA and DA are not included in the default metrics, and all external links (page to page) is a paid Site Intelligence API call.
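For anyone who wants a concrete starting point, here is a rough Python sketch of requesting PA and DA explicitly via Cols, using the signed-authentication scheme from the API Wiki. The bit-flag constants below are as I recall them from the wiki, so please verify them there, and the credentials are placeholders:

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote

ACCESS_ID = "member-xxxxxxxxxx"   # placeholder -- use your own
SECRET_KEY = "your-secret-key"    # placeholder -- use your own

# Cols bit flags for url-metrics (verify against the API Wiki):
PAGE_AUTHORITY = 34359738368      # upa
DOMAIN_AUTHORITY = 68719476736    # pda

def url_metrics_request(target_url):
    """Build a signed url-metrics URL that explicitly requests PA and
    DA, since they are not part of the default returned metrics."""
    expires = int(time.time()) + 300
    to_sign = "%s\n%d" % (ACCESS_ID, expires)
    signature = base64.b64encode(
        hmac.new(SECRET_KEY.encode(), to_sign.encode(), hashlib.sha1).digest()
    ).decode()
    return (
        "http://lsapi.seomoz.com/linkscape/url-metrics/%s"
        "?Cols=%d&AccessID=%s&Expires=%d&Signature=%s"
        % (
            quote(target_url, safe=""),
            PAGE_AUTHORITY + DOMAIN_AUTHORITY,
            ACCESS_ID,
            expires,
            quote(signature, safe=""),
        )
    )

print(url_metrics_request("www.seomoz.org"))
```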
I hope this helps!
Carin
Very worried when we checked our DA scores last week and saw drops across the board; very relieved when the scores bounced back a little over an hour later. Looking forward to the next index update to see the results from the more search-engine-oriented crawl style.
Common problem. Even my client's sites had that problem.
Question... why the long time between updates? Another service I use updates its index in real time, or close to it.
Thanks for the update. I was doing a report of inbound links for a client and was seeing some exciting numbers. It's less exciting now, but either way the client will be thrilled to see the progress, even if it's not as epic as it appeared the other day :)
Hey Guys :)
Any reply to my previous question? - About 3 questions up.
Thanks,
Mike.
Hey Mike!
We are hoping to have an index ready before 2/1, but at the very least, we will definitely have the regularly scheduled 4 week update. We are working on some good stuff and, if all goes well, we'll be able to launch in a few weeks when it's ready!
Thanks!
Carin
Hi Carin, okay cool... thanks for your reply!
Looking forward to seeing what magic you guys are going to cook up next!
Mike.
Nice job there :) Keep these coming!
Thanks for the update, we too noticed a few pages yesterday with page authorities of 100. Glad to hear you guys are on it. Thanks for all the work you guys do. We really appreciate it.
Sweet, thanks for the update.
I didn't expect to see that only 10.57% of pages employ rel=canonical, as it can fix a whole handful of problems.
This is a very nice and informative update, and the author explains everything very clearly. Thanks for sharing.
Yay fresh data! 2 quick questions:
Could it be that there are no seomoz links in the current update? And whose idea was it to link to the cute puppy dog when something doesn't work? ;-)
Just spent a couple of hours going through this - it looks like we have the biggest mismatch between OSE and Webmaster Tools data that I've seen up until now, although it does seem to be smaller, more local (i.e. UK) links that are missing.
Looking forward to the next update - sounds like that might be the key to showing older links consistently while still being able to crawl an increased number of fresh pages. Good luck! :)
Thanks for the heads up - we'll watch index metrics carefully to make sure we're getting a good distribution of regions in the upcoming index, too. Cheers
Wow, that's 3,141,980 new domains in 2 months! I hope there's a way to segment the index to show only sites relevant to the projects we're working on.
I saw one company's total link count swing from the 30,000s down to the 700s. I don't know if this is due to a poor set of links or the index adjustments. Is anyone else seeing this kind of gap?
That's pretty extreme... Can you email [email protected] and we'll forward it on to the Linkscape team to have a peek? The next index may help with that count, but 30K to 700 is too massive to be explained purely by the slightly smaller index.
As long as you keep expanding and improving the international domains, i.e. AU/UK, I'm happy :)
Yeah - we've been focusing on crawling more and more domains (even if it's only a few pages from very tiny/not-so-well-linked-to sites), as we want to make sure that we're at least including every site possible. That should mean we're doing better and better on the international front, but if there are sites you're not seeing, especially large numbers in a region, please do let us know :-)
Ah yes, appreciated, but it would be great to have a crowd-voting-style list where any SEOmoz member could submit domains in bulk for the crawler to consider... the more times a domain is submitted, the more importance or interest the members have in it :)
I have noticed that the social elements (Tweets/Likes) are massively underquoted; for example, https://lostpr.es/
You mean in OSE, right? Actually, they're not underquoted - we're using what we think are the right numbers out of their APIs, it's just that Twitter/Facebook often show you total counts or combined counts in their own stuff (e.g. FB will say SEOmoz the brand or SEOmoz the website has 50K likes, but the URL "www.seomoz.org" only has 96 actual likes according to FB's API).
This is a tough one, because it makes our numbers look wrong, when in fact, I think it's just that FB/Twitter and some other third parties will aggregate numbers or give counts on something other than just the URL.
Suggestions on how to present this data honestly and effectively are definitely appreciated :-)
p.s. We also only grab social data once per 24 hour period, so sometimes, if a URL is being tweeted like mad, we show older data (in order to be performant, we need to cache things).
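In case it helps to see the idea concretely, that once-per-24-hours behavior is essentially a TTL cache sitting in front of the social APIs. A minimal sketch of the general technique (not our production code):

```python
import time

CACHE_TTL = 24 * 60 * 60   # refresh social counts at most once per day
_cache = {}                # url -> (fetched_at, counts)

def social_counts(url, fetch):
    """Return cached counts for url, calling fetch(url) against the
    social APIs only when the cached entry is older than CACHE_TTL."""
    now = time.time()
    entry = _cache.get(url)
    if entry and now - entry[0] < CACHE_TTL:
        # Fresh enough; a URL being tweeted like mad may look stale
        # for up to a day, which is the trade-off mentioned above.
        return entry[1]
    counts = fetch(url)
    _cache[url] = (now, counts)
    return counts
```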
Ah yes, sorry, I mostly just use OSE these days. I think you're right, but how it attributes those likes is probably the bigger issue. I know there are a few solutions, such as https://sharedcount.com/dashboard.php?urls=,https://lostpr.es, which seems to be a bit more accurate and does multiple URLs, i.e. "top pages".
It has an API and all... maybe a simple plug and play?
You're digging so deep into the web! :o I just want to meet your system admin... ;)