Given this blog's readership, chances are good you will spend some time this week looking at backlinks in one of the growing number of link data tools. We know backlinks continue to be one of the most important parts of Google's ranking algorithm, if not the most important. We tend to take these link data sets at face value, though, in part because they are all we have. But when your rankings are on the line, is there a better way to determine which data set is best? How should we go about assessing link indexes like Moz, Majestic, Ahrefs, and SEMrush for quality? Historically, there have been four common approaches to this question of index quality...
- Breadth: We might choose to look at the number of linking root domains any given service reports. We know that referring domains correlates strongly with search rankings, so it makes sense to judge a link index by how many unique domains it has discovered and indexed.
- Depth: We also might choose to look at how deep the web has been crawled, looking more at the total number of URLs in the index, rather than the diversity of referring domains.
- Link Overlap: A more sophisticated approach might count the number of links an index has in common with Google Webmaster Tools.
- Freshness: Finally, we might choose to look at the freshness of the index. What percentage of links in the index are still live?
There are a number of really good studies (some newer than others) using these techniques that are worth checking out when you get a chance:
- BuiltVisible analysis of Moz, Majestic, GWT, Ahrefs and Search Metrics
- SEOBook comparison of Moz, Majestic, Ahrefs, and Ayima
- MatthewWoodward study of Ahrefs, Majestic, Moz, Raven and SEO Spyglass
- Marketing Signals analysis of Moz, Majestic, Ahrefs, and GWT
- RankAbove comparison of Moz, Majestic, Ahrefs and Link Research Tools
- StoneTemple study of Moz and Majestic
While these are all excellent at addressing the methodologies above, there is a particular limitation with all of them. They miss one of the most important metrics we need to determine the value of a link index: proportional representation to Google's link graph. So here at Angular Marketing, we decided to take a closer look.
Proportional representation to Google Search Console data
So, why is it important to determine proportional representation? Many of the most important and valued metrics we use are built on proportional models. PageRank, MozRank, CitationFlow and Ahrefs Rank are proportional in nature. The score of any one URL in the data set is relative to the other URLs in the data set. If the data set is biased, the results are biased.
A Visualization
Link graphs are biased by their crawl prioritization. Because there is no full representation of the Internet, every link graph, even Google's, is a biased sample of the web. Imagine for a second that the picture below is of the web. Each dot represents a page on the Internet, and the dots surrounded by green represent a fictitious index by Google of certain sections of the web.
Of course, Google isn't the only organization that crawls the web. Other organizations like Moz, Majestic, Ahrefs, and SEMrush have their own crawl prioritizations which result in different link indexes.
In the example above, you can see different link providers trying to index the web like Google. Link data provider 1 (purple) does a good job of building a model that is similar to Google. It isn't very big, but it is proportional. Link data provider 2 (blue) has a much larger index, and likely has more links in common with Google than link data provider 1, but it is highly disproportional. So, how would we go about measuring this proportionality? And which data set is the most proportional to Google?
Methodology
The first step is to determine a measurement of relativity for analysis. Google doesn't give us very much information about their link graph. All we have is what is in Google Search Console. The best source we can use is referring domain counts. In particular, we want to look at what we call referring domain link pairs. A referring domain link pair would be something like ask.com->mlb.com: 9,444 which means that ask.com links to mlb.com 9,444 times.
Steps
- Determine the root linking domain pairs and values to 100+ sites in Google Search Console
- Determine the same for Ahrefs, Moz, Majestic Fresh, Majestic Historic, SEMrush
- Compare the referring domain link pairs of each data set to Google, assuming a Poisson distribution (a simplified sketch of this comparison follows the list)
- Run simulations of each data set's performance against the others (e.g., Moz vs. Majestic, Ahrefs vs. SEMrush, Moz vs. SEMrush, and so on)
- Analyze the results
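To make the comparison concrete, here is a minimal Python sketch of the proportional comparison. The actual analysis assumed a Poisson distribution and ran head-to-head simulations rather than the simple absolute-difference score below, and every domain and count except the ask.com->mlb.com figure is invented for illustration.

```python
from collections import Counter

def disparity(google_pairs, index_pairs):
    """Sum of absolute differences between an index's proportional share
    of referring-domain link pairs and Google's share for the same site."""
    g_total = sum(google_pairs.values()) or 1
    i_total = sum(index_pairs.values()) or 1
    domains = set(google_pairs) | set(index_pairs)
    return sum(abs(google_pairs.get(d, 0) / g_total -
                   index_pairs.get(d, 0) / i_total)
               for d in domains)

# Hypothetical referring-domain link pairs for mlb.com (domain -> link count).
google  = Counter({"ask.com": 9444, "example-blog.com": 120, "example-news.com": 40})
index_a = Counter({"ask.com": 8000, "example-blog.com": 100, "example-news.com": 30})
index_b = Counter({"ask.com": 50000, "example-blog.com": 90})

# The index with the smaller disparity is more proportionally representative.
print(disparity(google, index_a), disparity(google, index_b))
```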
Results
When placed head-to-head, there seem to be some clear winners at first glance. In head-to-head, Moz edges out Ahrefs, but across the board, Moz and Ahrefs fare quite evenly. Moz, Ahrefs and SEMrush seem to be far better than Majestic Fresh and Majestic Historic. Is that really the case? And why?
It turns out there is an inversely proportional relationship between index size and proportional relevancy. This might seem counterintuitive: shouldn't the bigger indexes be closer to Google? Not exactly.
What does this mean?
Each organization has to create a crawl prioritization strategy. When you discover millions of links, you have to prioritize which ones you might crawl next. Google has a crawl prioritization strategy, and so do Moz, Majestic, Ahrefs, and SEMrush. There are lots of different things you might choose to prioritize...
- You might prioritize link discovery. If you want to build a very large index, you could prioritize crawling pages on sites that have historically provided new links.
- You might prioritize content uniqueness. If you want to build a search engine, you might prioritize finding pages that are unlike any you have seen before. You could choose to crawl domains that historically provide unique data and little duplicate content.
- You might prioritize content freshness. If you want to keep your search engine recent, you might prioritize crawling pages that change frequently.
- You might prioritize content value, crawling the most important URLs first based on the number of inbound links to that page. (A sketch of a blended priority queue follows this list.)
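None of these providers publishes its actual prioritization rules, so purely as an illustration, here is a sketch of a crawl frontier that blends such signals into one score. Every weight, field name, and URL below is hypothetical.

```python
import heapq

def crawl_priority(signals, weights=(0.4, 0.3, 0.2, 0.1)):
    """Blend hypothetical, pre-normalized signals (0..1) into one score:
    historical link discovery, content uniqueness, freshness, link value."""
    w_discovery, w_unique, w_fresh, w_value = weights
    return (w_discovery * signals["new_links_found"]
            + w_unique * signals["uniqueness"]
            + w_fresh * signals["change_frequency"]
            + w_value * signals["inbound_links"])

frontier = []  # max-priority queue implemented with negated scores

def schedule(url, signals):
    heapq.heappush(frontier, (-crawl_priority(signals), url))

def next_url():
    return heapq.heappop(frontier)[1] if frontier else None

schedule("https://example.com/new-page",
         {"new_links_found": 0.9, "uniqueness": 0.4,
          "change_frequency": 0.7, "inbound_links": 0.2})
print(next_url())
```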
Chances are, an organization's crawl priority will blend some of these features, but it's difficult to design one exactly like Google. Imagine for a moment that instead of crawling the web, you want to climb a tree. You have to come up with a tree climbing strategy.
- You decide to climb the longest branch you see at each intersection.
- One friend of yours decides to climb the first new branch he reaches, regardless of how long it is.
- Your other friend decides to climb the first new branch she reaches only if she sees another branch coming off of it.
Despite having different climb strategies, everyone chooses the same first branch, and everyone chooses the same second branch. There are only so many different options early on.
But as the climbers go further and further along, their choices eventually produce differing results. This is exactly the same for web crawlers like Google, Moz, Majestic, Ahrefs and SEMrush. The bigger the crawl, the more the crawl prioritization will cause disparities. This is not a deficiency; this is just the nature of the beast. However, we aren't completely lost. Once we know how index size is related to disparity, we can make some inferences about how similar a crawl priority may be to Google.
Unfortunately, we have to be careful in our conclusions. We only have a few data points with which to work, so it is very difficult to be certain regarding this part of the analysis. In particular, it seems strange that Majestic would get better relative to its index size as it grows, unless Google holds on to old data (which might be an important discovery in and of itself). Most likely, we simply can't draw conclusions at that level yet.
So what do we do?
Let's say you have a list of domains or URLs for which you would like to know their relative values. Your process might look something like this (a small sketch of the decision order follows the list)...
- Check Open Site Explorer to see if all URLs are in their index. If so, you are looking at metrics most likely to be proportional to Google's link graph.
- If any of the links do not occur in the index, move to Ahrefs and use their Ahrefs ranking if all you need is a single PageRank-like metric.
- If any of the links are missing from Ahrefs's index, or you need something related to trust, move on to Majestic Fresh.
- Finally, use Majestic Historic for (by leaps and bounds) the largest coverage available.
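That order of preference is simple enough to express as a small helper function. The sketch below only illustrates the decision order above; the coverage data is invented, and in practice you would check each provider's index through its own tools or API.

```python
def pick_index(urls, coverage):
    """Return the preferred index: the most proportional one that
    contains every URL, falling back to the broadest one."""
    preference = ["Moz", "Ahrefs", "Majestic Fresh", "Majestic Historic"]
    for index in preference:
        if set(urls) <= coverage.get(index, set()):
            return index
    return "Majestic Historic"  # broadest coverage as the final fallback

# Hypothetical coverage sets; in practice you would query each provider.
coverage = {
    "Moz": {"a.com/page"},
    "Ahrefs": {"a.com/page", "b.com/post"},
    "Majestic Fresh": {"a.com/page", "b.com/post", "c.com/article"},
}
print(pick_index(["a.com/page", "b.com/post"], coverage))  # -> Ahrefs
```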
It is important to point out that the likelihood that all the URLs you want to check are in a single index increases as the accuracy of the metric decreases. Considering the size of Majestic's data, you can't ignore them because you are less likely to get null value answers from their data than the others. If anything rings true, it is that once again it makes sense to get data from as many sources as possible. You won't get the most proportional data without Moz, the broadest data without Majestic, or everything in-between without Ahrefs.
What about SEMrush? They are making progress, but they don't publish any relative statistics that would be useful in this particular case. Maybe we can hope to see more from them soon given their already promising index!
Recommendations for the link graphing industry
All we hear about these days is big data; we almost never hear about good data. I know that the teams at Moz, Majestic, Ahrefs, SEMrush and others are interested in mimicking Google, but I would love to see some organization stand up against the allure of more data in favor of better data—data more like Google's. It could begin with testing various crawl strategies to see if they produce a result more similar to that of data shared in Google Search Console. Having the most Google-like data is certainly a crown worth winning.
Credits
Thanks to Diana Carter at Angular for assistance with data acquisition and Andrew Cron with statistical analysis. Thanks also to the representatives from Moz, Majestic, Ahrefs, and SEMrush for answering questions about their indices.
Hi Russ,
OK - you outsmarted me. I massively respect your ability to analyse data... so please be polite if my comments show a slight misunderstanding of the post... Hopefully I have some insight to add though.
Majestic doesn't use any of those four methodologies for assessing which links to follow next per se, and I do think our methodology (Trust Flow) is absolutely about quality over quantity. One link can have a huge impact under Trust Flow logic and we can then use Flow Metrics to determine which pages to crawl more often than others. (That is not our only criterion, but I have some secret source to protect of course.)
It is perhaps because of this approach that we may not pick up more "link pairs" as you describe, because we know that picking up more is not as effective as spending the same crawl resource analysing other stuff. As you suggest, it is not just a numbers game. A dumb crawler (OK, all crawlers are dumb) can easily get fixated on link pairs and start seeing 1,000, 2,000, 10,000 link pairs between two sites, but after enough to say "that's site wide" I think all the insight has been gleaned that will be gleaned.
That means Majestic generally crawls more intelligently by being able to know what is important and what is not. That is certainly not to say we don't want to also have the largest data set... we do want to... and we are not doing so badly. In the example you gave of mlb.com, the absolute link counts cited to mlb.com in Moz, Ahrefs and Majestic are:
Majestic: Links 47,869,070 from 124,299 referring domains.
Ahrefs: Links 27,000,000 from 114,000 referring domains.
Moz: Links: 92,105 (established) links from 1,167 referring domains.
So if Majestic loses on the link pairs you tested, the elephant in the room is... how come Majestic has more links? I am hopeful that we know better than most when to stop following a branch in the crawl and move on to new ground. Here I totally agree that the crawl priority is the key.
From what I understand, Moz also has a way to prioritize that is not based on quantity. I certainly admire their metrics, in spite of the size differential. I do not think we would be able to mimic their methodology even if we wanted to, in part because we do not do rank checking at all. Now that Ahrefs has started rank checking again, that may help them improve their metrics as well - but I remain fiercely proud of Trust Flow as something truly independent and unique right now.
Our objective is not to create the same index as Google's, but to produce a GOOD index that has insightful data. This is increasingly Trust Flow and Topical Trust Flow as better indicators of quality than the link counts or pairs themselves.
Dixon (Majestic).
Dixon, thanks for the response! I will do my best to answer in-line so others can follow. I think it is very important people understand the limited scope of this analysis and its very specific recommendations.
Quote: Majestic doesn't use any of those four methodologies for assessing which links to follow next per se, and I do think our methodology (Trust Flow) is absolutely about quality over quantity...
The examples I provided were simply for illustration. I imagine Majestic, Moz, Ahrefs and SEMrush have far more sophisticated crawl prioritizations than those discussed. However, I think you and I would agree that none of these indices have an identical crawl strategy to that of Google. Over time, the larger the index, the more the crawl prioritization differences will produce different results. It isn't an indictment of the methods, or the index, or really of the quality metrics - it is specific in answering the question of which relativistic metric, if available for a URL, is most similar to that produced by Google. It is important to remember that for huge swaths of URLs, Moz, Ahrefs and SEMrush will have 0 scores, which makes them undifferentiated when doing analysis, making Majestic the right tool the majority of the time, just not the first one to check!
Quote: So if Majestic loses on the link pairs you tested, the elephant in the room is... how come Majestic has more links?
Perhaps I didn't explain well enough in the article how the link pairs matter in the analysis. Let's say, for example, that I only analyzed 1 site, joe.com. According to Google Search Console, joe.com has 3 referring domains. We will call them JohnA.com, JohnB.com and JohnC.com, each with 100, 200, and 300 inbound links respectively back to joe.com (a ratio of 1->2->3).
Now, we line up Moz and Majestic against the Google Search Console data. Moz misses JohnA, but gets JohnB and JohnC because it has a smaller crawl. Moz ends up with 0 for JohnA, 100 for JohnB and 200 for JohnC (a ratio of 0->1->2). Majestic finds all 3, but they find 150 for JohnA, 250 for JohnB, 300 for JohnC, and on top of that, they found a JohnD and a JohnE with 10 and 30 links respectively (a ratio of 1.5->2.5->3->0.1->0.3). Moz's list is more proportionally representative, even though they clearly don't have as much data as Majestic. Majestic even hit one of the domain pairs right on the head (300 for JohnC), but that doesn't mean the index as a whole is proportional.
The only case where this matters, though, is in the creation of relativistic metrics. Moz's crawl size is a small fraction of Majestic's, so the majority of the time you have to use Majestic. But, if the handful of URLs you have access to are all in Moz's data set, you should rely on PA or DA.
Of course, all of this also assumes that the calculations of the relativistic metrics are themselves similar to the methods used by Google. Maybe Google has wholly abandoned the PageRank model; maybe they use something more like TrustFlow. But what does remain true is that the smaller the crawl, the more likely it is to randomly produce a more proportionally representative database.
Quote: From what I understand, Moz also has a way to prioritize that is not based on quantity...
On the contrary, Moz's performance may be primarily based on their index size. As I showed in the tree diagram, if you only make it up the tree 2 branches, you are likely to have very similar results regardless of the climb strategy you choose. It is possible that Ahrefs, Moz, Majestic and SEMrush have very, very similar crawl strategies, but that index size and random noise (like what links happened to be on the homepage of Reddit when you crawled them last) alone greatly distort the end product. Once again, this is not an indictment of Majestic at all.
Quote: but I remain fiercely proud of Trust Flow as something truly independent and unique right now.
As you should be. If you want to get a trust metric for a large set of URLs, Majestic is the only game in town. It is a great metric produced for a larger number of URLs than any other provider. I endorse it with a big subscription and API payment every month :-)
Quote: Our objective is not to create the same index as Google's. but to produce a GOOD index, that has insightful data. This is increasingly Trust Flow and Topical Trust Flow as better indicators of quality than the link counts or pairs themselves.
And this is certainly something you have done. Majestic is very good, always has been and always will be. I stand by my conclusion that every SEO should invest in all the major link indices, as they each offer many great insights.
AH! OK. So it is the overlap of absolute number of domains linking in Google's index vs the others that you are comparing. GOT IT now :) [It's the end of a hot day in England here!]
(Aside: But Google does not report all the referring domains. They limit their list to 1000 referring domains.)
There is another way to spin the debate of whether Big is a) Beautiful or b) hiding the wood for the trees - which all comes down to the quality metric. There was a study which used the pure (mathematical) PageRank algorithm but only on Wikipedia (because presumably the researcher didn't have enough Crays or Watson computers to use the whole web). They found that Carl Linnaeus (yeh, who?) was more famous than Jesus. Clearly in the case of the PageRank algo, having an index that was too small (just Wikipedia rather than the whole web) gives less accurate data than if it is larger. Both Majestic and Moz (hopefully correctly in most people's eyes) put the order the other way around.
So my point is... whilst size certainly can decrease correlation in your test, it does not have to decrease quality. If the quality metric on each individual URL crosses a quality Rubicon, then at that point, size can improve understanding. You are old enough to remember our old metric, AC rank. I think it is fair to say that AC Rank did not pass the quality Rubicon.
Quote: (Aside: But Google does not report all the referring domains. They limit their list to 1000 referring domains.)
Oops, I didn't mention this in the post. We only chose sites with fewer than 1000 referring domains in GSC so we knew it wasn't getting truncated!
Quote: They found that Carl Linnaeus (yeh who?) was more famous than Jesus.
This is absolutely true, but if Google screws up and thinks Carl Linnaeus is more important than Jesus, then I want a link from Carl Linnaeus before I want one from Jesus. #goingtohellforthatone
It is entirely possible that Majestic could build a better, more accurate link graph than Google, in relation to the web as a whole, but that is a different question than building one comparable to Google.
Quote: but if Google screws up and thinks Carl Linnaeus is more important than Jesus,
No - Google didn't screw up... Stanford University's PageRank algorithm made the unlikely conclusion when ONLY used on Wikipedia pages. It would get it right if it crawled the whole web. That's my point. The (original) PageRank maths NEEDS breadth to work.
Quote: No - Google didn't screw up... Stanford University's PageRank algorithm
I understand this. But if Moz's corpus is more proportionally representative to Google's corpus than is Majestic's, it is likely to produce more similar results. It is more likely to reach the same correct and incorrect conclusions as Google. No one has a perfect index of the web, and if you want to produce a metric that predicts how Google will judge a particular URL, then your best bet is to start with a data set as proportionally relative to Google's as possible.
"All we hear about these days is big data; we almost never hear about good data."
Amen! That is probably the biggest problem in SEO right now.
Thank you for doing this analysis, it is really interesting.
But I have a question for you:
"Compare the referring domain link pairs of each data set to Google" - am I assuming correctly that you measured the total overlap? So if an Index has a linkpair that is not in the search console that would be a demotion for that index?
In that case, any bigger index must statistically have a lower relevancy. This is inherent, as Search Console only gives us a control sample of the known links. So the link pairs are, by their very nature, limited. Any index bigger than the number of link pairs in Search Console would be demoted. This is an inherent problem with these kinds of datasets.
I have a different idea, but I do not know if it is feasible. If you could check, for every link from the index, whether the linking page (the "from" page) is in the Google Cache, it would show us that Google knows that link (or not, if it isn't cached). Of course, this would still not give us any information regarding whether and how Google counts these links.
Anyway, it is an interesting thought experiment: Is the proportional representation to Google Search Console data more important than sheer size?
Quote: In that case, any bigger index must statistically have a lower relevancy. This is inherent, as Search Console only gives us a control sample of the known links. So the link pairs are, by their very nature, limited. Any index bigger than the number of link pairs in Search Console would be demoted. This is an inherent problem with these kinds of datasets.
Thank you for your thoughtful critique. This was certainly something I considered and attempted to address.
First, I wanted to make sure that Google's sample data provided via GSC was sufficiently large to indicate only a moderate amount of sampling. We know that Google representatives have said that everything you need to do a link cleanup is available in GSC, which indicates that the sampling can't be so stark as to miss the bad links which might be causing penalties. However, I wanted to take it a step further to be careful. Using the domain data that we did have, I was able to infer that the link graph represented by GSC is roughly 800,000,000,000 URLs in size. This puts it on par with Majestic Fresh, the largest fresh link index. If we were looking at a much smaller sample from GSC, let's say the size of SEMrush or Moz, we would have more to worry about.
Second, the technique I used pitted one link index against another to determine the number of "wins". A win occurs when one index's link disparity for a site is smaller than the other's. This created a Price-is-Right style result where indices that had just 1 more link than the other could win. It also created an interesting scenario where small indices, like SEMrush, would nearly always have no links for a domain pair when the other index did not have the GSC-reported domain pair either. So, while SEMrush might have won a lot of head-to-heads, their cumulative wins were much lower because they tied so often.
Finally, when a link index found a link that GSC did not, we assumed that there was at least 1 real link and that it was a failure on Google's part. This helped remove some of the bias as well.
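Roughly speaking, the win counting can be sketched like the code below, although the real disparity measure was based on the Poisson model described in the post rather than the simple absolute difference used here, and the example site and counts are invented.

```python
def disparity(google_pairs, index_pairs):
    """Crude stand-in for the Poisson-based comparison: summed absolute
    difference in referring-domain link-pair counts for one site."""
    domains = set(google_pairs) | set(index_pairs)
    return sum(abs(google_pairs.get(d, 0) - index_pairs.get(d, 0))
               for d in domains)

def head_to_head(google_by_site, index_a_by_site, index_b_by_site):
    """Count per-site wins: the index whose pair counts sit closer to
    Google's wins that site; equal disparities count as a tie."""
    wins = {"A": 0, "B": 0, "tie": 0}
    for site, google_pairs in google_by_site.items():
        d_a = disparity(google_pairs, index_a_by_site.get(site, {}))
        d_b = disparity(google_pairs, index_b_by_site.get(site, {}))
        if d_a < d_b:
            wins["A"] += 1
        elif d_b < d_a:
            wins["B"] += 1
        else:
            wins["tie"] += 1
    return wins

google = {"joe.example": {"a.com": 10, "b.com": 5}}
print(head_to_head(google,
                   {"joe.example": {"a.com": 9, "b.com": 5}},
                   {"joe.example": {"a.com": 30}}))  # -> {'A': 1, 'B': 0, 'tie': 0}
```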
I think I did a reasonably good job of addressing this particular issue, but I won't pretend it was perfect.
As for your alternate experiment, you read my mind. I already began the process here with this report on the age of Google Search Console data and its cache status.
This issue also ties into Google's new update, the "Quality Update." In the end, all we need is to earn natural, organic links instead of buying thousands of risky links. After reading the post, what we understand is that the fashion of buying thousands of messy links is a thing of the past.
Thanks for the post.
Hi Russ,
Good point about the quantity vs. quality of link indexes. To be honest, I never thought about it that way. I assumed Google was at a point where they're capable of crawling all the important and the slightly less important parts of the web. If I check WMT for any client website, I see a lot of crawl activity and I can't imagine they miss a link on these websites. Of course Google won't be able to crawl everything, but every normal website does get crawled unless you really screw up.
Since WMT shows only a part of your backlinks, couldn't it be that link indexes are still so far behind on the number of pages that Google does crawl that investing in quantity would outweigh the need for quality at this point in time and space?
Besides that, I think a quality product would be a good fit for a specialist. I'm just afraid there will be a lot of folks that make their judgement (buying decision) based purely on the number of links an index returns.
Love to hear your view on this.
Certainly one of the assumptions I had to make was that GWT data was representative of Google's data. It is possible there is a bias in that data, although detecting such a bias would prove difficult.
Regarding crawl activity, Google expends far more resources revisiting pages than do the link indexes. Google may visit your homepage several times a day or week, while a link crawler may visit only once a month. Unless there are new links, that crawl activity from Google may go a long way toward keeping their search index fresh without impacting the link graph dramatically.
Such a great piece of link analysis, and a good look at how these crawlers index the web.
Hi Russ,
Thumbs up for all the work you’ve done! I can only imagine how much time and effort it took you to analyze data received from 5 separate tools and put together this in-depth comparison :)
And though I'm sorry not to see our backlink index, WebMeUp (https://webmeup.com), included in this comparison, our team would definitely like to make some further research based on your article and see if we could contribute to all the impressive work you’ve done. Though I do have a few questions regarding the methodology used and especially the resulting chart (just like all of the guys analyzed here - Moz, Majestic, Ahrefs and SEMrush :)), I totally understand how hard it is to put all your thoughts and statistical calculations into one single "edutaining" blog post – great work!
But to let our team see if we can use the same approach in our research, and to demonstrate its statistical validity etc, could you please share the set of initial data you used for the analysis - i.e. "the root linking domain pairs and values to 100+ sites" you took from Google Search Console? (hope this won't be a problem since these are supposed to be pure numbers that carry no commercial information)
Hi Aleh,
When I first began this analysis, I actually was using WebMeUp data, but the research was put on hold for a few months and subsequently we dropped our subscription during that time. More importantly, the primary reason proportional representation matters would be for the calculation of relativistic stats like MozRank or CitationFlow, which currently (AFAIK) WebMeUp does not do. That being said, we did include SEMRush simply because we are customers for other reasons and the data was readily available. I am interested in adding WebMeUp's data now though.
Unfortunately, I cannot share the actual domain pairs because that would reveal our client, customer, and internal properties list, as we were limited to domains for which we had access to Google Search Console. You are right, they don't carry commercial information, but they would disclose the companies that have worked with Angular over the years.
I will try and add WebMeUp to the analysis in the next few weeks.
Dear Russ,
Thanks for being interested - I’m always happy to hear from data scientists who’re eager to keep updating and expanding their research! I’ve just located your account with WebMeUp (registered under your Gmail email) and added a year of Enterprise plan to it - so that you can easily get all the data you need for the research. I’ll be thrilled to hear about the results you achieve with WebMeUp!
P.S.: Please PM me if you need API access to the system to automate the process - I’m sure we can arrange that as well.
Thanks again!
Well, Majestic Fresh surprised me... Did you use the largest type of report there?
Thanks for this comparison.
Majestic has the most aggressive crawler (which makes it an indispensable tool in my opinion, because it can help find links even before Google does), but that causes it to fall victim to the phenomenon I described above, where crawl depth exacerbates differences.
As for Majestic Historic, my guess is its size is having an undue influence over the regression model, but you can't really throw away outliers when you only have a few data points. If I log() the index size, it reduces that influence a bit, but it doesn't pull away from the regression line so much as to reconsider our conclusions.
One of the best blog posts I have read so far. You showed real depth on SEO and especially link building. Just building links to hundreds of sites does not help in SEO; you need quality as well. Thank you for writing this post and sharing.
Perfectly executed advanced SEO information! Well done sir.
Thanks!
Thank you, Russ, for the very interesting article!
I have some questions, though.
You state that Google and all other services cannot index the whole Web, so they index only a part of it, each service crawling a part different from the others, which results in a mismatch in relative authorities of pages and domains between Google and these services. Do I get it right?
If so, then I'll try to argue. True, it's impossible to index the whole Web - it is infinite, given dynamic content generated by dumb scripts. However, if we are looking at relative authority calculation algorithms, what seems more important is not the quantity (do we index all the content?) but the quality (do we index content that is significant in terms of our authority calculations?). Do you agree? Let's call such content 'important' (significantly affecting authority calculations).
If two services both manage to index a big part (say 80%) of important content (and given they both use the same authority calculation algo), then whatever mismatch in their content coverage won't cause a significant mismatch in calculated authorities.
Now, could Google cover 80% of important content? First, its crawling strategy must be based on that very authority; otherwise it's a waste of resources. Second, I'd guess that important content tends to live around 'something that has non-zero cost', and any authority algo should revolve around this concept (does it?). An additional page on a site costs 0 (that's why we can't index all the Web), but a paid domain costs real money, and so does an IP, and a subnet. I'd once again guess there are fewer than 1 billion paid domain names registered. Is it realistic that Google did cover them all (given its crawling strategy should strive to do so)? I feel it is. What do you think?
Now, if important content is all around 1 billion paid domains, it seems much more realistic that all link graphing services could cover them all. If so (would be cool to hear Aleh Barysevich and Dixon Jones on this), it would mean that any mismatch in authorities is caused by algo mismatch, rather than content coverage mismatch.
Love this discussion!
1. Yes, the biased samples of the web produced by incomplete, non-random crawls create disparities in relative authority metrics.
2. Unfortunately, the crawl determines the authority, not the other way around. While subsequent crawls may build off authority metrics from previous ones, they aren't given as initial conditions at the first crawl.
3. If 2 services came close to indexing the whole of the web, then yes, they would converge in our calculations as their samples of the web would grow more similar.
The problem seems to be this - you can't sample the web. You have to start somewhere and then make decisions on where to go from there. If everyone doesn't agree on where to start, and how to proceed from what you find, they will diverge over time until they begin to approach the full theoretical index size.
Now, you proposed an interesting point. What if we started off with all domains that are registered as an assumption of important content? While this method may produce more similar results if everyone had agreed to follow it, there would still be big disparities pretty quickly. For example, in what order do you crawl those? If it takes 3 days to crawl every domain's homepage, how many of them will have changed? Reddit? Slashdot? Every news outlet? What about random link blocks? Logged-in-now links on forum homepages? Or bigger yet, what about all the subdomains that are missed? espn.go.com? en.wikipedia.org?
So, I don't disagree with many of your thoughts, I think the current state of crawl-based-indexation still generally lends itself to my analysis that as indexes grow, they will likely deviate more from one another.
Hi Russ, I have to admit that I am pretty new to this aspect of big data analysis, but I think the idea of having search engine that crawls for things like content freshness, uniqueness and value is a great idea. Are there any resources you recommend to those of us who are trying hard to better understand Google's algorithm and how backlinks influence it? Also, in regards to Google's influence...Do you feel Google should have the influence it has right now in the market? I have spoken with a lot of people who are frustrated with Google's seeming monopoly. Although Google's talent with tackling data is impressive, many people simply feel that Google, especially the Google Search algorithm, has too much influence on the online successes and/or failures of businesses around the world. Do you believe that one company, without oversight, should be allowed to arbitrarily pick who is or isn't worth my time, your time or the time of Internet users everywhere?
Quote: Are there any resources you recommend to those of us who are trying hard to better understand Google's algorithm and how backlinks influence it?
Here is a good recent article from Moz on the relationship of backlinks and rankings. It is easiest to think of backlinks as votes, except some votes matter more than others because the voters are voting for one another too.
Quote: Also, in regards to Google's influence...Do you feel Google should have the influence it has right now in the market?
I think Google has undue influence over the search landscape right now, but I don't think they are exploiting it.
Quote: Do you believe that one company, without oversight, should be allowed to arbitrarily pick who is or isn't worth my time, your time or the time of Internet users everywhere?
I certainly think there should be oversight, especially from the FTC and FCC, but I do think it is reasonable that a search engine can produce results that they choose. As consumers, we need to do a better job of shopping around our search traffic, rather than always defaulting to Google. It is hard to blame Google for our own laziness in that regard.
I agree we should be after "good" data, not just big data - but I do see one problem with this approach. When I do a disavow, I'm always going to use Majestic Historic first. Yes, it may contain links that don't exist. Yes, it may have branches of links that even Google doesn't have - but my disavow is going to be the most complete - now and always.
In terms of SEO value and competitor analysis, these tools became useless the day the disavow tool came out because you can't know what a competitor has disavowed. Thus, the index I want IS the largest - so I can do the most complete disavow, not necessarily the "right one today."
(Edit to add: when I do a disavow I'm going to use ALL lists available to me - but if I had to choose ONE, I mean.)
Hi Matt,
This isn't a problem with my approach. I completely agree with you that there are situations in which the larger the database the better. If you go back and read my "So What Do We Do" conclusions, the example I give is "Let's say you have a list of domains or URLs for which you would like to know their relative values."
That is what this analysis is for - to show you which data set will provide you results more similar to Google's in terms of relativistic metrics (Like PA/DA, MozRank, MozTrust, AhrefsRank, CitationFlow, TrustFlow, etc). Nothing more, nothing less.
The conclusions of this study are restricted to a very particular question, as any good study should be, which is simply: which data set is most proportionally representative of GSC data. What you do with that information is a different question altogether.
Great article! Very intelligently written
Thanks! I had a lot of help from Diana Carter and Andrew Cron which allowed me to give a little more polish. Plus Trevor at Moz is always helpful with last minute editorial improvements.
Thank you very much @Russ Jones.
That's great research. What I can deduce from it is that my best bet would be a combo of Moz OSE + Ahrefs. It would also be interesting to see how CognitiveSEO performs in this analysis.
Actually, you can't ignore Majestic. You are missing out on half of the links if you only go with OSE and AHrefs. Imagine you had 100 randomly selected links from the web for which you would like to compare their relative authority. The chances that all 100 would be in Moz's index, or AHref's index would be much lower than in Majestic's. You should try Moz first, then AHrefs, but more often than not you would end up needing Majestic.
Plus, Majestic has some other awesomely useful metrics like Topic-based measures. You really need all 3.
I have used them before and I didn't like the experience. I think you can get real value only from OSE and Ahrefs. But thank you for the suggestion.
Intelligent. Yes, we need good data. The problem is, how can I measure the crawl rate of another website which already has backlinks to my website?
To my knowledge, there is no easy way to do this, but I could imagine a fairly straightforward methodology...
1. Spider the site with some sort of tool like ScreamingFrog, Microsoft IIS Toolkit, etc. to get a list of pages.
2. Using proxies, check the cache date of these pages every day/week in Google.
3. Store the cache dates and use them to determine, over time, the rate at which Google crawls their site (a small sketch of this calculation follows).
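Step 3 is just arithmetic once the cache dates are recorded. A minimal sketch, with invented dates:

```python
from datetime import date

def crawl_rate(cache_dates):
    """Average days between observed cache-date changes for one URL."""
    changes = sorted(set(cache_dates))
    if len(changes) < 2:
        return None  # not enough observations to estimate a rate
    gaps = [(b - a).days for a, b in zip(changes, changes[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical observations for one page, gathered over several weeks.
observed = [date(2015, 6, 1), date(2015, 6, 1), date(2015, 6, 8),
            date(2015, 6, 20), date(2015, 6, 29)]
print(crawl_rate(observed))  # average days between cache refreshes
```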
I'm not sure how this would be valuable to you, but it seems possible.
Yes, Russ, it's pretty straightforward. One more thing: is it advisable to set a crawl rate for a website hosted on a shared server with multiple other websites? What if the site is down while it is being crawled? Most small business websites are on shared servers with websites from different fields. I think a crawl rate schedule is the wrong option for them.
Quote: How can I measure crawl rate of another website
In the Pages tab of Majestic Site Explorer, we record the last crawl date for each page (and the response code), so you can use that - but of course, all the crawlers crawl differently.
Thank You Dixon, I will definitely use it soon
Thanks for sharing this information.
Hi, I just joined your site today and I am very happy, because your content is excellent. I am Persian, and in my language content about SEO is of very poor quality. My hope is to learn more and more SEO tactics and teach them to Persian speakers. Good luck - can anyone tell me where I can start reading, step by step?
Best regards. If you want to see my website, I have courses and content about social media and digital life.
Digimehtod
You should definitely check out our beginner's guide to SEO.
One thing people can try is Bing Webmaster Tools - they show you the links they have found, and they are likely to be more similar to Google's than any of the other link checkers
Hey folks, I'll do my best to try and answer every question here today. You can also ping me on Twitter (@rjonesx) if you like. Thanks in advance for your comments and questions.
Thanks for the good explanation about the comparison of link indexes in big data. I took big data online training in Intellipaat. I loved it all and I am still doing so. My journey for big data training with Intellipaat is fabulous.
Really enjoy this article! Data speaks loud here.
Also, I really appreciate the comments about how to use different tools together!
Great outline of these different link indexes and how they work. I always start with Moz and move to the other tools as I need more information to take action on.
The site https://searchengineland.com has some studies on big data that I'm sure you'll find interesting. I recommend it! ;)
Russ, as you know, keywords are my specialty. Having said that, I was one of 10 employees at SEMrush for 14 months, and I watched the link index go from infancy to adolescence. I'd imagine they're happy to shed more light on their process. Send a tweet to @RadioMS, US Marketing Dir. SEMrush is so very new to the link analysis game that I have to believe their attempt to scale their db in a short time plays a big part in their process.
Thanks for sharing advanced SEO information! superb sir.
Quality, theme-relevant links are important for ranking...
Your analysis was fantastic. You know a lot about backlinks, and we can see it in your post. I have a question: if I have more data, is it easier to lose track of my backlinks?
It has been a very interesting and engaging article. Thank you.