After some wrestling with Amazon's EC2 and the tragic loss of many hard disks therein, we've finally finished processing and have released the latest Linkscape update (previously scheduled for Feb. 14). This new index is, once again, quite large in comparison to our prior indices, and contains a mix of crawl data going back to the end of last year. In fact, this is technically our largest index ever!
Here are the latest stats:
- 65,997,728,692 (66 billion) URLs
- 601,062,802 (601 million) Subdomains
- 140,281,592 (140 million) Root Domains
- 739,867,470,316 (740 billion) Links
- Followed vs. Nofollowed:
  - 2.21% of all links found were nofollowed
  - 57.91% of nofollowed links are internal
  - 42.09% are external
- Rel Canonical: 11.11% of all pages now employ a rel=canonical tag
- The average page in this index has 71.88 links on it:
  - 60.98 internal links on average
  - 10.90 external links on average
We also ran our correlation metrics against a large set of Google search results and saw very similar data to last round. Here are the latest numbers using mean Spearman correlation coefficients (on a scale of 0 to 1, higher is better):
- Domain Authority: 0.26
- Page Authority: 0.37
- MozRank of a URL: 0.19
- # of Linking Root Domains to a URL: 0.26
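For those curious about the mechanics, here's a minimal sketch of how a mean Spearman coefficient like the ones above can be computed (the `serps` and `metric` names are hypothetical stand-ins; this is not our actual evaluation code):

```python
# Minimal sketch of the evaluation described above, using hypothetical data.
# For each SERP we take the results in Google's order, score each with a
# metric (e.g. Page Authority), and compute Spearman's rho between metric
# and position; the sign is flipped so "higher metric at higher positions"
# yields a positive coefficient. Per-SERP coefficients are then averaged.
from scipy.stats import spearmanr

def mean_spearman(serps, metric):
    """serps: list of result lists, each in Google's ranked order.
    metric: function mapping a result to its score."""
    rhos = []
    for results in serps:
        positions = range(1, len(results) + 1)           # 1 = top result
        scores = [metric(r) for r in results]
        rho, _ = spearmanr([-p for p in positions], scores)
        rhos.append(rho)
    return sum(rhos) / len(rhos)
```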
Our evaluation process also checks the comprehensiveness of our crawl data against a large set of Google results, and in this index, we've got link data on 82.09% of SERPs. This is slightly down from last month's 82.37%, which we suspect is a result of the late release. Crawl data ages with the web, and new URLs make their way into the SERPs, too. To help visualize our crawl, here's a histogram of when the URLs in this index were seen by Linkscape:
We always "replace" any older URLs with newer content if we recrawl or see new links to a page, so while there may be some "old, crusty" stuff from December, the vast majority of this index was crawled in mid-to-late January.
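Conceptually, that replacement rule is just "newest crawl wins" per URL; here's a toy sketch of the idea (not our production storage system, just the replacement semantics):

```python
# Toy illustration of the "newest crawl wins" rule: a recrawl (or newly
# discovered links to a page) replaces any older record for that URL.
index = {}  # url -> (last_seen_date, page_record)

def record_crawl(url, seen_date, page_record):
    existing = index.get(url)
    if existing is None or seen_date > existing[0]:
        index[url] = (seen_date, page_record)  # newer crawl replaces older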
In the next few weeks, we're working on a new, experimental index that may be massively larger (2-3X) than this one, and closer to the scale of what's in Google's main index. This is very exciting for us and, we hope, for all of you who use Open Site Explorer, the Mozbar, the Linkscape API, and tools from our partners like Hubspot, Conductor, Brightedge, and our newest API partner, Ginza Metrics (check out some cool stuff they're doing with Moz data here and in the screenshot below).
Ginza Metrics' New Backlink Analysis Tool
If you're interested in chatting about using Moz data in products, drop Andrew Dumont a line and he'll be happy to help. And, as always, feedback on this latest index, our tools, or our metrics is greatly appreciated.
Fantastic work by Rand and the OSE team. I was surprised on Quora the other day when I saw how much this tool costs to run!! Overall, one of the best SEO tools on the market =)
Thanks James - lots of people seemed to like that thread, so I'm linking to it here: https://www.quora.com/How-much-does-SEOMOZ-Linkscape-infrastructure-cost Sadly, costs went up substantially for this index due to us having to re-run a lot of processing (over $300K in total), but we're hoping Amazon is going to provide a refund for a good portion of that.
(eyes wide open) $300K... Hats off to Rand and the OSE team. SERIOUSLY! <3 SEOmoz
re: $300k - Wow! Still, it's quite awesome that you can just outsource the computing power to the Amazon cloud and not have to build your own datacenter. Very cool. Make sure you charge it on your credit card to collect the Air Miles... :)
Yeah, I really liked that thread and sent it to a few people who use the tool a lot. That's the best thing about SEOmoz: the transparency with information within the company.
OK, good job, but tell me: how long would it take you to start a search engine, namely Moz.com, in competition with Google.com? It may seem a bit funny at the moment, but I can see you are progressing well towards that (if you want to go for it)!
It was top secret, and finally you've revealed it :D
Oh no, you're taking it wrong here. TAGFEE doesn't leave any place for mysteries in professional work. You can still check Moz.com, and you'll find that Roger Bot is standing there behind the door!
We've talked a few times about building an actual search engine, but to be honest, it's not our mission or vision. We want to help people share their ideas on the web, and another search engine is somewhat ancillary to our goals around bigger, better, fresher data and better visualization, analytics and recommendations from that data.
That said, it may be an R&D project for us at some point to attempt a search engine build, simply to get more information we can use to provide recommendations and advice and inform our own product roadmap.
Guess what, Rand!
I signed up for Ginza Metrics after reading your comment about them, i.e. "check out some cool stuff they're doing with Moz data here and in the screenshot below."
And now I've got 41 (forty-one) emails in my inbox for activation of my account! Can you please tell them not to annoy users with their email bombardment? It's really annoying! Thanks.
*Update: Talked to 'Ray' there, and he is looking into the issue; it seems to be some sort of technical error.
Whoa! I'd definitely drop Ray a line about that - I'm sure it's unintentional.
ZOMOES! Zero Other Mashups Offer Equal Service.
This is Absolutely Awesome!!!
Look at how much work is available for people doing SEO. This is great data and I am really looking forward to your next update. Wow, what an amazing feat it will be to make this already inconceivably large index 2-3x larger.
Thank you, SEOmoz!
Ooo a day early. I am noticing a drop in things like mozRank & trust for us and our competitors (but DA was up). Not a lot, but your calculated metrics shaved a few percentage points off the scores, while links increased. Was there any real change to how things are calculated or is this just an effect of the change to the crawl? Not that it matters, just curious if you see anything similar.
Also, when I look at the top pages tab in OSE, it shows pages that have been dead for over a year and that we 301'd to their new pages. All links point to the new page; is there ever any update to that top page list? The site used to be PHP; now it's .NET. Our 5th "top page" is an old PHP page that is redirected to the new live one. That old one shouldn't carry any value anymore and should be invisible, right?
Thanks for the update!
We've seen that over time, too. I suspect what happens is the same thing you see with Google's PageRank score (though on a much more granular level). Basically, as the sites at the "top" of the link graph (those earning the most links) get matched to a PageRank / MozRank 10, the sites in the middle of the curve naturally distribute a bit lower. Hence, the best way to look at this is always in a competitive comparison view so you can see how your competitors' metrics are affected, too.
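To make that redistribution concrete, here's a hypothetical illustration (the real MozRank computation differs; this just shows how pinning the top of a log-scaled curve to 10 can pull mid-curve scores down even as raw link counts grow):

```python
import math

# Hypothetical log-scaled score pinned so the top of the link graph = 10.
# As the ceiling (top_raw_links) grows faster than a mid-curve site's raw
# count, that site's displayed score drifts down despite gaining links.
def display_score(raw_links, top_raw_links):
    return 10 * math.log(1 + raw_links) / math.log(1 + top_raw_links)

print(display_score(1_000, 1_000_000))   # ~5.00
print(display_score(1_200, 2_000_000))   # ~4.89 -- lower, despite growth
```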
At $300K an update, you guys should really start looking at building your own private cloud.
That's fantastic, I can see many more links now. Keep up the good work, SEOmoz!
WOW, really great news. Every month I'm counting the days until the next index update; it's really very helpful for SEO.
Wow! Fantastic work by SEOmoz as usual.
Nice! I totally missed this the other day. Now to begin digging...
Great stuff. It is good to see such transparency and I am encouraged to connect with your affiliates.
Thanks Rand and Team. Lots of new links... Thanks for this.
Thanks Rand and Team. Lots of new links found too!
I really enjoyed watching the inbound domains count in one campaign nearly triple after the update. It's nice to know that YOUR efforts allow me to track MY efforts. Keep up the great work. :)
Thanks again for this huge index!
I would be very curious to know what software architecture is used for processing this enormous amount of data on Amazon EC2, and, if you tried different solutions, which one was better (for example, SQL vs. NoSQL - assuming Amazon allows some freedom, especially for people who pay what you guys pay!).
Another thing I've always wondered is what machine learning technique you used to "reverse engineer" Google's algorithm, although I can understand if you don't want to disclose it.
Obviously I have many more questions in my head, but I don't want to ask too much!
Finally, I apologise if you answered this question already, but I could not find it anywhere (I found some hints about the machine learning algorithm in a Whiteboard Friday video with Ben, but nothing specific).
Thanks in advance,
Riccardo
I'm not going to be able to explain all the architectural bits, but I can say that most of the code is C++, and we actually use a custom-built, flat-file storage system rather than a SQL or other existing database.
As far as the machine learning process goes - we used some existing libraries, customized, to build a model against ~10K Google search results (the first 30 positions in each). Correlation against these is fairly simplistic, but then we have PA/DA, which are a mashup of all our metrics to get a "best fit" line using these keyword-agnostic metrics.
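If it helps to picture that, here's an oversimplified sketch with made-up numbers (ordinary least squares standing in for whatever we actually use; the features and data are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Oversimplified sketch: fit a composite "best fit" score from
# keyword-agnostic link metrics against observed search positions.
# Feature names and values are made up; the real model differs.
X = np.array([[5.1, 120.0],   # e.g. [mozRank, linking root domains]
              [4.2,  40.0],
              [3.9,  35.0],
              [2.8,  10.0]])
y = np.array([1, 2, 3, 4])    # observed positions (1 = top)

model = LinearRegression().fit(X, y)
predicted = model.predict(X)  # lower predicted value = ranks higher
```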
Hope that helps!
Did you change your internal IDs on things like links in this last crawl? Up until this last OSE update, I was comparing links I had previously stored with new ones received from the API. I would use your lrid property in the API (the internal link ID) to see if the link had been previously discovered. That worked until this last crawl. Now all the lrids are new.
I'm not 100% sure about that, but certainly possible. Let me ask around the team.
On edit: got this from one of our big data team members:
"Yes, our internal IDs rotate out with each new index. Unfortunately any tools that were relying on them will need to be tweaked; at the very least they will need to collect new ids. Note also that these ids are explicitly not part of our public interface; use at your own risk and all that."
Thanks for finding out the answer! I will explore new ways of keeping up with historical link discovery.
The internal IDs on links, URLs, FQDNs, and PLDs are exactly that: internal IDs. These change on every index release. If you got lucky and some of them did not change across an index release, I would be quite surprised.
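If you need a stable key across index releases, one option (just a sketch on my part, not a supported feature) is to derive it yourself from the source/target URL pair instead of storing lrid:

```python
import hashlib

# Sketch: an index-independent link identifier derived from the
# normalized source and target URLs, instead of the rotating lrid.
def link_key(source_url, target_url):
    raw = source_url.strip().lower() + "\n" + target_url.strip().lower()
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

seen_links = set()
key = link_key("https://example.com/a", "https://example.org/b")
newly_discovered = key not in seen_links
seen_links.add(key)
```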
BTW, it is a proud moment for me to be a part of SEOmoz (daily reader). Thanks, Rand!
Hi Rand,
This is just so freaking awesome.
Was just wondering: would it be possible to have something like "historic link profile data"? You guys remove historic data, but I believe you must store that data somewhere, right?
I am only asking this because, from an analyst's point of view, it would be very helpful for me. For example, if one of my competitors' rankings fell drastically, looking at the historic data and the fresh data would let me analyze what went wrong for them, and then I would refrain from using the same tactics.
- Sajeet
We actually DO have this now! If you go in your campaigns and look at the link data tab, you'll find historical link metrics for you and your competitors. Here's the post announcing it from December: https://www.seomoz.org/blog/historical-link-analysis-is-here
Hey Rand... Re: Ginza, assuming that in the sentence "check out some cool stuff they're doing with Moz data here", the here is meant to be linked? :)
Doh! Sorry - fixed that (and added a nice screenshot from them).
Congratulations!
Gratz! It's a really huge update for all webmasters :)
Always like seeing updated and more data.
I must say the average number of links on a page surprises me. Especially since most of my tools are configured to flag pages with >100 links, seeing that ~70 is the average made me realise how easy that limit is to exceed on a lot of the sites I deal with.
Sweet! Thanks for the hard work!
Looking good! Always happy to see updated data when I wake up.
That's great, guys. Thanks for all the hard work! I couldn't imagine having to work with 66 BILLION URLs! Ha!
The last update is a huge jump and a big surprise for me; all my websites have 70% more backlinks discovered. Great job!