On Thursday (November 3rd) of this past week, Linkscape's index updated - in record time, just 3 weeks. New link data is once again available in OpenSiteExplorer, via the SEOmoz API, and in the Mozbar. Here are the stats for our 46th index update:
- 43,077,387,028 (43 billion) URLs
- 480,597,551 (480 million) Subdomains
- 105,570,741 (105 million) Root Domains
- 356,255,241,471 (356 billion) Links
- Followed vs. Nofollowed
- 2.18% of all links found were nofollowed
- 58.21% of nofollowed links are internal, 41.79% are external
- Rel Canonical - 10.46% of all pages now employ a rel=canonical tag
- The average page has 77.28 links on it (down 0.19 from the last index)
- 64.86 internal links on average
- 12.42 external links on average
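As a quick arithmetic check, the internal and external averages add up to the overall per-page figure: 64.86 + 12.42 = 77.28 links per page.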
Since August, we've been struggling with a particularly devious problem: binary files in the index messing up link counts and showing links that Google + Bing probably aren't counting. In September's crawl, we put a blacklist on these files and saw a ~40% reduction in binary files. This time we've made even more progress (though it's tough to know exactly how much - we're continuing to evaluate), and you should see a significant reduction in these binary files.
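For the curious, filtering like this generally keys off the Content-Type header, falling back to the file extension when the header is missing or misconfigured (our engineers describe that fallback further down in the comments). Here's a minimal, hypothetical sketch in Python - the lists and names are illustrative, not our actual crawler code:

```python
from typing import Optional

# Hypothetical sketch of a binary-file filter for a crawler.
# The blacklists below are illustrative, not Linkscape's actual lists.
BINARY_EXTENSIONS = {".exe", ".zip", ".gz", ".dmg", ".iso", ".mp3", ".avi"}
BINARY_TYPE_PREFIXES = ("application/octet-stream", "application/zip",
                        "image/", "video/", "audio/")

def is_probably_binary(url: str, content_type: Optional[str]) -> bool:
    """Return True if the URL should be skipped as a likely binary file."""
    if content_type:
        # Trust a well-formed Content-Type header first.
        main_type = content_type.split(";")[0].strip().lower()
        return main_type.startswith(BINARY_TYPE_PREFIXES)
    # Header missing (or unusable): fall back to the file extension.
    path = url.split("?", 1)[0].split("#", 1)[0].lower()
    return any(path.endswith(ext) for ext in BINARY_EXTENSIONS)
```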
In part because of the reduction in these files, processing time for the Linkscape index dropped, enabling us to produce a much faster index update. However, we're planning to produce a much larger index in December and thus anticipate processing time rising back up. On the plus side, this will mean a lot more link data. In 2012, we're aiming to reach into the 100 billion+ URL index size, closer to what we've heard Bing + Google keep in their main indices (~120-140 billion URLs).
As always, feedback on the new index is greatly appreciated - if you're seeing stuff we've missed, files we shouldn't have crawled or metrics that feel wrong, please let us know. Our engineers would love to hear from you.
Noticed the index update a few days ago - thank you! A ~30 day refresh on a 100 billion URL index will be an epic achievement. I don't envy the bandwidth cost of that!
On a separate note, Will mentioned this in his SearchLove presentation: https://blekko.com/webgrep - the ability to grep the LS index would be completely awesome. Even if you guys were just taking requests (like Blekko does), you could probably show the community some phenomenal insights!
Steve Souders' HTTP Archive is another interesting example of this: https://httparchive.org/
By the way, just a small feature request - when you update the index could we get an email notification? :-)
Hi Richard,
We are talking internally right now about a way to notify people about an update. Stay tuned, as we have a few options in mind but need to narrow them down and execute.
It is articles like this one that prove to me I made a wise decision to get the pro membership. Not only is the Moz team doing awesome work but the community adds so much value.
Thanks Rand and team for your responses and thanks community for interacting. GREAT STUFF!
Yeaaah! Good work! Fresh data is always a highlight of my day.
And I still hope that, one day, there will be a solution for tracking domain indicators like DA, DmR, and DmT historically, so that I don't have to type these values into an Excel spreadsheet after each update ;-)
Advanced Webranking (third-party software that imports Linkscape data) does provide historical tracking for the Moz values, but I would love to have this inside of SEOmoz Pro!
With sunny regards from Germany,
Sebastian
Yep - product team has that spec'd and it's being built into the web app as I type. Hopefully released before the year is out (maybe even in November).
I totally agree with softclick and I can't wait to get this feature. +1 for that. That is the feature I miss the most. After that you can just stop improving it. Kidding :)
Related to the web app: I saw different ranking results in the web app and in the "old" rank tracker tool. Why is that happening, and is that a normal thing? It was just a few hours' difference between the weekly update and checking in the old rank tracker tool. I'm seeing a similar thing on multiple domains.
The web app should have the most accurate, geo/personalization-agnostic rankings; I believe the old rank tracker may still be pulling results with Seattle geo-biasing.
Your reply makes me very happy!
Nice one Rand - I've been seeing those funny files in the OSE index; happy the problems have now been fixed up =)
Any plans to work on a historic index too as a side project?
Thank you, this is fantastic. I am seeing a lot of links from small blogs that I'm not used to seeing in the past - it seems like the index is getting great at including those as well. I'm seeing a lot more links in general to some pages on my site since the last update.
Great work!
I was seeing the same thing for my personal site and for Everywhereist.com. Going to check some more sites, but yeah, kinda cool that we seem to be crawling some of that stuff more deeply.
Awesome! Thanks for keeping results fresh!
Would be nice if reciprocal links were flagged like nofollowed ones are.
The last couple of days my Mozbar has been asking for a username and password even though I'm logged in. Is this something related to the latest update?
Thanks
Does Linkscape plan on updating again any time soon? My last crawl was in November and I need to send out a report.. Very frustrating..
Nice to see another update… I can see lots of links from different sources… great work SEOmoz!
I'm still finding OSE data way off from what's being reported by MajesticSEO and Ahrefs.com.
If out of three data sources, two are reasonably linear and one is way off the mark, is the only logical and correct assumption that the one way off the mark is unreliable at best and misleading at worst?
I find OSE data an interesting fiction, but it's gone way past the point of being useful and actionable data of the sort that we can put to clients.
As far as I understand it, it's simple: Majestic pulls in a lot of c**p even with their new "fresh data" approach, whereas OSE cuts off some percentage of the lowest-quality links. Also, I think OSE first needs to index a page and its links, and then on the next update it will index the pages linked from the pages indexed in the previous update - so it takes time for links to get picked up. I think for older domains and older links the differences are much smaller.
Just my 2 euro cents.
We generally map well to the numbers you see from referring links/domains in your Google Analytics, as well as what's reported in Google/Bing Webmaster Tools and Yahoo! Site Explorer. MJ tends to be much larger than any of these, often not in proportion. I'm unfamiliar with Ahrefs; not sure if they build their own indices or rely on third party data in whole or part.
Our goal is to always have the metrics that best correlate to rankings in Google, and for the past few years, whenever we or other third parties have run analyses, we tend to hit that mark. That's not to say we don't want to get bigger and fresher - we realize a sample set is not as valuable as the entire web's link graph (so long as they're all portions Google/Bing are crawling + counting, too).
I should also note that we do a lot of canonicalization and de-duplication of URLs, which we've found leads to higher quality stuff, but it does make the raw counts lower.
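To give a rough sense of what that canonicalization involves (a simplified, hypothetical sketch - not our production pipeline), normalizing things like host case, default ports, fragments, and empty paths collapses many URL variants into a single record:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """Simplified URL canonicalization (illustrative only): lowercase the
    scheme and host, drop default ports and fragments, and normalize an
    empty path to '/'. Real pipelines handle many more cases than this."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    scheme = scheme.lower()
    host = netloc.lower()
    if scheme == "http" and host.endswith(":80"):
        host = host[: -len(":80")]
    elif scheme == "https" and host.endswith(":443"):
        host = host[: -len(":443")]
    path = path or "/"
    return urlunsplit((scheme, host, path, query, ""))

urls = [
    "http://Example.com:80/page#comments",
    "http://example.com/page",
    "http://example.com/page?",
]
# De-duplication: keep one record per canonical form.
print({canonicalize(u) for u in urls})  # all three collapse to one URL
```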
Thanks for the update, Rand! Glad to see the data is a little more accurate now.
“In 2012, we're aiming to reach into the 100million+ URL index size”
100billion+?
Congrats on the update. Do you plan on doing (or have you already done) an analysis blog post of how your index has changed over time in terms of things like % nofollowed, number of pages with rel=canonical, etc? I’d be interested to see how quickly the web follows the search engines’ new initiatives.
Doh! Thanks for the catch; fixed that up.
In terms of analysis - yes; great idea. Next update (in December), I'll do some charts showing nofollow and rel canonical over time.
Looking forward to seeing these charts, Rand! Keep up the good work!
"As always, feedback on the new index is greatly appreciated - if you're seeing stuff we've missed, files we shouldn't have crawled...?
I was wondering, could you please purge my personal credit card information and my social security out of your recent crawled index data? (just kidding)
On a serious note, I was looking for a tool, or a service, that could generate a visual map/cloud of links pointing to one's website - something like what was described in the social-network-spam-and-author-rank post. Please point me in the right direction if that is available. Google and Bing can generate that map, but I don't think I have access to it.
This may be possible for your engineers to build, either through OpenSiteExplorer or at least via a PRO tool. Such a feature alone would tip me over into becoming a paid PRO member.
Otherwise, congratulations - 43 billion URLs, wow!
Best regards,
Sasha
I asked this question in QA, but I'm curious about how Linkscape handles .html URLs that 302 redirect to .exe files. Those .html files are showing up as top linked to URLs for our site.
If I change these to 301s, what happens with how Linkscape handles these URLs and files?
I can also robots.txt these URLs, but if I do that before you see that they're redirects to .exe files, what happens then?
Our affiliate program has resulted in quite a lot of links to these download.html files, which are all redirects to a URL that prompts a download of an exe.
Example:
https://www.bigfishgames.com/download-games/4346/top-chef/download.html
It's an odd problem, but one result is it's made the top pages report have a lot of noise in it.
Pinged the Linkscape engineering team, Justin - they should have a reply on this soon.
Hey Justin!
Great question! I forwarded this to our engineers and their response is below. They weren't exactly sure of your use case, so if this isn't a sufficient answer, let me know and we'll track down a deeper explanation for you! You can always email me at [email protected] if you want to reach out directly!
"I don't think the Linkscape crawlers treat 301s differently from 302s. We will track the actual status code and we may eventually try to follow the redirect based on its MozRank.
In the case of this particular URL, I checked and it is currently a 302. The redirected URL is an exe like stated, and the content-type is currently being specified as application/octet-stream. I don't believe these URLs should be causing us any problems since the content-type header is being specified correctly. The problem that we have with binary files is when the content-type is not being specified properly. Then we have to rely on the file extension..."
In short, the binary file is not an issue for our crawlers, but the link will be counted as a 301 or a 302 since it is essentially an inbound link.
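If you'd like to double-check what a crawler sees for a URL like this yourself, here's a rough sketch (using Python's requests library purely for illustration) that prints the redirect status codes and the final Content-Type header:

```python
import requests

# Rough sketch: inspect the redirect chain and the final Content-Type
# for a URL like the download.html example above.
url = "https://www.bigfishgames.com/download-games/4346/top-chef/download.html"

resp = requests.head(url, allow_redirects=True, timeout=10)
for hop in resp.history:
    print(hop.status_code, "->", hop.headers.get("Location"))
print("final status:", resp.status_code)
print("final Content-Type:", resp.headers.get("Content-Type"))
```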
I hope this helps, but let me know if you have more questions!
Thanks!!
Carin