Some of the Internet's most important pages, on many of its most linked-to domains, are blocked by a robots.txt file. Does your website misuse the robots.txt file, too? Find out how search engines really treat robots.txt-blocked files, entertain yourself with a few seriously flawed implementation examples, and learn how to avoid the same mistakes yourself.
The robots.txt protocol was established in 1994 as a way for webmasters to indicate which pages and directories should not be accessed by bots. To this day, respectable bots adhere to the entries in the file... but only to a point.
Your Pages Could Still Show Up in the SERPs
Bots that follow the instructions of the robots.txt file, including Google and the other big guys, won’t index the content of the page but they may still put the page in their index. We’ve all seen these limited listings in the Google SERPs. Below are two examples of pages that have been excluded using the robots.txt file yet still show up in Google.
Cisco Login Page
The Cisco login page highlighted below is blocked in the robots.txt file, but it shows up with a limited listing on the second page of a Google search for 'login'. Note that the Title Tag and URL are included in the listing. The only thing missing is the Meta Description or a snippet of text from the page.
WordPress’s Next Blog Page
One of WordPress.com’s 100 most popular pages (in terms of linking root domains) is www.wordpress.com/next. It is blocked by the robots.txt file, yet it still appears in position four in Google for the query ‘next blog’.
As you can see, adding an entry to the robots.txt file is not an effective way of keeping a page out of Google’s search results pages.
Robots.txt Usage Can Block Inbound Link Effectiveness
The trouble with using the robots.txt file to block search engine indexing is not only that it is quite ineffective, but also that it cuts off your inbound link flow. When you block a page using the robots.txt file, the search engines don't index the contents (OR LINKS!) on the page. This means that if you have inbound links to the page, that link juice cannot flow on to other pages. You create a dead end.
(If this depiction of Googlebot looks familiar, that's because you've seen it before! Thanks Rand.)
Even though the inbound links to the blocked page likely have some benefit to the domain overall, this inbound link value is not being utilized to its fullest potential. You are missing an opportunity to pass some internal link value from the blocked page to more important internal pages.
3 Big Sites with Blocked Opportunity in the Robots.txt File
I've scoured the net looking for the best bloopers possible. Starting with the SEOmoz Top 500 list, I hammered OpenSiteExplorer in search of heart-stopping Top Pages lists like this:
Ouch, Digg. That's a lot of lost link love!
This leads us to our first seriously flawed example of robots.txt use.
#1 - Digg.com
Digg.com used the robots.txt to create as much disadvantage as possible by blocking a page with an astounding 425,000 unique linking root domains, the "Submit to Digg" page.
The good news for Digg is that from the time I started researching for this post to now, they've removed the most harmful entries from their robots.txt file. Since you can't see this example live, I've included Google's latest cache of Digg's robots.txt file and a look at Google's listing for the submit page(s).
As you can see, Google hasn't yet begun indexing the content that Digg.com had previously blocked in the robots.txt.
I would expect Digg to see a nice jump in search traffic following the removal of its most linked-to pages from the robots.txt file. They should probably keep these pages out of the index with the robots meta tag, 'noindex', so as not to flood the engines with redundant content. This move would ensure that they benefit from the link juice without cluttering the search engine indexes.
If you aren't up to speed on the use of noindex, all you have to do is place the following meta tag into the <head> section of your page:
<meta name="robots" content="noindex, follow">
Additionally, by adding 'follow' to the tag you are telling the bots that, while they shouldn't index that particular page, they should still follow the links on it. This is usually the best scenario, as it means that the link juice will flow to the followed links on the page. Take, for example, a paginated search results page. You probably don't want that specific page to show up in the search results, as the contents of page 5 of that particular search are going to change from day to day. But by using robots 'noindex, follow', the links to products (or jobs, in this example from Simply Hired) will still be followed and hopefully indexed.
Alternatively, you can use "noindex, nofollow", but that's a mostly pointless endeavor, as you're blocking link juice just as you would with the robots.txt file.
#2 - Blogger.com & Blogspot.com
Blogger and Blogspot, both owned by Google, show us that everyone has room for improvement. The way these two domains are interconnected does not utilize best practices and much link love is lost along the way.
Blogger.com is the brand behind Google's blogging platform, with subdomains hosted at 'yourblog.blogspot.com'. The link juice blockage and robots.txt issue that arises here is that www.blogspot.com is entirely blocked with the robots.txt. As if that wasn't enough, when you try to pull up the home page of Blogspot, you are 302 redirected to Blogger.com.
Note: All subdomains, aside from 'www', are accessible to robots.
A better implementation here would be a straight 301 redirect from the home page of Blogspot.com to the main landing page on Blogger.com. The robots.txt entry should be removed altogether. This small change would unlock the hidden power of more than 4,600 unique linking domains. That is a good chunk of links.
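For reference, a homepage-to-homepage 301 like that is a one-line job on most servers. Here's a rough sketch assuming Apache with mod_alias (the configuration itself is purely hypothetical, not how Google actually serves Blogspot):
-----------------
# Hypothetical rule for the www.blogspot.com host: permanently
# redirect the homepage (and only the homepage) to Blogger.
RedirectMatch 301 ^/$ https://www.blogger.com/
-----------------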
#3 - IBM
IBM has a page with 1001 unique linking domains that is blocked by the robots.txt file. Not only is the page blocked in the robots.txt, but it also does a triple-hop 302 to another location, shown below.
When a popular page is expired or moved, the best solution is usually a 301 redirect to the most suitable final replacement.
Superior Solutions to the Robots.txt
In the big-site examples highlighted above, we've covered some misuses of the robots.txt file, but they don't cover every scenario. Below is a list of effective solutions for keeping content out of the search engine index without leaking link juice.
Noindex
In most cases, the best replacement for robots.txt exclusion is the robots meta tag. By adding 'noindex' and making sure that you DON'T add 'nofollow', your pages will stay out of the search engine results but will pass link value. This is a win/win!
301 Redirect
The robots.txt file is no place to list old worn out pages. If the page has expired (deleted, moved, etc.) don't just block it. Redirect that page using a 301 to the most relevant replacement. Get more information about redirection from the Knowledge Center.
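As a quick sketch (assuming an Apache server and hypothetical URLs), a single retired page can be permanently redirected with one line in your .htaccess file:
-----------------
# 301 an expired page to its closest replacement (example URLs)
Redirect 301 /old-whitepaper.html https://www.yoursite.com/whitepapers/current.html
-----------------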
Canonical Tag
Don't block your duplicate page versions in the robots.txt. Whenever possible, use the canonical tag to keep the extra versions out of the index and to consolidate the link value. Get more information from the Knowledge Center about canonicalization and the use of the rel=canonical tag.
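If you haven't implemented it before, the tag sits in the <head> of each duplicate version and points at the version you want indexed. A minimal sketch, with a hypothetical URL:
<link rel="canonical" href="https://www.yoursite.com/widgets/blue-widget/">
Print versions, session-ID URLs and tracking-parameter variants of that page would all carry the same tag.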
Password Protection
The robots.txt file is not an effective way of keeping confidential information out of the hands of others. If you are making confidential information accessible on the web, password protect it. If you have a login screen, go ahead and add the 'noindex' meta tag to the page. If you expect a lot of inbound links to this page from users, be sure to link to some key internal pages from the login page. This way, you will pass the link juice through.
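On Apache, for instance, basic password protection only takes a few .htaccess directives. A minimal sketch; the .htpasswd path is hypothetical and should live outside your web root:
-----------------
# Require a valid username/password for everything in this directory
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /home/yoursite/.htpasswd
Require valid-user
-----------------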
Effective Robots.txt Usage
The best way to use a robots.txt file is to not use it at all. Well... almost. Use it to indicate that robots have full access to all files on your website and to direct robots to your sitemap.xml file. That’s it.
Your robots.txt file should look like this:
-----------------
User-agent: *
Disallow:
Sitemap: https://www.yoursite.com/sitemap.xml
-----------------
The Bad Bots
Earlier in the post I referred to "bots that follow the instructions of the robots.txt file," which implies that there are bots that don't adhere to the robots.txt at all. So while you're doing a good job of keeping out the good bots, you're doing a horrible job of keeping out the "bad" bots. Additionally, filtering to allow bot access only to Google/Bing isn't recommended, for three reasons:
- The engines change/update bot names frequently (e.g. Bing's bot name changed recently)
- Engines employ multiple types of bots for different types of content (e.g. images, video, mobile, etc.)
- New engines and content discovery technologies getting off the ground (e.g. Blekko, Yandex, etc.) stand even less of a chance when preferences for existing user agents become institutionalized, and search competition is good for the industry.
Competitors
If your competitors are SEO savvy in any way, shape, or form, they're looking at your robots.txt file to see what they can uncover. Let's say you're working on a new redesign, or a whole new product line, and you have a line in your robots.txt file that disallows bots from "indexing" it. If a competitor comes along, checks out the file and sees a directory called "/newproducttest", they've just hit the jackpot! Better to keep that on a staging server, or behind a login. Don't give all your secrets away in this one tiny file.
Handling Non-HTML & System Content
- It isn't necessary to block .js and .css files in your robots.txt. The search engines won't index them, but sometimes they like the ability to analyze them so it is good to keep access open.
- To restrict robot access to non-HTML documents like PDF files, you can use the x-robots tag in the HTTP header; a sketch follows this list. (Thanks to Bill Nordwall for pointing this out in the comments.)
- Images! Every website has background images or images used for styling that you don't want to have indexed. Make sure these images are displayed through CSS rather than the <img> tag as much as possible. This will keep them out of the index without your having to disallow the "/style/images" folder in the robots.txt.
- A good way to determine whether the search engines are even trying to access your non-HTML files is to check your log files for bot activity.
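Here is the x-robots sketch promised above. Assuming Apache with mod_headers enabled, you can attach the directive to every PDF (or any other non-HTML file type) your server delivers:
-----------------
# Keep all PDFs out of the index via the HTTP header
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>
-----------------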
More Reading
Both Rand Fishkin & Andy Beard have covered robots.txt misuse in the past. Take note of the publish dates and be careful with both of these posts, though, because they were written before the practice of internal PR sculpting with the nofollow link attribute was discouraged. In other words, these are a little dated but the concept descriptions are solid.
- Rand’s: Don’t Accidentally Block Link Juice with Robots.txt
- Andy’s: SEO Linking Gotchas Even the Pros Make
Action Items
- Pull up your website’s robots.txt file(s). If anything is disallowed, keep reading.
- Check out the Top Pages report in OSE to see how serious your missed opportunity is. This will help you decide how much priority to give this issue compared to your other projects.
- Add the noindex meta tag to pages that you want excluded from the search engine index.
- 301 redirect the pages on your domain that don’t need to exist anymore and were previously excluded using the robots.txt file.
- Add the canonical tag to duplicate pages previously robots.txt’d.
- Get more search traffic.
Happy Optimizing!
(post edited 10/12/10 @ 5:20AM to reflect x-robots protocol for non-html pages)
I suppose a lot of people use the robots.txt file rather than "noindex, follow" in the meta tags to block pages - simply because it is easier to handle. If, for example, a homepage with lots of pages and just one template should block some pages, the faster way would be the robots.txt (without the need to make duplicate templates....). But the fastest isn't always the best way - as you taught us :-).
This is true! As with most things SEO, a little (or a lot) of extra work can really pay off. These are the type of tactics that separate basic optimization from advanced optimization.
Thanks for the examples, Lindsay! It's time to dive into the robots.txt file on a couple of client sites (and my own) to make sure everything is still running smoothly.
This is actually a very useful post and something that is easy (for me at least) to forget or overlook.
It got me thinking though, is there much of a reason to use noindex and nofollow most of the time? Wouldn't noindex, follow be better in almost all cases?
If that became the default use, it would be a lot easier than trying to remember to switch it for specific SEO valuable pages etc.
The only use I can see for meta 'noindex,nofollow' might be a page that you want to keep out of the index and one that is also full of paid links. However, 'noindex,nofollow' is really just like adding a file to the robots.txt in that it blocks link juice flow. The main difference being that the 'noindex,nofollow' tag would actually be effective at keeping a listing out of the SERPs.
An additional advanced note: it isn't necessary to specify 'follow'. 'Follow' is the default and is implied. Your standard entry for pages you want to keep out of the index could simply be:
<meta name="robots" content="noindex">
instead of;
<meta name="robots" content="noindex, follow">
The result is the same.
Doh! :) Good to know about the default... durr.
I know there's some debate on this subject, from an internal PR-flow standpoint, but personally, I think NOINDEX,NOFOLLOW can be handy when you have an architecture where all of the pages past a certain level are either duplicates or pages you definitely don't want indexed. For example, let's say your site looks like:
(1) Home > (2) Search > (3) Product > (4) Product Options > (5) Cart > (6) Checkout
You might not want to index "Product Options," because they'll be near-duplicates, and you don't want to index your shopping cart and checkout pages, so why not cut off the bots at "level" (4) by tagging those pages with NOINDEX,NOFOLLOW. It ends up being a lot cleaner than trying to nofollow all the links, block parameters, etc.
Dr. Pete, even if you don't want those pages indexed, why not allow the links to be followed? Parameters aren't an issue if you use canonical tags & that way you don't lose out on the benefit of any incoming links from people linking to your product pages or something of that nature.
The issue of paid or sponsored links would definitely be a good use for 'noindex,nofollow', but I would imagine that's a fairly rare occurrence for most people.
LOL - I described the whole setup and then never even mentioned the NOFOLLOW aspect. Nice. That's what babies do to your brain.
In that example, since you know that nothing after level (4) should be indexed, I think it can be cleaner to cut off the bots entirely. You're not typically going to be cross-linking those pages in any way useful to search visitors, and you can save the crawlers bandwidth and focus them on your more important pages. You're basically saying "Everything below Level 4 is useless to you - ignore it, and focus on 1-3".
I have to agree with Dr. Pete. Entire sections of a site should be sliced out of the equation. Better to expend more energy on those higher-level pages in this scenario. The extremely minute PR that such deep pages pass isn't necessarily worth the effort.
How about an easier thought process?
If you could envisage ever wanting to use "noindex, nofollow", that is likely a page that should have any juice redirected away to somewhere else using canonical tags or some kind of cloaking/bot herding.
In which case, it shouldn't really matter about the nofollows on the page.
The thing is those pages don't get a tiny amount of PageRank - you might well have a link from every product page even if they are not in sitewide navigation. If you are avoiding using link level nofollow because you don't know what is happening to the juice, and avoiding javascript due to accessibility problems and have raw links, that can add up to a fair amount of juice.
It is nice to get a link to that old post and good that you pointed out about the date.
I am actually going to be partially debunking myself soon as for the last 16 months since the nofollow change was announced I have seen some pretty compelling evidence that we are missing a huge chunk of the equation.
This is what I wrote on Matt's nofollow change post:
Halfdeck, my hope/guess, and this is purely speculation, is that Google uses something like “DDD” Dynamic Domain Dampening.
It is something that is possibly needed to handle hanging/dangling pages effectively anyway, whereby rather than giving this part of the dampening factor to the whole web, it is redistributed within the domain instead.
Note: dynamic domain dampening is just something I came up with for a tweet. For a while now I have been convinced that it might apply to all blocked URLs as well, including robots.txt.
Whilst I can do some testing, and I possibly have the best scripts to do it, it is harder to prove anything now than in the old days of showing that some kind of sculpting improved results for specific ranking goals.
The good news is I am going to be releasing my code soon as open source so other people can blow up their brains on this as well.
This really IS a must-read article... It's this kind of knowledge that separates people who "think" they know SEO from advanced professionals. Now - if I could just do a better job of motivating clients to actually implement such tasks when pointing them out... < sigh >
Can't you just use X-Robots for excluding non-HTML content?
For example: https://pastie.org/1214742
Thanks Bill! You've taught me something.
It is possible to block indexing of non-HTML files by adding the equivalent directive to the HTTP header of a document. Others, read more about the x-robots tag from Google here. Bing also supports it, and describes their policies a little on page 13 of this document (it is a downloadable PDF from Bing).
Thanks again, Bill.
I've updated the post to reflect the x-robots protocol for non-HTML pages and provided a little attribution to you for being the first to point it out. Big thanks, Bill!
No problem - happy to help. Great article!
Whoa! Seriously great tip. I just saw that some of the most powerful pages on a site I'm working on are being wasted. Now for yet another item in my to-do list. :0)
Really? I'm amazed how many sites are doing this without realizing it. To get the most value out of this adjustment, make sure that these top pages that will be changed to 'noindex,follow' also have nice internal links to your most important pages that are in the index. That way you will pass that link value through.
Good luck!
Thanks Lindsay, I am going through one of my websites and found 23 useless pages being indexed... thanks a million, this should move it up from No. 2 to No. 1 on Google.
That sounds like a big win in my book! Congratulations!
I am always amazed how a little mistake like that, when rectified, can make such a big difference. I've got to be honest, it looks like it's also moved its second keyphrase from 7th to 2nd.
Thanks again for a great post.
This is a bit anecdotal (and I'm curious about other SEO's experiences), but I've also found that Robots.txt can be pretty lousy for removing content that's already been indexed. If you have it from the beginning, it tends to work alright (not foolproof, as you said, especially if you get inbound links to those pages). Once your content is indexed, though, adding Robots.txt to remove it is extremely unpredictable.
Robots.txt is completely useless for removing content that's already been indexed. Think about it... you disallow crawling of some portion of your website's content, so the bots don't crawl the content. That's the critical point: the bots don't crawl the content. So if you have implemented meta robots on each page for noindex, nocache, noarchive, etc., the bots never see the instructions because they don't bother to crawl the page. In summary:
DO: Use meta robots to help "deindex" batches of content
DON'T: Use robots.txt in tandem with meta robots to help deindex batches of content
OR: Use GWMT to quickly remove indexed URLs and then add the meta robots to continue to keep the content out of search engine indexes
I think the problem is that a lot of people think Robots.txt is a hatchet that can be applied to instantly hack away already indexed pages. We give so many warnings about mis-using it (and rightly so) that people give it near magical properties. Of course, then they add a bunch of pages to Robots.txt only to find out weeks or months later that little or nothing happened.
Great article Lindsay, just the one I was looking for. I was struggling to find a way to add a # in my URL for a classified site that I had, and this meta noindex was just the answer I needed. Keep up the great work.
Excellent post. I've seen robots.txt inaccurately explained in so many blog posts by "experienced" SEOs, it's like a nasty virus! I try to always leave a comment and clear up the confusion, but now I can just include a link to this post. Thanks.
Exactly! I was thinking the very same thing about answering questions in Q&A. We often get questions about blocking pages, and now we have a nice, succinct post to point them to. Lindsay obviously rocks. :)
It’s amazing how many big names out there are misusing the robots file. I too have been guilty of this in the past but now use the preferred option of the robots Meta tag to noindex pages.
Thanks for the useful explanations as well as the alternative methods available
That's a good reminder post. I certainly need a dose of such posts from time to time, as I am suffering from information overflow. I would like to bust one myth regarding the use of robots.txt: there is no 'Allow:' field in the robots.txt standard (https://www.robotstxt.org/robotstxt.html), yet you can still find webmasters using it. Here is an interesting video from Matt Cutts on 'Can I use robots.txt to optimize Googlebot's crawl?'. Note how Matt reacts :)
Had to wait a week to read it, but the wait was well worth it Lindsay. What a fantastic resource for robots.txt.
It's posts like this that keep reminding me that I am not nearly as advanced in SEO as I thought I was and instead am simply a student with much more to learn. [sigh]
I love the way Google ignores basic SEO best practice. I wonder what sort of SEO capability they have internally looking at their own assets.
Thank you! That cleared most of the questions. I can understand the effort you put in to get this post up. Two thumbs up!!
Great work. I was losing a lot of juice; I was using the meta robots without "..., follow." Thank you, you're beautiful and smart.
When using a robots.txt file, if, for example, I need to disallow a page like the terms & conditions page on an ecommerce shop, because it will come up as duplicate content and that can affect the SEO, then I will only disallow that individual file.
Hi Victoria - Ideally, you want to ensure that your website does not generate duplicate content. The next best solution is to add the meta robots tag, 'noindex,follow'. This ensures that any link value passed into the page will transfer through to other pages and help them rank. Save the link juice!
Hi Lindsay - thanks for all the robots.txt best practice tips. I gave myself 9 out of 10 for my latest implementation :)
I do have a question for you relating to one of your cautionary points: "the practice of internal PR sculpting with the nofollow link attribute (is) discouraged".
What technique would you recommend for internal PR sculpting?
Is it OK to use robots.txt for outgoing links? I use links for price comparison on my site, and for this purpose I have to give links to other sites on every page, so I have added robots.txt entries for every such link. Does Google mind such activity, or do they not care how many times you use it? Please share your knowledge on this.
Slightly late comment!
Would it be sensible to use noindex,follow on a search page?
The page has its own URL (/search.php) but of course the content changes every time.
I use robots.txt on my pages which have duplicate content to avoid any penalties. Those pages don't have any link juice to pass on.
Thanks for the tips! I recently worked with one of my larger clients on using robots.txt to segregate the XML sitemap crawl across two websites. Your tips brought back memories. I appreciate the information.
Some nice examples that are often lacking from articles about robots.txt. I'm going to be a bit pedantic, though, and point out that it is not the title tag in the Cisco results - after all, the page is not indexed, so how could the search engine know what the title is? It probably comes from the text of incoming links.
I strongly agree with you that the best use of robots.txt is almost always "don't bother."
Robots.txt has a very high potential for mistakes and damage to overall SEO, and in the end it is handled as a suggestion even by the major players (Google and Bing), as shown in your post. It can be used to improve crawl efficiency, especially with multiple sitemaps, but I would advise most people to use page-level handling of both follow and index directives.
Insightful post.
One question I am having is whether or not using robots.txt is still useful for trying to prevent affiliate URLs from being indexed. I am aware that rel=canonical can solve this well, but I'd like to know people's thoughts on avoiding the backend by modifying robots.txt with a wildcard like Disallow: *?affid= or something like that.
The trouble with blocking tracking-code variants in the robots.txt is that you are creating a link juice dead end for any value that the inbound affiliate links may be creating for your site. You would be much better off using the canonical tag to consolidate the duplicate page URLs that these links can create.
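To illustrate with hypothetical URLs: every affiliate-tagged variant of a page simply carries a canonical pointing back at the clean URL, so nothing needs to be disallowed and the inbound link value gets consolidated:
<!-- served on https://www.yoursite.com/product/?affid=123 (hypothetical) -->
<link rel="canonical" href="https://www.yoursite.com/product/">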
Lindsay,
Great post! I can't count how many times I have had to tell clients or other people on the web to stop misusing robots.txt and the damages it can do. I love to see the grin on their face when they see the results of removing those pages from robots.txt and adding the meta noindex tag. Thanks for taking the time to write this out!
Well done. A very nice post indeed.
Awesome post - I'll have to review it once and for all hahaha.
Wow, Digg really doesn't get SEO right at all! It feels like it wasn't that long ago when they didn't redirect from "www" to "non www". Awesome example of how to shoot yourself in the foot!
This is an excellent summary with good real-world examples Lindsay. A great reference resource for those who risk making some potentially drastic errors!
Great post, thank you for this. I've always favored robots.txt over meta noindex, and now I can see why that's not always the optimal way to do it. I'll be making meta 'noindex, follow' a best practice for pages I don't want indexed but that may still receive links. Thanks!
- Evan
Let me ask a maybe provocative question.
If robots.txt invites so much misuse, why not simply use it, from the first seconds of a website's life, just to block the "backend" folders (java, scripts, admin...) and to point bots to your sitemap.xml, and rely on the meta robots tag, 301s and canonicals for all the content/frontend-related pages?
I like your thought process. The problem, in my view, is that the search engines like to have permission to view that stuff, especially the js, to ensure nothing funny is going on there. The admin content should be password protected anyway, so why bother?
Thanks Lindsay.
I was thinking about the admin folder and "backend" in general, having in mind the classic robots.txt that comes with a CMS like Joomla, which looks like this:
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
I usually have to retouch it (images, media and, from what you're saying, other stuff).
If what you really want is to keep these pages out of the search engine index, the meta robots tag would be more effective at that. In the name of conserving robot resources on your site, you could probably safely nofollow some of these admin pages as well. I agree that you should be careful with the /images and /media defaults!
Hehe, it's fun to know that the best use of robots.txt is to not use it.
Thanks !
I'm using the robots.txt file to block bots from the script file names, like:
Disallow: /index.php
or if I'm using mod_rewrite like:
RewriteRule ^([a-z\-]*\/[a-z\-0-9]*\.html)$ /winkel.php?url=$1 [L]
I block the script too:
Disallow: /winkel.php
Personally, I think this is a situation where you're much better off with either 301-redirects or the Canonical tag. Otherwise, as Lindsay pointed out, you may be cutting off link-juice to those alternate versions of the pages.
Agreed...I use 301's in this situation too.
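To sketch the 301 approach for the mod_rewrite setup above (hypothetical rules built around the winkel.php pattern from the earlier comment), you can bounce direct requests for the script back to the rewritten URL instead of disallowing it. The redirect needs to sit before the internal rewrite and check THE_REQUEST so it only fires on external requests and doesn't loop:
-----------------
# An external request for /winkel.php?url=foo/bar.html gets a 301 to /foo/bar.html
RewriteCond %{THE_REQUEST} \s/winkel\.php\?url=([^\s&]+) [NC]
RewriteRule ^ /%1? [R=301,L]

# Existing internal rewrite (unchanged)
RewriteRule ^([a-z\-]*\/[a-z\-0-9]*\.html)$ /winkel.php?url=$1 [L]
-----------------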
Till now I hadn't realised this - thanks for such a nice brief. I have started implementing this on all of my high-traffic sites.
Let the link juice flow! Yippee!