As you've probably learned, you can't always rely on search engine spiders to do an effective job when they visit and index your website. Left to their own devices, bots can generate duplicate content, perceive important pages as junk, index content that shouldn't serve as a user entry point, and cause many other issues. There are a number of tools at our disposal that let us make the most of bot activity on a website, such as the meta robots tag, robots.txt, the x-robots-tag, the canonical tag, and others.
Today, I'm covering robot control technique conflicts. In an effort to REALLY get their point across, webmasters will sometimes implement more than one robot control technique to keep the search engines away from a page. Unfortunately, these techniques can sometimes contradict each other: one technique hides the instructions of the other, or link juice is lost.
What happens when a page is disallowed in the robots.txt file and has a noindex meta tag in place? How about a noindex tag and a canonical tag?
Quick Refresher
Before we get into the conflicts, let’s go over each of the main robot access restriction techniques as a refresher.
Meta Robots Tag
The meta robots tag provides page-level instructions for search engine bots. It should be included in the head section of the HTML document and might look like this:
<html>
<head>
<title>Article Print Page</title>
<meta name="ROBOTS" content="NOINDEX" />
</head>
Below are the generally supported commands along with a description of their purpose.
NOINDEX: Prevents the page from being included in the index.
NOFOLLOW: Prevents bots from following the links on a page.
NOARCHIVE: Prevents a cached copy of the page from being available in the search results.
NOSNIPPET: Prevents a description from appearing below the page link in the search results, and prevents caching of the page.
NOODP: Prevents the Open Directory Project (DMOZ.org) description of the page from being displayed in the search results.
NOYDIR: Prevents the Yahoo! Directory title and description for the page from being displayed in the search results.
Canonical Tag
The canonical tag is a page-level tag placed in the HTML head of a webpage. It tells the search engines which URL is the canonical version of the page being displayed. Its purpose is to keep duplicate content out of the search engine index while consolidating your pages' strength into one 'canonical' page.
The code looks like this:
<link rel="canonical" href="https://example.com/quality-wrenches.htm"/>
X-Robots-Tag
Since 2007, Google and other search engines have supported the X-Robots-Tag as a way to inform bots about crawling and indexing preferences via the HTTP headers used to serve a file. The X-Robots-Tag is very useful for controlling the indexation of non-HTML media types such as PDF documents.
As an example, if a page is to be excluded from the search index the directive would look like this:
X-Robots-Tag: noindex
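If the file is served through a script rather than directly by the web server, the header can be set in code. Here's a minimal sketch in PHP (the file path and name are made up for illustration):
<?php
// Send the noindex instruction in the HTTP response headers,
// then stream the PDF itself. Headers must be sent before any output.
header('X-Robots-Tag: noindex');
header('Content-Type: application/pdf');
readfile('/var/www/files/whitepaper.pdf'); // hypothetical file path
?>
Most servers can also attach the header through configuration (Apache's mod_headers, for example), which is usually simpler for static files.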
Robots.txt
Robots.txt allows for some control of search engine robot access to a site; however, it does not guarantee a page won't be crawled and indexed. It should be employed only when necessary, and no robots should be blocked from crawling an area of the site unless there are solid business and SEO reasons to do so. I almost always recommend using the meta robots 'noindex' tag for keeping pages out of the index instead.
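For reference, a disallow entry looks like this (the /print/ directory is just an example):
User-agent: *
Disallow: /print/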
Avoiding Conflicts
It is a bad idea to use any two of the following robot access control methods at once.
- Meta Robots 'noindex'
- Canonical Tag (when pointing to a different URL)
- Robots.txt Disallow
- X-Robots-Tag
In spite of your strong desire to really keep a page out of the search results, one solution is always better than two. Let's take a look at what happens when you have various combinations of robot access control techniques in place for a single URL.
Meta Robots 'noindex' & Canonical Tag
If your goal is to consolidate one URL's link strength into another URL and you don't have any better solutions at your disposal, go with the canonical tag alone. Do not shoot yourself in the foot by also using the meta robots 'noindex' tag. If you use both bot herding techniques, it is probable that the search engines won't find your canonical tag at all. You'll miss out on the link strength reassignment benefit of a canonical tag because the meta robots 'noindex' tag has ensured that the canonical tag won't be seen! Oops.
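To make that concrete, here's a sketch of the head a print page might carry, reusing the example markup from earlier in the post: only the canonical tag, no 'noindex'.
<head>
<title>Article Print Page</title>
<link rel="canonical" href="https://example.com/quality-wrenches.htm" />
</head>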
Meta Robots 'noindex' & X-Robots-Tag 'noindex'
These tags are redundant. I can't see any way that having both in place for the same page would directly cause damage to your SEO. If you can alter the head of a document to implement a meta robots 'noindex', you shouldn't be using the x-robots-tag anyway.
Robots.txt Disallow & Meta Robots 'noindex'
This is the most common conflict I see.
The reason I love the meta robots ‘noindex’ tag is that it is effective at keeping pages out of the index, yet it can still pass value from the no-indexed page to deeper content that is linked from it. This is a win-win and no link love is lost.
The robots.txt disallow entry restricts the search engines from looking at anything on the page (including potentially valuable internal links) but does not keep the page's URL out of the index. What good is that? I once wrote a post on this topic alone.
If both protocols are in place, the robots.txt ensures that the meta robots ‘noindex’ is never seen. You’ll get the effect of a robots.txt disallow entry and miss out on all the meta robots ‘noindex’ goodness.
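To picture the conflict, imagine a setup like this (the path and file name are invented for illustration):
In robots.txt:
User-agent: *
Disallow: /print/
And on /print/article.htm:
<meta name="robots" content="noindex" />
The disallow rule stops bots from ever requesting /print/article.htm, so the meta tag on that page is never read.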
Below I'll take you through a simple example of what happens when these two protocols are implemented together.
Here is a screenshot from the Google SERP for a page that is disallowed in the robots.txt and also has a meta robots 'noindex' in place. The fact that it is in Google's index at all is your first clue of a problem.
Here you can see the meta robots 'noindex' page. Too bad the search engines can't see it.
Here you can see that the entire subdomain is disallowed in the robots.txt, ensuring that useful meta robots 'noindex' tags are never seen.
Assuming mail2web.com is sincere in its desire to keep everything out of the search engines, it would be better off using the meta robots 'noindex' exclusively.
Canonical Tag & X-Robots-Tag 'noindex'
If you can alter the <head> of a document, the x-robots-tag likely isn't your best route for restricting access in the first place. The x-robots-tag works better if you reserve it for non-HTML file types like PDF and JPEG. If you have both of these in place, I'd imagine that the search engines would ignore the canonical tag and fail to reassign link value as hoped.
If you are able to add a canonical tag to a page, you shouldn’t be using an x-robots-tag.
Canonical Tag & Robots.txt Disallow
If you have a robots.txt disallow in place for a page, the canonical tag will never be seen. No link juice passed. Do not pass go. Do not collect $200. Sorry.
X-Robots-Tag 'noindex' & Robots.txt Disallow
Because the x-robots tag exists in the HTTP Response Header, it is possible that these two implementations could intermingle and both be seen by the search engines. However, the statements would be redundant and the robots.txt entry would ensure that no links within the page would be discovered. Once again, we have a bad idea on our hands.
---------------------------------
Bonus Points!
I searched high and low for a live example to share here. I wanted to find a PDF that was both robots.txt disallowed AND noindexed with the x-robots-tag. Sadly, I came up empty handed. I'd have dug around all night, but this post had to go live at some point! Please, I beg you, beat me at my own game.
My process was as follows:
2. Start up your HTTP reader. I use HTTPfox.
3. Call up the robots.txt-disallowed PDF file and check the response headers for an X-Robots-Tag noindex entry (a quick command-line alternative is sketched below).
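If you'd rather skip the browser plugin, checking the response headers from the command line looks something like this (the URL is hypothetical):
curl -I https://www.example.com/docs/whitepaper.pdf
An X-Robots-Tag: noindex line in the output is what you're hunting for.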
Good luck! Let me know when you find one!
----------------------------------
The concept I've been driving at here is fairly straightforward. Don't go overboard with your robot control techniques. Choose the best method for the scenario and back away from the machine. You'll be much better off.
Happy Optimizing!
Good post, will be handy for anyone who wants to learn about SE access restrictions. Easy to read and understand =)
I must admit the award for the best robots.txt file I have ever seen goes to Rishi's website:
https://explicitly.me/robots.txt
Anyone seen a better robots.txt?
Oh yes! I remember this! Dear Google...! :)
Not as sophisticated as the Rishi one, but the SEOmoz one has a cool index in its robots.txt :D https://www.seomoz.org/robots.txt
WOW sir! Robots.txt is getting fancier day by day… Remember the Daily Mail SEO job advertisement in their robots.txt? That was a fabulous idea too…
What's the blank Disallow line achieving? Out of curiosity...
I don't think it's "better," but I do recall coming across a feature including this:
https://www.last.fm/robots.txt
Disallow: /harming/humans
Disallow: /ignoring/human/orders
Disallow: /harm/to/self
Good rules for GoogleBot to follow. :) I wonder how a search engine treats links to a robots.txt?
Haha, that robots.txt file is awesome! Great post too, Lindsay.
What's so funny about robots.txt? It's a must-have file on any website.
Thanks for this - genuinely. It's great to get a refocus - I've had conflicts recently on a client and it's caused issues. But all cleared up thanks to meta tags and a clean robots file! Thanks!
In my opinion, robots.txt is the most efficient way to mess up your entire SEO strategy. And even if you do it right, it still blocks the free flow of trust and PageRank on your site. So I agree that you should try to use other ways of controlling crawler access wherever possible.
To check in audits whether robots.txt does any harm to a site or blocks pages that it should not, I developed a little Firefox add-on. This add-on - roboxt! - shows in the status bar whether the current URL is blocked by robots.txt. The user can also choose in the preferences to mark internal links to blocked URLs and display the total number of blocked links on the current page.
If you are interested in using the add-on, you can download it here at Mozilla: https://addons.mozilla.org/en/firefox/addon/roboxt/ (a new version that is compatible with FF 6 is currently being reviewed by Mozilla and will be available soon). For further information, consult this short manual: https://nikolassv.de/roboxt-en/
Is there an english version of the plugin?
Yes, the plugin is translated and should detect your browser's language automatically. The correct link to the English version of its entry at Mozilla is: https://addons.mozilla.org/en/firefox/addon/roboxt/
(the link in my last comment still points to the de-URL)
Hello Lindsay
It's a great post on robots.txt. One thing I would like to add on the directive list: the correct command to keep the Yahoo! Directory title and description for the page out of the search results is NOYDIR, not NODIR. I hope everyone has it right on their list. Cheers
Lindsay, thanks for making some of the more technical aspects of SEO easy to understand. This post definitely explained some stuff that was "foggy" for me.
Lindsay -
Great post here! I love seeing a good technical SEO post here on the Moz blog.
This post was particularly actionable for me today with a client. This one is bookmarked for sure.
Cheers!
I've never thought about this before... Thanks.
Great post about robots.txt. Really, it will be very useful for SEO. Thank you, keep it up.
Thanks Lindsay. You have covered all technical things.
One more thing: what would happen if you put this entry into your robots.txt file?
User-agent: *
Disallow: /robots.txt
I know this is a crazy idea, but try it :)
Cheers..
Have you tried this yourself? What was the result?
This looks totally absurd. What would that line do?
Is it not simpler to just use the robots.txt?
With noindex you don't prevent Google from crawling your page, so why can't Google find a canonical if you use the two together?
Great post Lindsay :-)
I will add that 9 out of 10 times you should use:
<meta name="robots" content="noindex">
And not
<meta name="robots" content="noindex,nofollow">
Which is unfortunately a tag I see a lot.
Thanks for such an interesting post. I am regular reader of your blog and implement your valuable suggestions.
Great post Lindsay! Anyway, I share the same doubt as Marcoswidung, and a clarification from you on this would be much appreciated. I've often used robots.txt to stop the bot from crawling useless pages (often created by the CMS) to optimise crawling resources. Is this a good idea, or is meta robots noindex always the best option?
Thanks,
Ale
Hey Lindsay,
One issue which I haven't seen discussed, and one which still confuses me, is how to handle websites that are available via both http and https. I usually use PHP with an if/then statement like the one below to check whether https is on, and if so I add meta robots noindex tags. Is this a good strategy?
if (isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on') {
echo '<meta name="robots" content="noindex,nofollow">';
}
Hi Riona
You should typically use a 301 redirect in this case, to redirect users to the http version.
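A minimal sketch along the lines of your snippet (assuming the same $_SERVER variables are available; untested, just to show the idea):
if (isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on') {
    // Permanently redirect the https request to its http equivalent
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']);
    exit;
}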
I didn't think about the noindex. Good points. I'll have to bookmark this and check it out again later.
Great post Lindsay... Thanks for sharing very useful information...
Great read Lindsay,
This is all technical stuff that should be implemented on every website, and it will definitely resolve many problems in your SEO work. robots.txt and x-robots tags are vital, and you describe them very well here. Thanks.
A nice refresher on a useful topic, thanks!
I've been using robots.txt to exclude entries from the engines since I don't have the ability to actively edit the headers of my pages (gotta love Wordpress). It's been working well for me so far, though from what you say, it sounds like the meta robots tag would be vastly superior. Oh well, whatcha gonna do?
An excellent, concise post - thanks.
The value of managing robot access effectively cannot be overstated.
Until you've checked what's actually being indexed by Google (especially from a large website) you can easily fail to appreciate the amount of junk that can get indexed and used by search engines - to the detriment of your online objectives and goals.
This knowledge is one of the SEO-related things that very few "normal" people know and therefore separates a professional SEO from the Average Joe who has read one article on SEO.
Therefore you also need to know this if you work with SEO, as this is one of the reasons anyone should hire you.
In the future more and more people will know some basic SEO stuff (titles, descriptions, including keyword in content etc.). But the more technical stuff is not going mainstream anytime soon.
Great walkthrough.
Great description of the indexing process and of avoiding conflicts. I have just tweeted this to my clients and future clients, who will find it a particularly good point of reference.
Good post, though I miss a discussion of the potential use of robots.txt for avoiding spider traps. If your crawl budget is limited, it might be worth stopping spiders from even reaching certain pages and letting them spend more of the budget on pages that are more important from a landing page perspective, despite missing out on some link juice. For example, complex internal search URLs resulting from faceted navigation versus product pages. In those cases, add a parameter to the URLs you do not want crawled and block that parameter in robots.txt, along the lines sketched below. And make sure to noindex those pages as well.
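Something like this, with the parameter name invented just to illustrate (Google honours the * wildcard in robots.txt):
User-agent: *
Disallow: /*nocrawl=1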
I am not a fan of the robots.txt at all. I view it as a link juice brick wall, plus it doesn't even keep pages out of the index. Sure, it is easy to implement but does it really meet the goals of search marketing? Check out this post where I rant at length about the robots.txt and why it is pretty useless. https://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions. Would love to hear your thoughts.
I can't deny that I get a bit hesitant about implementing the so-called hatchet approach on our e-commerce site when reading your posts... But the reason I am considering it is our site's rather complex faceted navigation, which features 10 facets with approx. 10-15 values each. In combination with our catalogue of 4 million+ products, that faceted navigation exposes bots to hundreds of thousands (if not millions, I am not very good at math) of unique and often very complex URLs leading to pages featuring quite thin and in many cases practically identical content. What I am afraid of is that search bots get tired of following those URLs and bail out instead of eating our product pages. I would of course only disallow complex URLs (3+ or so facets) and URLs containing params/param values considered irrelevant from a landing page perspective.
Yes, there might be a few inbound links to those "far out" URLs but if we can make bots crawl important pages instead of useless pages, don't you think that we have gained something? Or do you consider the hatchet approach useless for "spider trap" prevention? You do mention those disallowed URLs still end up in the SE index.
I know there are other methods to avoid spider traps out there, but they require quite a lot of system development on our part.
This topic would make a great blog post and discussion. I'm not going to deny that crawler fatigue can become an issue with gigantic websites. In your case, your website produces a huge number of 'search results' pages. That is a concern all its own, because the search engines love to hate these types of pages, and hundreds of thousands of them are certainly capable of getting you in trouble.
Before you can make a decision about robot access methods for the site, you'd need to make sure that your deeper product pages are all accessible through alternate means.
Good point - I've seen a few people throw everything but the kitchen sink at Google, and a lot of times these tactics actually impede each other and slow down what you're trying to accomplish. It can be frustrating to wait, but doing it wrong can take a whole lot longer than having a little patience.
I'd just add that, while it's not an ideal solution, Google Webmaster Tools parameter blocking is another option, if you're really in a bind. I don't think it's a good, long-term approach, but it's fast. If you do something bad and end up indexing a ton of URL-based duplicates, I sometimes recommend it.
Great post! Personally, I've always used the robots.txt file to exclude files/pages/etc. I have a couple questions for you regarding the robots.txt file:
Thanks everybody for any feedback you can give me.
-REF
Hey Rob,
You will want to use the following:
User-agent: *
Disallow:
What a great little rundown - good stuff yet again. I am a firm believer that the canonical tag is the most underused and neglected tag for something so effective. I wish people would use it more, and correctly :)
Nice post Lindsay :-) very informative and I'm sure it will be linked to by many :-)
I especially liked your take on the Robots.txt Disallow & Meta Robots 'noindex'
This post is exactly what I needed to read. I have felt that the more I researched robots.txt vs meta robots 'noindex', the more confusing I made a rather simple topic. Thanks again.
This issue is more extensive than it looks; no doubt there is always something to learn in SEO. I think this is a complete guide to indexing and restriction techniques. Excellent recommendations.
Lindsay, good write-up. Keep in mind that search engines can choose not to follow your robots.txt or canonical directives. For example, the root of a gov site was entirely disallowed with robots.txt, there were thousands of links pointing to it, and Google decided to index it anyway, for the sake of user experience.
Generally, you can say that robots.txt Disallow & Meta Robots 'noindex' should not be combined, but one can back up the other in such cases.
Cheers!
Nice Post!
Robots.txt is one of the most important tools, used by almost every webmaster, but there is a lot of confusion because the different tags have almost similar functionality (or at least it looks that way). The post defines clear differences between the tags and how to work with them…
The only thing I wanted to add from my experience: understand the scenario and use one tag only… using multiple tags may hurt you instead of saving you…
Overall a great read!
Yes, robots.txt is a good topic. When a search engine crawler comes to your site, it looks for a special file called robots.txt, which tells the search engine spider which web pages of your site should be indexed and which should be ignored. But <meta name="ROBOTS" content="NOINDEX" /> provides a good way too.
Here are some common robots meta tags:
<meta content="NOINDEX" name="ROBOTS"> - Ignore content and follow links
<meta content="NOFOLLOW, INDEX" name="ROBOTS"> - Include content and do not follow links
<meta content="NOINDEX,NOFOLLOW" name="ROBOTS"> - Ignore content and do not follow links
<meta content="INDEX,FOLLOW" name="ROBOTS"> - Include content and follow links
<meta content="NOARCHIVE" name="ROBOTS"> - Do not show a cached link for the page in the search results
<meta content="NOODP" name="ROBOTS"> - The Open Directory Project (ODP) title and description for the page should not be displayed in the search results
<meta content="NOYDIR" name="ROBOTS"> - The Yahoo! Directory title and description for the page should not be displayed in the search results
<meta content="NOSNIPPET" name="ROBOTS"> - Only the title is displayed in the search results, not the description or text snippet for the page