The Robots Exclusion Protocol (REP) is a conglomerate of standards that regulate Web robot behavior and search engine indexing. Despite the "Exclusion" in its name, the REP covers mechanisms for inclusion too. The REP consists of the following:
- The original REP from 1994, extended in 1997, which defines crawler directives for robots.txt (see the example after this list). Some search engines support extensions such as URI patterns (wildcards).
- Its extension from 1996, which defines indexer directives (REP tags) for use in the robots meta element, also known as the "robots meta tag." Meanwhile, search engines support additional REP tags delivered via the X-Robots-Tag HTTP header, so webmasters can apply REP tags in the HTTP headers of non-HTML resources like PDF documents or images.
- The Sitemaps protocol from 2005, which defines a procedure for submitting content to search engines in bulk via (XML) sitemaps.
- The microformat rel-nofollow from 2005, which defines how search engines should handle links where the A element's rel attribute contains the value "nofollow." Also known as a link condom.
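For illustration, here is a minimal robots.txt sketch; the paths are hypothetical, and the wildcard line only works with engines that support that proprietary extension:

    # https://example.com/robots.txt
    User-agent: *
    # Original 1994/1997 syntax: simple path-prefix exclusions
    Disallow: /admin/
    Disallow: /cgi-bin/
    # URI pattern (wildcard), a proprietary extension supported by e.g. Google and Yahoo
    Disallow: /*?sessionid=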
It is important to understand the difference between crawler directives and indexer directives. Crawlers don't index or even rank content. Crawlers just fetch files and script outputs from Web servers, feeding a data pool from which indexers pull their fodder.
Crawler directives (robots.txt, sitemaps) suggest to crawlers what they should crawl and what they must not crawl. All major search engines respect these suggestions, but they might interpret the directives slightly differently and/or support home-brewed proprietary syntax. All crawler directives implicitly allow indexing, which means that search indexes can and do list uncrawlable URLs on their SERPs, often with titles and snippets pulled from third-party references.
All indexer directives (REP tags, microformats) require crawling. Unfortunately, there's no such thing as an indexer directive at the site level (yet). That means that in order to comply with an indexer directive, search engines must be allowed to crawl the resource that provides it.
Unlike robots.txt directives, which can be assigned to groups of URIs, indexer directives affect individual resources (URIs) or parts of pages, such as (spanning) HTML elements. That means each and every indexer directive is strictly bound to a particular page or other web object, or to a part of a particular resource (e.g., an HTML element).
Because REP directives relevant to search engine crawling, indexing, and ranking are defined on different levels, search engines have to follow a kind of command hierarchy:
Robots.txt: Located at the web server's root level, this is the gatekeeper for the entire site. In other words, if any other directive conflicts with a statement in robots.txt, robots.txt overrules it. Usually search engines fetch /robots.txt daily and cache its contents, which means changes don't affect crawling instantly. Submitting a sitemap might clear and refresh the robots.txt cache, in which case the search engine should fetch the newest version of the file.
(XML) Sitemaps: Sitemaps are machine-readable URL submission lists in various formats, e.g., XML or plain text. XML sitemaps offer the opportunity to set a couple of URL-specific crawler directives (better described as hints for crawlers), such as a desired crawling priority or a "last modified" timestamp. With video sitemaps in XML format, it's possible to provide search engines with metadata like titles, transcripts, or textual summaries, and so on. Search engines don't crawl sitemap submissions that are restricted by robots.txt statements.
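To illustrate, a minimal XML sitemap sketch with the per-URL crawler hints mentioned above; the URL and values are hypothetical:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://example.com/products/widget.htm</loc>
        <lastmod>2008-01-10</lastmod>       <!-- "last modified" timestamp -->
        <changefreq>weekly</changefreq>     <!-- expected change frequency, a hint -->
        <priority>0.8</priority>            <!-- desired crawling priority, a hint -->
      </url>
    </urlset>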
REP tags: Applied to a URI, REP tags (noindex, nofollow, unavailable_after) steer particular tasks of indexers, and in some cases (nosnippet, noarchive, noodp) even query engines at search query runtime. Unlike crawler directives, REP tags are interpreted differently by each search engine. For example, Google wipes out even URL-only listings and ODP references on their SERPs when a resource is tagged with "noindex," whereas Yahoo and MSN sometimes list such external references to forbidden URLs on their SERPs. Since REP tags can be supplied in META elements of X/HTML content as well as in the HTTP headers of any web object, the consensus is that the contents of X-Robots-Tags should overrule conflicting directives found in META elements.
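In the HEAD section of an X/HTML page, a sketch of the meta element form:

    <meta name="robots" content="noindex, noarchive">

And the equivalent as HTTP response headers of any web object, e.g., a PDF (the unavailable_after date format follows Google's announcement; the date itself is just an example):

    X-Robots-Tag: noindex, noarchive
    X-Robots-Tag: unavailable_after: 15-Jan-2009 00:00:00 GMT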
Microformats: Indexer directives put in microformats overrule page-level settings for particular HTML elements. For example, when a page's X-Robots-Tag states "follow" (or simply carries no "nofollow" value), the rel-nofollow directive of a particular A element (link) wins.
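A hypothetical snippet to illustrate that precedence: the page-level directive allows following, yet the single condomized link is not followed or credited:

    <meta name="robots" content="index, follow">
    ...
    <a href="https://example.com/vouched/">followed link</a>
    <a href="https://example.com/not-vouched/" rel="nofollow">condomized link</a>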
Although robots.txt lacks indexer directives, it is possible to set indexer directives for groups of URIs with server-side scripts acting at the site level that apply X-Robots-Tags to requested resources. This method requires programming skills and a good understanding of web servers and the HTTP protocol.
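One common way to do this (a sketch, not the only possible approach) is an Apache configuration with mod_headers enabled, which sets the header for matching resources sitewide; the file extensions and directive values are just examples:

    # httpd.conf or .htaccess; requires mod_headers
    <FilesMatch "\.(pdf|doc)$">
      Header set X-Robots-Tag "noindex, noarchive"
    </FilesMatch>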
For more information, syntax explanations, code examples, tips and tricks, etc., please refer to these links:
- Crawler/indexer directives
- robots.txt
- robots.txt 101
- REP tags in META elements
- X-Robots-Tags
- Microformats
- URL removal / deindexing
- Sitemaps
Sebastian, January 15, 2008
Sebastian, I have no idea why out of every complicated issue on the internet you decided to latch onto Robots.txt files.
But I'm glad you did. Although I believe it to be somewhat insane.
Heh whatever. Awesome post.
Probably because robots.txt is a popular topic at the moment, due to Google's experiments with REP tags for robots.txt. I think that Webmasters should care, because changes that Google can standardize will stick, although they're pretty much weird (IOW not REP compliant) in their current experimental stage.
Geek stuff is "insane" by design. ;)
Thanks!
well done, sebastian. thanks for shedding some light on an important topic. in my experience, strategic use of robots.txt can be a game-changing step in consolidating a site's link equity and putting important pages in a better position to rank.
i always do whatever is necessary to educate clients that times have changed since https://www.robotstxt.org came online.
specifically - it's true that, strictly speaking, robots.txt doesn't allow wildcards. however, both google and yahoo support pattern matching as an extension of the standard:
https://www.google.com/support/webmasters/bin/answer.py?answer=40367&topic=8846
https://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html
with that in hand, you can do a *heap* of powerful stuff to help clients rank better.
Heh popular subject? I need to check out your blogroll more then, since I've seen very little about it (Except from you).
Oh well, looks like I'm getting educated none-the-less.
You're right. Popularity is based on the size of the crowd, and the crowd is small in this case.
Sebastian's site is rapidly becoming one of the best resources on the net for many SEO and coding related topics.
Don't just read the robots.txt stuff that he has posted. Go browse the rest of his site. Without delay. Go on...
Thank you for the compliment. :)
It is not an idle compliment. Your recent post on getting URLs outta Google - the good, the popular, and the definitive way was also excellent.
You have a talent for imparting detail without becoming overly verbose, of which I am more than a little jealous.
Sebastian,
nice job tackling a challenging subject. Great seeing additional information on this "ancient" -- at least in Internet years -- but still very important protocol.
Still amazing how many sites haven't implemented this (even one that disallows nothing, just to avoid 404 errors throwing off their analytics), or have implemented it incorrectly . . . such as accidentally disallowing their entire site, or not including blank lines between disallow records (though technically specified, I'm hoping that the robots have become savvy enough to overlook this issue . . . but who knows).
One statement kind of threw me here, so hoping you can clarify...
For example, using mydomain.com/mypage.htm...
if robots.txt says to disallow mypage.htm, then the robots won't crawl that page. So this would be the case where the meta robots tag on that page says to index or follow (which would be unnecessary anyway), as the meta robots tag won't be seen anyway.
but if robots.txt has no related directive, or even an allow directive (which I'd typically not recommend, as it may not be recognized by all crawlers, though if memory serves me correctly it has finally been recognized by all the majors... that didn't used to be the case), and mypage.htm has a noindex meta, then it seems the statement above would say that robots.txt wins and the page would be indexed.
Maybe I misunderstood, but this would not seem correct, as the meta directive should overrule the txt at that point to give greater control at the page level.
This was by design as the protocol, as mentioned, was designed to be exclusionary. Prior to the allow directive or wildcard pattern matching, all one could do was to disallow, therefore, control was based on where you disallowed.... through robots.txt at the site, directory, or file level, or through meta at the file level.
"Disallow" is a crawler directive, "noindex" is an indexer directive. "Disallow" doesn't disallow indexing, just forbids crawling. If you want to make sure that search engines comply to your indexer directives, you must allow crawling.
Currently you can't restrict indexing in robots.txt. Also, the lack of a "Disallow" statement for a particular URI doesn't mean "index it even when the page has a 'nofollow' tag".
Crawlers don't index stuff, and follow only crawler directives. Indexers (and query engines) don't crawl stuff, and follow only indexer directives.
If a page is disallow'ed, robots.txt wins because search engines can't spot the indexer directives on the page.
If you submit a disallow'ed URL via XML sitemap, robots.txt wins and the engines won't crawl it. In theory they could list the URL picked from the sitemap on the SERPs, in fact that happens only when the uncrawlable URL has strong inbound links.
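In other words, with a hypothetical /mypage.htm:

    # robots.txt
    User-agent: *
    Disallow: /mypage.htm    # crawling forbidden

    <!-- in mypage.htm -->
    <meta name="robots" content="noindex">
    <!-- never fetched, so the noindex is never seen; the URL can still show up
         as a reference-only listing if it has strong inbound links -->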
@Sebastian:
rel="nofollow" is a debated format for microformats. The microformats community did not develop it, it was Google's team.See (Specification, Abstract and Open Issues):
https://microformats.org/wiki/rel-nofollow
And this:
https://www.seomoz.org/blog/12-ways-to-keep-your-content-hidden-from-the-search-engines#jtc46076
(also see Rand's 11th point)
[It appears to be that there is something about rel="nofollow" daily here on seomoz :)]
Yep, rel-nofollow in its original shape is debated at Microformats, but due to enough adopters it's a settled de facto standard. BTW, only Google doesn't discover new stuff from condomized links; other engines (Yahoo/MSN) just don't pass reputation, or ignore it totally (Ask).
It seems the CSS lacks support for DL/DT/DD elements.
Thanks for the edits. :)
Sebastian - it frightens me to think what you put online you need to know so much about how to keep it all away from prying robots.... :)
Another great post about robots. Learned more about Robots in the last couple of posts from Sebastian than on any other site previously.
Shaun, the way more interesting side of the REP is steering search engines the other way round.
"video sitemaps"
I was unaware of that. Worth the price of admission. Thanks.
Thumbs up.
I like the idea of mobile sitemaps.
I agree, that's awesome.
What about the robots-nocontent attribute for marking content at the block/element level? Although I think this is only implemented by Yahoo! - plus it doesn't really block content as such - but thought it should probably be worth a mention. Yahoo has a help page on it: https://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-14.html
Only 500 sites on the whole Web have implemented it, probably because it's, well, politely put, unusable.
The idea is neither new nor bad, but Yahoo's implementation turned out to be a miserable failure. Kudos to Yahoo for trying it, but they should have thought about it for more than half a second before the launch. That's why the block/element-level row in the image above mentions neither Yahoo's robots-nocontent class name nor Google's section targeting.
I'm sorry, to explain the flaws I'd need to write a book and I fear the tiny editor can't eat that. Here is at least a rant.
Excellent post, a little high in the clouds for me, but looking forward to checking out your website!!!
In your diagram you talk about page-level use of a nofollow. If a site was already built and I needed to nofollow 3 subpages (such as about us, contact us & shipping info), would you recommend nofollowing all the links on the site pointing to these 3 pages (which is a lot of work), or can it be done at a page level instead?
And if I can do it at a page level, will I encounter any problems versus the link-level change?
The "nofollow" directive tells search engines not to follow links on an entire page (meta element, X-Robots-Tag), respectively particular links (rel-nofollow) on a do-follow'ed page. The "nofollow" is applied to the link destination, not to the page carrying the links. That means nofollow'ing a contact page condomizes all its outgoing links, but has absolutely no impact on incoming links.
Currently you've no other choice than adding a rel-nofollow to all A elements that point to your contact page. That's a shitload of work, and error-prone.
There's an unofficial way to accomplish that at the site level with Google, but I won't recommend it because this method blocks crawling and indexing, besides marking the contact page as a dangling node that doesn't suck PageRank.
I've developed a flexible and safe method suitable to accomplish that, but I don't know whether or not the search engines will implement it (anytime soon). A few SE engineers discuss my draft, that's all I can tell so far.
Sebastian,
This is a great post explaining the technical aspects of handling robots. I will surely mark this one as a reference.
That being said, in a world of SEO 2.0, tags, blogs, universal search, RSS and syndication, the real and main SEO challenge gets more and more technical: avoiding duplicate content and guiding the robots to the places that are important to us. I think in that field the whole industry is lacking some good practical examples, such as: what content do we block? From whom? When? When do we use nosnippet, and where do we use a meta nofollow?
I think some insights to that area may really leverage the good work you started.
OC
Thanks for your post. You really know what you're talking about
Thanks, helped me explain it easily to someone!
The very recent draft for HTML 5 now includes some stuff about the nofollow attribute.
this is a fantastic post, very informative and inclusive!
Sebastian is the robots.txt God. I hope you get your proposed robots.txt directives, Sebastian, because I know you won't sleep until you do :)
Sebastian: I didn't know about your website, but now I have seen it and I have to say that it is great! From now on, I will have that in my frequently visited sites. :-)
thanks, it was a useful post
Fantastic resource, Sebastian, thanks! Rarely do you come across a good guide like this. The information regarding which directives "win" over others is also great.
Thanks Jane. :) Please keep in mind that search engines might change the rules at any time without notice. All REP standards are non-binding recommendations, or rather suggestions. Currently all engines have a different take on the REP; IOW, each SE maintains a proprietary REP implementation. I wouldn't be surprised if, for example, tomorrow MSN or Ask started to support X-Robots-Tags but insanely decided that robots meta elements deserve priority over HTTP headers. Closely monitoring such changes is a good idea.
sometimes you make my head spin with your pamphlets, but that's okay. i love it. this is great stuff sebastian, thanks.
Thanks and sorry for the headaches. I try to simplify things, but sometimes I fail when the topic is somewhat complex.
Excellent post, thanks.
@all Thanks for your thoughts. :)
An awesome post - please sphinn it, mozzers!
Hi Sebastian, great job.
I see you made the avatar switch here at SEOmoz :)
Hi Pat, thanks, and yes, the real red crab looks better.
Thanks, another great guide to expand my knowledge of robots.txt, though it is tough to keep up with the latest trends and work. Anyway, I'll keep reading; I need more caffeine.
Very interesting, I never had really looked into all the inner workings of robots/crawlers.
I did have a question about crawlers and shopping carts. I have a client who was having issues with crawlers putting hundreds of thousands of dollars into his cart and, of course, abandoning it. This messes up our ability to accurately determine what his true cart abandonment rate is. This month we got 55 carts abandoned in one day for $150,000 total.
So I created a robots.txt file and uploaded it. I was pretty sure that I formatted it correctly, but it still didn't work. So I nofollowed the Add to Cart buttons, and we are still getting a burst of cart abandonments once or twice a month.
www.fs4sports.com/robots.txt if someone wants to peek at it and maybe point out something I missed.
You should disallow the checkout URLs too. But that's not the problem: not all Web robots obey robots.txt. Log every HTTP request for the URIs in question, recording IP, host name, user agent and such. Then check these reports frequently for bots and block those from the shopping cart with a server-side script that serves them a 401 or 403 HTTP response code.
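A rough sketch of that last step in Python (the user-agent patterns are made up; you'd build your own list from the logs):

    import re

    # Hypothetical blacklist of user-agent substrings collected from the access logs
    BAD_BOTS = re.compile(r"(bot|crawler|spider|slurp|libwww-perl|wget|curl)", re.I)

    def cart_status_for(user_agent: str) -> int:
        """Return the HTTP status the cart/checkout script should send."""
        if not user_agent or BAD_BOTS.search(user_agent):
            return 403  # forbidden: don't let robots create or modify carts
        return 200      # normal visitors proceed as usual

    # Example: status = cart_status_for(request_headers.get("User-Agent", ""))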