Here at SEOmoz, we're usually talking about how to make your content more visible to the search engines. Today, we're taking a different direction. It may seem unusual, but there are plenty of times when content on your website needs to be protected from search indexing and caching. Why?
- Privacy
There are countless reasons to want your content protected from direct search traffic, from private correspondence to alpha products to material behind registration or credential requirements.
- Duplicate Content Issues
If you serve up content in multiple formats (print-friendly pages, Adobe PDF versions, etc.), it's typically preferable to have only a single version showing to the search engines.
- Keyword Cannibalization
We've written a detailed post about how to solve keyword cannibalization, but in some cases, blocking spiders from certain pages or types of pages can help the process and ensure that the most relevant, highest-converting pages rank for the query terms.
- Extraneous Page Creation
There are inherent problems with creating large numbers of pages with little to no content for the search engines. I've covered this before, talking about the page bloat disease and why you should eliminate extraneous pages. Si's post on PageRank also does a good job of showing why low-value pages in the index can cause problems. In many cases, the best practice with purely navigational or very thin-content pages is to block indexing but allow crawling, which we'll discuss below.
- Bandwidth Consumption
Concerns about overuse of bandwidth can inspire some site owners to block search engine activity. This can hamper search traffic unless you're careful about how it's applied, but for those extra-large files that wouldn't pull in search traffic anyway, it can make good sense.
So, if you're trying to keep your material away from those pesky spiders, how do you do it? Actually, there are many, many ways. I've listed a dozen of the most popular below, but there are certainly more. Keep in mind that tools like Moz Pro's Site Crawl can help you see where these techniques are in play (intentionally or not) on your own site; you can check it out with a free trial if you're curious.
- Robots.txt
Possibly the simplest and most direct way to block spiders from accessing a page, the robots.txt file resides at the root of any domain (e.g., www.nytimes.com/robots.txt) and can be used to disallow spider access to pages or directories. More details on how to construct a robots.txt file and the elements within can be found in the Google Sitemaps blog post "Using a Robots.txt File," and Ian McAnerin's Robots.txt Generator Tool can save you the work of creating the file manually. UPDATE: I'm adding a link to Sebastian's excellent post on robots protocols and limitations, which gives a more technical, in-depth look at controlling search engine bot behavior.
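As a quick sketch of the syntax (the directory names below are placeholders, not recommendations), a simple robots.txt might look like this:

  # Keep all compliant crawlers out of the print and member areas
  User-agent: *
  Disallow: /print/
  Disallow: /members/

  # Give one specific crawler its own rule set (it will ignore the * group above)
  User-agent: Googlebot
  Disallow: /staging/

Remember that Disallow is a crawling directive rather than an indexing one - a disallowed URL can still appear in results if other signals point to it, a distinction covered in depth in Sebastian's post linked above.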
- Meta Robots Tag
The Meta Robots tag also enables blocking of spider access at the page level. By employing "noindex," your meta robots tag tells search engines to keep that page's content out of the index. A useful side note: the meta robots tag can be particularly valuable on pages where you'd like search engines to spider and follow the links on the page but refrain from indexing its content. Simply use the syntax <META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> and the engines will follow the links while excluding the content.
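To make that concrete, here's a minimal sketch of where the tag lives (the page title is just a placeholder; lowercase syntax works the same way):

  <head>
    <title>Thin navigation page</title>
    <!-- Keep this page out of the index, but let spiders follow its links -->
    <meta name="robots" content="noindex, follow">
  </head>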
- Iframes
Sometimes, there's a certain piece of content on a webpage (or a persistent piece of content throughout a site) that you'd prefer search engines didn't see. In this event, clever use of iframes can come in handy, as the diagram below illustrates:
The concept is simple: by using an iframe, you can embed content from another URL onto any page of your choosing. By then blocking spider access to the iframe's source URL with robots.txt (which works when the framed content lives on a domain whose robots.txt you control), you ensure that the search engines won't "see" this content on your page. Websites may do this for many reasons, including avoiding duplicate content problems, lessening the page size for search engines, and lowering the number of crawlable links on a page (to help control the flow of link juice).
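Here's a minimal sketch of the pattern, assuming the framed content lives in a hypothetical /noindex/ directory on your own domain:

  <!-- On the page you want indexed: pull the repeated block in via an iframe -->
  <iframe src="/noindex/sidebar-content.html" width="300" height="250"></iframe>

  # In robots.txt: keep spiders away from the framed source
  User-agent: *
  Disallow: /noindex/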
- Text in Images
The major search engines still have very little capacity to read text in images (and the processing power required makes for a severe barrier). Thus, even after this post has been spidered by Google, Yahoo!, and Live, the word below will have 0 results:
Hiding content inside images isn't generally advisable, as it can be impractical on alternative devices (mobile in particular) and inaccessible to users of assistive technologies such as screen readers.
- Java Applets
As with text in images, the content inside Java applets is not easily parsed by the search engines, though using them as a tool to hide text would certainly be a strange choice.
- Forcing Form Submission
Search engines will not submit HTML forms to access the information returned by a search or submission. Thus, if you keep content behind a forced form submission and never link to it externally, your content will remain out of the engines (as the illustration below demonstrates).
The problem, of course, is when content behind forms earns links outside your control, as when bloggers, journalists, or researchers decide to link to the pages in your archives without your knowledge. Thus, while form submission may keep the engines at bay, I'd recommend that anything truly sensitive have additional protection (through robots.txt or meta robots, for example).
- Login/Password Protection
Password protection of any kind will effectively prevent search engines from accessing content, as will any form of human-verification requirement such as a CAPTCHA (a box that requires copying a letter/number combination to gain access). The major engines won't try to guess passwords or bypass these systems.
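For example, on an Apache server, HTTP Basic Authentication is a few lines of .htaccess (the file path and realm name below are placeholders):

  # .htaccess in the directory you want to protect
  AuthType Basic
  AuthName "Members Only"
  AuthUserFile /home/example/.htpasswd
  Require valid-user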
- Blocking/Cloaking by User-Agent
At the server level, it's possible to detect user agents and restrict their access to pages or websites based on their declaration of identity. As an example, if a website detected a rogue bot called twiceler, you might double-check its identity before allowing access.
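A rough sketch of how this looks in Apache config (the bot name is illustrative, and keep in mind that user-agent strings are easily spoofed):

  # Flag requests whose User-Agent contains "twiceler" and deny them
  SetEnvIfNoCase User-Agent "twiceler" bad_bot
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot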
- Blocking/Cloaking by IP Address Range
Similarly, access can be restricted for particular IP addresses or ranges. Most of the major engines crawl from a limited number of IP ranges, making it possible to identify them and restrict access. This technique is, ironically, popular with webmasters who mistakenly assume that search engine spiders are spammers attempting to steal their content, and thus block the IP ranges to restrict access and save bandwidth.
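In Apache, the equivalent sketch for an IP range looks like this (192.0.2.0/24 is a reserved documentation range, standing in for whatever range you've identified):

  # Deny a whole IP range (older Apache 2.2-style access control)
  Order Allow,Deny
  Allow from all
  Deny from 192.0.2.0/24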
- URL Removal
A secondary, post-indexing tactic, URL removal is possible at most of the major search engines through verification of your site and the use of the engines' tools. For example, Yahoo! allows you to remove URLs through their Site Explorer system, and Google offers a similar service through Webmaster Central.
- Nofollow Tag
Just barely more useful than the twelfth method listed here, using nofollow technically tells the engines to ignore a particular link. However, as we've shown with several of the other methods, problems can arise if external links point to the URLs in question, exposing them to search engines. My personal recommendation is never to use the nofollow tag as a method to keep spiders away from content - the likelihood is too high that they'll find another way in.
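For reference, the attribute looks like this (the URL is a placeholder) - it's a hint about the link itself, not a lock on the destination page:

  <a href="/private-archive/" rel="nofollow">Archive</a>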
- Writing in Pig Latin
It may come as a surprise to learn that none of the major engines have a Pig Latin translator. Thus, if you'd like to keep some of your content from being seen in a search query, simply encode and publish. :) For example, try searching for the English version of the following phrase and you'll see no results: "Elcomeway otay Eomozsay Istermay Orgelsprockenmay!" (at least, until someone translates it in the comments below).
Hopefully, these tactics help you understand how to hide content from the engines effectively. As always, feel free to chime in with comments, questions, or opinions.
I think it is important to note that if you absolutely don't want a page or content crawled/indexed, etc. - you can't just rely on one or even two of these methods.
I've seen too many people think a "rel=nofollow" or form submit helps, only to have someone from the outside link to the page. And robots.txt can't be relied on completely, since not all search engines follow the Robots Exclusion Protocol the same way. It also seems to be "ignored" at weird and inconvenient times.
And if you publish a full RSS feed - that brings up other issues as well, especially if the content is scraped/archived by someone else.
If you have a lot of content you want readily available on the web (no passwords required), but not available in search engines - it would almost take a full-time anti-SEO initiative employing at least a handful of these techniques just to be safe.
It is one of the mysteries of search engines that as hard as it can be to get some pages indexed, it can be just as tricky to get others never indexed in the first place.
I've never really experimented with content where it was critical that it didn't appear in SEs. Interesting to hear that it's so hard.
I had a 'Disallow: /wp-login.php' declaration on a blog's robots.txt file. Somehow it still got indexed in Google, it has since fallen out, but it took a while.
I don't know why someone doesn't make a default robots.txt file for the WordPress install.
Do a Google search on inurl:wp-login.php and look at all the "page bloat" (and probable leaking link juice). It's only about 500K pages, not even a smidgen when talking about the Internet - but still.
I also would like to have an official definitive list of all the ways Google will first find your page.
A while back my son and I made a little page on Brinkster for him to mess around on. It wasn't a common URL at all (vinnymyster.com). Neither of us linked to it from any other site. So it shouldn't have just been out there, and there seemed to be no need for a robots.txt file or any other exclusions.
A month later I was doing a Google search on his gamer tag (VinnyMyster) and the site popped up in Google. I did a link:vinnymyster.com search and nothing was returned. I went to Yahoo!, and the site was nowhere to be found.
Someone on a forum said that Google might have picked it up because my host or registrar might send out some RSS feed of "new sign-ups," or that Google could put newly registered domain names in a queue.
I think it was because he built the site using almost 100% includes from other sites (he was trying to replicate a MySpace page), and that one of those sites gave some kind of link back to all sites calling it - and Google somehow picked that up.
This was a while ago and I never did quite figure it out.
I guess the point is: you can never know 100% where Google might be finding its incoming links, so make sure you stay on top of it.
I always used to be paranoid about the Google Toolbar finding sites that didn't have any links to them...
Good points, vingold. I have also seen that it can take a long, long time to get content out of the SEs once it's already been indexed. So if you plan on hiding some of your content, it's best to do it right away rather than waiting.
Part of the problem nowadays is with the authority of community sites. It's amazing how quickly a personal page can get picked up just because you linked to it on Facebook, MySpace, LiveJournal, etc.
I've also found that password-protected areas are about the only surefire method.
A "Disallow:" statement in robots.txt doesn't prevent from indexing by design. In fact, "Disallow:/" means "Don't crawl it, but feel free to index" it. That's perfectly compliant to all related Web standards. See my comment above.
Sebastian, excellent! Your post certainly clears a lot of things up.
I need to print it out, read it, let it simmer in my head, rinse and repeat.
It is a very definitive breakdown of the whole no-indexing issue.
So much for what I picked up in the forums.
Rand, your REP methods (robots.txt, robots meta element) need some nitpicking clarification. ;)
Currently we have no indexer directives for robots.txt. Crawler directives like "Disallow:" do not prevent indexing based on third-party signals or even internal links pointing to disallowed content. Also, disallowed URLs waste PageRank. If you want to deindex stuff, you shouldn't block it in robots.txt, because all indexer directives require crawling.
Only Google obeys the "noindex" directive in robots meta tags as well as X-Robots-Tags (I agree with Joost - REP tags in the HTTP header are way sexier than meta elements on the page, and they work with PDFs and other non-HTML content types too). Yahoo! as well as MSN do index references to URLs carrying a "noindex" REP tag.
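For anyone who hasn't used the header version, a minimal sketch for an Apache server with mod_headers (the file match is just an example) would be:

  # Send a noindex directive in the HTTP response header for every PDF served
  <FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex"
  </FilesMatch>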
Selfish but relevant link drop:
Getting stuff out of Google - the good, the popular, and the definitive way
Thanks for your time
Sebastian
Excellent points, Sebastian, and no worries on the link drop - we certainly appreciate them when relevant :)
I'll try to update the piece with your notes!
I've submitted additional REP info to YOUmoz.
Thanks
Sebastian
Sebastian - terrific! Perhaps I can simply link over to your post to help guide the more complex aspects of noindex and disallow.
Thanks, that's how I understand it. I was wondering why you mentioned noFollow in the post. :)
Thanks Rand, I'd certainly appreciate the link. :)
By the way, it seems my REP post is stuck in the YOUmoz moderation queue.
Yahoo! Search added support for the X-Robots-Tag directive back at the beginning of December, and to date I have had no experience of them failing to adhere to the tag (and I have just checked my test-beds).
If you have definite examples of Slurp or the Yahoo! index disobeying the Robots Exclusion Protocol headers then I would be interested in seeing them.
Great Stuff there Sebastian. Thanks for the link.
And I think I deserve an extra moz point to find this spelling mistake -
11. the liklihood is too high :)
Sorry, I was a bit late in editing this post.
The search engines may not have Pig Latin translators; however, Google has translated its own search engine into Pig Latin for you:
https://www.google.com/intl/xx-piglatin/
:)
edit: to insert link correctly
There is another method I have seen in use on some websites. They make all the links to these pages unreadable with JavaScript. The links are obfuscated, so that search engines may not even detect that they are links to other pages.
Hey Rand,
Belated comment here as the comment about iframes sparked off an idea that I had to follow through on first!
So thanks for a very timely post, which just may have saved me a very large headache :)
Eight years!!! This info needs updating now. The other thing is, I need a solution for how to avoid unwanted keywords in Google Webmaster Tools.
I wouldn't rely on robots.txt alone. I think using all the preventative measures you can (robots.txt, meta noindex/nocrawl, and rel=nofollow) is the best route.
Thanks Rand,
Again, great post.
I would like to know your trick for using robots.txt to block an in-page (but external-URL) iframe. I'm sure this is not possible, as the robots standard, for obvious reasons, ignores references to external URLs.
Thanks for these 12 ways to keep content hidden from the SEs. This is really great for me as a beginner in SEO. Good work, and keep it up.
Great post. First we learn how to get content into the search engines, and now how to keep content out of their way.
Hello, I know this is an old blog post, but I had a quick question with respect to iframes. I have a persistent piece of content on my site that I do not want Google or any other search engine to index, and I have included that content in an iframe as mentioned in this post. The URL from which the iframe derives its content is set to noindex, nofollow, but the content is still indexed by search engines. Could you please tell me a way to solve this problem? Thanks!
Hey, I'd like a clarification. I have a blog with a directory that I don't want indexed, so I put it in robots.txt, but Google is still showing results for it with a note that the description is not available due to the site's robots.txt. Why is it indexed? Can you help me? Thanks in advance.
Ever since the Phoenicians invented money the matter has had a simple solution: popular demand. The companies that bother you with unsolicited information when using a search engine do so expecting some return for what they pay for the advertising. We make it a custom to preclude doing business with entities that show up during our searches and interfere with our main goal. The search engines providers then might find it beneficial to put the filters themselves in order to protect the interests of their paying clients. Filters that allow searchers to pinpoint relevant data, excluding all others not specifically checked, might be of great benefit to paying advertisers.
Hi there,
Currently we have many unwanted search engines browsing our website https://coolbuck.com. For example,
5.199.133.88 [arni.name]
182.118.22.206 [hn.kd.ny.adsl]
74.215.13.162 [WS1-DSL-74-215-13-162.fuse.net]
5.9.27.74 [5-9-27-74.crawler.sistrix.net]
... and many more.
We want to block them from crawling our website. What should we write in our robots.txt?
Thx.
Peter
You can also save a PDF as an image. It's a PDF that looks like any other PDF except the file size is larger.
Hi,
Great post! I'm new on SEOmoz and the value of some of these posts is just incredible.
I have a more specific question about something we're trying to do. The account settings of our community members are in a jQuery sliding panel which is on top of every page. The problem is that search bots can crawl through it (checked with seo-browser): it is unnecessary and prevents them from seeing the important content of the page.
From this post I would assume that the best way to do that is to use iframes. Am I correct?
I don't believe there is any evidence that supports the claim that rel="nofollow" prevents the crawlers from visiting a resource.
Sarven - if you read my comments, that's exactly what I wrote! Maybe an additional diagram showing this would drive the point home further.
Rand, great info. I have a similar problem, but slightly different.
We upload PDF files that change each year as updated health plans are released. We delete the PDFs from prior years to avoid duplicate content, because plans from year to year are very similar, but Google still has these pages indexed. After we remove them, Google reports that the pages are "not found" in Webmaster Tools. Should I remove these pages on a case-by-case basis through Webmaster Tools, or is there a more efficient way? Thanks Rand! Really appreciate the input.
A question (for you and the Moz community): what about content that's already out there but you want to have removed (especially dynamic content where the removal request isn't feasible)? I've found that the META tags don't always work for content that's already spidered, and even Robots.txt takes weeks to affect the pre-existing index.
Just wondering if one method might be better than others for pages that have previously been indexed (as opposed to entirely new pages).
Deindexing and not-indexing follow the same set of rules. The problem is that with a pretty low crawl frequency, changes in your indexer directives don't take effect instantly. Submit an XML sitemap (with all the URLs you want deindexed) to refresh the robots.txt cache and to force crawling of recently outdated resources. When you assign a "noindex" REP tag to a URL, search engines won't obey it until they've crawled and processed it. Not that all search engines obey "noindex" REP tags...
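As a rough sketch, such a sitemap is just a flat list of the URLs you want recrawled (the URLs below are placeholders):

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <!-- URLs now carrying "noindex" that the engines should revisit -->
    <url><loc>http://www.example.com/outdated-page-1.html</loc></url>
    <url><loc>http://www.example.com/outdated-page-2.html</loc></url>
  </urlset>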
Interesting, thanks. I wondered whether it was an issue of delisting working differently or just a matter of the time it takes to "rediscover" secondary pages. I thought it might be what you suggested, but didn't have any verification.
The Removal Request Tool in the Google Webmaster Console is going to help you here as well.
Google requires you to have followed the appropriate actions for excluding the content first.
As yet Google have not announced that the Removal Request Tool supports the X-Robots-Tag protocol. Details are available from Google at the webmaster help centre.
I'm missing one, although you could see it as part of the robots.txt thing: the X-Robots-Tag. The cool thing about it is that it's even harder for people to detect :)
The x-robots-tag meta is an excellent addition and allows SWF and PDF files, in particular, to be excluded with ease.
The downside is that, to date, only Google and Yahoo! are supporting the tag. Until at least two other engines begin supporting this meta tag I think that it has to remain an addition, rather than an alternative, to other methods.
You know Rand, I was just wondering about methods of exposing myself less. Too often I've been finding myself exposed all over the place, multiple times. I'm sure people are getting sick of me exposing too much of myself and...
*giggles*
In all seriousness - this is a fantastic post. I have a site that's just going through several issues where they are even deduping stuff scrapers stole. It's been a long process.
I've also recommended the iFrames technique to a friend who needed to hide some content but not all from search robots on some pages.
*teeheehee* but everyone has to love the Pig Latin suggestion! And you'd probably rank extremely well (until everyone caught on... damn them ;-) )
There is one more good reason why you should consider hiding some of your content: spammers. If you don't hide it, they can really screw up your mailbox and, yes, your rankings too. Needless to say, nice post Rand!! Cheers!
Great post, thanks Rand!
Why, didn't you know that all the nouveau webmasters don't want you to see their content? At least if you are in the Digg demographic. All those pesky kids steal bandwidth and never click on ads, like Dennis the Menace.
https://whydiggisblocked.com/
If people refuse to understand social media, should they be enlightened?
What about using JavaScript to alter the CSS? This could present slightly different content to customers than to search engines.
Where's the fine line between doing this to game the system and doing it for accessibility, e.g., adding section headings for screen readers, etc.?
Haha! Never heard about that!
Great post - I love having all the techniques listed in one place!
Great info Rand! I'm going to share this on our E-Marketing Performance reading list!
#6 Forcing Form Submission
Really just felt like filler, b/c as you say, while it works, as soon as a link comes bounding in... it doesn't :/
Would have been a stronger piece if you'd just said 11 Ways.
Welcome to SEOmoz Mister Morgelsprocken!
 :P
Thanks for another great post Rand.
Rand, thanks for the Pig Latin tip. Please discuss the difference between distributing link juice and hiding away content. We do as you do and employ nofollows all throughout, say, a comments thread. For instance, on this page, "Reply," "Private Message," "Permalink," and "Add Comment" are all nofollowed. I understand why, and I'm interested in hearing you discuss the different approaches and reasoning behind nofollow vs. hiding content by other methods. THANKS.
Marty - the thing is, nofollow is used to control the flow of PageRank - we wrote about that here. Keeping content away from the engines entirely is more the focus of this post.
Thanks, that's how I understand it.
What about Flash? Is it indexable?
You're right there - Flash certainly isn't indexable. Thankfully, we can get similarly impressive UI using CSS!
I think you may be somewhat overconfident there.
I would not say that Google is incapable of crawling content and following links in SWF files. The Flash spidering test in my public test-bed has been compromised by a third party linking to the destination page, so I am not able to give you a firm example, but relying on Adobe Flash as a method of excluding content would not be a good model.
Shoulda made your test page a little less compelling, huh?
I know they are testing a lot of ways to index content inside Flash. Expect to see real progress soon.
Now Flash can be indexed as well:
https://support.google.com/webmasters/answer/72746?hl=en
Hadn't thought of using an iframe for that - could be handy.
Flash is also an option (similar to images/Java)...
And what about Ajax?
AJAX can be used to exclude content for exactly this purpose.
I provided a news blog solution to a British financial bank where I use this approach to supply users with content on the home page without having that content indexed, thus avoiding duplication issues between the front page and the specific article pages.
Getting the feeling you could only really think of 11 there, Rand?
What did twiceler do to you?
Here is Stephan Spencer weighing in with Matt Cutts: https://blogs.cnet.com/8301-13530_1-9834708-28.html