Logic, Meet Google - Crawling to De-index

Comments 85


James Norquay

2012-03-21T21:20:17-07:00

Nice post, Dr. Pete. I agree this is an area where people make a few small changes that can have a big impact on the website overall and hurt the linking structure.

In my eyes, straight 301s are the best fix if the content is really poor; otherwise, redo the content on the page to make it more unique. I have never really been a fan of noindex tags or robots.txt blocks, unless it is a private section of the site that must be blocked.

But overall love the posts you have been making, the images are the best =)

6 0

- CommercePundit
 
 2012-03-21T22:18:22-07:00
 
 This blog post and your comment made my day. Right now, I have a similar problem with my eCommerce website, and I want to give one live example to make it clearer. As per your suggestion, a 301 redirect is the best solution for a poor page. But I can't remove my review pages, because I have to keep them live.
 
 This is my product page:
 
 https://www.vistastores.com/patio-umbrellas-california-umbrella-alus756-sp57-yellow.html
 
 And, This is my review page:
 
 https://www.vistastores.com/review/product/list/id/1453/
 
 I have set Meta Robots NOINDEX,FOLLOW, to product page and restrict review folder by Robots.txt.
 
 I have been struggling with a big question about my product page performance. Indexing of the product pages is quite poor due to some issue. After reading your comment and this blog post, I assume it's happening because of the 3 implementations on the review pages. What do you think? Can you give me some input? I didn't want to post my question as a comment, because the discussion board is the right place for it, but I figured a blog post on the same subject is a fitting platform for my issue.
 
 BTW: Dr. Pete. As I said, You make me happy with this blog post. Thanks for sharing.
 
 3 2
 
 - Dave Fowler
 
 2012-03-24T16:26:51-07:00
 
 "I have set Meta Robots NOINDEX, FOLLOW to product page"
 
 If I've read your comment correctly it sounds like you're planning to apply (or have already applied) NOINDEX to your product pages?
 
 If correct, that is surely a very bad idea...you'd be telling the search engines not to add your *product* pages to their index? It's late here in the UK, so apologies if I've misread your statement.
 
 2 0
 - CommercePundit
 
 2012-03-26T04:13:19-07:00
 
 +1 from my side for spotting the mistake in my comment. That was meant for my review pages. Today, I have good news about this error: I have removed the internal review system from the website because of this issue and integrated PowerReviews. I have restricted all review pages in robots.txt to fix this issue.
 
 1 0
 
- Dr. Peter J. Meyers
 
 2012-03-22T06:59:34-07:00
 
 I should point out that I only used NOINDEX as an example - my hastily-added point at the end was meant to reflect that the same goes for 301s and rel-canonical. This problem applies to a bunch of tactical combinations.
 
 I will say, though, that in this particular example, a 301-redirect wouldn't be feasible. The review pages are actually distinct pages, and users need to see the page that's tied to the proper product. You could capture that in a cookie or session variable and then 301-redirect, but you wouldn't want to send visitors all to one generic review form.
 
 3 0
 
 - CommercePundit
 
 2012-03-22T21:20:43-07:00
 
 Thanks for your reply. I got your point. I just want to capture the PageRank that may become available once the pages get a few unique reviews. I'm going to implement the NOINDEX, FOLLOW tag, which will help me avoid poor indexing as well as pass PageRank from those pages to my valuable product pages.
 
 1 0
 
R debere

2012-03-22T02:54:02-07:00

Hi there.

Got a minor issue.

You've said that Google has crawled the Review pages.

You then state that applying NoFollow to the links to those Review pages cuts those links,

and means Google cannot crawl the Review pages to see the NoIndex.

That's incorrect.

Once Google has crawled a URL - it knows it.

It doesn't need links to it any more.

It will still attempt to visit that URL.

You could remove ALL links to a previously crawled URL - and it will still get visits from GoogleBot.

(Yes, no links/fewer links means lower frequency/slower - but still gets hit!)

Further - NoFollow does not mean Google will not follow that link.

G have admitted that it is only a "suggestion" and may decide to crawl it anyway.

4 0

- Dr. Peter J. Meyers
 
 2012-03-22T07:12:46-07:00
 
 You make a good point, and it's why I said "potentially blocking the spiders". To be honest, I didn't want to get deep into the nuances and confuse the issue. Google will recrawl from its "memory" of your index, to a point, and does cache your indexed URLs. What I've found in practice, though, is that this usually only covers a small percentage of pages, especially when those pages are deep and/or duplicates.
 
 In this case, if you added nofollow and NOINDEX on the same day to the 1,000 review pages/links, you'd probably see some initial drop, but I suspect it would die out around the 20-30% mark. At that point, Google would stop revisiting the pages that no longer have internal links. They might gradually prune off more, but it ends up being a very slow process. By keeping the links open, things go much more smoothly, in my experience.
 
 1 0
 
 - R debere
 
 2012-03-22T10:03:24-07:00
 
 I'm not disputing the fact that G requires the links to be open. It's covered in enough places (such as Google's forums).
 
 I was merely pointing out the line about nofollow resulting in G not crawling the destination URLs.
 
 RupertDeBere edited 2012-03-22T10:03:47-07:00
 1 0
 
 - Dr. Peter J. Meyers
 
 2012-03-22T10:06:18-07:00
 
 I think there's anecdotal evidence that G doesn't crawl the destination URLs as vigorously or necessarily index them and honor the signals on those pages. You're right, though - it's not black-and-white by a long shot. I didn't want to hedge that point too much and make a confusing issue even more confusing for folks.
 
 1 0
 
 - R debere
 
 2012-03-22T16:52:49-07:00
 
 Fewer internal links = less PR flow, lower importance = slower crawl rate. Nothing really confusing about it (at least, there shouldn't be).
 
 If you really want stuff gone:
 
 (1) Make sure the URL is crawlable
 
 (2) Make sure it has NoIndex (robots/googlebot meta tag, or the X-Robots-Tag HTTP header)
 
 (3) Link to the unwanted pages from prominent/priority pages (such as the homepage)
 
 (4) Dump those URLs in their own sitemap and submit it to G
 
 (5) Try and use the Fetch as GoogleBot tool in GWT on those URLs
 
 That lot should do the job if you are in a hurry.
 
 Not overly scalable - but should work well enough so long as you are willing to invest a little time/patience.
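
To make the NoIndex step in that checklist concrete, here is a minimal Python sketch of how one might check a fetched page for a noindex signal, either in a robots/googlebot meta tag or in an X-Robots-Tag response header. The function and class names are made up for illustration, the HTML and header values are toy inputs, and the header parsing is deliberately simplified (it does not handle user-agent-prefixed values such as "googlebot: noindex").

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html, headers=None):
    """Return True if the page carries noindex in a meta tag or X-Robots-Tag header.

    `headers` is a plain dict of response headers (hypothetical input shape).
    """
    parser = RobotsMetaParser()
    parser.feed(html)

    header_directives = set()
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_directives.update(p.strip().lower() for p in value.split(","))

    return "noindex" in parser.directives or "noindex" in header_directives


# Toy usage: a review page tagged NOINDEX, FOLLOW
page = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head></html>'
print(has_noindex(page))
```

A check like this only confirms the tag is present; the crawler still has to be able to reach the URL to see it, which is the point of the first item in the list.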
 
 RupertDeBere edited 2012-03-22T16:54:06-07:00
 1 0
 
netmeg

2012-03-22T09:23:58-07:00

If I had a nickel for every time I've had to explain this, I'd have at least five dollars.

Now I can just send 'em to this post.

Thanks for saving me time. Enjoy your nickels.

3 0

sichristie

2012-03-21T21:43:53-07:00

Nice post Dr Pete.

So we have done exactly this on our large E-Commerce sites -> a mix of nofollow and noindexing. We are continuing a round of no-indexing on various sections and gradually we are starting to see positive results reported back in WebMaster Tools.

The purpose of the process of de-indexing is to improve our crawl equity and get Google to the fresh content as it updates daily, rather than the bots wasting their bandwidth on deeper and less relevant pages - like deeper pagination and faceted navigation.

We don't believe in 301'ing any of our content, except for expired listings and classifieds. However, in the case of faceted navigation we are using rel canonical, where we are starting to see great results.

3 0

franco lucchetti

2012-03-22T02:14:10-07:00

Hi DR Pete,

Your process is right: adding nofollows is not the right way to get the page seen by the spiders, especially when we want the page removed from the index. Another common mistake is putting a disallow in robots.txt to prevent pages or directories from being indexed.

If we write

User-agent:*

Disallow: /page1/

we prevent page1 from being crawled, NOT from being indexed, so the on-page meta robots tag is the best choice.
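
As a rough illustration of what that rule does, the sketch below feeds the same two lines to Python's standard `urllib.robotparser` and asks whether a crawler may fetch a URL. The example.com URLs and the `is_crawl_allowed` helper are placeholders, and this only models the crawl permission; it says nothing about whether an already-known URL stays in the index.

```python
from urllib.robotparser import RobotFileParser

# The rule set from the comment above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /page1/
"""


def is_crawl_allowed(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# /page1/ is blocked from crawling, so a meta robots tag on it would never be read.
print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/page1/"))
print(is_crawl_allowed(ROBOTS_TXT, "https://example.com/page2/"))
```

If the first call reports the URL as blocked, any noindex tag on that page cannot be read by a crawler that obeys the file.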

2 0

Mark Capuano

2012-03-22T03:13:46-07:00

Good advice on planning out your strategy.

2 0

Tiggerito

2012-03-22T06:57:08-07:00

I was under the impression that the poorly named rel nofollow does not stop Google’s robot from crawling to the destination page, but just stops the PageRank algo from passing any rank juice.

So it will not stop the destination from being re-crawled and indexed.

As already pointed out, once Google has actually indexed a page, it seems its ability to re-find that page during a crawl is not relevant. It's in, and will stay in until there is a direct signal to remove it.

The meta robots tag with noindex seems to be the most direct but slow way to get a page removed from search results. However, I think it is still indexed as you can specify follow which means it has to parse the page, it just does not get included in search results. Like nofollow, noindex is also poorly named.

What I see as a potentially real issue is the blocking via robots.txt. This seems to stop Google from re-crawling a page and therefore from finding out that its meta robots tag requests it not to be indexed. So Google Webmaster Tools gets full of "blocked by robots.txt" errors (does it still report them?). I guess it's the same though: the page is in the index but not shown in search results.

In reality, I don't think there is any way to truly get a page removed from the index, you can just block it from being in search results.

2 0

- Dr. Peter J. Meyers
 
 2012-03-22T07:23:33-07:00
 
 See my reply to @RupertDeBere - I didn't want to confuse the issue too much, but you're right - Google will still cache and recrawl some nofollow'ed URLs. What I've found, in practice, is that deep/thin/duplicate URLs tend to only get recrawled partially and unreliably. So, in a situation like this, the nofollow would definitely hamper and slow your de-indexation efforts.
 
 I've actually found NOINDEX to be fairly fast, in most cases. Rel-canonical can be faster, but it has to be used appropriately. I don't like to use it for generic examples, because people tend to get into trouble with that tag.
 
 What frustrates me the most, especially on very large sites, are the GWT options. Removing a page or even a folder can be very fast. Parameter handling, though, which is in theory very useful, is slow or sometimes doesn't work at all. I really wish they'd work to make these tools and how they use them more consistent and reliable.
 
 2 0
 
CommercePundit

2012-07-12T01:21:31-07:00

Today, I was searching for how long Google takes to de-index 301-redirected pages, and I found this blog post, where I had submitted my 2 comments with an associated example. But I have a similar, yet different, question about de-indexing 301-redirected pages.

I have changed the URL structure for 11,000 product pages and set up 301 redirects from the OLD URLs to the NEW URLs on 3rd July, 2012.

As you mentioned, de-indexing is quite a complex process at Google and may take a long time. Google Webmaster Tools has a section for removing URLs from web search. So, can I use it for my OLD URLs? Will it de-index my actual product pages or not?

I'm going to raise a similar question on the discussion board, but I think this is the right platform for my question. Is there any specific method that would let me speed up de-indexing?

Right now, I'm measuring my indexing and de-indexing ratio via the NEW & OLD XML sitemaps.

2 0

- Dr. Peter J. Meyers
 
 2012-07-12T14:25:18-07:00
 
 It can take a frustratingly long time. The problem is that many deep pages aren't crawled very often, and even when Google re-crawls, they don't always process or honor a 301 the first time they see it, in my experience. One key is to leave crawl paths to the old URLs open - they have to be re-crawled for the 301 to kick in. Also leave XML sitemaps with the old URLs up (you can add the new ones, but keep the old ones active for a while), for the same reason.
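
As one possible way to keep the old URLs in front of the crawler, the sketch below builds a separate XML sitemap from a list of old URLs using Python's standard library. The URLs are placeholders and `build_sitemap` is a hypothetical helper name; the output is a bare urlset with loc entries only, without an XML declaration or optional fields.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder old URLs that now 301-redirect to the new structure.
old_urls = [
    "https://example.com/old/product-1.html",
    "https://example.com/old/product-2.html",
]
print(build_sitemap(old_urls))
```

Keeping the old and new URL sets in separate files also makes it easier to compare how many URLs from each are reported as indexed.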
 
 1 0
 
- Colin Myerscough
 
 2013-02-15T06:09:30-08:00
 
 How did you manage to solve this? Or was it just a long wait? I have an explosion of pages due to some mistake along the line: 15,000 products are hitting 1,040,000 links in Google.
 
 I have been removing URLs and blocking content in robots.txt, but without any effect in the last 2 weeks.
 
 1 0
 
Yarashi

2012-03-21T23:37:22-07:00

One of the best blogs of the summer, Dr. Pete. De-indexing pages from Google is one of the trickiest but easy jobs. Thanks for your invaluable tips.

2 0

Anil Singh

2012-03-23T05:48:13-07:00

When we are done putting META NOINDEX on the pages we do not want in the index, we can use the Google URL removal tool :) to quickly get them deindexed without having to go through changing each link to nofollow.

2 0

- Dr. Peter J. Meyers
 
 2012-03-23T05:58:14-07:00
 
 Unfortunately, while the GWT tool works very well for a handful of URLs, or at the folder level, it doesn't scale well. Requesting 100s or 1000s of removals is very tedious, and generally not recommended by Google (they said something about it, but I'm having trouble finding the link). It is fast, though - if you've just got a few problem URLs, like a duplicate home-page, it's a good tool.
 
 1 0
 
 - TalkInThePark
 
 2012-03-24T14:26:57-07:00
 
 Tip: In GWT, you can use the folder removal to remove pages by file name, for instance all URLs beginning with /apps/removeme.php.
 
 1 0
 
 - TalkInThePark
 
 2012-06-10T04:14:48-07:00
 
 Yes, Google says removing pages for the "wrong" reasons "may cause problems for your site". What problems, you may wonder. Any idea?
 
 1 0
 
Alan Mosley

2012-03-23T12:05:37-07:00

I would add a noindex, follow meta tag to the review pages and leave it at that. This way you will remove them from the index, but link juice going to these pages will be returned through their outbound links.

AlanMosley edited 2012-03-23T12:06:16-07:00
1 0

Udaya123

2012-03-27T23:41:18-07:00

Hi..All.

Indexing problem: this is common for all websites that have more than 50K URLs. Please check your XML and HTML sitemaps. Even very minute errors can keep crawlers from indexing. Check the priority and frequency values. Avoid old, unused URLs.

Hope this helps.

1 0

rajachandraa

2012-03-27T04:24:20-07:00

Hi Pete

Great post. It's really helpful. Can you help me better understand my scenario? Right now there is a problem with Google indexing our website. We had around 50K-plus pages, all of which were indexed, but when we changed some parameters to pull the data dynamically, the index suddenly dropped to a mere 900. Now the indexed pages are increasing slowly, but there was an interesting observation: when we checked the total indexed pages on Google, it was 2400, but when we tried to find indexed pages at the next level (sub-folder/sub-level), Google showed all those pages as indexed as well, and we have around 2000 pages in each sub-level. What can this be? Can you please help?

rajachandraa edited 2012-03-27T04:25:31-07:00
1 0

- Dr. Peter J. Meyers
 
 2012-03-27T08:28:40-07:00
 
 Any given site: results can be a bit hard to trust. A couple of suggestions:
 
 (1) Track it daily - you may see some very low days, but then a steadier number that's more realistic.
 
 (2) Dig into all sub-folders, in a logical manner. If those sub-folder counts don't add up to something close to the overall count, either the overall count is suspicious, or Google is re-evaluating your indexed pages for some reason.
 
 Since you did make a major change, I would re-check that history. When you changed parameters, did site-wide URLs change? Did you set up 301-redirects from the old URLs? Did these parameterized URLs create potential duplicate content? There are a lot of things that could've happened when you made the switch.
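
The sub-folder comparison in (2) is simple arithmetic, and a small Python sketch may make it easier to repeat daily. The `counts_consistent` helper and the 20% tolerance are arbitrary assumptions, and the numbers below are illustrative only (the thread mentions an overall count of 2400 and roughly 2000 pages per sub-level, but not how many sub-levels there are).

```python
def counts_consistent(overall, subfolder_counts, tolerance=0.2):
    """Return True if the sub-folder counts sum to within `tolerance` of the overall count.

    `tolerance` is a fraction of the overall count (0.2 means 20%); it is an
    arbitrary threshold chosen for this sketch.
    """
    total = sum(subfolder_counts)
    if overall == 0:
        return total == 0
    return abs(overall - total) / overall <= tolerance


# Illustrative numbers: an overall site: count of 2400 versus three
# sub-folders that each report about 2000 indexed pages.
print(counts_consistent(2400, [2000, 2000, 2000]))
```

A False result does not say which number is wrong; it only flags that the overall count and the sub-folder counts disagree by more than the chosen margin.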
 
 2 0
 
Alessio Lo Vecchio

2012-03-24T08:14:12-07:00

Can I use robots.txt to deindex pages? Or will Google maintain the cached copy?

1 0

- Dr. Peter J. Meyers
 
 2012-03-24T16:33:27-07:00
 
 Robots.txt ("Disallow") can work well to prevent pages from being indexed, but seems to do a lousy job of knocking pages out of the index once they've been crawled. Personally, I've had better luck with other methods, such as META NOINDEX, 301s, or canonicals (depending on the situation).
 
 1 0
 
Brian Reynolds

2012-03-26T18:52:00-07:00

Thanks for the easy-to-understand examples you have given. This helps me understand the best practice more easily. Thank you!

1 0

usef4u

2012-03-25T14:04:35-07:00

Hi Pete, this was an informative post and I enjoyed reading it. However, I was thinking: can't we have 5 to 10 random reviews of the same product below the form? That would provide more user-generated content and make the page unique as well, instead of adding nofollow and noindex.

1 0

- Dr. Peter J. Meyers
 
 2012-03-26T09:11:40-07:00
 
 I think I may have made the example seem a bit too literal - I was trying to illustrate the kind of situation where this problem could occur. I'm definitely not suggesting this is an ideal site structure.
 
 In this situation, I was assuming that the actual product reviews lived on the product page, and the review page was nothing but a form that just happened to spin out a unique URL for every product. In that case, the review pages would have no search value.
 
 1 0
 
 - usef4u
 
 2012-03-26T13:08:53-07:00
 
 Yes you are right. Thanks for the informative posts.
 
 1 0
 
Maher Singh

2012-03-28T00:00:47-07:00

Nice article, bro. I think you are right.

1 0

ChrisMorgan

2012-06-13T18:38:45-07:00

During the period where the pages are noindexed, but the links are still being followed, are you passing wasted page rank to these duplicate pages? And if so, should you eventually change the followed links to nofollow links when most of them have been deindexed? Or is Google smart enough to not bother passing page rank to pages that are noindexed?

1 0

Haeck-Design

2013-11-04T12:24:44-08:00

I can't even explain how helpful this post is. My numbers took a big hit, though, when I noticed my Twitter and Facebook connect buttons were getting crawled... Then I recalled that I had just adjusted my robots to "close the flow" - I didn't even think about how Google would read that. Thanks a ton!

1 0

SpookSEO

2014-01-01T21:00:01-08:00

Hello Dr. Pete! I think people forget that just because their site is recrawled every day, it does not actually mean that every page is recrawled. It also does not mean that Google recognizes all of the new signals. It can actually take weeks or even months.

1 0

Katasha

2016-08-22T01:38:46-07:00

I definitely agree that de-indexation requires plenty of time! To my mind this article is valuable enough for the people who have their own website and want the search engine to get the renewed info from their site. See more and go deeper into the topic on deindex.pro

1 0

Bilal Sarwari

2013-10-11T04:08:34-07:00

Perfect post as always, Dr. Peter. Thanks, today you saved me time in finding some common deindexing issues.

1 0

Websitedevelopment

2012-09-21T02:11:57-07:00

I don't want to damage the incoming traffic that I do have, but I had to fix issues such as a horrible leads-vs-traffic conversion rate and broken or non-existent links. The question is: what should I do with my old pages so that I don't negatively impact search engine placement?

Dr-Pete edited 2012-09-21T05:58:31-07:00
1 0

SandraMoZ

2012-04-30T03:07:46-07:00

Wouldn't it be way better to fold the review pages into the product pages themselves? Much more unique content and less headache with the noindex.

1 0

Ted Gregory

2012-03-23T02:16:58-07:00

Thank you for the info. I'm experiencing just such a problem and can't get the desired effect. What can you say about removing the content of old, no-longer-existing subdomains from Google?

1 0

Nadia Mubashar

2012-06-21T11:21:52-07:00

Thanks for the post, leads me to a quick question about our site:

We have an automated rel=canonical tagging system that uses the current URL structure of the page to automatically add and update the canonical tag in the header. We set hundreds of user pages to NOINDEX,FOLLOW; however, they still have the rel=canonical tag attached to them. Should this be removed as well?

1 0

- Dr. Peter J. Meyers
 
 2012-06-21T11:35:55-07:00
 
 I'm not a big fan of mixed signals when it comes to Google, but if you have a self-referencing canonical and a META NOINDEX,FOLLOW, I think the NOINDEX is going to win out the vast majority of the time. The proof is in the pudding - if the pages drop out of the index, you've got nothing to worry about.
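To find which pages are sending this mixed signal, something like the following rough sketch could be run over page templates or saved HTML. It flags a page whose head carries both a self-referencing canonical and a robots NOINDEX. The `HeadSignals` class, the `mixed_signals` helper, and the sample URL are hypothetical names for illustration; it uses only the standard `html.parser` module and does not predict how Google resolves the conflict.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the canonical URL and the robots noindex flag from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            content = a.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            self.noindex = "noindex" in directives


def mixed_signals(html: str, page_url: str) -> bool:
    """True if the page is NOINDEX and its canonical points at itself."""
    signals = HeadSignals()
    signals.feed(html)
    return signals.noindex and signals.canonical == page_url


# Hypothetical user page with both tags present.
sample = """<html><head>
<link rel="canonical" href="https://www.example.com/users/42" />
<meta name="robots" content="NOINDEX,FOLLOW">
</head><body></body></html>"""
print(mixed_signals(sample, "https://www.example.com/users/42"))
```

A page flagged here is not necessarily broken; as noted above, the real test is whether it drops out of the index.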
 
 1 0
 
swebdizajn

2012-03-28T01:00:18-07:00

Certainly a very interesting, quality article. My compliments to the author.

1 0

Ricard Menor

2012-03-22T14:05:29-07:00

Hola Doctor,

I would rather pay attention to your finale, talking about sitemaps and ways to discover isolated pages. I did some unsuccessful testing, trying to get isolated pages indexed only by having them prefetched with link rel=next.

Have you - or any of my post neighbours - ever tried this? If so, any positive result?

Thanks for taking the time to write this post. It may not be much for many people out there (because we want you to grow us into very clever SEO people), but A LOT of other people may have found this very instructive.

Happy weekend everyone

seofreelance edited 2012-03-22T14:06:26-07:00
1 0

weknowmarketing

2012-03-22T04:01:27-07:00

I think this can also be done a different way, through the XML sitemap: even with a nofollow tag, Google can still crawl the linked pages; it just will not pass any link juice from the host page. In my past experience, I have managed to get a few pages crawled through the XML sitemap despite a lack of internal linking. And once the spider bots have that access, placing the meta noindex would do the same job.

Anyways good work and nice info for webmasters.

weknowmarketing edited 2012-03-22T04:02:47-07:00
2 1

Amy Crompton

2012-03-23T02:04:18-07:00

Great post. My first action would have been to nofollow the review links too, but following them to allow the noindex tag to be seen is (now) insanely obvious.

P.S The diagrams rock

1 0

Modestos Siotos

2012-03-22T03:15:46-07:00

Nice post Pete.

I would like to add a couple of things.

In the part 1 fix, don't you think that as long as the review pages appear in the XML/HTML sitemaps search engines would still be able to reach them regardless of the nofollow?

The second one is about the very common misconception that disallowing certain directories/files in robots.txt will make them drop out of the index. Unfortunately, this is not necessarily the case, and Google suggests adding a robots noindex as the best way to remove pages from its index.

Using WMT again doesn't always work, and if it does that would only work for Google.

2 1

- Dr. Peter J. Meyers
 
 2012-03-22T07:13:53-07:00
 
 I'm embarrassed to say that I meant to address the XML fix, and I completely forgot. I don't think it's quite as effective as keeping links active, but it certainly can help. It's also good for situations where, for whatever reason, you just can't put the links back. I'll add a note in the post - thanks.
 
 1 0
 
caulon chasar

2012-03-22T00:39:24-07:00

I have a WordPress-based site (www.mefindcoupon.com) and am using a sitemap plugin to control crawling. I am curious whether not indexing category and tag pages would benefit our site. What do you think? Also, we have over 33,000 indexed pages; would it be wise to eliminate as many as possible? Thanks.

KeriMorgret edited 2012-03-26T16:00:24-07:00
1 0

Asif Dilshad

2012-03-21T22:01:56-07:00

Good post, Dr. Pete! Non-technical SEOs often make this kind of logical mistake. As a non-technical SEO, I want to ask one question and share one technique; I don't know whether it works or not!

Question: What happens if we block pages in robots.txt? Are they removed from Google's index with the passage of time?

Suggestion: I often use another technique: telling Google not to read my dynamic pages by using the parameter settings in Google Webmaster Tools. I think it is a good way to clean up our index (I am seeing some results, but it is still in the testing phase).

2 1

Ryan Chooai

2012-03-22T08:20:13-07:00

Dr. Pete you may want to tag this post "Magento"

1 0

Joshua Hedlund

2012-03-22T06:23:52-07:00

"Pro tip: Don’t take any single day’s “site:” count too seriously – it can be unreliable from time to time. Look at the trend over time."

Definitely. It can be all over the map. You can query pages in a subfolder too, not just the whole site, and sometimes the counts in a subfolder will be more than the counts in the folder above it. In general, Bing's numbers seem to be a little more unstable (and generally smaller) than Google's, at least in my experience, but it's getting better too. I've been tracking some "site:" counts weekly in an Excel spreadsheet for over a year, along with the Webmaster Tools sitemap counts, and monitoring those trends together helps me know I'm moving in the right direction.
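As a rough illustration of reading the trend rather than a single day's number, a trailing median smooths out the occasional wild "site:" reading. The weekly figures below are made up, and `rolling_median` is a hypothetical helper, not something from the post; the same thing can be done with a spreadsheet formula.

```python
from statistics import median


def rolling_median(counts, window=3):
    """Median of each trailing window; early points use whatever is available."""
    return [
        median(counts[max(0, i - window + 1): i + 1])
        for i in range(len(counts))
    ]


# Made-up weekly "site:" counts with one suspiciously low reading (3900).
weekly = [7200, 7150, 3900, 7100, 6800, 6900, 6400]
smoothed = rolling_median(weekly)
print(smoothed)
```

With a window of three, a single outlier never becomes the median, so the smoothed series shows the gradual decline without the one-off dip.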

2 1

- Dr. Peter J. Meyers
 
 2012-03-22T07:15:10-07:00
 
 I like to break sites into subfolders with "site:" and then see if those sections add up, as a gut-check. It not only helps validate the overall indexed page count, but it helps me spot missing problems. Unfortunately, that's a tedious, manual process. I do it on site audits, but it takes a couple of hours to do it well (plus all the time to track it over time).
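The arithmetic of that gut-check can be sketched as below. The counts still have to be collected by hand from "site:" queries; the folder names, the numbers, the 15% tolerance, and the `subfolder_gut_check` helper are all assumptions chosen for illustration.

```python
def subfolder_gut_check(total, subfolder_counts, tolerance=0.15):
    """Compare the overall "site:" count with the sum of subfolder counts.

    `tolerance` is an arbitrary threshold for how far apart the two
    numbers may be before the overall count looks suspicious.
    """
    covered = sum(subfolder_counts.values())
    gap = abs(total - covered) / total if total else 0.0
    return {"covered": covered, "gap": gap, "suspicious": gap > tolerance}


# Made-up numbers for a hypothetical site.
result = subfolder_gut_check(
    5000,
    {"/products/": 3200, "/review/": 1500, "/blog/": 250},
)
print(result)
```

A large gap does not say which number is wrong; it only signals that either the overall count is off or a section of the site has been missed.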
 
 1 0
 
Elliott Richmond

2012-03-23T01:17:43-07:00

The same can be done with the robots.txt file, no?

1 0

Woj Kwasi

2012-03-22T19:29:01-07:00

I really like the diagrams in this post Dr Pete - very clear & a pleasure to read :)

Meta NOINDEX tags are also a good fix for certain scenarios like getting rid of .HTML files on old IIS servers where 301s aren't easy to manage ;)

1 0

nadoor

2012-03-22T21:16:19-07:00

i

1 0

Mark Wright

2012-03-22T11:48:12-07:00

Good tip on controlling your duplicate content, Dr. Pete.

But once those review pages start being populated with reviews, they're no longer duplicates, and do become valuable for anyone searching "product1 reviews". I think you would want to monitor your review pages closely, or build in automation that would add/remove the noindex tag based on the presence of a review.
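The automation suggested here could be as small as the sketch below, which picks the robots meta tag for a review page from its review count. The function name and the "at least one review" rule are assumptions for illustration; a real site would likely want a stricter threshold and would wire this into its own templating.

```python
def robots_meta_for_review_page(review_count: int) -> str:
    """Return the robots meta tag for a review page.

    Assumed rule: a page with at least one review is worth indexing;
    an empty review form stays NOINDEX,FOLLOW so links are still crawled.
    """
    if review_count > 0:
        return '<meta name="robots" content="index,follow">'
    return '<meta name="robots" content="noindex,follow">'


print(robots_meta_for_review_page(0))
print(robots_meta_for_review_page(3))
```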

markwrightseo edited 2012-03-22T11:48:35-07:00
1 0

- Dr. Peter J. Meyers
 
 2012-03-22T11:51:29-07:00
 
 This was just an example, but I intended it to be just a review form tied to the product - the reviews (hypothetically) would be on the product page, but each link to a review form would have a unique URL. I saw a similar case recently. Of course, I don't intend that as an ideal structure - just mean it to illustrate a scenario that could cause this problem.
 
 1 0
 
Bill Sebald

2012-03-22T10:02:20-07:00

The canonical tag has worked really well on every one of the (feels like hundreds of) ecommerce sites I've optimized, but it can take a little while for Google to "get it", i.e., whenever they run their canonical algo, then compile.

But since ecomm sites are dynamic, always be on the lookout for a page template that didn't do what you expected. The bigger the site, the more the complexity, and the larger room for error. Scan your templates in QA. I've caught sites improperly funneling spiders months after implementing the canonical because the developers thought they knew what they were doing, or because users were using templates in a different way than they were intended.

1 0

Bob Jones

2012-03-21T22:52:30-07:00

I'm actually dealing with a similar situation at the moment: an e-commerce site with about 800 products. My initial problem was figuring out how to make sure the category pages weren't going to cause any duplicate content problems. At the moment I just use the canonical attribute for the product pages, but I'm not using noindex or nofollow anywhere internally. As far as I'm aware, this should be sufficient for Google to figure out what's what.

1 1

a2z-247104

2012-03-22T00:26:32-07:00

I was experiencing duplicate pages on my site a few weeks ago; site: on Google showed 7220 pages! I had to create a rule in my .htaccess file and change some code configuration, and that fixed the problem.

a2z-247104 edited 2012-03-22T00:27:14-07:00
1 1

Mat Bennett

2012-03-22T02:12:17-07:00

Maybe it's my ego, but I can't help thinking I was an inspiration for this post in some small way! Dr Pete gave me the same advice a couple of weeks ago, looking at an issue I had with faceted navigation on an e-commerce post.

One thing I would add is that de-indexing can be SLOW. We had 400k pages indexed on a site that realistically should have been more around the 5k pages mark. Still not sure whether that resulted in a Panda-esque penalty or just massive cannibalisation, but most of the most valuable pages absolutely tanked in the results.

We have gone with viciously strict canonical tags. Our plan is to canonical back to just the bare minimum, then look at re-introducing sections one at a time if we think they are significant enough.

Changes were made on-site just over 1 month ago now. We're currently at around 220k indexed pages, although Google does periodically tease us by showing 4k. A few results are coming back, although other things are happening so it is hard to say if that is solely down to the clean-up.

What is worth noting though is that the speed of de-indexing is dropping. Unless the results that we are getting teased with go live, we're not expecting this to be "fixed" for quite a few more weeks.

1 1

- Martin Oddy
 
 2012-03-22T02:43:00-07:00
 
 Sounds like the makings of a good blog post.
 
 2 0
 
 - Mat Bennett
 
 2012-03-22T10:50:17-07:00
 
 Am considering it... I might wait to see whether it has a happy ending first though!
 
 1 0
 
- Dr. Peter J. Meyers
 
 2012-03-22T07:08:55-07:00
 
 Are you sure you want to be the inspiration for this post? ;)
 
 It really is a painfully slow process. People forget that, just because their site is recrawled every day, it doesn't mean every page is recrawled or that Google honors all of the new signals. It really can take weeks or months, and you often have to adjust as you go.
 
 Be a little careful with canonical - used too broadly, it can give you some trouble. It can also be tough to reverse, if you want to re-open content later. I actually like NOINDEX a bit better for that. It's a little easier to reverse if you just want to add content gradually. Of course, it's very situational, which is what makes giving advice so hard.
 
 2 0
 
 - Mat Bennett
 
 2012-03-22T10:54:04-07:00
 
 We will probably be changing the URL structure of any "re-opened" sections anyway. The main motivation for this is a technical consideration, but will hopefully save us from any stubborn canonical instructions that we can't undo.
 
 Compared with the 400,000 URLs we've been staring at lately, it's a pretty small consideration anyway.
 
 1 0
 
Chris Horner

2012-03-22T08:43:33-07:00

Thanks, Dr Pete, for the article. Can I clarify the suggestion for new sites to use internal rel=nofollow? I was under the impression that any internal nofollow is not best practice, as it is seen as PR sculpting.

1 1

- Dr. Peter J. Meyers
 
 2012-03-22T09:02:59-07:00
 
 You certainly want to be more cautious, but I think internal nofollows still have a place when you really want to discourage crawlers from going deeper down a path. I sometimes put it at dead-ends or at layers where anything beyond that layer is content you don't want indexed (shopping carts, for example). You can NOINDEX, etc., of course, but I find that the nofollow helps the crawlers sort out what's important and keeps you from wasting their bandwidth. I don't, admittedly, have strong proof of this.
 
 1 0
 
Ishita Ghosh

2012-03-22T00:05:39-07:00

Dr Pete,

Really good post. I also face this kind of problem when I handle e-commerce sites. Thanks a lot for your valuable suggestions.

By the way your picture representation is really intelligent.

1 2

A-W

2012-03-21T23:06:37-07:00

Hi Dr Pete, Again, I am ready to get tons of dislikes for this comment, but not every comment will be as sweet as sugar. The post has no substance; you could easily have written it in 2-3 lines instead of uselessly stretching such a minor thing. When I saw the title of the post, I thought, oh, I am going to get something really good and technical from Dr Pete, but I was disappointed.

The title is quite irrelevant, in the sense that it is too heavy while the content has nothing worth noticing. All you had to say is to add a noindex to pages that may seem similar, while the title says something else.

And to be honest, the post ended before it even started; that's not the way posts from Dr Pete usually are. Next time you write a post, try to do some homework: write down all the points you want to cover and then write about them. Otherwise, writing such thin posts and uselessly stretching a minor topic won't yield anything. So now I will get tons of thumbs down from lots of blind followers, who follow the trend of liking stuff from a famous face (even if it is time-wasting stuff) and disliking comments that hold up a mirror to the big faces. Best of luck with your next article. This one flopped :)

A-W edited 2012-03-21T23:08:49-07:00
5 9

- Chris Butterworth
 
 2012-03-21T23:20:52-07:00
 
 I enjoy succinct posts. Too many are stretched out way beyond requirements. In my ideal world, every post would be bullet-pointed, with important content highlighted at the start and then a deeper explanation below if required.
 
 This post raises some interesting points to keep in mind when carrying out something which is fairly important in SEO. And remember that not everyone who reads SEOmoz is an SEO pro. Though I might agree, the title is a little ambiguous.
 
 ChrisButterworth edited 2012-03-21T23:22:46-07:00
 3 1
 
 - A-W
 
 2012-03-22T01:42:45-07:00
 
 Without my critical comment, I am damn sure you would never have said anything about the title of the post! But look, in this community we have to be true and transparent rather than sugar-coated fake members with diabetes-causing sweet comments. TAGFEE is what we need to follow here, and Mozzers may agree with this TAGFEE thing (though my comment may have sparked their anger nerve, I can still hope for something). But I am glad I got one more vote about the ambiguous title... and I hope that the censoring authorities of SEOmoz will censor and reject any future post with an ambiguous title like this one.
 
 1 3
 
- eleuth
 
 2012-03-22T00:56:28-07:00
 
 Hey Asad,
 
 I think your assumption is that everyone understands the concepts illustrated in the post and just needs the 'bullets'. Whilst for me personally a shorter version would've been just fine, I'm sure there are enough people out there who are glad to see a graphical illustration of how this works with individual steps.
 
 At the end of the day, as you already knew what is going on, it shouldn't have taken you more than 30 secs to skim the post, so no real 'harm' was done in terms of your time wasted; for anyone new to this, however, there was a lot of 'good' done.
 
 It's not like it's a 7 minute video where you have to sit through the entire thing just to find out it was nothing new for YOU.
 
 just my $0.02
 
 Veit
 
 3 0
 
 - A-W
 
 2012-03-22T01:51:31-07:00
 
 Respected eleuth, it's not about my time wastage; if I am here on SEOmoz I am ready for both good and bad stuff. And I have the option to just leave the community and not come here, but that is not the case; all I want is to improve the stuff. I don't know if you have any idea or not, but this Dr. Pete is a super genius when it comes to SEO, and I expect much, much better stuff from him than such a hollow post. The article has a good and valid point, but it's a minor issue that does not need a whole post. And Dr. Pete, if you are listening, eleuth said in his comment above, "personally a shorter version would've been just fine".
 
 Now, to answer your point: if you are reading a book by Einstein, you would most probably be expecting something from advanced physics rather than the basics. The same is the case here: for newbies there are a lot of places to learn, but from Dr. Pete at least I expect something a bit advanced and posts with thick content.
 
 2 3
 
 - Scott P. Dailey
 
 2012-03-22T05:26:53-07:00
 
 I get it. We all do. You disapprove of the packaging. You like the message (sorta) and dislike the delivery (alota). Truly and sincerely: I get it. Message received. But here's the thing about your delivery and analysis of the author's that I disapprove of: the title of this blog is "The Daily SEO Blog." That is not a remotely ambiguous title. Not even possibly easily misunderstood. Using your Einstein analogy, which was hasty and agenda-driven incidentally, the title of this blog should be something more like, "The Daily Atom-splitting SEO Blog." Allow me to expand. How obvious is it, for instance, that we shouldn't be killing one another all over the world? Does it still happen? Does it still consume headlines? Are there panels and panels of "experts" poring over the topic? Yes. Are they wasting their time patronizing the super genius in the room? Perhaps, at times yes, but the problem still remains. So let me pay homage to your degree of candor by responding in kind: your delivery - your packaging - your approach to contention (in at least this instance) comes off equal parts brilliant and informed, yet unfortunately for this reader, more parts pretentious, narrow and immature.
 
 So here we are. SEOmoz espouses the genius and often the largely obvious, but perhaps overlooked as well. They're the SEO for everyone. The hostess with the mostess, I guess one could say. This greenhorn has never undertaken a read on this Website that doesn't speak to me. Never above me, never beneath me. Kudos I say! So given the choice between your notions of substance meets bluntness and the good Dr's bedside manner, somebody get me a doctor! Good day.
 
 9 1
 
 - A-W
 
 2012-03-26T00:50:14-07:00
 
 Well, you have complicated the stuff too much, but the issue is simple, let me try to explain in a bit easier way.
 
 Suppose you buy a box with "iPad 3" written on it, so what you expect is an iPad 3 in the box. And when you open it, you get a pair of shoes instead. What are you going to do?
 
 That's what happened here! The box is from the very authentic company Apple Inc. (Dr. Pete in our example). The title is "iPad 3" ("Logic, Meet Google - Crawling to De-index" in our example, a very fascinating title indeed) and the content inside is not what we expected.
 
 Hope you get it now. Let us be honest in our views rather than just blind praising!
 
 1 2
 
- Moosa Hemani
 
 2012-03-22T07:00:24-07:00
 
 I don’t have much to say about your comment, except that people have different opinions and tastes... It’s completely fine that you didn’t like the idea, and even great that you shared that openly with the other community members, but what I don’t like is the way you reacted to it.
 
 I mean, telling someone to do some homework when writing the next post is way too humiliating (especially when you are talking to an industry leader)...
 
 I do respect Dr. Pete and I have been following him for quite a long time. The reason I think this post is important (IMHO, and you can disagree here) is that in the race for new and complex SEO ideas, some people forget the basic rules of thumb and make errors... Remember he said “...couple of common mistakes...”
 
 P.S. You see! You didn't get many thumbs down here... I think you should reconsider the way you think about other community members... #justathought
 
 MoosaHemani edited 2012-03-22T07:02:42-07:00
 3 0
 
 - A-W
 
 2012-03-26T00:58:19-07:00
 
 Eight thumbs down is not a small number, I guess, with a few thumbs down on sub-comments as well.
 
 And the homework statement was not meant in a literal sense, and there is nothing humiliating about it as such; it's just an honest view of the article, where I feel that Dr. Pete can provide us with much, much better content than that. He is one of the industry leaders, so it's obvious he does not need any homework as such. There is nothing humiliating or personal; it's simply about the article we are discussing here.
 
 Consider this example: "I don't expect Intel to make the 486 or Pentium 1 anymore; I would expect them to make the Core i5, Core i7 or even higher".
 
 Hope you get the point now!
 
 1 0
 
- Dr. Peter J. Meyers
 
 2012-03-22T07:04:02-07:00
 
 I'm always open to critical feedback, but I feel like you may have missed part of the point of the post - it's not a post about just using NOINDEX. It's a post about using NOINDEX (or any page-based de-indexation cue) correctly. The devil is in the details, as they say.
 
 I wrote this post for the same reason I write many of my technical posts - because I've seen a handful of problems in Q&A and even with my own clients. Many people seem to be misunderstanding the nuances of de-indexation - it sounds simple in theory, but it's incredibly difficult in practice, especially on large sites. I've seen this particular mistake cost people weeks or months (and that means $, in most cases).
 
 I'd also point out that, with 100K subscribers, we can't make 100% of the people happy 100% of the time. Different authors here have different approaches to that problem, and mine is diversity. I try to mix it up - some in-depth posts, some comprehensive, some more basic. Sometimes, I'll even write about entrepreneurship or blogging. Sometimes, I get the mix wrong, and people call me on it - I appreciate that. On the other hand, I'll never make everyone happy with every post.
 
 4 0
 
 - Anthony Wakefield
 
 2012-03-22T21:03:51-07:00
 
 Sometimes the simple topics are the most important. I've seen noindex attempts blocked by robots.txt and nofollow declarations many times. You've touched on an important skill, and outlined how to actually get it right. It's simple and effective! Good job :)
 
 1 0
 
 - A-W
 
 2012-03-26T01:04:06-07:00
 
 Guess what? Your reply is the most sensible and most relevant, and it addresses my concerns. (One of the four thumbs up for your reply is mine :P)
 
 I don't know why many other respectable members are taking it sort of personally, misunderstanding parts of my comment and considering it somewhat humiliating or something. It's just a view about an article, nothing more, nothing less.
 
 *White Flag* :)
 
 1 0
 
