“No one saw the panda uprising coming. One day, they were frolicking in our zoos. The next, they were frolicking in our entrails. They came for the identical twins first, then the gingers, and then the rest of us. I finally trapped one and asked him the question burning in all of our souls – 'Why?!' He just smiled and said ‘You humans all look alike to me.’”
- Sgt. Jericho “Bamboo” Jackson
Ok, maybe we’re starting to get a bit melodramatic about this whole Panda thing. While it’s true that Panda didn’t change everything about SEO, I think it has been a wake-up call about SEO issues we’ve been ignoring for too long.
One of those issues is duplicate content. While duplicate content as an SEO problem has been around for years, the way Google handles it has evolved dramatically and seems to only get more complicated with every update. Panda has upped the ante even more.
So, I thought it was a good time to cover the topic of duplicate content, as it stands in 2011, in depth. This is designed to be a comprehensive resource – a complete discussion of what duplicate content is, how it happens, how to diagnose it, and how to fix it. Maybe we’ll even round up a few rogue pandas along the way.
I. What Is Duplicate Content?
Let’s start with the basics. Duplicate content exists when any two (or more) pages share the same content. If you’re a visual learner, here’s an illustration for you:
Easy enough, right? So, why does such a simple concept cause so much difficulty? One problem is that people often make the mistake of thinking that a “page” is a file or document sitting on their web server. To a crawler (like Googlebot), a page is any unique URL it happens to find, usually through internal or external links. Especially on large, dynamic sites, creating two URLs that land on the same content is surprisingly easy (and often unintentional).
II. Why Do Duplicates Matter?
Duplicate content as an SEO issue was around long before the Panda update, and has taken many forms as the algorithm has changed. Here’s a brief look at some major issues with duplicate content over the years…
The Supplemental Index
In the early days of Google, just indexing the web was a massive computational challenge. To deal with this challenge, some pages that were seen as duplicates or just very low quality were stored in a secondary index called the “supplemental” index. These pages automatically became 2nd-class citizens, from an SEO perspective, and lost any competitive ranking ability.
Around late 2006, Google integrated supplemental results back into the main index, but those results were still often filtered out. You know you’ve hit filtered results anytime you see this warning at the bottom of a Google SERP:
Even though the index was unified, results were still “omitted”, with obvious consequences for SEO. Of course, in many cases, these pages really were duplicates or had very little search value, and the practical SEO impact was negligible, but not always.
The Crawl “Budget”
It’s always tough to talk limits when it comes to Google, because people want to hear an absolute number. There is no absolute crawl budget or fixed number of pages that Google will crawl on a site. There is, however, a point at which Google may give up crawling your site for a while, especially if you keep sending spiders down winding paths.
Although the “budget” isn’t absolute, even for a given site, you can get a sense of Google’s crawl allocation for your site in Google Webmaster Tools (under “Diagnostics” > “Crawl Stats”):
So, what happens when Google hits so many duplicate paths and pages that it gives up for the day? Practically, the pages you want indexed may not get crawled. At best, they probably won’t be crawled as often.
The Indexation “Cap”
Similarly, there’s no set “cap” to how many pages of a site Google will index. There does seem to be a dynamic limit, though, and that limit is relative to the authority of the site. If you fill up your index with useless, duplicate pages, you may push out more important, deeper pages. For example, if you load up on 1000s of internal search results, Google may not index all of your product pages. Many people make the mistake of thinking that more indexed pages is better. I’ve seen too many situations where the opposite was true. All else being equal, bloated indexes dilute your ranking ability.
The Penalty Debate
Long before Panda, a debate would erupt every few months over whether or not there was a duplicate content penalty. While these debates raised valid points, they often focused on semantics – whether or not duplicate content caused a Capital-P Penalty. While I think the conceptual difference between penalties and filters is important, the upshot for a site owner is often the same. If a page isn’t ranking (or even indexed) because of duplicate content, then you’ve got a problem, no matter what you call it.
The Panda Update
Since Panda (starting in February 2011), the impact of duplicate content has become much more severe in some cases. It used to be that duplicate content could only harm that content itself. If you had a duplicate, it might go supplemental or get filtered out. Usually, that was ok. In extreme cases, a large number of duplicates could bloat your index or cause crawl problems and start impacting other pages.
Panda made duplicate content part of a broader quality equation – now, a duplicate content problem can impact your entire site. If you’re hit by Panda, non-duplicate pages may lose ranking power, stop ranking altogether, or even fall out of the index. Duplicate content is no longer an isolated problem.
III. Three Kinds of Duplicates
Before we dive into examples of duplicate content and the tools for dealing with them, I’d like to cover 3 broad categories of duplicates. They are: (1) True Duplicates, (2) Near Duplicates, and (3) Cross-domain Duplicates. I’ll be referencing these 3 main types in the examples later in the post.
(1) True Duplicates
A true duplicate is any page that is 100% identical (in content) to another page. These pages only differ by the URL:
(2) Near Duplicates
A near duplicate differs from another page (or pages) by a very small amount – it could be a block of text, an image, or even the order of the content:
An exact definition of “near” is tough to pin down, but I’ll discuss some examples in detail later.
(3) Cross-domain Duplicates
A cross-domain duplicate occurs when two websites share the same piece of content:
These duplicates could be either “true” or “near” duplicates. Contrary to what some people believe, cross-domain duplicates can be a problem even for legitimate, syndicated content.
IV. Tools for Fixing Duplicates
This may seem out of order, but I want to discuss the tools for dealing with duplicates before I dive into specific examples. That way, I can recommend the appropriate tools to fix each example without confusing anyone.
(1) 404 (Not Found)
Of course, the simplest way to deal with duplicate content is to just remove it and return a 404 error. If the content really has no value to visitors or search, and if it has no significant inbound links or traffic, then total removal is a perfectly valid option.
(2) 301 Redirect
Another way to remove a page is via a 301-redirect. Unlike a 404, the 301 tells visitors (humans and bots) that the page has permanently moved to another location. Human visitors seamlessly arrive at the new page. From an SEO perspective, most of the inbound link authority is also passed to the new page. If your duplicate content has a clear canonical URL, and the duplicate has traffic or inbound links, then a 301-redirect may be a good option.
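On an Apache server, for example, a one-off 301 can be set up in .htaccess with a rule along these lines (the paths and domain here are placeholders, and the exact setup depends on your server):

Redirect 301 /duplicate-page.html http://www.example.com/canonical-page.html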
(3) Robots.txt
Another option is to leave the duplicate content available for human visitors, but block it for search crawlers. The oldest and probably still easiest way to do this is with a robots.txt file (generally located in your root directory). It looks something like this:
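User-agent: *
Disallow: /duplicate-folder/
Disallow: /*?sessionid=

(The folder and parameter names are just placeholders – each Disallow line lists a path or pattern you want to keep crawlers out of. Wildcard patterns like the last line aren't part of the original robots.txt standard, but Google and Bing both support them.)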
One advantage of robots.txt is that it’s relatively easy to block entire folders or even URL parameters. The disadvantage is that it’s an extreme and sometimes unreliable solution. While robots.txt is effective for blocking uncrawled content, it’s not great for removing content already in the index. The major search engines also seem to frown on its overuse, and don’t generally recommend robots.txt for duplicate content.
(4) Meta Robots
You can also control the behavior of search bots at the page level, with a header-level directive known as the “Meta Robots” tag (or sometimes “Meta Noindex”). In its simplest form, the tag looks something like this:
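<meta name="robots" content="noindex, nofollow">

(The tag sits in the page's <head> section.)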
This directive tells search bots not to index this particular page or follow links on it. Anecdotally, I find it a bit more SEO-friendly than Robots.txt, and because the tag can be created dynamically with code, it can often be more flexible.
The other common variant for Meta Robots is the content value “NOINDEX, FOLLOW”, which allows bots to crawl the paths on the page without adding the page to the search index. This can be useful for pages like internal search results, where you may want to block certain variations (I’ll discuss this more later) but still follow the paths to product pages.
One quick note: there is no need to ever add a Meta Robots tag with “INDEX, FOLLOW” to a page. All pages are indexed and followed by default (unless blocked by other means).
(5) Rel=Canonical
In 2009, the search engines banded together to create the Rel=Canonical directive, sometimes called just “Rel-canonical” or the “Canonical Tag”. This allows webmasters to specify a canonical version for any page. The tag goes in the page header (like Meta Robots), and a simple example looks like this:
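<link rel="canonical" href="http://www.example.com/" />

(The href is a placeholder – it should point to whichever URL you consider the canonical version of the page.)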
When search engines arrive on a page with a canonical tag, they attribute the page to the canonical URL, regardless of the URL they used to reach the page. So, for example, if a bot reached the above page using the URL “www.example.com/index.html”, the search engine would not index the additional, non-canonical URL. Typically, it seems that inbound link-juice is also passed through the canonical tag.
It’s important to note that you need to clearly understand what the proper canonical page is for any given website template. Canonicalizing your entire site to just one page or the wrong pages can be catastrophic.
(6) Google URL Removal
In Google Webmaster Tools (GWT), you can request that an individual page (or directory) be manually removed from the index. Click on “Site configuration” > “Crawler access”, and you’ll see a series of 3 tabs. Click on the 3rd tab, “Remove URL”, to get this:
Since this tool only removes one URL or path at a time and is completely at Google’s discretion, it’s usually a last-ditch approach to duplicate content. I just want to be thorough, though, and cover all of your options. An important technical note: you need to 404, Robots.txt block or Meta Noindex the page before requesting removal. Removal via GWT is primarily a last defense when Google is being stubborn.
Update: In the comments, Taylor pointed out that Google lifted the requirement that you have to first block the page to request removal. Removal requests can be done without blocking via other means now, but the removals only last 90 days.
(7) Google Parameter Blocking
You can also use GWT to specify URL parameters that you want Google to ignore (which essentially blocks indexation of pages with those parameters). If you click on “Site Configuration” > “URL parameters”, you’ll get a list something like this:
This list shows URL parameters that Google has detected, as well as the settings for how those parameters should be crawled. Keep in mind that the “Let Googlebot decide” setting doesn’t reflect other blocking tactics, like Robots.txt or Meta Robots. If you click on “Edit”, you’ll get the following options:
Google changed these recently, and I find the new version a bit confusing, but essentially “Yes” means the parameter is important and should be indexed, while “No” means the parameter indicates a duplicate. The GWT tool seems to be effective (and can be fast), but I don’t usually recommend it as a first line of defense. It won’t impact other search engines, and it can’t be read by SEO tools and monitoring software. It could also be modified by Google at any time.
(8) Bing URL Removal
Bing Webmaster Center (BWC) has tools very similar to GWT’s options above. Actually, I think the Bing parameter blocking tool came before Google’s version. To request a URL removal in Bing, click on the “Index” tab and then “Block URLs” > “Block URL and Cache”. You’ll get a pop-up like this:
BWC actually gives you a wider range of options, including blocking a directory and your entire site. Obviously, that last one usually isn’t a good idea.
(9) Bing Parameter Blocking
In the same section of BWC (“Index”), there’s an option called “URL Normalization”. The name implies Bing treats this more like canonicalization, but there’s only one option – “ignore”. Like Google, you get a list of auto-detected parameters and can add or modify them:
As with the GWT tools, I’d consider the Bing versions to be a last resort. Generally, I’d only use these tools if other methods have failed, and one search engine is just giving you grief.
(10) Rel=Prev & Rel=Next
Just this year (September 2011), Google gave us a new tool for fighting a particular form of near-duplicate content – paginated search results. I’ll describe the problem in more detail in the next section, but essentially paginated results are any searches where the results are broken up into chunks, with each chunk (say, 10 results) having its own page/URL.
You can now tell Google how paginated content connects by using a pair of tags much like Rel-Canonical. They’re called Rel-Prev and Rel-Next. Implementation is a bit tricky, but here’s a simple example:
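<link rel="prev" href="http://www.example.com/search?page=2" />
<link rel="next" href="http://www.example.com/search?page=4" />

(Hypothetical URLs – these two tags would sit in the <head> of page 3 of the results.)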
In this example, the search bot has landed on page 3 of search results, so you need two tags: (1) a Rel-Prev pointing to page 2, and (2) a Rel-Next pointing to page 4. Where it gets tricky is that you’re almost always going to have to generate these tags dynamically, as your search results are probably driven by one template.
While initial results suggest these tags do work, they’re not currently honored by Bing, and we really don’t have much data on their effectiveness. I’ll briefly discuss other methods for dealing with paginated content in the next section.
(11) Syndication-Source
Note: It appears that the syndication-source tag was deprecated in June of 2012. Thanks to @WriteonPointSEO for pointing this out in the comments. The update wasn't very well announced, but it appears to be legitimate. I'll leave the section of the post intact, but please understand that this tag probably has no impact currently.
In November of 2010, Google introduced a set of tags for publishers of syndicated content. The Meta Syndication-Source directive can be used to indicate the original source of a republished article, as follows:
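<meta name="syndication-source" content="http://www.example.com/original-article.html">

(The content attribute is a placeholder – it should point to the URL of the original article.)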
Even Google’s own advice on when to use this tag versus a cross-domain canonical tag is a little bit unclear. Google launched this tag as “experimental”, and I’m not sure they’ve publicly announced a status change. It’s something to watch, but don’t rely on it.
Update (11/21/11): For even more confusion, Google has recently added the "standout" tag. This is supposed to be used when you break a news story, but the interplay between it and syndication-source is unclear. Again, I wouldn't rely on these tags for now. Thanks to SEO Workers for pointing this out in the comments.
(12) Internal Linking
It’s important to remember that your best tool for dealing with duplicate content is to not create it in the first place. Granted, that’s not always possible, but if you find yourself having to patch dozens of problems, you may need to re-examine your internal linking structure and site architecture.
When you do correct a duplication problem, such as with a 301-redirect or the canonical tag, it’s also important to make your other site cues reflect that change. It’s amazing how often I see someone set a 301 or canonical to one version of a page, and then continue to link internally to the non-canonical version and fill their XML sitemap with non-canonical URLs. Internal links are strong signals, and sending mixed signals will only cause you problems.
(13) Don’t Do Anything
Finally, you can let the search engines sort it out. This is what Google recommended you do for years, actually. Unfortunately, in my experience, especially for large sites, this is almost always a bad idea. It’s important to note, though, that not all duplicate content is a disaster, and Google certainly can filter some of it out without huge consequences. If you only have a few isolated duplicates floating around, leaving them alone is a perfectly valid option.
(14) Rel="alternate" hreflang="x"
(Added on 04/02/12 - hat tip to @YuriKolovsky). Since this post was published, Google introduced a new way of dealing with translated content and same-language content with regional variations (such as US English vs UK English). Implementation of these tags is complex and very situational, but here's a complete write-up on the hreflang="x" attribute.
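As a bare-bones illustration (the URLs are placeholders, and a real implementation needs the full set of annotations on every version of the page), the tags look something like this:

<link rel="alternate" hreflang="en-us" href="http://www.example.com/us/" />
<link rel="alternate" hreflang="en-gb" href="http://www.example.com/uk/" />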
V. Examples of Duplicate Content
So, now that we’ve worked backwards and sorted out the tools for fixing duplicate content, what does it actually look like in the wild? I’m going to cover a wide range of examples that represent the issues you can expect on a real website. Throughout this section, I’ll reference the solutions listed in Section IV – for example, a reference to a 301-redirect will cite (IV-2).
(1) “www” vs. Non-www
For sitewide duplicate content, this is probably the biggest culprit. Whether you’ve got bad internal paths or have attracted links and social mentions to the wrong URL, you’ve got both the “www” version and non-www (root domain) version of your URLs indexed:
Most of the time, a 301-redirect (IV-2) is your best choice here. This is a common problem, and Google is good about honoring redirects for cases like these.
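If you’re on Apache with mod_rewrite, for instance, a sitewide rule along these lines (with example.com as a placeholder) will push the non-www version over to “www”:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]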
You may also want to set your preferred address in Google Webmaster Tools. Under “Site Configuration” > “Settings”, you should see a section called “Preferred domain”:
There’s a quirk in GWT where, to set a preferred domain, you may have to create GWT profiles for both your “www” and non-www versions of the site. While this is annoying, it won’t cause any harm. If you’re having major canonicalization issues, I’d recommend it. If you’re not, then you can leave well enough alone and let Google determine the preferred domain.
(2) Staging Servers
While much less common than (1), this problem is often also caused by subdomains. In a typical scenario, you’re working on a new site design for a relaunch, your dev team sets up a subdomain with the new site, and they accidentally leave it open to crawlers. What you end up with is two sets of indexed URLs that look something like this:
Your best bet is to prevent this problem before it happens, by blocking the staging site with Robots.txt (IV-3). If you find your staging site indexed, though, you’ll probably need to 301-redirect (IV-2) those pages or Meta Noindex them (IV-4).
(3) Trailing Slashes ("/")
This is a problem people often have questions about, although it's less of an SEO issue than it once was. Technically, in the original HTTP protocol, a URL with a trailing slash and one without it were different URLs. Here's a simple example:
These days, almost all browsers automatically add the trailing slash behind the scenes and resolve both versions the same way. Matt Cutts did a recent video suggesting that Google automatically canonicalizes these URLs in "the vast majority of cases".
(4) Secure (https) Pages
If your site has secure pages (designated by the “https:” protocol), you may find that both secure and non-secure versions are getting indexed. This most frequently happens when navigation links from secure pages – like shopping cart pages – also end up secured, usually due to relative paths, creating variants like this:
Ideally, these problems are solved by the site-architecture itself. In many cases, it’s best to Noindex (IV-4) secure pages – shopping cart and check-out pages have no place in the search index. After the fact, though, your best option is a 301-redirect (IV-2). Be cautious with any sitewide solutions – if you 301-redirect all “https:” pages to their “http:” versions, you could end up removing security entirely. This is a tricky problem to solve and should be handled carefully.
(5) Home-page Duplicates
While problems (1)-(3) can all create home-page duplicates, the home-page has a couple unique problems of its own. The most typical problem is that both the root domain and the actual home-page document name get indexed. For example:
Although this problem can be solved with a 301-redirect (IV-2), it’s often a good idea to put a canonical tag on your home-page (IV-5). Home pages are uniquely afflicted by duplicates, and a proactive canonical tag can prevent a lot of problems.
Of course, it’s important to also be consistent with your internal paths (IV-12). If you want the root version of the URL to be canonical, but then link to “/index.htm” in your navigation, you’re sending mixed signals to Google every time the crawlers visit.
(6) Session IDs
Some websites (especially e-commerce platforms) tag each new visitor with a tracking parameter. On occasion, that parameter ends up in the URL and gets indexed, creating something like this:
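www.example.com/product.php?id=1234
www.example.com/product.php?id=1234&sessionid=5XZ93AB7

(Hypothetical URLs – the second is the same product page with a session ID appended.)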
That example really doesn’t do the problem justice, because in reality you can end up with a duplicate for every single session ID and page combination that gets indexed. Session IDs in the URL can easily add 1000s of duplicate pages to your index.
The best option, if possible on your site/platform, is to remove the session ID from the URL altogether and store it in a cookie. There are very few good reasons to create these URLs, and no reason to let bots crawl them. If that’s not feasible, implementing the canonical tag (IV-5) sitewide is a good bet. If you really get stuck, you can block the parameter in Google Webmaster Tools (IV-7) and Bing Webmaster Central (IV-9).
(7) Affiliate Tracking
This problem looks a lot like (6) and happens when sites provide a tracking variable to their affiliates. This variable is typically appended to landing page URLs, like so:
The damage is usually a bit less extreme than (6), but it can still cause large-scale duplication. The solutions are similar to session IDs. Ideally, you can capture the affiliate ID in a cookie and 301-redirect (IV-2) to the canonical version of the page. Otherwise, you’ll probably either need to use canonical tags (IV-5) or block the affiliate URL parameter.
(8) Duplicate Paths
Having duplicate paths to a page is perfectly fine, but when duplicate paths generate duplicate URLs, then you’ve got a problem. Let’s say a product page can be reached one of 3 ways:
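www.example.com/computers/ipad2
www.example.com/tablets/ipad2
www.example.com/tag/favorites/ipad2

(Hypothetical URLs – the first two come from category navigation, the third from a user-generated tag.)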
Here, the iPad2 product page can be reached by 2 categories and a user-generated tag. User-generated tags are especially problematic, because they can theoretically spawn unlimited versions of a page.
Ideally, these path-based URLs shouldn’t be created at all. However a page is navigated to, it should only have one URL for SEO purposes. Some will argue that including navigation paths in the URL is a positive cue for site visitors, but even as someone with a usability background, I think the cons almost always outweigh the pros here.
If you already have variations indexed, then a 301-redirect (IV-2) or canonical tag (IV-5) are probably your best options. In many cases, implementing the canonical tag will be easier, since there may be too many variations to easily redirect. Long-term, though, you’ll need to re-evaluate your site architecture.
(9) Functional Parameters
Functional parameters are URL parameters that change a page slightly but have no value for search and are essentially duplicates. For example, let’s say that all of your product pages have a printable version, and that version has its own URL:
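www.example.com/product.php?id=1234
www.example.com/product.php?id=1234&print=1

(Hypothetical URLs – the second is the printable version of the same product page.)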
Here, the “print=1” URL variable indicates a printable version, which normally would have the same content but a modified template. Your best bet is to not index these at all, with something like a Meta Noindex (IV-4), but you could also use a canonical tag (IV-5) to consolidate these pages.
(10) International Duplicates
These duplicates occur when you have content for different countries which share the same language, all hosted on the same root domain (it could be subfolders or subdomains). For example, you may have an English version of your product pages for the US, UK, and Australia:
Unfortunately, this one’s a bit tough – in some cases, Google will handle it perfectly well and rank the appropriate content in the appropriate countries. In other cases, even with proper geo-targeting, they won’t. It’s often better to target the language itself than the country, but there are legitimate reasons to split off country-specific content, such as pricing.
If your international content does get treated as duplicate content, there’s no easy answer. If you 301-redirect, you lose the page for visitors. If you use the canonical tag, then Google will only rank one version of the page. The “right” solution can be highly situational and really depends on the risk-reward tradeoff (and the scope of the filter/penalty).
(11) Search Sorts
So far, all of the examples I’ve given have been true duplicates. I’d like to dive into a few examples of “near” duplicates, since that concept is a bit fuzzy. A few common examples pop up with internal search engines, which tend to spin off many variants – sortable results, filters, and paginated results being the most frequent problems.
Search sort duplicates pop up whenever a sort (ascending/descending) creates a separate URL. While the two sorted results are technically different pages, they add no additional value to the search index and contain the same content, just in a different order. URLs might look like:
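www.example.com/search.php?q=ipad&sort=asc
www.example.com/search.php?q=ipad&sort=desc

(Hypothetical URLs – the same result set, just sorted in opposite directions.)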
In most cases, it’s best just to block the sortable versions completely, usually by adding a Meta Noindex (IV-4) selectively to pages called with that parameter. In a pinch, you could block the sort parameter in Google Webmaster Tools (IV-7) and Bing Webmaster Central (IV-9).
(12) Search Filters
Search filters are used to narrow an internal search – it could be price, color, features, etc. Filters are very common on e-commerce sites that sell a wide variety of products. Search filter URLs look a lot like search sorts, in many cases:
The solution here is similar to (11) – don’t index the filters. As long as Google has a clear path to products, indexing every variant usually causes more harm than good.
(13) Search Pagination
Pagination is an easy problem to describe and an incredibly difficult one to solve. Any time you split internal search results into separate pages, you have paginated content. The URLs are easy enough to visualize:
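www.example.com/search.php?q=ipad&page=1
www.example.com/search.php?q=ipad&page=2
www.example.com/search.php?q=ipad&page=3

(Hypothetical URLs – one URL per chunk of the same search.)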
Of course, over 100s of results, one search can easily spin out dozens of near duplicates. While the results themselves differ, many important features of the pages (Titles, Meta Descriptions, Headers, copy, template, etc.) are identical. Add to that the problem that Google isn’t a big fan of “search within search” (having their search results land on your search results pages).
In the past, Google has said to let them sort pagination out – problem is, they haven’t done it very well. Recently, Google introduced Rel=Prev and Rel=Next (IV-10). Initial data suggests these tags work, but we don’t have much data, they’re difficult to implement, and Bing doesn’t currently support them.
You have 3 other, viable options (in my opinion), although how and when they’re viable depends a lot on the situation:
- You can Meta Noindex,Follow pages 2+ of search results. Let Google crawl the paginated content but don’t let them index it.
- You can create a “View All” page that links to all search results at one URL, and let Google auto-detect it. This seems to be Google’s other preferred option.
- You can create a “View All” page and set the canonical tag of paginated results back to that page. This is unofficially endorsed, but the pages aren’t really duplicates in the traditional sense, so some claim it violates the intent of Rel-canonical.
Adam Audette has a recent, in-depth discussion of search pagination that I highly recommend. Pagination for SEO is a very difficult topic and well beyond the scope of this post.
(14) Product Variations
Product variant pages are pages that branch off from the main product page and only differ by one feature or option. For example, you might have a page for each color a product comes in:
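www.example.com/ipad2/black/
www.example.com/ipad2/white/

(Hypothetical, parameter-free URLs – one per color option of the same product.)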
It can be tempting to want to index every color variation, hoping it pops up in search results, but in most cases I think the cons outweigh the pros. If you have a handful of product variations and are talking about dozens of pages, fine. If product variations spin out into 100s or 1000s, though, it’s best to consolidate. Although these pages aren’t technically true duplicates, I think it’s ok to Rel-canonical (IV-5) the options back up to the main product page.
One side note: I purposely used “static” URLs in this example to demonstrate a point. Just because a URL doesn’t have parameters, that doesn’t make it immune to duplication. Static URLs (parameter-free) may look prettier, but they can be duplicates just as easily as dynamic URLs.
(15) Geo-keyword Variations
Once upon a time, “local SEO” meant just copying all of your pages 100s of times, adding a city name to the URL, and swapping out that city in the page copy. It created URLs like these:
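www.example.com/seo-services-chicago
www.example.com/seo-services-boston
www.example.com/seo-services-miami

(Hypothetical URLs – same template, same copy, with only the city name swapped out.)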
In 2011, not only is local SEO a lot more sophisticated, but these pages are almost always going to look like near-duplicates. If you have any chance of ranking, you’re going to need to invest in legitimate, unique content for every geographic region you spin out. If you aren’t willing to make that investment, then don’t create the pages. They’ll probably backfire.
(16) Other “Thin” Content
This isn’t really an example, but I wanted to stop and explain a word we throw around a lot when it comes to content: “thin”. While thin content can mean a variety of things, I think many examples of thin content are near-duplicates like (14) above. Whenever you have pages that vary by only a tiny percentage of content, you risk those pages looking low-value to Google. If those pages are heavy on ads (with more ads than unique content), you’re at even more risk. When too much of your site is thin, it’s time to revisit your content strategy.
(17) Syndicated Content
These last 3 examples all relate to cross-domain content. Here, the URLs don’t really matter – they could be wildly different. Examples (17) and (18) only differ by intent. Syndicated content is any content you use with permission from another site. However you retrieve and integrate it, that content is available on another site (and, often, many sites).
While syndication is legitimate, it’s still likely that one or more copies will get filtered out of search results. You could roll the dice and see what happens (IV-13), but conventional SEO wisdom says that you should link back to the source and probably set up a cross-domain canonical tag (IV-5). A cross-domain canonical looks just like a regular canonical, but with a reference to someone else’s domain.
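For example (with placeholder domains), the syndicated copy on your site would carry something like:

<link rel="canonical" href="http://www.originalpublisher.com/original-article.html" />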
Of course, a cross-domain canonical tag means that, assuming Google honors the tag, your page won’t get indexed or rank. In some cases, that’s fine – you’re using the content for its value to visitors. Practically, I think it depends on the scope. If you occasionally syndicate content to beef up your own offerings but also have plenty of unique material, then link back and leave it alone. If a larger part of your site is syndicated content, then you could find yourself running into trouble. Unfortunately, using the canonical tag (IV-5) means you'll lose the ranking ability of that content, but it could keep you from getting penalized or having Panda-related problems.
(18) Scraped Content
Scraped content is just like syndicated content, except that you didn’t ask permission (and might even be breaking the law). The best solution: QUIT BREAKING THE LAW!
Seriously, no de-duping solution is going to satisfy the scrapers among you, because most solutions will knock your content out of ranking contention. The best you can do is pad the scraped content with as much of your own, unique content as possible.
(19) Cross-ccTLD Duplicates
Finally, it’s possible to run into trouble when you copy same-language content across countries – see example (10) above – even with separate Top-Level Domains (TLDs). Fortunately, this problem is fairly rare, but we see it with English-language content and even with some European languages. For example, I frequently see questions about Dutch content on Dutch and Belgian domains ranking improperly.
Unfortunately, there’s no easy answer here, and most of the solutions aren’t traditional duplicate-content approaches. In most cases, you need to work on your targeting factors and clearly show Google that the domain is tied to the country in question.
VI. Which URL Is Canonical?
I’d like to take a quick detour to discuss an important question – whether you use a 301-redirect or a canonical tag, how do you know which URL is actually canonical? I often see people making a mistake like this:
The problem is that “product.php” is just a template – you’ve now collapsed all of your products down to a single page (that probably doesn’t even display a product). In this case, the canonical version probably includes a parameter, like “id=1234”.
The canonical page isn’t always the simplest version of the URL – it’s the simplest version of the URL that generates UNIQUE content. Let’s say you have these 3 URLs that all generate the same product page:
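www.example.com/product.php?id=1234
www.example.com/product.php?id=1234&print=1
www.example.com/product.php?id=1234&sessionid=5XZ93AB7

(Hypothetical URLs, to match the earlier examples.)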
Two of these versions are essentially duplicates, and the “print” and “session” parameters represent variations on the main product page that should be de-duped. The “id” parameter is essential to the content, though – it determines which product is actually being displayed.
So, consider yourself warned. As much trouble as rampant duplicates can be, bad canonicalization can cause even more damage in some cases. Plan carefully, and make absolutely sure you select the correct canonical versions of your pages before consolidating them.
VII. Tools for Diagnosing Duplicates
So, now that you recognize what duplicate content looks like, how do you go about finding it on your own site? Here are a few tools to get you started – I won’t claim it’s a complete list, but it covers the bases:
(1) Google Webmaster Tools
In Google Webmaster Tools, you can pull up a list of duplicate TITLE tags and Meta Descriptions Google has crawled. While these don’t tell the whole story, they’re a good starting point. Many URL-based duplicates will naturally generate identical Meta data. In your GWT account, go to “Diagnostics” > “HTML Suggestions”, and you’ll see a table like this:
You can click on “Duplicate meta descriptions” and “Duplicate title tags” to pull up a list of the duplicates. This is a great first stop for finding your trouble-spots.
(2) Google’s Site: Command
When you already have a sense of where you might be running into trouble and need to take a deeper dive, Google’s “site:” command is a very powerful and flexible tool. What really makes “site:” powerful is that you can use it in conjunction with other search operators.
Let’s say, for example, that you’re worried about home-page duplicates. To find out if Google has indexed any copies of your home-page, you could use the “site:” command with the “intitle:” operator, like this:
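site:example.com intitle:"Your Home Page Title"

(The domain and title are placeholders – swap in your own root domain and home-page title.)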
Put the title in quotes to capture the full phrase, and always use the root domain (leave off “www”) when making a wide sweep for duplicate content. This will detect both “www” and non-www versions.
Another powerful combination is “site:” plus the “inurl:” operator. You could use this to detect parameters, such as the search-sort problem mentioned above:
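site:example.com inurl:sort

(Again, placeholders – “sort” stands in for whichever parameter you’re checking on.)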
The “inurl:” operator can also detect the protocol used, which is handy for finding out whether any secure (https:) copies of your pages have been indexed:
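site:example.com inurl:https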
You can also combine the “site:” operator with regular search text, to find near-duplicates (such as blocks of repeated content). To search for a block of content across your site, just include it in quotes:
I should also mention that searching for a unique block of content in quotes is a cheap and easy way to find out if people have been scraping your site. Just leave off the “site:” operator and search for a long or unique block entirely in quotes.
Of course, these are just a few examples, but if you really need to dig deep, these simple tools can be used in powerful ways. Ultimately, the best way to tell if you have a duplicate content problem is to see what Google sees.
(3) SEOmoz Campaign Manager
If you’re an SEOmoz PRO member, you have access to some additional tools for spotting duplicates in your Campaigns. In addition to duplicate page titles, the Campaign manager will detect duplicate content on the pages themselves. You can see duplicate pages we’ve detected from the Campaign Overview screen:
Click on the “Duplicate Page Content” link and you’ll not only see a list of potential duplicates, but you’ll get a graph of how your duplicate count has changed over time:
The historical graph can be very useful for determining if any recent changes you’ve made have created (or resolved) duplicate content issues.
Just a technical note, since it comes up a lot in Q&A – Our system currently uses a threshold of 95% to determine whether content is duplicated. This is based on the source code (not the text copy), so the amount of actual duplicate content may vary depending on the code/content ratio.
(4) Your Own Brain
Finally, it’s important to remember to use your own brain. Finding duplicate content often requires some detective work, and over-relying on tools can leave some gaps in what you find. One critical step is to systematically navigate your site to find where duplicates are being created. For example, does your internal search have sorts and filters? Do those sorts and filters get translated into URL variables, and are they crawlable? If they are, you can use the “site:” command to dig deeper. Even finding a handful of trouble spots using your own sleuthing skills can end up revealing 1000s of duplicate pages, in my experience.
I Hope That Covers It
If you’ve made it this far: congratulations – you’re probably as exhausted as I am. I hope that covers everything you’d want to know about the state of duplicate content in 2011, but if not, I’d be happy to answer questions in the comments. Dissenting opinions are welcome, too. Some of these topics, like pagination, are extremely tricky in practice, and there’s often not one “right” answer. Finally, if you liked my panda mini-poster, here’s a link to a larger version of Pandas Take No Prisoners.
Update: Post-publication, a handful of people requested a stand-alone PDF version of the post. You can download it here (22 pages, 560KB).
Woah!
While I was reading this guide:
Thanks Peter for this titanic effort of classifying the different kinds of duplicate content – a topic that seems so easy to understand but gets so confusing at times, as you (and I) have seen from the tons of questions about it in the Q&A.
Nice job Gianluca,
Epic comment to match an epic post!
Sha
LOL - nice :) Imagine what happened while I wrote it.
Lol - delightful analogies Gianluca!
Haha!
Duplicate Content 101 with Dr Pete... You just gave me 723 more reasons to continue educating our clients on the importance of [unique] content.
Thanks for all of the time you spent in the research and in putting this together in an easy-to-understand format, Dr. Pete! I know I'll be using it and sharing it a lot in Q&A. This is one of those posts that will wind up on the most-popular list for sure.
Dr Pete, thank you so much. I've been looking for a thorough post on duplicate content for a while. Also, for those SEO Excel ninjas, the TEXT TO COLUMN command works well for slicing up your internal link data, especially if you're looking for duplicate content. For instance, just change the delimiter to "/" and you'll be able to sort http for https with ease. Thanks for the post Doc.
Dr. Pete,
Wow, this is an incredible resource! This has definitely become one of my bookmarked pages and a post that I'll be referencing to my clients whenever they have any duplicate content concerns.
Something to consider: regarding syndicated content, we found that if the original content is behind a paywall, then the situation actually becomes a little bit different. This was a special exception for one of my clients who wanted to know whether syndicated content was worth the expense. Even though Google had crawled and indexed the original content, my client's content still ranked higher because it wasn't behind a paywall. However, we still recommended that they place a cross-domain canonical on their syndicated content, since it would likely improve the overall authority of the site and demonstrates their trustworthiness to search engines.
Other than that, really appreciated your advice about how to find duplicate content via search operators, especially not using www. Also, I agree that perusing Google Webmaster Tools when analyzing a new site can bring incredible insights. For instance, I would suggest using Google Webmaster Tools to find which sites link back to your site. This has helped me find tons of duplicate content, especially when clients own other domains and build a link network between these almost identical sites because they think it will help them take over the search results.
Overall, sometimes less is more.
Excellent tip about cross-domain duplicate content.
Agreed. It's obvious but useful for someone.
Great tip - I admit I didn't cover syndication in great detail here, and it really is a complex topic. That was one of the biggest challenges of writing the post - at least half the scenarios I discuss have many incarnations and the "right" solution can be very situational.
Hi Stephanie, Dr Pete,
I am doing the SEO for a large hotel price comparison aggregator site which is chock full of syndicated content and has a duplicate content penalty. I was thinking of using noindex, but I like your logic and am going to follow your lead by adding the syndication-source tag to give credit for the hotel descriptions to the booking sites they came from. I am hoping that it will make the site more trustworthy and improve rankings for the pages that do have original content. Do you think it should work?
As I read up on the tag, I noticed that on 2/11/11 Google added a note to the "credit where credit is due" article stating "we’ve updated our system to use rel=canonical instead of syndication-source, if both are specified". This seems to indicate that Google considers rel=canonical to have a similar effect to syndication-source. Following this through, it seems to me that the following scenario will look to Google like an attempt to claim original authorship, and this could be part of the hotel comparison site's current woes. What do you think?
domain.com/hotel-123.html publishing syndicated content from another hotel site but using
Thanks
Ben
Wow Dr Pete!
First I have to say thank you for pressing on and covering the whole range of duplicate content issues and solutions! Half doing the job would have just lead to more confusion.
This post will be an especially outstanding resource for those of us who spend time in Q&A. To be honest, all I want for Christmas is a pdf download link so I can print it for use as a quick reference!
This one was a huge commitment in time & effort. Thank you for being generous enough to do all of that for the community. Very much appreciated.
Sha
@Sha Menz
You are right. Yesterday I added my question on SEOmoz, and today I got a Christmas gift from Dr. Pete. I was going to tweet you my question today, but now it's all covered on one platform with this rock-solid blog post. Duplicate content will be standing at the exit door of the website. :)
You upset my schedule for the day with that long post, but it was worth it :-)
This is one of my favorite blog posts of the year - a real comprehensive guide!
One addition to the tools for fixing duplicates: the X-Robots-Tag, e.g. for PDFs.
You know, I pondered X-robots, but I was really tired :) Seriously, that is an important new tool. I'd like to see a post from someone who's used it a few different ways, because I've only dabbled at this point. It's a bit trickier to set up than most of the traditional solutions.
Pete -
Wow. Just wow. This is an incredible amount of content and I really hope people take the time to digest it and dig in to fix their duplicate content issues. I am constantly amazed at how many sites have www/non-www/index.html/index.aspx issues.
One you forgot to mention, I think: sites done in ASPX will still return a 200 status code regardless of the case (upper or lowercase) in the URL. If you use ASPX on your site, this is something to watch for, and there is a strong argument for using a rel=canonical tag here to deal with it.
Amazing post. I will reference it frequently.
Hey John,
That shiny new badge looks good on you! :)
congrats
Sha
It's been there for a while, but thanks :-)
For my sanity, I try to pretend that .NET doesn't exist :) Someone could write another post this long (maybe two) about weird duplicate content it creates.
I might adopt that as general policy; being an SEO in a .NET agency with an enterprise CMS is slightly challenging to say the least!
It's quite annoying, I agree Pete. Sometimes I wonder if Google might realize this and therefore we don't have to worry about compensating for it. But now we get into "This is just a guess" and "Let Google figure it out", when we know that more often than not they get it wrong.
I can't believe this post has one thumbs down. Please tell me that was an accident. Otherwise, whoever you are, you should be ashamed of yourself. This is hands down one of the most valuable SEO resources online.
First of all, great article Dr. Pete. I think this will be a good guide for solving and preventing duplicate content issues for many of us.
About meta robots: I would like to add that I always prefer NOINDEX, FOLLOW over NOINDEX, NOFOLLOW, because using FOLLOW will not only allow the bots to crawl the links on that specific page, as you write, but I believe it will also pass important link juice (that may be pointing to the page) on to the links on that specific page.
It depends – and as an example of when Noindex, Nofollow is the right choice, here is a real case I had to deal with.
A site I worked on had – due to bad programming – a horrible case of substantially duplicated content (where pages are not 100% identical, but are similar enough overall that Google considers them duplicates). In fact, the filter system they created was automatically generating 64*64 new URLs, 70% of which were paginated. You can imagine the crawl problems Googlebot was having... such that Google itself sent an email to my client's Webmaster Tools profile saying: "Hey dude, with all those URLs you're driving me crazy".
This issue meant that more important pages – category pages included!! – were not crawled regularly, with the result that my client's site was literally ranking in position 2 one day, position 10 the next, position 7 the day after that, and so on for important category-related keywords.
Because of all of this, we finally decided to tell Google not to index any of the faceted navigation (and its related paginated pages), so as not to overload the crawler. The results: rankings are getting stable, and product pages that Googlebot had not been able to find – because it was spending all of its crawl budget – were finally indexed and are ranking.
Thank you Gianluca, for your good input and a great example of a case where it's not the right choice.
You are absolutely right that it can come down to complex situations. I didn't reflect much on crawl budget here, but wanted to highlight the belief that FOLLOW passes link juice! :)
You're welcome...
Generally, I think you're right - FOLLOW is going to be safer than NOFOLLOW. I think the one situation I use NOFOLLOW consistently is if you know you've reached the end of a path. For example, I might Meta NOINDEX,NOFOLLOW shopping cart pages, because everything "below" them should be NOINDEX'ed as well.
Now, you could argue that, given the recursive nature of PageRank calculations, that you're blocking the flow of internal PR back up (to navigation, etc.). I suspect that's negligible and that what you save from crawler fatigue outweighs what you lose in PR-passing, but I can't prove that. These things are nearly impossible to measure precisely.
Really fantastic post Pete - I'm a big fan of the crawl "budget" terminology. It's a concept that is often lost on people, and a great way to explain it. Kudos.
One bit of duplicate content missed - case sensitivity...
It actually shocks me that Google has issues with 'www' vs. non-'www', that Google IS case-sensitive, and that spaces in URLs (%20 vs. +) cause trouble. I am sure a simple algorithm could be written that says if the site is running on Microsoft's stack, then it is not case-sensitive.
My currently preferred duplicate content tool is Dan Sharp's Screaming Frog SEO Spider plus Excel!
Have sent this guide through to everyone!!!
I was torn on both trailing slash issues and case-sensitivity, because they're just so inconsistent these days. While I avoid mixed-case URLs, they sometimes cause no problems. Then, once in a while, boom - a bunch of problems.
This is a case where solid, site-wide canonical tags can prevent issues. I only hesitate to recommend site-wide canonical because so many people implement them wrong.
I would have said (based on Google's advice) that wrong canonical tags wouldn't be too much of a problem... but having seen in practice cases where a canonical tag points to a 404 or similar and those pages end up not being indexed, I am loath to trust Google...! Anything invisible – canonical tags, sitemaps, headers – is more often implemented wrong than right... which is one of the reasons I love the SEOmoz custom crawl!
+1ing the excellent post Dr Pete! I personally can't wait for the day when Google's algorithm can pick up on poor quality spun content that doesn't make any sense and flag it as spam immediately.
I know the rules of search are ever-changing, but for now let's call this what it is: the definitive guide to duplicate content issues. Thanks, Hulk;)
I enjoyed reading this article – thanks for the update. Google Panda is really shaking things up in the field of SEO. Duplicate content should be removed from the site for better performance: avoid it, aim for fresh, quality content, and update regularly – that's what we try to do.
Please update the article and add the point.
(10) International Duplicates hreflang="x"
https://support.google.com/webmasters/bin/answer.py?hl=en&answer=189077
it is supported now, and would save noobs like me some time researching after reading the long and informative article.
Thanks for the reminder. I don't usually add things to older posts, but since this one was definitely intended as a reference, I decided that you're right. I've added hreflang="x" as (14) at the end of the Tools section.
I've had this problem with clients that I do design work for. They provide the content, but a lot of times it turns out to be content just copied from the competition. So I have to work with them to do some original content.
Hey Dr. Pete - Google actually allows you to remove pages that are still accessible via the removal tool in Webmaster tools. I would agree that it's definitely a best practice and redundant to do a URL removal that can still be accessed, but a few months ago they opted to allow it - https://googlewebmastercentral.blogspot.com/2011/05/easier-url-removals-for-site-owners.html.
This is in reference to section IV, number 6.
I would probably only find this useful, however, if I had blocked something like a search results page in the robots.txt and didn't want to wait for Google to go back through and re-index/remove those URLs. Not sure why you'd want to remove a 200 page (outside of testing) that could get re-indexed soon after.
Thanks - I missed that update. I'll add a note to the post ASAP.
This is one of those landmark posts that's going to be referenced for years :D
Dr. Pete, Just asking - how long did it take you to put this together?
Indeed! This is the great example of link worthy content, i must say!
You know, I don't track very well, but maybe 20 hours? I find that's kind of a magic number for me.
Wow. That's called dedication.
+1 for Dr. Pete.
This is the single best practical guide to Panda that I have come across. It was definitely worth the time it took to read and digest. I am glad to report it has done more to ease my mind than to create more stress.
Thanks for all this effort and work you've done, Dr. Pete. And thanks for explaining it so well and making it so easy to understand!!
Fantastic post... pity Google are not doing anything about duplicate content even after they have been informed many times!
I have written a detailed post on the topic and would be interested in others in similar situations? - https://www.my-beautiful-life.com.au/business/google-doing-nothing-about-duplicate-websites/
Wow, in Germany I would say: "Dieser Artikel ist der Hammer" - this article is very impressive. Very detailed and very pro. It really covers most aspects of duplicate content.
I like this very much.
How about mobile sites?
That's a very good question, and definitely an oversight on my part. I'm not a mobile expert, by a long shot, but the common subdomain situation (like "m.example.com") is probably worth adding. Google has gotten better about it, but there are still plenty of people running into trouble.
Fantastic article, but I still think zombies are far more dangerous than Pandas.
When Google unleashes the "Zombie" update in 2012, I know who to blame ;)
Epic Post is EPIC :)
I like section VII (4) where you discuss "Your Own Brain" - often overlooked by many
Will be pointing others to this post for sure ;)
When I was creating multiple websites that were geo-specific, I had to make sure the content was unique so none of the websites would be considered duplicate content by Google.
Comparing hundreds of pages of copy to make sure they are all considerably different is not easy. I used https://www.duplicatecontent.net/ and https://jetchecker.com/, which gave me a percentage of how similar the content is between multiple pages.
Hopefully these resources will help you avoid duplicate content penalties like it did for me!
Thanks Derek- Great resources! I always dig deep into the comments on these articles for THIS very reason.
Cheers for this Pete, certainly a lot of effort's gone into writing it... thankfully we've made good use here of the canonical tag on clients sites (mostly ecommerce) - where they've unintentionally duplicated data - to good effect. :)
Wow Dr. Pete. Just an incredible amount of work baked into this post. Despite the irony that this comment may qualify as a complete duplicate of the others preceding it, I just wanted to thank you on behalf of SEO students everywhere for putting it together.
Fantastic post Dr. Pete. I actually just managed to resolve a lot of duplicate content issues on our site (various types of internal search pages for the most part) with the addition of the noindex tag, so the timing is perfect. This is giving me even more ideas of content that could be removed. I'm looking forward to seeing the results of trimming down our index. Thanks again for all the work that went into this...
@Dr. Pete
I can say that this blog post comes at the right time for me. Yesterday I asked a question in the SEOmoz Q&A section about how to fix an issue with URL parameters. I got a quick answer from Alan Mosley, but my concern is with getting all of the pages associated with search pagination indexed, as follows:
https://www.vistastores.com/table-lamps
https://www.vistastores.com/table-lamps?p=2
https://www.vistastores.com/table-lamps?p=3
https://www.vistastores.com/table-lamps?p=4
https://www.vistastores.com/table-lamps?p=5
Right now, I have blocked all of these pages via robots.txt with the following syntax:
Disallow: /*?p=
Honestly, I am not happy with this solution, but I don't want to go with rel=canonical or noindex,follow either.
Now it's my brain's turn, as per Dr. Pete's recommendation. I have found one approach that I want to share here, which may solve the issue with search pagination.
I checked the search pagination on the Mozilla Add-ons reviews for Google Global:
https://addons.mozilla.org/en-US/firefox/addon/google-global/reviews/?page=1
https://addons.mozilla.org/en-US/firefox/addon/google-global/reviews/?page=2
Both pages are indexed by Google and show up for different search queries with the required snippets from the content.
In this example, the meta information is the same on all of the paginated pages.
Another good example, from the SEO Chat forums:
https://forums.seochat.com/google-optimization-7/the-sticky-for-meta-description-keywords-keyword-density-and-title-199983.html
https://forums.seochat.com/google-optimization-7/the-sticky-for-meta-description-keywords-keyword-density-and-title-199983-2.html
Google still has no issue indexing all of the paginated search pages.
In this example, the title tag is different from the first page's; they added a small variation at the beginning of the title tag.
I double-checked with the SEOmoz tools; neither website added rel=canonical or a NOINDEX,FOLLOW meta tag.
I would like to follow a similar strategy on my ecommerce website, because all of my paginated search pages contain unique products. So why should I give up impressions for long-tail keywords?
I have another reason to trust this method: high-authority, well-branded sites like Mozilla and the SEO Chat forum follow this strategy rather than relying on the canonical tag or meta robots.
Now I'm looking forward to Dr. Pete's input, or ideas from anyone else, because my Webmaster Tools shows 6,751 pages restricted by robots.txt, and I know my website doesn't contain that much duplication. Again, I've used my brain here to drill down into a few websites. Dr. Pete, what do you think about it?
I recall my advice was to use a canonical tag if the pages are in fact duplicates, and to do nothing if they are not. If it is just the titles and descriptions, then I am not sure the work and complexity of altering them for each facet is worth it. Something I should have added is that one would assume you have a landing page that is going to rank better than all of these product pages, and that may be where you should put your time and effort.
You raise an important point - there are sites where the search results are the content. Some directory or affiliate sites, for example, may only link out to outside sites and not have their own "product" layer. In that case, pagination is a trickier issue. Those search pages may be your bread and butter. For most of us, it's the deeper pages that count.
Honestly, I agree with you, but I have a mindset of trying to fix each and every issue that GWT or the SEOmoz tools show me. Duplicate title tags are one of the issues I see in GWT, and that's my main concern. Every micro-observation and edit may help me toward better performance. Thanks again for staying with me on my comment.
I have to agree with you. If it were me, I would do the same; I don't like any loose ends.
Actually, Dr. Pete listed the solution to your question: the use of rel="next"/"prev" in the paginated content.
This is also the solution Google itself recommends in these cases.
There's certainly no official word from Google that says you can't let all paginated search pages be indexed. I find, though, that people wildly overestimate the value of these pages. Landing a visitor on Page 7 of a search has virtually no SEO value, IMO. Page 1 of any major search will have your core category keywords and capture most of the SEO value.
Here's the bigger problem, though - what if those 6,700 pages "wear out" the crawlers to the point that your actual product pages don't get crawled? Now, you're sacrificing high-value, high-conversion pages for low-value internal search pages. Practically, I see this happen too often. I solved this problem for one client long before Panda, and saw their search traffic triple over the next 2 months. Paginated content and other duplicates were keeping Google from crawling their most important content.
I'd also say - in general - that just because a site is high-authority doesn't mean you should take your technical SEO cues from them. Many reputable sites have sub-optimal on-page SEO. Because these sites are high authority, they can sometimes get away with things you can't.
One final word, though - don't use Robots.txt for pagination. You may end up blocking the crawl paths. Meta NOINDEX,FOLLOW will keep the pages out but the crawl paths open.
Again, you can leave them open and see what happens. You may be fine. Most of the time, though, my experience is that controlling your index has significant benefits. Since Panda, those benefits (and the risks of letting your index grow out of control) have only increased.
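If it helps, here's a minimal sketch of that approach (assuming you can edit the &lt;head&gt; of your paginated templates - the URL pattern is just your own example): leave robots.txt alone and add this tag to every paginated page, such as /table-lamps?p=2 and beyond:
<meta name="robots" content="noindex,follow" />
The paginated pages drop out of the index over time, but the links on them still get crawled, so the products they point to keep getting discovered.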
Agreed - I am going to follow a similar approach for my website. I will report back on the status very soon in these comments. Thanks!
I might be missing something, but I'm not sure I am - Why not just add " - Page 2" etc. to the end of the paginated page titles? Would that not solve the duplicate page title issue across the entire site?
Also, if you are disallowing robots to crawl pages with "?p=" in the URL, aren't you then restricting Google from being able to view products within those pages too? Unless they are linked to from elsewhere, anyway.
Seems there's some confusion, because the pages you linked are ecommerce category pages, not search pages, right?
Maybe I'm wrong/confused, don't know! It's late on a Friday so could well be! :P
...P.S. Great post, Pete, will be using this for months/years to come no doubt! :)
Adding "Page 2", etc. to the titles is certainly better than nothing and can help de-duplicate to a small degree, but I find that other issues (like diluting your index) still remain for most sites. Again, it's a matter of scale. If you're talking a few dozen paginated results on a 1000-page index - no problem. If you've got 10,000 indexed pages and half of them are paginated search, then I think you'd have great results pruning that back.
You're correct about the parameter blocking - it's definitely a less optimal solution than Meta NOINDEX,FOLLOW, as it could cut off the bots. It doesn't seem to be all or none - I've used Robots.txt to block pagination parameters, and Google still seemed to crawl to some deeper pages, but in many cases I had other crawl paths in play. Parameter-blocking would probably be a last resort, given the wide array of options available.
Great article! Used it to convince my clients to fix their duplicate content issues!
Great post, although it may also be worth noting that changing a few words in a piece of content does not de-duplicate it! I struggle with this with some people.
Interestingly, with the trailing slashes -- I've been working with two sites recently on the same platform, which unfortunately exposed links both with and without the trailing slash. In one case, GWT reported lots of duplicates, but in the other case it didn't report any problems. Either way, it's still best to fix it at your own end and not trust Google to do it for you!
I'm curious - beyond the GWT errors, did the trailing slashes seem to have any impact on indexing, ranking, etc.? I haven't seen any recent cases of problems, but it's always good to be aware of it.
It's hard to say, I'm afraid, as both sites (completely unrelated sites, but built by the same developer) had just switched to this platform, and there needed to be a lot of other 301 action for various URLs that had changed on one of them, so it wouldn't be a very clean test...
No worries - mostly just curious. I don't want to tell people not to worry if problems are still popping up regularly.
Well, the site as a whole definitely suffered significantly due to the change of platform, but I think it was more likely down to all the URLs and page content changing. Still, it's an easy problem to fix, and I'd always much rather fix it properly than rely on Google to sort it out.
Great post...but something I'm still not clear on re cross domain content duplication...
We have over 20 different country sites to manage, each with a different domain:
e.g. www.moneyboxsaver.co.uk, www.moneyboxsaver.com.au, www.moneyboxsaver.co.nz - by the way, these aren't actually my sites...
Much of the article content we want to create would be relevant in each country. What's the best practice here - can we post the same content on each of these country sites (barring human translation for non-English-speaking countries, which I believe isn't seen as duplication)? Should we take a 'syndicated' view of our own content?
This is an important question. Before the Panda revolution, we had quite a few quotes from Matt Cutts and other Google engineers suggesting that it was fine to duplicate content among same-language countries. Some SEO experts have raised the concern that things are not so clear now. In my opinion, this concern is unfounded.
If you have almost the same content in the same language for different countries, and your client can have a central administration with a single domain, then you can have one domain for all the countries and use IP delivery (without redirection) to adapt the content to each country. However, this only works if the different contents you present (under a unique URL) to the different countries are almost identical. If, on the contrary, the content is very different, then, as you do now, you must use different domains for different countries. Even if the pages are very similar, you might be forced to use different URLs for different countries because of the administrative needs of your client. I believe that this will be fine, and that the SEO experts who say otherwise are just trying to be on the safe side without really understanding what is going on - nobody knows the situation for sure. The best approach is to go with the constraints from your clients and what needs to be presented to the visitors. I believe the quotes from Matt Cutts and others still apply today: duplicate content is fine when the different URLs are targeted toward different countries.
One thing is certain: even if you use different URLs for the same content, as long as the different URLs are for different countries and you do IP delivery, you do not need to worry about dilution of inbound links, because they are considered separately in each country. The point is that, when Googlebot follows the links, it gets redirected like any other user agent. In this way, each instance of Googlebot sees its own version of the links (with their redirections). As far as every individual instance of Googlebot is concerned, there is no duplicate content. It is very unlikely that one instance of Googlebot is going to concern itself with what the other instances see in order to check for duplication of content among them. This is corroborated by the fact that I have read that many websites still rank very well despite duplicating content among same-language countries.
Of course, I am sure that you want to know the opinion of the experts here, as much as I do. I am just a guest.
Hi, we wrote a book 10 years ago. The book is a collection of hundreds of stand-alone articles. We are the original authors and copyright owners. Now we want to revise it with the help of our online community. The plan was to build the site www.keepingpetssafe.com (it is under construction) - check it out.
Then we want to add a blog and blog the content one piece at a time (one article each day for one year) and allow comments. I AM AFRAID, after reading your warnings about duplicate content, that we could be about to implement a losing strategy. The blog would be alive and social, reaching out and involving the community. It would link to the "reference" (main) part of the website, which contains all of the original articles, blogged and not yet blogged. If you think it is possible to use this strategy, do you recommend installing the blog within a single URL such as www.keepingpetssafe.com/blog? Thank you.
So that's your '6000 words and 38 images' post. Awesome. I think any post better than this on duplicate content will be a duplicate of your post. I have no choice but to bookmark this post. Great job Peter.
I find the problem with trailing slashes is not the canonicalization issue itself, but that many sites (WordPress, for example) 301-redirect to the trailing-slash version while their internal links use the non-trailing-slash version, causing an unnecessary 301 redirect and a leak in link juice.
Excellent post, Dr. Pete! In the examples, I think one more you could mention is variable order, like:
example.com?var1=val1&var2=val2
example.com?var2=val2&var1=val1
And for diagnosing duplicate content, you do mention GWT, Search Commands and Moz' Tools which are all good. I guess it is still worth mentioning CopyScape also.
Good call - one great way to make a URL-based duplicate problem even worse is to have a CMS that appends the variables in different orders depending on the path you took. It's amazing how easy it is to spin out 100 copies of a page.
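One way to illustrate the fix (with a made-up URL): pick one parameter order as the "real" version and put the same canonical tag on every permutation, e.g.
<link rel="canonical" href="https://example.com/page?var1=val1&amp;var2=val2" />
(Note the &amp; entity - ampersands inside HTML attributes should be escaped.) That way, however the CMS happens to order the parameters, the signals consolidate on one URL.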
Dr. Pete, I have become a fan of yours. You have explained almost every kind of situation regarding duplicate content, along with the best possible ways to solve those issues. You are truly a doctor in your field of expertise.
Your other post about the catastrophic effects of wrong canonicalization is also highly valuable to me. I got so much info without burning my ATP doing research, just by referring to your posts; it was easy to get thorough knowledge of the topic. Please, please keep posting these kinds of quality posts.
Thanks
Dr. Pete,
This is an amazing resource that I can refer to anyone who needs all the details on duplicate content in one place. Also, thank you for dividing the post into different headings, as this lets me digest a long post easily.
Duplicate content is a major problem if you are dealing with either ecommerce sites or sites that produce massive amounts of content from time to time. I was dealing with a massive amount of duplicate content due to the CMS platform, as it automatically created multiple versions of URLs displaying the same content. At the time, I simply blocked that area of the website through robots.txt; your post gives me the idea that there may be better options for dealing with it.
I will surely look into the matter again and come up with a better strategy!
A Great Resource without any doubt!
Incredible post, Dr Pete. One section in particular should help out my friend's site - I've passed on the info to pass on to his web developers. Well explained and easy to understand, on a subject that has a tendency to be complicated, tricky and irritating all at the same time. Good job sir!
Hi, Dr. Pete. I have a question about Duplicate Content and User-generated Content.
Say I have created a website that allows users to post their own content, like stories or reviews, and say that the users are pretty much copying Wikipedia 90% of the time.
What would my safest strategy be? Hide the user-generated content from Google via robots.txt or "noindex,nofollow"? The problem with this solution is that there would be almost no content left on the website.
I was thinking about putting up a structure like user-generated.mywebsite/content, as opposed to mywebsite/user-generated/content, so that at least the duplicate content problem would be limited to the subdomain. Will this still end up damaging the site's rankings in the long run?
Thanks, and I hope others will find the answer useful.
Unfortunately, there's no magic solution. UGC is great, but if it's all thin/duplicate content, then you're probably better off NOINDEX'ing it than letting thousands of low-quality, duplicate pages get crawled. If all the UGC you're attracting is low quality and scraped, then I think you have to ask what's not working. Maybe you need to aggressively moderate. Maybe you need to only allow certain users (who have participated for a while) to submit content. Maybe you need to reward good users and encourage unique content.
You could isolate it to a sub-domain and potentially protect the root domain, but it still begs the question - why is the content like this? What are you building that's of value (for you, your users, or search users) if so much of the UGC is of this nature? In other words, there may be a better way all around, even aside from SEO.
Thank you for the very quick reply. I am probably worrying over nothing. I need to trust UGC a little more and probably come up with a structure that rewards good users and good content as well as mixing it up with unique content of my own.
I was also searching for this kind of article because, two days back, I made 10 geographical pages on my website, so I am a little bit confused about the content. I added similar content to all of the pages - does that count as duplicate content? In the future I want to make around 1,000 geographic pages, so should I add new content to each page every time, or can I use similar content?
Great post, almost too long, may have been better as a pdf guide.
Anyway, 2 points
1: If you use the canonical url with the trailing slash as you've done, doesn't that make the whole site appear as one page?
IE should it be
<link rel="canonical" href="https://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world" /> or
<link rel="canonical" href="https://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world" >
I would have thought you'd want the engines to index the page and not a directory.
2. Despite what Matt Cutts says, Google is not able to figure out the difference between slashed and non-slashed URLs. My Google Webmaster Tools account is full of messages telling me I've got duplicate content for slashed and non-slashed URLs.
So I'd really like to know how to fix that.
Thanks
There is a downloadable (PDF) version at the very end of the post, for convenience.
(1) That trailing slash is actually the closing slash for the tag itself. Since <link> has no closing tag (i.e. you don't put </link> after it), the "/>" format is technically correct. Honestly, though, it generally works either way.
(2) I've found it rarely makes a big difference these days, but standard practice is to 301 redirect to the preferred version. It's also best to use a consistent version internally - if you use "www.example.com/", then your internal links should reflect that, as well as any canonicalization.
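For what it's worth, here's a rough sketch of the usual fix on an Apache server with mod_rewrite (test it on a staging copy first - your setup may differ). It 301s any URL that isn't a real file to the trailing-slash version:
RewriteEngine On
# Skip requests for actual files (images, CSS, etc.)
RewriteCond %{REQUEST_FILENAME} !-f
# Add a trailing slash to anything that doesn't already end in one
RewriteRule ^(.*[^/])$ /$1/ [L,R=301]
If your preferred version is the non-slash URL, the rule just gets inverted. Either way, make your internal links match whichever version you pick.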
Dr Pete,
thanks for the quick response.
I also found this explanation which may help others
https://webdesign.about.com/od/beginningtutorials/f/why-urls-end-in-slash.htm
Hi Pete,
Your article relates to domains with the same owner.
How can we handle legal duplicate content on competing domains owned by different companies?
Our pharmacy software is used by 150 pharmacies. We supply 80,000 drug descriptions, which results in 150 duplicates of each of 80,000 products. Unique content would mean 12 million unique descriptions - not affordable, and high-risk given pharmacy standards.
We can't use cross-domain canonicalization, because the 150 domains are competitors, and we cannot use any of the single-owner solutions you presented, for the same reason.
How should we deal with duplicate content across competing domains?
Any idea?
Regards, Wolf
I'll be honest - there's no magic bullet. 150 different sites won't all rank equally for 80K products. Without cross-domain canonicalization, syndication-source, or getting them to link back to you, you've got to compete with them head-to-head. That means building a stronger link profile, more authority, and as much unique content as you can muster.
You may also want to consider focusing your energy and your search index. If your site is relatively new or has a weak-to-moderate link profile, don't try to rank all 80K pages at once. Focus on the money-makers and really try to hit 500-1,000 of them hard.
WOW. Very comprehensive piece. Lots answered.
So as an Internet marketer who utilizes article syndication as an inbound linking strategy, you're saying that as long as each duplicate article links back to the original source material (usually via the author resource box) the seo value will remain intact? GREAT article BTW. Thank you! -gene
Linking back properly will help Google understand the true source (and is probably a positive trust signal), but it won't necessarily help you rank as the one syndicating the content. I think it depends a lot on how much syndicated content you use and whether there's enough unique content in the mix to back it up. Google could still devalue these pages if they make up too much of your site.
hello and thank you for the great post!
I have a question. You wrote "While robots.txt is effective for blocking uncrawled content, [...]."
Well, I have a problem with a new website: Google still crawls all of the pages, even the ones blocked in robots.txt. I have about 150 pages not blocked by robots.txt, but yesterday Google crawled 903 pages in one day (and it's increasing regularly)... According to what you wrote in "the crawl budget", that's not good.
Not only does it crawl all of the pages, it also indexes them (I can see them only after I click "repeat the search with the omitted results included").
Actually I posted this problem (is it a problem?) more in details here: https://stackoverflow.com/questions/8440681/why-is-google-crawling-pages-blocked-by-my-robots-txt
If you have time to answer it would be great!
Thanks!
I find Google respects Robots.txt more as a preventive measure (block a folder from being crawled, for example) than as a cure for duplicate content. Once it's indexed, Robots.txt won't always remove it. If you block a large percentage of pages, Google may also start to ignore Robots.txt.
I'd strongly consider a switch to META NOINDEX and/or possibly using the canonical tag, where appropriate. Blocking too much of a site with Robots.txt can get messy and, as you said, just doesn't always work as intended.
Thanks for the outstanding post and ongoing discussion. I didn't see this specifically addressed - on an ecommerce site, what's the best approach for presenting product detail pages that are the same product offered in different formats? For example, a book that is available in paperback, hardcover and ebook formats. We can do some differentiating based on the format, but the title and description fields are likely to be quite similar. Should we Rel-canonical these to one main format?
It's a similar situation to V-14 in the post ("Product Variations"). Honestly, it's very situational - those pages can look thin AND they can have long-tail SEO value, so it's a balancing act. In the case of something like books, I think it's probably best to only index a "parent" page and not all of the formats, for a couple of reasons:
(1) The format pages will probably only vary by a price and a few words, and will look thin, especially multiplied across 100s or 1000s of products.
(2) If you land searchers on the parent page, they have the option of choosing their format. In general, I think it's a better search user experience.
If you're Amazon, you may have the authority and luxury of indexing everything, but that doesn't apply to most sites. I think the focus is generally better for SEO.
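As a rough illustration (the URLs here are made up), each format page would carry a canonical tag pointing at the parent product:
<link rel="canonical" href="https://example.com/books/some-title/" />
So /books/some-title/paperback, /books/some-title/hardcover, and /books/some-title/ebook all consolidate to the parent, which is the page you'd want ranking and landing searchers.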
Hi Friends,
I need help on this. My US-based client is an interior designer and runs his service in multiple places, like Michigan, Ohio, and Florida.
I want to know whether his three websites will be penalized for duplicate content by Google if:
1. 3 Websites like InteriorDesignersMichigan.com, InteriorDesignersOhio.com, InteriorDesignersFlorida.com
2. The look and feel, the website template, and the content text of all the websites are the same.
3. But the city names are changed in the content text.
For example, all three websites have content like:
1. Design Tech is a interior design company Since 1993 for office space in Michigan. We provide our services.....Blah...Blah
2. Design Tech is a interior design company Since 1993 for office space in Ohio. We provide our services.....Blah...Blah
3. Design Tech is a interior design company Since 1993 for office space in Florida. We provide our services.....Blah...Blah
Please advise: will all three websites be penalized for duplicate content?
The short answer is: yes, you are at risk, especially if the sites are linked to each other. With three sites, it's a bit hard to judge how risky it is - you haven't built out dozens of sites. Still, this content is "thin" almost by definition - it's duplicated except for a few keywords, and that's a tactic Google is devaluing more every day.
In most cases, one or two of the sites would just get devalued, or the links between the sites would get discounted. In extreme cases, you could run into trouble with Panda. There was a time when microsites worked. Now, their SEO advantages are limited (and there's some risk). So, the question is - is it worth splitting your efforts three ways? In most cases, in 2012, I don't think it is. I think you would be better off trying to build up unique content for all regions, or focusing on one core set of content and then having a small amount of unique content for each region.
Thanks, Dr. Pete, for your valuable suggestions. My three websites are not linked to each other, but I will still start writing unique content for all three.
Dr. Pete
Excellent article..thank you very much.
I'm now implementing an API with daily updates to import product content from one ecommerce site to another; they have totally separate domains. I am not intending to link the sites or set up 301s or cross-domain canonical tags.
If I surround the imported (duplicate) product data with my own additional unique content - extra products and articles - is it still advisable to block indexation of the imported product content, to avoid penalization of either the feed supplier's URL or my own?
Thanks
Even though I am an SEO newbie and not a native English speaker, I found this article very informative and helpful - thanks so much. However, as I am not a coder, I would have appreciated it even more if you had added very short, basic examples of how to implement your suggestions. For example, coming to the Meta Robots and Canonical Tag paragraphs: let's say I have example.com/hotels.php and I want to add a canonical tag so that Google doesn't index pages such as example.com/hotels.php?city=... - should I add the canonical tag to the hotels.php file? And if at the same time I need to index pages such as example.com/hotels.php?rates=..., will the canonical tag I added for the "city" parameter automatically suggest to Google not to index pages with "rates" and every other search parameter too?
Also, how do I fix the duplicate content problem on WordPress sites, since I only have one file there, index.php, and many potential duplicates (index.php/category/..., index.php/archive/..., index.php/tag/..., etc.)?
Thanks for your patience
Articles keep getting bigger and bigger lately ;-) But I love this one nonetheless.
I was wondering what could go wrong with internal duplicate content when using country-based pages. Google recommends a few methods, one of which is country folders (domain.com/us/, domain.com/uk/ with almost identical content), but you mention that things can go wrong here - do you know the specific details of why that happens?
Thanks for this great post.
Great stuff Dr Pete.
I made the mistake of allowing a staging server to get indexed, and about a month ago the live server lost its rankings. I contacted Google via Webmaster Tools for the development server, and their response was that no manual actions were taken against the site.
In a scenario like this would Google consider any action they take to be algorithmic?
If the duplicate content were removed (it was removed a month ago) could we expect to see any action to be reversed? We haven't seen any improvements in rankings yet.
Is there any way to know with certainty there was a duplicate content action taken? The staging server has no links, no meta information, no traffic, so it would seem odd to me that Google would even view this as a problem.
Any suggestions?
Many thanks
If it's just a filter (and this can be tough to tell), de-indexing the staging server should help relatively quickly. The trick is often to actually get it de-indexed. Monitor those URLs carefully - ranking won't recover until they fall out. It really depends on the scope, what got indexed, and how you removed it.
If it was Panda related, then you may have to wait for a data update, and that can still be a month or so (between data updates).
Finally had a chance to read this post, I really appreciate how descriptive it is. Thanx!
Sir, I should first thank you for all your hard work and interest in writing this great post (or rather, resource).
I am having so many problems with scraper sites, and I recently posted a question in the Google Webmaster forums about it. I'd be grateful if you could help answer my questions.
Say a site affected by Google Panda is site X, a scraper site is site Y, and a site not affected by Panda is site Z.
First question: once site X is hit by Panda, if site Y then copies articles from site X, the articles indexed for site X get outranked by site Y in the results (this is a known problem in Google's algorithm, which Matt Cutts acknowledged when I asked him in a Google+ hangout). I want to know the permanent solution for this. Second question: for the above situation, many people (including people from Google, perhaps in the Webmaster forums) suggested filing a DMCA complaint against site Y to get rid of the outranking problem. But every new article posted on site X gets outranked by site Y, and we have to complain again and again - don't you think this is a burden, and a waste of time? Third question: in one instance, a person reported to Google that site Y was copying articles from site X and asked for those articles - say 20 of them - to be removed. The same thing kept happening, and the scraper site kept winning, so that person finally asked Google to remove site Y entirely. Google's reply was that unless there are at least 100 copied URLs, there is no way to remove the site as a whole from Google search. My question: do we have to wait until the 99th copied article for Google to invent a new solution to this outranking problem?
========
Final question, Dr.: is the "Dr." in your name related to being a medical doctor? Just asking out of curiosity, because I am a doctor in the medical field!
thank you once again !
Unfortunately, there is no permanent solution. Determining the source of content is tricky, and Google is getting it wrong in plenty of situations. It's important to fix your internal problems, of course (including Panda). You've got to build up authority and your link profile, and you've got to get out signals that tell Google your content came first.
If they're flat out stealing, you can take legal action (including DMCA), but it's going to take time and probably money. So, it's always a trade-off of how aggressive you want to get. If it's consistently one site, I think that fight makes more sense.
I'm an experimental psychologist by training, not an MD.
I am facing a lot of problems with Blogspot blogs. For example, take a medium-sized site that was hit by Panda: if its content is copied exactly by Blogspot blogs, they easily outrank the original source, and people who know this could even use it to do harm. Is the power or authority of the main domain (blogspot.com) the reason for the outranking? Is there no other way to solve this problem? If you're interested, I will personally show you one example where I faced scraper attacks from Blogspot blogs six times!
Impressive post, really a must-read because it perfectly summarizes the duplicate content issues.
However, for #10 (International Duplicates) in the examples, there is a really easy solution. I suggest using the new rel="alternate" hreflang markup (check it out at https://googlewebmastercentral.blogspot.com/2011/12/new-markup-for-multilingual-content.html)
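For illustration (the domains here are just placeholders), each regional version of a page would declare the alternates in its &lt;head&gt;:
<link rel="alternate" hreflang="en-GB" href="https://www.example.co.uk/page" />
<link rel="alternate" hreflang="en-AU" href="https://www.example.com.au/page" />
That tells Google which same-language version to show to which country, rather than leaving it to treat them as ordinary duplicates.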
This is by far the most detailed and well-explained post about duplicate content. Thanks so much for this. I hope to read more from you, sir.
Awesome post... I am linking to this from my next article. My readers will love it.
Can someone please lend me a hand?
I had someone redesign a site for a client of mine, and a plugin was used that added ?cbg_tz=0 to the end of all the URLs on the whole site. The plugin was removed, but I want to know how I can make sure these links don't show up in the index anymore, as they still do. I am positive this is why I am not able to get him to show up even in the top 1,000 results for his keyword terms (local terms, low-to-medium competition).
A very complex subject explained in a very clear and logical way. A terrific reference work.
This post is indeed a comprehensive resource on the subject. It provides answers I could not find anywhere else to many questions I had. I still have a few questions left unanswered, in particular about the canonical link element, or canonical link tag (as Matt Cutts calls it in https://www.mattcutts.com/blog/canonical-link-tag/).
One of the things I learned from you in a previous post is that it is in principle possible that <a href="https://www.seomoz.org/blog/6-extreme-canonical-tricks#jtc160924">the noindex signal in the canonical link tag is followed, but the link juice to the target is not passed</a>. In my opinion, it would be terrible if Google did that - very confusing. I would even say "unfair", but nothing is fair or unfair in a world where no laws exist. So I was looking for ways to measure whether or not the link juice is passed.
In Google Webmaster Tools, under "Links to your site", if one clicks on a target page in the site, one obtains a list of all the external pages that link to that target; one can even find out what those external links are. Consider an example: say the page example.com/source.html contains a canonical link tag toward example.com/target.html. Since the former gets de-indexed, only the latter is a possible target in Google Webmaster Tools. Nevertheless, one can see that the external link reported toward example.com/target.html is actually a link toward example.com/source.html. My first question is whether this is a strong indication that the link part of the canonical link element was followed by Google.
Of course, the link juice passed from the external pages to the source (which contains the link element) depends on factors such as the authority of the external pages, the relevance of the anchor text, etc., but that is not the issue here. We are concerned with the link juice passed from the source to the target. The second question is: once we have determined that the link element is (fully) followed, are there additional factors used by Google to determine the link juice that is passed from the source to the target? I have never heard of such factors. For example, I have never heard that Google would analyze the differences between the source and the target to determine the link juice that will be passed, or anything like that. The only thing I have ever seen mentioned is that there must be a small damping factor, as with a 301 redirect, but that is fine and expected.
Unfortunately, it's nearly impossible to tell how much link-juice is being passed by any given link. If you had a ton of links to the canonical source and none to the target, you added the canonical tag, and the target suddenly started ranking, then it's pretty clear link-juice was passed. On the granular level, though, I'm afraid the data just isn't transparent.
I suspect that there are cases where Google devalues or partially ignores a canonical, especially if it seems like you're abusing it (just canonicalizing a ton of pages for their link-juice, even though they have nothing in common). This happens with 301s from time to time. In this case, they might de-index the source page but not pass the link-juice.
Honestly, though, I can't point to a clear example of that happening. If anything, Google is very lenient with canonical tag usage right now. There's some discussion that Bing might be less forgiving. I only suspect it could happen because we've seen it happen with 301s (as people have abused them).
Thank you for the clarification. Yes, I agree the kind of experiments you suggest would be great. I don't have the resources to do that - one needs to be big and have control over many sites that can be used for testing.
Excellent post, thank you!
Hi Dr.Pete,
Great article, thank you! I have a question regarding the implementation of rel="prev"/"next". Do you think the rel=prev/next attributes should be combined with meta robots noindex,follow tags on paginated pages? Google doesn't specify this, but I think it's a fail-safe way to satisfy Google and other search engines that do not support rel=prev/next.
Additionally, I have seen rel=prev/next implemented with self-referencing canonicals on paginated pages. Any thoughts on this? It seems unnecessary.
Would love to hear your thoughts. Thanks!
I'm not a big fan of mixing signals - if it doesn't work, you never know quite why. If you're having problems related to pagination, I'd go with the META NOINDEX. If you're just looking to prevent future problems, I'd give rel=prev/next a shot and let it run by itself. It really depends on the scope and severity.
When you say "self-referencing", do you mean back to Page 1 or to that actual specific page (e.g. page 23) of results? Self-referencing back to the specific page would be the opposite signal - I can't imagine ever wanting to do that.
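Just to sketch the standalone rel=prev/next setup (made-up URLs), page 2 of a series would carry:
<link rel="prev" href="https://example.com/category?p=1" />
<link rel="next" href="https://example.com/category?p=3" />
The first page only gets a "next" tag and the last page only gets a "prev" tag, so the engines can treat the series as one sequence.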
Hi, I asked this before on December 3rd but think it was lost in the other threads and some of the content was removed by the editor. Here I have tried to explain the idea in a different way, hope you can help me...
Hi Stephanie, Dr Pete,
I am doing the SEO for a large hotel price comparison aggregator site which is chock full of syndicated content and has a duplicate content penalty. I was thinking of using noindex on the pages with duplicate content but now thinking of using syndication-source tag to give credit for hotel descriptions to the booking sites that they came from. I am hoping that it will make the site more trustworthy and improve rankings for the pages that do have original content.
As I read up on the syndication-source tag, I noticed that on 2/11/11 Google added a note to their "credit where credit is due" article stating "we've updated our system to use rel=canonical instead of syndication-source, if both are specified". This seems to indicate that Google considers rel=canonical to have a similar effect to syndication-source. Therefore, if a site uses rel=canonical with a link to its own page (implemented so that affiliate links are not indexed) but uses syndicated content, Google might consider this an attempt to claim original authorship of the syndicated content.
What do you think?
Thanks
Ben
Interesting - I see what you're saying, but honestly, syndication-source is still new enough that I haven't seen that combo in play. My guess is that the canonical might overpower the syndication-source tag in this case, although I don't think there'd be any harm in using both (internal canonical and cross-domain syndication-source). Worst case, the cross-domain signal just won't work.
The other option would be to only put the canonical tag on the non-canonical versions (say, pages with tracking IDs) and then put the syndication-source tag but NOT the canonical tag on the canonical version. That would take some coding, though.
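Roughly, that second option would look like this (URLs are hypothetical): the tracking-ID version of a hotel page carries only
<link rel="canonical" href="https://example.com/hotels/grand-hotel/" />
while the canonical version carries only
<meta name="syndication-source" content="https://booking-partner.example.com/grand-hotel" />
pointing at wherever the description originally came from. No guarantees Google honors both signals together, but at least they aren't competing on the same page.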
I believe I've read this post 3 times now and still have to go back and refer to it to remember what 'move' I need to make next! Thank you SO much for this detailed information. I should probably print this to put on my desk beside me for daily reference!
Thanks a lot for this excellent recap, Dr. Pete!
About the international duplicates (10), it may be interesting to add a link towards a recent post written on the Google Webmaster Central Blog about new markup for multilingual content: https://googlewebmastercentral.blogspot.com/2011/12/new-markup-for-multilingual-content.html
Thank you so much. This is really an important addition. There is yet so much more we could say.
It took me 4 days to make enough time to get through this enormous post. WORTH IT. You really put a lot of work into it, thank you.
I especially appreciated all the uses you suggested for the canonical tag. Several were instances I actually hadn't thought of, and wasn't quite sure how to guide the developers of my clients' sites to fix. Very much appreciated, Dr. Pete.
- HP
awesome post - covers just everything - really good job, thank you.
Hi. Great resource! I am trying to figure out where I sit - I think it is cross domain syndication.
Basically, as well as the fresh content I write, I also publish a lot of press releases that are cut and pasted straight into my CMS. I change the title, but that is about all I have time for.
As these are not directly syndicated from another site, I was wondering how best to handle them. I know I am getting heavily hit by Panda over it as well.
At the moment I am setting the site to add noindex to any non-unique page that is more than 5 days old and to remove those pages from the sitemap.xml.
Is there anything else I can do?
Thanks!!
@Dr. Pete
I have implemented every attribute you suggested on my website, but I have a question about pagination and SEO: how do I fix it when search parameters are involved? I'm in a hurry for a reply. :) BTW, thanks for sharing - I recommend all SEO folks implement this on their websites.
1. What is the best approach to solving duplicate content - a "301 Redirect" or "Canonical URLs"?
Peter, this is such a great, in-depth post describing exactly what duplicates are, and it has helped me better understand what duplicate content means to a search engine. I have never seen a post that delivers ideas this well; after reading it, I can more easily avoid duplicate content that is harmful to a site and its rankings.
Fully detailed guide. Congratulations...
Dr. Pete. Love this article. Great job on laying all this out. I know my comment is late to the post, but this page came up recently in one of my searches and I'm trying to provide a client with some backing to my recommendations related specifically to duplicate content found via secure and nonsecure versions of a page. I have a suggestion for a potential modification.
I'd say this section "V. Examples of Duplicate Content" particularly this item "(4) Secure (https) Pages" and the solution mentioned here "In many cases, it’s best to Noindex (IV-4) secure pages – shopping cart and check-out pages have no place in the search index." should now be slightly modified due to the recent announcement of using HTTPS as a ranking signal (https://googlewebmastercentral.blogspot.com/2014/08...)
I'd say that with Google moving towards SSL/HTTPS as a ranking signal, Noindexing shopping cart and check-out pages is still true, but wouldn't go as far as to say now that "it's best to Noindex secure pages". And I know the wording is not all inclusive because it is prefaced with "In many cases", but this could now be misleading to someone who may come across this without the prior knowledge or understanding of HTTPS as a ranking signal (e.g. I recommend a change to a client based on duplicate content from secure and nonsecure pages and send them to this post as a source, but to fix their problem they decide to Noindex secure pages because they haven't heard Google's recent announcement. Sure, I will do what I can to inform them of the announcement, but nonetheless, they may end up "fixing the problem" of duplicate content while creating a new one and potentially affecting their ranking in search in the future if Google uses the factor more heavily).
[fixed hyperlink - km]
Well done - myself and every other reader are now convinced we all need a negative content specialist on our team. SEO sales copy at its best. If only I had the time or budget.
Hi, I'm looking at syndication, I have a parent site and a subject matter specialist site.
I wish to syndicate some articles from the parent to the specialist site.
We use Google CSE to search the sites (one index, which we filter when searching each site using the site: operator).
- If CSE searches for "article about cats" on the subject site, e.g. "site:subject.com article about cats", what will happen - will they get any results?
- Google searches for “article about cats” may only show results on the parent (probably OK)
- Google searches for “specialist article about cats” may only show results on the parent (probably not OK)
My Approach:
1) Ask you....
2) We’ll build the site with the canonical functionality and see what happens (using Google Webmaster Tools)
3) If we hit any of the above problems, we can turn off this feature and hope the duplicates don’t mess up our rankings.
Sorry, I have very little experience with CSE, but I'm not under the impression that canonical tags impact it at all. If you do a cross-domain canonical that should help prevent any duplicate content issues, but you will have to pick which domain should rank. I don't think that will impact your CSE results, but I'm not 100% sure.
Amazing article. I am facing a problem with https. My previous host had shared SSL, so my blog also worked under the https version, and Google has now indexed my blog's https version too, which I feel is creating a duplicate content issue.
My new host does not offer SSL by default, so my site no longer works under https, but Google is still showing https results. Now I am confused about what to do - please help me with this issue.
Even two years later I find some of this useful and relevant when trying to freshen up on a few items. All I can say is Thank you.. two years later.
Thanks for this Dr. Pete! Question about your quote:
Here's the bigger problem, though - what if those 6,700 pages "wear out" the crawlers to the point that your actual product pages don't get crawled. Now, you're sacrificing high-value, high-conversion pages for low-value internal search pages. Practically, I see this happen too often. I solved this problem for one client long before Panda, and saw their search traffic triple over the next 2 months. Paginated content and other duplicates were keeping Google from crawling their most important content.
This hits close to home and right on target for us. Our search result pages far out-index our product pages. Do you by chance have an article anywhere where you talk about how you handled this? This is our situation and we need to make a careful transition to rank our product pages while de-ranking our search pages.
Thanks!!
Unfortunately, pagination can be a very complex topic, and it depends a lot on your situation. I think this resource by Adam Audette is good, and it gets into just how tricky the problem can be:
https://searchengineland.com/five-step-strategy-for-solving-seo-pagination-problems-95494
Google has hinted more and more strongly that they don't think search pages are of value to users (i.e. their searches landing on your searches). On the other hand, major category searches, etc. are often key landing pages for some sites, so it's a balancing act.
I am working on a product (machine) reseller website. Here, each product's specifications and descriptions are the same as on the head office website - I have no way to write unique content for each product, so I have copied the same specifications from the head office website.
I am afraid that, because I have copied all of the specifications from another website, Google may not give priority to my website...
I'm thoroughly confused - please help me.
Nice article, very informative. Really helpful for us, and I really appreciate it.
Great work on the duplicate content issue, Dr. Peter. An awesome guide to solving many content-related problems.
Every day we see changes in internet marketing. Google updates its algorithms to give the best results for users, but sometimes this harms many SEO workers. Google's updates on content optimisation come down hard on fake or duplicate content, and it takes some time to understand these types of updates.
Hi Peter, this is an awesome article for escaping duplicate content problems.
My website has two home pages, "www.xxxx.com" and "www.xxxx.com/index.php", and both URLs have been indexed in Google search. So I will ask the developer team to remove the index.php page or redirect it to the root, because all of the web pages have two URLs, one with the index.php path: "www.xxxx.com/contact" and "www.xxxx.com/index.php/contact".
Still a solid post. Thanks.
Thanks for the information
I am sharing my site content with social sites like Google+ and Facebook. One day I copied the first paragraph of an article and pasted it into Google search, and I saw my Facebook page come up before my site's page. Is that OK? To a normal person it would look like I had copied the content from Facebook. Please advise.
Pretty good overview of the possible duplicate issues. I’d only add internal search and additive filtering to the list.
OMG! This article is very long but very informative. As Esaky mentioned, it's worth printing the whole article to learn it deeply. All the vital things about the SEO Panda update are mentioned here, through which anyone can save their website from being penalized or banned and attain good rankings. I have read this article only once, and now I am going to print it for future reference. Thanks, Dr. Pete.
This article is huge - I will print it for better understanding!
My problem is that the blog's duplicate content is being flagged on search-term and tag pages. For example, I have written 10 posts about design art and added "design art" in the tag field, using a self-hosted WordPress.org website.
I have been relying on all of the valuable information in this post since you wrote it - quite amazing work, by the way! I see that in a forum Google recently said that the syndication-source tag has been deprecated. I was wondering if you had a suggestion for what to use in its place, in a situation where a company syndicates health content to various hospitals. The content is only a portion of the client's page, so the rel=canonical tag won't really work. Is a link to the original source sufficient?
Thanks - I wasn't aware of that, and just found a reference from the Google News team. Would've been nice if they told us that a bit louder, but it does appear to be official. I'll update the post.
Thanks Dr. Pete. With this tag gone, I am not sure what to do for my client except have the content on their customer's site link back to their site. As this syndicated content is only a portion of the page, rather than an entire page, it appears none of the other options will work.
Unfortunately, I don't think even syndication-source was intended for portions of a page - there are really no solid partial-content canonicalization or blocking solutions. The link back probably is your best bet.
First -- THANK YOU -- I am very impressed that you have taken the time to answer all of these questions.
I am developing a site for student apartment renters that I would like to bring to numerous markets. Apartment, sublet, and roommate posts will be user generated and unique to each site. The site will also include hundreds of helpful links (to relevant local resources) that will be mostly unique to each site.
However, the site is relatively copy-heavy and I would like to "duplicate" in numerous places (we have "About This Site" / "About This Page" copy on each page). For example, if the homepage says that our site is "Madison's favorite place to find Madison sublets", this copy (with the appropriate city name) would remain relevant in each market.
Question: Even though the copy is about our site in a given market, and we have large amounts of other unique content, is duplication still a huge no-no?
Thanks again!
Great content. Good work. This was the most informative article to me concerning Duplicate content.
Thank you Dr. Pete!
Just a note that the section where you find your missing & duplicate descriptions and titles, which used to be under "Diagnostics" in GWT, is now under "Optimization" > "HTML Improvements".
My apologies, but I can't answer audit-level questions in blog comments - it's just proving to be too time-consuming, and quick answers often end up being bad answers in situations as complex as these. I'd encourage any SEOmoz members to submit complex questions to Private Q&A here on the site, where we can at least take a closer look at any individual site.
@Dr. Pete.
Today, I'm quite confused by my Product Variations pages. Let me give one example to explain:
https://www.vistastores.com/patio-umbrellas
This is my main product page. I have developed a new URL structure and left-navigation structure to generate new web pages. All pages are open for crawling.
These are my branch pages:
https://www.vistastores.com/patio-umbrellas/shopby/manufacturer-california-umbrella
https://www.vistastores.com/patio-umbrellas/shopby/lift-method-search-manual-lift/manufacturer-california-umbrella
https://www.vistastores.com/patio-umbrellas/shopby/canopy-shape-search-hexagonal
https://www.vistastores.com/patio-umbrellas/shopby/canopy-shape-search-hexagonal/color-search-green
I am very confused between Canonical, NOINDEX, and Robots.txt - I have implemented all of them on my left-navigation section.
Now I need a final solution that I will never have to change. My organic traffic is going down and I'm quite worried. Can you give me an exact solution, so I can implement it on the website without any hesitation?
Hey Doc,
I need a professional diagnosis.
My illness is possible duplicate content. (It's not just a coincidence that I'm here :) )
You've got a very complete and detailed explanation here, and I wouldn't dare ask you to trim it down or dumb it down. But without doing that, I'm a little confused - maybe simply because no example fits my exact situation.
I have about 1,500 product pages. They are pages in WordPress, not posts. (They used to be posts, but I deleted them all and made new pages; I changed themes and it didn't look good as posts.)
Well, here's my site https://funeralparlour.com but that's not going to take you straight to my issue without clicking a few links inside.
Here's my setup.
I sell funeral program templates, in packages. Basic - Regular - Complete.
There are four types of lets say "sub packages".
Tri Fold Brochure - Single Fold Brochure - 4 Page Grad Fold Brochure - 2 Page Grad Fold Brochure
OK, so for the Basic package, we would have the Tri Fold Brochure and a Thank You card.
I'm sure you can guess what the next three Basic packages would be, right?
And so on.. Regular also includes a Bookmark, and the complete also includes a Postcard and Prayer Card.
Now the only other difference is the "Theme" or Design..
Nature designs: I have 37 different designs in that section. So 37 Tri-Fold at Basic - Regular - Complete, 37 Single Fold at Basic - Regular - Complete, etc. 37 x 3 x 4 = 444 different products in that section.
I also have many themes, such as patriotic, hobbies, professions, sports, religious etc.
I don't know exactly how much is duplicate, but most pages are very similar; some text changes, though not all that much, and the pictures change.
I have some pictures indexed and found in Google Images. I spent a lot of time organizing them and changing all the alt tags, file names, etc., to make it look like pretty much every picture is unique - even if some are duplicate pictures, the file names and alt tags are different.
So my question is basically: what do I do? hehe :)
What's my best course of action, aside from rewriting 1,500 product pages to be unique (which would take... well, I wouldn't get it finished this year)?
Here are a few direct links; maybe you can see a bit more clearly what I'm talking about:
https://funeralparlour.com/store/christianity-01-tri-fold-obituary-template-basic-package/
https://funeralparlour.com/store/christianity-01-tri-fold-obituary-template-regular-package
https://funeralparlour.com/store/christianity-01-tri-fold-obituary-template-complete-package
https://funeralparlour.com/store/astronomy-01-4-page-grad-fold-obituary-template-basic-package
https://funeralparlour.com/store/astronomy-01-4-page-grad-fold-obituary-template-regular-package
https://funeralparlour.com/store/astronomy-01-4-page-grad-fold-obituary-template-complete-package
Help me Doc,
It's appreciated.
Sorry about all the links, but it's the easiest way for me to show you my symptoms :)
Cheers
Matt
Thanks for posting this information.
I do have a question about this subject matter. I have noticed that several of my competitors have created a number of pages where the only difference in content is the city and state name, and they are being indexed and are searchable on Google. I attempted to do the same, and though my site has been crawled considerably over the last two weeks, I do not come up in the search results. Am I being penalized by Panda, or am I just being too impatient waiting for positive results?
Thank you.
Very well explained, Dr. Pete. Thanks for sharing such wonderful information. Honestly, this part was missing from my studies on SEO.
Once again Thanks.
Hi, I'm glad I found your good post. I have a question: when I copy a post and cite the reference source, does Google's Panda recognize my post as a copy? My site is www.rajeoon.com (in Persian).
Hi
I'm having trouble with duplicate content on my link exchange pages. As you know:
I have one module file, linkexchange.php, which generates the link exchange pages, and I found that I should use
(10) rel=next, rel=prev from your article. Can I use them in a single PHP file like that?
If Pete is still around: does this affect a client if they build different sites using different IPs and buy URLs that stay linked within themselves? The sport we cover, "soccer," is called different things across the globe, so we have used different URLs to reach each country's users - surely there is no penalty for being smart? What is your advice for the 140+ URLs bought with "soccer" and "football" in them? Should we simply point each to the main company's site, or is there a better way to use them? I am sure it's simply pointing the URLs to the main site, but it never hurts to ask. Thank you for your time - I know how little each of us has in this industry!
So, all 140 URLs can be crawled and resolve to the same content? Yeah, that's definitely going to look thin - cross-link them, and it's going to look like a link network. You could even end up with a fairly large-scale penalty.
Now, if they don't resolve separately, but just redirect to one core domain, that's fine. In Google's eyes, that will be one site. If each one is being crawled/indexed, though, you can create a real mess.
My thought was to use all the URLs we bought as billboards or advertisements with no links to the main company (in fact, no links at all) - just billboard-style pages with web commercials embedded. Would this be a better way to use them? And help me understand: are you saying to just point each URL at another address so the spiders can't crawl it, so the URLs are simply chance landings if a web user happens to type one into the address bar? Thanks for responding - nice of you. We are number one in most soccer searches, but this is getting out of hand; 16 years in the making, the competition is growing, and I would like to maintain my positions.
And pardon me if it was covered in the reading, but how do you know if you are being penalized? Do they communicate with you, or just silently slap a penalty on you?
Hi Dr. Pete
I'm having a problem with duplicate content for my dating website, datetolove.com. If you go to the location link on the site, you will see that the same people's profiles are showcased on the country, state, and city pages, which leads to duplication. If I use a canonical tag on the country page, then I think Google will crawl only the country page and will skip the state and city pages. But I need Google to crawl all three pages without any duplication. Please help me out with this problem - please check the link below.
https://www.datetolove.com/en/locations
Although I am commenting very late, this is the best case study I have ever read regarding the search engine algorithm...
I really honor Dr. Pete.
Our non-profit website www.birdlist.org has been plagued by increasingly unforgiving Google requirements. Created in 1998, ours is one of the oldest bird websites on the net, and we provide a list of birds for every country in the world and for every US state. This is high-demand information for bird watchers. But all our info looks very similar to Google. Some states may only vary by 5 species out of a list of 400.
So we tried to put in some content text, but with so many pages, it is impossible for us to author completely new text for each page. So we used a standard text and varied it a bit with keywords for the state in question. Example: https://www.birdlist.org/checklists_of_the_birds_of_the_united_states/birds_of_kentucky.htm For a while our page would come up, but then it would sink away again. Our most common search is "birds of + state or country".
We still get half a million site visits per year, but we are 50% down from a year ago.
What can we do to improve our scoring again?
I have an interesting issue: one of our products, linked below, is no longer indexed in Google:
https://www.keepitpersonal.co.uk/personalised-swarovski-crystal-heart-vase-engraved-gift-p-836.html
It's an item which many other companies are selling; however, all the competitors are using a thinner description and the same duplicate images as each other. Ours is completely unique, but it's their content that ranks!
I ran Copyscape and noticed that a price comparison site has scraped our page content, and I'm wondering if this is why it is no longer indexed.
https://www.copyscape.com/view.php?o=27146&u=http%3A%2F%2Fwww.shopwiki.co.uk%2Fl%2FPersonalised-Engraved-Horse-Glass-Vase-Gift&t=1351162865&s=http%3A%2F%2Fwww.keepitpersonal.co.uk%2Fpersonalised-swarovski-crystal-heart-vase-engraved-gift-p-836.html&w=54&i=1&r=3
I think scrapers are affecting our ranking. Has anyone else had this issue? Should I contact this site and ask them to remove it?
Thank you so much for your advice on parameter handling. I recently discovered duplicate URLs caused by parameters I am not even linking to on my site, but which were somehow made available to Googlebot by the developers of my site...
Hi Dr Pete
First of all, many thanks for such a nice post on the duplicate content issue.
I need some help regarding the site of one of my clients.
This site lost all its rankings for all keywords last week, and not a single keyword is appearing even in the top 200 search results. How do I determine why this site has been penalized? Was it penalized by Panda, Penguin, or something else?
After some analysis, I also found that the content of some pages of this website was also present on some press release sites. When I searched by putting text blocks from those pages into Google Search, this very site did not appear, but the press release sites containing the same content appeared at the top positions, and my client's website was nowhere to be found.
And another question: is it possible to keep those press releases live as well as that very same content on the website? The client wishes to have the content in both places (on the press release sites and on his website) - is that possible?
I don't know what to do.
How do I find out why the website was penalized?
What should I do to recover from this penalty (or whatever it is), and how do I regain the previous ranking positions in Google search results?
Please guide me, Dr. Pete.
Many Thanks.
Thank you, Dr.Pete! The post is awesome, extremely useful.
Outstanding! Thank you so much for your time putting this together. A great resource.
Wow - that has to be a "Hall of Fame" post - high-level summaries for non-propeller heads and solid detail for the techies we often bump heads with when trying to clean up technical infrastructure issues, of which duplicate content is often nightmarish. Outstanding!
Very good post - but we had been warned about Panda since Caffeine indexing came out. Fresh and unique content was sought, and now Panda almost guarantees it.
Great and clarifying article. You mention Bing - but doesn't Yahoo Site Explorer have any clever tools useful for similar functionality? I am not sure ;)
Like our own Open Site Explorer, YSE is really more of a tool for exploring your link graph. Unfortunately, most of the old Yahoo tools for really digging into your index are slowly going away since the Bing integration. I love YSE, but it's hard to recommend these tools to people, because they may not be around much longer.
Dr. Pete, thank you for taking the 20 or so hours to write this post; it covered something that I have been thinking about a lot recently. I do have a question for you and anyone who is willing to help.
Regarding "near cross-domain duplicates": I love food, and I am in the process of creating a recipe site using WordPress (for fun and experience) to build a list of recipes that all have one ingredient in common (one of my favorites, of course). But I am curious how you think Google looks at recipes compared to other content online. The reason I ask is that, when you think about it, there could be thousands of almost-identical recipes online (spinach dip, for example) with small variations in the ingredients and preparation instructions.
Basically, does Google look at recipes differently than other content online, and if not, how do you avoid getting penalized for duplicate content without having to do extensive research on every recipe posted?
Thank you for any input.
I think it's natural for content to be similar across subject matter areas. We obviously use the same terminology as other SEO blogs, cover some of the same topics, etc. There will inevitably be keyword overlap, sometimes large scale. In that case, it comes down to the usual SEO factors - your on-page targeting, link profile, etc.
If you're all posting the same (or 90%+ identical) recipes, it is a lot trickier. We see this a lot in e-commerce - 500 sites sell a product and all use the manufacturer's description. More and more, those sites are losing ranking ability, and it really comes down to what else you bring to the table. One way or another, you're going to need some unique content going forward.
Thanks a lot for taking the time to answer my question, Dr. Pete, and I completely agree. It was great to get some confirmation on whether I was thinking about things the right way from someone with a bit more - or should I say a lot more - experience than me :). Also, I already have more than a few concepts in mind to make the site stand out at least a little from all the others online, to help bring in new visitors and keep them coming back.
Again, thank you so much for your time.
Great article; however, it's very long. I'd suggest adding a table of contents.
Wow! Thanks for the detailed interesting post on a really complicated issue!
This is an ultimate guide to understand Duplicate Content from Basic to Advance.
A useful, but very long post. I will refer to this again and again.
Damn Pete, this is a serious amount of work. Thanks for pulling this together. Bookmarked.
Great Diagnosis Doctor!
Hi Dr. Pete, I am just a very inexperienced website owner who has seen a dramatic dip in website visitors since mid-October, and it has set me thinking about duplicate content. Having read your article, I think it could possibly be one of two things, but I would welcome your opinion. I sell children's books by linking as an affiliate to Amazon. Could it be those affiliate URLs causing the problem? As far as I know, they only appear on the Amazon landing page. Would Google penalise me for this? Or is it more likely that Google doesn't like the cut-and-pasted book descriptions, which came mainly from another publisher whose books I used to buy regularly? I can, of course, change them to my own descriptions, although clearly it would take a while with 400 books. My logic for using their descriptions was that I was selling their books! Naive, maybe?
It's tough for affiliates these days, and Google hasn't been kind. Your instinct is correct - if you're using Amazon's information/copy and then linking out, Google is going to question whether you're adding any value. It's critical that you get some form of unique content in place to supplement the shared information. It doesn't have to be all 400 at once - start with your Top 20 revenue drivers and see what kind of impact it has.
I never pull duplicate information from Amazon. For one thing, it's not my content. For another, I figured the text would be dinged for duplicate. And besides, if the user can get that description from Amazon why bother with me? I'm not adding anything to the mix.
It's duplicate content on the same site that is a problem. I'm not really convinced that syndication hurts, as long as you ensure your own page is stronger (i.e., has more links) than the syndicated copies.
Thanks Dr. Pete, I'll get going on that content then. Could the URLs themselves, that I link to on Amazon, be causing a problem? They must all look very similar, containing, as they do, my affiliate ID number. Or is that irrelevant?
Purely from a duplicate content standpoint, handling those affiliate URLs is only an issue for Amazon to deal with. However, being an affiliate and linking out for your products does have many of its own SEO challenges.
Thanks for all your help. 10 product descriptions changed....390 to go.......
Any chance you could explain why using an Amazon affiliate link is an SEO problem? (This is only my 2nd post to read here and I know nothing but am working on it.)
It's not really a duplicate or Panda problem, so much as that Google doesn't view affiliates all that positively. In their mind, Amazon is the source, and you're just a copy - you don't have your own product pages, so they see you as less authoritative. So, the trick is to create your own unique, supporting content, and become an authority.
Mad props Dr. Pete! This is great stuff. Thanks and keep 'em coming!
This is a great and extensive reference for duplicate content. Thank you so much for taking the time to put it together. This will be great for a sticky and reference post.
Very nice summary. Many times I was convinced that the elimination of duplicate content significantly improved positions in search engines and website visits.
Dr. Pete you are a stud! I have already implemented some of the techniques you detailed above. Stellar work!
Dr. Pete, what a fabulous article. Thank you so much for putting it together.
Dr. Pete, a great article on this subject. One thing I am still curious about (and maybe I missed this in the other comments or somewhere else): what do we do to protect ourselves when someone else duplicates our content?
For example, we have a customer overseas that buys product from time to time, and I've come to find out they have copied and pasted all of our web content onto their website without asking. Other than asking them to take it down, what's the best way to keep this type of thing from hurting our rankings?
Thanks!
Funny, that came up on Twitter, too. I was approaching the article from the standpoint of sites that are duplicating content (not being duplicated) - the problem of what to do when your content is being scraped is a really tough one. Google will try to find the original source, but they don't always get it right. Unfortunately, without cooperation from the other site, it gets tricky. If things get bad, you can request a DMCA takedown.
WOW!
Eh, guys, finally we have it: "The Definitive Duplicate Content Post". This is an example of solidarity from an SEO expert to all his mates. Thanks, Dr. Pete - obviously you've made a huge effort compiling all this information.
Thanks for your help. Simply, thank you very much.
Think I need a cup of tea or some kind of sports drink to recover now I've read that. Thanks for the info Dr Pete!
What an amazingly detailed guide! Congratulations!
Hi Dr. Pete,
Great article, I will be implementing your advice on our next round of website updates.
Thank you K
Great list of sources of duplication.
This one might be covered under 'duplicate paths', but server load balancing can also be a source of duplication - www1 and www2 versions of the same page.
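If anyone hits that www1/www2 situation, one common fix is a host-level 301 that folds the numbered hosts back into the canonical host. A minimal Apache .htaccess sketch is below; the hostnames are placeholders, and your load-balancer setup may call for a different approach entirely.

# Hypothetical example: 301 any www1/www2/... host back to the canonical www host.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www[0-9]+\.example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]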
What a great resource about duplicate content! It's sure to be the reference whenever a discussion about duplicate content comes up.
I didn't understand this part. Would you please explain?
" Where it gets tricky is that you’re almost always going to have to generate these tags dynamically, as your search results are probably driven by one template."
Oh, sorry - Here's a longer explanation. The trick with Rel-Prev/Rel-Next is that the tag(s) are different depending on what page of paginated content you're on (2, 3, 4, etc.). Typically, your entire search is driven by one physical page/template, so it's going to take some code to generate the right tags.
Actually, Google even says not to put rel-prev on the first page or rel-next on the last page - that makes perfect sense, but it makes the code even a little trickier. It's not the friendliest solution for CMS users or webmasters without much coding experience.
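To make that concrete, here's a minimal sketch (mine, not from the post) of the kind of template code involved, assuming a PHP-driven listing where $page and $totalPages are already set and pages live at /widgets?page=N. Note how the first page gets no rel-prev and the last page gets no rel-next.

<?php
// Hypothetical pagination tags for a single search/listing template.
// $page and $totalPages are assumed to be set by the surrounding code.
$base = 'https://www.example.com/widgets';

if ($page > 1) {
    // Page 1 is usually just the bare URL, with no ?page=1 parameter.
    $prevUrl = ($page == 2) ? $base : $base . '?page=' . ($page - 1);
    echo '<link rel="prev" href="' . htmlspecialchars($prevUrl) . '" />' . "\n";
}

if ($page < $totalPages) {
    $nextUrl = $base . '?page=' . ($page + 1);
    echo '<link rel="next" href="' . htmlspecialchars($nextUrl) . '" />' . "\n";
}
?>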
Congrats to Dr. Pete - many issues, known and unknown to me, all placed together in one post.
Same as jrcooper asked: how long did it take you to write this post!?
I saw the post get shared on Twitter, and after visiting it, I scrolled to the bottom because it was past 12am. There was a long debate over whether to read it... I went with reading it :)
This is one very useful guide to diagnosing duplicate content, and you illustrated some great examples of issues that I've seen with the clients I've worked with, mostly in the e-commerce space. The biggest issue is probably duplicate paths; we've recommended adding the canonical tag, since some CMSes will generate and publish multiple versions of a product page if it's tagged/categorized multiple times (grrr!). Great to validate my thoughts/suggestions with yours here, Dr. Pete! :)
Hi Pete, for the cross-ccTLD duplicates case (19), have you thought about the rel=alternate hreflang solution? Here it is: https://www.google.com/support/webmasters/bin/answer.py?answer=189077&&hl=en (see the "How does this apply to multi-regional webpages?" part).
I haven't tested it yet, but I'm about to do so.
Has anybody tested out this solution?
Thanks!
I"ve gotta be honest - that's the first I've heard of that one. I admit that international SEO isn't my strong point. I'd love to hear from people who have used it, as well. YOUmoz post, anyone? :)
There is still little information on the Internet about this solution.
It was previously mentioned on SEOmoz (for instance, here: https://www.seomoz.org/blog/duplicate-content-block-redirect-or-canonical), but the problem is that nobody seems to have actually tested it, so we can't really know whether it works or not...
It would be great if we could have some feedback from people who have tried it!
At Distilled, we've analyzed this specific post several times. However, for my own international client, we decided against recommending it. It's very complicated, code-heavy, and even the post itself isn't entirely clear. We just thought it might actually be difficult for Google to implement. Also, Google is pretty good at detecting translated content.
But what about non-translated content? For instance, a website that has exactly the same content on the .uk and .com domains.
The problem is that, although we have declared a preferred domain in Google Webmaster Tools and heavily individualized the title and description tags for our pages, Google still mixes them up from time to time (showing the UK website on Google.com and vice-versa).
Theoretically, I find this solution satisfactory, but I really wonder if it works.
For one, having two domains with the exact same content isn't ideal, because it will be considered duplicate content. However, to get to the main point, I have never seen a live example where hreflang has been implemented. My understanding is that Google isn't consistent in how it handles the tag, either. Overall, I'd be very interested to see if the solution is satisfactory. Please write a YouMoz post on this ;).
Actually, our website is one of the most important French and European websites at the moment.
The reason we have duplicate content on different domains is that we have a very international approach. For CTR reasons, we had to serve the English content for UK users on a .uk domain and the English content for international users on a .com domain. The same goes for French (with domains in France, Belgium, and Switzerland), German, Italian, etc.
We will be testing the hreflang tag in a month or so, only for the UK and COM websites. I will be sure to let people know about it, because I think it can solve many problems if it works.
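For anyone following along, the markup being discussed looks roughly like the snippet below - a sketch based on the Google help page linked above, with placeholder domains. Each English page would carry both annotations, declaring the UK version for British users and the .com version as the general English page.

<!-- On both the .co.uk and .com versions of the page -->
<link rel="alternate" hreflang="en-gb" href="https://www.example.co.uk/page.html" />
<link rel="alternate" hreflang="en" href="https://www.example.com/page.html" />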
What a brilliant post and a great contribution to this subject. I've been struggling for a while with a site that sits on a .com address, where the .com targets the UK and .com/USA targets the US - the only problem is that this really confuses the heck out of Google!
A very useful post for any SEO, thanks Dr. Pete!
Truly fantastic post, Pete!
The next post I'd like to see following on from this would cover how Google detects duplicate content. I.e., how does Google measure the similarity/difference between two pages....do they have a way to remove the "template" elements (header, nav, footer), do they create a hash of the text blocks and compare those....do image filenames get inspected...etc.
Certainly in the Q&A questions I've been tackling here I've had a TON of people desperate to know what to do to avoid getting tagged with duplicate content, when their content really comes from a very common source (i.e. an RSS feed, or an affiliate program, or a manufacturer's set of product descriptions and images etc.).
Of course, the answer to that today would probably not be valid in a month or whenever the next Panda iteration happens :-)
Dr. Pete! ...I found your post more than useful! It's for sure going to be a classic... thanks for your time putting it together, and for sharing, of course :D ...
Read nothing else on the subject - just pin it on your SEO wall!
After a long, well-crafted list, the best comes at the end with "use your brain" - always the best tool! Or not :)
I have a press release site, and I thought about blocking the PR folder in robots.txt, but that would kill all the SEO benefits for me and also for the submitters. So what I did was implement a cross-site canonical tag. So far it's working; some may say it's a hack, but it works.
Another great post Dr Pete, even if I did start reading yesterday and have only just finished...:)
The major point of frustration for me with 'near-duplicate content' is with regard to #15: geo-keyword variations. I have invested a huge amount of time and effort in creating unique location-specific content for our agency site and for multiple client sites, including the use of 'clickable maps' to implement a link structure that doesn't look spammy. I then see competitors using the old 'find and replace' technique (simply replacing the place name hundreds of times) and consistently outranking my sites.
I actually look forward to the day that Google seriously devalues inbound external links, as these seem to have the power to 'override' legitimate SEO work on occasions like these.
Insane post. Thank you kindly DR. Pete.
Thanks Dr Pete for putting so much effort in this post. It's gone straight into my list of top 10 resources.
Thanks for such a fantastic post.
I have a question following a couple of points made above. A client has an e-commerce site and has taken the time to create unique product descriptions for it rather than using the manufacturer's.
My client also sells these products on Amazon, if they use the same original product descriptions from their site on Amazon, is that still going to be viewed as duplicate content?
It could be, yes, especially since Amazon has such massive authority. If you're already set up this way, you should definitely keep tabs on how your product pages rank vs. the Amazon versions.
Thanks Pete much appreciated.
I would say you should run a test over a month or two with a few products using the Amazon approach and a few that are simply on your client's site. That is the best way to find out in the real world. We do tests like this all the time. We also run tests on static vs. dynamic page content and find that static content trumps dynamic content 100% of the time.
Wow. Thanks so much for this content. Super useful and appreciated. LMAO and extra TY from the "visual learners" : )
Thanks, a great article. Can you just confirm:
If I have two micro websites (both pointing to the main website), and I have put the same copy on each but then changed (spun) a number of the words and changed the images on one of the micro sites, would Google pick that up and classify the sites as 'near duplicates', thus giving one of them or both of them a bad ranking?
Thanks!
It depends, but it's highly unlikely that all of them will rank. Google is likely to filter out 1 or more. With 3 sites, the odds you'd be flat-out penalized aren't that high (depending on the sites), but the odds that 3 very similar sites will all rank are very low, unless there's virtually no competition. I think these kinds of microsites, which used to be pretty effective a few years ago (maybe even a couple of years ago) are going to lose ground quickly over the next year.
Great article and definitely one of the most in-depth pieces I've ever read on the topic. I'm still surprised how many people still don't understand how Google treats duplicate content. I'll definitely share this on my blog as it's a question that I get all the time. Thanks again!
Thanks for the great post. OK, unique content is the only way to let search engines know who I am. But how can I convince our customers to create unique content for businesses like screws or turned parts? Following Matt Cutts' advice to become a leader in my niche seems very hard to realize, because my competitor has the same idea with the same content...
I think you have to look at content as an extension of your unique value proposition. What does your client do that sets them apart? It's a broader and critical business question, well beyond SEO. If they can answer it, you've got a basis for content. If they can't, you have bigger problems than SEO.
I'd honestly suggest walking the floor, if you can - talk to the people who work there and find out what energizes them. Someone there is good at their job in a way that's interesting, and they probably don't even know it. Someone is passionate about a product others find boring. If you can understand those people, you've got a starting point.
Wish I could thumb up this post again - because I would.
I haven't seen such a well-prepared post in a long time - one that points to potential problems and offers effective, practical solutions. The duplicate content issue will stay a hot topic for a while. It takes time to clean up the huge amount of duplicate content all over the net, created mostly by spammers and low-quality link providers. But their time is gone. Now I know what "Dr." means when we talk about Dr. Pete. Excellent post - I'll bookmark it and read it once again.
It is an absolute delight to read someone who can actually write about this subject using English that a layperson who knows absolutely nothing about the subject can actually understand. I can't believe it and didn't think it was possible.
I don't know code, don't know SEO, and can barely figure out a stupid keyword. I am as non-techie as they come. I'm a writer who blogs. I think, for me, Panda has been a godsend, because SEO is now moving more toward my territory and things are starting to make a little more sense.
This is my first visit here and, as I said, the 2nd article I've read - both by you. I didn't find this piece long at all. I found it comprehensive, and the length probably necessary in order to deal intelligently with the subject matter. What immediately caught my attention was the whole question of what duplicate content is. This is actually the first article I've read that suggests there is more to it than scraping and copying blocks of text. Not only had I never considered or realized that pages could be duplicated in the way you're discussing, but the question of "near" content, specifically with regard to the organization of content on the page, raised flags. So I have a couple of questions.
I honestly just thought all this Panda hoopla was basically over scraping and plagiarism and things like that, because that's what I've been reading. This is different - way different.
I want to know these things, but clearly I'm not a coder, and much of what you are talking about is way over my head. I'm assuming that these non-content issues, such as URL duplication, etc., were I to find them, would require a technical person to handle.
What helps me is to know about these things and understand what I can do at the level of creating a post to eliminate the possibility of them happening; but I get the feeling that some of what you are talking about may not be something I can control - that it happens when you have a site that does specific things, such as selling items and using a shopping cart.
So my last question would be whether what you are talking about predominantly happens on sites that developers create, or whether it is something that happens if you have a blog site that uses WordPress and a theme such as Headway, and you don't do a lot of fancy stuff or try to change the basic theme structure.
In essence, how much should I be worried?
Oh, and the only thing I've read for months about "thin" content references wordcount. You raise a whole new issue. (#16)
Thanks again for a well-thought-out and well-developed article that even I could understand enough to get through the whole thing. And thanks to everyone commenting for such great interaction and suggestions. (My comment pays homage to your article in length.)
Thanks for the positive feedback :) Regarding your specific questions:
(1) Having a common template is fine, as long as the actual content is unique. The one exception would be "spun" articles - creating dozens or 100s of articles that are very similar and only differ by a few keyphrases. These are often going to look low-value to Google. Sharing a common theme and layout is ok, though - virtually all major blogs do that.
(2) Some variation is fine, and even natural. What I'd worry about is creating multiple pages that target keyword variations, where that keyword variation is the ONLY thing that changes. If you have "/home-tuition", "/home-tuition-classes", and "/home-tuition-courses" as 3 different pages that only differ by those keywords (all other content is identical), that's a low-value tactic. It used to be common, and honestly, it used to work, but now the risks outweigh the benefits.
If those topics are all unique and you have unique things to say about them, that's different. It's fine to target keyword variations across a site (and I'd argue it's good SEO), but you have to have the content to support those variations.
(3) You can have good SEO on a custom site or a WordPress (or other CMS) site. The only issue with CMSes like WordPress is that they sometimes create duplicate URLs and paths or put the same TITLE and META description on every page. There are plug-ins and other ways to control that, though. The template itself normally isn't a problem. The exceptions would be massive templates that take a long time to load or very "heavy" templates with very little unique content. If your template is loaded with images, ads, and plug-ins (social plug-ins, for example), and every article is a short paragraph, your site will look thin. That's a balancing act, though - there's no easy answer.
Wow, I am glad I found this reply before posting. It addresses my question: will the old technique of adding dozens of nearly identical keyword landing pages create a penalty?
It looks like combining close phrases on the same page is now better than creating a page for each keyword phrase variation or modifier.
One question, which might be related to duplicate content: when a website has one URL, rather than unique URLs for each page, yet has many unique pages with unique content, what happens from an SEO perspective?
I am wondering if this is an advantage, since all of the pages affect that URL's PageRank, but I also see it as a disadvantage. Any ideas?
Great information, Dr. Pete. It's an A-Z guide to solving duplicate content issues post-Panda. This will surely help website owners and webmasters who still haven't been able to overcome the Panda effect. Thanks a lot.
What is the best way to find out if there are duplicate content URLs for a given URL on my site? Is it copyscape.com?
Any other tools that can just keep an eye on our entire site (or selected URLs) to see if anyone has copied our content?
CopyScape is more for tracking duplicates across other domains - such as syndicated or scraped content. Our PRO tools will monitor internal duplicates, and Google Webmaster Tools can handle some of that, too.
If you want to know about people copying you, though, CopyScape is still one of the better automated tools. I sometimes do a quick-and-dirty version by just copying blocks of unique text, in quotes, into Google search. It's amazing how fast you can spot a couple dozen scrapers with just a few lines of text from one of your more popular pages.
This is an amazing resource. My head is spinning from all the information. I can't thank you enough for your research and compiling this into a document that I can refer back to.
Thanks Pete. This will be a fantastic document to keep as a reference.
Just a few points: you mentioned that the best solution on search sorts is a meta noindex on that page. I use a canonical tag. Is that just as effective?
Also, one interesting thing I read when Google announced their pagination solution: they said that, through their testing, they found users prefer to have only one page, i.e., no pagination at all. This not only solves all the pagination issues, but also gives the user the best experience, according to Google's "testing".
Purists will say that you shouldn't use canonical on search sorts, because they aren't "true" duplicates, but honestly, I suspect it'll work fine in most cases. I'd be careful with Google's testing findings - they have a bad habit of condensing the entire web into one data point. From what I've seen, having all results (or a lot) on one page CAN help, but it really depends on your audience. I'd definitely A/B test it. It's funny that they say that, but still have a 10-result SERP (for now).
That makes sense about the canonical. Thanks.
Yeah, I'm taking their "testing" results with a grain of salt. This is the easiest solution for them to deal with, hence they're promoting it!
Thanks again, Pete.
Wow, this article rules! It helped me a lot in fixing lots of problems on my Magento webshop! Thanks
What a great resource
Like everybody else ... wow ... excellent job Dr. Pete!
Questions:
1. If part of your on-page content is dynamic syndicated content, where is the cut-off for duplicate and near duplicate? Is there a percentage? 50%? 75%? Is there a clear cut-off?
2. Tags in CMS page headers would have to be implemented on a template-by-template basis. That is, for example, the standout tag would only have to be present on the part of the site with breaking/original news? And not used more than seven times per week?
Again many thanks here!
(1) Unfortunately, not really. I think it depends a bit on your industry, how you're syndicating, and your overall authority (link profile strength, basically). Personally, I wouldn't push past 50% syndicated, unless your whole industry is 100% syndicated.
(2) The "standout" tag hasn't been very well documented at this point, I'm afraid. I don't have much data on how much attention Google pays attention to it. I'm afraid it's going to be like other call-out tags - people will abuse it to the point that Google stops paying attention. If you use it, I'd use it sparingly - they're more likely to take it seriously that way.
This is a gem of a post. It reflects a pretty deep-grained understanding of the issues, and I thank you for the effort you've put into this SEO duplicate-content 101. There was so much info that I may have simply not absorbed the answer, but perhaps you could help me out with this issue. A client's web designer has placed copies of my client's home page on satellite URLs. Same text, same everything. The links from the product category navigation items in the main menu and the in-content pictorial categories link and redirect to the main website's relevant pages, though the anchor text, etc., is now out of date.
I wanted to simply redirect any traffic straight into the main website's content. I wanted to completely remove the duplicated homepages. The designer's judgement is that this will be seen as cloaking by the Big G and incur a different penalty. I disagree based on what I have tried to find out and my own fairly limited experience of working with large sites.
My alternative strategy was to refresh the content on those existing one-page satellite sites and kill the duplication that way, so that the URLs are hosting more targeted content for their keywords.
The designer also controls the hosting and has been really arsey - it took two months and more than 12 phone calls to actually get the conversation - and to be honest, I would advise the clients to move hosts. Unfortunately, they like the cheap hosting on offer, despite the shite customer service.
Would someone detached from the above argument be prepared to give an opinion on the best strategy for de-duping in this situation? I am arguing with a person who has 10 years more mileage than me, but I am convinced the strategy he has given my client is hurting their rankings for selling their goods across the UK.
Many thanks,
Ray
Hi.
You say: "All else being equal, bloated indexes dilute your ranking ability."
That's quite an important factor for me. As each user comes onto my site, their search term (if available) is used to construct a new page for future users of the same search term. Hence, as an example, I have a dynamic page called metal-blue-widgets.htm and another called blue-widgets-metal.htm. They are the same page, or duplicate content.
It's tricky, programmatically, to spot which page should be the canonical one, so that it can be added as a header tag - I have over 500 products. The idea is to provide a page for my customers that just lists the metal blue widgets they want - it saves them having to find them amongst the other stuff, so the purpose is good.
With 6000 of these pages in existence - all providing variations of products, and all indexed - it's very important for me to recognise the truth value of the above statement. Could anyone indicate other material that shows the same?
Regards
Baron
I don't find user-generated search pages to be very valuable, honestly, and they can often spin out of control. I think it's much better to manage your own categories/sub-categories and focus on what's important. Otherwise, you've got thousands of very similar pages competing for attention in the rankings, and Google's not that keen on search pages to begin with.
I am far from being as authoritative as Dr. Pete, but I know the theory, and over the last few months I have read extensively on the subject. Google's algorithm is based on the PageRank algorithm. Nowadays, a lot of additional factors come into play, but PageRank is still very important. Basically, mathematically speaking, diluting your inbound links among many identical pages is bad. I have seen it mentioned over and over by many experts, and you should not have any difficulty finding your own references.
This dilution of inbound links, and also the fact that you use your crawl budget badly, are not considered penalties imposed by Google. They are penalties that you impose on yourself. Before the Panda revolution, Google was very clear that it did not penalize duplicated content. With Panda, this might have changed. In my opinion, this means that you had better find a solution to your duplicate content issue.
However, removing the duplicate content is not always ideal. Often, the content is duplicated for a useful purpose. I am always annoyed when a so-called SEO expert quickly jumps to the conclusion that duplicated or near-duplicate content is a bad structure. On the contrary, it is fundamentally natural. Information progresses through phases of duplication (analysis) and consolidation (synthesis). Near duplication is fundamentally required because there are so many possible expectations from users, and sometimes small details are important. Any fight by Google or SEO experts against near duplication is a waste of time. It is natural and necessary.
That being said, generating the duplicate content automatically in response to a query does not seem to fulfill any useful purpose except forcing many pages with similar content to be indexed separately in Google's index, which is bad. Duplicate content is fine within your site - this is the part that is natural - but you should do everything possible to avoid duplicate content in Google's index, because Google's search pages correspond to the synthesis part; we do not want duplicate content at this level. The canonical link element is a very useful tool for that purpose. I want to know what Dr. Pete thinks about it, but if you find a way to use the canonical tag to present a nice structure to Google with little duplicate content, then you will be fine even with these user-generated search pages.
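One illustrative approach (my own sketch, not something from the thread): if the duplicate URLs differ only in the order of their attribute tokens, you can derive a single stable canonical deterministically by sorting the tokens and point every variant's rel=canonical at that one URL. For example, in PHP:

<?php
// Hypothetical helper: metal-blue-widgets and blue-widgets-metal both
// normalize to blue-metal-widgets, so every variant can declare the
// same canonical URL.
function canonical_slug($slug) {
    $tokens = explode('-', strtolower($slug));
    sort($tokens);
    return implode('-', $tokens);
}

$requested = 'metal-blue-widgets'; // the slug of the page being rendered
$canonicalUrl = 'https://www.example.com/' . canonical_slug($requested) . '.htm';
echo '<link rel="canonical" href="' . htmlspecialchars($canonicalUrl) . '" />';
?>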
I'll agree on the user level that some duplication is perfectly sensible, but I'm mostly talking about removing it from Google's view. For example, you need search pagination if you have a ton of results, in most cases - it's valid, useful and necessary. Google doesn't want that indexed, though, and that's the key.
Actually, it goes deeper than that - usability is not the same as search usability. Paginated search (for example) is perfectly useful for your site visitors. On the other hand, running a search that pulls up Page 17 of one of your internal search results is NOT useful for search visitors. They're going from Google's search results to yours, and that 17th page of results has very little context or meaning. So, I think these are two very different arguments. What's good for users on your site isn't always good for users who arrive on your site via Google.
You are the best!
Hello Dr. Pete,
Here you explain the Panda update and duplicate content really well, but I have a doubt about rel=canonical: in the tag, do we write the URL as "www.example.com" or as "www.example.com/index.html"? I'm confused about which one to use, so I hope you can explain this to me.
jasika marshel, canonical means preferred, so if your preferred URL is www.example.com, you have to add this code at the top of the head section of the non-preferred URL: <link rel="canonical" href="https://www.example.com/"/>. If you have a landing page for which you run a lot of paid campaigns, and which therefore involves a lot of URL tags for tracking purposes, as a precautionary measure you can add:
<link rel="canonical" href="https://www.example.com/landingpage.shtml"/>
Arpitsrivastava is essentially correct - the key is the rel="canonical" attribute in the tag. (Our comment editor originally ate that part; it did it to me, too.) Please see this Google post for the proper syntax:
https://www.google.com/support/webmasters/bin/answer.py?answer=139394
The confusing part is usually where to put the tag. The root ("/") version and "index.html" version of the page are almost always the same actual file/template, so you only need to put the tag in one place, and it'll cover all variations of the home-page (usually).
The trick is if you have a CMS or some kind of sitewide page header - you don't want to add a canonical tag to one template and have it roll out to 100s of pages. The actual implementation can get tricky in practice.
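As a minimal sketch of what that looks like in practice (assuming a PHP header template and placeholder URLs), the trick is to emit the home-page canonical only when the home page is actually the page being rendered:

<?php
// Hypothetical sitewide header include: only print the home-page
// canonical for "/" and "/index.html", not on every page that shares
// this template.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

if ($path === '/' || $path === '/index.html') {
    echo '<link rel="canonical" href="https://www.example.com/" />' . "\n";
}
?>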
Dr Pete,
Great post, but there are certain things that I would like to point out:
www.abc.com/index.html >> 301 redirect >> www.abc.com is not possible, as that would lead to a redirect loop and the home page would go down. The canonical tag is the only way.
For international sites with duplicate content, there are a lot of options, like -
- Sajeet
You're right, in the sense that the "index.html" redirect can loop, and I think Apache servers tend to have trouble with it. It's not impossible, though - there are some ways to get around it, and the rewrite is safer on other platforms. For a home-page, though, I'll agree that canonical is usually a better bet. In addition to being safer/easier, it also scoops up other variants.
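For reference, one common Apache workaround (a sketch with placeholder paths, not something from the post) keys the rule off the client's original request line, so the redirect only fires when a visitor explicitly asks for index.html; the internal DirectoryIndex lookup for "/" never matches, which is what avoids the loop.

# .htaccess sketch: 301 /index.html (and /folder/index.html) to the folder root.
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(.*/)?index\.html[\ ?] [NC]
RewriteRule ^(.*/)?index\.html$ /%1 [R=301,L]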
Great article Dr. Pete. But I think some points require clarification.
1. I do not agree with point 4 about using the meta robots directives "noindex,nofollow". The reason is that you create dangling pages or nodes (dead-end pages). The appropriate implementation would be the robots directives "noindex,noarchive,nosnippet,follow".
2. The information you provided about the meta tag "syndication-source" (11) is outdated. Google changed that to "standout".
I think there are times when it's ok and even advantageous to create a dead-end, if the path is naturally a dead-end for spiders. Otherwise, if the paths are complex enough, you could end up with crawl fatigue. Unfortunately, these situations are usually so complex that I've never seen anyone effectively measure one vs. the other. So, we're sometimes left with a difference between two educated guesses.
Do you have a reference that suggests "standout" has replaced "syndication-source"? It was my impression that "standout" was a way to call attention to a small sub-set of news items, not to send a syndication signal. You can "standout" links to other sites, but that isn't a canonical signal (as far as I understand it). I haven't implemented it, so I may be wrong.
About my first point, I was referring to the original PageRank patent, and since there is no evidence that have been modified, I prefer to stick to my point.
About my second point, "standout" replaced the purpose of use of the "syndication-source". To be specific:
Source: https://www.google.com/support/news_pub/bin/answer.py?answer=191283
This is purely my opinion, based on anecdotal evidence. By the original PageRank patent, you're right - since the PageRank calculation is iterative, cutting off a path completely could keep it from looping back up and through a site, theoretically choking off a small amount of PR. I suspect, though, that:
(1) Changes to the Google algorithm over time have modulated the amount of PR passed by navigation elements. So, if the only links on a page other than sitewide links (including navigation) have no value, then the loss of iterative PR is virtually none.
(2) For deep pages, the amount of PR passed back up is small enough that the negative of losing it is smaller than the negative of causing the crawler to go through 100s or 1000s of unnecessary pages. Of course, this is highly situational.
An example where I'd consider using NOINDEX,NOFOLLOW is on a shopping cart page. Every page below it (checkout, for example) is useless to search. So, if the contextual links are useless and the PR-passing power of the sitewide links is modulated, I expect the loss is negligible. Of course, I'd also probably NOFOLLOW the link to the shopping cart itself, so the Meta NOINDEX,NOFOLLOW is really just a backup at that point. By nofollowing the link itself, PR flow is cut much more surgically.
For any given situation, calculating the amount of PR lost vs. the crawler fatigue is something only Google can do (and even they probably couldn't give you those numbers for any given site). So, it's highly speculative. I do agree that, when in doubt, NOINDEX,FOLLOW is going to be safer for most people in most situations.
Thanks for the reference on the changes to syndication-source. I'll dig into that over the weekend and update the post once I understand the distinction better.
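For anyone implementing the noindex,follow vs. noindex,nofollow options debated above, the two variants are just one-line meta tags (a sketch; adjust the directives to your own situation):

<!-- Safer default for most filtered/sorted/paginated pages: keep them out of the index but let link equity flow -->
<meta name="robots" content="noindex, follow" />

<!-- For true dead-ends, like a shopping cart or checkout path -->
<meta name="robots" content="noindex, nofollow" />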
Excellent article. I have recently tried out a new internal linking structure to deal with the issue of targeting multiple local areas. I didn't duplicate content or create lots of city-specific links; what I did was create a linking structure that associated each core subject with its city in a clever (I think) way.
I have yet to see the results, but I plan on making it my first YouMoz post when I do.
The whole structure was designed to avoid the local duplicate content issues you were discussing.
Excellent article.
I am curious to know if you would consider the following 3 listing pages as duplicate content:
https://www.thinkvidya.com/bangalore/home-tuitions-classes
https://www.thinkvidya.com/bangalore/home-tuition
https://www.thinkvidya.com/bangalore/home-tutors
Is it a good strategy to build pages for related search terms, which then show different sets of search results?
I think it depends a bit on the scope, but if you're spinning out the same category (search results, in this case) just to target slight keyword variations, then I would call these near-duplicates. If you do it a couple of times to target your major keywords, it's probably fine. If every category has a handful of variations just to target different keywords, then you're probably going to run into problems.
Great article! This is the only one I'll probably need on duplicate content because you were so thorough. Thank you.
Question: twice now I've had to go up against chiropractor and veterinarian sites that were purchased or rented and came complete with copy, sometimes hundreds of pages of it. This same copy is shared by many, many other chiropractors and veterinarians across the web. It is duplicate copy, but nevertheless those sites rank high. The question is: why? Maybe Google doesn't care? I'm sure there are other factors coming into play here as well, some even a result of all that copy, which must be attracting visits, increasing time on site, lowering bounce rate, etc. Are all those things making up for the duplicate copy? What are your thoughts?
You can see it if you search on "santa cruz chiropractor." In the top 5, only McCollum and Griffin have custom copy.
One of the most helpful tools in the SEOmoz toolkit is the Keyword Difficulty Tool, which allows you to see exactly which metrics are being won and lost by each of the top 10 results for a given keyword term.
To see all of the metrics you need to run a full report and when setting that up, you can also add your own URL for comparison against the Top 10.
Rand gave a detailed explanation on how to use the full reports to see exactly what is influencing the rankings for each site in his post The Best Kept Secret in the SEOmoz Toolset.
Looks like these results are heavily influenced by local search factors.
Sha
As Sha said, there are quite a few factors that can come into play. With a local business, like a chiropractor or veterinarian, local SEO factors (like your Google Places listing, citations, etc.) can definitely play a strong role. I also suspect that, if the templated sites target different regions (and especially if they're smaller sites), Google may overlook it. So, they may be able to push the envelope a little more. On the other hand, I still think that your own unique content is a competitive advantage over time.
Thanks for the insights, Dr. Pete. I do have a question regarding those pesky "near duplicate" pages: In your opinion, how different does content have to be in order to be considered non-duplicate? Would rewording the verbiage between one page and another do the trick, or is a more complex "intervention" needed?
And what about items like a company's "boilerplate" descriptor? As an important part of the brand, there are definite advantages to having it on multiple pages from a brand-identity standpoint, but does it fall into the duplicate-content category?
Look forward to your answer!
That's a really tough one, and it depends a lot on your site as a whole - how many pages are near-duplicates, what kind of authority you have, etc. It also depends on whether you've already been hit by Panda or are just being proactive. It can take significant changes to undo Panda, but you can ease into it if you're just trying to prevent future problems.
I think there's another issue with chunks of copied content, like boilerplate company descriptions - they can cannibalize your own keywords. They might be a weak signal that you want the entire site to rank for terms in that description, but they also confuse Google as to your priorities. I think it's often better to target one page for that content, and stick to something shorter (like a solid tagline) sitewide. In most cases, I strongly suspect visitors ignore it, too - we tend to overestimate the value of our brand content in the eyes of site visitors.
Great insights -- thank you!
Really good stuff, Dr. Pete. I think "Your Own Brain" gets underutilized in many instances. Again, great article.
Many thanks for this brilliant post. I can't remember ever reading such a long blog post. It should appear exactly this way in any SEO book.
Thanks so much for the clarification Dr. Pete. Extremely detailed and helpful article. So many changes with Google and the changes are becoming too frequent. Good to have your article for reference.
Wow! That's what I call a comprehensive article about duplicated content. Everything clear and neat.
Epic Post for Duplicate Content, Thank you.
What he said.
How long did it take you to write this post? A must-read for all SEO strategists.
Wow!!! Just hats off mate.....Thanks for the great info. This is really gonna be a big help. Appreciated.
Cheers!!!
Edit: Just realized....1 thumb down??? SEOmoz surely has a spam....lol!!!
All the collected information under one roof... thanks, good research work.
Great post. I didn't think that having a link to index.htm was different than linking to the main URL. I was noticing some funny business in Google results, where it was showing the full URL including index.htm even though I was linking to the page by the domain only. I will set up a canonical to take care of this.
Now I understand why Google still prefers plain HTML pages: blogs are very complicated and create lots of problems for search engine bots. So why do they say that blogs are SEO-friendly?
What a fantastic post, Dr. Pete.
I have been feeling for a while that duplicate content was going to become a much less isolated issue than it was, so Panda has not been a massive surprise. That said - boy, are the updates coming thick and fast!
The rel-prev and rel-next tags are interesting ones, too. They are something that I will be watching closely. What is the SEOmoz feeling on them - awesome, or a bit of a waste of time?
Of all the great SEO blogs out there, SEOmoz proves to be the most cutting-edge.
So, all the way from the UK, we say 'legends' and 'keep up the good work!'
A number of people asked if we could make a stand-alone PDF version of this post available. That's complete now, and you can download it here. FYI, it's about 22 pages and 560KB.
Dr. Pete, wowee! What a thorough article, and one that explains each issue so very well. This is certainly one that I will be bookmarking and making regular reference to.
Cheers