Fat Pandas and Thin Content

Comments 77

Gianluca Fiorelli

2011-05-04T21:06:44-07:00

F.A.N.T.A.S.T.I.C

I think your post is even more useful than being just about Panda... it's a perfect small manual on what duplicate content is; something easy to say but, as I've discovered, hard to make people really understand.

11 0

- Libertine
 
 2011-05-05T01:09:20-07:00
 
 Seconded - cracking post with specific examples and actionable suggestions.
 
 You're really showing the value of your PhD here, Dr. Pete :)
 
 4 0
 
- TME_Digital
 
 2011-05-05T01:12:24-07:00
 
 Definitely, agreed. Great post, Pete!
 
 2 1
 
- Timo Smeets
 
 2011-05-05T01:34:11-07:00
 
 Totally agree with you Gianluca!
 
 Fantastic post with great insights.
 
 3 1
 
- Thomas Høgenhaven
 
 2011-05-05T09:49:01-07:00
 
 Yes, agreed - an excellent post! I really like the graphic design in it too. So easy to pass on to others, as it is basically self-explanatory.
 
 1 0
 
Robert Duckers

2011-05-05T04:14:30-07:00

About #7 - Don't forget you can use Webmaster Tools Parameter Handling to filter out those parameters you want Googlebot to ignore. This might be quicker to do than trying to NOINDEX etc. Coupling parameter handling with canonicalisation is a good step in the right direction.

There must be millions of ecommerce sites with 'Thin' content. This is a helpful post...

5 0

- Dr. Peter J. Meyers
 
 2011-05-05T07:56:32-07:00
 
 My only issue with GWT parameter handling (and this applies to seo-himanshu's comment as well) is that it's Google-only. It won't impact Bing or your SEO tools and analytics (including our own campaign tools). Google could also just up and change how it works, so it makes me a little jumpy.
 
 That said, it does seem to be effective and often faster than other methods. I think of it more as a short-term solution, though - something to patch the leak while you rebuild the levee.
 
 Dr-Pete edited 2011-05-05T07:56:52-07:00
 2 0
 
Himanshu Sharma

2011-05-05T05:23:52-07:00

I would like to give one small tip here. Generally, websites that duplicate their own content through faceted navigation, filters, internal search, session IDs, etc. append some parameter to the end of a URL. For example:

https://www.abc.com/i-am-original.php

https://www.abc.com/i-am-original.php?_SID=U

https://www.abc.com/i-am-original.php?ocode=02916595

https://www.abc.com/i-am-original.php?ocode=02916592

..

Through the 'parameter handling' feature in Google Webmaster Tools, you can suggest that Google ignore such parameters. This method can quickly reduce the amount of duplicate content from your website in Google's index. It is also more efficient than slapping noindex or canonical tags on each and every page, especially if your website is very big and adds/removes a large volume of content on a daily/weekly basis.
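
As a rough illustration of what "ignoring a parameter" means for URLs like the ones above, here is a minimal Python sketch. It is not how Google's tool works internally; it only shows how parameterised URLs collapse to one address once the chosen parameters are dropped. The function names and the `IGNORED_PARAMS` set are assumptions made for this example, using the `_SID` and `ocode` parameters from the sample URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameter names taken from the example URLs above (assumed to be
# content-neutral); replace them with the ones your own site uses.
IGNORED_PARAMS = {"_SID", "ocode"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Return the URL with the ignored query parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Group URLs that collapse to the same canonical address."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), []).append(url)
    return groups


if __name__ == "__main__":
    sample = [
        "https://www.abc.com/i-am-original.php",
        "https://www.abc.com/i-am-original.php?_SID=U",
        "https://www.abc.com/i-am-original.php?ocode=02916595",
        "https://www.abc.com/i-am-original.php?ocode=02916592",
    ]
    for canonical, variants in group_duplicates(sample).items():
        print(canonical, len(variants))
```

Running a crawl export through something like `group_duplicates` gives a quick estimate of how many indexed URLs are really the same page, which helps decide which parameters are worth handling.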

OptimizeSmart edited 2011-05-05T05:25:10-07:00
4 0

Jon-Mikel Bailey

2011-05-05T07:43:45-07:00

Brilliant post. The visuals here explain duplicate content and its solutions better than any overview I have seen. This also really speaks to a company's overall content strategy. By reading this, I hope it becomes clear to many that usefulness is very important, and probably more important than content saturation. I am certain that Google and the like will continue to change the rules, making relevance and usefulness matter more and more. I love that with every update to their algorithm there are fewer chances of "cheating the system".

3 0

- Dr. Peter J. Meyers
 
 2011-05-05T07:58:35-07:00
 
 Not to be self-congratulatory, but thanks for complimenting my illustrations. I'm trying to do more of that, but it takes a REALLY long time right now.
 
 3 0
 
Adrian Drysdale

2011-05-05T03:38:06-07:00

Do I get a free shirt if I get 500 thumbs down? I'm on my way!

In regard to the blog, great findings!

3 0

saibose

2011-05-04T20:41:44-07:00

What do you suggest for a website like hotels.com, which shows pages based on listings? The listings differ based on query type, but essentially they are the same.

Is syndicating content from other pages a good idea?

Example: NYC hotels and cheap NYC hotels are 2 different URLs, and they pull content from a destination page which has around 7 parts. The NYC hotels page pulls content from the 1st part of the NYC write-up and links to that page with a "read more" link. Similarly, cheap NYC hotels pulls content from the 2nd part, and so on.

Does that make the pages thin, considering that the listing entries are different for NYC hotels and cheap NYC hotels, and that it's ensured that there is syndication of content and no scraping?

3 0

- Dr. Peter J. Meyers
 
 2011-05-05T07:39:27-07:00
 
 That's a good (and very tough) question. Let me start by saying that it's not a level playing field. Big brands and high authority sites can get away with things that other sites can't.
 
 Technically, I do think this is duplicate content, and I think it has low value for search visitors. These pages are created purely to rank on alternate terms. The thing is - it's probably working for them and it may be generating tons of traffic. Does that mean Hotels.com will be immune forever? Probably not, but you'd be hard pressed to convince them to cut out ranking pages today.
 
 It's a risk calculation, from an SEO perspective. Originally, large-scale duplication just meant some pages might go supplemental or get filtered out. Not ideal, but not a disaster. Later, duplication started creating indexation issues and stronger filters that started hurting pages beyond the duplicates. Still not a Capital-P Penalty, but the consequences could be serious.
 
 Now, you've got Panda, and what really feels like a penalty to a lot of people. I don't think duplicates alone are the cause in most cases, but now you're looking at a situation where large-scale duplication could impact an entire site. So, when do you start to move toward the next algorithm change? That's a very difficult calculation.
 
 4 0
 
 - saibose
 
 2011-05-05T09:12:32-07:00
 
 Pete,
 
 I mean websites (not necessarily hotels.com) of similar status that have numerous legitimate social and website mentions. Folks refer to them (like we are doing here) even though they have these issues on particular pages.
 
 My question is:
 
 Does Google consider a web page as a whole, or does it evaluate components? It may be that a block of text has been syndicated, but the page as a whole is not similar to any other page.
 
 Would you consider a page thin if syndicated content from another page takes up about 10% of the real estate and the rest of the area is informative listings that are not replicated elsewhere?
 
 1 0
 
teebus

2011-05-06T12:25:15-07:00

I find this direct duplication of your original (and rather excellent) article a tad ironic - https://indbox.co.tv/fat-pandas-and-thin-content/.

2 0

- Dr. Peter J. Meyers
 
 2011-05-06T12:27:57-07:00
 
 I like how they changed the title slightly, but then kept my original title as the short URL. Classy :)
 
 SEOmoz has a ton of scrapers. It's the sincerest form of online flattery, right?
 
 1 0
 
Sebes

2011-05-04T22:43:45-07:00

Great article! I've got a question: do you know what algorithm Google uses to determine similarity of content?

2 0

- firstconversion
 
 2011-05-05T02:17:13-07:00
 
 Don't know about Google, but I use this: https://www.wordsfinder.com/tool_duplicate_content_checker.php
 
 3 0
 
Moosa Hemani

2011-05-05T01:06:27-07:00

More great advice from Dr. Pete. Even aside from Panda, this is a great post that covers the problems and solutions around ‘thin’ and duplicate content.

Two things are the main takeaways from the post (IMO):

1. Within a site or across sites… duplicate content, whether exact duplicates or near duplicates, can hurt your rankings and overall visibility in search engines.

2. Making money online or applying for AdSense is not bad, but having too many ads can hurt, so you have to scale back and try some experiments to see what works best…

Simply Epic!

2 0

Gordian

2011-05-05T16:30:10-07:00

What are your thoughts on large ecommerce sites that have a large number of affiliates? Do you think that, after Panda, having any affiliates at all is a bad thing because of the potential duplicate content issues?

1 0

Mat Hammer

2011-05-10T07:27:10-07:00

Nice post, thank you for taking the time to explain duplicate content and how to avoid it.

1 0

lfseo

2011-05-11T21:21:13-07:00

We've lost a lot of traffic after the Panda updates, so I've been trying to figure out how to eliminate duplicate content.

I just did a quick Google Analytics check to see how much in sales was actually generated by users who landed on page 2+ of our category or search pages (the only pages on our site that have multi-page results). I ran a "landing page vs. sales" report with some simple filters. In our case, this was fairly easy, since everything is controlled by "cat=", "search=" and "page=" URL parameters.

I discovered that 1.41% of our sales (in dollars) over the past year were generated by customers who landed on a page 2+ category or search page.

After adding another filter to separate the category and search pages, I came up with these results:

1.25% of sales came from a page 2+ category landing page

0.19% of sales came from page 2+ search results landing page

This was just a quick and dirty test; I didn't bother to drill down and see whether all this traffic actually came from Google search (some of it came from other search engines and traffic sources).

However, since only a small percentage of sales came from page 2+ search results, it would probably be safe to add a NOINDEX tag to those pages. If adding the NOINDEX tag to those pages would result in regaining the traffic we've lost due to duplicate content, it would probably be worthwhile.

I've already added the "canonical" tag to those pages, but it's too early to tell if it has made any difference. If that doesn't work, I'll try the NOINDEX. From what I've read in the Google guidelines, you should only use one tag or the other and not both.

If you are running Google Analytics to track ecommerce, you should be able to do a similar analysis.
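
For anyone who would rather do this outside the Analytics interface, the same breakdown can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes you have exported (landing page URL, revenue) rows from your analytics tool, and that listing pages are identified by the "cat=", "search=" and "page=" parameters described above. The function names and sample rows are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def page_type(landing_url):
    """Return 'category' or 'search' for a page 2+ listing URL, else None."""
    params = parse_qs(urlsplit(landing_url).query)
    try:
        page = int(params.get("page", ["1"])[0])
    except ValueError:
        return None  # non-numeric page value: not treated as a listing page
    if page < 2:
        return None
    if "cat" in params:
        return "category"  # if both cat= and search= are present, cat= wins
    if "search" in params:
        return "search"
    return None


def deep_page_sales_share(rows):
    """Percent of total revenue that came from page 2+ landing pages.

    rows: iterable of (landing_url, revenue) pairs, e.g. from a report export.
    """
    rows = list(rows)
    total = sum(revenue for _, revenue in rows)
    revenue_by_type = {"category": 0.0, "search": 0.0}
    if total == 0:
        return revenue_by_type
    for url, revenue in rows:
        kind = page_type(url)
        if kind is not None:
            revenue_by_type[kind] += revenue
    return {kind: 100.0 * value / total for kind, value in revenue_by_type.items()}


if __name__ == "__main__":
    # Made-up rows purely to show the shape of the input.
    sample_rows = [
        ("/list.php?cat=5&page=1", 900.0),
        ("/list.php?cat=5&page=3", 50.0),
        ("/list.php?search=shoes&page=2", 50.0),
    ]
    print(deep_page_sales_share(sample_rows))
```

If the shares come out as small as they did for us, that supports treating the deep pages as low-risk candidates for NOINDEX.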

lfseo edited 2011-05-11T21:31:35-07:00
1 0

- lfseo
 
 2011-05-11T22:00:07-07:00
 
 UPDATE: I just drilled down to filter only Google search traffic and discovered that only 0.02% of our annual sales came from landing pages that were page 2+ search results. In our case, this was just one sale.
 
 Since only a negligible amount of sales are coming from these pages, it could mean that Google may not even have most of them indexed. Even if they are indexed, it certainly wouldn't hurt to add a "NOINDEX" tag, since they don't do much for sales and might be creating a duplicate content penalty.
 
 1 0
 
G Chandrashekar Reddy

2011-05-07T09:28:23-07:00

Nice article on duplicate content, both internal and external to a website.

1 0

Drazen_d

2011-05-07T00:38:06-07:00

Hi,

After reading your article, it occurred to me that pages such as monthly archives and category archives (which are created as standard in one way or another by all publishing platforms I can think of) may be viewed as spun content by Google.

We use both monthly archives such as this: https://www.masternewmedia.org/2011/02/ and category archives like this: https://www.masternewmedia.org/search_tools_and_technologies.htm

What you discussed under point #3 Near Duplicates (Internal) means that these pages we have had for years are now seen as partial copies of the articles.

We already noindexed monthly archives and are about to do the same on the category archives. Is there anything else you would suggest doing on such pages, or would you do something different?

Also, if you look at any article, we have a "related articles" section at the bottom of each article. That section (yet again) has excerpts from other related articles. If we show those via JavaScript, do you think Google may be happier that way?

1 0

- Dr. Peter J. Meyers
 
 2011-05-09T08:58:00-07:00
 
 I wouldn't worry too much about the excerpts (especially if they're short) - this isn't an all-or-none-thing, and excerpts like that have clear intent and some value. Meanwhile, hiding content with JS could prove more risky.
 
 Category pages (which are essentially searches) are tougher, since some of them may rank. I usually start with the lowest value pages - like sorts and filters - and work up. Tackle pagination yet. I also think your aggressiveness in culling near-duplicates has to be weighed against risk. If you've taken a hit, it's worth being more aggressive. If you're trying to preempt future problems, then ease into it. There's a balance, and it's not easy to find.
 
 2 0
 
Zarko Zivkovic

2011-05-17T06:41:26-07:00

I think we are slowly beginning to see some websites that are on the road to recovery from Panda. Maybe not the ones that were hit the hardest on the global level, but as far as I can tell, most of them took some drastic measures to improve the quality of their content and to remove duplicate content issues. The only thing bugging me is that most non-English websites have yet to see the light of Panda, as they have not been affected yet; the question is whether they will be at all. But for smart users this gives a time window to sort out all the issues before Panda comes to their homes :)

1 0

Francisco Javier Sanz

2011-05-06T08:27:00-07:00

Hi everyone!

I was very busy the last month, and now that I'm back, I've found the new "thin content" expression. I searched, but I couldn't find a good definition of it. It may be a language issue (I'm from Patagonia, Argentina). Could somebody tell me what it means?

Thank you all! Have a great weekend!

Fran.

1 0

- Dr. Peter J. Meyers
 
 2011-05-06T08:30:22-07:00
 
 It really just means that the web pages in question are light on original content. Google talks a lot about "thin affiliates", for example. These are sites that repurpose other people's products for sale (with affiliate links) but then add essentially nothing of value to them. They buy a keyword-loaded domain, slap up some pages, and wait for the money to come in. Meanwhile, the content on those pages is on 100 other sites across the web. So, while the pages may be long or short, the original content is "thin".
 
 Dr-Pete edited 2011-05-06T08:30:54-07:00
 1 0
 
 - Francisco Javier Sanz
 
 2011-05-06T08:54:06-07:00
 
 Dr. Pete: thanks a lot! Now I understand!
 
 So, in a few words: a website with "thin content" is one that has poor or little original content. Right?
 
 Thanks again!
 
 1 0
 
algogmbh_petra

2011-05-05T21:53:22-07:00

Really good takeaways for beating duplicate content issues - not only regarding the Panda update, which hasn't hit us (in my country) yet. With that solid guide, counteracting it should be possible!

1 0

rahul_b

2011-09-02T04:37:21-07:00

Liked the term "thin content". Brilliant post I must say.

We've been scratching our heads trying to figure out what to do after taking the pounding from the G-Panda.

Ours is an e-commerce website with about 4.2 million pages (unique URLs). All the content is from publishers since we sell books.

I fully agree with your solution of padding up the content....

"The red text is the same, but here I’ve supplemented it with 2 unique bits of copy: (1) a brief editorial description, and (2) user reviews. Even a unique 1-2 sentence lead-off editorial that’s unique to your site can make a difference, and UGC is free (although it does take time to build)."

.... but we are really struggling to see how to do it with the number of pages we have.

There must be at least 400 websites using the same content, from the publisher to Amazon to people like us.

Is there nothing that people like me can do?

1 0

- Marcus Miller
 
 2011-09-12T04:12:46-07:00
 
 Hey Rahul
 
 This is exactly the problem: where your site brings nothing new to the index, you will always lose out to the big authority sites like Amazon. I am working with a client now who has a similar problem, although not on the scale you are talking about here.
 
 Really, you can look at this as a problem or an opportunity. Whilst rewriting all that content is going to take forever, if you are the only vendor online with original descriptions for these products, you may be able to leapfrog lots of other, more authoritative sites, because unique content makes for better search results.
 
 4.2 million products is a hell of a rewrite, though, so best of luck with that!
 
 Marcus
 
 1 0
 
clubd20

2013-11-27T19:31:09-08:00

Thank you very much for the helpful info. I am running into the thin pages problem with one of my websites.

1 0

Janet Nevins

2015-03-17T06:48:44-07:00

Great article and it seems Google is now going to put a stop to the duplicate pages in local.

1 0

Nihal E Abdulla

2015-08-18T06:39:53-07:00

Google sent my site an email saying: "Thin content with little or no added value. This site appears to contain a significant percentage of low-quality or shallow pages which do not provide users with much added value (such as thin affiliate pages, cookie-cutter sites, doorway pages, automatically generated content, or copied content)"

I've posted a few articles in how-to/tips format and sent a review request to Google, but after 7 days they still have not indexed my new posts.

So, should I deindex my site and build a new site?

1 0

Wil Martindale

2015-08-19T07:00:55-07:00

Very good article, THANKS!!

Bottom line ... avoid low quality pages in the first place. But "search within search" is an interesting factor (and possible dilemma) to consider. Something you don't really think about until you get into something like a large e-commerce site.

1 0

CommercePundit

2013-06-14T23:41:19-07:00

I am working on an e-commerce website where we have added 300+ pages to target different local cities in the USA. We have added quite different paragraphs on 100+ pages to remove the internal duplication issue and save our website from a Panda penalty.

You can visit the following pages to know more about it. We have added unique paragraphs on a few pages, but I have big concerns about the other elements on the page, like the Banner Gallery, Front Banner, Tool and a few other attributes, which are common to every page apart from the 4 to 5 sentence paragraph.

I compiled an XML sitemap with all the local pages and submitted it to Google Webmaster Tools on 1st June 2013. But I can see only 1 indexed page in Google Webmaster Tools.

https://www.bannerbuzz.com/local

https://www.bannerbuzz.com/local/US/Alabama/Vinyl-Banners

https://www.bannerbuzz.com/local/MO/Kansas-City/Vinyl-Banners

and so on...

Can anyone suggest the best solution for it?
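
One rough way to gauge how much of each local page is shared template versus unique copy is to compare word shingles between the visible text of two pages. The following is only a minimal sketch under stated assumptions: the function names `shingles` and `jaccard` are hypothetical, the page text is assumed to be already extracted as plain strings, and nothing here claims to reflect how Google actually measures duplication.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(page_a, page_b, size=3):
    """Jaccard similarity of two pages' shingle sets (0.0 = disjoint, 1.0 = identical)."""
    set_a, set_b = shingles(page_a, size), shingles(page_b, size)
    if not set_a and not set_b:
        # Two texts too short to form any shingle are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

A score close to 1.0 between two city pages would suggest that the unique paragraph is being drowned out by the shared elements; what counts as "too similar" is a judgment call, not a known threshold.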

1 0

Brian Reynolds

2012-04-04T20:10:16-07:00

Great post on how to handle different forms of duplicate content. Good to know what is worse as well...good categorization.

1 0

Develop41

2011-08-21T13:39:47-07:00

The most informative article I've read on the topic. Thanks.

1 0

Cory Haldeman

2011-05-05T13:50:21-07:00

Great Post. From what I can tell the biggest impact of Panda is keeping SEOs honest. A lot of the issues that Panda has raised relate to us as SEOs not properly taking the time to diagnose and address duplicate content.

1 0

Alain Carpentier

2011-11-04T16:52:18-07:00

Nice article. I personally have a lot of problems with duplicate content on some of my websites that generate lots of pages for categories, etc. This gives me some ideas for fixes.

1 0

fchristant

2011-08-17T13:39:10-07:00

Great article and it largely covers how to avoid and/or recover from Panda. Still, I do want to say that Panda has quite a few false positives:

- Maybe I have a very simple lookup question as a query. The first result answers it and I immediately hit back. That does not mean it is a terrible site.

- Some types of sites are naturally thin on content. A photo site for example. Users may totally love the site but Google does not

- Many sites rely on user-generated content. This can lead to some pages being of low quality whilst the site as a whole is great. Still, the site as a whole is punished

- There have been reports of people with original content being punished whilst a content thief is promoted. Google still has problems determining an original source.

- Google is also not able to determine the intent of an affiliate link. It could be very useful to the user

In other words: Panda is unable to understand context in all situations and thereby also makes a lot of victims.

1 0

- Dr. Peter J. Meyers
 
 2011-08-17T14:30:34-07:00
 
 I strongly believe this is part of why Panda is not just one ranking signal, but a combination of inputs being fed into a machine-learning system. What you end up with is a sort of "PandaRank", and that becomes just one more ranking factor in the 200+. So, in other words, you have 200+ ranking factors, with Panda being one of them, and even Panda represents dozens of signals.
 
 As you said, any given signal, especially user signals (like bounce rate), could produce false positives in some situations. Honestly, that's true of many ranking factors. That's why search engines look at so many factors and combine them in such complex ways. Regardless, though, there are always situations where they get it wrong. It's certainly no coincidence that we're on Panda 2.3 at this point.
 
 1 0
 
 - fchristant
 
 2011-08-19T01:38:04-07:00
 
 Agreed. The way Panda differs though is that in a way it is binary. You are punished or not. Where all other individual parameters may have an incremental effect on your ranking, Panda may sweep away almost all traffic in one go if you trigger it via any unknown combination of parameters.
 
 1 0
 
Blake Waddill

2011-05-05T10:08:32-07:00

I’m curious how directories affect duplicate content, specifically address and descriptions. Are unique descriptions necessary for each individual directory? Can the description be a snippet from the website, or does it need to be unique?

1 0

Flapjack

2011-05-05T02:38:01-07:00

great article pete. i've bookmarked it for an indepth read over a cup of coffee for later on.

I have three ads per page all above the fold in their respective sizes (160,300,728).

One of my competitors that now lands on page 1/2 for most of the time has 8 ads and then has a click here to view the actual content that leads to another 12 ads on the actual page. 95% of the ads are adsense. I have no adsense ads on mine. And my pages that used to be on 1/2 are now on p13/17. I have seen some sites at the forefront with roughly more or the same as my ads but with less unique content.

So either ads are not that much of a concern to google or they are giving sites showing their ads preferential treatment.

I have tons of duplicate content on the web 100% of which are from fruitcakes copying my meta descriptions or pages.

One thing I've never done is rss feeds or social facebook, myspace etc. The new sites I'm seeing at the front appear to be more socially intertwined (facebook shares, bookmarks etc).

Some of the sites I am now seeing daily (for new content) are at page 1/2, are high traffic sites but pre panda used to rank on pages 3-5 so the social interaction must carry a lot of weight.

I have a feeling that once legitimate sites fix their crawl errors, duplicate content, dmca's etc then they'll begin to rise. I'm slowly beginning to see a slight increase with site fixes that I'm continuing to do now.

thanks

Flapjack edited 2011-05-05T03:24:35-07:00
1 0

- Dr. Peter J. Meyers
 
 2011-05-05T07:47:51-07:00
 
 Unfortunately, the rules don't get applied evenly, and you can find an exception for any SEO best practice. Google has been vague about ad guidelines, and for obvious reasons - they make a fortune from Adsense. Still, they've been pretty clear about frowning on abuse, since it harms the buyer side.
 
 How does this translate into organic SEO? That's a lot tougher. We saw some Panda cases where ad ratio seemed to play in, but it's not clear, and there were sites that seemed to be affected by Panda that didn't have this issue. There were also, as you said, sites with tons of ads that don't seem to be affected at all. We're getting to the point where these algo changes aren't just based on one single variable or an IF statement. They're getting more and more sophisticated, and even Google can't always predict the results when they roll complex changes out.
 
 2 0
 
Steve Morgan

2011-05-05T04:40:08-07:00

This post couldn't have come at a better time. Earlier this morning, I was researching why my friend's site in the UK (a comparison site) had been affected by Panda, after its traffic had dropped massively from mid-April onwards.

Your comments on the "Search within Search" issue got me thinking and I soon discovered and realised that some of their search results were getting indexed as well as the main category landing pages. We're going to work on noindexing them ASAP and hopefully it'll resolve the issue.

Thank you very much, Dr. Pete, you're a lifesaver!

1 0

Aaron Dicks

2011-05-05T05:42:22-07:00

I think the suggestion of noindexing the search within search pages is the most valuable. I have a category page for manufacturers - one that will actually be of use to my users, so I have added lots of unique content around the listings to make it unique and 'fat'. The majority of my category pages and practically all of the categories on my clients' sites have been noindexed since long before Panda.

Great post title!

Regards

Aaron

1 0

Gareth Cartman

2011-05-05T01:50:38-07:00

Thanks for this excellent post. Has really cleared a few things up.

On the Adsense side of things I always found I got paid more per click when I was running one ad as opposed to bombarding the page with ads.

1 0

Martin Preisler

2011-05-05T01:07:02-07:00

Just in time, as we are about to launch a geo based scenario. Thank you.

1 0

entropytc

2011-05-04T22:12:36-07:00

This post is really relevant for those using affiliate datafeeds where the same description field could be used on hundreds of different sites.

Thanks Dr. Pete

1 0

Anirban Das

2011-05-04T22:15:05-07:00

Good notes on Panda, thanks for that. What I am seeing: if you are an original content creator and your domain has less authority, and a big scraper website scrapes your content before Google indexes the original, then Google, after seeing your content, identifies "you" as the scraper. That is the bad part of Panda, and there is no solution so far.

Another point: for sites that republish RSS feeds from major websites/blogs in their industry verticals, G-Panda is actually giving preference to those websites, and big professional writing websites get outranked in the SERPs. Incredible, huh?

Another bad part is that Google has Pandalized authoritative websites that share content with their partners; even a site like Digital Trends and some medical research websites got affected.

Third, if someone rewrites your 5-year-old articles, G-Panda outranks your five-year-old content with the rewritten or stolen copy. Filing a DMCA complaint? That's possible for those maintaining 1000+ pages, but it's near impossible for those with 100k+ pages. To some extent, Panda is giving the content originator and the scraped-content provider similar treatment...

Anirban_Das edited 2011-05-04T22:22:05-07:00
1 0

- Dr. Peter J. Meyers
 
 2011-05-05T07:41:38-07:00
 
 I wish we had good answers for people who were unfairly affected by Panda (such as being treated like scrapers when they were the content source), but we really don't yet. You can appeal to Google, you can build authority, and you can push the legal side, but any of those may be ineffective or may require a large investment of time and money.
 
 1 0
 
 - Anirban Das
 
 2011-05-05T19:42:43-07:00
 
 Yes, indeed, though we are still digging into Panda's content classifier and ad placement policy besides the scraping issues. We also hope that once Google Panda settles down after some slicing and dicing, it will again identify the originator.
 
 1 0
 
 - Anirban Das
 
 2011-05-06T23:36:46-07:00
 
 Google's Amit Singhal came out with a few points on the Google Webmaster Central blog: More guidance on building high-quality sites (https://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html). Let's get into the points (a few of these steps have already been taken by expert webmasters!)
 
 Anirban_Das edited 2011-05-06T23:37:19-07:00
 1 0
 
Francisco Meza

2011-05-04T23:27:30-07:00

I like: 7. Search within Search

I am an ecommerce web developer and this is by far my biggest problem. I'm still not 100% sure that noindexing those duplicate pages is the best thing to do. Instead, buying plugins may be the way to go for us ecommerce web developers.

1 0

- Steve Morgan
 
 2011-05-05T04:48:07-07:00
 
 In my friend's site's instance (see my comment below), deeper search pages are getting indexed, which I'm happy to noindex because they offer little to no value to people visiting from the search engines. We would rather the main category landing page (which is the start of the search results) be what they land on anyway.
 
 That's what you have to think about at the end of the day:
 - Will removing these pages from Google's index affect people coming onto the site?
 - On the contrary, will it be better (they come through a main landing page rather than an odd search result)?
 If it gets rid of thin/duplicate content issues while also improving customer experience then it's a double-win, but like you say, if you're worried it might kill traffic then checking Analytics for the offending pages should help to do the trick.
 
 steviephil edited 2011-05-05T04:49:06-07:00
 1 0
- Dr. Peter J. Meyers
 
 2011-05-05T07:44:26-07:00
 
 Search within search is tough, and blocking search pagination isn't the right solution for all sites. In general, though, I do think that you should focus on your top level search pages. Landing someone on Page 17 of results for subtopic Q isn't high-value, and those visitors are probably going to bounce. Usually, it makes more sense to focus your index and your internal link-juice.
 
 As for plug-ins, too many of them copy content across pages (internally) or even across sites. There are exceptions, but I find that many of them don't add valuable content. In addition, if they're Javascript-based or use other dynamic technology, they may not even be crawled as content.
 
 2 0
 
donthe

2011-05-05T06:13:54-07:00

Great post. Any post about Panda these days is comforting. It shows you haven't forgotten those slapped by the evil Panda :-(

I lost 30% of my Google traffic from Panda 2.0 in April and am losing more every day (after gaining 25% from Panda 1.0).

70% of my traffic (40k/day) lands on my category pages which contain a unique 100 word introduction and 10 listings per page. The listings are like a search result page, the item title, description and link.

How and why do I make these category pages unique? My visitors don't want to read a 300 word introduction. They are there for the listings. I can't noindex the category pages because they are the landing pages for 70% of my traffic.

Any ideas or tips would be most appreciated!

1 0

Fladem

2011-05-05T06:28:02-07:00

Thank you for this post.

I'm a PPC manager and I've seen a global (15 countries) decrease in conversions in April.

I suspect Panda is guilty. Why? Because beyond Google searches, Google has "search partners", the internal search on websites. It isn't a "content campaign". So, if these sites were banned from Google organic, perhaps that explains the decrease in conversions for almost all campaigns.

Any Opinion?

1 0

- Dr. Peter J. Meyers
 
 2011-05-05T07:51:36-07:00
 
 Even on the organic side, Google makes something on the order of a change EVERY day, and only a handful get names or press. Tie in the PPC side, and it's really tough to pin down whether any given algo change impacted someone.
 
 It's possible that Panda impacted the content network dramatically, although I haven't seen direct evidence of that. On the Adwords side, though, you could see this by looking at placements. Were there dramatic shifts in which sites showed your ads? Maybe some of the high-converting sites went away, dropping your average conversion rate.
 
 I had a similar thing happen to a client a year or two ago. They have an ambiguous name that's shared with other companies and entities, but they advertise a lot, so it tends to be cost-effective to bid on the broad term. One content network site was generating solid conversions, and then we saw a dip. We later realized that that site was originally a parked domain but then got bought out and became a dating site (no relevance to my client or their business), so just that ONE site caused a decent-sized drop in conversion rate.
 
 1 0
 
Kasy Allen

2011-05-05T09:36:35-07:00

Thanks for the post Dr. Pete. I work with a lot of project managers that ask about duplicate content issues all the time, and I've been putting off a manual like this for far too long.

I think you did an excellent job of explaining situations that we face with our clients on a daily basis, and this is a perfect way for our employees and clients to understand exactly how important unique content really is.

Thanks again!

1 0

Jeremy Nelson

2011-05-05T10:15:24-07:00

One issue that I am quite close to is that of additional attributes via faceted navigation. The ability to add Brand + refinement + refinement is very useful, not only from an on-site perspective but also for capturing the long tail. Unfortunately, there seems to be no good way to go about this, since many of the pages have the same products even if they are refined with additional attributes.

I hesitate to recommend noindexing those kinds of pages. On a page by page basis, they add little to traffic stats, but cumulatively speaking it would be deflecting a significant portion of traffic.

I suppose I am left with choosing the important attributes on a category-by-category basis, and applying "noindex, follow" to the pages with less important attributes (such as price).

1 0

- Dr. Peter J. Meyers
 
 2011-05-05T11:50:53-07:00
 
 Internal search is a lot tougher than URL-based duplication, where canonicalization is kind of a no-brainer. I think you have to look at your data. If those deep search pages have traffic, that's important. If they have links, then you have to know that, too (and you may choose to canonicalize instead of noindexing).
 
 I also think you have to separate facets from sorts and display options. For example, ascending/descending results or Show 10 vs. 50 vs. 100 per page are pretty useless for crawlers (while clearly valuable for visitors). Start with those low-value variations. Then, you may want to tackle paginated results. Then, you can look at deep subcategories and see if they have value. It's ok to do this in stages, and it's probably smart if your site is doing well.
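 To make the "start with low-value variations" step concrete, here is a minimal sketch of collapsing sort and display variants onto a single canonical URL. The parameter names (`sort`, `order`, `per_page`, `view`) and the example.com URLs are hypothetical; swap in whatever your own platform actually uses. Facets such as brand are deliberately left alone, since those are the ones that may deserve their own pages.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical sort/display parameter names; replace with your site's own.
LOW_VALUE_PARAMS = {"sort", "order", "per_page", "view"}


def canonical_url(url: str) -> str:
    """Drop sort/display parameters and order the rest so variants collapse."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in LOW_VALUE_PARAMS
    ]
    kept.sort()  # stable ordering, so ?a=1&b=2 and ?b=2&a=1 match
    # The fragment is dropped because it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url("https://example.com/shoes?sort=asc&brand=acme&per_page=50"))
```

 The result could feed a rel="canonical" tag on each variant page. Whether to canonicalize or noindex a given facet still depends on the traffic and link data mentioned above.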
 
 1 0
 
karen kouf

2011-05-05T10:24:25-07:00

Great, great post and perfect timing. I'll pass this along to a client who copied product info from a supplier because they were busy. Now I have the experts on my side! Thanks.

1 0

Alan Bleiweiss

2011-05-05T11:06:26-07:00

Dr. Pete,

You've definitely hit the critical "thin content" points. While there are other Panda factors, site owners should definitely pay attention to and address any / all of these you've covered that they might have on their sites. Really good work on this article.

1 0

- Dr. Peter J. Meyers
 
 2011-05-05T11:52:05-07:00
 
 Thanks, Alan. I know this is your specialty and you deal with these problems on some big sites, so I appreciate the vote of confidence.
 
 1 0
 
Dan-Petrovic

2011-05-05T07:21:43-07:00

Is it true that Google looks at the size of the ads in addition to placement location and numbers?

1 0

- Dr. Peter J. Meyers
 
 2011-05-05T07:53:54-07:00
 
 I don't have clear data on that, but I strongly suspect that they do. They clearly have the technology, from the PPC side, and it's become pretty evident that Google can visually parse a page. We're seeing more and more that they can tell headers from navigation from ads from footers, etc. There's almost no way to do that without rendering the HTML. That's also why some tricks, like moving content around in source code with CSS, seem to be very low-impact these days. I think Google has a pretty good sense of what a page looks like.
 
 1 0
 
Momchil Petrushkov

2011-05-05T07:07:01-07:00

Well this is just the next amazing article... Thank you ...

1 0

Nick Stamoulis

2011-05-05T07:04:26-07:00

This article does a great job of explaining duplicate content in all its forms, which is something that a lot of site owners don't fully grasp. There are varying degrees of "duplicate," and each of them can negatively impact your site's performance in Google. Thanks for not only pointing them out, but also offering suggestions on how to fix it.

1 0

John Doherty

2011-05-05T13:02:51-07:00

Dr Pete -

I have to say, your posts are always right up my alley. I love a good technical SEO post.

I think it's important to point out that Webmaster Tools offers many different ways to tell the search engines what to index and what to avoid. I am not exactly sure how Bing does it (do they use robots.txt when you "disallow" a certain type of URL, like "a /?tag/title"?), but there are many different ways to direct the search engines.

I just discovered this ability in GWT the other day as well. If you log in, then go to Site Configuration -> Settings -> Parameter Handling, there will be a list of parameters generated from your site, and you can tell Google how to handle each of them! Genius!

I'd love to see someone write a full post about Google and Bing's different ways of disallowing content.

Great post!

1 0

Harriet Yoder

2011-05-05T06:55:29-07:00

Thanks for a timely and helpful article.

Think Thin, Ugh.

I am recombining pages that I split because long pages were bad. I used 301 redirects, updated the sitemaps, and I think Google likes it. Adding unique content as I check pages is helping, too.

Why? Our sales and page views are back up to more normal levels for this time of year. [Were we affected by Panda or is the economy finally hitting our niche? I think a bit of both.]

I think what Google wants isn't necessarily thin, but lean, mean, and muscular, which can be bulky. It makes sense from Google's point of view. Why should they spend money, time, and effort spidering mass quantities of junk, and repetitious junk at that?

1 0

Joachim Andersson

2011-05-05T04:01:48-07:00

Great article!

I have experienced a lot of change lately in the e-commerce solutions I work on, and the only reasonable answer I've been able to find is that the content is being treated as duplicate content because products may be found in several categories. Local SEO experts here in Sweden argue that the Google Farmer/Panda update will not treat internal duplicate information as duplicate information, but I am sure that it does. Your article is actually the first one I've found that says the same. Internal duplicate content is bad, and the Farmer/Panda update will make it an issue.

Again, thanks!

Best regards,

Joachim Andersson

https://i.googlify.se (English) | https://www.bluebirdsolutions.se (Swedish)

1 2

- David Defoe
 
 2011-05-05T06:43:31-07:00
 
 Running a few ecommerce sites, I have seen product pages decrease in Google after the Panda update. The pages would fall under the above-mentioned "Near Duplicates (Internal)" format. I've often wondered how to get around this or if Google will tweak the algorithm for ecommerce sites. The reality is that most of the pages are the same. In the automotive industry each product is considered different and has a unique stock number, so every item is listed. I may have 5 items that are identical and look like duplicate content with the exception of the stock number and VIN.
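 For a rough feel of how close such listings look to a machine, here is a small sketch that scores two texts by the overlap of their three-word shingles (Jaccard similarity). This is only an illustration of one common near-duplicate measure, not a claim about how Google actually scores pages, and the sample listing text is made up.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Made-up listings that differ only in the stock number.
    first = "2010 Honda Civic LX sedan low miles stock 1001"
    second = "2010 Honda Civic LX sedan low miles stock 2002"
    print(jaccard(first, second))
```

 On real product pages, where the shared template and description run to hundreds of words, a single differing stock number or VIN would barely move the score, which is why pages like these can read as near duplicates.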
 
 2 0
 
