This post has been a long time coming - there's a definite need to address internal duplicate content issues in an intelligent, simple, illustrated manner. I'm hoping that you'll be able to show this to the boss (or, more likely, the engineering team) and help them achieve that critical "AHA!" moment required for all true progressive changes.
Fundamentally at issue is the creation of multiple pages on our own sites that host the same pieces of content. There may be very good reasons to do this - print-friendly pages (although sophisticated CSS can fill this gap), pages that have different formatting or navigation, versions of a page that may be appropriate for multiple areas of the site, or the ever-ubiquitous problem of paginated content. Today, I'll be using the most common example - one from the world of blogging.
First off, your typical blog has two big duplicate content issues:
- Duplicate content from every blog post appears on the blog's main page
- Duplicate content across paginated blog index pages that get indexed
Here's issue #1 in visual format:
You can clearly see the copies of every word, sentence and paragraph from the blog home page appearing in the individual posts, creating a natural, but troublesome duplication issue. Which content should the search engines rank? The blog home page probably has more link juice and PageRank overall, but the individual post page is more targeted. The positive part of it is, at least if you're a frequent blogger, that front page content moves down (and eventually, off) the page relatively quickly... but what if you have pagination?
That leads to issue #2:
You said it, Huffy Googlebot! This blog is in worlds of trouble with all the multiple copies of content they're showing you. It gets even worse if Google isn't regularly visiting every paginated page on your blog, because then the copies can multiply into even more versions than you really have. For example, if Googlebot visited 3 days and hit a paginated page each day, they could easily have 4 copies of a blog post in the index - one on each of the paginated pages, and the blog post page itself.
Now the good news - my opinion is that Google, Yahoo! & MSN have all seen this pattern of duplication with blogs so many times that they've probably found relatively good workarounds for it. However, I will say that when SEOmoz switched from having this problem (prior to our February '07 re-launch) to solving it, our traffic from search for old blog posts rose about 25%.
However, not all internal duplicate content issues live in blog structures. You can find similar issues on sites of all shapes and sizes. On many news sites, for example, printer-friendly pages are popular. On forums, there might be several copies of a post accessible through crawlable links. A good number of e-commerce sites have exactly the same product page in different categories, producing different URLs (this is sometimes the worst nightmare of all).
So, how do you fix it? Simple (well, OK, not totally simple):
The illustration offers two very good solutions, but you have to know when to use each. I advise the meta noindex tag when you're launching a site and can put it in right from the get-go. For older sites that may have lots of internal and external links pointing to the various versions of content, a 301-redirect is the way to go.
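For the record, here's roughly what each fix looks like in practice (the paths below are made-up placeholders, not anyone's real URLs). The noindex option is a single tag in the head of the duplicate version of the page, and the 301 is a one-liner in an Apache .htaccess file - assuming your server is Apache with mod_alias available:

<meta name="robots" content="noindex, follow" />

# .htaccess - send a duplicate (e.g. print-friendly) URL to the canonical version
Redirect 301 /print/some-post.html /some-post/

The "follow" in the meta tag lets the engines keep passing link value through the duplicate page even though it stays out of the index.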
Some SEOs even advise the use of cloaking - and in this case, it's hard to argue that it's unethical or even against the spirit of the engines. The concept is to use a 301-redirect, served conditionally to the search engines, that forwards all link love to the original version and establishes it as the canonical source. Human visitors, meanwhile, can still see the content in print-friendly format (or whatever unique way it's being presented). I believe there's almost always a better workaround than content delivery, but in those rare cases...
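For the curious, a conditional redirect like that is usually implemented with a user-agent check in mod_rewrite. The sketch below is purely illustrative - the bot names and the /print/ path are assumptions, and as noted, think carefully before deploying anything user-agent based:

# .htaccess (Apache mod_rewrite) - illustrative sketch only
RewriteEngine On
# only the listed crawlers get 301'd to the canonical page; human visitors still see the print version
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
RewriteRule ^print/(.*)$ /$1 [R=301,L]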
Hopefully, this post will help to alleviate some of your duplicate content concerns and make it just a touch easier to convince your team of how important these modifications are. Don't forget, if you need a basic primer on duplicate content, you can check out my illustrated post on the subject from back in March.
Rand, I hate to see you miss such a great opportunity to link to me :P
My fix for blog / wordpress pages going supplemental :)
Thanks for that Joost, very handy.
I also found this video yesterday whilst looking for the exact same fix for duplicate content!
https://www.wolf-howl.com/video/make-wordpress-search-engine-friendly/
Good link Joost!
Thanks for the link. Thumbs up!
There's another SEOmoz tool idea in there somewhere - a duplicate content detector. I would build one, but it sounds hard, so I thought I'd just suggest it ;)
It is easy to find some examples of dup content, but a thorough site review can be hard - especially for large or complex sites. It's not always obvious the number of different ways you can reach content...
And a question: Do you believe there is any duplicate content issue with very small canonicalisation stuff, e.g.: www.example.com/directory, www.example.com/directory/ and www.example.com/directory/index.html all providing the same content? Seems to me that it would be nice to fix this, but I'm sure the engines are on top of this one. Anyone got any thoughts on that?
Well www.example.com/directory will 301 redirect to www.example.com/directory/ by default so there isn't much issue there. With the /index.html thrown into the mix though I don't think that there is much to worry about getting hit with dup content, but I would say it is diluting the strength of the intended URL though. Plus, this gives users the opportunity to naturally link to either page that they want, which can further dilute the page(s).
www.example.com/directory will only redirect to www.example.com/directory/ if there is an actual directory there.
Some content management systems create things that look like directories that are not actual directories and that can be accessed with or without the trailing slash. They are different URLs.
A slash in Unix is the symbol for a directory (as a backslash is in Windows). For example, the correct URL for a home page is https://example.com/ -- the trailing slash indicates that you are requesting the root directory of example.com. If you forget the trailing slash, the server will add it.
On Apache you can use .htaccess to enforce the inclusion of a slash or removal of the slash.
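For example, a rough .htaccess sketch (Apache mod_rewrite assumed - adapt the patterns to your own URL structure) that collapses /index.html variants and adds a missing trailing slash might look like this:

# .htaccess (Apache mod_rewrite) - sketch only
RewriteEngine On
# collapse /directory/index.html onto /directory/
RewriteRule ^(.*)index\.html$ /$1 [R=301,L]
# add a trailing slash to extensionless URLs that aren't real files
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !\.[a-zA-Z0-9]+$
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]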
I actually have been seeing issues like this a lot with my clients recently. I'm not sure how the SEs deal with it, but I have seen some funky things - like www.example.com/home.html having 0/10 PR but the same page typed in as www.example.com/Home.html having 6/10 PR. I'm not sure if Google sees the two pages as separate or if PR is just case sensitive; either way it seems kind of wacky. Anyone else seen anything like this, and if so, any fixes? Does it even matter?
Technically, those are or at least can be two different pages. On a *nix based server home.html and Home.html could be different content, so could have two different PR ratings. IIS servers don't make a distinction between these.
Letter case in URLs definitely matters with Google. /Home.html is a different URL than /home.html, though as you mention, IIS (Windows) doesn't care. If the site were on a *nix server, one or the other would return a 404 error.
For example:
https://www.google.com/index.html is correct
https://www.google.com/Index.html is a 404 error
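If you do find wrong-case URLs like that already indexed or linked, one low-tech fix on Apache is to 301 each known variant to its canonical form (the paths here are just the example from this thread):

# .htaccess (mod_alias) - Redirect matches the URL path case-sensitively
Redirect 301 /Home.html /home.html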
Here is an often overlooked tip for reducing blog duplicate content.
1. Add category descriptions, headers and other unique information to the category pages and the main (or most powerful) paginated pages.
This is easily accomplished by adding extra templates to your blog's theme (like custom category templates which have a category description etc).
I'll do a follow up blog post and give away some more tips. Nice post!
Very true Solomon - on my blog I've got unique titles and such for pages as well. It's not that easy to do in WordPress, though; you'll have to invest some time :)
Solomon, please don't forget to follow up. I for one am very interested in this topic. :) Thanks.
What I do with blogs to avoid dup. content.
1. Truncate the post to 30-40 words.
2. "Conitinue reading here" - make sure to change the anchor text into something more appropriate (like the post title).
3. Category pages or archive pages? I dont use archive pages. They list the content in it's entirety. I use category pages with message excepts.
From a usability standpoint, having smaller posts reduces scrolls and lets people navigate to the content they want, more quickly.
mytwocents
I agree this is a huge ongoing issue, and while the engines are potentially working with the major blog/CMS platforms to address default setups and the duplicate content they create out of the box, there are still a lot of potential abuse issues for lesser-supported sites and plugins. The practice of tagging and archiving has only increased the issue.
I recall reading a post recently about an end-all-be-all of .htaccess and/or robots.txt files for Wordpress. This supposedly handled categories, archives, feeds, tags, comments, index pages, etc. However, I just spent a half-hour looking for said post and can't find it for the life of me. If I do find it in the near future, I'll post it up.
This would be really helpful... please post it if you find it Roadies.
Thanks.
You owe me a thumbs up. :D
Creating the ultimate wordpress robotstxt file
I would add two items to Rand's article: one to the problem and one to the solution.
As willcritchlow comments: Same content accessed via multiple URLs is a potential issue as well (ie.:/index.html and /; www and non-www). This can be fixed via .htaccess's PermanentRedirect, Google webmaster central and/or some mod_rewrite lines.
In addition to the meta robots "noindex" tag, we can alternatively use the robots.txt disallow directive. This is especially useful for blocking robots' access to RSS feeds. As the feeds are in XML, we cannot use the meta robots tag there.
I wrote a detailed post about this in my blog. I include relevant examples of robots.txt and .htaccess files.
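To make those two additions concrete, here's a rough sketch - the hostname and the WordPress-style feed paths are examples only, not taken from any particular site:

# .htaccess (Apache mod_rewrite) - force the www version of the hostname
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

And in robots.txt, keeping crawlers out of the XML feeds (the wildcard line only works for engines that support pattern matching, like Google and Yahoo!):

User-agent: *
Disallow: /feed/
Disallow: /*/feed/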
Hamlet, nice article.
You bring up robots.txt and how Google supports the wildcard - it's a very true and simple piece of advice that many people can benefit from.
Thanks, Pat. I am glad you found the article useful.
I don't think that all robots should be blocked from the RSS feeds -- at least not the main one(s). I let Google get my main RSS feed and block the other feeds (in WordPress). I create alternate content in the main RSS feed by creating a custom excerpt for each post which becomes the content of the feed. That prevents duplicate content of the home page and it also creates different content on the category pages.
Between your post and this one I think I have dup content solutions well covered...
robots.txt files were developed to cover a lot of these issues, and I am glad they are starting to get the serious attention they deserve.
Now if someone were to create a site that creates and monitors a site's robots.txt file we would have a winner.
AussieWebmaster,
I think I have the idea for doing what you propose.
Such a tool would crawl a website, identify the duplicate content and create the robots.txt file to fix this.
I have the code for the crawler, and creating the robots.txt is not a big deal. I need some time to carefully research the best approach to detect duplicate content.
I will post the crawler to my blog next week.
Does Google offer the same robots no-content attribute as Yahoo? Have not asked about this yet.... though you have reminded me to do that Monday unless someone wants to jump in here.
They do.
Hmm sorry, that's for RSS feeds. They don't for content afaik.
Thanks for the quick response... let's see what we can do about that - there are enough influential people here to help it along.
Personally I don't like the thought of it, and would like to see it documented some more before diving into it... how links within blocks like that are handled, for instance...
I agree. The robots no-content attribute is not a good idea. The average Web site owner will not know how to use it correctly and SEOs will come up with all kinds of absurd explanations about where it should be applied.
A vast number of people still get rel=nofollow wrong (hint: it does not mean "do not follow", it means "do not vouch").
No-content is non-standardized, ambiguous code bloat and should not be used on Web sites. It is the job of the search engines to determine which sections of a page are part of the template and which parts are the content.
In the future microformats may be able to provide more information about sections of Web pages to search engines, but no-content is not a good solution.
Agreed... though all these sorts of tags are nice for our job security, I don't think they make the web a better place...
We should focus on making things simpler... Not harder.
G-Man wrote about Wordpress duplication about half a year ago.
Another problem (not usually in blogging software, but in custom CMSs) is when you split one article or category across multiple pages and link them like this:
- Original URL: example.com/article-name.htm
- Page 2: example.com/article-name/page2.htm
- And a link from page 2 (or page x) back to the first page: example.com/article-name/page1.htm
Notice that the original URL and page 1 now serve duplicate content! You need to avoid that as well.
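One simple belt-and-braces fix in that situation (using the hypothetical URLs from the example above, on an Apache server) is to 301 the page1 URL back onto the original:

# .htaccess (mod_alias)
Redirect 301 /article-name/page1.htm /article-name.htm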
Great point. Never thought to look for this!
Hi Rand!
Thanks for the informative post.
The latest chatter over at the G Webmaster Help forums seems to suggest that dupe content is not a cause of supplemental results. Personally I think that's not true, but regardless...
I've had a bash at making a plugin to fix these issues - I also have there a few quotes about the latest thoughts on the causes / solutions etc - but not quite as pretty as yours.
https://www.utheguru.com/seo_wordpress-wordpress-seo-plugin
I'd love your comments / suggestions about how to improve the plugin.
Ciao,
Matt
Okay - I have an idea as to how to deal with this and I'd like to know your opinion on it.
What if, instead of using the categories (which automatically create duplicate content), we simply didn't use categories at all?
Instead, we could use the text box widget to create our anchor text links with the category names in them.
Each time we write a post that would fall under that subject, we could go to that main page, manually write in a few sentences about the new post, and link from that main page to the new post.
To solve the front page, make it a page that we manually go to and update too - as if we were making a site and not a blog.
This technique could be used to build a site with wordpress- what do you think of it?
I think that in this way we could be certain of it not having duplicate content.
Thankfully, since this article was written, we have the rel=canonical solution which really helps address such headaches.
I tend to agree with Matt that dup content does impact your site.
I've been dealing with duplicate content on pagination of comments. The comments would get paged, but the text of the post would show up on the next page of comments (duplicate content). I came up with an SEO-friendly solution for dealing with hundreds of comments - it's a wordpress plugin called paginated comments.
I know this is not exactly on topic, but it's close enough that those of you struggling with too many comments may appreciate the information.
This and the supporting links make the article mandatory reading.
I would copy and paste it all into one big blog post but would get hammered for duplicate content :)
So maybe I will just link to the articles and show some link love.
Hmm, but what about those occasional links that happen to show up in your articles - do you noindex, nofollow... or noindex, follow your print-friendly pages?
Excellent post, I'm making this mandatory reading for our development team. I was totally wondering how search engines handle the multiple areas where content gets placed. I wonder if having a lot of comments helps, as they are usually not displayed on the home page?
Also, I've seen some blogs only post intros with "read more" links to the full post. Is this another workaround?
In a wordpress structure, if you are using excerpts for the archive pages, will that be enough to get rid of this duplicate content issue? While using excerpts on the category pages, I found some of those pages getting ranked for certain keyword combinations and getting listed in the SERPs, so I am not seeing a serious case for restricting the visibility of such pages.
If I go with redirection, which page is going to replace my old pages? I am not sure a 301 redirect is the solution here... "Noindex" is acceptable and seems to be the best solution.
I could be wrong though...
kichus: I agree with the first part of what you're saying: make them rank for combinations of terms in different blog posts. Just make sure these pages are unique :)
I myself have been thinking of a way to automatically include one keyword from each post in the title of that specific category or archive page... that would probably rule :)
Well, I was looking through this site's robots.txt and noticed you block page and category from being indexed. Just curious why both of these were blocked, and how you expect bots to find deep pages if you don't allow them to view categories or pages?
On a default Wordpress install you have archive and category pages, along with pagination on those pages. Then you have the pagination of the main site. How do you suggest organizing this for bots? Right now I am allowing categories to be indexed and blocking archives along with the paginated main-page files, e.g. blocking site.com/page/2/ but allowing site.com/category/name/ and site.com/category/name/page/2/.
I am trying to find a better way to do this as I am not happy with that setup.
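For comparison, the setup you're describing boils down to a robots.txt along these lines - the date-archive paths assume default year/month permalinks, so treat this as a sketch rather than a drop-in file:

User-agent: *
# block pagination of the main index (site.com/page/2/ etc.)
Disallow: /page/
# block date-based archives
Disallow: /2006/
Disallow: /2007/
# category pages and their pagination (/category/name/page/2/) stay crawlable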
So which did you ultimately end up using for your Wordpress setup? Did you block categories, tags or did you block the pagination links?
Duplication is like an onion, or a bloomin' onion (mmm) if you prefer your onions deep-fried...
...once you get through one layer, you find there is another one underneath - although an endless bloomin' onion would be a good thing, not so much with duplication issues.
Here is another wordpress duplicate content issue...
I have a bunch of pages indexed with referrer metadata... so the URLs look like these:
https://www.mysite.com/?referrer=www.othersite.com
https://www.mysite.com/index.php?referrer=www.othersite.com
And the pages show up with the same content as the index page.
I think you can do this for any wordpress page.. add a question mark after the URL for any page and put in some additional characters... instant duplicate content.
Now that I think about it, this could be used against a wordpress blog to screw with your rankings.
Post a few hundred links to your competitor's site like:
https://www.competitorsite.com/?your-rankings-will-belong-to-me
https://www.competitorsite.com/?have-fun-sorting-this-one-out
I hope someone has a fix for this because I’m vulnerable right now.
If the pages are indexed, make a list of them and then redirect those specific URLs to the correct location with 301 redirects. Then configure your site to send a 404 header when non-existent query strings are requested.
On a site where those URLs haven't been indexed, you could add to robots.txt:
User-agent: *
Disallow: /?
and/or:
Disallow: /*?
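If you'd rather 301 the junk URLs than just block them, a mod_rewrite sketch like this (the parameter name is taken from the examples above and is otherwise an assumption) strips the referrer query string and points everything back at the clean URL:

# .htaccess (Apache mod_rewrite)
RewriteEngine On
RewriteCond %{QUERY_STRING} ^referrer= [NC]
# the trailing ? drops the query string from the redirect target
RewriteRule ^(.*)$ /$1? [R=301,L]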
I have been dealing with a major dupe issue with a client. Their site is basically an aggregation of blogs. Members all have a blog page, and then at the top level there are several indexes of blog posts, e.g. most recent by category, most popular by category, most popular by category today, most popular by category all time, etc. And of course each directory has pagination. I tried noindexing and nofollowing a lot of these pages to create a direct path to the member blogs and main category pages, and if anything it stopped traffic growth dead in its tracks. I am removing them now but looking for some creative ideas as to how to deal with this stuff. Any ideas, Mozzers?
Awesome post. I'm trying to deal with duplicate content right now involving http and https.
Drig,
I don't think http and https will cause duplicate content problems. Search engine crawlers only follow http links, as far as I know.
I added an http to https redirection example to the .htaccess file in my post.
Actually, I didn't think SEs had an issue with crawling or indexing https.
Identity, you are right. I was not up to date on this.
Thanks for the heads up! I'm updating my article to correct this.
I had a client site that had a minor problem with some duplicate http/https pages. Google doesn't seem to have a problem indexing https pages if they're linked to. That doesn't necessarily mean they'll choose to show that one over an identical http version but it's something to consider.
Sending the same content over HTTP and HTTPS can create duplicate content. This often happens on sites that use relative URLs on internal links.
So if you are on https://example.com/page.php and then click on href="/" then you (and spiders) will end up on https://example.com/ instead of http://example.com/ -- two different URLs with the same content.
One solution is to send different robots.txt files for HTTP and HTTPS.
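One common way to do that on Apache is to hand out a second file when robots.txt is requested over SSL - the robots_ssl.txt filename is just a made-up convention, and this assumes HTTPS runs on port 443:

# .htaccess (Apache mod_rewrite)
RewriteEngine On
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ /robots_ssl.txt [L]

The robots_ssl.txt file would then simply disallow everything for the HTTPS version of the site.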
More here:
https://www.google.com/support/webmasters/bin/answer.py?answer=35302
https://blogs.msdn.com/livesearch/archive/2006/06/28/649980.aspx
That is the proposed solution in my post.
This is a really great way to illustrate it. This is definitely going to my programmers.