I'm currently working on re-authoring and re-building the Beginner's Guide to Search Engine Optimization, section by section. You can read more about this project here.
Canonical & Duplicate Versions of Content
Why Canonical Versions of Content are Critical to SEO
Canonicalization can be a challenging concept to understand (and hard to pronounce - "ca-non-ick-cull-eye-zay-shun"), but it's essential to creating an optimized website. The fundamental problems stem from multiple uses for a single piece of writing - a paragraph or, more often, an entire page of content will appear in multiple locations on a website, or even on multiple websites. For search engines, this presents a conundrum - which version of this content should they show to searchers? In SEO circles, this issue is often referred to as duplicate content - described in greater detail here.
The engines are picky about duplicate versions of a single piece of material. To provide the best searcher experience, they will rarely show multiple, duplicate pieces of content and thus, are forced to choose which version is most likely to be the original (or best).
Canonicalization is the practice of organizing your content in such a way that every unique piece has one and only one URL. By following this process, you can ensure that the search engines will find a singular version of your content and assign it the highest achievable rankings based on your domain strength, trust, relevance, and other factors. If you leave multiple versions of content on a website (or websites), you might end up with a scenario like this:
If, instead, the site owner took those three pages and 301-redirected them (you can read more about how to use 301s here), the search engines would have only one, stronger page to show in the listings from that site:
When multiple pages with the potential to rank well are combined into a single page, they no longer compete with one another and instead create a stronger relevancy and popularity signal overall. This will positively impact the combined page's ability to rank well in the search engines.
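To make that concrete, here's a minimal .htaccess sketch (Apache only; the paths and domain are placeholders, not taken from the diagram above) of three duplicate URLs being 301-redirected to a single canonical page:

# consolidate duplicate versions onto one canonical URL (placeholder paths)
Redirect 301 /canoe-page.html https://www.example.com/canoes/
Redirect 301 /products/canoes.php https://www.example.com/canoes/
Redirect 301 /canada/canoes https://www.example.com/canoes/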
A newer option from the search engines, the "Canonical URL Tag," is another way to reduce instances of duplicate content on a single site and canonicalize to an individual URL.
The tag is part of the HTML header on a web page, the same section where you'd find the title tag and meta description tag. In fact, this tag isn't new, but like nofollow, it simply uses a new rel parameter. For example:
<link rel="canonical" href="https://moz.com/blog" />
This would tell Yahoo!, Live & Google that the page in question should be treated as though it were a copy of the URL moz.com/blog and that all of the link & content metrics the engines apply should technically flow back to that URL.
The Canonical URL tag is similar in many ways to a 301 redirect from an SEO perspective. In essence, you're telling the engines that multiple pages should be considered as one (which a 301 does), without actually redirecting visitors to the new URL (often saving your dev staff considerable heartache). You can read more about implementation and specifics of the tag here.
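As a purely illustrative sketch (the URLs and title are hypothetical, not from any real implementation), the tag sits in the <head> of the duplicate page and points at the version you want the engines to credit:

<!-- head of a duplicate page, e.g. a print-friendly or parameterized URL -->
<head>
<title>Post Title</title>
<meta name="description" content="A short summary of the post." />
<!-- tells the engines to credit the canonical version instead -->
<link rel="canonical" href="https://www.example.com/blog/post-title" />
</head>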
As an example of canonicalization, SEOmoz has worked on several campaigns where two versions of every content page existed: a standard web version and a print-friendly version. In one instance, the publisher's own site linked to both versions, and many external links pointed to both as well (this is a common phenomenon, as bloggers & social media types like to link to print-friendly versions to avoid advertising). We worked to individually 301 re-direct all of the print-friendly versions of the content back to the originals and created a CSS option to show the page in printer-friendly format (on the same URL). This resulted in a boost of more than 20% in search engine traffic within 60 days. Not bad for a project that only required an hour to identify and a few clever rules in the htaccess file to fix.
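Those "clever rules" will vary from site to site, but as a rough, hypothetical sketch (assuming the print versions lived under a /print/ path and the originals under /articles/ - neither path is from the actual campaign), a single mod_alias rule in .htaccess can cover the whole pattern:

# 301 every print-friendly URL back to its original article (paths are placeholders)
RedirectMatch 301 ^/print/(.*)$ https://www.example.com/articles/$1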
Defending Your Rankings Against Scrapers & Spammers
Unfortunately, the web is filled with hundreds of thousands (if not millions) of unscrupulous websites whose business and traffic models depend on plucking the content of other sites and re-using it (sometimes in strangely modified ways) on their own domains. This practice of fetching your content and re-publishing it is called "scraping," and the scrapers make remarkably good earnings by outranking sites for their own content and displaying ads (ironically, often from Google's own AdSense program).
Preventing the scraping itself is often next to impossible, but there are good ways to protect yourself from losing out to these copycats.
First off, when you publish content in any type of feed format - RSS/XML/etc - make sure to ping the major blogging/tracking services (like Google, Technorati, Yahoo!, etc.). You can find instructions for how to ping services like Google and Technorati directly from their sites, or use a service like Pingomatic to automate the process. If your publishing software is custom-built, it's typically wise for the developer(s) to include auto-pinging upon publishing.
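Under the hood, most of these ping services accept the standard weblogUpdates.ping XML-RPC call. As a rough sketch (the endpoint shown is the XML-RPC address Ping-o-Matic has commonly documented - check their site for the current one - and the blog name and URL are placeholders), a manual ping from the command line might look like:

curl http://rpc.pingomatic.com/ -H "Content-Type: text/xml" --data '<?xml version="1.0"?>
<methodCall>
<methodName>weblogUpdates.ping</methodName>
<params>
<param><value>Example Blog</value></param>
<param><value>https://www.example.com/blog/</value></param>
</params>
</methodCall>'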
Next, you can use the scrapers' laziness against them. Most of the scrapers on the web will re-publish content without editing, and thus, by including links back to your site, and the specific post you've authored, you can ensure that the search engines see most of the copies linking back to you (indicating that your source is probably the originator). To do this, you'll need to use absolute, rather than relative, links in your internal linking structure. Thus, rather than linking to your home page using:
<a href="../">Home</a>
You would instead use:
<a href="https://moz.com">Home</a>
This way, when a scraper picks up and copies the content, the link remains pointing to your site.
There are more advanced ways to protect against scraping, but none of them are entirely foolproof. You should expect that the more popular and visible your site gets, the more often you'll find your content scraped and re-published. Many times, you can ignore this problem, but if it gets very severe, and you find the scrapers taking away your rankings and traffic, you may consider using a legal process called a DMCA takedown. Luckily, SEOmoz's own in-house counsel, Sarah Bird, has authored a brilliant piece to help solve just this problem - Four Ways to Enforce Your Copyright: What to Do When Your Online Content is Being Stolen.
As always, comments, corrections, and suggestions are greatly appreciated! I'll try to speed up the guide in the next few days and weeks, so look for a little more "back to basics" blogging. I'll rely on Rebecca, Jane, & the YOUmozzers to keep adding diversity to the mix. :)
p.s. Oh jeez... 3:50am. I really need to start sleeping more.
Fantastic article, love the diagrams! I've been lurking here at SEOmoz for a while now, but I figured I better post and say thanks once in a while! You guys are basically teaching me SEO (I'm an apprentice SEO at a company in the UK) and I really appreciate it. - Henry
While canonicalization is hard for SEOs and novices alike to say five times fast, it's even harder for novices to understand, even if explained five times slow.
There are two ways to look at this, from a "content" perspective and a "URL" perspective. The problem may be that we often talk from a content point of view, when what we are really talking about is URLs.
When you talk with clients who aren't all that knowledgeable, it isn't surprising that they have such a hard time with this, but when you talk with clients who are very web savvy and also have a hard time with this, it makes you start to wonder whether you are both on the same page.
Let's take a typical site's homepage as an example... most would say, "No, that page is unique, that content isn't used anywhere else." This of course is a content focused view, based on duplicating content. But it helps to break it down to the URL level, to show them that:
domain.com and www.domain.com are actually two pages (that just so happen to show the same content), which, with a little explanation, many will get and then implement redirects to either the www or the non-www version.
But we have to take it further, and show them how the "Home" navigation, which leads to www.domain.com/default.asp (or whatever) is also technically another page, as is www.domain.com/default.asp?source=header, as is www.domain.com/default.asp?source=footer, as is www.domain.com/default.asp?source=sitemap.
Laying out these URL variations helps to convey that duplicate content is often as much about URL variations leading to the same "page" as it is about having multiple pages with the same content (or chunks of it).
What should I say, Rand?
Most times I read the SEOmoz blog I find something really useful. As I am new to SEO, it's a great resource for me. This time you shed light on pinging, which I was not aware of at all.
Thanks for always sharing very useful things like this. I am a regular reader of SEOmoz and hardly miss any post.
Thanks to the whole SEOmoz team for making SEOmoz such a great resource for new people like me.
Hey Rand - good call on the scrapers comments. I like Joost's strategy on the subject. There's a heap of problems with dupe content - particularly dealing with proxy sites at the moment. Is any of that covered yet?
with the canonical URLs such as site.com, site.com/index.html, www.site.com and www.site.com/index.html i think it's not only important to 301 them to one URL but to also make sure that all of your links are pointing to the page in the same way. if you are redirecting everything to www.site.com then all of your internal links should point to the page as www.site.com (absolute as rand points out) even though you've set it up to redirect anyways.
though on a side note if your site is ASP on IIS you'll likely not be able to redirect site.com/default.asp to www.site.com as it will cause a loop. if anyone knows how to fix this, by all means let me know. it's been hurting my head for weeks.
Kimber, you're not the only one banging your head against the wall with IIS 301 redirects. I've searched and searched, but keep coming back with either the loop, or VB code so difficult that most people will just give up and allow the dupe content.
Am I being anal for also making sure that index.* redirects back to /? I stress that quite a bit, but really it's only the index page as opposed to the whole non-www to www issue.
yes, i too standardize all of my homepage urls to site.com/ with the trailing slash. i'm not exactly sure why that is, but i do it anyway.
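For Apache users, a minimal .htaccess sketch of both fixes discussed above - host canonicalization and index-page canonicalization - might look like the following (the hostname and file extensions are placeholders, and this does nothing for the IIS loop mentioned earlier):

RewriteEngine On
# host canonicalization: send non-www requests to the www hostname
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
# index-page canonicalization: send /index.php or /index.html back to the root;
# THE_REQUEST is checked so the internal DirectoryIndex subrequest doesn't loop
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\s/index\.(php|html)[\s?] [NC]
RewriteRule ^index\.(php|html)$ https://www.example.com/ [R=301,L]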
"Canonicalization can be a challenging concept to understand (and hard to pronounce - "ca-non-ick-cal-eye-zay-shun"),"
Is it weird that I found this to be the most helpful part of the re-write? ;)
The canonical URL for the root of a domain and for any folder MUST end with a trailing "/".
Never link to "https://www.domain.com" or to "https://www.domain.com/folder" without it.
The correct URLs are "https://www.domain.com/" and "https://www.domain.com/folder/" with the trailing "/" included.
That's direct from the HTTP specs.
Where did you get that?
It's mentioned many times in the various RFCs (I forget the number, but it might be RFC 3986, I think) and in the Apache webserver documentation.
Here?
Under URL Layout - Trailing Slash Problem.
It is relevant to point out that this only applies to Apache web servers.
I wonder if that's true. What do you say, Rand?
I check out a lot of sites, and I notice that they don't redirect index.* back to the root of the domain, but I started doing that probably 2 years ago.
As far as HTTP specs or Apache documentation, whenever there is a question about 301s, g1smd is pretty locked on. Poor guy, I think everyone should chip in and buy him a one page domain with all his knowledge so he doesn't have to keep saying the same stuff over and over. Heh.
https://www.google.com/search?hl=en&lr=&q=g1smd+%2B+301+site:webmasterworld.com&btnG=Search
Hmm. Maybe I ought to see if SMX or SES want the full works on canonical URLs, redirects, and stuff that can screw up your rankings and traffic if you get it wrong. :-)
I don't know if they'd want it, but I'd sure as heck take it. Thanks Ian.
*** a one page domain with all his knowledge ***
Hang on a moment. Are you saying that all my knowledge would only fit on one page?
Cheeky Bugger!!! LOL.
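Rounding out the trailing-slash discussion above: on Apache, mod_dir normally takes care of this on its own, 301-redirecting a slash-less request for a real directory to the slashed URL (DirectorySlash On is the default). As a rough mod_rewrite sketch of the same behaviour, for reference:

RewriteEngine On
# if the request maps to a real directory but lacks a trailing slash, 301 to the slashed URL
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]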
With regards to scrapers: funnily enough, I've had a little experience with scrapers, and what you find is that most people don't get many links for their content, so even if site A pinged and got it indexed first, Google will rank site B above it as long as site B gets a couple more links than site A.
Seeing as most SEO guys are just learning about deep linking, this is a rather easy thing to do. Take a longtail term like "buy kdh-8374 stereo receiver," stick it on some parasite hosting with a lot of domain weight like Blogger, AOL pages, HubPages, Squidoo... and the list goes on. Get 2-3 links to it and all of a sudden your content is making others money. Also, this is very hard to do anything about, because you can't find the person and you can't trace the domain or hosting.
It's the perfect storm for the scraper. Good content, links with anchor text, and a trusted domain. You don't need much more than that to rank.
So as far as I can see it, the only thing you can do to defend against this is get a lot of deep links to each of your pages. Even then you're going to have a hard time. It's just a flaw in Google's algorithm. Life goes on... Happy money making.
Rand,
The canonical example you gave with your company and the three different versions of your content was a good one. But what I am curious to know is: in what type of situation would a website be using the same content on two different pages such that it would need a 301 redirect?
I do know a lot of duplicate content problems can arise from sections on individual pages that have the same content, but in this particular case you wouldn't want to 301 redirect the entire page, as each page has its purpose.
Does anyone have another good example of where you would use the 301 redirect for two pages that have the same content?
Thanks,
BJ
This beginner's guide will grow big, so it seems... Good article!
RIes
Does the advice about print-only pages also apply to pages with CSS style switchers?
It is probably going to come down to how it is handled. If the switching is handled by appending a parameter to the URL, like domain.com/mypage.htm?css=big, then yes, it will be creating duplicate content.
While maybe not ideal, a couple of ways to help limit or protect against that are to nofollow those links or run them through JavaScript, and to use robots.txt to isolate those URLs. But that's not to say that someone won't come along, view the page with the "big" styles, copy the URL, and use it in their blog, thus creating a link to the page.
Even better though might be to use server-side scripting to dynamically change the style instead of relying on URL parameters.
Taken from SEOmoz's own 301 page, if you were using querystrings for your stylesheet links, would this be the solution?
RedirectMatch 301 /index.php(?css=big) https://www.yoursite.com/index.php$1
no
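A note for later readers on that "no": RedirectMatch (mod_alias) matches only the URL path and never sees the query string, and 301ing the styled URL away would also break the style switcher for visitors. If you genuinely needed to consolidate a parameterized URL, the query string has to be tested with mod_rewrite instead; a hypothetical sketch:

RewriteEngine On
# match the css=big parameter (hypothetical) and redirect to the parameter-free URL;
# the trailing "?" on the target strips the query string
RewriteCond %{QUERY_STRING} (^|&)css=big($|&)
RewriteRule ^index\.php$ https://www.yoursite.com/index.php? [R=301,L]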
Thanks for giving such good advice on how to prevent duplicate content. The illustrations helped a lot =)
"We worked to individually 301 re-direct all of the print-friendly versions of the content back to the originals and created a CSS option to show the page in printer-friendly format (on the same URL). This resulted in a boost of more than 20% in search engine traffic within 60 days."
In Joomla CMS, you have the option to activate a "printer friendly" icon to print a page from the site.
Will this have the same impact (as what you have outlined above) as if you had created individual web and print versions manually, or would you actually have to manually create a printer-friendly version and do a 301 re-direct to potentially see that kind of traffic increase?
My interest is this - if I could gain a significant improvement in traffic simply by having two copies (original web version and print version), and re-directing the latter to the former, it would seem like this would be a pretty good standard practice since it's not much work for that kind of potential benefit. Is that a fair statement?
Good stuff Rand. Illustrations helped.
Sean - you're basically fixing a mistake in site architecture, not actually benefiting from the two versions. Note the link I pointed to for Omarinho's question below.
Joomla CMS - I really don't know how it operates, but if there are two URLs for the same content, you'll have a problem. If that print-friendly link just uses Javascript to change the CSS or uses the 'print' command in the browser, it should be fine.
Hi Rand,
Methinks I have found a good example of a canonical version of SEOmoz content:
https://www.seomoz.org/blog
https://www.seomoz.org/blog/
Providing no one linked to the latter it would not be a problem; however, I suspect that is not the case.
Could a 301 be on the cards?
Joomla is chock-full of duplicate content issues.
I find it a complete nightmare. Be very careful.
Great post, Rand! The new beginner's guide is going to be one of the most influential and valuable SEO documents ever created, if not the most.
Just make a sticky note and remember to 301 this post over to the entire beginner's guide once it's finished ;)
Great post and advice, Rand. Using complete URLs on internal links lets you detect the referring link in analytics and point out the scrapers.
This stuff is very important for any SEO to understand.
If you put a 301 on the print versions in order to re-direct to the original pages, how will users be able to see the print versions when they need to print a specific page? A CSS option on the same URL? I don't understand. :-o If you can post a link with an example, it would be great.
Omarinho - the reason you need to 301 is to grab that link juice. Then, you can use a modified CSS stylesheet (see this AListApart article) to make the same URL produce two different looking documents - one for print, one for web.
I get it now. Thank you!
The all-CSS technique is by far the best approach, at least for handling print, mobile, or other versions, because the styling is handled at the browser level, not through different pages or parameters that create potential duplicate content.
Even if you block the print pages from spiders, why add the chance that someone might copy and paste those blocked URLs or, depending on how the pages are handled, why maintain multiple versions of pages at all (granted, using a CMS probably won't make this an issue)?
Nice. I didn't know about that ping thing. Will make sure I do that every time I post to my blog. Thanks Rand :)
Both FeedBurner and WordPress can be set up to automatically ping certain sites every time you blog something new.
Oh really? Lemme check. Do I require a plugin for that?
The basic WordPress install normally handles it for you, I believe.
Great post, Rand! I'm going to share this on our team reading list for 1/22 at the EMP blog!
Thanks Rand for the article. I have a question... For blogs which create duplicate content in archives and categories, how can that be handled? Should we 301 redirect the archive and category pages to the main page of the article?
i just robots.txt those URLs out. i think it's pretty unlikely that anybody is externally linking to your blog's archive or category pages, so there's little reason to 301 them to preserve backlinks. using robots.txt will simply tell the spiders to go away and not crawl or index those pages.
but then again, i could be wrong.
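As a sketch of the approach Kimber describes (the paths are hypothetical, WordPress-style archive and category URLs - adjust to whatever your blog software actually generates):

User-agent: *
# keep spiders out of category and date-based archive pages (placeholder paths)
Disallow: /category/
Disallow: /2008/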
Thanks Kimber for your reply, it sounds logical to me. I'm just confused about what Rand tried to explain regarding having different copies of the same text at different URLs, like Rand mentioned about the main page of the article and a page in printer-friendly format. Can then this page in printer-friendly format be blocked by robots.txt also, as a solution to duplicate content?
"Can then this page in printer-friendly format be blocked by robots.txt also, as a solution to duplicate content?"
You could, but remember in Rand's situation both the printer version and regular version had incoming links. Therefore by 301'ing the printer version he sends all that link juice to the non-printer version.
So you could robots.txt-block the printer version; however, that doesn't prevent people from linking to your (no ads) printer version.
Thanks BradleyT, that's perfectly clear now.
If you use the "media" function within CSS, there is no need to have multiple URLs for the screen and print versions of the page.
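For anyone who hasn't used it, a minimal sketch of that technique (the selectors are placeholders); the print styling lives on the same URL as the page, so no separate print-friendly URL is ever created:

/* in your stylesheet: rules applied only when the page is printed */
@media print {
  #nav, #sidebar, .ads { display: none; } /* hide navigation and ads (placeholder selectors) */
  body { background: #fff; color: #000; }
}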
Any links to details on how to use the robots file for this?
robotstxt.org tells you all about it. it's also recommended that you "test" out your robots.txt in Google Webmaster Tools to make sure you aren't blocking important pages by accident.
Thanks Kimber!!!
Using robots.txt does keep the duplicates out of the index, but those URLs still accumulate PageRank (because you are linking to them from within your site) which is now wasted, because it can't be passed on elsewhere within the site.
so you should also add nofollow to the internal links?
I wouldn't totally rule out what people might link to.
Case in point: in reviewing various URLs where a particular URL structure (a heavy, parameter-based version that led to product pages that had much cleaner URLs) was used for the on-site search function, the first thought was that it was no concern, since spiders couldn't fill out and submit the search form . . . but then I was surprised to see that some of those URLs were showing up in the SERPs.
Doing some more digging and hopping down the backlink trail, the issue became clear . . . bloggers and others who wanted to link to the site would often come to the site and, rather than navigate down through to the specific product page, use the on-site search to find it and link to the resulting page . . . which was the ugly URL page, not the nice, clean, spider- and searcher-friendly URL page.
It isn't always possible, especially on large CMS or ecommerce sites, but I'd always recommend eliminating the potential for URL variations whenever possible.
This can be a bit complicated, but I agree content is king. I've found a lot of info regarding this topic lately.
https://www.massmailsoftware.com/blog/ has some good bits on it, and the others I forget :P
Regardless, it is worth making sure content is continually fresh, easy to find, useful, and sitting on a solid foundation.