Let's look at a few diagrams to help explain:
This is the standard way rel=canonical is employed. Different versions of a page, whether on your own site, on partner sites, or on sites where you license your content (note: cross-domain use is an update Google launched on Dec. 17th, 2009), can all reference back to the original to tell the search engines where to find that piece. However, it's also perfectly OK to do this:
Google's blog post on the subject doesn't state this explicitly. However, even the example website Google uses, Wikia, employs this practice on the page Google points out. Googler Maile Ohye also confirmed it in reply to a comment:
@Wade: Yes, it's absolutely okay to have a self-referential rel="canonical". It won't harm the system and additionally, by including a self-reference you better ensure that your mirrors have a rel="canonical" to you.
Maile's got really good advice here. If third parties are referencing your posts and appending strings of data to the URL, it can be really helpful to have the canonical URL tag on those pages by default. In fact, we've recently worked with many companies who found it helpful to employ the tag sitewide as a best practice, so that future iterations of the site, or less SEO-savvy development work, can't produce versions of a page without rel=canonical and potentially lose link juice or cause canonicalization issues.
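For a sense of what "on by default" can look like, here is a minimal Python sketch that builds the canonical tag from the requested URL by dropping parameters third parties tend to append. The parameter list (`utm_*`, `sessionid`, `sid`) and the function names are illustrative assumptions, not part of any CMS or Google API; a real site would need its own list of parameters that never change page content. Note that a clean URL maps to itself, which is exactly the self-referential case Maile describes.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: query parameters assumed never to change page content.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"sessionid", "sid"}

def is_tracking(name: str) -> bool:
    lowered = name.lower()
    return lowered.startswith(TRACKING_PREFIXES) or lowered in TRACKING_NAMES

def canonical_url(requested_url: str) -> str:
    """Drop tracking parameters and the fragment; keep everything else."""
    parts = urlsplit(requested_url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not is_tracking(k)]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_link_tag(requested_url: str) -> str:
    # Escape so a kept "&" becomes "&amp;" inside the HTML attribute.
    href = escape(canonical_url(requested_url), quote=True)
    return f'<link rel="canonical" href="{href}"/>'
```

Because the tag is computed on every request, any variant that reaches the template, tracked or not, points back to the same clean URL.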
One last piece: it's a really, really good way to make sure Google indexes the http rather than the https version of your page (and counts link juice toward the proper one). This has historically been a royal pain in the butt for many SEOs, and we've heard enough positive stories now to feel confident recommending it.
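A hedged sketch of the http/https case in Python: both copies of a page emit a canonical pointing at one preferred scheme and host. `PREFERRED_SCHEME` and `PREFERRED_HOST` are assumptions for illustration (the scheme follows this post's preference for http), and the function names are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the version you want indexed. This post
# recommends pointing https copies at http; change these if yours differs.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"

def canonical_href(requested_url: str) -> str:
    """Map http, https, www and non-www copies of a page to one URL."""
    parts = urlsplit(requested_url)
    path = parts.path or "/"
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, ""))

def canonical_tag(requested_url: str) -> str:
    href = escape(canonical_href(requested_url), quote=True)
    return f'<link rel="canonical" href="{href}"/>'
```

Hard-coding the preferred origin, rather than echoing the request's scheme and host, is what makes the https copy point away from itself.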
Welcome to 2010! Hope everyone had a great holiday break :-)
Honestly, I never had that doubt... which is strange, as my motto could be "I doubt, therefore I exist".
From the first appearance of rel="canonical", I assumed it was OK to use it in a self-referential way too.
Anyway, thanks for the clarification, Rand... if I was ever going to have a belated doubt about using the canonical tag, now I won't.
About self-referential canonical tags: your Pro dashboard flags rel=canonical as a minor issue. To me, this implies that having rel=canonical on every page of a site isn't seen as positively as this article suggests.
I would be interested to know why you mark this as an issue in the SEOmoz crawl issues...
Thanks,
Andreas
Ha. I just noticed that today, too, in the On-Page Report Card. I thought I was going a little crazy and wondered if something had changed. Had to go do some research to make sure that self-referencing canonicals were still okay.
I could've sworn that, when Google first announced the rel=canonical tag, the post said that best practice was not to use it on the actual canonical page, but they never gave a reason or suggested what the consequences might be (unfortunately, I can't find that link). Glad to hear it's not an issue in practice.
It would seem really strange for them to offer that advice. This tag seems in part designed to help CMSs deal with the multiple ways someone can reach the same content.
I would say the majority of people will use the tag through its implementation in WordPress, Joomla, Drupal, etc. That seems a better target than SEOs who are able to hand-roll things, and if you look at those systems, they are definitely going to have the tag linking back to the page itself.
You would also think it helps reinforce that this is the true source; if it weren't there, I would expect it to send a weaker signal.
In retrospect, I think it may have just been an implication. From Google Webmaster Central:
To specify a canonical link to the page https://www.example.com/product.php?item=swedish-fish, create a <link> element as follows: <link rel="canonical" href="https://www.example.com/product.php?item=swedish-fish"/> Copy this link into the <head> section of all non-canonical versions of the page, such as https://www.example.com/product.php?item=swedish-fish&sort=price.
I suppose that's the "ideal" version, since you shouldn't waste spider resources telling them that the canonical page is canonical, but in practice it probably doesn't matter. As you said, for most implementations it seems logical to use it on every copy by default, since that's much easier from a coding perspective.
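The Webmaster Central example can be mirrored in a short Python sketch. Instead of stripping known junk parameters, it keeps only a whitelist of parameters assumed to select content (here just `item`, a hypothetical choice for this example), so `sort=price` is dropped and the canonical URL maps to itself. The function names are illustrative, not from Google's documentation.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: the only parameter on product.php that selects different
# content. Anything else (sort order, session, tracking) just re-presents it.
CONTENT_PARAMS = ("item",)

def canonical_for(variant_url: str) -> str:
    """Keep whitelisted parameters in a fixed order; drop everything else."""
    parts = urlsplit(variant_url)
    params = dict(parse_qsl(parts.query))  # last value wins on duplicates
    kept = [(name, params[name]) for name in CONTENT_PARAMS if name in params]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def head_tag(variant_url: str) -> str:
    href = escape(canonical_for(variant_url), quote=True)
    return f'<link rel="canonical" href="{href}"/>'
```

A whitelist is safer when you can't predict what will be appended, at the cost of having to list every content-bearing parameter; emitting them in a fixed order also collapses `?sort=price&item=...` and `?item=...&sort=price` onto one URL.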
Excellent point about using rel=canonical for https. You wouldn't think secure pages would have a duplicate, but it comes up more often than not.
@Rand thanks for the clarification, but I'm not sure how happy developers who already struggle with redirecting the non-www version of a domain will be to also worry about the https version of the website.
This does make sense, and since I understand that many websites have to pay for bandwidth run through their HTTPS certificate, it may make financial sense to limit traffic from search engines.
The HTTPS version might also be a good place for PageRank sculpting...
rel=canonical has worked pretty well to resolve pagination issues; we've always included it on the canonical page itself as well as on the non-canonical pages. I've really only paid attention to Google, but it seems to work fine.
Wowww, huanzi, nice spammmmmmmmmmmmmm... Mozers, getting spammed...
*POOF* and the spam goes up in smoke. :)
Just out of interest, what sort of 'defences' against spam are there here? I had around one hundred spam emails this morning. There seems to have been an increase.
Sorry about that! We had several accounts that were created and spammed us pretty hard before their accounts and IP addresses were banned. We're working on more robust and proactive ways of combatting that sort of spam. Again, sorry about all the emails!
The emails all seem basically the same. Is there a way to get one email, and then get another only when a new comment is added after I've come back and read the new posts? That way, if I'm away from the computer, I don't come back to 30 identical emails.
It's fine, in the end I recognised the usernames of each member and only clicked to read the comments of those who were legitimate members :-).
@nicolash: It allows us to see where and when people have posted if it's done this way.
Thanks for the useful tip.
Noticed though that you're not using the tag on this blog post, even though the URL I'm looking at has utm parameters appended. Would have thought this would be the ideal opportunity to use rel=canonical, any reason why you don't?
That's something I've noticed with this site too. It doesn't employ the tag at all. I'm not sure whether it's that they haven't got around to it yet or something, but they're telling us it's good practice and not doing it themselves.
I'm not saying they're wrong, and I am using the canonical tag myself; I just don't understand why they aren't.
I have always used the canonical tag on the parent page. It never crossed my mind that any problems would occur as a result anyway.
However, thanks for the reassurance!
Paul Martin
Cube3 Marketing
Nice clarification, although it has got me wondering what the supposed issues were. Probably best not to go there, no point remembering issues that don't exist.
Only one niggle, the diagram threw me initially, because it looks like two pages that point to each other, rather than one page that points to itself.
Alun.
Great post - I'm glad to have my research validated. I ended up finding the exact same thing from the horse's mouth itself: Matt Cutts' video on YouTube.
Thanks for this. It was very useful. I don't really understand why Google don't overtly say this. They only use the example of there actually existing bad versions of a page. The whole thing seems weird to me.
Maybe someone can answer my question:
We recently added rel=canonical to every page of our site (www.aferry.co.uk), so that when people link to the site incorrectly (e.g. without 'www'), Google should hopefully record only one page in its index rather than several. When we implemented rel=canonical, we decided that case was important.
So all the canonical tags refer to a lower-case file name. However, some files actually have mixed-case file names. E.g. https://www.example.com/Widgets.htm has: <link rel="canonical" href="https://www.example.com/widgets.htm"/>
Does this matter to Google? Will Google be reading all of these as two pages? Or does only the link matter?
Thanks very much. Any help is much appreciated, as is a pointer if you've seen a similar issue discussed elsewhere.
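One way to find pages affected by this, sketched in Python under the assumption that you have each page's URL and its canonical href from a crawl: flag canonicals that differ from the page URL only by letter case in the path. Under RFC 3986 the scheme and host are case-insensitive but the path generally is not, so /Widgets.htm and /widgets.htm may be treated as separate URLs, and on a case-sensitive server the lowercase target might not exist at all. The function names are hypothetical, and this sketch says nothing about how Google handles such pages; it only surfaces them for checking.

```python
from urllib.parse import urlsplit

def differs_only_by_case(page_url: str, canonical_href: str) -> bool:
    """True when the canonical matches the page except for letter case in
    the path, e.g. /Widgets.htm pointing at /widgets.htm."""
    page, canon = urlsplit(page_url), urlsplit(canonical_href)
    same_host = page.netloc.lower() == canon.netloc.lower()  # hosts ignore case
    return (same_host
            and page.path != canon.path
            and page.path.lower() == canon.path.lower())

def find_case_mismatches(crawl: dict[str, str]) -> list[str]:
    """crawl maps page URL -> canonical href (a hypothetical crawl export)."""
    return sorted(url for url, canon in crawl.items()
                  if differs_only_by_case(url, canon))
```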
That all makes sense, thanks for the update.
Rand, I'm confused. You mentioned:
"Canonical URL tag, operates exclusively on a single root domain (it will carry over across subfolders and subdomains)." And yet in the first image of this post you show partnersite.com/article using the rel="canonical" tag...??
Also, couldn't someone who copies my content place that tag on their page to tell search engines it is the original? How would the search engines know?
Hey Gustavo - as I pointed out in the post, Google just recently updated their policy around canonical url tags to allow for cross-domain functionality - https://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html
In terms of knowing which version is the original, I'd presume that if Google encounters two copies of a piece of content, both employing rel=canonical, they'll use the same metrics and mechanisms they do today when finding duplicate content. It's really only helpful if both versions point back to one original (or if the copy does).
I know this is an ancient post, but on the off chance that someone might actually catch it: should the duplicate post have a <link rel="canonical"> tag if the content isn't completely the same? If the formatting and HTML are different but the text (article content) is duplicated, should the link tag still be there?
Nice Post ;)
Thanks
Great Post! I had heard this rumor, so it's nice to have a set answer.
Great post! Thanks for the info, will start looking to utilize in my work.
I think it makes more sense to include it on the canonical page itself. We've been planning on using it for an upcoming project so it's good to see the discussion being had here on where to use it. Thanks SEOmoz!
It's a good post to clear this up, but I would also think this should be common sense, but maybe it's just me?
Always making sure your page points to the "right" page also prepares you for any future additions of parameters, such as A/B testing or tracking code for advanced analytics purposes.
We always implement canonical on everything we build as a standard, to ensure smooth running in the future.
But I guess it's always nice to have an "official" post stating that this is good behavior and not something that will create problems.
Happy successful 2010 everyone!
Good to know.
I didn't really think of using it to exclude the parameters in a URL. Might come in handy, thanks!
Thank you for clearing it up. I always assumed this was the case, so I just stuck it on both :).
Nice to hear this news. Implementing canonical tags on dynamic duplicate pages can be a pain, so now that I can use the tag sitewide, it is much easier to manage.
That was always my main concern as well. But since I recently 'refurbished' my site, I thought I'd give it a go and use the canonical tag permanently.
Nice read as always Rand; I'm with a lot of the people above in having practiced this for a while but still enjoying seeing it officially posted somewhere.
Thanks for clearing this up. They're great for session IDs.
I wonder if it would be good practice to use them every time a page is created in case an issue were to come up in the future. Are there any possible negative effects?
Be careful about rel=canonical loops (or chains, whatever you want to call it) or other implementation errors.
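To illustrate the chain and loop problem, here is a small Python sketch that walks rel=canonical targets through a crawl map (a hypothetical `dict` of URL to canonical href, with a missing entry treated as self-referential). It labels a URL a "loop" if following canonicals revisits a URL, and a "chain" if it takes more than one hop to reach a self-referencing page. The one-hop rule is an assumption about good hygiene, not a documented search engine limit.

```python
def follow_canonicals(start: str, canonical_of: dict[str, str]) -> list[str]:
    """Return the URLs visited by following rel=canonical from start.
    A self-reference ends the walk; revisiting a URL raises ValueError."""
    path = [start]
    seen = {start}
    current = start
    while True:
        target = canonical_of.get(current, current)  # no tag: treat as self
        if target == current:
            return path
        if target in seen:
            raise ValueError("canonical loop: " + " -> ".join(path + [target]))
        path.append(target)
        seen.add(target)
        current = target

def audit(canonical_of: dict[str, str]) -> dict[str, str]:
    """Classify each crawled URL as 'ok', 'chain' or 'loop'."""
    report = {}
    for url in canonical_of:
        try:
            hops = follow_canonicals(url, canonical_of)
        except ValueError:
            report[url] = "loop"
            continue
        # [url] or [url, canonical] is fine; anything longer is a chain.
        report[url] = "chain" if len(hops) > 2 else "ok"
    return report
```

Treating a missing tag as a self-reference keeps the walk finite for pages that simply don't carry the tag.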
For some reason I'm surprised at how much confusion the rel="canonical" tag has caused; however, I guess anything new causes some confusion.
Does anyone know of a post or research where someone outlines a test they ran with rel="canonical" and reported the results, positive or negative, backed by data?
I have yet to see an in-depth post on this and would be curious.
After our implementation of link rel='canonical' in early 2009, we saw our Google Duplicate META Description Tag error report go from 3,400+ dupe tag descrips down to about 250 last time I checked.
Being a commerce site, we're stuck with URL params, so the majority of the dupe tag errors were for the same content with discrete URL variations only.
I attribute this improvement directly to our canonical tag rollout, since we didn't change the text source or the description rules for building the tags. I know the usage of that tag and its importance in the search engine algorithms is debatable, but I take this as evidence that Google was able to use our instructions about which URL is definitive and (hopefully) steer all ranking to the one canonical page.
That's good. Is there any chance that you could provide us with some metrics of this? I'd like to see how it works in reality.
Unfortunately, I haven't had a chance to put it in place somewhere that had major problems with duplicate content and the like.
I haven't tested this myself, so good to know. It seems in line with logic. It would just be an extra line of code that is unnecessary, but shouldn't cause harm. Just more of a circular reference.
Thanks Rand.
I have been using the canonical tag in-house in self-referential fashion for only a few months. When I requested this change from the IT dept, I realised they had also applied it to the canonical version, so I did wonder whether it would be counterproductive to have it that way. I haven't experienced any problems, at least in terms of lost rankings. It's good to have this myth clarified for peace of mind, thanks Rand.
This makes logical sense. If you look at it from a page vs. URL perspective and understand the difference between the two, it's inherently clear. For a dynamically created tracking-code URL, or a URL with a session ID appended, for example, no actual unique page or file exists for that URL; it pulls the same code as the original page and URL. Placing the canonical tag on the original source page, in this case and as a sitewide best practice, makes logical sense, because all of these variant URLs will still pull their source code from the actual original page, which carries the canonical tag. If this weren't the case, how could you ever solve the problem? You can't put a canonical tag on a URL that has no unique page associated with it, because there is no file, i.e., no source code.
In cases where an actual page or file does exist for a URL that displays the same content as another URL and page, you put the tag on the non-canonical one. However, I think the sitewide method covers all your bases and leaves you less to worry about.
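The point about variant URLs sharing one source file can be sketched as a minimal WSGI app in Python (standard library only). It assumes, for illustration, that no query parameter changes the content, so the canonical is built from `PATH_INFO` alone; `SITE_ORIGIN` and the template are hypothetical. Every variant, tracking code or session ID included, renders the same head and therefore the same self-referential canonical.

```python
from html import escape
from urllib.parse import quote

# Hypothetical preferred origin for every canonical URL on the site.
SITE_ORIGIN = "http://www.example.com"

PAGE_TEMPLATE = (
    "<!DOCTYPE html>\n"
    "<html><head>\n"
    "<title>{title}</title>\n"
    '<link rel="canonical" href="{canonical}"/>\n'
    "</head><body>{body}</body></html>\n"
)

def app(environ, start_response):
    """Render one page. QUERY_STRING is deliberately ignored, so every
    variant of a path carries the same self-referential canonical tag."""
    path = environ.get("PATH_INFO") or "/"
    # PATH_INFO arrives decoded; re-encode it before building a URL.
    canonical = escape(SITE_ORIGIN + quote(path), quote=True)
    html = PAGE_TEMPLATE.format(
        title="Example post",
        canonical=canonical,
        body="Same content for every variant.",
    )
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [html.encode("utf-8")]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it; requesting `/post` and `/post?utm_source=x` should return identical markup. If some parameters do select content, the canonical would need to keep those rather than ignore the query string entirely.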
Interesting idea to add the tag to other sites that use your content...makes sense.
Also like the idea to use it on the https version of a page.
Hey great post, a nice way to break into the new year. Thanks for the post, I doubt I would have realised it was okay to use rel=canonical on the original page. Thanks for clearing it up!