It sounds like an easy question, doesn’t it? While we hear a lot about duplicate content since the Panda update(s), I’m amazed at how many people are still confused by a much more fundamental question – which URL for any given page is the canonical URL? While the idea of a canonical URL is simple enough, finding it for a large, data-driven site isn’t always so easy. This post will guide you through the process with some common cases that I see every week.
Let’s Play Count the Pages
Before we dive in, let’s cover the biggest misunderstanding that people have about “pages” on their websites. When we think of a page, we often think of a physical file containing code (whether it’s static HTML or script, like a PHP file). To a crawler, a page is any unique URL that it finds. One file could theoretically generate thousands of unique URLs, and every one of those is potentially a “page” in Google’s eyes.
It’s easy to smile and nod and all agree that we understand, but let’s put it to the test. In each of the following scenarios, how many pages does Google see?
(A) “Static” Site
- www.example.com/
- www.example.com/store
- www.example.com/about
- www.example.com/contact
(B) PHP-based Site
- www.example.com/index.php
- www.example.com/store.php
- www.example.com/about.php
- www.example.com/contact.php
(C) Single-template Site
- www.example.com/index.php?page=home
- www.example.com/index.php?page=store
- www.example.com/index.php?page=about
- www.example.com/index.php?page=contact
The answer is (A) 4, (B) 4, and (C) 4. In Google’s eyes, it doesn’t matter whether the pages have extensions (“.php”), the home-page is at the root (“/”) or at index.php, or even if every page is being driven off of one physical template. There are four unique URLs, and that means there are four pages. If Google can crawl them all, they’ll all be indexed (usually).
Let’s dive right into a few examples. Please note: these are just examples. I’m not recommending any of the URL structures in this post as ideal – I’m just trying to help you determine the correct canonical URL for any given situation.
Case 1: Tracking URLs
I’ll start with an easy one. Many sites still use URL parameters to track visitor sessions or links from affiliates. No matter what the parameter is called or which purpose it’s used for, it creates a duplicate for every individual visitor or affiliate. Here are a few examples:
- www.example.com/store.php?session=1234
- www.example.com/store.php?affiliate=5678
- www.example.com/store.php?product=1234&affiliate=5678
In the first two examples, the session and affiliate ID create a copy, in essence, of the main store page. In both of these cases, the proper canonical URL is simply:
- www.example.com/store.php
The last example is a bit trickier. There, we also have a “product=” parameter that drives the product being displayed. This parameter is essential – it determines the actual content of the page. So, only the “affiliate=” parameter should be stripped out, and the canonical URL is:
- www.example.com/store.php?product=1234
This is just one of many cases where the canonical URL is NOT the root template or the URL with no parameters. Canonical URLs aren’t always short or pretty – many canonical URLs will have parameters. Again, I’m not arguing that this structure is ideal. I’m just saying that the canonical URL in this case would have to include the “product=” parameter.
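As a rough illustration of the rule above, here is a minimal Python sketch that strips tracking-only parameters and leaves content-driving ones alone. The `TRACKING_PARAMS` set and the `strip_tracking` name are assumptions made for this example, and the `https://` scheme is added only so the URLs parse cleanly; every site needs its own list of which parameters are safe to drop.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: on this hypothetical site, these parameters only track
# visitors and never change what the page displays.
TRACKING_PARAMS = {"session", "affiliate"}


def strip_tracking(url):
    """Drop tracking-only parameters and keep everything else in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    # An empty query string is dropped entirely, so no trailing "?" remains.
    return urlunsplit(parts._replace(query=urlencode(kept)))


for url in (
    "https://www.example.com/store.php?session=1234",
    "https://www.example.com/store.php?product=1234&affiliate=5678",
):
    print(strip_tracking(url))
```

The first URL collapses to the bare store page, while the second keeps its "product=" parameter, because that parameter determines the content.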
Case 2: “Dynamic” URLs
Unfortunately, the word “dynamic” gets thrown around a little too freely – for the purposes of this blog post, I mean any URLs that pass variables to generate unique content. Those variables could look like traditional URL parameters or be embedded as “folders”.
Blog post URLs are a good example of the kind of URLs I’m talking about. Take these four:
- www.example.com/blog/1234
- www.example.com/blog.php?id=1234
- www.example.com/blog.php?id=1234&comments=on
- www.example.com/blog/20120626
Again, it doesn’t matter whether the URLs have parameters or hide those parameters as virtual folders. All of these URLs use a unique value (either an ID or date) to generate a specific blog post. So what’s the canonical URL here? Obviously, if you canonicalize to “/blog”, you’re going to reduce your entire blog to one page. It’s a bit of a trick question, because the canonical URL could actually be something like this:
- www.example.com/blog/this-is-a-blog-post
This is why we have such a hard time detecting the proper canonical URLs with automated tools – it really takes a deep knowledge of a site’s architecture and the builder’s intent. Don’t make assumptions based on the URL structure. You have to understand your architecture and crawl paths. If you just start stripping off URL parameters, you could cause an SEO disaster.
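To make that point concrete, here is a hedged Python sketch in which the canonical URL comes from a lookup rather than from trimming the URL. The `POST_SLUGS` table stands in for whatever database or CMS actually stores the post, and `canonical_for_post` is a hypothetical helper, not part of any real platform.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical stand-in for the CMS database: post ID -> slug.
POST_SLUGS = {"1234": "this-is-a-blog-post"}


def canonical_for_post(url):
    """Return the slug-based canonical URL, or None if the post is unknown."""
    parts = urlsplit(url)
    if parts.path == "/blog.php":
        # Parameter style: /blog.php?id=1234 (extra parameters are ignored).
        post_id = parse_qs(parts.query).get("id", [None])[0]
    elif parts.path.startswith("/blog/"):
        # "Folder" style: /blog/1234
        post_id = parts.path[len("/blog/"):]
    else:
        post_id = None
    slug = POST_SLUGS.get(post_id)
    if slug is None:
        return None  # refuse to guess when the lookup fails
    return f"https://www.example.com/blog/{slug}"


print(canonical_for_post("https://www.example.com/blog.php?id=1234&comments=on"))
```

Returning `None` for anything the lookup cannot resolve (such as the date-based URL, which this toy table does not cover) is deliberate: guessing from the URL shape is exactly what goes wrong.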
Case 3: The Home-page
It might seem strange to put the home page third, but the truth is that the first two cases were probably easier. Part of the problem is that home pages naturally spin out a lot of variations:
- www.example.com
- www.example.com/
- www.example.com/default.html
- www.example.com/index.php
- www.example.com/index.php?page=about
Add in complications like secure pages (https:), and you can end up multiplying all of these variants. While this is technically true of any page, the problem tends to be more common for the home page, since it’s usually the most linked-to page (both internally and from external sites) by a large margin.
In most cases, the technically correct home-page URL is:
- https://www.example.com/
…but there are exceptions (such as if you secure your entire site). I don’t see the trailing slash (“/”) causing a ton of problems on home pages these days, since most browsers and crawlers add it automatically, but I think it’s still a best practice to use it.
Another common exception is if your site automatically redirects to another version of the home-page – ASP is notorious for this, and often lands visitors and bots at “index.aspx” or a similar page. While that situation isn’t ideal, you don’t want to cross signals. If the redirect is necessary, then the target of that redirect (i.e. the “index.aspx” URL) should be your canonical URL.
Finally, be very careful about situation #5 (the last URL in the list above) – in that case, as I discussed in the first section of this post, the “index.php” code template is actually driving other pages with unique content. Canonicalizing that to the root or to “index.php” could collapse your site to one page in the Google index. That particular scenario is rare these days, but some CMS platforms still use it.
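A small Python sketch of that distinction, under the assumption that the home-page aliases for this hypothetical site are the ones listed in `HOME_PATHS`: a URL only counts as a home-page variant when its path is a known alias and it carries no query string.

```python
from urllib.parse import urlsplit

# Assumptions for this example: the aliases that really serve the home page,
# and the URL chosen as canonical for it.
HOME_PATHS = {"", "/", "/default.html", "/index.php"}
HOME_CANONICAL = "https://www.example.com/"


def home_canonical(url):
    """Return the home-page canonical for true home variants, else None."""
    parts = urlsplit(url)
    if parts.path in HOME_PATHS and not parts.query:
        return HOME_CANONICAL
    # Anything with parameters may be a different page driven by the
    # same template, so it is left alone here.
    return None


print(home_canonical("https://www.example.com/index.php"))
print(home_canonical("https://www.example.com/index.php?page=about"))
```

This sketch is deliberately conservative. A real site would also need to handle tracking parameters on the home page, but treating every "index.php" URL as the home page is the mistake to avoid.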
Case 4: Product Pages
In some ways, product pages are a lot like the blog-post pages in Case #2, except more so. You can naturally end up with a lot of variations on an e-commerce site, including:
- www.example.com/store.php?id=1234
- www.example.com/store/1234
- www.example.com/store/this-is-a-product
- www.example.com/store.php?id=1234&currency=us
- www.example.com/store/1234/red
- www.example.com/store/1234/large
If you have a URL like #3, then that’s going to be your canonical URL for the product in most cases (and certainly for the plain variants, #1-#3). If you don’t, then work up the list. In other words, if you have #3, use it; if not, use #2; if not #2, use #1. You have to work with the structure you have.
URLs #4-#6 are a bit trickier. Something like the currency selector in #4 can be very complicated and depends on how those selections are implemented (user selection vs. IP-based geo-location, for example). For Google’s purposes, you would typically want them to use the dominant price for the site’s audience and canonical to the main product URL (#1-#3, depending on the site architecture). Indexing every price variant, unless you have multiple domains, is just going to make your content look thinner.
With #5 and #6, the URL indicates a product variant, let’s say a T-shirt that comes in different colors and sizes. This situation depends a lot on the structure and scope of the content. Technically, your T-shirt in red/large is unique, and yet that page could look “thin” in Google’s eyes. If you have a variant or two for a handful of products, it’s no big deal. If every product has 50 possible combinations, then I think you need to seriously consider canonicalization.
Case 5: Search Pages
Now, the ugliest case of them all – internal search pages. This is a double-edged sword, since Google isn’t a fan of search-within-search (their results landing on your results) in general and these pages tend to spin out of control. Here are some examples:
- www.example.com/search.php?topic=1234
- www.example.com/search/this-is-a-topic
- www.example.com/topic
- www.example.com/search.php?topic=1234&page=2
- www.example.com/search.php?topic=1234&page=2&sort=desc
- www.example.com/search.php?topic=1234&page=2&filter=price
The list, unfortunately, could go on and on. While it’s natural to think that the canonical version should be #1-#3 (depending on your URL structure, just like in Case #4), the trouble is pagination. Pages 2 and beyond of your topic search may appear thin, in some cases, but they return unique results and aren’t technically duplicates. Google’s solutions have changed over time, and their advice can be frustrating, but they currently say to use the rel=prev/next tags. Put simply, these tags tell Google that the pages are part of a series.
In cases like #5-#6, Google recommends you use rel=prev/next for the pagination but then a canonical tag pointing to the “&page=2” version (to collapse the sorts and filters). Implementing this properly is very complicated and well beyond the scope of this post, but the main point is that you should not canonicalize all of your search pages to page 1. Adam Audette has an excellent post on pagination that demonstrates just how tricky this topic is.
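For a sense of what the combination looks like, here is a simplified Python sketch that builds the three tags for one search URL. It assumes that only "topic" and "page" change which results appear, that page 1 lives at the URL without a "page=" parameter, and that the total page count is known; `CONTENT_PARAMS`, `page_url`, and `head_links` are names invented for this example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: only these parameters change which results are shown.
# Sorts and filters are collapsed by the canonical tag.
CONTENT_PARAMS = ("topic", "page")


def page_url(url, page):
    """Rebuild the URL with only content parameters, pointing at `page`."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    if page > 1:
        params["page"] = str(page)
    else:
        params.pop("page", None)  # assumption: page 1 has no page parameter
    kept = [(name, params[name]) for name in CONTENT_PARAMS if name in params]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def link_tag(rel, href):
    # escape() turns "&" into "&amp;" so the attribute is valid HTML.
    return f'<link rel="{rel}" href="{escape(href)}">'


def head_links(url, last_page):
    """Return the canonical, prev, and next tags for one search URL."""
    current = int(dict(parse_qsl(urlsplit(url).query)).get("page", "1"))
    tags = [link_tag("canonical", page_url(url, current))]
    if current > 1:
        tags.append(link_tag("prev", page_url(url, current - 1)))
    if current < last_page:
        tags.append(link_tag("next", page_url(url, current + 1)))
    return tags


for tag in head_links(
    "https://www.example.com/search.php?topic=1234&page=2&sort=desc", 3
):
    print(tag)
```

Note that the canonical for the sorted page 2 is the unsorted page 2, not page 1, which is the main point of this section.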
Know Your Crawl Paths
Finally, an important reminder – the most important canonical signal is usually your internal links. If you use the canonical tag to point to one version of a URL, but then every internal link uses a different version, you’re sending a mixed signal and using the tag as a band-aid. The canonical URL should actually be canonical in practice – use it consistently. If you’re an outside SEO coming into a new site, make sure you understand the crawl paths first, before you go and add a bunch of tags. Don’t create a mess on top of a mess.
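One way to spot that kind of mixed signal is to compare crawl data against the canonical tags. The Python sketch below assumes you already have, from a crawler of your choice, a mapping of each URL to the canonical it declares plus a list of internal link targets; `mixed_signals` and the sample data are hypothetical.

```python
def mixed_signals(canonical_of, internal_links):
    """Return internal link targets that declare a different canonical URL.

    canonical_of: dict mapping a crawled URL to the canonical URL in its tag.
    internal_links: iterable of URLs that internal links point to.
    """
    return sorted(
        {
            target
            for target in internal_links
            # URLs with no recorded canonical are treated as self-canonical.
            if canonical_of.get(target, target) != target
        }
    )


# Hypothetical crawl data for illustration only.
canonical_of = {
    "https://www.example.com/contact.php": "https://www.example.com/contact",
    "https://www.example.com/about": "https://www.example.com/about",
}
internal_links = [
    "https://www.example.com/contact.php",
    "https://www.example.com/about",
    "https://www.example.com/contact.php",
]
print(mixed_signals(canonical_of, internal_links))
```

Every URL this returns is one where the internal links say one thing and the canonical tag says another, so either the links or the tag should change.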
Dr. Pete, I see these things a lot on ecommerce websites as well as forums. But I have one question here.
If we have page
Both pages have the same source code file, so how should I define the canonical tag for each one individually?
Pseudo code, but a simple code block looks something like this:
var page = request("page")
var canonical
canonical = "https://www.example.com?page=" & page
This isn't always possible. In fact, if possible, the canonical should be stored with your page in a database or some sort of CMS so it knows the canonical, but as Wonderkidxx said, typically devs somehow get it wrong!
Yeah - unfortunately, it's going to take custom code. In many cases, the CMS will be driving the page from a database, and you could use the same lookup/query to grab the name of the page, if it's not available in the referring string.
The same goes for custom blog names or product names like "/which-page-is-canonical" in this post - that string is pulled from the database of blog entries. Sometimes, that's relatively easy (just a couple of lines of code). Problem is, you have to know which couple of lines :)
As I said on Twitter, your posts are my favourite of anybody within the SEO sphere shall we say.
I'll stick by what I said.
The number of sites using Canonical Tags incorrectly is insane.
Let's share this and get sites performing correctly.
+1
Between canonical tags and XML sitemaps - it seems incredible to me that web developers are utterly clueless about this - if I hear another web dev just "run a free site scan" ... Google are great at keeping technical SEOs in a job (which is good!)
+1, all developers should at least take a crash course into technical SEO before they're allowed to touch a site where SEO is important.
Hi Dr.pete,
I read your post closely, and it was a really in-depth explanation of rel canonical. I understood that we cannot use rel canonical across domains, but some people dispute this. Can you give me feedback on this confusion?
Rel-canonical can be used cross-domain (it couldn't originally, but Google changed that later), but it's only appropriate for certain situations, like content syndication. Typically, these would be "true" duplicates.
Great post! Canonicalization is a big deal I can testify to that.
After a redesign of our ASP site last year we saw a 180%+ increase in organic search traffic after putting some ISAPI_rewrites/.htaccess in place. I found this post from Scott Hanselman extremely helpful:
https://www.hanselman.com/blog/ASPNETMVCAndTheNewIIS7RewriteModule.aspx
Hope that is helpful to anyone using ASP.
We're currently working with a site with a url issue we're unable to resolve. The url will return a page no matter the case used, for instance:
www.example.com/example-page
www.example.com/exAMpLE-PagE
These both return the same result, could this be causing a duplicate content issue? Would you recommend implementing a canonical tag pointing to the original (all lower case) url?
Unfortunately yes. And yes should be the answer to your question
(or check out if you can solve it via htaccess rule).
Hello,
Ah yes that is causing duplicate content for sure... if you are unsure check GWT URLs... the case issue is common to ASP.net platforms, you can set a rule for the server to force lowercase but also certainly make sure the canonical URL is all lowercase otherwise you make it that much harder for yourself.
Also check your internal links because you might find you have the same case issue...
David
It can definitely cause duplicates, although I'd say it depends a bit on whether you're actually seeing a problem. Having the mixed-case version resolve isn't ideal, but no one is going to link to ("/exAMpLe") under normal circumstances. Practically, a bigger problem comes when people link internally to mixed-case (like "/Example-Page"), but then inbound links use all lower-case. If your links are currently all lower-case and you aren't seeing mixed-case URLs in the index or in inbound links, it's probably not a big problem.
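If you do decide to standardize on lower case, the canonical URL can be derived mechanically. This is only a sketch, and it assumes the server treats paths case-insensitively (as in the example above); `lowercase_path` is a made-up helper name. The query string is left untouched because parameter values can be case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_path(url):
    """Lower-case the host and path of a URL, leaving the query as-is."""
    parts = urlsplit(url)
    return urlunsplit(
        parts._replace(netloc=parts.netloc.lower(), path=parts.path.lower())
    )


print(lowercase_path("https://www.example.com/exAMpLE-PagE"))
```

The same function can be run over your internal links to find any that are not already in their lower-case form.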
Thanks guys, we'll get it implemented and double check all internal urls.
Great post and good list. Something I will save for future reference.
About rel prev/next, I would add that Google is quite strict about the use of the canonical tag, and it says it should be self-referential. By the way, in cases of category pagination, I always try to see if a "view all" option is possible (especially in terms of Page Speed), and I usually use that URL as the canonical of its paginated series.
Then, related to Search pages. The safest solution should be to make those pages indexed. But, we all know how they can be overly useful in terms of long tail. So... up to you what to do. In any case, I agree with what you say in "Case 5".
Another issue that involves the canonical is faceted navigation. You talk about it in the product pages, but my real concern is when faceted navigation is related to categories. It's surely not a problem related just to canonical, but canonical may help a lot (by the way, I suggest you read this post by Barry Adams if you want to dig more into the faceted navigation issue).
So, to end this... Peter, I am happy you did not screw up any of your sites with strange canonical experiments this time :)
Gianluca,
Are you trying to say that people should be wary of the Canonical Tag or they shouldn't use it all?
Your message is giving mixed signals.
Wary, everybody would presume, and that's what I'd recommend, but what do you think?
No... actually I am saying that it should be used always. Sorry if I wasn't clear.
why not just have your search pages noindexed? then you avoid the canonical problem here as it is no longer required. do they really bring in much extra traffic? all the content appears on other pages.
I should say that, when I say "search", I mean everything from the results of an actual search box to topic and tag pages. So, for many people, search results (page 1, especially) can be critical pages for SEO. In most cases, I don't think pages 2+ have much SEO value.
Good post Dr. Pete. There is another one that I see causing issues as well and that is when using sub-domains that pull up the same content such as is often the case with www and non-www versions. I've even seen servers configured to map the same content for any files version no matter what the sub-domain may be. This issue can be complicated by the use of canonical tags and setting a preferred domain in Webmaster Tools that contradict them. I find it just best to establish a standard early on and stick to it. The good thing is that search engines tend to figure things out once you have developed a consistent linking pattern.
Definitely - thanks. Forgot to include the "www" vs non-www, which can definitely be a big one. It ends up being a bit of a debate, since some people strongly prefer one or the other. Personally, I don't think it matters that much, as long as you're consistent.
Just a moment ago I was discussing the same issue with Gianluca Fiorelli on Twitter, and yes, you are right that this doesn't make much difference. The one thing I found was an indexing problem, but it's still not a big problem from a ranking perspective...
"www" vs non-www: we can easily solve this problem through Google Webmaster Tools, and I don't think it plays a major role in keyword ranking or page rank.
Good write up, quick question for any wordpress users out there; do you know if you make a certain url structure canonical by how you enter your url in the wordpress settings?
I have been reading about canonical for about 4 hours to understand it after I got all freaked out that my website was not showing up as indexed. Turns out I was checking out the www.website.com version. When I checked website.com (non-www), it was showing up as indexed. Freaked me out because the www.website.com version is the one with about 1,000 more external links. I found canonical was set as website.com (non-www) and wondered if I should change that so that www.website.com would be indexed.
Anyway, after my day of reading, I think I understand this (and am hoping for affirmation):
1. It is not advisable to change canonical to www since it has been that way for a while.
2. The way to make sure all the SEO link juice is not wasted is to set a 301 redirect on www.website.com.
Thanks for all this great information!
Great post! I have a large site and you've clarified a few things.
What would you do in this scenario?
www.example.com/search.php?topic=1234&page=2&results-per-page=25
www.example.com/search.php?topic=1234&page=2&results-per-page=100
www.example.com/search.php?topic=1234&page=2&results-per-page=500
Canonicalize it to: www.example.com/search.php?topic=1234&page=2 ?
BIG QUESTION: I noticed on eBay they have an "All Items" tab, "Auctions Only" and "Buy It Now Only" tabs. They canonicalize everything and all pages to the first page of the "All Items" tab... What is your recommendation on this?
Thanks
You really can't canonicalize to Page 2, because "Page 2" is three different pages, depending on the results/page. Your best bet is to default users and Google to one version, and make the other versions uncrawlable. You could also just block the "results-per-page" parameter in Google Webmaster Tools. It's best just to keep Google away from those options altogether.
Say your website URL is xyz.com, but when doing off-page submissions you used www.xyz.com. Google then stores your website under two different names, xyz.com and www.xyz.com, with duplicate content. To avoid this type of error, you use the canonical tag.
If we don't use the canonical tag, will Google catch those pages as duplicate pages?
Thanks for the sharing informative information, it is good stuff.
Good to know about these, I have used several but again going to try more which you listed.
i really observed so many aspects about this important area on URL creation! thanks
Pete, if we noindex, nofollow internal search pages, do they also need a canonical tag?
No - you should almost never use both (it's a mixed signal), and Google frowns on canonical for internal search pages. You can NOINDEX or use Rel=Prev/Next - not to oversimplify, but those two options will cover about 90% of the cases these days.
Thanks Pete :)
Canonical Tags have always been a case of confusion for many (including my immediate boss LOL).. thanks for the post!
Hi Dr. Pete! You got a very in-depth article here. I was not very familiar with canonical linking, but thanks to this very informative post of yours, I have learned something new today. I will also share this article with my friends so they can learn more about canonical linking.
Since page 1 (/my_page?page=1) is the same content as the 'root' page (i.e. /my_page/), wouldn't it make sense to have /my_page?page=1 canonical to /my_page/ ?
Dr. Pete. I am working with a client (large ecommerce store) that has tons of duplicate content issues. The SEO before had duplicate pages all sending a canonical to a master page (a specific type of vehicle). With the current CMS a specific part can be accessed via several different navigational paths (vehicle selections) giving it multiple URLS. We are currently writing unique content to help indicate which type of vehicle the specific url is for. Once this is finished we will no longer canonical the master as it can be confusing to customers looking for their type of product. I am having a hard time deciding if I should have the pages canonical themselves or if I should leave them without a canonical. What are your thoughts?
For what you're trying to accomplish (I don't know enough to endorse or argue against the overall strategy), I would use self-referencing canonicals. Since the pages previously had a canonical to another page, this may help signal Google that there's been a change more strongly than simply removing the tag.
Thanks for the input. We will be rolling it out after testing in the next week or so. I will let you know in a few months how it works.
We can also fix this error with an .htaccess file alongside the index page.
The code for the .htaccess file is:
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ https://www.domain.com/$1 [R=301,NC,L]
Great, very in-depth article. I was very interested in canonical linking when I first started hearing more about it some time ago. We also put out an article on this in our news section at https://www.elevatesem.com/blog/canonical-linking-intro/. Funny thing - it has a link to Rand's whiteboard Friday Video on the same topic :)
In the “Case 5” I would like to suggest that we can use <meta name="robots" content="noindex"> for “/search.php?” so crawler will not index that particular page, but they can follow the links on that pages. For advance option please advise.
Looking forward to hearing from you. Thank you.
Dr. Pete what about Forums? Especially the pagination on threads and forum URLs within vBulletin?
Unfortunately, that can get hairy pretty fast. Google does seem to be pushing us toward rel=prev/next for large-scale pagination, but I've personally had mixed results. It seems to be working better the past few months. It depends a lot on the scope of the problem and if you have complicating factors, like sorts/filters.
Another insightful post by the good doctor - wonderful descriptions here. For me, it's surprising how rarely site owners notice duplicate content, even if it's staring them in the face. I'm not talking about lengthy parameters and somewhat buried pages, but even quirks with their CMS and navigation that make primary pages accessibly via several "clean" URLs. It's in these cases that I've seen canonical tags work the most magic and translate into excellent results.
Which leads me to a prequel to setting up canonical tags - clean up your navigation! Simply running a Xenu report or other link checker can reveal all sorts of surprises.
Great post! I always recommend having absolute URLs, fixes most of the basic issues.
Clearest post I've read on the subject of canonicalisation - and luckily for me, I'm already doing it all correctly! :-)
Hi There
Thanks Dr. Pete for this article and pointing out the Adam Audette Post on
https://searchengineland.com/five-step-strategy-for-solving-seo-pagination-problems-95494,
This is a good article on pagination, but why should one noindex pagination pages? I can understand it for a view-all page, as it might take a long time to load and would not be good for user experience, but for pagination URLs I don't think there is a need for noindex.
A couple of reasons:
(1) Paginated search can often appear thin (same Titles, META data), Google doesn't value it, and it can spin out of control. All of this creates the perception of thin content.
(2) Pages 2+ of any search tend to be very low-value to search users. They don't attract links and aren't great landing pages in most cases.
So, you've got sometimes thousands of pages that Google doesn't like and that have little or no SEO benefit. Especially since Panda, it's made more sense to control those pages.
Thanks Dr. Pete. It does make sense that Page 2+ pages hardly comes in SERP's and they don't attract links, and by making noindex, google will anyways crawl those pages and follow the product/item from pagination pages. Liked it :)
I'm new to the idea of canonical URLs, so please bear with me. How does this apply to blogs where you have a post that shows on its own page, on the actual blog page, and on a tag or category sort?
These could all show the same content
A:www.example.com/blog
B:www.example.com/blog/post-1
C:www.example.com/blog/?Tag=Tag-A
Is it best to always apply the canonical tag to the original post or do search engine spiders already understand this?
Any recommended reading on this topic?
They should all have their own canonicals, but they should not be showing the same content, especially if you have multiple posts on the home page or in your categories.
Most blogging software allows you to use a summary or display only the first portion of the blog post on the home page or category filters - look at seomoz.org/blog for instance. This will reduce duplicate content issues (and assuming your intros are compelling, encourage more views on your blog posts).
The only exception would be if you have a popular blog that consistently generates dozens of comments. Then it may be OK to show the full content on grouped pages because you will have enough unique content in the comments. In most cases, though, I would recommend only showing summaries/intros on home and category pages to keep each link unique.
A good article on something that most SEOs take for granted.
Was particularly interested in #5. In many cases, I have canonicalised back to P1 when faced with this situation. This is especially true when images are different but readable text content is very similar - why is Google going to bother showing page 35 in their results? I thought of it as being a helpful thing to do to tell Google that they needn't bother - just so long as they follow links from canonicalised pages, which they do.
Is this the wrong thing to do? Can you provide any examples of where this is a) the right solution and b) the wrong solution, and why?
Thanks,
Mark
Google's official position now is that you should not canonical to page 1 of results. You can canonical to a "View All", but only if that page is relatively fast loading and a decent user experience (defining that is tricky). In the past, I tended to lean toward META NOINDEX'ing pages 2+ of results. Google won't endorse that, but it still seems to work ok in some situations. Rel=prev/next isn't a bad solution, but it's tricky to implement, Bing doesn't fully support it, and results can be mixed on big sites.
Thanks for your reply, Dr Pete. A view all is a good solution, except in situations where there are hundreds of products, each with their own image/s, and thus leading to a poor loading speed. I totally agree that rel=next/previous is a no brainer.
I hadn't read that Google advise against canonicalising back to P1 - where did you get that info? Have done a few searches but can only find the odd forum reference to that point (not doubting, just want to read the whole thing).
So, it looks like I'll be implementing no-index rather than canonical in future. It seems like the only sensible option.
Thanks again :)
You can also use Google Webmaster Tools to set which parameters significantly change your content and which do not. :)
You are right... but then you should remember to do it also in Bing Webmaster Tools (it has a similar function).
Oh... and remember that by doing that you're probably screwing up any third-party crawling system (i.e.: the SEOmoz crawl or Spider Frog SEO Tool), which will still be seeing those errors.
Good point, very true. Might be best used as a complementary technique.
I usually use it as last chance solution...
Great post as usual - thanks.
One question from an issue I had on a customer website last week:
What happens in the case that the rel=canonical links to a non-existent page?
In the customer's case, the canonical linked to "https://127.0.0.1/index.php" because they just directly copied the test server to the live version.
Will this be ignored, or result in current page not being correctly indexed / ranked?
Google has said that they try to ignore canonical tags that lead to 404s or clearly seem to be in error, and I've seen them ignore a bad canonical, but it's never a good idea to leave it to their interpretation.
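A mistake like that one is cheap to catch before launch. The Python sketch below simply checks that a canonical URL points at the expected live host; `canonical_host_ok` and the default `live_host` value are assumptions for illustration, and a site that legitimately uses cross-domain canonicals would need a list of allowed hosts instead.

```python
from urllib.parse import urlsplit


def canonical_host_ok(canonical_url, live_host="www.example.com"):
    """Return True when the canonical URL points at the live host."""
    # hostname is lower-cased by urlsplit and excludes any port number.
    return urlsplit(canonical_url).hostname == live_host


print(canonical_host_ok("https://127.0.0.1/index.php"))
print(canonical_host_ok("https://www.example.com/index.php"))
```

It does not confirm that the target page exists (that takes an actual request), but it would have flagged the leftover test-server address.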
These Canonical problems appear especially in ecommerce sites, where you have lots of filters for sorting and modifying the content of a page.
The sad part is that many webmasters have no idea about these big problems with their sites.
It depends on experience and people will learn with time so as you :)
Great summary on those cryptic "Canonical" and "Duplicate" pages. Great work Dr. Pete on summarising all the possible cases.
I have 2 quick questions, if anyone can please help:
1. Will an anchor URL (like href="#top"), which points to an anchor within a page, be counted as a duplicate page by Google?
I have seen online shops having Product page URLs like below:
https://www.example.com/product-page.html
but having tabs within those product pages with URLs like:
https://www.example.com/product-page.html#AboutProduct
Will above be treated as 2 separate pages by Google?
2. I have also come across websites (for example, https://www.seomoz.org/) which use self-referencing canonical tags on a page, pointing to the same page. Is this a recommended practice promoted by Google? Under which circumstances should we USE and AVOID such self-referencing URLs?
Any thoughts on this please.
(1) Named anchors are almost never a problem. It gets more complicated nowadays, since the hash-style ("#") URLs can be used by AJAX, but traditionally Google ignores everything after it.
(2) I've never seen it be a problem, and it's usually just easier all around. Bing has said they don't love it, since it's a lot of extra tags to process, but I haven't seen anyone penalized by Google or Bing for self-referencing canonicals.
Great and good to know about these small but tricky stuffs while working on duplication problem across the website. Information on Bing was new for me. Hope it retains its non-harming stand on self-referencing canonicals.
Cheers and thanks Dr. Pete.
Thanks Dr. Pete for your great post! I have two questions and hope to have your answer:
1. You mentioned in Case 3 regarding URL #5" - Canonicalizing that to the root or to “index.php” could collapse your site to one page in the Google index."
I thought it only tells Google to count this one page's traffic toward the home page. Why did you say it could collapse the whole site to the home page? Did you mean if someone uses the tag for all pages like this?
Another question is about your reminder at the end of the post: " If you use the canonical tag to point to one version of a URL, but then every internal link uses a different version, you’re sending a mixed signal"
Could you kindly give an example so I can understand better?
Thanks very much.
(1) Canonical tags cause the non-canonical versions to effectively drop out of the index, at least for any practical purposes (including ranking). So, if you accidentally canonical to the wrong page, you can cause serious harm. I ran an experiment on this (an extreme case) a while back:
https://www.seomoz.org/blog/catastrophic-canonicalization
(2) In other words, if you link to "www.example.com/contact.php" internally, but then you set a canonical tag to "www.example.com/contact" (without the .php), you're basically telling Google two things. Your internal links are one of the most important (possibly the most important) signals for which URL is canonical.
Thanks for this explanation.
Nice Post..
Excellent article.
Just 1 query.
So If I am tracking my site's banner performance like
https://www.example.com/productpage1.html?utm_source=banner1
are these being tracked and recorded as different URL's by Google?
Google is pretty good about ignoring utm_source, given the tie-in to Adwords, but it doesn't hurt to ignore it in Google Webmaster Tools. If you're using your own tracking parameters you should definitely control those URLs somehow.
Step 1 is always to check for indexation. Just run a query like:
site:example.com inurl:utm_source
...and see if Google has indexed anything. If they haven't, it may not be a big deal.
Simply Amazing Post About Canonical!!! I am deeply thankful to you, Dr. Pete, for sharing such nice information in depth.
Hello, can anybody tell me if this bit of code is causing canonical issues for my homepage?
<div id="logo"><a href="index.html">Home</a>
Generally, unless the "index.html" version is your canonical version, I would link to either the full path or just "/" for the home-page. Otherwise, you're very likely to end up with a copy indexed.
Thank you sir for shared here such a informative and valuable post.
from when i became a member of SEOMOZ , I always read your post. I read this post in one complete stop and i like this post so much .
Really canonical issue is always a very difficult to understand but here you have explained a very easy.
thumb up to You !!!
This post is somehow more descriptive and informative regarding the canonical issue. but you initially missed the www vs. non-www that you indicated in a comment. Great post!! Nice utilization of my time.
Great post. A lot of webmasters are confused about the canonical issue, but the post above almost resolves the URL canonical and duplicate content problem. Thanks, Dr. Pete, Sir.