It was once commonplace for developers to code relative URLs into a site. There are a number of reasons why that might not be the best idea for SEO, and in today's Whiteboard Friday, Ruth Burr Reedy is here to tell you all about why.
Let's discuss some non-philosophical absolutes and relatives
Howdy, Moz fans. My name is Ruth Burr Reedy. You may recognize me from such projects as when I used to be the Head of SEO at Moz. I'm now the Senior SEO Manager at BigWing Interactive in Oklahoma City. Today we're going to talk about relative versus absolute URLs and why they are important.
At any given time, your website can have several different configurations that might be causing duplicate content issues. You could have just a standard https://www.example.com. That's a pretty standard format for a website.
But the main sources that we see of domain level duplicate content are when the non-www.example.com does not redirect to the www or vice-versa, and when the HTTPS versions of your URLs are not forced to resolve to HTTP versions or, again, vice-versa. What this can mean is if all of these scenarios are true, if all four of these URLs resolve without being forced to resolve to a canonical version, you can, in essence, have four versions of your website out on the Internet. This may or may not be a problem.
It's not ideal for a couple of reasons. Number one is duplicate content. Some people think that duplicate content is going to give you a penalty, but duplicate content is not going to get your website penalized in the same way that you might see a spammy link penalty from Penguin. There's no actual penalty involved. You won't be punished for having duplicate content.
The problem with duplicate content is that you're basically relying on Google to figure out what the real version of your website is. Google is seeing the URL from all four versions of your website. They're going to try to figure out which URL is the real URL and just rank that one. The problem with that is you're basically leaving that decision up to Google when it's something that you could take control of for yourself.
There are a couple of other reasons that we'll go into a little bit later for why duplicate content can be a problem. But in short, duplicate content is no good.
However, just having these URLs not resolve to each other may or may not be a huge problem. When it really becomes a serious issue is when that problem is combined with injudicious use of relative URLs in internal links. So let's talk a little bit about the difference between a relative URL and an absolute URL when it comes to internal linking.
With an absolute URL, you are putting the entire web address of the page that you are linking to in the link. You're putting your full domain, everything in the link, including /page. That's an absolute URL.
However, when coding a website, it's a fairly common web development practice to instead code internal links with what's called a relative URL. A relative URL is just /page. Basically what that does is it relies on your browser to understand, "Okay, this link is pointing to a page that's on the same domain that we're already on. I'm just going to assume that that is the case and go there."
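Just to make the difference concrete, here's a quick sketch of the same internal link written both ways (example.com and /page are only placeholder values):

<!-- Absolute URL: the protocol and hostname are spelled out in full -->
<a href="https://www.example.com/page">Read more</a>

<!-- Relative URL: the browser fills in the protocol and hostname of the page it's already on -->
<a href="/page">Read more</a>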
There are a couple of really good reasons to code relative URLs
1) It is much easier and faster to code.
When you are a web developer and you're building a site and there are thousands of pages, coding relative versus absolute URLs is a way to be more efficient. You'll see it happen a lot.
2) Staging environments
Another reason why you might see relative versus absolute URLs is some content management systems -- and SharePoint is a great example of this -- have a staging environment that's on its own domain. Instead of being example.com, it will be examplestaging.com. The entire website will basically be replicated on that staging domain. Having relative versus absolute URLs means that the same website can exist on staging and on production, or the live accessible version of your website, without having to go back in and recode all of those URLs. Again, it's more efficient for your web development team. Those are really perfectly valid reasons to do those things. So don't yell at your web dev team if they've coded relative URLs, because from their perspective it is a better solution.
Relative URLs will also cause your page to load slightly faster. However, in my experience, the SEO benefits of having absolute versus relative URLs in your website far outweigh the teeny-tiny bit longer that it will take the page to load. It's very negligible. If you have a really, really long page load time, there's going to be a whole boatload of things that you can change that will make a bigger difference than coding your URLs as relative versus absolute.
Page load time, in my opinion, is not a concern here. However, it is something that your web dev team may bring up when you try to explain to them that, from an SEO perspective, coding your website with relative instead of absolute URLs, especially in the nav, is not a good solution.
There are even better reasons to use absolute URLs
1) Scrapers
If you have all of your internal links as relative URLs, it would be very, very, very easy for a scraper to simply scrape your whole website and put it up on a new domain, and the whole website would just work. That sucks for you, and it's great for that scraper. But unless you are out there doing public services for scrapers, for some reason, that's probably not something that you want happening with your beautiful, hardworking, handcrafted website. That's one reason. There is a scraper risk.
2) Preventing duplicate content issues
But the other reason why it's very important to have absolute versus relative URLs is that it really mitigates the duplicate content risk that can be presented when you don't have all of these versions of your website resolving to one version. Google could potentially enter your site on any one of these four pages, which are all the same page to you but four different pages to Google. It's the same domain to you, but they are four different domains to Google.
But they could enter your site, and if all of your URLs are relative, they can then crawl and index your entire domain using whatever format they entered on. Whereas if you have absolute links coded, even if Google enters your site on the www version and that resolves, as soon as they crawl to another page the links on it are coded without the www, so all of that internal link juice and all of the other pages on your website point to one version, and Google is not going to assume that those live at the www version. That really cuts down on different versions of each page of your website. If you have relative URLs throughout, you basically have four different websites if you haven't fixed this problem.
Again, it's not always a huge issue. Duplicate content, it's not ideal. However, Google has gotten pretty good at figuring out what the real version of your website is.
You do want to think about internal linking, when you're thinking about this. If you have basically four different versions of any URL that anybody could just copy and paste when they want to link to you or when they want to share something that you've built, you're diluting your inbound links by four, which is not great. You basically would have to build four times as many links in order to get the same authority. So that's one reason.
3) Crawl Budget
The other reason why it's pretty important not to do this is because of crawl budget. I'm going to point it out like this instead.
When we talk about crawl budget, basically what that is, is every time Google crawls your website, there is a finite depth that they will crawl to. There's a finite number of URLs that they will crawl and then they decide, "Okay, I'm done." That's based on a few different things. Your site authority is one of them. Your actual PageRank, not toolbar PageRank, but how good Google actually thinks your website is, is a big part of that. But also how complex your site is, how often it's updated, things like that are also going to contribute to how often and how deep Google is going to crawl your site.
It's important to remember when we think about crawl budget that, for Google, crawl budget costs actual dollars. One of Google's biggest expenditures as a company is the money and the bandwidth it takes to crawl and index the Web. All of that energy that's going into crawling and indexing the Web lives on servers. That bandwidth comes from servers, and that means that using bandwidth costs Google actual real dollars.
So Google is incentivized to crawl as efficiently as possible, because when they crawl inefficiently, it costs them money. If your site is not efficient to crawl, Google is going to save itself some money by crawling it less frequently and crawling fewer pages per crawl. That can mean that if you have a site that's updated frequently, your site may not be updating in the index as frequently as you're updating it. It may also mean that Google, while it's crawling and indexing, may be crawling and indexing a version of your website that isn't the version that you really want it to crawl and index.
So having four different versions of your website, all of which are completely crawlable to the last page, because you've got relative URLs and you haven't fixed this duplicate content problem, means that Google has to spend four times as much money in order to really crawl and understand your website. Over time they're going to do that less and less frequently, especially if you don't have a really high authority website. If you're a small website, if you're just starting out, if you've only got a medium number of inbound links, over time you're going to see your crawl rate and frequency impacted, and that's bad. We don't want that. We want Google to come back all the time, see all our pages. They're beautiful. Put them up in the index. Rank them well. That's what we want. So that's what we should do.
There are a couple of ways to fix your relative versus absolute URLs problem
1) Fix what is happening on the server side of your website
You have to make sure that you are forcing all of these different versions of your domain to resolve to one version of your domain. For me, I'm pretty agnostic as to which version you pick. You should probably already have a pretty good idea of which version of your website is the real version, whether that's www, non-www, HTTPS, or HTTP. From my view, what's most important is that all four of these versions resolve to one version.
From an SEO standpoint, there is evidence to suggest, and Google has certainly said, that HTTPS is a little bit better than HTTP. From a URL length perspective, I like to not have the www. in there because it doesn't really do anything. It just makes your URLs four characters longer. If you don't know which one to pick, I would pick this one: HTTPS, no W's. But whichever one you pick, what's really most important is that all of them resolve to one version. You can do that on the server side, and that's usually pretty easy for your dev team to fix once you tell them that it needs to happen.
2) Fix your internal links
Great. So you fixed it on your server side. Now you need to fix your internal links, and you need to recode them from being relative to being absolute. This is something that your dev team is not going to want to do because it is time consuming and, from a web dev perspective, not that important. However, you should use resources like this Whiteboard Friday to explain to them, from an SEO perspective, both from the scraper risk and from a duplicate content standpoint, having those absolute URLs is a high priority and something that should get done.
You'll need to fix those, especially in your navigational elements. But once you've got your nav fixed, also pull out your database or run a Screaming Frog crawl or however you want to discover internal links that aren't part of your nav, and make sure you're updating those to be absolute as well.
Then you'll do some education with everybody who touches your website saying, "Hey, when you link internally, make sure you're using the absolute URL and make sure it's in our preferred format," because that's really going to give you the most bang for your buck per internal link. So do some education. Fix your internal links.
Sometimes your dev team is going to say, "No, we can't do that. We're not going to recode the whole nav. It's not a good use of our time," and sometimes they are right. The dev team has more important things to do. That's okay.
3) Canonicalize it!
If you can't get your internal links fixed or if they're not going to get fixed anytime in the near future, a stopgap or a Band-Aid that you can put on this problem is to canonicalize all of your pages. As you're changing your server to force all of these different versions of your domain to resolve to one, at the same time you should be implementing the canonical tag on all of the pages of your website to self-canonicalize. On every page, you have a canonical tag saying, "This page right here that they were already on is the canonical version of this page." Or if there's another page that's the canonical version, then obviously you point to that instead.
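As a rough sketch (the URL here is only a placeholder), a self-referencing canonical tag in the <head> of https://www.example.com/page would look something like this:

<link rel="canonical" href="https://www.example.com/page" />

Note that the href in the canonical tag should itself be an absolute URL pointing at your preferred protocol and hostname.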
But having each page self-canonicalize will mitigate both the risk of duplicate content internally and some of the risk posed by scrapers, because when they scrape, if they are scraping your website and slapping it up somewhere else, those canonical tags will often stay in place, and that lets Google know this is not the real version of the website.
In conclusion, relative links, not as good. Absolute links, those are the way to go. Make sure that you're fixing these very common domain level duplicate content problems. If your dev team tries to tell you that they don't want to do this, just tell them I sent you. Thanks guys.
I am sorry Ruth, but I have to strongly disagree.
Do I assume correctly that you are not a web developer? There are a lot of easy solutions for the problems you mention and ALL of them are better than absolute URLs (which will be a pain in the a.. in the long run).
First: There is the obvious <base href="https://www.example.com/" />. This little thing in the HTML head alone makes most of your WBF obsolete.
Second: As others mentioned above: redirects in the .htaccess are a good fallback to this base tag. And they give you much power over what your server serves to the client in any circumstance.
Third: Absolute URLs are error-prone. If someone uses the non-canonical absolute URL, the only way to really fix it is to find and change the link. When using relative URLs, the server-wide fixes described above apply. And when people start using absolute URLs you will find all kinds of creative things they will start putting in there.
Additionally, the likelihood of a typo resulting in a 404 is much higher. https://www.exlample.com/test.html will generate an error. But as exlample.com is NOT even your server, it will NOT show your beautiful 404 page to lead the customer further into your site and help him find stuff. Instead it will show an empty browser error page.
Fourth: Changes in the base URL will be a pain with absolute URLs. Nowadays everybody changes to HTTPS, and with relative URLs that is no problem at all. With absolute URLs you have to change all your templates, everything in the database of your CMS and all the hard-coded HTML files floating around on your web server. And I guarantee you: You will NOT find all those little buggers. (How about all the targets of every <form> on your website, for example.)
Fifth: Absolute URLs will stop scrapers? Really? A simple find and replace will be too hard for them? I - do - not - think - so.
As a web developer and SEO with 10 years of experience I urge you and everyone else NEVER to use absolute URLs for internal links in ANY case.
Quite the opposite: I train all people in our company to use relative URLs, and I often have to repeat that lesson whenever there is a new problem caused by absolute URLs.
I agree with the above. This WBF has good tidbits throughout, but, like Torben, I also disagree with the final recommendation. Yes, all domains should resolve to one domain. Sure, self-canonicalization.
Another counter-argument to the whole scraper argument: If they are going to scrape my content with all the relative links included, their version of my content is going to be riddled with 404s. As far as links coming in to my site from my own scraped content hosted on their site goes, I really don't want inbound links from scraper sites; period.
Too many arguments against human error and better resource use.
Yes, the problem analysis Ruth did is absolutely spot on. Canonicalization is a problem and should be dealt with.
Just the solution she proposed is short-sighted and doesn't really fit the problem.
Good point from you about the scraper sites. I would not want spammy links from scraper sites either. In fact, whenever I find scraped content I contact the webmaster and ask for removal, and file a DMCA complaint if he doesn't comply. A lot of scraped links pointing to your site sounds like a Penguin to me. ;-)
Absolute links won't help with people scraping your site. Many, if not all, scraping tools will automatically correct the links as they scrape the site. Having an absolute link doesn't help.
I strongly agree on this. The mentioned issues are easily handled using proper web coding best practices and should not be an issue. I am unsure what Ruth meant with this WBF. I can't see any benefits of using absolute URLs at all.
I registered just to vote your comment up and agree with you.
You're simply totally right, this WBF teaches bad practices and it should be mentioned that there are far better and more up-to-date solutions to this.
Excellent WBF Ruth, great to see you on here :)
I'd also add that when 301 redirecting, the source URL should be relative and the destination URL should be absolute (for those unaware).
For viewers: more details (and best practices) can be found on an answer to a YouMoz question on this subject and a full run-down on redirects on Moz's own guide.
Ruth importantly mentioned crawl budget - AJ Kohn wrote a wonderful post about crawl optimisation (well worth a read). Also see Google's duplicate content guide which lists a ton of resources & another Google post exploring 5 common mistakes with rel=canonical (see #2 on absolute URLs).
Thanks for these great resources Tony. Also, Ruth, you made a great point with the absolute and relative thing here and the ways to fix it. I was not aware of crawl budget. Thanks again!
You're welcome Amit.
We have to remember that WBF's are watched by both those in our industry and not - I tell my clients to watch every episode when they can (and many of them are not technical), so I included those resources in the hope that they'll help others get more of an understanding, as URLs are one of the most misunderstood elements of IA, design and SEO.
If Google can't crawl or index a website due to its URL structure, no amount of SEO, UX or content will help that website be found ;)
Great point on the 301 redirect component and using absolute urls, Tony. I'll definitely be making the devs here aware of this (they've recently been pushing for the marketing team to use relative urls for the new destination).
Thank you!
My pleasure AJVenture, glad to help :)
Thanks Tony Dimmock, not only have I learned a lot from Ruth but from you as well. It's fascinating how you guys explain complicated things to us (not web developers) and we can understand it easily.
If all redirects of non-www, www, https and http are done correctly to one version of the URL on the server side, there would be no duplicate content issues for the four versions of the URL and therefore no unnecessary crawling of duplicate content right? If this is the case, the only reason left to use absolute URLs instead of relative URLs would be to make life harder for the scrapers? Or am I missing something?
Just what I was thinking. Why make development so much harder and prone to errors when the redirects would solve the problem anyhow.
I think it's always a good idea to, whenever possible: 1.) Present Google with exactly the URL you want them to crawl/index, and 2.) eliminate or drastically reduce any and all times a user or search engine has to pass through a redirect.
Yes, fixing the server-side duplicate content issues goes most of the way toward fixing this issue.
Dude... relative or absolute, I still have to deal with crap like this from scrapers/syndicators. This is my list, for one website, of URLs that have scraped that content. While it isn't the case for this one particular site yet... one of the other properties is already being outranked by its own content on another domain. I honestly couldn't even paste the entire list because it's too many characters for the comment box, but when you get ANYWHERE close to this level of saturation duplicate content wise, you see serious declines in rank & impressions...
Pueblo Bonito Los Cabos English Site:
https://www.travimp.com/hotel.php?msg=sjdpbb
https://www.myaev.com/destinations/mexico/sjd/sjd_pbb.html&msg=sjdpbb
https://www.homeaway.com/vacation-rental/p3561605
https://www.homeaway.com/vacation-rental/p3561436
https://quest.travimp.com/phase1/erlext.cgi?msg=sjdpbb
https://www.bisbees.com/resources/private-condo-listings/
https://www.buyatimeshare.com/resorts/Pueblo-Bonito-Los-Cabos.asp
https://www.purecabo.com/about-cabo-san-lucas/cabo-hotels/
https://www.mywvrentals.com/property/pueblo-bonito-cabo-san-lucas/
https://www.mywvrentals.com/property/pueblo-bonito-cabo-san-lucas-2/
https://www.avantidestinations.com/EVWeb/prodinfo1.jsp?theme=AvdMex&sourceid=AvdMex&utype=agent&stype=HTL&supid=SJDPBL&servid=JRSEBB&path=Products/HTL/SJD&title=Pueblo%20Bonito%20Los%20Cabos%20All%20Suites-Junior%20Suite%20-%20Special
https://www.avantidestinations.com/EVWeb/prodinfo1.jsp?theme=AvdMex&sourceid=AvdMex&utype=agent&stype=HTL&supid=SJDPBL&servid=JRA&path=Products/HTL/SJD&title=Pueblo%20Bonito%20Los%20Cabos
https://www.avantidestinations.com/EVWeb/Process.jsp;jsessionid=9BCEB6B893D630DBCBD08544B9327F42?sourceid=AvdMex&theme=AvdMex&utype=agent&isagency=Y&agenthidden=N&brand=AVD/PDXTWeb&jjjj=Process.jsp&page=Specials&title=Specials
https://vacatia.com/timeshares-resales/cabo-san-lucas-mexico/pueblo-bonito-los-cabos-resort-33328
https://www.dargal.com/resorts/mexico/los-cabos/pueblo-bonito-los-cabos?&destination=los&meal_plan=AI
https://www.perx.com/resorts/mex-pacific/los-cabos/pueblo-bonito-los-cabos
https://itx-loscabos.com/Baja-California/83/38/Pueblo-Bonito-Los-Cabos/2037
https://itx-loscabos.com/Baja-California/83/43/Pueblo-Bonito-Los-Cabos/1821
https://itx-loscabos.com/Baja-California/83/43/Pueblo-Bonito-Los-Cabos/201
https://itx-loscabos.com/Baja-California/83/10/Pueblo-Bonito-Los-Cabos/201
https://itx-loscabos.com/Baja-California/83/51/Pueblo-Bonito-Los-Cabos/201
https://itx-loscabos.com/Baja-California/83/157/Pueblo-Bonito-Los-Cabos/201
https://itx-loscabos.com/Baja-California/83/137/Pueblo-Bonito-Los-Cabos/2037
https://itx-loscabos.com/Baja-California//94/
https://www.itx-loscabos.com/Baja-California/for-sale
https://itx-loscabos.com/Baja-California//157/
https://itx-loscabos.com/Baja-California/83/121/Pueblo-Bonito-Los-Cabos/1821
https://itx-loscabos.com/Baja-California/83/14/Pueblo-Bonito-Los-Cabos/1821
https://itx-loscabos.com/Baja-California/83/10/Pueblo-Bonito-Los-Cabos/1821
https://itx-loscabos.com/Baja-California/83/41/Pueblo-Bonito-Los-Cabos/1821
https://itx-loscabos.com/Baja-California/83/146/Pueblo-Bonito-Los-Cabos/1821
https://itx-loscabos.com/Baja-California/83/43/Pueblo-Bonito-Los-Cabos/2037
https://itx-loscabos.com/Baja-California/83/65/Pueblo-Bonito-Los-Cabos/2037
https://itx-loscabos.com/Baja-California/83/146/Pueblo-Bonito-Los-Cabos/2037
https://itx-loscabos.com/Baja-California/83/141/Pueblo-Bonito-Los-Cabos/2037
https://itx-loscabos.com/Baja-California/83/25/Pueblo-Bonito-Los-Cabos/2037
https://itx-loscabos.com/Baja-California/83/51/Pueblo-Bonito-Los-Cabos/2037
https://itx-loscabos.com/Baja-California/83/132/Pueblo-Bonito-Los-Cabos/1990
https://itx-loscabos.com/Baja-California/83/41/Pueblo-Bonito-Los-Cabos/1990
https://itx-loscabos.com/Baja-California/83/10/Pueblo-Bonito-Los-Cabos/1990
https://itx-loscabos.com/Baja-California/83/139/Pueblo-Bonito-Los-Cabos/1990
https://itx-loscabos.com/Baja-California/83/146/Pueblo-Bonito-Los-Cabos/1990
https://www.getaroom.com/hotels/pueblo-bonito-los-cabos?rinfo=[[18]]
https://itx-loscabos.com/Baja-California/83/121/Pueblo-Bonito-Los-Cabos/201
Totally agree. The additional dev and admin effort required to use absolute URLs does not, in my view, become a better alternative to the following...
1. 301 redirect all versions to your preferred version. That covers off the first SEO benefit and takes 2 minutes to implement.
2. Use a canonical tag on your preferred version. That covers the second SEO benefit and takes 5 minutes to implement. If scrapers are an issue then the canonical tag will still be present telling search engines who the original provider was. Scrapers can easily add one line of code to switch out absolute URLs.
I have to be missing something here as I can't see why this is being pushed out as a must do.
Hello Ruth, great to see you on WBF.
This is the type of video that Moz should make more of. Good, basic, evergreen SEO advice - with a bonus of some "not often thought about" information such as mitigating scraper risk with rel=canonical and the fine points of crawl budget.
Anybody who didn't watch this because they thought it was a noob topic missed out.
Thanks for the video.
I agree with you. I thought this was a very simple article - we all know that we must use absolute rather than relative URLs - but the perspective Ruth presented in this article was great.
In addition, the bonus on canonicals was great; I'm adding it to my favorites.
Great point, EGOL! I'm sure many passed up watching Ruth's WBF, unfortunately. I'm glad I watched. I didn't know about the scraper threats and that would be a pain in the @$$ to deal with for any business. Calls for all new content or a new site. It happens more than most think about.
I'd like to toss in my $0.02 (for overseas readers, it's a U.S. expression meaning "opinion") on this topic.
First, really, the frame around this discussion, in my view, is about a website's internal links, and how they're coded.
So setting aside all else and just focusing on how internal links are coded, here's what I'd like to add based on my experience as an Agency-level SEO over the last ~6 years working with small, medium, large and extremely-large-sized websites (I'm of course referring to the number of pages the site has when I use the word "size").
My notes below are primarily from issues found when working with large to extremely large websites:
1. If you have a large number of internal links passing through redirects because you decided the easiest way to handle your messy internal linking problem is to use .htaccess (or whatever server config. method for whatever platform you're on) then that's nice and easy from a dev. perspective, but having internal links passing through redirects is simply sub-optimal from a server performance perspective and from the perspective of how Link Authority is passed along by that internal link. Yes, there are (still) debates about how much (if any) Link Authority (PageRank, "juice" - whatever you want to call it) is passed via 301 or 302, but linking directly to the intended final URL via internal hyperlink(s) is obviously optimal compared to passing through redirects of any kind. But wait - don't order - I've many, many times seen internal links pass through multiple-hop redirects (302 and 301) where the internal hyperlink redirects to another URL and another and another - and redirect chains like that are then even more sub-optimal than a single-hop internal redirect. Typically those chains are the "normal" result of a website re-dev/re-design or simply through reorganizing the existing site folder structure.
2. Something Ruth didn't mention, but I'm sure meant to :) - another huge problem I've seen (for some reason, it's never the small or medium sites, it's always the large to very-very large sites in my experience) is relative path internal hyperlinks that were coded without the forward slash. So instead of the relative path internal hyperlink being /whatever-page, it's coded as whatever-page, and I've seen that cause spider traps (where endless, practically infinite versions of the page/URL are "created"/resolve) - talk about killing crawl budget! Interestingly, it would appear that whether a relative path internal hyperlink missing the slash causes a spider trap is platform-dependent in some way (server/host) because I've seen sites with relative hyperlinks missing the slash not cause a spider trap, but I've seen a fair amount of spider traps in my day created due to the missing slash.
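To illustrate that with a hypothetical snippet (the page and folder names are made up): on a page like https://www.example.com/products/widgets/, the two forms resolve very differently.

<!-- Root-relative: the leading slash anchors the link to the domain root -->
<a href="/whatever-page">Whatever page</a> <!-- resolves to https://www.example.com/whatever-page -->

<!-- Document-relative: no leading slash, so it resolves against the current folder -->
<a href="whatever-page">Whatever page</a> <!-- resolves to https://www.example.com/products/widgets/whatever-page -->

Whether that mistaken deeper URL then resolves (and spawns still deeper ones) appears to depend on the platform, as noted above - that's the spider trap.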
And with that, I will now go back to working my day job ;) - I hope this is helpful!
And 3. Redirect chains are overkill for mobile users. It's just a waste of time for them.
Excellent point, Peter, thank you for adding that!
I agree, David - especially with your point about not passing internal links through redirects whenever possible. They're the only links you have full control over so why wouldn't you want to make them as clean and direct as possible?
I'm an overseas reader from India, but understand $0.02 :P ;-)
Great inputs though, sir, befitting a great article.
Just 3-4 lines of code (like below) in .htaccess can fix all of these things.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
Pretty basic things! To be honest, this WBF is not to the standard level set by Moz for WBF. (Maybe I am too odd but this is what I think)
We felt the exact same way :/
Agreed. I feel like this post could have been much more concise: redirect all traffic for a single site to a single url, preferably https. At that point, duplicate content is no longer an issue. Also, I've never shared the whole "scraper" concern, but if someone wants to scrape your site badly enough, they'd just need to regex out the first piece of the absolute urls anyway.
Also, the topic of test/staging versions was breezed over, but is a huge issue for teams who need site internal links to work in multiple varying environments where relative URLs shine.
But maybe I'm missing something?
It's not quite that simple if you also want to ensure the https protocol is redirected properly and in one step. e.g. your example does not deal with the https www version
Here's a generic example to force to the https non www version as preferred by this article:
# Redirect HTTP with www to HTTPS without www
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule .* https://%1%{REQUEST_URI} [R=301,L]
# Redirect HTTP without www to HTTPS without www
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
# Redirect HTTPS with www to HTTPS without www
RewriteCond %{HTTPS} on
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule .* https://%1%{REQUEST_URI} [R=301,L]
It's more useful compared to Brijesh's example. Thanks TiggerIto.
I like that Ruth went more technical and went more in-depth than most of my WB Fridays (which have tended to the more conceptual/high level of late). I also think it's hugely valuable for those doing SEO to have all the insights about WHY relative vs. absolute URLs matter, especially the crawl budget and scraper issues, which are so often ignored. Many practitioners in our field need that ammunition to be able to convince their teams/bosses/clients of what needs doing (which, granted, is frustrating, but it's just how it goes much of the time).
BTW - if you ever do have suggestions for topics you'd like to see on WB Friday, feel free to tweet at me (@randfish)!
Great article Ruth, very thx!
Thanks everyone for your comments on this post - I love the lively discussions this community fosters!
When making this video, I was attempting to make a pretty complex technical issue with a lot of pros and cons into a more bite-sized, digestible video, with the result that I omitted some important considerations and spent more time than necessary on others. I do feel that I over-emphasized scraper risk, which a.) will be a risk no matter what and b.) often doesn't present too much of an issue for most sites.
I'm not a developer, and I appreciate those of you with web dev experience weighing in on the post. This is an issue that is often cause for debate between SEOs and web developers, and one thing that I neglected to say in the video (that I always say to my clients in real life) is that every website is different, there are many very good reasons to NOT do it the way I said to, and that you should have a careful and considered conversation with your developer over whether the cost to implement and maintain this structure is going to be worth the added SEO benefit. I have absolutely seen serious SEO benefits to eliminating duplicate content and pointing search engines and browsers to exactly the URLs that you want them to crawl and index - but I'm also not the one who has to make these changes or deal with the other development headaches they can cause, and I should have approached that topic with more empathy.
It's awesome to see the amount of technical prowess in this community - it's really grown up a lot even in the last couple of years. I've learned a lot from this thread about ways to make these change requests easier on my and my clients' dev teams, and that will help me continue my lifetime of learning more about SEO. I hope everyone in the Moz community knows that with this blog, the original poster often learns as much from the comment thread as readers do.
Thanks for the article Ruth, this is something I hadn't really considered as an approach when dealing with content served from multiple URL's.
I can already hear the developers screaming out in frustration.
If you use a generic file (web config for IIS and .htaccess for php for example) to specify your domain name you can automate much of the process of making URL's absolute, hopefully that should make it relatively (absolutely) painless when it comes to maintaining the site.
Agreed, just noticed your nice concise comment after posting my own (less concise!) comment along the same lines. However I am eager to know from anyone if there is a reason why changing URLs is better than using an overall canonical redirect, there might be something I've missed here!
What do you mean when you say using an overall canonical redirect?
I am assuming you mean using the canonical tag to point to the correct URL's, but the canonical tag is not to be used to correct site structure. It's actually just a hint to Google, and can still result in duplicate content if Google thinks another URL is more valuable to the user.
Recently had an issue where Google was flat out ignoring the canonical tags, indexing two versions of the content!
I mean a few lines of code in the htaccess that ensures that all pages redirect to the preferred domain format. I've checked in Google and found a good number of sources calling it this, but I may have picked up the wrong term here and caused some confusion, sorry if that is the case!
And re the canonical tags - Google would still index both pages as I understand it, but the canonical tags ensure that Google knows you don't mean to present duplicate content purely for nefarious reasons. I'd not necessarily expect Google to stop indexing the non-canonical version of the page. However this is rather based on less than scientific observation, so could be wrong.
So yes, if you want the pages not to be listed, you'd want to use a 'canonical redirect' as I am calling it, in the htaccess. Canonical tags allow the page to still be indexed, but tell Google it's not the origin page for the content.
Ah I see, yes that makes sense, I had just jumped straight to the canonical tag. With regards to Google indexing, generally Google will only try to index the most relevant pages; using a canonical tag indicates to Google that the most relevant page is actually located elsewhere, and this is where Google will decide which URL (or both) it wants to index.
Redirecting to the preferred domain is a good idea anyway, but I don't see how that solves the absolute/relative links issue. You will still have relative links if they are written that way and be open to things such as content scraping.
Ah good stuff, yes agree with your first paragraph. :)
I went into this in my lower comment more, but I truly think content scraping isn't going to be inconvenienced much by just having to use a find/replace to change the domain across all the files. It will take an extra two minutes, and won't stop anyone.
The other thing to keep in mind when using the "canonical redirect" you've mentioned above is that if you don't also update your internal links, you're forcing browsers and search engines to take the extra step of going through the redirect. It's just not as clean and can dilute the power of your internal links somewhat.
But even a simple thing like rel=canonical can go wrong. Glen Gabe wrote an article about that there.
My favorite is the 6th one - canonical to 404.
Nice WBF; really good topic. I actually never thought about the implications of scraper bots stealing the site with relative URLs (that would have been a huge fail on my part). I have the same views on duplicate content as you, but the relative vs absolute URL section gave me something new to consider. Thanks for the post!
1. 'Scraper Risk' - A simple script can change the domain name, this isn't a real risk.
2. Problems with https/www/etc - These should already be redirected. But the browser (including spiders) will default to what it is already using, so this isn't an issue in itself. Only a side effect of another.
3. The crawl budget issue is a huge and absolutely wrong stretch between website versions and relative URLs.
If you're doing the coding (as in your CMS isn't doing this for you), go relative, it'll save headaches when you need to move things around later. This isn't a real problem. The video is basically a naive correlation between two things. Kind of like a bad conspiracy theory that took one point (not piece of evidence) and blindly applied it to everything.
Really bad advice.
It's true that the risks posed by scrapers are both fairly small and easily resolved by scrapers who actually know what they're doing. However, a small risk is not the same as no risk.
"Should" is very different from "is" - I still see https/http/www versions of websites not resolving to each other all the time. Just because it seems very basic to people who have been doing this for a while doesn't mean it's something that the entire rest of the web is thinking about at all. You say "the browser (including spiders) will default to what it is already using," but that's my point - from an SEO perspective we don't want that. We want one version of our website to be crawled and indexed. It's something that site owners should take control of for themselves rather than allowing Google to make the call.
It's hard for me to address your third point since you don't really elaborate, but in my ~10 years of SEO experience I've seen sites with large canonicalization and duplicate content problems get crawled less deeply and less frequently over time, and been able to increase the depth and frequency (not to mention the accuracy) of the crawl by addressing these duplicate content problems.
Lastly, the inbound link dilution of having four different versions of your website resolve can be a huge headache to fix. As I said in the video, often your dev team will not want to re-code your internal links from relative to absolute, and there are valid reasons for this, but from a site cleanliness and crawl cleanliness perspective it's worth fixing.
In your ~10 years of experience you haven't learned a thing about web development because you've no idea what I'm talking about. The problems you're talking about (as I JUST said) have nothing to do with relative URLs. Period. There's no excuse, logic, or "I'm experienced!" lies that can get around that.
As an obviously young SEO practitioner who does not have real-world experience with companies of all sizes, you stated that the presenter had no idea of web development, which could have been said in a more pleasant way and not like a 12-year-old bully.
The developers that I work with daily prefer to have relative URLs so when pages change or we have to change platforms for the client, custom builds, etc., it makes the URL structure easy to deal with. In house at my company that is the reasoning behind it, but Ruth just made the point that if you are placing absolute URLs you are providing exactly what you want the search engine to see. Just today I had to do a bunch of redirects because my WMT account was alerting me of 404s and it was all based around how old URLs were handled from a content change.
My end point is I do agree with you in some instances, but every site is different and every client is different, so to say that someone is stating a conspiracy-like statement or that it is "Really Bad Advice" is just being naive and not approaching SEO, web development, and this community with an empty cup, taking information from peers and using or not using it.
I will be sure to look out for your whiteboard Friday
Ethan - what are you talking about here? Please use examples, logic, or at least explain if you're going to disagree so vehemently. To your points:
1) Change the domain name? So your logic is that since a more sophisticated scraper can be made even if absolute URLs are used, we shouldn't get the link equity and the canonicalization benefit that comes from having the lazier scrapers point to us? IMO that's foolish and a waste.
By using absolute URLs, you can make those lazy scrapers all link to you, which then means that Google & Bing are more likely to consider you the original source when they find the more sophisticated scrapers that don't. If there's 50 copies of a site out there, and Google sees that 20 of them all link to 1, and the other 30 are non-linked copies, that 1 is still more likely to be seen as the original and rank.
2) "Should" and "do" are two separate things. It sounds like you're conceding that this is a valuable point and reason, but then trying to argue that because you should do it anyway (and engines should recognize it anyway), it's not a valid point? That's odd to me. Many folks may not be aware of all of these, and any piece on this subject that doesn't bring them up would be remiss, IMO. If your complaint is that Ruth's being too comprehensive, OK, but that strikes me as a good thing - better safe than sorry.
3) Crawl budget is not a stretch at all... I've seen hundreds of examples of large sites where conserving budget by canonicalizing URLs properly (as Ruth shows) meant a lot more URLs indexed and ranking, and faster indexation of new URLs, too. This is SEO - so we care about optimization!
p.s. Ruth has REMARKABLE knowledge of web development. She's the one who ran Moz's SEO for years, and she's worked on more technical SEO projects than I (or most of us for that matter) ever have. Please keep personal attacks about the author out of the comments and let's stick to the content.
Rand, I appreciate your distaste for personal attacks, and calling people to defend their reasoning. Kudos.
How would you counter-argue this from your 1st point: If scraper sites are going to steal my content with all the relative links included, their version of my content is going to be riddled with 404s. As far as links coming in to my site from my own scraped content hosted on their site goes, I really don't want inbound links from scraper sites; period.
It's not sophisticated. It's an extra 2 lines of code. This is NOT a form of security in any way. Don't waste time with absolute URLs, rejecting the huge benefits of relative for such a minor thing that is ignorant of how it works. As I'll say one more time, the crawl budget point ASSUMES so much that is absolutely wrong. Set your website version default and relative will ALWAYS point to it.
AGAIN, absolute URLs fix NOTHING. They are a band-aid that hide bad practice. Use relative and you won't break your own site. Got the issues laid above? There are BETTER, EASIER solutions that won't HARM your site.
This whole video is a completely naive extra cost onto developers in favor of some bad correlations that make it LOOK like the job is easier for the SEO. It's wrong. Period.
Ethan,
I have a client who had relative links and a ton of navigation links - over 40 on every single page of their website. After doing a lot of tedious work, including changing their link structure to absolute URLs instead of relative URLs and then removing the overuse of navigation links on every single page, their traffic doubled in less than 12 months. The site went from 50,000 visits a month to over 100,000 visits a month.
It's like you're saying that having a subdomain and even a separate hostname would not affect the way Google crawls your site?
This is Moz - respect for others' ideas goes a lot further when you are not hurling insults. Just because we're not in the same room does not mean we should treat each other poorly.
nburch,
All of Rand's points are very logical. Take #1: Google and Bing would obviously use the incredibly powerful clue of your URL pointing back to where your original content was created.
It is the equivalent of saying how would you like 20 out of 50 people saying you were holding a gun at the scene of a shooting?
I also applaud you not using personal attacks and not condoning them.
In my ~20 years of SEO experience I have seen the same thing. The inbound link dilution of having four different versions of your website resolve can be a huge headache to fix.
I don't know about the headache part but when you fix it you look like a hero.
I also like BrijB htaccess solution
We can all agree duplicate content = BAD
I am Dutch so excuse my bluntness (and crappy English). I am not trying to be rude, but I am simply straightforward in my reply. Of course anyone who wants to reply to my reply, feel free to be straightforward :-)
I have to completely agree with Ethan and completely disagree with Ruth. I have not programmed in years, but the issues Ruth mentions create more problems than they solve. Scraping is a problem in any scenario. Maybe relative makes it a bit easier, but overall there is no difference. Canonicalizing does not solve scraping either. If someone wants to scrape you they will. Also, is scraping an argument for using absolute URLs? To put it in a different perspective: you could better not leave your house because a meteor might hit you on the head. Personally I would not base my reasoning on risk alone. It's good to think about risks and address them, but a good customer experience and page speed is priority one. If relative makes a website faster (source please) this is a good argument to do it. Every (micro)second counts.
Secondly, the problem is not that a domain has 4 URLs and the duplicate content it creates. It's bad planning from the get-go. If you have some knowledge of the technical side of things (and I am not talking about search engines) you know: www and non-www are technically 2 pages. So as a result you think ahead for these scenarios and how you can fix them quick, easy and good. You can put a variable in your programming code that contains your domain, as in https://domain.com etc., and put that automatically in every internal URL. From a budget and time-effort perspective relative is cheaper, better and quicker. Do you know how much time it takes to clean up URLs? Plus if these problems that are mentioned in this video occur, from my experience: good luck finding someone with a document of all URLs. Again: the problem is not relative or absolute; bad planning is.
My 2 cents. Do it right and do it relative. Create a sitemap of the ever-changing scenario for back-up and good management. Does the URL structure change? Create a new sitemap and let search engines know about that. Redirect everything server-side, but this should be a no-brainer for www / non-www issues in the first place. Plus redirect via 301 when you go from HTTP to HTTPS or whatever change comes at a later stage. And last but not least: always use canonicals; they are like a 301 in a way. A good back-up to have, plus you can UTM / link build all you want without having to worry too much about index issues.
In short. Plan ahead and start correct and there should almost never be duplicate content issues.
"Plan ahead and start correct" aren't luxuries that all website owners or SEOs have. It's very easy for people who know their stuff to scoff and say that of COURSE you should know that www and non-www are two different pages, but I see websites that are built with both resolving every day. I think there's room to be creating content for people who don't know as much about these things as we do.
As for relative vs. absolute URLs, I think we're going to have to agree to disagree - from my perspective, pointing search engines to exactly the place you'd like them to go with as few redirects as possible is worth doing and worth the extra time and effort it takes to code correctly. Your recommendation of "do it right and do it relative" only works if the "do it right" scenario is in place, which assumes that you were there when the site was built. Coding absolute URLs creates more headaches on the dev side but increases crawl efficiency on the SEO side - so every company is going to have to weigh those two priorities when implementing this change.
This is from a report on Alexa's pro audit (I love Moz too).
"No action needed -- both of your hostnames point to the same site."
Most search engines treat these as separate hosts, so it is important to have one point to the other. Otherwise, you risk splitting your traffic across the two versions of your site. That would result in lower placement on search engine results.
Test: example.com and www.example.com
Resolve: www & non-www
HTTP: http://example.com
HTTP: http://www.example.com
HTTPS: https://www.example.com
HTTPS: https://example.com
Each points to just one version (non-hreflang, one language): https://www.example.com
Because of the way the report is constructed, I have given you a link below with a screenshot to better illustrate it. What Ruth said is absolutely correct.
A snapshot of the original report here https://d.pr/i/1ky09
Hi Ruth,
Thank you for such a great whiteboard Friday.
More people have to talk about this cause and effect; it should be something that is checked for across the board. It is really nice to see a focus on technical SEO. When it comes down to it, many sites, especially e-commerce sites (in my experience), are set up to use relative URLs.
This should be added to the Moz Developer's SEO checklist. Thank you for bringing up crawl budget; I have found DeepCrawl to be amazing at detecting and displaying crawl budget issues.
Although there is not a tool I know of that tells you if you have a relative link issue and why it is an issue.
A thought for Moz: add this to on-page reviews, and maybe a tool that will show people if Google is indexing more than one of the four versions below.
On fixing a website's crawl budget: I have made large improvements on websites that believe they need to link to every page on their site.
Thanks again,
Tom
Thank you - I have just been discussing this with our developers and they were wanting some evidence about why they should change. I will just be sending them the video. This could not have come at a better time for me.
Ruth, good to see a new face. Intrigued about the duplicate content aspect. So it is not a penalty but simply a confusion factor. I always assumed duplicate content if "too heavy" was a penalty.
I see this a lot with our hotel clients.... and typically it's the client who's gone out and duplicated their content on other domains because it performs well for their website. You'll see a marked drop in Organic Impressions the further the content is duplicated... as Google may start serving up someone else's domain for your content instead of you (scraped site wise). You can find out more about that aspect of Panda here.
Everywhere in the application you should use relative URLs instead of absolute URLs. For example, in the login page you have this reference to a JavaScript file (this URL is absolute!):
<script src="/javascripts/main.js?1215344801" type="text/javascript"></script>
You should have this reference instead:
<script src="javascripts/main.js?1215344801" type="text/javascript"></script>
This is really important if you use redmine in a corporate environment where there is often a web server configure in a reverse proxy mode (example: apache httpd) in front of the web application (ex: redmine).
This WBF is specifically about websites, not apps - I don't have enough experience about app development to be able to speak to relative URLs in apps.
I'm sure I'll be repeating many other comments--I didn't have the time to read through them all. But here goes...
THE BEST AND ONLY NECESSARY SOLUTION is to canonicalize your website so there is only one version of protocol/domain being used. This fixes the problem at the source.
As Ruth mentioned, relative links can be a requirement if you have dev/test/stage/production environments. And if your dev team has to hand code things and your domain name (or protocol) ever changes, every link on the site will need to be updated. Not a way to make friends with your dev team. :-)
So relative links are my first choice, but I'm usually a fan of root-relative links to help prevent spider traps--all links start with a forward slash so they're referencing the root of the website instead of the location of the current page.
The scraping thing is a non-issue. It's trivial to replace the domain when scraping a site to make the links relative instead of absolute. Most of the scraping tools I've used do that automatically.
And specifying the canonical URL is definitely a good thing. Here's more about it, from Google: https://support.google.com/webmasters/answer/13906...
Great Whiteboard Friday, and apt timing as we are in the process of building a new website!
Ok so if someone could please help me and confirm the below points as I understand them:
1) The 4x versions of domain should resolve (redirect) to a single domain server side? (Will this also ensure all the link equity from existing backlinks to the different domains are passed to our preferred version?)
2) Web editors going forward should ideally use the absolute URL of the preferred domain in all internal linking?
Thank you so much MOZZERS!
Ben, London
Great article Ruth thank you for the information.
Thank you Ruth for this fantastic Whiteboard Friday. It has helped me understand why some websites self-canonicalize their pages. Nobody was able to answer this question. I even posted in the Q&A section. I even attended the Google Hangout to get the answer but... :p Thanks a ton!
Thanks for the article. We are developing tutor profiles on our website and would like to know the best option to use in terms of URL structure, e.g. www.gutsytutoring.co.za/tutors-rondebosch-cape-town or www.gutsytutoring.co.za/tutors/rondebosch/cape-town/. Which is best in terms of SEO?
Hi Kyle,
I recommend posting this question to the Moz Q&A forum, where it will be seen by more SEO experts who can weigh in.
Hey Ruth,
As somebody in the top 20 on Q&A, I have referenced this article multiple times. It is extremely valuable.
The one part I wish was brought up was setting the canonical via Google Webmaster Tools / Search Console. In all fairness, though, geo SEO only goes so far - Google Search Console does not matter much outside of certain countries.
Therefore GWT cannot fix everything if you're using Yandex, Baidu or even Yahoo! in Japan.
That's a great topic to do some analysis on: how many people who watch WBF are experts, and how many people are trying to do it themselves, just learning, or finding that they need an agency.
PS: love the idea of bringing the developers in - it is something that is needed desperately, but it will also help you get links referencing this article.
All the best,
Tom
You might remember me from such films as...
Really interesting article Ruth. Good to see you're still active on Moz. I work alongside a team of developers so it's invaluable information which I will be passing on and advising accordingly.
Thanks Ruth, recently I changed my website to HTTPS, and with canonical tags and 301 redirects I solved all the problems :)
First of all, this is my first post in the Moz community, though I've been following the amazing work done here for a long time.
So, to the chase: my company has a presence in over 40 markets, with a local website for each of them. Consequently, we have several websites with duplicate content in English, e.g. an Australian English website and a UK English website (those have to be on separate domains for regulatory reasons). My team and I were arguing for quite a while about whether Google can actually see that, for instance, www.website.com and www.website.com.au are two versions of the same website - the same way it does with subdomains - and not penalise us for duplicate content.
If the question is irrelevant or uninformed - my sincere apologies.
I would love to do a Whiteboard Friday on crawl budget.
It is so important to get this information out there to everyone, but it is an extremely complex topic for a short video.
Regardless, I am going to create one and send it to you guys - tell me what your thoughts are after seeing it.
It is probably one of the most important parts of search, and it is unfortunately overlooked constantly.
There are most likely better people than I to do this, Rand - Mike from DeepCrawl and Dan from Screaming Frog, to name a few.
I too would love to see this Thomas, as you say it's an important and somewhat complex topic.
Thanks William I appreciate that. I will send you a copy of what I make.
Thanks for this geeky article Ruth, great explanation of the crawl budget concept!
Hey Ruth,
You mentioned using absolute URLs for the canonical tags - do you think it would be worth changing the hreflang tags from relative to absolute, or would the self-referencing canonical tag be enough to cover everything?
Thanks,
Sam
Hreflang URLs are some of the most critical URLs in any multilingual site.
They should be served as absolute URLs. Of course, you should update the canonical to reflect the change in the URL as well.
Think of hreflang URLs as navigation URLs (because they truly are).
They should be absolute.
Thx Ruth. People who haven't already done this should do this fast. Duplicate content is a sign of low-quality content. Search Engine Land mentioned that a new Panda update is coming in the upcoming weeks, which is aimed at low-quality content. This can screw up your rankings.
Ruth, you go silent after 1:15 mins on our PC's, no volume! Anyone else getting this?
I got a "Post not Found" message; after 2-3 clicks the post URL opened.
I had the same audio problem when watching in Firefox, but watching it in Chrome worked fine and played all the way through with no audio problems.
That's so strange! We heard multiple reports of issues with the video early this morning (~2-3 a.m. in Seattle), and they had all magically resolved themselves by the time we woke up. So much for troubleshooting. =\
We'll dive in a bit deeper and see if we can't figure out what went wrong. Thanks for letting us know, folks!
I have switched a half dozen sites from relative to absolute and they all improved in ranking. I am old school anyways and like control.
I do not like duplicate content, and some of the eCommerce platforms make the craziest duplicate URLs that amount to SEO poison.
Scrapers, lol - I love it when they duplicate the content and leave my affiliate links still in the page. I heart lazy thieves.
rel=canonical is your friend
Happy Friday!
Great WBF Ruth. It sounds simple, but it's often overlooked by a number of people, possibly because they think it's a big resource drain for web developers or that it's not important. But despite the work involved, the benefits far outweigh the time developers need to spend on it.
I think people are also forgetting about URLs with and without a trailing slash, as they can also cause duplicate pages if they are not handled correctly. So essentially, you could argue that for a given page there are up to eight versions of your website Google has to deal with:
http://example.com/page
http://example.com/page/
http://www.example.com/page
http://www.example.com/page/
https://example.com/page
https://example.com/page/
https://www.example.com/page
https://www.example.com/page/
Finally, you may also want to look at the 'Site Settings' within Google Search Console (yes, we have to use that name now) and select your preferred domain (or the canonical domain if you will) to help Google further with their crawling and indexing of your site. More details are available here >> https://support.google.com/webmasters/answer/44231
That's wrong Ahmed, that's not how directories work. With or without the trailing slash, it goes to the same place. With no .html or .php on the end, the server is looking for /example/index.html or /example/home.html regardless of how it's linked.
Not quite right either. Slash or no slash on a domain only URL makes no difference. But once there is a path in the URL an ending slash does make it a different URL.
Same things:
https://example.com and https://example.com/
Different things:
https://example.com/page and https://example.com/page/
No it doesn't. Test it.
https://googlewebmastercentral.blogspot.com.au/2010...
"Rest assured that for your root URL specifically, https://example.com is equivalent to https://example.com/ and can’t be redirected even if you’re Chuck Norris."
There you go, you don't need to test it after all. Although, if you want the intuition to know why that is, I'd suggest spending some time building your own website (NOT through a CMS) to get the feel of things.
Would developing a few CMSs qualify me?
I think the burden of proof is in the other direction. Can you provide an example where the addition of a slash to a root domain URL makes any difference?
Say, a website that responds in any different way between the two - e.g. different content, one redirects, one errors, the HTTP response headers are different?
Or if you can show a search engine that has an indexed copy of both.
Why not try to do a redirect from one to the other? Or find a way to change the content in one?
Or try visiting both in the Chrome browser ;-)
In the URI structure, the slash (or lack of it) is the path from the website's root folder on the server. A slash means the root directory, and no slash means the same thing (or nothing at all). So they point to exactly the same place and therefore serve up the same content.
I tried testing it using .Net requests on an IIS server. Unfortunately .Net always adds the slash to the URI before making a request. Maybe a clue there from Microsoft as well.
Come on, man - no laws of physics or mathematics, pretty much no law at all including the letter of the law, apply to Chuck Norris (or Beetlejuice).
But his web developers are not CN - they will be roundhouse kicked in the face multiple times for creating relative URLs.
So to solve for the X factor, remember X is CN = RHK, but no one has committed relative URL crimes, survived the roundhouse kicking, and lived to tell the tale.
It's true - not even the extras on Walker, Texas Ranger.
What an awesome explanation, Ruth! Thanks :)
Thanks for the video, all very interesting points and some of them really coming from a perspective that is new to me.
I almost feel the argument '1) scrapers' (in favor of absolute URLs - the point that relative URLs make it easier for someone to scrape your site and re-upload it elsewhere) makes such a small difference as to be irrelevant. It takes mere minutes to use a program (like Dreamweaver, say) to perform a find/replace on a domain name in a batch of HTML site files, so absolute URLs only make the whole process harder by about five minutes.
However, I do agree with the motivation for all of your arguments, I suppose it actually just comes down to two solutions for the same problem... When working on SEO for a site, we always implement a canonical redirect in the htaccess file for a domain, to ensure that users and spiders are always redirected to the format we are optimising. It takes two seconds to implement, and works very effectively.
When working on sites built server-side or by another company and with limited access etc, changing the URL structure is harder than adding a few lines to the htaccess. So I personally feel the htaccess method is easier and more efficient. However, I'm sure this may not always be the case, so considering both solutions is always a good idea.
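For anyone curious, here's a minimal sketch of the kind of htaccess rules I mean, assuming an Apache server with mod_rewrite enabled and a www + HTTPS canonical (example.com is just a placeholder - adjust for your own domain and hosting setup):
# Force HTTPS and the www hostname with 301 (permanent) redirects
RewriteEngine On
# Anything arriving over plain HTTP goes to the HTTPS www version
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
# Anything arriving on the bare domain goes to the www version
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
Exact rules vary by host (and some setups terminate SSL at a proxy, where %{HTTPS} behaves differently), so treat this as a starting point rather than a drop-in.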
I agree with you, realistically if someone wants to scrape your content and they know enough about development they will find a way to do it.
However, adding these barriers increases the cost of doing it, which will deter some people. This is just one point, and you can often see if there is much value in it by searching for your content on the web to see if anyone has scraped it.
Totally agree, Richard - a determined scraper can bypass all of these things easily, but it can help against fully-automated scrapers that don't take the time to update content but just slap it up as-is.
Thanks, Ruth! Glad to see you on a WBF! Even though this was a more basic SEO topic, it still packs a punch for a lot of new developers, new marketers, new SEOs and, hell, even some more experienced professionals. I know I learned about the scraper threat, so that was exciting and bothersome at the same time :)
Great content here for SEOs and developers of any level. Developers and SEOs/marketers need to be in sync on the goals set forth. Otherwise you run the risk of doing double the work if you need to change things up with regard to relative vs. absolute URL structure. - Patrick
Thanks for sharing this informative post in WBF class!
Nice post. Having an SEO-friendly URL structure on a site means having a URL structure that helps the site rank higher in the search results. While, from the point of view of web development, a particular site's architecture may seem crystal-clear and error-free, for an SEO manager it could mean missing out on certain ranking opportunities.
Technically, I don't think the URL structure helps much with ranking higher--it helps Google discover and crawl your site more efficiently and mitigates duplicate content issues. Ranking is based more on quality content and other signals.
What if my HTTPS version is just blank? Do I need to redirect?
If you happen to be using WordPress, just force http or https and it will handle the redirect for you.
Scott is correct - forcing one or the other is the way to go. You definitely don't want the https version to be blank or not found.
If you have SSL, you're better off redirecting everything to HTTPS since Google is leading everyone in that direction.
Really helpful, thanks Ruth! You have cleared up a few issues for me.
Great article, Ruth. Duplicate content is always an issue - I can't expect top ranking positions for my website in Google if I have duplicate content on it. Your article makes a good case for using absolute URLs.
Great information and great video. It seems that changing from http to https will affect your rankings because the highly ranked http page no longer exists. Am I missing something, or does Google have a way of knowing that my https page is actually the same as my http page and should be ranked similarly?
Excellent article. Exactly what I was looking for.
You missed an important (and very geeky) reason:
using relative URLs puts less information in the database, making it a little lighter and faster.
The tutorial was excellent. Congratulations, Ruth!!
Hey all, I realize this is a relatively (har har) older thread, but as a follow-up, how does this guidance from Google re: https factor in? They are recommending relative URLs: https://moz.com/blog/relative-vs-absolute-urls-whiteboard-friday. Would appreciate thoughts.
Regarding yelling at developers... you only need to place the base URL in a global variable, e.g. <a href="<% mybaseurlvar %>/blah.htm">, then change the value of that variable when switching between live and dev servers.
Completely agree, but a lot of sites going through an HTTP to HTTPS conversion are praying they used relative URLs.
Another fantastic article and Whiteboard Friday. The team here at Urban Media are always educating our clients across Watford on the importance of doing the right type of link building but it can often be difficult for others to understand how it can negatively impact their SEO.
For one of my clients their website was originally built with .ASP extensions then many years later we moved on to .ASPX extensions and this year moved from http to https.
So we had to add a redirect from every .asp URL to its .aspx equivalent to make sure the old links still work, and set up the Web.Config to automatically load https instead of http, returning a permanent redirect code so search engines know the URL has changed permanently.
It worked out pretty well - the main SEO key phrases kept the same ranking positions after moving from http to https, meaning Google processed it well. Now we're also upgrading the site to responsive web design to further enhance the SEO and user experience.
Good for you for making sure your client's move was handled well and everything was still working! They are lucky to have you.
Great WBF Ruth, I just have a question for you: What is your take on URLs with a trailing slash and without.
For Example: https://www.abc.com/abc
And : https://www.abc.com/abc/
Both of these URLs point to the same page, but according to Matt Cutts it was better to go with a trailing slash.
With trailing slashes, as with the other examples above, I think it matters most that one version resolves to the other, rather than which one you use. I know a lot of web devs prefer to use the trailing slash, so that's what I would lean for, but I haven't seen much SEO benefit specifically from using one over the other.
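If you do standardize on the trailing-slash version, a rough htaccess sketch for making the non-slash URLs resolve to it might look like this (assuming Apache with mod_rewrite; the file check is there so real files like images and CSS don't get a slash appended):
RewriteEngine On
# Only rewrite requests that aren't real files and don't already end in a slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ /$1/ [R=301,L]
As always, test something like this on a staging copy first - the right rules depend on how your URLs and server are set up.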
Thank you Ruth :) I personally prefer the trailing slashes although I was curious about any SEO benefit but sometimes it is just fine to go with the flow.
Great post that covers an aspect of SEO that I hadn't heard a convincing argument about one way or the other before. It makes sense that absolute URLs are the way to go for a number of reasons.
One more question - what about shopping cart pages? If we have a large number of product variations, sometimes it is hard to have all the URLs keyword-optimized. What is your recommendation for such scenarios?
I think this might be a better question for the Moz Q&A section, where you can get opinions from community members who may not be reading this comment thread. You can find it at moz.com/community/q.
Hey Ruth, really interesting theory about URLs. Especially about duplicate content - I totally agree with you that many people think of a penalty, and that's not real.
I just recently updated my website www.vividcleaning.ca to redirect to the www version. I thought that redirect would give me a ranking boost, but I didn't notice one. My guess is that these duplicates do not impact small websites that much. I had set the preferred domain in Google Search Console. Now I will set up absolute links in the menu. Thanks for the explanations.
It's unusual that you'd see a huge ranking boost from just this one change - SEO changes are often cumulative in nature. The first step is making sure your site is as easy as possible for Google to crawl, index and rank - eliminating duplicate content is a great first step toward that. The next step is continuing to optimize and build content, links and authority in your niche. Good luck!
Thank you this was very informative. Sometimes one can get caught up with other stuff and forget the basics. What about for WP sites?
Depending on how they're set up, WP sites can definitely be prone to all of these errors. Make sure you're checking that www, non-www, http and https versions of the site are all resolving. If relative links are being automatically created for e.g. nav items, you can usually change that in your settings.
Hey Ruth,
Thanks for sharing this - it's a really succinct way of articulating why the way your URLs are structured matters. Any idea to what degree this applies when URLs aren't forced to lowercase and will render any capitalisation variant of the URL? That's one of my big pet peeves.
Absolute URLs that aren't lowercase are not necessarily going to be duplicates, because the person who created them (I hope) only made one version. Mixed-case URLs are not good to build, but there's still only one of each mixed-case URL if it's absolute and set up the right way.
Think about the way a database server assigns a page ID to a URL, or the way your server cache works.
It is going to be ideal if they're all lowercase, but if mixed-case URLs exist, then when somebody types in a URL such as
/I-Love-Slow-301s/ (redirects will be needed)
vs
https://example.com/i-love-fast-200s/ (no redirect needed)
the mixed-case version slows your site down.
No matter what, you must force the domain redirect from HTTP to HTTPS or HTTPS to HTTP (your pick).
Then whether or not a mixed-case URL is absolute does not change much outside of site speed - and the fact that absolute URLs should be used anyway. On the site-speed point: your server cache will misbehave and your users will have a less-than-perfect, slower experience.
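If you want to clean mixed-case URLs up at the server level, one rough sketch uses Apache's RewriteMap - note that RewriteMap only works in the main server or virtual host config, not in .htaccess, so this assumes you have that level of access:
# In httpd.conf or a <VirtualHost> block
RewriteEngine On
RewriteMap lowercase int:tolower
# 301 any URL path containing an uppercase letter to its all-lowercase version
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.*)$ ${lowercase:$1} [R=301,L]
Treat it as a sketch, not a drop-in - make sure nothing on the site genuinely depends on case (file names on Linux servers are case-sensitive) before forcing everything to lowercase.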
Glad I'm doing the right thing, then, when it comes to using absolute URLs in our website's navigation instead of only relative links. Thanks for the clarification in the video!
Hi Ruth Burr Reedy,
Nice, descriptive Whiteboard Friday. After watching the complete video I started analyzing my two websites. First case: a static website with very low authority (around 15), very few backlinks, but strong social signals. When I check it technically, it uses absolute URLs, page speed is average, and the internal linking is no more than 15 links on each page. And you wouldn't believe it - it has been ranking high for the last 1.5 years (and for the last 9 months I have not put in any effort and I still get 700+ visits daily).
The second case I analyzed is my main website, a dynamic eCommerce site developed in ASP.NET. The developer used relative links for all internal links, page loading time is 10 seconds, domain authority is 30+, and the backlink and social strength are also good compared to case 1. But there are 150+ internal links on each page. And I am facing issues ranking it well - sometimes I get top positions and they are gone within the next 3 weeks, and I don't know why. I have also implemented canonical links on each page, so there is very little chance of duplicate content. Also, the 301 redirection returns both 301 and 302 status codes, which is also a big issue.
After watching your video I have strongly urged my web developer to do this task on a priority basis. Implementation will take a couple of weeks. Once it is live I will wait for 2 weeks and update you on whether absolute links work for me or not.
Thank you
Vijay Bhabhor
Got to know about the crawl budget. Thanks for sharing Ruth.
Great WBF, cheers!
I think Moz's developers need to take a look at this as their URLs are not consistent - some are https://moz.com and others are moz.com.
I always use relative URLs for my website.
If you use relative URLs and you change your domain or path, then it's not necessary to change every URL.
Thanks for sharing this wonderful blog with us.
Hello Ruth,
After logging in I was able to find your post - it was showing "post not found" 15 minutes ago. Anyway, you explained very well in the video why we should fix canonical issues. The future is going to be rough for those who have this issue and are not taking care of it.
The "Crawl Budget" part especially struck me, because I actually did not know about it. So now it's time to play it safe, not give such hints to Google, and always be original (absolute). Thanks for all the information, Ruth.
Great post. A lot of things you say make sense. Have you come across situations where having too many keywords in your URL has negatively affected rankings?
Yamini
Hi Yamini!
You actually might have more luck with this question if you ask it in Moz Q&A. :) There are a number of experts in there who could help.
Good luck!