First, a quick refresher: URL prettying and 301 redirection can both be done in .htaccess files, or in your 404 handler. If you're not completely up to speed on how URL rewrites and 301s work in general, this post will definitely help. And if you didn't read last week's post on RewriteRule's split personality, it's probably helpful background material for understanding today's post.
"URL prettying" is the process of showing readable, keyword-rich URLs to the end user (and Googlebot) while actually using uglier, often parameterized URLs behind the scenes to generate the content for the page. Here, you do NOT do a 301 redirection. (Unclear on redirection, 301s vs. 302s, etc.? There's help waiting for you here in the SEOmoz Knowledge Center.)
301s are done when you really have moved the page, and you really do want Googlebot to know where the new page is. You're admitting to Googlebot that it no longer exists in the old location. You're also asking Googlebot to give the new page credit for all the link juice the old page had earned in the past.
If you're trigger-happy, you might leap to the conclusion that RewriteRule is the weapon of choice for both URL prettying and 301 redirects. Certainly you CAN use RewriteRule for these tasks, and certainly the regex syntax is a powerful way to accomplish some pretty complex URL transformations. But it's not the only option, and really, if you're going to use RewriteRule, you should probably be using it in your httpd.conf file instead.
The Apache docs have a great summary of when not to use .htaccess.
Fear Not the 404 Handler
First, all y'all who tremble at the thought of creating your very own custom 404 handler, take a Valium. It's not that challenging. If you've gotten RewriteRule working and lived to tell the tale, you're not going to have any difficulty making a custom 404 error handler. It's just a web page that displays some sort of "not found" message, but it gives you an opportunity to have a look at the page that was requested, and if you can "save it", you redirect the user to the page they're looking for with just a line or two of code.
If not, the 404 HTTP status gets returned, along with however you'd like the page to look when you tell them you couldn't find what they were looking for.
By the way, having your own 404 handler gives you the opportunity to entertain your user, instead of just making them feel sorry for themselves. Check out this post from Smashing Magazine on creative 404 pages.
Having a good sense of humor could inspire love & loyalty from a customer who otherwise might just be miffed at the 404.
Here's an example of a 404 handler in ASP. Important note: don't use Response.Redirect -- it does a 302, not a 301!
For PHP, you need to add a line to your .htaccess pointing to wherever you've put your 404 handler:
- ErrorDocument 404 /my-fabulous-404-handler.php
Then, in that PHP file, you can get the URL that wasn't found via:
- $request = $_SERVER['REDIRECT_URL'];
Then, use any PHP logic you'd like to analyze the URL and figure out where to send the user.
If you can successfully redirect it, set:
- header("HTTP/1.1 301 Moved Permanently");
- header ("Location: https://www.acmewidgets.com/purple-gadgets.php");
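To make those steps concrete, here's a minimal sketch of the lookup-and-redirect part of a PHP 404 handler. The `$redirects` map, the `/old-catalog/` entry, and the function name `findRedirectTarget` are made up for illustration; your own logic could be a lookup table like this, pattern matching, or a database query.

```php
<?php
// Hypothetical map of old paths to their new homes (illustration only).
$redirects = [
    '/purple-gadgets.asp' => 'https://www.acmewidgets.com/purple-gadgets.php',
    '/old-catalog/'       => 'https://www.acmewidgets.com/catalog/',
];

/**
 * Return the new URL for a requested path, or null if we can't "save it".
 * Matching ignores case and any trailing slash.
 */
function findRedirectTarget(string $requestPath, array $redirects): ?string
{
    $normalize = function (string $path): string {
        return rtrim(strtolower($path), '/');
    };
    $wanted = $normalize($requestPath);
    foreach ($redirects as $oldPath => $newUrl) {
        if ($normalize($oldPath) === $wanted) {
            return $newUrl;
        }
    }
    return null;
}

// REDIRECT_URL is only set when Apache invokes the ErrorDocument,
// so fall back to an empty string if it is missing.
$request = $_SERVER['REDIRECT_URL'] ?? '';
$target  = findRedirectTarget($request, $redirects);

// The PHP_SAPI check just keeps this sketch harmless on the command line.
if ($target !== null && PHP_SAPI !== 'cli') {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . $target);
    exit;
}
```

If no target is found, the script simply falls through to whatever "not found" page you output next.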
And here's where it gets a bit hairy in PHP. There's no real way to transfer control to another webpage behind the scenes--without telling the browser or Googlebot via 301 that you're handing it off to the other page. But you can call require() on the fly to pull in the code from the target page. Just make sure to set the HTTP code to 200 first:
- header('HTTP/1.1 200 OK');
And you've got to be careful throughout your site to use include_once() instead of include() to make sure you don't pull a common file in twice. Another option is to use curl to grab the content of the target page as if it were on a remote server, then regurgitate the HTML back in-stream by echoing what you get back. A bit hazardous if you're trying to drop cookies, though...
And, if you really need to send a 404:
- header('HTTP/1.0 404 Not Found');
Very Important: be careful to make sure you're returning the right HTTP code from your 404 handler. If you've found a good content page you'd like to show, return a 200. If you found a good match, and want Googlebot to know about that pagename instead of what was requested, do a 301. If you really don't have a good match, be sure you send a 404. And, be sure to test the actual response codes received--I'm a huge fan of the HttpFox Firefox plug-in.
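One way to keep the three outcomes straight is to make the decision in one place and send the headers in another. This is only a sketch: the two lookup arrays (`$rewrites` for pretty URLs served in place, `$moves` for pages that really moved) and the function names are assumptions for illustration, not part of any library.

```php
<?php
/**
 * Decide what the 404 handler should send back.
 * $rewrites: pretty path => local script to serve in place (200)
 * $moves:    old path    => new URL to redirect to (301)
 */
function chooseResponse(string $path, array $rewrites, array $moves): array
{
    if (isset($rewrites[$path])) {
        return ['status' => 200, 'serve' => $rewrites[$path]];
    }
    if (isset($moves[$path])) {
        return ['status' => 301, 'location' => $moves[$path]];
    }
    return ['status' => 404];
}

/** Build the status line for the three codes this handler can return. */
function statusLine(int $status): string
{
    $reasons = [200 => 'OK', 301 => 'Moved Permanently', 404 => 'Not Found'];
    return 'HTTP/1.1 ' . $status . ' ' . $reasons[$status];
}

/** Send the headers; must run before any HTML is output. */
function sendDecision(array $decision): void
{
    header(statusLine($decision['status']));
    if ($decision['status'] === 301) {
        header('Location: ' . $decision['location']);
    }
}
```

After `sendDecision()`, a 200 would go on to pull in the target page, a 301 would exit, and a 404 would output your "not found" page.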
Ease of Debugging
This is where the 404 handler really wins my affection. Because it's just another web page, you can output partial results of your string manipulation to see what's going on. Don't actually code the redirection until you're sure you've got everything else working. Instead, just spit out the URL that came in, the URL you're trying to fabricate and redirect to, and any intermediate strings that help you figure it all out. With RewriteRule, debugging pretty much consists of coding your regex, putting in the flags, then seeing if it worked. Is the URL coming in in mixed case? The slashes...forward? Reverse? Did I need to escape that character...or is it not That Special?
You're flying blind. It works, or it doesn't work.
If you're struggling with RewriteRule regular expressions, Rubular has a nice regex editor/tester.
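On the 404 handler side, a debugging pass might look something like the sketch below: no redirect yet, just the incoming URL and each intermediate string printed out. The `/widgets/<name>/` target pattern, the fallback test URL, and the function name `traceRewrite` are invented for this example.

```php
<?php
/**
 * Collect the intermediate strings worth eyeballing before any redirect
 * is wired up. The target pattern (/widgets/<last-segment>/) is invented
 * for this example.
 */
function traceRewrite(string $requestPath): array
{
    $lowered  = strtolower($requestPath);
    $segments = array_values(array_filter(explode('/', $lowered), 'strlen'));
    $leaf     = $segments ? end($segments) : '';
    $slug     = preg_replace('/\.[a-z0-9]+$/', '', $leaf);   // drop the extension

    return [
        'incoming'  => $requestPath,
        'lowered'   => $lowered,
        'segments'  => implode(' | ', $segments),
        'candidate' => $slug === '' ? '(none)' : '/widgets/' . $slug . '/',
    ];
}

// Dump everything instead of redirecting; swap in the real redirect
// only once the candidate URL looks right.
$trace = traceRewrite($_SERVER['REDIRECT_URL'] ?? '/Old/Purple-Gadgets.ASP');
echo "<pre>\n";
foreach ($trace as $label => $value) {
    echo htmlspecialchars(str_pad($label, 10) . ': ' . $value), "\n";
}
echo "</pre>\n";
```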
Programming Flexibility
With RewriteRule, you've got to get all the work done in the single line of regex. And while regex is elegant, powerful, and should be worshipped by all, sometimes you'll want to do more complex URL rewriting logic than just clever substitution. In your 404 handler, you can call functions to do things like convert numeric parameters in your source URL to words and vice versa.
Access to Your Database
If you're working with a big, database-driven site, you may want to look up elements in your database to convert from parameters to words.
And since the 404 handler is just another webpage, you can do anything with your database that you'd do in any other webpage.
For example, I had a travel website where destinations, islands, and hotels all were identified in the database by numeric IDs. The raw page that displayed content for a hotel also needed to show the country and island that the hotel was on.
The raw URL for a specific hotel page might have been something like:
/hotel.asp?dest=41&island=3&hotel=572
Whereas the "pretty URL" for this hotel might have been something like:
/hotels/Hawaii/Maui/Grand-Wailea/
When the "pretty URL" above was requested by the client, my 404 handler would break the URL down into sections:
- looking up the 2nd section in the destinations table (Hawaii = 41)
- looking up the 3rd section in the island table (Maui = 3)
- looking up the 4th section in the hotel table (Grand Wailea = 572)
Then, I'd call the ASP function Server.Transfer to transfer execution to /hotel.asp?dest=41&island=3&hotel=572 to generate the content.
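The original was written in ASP, but the same breakdown can be sketched in PHP. In this rough version, three small arrays stand in for the destinations, island, and hotel tables (a real handler would query the database), and the function name `decodeHotelUrl` is hypothetical.

```php
<?php
// Stand-ins for the destinations, island, and hotel tables; a real handler
// would query the database instead. IDs are the ones from the example above.
$destinations = ['hawaii' => 41];
$islands      = ['maui' => 3];
$hotels       = ['grand-wailea' => 572];

/**
 * Turn /hotels/Hawaii/Maui/Grand-Wailea/ into
 * /hotel.asp?dest=41&island=3&hotel=572, or return null if any section
 * fails to match (in which case the handler should send a real 404).
 */
function decodeHotelUrl(string $path, array $destinations, array $islands, array $hotels): ?string
{
    $sections = explode('/', strtolower(trim($path, '/')));
    if (count($sections) !== 4 || $sections[0] !== 'hotels') {
        return null;
    }
    [, $dest, $island, $hotel] = $sections;
    if (!isset($destinations[$dest], $islands[$island], $hotels[$hotel])) {
        return null;
    }
    $query = http_build_query([
        'dest'   => $destinations[$dest],
        'island' => $islands[$island],
        'hotel'  => $hotels[$hotel],
    ], '', '&');
    return '/hotel.asp?' . $query;
}
```

Returning null for anything that doesn't match all three lookups keeps mistyped or stale URLs on the real-404 path instead of serving a half-right page.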
Now, keep in mind that you'll probably want to generate the links to your pretty URLs from the database identifiers, rather than hard-code them. For instance, if you have a page that lists all of the hotels on Maui, you'll get all of the hotel IDs from the database for hotels where the destination = 41 and island = 3, and want to write out the links like /hotels/Hawaii/Maui/Grand-Wailea/. The functions you write to do this are going to be very, very similar to the ones you need to decode these URLs in your 404 handler.
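Going the other direction, here's a small sketch of building the pretty link from names you've already pulled out of the database. The helper names `slugify` and `hotelLink` are made up, and the rule of replacing anything non-alphanumeric with a hyphen is an assumption; use whatever convention your decoder expects.

```php
<?php
/** Make a name URL-safe: "Grand Wailea" becomes "Grand-Wailea". */
function slugify(string $name): string
{
    return trim(preg_replace('/[^A-Za-z0-9]+/', '-', $name), '-');
}

/** Build the pretty URL for one hotel from its destination, island, and name. */
function hotelLink(string $destination, string $island, string $hotel): string
{
    return '/hotels/' . slugify($destination) . '/' . slugify($island) . '/' . slugify($hotel) . '/';
}

echo hotelLink('Hawaii', 'Maui', 'Grand Wailea'), "\n";   // /hotels/Hawaii/Maui/Grand-Wailea/
```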
Last but not least: you can keep track of 404s that surprise you (i.e. real 404s) by having the page either email you or log the 404'ed URLs to a table in your database.
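As a lightweight stand-in for the database table, the sketch below appends each real 404 to a log file. The function name `logNotFound`, the tab-separated format, and the log file location are all arbitrary choices for illustration.

```php
<?php
/**
 * Append one line per unresolved request to a log file. Control characters
 * are stripped so a crafted URL can't inject extra lines or columns.
 */
function logNotFound(string $logFile, string $requestPath, string $referer = ''): void
{
    $clean = function (string $value): string {
        return preg_replace('/[\x00-\x1F\x7F]/', '', $value);
    };
    $line = implode("\t", [date('c'), $clean($requestPath), $clean($referer)]) . "\n";
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

// Typical call from the handler once every lookup has failed:
// logNotFound(__DIR__ . '/404.log', $_SERVER['REDIRECT_URL'] ?? '', $_SERVER['HTTP_REFERER'] ?? '');
```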
Performance
For most people, the performance hit of doing the work in .htaccess is not going to be significant. But if you're doing URL prettying for a massive site, or have renamed an enormous list of pages on your site, there are a few things you might want to be aware of--especially with Google now using page load speed as one of its ranking factors.
All requests get evaluated in .htaccess, whether the URLs need manipulation/redirection or not.
That includes your CSS files, your images, etc.
By moving your rewriting/redirecting to your 404 handler, you avoid having your URL pattern-matching code check against every single file requested from your webserver--only URLs that can't be found as-is will hit the 404 handler.
Having said that, note that you can pattern-match in .htaccess for pages you do NOT want manipulated, and use the L flag to stop processing early in .htaccess for URLs that don't need special treatment.
Even if you expect nearly every page requested to need URL de-prettying (conversion to parameterized page), don't forget about the image files, Javascript files, CSS, etc. The 404 handler approach will avoid having the URLs for those page components checked against your conversion patterns every single time they're fetched.
A Special Case
OK, maybe this case isn't all that special--it's pretty common, in fact. Let's say we've moved to a structure of new pretty URLs from old parameterized URLs.
Not only do we have to be able to go from pretty URL --> parameterized URL to generate the page content for the user, we also want to redirect link juice from any old parameterized URL links to the new pretty URLs.
In the actual parameterized web page (e.g. hotel.asp in the above example), we want to do a 301 redirect to the pretty URL. We'll take each of the numeric parameters, look up the destination, island, and hotel name, and fabricate our pretty URL, and 301 to that. There, link juice all saved...
But we've got to be careful not to get into an infinite loop, converting back and forth and back and forth.
When this happens, Firefox offers a message to the effect that you've done something so dumb it's not even going to bother trying to get the page. They say it so politely though: "Firefox has detected that the server is redirecting the request for [URL] in a way that will never complete."
By the way, it's entirely possible to cause this same problem to happen through RewriteRule statements--I know this from personal experience :-(
It's actually not that tough to solve this. In ASP, when the 404 handler passes control to the hotel.asp page, the query string now starts with "404;http". So in hotel.asp, we see if the query string starts with 404, and if it does, we just continue displaying the page. If it doesn't start with 404;http then we 301 to the pretty URL.
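The same check can be sketched in PHP. This assumes your server hands the parameterized page a query string beginning with "404;http" when it is reached through the error handler, as described above for ASP; if your setup doesn't, you'd need some other marker. The function name is hypothetical.

```php
<?php
/**
 * True when the parameterized page was reached via the 404 handler
 * (query string starts with "404;http"), so it should render content
 * rather than 301 back to the pretty URL.
 */
function cameFromErrorHandler(string $queryString): bool
{
    return strncmp($queryString, '404;http', 8) === 0;
}

// In the parameterized page:
// if (!cameFromErrorHandler($_SERVER['QUERY_STRING'] ?? '')) {
//     ...look up the names and 301 to the pretty URL...
// }
```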
Other References
Information on setting up your 404 handler in Apache:
- https://www.plinko.net/404/custom.asp
- https://www.webreference.com/new/011004.html
- https://www.phpriot.com/articles/search-engine-urls/4
Apache documentation on RewriteRule:
ASP.net custom error pages:
Great article on creating 404 pages for WordPress sites that keep customers on your site (thanks archshrk!):
Nice helpful article.
One thing: following the Matt Cutts interview here recently, be aware that redirect chains are a BAD thing.
They suggest at most two redirects - so if you have a page that 301 redirects to a page that 302 redirects to a page that 301 redirects, change the first page to redirect to the last one...
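If your redirects live in a lookup table, collapsing chains like this can be automated. Here's a rough sketch, assuming a simple old-URL => new-URL array; the function name `finalDestination` is made up for illustration.

```php
<?php
/**
 * Follow a map of old URL => new URL to its end, so the first URL can
 * redirect straight to the last one. Returns null if the map loops.
 */
function finalDestination(string $start, array $redirects): ?string
{
    $seen    = [];
    $current = $start;
    while (isset($redirects[$current])) {
        if (isset($seen[$current])) {
            return null; // redirect loop
        }
        $seen[$current] = true;
        $current = $redirects[$current];
    }
    return $current;
}
```

Running every entry through this once and storing the result means each old URL needs only a single 301 hop.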
This makes me wish we had a 404 page contest where we compare 404 pages, share tips and ideas and vote on categories like "most fun", "most helpful" and "best themed", etc.
Now, if you really want to kickass with your 404 page, and bring it to the next level, you can do recommendations to the user, according to the "pretty URL" that caused the error.
Let me explain: If the received URL is carrying some interesting keywords, why not use them?
If you take a look at the first piece of code Michael gave us, you see there's a "Search our site" link at the bottom, "/search.asp".
Assuming this page can perform a search on your website, I'm pretty sure it does it using keywords, and moreover, it probably gets these keywords using GET or POST variables.
You could simply copy the piece of code that queries your database, modify the part that retrieves these variables, and paste it in your 404 page, to retrieve search results according to the "pretty URL". In this case, our keywords would be "violet" and "widgets" (just split the QueryString using "/" and then "-" as the separator, trimming the file extension (.asp) part).
Keeping the first 3 or 4 results, you can display them in your custom 404 page, under your error message, which could give something like this.... Tah-Dah...! Now, you get a powerful 404 page that acts like an extension of your internal structure optimization!
Oh no... we can't find that page!
First, let us apologize--you were undoubtedly hoping for something more interesting and helpful. Perhaps we can help you find what you're looking for anyway.
Did you mean:
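The keyword-splitting step described above could be sketched in PHP like this. The function name `keywordsFromPath` is hypothetical, and feeding the result into your site search is left to whatever search code you already have.

```php
<?php
/**
 * Pull candidate search keywords out of a requested path by splitting on
 * "/" and "-" and dropping the file extension.
 */
function keywordsFromPath(string $path): array
{
    $path  = preg_replace('/\.[A-Za-z0-9]+$/', '', $path);   // trim ".asp" etc.
    $words = preg_split('#[/\-]+#', strtolower($path), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}
```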
Nice!!! Thanks for adding this, great suggestion!
I already have this implemented on my 404 page(s). I followed this post for my WordPress sites.
That's a great reference...I'm going to add it to the list at the top so people don't miss it. Thanks!
Your post is really detailed and helpful. To add to the discussion, another way to do a 301 that I've found to be quick and easy to implement is to use the RedirectMatch.
Here's the usage:
RedirectMatch 301 Regex DestURL
Will
Thanks Will....do you (or anyone else) know a particular reason to use one rather than the other? As far as I can tell, they do the same thing, but come from different modules.
The PHP stuff is over the heads of 97% of the readers here and will probably result in more bad header responses than good if people actually try it.
Thanks for your helpful article.
It's a very useful explanation.
Perde
MichaelC
Excellent post, buddy. We're in the process of proposing mod_rewrites for an ecommerce client of ours.
We're getting significant pushback from their ecommerce platform provider on this. Could you, or anyone for that matter, weigh in on the importance of having the ability to rewrite URLs?
After an indexable navigation, we would say it's #1, especially for such an extremely competitive space as the one they are playing in.
Thanks,
BRLM
It's definitely important...but take a look at last year's ranking factors survey. Things like the TITLE tag and internal/external anchor text are probably more important though.
I'd still say it's important enough that I'd switch platform providers if they can't do it. The difference between ranking #1 and #2 for any particular term is 2 to 4 times the traffic.
This is a good post on htaccess. I have been using the htaccess redirect generator at https://www.htaccessredirect301.com to generate redirect rules. I have been reading Moz articles, and this one is very educational.
It is amazing how many companies are not using the htaccess file, even your bigger companies that you would expect to!
Please tell me: when I looked at the report in the SEOmoz tool, it showed many 404 errors on my website's pages, and I want to remove these errors with a 301 (permanently moved).
So I put these lines in my theme's 404.php:
<?
header("HTTP/1.1 301 Moved Permanently");
header("Location: https://www.mynewdomain.com/page.htm");
exit();
?>
So please tell me: when SEOmoz generates the next report, will these errors be removed or not?
If not, what can I do?
Please tell me.
Gary, you don't want to do that. The search engines will test your site with a fake URL that they KNOW doesn't exist, and expect to get a 404 response from that request. Artificially redirecting all 404s to a page and returning a 200 response is frowned upon.
If you have specific pages that are being linked to from another site where it's difficult to get them to fix their link, then add 301 redirects for JUST those particular pages, i.e. create a page with that filename on your server and have it do the 301 redirect at the top of the page.
Hello,
Please give the .htaccess code to redirect the 404 pages, so that we can remove the errors. Please.
<?
header("HTTP/1.1 301 Moved Permanently");
header("Location: https://www.mynewdomain.com/page.htm");
exit();
?>
If these lines remove the errors, then please tell me.
Thank you
There's a nice tool for generating htaccess redirects here.
SEOmoz has a great overview here as well, with a number of good examples too.
Outstanding article. Really good advice and direction.
We always use .htaccess here
------------------------------
Tony Heyden
Goodings Media
https://goodingsmedia.com
Interesting article, but are you sure you can force a 200 HTTP response from a custom error page?
I've tried this on two sites and, while I can force a 301 without any problems, in the case of a 200 I simply get a 404.
Perhaps there is some server configuration to alter in order to allow this to work?
Yes...you must set the status in the header BEFORE outputting any HTML at all to the client though.
That's what I've been doing.
Exact contents of my test custom 404 page:
<?php
header('HTTP/1.1 200 OK');
?>
Bailed!
Outcome:
URL=https://www.board-crazy.co.uk/nonexistentpage
Result code: 404 (NotFound / Not Found)
I get the same from Live HTTP Headers and Web Sniffer.
Obviously I don't disbelieve you, just trying to figure out what your server is doing differently!
OK I'm working on a "clean" test version of this now, so I can figure out what the issue might be...hang tight...
OK, test version #1: running PHP under IIS 6.0, here's what I had to do:
Now I can return 200 or 404 based on calling header('HTTP/1.1 200 OK') or header('HTTP/1.1 404 Not found') at the top of my error handler. Tested it with a complete & shiny 404 page that looks like the rest of my site, too :-)
Next, I need to create the Apache test example of course. The problem I'm running into with the hosting for my test server is that the CPanel config tool doesn't seem to want to let me create anything other than shtml, which of course won't go through the PHP processor...stay tuned...
OK, figured out how to make this work in Apache. Here's my 404 page that returns a 200. The key is to call ob_start() and ob_end_flush() to make sure the header bit with the status is NOT sent before we can override it with our header() call:
<?php
ob_start();
header('HTTP/1.1 200 OK', true, 200);
?>
[snipping out the HEAD, BODY tag etc. to avoid getting important stuff lost in the noise....]
<h2>So sorry!</h2><br/><br/>Looks like the page you're looking for is lost deep in the New Zealand Southern Alps, without a knowledgeable hunting guide to help it find its way.</body>
</html>
<?php ob_end_flush(); ?>
Thanks for all your effort trying to get this to work but, no dice I'm afraid!
Using ob_start and ob_end_flush does something (stops any further requests being made in Live HTTP Headers) but the page still returns a 404 status!
Arghhh!
I'm introducing this thread to a coding forum I frequent as well to get more brains on it!
Hmmm...maybe it's webserver-specific? I didn't need to do the ob_start() trick with IIS 7.5 running PHP 5.3.2; but I did on my other test server, which is running Apache and PHP 5.2.13.
Thanks for the article. I had previously only used htaccess. This gives me something else to think about.
I like the idea of offloading redirects from the .htaccess to the custom 404 page. I hadn't thought about sending a 301 HTTP header after the Web server already decided to display the custom 404 page.
Few questions:
-What's a "smack-down?"
-Why is the article called "URL Rewrite Smack-Down: .htaccess vs. 404 handler" if it's about URL redirects? Is URL Rewrite the arena in which the .htaccess and 404 handler are smacking down? :P
-How does this work if the file does still exist? The custom 404 page will never be served, right? In my experience URL rewrites are done on top of the underlying Web application, and all of the original files still exist.
To any readers having trouble implementing the PHP redirects: it's best practice to add exit; after you're done sending headers for the redirect so the page doesn't accidentally continue loading.
Example:
header( 'Location: https://www.example.com/', TRUE, 301 );
exit;
I'm not saying the 404 handler is the better solution all of the time--in fact, on an Apache server, the URL rewrites are harder to do in a 404 handler due to the absence of a built-in way to invisibly transfer execution to another page.
True, the title would be more accurate if it mentioned redirects AND rewrites, since it's equally about each of those two.
If there was already a file with the same name, and you actually intend a DIFFERENT file to deliver the content, then yes, the original file would be served and the 404 handler ignored. In this case, you'd just wipe the content of the original file and replace it with a plain old 301 redirect (or, you could do a 301 in the .htaccess, which I believe will be processed first, looking for rewrites to "save it" before declaring a 404).
I haven't seen any cases where a large number of the new URLs match previous URLs, but I imagine that could happen if you were doing a mild change to your URL structure. Typically what I've seen is either a complete conversion from parameterized URLs, or else a "richening" of URLs by adding an intermediate folder name or prefix/suffix to the leaf name.
Good point about adding the exit(), by the way :-) Thank you!
Great post. I am not a programmer, but I can finally understand what exactly occurs with a URL rewrite and how to direct the process to replace the parameters with more useful keywords.