Apache webservers have a really cool and useful little feature called RewriteRule. It's sometimes referred to as the Swiss Army Knife of URL manipulation, because it's got two very important, often-used abilities:
- URL rewriting: turning dynamic, parameterized URLs into readable, keyword-loaded URLs
- 301s: to tell the browser (or search bot) that a page has been moved
These are VERY different tasks, with VERY different results sent to the browser or search bot. RewriteRule, in your apache2.conf file, or vhosts, or your .htaccess file, does both tasks. Think Dr. Jekyll and Mr. Hyde, except...well, in the case of RewriteRule neither personality is actually evil.
So how does RewriteRule work?
It's a directive you put (often many times) in one of your webserver configuration files. Many people put RewriteRule in their .htaccess file, but really .htaccess should only be used for directory-specific configuration; to cover all files requested from your webserver, use your apache2.conf file or vhosts instead. That said, an .htaccess file can live in the root directory of your website and apply to all files on your site, and you can override it in a particular subfolder by putting a different .htaccess file there. If you choose to use .htaccess, make sure AllowOverride All is set for that directory in your server configuration, or Apache will ignore the file.
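As a rough sketch of that setup (the directory path /var/www/html and the Debian-style layout are assumptions, not something taken from this post), the server config allows overrides for the site root, and the .htaccess file there switches the rewrite engine on before any RewriteRule lines:

```apache
# In apache2.conf or your vhost file (the path is an assumed example).
# Lets .htaccess files under this directory override server settings.
<Directory "/var/www/html">
    AllowOverride All
</Directory>
```

```apache
# In /var/www/html/.htaccess (assumed location: the site root).
# RewriteRule lines have no effect unless the engine is switched on.
RewriteEngine On
```

This also assumes mod_rewrite itself is loaded; on Debian-style installs that is usually done with a2enmod rewrite followed by a restart.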
Every request made to the webserver by the browser (or bot) passes through this file. Keep in mind that for each page on your site a user accesses, several requests are made of the server: one for the core HTML, one for each style sheet, one for each Javascript file included, and one for each image on the page.
The URL requested is compared against the regular expression that's the first parameter in each RewriteRule statement in .htaccess. If it finds a match, then the 2nd parameter in the RewriteRule statement is the page that's actually going to be used to create the HTML that's sent back to the browser. That does NOT necessarily mean the page is redirected...for that, you need to use the [R] flag. But DO you need to use the [R] flag? Not so fast...
Don't confuse 301 redirection with URL makeovers for SEO. Both of these can happen in RewriteRule, and they look very, very similar. But like the screwdriver and the corkscrew in a Swiss Army knife, these have pretty radically different purposes.
You use a 301 when you want to tell the browser (or Googlebot) "it's not here anymore, go over HERE right now to get it...I've moved it permanently". You're:
- admitting it's not really there
- telling the bot or browser where it's been moved to
- telling the bot or browser that the move is permanent
- telling the bot that anything that used to link to the old page should link to the new page, because it's the same content, just relocated
If you're using RewriteRule for an "URL makeover", you're creating a mapping between the URLs for your pages that you show to the outside world (e.g. in your navigation, links within content, sitemap, etc.) to the underlying pages that actually generate the content. Most of the time, you're doing this because the underlying pages use parameters like product IDs, category IDs, etc. rather than the pretty keywords that make your URLs easy to read, and rank better for those keywords.
Example:
/products/details.asp?pid=11623&catid=42
Perhaps category 42 is necklaces, and product ID 11623 is your SKU for a certain amethyst necklace. The text, image URL, etc. for that necklace is probably stored in your database under a primary key of 11623 (the product ID of that necklace), and info about the category (such as the word "Necklaces") is probably stored in another table in your database.
When your webserver needs to show the page for that necklace, it can look up category ID = 42 in the database to find its name ("Necklaces") and display that in the title, meta description, breadcrumb links, etc. Because you're a smart cross-seller, you probably also pull from the database a list of other pieces of jewelry in that category and show links to those pieces on the page as well.
Then, of course, your webserver looks up all the product info: description, brand name, weight, links to photos, price, etc. from the database and plugs that into your template for the page. Voila, there's your very pretty necklace....and your very ugly URL.
You'd like the URL to be something more like:
/products/necklaces/purple-amethyst-necklace-11623
But you still want to keep all of the logic in your parameterized ASP page, because hey, it works, and also it's a pretty efficient way to pull all the stuff from the database and build the page for every item in your product catalog.
So you build your site with the pretty links with the keywords in them, and when someone clicks on one of those links, you want your webserver to simply figure out what the parameterized page is and get that page to generate and return the HTML....without letting the browser (or Googlebot) see what you're doing.
If you 301 to /products/details.asp?pid=11623&catid=42, Google will index the parameterized version...and the parameterized version will appear in the user's browser too.
Instant UN-makeover! Not what you wanted.
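For the necklace example, a minimal sketch of the makeover (no 301) might look like the following. It assumes the rule lives in the .htaccess file at the site root, that every necklace URL ends in a hyphen plus the numeric product ID, and that you write one such rule per category, since a plain RewriteRule can't look up "necklaces" = 42 in your database:

```apache
RewriteEngine On

# Pretty URL:  /products/necklaces/purple-amethyst-necklace-11623
# Served by:   /products/details.asp?pid=11623&catid=42
# ([0-9]+) captures the trailing product ID as $1; the keyword part of
# the URL is matched but otherwise ignored. No R flag, so the browser
# (or Googlebot) never sees the parameterized URL.
RewriteRule ^products/necklaces/[a-z0-9-]+-([0-9]+)$ /products/details.asp?pid=$1&catid=42 [L]
```

Because the keywords are ignored by this sketch, any slug ending in -11623 would return the same page; tightening that is a design choice left out here.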
Whether you're using .htaccess, or your 404 handler to do your rewrites, the decision about whether to 301 or simply rewrite is basically the same:
- if you want to tell Google the page is really somewhere else, do a 301
- if you want to show a pretty URL, but use an ugly URL behind the scenes to generate the HTML, don't use a 301
So what happens to the link juice if you DON'T use a 301?
Nothing. It's all there. You do your link-building externally to the pretty URL, and you link to the pretty URL from within your website. As far as Googlebot is concerned /products/necklaces/purple-amethyst-necklace-11623 REALLY exists, it's got all this great content (from your database), and when Googlebot requests that page, they get all that juicy content back, along with a swell little HTTP 200 (OK) status code.
Why would people confuse these two? Because the RewriteRule statement in .htaccess lets you do both, with very similar syntax.
RewriteRule syntax
For a simple URL makeover:
RewriteRule ^oldstuff\.html$ newstuff.html
This checks to see if a page named oldstuff.html was what was requested. If so, it transfers control to the file newstuff.html to generate the webpage and send it back to the client. The client (the bot or the browser) still thinks they're looking at a page called oldstuff.html. No 301.
Other notes: the ^ indicates the start of the page name, so that the rule will match oldstuff.html but not reallyoldstuff.html. The $ indicates the end of the filename, so that this rule will match oldstuff.html but not oldstuff.htmlly. That backslash in the middle? Well, that 1st parameter is a regular expression (often referred to as regex), and in regular expressions, . is a wildcard that matches any single character. Preceding it with a \ is called "escaping" the character, and indicates that we don't mean the wildcard character . but rather we actually mean a period.
Now, a 301:
RewriteRule ^oldstuff\.html$ newstuff.html [R=301,L]
This is a 301 redirect. It's a redirect because we used the R flag inside the []. It's a 301 because we put =301 after the R; if we'd left that out, it would be a 302 redirect, which indicates we just temporarily moved the page, and links to it wouldn't pass any link juice. 99% of the time you're going to want to use a 301, NOT a 302.
There are two flags inside the [] brackets, separated by a comma. The second one, the "L", stands for Last. It says that if the regex pattern matches the page that was just requested, then after whatever processing is done (in this case, the 301 redirect to newstuff.html) we can skip checking the page against any of the other rules in the .htaccess file. 99% of the time you'll want to use the L flag with your 301 redirects.
92% of the time you'll want to use the L flag with your non-301 rewrites. Why not 99%?
Sometimes it's helpful to have multiple rewrite rules applied to an incoming URL. Let's say you have a number of first-level folders which you want to rewrite, plus you have a number of subfolders you want to rewrite as well...each of which occurs in all of the 3 first-level folders. You can do your main folder name substitution in one RewriteRule (preserving the next level folder as-is for now), then apply a second RewriteRule that preserves the just-updated top folder while rewriting the next folder down.
Example:
Original URL:
- /prods/metal17/necklace-11623.htm
RewriteRule #1 might substitute /jewelry-products/ for /prods/ so now you have:
- /jewelry-products/metal17/necklace-11623.htm
RewriteRule #2 might substitute /gold/ for /metal17/ giving you:
- /jewelry-products/gold/necklace-11623.htm
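The two-step substitution above could be sketched like this, assuming the rules sit in the .htaccess file at the site root and that the content is actually reachable at the final path (otherwise you'd add R=301 to the last rule to redirect instead):

```apache
RewriteEngine On

# Rule 1: swap the top-level folder and keep the rest of the path as-is.
# There is no L flag, so processing continues with the next rule.
RewriteRule ^prods/(.*)$ jewelry-products/$1

# Rule 2: works on the output of rule 1 and swaps the second-level folder.
# L stops processing here once both substitutions are done.
RewriteRule ^jewelry-products/metal17/(.*)$ jewelry-products/gold/$1 [L]
```

A request for /prods/metal17/necklace-11623.htm passes through both rules and comes out as /jewelry-products/gold/necklace-11623.htm; a request under /prods/ with some other second-level folder is only touched by rule 1.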
Now, for bonus points, let's say we have an entire catalog of jewelry pieces in there, each with a glorious photo named [product ID].jpg. How very convenient for our database and our programmer. How terribly sucky for SEO for image search. Remember how I said that requests for images go through .htaccess as well? You can use RewriteRule to map the name of the image to something more friendly too, so that you can show Googlebot an image named something like:
- /images/necklaces/gold/amethyst-11623.jpg
Instead of the real filename:
- /images/prods/11623.jpg
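A minimal sketch of that image mapping, assuming a root-level .htaccess and that every friendly image name ends in a hyphen plus the product ID:

```apache
RewriteEngine On

# Friendly name: /images/necklaces/gold/amethyst-11623.jpg
# Real file:     /images/prods/11623.jpg
# The two folder names and the keyword are matched but discarded;
# only the trailing product ID ($1) picks the real file.
RewriteRule ^images/[a-z-]+/[a-z-]+/[a-z-]+-([0-9]+)\.jpg$ /images/prods/$1.jpg [L]
```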
Now, RewriteRule isn't the only way you can do a redirect or do an URL makeover. Next week I'll post about how to do this in your 404 error handler--there are some advantages to doing it there instead, including ease of debugging your translations, ability to translate from words in the URL to IDs by looking them up in your database, and performance benefits for large sites.
Some references for brave readers:
A simple guide to .htaccess from YOUmoz:
Help with regular expressions:
Info on setting up 404 handler in Apache:
Info on 301s in htaccess:
A past post I did on writing your own URL rewriter from scratch:
Since the first time I signed in to SEOmoz, I've been building a folder in my favorites dedicated to posts like this one.
And even though the marketing kind of posts are the ones I like the most, because they strike my entrepreneurial chords, I have to admit that the technical SEO posts are the ones I re-read the most, as they are always so useful for people like me who come to the SEO world from the marketing/content field and not from the dev universe.
Thank you MichaelC, and thanks for the useful collection of links at the end of the post too.
P.S.: if I can make a request about Rewrite, it would be some sort of "Dumb Guide about SEF" series for all the most common platforms (Apache, ASP, ASP.Net)
wow this is pretty much the exact same comment I was going to write haha.
Great post! I also tend to lean towards the more creative marketing posts because I'm also originally from the world of marketing as opposed to dev (I'm self taught there like I think a lot of us are ;) )
I really appreciate these posts that break down some of the technical aspects of SEO and explain why we need to do certain things and what is going on behind the scenes when we do.
Loved the E-commerce necklace example.
Added to my bookmarks :)
Thanks Mike, it's good to know the nerdier posts like mine are appreciated :-)
Nerdier posts like yours are not only appreciated, they're sorely needed for code challenged nerd wanna be's like me.
Ok, that's the best explanation of 301-redirects vs url-rewriting for dummies I've ever read. Still a smidge technical for me, but I think if I re-read it a few times it might make sense :)
Thanks! I'll admit I spent a lot of hours pulling my hair out trying to figure this stuff out the first time I worked with it. It's a classic case of the simple example is simple, but any real-world example is much, much tougher to actually do.
I'm pretty much off the deep end on the technical side, and I struggled with it, so don't feel bad! One of the issues is that while RewriteRule is powerful (and so are regex's), it's pretty black-box...you put stuff in, close your eyes, and it either works or doesn't. Makes it tough to debug...and regex analyzers help, but only take you halfway.
Awesome, very well explained post (love the illustrations!). This twin functionality of .htaccess used to really confuse me; the [R=301] rules made perfect sense but I found the 'URL makeover' rewrites confusing, particularly because the URL you 'end up' on is coded in the 'opposite order' to a 301. If I'd had this post back then it would have cleared it up for me in no-time with minimal brain anguish ;)
A couple of questions for you, on a slightly different note:
Thanks :D
for question 2.
We have done quite a bit of 301s recently and it took probably up-to 3 weeks for Google webmaster tools to show that all is fine, it probably varies from site to site depending on how often your site is crawled and indexed.
I'm seeing about the same delays with a couple of clients' sites where they've undergone a big 301 migration. Google is taking a few weeks...the change in Yahoo was next-day. Bing seems to be taking its time as well.
Hi Jaamit
Neither bots nor browsers can cache a .htaccess file or its rules, as its content cannot be viewed by them, only the results. Plus, with rewrites they won't even know that anything has happened.
Bots will cache the files that have been rewritten/redirected, just as they would any normal file.
I would be interested to know how long servers take to parse the .htaccess file every time a request is made though (which I think is what you are alluding to in question 1). I have sites I've designed/coded with 400 products, each with its own line in the .htaccess file, which I produce automatically with a PHP file every time I add/edit a product. That does not affect speed noticeably. If the site is any larger I'll generally use Magento to do it for me. But I'd be interested to know what happens when you have 10,000+ lines. I would guess that it won't slow the server down, as it's just one of many processes the server goes through for every request.
On a lighter note though MichaelC (or possibly any other American for that matter). The way you said '...might substitute /jewelry-products/ for /prods/' just looks wrong to me. I would have said '...might substitute /prods/ for /jewelry-products/' ...or at least I think I would have. Maybe it's just a football (soccer) thing lol ...either way, it's posts like this that make me think I should sign-up to be a pro member - very good.
I 2nd this--the bots have no visibility to .htaccess (well, I suppose they COULD read it if they wanted to). Having said that, note the difference between a 301 and a 302:
But that's about all the caching kind of effect you might see.
Oh, and the /jewelry-products/ substitution: the idea was to end up with an URL with likely search keywords in it. If the resulting URL had "prods" in it, you'd be targeting users searching for something other than jewelry :-)
And one last comment: re performance, stay tuned for next week...I'm working on a blog post about where to do your redirecting/URL rewriting (.htaccess vs. 404 handler)...performance considerations, ease of debugging, flexibility, ability to pull bits from your database to form the URLs on the fly, etc.
Hi Jaamit, stay tuned for next week's post on .htaccess and 404 handlers for rewrites/301s. I've had a couple of clients lately I've done fairly monstrous redirection/URL prettying for, plus I did a pretty massive system for the honeymoon travel company I started as well. Performance is an issue, but also when you start getting complex in your rewriting schemes, debugging regex can be tricky as it's pretty black box.
Hey Michael, dude, I completely forgot to subscribe to this thread and didn't see all your and others' awesome answers! Thanks for the replies, off to look for your other post now ;)
This is where I got the idea about search engines caching 301s from: a great article from redirect legend Ian McAnerin: https://mcanerin.blogspot.com/2009/09/what-you-probably-dont-know-about.html
Lots of good content in this article, and it really explains URL rewriting well.
Do you have any suggestions for where to get this kind of information if you're running IIS (with ASP.Net) rather than Apache. The RewriteRule and .htaccess files are not part of IIS so I was wondering if this same kind of functionality is possible in IIS.
It's nice to see some more technical posts! ModRewrite can definitely be extremely useful, but very confusing. I took a stab at giving some basic intuition on how the syntax works: https://mrcoles.com/blog/simple-way-understand-mod-rewrite/
For your specific example with the amethyst necklace, would you need two rules like this:
RewriteRule /products/details.asp?pid=11623&catid=42 /products/necklaces/purple-amethyst-necklace-11623 [R=301,L]
RewriteRule /products/necklaces/purple-amethyst-necklace-11623 /products/details.asp?pid=11623&catid=42
The first rule would prevent duplicate links, and the second would make the preferred link serve the content from the old link? And I think this would work, since the rules are applied in order—please correct me if I'm wrong :)
Also, examples like this product url are why I love using frameworks that make pretty urls easy—such as django or ruby on rails.
(edit: fixed formatting)
And...you have to make sure you don't lock yourself up in an infinite loop (been there, done that, felt stupid). I think your example is correct but then again I've coded a lot of RewriteRules that I thought were correct. And I was wrong :-(
You inspired me to test my code. For my configuration I needed a [PT] flag on the 2nd rule to allow post-processing, so it would look like this:
RewriteRule /products/details.asp?pid=11623&catid=42 /products/necklaces/purple-amethyst-necklace-11623 [R=301,L]
RewriteRule /products/necklaces/purple-amethyst-necklace-11623 /products/details.asp?pid=11623&catid=42 [PT]
(edit: fixed formatting again)
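One caveat on the pair of rules in this exchange: the RewriteRule pattern is only matched against the URL path, never the query string, so a pattern containing ?pid=11623&catid=42 will not match as written. A hedged sketch of the same idea, assuming a root-level .htaccess, tests the query string in a RewriteCond instead. Checking THE_REQUEST (the original request line from the client) is one common way to avoid the infinite loop mentioned above, because the internal rewrite in the second rule does not change it:

```apache
RewriteEngine On

# 1) 301 only when the client itself asked for the parameterized URL.
#    The trailing ? on the target drops the old query string.
RewriteCond %{THE_REQUEST} \s/products/details\.asp\?pid=11623&catid=42\s
RewriteRule ^products/details\.asp$ /products/necklaces/purple-amethyst-necklace-11623? [R=301,L]

# 2) Internal rewrite: the pretty URL is served by the parameterized page.
RewriteRule ^products/necklaces/purple-amethyst-necklace-11623$ /products/details.asp?pid=11623&catid=42 [L]
```

Whether a [PT] flag is also needed on the second rule depends on the configuration, as noted in the comment above.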
Great stuff! FYI, the HttpFox Firefox plugin is awesome at seeing the full set of HTTP response codes sent by the server.
I use it extensively when answering Q&A questions here at SEOMoz--it often makes me look much smarter than I am :-)
use google webmaster tools to find 404 errors and fix with 301s.
Definitely...excellent suggestion. 404s are not only wasted link juice, but wastes of potential real visitors!
Rewrite rules always make me break out in a sweat. I have read dozens of articles and rewrite specs to get what I needed to know, but this post was easy to understand.
I look forward to the next installment.
good and interesting post
Good post. It's good to see someone do a simplified and easy-to-understand post about redirects. This should help a lot of people out.
Yes, nice article. I have a question about the "URL makeover", also known as "friendly URLs". What about search engines? Using a "URL makeover" may cause both of your URLs to be indexed (due to link references or the sitemap)... I have already experimented with this. Both links were in the Google index, both with the same content. That means both are duplicates, and as far as I know Google may give you a penalty because of this. Am I right? Any comments on this? Regards, Alex
Missed a very popular YouMoz article that covers writing your first rule:
https://www.seomoz.org/ugc/making-website-urls-seo-friendly-and-pretty
Hi all,
What happens when I have to rewrite roughly a million pages?
It is a huge portal in the classifieds market.
Is the .htaccess redirect OK?
Putting all the redirections in line by line could generate hundreds of thousands of lines.
Or is it better to use refresh or meta tags on the pages?
Do any of you have experience moving a big portal's pages with 301s?
Thanks in advance
M
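For very large redirect sets like the one asked about here, one option worth knowing is mod_rewrite's RewriteMap, which keeps the old-to-new pairs in a lookup file instead of one RewriteRule per page. This is only a sketch: the map name redirects and the file path are made up for illustration, and RewriteMap has to be declared in the server or vhost config rather than in .htaccess:

```apache
# In the vhost config (not .htaccess). Map name and path are assumptions.
# Each line of the text file is "old-path new-path", for example:
#   fld1/pg1.asp /fld/fld-1/
RewriteEngine On
RewriteMap redirects "txt:/etc/apache2/redirect-map.txt"

# Only redirect when the requested path has an entry in the map.
RewriteCond ${redirects:$1} !=""
RewriteRule ^/(.+)$ ${redirects:$1} [R=301,L]
```

For a map with around a million entries, a dbm: map built from the text file with Apache's httxt2dbm tool is likely a better fit than the plain txt: type.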
Great post. does any one have any contact details for a URL-rewrite developer for a new Php project?
I want to rewrite my website URLs. My website is in classic ASP. I have ISAPI Rewrite on my hosting server.
I want to rewrite and redirect my URLs as follows.
https://www.mysite.com/fld1/pg1.asp
to
https://www.mysite.com/fld/fld-1/
Can anyone tell me the rule for it? I've tried many rules but they're not working.
Looking forward to hearing back from you all.
Thanks,
Rau.
So it does mean that a 301 redirect is really better.
Thanks for posting Michael! Kudos to you!
Definitely a 301 is better than a 302 if you need to move an URL.
But the key here is that RewriteRule does 301's, but ALSO does URL prettying. Syntax is very similar, but they're two TOTALLY different tasks.
With 301s, you WANT the search engines to know that the page is now someplace else.
With URL prettying, you want the search engines to see a nicely keyword-decorated URL, but actually...hidden, behind the scenes....use a different page on your server to generate the HTML to be sent back to the client.
Hi
Great post and great explanation.
A couple of questions.
1) In a recent article Matt Cutts suggested that 301 redirects lose some value, obviously a large pinch of salt is added but have you been able to replicate this?
2) I have also come across a couple of sites recently that have a range of legacy 301 redirects to each others i.e. www.homepage.com redirects to www.homepage.com/en and in turn redirects to www.home.com and then to www.home.com/en. Yep, hard enough to write let alone explain and yes, what a mess. Whats your take on this practice? What are the associated risks.
From what we've seen there's some reduction of link juice when passed through a 301....something less than 10% is lost (see this page on Redirection).
Multiple 301 redirects are definitely something I'd avoid like the plague. I had a Q&A question from a PRO member recently about some ranking issues and it turned out that not only were there multiple redirections, but the first one in the chain was a 302 :-(.
Making it tough for the search engines to follow your linking is probably not a recipe for great rankings!
And that brings up another point: when doing 301s, you should always follow up and try to get the original pages to link to the new URLs. And that includes your own nav links :-). No point in throwing away any link juice at all.
Thanks Michael
Thanks for sanitising my thoughts.
G
This is a fantastic post! There is surprisingly little good information out there that differs redirection from rewriting; this really boils it down.
This should be required reading for anyone who spends a lot of time using Wordpress. It's easy to flip the switch on rewrites in Wordpress, but a lot of people have problems because the codex doesn't explain how all of this actually works. It took me a long time to figure it out, this would have been helpful! :)
Thanks Jeffrey--perhaps it's worth a blog post to talk about custom URLs in Wordpress, plugging in the post name and category parameters, etc. I just finished doing this for a client actually and like you say, ONCE you figure it out, it's easy, but the docs are kinda skinny :-)
Great post, but the last link on the page is broken.
fixed!
Whoops, good catch, thanks! All fixed now...
What does [QSA] actually mean?
And could you give an example?
Thanks
OK, I'm lost...where did you see QSA?
Sorry, I should have explained!
One of our client sites was using [QSA], in their .htaccess like this:
^some-example/([a-zA-Z0-9\_\-)+])/$ some-example.php?v1=$1 [QSA]
Now I got to find out what the ([a-zA-Z0-9\_\-)+]) meant, but I couldn't find anything that could really tell me what QSA was supposed to achieve.
All I found was that QSA stood for Query String Append and I still don't get it!
Here's a pretty good explanation of QSA...search the page for QSA, it's about 1/3 the way down.
Thanks! :)
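To make the QSA behavior in this exchange concrete: when the target of a rule has its own query string, Apache normally throws away whatever query string the visitor sent; QSA tells it to append the visitor's query string instead. A small sketch based on the rule quoted above (with the character class tidied up, and page=2 as a made-up extra parameter):

```apache
RewriteEngine On

# Request: /some-example/abc/?page=2
#
# With QSA the rewritten request is:
#   some-example.php?v1=abc&page=2
# Without QSA (flags [L] only) it would be:
#   some-example.php?v1=abc        (page=2 is lost)
RewriteRule ^some-example/([a-zA-Z0-9_-]+)/$ some-example.php?v1=$1 [QSA,L]
```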
Very good writing job!
It's precise and has good explanations even though I think you could improve on the illustrations. Do it more "rand"-style (comic > real life images).
And here I thought I'd saved y'all the horror of my amateurish Paintshop Pro skills! :-)
I had the opposite reaction. I thought the graphics were great! I loved the monster view of the server and as far as Mr. Googlebot, I found myself wanting one.
Thumbs up for the graphics.
Thanks :-)
This is a great post - especially for those who are new to SEO site audits or don't have a developer handy.
Getting a handle on current redirects / lost pages is a great first step for an SEO audit. Right out of the gate it's pretty easy to identify if someone has set up 302 redirects where 301 redirects are needed. It's also not too difficult to dig up lost legacy pages with loads of inbound links and set up appropriate redirects.
Thanks...and that reminds me, on the subject of 301s/302s, there's a Very Large and Well Known and Inexpensive registrar (surely everyone knows who I'm talking about by now) that by default does a very wacky implementation of redirecting which starts with returning a 302, but has multiple redirects before completing. I don't recall exactly but I think it has to do with either redirecting an entire domain, or redirecting the non-www to the www. version of your site.
Either way, my point here is I'm a big fan of the HttpFox Firefox plugin to see the ENTIRE sequence of HTTP response codes. In the example above, when I was trying to figure out the problem someone in SEOMoz Q&A was having, some tools were telling me it was returning 302, others 301. HttpFox shows the full set, which was very illuminating (and frustrating, but at least then we knew the problem).
This has given me a great push in the right direction... I've been trying for an embarrassingly long time to figure out how to let my site editor make search engine friendly URLs. I've found the RewriteRule spot in the htaccess.txt. My site editor says this needs to be changed to .htaccess before I click the "yes" to better URLs... is this the right one?
If you're using Joomla, here's some helpful info on that.
Right now, I'm using Mambo... it's sadly old and frustrating, but I don't think my boss will want to move anytime soon :)
OK, when you're looking at .htaccess files, keep in mind they're folder-specific. So, if you're trying to do something site-wide, make sure you're editing the one that's in the root folder for the website.
A great way to test that your .htaccess file is really getting read (i.e. after you rename it, make sure the Rewrite engine is on, etc. etc.) is to add a 301 for a bogus page, then try to hit that page, e.g.:
RewriteRule ^bogus\.htm$ contactus.htm [R=301,L]
And if it IS getting read, you should get your contactus.htm page when you put www.yoursite.com/bogus.htm into your browser's address bar.
Doing it this way means that while you figure out whatever settings are needed to get this operational you're not messing with any users hitting real pages on the site.
Okay, I've got the root htaccess, now just have to figure out the renaming... thanks for the bogus code, I'll definitely be checking that.
I got it to work! Definitely doing the newbie dance over here; it's taken me sooo long to get this figured out. Thanks so much for the info, it really helped!
Thanks for the Joomla tip. I've been learning it and every little bit helps.
I wasn't aware that there were multiple .htaccess files either.