Overview
- The core problem: your site uses parameter-happy URLs, but for SEO and user-friendliness you're dreaming of semi-readable URLs instead.
- You've got a lot of content, mostly coming from the database.
- The number of products, categories, subcategories, etc. means that the prospect of trying to create (and maintain!) rules for ISAPI_Rewrite or mod_rewrite makes you run screaming into the woods.
What the solution has to handle:
- Existing inbound links: redirecting the old URLs to the new URLs via 301 redirects
- Intra-site links: converting the parameterized URLs to the readable versions everywhere YOU link to them in your own site
- Serving content: when you get a request for the new URL, handling it with the parameterized page invisible to the user (and the search crawler)
- Duplicate content issues
For clarity (and because it's Case Study month at SEOmoz :-) we'll use my honeymoon registry and travel site, www.thebigday.com, for our examples. We group the resorts on our site by destination and sub-destination - e.g., Hawaii -> Maui -> Fairmont Kea Lani. I'll do the examples in classic ASP, but it should be very easy for you to see how to convert my logic to PHP, ASP.NET, etc.
We serve up a hotel page using an ASP page called /Package.asp. It takes three numeric parameters: one for the destination (Hawaii in this example), one for the sub-destination (the island of Maui), and one for the resort itself (Fairmont Kea Lani). What we really want to show is something more like this:
/TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm
Note that I've worked another key phrase that's important (to me, anyway) into the URL ("travel specials") for SEO purposes. "ISAPI_Rewrite?" you say? Fine, if you have just a handful of categories and product names...and they never (or rarely) change. In this example, our rewrite "rules" essentially translate between names and ID numbers by looking up either one in the database.
How it's going to do its magic:
Inbound Parameterized Links
You want to 301 redirect
/Package.asp?dest=2&subdest=51&resort=123
to
/TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm
You need a function that creates the readable URL from the parameters:
Function MakeFancySchmancyUrl (nDestID, nSubdestID, nResortID)
Dim sFancyURL
sFancyURL = "/TravelSpecials/" & GetDestName (nDestID) & "-" & GetSubdestName (nSubdestID) & "/" & GetResortName (nResortID) & ".htm"
MakeFancySchmancyUrl = sFancyURL
End Function
- GetDestName(), GetSubdestName(), and GetResortName() are functions you need to write that retrieve the English name of the component given its ID, BUT you need to do a little "cleanup" on all of the names to make sure you get decent URLs coming out the other end.
Here's an example of a resort name that would behave very badly as part of a URL without cleanup:
St. Regis Princeville, Kaua'i
Essentially, you'll want a function that simply removes any non-alphabetic character from the name and returns the (probably shortened) result, and each of the Getxxxx() functions must call this on the names they return. Some people embed the IDs in the URL in addition to the names. That does simplify the lookup process, I'll admit, but I think it reduces the readability of the URL to the user and doubles the number of "words" in the URL that the search engine might be looking at. To me, it's the equivalent of putting duct tape on your website.
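Here is a minimal Python sketch of the cleanup function together with the URL builder, for readers porting the ASP logic. The dictionaries DESTINATIONS, SUBDESTINATIONS and RESORTS are hypothetical stand-ins for the database lookups that GetDestName(), GetSubdestName() and GetResortName() would perform, and the names clean_name() and make_fancy_url() are illustrative only.

```python
import re

# Hypothetical ID -> name tables standing in for the database lookups
# done by GetDestName(), GetSubdestName() and GetResortName().
DESTINATIONS = {2: "Hawaii"}
SUBDESTINATIONS = {51: "Maui"}
RESORTS = {123: "The Fairmont Kea Lani, Maui"}


def clean_name(name: str) -> str:
    """Remove every character that is not an ASCII letter."""
    return re.sub(r"[^A-Za-z]", "", name)


def make_fancy_url(dest_id: int, subdest_id: int, resort_id: int) -> str:
    """Build the readable URL from the three numeric parameters."""
    dest = clean_name(DESTINATIONS[dest_id])
    subdest = clean_name(SUBDESTINATIONS[subdest_id])
    resort = clean_name(RESORTS[resort_id])
    return f"/TravelSpecials/{dest}-{subdest}/{resort}.htm"


print(clean_name("St. Regis Princeville, Kaua'i"))  # StRegisPrincevilleKauai
print(make_fancy_url(2, 51, 123))  # /TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm
```

Because the cleaned names contain only letters, the hyphen between destination and sub-destination can never appear inside a name, which keeps the URL easy to split apart again later.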
Intra-Site Links
You'll need to go through your site, find all the places that reference your parameterized URL (e.g., /Package.asp) and replace those with a call to MakeFancySchmancyUrl().
Safety net: Keep in mind that if for some reason you miss converting any of your in-site links, the mechanism for 301'ing inbound links will take care of those for you.
Now, your parameterized ASP page is going to be called in two ways:
- By users or search engines (in which case it needs to 301 to the readable URL)
- By your 404 handler (next topic, don't worry!), in which case you DO NOT want to redirect...you want to follow through the logic on that page to actually produce the HTML content
When someone clicks a link to /TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm, whether it's a link on your site, in a SERP, or on another site, we've got a little magic to perform, as there isn't really a page with this name (or with those folder names, either!). You'll need to create a custom 404 error handler (if you haven't already), and in it, look for these requests and hand them over to the /Package.asp page to show the content.
Here's our example:
On Error Resume Next
Dim iPos, sPageHit, cnList, cmdList, sUserID, rsUserWebPage, sGuestPassword, chTmpRegTypes
Const sThisPageDomain = "thebigday.com"
' IIS passes the original URL in the query string, e.g. 404;https://www.thebigday.com/...
sPageHit = Trim(Request.QueryString)
' Skip past "://" and find the slash that ends the domain name
iPos = InStr (InStr (sPageHit, "://") + 3, sPageHit, "/", 1)
sPageHit = Right (sPageHit, Len(sPageHit) - iPos)
sPageHit = LCase (sPageHit)
Dim sPageLeaf, iDomainEnd
sPageLeaf = LCase (Trim(Request.QueryString))
iDomainEnd = InStr (sPageLeaf, sThisPageDomain)
If (iDomainEnd > 0) Then
    ' Keep only the path after the domain, e.g. /travelspecials/hawaii-maui/...
    sPageLeaf = Mid (sPageLeaf, iDomainEnd + Len (sThisPageDomain))
End If
'See if it's one of our static URLs that needs to be converted:
If (Left(sPageLeaf, 16) = "/travelspecials/") Then
    Server.Transfer "/Package.asp"
End If
...
If it's not one of our magical virtual pages, then the logic continues on to actually display a 404 page. Note that Server.Transfer will delegate the responsibility of spitting out the page content to /Package.asp BUT the user will still see the full readable URL in the browser, and the browser will get a nice happy HTTP 200 OK response.
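The 404 handler's decision can be sketched in Python as below. This assumes the IIS query string format shown later in this post (404; followed by the full URL); SITE_DOMAIN mirrors sThisPageDomain from the ASP code, and page_leaf() and is_virtual_page() are hypothetical names.

```python
# Assumed to match sThisPageDomain in the ASP handler.
SITE_DOMAIN = "thebigday.com"
VIRTUAL_PREFIX = "/travelspecials/"


def page_leaf(query_string: str) -> str:
    """Return the lowercased path that follows the domain in a 404 query string."""
    leaf = query_string.strip().lower()
    pos = leaf.find(SITE_DOMAIN)
    if pos >= 0:
        # Drop everything up to and including the domain name.
        leaf = leaf[pos + len(SITE_DOMAIN):]
    return leaf


def is_virtual_page(query_string: str) -> bool:
    """True when the request is for one of the rewritten /TravelSpecials/ URLs."""
    return page_leaf(query_string).startswith(VIRTUAL_PREFIX)
```

Anything that fails is_virtual_page() falls through to the normal "page not found" output, just as in the ASP version.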
In /Package.asp, you'll need to:
- Parse out the destination, sub-destination, and resort name
- Look up each in the database and get the parameter equivalent
- Fetch whatever data from the database you need for the destination, sub-destination, and resort to display the content on the page
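The first of those steps, splitting the readable path back into its three names, might look like the following Python sketch. It assumes the /TravelSpecials/Dest-Subdest/Resort.htm layout used throughout this post; parse_fancy_path() is a hypothetical name, and the returned names are still in their cleaned form, ready for the database lookup.

```python
def parse_fancy_path(path: str):
    """Split /TravelSpecials/<Dest>-<Subdest>/<Resort>.htm into its three names.

    Returns None if the path does not have the expected shape.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0].lower() != "travelspecials":
        return None
    # Cleaned names contain only letters, so the hyphen is an unambiguous separator.
    dest, sep, subdest = parts[1].partition("-")
    resort, dot, ext = parts[2].rpartition(".")
    if not sep or not dot or ext.lower() != "htm":
        return None
    return dest, subdest, resort
```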
Handling the 404 Handler Bit
In IIS, anyway, the 404 handler receives the originally requested URL in its query string, prefixed with "404;". The full query string for our example would be:
404;https://www.thebigday.com/TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm
So, just look for:
404;http
as follows:
sFullQueryString = LCase (Request.QueryString)
If (Len (sFullQueryString) > 8) Then
    If (Left (sFullQueryString, 8) = "404;http") Then
        Call ExtractResortParms (sFullQueryString)
    End If
End If
Our function ExtractResortParms() above will parse the query string, pull out the destination name, subdestination name, and resort name, and attempt to look those up in the database.
If anyone would like to actually see what my version of ExtractResortParms() looks like, email me...it's not very exciting, just fun & games with Mid(), Left(), and InStr() etc.
Now, remember that the resort name, etc. in the URL generally isn't going to match what's in the database, because spaces and punctuation have been stripped out: Fairmont Kea Lani became FairmontKeaLani. So you won't be able to do an indexed lookup on the name. Instead, you'll have to retrieve the whole set of possible names into a record set and walk it, running your name cleanup function on each name, THEN checking whether it matches what you extracted from the URL. If those record sets are going to be very big (say, over 100 records), you'll want to do a little optimization for performance. For us, the lists of destinations and sub-destinations are both short enough that we don't worry about this, but for the resort name, we parse the destination and sub-destination first, then retrieve just the resorts that match those, which gives a much smaller list. An alternative that performs quite well is to add a column to the database table for the "cleaned" name, call your cleanup function in the content management page where you add or edit the content element, and put an index on the new cleaned-name column.
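Both approaches are sketched below in Python. RESORT_ROWS is hypothetical sample data standing in for the record set, and find_resort_id() and build_cleaned_index() are illustrative names. The comparison is case-insensitive because the 404 handler lowercases the URL.

```python
import re


def clean_name(name: str) -> str:
    """Remove every character that is not an ASCII letter."""
    return re.sub(r"[^A-Za-z]", "", name)


# Hypothetical rows as they might come back from the resorts table.
RESORT_ROWS = [
    {"id": 123, "dest_id": 2, "subdest_id": 51, "name": "The Fairmont Kea Lani, Maui"},
    {"id": 124, "dest_id": 2, "subdest_id": 51, "name": "Another Maui Resort"},
    {"id": 200, "dest_id": 2, "subdest_id": 52, "name": "St. Regis Princeville, Kaua'i"},
]


def find_resort_id(rows, dest_id, subdest_id, cleaned):
    """Walk only the resorts in the already-resolved destination and
    sub-destination, cleaning each name before comparing it."""
    for row in rows:
        if row["dest_id"] == dest_id and row["subdest_id"] == subdest_id:
            if clean_name(row["name"]).lower() == cleaned.lower():
                return row["id"]
    return None


def build_cleaned_index(rows):
    """The precomputed alternative: key each resort by its cleaned name,
    much like an indexed cleaned-name column in the database."""
    return {
        (r["dest_id"], r["subdest_id"], clean_name(r["name"]).lower()): r["id"]
        for r in rows
    }
```

The index version does the cleanup once per edit instead of once per request, which is why it scales better as the resort list grows.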
301 Redirection Bit
If you didn't see 404;http at the beginning of the query string, then you've probably been linked to using the parameterized URLs and need to 301 to the readable version. "But," you ask, "since you have the parameters now, why not just look up the friggin' content and show it now?" Because, grasshopper, you want any link juice from your old URLs to be carried over to the new readable URL. So, pull the parameters out of the query string.
For example, the link will be something like
/Package.asp?dest=2&subdest=51&resort=123
And the redirect, using that fabulous function you wrote earlier to make your readable URLs:
Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", MakeFancySchmancyUrl (nDestID, nSubdestID, nResortID)
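The same redirect step, including pulling the three IDs out of the query string, is sketched here in Python using the standard urllib.parse.parse_qs(). The make_url argument stands in for MakeFancySchmancyUrl() so the sketch does not depend on any particular database; redirect_for_legacy_url() is a hypothetical name, and there is no handling of missing or non-numeric parameters.

```python
from urllib.parse import parse_qs


def redirect_for_legacy_url(query_string, make_url):
    """Return (status, headers) for a 301 from the old parameterized URL."""
    params = parse_qs(query_string)
    dest_id, subdest_id, resort_id = (
        int(params[key][0]) for key in ("dest", "subdest", "resort")
    )
    location = make_url(dest_id, subdest_id, resort_id)
    return "301 Moved Permanently", [("Location", location)]


# Usage with a hypothetical lookup table in place of the real URL builder:
FANCY = {(2, 51, 123): "/TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm"}
status, headers = redirect_for_legacy_url(
    "dest=2&subdest=51&resort=123", lambda *ids: FANCY[ids]
)
```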
Gotchas
If you're renaming products occasionally, you could find yourself leaking link juice here and there...for example, let's say Princeville Resort is renamed to The St. Regis Resort, Princeville, and someone linked to our page a while ago as:
/TravelSpecials/Hawaii-Kauai/PrincevilleResort.htm
Of course, that's gonna get the user a shiny real-life 404 (and no link juice) as there will no longer be any resort found whose name "cleans" to "PrincevilleResort". Two options (your choice will depend on how frequently things get renamed):
- If they're few and far between, you can add a few manual 301s in your 404 handler.
- You can create a table of resort name history and add a record to it each time your content management code changes a resort name. Then, if your resort page handler doesn't find a match for the name, it looks up the requested cleaned name in this table.
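The name-history fallback might look like this Python sketch. CURRENT_NAMES and NAME_HISTORY are hypothetical in-memory stand-ins for the resort table and the history table, and resolve_resort() is an illustrative name. Returning a flag for old-name hits is one possible design: it lets the caller 301 to the current readable URL so the old link's juice is passed along.

```python
import re


def cleaned_key(name: str) -> str:
    """Cleanup function, lowercased for case-insensitive comparison."""
    return re.sub(r"[^A-Za-z]", "", name).lower()


# Hypothetical current names and rename history (resort id, old name).
CURRENT_NAMES = {200: "The St. Regis Resort, Princeville"}
NAME_HISTORY = [(200, "Princeville Resort")]


def resolve_resort(cleaned: str):
    """Return (resort_id, matched_old_name) or None if nothing matches."""
    target = cleaned.lower()
    for resort_id, name in CURRENT_NAMES.items():
        if cleaned_key(name) == target:
            return resort_id, False
    for resort_id, old_name in NAME_HISTORY:
        if cleaned_key(old_name) == target:
            return resort_id, True  # caller should 301 to the current URL
    return None
```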
If you've spent any time learning how your customers shop, you're well aware that the categorizations of your products that are most logical and convenient to you aren't likely to match the way your customers think about your products. You've probably already got a number of different ways to group your products, which means that a given product page might appear under several different URLs using the above scheme. In our case, we group resorts not only by destination, but also by type of experience and by brand. If that's true for you too, you need to tell the search engines which version of the rewritten URL is the "main" one, and that the others are really the same page. Time to use the new rel="canonical" trick. In our case, we have our categories (e.g., "all-inclusive", "spas", "luxury", etc.) coded as pseudo-destinations, so we look up the primary destination ID that the resort belongs to and fabricate the URL for that:
<link rel="canonical" href="https://www.thebigday.com<%=MakeFancySchmancyUrl (nDestID, nSubdestID, nResortID)%>">
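As a Python sketch of the same idea: whichever category URL the visitor arrived through, the canonical tag is always built from the resort's primary destination and sub-destination. PRIMARY_LOCATION is a hypothetical table, canonical_link() an illustrative name, and make_url again stands in for MakeFancySchmancyUrl().

```python
import html

# Hypothetical: each resort's primary (real) destination and sub-destination,
# as opposed to pseudo-destinations such as "all-inclusive" or "spas".
PRIMARY_LOCATION = {123: (2, 51)}


def canonical_link(resort_id, make_url, host="https://www.thebigday.com"):
    """Build the rel="canonical" tag from the resort's primary location."""
    dest_id, subdest_id = PRIMARY_LOCATION[resort_id]
    href = host + make_url(dest_id, subdest_id, resort_id)
    return f'<link rel="canonical" href="{html.escape(href, quote=True)}">'
```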
Conclusion
The above might LOOK like a lot of work, but seriously shouldn't take you more than a day, especially if you ask questions of people like me when you get stuck or confused :-)
Without doubt one of the best "how-to"s I've come across, not only for explaining the whys but for giving specific recommendations/coding/structure for actually doing it. Hope you don't mind, but we've archived/bookmarked this to roll out when we are discussing technical hurdles with well-meaning but SEO-inept web and DB developers. Hopefully, after reading this, they'll actually get it and see how important it is, not just for SEO but also for raising click-throughs.
I'd use lowercase URLs as a standard to prevent canonical issues.
Especially when clients are working in a CMS the url-thing needs to be foolproof.
Really good point there...thx for contributing that.
Just a quick followup...no matter what technique you're using to do your rewrites, I strongly recommend you use some sort of tool to see exactly what's coming down in your headers, to make sure the browser (well, crawler!) really does see 301s when you mean it to, and does NOT see 404 when your 404 handler does its magical redirection. For IE, there's an app called IEHttpHeaders; for Firefox, I use a Web Developer tools snapin from Chris Pederick: https://chrispederick.com/work/web-developer/ (then right-click on the page, select Web Developer ... Information ... View Response Headers).
The Web Developer extension only shows the response header of the current page you are on. For accurate HTTP request/response headers, I think HttpFox (https://addons.mozilla.org/en-US/firefox/addon/6647) does a good job!
Great post, I learned a lot from this post on URL rewrites which will help me in future for sure!
Thanks Michael :-)
Hi,
I'm still unsure about the 301 redirect. I've already added one, but I'm not clear on the "Redirect requests to this destination" setting: what new URL should I point it to? I'm working on localhost, using IIS 7 with ASP script.
Thanks
Using a custom 404 error page to rewrite dynamic URLs works well.
This is a neat trick! I love classic ASP.
Live example - https://www.cambodia-tourism.org/news/
Does anyone know how to convert this to a real URL instead of an ID?
I wrote a similar bit of code in PHP for my Sh0p.
The urls look like this:
/parent_category/child_category/child_category/product_name-id/
My htaccess looks like this:
RewriteRule ^(parent_category1|parent_category2|parent_category3)/([a-z0-9_/.-]+)?$ /?c=$1&raw=$2
Then in my PHP script, $_GET['c'] contains the parent category and $_GET['raw'] contains the rest of the URL. The script then explodes raw into an array with the forward slash as the delimiter. Eventually, with a bit of regular expression work, the script has an array of categories, a product name and a product id.
Then the genius bit:
$product_url = getProductUrl( $product_id ); // works out what the url should be by looping through categories
if ( $product_url != $_SERVER['REQUEST_URI'] ) {
    header( 'HTTP/1.1 301 Moved Permanently' );
    header( "Location: $product_url" );
    exit; // stop here so the page body isn't rendered after the redirect
}
So, if someone supplies a product id in the url, but the category is wrong, the script will automatically 301 them to the correct location. Also works great if you move products around into different categories, or even start from scratch with new categories. As long as you keep the product ids the same in the database, any links to your old product pages will continue to pass juice and you don't need a million htaccess rules.
It also means I can find something by just substituting the product id in the url instead of searching around the site for it. So it's great for lazy people, too!
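The commenter's "look up by ID, then 301 if the URL isn't the canonical one" idea can be sketched in Python as follows. PRODUCT_URLS is a hypothetical table standing in for getProductUrl(), and route() is an illustrative name that returns a status code and target rather than sending real headers.

```python
import re

# Hypothetical product id -> canonical URL table (the commenter's
# getProductUrl() derives this by walking the category tree).
PRODUCT_URLS = {42: "/widgets/blue/fancy_widget-42/"}


def route(request_uri: str):
    """Return (status, url): 200 for the canonical URL, 301 to it otherwise,
    404 when no product id can be found."""
    match = re.search(r"-(\d+)/?$", request_uri)
    if not match:
        return 404, None
    canonical = PRODUCT_URLS.get(int(match.group(1)))
    if canonical is None:
        return 404, None
    if canonical != request_uri:
        return 301, canonical
    return 200, canonical
```

Because only the trailing ID decides which product is shown, any stale category path still lands on the right page via a single 301.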
Wow... this was a very good post. I'm surprised by how much all the moz community knows.
Great post! For those who might not know, IIS 7.0 comes with its own URL rewrite module now, so there's no longer any need for ISAPI_Rewrite unless you're using an older version of IIS. There are some known issues with the IIS 7.0 URL rewrite module, but for the most part you can use Apache mod_rewrite rules interchangeably to rewrite URLs in IIS.
https://learn.iis.net/page.aspx/460/using-url-rewrite-module/
I love classic ASP. As I have very poor coding skills, I find it the easiest way to build simple dynamic websites.
Up-to-date resources on classic ASP are getting hard to find these days, and this makes your post very valuable for me.
Thanks
Hey Michael!
I totally used your Honeymoon Registry for my honeymoon and it was awesome! I've been looking for it since then (I couldn't remember the name, and you changed it) to recommend to a friend, and look where I find you, of all places!
(ok-on post topic)
Thanks for the in-depth review of the hand coding for a rewrite. I've been looking into doing some of these for one of our smaller sites. My question to you, and to the SEOmoz community in general - does anyone have any suggestions for doing this with a Yahoo! Store? I've not been able to find a nice, smooth, easy way to do this in the Yahoo framework yet. Any suggestions would be most helpful.
-Susan
Hi Susan,
Thanks for being our customer, and for referring others!
Unfortunately I don't know diddly about Yahoo Store coding...
Michael.
This would have been very handy about a month ago! We needed to create one in ASP.NET. I've forwarded it to my devs anyway, as I'm sure it will still be a useful reference for them.
Good post. For those on Apache, an easy approach is to use mod_rewrite with the same considerations suggested by Michael. And a search yields https://www.phpriot.com/articles/search-engine-urls/5 which seems to explain how you can achieve it. Cheers.
great post, glad to see it got promoted!