I've said it before, and no doubt I'll say it again: in my opinion, the number one reason websites fail online is poor information architecture. Don't worry, I'm not going to compare myself to Bond. Instead, I'm going to use Quantum of Solace to demonstrate how something as simple as categorizing a new action movie can lead to some serious problems in your site architecture. (Just for the record, I tend to use the phrases 'site architecture' and 'information architecture' interchangeably to cover a multitude of sins.)

The Illustrated Guide to Building a Search Friendly Website covers information architecture and its importance. I see far too many sites where very little or no thought has been put into the flow of information and, as the guide says, this will have a huge impact on the potential search engine rankings.

The most common information architecture issue I see is a site with no way for the search engines to get to every page. This usually happens when there is a steady stream of new content, often user-generated. There is, at best, a theoretical way for the spiders to get to every page, but it often means following a link buried deep in pagination.
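One way to check for this problem is to measure how many clicks each page is from the homepage. The following is a minimal sketch, assuming you already have a map of internal links (for example, from a crawl export). The URLs, the `click_depths` helper, and the depth cutoff of 3 are all made up for illustration.

```python
from collections import deque

def click_depths(links, home):
    """Breadth-first search from the homepage.

    Returns {page: number of clicks from home} for every reachable page.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site where older films are only linked from paginated listings.
links = {
    "/": ["/action/"],
    "/action/": ["/action/page/2/", "/film/quantum-of-solace/"],
    "/action/page/2/": ["/action/page/3/", "/film/casino-royale/"],
    "/action/page/3/": ["/film/goldfinger/"],
    "/film/orphan/": [],  # nothing links to this page
}

depths = click_depths(links, "/")
all_pages = set(links) | {t for targets in links.values() for t in targets}

unreachable = sorted(all_pages - set(depths))       # spiders cannot find these
deep = sorted(p for p, d in depths.items() if d > 3)  # assumed cutoff, tune per site

print("Unreachable:", unreachable)
print("Deeper than 3 clicks:", deep)
```

Pages in either list are candidates for a better place in the hierarchy or an extra internal link.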

A lot of clients I talk to struggle with the concept of information architecture. I'm going to talk through my thought process when I'm devising a website's architecture to try and help with this. I'll also touch on some of the common problems you come up against.

Most of the sites that I deal with have a large number of pages, which means the correct architecture is almost always a hierarchy of some sort. People tend to be quite comfortable thinking in terms of a hierarchy, but less comfortable coming up with one for their site.

The Benefit of Breadcrumbs

When I'm thinking in terms of a hierarchy I like to start with a page at the tip of what will become your tree. Unsurprisingly, this is often called a leaf. If you can't identify these pages there are bigger problems than information architecture to worry about. Often called the 'money pages', these pages are normally where a sale is made or where the user finds the information they are looking for.

Let's take a film site as our example. It's not really relevant that Quantum of Solace is out about two weeks earlier in the UK than the US, but it makes me feel pretty smug, and that's as good a reason as any to use it as the example. For a site that lists films, the page about the actual movie is the money page. Maybe it's the page selling the DVD or cinema tickets, or it could simply be the page with all the information about the movie (with a goal of advertising revenue).

Once you have identified your money pages, think about the ideal breadcrumb for that page. Don't worry about being generic or how it will work, just think about your ideal breadcrumb. For Quantum of Solace, you may have come up with something similar to this:

Home > Action Movies > Quantum of Solace

It is short so it's not many clicks from the homepage, and few people will argue about Quantum of Solace being an Action Movie. We could pick any number of films and slot them into a breadcrumb structure like the above. I also happen to know that there is search volume for the category in our hierarchy (i.e., action movies).

Home > Comedy > Monty Python and the Holy Grail
Home > Romance > Casablanca
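If each page records its parent category, breadcrumbs like the ones above fall straight out of the hierarchy. Here is a rough sketch of that idea. The `parent_of` mapping and the `breadcrumb` helper are illustrative names of my own, not part of any particular CMS.

```python
def breadcrumb(page, parent_of, sep=" > "):
    """Walk from a page up to the root and return its breadcrumb trail."""
    trail = [page]
    seen = {page}
    while page in parent_of:
        page = parent_of[page]
        if page in seen:  # guard against a broken, circular hierarchy
            raise ValueError("cycle in hierarchy at " + page)
        seen.add(page)
        trail.append(page)
    return sep.join(reversed(trail))

# Each page or category points at its single default parent.
parent_of = {
    "Action Movies": "Home",
    "Comedy": "Home",
    "Romance": "Home",
    "Quantum of Solace": "Action Movies",
    "Monty Python and the Holy Grail": "Comedy",
    "Casablanca": "Romance",
}

crumb = breadcrumb("Quantum of Solace", parent_of)
clicks_from_home = crumb.count(" > ")

print(crumb)             # Home > Action Movies > Quantum of Solace
print(clicks_from_home)  # 2
```

Giving every page exactly one parent is what keeps the breadcrumb unambiguous.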

The problem arises when there are too many options at any given level. The homepage is probably okay; according to IMDb.com there are 27 genres. That's a pretty decent number of links for a top level in a hierarchy. The issue comes when you look at the number of films that would be in each category. According to IMDb there would be 26,469 action films. The maximum number of links you can get away with on a page will depend on your site, but a good rule of thumb is to keep it close to 100. Certainly 26,469 links is way past the cutoff point.
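To make the numbers concrete, here is a small sketch that counts the links each category page would carry and flags any over the rule-of-thumb limit. The genre and film names are placeholders. Only the counts (27 genres, 26,469 action films) and the limit of roughly 100 come from the text above.

```python
import math
from collections import Counter

MAX_LINKS = 100  # rule of thumb from the text; the right figure depends on your site

def oversized(parent_of, max_links=MAX_LINKS):
    """Return {category: link count} for categories with too many children."""
    counts = Counter(parent_of.values())
    return {cat: n for cat, n in counts.items() if n > max_links}

# Placeholder data: 27 genres under Home, 26,469 films under Action.
genres = ["Action"] + [f"Genre {i}" for i in range(2, 28)]
parent_of = {genre: "Home" for genre in genres}
parent_of.update({f"Action film {i}": "Action" for i in range(1, 26470)})

too_big = oversized(parent_of)

# Minimum number of sub-pages needed to bring Action under the limit.
pages_needed = math.ceil(26469 / MAX_LINKS)

print(too_big)       # {'Action': 26469}
print(pages_needed)  # 265
```

So even a perfectly even split would need 265 pages below Action. That is itself more than 100 links, so one extra level is not enough on its own.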

There are two options available if you find yourself with too many links in a category. The most obvious is to split the category by adding in sub-categories. In this case we could add in a sub-genre. I'm not aware of any universally known sub-genre system, so this looks unlikely to work. It's important that any sub-category you add doesn't confuse your users. If they don't know which sub-category a film is in, they have to start guessing, which means your hierarchy has failed.

The second option is to add a level above the problematic category. For example, we could add the release date as a category above the genre. When adding categories high up in the chain, you need to think about keyword cannibalisation. Let's add the release date and see what happens:

Home > Movies > 2008 > Action > Quantum of Solace

With this hierarchy we end up with a page for 2008 action movies, one for 2007 action movies, one for 2006 action movies, and so on. That's not a good situation for a whole heap of reasons. Firstly, rather than just one page targeting "action movies", we now have one page for each year that a movie has been released - that sounds like keyword cannibalisation to me. The second problem is writing content for the action movie page for each year - it's going to be tough to write something compelling without introducing duplicate content. Finally, I strongly suspect there is low search volume for "2008 action movies".

Adding high level categories isn't always a bad thing. You'll notice I've sneaked Movies in as a category, as I think that it's unlikely to lead to keyword cannibalisation in the same way as the date hierarchy would.

My final tip when thinking about your site architecture is that you can sometimes move categories around to help with having a small number of links per category. For example, we can avoid duplicate content and reduce the number of films per category by moving the date hierarchy I added earlier to below the genre category. This leaves us with the following hierarchy, which looks like a much better place to start.

Home > Movies > Action > 2008 > Quantum of Solace
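A crude way to compare the two orderings is to count how many category pages end in the genre name, since each of those is a page competing for "action movies". This sketch does that for a tiny illustrative film list. The `category_pages` helper and the "last segment" test are my own simplification, not a real measure of cannibalisation.

```python
def category_pages(films, order):
    """Return the set of category paths implied by the films for a level order."""
    pages = set()
    for _title, genre, year in films:
        levels = {"genre": genre, "year": str(year)}
        path = ("Home", "Movies")
        for key in order:
            path = path + (levels[key],)
            pages.add(path)  # every prefix below Movies is a category page
    return pages

def pages_ending_in(pages, label):
    """Category pages whose final breadcrumb segment is the given label."""
    return sorted(p for p in pages if p[-1] == label)

# Small illustrative sample, not a real catalogue.
films = [
    ("Quantum of Solace", "Action", 2008),
    ("Casino Royale", "Action", 2006),
    ("The Bourne Ultimatum", "Action", 2007),
    ("Hot Fuzz", "Comedy", 2007),
]

year_first = category_pages(films, ["year", "genre"])
genre_first = category_pages(films, ["genre", "year"])

print(len(pages_ending_in(year_first, "Action")))   # 3: one Action page per year
print(len(pages_ending_in(genre_first, "Action")))  # 1: a single Action page
```

With the year above the genre, every release year adds another "Action" page. With the genre above the year, there is only ever one.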

Disclaimer: The above hierarchy is an example only. I suspect it wouldn't end up being the right answer, and you would end up removing the year and adding alphabetised pagination or similar. Using the above hierarchy may cause severe pain, injury or death.

Ensuring Your Quality Content is Spidered

With your hierarchy in place you now have to work out what to put on each page. The best information architectures closely map to the way people search, and this gives you an opportunity to create useful pages that people may want to link to. The category pages you create shouldn't just be a list of links; that's the job of a sitemap.

In the Action Movies example above, we could list the most popular new releases or show user generated comments about some of the films. It's alright to add links that miss out a category. For example, given the anticipation around Quantum of Solace, I'd add a link to it from the action (and probably movie) category pages. The key is to add links to your important pages. How you define importance is up to you. It could be pages that make you the most money, pages that aren't currently ranking, or any other metric you can think of.

Hurdles

The sheer volume of pages that most sites have, and the fact that most things in life don't fit into a nice category structure, means that there is a huge number of hurdles to getting your information architecture correct.

Overlapping hierarchies

More often than not there are multiple different suitable hierarchies that could be chosen. When this happens it is important to choose the closest fit as your default hierarchy. You can easily add the other hierarchy for your users, but you can only ever choose one set of breadcrumbs, so to avoid duplicate content you must choose your default hierarchy.

For example, you often get overlapping hierarchies when organising architecture around places. In London, people talk in terms of boroughs, districts and tube stations, which means our office could be said to be in Southwark, Bermondsey and London Bridge.

When new releases get old

In an early draft of this post I had "New Releases" as a possible category that could sit above action movies. At first glance this isn't a horrific choice - it will have search volume, it helps to reduce the number of films in each category, and it would be an easy page to write in a way that naturally gathered links. There are two issues with it. Firstly, it would cause keyword cannibalisation in the same way that adding a year did. Suddenly there are at least two pages competing for the search "action movies". You have the new release action movie page and you'd have the archived action movie page.

The other problem is that new releases quickly turn into old releases. At this stage you have to update the film's place in the hierarchy. This will most likely include updating the breadcrumbs of the page, the URL, and all the category pages that link to it. I mentioned earlier that you need to make it as easy for your users as possible. In three months' time, if you were browsing around a site looking for a page about Quantum of Solace, would you choose the new release category or not? What about 4 months? 5? 6? My point is, try to avoid choosing a hierarchy that will lead to pages moving from one place to another within the structure.

I hope this has given you food for thought the next time you are thinking about how to structure your site. Get the architecture right and suddenly you have a site that your users understand and that has the potential to rank for relevant phrases. You, my friend, are a hero.

P.S. Sorry for the somewhat misleading headline. I think people in the know call it creative license; I suspect a number of you will call it keyword stuffing :)