If you work on an enterprise site — particularly in e-commerce or listings (such as a job board site) — you probably use some sort of faceted navigation structure. Why wouldn’t you? It helps users filter down to their desired set of results fairly painlessly.
While helpful to users, it’s no secret that faceted navigation can be a nightmare for SEO. At Distilled, it’s not uncommon for us to get a client that has tens of millions of URLs that are live and indexable when they shouldn’t be. More often than not, this is due to their faceted nav setup.
There are a number of great posts out there that discuss what faceted navigation is and why it can be a problem for search engines, so I won’t go into much detail on this. A great place to start is this post from 2011.
What I want to focus on instead is narrowing this problem down to a simple question, and then provide the possible solutions to that question. The question we need to answer is, “What options do we have to decide what Google crawls/indexes, and what are their pros/cons?”
Brief overview of faceted navigation
As a quick refresher, we can define faceted navigation as any way to filter and/or sort results on a webpage by specific attributes that aren’t necessarily related. For example, the color, processor type, and screen resolution of a laptop.
Because every possible combination of facets typically generates at least one unique URL, faceted navigation can create a few problems for SEO:
- It creates a lot of duplicate content, which is bad for various reasons.
- It eats up valuable crawl budget and can send Google incorrect signals.
- It dilutes link equity and passes equity to pages that we don’t even want indexed.
But first… some quick examples
It’s worth taking a few minutes and looking at some examples of faceted navigation that are probably hurting SEO. These are simple examples that illustrate how faceted navigation can (and usually does) become an issue.
Macy’s
First up, we have Macy’s. I’ve done a simple site:search for the domain and added “black dresses” as a keyword to see what would appear. At the time of writing this post, Macy’s has 1,991 products that fit under “black dresses” — so why are over 12,000 pages indexed for this keyword? The answer could have something to do with how their faceted navigation is set up. As SEOs, we can remedy this.
Home Depot
Let’s take Home Depot as another example. Again, doing a simple site:search we find 8,930 pages on left-hand/inswing front exterior doors. Is there a reason to have that many pages in the index targeting similar products? Probably not. The good news is this can be fixed with the proper combinations of tags (which we’ll explore below).
I’ll leave the examples at that. You can go on most large-scale e-commerce websites and find issues with their navigation. The point is, many large websites that use faceted navigation could be doing better for SEO purposes.
Faceted navigation solutions
When deciding on a faceted navigation solution, you will have to decide what you want in the index, what can go, and then how to make that happen. Let’s take a look at what the options are.
"Noindex, follow"
Probably the first solution that comes to mind would be using noindex tags. A noindex tag is used for the sole purpose of letting bots know to not include a specific page in the index. So, if we just wanted to remove pages from the index, this solution would make a lot of sense.
The issue here is that while you can reduce the amount of duplicate content that’s in the index, you will still be wasting crawl budget on those pages. Also, these pages are receiving link equity, which is a waste (since it doesn’t benefit any indexed page).
Example: If we wanted to include our page for “black dresses” in the index, but we didn’t want to have “black dresses under $100” in the index, adding a noindex tag to the latter would exclude it. However, bots would still be coming to the page (which wastes crawl budget), and the page(s) would still be receiving link equity (which would be a waste).
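For reference, here is a minimal sketch of what that might look like in the filtered page's <head> (the page named in the comment is just an illustrative example):

```html
<!-- Added to the <head> of the "black dresses under $100" page (illustrative) -->
<!-- "noindex" keeps the page out of the index; "follow" still lets bots follow its links -->
<meta name="robots" content="noindex, follow">
```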
Canonicalization
Many sites approach this issue by using canonical tags. With a canonical tag, you can let Google know that in a collection of similar pages, you have a preferred version that should get credit. Since canonical tags were designed as a solution to duplicate content, it would seem that this is a reasonable solution. Additionally, link equity will be consolidated to the canonical page (the one you deem most important).
However, Google will still be wasting crawl budget on pages.
Example: /black-dresses?under-100/ would have the canonical URL set to /black-dresses/. In this instance, Google would give the canonical page the authority and link equity. Additionally, Google wouldn’t see the “under $100” page as a duplicate of the canonical version.
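A minimal sketch of the tag that would sit in the <head> of the filtered page (the domain is a placeholder):

```html
<!-- In the <head> of /black-dresses?under-100/ (placeholder domain) -->
<link rel="canonical" href="https://www.example.com/black-dresses/">
```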
Disallow via robots.txt
Disallowing sections of the site (such as certain parameters) could be a great solution. It’s quick, easy, and customizable. But it does come with some downsides. Namely, link equity will be trapped and unable to move anywhere on your website (even if it’s coming from an external source). Another downside is that even if you tell Google not to visit a certain page (or section) on your site, Google can still index it.
Example: We could disallow *?under-100* in our robots.txt file. This would tell Google to not visit any page with that parameter. However, if there were any "follow" links pointing to any URL with that parameter in it, Google could still index it.
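A minimal robots.txt sketch matching that example (Google supports the * wildcard in Disallow rules):

```
User-agent: *
# Block any URL whose query string contains the "under-100" parameter
Disallow: /*?under-100*
```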
"Nofollow" internal links to undesirable facets
An option for solving the crawl budget issue is to "nofollow" all internal links to facets that aren’t important for bots to crawl. Unfortunately, "nofollow" tags don’t solve the issue entirely. Duplicate content can still be indexed, and link equity will still get trapped.
Example: If we didn’t want Google to visit any page with two or more facets selected, adding a "nofollow" tag to all internal links pointing to those pages would help us get there.
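For illustration, each internal link pointing at an undesirable facet would carry rel="nofollow" (the URL is illustrative):

```html
<!-- An internal facet link marked nofollow (illustrative URL) -->
<a href="/clothing/womens/dresses?color=black&brand=express" rel="nofollow">Express</a>
```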
Avoiding the issue altogether
Obviously, if we could avoid this issue altogether, we should just do that. If you are currently in the process of building or rebuilding your navigation or website, I would highly recommend building your faceted navigation in a way that limits how often the URL changes (this is commonly done with JavaScript). The reason is simple: it gives users the ease of browsing and filtering products while potentially generating only a single URL. However, this can go too far in the opposite direction — you will need to manually ensure that you have indexable landing pages for key facet combinations (e.g. black dresses).
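As a very rough sketch of the idea (not a production pattern), the snippet below applies a facet client-side and re-renders the results without creating a new URL; the element IDs and the /api/products endpoint are hypothetical:

```html
<label><input type="checkbox" id="facet-color-black"> Black</label>
<div id="results"></div>
<script>
  // Hypothetical endpoint and IDs; the point is that filtering happens in place,
  // so no new crawlable URL is generated for this facet combination.
  document.getElementById('facet-color-black').addEventListener('change', async (e) => {
    const color = e.target.checked ? 'black' : '';
    const res = await fetch('/api/products?category=dresses&color=' + color);
    const products = await res.json();
    document.getElementById('results').innerHTML =
      products.map(p => '<div class="product">' + p.name + '</div>').join('');
  });
</script>
```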
Here’s a table outlining what I wrote above in a more digestible way.
| Options | Solves duplicate content? | Solves crawl budget? | Recycles link equity? | Passes equity from external links? | Allows internal link equity flow? | Other notes |
|---|---|---|---|---|---|---|
| “Noindex, follow” | Yes | No | No | Yes | Yes | |
| Canonicalization | Yes | No | Yes | Yes | Yes | Can only be used on pages that are similar. |
| Robots.txt | Yes | Yes | No | No | No | Technically, pages that are blocked in robots.txt can still be indexed. |
| Nofollow internal links to undesirable facets | No | Yes | No | Yes | No | |
| JavaScript setup | Yes | Yes | Yes | Yes | Yes | Requires more work to set up in most cases. |
But what’s the ideal setup?
First off, it’s important to understand there is no “one-size-fits-all solution.” In order to get to your ideal setup, you will most likely need to use a combination of the above options. I’m going to highlight an example fix below that should work for most sites, but it’s important to understand that your solution might vary based on how your site is built, how your URLs are structured, etc.
Fortunately, we can break down how we get to an ideal solution by asking ourselves one question. “Do we care more about our crawl budget, or our link equity?” By answering this question, we're able to get closer to an ideal solution.
Consider this: You have a website that has a faceted navigation that allows the indexation and discovery of every single facet and facet combination. You aren’t concerned about link equity, but clearly Google is spending valuable time crawling millions of pages that don’t need to be crawled. What we care about in this scenario is crawl budget.
In this specific scenario, I would recommend the following solution (a rough sketch of the tagging logic follows the list).
- Category, subcategory, and sub-subcategory pages should remain discoverable and indexable. (e.g. /clothing/, /clothing/womens/, /clothing/womens/dresses/)
- For each category page, only allow versions with 1 facet selected to be indexed.
- On pages that have one or more facets selected, all facet links become “nofollow” links (e.g. /clothing/womens/dresses?color=black/)
- On pages that have two or more facets selected, a “noindex” tag is added as well (e.g. /clothing/womens/dresses?color=black&brand=express/)
- Determine which facets could have an SEO benefit (for example, “color” and “brand”) and whitelist them. Essentially, throw them back in the index for SEO purposes.
- Ensure your canonical tags and rel=prev/next tags are set up appropriately.
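To make the rules above concrete, here is a rough sketch (in JavaScript, since that is often where this logic lives) of how a template might decide the robots meta and facet-link treatment. The whitelist, helper name, and placeholder domain are assumptions for illustration, not a definitive implementation:

```javascript
// Sketch only: facets are assumed to arrive as query parameters.
const WHITELISTED_FACETS = ['color', 'brand']; // facets judged to have SEO value ("thrown back in the index")

function robotsDirectivesFor(url) {
  const facets = [...new URL(url, 'https://www.example.com').searchParams.keys()];

  if (facets.length === 0) {
    // Category, subcategory, and sub-subcategory pages stay indexable and followable.
    return { meta: 'index, follow', facetLinksNofollow: false };
  }
  if (facets.length === 1 || facets.every(f => WHITELISTED_FACETS.includes(f))) {
    // Single-facet pages (and whitelisted facet combinations) stay in the index,
    // but their facet links are nofollowed to limit crawling of deeper combinations.
    return { meta: 'index, follow', facetLinksNofollow: true };
  }
  // Two or more non-whitelisted facets: nofollow the facet links and add noindex as well.
  return { meta: 'noindex, follow', facetLinksNofollow: true };
}

console.log(robotsDirectivesFor('/clothing/womens/dresses?color=black'));
// -> { meta: 'index, follow', facetLinksNofollow: true }
```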
This solution will (in time) start to solve our issues with unnecessary pages being in the index due to the navigation of the site. Also, notice how in this scenario we used a combination of the possible solutions. We used “nofollow,” “noindex, nofollow,” and proper canonicalization to achieve a more desirable result.
Other things to consider
There are many more variables to consider on this topic — I want to address two that I believe are the most important.
Breadcrumbs (and markup) help a lot
If you don't have breadcrumbs on each category/subcategory page on your website, you’re doing yourself a disservice. Please go implement them! Furthermore, if you have breadcrumbs on your website but aren’t marking them up with microdata, you’re missing out on a huge win.
The reason why is simple: You have a complicated site navigation, and bots that visit your site might not be reading the hierarchy correctly. By adding accurate breadcrumbs (and marking them up), we’re effectively telling Google, “Hey, I know this navigation is confusing, but please consider crawling our site in this manner.”
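For example, a breadcrumb trail marked up with schema.org microdata might look something like this (the URLs are placeholders):

```html
<ol itemscope itemtype="https://schema.org/BreadcrumbList">
  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
    <a itemprop="item" href="https://www.example.com/clothing/"><span itemprop="name">Clothing</span></a>
    <meta itemprop="position" content="1">
  </li>
  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
    <a itemprop="item" href="https://www.example.com/clothing/womens/"><span itemprop="name">Women's</span></a>
    <meta itemprop="position" content="2">
  </li>
  <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
    <span itemprop="name">Dresses</span>
    <meta itemprop="position" content="3">
  </li>
</ol>
```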
Enforcing a URL order for facet combinations
In extreme situations, you can come across a site that has a unique URL for every facet combination. For example, if you are on a laptop page and choose “red” and “SSD” (in that order) from the filters, the URL could be /laptops?color=red&SSD/. Now imagine if you chose the filters in the opposite order (first “SSD” then “red”) and the URL that’s generated is /laptops?SSD&color=red/.
This is really bad because it exponentially increases the number of URLs you have. Avoid this by enforcing a specific parameter order for URLs!
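One way to do this (a sketch under the assumption that facets live in the query string, with a made-up facet order) is to sort the parameters into a fixed order before any facet link is generated:

```javascript
// Sketch: normalize facet parameters into one fixed order so that both click paths
// ("red" then "SSD", or "SSD" then "red") always produce the same URL.
const FACET_ORDER = ['color', 'ssd', 'brand', 'price']; // made-up order for illustration

function facetUrl(path, selectedFacets) {
  const query = Object.entries(selectedFacets)
    .sort(([a], [b]) => FACET_ORDER.indexOf(a) - FACET_ORDER.indexOf(b))
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join('&');
  return query ? `${path}?${query}` : path;
}

facetUrl('/laptops', { ssd: 'true', color: 'red' }); // "/laptops?color=red&ssd=true"
facetUrl('/laptops', { color: 'red', ssd: 'true' }); // "/laptops?color=red&ssd=true"
```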
Conclusions
My hope is that you feel more equipped (and have some ideas) to control your faceted navigation in a way that benefits your search presence.
To summarize, here are the main takeaways:
- Faceted navigation can be great for users, but is usually set up in a way that negatively impacts SEO.
- There are many reasons why faceted navigation can negatively impact SEO, but the top three are:
- Duplicate content
- Crawl budget being wasted
- Link equity not being used as effectively as it should be
- Boiled down further, the question we want to answer to begin approaching a solution is, “What are the ways we can control what Google crawls and indexes?”
- When it comes to a solution, there is no “one-size-fits-all” solution. There are numerous fixes (and combinations) that can be used. Most commonly:
- Noindex, follow
- Canonicalization
- Robots.txt
- Nofollow internal links to undesirable facets
- Avoiding the problem with an AJAX/JavaScript solution
- When trying to think of an ideal solution, the most important question you can ask yourself is, “What’s more important to our website: link equity, or crawl budget?” This can help focus your possible solutions.
I would love to hear any example setups. What have you found that’s worked well? Anything you’ve tried that has impacted your site negatively? Let’s discuss in the comments or feel free to shoot me a tweet.
My solution for this has been using the URL Parameters settings in Search Console, but I don't see this mentioned. Is there a negative to relying on Search Console for telling Google not to crawl these filtered/tagged/sorted pages?
Hey Ria—great point, thanks for bringing it up! Using Search Console for this can be fine as long as you are only interested in how GoogleBot crawls the site and as long as your facets are parameters, not subdirectories.
Hi Serge. Great article, but I'm confused on something here.
In order to start applying filters on a category page, a user typically checks options like a brand tickbox, etc. How would a bot discover variations of a page with different colour, brand etc, if it's not a User and can't select those options? (aside from a user applying a filter and then posting that link somewhere on the web)?
Is this the parameter point Ria was making? If so, why would you not want to set up filters as parameters, since it avoids indexation issues?
Hey Randal.
If we're talking about a typical setup, a bot would discover those facets by following the links pointing to them. I have seen some setups where the link is generated after all of the facets have already been selected (this is more common on mobile). So, instead of the user clicking on "black" for a dress color and a URL being generated then, the user would click on "black" & "under $100" for color/price, and then a URL would generate. This can be a cause for concern (as you mention) and would require further digging.
In terms of parameters, you can absolutely set up filters as parameters (many sites do). But again, parameters can still be crawled and indexed by bots. You can ask Google to not include certain parameters in Search Console, but you would only be excluding them for GoogleBot, and you would be leaving that up to Google (which isn't always the best way to go).
Hope that helps!
Hi all
Very useful discussion around the URL structure and the concept!
I have some questions around this though (and in the process re-confirming my understanding),
- Facets are excluded as a general rule on our site
- URLs that have 1 facet are manually added to the sitemap (e.g. colour and brand separately, since these are both important)
- Each of those faceted URLs is marked as nofollow
- This should result in each of the faceted URLs showing up in Google (hopefully)
Questions,
1. You mention that the single-facet URLs be put as nofollow and then you whitelist them if they are important. Does this mean putting them as follow?
2. Have you, in your experience, seen that traffic generally increases as a result of this approach, because more URLs are presumably being displayed by Google?
Thank you!
Great post explaining the possible solutions for e-commerce websites to conserve crawl budget and avoid duplication.
My solutions are:
1. noindex, nofollow all internal search pages
2. noindex, follow category pages with little or no content
3. use a canonical when products are very similar, e.g. different colors, or where the shop software generates many URLs for the same product page, e.g. when the product is placed in several categories. Note that 301 redirects cannot be used in many such cases; otherwise, that too is an option.
Nice one, Serge! You've covered this topic really well.
Your 'ideal setup' is something I've worked with a lot in Magento. I always allowed 2-3 facets to be crawled and indexed because there was a lot of value in having more combinations indexed for long-tail traffic.
However, like you said, allowing multiple facets to be crawled and indexed won't work well in every situation, but it was great for the large clothing and furniture sites I worked on.
Cheers,
David
Thanks David! I appreciate that. Yeah, I think once you get done taking out some of the more obvious facets from the index, there is a larger project of choosing what to throw back in. As you mention, this greatly depends on the vertical your client is in. Definitely takes some further analysis!
Great post Serge.
I have regular battles with clients' developers who insist that simply adding a noindex tag is the best and only solution to this issue.
It's a real shame that there is no official guidance on this from Google. It would be so helpful in persuading clients and to stop their developers pulling the wool over their eyes.
Hey Danny—thanks for the comment! I agree with you. There is a lot of information from Google on parts of this topic, but no "go to" which is definitely a shame. Probably one of the quickest ways to get the seriousness of this communicated properly to devs/managers would be to illustrate the amount of time bots are wasting on unnecessary facets (and explaining why that matters). If you're lucky, you might be able to see this quickly in Search Console. If it isn't that clear, a log analysis usually points out some shocking facts that won't be taken lightly (and hopefully will result in change).
Solid stuff, and really well-explained, Serge. I don't know how many times I've had to go through this with client sites!
Nice guide Serge!
One thing though: Canonical tags do reduce Googlebot crawling on canonicalized pages.
We've tested this by checking access logs on a large e-commerce site where all filtered categories have a canonical tag pointing to the unfiltered version. The filtered pages dropped from 10-50 Googlebot visits per day to about 0-5.
John Mueller also confirmed this in a recent hangout.
That's why we prefer going the canonical route when it comes to faceted navigation, since it helps with ranking signals, indexing and crawl budget.
PS: An alternative to the canonical tags is using Search Console if the filtered pages use URL parameters.
Cheers!
You da man, Serge! Thanks for an insightful post. So many amazing ideas I had never even considered. I LOVE ARTICLES LIKE THIS THAT REALLY MAKE YOU THINK.
Question though -- Have you tried adding the current year in your title? If so, is there data on how that performs in terms of behavior signals for SEO?
Thanks again, this was great.
This is really helpful and useful, thank you for this. Duplicate content is something I have noticed that brings a lot of websites down in their rankings. Very common problem. Thank you for this, Serge Stefoglo!
In my personal experience, SEO for an e-commerce website is easier than for any service website. I understand there are a lot of variations of the same product, but we can make it very simple.
Suppose I have an e-commerce website for grocery products and I have 3 different variations of apple juice. Suppose the brand name of the apple juice is Moz and the variations are 100ml, 200ml, and 1ltr. Now let's do the on-page work:
1. Make a single page for the Moz Apple Juice
2. Make a single URL - (website domain)/drinks/moz-apple-juice
3. Make the page title - Buy Moz Apple Juice | 100ml, 200ml & 1ltr
4. Set only the product name in the product title - Moz Apple Juice - and set the product description accordingly
5. Make functionality where the page shows the 100ml Moz Apple Juice's price and image by default. (The lowest price attracts users to visit from search engines.)
6. Make select functionality where the user can change the product type - 100ml, 200ml, or 1ltr. Make sure every product type has a unique image.
7. Set the breadcrumb - Home/Drinks/Juices/Moz Apple Juice
8. Use similar tags like - Apple Juice, Juice, etc. Set tag page titles like - Buy Apple Juice Products at (website domain). Do the same thing for categories.
That's it :), your page will be ranking within a few days.
Note - This is just an idea. You can use any appropriate format for the page title, URL, and tags. But make sure you tell your programmer to write code so these are set automatically.
For non-technical SEOs or marketers it may be difficult to implement. While I can build my own WordPress sites, it may be harder to do this.
Is it necessary for small e-commerce sites, or when would you consider the solution you mentioned?
Regards.
Hello Serge,
My question is about your meta description. Looking at the view source of this article, your meta description is not shown completely in the Google search results. Can you please explain this?
Only half of your meta description is shown.
Nice post Serge, I agree with most of your points except your stance on the noindex, follow solution.
I don't see noindex pages receiving link equity being a waste because this link equity can help other pages. Link equity flows in all directions, so the upward link equity flow will strengthen pages sitting higher on the site's hierarchy. This is because all pages can contribute link equity even if they are not externally linked to.
In addition, the downward link equity flow will help the indexable faceted pages get discovered, indexed, and ranked higher. Lastly, pages with a noindex tend to be crawled less and less frequently over time (similar to canonicalised pages), so the crawl budget issue isn't a huge one IMO.
Really every day that passes I see that I have much to learn...
Good article, thanks for sharing it !!
Anybody with any thoughts on PRG Patterns vs. ajax/javascript implementation?
So great to see this topic covered in such detail. I think the part I like the best about the article is how you encourage critical thinking about what is a priority for us as site owners. Do I care more about link equity or crawl budget? Once I have that answer, you've given excellent guidance for coming up with a strategy that will work best for our individual case. Acknowledging that there is rarely a one-size-fits-all approach is so important.
Also, schema for breadcrumbs? Awesome idea! I'm just getting ready to start a project to begin using schema and this sounds like a good fit for us.
Thanks!
Thanks for the comment! I think a lot of times as SEOs we tend to overcomplicate problems or get ahead of ourselves when trying to solve specific problems. I know I do this all the time. So I'm always a big fan of simplifying projects or problems down to the root question, and going from there. :)
Best of luck on your schema project!
I'd say it's important to remember that using Canonical is only a suggestion to Google. Maybe big G will index the other page if he figures it's different!
Good post in laying out options, but a couple things stick out to me.
Great point Matt. I definitely could have used some better examples to illustrate!
To add on to your second point, that's absolutely true. It's a good reminder that these directives are hints to help GoogleBot crawl our sites more efficiently. Always worth mentioning Google doesn't have to follow them!
Hello,
We are starting a new version of our ecommerce website and I found your post very useful. I would add one thing: by default, we block all facets from indexation. We need human intervention in order to generate the link on the facet.
When a facet works only with JavaScript, we use the session in order to regenerate the same results for the user when the user clicks back in their browser.
I rarely comment, but this article is exceptional. Thanks for sharing so many details.
Quick questions:
Thanks!
Thanks for the comment Marin!
Thank you Serge,
We are starting with SEO on our e-commerce website and we found your post very interesting. Our e-commerce website has a lot of products and, of course, there are some filters. We will have to find a way to prevent Google from indexing our filter pages. Does anybody know an 'easy' way to add a meta noindex, follow on faceted pages in Magento?
Interested in this issue, too.
Thanks,
Great post, Serge! I still don't understand how a JavaScript setup can actually tick all the boxes, including the crawl budget one. Are we limiting the number of our pages with this setup?
Stan, the JavaScript setup would be to change the content on the page dynamically, i.e., no extra URLs exist. It ticks all the boxes; however, this can also cause other problems, such as content not being indexed at all due to complicated / dynamically generated content. BUT it can be done to work properly :)
Hi
You missed the best solution: the PRG pattern (see the article on Wikipedia).
Use a form for all filters you do not want crawled and indexed.
A canonical makes no sense, as it is not duplicate content and Google tells us not to use it for this.
robots.txt is wrong. The only reason to block pages with robots.txt is if Google is crashing your server.
best regards
Julian
Hey Serge!
Great post, except one thing:
I would not recommend ever using nofollow links within the internal linking structure, because it wastes linkjuice. Let's say a page has five outgoing internal links and I switch one of them to nofollow: the four remaining links only get 80 percent of the available linkjuice, and the remaining 20 percent is just gone.
Greetings from Germany!
Hey Serge, thanks for the post. I've been struggling with this problem of faceted navigation for a while and this post made it much clearer. More of those SEO-tech articles! Cheers, Martin
The best solution is to plan ahead before developing the project. There are a number of ways you can handle this, but if you are fixing an already-live CMS, then I am surprised to see you did not mention "noindex, nofollow" in combination with the robots.txt file; implementing this will save time and resources. The pages might still get indexed, but the chances are pretty slim.
This is nice. I was wondering about so many things for a project I had in mind and needed to solve some issues, for example the things related to noindex, follow. Now I am on a project and I cannot start with this. But anyway, I'm saving this post for the future (sorry for my English xD).
Another question (because I have also been thinking about this problem for years):
Why not a combination of all 4 variants? (if JavaScript and PRG are not an option)
To avoid the crawling:
- robots.txt --> to prevent bots from crawling the filter pages in the first place
- internal nofollow links --> to make sure the bot doesn't crawl the filter pages (in addition to the robots.txt, just to be sure)
To avoid the indexation:
- noindex, follow on filter pages --> in case the bot finds the filter pages
- canonical tag --> in addition to the noindex, follow
Would that combination of the 4 work? If not, why not?
Very curious about your answer! (or answers of you others) :)
Great examples Serge, we will definitely be saving this info for any e-commerce or listings based customers that we have in the future. Also, awesome breakdown of proper URL siloing and keeping everything organized so a website doesn't become a growing mess that has to be completely overhauled or have time and money invested in it to be cleaned up later.
Good post, Serge!!
Faceted navigation is imperative today on almost any website; if users do not quickly find what they are looking for, they do not hesitate to leave, so we must take the greatest care with it. As for SEO, can it really negatively affect the ranking of our site? That is, can search engines lower your positions because they "do not like your faceted navigation"? Or maybe it harms you because of duplicate content?
Thank you and greetings.
As mentioned before, the PRG pattern works well.
A canonical is a signal, not a directive. That means the parameter page with a canonical will still show up in the SERPs if it has stronger signals than the canonicalized URL. You can see this a lot.
For large sites that don't need every external backlink, I recommend additionally always having robots.txt rules that disallow certain parameters. It's just a secure backup.
The first time we faced this problem was in 2012, for a large e-commerce website. Their tech team still hates us for the solution we gave, but their daily visitor count jumped from 28,000 to 62,000 in a three-week period.
So, what you are talking about here definitely works. We tell our customers: "Your system is heavily overweight; we should first make it a lean & mean (for dramatic effect) machine."
The ideal solution sequence should be:
Great post, Serge.
A question - what about the basket in a webshop, should it have noindex or? I can't find any best practices for this subject.. any experience with this topic?
Great post. I am going to go back to my (small) site and see how well these ideas can be implemented. We run a blog that pulls blog articles into dynamic pages via different category tags. This sounds like another instance of faceted navigation, where the categories are a facet. Each place the article appears in the crawl is a duplication. What's worse, I have multiple navigation schemes to help my human users find content. I'll send this article to my tech guy and make sure that we have the site optimized.
OK, just got off the phone, and this is what I understand: since I don't do any dynamic link generation, this isn't a problem for me, but I could optimize my data layers and links from the landing page in general so the link authority from the landing page goes where I want it to go. I am very thankful that this forum exists to keep introducing me to SEO.
Thanks Serge - solid post!
How about the others about seo?
Hi Serge,
Brilliant article. A quick question as I am just in the middle of building a site. Some of my products will vary in size or quantity, so I was thinking of having the option to change the size or quantity within the product page instead of the URL. So all variations for that product, but the same URL.
Now, when it comes to selecting the URL, should I include the original size of the product? For example:
Mywebsite.com/this-is-a-can-330ml-p0001
Or should I stick with:
Mywebsite.com/this-is-a-can-p0001
Which is better in terms of SEO? Keep in mind all faceting will be done on-page anyway; all I am asking is what I should include in the URL. Should I be size-specific?