Listings sites have a very specific set of search problems that you don't run into everywhere else. By day I'm one of Distilled's analysts, but by night I run a job listings site, teflSearch. So, for my first Moz Blog post I thought I'd cover the three search problems with listings sites that I spent far too long agonising over.
Quick clarification time: What is a listings site (i.e. will this post be useful for you)?
The classic listings site is Craigslist, but plenty of other sites act like listings sites:
- Job sites like Monster
- E-commerce sites like Amazon
- Matching sites like Spareroom
1. Generating quality landing pages
The landing pages on listings sites are incredibly important. These pages are usually the primary drivers of converting traffic, and they're typically generated automatically (or are occasionally custom category pages).
For example, if I search "Jobs in Manchester", you can see nearly every result is an automatically generated landing page or category page.
There are three common ways to generate these pages (occasionally a combination of more than one is used):
- Faceted pages: These are generated by facets—groups of preset filters that let you filter the current search results. They usually sit on the left-hand side of the page.
- Category pages: These pages are listings which have already had a filter applied and can't be changed. They're usually custom pages.
- Free-text search pages: These pages are generated by a free-text search box.
Those definitions are still a bit general; let's clear them up with some examples:
Amazon uses a combination of categories and facets. If you click on browse by department you can see all the category pages. Then on each category page you can see a faceted search. Amazon is so large that it needs both.
Indeed generates its landing pages through free text search; for example, if we search for "IT jobs in manchester", it will generate: IT jobs in manchester.
teflSearch generates landing pages using just facets. The jobs in China landing page is simply a facet of the main search page.
Each method has its own search problems when used for generating landing pages, so let's tackle them one by one.
Aside
Facets and free text search will typically generate pages with parameters, e.g. a search for "dogs" might produce a URL like /search?q=dogs.
But to make the URL user friendly, sites will often alter the URLs to display them as folders, e.g. /search/dogs/.
These are still just ordinary free text search and facet pages; the URLs are just user friendly. (They're a lot easier to work with in robots.txt too!)
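To make that concrete, here's a minimal sketch of the two URL styles and the kind of robots.txt rule the folder version makes easy. The paths are invented for illustration, not taken from any particular site:

```
# Raw parameterised search:   /search?q=dogs
# Folder-style equivalent:    /search/dogs/
#
# Blocking the raw parameterised searches while leaving the
# friendly folder versions crawlable is then a short rule:
User-agent: *
Disallow: /search?
Allow: /search/
```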
Free search (& category) problems
If you've decided the base of your search will be a free text search, then you'll have two major goals:
- Goal 1: Helping search engines find your landing pages
- Goal 2: Giving them link equity.
Solution
Search engines won't use search boxes, and so the solution to both problems is to provide links to the valuable landing pages so search engines can find them.
There are plenty of ways to do this, but two of the most common are:
- Category links alongside a search
Photobucket uses a free text search to generate pages, but if we look at an example search for photos of dogs, we can see the categories which define the landing pages along the right-hand side. (This is also an example of user-friendly search URLs!)
- Putting the main landing pages in a top-level menu
Indeed also uses free text to generate landing pages, and they have a browse jobs section which contains the URL structure to allow search engines to find all the valuable landing pages.
Breadcrumbs are also often used in addition to the two methods above, and in both of the examples you'll find breadcrumbs that reinforce the hierarchy.
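As a rough sketch, the crawlable-links approach is nothing more exotic than plain, followable anchor links near the search results. The URLs and class name below are invented for illustration:

```html
<!-- Plain, followable links beside the free-text search results,
     giving crawlers a path to each generated landing page -->
<nav class="browse-categories">
  <a href="/search/photos/dogs/">Dog photos</a>
  <a href="/search/photos/cats/">Cat photos</a>
  <a href="/search/photos/birds/">Bird photos</a>
</nav>
```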
Category (& facet) problems
Categories, because they tend to be custom pages, don't actually have many search disadvantages; instead, it's their other attributes that make them more or less desirable. You can create them for exactly the purposes you want, so you typically won't run into many problems.
However, if you also use a faceted search in each category (like Amazon) to generate additional landing pages, then you'll run into all the problems described in the next section.
At first, facets seem great: an easy way to generate multiple strong, relevant landing pages without doing much at all. The problems appear when people don't put limits on facets.
Let's take the job page on teflSearch. We can see it has 18 facets, each with many options. Some of these options will generate useful landing pages:
The China option in the countries facet will generate "Jobs in China": that's a useful landing page.
On the other hand, the "Conditional Bonus" facet will generate "Jobs with a conditional bonus," and that's not so great.
We can also see that the options within a single facet aren't always useful. As of writing, I have a single job available in Serbia. That's not a useful search result, and the poor user engagement combined with the tiny amount of content will be a strong signal to Google that it's thin content. Depending on the scale of your site, it's very easy to generate a mass of poor-quality landing pages.
Facets generate other problems too, the primary one being that they can create a huge amount of duplicate content and pages for search engines to get lost in. This is caused by two things: the sheer number of possibilities facets generate, and the fact that selecting facets in different orders creates identical pages with different URLs.
We end up with four goals for our facet-generated landing pages:
- Goal 1: Make sure our searchable landing pages are actually worth landing on, and that we're not handing a mass of low-value pages to the search engines.
- Goal 2: Make sure we don't generate multiple copies of our automatically generated landing pages.
- Goal 3: Make sure search engines don't get caught in the metaphorical plastic six-pack rings of our facets.
- Goal 4: Make sure our landing pages have strong internal linking.
The first goal needs to be set internally; you're always going to be the best judge of the number of results that need to be present on a page for it to be useful to a user. I'd argue you can rarely ever go below three, but it depends both on your business and on how much content fluctuates on your site, as the useful landing pages might also change over time.
We can solve the next three problems as a group. There are several possible approaches depending on what skills and resources you have access to; here are two:
Category/facet solution 1: Blocking the majority of facets and providing external links
- Easiest method
- Good if your valuable category pages rarely change and you don't have too many of them.
- Can be problematic if your valuable facet pages change a lot
Nofollow all your facet links, and noindex and block (using robots.txt) the generated pages which aren't valuable or which sit deeper than x facet/folder levels into your search.
You set x by looking at how deep your useful facet pages with search volume sit. So, for example, if you have three facets for televisions (manufacturer, size, and resolution), and even combinations of all three have multiple results and search volume, then you could index everything up to three levels.
On the other hand, if people are searching for three levels (e.g. "Samsung 42" Full HD TV") but you only have one or two results for three-level facets, then you'd be better off indexing two levels and letting the product pages themselves pick up long-tail traffic for the third level.
If you have valuable facet pages that exist deeper than one facet or folder level into your search, this creates some duplicate content problems, dealt with in the aside "Indexing more than one level of facets" below.
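As a minimal sketch, assuming folder-style facet URLs with trailing slashes and x = 2, the robots.txt side of this solution could be a single wildcard rule (the path scheme is invented for illustration); the nofollow side is just rel="nofollow" on every facet link:

```
# Hypothetical robots.txt for solution 1, assuming facet URLs like
# /search/china/paid-flights/ that always end in a trailing slash.
# Pages up to two facet levels stay crawlable; deeper is blocked.
User-agent: *
Disallow: /search/*/*/*/
```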
The immediate problem with this set-up, however, is that in one stroke we've removed most of the internal links to our category pages: by nofollowing all the facet links, search engines won't be able to find your valuable category pages.
In order to re-create the linking, you can add a top-level drop-down menu to your site containing the most valuable category pages, add category links elsewhere on the page, or create a separate part of the site with links to the valuable category pages.
You can see the top-level drop-down menu on teflSearch (it's the "search jobs" menu); the other two approaches are demonstrated by Photobucket and Indeed respectively in the previous section.
The big advantage of this method is how quick it is to implement: it doesn't require any fiddly internal logic, and adding an extra menu option is usually minimal effort.
Category/facet solution 2: Creating internal logic to work with the facets
- Requires new internal logic
- Works for large numbers of category pages whose value can change rapidly
There are four parts to the second solution:
- Select valuable facet categories and allow those links to be followed; nofollow the rest.
- Noindex all pages that return a number of items below the threshold for a useful landing page.
- Nofollow all facets on pages with a search depth greater than x.
- Block all facet pages deeper than x levels in robots.txt.
As with the last solution, x is set by looking at how deep your useful facet pages with search volume sit (full explanation in the first solution), and if you're indexing more than one level, you'll need to check the aside below to see how to deal with the duplicate content this generates.
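Here's a minimal sketch of that four-part logic in Python. This isn't teflSearch's actual code; the parameter names and thresholds are assumptions, and rule 4 (the robots.txt block) happens outside this function:

```python
MIN_RESULTS = 3   # threshold for a useful landing page; set per site
MAX_DEPTH = 2     # "x": the deepest facet level worth indexing

def facet_page_directives(result_count, depth, is_valuable):
    """Decide the meta robots value and whether the facet links on
    this page should be followable, per the four rules above.

    result_count -- number of listings the facet combination returns
    depth        -- how many facet levels deep this page sits
    is_valuable  -- True if this combination is on the whitelist
    """
    # Rule 2: noindex pages too thin to be useful landing pages.
    meta_robots = "noindex" if result_count < MIN_RESULTS else "index"
    # Rules 1 & 3: only follow facet links on whitelisted pages that
    # haven't already reached the depth limit.
    follow_facets = is_valuable and depth < MAX_DEPTH
    return meta_robots, follow_facets

# Rule 4 is handled separately: block everything deeper than
# MAX_DEPTH in robots.txt as well, as a crawl-budget backstop.
```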
Aside: Indexing more than one level of facets
If you want more than one level of facets to be indexable, then this will create certain problems.
Suppose you have a facet for size:
- Televisions: Size: 46", 44", 42"
And want to add a brand facet:
- Televisions: Brand: Samsung, Panasonic, Sony
This will create duplicate content because the search engines will be able to follow your facets in both orders, generating:
- Television - 46" - Samsung
- Television - Samsung - 46"
You'll have to either rel=canonical your duplicate pages with another rule, or set up your facets so they always create a single unique URL.
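One common way to get that single unique URL is to always emit facet segments in a fixed (e.g. alphabetical) order, whatever order the user clicked them in. A minimal Python sketch, with invented URL conventions:

```python
def facet_url(base_path, selected):
    """Build one canonical URL per facet combination by sorting the
    selected facet values, so "Samsung then 46-inch" and
    "46-inch then Samsung" produce the same path."""
    segments = sorted(v.lower().replace(" ", "-") for v in selected)
    return base_path.rstrip("/") + "/" + "/".join(segments) + "/"

# Both selection orders collapse to /televisions/46-inch/samsung/
assert facet_url("/televisions", ["46-inch", "Samsung"]) == \
       facet_url("/televisions", ["Samsung", "46-inch"])
```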
You also need to be aware that each followable facet you add will multiply with each other followable facet, and it's very easy to generate a mass of pages for search engines to get stuck in. Depending on your setup, you might need to block more paths in robots.txt or set up more logic to prevent them being followed.
Letting search engines index more than one level of facets adds a lot of possible problems; make sure you're keeping track of them.
2. User-generated content cannibalization
This is a common problem for listings sites (assuming they allow user-generated content). If you're reading this as an e-commerce site that only lists its own products, you can skip this one.
As we covered in the first area, category pages on listings sites are usually the landing pages aiming for the valuable search terms, but as your users start generating pages, they can often create titles and content that cannibalise your landing pages.
Suppose you're a job site with a category page for PHP Jobs in Greater Manchester. If a recruiter then creates a job advert for PHP Jobs in Greater Manchester for the 4 positions they currently have, you've got a duplicate content problem.
This is less of a problem when your site is large and your categories mature, as it will be obvious to any search engine which are your high-value category pages. But at the start, when you're lacking authority and individual listings might contain more relevant content than your own search pages, this can be a problem.
Solution 1: Create structured titles
Set the <title> differently from the on-page title. Depending on the variables you have available, you can set the title tag programmatically, without changing the page title, using other information given by the user.
For example, on our imaginary job site, suppose the recruiter also provided the following information in other fields:
- The no. of positions: 4
- The primary area: PHP Developer
- The name of the recruiting company: ABC Recruitment
- Location: Manchester
We could set the <title> pattern to be *No of positions* *The primary area* with *recruiter name* in *Location*, which would give us:
4 PHP Developers with ABC Recruitment in Manchester
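A minimal sketch of that pattern as code (the field names are invented, and the pluralisation here is deliberately naive):

```python
def listing_title(positions, primary_area, recruiter, location):
    """Build a structured <title> from the recruiter's form fields,
    following: *No of positions* *primary area* with *recruiter* in *location*."""
    role = primary_area if positions == 1 else primary_area + "s"
    return f"{positions} {role} with {recruiter} in {location}"

print(listing_title(4, "PHP Developer", "ABC Recruitment", "Manchester"))
# -> 4 PHP Developers with ABC Recruitment in Manchester
```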
Setting a <title> tag allows you to target long-tail traffic by constructing detailed descriptive titles. In our above example, imagine the recruiter had specified "Castlefield, Manchester" as the location.
All of a sudden, you've got a perfect opportunity to pick up long-tail traffic for people searching in Castlefield in Manchester.
On the downside, you lose the ability to pick up long-tail traffic where your users have chosen keywords you wouldn't have used.
For example, suppose Manchester has a jobs program called "Green Highway." A job advert title containing "Green Highway" might pick up valuable long-tail traffic. Being able to discover this, however, and find a way to fit it into a dynamic title is very hard.
Solution 2: Use regex to noindex the offending pages
Perform a regex (or string contains) search on your listings' titles and noindex the ones which cannibalise your main category pages.
If it's not possible to construct titles with variables, or your users provide a lot of additional long-tail traffic with their own titles, then this is a great option. On the downside, you miss out on possible structured long-tail traffic that you might've been able to aim for.
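A rough sketch of this solution in Python (the pattern list and helper are hypothetical; in practice you'd derive one pattern per protected category page):

```python
import re

# Hypothetical patterns, one per category page you want to protect.
CATEGORY_PATTERNS = [
    re.compile(r"php\s+jobs?\s+in\s+(greater\s+)?manchester", re.IGNORECASE),
]

def should_noindex(listing_title):
    """Noindex any listing whose title collides with a category page."""
    return any(p.search(listing_title) for p in CATEGORY_PATTERNS)

print(should_noindex("PHP Jobs in Greater Manchester - 4 positions"))  # True
print(should_noindex("Senior Python Developer, Leeds"))                # False
```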
Solution 3: De-index all your listings
It may seem rash, but if you're a large site with a huge number of very similar or low-content listings, you might want to consider this. There is no common standard: some sites like Indeed choose to noindex all their job adverts, whereas other sites like Craigslist index all their individual listings because they'll drive long-tail traffic.
Don't de-index them all lightly!
3. Constantly expiring content
Our third and final problem is that user-generated content doesn't last forever. Particularly on listings sites, it's constantly expiring and changing.
For most use cases, I'd recommend 301'ing expired content to a relevant category page, with a message triggered by the redirect notifying the user of why they've been redirected. This typically comes out as the best combination of search and UX.
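A minimal sketch of that redirect-plus-message pattern, using Flask purely for illustration; lookup() and the template names are hypothetical:

```python
from flask import Flask, redirect, request, render_template

app = Flask(__name__)

@app.route("/job/<int:job_id>")
def job_page(job_id):
    listing = lookup(job_id)  # hypothetical data-access helper
    if listing.expired:
        # 301 to the relevant category page, carrying a flag so that
        # page can tell the user why they were redirected.
        return redirect(f"/jobs/{listing.category}?expired=1", code=301)
    return render_template("listing.html", listing=listing)

@app.route("/jobs/<category>")
def category_page(category):
    # Show the "that listing has expired" notice when flagged.
    expired_notice = request.args.get("expired") == "1"
    return render_template("category.html", category=category,
                           show_expired_message=expired_notice)
```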
For more information or advice on how to deal with the edge cases, there's a previous Moz blog post on how to deal with expired content which I think does an excellent job of covering this area.
Summary
In summary, if you're working with listings sites, all three of the following need to be kept in mind:
- How are the landing pages generated? If they're generated using free text or facets, have the potential problems been solved?
- Is user generated content cannibalising the main landing pages?
- How has constantly expiring content been dealt with?
Good luck listing, and if you've come across any other tricky problems or solutions while working on listings sites, let's chat about them in the comments below!
I enjoyed this post and will be referring people to it for sure. I see a lot of Panda hit sites that have crazy internal duplication because they've allowed their pages that were generated based on user searches to be indexed, or because they've got every possible combination of facets indexed.
I'd love to see a post (or perhaps an addition to this one) that gives examples of robots.txt or .htaccess files as described in this post. That would be incredibly helpful!
Should've mentioned this in the post. I think the niftiest way I've seen to implement this was adding an extra parameter to the URL when the faceted nav goes too deep, which is then blocked in robots.txt.
Full credit to Mike Pantoliano's old post on faceted navigation. It's solution 2.
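That trick might look something like this (the parameter name is invented): once the faceted nav passes the depth limit, every deeper facet link gets a marker parameter, and a single wildcard rule in robots.txt blocks it:

```
# Hypothetical deep-facet URL once the depth limit is exceeded:
#   /televisions/samsung/46-inch/full-hd/?crawl=no
User-agent: *
Disallow: /*?crawl=no
```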
Thanks for the very detailed post. One question related to this topic has puzzled me for a long time. When we search "Lions Photobucket", the first few results on the SERP seem to be custom search URLs. How do you get those custom search URLs indexed in Google?
Hello Brij,
The simplest and most helpful way I know is to create an XML sitemap listing the pages you want Google to index. You can read about it in detail here: https://support.google.com/customsearch/answer/115... . It will definitely help you.
Thanks
This is a really good question!
Cornel
The URLs you're referring to are just search friendly URLs.
To get them indexed, you'd link down to those URLs from say an animal category.
Dominic has already answered your question in his blog post. Look for "Photobucket uses a free text search to generate pages" and read from there. You can have search-friendly URLs or not, but you have to literally point Google at each particular URL/page on your website. That's why you create these custom menus, either in the sidebar or anywhere else, so that Googlebot can crawl these URLs and add them to its index. And of course, they'd better be search friendly.
Besides, these pages should contain value so as not to be considered "thin". It's all in the article.
Hey guys, thanks for the reply! But you should note that "lions" is not a category under animals; it's not a page linked as a category, it's a search result page. Notice also that the two URLs in the search results, one for "lion" and a second for "lions", have different results. How to index those search result pages is my question.
It's a good question BrijB, but all the answers given above are still correct. Search engines can't randomly generate search pages from an input box; they have to be given the link somewhere, even if we can't find it.
That link could come from anywhere: a sitemap like Shubham Tiwari mentioned, a category page, a top-level menu, another website that links to it. Photobucket's search will try to provide relevant results for whatever you input, so in your case a link exists somewhere to both the "lion" page and the "lions" page. I think a large sitemap index is probably the best guess, but the internet is a big place!
Great post. Personally, I liked the part about making quality landing pages; months ago I started creating quality landing pages and am having very good results.
Worth reading for eCommerce marketing folks! I would like to share one experience here. I am working on an eCommerce website with 300,000+ products. We found certain issues indexing/ranking category pages, because the website has 1,000+ category pages and certain category pages contain very few products, creating low-quality, low-information pages on the site.
We developed the website on Magento. The title and meta description were populated by a Magento formula, so it was not working for us.
As you mention, URL structure and titles are very important. Similarly, we decided to draft the title tag and meta description manually for each category page, defining primary and secondary keywords. Honestly, it's working out well, and we're able to see a good lift in search impressions and clicks.
I have an idea about indexing content based on usage and context, instead of indexing HTML structure and looking for inbound and outbound links. I believe in a unit: an information unit based on its specification.
- A product as a unit chunk of information, where price, colours, weight, and usage are properties
- A job as an information unit, where job provider, start date, salary, and eligibility are properties
Using this technique, indexing and retrieval would be much easier.
Hello,
Nice article, thanks for sharing. One question where maybe you guys have some suggestions: what if you have content that expires after, let's say, 30 days, but the person who posted it may come back later and prolong the listing? What's best in this case:
a) showing a page saying the listing has expired, returning a 200 header
b) or 301 redirecting it to its category?
Regards
Congratulations on the post. The examples you provide have been very helpful. Thank you.
Very valuable stuff here Dominic as I have been facing the exact same issues with a jobs site I am managing. My team and I have wrangled back and forth on all of these points and it's good to see some solutions validated and others we didn't fully take into consideration.
One area we are having difficulty with is a sunsetting strategy for expired pages. For example, /marketing-jobs-nowhere-ca has 50 job listings, all for a single company, but now that company went under or for whatever reason pulls all the job listings. How and where would we 301 this URL along with the company job listings page (/company-jobs)? Would we just noindex/follow until jobs return to either of these pages, then revert back? Is there a problem with constantly switching back and forth between noindex/index? I noticed Indeed seems to noindex pages with fewer than 10 results -- that's pretty aggressive!
Also, how and when do you decide to break out and create a new category page? For example, /marketing-los-angeles-ca will always contain results, but when should we create /virtual-reality-marketing-los-angeles or any other sub-category of a popular category page?
Great post. I'd add these problems with meta tags and crawling:
- Title tags: when writing a title tag, keep it under 70 characters so that it is not truncated in the results. Put the most important keyword first, followed by the second and third most important keywords, separated by a hyphen or a pipe.
- Robots.txt issues: the robots.txt file is an important file on your server because it tells search engine spiders how to engage with your site to index your content. Just a minor change of one character in the robots.txt file can cause specific indexing issues, and, unintentionally, you could be blocking your whole site from being indexed. Your site then will not show up in search results for anything.
- 302 redirects instead of 301s: a 302 redirect reflects a temporary redirect. If you are removing a page for a short period of time to make changes, you want to use the 302.
- 404 errors: you can find the crawl errors section in your Webmaster Tools in the left sidebar under the "Crawl" dropdown. Once you click on "Crawl Errors," you will see the "Not found" section.
- Internal linking: when a search engine crawler is going through your site, it counts on your internal link structure to bring it to every page of your site so it can be indexed easily and clearly. With a poor internal linking structure, there may be pages the crawlers cannot reach because there are no internal links providing direction. Make sure that every page of your site has links going to it.
Good luck!!
Thanks for the tips. I totally agree that everything that's indexed should provide value. I think forcing users to write a shorter title makes them think more about the content and write more descriptive titles.
Very clever, this makes a lot of sense. Many thanks.
There are several sites that generate automated pages from users' search terms and so create internal duplicate content. Should we be so foolish as not to realise they're making a big mistake?
This is common sense; anyone with half a brain can understand that you should not create systems that generate automatically indexed pages.
They spend so much money developing their websites, but can't spend money on an SEO.
It's amazing...
We just launched our listings site. Thanks to you and Moz for providing such helpful info. If anyone wants to give feedback on what our site is doing right or wrong, check us out at https://findthebest.gift/
Thanks for the post Dominic! Very informative.
"But with so many tools now available for businesses to control this information, it is surprising that the data consistency problem is still so common."
I'm not surprised. Two years ago I took on a multi-location business that had used yellow page tracking numbers (over 20 different #s) through the early 2000s, and one location had moved 8 years ago. I spent 2-6 hours a week for 2 years trying to fix/kill/correct the seemingly infinite permutations of wrong phone numbers/old address that were associated with the business name. It's *mostly* better, but there's still a lot of screwed up listings for them out there. And that's a business that understood the importance of their online presence and was willing to invest significantly in correcting the problem. Most businesses aren't nearly as aware, much less have the wherewithal to sink that kind of time/money into fixing the problem. (And yes, I've tried paid services-IMO it actually exacerbated the problem).
Great post Dominic! You explained it in an excellent way. Thank you for sharing your experience with us; it is very useful.
Very nice and effective post with proper research; you have covered everything in your summary. I also focused on making perfect category pages with proper URL structure and title meta, and it works well for me. I think this is one important lesson that every SEO must learn.
It's a very useful post Dominic, and congratulations on your first post at Moz. Though you've covered everything in detail, I'd like to know what kind of challenges you're facing after the Google updates. I mean, do listings sites have some other concerns?
Thanks
Hi Dominic
Congrats on your first post. I really liked it, but after reading it I still have some questions:
1. Concerning the expired content: you mention the article on moz.com, but that only lists 3 options (301/404/leave the content in place). There is, however, a 4th option, the "unavailable_after" meta tag (as explained here). What's your opinion on this tag?
2. The free text example of Indeed: isn't that a very risky strategy (as Google condemns indexing search results pages)?
As this article will probably get referenced a lot on Moz.com, it might be interesting to include the article on faceted search from the Google Webmaster blog, which lists some common mistakes with faceted navigation.
rgds,
Dirk
One of the most difficult, and often overlooked, areas is information architecture. When you drill down to very specific areas, content will often get reused across these landing pages. This creates a BIG issue, and creating valuable content for all of these landing pages is a manual chore. Because of that, many people will not do it, which creates a big opportunity for the people who do. Great summary of it Dominic!
Hi Dominic!
As you said in the post, landing pages are very important for driving traffic conversion, as they will surely be the first customer contact with our business. They must be of high quality, since landing pages are the tools with which we reach our potential customers.
They must be designed strategically, with a clear objective, design, title, interesting and well-structured content, images, etc.
Thanks for sharing these problems and their solutions with us. The content cannibalisation part is possibly the most complicated for me to solve.
Great post!
Hi
Does duplication of content on a listings site affect the SERPs?
Thank You
Great post. I have worked on these kinds of sites for many years and agree with a lot of what you are saying. Do you have any data on blocking listings versus ranking improvement and the loss of long-tail traffic? There would be a very interesting study in there, I think, for someone.
A very nice and detailed post Dominic. The major problems with listings websites are content duplication and thin content pages. We can also identify the juiciest terms and create separate pages for them with some content, so they can rank better in the SERPs and drive better conversions.
Very useful article!
Dominic, could you please tell me whether you're deindexing listing pages that have fewer than 3 search results? Did you implement this method after the launch of the website, or was it like that from the beginning?
I assume I may have a lot of such pages, but it's hard to estimate how many. I'm afraid I could lose a lot of landing pages. Do you remember what percentage of landing pages you de-indexed because of this change, and how did it work out? Did your site get a Panda bonus? ;)
Please share more on this topic. I'm very interested.
Br.
Roman
I don't have estimates for my site, but are you tracking your site with GA or something similar? You should be able to find all your search pages in the landing page report; set a segment for organic search traffic only and see how much benefit they're bringing you.
Really nice article. I like the landing page part the most, but how do you manage page quality? - MoonTechnolabs
Hi Dom
Informative post. The way you explained it is really awesome. Thanks for sharing this info.
Thanks for the detailed post!
Hello Dom,
Very good explanation. I also like the landing page part :)
Hello Dom,
Good explanation. I like the landing page part the most.
Hello, your article is very useful. Thanks for sharing.
Nice information. I like the landing pages part of this post. Thanks for sharing this type of information about SEO.
Thanks for discussing these important topics.