In all my years of SEO consulting, I’ve seen many clients with wild misconceptions about XML sitemaps. They’re a powerful tool, for sure — but like any power tool, a little training and background on how all the bits work goes a long way.
Indexation
Probably the most common misconception is that the XML sitemap helps get your pages indexed. The first thing we’ve got to get straight is this: Google does not index your pages just because you asked nicely. Google indexes pages because (a) they found them and crawled them, and (b) they consider them good enough quality to be worth indexing. Pointing Google at a page and asking them to index it doesn’t really factor into it.
Having said that, it is important to note that by submitting an XML sitemap to Google Search Console, you’re giving Google a clue that you consider the pages in the XML sitemap to be good-quality search landing pages, worthy of indexation. But, it’s just a clue that the pages are important... like linking to a page from your main menu is.
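For reference, here's what a minimal XML sitemap file looks like (the URLs below are just placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you consider a worthy search landing page -->
  <url>
    <loc>https://www.example.com/widgets/purple-widget/</loc>
    <lastmod>2017-04-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/widgets/</loc>
    <lastmod>2017-03-15</lastmod>
  </url>
</urlset>
```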
Consistency
One of the most common mistakes I see clients make is to lack consistency in the messaging to Google about a given page. If you block a page in robots.txt and then include it in an XML sitemap, you’re being a tease. "Here, Google... a nice, juicy page you really ought to index," your sitemap says. But then your robots.txt takes it away. Same thing with meta robots: Don’t include a page in an XML sitemap and then set meta robots "noindex,follow."
While I’m at it, let me rant briefly about meta robots: "noindex" means don’t index the page. “Nofollow” means nothing about that page. It means "don’t follow the links outbound from that page," i.e. go ahead and flush all that link juice down the toilet. There’s probably some obscure reason out there for setting meta robots "noindex,nofollow," but it’s beyond me what that might be. If you want Google to not index a page, set meta robots to "noindex,follow."
OK, rant over…
In general, then, you want every page on your site to fall into one of two buckets:
- Utility pages (useful to users, but not anything you’d expect to be a search landing page)
- Yummy, high-quality search landing pages
Everything in bucket #1 should either be blocked by robots.txt or blocked via meta robots "noindex,follow" and should not be in an XML sitemap.
Everything in bucket #2 should not be blocked in robots.txt, should not have meta robots "noindex," and probably should be in an XML sitemap.
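To make that concrete, here's a sketch of what each signal looks like (the path and page are hypothetical):

```
# robots.txt: one way to keep a bucket #1 utility page out of the index
User-agent: *
Disallow: /share-via-email/
```

```html
<!-- Or, on the utility page itself (bucket #1): stay out of the index, but let link equity flow -->
<meta name="robots" content="noindex,follow">
```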
Overall site quality
It would appear that Google is taking some measure of overall site quality, and using that site-wide metric to impact ranking — and I’m not talking about link juice here.
Think about this from Google’s perspective. Let’s say you’ve got one great page full of fabulous content that ticks all the boxes, from relevance to Panda to social media engagement. If Google sees your site as 1,000 pages of content, of which only 5–6 pages are like this one great page… well, if Google sends a user to one of those great pages, what’s the user experience going to be like if they click a link on that page and visit something else on your site? Chances are, they’re going to land on a page that sucks. That's bad UX. Why would Google want to send users to a site like that?
Google engineers certainly understand that every site has a certain number of "utility" pages that are useful to users, but not necessarily content-type pages that should be landing pages from search: pages for sharing content with others, replying to comments, logging in, retrieving a lost password, etc.
If your XML sitemap includes all of these pages, what are you communicating to Google? More or less that you have no clue as to what constitutes good content on your site and what doesn't.
Here’s the picture you want to paint for Google instead. Yes, we have a site here with 1,000 pages… and here are the 475 of those 1,000 that are our great content pages. You can ignore the others — they’re utility pages.
Now, let's say Google crawls those 475 pages, and with their metrics, decides that 175 of those are "A" grade, 200 are "B+," and 100 are "B" or "B-." That’s a pretty good overall average, and probably indicates a pretty solid site to send users to.
Contrast that with a site that submits all 1,000 pages via the XML sitemap. Now, Google looks at the 1,000 pages you say are good content, and sees over 50% are "D" or "F" pages. On average, your site is pretty sucky; Google probably doesn’t want to send users to a site like that.
The hidden fluff
Remember, Google is going to use what you submit in your XML sitemap as a clue to what's probably important on your site. But just because it's not in your XML sitemap doesn't necessarily mean that Google will ignore those pages. You could still have many thousands of pages with barely enough content and link equity to get them indexed, but that really shouldn't be indexed.
It's important to do a site: search to see all the pages that Google is indexing from your site, so you can discover pages you forgot about. Then clean those out of the "average grade" Google is going to give your site by setting meta robots "noindex,follow" on them (or blocking them in robots.txt). Generally, the weakest pages that still made the index are going to be listed last in a site: search.
Noindex vs. robots.txt
There’s an important but subtle difference between using meta robots and using robots.txt to prevent indexation of a page. Using meta robots "noindex,follow" allows the link equity going to that page to flow out to the pages it links to. If you block the page with robots.txt, you’re just flushing that down the toilet.
On my own site, for example, the pages I block in robots.txt aren't real pages — they're tracking scripts — so I'm not losing link equity, as these pages DO NOT have the header with the main menu links, etc.
Think of a page like a Contact Us page, or a Privacy Policy page — probably linked to by every single page on your site via either the main menu or the footer menu. So there’s a ton of link juice going to those pages; do you just want to throw that away? Or would you rather let that link equity flow out to everything in your main menu? Easy question to answer, isn’t it?
Crawl bandwidth management
When might you actually want to use robots.txt instead? Perhaps if you’re having crawl bandwidth issues and Googlebot is spending lots of time fetching utility pages, only to discover meta robots "noindex,follow" in them and having to bail out. If you’ve got so many of these that Googlebot isn’t getting to your important pages, then you may have to block via robots.txt.
I’ve seen a number of clients see ranking improvements across the board by cleaning up their XML sitemaps and noindexing their utility pages.
Ask yourself: do I really have 6,000 to 20,000 pages that need crawling daily? Or is Googlebot chasing reply-to-comment or share-via-email URLs?
FYI, suppose you’ve got a core set of pages where content changes regularly (like a blog, new products, or product category pages), plus a ton of pages (like single product pages) where it’d be nice if Google indexed them, but not at the expense of not re-crawling and indexing the core pages. In that case, you can submit the core pages in an XML sitemap to give Google a clue that you consider them more important than the ones that aren’t blocked, but aren’t in the sitemap.
Indexation problem debugging
Here’s where the XML sitemap is really useful to SEOs: when you’re submitting a bunch of pages to Google for indexing, and only some of them are actually getting indexed. Google Search Console won’t tell you which pages they’re indexing, only an overall number indexed in each XML sitemap.
Let’s say you’re an e-commerce site and you have 100,000 product pages, 5,000 category pages, and 20,000 subcategory pages. You submit your XML sitemap of 125,000 pages, and find out that Google is indexing 87,000 of them. But which 87,000?
First off, your category and subcategory pages are probably ALL important search targets for you. I’d create a category-sitemap.xml and subcategory-sitemap.xml and submit those separately. You’re expecting to see near 100% indexation there — and if you’re not getting it, then you know you need to look at building out more content on those, increasing link juice to them, or both. You might discover something like product category or subcategory pages that aren’t getting indexed because they have only 1 product in them (or none at all) — in which case you probably want to set meta robots "noindex,follow" on those, and pull them from the XML sitemap.
Chances are, the problem lies in some of the 100,000 product pages — but which ones?
Start with a hypothesis, and split your product pages into different XML sitemaps to test those hypotheses. You can do several at once — nothing wrong with having a URL exist in multiple sitemaps.
You might start with 3 theories:
- Pages that don’t have a product image aren’t getting indexed
- Pages that have fewer than 200 words of unique description aren’t getting indexed
- Pages that don’t have comments/reviews aren’t getting indexed
Create an XML sitemap with a meaningful number of pages that fall into each of those categories. It doesn’t need to be all pages in that category — just enough that the sample size makes it reasonable to draw a conclusion based on the indexation. You might do 100 pages in each, for instance.
Your goal here is to use the overall percent indexation of any given sitemap to identify attributes of pages that are causing them to get indexed or not get indexed.
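As a rough illustration of that splitting step, here's a minimal Python sketch; the product fields, sample size, and file names are my own assumptions, not something prescribed by the post:

```python
# Sketch: bucket product URLs into hypothesis sitemaps (fields/filenames are hypothetical)
from xml.sax.saxutils import escape

def write_sitemap(filename, urls):
    """Write a bare-bones XML sitemap containing the given URLs."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        f.write("</urlset>\n")

def build_hypothesis_sitemaps(products, sample_size=100):
    """products: list of dicts with 'url', 'image_url', 'description', 'review_count'."""
    no_image = [p["url"] for p in products if not p.get("image_url")]
    thin_copy = [p["url"] for p in products if len(p.get("description", "").split()) < 200]
    no_reviews = [p["url"] for p in products if p.get("review_count", 0) == 0]

    # ~100 URLs per hypothesis is enough to compare indexation rates in Search Console
    write_sitemap("products-no-image.xml", no_image[:sample_size])
    write_sitemap("products-thin-description.xml", thin_copy[:sample_size])
    write_sitemap("products-no-reviews.xml", no_reviews[:sample_size])
```

Submit each of those to Search Console as its own sitemap and compare the indexation percentages.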
Once you know what the problem is, you can either modify the page content (or links to the pages), or noindex the pages. For example, you might have 20,000 of your 100,000 product pages where the product description is less than 50 words. If these aren’t big-traffic terms and you’re getting the descriptions from a manufacturer’s feed, it’s probably not worth your while to try and manually write an additional 200 words of description for each of those 20,000 pages. You might as well set meta robots to "noindex,follow" for all pages with fewer than 50 words of product description, since Google isn’t going to index them anyway and they’re just bringing down your overall site quality rating. And don’t forget to remove those from your XML sitemap.
Dynamic XML sitemaps
Now you’re thinking, "OK, great, Michael. But now I’ve got to manually keep my XML sitemap in sync with my meta robots on all of my 100,000 pages," and that’s not likely to happen.
But there’s no need to do this manually. XML sitemaps don’t have to be static files. In fact, they don’t even need to have a .XML extension to submit them in Google Search Console.
Instead, set up rules logic for whether a page gets included in the XML sitemap or not, and use that same logic in the page itself to set meta robots index or noindex. That way, the moment that product description from the manufacturer’s feed gets updated by the manufacturer and goes from 42 words to 215 words, that page on your site magically shows up in the XML sitemap and gets its meta robots set to "index,follow."
On my travel website, I do this for a ton of different kinds of pages. I’m using classic ASP for those pages, so my sitemap URLs are actually server-side scripts. When these sitemaps are fetched, instead of rendering an HTML page, the server-side code simply spits back the XML. Each one iterates over a set of records from one of my database tables and spits out an entry for each record that meets a certain set of criteria.
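To give a sense of the pattern, here's a minimal sketch of the same idea in Python/Flask rather than classic ASP; the database table, columns, URL structure, and the 50-word threshold are illustrative assumptions:

```python
# Minimal sketch of a dynamic sitemap endpoint (Flask + SQLite; schema is hypothetical)
import sqlite3
from xml.sax.saxutils import escape
from flask import Flask, Response

app = Flask(__name__)

@app.route("/product-sitemap.xml")
def product_sitemap():
    # The same rule the product pages use to decide meta robots index vs. noindex:
    # only products with at least 50 words of description make the sitemap.
    conn = sqlite3.connect("site.db")
    rows = conn.execute(
        "SELECT slug, last_updated FROM products "
        "WHERE (LENGTH(description) - LENGTH(REPLACE(description, ' ', '')) + 1) >= 50"
    ).fetchall()
    conn.close()

    entries = "".join(
        f"  <url><loc>https://www.example.com/products/{escape(slug)}</loc>"
        f"<lastmod>{last_updated}</lastmod></url>\n"
        for slug, last_updated in rows
    )
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + entries + "</urlset>\n"
    )
    return Response(xml, mimetype="application/xml")
```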
Video sitemaps
Oh, and what about those pesky video XML sitemaps? They're so 2015. Wistia doesn't even bother generating them anymore; you should just be using JSON-LD and schema.org/VideoObject markup in the page itself.
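If you go the JSON-LD route, the VideoObject markup looks roughly like this (all values are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "How to plan your trip",
  "description": "A short walkthrough of planning a trip with our tools.",
  "thumbnailUrl": "https://www.example.com/videos/trip-planning-thumb.jpg",
  "uploadDate": "2017-04-01",
  "duration": "PT2M30S",
  "contentUrl": "https://www.example.com/videos/trip-planning.mp4"
}
</script>
```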
Summary
- Be consistent — if it’s blocked in robots.txt or by meta robots "noindex," then it better not be in your XML sitemap.
- Use your XML sitemaps as sleuthing tools to discover and eliminate indexation problems, and only let/ask Google to index the pages you know Google is going to want to index.
- If you’ve got a big site, use dynamic XML sitemaps — don’t try to manually keep all this in sync between robots.txt, meta robots, and the XML sitemaps.
Has anyone got an interesting example of using XML sitemaps to diagnose what they needed to do to their content to get Google to start indexing a certain class of pages?
I'd love to see breakpoint stats on something like minimum image size, or original vs. stock photo image, adding a video to get the page indexed, or internal linking or clicks-from-home-page minimums.
Worth pointing out that "XML" sitemaps don't have to be in XML format. They can just be a text file of URLs, one per line, and they're just as valid and trusted as actual XML. A very good option to avoid complex dev work, and you can even make them on your own machine if it's a smallish site.
However, if you want to implement hreflang via an XML sitemap, then it'll need to be true XML.
Good point. But if you have a small site, you might as well use the free version of Screaming Frog and let it generate a complete XML sitemap for you. Then you can tweak priorities, last update dates, etc. as needed.
Seems priorities are ignored according to Google: https://twitter.com/methode/status/846796737750712...
Interesting--I guess they must have seen very few people using them in a helpful way!
Great Post!
I would like to add one little thing: I also exclude every URL that has a canonical pointing to another page, because that tells Google I do not want the non-canonical page to be indexed.
Greetings from Germany!
Excellent point!
Thanks Michael for nailing the point down in plain language without much of the technical jargon we usually see on blog posts about XML sitemaps.
Now, apart from that, I also want to point out another important thing when it comes to XML sitemaps: some sites don't realize that XML sitemaps actually gather all the pages from the site, including the ones that were crafted with care, like lead generation pages (lead magnets) where PDF downloads are offered in exchange for readers' email addresses.
So one could essentially grab the PDF without providing the email address the site owner requires. I have come across hundreds of sites with this problem and have personally emailed them to fix it. Heck, I wrote a book on this. (I am not including its link here lest I end up appearing spammy.) Nevertheless, this appears to be seldom addressed by site owners. Perhaps the seemingly technical stuff scares them off, though it just boils down to plain common sense.
That's a great point, Arun. If they do this, not only could a savvy web developer see through this and get the PDFs directly, but it would also be encouraging Google to index those PDFs directly, so non-developers might get to them without going through the paywall...directly from search results!
Totally agree!
Aptly named, here's the name of the book I mentioned earlier (just realized I didn't give the name): The Backdoor in Your Blog, available on Amazon.
A well-explained guide to XML sitemaps. I believed some of the myths discussed in this post, like the idea that a sitemap helps get pages indexed; in fact, that's what I was taught in my SEO training, but now it's clear. This is how we learn so many things from Moz blogs.
Great post Michael,
I've done something similar for an eCommerce website in the past, and after optimizing the sitemap and robots.txt, we saw better crawl stats in GSC.
The issue was something like this: the eCommerce site had created specific pages for all their categories, yet they were allowing dynamic search URLs to get indexed, and they even had these URLs in the sitemap, which is dynamically generated.
Another issue was that they had user profiles on their website that only contain order history and related stuff, and a large number of these URLs were also part of the sitemap.
We had a discussion with our client about the importance of such user profiles for search users, and we decided to remove them from the sitemap after that. Then we got all these profile and dynamic search URLs deindexed from the search engine, followed by blocking them in robots.txt. Within days, we saw improved crawl stats for the website.
Thanks
Thanks Praveen...this is probably one of the biggest problems e-commerce sites have: where the very helpful UX gives you filtering, sorting, and user options that cause incredible numbers of variations on what really is pretty much the same page of content.
Wow! Very helpful information. Now I should check how to generate my dynamic XML sitemaps on my Magento site with only my important pages.
Thank you!
You don't really need anything Magento-specific for this....just any server-side programming language that can access your Magento database.
Hello Michael!
Thanks for your advice, I will keep it in mind from now on.
Sometimes we have the best information in front of our eyes and we do not realize it.
I've learned a lot about XML sitemaps in a single post; it clarified several ideas.
I'll share the link so others can read it.
Excellent post Michael, I use Yoast plugin and that helps me solve most of these problems.
What about All in Seo?
Not sure. I used All in 1 SEO several years ago, but I've since switched all of my sites and my clients' sites to Yoast.
What a great recommendation about the utility pages. I have been wondering whether pages that have no search value and are more of a user tool should be ignored or indexed, and you just answered that. I really think you touched on some great points in this read by talking about both the value of sitemaps and how Google and other search engines have a pre-compiled algorithm that will determine if a page is worth indexing.
Last note: the e-commerce indexing section is fantastic for anyone wondering why there are so many products not being consumed by the index bot.
Thanks for the contribution to the Moz community.
Thanks Tim!
Great summary on XML Sitemaps Michael! I'd be lying if I said I didn't have a couple misconceptions about them throughout the years, but you summed it up quite nicely and this will be great to refer back to. Also, I most definitely agree that understanding the difference between a utility page and a search landing page for your website is crucial.
Thanks Nicholas!
Excellent Post Michael! Very helpful!
Great post, Michael. Thanks!
XML sitemaps are always a problem. A mismatch between the XML sitemap and robots.txt is real.
Thanks! And I agree. It gets even worse when meta robots doesn't line up with robots.txt and that doesn't line up with the XML sitemap.
Oh my god... seriously, all these years I thought just adding an XML sitemap was enough to get attention from Google. Honestly, I never knew there was so much to XML sitemaps. Good that at least I have learned now. Thanks a lot.
I want to add 2 important things which need to be understood along with this great article!
1) HTML sitemap: As Michael explained, an XML sitemap is like giving Google a clue that these pages are important for indexing, whereas an HTML sitemap usually gives visitors clues so they have a better and easier site experience.
2) XML sitemap priority: I often see clients assign a high priority (1.0) to all of the URLs in a sitemap, but that won't ever help. It's only a hint to the search crawler for choosing between URLs on the same site.
I have a website with over 300,000 indexed pages where the users add content themselves. How am I supposed to make an XML sitemap with that many links? Also, what if the users delete some content and it remains in my sitemap?
For big XML sitemaps, you can break them into parts and upload them separately.
I agree with Shiv--break it into many smaller sitemaps. Google limits you to 50,000 URLs per sitemap, in fact. You should be generating your sitemap automatically, or at least on a very regular basis, from the actual content in your CMS.
Hi Alireza, I reviewed your website and I recommend that you make category-wise sitemaps, i.e., Electronic Components gets its own sitemap, and the other categories get theirs. Please let me know if there is any follow-up question.
Definitely agree. If you have an HTML sitemap, and you're finding a lot of users are resorting to the sitemap to find what they're looking for, then this is a good indication that you need to improve your main navigation!
Agreed on the sitemap priority number. People need to understand that it's there for you to give Google a clue as to which of two or more pages about the same topic is the more important one, i.e. your category page about purple widgets vs. a blog post about purple widgets. It's not going to affect how your page ranks against pages from another website.
Well said. Google clearly says to design as user-friendly and responsive a website as possible, and it'll automatically add SEO value. An HTML sitemap plays a big role in user-friendliness. Thanks Michael!
Great technical article! Very useful for SEOs without a technical background, like me.
Thanks Mark!
Very, very helpful and ready for immediate application after I resolve some areas of ignorance. For example, I now understand that any pages behind password protection should be noindexed as they are not landing pages. But, over half of my pages are PHP action pages with no HTML block. Do these pages need to be noindexed?
Any pages that are password-protected shouldn't really need noindex, unless there's actually a way for Google to find a link to them and get the content without logging in as one of your users. If that's the case, then you probably need to work on your login security :-).
For your PHP pages that have no HTML on them, I'd block those in robots.txt. There's no point in letting Google crawl those as they have no outbound links to send link juice to other pages on your site.
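For example, something along these lines in robots.txt; the paths are hypothetical, just to show the pattern:

```
User-agent: *
# Block PHP action scripts that render no HTML and link to nothing
Disallow: /actions/
Disallow: /process.php
```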
I have been weighing the benefits of publishing a series of small articles to boost regular content but I am worried about producing regular content. Would I use XML Sitemaps to keep the crawlers focused on higher quality content?
If you only have your more important articles in your XML sitemap, it MAY cause Google to crawl those first, especially if you resubmit that sitemap.
Hello,
I have used an XML sitemaps plugin on my WordPress site for a year, and it was working fine, but a few days ago I found a spam issue in it when I tried to click on "XML sitemap", so I have started pinging my site manually.
Hi Michael, thanks for the article. What do you think about submitting sitemaps regularly, broken out by month, containing the latest pages?
For example "sitemap-2017-april.xml" etc.
Thanks. Cheers, Martin
When you submit an XML sitemap in Search Console, it's a hint/suggestion to Google that you've either updated that content or it's new. So, if you've got new articles in that sitemap, then that can be a good idea.
But that sounds like a lot of manual work to me :-).
I'd do something programmatically that pulled the latest 10 days worth of articles, generated a newest-articles.xml sitemap, setting the modification frequency to daily on all the URLs.
I think this article could help a lot of people like me, because I just use the XML sitemap plugin by Arne Brachold and don't configure anything. It's also true that my sites are quite small, so this tool may matter less for them than for huge projects.
I get the below error for my XML sitemap in Search Console. Not able to resolve it :/
Your Sitemap or Sitemap index file doesn't properly declare the namespace. Expected: https://www.w3.org/1999/xhtml Found: https://www.sitemaps.org/schemas/sitemap/0.9
Parent tag: url
Tag: link
I'm betting a special character in there somewhere is messing up the XML. What did you edit it with or create it with?
Do you have any advice for small sites? I have a sitemap that updates daily and Google still only indexes weird pages. I even used the Google (XML) Sitemaps Generator plugin for WordPress and it's still a mess. Our blog doesn't show up at all, and pages that don't exist, like "portfolio tag" and "branding tag", show up constantly no matter how many times I block them.
Hey Michael, just to touch on what you said regarding utility pages: I often ask myself before posting anything on one of our websites, "Is it relevant?" It sounds kind of odd, but when you're writing content 11 hours a day, 4 days a week, it does get tiresome and easy to drift off topic. But with regards to this post and XML sitemaps, you're absolutely right.
I often tell my clients we have to fix your layout, drop some keywords, and make your website even and consistent. Many SEO agencies here in Australia often forget to write their content for humans to read and search engines to rank. If a human won't read my content, why would a search engine?
Out of all the posts, pages, and backlinks I have submitted to Google, the one issue that gives me anxiety is the sitemap. I didn't think about dynamic sitemaps before now; it makes a lot of sense. Thanks for sharing.
Don't underestimate an XML sitemap. And make sure it is set up and working in Google Search Console and Bing Webmaster Tools. Too many forget the XML sitemap's importance.
Michael, excellent content, thanks for posting! We have a WordPress site with 50k+ indexed pages. I've been advised against using Yoast to manage our XML sitemap for our site specifically and am currently using Screaming Frog to manually create the XML sitemap.
At the end of the day, do we need to build out our XML sitemap based around the rules you mapped out above, specific to our content? Or is there another tool/process you'd recommend? Right now our process is very manual and I want to find a more automated/optimized route to handling our XML sitemap.
For large sites, I recommend building internal processes for generating your sitemaps. Break your content down into various types, and generate a separate sitemap for each type. For my travel site, for instance, I have an XML sitemap for just hotel pages, another for travel specials, another for static pages, and a set of them (Yoast-generated for these) for the blog pages (only the blog part of my site is WordPress). It's a relatively simple thing to iterate over all of a certain type of record in your database and spit out the URLs for those types of entities, in XML sitemap format.
Good information.
But how do I find which pages are indexed and which are pending in Google?
Break your sitemap into many smaller sitemaps. You can then look for sitemaps that have a low indexation rate, and then that's where your problems lie. You can then take THOSE problem sitemaps, and break them into smaller sitemaps even further, based on whatever hypothesis you have on why some of those URLs aren't getting indexed and others are.
What if the pages indexed by Google outnumber any possible XML sitemap we can create? Do we still need one?
Absolutely. In fact, this is an indication that you have a big problem with indexation, in that Google is finding and indexing pages that you don't think are important or potential search landing pages! Likely that means they're very light on content...and if Google ends up indexing them, then from an overall site perspective, Google is seeing the average content quality per page as lower than they should.
As an example, let's say you have a page for sharing a URL from your website. Let's say this page takes some parameter that indicates the page to be shared, and at the top shows the heading from the page and a snippet from the content, plus the usual form fields for sharing...just enough content so that Google does decide to index it. You're not going to put all of those pages in your XML sitemap, of course. If Google is indexing those, and you have 1000 pages of real content on your site, you've now got Google indexing 1000 good pages + 1000 share-this pages of non-content. And so Google will see half your site as pretty marginal content.
Hello Mike, so pretty much index bloat, which in the long run is going to affect how Google sees a website, i.e., whether it's a quality site or a low-quality (low E-A-T) site.
This means that even though a lot of pages are indexed, the crawl rate will go down, the overall rankings will be affected, or, worse, it'll become harder to do clean and proper SEO?
Also, I have created hundreds of sitemaps using a Screaming Frog paid licence, including subdomains, images, videos, etc., but never set priorities. That may be a good idea, but Googlebot ultimately will do what it thinks is best and crawl the pages it feels are most relevant.
I have never created a dynamic sitemap - can you please point me to a resource or tool?
Thank you, and this is a terrific post.
@seogrowthhacker from San Francisco
Hello Vjay,
I think you're exactly right on the index bloat/quality comments.
For dynamic sitemaps, I don't know that there's a tool for that. What I have done is written database queries to return the values I need to figure out all page URLs for a given type, and then form the URLs the same way I'd form them on the web pages that list links to those pages....but instead, spit out XML in the sitemap syntax.
Hi, great post and very helpful. We have a few websites, but one of them, https://flyusanywhere.com/, has Yoast, and I tried to activate the Google XML tool as well, but it won't allow me to run both, as it says they will get confused. Is it better to deactivate the Yoast one and run the Google version, or what do you think is best?
Many thanks
There's no problem in Search Console in submitting a number of different sitemaps. Even if some URLs are included in more than one, that should be just fine. I do this all the time.
Having said that, there might possibly be a conflict between the two plugins, i.e. something simple like they're both trying to write out to sitemap_index.xml or something like that.
This was a great post. I'd also include canonical URLs in Bucket #2. I've seen instances where a product feed that generates the XML sitemaps has dynamic parameters to reference SKUs or unique IDs, which canonical to the clean URL, but only the SKU URLs get added to the sitemap.
Absolutely agree. Good point Mark!
Thanks for sharing, and just a quick question for an insurance website. Please forgive me, I'm still a layman, but if I have agents/brokers that access a training or sensitive-information section that is not intended for public eyes or indexing, isn't this where noindex,nofollow could apply?
Also, does the Yoast plugin automatically update the XML sitemap to match the noindex page option mentioned above?
I wouldn't use just noindex for those, I'd make sure those pages are password-protected instead. Otherwise not-very-well-behaved bots and scrapers will still be able to see (and perhaps copy) those pages.
Important note with Yoast configuration: you MUST make sure that what you're including in your XML sitemaps aligns with what you're indexing/noindexing on the pages themselves. It doesn't do this for you automatically.
Is it better to use a plain, simple, straightforward sitemap or a tree-like sitemap on an e-commerce website?
Google does not follow the sitemap exactly; they crawl more than what the sitemap says and sort of squeeze out everything they can find on your domain that is worthwhile and fresh. This is the main cause of so much duplicate content, especially on e-commerce platforms. Google should consider following the sitemap more strictly, especially for e-commerce websites.
I generally recommend for e-com sites creating a bunch of separate sitemaps for similar pages. Note I said "similar" and not "related"...I wouldn't create a sitemap for all types of pages in one product group, for instance...instead, I'd create a sitemap for blog posts, one for all category pages, one for all subcategory pages, and then one or more for all product pages. You want to be able to see what types of pages are giving you indexation nightmares.
You were unclear as to when it was a good idea to use noindex,nofollow so I thought I'd provide an example.
I use noindex on pages that shouldn't ever be seen (such as a web app) in search engines. While 99% of the time they are accessed by a user/pw wall, I also have a custom HREF and script that will log you into the demo account, thus an avenue where a crawler could find themselves on a page that should never be in the index.
I use nofollow when the majority of links on that page are to other noindex pages, such as in the web app.
Hi Mario, I think I covered that pretty well in the Consistency section? I wouldn't use nofollow on a page unless 100% of the outbound links are to noindexed pages....otherwise, you're just throwing away link juice.
You did, "you’re being a tease." So you're contradicting yourself by saying it's okay to be a tease as long as there is at least 1 link that should be followed, when that's just not true.
I think you misunderstood. I think it's perfectly fine to tell Google you'd like the outbound links from a page to be counted, but that you don't think the page itself is index-worthy content.
Hi Michael,
Thanks for the article.
I have a question regarding how to differentiate between utility pages and high quality search landing pages.
My company is currently working on creating a new ecommerce site for one of our clients who runs a local business. This store has hundreds of products, and I've noticed that all of the product descriptions are word for word the same with just the name of the product being different.
I understand that in an ideal world, we would create unique descriptions for each product, but this client doesn't have the time or money to devote to such an effort for his hundreds of different products.
Since there is so much duplicate content on these pages, would it be a bad idea to noindex,follow these product pages?
With the current site, these pages are being indexed, and I'm wondering if we couldn't improve our client's rankings quicker by not indexing them in the new iteration of his site vs. spending the time, effort, and money to create unique product descriptions with quality content (which isn't a viable option currently).
What are your thoughts on an approach such as this?
Thanks,
Taylor
Hi Taylor,
I think what I would do is this: look at search traffic in aggregate to those product pages--try using URL patterns in Search Analytics in Search Console to see this. If you're not getting search traffic to those pages anyway, then I'd noindex them, as you're right....they may be dragging down your rankings for other pages on the site. If you ARE getting search traffic to them, leave them alone else you're cutting off traffic from Google.
Note that I believe that Google has some sort of overall site quality ranking factor that affects your best pages based on something like the average quality of pages on your site....I believe this based on what I've seen happen on clients' sites when they've pruned off a lot of thin content. But, I don't recall ever seeing any statement from Google backing this up, so it's just my gut feel based on patterns I think I've seen.
Great post explaining XML sitemaps. However, I have noticed that if you want fast indexing, submitting to Google via Google Search Console is the fastest way to get a page indexed. And if your domain has a reasonable amount of authority, the page may start appearing in search results within hours.
Next, for category pages: in case you want a category page to get indexed and rank in the search results, make sure there is enough relevant, unique text, around 1,000 words (the more the better). Then submit it to Google with the option to index the linked pages as well.
Regarding the internal linking issue: for a domain with a reasonable amount of authority, having a lot of internal links will definitely help in getting the page indexed faster.
Good point, Joseph. Submitting (or resubmitting, if you've made a major update) a page in Search Console is a hint to Google that you think it's important and worth crawling before whatever would normally be in the queue to crawl from your website.
Category pages: Google appears to be less fond than it used to be of plain old category archives pages where there's an H1 heading and then a list of either products or blog posts. Fair enough: really all that page is is a list of links (and that's what Google wants to be!). Improving the content on a category page by adding an overview, some images or videos--that makes for a better page about that topic, for sure. From a UX perspective, many users just want to see the products (or blog posts) because they're familiar with the topic overall, and so often people will put a snippet of the overview up top and hide the majority of it initially, and supply a "Read more" link or button.
That's great! Category pages will be helpful for short-tail keywords. Yes, I can relate your explanation of how a category page should look to IKEA's category pages.
For better performance, we must configure the frequency and priority of each URL in the XML sitemap. Don't use invalid URLs in the XML sitemap, and validate it in Google Search Console.
Thanks Michael, lots of useful info in here, thanks for the help. Any recommendations on how to structure the sitemap besides how important the content is? I've seen some sitemaps that tell Google what the content is, e.g. products, blog, articles.
I doubt Google pays attention to those other fields. See the comment above that mentions Gary Illyes' tweet saying even the priority field is "just noise".
Great and useful information.
I have a few doubts about applying this. If you want to avoid indexing pages like 'Who we are' or 'Contact us' and other irrelevant pages, you recommend using meta robots "noindex,follow", right?
An easy way to do it for a WordPress site is the Yoast SEO plugin. Is that the correct way, or is there a better one?
Is there any way to know if a page is A, B, C, D...?
Great post. Very useful for non-technical SEOs. Thank you!
If you don't care about the potential hit to website performance, then robots.txt will be useful. But I recommend using noindex,follow, because it indicates to search engines that you do not want the pages to be indexed.
How do you see robots.txt affecting performance? It's not processed by the web server with every request, like .htaccess is.
I'm a big fan of the Yoast plug-in, and yes, there's a page setting that allows you to noindex specific pages. They've also got some very helpful settings like noindexing subpages of archives, noindexing tag archives, etc.
Thank you Michael.