When should you disallow search engines in your robots.txt file, and when should you use meta robots tags in a page header? What about nofollowing links? In today's Whiteboard Friday, Rand covers these tools and their appropriate use in four situations that SEOs commonly find themselves facing.
Video transcription
Howdy Moz fans, and welcome to another edition of Whiteboard Friday. This week we're going to talk about controlling search engine crawlers, blocking bots, sending bots where we want, restricting them from where we don't want them to go. We're going to talk a little bit about crawl budget and what you should and shouldn't have indexed.
As a start, what I want to do is discuss the ways in which we can control robots. Those include the three primary ones: robots.txt, meta robots, and the nofollow tag—though nofollow is a little bit less about controlling bots.
There are a few others that we're going to discuss as well, including Webmaster Tools (Search Console) and URL status codes. But let's dive into those first few first.
Robots.txt lives at yoursite.com/robots.txt. It tells crawlers what they should and shouldn't access, but it doesn't always get respected by Google and Bing. So a lot of folks, when you say, "hey, disallow this," and then you suddenly see those URLs popping up, you're left wondering what's going on. Look—Google and Bing oftentimes think that they just know better. They think that maybe you've made a mistake. They think, "hey, there are a lot of links pointing to this content, there are a lot of people visiting and caring about this content, maybe you didn't intend for us to block it." The more specific you get about an individual URL, the better they usually are about respecting it. The less specific—meaning the more you use wildcards or say "everything behind this entire big directory"—the worse they are about necessarily believing you.
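To make that concrete, here's a hypothetical robots.txt snippet showing both a very specific directive and broad wildcard ones (the paths are made up for illustration):

User-agent: *
# Specific: block one exact URL
Disallow: /old-press-release.html
# Broad: block an entire directory and any URL with a session parameter
Disallow: /staging/
Disallow: /*?sessionid=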
Meta robots—a little different—that lives in the headers of individual pages, so you can only control a single page with a meta robots tag. That tells the engines whether or not they should keep a page in the index, and whether they should follow the links on that page, and it's usually a lot more respected, because it's at an individual-page level; Google and Bing tend to believe you about the meta robots tag.
And then the nofollow tag, that lives on an individual link on a page. It doesn't tell engines where to crawl or not to crawl. All it's saying is whether you editorially vouch for a page that is being linked to, and whether you want to pass the PageRank and link equity metrics to that page.
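As a quick syntax reminder, a nofollowed link is just a regular anchor tag with a rel attribute added (hypothetical URL):

<a href="https://example.com/some-page" rel="nofollow">some page</a>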
Interesting point about meta robots and robots.txt working together (or not working together so well)—many, many folks in the SEO world do this and then get frustrated.
What if, for example, we take a page like "blogtest.html" on our domain and we say "all user agents, you are not allowed to crawl blogtest.html." Okay—that's a good way to keep that page away from being crawled, but just because something is not crawled doesn't necessarily mean it won't be in the search results.
So then we have our SEO folks go, "you know what, let's make doubly sure that doesn't show up in search results; we'll put in the meta robots tag:"
<meta name="robots" content="noindex, follow">
So, "noindex, follow" tells the search engine crawler they can follow the links on the page, but they shouldn't index this particular one.
Then, you go and run a search for "blog test" in this case, and everybody on the team's like "What the heck!? WTF? Why am I seeing this page show up in search results?"
The answer is, you told the engines that they couldn't crawl the page, so they didn't. But they are still putting it in the results. They're actually probably not going to include a meta description; they might have something like "we can't include a meta description because of this site's robots.txt file." The reason it's showing up is because they can't see the noindex; all they see is the disallow.
So, if you want something truly removed, unable to be seen in search results, you can't just disallow a crawler. You have to say meta "noindex" and you have to let them crawl it.
So this creates some complications. Robots.txt can be great if we're trying to save crawl bandwidth, but it isn't necessarily ideal for preventing a page from being shown in the search results. I would not recommend, by the way, that you do what we think Twitter recently tried to do, where they tried to canonicalize www and non-www by saying "Google, don't crawl the www version of twitter.com." What you should be doing is rel canonical-ing or using a 301.
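To illustrate those two options—with example.com standing in for the real domain—the rel canonical goes in the head of each non-preferred (non-www) page, pointing at its www equivalent, e.g. on the homepage:

<link rel="canonical" href="https://www.example.com/" />

And the 301 can be handled at the server level; here's a minimal sketch assuming Apache with mod_rewrite:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]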
Meta robots—that can allow crawling and link-following while disallowing indexation, which is great; the catch is that it still uses crawl budget (the page has to be crawled for the engines to see the tag), but it does keep the index clean.
The nofollow tag, generally speaking, is not particularly useful for controlling bots or conserving indexation.
Webmaster Tools (now Google Search Console) has some special things that allow you to restrict access or remove a result from the search results. For example, if you have 404'd something or if you've told them not to crawl something but it's still showing up in there, you can manually say "don't do that." There are a few other crawl protocol things that you can do.
And then URL status codes—these are a valid way to do things, but they're going to obviously change what's going on on your pages, too.
If you're not having a lot of luck using a 404 to remove something, you can use a 410 to permanently remove something from the index. Just be aware that once you use a 410, it can take a long time if you want to get that page re-crawled or re-indexed, and you want to tell the search engines "it's back!" 410 is permanent removal.
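For example—assuming an Apache server with mod_alias, and a made-up path—a 410 can be returned like this:

Redirect gone /discontinued-page.html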
301—permanent redirect, we've talked about those here—and 302, temporary redirect.
Now let's jump into a few specific use cases of "what kinds of content should and shouldn't I allow engines to crawl and index" in this next version...
[Rand moves at superhuman speed to erase the board and draw part two of this Whiteboard Friday. Seriously, we showed Roger how fast it was, and even he was impressed.]
Four crawling/indexing problems to solve
So we've got these four big problems that I want to talk about as they relate to crawling and indexing.
1. Content that isn't ready yet
The first one here is around, "If I have content whose quality I'm still trying to improve—it's not yet ready for primetime, it's not ready for Google, maybe I have a bunch of products and I only have the descriptions from the manufacturer and I need people to be able to access them, so I'm rewriting the content and creating unique value on those pages... they're just not ready yet—what should I do with those?"
My options around crawling and indexing? If I have a large quantity of those—maybe thousands, tens of thousands, hundreds of thousands—I would probably go the robots.txt route. I'd disallow those pages from being crawled, and then eventually as I get (folder by folder) those sets of URLs ready, I can then allow crawling and maybe even submit them to Google via an XML sitemap.
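A minimal robots.txt sketch for that scenario, assuming the unfinished products live under a made-up /product-drafts/ folder:

User-agent: *
Disallow: /product-drafts/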
If I'm talking about a small quantity—a few dozen, a few hundred pages—well, I'd probably just use the meta robots noindex, and then I'd pull that noindex off of those pages as they are made ready for Google's consumption. And then again, I would probably use the XML sitemap and start submitting those once they're ready.
2. Dealing with duplicate or thin content
What about, "Should I noindex, nofollow, or potentially disallow crawling on largely duplicate URLs or thin content?" I've got an example. Let's say I'm an ecommerce shop, I'm selling this nice Star Wars t-shirt which I think is kind of hilarious, so I've got starwarsshirt.html, and it links out to a larger version of an image, and that's an individual HTML page. It links out to different colors, which change the URL of the page, so I have a gray, blue, and black version. Well, these four pages are really all part of this same one, so I wouldn't recommend disallowing crawling on these, and I wouldn't recommend noindexing them. What I would do there is a rel canonical.
Remember, rel canonical is one of those things that can be precluded by disallowing. So, if I were to disallow these from being crawled, Google couldn't see the rel canonical back, so if someone linked to the blue version instead of the default version, now I potentially don't get link credit for that. So what I really want to do is use the rel canonical, allow the indexing, and allow it to be crawled. If you really feel like it, you could also put a meta "noindex, follow" on these pages, but I don't really think that's necessary, and again that might interfere with the rel canonical.
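So each color-variant URL would carry a canonical tag in its head pointing back at the default page—something like this, reusing the starwarsshirt.html example (the domain is just a placeholder):

<link rel="canonical" href="https://www.example.com/starwarsshirt.html" />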
3. Passing link equity without appearing in search results
Number three: "If I want to pass link equity (or at least crawling) through a set of pages without those pages actually appearing in search results—so maybe I have navigational stuff, ways that humans are going to navigate through my pages, but I don't need those appearing in search results—what should I use then?"
What I would say here is, you can use the meta robots to say "don't index the page, but do follow the links that are on that page." That's a pretty nice, handy use case for that.
Do NOT, however, disallow those in robots.txt—many, many folks make this mistake. If you disallow crawling on those pages, Google can't see the noindex, and they don't know that they should follow the links. Granted, as we talked about before, sometimes Google doesn't obey the robots.txt, but you can't rely on that behavior—assume the disallow in robots.txt will prevent them from crawling. So I would say the meta robots "noindex, follow" is the way to do this.
4. Search results-type pages
Finally, fourth, "What should I do with search results-type pages?" Google has said many times that they don't like your search results from your own internal engine appearing in their search results, and so this can be a tricky use case.
Sometimes a search result page—a page that lists many types of results that might come from a database of types of content that you've got on your site—could actually be a very good result for a searcher who is looking for a wide variety of content, or who wants to see what you have on offer. Yelp does this: When you say, "I'm looking for restaurants in Seattle, WA," they'll give you what is essentially a list of search results, and Google does want those to appear because that page provides a great result. But you should be doing what Yelp does there, and make the most common or popular individual sets of those search results into category-style pages. A page that provides real, unique value, that's not just a list of search results, that is more of a landing page than a search results page.
However, that being said, if you've got a long tail of these, or if you'd say, "hey, our internal search engine is really for internal visitors only—it's not useful to have those pages show up in search results, and we don't think we need to make the effort to turn them into category landing pages," then you can use the disallow in robots.txt to prevent those from being crawled.
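For instance, if your internal search results all live under a /search/ path or use a ?q= parameter—your own URL pattern may differ—the robots.txt rules might look like:

User-agent: *
Disallow: /search/
Disallow: /*?q=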
Just be cautious here, because I have sometimes seen an over-swinging of the pendulum toward blocking all types of search results, and sometimes that can actually hurt your SEO and your traffic. Sometimes those pages can be really useful to people. So check your analytics, and make sure those aren't valuable pages that should be served up and turned into landing pages. If you're sure, then go ahead and disallow all your search results-style pages. You'll see a lot of sites doing this in their robots.txt file.
That being said, I hope you have some great questions about crawling and indexing, controlling robots, blocking robots, allowing robots, and I'll try and tackle those in the comments below.
We'll look forward to seeing you again next week for another edition of Whiteboard Friday. Take care!
Great vid Rand, got a bit distracted by that hair though, is that a temporary redirect down the right side of your head?
Well, that's what you get when you tell your barber to "do something cool" in the back... :-)
Such an incredible video for all of us. You have given really great examples for robots.txt and the meta robots tag. Many newbies as well as experienced folks get confused between them, but after going through this awesome stuff, I don't think they would be in any doubt.
This is a very simple, attractive, and easy-to-understand video that helps us grow our website. I will bookmark this post and share it with my staff and friends to enhance their knowledge. Thanks Rand for helping us.
Thrilled to hear it! Let us know if you've got any specific issues or questions (or ask 'em here: https://moz.com/q)
I'm happy you brought up the Twitter robots.txt example - I was confused as to why they would do such a thing, and I'm even more surprised their SEO team (if your assumption is correct about www/non-www) would go this route. Great examples and reminders about which of these options are considered first and have the potential to block the others.
Yeah, not the smartest move on Twitter's part, IMO (although we don't know for sure exactly what their intentions might be, and some have speculated that Google actually asked them to do this as part of the partnership - I'm skeptical, but who knows!?).
This wouldn't force other search engines to try to buy that "firehose" connection like Google, would it?
Well, Google is reportedly looking to hire an SEO manager, so perhaps they don't know better themselves. :)
This is so important to get right. We had a major, albeit temporary, catastrophe on a client site where the web designer had been asked to do upgrades. They had rightfully applied the noindex,nofollow tag on the development URL but then did not change this back when they rolled out the upgrade.
The result was within 24hrs, pages started slipping out the index. Thankfully as we track stats daily we caught this before too much harm was done. Although we caught it within 24hrs, because Google had crawled a ton of the noindexes, it actually took 3-4 days for pages to stop being removed and then to get reindexed.
But it does go to show that the meta robots tag is honoured, and often swiftly, so be warned and only stop search bots if you're sure that's the right thing for your site.
Excellent example Martin - thanks for sharing. I think this makes a strong case for having some sort of crawl monitoring/alerting set up (either Moz Analytics or Onpage.org or the like). Manual runs of something like Screaming Frog can be really useful too.
Definitely. I've only just come across OnPage.org but crawl monitoring is definitely something worth looking into more. (Although until Google sort out their 'indexed URLs' bug I wouldn't put much store by WMT figures, so any external tool is a good idea!)
Yes, that can result in a catastrophic event, but if you are using the noindex tag or blocking pages through robots.txt in bulk, it generates a warning in Search Console (formerly GWMT). So it is better to keep an eye on Search Console on a regular basis.
Great that you guys also always return to the BASICS for our starting members, and also to refresh and get rid of wrong opinions. Thanks to Rand and the whole Moz Team, Michael from Austria
Hey Rand,
As always, you are the BOSS.
You have a gift, it's rare to see people present with such fluency.
1) Do you think it's possible to consider high crawling bandwidth as a negative SEO factor?
2) Could you please elaborate on the best SEO practices for tags & categories?
Hey Yaniv,
That is a great question, and I think I can answer your second one. If I understood Rand correctly, the best practice is to use [noindex, follow] meta tags.
This way we don't have to worry about duplicate content that can arise from this kind of navigation on the site, but Google will still follow the links, which is important for internal link profile optimization.
Super useful WBF Rand. I'd guess 1/3 of my clients have misused robots.txt and noindex,NOfollow vs. noindex,follow.
One quick note about 410 http responses: while it's supposed to indicate to the search engines that a page is gone forever, I have seen a case (6 months ago) where a client had a manual penalty based on pages that were returning 410 (they were created by site users and totally spammy), and in the manual penalty reinclusion request REJECTION, Google was citing those pages that had been returning 410 for several months. It was only when the client changed the DNS to remove the subdomain they were on entirely that the penalty was lifted. I'd have said this was a mistake in how Google's reviewers were handling this, personally. But...that's what I saw, so all be warned!
I think it's worth making the distinction that Webmaster Tools' removal function actually removes pages from the search results --- not Google's index. See John Mu's post here.
If you are dealing with things like duplicate content, it's probably not as effective as other options mentioned above. Thanks for the video!
Good point Ashley! There's a difference between removed from index and removed from search visibility. Thanks for clarifying that :-)
Rand, on a blog site should category pages be noindex, follow?
This post is something like a 'back to basics' but I admit that it's very necessary. Lately I'm seeing a proliferation of posts about this topic, and there's no doubt that the reason is its usefulness and necessity.
It is very important to differentiate between crawling and indexing. I think the problem exposed in the example of blogtest.html is more common than we think, because sometimes we tend to obsess over 'blocking' a page (name it as you like), and we forget that some 'orders' need a prior crawl to be effective.
Great post, Rand. One last thing: I love your T-shirt; very original.
Thanks Sergio! Agree that lots of issues come up due to the fine points between crawling vs. indexing.
First - T-Shirt is amazing...
Second - I found all these complications with the mechanisms controlling crawlers a little bit disturbing, because there are many ways to shoot yourself in the foot using them.
One recent example is a brand new site with approximately 100 pages. The designers put in canonical tags with one "small" issue: all of the canonical tags pointed to the homepage. They just wanted to be sure there wasn't duplicate content. And I saw this 3 months after the site went up.
Sometimes even small changes can break things. WordPress has one setting about crawlability, and SEO plugins (AIO or Yoast) add a few more. One accidental click can remove the entire site from crawling.
Yup - we see this a lot, too. The crawl and indexing options can be powerful, but they also can seriously mess things up if not done properly. On the plus side, that's pretty much a guarantee that there'll always be demand and need for talented SEOs :-)
Yet again the best edition... I was away for a couple of days and couldn't catch up earlier. I was having the same indexing issue: we have a few sites with the noindex tag, but those sites have been indexed by both search engines. Now I have an answer as to why they were indexed.
Thanks Rand for making it clear.
Hello Rand,
Good refresher on some basic concepts as well as technical points about robots. One thing I wish to include: if you disallowed a page in the robots.txt file and expect that it will never appear in Google, then you are a little bit misguided. If Google thinks that your disallowed page is relevant and could help a searcher based on the search query, then it can still appear in search results, with Google taking references from open directory sites like DMOZ.
So, if you really want to keep your page out of the search engine then you have to use NOINDEX, and if you want to prevent search engines from showing their own description taken from a directory, use this code - <meta name="robots" content="noodp" />
Hope it helps to people.
Yup! That's exactly what I noted with my visual in the first part of the video.
Re: the description from the directory - we've seen that work sometimes and not so well other times. Google can pull descriptions from anchor text and other places it seems, and there's no way to stop them from doing that, sadly.
Yes, because Google believes in delivering relevant results only, so it would be either by us or Google itself. Anyway, thanks for the reply :)
Hey Wizard,
Correct, you have to use 'noindex' to completely remove it from the results too. However, I didn't understand your point about "let them crawl it".
As per my experience with robots files, most of the time if a page is disallowed in robots.txt and doesn't have a meta description, Google fetches important or actionable text from that page and shows it within the search results.
For example,
Keyword "open site explorer" shows https://moz.com/researchtools/ose/ (Blocked in Robots.txt) on first rank with the actionable text "It looks like you have JavaScript disabled. JavaScript is required to use Open Site Explorer" at description.
If I put that URL directly into the search bar, it shows me Moz's footer page text "Moz doesn't provide consulting, but here's a list of recommended companies who do!" in the description. The question is how Google shows such results for disallowed URLs.
Hi Vishal - here's a good example from Moz: https://www.google.com/search?q=site%3Amoz.com%2Fa...
You can see that Google's showing a disallowed page from Moz, but since they can't crawl it, they say: "A description for this result is not available because of this site's robots.txt – learn more."
Hello Rand,
I had that WTF moment when I was a beginner, and it took some time for me to overcome it through experiments. Google was recently right when it said that SEOs need to get their hands dirty in order to learn better. This is a really great WBF for people who are struggling with such on-page techniques. We had a set of landing pages hosted on a client's sub-domain, and they were blocked in the robots file. However, they were ranking well in search results for the targeted keyword when searches were done. Those pages were not intended for organic and were landing pages for AdWords. Later we implemented meta noindex and let them be crawled, which helped us move them out of the index. I guess many SEOs will learn a lot from today's WBF.
The best of article..thanks @Rand Fishkin.
I found this article quite helpful. Frankly, I'm a small business owner and knew little about the relationship between search engines and robots tags. I'm seeking new SEO approaches to better serve my business venture. Nevertheless, I found Rand's explanation of when to disallow search engines in the robots.txt file, as well as when to employ meta robots tags in a page header, very helpful. The tutorial on nofollowing links proved helpful as well. I especially enjoyed the content on meta robots and how they live in the headers of individual pages, whereby you can control a single page with a meta robots tag. I understand now how this communicates to the given search engine, whether it be Google or Bing, whether they should keep a page in the index and whether they should follow the links on that page. I'm curious how some of these approaches can be applied to small business sites in terms of enhancing inbound marketing efforts to increase sales? I would love to hear any ideas related to this. Thanks again!
Great article. For search results, I think the use of query string parameters allows great flexibility. Bookmarkable, yet still configurable using Webmaster Tools.
Hi Rand
Great post. Just about to become a full-time Moz customer :-) We just launched a .com version of our site (our original is a .co.za) and didn't rel canonical all the .com pages, which were exact duplicates. Now, after seeing a traffic drop in our .co.za results, we have rel canonicaled all the duplicate .com pages. Is this all we would need to do? Do we just wait now for Google to re-index, and can we expect our rankings to return to normal once this occurs?
Thanks!
Mike
Nice post Rand!
I have one question: I added a robots.txt file to my sub-domain, e.g. "subdomain.mydomain.com/robots.txt", and I disallowed the sub-domain through robots.txt. But still, it's showing in Google search with "A description for this result is not available because of this site's robots.txt – learn more." Why?
It is happening because you have blocked the content through robots.txt which means the URL is indexed but the content of the page is blocked from the bot.
Thanks for the reply Salman!
I have added the following code "User-agent: * Disallow: /" at the following root: "subdomain.mydomain.com/robots.txt". Why has Google indexed this URL, and what should I do to stop the sub-domain URL from being indexed? I don't want my sub-domain indexed in Google search any more.
I have to disagree with the advice given on this part: "If I have content whose quality I'm still improving, that isn't ready for Google, what should I do?" Noindex on a few hundred articles if they are not ready...
I know I won't win many fans when I say this, but to be honest, if your pages are not ready for Google then it's doubtful they are ready for your visitors. If the page is thin or duplicate then you're likely not giving your users the experience you want them to have, and you should put those pages completely on hold.
Furthermore, it's a form of shaping your reputation with Google. I understand the reasons, but at the end of the day, if you follow the principle that whatever is good for your users is good for Google too, then the only time you should ever consider noindex is on sensitive pages.
It's also a very bad idea to noindex "thin" pages. I often see people using "noindex, follow" thinking they will get the juice from any links made to the page, which they won't... Even tag pages should be indexed; this is why we now have canonical links.
My advice is to only ever use noindex for content that was never intended for Google. Using noindex on user-facing pages reminds me of the days people used "nofollow" on internal hrefs in an attempt to manipulate PageRank flow.
Hi Rand, great post. My somewhat late addition:
I think the X-Robots-Tag HTTP header is a rather useful alternative to the meta robots tag:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
While providing similar possibilities, it enables you to address non-HTML Content like PDFs, Images as well. It’s also easy to apply for folders, certain URL patterns or whole domains, if you apply it via webserver configuration. Quite useful if you want to exclude a development URL in an effective and efficient way for example.
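A minimal sketch of how that could be applied via Apache config or .htaccess (assuming mod_headers is enabled, with PDFs as the example file type):

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, follow"
</FilesMatch>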
Regards, Tobias
The section on search results is exactly the issue I'm trying to tackle. I've got huge amounts of crawl budget being spent on internal search results pages, but only a few variations of those searches (though it amounts to thousands of pages) are truly valuable.
Thankfully I have a parameter that I can use to identify which ones I truly think are useful or not to external searchers.
examples:
If I understood this correctly, the best bet would be to nofollow links to anything that falls into #4, but also include noindex, follow in the <head> of these pages. For the options 1-3, I should not nofollow links, and I should not include the noindex, follow. I should also not use the robots.txt to block any of these pages (1-4).
This still doesn't solve any problems with crawl budgeting, though, so how can I attempt to accomplish this? There are tons of links out to these pages (almost our entire site is search-controlled), and I'd rather not waste where nothing could be gained.
Finally, any thoughts on the URL parameter controls in GWMT? If interface=ImSuperUseful&param2=xyz also creates a useless page, does that parameter tool in GWMT do anything to help this? Has anybody run any tests on its utility or effectiveness?
Thanks to anyone who can comment
-MB
Great explanation of what is going on there. My simple tools have been SEO plugins (just check the box!) and the Redirection plugin... but I see there is a little more strategy I can now employ when encountering certain situations.
Hey Rand,
Just a little confused about "nofollow" tags. Well, we were exposing all the search page URLs (which are infinite pages in our case) from our site; that means we were exposing URLs with the same content and hence were open to a duplicate content penalty. For instance, a page with search query "mobile" might have absolutely the same content as a search page with search query "mobiles", and we were dynamically linking both of these pages.
So, to avoid this, I have put a "nofollow" on all the places from which these search pages were linked internally (we can't noindex them for some stupid reasons, and even changing the inter-linking logic is a little tricky as of now). Does that tell bots that I don't trust my own site, since I'm nofollowing my own URLs?
What would be the best way to handle this particular issue? Have a look at this site: https://www.askme.com/delhi/search/pizza-hut, and check all the links having "/search/" in them.
If you have 3 pages labeled A, B and C...
A uses "noindex, nofollow" which typically means the Googlebot won't visit B and discover C...
However, this goes on the assumption that Google will respect the nofollow; while this is generally true, it can sometimes ignore the tag completely. The safest way to avoid content being indexed which was never intended for search engines in the first place is to use a noindex on those pages, followed by a rule in the robots.txt as a backup in case something goes wrong with the header responses.
Very nicely described..
Thanks
Hi,
Very nice article. I already knew the use of robots.txt, but didn't know it this deeply; it will help me in working on my site.
Hi Rand,
Great video, thank you for that.
I have a question: I have a customer and his website is like a marketplace. Sometimes different sellers list the same product, so there might be the same products with different SKUs. On the other hand, because every SKU has a different URL, this creates duplicate content. How can I solve this problem? Does canonical solve this problem?
Thank you for your help
Emre Tonguç
This is a great post and answers many questions about how to get rid of a page for a client "the right way" when I run into duplicate content issues, or irrelevant pages that need to be taken care of.
The power of the robots.txt file is something that I think is misused at times, but the meta robots tag really gives another tool for specific pages.
Plus, the new tools in WMT are a great tip for anyone who hasn't been using them yet.
Either way, always a great WBF and also awesome information on how we can all be better SEOs.
Hey Rand,
This edition of WBF certainly brushed up the basics. Thanks for helping me with this insight.
Cheeeers!!
About: 2. Dealing with duplicate or thin content
The color filter does not create duplicate content. Therefore the canonical-tag is wrong here (like using it on paginated pages). Google is recommending "noindex, follow".
Yeah, but their recommendation is wrong. If someone links to the "gray" version of the tshirt page and I want that link to count to my original, rel=canonical is the way to go. That said, if folks are searching for the gray version of the shirt separately from the others, then I want to let that page actually get indexed!
I was wrong and thinking of category filters. For filtered product pages of course, canonical is ok :)
Hi Rand,
Thanks for the video. Interesting distinction between the robots.txt and the robots meta tag.
Question on internal search results - A site currently has search results indexed which I don't believe is the best as far as crawl budget etc is concerned. Is the right course of action to disallow the search results within robots.txt and then use the URL removal tool to remove the search results?
Thanks
Gareth
Great article! I have a question about nofollow: does it still matter in terms of sculpting the flow of PR?
Sort of, but it's such a teeny tiny ranking factor that it's mostly useless (with the exception of a few rare edge cases).
Wow! This is really good information. A Spanish SEO manager like me has a problem with the language, but I can understand it thanks to this graphic video.
Always good to review the basics and make sure you haven't missed out on an essential building block.
One suggestion I do have for the search pages is a more pragmatic approach:
You noindex your search results by default,
Record what searches are being performed on your site,
Once a search reaches a certain amount (I will let you decide what is enough traffic to be valuable) you craft a page that serves those results and allow it to be indexed.
This benefits your users (so long as you keep serving them the same content!!!) as they want this information and you are making it easier for them to get straight to it, it also benefits you as this page that has some value is now more visible.
Hi Richard,
As I'm currently having a very very similar issue, I'd just like to hear a second(in this case your) opinion about internal search.
I have a search term that is looked up by visitors quite often, resulting in the typical /catalogsearch/result/?q=term page.
My question now is: if there already exists a more SEO-friendly landing page focusing on this term, should I 301 redirect this specific search /catalogsearch/result/?q=term to the /category/subcategory/ page, or does that cause any issues?
Thanks a lot in advance
You can either use a 301 or you can rel=canonical if you think some visitors who use the search function would prefer to get the search-results style page.
You already have an answer from Rand (lucky you) but just to confirm I would 301/canonical to the preferred page, it just makes sense to focus your authority onto the one page.
Yes! Great suggestion Richard. Love that methodology for finding which search queries to make into landing pages.
Google does say that they don't like your search result pages in their search results, but whenever I type a slightly tricky local search query, they show a ton of such search result pages in the ads that appear on SERPs. So it seems AdWords ads are not spoiling search quality, but when you want to do it in organic, you are spoiling it.
Yes, I agree with you BrijB. Sometimes it looks like Google wants to push more and more website owners toward PPC.
Hi Rand, great video (as always!). I still am on cloud 9 from being able to touch the holy whiteboard earlier in the week =D.
You mentioned crawl bandwidth, do you know if anyone has done any research or tests on the limits of how many pages Google is able to crawl per day?
Also do you get any penalties towards the number of pages crawled if you have error(s)?
Cheers, Ash
I agree with you Rand; if a search result page provides valuable content to a user who couldn't easily find that content any other way, we need to let Google (Bing and others) index and rank those search result pages...
Hi Rand
We have built several "un-indexable" websites for a business that wants to have an exclusive offer for fidelity card clients only, arriving through a link in a private area of a high-traffic website. Having spent most of my time building search-friendly websites optimized for maximum visibility, the first time I heard this request I wanted to cry.
Anyway, we have used the robots meta with noindex, and absolutely no links on the web, and this has worked perfectly so far - 0 hits through search in over 1 year. It also definitely helped that we built the site on a brand new domain.
You made an interesting caveat about the conflict between robots.txt and robots meta - the txt stopping robots from actually reading the robots meta instructions - I hadn't looked at it that way.
Thanks
The "duplicate/thin content" bit is spot-on. As usual, great post. Fishkin for Prez.
Thanks for the video. So I was already pretty clear on a lot of this.
But I never did quite get rel=canonical when it came out. It sounds like it acts essentially as a 301, but without the actual redirect.
Is that accurate, or am I oversimplifying things?
Yeah, that's pretty much it. It's not as perfectly respected/followed as a 301, but close. More here: https://moz.com/learn/seo/canonicalization
Perfect. Thanks Rand!
What should the step-by-step process be if you are migrating from separate desktop and mobile URLs to responsive? Which should be used to make sure the mobile URLs no longer index in search results: 301 redirects, noindex/nofollow, and/or robots.txt to block mobile? If any of them should be used, when should they be added? Should anything get added before the full responsive migration? Should they be added when the site launches?
You'd want to do pretty much what's done when you redirect one site to another, i.e. redirect each individual m.yourdomain.com page to the right www.yourdomain.com page (rewrite rules can be very helpful here). I wouldn't block crawling to the m-dot URLs or you'll prevent Google from seeing the redirects!
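A rough sketch of such a rewrite rule for Apache, assuming the m-dot and www paths match one-to-one (adjust the mapping if they don't):

RewriteEngine On
RewriteCond %{HTTP_HOST} ^m\.yourdomain\.com$ [NC]
RewriteRule ^(.*)$ https://www.yourdomain.com/$1 [R=301,L]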
Hi Rand, very informative video. Which WordPress tags should be indexed or not if every tag contains more than 5 articles?
Hey Rand!
First off, this is great for people like me who don't know much about robots.txt—thanks.
I had a quick question I wanted your opinion on. I submitted a sitemap of about 4,800 URLs through Search Console, but it's only indexing between 94-120. I know today (7/14) there was a bug, but this has been going on for months, and we aren't using any robots or noindex directives that would block them. Any advice?
Again, thanks for the videos; I look forward to them every Friday.
As usual a great WBF and a cool hair style :)
What's your take on improving the indexation of sub-domains? I came across a site which has a sub-domain version, and it's been months since its launch but only a single page has been indexed. Sitemap, robots.txt—everything seems alright. Can we control it in any other way?
Thanks,
If you're having indexation problems it's usually one of four things:
Thanks for your input. It seems 2nd and 3rd points are causing this.
By the way, is there any specific name of your new hair cut? :)
Umar
Hi Rand,
just got a quick question about your point at around 2:40, where you say that just because a page isn't crawled doesn't necessarily mean it won't be in the search results.
Am I getting this right:
If you disallow the page/folder right from the beginning, it shouldn't get indexed (assuming that Google respects your robots.txt settings)
If you disallow the page/folder after some time and it has already been crawled and indexed, then the disallow setting somehow "comes late" and the page could appear in the search results even though you set robots.txt to disallow
Any reply would be much appreciated
Thanks
Sorry about my lack of clarity - it's not that the disallowed page gets indexed, but it can get into search results. Google will show something like a "we can't show a description for this result because of robots.txt" type of message for the description. They don't actually crawl and index the page, but they do index the URL and create a record of it which can appear in search results.
Very useful video again. I also think the X-Robots-Tag HTTP header is quite a useful alternative to the meta robots tag, if I may say so. Once again, I'm quite impressed with the vids, hehe.
Each of these are issues for my clients. I appreciate the validation. Now I need to make a variety of cms tools play nice. Maybe the developers know how to read and will fix their crummy programming after I forward this article!
Rand! Great refresher for me here, enjoyed it as with all WBF's :)
I've got a question following the .css and .js warning that Google Search Console is distributing at present. I'm blocking our /ajax/ files (which load in boring dropdown lists dynamically) using our robots.txt file. This is causing Google to only partially render the page upon a 'Fetch', so I'm concerned I may have to revert this decision and allow crawling...
My question is then, can you rel="noindex" ajax files, .js, .css files etc. just as with 'normal' pages?
Having a mini panic about this as we're doing this to assist UX as these items have no value in the SERPs! :) Thanks!
Hi Daniel, while those files "have no value in the SERPs" they do impact how all of your other pages are crawled and shown in the SERPs. Do not worry about noindexing those files. Let Googlebot crawl them, see them, love them. They are seen as important and essential files to Google. Google Webmaster guidelines explicitly state:
https://support.google.com/webmasters/answer/35769...
If you do block those files, it could have a negative impact on your site's indexation and ranking. I think you are overthinking things. I would not panic because you do not have CSS and JS blocked; I would panic if you did have CSS and JS blocked!
Cheers
Thanks for the quick reply.
I'll remove the /ajax/ line from my robots.txt and allow Google to take them into the index. Shame I can't allow crawling of these assets and still have them noindexed... the files literally are just meaningless lists on their own! :)
Hopefully Google will realise these have no user value and drop the files themselves out of the index after a while.
Certainly not worried about allowing the .js and .css to be crawled - but just wondering if there was a method of noindexing without preventing crawling.
Thanks again for the response - if anyone else has any tips on this I'd be grateful to hear them! Thanks!
If they are meaningless lists, they will probably not rank so I think you are good. Are you concerned that they will outrank the main page?
Something else I just thought of: if those ajax files contain content that is canonical to the page, you can use a canonical HTTP header
https://moz.com/blog/how-to-advanced-relcanonical-...
to show that those are parts of the main page. Not sure if this applies to your setup, but it's another option.
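For reference, the header itself looks something like this (made-up URL):

Link: <https://www.example.com/main-page>; rel="canonical"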
Hi CleverPhD - No, not concerned they will outrank anything. We have around 12,000 pages of real content pages which should always outperform. It was just a case of making that decision for Google and telling it that these were unimportant with regards to indexation. Just looking to make its time on site more efficient! :)
Thanks for your help - much appreciated. This morning we've removed that line from the robots.txt file and will let the big G make its own mind up about these resources! :)
Hey Rand. I have a question about duplicate content for e-commerce websites.
I have different colours for the same product. However, I don't have a "default" page with no colour mentioned to use as the canonical.
In that case, can I say that the pink colour, for instance, is my main page and point the canonical from the other colours to it?
Thanks a lot
Hey Randy, how are you?
Amazing content, by the way. I sometimes draw on your posts to build my answers to the developer tech lead when I have to prove to them why I am requesting changes and updates on the website =) =) ... Good job!
So, I was looking throughout the internet for any info that could confirm the part where you said:
"Google has said many times that they don't like your search results from your own internal engine appearing in their search results, and so this can be a tricky use case."
But I wasn't lucky enough to find any article or content about it. Could you please point me to where Google said that? Any hint will add more XP to my knowledge and my colleagues' here ;)
We have tons of search result pages cached on Google and I definitely would like to give them a solution.
Cheers,
Hi Jonatas,
I found this remark
Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.
in the classic webmaster guidelines and a pretty old blog article by Matt Cutts.
Cheers, Tobias
So if I create a page such as "pumping unit rentals" specifically to rank for that keyword, yet Google decides it wants to use a different page that does not have as much quality content, how can I get Google to switch?
The trouble with noindexing pages is that you can send the crawler into a black hole. If you have really bad pages on your site, either improve them or delete them. What is left is what the crawler should see and what the visitor should see too.
Hi,
If I just disallow the robots from accessing the entire folder instead of the URL, will the page show up in the search results?
The URL can still show up in results. The only way to keep it out entirely is to let Google crawl the page and use the meta robots noindex protocol.
Thanks :)
Hi Rand,
Thanks for this fantastic refresher!
I'm working with a real estate directory website (similar to Zillow and Trulia). In the past they experienced warnings in Google Webmaster Tools due to Googlebot encountering an extremely large number of links. This was occurring due to the almost-endless number of internal search results pages (many of which contain facets) that Google was discovering and attempting to crawl. The solution at the time was to block ALL search results pages in robots.txt and create a separate SEO-friendly directory of property type/location pages which was internally linked to from the footer and within the XML sitemap.
My feeling is that this is not the best solution to this problem and I'd like to propose removing that robots.txt disallow and completely change the way their URLs are structured so that useful pages are contained in subfolders (like property type, location, buy/rent/sold) and all non-search essential parameters (e.g. number of bathrooms, car spaces... etc.) are built as parameters. Rather than blocking these pages to Google, all pages would simply contain canonical tags which only contain the subfolder URLs (i.e. all parameters are stripped from the canonical URL). This solution should enable Google to crawl and honour any links at faceted pages while only prioritising and indexing valuable directory results. I'm hoping it will also make Google's crawl of the site more efficient, saving on bandwidth. The only real drawback is that I would have to setup 301s for all old URLs.
Is this the best solution for this type of website?
Hi Rand,
Great article/video; it's also a good refresher, because it's really important to know how to control search engine crawlers.
However, I have a question on your second example (ecommerce tee shirt and different colors).
Why do you say that using noindex on canonicalized pages might interfere with the rel canonical?
If we don't use the noindex meta tag, it's possible that Google indexes these pages (and also the default version) if they have usage and backlinks.
Thanks for the time you will spend to answer me.
Hi Rand,
Thanks for this post. Duplicate content has been an ongoing issue for us. Our organic rankings are doing pretty good, but we could potentially do better by eliminating our dup content issues.
I'm wondering what you would recommend for the following.
The Moz crawl says that https://www.incipio.com/cases/tablet-cases.html is a duplicate of https://www.incipio.com/cases/tablet-cases/microsof... . There is a rel canonical in place on the https://www.incipio.com/cases/tablet-cases.html page. The canonical is the following: <link rel="canonical" href="https://www.incipio.com/cases/tablet-cases.html" />
Why is the crawl still saying there is a duplicate with the canonical?
Thanks for your help!
Hello Nicole,
If you don't find the answer you are looking for here, you can always ask a question like this to our community in the Moz Q&A Forum! https://moz.com/community/q
Hi Danielle,
I've tried that. I was hoping someone here could answer.
Thanks!
Very useful, thanks Rand! I was wondering about the use of "Crawl-delay" in the robots.txt - under which circumstances would you want to use that?
This video is a total refresher! Thank you, Rand! You are always great and provide useful information :)
Most of all I like point 4 about the internal search engines - it always depends on the website, its direction, business model, and the audience behavior, I think, but in most cases it is really confusing and not a good idea for the search results to come up in Google.
Dido Grigorov
This article is so nice; it gives so much knowledge about the robots.txt file.
Hi Fishkin...
Very nice video and post. I already knew about robots.txt and the robots meta tag, but didn't know these deeper facts. Very interesting for my software company and SEO company; we will definitely use this on client websites.
Hi Rand
Great video and something that I am currently doing a bit of work around myself. Loved the part about product pages and the multiple variants that often come along with them—size, colour, images, etc. I so agree that the best solution here is to use the rel=canonical tag to point to the original source of that information. Although, how about this as another idea: applying rules in the code so that when another colour is selected, the rule rewrites the meta title and meta description by adding the colour to those attributes. So for instance, let's say we are selling Nike (which by the way we don't :-)
www.examplesite.com/nike/test-trainer
Meta title - Nike Test Trainer - example site clothing store
Meta Description - Shop the Nike test Trainers from official stockists example clothing store | Free deliveries on orders over £40.
Then apply a rewrite rule code-side so that when the user, for instance, selects red for that item:
www.examplesite.com/nike/test-trainer/red
Meta Title - Nike Test Trainer in Red - Example Site Clothing Store
Meta Description - Shop the Nike Test Trainers in Red from Official Stockists Example Clothing Store | Free Deliveries on Orders Over £40
and so on and so forth.
The new page would have the rel=canonical tag on this pointing to the original page.
So the question is 2 fold.
1) Is this generating a new page to be indexed, and would it help with long-tail queries?
2) Would the rel=canonical need removing from the new pages for the above to happen, and would it then create duplication issues across the site?
I would love to hear some feedback on this and if anyone has tested the above and what sort of results you had either way.
If you don't find the answer you are looking for here, another great place to ask a question like this is our community Q&A Forum! https://moz.com/community/q
Hello there!
I know I am late on this, but I had a quick comment on #3, passing link equity without appearing in search results. What is missing here is the use of the rel=next and rel=prev tags on those paginated pages. Google recommends using rel next/prev so that they can see the relationship between the series of pages. You can then add the noindex,follow onto the paginated pages (with the exception of page 1, assuming it is a useful landing page with information) so that pages 2-n do not get indexed (or get removed).
I think of the rel=next prev as the opposite of disallowing those pages in robots. Rel next prev helps Google navigate the paginated pages so that it can find what it needs (i.e. the pages that are linked to in the pagination) before the meta robots prevents the paginated pages from being indexed. (I think I made sense there).
Technically, Google would prefer for you to not use meta robots with rel next prev, as they would like to decide what page in the series is most important for the search results, but we use the combo so that we have better control and it works pretty well.
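For reference, on page 2 of a series the head might contain something like this (example URLs):

<link rel="prev" href="https://www.example.com/category?page=1" />
<link rel="next" href="https://www.example.com/category?page=3" />
<meta name="robots" content="noindex, follow" />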
Cheers!
Hey Rand - great post. Most of the post was about how to not have some things crawled. What about the opposite?
What are your thoughts about throwing your sitemap into the robots file? Does that help make the site more crawlable? And what about all those <changefreq> and <priority> values? I've always kind of put my thumb in the air on that one.
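For reference, what I mean is the one-line declaration in robots.txt plus the optional tags inside each sitemap entry (hypothetical URLs):

Sitemap: https://www.example.com/sitemap.xml

and, inside the sitemap itself:

<url>
  <loc>https://www.example.com/some-page/</loc>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
</url>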
Such a fantastic feature for every one of us. Many newbies as well as experienced folks get confused between them, but after going through this great stuff, I don't think they would be in any doubt. You have given truly great examples for robots.txt and the meta robots tag.
Thank you
Awesome post:-) I especially like the part about ---> So then we have our SEO folks go, "you know what, let's make doubly sure that doesn't show up in search results; we'll put in the meta robots tag:" <---