One of the things that excites me most about the development of the web is the growth in learning resources. When I went to college in 1998, it was exciting enough to be able to search journals, get access to thousands of dollars' worth of textbooks, and download open source software. These days, technologies like Khan Academy, iTunesU, Treehouse and Codecademy take that to another level.
I've been particularly excited by the possibilities for interactive learning we see coming out of places like Codecademy. It's obviously most suited to learning things that look like programming languages - where computers are naturally good at interpreting the "answer" - which got me thinking about what bits of online marketing look like that.
The kinds of things that computers are designed to interpret in our marketing world are:
- Search queries - particularly those that, like [site:distilled.net -inurl:www], look more like programming constructs than natural language queries
- The on-site part of setting up analytics - setting custom variables and events, adding virtual pageviews, modifying e-commerce tracking, and the like
- Robots.txt syntax and rules
- HTML constructs like links, meta page information, alt attributes, etc.
- Skills like Excel formulae that many of us find a critical part of our day-to-day job
I've been gradually building out Codecademy-style interactive learning environments for all of these things for DistilledU, our online training platform, but most of them are only available to paying members. I thought it would make a nice start to 2013 to pull one of these modules out from behind the paywall and give it away to the SEOmoz community. I picked the robots.txt one because our in-app feedback shows it's one of the modules people have learned the most from.
Also, despite years of experience, I discovered some things I didn't know as I wrote this module (particularly about precedence of different rules and the interaction of wildcards with explicit rules). I'm hoping that it'll be useful to many of you as well - beginners and experts alike.
Interactive guide to Robots.txt
Robots.txt is a plain-text file found in the root of a domain (e.g. www.example.com/robots.txt). It is a widely-acknowledged standard and allows webmasters to control all kinds of automated consumption of their site, not just by search engines.
In addition to reading about the protocol, robots.txt is one of the more accessible areas of SEO since you can access any site's robots.txt. Once you have completed this module, you will find value in making sure you understand the robots.txt files of some large sites (for example Google and Amazon).
For each of the following sections, modify the text in the textareas and see them go green when you get the right answer.
Basic Exclusion
The most common use-case for robots.txt is to block robots from accessing specific pages. The simplest version applies the rule to all robots with a line saying User-agent: *. Subsequent lines contain specific exclusions that work cumulatively, so the code below blocks robots from accessing /secret.html.
Add another rule to block access to /secret2.html in addition to /secret.html.
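For illustration (a sketch - the exact snippet in the interactive box may differ slightly), a file with both exclusions looks like this:
User-agent: *
Disallow: /secret.html
# a second Disallow line adds a further exclusion; rules apply cumulatively
Disallow: /secret2.html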
Exclude Directories
If you end an exclusion directive with a trailing slash ("/") such as Disallow: /private/ then everything within the directory is blocked.
Modify the exclusion rule below to block the folder called secret instead of the page secret.html.
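For illustration, the directory version of the rule looks like this (sketch only):
User-agent: *
# the trailing slash blocks everything inside the folder
Disallow: /secret/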
Allow Specific Paths
In addition to disallowing specific paths, the robots.txt syntax allows for allowing specific paths. Note that allowing robot access is the default state, so if there are no rules in a file, all paths are allowed.
The primary use for the Allow: directive is to over-ride more general Disallow: directives. The precedence rule states that "the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule. The order of precedence for rules with wildcards is undefined."
We will demonstrate this by modifying the exclusion of the /secret/ folder below with an Allow: rule allowing /secret/not-secret.html. Since this rule is longer, it will take precedence.
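A sketch of the resulting file (illustrative; the wording of the exercise snippet may differ):
User-agent: *
Disallow: /secret/
# the longer (more specific) rule takes precedence for this one page
Allow: /secret/not-secret.html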
Restrict to Specific User Agents
All the directives we have worked with have applied equally to all robots. This is specified by the User-agent: * that begins our commands. By replacing the *, however, we can design rules that only apply to specific named robots.
Replace the * with googlebot in the example below to create a rule that applies only to Google's robot.
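Something like this, for example (the Disallow path here is just a placeholder for whatever the exercise contains):
# this block now applies only to Google's main crawler; other robots ignore it
User-agent: googlebot
Disallow: /secret/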
Add Multiple Blocks
It is possible to have multiple blocks of commands targeting different sets of robots. The robots.txt example below will allow googlebot to access all files except those in the /secret/ directory and will block all other robots from the whole site. Note that because there is a set of directives aimed explicitly at googlebot, googlebot will entirely ignore the directives aimed at all robots. This means you can't build up your exclusions from a base of common exclusions. If you want to target named robots, each block must specify all its own rules.
Add a second block of directives targeting all robots (User-agent: *) that blocks the whole site (Disallow: /). This will create a robots.txt file that blocks the whole site from all robots except googlebot which can crawl any page except those in the /secret/ folder.
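A sketch of the completed file:
User-agent: googlebot
Disallow: /secret/

# every other robot is blocked from the whole site
User-agent: *
Disallow: /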
Use More Specific User Agents
There are occasions when you wish to control the behavior of specific crawlers such as Google's Images crawler differently from the main googlebot. In order to enable this in robots.txt, these crawlers will choose to listen to the most specific user-agent string that applies to them. So, for example, if there is a block of instructions for googlebot and one for googlebot-images then the images crawler will obey the latter set of directives. If there is no specific set of instructions for googlebot-images (or any of the other specialist googlebots) they will obey the regular googlebot directives.
Note that a crawler will only ever obey one set of directives - there is no concept of cumulatively applying directives across groups.
Given the following robots.txt, googlebot-images will obey the googlebot directives (in other words, it will not crawl the /secret/ folder). Modify this so that the instructions for googlebot (and googlebot-news etc.) remain the same but googlebot-images has a specific set of directives meaning that it will not crawl the /secret/ folder or the /copyright/ folder:
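A sketch of a file that satisfies this (illustrative):
User-agent: googlebot
Disallow: /secret/

# googlebot-images obeys only its own block, so /secret/ must be repeated here
User-agent: googlebot-images
Disallow: /secret/
Disallow: /copyright/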
Basic Wildcards
Trailing wildcards (designated with *) are ignored, so Disallow: /private* is the same as Disallow: /private. Wildcards are useful, however, for matching multiple kinds of pages at once. The star character (*) matches 0 or more instances of any valid character (including /, ?, etc.).
For example, Disallow: news*.html blocks:
- news.html
- news1.html
- news1234.html
- newsy.html
- news1234.html?id=1
But does not block:
- newshtml (note the lack of a ".")
- News.html (matches are case-sensitive)
- /directory/news.html
Modify the following pattern to block only pages ending .html in the blog directory instead of the whole blog directory:
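A sketch of the kind of answer being asked for (the $ "ends with" anchor is covered in a later section, so a plain wildcard is enough here):
User-agent: *
# matches any path under /blog/ that contains ".html"
Disallow: /blog/*.html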
Block Certain Parameters
One common use-case of wildcards is to block certain parameters. For example, one way of handling faceted navigation is to block combinations of 4 or more facets. One way to do this is to have your system add a parameter to all combinations of 4+ facets such as ?crawl=no. This would mean for example that the URL for 3 facets might be /facet1/facet2/facet3/ but that when a fourth is added, this becomes /facet1/facet2/facet3/facet4/?crawl=no.
The robots rule that blocks this should look for *crawl=no (not *?crawl=no, because a URL with a query string like ?sort=asc&crawl=no would slip through the latter).
Add a Disallow: rule to the robots.txt below to prevent any pages that contain crawl=no being crawled.
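For example (illustrative sketch):
User-agent: *
# matches crawl=no whether it appears as ?crawl=no or &crawl=no
Disallow: *crawl=no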
Match Whole Filenames
As we saw with folder exclusions (where a pattern like /private/ would match paths of files contained within that folder such as /private/privatefile.html), by default the patterns we specify in robots.txt are happy to match only a portion of the filename and allow anything to come afterwards even without explicit wildcards.
There are times when we want to be able to enforce a pattern matching an entire filename (with or without wildcards). For example, the following robots.txt looks like it prevents jpg files from being crawled but in fact would also prevent a file named explanation-of-.jpg.html from being crawled because that also matches the pattern.
If you want a pattern to match only at the end of the filename, you should end it with a $ sign, which signifies "end of line". For example, modifying an exclusion from Disallow: /private.html to Disallow: /private.html$ would stop the pattern matching /private.html?sort=asc and hence allow that page to be crawled.
Modify the pattern below to exclude actual .jpg files (i.e. those that end with .jpg).
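Something like this (illustrative):
User-agent: *
# the $ anchors the pattern to the end of the URL, so only paths that
# actually end in .jpg are blocked
Disallow: /*.jpg$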
Add an XML Sitemap
The last line in many robots.txt files is a directive specifying the location of the site's XML sitemap. There are many good reasons for including a sitemap for your site and also for listing it in your robots.txt file. You can read more about XML sitemaps here.
You specify your sitemap's location using a directive of the form Sitemap: <path>.
Add a sitemap directive to the following robots.txt for a sitemap called my-sitemap.xml that can be found at https://www.distilled.net/my-sitemap.xml.
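A sketch of the finished file (the Disallow line simply stands in for whatever rules the exercise already contains):
User-agent: *
Disallow: /private/
# sitemap locations should be given as full URLs
Sitemap: https://www.distilled.net/my-sitemap.xml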
Add a Video Sitemap
In fact, you can add multiple XML sitemaps (each on its own line) using this syntax. Go ahead and modify the robots.txt below to also include a video sitemap called my-video-sitemap.xml that can be found at https://www.distilled.net/my-video-sitemap.xml.
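Building on the previous sketch, the result would look something like:
User-agent: *
Disallow: /private/
Sitemap: https://www.distilled.net/my-sitemap.xml
# each additional sitemap goes on its own line
Sitemap: https://www.distilled.net/my-video-sitemap.xml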
What to do if you are stuck on any of these tests
Firstly, there is every chance that I've made a mistake in my JavaScript tests that causes them to fail to grade some correct solutions properly. Sorry if that's the case - I'll try to fix them up if you let me know.
Whether you think you've got the answer right (but the box hasn't gone green) or you are stuck and haven't got a clue how to proceed, please just:
- Check the comments to see if anyone else has had the same issue; if not:
- Leave a comment saying which test you are trying to complete and what your best guess answer is
This will let me help you out as quickly as possible.
Obligatory disclaimers
Please don't use any of the robots.txt snippets above on your own site - they are illustrative only (and some would be a very bad idea). The idea of this post is to teach the general principles about how robots.txt files are interpreted rather than to explain the best ways of using them. For more of the latter, I recommend the following posts:
- How to block content from the search results (pro-tip - don't rely on robots.txt despite my examples above excluding "secret" files and folders)
- Learn more about why you might want to block robots from certain areas of your site
- Avoid accidentally giving conflicting directives with the various different ways of blocking robots
- Read up on some "don'ts" (old but still relevant): robots.txt misuse, accidentally blocking link juice
I hope that you've found something useful in these exercises whether you're a beginner or a pro. I look forward to hearing your feedback in the comments.
Thanks for making the module free! Hope it's a step to more free modules ;-) Joking aside - I really love the way DistilledU is developing. It looks very promising. Codecademy is a great website to look up to and use as an example.
As for robots.txt, I've never really needed the 'advanced' ways since A) all crawlers are welcome and B) I usually handle exclusion of a page/directory on the page itself so as not to accidentally block the whole site!
Very interesting post, especially for beginners, but I have to correct you on a basic/extremely-important piece of code that you have misspelled:
Sitemap: /file.xml <- this is totally wrong
Sitemap: file.xml <- still wrong
Sitemap: https://www.site.com/file.xml <- correct
Just try yourself inside your WMT panel: proof here
Bye =)
Good spot. I'll get that fixed up.
That should be fixed shortly (here and within DistilledU!).
Thanks for pointing it out.
It's fixed now (still need to update the text to make it clear that the new version tests for a sitemap at https://www.distilled.net/my-sitemap.xml).
This tool was great for one of my new account managers, thanks so much Will!
He did mention that while you're making this change, you might wanna make it clear that the next question tests for a sitemap at https://www.distilled.net/my-video-sitemap.xml as well.
@Will
Interesting post. I would like to add...
* Just stating the obvious for the newbies, but a robots.txt block prevents an on-page meta canonical from being crawled, hence if canonicals are important - then robots.txt might not be the best way to block or control crawled and indexed pages (e.g. excluding parameters in GWT and BingWT might be safer).
* Robots.txt is only applied at a sub-domain level, thus a block on www.seomoz.org does not stop stageosev3.seomoz.org or apiwiki.seomoz.org being crawled for example:
https://www.seomoz.org/robots.txt
https://apiwiki.seomoz.org/robots.txt
https://go.seomoz.org/robots.txt
https://stageosev3.seomoz.org/robots.txt
* Personally, for dev servers and news websites - I use both Disallow: / and Noarchive: / in order to prevent internal pages and external backlinks to these pages being indexed (rather than just preventing internal pages from being crawled) - just to be on the safe side.
For example the sub-domain "go.seomoz.org" is disallowed, but backlinks with &aff_id are not Noarchived. Hence a search for "site:go.seomoz.org inurl:aff_id" returns 550 results.
https://www.google.com/search?q=site:go.seomoz.org+inurl:aff_id&num=100&pws=0&as_qdr=all&prmd=imvns&filter=0
* On the PPC side, you did not mention that User-agent: AdsBot-Google does not follow the robots.txt protocol precisely - because it ignores user-agent: * and thus needs to be declared explicitly.
Conversely BingAdbot (called User-agent: Adidxbot) does comply with robots.txt user-agent: *
* Personally I prefer to use "&seo_robots_noindex=true" rather than "&crawl=no"
then
User-agent: *
Disallow: /*?seo_robots_noindex=true
Disallow: /*&seo_robots_noindex=true
e.g. https://www.seomoz.org/pages/search_results#stq={keyword}&placement_category={target}&seo_robots_noindex=true
Or
User-agent: AdsBot-Google
User-agent: AdsBot-Google-Mobile
User-agent: Adidxbot
Disallow:
Allow: /ppc-landing-pages/
* Lastly, you might find this interesting: it is a "robots.txt whitelist" I collated (a technique used by www.facebook.com/robots.txt and www.alexa.com/robots.txt) as a means to block scrapers and reduce bandwidth rather than using a server-side htaccess user-agent block.
https://www.dropbox.com/s/w7wl8k79oz00wuy/robots.txt
Note: "Crawl-delay: 30" within robots.txt can also be used to reduce bandwidth.
Cheers
Phil.
P.S. Going a bit off topic, but the importance of using the "&seo_robots_noindex=true" or "crawl=no" method in URLs increases when lots of dynamic pages are used for PPC (and hence need to be blocked from Google organic search). For example, here is a string I am testing at the moment:
https://www.mydomain.com/[category]/[seed]/[expansion]/[final]/?search={keyword}&device={ifmobile:mobile}&city={lb.city}&postalCode={lb.postalCode}&adtype={adtype}&KW={keyword}&match={matchtype}{ifcontent:c}&distribution={ifsearch:search}{ifcontent:content}&creativeid={creative}&adposition={adposition}&network={network}&placement_category={target}&placement={placement}&ad_param2={param2}&ad_param1={param1}&ad_insertiontext={insertionText}&adwords_producttargetid={adwords_producttargetid}&campaign_exp_aceid={aceid}&seo_robots_noindex=true
Good call on the Adsbot, it's not a well known fact and you just reminded me too.
Do you think that the whitelist technique is useful? Scrapers don't have to respect robots.txt, nor would they let directives stop them ;)
I've been a member of DistilledU since it was in beta (not sure if it still is??) and can say it's fantastic. This is just one example out of loads of great resources and training it provides, as well as all the videos that have recently been made available to DistilledU members. Would highly recommend a visit (and a subscription). Keep up the good work guys.
Thanks Will. The rules are displaying on a single line in your examples, shouldn't they be on separate lines? M
Yes. They were in my final draft post - I'm working to get that fixed up now.
The post is now updated with this change. Sorry for the inconvenience, and thanks for reading!
Great. Was about to say you learn something new every day haha
This is, hands down, the best post I've ever seen on the SEOmoz site. I wish I had Will's coding skills, so I could teach people Excel this way.
Thanks Annie!
I already did the excel one in DistilledU.:)
Hey, thanks very much for this Will. I've been using DistilledU for quite a few months now, and it's been particularly useful for training new staff. However, there is clearly something for the advanced as well with the new modules that are being added. Another useful interactive module I noticed was the "search operators" lesson in the beginners section, where you get quizzed on which "operators" are best for each given situation. Really been impressed with DistilledU for training + learning purposes!
I love Distilled U!
Will, 100 thumbs up post :) It really gives a clear idea of how we can use the robots.txt file in a smart way.
Good post on robots.txt with basics for newbies. But can somebody tell me what's the difference between:
User-agent: *
Disallow: /secret/
and
User-agent: *
Disallow: /secret/*
Thanks in advance.
There's no difference. Wildcards at the end of the line are ignored (or, rather, implied).
Thanks for the answer. If so, I should test it with several common search engines...
Thanks Will, very good interactive guide/tutorial. If all other modules are like this one, $40 a month is a great deal. Will have to give it a try.
Thanks for sharing. Nice summarized tutorial and examples about robots.txt.
Brilliant idea for a post, Will. They say people all learn in different ways, and I'd much rather try an interactive version of something than simply read about it. This is like the Codecademy of SEO! Good work!
How about one on hreflang in the future? I can see that being useful for a lot of folk! :-)
Hey Will, thanks for sharing such a nice post. I believe the wildcard characters are the star (*) and the question mark (?). Is the question mark (?) considered a wildcard in robots.txt?
That's just a perfect way to learn!
I just couldn't resist making my way through - and I have learned some new things about robots.txt.
Thanks for the links at the top. I already knew Codecademy (I am a huge fan, too) - I will check out the others asap.
Wow that functionality is awesome!
Very simple and easy to follow, even for tech challenged users. An excellent share on some of the ways to leverage using the robots.txt file.
Very cool, interactive post.
Thanks Will this is great
One question - if a disallow is added, say Disallow: /tag/ but a load of pages within the tag folder are already indexed, should they fall out over time, or is there a better way to handle this?
Robots.txt doesn't actually do anything with regard to indexation (even pages that have always been blocked with robots.txt can appear in the index - the search engines just don't know much about them).
Depending on your priorities, you could add a meta noindex to those pages, let them get re-crawled and then block them in robots.txt.
It's one of the best posts I've ever read for learning about robots.txt from the basics - it helps everyone understand it easily.
Very informative post. I knew about robots.txt, but I got a detailed explanation here.
Bookmarked.
Dear willcritchlow,
How do I solve this problem on my website?
Ex: My website has the page structure mysite.com/category/a-b-c.html. But now 2 pages appear:
mysite.com/category/login.aspx and mysite.com/category/register.aspx. I think these problems come from my developers. And I tried to use robots.txt like:
User-agent: *
Disallow: /register.aspx
Disallow: /login.aspx
But it's not effective! So please tell me how to solve this problem using robots.txt.
Thanks.
You need something like:
User-agent: *
Disallow: */register.aspx
Disallow: */login.aspx
(assuming you want to block all versions of those pages wherever they sit).
@willcritchlow
Regarding your example:
User-agent: *
Disallow: /secret/
Allow: /secret/not-secret.html
Will all the folders and files that are in the /secret/ folder be excluded (except not-secret.html)?
If not, how can I block robot access to (or noindex) the directories or files in the example below:
/secret/directory1/
/secret/directory2/
/secret/*.*
except /secret/not-secret.html or another directory ex. /secret/allowed/
Thank you in advance.
Correct. Which I think means you know how to do the second part of your question - you simply allow the specific paths you want to allow within the disallowed directory.
I got it :)
Thanks
I like your article & the way you explain robots.txt. It is a nice article & contains genuine data for learning how to create a robots.txt file for a site.
Nice, I am definitely going to return to this post when I need a specific robots.txt command!
Nice idea, but is it broken? As of March 2015 it "lights up green" with any entry. Seems broken to me.
Is there a trick to get it to work??
I know this is an old post, but it ranks well on Google (~7th) for "robots.txt sample"
It looks like some of the formatting on the page has been broken over the years? Specifically, see the "Add an XML Sitemap" section and how most of the text boxes appear to be missing line breaks.
It still helped me out though, Thanks!
In case it helps anyone that lands here, you can test robots.txt rules with example urls here: https://robots-txt-parser.stapps.io/
Thanks so much for this - super helpful as I try to enhance my skill set. The correct answers were not turning green for me - is this functionality still operational? Or was I just completely wrong on everything ...
Hi.. I'm working on a Chinese website, and due to some laws in the country, it shows a splash screen asking users if they are below 18 or 18+. This is causing an issue with the indexing of the site. Google isn't fetching the content of the site; it only shows the splash screen in the indexed data.
If I apply the "Crawl-delay" directive, would this help me? Because by then the user would have selected one of the options.
Is it just me, or are the boxes not turning green anymore?
This is amazing! Can anyone clarify robots.txt for me? What should I do if I want to block a single page whose URL is www.xyz.com/abe-ace?
Hello ,
I have one question. I have a site, for example https://www.abc.com/xyz.html, and this page has pagination. When I go to page 2, the URL shows as https://www.abc.com/xyz.html?p=2, which means Google treats it as duplicate content. How can I prevent this using robots.txt? I know about canonical, but I want to do this using robots.txt. Please help me.
-Ankit
Hi Will! Yes it's so true that robots.txt is a plain-text file found in the root of a domain. It is actually an accepted standard and allows webmasters to control all kinds of automated consumption of their site, and not just by search engines.
Great intro to robots.txt! This answered a lot of my questions, thanks!
Thanks Will, great for referencing.
Great post, but it seems to me that the example entries are pretty mixed up. Under "exclude directories" the example is excluding a single file. Under "allow specific paths" the example is excluding a path. Maybe they are all one step out of sequence or something? Makes it confusing.
They are interactive. You need to enter the correct answers...
Will, thank you for the detailed post about the fundamental things in SEO. But I think many people are faced with a situation where Googlebot treats a disallow in robots.txt only as "don't index the content of this page", but the page URL can still appear in the SERP with the notice "A description for this result is not available because of this site's robots.txt" (or even with link anchors), especially if the page has some inbound links or Google+ mentions. So the best way to prevent page indexing and ranking is a meta noindex instruction in the document head section. Do you agree?
Robots.txt is good when you don't want Google to access a lot of pages and waste crawl bandwidth. Really there is no general best solution, rather the best solution is dependent upon your goals.
Some of the resources I link out to at the end give more general guidance on when robots.txt is a good idea (and what kind of problem it's good for solving).
Also, see Geoff's answer :)
Yes, colleagues. I agree about the importance of goals.
Hey Will,
I'm curious to know if you can add a wildcard between elements in a robots.txt file and not risk blocking everything past the first element. For instance, what would be the impact of having a Disallow element like this?:
Disallow: /folder/*.aspx
Would that only block .aspx elements within /folder/ or would that disallow everything under /folder/?
To expand my question even further, would you be able to block all occurrences of a specific subfolder name regardless of the URL structure if you had a Disallow element like this?:
Disallow: /folder/*/widgets/
Thanks,
Keith
Outstanding post! A basic file that is so flexible and powerful. Thanks so much for sharing great info and examples.
+1 Will, I really enjoyed testing myself with the interactive robots.txt test!! Thank you!
I've got two that aren't turning green:
Add Multiple Blocks:
User-agent: googlebot
Disallow: /secret/

User-agent: *
Disallow: /
Use More Specific User Agents:
User-agent: googlebot
Disallow: /secret/

User-agent: googlebot-images
Disallow: /secret/

User-agent: googlebot-images
Disallow: /copyright/
Hi there - sorry that there weren't line breaks in the text areas initially. Try refreshing the page and having another go - it should be clearer and easier now.
I *think* that might be the root of your issues with the first one.
For the "more specific user agents" test, I think your solution may be technically correct, but I think you have an extra "User-agent: googlebot-images" versus my proposed solution - you can add both the Disallow lines into the same block (as per the very first test).
Let me know if that's not clear or still doesn't work for you.
Thank you for this work; it's a good post for learning robots.txt. You added some good examples.
Must admit, quite impressive!
I will send a couple of my online buddies who are interested in SEO techniques to this post.
Now that you've changed the answers to the two sitemap questions, using /file.xml doesn't work and I had no idea what domain to use. I figured out it was https://www.distilled.net/ eventually but without that, the last two are very confusing to try to answer.
Great post - I wish I knew how to do what you've done to make these interactive questions. That's a pretty cool feature for this type of educational post.
Hi Will, AWESOME post and love the interactivity. Just a quick question, I seem to be stuck on the basic wildcards section. I've come up with: User-agent: * Disallow: *.html /blog/ --- what am I doing wrong? Many thanks!
Quite excellent information on technical SEO. I just have a doubt: what should the robots.txt URL be for this sample website https://example.example1.com ?
Thanks for sharing this post.
https://example.example1.com/robots.txt
Is that what you mean?
Kalu - use https://www.distilled.net
@Will - see my comment about 8 up. Now that the answers are fixed on the last two per Danillo's comment, you need to know the URL for the file to make the last two turn green.
User-agent: *
Disallow: /private/
Sitemap: https://www.distilled.net/my-sitemap.xml
Sitemap: https://www.distilled.net/my-video-sitemap.xml
Without knowing the domain, you can't answer those.
Yep - sorry - I can't edit the post in live and I missed that - getting it fixed up with the editors.
Will, excellent post on robots.txt. The start of the article is very interesting. "Add multiple blocks" and "use more specific user agents" will help a beginner with robots.txt.
I am trying to block a certain directory of pages using:
Disallow: /directory/*/directory2/
Will this still allow the use of pages inside /directory/?
Yes. That blocks things like /directory/anything/directory2/anything. Is that what you mean?
Extraordinary post on robots.txt. Thank you for the detailed post about the essential factors in SEO.
Thanks Will,
This is a good reference for people wanting to learn about Robots.txt. Seems to be back in style again huh?
An oldie but a real goodie.
Hey Will,
Above you show examples for "Disallow: news*.html"
In that section you say it blocks "/news.html" but it doesn't block "/directory/news.html". Can you give me a URL example of when /news.html would be blocked? Does it have to bump up right next to the root as in https://domain.com/news.html ?
Thanks.
~ Joe
Any of the following patterns would block /directory/news.html
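(A few illustrative possibilities - any one of these would match /directory/news.html:)
Disallow: /directory/
Disallow: /directory/news.html
Disallow: /*news.html
Disallow: /*.html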
Hope that helps.
Hi Will,
It definitely helps, but what you wanted to say about
Is that sitemap location /private/?
tks
As another commenter reminded me, sitemap locations need to be full URLs so it should be:
https://www.distilled.net/my-sitemap.xml
Hope that helps.
Hey Untypical! That's really a very good question. Thanks Will for such a nice and very interactive robots.txt 101. I too have a question for you. What should one do to block directories and pages on a subdomain? Should we add a separate robots.txt file for each subdomain on a site?
Yes. They need their own robots file.
This is a nice intro to robots.txt. For further reading and advanced techniques I would recommend Google's Robots.txt Specifications.
Yep - definitely worth going to the source. I linked to that in the paragraph that starts "In addition to reading about the protocol". Thanks
Oops - I missed that. Instead I'll contribute a web-based tool for creating a robots.txt and a tutorial on building a robots.txt using the IIS SEO Toolkit. :-)
This is very basic but really in-depth, and it explains things from the start so that even someone who has no idea what robots.txt is about can learn how to work with it. I love these guides as they are handy and very actionable.
I will add this to my presentation some time so that more and more people can read about this and learn from it.
Thank you Will for this!
Great post for a beginner... but I'm a bit disappointed that there is nothing new for an experienced one :(
@willcritchlow but I felt it was very boring stuff. You went too much in depth.
I disagree with you Mike. I think one of the myths about SEO is that it's all an exciting, magical concoction of chants, spells and potions. While it may not be nearly as sexy (nor does it require a magic wand), understanding the value of a properly configured Robots.txt file is real grunt-work SEO. It's that kind of thing that often makes the biggest difference. This post is totally aligned with my SEO mantra: "It's a matter of time, patience and intelligent work."
Thanks for the post Will. I'm definitely a fan.