An Introduction to the Search Engines' Tools for Webmasters
To encourage webmasters to create sites and content in accessible ways, each of the major search engines has built support and guidance-focused services. Each provides varying levels of value to search marketers, but all of them are worth understanding. These tools provide data points and opportunities for exchanging information with the engines that are not provided anywhere else.
The sections below explain the common interactive elements that each of the major search engines supports and identify why they are useful. There are enough details on each of these elements to warrant their own blog posts, but for the purposes of this guide, only the most crucial and valuable components will be discussed.
Common Search Engine Protocols
Sitemaps
Sitemaps are formatted lists of all of the pages on a given website. They are used to ensure that search engines can easily find every page on a website and to assign each page a relative priority.
The sitemaps protocol (explained in detail at Sitemaps.org) is applicable to three different file formats:
XML - Extensible Markup Language (Recommended Format)
Pros - This is the most widely accepted format for sitemaps. It is extremely easy for search engines to parse and can be produced by a plethora of sitemap generators. Additionally, it allows for the most granular control of page parameters.
Cons - Relatively large file sizes. Since XML requires an open tag and a close tag around each element, file sizes suffer. (Minimal examples of the XML and text formats appear after this list.)
RSS - Really Simple Syndication or Rich Site Summary
Pros - Easy to maintain. RSS sitemaps can easily be coded to automatically update when new content is added.
Cons - Harder to manage. Although RSS is a dialect of XML, its updating properties actually make it much harder to manage.
Txt - Text File
Pros - Extremely easy. The text sitemap format is one URL per line, up to 50,000 lines.
Cons - Does not provide the ability to add metadata to pages.
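To make the formats concrete, here are two minimal sketches (the URLs, dates, and values are placeholders). First, an XML sitemap following the format documented at Sitemaps.org:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2008-10-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

And an equivalent text sitemap, which is simply one URL per line:

https://www.example.com/
https://www.example.com/about.html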
Sitemaps can either be submitted directly to the major search engines or have their location specified in robots.txt.
Robots.txt
The robots.txt file (a product of the Robots Exclusion Protocol) should be stored in a website's root directory (e.g., www.google.com/robots.txt). The file serves as an access guide for automated visitors (web robots). By using robots.txt, webmasters can indicate which areas of a site they would like to disallow bots from crawling, as well as indicate the locations of sitemap files (discussed above) and crawl-delay parameters. The following commands are available:
- Disallow - Prevents compliant robots from accessing specific pages or folders
- Sitemap - Indicates the location of a website's sitemap or sitemaps
- Crawl-Delay - Indicates the delay (in seconds) that a compliant robot should wait between successive requests to a server (see the sketch after the example below).
Warning: It is very important to realize that not all web robots follow robots.txt. People with bad intentions (e.g., e-mail address scrapers) build bots that don't follow this protocol, and in extreme cases they can use it to identify the location of private information. For this reason, it is recommended that the locations of administration sections and other private sections of publicly accessible websites not be included in robots.txt. Instead, these pages can utilize the meta robots tag (discussed next) to keep the major search engines from indexing their high-risk content.

Example of robots.txt (www.example.com/robots.txt):

# Allow all robots to crawl all pages
User-agent: *
Disallow:

# Don't allow spambot to crawl any pages
User-agent: spambot
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
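Since Crawl-Delay is not part of the original standard and the engines interpret it differently, treat the following as a hedged sketch rather than a universal recipe (the value and user-agent are illustrative):

# Ask Yahoo!'s crawler (Slurp) to wait 10 seconds between requests
User-agent: Slurp
Crawl-delay: 10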
Meta Robots
The meta robots tag creates page-level instructions for search engine bots that govern everything from page inclusion to snippet controls and more.
The meta robots tag should be included in the head section of the HTML document.
Example of Meta Robots:
<html>
<head>
<title>The Best Webpage on the Internet</title>
<meta name="ROBOT NAME" content="ARGUMENTS" />
</head>
<body>
<h1>Hello World</h1>
</body>
</html>
In this example, "ROBOT NAME" is the user-agent of a specific web robot (e.g., Googlebot) or an asterisk to identify all robots, and "ARGUMENTS" is one of the META/X-Robots-Tag values in the table below.

| Use Case | Robots.txt | META/X-Robots-Tag | Other | Supported By |
|---|---|---|---|---|
| Allow access to your content | Allow | FOLLOW, INDEX | | Google, Yahoo!, Microsoft |
| Disallow access to your content | Disallow | NOINDEX, NOFOLLOW | | Google, Yahoo!, Microsoft |
| Disallow indexing of images on the page | | NOIMAGEINDEX | | Google |
| Disallow the display of a cached version of your content in the SERP | | NOARCHIVE | | Google, Yahoo!, Microsoft |
| Disallow the creation of a description for this content in the SERP | | NOSNIPPET | | Google, Yahoo!, Microsoft |
| Disallow the translation of your content into other languages | | NOTRANSLATE | | Google |
| Do not follow or give weight to links within this content | | NOFOLLOW | a href attribute: rel=NOFOLLOW | Google, Yahoo!, Microsoft |
| Do not use the Open Directory Project (ODP) to create descriptions for your content in the SERP | | NOODP | | Google, Yahoo!, Microsoft |
| Do not use the Yahoo! Directory to create descriptions for your content in the SERP | | NOYDIR | | Yahoo! |
| Do not index this specific element within an HTML page | | | class=robots-nocontent | Yahoo! |
| Stop indexing this content after a specific date | | UNAVAILABLE_AFTER | | Google |
| Specify a sitemap file or a sitemap index file | Sitemap | | | Google, Yahoo!, Microsoft |
| Specify how frequently a crawler may access your website | Crawl-Delay | | Google WMT | Yahoo!, Microsoft |
| Authenticate the identity of the crawler | | | Reverse DNS Lookup | Google, Yahoo!, Microsoft |
| Request removal of your content from the engine's index | | | Google WMT, Yahoo! SE, Microsoft WMT | Google, Yahoo!, Microsoft |
Source: Jane & Robot - Managing Robots' Access to Your Website
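To make the table concrete: asking every compliant robot not to index a page or follow its links would use the following tag in the document's head:

<meta name="robots" content="noindex, nofollow" />

The X-Robots-Tag variant applies the same directives through the HTTP response header, which is useful for non-HTML files such as PDFs. A minimal sketch for Apache (assuming mod_headers is enabled; the file pattern is illustrative):

# Send "noindex" for all PDF files via the HTTP response header
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>

Likewise, the reverse DNS lookup mentioned in the table can be run from any shell to authenticate a crawler (the IP address is illustrative; a genuine Googlebot IP resolves to a *.googlebot.com hostname):

host 66.249.66.1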
Rel="nofollow"
Nofollow is a common inline parameter that is adhered to by all of the major search engines. It is appended to links to prevent them from passing ranking power (or "link juice").
Example of nofollow:
<a href="https://www.example.com" title="Example" rel="nofollow">Example Link</a>
An excellent and more comprehensive resource on robots.txt can be found at Jane & Robot - Managing Robots' Access to Your Website. Additionally, a printer-friendly version of this information is available on The Web Developer's SEO Cheat Sheet.

Search Engine Tools
The following tools are provided free of charge by the major search engines and enable webmasters to have more control over how their content is indexed.
Google Webmaster Tools
Sign Up
Google Webmaster Tools Sign Up
Settings
Geographic target - If a given site targets users in a particular location, webmasters can provide Google with information that will help determine how that site appears in Google's country-specific search results and will improve Google's results for geographic queries.
Preferred Domain - The preferred domain is the one that a webmaster would like used to index their site's pages. If a webmaster specifies the preferred domain as https://www.example.com and Google finds a link to that site that is formatted as https://example.com, Google will treat that link as if it were pointing at https://www.example.com (a redirect sketch follows this list of settings).
Image Search - If a webmaster chooses to opt in to enhanced image search, Google may use tools such as Google Image Labeler to associate the images included in their site with labels that will improve indexing and search quality of those images.
Crawl Rate - The crawl rate affects the speed of Googlebot's requests during the crawl process. It has no effect on how often Googlebot crawls a given site. Google determines the recommended rate based on the number of pages on a website.
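Note that the preferred domain setting only tells Google which version of the domain is favored; many webmasters also enforce it server-side with a 301 redirect. A minimal sketch for Apache's mod_rewrite, assuming the www version is preferred and using a placeholder domain:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]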
Diagnostics
Web Crawl - Identifies problems Googlebot encountered when crawling a given website. Specifically, it lists sitemap errors, HTTP errors, nofollowed URLs, URLs restricted by robots.txt, and URLs that time out.
Mobile Crawl - Identifies problems with mobile versions of websites.
Content Analysis - This analysis identifies search engine unfriendly HTML elements. Specifically, it lists meta description issues, title tag issues and non-indexable content issues.
Statistics
These statistics are a window into how Google sees a given website. Specifically, they identify top search queries, crawl stats, subscriber stats, “What Googlebot sees” and index stats.
Link Data
This section provides details on links. Specifically, it outlines external links, internal links, and sitelinks. Sitelinks are section links that sometimes appear under websites when they are especially applicable to a given query.
Sitemaps
This is the interface for submitting and managing sitemaps directly with Google.
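In addition to the web interface, the sitemaps protocol allows a sitemap to be submitted by "pinging" an engine with a simple HTTP request. A sketch of Google's documented ping URL (the sitemap location is a placeholder):

http://www.google.com/ping?sitemap=http://www.example.com/sitemap.xml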
Yahoo! Site Explorer
Sign Up
Yahoo! Site Explorer Sign Up
Features
Statistics - These statistics are very basic and include data like the title tag of a homepage and number of indexed pages for the given site.
Feeds - This interface provides a way to directly submit feeds to Yahoo! for inclusion into its index. This is mostly useful for websites with frequently updated blogs.
Actions - This simple interface allows webmasters to delete URLs from Yahoo!'s index and to specify dynamic URLs. The latter is especially important because Yahoo! traditionally has had a lot of difficulty differentiating dynamic URLs.
Live Webmaster Tools
Sign Up
Live Webmaster Center
Features
Profile - This interface provides a way for webmasters to specify the location of sitemaps and a form to provide contact information so Live can contact them if it encounters problems while crawling their website.
Crawl Issues - This helpful section identifies HTTP status code errors, robots.txt problems, long dynamic URLs, unsupported content types and, most importantly, pages infected with malware.
Backlinks - This section allows webmasters to find out which webpages (including their own) are linking to a given website.
Outbound Links - Similar to the aforementioned section, this interface allows webmasters to view the outbound links on a given webpage.
Keywords - This section allows webmasters to discover which of their webpages are deemed relevant to specific queries.
Sitemaps - This is the interface for submitting and managing sitemaps directly to Microsoft.
Search engines have only recently provided ways for webmasters to interact directly with crawlers. While this relationship is still not optimal, the search engines have made great strides toward opening up their proprietary indices. This has been very helpful for webmasters, who now rely so heavily on search-driven traffic.
As always, comments and constructive criticism are appreciated. You'll note that I'm trying to go back to making this more of a true "beginner's" guide, as I'm concerned that the previous guide may have gone a bit too in-depth. Hopefully between Rand and me, we can finish this mammoth undertaking :)
that's excellent advice for people just getting oriented on the web.
a direct follow-on of that is to obfuscate or avoid using file names that share too much info, e.g. don't use file names like /administrator.html, etc.
A well organized presentation of information for the beginner.
This is a great article that addresses SEO in a number of different ways. Explaining the correct terminology and meanings is extremely beneficial for users wanting to implement this on their websites. SEO can be a challenging thing for any web designer. Having the right resources allows you to use keywords, meta tags, etc. correctly and efficiently.
Thanks Danny,
These are very nice explanations of robots.txt and Webmaster Tools. This information will be really helpful to beginner SEOs, and it is easy for them to understand.
One suggestion: we should add information on how to submit an XML sitemap to Ask.com.
Gunjan,
Interesting point you bring up. It is my understanding that Ask.com is on its way out. I know that in my work I have never noticed any significant amount of traffic from Ask.com. So unless I hear otherwise in the comments, I am not going to add sitemap submission for Ask.
That said, I did do some research for you. Ask.com's preferred method for finding sitemaps is being led to them by robots.txt's "Sitemap:" parameter. Additionally, it has a system set up to receive a ping whenever sitemaps are updated. The URL is below:
https://submissions.ask.com/ping?sitemap=http%3A//www.the URL of your sitemap here.xml
Source: https://about.ask.com/en/docs/about/webmasters.shtml#22
On a personal note, I am sad to see Ask.com go. Their old mascot, Jeeves, is one of my first memories of the internet. I think they had a great mission (answer natural language search queries) but promised too much and delivered too little.
bots that don’t follow this protocol and in extreme cases can use it to identify the location of private information
I think I'd mention that information on the internet is not private unless you take steps to make it private. One of the comments mentions preventing image crawling using meta robots instructions - that's fine until someone links to one of your images from their own website.
.htaccess will be the basic (not simple!) solution for most.
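For example, a minimal sketch of protecting a folder with HTTP Basic Auth via Apache's .htaccess (the path and realm name are placeholders):

# .htaccess in the folder to protect
AuthType Basic
AuthName "Private Area"
AuthUserFile /full/path/to/.htpasswd
Require valid-user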
Nice work, Danny. I like the lead-in with the protocols and features that are common across the engines. One thing that might be interesting is to compare and contrast the three engines' webmaster tools a little more. They all have their strengths and weaknesses.
Good work. I went end to end in signing up for all 3 in a row the other day on a personal project (most client projects are signed up for at least one, and I'm not normally the one who would do the actual signing up for the others). It was an interesting experience to see the differences between the three. I found Google very straightforward (familiarity, most likely) and MSN very easy to do even though I'd never done it personally before. Yahoo! was easily the most confusing in my opinion. I couldn't work out how to add a domain and all sub-domains at once... which is annoying when your feeds are served from a feeds. subdomain.
Will, I did that for our new site as well and found it interesting to compare:
I found Google to provide the most use and in the easiest manner.
MSN - actually seems to "lose" my verification and needs to recheck often - but did offer some interesting insight.
Yahoo - good, but didn't offer as much as MSN or Google and could offer a lot more.
Just a quick note: could you change the .copy code CSS to be 1em instead of 1.2? Just looks a little tidier, I think. (default.v4.css, line 216)
I forwarded this to our designers/geniuses. Thanks!
I think we're doing another small update today. Look for that change along with a huge update to the Linkscape basic report.
Enjoy!
Good review of those basic actions! Good post, Danny. Thx
Super Useful article, thanks a lot!
There was this one question I had about tools + SEO: does the use of tools impact rank? Not covered here - well, it might be a silly question - but I had a freshly emerging project with no rankings etc., and after one day of intense use of diagnostic tools (only diagnostic, no changes to the site or anything! - here at SEOmoz, Webmaster Tools, and some other rank checkers) it suddenly started coming up in Google (immediately after). Coincidence, or is there some connection? It sure looked like it! But I have never heard anybody suggest this might really be so.
Odd. Since you mentioned that you did not actually change anything on the site, my guess is that it was just a coincidence.
Weird though.
Thanks for reading :-)
Great guide. I use the Google Webmaster tools the most.
As well as this, submitting a sitemap does get you indexed in Google a lot quicker.
-Brenelz
I also use Google Webmaster Tools on a regular basis. It's extra helpful when I'm handed a site that has been looked after by someone else before me.
Speaking of which, if anyone knows if I should use a robots file to deal with Drupal sites creating duplicate page titles for items like event calendars I would love to hear. Thanks!
Great post. I just learned that I can disallow robots from accessing images through the meta robots tag.
Question: Does adding a site map to a site that has been stable for a while (ie all pages are indexed) offer any benefit?
Thanks
Short Answer: Yes, but it usually only produces short-term results. It most definitely won't hurt your rankings.
Long Answer: https://www.seomoz.org/blog/do-sitemaps-effect-crawlers
Hope that helps!
Hi Danny - considering this is the beginner's guide, whilst you recommend the tools from the search engines, there are no recommendations on tools to create sitemaps...
Any favourites out there?
I find this an interesting issue, as some server-side ones make mistakes - and some web-based crawler ones also miss out sections of your sites!
So perhaps for those of us who want a refresher you could recommend a few?
I use a custom generator that works for CakePHP. Since I am assuming you don't use that framework, I recommend checking out https://googlewebmastercentral.blogspot.com/2009/01/new-google-sitemap-generator-for-your.html It just came out today, so I can't vouch for it, but it's from Google so it can't be too bad :-)
Nice breakdown as usual Danny.
My biggest pet peeve from an agency standpoint is that Yahoo and MSN make it difficult to manage large groups of domains. I like that Google will allow 500 sites per account. In an ideal situation each site would have its own login to webmaster tools from all 3, but on the scale I deal with that is just not possible.
I typically use Site Explorer without verifying, which I know limits the functions but still provides some data that is helpful.
Thanks! I am hoping to get back into the schedule of blogging once a week again. This company is changing a lot (for the better!) and everyone is wearing a lot more hats than before.
In regards to your "Uber Tool", Google offers an API for webmaster tools (https://code.google.com/apis/webmastertools/docs/2.0/developers_guide_protocol.html) as does Yahoo. I am not sure of the status of Live (anyone?). It would not be too difficult for someone to build such a tool.
We should just keep this idea between you and me ;-)
Well written article. I personally like the last section where you mention the Live Webmaster Center, Yahoo! Site Explorer and Google Webmaster Tools... I use these tools for almost all of my projects and find them very helpful. I like the Crawl Issues area in Live Webmaster Center, but I found it hard to really understand their Keywords section. Thanks for sharing.
you can find a list of very useful SEO Tools at List of paid and Free SEO Tools
Great post. I just recently signed up for MSN and Yahoo webmaster tools and I really wish they had some of the features that Google has - like the geographic target section.