From 1998 to 2005, Google and webmasters were like nerds and dating - separated by an unwritten code that kept them far apart. With the exception of the occasional comment by GoogleGuy on the forums or Matt Cutts at an industry show, there was virtually no connection between the search giant and those who marketed sites and content through its medium.
In the summer of 2005, Google started its Sitemaps program, which eventually evolved into Webmaster Central. Today, that organization comprises hundreds of individuals around the world working on webmaster relations, webmaster problems, and webmaster tools. It's the de facto model followed by Microsoft and Yahoo!, and in many ways it epitomizes the legitimacy SEO has achieved since our darkest days as web outcasts.
However, there is one faction infuriated with what Google has built and desperate to stop the advancement of the Webmaster Central programs - in particular, those portions that require site owners to verify their domains with Google. Who are they? Google's upstart competition. I won't talk specifically about the sources of these opinions, but they include more than a few of the major startup players in web search, as well as international engines and stealth-mode operators. What's the problem? I'll explain.
From the beginning of the web until about 2006, any company that wanted to build a web search engine had everything it needed at its disposal (at least, so long as it had the funds to make the effort happen). Despite the massive technical and financial requirements, nothing stood in the way of a creative group crawling the web, then using that data to construct an index and an algorithm to show results. It was all there on the Internet, just waiting to be fetched. Google changed that with the introduction of Sitemaps and the later growth of Webmaster Central.
Now, there's tons of data no startup engine could access without hacking Google's servers. Information like:
- Sitemaps - The lifeblood of many sites' crawlability and accessibility (a sketch of the format appears after this list). Information about URL canonicalization and even URL exclusion is now exclusively available to the search engines that receive the sitemap. Even Yahoo! and Microsoft are severely disadvantaged, as webmasters are less likely to submit sitemaps to them.
- Geo-targeting information - Google allows you to set geography and language for subdomains, subfolders, or entire sites inside the Webmaster Console, which is fantastic for websites and SEOs, but it gives Google a clear competitive advantage over any player wishing to enter the search space.
- Crawl Rate Control - For sites that want to allow faster crawling, or those that need to reduce the load, crawl-rate control is an option inside Webmaster Tools - another piece of information that reinforces Google's exclusive view of site data.
- Domain preference - Even though it's a simple thing, the ability to set a domain preference (url.com vs. www.url.com) gives Google a decided advantage.
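For readers who haven't worked with the protocol, here's roughly what a minimal sitemap file looks like - a sketch following the sitemaps.org format, with example.com and all of the values as placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- One <url> entry per page; only <loc> is required by the protocol -->
    <loc>http://www.example.com/</loc>
    <lastmod>2008-04-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

It's nothing exotic - the competitive question is simply who gets to see the file.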
Any piece of information that's submitted behind a verification protocol, rather than openly accessible by crawling, is going to hinder competition and help reinforce the market leader's domination. Suddenly, in the last two years, the barriers to entry for building an effective web-wide search engine have skyrocketed.
Caveat - I personally do not believe that Google built these tools with the goal of eliminating potential competition, and in fact I am a huge fan of Webmaster Central, particularly the folks who work as analysts and evangelists to the webmaster community. However, it's certainly valuable to consider the effect a service like this has on the broader technology market, and how it fits in with Google's pledges of openness and "non-evil."
Wow. I'd never even heard that idea before, let alone thought it myself. Somehow I doubt that someone's www vs. non-www preference is going to keep a dedicated entrepreneur from starting up a new business, but let's talk through it.
The biggest hole in that theory is that Sitemaps are now discoverable via robots.txt. Far from being "exclusively available to search engines that receive the sitemap", many sites put a link to their sitemap in their robots.txt, where it can be retrieved by not only the big search engines but any small search engine as well.
In that respect, Google has supported Sitemaps to make it *easier* for new search engines to crawl and find new urls. Writing an industrial-strength crawler is harder than most people realize, so the fact that Sitemaps can be discovered and crawled by anyone actually makes it easier for a competing search engine to get started: to begin with, just find all the Sitemap files mentioned in robots.txt and crawl those urls. You can often do that before you can even crawl regular links out on the web.
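To make that concrete, here's a rough sketch of how a brand-new crawler could bootstrap its URL frontier from publicly discoverable Sitemaps - no verification with anyone required. It assumes Python 3 and a hypothetical www.example.com; it's an illustration, not production crawler code:

```python
from urllib.request import urlopen
from xml.etree import ElementTree

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemaps_from_robots(site):
    """Fetch a site's robots.txt and return any 'Sitemap:' URLs listed in it."""
    robots = urlopen(site.rstrip("/") + "/robots.txt").read().decode("utf-8", "ignore")
    return [line.split(":", 1)[1].strip()
            for line in robots.splitlines()
            if line.lower().startswith("sitemap:")]

def urls_from_sitemap(sitemap_url):
    """Parse a sitemap file and return the <loc> URLs it lists."""
    tree = ElementTree.parse(urlopen(sitemap_url))
    return [loc.text.strip() for loc in tree.iter(SITEMAP_NS + "loc")]

# Seed a crawl frontier from one (hypothetical) site's public sitemap.
frontier = []
for sitemap in sitemaps_from_robots("http://www.example.com"):
    frontier.extend(urls_from_sitemap(sitemap))
print(len(frontier), "URLs discovered without crawling a single page")
```

Point that at a list of domains instead of one and you have a seed list for a new index, built entirely from files that are already public.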
Google has been asking webmasters to send data to Google since 2001 with our spam report form, so it's not as if Google's webmaster tools are new in that respect. Personally, I think it's cool that webmasters and Google can communicate more closely with the webmaster console, and it's not as if the communication is all one-way. We send out a lot of messages to webmasters whose sites have been hacked, have hidden text/links, etc. So I call Unfounded Conspiracy Theory. :)
Matt - I'm with you in theory, and I agree that Webmaster Central has done amazing things for webmaster relations with Google and the ability to build and promote great sites.
However, I think you're attacking the weakest parts of the startups' arguments and ignoring the stronger ones. Yes, you CAN make your sitemaps file discoverable to the engines via robots.txt, but I'm guessing that a large percentage of sitemaps users upload theirs directly through WMC, and thus prevent other engines from accessing it. Same goes for language targeting - you COULD use meta tags, but the engines claim to ignore these in favor of the languages/regions that are selectable inside WMC. It's important data that Google gets about sites that other engines cannot access.
Do I think it's malicious in intent? No. I think it's Google trying to help. Granted, Google does it to help themselves get more good content and a better engine (and thus hopefully grow market share).
Conspiracy theory? Yeah, OK. But there are a lot of people working really hard on search that don't have Google's resources or market share and between things like this - skrenta.com/2008/04/cuill_is_banned_on_10000_sites.html - and the inaccessibility of the webmaster data, it's a bit unfair to call it unfounded.
I may just be being very dense but how do you upload a sitemap to WMC? Am I missing it or are we talking about submitting a URL in WMC pointing back to your site?
Being able to upload directly would be an even neater solution to the point I raised above in reply to the comment by jameskm03.
On a lighter note the comment by 'Concerned Webmaster' on that cuill banned by 10000 post just cracks me up. "You've linked to my site! What were you thinking?!"
When I log in to WM Tools, I go to this URL - google.com/webmasters/tools/showaddsitemap?siteUrl=http%3A%2F%2Fwww.seomoz.org%2F&hl=en which lets me specify where my sitemap is hosted. Large sites often have multiple sitemaps and don't necessarily link to them, so a new search engine wouldn't be able to find those, while Google knows where they are. Maybe "upload" was the wrong word - sorry about that.
"Yes, you CAN make your sitemaps file discoverable to the engines via robots.txt, but I'm guessing that a large percentage of sitemaps users upload theirs directly through WMC, and thus prevent other engines from accessing it."
Rand, all you give Google is the location of a sitemap url on your webserver. I'm guessing that most people put it in the recommended location, e.g. www.example.com/sitemap.xml. These files are not hard to find. If I were writing a crawler, I'd check that location just like I'd check robots.txt. I'm not sure what Google can do besides make it a public standard in robots.txt so that any crawler can access the file too. "Sorry, I know you wanted to send data to Google, but we're going to reject it unless you prove that you provided the data to everyone else"? That's a hard argument to make.
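For reference, the public standard Matt mentions is already just one extra line in robots.txt - any crawler that fetches robots.txt can read it (example.com and the Disallow rule below are placeholders):

```
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap.xml
```

Nothing about that line is Google-specific; a startup engine gets the same pointer for free.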
I don't know why some webmasters have decided to block Cuill, but that seems like a separate issue. Maybe Cuill crawled too hard, and people didn't like that? Maybe people want to wait and see how Cuill looks before letting them crawl their site? In the early days, lots of people blocked MSFT's crawler because they considered it ill-behaved, but MSFT eventually made their crawler more polite and most people that had blocked it eventually unblocked it. I have to admit: I don't see the connection between Google's webmaster console and people choosing to block Cuill? If people are going to the effort to block a specific bot, that seems like a communication issue for the owners of that particular bot. That's the sort of thing a webmaster blog would be perfect to address.
Speaking as someone who was present in the early days of the webmaster tools team, I supported it because it made communication between webmasters and Google support much more scalable. By identifying common issues and providing self-service, automatic ways that webmasters could diagnose and fix their own issues, the webmaster console reduced the email load on Google's support team while radically improving the ability of an average site owner to accomplish Google-related tasks and get more information. I think the fact that both Yahoo and MSFT have ramped up their webmaster portals just demonstrates that more communication in both directions is a good thing.
Fair enough. I said in the post title that I believed the intent behind WMC was benign and beneficial. I've been one of the service's biggest supporters for years, but I think it's still reasonable to bring awareness to how these folks are feeling - it's not like I made this up off the top of my head; these are real feelings from search startups.
However - I am going to edit the post title. Rather than "designed to be" I'm going with "Could they hurt?" which is more fair. Sorry about that - I need to be more careful with my titles.
FYI, Webmaster Tools doesn't offer language targeting, as stated here. We only offer geographic targeting, and we also look at publicly-available signals such as ccTLDs and server location.
You can't blame them for making something good that people want to use. We should focus on hating Microsoft for making bad things that people want to use.
Hear, hear!
I must admit though, MSN does have the best copy of WMC. Still not what WMC is, but better than what Yahoo provides.
Wait, did I just stand up for MSN? Crap, I gotta stop that.
I could not agree more! Thumbs for you :-)
Interesting point but....
Google, of course, built their initial market-killing search without any of that.
And....
They don't care about new search engines because anything worthwhile can be bought (price is no object) and if it can't then it won't be able to dent their marketshare in search, advertising, or anything else important anyway.
-OT
"Google, of course, built their initial market-killing search without any of that."
I agree!
I do see the potential and fully agree with your last paragraph. Then again, how many webmasters use Google's webmaster tools?
I think a more serious threat to competition is webmaster overzealousness over bot control. Skrenta posted the other day that some 10,000 sites prevent cuill.com from crawling.
Maybe I'm being overly harsh on these startups but the only aspect of webmaster central which I think might be truly important is the geo-targeting information. I can see how that would be truly valuable.
Sitemaps, crawl rate, and domain preference are, I suspect, information that would be nice to have but isn't exactly essential. On the other hand, they've got a tough task ahead of them and I wouldn't like to predict which will be the straw that breaks the proverbial camel's back.
Either way it is an interesting perspective. It would be interesting to know the percentage of sites using webmaster central and related offerings and the percentage of sites pointing to a sitemap file in a robots.txt file.
Totally agree with Streety. How many sites are really using WMC? Lots of big players do not want to share their data, and being on WMC scares them a bit. I wonder if only small to medium-sized businesses are on WMC?
Maybe I'm missing something, but I'm not following the logic for the conspiracy theory. Google has only added features and functionality to the web, I really don't see where the barriers to entry are with regards to WMC.
Most XML sitemaps are at https://www.yoursite.com/sitemap.xml, just as most robots.txt files are at https://www.yoursite.com/robots.txt.
I would always suggest that anyone's sitemaps should be accessible via a link on their homepage.
It seems like most savvy webmasters who would actually use WMC would also know how to be consistent with their domains and effectively communicate their preferences.
I would also be interested to know the percentage of websites that actually use WMC. I would assume it's a very small percentage of the total.
I believe the real barrier to entry for a new search engine is the exponential growth of data on the web. The server power needed to crawl the world wide web and keep the cache up to date must be outrageous.
If webmaster tools were remotely accurate or even semi-dependable I'd almost agree; however, since they are so often broken, I think the lack of computing resources allocated to the program is an indication that Google isn't seriously pursuing any sort of advantage here.
It shouldn't be that difficult to get the most mundane things correct, like the last time the home page was accessed, yet it is wrong more often than it is right. It causes more trouble and concern than it delivers in real value as a metric.
Now, if they do decide to make the program actually work, I'd entertain the idea. With the examples you gave, there are protocols available to demonstrate a site's preferences. Whether not knowing how a site would like Google to index it (with or without www) is really a disadvantage, I'm not sure; that setting is a crutch for people who cannot, won't, or don't set up a 301 redirect, and probably more an indication of poor site management than anything else - maybe the upstart could use that information as an indicator.
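For anyone who would rather fix the www vs. non-www question at the server instead of in a search console, the 301 is only a few lines of configuration. This sketch assumes an Apache server with mod_rewrite enabled and a placeholder example.com domain:

```
# .htaccess - permanently (301) redirect the bare domain to the www version
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Any crawler - Google's or a startup's - then sees the same canonical host without needing access to anyone's console settings.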
Personally, I think that Google's Webmaster Central only helps to make the Google search engine a better product. As Google gets bigger and better, of course new search engines will have an increasingly difficult time competing. You can't fault Google for making a better product just because they are the most popular. New search engines will have to realise that they are not entering on a level playing field and give users a good reason to switch. Plus, since most sites don't use Webmaster Central, any new search engine can't really rely too much on that information anyway.
Sitemaps are an open framework adopted by other search engines too. Because they are pretty much a standard, doesn't this make it easier for a startup to crawl and index a domain? I mean, if they can find and read the sitemap.xml file, then a new search engine would be good to go on the relevant web. If a site doesn't follow these standards, then it probably isn't worth being crawled and indexed in a new search engine anyway.
I think this is a good conversation here, but I'm going to have to disagree that the field is harder to enter because of webmaster tools. Besides most people would agree that Yahoo's Site Explorer is more useful than Webmaster tools in understanding links to your site anyway.
"I mean if they can find and read the sitemap.xml file then a new search engine would be good to go on the relavent web."
If the sitemap url is in the robots.txt file or linked to from the homepage then all is well. But what if the sitemap is on an obscure url and linked to from nowhere? A webmaster could still get all the benefits of a sitemap in Google by pinging them, but the data would be unobtainable by a startup.
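To illustrate the pinging point: a sitemap ping is just an HTTP GET with the sitemap's location in the query string, so only the engine being pinged learns where the file lives. The endpoint below is the commonly documented Google form (the exact URL has varied over the years), and the obscure path is purely hypothetical:

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical obscure sitemap location, linked from nowhere on the site.
sitemap = "http://www.example.com/feeds/x7f3/private-sitemap.xml"

# Commonly documented ping endpoint; only the engine you ping learns the path.
urlopen("http://www.google.com/ping?sitemap=" + quote(sitemap, safe=""))
```

A startup engine that never receives such a ping has no way to discover that file.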
A sitemap could conceivably be a valuable source of intelligence to your competitors, so it is not unlikely that some webmasters will attempt to obscure the file.
A hidden sitemap... I mean, seriously, this sounds shady at best. Any good site has a relevant sitemap page because some users like and expect it.
It should be a best practice, as mentioned, to keep your sitemap in the root directory. If you hide it for whatever reason, then your site deserves not to be as easily crawled. I also think, as mentioned, that listing your sitemap in robots.txt is important.
So you're saying you hide that thing also? What's the point in creating them then?
While this article was good food for thought, it definitely misses the mark.
I still see Google Webmaster Central as a "value added" tool and I really don't see any "lock-in" whatsoever. The only "lock-in" that Google really has over other search engines is traffic - if you want to expose your site to the huge user base and number of queries being executed against Google's index, you need to play Google's game and by their rules.
I agree with some of the last few comments - and with regards to the title of this post, the answer is surely 'yes'. This post can be reduced to the question - Is the fact that Google offers a great, useful product going to hinder competing startups who fail to offer a similar quality of product? Of course! Is the existence of eBay hindering marketplace startups? I would suspect so...
WMC is a very useful product, and the best of its kind made available by search engines. Of course this hurts new competitors. Any startup (David) that tries to pull the rug from under an established business (Goliath) will face the task of competing with Goliath's superior infrastructure, existing customers, data, and brand recognition. That is the essence of business. Latecomers to the game must confront these barriers and, through innovation, make inroads into Goliath's inertia.
The timing on this couldn't be better. I just read a scathing attack on Google and Webmaster Tools before coming here. I love coincidence.
https://www.brendastardom.com/arch.asp?ArchID=1308
Google raised the bar and keeps on raising it. Yahoo, MSN and others had plenty of time to develop similar tools. Did they get even close?
I do not agree with some of Google's policies (especially the selective/preemptive punishments), but I cannot fault Google for pumping out better, more widely accepted and used products.
I guess, before crying, smaller startups should focus on creating something different... My robots.txt with a link to my sitemap is waiting... and waiting... and waiting...
Show me the traffic!
Edited for typo...
I see these concerns as quite a stretch. Yeah, it's hard to compete against an established competitor, but that's common for any company or industry. There is nothing preventing the same information from being collected by a new search engine; it just takes time for them to build their reputation to the point where webmasters would consider it worthwhile. And a new company actually gains an advantage in knowing that such information would be useful to their efforts.
If anything, there is a new potential for developing a search engine with open standards handling everything within Webmaster Central with simple instructional xml / text files in the same way that Sitemaps and Robots.txt are currently done.
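To illustrate that idea - with the caveat that every directive below except Sitemap and Crawl-delay is purely hypothetical and read by no one today - an open, crawlable equivalent of the Webmaster Central settings could look like an extended robots.txt:

```
# Hypothetical open equivalents of Webmaster Central settings (example.com is a placeholder)
User-agent: *
Sitemap: http://www.example.com/sitemap.xml
Crawl-delay: 5
# The two directives below do not exist in any spec - purely illustrative:
Preferred-host: www.example.com
Geo-target: US
```

Any engine willing to parse the file would get the same preferences Google currently receives behind verification.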
All of this information could conceivably be put in robots.txt or meta tags, which any search engine should look for. Sitemaps can be declared in robots.txt already. Yahoo has a crawl-delay for robots.txt.
So any search engine startup could just read the Google- and Yahoo-specific parts of robots.txt.
You can geotarget in meta tags that no search engine seems to read or take into consideration. But they are there.
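For the record, the meta-tag route looks something like the snippet below - the geo.* convention and the content-language header are the ones usually cited, shown here with placeholder values; whether any major engine honors them is, as noted, another matter:

```html
<head>
  <!-- Language declaration -->
  <meta http-equiv="content-language" content="en-us">
  <!-- Geo meta tags: region, place name, and coordinates (placeholder values) -->
  <meta name="geo.region" content="US-WA">
  <meta name="geo.placename" content="Seattle">
  <meta name="geo.position" content="47.6062;-122.3321">
</head>
```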
After reading this post I have two tangent thoughts:
1) I can't wait to see what secrets are behind the Cuill door.
2) How in the world does a single man keep a fairly relevant competitor to Google even somewhat viable? (referring to Gigablast)