Many would argue that the nofollow solution is riddled with problems, but no one disagrees that, in principle, the major search engines cooperating on issues like nofollow and, more recently, MSN & Google adopting the noodp tag is a great leap forward. Now another issue affecting a great number of searchers around the world needs the attention and consensus of the major search engines - international content, targeting, language & hosting.
What are the current problems?
- No agreement on how to determine whether content is intended for a specific country, language or subgroup (e.g. French-speaking Québécois in Canada)
- No universal guidelines for the use of top level domains (TLDs)
- No universal guidelines for the use of hosting in specific geographies
- No recommendations for webmasters seeking to reach a particular geographic or linguistic audience
- Users - the best results as of now are generally found in English language searches from countries like the US, UK, Canada & Australia. Users in other countries, searching in their native language or a popular non-official language, often get a much worse experience.
- Businesses & Publishers - the vast majority of webmasters and content creators on the web are seeking to be found through the search engines. Without language and regional guidelines, it's very hard for these folks (whether large or small) to decide where to host their site, what TLD to register with and what language or languages to write in to reach their audience.
- Search Engines - Without a consistent, congruous system to follow, the search engines themselves are dealing with a multitude of problematic issues from spam to database management to unhappy users and confused webmasters.
- If a website wants to reach multiple audiences in multiple countries with different languages for each, what is the best practice?
- If a website is intended to be targeted to a regional language-speaking group inside a country with a different official language, what is the best practice?
- If a website wishes to target all speakers of a language worldwide, what is the best practice?
Now, let's look at some specific cases currently affecting the search engines to help illustrate these problems.
Searching for the Spanish word for books - "Libros" - on a few different search engines:
A search on MSN in the US for libros
A search on MSN México - prodigy.msn.com - for libros
A search from the US on Yahoo! for libros
A search on Yahoo! México for libros
A search at Google US for libros
A search at Google México for libros
Even readers who aren't familiar with Spanish can see that many of the above results contain serious inconsistencies and issues with what sites are or should be appearing in the results. We're forced to wonder why Amazon's Spanish language book section ranks at Google in the US, while Barnes & Noble's section ranks in a search from México. Yahoo! and MSN have very odd advertisements for their US sectors, and the organic results in English aren't geared towards what searchers are most likely seeking. There are language interpretation problems, relevancy issues and questions about why a certain site should reach US Spanish-language speakers vs. Mexican audiences at all three engines.
Along with this specific example, there are multiple other issues that we've encountered while surfing in different languages and from different country portals:
- Google generally appears to weigh hosting & TLD extension more heavily than Yahoo! & MSN. They waive this requirement for certain sites (like Wikipedia), yet don't do so consistently for content on sites like the BBC, Amazon or Yahoo! (all of which produce lots of international content).
- Yahoo! has remarkable inconsistencies with ranking US-focused content that mentions or uses words in other languages, although their system appears to be somewhat more even-handed than Google's with regard to content ranking in certain domains/countries.
- MSN is overly reliant on link data, which led to my post from a few weeks back on how to MSN-bowl someone out of the results by linking to them heavily from another country. This needs serious attention.
I'd love to see more information, suggestions and ideas for solutions in the comments - please do contribute if you, too, have experience with international targeting issues at the search engines. With some luck, they'll get together in the near future and issue some guidelines so the non-English language users of the world can receive higher quality search results and we, as webmasters, will know how to reach our audience.
Google does address this issue on its sitemaps blog. https://sitemaps.blogspot.com/2006/07/tips-for...
I certainly find that hosting location is the No. 1 factor, especially in the UK, where loads of big sites use a .com rather than a .co.uk.
1) Trust site owners: XHTML has document- and element-specific xml:lang and lang attributes - why not use them as a primary identification method?
2) Detect otherwise: As with nofollow, most websites will never adopt the xml:lang and lang attributes, so there must always be other ways to detect the language / target area when the website owner doesn't specify it. IMO the strongest candidate would be IR-based language detection.
However, using TLDs is IMO not a workable option internationally, because in many countries acquiring a national TLD is strictly limited. Also, placing weight on hosting location is old-fashioned in a globalizing world. Search engine engineers must know/realize this, so why do search engines still place (too much) weight on a site's physical features?
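To make the two-step idea above a bit more concrete, here is a minimal Python sketch: trust a declared lang or xml:lang attribute when one exists, and fall back to a crude text-based guess otherwise. The stop-word lists, the class name and the function names are all made up for illustration, and counting stop words is only a stand-in for real IR-based language detection. It says nothing about what any search engine actually does.

```python
from html.parser import HTMLParser

# Tiny illustrative stop-word lists (an assumption for this sketch;
# a real detector would use far larger statistical models).
STOP_WORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "for"},
    "es": {"el", "la", "de", "y", "en", "los", "para"},
    "fr": {"le", "la", "et", "les", "des", "est", "pour"},
}


class LangAttributeParser(HTMLParser):
    """Records the first lang / xml:lang attribute and collects page text."""

    def __init__(self):
        super().__init__()
        self.declared = None
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Step 1: trust the site owner's declaration if there is one.
        if self.declared is None:
            for name, value in attrs:
                if name in ("lang", "xml:lang") and value:
                    self.declared = value.lower()
                    break

    def handle_data(self, data):
        self.text_parts.append(data)


def guess_from_text(text):
    """Step 2: pick the language whose stop words appear most often."""
    words = text.lower().split()
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def page_language(html):
    """Return (language, source), where source is 'declared' or 'detected'."""
    parser = LangAttributeParser()
    parser.feed(html)
    if parser.declared:
        return parser.declared, "declared"
    return guess_from_text(" ".join(parser.text_parts)), "detected"
```

A page that declares `lang="fr-CA"` is reported as declared French-Canadian content without any guessing, while an undeclared page full of Spanish text falls through to the detector. Note that neither step looks at the TLD or the hosting location.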
The suggestion that search engines see relevance in where a site is hosted really burns me up.
Why anyone would want to host their Australian sites here in Australia beats me. We get twice the service at half the cost when we have our servers in the US.
We also host a couple of sites in Holland. With that host you raise a trouble ticket and your phone is ringing straight away as the tech calls to tell you the problem is being taken care of.
You just don't get service like that with local hosts down here.
Great post here, Rand.
I think the SEs have a bigger problem on their hands; three letters explain it all... IDN.
How will the SEs view Internationalized Domain Names that are .com versus ones from .JP, for example?
Lots of room for improvement in the international search game at the moment.
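For readers who haven't worked with IDNs: on the wire, an internationalized name is converted to an ASCII "xn--" form, and the TLD stays readable after the conversion. The sketch below uses Python's built-in idna codec (which implements the older IDNA 2003 rules) to show this. The describe_domain helper and the example names are made up for illustration, and how any search engine weighs the result is an open question, not something this code answers.

```python
def describe_domain(name):
    """Convert a Unicode domain name to its ASCII (Punycode) form.

    Hypothetical helper for illustration only. It relies on Python's
    built-in "idna" codec, which follows the IDNA 2003 rules.
    """
    ascii_name = name.encode("idna").decode("ascii")
    # The last label is the TLD; it survives the conversion unchanged
    # when it is already ASCII (.com, .jp, ...).
    tld = ascii_name.rsplit(".", 1)[-1]
    return {"unicode": name, "ascii": ascii_name, "tld": tld}


if __name__ == "__main__":
    for domain in ("bücher.example", "日本語.jp"):
        print(describe_domain(domain))
```

So an engine that cares about the TLD can still read it from an IDN. Whether a non-ASCII name under .com should count as a language or country signal is exactly the unanswered part.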
This, I guess, is somewhat related to a problem I have...
I'm on a small island in the Caribbean with a world class data centre, so I host my client sites there on my own colocated servers - our country has enough problems without unnecessarily dumping money out of the country for colo services when they exist here with a higher standard of support and service than I can get in, say, the US (not to mention those rare days when I need physical access to one of my boxes).
But... one of my sites, which targets anyone and everyone *outside* of the country (specifically the US and Canada), ranks fantastically when keyword searches are run on Google and Yahoo from my office, but only ranks adequately when searched from a US IP address.
Basically, these search engines appear to be penalising my site for my decision to support my own nation's economy! It sucks, it's unfair, and completely wrong!
Erm... sorry... rant over!
Rant away benj! This is a problem for most people who want to have their servers locally but target the International market (which the US accounts for a lot of).
What TLDs are you using on the sites?
This question got brought up quite a few times at SES Latino. I'll give a good example of the issue at hand here.
Spain has a population of about 40 million people, and while many people probably know that "Spain Spanish" sounds different than Spanish spoken in other parts of the world (think muchath grathiath instead of muchas gracias, plus they use an extra verb conjugation), people may not be aware that Spain technically has four official languages: Spanish (or Castilian), Euskara (or Basque), Catalan, and Galician.
Euskara is spoken by the Basque population and is concentrated in the northern tip of the country. A little over 1 million Spaniards speak Basque, with about 700,000 speaking it as their first language.
Catalan can be thought of as a cross between Spanish and French, and it's spoken primarily along the east coast of the country. Over 10 million people speak Catalan, with 7 million active speakers.
Galician (or Gallego) is similar to Portuguese and is spoken primarily on the northwest coast of Spain. 3 to 4 million Spaniards speak Gallego.
Think now of a company that is targeting Spain and its inhabitants as a potential market. An example of the sort of roadblock the company could encounter would be the word castle. In Spanish it's castillo, but in Euskara/Basque it's gaztelu, in Catalan it's castell, and in Galician it's castelo. Hmmm, we may have a problem here.
Targeting one language can alienate another, and thanks to Nacho Hernandez's presentation (which I summarized earlier), we know that ads presented in the targeted user's native language can yield a much higher ROI than non-native language ads. In Spain's case, what is a company to do?
And, for that matter, what is a search engine to do? Rank pages with the word castillo when someone searches for gaztelu? Try to figure out who speaks which language based on their region? It's a very hard issue.
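Just to picture the first option, here is a minimal Python sketch of cross-language query expansion using the four words for castle from above. The CONCEPTS table and the expand_query function are hypothetical, and the hand-built dictionary is a toy: nothing here reflects how a real engine handles this, and it does not address whether a Basque searcher would actually want Castilian pages.

```python
# Translations of "castle" taken from the example above, keyed by
# ISO 639-1 language code. The table itself is a hypothetical stand-in
# for whatever translation data a real system might have.
CONCEPTS = {
    "castle": {
        "es": "castillo",  # Spanish (Castilian)
        "eu": "gaztelu",   # Euskara (Basque)
        "ca": "castell",   # Catalan
        "gl": "castelo",   # Galician
    },
}


def expand_query(term):
    """Return the searched term first, followed by its known translations."""
    term = term.lower()
    for translations in CONCEPTS.values():
        if term in translations.values():
            others = [word for word in translations.values() if word != term]
            return [term] + others
    # Unknown terms are passed through untouched.
    return [term]
```

Keeping the searched term first leaves room to rank pages in the searcher's own language above the translated matches, which is one possible compromise between the two options raised above.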