In the Distilled R&D department we have been ramping up the amount of automated monitoring and analysis we do. An internal system monitors our clients' sites, both directly and via various data sources, to ensure they remain healthy and to alert us to any problems that arise.
Recently we started work on adding support for rel-alternate-hreflang annotations to this system. In this blog post I'm going to share an open-source Python library we've just started work on for the purpose, which makes it easy to read the hreflang entries from a page and identify errors with them.
If you're not a Python aficionado then don't despair, as I have also built a ready-to-go tool for you to use, which will quickly do some checks on the hreflang entries for any URL you specify. :)
Google's Search Console (formerly Webmaster Tools) does have some basic rel-alternate-hreflang checking built in, but it is limited in how you can use it and you are restricted to using it for verified sites.
rel-alternate-hreflang checklist
Before we introduce the code, I want to quickly review five common, easy-to-make mistakes that we will want to check for when looking at rel-alternate-hreflang annotations:
- return tag errors - Every alternate language/locale URL of a page should, itself, include a link back to the first page. This makes sense but I've seen people make mistakes with it fairly often.
- indirect / broken links - Links to alternate language/region versions of the page should not go via redirects, and should not point to missing or broken pages.
- multiple entries - There should never be multiple entries for a single language/region combo.
- multiple defaults - You should never have more than one x-default entry.
- conflicting modes - rel-alternate-hreflang entries can be implemented via inline HTML, XML sitemaps, or HTTP headers. For any one set of pages only one implementation mode should be used.
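To make the "multiple entries" and "multiple defaults" checks concrete, here is a minimal sketch in Python 3. It is not polly's code: the function name `find_duplicate_entries` and the example.com URLs are made up for illustration, and it assumes the entries have already been collected as (hreflang, URL) pairs.

```python
from collections import defaultdict


def find_duplicate_entries(entries):
    """Return hreflang values that point at more than one distinct URL.

    `entries` is a list of (hreflang, url) pairs (a hypothetical input shape).
    """
    urls_by_value = defaultdict(set)
    for hreflang, url in entries:
        # hreflang values are case-insensitive, so normalise before grouping
        urls_by_value[hreflang.strip().lower()].add(url)
    return {
        value: sorted(urls)
        for value, urls in urls_by_value.items()
        if len(urls) > 1
    }


entries = [
    ("en-GB", "https://example.com/uk/"),
    ("en-gb", "https://example.com/gb/"),
    ("de", "https://example.com/de/"),
    ("x-default", "https://example.com/"),
]

duplicates = find_duplicate_entries(entries)
print(duplicates)
# {'en-gb': ['https://example.com/gb/', 'https://example.com/uk/']}
```

A second x-default entry would show up in the same result under the "x-default" key, so the "multiple defaults" check is just a special case of the "multiple entries" check.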
So now imagine that we want to automate these checks quickly and simply...
Introducing: polly - the hreflang checker library
polly is the name for the library we have developed to help us solve this problem, and we are releasing it as open source so the SEO community can use it freely to build upon. We only started work on it last week, but we plan to continue developing it, and will also accept contributions to the code from the community, so we expect its feature set to grow rapidly.
If you are not comfortable tinkering with Python, then feel free to skip down to the next section of the post, where there is a tool that is built with polly which you can use right away.
Still here? Ok, great. You can install polly easily via pip:
pip install polly
You can then create a PollyPage() object which will do all our work and store the data simply by instantiating the class with the desired URL:
my_page = PollyPage("https://www.facebook.com/")
You can quickly see the hreflang entries on the page by running:
print(my_page.alternate_urls_map)
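If you are curious what reading these entries from a page involves, here is a rough standard-library sketch in Python 3. It is not how polly is implemented: the class name `HreflangParser` and the sample HTML are assumptions for illustration, and it only looks at inline link elements in HTML that has already been fetched.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "alternate" in rel_tokens and attrs.get("hreflang") and attrs.get("href"):
            self.entries.append((attrs["hreflang"], attrs["href"]))


# Hypothetical page source used only for this example
sample_html = """
<html><head>
<link rel="alternate" hreflang="en-gb" href="https://example.com/uk/">
<link rel="alternate" hreflang="de" href="https://example.com/de/" />
<link rel="stylesheet" href="/style.css">
</head><body></body></html>
"""

parser = HreflangParser()
parser.feed(sample_html)
print(parser.entries)
# [('en-gb', 'https://example.com/uk/'), ('de', 'https://example.com/de/')]
```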
You can list all the hreflang values encountered on a page, and which countries and languages they cover:
print(my_page.hreflang_values)
print(my_page.languages)
print(my_page.regions)
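As a rough illustration of how an hreflang value breaks down into a language and a region, here is a small Python 3 sketch. The helper `split_hreflang` is hypothetical and is not part of polly; it simply treats the first subtag as the language and any later two-letter subtag as the region, without validating either against the ISO code lists.

```python
def split_hreflang(value):
    """Split an hreflang value into (language, region); region may be None."""
    value = value.strip()
    if value.lower() == "x-default":
        return ("x-default", None)
    parts = value.split("-")
    language = parts[0].lower()
    # Treat the first two-letter subtag after the language as the region,
    # which skips four-letter script subtags such as "Hant"
    region = next((p.upper() for p in parts[1:] if len(p) == 2), None)
    return (language, region)


print(split_hreflang("en-GB"))       # ('en', 'GB')
print(split_hreflang("de"))          # ('de', None)
print(split_hreflang("zh-Hant-TW"))  # ('zh', 'TW')
print(split_hreflang("x-default"))   # ('x-default', None)
```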
You can also check various aspects of a page, such as whether the pages it lists in its rel-alternate-hreflang entries point back to it, or whether any entries do not seem retrievable (due to 404 or 500 etc. errors):
print(my_page.is_default)
print(my_page.no_return_tag_pages())
print(my_page.non_retrievable_pages())
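The logic behind a return tag check can be sketched in a few lines of Python 3. This is an illustration under assumptions rather than polly's implementation: the function name `pages_missing_return_tag` is made up, and the alternates for each page are supplied as an in-memory dictionary instead of being fetched over HTTP.

```python
def pages_missing_return_tag(page_url, alternates_by_page):
    """Return the alternates of `page_url` that do not link back to it.

    `alternates_by_page` maps each page URL to the list of alternate URLs
    found in that page's hreflang entries (a hypothetical input shape).
    """
    missing = []
    for alt_url in alternates_by_page.get(page_url, []):
        if alt_url == page_url:
            continue  # a self-referencing entry needs no return tag
        if page_url not in alternates_by_page.get(alt_url, []):
            missing.append(alt_url)
    return missing


alternates_by_page = {
    "https://example.com/en/": [
        "https://example.com/en/",
        "https://example.com/de/",
        "https://example.com/fr/",
    ],
    "https://example.com/de/": [
        "https://example.com/en/",
        "https://example.com/de/",
    ],
    # The French page only references itself, so it has no return tag
    "https://example.com/fr/": ["https://example.com/fr/"],
}

missing = pages_missing_return_tag("https://example.com/en/", alternates_by_page)
print(missing)
# ['https://example.com/fr/']
```

A real checker would also need to fetch each alternate URL, which is where the redirect and broken-link checks from the list above come in.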
Get more instructions and grab the code at the polly GitHub page. Hit me up in the comments with any questions.
Free tool: hreflang.ninja
I have put together a very simple tool that uses polly to run some of the checks we highlighted above as being common mistakes with rel-alternate-hreflang, which you can visit right now and start using:
https://hreflang.ninja
Simply enter a URL and hit enter, and you should see something like:
Example output from the ninja!
The tool shows you the rel-alternate-hreflang entries found on the page, the language and region of those entries, the alternate URLs, and any errors identified with the entry. It is perfect for doing quick'n'dirty checks of a URL to identify any errors.
As we add additional functionality to polly we will be updating hreflang.ninja as well, so please tweet me with feature ideas or suggestions.
To-do list!
This is the first release of polly and currently we only handle annotations that are in the HTML of the page, not those in the XML sitemap or HTTP headers. However, we are going to be updating polly (and hreflang.ninja) over the coming weeks, so watch this space! :)
Resources
Here are a few links you may find helpful for hreflang:
- Moz's guide to Hreflang
- Aleyda Solis' hreflang generator tool
- Kaitlin McMichael's post on 7 common hreflang mistakes
- Dave Sottimano's post on hreflang insights
Got suggestions?
With the increasing number of SEO directives and annotations available, and the ever-changing guidelines around how to deploy them, it is important to automate wherever possible. Hopefully polly is helpful to the community in this regard, and we want to hear what ideas you have for making these tools more useful - here in the comments or via Twitter.
thats great to have open source tool. I did check and works great. start using it :)
Works great.
One suggestion - can you make "hello world" example with few running pages and few errors. This will be very useful for novices to see how things working. Of course host that files anywhere in internet since "localhost:8888" didn't work anymore. I test it.
And one more thing - when hreflang isn't found you can dump message as "Can't find hreflang tags on this page". Because now show table with headers and w/o content.
Hi Peter,
Thanks - you make a good point about the localhost - I wanted an example that had some errors which wasn't too long etc. and couldn't find anything to fit my needs. I considered adding some test pages into hreflang.ninja itself for testing against, so maybe I'll do that.
I've added the feature you suggest with the message. Thanks!
Great! Dev here also i hope to not boring you.
And another four suggestions:
1. Do robots.txt and sitemap.xml because now shown error in Django. You also can also do DEBUG = False in Django config file to return standard 404.
2. Place somewhere "Check another page" with link to home or ajax style showing url box.
3. Can you check why didn't works if url is https://play.google.com/store/apps/details?id=com.... I also find few urls that return 502 error.
4. This link "https://pinterest.com" didn't work due unicode char somewhere.
1. We were having some issues so quickly enabled debug but it is off again now.
2. Cool idea. Thanks.
3. Yeah - funnily enough the Google Play store was one of my test scenarios and works fine locally. We are working on fixing it. Think uwsgi is timing out.
4. Interesting - looking now. Thanks. :)
Peter - the 502 issue is now fixed. :)
True... i check already.
But now Pinterest - https://www.pinterest.com/ return error 500
We've fixed the encoding problem now. Pinterest should be working. :)
Roger that.
And last one... if you put only something i.e. peter.nikolow.me (note w/o schema) error was shown. If you put something with space i.e. "peter nikolow" (w/o quotes) - return error 500.
Schema and invalid URLs are now handled a lot better. Thanks Peter! :)
This is cool. I love tools like this!
If you have more than one page you want to test (say a list of pages or an entire XML sitemap), check out my tool at Hreflang.org. It does NOT use polly but tests all sorts of edge cases, including the clash of hreflang and rel=canonical URLs.
Thanks so much for information as for the tutorial.
Thanks Tom for this much needed guide. I'll start working on an Arabic site this week and really can't wait to experiment this.
Let us know how you get on, Umar!
It's great to hear about an open source tool for checking hreflang - badly needed! Thanks also for the mention!
Thanks for the post, Kaitlin! :)
Thank for the link, I tested and I found some error with my hreflang.
Thank you for this open-sources tools. Always is a good tools when it's free.
Excellent tool, I couldn't resist using it. Ran a quick check on my websites. And luckily the tool passed all of them. Thanks for sharing.
Can't wait to test my sitemap with your tool :)
Hai Tom
Now i try the tool.But give the some issue.I have attached the link.https://hreflang.ninja/check/?url=http%3A%2F%2Fwww.sejongflex.com%2F validator.after explain me
Such an excellent idea and really useful tool!
Everything is ok with my code. But the redirect doesn't work properly. Whats the problem ?
Thanks much for the tool! Just recently fixed the backlink mistake with href so I was anxious to try out the tool and was pretty happy with it.
Some feedback for v1.2:
Please take this as helpful feedback from a dev, not as complaints. I did a lot of extra testing to give you actionable improvements in hopes of helping. :)
Hey Mario,
Thanks for the feedback! :)
Give the new build a try and see what you think. :)
Tried one of my domains that is non www and non-ssl (phpcodechecker.com) and it gave an Invalid URL (Schema) error, so #1 still not working as expected.
+1 on the GET, Try Again, and retry alternate links icon.
Also +1 on the new icons for success/failure (more useful for people who may be color blind).
As far as following redirects, maybe it is working as expected (you want errors for malformed URLs), but it's not immediately obvious if I was using this to test for problems, since my expectation is based on my experience visiting the URLs directly, where redirects and CaSe iSsUeS are automatically corrected and are not usually an issue.
This includes a 301 redirect (https://www.readysum.com/ goes to https://readysum.com/) and a 302 redirect (https://readysum.com/ goes to https://readysum.com/). I'm just wondering that without catches for these, the user won't understand why they got the big fat red X.
Oops - I messed up the deploy (didn't push new version of polly). I think that addresses both the issues. Thanks again for feedback. :)
Hello Tom,
Great tool you suggest, I have one question in mind related to href tag, what it will be good to use href tag for different country within one of our directory. I have see lot's of big websites use it with subdomains.
Hey Jitender - you can have different language versions in different directories just fine. :)
Thank for great experiment. Nice informative post.
wwooww !! good tool that I start to try now. Thanks !!
Hi Tom,
I get https://hreflang.ninja/check/ 502 bad gateway when i try my URL. Could that be because the page i tried to check has over 100 hreflang tags?
Can you PM me your URL so I can test? I think I may have fixed this though.
Hi Tom!
I have 5 pages for german, english, france, italian and russian, alltogether 25 pages. Should I set only to the index (index.htm, index_english.htm, index_fr. htm, index_it.htm and index_ru.htm) the hreflang tag?
Thanks!
hi!
when i check my site www. volgger.at there are too many entries, is this bad, what can i do?
Thanks
You should only have a single alternate URL for each language/region combo. Your page https://www.volgger.at/fewo/information.htm has multiple alternate en-GB links which makes it impossible for Google to determine which en-GB page is the equivalent of this German language page. You definitely need to change the current solution and ensure you are mapping equivalent pages to each other in a one to one fashion (per language).
Thanks!
But how can I mapping equivalent pages to each other in a one to one fashion (per language). In the head of each page I wrote the language version. Maybe you can help?
Thanks
Clemens
Get ready for Polly, the open-source hreflang checker library. It is a Python library, so it's easy to get up and running. It is a great way to scan your HTML code for specific types of programming errors. It scans your HTML code for all hreflang annotations, then recursively scans each link for the following rel-alternate-hreflang common programming mistakes:
It does more than this small list, but it is pretty easy to see that having a library automatically handle this sort of annotation checking will make your site more compliant with HTML standards.
The world needs a hero to come and provide answers to the issue of hreflang and redirecting landing pages.
Our .com domain 301 redirects English users to .com/en, with others redirected to .com/de.
We've had indexing issues for a long time now (only the German homepage snippet was shown when searching for our company name).
We had hreflang annotations between the .com/en and .com/de pages as the .com page has no content and is simply a 301 redirect. Your great tool said that the .com/en and .com/de pages were fine, but when searching for our .com domain, it stated "Return tag error (page does not link back)" for the .com/en page.
I've now changed the .com/de hreflang annotation to be merely .com, which solved that problem (but now obviously returns errors for the .com/de page). I'm hoping that with the nature of the 301 redirect that this will be the solution.
Got any advice on what else could be going on?
---
Was:
...hreflang="de" href="https://www.example.com/de ...
...hreflang="en" href="https://www.example.com/en ...
Now:
...hreflang="de" href="https://www.example.com ...
...hreflang="en" href="https://www.example.com/en ...
Hello Tom,
Great experiment, this tool seems very interesting.
I had tried many times with different URL's but it does not work for me, check this image . Can you please recheck and help me?
Thanks
Hi Shubham,
I do not see any hreflang entries on that page. As Peter Nikolow has suggested below I'm rolling out an update to make it clearer when it does not find any entries. :)
Hello Tom,
Yes, that would be nice, looking forward to see.
Hey Shubham - it is now done. :)
thanks, great read and tutorial