Nearly all of us have used Screaming Frog to crawl websites. Many of you have probably also used Google's Structured Data Testing Tool (formerly known as the Rich Snippet Testing Tool) to test your authorship setup and other structured data.
This is a quick tutorial on how to combine these two tools to check your entire website for structured data such as Google Authorship and Rel="Publisher", along with various types of Schema.org markup.
The concept:
Google's structured data tester embeds the URL you're testing directly in its own URL. Here's an example:
- When I enter this URL into the testing tool: https://www.contentharmony.com/tools/
- ...the testing tool spits out this URL: https://www.google.com/webmasters/tools/richsnippets?q=http%3A%2F%2Fwww.contentharmony.com%2Ftools%2F&html=
We can take advantage of that URL structure to create a list of URLs we want to test for structured data markup, and process that list through Screaming Frog.
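If it helps to see that idea in code rather than in a spreadsheet formula, here's a minimal Python sketch of the same URL construction; urllib.parse.quote does the encoding that the Excel template handles with a formula, and the URL list is just a hypothetical example:

```python
# A minimal sketch of the testing-tool URL construction described above.
# urllib.parse.quote performs the URL encoding.
from urllib.parse import quote

TESTER = "https://www.google.com/webmasters/tools/richsnippets?q={}&html="

# Hypothetical example list; swap in your own URLs.
urls_to_test = [
    "https://www.contentharmony.com/tools/",
]

for url in urls_to_test:
    print(TESTER.format(quote(url, safe="")))
```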
Why this is better than simply crawling your site to detect markup:
You could certainly crawl your site and use Screaming Frog's custom filters to detect things like rel="author" and ?rel=author within your own code. And you should.
This approach will tell you what Google is actually recognizing, which can help you detect errors in implementation of authorship and other markup.
Disclaimer: I've encountered a number of cases where the Structured Data Testing Tool reported a positive result for authorship, but authorship snippets were not appearing in search results; changing the implementation method resolved the issue. Authorship may also not be granted or present for a particular Google+ user. In short, the Structured Data Tester isn't perfect and will produce false positives, but it suits our need in this case: quickly testing a large number of URLs all at once.
Getting started
You're going to need a couple things to get started:
- Screaming Frog with a paid license (we'll be using custom filters which are only available in the paid version)
- One of the following: Excel 2013, URL Tools for Excel, or SEO Tools for Excel (any of these three will allow us to encode URLs inside of Excel with a formula)
- Download this quick XLSX template: Excel Template for Screaming Frog and Snippet Tester.xlsx
The video option
This short video tutorial walks through all eight steps outlined below. If you choose to watch the video, you can skip straight to the section titled "Four ways to expand this concept."
Steps 1, 2, and 3: Gather your list of URLs into the Excel template
You can find the full instructions inside the Excel template, but here's the simple 1-2-3 version of how to use the Excel template (make sure URL Tools or SEO Tools is installed before you open this file or you'll have to fix the formula):
Step 4: Copy all of the URLs in Column B into a .txt file
Now that Column B of your spreadsheet is filled with URLs that we'll be crawling, copy and paste that column into a text file so that there is one URL per line. This is the .txt file that we'll use in Screaming Frog's list mode.
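If you'd rather skip Excel entirely, here's a sketch of steps 1 through 4 in Python: it reads a plain list of URLs and writes the encoded testing-tool URLs to the .txt file that Screaming Frog's list mode will load. The filenames are assumptions, not part of the template:

```python
# A sketch of steps 1-4 without Excel: read a plain list of URLs
# ("urls.txt", one per line - an assumed filename) and write the encoded
# testing-tool URLs to the .txt file for Screaming Frog's list mode.
from urllib.parse import quote

TESTER = "https://www.google.com/webmasters/tools/richsnippets?q={}&html="

with open("urls.txt") as source, open("test_urls.txt", "w") as out:
    for line in source:
        url = line.strip()
        if url:  # skip blank lines
            out.write(TESTER.format(quote(url, safe="")) + "\n")
```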
Step 5: Open up Screaming Frog, switch it to list mode, and upload your file
Step 6: Set up Screaming Frog custom filters
Before we go crawling all of these URLs, it's important that we set up custom filters to detect specific responses from the Structured Data Testing Tool.
Since we're testing authorship for this example, here are the exact pieces of text that I'm going to tell Screaming Frog to track:
- Authorship is working for this webpage.
- rel=author markup has successfully established authorship for this webpage.
- Page does not contain authorship markup.
- Authorship is not working for this webpage.
- The service you requested is currently unavailable.
Just to be clear, here's the explanation for each piece of text we're tracking:
- The first filter checks for text on the page confirming that authorship is set up correctly.
- The second filter reports the same information as filter 1. I'm adding both of them for redundancy; we should see the exact same list of pages for custom filters 1 and 2.
- The third filter is to detect when the Structured Data Testing Tool reports no authorship found on the page.
- The fourth filter detects broken authorship (typically because either the link is faulty or the Google+ user has not acknowledged the domain in the "Contributor To" section of their profile).
- The fifth filter contains the standard error text for the structured data tester. If we see this, we'll know we should re-spider those URLs.
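For a rough sense of what these filters are doing, here's a minimal Python sketch that applies the same footprints outside of Screaming Frog. It assumes the testing tool returns these exact strings in its raw HTML, and it's a sanity check, not a replacement for the crawl:

```python
# A rough stand-in for the custom filters: fetch each testing-tool URL and
# report which footprint strings appear in the raw HTML.
import time
import urllib.request

FOOTPRINTS = {
    "authorship working": "Authorship is working for this webpage.",
    "rel=author working": "rel=author markup has successfully established authorship for this webpage.",
    "no markup": "Page does not contain authorship markup.",
    "authorship broken": "Authorship is not working for this webpage.",
    "tool error": "The service you requested is currently unavailable.",
}

def classify(test_url):
    html = urllib.request.urlopen(test_url).read().decode("utf-8", "replace")
    return [label for label, text in FOOTPRINTS.items() if text in html]

with open("test_urls.txt") as f:
    for line in f:
        url = line.strip()
        if url:
            print(url, classify(url))
            time.sleep(2)  # stay polite to Google's servers
```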
Step 7: Let 'er rip
At this point we're ready to start crawling the URLs. Out of respect for Google's servers and to avoid them disabling our ability to crawl URLs in this manner, you might consider adjusting your crawl rate to a slower pace, especially on large sites. You can adjust this setting in Screaming Frog by going to Configuration > Speed, and decreasing your current settings.
Step 8: Export your results in the Custom tab
Once the crawl is finished, go to the Custom tab, select each filter that you tested, and export the results.
Wrapping it up
That's the quick and dirty guide. Once you export each CSV, you'll want to save them according to the filters you put in place. For example, my filter 3 was testing for pages that contained the phrase "Page does not contain authorship markup." So, I know that anything that is exported under Filter 3 did not return an authorship result in the Structured Data Testing Tool.
Four ways to expand this concept:
1: Use a proper scraper to pull data on multiple authors
Screaming Frog is an easy tool for quick checks like the one described in this tutorial, but unfortunately it can't handle true scraping tasks for us.
If you want to use this method to also pull data such as which author is being verified for a given page, I'd recommend redesigning this concept to work in Outwit Hub. John-Henry Scherck from SEOGadget has a great tutorial on how to use Outwit for basic scraping tasks that you should read if you haven't used the software before.
For the more technical among us, there are plenty of other scrapers that can handle a task like this - the important part is understanding the process so you can use it in your tool of choice.
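To make the "pull data, not just detect presence" distinction concrete, here's a sketch that extracts the actual rel="author" targets from your own pages. It doesn't parse the testing tool's output (whose markup I haven't mapped here), but it shows the kind of extraction a scraper like Outwit handles with XPath:

```python
# A sketch of pulling data rather than just detecting presence: extract the
# href of any rel="author" link from a page.
from html.parser import HTMLParser
import urllib.request

class AuthorLinkParser(HTMLParser):
    """Collects href values from <a>/<link> tags whose rel contains 'author'."""
    def __init__(self):
        super().__init__()
        self.authors = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""
        if tag in ("a", "link") and "author" in rel:
            self.authors.append(attrs.get("href"))

def author_links(url):
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    parser = AuthorLinkParser()
    parser.feed(html)
    return parser.authors

print(author_links("https://www.contentharmony.com/tools/"))
```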
2: Compare authorship tests against ranking results and estimated search volume to find opportunities
Imagine you're ranking 3rd for a high-volume search term, and you don't have authorship on the page. I'm willing to bet it would be worth your time to add authorship to that page.
Use HLOOKUPs or VLOOKUPs in Excel to compare data from three tabs: rankings, estimated search volume, and whether or not authorship is present on the page. It will take some data manipulation, but in the end you should be able to create a Pivot Table that filters out pages that already have authorship and sorts the rest by estimated search volume and current ranking.
Note: I'm not suggesting you add authorship to everything; not every page should be attributed to an author (e-commerce product pages, for example).
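If you'd rather do the comparison in code than with lookups and a Pivot Table, here's a sketch in pandas. The file names, column names, and boolean has_authorship column are all assumptions, not exports from any particular tool:

```python
# A sketch of the three-way comparison in pandas instead of lookups.
import pandas as pd

rankings = pd.read_csv("rankings.csv")      # columns: url, keyword, rank
volume = pd.read_csv("volume.csv")          # columns: keyword, volume
authorship = pd.read_csv("authorship.csv")  # columns: url, has_authorship

merged = rankings.merge(volume, on="keyword").merge(authorship, on="url")

# Pages without authorship, highest-volume and best-ranking first.
opportunities = (
    merged[~merged["has_authorship"]]
    .sort_values(["volume", "rank"], ascending=[False, True])
)
print(opportunities.head(20))
```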
3: Use this method to test for other structured markup besides authorship
The Structured Data Testing Tool goes far beyond just authorship. Here's a short list of other structured markup you can test:
- E-commerce product reviews and pricing
- Rel Publisher
- Event Listings
- Review and price markup on App Listings
- Music Snippets
- Recipes
- Business Reviews
- Just about anything referencing schema.org, data-vocabulary.org, and similar markup.
4: Blend this idea with Screaming Frog's other capabilities
There's a ton of ways to use Screaming Frog. Aichlee Bushnell at SEER did a great job of cataloging 55+ Ways To Use Screaming Frog. Go check out that post and I'm sure you can come up with additional ways to spin this concept into something useful.
Not to end on a dull note, but a couple comments on troubleshooting:
- If you're having issues, the first thing to do is manually test the URLs you're submitting and make sure there weren't any issues caused during the Excel steps. You can also add "Invalid URL or page not found." as one of your custom filters to make sure that the page is loading correctly.
- If you're working with a large number of URLs, try turning down Screaming Frog's crawl rate to something more polite, just in case you're querying Google too much in too short a period of time.
- When you first open the Excel template, the formula may break depending on whether or not you already have URL Tools or SEO Tools installed. Read the instructions on the first page to find the correct formula to replace it with.
This is super creative dude, nice post - I'll be using this (and thinking of other ways to use this idea potentially). Could be done on any tool that works via unique URL and parameters.
Thanks! The biggest downside to custom filters is that they can only detect the presence of code or text - they don't actually report back on attributes or other pieces of data like a true scraper.
However, you could work with an app like Outwit or anything that can scrape XPath, and that would allow you to pull data from sites that use URLs as a parameter.
Also, I don't have many qualms about querying Google a few times per second as long as they'll let me, but I'd probably turn down the crawl speed if I were running this process on a larger site, and I'd scan the preliminary results for non-200 status codes to make sure everything was testing properly.
ok
very nice
This is good, and you just made it readily accessible for thousands of people now. If there's a Captcha next time I test structured data, well, blame is on you for your creative method ;)
Yeah - I tried to include a few disclaimers about crawling at a polite speed. It's a pretty infrequent need for most people so I doubt it will register as much of a blip on Google's radar. If I were Google, I wouldn't even bother with the captcha, I'd just return a temporarily down page status of some kind until the requests from that IP address slowed down.
I don't know why, but after reading that I feel like I have been living under a rock. Thanks for the detailed guide.
Great post! Thanks for sharing.
Informative post. I have used Screaming Frog; it helps with on-page work and provides valuable reports.
This makes buying Screaming Frog instead of using the free version a lot more interesting.
I didn't know these filters were so powerful. Thanks for this good post.
Wow! This is really very helpful. I am new to this but definitely will be trying to use screaming frog for my structured data.
Mary of Affilorama
Very Informative post.
Very good for understanding Google Authorship!
Great post. But since you're working with SEO Tools for Excel, why not use its IsFoundOnPage or regex-find functions?
Is this promoting the paid version?
Nope, I have zero affiliation with the folks at Screaming Frog.
The techniques you showed us for using Screaming Frog to test whether authorship was implemented successfully are very interesting, but I'm sad that Screaming Frog can also produce incorrect reports.
Nice post. By default, the Screaming Frog SEO Spider crawls only the subdomain you enter and treats other subdomains it encounters as external links.
Terrific guide!
I am definitely going to use this
Nice visualization @Kane, thanks for this tutorial.
Any updated way of doing this, now that the data testing tool has been changed?
https://developers.google.com/structured-data/testing-tool/
Hey Matt, yes. You have to analyze the XHR requests in the Network tab of Inspect Element/Dev Tools to find out that Google is sending a URL parameter query in order to get results. Which means a URL like https://developers.google.com/structured-data/testing-tool/?url=https://www.yahoo.com will get the data for you.
You'll need to rebuild the "footprints" that I identified based upon what shows up in the right side of the screen. Find some URLs that cause errors and some URLs that work and you should be able to find the new footprints.
Good luck - please feel free to reply back here on the page if you find any good footprints that are worth sharing!
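For anyone rebuilding their list, here's a minimal sketch of the updated URL construction, assuming the ?url= parameter behaves as in the example above:

```python
# A minimal sketch of the updated pattern, assuming the newer tool accepts
# the page URL in a ?url= parameter.
from urllib.parse import quote

NEW_TESTER = "https://developers.google.com/structured-data/testing-tool/?url={}"

print(NEW_TESTER.format(quote("https://www.yahoo.com", safe="")))
```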
Great post. The sad part is that this only works in the paid version...
Very informative post; now it's quite easy to test a whole site for authorship and other schema at once.
Good stuff, thanks for the tutorial Kane. I'm building a blog/info site now and intend to implement Google authorship. When I do I'll definitely be using this method for making sure it's been implemented correctly, and on all pages.
Awesome post! Never thought we could use Screaming Frog for testing structured data too. :)
I have a question: what is the difference between Structured Data and Data Highlighter in Google Webmaster Tools?
Thanks in advance!
Well, as per Google Webmaster Tools:
The Structured Data page in Webmaster Tools shows the structured information that Google was able to detect on your site. It lists each type of structured data discovered on your site, along with the number of URLs containing each type. To see source URLs, click an item. In the Source URLs list, click a link to see the structured data Google was able to extract from that page. To see how a piece of structured data might appear in Google's search results, click Rich snippets preview. Structured data is becoming an increasingly important part of the web ecosystem. Google makes use of structured data in a number of ways, including rich snippets, which allow websites to highlight specific types of content in search results. Websites participate by marking up their content using industry-standard formats and schemas.
Data Highlighter is a webmaster tool for teaching Google about the pattern of structured data on your website. You simply use Data Highlighter to tag the data fields on your site with a mouse. Then Google can present your data more attractively and in new ways in search results and in other products such as the Google Knowledge Graph.
The structured data tool allows you to detect what type of structured data is implemented on your site and whether it has been installed properly.
The data highlighter, as I understand it, allows you to signal to Google the presence of structured data without actually inserting markup into your code. The biggest downside to this approach is that you're only telling Google, not other search engines and crawlers.
Thanks for the detailed answer; now I get the exact difference between them. Thanks once again. :)
I agree about Data Highlighter; you want the change on your site itself so all the other search engines have a clear idea about your site too.
Awesome post, very informative and analytical. Thank you!
Incredibly useful post, Kane. As much as I love Screaming Frog I'm not sure I ever would've thought of a testing process as good as this one. Just to clarify, you can't do this at all unless you have the paid version of Screaming Frog? Or is there a way to still do it, but it just takes longer because of the lack of access to Custom Filters?
Correct - in order to use custom filters you have to have a paid license which is £99 per year. There's no way to do this in Screaming Frog without that feature. It can be done in other scraping programs, however they're usually paid as well. Towards the end of the post I mentioned a tool called Outwit which could certainly handle this task, and it's around $60 per year I believe.
Good to see technical how-to posts like this one. Helps balance the concept and theory.
It's a wonderful post, Kane; this is the most helpful post for evaluating authorship in Google! Thanks, Pankaj
This is really awesome for testing authorship, and I will definitely use Screaming Frog.
This is a great work-flow, Kane. No doubt I'll be using it sometime soon for checking product price and reviews markup. Good job putting together a clear and well written post. :)
Kane, you are the man! We use Screaming Frog every day and this is more ammo for the arsenal. We sometimes use SF in meetings with clients or prospects and they are "WOW'd" by how fast it is and the info it gathers. This new technique will let us help them understand what Authorship is and can do, while getting us to test more effectively since we're in the software so much as is. Many thanks; your YouTube video and this post are now bookmarked! - Patrick
Nice Post, thanks