A steady rise in content-related marketing disciplines and an increasing connection between effective SEO and content have made the benefits of harnessing strategic content clearer than ever. However, success isn't always easy; as I'm sure many of you know, it's often quite difficult.
A number of challenges must be overcome for success to be realized from end-to-end, and finding quick ways to keep your content ideas fresh and relevant is invaluable. To help with this facet of developing strategic content, I’ve laid out a process below that shows how a few SEO tools and a little creativity can help you identify content ideas based on actual conversations your audience is having online.
What you’ll need
Screaming Frog: The first thing you’ll need is a copy of Screaming Frog (SF) and a license. Fortunately, it isn’t expensive (around $150 USD per year), and there are a number of tutorials available if you aren’t familiar with the program. After you’ve downloaded and set it up, you’re ready to get to work.
Google AdWords Account: Most of you will have access to an AdWords account due to actually running ads through it. If you aren’t active with the AdWords system, you can still create an account and use the tools for free, although the process has gotten more annoying over the years.
Excel/Google Drive (Sheets): Either one will do. You'll need something to work with the data outside of SF.
Browser: We walk through the examples below utilizing Chrome.
The concept
One way to gather ideas for content is to aggregate data on what your target audience is talking about. There are a number of ways to do this, including utilizing search data, but search data lags behind real-time social discussions, and the various tools we have at our disposal as SEOs rarely show the full picture without A LOT of monkey business. In some situations, determining intent can be tricky and requires further digging and research. On the flip side, gathering information on social conversations isn’t necessarily quick either (Twitter threads, Facebook discussions, etc.), and many of the tools built to enhance this process are cost-prohibitive.
But what if you could efficiently uncover hundreds of specific topics, long-tail queries, questions, and more that your audience is talking about, and you could do it in around 20 minutes of focused work? That would be sweet, right? Well, it can be done by using SF to crawl discussions that your audience is having online in forums, on blogs, Q&A sites, and more.
Still here? Good, let’s do this.
The process
Step 1 – Identifying targets
The first thing you’ll need to do is identify locations where your ideal audience is discussing topics related to your industry. While you may already have a good sense of where these places are, expanding your list or identifying sites that match well with specific segments of your audience can be very valuable. In order to complete this task, I'll utilize Google’s Display Planner. For the purposes of this article, I'll walk through this process for a pretend content-driven site in the Home and Garden vertical.
Please note, searches within Google or other search engines can also be a helpful part of this process, especially if you're familiar with advanced operators and can identify platforms with obvious signatures that sites in your vertical often use for community areas. WordPress and vBulletin are examples of that.
Google’s Display Planner
Before getting started, I want to note I won’t be going deep on how to use the Display Planner for the sake of time, and because there are a number of resources covering the topic. I highly suggest some background reading if you’re not familiar with it, or at least do some brief hands-on experimenting.
I’ll start by looking for options in Google’s Display Planner by entering keywords related to my website and the topics of interest to my audience. I’ll use the single word “gardening.” In the screenshot below, I’ve selected “individual targeting ideas” from the menu mid-page, and then “sites.” This allows me to see specific sites the system believes match well with my targeting parameters.
I'll then select a top result to see a variety of information tied to the site, including demographics and main topics. Notice that I could refine my search results further by utilizing the filters on the left side of the screen under “Campaign Targeting.” For now, I'm happy with my results and won’t bother adjusting these.
Step 2 – Setting up Screaming Frog
Next, I'll take the website URL and open it in Chrome.
Once on the site, I need to first confirm that there's a portion of the site where discussion is taking place. Typically, you’ll be looking for forums, message boards, comment sections on articles or blog posts, etc. Essentially, any place where users are interacting can work, depending on your goals.
In this case, I'm in luck. My first target has a “Gardening Questions” section that's essentially a message board.
A quick look at a few of the thread names shows a variety of questions being asked and a good number of threads to work with. The specific parameters around this are up to you — just a simple judgment call.
Now for the fun part — time to fire up Screaming Frog!
I’ll utilize the “Custom Extraction” feature found here:
Configuration → Custom → Extraction
...within SF (you can find more details and broader use-case documentation for this feature here). Utilizing Custom Extraction will allow me to grab specific text (or other elements) off of a set of pages.
Configuring extraction parameters
I'll start by configuring the extraction parameters.
In this shot I've opened the custom extraction settings and have set the first extractor to XPath. I need multiple extractors set up, because multiple thread titles on the same URL need to be grabbed. You can simply cut and paste the code into the next extractors — but be sure to update the number sequence (outlined in orange) at the end to avoid grabbing the same information over and over.
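If you'd rather not paste and hand-edit each extractor, the incrementing index can be generated programmatically. A quick sketch (the base XPath below is a made-up placeholder; substitute the expression copied from your own page):

```python
# Generate the numbered XPath variants for multiple SF extractors.
# The base expression is hypothetical, not taken from a real site.
base_xpath = "/html/body/div[2]/ul/li[{}]/span"

extractors = [base_xpath.format(i) for i in range(1, 6)]
for xp in extractors:
    print(xp)
```

Each printed line can then be pasted into its own extractor slot, with the index already incremented for you.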
Notice as well, I've set the extraction type to “extract text.” This is typically the cleanest way to grab the information needed, although experimentation with the other options may be required if you’re having trouble getting the data you need.
Tip: As you work on this, you might find you need to grab different parts of the HTML than what you thought. This process of getting things dialed can take some trial-and-error (more on this below).
Grabbing XPath code
To grab the actual extraction code we need (visible in the middle box above):
- Use Chrome
- Navigate to a URL with the content you want to capture
- Right-click on the text you’d like to grab and select “inspect” or “inspect element”
Make sure you see the text you want highlighted in the code view, then right-click it and select Copy → Copy XPath (you can use other options, but I recommend reviewing the SF documentation mentioned above first).
It’s worth noting that many times, when you're trying to grab the XPath for the text you want, you’ll actually need to select the HTML element one level above the text selected in the front-end view of the website (step three above).
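Before running a crawl, you can sanity-check an extraction expression against a snippet of the page's source. Here's a minimal sketch using Python's standard library; the markup and class name are invented for illustration (real forum markup will differ), and note that ElementTree supports only a limited subset of XPath:

```python
import xml.etree.ElementTree as ET

# A tiny, made-up stand-in for a fetched forum page.
sample = """<html><body>
  <span class="title">How do I identify this plant?</span>
  <span class="title">When should I plant tomato seeds?</span>
</body></html>"""

tree = ET.fromstring(sample)
# The same kind of expression you'd paste into an SF extractor.
titles = [el.text for el in tree.findall(".//span[@class='title']")]
print(titles)
```

If the list comes back empty, the expression needs adjusting, which is exactly the kind of trial-and-error described above.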
At this point, it’s not a bad idea to run a very brief test crawl to make sure the desired information is being pulled. To do this:
- Start the crawler on the URL of the page where the XPath information was copied from
- Stop the crawler after about 10–15 seconds and navigate to the “custom” tab of SF, set the filter to “extraction” (or something different if you adjusted naming in some way), and look for data in the extractor fields (scroll right). If this is done right, I’ll see the text I wanted to grab next to one of the first URLs crawled. Bingo.
Resolving extraction issues & controlling the crawl
Everything looks good in my example, on the surface. What you’ll likely notice, however, is that there are other URLs listed without extraction text. This can happen when the code is slightly different on certain pages, or SF moves on to other site sections. I have a few options to resolve this issue:
- Crawl other batches of pages separately walking through this same process, but with adjusted XPath code taken from one of the other URLs.
- Switch to using regex or another option besides XPath to help broaden parameters and potentially capture the information I'm after on other pages.
- Ignore the pages altogether and exclude them from the crawl.
In this situation, I'm going to exclude the pages I can’t pull information from based on my current settings and lock SF onto the content I want. This may be another point of experimentation, but it doesn’t take much experience to get a feel for the direction you’ll want to go if the problem arises.
In order to lock SF to URLs I would like data from, I’ll use the “include” and “exclude” options under the “configuration” menu item. I’ll start with include options.
Here, I can configure SF to only crawl specific URLs on the site using regex. In this case, what’s needed is fairly simple — I just want to include anything in the /questions/ subfolder, which is where I originally found the content I want to scrape. One parameter is all that’s required, and it happens to match the example given within SF ☺:
- https://www.site.com/questions/.*
The “excludes” are where things get slightly (but only slightly) trickier.
During the initial crawl, I took note of a number of URLs that SF was not extracting information from. In this instance, these pages are neatly tucked into various subfolders. This makes exclusion easy as long as I can find and appropriately define them.
In order to cut these folders out, I’ll add the following lines to the exclude filter:
- https://www.site.com/question/archive/.*
- https://www.site.com/question/show/.*
Upon further testing, I discovered I needed to exclude the following folders as well:
- https://www.site.com/question/genus/.*
- https://www.site.com/question/popular/.*
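Those include and exclude filters are ordinary regular expressions, so you can test them outside SF before starting a crawl. A small sketch mirroring the hypothetical patterns above (a real crawler applies these to every discovered URL):

```python
import re

# The same hypothetical include/exclude patterns shown above.
include = [r"https://www\.site\.com/questions/.*"]
exclude = [
    r"https://www\.site\.com/question/archive/.*",
    r"https://www\.site\.com/question/show/.*",
    r"https://www\.site\.com/question/genus/.*",
    r"https://www\.site\.com/question/popular/.*",
]

def should_crawl(url):
    """A URL is crawled when it matches an include pattern and no exclude pattern."""
    if not any(re.fullmatch(p, url) for p in include):
        return False
    return not any(re.fullmatch(p, url) for p in exclude)

print(should_crawl("https://www.site.com/questions/what-is-this-plant"))  # True
print(should_crawl("https://www.site.com/question/show/12345"))           # False
```

Testing a handful of known URLs this way catches typos in the patterns before they cost you a wasted crawl.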
It’s worth noting that you don’t HAVE to work through this part of configuring SF to get the data you want. If SF is let loose, it will crawl everything within the start folder, which would also include the data I want. The refinements above are far more efficient from a crawl perspective and also lessen the chance I'll be a pest to the site. It’s good to play nice.
Completed crawl & extraction example
Here’s how things look now that I've got the crawl dialed:
Now I'm 99.9% good to go! The last configuration step is to reduce crawl speed to avoid negatively impacting the website (or getting throttled). This can easily be done by going to Configuration → Speed and reducing the number of threads and the URI/s limit. I usually stick with something at or under 5 threads and 2 URI/s.
Step 3 – Ideas for analyzing data
After the end goal is reached (run time, URIs crawled, etc.), it’s time to stop the crawl and move on to data analysis. There are a number of helpful ways to start breaking apart the information grabbed, but for now I'll walk through one approach with a couple of variations.
Identifying popular words and phrases
My objective is to help generate content ideas and identify words and phrases that my target audience is using in a social setting. To do that, I’ll use a couple of simple tools to help me break apart my information:
The first two tools, tagcrowd.com and online-utility.org, perform text analysis; some of you may already be familiar with the basic word-cloud generating abilities of tagcrowd.com. Online-Utility won’t pump out pretty visuals, but it provides a helpful breakout of common 2- to 8-word phrases, as well as occurrence counts on individual words. There are many tools that perform these functions; find the ones you like best if these don’t work!
I’ll start with Tagcrowd.com.
Utilizing Tagcrowd for analysis
The first thing I need to do is export a .csv of the data scraped from SF and combine all the extractor data columns into one. I can then remove blank rows, and after that scrub my data a little. Typically, I remove things like:
- Punctuation
- Extra spaces (the Excel “trim” function often works well)
- Odd characters
Now that I've got a clean data set free of extra characters and odd spaces, I'll copy the column and paste it into a plain text editor to remove formatting. I often use the one online at editpad.org.
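The same scrub can be scripted if you'd rather skip the spreadsheet step. A rough sketch (the sample rows are invented stand-ins for a raw SF export column):

```python
import re

# Invented sample rows, standing in for a raw SF export column.
raw = [
    "  How do I identify this plant??  ",
    "Tomato seeds \u2013 when to plant?",
    "",
    "Best soil for roses!!!",
]

def scrub(text):
    text = re.sub(r"[^\w\s]", " ", text)      # strip punctuation and odd characters
    return re.sub(r"\s+", " ", text).strip()  # collapse extra spaces (like Excel's TRIM)

clean = [scrub(row) for row in raw if row.strip()]  # also drops blank rows
print(clean)
```

The output is a tidy list, ready to paste into whichever analysis tool you prefer.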
That leaves me with this:
In Editpad, you can easily copy your clean data and paste it into the entry box on Tagcrowd. Once you’ve done that, hit visualize and you’re there.
Tagcrowd.com
There are a few settings in Tagcrowd that can be edited below the entry box, such as minimum word occurrence, similar word grouping, etc. I typically use a minimum word occurrence of 2 (as I've done for this example) so that there's some level of frequency while cutting out clutter. You may set a higher threshold depending on how many words you want to look at.
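Tagcrowd's core counting step is easy to reproduce in code if you want exact numbers rather than a cloud. A sketch with invented titles, using the same minimum occurrence threshold of 2:

```python
from collections import Counter

# Invented, already-scrubbed thread titles.
titles = [
    "how do i identify this plant",
    "please help identify this flower",
    "when to plant flower seeds",
    "identify this tree please",
]

word_counts = Counter(word for title in titles for word in title.split())
min_occurrence = 2  # same threshold as the Tagcrowd example
frequent = {w: n for w, n in word_counts.items() if n >= min_occurrence}
print(frequent)
```

Even on this tiny sample, "identify" rises to the top, which is the same kind of signal the word cloud surfaces visually.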
For my example, I've highlighted a few items in the cloud that are somewhat informational.
Clearly, there’s a fair amount of discussion around “flowers,” “seeds,” and the words “identify” and “ID.” While I have no doubt my gardening sample site is already discussing most of these major topics such as flowers, seeds, and trees, perhaps they haven’t realized how common questions are around identification. This one item could lead to a world of new content ideas.
In my example, I didn’t crawl my sample site very deeply and thus my data was fairly limited. Deeper crawling will yield more interesting results, and you’ve likely realized already how in this example, crawling during various seasons could highlight topics and issues that are currently important to gardeners.
It’s also interesting that the word “please” shows up. Many would probably ignore this, but to me, it’s likely a subtle signal about the communication style of the target market I'm dealing with. This is polite and friendly language that I'm willing to bet would not show up on message boards and forums in many other verticals ☺. Often, the greatest insights besides understanding popular topics from this type of study are related to a better understanding of communication style, phrasing, and more that your audience uses. All of this information can help you craft your strategy for connection, content, and outreach.
Utilizing Online-Utility.org for analysis
Since I've already scrubbed and prepared my data for Tagcrowd, I can paste it into the Online-Utility entry box and hit “process text.”
After doing this, I ended up with the following output:
There’s more information available, but for the sake of space, I've grabbed only a couple of shots to give you an idea of what you’ll see.
Notice in the first image that the phrases “identify this plant” and “what is this” both show up multiple times in the content I grabbed, further supporting the likelihood that content developed around plant identification is a good idea and something in demand.
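Online-Utility's phrase breakout is essentially an n-gram count, which is also simple to script. A sketch with invented titles, counting repeated 3-word phrases:

```python
from collections import Counter

# Invented, already-scrubbed thread titles.
titles = [
    "can you identify this plant for me",
    "please identify this plant",
    "what is this flower",
    "what is this vine called",
]

def ngrams(text, n):
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

phrase_counts = Counter(p for t in titles for p in ngrams(t, 3))
repeated = {p: c for p, c in phrase_counts.items() if c > 1}
print(repeated)
```

Varying n from 2 up to 8 mirrors the full range of phrase lengths Online-Utility reports.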
Utilizing Excel for analysis
Let’s take a quick look at one other method for analyzing my data.
One of the simplest ways to digest the information is in Excel. After scrubbing the data and combining it into one column, a simple A→Z sort puts the information in a format that helps bring patterns to light.
Here, I can see a list of specific questions ripe for content development! This type of information, combined with data from tools such as keywordtool.io, can help identify and capture long-tail search traffic and topics of interest that would otherwise be hidden.
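That A-to-Z sort takes one line outside of Excel, too; sorting groups questions that open the same way, which makes the patterns jump out. A quick sketch with invented rows:

```python
# Invented question titles, standing in for the combined spreadsheet column.
titles = [
    "what is this plant",
    "how do i prune roses",
    "what is this flower",
    "how do i start seeds indoors",
]

for title in sorted(titles):  # the A-to-Z sort from the Excel step
    print(title)
```

The "how do i..." and "what is this..." questions now sit together, each cluster a candidate content theme.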
Tip: Extracting information this way sets you up for very simple promotion opportunities. If you build great content that answers one of these questions, go share it back at the site you crawled! There’s nothing spammy about providing a good answer with a link to more information if the content you’ve developed is truly an asset.
It’s also worth noting that since this site was discovered through the Display Planner, I already have demographic information on the folks who are likely posting these questions. I could also do more research on who is interested in this brand (and likely posting this type of content) utilizing the powerful ad tools at Facebook.
This information allows me to quickly connect demographics with content ideas and keywords.
While intent has proven to be very powerful and will sometimes outweigh misaligned messaging, it’s always great to know as much as possible about who you're talking to and to be able to cater your messaging to them.
Wrapping it up
This is just the beginning and it’s important to understand that.
The real power of this process lies in its use of simple, affordable tools to gain information efficiently, making it accessible to many on your team and an easy sell to those who hold the purse strings, no matter your organization's size. The process is affordable for small and mid-size businesses, and at the enterprise level it's far less likely to leave you waiting on approval for larger purchases.
What information is gathered and how it is analyzed can vary wildly, even within my stated objective of generating content ideas. All of it can be right. The variations on this method are numerous and allow for creative problem solvers and thinkers to easily gather data that can bring them great insight into their audiences’ wants, needs, psychographics, demographics, and more.
Be creative and happy crawling!
Thanks to all who read through the article! I have a few questions I'd love your input on:
1) What other methods do you use to gather audience insights at scale to help develop content? If you've utilized more than one, which have you found to be the most successful?
2) If you use Screaming Frog (or another crawler) regularly, what other uses have you found for its ability to scrape content for more creative tasks?
3) What recommendations do you have to improve the process outlined above? How could it be modified to accomplish related tasks?
Thanks for the guide. This is a very interesting approach, and the results look great.
I have 2 methods that I use primarily:
Finding competitors' advertised content. It boils down to finding pages that are advertised in Google AdWords and contain the word "blog" in the URL. There's always some great stuff there, and not always the most popular.
And looking up search suggestions for a keyword. We have a free tool for this in Serpstat; you basically enter a keyword and it gives you a list of autocomplete suggestions in the form of questions. For example, for the word "gardenia," it returns: how to make gardenia scented oil, how to care for gardenia bonsai plants, how to make a gardenia boutonniere, and so on.
Igor,
Thanks for the comment and idea. I know many who like to use various aspects of advertising (especially within the AdWords system) to help with competitive analysis. Spotting advertised content that way is a good tip. Also, I'll have to check out Serpstat. I typically use keywordtool.io for question generation (or now the Moz Keyword Explorer tool), but it's always good to have another option!
Thanks for reading and for the comment!
1) We have found, through a lot of hit-and-miss trials, that the best method to gather audience insights is via user-generated content. It's available in abundance in the comment sections of social media platforms and blogs. If you read it carefully, you'll be able to see different perspectives and hidden insights that might have remained unexplored even by competitors.
2) We use Screaming Frog and other crawlers regularly to see the real-time success of our pages and posts (pageviews, search data) by integrating Google Analytics data. More information is available at this link: https://www.distilled.net/resources/4-things-you-can-do-with-screaming-frog-that-you-couldnt-do-a-year-ago/
3) For the process outlined above, the first step of understanding your audience is where you have to devote time and effort to reveal the pain points and gaps. This is where the real opportunity exists.
Thanks,
Vijay
hi,
Can you do a Whiteboard Friday on this topic (I mean a video session)? It is quite interesting...
Brilliant! I've never used Screaming Frog in that way.
It's exactly what I needed for a new project I'm working on.
Thanks for sharing.
Glad it might help you out!
I've been using Screaming Frog for some time now, but never figured out anything close to this analysis. Quite inspiring. I won't just apply it to some of my own projects; I'm also starting to come up with some ideas of my own.
Really, thanks!
Hey Todd!
Really interesting approach to this, love the intersection of tech & content here.
I walked through your process and had some trouble getting relevant/valuable sites in the Display Planner - I was looking into "PC Games" and got a lot of PC building sites, so I had to exclude that topic from my campaign, and saw a lot of "Anonymous" sites. I finally found a site worth looking at and am now having some issues getting the XPath to work properly. It'll grab 1 or 2 of the titles but then won't grab the rest. I tried grabbing the <span class="title" rs_id"###"> == $0, which gives me the 2 extractions, but I'm not sure where I'm going wrong in not getting the rest...
As always, it's a pleasure picking your brain :)
-Kyla
P.S. Say hi to JM for me!
Hey Kyla,
So fun to see you here! I can't really figure out how to help from the description, but if you DM me on twitter I can pass you my email and we can go from there. Or just submit the contact form at newbrewmedia.com. I'll do my best to help if I can!
Nice post. I'm not doing what you're suggesting but you introduced me to the SF "extraction" feature and that is exactly what I needed to know for a different project I'm working on. Thanks!
Awesome--glad to hear it and happy extracting!
Brilliant, I never used the custom extraction feature in SF. Thanks!
You bet!
Hey Todd, I'm pretty green when it comes to utilizing Screaming Frog for content analysis and strategy (no pun intended). The few times I have used it were just to double-check that my sites had the appropriate title tags, H1s, subheadings, etc. It was much quicker than opening up 200 tabs and going through them one by one like I had been doing prior. So I'm already a fan.
But these insights that you've given are definitely a game changer. Brainstorming new content ideas was always a loosely defined process for me, which involved writing down some ideas off the top of my head and using Google Trends plus the Keyword Planner for volume data. This, however, gets to the heart of user intent, which is awesome!
One question though, is your method possible with the free version of the software as well? Thank you!
Hey Kenneth,
Thanks for taking a look and I'm really glad the process seems valuable to you. It's helped us at times come up with ideas when we were jammed up and unsure of where we wanted to start with content.
The process could work with the free version of the tool. The only difference that I'm aware of is that you're capped on the number of URLs you can crawl. So, the data you can pull will be much lower, but you could still get a start and explore how the process works.
I will say the premium version is well worth it. I've always found the SF team to be very helpful, the tool is powerful and continually worked on, and for about $150/usd, it's pretty affordable compared to a lot of SEO tools, or tools designed to help you generate content ideas.
That said, I know what tight budgets can be like, so hopefully you can get what you need!
Thanks again for the input.
I find Screaming Frog slightly awkward to use, but having followed the instructions in this article I've made some good progress. Another Screaming Frog article in the near future would be appreciated...
Good feedback thanks Steve. It has a bit of a learning curve, but once you get into it, it's not so bad. This post from Seer is a little outdated now, but much of the information is still good -- https://www.seerinteractive.com/blog/screaming-frog-guide/
Wow, I love it...
The truth is that I'm used to only using Screaming Frog to analyze my websites; I never thought it could be so useful for generating content.
Thank you very much for the post!!
You bet Luis! Hope it gets the creative juices flowing.
Excellent article, thanks for sharing. An interesting tool, I must say, and I didn't know about this one. Personally, I have been basing my content on researching blog and forum comments to establish a certain need within a specific niche. I couldn't agree with you more: there is nothing as in-demand as relevant, interesting, and actually useful content. Way too much blabbering and spamming going on these days...
A good, actionable piece. In my opinion, if you are an SEO, there is no reason to not be using Screaming Frog on a daily basis. It is that good!
Hey Todd,
That is a really nice tip, but I must admit it does take time getting used to Screaming Frog. We use it to clean up content on our sites, but this method of researching new keyword ideas is quite new to me.
However, does this only work on sites with a lot of user-generated content, i.e., forums with heavy usage? In that case, in niche (non-tech) industries where forum discussion is limited, this approach might have its limitations.
Insightful post, and I learned something new today!
I think that even in niche industries you may get quite good results. Every niche has its own language, with words and expressions common among its participants, and this method helps us identify and focus on them. It's a good thing to know the language of your target niche.
Hi Todd,
Great post! I have used Screaming Frog for years but only for sorting 301 redirects on a redesign etc.
Thanks for sharing :)
Laurence
Thanks for sharing.
Agreed - thanks for the interesting use case for SF.
I love this tool for the possibilities it offers for on-page SEO. And now that my eyes have been opened to the new possibilities it offers, even more so. I think Screaming Frog is a staple of any SEO strategy, and it's free! A very, very big fan of Screaming Frog.
Thanks for this post, Todd. Coming up with content ideas can be really daunting at times, so this will come in handy!
Hi Todd - Extremely thorough and valuable post. I've spent some time kicking the tires on this method and Screaming Frog's custom extraction configurations and can see where this could definitely save some time and lead to some intriguing quick wins generating content marketing concepts.
I'm working my way through XPath vs. CSS Path vs. regex extraction methods and can see where there could definitely be some application-specific tweaks depending on the site crawled, especially ones that utilize a dynamic UID for titles/questions in a forum. Any experience with this, and can XPath be used in a scenario like that?
Thanks again for the great post!
I never tried Screaming Frog. After reading this, I think it is really worth using.
Wow, I never actually used Screaming Frog for content purposes!
Gonna have to check it out in the future.
Thanks for the post!
You're welcome--glad it got you thinking a different direction.
I'm most excited about using the process outlined here to scrape data more efficiently from competitors' sites. I have only used import.io in the past, which usually requires a list of URLs to be effective in scraping the section of content that you want.
This can be used for competitive content analysis as well to help make the case for increasing content creation.
Joseph,
Glad you grabbed something useful from the post. I totally agree with you on the power of using this feature for competitive analysis and a variety of other tasks. It is by far one of my favorite features they've implemented, as there have been so many times I wanted to pull something at scale from a client's site or a competitor's site for a variety of reasons.
Thanks for taking a look and commenting. If you care to share any more details on how you might specifically use it for competitive work I'd love to hear it!
I've been using Screaming Frog for a year now to analyze several on-page SEO factors like titles, descriptions, and image alt text, but I never thought of this. It's great news for me that I can now generate content ideas through Screaming Frog, so thank you so much, Todd.
You're welcome--hopefully it leads to more creative usage on your part and better audience understanding. Thanks for taking a look!
Hi Veeren,
That's up to the Moz folks, although at some point I could do a video walk through of the process. I'll think about that. Thanks for taking a look!
Hi Todd,
Thank you so much for opening my mind and showing me new ways to use Screaming Frog; I had only used it to analyze basic features.
Regards from Spain.
Javi
I totally agree with you, and I really liked your post. There is much to explore here.
Thanks Todd,
I cannot use SF now as it's paid, but in the near future I would love to use it to extract content ideas as you suggested.
Regards
Pulkit Thakur
Quite a technical article, isn't it?
I don't think it's that technical. Yes, it touches on some technical-ish concepts, but as part of a very step-by-step guide whose ins and outs you don't really need to fully understand. Just by following the procedure, you'll get nice results. I agree that technical terms may be a bit scary at first but, at least in this case, it's not as bad as it's made out to be. Give it a chance! ;)
Woah this is nasty, gonna play with this for sure!
Enjoy, just play nice :)
Though I haven't used Screaming Frog, since I have read your blog I guess it must be worth it. (y)
Will use it!!