We can all agree that site prospecting is a large time sink in outreach link building. Who really wants to spend the time going through a bunch of pages on a site to figure out if the site is worthwhile? Further, if you’re following the “Throw Away Your Form Letters” principles, then you are looking for content on a blogger’s or webmaster’s site that is of interest in order to start a conversation – but that takes a lot of time too. It would be awesome to scale that process, wouldn’t it? Now, there’s an app for that.
I had the idea that if I spidered a site, matched the URLs with social metrics, and then used natural language processing to figure out the core concepts of every page, I could tell at a glance whether a site is worth my time and what content (if any) is popular. Note: I have purposely left Linkscape’s metrics out of this, as I don’t believe we should waste API calls on what may be many worthless pages. Instead, identify the worthwhile pages first and then head over to Open Site Explorer. Sound good? OK, let’s do this!
Natural Language Processing Explained
Natural Language Processing (NLP) is a set of techniques, many of them based on machine learning, in which an application algorithmically analyzes text to extract core concepts and, in effect, determine what a page is about. This type of distillation is the proxy between the written document and the program that allows a computer to “understand” content. As you can guess, this is something that Google strongly leverages, as can be seen in the “Systems and Methods for Inferring Concepts for Association with Content” patent from 2004.
There are a variety of awesome APIs that do natural language processing, but for this we will be using Textwise simply because it’s entirely free.
Of course Rand is a pro, so he titles his posts properly; the concepts “creative,” “typography” and “ux” therefore come as no surprise here. But with less savvy writers, or people who write more colorfully, you may not be able to tell what a page is about from the title alone. You also get a better sense of what keywords, concepts or topics a computer will associate with a given page.
The next example is a page from QN5 Music (full disclosure: I do music with these guys and they are incredible) where the title “Thank You for An Incredible Evening” is somewhat vague.
The post is a recap of their 2011 Megashow, but there’s no meta description, so you may not be able to tell what the page is about when prospecting from an Excel sheet generated by Screaming Frog. Now let’s couple that with Textwise concepts:
At a glance you can guess that the page is about some sort of incredible music performance and that there were puppets involved. You’d be wrong to think it was rock music, though; that’s just the gift and the curse of ambiguous words. In other words, after 5 seconds you are about 90% correct as to what the page is about without ever looking at it.
Your New Best Friend SiteSkout
SiteSkout is a brand new tool I wrote in PHP that spiders a site, retrieves social metrics, scrapes each page’s title and meta description, and pings Textwise for concepts and categories. It shows you all that awesome information as it happens and then exports it to a CSV file for download and Excel ninjitsu. (*dusts off shoulder*)
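To make the scraping step concrete, here is a minimal sketch in Python of pulling the title and meta description out of a page's HTML. SiteSkout itself is written in PHP and its source is not shown here, so this is only an illustration of the idea; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleMetaParser(HTMLParser):
    """Collects the <title> text and the meta description of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # A page without this tag simply keeps the empty default.
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape_title_and_description(html):
    """Return (title, meta description) for an HTML string (hypothetical helper)."""
    parser = TitleMetaParser()
    parser.feed(html)
    return parser.title.strip(), parser.description
```

Fetching the HTML, looking up social metrics and calling a concept-tagging service are deliberately left out, since those depend on APIs this post does not document.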
There are a few options that affect the speed at which this all happens. You can have it spider a site from a given URL just like you would with Screaming Frog or Xenu, but be warned: single-threaded spidering is slow. I would suggest you use Screaming Frog for your spidering and just dump the URLs into SiteSkout, or point it at the HTML or XML Sitemap. In either case SiteSkout only fetches the listed URLs for scraping instead of crawling through every link to discover the URL of every page on the site.
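As a rough illustration of why the sitemap route is faster, the sketch below reads every page URL straight out of an XML Sitemap in one pass, with no link-following. It assumes a standard sitemaps.org file; the function name is hypothetical and this is not SiteSkout's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> value found in a sitemap document.

    Note: for a sitemap index file the <loc> values are the URLs of
    child sitemaps rather than pages, so those would need another pass.
    """
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

The resulting list can then be handed to whatever does the per-page scraping.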
Bring Your Own API
So while my “Using Social Media to Get Ahead of Search Demand” post may have underperformed by my personal standards (only 49 thumbs up), I learned a valuable lesson – if you put a tool on the front page of SEOmoz, you had better account for a very high number of API calls.
So for SiteSkout I’m encouraging users to bring their own API keys. The tool is built on 4 keys, so it will run without one of your own, but to ensure stability, sign up for your own Textwise API key.
- Step 1: Register – Textwise has a very painless registration process; all you need is a name and email address.
- Step 2: Find your API key – Your API key is hidden away in your profile. Grab it and save it somewhere like a text file.
- Step 3: Plug it into SiteSkout – SiteSkout will cookie your Textwise API key for you so you don’t have to enter it every time you use the tool.
Applications
My motto is “all actionable everything” so let’s talk about how this data will help you do more effective link building.
Prospecting a Site
The obvious application is that it helps you prospect a site. If you mash this data up with a Screaming Frog export, you get a macroscopic view of what the site is about at a glance and a microscopic view of what each page is about, without ever visiting the site. Use VLOOKUP on the URLs to bring all the data together. I'd suggest using the heading tags, level, inlinks, outlinks, external outlinks and hash columns from Screaming Frog in concert with this.
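If you would rather script the VLOOKUP step, the same join on URL can be sketched in a few lines of Python. The column names here (`Address` for the crawl export, `URL` for the SiteSkout export) are assumptions for illustration, so adjust them to match the headers in your own files.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_url(crawl_rows, skout_rows, crawl_key="Address", skout_key="URL"):
    """Attach the second export's columns to each crawl row with the same URL.

    Trailing slashes are ignored when matching (a choice made for this
    sketch); crawl rows without a match are kept unchanged.
    """
    lookup = {row[skout_key].rstrip("/"): row for row in skout_rows}
    merged = []
    for row in crawl_rows:
        match = lookup.get(row[crawl_key].rstrip("/"), {})
        combined = dict(row)
        for column, value in match.items():
            if column != skout_key:
                combined[column] = value
        merged.append(combined)
    return merged
```

Each merged row then carries the crawl data and the concept and social columns side by side, which is the same result VLOOKUP gives you in Excel.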
If you use a SiteSkout export in concert with the SEER OSE-Twitter link building methodology (I love that method so much), you can quickly figure out who follows you but doesn’t link to you and which existing page on a given site you should ask for a link from.
Outreach Material
In my eyes the real power is that you now easily have something to talk to the webmaster or blogger about. You can determine the most popular content on the site at a glance, and the magic inherent in that is that social proof works both ways. That is to say, if something is popular, it makes sense that you would contact the writer about it. Your link target will be disarmed to a certain degree because they are very likely to have already received a lot of praise and correspondence via social media and email because of their popular content. In short, SiteSkout helps you take the cold calling aspect out of link building.
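One simple way to surface that popular content from an export is to add up the social counts per page and sort. The sketch below assumes each row is a dict with hypothetical `tweets`, `likes` and `plus_ones` columns; the real export's column names may differ.

```python
def most_popular(pages, metrics=("tweets", "likes", "plus_ones"), top=3):
    """Return the `top` pages with the highest combined social count.

    Missing or empty metric values are treated as zero.
    """
    def total(page):
        return sum(int(page.get(metric) or 0) for metric in metrics)

    return sorted(pages, key=total, reverse=True)[:top]
```

A plain sum treats every network equally; weighting one metric more heavily than another is a judgment call left to you.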
Like I always say, Context is King!
I’d love to hear your thoughts and success stories with the tool in the comments below! There are bound to be some bugs in this, so please just hit me on Twitter (@ipullrank) if anything goes wrong for you. I continue to update these tools with feedback from you until they are running super smoothly. Also, this tool is NOT an SEOmoz tool and any errors or failures are my fault, not that of the wonderful team of developers at the Mozplex, so if you run into any problems, ping me, not them.
About iPullRank —
Michael King is a software and web developer turned SEO turned full-fledged marketer since 2006. He is the founder and managing director of integrated digital marketing agency iPullRank, focusing on SEO, Marketing Automation, Solutions Architecture, Social Media, Content Strategy and Measurement. In a past life he was also an international touring rapper. Follow him on Twitter @ipullrank or on his blog, The Best Practice.
That's an awesome post. I have an alternative method which I think is worth sharing and can also help you scale link prospecting.
1) Head over to Google or Google Blog Search and type an advanced search query. Say I am looking for top food websites; I can type something like: food intitle:“top 10 resources”
2) Scrape the top 10 or top 20 organic search results using the SERPs Redux bookmarklet. Use Autopager to scrape data from multiple pages at once.
3) Copy-paste the URL list into Excel and then, with the help of the Excel plugin by Niels Bosma, scrape the title, meta description, meta keywords, H1, H2 and H3 tags and social metrics (like Facebook likes, Google Plus counts and Twitter counts). Use the xpathonurl function to scrape comments (which are the strongest signal of user engagement). Justin can teach how to scrape comments using XPath in this post.
4) Use the Excel macro spreadsheet from SEOgadget to fetch keywords from the text of one or more URLs. Then sort the keywords in decreasing order of relevance. You will get a pretty good idea of what a document is all about just through this action.
So here we go. We have all the metrics we need in one Excel spreadsheet to quickly analyse a document or set of documents for link prospecting. If you truly want to scale your SEO process, then I strongly suggest you become a data scraping expert and learn XPath or maybe Python (to use tools like Scrapy).
WOW
And he very modestly didn't include a link to a post of his own, which I just found.
https://seohimanshu.com/2011/10/07/data-scraping-guide-for-seo/
I love this.
Love it himanshu.
ALSO, regarding the SERPS redux bookmarklet, to get 100 results;
1. Go to search settings
2. Turn OFF Google Instant
3. Set to 100 results per page
Now the bookmarklet will give you 100 results at one time.
-Dan
Do you know that you have maybe created the Holy Grail tool for Curation Content?
It can be such a time saver for content prospecting! My compliments Mike.
Ah Mozzers! Please click the Rapper link in the Mike bio... So cool.
I have just seen the Rapper link of Mike and it is really awesome. Great Article also Mike and thanks for this update to all Mozzers Gianluca..! Thumbs up for sure..:)
Pretty cool stuff.. :) Mike did an awesome job. :)
Thanks Gianluca =]
Just suggested that the guys at Screaming Frog add it as features. Not sure they will but it's an exciting idea because it would be so much faster if the NLP was handled within the program.
quick idea you just gave me, mike
just a riff. curious to hear other thoughts. note to readers and moz team, this editor makes it so i cant capitalize or use most punctuation on my mobile phone. now im all selfconscious n stuff...
Awesome ideas. Dave Minchala for the win!
Hey Mike!
So I played around with the tool a little bit, it seems easy enough to understand, so very nice job and appreciate you sharing it!
PROS
CONS
Few questions too:
1. Where are you getting the social data from?
2. I couldn't find that window on the TextWise site where you can enter the URL - is it at https://textwise.com/demo ? If so, it's now different and/or not working. (In other words, where did you get this image - https://ipullrank.com/guest-blogs/seomoz/speedy-site-prospecting/rand-post-textwise-screenshot.jpg ?)
- Dan
Hey Dan,
I just typed out a long response to this and then the CMS ate it...let me try again...
First, thanks for reading as always, and thanks for the constructive criticism. It's very important and is helping me determine what matters to users. So I will def make that change to the UX and I'll look into pushing the file to automatically download, similar to how moz does it.
Can you send me the XML sitemap you were using? I haven't seen that error yet.
In answer to your questions:
Thanks again man! I'll ping you when I get the changes up.
-Mike
I tried this with my site and then used it with SEER Interactive's OSE method to find the Twitter users who follow us but don't have any link relationship with our site. That's a very cool approach.. loved it. And your tools are very efficient!
Glad you found it useful Ajay!
Awesome tool!! That one is really gonna help us get the best results.
Ok, this is now on my ever expanding to do list. I've printed it out, so that means business! Long live The King!!
Thanks for reading Pat. Do let me know how it goes!
Impressive, most impressive. And... yeah, you're probably a jedi.
Thanks for the tool, sir! I will probably use that API key for more than just your tasty little prospector. I appreciate the info!
Yeah Mitch! There's definitely a ton of applications especially because Textwise has a lot of great functions for feature extraction. Do let us know what you come up with!
I actually really enjoyed your last post, so I headed over and gave you a thumbs up there.
I've written similar code to achieve this, but far more involved. I really like this lite version and it's a great idea to include the API key, speedy! :)
Hey Christopher,
I'm very interested in what you made. Care to share?
awesome post! I have already tried once with my site and it shows not perfect but good results anyway. It's something really cool and it has a great potential in it.
thanks for sharing!
Hey Alessio,
I'm curious what your results were, were the issues with the concept tagging or just general programming issues?
Thanks!
Great tool you have created ipullrank, have tested it with a few different sites.
Also saw your rapping video on YouTube nice stuff =)
Much appreciated James!
Very interesting and practical, thanks!
Thanks, Have to check this out very soon.
I apologize for my rookie question. This article was obviously written in 2011 and much has changed since then. I don't understand how SiteSkout and Natural Language Processing replace what Moz's Open Site Explorer can do. I thought that with OSE you can easily look at the "Meta Info" and "Link Metrics" to help determine a link's worthiness. And even so, we can use the Moz API. Thank you for your time.
This tool looks like it will really help me in my niche market. Thanks.
Pretty awesome! Thanks for a great tool.
Perhaps they are all thumbed out after my post yesterday? :-P
In all seriousness, I'm not sure where everyone is - this is a great tool and I think it is going to be seriously useful for outreach - maybe some people just don't get it? Or like Gianluca says, perhaps everyone is on holiday!
Good work iPullRank!
Awesome post iPullRank and awesome tool! I tried to get results for one of my websites, but it didn't show accurate Diggs, tweets or Google+ counts.
I leverage the same APIs as Shared Count for this so you should be seeing accurate numbers but I will definitely run some more tests to make sure I'm not making a mistake on my end. Thanks for the heads up!
Nice to see the TextWise API incorporated in your excellent app!
Mary McKenna - CEO - TextWise
Hi,
Seems good for native English SEO experts, but for French ones... :(
Without fanboying too hard, all of your articles (especially the social media focused) have been innovative, detailed, and on-point. If content is king you are clearly succeeding, thanks Mike.
Thanks Alex, I really appreciate that.
I'm just trying to present ideas that I think will be useful and will encourage us all to spend less time copy & pasting and doing manual labor. So I'm glad you like it!
Thanks for a great post Mike! Now I'm going to really dig into this stuff because it sure beats the way I'm doing link prospecting now.
... silly me. I thought the SiteSkout tool would actually be up and running. Instead I find a page with nothing but text on it - no input fields, no drop downs, no buttons, nothing.
:(
It is up and running right here. Can you send me a screenshot of what you're seeing?