Rand recently asked you all for feedback about improving the blog. The two areas that you asked us to write about more were linkbuilding and tools. In a shameless populist move, I thought I'd write a post about tools for automating (bits of) linkbuilding.
Recently, I keep coming across ways that we are actually living in the future. I don't mean the jet-pack-wearing-holidaying-in-space-hover-car future, more the holy cow, you can actually run select * from internet where... kind of future.
Yes, I know it's not as cool.
I believe that technical skills are important in SEO. There are plenty of non-technical roles in SEO agencies or teams (especially on the creative side of things) but if you have ambitions to lead teams, set strategy and run SEO projects, you kinda need to understand how the internet works under the covers. For me, that means knowing how to build stuff - even though I'm not a developer and should never be let near production code, I like to understand the concepts and principles. To keep on top of things, this means occasionally getting my hands dirty and building stuff. It's fun. I can highly recommend it.
In order to bring you something useful and actionable, I decided to pick something simple that my team wanted, something to help with linkbuilding, but also something I could put together relatively quickly. I chose to build a prototype tool for monitoring the web for mentions of a website that don't link to that website. Hopefully it's pretty clear how this could be helpful - but just to give one example - if you are running a PR campaign, you may well get coverage that doesn't link to you, but if you just drop the journalist a line straight after publishing, they can often get a link included. For those of you who think better in pictures, here is a diagram of what I mean, with my limited drawing skillz:
I recently wrote about some moderately technical tools (e.g. Mozenda, Smartsheet) in my post on data visualization techniques. The tools I'm going to cover today are even more technical and advanced - but they are also infinitely more flexible. I don't want to scare you into thinking this is something you can't do yourself though. I learnt all the techniques and tools below and finished my mini-project from scratch in 2 hours. In fact, it'll take me longer to write the post than it did to do everything in it. If I can do it, so can you.
At 1pm UK time on Thursday 15th April, I tweeted this:
Why those particular tools? Well:
- xpath allows you to navigate and select elements and attributes from an XML document (including HTML). This gives you a really simple way of pulling information out of HTML pages
  - How this helps my mini-project: I get a straightforward way of pulling all links out of a page in order to check whether the page in question links to you
- YQL (Yahoo! Query Language) is the select * from internet where ... magic I referred to earlier. It provides an API that you can use to grab pages, RSS feeds and a whole load of other cool stuff
  - How this helps my mini-project: with one line of code, I can grab RSS feeds of mentions (for my proof-of-concept, I used a Google Alerts RSS feed) as well as grabbing the pages referenced in order to run xpath on them (did I mention that YQL supports xpath?)
- Google App Engine allows you to deploy web applications without worrying about most of the usual environment, server and configuration issues. It is also a way of dropping buzzwords into your conversation by deploying your newly scalable application to the cloud. FTW
  - How this helps my mini-project: I didn't want to assume any prerequisites like having servers at your disposal, but I also didn't have time to set anything up from scratch. App Engine is free for small-scale use, and I went from not having an account to deploying my code in under 2 hours
- Python is one of the two programming languages supported by Google App Engine, the other being Java. I know a tiny bit of Java from years ago, whereas I didn't even know Python gives indentation semantic meaning before I started my project
  - How this helps my mini-project: I needed some kind of programming language to enable me to build loops, display the output etc. and I needed to pick one that (a) I didn't already know and (b) could be used with App Engine
Getting Going
Before I start, let me warn you to read the disclaimer at the end of this post: I built a prototype / proof of concept here and you should definitely not rely on my code. Use at your own risk!
My 'specification' for the project was:
- Grab mentions from a Google Alerts RSS feed (I chose to hardcode 'SEOmoz' [RSS link] into my proof-of-concept)
- For each mention, see if there is a link to any page on https://moz.com
- Output a list of mentions that don't link
Pretty simple, right?
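To make that specification concrete, here is a minimal sketch of the last two steps using only Python's standard library. It is not the prototype's code and it does not use YQL: the class and function names (LinkCollector, links_to, unlinked_mentions) and the sample pages are made up for illustration, and the pages are supplied as strings rather than fetched from an alerts feed.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to(html, domain):
    """Return True if any link in the HTML points at the domain or a subdomain."""
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        host = urlparse(href).netloc.lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def unlinked_mentions(pages, domain):
    """Given {url: html} for pages that mention you, list the ones with no link."""
    return [url for url, html in pages.items() if not links_to(html, domain)]


# Made-up stand-ins for the pages an alerts feed might point at.
SAMPLE_PAGES = {
    "https://example.com/review": '<p>We like <a href="https://moz.com/blog">SEOmoz</a>.</p>',
    "https://example.org/news": "<p>SEOmoz released a new tool today.</p>",
    "https://example.net/roundup": '<p>SEOmoz and <a href="https://example.com/">others</a>.</p>',
}

result = unlinked_mentions(SAMPLE_PAGES, "moz.com")
print(result)
```

The sketch compares hostnames rather than URL prefixes, so a link to www.moz.com counts as a link too. That is a design choice of this example, not something the specification above requires.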
With the clock ticking, I started by downloading the install files for App Engine and Python while reading up on YQL.
The Python download was taking a while, so I spent the first half hour building the YQL queries I needed on the console.
To grab the Google Alerts, I used:
select * from feed where url='https://www.google.com/alerts/feeds/02091889458087148316/10137124638087203861'
and for each page in that list, I could grab the list of links using:
select * from html where url='<target URL>' and xpath="//a[starts-with(@href,'https://moz.com')]"
The xpath there probably needs a bit of explaining - I built it using a combination of the basic xpath documentation linked above and the ever-awesome stackoverflow. You can consider it in three sections:
- //a means select all 'a' (anchor) elements (i.e. links)
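- //a[@href] means select all links that have an href attribute (the part in square brackets is a filter on the 'a' elements, so you still get the elements back, not just the attributes)
- //a[starts-with(@href,'https://moz.com')] means select all links whose href attribute starts with https://moz.com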
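To see what each stage of that expression matches, here is a small sketch using Python's built-in xml.etree.ElementTree. Two assumptions to flag: the sample markup is made up, and ElementTree only supports a subset of xpath that does not include starts-with(), so the final prefix check is done in plain Python instead. It also needs well-formed markup, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup with three anchors.
SAMPLE = """<html><body>
<p><a href="https://moz.com/blog">Moz blog</a></p>
<p><a href="https://example.com/">Elsewhere</a></p>
<p><a name="top">Anchor without an href</a></p>
</body></html>"""

root = ET.fromstring(SAMPLE)

# Stage 1: every 'a' element anywhere in the document.
all_anchors = root.findall(".//a")

# Stage 2: only the 'a' elements that carry an href attribute.
anchors_with_href = root.findall(".//a[@href]")

# Stage 3: starts-with() is not available in ElementTree's xpath subset,
# so the prefix test is applied to each href in Python.
moz_links = [
    a.get("href")
    for a in anchors_with_href
    if a.get("href").startswith("https://moz.com")
]

print(len(all_anchors), len(anchors_with_href), moz_links)
```

Note that each stage returns elements; the href values are read off those elements afterwards.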
By the time I'd cracked that, my downloads had finished and I set about getting my environment ready using the App Engine quick start guide.
I also had a lucky break at about this point. I discovered that there is a YQL library for Python. Holy awesome batman! I figured it was going to be pretty easy to build something in Python to query the YQL API, but I didn't realise it was going to be as easy as yql.Public().execute(query). Sweet!
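As a rough sketch of how the two query strings could be assembled in Python before being handed to yql.Public().execute(query), something like the following would do. The helper names (feed_query, links_query) and the example URLs are hypothetical, and the sketch only builds strings: it does not import the YQL library or touch the network, and it does no escaping of quotes inside the URLs.

```python
def feed_query(feed_url):
    """Build the YQL statement that reads every item from a feed."""
    return "select * from feed where url='{0}'".format(feed_url)


def links_query(page_url, link_prefix):
    """Build the YQL statement that selects links on a page starting with a prefix."""
    xpath = "//a[starts-with(@href,'{0}')]".format(link_prefix)
    return "select * from html where url='{0}' and xpath=\"{1}\"".format(page_url, xpath)


# Hypothetical inputs, purely for illustration.
alerts = feed_query("https://example.com/alerts/feed")
links = links_query("https://example.com/post", "https://moz.com")

print(alerts)
print(links)

# Each string would then be passed to the call mentioned above, e.g.
# yql.Public().execute(links), which is left out here on purpose.
```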
It took me a while to work out how to import third party libraries into my App Engine environment (turns out you just grab the source code and include the folder in your application's root folder). My time was running out by this point. I was about halfway through my two hours and I hadn't yet written a single line of code.
Writing Python Code
I'm not the right person to teach you how to write Python code. Especially because about 10 minutes before the end of my challenge, I realised I didn't know how to create an if statement. My approach to learning Python is not to be recommended; but there are loads of great tutorials out there. I really wish I could step through and explain my code line-by-line, but honestly? I'd probably just expose my horrific lack of knowledge.
[Want a link from SEOmoz? Understand Python? Write up an explanation for beginners, drop me a line and I'll link to it here. For bonus points, you could show how to improve my code.] In the meantime, working through the code has to be (as my university lecturers used to say) left as an exercise for the interested reader. Update: Peter Coles has kindly taken the time to go through my code (improving and) explaining things - if you're interested, I suggest you read his explanation of my Python code. Thanks Peter!
All you really need to know is that in 33 lines of code (at the time of writing), I built my basic prototype. You can see the resulting code over at Google Code.
The Outcome
With time running out, I clicked 'deploy' and.....
.... huh. That was easy.
OK, so I get time-outs / server errors from time to time and it's really only a proof-of-concept at the moment (see below) but I still think it justified my tweet exactly two hours after the previous one:
Huge Caveats
My prototype is essentially just a proof-of-concept. Among loads of other things, note that it doesn't have:
- Any error-handling
- Much testing
- Any documentation (including comments)
And that it does have:
- Hardcoded variables
- Massive limitations even given the hardcoding (only grabbing 10 results, for example)
- No way of automating it or doing anything other than running it manually (though App Engine does provide simple ways of extending into this)
In its current form, it's not really useful for anything, but hopefully it will become interesting soon. If you want to build anything off it (or the ideas contained in it), I'd love to hear about it (but please bear in mind that it really is the definition of non-production-ready code, so if you do use it, you do so entirely at your own risk!).
I still think that it has been a useful learning exercise for me and I hope it presents you with some food for thought (unfortunately not real food like Rand's recent post).
Please share your ideas
I'd love to hear your thoughts for similar small tools that help us all do our jobs better. I'd also love to see someone take this and turn it into a more fully-functional tool (if you do that, let me know and you'll likely get a link from here!).
"I learnt all the techniques and tools"
Right now I am coding in Monty Python. I'll try regular Python once I have solved for "Meaning of Life"
King Arthur: Look, you stupid B-----d. You've got no arms left.
Black Knight: Yes I have.
King Arthur: *Look*!
Black Knight: It's just a flesh wound.
I'll bite your bloody legs off!
watch out for the neck piercing rabbit!
ECKEH ECKEH ECKEH PATUNG ZOOBOING
Since I read your comment I am whistling this song.
Great, now you got me doing it!
Priceless clip tho. Thanks (posted to my FB page)
I am still trying to figure out the speed of the unladen swallow, so this post went right over my head. I think I will stick to Google Alerts and Social Mention tools for the time being.
I've found that it's easier to understand SEOmoz articles if you head into the nearest wood and chop down the largest tree you can find with a herring!
NIH
I completely agree that getting hands dirty with code or tools or any range of technical tasks makes you better at your job.
It reminds me a lot of studying English whilst simultaneously learning Latin and studying linguistics. 60% of English (or thereabouts) is derived from Latin; linguistics investigates why our words work the way they do. I found that both made me a better writer.
Not everyone will enjoy or learn from building a tool to automate or help with link building. However, if building something like this isn't up your street, there are likely technical tasks that you'll enjoy and which will help you learn and understand. Everything remotely technical that I've embarked on in the past few years has made me better at understanding SEO, and I don't do as much of it as I should.
Have a few sites, a few tools or a few old PCs to play with. Break them. Set limits on tasks, like Will's two hour time frame. Learn stuff, even if it's via online tutorials. You'll teach yourself as much as you learn from reading. Wreck an entire website by driving its .htaccess file into a brick wall. You'll be better at .htaccess from that moment on :)
Really good points Jane and you've gone and made me ashamed of myself and my "avoidance" of things code like.
So with a brand new attitude, I'll march on, confident that I am on my way to becoming a "reformed Bumble"
PS - It would almost be worth taking Latin if it would help me be a better writer.
Themz sum prity sik drawin skillz blud.
Even though your posts always tend to blow my mind they're usually spectacularly fun and easy to read. That's not to say that I can understand what the hell is going on most of the time!
I think one thing that I need to do one day is sit down and learn some code that will aid me. Your explanation of YQL seems very easy and the select * from internet seems ridiculously easy.
All of this is definitely in my to-do list for the coming months.
Why does this deserve a thumbs down? Is it because of my gangstaaa talk ? If so, then I only did that because of Will's spelling of 'skills' in the fourth paragraph.
Time for some more gangster:
Who dem hataz?
I thumbed you up just because I saw the thumbs down...!
Great tutorial Will - very inspiring!
I had a similar experiment over a weekend with PHP and came up with this keyword research tool.
What I found amazing was just how many libraries and open source there was out there to make seemingly complex tools relatively easy to build.
Once you have the initial idea of what the tool should do there are plenty of building blocks to get you started.
I'm certainly looking forward to digging deeper in to the tools you've discussed - more posts like this please!
Imagine how many links each day a site with even a few mentions in the press/blogosphere likely loses due to "non-linking" behavior?
Will - this is just awesome. I'll bet tons of us could simply run this script daily/weekly, get a friendly intern to contact those sites that forget and end up with 20-30% extra link growth over the competition. Brilliant post, and I love that you gave away the code/tool for free
Thumbs Up :-)
Thumbs up for the comics.
SEOmoz needs more comics... :)
Wouldn't it be great if this was one of the tools that SEOMoz provided its users with!
I think some of the coders over there should get to work and tidy it up, give it a lovely interface and most of all, make it free to Non-Pro's to use! :)
It would make a super addition to the current tool set you offer.
I have not rated this post. No thumbs up or down, simply because I didn't get it (except asking the journalist to link out). Maybe web developers will be in a better position to understand it.
Yes Himansu, you're right: this is very complex for me to understand.
Implementation details apart, I understand the basic non-technical idea of finding and persuading sites that mention you without linking. For example, if I Google my full name, there are thousands of results (yeah, pretty much all about yours truly, no namesakes), but most of them don't link to any of my sites. If this were a concern, I would contrive to get them to link to me, and I am sure the SEO impact would be good. I am not interested in SEO'ing my person right now, but I get the SEO point very clearly.
For which thanks, Will!
Great tool Will! This can come in really handy :)
Will - excellent job. Building a web based application is a very daunting task. You are to be commended on this demo.
Regardless of how useful this tool is or is not, I think what you've done here is illustrate the potential of learning coding and building a tool. It is without question there are many useful SEO tools that have yet to be created. Rather than sitting and waiting for someone to build you said tools, why not build it yourself? Heck, if it is good enough, maybe you can even charge people to use it?
Need inspiration? Check out what David Heinemeier Hansson (creator of Ruby on Rails and Basecamp) has to say in his presentation, "The Secret to Making Money Online"...it's currently the third video listed on the page.
Will, great post! I might be biased, but as a developer, I firmly believe that the best SEO’s have strong technical skills. Conversely, maybe something like this could be an opportunity for a less technical SEO to reach out to a developer within his or her company.
YQL is pretty awesome, I like the set of tools you presented for this post. Often when looking at HTML I’ll use Beautiful soup (https://www.crummy.com/software/BeautifulSoup/), or I’ll get annoyed with it and just use regular expressions. Nonetheless, using YQL with xpath is great!
Were you hoping someone could just clean up your python example if necessary and add comments to explain what’s going on? I might be able to be of assistance there…
There's not much more than a link in it for you, but if you clean anything up / comment things and post it on your own site, I'll definitely drop you a link ;)
Thanks!
OK, I got a chance to put together something technical on this, without further ado:
https://mrcoles.com/blog/technical-look-seomoz-automated-link-building-tool/
Thank you Peter. I updated the main post with a link.
I have found that YQL & xpath together have problems with many pages that are not well-formed (that is, not properly written RSS/HTML/XHTML, etc.), so this tool will likely miss a decent number of links and report those pages as false positives for "no-link". Be sure to double-check the pages before locking and loading on the link request.
I tend to write my apps in Java for AppEngine and I like using a library called htmlcleaner (https://htmlcleaner.sourceforge.net/) to clean up the HTML before parsing it to extract the links or other content.
Thanks for the tip. I guess it makes sense that xpath relies on well-formatted XML...
Will,
Great post, thanks for the illustrated examples!
Would it be too much to ask (in the spirit of today's post) to get a link to www.smartsheet.com?
Thanks!
Heh. Of course. I'd like to say I did that just to prove the point, but actually I have no idea why I didn't link. Thanks for the pointer - I've fixed it.
“There is no great genius without some touch of madness.”
-Roman Philosopher
Thanks!
Sounds time consuming to me and quite technical. For the less confident coders you could try this quick way to get the data (not as good, but quick n easy). Set up a Google alert as below:
"Brand Name" -link:site.com
Is there any good way of getting only recent links with this kind of query?
Mind you (and thinking out loud here), you should really hook that up to an 80legs spider for full effect, then get a team of interns churning through the results. I think that's my weekend sorted ;-)
EDIT: Ignore this comment, just realised the fairly obvious issue with my suggestion....
Yep...run the query as mentioned above, then use the bookmarklet that Rand shared a while ago that filters results for last 24 hours or even 7 days if you're doing this weekly.
Also, I think you can build the -link:site.com into a Google alert query, right? I'm gonna try that, it seems much less technical.
The concept is great though.
Edit
I just tested this with Google Alerts and found about 50 mentions without links. One was an LA Times article...I reached out to the author and was able to get a commitment to add the link. All of this just happened in the last 10 minutes.
Hold on though - I don't think this query works does it? Running it for distilled returns this page from seogadget that does link to us...
I suspect that in the same way the google link: command is unreliable, so is -link:...?
It DOES work, but yer you get a few odd ones thrown in.
works just fine for me. I suspect the more broad your brand name is the crazier the results will be. When I run the query for "distilled" I get a bunch of irrelevant results, some about the benefits of distilled water, that obviously don't link to your site. So yes, the code example you presented does work better...I think some of us are just suggesting less technical ways of answering the same question without having to code or program. I think there is some value in that within the community here.
Sorry - I wasn't clear - the bit that isn't working for me is that some of the results *do* link to us despite the -link: in the search (same when I try it on seomoz or other brands).
I definitely appreciate the discussion though - keep it coming.
oh...sorry I misunderstood. I don't see it for our brand related terms. All the results seem to be what I'm looking for. Mentions without a link.
I'll send you my invoice later ;)
Oops, ignore. I'd recommended the same thing as the poster above me..
Perhaps filtering by Google's time metric?
Brilliant! I love YQL and generally fooling around with stuff like this. I'm no developer though and I do wish I had more time to learn.
Interesting stuff thanks!
Dear Will,
just for reminding us to check for mentions without links, a big thumbs up is due.
This is the classic kind of post I love and hate at the same time:
love, because it helps me better understand the "tech" behind SEO, and also because it's like a living voice of conscience telling me: you too have to get your hands dirty with code experiments
hate, because it's not so intuitive for a non-developer mind like mine, and it makes my brain "smoke" while creating synapses between concepts.
But the love part is greater, Will... I don't mind needing more time than usual to really understand a post like yours.
Anyway, isn't a tool like the one you created also (somehow) behind the "Google Domain Mention" of Trifecta?
Cheers for getting in there and giving it a go!
One of my goals this year is to read five text books on web development to be able to create tools and cool sites for whatever purposes I like. The plan is just to knock out ten pages a day, so after one year I'd have read 3,650 pages!
Anyone can read 10 pages a day, and especially considering how cool the light at the end of the tunnel is: programming badassery.
First up is The Essential Guide to CSS and HTML Web Design by Craig Grannell, which I'd give a 9/10 on quality and enjoyability.
If you ever need any CSS or XHTML advice let me know :-) I'm no longer selling my services but I'm going to be giving people advice. A good book to read is also CSS Mastery. They've just brought out another volume too.
HEY! What a change in your avatar!... finally I can give you a face :)
Haha, as it turns out I'm a 20 year old child! (was 19 in that photo)
Yay! I like faces :)
Haha, as do I :-).
I was getting a bit sick of just being a faceless person :D thought I'd go for the same avatar as my other stuff.
I'm 100% behind the point you're making here Will.
I'm not really a coder either but I've been developing my own mini tools in PHP and VBA recently. It's a good thing to do, and, if you want to find something out quickly, it's great to have the knowledge to do it rather than missing the opportunity because you need to schedule it in for IT.
Although there are many soft SEO skills these days like linkbait, social networking etc there is also a whole tier developing (specifically around links) making things ever more complex, and a good SEO should really be flexible for either skill type.
#1, this is awesome! As a technical SEO I'd love to get my hands dirty in this.
#2, you write as fast as you talk, I'd definitely pay you a quid to make more sense of this. :)
Isn't there a much simpler solution like bookmarking this link: https://www.google.com/search?tbo=p&tbs=qdr%3Ad&q=seomoz+-%22www.seomoz.org%22&meta=&aq=f&aqi=&aql=&oq=&gs_rfai=
That will find mentions of seomoz that don't mention www.seomoz.org - which is different to whether they link to seomoz.
However, my point is not really that *this* is a tool that you might want to build, but rather that the toolkit is there to build all kinds of things that might be useful.
Great tool to discover some absolute new link opportunities. thanks for sharing the code here for free.
Good one!
I love this solution, but the app doesn't appear to be working now. Is there another good way to find brand mentions that are sans backlink? (either an app, source, or googledoc wizardry would all work)
YQL and XPath are what power the anchor text that is returned in Yahoo Site Explorer and Google Webmaster Tools for SEO Site Tools; that and Linkscape data awesomeness
Brilliant Will!!
Of course, now you just reminded me of more reasons why I need to find more time to read and tinker.
impressive, next step will be automating your copywriting techniques into machine algorithms!
maybe it's one of the reasons why I dislike linkbuilding so much, the fact that it can be automated.
don't worry, this doesn't automate link building. It just automates a method to find low hanging fruit. The content is already out there but the author didn't give you a link! They're talking about you but not crediting the source. Don't know why anyone would be opposed to automating a process like this? One still has to go out and get the link manually.
Really good stuff!
I'd never heard of YQL before -- I'll give it a try asap.
Will-
Much thanks for this great article. Apparently the folks with mad coding skilz don't subscribe to SEOmoz in their feed reader.
I had assumed that by the time I scrolled through the comments someone would have perfected your prototype and we would have a cool new tool for the masses.
In the meantime, is there any reason why SEO-doctor's google alerts idea won't work?
Give it a try...it's free and will take u 2 mins to set up!
works great....for me.
Brilliant! I love it.
My Python skills sound similar to yours, so I'm not sure I should be the one coding, but what about adding a similar feature of getting the links your competitors have, but you don't?
The best thing about this tool, is it builds on already existing technology. No need to re-invent the wheel :-).
It sounds very simple to build, and very powerful.
Excellent contributions
Thanks
Really great post Will!
Never knew it was so easy to list the websites that are mentioning you, but not linking to you... In other applications this is a wonderful tool to show some very interesting insights! Have to think of a cool way to use it, but first... let's try some Python!
Thanks again for the great post!
Will
Great work; it shows that if you want to do something, it can be done.
thanks
Thanks Will, this is a definite "to come back-to" post... thank you for the time-warning and the great technique. I may have some more thoughts once I've had the time to properly try it out.
Looks like a big winner though-- glad the cries for more advanced posts are being heard :)
Great post! SEOMoz needs more dev posts IMO.
One thing you could do to make this a little better is set up multiple alerts in Google Reader, then use the feed from that so you can iterate through the aggregated feeds to search for different variations of your brand terms.
Also, I feel like PHP is a little more accessible to most people for something like this too. You can use SimpleXML and MooTools to create a page that pulls a few rss feeds and does an ajax refresh every few minutes or so. You could parse the SimpleXML result with a regex to see if the post links back to you. (Admittedly not my strongest subject :D)
Great post. I did not realize that there is a tool like YQL before.
My initial idea of how to use it was utilizing Yahoo SiteExplorer stats. However it gives only 50 results (or maybe I don't know how to get more) .
Another idea is to use YQL for measuring brand recognition on Twitter. See: https://developer.yahoo.com/yql/console/#h=select%20*%20from%20twitter.search%20where%20q%3D%27seomoz%27 . You could run this query every day and measure the frequency of new posts that appear on Twitter, especially after publishing interesting articles.
Comment out of topic:
Doesn't it seem as if a thumbs-down spammer is active here today?
PS: as this is a "notification comment", feel free to delete it if you think that's warranted.
Great tool. This is definitely an excellent way to pick up extra links that I would consider "low hanging fruit."
Keep up the good work!
Hey Will, my hat's off to you for venturing into that scary world of code. For myself, aside from some self-taught C++ a century ago, I'm essentially a code illiterate.
So I concur with theexo51. It would be great to have a tool like this in the labs section. For those of us too chicken to stick our toes into the coding waters.
One of the greatest posts I have ever seen. Thanks!
Thanks Will! The program you wrote is useful and the flexibility in using these tools is exciting. I have my homework for the week.
Thanks Mozzers for all the follow up comments!
Will, this is fab - a really useful tool :)
At last something worth reading. More posts about automation plz.