Don’t panic – it’s not what you think. Last fall, I did an analysis of 50 blog posts before and after Google+ to see what factors drove traffic. At the time, I really wanted to do more, but collecting the data posed multiple challenges. Rand suggested that 50 was OK, but 500 would be great. So, I set out to make it 1000, just to make the boss proud. Then I thought, “Why not 2000?!” Three months passed...
Long story short, I built a crawler and not only expanded the 50-post analysis to 2011 posts but also added a chunk of variables for good measure. This analysis covers the top 2011 SEOmoz posts of 2011, ranked by Unique Pageviews (UPVs). These posts could have been written at any time (some go back to 2005) – I’m just looking at which pages got traffic during 2011.
Let’s See Those Numbers
I could keep talking, or I could show you the numbers. The following graph shows Spearman correlations (r-values) for 13 variables with UPVs. Blue bars are social factors, green are community factors, and purple are content factors:
Most of the variables are self-explanatory, but a few might need some elaboration:
- Words (Post) is the word count of the post’s content
- Words (Title) is the word count of just the post’s title
- Headers is the count of all header tags (<h1>, <h2>, etc.)
- Bold Tags is the count of all <b> and <strong> tags
- Lists is the count of all <ol> and <ul> lists
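To make these definitions concrete, here is a minimal sketch of how the five content factors could be counted from a post's HTML using Python's standard-library `html.parser`. It is not the crawler behind this analysis; the class and function names are hypothetical, and it assumes the post body has already been isolated from the surrounding page template.

```python
from html.parser import HTMLParser


class ContentFactorCounter(HTMLParser):
    """Tallies header, bold and list tags, and collects visible text, for one post body."""

    HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    BOLD_TAGS = {"b", "strong"}
    LIST_TAGS = {"ol", "ul"}

    def __init__(self):
        super().__init__()
        self.headers = 0
        self.bold_tags = 0
        self.lists = 0
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <H2> and <h2> land in the same bucket
        if tag in self.HEADER_TAGS:
            self.headers += 1
        elif tag in self.BOLD_TAGS:
            self.bold_tags += 1
        elif tag in self.LIST_TAGS:
            self.lists += 1

    def handle_data(self, data):
        self.text_chunks.append(data)


def content_factors(post_html, title):
    """Return the five content factors for one post (hypothetical helper)."""
    counter = ContentFactorCounter()
    counter.feed(post_html)
    counter.close()
    # Assumption: joining chunks with a space can split a word that an inline tag
    # cuts in half (e.g. "<b>S</b>EO"), slightly over-counting in rare cases.
    text = " ".join(counter.text_chunks)
    return {
        "words_post": len(text.split()),
        "words_title": len(title.split()),
        "headers": counter.headers,
        "bold_tags": counter.bold_tags,
        "lists": counter.lists,
    }


sample = (
    "<h2>Intro</h2><p>Some <strong>bold</strong> text.</p>"
    "<ul><li>One</li><li>Two</li></ul>"
)
print(content_factors(sample, "A Tale of 2011 Posts"))
# → {'words_post': 6, 'words_title': 5, 'headers': 1, 'bold_tags': 1, 'lists': 1}
```

Counting opening tags only (rather than pairs) keeps the sketch robust to sloppy HTML where a closing tag is missing.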
We use Spearman rank-order correlations because many of these variables tend to be skewed (for example, some posts get a ton of Tweets, whereas many get very few). As always, correlation does not imply causation. I originally captured both Pageview (PV) and Unique Pageview (UPV) data, but the correlation between them was very high (r = 0.998), so I kept it simple and report only UPVs. Every cited r-value is significant at p < 0.01. Many thanks to our resident stats guru, Dr. Matt Peters, for helping me pull the numbers together.
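For anyone who wants to reproduce this kind of number, here is a from-scratch Python sketch of a Spearman rank-order correlation: rank each variable (tied values get the average of their positions), then take the ordinary Pearson correlation of those ranks. In practice a stats package (for example, SciPy's `scipy.stats.spearmanr`) would be the usual choice. The p-value helper is an assumption: the post doesn't say how significance was computed, and this version uses the standard t statistic with a large-sample normal approximation. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, via t = r * sqrt((n - 2) / (1 - r^2)).

    Assumption: t is treated as standard normal, which is only a reasonable
    approximation for large n (such as 2011 posts).
    """
    if abs(r) >= 1:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


# Made-up numbers for five posts, just to exercise the functions
tweets = [3, 120, 5, 40, 8]
upvs = [900, 15000, 1100, 4000, 1500]
print(spearman(tweets, upvs))
print(approx_p_value(0.06, 2011))
```

One thing this makes visible: with n = 2011, the cutoff for p < 0.01 works out to roughly |r| > 0.06, so "significant" here mainly rules out a zero correlation. The size of r is what says how strong the relationship is.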
What Does It All Mean?
First off, I’d better explain the “Post Age” data (in red). That’s actually a negative correlation with UPVs. In other words, the older the post, the less traffic it got. That may sound counterintuitive, but remember that the traffic data was only from 2011, whereas the posts could have been written at any time. Naturally, posts written in 2011 tended to get more traffic in 2011. In retrospect, that seems obvious. Interestingly, thumbs up were also negatively correlated with post age (r = -0.76), which reflects another reality: the community has simply grown over time.
Clearly, social factors showed the strongest correlations with traffic in this data set. Causality is a bit tough to pin down, as we do have a chicken-vs-egg problem. Likes, for example, may drive sharing and traffic, but posts with a lot of traffic will naturally get more clicks on the Like button. Which came first? Probably a little of both. As we saw in the smaller data set last year, there does seem to be “cross-talk” between the 3 social buttons. People who like a post will naturally +1 it. For reference, here are the inter-correlations between social factors:
As you can see, they’re pretty highly correlated with each other. It’s hard to separate out why, at least from this data. It could be that (1) the best content attracts the most social signals and the most traffic, (2) people who regularly use social tend to use all 3 services, or (3) people use all 3 because the buttons are close to each other.
Community factors are similarly tricky – posts with more traffic get more thumbs, all else being equal. Still, it seems that our community metrics have some validity – posts that get a lot of thumbs up and comments tend to also get a lot of traffic.
The content factors are the weakest group as a whole, but here at least the direction of causality is clear. No post magically got longer or gained more images because more people visited it. It does appear that longer posts tended to fare pretty well with our audience.
Where Do We Go From Here?
While we can’t predict the future of any given social network, and Google+ is still in its infancy (even by internet time), I think that 2011 was the year when social really made its mark. It’s clear that social is driving traffic, and the impact of social factors on SEO is growing fast.
I think both studies suggest that you shouldn’t be afraid to use all 3 of the major social buttons. I wouldn’t go crazy (if you have 50 social buttons, you weaken them all), but the inter-correlations strongly suggest that, at worst, the 3 big buttons don’t hurt each other. People who regularly use social probably send multiple signals.
It’s also interesting to me that long posts seem to do pretty well on SEOmoz. When I wrote my duplicate content mega-post, it was a bit of an experiment. We had talked about doing another guide for e-commerce SEO and opted to try a long-form post on one sub-topic instead. I don’t think that every post needs to be that long, but there’s certainly room for mega-posts when the topic merits them. To give credit where credit’s due, Oli’s mega-post made that point before mine did.
Of course, every audience is different. I admit that I do these analyses as much for myself as anyone else – I’m really fascinated by trying to figure out what works and what doesn’t. Much like with SEO in general, though, “quality” is a complicated thing. If you write a long post just to fill up space, you’ll have a mountain of crap instead of a pile. Use the data wisely.
Hi,
Thanks a lot for your valuable data. In today's era, we live on social networking sites, so social promotion can easily reach the target customer base.
I wonder how skewed the social button usage is by the tech-savvy, Google+ savvy SEO audience?
Next, can you explain how we can all do this for our own blogs? Or is that a book in itself? That would be really useful knowledge for us as well.
Spearman correlations are a bit tougher - you need a stats package or at least an Excel add-on. Pearson correlations can easily be run in Excel, but they're often not appropriate for this kind of data.
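Since Spearman's rho is just Pearson's r computed on ranks, one spreadsheet workaround is to rank each column first (Excel's RANK.AVG handles ties) and then run CORREL on the ranks. The Python sketch below uses made-up numbers to show why the choice matters for skewed data: a single viral post can make Pearson look strongly positive even when every other post trends the opposite way. The data and helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical data: nine ordinary posts where more of X goes with less of Y,
# plus one viral outlier that is huge on both measures.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
y = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1000]

r_pearson = pearson(x, y)                                 # dominated by the outlier
r_spearman = pearson(average_ranks(x), average_ranks(y))  # only the ordering matters
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Here Pearson comes out at about 1.00 while Spearman is about -0.45: the outlier's sheer magnitude swamps Pearson, whereas on ranks it is just one post out of ten.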
Honestly, the toughest part of this post was collecting the data. I built a custom crawler and database that imported data from Google Analytics and then crawled each of the top 2011 posts to extract on-page information. Then, it hit all 3 of the APIs for the big social sites. Cleaning the data was also a chore - I started with 2,500+ posts, and then had to pull out anything that was defunct, not a real post (say, a "/blog" URL that served another function), or had been redirected to a regular page.
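The cleaning step described above might look something like the following sketch. Everything in it is an assumption for illustration: the record fields (`url`, `status`, `final_url`), the example.com URLs, and the rule that a real post lives at exactly `/blog/<slug>` are hypothetical, not the actual crawler's schema.

```python
from urllib.parse import urlparse


def is_post_path(url):
    """Hypothetical rule: a real post lives at exactly /blog/<slug>."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return len(parts) == 2 and parts[0] == "blog"


def clean_records(records):
    """Keep only records that look like live, real blog posts.

    Each record is assumed to be a dict with 'url', 'status' (the final HTTP
    status after following redirects) and, if the URL redirected, 'final_url'.
    """
    kept = []
    for rec in records:
        if rec["status"] != 200:
            continue  # defunct: 404, 410, server errors, etc.
        if not is_post_path(rec["url"]):
            continue  # a /blog URL that serves another function
        if not is_post_path(rec.get("final_url", rec["url"])):
            continue  # redirected to a regular (non-post) page
        kept.append(rec)
    return kept


records = [
    {"url": "https://example.com/blog/good-post", "status": 200},
    {"url": "https://example.com/blog/gone", "status": 404},
    {"url": "https://example.com/blog", "status": 200},
    {"url": "https://example.com/blog/old-post", "status": 200,
     "final_url": "https://example.com/learn/guide"},
]
print([r["url"] for r in clean_records(records)])
# → ['https://example.com/blog/good-post']
```

A post that redirects to another post survives this filter, so in practice you would probably also de-duplicate on the final URL.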
I agree! It would be great if there was an easy way to collect all this information (page views, comments, social interactions etc) in one place for our own blogs. It would certainly help with future content strategies.
Hi Dr Pete,
When you say "It's also interesting to me that long posts seem to do pretty well on SEOmoz," how are you defining 'pretty well'? The only reason I ask is that surely people don't know how long a post is before they navigate to it and hence register a visit.
I suspect that you must have been referring to the social aspect, meaning that longer posts fared well in comparison with their shorter counterparts in terms of likes, +1s and tweets?
Absolutely - the causality is hard to tease apart. What I can technically say is that, of the on-page factors, post length had the highest correlation with unique visitors. Relative to other factors, it's still not very high.
It's definitely true that there's a bit of a loop - initial visitors come to a longer post, and if it's a good one, link to it, mention it, share it on social media, etc. That drives more visitors, and the cycle continues. In that sense, I think length drove visitors (to a point), but the path zig-zags a bit.
What I'm comfortable saying is that long posts weren't a detriment to success here on SEOmoz. I've seen that be hit-or-miss on other blogs. Some audiences don't care for long-form content. If we did it every day, it probably wouldn't work. Once in a while, though, it does very well.
Good post, Dr. Pete; thanks for putting in the time. Seeing some of those stats gives me flashbacks of TI-84 calculators and math professors quipping that I'm not in literature class anymore, but I digress.
I would be interested to know if the author's name had a large influence on posts' popularity, in addition to the author's social media "status" and "clout." You would hope that a post's popularity is independent of the author's reputation and earned by the information within the post, but I don't think that's always true.
Given this kind of data set, it's really tough to look at a categorical variable (like author) and draw reliable conclusions. If you look at, say, the average unique visitors per post, people like Oli do amazingly, because he had 1 incredible post. People like Rand are surprisingly low on the list, because he's written hundreds of posts since the beginning of the blog (when it had relatively little traffic).
My gut, as a regular blogger here, is that authors have a ton of influence, especially in this social media age. We're not only regulars and known to the community, but our brands extend to Twitter, Facebook, Google+, etc. We have contacts to call on for the big posts to get the word out. Of course, we also write a lot of posts that flop - that's the nature of doing anything regularly.
It's very interesting to see which posts get the most attention and how they get it. I haven't had the same results with what I'm doing, but I know they work well. Thanks for the updates!
Very well researched article. Loved it.
Hi Dr. P.
Wow! What an insight! I truly agree with you that it's actually the audience that decides the word count for posts, and being precise is always a good approach! But I think shorter posts are sometimes not liked by many; even a simple topic (i.e., for a general audience) can do well, as people love EXTRA details!
One thing I need to ask: did you notice any strong impact of likes and dislikes on posts? I mean, did they do any good for the posts anyway?!
Looking back on how these numbers have changed from then till now has helped. I really appreciate the information provided, cheers.
I'd like to see the correlation matrix. It would also be interesting to consider the correlation between lots of items in this data set, for example, word count and just likes.
Dr. P,
Awesome post!
1) “Built a crawler”: can you expand on how you did it, what programming language you used, and whether any open-source versions are available?
2) It’s interesting to see Plusses vs. Tweets & Likes as the strongest signals impacting pageviews in 2011. How long do you estimate it will take for Google+ to become the strongest social signal, especially with Pages on Google+ changing the SERPs?
(1) It's pretty quick-and-dirty - I'm using PHP/MySQL. I only need to crawl Moz posts and their specific HTML/CSS structure, so a lot of it was just basic text parsing. The Big 3 social sites all have APIs that can be called by URL, so that was surprisingly easy. Tom Anthony has a good write-up on getting Google+ counts here:
https://www.tomanthony.co.uk/blog/google_plus_one_button_seo_count_api/
(2) I suspect Google+ will be all-or-none this year - it'll either not catch on and fade away or it'll explode. It's really tough to say. If it explodes, it'll be bad news for Facebook, especially as Google integrates search + social.
Nice study. Thanks!
I would like to see the full correlation matrix. It would be interesting to consider the correlation between lots of items in this data set. For example word count and likes. :)
The full data set just got to be a monster. Even the correlation matrix is a bit hairy. I'm happy to share any specific bits of data, though. The Spearman correlations between word count and social factors were:
The Google+ data is the toughest to interpret, since so many posts were written before Google+ launched (and there isn't that much post-launch data). It's interesting, though, that many older posts do have social signals, even if we didn't have the buttons when they were originally posted.
Thank you! Thanks for the data. This type of study has a lot of potential. I might try some of my own.
The only problem with these studies is that, sometimes, I run the data and then realize I'm not entirely sure what it means :) Still, it's fascinating, to some of us.
When I first saw the page title in my reader this morning, I thought, "Oh geez, I think they messed up the page title by repeating 2011 twice!" But, silly me, I was wrong.
But great post, Pete! I'm a sucker for awesome data :)
I can tell you from first-hand experience that content mega-posts, as you call them (really long posts crammed with useful tips), tend to perform really well, but that doesn't surprise me, since they're filled with tips. It also doesn't surprise me that older posts get less traffic, because that's typical for a blog, but the rest of the analysis is pretty solid. Here's to a job well done!
Great post! Thanks a lot; this is the first article on here I've commented on :] I decided to make an account just now. Thanks for the links to the mega-posts; I'm having a good read through them too.
Hmm.. Really informative... Very nice comparisons...
Another informative post Dr. Pete. It's nice to see that in this current day and age people are still trying to push the boundaries.
Good research. After all this, we can say that social media is much more important than SEO (on-page & off-page). Thanks to advanced technology, social media is in the hands of users.
Users can easily access information about the news and what's going on around social media through various plug-ins. Good post. Nicely done. Keep it up.
Great post, thanks! I'm waiting for another e-commerce SEO post...