I took some time to revise the Website Analytics article. The data that I originally posted had some errors, which have now been corrected. I've also taken CSandB's suggestion and used Pearson's correlation coefficient to measure how closely each external metric tracks the actual stats data:
- Number of Technorati Links (0.74)
- SEOmoz Page Strength (0.60)
- Number of Links to the Blog URL via Yahoo! Site Explorer (0.56)
- Number of Links to the Domain via Yahoo! Site Explorer (0.54)
- Bloglines Subscriptions (0.49)
- Technorati Rank (0.49)
- Alexa Rank (0.49)
- Netcraft Rank (0.43)
- Newsgator Subscribers (0.39)
- Compete.com Rank (0.38)
- Ranking.com Rank (0.36)
- Google PageRank (0.21)
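If you want to reproduce numbers like these from the Excel data, here is a minimal sketch of Pearson's r in Python. The sample lists are hypothetical placeholders, not values from the spreadsheet. Swap in one column of external metrics and the matching column of visits. For rank-style metrics, where a smaller number means more traffic, you'd expect a negative r, so it may make sense to compare absolute values.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square root of the summed squared deviations for each variable.
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical example: one external metric vs. monthly visits for five blogs.
technorati_links = [120, 340, 560, 800, 1500]
monthly_visits = [9000, 15000, 40000, 38000, 90000]
print(round(pearson_r(technorati_links, monthly_visits), 2))
```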
Technorati links are actually close to a usable proxy at this point, though a rigorous analysis would generally want correlations in the 90-95% range (r of 0.90 to 0.95) before relying on a metric this way. We're also thinking about new ways to use this methodology to build and improve some of our tools (Page Strength, for one).
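One way to see why 0.74 is only "almost usable" is to square each coefficient. The result, r², is the share of the variance in the traffic data that a straight-line fit on that single metric would explain. The coefficients below come from the list above; the squaring is plain arithmetic, and the `variance_explained` helper name is just for illustration.

```python
# A few of the Pearson coefficients reported in the list above.
correlations = {
    "Technorati Links": 0.74,
    "SEOmoz Page Strength": 0.60,
    "Yahoo! links to blog URL": 0.56,
    "Yahoo! links to domain": 0.54,
    "Google PageRank": 0.21,
}


def variance_explained(r):
    """Coefficient of determination for a single-predictor linear fit."""
    return r * r


for metric, r in correlations.items():
    print(f"{metric:<28} r = {r:.2f}  r^2 = {variance_explained(r):.2f}")

# For comparison, the low end of the 90-95% threshold mentioned above.
print(f"{'threshold':<28} r = 0.90  r^2 = {variance_explained(0.90):.2f}")
```

Technorati Links at r = 0.74 works out to roughly 55% of the variance, versus about 81% at r = 0.90 and only about 4% for Google PageRank.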
If you've got other great ideas for how to use this data, please let us know. I'll be more than happy to link out to anyone who can use this information in new and fascinating ways. In case you're interested, the Excel file is downloadable here.
Update: I've made some changes to the Excel file and the article based on additional input from the contributors.
Really interesting analysis. I've done a little more work on the dataset provided, but haven't had enough time to really get stuck in yet:
Permalink to results of my regression analysis
If anybody is interested in giving me more ideas for analysis, or can get their hands on more data, I'd be very interested in posting results here and/or on my blog Innovation|Trust.
David - that's fantastic work. When I get home, I'll make sure to link to your analysis from the original piece. I agree with your conclusions about the sample size being too small, but remember that we're also working with a near-perfect sample for metrics like Technorati, Alexa, Compete, etc. These sites are all in the exact same sphere, cater to the same type of audience (who should have similar installation percentages of various toolbars), all are blogs and have the metrics associated with those types of sites, etc.
I'd love to be able to get 500 people running the same stats package, all in one niche, but it's a very, very hard thing to attempt.
:)
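To put rough numbers on the sample-size point: a common approximation for a confidence interval around Pearson's r uses the Fisher z-transformation, z = atanh(r), with standard error 1/sqrt(n - 3). The sketch applies it to the Technorati coefficient. The article doesn't state how many blogs were in the sample here, so n = 20 is a hypothetical stand-in, compared with the 500 people mentioned in this comment.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for Pearson's r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)            # map r onto an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)  # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for n in (20, 500):  # n = 20 is a hypothetical sample size
    low, high = pearson_ci(0.74, n)
    print(f"n = {n:>3}: r = 0.74, 95% CI roughly [{low:.2f}, {high:.2f}]")
```

With 20 blogs the interval spans roughly 0.44 to 0.89; with 500 it narrows to roughly 0.70 to 0.78.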
Well, my overzealous mind would love to get a whole bunch of people, in different niches, with different stats packages. Then, we could actually fit a model showing the relative accuracy of various competitive intelligence metrics for different niches and allowing for differences in stat packages.
For example, we could see that Metric A is good overall, but better in some niches than others. Further, stats package B consistently indicates higher visits than package C. I think that could be quite interesting in itself.
Moving on to my major area of interest, we could build a model (probably using techniques more sophisticated than multiple regression, such as a GLM or some form of logistic regression) to provide a tool that estimates visitors to a range of sites based on external metrics: something like a complementary tool to the Page Strength tool.
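As one illustration of the GLM idea: visit counts are non-negative counts, so a Poisson GLM with a log link is a common choice, though that choice is an assumption for this sketch rather than the model proposed above. The code fits log(expected visits) = a + b·x for a single external metric x by Newton-Raphson on the Poisson log-likelihood. The function name `fit_poisson_glm` and the synthetic data are hypothetical.

```python
import math


def fit_poisson_glm(xs, ys, max_iter=100, tol=1e-10):
    """Fit log(mu) = a + b*x by maximum likelihood (Poisson GLM, log link) via Newton-Raphson."""
    n = len(xs)
    a, b = math.log(sum(ys) / n), 0.0  # start from the intercept-only model
    for _ in range(max_iter):
        mu = [math.exp(a + b * x) for x in xs]
        # Score: gradient of the Poisson log-likelihood with respect to (a, b).
        g_a = sum(y - m for y, m in zip(ys, mu))
        g_b = sum(x * (y - m) for x, y, m in zip(xs, ys, mu))
        # Fisher information [[s0, s1], [s1, s2]] (the negative Hessian).
        s0 = sum(mu)
        s1 = sum(m * x for m, x in zip(mu, xs))
        s2 = sum(m * x * x for m, x in zip(mu, xs))
        det = s0 * s2 - s1 * s1
        # Newton step: inverse information times score.
        step_a = (s2 * g_a - s1 * g_b) / det
        step_b = (s0 * g_b - s1 * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b


# Synthetic data generated from a = 0.5, b = 0.3 so the fit can be checked.
metric = [0, 1, 2, 3, 4, 5]
visits = [math.exp(0.5 + 0.3 * x) for x in metric]
a, b = fit_poisson_glm(metric, visits)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"expected visits at metric = 6: {math.exp(a + b * 6):.1f}")
```

Because the synthetic data sit exactly on the curve, the fit should recover a ≈ 0.5 and b ≈ 0.3; real visit data would of course be noisier.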
Yeah, I know I'm mostly dreaming, but if anyone wants to make efforts to get this going either now or in the future, count me in for the analysis. The work is interesting and the exposure useful.
Great post, Rand. Being a stat junkie, I'm very intrigued. It would be cool to do this experiment across different site types/industries to see the fluctuations in the results. This is very useful for any SEM firm because they'll be able to offer solutions that focus on the best external metrics.
It would be interesting to see the data for an industry that wouldn't be as skewed by web monkeys. For instance, sewing or office supplies.
Ken - I certainly thought about that, but I also believe that there's going to be the same skewing for all these blogs. The audiences for them are very, very similar...
I agree that it would be great to get stats to compare from a non-tech field, but I don't know how one could go about making the connections to do so.
Very interesting article! Would it be possible to state which stats package was used on each blog? Reports from different web trackers can vary quite a bit. Some tend to count bots as pageviews/visitors, while others don't, so the numbers can be quite different. For example, Webalizer reports almost double the pageviews/visitors that Mint does for my blog.
Rand - Kineda is in the non-tech field. Our focus is predominantly Entertainment, so if you're interested let me know. Although my site has been around since 1997, I didn't adopt the blog format until a couple years ago. I think you'll find that Technorati isn't as accurate for non-tech sites.
I'm sure there are plenty of others that run non-tech blogs that would be willing to help as well.
Kineda - thrilled to see you here, and thanks so much for your offer. I'll certainly look into it (maybe you can shoot me an email).
BTW - You'll be happy to know that I recently used your blog's design as an example of excellence. Great work!
Rand -
You NERD!
Nice data here... I agree that Technorati is the most accurate. Great write-up on the analytics article; it was a nice read and really shows the scope of some sites!
Thanks
- Scott Fish
Yes, but he's our nerd. Eat that, Scottfish!
It's weird how they turned out to be so much closer than anything else, when they don't even claim to predict traffic data... :)