Way back in November of last year, I suggested a linkbait idea to the wider community: a comparison of the major analytics vendors. Despite the incredible amount of work and effort required to put it together, Eric Enge of Stone Temple Consulting prevailed and produced the incredibly valuable, must-read research report - The 2007 Web Analytics Shootout.
I blogged previously about Eric's interim report, which also provided some good information, but with the final report out, we've got a treasure trove of comparative data and an excellent starting point for those attempting to choose the right analytics package. The big message for me was highlighted in the executive summary:
Web analytics packages, installed on the same web site, configured the same way, produce different numbers. Sometimes radically different numbers. In some cases the package showing the highest numbers reported 150% more traffic than the package reporting the least traffic...
...By far the biggest source of error in analytics is implementation error. A Web analytics implementation needs to be treated like a software development project, and must be subjected to the same scrutiny and testing to make sure it has been done correctly....
...Two other major factors drive differences in the results. One of these is the placement of JavaScript on the site, as being placed far down on a page may result in some users leaving the page before the JavaScript can execute. Traffic that is not counted as a result of the JavaScript can be considered an error, because the data for that visit is lost (or at least the data regarding the original landing page and, if the visitor came from the search engine, the keyword data would also be lost)...
This report hasn't been without its detractors, though. My good friend and analytics guru, Avinash Kaushik, had some serious reservations, despite generally praising the report (I had hoped that he would discuss these on his blog, but other than a mention in this video, there doesn't seem to be anything I can link to). I asked Eric if he'd be willing to participate in a brief interview on the topic, and he contributed the following:
Rand: The analytics comparison project took you more than 9 months to complete - that's an incredible amount of work. Can you tell us some of the things that made it so challenging and time-consuming to create and run?
Part of it was that it simply took about 3 months to get vendors and web sites on board. With each vendor, we had to learn the installation procedures, and then we had to work through them with each participating site.
Once the code was set up on all the participating sites, the actual data collection part proceeded without a ton of effort on our part. We spent many months collecting data so that we could look for any interesting patterns that might emerge.
However, we were also interested in putting together a qualitative analysis. To be able to write that up, we had to spend many hours in each application trying to do different things with it, to find out what we could and couldn't do. We also spent many hours on the phone with the various vendors having them educate us about their products.
Last, but not least, analyzing the data took quite a bit of effort. We collected massive reams of numbers, and I had to do a lot of analysis before meaningful information started to emerge.
Rand: On the technical side - you noted that some of the analytics programs were particularly simple/hard to install and run - what are the big challenges there? Do you think there are a lot of folks who have analytics packages that they haven't properly installed and, as such, collect bad data? Which companies need to fix what?
Some packages, such as Visual Sciences (HBX) and Omniture, are targeted at high-end customers, and as a result they offer a lot of flexibility and the ability to customize their products. There are some amazing things that they can be a part of, such as behavioral targeting applications.
What comes with that is a harder install. Even some of the basic stuff can require extra effort. For example, with Visual Sciences (HBX), you need to modify your DNS by adding a CNAME record to implement a first-party cookie. You get some improvement in accuracy by doing this, but some people won't want to take that step.
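(To make that concrete: the DNS change is typically a single CNAME record in your zone file, along these lines - the hostname and target shown here are placeholders, since the real values are assigned by the vendor during setup.)

```
; Illustrative only - your analytics vendor supplies the actual hostname and target
metrics.example.com.    IN    CNAME    collector.analytics-vendor.net.
```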
Then as you move forward from there and begin to set up the things you want to track, you are more likely to find that you need to customize the JavaScript than you would be with the other packages.
As for bad installs, I think the analytics industry wisdom says that JavaScript tagging errors are the number one source of error in analytics. I don't have any specific data to prove that assertion, but, for example, I have seen it happen over and over again that pages were left untagged on sites.
This is just the first part. Once you have customized your JavaScript (for example, you have tagged a group of pages with a common label because you want to look at them as a group), these customizations can get messed up. When you add pages that belong in that group, did you tag them and include the same customization? Did a developer accidentally remove them?
Lastly, it also happens that people put malformed JavaScript on the page. I have seen all of these types of errors happen.
The biggest thing that companies need to do is to treat it like a software development process, with verification and testing. Then they need to question the data when they start getting it and make sure that it makes sense.
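To give a feel for the kind of verification that catches these problems, here is a minimal sketch of an automated tag-presence check. The URLs and the tag filename below are placeholders - substitute the pages and the JavaScript include your own analytics package actually uses.

```javascript
// Minimal tag-presence checker - a sketch, not a vendor tool.
// Requires Node 18+ for the built-in fetch.
const pagesToCheck = [
  'https://www.example.com/',
  'https://www.example.com/products/',
  'https://www.example.com/contact/',
];

// The string we expect to find in the HTML of every tagged page.
const tagSnippet = 'analytics-tag.js';

async function verifyTags() {
  for (const url of pagesToCheck) {
    const response = await fetch(url);
    const html = await response.text();
    if (!html.includes(tagSnippet)) {
      console.warn(`MISSING TAG: ${url}`);
    } else {
      console.log(`ok: ${url}`);
    }
  }
}

verifyTags().catch((err) => console.error(err));
```

Running a check like this as part of every site release is the kind of "treat it like software development" discipline Eric describes.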
Rand: The report shows some fairly diverse data on page views and visits - what is your best guess as to the major causes of the disparities? Is it a technology issue? A programmatic issue? Some sort of temporal differentiation?
There are a few types of differences that can occur that I talked about in my presentation at SES San Jose. Some of these are:
3.1 Processing of bad or ambiguous data - The web is a mess. Counting on the web is far from deterministic. Users from AOL have IP addresses that change mid-session. Proxy servers strip referrer information. About 3% of users disable JavaScript. 2-3% of users don't accept cookies. There are many issues of these types.
Each package handles these issues differently. For example, some packages do not collect session-related information from people without cookies, and others fall back on IP and User Agent tracking.
3.2 Session Handling - The industry standard for a session inactivity timeout is 30 minutes, but not every package does that. For example, Clicktracks defaults to 15 minutes (you can configure it to 30, however). (See the sketch after this list for how these rules interact.)
3.3 JavaScript Placement - This turns out to be a big one too. We measured the impact of adding a 1.4 second delay before running the analytics JavaScript on visitor counts, and saw a loss of between 2% and 4% of traffic. I believe that this problem scales rapidly as the delay to execution of the JavaScript increases (because users have more time to click on a link and leave your page before the JavaScript executes).
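As a rough sketch of how points 3.1 and 3.2 interact, here is one purely illustrative way sessions might be counted with a cookie-first, IP-plus-User-Agent fallback and a 30-minute inactivity timeout. Each vendor implements its own variant of these rules, which is part of why the numbers differ.

```javascript
// Illustrative sessionization sketch - not how any particular vendor does it.
const crypto = require('crypto');

const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30-minute inactivity timeout

// Prefer a cookie-based visitor ID; fall back to a hash of IP + User-Agent.
function visitorKey(hit) {
  if (hit.cookieId) return `cookie:${hit.cookieId}`;
  return 'fallback:' + crypto
    .createHash('sha1')
    .update(`${hit.ip}|${hit.userAgent}`)
    .digest('hex');
}

// Count sessions in a list of hits ordered by timestamp.
function sessionize(hits) {
  const lastSeen = new Map(); // visitorKey -> timestamp of that visitor's last hit
  let sessions = 0;
  for (const hit of hits) {
    const key = visitorKey(hit);
    const prev = lastSeen.get(key);
    if (prev === undefined || hit.timestamp - prev > SESSION_TIMEOUT_MS) {
      sessions += 1; // first hit, or the gap exceeds the timeout: new session
    }
    lastSeen.set(key, hit.timestamp);
  }
  return sessions;
}

// Example: two hits 45 minutes apart from the same cookieless visitor
// count as two sessions under a 30-minute timeout.
console.log(sessionize([
  { ip: '10.0.0.1', userAgent: 'ExampleBrowser/1.0', timestamp: 0 },
  { ip: '10.0.0.1', userAgent: 'ExampleBrowser/1.0', timestamp: 45 * 60 * 1000 },
])); // -> 2
```

Change the timeout or the fallback rule and the session count changes, even though the underlying traffic is identical - which is exactly the kind of difference the report observed between packages.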
Rand: If you were to suggest an analytics product for a small business/website with a few thousand visitors per month, which of the providers would you recommend?
The answer here would be Google Analytics or Clicktracks Appetizer. In answering this question, I am assuming that their analytics needs are likely to be simpler.
In addition, these businesses probably have budget constraints that limit what they can afford to spend on analytics, so a free product may be the best way to go. That way their only expense is on the people who use the tool on their behalf.
Rand: Let's say we're talking about a relatively sizable company/site with 10K+ visitors each day, and some more complex action and conversion tracking required - would your recommendations change?
Assuming that budget is still something of a limitation here, and that Google Analytics or Clicktracks Appetizer no longer offer enough capability, Clicktracks and IndexTools are good mid-range packages that offer much richer functionality than the free tools at a reasonable price.
The best one to select, however, will be driven by the actual requirements of the application, which vary greatly from site to site.
Rand: Lastly, for a very large site with millions of visitors each month, would you have a preferred provider?
All of the vendors, including Google, have enterprise-level customers. Google lists some of these on their site, and Clicktracks and IndexTools have large numbers of enterprise customers who get what they need from these tools. Even if you can afford it, there is no reason to spend money on a high-end application if you don't need to.
However, once your traffic gets up to these levels, there is an increasing chance that you will need a package that can be customized more substantially - for example, using analytics dynamically to feed data to a behavioral targeting system that updates a web application on the fly. The APIs and customization capabilities that can provide this come with Visual Sciences and Omniture.
Or, if you are looking at integrating online and offline customer data, you really need to consider Unica Affinium NetInsight.
To summarize the answer to the last 3 questions, which package you pick should be driven by your requirements, not by price (unless, of course, a package is simply out of your price range).
Rand: One of the biggest issues that folks seem to have with the report is the use of the "average" as a baseline for measurement. I can understand the need for a baseline, but what made you choose the "average" as opposed to something like log file data or a custom-built javascript system that only checked page loads and visits?
We recognized that the average did not really represent a standard, because there is no such thing as the right answer, so we did not mean to present it as one. However, it was helpful in allowing us to put together a standard deviation analysis, which I think provides some sense of the scope of how the packages differ.
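To give a feel for what that spread calculation looks like, here is a toy example. The visitor counts below are invented for illustration, not figures from the report.

```javascript
// Toy illustration of using the average as a baseline and the standard
// deviation as a measure of spread. The numbers are made up.
const reportedVisitors = {
  packageA: 10400,
  packageB: 11900,
  packageC: 9800,
  packageD: 12700,
};

const values = Object.values(reportedVisitors);
const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
const variance =
  values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
const stdDev = Math.sqrt(variance);

console.log(`mean: ${mean.toFixed(0)}`);      // the "average" baseline
console.log(`std dev: ${stdDev.toFixed(0)}`); // spread around that baseline
console.log(`spread: ${((stdDev / mean) * 100).toFixed(1)}% of the mean`);
```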
Our own log file examination would probably have represented a better standard, but we were simply limited on the time we could invest in the effort (it ran into hundreds of hours of work).
Beyond that, I think what people should observe out of the data is the following:
7.1 In some cases, the package reporting the most visitors reported 50% more than the package reporting the fewest visitors. That's a huge difference!
7.2 Who counted higher, and who counted lower, varied significantly across the visit, unique visitor, and page view data for the 4 sites.
7.3 The data we collected on the impact of Javascript placement on the results suggests that this is a big issue. In fact, it's an area that we are planning to subject to further study.
Rand: How would you suggest that consultants and organizations use the data you've collected? What are the most valuable applications for this information?
We drew a few major conclusions out of the report. These were:
8.1 Analytics packages are not accurate - at least not in the absolute sense. This is the single clearest thing I learned from the study. The Internet is a mess from an analytics perspective. It's just not simple to measure your traffic, let alone analyze it.
However, the relative measurement capabilities of the packages are outstanding. Learn to focus on using those capabilities. The big wins are in SEO related analysis, PPC campaign analysis, segmenting your customers and A/B and multivariate testing. This is where the highest value and ROI can be found.
8.2 Know your errors. In other words, get an idea of the specific accuracy issues for your analytics package on your site. This simply helps you understand how to make better use of the data you are collecting.
8.3 Verify and calibrate every way you can. Don't use analytics to count the total revenue from your PPC campaign - it will be missing some of that data. Instead, rely on URL parameters in your PPC ads to tell you what you need to know (see the example after this list).
Note that I would estimate that 50% of the people who attended the Analyzing Analytics session on Thursday morning at SES San Jose indicated that they run more than one analytics package on their site.
8.4 Each package has different strengths and weaknesses. Determining what package is best for your company needs to be evaluated based on how your specific requirements match up with those strengths and weaknesses. Don't buy the higher priced package simply because your site is larger.
8.5 To get to specifics, I think it's safe to say that HBX often counts lower than most other packages, and Clicktracks counts higher than most other packages. However, I don't think this affects either package's ability to provide extremely valuable data, as I outlined in point 8.1 above.
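As a footnote to point 8.3, here is a minimal example of the kind of URL tagging Eric mentions. The parameter names follow Google Analytics' campaign-tagging conventions; other packages use their own parameters, so treat the specifics as illustrative.

```javascript
// Build a tagged PPC landing-page URL so visits can be attributed to the
// campaign even when referrer data is stripped or unreliable.
function tagLandingPage(baseUrl, campaign) {
  const url = new URL(baseUrl);
  url.searchParams.set('utm_source', campaign.source);   // e.g. 'google'
  url.searchParams.set('utm_medium', campaign.medium);   // e.g. 'cpc'
  url.searchParams.set('utm_campaign', campaign.name);   // e.g. 'fall_sale'
  url.searchParams.set('utm_term', campaign.keyword);    // the bid keyword
  return url.toString();
}

console.log(tagLandingPage('https://www.example.com/landing', {
  source: 'google',
  medium: 'cpc',
  name: 'fall_sale',
  keyword: 'web analytics',
}));
// -> https://www.example.com/landing?utm_source=google&utm_medium=cpc&utm_campaign=fall_sale&utm_term=web+analytics
```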
I want to thank Eric, not only for the insightful interview (I'll probably be referring people to both it and the report for years to come), but also for the hundreds of hours he put into this effort and for giving everyone who uses or needs web analytics a better view of the technological and competitive landscape. If you'd like to show your support for Eric (and I hope you do), please link to his site and the report. To me, this is one of those pieces of viral content that provides such incredible value that it shouldn't be missed by any marketing professional who handles an analytics account.
A great report and a great interview, Rand. Thanks particularly for your question to Eric about the best analytics programs for small businesses. I've been using Google Analytics for a couple of years now for most of my clients and have been happy with it. But it's great to get that validation from Eric that it truly is one of the top options out there for small budgets.
I don't think you can overestimate the complexity of this project or the effort involved. Analytics was the focus of discussion at the recent Search Expo in China. Kudos to Eric for undertaking such a monumental task. Stone Temple Consulting rocks, Eric Enge rules.
3.3 Javascript Placement - This turns out to be a big one too. We measured the impact of adding a 1.4 second delay before running the analytics Javascript on visitor counts, and saw a loss of between 2% and 4% of traffic. I believe that this problem scales rapidly as the delay to execution of the Javascript increases (because users have more time to click on a link and leave your page before the Javascript executes).
Multiple times I have seen a 'magical' traffic increase when the tech department reduces page load times.
When you have ad serving and nested tables increasing load times, the poor JavaScript stuck at the bottom of the page doesn't have a chance to fire if a visitor leaves soon after arriving. It can seriously distort your reporting, so it's good practice to hound your tech manager about reducing page load time.
In fact, sometimes we use an additional 'catchall' JavaScript near the top of the page just in case. This at least captures overall visitor numbers, if not the more detailed stats - but really, implementing two instances of a script is a silly step to have to take to ensure we don't 'lose' traffic.
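(For reference, a 'catchall' beacon of this sort typically amounts to something like the following - a tiny image request fired from a script near the top of the page, with the full analytics tag left at the bottom. The beacon URL here is a placeholder for whatever endpoint you log hits to.)

```javascript
// Lightweight "catchall" beacon, placed in a <script> near the top of the
// page so at least a basic page view is recorded even if the visitor leaves
// before the full analytics tag at the bottom of the page executes.
(function () {
  var beacon = new Image(1, 1);
  beacon.src = 'https://stats.example.com/hit.gif' +
    '?page=' + encodeURIComponent(window.location.pathname) +
    '&ref=' + encodeURIComponent(document.referrer) +
    '&t=' + new Date().getTime(); // cache-buster
})();
```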
A great article indeed, and a report that I've been looking at since last week.
Another handy freebie tool, and one especially useful when a client is missing legacy copies of their site, is the Wayback Machine at archive.org. This can often be a handy way to compare past implementations.
I was thrilled to see this report's release last week. As a company that recently switched from ClickTracks Pro to Google Analytics, it's nice to see hard data showing that our move made the most sense for our company (and its clients). Bravo to all involved, and thanks for the additional interview data.
This is great work and I can tell you that I will refer to the report when our clients ask us about analytics packages.
One of the things that we always remind clients is that analytics is not a once-and-done process. We've seen countless times where new features were added to sites (such as A/B testing or new landing pages) and the analytics code didn't get properly carried over to the new pages. That's why it's so important to check the relative stats as mentioned - if your traffic or conversion rate changes dramatically, you need to figure out what changed on the site.
However, the relative measurement capabilities of the packages are outstanding.
This is probably the most important point in the article for any analytics consultant or website manager. From the early days of WebTrends Enterprise to today, with all of the choices outlined in this study, knowing how to look at analytics data in a relative rather than an absolute manner has been key to maximizing the value of that data.
Fantastic job on the interview and study.
Agreed. The comparative tools within analytics packages are what make them valuable. If it's raw numbers you want, it's time to start adding log file analysis to the mix.
Wow. Great detail. I'm sure there are some interesting things we can take away at the moment given we are working on a few analytics challenges (see recent Q+A...). Thanks for the effort.
Very nice work. Big props to Eric.
This couldn't have been better timed - I was just talking with a colleague today about getting stuck into analytics and then you go and post about it...
This is one of the best reads I've had in a couple of weeks!
Is it just me, or is Clicktracks Appetizer no longer available? It seems the URL now redirects to the pro version. Did Clicktracks quietly discontinue this option?