For the past couple of weeks, I've been chatting over email with the folks at Compete.com about their web popularity reporting & analytics tools. Luckily enough, Jeremy Crane, the Director of Search & Online Media for Compete, agreed to an interview. Below, you can learn a lot more about how Compete gathers data, where they struggle, where they succeed, and what the future of third-party visitor analysis may hold.
For our readers who may not be familiar with Compete, can you explain the service and its goals and give us a brief background on yourself and your position?
Compete is a competitive web analytics company. We’ve been around since 2001. There are actually a number of folks in the web space who think of us as a relatively new startup, but the reality is that we’ve been doing custom web analytics work for top-tier brands for nearly 7 years now. The predominant driver of this impression is the launch of compete.com back in November 2006. Compete.com is by far the most visible part of Compete, but really only represents a small piece of what we do. We do have big plans for compete.com, though, and intend it to be a much larger piece of our overall business.
There were four main drivers behind our launch of compete.com: unmet market need, improved visibility for Compete, greater transparency into the data, and an effort to give something back to the community driving our data.
The core driver of our launch of compete.com a year ago was an unmet market need. There are millions of people running businesses on the web, or at the very least highly dependent upon their website to drive their business. Up until the launch of compete.com, there really was no reliable and consistent source of competitive web analytics for the average marketer. The existing companies in the space (NetRatings, comScore, and Hitwise) were really only servicing the top 1,000 or so companies. As we all know, the beauty of the web economy is that all those millions of players outside of the top 1,000 can have a voice. We felt those millions needed a better way to understand their web business.
As for my own role with Compete… I’ve been with Compete for just under 3 years now. I joined Compete in our Automotive Practice, helping major automotive OEM clients improve their online marketing efforts. About a year and a half ago, I transitioned into my current role as the Director of Search and Online Media. In a nutshell, I lead all of our efforts in the Search and Online Media space. This includes the development of Search Analytics on Compete.com as well as the custom client work we do with the major search engines, ad networks, and interactive agencies. I also work in a cross-functional capacity, helping our Industry Vertical teams analyze and understand the impact of Search and Online Media in their particular industry space.
I noted here that Compete talks a bit about using various methods of data collection - a toolbar, ISP data, and a panel (whose size is 2 million in the US). Can you elaborate and/or be more specific about your data collection methods?
One of the things that sets us apart from the rest of our peers in the space is the fact that our data comes from multiple sources. In fact, we collect data on a monthly basis from more than 10 sources, including ISP data, ASP data, custom toolbars/desktop applications, and our own panel. Using multiple sources allows us to adjust for the source bias that can exist with a single source of data; however, it also brings some complications along with it. It’s quite difficult to integrate multiple data sources, which is likely the reason no one else in the space has tried it. We only use sources of data that we feel confident are completely transparent to the consumer providing their data anonymously.

Not only does each of these data sources provide different forms of data, but they also deliver the data to us in a variety of forms. Our own toolbar and application data is sent to us essentially in real time as users traverse the web. Every click is captured anonymously and recorded in our database with a timestamp and consistent ID tag. Our partner data is typically sent to us on a daily basis in a bulk file; every click event in that data is again cataloged in our database and saved with a timestamp and user ID. In addition to the clickstream data, we have demographic and usage data for every panelist. This allows us to normalize and project the data across the multiple sources so that it is representative of the US online population.
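To make the mechanics Jeremy describes a little more concrete, here is a minimal sketch of what an anonymized click record and a demographic projection weight could look like. Every field name, demographic cell, and number below is my own illustrative assumption, not Compete's actual schema or methodology.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical shape of an anonymized click event; illustrative only.
@dataclass
class ClickEvent:
    panelist_id: str      # consistent anonymous ID tag
    timestamp: datetime   # when the click happened
    url: str              # page the panelist landed on
    source: str           # e.g. "toolbar", "isp_feed", "asp_feed"

# Made-up shares of the US online population by age cell.
POPULATION_SHARE = {"18-34": 0.35, "35-54": 0.40, "55+": 0.25}

def projection_weights(panel_share):
    """Weight each demographic cell so the panel mirrors the population.
    Over-represented cells get down-weighted, and vice versa."""
    return {cell: POPULATION_SHARE[cell] / panel_share[cell]
            for cell in POPULATION_SHARE}

# A single recorded click, as the text describes: anonymous ID + timestamp.
event = ClickEvent("p123", datetime(2007, 11, 5, 9, 30),
                   "http://example.com", "toolbar")

# Example: a panel skewed young gets its 18-34 clicks down-weighted.
print(projection_weights({"18-34": 0.50, "35-54": 0.35, "55+": 0.15}))
# -> {'18-34': 0.7, '35-54': ~1.14, '55+': ~1.67}
```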
From a product standpoint, how far along in development is Compete's service? Do you consider Compete 50% finished, 80% finished?
Is anything ever finished? To answer that question I actually need to split up our business. With regard to Compete Inc. and our “more traditional” services, I would say we’re probably halfway there. We have some incredibly valuable, industry-leading product offerings that help leading brands succeed. However, I think there’s significant room for growth ahead. Some industry verticals are more savvy than others when it comes to analyzing web data. We’re definitely farther along in those spaces and, as a result, have significantly expanded the work we do with those clients.
The second answer is with respect to compete.com. Our engineering team might tie me up in a closet for saying this, but I would say we’re only about 10% of the way there on compete.com. This is not to say the current implementation is lacking in any way. The reality is that we have a laundry list of enhancements and new features we intend to roll out over the coming year and beyond. In the next couple of months you’ll be seeing some major changes to the site that we think will greatly improve what we offer in terms of tools and services. That won’t be the end by a long shot, though. I think compete.com is likely to be a continual work in progress for us.
What are the major areas that you foresee Compete expanding? More data from ISPs? More toolbar users? More panelists? Other ways of collecting data?
We’re always looking to expand our data set. The larger our panel, the more we can do with it. We do a lot with our sample of 2 million, but we would love to turn that into 3, 5, or even 10 million active panelists. Rest assured there are a number of people working on this right now. One of the primary focuses for us on this front is maintaining transparency into where we get our data and what we do with it. In addition to expanding our current partnerships, we are working on a number of new ways to expand our panel by offering consumers value in exchange for their data. Data is currency, and we’re constantly looking for ways to help consumers realize and benefit from this. Unfortunately, I can’t be too specific on exactly where we’re headed here.
Compete Search Analytics is a newer service from Compete - can you tell us a little about that product and how it's supposed to help marketers and webmasters?
Search Analytics makes competitive keyword-level search referral data available to every online marketer, not just the ones with budgets in the millions. There are a number of inexpensive or even free tools available in the market to understand query-level data. Wordtracker is a great example of a paid service, and the major engines obviously offer a number of free tools. These kinds of tools are excellent for generating lists of potential keywords. The piece of the puzzle they lack that we provide is the actual action that resulted from the search activity.

Compete’s Search Analytics data is based on search referrals. We look across all of the search activity exhibited by our panel of 2 million people and then look at where those consumers click through to. Within the tool itself you can look at this from essentially two angles. You can start with a certain keyword or phrase, and the tool can provide you with a list of domains that are capturing traffic from that keyword. Alternatively, you can start at the domain level and do a deep dive into the top keywords generating traffic to that domain. In addition to simply generating lists of domains, we take it a step further and provide a number of metrics to help marketers determine the relative value of different keywords beyond simple volume of referrals.

Along with the data itself, we spent a lot of time developing a pricing structure that would allow marketers to extract the maximum benefit from the tools while maintaining a budget appropriate to their business. There are a few other competitive tools available in the marketplace that provide referral-level data (Hitwise and comScore); however, their pricing structures allow only the top-tier marketers with the highest budgets to utilize the data. Just like everything we do with Compete.com, we’re about opening the data to more people, not less. As I mentioned… this is just the start for Search Analytics. There are a lot of additions and improvements to come.
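As a rough illustration of how referral counting like this might work, here's a sketch that scans a panelist's ordered clickstream for search-engine result pages and credits the next click's domain with the query. The engine list and the assumption that the query lives in a `q` parameter are simplifications on my part, not a description of Compete's pipeline.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Simplified engine list for illustration only.
SEARCH_ENGINES = {"www.google.com", "search.yahoo.com", "www.live.com"}

def referral_counts(clicks):
    """clicks: ordered (url, next_url) pairs from one panelist's stream.
    Returns a Counter of (keyword, destination_domain) referrals."""
    counts = Counter()
    for url, next_url in clicks:
        parsed = urlparse(url)
        if parsed.netloc in SEARCH_ENGINES:
            query = parse_qs(parsed.query).get("q", [""])[0]
            if query:
                counts[(query.lower(), urlparse(next_url).netloc)] += 1
    return counts

stream = [("http://www.google.com/search?q=ski+boots",
           "http://www.rei.com/boots"),
          ("http://www.rei.com/boots", "http://www.rei.com/cart")]
print(referral_counts(stream))
# -> Counter({('ski boots', 'www.rei.com'): 1})
```

Aggregated across a panel, counts like these can be read from either angle the answer describes: group by keyword to rank destination domains, or group by domain to rank its top referring keywords.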
Do you expect to be able to provide other products around competitive analysis based on your datasets in the future? Any hints as to what those might be :) ?
We’ll be expanding the footprint and toolset available on compete.com quite a bit in the coming months. We’ll be adding small improvements such as Paid and Organic search breakouts, refinement tools, and engine splits to Search Analytics. We will also be adding some major new tools such as a Top Lists tool and trended search data. Unfortunately, beyond that I can’t share too many specifics at this time. Suffice it to say, we won’t be satisfied until compete.com is the de facto industry source for the most precise competitive web analytics data available.
I noted that Matt Cutts, Danny Sullivan, and I at SEOmoz all shared traffic data recently, and yet when one looks at a Compete chart for those three, the numbers are way off (even in a relative sense).
[Compete traffic chart comparing the three sites]
As you can see above, Compete's data suggests that all three sites were virtual non-entities until June of 2007 (I wonder if that's when the SearchStatus for Firefox toolbar started reporting back data to Compete... hmm?)
[Second Compete traffic chart for the three sites]
Obviously, that creates a lot of uncertainty about using services like Compete (and the others - Alexa, Quantcast, etc. don't have anything resembling accuracy either). These aren't the biggest sites on the web, but they're getting millions of visits each year, and the numbers aren't even comparatively accurate. How do you reconcile that with Compete's mission, and do you think the service can reach a point where the traffic or rank numbers on sites like these are usable for competitive analysis?
We have the best and most robust panel solution available in the US right now. If you want competitive data, we truly think our data is hands down the most precise. However, the smaller the sites get, the more difficult it is for our representative sample to pick them up. We are constantly working to increase both the size of the panel and its representativeness. In general, we feel confident in our ability to estimate the traffic for the top 1 million domains. We always make an effort to address specific requests to investigate potential anomalies in our data. When we dig into the data, sometimes we find projection improvements on our side and sometimes we don’t. We’re always open to having the discussion, though.
We regularly dig into the data for our major clients and compare it to their internal numbers. Anecdotally, we have almost always found our numbers to be extremely accurate, especially when you look at activities where it is possible to remove methodology bias. For example, think about credit card applications. This is a very measurable activity on a credit card provider’s site; it’s something you can physically count with no cookie implications. We have found that we are generally able to estimate these types of activities with incredible accuracy. In many cases we’re only off by a few percentage points from the actual number.
Unfortunately, it’s difficult to compare different methodologies. Outside of panel-based measurement there are cookie-based measurement systems. As we’ve seen from a number of studies, cookie deletion is much more prevalent than one might expect. While our panel represents only a sample of the population, it is exempt from cookie deletion issues. When you consider that, amongst the general population, upwards of 20% of cookies get deleted on a regular basis, the ramifications are clear. Additionally, if you look at sites such as those you cite, the average visitor is likely to be more technically inclined and web savvy. I have no data to back this up, but I’m guessing that segment actually deletes cookies more often than the average consumer on the web. I, for one, delete my cookies on average 3 times a week as I investigate various websites and online media, and I’m also a regular reader of all three sites. I may be the extreme, but you get the idea.
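To see why cookie deletion matters so much for unique visitor counts, here's a back-of-envelope calculation with made-up numbers (mine, not Compete's): every deleted-and-reissued cookie looks like a brand-new visitor to site-side analytics.

```python
# Illustrative only: how cookie churn inflates cookie-based UV counts.
true_uniques = 100_000     # actual people visiting in a month
deleter_share = 0.20       # share who clear cookies during the month
extra_cookies_each = 1.5   # avg extra cookies a deleter accumulates

measured = (true_uniques * (1 - deleter_share)
            + true_uniques * deleter_share * (1 + extra_cookies_each))
print(measured)  # 130000.0 -> a 30% overcount vs. the true figure
```

Under these assumed inputs, the gap lands in the 20-30% range that Jeremy cites later in the interview as typically attributable to cookie deletion and duplication.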
(Follow-up to #7) You noted that while Compete may not have great data on smaller sites, you feel the data is very good for the top million domains. A couple questions on that point - first, what is the unique visitors / month cutoff to be in that group, and - second, I compared a couple domains whose traffic I'm familiar with (Domain1.com & Domain2.com - sorry, couldn't share these publicly, but both get 3mil+ visits per month) and found one to be off by around 50% and the other by more than 20%. That's obviously still much more accurate than the data for the SEO sites, but I have a hard time imagining that the difference is just in cookie deletion. How confident are you about the data accuracy? Do you have a figure like "X% of our data on the top 1 million websites is within Y% of the real numbers"?
The cutoff value for our confidence interval is roughly 20-25,000 unique visitors per month. I hope this helps.
On the (Domain1.com) and (Domain2.com) question, we typically see 20 to 30% as being easily attributable to cookie deletion and duplication. 50% is a little extreme. Unfortunately, I don't have a great answer other than to say it is something we could always look into a little deeper. Perhaps we need to tweak our scaling and normalization for that site and/or similar sites.
(The following is on this subject from later in our conversation string over email)
Quick side note on Domain1.com and Domain2.com - As it turns out (unbeknownst to me) we actually did a deep dive with the Domain1.com team a few months back and reconciled our numbers with their internal numbers. We have also had a similar conversation with the folks at Domain2.com. I believe that one may still be in process.
Just some interesting tidbits the Compete.com and Data Ops folks shared with me when I highlighted your concern on those two sites.
The cookie deletion problem is an interesting one, too, and it brings to mind another follow-up: why not count visits? If visits are the metric used, your data and the data from the sites' own analytics would match up more closely, right?
Visits is one of our standard metrics. You can get this at the domain level on Compete.com right now, and we generally deliver visits as one of our standard metrics for any client work. The reality, however, is that the industry has been conditioned to focus on UVs. We often have clients that don’t understand the value of visits vs. unique visitors at all.
FYI, in case you hadn’t seen it, comScore pulled together a decent study on cookie deletion last year - https://www.comscore.com/press/release.asp?press=1389
Following up on the previous question - one of my big concerns for Compete would be attracting the tech world elite, including bloggers, journalists and pundits and turning them into raving fans. Is this part of Compete's strategy and if so, do you worry that the difficulty in tracking blog traffic makes it hard to "sell" these folks on the service?
Yes, absolutely. Turning the tech/web world elite into fans is a HUGE focus of ours. The key to this is getting them to actually join the Compete community and contribute their anonymous data to the overall panel. Every additional tech elite that we get to join our panel gets us one step closer to the tail and a more accurate representation of that segment’s web behavior. This is a bold undertaking, however, since this group is often more reluctant to share that kind of data. In response, we’re creating a number of tools and plugins that we think will add enough value for the tech elite to join and contribute their clicks. It’s truly amazing how many of the tech elite enable click tracking on their Google Toolbars, and for what… PageRank? We think we have significantly more to offer than PageRank.
You mentioned that Compete will attempt to verify and reconcile its numbers against what many of the larger sites listed are reporting. Can you talk about that process and what's entailed? This seems like something that might give Compete a real competitive edge - do you publicize when these connections are complete? How many have you done over the past year?
The process is somewhat ad hoc. Essentially, when a company or site owner contacts us about discrepancies between their data and our own, or even between our data and another third-party source, we look at the variation to determine whether the gap is substantial. If it does look substantial, we dive into a couple of areas. First, we look across our 10+ different sources of data to see if there are any panel bias issues - for instance, does a certain type of panel have abnormally low statistics for the given site? If we do see this, then we can apply an adjustment to our normalization process that accounts for it. If that does not address the issue, we then take a look at external factors that might explain it.
In most cases, if we can’t identify a normalization adjustment, we can generally find an external factor that at least rationalizes the issue. At the end of the process there is really no “public” announcement or anything of that kind. Over the past year or so we have probably gone through this process for at least 50 domains. The bottom line is that we don’t simply change the numbers to match the site’s internal numbers. We look for a logical explanation as to why our projection and normalization process is generating different values, and adjust as necessary.
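For illustration, a per-source bias check along the lines Jeremy describes might look like the sketch below: compare how much of each source's panel visits a domain, and flag sources that deviate sharply from the rest. The threshold, structure, and numbers here are my own assumptions, not Compete's actual process.

```python
# Hypothetical sketch of a per-source panel bias check.
def flag_source_bias(reach_by_source, tolerance=0.5):
    """reach_by_source: {source: share of that source's panelists who
    visited the domain}. Flags sources deviating from the cross-source
    median by more than `tolerance` (relative)."""
    shares = sorted(reach_by_source.values())
    median = shares[len(shares) // 2]
    return {source: round(share / median, 2)
            for source, share in reach_by_source.items()
            if abs(share - median) / median > tolerance}

# A toolbar skewed toward techie users over-reports a techie site:
print(flag_source_bias({"isp_feed_a": 0.010,
                        "isp_feed_b": 0.010,
                        "toolbar":    0.031}))
# -> {'toolbar': 3.1}: a candidate for a normalization adjustment
```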
Going in a completely different direction... As a guy who's in the competitive analysis field, are there any blogs or sites that you personally read/recommend?
I’m a huge feed reader. Right now I have about 30 feeds that I track in my reader. I read all of these pretty religiously every week, but if I had to pick my favorites, they would probably be (present company excluded, of course):
- Danny’s Search Engine Land
- GigaOm
- John Battelle’s Blog
- Read/Write Web
- Juice Analytics
- Tim Ferriss’s Blog
- TechCrunch
Although I have to admit, keeping up with Arrington and his TechCrunch army of posters can be a full-time job. Beyond my feeds I keep tabs on the Hitwise blogs, especially Bill’s. And of course I never miss a post over on the Compete blog!
Outside of Compete, what's your favorite web-based site analysis tool?
Not exactly a site analysis tool, but I love playing with Google Trends. I just wish they would provide some units on those charts; they certainly have the sample to do it. I really like what Quantcast has done with their dashboard look as well. I spend a fair amount of time checking our numbers against Quantcast numbers, and while I’m hot and cold on the data, the presentation and layout is great.
If you weren't working at Compete, where might you be, Jeremy? Any other big passions that you'd pursue?
That’s a tough one. I’m definitely fascinated by some of the work going on in the semantic and next-gen search space. Powerset, Hakia, and Mahalo all have some pretty interesting technology/concepts. It’s hard to say whether any of them will succeed in the long run, but it would definitely be very fun to be working on the cutting edge of search. Beyond the web, skiing and cycling are my biggest passions in life. If I could somehow get paid to ski powder 6 months a year and ride bikes the other 6, that would be pretty cool! I’m not holding my breath.
Thanks a ton for the interview, Jeremy - it's great to have such openness from Compete.
Timely! I bought my first Compete credits the other day. One request, having bought a few reports, is to be able to see some trend data in there (e.g. trends in kw / referrer volumes for competitive sites).
Keep your eyes on Compete, Will. We will be rolling out some small examples of trended data in our next release, and then will be adding some full-blown trended tools at the keyword level shortly after that this year.
Fantastic. Looking forward to it.
I agree - that could be very useful.
Great interview, and I thought Jeremy was very forthcoming in comparison to some of the SE folks you've interviewed. However, just seeing the inaccuracy of the data, I would be more than hesitant to use the service.
They sure make pretty pictures though.
They do make pretty graphs, don't they? I guess that's part of the package. Maybe the better the graphs look, the more likely you are to believe that the results are credible, reliable, and accurate.
Good marketing, but just shows that we have to be responsible and always question everything.
Not only do I like the look of the graphs, I really enjoy using the tools - they are easy to use. This is my favorite tool in terms of usability and positive user experience (no, I don't use their tools often, but I just can't stop browsing the site once I am there).
Rand - you seem to have got an exclusive on Compete's rebrand as a site about domestic animals!
Seriously though, nice interview and one I need to re-read to digest. Can I second Jeremy's point about the Hitwise blogs? Although, as a Brit, I prefer Heather's (though she's gone rather quiet since moving to Japan).
I do like Compete's Search Analytics tool but I still feel like we're in the dark ages of web traffic analytics. There's just nothing out there that's really reliable, even in relative terms, as Rand's example demonstrates.
Is it something where you can say (like Avinash Kaushik suggests doing), well, I don't have full confidence in the data, and I don't even have 50% confidence --
But maybe I have 10% confidence, and I can still make a (small) decision based on that.
Regardless of the unknowns, we still have to make decisions. Better to make them on incomplete data than nothing at all.
The argument could be made that the relatively weak web data is still much better than data for other media like newspapers, magazines, and TV.
Respectfully, I disagree. Making decisions that involve committing thousands of dollars based on incorrect data is by far worse than knowing that you don't know.
I'm not saying that Compete's data is incorrect as I don't have personal experience with it, but for me there is a level of accuracy that would have to exist for it to be considered in what are often costly decisions.
Yes, to some extent ... which is why SEO is about more than just numbers, scripts, and even best practices ... there is also experience and gut.
But I think it is important to keep in mind that most data is questionable, to some extent ... it often comes down to knowing the right questions to ask, but also that the web and the search industry probably provides more data accessibility to anyone than just about any other industry.
Companies spend thousands, even millions, on marketing and advertising with comparable or even less valuable data. Running a billboard ad will easily run $20K plus in an average market, based solely on location, size, and eyeballs.
I'm totally with you, though, on wanting to see the data continue to get even more accurate.
Of course, also look at how much weight people put on Google backlink counts, Alexa data, Keyword Discovery or Wordtracker.
Absolutely ... the joy of working in the web space is that we get to complain about the flaws in the data we have access to. When you think about this, it is truly a luxury. We expect a lot in our world. Most advertisers in traditional media would kill for the equivalent level of data offline. Think about how much weight goes into Nielsen TV ratings. You want to talk about a flawed system of data...
I agree... better something than nothing. And this is like 1000% confidence versus Alexa.
This was a great interview and a nice inner glimpse. I especially like Compete's goal of opening up this information to the greater bulk of the web community... whether that be SEO/web firms or site owners themselves, many of whom simply don't have deep enough pockets for some of the existing services.
The Search Analytics view into keywords would be especially valuable as a complement to some of the keyword databases, which are probably not nearly as ideal as any of us would like to hope.
One of the concerns I would have is whether there is anything that helps isolate data sources. I like that the data comes from multiple sources, but does that also mean that numbers could be overstated due to double or triple counting?
I'm also glad to see Compete being as "open" as they are on the data sources, even giving an approximate number of the panel size. But, when we start talking panels and toolbars, I can't help but wonder about the skew one direction or another around demographics... highly techy, non-techy, male to female, or narrow age groups.
Either way, it's one more snapshot of information to compare against all other data. And like all search/web data, it must always be taken with a grain of salt and questioned.
We have regular checks in place to identify the same user showing up through multiple sources. It's actually fairly simple to identify since we have timestamps and URL streams. When we see identical data feeds coming from two different data sources, we can remove one from the final dataset so we don't double count.
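For what it's worth, a duplicate-feed check along these lines could be sketched as below: fingerprint each feed's ordered (timestamp, URL) stream and drop exact repeats. The structure is my assumption for illustration, not Compete's actual pipeline.

```python
import hashlib

def stream_fingerprint(clicks):
    """clicks: ordered (iso_timestamp, url) pairs for one feed."""
    h = hashlib.sha256()
    for ts, url in clicks:
        h.update(f"{ts}|{url}\n".encode())
    return h.hexdigest()

def dedupe_feeds(feeds):
    """feeds: {(source, panelist_id): click stream}. Keeps the first
    feed seen for each identical fingerprint, dropping duplicates that
    arrive through a second data source."""
    kept, seen = {}, set()
    for key, clicks in feeds.items():
        fp = stream_fingerprint(clicks)
        if fp not in seen:
            seen.add(fp)
            kept[key] = clicks
    return kept

feeds = {
    ("toolbar", "p1"):  [("2007-11-05T09:30", "http://a.com")],
    ("isp_feed", "x9"): [("2007-11-05T09:30", "http://a.com")],  # same person
}
print(len(dedupe_feeds(feeds)))  # 1 -> the duplicate feed was dropped
```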
Excellent, and thanks for the follow-up here, Jeremy.
I was thinking or at least hoping there would be something in place. Of course there must be reasonable expectations that there will be some overlap that slips through the cracks -- reality is, there are few if any perfect sciences out there.
I've definitely been wanting to take a deeper, closer look, so this is a great prompter.
It may well be useful for US sites but from my perspective Compete's data on Australian .AU sites is so bad that it makes Alexa look like it offers quality data (and we all know that it doesn't)
Here is an even better example to compare Compete's accuracy results...
Here are the public stats for 1/4 of the Technorati Top 100 Bloggers
Clearly Compete.com has met a need in the market. I have several clients that are too small to afford a Hitwise contract but desperately need good competitive intel. Compete.com has helped plug that hole and I look forward to them continuing on with their development plans. I bought credits from them because I want to make sure my money helps them continue to grow which will in turn help my firm grow as well. Despite its shortcomings, clients love this kind of data!
What this tells us is to keep looking at those results as trends and not as exact numbers. So we can make decisions based on trends in the data we gather from several data tools, plus other related information in the industry. I believe it will keep improving, though, toward much more exact data.
Great interview. I've just been digging into Compete in the past week or so. I have to say that the sites I'm familiar with are showing about double the traffic across the board in Compete compared to what they really receive - that is, Unique Visitors as reported by Google Analytics vs. People Count as reported by Compete. However, the traffic trends seem to match up fairly closely.
Also, the cutoff of 20-25,000 Unique Visitors seems way off. I looked at one of our sites that has less than 1,000 uniques a month, and it is ranked in the 650,000 range on Compete.
Still, like others have said, some data is better than no data at all; we just can't make accurate predictions about real traffic levels, but we certainly can compare sites to get an idea.
Sorry, I should have been more explicit about the 20-25K number. Essentially, we have very high confidence in our ability to estimate down to the 20 to 25K range. This takes you roughly down to the 100,000th site as ranked by UVs. Beyond this, we have reasonable confidence in our estimates down to the millionth site, which gets roughly 1,000 UVs per month. There will be a bit more volatility in the numbers from 20K down to 1K in terms of UV estimates, but for the majority of those sites we feel directionally confident. On a monthly basis we actually receive data on something closer to 6 million domains. I hope this helps clarify a bit.
Thanks Rand for this interview and great examples. I have used Compete and I like it, plus it's relatively cheap. Much like toolbar PR, however, this should really be used more for entertainment and estimation purposes than planning trillion dollar budgets, as pointed out above.
I think that the comparison chart is obviously a no-no, since one site could skew upwards and the other downwards by 50% and you could get really confused.
Very cool interview by the SEOmoz guys, and Jeremy's thoughts were very well explained :)