For the past couple weeks, I've been chatting over email with the folks at Compete.com about their web popularity reporting & analytics tools. Luckily enough, Jeremy Crane, the Director of Search & Online Media for Compete, agreed to an interview. Below, you can learn a lot more about how Compete gathers data, where they struggle, where they succeed and what the future of third party visitor analysis may hold.

Compete.com's Homepage

For our readers who may not be familiar with Compete, can you explain the service and its goals and give us a brief background on yourself and your position?

Compete is a competitive web analytics company. We’ve been around since 2001. There are actually a number of folks in the web space who think of us as a relatively new startup, but the reality is that we’ve been doing custom web analytics work for top tier brands for nearly 7 years now. The predominant driver of this impression is the launch of compete.com back in November 2006. Compete.com is by far the most visible part of Compete, but really only represents a small piece of what we do. We do have big plans for compete.com though and intend it to be a much larger piece of our overall business.

There were four main drivers behind our launch of compete.com: unmet market need, improved visibility for Compete, greater transparency into the data, and an effort to give something back to the community driving our data.

The core driver of our launch of compete.com a year ago was an unmet market need. There are millions of people running businesses on the web or at the very least highly dependent upon their website to drive their business. Up until the launch of compete.com there really was no reliable and consistent source of competitive web analytics for the average marketer. The existing companies in the space, Netratings, ComScore, and Hitwise, really were only servicing the top 1,000 or so companies in the space. As we all know, the beauty of the web economy is that all those millions of players outside of the top 1,000 can have a voice. We felt those millions needed a better way to understand their web business.

As for my own role with Compete… I’ve been with Compete for just under 3 years now. I joined Compete in our Automotive Practice, helping major Automotive OEM clients to improve their online marketing efforts. About a year and a half ago I transitioned into my current role as the Director of Search and Online Media. In a nutshell I lead all of our efforts in the Search and Online Media space. This includes the development of Search Analytics on Compete.com as well as the custom client work we do with the major search engines, ad networks, and interactive agencies. I also work in a cross-functional capacity helping our Industry Vertical teams analyze and understand the impact of Search and Online Media in their particular industry space.

I noted here that Compete talks a bit about using various methods of data collection - a toolbar, ISP data, and a panel (whose size is 2 million in the US). Can you elaborate and/or be more specific about your data collection methods?

One of the things that sets us apart from the rest of our peers in the space is the fact that our data comes from multiple sources. In fact we collect data on a monthly basis from more than 10 sources including ISP data, ASP data, custom toolbars/desktop applications, and our own panel. Multiple sources of data allow us to adjust for the source bias that can exist with a single source of data; however, this also brings some complications along with it. It’s quite difficult to integrate multiple data sources, which is likely the reason no one else in the space has tried it. We only use sources of data that we feel confident are completely transparent to the consumer providing their data anonymously. Not only does each of these data sources provide different forms of data, but they also deliver the data to us in a variety of forms. Our own toolbar and application data is sent to us essentially in real time as the users traverse the web. Every click is captured anonymously and recorded in our database with a timestamp and consistent ID tag. Our partner data is typically sent to us on a daily basis in a bulk file. Every click event in that data is again cataloged in our database and saved with a timestamp and user ID. In addition to the clickstream data we have demographic and usage data for every panelist. This allows us to normalize and project the data across the multiple sources so that it is representative of the US online population.
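To make the idea of "normalizing and projecting" panel data a little more concrete, here is a minimal sketch of one common approach: weighting each panelist by demographic segment so the panel matches the population. This is not Compete's actual method, which the interview does not describe in detail; the segment names, population figures, and function names are all hypothetical.

```python
from collections import Counter


def segment_weights(panel_segments, population_by_segment):
    """Weight for each segment = people in the population / panelists in the panel.

    panel_segments: dict of panelist_id -> segment label (hypothetical labels)
    population_by_segment: dict of segment label -> population count (made-up figures)
    """
    panel_counts = Counter(panel_segments.values())
    return {seg: population_by_segment[seg] / n for seg, n in panel_counts.items()}


def projected_unique_visitors(clicks, panel_segments, weights, domain):
    """Project panel visitors to `domain` up to the population.

    clicks: iterable of (panelist_id, domain) click events
    Each panelist is counted once, no matter how many clicks they made.
    """
    visitors = {pid for pid, clicked_domain in clicks if clicked_domain == domain}
    return sum(weights[panel_segments[pid]] for pid in visitors)


# Toy panel: the younger segment is under-represented relative to the population.
panel_segments = {
    "p1": "18-34", "p2": "18-34",
    "p3": "35+", "p4": "35+", "p5": "35+", "p6": "35+",
}
population_by_segment = {"18-34": 1000, "35+": 1000}

clicks = [
    ("p1", "example.com"),
    ("p1", "example.com"),  # repeat click by the same panelist
    ("p3", "example.com"),
    ("p4", "other.com"),
]

weights = segment_weights(panel_segments, population_by_segment)
estimate = projected_unique_visitors(clicks, panel_segments, weights, "example.com")
print(weights)   # each 18-34 panelist stands in for 500 people, each 35+ panelist for 250
print(estimate)  # 750.0
```

In this toy case, two panelists visited example.com, but because one of them belongs to an under-represented segment, the projection is 750 people rather than a flat one-third of the population.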


From a product standpoint, how far along in development is Compete's service? Do you consider Compete 50% finished, 80% finished?

Is anything ever finished?  To answer that question I actually need to split up our business.  With regard to Compete Inc and our “more traditional” services I would say we’re probably half way there.  We have some incredibly valuable industry leading product offerings that help leading brands to succeed.  However, I think there’s significant room for growth ahead.  Some industry verticals are more savvy than others when it comes to analyzing web data.  We’re definitely farther along in those spaces and as a result have significantly expanded the work we do with those clients.

The second answer is with respect to compete.com. Our engineering team might tie me up in a closet for saying this but I would say we’re only about 10% of the way there on compete.com. This is not to say the current implementation is lacking in any way. The reality is that we have a laundry list of enhancements and new features we intend to roll out over the coming year and beyond. In the next couple of months you’ll be seeing some major changes to the site that we think will greatly improve what we offer in terms of tools and services. That won’t be the end by a long shot though. I think compete.com is likely to be a continual work in progress for us.

What are the major areas that you foresee Compete expanding? More data from ISPs? More toolbar users? More panelists? Other ways of collecting data?

We’re always looking to expand our data set. The larger our panel, the more we can do with it. We do a lot with our sample of 2 million but we would love to turn that into 3, 5, or even 10 million active panelists. Rest assured there are a number of people working on this right now. One of the primary focuses for us on this front is maintaining the transparency into where we get our data and what we do with the data. In addition to expanding our current partnerships we are working on a number of new ways to expand our panel by offering consumers value in exchange for their data. Data is currency and we’re constantly looking for ways to help consumers realize and benefit from this. Unfortunately, I can’t be too specific on exactly where we’re headed here.

Compete Search Analytics is a newer service from Compete - can you tell us a little about that product and how it's supposed to help marketers and webmasters?

Search Analytics makes competitive keyword level search referral data available to every online marketer, not just the ones with budgets in the millions. There are a number of inexpensive or even free tools available in the market to understand query level data. Word Tracker is a great example of a paid service and the major engines obviously offer a number of free tools. These kinds of tools are excellent for generating lists of potential keywords. The piece of the puzzle they lack that we provide is the actual action that resulted from the search activity. Compete’s Search Analytics data is based on search referrals. We look across all of the search activity exhibited by our panel of 2 million people and then look at where those consumers click through to. Within the tool itself you can look at this from essentially two angles. You can start with a certain keyword or phrase and the tool can provide you with a list of domains that are capturing traffic from that keyword. Alternatively you can start at the domain level and do a deep dive into the top keywords generating traffic to that domain. In addition to simply generating lists of domains we take it a step further and provide a number of metrics to help marketers determine the relative value of different keywords beyond simple volume of referrals. Along with the data itself we spent a lot of time developing a pricing structure that would allow marketers to extract the maximum benefit from the tools while maintaining a budget appropriate to their business. There are a few other competitive tools available in the marketplace that provide referral level data (Hitwise and comScore); however, their pricing structures allow only the top tier marketers with the highest budgets to utilize the data. Just like everything we do with Compete.com, we’re about opening the data to more people, not fewer. As I mentioned … this is just the start for Search Analytics. There are a lot of additions and improvements to come.
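The "two angles" described above (keyword to domains, and domain to keywords) can be sketched as two indexes built from the same list of search referral events. This is only an illustration of the idea, not Compete's implementation; the sample keywords, domains, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_indexes(referrals):
    """Build both lookups from (keyword, destination_domain) referral events."""
    by_keyword = defaultdict(Counter)  # keyword -> domains receiving the clicks
    by_domain = defaultdict(Counter)   # domain -> keywords sending the clicks
    for keyword, domain in referrals:
        by_keyword[keyword][domain] += 1
        by_domain[domain][keyword] += 1
    return by_keyword, by_domain


def top_entries(index, key, n=10):
    """Return (name, referrals, share of referrals) for the top n entries under `key`."""
    counts = index.get(key, Counter())
    total = sum(counts.values())
    return [(name, c, c / total) for name, c in counts.most_common(n)]


# Hypothetical referral events observed in a panel.
referrals = [
    ("running shoes", "shoes-a.example"),
    ("running shoes", "shoes-a.example"),
    ("running shoes", "shoes-a.example"),
    ("running shoes", "shoes-b.example"),
    ("trail shoes", "shoes-b.example"),
]

by_keyword, by_domain = build_indexes(referrals)

# Angle 1: which domains capture traffic from a keyword?
print(top_entries(by_keyword, "running shoes"))

# Angle 2: which keywords drive traffic to a domain?
print(top_entries(by_domain, "shoes-b.example"))
```

The share column is one simple example of a metric beyond raw referral volume; the interview does not say which metrics the real tool provides.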

Do you expect to be able to provide other products around competitive analysis based on your datasets in the future? Any hints as to what those might be :) ?

We’ll be expanding the footprint and toolset available on compete.com quite a bit in the coming months. We’ll be adding small improvements such as Paid and Organic search breakouts, refinement tools, and Engine splits to Search Analytics. We will also be adding some major new tools such as a Top Lists Tool and Trended Search Data. Unfortunately beyond that I can’t share too many specifics at this time. Suffice it to say, we won’t be satisfied until compete.com is the de facto industry source for the most precise competitive web analytics data available.

I noted that Matt Cutts, Danny Sullivan and myself at SEOmoz all shared traffic data recently, and yet when one looks at a Compete chart for those three, the numbers are way off (even in a relative sense).

Compete.com Data for SEOmoz, MattCutts.com & SearchEngineLand.com
As you can see above, Compete's data suggests that all three sites were virtual non-entities until June of 2007 (I wonder if that's when the SearchStatus for Firefox toolbar started reporting back data to Compete... hmm?)

Obviously, that creates a lot of uncertainty about using services like Compete (and the others - as Alexa, Quantcast, etc. don't have anything resembling accuracy either). These aren't the biggest sites on the web, but they're getting millions of visits each year and aren't even comparatively accurate. How do you reconcile that with Compete's mission and do you think the service can reach a point where you're getting traffic or rank numbers on sites like these usable for competitive analysis?

We have the best and most robust panel solution available in the US right now. If you want competitive data we truly think our data is hands down the most precise. However, the smaller the sites get, the more difficult it is for our representative sample to pick them up. We are constantly working to increase both the size of the panel and its representativeness. In general, we feel confident in our ability to estimate the traffic for the top 1 million domains. We always make an effort to address specific requests to investigate potential anomalies in our data. When we dig into the data sometimes we find projection improvements on our side and sometimes we don’t. We’re always open to having the discussion though.

We regularly dig into the data for our major clients and compare it to their internal numbers. Anecdotally we have almost always found our numbers to be extremely accurate, especially when you look at activities where it is possible to remove methodology bias. For example, think about credit card applications. This is a very measurable activity on a credit card service provider’s site. It’s something you can physically count with no cookie implications. We have found that we are generally able to estimate these types of activities with incredible accuracy. In many cases we’re only off by a few percentage points from the actual number.

Unfortunately, it’s difficult to compare different methodologies. Outside of panel based measurement there are cookie based measurement systems. As we’ve seen from a number of studies, cookie deletion is much more prevalent than one might expect. While our panel represents only a sample of the population, it is exempt from cookie deletion issues. When you consider that amongst the general population upwards of 20% of cookies get deleted on a regular basis, the ramifications are clear. Additionally, if you look at sites such as those you cite, the average visitor to these sites is likely to be more technically inclined and web savvy. I have no data to back this up but I’m guessing that segment actually deletes cookies more often than the average consumer on the web. I for one delete my cookies on average 3 times a week as I investigate various websites and Online Media, and I’m also a regular reader of all three sites. I may be the extreme but you get the idea.

(Follow Up to #7) You noted that while Compete may not have great data on smaller sites, you feel that the data is very good for the top million domains. A couple questions on that point - first, what is the unique visitors / month cutoff to be in that group and - second, I compared a couple domains whose traffic I'm familiar with (Domain1.com & Domain2.com - sorry, couldn't share these publicly, but both get 3mil+ visits per month) and found one to be off by on the order of 50% and the other by more than 20%. That's obviously still much more accurate than the data for the SEO sites, but I have a hard time imagining that the difference is just in cookie deletion. How confident are you about the data accuracy? Do you have a figure like "X% of our data on the top 1 million websites is within Y% of the real numbers"?

The cutoff value for our confidence interval is roughly 20,000 to 25,000 Unique Visitors per month. I hope this helps.

On the (Domain1.com) and (Domain2.com) question we typically see 20 to 30% as being easily attributed to cookie deletion and duplication.  50% is a little extreme.  Unfortunately I don't have a great answer other than to say it is something we could always look into a little deeper.  Perhaps we need to tweak our scaling and normalization for that site and/or similar sites.

(The following is on this subject from later in our conversation string over email)

Quick side note on Domain1.com and Domain2.com - As it turns out (unbeknownst to me) we actually did a deep dive with the Domain1.com team a few months back and reconciled our numbers with their internal numbers.  We have also had a similar conversation with the folks at Domain2.com.  I believe that one may still be in process.

Just some interesting tidbits the Compete.com and Data Ops folks shared with me when I highlighted your concern on those two sites.

The cookie deletion problem is an interesting one, too, and it brings to mind another follow-up: why not count visits? If visits are the metric used, both your data and the data of the analytics reporting from the sites themselves would match up more closely, right?

Visits is one of our standard metrics. You can get this at the domain level on Compete.com right now. We generally deliver visits as one of our standard metrics for any client work. The reality, however, is that the industry has been conditioned to focus on UVs. We often have clients that don’t understand the value of Visits vs. Unique Visitors at all.
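A toy example helps show why cookie deletion distorts Unique Visitors but leaves Visits alone in a cookie based system. The log below is invented for illustration: each entry is one visit, tagged with the real person (which a cookie based system cannot see) and the cookie ID it does see.

```python
def summarize(visit_log):
    """Return (visits, cookie_based_uniques, true_uniques) for a list of visits.

    visit_log: list of (person_id, cookie_id) tuples, one tuple per visit.
    person_id is ground truth that a cookie based system never observes.
    """
    visits = len(visit_log)
    cookie_based_uniques = len({cookie for _, cookie in visit_log})
    true_uniques = len({person for person, _ in visit_log})
    return visits, cookie_based_uniques, true_uniques


# Alice clears her cookies between every visit; Bob never does.
visit_log = [
    ("alice", "c1"),
    ("alice", "c2"),
    ("alice", "c3"),
    ("bob", "c4"),
    ("bob", "c4"),
]

visits, cookie_based_uniques, true_uniques = summarize(visit_log)
print(visits)                # 5
print(cookie_based_uniques)  # 4
print(true_uniques)          # 2
```

The visit count is 5 either way, while the cookie based unique count (4) is double the real number of people (2), which is why comparing visits tends to remove one source of disagreement between methodologies.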

FYI in case you hadn’t seen it, comScore pulled together a decent study on cookie deletion last year - https://www.comscore.com/press/release.asp?press=1389

Following up on the previous question - one of my big concerns for Compete would be attracting the tech world elite, including bloggers, journalists and pundits and turning them into raving fans. Is this part of Compete's strategy and if so, do you worry that the difficulty in tracking blog traffic makes it hard to "sell" these folks on the service?

Yes, absolutely. Turning the tech/web world elite into fans is a HUGE focus of ours. The key to this is getting them to actually join the Compete community and contribute their anonymous data to the overall panel. Every additional tech elite that we get to join our panel gets us one step closer to the tail and a more accurate representation of that segment’s web behavior. This is a bold undertaking, however, since this group is often more reluctant to share that kind of data. As a response we’re creating a number of tools and plugins that we think will add enough value for the tech elite to join and contribute their clicks. It’s truly amazing how many of the tech elite enable click tracking on their Google Toolbars, and for what … Pagerank? We think we have significantly more to offer than Pagerank.

You mentioned that Compete will attempt to verify and reconcile its numbers against what many of the larger sites listed are reporting. Can you talk about that process and what's entailed? This seems like something that might give Compete a real competitive edge - do you publicize when these connections are complete? How many have you done over the past year?

The process is somewhat ad hoc. Essentially, when a company or site owner contacts us about discrepancies between their data and our own, or even between our data and another third party source, we look at the variation to determine if the gap is substantial or not. If it does look substantial we dive into a couple of areas. First we look across our 10+ different sources of data to see if there are any panel bias issues. For instance, does a certain type of panel have abnormally low statistics for the given site? If we do see this then we can apply an adjustment to our normalization process that accounts for it. If this does not address the issue we then take a look at external factors that might explain it.

In most cases if we can’t identify a normalization adjustment we can generally find an external factor that at least rationalizes the issue. At the end of the process there is really no “public” announcement or anything of that kind. Over the past year or so we have probably gone through this process for at least 50 domains. The bottom line is that we don’t simply change the numbers to match the site’s internal numbers. We look for a logical explanation as to why our projection and normalization process is generating different values and adjust as necessary.
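The first step of the process described above, checking whether one data source reports abnormally low numbers for a site, could be sketched roughly as follows. The interview does not say how Compete defines "abnormally low", so the median comparison, the 0.5 threshold, the source names, and the rates here are all assumptions made for illustration.

```python
from statistics import median


def flag_low_sources(visit_rates, threshold=0.5):
    """Flag sources whose visit rate for a site is far below the typical source.

    visit_rates: dict of source name -> share of that source's panelists
                 who visited the site in the period.
    threshold:   a source is flagged if its rate is below threshold * median
                 (an arbitrary cutoff chosen for this sketch).
    """
    typical = median(visit_rates.values())
    return sorted(src for src, rate in visit_rates.items() if rate < threshold * typical)


# Hypothetical per-source visit rates for a single domain.
visit_rates = {
    "isp_a": 0.020,
    "isp_b": 0.022,
    "toolbar": 0.004,
    "own_panel": 0.019,
}

flagged = flag_low_sources(visit_rates)
print(flagged)  # ['toolbar']
```

A flagged source would then be a candidate for a normalization adjustment, in the spirit of looking for a logical explanation rather than overwriting the estimate with the site's own numbers.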

Going in a completely different direction... As a guy who's in the competitive analysis field, are there any blogs or sites that you personally read/recommend?

I’m a huge feed reader.  Right now I have about 30 feeds that I track in my reader.   I read all of these pretty religiously every week but, if I had to pick my favorites it would probably be (present company excluded of course)

Although I have to admit keeping up with Arrington and his TechCrunch army of posters can be a full time job. Beyond my feeds I keep tabs on the Hitwise blogs, especially Bill’s. And of course I never miss a post over on the Compete blog!

Outside of Compete, what's your favorite web-based site analysis tool?

Not exactly a site analysis tool, but I love playing with Google Trends. I just wish they would provide some units on those charts; they certainly have the sample to do it. I really like what Quantcast has done with their dashboard look as well. I spend a fair amount of time checking our numbers against Quantcast numbers and while I’m hot and cold on the data, the presentation and layout are great.

If you weren't working at Compete, where might you be, Jeremy? Any other big passions that you'd pursue?

That’s a tough one. I’m definitely fascinated by some of the work going on in the semantic and next gen search space. Powerset, Hakia, and Mahalo all have some pretty interesting technology/concepts. It’s hard to say whether any of them will succeed in the long run but it would definitely be very fun to be working on the cutting edge of search. Beyond the web, Skiing and Cycling are my biggest passions in life. If I could somehow get paid to ski powder 6 months a year and ride bikes the other 6 that would be pretty cool! I’m not holding my breath.

Thanks a ton for the interview, Jeremy - it's great to have such openness from Compete.