If you're a Google Analytics fan, you probably already know that Google released a new and incredibly useful featured called Weighted Sort. If you haven't seen it, here's a quick example – let's say you want to know which of your referring sites have the highest bounce rate. You could pull up your referrers, sort by bounce rate, and get something like this:
Fascinating, right? I now know that I lost 7 visitors due to 5 sites. If I could just get that bounce rate down to 60%, I'd have 3 more visitors. Wow. What did you really want to know, intuitively? Probably something more like this:
That's better – it's not the absolute highest bounce rate you wanted to know about, but the most important high bounce rate referrers. In a nutshell, that's the question weighted sort tries to answer.
How It Works
So, how does weighted sort work, exactly? Avinash Kaushik wrote a fascinating and very transparent post on the method behind Google's weighted sort algorithm. I encourage you to read his post and I don't want to copy it, but I'll try to do a very basic review here.
Google uses something called the "Estimated True Value" (ETV). ETV essentially says this – if the count column of the sort (in this case, Visits) is very low, assume that the column of interest (Bounce Rate) is roughly the average for the data in question. In other words, if a row has 1 visit and the average bounce rate is 75%, then set the ETV of bounce rate for that row to 75%. Since 1 visit isn't enough, statistically speaking, to make any really conclusions, we'll essentially ignore it.
On the other end of the spectrum, if you have a very high visit value, assume the bounce rate is accurate as is. Simple enough, right? What about values in the middle? Well, Google sets the ETV somewhere in between the average and the row's bounce rate. Exactly how much of each they use is the tricky part.
The Equation
This is where Avinash's post ends and mine really begins. I should warn you – it's not going to get Ben-complicated, but there is going to be some math. After a bout of 4am insomnia, I pieced together a simplified weighted sort equation. I'm going to present it first, explain it, and then provide an Excel spreadsheet with some real-life examples.
Let's assume we've got a data set exactly like above – visit counts and bounce rates for a set of referring sites. We're going to need 4 sets of variables:
- V = Visits for Row X
- B = Bounce Rate for Row X
- MV = Max Visits for the data set
- AB = Average (mean) Bounce Rate for the data set
For any given row, the ETV of Bounce Rate – ETV(B) – can be represented by the following equation:
ETV(B) = (V / MV * B) + ((1 - (V / MV)) * AB)
Crystal clear, right? It's not really as bad as it looks. Let's take an example – say we have the following data (same 4 variables as above):
- V = 100
- B = 80%
- MV = 500
- AB = 60%
The ETV(B) will consist of two components:
- V / MV * B = 100 / 500 * 0.80 = 0.20 * 0.80 = 0.16
- 1 - (V / MV) * AB = (1 - (100 / 500)) * 0.60 = 0.80 * 0.60 = 0.48
- ETV(B) = 0.16 + 0.48 = 0.64
Pay attention to the parts in bold – since 100 visits is 20% of the max visits for this data set, this row gets 20% of its bounce rate from the actual value and the rest (80%) from the average value for the data set. So, essentially, how much we use the "real" bounce rate for the row is a function of the proportion of that row's visit value to the visit value of the top referrer.
Build Your Own
Want to try it yourself? You can download my Excel spreadsheet and see the formula at work across a larger data set of actual referring visits from my own site. Although this replicates a function you already have in Google Analytics, it can be used for all sorts of applications that you don't have in GA, including PPC metrics (Visits by Quality Score, for example).
There are actually four sheets in the Excel workbook:
- Basic ETV formula
- Google's ETV sort
- Weighted ETV formula
- Log-based ETV formula
Those last two require a bit of explaining. In my very simple model (1), I calculate the average bounce rate by just taking an average across all the rows (for this data set = 70.6%). The thing is, that's not how Google calculates the average bounce rate. They actually weight it by the number of visits, which makes perfect sense. So, in Google Analytics, my bounce rate for this data set is 74.6%, which is what (3) shows. If you compare (2) to (3), you'll see that my weighted formula only differs in the Top 10 by rows #8 and #9 being swapped.
My approach is a pretty good approximation for this data set, but it's still just an approximation. If you have a very large range of visit values (1 to 100,000), you might find that rows with smaller but still interesting counts (1,000+) get unfairly ignored. Sheet (4) is a more complex formula that uses the Log (base 2) of visits instead of the raw visit value. This has the effect of de-emphasizing the visit count in favor of the "real" bounce rate for that row.
If you're still with me at this point, I hope you'll play around with the spreadsheet. If you find issues with your own data sets or discover some better/cooler way of doing it, please share it in the comments.
When Dr Pete sent me a working copy of this spreadsheet a week or so ago, I said: "holy crap!".
I love it when I learn soemthing new, particularly when it comes to Excel :-)
Thanks Dr Pete!
Hey Pete, as I was reading your post my eyes started getting blurry as you broke out the math. I think I'd better tackle this in the morning after coffee.
And while I am definitely a Google Analytics fan I'm probably the only one reading that didn't know about the new Weighted Sort [sigh] If I could only find a way to add 34 more hours to my days, I'm sure I'd stay on top of everything.
You know, I'm on Twitter and industry blogs so much that I just assume everyone knows about everything instantly, but some people have real jobs and a social life ;) I'm sure plenty of people haven't checked it out yet, as you don't really even see Weighted Sort until you click on an applicable sort column. It's well worth checking out, though.
I can say yes to the real job(s) but what's that other thing you mentioned...what's it called again? Oh yeah. A social life. Not sure what that is. Perhaps you could do a post about it?
no point on posting about Social Life. Everybody will just argue about how you calculate the ROI!
and dont worry, I hadnt seen "weighted sort" before I read this either.
thanks for the post Dr P.
Same for me on both the real job and "what's a social life" angle.
Hadn't heard of this either, thanks for the heads up! (and the head destroying math!)
A good intorduction to something we have needed for a long time now. No more faffing about using advanced filters to get rid of all pages with <200 vists and >1000 so that we can find actionable data - it will make my life just that little bit easier every day.
Between this and the latest AdWords Editor updates (at last I can see positions for campaigns and ads! no more flicking between online and off line! Why on earth didn't they give us this data before?) life in the future will involve a bit less fiddling to get the data and a bit more time doing something useful with it.
So True,
I was just thinking the same: 'goodbye advanced filter" at list for a few of the matrices.
Thanks for bringing this great feature to light here.
As Caseyhen and others said... great post.
I did know of this news thanks to Twitter stream... but actually did not checked it seriously until now.
I've experimented with your worksheet with the Analytics of my own site (yes, Mike is right: to learn you have to "play" with things) and actually discovered some little keywords gems that could be worth to polish.
Thanks again Peter.
PD: if you do things like this Just for fun - as you tweeted - what when are you playing hard for real?
Hey, you gotta love what you do, right? I get to work with great companies, do math for fun, make decent money, and still spend plenty of time with my baby girl. What more can I ask for?
To win the Lottery, maybe ;)?
Nice one Pete, I always learn something valuable from your posts. I'm glad that you've started blogging more frequently. Keep up the awesome work!
Golf clap for Avinash / Google Analytics team. This feature takes their product up another few notches in my eyes. I know it will save me some time and effort sorting/filtering analytics reports.
Hi, Pete,
The second component of ETV(B) formula missed a set of Parentheses. It should be
(1 - (V / MV)) * AB = (1 - (100 / 500)) * 0.60 = 0.80 * 0.60 = 0.48
BTW, "ETV(B) = V/MV*(B-AB)+AB" will be much easier to understand.
It's been so long since I looked at the formula, that it took me 10 minutes just to understand the comment. You're right on the parentheses. I've fixed that - thanks.
Great feature. Would love to be able to throw my goal conversion rate column in there but seem to loose the weighted sort when i try to.
Great post re:GA weighted sort here Dr. Pete. I really love this new feature from GA. Less time spent on analyzing tons of keyword traffic data.
So is this some insight into the next stage of the SEOmoz PRO toolkit that might automated this data analysis for you?
Just food for thought for now, but we are exploring integrating Google Analytics data into the PRO App and campaign management, so you never know.
I could imagine that you could do some really cool things once you roll out the historical Linkscape data linked in with actual campaign data
Thanks for sharing. I will be playing with your excel spreadsheet to get the hang of it.
This feature is not avaiable when using advanced segments.
Hi Peter,
Old post but still very interesting. I have been playing around with this using different count and interest columns. One question though, how should you deal with instances when the interest column is zero and the count is small? Division by zero. I guess the correct answer should lie towards the mean rather than zero.
Sorry, it's been quite a while. The only division I'm seeing/remembering is against the maximum value for the metric you're weighting ("Visits", in the spreadsheet). If the maximum value for that column was zero, then you'd basically have no data. Happy to dig deeper, if you can help me understand what you're seeing.
I bookmarked this to reread. And my head still hurts from the maths :(
Good thing that i have people who do the maths better than I working for me... Thanks for sharing the excel sheet, that dioes make it easier... :(
Thanks for this update, I'll give it go and see what I can find.
Very cool! This comes in handy in a lot of situations, thanks for putting it in a usable format :)
I did not about that feature either. It seems to be pretty useful. It is good to see that google analytics trying to help with actionable analysis of data.
Awesome, I actually didn't know about weighted sort until I read this. The excel sheet is a huge bonus too, saves me the trouble of making it myself but lets me see the equations and play around with them.
Thanks!
Whoa, Doc!
You just took me back to my high school Algebra days! Besides the point, great breakdown of how the new weighted system works. I've had the time to play with it, and I'm pretty impressed so far. But your post definitely gave me a new way to use the system. Keep up the great writing.
Talk to ya soon!
This is critical info in web buliding. Great post, even with all the math.
Good post Peter. As Avinash once said, "the problem with averages is that it gives you average results". So when you have a very large range of data set you get misleading and erroneous results like the one you specified in your post (1 to 100,000 visits). So it make sense to segment your data as much as possible before you play with it. For e.g. before you apply weighted sort, filter out branded keywords using filters on your reporting interface or better create a custom report. In this way you can minimize erroneous results one get through averages and your rows with smaller but interesting counts (1,000+) won't be ignored.
Pete,
Brilliant summary and thanks for providing a simplified model that can be used (within reason) across various datasets. What hits this out of the park for me is the Excel download. I don't know if the old-timey learning classifications still hold up but I need to hear, see and play with stuff. I need it all if I am to truly internalize the knowledge. You have given me this.
Besides, it's time I nail a handful of useful math/marketing models. I'm an authority and can hold my own across all my areas of "professional" responsibility (both internally and especially with clients), but this is my weak spot. I'll be posting back any questions tomorrow evening. Today is packed full of deliverables.
...
Oh, I guess I can ask this... Does anyone have useful models they use for digital marketing that you would be willing to share? Personally, I have some I need to dust off from Conductor, SEOmoz, Laura Lippay, etc. Perhaps we can do this by PM as to not dilute this post?
Nice GA add-on and thanks for the details and the how-to.
Pete,
Great post, I've been playing around with your worksheet and think it's great! I've tried it out on a few clients and it really made me think about some of the keyword they have been targeting. Everyone should do a quick run through with their clients and see the results. Double thumbs up!!