How to Set Up Meaningful (Non-Arbitrary) Custom Attribution in Google Analytics

Attribution modeling in Google Analytics (GA) is potentially very powerful in the results it can give us, yet few people use it, and those that do often get misleading results. The built-in models are all fairly useless, and creating your own custom model can easily dissolve into random guesswork. If you’re lucky enough to have access to GA Premium, you can use Data-Driven Attribution, and that’s great—but if you haven't got the budget to take that route, this post should show you how to get started with the data you already have.

If you've read up on attribution modelling in the past, you probably already know what’s wrong with the default models. If you haven’t, I recommend you read this post by Avinash, which outlines the basics of how they all work.

In short, they’re all based on arbitrary, oversimplified assumptions about how people use the internet.

The time decay model

The time decay model is probably the most sensible out of the box, and assumes that after I visit your site, the effect of this first visit on the chance of me visiting again halves every X days. The below graph shows this relationship with the default seven-day half-life. It plots "days since visit" against "chance this visit will cause additional visit." If it takes seven days for the repeat visit to come around, the first visit's credit halves to 25%. If it takes 14 days for the repeat visit to come around, the first visit's credit halves again, to 12.5%. Note that the graph is stepped—I'm assuming it uses GA's "days since last visit" dimension, which rounds to a whole number of days. This would mean that, for example, if both visits were on the day of conversion, neither would be discounted and both would get equal credit.

There might be some site and userbase out there for which this is an accurate model, but as a starting assumption it’s incredibly bold. As an entire model, it’s incredibly simplistic—surely we don’t really believe that there are no factors relevant in assigning credit to previous visits besides how long ago they occurred? We might consider it relevant if the previous visit bounced, for example. This is why custom models are the only sensible approach to attribution modelling in Google Analytics—the simple one-size-fits-all models are never going to be appropriate for your business or client, precisely because they’re simple, one-size-fits-all models.

Note that in describing the time decay model, I’m talking about the chance of one visit generating another—an important and often overlooked aspect of attribution modelling is that it’s about probabilities. When assigning partial credit for a conversion to a previous visit, we are not saying that the conversion happened partly because of the previous visit, and partly because of the converting visit. We simply don’t know whether that was the case. It could be that after their first visit, the user decided that whatever happened they were going to come back at some point and make a purchase. If we knew this, we’d want to assign that first visit 100% credit. Or it might be that after their first visit, the user totally forgot that our website existed, and then by pure coincidence found it in their natural search results a few days later and decided to make a purchase. In this case, if we knew this, we’d want to assign the previous visit 0% credit. But actually, we don’t know what happened. So we make a claim based on probabilities. For example, if we have a conversion that takes place with one previous visit, what we’re saying if we assign 40% credit to that previous visit is that we think that there is a 40% chance that the conversion would not have happened without the first visit.

If we did think that there was a 40% chance of a conversion being caused by an initial visit, we’d want to assign 40% credit to “Position in Path” exactly matching “First interaction” (meaning visits that were the user's first visit). If you want to use “Position in Path” as your sole predictor of the chance that a visit generated the conversion, you can. Provided you don’t pull the percentages off the top of your head, it’s better than nothing. If you want to be more accurate, there’s a veritable smorgasbord of additional custom credit rules to choose from, with any default model as your starting point. All we have to do now is figure out what numbers to put in, and realistically, this is where it gets hard. At all costs, do not be tempted to guess—that renders the entire exercise pointless.

Tested assumptions

One tempting approach is simply to create a model based to a greater or lesser extent on assumptions and guesswork, then test the conclusions of that model against your existing marketing strategy and incrementally improve your strategy in this manner. This approach is probably better than nothing for improving your market strategy, and testing improvements to your strategy is always worthwhile, but as a way of creating a realistic attribution model this starting point is going to set you on a long, expensive journey.

The ideal solution is to do this process in reverse—run controlled experiments to build your model in the first place. If you can split your users into representative segments, then test, for example,

the effect of a previous visit on the chance of a second visit
the effect of a previous non-bounce visit on the chance of a second visit
the effect of a previous organic search visit on the chance of a second visit

and so on, you can start filling in your custom credit rules this way. If your tests are done well, you can get really excellent results. But this is expensive, difficult, and time consuming.

The next-best alternative is asking users. If users don’t remember having encountered your brand before, that previous visit they had probably didn’t contribute to their conversion. The most sensible way to do this would be an (optional but incentivised) post-conversion questionnaire, where a representative sample of users are asked questions like:

How did you find this site today?
Have you visited this site before?

If yes:

How many times?
How did you find it?
Did this previous visit impact your decision to visit today?
How long ago was your most recent visit?

The results from questions like these can start filling in those custom credit rules in a non-arbitrary way. But this is still somewhat expensive, difficult and time-consuming. What if you just want to get going right away?

Deconstructing the Data-Driven Attribution model

In this blog post, Google offers this explanation of the Data-Driven Attribution model in GA Premium:

“The Data-Driven Attribution model is enabled through comparing conversion path structures and the associated likelihood of conversion given a certain order of events. The difference in path structure, and the associated difference in conversion probability, are the foundation for the algorithm which computes the channel weights. The more impact the presence of a certain marketing channel has on the conversion probability, the higher the weight of this channel in the attribution model.The underlying probability model has been shown to predict conversion significantly better than a last-click methodology. Data-Driven Attribution seeks to best represent the actual behaviour of customers in the real world, but is an estimate that should be validated as much as possible using controlled experimentation.” (my emphasis)

Similarly, this paper recommends a combination of a conditional probability approach and a bagged logistic regression model. Don't worry if this doesn't mean much to you—I’m going to recommend here using a variant of the much simpler conditional probability method.

I'd like to look first at the kind of model that seems to be suggested by Google's explanation above of their Data Driven Attribution feature. For example, say we wanted to look at the most basic credit rule: How much credit should be assigned to a single previous visit? The basic logic outlined in the explanation from Google above would suggest an approach something like this:

Find conversion rate of new visitors (let’s say this is 4%)
Find conversion rate of returning visitors with one previous visit (let’s say this is 7%)
Credit for previous visit = ((7-4)/7) = 43%

To me, this model is somewhat flawed (though I’m fairly sure that this flaw lies in my application of Google’s explanation of their Data-Driven Attribution rather than in the model itself). For example, say we had a large group of repeat visitors who were only coming to the site because of a previous visit, but that were converting poorly. We’d want to assign credit for these (few) conversions to the previous visits, but the model outlined above might assign them low or negative credit; this is because even though conversions among this group are caused by previous visits, their conversion rate is lower than that of new visitors. This is just one example of why this model can end up being misleading.

My best solution

Figuring out from our data whether a repeat visitor came because of a previous visit or independently of a previous visit is hard. I’ll be honest: I don’t know how Google does it. My best solution is an approximation, but a non-arbitrary one. The idea is using the percentage of traffic that is either branded or direct as an indicator for brand familiarity. Going back again to how much credit should be assigned to a single previous visit, my solution looks like this:

Calculate the percentage of your new visitor traffic is direct, branded organic or branded PPC (let’s say it’s 50%)

Note: Obviously most of your organic is (not provided), so I recommend multiplying your total organic traffic by the % of your known keyword traffic that is branded. As (not provided) approaches 100%, you’ll have to use PPC data to approximate your branded organic traffic levels.

Calculate the percentage of your 2nd-time-visitor traffic is direct, branded organic or branded PPC (let’s say it’s 55%)
Based on the knowledge that only 50% (in this case) of people without previous visits use branded/direct, approximate that without their first visit we’d only have seen (100%-55%)*(100/50)=90% of these 2nd time visitors.
Given this, 10% of visitors came because of a previous visit, so we should assign 10% credit for 2nd time visits to the first visit.

We can use similar logic applied to users with 3+ visits to calculate the credit deserved by “middle interactions”.

This method is far from perfect—that’s why I recommended two others above it. But if you want to get started with your existing data in a non-arbitrary way, I think this is a non-ridiculous way to get started. If you’ve made it this far and you have any ideas of your own, please post them in the comments below.

Comments 16

Please keep your comments TAGFEE by following the community etiquette.

E-mail me when new comments are posted

Sort by:

Comments are closed on posts more than 30 days old. Got a burning question? Head to our Q&A section to start a new conversation.

Christophe BENOIT

2014-03-10T03:10:56-07:00

The link to Avisnash post is broken (:/ are missing in the url). The good one is https://www.kaushik.net/avinash/multi-channel-attribution-modeling-good-bad-ugly-models/

4 0

 The link to Avisnash post is broken (:/ are missing in the url). The good one is https://www.kaushik.net/avinash/multi-channel-attribution-modeling-good-bad-ugly-models/ 
Cancel
- Tom Capper
 
 2014-03-10T03:30:10-07:00
 
 Thanks Christophe - should be fixed soon.
 
 2 0
 
 Thanks Christophe - should be fixed soon. 
 Cancel
 - Keri Morgret
 
 2014-03-10T03:40:46-07:00
 
 Fixed!
 
 2 0
 
 Fixed! 
 Cancel
Amit Roy

2014-03-10T18:17:21-07:00

Hey Tom,

I used the attribution modeling to figure out the conversion path and to view the recency of visitors. But I really liked the tips you discussed at the closing note of your post. I'm gonna try it and see if it really yields the required results. I had analyzed the data pulled from the analytics through Pearson Correlation to figure out relations if any. It works too. Attribution becomes quiet important as one needs to figure out what's performing and what's not for a marketer. It helps to gauge the performance of multiple channels and plan out for necessary actions needed.

1 0

 Hey Tom, I used the attribution modeling to figure out the conversion path and to view the recency of visitors. But I really liked the tips you discussed at the closing note of your post. I'm gonna try it and see if it really yields the required results. I had analyzed the data pulled from the analytics through Pearson Correlation to figure out relations if any. It works too. Attribution becomes quiet important as one needs to figure out what's performing and what's not for a marketer. It helps to gauge the performance of multiple channels and plan out for necessary actions needed. 
Cancel
Kamil Storc

2014-03-11T01:20:11-07:00

Very usefull

2 1

 Very usefull 
Cancel
Amit Roy

2014-03-10T18:12:08-07:00

Hey Tom,
I used the attribution modeling to figure out the conversion path and to view the recency of visitors. But I really liked the tips you discussed at the closing note of your post. I'm gonna try it and see if it really yields the required results. I had analyzed the data pulled from the analytics through Pearson Correlation to figure out relations if any. It works too. Thanks for such tips and links you have shared!!

1 0

 Hey Tom, I used the attribution modeling to figure out the conversion path and to view the recency of visitors. But I really liked the tips you discussed at the closing note of your post. I'm gonna try it and see if it really yields the required results. I had analyzed the data pulled from the analytics through Pearson Correlation to figure out relations if any. It works too. Thanks for such tips and links you have shared!! 
Cancel
Alice Lee

2014-03-10T09:38:32-07:00

Hi Tom, awesome post.

After you find out the % of value that should be assigned to the 1st visit, do you simply take the volume distribution of "assisted conversion" for each marketing channel and divide the total attributable value up between the channels accordingly?

1 0

 Hi Tom, awesome post. After you find out the % of value that should be assigned to the 1st visit, do you simply take the volume distribution of "assisted conversion" for each marketing channel and divide the total attributable value up between the channels accordingly? 
Cancel
- Tom Capper
 
 2014-03-17T03:18:52-07:00
 
 GA will do all that for you - all you need to do is figure out the percentage you want to use and enter it into the model.
 
 1 0
 
 GA will do all that for you - all you need to do is figure out the percentage you want to use and enter it into the model. 
 Cancel
JamesHalloran

2014-03-10T08:59:29-07:00

Hi Tom,

Using your PPC data is definitely a good idea for connecting the dots. But what about people that aren't very PPC-heavy? I know some non-profits have had a hard time figuring out the effectiveness of their site visits since most of the organic traffic went "Not Provided." Any thoughts on that?

1 0

 Hi Tom, Using your PPC data is definitely a good idea for connecting the dots. But what about people that aren't very PPC-heavy? I know some non-profits have had a hard time figuring out the effectiveness of their site visits since most of the organic traffic went "Not Provided." Any thoughts on that? 
Cancel
- valetseo
 
 2014-03-10T10:23:26-07:00
 
 One option would be to use Google and Bing Webmaster tools search query data to establish a percentage of brand versus non-brand traffic to the site by look at the Click data in those reports.
 
 2 0
 
 One option would be to use Google and Bing Webmaster tools search query data to establish a percentage of brand versus non-brand traffic to the site by look at the Click data in those reports. 
 Cancel
- Tom Capper
 
 2014-03-17T03:20:44-07:00
 
 Hi James. That's a tough situation - I'd have to recommend running a small PPC experiment in that situation, unfortunately.
 
 1 0
 
 Hi James. That's a tough situation - I'd have to recommend running a small PPC experiment in that situation, unfortunately. 
 Cancel
vep.carl

2014-03-10T01:25:52-07:00

Been exploring ANALYTICS but never done this before. Thank you for sharing tom.
And by the way, correct me if I'm wrong tom. Does the "paid search" in the above image .that you are tracking
May refer as?
- Adwords
- Adsense
And can we scrape this thing either excel or any platforms.

vep.carl edited 2014-03-10T01:27:26-07:00
1 0
 Been exploring ANALYTICS but never done this before. Thank you for sharing tom. And by the way, correct me if I'm wrong tom. Does the "paid search" in the above image .that you are tracking May refer as? <ul><li>Adwords</li> <li>Adsense</li> </ul> And can we scrape this thing either excel or any platforms. 
Cancel
- Tom Capper
 
 2014-03-10T02:20:33-07:00
 
 Hi Adam. It refers to Adwords. What exactly would you want to scrape?
 
 1 0
 
 Hi Adam. It refers to Adwords. What exactly would you want to scrape? 
 Cancel
 - vep.carl
 
 2014-03-10T04:12:22-07:00
 
 Ah I see. Actually i was exactly referring to Google ppc ad results for a particular keyword. Is it possible tom?.
 
 1 0
 
 Ah I see. Actually i was exactly referring to Google ppc ad results for a particular keyword. Is it possible tom?. 
 Cancel
 - Tom Capper
 
 2014-03-10T04:14:45-07:00
 
 I believe it is possible, yes, though I'm not sure it'd be useful in this context.
 
 2 0
 
 I believe it is possible, yes, though I'm not sure it'd be useful in this context. 
 Cancel
 - vep.carl
 
 2014-03-10T04:17:05-07:00
 
 Yes, that's why I am confused regarding with this context of yours. I just wonder if it is available in Analytics. Anyways thanks Tom :)
 
 1 0
 
 Yes, that's why I am confused regarding with this context of yours. I just wonder if it is available in Analytics. Anyways thanks Tom :) 
 Cancel

Post Analytics