I've been hiding from my natural geekiness recently. My last few blog posts and my most recent presentations have all been about broad marketing ideas, things that play out well in the boardroom, and big picture "future of the industry" stuff.
Although those topics are all well and good, sometimes I need to feed the geek. And my geek lives on logic and maths (yes, I'm going to use the *s* throughout - it's how we roll in the UK and that's where I studied). One of our most recent hires in our London office is a fellow maths graduate and I've been enjoying the little discussions and puzzles.
(The last one we worked on together: in how many number bases does the number 2013 end in a "3"? Feel free to share your answers and workings in the comments.)
Rather than just purely geek out over pointless things, I have been casting my mind over the ways that mathematical ideas can help us out as marketers; either by making us better at our jobs, or by helping us understand more advanced or abstract concepts. Obviously a post like this can only scratch the surface, so I've designed it to link out to a bunch of resources and further reading. In approximate ascending order of difficulty and prerequisites, here are some of my favourite mathematical ideas for marketers:
Averaging averages
The first and simplest idea is really a correction of a common misconception. We were talking about it here in the context of some data we were visualising for a client. The problem goes like this:
Our client had data for average income broken down by all combinations of age, location, and gender (details changed to protect the innocent). We wanted to get the average income by gender.
It's tempting to think that you can do this from the data provided by averaging all the female values and averaging all the male values, but that would be incorrect. If the age or geographic distribution is not perfectly uniform by gender, then we will get the wrong answer. Consider the following entirely made up example:
- Female, 25, London - Average: 30,000 (10,000 people)
- Female, 26, London - Average: 31,000 (11,000 people)
It's tempting to say that the average for the whole group is 30,500. In fact, it's 30,524 (because of the hidden variable that there are more in the second group than the first).
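To make the difference concrete, here is a minimal Python sketch using the two made-up segments above. The function names are my own, purely for illustration:

```python
# (average income, number of people) for each segment in the example above
segments = [
    (30_000, 10_000),  # Female, 25, London
    (31_000, 11_000),  # Female, 26, London
]

def naive_average(segments):
    """Average of the averages: ignores how many people are in each segment."""
    return sum(avg for avg, _ in segments) / len(segments)

def weighted_average(segments):
    """Overall average: each segment's average weighted by its headcount."""
    total_income = sum(avg * count for avg, count in segments)
    total_people = sum(count for _, count in segments)
    return total_income / total_people

print(round(naive_average(segments)))     # 30500
print(round(weighted_average(segments)))  # 30524
```

The gap is small here because the two groups are similar in size; the more lopsided the groups, the further the naive figure drifts from the real one.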
You will often encounter this in marketing when presented with percentages. Suppose you have a campaign that made 200% ROI in month one and 250% ROI in month two. What's the ROI of the campaign to date?
Answer: anywhere in the range 200-250%. You have no idea where.
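A quick sketch of why you can't pin it down, assuming ROI is defined as profit divided by spend (so 200% is 2.0). The spend figures are hypothetical, chosen only to show the two extremes:

```python
def combined_roi(months):
    """months: list of (spend, roi) pairs, with roi as a fraction (2.0 == 200%)."""
    total_spend = sum(spend for spend, _ in months)
    total_profit = sum(spend * roi for spend, roi in months)
    return total_profit / total_spend

# Same monthly ROIs (200% then 250%), different hypothetical spend in each month
mostly_month_one = combined_roi([(9_000, 2.0), (1_000, 2.5)])
mostly_month_two = combined_roi([(1_000, 2.0), (9_000, 2.5)])

print(f"{mostly_month_one:.0%}")  # 205%
print(f"{mostly_month_two:.0%}")  # 245%
```

Without the spend in each month, the two percentages alone can't tell you which of these you are looking at.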
Try it out on this brainteaser (hat-tip @tomanthonyseo):
If I drive at 30mph for 60 miles, how fast do I have to drive the next 60 to average 60mph for the whole trip?
Correlation coefficients
Although the mathematical background can look scary, linear regression and correlation coefficients represent a relatively simple concept. The idea is to measure how closely related two variables are; think about trying to draw a "line of best fit" through an X-Y scatter chart of the two variables.
The summary of how it works is that it finds the line through the scatter chart that minimises the sum of the squared vertical distances between the points of the scatter plot and the line.
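If you are curious what the calculation looks like, here is a rough sketch of an ordinary least-squares fit and the Pearson correlation coefficient in plain Python. The data is made up and the function names are mine:

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares line of best fit: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def correlation(xs, ys):
    """Pearson correlation coefficient: -1 to 1, with 0 meaning no linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: monthly ad spend (thousands) against sign-ups
spend = [1, 2, 3, 4, 5]
signups = [12, 19, 33, 38, 53]

slope, intercept = fit_line(spend, signups)
r = correlation(spend, signups)
print(round(slope, 2), round(intercept, 2))  # 10.1 0.7
print(round(r, 3))                           # 0.989
```

A value of r close to 1 says the points sit close to an upward-sloping line; it says nothing about which variable is causing which.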
The great part is that you don't even need to dig into the mathematical details to use this technique. Excel has built in functions to help you do it - check out this YouTube video showing how to do it:
Bayes
Thomas Bayes was a mathematician who lived in the 1700s. The breakthrough he made was to come up with a way of analysing probability statements of the form:
"What's the probability of event A given that event B happened?"
Mathematicians write that as P(A|B).
Bayes discovered that P(A|B) = P(A and B) / P(B)
In plain English, that means:
"The probability of both event A and B happening divided by the probability of B happening."
And also that P(A|B) = P(B|A) * P(A) / P(B)
Which means:
"The probability of B happening given A happened, times the probability of A happening, divided by the probability of B happening"
Why is this important? It's critical to understanding the results of all kinds of tests - ranging from medical trials to conversion rate. Here's a challenge from this great explanation of Bayesian thinking:
"1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?"
If you want to dig deeper into the marketing implications, I really like this article.
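The formula above can be wrapped in a few lines of Python. This is only a sketch: the function and parameter names are mine, and it expands P(B) as P(B|A) * P(A) + P(B|not A) * P(not A), which is the usual way to get it when you know the rates for both groups. The numbers are a made-up marketing example, so the mammography challenge is still yours to solve:

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) via Bayes' theorem.

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|not A)
    """
    # P(B): B can happen either alongside A or without it
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical: 5% of visitors go on to buy; 60% of buyers viewed the pricing
# page; 10% of non-buyers also viewed it. Given a pricing page view, how
# likely is the visitor to be a buyer?
p = posterior(prior=0.05, likelihood=0.60, false_positive_rate=0.10)
print(f"{p:.0%}")  # 24%
```

Notice how far the answer is from 60%: the low base rate of buyers drags it down.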
O(n) and o(n)
One of the things I did during my maths degree was write really bad code. My lecturers suggested using either Pascal or C. C sounded like "real programming," so I chose that. It's incredibly easy to write horrible programs in C because you manage your own memory (reminding me of this programming joke).
When you think of programs failing, you tend to think of crashes or bugs that return the wrong answer. But one of the most common failings when you start hacking on real world problems is writing programs that run for ever and never give you an answer at all.
As we get easy access to more and more data, it's becoming ever easier accidentally to write programs that would take hours, days, weeks, or even longer to run.
Computer scientists use what is known as "big O notation" to describe the characteristics of how long an algorithm will take to run.
Suppose you are running over a data set of "n" entries. Big O notation is the computer scientists' way of describing how long the algorithm will run in terms of "n."
In very rough terms, O(n^2) for example means that as the size of the dataset grows, the algorithm run-time will grow more like the square of the size of the dataset. For example, an O(n) algorithm on 100 things might take 100 seconds but an O(n^2) algorithm would take 100 * 100 = 10,000 seconds.
If you're interested in digging deeper into this concept, this is a really good primer.
At a basic level, if you are writing data analysis programs, what I'm really recommending here is that you spend some time thinking about how long your program will take to run expressed in terms of the size of the dataset. Watch out for things like nested loops or evaluations of arrays. This article shows some simple algorithms that grow in different ways as the data size grows.
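As a small illustration of the nested-loop warning, here are two ways of checking a list for duplicates, each counting the work it does. It's a sketch rather than a benchmark, and it relies on Python set lookups taking roughly constant time on average:

```python
def has_duplicates_quadratic(items):
    """O(n^2): compares every pair of items using nested loops."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons

def has_duplicates_linear(items):
    """O(n): a single pass, remembering what has been seen in a set."""
    seen = set()
    steps = 0
    for item in items:
        steps += 1
        if item in seen:
            return True, steps
        seen.add(item)
    return False, steps

for n in (100, 1_000):
    data = list(range(n))  # no duplicates, so both functions do the maximum work
    _, quadratic_work = has_duplicates_quadratic(data)
    _, linear_work = has_duplicates_linear(data)
    print(n, linear_work, quadratic_work)
# 100 100 4950
# 1000 1000 499500
```

Ten times the data means ten times the work for the single pass, but roughly a hundred times the work for the nested loops.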
Nash equilibria
Using words like equilibria makes this sound scary, but it was explained in layman's terms in the film A Beautiful Mind:
"Games" are defined in all kinds of formal ways, but you can think of them as just being two people in competition, then:
"A Nash equilibrium occurs when both players can’t do any better by changing their strategies, given the likely response of their opponent."
The reason I include this bit of game theory is that it's critical to all kinds of business and marketing success; in particular, it's huge in pricing theory.
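To show the idea in a pricing setting, here is a minimal sketch that brute-forces the pure-strategy Nash equilibria of a two-player game. The payoff numbers are entirely hypothetical (two competitors each choosing a high or low price), and mixed strategies are ignored:

```python
from itertools import product

# payoffs[(choice_a, choice_b)] = (profit_a, profit_b); hypothetical figures
payoffs = {
    ("high", "high"): (10, 10),
    ("high", "low"): (2, 12),
    ("low", "high"): (12, 2),
    ("low", "low"): (5, 5),
}
strategies = ("high", "low")

def pure_nash_equilibria(payoffs, strategies):
    """Return the outcomes where neither player gains by switching alone."""
    equilibria = []
    for a, b in product(strategies, repeat=2):
        pay_a, pay_b = payoffs[(a, b)]
        a_can_improve = any(payoffs[(alt, b)][0] > pay_a for alt in strategies)
        b_can_improve = any(payoffs[(a, alt)][1] > pay_b for alt in strategies)
        if not a_can_improve and not b_can_improve:
            equilibria.append((a, b))
    return equilibria

print(pure_nash_equilibria(payoffs, strategies))  # [('low', 'low')]
```

With these numbers both players would be better off at high/high, but each has an incentive to undercut the other, so the only stable outcome is the price war.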
If you want a more pop culture example of game theory, this is incredible:
Time series
Time series is the wonkish mathematical name for data on a timeline. The most common time series data in online marketing comes from analytics.
This branch of maths covers the tools and methodologies for analysing data that comes in this form. Much like the regression analysis functions in Excel, the nice thing with time series analysis is that there are software tools to apply the hard maths for you.
One of the most direct applications of time series analysis to marketing is decomposing analytics data into the different seasonality effects and real underlying trends. I covered how you do this using software called R in a presentation a few years ago - see slides 39+:
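Separately from those slides (which use R), here is a rough Python sketch of the decomposition idea. It uses a deliberately simple method: a centred 7-day moving average as the trend, then the average leftover for each day of the week as the seasonal effect. The visit numbers are synthetic, built from a known trend plus a known weekly pattern so you can see both being recovered:

```python
def centred_moving_average(values, window):
    """Trend estimate; None where the window would run off either end."""
    half = window // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        trend[i] = sum(values[i - half:i + half + 1]) / window
    return trend

def seasonal_effects(values, trend, period):
    """Average deviation from the trend for each position in the cycle."""
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(values, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    return [sum(bucket) / len(bucket) for bucket in buckets]

# Synthetic data: 1,000 visits growing by 5 a day, plus a weekly pattern
weekly_pattern = [-30, 10, 20, 20, 10, -10, -20]  # sums to zero
visits = [1000 + 5 * day + weekly_pattern[day % 7] for day in range(28)]

trend = centred_moving_average(visits, 7)
season = seasonal_effects(visits, trend, 7)

print(trend[3:6])  # [1015.0, 1020.0, 1025.0]
print(season)      # [-30.0, 10.0, 20.0, 20.0, 10.0, -10.0, -20.0]
```

Real analytics data is noisier than this, which is where the dedicated time series tools earn their keep.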
Prime numbers/RSA
OK. I'm getting a little tenuous now. It's not so much that you actually need to know the maths behind factoring large numbers or the technical details of public key cryptography.
What I do think is useful to us as technical marketers is to have some idea of how HTTPS/SSL secure connections work. The best resources I know of for this are:
- Entry-level and very readable introduction to codes and cryptography
- A surprisingly accessible technical overview of SSL
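If you do want a feel for the prime number side, here is a toy RSA round trip in Python. It's strictly a sketch: the primes are tiny textbook values, there is no padding, and real keys use primes hundreds of digits long. The security rests on how hard it is to recover p and q from n at that size:

```python
# Toy RSA with tiny primes, for illustration only (never use sizes like this)
p, q = 61, 53
n = p * q                # public modulus: 3233
phi = (p - 1) * (q - 1)  # 3120, easy to compute only if you know p and q
e = 17                   # public exponent, chosen to share no factors with phi
d = pow(e, -1, phi)      # private exponent: modular inverse of e (Python 3.8+)

message = 65
ciphertext = pow(message, e, n)    # anyone can encrypt with the public (n, e)
decrypted = pow(ciphertext, d, n)  # only the holder of d can decrypt

print(decrypted == message)  # True
```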
Markov chains
You might have come across the concept of Markov chains in relation to machine-generated content (this is a great overview). If you want to dive deep into the underlying maths, this is a great primer [PDF]
The general concept of Markov chains is an interesting one - the mathematical description is that a Markov chain is a sequence of random variables where each variable depends only on the previous one (or, more generally, previous "n").
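Here is a minimal sketch of the machine-generated content idea: a word-level Markov chain where the next word depends only on the current one. The sample sentence and function names are mine, and real generators train on far more text:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain: each next word is picked using only the current word."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

sample = "the cat sat on the mat and the dog sat on the cat"
chain = build_chain(sample)
print(generate(chain, "the", 8))
```

Words that follow a given word more often in the source appear more often in its list, so they are proportionally more likely to be chosen.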
Google Scholar has a bunch of results for the use of Markov Chains in marketing.
It turns out that there are a bunch of great mathematical properties of Markov Chains. By removing any possibility of the outcome of the next step being dependent on arbitrary inputs (allowing only the outcomes of the most recent entries in the sequence), we get results like conditions for stationary distributions [PDF]. A stationary distribution is a probability distribution over the states that a further step of the chain leaves unchanged; when those conditions hold, the chain converges to it wherever it started - i.e. the long-run behaviour *isn't* based on previous elements in the sequence. This leads me neatly into my final topic:
Eigenvectors/Eigenvalues
OK. Now we're talking real maths. This is at least undergraduate stuff and quickly gets into graduate territory.
There is a branch of maths called linear algebra. It deals with matrix and vector computations (see MIT opencourseware if you want to dig into the details).
To follow the rest of my analogy, all you really need to know is how to multiply a matrix and a vector.
The result of multiplying appropriate vectors and matrices is another vector. When that vector is a fixed (scalar) multiple of the original vector, the vector is called an "eigenvector" of the matrix and the scalar multiplier is called an "eigenvalue" of the matrix.
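In code, that definition is only a few lines. This sketch uses plain Python lists and a small matrix I picked because its eigenvectors are easy to see; it assumes the vector you pass in is not all zeros:

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def eigenvalue_if_eigenvector(matrix, vector):
    """Return the eigenvalue if `vector` is an eigenvector of `matrix`, else None."""
    result = mat_vec(matrix, vector)
    # Work out the scale factor from the first non-zero component
    index = next(i for i, v in enumerate(vector) if v != 0)
    scale = result[index] / vector[index]
    if all(abs(r - scale * v) < 1e-12 for r, v in zip(result, vector)):
        return scale
    return None

matrix = [
    [2, 1],
    [1, 2],
]

print(mat_vec(matrix, [1, 1]))                     # [3, 3]
print(eigenvalue_if_eigenvector(matrix, [1, 1]))   # 3.0
print(eigenvalue_if_eigenvector(matrix, [1, -1]))  # 1.0
print(eigenvalue_if_eigenvector(matrix, [1, 0]))   # None
```

Multiplying [1, 0] gives [2, 1], which points in a different direction, so it is not an eigenvector.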
Why are we talking about matrices? And what do they have to do with stationary distributions of Markov chains?
Well, remember PageRank?
From a mathematical perspective, there are two models of PageRank:
- The random surfer model - where you imagine a web visitor who randomly clicks on outbound links (and randomly "jumps" to another arbitrary page with a fixed probability)
- The (dominant) eigenvector of the link matrix
You'll notice that the random surfer model is a Markov model (the probability of moving from page A to page B is dependent *only* on A).
It turns out that the eigenvector is actually the stationary distribution of the random surfer Markov chain.
And not only that. The random jump factor? Turns out that is necessary to (a) make sure that the Markov chain has a unique stationary distribution AND (b) make sure that the link matrix has a unique dominant eigenvector.
Things like this are the things that make mathematicians excited.
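To tie the two models together, here is a minimal sketch of the random surfer computed by repeated multiplication (power iteration). It is a toy, not Google's implementation: the three-page site is hypothetical, the 0.85 damping value is the commonly cited figure rather than anything from this post, and pages with no outbound links are treated as linking to every page:

```python
def pagerank(links, damping=0.85, iterations=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}  # start the surfer anywhere
    for _ in range(iterations):
        # The random jump: every page gets a small share regardless of links
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages  # dangling page: spread rank everywhere
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page site
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Each pass of the loop is one step of the Markov chain. Once the scores stop changing you have the stationary distribution, which is the same thing as the dominant eigenvector of the (jump-adjusted) link matrix.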
I appreciate that this post has been something a bit different. Thanks for bearing with me. I'd love to hear your geek-out tips and tricks in the comments.
Good Will (Critchlow) Hunting? :)
Do you like apples?
Fellow UK maths grad here, couldn't resist the first problem.
Feel like I'm missing a trick, but in my head all the bases where 2013 ends in a 3 should just be all the divisors of 2010 > 3 (so that if 2010 => 0, then 2013 => 3).
prime divisors = 2, 3, 5, 67.
Adding combinations of multiple prime divisors ( 6, 10, etc), I get an answer of 13 different bases.
Almost, you forgot that you can't have a digit equal to 3 unless you're at least in base 4. There are 11 different bases.
I think Patrick is right actually. You are right that you need to limit to bases >= 4 but still, there are 13 I think: 5, 6, 10, 15, 30, 67, 134, 201, 335, 402, 670, 1005, 2010
It's funny. Math was always something I detested in school.
Yet when it impacts money or your business it's amazing how important it becomes. I will be back to digest this post over the next few days. ;)
Andrew
I think also a short primer on sampling, standard deviations, standard errors and the use of Wald approximations of binomial distributions to calculate confidence intervals on tests would be a good addition.
Most tests in most channels require people to measure and report Net Response Rates or CTR% and people either
a) don't understand sampling
b) don't understand the theory behind the equation they use for CIs
c) or run tests till they spot a significant result (i.e. the peeking error)
this is quite a neat tool to check if the Wald approx holds (1.96 * SQRT (p(1-p)/n)) vs the other more complicated ones
https://epitools.ausvet.com.au/content.php?page=CIProportion (via aussie vet site)
Yeah - you're right. I nearly put in a section on standard deviations, but the rest of that would have been good as well.
How dare you present math problems to me early on a Monday morning, Will. I shall revisit your equations on Tuesday evening :)
Math(s), statistics, predictive analytics, Bayesian analytics and data modeling are the future of business and it's been headed that way for the last 20 years. If you were not good at math and abstract equations in school, you are going to hate the direction business is headed in the next 20.
Great post Will, now I just need to figure out what you actually said...
Great Post Will, I learned a lot.
Very good article, I have been looking for more information on this topic and in this blog I have been able to find the most relevant information. thank you very much
Well, I wasn't expecting this when I turned on my computer this morning! It's strange - I recognised that a number of markets have reached more or less a Nash equilibrium (that persists until someone throws in what Seth Godin calls a Purple Cow), and that the random surfer model is a Markov chain, but I didn't make the connection between that and eigenvectors. Maybe because I didn't expect something I studied in quantum mechanics to come into marketing...
But given that PageRank has moved on from the random surfer to the preferential surfer, can we look at it in the same way? Or do we need to think more about power laws? That would be convenient, because as marketers we are kind of used to them having dealt with the Pareto distribution.
I reckon it was the connection between Markov chains and eigenvectors that gave Google the edge - leaving others wondering how the hell Google managed to recalculate PageRank so quickly over such a huge dataset.
The change from random surfer to preferential shouldn't change things much, just changes how the probabilities are calculated (cut down the chances that links in the footer are clicked and increase chances of links above the fold in main content) but actual theory/method and implementation probably very similar.
Correct. My linear algebra is hazy, but I think that the jump probability (which still exists in reasonable surfer) is enough to ensure the matrix has an eigenvector even if you futz with the probabilities of clicking on each link on the page.
cool & fun post - for those of us who are (among other things) mathematically inclined
i have so many clients (i.e., business owners) who are 'afraid' of math (and/or convinced they have no aptitude for it) - this sort of post would have them running to the nearest seo 'expert' with open wallets and blank checks :-O
i'm far too ethical to show this blog post to them (can you say "unfair advantage?")
I have learnt so many new things today from a single blog post! Thanks, and please give us another post like this one: geeky, not the typical marketing stuff.
Marketing + Math = happy clients :)
Not yet. I didn't put it to them. I'll ask them at lunchtime.
BTW, loved the Goldenballs example of Game Theory. It clearly showed the guy on the right was a little brighter than the other guy. Would have been great if the bald guy had worked out why he was insisting on stealing and calculated that the guy probably didn't want to make himself look like a complete t**t on national TV, so was probably doing it deliberately and was probably therefore going to show "Split" (as he did), as he (the bald guy) could then have stolen anyway and then said:
"Yeah, well I knew what he was doing and he thought I thought he was so convincing that I had no choice so had to just trust him and choose "Split", but I knew that he thought I was most likely to think this way, and therefore I decided I could steal, but I'll still give him 1/2 as he was only practising Game Theory and deserves 1/2 the money for being so clever (but not quite as clever as me)."
You get into all that Level 1, Level 2, Level 3 thinking professional poker players talk about.
Three of our tech guys got your brainteaser in about 10 seconds - the answer is either A) "It's impossible because the first half of the trip takes you 2 hours, you would need to travel the whole 120 miles in 2 hours in order to average 60 mph for the whole trip, so as the first half of the trip has already taken you 2 hours, you therefore have no time left in order to go that fast" or B) Infinite Speed (if there is such a thing). I don't think the speed of light is near enough as even travelling at 299,792,458 m/s would still mean it would take a minute fraction of a second to travel 60 miles and you don't even have that!
My answer is this: the question is redundant as Tom lives and works in London, doesn't he? So the chances of him travelling for 60 miles in any direction from the capital at an average speed of 30mph is zero! :)
It was me (not Tom) but yes, I live in London. Imagine I went to a race track or something ;)
You're right with the solution though.
Did your tech guys get the 2013 question right?
I don't think anyone answered your brain teaser here. It's not possible to average 60mph! You could come close though if you were traveling the speed of light for the latter half.
I really love the Nash equilibria example. I studied Game Theory for a semester in college and never understood at the time how it could be applied to real life work. Those examples in the link provided on using it for competitive strategy are definitely interesting to think about.
Bayes Theory is definitely great for CRO testing and even strategic planning for marketing budgets and understanding what strategies work well, given you have the right data set to calculate probabilities off of. I would love to use things like this more with my current work, but it's great to know that someone else is at the least!
Whoa. My brain hurts. Reminds me how much math I've forgotten since college! I have a computer science degree, and when we started using more robust social media monitoring tools I noticed how the idea of complex boolean search queries baffled some fellow marketers. My mathematical background certainly helped me here. Trying to explain how important a good boolean query string was for effective social media monitoring led me to write a whole white paper to explain the concept. If you're interested, check it out here: The Importance of the Boolean Search Query In Social Media Monitoring Tools.
https://www.dragonsearchmarketing.com/social-media-monitoring-tools-query-white-paper/
Great article, especially on linear Algebra and SEO. When i look for a decent SEO article, there is organic seo on seoMOZ available, which includes an incredible group of intellects which belong to the seoMOZ community. I go to seoMOZ for innovative,fresh, and unique organic seo ideas!
Bookmarked this post.
It will take me a few reads to understand...... maybe.
I thought Google Analytics does all the math for me automagically. :-P
Thanks for sharing your insight. I'm not as excited about math as some other readers but I'm excited about knowing whether my campaigns are working. Math is just the "necessary evil" to me.
I love this. Even though I'm a marketer (now turned business owner), I'm a mathematician at heart and always loved looking at these things, esp game theory (that was my favorite class). I love the Nash equilibrium...and the movie in general and how they visualized everything. It's ironic, because I was problem solving an issue a few months ago, and that scene came into my head and gave me my solution.
this is going to occupy me all day. In Physics school if we wanted to solve a base problem issue like that, we'd just ask a mathematician
Thanks for the great post, Will. From now on this link will be my answer to non-technical marketers who want to know about the basic stuff. But there is one point: most math models have limitations and work only for a defined set of parameters. Ignoring such limitations can lead to mistakes in analysis. By the way, take a look at the concept of ergodicity (time series in some cases can't be analysed as random sets). Also, here is a good article on Markov networks and web search success, which illustrates how they can be used.
AWESOME!!! I just read an SEO post that talks about Bayes' theorem. Not just for mathematicians, that's some pure judgment & decision making (& psych) gold. Even referenced my favorite fallacy - ignorance of the base rate and did so with my favorite puzzle (mammogram story). Awesome. Thank you.
Thanks for the article - some interesting reading and definitely some different content! Nice.
That was the best way out of a prisoner's dilemma that I've ever seen. Thanks for sharing that video and some awesome thoughts about math and marketing.
Interesting read. At first when I saw the title for this piece I thought that it may be a stretch to make this topical, but in the end I was pleasantly surprised. Thanks
Cool post, it's like being back at class getting homework set. How fast can a time machine go?
Bookmarked for review later so I can be sure I remember what I'm talking about! Thanks Will!