We recently posted some correlation statistics on our blog. We believe these statistics are interesting and provide insight into the ways search engines work (a core principle of our mission here at SEOmoz). As we will continue to make similar statistics available, I'd like to discuss why correlations are interesting, rebut the mathematical claims behind the recent criticisms, and reflect on how exciting it is to engage in mathematical discussions where critiques can be definitively rebutted.
I've been around SEOmoz for a little while now, but I don't post a lot. So, as a quick reminder, I designed and built the prototype for SEOmoz's web index, and wrote a large portion of the back-end code for the project. We shipped the index with billions of pages nine months after I started on the prototype, and we have continued to improve it since. Recently I built the machine learning models used to produce Page Authority and Domain Authority, and I am working on some fairly exciting stuff that has not yet shipped. As I'm an engineer and not a regular blogger, I'll ask for a bit of empathy for my post - it's a bit technical, but I've tried to make it as accessible as possible.
Why does Correlation Matter?
Correlation helps us find causation by measuring how much variables change together. Correlation does not imply causation; variables can be changing together for reasons other than one affecting the other. However, if two variables are correlated and neither is affecting the other, we can conclude that there must be a third variable that is affecting both. This variable is known as a confounding variable. When we see correlations, we do learn that a cause exists -- it might just be a confounding variable that we have yet to figure out.
How can we make use of correlation data? Let's consider a non-SEO example.
There is evidence that women who occasionally drink alcohol during pregnancy give birth to smarter children with better social skills than women who abstain. The correlation is clear, but the causation is not. If the relationship is causal, then light drinking will make the child smarter. If a confounding variable is at work, light drinking could have no effect, or could even make the child slightly less intelligent (which is what extrapolating from the finding that heavy drinking during pregnancy makes children considerably less intelligent would suggest).
Although these correlations are interesting, they are not black-and-white proof that behaviors need to change. One needs to consider which explanations are more plausible: the causal ones or the confounding variable ones. To keep the analogy simple, let's suppose there were only two likely explanations - one causal and one confounding. The causal explanation is that alcohol makes a mother less stressed, which helps the unborn baby. The confounding variable explanation is that women with more relaxed personalities are more likely to drink during pregnancy and less likely to negatively impact their child's intelligence with stress. Given this, I probably would be more likely to drink during pregnancy because of the correlation evidence, but there is an even bigger take-away: both likely explanations damn stress. So, because of the correlation evidence about drinking, I would work hard to avoid stressful circumstances. *
Was the analogy clear? I am suggesting that as SEOs we approach correlation statistics like pregnant women considering drinking - cautiously, but without too much stress.
* Even though I am a talented programmer and work in the SEO industry, do not take medical advice from me, and note that I constructed the likely explanations for the sake of simplicity :-)
Some notes on data and methodology
We have two goals when selecting a methodology to analyze SERPs:
- Choose measurements that will communicate the most meaningful data
- Use techniques that can be easily understood and reproduced by others
These goals sometimes conflict, but we generally choose the most common method still consistent with our problem. Here is a quick rundown of the major options we had, and how we decided between them for our most recent results:
Machine Learning Models vs. Correlation Data: Machine learning can model and account for complex variable interactions. In the past, we have reported derivatives of our machine learning models. However, these results are difficult to create, they are difficult to understand, and they are difficult to verify. Instead we decided to compute simple correlation statistics.
Pearson's Correlation vs. Spearman's Correlation: The most common measure of correlation is Pearson's Correlation, although it only measures linear correlation. This limitation is important: we have no reason to think interesting correlations to ranking will all be linear. Instead we chose to use Spearman's correlation. Spearman's correlation is still pretty common, and it does a reasonable job of measuring any monotonic correlation.
Here is a monotonic example: the count of how many of my coworkers have eaten lunch for the day is perfectly monotonically correlated with the time of day. The relationship is not a straight line, so the correlation isn't linear, but the count never decreases, so the correlation is monotonic.
Here is a linear example: assuming I read at a constant rate, the amount of pages I can read is linearly correlated with the length of time I spend reading.
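To make the distinction concrete, here is a minimal sketch in Python (toy numbers, not our study data) showing why we preferred Spearman's: on data that is perfectly monotonically but not linearly related, Pearson's reports less than full correlation while Spearman's reports 1.

```python
# A minimal sketch (toy numbers, not our study data): Pearson's understates a
# perfectly monotonic but exponential relationship, while Spearman's scores it as 1.
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1, 21)        # e.g., a metric that ranges from 1 to 20
y = np.exp(0.5 * x)         # a second metric that grows exponentially with the first

print(pearsonr(x, y)[0])    # noticeably below 1.0: the relationship is not linear
print(spearmanr(x, y)[0])   # 1.0: the relationship is perfectly monotonic
```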
Mean Correlation Coefficient vs. Pooled Correlation Coefficient: We collected data for 11,000+ queries. For each query, we can measure the correlation of ranking position with a particular metric by computing a correlation coefficient. However, we don't want to report 11,000+ correlation coefficients; we want to report a single number that reflects how correlated the data was across our dataset, and we want to show how statistically significant that number is. There are two techniques commonly used to do this:
- Compute the mean of the correlation coefficients. To show statistical significance, we can report the standard error of the mean.
- Pool the results from all SERPs and compute a global correlation coefficient. To show statistical significance, we can compute standard error through a technique known as bootstrapping.
The mean correlation coefficient and the pooled correlation coefficient would both be meaningful statistics to report. However, the bootstrapping needed to show the standard error of the pooled correlation coefficient is less common than using the standard error of the mean. So we went with #1.
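For readers who want to see the mechanics of option #1, here is a minimal sketch on made-up data; the variable names and the random toy SERPs are assumptions for illustration, not our actual pipeline.

```python
# Sketch of option #1 on made-up data: compute Spearman's coefficient for each SERP,
# then report the mean coefficient and the standard error of that mean.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# 1,000 hypothetical SERPs, each with 10 results: column 0 is ranking position,
# column 1 is the metric whose correlation with position we want to measure.
serps = [np.column_stack([np.arange(1, 11), rng.random(10)]) for _ in range(1000)]

coeffs = np.array([spearmanr(serp[:, 0], serp[:, 1])[0] for serp in serps])

mean_coeff = coeffs.mean()
std_err = coeffs.std(ddof=1) / np.sqrt(len(coeffs))   # standard error of the mean
print(mean_coeff, std_err)
```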
Fisher Transform vs. No Fisher Transform: When averaging a set of correlation coefficients, instead of computing the mean of the coefficients directly, one sometimes computes the mean of the Fisher transforms of the coefficients and then applies the inverse Fisher transform to that mean. This would not be appropriate for our problem because:
- It will likely fail. The Fisher transform involves a division by one minus the coefficient, so it explodes when an individual coefficient is near one and fails outright when a coefficient equals one. Because we are computing hundreds of thousands of coefficients, each from a small sample, it is quite likely the Fisher transform will fail for our problem (see the short sketch after this list). (Of course, we have a large sample of these coefficients to average over, so our end standard error is not large.)
- It is unnecessary, for two reasons. First, the advantage of the transform is that it can make the expected average closer to the expected coefficient, and nothing we do assumes this property. Second, for coefficients near zero this property approximately holds without the transform, and our coefficients were not large.
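Here is a small sketch of the failure mode described in the first bullet above; the coefficient values are hypothetical and chosen only to show what happens when one of them equals one.

```python
# Hypothetical per-SERP coefficients: the Fisher transform (arctanh) blows up at 1.0,
# so the averaged transform becomes useless, while the plain mean stays well defined.
import numpy as np

coeffs = np.array([0.10, 0.35, 0.62, 1.0])
print(np.arctanh(coeffs))          # last entry is inf
print(np.arctanh(coeffs).mean())   # inf: the transformed average is ruined
print(coeffs.mean())               # 0.5175: the plain mean is fine
```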
Rebuttals To Recent Criticisms
Two bloggers, Dr. E. Garcia and Ted Dzubia, have published criticisms of our statistics.
Eight months before his current post, Ted Dzubia wrote an enjoyable and jaunty post lamenting that criticizing SEO every six to eight months was an easy way to generate controversy, noting "it's been a solid eight months, and somebody kicked the hornet's nest. Is SEO good or evil? It's good. It's great. I <3 SEO." Furthermore, his Twitter feed makes it clear he sometimes trolls for fun. To wit: "Mongrel 2 under the Affero GPL. TROLLED HARD," "Hacker News troll successful," and "mailing lists for different NoSQL servers are ripe for severe trolling." So it is likely we've fallen for trolling...
I am going to respond to both of their posts anyway because they have received a fair amount of attention, and because both posts seek to undermine the credibility of the wider SEO industry. SEOmoz works hard to raise the standards of the SEO industry, and protect it from unfair criticisms (like Garcia's claim that "those conferences are full of speakers promoting a lot of non-sense and SEO myths/hearsays/own crappy ideas" or Dzubia's claim that, besides our statistics, "everything else in the field is either anecdotal hocus-pocus or a decree from Matt Cutts"). We also plan to create more correlation studies (and more sophisticated analyses using my aforementioned ranking models) and thus want to ensure that those who are employing this research data can feel confident in the methodology employed.
Search engine marketing conferences, like SMX, OMS and SES, are essential to the vitality of our industry. They are an opportunity for new SEO consultants to learn, and for experienced SEOs to compare notes. It can be hard to argue against such subjective and unfair criticism of our industry, but we can definitively rebut their math.
To that end, here are rebuttals for the four major mathematical criticisms made by Dr. E. Garcia, and the two made by Dzubia.
1) Rebuttal to Claim That Mean Correlation Coefficients Are Uncomputable
For our charts, we compute a mean correlation coefficient. The claim is that such a value is impossible to compute.
Dr. E. Garcia : "Evidently Ben and Rand don’t understand statistics at all. Correlation coefficients are not additive. So you cannot compute a mean correlation coefficient, nor you can use such 'average' to compute a standard deviation of correlation coefficients."
There are two issues with this claim: a) peer-reviewed papers frequently publish mean correlation coefficients; b) additivity is relevant for determining whether two different meanings of the word "average" will have the same value, not whether the mean is computable. Let's consider each issue in more detail.
a) Peer Reviewed Articles Frequently Compute A Mean Correlation Coefficient
E. Garcia is claiming something is uncomputable that researchers frequently compute and include in peer reviewed articles. Here are three significant papers where the researchers compute a mean correlation coefficient:
"The weighted mean correlation coefficient between fitness and genetic diversity for the 34 data sets was moderate, with a mean of 0.432 +/- 0.0577" (Macquare University - "Correlation between Fitness and Genetic Diversity", Reed, Franklin; Conversation Biology; 2003)
"We observed a progressive change of the mean correlation coefficient over a period of several months as a consequence of the exposure to a viscous force field during each session. The mean correlation coefficient computed during the force-field epochs progressively..." (MIT - F. Gandolfo, et al; "Cortical correlates of learning in monkeys adapting to a new dynamical environment," 2000)
"For the 100 pairs of MT neurons, the mean correlation coefficient was 0.12, a value significantly greater than zero" (Stanford - E Zohary, et al; "Correlated neuronal discharge rate and its implications for psychophysical performance", 1994)
SEOmoz is in a camp with reviewers from the journal Nature, as well as researchers from MIT, Stanford and authors of 2,400 other academic papers that use the mean correlation coefficient. Our camp is being attacked by Dr. E. Garcia, who argues our camp doesn't "understand statistics at all." It is fine to take positions outside of the scientific mainstream, although when Dr. E. Garcia takes such a position he should offer more support for it. Given how commonly Dr. E. Garcia uses the pejorative "quack," I suspect he does not mean to take positions this far outside of academic consensus.
b) Additivity Is Relevant For Determining If Different Meanings Of "Average" Are The Same, Not Whether The Mean Is Computable
Although "mean" is quite precise, "average" is less precise. By "average" one might intend the words "mean", "mode", "median," or something else. One of these other things that it could be used as meaning is 'the value of a function on the union of the inputs'. This last definition of average might seem odd, but it is sometimes used. Consider if someone asked "a car travels 1 mile at 20mph, and 1 mile at 40mph, what was the average mph for the entire trip?" The answer they are looking for is not 30mph, which is mean of the two measurements, but ~26mph, which is the mph for the whole 2 mile trip. In this case, the mean of the measurements is different from the colloquial average which is the function for computing mph applied to the union of the inputs (the whole two miles).
This may be what has confused Dr. E. Garcia. Elsewhere he cites Statsweb when repeating this claim; that source makes the point that this other "average" is different from the mean. Additivity is useful for determining whether these averages will differ. But even if another interpretation of average is valid for a problem, and even if that other average is different from the mean, that neither makes the mean uncomputable nor meaningless.
2) Rebuttal to Claim About Standard Error of the Mean vs Standard Error of a Correlation Coefficient
Although he has stated unequivocally that one cannot compute a mean correlation coefficient, Garcia is quite opinionated on how we ought to have computed standard error for it. To wit:
E. Garcia: "Evidently, you don’t know how to calculate the standard error of a correlation coefficient... the standard error of the mean and the standard error of a correlation coefficient are two different things. Moreover, the standard deviation of the mean is not used to calculate the standard error of a correlation coefficient or to compare correlation coefficients or their statistical significance."
He repeats this claim even after making the point above about mean correlation coefficients, so he is clearly aware that the correlation coefficients being discussed are mean coefficients and not coefficients computed after pooling data points. So let's be clear on exactly what his claim implies. We have some measured correlation coefficients, and we take the mean of these measured coefficients. The claim is that, for the standard error of this mean, we should have used the formula one would use for the standard error of a single correlation coefficient. Garcia's claim is incorrect. One should use the formula for the standard error of the mean.
The formulas for the mean, and for the standard error of the mean, apply even if there is a way to separately compute a standard error for one of the observations the mean is taken over. Whether we are computing the mean of counts of apples in barrels, lifespans of people in the 19th century, or correlation coefficients for different SERPs, the same formula for the standard error of the mean applies. Even if we have other ways to measure the standard error of the individual measurements we are averaging - for instance, our measure of lifespans might only be accurate to the day of death and so could be off by 24 hours - we cannot use the way we would compute standard error for a single observation to compute the standard error of the mean of those observations.
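To make the distinction concrete, here is a small sketch with made-up numbers contrasting the two formulas; the per-SERP sample size, the coefficient values, and the textbook single-coefficient formula shown are assumptions for illustration only.

```python
# Made-up numbers: the standard error of a single correlation coefficient is a
# different quantity, with a different formula, than the standard error of the
# mean of many measured coefficients.
import numpy as np

# One coefficient measured on a single SERP of n = 10 results:
r, n = 0.25, 10
se_single = np.sqrt((1 - r**2) / (n - 2))   # a common textbook SE for one coefficient

# The mean of ~11,000 per-SERP coefficients (simulated here):
coeffs = np.random.default_rng(1).normal(loc=0.25, scale=0.15, size=11000)
se_of_mean = coeffs.std(ddof=1) / np.sqrt(len(coeffs))   # standard error of the mean

print(se_single)    # ~0.34
print(se_of_mean)   # ~0.0014
```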
A smaller but related objection is over language. He objects to my use of "standard deviations" to describe how far a point is from a mean in units of the mean's standard error. As Wikipedia notes, the "standard error of the mean (i.e., of using the sample mean as a method of estimating the population mean) is the standard deviation of those sample means." So the count of how many lengths of standard error a number is away from the estimate of a mean is, according to Wikipedia, a count of standard deviations of our mean estimate. Beyond being technically correct, it also fits the context, which was the accuracy of the sample mean.
3) Rebuttal to Claim That Non-Linearity Is Not A Valid Reason To Use Spearman's Correlation
I wrote "Pearson’s correlation is only good at measuring linear correlation, and many of the values we are looking at are not. If something is well exponentially correlated (like link counts generally are), we don’t want to score them unfairly lower.”
E. Garcia responded by citing a source he endorsed as "exactly right": "Rand your (or Ben’s) reasoning for using Spearman correlation instead of Pearson is wrong. The difference between two correlations is not that one describes linear and the other exponential correlation, it is that they differ in the type of variables that they use. Both Spearman and Pearson are trying to find whether two variables correlate through a monotone function, the difference is that they treat different type of variables - Pearson deals with non-ranked or continuous variables while Spearman deals with ranked data."
E. Garcia's source, and by extension E. Garcia, are incorrect. A desire to measure non-linear correlation, such as exponential correlation, is a valid reason to prefer Spearman's over Pearson's. The point that "Pearson deals with non-ranked or continuous variables while Spearman deals with ranked data" is true only in the sense that, to compute Spearman's correlation, one can convert continuous variables to ranked indices and then apply Pearson's. The original variables do not need to be ranked indices. If they did, Spearman's would always produce the same results as Pearson's and there would be no purpose for it.
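A quick sketch on toy numbers illustrates the relationship: applying Pearson's to the ranks of the data reproduces Spearman's, while applying Pearson's to the raw (non-ranked) values does not.

```python
# Spearman's coefficient equals Pearson's coefficient computed on the ranks of the data.
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 10.0 ** x                                  # raw values, exponentially related to x

print(spearmanr(x, y)[0])                      # 1.0
print(pearsonr(rankdata(x), rankdata(y))[0])   # 1.0: same as Spearman's
print(pearsonr(x, y)[0])                       # well below 1.0 on the raw values
```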
My point that E. Garcia objects to - that Pearson's only measures linear correlation while Spearman's can measure other kinds of correlation, such as exponential correlations - was entirely correct. We can quickly quote Wikipedia to show that Spearman's measures any monotonic correlation (including exponential) while Pearson's only measures linear correlation.
The Wikipedia article on Pearson's Correlation starts by noting that it is a "measure of the correlation (linear dependence) between two variables".
The Wikipedia article on Spearman's Correlation starts with an example in the upper right showing that a "Spearman correlation of 1 results when the two variables being compared are monotonically related, even if their relationship is not linear. In contrast, this does not give a perfect Pearson correlation."
E. Garcia's position neither makes sense nor agrees with the literature. I would go into the math in more detail, or quote more authoritative sources, but I'm pretty sure Garcia now knows he is wrong. After making his incorrect claim about the difference between Spearman's and Pearson's correlation, and after I corrected his source (in a comment on our blog), E. Garcia has since stated the difference between Spearman's and Pearson's correctly. However, we want to make sure there is a good record of the points, and to explain the what and the why.
4) Rebuttal To Claim That PCA Is Not A Linear Method
This example is particularly interesting because it is about Principal Component Analysis (PCA), which is related to PageRank (something many SEOs are familiar with). In PCA one finds principal components, which are eigenvectors; PageRank is also an eigenvector. But I digress - let's discuss Garcia's claim.
After Dr. E. Garcia criticized a third party for using Pearson's Correlation because Pearson's only shows linear correlations, he criticized us for not using PCA. Like Pearson's, PCA can only find linear correlations, so I pointed out his contradiction:
Ben: "Given the top of your post criticizes someone else for using Pearson’s because of linearity issues, isn’t it kinda odd to suggest another linear method?"
To which E. Garcia responded: "Ben’s comments about... PCA confirms an incorrect knowledge about statistics" and "Be careful when you, Ben and Rand, talk about linearity in connection with PCA as no assumption needs to be made in PCA about the distribution of the original data. I doubt you guys know about PCA...The linearity assumption is with the basis vectors."
But before we get to the core of the disagreement, let me point out that E. Garcia is close to correct in his actual statement. PCA defines basis vectors such that they are linearly de-correlated, so it does not need to assume that they will be. But this is a minor quibble. The issue with Dr. E. Garcia's position is the implication that the linear aspect of PCA is not in the correlations it finds in the source data, as I claimed, but only in the basis vectors.
So, there is the disagreement - analogous to how Pearson's Correlation only finds linear correlations, does PCA also only find linear correlations? Dr. E. Garcia says no. SEOmoz, and many academic publications, say yes. For instance:
"PCA does not take into account nonlinear correlations among the features" ("Kernel PCA for HMM-Based Cursive Handwriting Recognition"; Andreas Fischer and Horst Bunke 2009)
"PCA identifies only linear correlations between variables" ("Nonlinear Principal Component Analysis Using Autoassociative Neural Networks"; Mark A. Kramer (MIT), AIChE Journal 1991)
However, besides citing authorities, let's consider why his claim is incorrect. As E. Garcia imprecisely notes, the basis vectors are linearly de-correlated. As the source he cites points out, PCA tries to represent the source data as linear combinations of these basis vectors. This is how PCA shows us correlations - by creating basis vectors that can be linearly combined to approximate the original data. We can then look at these basis vectors and see how aspects of our source data vary together, but because it is only combining them linearly, it is only showing us linear correlations. Therefore, PCA provides insight into linear correlations -- even for non-linear data.
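A small sketch shows the point on toy data: below, the second variable is a deterministic (but non-linear) function of the first, yet the linear correlation is essentially zero and the principal components of the covariance matrix simply line up with the original axes, so PCA reveals no relationship.

```python
# Toy data where y is a perfect non-linear function of x (y = x^2 on symmetric x):
# PCA, like Pearson's, sees no relationship because there is no *linear* correlation.
import numpy as np

x = np.linspace(-1, 1, 201)
y = x ** 2
data = np.column_stack([x, y])
data = data - data.mean(axis=0)                 # center the data, as PCA does

cov = np.cov(data, rowvar=False)
eigenvalues, components = np.linalg.eigh(cov)   # principal components of the data

print(np.corrcoef(x, y)[0, 1])   # ~0: no linear correlation to find
print(components)                # axes-aligned: PCA finds no shared structure here
```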
5) Rebuttal To Claim About Small Correlations Not Being Published
Ted Dzubia suggests that correlations as small as ours are not interesting, or at least not worth publishing. He writes:
Dzubia: "out of all the factors they measured ranking correlation for, nothing was correlated above .35. In most science, correlations this low are not even worth publishing. "
Academic papers frequently publish correlations of this size. On the first page of a Google Scholar search for "mean correlation coefficient" I see:
- The Stanford neuroscience paper I cited above to rebut Garcia reports a mean correlation coefficient of 0.12.
- "Meta-analysis of the relationship between congruence and well-being measures" a paper with over 200 citations whose abstract cites coefficients of 0.06, 0.15, 0.21, and 0.31.
- "Do amphibians follow Bergmann's rule" which notes that "grand mean correlation coefficient is significantly positive (+0.31)."
These papers were not cherry-picked from a large number of papers. Contrary to Ted Dzubia's suggestion, the size of a correlation that is interesting varies considerably with the problem. For our problem, looking at correlations in Google results, one would not expect any single high correlation value from the features we were looking at unless one believes Google has a single factor it predominantly uses to rank results, and one is only interested in that factor. We do not believe that. Google has stated on many occasions that they employ more than 200 features in their ranking algorithm. In our opinion, this makes correlations in the 0.1 - 0.35 range quite interesting.
6) Rebuttal To Claim That Small Correlations Need A Bigger Sample Size
Dzubia: "Also notice that the most negative correlation metric they found was -.18.... Such a small correlation on such a small data set, again, is not even worth publishing."
Our dataset included over 100,000 results across more than 11,000 queries, which is far more than sufficient for the size of correlations we found. The risk with small correlations and a small dataset is that it may be hard to tell whether the correlations are statistical noise. Generally 1.96 standard deviations is required to consider a result statistically significant. For the particular correlation Dzubia brings up, one can see from the standard error value that we have 52 standard deviations of confidence that the correlation is statistically significant - substantially more than the 1.96 generally considered necessary.
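For readers who want to see the arithmetic, here it is in miniature; the standard error value below is an assumption chosen only to illustrate the scale of the figure quoted, not a number taken from our charts.

```python
# Illustration only: how many standard errors a correlation of -0.18 is from zero,
# assuming a (hypothetical) standard error of 0.0035 for the mean coefficient.
corr = -0.18
std_err = 0.0035
z = abs(corr) / std_err
print(z)   # ~51, on the order of the 52 quoted above, versus the 1.96 threshold
```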
We used a sample size so much larger than usual because we wanted to make sure the relative differences between correlation coefficients were not misleading. Although we feel this adds value to our results, it is beyond what is generally considered necessary to publish correlation results.
Conclusions
Some folks inside the SEO community have had disagreements about our interpretations and opinions regarding what the data means (and where/whether confounding variables exist to explain some points). As Rand carefully noted in our post on correlation data and in his presentation, we certainly want to encourage this. Our opinions about where/why the data exists are just that - opinions - and shouldn't be ascribed any value beyond their use in informing your own thinking about the data sources. Our goal was to collect data and publish it so that our peers in the industry could review and interpret it.
It is also healthy to have a vigorous debate about how statistics such as these are best computed, and how we can ensure accuracy of reported results. As our community is just starting to compute these statistics (Sean Weigold Ferguson, for example, recently submitted a post on PageRank using very similar methodologies), it is only natural there will be some bumbling back and forth as we develop industry best practices. This is healthy, and it is to our industry's advantage that it occur.
The SEO community is the target of a lot of ad hominem attacks which try to associate all SEOs with the behavior of the worst. Although we can answer such attacks by pointing out great SEOs and great conferences, it is exciting that we've been able to elevate some attacks to include mathematical points, because when they are arguing math they can be definitively rebutted. On the six points of mathematical disagreement, the tally is pretty clear - SEO community: Six, SEO bashers: zero. Being SEOs doesn't make us infallible, so surely in the future the tally will not be so lopsided, but our tally today reflects how seriously we take our work and how we as a community can feel good about using data from this type of research to learn more about the operations of search engines.
Ben - first off, thanks for the statistics lessons. I learned more in this process and, actually, with this post than I did in my stats class at University.
Second - I appreciate that you're doing so much to make things transparent. Your writing style isn't what I'm used to, so following portions of the post is challenging, but for technical writing, it's surprisingly accessible.
Third - It means a lot to me and I think to the community as a whole that you take this research and the rigor of statistics so seriously. As SEOs, we often have to rely on experience, examples and test data without nearly this degree of thoughtfulness (consider my recent WB Friday on paid links, which pales in comparison to this work).
I'm excited to do more of this with you in the future - hopefully we can uncover even more new and exciting data points about search engine operations, rankings and correlations.
Ben - following posts like this and our conversations at the seomoz/distilled conference in London last year, I am blown away by your statistical analysis.
Every good SEO should have fundamental statistical analysis skills, which is something that I missed out on during my educational career.
Every one of your posts is educational to the point of being humbling. Thank you!
Nice work, Ben. You've inspired me to dig out my stats books :) One is on my desk staring at me right now.
One point I think a lot of people miss is that, when it comes to advanced methodology, especially applied to a field that doesn't have a lot of established statistical standards, we're often in uncharted territory. Even the experts sometimes disagree, and we're going to make a mistake here and there, but I think that our ultimate goal is to help improve the methodology of the entire SEO discipline. You've made a lot of progress toward that goal in a short time.
Ben -
You know I rarely comment on the blog, but I felt that I needed to speak out and give you props on this one.
You are bringing a lot of legitimacy to the industry. I can't tell you how many times I've told people what my husband does for a living, only to have them roll their eyes.
One guy referred to it as "alchemy". Someone else wondered if Rand and his co-workers were "just guessing". What you've done is bring something incredibly tangible - statistics - to the world of SEO. To think! SEO as a legitimate science! Based on facts and proven theories! If it doesn't make people take the industry seriously, I don't know what will.
Good on you. Score 1 for SEO!
Whether you agree with the methodology or not (or understand it or not), it's fantastic that SEOmoz is willing to get in the ring and challenge the bull. I offer a huge "great, thanks" to Rand and Ben for putting themselves on the line like this.
The first time I saw Rand give one of these correlation data presentations (I think it was in NY about a year ago), there was something in my gut that didn't accept the results. I saw the new and improved version recently at the SMX Advanced conference and I still found the results somewhat unsettling. I think my gut was aware that there must be a "confounding variable" to explain the unexpected results. But I didn't know what a confounding variable was until I read this post LOL.
I appreciate and respect that Ben's research isn't intended to offer up any answers about the causation of the results. Ben and Rand have been clear in articulating that it's not meant to do that. But it is helping us ask better questions. Only then might we uncover what the confounding variables are and be better able to assess what really matters for SEO.
Before I begin, let me give a little background on myself. My professional experience is PPC- I've just begun learning about SEO from SEOMOZ and other sources. I do utilize statistics in my work, and I'm a big proponent of statistical analysis in almost any cause-effect examination, but the models I create are much simpler. So with that said, I may put my foot in my mouth here.
Now I do not have any reservations over your modeling. Everything appears to me to be completely on the up and up- I can't point to anything you've done and say "Aha- you should have done that differently."
I do see some issues with what you are trying to model. In a perfect world, we could create statistical models in a vacuum, where the correlations and weights of those variables would hold fast. Obviously, that is simply impractical so we strive to do the best we can.
But in this case, what you are modeling has serious issues in expected results with time as a factor. Google obviously knows the factors and weights that lead to rankings in SERPs. We are trying to explain them using statistical models. However, the real problem here is that the SERP you pulled results from at t(0) (time 0) for q(1) (query 1) and the SERP you pulled results from for t(end) and q(last) may have had different factors and weights. Like I said, I am new to this, but if I recall correctly I have seen it mentioned here and other places that the search algorithms can be adjusted multiple times per day.
On top of this, fresh content is being added constantly which can change your results minute to minute for some queries. This means that in the case of instances where results do change, you cannot be entirely sure whether it is due to a change in the weights and factors, or due to a "better" result entering the fray. Taken separately you can probably model it. But when q(5000) is from a different model than q(1), or q(5000) would present different results at t(0) and t(n), it causes complications.
I am not bringing this up because I aim to refute your work, or to question the value of this work. I think it is GREAT that SEOs are interested in using more statistics and less artsy tag-lines for optimizing SEO (that reads funny :p) I honestly hope I am missing something and that this critique is refutable. If not, the challenge of modeling something where the criteria for results are changing rapidly is worth acknowledging, even though it doesn't mean the analysis is worthless.
Thank you for the great article, and I'm really enjoying the recent statistical articles here and on YOUMOZ. It's really helping me dive deeper into a topic I know little about in a manner that I can understand and find infinitely more actionable than "10 SEO Tricks from Top SEOs!"
I think both of those are excellent points Sandro:
In the future, I'll add both caveats to the numbers we present. Thanks!
I have a few points to make.
1. I studied statistics in college and even taught the statistics research lab for a year and a half. I know Pearson and Spearman tests.
2. I understand almost every word in this post and am not speaking from any lack of experience or understanding.
That said, I find it interesting that at no point is there a significance value given. My statistical analysis always lived and died by SPSS. If you run a pearson or spearman in SPSS, you always get a significance value, either for a 1-tailed or 2-tailed test.
Why did you not tell us the significance on any of these correlations? Stats 101 will teach you that you can have a .7 positive relationship, which would be strong, but with a sig. value of .215, it doesn't mean anything. I buy the argument that very low coefficient values like .2 can matter in this instance. However, no correlation data means a thing if you don't report significance, which from what I have seen, you have not. Please correct me if I'm wrong.
My biggest problem with all of this is that you are still doing tests of correlation. Why on earth do we waste time with a pearson or a spearman? As you very clearly stated, correlation data still doesn't tell us anything, especially in the SEO industry when you are talking about 200+ ranking factors. Talk about confounding variables, you are looking at rankings correlated to one of those 200 and you think that one correlation of .12 means something?
Don't get me wrong, I'm not a hater here. I'm a fellow math geek with a desire to get answers. But I want real answers here. We need to run experiments, not make observations. This is why I made a new community at realseoscience.com. I'm going to run experiments. Basically, I'll make a bunch of web pages on one domain so as to eliminate as many variables as possible and only isolate one. I'll check rankings. I'll perform a change. Observe the ranking change. Plug these numbers into SPSS. Run a paired samples t-test and let the math tell me if the change is statistically significant or not.
I invite you to do the same. I've already linked to you as I feel you are a valuable SEO science resource. However, I would like to see more experiments with t-test data that can be linked to cause rather than correlation. I'm happy to repost any experiments you perform with their accompanying data for any to analyze as well. Let's get more people on board with this idea and then the other math and stat people out there will have nothing to refute. We can conclusively prove once and for all if the H1 tag or alt tags really have an effect on rankings.
Sorry for the rant. Really like what you guys are trying to do, just think you can do it better. Be sure to hit me up when you do run future experiments so that I can repost them and get more visibility for the SEO science/stats community.
I too am used to significance values being reported with all correlations. I also agree with the need to run more SEO experiments to determine causal relationships. However, as I'm sure you well know, experiments can be quite difficult to perform properly, especially without a strong background and experience in the area. Additionally, studies done "in a lab" may or may not be useful in drawing inferences about a more "natural" setting. This is exacerbated by the likelihood that Google has taken measures to make it difficult to dissect their algorithm. Still, I look forward to following your blog and seeing the results of your experiments. It's not an experiment, but you might find my YOUmoz post interesting:
https://www.seomoz.org/ugc/what-is-pagerank-good-for-anyway-statistics-galore
I think this is great feedback. Although I am not a trained statistician, my last business managed to get a prediction patent and we used Pearson & Spearman in our analysis quite often. We ran research that was then provided to Nielsen and most of the major television networks.
We made very accurate predictions based on correlation values as low as 0.2; if we ever had 0.3+ we knew we had a lock. However we were able to work in a much more controlled environment with far fewer variables. That being said, I applaud Ben & SEOMoz for taking these steps. Their experiments, research and feedback like Daniel's will only help improve the science and the results.
Ben - nice meeting you in Seattle at SMX and in your offices. Keep plugging away, this is great for our industry.
I'm excited to hear you are conducting trials, and I will be interested to hear your results.
While correlation results like those we posted have the problem of confounding variables, controlled experiments like the ones you intend have issues with how well they generalize. For instance, are you testing your results across dozens of domains and IP addresses? Are you testing pages ranking on the 1st page for queries of interest to SEOs? If not, then there will be questions about whether your results generalize to different domains or to real-world SEO problems. Your results will be valuable, and I will be excited to see them, but I'm skeptical that controlled experiments on this problem are going to give the unambiguous answers you are hoping they will.
When we've tried controlled experiments, it was hard to get enough trials to find statistical significance while still being careful to keep all of the trials independent enough and similar enough to the real world.
.....
Regarding significance: we did report the significance of the correlations by reporting standard error. This is, I think, a better practice on a problem like ours than reporting the results of a long list of t-tests.
Had we run null-hypothesis tests to show that nearly every result was significant (as noted in rebuttal #6, many with astronomical amounts of certainty), I'm not sure what that would have added besides length to the post. One can see the bar for significance by looking at the standard error values reported and multiplying by 1.96 (or whatever level of confidence you prefer). A reader is perhaps more interested in the significance of the difference between specific correlations, which they can assess because we published standard error, but could not have if we had only published a list of how unlikely each correlation was to have been a fluke. By publishing standard error, we show both.
...
Anyway, I'm excited to learn of your results when you post them.
Ben
Ultimately what you are reporting with stderr is a confidence interval. You can use that for significance, but confidence intervals aren't typically used that way.
So for corr of 0.2, and stderr of 0.002, you'd get a confidence interval of:
Lower limit: 0.2 - (0.002 * 1.96) = 0.19608
Upper limit: 0.2 + (0.002 * 1.96) = 0.20392
So if the null hypothesis is corr = 0 and the alternate hypothesis is that corr != 0, in this case the alternate hypothesis is proven. You have proven significance in a roundabout manner.
Confidence intervals are used more frequently when you have an expected/desired result, but there is variance in individual samples. For example, filling a bottle of water to 16 ounces. You can run your tests, get a standard error, get a sample mean, see if 16 ounces is within a 95% (99% whatever you choose) confidence interval.
More typically we use t-tests to show significance. It says the same thing, true, it's just the contemporary way of expressing it. Miles per hour and kilometers per hour can both be valid ways of expressing the same concept, but they are used in different ways. And to be honest, if you are testing at 90%/95%/99.9% significance it's either significant or not. If you are trying to argue that some things are more "important" than others because they have higher levels of significance, it kind of rings hollow. If it's significant, it's significant.
Ben, I do see my room for error in that I have small sample sizes and other issues. The reason I use one domain and multiple pages on the domain is that it takes away every variable related to domain authority. If we examine multiple pages with no outside links on the same domain, it is a fantastic way to isolate on site ranking factors. Change one at a time and see what happens.
You are right though, hard to generalize this across the whole web. Which is why realseoscience.com is meant to be a community of sorts. I will post my own experiments. After explaining them, I hope other people will do their own experiments and send me their models and data. Then I can post all the research being done in the industry not just by seomoz or just by me, but by several people interested in this kind of thing.
If all our results tend to line up on certain issues, we can then generalize and say yup, H1 tags really do/don't matter. In academics replication is a huge thing. I think as we all repeat each other's experiments in our own ways and if we get the same results, that will add validity to them instead of just me or you coming up with all the results on our own. The more involved we get the community of seo science/stats the better in my mind.
I personally assumed that non-significant correlations weren't shared, but you are right. There's no reason to be making those kind of assumptions. Significance should be mentioned, and in the event that something is proved to be insignificant it should be mentioned as well.
As far as attempting to run a series of tests utilizing a new domain, I'll be curious to see how you do it. Either you will have to censor/generalize your reporting of results, or you open the doors to a lot of unwanted interference. Also, I'd be interested to see how you set up your hypotheses since your sample size will be extremely small for some tests.
As far as going for causation in their tests, it's tough. When you are dealing with 200 variables and you really can't control for any of them, nor can you isolate covariance possibilities across some variables... it's for sure difficult.
We included standard error values for everything in both of the posts showing correlation results. I think that did a reasonable job of showing how much significance we had, both for the correlations themselves and for the differences between arbitrary correlations. The rebuttal to claim #6 in the blog post above also touches on issues of statistical significance.
What more would you be looking for with regard to showing significance?
(Update: SandroM - it looks like we are cross posting each other. I understand your post replying to mine above, and I think we are on the same page.)
Yeah, we played a little bit of post tag, but I think everything is all clear now :)
The problem with this post is addressed in AaronSW's post "That Sounds Smart": https://www.aaronsw.com/weblog/soundsmart
Although this clearly has some very technical jargon, I am saddened that much of this effort may go wasted if it's too technical to interpret. I, honestly, have not read this post, merely scanned it, and can tell it will take some effort to complete. I'm going to try - but just because it was created and is packed full of technical jargon doesn't make it great.
I'd like a more layman's version of this post. Few people in SEO majored in statistics, so much of this content is above our noggin, making much of this extremely inaccessible to the market you're targeting.
And as much as everyone in the comments gave a "great, thanks!" as a response, there has to be something worth addressing here - only one post out of eight actually directly replies to the content, making me doubt that anyone actually read or got the content Ben wrote about.
That said, thanks for this, Ben. I look forward to the child version.
Ross, I think many commenters (and not just on our blog) leave "great, thanks!" comments without reading the full post. We once had a YOUmoz post where the author talked about how ridiculous Google Analytics was. But if you read the entire post, you knew that in the end she was just joking. That she was actually quite a fan of GA. But she received many thumbs down, and many comments telling her how GA was so awesome. Which obviously meant people didn't read the entire post.
My point being that there's no reason for this post to be any different. Sometimes people comment on the "theme" of a post, or on a specific section.
Most of our posts are written in a non-technical way, so that both the advanced and the beginner SEO can understand it. This post happens to be about statistics, I don't know that there's a "child version" and I'm not sure that there should be. I read the entire thing, but there were parts I didn't understand. That just tells me I don't know enough about statistics (in fact it's the only class I ever failed in college).
Ben's post kicks ass in my opinion, whether my non-statistics brain understands 100% of it or not.
I understand that this was more math than many people want to read. That is why I don't post frequently and we don't include lengthy descriptions of why we use a particular methodology when we post results.
However, I felt it necessary to do a post like this because there were technical criticisms of our methodology, and a decent respect to the opinions of the SEO community required showing that our results were derived using accepted statistical methodology. It would have been disrespectful of the community to ask them to just accept our assurances.
While we tried to make the discussion as accessible as possible, we were required to address the issues raised by the critics in enough technical depth that readers who wanted to would be able to evaluate the issue.
Ross, I am in agreement with much of what you have said. I also am amused by the fact that people have decided to thumb down your post, which seems to be the immediate knee-jerk reaction of some users of this site if others dare offer some well written and objective constructive criticism of a Staff member post.
Hey-ho, I just opened myself up for the same treatment. Jennita is of course correct; people have a habit of commenting and thumbing posts before they read them, even the non-technical ones. Ross was suggesting that to add true value to the SEO community, the SEO community must be able to comprehend what is being said. Is that such a wild notion? I don’t think so. As it happens I agree with you both that Ben's rebuttal was thorough and excellently written, for Dr Garcia and other statistical experts. It was an extremely important post and did indeed add weight to the importance of SEOmoz's excellent work within the SEO community, and defending the work that you do.
That said, it is not excellently written for most SEOs, which does not devalue it as an academic piece, but it does mean that its value is limited for (I would guess, I can't prove) most SEOs. You have to articulate appropriately to your audience, and yes, that can be difficult without appearing to be "dumbing down" (a phrase often aimed at the UK media; I don't know if it's used elsewhere). Should there be a simpler version in addition to this? Yes, because we need to draw people into maths and specifically statistics. They are important aspects of our work, but people get scared and disenfranchised, and subsequently may in their own minds devalue their significance.
Now then, to my original reason for commenting! It may have been noted I was critical of Ben's posts in the past. I would like to point out I only studied statistics as a small part of my university degree, and I have the utmost respect for Ben's work.
I believe it to be accurate, thorough and of benefit to the reputation of the SEO industry. My issue was with how the data was displayed graphically in the first of Ben's two recent posts (the 22nd April post). I did not like the way the scales of the graphs were inconsistent and changed throughout the post, stretching and distorting the significance of the findings. Yes, the text explained that a 0.2 correlation was weak, and yes, I agree with your point in rebuttal number 5 that it was worth reporting and that correlations in the 0.1 - 0.35 range are quite interesting.
However, if we assume from Jennita (and I agree with her) that people don't always read the posts fully, I think it makes it all the more important to have accurate diagrams and graphs because people will look at these and not read the accompanying text which qualifies it. I must stress that I feel this was adequately addressed in the June 8th post, but I just wanted to reiterate what my original criticisms were, and that I am pleased they were largely addressed subsequently.
I know, tl;dr.
(Edit: formatting)
I often wonder what makes people leave a thumbs down, but really it seems to be different for every person. It could be the language used or the actual content, or maybe the person is having a bad day. Hah!
I think we can approach this post as having a peek into a complicated machine. While we may not understand every piece of it, we can still respect and admire the intricacy of the whole.
Absolutely, I didn't understand much of the post, but I still enjoyed reading it, and I'm glad that Ben took the time to put it all together. Without information like this, we wouldn't have an appreciation for the raw numbers behind all this!
It's funny how a person can complete an entire mathematics degree and never encounter a statistics class that gets past the nitty-gritty of academia and on to the valuable applications in the real world.
I suspect a lesson such as this one would have completely converted me from a self-described 'calculus guy' to a rabid 'stats guy'.
Well done and thanks for the inspiration. Your rebuttal was well crafted and delivered with the highest level of professionalism.
Wow! I thought I was good at stats in college. Apparently not! :) I had to read & re-read a few times to fully get the gist of the points you were making and the substantiating information.
However it is always good to have someone like yourself put in the time and research to help give SEO the credibility it deserves.
Thank you & have wonderful weekend!
Sorry, no technical discourse below just kudos to you!
Wow! What a read- a tad technical, but a great smackdown rebuttal to those calling you (and any other SEOs) out on statistical findings. I think you replied with an emphatic 'Scoreboard.'
I have a button in my office that reads 'Data Rocks'. Without solid data to see where you've been, it's very difficult to drive direction for where you want to go.
I leave this comment now, to go dust off my statistics course packs and text book to take a trip down memory lane.
Hey Ben,
Thank you for taking the time to educate all of us. A lot of people complain that SEO is not taken seriously. You are one of the few that actually work to change that.
I think this post is a great step in the right direction. :-)
Excellent post, Ben. Some of us were awaiting this sort of reply as a sign that SEOmoz was taking this particular study (and SEO data as a whole) seriously.
I'd second Dr. Pete's sentiments that what's important here is the exchange of information in uncharted territory. There's certainly a shortage of transparent writing and data on SEO experiments. Kudos to you, Ben, for taking the time to explain the technical aspects behind your thinking and keep the discussion and debate going.
Lots to learn here if you're an SEO watching from the sidelines...
I love seeing the math geeks who challenge thoughtful work without offering a better solution. I can appreciate the fanaticism that entices math gurus to want to kill one another in religious zeal over semantics. Unfortunately, all of these arguments are about personal pride and not about serving business. The SE's have the secrets locked down and business does not care about ~.001%. We're not building a little machine that goes "ding" in the operating room. Just get results for reasonable cost. Get results and business will appreciate your guess.
Here's the situation, "It's 106 miles to Chicago, we got a full tank of gas, half a pack of cigarettes, it's dark, we're wearing sunglasses and we have humpback whales we have to take back to the future before the space cigar destroys Earth."
When you're about to richochet around the sun to jump start a time warp, sometimes the best we can do is ask Spock to take a guess.
I accept the challenge to learn more about reasonable statistics. Across a huge web presence every bit of juice does add up. Yet let none of us forget that we can be slightly off the mark on many things but still get rocking results from other optimized areas. In fact I know a branding expert who screwed up a clients website by making it invisible to search engines. But their call volume was up from 4 calls per month to 4 calls per day. Turns out that she nailed the branding and got the buzz rocking in other ways.
Even the new SEOmoz beginners guide still has the "... SEO is many things but it is largely optimizations of many kinds... yadda yadaa".
So keep this great scientific approach coming. I truly thank you for helping me go deeper. With the right practical mindset getting smarter on stats and making pretty charts will make us all stronger.
Great post!
Any young and emerging sector in an old established market reacts normally in two ways ... sadly SEO has seen too much of the first.
1) Become a 'Young Turk' hot headed and arrogant, strut to be noticed (but very rarely taken seriously). Armed with noise, bluster and trolling damn the old ways, embrace the new, ignore everything else.
2) Have an 'old head on young shoulders'. Challenge the old ways head on with clear thoughts, find ways to positively contribute by understanding the past but live in the future. Armed with old weapons, wielded in new ways.
This post is one of the few I have read that has succeeded in the second category. A great post, thoughtful, insightful and for a developer, and not a blogger, explained very well. Thanks for the post, I will be using some of this advice.
This is fascinating stuff. I forgot most of what I learned in college statistics, but if you would like to learn more about statistics there are free lecture recordings from schools like Berkeley. You can listen to almost all the lectures for "Introductory Probability and Statistics for Business" here: https://webcast.berkeley.edu/course_details_new.php?seriesid=2010-B-87405&semesterid=2010-B
Out of all of the SEO blog posts I have read, this one makes me the most enthusiastic about the direction and eventual acceptance of SEO as a legitimate academic field.
Great job addressing the facts & the claims without making it inflammatory and personal. You've set a great example that Dr. Garcia especially ought to pay attention to.
A layman's rebuttal goes something like this: People who think SEO is all hocus-pocus have simply never tried it.
Amazing post! Thank you so much! Keep up the incredible work, look forward to reading what else you have to offer everyone! :)
Ben, That's a heavy duty post! I'm pleased to see some real rigour brought into the field of SEO, and SEOmoz are leading the way in this so this is another great post.
SEO bashing is a popular sport. Rather than argue over detailed methodology with the detractors, why not blow our own trumpets a little more with the key metric, success?
Jeremy.
A response to SEOmoz “rebuttal” is available now at https://irthoughts.wordpress.com/2010/07/12/on-seomoz-knowledge-about-statistics/.
A tutorial on the correct way of computing and analyzing correlation coefficients is available at https://www.miislita.com/information-retrieval-tutorial/a-tutorial-on-correlation-coefficients.pdf
Dr. E. Garcia
Do you realize your new claims contradict your prior ones? Your new position is that the mean we compute is a biased estimator for the correlation coefficient on the population of pooled paired data (or so I gather; it is a biased estimator of many things, and you are imprecise about exactly which one you mean). But your old position was that the mean was uncomputable. Something cannot be uncomputable and also be a biased estimator. Your positions contradict.
You must realize your prior claims were in error. Look at how this all started. You cited someone's argument, and backed it as being "exactly right," that claimed "both Spearman and Pearson are trying to find whether two variables correlate through a monotone function..." Seriously, your initial criticism of our blog post was based on the idea that Pearson's measures any monotonic correlation the way Spearman's does. Then you claimed my statement that PCA was a linear method "confirms an incorrect knowledge about statistics." Sometimes there can be ambiguity, but there is none here. Pearson's measures linear correlation. PCA is a linear method. You are just wrong.
Do you realize I never claimed the mean correlation coefficient was an unbiased estimator for the correlation coefficient computed on the pooled data? You must, right? When you claimed we should compute the standard error for our mean correlation coefficient as if it were not a mean but a single, directly computed correlation coefficient, you were likely assuming it was an unbiased estimator for that other value. Of the two of us, that would make you the one assuming it was an unbiased estimator, not me. Your new point about bias might rebut this earlier point of yours, but it does not contradict anything I've said.
You don't have to keep digging yourself a deeper hole. There is another way - just fess up to your mistakes and move on. Or if that is too much for you, just move on.
Ben
(edited to fix whitespace)
You and Fishkin are the ones digging a hole.
Correlation coefficients are not additive. Period.
Arithmetic averaging correlation coefficients is incorrect. Period.
Computing a standard error for correlation coefficients as done for a mean of observations is a gross statistical error. Period.
We have forwarded your methodology to real statisticians and they agree on the above. Period.
It is clear that you and Fishkin are in the business of deceiving the public.
Repeating your conclusion doesn't get around the fact that the arguments you used to reach it have been refuted, nor that those arguments contradict each other.
The lack of additivity does not make the mean uncomputable, nor did either of our blog posts assume the mean was an unbiased estimator for the coefficient on all pooled paired data points. And obviously the mean cannot be uncomputable (your first argument) and also be (your second argument) a biased estimator.
Regarding your claim to have found a statistician who agrees with you - where do they work and what is their name? And if I contact them, will they actually tell me they agreed with your claim that the arithmetic mean is uncomputable on a set of correlation coefficients? Did they also agree with your claim that PCA is not a linear method? And did they agree with the claim your criticism rests on, that Pearson's measures any monotonic correlation the way Spearman's does? Or will they tell me you did not accurately reflect the nature of our disagreement to them?
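For readers trying to follow the mathematical side of this exchange, here is a minimal sketch (with made-up numbers, not the SEOmoz dataset) of the calculation being argued about: one Spearman coefficient per query, then the arithmetic mean of those coefficients and the standard error of that mean.

```python
import numpy as np
from scipy.stats import spearmanr

# Made-up example: for each query, a metric value (say, Page Authority) for
# the results in SERP order (position 1 first). The real study used ~11,000
# queries; three are enough to show the arithmetic.
metric_by_query = [
    [55, 48, 52, 40, 38, 35, 30, 33, 28, 25],
    [70, 62, 58, 61, 50, 45, 47, 40, 38, 36],
    [42, 45, 39, 37, 41, 30, 28, 26, 27, 22],
]

coefficients = []
for values in metric_by_query:
    positions = list(range(1, len(values) + 1))   # ranking position 1..n
    rho, _ = spearmanr(positions, values)         # one Spearman rho per query
    coefficients.append(rho)

coefficients = np.array(coefficients)
mean_rho = coefficients.mean()                    # the mean is perfectly computable
# Standard error of the *mean of the coefficients*, not of any single coefficient:
std_err = coefficients.std(ddof=1) / np.sqrt(len(coefficients))
print(f"mean Spearman = {mean_rho:.3f} +/- {std_err:.3f}")
```

Whether that mean is the best summary statistic can be debated, but the arithmetic itself is straightforward.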
Astonishingly well-expounded article. Thanks for sharing your precious knowledge with everybody.
I would like to ask you something about the use of Spearman correlation, since you seem to have explored the subject in depth in your post here (https://www.seomoz.org/blog/statistics-a-win-for-seo). How can you justify that SEOmoz (https://www.seomoz.org/article/search-ranking-factors#metrics-1) uses Spearman for the correlation of Page Authority or Domain Authority, since the relationship is not necessarily monotonic, and why don't you use another correlation metric? I mean, let's say you get the first 10 pages for a group of 10 queries from the Google search engine and their corresponding Page Authority. So in total you have 100 results and 100 pairs of PA and rank. If you display this on a two-dimensional x-y graph, it does not have to be monotonic. Could you help me a bit with this? I am not deeply into statistics, so I would like to know if my question even makes sense.
This feud is very interesting...having finally read most of the comments and back and forth posts between SEOmoz and Dr. Garcia.
Seems only right to include a link to his rebuttal to the rebuttal....
https://irthoughts.wordpress.com/2010/07/12/on-seomoz-knowledge-about-statistics/
Thank you, Ben, for reminding us in the SEO field that having stats is a great way to back up what is going on in our clients' websites.
Instead of being seen as just stats geeks, we (SEOs) are the ones who mine this data, interpret it, and turn it into data visualizations that clients can understand.
+1 insightful. As someone mentioned above, this kind of real-world example would make stats courses a heck of a lot more interesting. I, like others, have learnt a great deal during the process. Looking forward to the next lesson :)
I have now followed 95% of your reasoning (I'm still working on some of the arguments around linear combinations of eigenvectors - never my favourite area) and I'm confident in standing with you on this one. I think the data is valid and interesting. Thanks for all the hard work putting it together and then explaining and defending it.
Yeah... what he said :)
I am glad to see some mathematical rigor being brought into the public discourse for SEO.
With a population genetics and evolutionary theory background, one thing I've learned is that selecting the right test for the right question is key.
Just because you can find thousands of references for a statistical technique does not mean the method is (or is not) applicable to your situation.
Methodology Question
I did not read this in depth, so I may have missed something.
If I understand correctly, you are using the mean of the correlation coefficients for a set of metrics computed over 11K queries.
Does this not assume that the metrics are independent? Is this the case?
Perhaps this is a data set for which bootstrapping could be very informative.
When in doubt?
Mark Twain has some good advice:
"There are three kinds of lies: lies, damned lies and statistics."
Sorry, I'm still such a newb... but what does "bootstrapping" mean in the SEO context?
It doesn't have a specific SEO meaning beyond the general statistical one.
Sometimes it can be difficult to figure out how robust a statistic is because of sample size. The two cases where I've seen it used most are when the statistic being computed doesn't have an easy formula for standard error, and when results are not independent because they come in clusters.
The Wikipedia description of bootstrapping is not bad:
https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
We are assuming the set of queries itself is a simple random sample from a reasonably interesting larger population of queries. This is, I think, fair. We do not assume all of the results of the queries are themselves a simple random sample from some larger population of results, which would not be fair. That would be a clustered sample, which is not a simple random sample.
As the blog post notes, if we had pooled all of the results for all queries together and computed statistics on that, we would probably have had to use bootstrapping to deal with the clustered sample, but doing so would still have assumed the queries themselves were a simple random sample.
If you would like to email me, I'll send you a copy of the dataset so that you can do bootstrapping on it, although given that we are using the mean correlation coefficient instead of pooling results from different queries, I don't think there is anything interesting to bootstrap on.
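For what it's worth, here is a rough sketch of what bootstrapping the mean would look like, on made-up per-query coefficients rather than the real dataset: resample whole queries with replacement and recompute the mean each time.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up per-query Spearman coefficients standing in for the ~11,000 real ones.
per_query_rho = np.array([-0.22, -0.15, -0.30, -0.05, -0.18, -0.27, -0.12, -0.20])

n_resamples = 10_000
boot_means = np.empty(n_resamples)
for i in range(n_resamples):
    # Resample whole queries with replacement -- the query is the sampling
    # unit, so each query's coefficient stays intact and results within a
    # query never get mixed across queries.
    sample = rng.choice(per_query_rho, size=per_query_rho.size, replace=True)
    boot_means[i] = sample.mean()

low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {per_query_rho.mean():.3f}, 95% bootstrap CI = [{low:.3f}, {high:.3f}]")
```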
I think my head just caved in :-(
Hmm. The debate is getting a bit intense, it seems. Nevertheless, I like it. This is the kind of dialogue and debate that is healthy and needed in the SEO community. I applaud SEOmoz (especially Ben and Rand) for both sticking its neck out and raising the bar. This is the type of exercise the mere attempt of which can only benefit the industry.
I believe the -0.18 coefficient was for URL length negatively correlated with ranking. (sorry if I'm incorrect).
I take issue with this and believe it is largely misidentified correlation: statistically valid, yes, but not "real".
Sunspots, after all, are over time very highly correlated with the Dow Jones average. Oddly, this is linked to from an MIT OpenCourseWare course on "geo-biology" here:
https://www.flickr.com/photos/mitopencourseware/3592499948/
See the Nobel Prize-winning work of Robert F. Engle III and Clive W.J. Granger (Economics, 2003) on why this correlation, although "valid", is not actually meaningful, and how to control for it.
In any case, I mean to convey that correlation, although statistically valid, does not necessarily mean a thing or imply causation. Also, I know a thing or two about stats.
For me to believe the URL length correlation is real in terms of causation, I would ask whether queries were matched to domains containing the keywords, and whether this was controlled for. If the analysis were performed among sub-segments of domains (keyword in the URL but not the domain, keyword only in the domain, and keyword not in the domain at all) and the result still persisted, I would view it as much more impressive.
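A rough sketch of the kind of segmentation being suggested (purely illustrative; the classification rules and example URL are assumptions, not anything from the study):

```python
from urllib.parse import urlparse

def keyword_segment(url: str, keyword: str) -> str:
    """Classify a result by where, if anywhere, the keyword appears in its URL."""
    parsed = urlparse(url)
    kw = keyword.lower().replace(" ", "")
    in_domain = kw in parsed.netloc.lower().replace("-", "")
    in_path = kw in parsed.path.lower().replace("-", "").replace("/", "")
    if in_domain and in_path:
        return "domain and path"
    if in_domain:
        return "domain only"
    if in_path:
        return "path only"
    return "neither"

# One would then recompute the URL-length correlation within each segment and
# check whether the negative correlation persists.
print(keyword_segment("https://www.blue-widgets.com/buy/blue-widgets-cheap", "blue widgets"))
```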
Yes, correlation matters. But going back to David Hume, four conditions for causality nearly always must apply:
1) Temporal antecedence
2) Correlation
3) Hypothesis of causation
4) Isolation of other effects
Statistics is notoriously bad at achieving 4, which is why the brilliance of Steven Levitt is so widely appreciated. Econometricians are widely criticized for ignoring 3.
In this case 4 is not offered at all.
3 offers little hope. Why would URL length matter? That is the question. Statements by Matt Cutts seem to conflict with this as well:
https://www.seomoz.org/blog/matt-cutts-movie-marathon
Note: SEOmoz correlation data has shown that deeper folder structure may correlate with worse rankings. Deep folder structures can be an indication of other issues, including information architecture problems.
Hi RealGdog
I edited your comment to fix some spacing issues and make the post easier to read. After parsing your statements, I think you're referring purely to the concept that correlation is not causation, which was mentioned at the top of this article as well as in the referenced piece (note the photo of me in a suit used for precisely these purposes).
You seem to suggest that there's something inaccurate or wrong with the research, but all your points are then about the idea that we can't definitively state that negative correlation (or positive correlation) means that variable was the cause. Did you read Ben's part about pregnancy/drinking? Confounding variables? I'd posit that the points you're making are already well covered/included in the posts.
Nonetheless, I certainly agree with your position - we can't say that longer URL length is the "reason" for the negative correlation, merely that it exists. This post is simply showing why the -0.18 (and other mean correlation coefficients) are accurately calculated data points, not trying to relate correlation and causation.
Thank you for the editing, Rand. I am rather verbose. In contrast, I will (try to) keep this short:
1) I do suppose I imply incompleteness in the research, but not error. I push for a "disambiguation" of what I believe to be misidentified correlation.
2) I hope that creative ways will be found to isolate variable effects. This is what separates the Nobel laureates and the Steven Levitts.
To this end, I was hoping for ideas as to why URL length is negatively correlated with rankings, or what else URL length is highly correlated with.
In rebuttal to phbj's comment below: causation absolutely does matter. We live in a somewhat "deterministic world" (at least for SERPs, but that's getting philosophical, so I must end soon). A set of variables taken for a single page will result in a deterministic ranking on a Google results page. I would think major search engine algorithm writers would be able to separate out good-quality long URLs from bad ones, instead of resorting to a correlation-driven rule that all long URLs are lower quality.
I just want to tackle Matt's statement, since that quote comes from my recent post. There are two factors at play here (probably quite a few more, actually, but I'll focus on two):
(1) PAGE "DEPTH"
Let's say you have a URL structure like this:
www.example.com/folder1/folder2/folder3/page1
Google does not value "page1" based on its placement in the URL - that's the essence of Matt's statement. Google values page1 based on its placement in the site's information architecture, or from an SEO POV, the site's internal link profile. If page1 is the first menu tab on the home-page, Google doesn't really care that it's 4 levels deep in the URL.
(2) KEYWORD DEPTH
A similar but slightly modified example:
www.example.com/word1/word2/word3/word4
All else being equal, "word1" will carry more ranking power than "word4" - this is what the correlations seem to indicate. So, the length of the URL may matter in that longer URLs tend to bury keywords deeper.
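To make "keyword depth" concrete, here's a tiny illustrative helper (a toy example, not anything from the correlation study) that reports how many path segments deep a keyword sits in a URL:

```python
from urllib.parse import urlparse

def keyword_depth(url: str, keyword: str):
    """Return the 1-based path-segment position where the keyword first appears, or None."""
    segments = [s for s in urlparse(url).path.lower().split("/") if s]
    for i, segment in enumerate(segments, start=1):
        if keyword.lower() in segment:
            return i
    return None

# All else being equal, the correlations suggest the shallower placement fares better:
print(keyword_depth("https://www.example.com/widgets/blue/large/history", "widgets"))  # 1
print(keyword_depth("https://www.example.com/history/large/blue/widgets", "widgets"))  # 4
```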
> Why would URL length matter?
URL length was observed by search engine algorithm writers to be correlated with low-quality domains. It doesn't cause them, but adding a negative weighting to long URLs (or those-with-lots-of-hyphens) produces better SERPs.
Causation isn't needed as the SERPs reflect the ranking algorithm chosen.
If rank were an emergent characteristic of high-quality, relevant pages (as a function of keyword), then causation would be an issue, but it isn't. Rank is chosen (by the algorithm) and uses correlating factors to [imperfectly] approximate quality/relevance/freshness.
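As a toy illustration of that point (entirely made up; these are not anyone's real weights, and certainly not Google's algorithm): a ranking score can simply dock points for URL length because length correlates with the low-quality pages the engine actually wants to demote, even though length itself causes nothing.

```python
def toy_score(relevance: float, url: str, length_penalty: float = 0.002) -> float:
    """A made-up ranking score: relevance minus a small penalty per URL character."""
    return relevance - length_penalty * len(url)

# Two equally relevant pages; the longer URL loses a little ground anyway.
print(toy_score(0.80, "https://example.com/widgets"))
print(toy_score(0.80, "https://example.com/cheap-blue-widgets-best-price-buy-now-discount-sale"))
```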
It's interesting that you bring this up, because it is something that I took a second look at while replicating SEOmoz's Google vs. Bing methods. I'm not yet ready to post all of my results, but after reading your comment I put up an excerpt on Google Docs.