The actual results of the test were inconclusive. Plain and simple, my test did not include enough samples to be statistically significant.
In doing so, I unintentionally misinformed all of you. For that, I am extremely sorry.
It is now my goal to make this up to all of you. Below is more information on this and my plan for running a new test.
In the meantime, what should I do about PageRank sculpting?
The first test results should be disregarded. This means that I, along with my co-workers at SEOmoz, recommend neither removing nofollow if it is installed (as we have seen detrimental effects for websites) nor adding it if you don't have it. Quite simply, we don't have enough information. (Which is why I ran the original test in the first place... damn)
Your time is best spent on link building and creating quality content. Tricks like PageRank sculpting are interesting short term tactics but pale in comparison to the long term ROI of building a page that deserves to be found. Remember, the most valuable information that Google ever gave SEOs is in the second sentence on this page. Providing the information that Google wants to make universally accessible in a search engine friendly way is the best long term strategy an SEO can have.
What was the old test?
We built 40 websites that looked similar to the following screen shot:
Each website in the experiment used the same template. Each keyword phrase was targeted in the same place on each page, and each page had the same number of images and links and the same amount of text.
Each domain was unique and used a different IP address. Each testing group had different information in the WHOIS records, different hosting providers and different payment methods.
The standardized website layout contained:
- Three pages per domain (the homepage and two keyword-specific content pages)
- One internal in-link per page (Links in content)
- One in-link to homepage from third party site
- Six total outbound links:
  - Two "junk" links to popular website articles to mimic natural linking profile (old Digg articles)
  - One normal link to keyword test page
  - Three modified links (according to given test) to three separate pages optimized for given keyword
- Links to internal pages only came from internal links
- The internal links used the anchor text (random English phrase) that was optimized for the given internal page
- Outbound links (aka "junk" links) used anchor text that was the same as the title tag of the external page being linked to (Old social media articles)
This graphic represents an ultra simplified version of five test sites.
In the old experiment each of these different "variable links" would have attempted to sculpt PageRank in a different way. (Variable link 'a' might use nofollow, variable link 'b' might use JavaScript, etc.) Each of the "normal links" would then point to one of five different pages trying to rank for the same term.
For testing purposes, I chose phrases that were completely unique to the Internet. These were phrases that had never been written online before. (For example, "I enjoy spending time with Sam Niccolls". Just kidding Sam... don't hurt me) In theory, the page that corresponded to the most effective PageRank sculpting method would outrank its competition for these isolated phrases.
To make us confident in our results we had to compensate for the inherent noisiness of the Internet. To do this, we ran the experiment in parallel eight times.
This shows the full scale of the experiment. Each color (labeled with the numbers 1 - 5) refers to a different PageRank sculpting method. The 8 groups horizontally represent the isolated tests.
What went wrong?
As far as I can tell, the experiment was executed without a problem. As it turned out, the problem wasn't necessarily with the experiment itself but rather with interpreting the results. I used the wrong metric to evaluate the results (average rank of each testing group) and relied on too few samples.
What is the new test?
Rather than testing which PageRank sculpting method works the best, I am now going to test if the nofollow method works at all.
We ran the numbers (see math below) and found out we could run this test in either of two ways. The first way would only require 40 samples but would require a very high rate of success (nofollow beating control) to prove valid. The second test emphasizes precision and requires a much lower success rate but a much larger sample.
I have both tests planned and would love to hear your feedback on the tests prior to running them. Below is a diagram of the plan to test the nofollow method against a control (null) case.
This diagram shows an ultra simplified version of two test pages to be used in the new nofollow test. The real versions of the pages are more like the "Horsey Cow Tipper" example at the beginning of the post.
For this new test, both "normal links" will point to two separate pages trying to rank for the same unique phrase. Variable link 'x' will then link to a different page. Variable link 'y' will be nofollowed and will also link to a completely separate page. For each test group, we will see which of the two competing pages ranks higher. Our hypothesis is that the page linked to from the page with the nofollowed link (variable link 'y') will rank higher. We believe this because we think the control page will split its link value semi-equally between the two links on the page and thus not send its full worth to the page trying to rank for the unique term.
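To make that hypothesis concrete, here is a minimal sketch in Python (an illustration of the assumption being tested, not Google's actual algorithm; the function name and values are hypothetical):

```python
# Toy model of the hypothesis: a page has a fixed amount of link value
# that gets divided among its outbound links. This is the assumption
# under test, not a description of how Google actually computes PageRank.

def value_passed(link_value, links, sculpting_works=True):
    """Return the value each linked page receives.

    links: list of (target, is_nofollowed) tuples.
    If sculpting works, nofollowed links are ignored before the split;
    if it doesn't, they still consume a share but pass nothing on.
    """
    if sculpting_works:
        followed = [t for t, nofollowed in links if not nofollowed]
        share = link_value / len(followed) if followed else 0.0
        return {t: share for t in followed}
    share = link_value / len(links) if links else 0.0
    return {t: share for t, nofollowed in links if not nofollowed}

# Control page (variable link 'x' is a normal link): value is split.
print(value_passed(1.0, [("ranking page", False), ("other page", False)]))
# Nofollow page (variable link 'y'), if sculpting works: full value flows.
print(value_passed(1.0, [("ranking page", False), ("other page", True)]))
# Nofollow page, if sculpting does not work: the nofollowed share simply evaporates.
print(value_passed(1.0, [("ranking page", False), ("other page", True)], sculpting_works=False))
```

If the second behaviour is what actually happens, the page linked from the nofollow test page should tend to outrank its control-group twin; if the last behaviour holds, the two should be indistinguishable.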
This test will then be duplicated 20 times as seen in the diagram below.
Diagram showing simplified test pages from the new nofollow test
Are 20 tests (40 domains) really enough? We think yes but only for one very specific outcome. In order for this second test to be valid at 95% confidence, 15 out of the 20 tests will need to show that nofollow was an effective PageRank sculpting method.
If this doesn't happen, we will need to run a third test with a much bigger sample size. If we want to be 95% sure we will detect nofollow being better with 95% significance, even if the odds of nofollow winning a given trial are only 5 in 8, we will need 168 test pairs. (See math below)
What keeps you from making the same mistake?
While reworking the old test, I got the help of Ben Hendrickson who sits a few desks away. Please feel free to check our math before we run the test.
The Math Behind the 168 Pairs Nofollow Test
This test consists of a number of independent trials. In each trial, either nofollow or the control will rank higher. Thus the number of wins will be distributed according to a binomial distribution. Where n is the number of trials, and p is the probability that nofollow wins a trial, the normal approximation to the binomial distribution is:
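As a sketch of that approximation, using only the symbols already defined (W for the number of wins, n trials, win probability p), the standard normal approximation would be:

$$W \;\sim\; \mathcal{N}\big(np,\; np(1-p)\big)$$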
Where W is the number of wins, and z is the number of standard deviations above the mean, the formula for the number of wins is thus:
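Spelled out as a sketch in the W(n, p, z) notation used below, that is:

$$W(n, p, z) \;=\; np + z\sqrt{np(1-p)}$$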
The null hypothesis is that nofollow wins at p = 0.5 (even odds). To reject the null hypothesis in favor of the hypothesis that nofollow is an effective PageRank sculpting method with 95% confidence, we would need to see a minimum of W(n, 0.5, 1.645) wins. How many wins will we see? We are 95% sure to see at least W(n, p, -1.645), where p is the actual chance that nofollow wins a given trial. If we set a lower bound of p = 5/8 = 0.625 for what we are trying to detect, then we have a lower bound of seeing W(n, 0.625, -1.645) wins (with 95% likelihood) if in fact nofollow is at least that much better. We can set this lower bound on the number of wins we expect equal to the number of wins we need to see to have 95% confidence that nofollow is better. After that we can solve for the number of trials.
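As a rough sketch of that final step, setting the required wins W(n, 0.5, 1.645) equal to the expected lower bound W(n, 0.625, -1.645):

$$0.5\,n + 1.645\sqrt{0.25\,n} \;=\; 0.625\,n - 1.645\sqrt{(0.625)(0.375)\,n}$$

which simplifies to $1.619\sqrt{n} \approx 0.125\,n$, so $\sqrt{n} \approx 12.95$ and $n \approx 168$.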
So we conclude we need 168 trials. If this test fails to show nofollow is better, then we are 95% sure that nofollow wins trials less than 62.5% of the time. We wouldn't be able to say nofollow sculpting doesn't matter, but this does say it doesn't seem large in comparison to the other factors we were unable to control for in our experiment.
The Math Behind the 20 Pairs Nofollow Test
So then why don't we run this as our next test? The answer is simple. 168 trials is a lot of domains to set up. So maybe we will get lucky. If we do a good job of controlling for other factors and nofollow sculpting has a modest effect, perhaps nofollow will win much more frequently than 62.5% of the time on average.
To see 95% significance of nofollow doing better than the control, we will need to see 15 wins for nofollow out of the 20 trials. One could do more math for this, but how we actually got this number was with an online binomial distribution probability calculator. Plug in p = 0.5 (as this is the null hypothesis) and n = 20, then try various values for the number of wins until you find the lowest number whose probability of being equaled or exceeded is less than 5%. That number turns out to be 15.
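For anyone who wants to reproduce that cutoff without an online calculator, here is a quick sketch in plain Python (the helper name is ours; only the standard library is used):

```python
from math import comb

def tail_probability(wins, n=20, p=0.5):
    """Chance of seeing at least `wins` successes in n binomial trials."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(wins, n + 1))

# Smallest number of nofollow wins whose tail probability under the
# null hypothesis (p = 0.5) drops below 5%.
needed = next(w for w in range(21) if tail_probability(w) < 0.05)
print(needed, round(tail_probability(needed), 4))  # 15 0.0207
```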
Is there any takeaway from the original test?
Thousands of people read the post about the first PageRank sculpting methods and, based on my assessment, took it as truth. It wasn't until two days after posting the original entry that Darren Slatten pointed out my mistake. That means the damage had already been done and it would be practically impossible to contact all of the people who had read the post. The small number of people who did notice were (rightfully) upset with me. Their frustration with me and SEOmoz was vented on their personal blogs, Twitter, Facebook, e-mails and in the comments on the original post. This was a great (although unintentional) case study on how the Internet affects the distribution of information.
(mis)information on the Internet does not die.
We saw the very real effects of this on a large scale after the Iran election was covered by normal Iranian individuals on Twitter and on a very small scale with the test results of my first experiment. Once the information reached the Internet, it was out of the control of both its creator and those trying to silence it. For me, this was a much needed reminder of how much the Internet empowers its users. Together, all of you are a force to be reckoned with :-)
One more thing... Why don't you post the actual URLs so we can investigate them ourselves?
I will do this, but not right now. Posting them now would compromise the integrity of this and future tests. By linking to the test pages I change their link profile. I am happy to do this after the tests have been run and we no longer need the framework. I hope that makes sense :-)
I would love to hear your thoughts and constructive criticism on our new test. Please feel free to chat your brains out in the comments below :-)
The original test, including its flawed analysis, has redeemed its value in that it prompted another round of testing, which is itself unlikely to be the final round. I feel that the intense negative and sometimes unprofessional reaction to the results of the first test is just indicative of a young industry that has yet to learn that science has been making wildly inaccurate statements for a long time (most of them far more inaccurate and damaging than anything you said last week), but that tempered debate and re-testing only advance the discipline as a whole.
Again, the results probably aren't going to provide a conclusive answer to the question "does PageRank sculpting with nofollow work?", but it'll help the next test be more accurate and mean more. Science can also attest to this: you can't devise the perfect test without doing a whole lot of preliminary tests first, which all reveal their flaws to the benefit of future experiments. You demonstrated this very well last week, and now appear to be demonstrating an understanding of the idea that you can't call "proof" quite so easily.
Classy stuff, Danny. Echoing what others have said above, but it's so refreshing to see you work with the feedback you received from the original post in a highly positive manner, even though some of that feedback descended into negativity. I really hope that any further ideas people have about this sort of test can mirror the positive attitude you've taken here: you've shown that you're genuinely interested in the test's outcomes.
I'd like to second this. I think the way Danny responded to the feedback was an inspiration. Danny, you come across as level headed, humble and responsive, and I applaud you for that.
I'm really looking forward to the update on this test. The efforts being made are far beyond my own patience and capacity, and illustrate the benefits of resources like SEOmoz. Good stuff.
I can totally appreciate the amount of effort that this has taken, and I think everybody appreciates the attempts being made to test this, but....
I still feel that the fundamental measurement this test is built on is unusable. This test is in essence a binary one: either one page ranks higher or the other does. The ranking algorithm Google uses has hundreds of different facets that could tip these results one way or the other (0 or 1) regardless of the link graph of the pages you have so diligently set up.
Even if you were to create these pages identically and not run a test, one page would rank higher than the other.
Unless you can have a much, much larger test population or improve the signal-to-noise ratio (by knowing the means by which Google ranks pages), your test will, I believe, return results that have no more practical significance than those obtained by flipping a coin.
I'm not usually one to pee on people's bonfires, but I only offer this objection to prevent what could be a similar occurrence to a few weeks ago.
I have my own set of concerns with the current design of the experiment, but statistical significance is statistical significance. If you flip a coin 20 times and it comes up heads 15, there's a 95% chance that you've got a rigged coin.
My mistake, I meant practical significance, not statistical. Edited above.
Also, am I not right in suggesting that, as this test relies on multiple comparisons, the familywise error rate should be considered too?
Thanks ricky,
You raised my main concern which is that the experiment seems to "control" for an unnecessary number of variables. It's been a while since I've taken a course in experimental design, but it seems like there are much simpler ways to reach the same end.
I'd think dropping the variable external links would be a start to reducing the noise in this test. I'm still not 100% sure what they are there for: "mimic natural linking profile" is a rather leading statement!!
In this new round, Danny is only going to compare nofollow to the control group, and only judge if nofollow is doing better. So I think there will only be one comparison in the "family", making familywise error rate the same as normal single comparison error rate.
So a good issue to keep in mind for this kind of testing, but I think not an issue here. But let me know if you think it is, as (confession time) I just looked up familywise error rate on Wikipedia to know what you are talking about, so could be missing something.
Several "identical" tests are going to be run alongside each other, but the results are going to be compiled from these seperate tests. There may be a comparison between the control but the comparisons themselves are analysed as a group set of results. I believe (but it has been a very long time since I have done stuff like this so I welcome correction/education) that this requires the p-values to be tweaked somewhat to take into account false positives (I think).
I see your point - that we have multiple "tests", and when one has multiple "tests" one needs to use familywise error. But I think for the purposes of familywise error, we actually have one test with multiple trials.
Here is what I think familywise error is getting at: one is given a bag of 100 coins and asked to check if there are some rigged coins in it. So each coin is flipped 20 times. Assuming the bag contains no rigged coins, we expect to see around 5 that flipped 15/20 or more. One could naively conclude that we are 95% sure those coins are rigged, and thus at least 95% sure that the bag contained some rigged coins. But that is silly, because we will probably conclude that even if the bag contains only fair coins. Hence the need for familywise error to adjust the threshold for statistical significance.
But in Danny's new test, he isn't testing multiple coins. He is only testing the "no-follow vs control" coin. His "multiple tests" are really flipping the same coin multiple times.
I'm not sure I agree that these multiple tests can be considered exactly the same.
We have multiple sites on different domains that cannot possibly be identical. To continue the coin analogy, the tests are not tests of the same coin but rather of very similar coins. Therefore the chance of false positives exists. That level of possible error needs accounting for, surely?
Back to the main problem in this test as I see it:
We have 200+ external ranking factors that could tip the ranking of either page in the test to position 1 or 2, without the interference of nofollow.
Like I've said before, the test could be run with identical links pointing to each page (without nofollow) and Google would still rank one page higher than the other due to this large number of external variables. In essence, we need to know exactly how ranking works, and we just don't.
Fundamentally I think there is too much noise in the test to make any kind of reasonable assumptions, especially with a data set so small (not that I'm belittling the effort required to set up that many domains :) )
The trials (what you are calling "tests") need to be independently and identically distributed (commonly called IID), but they don't need to be identical. Flips from the same coin are not identical flips (they happen at different moments in time), but they are independent and identically distributed from the distribution of flips from that coin. Likewise, Danny's domains are different domains, but they are independent and identically distributed from the distribution of small domains set up that way. What matters is the differences between them are uniformly randomly distributed. So they meet the standard needed for being IID trials.
But your main point, that there will likely be too much noise for 20 trials to come up with anything besides being inconclusive, is totally right. The math I did with the 5/8 assumption (which Danny included in his post above) says Danny needs to do 168 trials. So thinking Danny will get anything except something inconclusive with 20 trials is optimistic. But the result will still be a bit interesting... more so, of course, if the results are extreme. And at least it will be a valid test even if it comes up inconclusive.
A good point.
Of course, a Bayesian would argue that if you have a strong prior belief that your coin is fair, then the evidence should only adjust your belief. So if you were 95% sure the coin was fair and you got a 15/20 result, then after the test you should believe there is a 50% chance the coin is fair.
This matters because if Tootricky is pretty sure the results will be inconclusive, but we get statistical significance with a 15/20 result, he still should be less than 95% sure the test showed nofollow worked.
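A rough sketch of that Bayesian adjustment in Python (the assumed bias of a "rigged" coin, 0.75 here, is entirely an assumption; the exact posterior depends heavily on it):

```python
from math import comb

def likelihood(heads, n, p):
    """Probability of exactly `heads` heads in n flips of a coin with bias p."""
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)

prior_fair = 0.95   # strong prior belief the coin is fair
p_rigged = 0.75     # assumed bias of a rigged coin (an assumption, not data)
heads, n = 15, 20

posterior_fair = prior_fair * likelihood(heads, n, 0.5) / (
    prior_fair * likelihood(heads, n, 0.5)
    + (1 - prior_fair) * likelihood(heads, n, p_rigged)
)
print(round(posterior_fair, 2))  # about 0.58 under these assumptions
```

So under these particular assumptions the 15/20 result moves you from 95% sure down to somewhere around 50-60% sure the coin is fair, which is the point being made: statistically significant evidence should shift, not replace, a strong prior.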
Thumbs-up for bringing Bayes into the conversation...
Danny, I cannot imagine how disappointed you were that your original post didn't receive the reception you expected and that the results weren't what you expected either. You must have spent a long time working on the experiment and I'm sure we all know how you're feeling -- probably gutted.
The way you're performing this experiment seems much more scientific and I assume that next time you're going to release the raw results for us to take a look at straight away with interpretation attached obviously. I think that by doing this, it will stress people out less and you'll get a better response.
I'm really looking forward to seeing what results you get, I hope they're what you expect this time.
Well said, Danny. I have said this in private, but it bears repeating in public: way to go with your appetite to learn and for admitting mistakes. I enjoyed brainstorming with you, Ben, Rand et al about how to structure the test this time round. Here's hoping for a level of significance we can measure with the small test.
PS. I'm thrilled that my 5/8 lower bound has made it this far ;) - remember the reasoning for that is not that there couldn't be smaller effects but that the smaller the effect the less we are interested in it, so we're happier to call 'fail' and say nofo doesn't sculpt if we can't detect something at that level.
Really nice and informative post! Thanks for the read...
This only shows that the people at SEOmoz are only human, and humans can make mistakes (how surprising ;))
It's a good thing to come clean with it and to make sure the mistakes are corrected.
It's big of you to publicly admit that you made a mistake with the original example. By doing this you are showing that SEOmoz is the real deal, and that you genuinely want to get it right.
The study is important, and the outcome will be useful to many of us in the business, so good luck with it.
This is something that most of us could not consider testing ourselves in any large scale, controlled way; and we rely on organisations such as yours to help us out with this.
Cheers
Ditto this Danny boy. I don't think a single reader is free from having made a mistake, and owning it the way you have only increases my confidence in you as well as the rest of the crew at the mozplex.
wow, maybe I should screw up more :-p j/k
Thank you everyone for your understanding and support :-) I really appreciate it. More tests to come.
Maybe this mistake was a blessing in disguise, helping everyone understand the strong ethical values behind the company. The information is solid and now there is great anticipation for the new results. Your time put into this is greatly appreciated, Danny. Strong work!
I have to admit, after reading the post and comments last week I really thought SEOMoz was going to stick to its guns on this one. I'm glad to see that my original assumption was wrong, and the testing will continue with guidance from the last test results and feedback.
Ironically, I just received my statistics book to help refresh myself, so this could be a good way for me to have a real-life example of how it all works again.
Can't wait to see the results of this latest test.
I thought that too. That post caused a lot of controversy in the SEOMoz community and I just hope this is a bit of a turnaround. I love this website and the articles. If you looked at my Delicious you'd see that the majority of the articles come from this place.
Edit: I'm glad you were wrong too. Wouldn't want the reputation of this place tarnished for the sake of that test.
More testing? Cool! :) I'm a big fan of a systematic, scientific approach to SEO, and I wholeheartedly support SEOmoz's efforts in this. Everyone makes mistakes, but it takes character to admit them. It takes even stronger character to then carry on and fix it. Kudos to you Danny. Can't wait for the results of the next test, even if it's another inconclusive one. That's how science works, one small step at a time.
This post took some balls. I respect that. We all make mistakes.
Danny,
As we say in Argentina, it takes "huevos" to admit you made a mistake and besides all the tech details, I think that you and the folks of Seomoz managed the problem the way it must be managed : telling the truth.
Un abrazo para todos,
Mariano.
Classy response Danny!
Spot-on specs listings too. Can't wait to see the new testing results and, as someone else has already said, perhaps this should be done each quarter to test on a regular basis?
:-)Jim
It is hard, but it is best to admit you are wrong. Much better than saying nothing. We are all only human; everyone tries their best.
Greetings from Germany
Not every day you see someone in Digital Marketing saying they made a mistake. Fair play.
I look forward very much to seeing the results of the new experiment!
Paul Martin
Cube3 Marketing
Danny, you still haven't described your setup in a way that would allow anyone to duplicate it, much less understand whether it even tests what you want to test.
On the original post, these were 4-page web sites. In this post you say they were 3-page web sites.
Is the following description of the test sites?
Home Page A links to content pages B and C. Link A->B is a normal link using anchor text D. Link A->C is a test link also using anchor text D. There are no B->C or C->B links
If not please explain what you are doing and for bonus points, why these were 4-page sites before.
Either way, please explain what is being measured as a result. Is this a comparison between the rankings of the (paired) sites' "Page B" for the search query?
Hi Dan,
Good question.
First off, the test was done with 3 pages, not 4. That was a typo. I corrected it in the old post.
Secondly, you have a lot more experience in this than I do so you might be able to help me answer your question.
I am more than happy to provide more detailed diagrams that answer your questions but I am fearful that this will destroy the integrity of the tests.
I know for a fact that there are Googlers who read this post and I don't want to give them the ability to track down the test pages and influence the results.
As you know, SE testing is a tricky problem because the engines don't particularly like to be tested. This is somewhat analogous to wave–particle duality. If you measure it (or in this case, provide enough information for others to influence the results) the thing being measured changes.
What would you do in my case?
Danny, let's just go to email on this - dan at SEO Fast Start dot com
It occurs to me that you could publish an example site without disclosing the actual test sites, but that still potentially discloses a footprint for the test sites.
If I worked at a certain search engine, I'd totally love spending my 20% time trying to mess up SEO tests. ;-)
Good follow up article, fair
looking forward to seeing the results.
Nice follow up.
And a very good way to apologise for a mistake.
It reminds me that we are all human. And the wise ones deal with their mistakes in a humble way.
It takes a big man to admit when he is wrong.....
....mind you those dissenters amongst us were clearly of loud voice when you made your last post.
Frankly I don't think we need huge amounts of data to back up the claim that PageRank sculpting is not dead - I still employ it on all my sites, but when the whole bombshell hit last time it just made me look much more carefully at the on-site navigation and strip out the fat that over-reliance on rel=nofollow had let creep in.
The advantage of this (being the first adopter in my industry) was that we had a surge in the SERPs which we continue to enjoy to this day.
Do I continue to rel=nofollow links of no significant search value?
YES...
...WHY: well - at the very least I don't want my own pages competing with the content pages I actually want to promote, so on that basis alone PR sculpting is still a valuable tool that should be in every competent SEO's armoury.
MOGmartin
I think this situation shows that no matter how trusted the source, it is important to analyze the results and methodology used and to test for yourself.
I in no way fault anyone for this mistake and look forward to the new test results.
It is refreshing that you are so transparent. Just because the first test was "inconclusive" does not mean the community didn't totally benefit from this learning experience.
"All life is an experiment. The more experiments you make, the better. " ;)
Very informative blog for me. I just registered with the site only to say thanks.
Wow, we appreciate the follow-up. After reading the original post, I started running some tests on my own site, putting nofollow on some less important pages to see the results. I did notice one interesting thing - in some of my double listings, the second listing changed from directions to our services page, which I thought was great. I haven't determined yet if this is directly related to the nofollow I added, but I couldn't help but be intrigued to look into it more.
Maybe nofollow doesn't pass PageRank anymore, but I still can't help but think it has some effect. I can't wait to see what the new test shows, and I just wanted to throw out what I found.
Thanks again SEOMOZ!
Good luck buddy - looking forward to seeing the results.
dag. you guys put data and blog posts together in ways unlike any I've ever seen.
I understood none of this, but I'm sure it was correct
Cheers,
Eric ;)
Thank you for posting this article. I really didn't realise there was so much involved in nofollow - all this maths and the websites you made to run the test. It's got really useful info.
Ah Danny, thanks a lot for clearing up the questions in my mind about the previous test. PR sculpting is pretty important :)
It would be pretty darn interesting if the results were to show that nofollow still passed linkjuice. It would really change some SEO tactics overnight.
Unless I'm mistaken, we're at the very last step on this chart, right?
Aww Shucks
I came here only to make sure the chart had already been linked in the comments.
Will be interesting to see what the results show and also to get the URLs so we can look for ourselves.
Thanks for taking the time to retest this and acknowledge your results from the last test.
It takes a lot of time and hard work to set all of that up, and even more props for stating you made a mistake, fixing it and posting this to clarify things for all the SEOs out there who can't take time away from their PRECIOUS clients to figure this type of thing out. It's inspiring to our Labs team.
:)
@KRONiS
This just shows that you should not blindly follow the advice of others, even from a leading authority/site. In fairness I've found most of the information on SEOmoz reliable, but I believe you really need to do your own testing before implementing such suggestions.
The first experiment was a little off, but we all forgive!
I really love how you gave me the science of the test. Right on, and keep moving up and on!
You're awesome!
https://listformula.com
Nice follow up.
That’s very interesting. I’m glad you posted this SEO information. I can really make good use of these tips. I’m looking forward to reading more of your posts.
Thanks
Admin
www.nexpider.com
You called your site Neck Spider.
Now I'm all jumpy.
Yet again you get it wrong; link sculpting and nofollow are totally misunderstood. Are you building sites for the engines or for the users? And what makes you think the indexing of the sites and the paired tests was adequate enough to actually come to conclusive results, or that there was enough time for an indexing strategy to develop for and by the engines?
Yet another reason I have not renewed my membership to SEOmoz.
I think you are a really smart guy but incredibly misled, kinda like those benevolent dictator scenarios.
The test was a good second attempt and nicely thought out, but seriously, there are other issues to look at, and all you did was burn my time answering client questions on a topic that doesn't bear thought.
You scallywag ;-)
WOW, I think it's undeniable you are on the cutting edge of SEO. You're a scientist.