Penguin changed everything. Most search engine optimizers like myself, especially those who operate in the gray areas of optimization, had long grown comfortable with using "ratios" and "percentages" as simple litmus tests to protect ourselves against the wrath of Google. I can't tell you how many times I both participated in and was questioned about what our current "anchor text ratio" was. Many of you probably remember having the same types of discussions back in the keyword-stuffing days.
We now know unequivocally that Google has used and continues to use statistical tools far more advanced than simply looking at where an individual ranking factor sits on a dial. (We certainly have more than enough Remove'em users to prove that.) My understanding of Penguin and its content-focused predecessor Panda is that Google now employs machine-learning techniques across large data sets to uncover patterns of over-optimization that aren't easily discerned by the human eye or the crude algorithms of the past. It is with this understanding that I and my company, Virante, Inc., undertook the Open Penguin Data project, and ultimately formed our Penguin Vulnerability Score.
The Open Penguin Data Project
Matt Cutts occasionally gives us a heads-up about future updates, and in the Spring of 2013 we were informed that within a few weeks Penguin 2.0 would roll out. I remember exactly when the idea hit me. I was reading "How is Big Data Different from Previous Data" by Bryan Eisenberg, and it occurred to me that the kind of stuff we were doing at Remove'em to detect bad links just couldn't pass muster against the sophistication of the "big data" analysis Google was using at the time. So Virante went to work. We started monitoring a huge number of keywords, so that when Penguin 2.0 hit we could catch winners and losers. In the end, we used data from three different awesome providers: Authority Labs (for the initial data set), Stat Search Analytics (for cross-validation) and SerpMetrics (for determining that we weren't just picking up manual penalties). We identified around 600 losing URL/keyword pairs and matched them with their competitors who did not lose rankings.
We then opened the data up to the community at the Open Penguin Data project website and asked members of the community to contribute ideas for factors that might influence the Penguin algorithm. You can go there right now and download the latest data set, although at present I know there is a bug in the mozRank and mozTrust columns that needs to be fixed. We have identified over 70 factors that may influence Penguin and are still building upon them, with the latest variable update on October 14th. Unfortunately, only certain variables can be added at this point, as freshly collected data is no longer relevant to the original update. The data behind the factors came from a large number of sources, beginning with Moz of course, and including Majestic SEO, Ahrefs, GrepWords, and Archive.org.
We then began to analyze the data in a number of ways. The first was through standard correlation coefficients to help determine direction of influence (assuming there was any influence at all). It is important that I deal with the issue of correlation vs. causation here, because I am sure one of you will bring it up.
Correlation vs. causation
The purpose of the Open Penguin Data Project was not and is not to determine which factors cause a Penguin penalty. Rather, we want to determine which factors predict a Penguin penalty so that we can build a reasonable model of vulnerability. Once we know a website's vulnerability to Penguin, we can start applying techniques, closer to the realm of causal factors, to lower that vulnerability.
For example, we will talk about the difference between mozTrust and mozRank as being a fairly good predictor of Penguin. No one in their right mind believes that Google consumes Moz's data to determine who to penalize and who not to. However, once we know that a site is likely to be penalized (because we know the mozTrust and mozRank differential), we can start to apply tactics that will likely counter Penguin, such as using the disavow tool or removing spammy links. We aren't talking about causation; we are talking about prediction.
The analysis of the risk factors
We then began analyzing the data using a couple of methods. First, we used standard mean Spearman correlations to give us an idea of the lay of the land. This also allowed us to build a crude regression model that actually works quite well without much tweaking. The model essentially comes from adding up the correlation coefficients for each of the factors. Obviously, more sophisticated modeling is better than this, but for a crude overview it works quite nicely and can be done on the fly. The real magic happens, though, when we apply the same sorts of machine-learning techniques to the data set that Google uses in building models like Penguin.
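To make the crude additive model concrete, here is a minimal sketch in Python of how it could be computed. The file name, the "penalized" label column, and the factor columns are purely illustrative stand-ins for the Open Penguin Data export, not the project's actual schema.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical data: one row per URL/keyword pair, binary factor flags
# plus a 1/0 label for whether the pair lost rankings after Penguin.
df = pd.read_csv("open_penguin_data.csv")
factor_cols = [c for c in df.columns if c != "penalized"]

# Mean Spearman correlation of each factor against the Penguin label.
correlations = {
    col: spearmanr(df[col], df["penalized"]).correlation
    for col in factor_cols
}

# Crude "vulnerability" score: sum the correlation coefficients of the
# factors a given site actually trips. Higher totals = higher predicted risk.
df["crude_vulnerability"] = sum(
    df[col] * coef for col, coef in correlations.items()
)
print(df[["penalized", "crude_vulnerability"]].head())
```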
Let me be clear: I do not presume to know what statistical techniques Google used to build their model. However, there are certain types of techniques that are regularly used to answer these types of multivariate classification problems, and I chose to use them. In particular, I chose to use a gradient boosting algorithm. You can read up on the methodology or the specific implementation we used via scikit-learn, but I'll save you the headache and tell you what you need to know.
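Here is what that kind of gradient boosting run might look like with scikit-learn, continuing the hypothetical data layout from the sketch above. The hyperparameters are ordinary illustrative values, not the settings we actually tuned.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("open_penguin_data.csv")       # same hypothetical export as above
X = df.drop(columns=["penalized"]).values       # the ~70 candidate factors
y = df["penalized"].values                      # 1 = lost rankings after Penguin

# Gradient boosting builds an ensemble of shallow trees, each one correcting
# the errors of the previous ones -- well suited to finding patterns across
# many weak, correlated signals.
model = GradientBoostingClassifier(
    n_estimators=300, learning_rate=0.05, max_depth=3, random_state=42
)

# Cross-validated AUC gives a fairer read on predictive power than scoring
# the model on the same rows it was trained on.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("Mean AUC: %.3f" % scores.mean())

model.fit(X, y)
vulnerability = model.predict_proba(X)[:, 1]    # predicted probability of being hit
```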
Most of us think about statistical analysis as putting some variables in Excel and making a nice graph with a linear regression that shows an upward or downward trend. You can see this below. Unfortunately, this grossly over-simplifies complex problems and often produces a crude result where everything above the line is treated as different from everything below it, when clearly it is not. As you see in the example graph below, plenty of penalized sites fall below the line and get missed, and plenty of perfectly decent sites sit above the line and get hit.
Classification systems work differently. We aren't necessarily concerned with higher or lower numbers; we are concerned with patterns that might predict something. In this case, we know which sites were hit by Penguin, so we take a whole bunch of factors and see how well the patterns among them predict those hits. We don't need to draw an arbitrary line; we can analyze the points individually using machine learning, as you see in the example graph below.
The hard part is that machine learning tells us a lot about prediction, but not a lot about how we came to that prediction. That is where some extra work comes into play. With the Open Penguin Data project, we grouped some of the factors by common characteristics and measured the effectiveness of their predictions in isolation from the other factors. For example, we grouped trust metrics together and anchor text metrics together. We then grouped them in combinations as well. This then gave us a model we could use to determine not only increased Penguin vulnerability, but also what factors contributed to that vulnerability and to what degree.
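A rough way to approximate that grouping exercise is to score the model on one group of factors at a time and compare cross-validated performance. The group names and column names below are illustrative, not our exact groupings.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("open_penguin_data.csv")   # hypothetical export, as above

# Illustrative factor groupings -- the column names are hypothetical.
groups = {
    "exact_match_anchors": ["exact_top_anchor_page", "exact_any_anchor_domain"],
    "trust_metrics":       ["mozrank_gt_moztrust", "citation_gt_trust_flow"],
    "commercial_anchors":  ["high_cpc_anchor", "majority_valuable_anchors"],
}

def group_auc(cols):
    """Cross-validated AUC using only one group of factors."""
    clf = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=42)
    return cross_val_score(clf, df[cols], df["penalized"],
                           cv=5, scoring="roc_auc").mean()

# Score each group in isolation, then all groups combined, to see which
# groupings carry the predictive weight and by how much.
for name, cols in groups.items():
    print(name, round(group_auc(cols), 3))
all_cols = [c for cols in groups.values() for c in cols]
print("all groups combined", round(group_auc(all_cols), 3))
```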
So, let's talk through some of them here.
Anchor text
By now, everyone and their paid search guy knows that manipulated commercial anchor text is a risk factor for both algorithmic and manual penalties. So, of course, we looked at this closely from the start. We actually broke down the anchor text into three subcategories: exact-match anchor text (meaning the anchor text is exactly the keyword for which you would like to rank), phrase-match anchor text (meaning the keyword for which you would like to rank occurs somewhere within the anchor text), and commercial anchor text (the anchor text has a high CPC value).
Exact-match anchor text
We broke exact-match anchor text down into six metrics (a rough sketch of how these flags might be computed follows the list):
- The most common anchor to the page is exact match
- The highest mozRank passed anchor to the page is exact match
- There is at least one exact match anchor to the page
- The most common anchor to the domain is exact match
- The highest mozRank passed anchor to the domain is exact match
- There is at least one exact match anchor to the domain
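Here is the feature-extraction sketch promised above, assuming a simple backlink export with an anchor string and the mozRank passed by each link; the field names and sample data are hypothetical.

```python
from collections import Counter

# Tiny illustrative backlink sample -- in practice this would come from a
# Moz / Majestic / Ahrefs export.
page_links = [
    {"anchor": "discount widgets", "mozrank_passed": 0.35},
    {"anchor": "Acme Widgets blog", "mozrank_passed": 0.80},
    {"anchor": "discount widgets", "mozrank_passed": 0.10},
]

def exact_match_flags(links, keyword):
    """Compute the three exact-match flags for whatever scope (page or domain)
    the link list represents."""
    kw = keyword.lower().strip()
    anchors = [link["anchor"].lower().strip() for link in links]
    if not anchors:
        return {"most_common_exact": False, "top_mozrank_exact": False,
                "any_exact": False}
    most_common_anchor = Counter(anchors).most_common(1)[0][0]
    top_link = max(links, key=lambda link: link["mozrank_passed"])
    return {
        "most_common_exact": most_common_anchor == kw,                   # metrics 1 and 4
        "top_mozrank_exact": top_link["anchor"].lower().strip() == kw,   # metrics 2 and 5
        "any_exact": kw in anchors,                                      # metrics 3 and 6
    }

# Run once over links to the ranking page and once over links to the whole
# domain to get all six flags.
print(exact_match_flags(page_links, "discount widgets"))
```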
Across the board, every single metric related to anchor text provided some positive predictive power except for the highest mozRank passed anchor to the domain. Importantly, no single factor had a particularly strong mean Spearman correlation coefficient. For example, the strongest was simply whether the domain had at least one link with exact-match anchor text (a 0.11 correlation coefficient). This is a very weak signal, but our analysis looks for patterns across these weak signals, so we are not necessarily hindered by the fact that no single measurement is sufficiently predictive on its own.
For the biggest victims of Penguin, we often see that exact-match anchor text is the second- or third-largest predictor. For example, the webmaster below could lower their predicted vulnerability score by 50% simply by addressing exact-match anchor text links. This particular webmaster's links tripped nearly every positive anchor text signal we measure.
Now let me say it one more time: I am not saying that Google is using anchor text to determine who to penalize, rather that it is a strong predictor. Prediction is not causation. However, we can say that the groupings of exact-match anchor text metrics allow us to detect Penguin vulnerability quite well.
Phrase-match anchor text
We broke down phrase-match anchor text in the exact same fashion. This was one of the more surprising features we noticed. In many cases, phrase-match anchor text metrics appeared to be more predictive than exact-match anchor text. Many SEOs, myself included, have long depended on what we call "brand blend" to protect against over-optimization penalties. Instead of just building links for the keyword "SEO", we might build links for "Virante SEO" or "SEO by Virante". This may have insulated us against manual anchor text over-optimization penalties, but it does not appear to be the case with Penguin.
In the example I mentioned above, the webmaster tripped nearly every exact-match anchor text metric. They tripped every phrase-match metric as well. The combination of these factors increased their predicted likelihood of being impacted by Penguin by a full 100%.
Shoving your high-value keywords inside other phrases doesn't guarantee you any protection. Now, there are a lot of potential takeaways from this. It could be an artifact of merely doubling the exact-match influence (i.e., if you score high on exact match, you will also score high on phrase match). We do see some of this occurring, but it doesn't appear to explain all of the additional predictive power. It could be that these sites are targeting other related keywords and thereby increasing their exposure to other parts of the Penguin algorithm. All we know, though, is that the predictive power of the model increases greatly when we take phrase-match anchor text into account. Nothing more, nothing less.
Commercial anchor text
This is my favorite measure of all, as it shows how Google can use one of its most powerful ancillary data sets, bid prices for keywords, to detect manipulation of the link graph. We built four metrics around commercial anchor text (an illustrative sketch of these flags follows the list):
- The page has a high-value anchor in a single link
- The majority of the anchors are valuable
- The majority of links are very high-value anchors
- The site has high-CPC anchors site-wide
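And here is the illustrative sketch of those four flags, assuming a CPC lookup table keyed by anchor phrase (for example, one built from GrepWords-style keyword pricing). The dollar thresholds are arbitrary examples, not the cutoffs used in the study.

```python
HIGH_CPC = 1.00        # illustrative thresholds, in dollars
VERY_HIGH_CPC = 5.00

def commercial_anchor_flags(anchors, cpc_lookup, sitewide_anchors=None):
    """anchors: anchor strings pointing at the page.
    cpc_lookup: dict mapping anchor phrase -> estimated CPC.
    sitewide_anchors: anchors seen across the whole site (optional)."""
    cpcs = [cpc_lookup.get(a.lower(), 0.0) for a in anchors]
    valuable = [c for c in cpcs if c >= HIGH_CPC]
    very_valuable = [c for c in cpcs if c >= VERY_HIGH_CPC]
    site_cpcs = [cpc_lookup.get(a.lower(), 0.0) for a in (sitewide_anchors or [])]
    return {
        "any_high_value_anchor":    len(valuable) > 0,                  # metric 1
        "majority_valuable":        len(valuable) > len(cpcs) / 2,      # metric 2
        "majority_very_high_value": len(very_valuable) > len(cpcs) / 2, # metric 3
        "high_cpc_sitewide":        bool(site_cpcs)
                                    and sum(site_cpcs) / len(site_cpcs) >= HIGH_CPC,  # metric 4
    }

# Hypothetical CPC data and anchors for a single page.
cpc_lookup = {"payday loans": 8.40, "cheap hotels": 2.10, "my travel blog": 0.0}
anchors = ["payday loans", "cheap hotels", "my travel blog"]
print(commercial_anchor_flags(anchors, cpc_lookup))
```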
Both having high-value anchors and having very high-value anchors had strong predictive value for Penguin vulnerability. In keeping with the example we have been using so far, you can see that removing commercial anchor text would have a profound impact on our prediction as to whether or not the site will be impacted by Penguin.
If you've been paying close attention, you may have noticed that a lot of these are related. Having exact-match and phrase-match anchor text likely means you have highly commercial anchors. All of these metrics are related to one another and it is their combined weak signals that make it easier to detect Penguin vulnerability.
Link sources
The next issue we tried to target was the quality of link sources. The most obvious step was trying to detect commonly spammed link sources: directories, forums, guestbooks, press releases, articles, and comments. Using a set of footprints to identify these types of links and spidering all of the backlinks of the training set, we were able to build a few metrics identifying sites that merely had these types of links and sites where such links made up a preponderance of the profile.
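For a sense of how the footprint approach works, here is a simplified sketch that classifies referring URLs against a handful of illustrative regex footprints; the real footprint set was considerably larger.

```python
import re

# Illustrative footprints for commonly spammed link sources.
FOOTPRINTS = {
    "directory":     re.compile(r"/directory/|submit[-_]?(your)?[-_]?site", re.I),
    "forum":         re.compile(r"/forum|viewtopic\.php|showthread\.php", re.I),
    "guestbook":     re.compile(r"guestbook|gaestebuch", re.I),
    "press_release": re.compile(r"press[-_]?release|/pr/|prweb|prlog", re.I),
    "article":       re.compile(r"article[-_]?(directory|base)|ezinearticles", re.I),
    "comment":       re.compile(r"#comment-|\?replytocom=", re.I),
}

def classify_link_source(referring_url):
    """Return the set of spammy source types a referring URL matches."""
    return {name for name, pattern in FOOTPRINTS.items()
            if pattern.search(referring_url)}

def link_source_metrics(referring_urls):
    """Flag whether any, and what share, of a profile's links come from
    commonly spammed source types."""
    hits = [classify_link_source(u) for u in referring_urls]
    flagged = [h for h in hits if h]
    types_seen = set().union(*hits) if hits else set()
    return {
        "has_spammy_sources": bool(flagged),
        "share_spammy": len(flagged) / len(referring_urls) if referring_urls else 0.0,
        "distinct_spammy_types": len(types_seen),
    }

print(link_source_metrics([
    "http://example-articles.com/article-directory/widgets.html",
    "http://someblog.com/post/review#comment-4412",
    "http://trustednews.example.org/story/widgets",
]))
```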
First, it was interesting that every type of link was positively correlated, but only very weakly. You can't just look at a bunch of article directory submissions and assume that is the cause of a Penguin penalty. However, the combination (a site that relies on four or five of these techniques for nearly all of its PageRank) does appear to carry a greater risk.
At this point, I want to stop and draw attention to something: Each of these groupings of factors appears to have some good predictive value, but none of them comes even close to explaining the whole vulnerability. Fixing your exact-match anchor text links, or phrase-match links, or commercial anchor links, or poor link sources by themselves will not insulate you from detection. It is the combination of these factors that appears to increase vulnerability to Penguin. Most sites that we see hit by Penguin have vulnerability scores of 250%+, although in Penguin 2.1 we saw them as low as 150%. To get to these levels you have to trip a wide variety of factors, but you don't have to be egregiously violating any single SEO tactic.
Site-wides
This was one of the most disappointing features we used. I was certain, as were many, that site-wide links would be the nail in the coffin. Clearly site-wide links are the culprit behind the Penguin penalty, right? Well, the data just doesn't bear that out.
Site-wides are just too common. The best sites on the web enjoy tons of site-wide links, often in the form of blogrolls. In fact, high site-wide rates correlate negatively with Penguin penalties. Certainly this doesn't mean you should run out and try to get a bunch of site-wide links, but it does raise the question: are site-wides really all that bad?
Here is where we find the real difference: anchor text. Commercial anchor text site-wides positively correlate with Penguin penalties. While we cannot say they cause them, there is definitely a predictive leap between just any old site-wide link and a site-wide link with specific, commercially valuable anchor text.
This also helps illustrate another issue we SEOs often run into: anecdotal evidence. It is really easy to look at a link profile, see that site-wide, and immediately assume it is the culprit. That assumption then seems to be reinforced when we scratch the surface with too simple an analysis, like looking at the preponderance of that feature among penalized sites. It can and often does lead us down the wrong path.
Trust, trust, trust
Of all the eye-opening, mind-blowing discoveries revealed by the Open Penguin Data project, this one was the biggest. At minimum, we all need to tip our hats to the folks at Moz and Majestic for providing us with great link statistics. Two of the strongest metrics we found in helping predict Penguin vulnerability were MozRank greater than MozTrust (Moz) and Domain Citation Flow over Domain Trust Flow (Majestic).
Both Moz and Majestic give us statistics that mimic, to a certain degree, the raw flow of PageRank (MozRank and Citation Flow) and an alternative often referred to as TrustRank (MozTrust and Trust Flow). The two are calculated essentially the same way, except that the trust metrics start from a seed set of trusted URLs (.govs, .edus, and the like) and give extra value to sites that earn links from those trusted sources. These metrics by themselves, while useful in other endeavors, don't really give us much information about Penguin.
However, if we flag URLs and domains where the trust metrics are lower than the raw link metrics, we score some of the highest correlations of all factors tested. Even cruder metrics, like whether or not the domain has a single .gov link, help predict Penguin vulnerability. While it would be insane to conclude that Google has a subscription to Moz and Majestic and uses them to build its Penguin algorithm, this much appears to be true: in the aggregate, cheap, low-quality links are a Penguin risk factor.
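In code, these trust flags are as simple as they sound. The sketch below assumes you have already pulled the four metrics from the Moz and Majestic APIs; the variable names are ours, not official API field names.

```python
def trust_deficit_flags(mozrank, moztrust, citation_flow, trust_flow,
                        has_gov_link=False):
    """Flag domains whose raw link metrics outrun their trust metrics --
    the pattern that correlated most strongly with Penguin losses."""
    return {
        "mozrank_gt_moztrust":    mozrank > moztrust,           # Moz version
        "citation_gt_trust_flow": citation_flow > trust_flow,   # Majestic version
        "no_gov_link":            not has_gov_link,             # cruder proxy
    }

# Example: lots of raw link equity, relatively little of it from trusted sources.
print(trust_deficit_flags(mozrank=5.9, moztrust=4.8,
                          citation_flow=41, trust_flow=22))
```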
What we should learn
There are some really amazing takeaways that we can build from this kind of analysis, the kind of takeaways that, for those of you who are not yet seasoned professionals, should change your understanding of Penguin and Google's algorithm. So let's dive in...
Penguin isn't spam detection, it's you detection
Try this fact on for size: if you hit every anchor text trigger in the Open Penguin data set, our predictive model actually DROPS in effectiveness. At first glance this seems counter-intuitive; certainly Google should catch these extreme spammers. The reality is, though, that cruder algorithms generally clear out this type of search spam. If you have done any traditional off-site SEO in the last three years, it will probably create additional Penguin vulnerability. The Penguin update is targeted at catching patterns of optimization that aren't so easily detected. The most egregious offenders are more likely to be caught by other algorithms than by Penguin. So when the next Penguin update comes out and you hear people complain about how some spam site wasn't affected, you can be confident that this isn't a flaw in Penguin, but rather a deliberate choice on Google's part to create separate algorithms that target different types of over-optimization.
The rise of the link assassin
It was Ian Curl, a former Virante employee and now head of Link Assassins, who first pointed out to me the clear future of SEO: pruning the link graph. Google has essentially given us the tools, via Google Webmaster Tools (GWT), to both view our links and disavow them. A new class of link removal and disavow professionals has grown up over the last year: SEOs who can spot a toxic link and guide you through the process of not just cleaning up a penalty but proactively managing your link profile to avoid penalties in the first place. These "link assassins" will play a vital role in the future of SEO, in much the same way that one would expect a professional gardener to prune back excessive growth.
The demise of cheap, scalable white-hat link building
Let me be clear: if it works, Google wants to stop it. We have already heard Matt Cutts fire shots across the bow at lily-white link-building techniques like guest posting. Right now, the only hold-out I see left is broken link building, which is only scalable under certain circumstances. Google is doing its best to identify the exact same footprints you use to build links and to add them to its own link-pattern detection. It isn't an easy task, which is why Penguin only rolls out every few months, but it appears to be one to which Google is committed.
The growth of integrated SEO
There is no way around it. If you are interested in long-term, effective, white-hat SEO, you are going to have to build integrated campaigns, largely focused around content marketing, that include multiple forms of advertising. There is a great write-up on this by Chris Boggs over at Internet Marketing Ninjas on Integrating Content Marketing into Traditional Advertising Campaigns. As Google continues to get better at detecting unnatural patterns, it will be harder and harder to get away with simply turning one dial at a time.
Next steps
The average webmaster or SEO needs to step back and make an honest assessment of their current SEO footprint. I don't mean to fear-monger; only a fraction of a percent of all websites will ever get hit by Penguin. But 75% of adult males who smoke a pack a day will never get lung cancer, and that doesn't mean you should keep smoking because the odds are in your favor. While the odds are greatly in your favor that Penguin will never strike your site, there is no reason not to take simple precautions to determine whether your tactics are putting your site at risk.
Big Mistake in Write Up:
Hey folks, if you didn't catch it, it should say "MozRank greater than MozTrust (Moz) " not "MozTrust greater than MozRank (Moz) " in the trust section!!!
First of all, thanks for the great study and all the explanations and takeaways!!
The second reason for this comment is the following. It turned out that in the trust section there was a mistake, which you identified and fixed:
'... it should say "MozRank greater than MozTrust (Moz) " not "MozTrust greater than MozRank (Moz) " in the trust section!!!'
But also in this section I found another questionable statement:
'Both Moz and Majestic give us statistics that mimic to a certain degree the raw flow of PageRank (MozTrust and Citation Flow) and an alternative often referred to as Trust Rank (MozRank and Trust Flow).'
Shouldn't they be swapped? I mean MozRank and MozTrust. Because it seems more likely to be like this:
'... a certain degree the raw flow of PageRank (MozRank and Citation Flow) and an alternative often referred to as Trust Rank (MozTrust and Trust Flow).'
Anyway, I hope you'll make it clear for me and others.
And thanks again for sharing it with us :)
This is a really detailed and really useful post. It's sparked some interesting points of discussion in the comments too. Thank you.
"While the odds are greatly in your favor that Penguin will never strike your site, there is no reason to not take simple precautions to determine whether your tactics are putting your site at risk."
Great point. I recently started working with a site that sailed through Penguin 1.0 and 2.0 (and everything in between) just fine, but then got nailed by Penguin 2.1. Just because you haven't been whacked yet doesn't mean you are completely immune.
On top of that, there is a high likelihood that Google targets pages that are already receiving traffic, meaning that you become vulnerable once you reach the rankings you have been trying to earn. It is hard to know whether your site has been analyzed by the Penguin algorithm until you have strong rankings and make it through an update.
I think you hit the nail on the head with this statement. Whenever a site is succeeding against the monster companies that spend millions on advertising, an update will come out to push the big companies up the rankings. Just another way Google is keeping the big guys happy and keeping them spending.
I am an advocate for the smaller business, as I was once a smaller business in the insurance field battling sites like eHealthInsurance, GoHealth, and a few more of the top big-5 players spending millions on advertising. We had them all beat at one point, but then of course Google rewarded the big players after the updates, most likely because they were spending with Google and the smaller business was not. The smaller business was actually doing the correct things and still got hurt.
Well Ched, I know you through SEO-Hop & Panfecta, developed by IT Chimes! You provided some useful information related to stabilizing rankings from the Google dance after Penguin 5, and I found it very useful.
Nick, this is a good point, and I think everyone who has had any link building performed as part of their SEO strategy in the past should take a good hard look at their backlink profile. Penguin ushered in a paradigm shift in SEO. Those who recognize this and adapt will succeed; those who don't, won't.
Thank you for the endorsement, Russ. The OPD project is a great community effort and it's good to see the predictive tool resulting from that work. Great post.
No problem - there are a lot of great link removers out there and it is great to see people starting to perfect their craft.
Hey folks, I'll get to these comments as soon as I get into work later today. Also, I'll be speaking on this topic Thursday at Pubcon for those of you attending.
Excellent post, Russ. Love the graphic. Gonna head over to PenguinAnalysis.com right now
Nice graphic - that must have taken quite a while to make.
Wow, this post is so useful that I wrote you a generic comment message. Thanx!
winner of the best comment goes to...
Great comment here :) I used an ungeneric one - but uncommented x)
It was and still is a great post
That's a cool post, Russ. Predicting in advance which factors can get you penalized, and working on them before you actually get hit by Penguin, is fascinating.
Many webmasters are aware of some factors which can get them kicked by Penguin, but tools like yours would help them to find out more vulnerabilities in their website and work on them before the disaster arrives.
Thanks for sharing the tool.
Thanks for the positive comments. I really just wanted people to see that Google is looking at patterns of manipulation, not single data points. It makes it a lot harder to game.
Well said, Russ. Penguin, Panda, and now Hummingbird: Google is introducing new algorithms one by one, but the motive of all the updates is the same. Create unique content, and put lots of variation in your anchor text. Having duplicate content or exact-match anchor text is not a good practice.
What are you smoking? Content appears to have little to do with ranking. I see sites on the first page with their sitemap ranking!?!
Zero content! Rubbish duplicate content, errors, NO on-site work, but they rank, and are still there after whatever animal Google rolls out. This has nothing to do with quality; those words are a smokescreen as organic search is dismantled.
The current Google mindset of presenting alternative searches is a backward step. I want a page full of similar results that I can open in tabs and compare for myself, not be given what Google deems fit for me to receive.
If I search for Brad Pitt, that is what I want to know about, not one or two results about Brad Pitt and then a page about a guy called Brad who once dug a pit, or Brad's fire pit warehouse.
More often in results I see two moderately relevant results, then a series of results with the two-word query separated by a paragraph of content. What?
Google is broken; results are less effective each time, and the push towards fully paid search continues.
Now these updates have reached the point where you are afraid to talk about your product for fear of tipping some scale. And how are you supposed to link from information sites or third-party advertising sites?
If your company name is a keyword and you build links with your company name, how is that considered spam?
It appears it is now!
Let's suppose a site is called ABC Condos; the word condo is unavoidable, even in URL links, company names, and URL links to specific condo pages.
One very revealing conclusion from the data is that a high-CPC keyword site will be at risk. Of course: those are the people who are most likely to be forced to use AdWords if the site is penalized!
Don't tell me Google is not singling that sector of customers out for demotion.
I don't know how you search Brad Pitt, but I get 100% related searches...
It was a hypothetical example of where search is going, and perhaps not the best one, as the results would of course be Wikipedia, IMDB, and some news sites before the plethora of rubbish sites driven by social media.
Can't believe you actually searched it!
I stand by what I say; it is like 2006 in search: Wikipedia and news sites, or the classifieds sections of online newspapers showing a ton of outdated ads to trawl through.
If I want a product, I want a company, not a classifieds section.
Right product/page/provider not a garage sale.
I am not an old dear with time to waste at the mall.
...and that's why Hummingbird is on.
No offense but nonsense!
Hummingbird has been running for more than a month now and the results are still rubbish.
Still full of sites with no on-site optimization and spammy backlinks.
As the author said, a high-CPC keyword will put your site in a dangerous place, as Google sees you as $$$ and nothing else.
Excellent piece of research and some great advice - thanks!
It's great to see some refreshing, data-driven conclusions here. That said, I've said it before, and I'll say it again: I think there's a fundamental assumption here, and I think it's wrong.
I do not believe that Google's algorithm penalizes sites for their inbound link profile. (They do have a manual penalty for this, but that's beside the point.) I believe that Google penalizes sites for their OUTBOUND link profile. You lose rankings because the sites linking to you have been penalized, not because you've been penalized.
I wrote about this here: https://www.northcutt.com/blog/2013/10/are-negative-links-a-myth-hint-the-answers-complicated/
I don't want to speak with too much certainty here; none of us know the secrets of the algorithm. That said, based on everything Google has said, and thinking like a search engine, not like an SEO, it makes very little sense to design an algorithm that penalizes sites for their inbound link profiles.
I would love to see what the data would look like if we evaluated things based on that assumption instead.
Just my 2 cents.
Great post, thanks for sharing this valuable information.
The post mostly emphasized anchor text variation and the trust factor, with the trust factor given first priority.
Awesome information that was explained very well.
Question - how do keyword anchor text links ON-SITE figure into this new Penguin? What if your site has a lot of anchor text links in content areas on your site?
Thanks
ron
Amazing data analysis! I am definitely going to be following you Russ
Nice article, love the Duck Hunt graphic!
Thanks! Decided to make that the background for one of my Pubcon preso's
Great information!
Great article, Russ and well worth the read.
This is a very good post about how to prevent your website from getting penalized, and what measures should be taken to stay on the safe side and not get caught by the Penguin radar.
Overall a great post from Russ. There are lots of reasons sites are getting penalised, but the new manual actions option is great; that's if the user/client actually has Webmaster Tools linked. This is the problem if you get new clients who have been hit but you don't have access to WMT.
Thanks for passing this along Russ. It will be interesting to see what the future of White-Hat SEO brings.
White-hat SEO will bring down black-hat SEO? Lol. People will always try to game Google's algorithm to reach TOP positions, and they do succeed too! There are many results where even Google has admitted they have a bug in the system!
Great article with detailed information... thank you for sharing.
Any risk of penalty with a sub 30 DA? Seems like small local sites would fly under the radar since they're competing for local longtails rather than big money keywords.
Again I have to wonder what would keep a guy from spamming for a competitor.
This is a very authoritative and significant study here, Russ. The visual illustrations are so persuasive and excellent. Predicting Penguin vulnerability is clearly the most vital aspect that should be seriously considered nowadays in order to carry on.
"So when the next Penguin update comes out and you hear people complain about how some spam site wasn't affected, you can be confident that this isn't a flaw in Penguin, rather a deliberate choice on Google's behalf to create separate algorithms to target different types of over-optimization."
Very good point. It means always do safe optimization, keeping the user in mind, not the rankings. You may be safe for a while, but you can't escape for long.
Google will do anything to give its user relevant search results :)
Yes @Russ, that's a very good idea. Keep monitoring your keywords as well as your competitors' keywords before and after an update, then analyze which keywords were hit by a Penguin or Panda rollout. Link source is a very good point you mentioned; I agree with it. People try to submit their websites to bunches of spammy directory and article websites. If you have done this, I suggest you find those links and remove them if you do not want to be hit by Penguin. This is an awesome line: "Penguin isn't spam detection, it's you detection"
How are people handling anchor text now? Obviously there is no more keyword stuffing...so I was just curious what most SEO professionals are doing. Thanks in advance for your response!
I just mix mostly generic and natural anchors in my link building campaigns; anchors which people are already using to link to other sites that they like. For example, if someone likes Sony's headphone deal, they will just put the link there; they won't customize the anchor to show that it's a link to Sony's headphones. And this sounds natural too. Other anchors can be "visit", "click here", "visit our site", "visit our website", etc.
Awesome post, Russ Jones. Thanks for your explanations.
I think this is the most impressive and relevant material I have read. Great work, I must say. However, after reading about these SEO and Penguin pitfalls... Google in the near future will promote the CPC and AdWords model; reliance on SEO will be abolished or become too difficult to build or maintain. That's my opinion.
It is unlikely Google will be able to get rid of all "organic". It will certainly become a tighter and tighter market with fewer and fewer non-paid avenues, but in the beginning at least it is more likely to squeeze out novices than professionals.
Great information, Russ! After the first and subsequent Penguin updates from Google, it's already hard for an SEO expert to maintain stable rankings in the Internet marketing field. PPC will play a big role in the near future.
Pretty cool. This tool may help a lot; I'll take a further look.
Hi, great post! I just want to ask: my web page has many site-wide backlinks from our clients' web pages, and my rank has also fallen in the last month. Is this because Penguin is penalizing me? If so, what can I do now?
Write to them to put your link on a "partners" subpage rather than site-wide like now. If there are still problems, ask them to add nofollow (I don't know if nofollow is enough).
I already tried nofollowing all the inner-page links and left the homepage link as it is, BUT Google Webmaster Tools counts all the links in its database! Removing the links from those pages is the only thing you should try.
So nofollow did not work then, @yogendra? I thought it did?
Or make your link a Flash element: potential clients click it and Google won't see/read it.
This post took me over half an hour to read and get my head around. And I have to say it was half an hour very well invested on the back of goodness only knows how many days you guys spent doing the stuff.
Thanks for this, it is a really helpful post. Personally I am glad the SEO community is moving away from ratios such as anchor text distribution because it means we don't use math to justify practice.
I really like the cross-over between CPC and anchor text; we need to add more AdWords data into our own processes now, so thanks.
Can you clarify for me (just because I didn't fully understand), why mT>mR is bad?
And my last observations on site-wide, as 50%+ of our business is web design, we have had site-wide footer links (traditional web designer credits) for ages. Granted we remoulded these to be mostly branded rather than phrase match anchors, but regardless, the actual existence of those has not seemed to be bad for us.
Thanks again, great work!
I think that won't boost you now (new algo etc.), but it could bring potential customers. I don't know if adding nofollow is good enough, so you can make a Flash link: clients click it and Google won't read it. Everyone's happy :)
That was a mistake :-) It should say mR over mT is bad, not vice versa!
Brilliant - it makes perfect sense now! Thanks!
Hi Martin, I am seeing site-wide anchor text for web design companies getting hit. There are a few good blogs on this. Change the links to nofollow.
So I got one more to translate and post on my blog:)
I think exact match should be 20-30% maximum. Anything over that is risky.
I do love spoof comments like this
Your point about site-wide links is interesting. There are a lot of site-wide links that are natural and editorially given, so the key could be whether they have commercial anchor text. An example of a tactic that creates a lot of site-wides is badges, like the "I Heart Seomoz" badge, the Alltop badge, or the AdAge Power 150 (discontinued). These may be great for branding and traffic referrals, but since they create a lot of site-wides they can be considered risky.
Really great, insightful post Russ. I have to say, I love how much detail you have gone into around the ranking factors concerning Google.
I totally agree that the only way forward is a full content marketing system. I personally am very excited about that, as I feel it's proper wholehearted marketing, and we SEOs can really get into the heart of a business while marketing it to its target audience.
I love your quote - "turning one dial at a time"! I think it really spells out what we were doing so wrong in the past and what we should be focusing on now.
Awesome post, Russ: predicting ranking factors in advance. You shared ideal information about the latest Google Penguin 2.1 algorithm and clearly covered the points I care most about: exact-match, commercial, and phrase-match anchor text, which are really useful to understand in the latest update. I do have a doubt: if my domain is an exact-match domain and all my pages are built around keywords, for example www.XYZ(EMDDomain).com/search-engine-optimization.html, could that be considered risky? Please advise me.
Wow, great post! You have done some really detailed research and found some pretty helpful things. The tool looks interesting; I'm gonna check it! :)
Any thoughts on anchor text on internal links?
We didn't study this and, unfortunately, it is too late to grab that data as part of the study. My general opinion, though, is that internal anchor text is mostly useless. If it makes sense in context (i.e., directing users to a particular page in the middle of content) then sure, go for it, but anything else is just silly.
Internal linking is very useful for directing users to a particular page in the middle of content. It has some role in increasing search rankings as well. You can go through these articles on Moz:
https://moz.com/learn/seo/internal-link
https://moz.com/learn/seo/anchor-text
As someone who has just created a new site, what does this mean? Another site of mine has been affected, and now the confusion about how to proceed is rife. Do we give up on link building entirely? As far as I can gather, I have seen lots of discussion and very little evidence that commercial SEO/marketing companies are still of value.
Link building is perfectly fine; you should be willing to do one of two things...
1. Pursue it totally organically through content marketing
2. Be willing to invest a ton of time, energy and money into getting the right links one at a time.
The middle gray area is where you get caught.
I have to agree with Russ here. Focusing on building a brand that attracts links is the best and most organic way to get links. Utilize industry specific resource opportunities for broken link building and other tactics that have the potential to drive traffic back to your website with referral visits. Building a link just to get a link puts you in that "gray area where you get caught"
I have found great success focusing on links that target real customers and drive real traffic. They are much harder to get, but they have the potential to lead to conversions, and they usually increase your ability to rank because they come from authoritative, relevant websites.
Link building is absolutely fine if it is done in such a way that readers benefit. Doing it purely for rankings will create trouble.
Just keep links relevant to content and context, you will have long term benefits.
Happy Link Building!
"if it is done in such a way that readers are benefited" How about explaining, it seems like nonsense to me.
Incredible data, thanks for sharing. Do you have any insight on outliers like large brands, for example? Not from a scientific study, but with my own clients I have seen large brands get away with anchor text that much smaller brands would be nailed for, partially due to the brand being linked to often organically through media and industry coverage, but also due to anchor text manipulation.
The "brand bias" we saw in our analysis could almost all be described as trust. By-and-large, big brands enjoyed higher "trust" measures than raw link measures. This kind of insulation isn't a direct bias on Google's part, but a reflection that other trusted sites link to them, even if indirectly.
There is a worthwhile article at SEW on the big brands issue: https://searchenginewatch.com/article/2300841/Big-Brands-Google-Penalties-You
It might be helpful, as it covers some points you were interested in.
Thank you Russ... GREAT post!!!
Thank you Russ. Will the program ever be available for public use that gives the "Initial Vulnerability" and "After Recommendations" scores shown in your article's images?
It is already available at PenguinAnalysis.com
Hi Russ, great post! Just wanted to ask if you didn't mix up the MozRank and MozTrust metrics in the trust section of the article?
I was thinking the same thing.
Did you Russ?
YES! I have that backwards. Let's see if we can fix that...
Fixed! Thanks for the catch. =)
Nice post!
Hey Russ, thanks for taking us inside your company's tool that helps deter Penguin penalties. This data is incredibly interesting and shows the true value of what you're offering. The SEO community as a whole is moving from silly spammers to data-driven inbound marketers who have a strong knowledge about optimizing websites for organic search.
It is worth pointing out that Penguin really isn't targeted at "silly spammers." It is targeted at the bulk of SEOs who do a fairly good job of staying below historical thresholds for over-optimization.
This point is really worth underlining. It's a common complaint from many removal clients that their competitors are using blatant spam tactics, but aren't being penalized. Illustrating that Penguin is more nuanced is a big step in understanding how the algorithm really works and what adjustments need to be made to optimization tactics for the future.
Great post Russ