Thin content! Duplicate content! Everyone knows that these are huge Panda factors. But are they really? In this article, I will explore the possibility that Panda is about so much more than thin and duplicate content. I don't have a list of ten steps to follow to cure your Panda problems, but I do hope that this article provokes some good discussion on how to improve our websites in the eyes of Google's Panda algorithm.
The duplicate content monster
Recently, Google employee John Mueller ran a webmaster help hangout that focused on duplicate content issues. It was one of the best hangouts I have seen in a while—full of excellent information. John commented that almost every website has some sort of duplicate content. Some duplicate content could be there because of a CMS that sets up multiple tag pages. Another example would be an eCommerce store that carries several sizes of a product and has a unique URL for each size.
He also said that when Google detects duplicate content, it generally does not do much harm; rather, Google determines which page it thinks is the best and displays that page.
But wait! Isn't duplicate content a Panda issue? This belief is widespread in the SEO world. In fact, the Moz Q&A has almost 1800 pages indexed that ask about duplicate content and Panda!
I asked John Mueller whether duplicate content issues could be Panda issues. I wondered if perhaps duplicate content reduced crawl efficiency and this, in turn, would be a signal of low quality in the eyes of the Panda algorithm. He responded saying that these were not related, but were in fact two separate issues:
The purpose of this post is not to instruct you on how to deal with duplicate content. Google has some good guidelines here. Cleaning up your duplicate content can, in many cases, improve your crawl efficiency, which in some cases can result in an improvement in rankings. But I think that, contrary to what many of us have believed, duplicate content is NOT a huge component of the Panda algorithm.
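As a quick aside (this post isn't a duplicate content tutorial, and Google's guidelines cover several remedies), here's a minimal sketch of one common fix: the size-variant URLs mentioned above can point search engines at a single preferred page with a canonical tag. The URLs here are placeholders, not a real implementation:

```html
<!-- Placed in the <head> of /widget?size=small, /widget?size=large, etc. -->
<!-- Consolidates indexing signals onto the one preferred URL -->
<link rel="canonical" href="https://www.example.com/widget" />
```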
Where duplicate content can get you in trouble is if you are purposely duplicating content in a spammy way in order to manipulate Google. For example, if a huge portion of your site consists of articles duplicated from other sources, or if you are purposely duplicating content with the intent of manipulating Google, then this can get you a manual penalty and can cause your site to be removed from the Google index:
These cases are not common, though. Google isn't talking about penalizing sites that have duplicate product pages or a boatload of WordPress tag pages. While it's always good to have as clean a site as possible, I'm going to make a bold statement here and say that this type of issue likely is not important when it comes to Panda.
What about thin content?
This is where things can get a little bit tricky. Recently, Google employee Gary Illyes caused a stir when he stated that Google doesn’t recommend removing thin content but rather, beefing up your site to make it “thick” and full of value.
Jen Slegg from The SEM Post had a great writeup covering this discussion. If you're interested in reading more, I also wrote a long post discussing why I believe that we should indeed remove thin content when trying to recover from a Panda hit, along with a case study showing a site that made a nice Panda recovery after removing thin content.
The current general consensus amongst SEOs who work with Panda-hit sites is that thin content should be improved upon wherever possible. But if a site has a large number of thin, unhelpful pages, it does make sense to remove those pages from Google's index.
The reason for this is that Panda is all about quality. In the example I mentioned above, where a site recovered from Panda after removing thin content, the site had hosted thousands of forum posts that contained unanswered questions. A user landing on one of these questions would not have found the page helpful and would likely have moved on to another site to answer their query.
I believe that thin content can indeed be a Panda factor if that content consistently disappoints searchers who land on that page. If you have enough pages like this on your site, then yes, by all means, clean it up.
Panda is about so much MORE than duplicate and thin content
While some sites can recover from Panda after clearing out pages and pages of thin content, for most Panda-hit sites, the issues are much deeper and more complex. If you have a mediocre site that contains thousands of thin pages, removing those thin pages will not make the site excellent.
I believe Panda is entirely about excellence.
At Pubcon in Vegas, Rand Fishkin gave an excellent keynote speech in which he talked about living in a two-algo world. Rand spoke about the “regular algorithm,” which, in years past, we've worked hard to figure out and conquer by optimizing our title tags, improving our page speed, and gaining good links. But then he also spoke of a machine learning algorithm.
When Rand said "We're talking about algorithms that build algorithms," something clicked in my head and I realized that this very well could be what's happening with Panda. Google has consistently said that Panda is about showing users the highest-quality sites. Rand suggested that machine learning algos may classify a site as a high-quality one if it's able to do some of the following things:
- Consistently garner a higher click-through rate than your competitors.
- Get users to engage more with your site than others in your space.
- Answer more questions than other sites.
- Earn more shares and clicks that result in loyal users.
- Be the site that ultimately fulfills the searcher's task.
There are no quick ways to fulfill these criteria. Your site ultimately has to be the best in order for Google to consider it the best.
I believe that Google is getting better and better at determining which sites are the most helpful ones to show users. If your site has been negatively affected by Panda, it may not be because you have technical on-site issues, but because your competitors’ sites are of higher overall quality than yours.
Is this why we're not seeing many Panda recoveries?
In mid- to late 2014, Google was still refreshing Panda monthly. Then, after October of 2014, we had nine months of Panda silence. We all rejoiced when we heard that Google was refreshing Panda again in July of 2015. Google told us it would take a while for this algo to roll out. At the time of writing this, Panda has supposedly been rolling out for three months. I've seen some sporadic reports of mild recoveries, but I would say that probably 98% of the sites that have made on-site quality changes in hopes of a Panda recovery have seen no movement at all.
While it’s possible that the slow rollout still hasn’t affected the majority of sites, I think that there's another frightening possibility.
It's possible that sites that saw a Panda-related ranking demotion will only be able to recover if they can drastically improve the site to the point where users GREATLY prefer this site over their competitors’ sites.
It is always good to do an on-site quality audit. I still recommend a thorough site audit for any website that has suffered a loss in traffic that coincides with a Panda rerun date. In many cases, fixing quality issues—such as page speed problems, canonical issues, and confusing URL structures—can result in ranking improvement. But I think that we also need to put a HUGE emphasis on making your site the best of its kind.
And that’s not easy.
I've reviewed a lot of eCommerce sites that have been hit by Panda over the years. I have seen few of these recover. Many of them have had site audits done by several of the industry’s recognized experts. In some cases, the sites haven't recovered because they have not implemented the recommended changes. However, there are quite a few sites that have made significant changes, yet still seem to be stuck under some type of ranking demotion.
In many cases like this, I've spent some time reviewing competitors' sites that are currently ranking well. What I'll do is try to complete a task, such as searching for and reaching the point of purchase of a particular product, on both the Panda-hit site and the competitors' sites. In most cases, I'll find that the competitors offer a vastly better search experience. They may have a number of things that the Panda-hit site doesn't, such as the following:
- A better search interface.
- Better browsing options (e.g. searching by color, size, etc.)
- Pictures that are much better and more descriptive than the standard stock product photos.
- Great, helpful reviews.
- Buying guides that help the searcher determine which product is best to buy.
- Video tutorials on using their products.
- More competitive pricing.
- A shopping cart that's easier to use.
The question that I ask myself is, “If I were buying this product, would I want to search for it and buy it on my clients’ site, or on one of these competitors’ sites?” The answer is almost always the latter.
And this is why Panda recovery is difficult. It's not easy for a site to simply improve its search interface, add legitimate reviews that are not just scraped from another source, or create guides and video tutorials for many of its products. Even if the site did add these features, this would only bring it to the level where it is perhaps just as good as its competitors. I believe that in order to recover from Panda, you need to show Google that users prefer your website, by far, over any other one.
This doesn’t just apply to eCommerce sites. I have reviewed a number of informational sites that have been hit by Panda. In some cases, clearing up thin content can result in Panda recoveries. But often, when an informational site is hit by Panda, it’s because the overall quality of the content is sub-par.
If you run a news site and you're pushing out fifty stories a day that contain the same information as everyone else in your space, it's going to be hard to convince Google's algorithms that they should be showing your site's pages first. You've got to find a way to make your site the one that everyone wants to visit. You want to be the site that, when people see you in the SERPs, even if you're not sitting at position #1, they say, "Oh…I want to get my news from THAT site…I know them and I trust them…and they always provide good information."
In the past, a mediocre site could be propelled to the top of the SERPs by tweaking things like keywords in title tags, improving internal linking, and building some links. But as Google's algorithms get better and better at determining quality, the only sites that are going to rank well are the ones that are really good at providing value. Sure, Google isn't quite there yet, but they keep improving.
So should I just give up?
No! I still believe that Panda recovery is possible. In fact, I would say that we're in an age of the Internet where we have much potential for improvement. If you've been hit by Panda, then this is your opportunity to dig in deep, work hard, and make your site an incredible site that Google would be proud to recommend.
The following posts are good ones to read for people who are trying to improve their sites in the eyes of Panda:
- How the Panda Algorithm Might Evaluate Your Site – A thorough post by Michael Martinez that looks at each of Amit Singhal's 23 Questions for Panda-hit sites in great detail.
- Leveraging Panda To Get Out Of Product Feed Jail – An excellent post on the Moz blog in which Michael Cottam gives some tips to help make your product pages stand out and be much more valuable than your competitors' pages.
- Google's Advice on Making a High-Quality Site – This is short, but contains many nuggets.
- Case Study – One Site's Recovery from an Ugly SEO Mess – Alan Bleiweiss gives thorough detail on how implementing advice from a strong technical audit resulted in a huge Panda recovery.
- Glenn Gabe's Panda 4.0 Analysis – This post contains a fantastic list of things to clean up and improve upon for Panda-hit sites.
If you have been hit by Panda, you absolutely must do the following:
- Start with a thorough on-site quality audit.
- Find and remove any large chunks of thin content.
- Deal with anything that annoys users, such as huge popups or navigation that doesn’t work.
But then we have to do more. In the first few years of Panda's existence, making significant changes in on-site quality could result in beautiful Panda recoveries. I speculate, though, that now, as Google gets better at determining which sites provide the most value, this may not be enough for many sites.
If you have been hit by Panda, it is unlikely that there is a quick fix. It is unlikely that you can tweak a few things or remove a chunk of content and see a dramatic recovery. Most likely, you will need to DRAMATICALLY improve the overall usefulness of the site to the point where it's obvious to everyone that your pages are the best choices for Google to present to searchers.
What do you think?
I am seriously hoping that I'm wrong in predicting that the only sites we'll see make significant Panda recoveries are ones that have dramatically overhauled all of their content. Who knows…perhaps one day soon we'll start seeing awesome recoveries as this agonizingly slow iteration of Panda rolls out. But if we don’t, then we all need to get working on making our sites far better than anyone else’s site!
Do you think that technical changes alone can result in Panda recoveries? Or is vastly improving upon all of your content necessary as well?
Hi Marie,
as I told you yesterday on Twitter in answer to your "Thin Content and Panda" question, citing duplicate and near-duplicate (or thin) content as the main issue in a Panda penalty is a classic excuse for being able to introduce changes that will benefit the SEO performance of a website in general (crawlability, indexability, and fighting keyword cannibalization).
But I would not exclude it as a Panda factor either, though maybe more as something highly correlated with Panda than an actual cause (the classic SEO dilemma).
Many times in the past, Googlers have told us that thin content is not necessarily a problem, because even if "thin," that content can provide a unique value to the site's users. Or, to take another example, many sites that consist substantially of very well-curated content from many different sources (think Techmeme) do not suffer from Panda because, albeit their content is not literally unique, they have and provide a unique value. A final example is international SEO targeting two or more countries that share the same language: Google has stated clearly that we don't have to cross-canonicalize but should use the hreflang markup instead, because it understands that even if the content of those websites may be nearly identical, those tiny (thin?) differences are what make them very different from one another.
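For reference, a minimal sketch of that hreflang markup for two countries sharing a language (the example.com domains are placeholders): each version of the page lists all alternates, including itself, in its <head>:

```html
<!-- Included on both the US and UK versions of the page -->
<link rel="alternate" hreflang="en-us" href="https://www.example.com/" />
<link rel="alternate" hreflang="en-gb" href="https://www.example.co.uk/" />
<!-- Optional fallback for searchers who match neither locale -->
<link rel="alternate" hreflang="x-default" href="https://www.example.com/" />
```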
The problem is that people tend to oversimplify everything when it comes to SEO, and equate "unique content" literally with how much text is on a page, when "unique content" should be read as "unique value content."
The problem is understanding what Google means by "value." I am glad you cited in one of your links the (in)famous Amit Singhal questions, because they are still a good basis for understanding this. At the same time, I am glad you cited Rand's presentation, because I too believe that user-generated signals are possibly used in the always-learning Panda algo.
Users and search user experience are what really matter now, also when it comes to Panda (less so with Penguin, which, IMHO, is closer to a classic antispam algorithm).
Therefore, what we should do when auditing a website is try to match the purely technical requirements for being perfectly indexable and visible on Google with what our audience may be interested in for real.
I think another issue is that Google seems inconsistent in how it measures "value." I have numerous times added detailed articles from various researchers to our site, and while some have soared, others have struggled to be found for very basic terms (I am not talking top 10 here; I am looking down to the 1000).
Thanks Gianluca,
I agree that sites that are hit with Panda have a great push to become GREAT sites rather than remain mediocre.
I think that part of the problem with "thin" content is that there is some confusion over what exactly is thin. Many people have said that all short content is thin content. Or, anything under xxx words is thin. But as you have pointed out, not all short content is thin. Short content can be quite useful. But, if a site has pages and pages of content (be it short or long) that no one would ever find useful or engage with, then this is thin in my books.
I like your definition Marie, thin content = low quality, thick content = high quality.
How, then, does Google measure the quality? Engagement metrics? If it is a short but high-quality post, engagement in terms of time on site might be low, but frequency of visits could be high, and bounce rate could be low as it leads people to explore the site further.
What, then, is the ultimate aim of any content? Getting people to interact with and explore your site.
Hi Karen,
I think that this is the point that Rand was trying to make with his excellent talk on living in a two-algo world. In the past we could focus on certain metrics and try to improve them, but now, with machine learning algos, it is possible that there are no specific metrics to focus on. I think that engagement metrics could be part of the mix, but "the mix" probably consists of thousands of factors and combinations and varies from niche to niche.
My advice is to truly find ways to make your content better and more "engage-worthy" than anyone else.
"Engage-worthy" = in a nutshell :)
"The problem is that people tend to oversimplify everything when it comes to SEO" - Exactly! Without a 20 minute conversation about context and specific pros and cons, ways things work and variations we've seen, it's impossible to say "X all the time" because it's rarely ANYTHING all the time.
Off-site duplicate content CAN hurt your website. It can outrank it. That's not a penalty, but it can outrank you. If it outranks you and gets the clicks, and you don't own that duplicate or benefit from it, that hurts you. But then people hear that and say, "He thinks the duplicate content penalty exists!" No, but it does do harm to you in the wrong circumstances. So people say, "Post yours first, with canonical, wait, then republish." Well, if you post that same post on Medium, chances are it will still outrank you. And you'll still lose clicks to your own site.
It's lose-lose many times these days with Google. I do like "unique content value." That's not a bad way to look at it!
Hey Marie
The problem with Penguin and Panda recoveries is 100% down to the mentality of the site owners and, unfortunately, many SEO consultants. You can't spend years building out near-duplicate landing pages and spam links and then just remove them and recover. It's a completely broken mindset.
Certainly, there is merit to technical optimisation, and we see sites with lots of URL-based duplicate content and crawl issues where technical fixes are an important element, yet this is done with the goal of improving the overall quality.
It is not about what you take away. It is about what you can add. It is about quality. It is about being the best. No link clean up, content clean up, crawl optimisation or other typical SEO audit action points will work if they exist in a quality vacuum.
Site owners have to realise that the days where these workaround strategies delivered results are all but gone. Search engines are smarter and search engine users now demand more from the sites they click and interact with.
Be the best. Offer something unique. Focus on quality in everything you do. Be the authority on a given topic. Have the most relevant and detailed content to map to your audience's informational needs. Build visibility and trust where it matters. Don't do stupid stuff to damage a search engine's or users' ability to trust you. Work hard. Commit. Invest. Keep at it.
We have to work to change people's mindsets and lead with a mantra of being the best, and sprinkle the SEO smarts on top of that.
Great post. :)
Marcus
"You can't spend years building out near duplicate landing pages and spam links and then just remove them and recover. "
Exactly! I believe that in the early days of Panda, many sites could recover by tweaking this content. But now, as Google gets better and better at figuring out which sites are truly the high quality sites, this is not working any more. And that's a good thing. :)
I love everything about your comment. I often ask people who have been hit by Panda, "Why should Google rank you higher than those that are currently ranking in the top positions?" Sometimes I think that site owners have a hard time seeing that their baby is ugly. In other words, they may say, "Well, we have been recognized as the authority in this space for years." But very few users remain loyal to brands these days. Rather, they want the best experience.
I believe that this is why so many mom and pop brick-and-mortar stores are going out of business. If "Joe's Hardware" was known as the authority in a town, but Home Depot, Walmart, and Lowes move into town and offer a bigger selection, lower prices, and a parking lot that's easier to access, then a lot of people are going to choose the big brands over the long-standing mom and pop store. When people complain that Google prefers brands, I would argue that it's people who prefer brands, and Google wants to show people what they prefer.
And this doesn't just go for brands. I'm not saying that a smaller site can't compete with the big sites. But, in order to compete they have to offer users something that makes them prefer to go to their site rather than what they already know and are comfortable with.
As a small business owner, it's really frustrating to hear someone say, "Just be the best." Oh, that's all? OK, sure. I'll just suddenly start outcompeting the giants in my industry who have multimillion-dollar budgets for marketing, SEO, etc.
I hear your frustration Mark. I think you've nailed the main point when it comes to the struggle that small business owners have. People often argue that Google has a brand bias, but really, it's PEOPLE who have a brand bias.
I remember about 20 years or so ago when Walmart came to my small town. A lot of small businesses died because Walmart took their customers. The small businesses can complain and say, "But we have this...or that...." but the reality is that people preferred Walmart for lots of different reasons.
I think that this is the same struggle that online merchants face. In the past, if I wanted to start a niche store selling stationery supplies, I could probably outrank Staples in my area provided I built enough links and had a decent product. But now, Google is not into ranking the site with the most links or the most optimized site. Instead, they want to show users what users want to see. I am guessing that the average user is more likely to buy stationery supplies from Staples than from my niche site.
So, what is a small business owner to do? I don't want to say that you can't compete at all, but it's important to know that the only small businesses that are going to be able to compete well with big brands are ones that can really really make it clear that their site is the best option for Google to show, even when the majority of people have a bias towards large brands.
The duplicate content thing drives me scatty. I can't count the number of times I've had to tell people to stop panicking about it and just get on with writing good content rather than setting up intricate robots.txt rules to hide stuff.
I know. I've seen people freak out because their site's articles were scraped by someone else and now they are worried about Panda. This is not how Panda works. Now, if your site contains huge chunks of content that are copied from other sources, it may be best to not have this in the index. Also, if you have on-site duplication that is eating up your crawl budget, it's a good idea to fix this. But these issues are not likely what caused the site to have Panda problems. There has to be a deeper quality concern.
Think of a young basketball player.
They want to be the best so they look at what the best is doing. They copy all those moves and duplicate them to the best of their abilities. Then they act all surprised when coach puts that best player in and not them - we wanted the best, not a copy of it.
They want to be the best but their abilities are thin. They can't pass, shoot, run, and they know nothing of strategy. Those are all very thin abilities for that player.
That's frustrating, for the player knows that if they want to be the best they have to put years of work in. Of course, there are shortcuts like steroids and you can always cheat, maybe even try to injure that best player so you can take his spot.
The coach doesn't like this, however, and by trying to take those shortcuts, by not trying to put in the years of work, that player will never be the best and will never look good in coach's eyes.
The fact that the young player tried to game the system at all is an irrevocable mark against them.
Welcome to our content-driven world, where SEO no longer has a place.
Great analogy! I would argue though that Panda is not an irrevocable mark. That young basketball player could potentially work extremely hard and put in the time and effort to be great and could become one of the best players out there. But, as you mentioned, it's no longer going to happen via tricks and cheats.
I think when "duplicate content" first emerged as a term, everyone latched onto it and did extensive analysis trying to understand what it meant and how to fix it. I think it's much better to have 60 quality pages alongside 40 thin/neglected pages than to have just 20 quality pages because you removed everything else.
Duplicate content has been over-applied to everything, including syndicated and republished posts as well as tag pages on a site that really aren't hurting anything. As mentioned in the post, the focus should mainly be on creating new high-quality stuff, rather than combing through the technical elements currently on the site.
I was recently involved in a study of several hundred websites in the auto leasing industry.
The sites were all built using the same CMS platform that created landing pages with deals for each make and model of auto manufacturer. E.g. “BMW leasing deals”, “Audi A6 Leasing deals” etc.
The CMS platform didn’t allow the site owners to edit any of the content of these landing pages. Each page contained identical content other than the name of the dealer and the badge and model name of the vehicle.
I compared the ranking of these pages with the equivalent pages of other auto leasing websites that were not using this CMS platform. Each page in our control group had plenty of unique content and shared a similar PA to the sites we were evaluating.
Out of tens of thousands of pages using the CMS only one single page ranked on page one of Google UK SERPS. The sites in the control group were ranking for several thousands of keywords on page one.
I controlled for as many other variables as I could that I thought might affect the results, but my conclusion was clear: the lack of unique content was the most likely contributing factor in the low ranking of these websites.
Interesting study Danny. This makes perfect sense to me. I think that in the past I might say that having duplicate content like this is a signal of low quality that could cause Panda to affect the entire site. But, I'm changing my view on this now. I'm having a hard time putting my thoughts into words though so hopefully this makes sense.
I don't think Panda says, "Aha...this site has duplicate content, so we won't rank anything on the site well." Rather, I think that Panda is recognizing that there is not much of value on the site that would be good to show users, and THAT is what is causing the site to not rank well. In a case like this, changing up the duplicate content so that it is unique would likely not improve the quality of the site. What would have to happen is a major overhaul of each site so that they each could provide good, unique value that would deserve to rank higher than their competitors.
I would imagine that this is a hard thing to do when you have a CMS that doesn't allow for changes. :)
Great article. By reworking all the thinner content on a website that once held top positions, over the course of a year or so I was able to get the site back onto page one for all the terms it used to be at position one for. But you're right: I've never quite been able to get them back on top, even while the site in question now ranks for terms we've targeted that it never previously ranked for. I think we've got to consider the weighting Google places on websites for their historical rankings, or recent historical rankings at least. Google wants to provide useful information, but they also don't want their results to be all over the place, so there's got to be some kind of brake on the system which favours websites that currently rank well over those that don't. If, on top of that, user signals for the sites that rank less well are potentially less favourable, it's just harder to get even great content to re-rank once it has been dropped. My gut feeling is that it's not just about reworking the content to get the recovery; it needs a fresh approach, maybe a reorientation of the targeting to take on other terms and come at things from a different angle.
Excellent points Simon. I wonder what Google does in order to reevaluate a Panda hit site in terms of engagement. If a site is ranking poorly because of Panda, how can they determine whether users like it? Who knows, perhaps they have something built in to Panda where if they determine that content is changing, that they occasionally pop the changing pages up in the SERPS to see how users engage with it. Maybe that happens when Panda refreshes.
One great question is how they gather this data... So let's start: imagine that we have a large computer cloud that can render pages, loading all the CSS and JS and executing them to build the proper DOM and create real screenshots. This is no miracle; even today there are a few solutions on the market, like PhantomJS. But this doesn't address the general problem: where users click, and how they will interact with the page.
So, just for a second, imagine that you also have a testing laboratory with monkeys. You show them pages and watch how they interact with them: scrolling, time per visit, bouncing, clicking and points of click, sharing, goal completions, and many other small but very important factors used for ranking. That way you can very quickly identify great, good, moderate, bad, and very bad sites. You can also teach your AI with already-stored user signals. The AI can also be trained on links... but that is another story.
And now let's get back to the real world. The webspam team kept growing, because the work there keeps growing: more sites need to be tested, more languages, more specific pages, etc. All of this is definitely more and more work for them. So the team would need to keep growing until its staff made up a tenth of the whole company, and even more. BUT this isn't happening...
HOW?
Remember the testing lab with monkeys? Yes, it exists in the real world, and we've seen it in many variations. It can take different forms: a SERP database, a website analytics database, a browser database, a robot crawler database, a links database, a content database, a social network database, and other types of databases. Everything here is counted and used as input data for AI processing. And here the magic happens: you can be shown in the SERPs with zero links, thin content, a broken website, a slow website, etc., if you are great on the other ranking signals.
Once the AI processes all the input signals, it can definitively rank who should be on the 1st page, 2nd page, 3rd page, and the rest. And here some people are still trying to beat the system with more links, wider anchor texts, more content, etc. In reality nothing happens, because faking a few input signals is like winning a few small battles; you can't win a war against all the ranking signals.
PS: I'm a victim too...
Great points Peter,
I know that in the early days of introducing new algorithms like Panda, Google would train human quality raters and get them to rate pages and then they'd build an algorithm that duplicated what the humans were seeing. Who knows....they probably still do this. But, I think that machine learning could possibly be taking this over.
I'd love to know how they gather the data too. Are they really looking at a combo of CTR, dwell time, and possibly engagement? Or is it immensely complicated? Perhaps the algo says, "OK, we'll try introducing this factor into our decisions" and then another algo determines that the results weren't as good and then the original algo says, "OK, that didn't work so well so let's also add this factor". That's a scary concept....that machines could be determining what I get to see all day long as I Google. We've already seen people write about how Google can sway an election. Who knows what else an algorithm is conditioning me to believe?
Thank you for your contribution, Peter. It is very helpful.
Terrific analysis Marie. Well done! Far too many people have been focusing on short term tactics. I've always thought the right approach was to do ten things to the site to make it a richer, more useful and engaging experience for users, knowing that Panda might only be able to detect 5 of them and give you credit.
Exactly. Thanks for the kind words Michael. :)
One methodology toward Panda recovery involves focusing on making significant improvements to your remaining trafficked pages as a top priority. Remember, Google has to be able to collect enough data to be statistically significant in determining the quality/value of your site as it relates to others. Simply deleting your low-quality content may be a bright idea, but as that content was unlikely to be garnering significant traffic, there won't be enough subsequent data to determine that your site as a whole has made improvements.
If you focus on improving the site overall, with special attention to the pages currently attracting the most traffic, you will likely be able to send a faster signal, one that can be trusted, that you have cleaned up, improved, etc.
I like this idea. Trim the thin content and then make the remaining, already trafficked content as incredible and helpful as possible.
SHendison (Scott) helps me with my site. A while ago, together we decided to remove many thin content pages that got almost no traffic in a year and beef up the better ones.
I use Hitslink in addition to Analytics, which allows me to track individual pages over years. I can definitely see an improvement in traffic in those pages that I worked on.
Great article. Just shows that there are millions of ways to analyze Google's actions. Just what they want I would think.
Thanks for the post Marie. Task completion is, in my opinion, the most important signal in on-site work, and it also affects Panda.
That makes sense to me Roy. I can't prove it, but I would think that if more users complete a task on your site than on your competitors that Google would want to show your site first!
This is some very good analysis on Panda. I agree with the whole duplicate/thin content discussion. I have been hanging around the eCommerce world a lot lately...and seeing what I once would have called a 'thin content' website ranking just fine in Google. Several examples of this. I think it is important (as you mention) to look beyond simple explanations of 'thin' and 'duplicate' content and get down to the nitty-gritty details of what might really be causing the Panda issues.
Thanks again Marie :)
Quality original content is the #1 constant; we all know that. My concerns are: 1) tags from our blog creating duplicate content, and 2) our shopping cart generating duplicate content. I have spoken to our cart provider and they are researching the question. WordPress is another monster in itself; there is so much good and bad with WP.
Hi Joseph. I'd highly recommend watching the John Mueller hangout referenced in this post where he talks about duplicate content issues. Google knows that some CMSs create lots of duplicate content via tag pages and other pages like this. It sounds like John was saying that this really shouldn't be an issue for most sites. That said, it's always a good idea to tidy up these things where possible. But I think that it's unlikely that this type of issue alone will cause Panda problems.
Hello Marie,
These kinds of points need to be raised, because it always creates confusion when everyone says something different; we marketers get confused about what to believe and what to do. The most important and simple thing I want to share: just forget about the online business, website, marketing, content, keywords, links, etc. Think about what happens whenever you go to the market and buy something: what actions you take, how you react, and how other customers react. Then work out the average behavior of customers, and also keep in mind what they expect most from a seller. Now notice how different sellers sell their goods. This could be the best example you will ever get, because it's real, original marketing; there are no Google algorithms OFFLINE.
This is the only thing we should learn for ourselves, because we make things too technical when they could be solved simply. Let me come to the point: write content which you think could be helpful to readers and which can also help you. Ultimately you will be in a better position in Google's eyes. Just stop thinking about Google penalties and provide original, quality material only, because Google is also there to do business, not just to help people.
I don't know if I am totally right or not, but this is how I see it. Anyway, I appreciate your thoughts, Marie; you think outside the box. Thanks, and keep sharing such interesting topics :)
Thanks Shubham,
I think that part of the confusion with Panda is that in the past, a site with mediocre content could rank quite well if they figured out the right way to present the content. But now, it's getting harder and harder to convince Google that you are good unless you truly are good.
Yes, I have checked many times in the search engine results. Lots of blogs and search results are available online, but nobody knows which ones are showing the right and correct information. I really like this post, and thanks to Marie, who raised awareness on this topic.
Thanks for this great post.
I've been frantically reading as much as I can about Panda and peoples' experiences with it.
My site is a forum. It's all UGC and has been around for 14 years, so it's well respected and has always done well in the SERPs with minimal work.
The main thing for me when Panda 'hit' (unless it is coincidental) is a dumping of over 400k URLs from Google's index. I went from 1.3M to 900K in a matter of days. Ouch. It's been sitting at between 910K and 930K ever since (mid-to-late August). I can see this horrible graph in Webmaster Tools.
As for 'thin' content, since all the content is UGC, it's hard to rewrite or maintain it at all. The site has always done well because the content is always changing. The area that has the 'talking crap' posts is for members only, so it's not available to the crawler. This was a conscious decision to make sure we're recognised as a quality site.
Anyway, my traffic has dropped so much over the past few months and it's starting to do damage. We're only just keeping our heads above water with ad impressions atm.
Thanks again for the post, not sure how I can make any changes though. :(
Hi Christian. It sounds like you are doing the right things. I've had some forums make Panda recoveries by trimming out the obvious thin content such as user profiles that were indexed or posts that have no replies.
Forums can be hard because UGC can be either awesome or awful. If I were running a forum today, the only way I would do it is if every single post were moderated and if I had the ability to manually select certain posts to be noindexed. I know that this is not an option in a large forum.
It might be worthwhile to spend some time digging in to analytics to see if you can see what users are really engaging with and what they are not. For example, although bounce rates and dwell time don't always tell the whole picture, if there is a certain type of post that consistently has a lower bounce and a higher time on page, then encourage your moderators to start more discussions like this. Conversely, if all of the posts in your "Introduce yourself" category (just an example) are rarely read, then it may be a good idea to noindex that entire category.
Thanks so much for the reply!
Most of the dwelling occurs in the For Sale sections, which sucks because, for the most part, that's terribly thin content. Occasionally there are big For Sale threads with tonnes of pictures and heaps of detail, but not often.
I'll need to do the robots test in webmaster tools again as I think a few url patterns have changed since last time.
Interestingly, I noticed yesterday that Bing is showing similar patterns with my content. It might not be Panda after all. Perhaps I have upset them somehow. :(
Note: a quick search of Google showed 55k results for user account pages on my site. I have updated robots.txt to kill those.
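For anyone curious, the rule was something like this (a sketch; the /member/ path is hypothetical and needs to match your forum's actual URL pattern):

```
User-agent: *
# Stop crawlers from fetching user account pages
Disallow: /member/
```

One caveat: a robots.txt disallow only stops crawling; it doesn't by itself remove URLs that are already indexed. Letting the pages be crawled and adding a meta noindex tag instead, as Marie suggested above, may clear them out more reliably.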
Thanks again!
Just a thought, Christian, but do you have any sort of conversions on your site? If so (and I hope that you do, whether it's "Create an Account" or subscribe to a newsletter) then I'd take a look at how conversions have been affected with the drop in traffic.
I work at an SEO agency, and something we've been seeing, especially with bigger sites, is that they're getting less traffic because the search engines don't want a single site ranking for thousands of terms. However, the traffic that those sites are still getting is very targeted traffic. So, for example, we've got a big eCommerce client who's lost lots of organic traffic, but conversions are actually UP; Google has figured out which terms result in conversions for them (and remember, a user that converts means Google returned a good result, so Google wants some sort of conversion on your site, too), so from a raw ranking perspective it looks like the site is getting hammered. But their revenue is actually up.
Now, I know that this is small consolation if your revenue is coming from ad impressions, but it may be the case that this is the new environment; and setting up conversions can be a helpful way of measuring user engagement, which, again, Google is interested in as well. If that's the case, and you're getting less traffic but it's more targeted traffic, then it may be time to incorporate some sort of lead-gen component to your ads, rather than raw impressions.
I really hope this helps.
Thanks John,
We have no conversions. :( I'll definitely set some up though. I've wanted to, but for some reason I didn't see a registration as a conversion; I just assumed that was for eCommerce sites.
I guess until I've set one up, Google won't actually see a registration as a good thing, will it?
Thanks again! Much appreciated!
Christian
Christian-
Google should be able to understand conversions on your site with or without any tracking; adding conversion tracking to your site is to help you understand the behavior of users. So, let's say that before all of this happened with your site, you were getting 500 visitors a month, but 200 of them just glanced at a page and bounced; they didn't poke around the site, they didn't sign up for an account, etc. No engagement. The other 300 visitors clicked through the site, and 50 of those users signed up for an account. At that point, you've got a 40% bounce rate, and only 10% of users signed up.
Now, let's say that after the change to the algorithm, you're only getting 300 visits a month. But if your average time on site is higher, and you're still getting those 50 sign-ups, then suddenly your conversion rate (sign-ups) jumps to 17%, and your bounce rate is much lower. So what Google is doing is discovering that certain sites get better traffic for a smaller number of keywords. So even though your total number of visits is down, how are your engagement metrics? Has bounce rate improved? How about time on site, pages per session, etc.?
It's not enough to simply look at raw number of visits, but you want to set your analytics up (and begin to think about your website) so that you can determine: am I getting a lot of window shoppers, or am I getting customers? Even if you're not selling something, you want visitors to your site to do something, and that something is often measurable.
Here are a couple of helpful articles to get you started on thinking about micro-conversions:
https://support.google.com/analytics/answer/266521...
https://searchenginewatch.com/sew/how-to/2202915/top-5-microconversions-you-should-measure
@Marie Thanks for sharing such a great post. I 100% agree that Panda is not only about thin and duplicate content. And I strongly believe that updating thin content alone is not the solution for recovery. I may be wrong, but I would personally advise adding fresh, brief, good-quality content. Of course, I am not ruling out other key factors at all.
"No one saw the panda uprising coming. One day, they were frolicking in our zoos. The next, they were frolicking in our entrails. They came for the identical twins first, then the gingers, and then the rest of us. I finally trapped one and asked him the question burning in all of our souls: 'Why?!' He just smiled and said, 'You humans all look alike to me.'" - Sgt. Jericho "Bamboo" Jackson
Your article gives a lot of information on Google Panda, like how it works and what we should do to keep our site's content free from any duplicate content. But I wonder whether Panda recognizes keywords which are intentionally stuffed into an article, or whether the Panda algo penalizes keyword stuffing in articles at all. Your reply will be much appreciated.
I don't know whether Panda looks at keyword stuffing, but we do know that there is an algorithm that is specifically just for keyword stuffing. It runs every time your site is crawled, not on a sporadic basis like Panda. If a site is severely keyword stuffed, this can be a part of Penguin as well, as it can contribute to on-page webspam.
I do think that if your site is keyword stuffed to the point where users are annoyed then this could possibly affect you in the eyes of Panda as well, but I don't have proof for that.
Thanks Marie Haynes for the reply. I analysed my websites and found a few things to fix according to your reply. Can you tell me more about the algorithm which specifically targets keyword stuffing? I am curious to know more, in the way you provide information. Thanks in advance.
As far as I know, there really is not a lot of specific information known about the keyword stuffing algorithm. John Mueller mentioned that it runs in real time. But I don't think anyone knows exactly how it works. I've seen plenty of keyword stuffed sites rank well, so it's certainly not a percentage thing. Sorry...I don't have more info than that.
Thanks, and don't be sorry, as the information you provided satisfies my curiosity. I will look forward to more posts from you.
Hi Marie,
Insightful and very relevant post, considering I am in content creation. Even the comment section had loads to learn from.
Thanks! :)
Hey Marie, thanks a lot for making people aware of Panda.
Because when they hear the word PANDA, they always start to think about DUPLICATE & THIN CONTENT ONLY, when it is actually about more than that.
Well, Google is becoming extremely smart, and Panda is one of the reasons. But there's a point of confusion in it too. Suppose I'm writing the content for my services, and it has already been written by someone else and is very similar to my content. Will Google's crawler treat my content as duplicate content? Yeah, I know there are lots of tools available to check for duplicate content, but a few things will still remain the same, as we are both providing the same services. This kind of issue arises even more for eCommerce websites.
Anyways, it was a really awesome article to read about something new.
Cheers :)
Interesting article! Thank you. Quick question: I have a website with about 10 pages that are just menus. Is this the sort of page you refer to in your article as being 'thin'? And, as it wouldn't make sense to add content to them, would I be best advised to make them nofollow?
I think that in many cases pages that are just menu pages could be thin content. I would be surprised if 10 pages got you into Panda trouble though. I've seen cases where a site's system results in thousands of pages like this. For example, one page is just a list of every car company and then this drills down to a list of every possible model of car and this drills down further and further with the end results being thousands of pages in Google's index that no one ever would click on from search.
What I'd probably do if I had your site would be to look at Google Analytics data and go to Behavior-->Site Content-->Landing Pages and then use the pre-set Google filter to look at only organic traffic. Are users landing on these pages from organic search? If yes, then you may not need to make any changes. If no, then would it make sense to beef these up with good content to the point where Google would be proud to display them to searchers? Or, if they are just there for navigation and not meant to attract users from search, then the best option may be add a meta-noindex tag to these pages (not nofollow) so that they are still a part of your site, but not included in search.
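To make that concrete, here's a minimal sketch of the meta-noindex tag, placed in the <head> of each page you want kept out of search results (the "follow" part is the default behavior anyway, but it makes explicit that crawlers may still follow the links on the page):

```html
<head>
  <!-- Keep this page out of search results, but let its links pass value -->
  <meta name="robots" content="noindex, follow" />
</head>
```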
Thank you :-)
Great article Marie. Panda is strongly associated with thin content issues, but yes, there is more to it than just the thin content perception. In short, make your site good for the reader and publish high-quality content, and you won't have to worry about the Panda beast anymore.
My expertise is the Hummingbird filter. I'm not sure you mentioned it here, but this is a lot of how they judge, AND it's the machine learning in action you speak of. Google can grade the topical thoroughness of your whole website. If you write in topic silos = fail. If you have "topic creep" = fail.
Thanks for such a resourceful article, Marie. One thing that attracted my attention the most is the following: if you say that "Google is getting better and better at determining which sites are the most helpful ones to show users," and that this is the actual reason why penalised websites haven't recovered from Panda, does that actually mean that there is no help at all? Or simply that one needs to work harder every day on improving the website in every aspect and producing more great content? Wouldn't that be spammy?
VS
I'm not sure if I completely follow your point VS. If someone is working hard every day on improving their content, I'd say it would be unlikely that this content would be spam. Now, if someone is working hard every day on figuring out how to trick Google into thinking that their content is the best, then that's another story.
I'm sure Google can't be persuaded that bad content is actually good. :) I wasn't sure how spamming worked in terms of posting articles (I thought that there was a limit or something, which was dumb of me, sorry), but now I see how silly I was.
Thanks for your reply, Marie!
Hope to read some more of your articles soon.
Best,
VS
This is a great post about the hype around Panda. People generally over-analyze these updates and develop a "sky is falling" attitude towards them. It's great to see you demystify it a bit. It's aimed at sites that duplicate content across other sites, i.e. content farms. An old practice was to write an article and syndicate it across many of these sites for backlinks, so Google put a stop to that by devaluing these farms. If your site is a hub of repeated content, then you're in danger, but if you have a paragraph of similar content on pages of your own site, probably less so. Plus, if we continue to pad out "thin" content for the sake of adding content rather than adding value, some other algorithm change (likely named after another cute animal) will come around that says, "Hey, is it really necessary to add 15 paragraphs to your product category page?" and then devalue your site for "inflated content."
Also, when site owners or SEOs are worried about the Panda update, they should be more concerned about the sites they originally gained authority from being penalized, thereby decreasing their overall link value. Too often I have clients or cohorts worry about Panda harming their site directly, when it's more of an indirect result of having a poor backlink profile that obtains authority from content-farm-like sites which were impacted by the update.
It's all a rich tapestry.
Nice article Marie,
You pointed out some really good topics here. I am currently working on a job portal, and I have seen many websites with links from really bad/spammy websites on Google's 1st page.
I am new to the SEO world, so please correct me if I am wrong.
Another thing is that those sites have a huge amount of duplicate content, as the job descriptions on all those websites are duplicates: all the employers/recruiters copy and paste the same job onto every job site where they can post jobs.
Many job websites are using the Indeed API, Careerjet API, and other free APIs for showing jobs on their portals, which increases the amount of duplicate data. Yet even with bad/spammy links plus a huge amount of duplicate content, those job sites are on the 1st page. Thousands of job websites are using Indeed.com's API; why does this not affect Indeed.com itself? Indeed is in the 1st spot for nearly all job-related keywords. It's a really good website, but it does have duplicate content...in fact, a big amount of duplicate content.
Does Google overlook the duplicate content problem in this niche?
I have similar experience with recipe websites. Does Google neglect the duplicate content problem for recipe websites too?
You bring up a very important point Vishal,
The one place where duplicate content can be a Panda factor is if your site consists almost entirely of content that is taken (or scraped) from other sources. I have seen a lot of job sites hit by Panda.
However, if you use duplicate content in this way and you are able to add some type of significant value to the content then this can do well. For example, let's say that three different sites are copying a particular job description via Indeed's API. Site one simply lists the jobs as they appear on Indeed and doesn't add much else. This site will likely be hit by Panda. Site two takes the Indeed data and includes the job listings inside of a large page that lists the pros and cons of working for different employers. This could be good content. Site three takes the Indeed data and uses it in a niche specific job site that is just for a particular profession. They also have all sorts of useful articles for people in that profession who are trying to find a job. This is good too and deserves to rank alongside site two.
I think though that you are asking why there are sites ranking well when all they have is scraped content. This is hard to answer without having specific examples. I feel that in most cases Google is getting fairly good at weeding these sites out. However, I recently was looking up a recipe and came across a site that existed primarily because they scraped other people's content. They had a Pinterest-like collection of recipes from other sources. My first thought was, "Hey...why does this exist in the age of Panda?" However, if they are able to organize this content in a way that makes users keep coming back to their site, and users really like it, then I could see why Google would continue to rank it well.
Thanks for your informative reply!
It's true that Panda is all about quality. The best approach is to make sure your site doesn't get hit with a penalty in the first place. However, if it does happen, it's important to raise the quality of the site by beefing up most of the content and deleting any pages of such low quality that they can't be salvaged. The overall goal is to make the site "excellent," and sometimes changing the content around isn't enough. The site should be analyzed thoroughly to determine what else needs to be changed.
Google has even said themselves that the original algorithm has never been removed; it has merely had many add-ons, so perhaps that's why Panda doesn't seem to do loads. But check this out.
Thanks for a great article
Rand said a similar thing back at MozCon earlier this year, and that's when I started thinking the same thing: if you really want to rank and outperform your competitors in 2016+, you need to be producing the best possible user experience with the best content to meet users' needs.
Forget just chucking a few lines of text up and hoping to rank like in 2009; now you need to really answer the users' questions. A reason why Wikipedia still ranks is that it answers users' questions.
While we won't admit it, at uni it was always the first site we visited for an answer. We never quoted it, but it was always the starting point.
Haha...I use Wikipedia a lot too. :) But you're right...answer users' questions and you should do well. The problem is that many people have a mindset where they say, "OK, how can I put more content up on a page that makes Google think that I'm answering questions," when really the mindset has to be, "How can I legitimately be the most helpful website that exists?" For many sites, doing this is no easy task, and this is why Panda recovery is so difficult.
Great discussion starter!
I've noticed there's been a lot of panic and confusion lately in the community about what to do with thin or duplicate content. I'm personally finding that I'm spending a lot of time these days merging pages that are ranking for similar search terms, and grouping based more on search intent than search keywords. I think we could all do with a thorough site audit, like you said. I find it helps to get that bigger picture on how all these little details and aspects of the website work together. Kind of like piecing together a large wall map, then taking a step back.
Excellent points Ria. Merging pages can be an excellent thing to do for Panda hit sites. If you have 50 pages on how to buy green widgets, how to use green widgets, how to compare green widgets, etc. etc., creating one massive page that tells the user everything they need to know about green widgets may be a much better idea. But...I would urge people to not just create long form content for the sake of having more words on a page. The page has to truly add value. If users look at the page and see it as a wall of text, then that's not going to help. (Not saying you are doing this...it's just that I've seen this before.)
What I like to do is look at competitors and ask myself, "What questions are they NOT answering?" so that I can provide the best answers. A great way to do this is to search Yahoo Answers for your subject. If lots of people are asking how to disassemble a green widget, then it probably means that the current articles on the internet don't cover this well. So, if you can make this a part of your content, it will add value that is above and beyond your competitors.
Duplicate content?
Maybe my website has this problem. I will try deleting old content and adding thicker content to my website. I hope my keywords will reach top rankings.
I think that with the rel=canonical tag that Google developed, duplicate content is not a problem if you use this tool correctly. In WordPress you have some plugins to do this easily.
Thank you very much for the post. Now we know how Google Panda works and what to do with our content.
Hi Marie,
Great effort! I like this post and its excellent research. Thanks.
Thanks for your useful post.
Useful information! You can learn many things about Google Panda here. Helpful!
Good analysis Marie. Well done!
Thanks @Marie
I saw that a bunch of sites with bad links pointing at them jumped back into the rankings. It is almost as if Google is now discounting the bad links, as we have asked them to all along. In addition, sites with authoritative content are doing better than ever before.
Regards,
Hi Ali,
Bad links pointing to a site should not be a Panda problem, but they could be a Penguin problem. I haven't seen any obvious Penguin movement, but Google does run tests from time to time before running a full Penguin refresh, so perhaps this is what you are seeing?
Thanks Marie for the insightful post and for busting various myths about the Panda update, like thin and duplicate content, high bounce rates, and keyword-stuffed content. Is it advisable to update the overall content when a site faces Panda, to eliminate every possibility?
In some cases this may be the best answer. If all of your content is mediocre content when compared to the top players in your space, then starting over might be the best option.
Thank you very much for posting this type of awesome blog. We are developers from https://www.urbangekodesign.com/; we always build websites with full efficiency that are free from duplication and SEO-friendly.