A weird thing has happened as a result of Panda, something you might have expected Google's Search Quality testers to catch before rolling the update out. Due to the domain-wide nature of the signal, high-quality, original content produced by the websites that were negatively impacted is now being ranked below the exact same content republished by the partners to whom they syndicate. Even more egregious, these sites are also being outranked by scrapers who effectively steal and republish the same content without permission or credit.
I have seen this briefly mentioned by observers, but I haven't seen the phenomenon transparently documented either in the SEO press or in Google's Panda forum. The purpose of this post is to transparently share data from the site WonderHowTo.com (of which I am the CTO) and to find others experiencing a similar phenomenon.
Pre-Panda
For three years, we at WonderHowTo organized the sprawling world of HowTo with taxonomical zeal and very human curation. By January, we had grown to more than 10mm monthly uniques. As our community formed, we began to shift our efforts towards the concept of covering timely news in the HowTo space (there is astounding innovation each day among the 427 subcategories we follow).
Our journalistic cred grew, and at the beginning of the year, two fantastic syndication partners, Business Insider and Huffington Post, recognized our quality and eagerly published our articles in their sections (primarily Technology). On occasion, we noticed that our articles were outranked by our partners, but over the course of a few days, Google always got it right, recognizing the source as WonderHowTo. For the record, pre-Panda, we cannot recall one instance when a scraper outranked us with our own content in Google. Never. There seemed to be order in the universe.
Post-Panda
Our Google traffic fell by 40%. Among our 1 million indexed pages, we experienced plenty of displaced rankings. Before getting into the what, how, & why, one thing has stood out as alarmingly egregious: original content created by us is no longer able to rise above our partners, or even the scrapers who republish our content. Ever. Panda branded us the Rosa Parks of content, forcing us to the back of Google's ranking bus, along with all the other sites that fit its profiling.
Crediting the Original Source - Google vs Bing
I took a look at the articles we're promoting on our home page and syndicating to Business Insider and Huffington Post. As I mentioned earlier, our articles also tend to get scraped and republished on dozens of sites within minutes of being published. Post-Panda, it turns out Bing is doing a better (though still imperfect) job of ranking the original source (WonderHowTo) above the scrapers & syndication partners. Here are examples from a few recent posts (for simplicity, I searched for each article's exact title):
"How To Remove Your Name and Profile Picture from Facebook's Social Ads"
Original Source is #9 on Google
"Transform Your Android Home Screen into a 3D Environment with the SPB Shell 3D Launcher App"
Original Source is #7 on Google
"How to Add a Dislike Button to Your Facebook Page"
Original Source is #14 on Google
The larger implication is that if Google cannot rank the source first when searching for the exact title, then the source will also lose out on traffic from any additional keyword variations that the very same content ends up receiving on scraper and partner sites.
Deconstructing The Panda Damage
Because our process has always revolved around human curation, with the goal of weeding out anything low quality, it seemed odd that the hit would be so large. We did a deep analysis on a variety of signals (article word count, title word count, number of links, embedded media, number of comments, number of favorites, bounce rate, etc.) to try to determine which individual pieces of content were getting hit the most.
We separated the content that gained the most traffic to compare against the content that had lost the most traffic, comparing signals & looking for trends. The results seemed random. Very short video descriptions would rank quite well, while long, detailed original transcriptions and guides were suffering. Every time we thought we'd found an influencing signal, we'd go on to find enough exceptions to negate it.
It became abundantly clear that Panda does not work by filtering out individual low quality content as was originally implied. It works by punishing entire domain names if an undetermined percentage of the content on that site meets the undefined "low-quality" criteria. Soon after we came to this realization, Google confirmed it in a statement to Search Engine Land, and an interview with WIRED.
This Site-Wide Approach Punishes High Quality Results
With this signal hitting an entire site instead of just its individual low quality content, the results fundamentally oppose the stated goal of search quality and fairness in attribution. The collateral damage results in Google burying the original source of high quality content, promoting those who steal, scrape, and republish above them. Furthermore, it ends up demoting other top quality results simply because of the domain on which the content resides. It's counter-intuitive to think that prejudicially branding every piece of a particular site's content, past, present and future is an effective way to promote top quality results.
Trying To Resolve Your Site-Wide Demotion
Within a week, several search analysis reports started popping up with post-mortem breakdowns. Most were fundamentally flawed in that they only looked at the number of ranking places each site would lose, without taking search volume and click-through rate into account. The bottom line is that the difference between ranking 1st and ranking 2nd is mammoth. As such, if your site ranked #1 for a couple hundred popular queries and got flagged by Panda, the bulk of your traffic loss would come from those #1 positions changing to #2 through #10 positions. Shifts between #4 and #8 don't make nearly as much of a difference. But I digress.
A consensus has been forming across the web that if you remove duplicate and otherwise low-quality content from your site, or do the work of telling Google not to index it, your classification as low-quality under Panda will be lifted. The idea that you can get out from under this cloud started to gain traction as a couple of standout examples showed up.
Find Your "Problem Content"
The vast majority of content on WonderHowTo was written by our team of editors, researchers, and curators. It has always been our policy to write original descriptions for the videos our curators approve for our library, so as to ensure authenticity, accuracy, and relevance. It is part of the added value we bring to the table when embedding how-to videos from YouTube, Vimeo, or any of the other 17,000 creators we've curated in our hunt for useful and excellent how-tos. (Talented video creators often produce an excellent tutorial with zero regard for title or description, rendering them invisible to search. To these compelling voices, we have sent a steady stream of deserved traffic.)
Over the years we have also consummated one-off agreements with a handful of partners who requested that we use their own specific descriptions, word for word, when including their content on our site. As was the pre-Panda norm, Google would always rank the original source 1st, so there was no need for any one-off noindex tags to keep rankings in their correct place.
With the growing consensus that such republishing could be a major signal in getting a domain flagged, it seemed apparent that our biggest problem might be this content from our partners. After auditing our library, we found that about 16% of our content had been republished word for word from one of these partners. We would have to noindex these to take them out of search visibility.
Enact Your Sweeping Changes to Remove Your Problem Content
Once you've identified all your problematic content, it's time to noindex it. The owner of Digital Inspiration made a number of similar changes and saw his rankings restored within two weeks. Here are the changes we made to WonderHowTo as of March 25, 2011:
1. Duplicate Content from Syndication Partnerships
We added a robots noindex meta tag to each page where content was republished from one of our partners (the tag itself is sketched just after this list).
2. Related Video Pages
We realized that the pages listing all the videos related to a particular video were being indexed. So, we added a robots noindex meta tag to each of those pages.
3. Un-embedded Video Pages
When we can't embed how-to videos from around the web that we feel meet our quality guidelines for inclusion in our library, we provide a link for people to watch the video on the source site. Since people who land on these pages from a Google search may experience them as intermediary pages, we think these may be tripping the signal as well. So, we added a robots noindex meta tag to each of those pages.
4. Tag Pages
According to Digital Inspiration, allowing tag pages with inadequate content to be indexed may also trip this flag, so we added a robots noindex meta tag to all topic pages with fewer than 4 useful videos on them.
5. Page Link Count
I read that too many links on a page may also be a signal. So, we cut the number of related topics shown on any given page by 50%.
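For reference, the robots noindex meta tag used in items 1 through 4 above is a single line placed in the <head> of each affected page. A minimal sketch (whether to pair noindex with "follow" so crawlers still follow the page's links is a per-site choice; this is illustrative rather than our exact tag):

    <!-- Keeps this page out of the search index while still letting
         crawlers follow the links it contains. -->
    <meta name="robots" content="noindex, follow">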
Wait for your Changes to Take Effect
Within a week, Google had re-crawled enough of our content to start removing the noindexed pages from the index. We knew this would result in an additional drop in search traffic, but the hope was to rectify the side effect of Google ranking our high-quality content lower than the scrapers who republish it.
We are hopeful that the changes we've made will remove this site-wide flag, or that Google will tweak the algorithm to target only low-quality content as opposed to an entire site. But as of today (4/19/2011), the problem still exists. Google continues to drive people who search for our content to the republished versions on our partners' sites and on the sites that scrape us without permission or attribution. Our search traffic has declined (now partially because of our noindexing changes), and our high-quality content continues to be outranked by less helpful results.
If you have a site that is experiencing a similar phenomenon, let us know in the comments. This behavior seems contrary to the fundamentals of search quality, and Panda specifically. Without making some noise about it, it may never be corrected.
Those SERPs are pretty scary. Unfortunately, it seems like webmasters of large sites have to work a lot harder post-Panda to control internal duplicates and "low value" content. This goes against Google's "we'll take care of it" mantra over the past couple of years to just let them handle duplicates. With all due respect to Google engineers, DON'T let Google take care of it - your ranking and traffic will suffer.
Nicely said. Take control of it yourself: your site, your traffic!
Whoo! You've really sketched the dark side of the scary Panda! This is the big question mark over Panda! When I initially started researching this, I found that Business Insider had republished one of your articles and, in the source credit, linked to your page. But when I checked the results on Google, it was disheartening to see that the original source of your content ranked 6th while the Business Insider page ranked 1st.
1. One reason is website credibility (I have always believed that some websites are more credible than others in Google's eyes). Suppose Business Insider is more valuable in Google's eyes than your website; maybe that is why it ranks above the original source (though it shouldn't, as the Business Insider page itself links to the original source).
2. Also, check the social shares for the particular URLs. Maybe the social shares of the Business Insider pages are higher than those of your URL, and that tells Google that, in human eyes, this URL is more valuable than the others.
Obviously it shouldn't work like that, and Google should fix this issue, but the cases mentioned above can help you figure out what the real problem is!
No offense meant to the author, but the points made by this guy make absolute sense as to why Google might rank Business Insider above you. However, why the scraped-content sites rank well is left unexplained.
I don't think this explains why BI should outrank the original source at all.
Google should understand by now that if it finds duplicate content and several of the 'duplicate' articles link to a certain source, that source should outrank any other page, as it is the original piece that the others are republishing (whether this is with the original author's permission or without it is irrelevant in this case).
I would think Panda would've improved on this ability; in this case it seems to have gotten worse, unfortunately.
The reason is that it all boils down to what the user wants and needs: good, trusted content. Regardless of the original source, Huffington Post is always going to be regarded as more trustworthy than a smaller site with fewer users and fewer social shares – unfortunately that's the way it goes. As a user with an interest in SEO I can honestly say that I'd feel more comfortable reading a "how to" on a site like WIRED than on a random blog.
Except that if WIRED links to it, you would think that it has credibility. The fact that your blog is not as authoritative as WIRED should not mean your original has no value.
Surely WIRED is lending some authority to the blog post?
If you requested that your partners use canonical tags, that might have some positive impact, but it seems scrapers will always do it better. Maybe it's time to look at how you are using social media, to try to get a step ahead of them by sharing the content first.
Also, a great post with lots of good tips on what you tested and on what might have been hurting your rankings.
Realistically many vendors of content aren't going to want to use the canonical tag.
Sorry if I'm pointing out the obvious, but you may want to consider your page layout, especially on your video embed pages. I've counted no fewer than 7 ad containers on these pages. You have AdSense very tightly surrounding the video player, and the text content on the page is very "thin."
Another thought...if you believe your domain was possibly flagged because you are republishing duplicate content, wouldn't that also flag your partners who republish your content word for word?
Two solutions that I see here are adding absolute reference links in the articles, and having your partners add a rel=canonical tag to their pages, which would solve much of the scraper issue. Believe it or not, much of the scraped content is being scraped from your partners' sites, not yours. When that happens, the scrapers' links (in many cases) point back to the partner site, which could be why Google is ranking the partner site above yours. Rel=canonical on partner site pages may help this.
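For anyone unfamiliar with the tag, a cross-domain canonical on the partner's copy would look something like this (the URL here is illustrative, not an actual WonderHowTo address):

    <!-- Placed in the <head> of the partner's republished copy; it tells
         search engines that the WonderHowTo page is the original. -->
    <link rel="canonical" href="https://www.wonderhowto.com/how-to-add-a-dislike-button-to-your-facebook-page-0123456/" />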
Seems like a good solution to the problem. Thanks
I have seen this in action with an old client I have been speaking to, www.vinyltodigital.co.uk. They are a really small company, a couple of people basically, working from home, but they were about the first company to do what they do, or at least to have had a website, which has been up for eight years or so, maybe more.
They don't have a content strategy or any such thing, but they have never really needed one, and they have never done much in the way of external SEO. Yet something odd happened in Feb this year. Weirdly, it was the 7th of Feb, and it is a UK site, so it was supposedly pre-Panda, but it tallies so closely with what you are seeing here and other reports I have seen that it's hard not to look for a connection.
Basically, they saw traffic drop by about 80%, certain pages are just buried where they had good rankings (well, we are not 100% sure on rankings as they were not really tracking but GA shows the terrible traffic loss).
Subsequent investigations have shown a few abnormalities but the big thing we have noticed is how many sites have copied the content. Some have taken paragraphs and tweaked them, others have practically stolen whole pages and images as well.
The kicker is that the people who have taken the content rank above the client for it, and seemingly the whole site has picked up some kind of penalty; with no real active SEO going on, it's hard to see what else it could be.
We have contacted many of the thieves and are having some success getting the copied content taken down, and we are rewriting the content across the site to be doubly sure, but it's certainly annoying and has cost a hard-working small business a load of revenue.
Still, it's odd, and pre-Panda, but the effects are pretty much the same, so I felt it worth a mention.
Cheers, Marcus
Would be interested to know if anyone at all recovers, even given the work done to fix these problems. I have had a lot of scrapers feast on my content too, but I have also noted issues with my site in general. So I'm focusing on fixing my site, and when I have time, I am running after scrapers. I've successfully shut down at least half a dozen full websites that had been copying my content en masse.
If Panda is indeed largely based on usage statistics (click-throughs, bounce rates, etc.), then we would expect some shuffling to go on for quite some time.
For example, let's assume that one of the most important metrics is bounce rate from SERPs. Let's say that you had been ranking #3, but Google determined that you had the 5th worst rate in the top 10. Subsequently, come Panda, you saw some drop in the rankings. In the meantime, sites that were in positions 11 through 15 may move above you temporarily, as Google doesn't have good SERP bounce rate statistics for these pages. Once Google establishes this, they will be culled as well if they are not up to par - which they probably won't be.
You have to think of Panda as iterative. Google is giving other sites a chance to beat out yours because yours didn't perform so hot earlier. Over time, though, if yours is better than those it is trying out, it will be sifted back onto the front page. At the beginning, though, the sites most affected are the ones for which Google had the best data - which in this case means sites that were already receiving substantial traffic from Google.
Just a theory, but makes sense to me.
Thanks everyone for the thoughtful feedback. I want to address some common themes.
Linking back to the source:
A couple of you suggested embedding links back to the source in our content. This is something we currently require of our partners (and as it turns out, a number of the scrapers also include the link). At the bottom of each of the above syndication examples, you'll notice a link back to the source on WonderHowTo, using its headline as the anchor text. We also include this link in our RSS feeds (typically the source scrapers use when they republish our content).
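To make that concrete, here is a simplified sketch of a feed item carrying that attribution link (the structure and URL are illustrative, not our exact feed markup):

    <!-- Simplified RSS item: the trailing anchor uses the article's
         headline as anchor text and points back to the source page. -->
    <item>
      <title>How to Add a Dislike Button to Your Facebook Page</title>
      <link>https://www.wonderhowto.com/how-to-add-a-dislike-button-to-your-facebook-page-0123456/</link>
      <description><![CDATA[
        ...article body...
        <a href="https://www.wonderhowto.com/how-to-add-a-dislike-button-to-your-facebook-page-0123456/">How to Add a Dislike Button to Your Facebook Page</a>
      ]]></description>
    </item>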
rel=canonical
I'm sure some of you have been able to convince a large partner with an over-booked development staff to change code on their site for your benefit, but you'll also know it is not a simple sell.
Moving mountains aside, the larger point I'm trying to make here is that Google had already solved this problem. As Joshua Hedlund pointed out, Pre-panda, Google would always follow the "via" links in the syndicated article, allowing it to properly credit the source. Post-Panda, the domain-wide approach has negated that accomplishment.
If enough quality sites come forward to point out these problems, I'm hopeful that Google will act to come up with a page-by-page approach to improving quality instead of prejudicially forcing some sites to carry sand-bags on their backs.
It might be hard to convince your partners to include a canonical link to the original work, but it is even harder to suffer a 40% drop in traffic and the loss of the #1 position for your own work. It might be worth considering just how valuable these partnerships remain, now that you are suffering because of them.
Having the backlinks is nice, but if the backlinked work is still being outranked by the syndicated copy, it pretty much defeats the point. Instead, it might be a good idea to start looking for more mid-level publishers who would be willing to add the correct canonical link to your work, giving you the complete benefit while still allowing them to use your high-quality content on their site.
I don't syndicate my content, but my site is a victim of the Panda algorithm. My site was hit on Feb 24, and while I have made a LOT of changes and improvements since, Google traffic has just kept decreasing over the months. I am extremely saddened and frustrated because I see my site being victimized a lot by auto blogs. They probably notice that it has been weakened by Panda after being a top site in its niche for some time. I am now very exhausted from battling these scrapers regularly. I have done everything I can to protect my content -- only publishing snippets on RSS, putting copyright notices everywhere, and warning duplicators of legal consequences. This is ALL TO NO AVAIL.
What I am seeing is that once you are hit by Panda, your site is weakened. So anyone copying your material, even aggregators, will rank higher. This is painful and terrible. There seems to be no way out of this, and is pretty much destroying my honest business. How can we continue when scrapers are ranking ahead, no matter what we do? I have great social signals while the scraper has none. Yet, my pages are not appearing in the SERPs while scrapers are, once they copy the material word for word. The only recourse is DMCA, but that takes time and in the meantime, traffic is affected.
While getting deranked due to content quality hurts, I can accept it, especially if it pushes webmasters to improve their sites and go in the right direction. But getting victimized repeatedly by scrapers while we are trying to regain our traffic and business is NOT acceptable, and it is illegal. I can show you egregious examples of this happening... and I am not sure what else to do, as these scrapers just keep coming back. I fight them off, but they simply return.
Hey guys, check this:
https://www.google.com/search?hl=en&biw=1280&bih=574&q=%22A+weird+thing+has+happened+as+a+result+of+panda.+Something+you+might%22&aq=f&aqi=&aql=&oq=
A piece of content from this blog is further proof of this strange Panda behavior.
Very funny! :) :) :)
omarinho,
It is not funny.
It illustrates that Google buries human (i.e., original, manually produced) writing and promotes SEO'd content (i.e., spam: surfacing what has been SEO-ed rather than what was searched for), filling the results with spam farms of stolen content, spun by bots and programmatically distorted for uniqueness.
Excellent post! About time somebody made mention of this from a reputable SEO Site.
Several of my dating websites have gone from top positions right back past pages 3 and 4, and in some cases have been completely removed. My SEO team has discovered that someone has been scraping the content.
Not only are scraper sites destroying this segment but ONE person in particular is using the popular tool ScrapeBox to get free spammy backlinks to his dating pages...which are shocking to say the least.
I have gone through the top 30 results for the keywords 'dating sites' and 'free dating sites'. Below are the HIGHLY questionable results -
6. https://discussion.dreamhost.com/user-140696.html
11. https://webarticle.tripod.com/freedatingsites/
12. https://www.dreamscape.com/articles/freedatingsites/
15. www.uci-tsa.com/
17. freedatingsitesmm.blip.tv/
18. https://www.obesityhelp.com/member/rchlmccorm/blog/2010/09/23/free-dating-sites/
19. https://freedatingsites22.wordpress.com/2011/01/04/free-dating-sites-lets-visitors-to-come-together-and-also-acquire-really-like-for-each-some-other/
20. devzone.zend.com/article/13277
21. https://blog.weber.edu/groups/espaolcomerciali/wiki/34be9/The_things_you_must_watch_out_for_when_becoming_a_member_of_free_internet_dating_sites.html
22. https://collaboration.dumontnj.org/groups/humanitiesi2/weblog/77e99/Exactly_what_you_should_look_out_for_when_signing_up_for_dating_websites.html
23. freedatingsites45.commongate.com/post/free_dating_sites
24. www.momentville.com/freedatingsites/welcome
27. cospire.com/koviewer.aspx?id=22673
28. bigthink.com/michaelmartin3
29. www.photoblog.com/evaevgrc/2010/10/10/
30. https://diablo.incgamers.com/forums/blog.php?u=352930
Can someone PLEASE tell me how this content is better than what was previously displayed before the update?
Same trouble here. We have a 9 year old domain, well linked from government, universities, blogs, you name it. Active community of 20,000 members and somehow we've been destroyed by Panda. Over 50% loss of traffic. Scrapers out ranking us for our own content.
It's seriously to the point that the site won't last long, since although traffic has been cut 50%, revenue is down 80%. Which means I'm no longer hiring, and I'm seriously having to consider laying people off just to stay afloat. Way to go Google! Seriously, Google should take some responsibility for what they've done to so many legitimate sites and businesses caught in the crossfire. It's not right.
Thanks for the article; you touched on a lot of good points. We were hit very hard by this update on our company's flagship site - stupidcelebriies dot net. This site's been around since 2006 and, like you guys, also gets featured on huffingtonpost.com as well as a lot of other big sites. We ourselves were one of the larger sites out there covering gossip. After the update, our Google traffic is down 70%, and we constantly get outranked by scrapers even after other well-known sites link to our stories. At a big loss here!
Hopefully Google realizes their mistake and gets it fixed. If not, we're going to end up laying off quite a few employees and contractors.
Jeff
Sounds brutal, but thank you for taking the time to document this. It is one of those issues that will require a community to get a grasp on.
Here are several thoughts I have had on the issue.
Rel canon based - It is possible that Google is trying to achieve higher adoption of the practice with the Panda update and will reward sites that use it. For instance, if you have several syndication partners that add rel=canonical tags but the scrapers don't, Google may still restore order to the universe.
Higher view based - Perhaps what we are seeing with Panda is much like what would need to happen to change US government spending. It would hurt a lot up front (Panda), but that pain would force businesses and people to adjust how they do business. Google may be expecting publishers to take on more of the burden of stopping scrapers from grabbing their stuff, and expecting websites not to syndicate their content to larger providers without a canonical reference.
I am not saying it's the answer, but it is another way to look at it. Google is a for-profit business and not a socialist government, after all. All the work they put into policing results costs a ton of money. In the end, they are more worried about the quality of the search for the end user. How that information gets there is not a main priority for them. I can see how not rewarding great writers and curators with credit would deter them from continuing to write great stuff, but even with the scrapers, your content is still reaching the end user, so Google's job is done. The question is how you solve that.
This is an awesome post - possibly my favourite since reading up on Panda. We're in a similar position, and what you've described here perfectly outlines the problems now facing many webmasters. With no clear guidance from Google, webmasters will just start blocking low quality content from their site, further impacting their already low traffic. The whole thing seems massively counterintuitive to Google's goals - they're penalising entire sites with good content, and then the webmasters are blocking more content to try and get the penalised content ranking again. With no clarification as to whether this will positively help those sites, we may end up in a position where entire content types (how-tos, voucher codes, review sites) are no longer accessible via search. Seems ludicrous to me. Would be good to compare notes on Panda - DM me if you're interested.
This is pretty scary. This seems to show the opposite effect of what the Panda update claimed to resolve.
I agree with David; they really should have canonical tags pointing to your website instead of just links saying the content is from your website. I mean, this content on your website is really high quality, and I had a look at Huff Post: some of these pages have 10k+ likes, so I am sure they are generating high page views off it. I hope the agreement is worth it =)
But yeah, I have seen a similar drop in traffic for a personal site. It was getting around 3k unique visitors a day; the global Panda update has seen that drop to around 2k a day, and I have noticed specific keywords doing poorly.
I think Panda was really a kick in the ass for small and medium website owners, but the real big boys in the industry are just going to kill it even more now.
Although I am a PHP developer now, I used to do small SEO jobs like link building before, so I am finding this article interesting.
Let me share my experience and views on this.
As a PHP and Joomla! developer, I was doing a demo project for one of my clients. He asked me to build a kinda scrappy site, based on Joomla!, for publishing feeds from other sites.
So, as a demo, I developed it on my server using Joomla and installed FeedGator (a Joomla component that imports feeds). Guys, you will not believe what happened next. Within 15 to 20 days, articles from this demo site started to rank on Google's 1st page. I noticed it from StatCounter. Traffic started to increase day by day.
So, what was the difference between my scrappy site and the original sites? The original articles' sites are also highly optimized and built on WordPress. So, what is the difference? Joomla doesn't even have a ping system like the super blogging platform WP (although there are extensions for Joomla! that do this, I did not use them). So what was the difference?
That is the million dollar question. The answer is simple: the frequency of content updates. The original sites get updated every day or every week. But a scrappy site gets updated every half an hour. My site was getting updated every half an hour with a flood of fresh content from all around the web.
I have since stopped it, for my own weird reasons, but this is a fact.
follow @aWorkah0lic
Kamrul, interesting observation. Thanks.
The difference between an original-content site and a scraped site would be the percentage of original content vs. duplicate content on the web. The next difference between those two sites would be authority.
Both are easy to detect. Eventually things will be tweaked enough that scraper sites become obsolete, like they should be. Unfortunately, search engine technology has not caught up with it all yet, but no doubt it will.
I have been hearing this since my early SEO days in 2007! But, there is a but...
Our site is an independent online newspaper. We publish 50 to 100 stories every day.
We have 100+ active writers and 3 editors.
That means the site is being updated all day, from early morning Florida time to late evening California time, and sometimes even later.
We are being filtered, so update frequency by itself is probably not the answer.
If you are not hit by Panda, then frequency might explain it, but if Panda hit you, frequency will not undo it.
Our google traffic is down 40%
Also, we have several news blogs. They are all suffering too. Some of that is because we had to lay off staff, so now we're having trouble maintaining those smaller sites; I have no idea if Panda did that or if it is because we're a lot slower adding news.
Of course, our readers are now confused because we're not updating them as much.
Google just likes destroying other businesses.
Thank you for sharing this case study with us. Unfortunately, Panda doesn't punish low quality content because Google has almost no ability to directly evaluate the quality of content on a page. So, if your site has been Pandalized, you are stuck guessing which combination of site characteristics is causing them to guess that your content is sketchy.
An interesting thought exercise would be to ignore Google's statements about the update and the SEO media's claims about its targets and to look at what the actual effects were. We have heard that nearly a billion in revenue changed hands. So what's the bottom line?
If you're talking "quality" in the subjective sense of how well-written or useful content is, then sure, but Google certainly has the ability to measure characteristics it equates with quality. For example, two possibilities have come up in Panda speculation:
(1) Ad-to-Content Ratio
(2) Unique content Ratio (how much of the page is unique vs. rehashed from other pages)
There's definitely some evidence to suggest that Google can measure both of those factors. A page that's mostly ads or that's mostly the same as 100s of other pages except for one sentence could certainly be called "low quality".
That's a good point, but the fact is there are many examples of sites with these characteristics that are doing quite well, which would indicate that these direct measures of content quality are not the primary means Google is using to evaluate content "quality".
And really, if you can measure content quality, why not do it at the page level, instead of burning down the site as a whole? And why let duplicate or thin content be pushed to the top of the results just because it is on an authority site?
Yeah, no argument there - the "rules" seem to be too often applied inconsistently, and overall brand power and domain authority can overwhelm all sorts of quality issues. It's not a level playing field.
The ad-to-content ratio is definitely something that was raised in a case study I read about EzineArticles and the hit they took with the Panda update.
Saying that, I'd imagine the auto-blogs and scraper sites that republish people's content have just as bad an ad-to-content ratio, if not worse!
Yep, noticed a lot of this after Panda. What kills me is that Google's cache date implies the original source, yet our content is STILL below it... sometimes many spots below.
Thanks so much for writing this. I hope the BIG G actually reads this article and really considers what is being said. We are in the exact same situation: A site-wide signal is clearly demoting our results through dozens of scrapers... It's awful.
Basically, every independent content publisher we know, save for about 5% of them, has experienced this same downfall. A few things are clear:
1) Google's team didn't really do a good job of testing Panda, considering what types of negative fallout could be expected from the algorithm, and fixing those before the rollout.
2) Google is trying to solve a quality problem using proxy signals that are far from perfect. While the signals may work reasonably well for certain short-tail search queries, in the long-tail, they are a disaster. There had to be better ways to ferret out good vs. bad content.
I hope they make some changes, but as Amit and Matt noted, they are "really happy" with this update. There are so many ways they could have addressed the content farm problem without doing this... so. many. ways.
Bryan, I feel your pain too.
We have 2 sites for software that we develop and sell. These are programs we created from scratch, and users have appreciated them for over 8 years. However, after the second Panda update, both product sites lost 50% of their Google search traffic, and now websites that copied our content rank higher than us. The websites in question are https://www.novapdf.com and https://www.backup4all.com/
Because these are software products, there are a lot of torrent sites, download libraries, and even blog posts that simply copied our content, and some of them even link back to us. What puzzles me even more is that after the first Panda update our traffic grew a little, so there was no sign that something would go wrong. But in the second update we were hit very badly. The novaPDF website even has a PageRank of 8; several institutions use it internally and have linked to us, so we have backlinks from .gov and .edu domains, but this doesn't seem to matter at all to Google.
We started making some changes too (mainly excluding, via robots.txt, pages with content that we thought might be considered low quality), but to this day there has been no recovery. And it makes no sense why we were penalized, given that all of the content we created shows our users how to use our products.
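(For illustration, robots.txt exclusions of the kind described above look like the following; the paths here are hypothetical, not the actual site's:)

    # Hypothetical example of excluding suspected low-quality sections.
    # Note: robots.txt blocks crawling, which is not the same as a
    # noindex tag; already-indexed URLs can still appear in results.
    User-agent: *
    Disallow: /old-press-releases/
    Disallow: /mirrored-downloads/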
Honestly, I don't think that noindexing content which has been scraped by other sites would help here; in fact, as you witnessed, it made matters worse in the short run. Google is no perfect engine by any means, and if your situation improves, I would say that your public post here made the most impact :)
This does seem to back the belief that the Panda update was counterintuitive. Perhaps having all partners publish only an excerpt and then a link to your site will help you solve this issue, by reducing the amount of duplicate copy and providing you with more referral traffic :)
There was a Google Webmaster video, I think last week, which addressed something very similar to this - how to tell Google that you are the original author of the content, and not whoever gets it indexed first. From what I remember, there was very little help on how to do that, other than "tell us about it" - i.e. Tweets etc. - but obviously anyone can do that.
The video is here anyway in case it's of some use: https://www.youtube.com/watch?v=4LsB19wTt0Q
Thanks for taking the time to write this post. It's confirmed much of what I've found over the past month with a handful of my clients. To keep their doors open in the interim, I've recommended redirecting a large portion of their marketing efforts to PPC and Social Media, at least until the dust settles. We've been actively revamping the content, contacting the offenders to remove our clients' copied content, and throwing a few press releases in for added credibility.
Looking at some articles, as a user they don't seem to be of real quality, here is an example: https://www.wonderhowto.com/how-to-hack-satellite-dish-into-wifi-signal-booster-257436/
Agreed, the video description is short but the fact this has had 300,000+ views and attracted a ton of comments suggests it is liked by people, and isn't that what we mean by quality content?
Although I'm not sure whether it was really the steps taken by the Digital Inspiration owner that worked, or whether his personal relations with Google helped. He claims in his post that after taking the steps mentioned, he did see his site's rankings come back.
However, one important thing that many of you don't know is that he really has a lot of "good friends" at Google. As you can see from his blog posts, he regularly conducts interviews with Google employees, takes questions, and gets his DMCA complaints addressed very promptly by Google. Do you think it is really possible for a common blogger to get all of this addressed so quickly?
From my own experience, I can tell you it's NOT. Whether it is filing DMCA complaints, asking questions, or anything else, I've never received a proper response from Google.
Given the details posted for the WonderHowTo site above, I don't think Google will be reacting so soon.
Google's policy seems to be clear: make the rich richer and the poor poorer. If the big guy steals from the smaller guy, the big guy is always right.
I've been blogging for the last 6 years. I still get hit by Google randomly, and I can't really do anything about it.
Scraped and copied duplicate content is outranking sites mauled by Panda. For example, this site: https://doyourownpestcontrol.com/boxelder.htm
Copy the article content into google verbatim, and you'll find the scraped dup content ranks higher at:
askville.amazon.com
It's that way for most of the pages now at: https://doyourownpestcontrol.com
Original content should always outrank duplicate content; unfortunately, as part of the Panda update this general rule seems to have been disregarded. I have seen similar issues with some of my content being outranked by scrapers, and I am NOT happy.
I think quality has been a huge factor in the update, though. Nearly every single one of my sites remained stable after Panda, except for my testbed sites, which contained duplicate content. This includes my own personal article directory, which, while still only a year old, has strict rules against duplicate content, low quality content, and poor quality English. What really hit me hard was traffic drops on 3rd party sites such as HubPages. I have enjoyed a 5-figure side income from their site for a few years now, and seeing that drop was a real kick in the teeth.
This may seem like a dumb comment, but I looked at WonderHowTo's site and some of the sites that scraped their content; I saw a lot of similarities in that they both have a lot of ads. I wonder whether, if WonderHowTo didn't have so many ads, they would avoid being penalized in the rankings.
This is a problem I started facing after the 9th update. I have been reporting such pages to google.com/dmca.html to get them removed from Google, hoping Google will understand and in time start ranking my original content. So far I have reported over 200 pages, but the result is still the same. Traffic improved as many republished posts were removed from web search, but the problem still exists for the posts I have been publishing these days :(
Here is a funny one:
Check the image. It's content from Matt Cutts' blog, and the blogs that copied from his blog are ranking higher than Matt's blog. Seems like it's high time Google looked into the Panda algo and fixed it...
The same thing happened to one of my sites, but I was able to get it ranked again. I am not sure which method, or combination of methods, did it, but I wrote up the steps I took in my blog post "How I Recovered from Panda," if anyone wants to see them. Mine happened starting on April 12th, after the global update.
I would expect that before Panda, your #1 problem was how to scale your business. We all have that problem. From reading your post, it seems to me that your answer was to create an interesting business model. I hate to bring up Warren Buffett, but I would say that today your #1 problem is that your business lacks a moat. Or maybe that Google's algo was your moat.
P.S. I think that suffering a business setback and comparing yourself to a human rights icon like Rosa Parks is like getting served the wrong bagel at Starbucks and comparing yourself to Gandhi. Or coming home from work and telling your wife that you went to Mars today.
Thanks for highlighting this issue Bryan. Our company Amara Software has had a very similar experience.
Software112's comment also sounds very much like what we're experiencing.
Our website www.amarasoftware.com got hit badly after the second Panda update as well. All our pages have suddenly fallen dramatically in the SERPs.
We create original Flash software applications and sell them online - which we've been doing successfully for over 8 years.
Suddenly, most of our pages have fallen off the first Google SERP for important keyword phrases for which they ranked in the top 3 for many years.
Our pages with original content are also now being outranked by software download sites who sell our applications and have a few paragraphs on their page that are similar to the copy on our site. That makes no sense.
We've already spent weeks checking the entire site for any content Google may consider low quality and making tweaks and improvements where we can, but we simply cannot find anything really wrong with our site that would warrant such a huge drop in rankings.
Our traffic has dropped by more than half and our sales have dived and almost ground to a halt. It's just scary that Google has such power to make or break serious companies who have built up their business over the course of a decade and who don't do anything dodgy that could be considered black-hat.
Just a few of the keyword phrases for which we used to rank in the top 3 are 'flash slideshows', 'flash software' and 'photo animation software'. For that last term, we ranked #1 and are now on page 2, with many download sites outranking us (listing our product).
We continue to tweak our website but feel that Google has got something seriously wrong in this update and needs to fix it.
Hello,
After reading your comment, I'm curious to know if there was anything specific that you addressed to get your rankings back? You now rank at 2-3 for all these terms: 'flash slideshows', 'flash software' and 'photo animation software'.
We've followed all the advice in the forums to get our traffic back since the Panda 2.2 run, and to date we have lost 99% of our Google traffic. We've been moved down more than 50 places on average for our own site name, have none of our articles appearing in the top 100 spots, and see scraper sites running stolen content from our all-original site returned in roughly 70 of the top spots for our article titles. Our terrible act that got us demoted? We are a news provider for IMDb. Yes -- they run our RSS feed excerpts with a link to our articles. We can't get any help from the Google team and are only insulted by their forum members for writing celebrity gossip news. The trouble is, our content really is 100% unique op-ed pieces that roll a review of the latest entertainment news in with stars' charitable activities not typically reported by the gossip magazines. The more good news we share, the more scrapers pick our site up... We've had to lay off most of our staff now (including writers, IT and marketing people, and researchers) because of the decline in traffic. It's heartbreaking to have to scrap such a worthwhile project over an algorithm flaw that they can't get around to fixing. How many other people who make their living online have lost their jobs over Panda's flaws? The bug should have been fixed before it ever rolled out. We've gone from feeding 5 families who worked their hearts out to find and share great content to making less than $10 monthly in Google ad money -- not enough to pay the domain and hosting fees. https://bit.ly/8YL74r
This site https://feedreader.com/feed/WP_Sites has scraped my entire site and stolen my traffic. They are ranking for all my keywords and displaying my entire site within theirs. I want to know who they are and where they live so I can pay them a personal visit.
How do I find out who owns the site and where they live?
Can Google be sued for promoting spammers and spamming approaches, I wonder?
And how?
There are a lot of comments on here so perhaps I missed someone else making this point:
The business model for WonderHowTo and all these other massive how to content sites is the thing under attack here. Putting up how to content for every subject imaginable in order to make money on ads is the thing Google is targeting. I don't think it really matters how good your content is or whether or not you noindexed your syndicated content.
When your site is about everything then it quickly approaches being worth nothing to anyone. The big winners here are quality content sites that focus on a specific niche. I don't know about you, but I would rather read a blog passionately dedicated to every aspect of red widgets than an eHow or WonderHowTo article that explains red widgets, along with blue, white, green, yellow, and orange widgets plus 10 million other topics.
No offense - I'm sure your content is good, I just don't think you'll ever reach the same quality level as the guy who thinks about nothing but red widgets all day.
I'd have to disagree a little with this comment. Sites that are very niche tend to ramble on and repeat the same ideas and concepts over and over, ultimately bringing little value or new knowledge to the reader. I don't read many blogs for this reason; I think it's better to pick up a well-written book or two. I agree, though, that specialization is better. I also think big authority sites are better, and if Google is now trying to get rid of big sites, then it's a bit of a paradox, as it supposedly trusts large authority sites.
Hi Bryan, I have the same problem with my site here in Italy (though I think Panda is not yet in Italy!):
https://www.paid2write.org/
It is 2 years old and has 11,000 articles, all original and good.
Even my FriendFeed titles are ranked better than my original content in the SERPs. I'm going crazy. I'm sure it's not a manual penalty, because Google answered my reinclusion request saying there is no manual penalty. Maybe we should ask Matt Cutts, but one thing is clear:
GOOGLE IS GOING WRONG THIS TIME
and as you can see, the search results are worse than before.
Also of possible interest is John McElborough's entry at his own site, "How To Beat A Panda." It's not a case study of a site hit by the Panda update, as this one is, but rather an analysis of what you can do to protect your site from Panda's secondary effects. For example, if your site relied on links from ezinearticles or buzzle or a combination of both to rank for certain terms, you may be slouching a bit. I'm sure an industrious article spinner will tell us at some point how they've gotten around that. In the meantime, John makes some good points. I may not have been focusing as much on the seo blog world in the past few months but I haven't read a lot about secondary Panda effects.
All I know is that I removed my 10% supposedly 'weakest' pages (even though they had some highly useful info on them) and small pages (with the assumption that these were read as weak). I'm still getting clobbered. It's an all-original, info-dense site on religion, philosophy, spirituality and such, that has proven its worth over the course of 16 years - www.spirithome.com. G says such sites aren't the target, but that's not what's happened. Bing's still doing just fine.
I would love to have someone tell me what the G folks are killing the site for - what their new algo is reading that calls it a content or link farm. No answers found yet. Maybe they just like to challenge us?
Report as of June 1: rankings still gone.
Checked HTML syntax: airtight.
Checked inbound: at least 3 new major inbound links to the front page. Usage went up as each one was added, but that was due to the links themselves; their impact has now faded, though they're still good. SERPs, however, continued to fall.
Deindexed seasonal files. (They're rather small and rank low when not in season.)
62 pages with substantive additional content.
Fine-tuned on-page using SEOmoz On-Page Report Card. Almost all pages were As. (And it said the files 'work well with others'...)
Got all pages canonicalized. Had a small hiccup with the front page thru on-site Google search, but now it all settles to one single page.
Results: No gain. No additional fall. Only 1 page returned to previous levels.
What now?
Forgot: spirithome.com is the site.
Very interesting post. Did I read it right, though - you added noindex tags to your own pages and let your partners appropriate the content fully? Have you tried using the canonical tag to list your page as the original source and the partners'/scrapers' pages as copies? - Jenni
Excellent well-documented post. Keep us updated as to whether your rankings recover or not.
Great post! Thank you.
I am a link builder, so I always check the links first, and since we are talking about Business Insider, I wanted to look it up. So I checked MajesticSEO and found that BI has about 80k referring domains to your roughly 10k. Authority and links matter, so maybe it's time to look at a link building campaign. Now, NOT SPAM: higher quality stuff, the custom link building that takes time. Hopefully this post will get you some additional links.
Can you not threaten the scrapers with copyright infringement if they don't link back to the original article? If not, Mark Hodson's idea is definitely the way to go.
You can do that - you can also submit a DMCA request through Google (https://www.google.com/dmca.html) - but I'd imagine you'd have a tough time tackling every last site that scrapes/copies your content.
Bryan,
Interesting post, thanks for sharing.
Have you thought about inserting links in your articles for the scrapers (e.g. a 1-pixel-square image with a link back to the page on your site and an alt attribute reading "this article was originally published on WonderHowTo.com")?
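Something along these lines might do it (markup purely illustrative; the URLs and pixel path are hypothetical):

    <!-- A 1x1 tracer image wrapped in a link back to the original page,
         so scrapers who copy the raw HTML carry the attribution along. -->
    <a href="https://www.wonderhowto.com/your-article-url/">
      <img src="https://www.wonderhowto.com/img/pixel.gif" width="1" height="1"
           alt="This article was originally published on WonderHowTo.com" />
    </a>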
I presume you've tried Google spam reports on the scrapers? Any joy?
Good luck getting your traffic back.
Thanks Mark. Good reminder of the 'spam reports'. Just filed some reports.
Wouldn't a one-pixel-square image seem spammy and have a possible negative effect on the page? However, I do like your general idea... it is always recommended to include as many relevant links as possible (by which I mean up to 5 or 6) within your article text to other articles, so at least if scrapers scrape the HTML and don't bother to clean it up, their pages will be linking back to you.
I have the exact same issues with scrapers. But even those who have copied and only show a small excerpt PLUS a link back to my site are the ones appearing on the first page for my key words and my site is nowhere to be seen. This is even after I've fixed my site of its apparent Panda issues. External duplication is one of the biggest issues I'm facing at this time.
Now that Panda/Farmer has been rolled out to all corners of the globe, I think we are going to see a lot more SERP indiscretions, anomalies, and blatant search injustices. This is a very interesting post, as it highlights the vast inadequacies of Google's Panda/Farmer update and shows that it isn't the great saviour from weak-content SERPs that had been proposed all along. Too many strong sites with fresh and original content have taken massive knocks. Only time will tell, but honestly I think we have been sold down the river by Google and that there is a disguised motive behind the update that lingers in the higher echelons of the Googleplex.
I feel for you, having 40% of your work wiped out overnight ... not easy to stomach no matter how you slice it.
It appears they are also biased towards large publishers, which makes sense since those publishers earn the lion's share of revenues. Here is an article that is an interesting read:
https://politicallyillustrated.com/index.php/lpnh/2522/
Would like to know how legit this is; reading the official complaint thread over at the Google webmaster forum, it seems a lot of small businesses/sites were the ones that got hit.
Interesting thought: my business does not syndicate content with partners, nor do we republish other people's videos or content. I run a small business (with the emphasis on small) and our traffic has doubled since the Panda update! Perhaps it's high quality content + original content, where they look at both the quantity of original high quality vs. non-original content AND its percentage of the total amount of content? Plus, I believe this update has to do with alternative media. Think about it: which sites are the ones that republish parts of content or entire pieces of content, most of the time with an intro and a link to the original article? Yes: news sites! This only goes to show that you need to use our formula and/or create multiple sources of traffic, so that when Google hates your guts, your bank account won't hate on you too...
Really nice work. I would like to think that this is the kind of meticulously researched case study that Google would seek out and rely on for iteration to their search algorithm.
As an aside, you might find this helpful: you inserted a number of side-by-side search results (Google vs. Bing) to show the respective positions of the same item in the two result lists. There's actually an excellent visualization technique for this - simple and intuitive to grasp without having to scan back and forth, concise, and easy to create in R, Python, etc. I first saw this technique used on ProgrammableWeb.com by a developer who created a mash-up of Google & Yahoo search results. I'm not sure that post is still on P/W, but you can find it reproduced on the developer's site. Also, about a year ago, I posted a recipe (a step-by-step example in code) on StackOverflow for this plot type (apparently called "parallel coordinate plots" by the data viz experts).
The syndicated-site thing is bizarre, though. Do I understand correctly that the syndication partners always link back to you? (Otherwise, why would you want to syndicate if you are just giving them page views?) Surely Google should be able to easily tell: page A and page B have the same content, page A links to page B, so page B must be the original...
It's really unfortunate that the original authors are getting slammed for "duplicate content" simply because so many other sites have copied their great content. The point of Panda was to get rid of those splogs that steal and republish content. If Google can't tell who the original author is, Panda might have been doomed to negatively impact the wrong people before it was ever launched.
This is quite scary. I just had a look at the stats of Italyum.com, a site that has top SERP results worldwide for "italian recipes" and a statistically valid amount of visitors. Not a trace of Panda claw marks.
But this site is 100% unique content, and they don't syndicate at all; instead they chase down copyright infringements. Does Google not want us to share?
It seems that after the Panda upgrade, Google is not identifying the original creator; they are mainly focused on brand value, and brands are getting rewarded for original content creation. That's the bad part of this algorithm...

Another thing we need to know is how we can share or syndicate our content to our partners without fear of being pandalized. Just curious: if Google could incorporate some specific rules for that, then web businesses could survive and the pollution could be controlled.
You know, if the Panda update's focus and aim was to get rid of duplicate content, I am confused as to why my site dropped from the first page of Google for our main keyword and many related keywords to page 17. We are an ecommerce site, have been online since 1999, and survived many previous updates. While we do have some original content, most of our content consists of product descriptions. Some of those may be duplicates, as they are provided by our supplier, and other dealers may have used them verbatim as we have. We do have some articles talking about our products, but those are definitely original, and we have not tried to syndicate any of them. Any ideas?
A lot of well-established sites have been hit hard, even those that had been unaffected by previous updates.
The final image is very disturbing. Sorry for that.
I just had a chance to read the interview Matt Cutts and Amit Singhal gave to Wired.com. From what I speculate, very soon all the original sources will be back on top. It kind of makes sense given the way they formulated this algorithm, but practically speaking, how much data would they really get when they say "we considered the results of all the people who used the option to block all results from this site"?
Thanks for all this analysis. I am subscribing to this post and will monitor it.
We're getting crushed. It didn't happen when Panda first hit, but in the past week and a half our traffic has dropped by half and seems to still be dropping; we've lost about 8,000 visitors per day. There's nothing fishy about our site: it's our own content and school directory. I'm gonna have to let most of our little staff go.
This analysis by WonderHowTo.com is well thought out and represents a huge amount of work and time spent trying to regain their Google SERPs.
It's always good to see credible publishers perform such a detailed analysis of Google's Panda Search. Other publishers need this sort of back up to counter the party line coming out of Google.
When I see so many publishers (including myself) having their businesses so severely damaged by Google's Panda update, and then hear Google say they have improved search quality when we show them proof that they have reduced it, it reminds me of when governments tell you one thing but what you see in reality is something else.
What I usually want to tell the politicians who mislead us, and now Google, is: "You are either incompetent, or you are flat out lying and therefore corrupt." Which is it, I wonder?
The update really didn't affect any of our clients much at all: a slight temporary drop and then back up. All of them have quality sites with quality content; they're not all known brands, and they have wide ranges of authority and age. The only things I work on that got affected really badly and didn't come back are affiliate sites. Despite that costing me a little, in my book that makes the update a good one... fewer affiliate and ad sites in the SERPs, more original content with a real purpose.
"The only things I work on which got affected really badly and didn't come back is affiliate sites. Despite that costing me a little, in my book that makes the update a good one... less affiliate and ad sites in the serps, more original content with a real purpose."
The same thing happened to a lot of the affiliate sites I work with, but I still think there is a place for affiliate sites on the Internet.
Take Pokerlistings.com as an example. You would be hard-pressed to find a better site on the Internet with such quality content and information about the poker industry and reviews of poker rooms. Sure, their main mission is to get sign-ups, but which business website isn't about conversion? I think people sometimes place informative websites in the same category as business websites. Good-quality affiliate sites can and do offer excellent information (or guides) on various segments (web hosting, dating, insurance, etc.) that users can find all in one place. Sure, some of the reviews might be biased based on commission structure, but which sites aren't? Hotels.com, Tripadvisor.com, etc. rate hotels differently based on kickbacks.
Unfortunately, there are a lot of shite affiliate sites that give the industry a bad name and definitely these sites need to be weeded out.
You might want to try working with your partners so that visitors liking and tweeting your syndicated content actually like/tweet your canonical URL when they use the buttons on those pages.
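For example (URLs hypothetical), the standard Twitter and Facebook button embeds both accept an explicit target URL, so a partner's page can point them at the original rather than at its own copy. A rough sketch in Python of the markup a partner would emit:

```python
# Hypothetical canonical URL of the original article on the source site.
CANONICAL = "https://www.wonderhowto.com/how-to/example-article/"

# Pointing the buttons' target at the canonical URL makes likes/tweets on
# the syndicated copy accrue to the original article instead.
tweet_button = (
    '<a href="https://twitter.com/share" class="twitter-share-button" '
    f'data-url="{CANONICAL}">Tweet</a>'
)
like_button = f'<div class="fb-like" data-href="{CANONICAL}"></div>'

print(tweet_button)
print(like_button)
```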
I will just pick up on this one comment: "but I haven't seen this phenomenon transparently documented either in SEO press or in the Panda Google forum." I think you will find there are literally hundreds of sites reporting this on the Google forum if you read the thread there.
I would also advise using canonical tags or original-source tags so Google and others can see where the content originates. Sure, this does not keep vicious scrapers from stealing and republishing without those tags, but for the others it should work; that's what these tags are for.
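Concretely, a minimal sketch (the URL is hypothetical) of the head markup a syndication partner could place on its copy; the cross-domain rel="canonical" link and the syndication-source meta tag are the attribution mechanisms Google has documented:

```python
# Hypothetical URL of the original article on the source site.
ORIGINAL = "https://www.wonderhowto.com/how-to/example-article/"

# Tags a syndication partner would emit in the <head> of its copy:
head_tags = "\n".join([
    f'<link rel="canonical" href="{ORIGINAL}" />',               # cross-domain canonical
    f'<meta name="syndication-source" content="{ORIGINAL}" />',  # Google News attribution hint
])
print(head_tags)
```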
Hope you get your traffic and rankings back.
Following this article, I found out my content is getting scraped as well *help*
Thanks so much for sharing this information! Very interesting findings! I actually wrote a blog post about the update a few months ago: https://area2oh3.com/google%E2%80%99s-farmerpanda-algorithm-change/
Are you seriously trying to link spam on SEOMoz of all places????
Join the class-action lawsuit against the Huffington Post. You should get a chunk of the AOL buyout too.