This morning I set myself a challenge. Inspired by some excellent recent ideas, strategies and articles about the Panda update, I decided to see whether I could cobble together a quick strategy to weed out pages that Google's most recent major algorithm update might deem “low quality”.
I gave myself two hours to get the data and to put this post together, with the intention that you'll be able to download the template and pick up your analysis from where I left off.
It’s all about poor performance
This methodology should help you identify poorly performing pages that have few, if any, links and a high average bounce rate across a wide spectrum of keywords – in other words, candidate pages that might need a rethink.
Step 1 – Head to Google Analytics
Head over to analytics and navigate to Traffic Sources > Search Engines:
Now, select “Google”
Step 2 – Get lots of raw data
Make sure you can get your hands on plenty of data by inserting the &limit=50000 query parameter into your report URL. This might come in handy later!
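As a rough illustration (the report id and dates here are placeholders – your own URL will differ), if your report URL looks something like this:

https://www.google.com/analytics/reporting/search_engine_detail?id=12345&pdr=20110201-20110301&d1=google

just tack the parameter onto the end:

https://www.google.com/analytics/reporting/search_engine_detail?id=12345&pdr=20110201-20110301&d1=google&limit=50000

Reload the report and you should get up to 50,000 rows instead of the default 10.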
Step 3 – Sort by landing page
We’re interested in landing page performance, so in your left-hand sort column, select “Landing Page”
Step 4 – Download the data as CSV and create an Excel Table
OK, so far so good – by now you should have a rich data set all tucked up in Excel. To make your data into a table, highlight it and press Ctrl+L on your keyboard.
Step 5 – Head to Open Site Explorer
Next, we’re going to export all the links data that Open Site Explorer can give us, and use VLOOKUP to add the number of links to each URL in our table. Whee!
If you’re not familiar with VLOOKUP, check out Mike’s awesome guide to Excel for SEOs. Create an Open Site Explorer Top Pages report (my favourite report ever), download the data and throw it into an Excel tab called “Top Pages”.
Tip: for the purpose of this blog post, you’ll need to remove the domain name from the Open Site Explorer data. Do a find and replace for your domain, replacing the domain URL with nothing, like this:
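To make that concrete (example.com is just a placeholder for your own domain), in Excel's Find & Replace you'd use something like:

Find what: http://www.example.com
Replace with: (leave blank)

That way the URLs in the Top Pages tab match the relative landing page paths in your analytics export, which is what lets the VLOOKUP in the next step match them up.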
Step 6 – VLOOKUP time
Next, you’re going to need to combine the analytics data with the top pages data from OSE. Create a new column in your analytics data called “Links” and add your VLOOKUP, just like this:
Pro tip: use IFERROR to weed out any nasty #N/A errors, replacing them with a 0, like this:
=IFERROR(VLOOKUP([@[Landing Page]],toppages,6,0),0)
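To unpack that formula: [@[Landing Page]] is the lookup value from the current table row, "toppages" is the named range (or table) covering your Top Pages tab, 6 is the column in that range holding the link count (check your own export and adjust the index if the columns are laid out differently), and the final 0 forces an exact match. IFERROR then swaps any #N/A – a landing page with no entry in the top pages report – for a 0.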
Step 7 – Create your pivot table
With a complete data set, you’re now able to create your pivot table. Insert a pivot table and set up your filters, labels and values like this:
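As a rough sketch of the sort of layout I mean (the field names come from my export – yours may vary):

Row labels: Landing Page
Values: Visits (Sum), Bounce Rate (Average), Links (Sum)

Add whatever report filter suits your own data.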
Step 8 – Filter by bounce, visits and use conditional formatting
At the end of my data mashing, I came up with this table:
I can only imagine what this data might look like on an extremely content thin, "low value" site. Any page with a very high bounce rate, measurable level of traffic and low / no links might cause some concern and there are certainly a few pages in this list I’d like to take a closer look at.
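If you'd rather not eyeball every row, one option is a quick flag column back in the source table – a sketch only, with thresholds and column names that are mine rather than anything official:

=IF(AND([@[Bounce Rate]]>0.8,[@Visits]>100,[@Links]=0),"Review","OK")

Tune the 80% bounce and 100-visit cut-offs to whatever "very high" and "measurable" mean for your site (and use 80 rather than 0.8 if your bounce rate column is stored as a whole-number percentage).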
If you'd like to take a closer look at your pages, you can download this Excel document here:
PRO Tip: Add your keyword data
I have a working theory that it’s good to have a complete picture of a landing page’s performance. In principle, you could build a more complete picture using keyword data. Think about it like this: if a page has a slightly below par bounce rate, with the keyword data intact you can investigate the problem a little further. Is there a specific keyword that’s causing a problem? How would you approach this problem?
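One way to do that with the same pivot (just a suggestion – the exact field names depend on your export) is to drop Keyword in as a second row label beneath Landing Page, so you can expand any suspect page and see its bounce rate broken out keyword by keyword:

Row labels: Landing Page, then Keyword
Values: Visits (Sum), Bounce Rate (Average)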
I hope you enjoy using the data and I'd love to hear your thoughts on how this type of analysis could be developed further. Happy number crunching!
WOW...
[... after 30 seconds and shutting finally my wide open mouth...]
I think the best compliment I can give is that I'm going to experiment with what you wrote on my clients' sites in 3, 2, 1, now :)
Me too!
Thumbs up Tim!
Nice one Richard - good post. One thing that still confuses me about using BR to determine poor quality pages is the definition of BR. If you satisfy the user's intent on the page, and they leave via that page, then that's a bounce. So blog posts and forum posts will have incredibly high bounce rates, but that doesn't mean that they're poor quality content. I'm not sure how you'd take this into account - possibly cross-referencing time on site against URLs with high bounce rates? So if you had a high BR and low TOS then you'd be poor quality?
I do agree though that evaluating bounce rates is a good first step to getting your head round where poor quality might be occurring, but I'm not sure it gives you the complete picture. Maybe I'm wrong though - would be good to hear your thoughts.
Hi Jonathan!
Yes - I agree - it's such a difficult one to assess. In the same data export, you could use time on site. If, on average, users who land on your page only hang around for a few seconds, you *might* have a problem.
I say "might" because some sites sole purpose is to send traffic to another site - affiliates and reviews sites, for example. In that case, you've got a real problem with Google Analytics!
I suppose we need a metric that describes a user's likelihood to return to the SERPs. The fundamental problem with bounce is that it doesn't tell us where the user bounced to.
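As a very rough sketch of what a combined check could look like back in the table (the column names and thresholds here are mine – adjust them to your own export):

=IF(AND([@[Bounce Rate]]>0.8,[@[Avg. Time on Site]]<10),"Possible problem","Probably fine")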
Fair point well made and exactly the type of comment I'd hoped for.
See you soon!
You could add another column for conversions perhaps? If you are tracking clicks on affiliate links using on-click events or internal redirects you should be able to have a conversion rate for the page too. High bounce-rate, high-conversion rate = good for an affiliate site. High BR, low CR means a potential low-quality page.
Everett, you can add a column for pretty much anything - that's why I love VLOOKUP :-)
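For example (a sketch only – "conversions" here is a hypothetical named range built from whatever goal or event export you have to hand), a conversions column can be pulled in exactly the same way as the links column:

=IFERROR(VLOOKUP([@[Landing Page]],conversions,2,0),0)

...and conversion rate is then just that column divided by visits.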
Yes I agree! I have noticed that some of my clients' location pages with good UX had an extremely high bounce rate. They did say that their phone calls were about double what they normally were. So yes, high BR and TOS can be a great thing!
"I can only imagine what this data might look like on an extremely content thin, "low value" site. Any page with a very high bounce rate, measurable level of traffic and low / no links might cause some concern and there are certainly a few pages in this list I’d like to take a closer look at."
My certified analytics ninja skills refuse to agree with your post. I have to disagree with the metrics you use to measure the effectiveness of your web pages. First of all, the number of backlinks to a page is no indicator of the page's content quality – at least not in the case of an e-commerce website, and not in the world I live in. A web page can have great content but still fail to get links because it was either not promoted at all or not promoted to the right audience (i.e. potential linking partners). The same is the case with the other metric, 'visits'. People may not visit a high quality page for a number of reasons: it is not easily accessible, it doesn't rank on SERPs, or it is not promoted in any other way. Now comes the metric 'bounce rate'. If you calculate the bounce rate of an order confirmation page, it is generally 100%. Any page which fully satisfies the visitor's purpose (like looking for a particular piece of information) will generally have a 100% bounce rate, as there is no need to browse the website any further. If all the user's interactions take place on a single page – in the case of one long form or AJAX/Flash based content – the bounce rate will always be 100%. Many people view a product page and then bounce to compare prices and features somewhere else. Then they come back to the page again to start the checkout process. So bounce rate is a poor indicator of page performance.
Now imagine an e-commerce website with 50k pages. How many people naturally link out to the product pages of an e-commerce site? If I apply your theory, the majority of my product pages are performing badly (high bounce rate and no links). In reality, the majority of product pages on an e-commerce site can rank high even without any backlinks, just because of high domain authority.
IMHO the best way to determine whether the contents of a page are adding value to your business goals is through the 'per visit value' and 'transactions' metrics. A web page which has a higher 'per visit value' is performing better. For a non-ecommerce site, the 'per visit goal value' and 'total goal completions' metrics are best for determining whether or not the contents of a page are adding value to your bottom line. Then there is the mighty $Index metric. Explaining any further is beyond the scope of this comment, but I guess you got my point.
Himanshu, I don't think Richard's purpose was to give a bulletproof procedure for finding the lowest performing pages on someone's website, but to suggest a way of identifying some of them from the data an average SEO is able to obtain.
When you get a problem with your car, which you don't have a single idea about, and someone tells you: "Oh! Maybe it's your gas pump! I think it would be a good idea to check if it's working well", and someone else comes by your side and says: "It's surely not the gas pump! I mean, according to the noise I hear, the length of your car and X, Y, Z factors, I think it may be related to the alternator! But you should also consider..."
... it would probably have been quicker to listen to the first person's advice, take 15 minutes, and check if your gas pump was working... Agreed?
I'm sorry – it's not that I especially have a problem with purists, but I think it costs nothing to experiment!
Also, no offense, but I feel your "*certified* analytics ninja skills" are a bit pretentious in such a comment... Is it to add weight to what you say? I don't think it's required... I mean, I'm also GAIQ certified, and I think anyone with $50 and a few hours to spend can be one! I'm still here to learn, and to see what the community has to offer.
As I said, no offense... It's just that I didn't like this word. :)
Keep up the good work!
Sorry if I was expecting a bulletproof procedure from a top SEO on a top SEO blog. And if, in this world full of self-proclaimed analytics ninjas, my genuine hard-earned analytics certification sounds pretentious, then let it be. If getting that $50 certification were so easy, everyone would be strutting around with it by now and it would have become meaningless a long time ago. Anyway, my comment is just a friendly reminder that there are a lot of smart SEOs out there who are not so famous but know their work really well, and you can't serve them just anything on the fly, no matter how influential you are in the SEO community. I am not targeting Richard here or any particular SEO. I just have a problem with bad tips.
The use of bounce rate as THE default analytic is one of my most infuriating pet peeves, but he did say that this approach "might" help you identify low performing pages. I think people will understand that bounce rate can be substituted with whatever real metric you're using.
Nice counter-argument, and yes, of course a 50k page site is going to require far more in-depth analysis, and you should indeed be proud of your own depth of knowledge and expertise... but as a starting point, Richard's "quick strategy to weed out pages that might be deemed as “low quality”" is likely to provide a good foundation for further exploration. Time (or the lack of it) is often our biggest challenge – no excuse for not delving deeper, just a reality.
Hey gang - obviously you can do a lot more with this pivot (it's why I made this one downloadable) - and I got a LOT of requests to show a VLOOKUP example. Analytics is a means to an end, and Excel is a great way to get there. Every analysis will likely be unique, relevant to the site in question or the SEO, and not applicable to the next problem. I'm OK with that, and I get a great deal of satisfaction from sharing. It's nice to be nice!
Just a side note, in your grand total, avg should be used for BR instead of sum :)
A very inspiring post, thanks a lot!
Hey Dennis - yep - as there's only one URL entry, average and sum would produce the same figure. In a standard data export though, you're correct, thanks for highlighting that!
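Worth adding for anyone rolling bounce rate up across lots of URLs: a straight average treats a 10-visit page the same as a 10,000-visit page, so a visit-weighted figure is usually more honest. A rough sketch, with my own table and column names:

=SUMPRODUCT(Table1[Visits],Table1[Bounce Rate])/SUM(Table1[Visits])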
Around the time of the Panda update, our website, which is a UK ecommerce site, dropped for 5 of our 6 biggest keywords: from being 1st or 2nd in Google's UK search results we were now 6th or 7th. One factor we considered was that maybe our blog was hurting us more than helping us. All blogs are written, I believe, to a high standard in house. We write approximately 3-4 blog posts per week about a very uninspiring topic... bins! Or trash cans if you're on the wrong side of the pond :P
Now firstly, I won't lie, the blog was set up to try and rank for some long-tail keywords, but we deliberately chose not to over-optimise and to try to make it as light-hearted as possible.
Now could it be that sites which are different from the norm (not many, if any, bin websites in the UK have a regularly updated blog) are targeted for special negative treatment?
We have some awesome links from government websites, national newspapers, and national TV websites. All our content is 100% unique and yet we have been penalised.
The point is, could being different maybe have an effect? Could Google view a blog negatively if it doesn't get a lot of traffic despite a lot of posts, even when the content is relevant and of decent quality?
On a side note, maybe SEOmoz should have a spell checker on the comment posts what with the Panda update on content :)
It is unlikely that your blog - provided the content is all unique from your site - is a negative factor. Having a blog should be benefiting you, not hurting you. That said, perhaps you could do some humorous posts or share some funny videos, install CommentLuv and go out commenting in blogs that might have readers interested in what you have to offer. (That may be something of a challenge to identify.)
An even better idea is to get your site listed in as many Local Directories as possible. I recently did a Local Search Directory Case Study that showed that G controlled 95-98% of the search traffic to that small business, but using Universal Business Listings (UBL) to get them into hundreds of tiny databases has generated 75% as much traffic as G sends them. UBL now has strong coverage in the UK and Canada as well as the U.S., and plans to announce coverage in Australia soon, with Germany and South America to follow. I HIGHLY recommend EVERY business with a physical address use their services.
Businesses that want to ensure they have a future need to focus on every way they can generate traffic beyond G - and all of us would do well to recommend alternative, INDEPENDENT (i.e., not Y, B, or anything they launch, back or control) search engines to all our friends, family, neighbors and readers.
When G starts using the crowdsourced Chrome block data they will use that as the reason to drop any sites they choose from their results the same way that Akismet censors comments of active commentators today. They will simply say it is what their users want. How long will you allow them to be the master of you and your business? Who died and let them play god?
Regarding getting rid of problematic pages, what's everyone's opinions of placing canonical tags in the head section instead to zap them out of the index?
I have a client that appears to have about 200,000 thin content pages, which are very basic content pages for their part number products. They basically just have the part number stated in the H1, a part number image, unique meta data, the specific pricing content and then an add to cart button. So the content is very thin on these pages and they might be bringing the overall rankings and traffic down, as they had a 25% negative hit in Google after the Panda update.
We recommended that they canonical the part number pages to their related parent product page where each group of part numbers relates to one parent product page that has nice in-depth content about the product, UGC and what not.
Do you think the canonical tag will serve the same purpose as placing no-index tags in the head of the pages? They have already removed the part number urls from their google sitemap too.
Good stuff Richard - two ideas here:
Good stuff very helpful thanks!
1000% Agree Ken - I'd love to see WMT and Analytics data compared properly. It might make an awesome YOUmoz post :-)
Good additions. Care to go into more detail (or have the time for it)?
Happy number crunching - that's a good one. Sometimes I wish there were fewer excellent posts here so that I had more time to get on with the usual work - just joking. In the end you help us learn and become the best SEOs out there in the world. I am not best friends with those pivot tables, but I guess if Gianluca is speechless I'll have to deal with them.
Petra - practice makes perfect. The cool thing about this community is there's always someone to help!
Good luck,
Richard
Good thinking here Rich! A very quick (and minor) one for you, but just wanted to let you know that Ctrl+T also works for transforming a data set into a table. I find it's a bit more intuitive/easy to remember.
Thanks for giving me something to play around with this afternoon though!
Wow - sweet shortcut! I had no idea Ctrl+T did the same thing. I'm a bit set in my ways now, good buddy, but that's a sweet bit of Excel trivia :-)
@Richard
I think this is the best combination of Google Analytics data and SEOmoz software data. I love to play with Excel, and I'm really happy to read such excellent Excel ninja work from Distilled. I am going to do some awesome exercises with my own Excel data.
I want to add one more thing about Google Shopping data. If an ecommerce website is promoting products on Google Shopping with the help of Google Merchant Center, the landing page values may change. One of my products is at the top of Google Shopping and getting massive traffic to the website, so the product page has no links but is performing well. My only concern is to run the report with accurate data and to select the right landing pages.
BTW: after making the report we can still review landing pages manually, and to my mind that's quite important. Thanks for your valuable post – I'm eager to build this report for my clients as well as ongoing projects.
Great post Richard. There's enough to take away here in actionable points to last a lifetime! I like the way you've provided all the details to repeat what you've done, but it's a bit worrying that there are several comments from people taking certain aspects very literally. As SEOs it's important to use other people's research and ideas as a starting point, not copy them across without adjustments or modifications to suit particular circumstances, industries and websites. - Jenni
What actions are you going to take based on the step 8 screenshot?
It's all one big hype. Google changes its algorithms on a weekly basis; this one was done publicly because of the New York Times JC Penney story.
If you have a fully autoblogged site, you deserve this and had it coming. There is no shortcut to a good content website. Our website, Bloginity, which buys content from big vendors like AP, Bang, etc., features duplicated content, but of higher quality. I think – and know – that this update only hurts bad websites that make absolutely no sense and are run by the industry's black hats.
I don't understand what step 2 is telling me to do :(
How did you add those SWEET green bars?
Within Excel 2010 and 2007 it's the "Data Bars" option under "Conditional Formatting" within the "Home" tab.
It's great to see SEOs using data more and more in driving decisions!
Another great use for this data set is to follow through with Richard's "PRO Tip" to add the keyword data, and then review the keywords that are driving reasonable levels of relevant, good quality traffic (i.e. in this data set the bounce rate would be measuring this) but that you haven't intentionally optimised for. This will uncover some good quality, good volume, low competition keywords that you can quickly do some more on-page work for and immediately receive extra traffic!
Hi Richard,
Great tool, thanks for sharing the explanation and the finished product. Excel is a must for digging into analytics data, and knowing a few of the aspects you have shared here makes it so easy to get some real actionable data to improve clients' sites. Being able to send them the spreadsheet too and explain the rationale adds value to an SEO service. I agree that you have to apply common sense, and when interpreting data it's essential to have an understanding of how people use the website – only that way will you truly recognise when something needs addressing.
Thanks for sharing :) Nice to meet you at SMX London recently too.
Cheers,
Kath
Hey Richard,
Here is my situation. I've got a very successful, genuine piece of linkbait that is linked from many forums and six times from Wikipedia. These external links generate about 200 visitors a day. However, the page has a bounce rate of about 90% and people spend on average 30 seconds on it. But these visitors don't come from Google; they come from Wikipedia and other external references. The page gets barely any traffic from Google.
How am I supposed to handle this? Is a high bounce rate only bad when it's Google visitors? Or are we talking about bounce rate in general?
Does this content harm my rankings or push them up? It makes up about 10% of my website traffic, but these visitors are not my customers. It's just some content they are interested in, and it's helpful for my customers as well.
Skyper, don't delete quality pages that are doing something for you just because you read something on SEOMoz.
The author of this post has no inside information. It's all guesswork... and he hasn't removed any of the pages you see listed above from his own site.
In fact, I'd be surprised if any site that was hit by Panda has gained anything by following the advice in this post. Removing pages based on anything other than an assessment of their quality is not justified by anything that is actually known about Google/Panda.
Lots of stuff here "sounds" smart. Wow, pivot tables. Ooooh, formulas. That doesn't make it smart. It just makes it impressively complicated.
Great post - although I would exercise caution in using bounce rate, as a 100% bounce may mean that page was precisely what they wanted to read, nothing more or less.
Interesting idea which is certainly worth exploring. I am not sure that landing page performance is the most actionable data to use when removing pages however. In my opinion this would be a good way to create a list of pages that needed to be manually reviewed to determine if they were indeed SEF pages.
I have been putting together a similar strategy regarding Panda update. The update was clearly aimed at content farms, so I asked myself what was the most likely way to algorithmically define a content farm? My hypothesis is the following:
Content farms are sites where a high percentage of the pages can be described as:
So the test of this hypothesis will be to identify and remove pages that fall into the above categories, especially if they fall into the category of poor landing page performance. For example, a site I manage SEM for has an event calendar with entries linking out to the events they will be attending, which is fine from an SEO perspective, but in the CMS they are allowing the pages which represent the events to be both indexable by the SEs and listed in the XML sitemap. These pages add absolutely no value to users and cannot be found anywhere except in the sitemap. When analyzing the indexed pages, a great deal of them are these event pages, which have very small variations in content (event name, time, outgoing link). Other content on the site is great: topically relevant to the site and to the keywords targeted on the pages, with inbound links, and SEO best practices in general. So to test my hypothesis on this site I will be removing all those event pages from the sitemap and adding noindex tags, and possibly requesting removal through GWT.
A very informative post but I guess now that we are 40 days or so into Panda ... has ANYONE come back from the abyss? I have seen the experts give their advice but the experts aren't paying payroll or rent. All the suggestions simply come with what people think but I have yet to see any that have actually had proven results.
Has anyone had success or the hint of success?
Arthur
Nice Work!!! Many thanks for sharing the Excel doc. It's nice to come across something like this and see how other people approach the same issues. Great job!
I downloaded your excel file. Love it. Using it. Loving it. So useful!
Thanks for not only explaining the how to, but including the finished product. :D
All the best, Richard!
How was the data exported from Analytics with the keywords in there as well? The instructions only say to choose landing page as the sort dimension, but the next screenshot in this tutorial shows the landing page data plus the keywords used.
Making use of lots of different tools! Shows how we can acquire data from these programs and turn it into usable information! TY for the great post!
Love this post!
Hello!
So let's say you're looking at the report URL and it's like:
https://www.google.com/analytics/reporting/search_engine_detail?id=177&pdr=20110226-20110328&cmp=average&d1=google&slice=non-paid
You'd add the query here
https://www.google.com/analytics/reporting/search_engine_detail?id=177&pdr=20110226-20110328&cmp=average&limit=50000&d1=google&slice=non-paid
Hope that helps!
Richard
In addition to what Rich said make sure you export as "CSV", not "CSV for Excel". Don't know why the latter doesn't work, but it just doesn't...
The &limit=50000 trick doesn't work anymore. At the time of writing, this is how I got it to work.
First set your rows to something other than the default 10.
Then, look for the section of the URL that looks like this:
Change the number to whatever you want (up to 50000)
Hope that helps!
Will come in very handy at a much later stage. Thank you, I'll steal that. ;)
Wow that's a really good tip, thanks for sharing.
I guess one action from this analysis might be to redirect poor performing pages to closely related content that users are responding to and that are already ranking well. That way you could take what's good about those poor performers (the page authority) and put it to better use.
Yes - redirection / removal is one option. A rewrite might be another. I'm sort of in a place where it feels wrong to completely remove the content, unless of course it was terrible! Some of the examples I've been looking at recently have been articles created with a skewed intention (say, overly commercial, self-promotional etc.). I don't personally believe that every article can be rewritten, but I do believe in having candidates for rewriting or redirection.
Yes, removal and redirection is the best process...
Maybe it is wise (if you haven't got many sub-pages) to build links to those pages so they perform better, instead of taking them out and losing content. And yes, as asefati said, improve their content.
But in general a nice fresh idea to improve performance. I like that.
Great round up. On a similar note, I have posted an article today with tips to get around duplicate content on eCommerce sites here (following up from Patrick Altoft's post) - https://www.further.co.uk/blog/5-ways-to-avoid-the-Google-Panda-Farmer-update-for-duplicate-content-eCommerce-sites-321.
Hm... why not just improve the quality of those pages? You may ask why. My answer: more than 25% of search traffic comes from elsewhere (Yahoo/Bing). Why drop pages for Google to gain a little bit of traffic but lose Bing/Yahoo's, which could be a lot more? You've got to do the math, though.
Thanks for the reply.
We certainly plan on inserting valuable content into the part number pages. They are planning on working UGC and other unique part number specific contents into the pages later on this year, but in the interim the pages won't be touched due to available IT resources.
So at a later date they will be re-introduced to crawlers. In their current state, these pages make up a VERY small amount of overall entry page revenue, so that's why we recommended that they be removed: this page type represents over 60% of the website's overall pages but accounts for a very small amount of revenue. So they're hoping that trimming the content-thin pages will improve the overall domain score that was affected in the Panda update.
After the Panda update, there was an overall sitewide hit on rankings, all the way from head terms to long-tail terms.
Any further thoughts on whether canonical or no-index is the best route when it comes to cleaning up thin content pages that may be hurting a domain's overall score with Google?
The archivist in me bristles at this whole thing. Maybe it will help your site, and maybe it won't. The whole Panda thing might be relevant on a mega site, but possibly not on a regular site. Meanwhile, some potentially useful (to someone, some day) content is being destroyed.
This is a well thought out tactic to implement an unproven strategy. I concur with previous commenters that have mentioned that the dots have been connected yet between Panda downgrades and bounce rate.
Hey Randy, you mean have "NOT", correct?
Ooops - yes thank you for the correction
Haha, np, just wanted to make sure I didn't miss any revelations with the Panda update.
This is truly an awesome starter for aggregating GA, OSE and rank tracking data. Many thanks Richard!
I'm always nervous about relying solely on the bounce rate of a page to determine whether it is performing well or not. Sometimes you have to look at the content of the page. If the goal of that page is to get visitors to pick up the phone and call, does a bounce rate really mean anything? Why would someone hang around once they have the information they need and have been convinced to take action? But I agree that it's important to determine which pages are performing and which aren't.
Hey Nick - totally, it was Jonathan's comment that brought attention to this. Time on site is probably a lot better, but still not always applicable. In retrospect I'd probably have uncovered this, but I deliberately gave myself two hours to put the data together :-)
Bounce rate has nothing to do with quality... nor does time on site