We all have it. The cruft. The low-quality, or even duplicate-content pages on our sites that we just haven't had time to find and clean up. It may seem harmless, but that cruft might just be harming your entire site's ranking potential. In today's Whiteboard Friday, Rand gives you a bit of momentum, showing you how you can go about finding and taking care of the cruft on your site.
Video transcription
Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we're chatting about cleaning out the cruft from your website. By cruft, I mean low-quality, thin, or duplicate-content pages that can cause issues even if they don't seem to be causing a problem today.
What is cruft?
If you were to, for example, launch a large number of low-quality pages, pages that Google thought were of poor quality, that users didn't interact with, you could find yourself in a seriously bad situation, and that's for a number of reasons. So Google, yes, certainly they're going to look at content on a page-by-page basis, but they're also considering things domain-wide.
So they might look at a domain and see lots of these green pages, high quality, high performing pages with unique content, exactly what you want. But then they're going to see like these pink and orange blobs of content in there, thin content pages with low engagement metrics that don't seem to perform well, duplicate content pages that don't have proper canonicalization on them yet. This is really what I'm calling cruft, kind of these two things, and many variations of them can fit inside those.
One issue cruft can certainly cause is Panda problems. Google's Panda algorithm is designed to look at a site and say, "You know what? You're tipping over the balance of what a high-quality site looks like to us. We see too many low-quality pages on the site, and therefore we're not just going to hurt the ranking ability of the low-quality pages, we're going to hurt the whole site." Very problematic, really challenging, and many folks who've encountered Panda issues over time have seen this.
There are also other, probably not directly Panda-related things, like site-wide algorithmic looks at engagement and quality. So, for example, there was a recent analysis of the Phantom II update that Google did, which hasn't really been formalized very much and Google hasn't said anything about. But one of the things that analysis looked at was the engagement of pages on the sites that got hurt versus the engagement of pages on the sites that benefited, and you saw a clear pattern: engagement on sites that benefited tended to be higher; on those that were hurt, it tended to be lower. So again, it could be not just Panda but other things that will hurt you here.
It can waste crawl bandwidth, which sucks. Especially if you have a large or complex site, if the engine has to go crawl a bunch of pages that are cruft, that potentially means less crawl bandwidth and less frequent crawling of your good pages.
It can also hurt from a user perspective. User happiness may be lowered, and that could mean a hit to your brand perception. It could also drive down better converting pages. It's not always the case that Google is perfect about this. They could see some of these duplicate content, some of these thin content pages, poorly performing pages and still rank them ahead of the page you wish ranked there, the high quality one that has good conversion, good engagement, and that sucks just for your conversion funnel.
So all sorts of problems here, which is why we want to try and proactively clean out the cruft. This is part of the SEO auditing process. If you look at a site audit document, if you look at site auditing software, or step-by-step how-to's, like the one from Annie that we use here at Moz, you will see this problem addressed.
How do I identify what's cruft on my site(s)?
So let's talk about some ways to proactively identify cruft and then some tips for what we should do afterwards.
Filter that cruft away!
One of those ways for sure that a lot of folks use is Google Analytics or Omniture or Webtrends, whatever your analytics system is. What you're trying to design there is a cruft filter. So I got my little filter. I keep all my good pages inside, and I filter out the low quality ones.
What I can use is one of two things. First, a threshold for bounce rate, time on site, or pages per visit; any kind of engagement metric that I like can serve as a potential filter. I could also do some sort of percentage, meaning in scenario one I basically say, "Hey, the threshold is anything with a bounce rate higher than 90%; I want my cruft filter to show me what's going on there." I'd create that filter inside GA or inside Omniture. I'd look at all the pages that match that criteria, and then I'd try and see what was wrong with them and fix those up.
The second one is basically I say, "Hey, here's the average time on site, here's the median time on site, here's the average bounce rate, median bounce rate, average pages per visit, median, great. Now take me 50% below that or one standard deviation below that. Now show me all that stuff, filter that out."
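As a rough illustration of that second approach, here's a minimal Python sketch. It assumes you've exported a page-level report from your analytics tool to CSV; the file name, column names (page, bounce_rate, avg_time_on_page), and thresholds are all assumptions you'd adapt to your own export.

```python
import csv
import statistics

def find_cruft(report_path, bounce_threshold=0.90):
    """Flag pages whose engagement looks cruft-like, from a CSV analytics export."""
    with open(report_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # expects columns: page, bounce_rate, avg_time_on_page

    # bounce_rate is expected as a decimal fraction, e.g. 0.93 for 93%
    times = [float(r["avg_time_on_page"]) for r in rows]
    time_floor = statistics.mean(times) - statistics.stdev(times)  # one std dev below the average

    flagged = []
    for r in rows:
        high_bounce = float(r["bounce_rate"]) >= bounce_threshold
        low_time = float(r["avg_time_on_page"]) < time_floor
        if high_bounce or low_time:
            flagged.append((r["page"], r["bounce_rate"], r["avg_time_on_page"]))
    return flagged

if __name__ == "__main__":
    for page, bounce, time_on_page in find_cruft("pages_report.csv"):
        print(f"{page}\tbounce={bounce}\ttime={time_on_page}")
```

The same idea works with median instead of mean, or with a flat "50% below average" cutoff, whichever matches the filter you'd build in GA or Omniture.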
This process is going to capture thin and low quality pages, the ones I've been showing you in pink. It's not going to catch the orange ones. Duplicate content pages are likely to perform very similarly to the thing that they are a duplicate of. So this process is helpful for one of those, not so helpful for other ones.
Sort that cruft!
For that process, you might want to use something like Screaming Frog or OnPage.org, which is a great tool, or Moz Analytics, comes from some company I've heard of.
Basically, in this case, you've got a cruft sorter that is essentially filtering on items you can identify in things like the URL string, title elements that match, or content that matches, those kinds of things, and so you might use a duplicate content filter. Most of these pieces of software already have a default setting. In some of them you can change that. I think OnPage.org and Screaming Frog both let you change the duplicate content filter. Moz Analytics not so much, and the same goes for Google Webmaster Tools, now Search Console, which I'll talk about in a sec.
So I might say like, "Hey, identify anything that's more than 80% duplicate content." Or if I know that I have a site with a lot of pages that have only a few images and a little bit of text, but a lot of navigation and HTML on them, well, maybe I'd turn that up to 90% or even 95% depending.
I can also use some rules to identify known duplicate content violators. So, for example, maybe I've identified that everything with a query string like "?ref=bounce" or "?ref=partner" is a duplicate. Well, okay, now I just need to filter for that particular URL string. Or I could look for titles. So if I know that, for example, one of my pages has been heavily duplicated throughout the site, or a certain type has, I can look for all the titles containing those and then filter out the dupes.
I can also do this for content length. Many folks will look at content length and say, "Hey, if there's a page with fewer than 50 unique words on it in my blog, show that to me. I want to figure out why that is, and then I might want to do some work on those pages."
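If you'd rather script those checks than rely on a tool's defaults, here's a hedged Python sketch along the same lines. It assumes a crawl export (for example, from Screaming Frog) saved as CSV with hypothetical column names "Address", "Title 1", and "Word Count"; the "?ref=" patterns and the 50-word floor are just example rules, not anything prescribed in the video.

```python
import csv
from collections import defaultdict

SUSPECT_PARAMS = ("?ref=", "?sessionid=")  # example URL patterns known to spawn duplicates
MIN_WORDS = 50                              # example thin-content floor

def sort_cruft(crawl_csv):
    """Group likely duplicates by title and flag thin or parameter-driven URLs."""
    by_title = defaultdict(list)
    thin, param_dupes = [], []

    with open(crawl_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # expects columns: Address, Title 1, Word Count
            url, title = row["Address"], row["Title 1"].strip()
            by_title[title].append(url)
            if any(p in url for p in SUSPECT_PARAMS):
                param_dupes.append(url)
            if int(row["Word Count"] or 0) < MIN_WORDS:
                thin.append(url)

    dupe_titles = {t: urls for t, urls in by_title.items() if len(urls) > 1}
    return dupe_titles, param_dupes, thin
```

Grouping by title is a crude stand-in for a real similarity threshold like the 80-95% settings mentioned above, but it surfaces the most obvious clusters quickly.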
Ask the SERP providers (cautiously)
Then the last one that we can do for this identification process is Google and Bing Webmaster Tools/Search Console. They have existing filters and features that aren't very malleable. We can't do a whole lot with them, but they will show you potential site crawl issues, broken pages, and sometimes dupe content. They're not going to catch everything though. Part of this process is to proactively find things before Google and Bing find them and start considering them a problem on our site. So we may want to do some of this work before we go, "Oh, let's just shove an XML sitemap at Google and let them crawl everything, and then they'll tell us what's broken." A little risky.
Additional tips, tricks, and robots
A couple of additional tips: analytics stats, like the ones from GA or Omniture or Webtrends, can totally mislead you, especially for pages with very few visits, where you just don't have enough of a sample set to know how they're performing, or for pages the engines haven't indexed yet. So if something hasn't been indexed or it just isn't getting search traffic, it might show you misleading metrics about how users are engaging with it that could bias you in ways that you don't want to be biased. So be aware of that. You can control for it generally by looking at other stats or by using these other methods.
When you're doing this, the first thing you should do is, any time you identify cruft, remove it from your XML sitemaps. That's just good hygiene, good practice. Oftentimes that alone is enough of a preventative measure to keep you from getting hurt here.
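As a small illustration of that hygiene step, here's a hedged Python sketch that rewrites a standard XML sitemap with a list of identified cruft URLs removed. The file names (sitemap.xml, cruft_urls.txt) are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def strip_cruft_from_sitemap(sitemap_in, cruft_file, sitemap_out):
    """Write a copy of the sitemap without the URLs listed in cruft_file."""
    with open(cruft_file, encoding="utf-8") as f:
        cruft = {line.strip() for line in f if line.strip()}

    ET.register_namespace("", NS)
    tree = ET.parse(sitemap_in)
    root = tree.getroot()

    for url_el in list(root.findall(f"{{{NS}}}url")):
        loc = url_el.find(f"{{{NS}}}loc")
        if loc is not None and loc.text and loc.text.strip() in cruft:
            root.remove(url_el)  # drop cruft entries from the sitemap

    tree.write(sitemap_out, xml_declaration=True, encoding="utf-8")

strip_cruft_from_sitemap("sitemap.xml", "cruft_urls.txt", "sitemap.clean.xml")
```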
However, there's no one-size-fits-all methodology beyond "don't include it in your XML sitemap." If it's a duplicate, you want to canonicalize it. I don't necessarily want to delete all these pages. Maybe I want to delete some of them, but I need to be careful about that. Maybe they're printer-friendly pages. Maybe they're pages that have a specific format, a PDF version instead of an HTML version. Whatever it is, you want to identify those and probably canonicalize.
Is it useful to no one? Like literally, absolutely no one. You don't want engines visiting it. You don't want people visiting it. There's no channel you care about that sends traffic to that page. Well, you have two options: 301 it, if it's already ranking for something or it's on the topic of something, and send that traffic to the high-quality page you wish it was going to; or you can completely 404 it. Of course, if you're having serious trouble or you need to remove it from the engines ASAP, you can use a 410 (gone) to permanently delete it. Just be careful with that.
Is it useful to some visitors, but not search engines? Like you don't want searchers to find it in the engines, but if somebody goes and is paging through a bunch of pages and that kind of thing, okay, great. I can use noindex, follow for that in the meta robots tag of the page.
If there's no reason bots should access it at all, like you don't care about them following the links on it (this is a very rare use case), you can use the robots.txt file to block crawlers from visiting it. There can be certain types of internal content that maybe you don't want bots even trying to access, like a huge internal file system that particular kinds of your visitors might want to get access to but nobody else. Just be aware the URL can still get into the engines if it's blocked in robots.txt; it just won't show any description. They'll say, "We are not showing a site description for this page because it's blocked by robots."
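To sanity-check which of those treatments each flagged URL actually ended up with (redirect, gone, noindex, canonical, or robots.txt block), here's a rough Python sketch using the requests library. The example site and URL list are hypothetical, and the regexes are a simplification; a real audit tool would parse the HTML properly.

```python
import re
import urllib.robotparser
import requests

def audit_url(url, robots):
    """Report the cleanup treatment a URL appears to have: status code, canonical, noindex, robots block."""
    findings = {"blocked_by_robots": not robots.can_fetch("*", url)}
    resp = requests.get(url, allow_redirects=False, timeout=10)
    findings["status"] = resp.status_code  # 301/404/410 show up here

    if resp.status_code == 200:
        html = resp.text
        canonical = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)', html, re.I)
        findings["canonical"] = canonical.group(1) if canonical else None
        findings["noindex"] = bool(re.search(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', html, re.I))
    return findings

if __name__ == "__main__":
    site = "https://www.example.com"  # hypothetical site
    robots = urllib.robotparser.RobotFileParser(site + "/robots.txt")
    robots.read()
    for url in [site + "/old-page", site + "/print/widget"]:  # hypothetical cruft URLs
        print(url, audit_url(url, robots))
```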
If the page is almost good, like it's on the borderline between pink and green here, well, just make it good. Fix it up. Make that page a winner, get it back in the engines, make sure it's performing well, find all the pages that have those problems, and fix them up, or consider recreating them and then 301'ing them over if you want to do that.
With this process, hopefully you can prevent yourself from getting hit by the potential penalties, or being algorithmically filtered, or just being identified as not that great a website. You want Google to consider your site as high quality as they possibly can. You want the same for your visitors, and this process can really help you do that.
Looking forward to the comments, and we'll see you again next week for another edition of Whiteboard Friday. Take care.
Another tip: when using Google Analytics to find cruft on your website, keep the intention of the pages in mind. If you have a page that gives a quick answer to a question, it can have a low time on page, a low pages per visit, and a high bounce rate. This doesn't mean it's a low-quality page. Don't only look at the data; also try to understand it.
Is that enough though? We have a number of pages on our site that are high quality informational articles. They rank very well but they have a very high bounce rate (80-90%). They don't really give much to the rest of the site even from the users who don't immediately bounce.
So should we continue to host these pages or should we focus on other parts of our site?
I don't think these pages will harm your rankings. Informational pages may not have good engagement metrics, but they can attract links. Besides that, if you provide valuable information you can position yourself as an expert in your industry. People who visit your pages may not become customers right away, but they may in the future. I would keep the pages, but not put your main focus on optimizing them.
Another way to reach them is with remarketing. Of course, this depends on the site you have and the pages.
I am not really concerned about harming rankings (there are many other more pressing ranking issues!), and we are already experts in our industry. It's more the cost/time to continually produce and maintain these articles and the lack of ROI on them.
I am relatively sure that it isn't a case of converting users, they are finding exactly what they want and then continuing with whatever their overall task is.
At the moment we don't have good visibility on whether people are returning to our site later after reading these articles, so that's something I am trying to shed some light on, as that would change these pages to just being a step in a conversion funnel. Please share any insight you have on tracking this, as it's been a big challenge.
Not sure if this works, but you could give it a try. Let's say you measure sent contact forms as conversions and have added that as a goal in Analytics. Your information pages are in a subfolder of the site, like /informationpages/.
In this case you could make a segment with the condition Page contains "/informationpages/". Put the filter on Include Users instead of Sessions. Then go to Conversions > Goals > Overview. Now you should only see conversions from people who have read an article in the informationpages subfolder.
Let me know if this works for you. If anyone else has any thoughts on this, please share.
Just a few thoughts I had for Richard, as I have been in a similar situation. Consider gating the content: if visitors are willing to put in an email to get to your content, chances are it is valuable and you should continue producing it, and you are also building your email marketing list. Consult a UX designer; it could be that your pages are formatted in a way that discourages engagement and click-through. Play around with the size of your content pieces. For example, let's say you currently do one content piece a month. Experiment with doing one every other month and putting two months' budget behind one content piece. Do that a few times and see if the bigger-budget pieces get more engagement. If that doesn't work, do the opposite and use one month's budget to do two pieces. Lastly, use communities like Reddit, Google+, or LinkedIn to get content ideas. If people tell you what they want before you make it, that helps A LOT.
@Kevin - Thanks, that does seem to be giving me some useful data to work with. It's pretty heavily sampled, but there isn't much of a workaround for that without paying a huge amount for premium analytics.
Hi Richard
I often use adjusted bounce rate to measure the performance of in-depth informational content pages. It's easy to set up with Tag Manager, which fires an event after a set timeframe. I'm pretty strict and set the timeframe to one minute. If someone spends a minute on the page, an event is fired and the visit is no longer counted as a bounce. I can create reports about pages generating 1-minute events, so it's easy to see the content's performance.
Hope it helps.
Thanks Gyorgy, we don't have Tag Manager implemented on our existing site, but we have a new site in the works, so I will make sure it's all added.
I've got articles with a high bounce rate too, and although the bounces are numerous, we'd still consider those customers fulfilled, so long as they digested the information they were looking for and left happy. To deal with this, you can define "bounce" as something other than a single-page session and factor in the average time on page. So pages with an 80% bounce rate but 4 minutes on average on the page aren't cruft.
Kevin, that's right. I think Barry Schwartz has a lot of posts with fewer words on seroundtable, but each one often is a solution or an answer. He has some trouble with Google caused by the mass of content, I am sure. What I wanted to say is: if you have those pages with a small but great answer, you should show visitors something else they are interested in and will click on. Low visit time doesn't equal low quality - that's true :)
Good point!
I agree with you. We shouldn't just follow the tools blindly, as sometimes an FAQ, terms and conditions, or privacy policy page is also important for buyers.
Google is becoming a useless virus.
Its algorithms and support are so out of touch.
Other than for very large organizations or those who pay per click... Google is fast becoming outdated.
If it doesn't get better at reducing the amount of time folks spend trying to use Google, and optimize for Google, instead of just making their sites, running their searches, and running their businesses and their lives... Google will soon be replaced.
As an example, while I liked Android, because it wasn't Apple... or Microsoft,
Google is trying fast to become Apple, or Microsoft...
It is working hard to spam its users, who lack any other choice...
As soon as an alternative appears, people will run from Google as they have from Apple and Microsoft.
Thing is... Apple at least is a consumer-based company and makes products which kinda work.
Microsoft can't sell a new operating system... Because Microsoft is the Google of operating systems...
Several of my last few projects have been full website overhauls & call me weird, but it’s been fun getting rid of sometimes hundreds of old crappy “SEO pages.” Part of me wants to track down the old SEOs who created individual landing pages for variations of the same exact keyword idea & throw something at them. ;)
One brand in particular has actually seen a drop in the overall volume of ranking KWs (according to SEMrush), fewer indexed pages (a good thing in this case!), and a slight drop in organic traffic after a major overhaul that included a mini rebrand, responsive update, & going site-wide https. But since launching in May, everything else has improved - quality/usefulness of content, engagement (bounce rates, time on site, etc), conversion rates, and revenue. Relevant rankings are growing slow & steady (tracked via Moz Analytics).
What’s a bit crazy is that at one point in the planning process for each of these recent projects, I remember being hesitant about cutting/redirecting so many of these “SEO pages” that were potentially ranking and/or ‘contributing to the overall context & theme of the site,' especially b/c the sites didn’t have any obvious penalties. Thank goodness the fear of potentially throwing off bots didn’t prevent us from completely cleaning up each site's thin & duplicate content pages. The good news is that more and more, doing what's right for human visitors is also what's right for search bots, and vice versa. Great WBF, Rand & Team!
Yeah - do be aware that you can perform worse in SEMRush but actually be doing better from a Google traffic perspective, often because of the mid and long-tail of keyword demand (SEMRush simply can't track every keyword in existence so if you gain traffic from thousands of long-tail terms but lose rankings for a few head terms, it will look bad in SEMRush but be good for your search traffic in reality).
There's something strangely (sadly?) satisfying about this process of cleaning up and consolidating old and spammy pages! Maybe it's the feeling of a job well done or "look at what I've created!"
Being in an agency environment, if a lot of changes are needed we tend to roll these out incrementally to mitigate some of that risk. Add new, quality content to some pages while removing and consolidating others. As you mentioned, cutting a large volume of pages can give you a temporary drop and no matter how pro-active you are at managing client expectations, a month of lower rankings and engagement is going to put them on edge at the very least.
Of course, if they don't currently rank for anything, there's no risk to mitigate.
How long did your ranking/traffic dip last after you applied the changes, Sheena?
Additional alert
Remember that if you have a multi-country website, with the different country versions in subfolders, and you run tools as they are set up by default, you will see a lot of URLs flagged as duplicates when, for instance, you have an English version for the USA and another for the UK.
Because of the peculiarities of international SEO, those pages are not to be considered true duplicates, so you should not go for massive canonicalization from one version to another (the same is true if the versions are on different subdomains or domain names). In fact, "cross-canonicalizing" between versions will literally screw up your international SEO efforts (e.g., the USA URLs popping up in Google.co.uk because you have canonicalized the latter to the former).
Instead, check whether or not you correctly used the hreflang annotation. If you have already implemented it, you are fine and Google won't consider the USA and UK URLs as duplicates in terms of Panda or any other classic issue.
If you have not implemented the hreflang, then you should run and implement it.
For this reason, it is always better to tweak tools like Screaming Frog a little by adding custom fields, so that they can report whether a URL has the hreflang annotation implemented or not.
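If your crawler can't be customized, a quick Python check along these lines can do the same job: it fetches each URL and reports whether any rel="alternate" hreflang annotations appear in the source. The URL list is hypothetical and the regex is a simplification of real HTML parsing.

```python
import re
import requests

HREFLANG_RE = re.compile(
    r'<link[^>]+rel=["\']alternate["\'][^>]+hreflang=["\']([^"\']+)["\']', re.I
)

def report_hreflang(urls):
    """Print the hreflang values declared on each URL, or a warning if none are found."""
    for url in urls:
        html = requests.get(url, timeout=10).text
        langs = HREFLANG_RE.findall(html)
        if langs:
            print(f"{url}: hreflang -> {', '.join(langs)}")
        else:
            print(f"{url}: WARNING - no hreflang annotations found")

report_hreflang([
    "https://www.example.com/en-us/",  # hypothetical US version
    "https://www.example.com/en-gb/",  # hypothetical UK version
])
```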
Finally, always follow these rules when it comes to hreflang and canonical:
P.S. for Rand:
It would be cool if a new version of the Moz Analytics crawler would check and list whether any rel="alternate" (hreflang, but also the ones related to mobile) is present in the <head> of a page.
I'm a bit curious here, Gianluca: can we actually use the same content on websites for different regions using hreflang and rel="alternate"? I thought it was still counted as duplicate. Can you please illuminate further on this issue?
I'll let Google speak for itself:
"Some example scenarios where rel="alternate" hreflang="x" is recommended:
A classic example is product pages, which may be identical in everything but the price, currency, and customer care contacts (e.g., email address, phone number...).
Thanks Gianluca - I'll give the heads up to Jon White and hopefully he can get that into the MA crawler in the future.
Remember that you don't have to fix all your pages at once. If you are a busy person just start with 1-2 pages a week. Before you know it, you will have tackled a big chunk of your cruft.
Agreed! Far better to make a habit of doing a little at a time, and to update regularly.
Super handy stuff. First time I've heard the word "cruft" though.
Thanks so much for doing a WBF on this topic! I'm telling my clients about this all the time, but most don't listen until they start getting smacked around by Google. Now I can use this video to help people get more people to listen to me the first time. ;)
Additional alert
Some sites (and pages) have a naturally high bounce rate due to their nature. For example, software companies. You're searching the internet for "software for X" and you land on the product description for this software. You go there, scroll up and down, find the information, and quickly decide that this IS the software you need. Then you naturally click on "Download" and grab the .EXE/.ZIP (or another software distribution format - .DMG, a link to the App Store, Microsoft Marketplace, Google Play, etc.).
These pages come with a high bounce rate, low pages/session, and low average session duration. An example from one of my product pages - bounce 87.50%, pages/session 1.12, and session duration 00:45. The trick is to count how many visitors are clicking within the page - that can be "view gallery", "play video", and most importantly "Download". Once you set up goals (with events or a destination page), then you can see the real conversion ratio.
Another trick is to use Rob Flaherty's awesome plugins ScrollDepth and Riveted for tracking user interaction. The first shows you how far users scroll on individual pages, counted as events. The second is for tracking actual user time on page - clicking, scrolling, and some keyboard activity.
So not every page with a high bounce rate, low pages/session, or low avg. session duration means "low quality content". Sometimes they give users answers so fast that it fools analytics tracking.
100% agree - not every page with high bounce and low browse rate is low quality. These metrics can be a good filter for ID'ing those pages that are low quality, but a manual review is required.
Also be aware of parameters such as paging and filters; this trash often creates many pages where it's difficult to change metadata and content.
Yup! For pagination, the rel=prev/next markup can be helpful and for filters that don't substantively change page content, rel=canonical is usually the way to go.
I really agree with Alberto; so many websites create garbage pages with smelly content.
Ahrefs is saying that my page has a 73% health score. Is that good or bad for ranking? Should I start cleaning more?
Ahrefs doesn't typically track the metrics you'd want to look at for this type of exercise. I'd check your web analytics.
Check what the other 27% of errors on your website are. If they are harsher ones (wrong canonical tags, 404s, bad redirects, duplicate metadata, etc.), then you have problems, which is bad not only for search engines but for USERS too!
@Rand Whiteboard Friday always brings something unique. In fact, last week I read an article about user engagement and its importance in getting stability in rankings. I believe user engagement is one of the most important ranking factors. As for cruft, it simply signifies that if your web page has a low user engagement rate, that will be counted as a negative factor.
Low User Engagement Rate = Cruft
"Inherently, not all pages are created equal, your website will always have pages that perform better than others. Could cleaning the cruft be the start of a slippery slope where you're now deleting pages that actually perform well, but they are the 'worst' ones on your site? I guess what I'm asking is, is there a certain threshold that helps us determine when the cruft is successfully gone from our site and we are only left with quality pages?"
That was the note I took about 3 minutes into your presentation. You then proceeded to thoroughly answer my questions. Awesome!
How is quality of a page on any site measured according to user behaviour and engagement? Is it in comparison to other pages on the site that do better in those metrics?
I have a few pages on my site that experience far better user behaviour metrics and engagement than the remaining 90% of the pages. These are extensive reviews with 500+ comments and tons of users spending a lot of time on them. According to these metrics, there is a huge discrepancy between them and all the others, making almost everything else on my site cruft.
Hello Rand,
Some good points about website safety, and some basic information about duplication issues.
I think if anyone visits the website and finds whatever he/she is looking for, then the information on the page is fine. We don't need to count the words of the content, because sometimes thin content helps better than long content; people don't like to read long content. So, web page content should carry only the important and required information that could help visitors.
I really don't know why Panda hits thin content. I do believe that if the given information is unique and helpful, then it should be considered good, not thin or bad.
Thanks
Hi,
In the video Rand mentions that you can use Screaming Frog to filter out pages with duplicate content over a given threshold (i.e., 80%)... does anyone know how to do this in Screaming Frog? I can't seem to work it out :(
Many thanks in advance, Lee.
I think it's a good post and also very interesting. Congratulations. I think you always have to leave that opening to make clear what our goal is and what to attack. But... what is the best way?
Great WBF. I joined Moz last month and started watching all your WBFs. Some of your WBF videos don't have a zoom option, like https://moz.com/blog/hacking-keyword-targeting-whiteboard-friday. These posts really helped me a lot and broadened my horizons. I feel like I am extremely late in following you :) A great speaker, creative, stylish man.
Thanks for this article! Quite useful!
Great tips Rand - I've really been enjoying the recent WBFs after a bit of time without keeping up!
Cruft has been high on our list recently and we've had clients make significant gains just from removing some of their lower quality pages from the index. On small sites which don't have a lot of pages, we've found just having a few thin/duplicate pages linked in the navigation can have a really negative impact. Since removing them, headline phrases have seen a real boost.
We were surprised to see such a dramatic impact, despite what we know about how severe Panda can be. It's scary for small businesses who know nothing about SEO though and would otherwise not have a clue why their website was losing performance. As far as they were concerned, these were useful pages for users - maybe not in the Panda world, but in real life.
Anyway, great to see the topic featured here and I'm sure many folks like us have found it very useful!
Really great, helpful Whiteboard Friday, as usual! As others have mentioned, it can be scary cutting pages/content that may well be pulling people into your site. But it is also important to remember that visitor numbers aren't the most important metric, as bounces can be very bad. Good content that is important to your visitors is the best way to go, and as others have said, the changes to the Google algorithm support that idea. All search engines want the pages they supply to be the right pages; that is what their business is built on. Great stuff! I gotta go tidy a bit.
One thing that I want to share with you: if you are going to remove bad pages from your website, you need to be very careful before deleting such pages.
Nice article, Rand. Low quality content is always an issue for Google. I guess the easiest way to remove cruft is through robots.txt.
Hi Rand,
Another great Whiteboard Friday
Using Google Analytics Behavior > Site Content > All Pages also helps to identify the best pages and the thin pages of the site.
BTW, I am using Screaming Frog and Moz Analytics; both work awesomely for me.
Irfan
Hi Rand! But what about the posts I have deleted from my site that are already indexed in Google? I used a WordPress theme's dummy data, and before I deleted it, Google indexed all the dummy pages and posts. Now I have deleted all the dummy data from my site, but Google still shows the dummy post data two months after I cleared it. What do I do now?
Oh yes. Webpages with thin content need to go. Why have pages on the site that are not helping anyone? A waste of bandwidth. We recently lost rankings for a few core local keywords that were bringing us traffic. We were moving to a new WordPress template for our site.
When we sat down to fix it, we noticed a lot of duplicate images across multiple pages (not sure if that impacts rankings), internal redirects, 404s, and a few links from really bad sites.
Often webmasters and SEOs overlook on-page factors and only pay attention to off-page SEO. We learned this the hard way.
Great post Rand. As always.
I really enjoyed this article, thanks for another informative white board. Just say no to CRUFT!
It's always difficult for me to read such long posts :( I want to know it all, but it's way too, too, toooooooo long!
Hi Rand,
Regarding the XML sitemap advice, what would you suggest for classifieds sites (job listings, real estate, etc.) to handle it?
I mean, should I keep my sitemap updated all the time and remove the listings that have already expired? Something like Active-Job-Listing.xml that updates daily or weekly?
I'm just trying to deal with the high number of errors Webmaster Tools keeps showing (over 100k now).
Any advice would be much appreciated.
Very nice article, Rand! But: I can't really agree with your suggestion to 404 a poorly performing page. Wouldn't status code 410 be better, as it indicates that the page is dead and will never come back to life? A 404 can also be returned when the server can't deliver a page due to temporary internal issues.
Have a nice weekend everybody.
Excellent insights. I will be sure to check the cruft on my own site.
What percentage of a page is classed as duplicate? A paragraph taken from another page, or a whole page with the exact same content on it? x
Any backlink checker tool will be good here to check backlinks. 301 redirects and a custom 404 page will be good options to reduce low-quality issues.
Hey Rand, Thanks for the another extraordinary WBF.
That's really harsh: just because of a few low quality pages, your entire website pays the cost. So you need to clear the cruft from your website, and I'll do that ASAP. Actually, I am handling a couple of websites, and both are huge, but there is some thin content on different pages simply because it's easier to understand; sometimes people don't like to read at length, they want everything short and sweet. So it doesn't mean those are low quality pages. The quality of that content is totally unique and fresh.
Anyway, thank you again; eagerly waiting for the next WBF.
Great WBF Rand. Getting rid of the garbage is just good practice regardless of rankings. A useful web is a better web. IMO, 75% of all pages could be deleted because they serve absolutely no purpose. Some may disagree with me on that but there are so many sites that could easily be 5 pages and done. Your objective is to get people to contact you or buy now. That's it - unless you are in the advertising business. How many advertising websites are there? How many of those are of any use?
Hi
Thank you very much. Your blog has helped me a lot to learn new things.
I am new to this industry and have some doubts about a few topics.
How does Google treat coupon and deal sites? They don't have a 2,000-word post for a single coupon.
What steps should be taken to save a site like that?
I wouldn't worry too much about hitting a precise words/page limit. Some pieces of content (like coupons) only need to provide a small amount of content to serve visitors well.
Thank you very much... Appreciate your reply.
Is it possible that a site penalized algorithmically does not recover its positions even a year after cleaning up and submitting a disavow file?
Many thanks, especially to the god on earth, Mr. Rand!
Yeah, it's possible, but often those unrecoverable penalties have more to do with external links and spam classification than with on-page issues. I'd check Google Search Console and also keep a close eye on the backlinks if you feel like you've already improved the on-page issues.
Rand,
Say I have a page that is now a 404 due to a URL change (underscore to a hyphen, addition of a word, etc.), or it's just been deleted. Google is seeing the 404, and I don't want it to. This happened recently with a CSS path that was active, but one part of it was capitalized so it didn't recognize it, I guess (app-utils/css/Print.aspx instead of print.aspx). Sometimes when I initially see that in Search Console or in any software, I put that path into the robots.txt file. While watching this I am now getting the impression that is not best practice. What would be the best practice for this, or any scenario where you see Google finding a page that is a 404?
Thank you SO much.
-Ray
404s aren't inherently bad for Google to find. Unless you're actively linking to them in your pages (or someone else is), you can ignore 404s that Google happens to stumble across because they've attempted some weird combination of letters/capitalization on your site. We see this random surfing/guessing behavior from Google's crawler sometimes and a 404 lets them know those pages aren't active and they shouldn't further attempt to crawl in that fashion.
Thank you!
Hello Rand!
Google's penalties are our worst nightmare. One day you're up, and the next day, without knowing why, your page doesn't show up in the search results. Lest Panda hit us, we must look to eliminate low-quality content, duplicates, etc., which is sometimes complicated.
Luckily, today we have some tools that can help us solve these problems without tearing our hair out.
Thanks for your article and advice.
Great tips as always Rand. I have begun to use the Google Analytics & Search Console approach a lot more extensively to find poor performing pages by their engagement metrics and improve them. It is something I would recommend to any webmaster, to go page by page (on flagged pages, not all!) and analyse how the user is engaging, or not engaging. I will have to take a closer look though for that cruft now... another task you have assigned me!
Excellent WBF, Rand! Improving the utility of each and every page on your website by making it informative and authoritative can be one way to boost engagement and reduce bounce rates. SEOs should periodically audit their websites to determine which pages have lost their relevance or need a significant revamp in order to match visitors' expectations. This is especially true for websites that have had a huge number of pages added over the past several years. Similarly, e-commerce sites with a large number of product pages can do well by getting rid of pages that are deprecated or no longer relevant to customers.
Hello Rand,
This is a really good article, as well as a helpful one. But most users and owners mainly want the website to look good: look and feel, good information about products/services, delivery, etc.
On a good eCommerce website, most customers look at the product descriptions, services, user-friendliness, look and feel, where to use the products with full information, offers, discounts, shipping, bla bla bla...
I think pages without content and thin content are not that important; maybe I'm wrong, but it's true.
Thanks again for a lovely article and for reminding us.
:)
Spending a very short time on a page with thin content doesn't mean Panda will come there with a penalty. Actual readers may stay only 1 to 3 seconds on the page displaying the result, but if the page fulfills their query, Google can understand that search query through its algorithms, so don't be worried. I always try to suggest writing the type of content that fits the actual domain you have. Forget about the penalty and just think about uniqueness, informative content, good optimization, and what users or searchers are actually looking for; that might be beneficial.
And in the end, I would like to say to Rand Fishkin: I love the way you explain things in your videos.
Hmm... I'm not sure I'd agree 1-3 seconds is enough time for anyone to truly get the information they need. Maybe in the 5-10 second range is where it's possible that someone could get an answer and depart, but if it's lower than that, I'd try and confirm that folks aren't just clicking back to the search results and choosing another webpage (which, over time, can seriously hurt your rankings).
Hi Rand, awesome WBF. Can you please provide some information, in a Moz post or a WBF, on how to filter the cruft content pages or low quality pages in Google Analytics or Search Console (GWT)? If you provide it, it will be helpful for me and, I think, for others too. Thanks for a very good and energetic WBF :)
Broken links should also be kept on the daily watch list.
Such an unusual but important topic, as many professionals miss these points in their routine work. Great WBF edition.
Am I the only person that has never heard of the term 'cruft' before?! Maybe it's not an Aussie thing? Who knows...
Fantastic whiteboard though - your best for a while in my opinion Rand!
Well done Rand, I totally agree with cleaning up duplicate mess on the site. Thank you for sharing and reminding us!
Hi Rand! Thanks for sharing another WBF. Do you recommend any tool for listing all the cruft on a website? It's very time-consuming when you are managing 4-5 websites with around 1,000 pages each.
Thanks for the great article. Actually, I am in search of a complete marketing strategy for YouTube. Also, I want to know if there is any tool that can analyze the marketing strategy of a competitor's YouTube channel. Kindly reply.
Some great points - I like how you suggest using other tools bar Search Console as it doesn't catch everything. Also, beating search engines to the mark on discovering errors is the grind that is SEO :)
Awesome, stuff! Cruft causes headaches and this post definitely clears things up!
I just picked the top 100 best-performing pages by click metrics on my 500-page site and then fixed those up where needed; I let the rest go. Lots of cruft. It's also easier to motivate myself to work on 100 pages as a sole small-biz webmaster. That seems manageable to me.
What is the character count threshold for duplicate content? And is there a way to block only certain blocks of content on a page? What do you do when you have certain chunks of text, a bio for example, that are buried within other good content on multiple pages?
Thanks for sharing this important information about cruft. Duplication is one of the main problems for webmasters, but there aren't many tools for it except www.copyscape.com. I'll share my experience: I have also seen the duplication problem on a site. We applied canonical tags to those duplicated pages, but Google was still crawling them and showing them in the search results. It was strange to me.
Awesome tips, Rand. Thanks for them.
I need to check my website's errors and how to fix them. Can I get the solution here?
Some tools are useful for analyzing and cleaning your site, but the most important thing is to be aware of what links mean to your site. That's the SEO analyst's job.
Another amazing, up-to-date SEO article.