Google's Indexation Cap

Comments 57

Please keep your comments TAGFEE by following the community etiquette.

E-mail me when new comments are posted

Sort by:

Comments are closed on posts more than 30 days old. Got a burning question? Head to our Q&A section to start a new conversation.

Robert O'Brien

2009-12-01T02:20:34-08:00

This is one of the better posts I have read in a while.

I think larger sites need to do more content cleanup / analysis on a more consistent basis to really avoid this problem. I would recommend running regular reports on a URL, Date Created, Last Date Modified and Incoming Links to that URL as well as traffic numbers, etc (i.e. everything mentioned above). Then, the SEO teams for these larger sites should institute an "aging" policy where either content is archived or removed from the submitted sitemap or old URLs redirected to higher authority/similar topic URLs.

It really comes down to being a huge signal to noise ratio problem where noise are URLs without any backlinks, real visitors, etc. and the greater your site can get its signal:noise ratio, the more authority it will seem to have.

Definitely a very challenging problem because most people use a "publish and forget" policy to producing web site content.

7 0

This is one of the better posts I have read in a while. I think larger sites need to do more content cleanup / analysis on a more consistent basis to really avoid this problem. I would recommend running regular reports on a URL, Date Created, Last Date Modified and Incoming Links to that URL as well as traffic numbers, etc (i.e. everything mentioned above). Then, the SEO teams for these larger sites should institute an "aging" policy where either content is archived or removed from the submitted sitemap or old URLs redirected to higher authority/similar topic URLs. It really comes down to being a huge signal to noise ratio problem where noise are URLs without any backlinks, real visitors, etc. and the greater your site can get its signal:noise ratio, the more authority it will seem to have. Definitely a very challenging problem because most people use a "publish and forget" policy to producing web site content. 
Cancel
Eric Enge

2009-12-01T08:39:07-08:00

Brilliant post Rand! We spend a lot of time explaining this issue to clients. At the end of the day publishers need to focus on a few things to maximize indexation:

1. Quality content (across the whole site)
2. Eliminating dupe content
3. Quality links

6 0

Brilliant post Rand! We spend a lot of time explaining this issue to clients. At the end of the day publishers need to focus on a few things to maximize indexation: 1. Quality content (across the whole site) 2. Eliminating dupe content 3. Quality links
Cancel
Paul Allen

2009-12-01T03:59:01-08:00

MacSeth, the speed of the website is a good point... Efficient code and layout is important, but on huge sites, maybe this is an even more important metric, simply because a large site is more time consuming to index and more intensive on Google's hardware.Even more important to get deep level links to big sites to "confirm" the importance and validity of the page and let Google know the page is worth indexing!

4 0

MacSeth, the speed of the website is a good point... Efficient code and layout is important, but on huge sites, maybe this is an even more important metric, simply because a large site is more time consuming to index and more intensive on Google's hardware.Even more important to get deep level links to big sites to "confirm" the importance and validity of the page and let Google know the page is worth indexing!
Cancel
- 3ring
 
 2009-12-01T08:06:57-08:00
 
 I agree with @porly and @MacSeth...
 
 Our large sites have seen this "deindexing" effect within the last year or so, but they were built 3 years ago not accounting for page speed. Amongst the other important factors Rand mentioned, it seems it would be prudent for anyone with "old code" to revamp the site, lose some code weight and where possible, utilize design templates that place main content at the top of the code with nav and other sidebar elements at the bottom of the code - this way at least the spiders see the important content before all the bloat. I can see how critical this is for large sites especially.
 
 Time for a clean up! Great post Rand.
 
 2 0
 
 I agree with @porly and @MacSeth... Our large sites have seen this "deindexing" effect within the last year or so, but they were built 3 years ago not accounting for page speed. Amongst the other important factors Rand mentioned, it seems it would be prudent for anyone with "old code" to revamp the site, lose some code weight and where possible, utilize design templates that place main content at the top of the code with nav and other sidebar elements at the bottom of the code - this way at least the spiders see the important content before all the bloat. I can see how critical this is for large sites especially. Time for a clean up! Great post Rand. 
 Cancel
Richard Baxter

2009-12-01T03:10:20-08:00

I love the point about: "Rate of Growth in Pages vs. Backlinks"

It's a pretty strong signal to send to a search engine when your site expands from 100,000 indexed pages to let's say... 6 million indexed pages in the space of a few weeks. You're going to attract some attention, right?

And then: Deep pages rarely receive external links

If you're working to expand a site in the way I mentioned above, I'd strongly recommend you take it easy, being sure to address areas of "weakness" in the architecture - large volumes of content with no external links.

One final point - if your development team are making changes to the site, keep an eye on your internal link strucure. If you lost a navigational element you could end up orphaning an entire content section. I've seen this happen and it can drastically reduce the length of your long tail if it affects enough pages. I've seen something similar happen, where pages were constantly being orphaned, re-linked to, and orphaned again, depending on whether a particular item was available in the database. I drew a chart of the number of keywords bring traffic to the site on a daily basis and the results clearly demonstrated instability in the long tail.

richardbaxter edited 2009-12-01T03:11:04-08:00
4 0

I love the point about: "Rate of Growth in Pages vs. Backlinks" It's a pretty strong signal to send to a search engine when your site expands from 100,000 indexed pages to let's say... 6 million indexed pages in the space of a few weeks. You're going to attract some attention, right? And then: Deep pages rarely receive external links If you're working to expand a site in the way I mentioned above, I'd strongly recommend you take it easy, being sure to address areas of "weakness" in the architecture - large volumes of content with no external links. One final point - if your development team are making changes to the site, keep an eye on your internal link strucure. If you lost a navigational element you could end up orphaning an entire content section. I've seen this happen and it can drastically reduce the length of your long tail if it affects enough pages. I've seen something similar happen, where pages were constantly being orphaned, re-linked to, and orphaned again, depending on whether a particular item was available in the database. I drew a chart of the number of keywords bring traffic to the site on a daily basis and the results clearly demonstrated instability in the long tail.
Cancel
- Dan Florin
 
 2009-12-02T07:41:52-08:00
 
 Following the indicator "Rate of Growth in Pages versus Backlinks" seems correct, but is it only about dofollow links or also nofollow?
 
 I'm asking this, as people tend to abuse of nofollow links these days.
 
 Thanks!
 
 Dan
 
 1 0
 
 Following the indicator "Rate of Growth in Pages versus Backlinks" seems correct, but is it only about dofollow links or also nofollow? I'm asking this, as people tend to abuse of nofollow links these days. Thanks! Dan 
 Cancel
Mark Jackson

2009-12-01T01:27:21-08:00

Great post - although it raises a few worrying thoughts!

Particularly useful for Wordpress style blogs where there are so many ways of accessing the same content - time for some housekeeping me thinks!

Thanks - now get some sleep!

3 0

Great post - although it raises a few worrying thoughts! Particularly useful for Wordpress style blogs where there are so many ways of accessing the same content - time for some housekeeping me thinks! Thanks - now get some sleep!
Cancel
Webranking

2009-12-01T03:58:21-08:00

Great post

I think there's another factor who can influence the "indexation cap": the amount of internal duplicate content.

IMHO, the highest is the number of internal duplicate pages, the lower the indexation cap. Having an efficient strucutre means also saturate the indexation cap with only relevant pages.

3 0

Great post I think there's another factor who can influence the "indexation cap": the amount of internal duplicate content. IMHO, the highest is the number of internal duplicate pages, the lower the indexation cap. Having an efficient strucutre means also saturate the indexation cap with only relevant pages.
Cancel
Steven van Vessum

2009-12-01T05:22:25-08:00

Great Post Rand. Doing SEO for really big sites is a whole different game. And nice dialogue with the Adwords peeps ;)

3 0

Great Post Rand. Doing SEO for really big sites is a whole different game. And nice dialogue with the Adwords peeps ;)
Cancel
Seth Rietdijk

2009-12-01T03:19:25-08:00

Great post but I think you "missed" out one important metric: the actual speed of the website (as a whole and individual pages).

I think this is/will also cause a certain drop in indexed pages as well.

Why? Big websites with a LOT of content tend to slow down quite a lot in the process... if it is not fixed the bots will not really like it and may leave before they got the chance to crawl it.

3 1

Great post but I think you "missed" out one important metric: the actual speed of the website (as a whole and individual pages). I think this is/will also cause a certain drop in indexed pages as well. Why? Big websites with a LOT of content tend to slow down quite a lot in the process... if it is not fixed the bots will not really like it and may leave before they got the chance to crawl it. 
Cancel
identity

2009-12-01T06:38:25-08:00

Absolutely, and only going to increase.

No longer is it publish or perish, but publish originally, or -- and minimize internal duplication, URL bloat, earn strong external validation -- or perish.

2 0

Absolutely, and only going to increase. No longer is it publish or perish, but publish originally, or -- and minimize internal duplication, URL bloat, earn strong external validation -- or perish. 
Cancel
SiteGround Web Hosting

2009-12-01T23:42:10-08:00

I wonder whether the removal of old and generally obsolete pages that have been left out there "not to generate 404 errors" would help "free some space" in Google's index.

2 0

I wonder whether the removal of old and generally obsolete pages that have been left out there "not to generate 404 errors" would help "free some space" in Google's index. 
Cancel
- Luke Jones
 
 2009-12-02T08:13:34-08:00
 
 It's a good point. I hope Google soon come up with a system that will crawl through and remove several types of sites:
 
 - Sites that are indexed in the first page that require you to login or even pay for the answer to your question.
 - Sites that lead to 404s, server errors or even 403 forbidden.
 - Misleading sites that don't explain your issue and are spam orientated (I know this is something Google is constantly working on, but the report system isn't exactly easy in most situations!)
 
 2 0
 
 It's a good point. I hope Google soon come up with a system that will crawl through and remove several types of sites: - Sites that are indexed in the first page that require you to login or even pay for the answer to your question. - Sites that lead to 404s, server errors or even 403 forbidden. - Misleading sites that don't explain your issue and are spam orientated (I know this is something Google is constantly working on, but the report system isn't exactly easy in most situations!)
 Cancel
 - SiteGround Web Hosting
 
 2009-12-03T05:45:57-08:00
 
 I would prefer them to develop more useful way of communication with people that goes a step further than "submit something and we may see it/do something".
 
 The percentage of low-quality and inaccurate results in SERPS is so big. I don't believe placing a cap over big sites with a lot of (mostly unique) content will solve those issues for Google.
 
 1 0
 
 I would prefer them to develop more useful way of communication with people that goes a step further than "submit something and we may see it/do something". The percentage of low-quality and inaccurate results in SERPS is so big. I don't believe placing a cap over big sites with a lot of (mostly unique) content will solve those issues for Google. 
 Cancel
 - Luke Jones
 
 2009-12-03T06:07:21-08:00
 
 Very good point there. But at least they are doing something. Google has always been a bit of a mysterious figure to most people. There's no real way of contacting them with proper results at the moment.
 
 They rely a little too much on automation.
 
 2 0
 
 Very good point there. But at least they are doing something. Google has always been a bit of a mysterious figure to most people. There's no real way of contacting them with proper results at the moment. They rely a little too much on automation.
 Cancel
Jen Wiss

2009-12-01T08:09:00-08:00

This is really interesting article - good work Rand!

I think the indexation cap is a good idea - especailly for the users as hopefully this will bring back even better quality results and finally start to remove more and more spam.

But...

One thing I have noticed again is the fact the Google is inserting breadcrumb links again (I've just googled "renault" and the second entry down has further links next to the url).

So my point is even though websites may start to loose content, website owners will start to gain further links in the SERP which has got to be a plus!

AngelDigital edited 2009-12-01T08:09:30-08:00
2 0

This is really interesting article - good work Rand! I think the indexation cap is a good idea - especailly for the users as hopefully this will bring back even better quality results and finally start to remove more and more spam. But... One thing I have noticed again is the fact the Google is inserting breadcrumb links again (I've just googled "renault" and the second entry down has further links next to the url). So my point is even though websites may start to loose content, website owners will start to gain further links in the SERP which has got to be a plus!
Cancel
Staff

Dr. Peter J. Meyers
Staff

2009-12-01T11:57:21-08:00

Great review of a question that we're definitely hearing a lot. One clarification - do you think this "cap" is a literal, fixed cap (X pages or X% of pages) or just an artifact of what I'll call "spider fatigue"? I've always thought of it as just a matter of there not being enough link-juice to go around, but it sounds like the Google rep suggested there is a literal, programmatic cap?

2 0

Great review of a question that we're definitely hearing a lot. One clarification - do you think this "cap" is a literal, fixed cap (X pages or X% of pages) or just an artifact of what I'll call "spider fatigue"? I've always thought of it as just a matter of there not being enough link-juice to go around, but it sounds like the Google rep suggested there is a literal, programmatic cap?
Cancel
- Ehren Reilly
 
 2009-12-02T11:26:49-08:00
 
 Isn't there a way to tell these apart? In one case, Googlebot has a "depth cap" that may be different for different spider entry pages. In the other case, there is a domain-wide cap.
 
 If we do what Rand says and spread link love directly from deep entry pages that may have few internal links from them (e.g., a very popular article from a couple years ago, we would expect to see the new "neighbor" pages getting indexed, and the total pages indexed would increase. However, if it is a domain-wide cap, we'd expect this strategy to results in the new "neighbor pages" getting indexed at the expense of some other pages somewhere, with the total number staying the same.
 
 In other words, reorganizing your internal links would not improve TrustRank or any of the factors that might boost your fixed cap, but it would allow you to get the most out of a spider that that that had a "depth cap".
 
 1 0
 
 Isn't there a way to tell these apart? In one case, Googlebot has a "depth cap" that may be different for different spider entry pages. In the other case, there is a domain-wide cap. If we do what Rand says and spread link love directly from deep entry pages that may have few internal links from them (e.g., a very popular article from a couple years ago, we would expect to see the new "neighbor" pages getting indexed, and the total pages indexed would increase. However, if it is a domain-wide cap, we'd expect this strategy to results in the new "neighbor pages" getting indexed at the expense of some other pages somewhere, with the total number staying the same. In other words, reorganizing your internal links would not improve TrustRank or any of the factors that might boost your fixed cap, but it would allow you to get the most out of a spider that that that had a "depth cap". 
 Cancel
Neil Palmer

2009-12-01T06:35:32-08:00

Great article, and this is a subject that many of us have thought about alot over the years. I used to have an e-commerce sites with 15,000 products, and even though I worked very hard on the internal and external linking, it still took 5 years to get anywhere near all the pages indexed.

The problem with branded products is that the landing pages will almost always be seen as duplicate content, and only by having a massively popular domain could you get around this.

For me the focus will always now be on fresh content, which means I advise my customers to describe their products themselves instead of copying the standard descriptions in as text. It's a nightmare and hard work but because of the competition now, it's a necessary evil.

2 0

Great article, and this is a subject that many of us have thought about alot over the years. I used to have an e-commerce sites with 15,000 products, and even though I worked very hard on the internal and external linking, it still took 5 years to get anywhere near all the pages indexed. The problem with branded products is that the landing pages will almost always be seen as duplicate content, and only by having a massively popular domain could you get around this. For me the focus will always now be on fresh content, which means I advise my customers to describe their products themselves instead of copying the standard descriptions in as text. It's a nightmare and hard work but because of the competition now, it's a necessary evil.
Cancel
humbled

2009-12-01T04:43:27-08:00

Really good post and interesting read but enough to start me worrying.It looks like the over commercialisation of the web and search engine results is really going to either push out the little guy who maintains a website which is interesting totally or leave him only findable though a selection of vertical search options

2 0

Really good post and interesting read but enough to start me worrying.It looks like the over commercialisation of the web and search engine results is really going to either push out the little guy who maintains a website which is interesting totally or leave him only findable though a selection of vertical search options
Cancel
SoulSurfer8

2009-12-13T20:02:36-08:00

One of the best points in the post is to watch the ratio of pages generated vs inbound links those pages attract. You simply can't hire rubbish writers for pennies to create crappy content.

2 0

One of the best points in the post is to watch the ratio of pages generated vs inbound links those pages attract. You simply can't hire rubbish writers for pennies to create crappy content.
Cancel
Travis-W

2013-01-14T09:57:38-08:00

Are the limits different depending on the domain? How will Google differentiate a site with hundreds of writers churning how hundreds of pages of quality content a day versus an aggregator? Would they possibly use the time the content was originally crawled, MozTrust, or other metrics?

1 0

Are the limits different depending on the domain? How will Google differentiate a site with hundreds of writers churning how hundreds of pages of quality content a day versus an aggregator? Would they possibly use the time the content was originally crawled, MozTrust, or other metrics?
Cancel
luckywomen

2010-02-05T12:06:34-08:00

Hi rand,

Thanks for a wonderful insight on suhc a critical topic. But what I dont understand is why google wants to have cap on newly build site. I have site https://www.ekhichdi.com that has some 300 odd pages but still just 100 pages are index. Though some old pages which has been removed are still in the index

Also just yesterday google index one of the pages that was created yesterday itself but it has not index pages that are 20 days old.

1 0

Hi rand, Thanks for a wonderful insight on suhc a critical topic. But what I dont understand is why google wants to have cap on newly build site. I have site <a href="https://www.ekhichdi.com/" rel="nofollow">https://www.ekhichdi.com</a> that has some 300 odd pages but still just 100 pages are index. Though some old pages which has been removed are still in the index Also just yesterday google index one of the pages that was created yesterday itself but it has not index pages that are 20 days old. 
Cancel
Melissa Fach - @SEOAware

2009-12-01T20:43:53-08:00

This was great! I especially loved this part "...soon to be recognized by the Academy of Nonsense for its pre-eminent place among the least helpful collection of words ever assembled, spurs bouts of cursing...". :-)
1. Great information as usual.
1 0
This was great! I especially loved this part "...soon to be recognized by the Academy of Nonsense for its pre-eminent place among the least helpful collection of words ever assembled, spurs bouts of cursing...". :-) <ol><li> Great information as usual. </li></ol>
Cancel
TracyWeb

2009-12-01T13:05:32-08:00

Thanks for the post Rand, was looking at one of my sites and noticed a significant drop in pages in the index and was like...WTF? This actually went undected longer than normal as I have been enjoying an increase in traffic on my Tier 1 keywords. That said, i want my long tails back. I pretty much figured I needed to add some value to those pages by way of content and links juice. This post confirms that.

BTW - This is one of the best line you've ever written (regarding the email back from Google), "This email, soon to be recognized by the Academy of Nonsense for its pre-eminent place among the least helpful collection of words ever assembled..."

Freakin' Hilarious! See you in AZ in Jan.

1 0

Thanks for the post Rand, was looking at one of my sites and noticed a significant drop in pages in the index and was like...WTF? This actually went undected longer than normal as I have been enjoying an increase in traffic on my Tier 1 keywords. That said, i want my long tails back. I pretty much figured I needed to add some value to those pages by way of content and links juice. This post confirms that. BTW - This is one of the best line you've ever written (regarding the email back from Google), "This email, soon to be recognized by the Academy of Nonsense for its pre-eminent place among the least helpful collection of words ever assembled..." Freakin' Hilarious! See you in AZ in Jan. 
Cancel
Renel

2009-12-01T21:07:51-08:00

Thanks for the update on this. I wonder if Google, or other search engines, will increase the cap after a couple of more years. Because if the trend continues then there would be billions of webpages all over the Internet, and that could take quite a load on their engines.

1 0

Thanks for the update on this. I wonder if Google, or other search engines, will increase the cap after a couple of more years. Because if the trend continues then there would be billions of webpages all over the Internet, and that could take quite a load on their engines.
Cancel
Chris Redshaw

2009-12-01T17:48:01-08:00

IF YOU LINKBUILD BY SYNDICATION, read this please

Say you send out articles for republication on another site, with links back to the original. This is a fantastic way of link building, but what heppens when you decide to time limit your articles, marking them for deletion after a year? You're be setting yourself up to erase all your hard work by having broken inbound links.

Fix this by linking to a major category of your site, like "breaking news" rather than the original article itself, but make sure you're articles are indexed first (you'll have to figure a way to do that)

Or, if someone has created a plugin that deletes articles after a year, then places a 301 redirect to on that page to a major category of your site, that would be awesome. In fact, that's a damn good idea for a CMS plugin, not meaning to toot my own horn.

Hope this helps,

Chris

chris.redshaw edited 2009-12-01T17:49:42-08:00
1 0

IF YOU LINKBUILD BY SYNDICATION, read this please Say you send out articles for republication on another site, with links back to the original. This is a fantastic way of link building, but what heppens when you decide to time limit your articles, marking them for deletion after a year? You're be setting yourself up to erase all your hard work by having broken inbound links. Fix this by linking to a major category of your site, like "breaking news" rather than the original article itself, but make sure you're articles are indexed first (you'll have to figure a way to do that) Or, if someone has created a plugin that deletes articles after a year, then places a 301 redirect to on that page to a major category of your site, that would be awesome. In fact, that's a damn good idea for a CMS plugin, not meaning to toot my own horn. Hope this helps, Chris 
Cancel
- Rand Fishkin
 
 2009-12-01T22:52:49-08:00
 
 While I can sometimes see the benefit of this, in my opinion, it's potentially more dangerous than the problem it seeks to fix. If you get those links pointing to other pages, rather than the original, there's a much higher chance your articles will be considered duplicates than if the copies all point to your originals. I think I'd recommend not deprecating old content without 301'ing it to the new version/location instead.
 
 1 0
 
 While I can sometimes see the benefit of this, in my opinion, it's potentially more dangerous than the problem it seeks to fix. If you get those links pointing to other pages, rather than the original, there's a much higher chance your articles will be considered duplicates than if the copies all point to your originals. I think I'd recommend not deprecating old content without 301'ing it to the new version/location instead.
 Cancel
samdaams

2009-12-22T01:33:19-08:00

Bummer I'm late to comment here as it's a great post.

If it's an indexation cap, I guess meta noindex/follow is a good way to help resolve this?

Or is it a 'crawl cap', in which case robots.txt blocking of pages might be better?

Some of the more fancy javascript created links are starting to look more and more attractive!

1 0

Bummer I'm late to comment here as it's a great post. If it's an indexation cap, I guess meta noindex/follow is a good way to help resolve this? Or is it a 'crawl cap', in which case robots.txt blocking of pages might be better? Some of the more fancy javascript created links are starting to look more and more attractive! 
Cancel
Christine Cadena

2009-12-02T10:24:49-08:00

As a contributing writer for Associated Content, this has been a huge issue on the site. Because we earn pageview bonus money from content written, it is vital that our unique content is indexed regularly to maintain search position. Unfortunately, many pages go unindexed. For this reason, I have found other ways to try and promote my content on a regular basis with link building directly to my content pages. Overtime, the cost and time associated with this effort depletes any earnings from pageview revenue. Catch 22, to me.

1 0

As a contributing writer for Associated Content, this has been a huge issue on the site. Because we earn pageview bonus money from content written, it is vital that our unique content is indexed regularly to maintain search position. Unfortunately, many pages go unindexed. For this reason, I have found other ways to try and promote my content on a regular basis with link building directly to my content pages. Overtime, the cost and time associated with this effort depletes any earnings from pageview revenue. Catch 22, to me.
Cancel
Reuben Yau

2009-12-02T10:14:02-08:00

Great post Rand (good to meet you pubcon btw :) )

Do you think the index cap is one of the reasons behind the development of caffeine?

Reuben

1 0

Great post Rand (good to meet you pubcon btw :) ) Do you think the index cap is one of the reasons behind the development of caffeine? Reuben 
Cancel
Luke Jones

2009-12-02T08:10:26-08:00

Thanks for taking the time to write this. It's made me stress out a little bit, but in an enriching way.

My main concern is that my upcoming website is going to be quite static on the root domain, which is the highest priority in the sitemap and has the CTAs within. But part of my site is also a blog area, which is where I'm hoping to drive some content to my site from. With Google placing the indexing cap on my site, am I at a significant level of risk when my blog is getting more visitors than the index of my site? If so, then what do you suggest?

1 0

Thanks for taking the time to write this. It's made me stress out a little bit, but in an enriching way. My main concern is that my upcoming website is going to be quite static on the root domain, which is the highest priority in the sitemap and has the CTAs within. But part of my site is also a blog area, which is where I'm hoping to drive some content to my site from. With Google placing the indexing cap on my site, am I at a significant level of risk when my blog is getting more visitors than the index of my site? If so, then what do you suggest?
Cancel
Imran Jafri

2009-12-12T13:22:22-08:00

For a while I have been looking the answer to that How many page in % does Google Index for a site. I have been running a blog for a year now and I noticed only approx 40% of the total pages are indexed. It alarmed me and wanted to explore what exactly is wrong. And going through several sources and finally this post i can confirm that Google certainly indexation cap in place for Websites with fewer baclinks. I obsereved several other sites with low quality content that had more pages indexed in Google and only found that they had good number of backlinks mostly generated through blogrolls or by distributing free wordpress theme with link to their blog. anyway thanks for the detailed information that i was looking for.

1 0

For a while I have been looking the answer to that How many page in % does Google Index for a site. I have been running a blog for a year now and I noticed only approx 40% of the total pages are indexed. It alarmed me and wanted to explore what exactly is wrong. And going through several sources and finally this post i can confirm that Google certainly indexation cap in place for Websites with fewer baclinks. I obsereved several other sites with low quality content that had more pages indexed in Google and only found that they had good number of backlinks mostly generated through blogrolls or by distributing free wordpress theme with link to their blog. anyway thanks for the detailed information that i was looking for. 
Cancel
Nikita Sharma

2010-02-05T03:12:22-08:00

Excellent post.

here i want to know about alternatives of directory submissions, can you help me for this.

thanks

1 0

Excellent post. here i want to know about alternatives of directory submissions, can you help me for this. thanks 
Cancel
Stephan Baldwin

2009-12-01T08:28:32-08:00

Think seomoz could create a simple tool that is a site index checker - helping us easily monitor our index size over time?

Like the Rank Checker but for Index Size.

1 0

Think seomoz could create a simple tool that is a site index checker - helping us easily monitor our index size over time? Like the Rank Checker but for Index Size. 
Cancel
CoolNameIdeas

2009-12-01T05:17:34-08:00

A fantastic post as usual.

I would imagine that the main factor has got to be internal duplicate content, which I see all the time on clients’ websites.

I think that this is a positive step in many ways fixing the dupe content and improving the site rather than falling back on just adding a site map will only improve the web after all.

1 0

A fantastic post as usual. I would imagine that the main factor has got to be internal duplicate content, which I see all the time on clients’ websites. I think that this is a positive step in many ways fixing the dupe content and improving the site rather than falling back on just adding a site map will only improve the web after all. 
Cancel
Darren Shaw

2009-12-01T06:24:46-08:00

Mechanical Turk just blew my mind. I can't believe this has been around since 2005 and this is the first I have heard of it.

1 0

Mechanical Turk just blew my mind. I can't believe this has been around since 2005 and this is the first I have heard of it.
Cancel
- kdfisher
 
 2009-12-01T09:13:07-08:00
 
 artificial, atificial intelligence? I still can't figure out it's purpose, but that's not new with the intenet.
 
 1 0
 
 artificial, atificial intelligence? I still can't figure out it's purpose, but that's not new with the intenet.
 Cancel
Jason Capshaw

2009-12-01T04:59:30-08:00

Nice post Rand...I could really identify with the adwords exec conversation :)

1 0

Nice post Rand...I could really identify with the adwords exec conversation :)
Cancel
Tom Critchlow

2009-12-01T04:13:00-08:00

Good post Rand. I'm flattered that 250 people felt kind enough to blog about my post (wat? did I miss the point?_.

Seriously though, I think this is the single biggest argument for still using nofollow and general link sculpting on a site - reduce crawl wastage and get more of your money pages in the main index.

Does a supplemental index still exist? I'd like to say yes but would love to hear others thoughts on that.

1 0

Good post Rand. I'm flattered that 250 people felt kind enough to blog about my post (wat? did I miss the point?_. Seriously though, I think this is the single biggest argument for still using nofollow and general link sculpting on a site - reduce crawl wastage and get more of your money pages in the main index. Does a supplemental index still exist? I'd like to say yes but would love to hear others thoughts on that. 
Cancel
- Muffin
 
 2009-12-01T07:20:31-08:00
 
 So why not use the meta robots noindex,follow rule?The pageranks will keep flowing, you don't have the PR "evaporation effect" (probably) and you do exactly what Google wants: you endorse an indexation cap, only you now are in charge of the pages indexed.
 
 2 0
 
 So why not use the meta robots noindex,follow rule?The pageranks will keep flowing, you don't have the PR "evaporation effect" (probably) and you do exactly what Google wants: you endorse an indexation cap, only you now are in charge of the pages indexed.
 Cancel
- Rand Fishkin
 
 2009-12-01T10:36:28-08:00
 
 Yeah - we've seen a specific example where a large site removed all their nofollows thinking that Google was hurting them for it, then lost 35-40% of their search traffic from the tail due to indexation problems. They put back the nofollows, waited a couple weeks, and things were back to normal. I'd say there's still strong evidence that while PageRank may "leak" nofollows can help with controlling the indexation on a site.
 
 2 0
 
 Yeah - we've seen a specific example where a large site removed all their nofollows thinking that Google was hurting them for it, then lost 35-40% of their search traffic from the tail due to indexation problems. They put back the nofollows, waited a couple weeks, and things were back to normal. I'd say there's still strong evidence that while PageRank may "leak" nofollows can help with controlling the indexation on a site.
 Cancel
 - Zoran Knezevic
 
 2009-12-01T16:58:58-08:00
 
 Rand once explained well 100*100*100 = 1M and #@(% all other tags and navigations and this is one of best advices ever. Noffolow maybe 'drains' link joice but it seems it is more important for google to determine relative pagerank of page on your website.
 
 1 0
 
 Rand once explained well 100*100*100 = 1M and #@(% all other tags and navigations and this is one of best advices ever. Noffolow maybe 'drains' link joice but it seems it is more important for google to determine relative pagerank of page on your website.
 Cancel
Adam Smith

2009-12-01T02:56:05-08:00

I for one can't wait to see what this does to two of my spammy competitors who seem to have untold thousands of pages in order to reap as much long tail as possible.

1 0

I for one can't wait to see what this does to two of my spammy competitors who seem to have untold thousands of pages in order to reap as much long tail as possible.
Cancel
SEOServiceWithStyle

2009-12-01T03:01:37-08:00

Excellent post Rand! This has been a topic of debate in our offices for a while now, nice work

1 0

Excellent post Rand! This has been a topic of debate in our offices for a while now, nice work
Cancel
Lara Walsh

2009-12-01T09:51:49-08:00

Nice post. Good topic too. Wonder if it would effect a 'small' site of 500+ pages. I ask because a site I worked on recently had been revamped, but the indexing was atrocious. This is with 9+ years of being active.

Also, I read somewhere recently the nofollows don't actually work. Any ideas on this?

Cheers,
Lara

1 0

Nice post. Good topic too. Wonder if it would effect a 'small' site of 500+ pages. I ask because a site I worked on recently had been revamped, but the indexing was atrocious. This is with 9+ years of being active. Also, I read somewhere recently the nofollows don't actually work. Any ideas on this? Cheers, Lara
Cancel
Derk Loebus

2009-12-01T06:29:28-08:00

Sure thing that Google won't like to keep too many pages around, since in the end it justs adds up to their spending. Say now, you want to play nice ad relief Google a little bit and have less pages cached. Anyone an idea how one could do that?

I currently have 6 Mio pages in the index, where my sitemap lists only 300k links and I am really wondering what else Google has stored (we have a lot more pages, but they are basically combinations of filters and hence "no-index").

1 0

Sure thing that Google won't like to keep too many pages around, since in the end it justs adds up to their spending. Say now, you want to play nice ad relief Google a little bit and have less pages cached. Anyone an idea how one could do that? I currently have 6 Mio pages in the index, where my sitemap lists only 300k links and I am really wondering what else Google has stored (we have a lot more pages, but they are basically combinations of filters and hence "no-index"). 
Cancel
Caliber Interactive

2009-12-01T08:45:12-08:00

Nice post Rand

Google's crawl resource is an ongoing challenge. Those machines are running hot!

1 0

Nice post Rand Google's crawl resource is an ongoing challenge. Those machines are running hot!
Cancel
trickym81

2009-12-01T08:47:14-08:00

Excellent post. It does make things a little difficult for WordPress blogs, but I like the overarching thought (once again) that high-quality content is the answer.

1 0

Excellent post. It does make things a little difficult for WordPress blogs, but I like the overarching thought (once again) that high-quality content is the answer.
Cancel
SEOBlahBlah

2009-12-01T09:01:44-08:00

Great post - inspired me to register and ask a question...

If I have a site with about 100,000 "quality pages" and over 2 million "poor quality" pages primarily for SEO, am I shooting myself in the foot? Does this article imply that I would be better off deleting the 2 million SEO pages (which bring in about 1k-3k visits per day) to boost the quality of the 100,000 quality pages? And would the net gain be better if I made that decision? Thanks for any thoughts.

1 0

Great post - inspired me to register and ask a question... If I have a site with about 100,000 "quality pages" and over 2 million "poor quality" pages primarily for SEO, am I shooting myself in the foot? Does this article imply that I would be better off deleting the 2 million SEO pages (which bring in about 1k-3k visits per day) to boost the quality of the 100,000 quality pages? And would the net gain be better if I made that decision? Thanks for any thoughts. 
Cancel
- Rand Fishkin
 
 2009-12-01T10:37:59-08:00
 
 I'm not sure I'd phrase it exactly that way - the 100K pages, if they're getting indexed and aren't at risk (they have external links and other positive signals), may not be harmed by the other 2 million. But, to get those 2 million consistently in and earning traffic, you probably need to think about the items above.
 
 1 0
 
 I'm not sure I'd phrase it exactly that way - the 100K pages, if they're getting indexed and aren't at risk (they have external links and other positive signals), may not be harmed by the other 2 million. But, to get those 2 million consistently in and earning traffic, you probably need to think about the items above.
 Cancel
Jim Rudnick

2009-12-01T07:14:37-08:00

Hmm...great piece, Rand.

This quote tho --

"...We've talked previously about metrics like a domain-level calculation of PageRank (Domain mozRank is an example of this). It's likely that Google would make this a backbone of the indexation cap estimate, as sites that tend to be more important and well-linked-to by other important sites tend to also have content worthy of being in the index..."

-- will make me rethink my own opinions of PR...exceptional catch here!!!!

:-)

Jim

1 0

Hmm...great piece, Rand. This quote tho -- "...We've talked previously about metrics like a domain-level calculation of PageRank (Domain mozRank is an example of this). It's likely that Google would make this a backbone of the indexation cap estimate, as sites that tend to be more important and well-linked-to by other important sites tend to also have content worthy of being in the index..." -- will make me rethink my own opinions of PR...exceptional catch here!!!! :-) Jim
Cancel
Chris Barton

2009-12-01T08:10:58-08:00

"...We've talked previously about metrics like a domain-level calculation of PageRank (Domain mozRank is an example of this). It's likely that Google would make this a backbone of the indexation cap estimate, as sites that tend to be more important and well-linked-to by other important sites tend to also have content worthy of being in the index...

-- will make me rethink my own opinions of PR...exceptional catch here!!!"

Agree with you JVRudnick

1 0

"...We've talked previously about metrics like a domain-level calculation of PageRank (Domain mozRank is an example of this). It's likely that Google would make this a backbone of the indexation cap estimate, as sites that tend to be more important and well-linked-to by other important sites tend to also have content worthy of being in the index... -- will make me rethink my own opinions of PR...exceptional catch here!!!" Agree with you JVRudnick 
Cancel
Jordan Tyler

2009-12-01T07:14:53-08:00

While there are several ways websites can mitigate this I think decisions like these on Google's part are a turn for the worse.

I understand the technical reasons to make such decisions (db storage, processors, speed to index, etc...) but when your business is driven by relevancy, as Google's is, removing pages to make life easier for them means they are/may be removinghigh relevancy pages for a lot of searches. This seems to be an instance when Google serves themselves and not their customers. I understand why they make these decisions but think these small issues are the same type of issues that led Yahoo to their decline.

1 0

While there are several ways websites can mitigate this I think decisions like these on Google's part are a turn for the worse. I understand the technical reasons to make such decisions (db storage, processors, speed to index, etc...) but when your business is driven by relevancy, as Google's is, removing pages to make life easier for them means they are/may be removinghigh relevancy pages for a lot of searches. This seems to be an instance when Google serves themselves and not their customers. I understand why they make these decisions but think these small issues are the same type of issues that led <a href="https://www.siftable.com/education/articles/yahoo-im-going-to-have-to-break-up-with-you/" rel="nofollow">Yahoo to their decline.</a> 
Cancel
Nicolas Chevallier

2009-12-02T06:38:16-08:00

I think Google assign an amount of time to spend on a website based on the rank of this website. Then it will crawl pages until it reaches this amount of time : so html download speed and ranking is important to crawl, and index also.

1 2

I think Google assign an amount of time to spend on a website based on the rank of this website. Then it will crawl pages until it reaches this amount of time : so html download speed and ranking is important to crawl, and index also.
Cancel
iexplore vietnam

2009-12-05T02:21:38-08:00

Sorry because i need help.

I have a website https://www.iexplorevietnam.com, any of you kindly to check for me if it is working well, I build this by myself but don't really know how to make it friendl with google search engine.

Very thank you if anyone could check for me.

HELP!!!!

- Casey Removed Link

caseyhen edited 2010-04-06T17:05:18-07:00
1 3

Sorry because i need help. I have a website https://www.iexplorevietnam.com, any of you kindly to check for me if it is working well, I build this by myself but don't really know how to make it friendl with google search engine. Very thank you if anyone could check for me. HELP!!!! - Casey Removed Link 
Cancel
DataEntryServices

2009-12-01T04:29:08-08:00

Good thing I only have 999,999 pages! Honestly, why would any site be that big.

2 4

Good thing I only have 999,999 pages! Honestly, why would any site be that big.
Cancel

Post Analytics

Comments 57

Log in to Moz

Don't have an account?