One of the frustrations of doing SEO for large websites is the fact that Google makes it very difficult to see more than a small part of the search index. Even in Webmaster Tools, Google's index search is built on the same mechanics as its web search, which only lets you see the first 1,000 pages of any result. Whether you're trying to get pages discovered, struggling with duplicate content, confirming robots.txt changes, or doing advanced index sculpting, that 1,000-page barrier can be extremely limiting when you're dealing with a site with 10,000 or more indexed pages.
So, how can we dig deeper into the index and really see the big picture?
The Tools – Site: and Inurl:
First off, you're going to need a couple of tools. I'll assume that most of you are familiar with Google's "site:" command, which returns the indexed pages from any given domain or subdomain. Let's take our friends here at SEOmoz as an example. Type "site:seomoz.org" into Google's search box, and you'll see something like this:
The other command we'll be using is "inurl:", which, paired with other search terms, restricts the results to only those containing a specific keyword in the URL. Combined with the "site:" command, it makes Google reveal only the indexed pages that contain those URL keywords.
The Tactic – Index Deconstruction
Using our SEOmoz example, how can we find out which pages are included in the roughly 12,000-page index when we can only see those pages 1,000 at a time? Those last three words are the key: we can only see 1,000 pages at a time, but depending on how we construct our searches, they don't have to be the same 1,000 pages. By splitting up our index searches logically, we can break the full index up into manageable chunks. We'll do this by using "inurl:" to force the "site:" command to show us the index through smaller windows.
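To make the windowing idea concrete, here's a minimal Python sketch (the domain and folder names are just illustrative) of how "inurl:" slices one large "site:" query into smaller, separately viewable windows:

```python
# A minimal sketch of the index-windowing tactic: one base "site:" query
# plus one combined "site: ... inurl: ..." query per top-level folder.

def index_windows(domain, folders):
    """Return the base site: query plus one combined query per folder."""
    queries = ["site:" + domain]
    for folder in folders:
        queries.append("site:{} inurl:{}".format(domain, folder))
    return queries

for query in index_windows("seomoz.org", ["blog", "ugc", "articles"]):
    print(query)
```

Each generated query is pasted into Google's search box as-is; the point is that every window returns its own (up to) 1,000 results.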
An Example – Deconstructing SEOmoz
This is one of those techniques that's much easier to illustrate with an example. Let's say that we needed to dig deeply into SEOmoz's 12,000 indexed pages. The first thing that we might do is to take a look at the main navigation to get an idea of the URL/folder structure of the site. Looking at the top-right navigation on SEOmoz, we see the following (I've added the numbers 1-6 - see below):
Other than "Home," the first link goes to the "/blog" folder. That looks promising, so let's try out our combination "site:" and "inurl:" search:
After clicking the "omitted results" link to see the full list, we get 2,430 pages of the index that contain the word "blog." That's a good start, so let's see what we can do with a few more of the major folders (numbered above):
- inurl:blog – 2,430
- inurl:ugc – 712
- inurl:articles – 96
- inurl:tools – 29
- inurl:users – 5,880
- inurl:marketplace – 787
Not bad: with just 6 subfolders, we've accounted for 9,934 pages or over 80% of the index. This, of course, assumes minimal overlap, and the accuracy of Google's numbers may be questionable (I'll discuss some issues with "inurl:" at the end of the post), but it's more than adequate to get the job done.
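If you want to sanity-check the coverage arithmetic, a quick sketch (counts copied from the list above, against the roughly 12,000-page total from the plain "site:" search):

```python
# Rough coverage check: sum the per-folder counts reported by Google
# and compare against the approximate total index size.

counts = {
    "blog": 2430, "ugc": 712, "articles": 96,
    "tools": 29, "users": 5880, "marketplace": 787,
}
total_index = 12000  # rough figure from the plain "site:seomoz.org" search

accounted = sum(counts.values())                          # 9934 pages
coverage_pct = round(100.0 * accounted / total_index, 1)  # ~82.8%
print(accounted, coverage_pct)
```

Keep in mind this ignores any overlap between folders, so treat the percentage as an upper bound on true coverage.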
Now, we're left with a couple of groups, such as (5) that are still greater than 1,000 pages. At this point, you'll have to use some logic and your knowledge of the site in question. As a frequent Moz user, I know that the "users" folder contains all of the user profiles. Digging a little, I can easily find that those profiles all contain "users/view." A new search on "inurl:users/view" reveals 5,810 user profiles, making up almost all of the pages in the "users" folder and almost half of the total index.
An Example – Canonical URLs
Most of the time, we aren't going to be trying to deconstruct the entire Google index for a site, but just need to answer a specific question. Let's take my own company site/blog as an example. Recently, I realized that I had left some loose ends in the code that were revealing both canonical and non-canonical URLs. So, for example, the same blog post might have the following two URLs:
- https://www.usereffect.com/topic/the-last-spam-youll-ever-need
- https://www.usereffect.com/index.php?id=154
I've recently made some code changes to fix the problem, but how do I find out if my fix is working? I simply look for "id" in the URL with a search command like "site:usereffect.com inurl:id". As of this writing, that search only shows 1 result, suggesting that my changes are having the desired effect.
Advanced Inurl Tips
I hope that I've demonstrated just how powerful two relatively simple search tools can be when effectively combined. Before you go out and put this to work, though, a couple of warnings about "inurl:", which has a tendency to misbehave.
First, "inurl:" seems to ignore punctuation, for the most part. A targeted search on the folder "inurl:/blog" returns the same results as "inurl:blog," which is to say that it returns every page that contains "blog" anywhere in the URL. In some cases, this won't be a problem, but you'll have to judge that on a case-by-case basis. Like standard Google search terms, "inurl:" only searches on whole words (but doesn't seem to allow word stems), and you can only use a single word at a time in any given "inurl:" statement.
You can use multiple "inurl:" statements (one for each word) in your search, which are automatically combined with a logical AND. You can also use "-inurl:" to exclude specific URL keywords from any given search. Finally, you can combine "site:", "inurl:" and stand-alone keywords to target indexed pages by URL and content keywords in one statement.
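As a sketch of how these pieces compose (this helper is purely illustrative, not any official API), each "inurl:" statement carries a single word, exclusions are prefixed with "-", and plain keywords ride along unchanged:

```python
# Illustrative query builder composing site:, multiple inurl:/-inurl:
# statements (one word each), and stand-alone content keywords.

def build_query(domain, include=(), exclude=(), keywords=()):
    parts = ["site:" + domain]
    parts += ["inurl:" + word for word in include]
    parts += ["-inurl:" + word for word in exclude]
    parts += list(keywords)
    return " ".join(parts)

print(build_query("seomoz.org", include=["users", "view"]))
```

The multiple include terms are AND-ed by Google automatically, matching the behavior described above.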
Great post Pete - I love digging into what the search engines provide and offering up ways to use them. Other tactics I like along these same lines:
There are other good ones, too, but these generally work well for me.
Oooo! That date range query is nice Rand! I've just been talking about dupe content detection using similar techniques here - hope you like it!
Excellent; thanks. I keep hearing about the date-range query but wasn't clear on how to use it as a URL parameter.
Oh, and BTW, thanks for letting me use SEOmoz as an example. I know it's publicly available information, but some people might still consider it a little invasive. You just happened to be a perfect example for this exercise.
Would a post on all the URL parameters for Google be helpful?
Ooohh...YES YES YES :D
Okay, this hasn't been updated lately, but still pretty on target. Here is a free PDF that Stephan did back when...
Unlocking Google's Hidden Potential
Great queries, but I'm a little confused about the date range query. This might be a stupid question, but just so I'm straight here...You go to google, do your search, and once you have the results you add that juicy little tidbit to the end of the string and that will give you the filter?
thanks for helping out the clueless :P
Exactly: the version Rand gave is just a URL parameter. You can also go to Google, click on "Advanced Search" and then open up the "Date, usage rights, numeric range, and more" link to see the date-search options.
I should've actually bothered to try that one out earlier, so thanks for the nudge :)
cool stuff!
Excellent post Pete. You had me at "cracking." :)
Great post.
Just one thing to be aware of: the total number of search results reported by Google is not accurate for most searches.
They use a distributed BigTable database for their index, and Google's developers keep saying: "BigTable doesn't have counts." That's why the total number of search results changes every time you click "Next"; it's just an estimate.
So if Google reports, say, 975 results on the first page, it doesn't mean that you will be able to get those 975 URLs from Google. On the second page this number may be slightly different, and a few pages later you may find that 342 results is all Google has for you.
P.S. As a bit of shameless self-promotion, I can mention our tool, FirstStop WebSearch, which can be used to automate retrieving large numbers of search results. Some of our clients have found it convenient to use batch searches (using techniques similar to those described in this article) to break Google's 1,000-page barrier.
Although the total search results for a regular query are usually an approximation, I've found the result count for the "site:" command to be reasonably accurate. It's definitely true, though, that as you drill down into smaller counts, the accuracy seems to improve.
It depends ;-)
For site:seomoz.org inurl:ugc the first page said 711 but I could only get 380 results.
For site:seomoz.org inurl:articles the initial estimate was 99 and the final 38.
Nice one Pete! sphunn!
Thanks... you beat me to the punch. I was about to tell everyone that you just posted a great blog entry on finding dupe content with site:, inurl:, and intitle:. File that under "great minds think alike" ;)
Cheers for the shout and the sphinns on my post too.
great post doctor Pete,
I work full time for a very large directory firm in the UK, and we have over 5 million pages in our directory. I understand how hard it is to find out how all the pages are ranking due to the size of the site we have. We currently get over 7 million searches per month but would like more.
We have tried to make it as easy as possible for Google to understand which pages we want ranked and, conversely, which we don't want indexed. We are currently considering using nofollows on all client links so we don't look like we are selling paid listings. As we have many thousands of clients, this needs to be right. Because we have both canonical and non-canonical URLs in the index, it's even harder to understand what's going on.
Thanks for the article; it's given me some ideas. I do appreciate the time taken to write the post.
Wow, and I thought sorting through a 30,000-page index for a client was a challenge. You've really got your work cut out for you with 5,000,000,000 pages, but I suppose the basic challenges are the same.
Pete,
I read your comment before I saw the one above it and I immediately thought - WOW! Five Billion pages! That's like GigaGinormous!
I think you have a few too many extra zeros in there. ;)
Oops, got a little carried away :)
An awesome post. I wish there was an easy way to sort out dupe URLs that pop out with different inurl: queries...
Dr. Pete,
Great post! Especially when initially auditing a site, we'll undergo this type of URL dissection because it helps to illustrate potential siloing and templating, and can reveal where challenges exist, such as much lower indexation of a URL pattern, which may point to crawling issues or pages that are perceived as duplicates.
I tend to prefer the segmentation within a site: query.
Even in cases where the modifier doesn't appear elsewhere within the URL, the above and the inurl: approach will often produce differing counts, so whichever method is used, it's best to pick one and stick with it rather than go back and forth.
The advanced queries can be extremely useful and powerful, but mixing and matching them can be a little challenging. I recommend people play with these on a site they know well to get a comfort level for what works, what works best, and what goes wacky.
On a sidenote: from a methodology standpoint, I have the team append &start=990&filter=0 to the Google URL to jump to the last page and pull in any omitted results. In some brief tests on sites that were small enough (less than 1,000 results), I've found this to be the most accurate method.
Whenever I try a new combination, I try to use a little common sense and see if the results pass the smell test. I was playing with a strange mix of multiple "intitle:" and "inurl:" statements the other day, for example, and the SERPs made absolutely no sense.
I do prefer site: modifier over the inurl: modifier as well when it comes to segmenting folders.
In the inurl:blog example, even results from /some-folder/containing/blog/here get counted, which is not intended.
How would you go about this for Yahoo and MSN?
superb post and cool quick tips for google
hi Guys
I have been looking into this and got stuck at
https://www.google.co.uk/webhp?sourceid=navclient-ff#hl=en&q=site:partyrama.co.uk&start=990&sa=N&fp=ee8e7832ac926f03
Site I am working on is www.partyrama.co.uk
Just did some research on my sites.. and i found some interesting stuff google index special character links.. nice post! thumbs UP!
I think this inurl: thing may help me. My blog is using the 'all in one SEO' plugin and it has decided to give the /tag/ pages (that get indexed) the canonical preference. I want to change all of these to point to the actual post, to avoid duplication.
Can I use the inurl: tool to find all the /tag/ pages, then edit them all individually to change the canonical command?
The blog is Freebiejeebies
Thanks
You can use inurl: or you can just add the virtual directory path right to the site: command, like this:
https://www.google.com/search?q=site:www.freebiejeebiesgadgets.com/tag
Thanks for the help - I didn't know I could do that!
Hey Pete,
I've been out of pocket for a few days but just wanted to quickly add my commendations for a very nice post. The fact that it also elicited a few more helpful query strings and ideas only adds to an already excellent contribution.
I found this one and the better captcha post on your UserEffect site to be both insightful and helpful.
Thanks.
Oooo...I'm having the very same issue with canonical and non-canonical URLs, and my programmer seems confused as to why it's a problem. Do you have an easy fix I can pass along?
Thanks for the great information. Super helpful!
Cool, I use site: a lot, but I'm excited to try out inurl:
read everything you can on this site - start with the free articles and work your way up.
there are a lot of really great insights here so use the search to find your answers too.
good luck, and welcome to the party :)
Really nice post Pete - I don't work with as many large sites as I used to, but I'll be bookmarking this one for when I do, and sending it on to people I know who still are.
dr pete thats the best thing I have read in the last 4 weeks, and trust me, theres been some good stuff going round. awesome.
I wish I had something intelligent to add, but I don't :P
Great post....very helpful!!!!
thank you Pete for all those very helpful insights! great post, especially when working with larger sites, such as e-commerce ones. until now I was using mostly site:+keyword in the query, eg. site:www.seomoz.org rand but your tips will get me better results. thx again!
Great post, Dr. Pete, you are really killin' it this week with your CAPTCHA post and now this.
I don't use site: and inurl: commands nearly as much as I should, and this is a fantastic reminder to do so.
Rand, thanks for the additional advanced operators also.
Good stuff Pete. Congratulations on promotion to the blog.
I'm wondering if there'd be any utility in large sites planning for this when designing their site by including unique identifying strings for sections that they'll want to track. Like you pointed out "blog" could return inurl: results that aren't actually in the blog section, so perhaps adding something more unique to all blog page urls would be useful?
It's absolutely a good idea to plan out a logical URL structure and make links distinctive, but I've toyed around with it a bit, and you have to be careful not to get too cryptic. The priority is still to create URLs that are user-friendly and good for general SEO.
Well done Dr.
As a WordPress user I have to take some of the plugin descriptions at face value for what they do. Those claiming to get rid of WP's duplicate content issues have always been difficult to truly evaluate. These techniques show definite ways to help out, thanks.
The newsletter I do in-house work for writes on the same few subjects over and over again so I use the site: + intitle: + keyphrase or topic to help ensure canonical linking structures are what I'd like them to be as well as keeping an eye out for unintentional cannibalism.
Cheers!
@trontastic