It's been a big month for false positives and getting caught with spam, and I've never been one to break up a theme. Short post, but an important one that every dev team should be aware of.
The story starts with a smart SEOmoz member, Per Svanström, getting stumped when a perfectly legitimate, white-hat subdirectory with plenty of PageRank dropped out of Google's index:
You can see from the image that the single URL was dropped, but a site:birdstep.com/database query reveals that, in fact, all of those pages are out of the index. Time for some detective work.
Jane & I spent a few minutes trying to puzzle out whether bad links were pointing in or the pages were somehow cloaking or violating the TOS. As we were digging through the backlink profile, we saw that, naturally, the birdstep.com domain was linking to the subdirectory on nearly every page. When we viewed the source code of those pages (for example, the homepage - www.birdstep.com), we saw something strange. Below is the tail end of the source code for their top nav bar:
<li class="menuObject"><a href="https://www.birdstep.com/Corporate/"><img src="/images/menu/Corporate.gif" border="0" alt="Corporate" /></a></li>
<li class="menuObject"><a href="https://www.birdstep.com/Contact-us/"><img src="/images/menu/Contact_us_active.gif" border="0" alt="Contact us" /></a></li>
<li class="menuObject"><a href="https://www.birdstep.com/database/"><img src="/images/menu/Database.gif" border="0" alt="Database" /></a></li>
Looks fine, right? Just a regular menu serving up images as the clickable link. Only problem is...
Notice the navbar? See the missing link? That's where the "database" section should be linked to; only the image is missing. Apparently, it was just a design mistake, so they used a 1x1 pixel gif until they could get it fixed. There are plenty of other visible links in the content body of many pages over to the database section, but that top link in the navbar is invisible - technically violating Google's rules. Despite the fact that plenty of other sites and pages link to the database section legitimately, and Birdstep certainly has no reason or intention to hide that link (other than a miscalculation on pixel width), the whole subdirectory was removed from the index.
Luckily, we caught it, Birdstep has removed the link, and they'll hopefully have the subdirectory re-included in the near future. They also generously gave us permission to discuss the Q+A issue on the blog, which we very much appreciate. I think this serves as a wise warning to developers and designers everywhere - unintentional, white-hat spirited mistakes can be just as dangerous and have just as dire consequences as black hat manipulation. Watch your code!
One more point of interest - in searching around on this issue, I noticed that a Google search for https://www.birdstep.com/database/. (with the added period at the end) brought up this result:
I ran another query on a page I know was removed from the index, and it also yielded a result like the one above (unfortunately, I can't share that page publicly). It's possible that this might help diagnose future pages that are removed for bad behavior and exhibit similar symptoms - definitely not a bad query to have in your arsenal if it really does work consistently.
UPDATE: Looks like although this hidden nav element could be a problem, it wasn't actually the issue coming into play here. The answer was... capital letters and 404 pages being cloaked to Google (an excellent find from John Mueller). Basically, Birdstep's server was using user-agent detection to redirect Googlebot to a 404 error page (obviously not the intentional, "we're cloaking because we want to trick Google" kind, but the "oops, that was dumb" kind). The odd part is that Yahoo! and MSN/Live got it right (and there are plenty of links), but Googlebot was being treated differently.
We didn't notice this initially due to multiple problems - first, just switching your user agent to Googlebot in Firefox won't expose the issue. Neither will using search spider emulators like SEO-Browser. You need to actually telnet to Port 80 (as Matt Cutts notes in the comments). Second, you will see the page in Yahoo! and MSN (making it feel more like a penalty than a crawl issue). I seriously doubt they'll be banned for this - the intent to spam or deceive isn't there - but it's once again a fascinating detective story about the problems a site can have. Big thanks to Matt and to John for their help.
p.s. Removed the bottom part of the original post due to overwhelming feelings of sheepishness.
p.p.s. Dave Naylor has a tool that can help detect this sort of thing (though it wasn't originally intended for that use).
Hi Rand
You might want to try to access that page using a user-agent like "Mozilla/5.0 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)".
Not only is the page returning a 404 error page with result code 200, it also has a meta robots with "noindex" on it. I think fixing that would go a long way towards helping us to crawl, index and rank it appropriately :-)
Oh, that's where it went wrong.
The client changed their URL from the old /Database/ to /database/, but I guess this was misconfigured on their end as the 404 page.
The scary part is that the browser doesn't display the 404 but shows the correct page, so it's easy to think it is working.
I will most definitely use the user agent you linked to in the future, as that would have pinpointed this issue right away. Thank you for that.
John - greatly appreciate that! I had changed my user agent (to just GGbot), but that still redirected me properly, and using SEO-Browser even returned the page properly. I definitely need to imitate Google more closely when investigating sites (I had presumed, because Yahoo! and MSN had it, and it had received PageRank, that it must be a penalty - not a 404).
Thanks a ton!
Rand, this is not just a capitalization issue.
If you're going to do a blog post about an issue, please go to the trouble of doing a telnet to port 80 and giving the Googlebot user agent. That way, you've done as much as possible to get down to the metal (you're not coming from a Google IP, but everything else is the same). Plus you'd be able to see information that other tools (wget, browsers, SEO-Browser) don't provide, e.g. if there's a noindex meta directive or a 404 page in the body text.
My takeaway is that there's an opportunity to write a tool that gives this level of information, or to write a blog post about how to fetch a page as Googlebot using telnet or curl. That would help people more in my opinion.
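For example, a rough sketch with curl (-A sets the user agent and -D - dumps the raw response headers to stdout; curl won't follow redirects unless you add -L, so you see each hop's status line and Location header for yourself):

curl -D - -A "Mozilla/5.0 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)" https://www.birdstep.com/database/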
Agreed - that sounds like a very good emulation tool to build. I'll ask some folks here at SEOmoz to put it together :)
And yes - I agree that I should have investigated more thoroughly. Not to make excuses, but seeing it in Yahoo! and MSN (and through SEO-Browser / a simple user-agent change) threw me off. Thanks!
If the capitalization is not the issue, what is?
Is there an example someone can provide showing how to do this? It has been ages since I used telnet and I never used it for something like this.
Cheers,
@trontastic
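(For anyone else wondering, the gist is: telnet to the web server on port 80, type an HTTP request by hand, and the raw response comes back untouched. A rough sketch - note that the blank line is required to end the request:

telnet www.birdstep.com 80
GET /database/ HTTP/1.0
Host: www.birdstep.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)

There's a full transcript of exactly this exchange further down in the comments.)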
Whoever is in charge of birdstep.com, you continue to do suboptimal things. When a user tries to fetch https://www.birdstep.com/Database/ they see a normal page. When Google tries to fetch a page, we see a temporary redirect (302) to https://www.birdstep.com/errors/404.html?aspxerrorpath=/default_database.aspx
My take on this: you're shooting yourself in the foot by trying to be too smart. Not only are you putting yourself at higher risk of being removed for cloaking, you've effectively removed yourself from Google entirely already by redirecting to a 404 page.
My advice: in the next day or so, go over your webserver code and remove *absolutely everything* that is doing conditional redirects or serving based on the Googlebot user-agent or IP address. Once you've removed all that junk, your redirect problems will be apparent just by following the links on your own site.
In this case, Xenu Linksleuth may help get you started in finding the many errors in the internal navigation -- once you serve the same content to browsers and bots alike.
Bzzzt. Sorry Rand, not correct. softplus got it above. I got suspicious when I visited www.birdstep.com/database and immediately got sent to a different URL, www.birdstep.com/Database (note the uppercase 'D').
The document that is returned to Google is seriously horked (it claims to be a 404). So this is certainly a problem that birdstep.com created for themselves and can solve themselves.
Rand, would you mind correcting this title or updating the blog post?
I would agree that the 404 is horked, but the redirection happened after the initial post: the client renamed their catalogue from "database" back to the old setting, "Database" (from when the page was indexed and ranked very well), and their CMS automatically adds a 301 from the lowercase to the uppercase, as the system only knows the uppercase version.
I know this is not good and both URLs should of course be handled by the system, but it was added not by the client but by their CMS.
On the other hand, the de-indexing is not related to the invisible menu (if I read your answer correctly), so I guess the initial theory about the exclusion is no longer correct.
So I just want to confirm:
which happened first: the page dropping out of the index, or your client's URL path change?
They changed the URL in their CMS from the capital D to /database/, along with a lot of other recommended changes, and a while after that single pages started to vanish, until after about 2 weeks they were all gone.
I'm also a bit embarrassed that I managed to turn a very interesting post into an error-hunting session on a single page, which of course was not my intention, so I will continue my investigations on this matter elsewhere, out of respect for the blog and all you mozzers.
"On the other hand, the de-index is not related to the invisible menu (if I read your answer correctly) and then I guess the initial thought about the exluction is not correct anymore."
Macaper, it has nothing to do with the menu. It has to do with the site cloaking to Google, but then sending Google to a 404 error page. By the way, that's completely aside from the upper/lowercase database/Database issue.
Hard to keep up with this post as answers keep coming in further up the thread.
Thanks to all the great help and input on this post, the problem is now pinpointed. Cloaking is of course not good and should result in a page not being indexed.
The telnet information was completely new to me, but I will most definitely keep it for checking new sites in the future.
As the cloaking is very much unintentional and, even worse, not visible on the server, it is hard to find the cause of all this; it has obviously not been like this from the start, as that would have gotten the site excluded before the SEO work on the page even began.
Our first check of the site was Google Webmaster Tools, and that flagged everything OK, which was unfortunate, as it made us think the problem was elsewhere.
Another issue is that the site is hosted with a shared web hosting company, so it's hard to get that company to react and start going over their servers to find what is wrong and what is causing this cloaking behavior. But as you write, Matt: make them take away everything. Reset the servers totally and reinstall things, because since neither the client nor the hosting company knows what has made the server cloak, I see no other way to make sure the problem is fixed.
There's a lot of great input in all your answers on this post, and I just wanted to point out, since I read your statements as saying the client is trying to cloak: trust me, they are not. It's just a server glitch, probably from their hosting company, that started all this a while back.
Updated the post, Matt. Thanks for stopping by :)
BTW - Is this something Google will fix? If Yahoo! and Live can index it, and browsers are redirecting, wouldn't Google want to do so also? Seems like there might be a lot of pages excluded if Google can't pick up the capitalization issue.
I think this is some strange server setup rather than a capitalization issue with Google, because the server currently requires an exact match of the URL, as I just found out after digging further based on the input from this post. If you enter the lowercase version it 301s you to the /Database/ version, and the lowercase is treated as a 404 because the system doesn't think it exists.
The same goes, as I just found, for the trailing slash: if you enter the URL without one, it will 301 you to the URL with a trailing slash.
So I think this is all a very strange server setting change that ended up requiring exact matches in the URL, right down to capitalization, thereby adding tons of 301s all over the place.
Strange is one way to put it. :) Please see my comment (here) about an issue that definitely needs to be fixed ASAP.
Edit from Rand - Made Matt's link live and visible.
*** If you enter the lowercase version it 301s you to the /Database/ version, and the lowercase is treated as a 404 because the system doesn't think it exists. ***
Errr. NO.
If the lowercase issues a 301 redirect, then that URL is a 301 redirect. It cannot also be a 404.
A URL returns ONE status code in the HTTP header:
200 - Page OK, here is the content.
301 - permanent redirect to another URL. The browser makes a new HTTP request to fetch the new URL.
302 - temporary redirect to another URL. Use the 301, not the 302.
404 - Page not found. A page full of error text is served at the originally requested URL, and the 404 tells the bot NOT to index the URL or the content. The 404 says nothing about whether the page may come back with real content at some future date.
410 - Page gone. Similar to 404, except that page is NEVER coming back.
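A quick way to see which of these a given URL actually answers with is a HEAD request - a rough sketch with curl (-I sends HEAD and prints just the status line and headers; note that a few servers answer HEAD differently than GET):

curl -I https://www.birdstep.com/database/

At the time of this thread, that showed the 301 to /Database/ discussed above.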
*** So I think this is all a very strange server setting change that ended up requiring exact matches in the URL, right down to capitalization, thereby adding tons of 301s all over the place. ***
Links within the internal navigation should point to the URL that you want to be indexed, in exactly the correct case.
When a user or bot follows an internal link they should not be hitting any kind of internal redirect to get to the content.
The issue should be fixed by making the URL in the link be exactly the same, and exactly the same case, as the actual URL for the content.
If it is not possible to fix the internal scripting to make them both match up, then you could always employ an internal rewrite (that's a rewrite, NOT a redirect) to translate the externally requested URL into an internal facing filepath and filename. In the rewrite, the internal file path and file name are NOT exposed back to the browser.
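On the wire the difference is easy to see (a rough sketch with hypothetical example.com URLs; curl's -sI sends a quiet HEAD request and prints the response headers). A redirect is visible to the client as a 3xx status plus a Location header:

curl -sI https://www.example.com/old-page/
HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/new-page/

An internal rewrite simply answers 200 for the URL that was requested; the internal filepath it maps to is never exposed:

curl -sI https://www.example.com/friendly-page/
HTTP/1.1 200 OK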
This is basic design stuff that I see badly done all the time.
Rand, I have no idea whether birdstep.com is cloaking to Yahoo or Microsoft as well. If not, that would account for why Y/M have the URL. Also, Y/M handle the noindex meta tag differently. Most people prefer how Google handles the noindex meta tag.
Hehe, it is nice to see John called "softplus" (his old Google Webmaster Help group name).
heh!
As the person who didn't notice the missing menu item in the first place, I can only say that Pro membership can end up saving your life. I work with SEO at least 80-100 hours every week, and I know that a lot of the readers here do the same, and I can say for a fact that I wouldn't know what to do without my Pro membership. So anyone not using it, please check it out.
I also want to thank you, Rand, for digging into this. Even though it is a rule-breaker, albeit an unintentional one, I still think that a total exclusion of all the pages is out of proportion compared to the many other pages you find daily that are intentionally breaking the rules.
I don't want to go totally off topic and start talking about all the pages that break the rules intentionally, but as everyone who works with SEO or SEM knows, they are plentiful, and they are still allowed to be indexed and even rank in top positions. Yet one honest mistake like the one we are talking about in this post can get you totally excluded.
Don't get me wrong here: I'm all for white hat SEO, and I would be the first to cry happy tears if all black hat SEO pages were excluded, but I'm starting to lose my trust in Google, because it feels like a total lottery whether your site will get through the "all-seeing eye" of Google or get the full penalty.
Amen to the value of the SEOmoz Pro membership. My blog got banned for about a month due to some viagra injections on my WP blog. I couldn't figure out what the hell was going on, since I didn't see the code anywhere on my site, but Jeff & Rebecca responded to my question in the Q+A section within about 12 hours and had me all squared away.
Even if I only use this feature once in 12 months, $400 a year is a small price to pay for a second opinion from some of the top SEOs in the world on something this important.
Thus, I'm making a point of noting here that Birdstep got their issue solved (or at least diagnosed) thanks to the Q+A section in SEOmoz PRO. I do think we offer a good service, and I really do believe in it; I think I'm just a bit shy about self-promotion.
I don't think you should be shy about self-promotion. It's not really promotion if you are stating a fact! lol. Pro membership does help, and Q+A is one of the best ways to make use of resources not available to most SEOs: a second opinion from a pair of professional eyes.
While eating a pork pie and taking time to catch up on my RSS... I found this.
Personally I would have started at the robots.txt file, in turn finding an XML sitemap, and in turn finding this:
https://www.birdstep.com/upload/sitemap/sitemap-DB.xml
which shows me that the webmaster has made a mistake: the XML links are all lowercase, giving Google a cue as to what to index. Looking at the server headers, things get silly... so fix the URLs, or at least fix your XML sitemap.
DaveN
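(A quick way to eyeball the URLs a sitemap lists - assuming a standard sitemap protocol file with one <loc> element per URL - is to grep for those elements:

curl -s https://www.birdstep.com/upload/sitemap/sitemap-DB.xml | grep "<loc>"

Any mismatch in case between what the sitemap lists and what the server actually 301s to will show up immediately.)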
When you say the headers get silly, what do you mean?
The client is using a CMS that by default capitalizes all names that get used in URLs. Yes, I know it's silly, as you want all lowercase in all URLs, but since the CMS automatically creates capitalized catalogue names in the URL, the sitemap is generated (by the CMS) to reflect the actual structure of the system.
It's even the case that if you enter /database/ into the URL, you get redirected to /Database/ automatically by the system.
But maybe I'm not understanding what you said needed to be changed.
Sorry if my questions and focus on the actual site, rather than the hidden link, took this far off topic.
Per - I think Dave's just suggesting that since the URLs all resolve to /Database/, you should remove the /database/ (small "d") in the sitemap and replace them with Big "D"s.
Having just modified a site like this, I can say the problem is often more difficult than the original designer may have thought at first glance.
The Category Name needs to feed to the URL href, anchor text and title attribute in all of the navigational links and breadcrumb trails. It may also be needed in the title tag of the destination page.
In the href, spaces and underscores are best avoided, and the words are best done all in lower case. Most punctuation, other than slash, hyphen, colon, or dot is also best avoided.
In the anchor text and in the title attribute, words must have spaces between them, and leading capitals are best, and almost any punctuation is allowed (although quotes and brackets may need to be escaped).
Rand, you stated: "trying to puzzle out whether bad links were pointing in or the pages were somehow cloaking or violating the TOS."
Do you really think that having bad links pointing at a legitimate site will get that site deindexed? I've always heard of people worrying about this, but as best I can tell it's another SEO-created bogeyman. I'd love to see any examples or tests that back up the fear of inbound links.
WeRASkitzzo - Love the Carlin icon!
I agree. I thought another shady SEO could make your site drop in the SERPs with bad links, but not necessarily get it dropped from the index completely.
Haha, it's not a Carlin logo, that's a cartoon caricature of me! It's not that clear at this resolution, but it's definitely the first time I've ever been told I look like him.
I'd question whether they can even get you to drop in the SERPs, not to mention get you deindexed.
WeRASkitzzo: it can be done.
Easy way... buy a Yahoo PPC account for the URL you want to dump, buy real estate on known spam/link farms... and then...
Umm... maybe I shouldn't complete this thought...
(whistles innocently)
A lot of people have claimed they know how to do it but, I'm sorry, I just don't buy it. There are more than a few people in the SEO world that would have no qualms about doing such a thing and I find it hard to believe that more cases of this wouldn't be documented.
I mean hell, if it were that easy to do, wouldn't someone have done it to a high profile SEO site like Moz, or SELand, or Matt Cutts' blog?
What I'm saying is I need proof before I buy into this theory.
*** Do you really think that having bad links pointing at a legitimate site ***
Bad links "in" can get "alternative" URLs for the content spidered and indexed, using URLs that do not exist anywhere in the internal site navigation.
There are many things you can do to protect against that happening, but most people don't actually do any of them.
Don't be shy about promoting the PRO section. I'm glad I made the investment and being able to put questions to the team really is worth its weight in gold alone. Having access to all the other goodies is just icing on top.
Is there a reason why the SEOmoz promotional material for the Pro account doesn't mention the Q+A feature? I had no idea!
Must admit the Pro Q&A is an amazing service that I really don't use enough!!
Still, I find the Pro account so useful I've been paying for it personally (not through expenses or my company) since I joined up!
Thanks for sharing another fine real life issue!
But I'm a bit skeptical about the "period technique". Maybe I understood it totally wrong, but most pages I tried to look up with this search query gave me results. Normally, if you open a page with the dot, e.g. https://www.seomoz.org/., you are "redirected" to https://www.seomoz.org/ with a status code 200 (OK). As I understand it, this is normal behavior:
. = this directory
.. = parent directory
Which means www.example.com/. = www.example.com
Also, seeing results like the ones below, I'm curious whether you mean something special about the SERP you mentioned, e.g. that it has only one result, without a description, and with "no omitted results".
Could you clarify how your result differs from typical results like these:
"https://www.seomoz.org/."
https://www.seomoz.org/. (shows the Web 2.0 award, because it seems to be the most relevant page with a dot, but I believe it would otherwise show the same result as the quoted query above)
https://mail.yahoo.com/.
It's interesting to hear about different kinds of penalties, as you never know where one can hit you. I had always been under the impression that hidden content, when perceived as illegitimate, affects the whole site, not just one page... really interesting...
Rand, wrong again dude - Yahoo! didn't get it right:
https://search.yahoo.com/search?p=www.birdstep.com%2FDatabase%2F&ei=UTF-8&fr=moz2
@ann - I thought they were cloaking when I saw it; I didn't want to say the "C" word, but the headers were different depending on whether you were a spider or a human.
It's a ban in my world.
That's odd - it's listed fine in Site Explorer...
But why would you ban them? They clearly did not do this to gain an advantage or with the intent to game Google - those pages were 404'ing! If Google is saying they strongly consider intent when looking at gaming issues, this has to be one where they'd try to help the site, not ban them.
What content was being delivered to Google before this problem was found?
Has that content been recently removed, hence the 404 now?
SOCO can't make a full analysis, as the crime scene has been contaminated.
Dave, try this search instead, it's not the same thing with Yahoo:
[site:www.birdstep.com/Database -rfijbdrefv]
-Michael
Seeing how this issue has not been resolved, I'd like to post my guess at what is happening. You're probably hitting a bug in IIS6, which can cause the web application to crash when the Googlebot visits. It's probably triggering a 500 error, but your custom error page is handling it. You can find out more about it (and get a fix) at:
https://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx
https://todotnet.com/archive/0001/01/01/7472.aspx
(added: just noticed that Fabio pointed to the same page :-))
"You need to actually be on Port 80 (as Matt Cutts notes in the comments)."
It's less that the cloaking happened on port 80 (every tool was talking to the same port 80 on the webserver) and more that if you telnet'ed to port 80 you'd see the raw dump of exactly what birdstep.com was returning, without following any 301/302-type behavior.
Telnetting to port 80 is handy because you can see things like the raw body text that was returned and the raw server headers that are returned. Things like wget usually just follow the redirect, so you don't see the nitty-gritty details that the web server returned along the way.
Thanks for updating the title/post.
Matt, did they fix that issue? I added a user-agent switcher to my header detector tool, and I'm not seeing what you described:
https://www.bad-neighborhood.com/header_detector.php
It was by user-agent, correct, and not by IP?
mvandemar, it doesn't look like the issue is fixed, because I just checked and it's still doing it for Googlebot. So they are at a minimum still doing something high-risk based on the IP address of Googlebot (added: or they haven't truly turned off the user-agent checking, as John/softplus points out below.)
That's worse than user agent cloaking, isn't it?
Meaning, more dangerous as far as staying indexed goes.
Hi Michael, there's actually a pretty easy way to see what's happening here. All you need is "wget" (open source / free). Just use a command like the following (all on one line):
wget --save-headers -U "Mozilla/5.0 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)" https://www.birdstep.com/database/
You will see the redirects happening and it will download the final page where you can see the robots "noindex" meta tag. At the moment, I see a 301 redirect to a 302 redirect to a page called 404.html returning result code 200 (with the "noindex" robots meta tag). It's nothing exotic, has nothing to do with uppercase in URLs, just some strange cloaking to the Googlebot's user agent.
In general, when a URL shows up with no associated information in the index, that means that we know about the URL but either aren't allowed to or just can't show more. That's usually from a robots.txt, a robots/googlebot meta tag, an exotic x-robots meta tag, or because the URL is just not working for us (returning 5xx or 4xx, timing out, etc.).
Leaving a robots "none" or "noindex" meta tag on a site when it's new or has been re-done is actually pretty common (and confusing to new webmasters).
Actually softplus, it almost looks like wget is doing something that's wrong. For some reason it seems to be appending index.html to the end of what it is trying to fetch, which is why you are getting the 404. There is no index.html.
If I just use telnet (which doesn't add anything), I get this:
Microsoft Telnet> open www.birdstep.com 80
Connecting To www.birdstep.com...

GET /database/ HTTP/1.0
User-agent: Mozilla/5.0 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)

HTTP/1.1 301 Moved Permanently
Connection: close
Date: Thu, 26 Jun 2008 20:43:03 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: /Database/
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 127

Connection to host lost. Press any key to continue...
My tool does the exact same thing, doesn't try to add a default document or anything, and I don't get the 404 error.
(edited to re-format output...)
Ok, never mind, I see it now. It does have to do with case, softplus... the lowercase 301s to the uppercase version, regardless of what user agent you are using, and the uppercase 302s to a 404 page if you have Googlebot as your user agent. My tool is showing it, so no, it's not IP-based delivery. I was just checking the wrong-case URL.
Hi Michael, the "index.html" that wget displays is just the local file name that it defaults to when there is none specified (e.g. for "domain.com/folder/"). It doesn't necessarily mean that the server is using that name; it's just that wget has to use something to save those URLs under locally (where you're running it). It's cool to see you make a tool that helps detect this kind of issue!
It would be interesting to know what information you get if you check https://www.birdstep.com/Database/ (with slash) rather than the lowercase database.
The reason, as I stated above, is that for some very odd reason their CMS (or server settings) requires an exact match on the URL based on the name of the path in their CMS.
In their CMS, at the moment (this is what was renamed since the initial post this afternoon), the database folder is named "Database" with an uppercase D. So when you enter the URL with either a lowercase database or without a trailing slash, the system will automatically redirect the incoming request to the exact match, which is /Database/.
Why it does this I have no idea, as I'm not in charge of the servers (neither is the client, Birdstep), but it's sort of hurtful to read comments like "risky business" and the "what if" answers, as I know that nothing currently acting like cloaking is intentional. It hasn't been like this from the start; someone changed something along the way that started this odd behavior, and I can't seem to find out what it is, as the hosting company isn't responding and hasn't been all day.
The "risky" comment was based on me not seeing what Matt Cutts was seeing, because I misunderstood what he was saying. He made this statement:
In fact you only get the 302 redirect to the 404 page with the uppercase version of the url, and only if you have your user-agent set to Googlebot. That's what threw me, and lead me to think that maybe IP cloaking was involved. I was wrong though.
Really great work on seeing that Michael - we were confused internally about the capitalization issue (hence changing the post twice), so it's nice to see it get sorted.
BTW - You mentioned a tool you're using - if that's public, please feel free to link to it. I'm sure the other SEOs reading the post would appreciate it.
I did, thanks. 10 comments up.
Yup, please add the tool to the post if it goes public!
Your server should redirect an incorrectly-cased request over to the right URL using a 301 redirect (do NOT use a 302 redirect), or else it should directly serve a 404 status code for the incorrect requests.
This ensures that the content can only ever be indexed under one canonical URL.
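A quick way to verify that behavior for both casings (a rough sketch; curl's -s -o /dev/null -w flags discard the body and print just the status code, and adding -I would show the Location header of any redirect):

for u in database Database; do curl -s -o /dev/null -w "$u: %{http_code}\n" "https://www.birdstep.com/$u/"; done

One of the two should answer 200 and the other 301 - never a 302, and never a 200 for both.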
There are several errors in implementation here.
A major one is that your internal links don't point to the correct URL. Users should NEVER hit a redirect when they click an INTERNAL link. The links need to match the real URL.
Another is that you're doing different things for Googlebot and for regular users.
Another is that serving an error page with a "200" status is at best confusing, and at worst, a source of Infinite Duplicate Content.
If you are doing different things for Google, then that is either coded into your script, or into the server configuration files.
Someone needs to own up as to where the problem is, and fix it real soon now.
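On the error-page point above: a URL that doesn't exist should answer with a real 404 status, which is easy to verify (a rough sketch; the path is deliberately nonsense):

curl -s -o /dev/null -w "%{http_code}\n" https://www.birdstep.com/no-such-page-xyz/

A 200 here means the error page is being served with the wrong status code - the soft-404 / infinite duplicate content problem described above.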
So many good answers from you in this entire post that I don't know where to reply, so I will just reply here, at the last one.
I know all about using the 301 (not the 302), using only lowercase, and setting up IIS to handle upper/lowercase the same way Apache does (that's why I don't like IIS).
The problem is that I don't control any of this.
The good news, though, is that the user agent cloaking issue should finally be fixed. The CMS supplier had released a hotfix to sort this out, which the people responsible for the server had failed to implement (yeah, I know, duh).
The issue with the 301 from d to D is still there, but they know about it and are fixing it.
Thank you again, everyone, for your great responses and knowledge, and sorry again for indirectly causing the three renames of this post. Very embarrassing...
I think a lot of people learnt a lot of new things during this investigation... and you got some great advice direct from, not one, but TWO Googlers...
I hope a lot of people learned a lot, because I certainly did. I learned a lot about the CMS the client was using, but I learned even more about the will and effort the Google people put in, making the net a better place by participating in this post, and for that I salute them. I truly salute them for giving their time - where else can you get that but here, at SEOmoz? Thank you all for this great post after all, and thank you, from a humble, simple consultant, for explaining why I should stay so humble.
/M
My developers frequently make changes without my approval, which forces me to thoroughly and constantly quality-check our sites for issues that appear harmless to the not-so-SEO-savvy developers.
I could see how making the image link a single pixel image may have seemed to be the quickest temporary solution to a design problem. I still find it hard to believe that it was truly that harmless, but I may never know.
As for SEOmoz PRO membership, it's great. Promote it like crazy, because it is truly useful. My mentor led me to SEOmoz.org, and it has played a critical role in the success of our company and my personal growth.
@Matt Cutts - Thanks for pointing out the real issue. Regardless of the issue, it appears that SEOmoz led to the solution. The question led to a reasonable assertion by Rand's team, which then caused this post. This post peaked your interest on a great blog that is graced with your presence, and the real issue was revealed. Props to SEOmoz and Matt Cutts!
peaked ---> piqued
:-)
Great sleuthing, Rand.
It's always panic stations when pages suddenly get de-indexed and it takes a cool head to quickly figure out the cause and find a fix. Well done.
BTW, I'd love to see more from you about the advanced query you mentioned at the end. What's the significance of the added period? Think you can write a post in the future once you've had a bit more time to try it out?
Now I am even more confused, because I did some more digging into this after I got the info about the page being a 404.
In the client's CMS environment they have named the initial folder "database", hence the URL https://www.birdstep.com/database/.
When you check with the user agent suggested above, you actually get a 404 response, even though the page works in a browser.
Initially the client had the folder named "Database", and that's when the page was indexed. So the de-indexing started once they renamed their folder to "database".
Also, if I check the user agent above against https://www.birdstep.com/Database I get the page. URLs shouldn't be case sensitive, so why is this one?
Now the client has renamed their folder back to how it was before the pages started to vanish - "Database" - and if I now check the URL with the bot user agent I get exactly the opposite result: for https://www.birdstep.com/Database/ the bot says 404, but if I enter https://www.birdstep.com/database/ I get the page.
This is exactly the reverse of how they have named their folders. Should a folder name actually create case-sensitive URLs?
Just to check whether this was the case, I added another level to the URL to see what the bot thought of it, and I was surprised, as both variations worked: https://www.birdstep.com/database/Support/ and https://www.birdstep.com/database/support/ both returned the page when I checked with the bot.
This is the user agent I used to test the site based on the entry a few comments above.
Macaper, thanks for the link to the user agent.
I think the lesson from this example is simple - stay away from capital letters in URLs.
When using Apache, case matters, so /database and /Database should return different folders (or 404s if one doesn't exist).
You seem to be using IIS, which is a different kettle of fish.
IIS is not case sensitive out of the box (although I believe there are extensions to fix that).
So /database and /Database will point to the same place by default.
I think what you really need to do is review the CMS you are using and decide whether it is causing more problems for you than it's worth.
There seems to be an awful lot of complex funkiness in there - more than necessary :(
*** URLs shouldn't be case sensitive, so why is this one? ***
Oh yes they should.
"Page.html" is a different URL to "page.html" is a different URL to "PAGE.html" is a different URL to "page.HTML".
Only the domain name is case-insensitive, not the folder or file path.
The fact that IIS isn't case-sensitive should be treated as a BUG. It is a major cause of Duplicate Content.
Apache gets it right, right out of the box.
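The duplicate content risk is easy to demonstrate (a rough sketch with a hypothetical page). On a case-insensitive server, every casing answers 200 with the same document, so each variant becomes a separately indexable URL:

curl -s -o /dev/null -w "%{http_code}\n" https://www.example.com/page.html
curl -s -o /dev/null -w "%{http_code}\n" https://www.example.com/Page.html
curl -s -o /dev/null -w "%{http_code}\n" https://www.example.com/PAGE.html

Three 200s for one document means three duplicate URLs; on a case-sensitive server the second and third would normally be 404s.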
totally...
heh - funny
Rand, some time ago I had the same problem with an SEO customer, and he had the same ASP.NET server setup as this site.
I've used this tool to check how Googlebot sees the page:
https://www.smart-it-consulting.com/internet/google/googlebot-spoofer/index.htm
There are two Googlebot options. Choose "Googlebot-Mozilla-2.1", then click submit. The new page will show "Object Moved"; this is because the server returned a 302.
I solved this problem by asking the server support team to fix it using these steps:
https://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx
I hope this helps.
Fábio Ricotta
Thanks, this was a helpful post. You can fall into the cloaking trap far too easily.
Forgive my ignorance on this subject, but how does this affect affiliate marketing? I.e., many affiliate codes include a 1x1 gif image for tracking. Does including this image violate Google's TOS?
Thanks
It might do, if it is embedded directly into the HTML code.
If it is written out to the browser screen using Javascript from an external file, then the bot will likely never see it.
Does this mean that using CSS display:none on elements like h1s or menu items can hurt? I thought it was quite commonly done on sites that are image-intensive or built entirely with Flash.
Rand, why would I ban them?
It's cloaking. What if - and this is a "what if" - there were two pages:
page one is what the user sees, page two is spam for the engines. And what if, just after that page was banned in Google, the webmaster had removed the spam page, causing the 404 error now? I'm not saying that did happen, it's more of a what if. And what if you ran through more pages with a Google UA and saw some other issues? Not to mention the dupe content issues.
Dave
OK, so this seems to still be causing confusion, and if I'm reading this correctly, then the UPDATES you made to the post, Rand, are still wrong.
What I'm reading (perhaps incorrectly) is that it's not an issue of URL mis-capitalization, but an issue of cloaking - serving something different to Googlebot - and that "something" being served to Googlebot is a 404 with noindex.
Am I right in my interpretation of what John and Matt are saying?
Donna - yep, I had to update a second time because there was an actual cloaking issue happening (it's just not visible unless you check the raw response the way Googlebot would see it).
Great detective work Rand & Jane.
It is unfortunate that shady links can make your site drop in the SERPs... I have seen it happen. I personally would not do it - maybe I have too much of a conscience.
What I have seen is that the site drops in rankings for a week or two and then goes back up.
This technique can be very useful for spotting a URL removed from the index. Thank you for sharing that tip, Rand!
Well it's good to see Google discounting bad links rather than making the whole site suffer :)
For all ye non-believers: I am a very happy Pro member.
Btw, nice Columbo work on that site. Damn Google and their rules 'n' stuff...
Presumably the nav link was the first link to that subdirectory in the markup. Do you think that might have had something to do with the SERP drop?
One question I have wondered about repeatedly is whether it is considered cloaking to mirror a mostly JavaScript/AJAX-driven site with a hard anchor-tag URL version (where the hard links are mostly hidden).
Also, if you disable links with, say, jQuery, is this the same as hiding links on the page?
Any advice would be GREATLY appreciated.
(and to all you Pro-membership fanboys, I'll be jumping on board soon, I just don't have the time right now to use all the tools)