Silicon Valley Sleuth has a good piece on items they searched for at Google that people surely don't want revealed: everything from payroll Excel files to passwords to remote desktop logins. It's an unsecured world and Google is just waiting to show it to you. The funny thing is, I wonder if anyone would ever think to try this out on Yahoo! or MSN. MSN has been spidering pretty deeply lately - hitting folders and files that have virtually no linkage whatsoever (possibly based on browser data?).
Quick Example of a Password Uncovered by the Sleuths
How about you? Ever had Google drag something up you didn't want found?
I guess I was raised in the careful old days of attention to security. Unfortunately, when I've occasionally mentioned the secret phrase ("password protecting directories"), it was pooh-poohed with "no one can find it," since the person hadn't linked to anything in the directory.
So, here's a question: when there's no default page in a directory (e.g., index.html), the web server will often hand your browser a linked list of the items in that directory. So, if a browser can get that ...
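For anyone who wants to check their own server for this, here is a minimal sketch in Python. It walks a local copy of the document root and reports every directory that has no default page. The function name and the list of default page names are assumptions for illustration; whether a listing is actually served depends on how the web server is configured.

```python
import os

# Assumed default page names; a real server has its own configured list.
DEFAULT_PAGES = {"index.html", "index.htm", "index.php", "default.htm"}


def dirs_without_default_page(doc_root):
    """Return the directories under doc_root that contain no default page."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        names = {name.lower() for name in filenames}
        # No overlap with the default page names means a listing could be shown.
        if not names & DEFAULT_PAGES:
            exposed.append(dirpath)
    return sorted(exposed)
```

Any directory this returns is a candidate for either a default page, a disabled listing, or password protection.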
Here's the thing: say you have a shopping cart; it has a login page. Sure, it requests a username and password, but why allow that page to be spidered?
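One way to ask well-behaved robots to stay away from a login page is a robots.txt rule. The sketch below uses Python's standard `urllib.robotparser` to test whether a given URL would be crawlable under a set of rules. The rules, the paths, and the `shop.example` host are all hypothetical. Keep in mind that robots.txt is only a request and is itself publicly readable, so it is no substitute for password protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a shop; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/login
Disallow: /admin/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the rules in robots_txt allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://shop.example/cart/login",
                "https://shop.example/products"):
        print(url, is_crawlable(ROBOTS_TXT, url))
```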
Wow, it's really weird that these types of things were spidered. How can they spider an Excel file that's not even hosted? I guess it must have been something stored on a server that was never intended to be crawled... I thought they only crawl stuff that is linked to... weird.
benj - great call - I found this pdf document based on that search that details how Dell's website should construct its security measures - there's tons of gold in those SERPs (and tens of thousands of results).
A simple search on Google for the phrase:
"Not for distribution"
returns quite a remarkable number of results!
Definitely. I have even had new test sites that I was developing and testing get indexed just two or three days after uploading them.
It's really interesting how much information you can get with Google. I know a couple of people who take advantage of this uncovered data and install eggdrops to take full control of big servers.
Robots aren't just following links; in my case they have been querying all kinds of variations of URLs and files inside my site, e.g. index.BAK, index.OLD, index.php.bak, and some default hidden folders from the server have been found too. Not funny.
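If you want to find those leftovers before a robot does, a minimal sketch like this one can list them. It scans a local copy of the document root for files ending in backup-style suffixes. The function name and the suffix list are assumptions drawn from the examples above, so extend them to match your own habits.

```python
import os

# Assumed suffixes, based on the file names mentioned above.
BACKUP_SUFFIXES = (".bak", ".old")


def find_backup_files(doc_root):
    """Return paths under doc_root whose names end in a backup-style suffix."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            # Compare case-insensitively so index.BAK and index.bak both match.
            if name.lower().endswith(BACKUP_SUFFIXES):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```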
You bet.
A large database site can get multiple search engine visits per minute - 24 hours a day. Not just Google.
It's not so much a privacy issue as a database maintenance issue.
We need to make sure the customer experience is not compromised by serving robots all day long.