Update: You can now download the complete list of Google User Data by clicking here.
Google Inc. is first and foremost a data company. In the past, it competed on a level playing field by manipulating publicly available data better than its competition. By doing this, it had unprecedented success.
Enter Web 2.0. Hard drives, processors, bandwidth, and even workers are now all relatively inexpensive. This has drastically lowered the barriers to entry in the search field. As Google’s competition has started to catch up (MSN Image Search) and new competitors have arisen (Cuill), the search engine is looking for some kind of advantage. Since everyone has reasonably equal access to the internet’s content, leaders have been striving to gain access to private data. The most cost-effective way for the engines to do this is to collect data from the people who already use their services. Google has been increasingly serving its users by using their personal data to manipulate public data in individualized ways. These methods are impossible to copy without the necessary personal data.
The Methods Google Uses to Get Data
Click Tracking - Google logs all the navigational clicks (ads, actions, feature clicks, etc) of all of its users on all of its services.
Forms - Along with the data the user enters directly into the forms (username, password, etc), Google logs the time and date and location of submission.
Code From Google Account Sign Up
1. Input type is hidden so user doesn't see or enter data into given field
2. Location to send user after submitting (hidden)
3. Input type is hidden so user doesn't see or enter data into given field
4. User's referrer data is used and sent via the form so Google knows where user clicked "Sign Up" (hidden)
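To make the hidden-field idea concrete, here is a minimal sketch in Python. The form markup, field names, and URLs below are hypothetical stand-ins (the real sign-up code is not reproduced here); the point is only that hidden inputs are submitted along with whatever the user types.

```python
from html.parser import HTMLParser

# Hypothetical sign-up form. The field names and URLs are illustrative
# assumptions, not copied from Google's actual markup.
SAMPLE_FORM = """
<form action="/signup" method="post">
  <input type="hidden" name="continue" value="https://www.example.com/after-signup">
  <input type="hidden" name="referrer" value="https://www.example.com/landing-page">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""


class HiddenFieldCollector(HTMLParser):
    """Collects the name/value pair of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and (attributes.get("type") or "").lower() == "hidden":
            self.hidden[attributes.get("name")] = attributes.get("value")


def hidden_fields(html):
    """Return a dict of the hidden fields a browser would submit with the form."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    return collector.hidden


if __name__ == "__main__":
    for name, value in hidden_fields(SAMPLE_FORM).items():
        print(f"{name} = {value}")
```

The user only ever sees the username and password boxes, but the destination and referrer values ride along in the same request.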
Cookies - Google uses cookies on all of its web properties. Additionally, it leaves advertising (DoubleClick) cookies to track users' movement around the web. By doing this, Google can track individual users on any page that has either DoubleClick or AdSense ads, which includes millions of pages that are not on Google’s web properties.
Unique cookies stored on user's computer from multiple Google web properties
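As a rough illustration of what such a cookie carries, the sketch below parses a Set-Cookie header with Python's standard library. The cookie name, ID value, and domain are made up for the example; real advertising cookies will differ.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value. The name, ID, and domain are
# assumptions for illustration only.
SAMPLE_SET_COOKIE = "uid=a1b2c3d4e5f6; Domain=.ads.example.com; Path=/; Max-Age=63072000"

SECONDS_PER_DAY = 86400


def describe_cookie(header_value):
    """Parse a Set-Cookie header value into a plain dict, keyed by cookie name."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    described = {}
    for name, morsel in cookie.items():
        described[name] = {
            "value": morsel.value,
            "domain": morsel["domain"],
            "path": morsel["path"],
            "max_age_days": int(morsel["max-age"]) // SECONDS_PER_DAY,
        }
    return described


if __name__ == "__main__":
    print(describe_cookie(SAMPLE_SET_COOKIE))
```

Because the browser returns the same ID on every later request to that domain, any page embedding content from it can be tied back to the same visitor for as long as the cookie lives.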
Server Requests Stored in Log Files - Every request made to any of Google's servers (e.g. GET https://www.google.com) is stored in log files. The content stored depends on the type of request. (See ‘normal search’ below for more details.)
Example of a log file
URL - "https://www.google.com/search?hl=en&q=seomoz&ie=UTF-8"
1. IP Address from user making request. This can be used to geo-locate the user
2. Date, time, and time zone offset of user
3. Language of requested result (in this case, English)
4. Search query
5. Operating system of user
6. Browser of user
The remaining information is less important; it details the type of request, the server response, and the rendering engine.
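The numbered fields above can be pulled out of a log entry with a few lines of Python. This is only a sketch: the log line below is a hypothetical entry in the widely used "combined" access log layout, not Google's actual log format, and the IP address and user agent are invented.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Hypothetical log entry in the common "combined" layout (an assumption;
# Google's real log format is not public).
SAMPLE_LOG_LINE = (
    '203.0.113.7 - - [25/Mar/2008:10:15:32 -0700] '
    '"GET /search?hl=en&q=seomoz&ie=UTF-8 HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0"'
)

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Extract the IP, timestamp, language, query, and user agent from one entry."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the assumed log layout")
    params = parse_qs(urlsplit(match.group("path")).query)
    when = datetime.strptime(match.group("timestamp"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": match.group("ip"),                     # 1. can be geo-located
        "time": when.isoformat(),                    # 2. date and time
        "utc_offset_hours": when.utcoffset().total_seconds() / 3600,
        "language": params.get("hl", [None])[0],     # 3. language of results
        "query": params.get("q", [None])[0],         # 4. search query
        "user_agent": match.group("user_agent"),     # 5 and 6. OS and browser
    }


if __name__ == "__main__":
    for key, value in parse_log_line(SAMPLE_LOG_LINE).items():
        print(f"{key}: {value}")
```

Nothing here requires the user to be logged in; every field comes from the request itself.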
JavaScript - Google has small amounts of JavaScript embedded in websites all over the internet. When a user’s browser executes the script in the background, Google is able to learn a lot about that person’s browsing setup and habits (location, operating system, browser type and version, etc.).
Web Beacons - Google embeds small (1 pixel by 1 pixel) transparent .gifs into many of its checkout screens. Just as with the JavaScript, the user's browser downloads the invisible image and, in doing so, sends information about the user's computer to Google.
Example of a Web Beacon (What, you can't see it? That is the point.)
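A web beacon works because the request for the image is itself the message. The sketch below, with a made-up tracking endpoint and made-up parameter names, shows how details can be packed into the image URL and read back on the server side.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical beacon endpoint; the host and parameter names are assumptions.
BEACON_ENDPOINT = "https://tracker.example.com/pixel.gif"


def beacon_url(base, **details):
    """Build the URL a page would place in an invisible 1x1 <img> tag."""
    return f"{base}?{urlencode(details)}"


def read_beacon(url):
    """Recover the details on the server side from the requested URL."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


if __name__ == "__main__":
    url = beacon_url(BEACON_ENDPOINT, page="checkout step 2", screen="1280x1024")
    print(url)
    print(read_beacon(url))
```

In practice the request also arrives with the usual IP address, cookies, and browser headers, so the URL parameters are only part of what the server learns.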
Understanding What Google Does with the Data
Store - Google uses an internal database called Bigtable, spread over approximately one million servers.
Google Data In 2006
Data | Size (TB) |
Crawl Index | 800 |
Google Analytics | 200 |
Google Base | 2 |
Google Earth | 70 |
Orkut | 9 |
Personalized Search | 4 |
(Source: Bigtable: A Distributed Storage System for Structured Data)
These are the sizes of the compressed data in terabytes (1 TB = 1,024 GB). The disclosed total comes to 1,085 TB, which puts Google's disclosed data size at over 1 petabyte (1,048,576 GB). GREAT GOOGLEY MOOGLEY! This doesn't even consider AdSense, Gmail, Google Maps, Street View, Google Images, or other private databases. That is considered a lot of data even now, and these stats are from over two years ago, before the Web 2.0 Data Rush.
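For anyone who wants to check the petabyte claim, the arithmetic is short enough to run. The figures are the ones from the table above; the binary conversion factors (1,024 GB per TB, 1,024 TB per PB) follow the convention used in this post.

```python
# Compressed sizes in terabytes, as listed in the table above.
SIZES_TB = {
    "Crawl Index": 800,
    "Google Analytics": 200,
    "Google Base": 2,
    "Google Earth": 70,
    "Orkut": 9,
    "Personalized Search": 4,
}

GB_PER_TB = 1024
TB_PER_PB = 1024


def total_tb(sizes):
    """Sum the disclosed sizes in terabytes."""
    return sum(sizes.values())


def tb_to_pb(terabytes):
    """Convert terabytes to petabytes using binary units."""
    return terabytes / TB_PER_PB


if __name__ == "__main__":
    total = total_tb(SIZES_TB)
    print(f"Total: {total} TB")
    print(f"That is {tb_to_pb(total):.2f} PB, or {total * GB_PER_TB:,} GB")
```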
Massive Data Analysis - This is a little like Charlie and the Chocolate Factory. We know that a lot of data goes into Google, and we know a lot of useful manipulated data comes out. We just don't know what happens in between.
Oompa Loompas working hard at Google writing pretty primary colored code.
We know that Google has many algorithms to sort and organize its data. PageRank is the most well known. It is also known that Google has many complicated spam filters, duplicate content filters, pattern detection algorithms, natural language interpreters, image recognition software, and loads of other complicated software.
Permanent Backup - The final resting place for data at Google is likely in permanent storage. Google's privacy policies hint that some user data can never be completely deleted because of permanent backups.
Understanding What Specific User Data Google Collects
Below is a list of every self-declared piece of data that Google collects when a user interacts with its many web services. There is likely even more user data gathered by Google that is unknown to the public. Be forewarned, ignorance is bliss. After you read this you may feel inclined to wear a tinfoil hat.
The Comprehensive List of All the Data Google Admits to Collecting from Users
Cookies and logs (described above) are used in addition to the methods listed below to track users. Note: a few of the items below require a user to opt in.
Google (Normal Search)
- Search Engine Result Pages
- Country code domain
- Query
- IP address
- Language
- Number of results
- Safe search
- Additional preferences can include:
- Street Address
- City
- State
- Zip/postal code
- Server log
- Query
- URL
- IP address
- Cookie
- Browser
- Date
- Time
- Clicks
Google Personalized Search
- Logs every website visited as a result of a Google search.
Google's data on me while I researched this article
- Content analysis of visited websites
Google Account
- Used as resource to compile information on individual users
- Sign up
- Sign up date
- Username
- Password
- Alternate e-mail
- Location (country)
- Personal picture
- Usage
- Friends
- Google Services usage
- Number of logins
Toolbar
- All websites visited
- Unique application number
- Sends all visited 404s to Google
- Toolbar synchronization function
- Stores autofill info with Google account
- Sends structure of web forms to Google
- Safe browsing
- Stores response to security warnings
- Stores autofill forms data
- Spellcheck sends data to Google servers
Web History
- Every website visited from Google SERP
- Date
- Time
- Search query
- Ads clicked
- Which service
Translate
- All text sent to Google servers
Google Finance
- Stock portfolio
- User’s stocks
- Number of shares
- Date/time bought
- Bought at price
Google Checkout
- Buyers
- Full legal name
- Credit card number
- Debit card number
- Card expiration date
- Card Verification Number (CVN)
- Billing address
- Phone number
- E-mail address
- Sellers
- Bank account number
- Personal address
- Business category
- Government-issued identification number
- Social Security Number
- Taxpayer Identification Number
- Sales Volume
- Government-issued identification number
- Transaction volume
- Business information from Dun & Bradstreet
- Transactions
- Amount
- Description of product
- Name of seller
- Name of buyer
- Type of payment used
- User trend data
- Web Beacons
- Referrer data
YouTube
- YouTube SERP data
- Registered user data
- Videos uploaded
- Comments posted
- Videos flagged
- Subscriptions
- Channels
- Groups
- Favorites
- Contacts
- All videos watched
- Frequency of data transfers
- Size of data transfers
- Click location data
- Information display data
- E-mail
- Web Beacons for tracking
- E-mail opened or discarded
- Web Beacons for tracking
- Account basics
- Password
- Username
- Location (country)
- Postal code
- Birthdate
- Gender
Gmail
- Stores, processes, and maintains all messages
- Account activity
- Storage usage
- Number of log-ins
- Data displayed
- Links clicked
- Stores all e-mails
- Contact lists
- Spam trends
- Gchat
- All conversations and who they involve.
- When service is used
- Size of contact list
- Contacts communicated with
- Gchat
- Frequency of data transfers
- Size of data transfers
- Clicks
Calendar
- Name
- Default language
- Time zone
- Usage statistics
- How long the service is used for
- Frequency of data transfers
- Size of data transfers
- Number of events
- Number of calendars
- Clicks
- Deletes every 90 days
- All events
- Who is going
- Who was invited
- Comments
- Descriptions
- Date
- Time
Desktop
- Indexes and stores
- Versions of your files
- Computer activity
- E-mails
- Chats
- Web history
- Mixed with web search results
- Content analysis of data on computer for integration into SERPs (opt-in)
- Unique application number
- Application interacts with Google’s servers
- Number of searches and response times
GOOG-411
- Phone number
- Time of call
- Duration of call
- Options selected
- Phone number used as identifier
- Records all voice commands
iGoogle
- Settings stored in Cookies
- Settings linked to Google Account
Blogger
- User photo
- Birth date
- Location
- Frequency of data transfers
- Size of data transfers
- Clicks
- Blogger Mobile
- Phone number
- Associates with Google Account
- Device identifiers
- Hardware Identifiers
Google Docs
- E-mail address
- Number of logins
- Actions taken
- Storage usage
- Clicks
- All collaborators
- All text
- All images
- All changes (previous versions)
Groups
- E-mail password
- Contents of posts
- Contents of custom pages
- Contents of external files
- Account activity
- Groups joined
- Groups managed
- List of members
- List of invitees
- Ratings made
- Preferred settings
Orkut
- Name
- Gender
- Age
- Location
- Occupation
- Religion
- Friend graph
- Hobbies
- Interests
- Photos
- Invites
- Messages
- Orkut Mobile
- Phone number
- Wireless carrier
- Content of message
- Date
- Time
- Everything a user writes
- Every blog post a user reads
Picasa
- Friend graph
- Favorite lists
- Clicks (almost all Google services track all clicks)
- All photos
- Geotags (Exif data)
- People who subscribe to albums
Mobile
- Phone number
- Device type
- Request type
- Carrier
- Carrier user ID
- Content of request
- Maps for mobile
- Location information (GPS)
- Address
- Websites visited if user asks Google to transcode
- Voice commands
Web Accelerator
- Web requests
- Cache of websites before you go to them
DoubleClick/AdWords
- Ads clicked
- Age
- Sex
- Location
- Trends of past visited websites
- IP address
Health
- Medical records
- Doctors
- Conditions
- Prescriptions
- Age
- Sex
- Race
- Blood type
- Weight
- Height
- Allergies
- Procedures
- Test results
- Immunizations
Postini
- E-mail address
- Traffic patterns
- Clicks
GrandCentral
- Credit card
- Credit card expiration date
- Credit card verification number
- Billing address
- Stores, processes, and maintains
- Voicemail messages
- Recorded conversations
- Contact lists
- Storage usage
- Number of log ins
- Data displayed
- Clicks
- Telephony log information
- Calling-party phone number
- Forwarding numbers
- Time of calls
- Date of calls
- Duration of calls
- Types of calls
Google Merchant Search
- Name
- Contact information
- E-mail address
- Phone number
Notebook
- Stores, processes and maintains
- All content in notebook
- Nickname
- Storage usage
- Number of log-ins
Google Web Services That Conveniently Don't Have Individual Privacy Policies Disclosing What User Data is Collected
- Webmaster Tools
- Google Analytics
- AdWords
- AdSense
- Alerts
- Reader
- Earth
- FeedBurner (technically has one, but it is useless)
Search Verticals
- Image search
- Map search
- Blog search
- Book search
- News search
- Patent search
- Product search
- Scholar search
- Special search
- Video search
- Code search
By the way Google...
I found some broken links and errors on your website. On your main privacy policy page, the link anchored with "Video Player" is broken. Additionally, you capitalized your own product incorrectly: "GMail" should be "Gmail." Lastly, the Google Store has text encoding issues on the homepage, and the link to download SketchUp is broken.
Please send my check in the mail (I am sure you already have my address).
Additional Information:
Can you trust Google to obey the rules? - Excellent analysis of the darker side of Google Inc. as a web giant.
If you have any other advice that you think is worth sharing, feel free to post it in the comments. This post is very much a work in progress. As always, feel free to e-mail me or send me a private message if you have any suggestions on how I can make my posts more useful. All of my contact information is available on my profile: Danny. Thanks!
Danny, this is a great list -- Google could probably use an all-in-one page like it as well. And honestly not trying to excuse them from anything -- they DO collect a lot of information. But how about doing the same thing for Microsoft next week? Because if you do, I'm virtually certain you'll have a list that is as long or longer. And that perspective is important. Without it, Google gets painted as this massive data collector when in reality, they're doing just what Microsoft (and Yahoo and lots of companies) do.
I guess it's just inevitable that Google is going to be the lightning rod, however. We need better protections, and people continue to be so freaked by Google that maybe they'll drive it overall.
Just wish people were as concerned about you know, my credit data apparently being free to be sold all over the place. I'd like to have an overall privacy protection system put into place.
DD - great post!
DS - I couldn't agree more . . . the MS & Y! perspective is definitely needed here. Also, in a variation of what has been mentioned in the comments above, I think most people understand that they are “paying” to use Google services in one way or another. They might not understand the extent to which their personal information is being used as collateral, but I think the idea that ‘nothing is free’ is fairly pervasive in today’s society.
I make this point only to highlight general societal acceptance in this area. Most people just don’t mind that this is going on. I for one, see Google as more trustworthy and secure than the government and am happy that this is being innovated in the private sector.
Good point, Danny. I was thinking the same thing as I wrote my first comment. I mean, Microsoft may have even more personal access to our private data, or to even larger data sets. I guess that more than 99% of the largest companies rely on Windows XP. Aren't there "secrets" even more valuable than our "little" credit card numbers?
Well stated Danny, and I agree wholeheartedly.
Google is not alone in their quest for information. Every business and every person in the world is seeking it. It's just that none have come nearly as close (with the exception of the Federal Government), to capturing and organizing so much of it.
The leader will always be the lightning rod and there's no question - Google is the leader.
I bet it would be even harder to find out what Microsoft collects. I also bet you it would elicit a much more emotional reaction than Google.
Great article Danny! thanks a bunch!
I totally agree with you. Many people have a very soft place in their heart for Google, and a very uncomfortable place for MS. The reactions I'm guessing would be quite different although it seems that Google knows a lot more.
Get out the pitchforks and torches! It's time to storm the castle! ;-)
Exactly, I do think it would go something like that. Good for Google though, they've got a lot of the public completely smitten. Makes it easier to operate a business like theirs.
I wish people were a 10th as smitten with my company as they are with google lol. Oh well, with time and effort comes great reward. :-)
It's true, I love the Google. Maybe I have just been lucky but everyone I have worked with there has been awesome too, which reflects on the company themselves.
True, Microsoft has made so many of us mad, especially those that bought Vista when it first came out, so it may be a more emotional topic, but it might be fun to see what people have to say about it.
That’s a good point. When it comes to credit data, that’s one thing... but giants like these don't really concern me when they collect this data; it's the smaller guys that aren't quite as savvy. Then again, if the VA can be compromised like they were a couple years ago, anyone can.
Microsloth, really evil... Google.. happy and good..
lol.
The fact that Google is tracking all this stuff is inevitable in the evolution of mankind. Perhaps they'll do it too quickly and the world will freak out and put the kibosh on it or maybe they'll keep it in check and the public will slowly embrace it.
Now for a religious take on the whole thing . . .
The Book of Life
It may not be in the form that some guy from 2,000 - 4,000 years ago understood the internet to be but give it another 30 years and I think we'll have the Book of Life pretty much covered.
Am I concerned? Nope.
Why not? Because I am who I am. Sure there are things I don't want people to know about me but at the end of the day, I am the one that has to live with those decisions (and I've made some hefty bad ones). How many things do you do in the shadows when nobody is looking?
What about privacy? Well just about every religion of the world tosses that to the wind and 98% of the world believes in some form of omniscient being. So . . . 'god' is going to know everything anyway and the world will know sooner or later anyway come 'Judgement Day' or its equivalent for other religions. What difference does it make? Personally it's just a matter of time.
What about identity theft? Yeah, we better get on that one people. Because it's going to get easier and easier. Of course, the 'mark of the beast' would make it easier to prevent such personal violations from happening and we'll go there eventually but maybe not with the numbers '666' on our foreheads (again context people). BTW, my great grandparents thought a credit card was the mark of the beast, I can't imagine what they'd think of a thumbprint to pay for groceries--yes it could easily go there.
I'm an Atheist that finds religion fascinating and believes strongly in the prophecies of the past from many different religions if context is applied properly (i.e. the writer's ability to understand today's time). I used to be many religions growing up but had my strongest belief in the LDS (yes, Mormon) religion from age 15 - 21 (to the point I would now consider it fanatical).
I haven't attended church in 10 years. So take me with a grain of salt if you wish. ;-)
Brent
(here come the thumbs down, and a lesson in public relations--never discuss religion)
That's good Brent, funny, was going to take that same direction in my earlier post, changed mind, wasn't sure how it would fly here. So I thumbed you UP, for being braver than I was...
Thanks! This is a pretty open, honest, and forgiving community. We'll see what happens though. Religion is touchy. Let alone religion from a former-Mormon now Atheist. :-)
Brent
"The fact that Google is tracking all this stuff is inevitable in the evolution of mankind."
Not quite sure what you mean by that Brent, it's definitely inevitable in the evolution of online business and marketing though ;)
I totally agree with what you're getting at about people accepting such a big change however; it takes a long time for the general populace to accept and understand something like this.
In the still slightly paranoid global atmosphere of '08 it's perfectly clear why people are worried, but open discussion is the best way forward. This seems to be where Google does fall down, with their slightly obscure and sometimes evasive explanations.
Brent,
That's a fantastically thoughtful comment. Thumbs up.
I will say, with regard to privacy and the exploitation of this information in damaging ways, my biggest concerns are not so much exposing an individual's "dirty deeds" as invading his/her personal privacy and the privacy of corporations.
Very well said.
Just a couple of things. First - the competition has in no way, shape or form, started to catch up to Google.
Having said that - great post Danny.
It would be INCREDIBLY NAIVE to discount the potential harm that any company with this vast amount of both public and private data, organized in the way Google has organized it, could do.
Am I saying Google is evil? No. Do they have the potential to be evil? Yes. Does the potential exist for individuals at such a company, with insidious ideas, to exploit that data in damaging ways? Without question.
Could the government potentially force Google to hand over data that they could use to infringe on private citizen's rights? ABSOLUTELY.
Call me paranoid, but I find alarming the extent to which Google is endeavoring to know every minuscule piece of information about everyone and everything in an effort to exploit it for their own profit.
Information leads to knowledge. Knowledge is power. George Orwell said:
Is Google a dictatorship in the making?
Very interesting stuff... I like the post Danny although it doesn't come as a great surprise.
It's funny you should ask if Google is a dictatorship in the making Sean. While Google may well be a trustworthy entity now, and I do believe as Rand said they have strong beliefs about how this information should be used (and protecting people's privacy), Google is a business. All it would take is one significant staff change for morality to be cast out the window and that wealth of information to be abused.
The thing I find re-assuring however is that like any business, I don't think Google is as organised as it appears from the outside. While all this information appears to be kept under the umbrella of 'Google', the company is spread across the globe, and there is no 'man behind the curtain' pulling all the strings.
Then again, maybe that's just what they want me to think...
That's the thing. We may not always be blessed with leaders who are as wise and benevolent as the current ones.
This is a truly insightful voice. It's the possibility of data connectedness that makes us uneasy. No matter the moral qualities, no matter the behavior, the problem is that there is an opportunity for things to get quite odd. The whole discussion is about what power exists to use the data. If there is a culture of not discriminating against people for what they think, where they come from, or how they behave, there is no problem with data processing. There is always a problem with concentration of power. Power corrupts. People conform. Anyone care to argue?
Great Summary... Nice job..
You might add Feedburner. A data treasure trove. They get notified right away when someone posts content, they know who is getting subscriptions, who is getting syndicated, and who is getting clicked.
Google bought feedburner not to provide a cool service, they bought it for the data.
Great point, I am adding it right now. Happy to see you back on the blog. I really enjoyed your earlier work on this site.
I've always thought of Isaac Asimov's story The Last Question when it comes to Google.
Basically, humanity winds down over the centuries and each time a civilization dies, their information is entered into a giant computer. People keep asking this computer "Can entropy be reversed?" and the computer never has enough information to answer until the very last person inputs their information.
I don't want to give away the ending if you haven't read it, but please do. It's awesome.
Given what little I know about you, I would have thought you would be a bit put off by that ending. Now I know you're a fellow fan of classic Science Fiction. Cool.
Surely the little you know about me includes the fact that I always like to be the opposite of what people expect me to be? :)
I discovered Isaac through his Black Widowers mysteries, and then went on to read all his sci-fi. This was probably in junior high school...anyway, I'm a big fan of his. And Rishi is too - hi, Rishi! *waves*
Black Widowers rocks! :P.
Hi! *waves back*
I have been looking for this! I read it about a year ago and didn't bookmark it. This story creeps into my mind every once in a while. Thanks for linking!
If anyone hasn't read this, I highly recommend it! It isn't too long and will stick with you for a long time.
That's a ton of good information, thanks. I tend to be fairly comfortable with Google knowing all that stuff about me. Like Rand says, they seem to take security seriously. I was trying to remember: have there been any major data breaches at Google that we know of?
The voice part does surprise me, I guess the take on that is don't use Google Talk to:
Actually since they know this much I wish they knew more. Could save me a lot of time. Like when I go out to the garage and stand there not remembering what I went out for, could just pull up my thought history on the cell phone and search the last ten minutes. I don't care if they show me ads for screwdrivers on Ebay, that's what I went out there for.
I suppose there are governments that don't even have that kind of data on their populations. If Google were to rule the land, I would probably think differently about them knowing all that, and I sure don't want them knowing my thoughts.
LOL! Love the cell phone to recall thoughts comment. I so wish I had that feature.
We could call it G'Thought.
I'm kind of along the same lines of thinking.
With Google having all of this information, they seem to be doing a rather shitty job of running my life.
Knowing what they know about me - they should not only be filling out my Google calendar, but answering all of my gmail, programming my DVR and ordering my twice weekly Chinese food.
Really like the thought recall option....wouldn't want most of that info getting out though :P
It takes a lot of "smart" to process and use that data in good, useful, interesting ways, but I think that's where Google can really excel. As more and more of their processes get exposed, they can, as you said, shift to more private, un-knowable sources of information to help their security through obscurity system.
I can't say that I'm really that worried about what Google will do with the data they collect. Generally speaking, I have a lot of trust that the people who work there have strong, personal feelings about data privacy on the individual level - which is really good for Google users and customers. My only wish is that more of their data were exposed in public ways, so that the world's business and educational communities could benefit, too.
Rand, I remember at your seminar last year you said something about Google only using data in the aggregate, not on an individual basis. Is this something they've explicitly stated? Does that apply to all their data, or just some kinds?
Rand,
I agree. I talk to a lot of people at Google and all of them tell me the exact same thing (even after several beers) that Google isn't nearly as perfect at making all the different cogs work together as people think. Plus, they make it as a policy to not create conflicts for themselves. It's just not a good long term strategy and Google is in for the long term.
I do think Google is a public relations machine though. I think they could say that there are no primary colors in their logo and people would believe it because they have such high trust in the community. That type of power is scary as hell but no different than what any other organization or government could have or a community.
I worry about the future of Google. I have seen a lot of really good quality people leave Google. Good quality ethically. What happens when the founders are gone? Will the company stick to their motto of 'do no evil' or will that line get grey? I trust the Google of today but feel they could violate that trust with the community tomorrow.
Personally . . . my life is fairly open book (my son being the caveat as I believe he should have the right to choose how open he wants to be with his life versus me posting everything about him before he is old enough to make that decision) so what Google knows or shares with the world will probably never concern me.
Brent
Exactly!
Great post, thanks for organizing all of this information into one central location. None of it comes as a huge surprise, but people should be aware of what is being collected.
I wouldn't mind the data collection nearly as much if there was a way a user could opt-out of data collection completely, and delete all the information on them which has been stored. Facebook takes a lot of heat for not providing this feature, why is Google immune?
I don't expect most of this to be a big surprise. However some of these did pop out at me.
Crazy stuff.
BTW I should be able to make a PDF with this info available later tonight.
I wonder if they have recorded enough of Natalie Portman's voice for me to get my GOOG-411 responses back in her voice. That'd have me use it more often . . . oh wait I already use it exclusively.
Brent
I guess I should have figured that out, but it never dawned on me that Google Desktop would collect data on local files.
Guess I better clear out all of that info on my tax evasion :P
Ah, you see, Google is a really powerful company with great lawyers and access to all the data in the world. They can... get away with a lot of things.
Question to the moz community:
Should I reformat this post to have the long list as a separate PDF? This would remove the long scroll but might kill the impact. Thoughts?
I would keep the list and add a PDF. I didn't mind scrolling but a PDF is useful too.
Danny, you are just blowing me away with your awesome posts - and one thing I appreciate is that you're not just sitting down and riffing on some subject (not that I don't like that kind of post too), you're putting a lot of time and research into them.
Since you are now an expert on all the types of info Google collects, are you able to speculate about how (or if) they use all of it (or some of it)?
Yes. That's my vote too.
Yes, please look into your Google Crystal Ball!
I agree with Lorisa.
Keep both. The list has a good impact value to it.
Yes - keep both. The impact is very important. People might skip opening a pdf thinking they know it all, but without all the research that you have put in, I suspect most don't. I certainly didn't.
No, please don't reformat.
Excellent post Danny! I think having the long scroll adds to the impact. Definitely keep the list in the post with a link to the PDF.
Kudos!
It reminds me of the courthouse scene from Arrested Development:
George Sr., never having heard the charges listed consecutively in one sitting, panicked and ran with great intensity.
I'm having a similar reaction seeing all this stuff in this giant list running all the way down the page.... Makes me want to run for the hills {shudder}
A PDF option would be nice, but I do think the current presentation is effective.
I'll thumb up any AD reference.
You can add a PDF, but keep your post this way. It will be good for people who have read your post to have the full list on hand for later use.
This is an excellent post! I wrote a brief post a while ago and I am planning to write something more extensive about this. You've saved me a lot of research time. Thanks!
I have a theory of why Google collects all this information. Apart from the obvious use for improving their relevancy algorithms (paid and organic) I believe they use it for data mining.
In English, this means they are/will be able to predict the future. They will always know where the profits are and will adjust their business strategies accordingly. This isn't necessarily bad for consumers/individual users, but it is for their competitors. Sounds like science fiction, no?
Walmart has been doing this for a long time.
At one time they had one of the most powerful single computers in existence and used it to process transaction data. One of the more interesting findings was that men in their 20s who purchase beer on Fridays after work are also likely to buy a pack of diapers. Stress? Supposedly, they moved the items closer together and sales went way up.
Note: I think my professor lied to me.
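For readers curious what this kind of data mining looks like in practice, here is a minimal sketch of the usual "market basket" measures (support, confidence and lift) for a pair of items. The baskets are made-up toy data for illustration only, not real retail figures, and the function name `pair_stats` is hypothetical.

```python
# Toy baskets invented for illustration; not real Walmart (or any retailer's) data.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
]


def pair_stats(baskets, a, b):
    """Return (support, confidence, lift) for the rule 'a -> b'."""
    n = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_ab = sum(1 for basket in baskets if a in basket and b in basket)

    support = count_ab / n                               # share of baskets with both items
    confidence = count_ab / count_a if count_a else 0.0  # P(b | a)
    lift = confidence / (count_b / n) if count_b else 0.0  # above 1 means bought together more than chance
    return support, confidence, lift


support, confidence, lift = pair_stats(baskets, "beer", "diapers")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.3f}")
```

A lift above 1 suggests the two items show up together more often than independent purchases would predict, which is the kind of signal the beer-and-diapers story is about.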
Wow! It might have been easier to list what Google doesn't know about us! Nice list!
I was trying to think of something they may not know off the top of my head, but it started getting too complicated. Everything I thought of made me think "well, they could probably deduce the answer from when I did this", etc.
Hmmmmm, do they know what I am thinking right now?
Wow, great post. God knows how much work went into that.
Thanks for enlightening me. (Even though it scared the bejesus out of me!)
At the beginning of your post when you were describing the collection methods and basic data collected I was thinking 'big deal - we all do it. I know google has my data.' But then when it got to the very extensive list of services that google collects data on and data that goes beyond clicks, IP and browser stats, that is when the impact hit me. I completely forgot that they bought Postini, which I use to filter my mail. So now they are reading my mail in addition to watching all my customers with google analytics and adwords. And NO privacy policy of those very services! Big brother is indeed watching.
Hi Danny,
This is a great article. I know that it's been a long time since you posted this, but I am doing some research for school this semester on SEO, and especially on the way Google handles search strings.
I am assuming that when you did this research the results applied only to logged-in Google users. As you may know, in December 2009 Google announced on its blog that it had started a personalisation approach even for non-logged-in users. The new feature customises search results based upon 180 days of search activity linked to an anonymous cookie in the user's browser.
I would say that the personalisation approach, apart from raising privacy issues for some people, has some impact on SEO: it might modify the ranking of pages and give priority to pages with real-time content generated by social media such as Twitter or Facebook.
Speaking of privacy, I was amazed when I went to Ad Preferences page on Google website and saw how much Google knows about my behaviour on search keywords. For those who don’t know where this is, go to Ads Preferences.
All that being said, we all know that the cookies can be disabled in the browsers to skip all these.
Cheers,
Ramtin
Hi Buddy, first of all I am heartily thankful for this post. I also heartily salute your imagination and the effort you put into researching all this data. You are perfectly right, because everyone thinks Google is the best, forever and evergreen. But in my point of view, when we use anything we must know about its good side and also its bad side. Everyone tells me Google is the best in all aspects, but when I read your post I found that Google also has some bad points. I will tell you one thing: Google traces all this information, but it is sometimes useful. Take one example: when a terrorist attack happens somewhere, Google can help us find that specific place. I understand all your points, but tell me everything about cookies. What is the meaning of a cookie? I know nothing at all about cookies. I also want full information about Google's good side. In which areas is Google most useful compared to all the others? Our Sir always says one sentence to us: "Google is the God of our Future." But today I think that is sometimes wrong. This is a marvelous post; try to make an extra marvelous one. Keep it up, Danny Dover.
Great timing on the article. Regardless of the privacy issues, it also helps to show where they can obtain data for the new Ad Planning tool.
I've always thought the biggest threat to Google's advertising empire has been the ISP (not only do they get to sniff all your web usage, they know who you really are and where you really live, and can combine online data with offline data aggregators). Now after seeing this post I'm thinking if Google ever partnered with the ISPs we'd see targeted advertising leap to a whole new level (good or bad depending on your perspective).
Thanks for putting this all together. It's amazing how much they do admit to gathering and makes you wonder what they aren't telling you.
The scariest part of it all is not knowing what they are going to do with all this information...... use it, sell it, rent it, have it taken from them?
That's an awful lot of information.
Hat tip to Danny for one of the best seomoz posts I've read.
Kudos Danny for your outstanding compilation of all the personally-sensitive-information that Google collects. On my blog, I linked to your post and also added some additional important organizationally-sensitive information that Google collects that many have not thought of. https://www.precursorblog.com/content/j-edgar-google-information-is-power-no-accountability
I am Scott Cleland of Precursor LLC and Netcompetition.org.
Many said they trust Google to store this level of information because they inherently trust the company. But even if you trust Google with your data, they are not the only concern.
Today's ruling about YouTube being forced to hand over a list of every YouTube user, their IP, and every video they have watched should hammer home the point about why such databases should not exist:
https://www.techcrunch.com/2008/07/03/judge-protects-youtubes-source-code-throws-users-to-the-wolves/
I trust this ruling will be overturned, and that Google will take a stand, but how long until one of these rulings stands?
(PS: sorry to comment on an old post)
Thanks for taking the time to put all of this data together Danny.
Google is singlehandedly responsible for raising my IQ by a significant amount. I process data faster, more accurately and I waste less time. Now they may turn round and bite me later on in life but right now they are worth all that data they have on me. I'd like to own it myself so I could trade it with Google as an intermediary but I trust Google more than say the accuracy of friends or family to recount what I am about. Sometimes people I know think they know me and they know nothing at all. Google on the other hand has delivered time after time after time for over a decade now.
Just another perspective.
After reading this I took some time to really evaluate this without responding off the cuff. Does it concern me? Does this situation have the potential to be devastating? Does it make sense to be upset by all this? Is it really a good thing or a bad thing?
Even when answering these questions in my head, it's all just speculation. Anything has the potential to be devastating and anything has the potential to be good or bad.
To be completely truthful, it doesn't really concern me if they have a privacy policy or not because there isn't much they can find out about me that I wouldn't willingly tell someone anyway so if it comes down to, 'well they didn't ask, they just took'... so what?
I applaud them for finding ways to track what their market is doing. As someone who is always harping on our execs to gather as much data about our customers as possible, I think the innovations Google has come up with should be examined and followed. If I could find a way to monitor the content of my customers' most read and most responded-to emails, you better believe I'd be doing it. If I could find a way to peer into the future and see what my customers are planning to do, or what notes they are taking on what to write next... hell yeah, I'd be tracking it.
If anything, we should be looking at the tools they have created and finding ways to emulate on a smaller scale rather than debating the fear factor of this potential data collection.
I agree with you in that I don't really have anything to hide. I don't care if they know where I shop online, or what sites I visit on a regular basis. But items such as credit card number, ssn...that's a different story.
My first reaction is to say that I agree but as I posted above, what harm do you foresee if they hold your credit card info? Do you save your info with Amazon, ebay, iTunes, or any other site? I can think of at least 4 different sites that I purchase items from on a regular basis that have my info saved. I don't think twice about it. What makes Google any different? Because they may extract it from their toolbar's auto-fill function? Would it be more ethical for them to stipulate that this info is being collected for use in developing better products and services... sure.
In the grand scheme of things I just don't see it being that huge of an issue.
I see your point. I guess my only problem (and it's not something that keeps me up at night) is that I 'give' my cc info to those other sites. Knowingly. And I'm not even saying it's an unethical practice. That's a pretty big discussion in and of itself. I suppose in the grand scheme of things it's not that big of a deal, but I wonder how long it will take all of these little things to add up. Then before you know it every move we make is being tracked, both online and off. Okay...so that's a little off track but that is just the basis of where I'm coming from. Basically a slippery slope perspective. That's all.
But again...I really do see your point as well. I regularly shop on several sites that have my info stored as well and I don't give that another thought.
Great post, and not too surprising either!
Since Google knows so much stuff about everyone, are there any guys from the government spying there, to get some info about "suspicious" people? Or maybe they even have direct access to that data, without being there? I could easily think so. Even if Rand mentioned that people working there "have strong personal feelings about data privacy", I don't see how the government would not use that very intimate information.
The government tries to get data from Google more frequently than you might imagine. The New York Times briefly mentioned one such instance yesterday.
Fantastic post Danny and very well researched.
I have to confess that I'm somewhat emotional about my data - whilst I recognise that it's all pretty secure etc, I just don't like the idea of that much data being stored.
Does anyone else feel like this?
I do kind of feel the same way. At this point my e-Identity is probably more dynamic than I am. Wonder if it's better looking?
Also...what's the deal with the spell checker collecting data? Do they care how bad I type, or how unintelligent I can be without the aid of technology?
I was curious about that too. My best guess is Google uses the user spell check data to compare it against correct spellings on Google servers. They also use it to improve the "Did you mean?" function on the SERPs.
Oh that makes a lot of sense actually. That didn't really dawn on me...thanks for the reply!
I would guess they might also use it for broad matching mis-spellings via Adwords...
I found a better answer to your question here. I was partially right.
I also find it alarming that they have permanent backups. We have data on our customers' activities but we deliberately delete most backups after 30 days and all backups after 6 months. We don't need them and don't see any reason why anyone else should either. But we are not Google.
Super post! It opens up another face of Google. Some day somebody will sell all this info to somebody. Who will that be, and how will they use it? That's the main question.
Jaak, https://seoapplied.blogspot.com/
Wow, Google will soon have more data compiled on us than the government. It's a necessary evil, I guess. The key is to find out how to turn this information into profit for us. I don't mind if Google has all my information if I can tap into that to market more effectively to my target audience.
Big Brother is Watching! I guess I should have known this but didn't realize the details and extent of it. They do make mistakes too - they have a couple of items mixed up in my Webmaster account that I know are wrong.
Care to clarify? Any additional insight would be useful for everyone.
My God, I never thought Google would collect so many details... Thanks for the great information.
Great information and really a great job. As an SEO expert, this information really surprises me, as Google usually bans those sites that engage in blackhat techniques for their promotion, whereas Google itself engages in these hidden methods.
Do they sell this data or just use it to further their own insidious goals? Not that one is any better than the other. Also, when you opt out of that web history nonsense are you saying they still continue to track and monitor your activities?
As far as I can tell they don't sell it, but like I mentioned in the post, Google is very secretive about some of their more intrusive products.
Regarding tracking your web history. Even if you opt out they use cookies to customize your results. Try searching "malpractice" followed by searching for "lawyer". The searches will be linked as indicated by the ads that should say "malpractice lawyer".
Danny, that's based on the browser referrer, not cookies.
Yeah, like you would know better than me? What a N00b ;-)
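To make the referrer explanation concrete, here is a minimal sketch of how a server could combine the current search with the previous one using only the `Referer` header, with no cookie involved. This is an illustration under stated assumptions, not Google's actual implementation: the function names `previous_query` and `ad_keywords` are hypothetical, and the example URL simply assumes the search term travels in a `q` query parameter.

```python
from urllib.parse import urlparse, parse_qs


def previous_query(referer):
    """Return the 'q' parameter of a referring search URL, or None if absent."""
    values = parse_qs(urlparse(referer).query).get("q")
    return values[0] if values else None


def ad_keywords(current_query, referer=None):
    """Combine the previous search (taken from the Referer header) with the current one."""
    prev = previous_query(referer) if referer else None
    return f"{prev} {current_query}" if prev else current_query


# Assumed example: the browser sends the previous results page as the Referer.
print(ad_keywords("lawyer", "https://www.google.com/search?q=malpractice"))
```

If the browser sends no referrer (for example, when the second search is typed into a fresh tab), this approach has nothing to link the two searches with, which is one way to test whether a cookie or the referrer is doing the work.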
of course they sell your data, and of course they are evil
why would any company have that creepy motto of theirs otherwise. just think about it people. it's like a slap in the face to how naive and trusting people are of the absolute leeches these corporations have become.
Thanks Danny. I read you every day,
Excellent work!
People worry about the Government monitoring cell phone calls they receive from Iran (ie: "wire tapping") but Google knows more about you than your own mother!
This is a link-worthy effort unless you get a -900 penalty from Google because of it...
Huh? I thought Google was our government. Guess I jumped the gun by about 3 years... soon enough...
By the way...
Seomoz.org uses google-analytics too...
The best way to go safely and anonymously through the internet is to block cookies (from Google) or save them only for the session.
I use Firefox with "AdBlock Plus", where you can permanently block any site you want (e.g. Google Analytics, AdSense, etc.). Another useful add-on is "Cookie Button".
Use a proxy like "Proxomitron" or "TOR". I think things will be fine...
But on the other hand, everything that is posted here will be saved and maybe analysed to make a user profile. Who cares? Do you?
Yet another rich post, Danny. Correct me if I am wrong, but it's technically impossible for a website to retrieve cookies dropped by another website. For a moment let's think otherwise: imagine websites agreeing to grant access to each other's dropped cookies. I mean Google serving me search results and ads based on the cookies dropped by Facebook, or YouTube serving me music videos based on the search I did on mtv.com.
just a note:
Facebook runs Google ads and mtv.com uses Analytics.
Google can track people or drop cookies in many different locations in many different ways.
Just as a bit of technical advice - a server can only access its own cookies. If Google drops a cookie on a Facebook page, only Google can see that cookie. Google of course can only drop a cookie on a Facebook page if Facebook allows it to. And Google can only see its own cookies if something on that page loads from Google's server.
I think the first bit of the list mentioning JS, web beacons and server requests should come before the mention of cookies. Cookies can't measure anything. All they do is carry a unique ID so that you can link all the JS, web beacon or server requests back together to a unique browser. Google would still measure all these things without cookies, but they'd have to come up with another method of linking it all together (e.g. IP/browser).
Thanks for the clarification. I didn't look at the tracking process from the perspective of the cookie merely being a id marker. Good call
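The "cookie is just an ID marker" point can be sketched in a few lines: the measurements are the logged requests themselves, and the cookie only supplies the key used to stitch them together, with IP plus browser as a weaker fallback. The log records and field names below are invented for illustration; nothing here reflects how Google actually stores its logs.

```python
from collections import defaultdict

# Hypothetical log records; the field names and values are assumptions.
requests = [
    {"cookie_id": "abc", "ip": "10.0.0.1", "user_agent": "Firefox", "url": "/search?q=shoes"},
    {"cookie_id": "abc", "ip": "10.0.0.7", "user_agent": "Firefox", "url": "/ads/click"},
    {"cookie_id": None, "ip": "10.0.0.2", "user_agent": "Opera", "url": "/search?q=maps"},
    {"cookie_id": None, "ip": "10.0.0.2", "user_agent": "Opera", "url": "/beacon.gif"},
]


def visitor_key(req):
    """Prefer the cookie ID; fall back to IP + browser when no cookie was sent."""
    if req["cookie_id"]:
        return ("cookie", req["cookie_id"])
    return ("ip+ua", req["ip"], req["user_agent"])


def group_by_visitor(requests):
    """Link individual requests back together into one list of URLs per visitor."""
    sessions = defaultdict(list)
    for req in requests:
        sessions[visitor_key(req)].append(req["url"])
    return dict(sessions)


for key, urls in group_by_visitor(requests).items():
    print(key, urls)
```

Note that the cookie-keyed visitor stays linked even though the IP address changed between requests, while the fallback key would split (or wrongly merge) visitors whenever IPs change or are shared.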
Google must be spending a fortune on this data collection and analysis !!
So when the Google OS and Google Browser come out, will the TOS agreement include users signing over their souls in blood?
One day Google will know you better than you know yourself.
One thing that concerns me... have you ever heard of Google's servers being HACKED? Does nobody want to do that, or what?
Can you imagine how much this data is worth? Information from Google checkout is already worth a lot of $$$.
I know they are the almighty Google, and no one wants to think this is a possibility...but is it? What are the chances of their servers being hacked? Goodness I hope not. Then the whole world would know how much money I spend on iTunes. Gah!
As is the case with all major networks with internet access, I am sure Google's servers are under constant attack from hackers. Have they succeeded yet? Doubtful. But every minute of every day someone is looking for a vulnerability in Google's security.
Great analysis, but I would like to see what the end impact is on SEO, for example. That would make for an excellent post.
Great post and definitely something to keep my eyes on for the future. Some information, even though you're expecting it, did freak me out a little. As long as the data sleeps on the servers I don't see any harm. However my head is spinning now around all sorts of different ways to use this data for evil (money making)!
It's slightly concerning how google doesn't have privacy policies on some of their services, just makes the mind wander as to what they might be collecting.
Danny, thanks for the informative post! Certainly an eye opener, but not really surprising. In all honesty, it feels like they should be using more information.
Post with lots of juice .. thanks for sharing !!
What's slightly frightening is that there are not a lot of well-known alternatives to Google's data collection. Sure, there are other search engines, but I don't think the average user, unless working very consciously, can avoid giving Google some of his/her information eventually.
Of course, that's not as true for the smaller Google services, for example, Microsoft Office is still preferred over Google Docs, and 1-800-FREE411 is a great option over Goog-411 (we don't collect your info, and we actually offer a wider range of listings). But even for the smaller services, how secure is the information they collect? I feel pretty safe at this moment, but in 20 years will all this data still be un-hackable?
Nice Work Danny!
Of course, Google is collecting data not only from the Google Toolbar, SERPs and Analytics. I think AdSense is used too. We wrote some parameters to use it in SeoQuake (a Firefox SEO plugin), so you can see all Gtrends data right in the SERP or while visiting a site (it looks like the Alexa toolbar). You can get it here https://addons.seoquake.com/params/index.php?sln=en&browse=2&tag=google&res_lim=10
Fantastic revelation (or is it Revelations)!
Danny's
...reminds me of the wind. You can't see the wind, but you can feel its effects and know it's there. (Guess faith is like that, but too many odd and often narcissistic interpretations by men have messed up perceptions, understandably.)
I did keep thinking 666 as I read this and enjoyed Brent's take.
Paradox: Most of us are in the business of making sure we get found on Google. We optimize our sites and our social profiles so we can lay it all bare so we can be found.
Google Trends' privacy statement mentions using a robots.txt file to disallow the Googlebot if you don't want info shared. Not! Are we laying our lives up on the search altar to be burned as a result?
OMG ...hhhawwwww ... :O
just kiddin . . :)
I knew it . . since the day Google launched universal search
Mormon Power. We finally unite.
This list is HUGE, but I'm not surprised. This post is awesome though, and I'm going to have to vote/link to it.
I was once asked by [an anonymous] Google Engineer if I knew what it's like to have a Terabyte of RAM. Of course I don't, that's outrageous! And they have machines (well, networks of machines) like that running 95% system resources around the clock.
He says.. the other 5% are reserved for browsing, e-mail, typing up documents, and playing Minesweeper, which he loves.
No.. do not ask me for a name. You won't get it.
mmmmm.... data.... yummy!