AOL search data - why am I posting this under Google?  Because AOL uses Google for their search.

I recently grabbed a copy of the AOL search data that was released earlier this month.  The dataset is rather large - 450MB compressed - over 1.4GB uncompressed in MySQL.

This data has been the subject of much controversy and has even resulted in at least 3 people getting fired.

Anyhow, I've been parsing through the dataset and there's some interesting stuff in there.

I'll keep this thread updated with the information that I find...
  1. People use the search bar to type in website names a lot more than you realize.  Out of more than 17 million queries, almost 3.5 million contain .com, .gov, .edu, or .org.
  2. People refine their searches as they go along:
    • dr shermam
    • dr sherman longwood
    • dr sherman vision therapy
    • dr sherman vision therapy fl
  3. Babies search too!  How do I know this?  Witness searches for random characters or symbols.  You can't tell me that people were actually expecting to FIND something like this:
    • 6;p6p5p56ptpptptptppprprpprpprprprp...
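For anyone who wants to reproduce a count like the one in item 1, here is a rough Python sketch. It is not necessarily how the numbers above were produced. The function name `count_domain_queries` and the sample queries are made up for illustration, and the word-boundary check (so that ".community" does not count as ".com") is an assumption on my part; a plain substring match would give a slightly higher number.

```python
import re

# Assumption: a query "contains" a suffix only if the suffix ends at a word
# boundary, so "my.community" is not counted as a ".com" query.
TLD_PATTERN = re.compile(r"\.(com|gov|edu|org)\b", re.IGNORECASE)

def count_domain_queries(queries):
    """Return how many queries contain .com, .gov, .edu, or .org."""
    return sum(1 for q in queries if TLD_PATTERN.search(q))

# Hypothetical sample, not taken from the dataset.
sample = ["www.google.com", "dr sherman vision therapy", "irs.gov forms", "mapquest"]
print(count_domain_queries(sample))
```

On the full dataset the same function could be fed one query at a time from a file or a database cursor, since it only needs an iterable of strings.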
I'm also running an analysis on the words that are in the queries.  I'm curious how many 1-word, 2-word, 3-word, and 4-word queries there are.  It might not amount to anything but it'll be interesting to look at :)  I expect it to take quite a while to break all this data down, so probably no updates until the end of the week.
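
The word-count breakdown could be sketched along these lines in Python. This is only an illustration under the assumption that a "word" is anything separated by whitespace; the function name `query_length_histogram` is hypothetical, and the sample reuses the refinement example from item 2.

```python
from collections import Counter

def query_length_histogram(queries):
    """Map number of whitespace-separated words to number of queries."""
    return Counter(len(q.split()) for q in queries)

sample = [
    "dr shermam",
    "dr sherman longwood",
    "dr sherman vision therapy",
    "dr sherman vision therapy fl",
]
hist = query_length_histogram(sample)
for n in sorted(hist):
    print(f"{n}-word queries: {hist[n]}")
```

Splitting on whitespace is cheap but crude: a typed URL counts as one word, and punctuation stays attached to whatever word it touches.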

G-Man