This is not a post about SEO. It is, however, a post about the future of search. This surprised even me – when I started writing this piece, it really was just an idea about building a better review. I realized, though, that finding relevant reviews is a useful microcosm of the broader challenge search engines face. Specifically, I want to talk about three S’s – Social, Sentiment, and Semantics – and how each of these pieces fits into the search puzzle. Along the way, I might just try to build a better mousetrap.

The Core Problem

Product reviews are great, but on a site as big and popular as Amazon.com, filtering reviews isn’t much easier than filtering Google search results. Here’s the review section for the Kindle Fire:

Kindle Fire on Amazon - 10,859 reviews

That’s right – 10,859 reviews to sort through. Even if I just decide to look at the 5-star and 1-star reviews, that’s still 7,208 reviews. If I could click and skim each one of those 7,208 in about 5 seconds, I’d have roughly 10 hours of enjoyment ahead of me (if I don’t eat or take bathroom breaks). So, how can we make this system better?

(1) The Social Graph

These days our first answer is usually: “SOCIAL!” Social is sexy, and it will solve all our problems with its sexy sexiness. The problem is that we tend to oversimplify. Here’s how we think about Search + Social, in our perfect world:

Search/Social Intersection = Sexy

Unfortunately, it’s not quite so magical. There are two big problems, whether we’re talking about product reviews or organic search results. The first problem is a delicate one. Some of the people that you associate with are – how shall I put it – stupid.

Ok, maybe stupid is a bit harsh, but just because you’re connected to someone doesn’t mean you have a lot in common or share the same tastes. So, we really want to weed out some of the intersection, like Crazy Cousin Larry…

Search/Social Intersection minus Crazy Cousin Larry

It’s surprisingly hard to figure out who actually sits at the Crazy-Larry table. Computationally, this is a huge challenge. There’s a bigger problem, though. In most cases, especially once we start weeding people out, the picture actually looks more like this:

Real Search/Social Intersection - Very Small

Even with relatively large social circles, the actual overlap of your network and any given search result or product is often so small as to be useless. We can extend our circles to 2nd- and 3rd-degree relationships, but then relevance quickly suffers.
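To see just how small that overlap tends to be, here’s a toy simulation. All of the numbers (a million users, 500 connections) are made up for illustration; only the 10,859 review count comes from the Kindle Fire example above:

```python
# Toy illustration of why the social overlap is usually tiny: even with
# hundreds of connections, the intersection between your network and a
# product's reviewers is often nearly empty. Population and network
# sizes here are hypothetical.
import random

random.seed(42)
all_users = range(1_000_000)                        # a million-user site
my_network = set(random.sample(all_users, 500))     # ~500 connections
reviewers = set(random.sample(all_users, 10_859))   # the Kindle Fire reviews

overlap = my_network & reviewers
print(len(overlap))  # expected value is about 5 people out of 500
```

With random mixing, the expected overlap is only 500 × (10,859 / 1,000,000) ≈ 5 people – and that’s before we weed out the Crazy Larrys.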

To be fair to Amazon, they’ve found one solution – they elicit user feedback on the reviews themselves as a proxy social signal:

20,396 people found this review helpful

This approach certainly helps, but it mostly weeds out the lowest-quality offerings. Reviews of reviews help control quality, but they don't do much to help us find the most relevant information.

(2) Sentiment Analysis

Reviews are a simple form of sentiment analysis – they help us determine if people view a product positively or negatively. More advanced sentiment analysis uses natural-language processing (NLP) to try to extract the emotional tone of the text.
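At its simplest, sentiment analysis can be sketched as counting positive and negative words against a lexicon. The word lists below are tiny, hypothetical stand-ins for a real sentiment lexicon (which would have thousands of scored terms), and real NLP systems go well beyond word counting:

```python
# A minimal sketch of lexicon-based sentiment scoring.
# POSITIVE/NEGATIVE are toy word lists, not a real lexicon.
POSITIVE = {"great", "love", "sturdy", "excellent", "happy"}
NEGATIVE = {"broke", "terrible", "flimsy", "hate", "disappointed"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: +1 all positive, -1 all negative."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("I love this stroller, the frame is sturdy"))    # 1.0
print(sentiment_score("The wheel broke in a week, terrible quality"))  # -1.0
```

Crude as it is, even this kind of scoring extracts a signal that a bare star rating throws away.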

You may be wondering why we need more advanced sentiment analysis when someone has already told us how they feel on a 1-5 scale. Welcome to what I call “The Cupholder Problem”, something I’ve experienced frequently as a parent trying to buy high-end products on Amazon. Consider this fictional review, which is all too based in reality:

The Cupholder Problem (fake review)

I’m exaggerating, of course, but the core problem is that reviews are entirely subjective, and sometimes just one feature or problem can ruin a product for someone. Once that text is reduced to a single data point (one star), though, the rest of the information in the content is lost.

Sentiment analysis probably wouldn’t have a dramatic impact on Amazon reviews, but it’s a hot topic in search in general because it can help extract emotional data that’s sometimes lost in a summary (whether it’s a snippet or a star rating). It might be nice to see Amazon institute some kind of sentiment correction process, warning people if the tone of their review doesn’t seem to match the star rating.
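A sentiment correction process could be sketched like this: score the review text, then flag it if the tone contradicts the stars. The scoring here is the same toy lexicon count as before (hypothetical word lists, not a real NLP model):

```python
# Sketch of a "sentiment correction" check: warn when the text's tone
# seems to contradict the star rating. The word lists are toy examples.
POSITIVE = {"great", "love", "perfect", "sturdy", "excellent"}
NEGATIVE = {"broke", "terrible", "flimsy", "ruined", "useless"}

def sentiment_score(text):
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def mismatch_warning(stars, text):
    """Warn if a 4-5 star review reads negative, or a 1-2 star reads positive."""
    score = sentiment_score(text)
    if stars >= 4 and score < 0:
        return "Your review sounds negative - is this the rating you meant?"
    if stars <= 2 and score > 0:
        return "Your review sounds positive - is this the rating you meant?"
    return None

# The Cupholder Problem: glowing text, one-star rating.
print(mismatch_warning(1, "Great product, love everything except the cupholder"))
```

A gentle prompt like this wouldn’t stop anyone from one-starring a stroller over a cupholder, but it might make them think twice.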

(3) Semantic Search

This is where things get interesting (and I promise I’ll get back to sentiment so that the previous section has a point). The phrase “semantic search” has been abused, unfortunately, but the core idea is to get at the meaning and conceptual frameworks behind information. Google Knowledge Graph is probably the most visible, recent attempt to build a system that extracts concepts and even answers, instead of just a list of relevant documents.

How does this help our review problem? Let’s look at the “Thirsty” example again. It’s not a dishonest review or even useless – the problem is that I fundamentally don’t care about cupholders. There are certain features that matter a lot to me (safety, weight, durability), others that I’m only marginally sensitive to (price, color), and some that I don’t care about at all (beverage dispensing capability).

So, what if we could use a relatively simple form of semantic analysis to extract the salient features from reviews for any given product? We might end up with something like this:

Sample Review w/ Feature Extraction

Pardon the uninspired UI, but even the addition of a few relevant features could help customers drill down to what really matters to them, and this could be done with relatively simple semantic analysis. This basic idea also illustrates some of the direction I think search is heading. Semantic search isn’t just about retrieving concepts; it’s also about understanding the context of our questions.
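That “relatively simple semantic analysis” could start as plain keyword matching: map each feature to a handful of trigger words and see which features a review mentions. The feature-to-keyword map below is a hypothetical hand-built example; a production system would learn these terms per product category:

```python
# Rough sketch of keyword-based feature extraction from review text.
# FEATURES is a toy hand-built map, not a learned model.
FEATURES = {
    "safety":     {"safety", "safe", "harness", "recall"},
    "durability": {"durable", "sturdy", "broke", "flimsy", "lasted"},
    "weight":     {"weight", "heavy", "light", "lightweight"},
    "cupholder":  {"cupholder", "cup", "holder", "beverage"},
}

def extract_features(review):
    """Return the set of product features the review mentions."""
    words = {w.strip(".,!?").lower() for w in review.split()}
    return {feat for feat, keys in FEATURES.items() if words & keys}

review = "Sturdy frame and lightweight, but the cupholder snapped off."
print(sorted(extract_features(review)))  # ['cupholder', 'durability', 'weight']
```

Those extracted tags are exactly what could power the feature filter in the mockup above.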

Here’s an interesting example from Google Australia (Google.com.au). Search for “Broncos colors” and you’ll get this answer widget (hat tip to Brian Whalley for spotting these):

Denver Broncos Colors (Google.com.au)

It’s hardly a thing of beauty, but it gets the job done and probably answers the query for 80-90% of searches. This alone is an example of search returning concepts and not just documents, but it gets even more interesting. Now search for “Broncos colours”, using the British spelling (still in Google.com.au). You should get this answer:

Brisbane Broncos Colors

The combination of Google.com.au and the Queen’s English now has Google assuming that you meant Australia’s own Brisbane Broncos. This is just one tiny taste of the beginning of search using concepts to both deliver answers and better understand the questions.

(4) Semantics + Sentiment

Let’s bring this back around to my original idea. What if we could combine semantic analysis (feature extraction) and sentiment in Amazon reviews? We could easily envision a system like this:

Reviews with Feature Extraction + Sentiment

I’ve made one small addition – a positive or negative (+/-) sentiment choice next to each feature. Maybe I only want to see products where people spoke highly of the value, or rule out the ones where they bashed the safety. Even a few simple combinations could completely change the way you digest this information.
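Combining the two ideas can be sketched in a few lines: extract the features a review mentions, attach the review’s overall tone to each, and filter on the result. Everything here (the keyword lists, applying one review-level sentiment to every mentioned feature rather than scoring each sentence) is a deliberate toy simplification:

```python
# Sketch of filtering reviews by per-feature sentiment. Keyword lists
# are hypothetical; a real system would score sentiment per clause,
# not apply the whole review's tone to every feature it mentions.
FEATURES = {
    "safety": {"safety", "harness", "safe"},
    "value":  {"price", "value", "cheap", "expensive"},
}
POSITIVE = {"great", "love", "sturdy", "excellent", "bargain"}
NEGATIVE = {"broke", "terrible", "flimsy", "overpriced", "recall"}

def tag_review(text):
    """Return {feature: +1 or -1} for features mentioned with clear sentiment."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    sentiment = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    sign = 1 if sentiment > 0 else -1 if sentiment < 0 else 0
    mentioned = {f for f, keys in FEATURES.items() if set(words) & keys}
    return {f: sign for f in mentioned if sign != 0}

reviews = [
    "Great value and the harness feels safe",
    "Flimsy harness, worried about safety",
]
# Keep only reviews that speak positively about safety:
positive_safety = [r for r in reviews if tag_review(r).get("safety") == 1]
print(positive_safety)  # ['Great value and the harness feels safe']
```

Instead of skimming 10,859 reviews, you’d ask one question – “who liked the safety?” – and get just those reviews back.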

The Tip of the Penguin

This isn’t the tip of the iceberg – it’s the flea on the wart on the end of the penguin’s nose on the tip of the iceberg. We still think of Knowledge Graph and other semantic search efforts as little more than toys, but they’re building a framework that will revolutionize the way we extract information from the internet over the next five years. I hope this thought exercise has given you a glimpse into how powerful even a few sources of information can be, and why they’re more powerful together than alone. Social doesn’t hold all of the answers, but it is one more essential piece of a richer puzzle.

I’d also like to thank you for humoring my Amazon reviews insanity. To be fair to Amazon, they’ve invested a lot into building better systems, and I’m sure they have fascinating ideas in the pipe. If they’d like to use any of these ideas, I’m happy to sell them for the very reasonable price of ONE MILL-I-ON DOLLARS.