Sometime in January, Google quietly rolled out a change that I believe could revolutionize organic search. Currently, the impact is limited, and it may take months or years for the full effect to be felt, but the underlying shift is fundamental to the future of the Knowledge Graph and the delicate symbiosis between Google and webmasters.
Answer box 1.0
Let's start at the beginning. I've written a lot about the current generation of answer boxes (sometimes called "direct answers" or "one-box answers"). These display quick answers to what are usually concrete questions. For example, if I want to know when the Willis Tower here in Chicago is open, I can search for [Willis tower hours] and get:
Google's ability to understand questions has expanded significantly in the past couple of years, probably pushed forward even more by the Hummingbird update. For example, I can get the same answer box by querying [when is the Sears Tower open].
So, where is this data coming from? Typically, it's coming directly from the Knowledge Graph, and you can spot it pretty easily. Here's the Knowledge Panel for [Willis tower]:
I've added the red arrow – as you can see, the information in the answer box is taken directly from a property in the Knowledge Graph. You can easily reverse it, too, to create endless examples. Let's take the property "Construction started: 1970" and turn it into a query, like [when was the sears tower built]. You'll get another answer box:
Most of this information comes from a very limited number of sources, including Freebase, Wikipedia, and Google+. Freebase is structured in terms of entities and properties (think object-based, as opposed to article-based), which makes it a perfect fit for Knowledge Graph.
Google's dilemma
There's a problem, though. The main sources of data for the Knowledge Graph are curated by people. Ironically, Google is facing the same dilemma with Knowledge Graph in 2014 that led to the creation of internet search engines in the first place. Put simply, the scope of information is much too large, and growing too quickly, for any human-edited approach to scale. Google can't just hire Wikipedia editors – they need a new data source.
Google is hardly blind to this problem. In a research paper published just this year, Google outlines the basic issue (hat-tip to Andrew Isidoro):
The paper goes on to explain a method of extracting missing knowledge graph data on demand, using Google's existing search technology. Welcome to...
Answer box 2.0
Luckily (for them), Google already has one of the largest data sources on the planet – their index of the worldwide web. What if, instead of looking for answers in a limited set of encyclopedic sources, Google could generate answers directly from our websites?
That's exactly what they've done. For example, here's what you'll see at the top of a recent search for [social security tax rate]:
Unlike answer boxes based on the Knowledge Graph, this new format pulls its answer directly from third party websites, giving them attribution via the page title and link. In many ways, this is an additional organic result, and like all answer boxes in the left-hand column, it appears above "#1".
These longer answers look more like search snippets, but there's also a second version, triggered when Google can find a definitive answer on a third-party site. Here's the new answer box for the query [September birthstone]:
This example includes a longer snippet, but the direct answer – "Sapphire" – is highlighted, more in the style of a traditional answer box. Again, the source page's title and URL are shown below the snippet.
How do we know, beyond the third-party attribution, that this isn't coming from the traditional Knowledge Graph? Try a variation on the query, like [september's birthstone]. I get this result:
Here's the answer box for a longer query [what is september's birthstone]:
Interestingly, the short answer ("sapphire") is no longer capitalized, because that's how Google found it on the source page. In my personal testing, these variations weren't consistent, so Google may be using some kind of query refinement. Regardless of that, it's pretty clear that these answers are being generated on the fly.
The new number one
These answer boxes are essentially a new organic result, and clearly disrupt the traditional top results. So, where are these answers coming from, and how do you get one? We don't have a lot of data yet, but in every case I've seen, the URL used to create the answer box also appears on page one of Google results. So, you have to already be ranking well on the term.
In most of the cases that I've seen so far (again, the data set is small), the answer is coming from the #1 organic position. For example, here's the answer box and #1 result I get for [marine corps' birthday]:
So, military.com is essentially getting two listings on this SERP. In some cases, though, the answer is coming from a result lower on page 1. Here's the answer box and part of page 1 for [richest man in the world]:
In this case, Time Magazine gets credit for the answer box, even though it's all the way down in #8, and Forbes has all three of the top organic spots. What's even worse is that Time article directly cites Forbes as the source, even in the search snippet. So, what's going on here?
I suspect this comes down to fairly basic on-page factors. The main Forbes article is a bit design-heavy (it has limited crawlable text) and uses an "infinite" scroll approach. None of the Forbes pages directly mention the phrase "richest man in the world", especially in proximity to Bill Gates' name.
What if I change my query to something that Forbes targets better, like [world's richest people]? Here's the result I get (all of these searches are incognito, but I can't rule out some sort of query history effect):
It's interesting that Google seems to be inferring that I want to know the world's richest person (and is bolding "Bill Gates"), but doesn't feel that the answer is definitive enough to break it out as a short answer. Even since starting this post, Google has made refinements to the matching system, but currently it seems like on-page keyword targeting is fairly critical.
It's just the beginning
Google clearly has a long way to go. Some of the answer boxes are pretty ridiculous. Take, for example, a search for [hair color]:
This is a pretty ambiguous query, and it doesn't seem well suited for any kind of answer box (let alone one that's one step away from a salon advertisement). Expect Google to put a lot of time and money into improving this system over the next year.
While this post is focused on answer boxes, Google is using a similar approach to expand knowledge panels. For example, here's a search for [biology]:
Notice the "Related topics" section – only one of those results is coming from Wikipedia. Google is building a decent chunk of this knowledge panel on sites in their index. The attribution on these is much more subtle – only the small, gray text goes to the source site. The blue links (except for "Wikipedia" at the top) go directly to more Google searches.
Is the balance shifting?
It's easy to see how this progression is inevitable – Google has to expand the Knowledge Graph, and they can't rely on human editors and static data sources. While this data may be good for users, it represents a shift in the balance between Google and webmasters. There's always been an implied symbiosis – Google crawls our sites and extracts information, but they send us traffic in return. We may not always like how they do things, but the end result has benefitted millions of site owners.
What happens when a user can get a simple answer quickly, and that answer is extracted from a third party page and cannibalizes the organic clicks? What happens when third-party data is being used not to drive traffic to the source, but to more Google searches? It seems to me that the symbiosis is threatened.
For now, there's not much you can do. You can work to retune your on-page content to appear in these new entities, but you do so at the risk of harming your own organic traffic. It's probably better to be in the answer box than let your competitor be there, but it's hardly an ideal choice. The best I can say is to be aware of your money terms – not just how you're ranking, but how those SERPs actually look in context. At some point, we may all have to decide if giving away our data is worth what we get in return.
I get that Google is trying to answer the search query for the user, but when Google is using content that it doesn't own and didn't create to do so, it starts to take on a different face. Is this really different than scraping content off websites and putting your name on it? Regardless of the benefit to the user, attribution needs to be made and potential royalties paid. How is this different than what Napster did to the music industry?
While I agree in principle, much of the information is from open source Wikis, so a reference/attribute would suffice.
In many examples it's showing where the content is scraped from, giving those sites exposure they previously wouldn't have got. I'm more concerned about click-through rate dropping and this not giving users the opportunity to delve deeper into other content on those pages.
So if a site scraped tons of their content from other websites but gave attribution, do you think Google would penalize it?
Yes, of course. I see your point, but essentially the goal at Google is to provide the most relevant content as easily and quickly as possible. They seem to follow a 'Do as I say, not as I do' policy at times, but like it or not this appears to be the direction Google is heading in, so now it's about understanding how to get the most out of semantic search.
Consider this from a monetary standpoint. By serving up information IN the SERP via Knowledge Graph, Google is forgoing the potential revenue they could have made from AdSense & on-site advertising by sending the traffic to that site instead of "borrowing" information from it for the SERP.
The problem, IMO, is that this attribution is a lot weaker than an organic result. If a source is just listed under an extracted answer, and that answer solves the problem, then the click is probably lost. Is it good for users? Probably, but is it right? I think they're coming close to crossing the line.
I never click when I see the answer. A few years ago I clicked on the Bayern München homepage to see when the next game was - now I google "bayern spielplan" and see when and where it is.
It is fast - but good?
Surely, it's a fast and instant way to get some answers, but it needs some time to get to a level where we can call many. From the time of release up till now, I think Google has mainly used its own content rather than scraping other websites' content.
There is another point for content providers to think about: would the length of the content being provided affect the search carried out by bots on a particular page?
I think a lot depends on what answers are being provided, and as things stand there's a limit on what it's possible to provide.
If the intent of a search is simply to access some basic fact, the most that searcher would do in the absence of Knowledge Graph is hit a result or two, scan for the answer, and leave. I'd argue that's no great loss, and providing easy access to such simple data doesn't in my mind equate to wholesale scraping.
This is exactly what I was thinking - for people who were just looking for a quick answer, yes in the past a small percentage would land on the site and maybe buy something or subscribe when they weren't planning on it, but most likely the vast majority were just in and out. I still believe this is a blow to website owners in aggregate, but a win for users - which is who Google mostly cares about.
I do think that Google should link directly to the source via their snippets and carousels rather than just another search result - that seems counterintuitive and not user friendly.
There is a fundamental difference in Napster and Google.
User-Agent: Googlebot
Disallow: /
If you don't want to be included, you don't have to be.
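As a rough illustration of what that rule does, here is a minimal Python sketch that feeds the two lines above to the standard library's robots.txt parser and asks whether a given crawler may fetch a page. The helper name `is_allowed`, the `example.com` URL, and the second crawler name are placeholders for illustration only. The sketch only reports what the rules say; it makes no claim about how Google treats them in practice.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted in the comment above.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-page"  # hypothetical URL
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
    print("Other crawler allowed:", is_allowed(ROBOTS_TXT, "OtherBot", page))
```

Because the rule names only Googlebot, crawlers with other user-agent strings are unaffected by it.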
This is similar to what they've done for years. Under each search result, Google shows the most relevant part. Nevertheless, no one has stopped or will stop going to websites. The people who just spend a few seconds reading the answer from the Google Knowledge Graph would have only spent a few seconds on the website anyway. Anyone who wants to spend more than a few seconds will go to the website.
Hi Pete,
This article is a matter of interest for all webmasters, and definitely for the organizations that have been promising their clients one of the first five positions in Google's rankings. The fact is that Google has devised a way to beat all the efforts webmasters make to get that #1 position. One question that immediately comes to mind: will this reduce the value of ranking on the first page, or in the first position, of the search results? Well, we will need to wait and watch.
The other thing that is of concern is that if Google is delivering results from third-party data sources, what authentication criteria will it consider when selecting the best answer? We all know people fake data and figures on their websites to attract traffic, and this might lead to unsatisfied or suspicious users for Google. I hope Google has this factor in mind and will keep data authenticity as one of the main areas of concern in future updates.
Google is always a great case study from the day it came into existence.
They have a ways to go before they get this right. They're starting to get into MY area now, and I'm seeing all kinds of weirdness. For some queries, I show up in the box; for others, they show incorrect answers from sites like about.com above results that actually have the correct answers (and normally rank first or second).
Here are screen shots for a few examples. A commercial site is ranked above navypier.com for Navy Pier events; and the Cleveland and Detroit screenshots don't even answer the question. There are others where it gives 2013 information for a 2014 query. It's a mess.
https://imgur.com/mH4k5vC
https://imgur.com/9IyhKaT
https://imgur.com/2YijFsd
While I agree that the second and third results are awful, it's fairly obvious why Navy Pier isn't showing for its own event.
The Navy Pier landing page is awful from an SEO perspective and is really not super relevant to the query. It does have the answer to the question in it but it's not written out. It's buried in a table with tons of other dates and times. The page never even says July 4th, it just says July 4 and Independence Day.
Also, the title tag is a decently sized sentence at 161 characters long and has a typo in it. It also doesn't even mention fireworks in the title tag, which is funny, because the page is about fireworks.
Hello,
Thank you, Dr. Pete, for a detailed post on this topic. Well, I'd say the answer box is one of the best things from Google. I'd like to share a very good example here.
I'm a cricket fan, and there is a Twenty20 World Cup going on. Yesterday, when I searched Google to find out the status of my team, I was really surprised by the result.
The answer box not only showed the result of the last match but also showed the next matches to be played, with country flag images, time, date, venue, everything. Is this the latest form of answer box?
Thanks,
P.S: The search query was this.
Thanks, Dr. Peter, for the post!
I just noticed that a few search terms are not matching with the answer box. For example, if we search for "Willis tower hours" or even "when is the Sears Tower open", it does not show what is expected. So, I am just curious to know whether it is region-specific or something else?
I believe that some of this is US-only or at least limited in other countries, but I don't have good data on that.
Great post Pete.
The related topics in the Knowledge Graph are a wonderful way to spot connected nodes in the Knowledge Graph, and they are substantially identical to the In-Depth "categories" that Google suggests to discover (check this snapshot). And that they are the same is, IMHO, quite logical.
But where does Google mainly take the nodes from? In this case I still believe that Freebase is the main source, with what you described (using information retrieval from indexed documents) being a way of filling the empty spaces in the information related to those nodes.
And this is maybe the most explicit manifestation of how Hummingbird works.
Google's research paper describes a process where they "fill" the Knowledge Graph based on search technology. At first, this almost makes it sound like they're trying to back-fill Freebase, etc., but reading deeper, I don't think that's the case. I think they want to answer the questions on the fly. Are they using Freebase as a data source in that equation? Probably. Over time, though, I think that reliance will fade. They need to be able to construct entities in real time.
Exactly, that's the reason for my Hummingbird reference, to which we should add Machine Learning and Concept Coupling.
The most explicit manifestation of Hummingbird is semantically similar phrases (different keywords) generating nearly identical SERPs for the top 100 results, give or take 5 results that move places.
This is an opportunity. To stay relevant Google has to address a frictionless user experience and keep pace with how people consume information. They need to deliver trustworthy answers and they aren't going to scrape a little from everyone. There is too much risk of bias or manipulated voice in a broad selection of sources for most types of knowledge they want to collect. They are going to curate only from a few of the most accurate/trustworthy sources.
It's a new space: Content knowledge provider. We can't get stuck in how we see the internet as working; if Google doesn't keep pace with how people consume information as a service provider then someone else will.
Sorry, the product you represent shouldn't be the top answer returned as a solution to a general informational query. People want to remove the marketing from answers. It makes them feel manipulated.
You can opt-out of this future. I'm sure Google will protect copyright. In fact, I bet they wouldn't want your content anyhow. (as copyright of the information would ultimately imply bias, as if the information were somehow 'owned' by your company)
Certainly the media channels will go for relevance/learned curation. (they'll get the click-throughs to full articles anyhow, so they don't care if they are scraped) But the object/factual information will need to be strictly unvoiced.
p.s. Thank you so much for this post! One of the best I've read lately. It's more important than I think a lot of people realize.
Thanks for the updates on Knowledge Graph. As a publisher, it definitely is worrying how much content Google is scraping from websites and displaying in their search results for free.
To play devil's advocate though, I don't think this will have a ton of impact on revenues for businesses. These Knowledge Graph listings are mostly showing up on informational queries with low commercial intent. If you think about it, Google also gets the majority of its revenue from when users click out of the SERPs. So if publishers were losing tons of money from missed out clicks because the KG result answered their query, Google would also be missing out on revenue from people not clicking out on Adwords results.
So that kind of serves as the upper limit to how aggressive Google can get with Knowledge Graph results. They can't display them to the point where they go broke because no one is clicking on the ads.
Another cracking post here Pete, and thanks for the H/T.
One of the issues I have had with the Knowledge Graph is its over-reliance on human-edited data. It's the reason Wikipedia can't be used as an accurate, up-to-date resource (at least in academia), and it has already begun to haunt entity entries.
I know we have spoken before about how the Knowledge Graph's expansion is one of the most complex areas of search at the moment yet so few seem to be actively studying it. Great to see you continue to break that mold!
Interesting post Pete.
Andrew - but aren't websites just another set of human-edited data? Yes, they are in general more reliable than Wikipedia, but still full of errors and inaccuracies.
I would argue that some kind of amalgamation of data (from knowledgebase, freebase, answerbox and social networks) with cross referencing for consistency to find the most likely answer would be the way forward. It would currently be impossible to keep an up-to-date perfect database of current information without it being based on human input.
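To make the cross-referencing idea concrete, here is a minimal Python sketch of one very simple way to do it: normalize each source's answer, count the votes, and report the most common answer along with the share of sources that agree. This is only an illustration of the commenter's suggestion, not a description of how Google validates data. The function name, the source labels, and the sample answers are all hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple


def most_likely_answer(answers_by_source: dict) -> Optional[Tuple[str, float]]:
    """Pick the answer most sources agree on.

    Returns (answer, share_of_sources_agreeing), or None if there are no answers.
    Answers are compared case-insensitively with surrounding whitespace removed.
    """
    normalized = [a.strip().lower() for a in answers_by_source.values() if a.strip()]
    if not normalized:
        return None
    answer, votes = Counter(normalized).most_common(1)[0]
    return answer, votes / len(normalized)


if __name__ == "__main__":
    # Hypothetical answers to "September birthstone" from three made-up sources.
    sample = {
        "source_a": "Sapphire",
        "source_b": "sapphire",
        "source_c": "Diamond",
    }
    print(most_likely_answer(sample))
```

A real system would presumably weight sources by trust rather than counting each one equally. The agreement share at least gives a crude signal of when an answer is too contested to show.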
This is essentially what they do in Knowledge Graph, as some of Andrew's tests have shown. They try to validate information across multiple sources (it's not enough just to hack Freebase, for example). With these new entities, though, they've temporarily lowered the bar a bit, and it shows. I think they'll improve and find ways to consider multiple sources, as you said.
Yeah what he said....
In all seriousness, you're correct: humans created the data on the web and there are certainly a lot of errors in there, but as with any corpus of data the strength is in consistency. A few months back Google relaxed their need for validation from multiple sources, and it led to this post on manipulating the Knowledge Graph.
As ever, I have no doubt that this is Google testing scenarios and measuring effect, but in the meantime we are stuck with low(er) quality panels drawn from a dirty dataset.
Hello Everyone
We are having trouble getting a Google Knowledge Graph panel. Can somebody advise me on how to get one and how much time it will take?
We have a Wikipedia page, but our Knowledge Graph panel does not appear in Google search results.
We have updated the schema code, or structured data, very thoroughly for our website.
Thank you everyone for your help and support. All Suggestion Appreciated.
Alex
Nice post Pete..
I have a simple question (I know the answer will be complex, so I am ready for it).
Which factors does Google consider when grabbing an answer from a source (other than domain authority and links), given that the answer can be anywhere or everywhere?
I think it's a combination of authority (every answer I've seen has come from a page 1 result) and how well the content on the page matches the question. Post-Hummingbird, Google can interpret questions better, but this matching is still pretty crude right now. Your content needs to answer the question. I'm not seeing any clear evidence that schemas and structured data are heavily involved.
I checked the source of many of the URLs cited, and of others; they are totally lacking any schema.org/microdata tagging.
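For anyone who wants to repeat that kind of check, here is a minimal Python sketch that scans a page's HTML for microdata attributes (`itemtype` and `itemprop`) using only the standard library. The class and function names and the sample markup are made up for illustration. It looks only at microdata attributes, so other structured-data formats would not be detected by it.

```python
from html.parser import HTMLParser


class MicrodataFinder(HTMLParser):
    """Collects itemtype values and counts itemprop attributes in a page."""

    def __init__(self):
        super().__init__()
        self.item_types = []
        self.prop_count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "itemtype" and value:
                self.item_types.append(value)
            elif name == "itemprop":
                self.prop_count += 1


def find_microdata(html: str):
    """Return (list of itemtype URLs, number of itemprop attributes)."""
    finder = MicrodataFinder()
    finder.feed(html)
    return finder.item_types, finder.prop_count


if __name__ == "__main__":
    # Hypothetical markup, not taken from any of the pages discussed.
    tagged = (
        '<div itemscope itemtype="https://schema.org/Place">'
        '<span itemprop="name">Willis Tower</span></div>'
    )
    untagged = "<p>The September birthstone is sapphire.</p>"
    print(find_microdata(tagged))
    print(find_microdata(untagged))
```

An empty list and a zero count for a page that still earns an answer box would be consistent with the observation above that these answers do not depend on microdata.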
I just saw the answer box for a non-authoritative site when I typed "what is meta description". So, I want to know whether it is possible to view the answer box for any non-authoritative site?
You're right, "the symbiosis is threatened." However, Google has to do this. They have to give searchers what they want instantly, without requiring a click-thru. This is because of mobile's rising popularity. When you're using your voice to search with Google Now and you say "when was the sears tower built?" -- You don't want to have to click a result to get your answer. You want the answer right away.
I think I would like to see Google limit the number of answer boxes for desktop search, but make it more common on mobile searches.
Imagine asking someone a question, "What year did Star Wars release?" That person then goes and gets 10 people for you that probably know the answer. Then you have to talk to the first guy to get your answer. Maybe he actually doesn't know the answer though... so you move onto the 2nd guy and ask him.
...or... the first guy you asked the question to just tells you and doesn't leave to go get 10 other people that might be able to help you.
I love these boxes of information, but if Google is taking that information from Wiki sources then that is a concern.
Sadly Wikipedia can be totally wrong.
Something strange is happening on Google: it's getting worse at returning the searches I want, and I see more American and Australian companies than British. Maybe Google can't put me in a box ;)
I am really impressed with your article; it covers almost everything. I am happy my listings are showing up with a map, operating hours, etc. The main problem I am facing is the "People also search for" option. It is showing my competitors, which is really pathetic. How can I solve this issue?
Can any of you help me solve the "People also search for" issue?
Thanks Dr. Pete, this research goes to show the importance of how Google interprets user intent, and how we as SEOs, webmasters or website owners should do the same. I think the best solution to the ever-changing world of search is to first consider the intent of your users and research and test keywords around that intent.
As cited with the Time Magazine and Forbes example of Bill Gates, where Time was featured but ranked 8th (likely because of on-page factors, namely the presence of those keywords in the title tag and article), it goes to show how important exact match keywords can be in the SERPs.
This sure is a point of concern for all of us, as traffic coming in from Google will decrease drastically, especially on informational queries. But I don't think this will affect our money keywords, at least for now.
hi
It's been a while, and when I checked the answer box for the query, it doesn't show up anymore! Another update?
There's no doubt that the knowledge graph is going to get better and better in the future. (even though it's still a scraper any way you look at it... LOL) Still, there are times that you can't help but appreciate its usefulness, so it's something that Google should definitely continue to improve on and develop.
Great research as always, Dr. Pete!
Yes, Peter!! You're right, I have also noticed this activity since the Hummingbird algorithm. Actually, I was searching for Sarkari Naukri 2014, which is a Hindi-language keyword; that day I searched with many keywords to check where Google is fetching this data from and showing it in a box on top.
Thank you so much, Peter, for sharing these tips.
"At some point, we may all have to decide if giving away our data is worth what we get in return."
What are you doing now? Selling it? So Google can take what you're selling and give it away, right? What if you have a password-protected page or something?
I don't see any point in worrying about this. Sounds like Google is tinkering in the lab again and none of this will affect anyone for awhile. Well, people might be worrying. Google makes websites worry more than anything else.
I see multiple comments about Google answering people's questions by scraping data and regurgitating it. While I understand the frustration that websites lose traffic, I can at the same time appreciate what Google is doing: answering questions efficiently, by taking one or more steps out of the process, in a pleasing, user-friendly manner.
Obviously this brings up questions of fair use. I wrote a scathing review of how the image bar uses images without links or attribution. At the same time I imagine the Google lawyers (Googawyers?) have given the Knowledge Graph a vigorous fair use review.
A second, and interesting, question is when does artificial intelligence become "artificial knowledge?" If I ask you what time the zoo opens and you tell me 10AM, you share your knowledge. No one gets upset that I didn't look at the website. If an artificial intelligence platform gathers data and information then shares it when asked, at what point does it become knowledge, or as I'm dubbing it, "artificial knowledge?"
Great post thanks Pete!
One of these days Google will answer our questions before we even have to ask!
Nice work, Pete.
I'm really interested to see how these "answer boxes" evolve in the next couple of years. I like your analysis and referring to them as "another organic listing."
Pretty soon there will be SEO guides and long-form blog posts about "the new #1" and "How to Get Your Site Referenced At the Top of Google SERPs."
You can have all these features by keeping a proper watch and filling in proper data on the Google Local listing page when listing a local page or website, building, school, company, etc. You can also make your regular activities appear in search engines using schema documents.
Google is becoming more user/answer centric. The only thing I can see in the future is that AUTHORITY sites will get more preference.
By the way, this gives us an idea of how intelligent Google's algo is.
The over under on Google getting into a massive copyright infringement lawsuit is 2017.
I'll go with over/never.
edit >> Google cites the source and links right to it just below the content. If you leave your site open for googlebot to crawl, you're accepting the terms to be displayed in a SERP. If you don't want Google to display your content in an answer box, there's a robots.txt for that.
Depends on the law in each country. It's not always allowed to do what you want just by saying "you can just do this". Sometimes you have to ask first...
If we search for a very local question and the question is not relevant, Google gives local sites like local.com, Yelp, Google+ Local, etc.
I like that Google gives answers directly in their Knowledge Graph space.
The whole concept behind launching the Hummingbird update was semantic search, and Knowledge Graph is a big piece of that update. Google's recent acquisition of DeepMind is also going to be a major breakthrough, as Google plans to include AI in search, and I think this will be the next big leap. As Larry Page recently said in one of his TED talks, search is still yet to be refined and we are scaling that ladder to maintain its importance. The accuracy part would gradually be fed with inputs from Google News and various other platforms. Google Now is another addition, which Google launched for Android and which is now live for Chrome on desktops.
In addition to the web interface, I suspect mobile has a huge influence on where Google is eventually trying to go with the Knowledge Graph. Google Now is already fairly impressive, and Voice Search has improved significantly. Future versions of Google Now (and the Knowledge Graph) are likely to be broader, faster and more accurate. As much as possible for factual queries, they're going to want people to ask Google Now a conversational question, and then provide the answer instantly (spoken). No typing, no refining queries, no scanning SERPs, no reading, no clicking, just question -- answer.
I believe online flight search is the best of the Google Knowledge Graph/answer box. You mean Google is stealing the data of third-party providers without giving them traffic back? I am just talking about flight searches here.
This is not only applicable to flight search; it's true for 80% of other Knowledge Graph results, and it has become the biggest scraper of all time.
I know flight terms are also one of them, but these searches are highly affected by knowledge graph results.
Fantastic article, Dr. Pete. It seems that this system would discourage users from digging into the subject further, but it does make sense from a user satisfaction perspective. It'll be interesting to see how this evolves.
Thanks for such a great post, Pete.
Knowledge Graph has really revolutionized how Google works. Let's see now what Bing is planning about it.
I think the concern of traffic being lost is being overblown. How much was the traffic worth if all the person wanted was one answer? If I am going to a sports site just to find the score of the last game, or the time of the next game that is some low-value traffic.
But it's still traffic. And it's relevant traffic. The sports site could have enticing sports stories or breaking sports news in the sidebar ready to grab your attention and hopefully keep you for longer. Or they could be selling ad space per impression. Just because someone wants to know the score of a sports game and they Googled it, that doesn't mean they will be 'low-value' traffic to the site they choose to visit.
This. Every visitor counts. Impression is an impression. Click is awareness.
Thanks ...