Last week, drowned out by the Panda 4.1 rollout, the MozCast Feature Graph detected a significant jump in the presence of answer boxes (+42% day-over-day, up to +44% on September 30th):
This measurement includes all types of "answer" boxes – direct answers, stock quotes, weather forecasts, box scores, and even the new, attributed answer boxes. Digging into the data, it appears that almost the entirety of the jump is in the new style of answer boxes. These are the answers that are extracted from 3rd-party websites, and they look something like this:
The key distinction is that you'll see a search-result-style title and link below the answer. Separating just this data, the same two-week graph looks like this:
The day-over-day increase from September 25-26 in new answer boxes was +98%, almost doubling the total number in our data set. This clearly represents a significant expansion in Google's ability to extract and display answers.
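For readers who want to reproduce this kind of number from their own tracking data, here is a minimal sketch of the day-over-day calculation. The counts below are hypothetical, chosen only so the result matches the +98% figure; they are not the actual MozCast counts, and `day_over_day_change` is just an illustrative helper name.

```python
def day_over_day_change(previous: float, current: float) -> float:
    """Return the percentage change from the previous day to the current day."""
    if previous == 0:
        raise ValueError("previous count must be non-zero")
    return (current - previous) * 100.0 / previous


# Hypothetical counts of queries showing the new answer box (not real data).
sept_25 = 100
sept_26 = 198

change = day_over_day_change(sept_25, sept_26)
print(f"{change:+.0f}%")
```

A change of +100% would be an exact doubling, which is why +98% reads as "almost doubling."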
The "Winning" Queries
Over 100 queries picked up the new answer boxes in our data set. Below are 10 examples. Keep in mind that any given query may gain or lose its answer box for any given search, depending on factors such as search history, localization, and personalization:
- global warming
- mba
- steampunk
- dsl
- triathlon
- pollution
- firewall
- activex
- vegan
- project management
Many of these are general, informational answers, and quite a few of the new answer boxes in our data set seem to be coming directly from Wikipedia. With this update, Google also may have added a new capability – here's the answer box for #3 above ("steampunk"):
The image on the right is being extracted directly from the article. While we've seen some examples of brand boxes with logos, the ability to directly add general images seems to be new. Other new answer boxes are more traditional, such as "mba":
Many of these new queries seem to be broad, "head" queries, but that could be a result of our data set, which tends to be skewed toward shorter, commercial queries. One four-word query with a new answer box was "girl scout cookies types":
It's interesting to note that the more grammatically correct "girl scout cookie types" doesn't seem to return an answer box. These new answers seem to be very dependent on query structure and how the query matches on-page keywords.
An Experiment in Answers
If Google is pulling more and more answers directly from the index (i.e. our sites), then it stands to reason we could update those answers. A couple of months ago, I noticed that one of my posts was producing an answer box for the search "how much does google make":
Even as the author of this post, I had to admit that was a pretty terrible answer, especially being 3-4 years out of date. I quickly assembled a Twitter mob to deal with this problem (well, basically Ruth Burr Reedy and David Iwanow), and we unanimously decided something must be done:
I decided to edit the top of the post, adding a user-friendly update for new visitors that gave new numbers for 2013. This went up on July 10th – I posted the update on social, and by later that day the new page was cached.
Two weeks went by, and there was no change to the answer box. Naturally, I assumed this was because the old text was still in place (I had simply added new information). So, on July 24th, I carefully removed the old content (that appears in the answer box) and edited the META description. By the next day, the new page was cached and the new snippet was showing up in Google SERPs.
So, what does that answer box look like today, almost two months later? Look up four paragraphs, because it's exactly the same. Even though the content used in this answer box is now completely gone, Google is still using it in search results.
While this is only one example, it seems to suggest that these answers are not being extracted and created in real-time – they're being stored in some sort of internal Google knowledge base. This may sound familiar, if you've read anything over the last month about Google's theoretical Knowledge Vault.
Unlike Freebase-based Knowledge panels and answers, this internal vault can't be edited directly. Unlike organic results, where changes to our pages are generally reflected on the next crawl-and-cache, these answer boxes are being updated much less frequently. Since these new answers link directly to pages, they could be connecting to information that's been mismatched for weeks or even months.
At this point, there's very little anyone outside of Google can do but keep their eyes open. If this is truly the Knowledge Vault in action, it's going to grow, impacting more queries and potentially drawing more traffic away from sites. At the same time, Google may be becoming more possessive of that information, and will probably try to remove any kind of direct, third-party editing (which is possible, if difficult, with the current Knowledge Graph).
This is really interesting. I think it's bizarre that Google would build and launch a "timely information navigation" feature that is so flawed when it comes to providing correct data.
I know a few months ago there was much hoo-ha about the fact that they couldn't keep up with the change in Microsoft CEO despite it being widespread news. I wrote about that here: https://www.andrewisidoro.co.uk/blog/microsofts-new-ceo-breaking-news-knowledge-graph/
I thought that with the shift away from Freebase-based Knowledge panels and answers we would see fewer of these issues... seems not.
On average, I think the data coming out of Freebase is pretty good - it's at least human written and vetted. The problem is scale - Google just can't grow it fast enough or keep it up to date. Oddly, we're back to the same problems web directories ran into over a decade ago.
You are spot on, it's like digital is coming full circle and tracing back to many of the same problems as before...needing human eyes and ears for large amounts of data (sometimes just to ensure things are, indeed, created by and FOR people and not 'bots). Fine point!
Hi Andrew,
Because I found your article about the Knowledge Graph very interesting, I'd like to know your opinion about the Greggs logo issue. Do you remember that around the 18th or 19th of August the Knowledge Graph showed a fake Greggs logo? Mainstream media attributed it to some kind of hack when it was actually something "natural": Google took that logo from a satirical website, I suppose because that site somehow gave better signals about the Greggs business than the official website or even Wikipedia. As soon as the whole world knew about the issue, Google fixed it.
Do you think Google made the right decision? I mean, of course it was an error, but hold on: does Google answer human queries or business queries? Think about keywords. There are keywords that are perfect for defining your business and your industry. They are the right ones, but the people who actually look for your products and services, not being professionals, use different words. Your business wants to rank for keywords that no one actually searches for. In a case like this, should Google shape the answers to user queries according to what the business fancies?
Let me be as clear as day: a former SEO at Nationalexpress told me this. The board of that business needed months - maybe years - to understand that in the UK their customers search for "buses", and that "coaches" is not so popular. So do you think Google should have shaped the results around Nationalexpress and let it rank for "buses" even though they hadn't mentioned that word a single time on their website or in other content referring to them? They were almost disgusted by the word "buses".
I think that if Google aims to give relevant answers, it has to give back answers that reflect what people think about a topic, such as a brand. If, for any reason, a brand is horrible or synonymous with something in our collective imagination, why should they clean that out of the SERP?
Take the "miserable failure" Google bomb: it was a kind of manual action, but it reflected what many people in the world were thinking in those days. In that case Google didn't do anything so quickly to fix it.
IMHO Google did something horrible: it protected big business - like the WWF does with pandas. Google wants to see everybody on AdWords, and therefore knows the Knowledge Graph can harm a potential investor. I think Greggs would have stopped any AdWords campaign on its brand name that day. I wouldn't invest a single penny when on the right side there is a logo that states I am a crook.
Cheers,
Pierpaolo
Quick note: I linked to the Greggs (and PC World) issues below, and clearly someone didn't like it, as it was thumbed down! With regard to the error, don't forget the Knowledge Graph is still young and learning, so there will be some errors. It could also be that Google will take info from a few sources to ensure it doesn't happen again.
The Greggs issue was nothing new. In Italy (Google.it) in 2013, if you searched for "Stakhanov", you were shown a KGraph that was correct in everything but the main photo, which was a photo of the Italian Finance Minister at that time (Brunetta) photoshopped as if he were a Soviet general and hero.
The interesting fact was that the image search results for that query did not show that image.
Why was that? Because the post from which Google picked up the image (not really an authoritative blog) was comparing Brunetta with Stakhanov, and because the image was used in Open Graph og: structured data.
That was the only reasonable explanation: the fake image was not present in Freebase, Wikipedia, or any other potential public linked data source.
What was that telling us? That Google retrieves information from any form of structured data (and Open Graph is RDF in the end), but it is still not able to understand the semiotics behind the semantics. It is not able to understand whether we are using an image ironically or not.
In this sense, Google really is still a third grader. A well-intentioned student, but still a third-grader.
I didn't know this about Brunetta. Very interesting, and I agree about Google being like a third-grader, but Brunetta and Stakhanov won't bid on AdWords, and neither will Bush, whereas Greggs and other businesses invest a lot of money in AdWords.
But I didn't get what you said about that picture being used in Open Graph. Did they upload the fake picture through structured data? As far as I know, Google Images works on the general shape of an image to look for similar ones. That picture can be very similar to the original one, as only the head changes.
However, Google indexes images according to the words on the page. At one of my former agencies, I once found that one of the former developers had placed a link in the footer using a white font on a white background. Cloaking, but not for spamming: for insulting our director! It was "XXX (name of our director) is a fat f*@k1n* c*&%t"
Well, our site ranked for that phrase in both web pages and images! The exact search returned only our site!
Well, I can tell you that wasn't nice, but Google and that former developer were totally right, IMHO.
What I was saying is that, for the Knowledge Graph, Google doesn't use the same process it uses for the Images index. The images tend to be different, and it usually seems to take into account - apart from images present in linked data sources like Wikipedia - images that are used in structured data like Open Graph or schema.org/Article.
And KGraph and AdWords are two completely different gardens in the Google palace. Whether or not Greggs is spending a lot of money on AdWords doesn't matter to the Knowledge team.
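To make the Open Graph point a bit more concrete, here is a minimal sketch of how any crawler could read the image a page declares through its `og:image` meta tag. This only illustrates what the markup exposes; it is not a description of how Google actually builds the Knowledge Graph. The sample HTML, the example.com URL, and the names `OpenGraphImageParser` and `extract_og_images` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class OpenGraphImageParser(HTMLParser):
    """Collect the content of every <meta property="og:image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:image" and attr_map.get("content"):
            self.images.append(attr_map["content"])


def extract_og_images(html: str) -> List[str]:
    """Return the og:image URLs declared in an HTML document."""
    parser = OpenGraphImageParser()
    parser.feed(html)
    return parser.images


# Hypothetical page: a blog post that declares a satirical photo as its image.
sample_html = """
<html><head>
<meta property="og:title" content="A hypothetical blog post">
<meta property="og:image" content="https://example.com/images/satirical-photo.jpg">
</head><body></body></html>
"""

print(extract_og_images(sample_html))
```

Nothing in the tag says whether the image is meant seriously or ironically, which is the gap described in the comments above.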
It's always good to read your posts. I would like to add one more thing, which I already tweeted to Cyrus Shepard about a few days ago. It looks totally new to me.
In the meantime, I am a little bit confused about the Google answer box. What are Google's criteria for giving a link in the answer box? For some queries it gives the link, but for other queries it shows the answer box directly without any link.
A few weeks ago I asked the same question on Twitter, and Barry Schwartz passed it on to Google itself, which, quite surprisingly, gave an answer very fast.
To summarize, Google is saying that if the answer can be defined as unique (i.e., taken in its entirety from the content of a given site), then Google will credit the source with a link in the answer box.
On the other hand, if the answer is something common to all sites (e.g., the age of Obama), then it won't show any credit link.
I can't help but feel there's a lot of spin in that (Google's, that is) statement. It seems like answers that come from properties they own (like Freebase) don't get links. Answers they scrape from 3rd-party sites get links. Of course, that will probably evolve.
Hey guys, I'm sure both of you are far more qualified than me to speak to the finer technical points of this, but I've been perceiving the difference as factual (unlinked, owned) versus nuanced (linked, scraped) knowledge. I think both of you are saying the same thing in different ways. My perception is that Google's properties can deal with topics and determine hard and set facts for those topics, but when it comes to nuance, it is relying on the web as a whole. It can use its capabilities to identify that a page speaks to a particular topic, and even attributes of that topic, but if the answer is not an easily definable fact, it provides the linked page. For example, for "how do you treat malaria", the answer is far more nuanced than a single fact, so it provides a link. Conversely, "What is jaundice" has an easily stated definition, so it gets an unlinked answer.
That's a fair point - the kinds of questions Freebase can answer are naturally the structured, factual questions. The kinds of questions Google needs to extract from the index are naturally more ambiguous questions. So, this may not be deliberate on Google's part, so much as just a side effect of these two, unique data sources.
Now, I'm trying to find exceptions to that rule... :)
ATTN SEOs,
This is a true story.
A few months ago I spent all day calculating the cost and materials involved in running a new fence across the back of my property. I searched for the price of things like different types of lumber and the cost of cement mixes, and I used Google's calculator to estimate the distance from one point of my property to another. After a few hours of research I performed the search query "how many feet per acer" and Google showed the following answer box.
https://drive.google.com/file/d/0B6F9fSbj8rLsOWNzb...
For those of you who are too lazy ;) to view that screenshot, here is what the answer box said:
"An acre is a unit of area, it is 43560 square feet, but it can be any shape, a square, a circle, a long thin rectangle, a triangle, a pie shaped (…come on Google, we get the point) or any other shape. If it is a square, the sides would be approximately 209 feet long and you would need 4 × 209 = 836 feet of fencing."
They knew that I wanted to calculate how much fencing I would need!!! This is amazing.
Takeaway: The most interesting part of this situation is that the answer box was personalized to my search history. If you were to perform that search today, then you would probably get the following result:
"1 acre is approximately 208.71 feet × 208.71 feet (a square) 4,840 square yards. 43,560 square feet. "
Google expanded their answer box to include the part about fencing. This tells me that Google is giving personalized answer boxes that are relevant to our intentions, not just our keywords. #incredible.
Has anyone else experienced something like this?
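The arithmetic in the two quoted answer boxes can be checked with a few lines. This is only a sanity check of the numbers, assuming a perfectly square one-acre plot as the answer boxes do; the variable names are illustrative.

```python
import math

SQUARE_FEET_PER_ACRE = 43_560
SQUARE_FEET_PER_SQUARE_YARD = 9  # 3 ft x 3 ft

# Side length of a square plot with an area of exactly one acre.
side_feet = math.sqrt(SQUARE_FEET_PER_ACRE)

# Fencing needed to enclose that square plot (its perimeter).
perimeter_feet = 4 * side_feet

# The same area expressed in square yards.
square_yards = SQUARE_FEET_PER_ACRE / SQUARE_FEET_PER_SQUARE_YARD

print(f"side: {side_feet:.2f} ft")
print(f"fencing: {perimeter_feet:.2f} ft")
print(f"area: {square_yards:.0f} sq yd")
```

The side works out to roughly 208.71 feet, matching the second answer box. The first answer box rounds the side up to 209 feet before multiplying by four, which is why its 836 feet of fencing is slightly more than the unrounded perimeter.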
Wow, that's wild. I can't seem to replicate it by creating a similar search history, so it's tough to tell if this one case is just a freak coincidence, but it's hard to dismiss the possibility. Definitely worth keeping our eyes open.
Yeah, it was the first and last time I've seen something that personalized appear in a Google answer box. My hypothesis is that Google is testing this concept. I imagine we will start to see more personalized answer boxes rolled out within a year.
Dr. Pete,
I'm wondering what the likely impact is/could be for content creators? We're constantly bombarded with "Seek to be the best answer for a query on the topic." This seems to further buttress that point.
RS
Yeah, it's tough - if there has to be a box, you'd rather be in it than your competitors, but, all else being equal, it might be better if there was no box at all (from a CTR standpoint). Many of these answers are making the organic results nearly irrelevant.
It looks like it has updated now as you can see by this screenshot https://easycaptures.com/fs/uploaded/936/7558078085.png . I guess it only took 4-5 months. Have you heard any updates on this like if they are being updated in real time now?
Yeah, it got updated a couple of weeks ago (sorry, probably should've added a note to the post). It does not look like this is in real-time. I suspect it's an occasional mass-update, but it's a bit hard to tell right now.
YAY experiments! I always find it amusing that there have been a few hiccups with the Knowledge Graph, like Greggs and PC World. I can only imagine the fun if, like the Knowledge Vault, there was no way to go back! I wonder how much the Knowledge Graph will come into play in the future (will it be like trying to get people set up with the publisher tag?).
As always, thanks for the post.
Interested in seeing how the Knowledge Graph plays out in SERPs. Please continue to write, research, and keep us updated. Although it is a bit buggy, with scale issues and unknowns, it seems fairly powerful for an end user. I do feel that Google will eventually fix the scale issue; it just may take a little time.
This behavior from Google is almost incomprehensible. Thank you for your answers anyway! I agree, but only halfway.
No words left I believe..
Another question: does anybody know how to check broken backlinks for free? Not how many, but which ones. Also, for the same site, https://bic-code.nl, is it better to use Alexa tools or Google tools? Alexa is not free, but is it that much better than Google? I ask because I have a .nl domain. Not sure. Thanks.
Any ideas on how we could contribute answers to Google answer boxes?
Still sad to see how long information can take to update in Google. It will be interesting to see whether this can be a huge traffic driver for clients' sites if they are able to get their information to be the Knowledge Graph answer for a popular query.
Very interesting and love the experiments. Odd that like two months later it still has not been updated. Wondering if there is a new "bot" that does this perhaps?
The world's biggest scraper strikes again, and stronger than ever! It would at least be nice for websites to get a revenue of $0.50 for each snippet of information stolen by Google :D
Interesting as usual, @peter. I have the same concern, but with an answer box apparently coming from Freebase (Google doesn't provide a source URL): it shows my current situation along with a logo that is wrong (search for "Mozalami"). I submitted feedback saying the logo was wrong and removed the wrong logo from Freebase, but it is still showing in SERPs. Maybe the Google Vault applies the same rules whatever the source is, so even Freebase being editable doesn't make a difference!
If your website shows up in the answer box for several queries does that mean that Google finds your site an authority in its niche?
Very interesting post. I'm really curious how Google goes about gathering this data. I assumed they scrape from whatever the top result is, but a lot of Answer Boxes use stuff from the 2nd or 3rd result (https://www.google.com/search?q=facebook%20banner%20dimensions&rct=j). This could also be a case of the data going into their Knowledge Vault, while ranking #1 for the search term, and then getting re-ranked afterward.
As far as I can tell, they all come from page 1 SERPs, but since ranking is calculated in real-time and these answer boxes don't seem to be, it seems like the data has to be extracted and stored and *then* the site has to rank high enough for any given, relevant query to surface that data. That's just speculation right now, though.
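To illustrate that speculation, here is a toy sketch of an "extract once, store, then surface when the source ranks" model. Everything in it is hypothetical (the store, the URLs, the snippet text, and the `answer_for_query` helper); it is a thought experiment about the behavior described above, not a claim about how Google's systems work.

```python
from typing import Dict, List, Optional

# Hypothetical store of previously extracted answers: source URL -> snippet.
# In this model the snippet is captured once and not refreshed on every crawl.
stored_answers: Dict[str, str] = {
    "https://example.com/old-post": "Snippet extracted at some earlier crawl.",
}


def answer_for_query(page_one_urls: List[str]) -> Optional[str]:
    """Return a stored snippet if its source URL currently ranks on page 1."""
    for url in page_one_urls:
        if url in stored_answers:
            return stored_answers[url]
    return None


# Ranking is computed fresh for each query; the snippet is whatever was stored.
print(answer_for_query(["https://example.com/other", "https://example.com/old-post"]))
print(answer_for_query(["https://example.com/other"]))
```

Under this model, editing the live page changes nothing in the answer box until the stored snippet itself is refreshed, which would be consistent with the delay seen in the experiment in the post.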
Now I am stuck researching about steampunk ...
So I want to try and get one of these answer box things. I don't really know much about them, other than that the terms look rather non-commercial or brand-ish, or are designed to keep visitors on Google's home page longer in the hope that Google will get more ad clicks.
Must the landing page from the answer box also be on a noncommercial page or site?
I rank pretty well for "uses of a website" and "uses for a website", stuff old guys are supposed to rank for, blah blah. I want my name in a box. Too bad it doesn't make any money, lol.
At this point, it seems like getting chosen for an answer depends on a mix of authority and on-page signals, but the on-page signals are probably more important than they are for general SEO. Google seems to be looking for pretty exact matches on the query keywords to generate this kind of answer.
The tougher question is: what causes Google to put an answer box on a query at all? Some of these are obvious, question-style queries, but others aren't quite so clear.
Yeah Pete, you're right about needing to figure out what causes Google to put up an answer box at all and reverse-engineer the process.
After reading this thread I tried a couple of search queries of my own including
investing in gold (no answerbox)
how to invest in gold (yes answerbox) <<< Look how far that answer box pushes the organic results down below the fold, AND BY TOTAL accident it leaves a nice selection of paid ads to click on. Another loss for SEO and free placement, and a win for Google and more exposure for their paying traffic.
why invest in gold (yes answerbox)
Now, what triggers the answer box for a specific phrase, I don't know yet.
Remember: never, I mean... always check the answer box to see if it offers a link to the source page. If you can buy display advertising on that page, it's probably a good idea in most cases.
So I guess you can make money with the answer boxes on the buy/sell side and I would suppose it does well for brand enforcement.
So people will be losing more clicks soon. Bring back authorship! LOL
Great experiment Dr. Pete. Always appreciate the hard work you and the team put in.