Earlier this month, I attended the SemTechBiz 2013 conference in San Francisco. This is a gathering of creators and designers of the semantic tech stack, folks who work on semantic web standards, and representatives from the search engines, all coming together to discuss the state of the industry. There was a focus on semantic search and structured data markup at the show, reflecting the expansion of schema.org and Google Knowledge Graph as well as Bing Snapshots and the growing influence of the Open Graph Protocol.

Aaron Bradley wrote up a fantastic list of key takeaways from the conference, and if you're attempting to get your head around semantic search, it's a great starting point. Blatant plug alert: I'll be talking about how to strategically adjust for these shifts in my talk at MozCon in early July.

Marketers have a laundry list of activities to choose from to increase visibility, build brand, and drive engagement. It can be tough to decide when to invest in the hot new thing, especially when the words "Google" and "SEO" are prominently involved. When there are fundamental shifts in the SEO landscape (and I believe we're near the beginning of one of these shifts), search industry practitioners are often asked how to organize a strategy around the new tactical options. Here are five questions that I hope clarify the current state of semantic SEO and structured data markup:

1. Is "Semantic SEO" a new term?

We spend a lot of time in the SEO community debating terms and definitions, even when they are established activities we've been doing for years. This is doubly true for folks in tech who are not in the search industry. If you have an abundance of free time, you can jump into any Hacker News thread related to SEO and see there's still no agreement on whether or not SEO is a valid term or discipline.

The optimization part aside, Aaron's SemTech piece referenced above includes a concise definition of semantic search from Tamas Doszkocs of WebLib:

"Semantic search is a search or a question or an action that produces meaningful results, even when the retrieved items contain none of the query terms, or the search involves no query text at all."

That's a great starting point to think about how Google and Bing are shifting towards semantic search results. Justin Briggs wrote a piece about entity search results that's over a year old, and it's still a useful primer on how search engines are increasingly moving towards these kinds of results for user queries. However, there's still not an agreed-upon term to describe the activities around achieving visibility in the semantic search results or optimizing for a semantic search engine.

I've heard everything from "entity-based SEO" to "entity SEO" to "Search Entity Optimization" as descriptors for optimizing around entity-based results. I'd personally lean towards "semantic search optimization" or "semantic SEO," but I can guarantee one thing: It doesn't matter what you call it at the end of the day. Adjusting to the semantic search landscape will be part of the SEO's job description going forward.

2. What do "entity-based search results" look like now?

The first wave of entity-based results in Google has come through "answer cards" and Knowledge Graph results. We've grown used to seeing results like this for Google searches on people, places, and media objects:

It's obvious that Google's Knowledge Graph result above is generated primarily from the Freebase entry on Sam Peckinpah. The shift that will be much harder to deconstruct is search results that rank pages which aren't clearly optimized for specific keyword queries, or which don't have what SEOs would consider strong link profiles with exact- or partial-match anchor text. Consider this result for the classic phrase from the movie 2001: A Space Odyssey:

The YouTube clips and other search results on the first page all contain what you might expect to see in terms of on-page optimization and anchor text profiles: keyword usage in the title/META tags/URL, and a mix of exact- and partial-match anchor text in the link profiles. But the IMDb and Wikiquote pages are a bit different, and don't contain strong signals in either of those areas. There are quite a few links to the IMDb page, but relatively little in the way of partial- or exact-match anchor text that an SEO might expect to see. Additionally, while the phrase is found in the body content of the page, the usual SEO sweet spots in the URL, internal anchor text, and HTML title tag aren't optimized for the quote.
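
For contrast, a page targeting that phrase in the traditional keyword-string sense might carry it in all of the usual spots. This is a hypothetical sketch (the URL, title, and anchor text are invented for illustration), not the actual IMDb or Wikiquote markup:

    <!-- Hypothetical page optimized for the quote in the classic keyword-string style -->
    <!-- URL: http://www.example.com/quotes/im-sorry-dave -->
    <head>
      <title>I'm Sorry Dave, I'm Afraid I Can't Do That - 2001: A Space Odyssey</title>
      <meta name="description" content="The famous HAL 9000 line from 2001: A Space Odyssey.">
    </head>
    <body>
      <h1>"I'm sorry Dave, I'm afraid I can't do that"</h1>
      <!-- Internal (and ideally external) links would use exact- or partial-match anchor text -->
      <a href="/quotes/im-sorry-dave">I'm sorry Dave quote</a>
    </body>

The IMDb and Wikiquote pages rank for the phrase without leaning on most of those signals.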

Gianluca Fiorelli recently wrote a piece on graphs and entity recognition, which addressed this topic and how it may relate to co-occurrence and co-reference across web documents. Google released the Wikilinks Corpus this year, and in the release they describe a system of co-reference to aid in entity resolution. Specifically, when are different mentions or queries referencing the same entity across web documents?

The Google/UMass Wikilinks project provides a good illustration of cross-document co-reference with two web documents that both link to the disambiguated entity 'Banksy' on Wikipedia:
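
In simplified form, it looks something like this (two hypothetical pages, not the actual documents from the corpus):

    <!-- Document A: a street-art blog post (hypothetical) -->
    <p>New stencil work appeared overnight, widely attributed to
      <a href="http://en.wikipedia.org/wiki/Banksy">Banksy</a>.</p>

    <!-- Document B: an auction listing (hypothetical) -->
    <p>Original screen print by the artist
      <a href="http://en.wikipedia.org/wiki/Banksy">Banksy</a>, signed and numbered.</p>

Because both mentions point at the same Wikipedia URL, the two documents can be treated as co-referencing the same entity, even though the surrounding text and anchor context differ.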

Or, to return to my earlier example: when people search for "I'm sorry Dave," Google can fairly easily match that query to the entity 2001: A Space Odyssey across web documents that co-reference the IMDb page, and return results for that entity without relying on keyword string matches in HTML tags and anchor text.

3. So is the keyword dead?

Interestingly enough, I've read two pieces from very sharp SEOs who have a different take on that. AJ Kohn makes a compelling case that keywords still matter, as they are crucial in determining user intent and matching that intent to relevant results. While entity-based SEO and Knowledge Graph results attempt to guess user intent through localization, personalization, and entity disambiguation, there's nothing clearer in terms of intent than a keyword string like "hospitals in Seattle" or "What's the best Xbox 360 game?" (Obviously it's Bioshock.)

But there are a couple of signs that the keyword may be fading a bit as the ultimate arbiter of user intent. Consider the launch of Google's "conversational search," which layers what you've searched for, who you are, and where you are as intent modifiers to your query. Even stubborn old SEOs are coming to realize that there are layers of implicit intent in search results that we can't possibly unravel through keyword research or link graph metrics.

Mr. Bradley makes a very salient point in his SemTechBiz writeup (seriously, read that): Mobile is the driving force behind the semantic search revolution. Google, Bing, and Yahoo all see the writing on the wall with mobile adoption and the slow death of the desktop PC. Keywords may never die, but they're going to have a lot of company when it comes to determining user intent and serving relevant search results.

4. Is structured data markup a ranking factor?

Wouldn't we love to know? Not to be rude and answer my question with a question, but when was the last time Google actually confirmed something is a factor in their ranking algorithm? My memory says it was the site speed announcement in 2010. Readers should feel free to correct me in the comments if there is a more recent example.

As a simplified mental model, you could group the search engine ranking factors into one of these categories:

  • Popularity signals: Links, and the quality and quantity thereof in particular. Other visibility signals such as social media sharing would fall into this category.
  • Relevancy signals: There's a whole lot that goes into this one, but a good reference point is the Google patent on phrase-based indexing.
  • Things that dramatically affect user experience on a site: Hacked sites at one extreme, and smaller factors like site speed or reading level at the other end of the spectrum.
  • Things that actually appear in the search engine results: Keywords in HTML titles, URLs, and META description tags (yes, they affect CTR at a minimum).

Structured data markup significantly affects both the way relevancy signals have traditionally been generated in the keyword-string SEO world and the way search results actually appear. The SERP landscape is a long way from ten blue links; video and image thumbnails, authorship thumbnails, and rich snippets of many types now fundamentally alter what users click on:
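
Behind many of those rich snippets is on-page markup. As one hedged example, review stars are typically driven by something along these lines (a minimal schema.org microdata sketch; the product name and numbers are invented):

    <div itemscope itemtype="http://schema.org/Product">
      <span itemprop="name">Example Widget</span>
      <div itemprop="aggregateRating" itemscope
           itemtype="http://schema.org/AggregateRating">
        Rated <span itemprop="ratingValue">4.5</span>/5
        based on <span itemprop="reviewCount">87</span> reviews
      </div>
    </div>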

It will be interesting to see what testing data and correlation studies tell us about structured data markup as a ranking factor. If Google and Bing can derive a clean signal from the presence of this markup, it certainly meets other criteria we've typically used to mark something as a ranking factor. Here at Moz, we'll soon be publishing ongoing updates to the 2011 Search Engine Ranking Factors study, so we'll get to see once again whether the correlation data has changed, alongside the latest SEO survey results.

5. Will implementing schema.org markup actually hurt our search engine visibility in the future?

A number of SEOs have raised valid concerns about implementing structured data markup. Will it enable scraper sites to easily take your data and use it to outrank you? Or worse, will Google vacuum up your data for its own purposes in Knowledge Graph results or increasingly sophisticated rich snippets? This tweet from Dennis Goedegebuure concisely sums up the latter concern, and it applies to Google, Bing, Facebook, Twitter, or any other search engine or social media network:

For many practitioners in the SEO industry, it feels like we may have seen this movie before. Let's say, for example, that you spent considerable time and money optimizing images with an eye on increasing your visibility in Google image search. The recent UI change to Google image search results likely had a significant negative ROI impact on that effort. There's a very sound takeaway in that Define Media Group post: It's still a good idea to adhere to SEO best practices for image search optimization, but that UI change likely alters how heavily you'd prioritize the work versus an activity that will yield more traffic or visibility. The same ROI calculation should be applied to structured data markup, whether it's schema.org, Open Graph Protocol, or Twitter Cards markup.
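
For reference, the Open Graph Protocol and Twitter Cards side of that calculation amounts to a few meta tags in the document head; the values below are placeholders:

    <!-- Open Graph Protocol (consumed by Facebook and other platforms) -->
    <meta property="og:title" content="Example Article Title">
    <meta property="og:type" content="article">
    <meta property="og:url" content="http://www.example.com/article">
    <meta property="og:image" content="http://www.example.com/thumbnail.jpg">

    <!-- Twitter Cards -->
    <meta name="twitter:card" content="summary">
    <meta name="twitter:title" content="Example Article Title">
    <meta name="twitter:description" content="A one-sentence summary of the article.">

The implementation cost is low; the ROI question is about who ultimately benefits from the data once it's published.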

The vast majority of the rich snippets and Knowledge Graph elements in the search results are derived from Freebase and a small handful of other semantic data sources, such as the CIA World Factbook and MusicBrainz. Whether or not we choose to mark up our sites will have little effect on the current Google or Bing SERPs.

However, there's a massive amount of data still present in good old HTML, and the search engines are keen to use structured data to display that information. You can see the limitations of document retrieval and reliance on the link graph in any number of less-than-desirable search results. I believe Google and Bing will raise the bar on the quality of search results through the wider adoption of semantic data markup.
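
As a simple sketch of what that adoption looks like, facts that already live in a page's plain HTML can be exposed as structured data with a handful of schema.org attributes (hypothetical event details):

    <div itemscope itemtype="http://schema.org/Event">
      <span itemprop="name">Jazz in the Park</span>
      <time itemprop="startDate" datetime="2013-07-08T19:00">July 8, 7:00pm</time>
      <div itemprop="location" itemscope itemtype="http://schema.org/Place">
        <span itemprop="name">Example Amphitheater</span>
      </div>
    </div>

Nothing new is published here; the same information the page already shows to users simply becomes easier for a machine to read reliably.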

I also believe we should consistently hold them, and any other consumers of structured data, accountable for making proper attribution and responsible user interface design key parts of how they use that data. SEO has received a bad rap in some circles as simply being a vehicle for spam, but the reality is that SEO heavy lifting is behind many of the better search results you'll find. Going forward, the same guideline will apply to structured data.

A healthy web ecosystem will find a balance between search engine, user, and content publisher. Let's continue to remind the aggregators of our data of that as we continue down the semantic SEO path.

Bonus question: What's the best move for web publishers?

The rate of adoption of structured data markup will have profound effects on what our SERPs look like in the next few years. Is it worth the effort for web publishers (both small and large) to immediately provide this markup? I'd love to hear your thoughts and strategy in the comments.