Search engines, especially Google, have gotten remarkably good at understanding searchers' intent—what we mean to search for, even if that's not exactly what we search for. How in the world do they do this? It's incredibly complex, but in today's Whiteboard Friday, Rand covers the basics—what we all need to know about how entities are connected in search.
For reference, here's a still of this week's whiteboard!
Video Transcription
Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we're talking topic modeling and semantic connectivity. Those words might sound big and confusing, but, in fact, they are important to understanding the operations of search engines, and they have some direct influence on things that we might do as SEOs, hence our need to understand them.
Now, I'm going to make a caveat here. I am not an expert in this topic. I have not taken the required math classes, stats classes, programming classes to truly understand this topic in a way that I would feel extremely comfortable explaining. However, even at the surface level of understanding, I feel like I can give some compelling information that hopefully you all and myself included can go research some more about. We're certainly investigating a lot of topic modeling opportunities and possibilities here at Moz. We've done so in the past, and we're revisiting that again for some future tools, so the topic is fresh on my mind.
So here's the basic concept. The idea is that search engines are smarter than just knowing that a word, a phrase that someone searches for, like "Super Mario Brothers," is only supposed to bring back results that have exactly the words "Super Mario Brothers," that perfect phrase in the title and in the headline and in the document itself. That's still an SEO best practice because you're trying to serve visitors who have that search query. But search engines are actually a lot smarter than this.
One of my favorite examples is how intelligent Google has gotten around movie topics. So try, for example, searching for "That movie where the guy is called The Dude," and you will see that Google properly returns "The Big Lebowski" in the first ranking position. How do they know that? Well, they've essentially connected up "movie," "The Dude," and said, "Aha, those things are most closely related to 'The Big Lebowski.' That's what the intent of the searcher is. That's the document that we're going to return, not a document that happens to have 'That movie about the guy named The Dude' in the title, exactly those words."
Here's another example. So this is Super Mario Brothers, and Super Mario Brothers might be connected to a lot of other terms and phrases. So a search engine might understand that Super Mario Brothers is a little bit more semantically connected to Mario than it is to Luigi, then to Nintendo and then Bowser, the jumping dragon guy, turtle with spikes on his back -- I'm not sure exactly what he is -- and Princess Peach.
As you go down here, the search engine might actually have a topic modeling algorithm, something like latent semantic indexing, which was an early model, or latent Dirichlet allocation, which is a somewhat later model, or even predictive latent Dirichlet allocation, which is an even later model. The model's not particularly important, especially for our purposes.
What is important is to know that there's probably some scoring going on. A search engine -- Google, Bing -- can understand that some of these words are more connected to Super Mario Brothers than others, and it can do the reverse. They can say Super Mario Brothers is somewhat connected to video games and very much not connected to cat food. So if we find a page that happens to have the title element of Super Mario Brothers, but most of the on-page content seems to be about cat food, well, maybe we shouldn't rank that even if it has lots of incoming links with anchor text saying "Super Mario Brothers" or a very high PageRank or Domain Authority or those kinds of things.
So search engines, Google in particular, have gotten very, very smart about this connectivity stuff and this topic modeling post-Hummingbird. Hummingbird, of course, being the algorithm update from last fall that changed a lot of how they can interpret words and phrases.
So knowing that Google and Bing can calculate this relative connectivity, connectivity between the words and phrases and topics, we want to know how are they doing this. That answer is actually extremely broad. So that could come from co-occurrence in web documents. Sorry for turning my back on the camera. I know I'm supposed to move like this, but I just had to do a little twirl for you.
Distance between the keywords. I mean distance on the actual page itself. Does Google find "Super Mario Brothers" near the word "Mario" on a lot of the documents where the two occur, or are they relatively far away? Maybe Super Mario Brothers does appear with cat food a lot, but they're quite far away. They might look at citations and links between documents in terms of, boy, there's a lot of pages on the web that, when they talk about Super Mario Brothers, also link to pages about Mario, Luigi, Nintendo, etc.
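To make the distance idea concrete, here is a minimal Python sketch. It is not anything Google is known to run; it only measures the smallest gap, in word tokens, between two phrases on one page. The function names (tokenize, find_phrase, min_distance) and the sample page are made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def find_phrase(tokens, phrase):
    """Return the start index of every occurrence of phrase in tokens."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]

def min_distance(text, phrase_a, phrase_b):
    """Smallest gap, in tokens, between the start positions of two phrases.

    Returns None when either phrase is missing from the text.
    """
    tokens = tokenize(text)
    starts_a = find_phrase(tokens, phrase_a)
    starts_b = find_phrase(tokens, phrase_b)
    if not starts_a or not starts_b:
        return None
    return min(abs(a - b) for a in starts_a for b in starts_b)

# Hypothetical page text, invented for this example.
page = ("Super Mario Brothers is a Nintendo game starring Luigi. "
        "Unrelated footer: our sponsor sells cat food.")

print(min_distance(page, "Super Mario Brothers", "Luigi"))     # 8
print(min_distance(page, "Super Mario Brothers", "cat food"))  # 14
print(min_distance(page, "Super Mario Brothers", "Bowser"))    # None
```

A real system would presumably aggregate this over many documents rather than look at one page, but the per-page number is the building block.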
They can look at the anchor text connections of those links. They could look at co-occurrence of those words biased by a given corpus, a set of corpora, or from certain domains. So they might say, "Hey, we only want to pay attention to what's on the fresh web right now or in the blogosphere or on news sites or on trusted domains, these kinds of things as opposed to looking at all of the documents on the web." They might choose to do this in multiple different sets of corpora.
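Here is a rough Python sketch of what "co-occurrence biased by a given corpus" could look like. It uses a plain Jaccard overlap (documents containing both terms divided by documents containing either), which is my own choice for illustration and not a known search engine formula. The two tiny corpora, "news" and "blogs", and the function name cooccurrence_score are invented.

```python
def cooccurrence_score(documents, term_a, term_b):
    """Jaccard overlap: docs containing both terms / docs containing either.

    Uses naive case-insensitive substring matching, a simplification.
    """
    a, b = term_a.lower(), term_b.lower()
    has_a = {i for i, doc in enumerate(documents) if a in doc.lower()}
    has_b = {i for i, doc in enumerate(documents) if b in doc.lower()}
    either = has_a | has_b
    if not either:
        return 0.0
    return len(has_a & has_b) / len(either)

# Hypothetical corpora; real ones would hold millions of documents.
corpora = {
    "news": [
        "Nintendo announces a new Super Mario Brothers release",
        "Super Mario Brothers speedrun record falls as Nintendo fans cheer",
        "Cat food recall announced by manufacturer",
    ],
    "blogs": [
        "I played Super Mario Brothers while my cat ate cat food",
        "Nintendo retrospective on classic consoles",
        "Best cat food brands this year",
    ],
}

for name, docs in corpora.items():
    nintendo = cooccurrence_score(docs, "Super Mario Brothers", "Nintendo")
    cat_food = cooccurrence_score(docs, "Super Mario Brothers", "cat food")
    print(name, nintendo, cat_food)
# news 1.0 0.0
# blogs 0.0 0.5
```

The same pair of terms scores very differently depending on which corpus you measure in, which is the point of restricting to trusted or fresh sources.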
They can look at queries from searchers, which is a really powerful thing that we unfortunately don't have access to. So they might see searcher behavior saying that a lot of people who search for Mario, Luigi, Nintendo are also searching for Super Mario Brothers.
They might look at searcher clicks, visits, history, all of that browser data that they've got from Chrome and from Android and, of course, from Google itself, and they might say those are corpora that they use to connect up words and phrases.
Probably there's a whole list of other places that they're getting this from. So they can build a very robust data set to connect words and phrases. For us, as SEOs, this means a few things.
If you're targeting a keyword for rankings, say "Super Mario Brothers," those semantically connected and related terms and phrases can help with a number of things. So if you could know that these were the right words and phrases that search engines connected to Super Mario Brothers, you can do all sorts of stuff. Things like inclusion on the page itself, helping to tell the search engine my page is more relevant for Super Mario Brothers because I include words like Mario, Luigi, Princess Peach, Bowser, Nintendo, etc. as opposed to things like cat food, dog food, T-shirts, glasses, what have you.
You can think about it in the links that you earn, the documents that are linking to you and whether they contain those words and phrases and are on those topics, the anchor text that points to you potentially. You can certainly be thinking about this from a naming convention and branding standpoint. So if you're going to call a product something or call a page something or your unique version of it, you might think about including more of these words or biasing to have those words in the description of the product itself, the formal product description.
For an About page, you might think about the formal bio for a person or a company, including those kinds of words, so that as you're getting cited around the web or on your book cover jacket or in the presentation that you give at a conference, those words are included. They don't necessarily have to be links. This is a potentially powerful thing to say a lot of people who mention Super Mario Brothers tend to point to this page Nintendo8.com, which I think actually you can play the original "Super Mario Brothers" live on the web. It's kind of fun. Sorry to waste your afternoon with that.
Of course, these can also be additional keywords that you might consider targeting. This can be part of your keyword research in addition to your on-page and link building optimization.
What's unfortunate is right now there are not a lot of tools out there to help you with this process. There is a tool from Virante. Russ Jones, I think did some funding internally to put this together, and it's quite cool. It's nTopic.org. Hopefully, this Whiteboard Friday won't bring that tool to its knees by sending tons of traffic over there. But if it does, maybe give it a few days and come back. It gives you a broad score with a little more data if you register and log in. It's got a plugin for Chrome and for WordPress. It's fairly simplistic right now, but it might help you say, "Is this page on the topic of the term or phrase that I'm targeting?"
There are many, many downloadable tools and libraries. In fact, Code.google.com has an LDA topic modeling tool specifically, and that might have been something that Google used back in the day. We don't know.
If you do a search for topic modeling tools, you can find these. Unfortunately, almost all of them are going to require some web development background at the very least. Many of them rely on a Python library or an API. Almost all of them also require a training corpus in order to model things on. So you can think about, "Well, maybe I can download Wikipedia's content and use that as a training model or use the top 10 search results from Google as some sort of training model."
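To show why a training corpus matters, here is a small Python sketch. To be clear, this is TF-IDF, a much simpler technique than LDA, swapped in because it fits in a few lines with no libraries. The reference corpus plays the role of the "training" data: words that appear in every reference document get a weight of zero. The corpus, the page text, and the function names are all invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def inverse_document_frequency(corpus):
    """Map each term to log(N / number of documents containing it)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log(n_docs / count) for term, count in doc_freq.items()}

def top_terms(page, idf, k=3):
    """Rank the page's terms by term frequency times IDF from the corpus."""
    counts = Counter(tokenize(page))
    # Terms never seen in the reference corpus are skipped in this sketch.
    scores = {t: c * idf[t] for t, c in counts.items() if t in idf}
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical reference corpus (stand-in for Wikipedia or top-10 results).
reference_corpus = [
    "the plumber mario jumps on the turtle",
    "the cat eats the cat food",
    "the princess waits in the castle",
    "the game console plays the game",
]
idf = inverse_document_frequency(reference_corpus)

page = "the plumber mario rescues the princess and mario jumps"
print(top_terms(page, idf))  # ['mario', 'jumps', 'plumber']
```

With a different reference corpus the same page would get different top terms, which is why the choice of training data is such a big part of the work.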
This is tough stuff. This is one of the reasons why at Moz I'm particularly passionate about trying to make this something that we can help with in our on-page optimization and keyword difficulty tools, because I think this can be very powerful stuff.
What is true is that you can spot check this yourself right now. It is very possible to go look at things like related searches, look at the keyword terms and phrases that also appear on the pages that are ranking in the top 10 and extract these things out and use your own mental intelligence to say, "Are these terms and phrases relevant? Should they be included? Are these things that people would be looking for? Are they topically relevant?" Consider including them and using them for all of these things. Hopefully, over time, we'll get more sophisticated in the SEO world with tools that can help with this.
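If you would rather script part of that spot check, here is a minimal Python sketch. It assumes you have already copied the text of the ranking pages by hand; it does not fetch anything. It lists words that show up on at least two of those pages but not on yours. The stopword list, the sample pages, and the function names are placeholders I made up, and the output is only a list of candidates for you to judge.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; extend it for real use.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}

def terms(text):
    """Return the set of non-stopword tokens in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def candidate_terms(ranking_pages, my_page, min_pages=2):
    """Terms used on at least min_pages ranking pages but absent from my_page."""
    page_counts = Counter()
    for page in ranking_pages:
        page_counts.update(terms(page))
    mine = terms(my_page)
    shared = [(t, n) for t, n in page_counts.items()
              if n >= min_pages and t not in mine]
    # Most widely shared terms first; ties are broken alphabetically.
    return sorted(shared, key=lambda item: (-item[1], item[0]))

# Hypothetical page texts standing in for the top results.
ranking_pages = [
    "Super Mario Brothers guide: Mario and Luigi rescue Princess Peach from Bowser",
    "Nintendo classic Super Mario Brothers with Mario, Luigi and Bowser",
    "Play Super Mario Brothers, the Nintendo game starring Mario",
]
my_page = "Super Mario Brothers fan page with T-shirts and glasses"

print(candidate_terms(ranking_pages, my_page))
# [('bowser', 2), ('luigi', 2), ('nintendo', 2)]
```

You still apply your own judgment to the list: the script only surfaces terms, it cannot tell you whether they are topically relevant.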
All right, everyone, hope you've enjoyed this edition of Whiteboard Friday. Look forward to some great comments, and we'll see you again next week. Take care.
The best part of this WBF is the last one: "use your brain". Nothing could be more correct. In fact, using tools is not really necessary, IMHO: by simply "studying the subject" (i.e., reading stuff about it, talking with the client, monitoring "social chatter" about the topic) you can build a thesaurus of words all related to a specific topic.
I mean, we don't need to do all this complicated analysis, for instance, when we write about SEO. Why? Because we know the topic, we know the jargon SEOs use, etc.
Really, if you did some literature and humanistic studies, semantic search should not be so complicated for you, also because Google, even if it's quite a diligent student, is still a Year 3 one in terms of semantics, while we are supposed to have at least finished our college years with some success.
Use your brain and don't overcomplicate things. Don't start thinking of writing as if it were some sort of weird combination of "factors" (use this word here, that one there, this anchor surrounded by those words...): doing that, you will be passing from keyword over-optimization to "topical over-optimization".
That said, here are some sources here on Moz about Topic Modeling and Semantic Search:
1) My Whiteboard Friday about "Topical Hubs";
2) My free Moz Webinar about "Semantic SEO"
3) Cyrus' post about the new "on-page" optimization.
Finally, a suggestion:
Then, obviously, integrate with what Google Suggest suggests (sigh) and with the Related Searches shown by Google at the end of the SERP.
Thanks, Gianluca Fiorelli, for your suggestion of seven steps. But why only Wikipedia? Can we try it for the links ranking better in the Google search results for the keyword we searched?
I suggest doing that in point 7.
And I suggest Wikipedia because it is the most accessible source of the many Google uses for the Knowledge Graph, and because for topic searches it always ranks in the top 3.
oops! Thanks for the response.
By the way, you can use other tools for more keyword suggestions, not only Google's keyword suggestion tool.
Here is a nice example of a keyword suggestion tool:
https://sg.serpstat.com/
I had the same thought! Though I do love all the analysis/detail Rand put into this WBF. :)
"With just simply 'studying the subject' (i.e.: reading stuff about it, talking with the client, monitoring 'social chatter' about the topic) you can build a thesaurus of words all related to a specific topic." -- Exactly. I think this is why a lot of SEOs/SEO agencies fail. They apply "best practices" without truly understanding/becoming an expert on the product/service & customer needs. They simply do not take enough ownership. This is partly why I'm an advocate of brands having in-house SEOs and for SEOs to only work with brands that they themselves would be a customer of or at least can relate with/get excited about.
At the very least, putting yourself in the customer's shoes will always spur new ideas… if I'm interested in red wine, I might also be interested in recipe pairings for red wine or a decanter or serving/storing tips, etc. If I'm looking for a surfboard, maybe I also need a wetsuit, wax, a leash, tips on learning to surf. And if I find a brand that offers everything I need, great! But if they don't, I would love any info they (a brand I already trust) can provide or maybe even refer me to another trusted company. Going through this type of brainstorming exercise really helps create search-friendly (topic-modeled, semantically-connected) content that's helpful to customers.
Great addition to Cyrus' "More than Keywords" post from last week!
There are some sites on the web that list synonyms for a word. That might help too.
Hi Gianluca,
If I may paraphrase, I think where the tools can help is giving you more suggestions, more accurate suggestions, and faster.
Russ Jones has indeed done a great job. We at MarketMuse have also built a semantic keyword research tool, at https://www.marketmuse.com
The keyword research tool itself is interesting data that could save you a few hours every month. What's more interesting are the content analysis and site analysis applications that can be built with this type of data.
Great post and discussion! Looking forward to seeing more about Semantic SEO in the years ahead.
Aki
Thanks for the shout-out for nTopic :-) You probably will destroy the service temporarily, but that's ok. I did invest substantially in the product and it was worth it for Virante clients alone. You are being a little too modest though, as it was Moz, Rand and Ben who first turned Virante onto LDA. Moreover, you mention using the Google code library for LDA, which is actually what nTopic uses!
For those who are prepared to do a deeper dive, take a look at this study we conducted using nTopic recommendations for search traffic increases. What we found is that where the real benefit comes is the ability to pick up long-tail traffic that comes from the usage of words that will naturally occur in long tail queries that you otherwise would have missed by not including all the related terms.
I am excited to see what Moz is thinking about launching in this space, it has been too quiet too long!
Also, a word to those who go to use the tool - the Chrome Plug-in attempts to extract the textual content from the page. Malformed HTML can get in the way of this so scores can be iffy. Your best bet is to always get the free API-Key and then copy and paste the textual content into the Writer app on the site.
I really wish the photo of the whiteboard would be high resolution. Sadly it is almost unreadable.
PS: Bowser is a Koopa, which is a turtle-like race.
Whiteboard Friday is always a real pleasure. Thanks! nTopic.org is a great idea, however, I just tested some No. 1 ranking keyword/URL combinations (google.de) for the German market and always got "Your content is statistically irrelevant", which is probably not an appropriate result. I wonder whether this is due to the early stage of the product or due to the exotic geographical reference.
That's because the tool, as so many unfortunately, works only with English.
That was also the issue with an old tool Moz had (and quite soon retired) for latent Dirichlet allocation.
Thanks - I will give it a try for English sites.
Yep, thanks for pointing that out. Unfortunately, we would have to build a language model for all other languages and, frankly, with our current user base size, it wouldn't be worth it :-(
That being said, at Virante's 2015 planning retreat yesterday, nTopic was on the list, so hopefully better things to come!
Thanks, this is good news - I keep my fingers crossed.
Huh, I also get "statistically irrelevant" for our English blog travelmemo.com. Is that due to our small footprint? We only have a few thousand visitors per month.
As I mentioned in a different post, the Chrome extension can fail if it can't extract your content. It is better to get the free API key and then use the Writer App - https://ntopic.org/writer.php
I ran the content of your home page against the words "Travel Blog" and "Luxury Travel" and you scored a 96.55% and 96.68% respectively.
We recommend that you try and get to 99% relevancy for your primary keywords. For example, if you wanted to do that for "Luxury Travel", you would want to make sure your homepage included words like tours, luxury, islands, destination (singular), city, resorts, private, leisure, family, book, package, offer, deal, and food.
I will push to the Spanish version.:)
Me too!
Me too ;)
This article is timely as we just built Relevance Scores into our SEO Content Analysis Tool here at Volume Nine.
Right now you can only use the scoring tool if you are an SEO Dashboard user. We are using the Alchemy API to measure the relevancy of the website page content against a single keyword phrase. Without going into too many details, the tool basically analyzes every word on the page to see whether it is relevant to that single keyword phrase.
Just to explain if it works, let's take 2 pages that are both focused on Halloween Decorations that rank in the Top 100 SERPs on Google
The last link is a news story about someone that stole Halloween Decorations while the first and second link are actually pretty decent pages for the search intent of buying Halloween Decorations.
We added this because we found that most Content Analysis tools only looked for Keyword Representation and, even if you looked for variations, the tools didn't encourage the writer or optimizer to do the hard thinking work of building in relevant content surrounding the topic. Of course, we think that this content expansion (or enrichment) makes a better blog post or organic landing page experience, but we wanted to be able to explain it numerically like most SEOs want to do.
Anyways, I saw this WBF post and wanted to share that we are working on this exact thing in our SEO Content Analysis Tool
Hi Chuck,
This is great -- I haven't heard of VolumeNine before, but I'm glad that you're working on this problem as well.
AlchemyAPI is a good general-purpose algorithm to start with. I imagine you're using the Keyword Extraction API?
At MarketMuse, we've built a topic modeling tool that analyzes a large volume of ranking content to identify the topics (typically bigrams and trigrams, i.e. 2-word keywords and 3-word keywords) that you're missing from your site. We utilize a combination of semantic analysis, graph analysis and natural language processing. Taking this approach gives you the gaps that you're missing that you didn't know about (i.e. relevant terms, proof terms, co-occurrence, etc.).
I think you'd find our API interesting as you build your dashboard. Let me ping you under separate cover.
Aki
Akos (Aki) Balogh
Co-Founder, MarketMuse
I watched this with my 5 year old daughter (#trainthemyoung)... Her key takeaway from this video was "I wouldn't want to get stuck in a 'fresh web,' that would be gross."
One word: beautiful. :) Looking forward to seeing topic modeling tool developments one day in the future
Hi Rand. Been writing for SEO for the last 7 years. Wrote a post in response to your WhiteBoard Friday, how a content writer thinks of all this. Hope it is persuasive enough. :) . Have a great day.
https://www.linkedin.com/today/post/article/20141107103742-50244662-semantic-connectivity-seo-where-does-content-stand
Hi! My name is Alexander. I'm from Ukraine and I'm an SEO specialist. I need your help. We have a problem with one site. We have asked a lot of people, but nobody can help us. There is a DMCA complaint to Google, https://www.chillingeffects.org/notices/1023418, and the main page of our site has been deleted from the SERP. We have written to Chilling Effects and to the Google forums: https://productforums.google.com/forum/#!category-topic/webmasters/webmaster-tools/oVO3_G3ksCQ https://productforums.google.com/forum/?hl=ru#!category-topic/webmaster-ru/%D1%81%D0%BA%D0%B0%D0%BD%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5-%D0%B8%D0%BD%D0%B4%D0%B5%D0%BA%D1%81%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5-%D0%B8-%D1%80%D0%B0%D0%BD%D0%B6%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5-%D1%81%D0%B0%D0%B9%D1%82%D0%BE%D0%B2/H5yeegPFCjY
Guys from the forum helped us find out that our complaint was lost. Now it's OK, but we have no answer from anywhere... and the main page is still out.
Sorry for my bad English. I have no more ideas. That's why I wrote to you. I hope that you have experience with such questions and that you can help us.
Thanks! Best Regards,
Alexander.
Great post, mainly because it turned me onto some interesting resources. After wasting an hour on Nintendo8 I started tackling Latent Dirichlet Allocation topic modeling in detail. My head hurts now, but the methodologies of topic modeling are starting to make a lot more sense.
For related topic or WORD don't you think Wikipedia will work for that?
GREAT example using Nintendo and Mario Brothers!! Search engines are getting much smarter and are able to connect words & phrases to topics.
I'm starting to think that I don't know enough math to do SEO anymore. With that being said, my suggestion would be to use both Wikipedia and Google to find words that have some sort of semantic relationship to your keyword. Search Wikipedia and see what other topics it has related to your keyword. Secondly, use the Google related searches to see other keywords that people searched for when looking for this topic.
Google Wonder Wheel would have helped us. What do you guys think, shouldn't Google bring it back?
Topic Modelling tools!!
heard about such tools for the first time :) Thanks Rand
Use synonym generators, like a dictionary; it often helps. Great topic, Rand!
Hi there,
Rand, thanks for the video. Sorry I watched it 2 weeks late, but it's really good. My question: we used to rely on LSI, which is the older approach, but now we follow topic modeling. Is this some sort of new version of LSI, or is it an older one? You also told us about related links, where the link comes from your searches.
For example:
i am searching on Spider Man.
1 . Amazing Spider man
2. Animated Spider man
3. Spider man the movie
4. spider man game.
So these are the searches I got. Is the older version of LSI still there, or has it simply vanished?
This post is a little above my level at this stage but still interesting.
Great WBF topic! I've been working more semantic into my content marketing strategies over the last year...working wonders for rankings. Good stuff!
It's great, man! Actually I was a bit confused about the whiteboard, but now it's clear. Thanks a lot, buddy
Thanks for sharing
Where do you buy your shirts?
Very interesting Whiteboard Friday. Have a read of this experiment testing semantic search vs traditional on-page SEO
The semantic connection between the different levels of the search engines is getting more and more important in the work of an SEO. It is no longer the future; it is happening now. We need to get our act together!! Thanks for the post!!
Further reading:
https://www.cs.cmu.edu/~nlao/publication/2014.kdd.p...
https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/demo/
https://gate.d5.mpi-inf.mpg.de/webyagospotlx/SvgBrowser (I think this is how Google is now seeing topics and entities)
Rand, it's a favorite topic of our legends Bill Slawski & Gianluca Fiorelli but you explained the whole process really well :) :)
Semantic search is indeed a great thing not only for search engines but for the normal users as well. I'd say every brand should now shift their focus on words connectivity with their names.
Rand, How do you foresee the future of anchor texts and keywords now? Will the semantic modelling wipe them out?
Umar
I.AM.LEGEND
YOU ARE!! :)
Mind sharing your views on my questions? :)
Anchor texts are search entities, and search entities play a major role in Semantic Search, hence anchor text still matters now and in the future. That's why having links only with silly anchors like "here", "post", "great", words that say nothing about the content linked, is not just silly, it's also bad.
Semantic Search is about relations between words and the concepts/entities they represent inside an ontology set. And keywords are nothing but (key)words :-)
Thank you so much for the detailed explanation sir! :)
Thank you for your very clear description
Back in January I already noticed that the German Google lists sites in top positions that do not have the "keyword combination" in the document at all. For example, a site ranked No. 1 in the German Google for the term "Onlineshop tableware" without having the word tableware on the site at all. Or position 3 for "health pomegranate" without having the word pomegranate anywhere in the document.
That's happened in the US Google, too. For example, Acrobat has ranked for "click here" for years, due to all the anchor text with "click here" to download Adobe Acrobat.
I doubt that in the example "health pomegranate" anybody linked to the site with the keyword pomegranate. The thing is that this site is very strong on health issues and has articles about other fruits and vegetables. I think that the topic food + health is very strong with the site. That's why they also get listed for fruits and vegetables + health queries they do not have any article about. I guess Hummingbird went into overdrive here.
Great WBF I just like the cover photo on the video, awesome!
Never knew about this, thanks for the shout-out to nTopic.org, amazing tool.
Hey Rand, I am really glad to learn about topic modeling and semantic connectivity. You have thought this through very smoothly. It is a very informative post for understanding the operations of search engines.
Amazing whiteboard, learnt many new little things! Thanks a lot Duuudes!
All great stuff for all of us to do consistently...we fall out of sync, and we need for you to bring us back to the fold...baa, baa! thank you
It's all connected. Even this topic - this is one step in the evolution of "tracking rankings." What good is tracking a single keyword when so many others may be ranking that we just aren't tracking? How is revenue? We want to start "rank indexing" but even then it's not ideal. You're still tracking buckets of keywords, not singles, but it's more planning & more resources to figure out what buckets will move the needle on the metrics that DO matter.
Love the topic - LSI is always an interesting read.
this WBF is not about LSI :-)
"As you go down here, the search engine might actually have a topic modeling algorithm, something like latent semantic indexing"
LSI is directly referenced in the article. It's one of the parts that most interest me from the WBF even though it's a small part of the overall topic. It's not "about" rank indexing either, but that's what interests me about topic modelling & semantic indexing.
indeed it is cited, but as a name for giving an idea of what rand was talking about, as LSI - unfortunately - is such a known term.
Please, go read my last post here on Moz for seeing why LSI is a myth and read the Cyrus post I link in my longer comment for better ways of making your content correct also semantically.
Another Caucasian, Gary. The Dude abides after another great WBF.
I think that those, who thumbed you down have not ever seen "The Big Lebowski". Sad :-(
In what year did the term "Semantic search" first appear in my SEO contracts?
HINT: It was a while ago
I will help you narrow it down After 1966 and before 2012
This is a good place to start. https://en.wikipedia.org/wiki/Latent_semantic_indexing
How to fix broken links? Can anyone help me?
Are you using WordPress? If yes, you only have to install this plugin: https://wordpress.org/plugins/broken-link-checker/
I like whiteboard only because of Rand.
NIce and
It's so interesting and very impressive.
I was really hoping that Rand would have dressed up in a costume for this Whiteboard Friday. I'm just going to pretend that he is V for Vendetta. Nonetheless, Great whiteboard Friday as usual :) If you have some time, please check out my new article I posted for Halloween "Internet Marketing 101: Bad First Impressions That Can Spook Off Your Leads"
https://www.odegiecommerce.com/internet-marketing-1...