How do I recap the SEOmoz PRO Seminar session on Uncovering a Hidden Technique for SEO? The title is so attractive that it produces Pavlovian symptoms as we salivate at the thought of uncovering a hidden SEO treasure. Ben Hendrickson of SEOmoz presented a model which appears to show how Google may assign relevance to keyword terms based on context - topical relevance.
Is Latent Dirichlet Allocation (LDA) that hidden jackpot?
1st - LDA is not new, nor is it something SEOmoz invented. The information retrieval model has been around for 7 or 8 years, and IR geeks have discussed it before. There are a number of resources, as well as naysaying, about LDA and Google's possible use of it.
2nd - What is new is SEOmoz's LDA Topics Tool, which produces a relevancy score based on a query (search term). It enables one to play with words that may increase a page's relevancy in the eyes of Google. It shows words that help Google determine how relevant the page is to a user's search query.
Game Changer?
Kyle Stone tweeted that the LDA tool is a game changer, and many retweeted.
Is SEOmoz's LDA tool a game changer? That's yet to be seen. The goal is to report Ben's research as presented at the Mozinar and how a layman (myself) interprets such. Rand is going to do a follow-up post to explain more.
Why all the hype?
The SEO Challenge
SEOs face the continual challenge of figuring out Google's hidden ranking algorithms. How do we rank higher? Which signals are the most important? We know search engines are "learning models" that attempt to understand the "context" of words. Google has said for years that webmasters should concentrate most on providing good, relevant (contextual) content.
There are ways to rank higher. Is it as easy as 1, 2, 3?
- Create quality copy with keyword(s) on the page along with associated anchor text links.
- Get good links.
- What Ben talked about in this session.
LDA - Topic Modeling & Analysis
Latent Dirichlet Allocation, in layman's terms, translates to "topic modeling." In search geek terms, LDA is the following formula:
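The formula on Ben's slide appeared as an image in the original post and is not reproduced here. Assuming it was the standard generative model from Blei et al. (2003), the per-document joint distribution is:

```latex
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```

where \theta is the document's topic mixture (drawn from a Dirichlet prior with parameter \alpha), z_n is the topic assigned to the n-th word, and \beta parameterizes each topic's distribution over words.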
(Did you digest that? Don't worry; Mozzers groaned and laughed at the same time. PLUS: Scientist Hendrickson delivered this session after lunch!)
LDA Simplified - Here is Ben's way of explaining topic modeling:
(Okay, I was once proud that I got an A in Logic and Combinatorics - discrete math/set theory. However, that computer science class now feels like basic math compared to this formula.)
It made more sense when Rand Fishkin joined Ben on stage and when Todd Friesen moderated and deciphered during Q&A. (Manuela Sanches of Brazil was sitting next to me and said that Ben's "presentation needed subtitles!")
The objective of LDA, from my deciphering of Greek, is to understand how Google uses semantic contextual analysis, combined with other signals, to define topics/concepts. It's how Google analyzes the words on a page to determine the "set" to which a word belongs - how relevant a search query is to pages in its database.
For example: How does Google assign relevance to the word "orange" on a page? They determine orange is related to the fruit set or to the color set by page context.
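That disambiguation can be caricatured in a few lines of Python. This is a toy illustration of the idea only, not anything Google or SEOmoz actually runs; the word sets and sample sentences are invented:

```python
# Toy word-sense disambiguation by set co-occurrence. Illustrative only:
# the word sets and sample sentences are invented, not from Google or SEOmoz.
FRUIT_SET = {"juice", "peel", "citrus", "vitamin", "ripe", "grove"}
COLOR_SET = {"paint", "hue", "shade", "bright", "wall", "palette"}

def classify_orange(page_text: str) -> str:
    """Guess which sense of 'orange' a page uses from surrounding words."""
    words = set(page_text.lower().split())
    fruit_hits = len(words & FRUIT_SET)
    color_hits = len(words & COLOR_SET)
    if fruit_hits == color_hits:
        return "ambiguous"
    return "fruit" if fruit_hits > color_hits else "color"

print(classify_orange("fresh orange juice from the citrus grove"))   # fruit
print(classify_orange("a bright orange paint for the kitchen wall")) # color
```

A real system would use far larger, learned word distributions rather than hand-picked sets, but the principle - page context votes for a sense - is the same.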
LDA Defined:
"Latent Dirichlet Allocation (Blei et al., 2003) is a powerful learning algorithm for automatically and jointly clustering words into "topics" and documents into mixtures of topics. It has been successfully applied to model change in scientific fields over time (Griffiths and Steyvers, 2004; Hall et al., 2008).
A topic model is, roughly, a hierarchical Bayesian model that associates with each document a probability distribution over "topics", which are in turn distributions over words."
Bayesian - ah, a term I recognize!! Bayesian spam filtering is a method used to detect spam. It draws off a database and learns the meaning of words. It's "trained" by us when we mark an email as spam. It looks at incoming emails and calculates the probability that the content of an email is contextually spammy.
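The mechanics described above can be sketched as a miniature naive Bayes classifier. Everything here is an invented toy - the training sentences, the equal 50/50 prior, and the 1e-3 probability floor for unseen words - not any real filter's implementation:

```python
import math

# Miniature naive Bayes spam filter. Everything here is an invented toy:
# the training sentences, the equal 50/50 prior, and the 1e-3 probability
# floor for words never seen in training.
spam_docs = ["win free money now", "free prize offer now"]
ham_docs  = ["meeting agenda for monday", "lunch plans for friday"]

def word_probs(docs):
    """Per-word relative frequency across a set of training documents."""
    words = " ".join(docs).split()
    total = len(words)
    return {w: words.count(w) / total for w in set(words)}

def spam_probability(text, spam_docs, ham_docs):
    """P(spam | text) under a bag-of-words naive Bayes model."""
    p_spam_w = word_probs(spam_docs)
    p_ham_w = word_probs(ham_docs)
    log_spam = log_ham = math.log(0.5)  # equal priors
    for w in text.lower().split():
        log_spam += math.log(p_spam_w.get(w, 1e-3))
        log_ham += math.log(p_ham_w.get(w, 1e-3))
    odds = math.exp(log_spam - log_ham)
    return odds / (1 + odds)

print(round(spam_probability("free money offer now", spam_docs, ham_docs), 3))
```

Marking an email as spam corresponds to appending it to `spam_docs` and recomputing the word frequencies - that is the "training by us" the paragraph describes.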
I found a PowerPoint presentation about Bayesian Inference Techniques by Microsoft Research from 2004 that presents the possibility of using LDA. Go to slide 54 and read:
"Can we build a general-purpose inference engine which automates these procedures?"
Microsoft has been looking at LDA models. Do search engines use it as one of their primary methods?
Ben sampled over 8 million documents with approximately 1,000 queries. He believes Google is using LDA topic modeling to determine (learn) what words mean by their associations with, and relevance to, other words on the page. (Other factors are included.) Ben called the results a "co-occurrence explanation" that uses a "cosine similarity."
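The "cosine similarity" Ben mentioned is easy to show in its simplest bag-of-words form. This is a minimal sketch; real systems would weight terms (e.g. with TF-IDF) rather than use raw counts:

```python
import math
from collections import Counter

# Bag-of-words cosine similarity between two pages' text. Real systems
# would weight terms (e.g. TF-IDF) instead of using raw counts.
def cosine_similarity(doc_a: str, doc_b: str) -> float:
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity("seo tools for keyword research",
                        "free seo tools and keyword data"))
```

A score of 1.0 means identical word distributions; 0.0 means no shared vocabulary at all. In an LDA setting the vectors compared would be topic distributions rather than raw word counts.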
SEO Takeaway:
- Results that are higher in Google SERPs, in general, have more topical content.
- Search engines do APPEAR to apply semantic analysis when indexing a page and determining the intent of the words on the page.
Rand tweeted an explanation (in 140 x 4) as follows:
Dana's LDA Catwalk Metaphor for Topic Modeling:
Imagine the words on your page as walking down the fashion runway in Paris. Your keyword phrase is "dressed" in semantic accessories, words that correlate to and dress up your topic. Associated words bring meaning to and highlight the fashion model's outfit. Adjectives, modifiers and synonyms are like jewelry, hats, and shoes. The combination can transform your base layers (your target terms) from casual or conservative business attire into a sexy night-on-the-town ensemble.
Combinations and permutations of words on a page "dress" your skinny or curvy fashion model. Relevant words provide Google with an image of what she is wearing and the catwalk upon which she struts. LDA refers back to what Google already knows about these "accessories" (words) and their previous association with the topic terms related to fashion.
Enter Topical Ambiguity - I just broke the "rules" for context with the catwalk metaphor by referring to modeling in two contexts on this page:
- I used "modeling" terms that relate to the "fashion industry" set.
- The catwalk metaphor is irrelevant content that is off-topic for discussing "LDA topic modeling."
Google Algorithm Exposed?
Ben clearly said that LDA is an ATTEMPT to explain the SERPs. His scenario, a quote from his presentation slides, follows:
One of us needs to implement it so we can:
1) See how it applies to pages
2) See if it helps explain SERPs
One-two-three-not-it.
LDA is not LSI.
There were some tweets claiming SEOmoz was bringing back LSI or snake oil. Ben clarified that LDA is not LSI, which deals more with keyword density. He explained that he is NOT talking about loading keywords on a page but about the relevance of the topics within the page. He said that:
"LSI doesn’t have the same bias toward simple explanations. LSI breaks down as you try to scale up the number of topics."
The LDA tool deals with context, semantic relevancy, not density - in addition to some other random factors. Example:
If SEOmoz has a page all about "SEO" and "tools," and there is another word on the page that can be explained by a word more related to the SEO topic, then the related word would be used. Meaning, "seo tools" doesn't have to be repeated over and over; the related word would be interpreted by Google as being relevant.
Ben, who appears to have the brain of a search engine, noted that it "appears" LDA is what Google is heading for in the near future. He said (paraphrased):
If they are not doing it, they seem to be doing something that has the same output. They are probably already using it.
Rand deciphered:
It’s a super weird coincidence if Google is not using it.
Are On-Page Signals Stronger than Links?
Are we heading toward more emphasis on on-page topic modeling? I'm not an IR geek, but I do plan to spend more energy focusing on understanding how search engines retrieve information. We are dealing with a semantic Web. LDA may indicate that good old on-page optimization sends stronger signals than links.
SEOmoz's LDA tool attempts to show how relevant content is to a chosen keyword. It computes relevance of queries.
The following shows how relevant SEOmoz's Tools page is to a similar page on Aaron Wall's SEO Book.
The score at the top is an indicator of how relevant the content on that page is according to LDA.
- Aaron's content is 72%* relevant for the query "seo tools."
- SEOmoz's tools page is 40%* relevant.
*NOTE: (I inserted the logos.) You can run the same pages and get different results. The results are similar in that SEO Book always scored as more topically relevant, but the percentage varies. Is this the randomness of the Monte Carlo sampling at work? Ben?
Mozinar Question:
"How do we execute this for SEO?"
Ben's Answer:
"I don't actually do SEO. I write code."
That's up to us, the SEOs, to play and test in our Google playground.
Use the tool to decide if you can win with LDA to optimize your on-page signals.
- Use the LDA Topics Tool to return words that could be used on a page for a query.
- Then determine who is ranking for that term.
- Simply write content that is highly on-topic based off the findings you observe.
If you are not performing that well in the SERPs, think about classic on-page optimization. In the example above, rather than putting another instance of "seo tools" on the page, LDA shows there are better ways to tell Google that you are about that topic. The tool provides a way to measure that.
IMPORTANT: There is a threshold at which too many related words will appear as too spammy. LDA is not something to be used to game Google.
Test the LDA Tool out for yourself, and draw your own conclusions.
***
DISCLAIMER: I'm not claiming this methodology has uncovered hidden SEO treasures. Time, testing and playing around with a new SEOmoz tool while observing the SERPs will reveal the answer. In the meantime, I'm going to dress up my pages and accessorize them with relevant terms that make them dazzle so they look good climbing the Google catwalk.
Note to understand my comment: I absolutely do not have a scientific background, and my ability to understand formulas like the ones presented in your post can be compared to that of an ant. But I have deep semiotic (and rhetorical) knowledge from my past studies and work... and I think that LDA could also be explained with the theory of signs and rhetoric.
Now I try to explain my assumption in the most logical way:
Therefore context - in a wider sense - is important. But saying that on-page context is more important than the link graph is probably not totally correct. Maybe the best assumption is to think of LDA as something as important as the link graph, with each complementing the other.
Links present in a different context from the site linked are going to be devalued because of the failed correspondence. But a page with the highest LDA percentile and no link graph confirming its relevancy is also - probably - going to rank worse than a less LDA-optimized page that has a better link profile.
P.S.1: maybe this search for context signs - apart from all the business reasons - can explain why search engines are indexing more kinds of content every day (images, .pdf, .doc, Flash video, and now also SVG... when audio files?).
P.S.2: I hope I was understandable
P.S.3: If not just think that I mixed LDA with LSI and came out with LSD
I agree with you. On page optimization is too easy to game and easy to spam, whereas external links will always carry more authority with the search engines. Although this tool makes me think I need to do a better job with on-page optimization.
That was a post in itself gfiorelli1! Your point about links providing context for LDA is certainly key, and we do know Google values links heavily. There are certainly complex factors involved, and our attempts to explain such are from a layman's perspective.
LSD? Well, some of us in attendance may have felt we were observing and listening to Ben's presentation through such a lens!
I think that was a great summary!
Hi, you are right about the signs and the context. I remember Umberto Eco. I studied him once, when i was in the university trying to learn about fine arts.
It would be interesting now to get a review after the tool has gone online. Is there anybody out here who has used this tool effectively for better on-page optimization?
Very, very interesting Dana. The LDA tool - super insightful. This may be naive of me, but I've always used Google's External Keyword Tool to find relevant keyword phrases to add to my content to make it seem more relevant. Google lists the keyword suggestions in order of relevancy, so it makes sense that the more of these related terms it finds in my copy, the more likely they will judge my targeted keyword as relevant. The SEOmoz LDA tool works using a different formula, but seems to judge how good a job I'm doing at targeting these phrases.
Rack up another metric for on-page optimization, perhaps the most important since the title tag.
This may be naive of me, but I've always used Google's External Keyword Tool to find relevant keyword phrases to add to my content to make it seem more relevant
Rather than being a naive idea, I think that's a great idea Cyrus. As long as you keep the words within the parameters of good copy (i.e., not AdSense-like copy), it sounds like it'd lend itself toward a better LDA score.
I'm planning to post about this much more on Monday night/Tuesday morning (didn't know Dana was writing this up!), but I can say that the LDA tool isn't suggesting terms/phrases that get lots of search volume, but rather words and phrases that are likely "connected" in a vector space model of semantics.
So, for example, when writing about elephants, words like "tusks," "protrusions," "hide," "pachyderm," "africa," "poachers," etc. may not be commonly searched-for, but may be useful to use on the page to provide more value to readers who are interested in learning about elephants AND more relevant to search engines who use vector-space models (probably more complex/advanced than the LDA stuff we've created) to influence their rankings.
It sounds like another great tool for the labs Rand. If anyone could find a way to create a tool that lists related phrases it's your team.
Google wonder wheel does the job of finding related phrases really well.
Would you say that looking at the collocation of words within a search results page would achieve the same? E.g., take the top x results, extract common phrases from those pages, and chances are they are there because Google has deemed those pages relevant to the same topic. Saves on having to create an index of the web and implement the algorithm.
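The shortcut proposed above - mining the top-ranked pages themselves for shared vocabulary instead of modeling the whole web - might look something like this sketch. The page texts are invented, and a real version would strip stop words and look at phrases, not single words:

```python
from collections import Counter

# Sketch of the idea above: mine the top-ranked pages themselves for shared
# vocabulary instead of modeling the whole web. Page texts are invented; a
# real version would strip stop words and look at phrases, not single words.
def shared_terms(top_pages, min_pages=2):
    df = Counter()  # document frequency: how many pages contain each word
    for page in top_pages:
        df.update(set(page.lower().split()))
    return sorted(w for w, n in df.items() if n >= min_pages)

pages = [
    "seo tools for rank tracking and keyword research",
    "free keyword research tools for seo beginners",
    "our seo software includes rank tracking reports",
]
print(shared_terms(pages))
```

Words that recur across several top-ranked pages are exactly the "accessories" the catwalk metaphor describes; terms unique to one page drop out.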
Sounds very logical.
But when I compare the no. 1 result in the SERPs against my page (far from position 1) for a particular keyword, we have a score of
6% (wikipedia) : 62% (mine)
So I think we are still on the way to a semantic web.
Until then, incoming links and trust are the major factors.
But this future vision sounds good for when the link-hunting game is over.
Because it's just a game: the best link collector wins, not the most relevant!
Long live king content .... :)
Ben didn't offer any ideas on how to use it for SEO, so as soon as I was home from the Mozinar I explained the LDA tool to my SEO team as best I could and we brainstormed a number of ways we could apply it to our SEO efforts. It's not as eloquent an explanation as Dana's as to the intricacies and history of LDA, but maybe it can help answer that particular question of how to use it.
Nice article Rebecca. Very well explained. Thanks for the link.
I just recently tried this tool, after reading this article. I noticed a couple of my projects' home pages were ranked on the first page - not in a highly competitive market, but locally, yes. When I looked at the LDA tool, it scored both of my project home pages above 50%.
So I am curious to start tracking this. Thank you for this great eye opener! I've always believed that on-site optimization is just as important as off-site. But it's great to hear the explanation of being topically relevant and not spamming the keywords.
Cheers! :-)
We had a similar idea, and the sore truth is that it still comes down to the number of root domains linking and the variation of anchors.
Awesome stuff. What a way to come back from Labor Day. This is an interesting development not just for content on one's own site, but for links too. This emphasis on relevancy is linked in my mind to the recent kerfuffle about linkbait and infographics (see https://www.davidnaylor.co.uk/infographics-are-here-to-stay.html) and the now notorious Pakistan flood infographic (see: https://thenextcorner.net/pakistan-floodbait-end-infographic/). Infographics are very clever but do they actually make sense half the time? Well, no. I welcome any attempts on Google's part to limit their effectiveness (when they are irrelevant).
If SEOmoz took the "Better Way to Think of it" equation and put it on a t-shirt I'd just have to buy one...maybe two. But I'm a super nerd when it comes to industry t-shirts. Still not a bad idea.
I've known about this for some time and always called it 'thesaurus-based content' before I knew the real term was synonyms. You can use the Google suggest box as well as online synonym tools to work through your content, ensuring that all terminology reinforces the core keyword.
Very interesting concept... I have started using the Wonder Wheel to add relevant keywords. Great tool.
Excellent post - we heard about Latent Semantic Indexing (LSI) a couple of years ago, but this is a far better model. Still, I think that Google may have customised it a bit if they're really looking to emulate human searchers. A couple of hypotheses/questions:
I love this topic. I have been dabbling in topic modeling the last few weeks and LDA is exactly the formula I used. Good to see I chose the right direction. There are two situations I use it in. (1) To crawl a list competitor's high priority pages (for me) and extract the interesting words and topics. (2) I group together keywords and find out what pages are ranking for that group, and run those pages through my LDA script, and extract any useful insights.
I am really into 3D modeling lately as well. Right now I have a prototype of a 3D scatter plot that clusters interesting topics together. It's interactive so you can pan & zoom. It's all browser based too, so no Flash, no plugins, etc. I am doing this because I write code but my role has switched to SEO in recent years. I hate using Excel. On top of this, our office is filled with a nice mix of creative personnel and statistics nuts. Interactive 3D topic modeling should help bridge the gap between those two teams. When I get closer to completion I will probably write a post demonstrating it for the community to use. Not only will it give the user actionable data, but it will also be visually pleasing to the average non-SEO.
My experiences have led me to believe quite strongly in on-page SEO for getting the results I seek in the SERPs. I'm absolutely certain I could obtain higher rankings via this method if I were to write less per page and focus instead on creating more tightly written category silos for my own site. However, since I'm trying to provide prospects with a logical progression of information - quotes backed up by links to parent resources - in order to educate and convert them, I must accept losing some position as the sacrifice. I like the way this tool rates pages, however. As a test, I took your own results (SEOmoz.org), copied and pasted them into an HTML doc, uploaded the text to a brand-new domain, ran the test on the results again, and got almost exactly the same rating (minus two). I then overwrote the page with a blank page so as not to invite Google to place the new site in the SERPs and possibly harm anyone else's ranking.
What's most interesting, however, is that the index page of SEOmoz.org did not contain the search term I chose to investigate even once, and the remaining text in the results also did not seem to pertain very closely to what I was looking for, even though SEOmoz holds the first spot in the SERPs for that keyword phrase. Most interesting. I'll have to continue looking at this. Thanks.
Hey Dana - thanks so much for covering this session. It was certainly exciting to be there and see Ben's research for the first time together with so many folks :-)
On the specifics - I think there might be some inaccuracies above, but that's certainly not your fault. Ben and I have been working on a post (with some help from others) to help clarify the issue as best we can. We're shooting to have that ready on Tuesday. I'm sorry - I didn't know you were planning to post something as well or we could have coordinated!
Thanks again for all your hard work covering what was, I'm sure, a very challenging session (amongst many great, and fast paced presentations).
Thanks Rand. I just had a "duh" moment! Of course, why didn't I think to connect with you knowing you had a post forthcoming? sigh... I strictly had my blogger/recap hat on. Hindsight is 20/20.
However, what may be beneficial out of this is how those of us in attendance understand it. I spoke with close to a dozen attendees, and everyone had a different opinion/understanding of LDA and the tool.
I am surely not alone in looking forward to your and Ben's explanation to enlighten us. Thank YOU!
How uber sweet to be able to comment again!
Thanks for the post Dana. That is some pretty awe inspiring Greek Ben used. I'm fairly certain I would have had to put a helmet on my head to keep it from exploding if I had sat through that presentation. You've explained it nicely and now I know for certain that it's not about the Learning Disabilities Association, the Long Drivers of America or the Lyme Disease Association.
While I was watching the tweet stream come out of #mozinar, I was totally stumped by the "LDA" reference. Every reference said things like "game changer" and "blown away" and I was all...Huh? What? What's LDA??
So instead of just twittering back "Yo, what in the world is LDA?" I decided that I didn't want to look like a total ignoramus (and thus remove any lingering doubts people had) so I went looking for "LDA SEO" on Google. And nothing came up that explained it. PS - I just looked on Google a minute ago and the entire SERP is now dominated by SEOmoz and this topic.
Sorry about the run on paragraph. I had used spacing, but apparently when Javascript is turned off, so is spacing. :(
Key points from Rand,
"We're hopeful that this is the start of learning more about this process and productizing suggestions about it. It remains to be seen whether folks can "improve" their LDA scores according to our models and move up in the rankings, but we should see some of those results soon."
Let's all keep these points in mind.
Consider this recap an interpretation, with some errors, from one who is learning about LDA without testing or applicable understanding. Just like the telephone game where you sit in a circle and pass on a message, meanings get deciphered and translated differently. Thus, I'd suggest we hold off on further comments on this post to allow Rand and Ben to share and explain more in their post next week. Wouldn't it be best to hold the conversation there? Agree?
Can you provide a link to their post here? Thanks!
Great blog post and a good attempt at explaining what can be a very complicated area.
Research shows that users are increasing their average search query length each year in anticipation of the ambiguous results they will receive. So, if LDA isn't one of their primary methods used to establish how relevant a search query is to a particular page in its database, what else could they be using? LSI certainly isn't scalable.
Google offered my old university professor big bucks to go work for them with his knowledge in text mining to improve relevancy in their results. You can bet LDA has been in use for a good few years at least.
Hi Jamie - Ben noted this in his presentation as well - that LDA, at least our model for it, is likely much more simple than what Google actually uses to calculate term/phrase vector models. That said, we're hopeful that this is the start of learning more about this process and productizing suggestions about it. It remains to be seen whether folks can "improve" their LDA scores according to our models and move up in the rankings, but we should see some of those results soon.
One more great post from you, Dana - informative, interesting, and insightful. On-site optimization is always as important as off-site optimization, and this is a great tool to help give ideas on how we should optimize our pages.
Thanks for writing about it. :)
Hey Dana, I just finished reading Rand's post on LDA, and was compelled to come back here to say you did a great job recapping it. I got the exact same takeaways from his post as I did from yours. You go girl!
LDA score for this page for the term "lda score": ~85%
Score for Rand and Ben's follow-up post: ~70% :)
Any plans to bring LDA score into the SEOmoz API?
Well, the takeaway from this must surely be that long, detailed articles about your topic should help you rank better - because in the course of a 1,000-word article, you will naturally have mentioned all the words related to your subject, which in turn means the bots have a better idea of what the page is about.
The second takeaway is that spun content gets knocked down the rankings, especially badly spun stuff.
Unfortunately it's not working with other languages, such as Slovak, that contain special characters. Are you planning on fixing this?
I have been out of SEO for a good 4 years now, and this topic is rekindling my interest in all things search. Finally, I have gotten started on a few experiments based around topicality and have also incorporated tests for topical links, non-topical links, and site/page topicality. Should be good to get baseline learnings in 3 months' time. Now I am in the midst of taking up math/stat classes to get myself up to some serious analysis ;)
SEOmoz has a new member now (not that it matters), and it would be an understatement to say that I will be following this place very closely, perhaps every day.
Thanks Dana for the post.
Other than trial and error, is there some way to get this tool to suggest additional terms, or at least highlight terms that are contributing to the overall score?
Hear, Hear
I have been trial and error-ing for the last hour and not getting too far.
Rand, what I would like is: I give the tool my keyword, and it gives me back my content - then SEOmoz takes on no more PRO members.
To be realistic, a list of dos and don'ts would be good: words that are relevant, and those that may cause ambiguous content.
I can see this being a great tool for ambiguous keywords, other keywords may not benefit so much.
So, basically you are saying: use synonyms when writing stuff?
Plus related keywords and terms.
As mentioned in the example with "The Stones" and Mick Jagger.
Audax666 is correct: related keywords and terms. "Mick Jagger" is not a synonym for "The Stones" but is contextually relevant.
I may have placed too much emphasis on synonyms by bolding such words above. Bottom line: think about classic on-page optimization. You don't want to mix messages. Avoid topical ambiguity as I did (on purpose) by writing about "topic modeling" and "fashion modeling" in the same post.
We do know Google's algorithm is complex and that they do look at context.
Again, we all look forward to Rand's expansion on the application of LDA and the SEOmoz tool. We can continue that conversation there.
Thanks everyone!
Hi Dana,
Good post. My maths is not great, so the explanation that went with it helped a lot - I can just imagine people's faces when the formulas started to appear!
LDA does make a lot of sense when thinking about search, and it will possibly become more relevant as the web becomes more semantic.
I am looking forward to the follow-up post from the SEOmoz team on this, as it's something I am keen to learn more about.
For quick rankings we should add a fourth factor, which isn't new or groundbreaking information, but it deserves to be revisited: keyword relevance in the domain.
At least in the search scene in Norway (and I would assume it's not limited to Norway), I've seen a huge number of domains that are spot-on relevant to the keywords claim top rankings. Even with low LDA relevance (thanks - this was a great tool to prove my point even more strongly), hardly any inbound links, and spammy-looking content, you see these domains pop up everywhere in the rankings.
Keyword relevance in domain has always been an important factor, but I'm suspecting Google has put A LOT more weight on it.
Any thoughts? Other observations?
Correlation between LDA and Google Position
I just did a (very) quick and dirty experiment on data which is close to my heart as the owner of a maternity clothes shop: I took the top 20 Google ranked sites for 'Maternity Clothes', plugged them into to LDA tool and checked the correlation (using Excel's Pearson test).
The result was -0.45 - a middling negative correlation between rank and LDA value, meaning that overall, as Google position number goes up, LDA goes down, but position is also affected by other factors. I think this is a strong result; I didn't expect LDA alone to have as strong a correlation. More data is obviously needed (this is just one test on a small data set with only one key phrase considered), but I think that's an interesting result.
Notes:
A correlation result of +1 would indicate a perfect positive correlation, 0 no correlation, and -1 a perfect negative correlation.
Experiment was done on Google UK
There was one outlier (which I included in the correlation) with a high rank but a very low LDA score, as it has recently stopped trading and is now just a holding page; it's taking a while for it to drop down the rankings.
To get a better picture I need to compare this correlation with others, the obvious being to do a correlation between SERP and domain/page authority.
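For anyone who wants to replicate this outside Excel, Pearson's r is short enough to compute by hand. The rank/LDA pairs below are invented for illustration, not the commenter's actual "maternity clothes" data:

```python
import math

# Pearson's r, the same statistic Excel's PEARSON() computes. The rank/LDA
# pairs below are invented for illustration, not the commenter's actual data.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ranks = [1, 2, 3, 4, 5]        # Google position (1 = top)
lda   = [80, 75, 60, 62, 40]   # hypothetical LDA scores (%)
print(round(pearson(ranks, lda), 2))
```

A negative r here means LDA score falls as the position number rises, i.e. higher-ranked pages score better, matching the direction of the commenter's -0.45 result.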
Thanks for such a great post Dana!
Great post. Using Google tools like the Wonder Wheel, Sets, and the ~keyword search, you can find the words that Google sees as related or relevant.
Thanks for a great post.
For some reason people were not taking on-page optimization seriously, and suddenly, after this LDA thing, they feel the importance of on-page optimization.
I think on-page is the base of any SEO, and lately people were ignoring it by spending more time on link building. But Google and other search engines have maintained a constant importance on on-page factors.
Only after someone performs experiments on this theory can it be proven correct, and finally we can expose the Google algorithm.
I think people were taking on-page seriously - just acknowledging that link building is more important. Building links without having a well-optimised site would be like bailing water out of a boat with holes in it. Either way, you're going to expend a lot of effort and still end up sinking. Without having a stable, seaworthy craft, you're never going to get anywhere.
All LDA does is give us more of an idea of how we should optimise our pages.
Howdy Dana,
Thanks for sharing all this great stuff from the Mozinar! And thanks for giving us some 'context' on LDA. I think all of us SEOs instinctively knew this was happening, but there hasn't been much discussion, per se, about the mechanics of semantic relatedness. Offhand, I can think of a couple of interesting tools that Google might be using to achieve its 'topic model'.
The first is knowledge-based information retrieval, in which databases of information such as Wikipedia are mined for related terms and frequency of occurrences.
The second is entity tags. These would certainly reduce the margin of error when attempting to automatically calculate numerous uses for terms in an almost infinite combination of words and sentences.
Do you think Google might be using these to help their algorithm understand the nuances of human languages?
I heard about Latent Semantic Indexing (LSI) a couple of years ago, but you've done the best job at explaining it. Great post!
I just used the LDA tool to match the term "website design" with a page that is about website design but in a different language (I did not post the link to the page because some might consider it self-promotion). The relevance was less than 1%. The same page with the same keyword, but in the language the page was created in, got a score of 99.981%.
Does that mean that search engines cannot recognize content relevance to a topic if it is in another language? Is the same content in two different languages categorized by search engines as two different topics? SEOs, what's the verdict on this?