Inverse Document Frequency and the Importance of Uniqueness

Comments 26

Eric Enge

2015-05-15T04:08:50-07:00

Hi Slavo,

Thanks for your question. What I was trying to say is that it's not just about slapping the words on the page; you also need to address the user's needs directly related to those words.

An example might help. If you have a page about lamps, and you decided that "ornate lead glass lamps" was a rare term, you could add it to the page, but you should only do so if you sell those types of lamps.

Let me know if that makes sense.

- Slavko Desik
 
 2015-05-17T03:55:46-07:00
 
 Thanks for the reply, Eric. Totally agree. My point was this: instead of just adding more long-tail phrases (option a, which most people use but which brings no results) or creating content that specifically targets those phrases (option b), a better option would be somewhere in between: expand the content to include these rare terms, but do it in more depth (adding a whole additional chapter). And, of course, only if it makes sense for user intent.
 
 A good example would be adding "local SEO in 2015" to a guide about SEO: not just throwing the phrase into the copy, but expanding it into a whole separate chapter. Or is it wiser to target this phrase with a different piece of content altogether?
 
Slavko Desik

2015-05-14T04:54:10-07:00

Hi Eric, great concept, but allow me to understand it better by asking a couple of dumb questions.

As you said in reply to some of the comments, this is about more than just adding long-tail keywords to the page. What would be the better alternative: adding some additional chapters to the content, or creating content that takes a new angle altogether (new title and all)?

I use the first approach myself whenever I write product reviews, adding extra chapters that I think will be valuable to the user.

Umar Khan

2015-05-13T03:28:41-07:00

Thanks, Eric, for writing the IDF part of this scientific series.

I just wanted to know: does the TF-IDF model also apply to the Latent Semantic Indexing (LSI) used by search engines? As I understand it, LSI tries to overcome the problems of lexical matching by using statistically derived conceptual indices instead of individual words for retrieval.

- Eric Enge
 
 2015-05-13T05:06:59-07:00
 
 Hi Umar - I don't believe it does. LSI is something that came in later.
 
Oleg Korneitchouk

2015-05-13T08:45:44-07:00

Correct me if I'm wrong here: IDF is a measure of the uniqueness of a term/phrase based on all indexed instances of that term/phrase. The more unique it is, the more valuable it is to have on the page. Would that help the page rank for all related terms, or simply get the page indexed?

OlegKorneitchouk edited 2015-05-13T08:46:17-07:00
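
For reference, the textbook information-retrieval form of the definition Oleg paraphrases is below (standard notation; it is not necessarily the exact variant any search engine uses). One detail worth noting: df counts the documents that contain the term at least once, not every individual occurrence. With N documents in the collection, df(t) of them containing term t, and tf(t, d) the number of times t occurs in document d:

```latex
\mathrm{idf}(t) = \log_{10}\frac{N}{\mathrm{df}(t)}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

The log base is a convention (base 10 here). For example, a term found in 10 of 1,000,000 documents gets idf = log10(100,000) = 5, while a term found in every document gets idf = log10(1) = 0, so ubiquitous words contribute nothing to the score.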
- Eric Enge
 
 2015-05-13T16:24:32-07:00
 
 Hi Oleg - it can help you rank for related terms. However, it's important that you tie this to some real value that you add, as opposed to simply loading rare words onto a page.
 
 - Ruslan Vasylev
 
 2015-10-26T12:32:37-07:00
 
 Hi Eric, nice article! What is the right way to tie content to chosen terms? What do the engines expect? For example, if in one sentence I list the terms "nike" and "adidas", and in another sentence I use the term "these brands", will search engines be able to map "these brands" to the aforementioned brands?
 
NickHolland

2015-05-13T08:17:00-07:00

Thank you very much, Eric! This basically explains everything we did wrong at my previous company (when I was a novice in online marketing), which had been taught to us by a large consulting company. I will forward this post to my former co-workers.

- Eric Enge
 
 2015-05-13T16:24:53-07:00
 
 Glad it was helpful!
 
Tino Fernandez

2015-05-13T08:14:31-07:00

Hello Eric!

Good article.

If I understand correctly, IDF measures the importance of a keyword based on how frequently it occurs: Google examines all existing documents and calculates it from the average number of occurrences of the keyword.

This matters when Google indexes your page, but other things also matter when it comes to ranking well.

It is not always easy to find a keyword that isn't already repeated across Google's results. Perhaps searching for synonyms would help.

Thanks for the information.

marianduanet

2015-05-15T15:59:58-07:00

Thanks for sharing this, Eric!

I was wondering: how does IDF differ from long-tail keywords?

Toby Bateson

2015-06-23T09:52:56-07:00

Very interesting slant on a popular topic. You have explained the science behind not just being unique but also being found.

Ishan Mathur

2015-05-19T05:45:25-07:00

Frankly, it's too technical for basic-level guys like me :)

Salman Sharif

2015-05-18T21:20:46-07:00

That's an amazing way to calculate the uniqueness of content, but calculating the IDF of a whole article would take something complex. Is there any way or tool that can help us calculate IDF on our own? It would take months to do it manually. By the way, your scientific series is always awesome, Eric :)
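
One way to experiment with this on your own is a small script run over a corpus you collect yourself, for instance the text of the competing pages on your topic. The sketch below is an illustration under stated assumptions, not a Moz or Google tool: it uses a simple regex tokenizer, natural-log idf = ln(N / df), and raw term counts for tf. The function names `tokenize`, `idf_table` and `tf_idf` and the three-document corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def idf_table(documents: list[str]) -> dict[str, float]:
    """Return idf(t) = ln(N / df(t)) for every term that appears in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # A set, so each term is counted at most once per document (df, not raw count).
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(document: str, idf: dict[str, float]) -> dict[str, float]:
    """Score each term as (raw count in document) * idf; terms unseen in the corpus are skipped."""
    counts = Counter(tokenize(document))
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


# Hypothetical toy corpus; a real estimate needs many more documents.
corpus = [
    "super bowl 2015 tickets and schedule",
    "super bowl history and records",
    "ornate lead glass lamps for the living room",
]
idf = idf_table(corpus)
scores = tf_idf(corpus[2], idf)
top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(top[:3])
```

Terms that occur in fewer documents ("lamps", found in one of three) score higher than shared ones ("super", found in two). With only three documents, even words like "the" look rare, which is why realistic estimates need a much larger corpus, or at least a stopword list.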

Shubham Tiwari

2015-05-13T21:55:54-07:00

Hello Eric,

Great post, short and informative. IDF does not help us rank well by itself, but it could help us get found by our audience, which is what everyone is looking for.

I like the image at the top; it captures the definition of uniqueness ;)

Thanks

Procore Tech

2015-05-13T11:33:33-07:00

Hey Eric,

Thanks for the post. Correct me if I'm wrong, but are you basically showing the similarities between inverse document frequency (IDF) and long-tail keywords, i.e. how you can statistically break down what you need to target in granular form?

I know this is more of a broad outlook on the post, but I wanted to make sure I understand the bigger picture.

Cheers,

- Eric Enge
 
 2015-05-13T16:20:57-07:00
 
 Hi Justin - One of the main things I was trying to get at is that publishing the same old stuff that everyone else does, or simply copying successful people, is not really a good strategy. You need to bring something new and unique to the table.
 
 However, as I will say to JibbedSEO in a moment (in response to his comment below), the goal is not to throw random rare keywords on your page (see below for the rest).
 
 - Procore Tech
 
 2015-05-14T08:58:39-07:00
 
 Gotcha. Back in the day I used to piggyback off what others were doing, but now I try to bring new creativity to the table. Thanks for your clarification (it lets me know I'm on the right path).
 
 Also, I like how you break down keyword search in a statistical manner; like you said, the goal is "not to throw random rare keywords on your page". Buggers will stick every once in a while, but your breakdown is built for a better overall foundation.
 
 Thank you for getting back to me.
 
 Cheers,
 
CommercePundit

2015-05-13T23:12:04-07:00

Before I say anything: your first image makes me eager to read the full article. It's a really catchy image, and it says everything about the article. I think if we provide the same to customers, business will also grow like this.

JibbedSEO

2015-05-13T11:50:13-07:00

I hope everyone takes this post as (I think) it was intended: an interesting look at how Google understands and can sort content on the web. I don't think the author was suggesting adding rare keywords found through obsessive Googling.

JibbedSEO edited 2015-05-13T11:53:12-07:00
- Eric Enge
 
 2015-05-13T16:23:22-07:00
 
 I do think I am suggesting a bit more than this. I think that IDF teaches us that it's really critical to bring something unique to the table. If you are the 2,137th person trying to rank on some major term, well, good luck! Differentiation is essential. What is it that you do that's unique?
 
 I completely agree though, that this is not meant to spawn some keyword spamming exercise!
 
nealeg

2015-05-13T20:56:26-07:00

Eric, your post and examples are nice for delivering a very basic understanding of the subject. Unfortunately, as I see it, there are two major flaws in the examples. A) Only a very small percentage of users search with quotes. For those who do, your example is somewhat correct if the string is unique, i.e. there is only one result. But if, as is the case most of the time, a user searches for "pink monkeys in fort lauderdale" without quotes, then even if there is only one instance of the phrase "pink monkeys in fort lauderdale" on the web, Google is more likely to return an authoritative page about monkeys in Fort Lauderdale that sit on pink chairs. B) This is really just part of A: as I understand it, Google and other search engines put far more weight on individual words than on phrases, and at the single-word level nothing will return just a few results.

- Ash Nallawalla
 
 2015-05-14T06:42:05-07:00
 
 nealeg, I don't see how searcher behaviour has anything to do with TF and IDF, which would be constants even if everyone on the planet were wiped out. The real challenge is how to use this knowledge in a practical manner, i.e. compute the TF-IDF for every term on a page. For that you will need the total count of pages in Google (about 30 trillion) and a total count of the term in the SERPs. That could be thousands of queries to Google per study set. If your IP doesn't get blocked, you will get very good numbers. Alternatively, you can get somewhat useful numbers by comparing each term against the total number of words on the given web page.
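
 Ash's recipe can be sketched in a few lines. The helpers `term_frequency` and `idf_from_counts` below are hypothetical, and they assume you already have the two counts he describes: an estimate of the total number of indexed pages (his ~30 trillion figure) and a result count for the term or phrase, such as the 6.78M pages Eric cites for "super bowl 2015" further down the thread. Search-engine result counts are rough estimates, so the output is only an approximation. His fallback (comparing each term against the page's total word count) is the normalized term frequency.

```python
import math
import re
from collections import Counter


def term_frequency(page_text: str) -> dict[str, float]:
    """Normalized TF: each word's count divided by the total number of words on the page."""
    words = re.findall(r"[a-z0-9']+", page_text.lower())
    if not words:
        return {}
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def idf_from_counts(total_pages: float, pages_with_term: float) -> float:
    """idf = log10(total_pages / pages_with_term), using externally estimated counts."""
    if total_pages <= 0 or pages_with_term <= 0:
        raise ValueError("both counts must be positive")
    return math.log10(total_pages / pages_with_term)


# Figures quoted in this thread; both are estimates, not exact index sizes.
TOTAL_PAGES = 30e12    # Ash's ~30 trillion indexed pages
PHRASE_PAGES = 6.78e6  # Eric's result count for "super bowl 2015"

phrase_idf = idf_from_counts(TOTAL_PAGES, PHRASE_PAGES)
tf = term_frequency("Super Bowl 2015 recap: the Super Bowl 2015 halftime show")
print(round(phrase_idf, 2))
# To score a single word, multiply tf[word] by idf_from_counts(TOTAL_PAGES, <that word's page count>).
```

 Keeping the count lookup outside the functions means the same math works whether the counts come from search-result estimates or from a corpus you index yourself.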
 
- Eric Enge
 
 2015-05-15T04:15:31-07:00
 
 Hi Nealeg - the use of quotes or not is irrelevant to the use of TF/IDF. I simply used quotes to have Google help me find pages with the exact phrases. For example, 6.78M pages appear to have the exact phrase "super bowl 2015" (without the quotes) on them.
 
 Of course most users search without quotes, but the point is that Google places a lot more weight on exact phrases on pages. So if a user searches for "super bowl 2015" (without the quotes), Google will weight a page with that exact phrase more than a page about the 2014 Super Bowl that happened to be written on January 15, 2015, and so has "2015" in the publication date shown on the page.
 
Vitaliy_D

2016-11-07T23:55:13-08:00

Thank U Eric 4 great post!
