Xan has a good post today exploring the world of TDT (Topic Detection and Tracking) research. We've been trying to build something quick and dirty that automatically classifies a web page into a hierarchical category structure, based on the key terms and phrases we've found to be common in each topic area. So far we've had no success, but maybe some of Xan's links can help.
Does anyone know of a good topic detection tool other than Google's simplistic site-flavored tool?
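For anyone curious what the quick-and-dirty approach might look like, here is a minimal sketch. It assumes a hand-built category tree in which each node lists its own key terms and phrases. The tree, its terms, and the functions `count_terms` and `classify` are hypothetical illustrations, not the system described above. The classifier counts term matches and greedily descends into the best-scoring child at each level.

```python
import re

# Hypothetical category tree: each node has its own key terms and a dict of children.
CATEGORY_TREE = {
    "Sports": {
        "terms": {"game", "team", "score", "season"},
        "children": {
            "Baseball": {"terms": {"inning", "pitcher", "home run"}, "children": {}},
            "Soccer": {"terms": {"goal", "striker", "penalty"}, "children": {}},
        },
    },
    "Technology": {
        "terms": {"software", "computer", "internet"},
        "children": {
            "Search": {"terms": {"search engine", "ranking", "index"}, "children": {}},
        },
    },
}


def count_terms(text, terms):
    """Return the total number of times any of the key terms or phrases occurs in text."""
    lowered = text.lower()
    total = 0
    for term in terms:
        # Word boundaries stop "goal" from matching inside "goalie".
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        total += len(re.findall(pattern, lowered))
    return total


def classify(text, tree, min_hits=1):
    """Walk down the tree, choosing the best-scoring child at each level.

    Stops when no child at the current level reaches min_hits matches and
    returns the category path chosen so far (possibly empty).
    """
    path = []
    level = tree
    while level:
        scores = {name: count_terms(text, node["terms"]) for name, node in level.items()}
        best = max(scores, key=scores.get)  # ties go to the first child listed
        if scores[best] < min_hits:
            break
        path.append(best)
        level = level[best]["children"]
    return path


page = "The pitcher threw a shutout as the team clinched the season in the ninth inning."
print(classify(page, CATEGORY_TREE))  # ['Sports', 'Baseball']
```

A greedy descent like this is cheap, but a wrong choice at the top level can't be undone later. Scoring whole root-to-leaf paths would be one way around that.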
Heh, heh - Actually, we tried using it as a system to pull data, but didn't have great luck. It is a nifty system though; I'd love it if someone would release the source code so we could tweak it a bit.
I've used the Yahoo! tool before, but only as a guide to which terms have been picked up by Yahoo! More information on the technology they use to dissect text would be useful and would give more insight into how Yahoo! categorises sites.
I guess, Rand, you were looking for something that takes the textual information from a web site and then runs Yahoo!'s tool against it?
Wow, I've never seen that, but it seems that it should be very helpful to your project.
Rand,
You might want to take a look at Yahoo's term extraction API (Y!Q):
https://developer.yahoo.net/search/content/V1/termExtraction.html
Then run the term list against documents that have been categorized (tagged/whatever - for example del.icio.us). Might give you what you are looking for.
- Michael
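A minimal sketch of Michael's suggestion, assuming the key-term lists have already been extracted for the page and for a set of already-tagged documents. No Yahoo! API call is made here, and the sample data is made up. The hypothetical `build_tag_profiles` pools term counts per tag, and `rank_tags` scores each tag by the share of its pooled term counts that the page's terms cover. This is just one simple scoring choice; cosine similarity over term vectors would be a common alternative.

```python
from collections import Counter, defaultdict


def build_tag_profiles(tagged_docs):
    """Aggregate term counts per tag from already-tagged documents.

    tagged_docs: iterable of (terms, tags) pairs, where terms is the list of key
    terms extracted from a document and tags are its human-assigned labels.
    """
    profiles = defaultdict(Counter)
    for terms, tags in tagged_docs:
        for tag in tags:
            profiles[tag].update(terms)
    return profiles


def rank_tags(page_terms, profiles, top_n=3):
    """Score each tag by the share of its term counts covered by the page's terms."""
    page_set = {t.lower() for t in page_terms}
    scores = {}
    for tag, counts in profiles.items():
        total = sum(counts.values())
        covered = sum(c for term, c in counts.items() if term.lower() in page_set)
        scores[tag] = covered / total if total else 0.0
    # Highest-scoring tags first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, score in ranked[:top_n] if score > 0]


# Made-up sample: (extracted terms, tags) for documents that are already tagged.
tagged_docs = [
    (["search engine", "ranking", "link"], ["seo"]),
    (["ranking", "keyword", "link"], ["seo", "marketing"]),
    (["recipe", "oven", "flour"], ["cooking"]),
]
profiles = build_tag_profiles(tagged_docs)
print(rank_tags(["ranking", "link", "search engine"], profiles))  # ['seo', 'marketing']
```

The quality of the result depends almost entirely on how many tagged documents feed each profile; with only a handful per tag, the scores will be noisy.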
DJ - Nope. We definitely don't need another KW density tool. That metric carries no value whatsoever for SEO.
Rand,
I don't know of any other tools, but I have some ideas about how it could be achieved. Feel free to contact me if you're interested in talking about it.
Not exactly what you guys are looking for, but Spannerworks has an interesting tool found here: https://www.spannerworks.com/keywordanalyser.0.html
It does a pretty good job of automatically picking up phrases that are potentially the topic of the web page.