I admit it - I struggle to understand patent applications (one of the big reasons that Si is part of our staff). However, Bill Slawski doesn't and it's made our collective lives in the SEO world (and the mozplex) considerably easier. Take, for example, his two incredibly fascinating posts collecting patent applications and speculating on how the engines might re-rank the results:
- 20 Ways Search Engines May Rerank Search Results
- 20 More Ways Search Engines May Rerank Search Results
Bill inspired me tonight to give my own wild and crazy speculation on how search engines might use data of all different kinds to help order the search results. Here's what I came up with:
- URL / Brand Name Mentions
As the engines find mentions of URLs and brand names in the content of websites, they might actually use that data almost like hyperlinks - considering it as a reference even when the link isn't there. Since many of the best news sites and virtually all of the research papers in PDF provide no link love, this might be a good signal for the engines to consider.
- URL / Brand Searches
If a domain or brand name is receiving some volume of searches, the engines could use both frequency and temporal demand to re-rank results. They might even consider associated terms - like if lots of folks are searching for "Tasered University Student Video," the engines might start to rank that intended result highly even for more general searches like "taser video."
- Link Traffic
I see no reason why, with the number of monitoring applications (toolbars, desktop search, analytics, etc.) that engines like Google, Yahoo! and (perhaps especially) Microsoft have available, they couldn't use the number of clicks on a link or the amount of traffic driven to a page from another to help rank the value of links.
- External Registration Data
Book publishers officially announce book release dates, movie studios set opening days, sports teams file for trades and games, and businesses file tax returns, outlook prospecti, quarterly earnings reports and more. All of this "registration" or "release" type data is public knowledge (at least, a great deal is) and search engines could conceptually use information like this to help to predict demand and identify the official site/pages associated with the material. It might even be voluntary - if you're a business releasing a product/service/etc, one day you might be able to register with the engines the same way you register a local business with them now.
- URL Mentions Offline
I know, this seems like it would be hard to track, but in reality, nearly every major TV program, radio show or broadcast message has a script (some before release, others after) that the engines could conceptually scour for URL mentions. Again, this might help to identify demand before it pops up, and to help serve up the right results, i.e. What domain name did Anthony Bourdain mention on his show for Stockholm travel info?
- Email Content
Since all of the major engines now have fairly popular email services, watching for URLs and domains in the body of email copy, and using it either in personalized results or general web search might be beneficial.
- Social Media Activity
Services like StumbleUpon, Reddit, Newsvine, Digg, Del.icio.us, Google Share, etc. all let you pass content around or vote on it. One day, search engines might try to interpret all the activity and find the signal buried inside the noise - start thumbing up those pages, people :)
- Site Owner Credibility
No good bum? Superstar Philanthropist? Maybe the next time you register a domain, the engines will take notice. This might be a really smart idea for things like new product launches - ID'ing a company like Disney buying a domain and making sure it stays out of the sandbox. Or, conversely, finding a domain registered by Dave and keeping it in :)
- Link Sting Operations
If spam ever gets really low, and search quality teams need something to do with their time, they might consider making honeypots of paid link sources to help ID potential link manipulators. Just think, one day you might be trying to close a transaction to help rank your Texas Hold 'em / Lingerie / Printer Cartridge website, the next, all your domains have the "you fell for it" penalty slapped on.
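To make the first idea in the list (URL / Brand Name Mentions) a little more concrete, here is a minimal sketch of counting unlinked domain mentions as if they were references. Everything in it is an assumption for illustration: the sample pages, the `DOMAIN_RE` pattern, and the `unlinked_mentions` helper are hypothetical, and nothing here describes how any engine actually works.

```python
import re
from collections import Counter

# Hypothetical sample pages: (page text, set of domains the page actually links to).
PAGES = [
    ("Great tips at example.com and also on seomoz.org today.", {"example.com"}),
    ("We cite seomoz.org in this paper without linking.", set()),
]

# Rough pattern for bare domain names; an assumption, real URL detection is messier.
DOMAIN_RE = re.compile(r"\b([a-z0-9-]+\.(?:com|org|net))\b", re.IGNORECASE)

def unlinked_mentions(pages):
    """Count, per domain, the pages that mention it without linking to it."""
    counts = Counter()
    for text, linked_domains in pages:
        # A set means each page counts at most once per domain, like a single reference.
        mentioned = {m.lower() for m in DOMAIN_RE.findall(text)}
        for domain in mentioned - linked_domains:
            counts[domain] += 1
    return counts

mention_counts = unlinked_mentions(PAGES)
print(mention_counts)
```

In this toy data, seomoz.org is mentioned on two pages that never link to it, so it picks up two "implied" references, while example.com gets none because its only mention comes with a real link.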
Any nutty ideas you've got for factors that search engines might consider using?
Thanks, Rand.
Some thoughts on the factors that you've written about:
URL / Brand Name Mentions
Under Google's approach, mentions of businesses, along with any geographic location, seem to be used in Local Search as a way of verifying that a business is located at a specific address, and may boost the ranking of that business for a specific term in local search. These can also be used to provide "contact information" with a search caption (title, snippet, URL) to make it easier to find that information for a business in Web search results.
There have also been a number of papers about "named entities" and fact gathering around mentions for purposes of Question Answering that comes close to what you are describing.
URL / Brand Searches
Under a "stream of data" approach, more popular and bursty topics and page selections may cause some reranking to happen. A query expansion (taser video) like the one you suggest doesn't sound unreasonable at all.
Link Traffic
Not just clicks or traffic (and we're talking about browsing activity here, and not just selections from the search engines), but also bookmarking activity, annotations, distance scrolled down a page, amount of time spent on a page, and so on, may influence rankings.
External Registration Data
There were a couple of fairly recent patent applications that I didn't include. One was on understanding events related to queries and dates. For instance, a search for "World Series" shows a few pages touting ticket sales for the 2007 world series. Registration data could be one signal in helping search engines understand queries that might relate to events, especially for terms where the events might change in nature, like a search for world series.
URL Mentions Offline
Though I really haven't seen a lot of discussion about it, check out Yahoo's Y!Q pages, which provide a way of tying print and TV citations and keywords to listings and rankings at Yahoo. With advertising in print, TV, and radio a potential target for search advertising, awareness of mentions like this seems to be a direction to head in.
Email Content
I think one of the Google spokespeople recently stated that they aren't using URLs in email (gmail) for rankings in search engines. But, the patent application that I wrote about on Distributed Search Results through blogs, email, and IM mentioned using those citations in an aggregated manner as a possibility.
Social Media Activity
The Ask.com patent application on using web traffic to rerank results mentioned viewing results and selections at other search engines as a potential part of that process. I wouldn't imagine that looking at activity at sites like Digg, StumbleUpon, and others would be more of a challenge than that.
Site Owner Credibility
Google's Agent Rank and treatment of product reviews, and Microsoft's Object Rank might be looking at reputation issues involving some authors. While reputation and credibility aren't necessarily the same thing, a high reputation ranking might be perceived as an indication of trustworthiness and expertise, which is how I define credibility.
Link Sting Operations
Reminds me a little of a Microsoft study that "followed the money" between advertisers and spammers to get a lot of statistical information about web spam - Spam Double-Funnel: Connecting Web Spammers with Advertisers (pdf).
Cheers.
Ha ha! There's nothing new under the sun - it's good to know that the search engines are already thinking in these directions. Great work, as always, Bill :)
Thanks. It's funny, Rand. Sometimes the things that you think would be obvious that the search engines might be doing are things that they really haven't been. Or haven't been for too long.
For instance, papers describing the use of query sessions to gather user data information, instead of just individual searches, didn't start appearing until 2005. See: Query Chains: Learning to Rank from Implicit Feedback (pdf) for example.
Claritas.com has divided customers into 66 marketing segments based upon where they live. To do this communities are placed into these segments based upon the characteristics of their residents. They have given these segments cute names (some hilarious) such as.....
Young Digerati
Bohemian Mix
American Dreams
Urban Achievers
Shotguns and Pickups
Upper Crust
Empty Nesters
Kids and Cul-de-Sacs
Mayberry-ville
Red, White and Blues
..... now, Claritas has done this geographically (I wonder how much this is being merged with website data?).... However, we might guess that some mixing of peoples might occur within a geography. A search engine can get enough data about us to divide us into *mental communities* - regardless of where we live. They can then use that data to produce refined and homogeneous categories - which can then be used to color the SERPs that we receive.
They can do this via the same technology used for personalized search... but a bit in reverse. Blogs, websites, feeds, videos - even advertisements - would be categorized. Then individuals on the web are assigned into Mental Communities based upon factors such as: 1) the amount of time we spend on various categories of websites, 2) the devices that we use to connect to the web, 3) the times of day that we are active, 4) the geographic locations where we log in, 5) many other factors. They then place us in that N-dimensional space which, for the web might cluster like these....
Laptop Poker
Blackberry Republican
Lunch Hour Shopping Spree
Golfcourses and Airports
Mobile Debutante
Strippers at Midnight
New York Wannabe
Bilingual Blogger
A smart search engine could make good guesses at these - some people would daypart into multiple categories. It might take less data and processing power to assign people by category than it does to personalize.
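As a rough illustration of assigning people by category instead of fully personalizing, here is a minimal nearest-profile sketch. The segment names are borrowed from the list above, but the category shares, the `SEGMENTS` table, and the `assign_segment` helper are invented for the example; a real system would presumably use many more dimensions (devices, times of day, locations) and learn the segments from data.

```python
import math

# Hypothetical segment profiles: share of browsing time per site category.
# The numbers are made up purely for illustration.
SEGMENTS = {
    "Laptop Poker": {"gambling": 0.7, "news": 0.2, "shopping": 0.1},
    "Lunch Hour Shopping Spree": {"gambling": 0.0, "news": 0.2, "shopping": 0.8},
}

def distance(a, b):
    """Euclidean distance between two sparse category-share profiles."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def assign_segment(user_profile, segments=SEGMENTS):
    """Pick the segment whose profile is closest to the user's profile."""
    return min(segments, key=lambda name: distance(user_profile, segments[name]))

# A hypothetical user who spends most of their time on shopping sites.
user = {"gambling": 0.1, "news": 0.1, "shopping": 0.8}
segment = assign_segment(user)
print(segment)
```

The point of the sketch is the cost argument: once the segment profiles exist, placing a person is a handful of distance calculations, which is far cheaper than maintaining a separate ranking model per user.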
I love these categorisations. I think Microsoft are doing a lot in this area.
I apologize if I'm repeating anyone, but I think reach and appeal within a single audience may matter. If you write a blog about cars, and you get 40,000 visits a week from people coming from sites that are related in some way, and then those people spend lots of time on the site, that might indicate a more relevant site. Since relevance is the product they're selling, any additional measure of relevance is fair game.
I also suggest that buying drinks for Matt Cutts may have a strong effect :)
Now that Google owns Feedburner, they can tap the originality of your posts, how often and how promptly you post on hot topics, the number of feed subscribers, item uses, and email subscribers that click through.
Google bought feedburner because it is a data goldmine.
Google probably already uses "link traffic" coming from monitoring applications.
Another idea could be to evaluate adjectives and emotive verbs used in anchor texts. A link that says "this handy little tool" or "cos its great" on a given page should not count the same as a "this crap" or "a service I hated" link text.
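A toy version of that anchor text idea might look like the sketch below. The `POSITIVE` and `NEGATIVE` word lists, the `anchor_weight` function, and the 0.5 step per word are all assumptions made up for illustration, not a real sentiment lexicon or anything an engine is known to use.

```python
# Tiny hand-made word lists; purely illustrative assumptions.
POSITIVE = {"handy", "great", "awesome", "useful"}
NEGATIVE = {"crap", "hated", "awful", "useless"}

def anchor_weight(anchor_text):
    """Return a hypothetical link weight: 1.0 is neutral, above is praise, below is criticism."""
    # Lowercase each word and strip common punctuation before matching.
    words = {w.strip(".,!?\"'").lower() for w in anchor_text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    # Clamp to [0.0, 2.0] so a single link cannot dominate.
    return max(0.0, min(2.0, 1.0 + 0.5 * score))

for text in ["this handy little tool", "a service I hated", "click here"]:
    print(text, "->", anchor_weight(text))
```

With these word lists, "this handy little tool" is weighted above a neutral "click here" link and "a service I hated" below it. Simple word matching like this misses negation and sarcasm, which is a large part of why the real problem is hard.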
Good thinking - semantic information would be a goldmine.
It's hard though. I hope that if the engines make steps towards cracking it, they release some kind of API that we could use for other applications. It'd be awesome to get to tap into some of that brainpower.
Will, I agree that some kind of API would be great, but I also wonder how likely, or at least how much, of that information will be shared in this way.
After all, the collection and, to some extent, control of that information is very much at the heart of the engine's power.
cheers
You're right. I think I'm hoping in vain...
Though it's the kind of thing that could come out of adlabs.
As my first comment, in participating on this site, I want to thank you for this post.
Both the links and your take on things. I really appreciate the different perspectives, and it has already given me a lot of food for thought (and ammunition) when explaining things to my clients.
It's always nice to have a response that isn't just ignorant, "It's Google. Big Brother. They have proprietary rights. Don't talk too loud or they'll hear us!"
As I've gotten more into performing SEO day-to-day, this site and posts like this become more and more valuable.
This isn't really all that far from what is in place already, at least at Google, with the new paid link reporting and more generally, Google's Human Review.
The idea of mentions though I think also makes sense since it really addresses the greater popularity/visibility factors that extend beyond links.
It would help if I read the original post thoroughly :)
Expanding on the email idea, I always wonder about email newsletters. This is a form of marketing that goes out to millions of people daily, but (unless it's captured in an online version for better viewing) is virtually invisible to the search engines.
Also - with the widespread use of Google Analytics (and soon Microsoft's tool?), the engines could use things like bounce rate and time on site when specifically associated with different keyphrases (i.e. how relevant is a site for different searches).
Or do they already do this?
Also - link sting operations, lol that'd be cool. not.
Some days, it seems like Google ranks me well if my hair is looking great. No, really!
I absolutely agree with #3. If you Google my business name, my homepage comes up first, naturally, but second (those indented results) is a page in my store that doesn't have the next most links (that would be my blog), but the page that receives the most hits out of my entire site, month after month. (It's a page of free educational materials in the form of PDFs).
Since I use Google Analytics, Google would definitely know that that's my most popular page, and must be ranking it highly on that basis instead of just looking at incoming links.
No. 2 seems a bit weird to me... I would think that's a "bad relation"?
Regarding site owner credibility, it wouldn't surprise me if anonymous whois services might have a minimal negative impact on your ranking -
If you are really authoritative/trustworthy, why are you hiding your contact information?
Well, very often idiots like to use your information to play stupid pranks - for example, see https://headrush.typepad.com/whathappened.html (Kathy Sierra). Something so small escalated, with people posting her Social Security numbers, address, etc. online. Regardless of how well respected your words are, there are still going to be crazy people out there - I wouldn't want people to have access to my data, even if I had a top-notch authority site...
I agree with that statement. I think there's enough to worry about doing "right" with the search engines that this shouldn't really play a major role.
So many people acquire domains through 3rd parties acting on their behalf anyway.
Trustworthy or not, I think many people choose to anonymize their registration info because of privacy and/or spam concerns.
On-site activity is another one I thought of - not in the analytics sense - in the commenting / posting sense. Many of my comments on SEOmoz would have got you a link if the community wasn't so much fun - I might still have written a take on it, but I would have done so on my own blog. Where you have really active communities, I think there is an argument that says they should rank well for certain kinds of search.
In particular, the dating searches that Matt was talking about - an awesome dating website with great linkbait but no users (imagine a hollow mingle2) shouldn't rank well, whereas a site that is entirely behind registration pages but is free and massively well-used should rank well (but wouldn't, at the moment).
I have no idea how you implement that though. I'm just doing the ideas!
Rand - I agree 100% with the first two of the above-mentioned factors. The other ones are good speculation though.
Take it more deeply... For example, search for "myspace layouts" in Google. You will be offered relevant popular terms at the bottom of the page, e.g. premade myspace layouts, cute myspace layouts, myspace sports layouts, emo myspace layouts, myspace backgrounds, pimp myspace, myspace layout generator, whateverlife.com
Look at the last term, "whateverlife.com". This is just a URL. Since this is a very popular site about myspace layouts and the like, many people are likely to google it, either by mistakenly typing it into the Google Toolbar search box or on the normal Google search page. The engine takes this as a good signal of relevancy and popularity at the same time. Kill two birds with one stone! It considers the site in question to be POPULAR and RELEVANT to 'myspace layouts', so it ranks it higher accordingly!
This is just a speculation but makes good sense here.