Over the past few months, Eytan Seidman, product manager for Live Search at Microsoft, and I have traded a few emails on the subject of an interview. At long last, we've got a visible product to show off :-) In this interview, I asked Eytan about the recent changes and updates to Live, his involvement with the search space and much more. Enjoy!

Eytan, first off, thanks a ton for agreeing to participate - it's terrific to have you on the blog. Can you give us a little bit of background about yourself before you came to Microsoft, how you found your way to the Live search team and what your current position is?

I came to Microsoft straight out of college. I interned at MS in the summer of 2000, loved it and came back right after I graduated in 2001. Since then I have basically worked on two products. From 2001 to 2003 I worked on Microsoft CRM, an application that helps small and medium-sized businesses manage their customers. Since 2003 I have been on search and worked on a variety of things. Initially I helped with the design and architecture of our crawl infrastructure, which is what led to my being involved with SES and Webmasterworld. For the past year and a half I have been leading a team of program managers that drives two major efforts: relevance and relevance measurement. The first is everything we do at run-time to find the best results; the second is how we measure the relevance of our engine.

You're one of the few folks from MSN/Live search who speaks publicly about the service. Will you continue to be a/the public face for Live or should we expect a bigger team in the near future representing Live to the webmaster and search industry worlds? Who might be a part of that team?

Besides me, a number of folks have been the face of Live Search over the years. One of them is my manager, Ramez Naam, who has spoken at SES and Webmasterworld on a number of occasions. I won’t give any specifics or set expectations about folks from the team becoming more involved, but it is something we think about.

Live recently upgraded many of its services, including core search relevancy and index size. I had some specific questions about that shift, the first of which is - how do you measure search relevancy for humans? Who tells you "hey, the users like this better and they feel their results are more satisfactory?" Do you find that satisfaction has gone up for all types of searches, or only for more "common" or "popular" queries?

This is a great question, Rand. We have a number of different mechanisms to measure search quality. While I unfortunately cannot get into the specifics, we do look at multiple things, including “blind” tests as well as data on how our users are engaging with our live site. In terms of satisfaction, the improvement is across the board. One of the things we highlighted at our recent Searchification event is some of the neat work we did around “query intent”. That is, we are doing a ton of work to better understand what people really mean when they enter a query. An example of this on our blog is the query [nw coed soccer], where we are able to understand that in this particular context [nw] means [northwest]. I mention it because this is clearly not a common query, but it is the type of thing you will see a lot of improvement on as you use the engine more.
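Eytan does not say how Live infers query intent, so what follows is only a toy illustration of the [nw] to [northwest] idea, not Live's approach: a hypothetical abbreviation table in which each candidate expansion carries a few cue words, and the expansion whose cues best overlap the rest of the query wins. The table, its cue words, and `expand_query` are all assumptions made up for this sketch.

```python
# Hypothetical abbreviation table: each expansion lists cue words that
# suggest it is the intended meaning. Not Live Search data.
EXPANSIONS = {
    "nw": {
        "northwest": {"soccer", "coed", "league", "weather", "hiking"},
        "network": {"router", "config", "cable", "admin"},
    },
}


def expand_query(query, expansions=EXPANSIONS):
    """Rewrite abbreviations, using the other query terms as context."""
    terms = query.lower().split()
    rewritten = []
    for i, term in enumerate(terms):
        candidates = expansions.get(term)
        if not candidates:
            rewritten.append(term)
            continue
        context = set(terms[:i] + terms[i + 1:])
        # Pick the expansion whose cue words overlap the context the most;
        # with no overlap at all, keep the abbreviation unchanged.
        best, best_overlap = term, 0
        for expansion, cues in candidates.items():
            overlap = len(cues & context)
            if overlap > best_overlap:
                best, best_overlap = expansion, overlap
        rewritten.append(best)
    return " ".join(rewritten)


if __name__ == "__main__":
    print(expand_query("nw coed soccer"))    # northwest coed soccer
    print(expand_query("nw router config"))  # network router config
```

A real engine would more likely learn such associations from query logs and add the expansion alongside the original term rather than replacing it; the hand-built table just keeps the context idea visible.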

The new Live search results feature a lot of cool vertical integration from local, product search, instant answers, etc. - what are some of your personal favorites? What are the ones that you think are a considerable leap above what the other engines offer right now?

I think all of them are really neat. The one I use the most is almost certainly what we call our Shopping Answer. It delivers instant answers for all things related to Shopping, and in electronics specifically it really shines. A great example is the query [Nikon D40x]. You will notice that right below the ads there is a relevant “instant answer” that gives you some price information and then lots of aggregated review information. If you click on “More Features and Reviews” you are taken to a detail page that goes deeper into the reviews. I think this experience is awesome and is truly a leap ahead of our competition. Try it for yourself.

With the new relevancy algorithm, it appears you've been cracking down on a lot of spam in the index - can you talk about any of the techniques or tactics you've been able to defeat (even if it's just from a broad sense)?

No comment :)

Fair enough - how about this - in the fight against spam, what would you say is a current high priority for Live? What are some things you'd tell Webmasters and Site Owners to really watch out for, because you're seeing a lot of people engage in it and get penalized as a result?

Still, no comment.

With the rollout, it appears there's a fresh (larger) index of the web, new algorithm(s) and new results. This differs from a standard "update" or re-tooling of an index/algo - any reason you chose to go with a clean launch rather than iterative changes? Is this standard for Live, or should we expect that more consistent changes to the current system will continue to come each day/week/month?

We are rolling out improvements to the live site all the time. This release, dubbed the “Live Search Update”, was the culmination of a lot of big efforts, so we decided to release it all at the same time. Specifically, this release had a lot of things that are hard to do in a span of just a couple of months, so it made sense to bundle them up. I cannot comment on what we will do moving forward. We are very psyched to have an infrastructure that allows us to ship improvements to our live site very easily. We will use that infrastructure to our advantage while at the same time being realistic about the time it takes to ship major improvements and game-changing ideas.

Live is now rolling out Webmaster Tools as well  - what are the big goals with that product? What is Microsoft hoping to achieve by providing access to this data?

Webmasters are a critical piece of our success. Fundamentally, it is webmasters who provide the content that makes our engine useful to end consumers. We think that if we can forge a closer relationship with webmasters, we can accomplish two things. First, we can help webmasters understand how Live Search benefits their business and where they might be able to improve to get better results. For example, customers will be able to see details on what our crawler is seeing, which may help them deal with issues more effectively. Second, we hope to build a relationship with folks so that we are able to get more of the content and get it faster. One thing we noticed while mining our logs, for example, is that a fair number of sites still allow only Googlebot and do not allow MSNBot. If we can reach out to those webmasters and get them traffic by indexing their content, we think we will have done something that serves everyone well.
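Webmasters wondering whether their own robots.txt falls into the pattern Eytan describes can check it with Python's standard-library `urllib.robotparser`. The robots.txt below is a hypothetical example that admits Googlebot and shuts out every other crawler. `msnbot` is assumed here as MSNBot's user-agent token, and the URL is a placeholder; swap in your own file and pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that admits Googlebot and blocks every other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def crawler_access(robots_txt, agents, url):
    """Map each user-agent token to whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    access = crawler_access(ROBOTS_TXT, ["Googlebot", "msnbot"],
                            "https://www.example.com/page.html")
    for agent, allowed in access.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

An empty `Disallow:` line means "allow everything" for that group, so this file gives Googlebot full access while the `*` group blocks msnbot. Adding a `User-agent: msnbot` group with its own empty `Disallow:` would open the site to it as well.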

Live's Neural Net - have you been happy with the progress of this ranking system? Are you planning on sticking with it for the foreseeable future? Have there been major tweaks since it was first announced back in the summer of 2005?

Overall our core ranker, our net, has scaled very well. That is, even as we increased the size of our index fourfold and fundamentally made our ranker's job more difficult, we have continued to improve the relevance of our engine. I would say that we don’t hold anything sacred. We are not married to any particular piece of technology. There are folks on the core product team and in MSR (Microsoft Research) always looking to improve our net or potentially replace it with something better, and if we think that will deliver far more relevant results to our customers, then we will move to it.

Correct me if I'm wrong, but my understanding is that a neural net can be "trained" to get smarter and smarter over time by showing it "examples" of what you want great results to look like. How far along would you say this process is? Are we seeing 10% of what the Neural Net is capable of doing? 20%? 90%? (Maybe an undisclosed percent under/over 50?)

More data points can help our net learn more effectively and that is something that we are always looking to improve. I cannot, unfortunately, disclose details of where we think we are along that curve.
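To make "training by examples" concrete: one published way to train a ranker from examples is the pairwise approach Microsoft Research described in its 2005 RankNet paper, where the model sees pairs of results in which one should outrank the other and is nudged toward scoring the better one higher. The sketch below swaps the neural net for a single linear layer to stay short, and its two features and training pairs are made up. It illustrates the pairwise logistic loss only, not Live's ranker or its data.

```python
import math

# Hypothetical training pairs: (features of the better result, features of
# the worse result). Features are [text_match, link_authority], both made up.
PAIRS = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.7, 0.8], [0.6, 0.1]),
    ([0.8, 0.5], [0.2, 0.9]),
]


def score(weights, features):
    """Linear relevance score: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(pairs, n_features, lr=0.1, epochs=200):
    """Learn weights so the better result in each pair scores higher.

    Minimizes the pairwise logistic loss -log(sigmoid(s_better - s_worse))
    with one gradient step per pair.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = score(weights, diff)  # equals s_better - s_worse
            # The loss gradient is -(1 - sigmoid(margin)) * diff, so step along +diff.
            step = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            weights = [w + lr * step * d for w, d in zip(weights, diff)]
    return weights


if __name__ == "__main__":
    learned = train_pairwise(PAIRS, n_features=2)
    for better, worse in PAIRS:
        print(score(learned, better) > score(learned, worse))
```

More pairs, or more informative ones, give the loss more to learn from, which is the intuition behind "more data points can help our net learn more effectively".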

The "related searches" on the side of Live.com's results - are those semantically related through term vectors/usage or "related" in that many folks who searched the given query also searched for the listed terms/phrases?

The related searches are a combination of “more specific” and “broader” queries. So if you search for [Eytan], it turns out that a fair number of people who looked for that are specifically looking for [Eytan seidman]: https://search.live.com/results.aspx?q=eytan&form=QBRE. I swear that was not gamed. There are also scenarios where we think people might want to search more broadly. For example, if you search for [Perl], we show [Php] as a related search, since it tends to be broadly related to [Perl].
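Eytan splits related searches into "more specific" refinements and "broader" suggestions. The sketch below is a guess at the simplest possible version of that split, not Live's method: refinements are logged queries that contain every term of the original plus at least one more, ordered by how often they were issued, while broader suggestions come from a hand-maintained table. The query counts, the table, and `related_searches` are hypothetical.

```python
from collections import Counter


def related_searches(query, query_log, broader, max_specific=3):
    """Return (more_specific, broader) suggestions for query.

    query_log: Counter mapping logged query strings to how often they were issued.
    broader: dict mapping a query to broader or sibling queries.
    """
    terms = query.lower().split()
    specific = []
    for logged, _count in query_log.most_common():
        logged_terms = logged.lower().split()
        # A refinement keeps every original term and adds at least one more.
        if len(logged_terms) > len(terms) and all(t in logged_terms for t in terms):
            specific.append(logged)
            if len(specific) == max_specific:
                break
    return specific, list(broader.get(query.lower(), []))


if __name__ == "__main__":
    # Hypothetical log counts and broader-query table.
    log = Counter({"eytan": 300, "eytan seidman": 120,
                   "perl regex": 80, "eytan seidman microsoft": 15})
    table = {"perl": ["php"]}
    print(related_searches("eytan", log, table))  # (['eytan seidman', 'eytan seidman microsoft'], [])
    print(related_searches("perl", log, table))   # (['perl regex'], ['php'])
```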

How does Live's News search work - are the news sources editorially selected? How can an aspiring news site join in?

Yes, the sources are editorially selected. Fair question. We don’t have a formal mechanism to accept nominations, although the list of sources is constantly reviewed. If you have a source you would like to submit, send it to me and I will get it to the right folks.

Are there forums/groups/blogs that the Live team regularly reads? Any you particularly like or find valuable? Any that the Live folks plan to participate in in the near future?

We all read almost all of the technology and search-related news out there: SEOMoz (of course!), Techmeme, TechCrunch, Webmasterworld, Searchenginewatch, Battelle, Searchengineland, Sphinn and others. Right now we participate on Webmasterworld. Where we participate really comes down to making sure that once we start participating somewhere, we can keep doing so as consistently as possible. Over time, my hope is that this will increase as we have more bandwidth to do so. We are also posting to our team blog much more frequently, covering topics that we hope our readers will find interesting.

Sebastian (Gard - his blog) spoke to us about Microsoft's use of a quality panel of raters, and it was noted at Searchification (see this slide). It's also something that seems to come up often in Live press discussions (i.e. our rating panel gave us feedback X, Y or Z). From talking with Sebastian, we noted that all of the engines do it, but only Microsoft publicly discusses its human search quality rating team (although to my knowledge, all of the engines - see Google - have human raters). Two questions - why do you think the other engines try to keep their human, editorial input quiet, and what value do you see human raters providing? Where do you get the most help from a large focus group, rather than teams of engineers and machines?

It really is not a super interesting topic :)

Among major web search properties, Live is among the least active in the social media space - is that going to change? Does Microsoft see value in "Web 2.0," user-generated content, web-as-a-platform, etc. or feel that it's all a fad?

I really don’t know what Web 2.0 means anymore. Seriously. I think Rich Barton has it largely right in terms of denouncing what some might view as the definition for Web 2.0. See the article here: https://blog.seattlepi.nwsource.com/venture/archives/122180.asp.

On Live Search we do believe in User-Generated Content in a non-trivial way. Let me give you a couple of examples. First, Local Search. If you search for [Kisaku Seattle, WA] and click on the Local Instant Answer, you will be taken to a page where 21 customers have collectively given it 4.5 stars. That is User-Generated Content. That data then flows through to our Live Search Mobile application. If you use it on a Windows Mobile or Java-enabled phone, you will see that same data come across, which can be hugely helpful when you are looking for a restaurant in some random place. Those are just a couple of examples of how we will use User-Generated Content and user data where it is helpful.

Another example is the work we have done with collections on Live Search Maps. You can build collections, share them and even create a 3D movie of the collection. We announced this at Searchification and it is shipping the week of October 15th.

MSN/Live was one of the first to release research about vision-based page segmentation - is that something you've found usable and useful in applying to the ranking algorithm? Some folks speculated that it was simply too load-intensive to be applied to the billions of web pages that have to be crawled. See both - https://research.microsoft.com/research/pubs/view.aspx?tr_id=690 and https://www.searchengineguide.com/articles/2005/0301_rc1.html

This technology is definitely important to us. When you run a query against something like News, it is really critical to be able to separate navigational text from other text. If you look at our news results and descriptions (https://search.live.com/news/results.aspx?q=seattle&FORM=BNRE), you will notice that we tend to pull summaries fairly well from just the core article.
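The research Rand links relies on visual layout cues, which a short example cannot reproduce. As a much simpler stand-in for the same goal (separating navigational text from the core article), the sketch below splits HTML into blocks with Python's standard `html.parser` and drops blocks that are mostly link text or too short. The block tag set, the 0.5 link-density cutoff, the 40-character minimum, and the sample page are arbitrary assumptions, and this is not how Live extracts its news summaries.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption; real pages need more care).
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "td", "tr", "table", "nav",
              "header", "footer", "article", "section", "h1", "h2", "h3"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts how much of each is link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # (normalized_text, link_char_count) per block
        self._parts = []        # raw text pieces of the current block
        self._link_chars = 0    # characters of link text in the current block
        self._link_depth = 0    # > 0 while inside an <a> element

    def _flush(self):
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a" and self._link_depth:
            self._link_depth -= 1

    def handle_data(self, data):
        self._parts.append(data)
        if self._link_depth:
            self._link_chars += len(data.strip())

    def close(self):
        super().close()  # processes any buffered text first
        self._flush()


def content_blocks(html, max_link_density=0.5, min_chars=40):
    """Return text blocks that look like article content rather than navigation."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    kept = []
    for text, link_chars in collector.blocks:
        link_density = link_chars / len(text)
        if link_density <= max_link_density and len(text) >= min_chars:
            kept.append(text)
    return kept


# Hypothetical page: a navigation menu, one article paragraph, a "related" box.
SAMPLE_HTML = """
<ul>
  <li><a href="/">Home</a></li>
  <li><a href="/news">News</a></li>
  <li><a href="/sports">Sports</a></li>
</ul>
<p>This is the body of the news story, the part a search engine would want to summarize in a result snippet.</p>
<p>Related: <a href="/a">Story one</a> <a href="/b">Story two</a></p>
"""

if __name__ == "__main__":
    for block in content_blocks(SAMPLE_HTML):
        print(block)
```

Link density is a cheap proxy for "this is navigation", which is one reason visual approaches were attractive: they can also use position and layout, which plain text statistics miss.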

More broadly, I would say that MSR does a tremendous amount of research in the area of Web Search, and they are totally free to publish it. There are numerous folks across MSR that we work with closely. Depending on how far along their research is, it might be something that we immediately integrate into our product, or something that they work on for a little while longer as it matures. In terms of computational power, we really don’t think about those things too much. If something can improve the experience in a dramatic way, then we want to leverage it. You can read more about our massive investment in computational power in these articles: https://arstechnica.com/journals/microsoft.ars/2007/04/19/microsoft-builds-giant-datacenter-in-quincy-washington and https://www.datacenterknowledge.com/archives/2007/Jan/19/microsoft_confirms_huge_san_antonio_center.html

What are your thoughts about search "fracturing" in the future? Do you think that vertical search, mobile search, local search, etc. could eventually all have individual properties serving them or will the "major" engines retain control of those niches?

I am not a deep visionary about what will happen. I think, however, that if you look at why the engines and specifically Live Search are building up expertise around things like Shopping, Local, Entertainment and Health it is because customers are asking us those questions.  That is, we are building up deep domain experiences specifically in response to customer needs so clearly we see the demand.

Have to ask - with regard to paid links, Google has a very public and oft-discussed policy that suggests that site owners who engage in paid links of any kind should use "nofollow" or risk having their websites penalized in some form or fashion. What is Live's position on paid links? Do you see a difference between links that are primarily advertising and those that are meant to manipulate search engine rankings? Will Live take action against sites it suspects of buying or selling links and if so, do you have concerns that this could create a "Live-Bowling" type of competitive environment? I'm interested in both philosophy and your practical application - how does the Live Search Team think about paid links? Are they always evil? Are there degrees of evil? Can some paid links be good and relevant and others be untrustworthy? Do webmasters or websites risk getting banned for buying links? For selling links? Are you concerned that penalizing link buyers could result in competitive link buying (buying links and pointing them to your high-ranking competition in order to get their sites hurt)?

The thing I think about most in this space is the relevance and objectivity of links. The reality is that most paid links are (a) obviously not objective and (b) very often irrelevant. If you are asking about those, then absolutely, there is a risk. We will not tolerate bogus links that add little value to the user experience and are effectively trying to game the system. Is there a small gray area of people who use paid links in a totally legit manner? Yes; however, it is a small percentage of what we see.

Finally, I wanted to come back and ask a personal question - what drives you to be in search? What are the specific elements that make you passionate about the work that you do?

There are really three things I love about search. First is the task of search. I think that web search, specifically, is at the leading edge of transforming the way people seek and find information, and being in the thick of that is really exciting. Second is the technology. We work on some super interesting technical challenges day in, day out. Many of the problems we work on involve mining petabytes of data, using thousands of machines to do so, and just doing things at a scale that is hard to find almost anywhere else. Last, but definitely most important, are the people. I think that even if you like the technology you work on, if you don’t get along with the people you are never going to love your job. I love my job because of the people I get to interact with on a daily basis. On the engineering side we have some of the smartest folks around, and I learn a ton from them. From a management perspective, we also have folks who are just outright awesome to learn from, so this really creates an environment that I enjoy a lot.

Many thanks to Eytan and the Live team for participating here. I'm currently in the process of running some screenshots by some of the folks who have launched the Live Webmaster Portal Beta and hope to have those on the blog sometime later this week.

Rand & Eytan Enjoy Cigars at Microsoft's Party during SES NYC 2007

BTW - For more on the updates to Live Search, check out this post on relevance, this one on product search and this latest one on the upgrades to Live Maps. There are some busy engineers out in Redmond.

In the next few weeks, I'll be visiting the Live team over in Redmond to look at version 2 of the Live Webmaster Portal. Hopefully, I'll get permission to leak some of these. In the meantime, Barry Schwartz posted screencaps of version 1 here.

p.s. ALSO - Don't miss Eric Enge's recent chat interview on some more technical subject matter with Live's Ramez Naam. I know, I know, it's an interview bonanza this week - what can I say, when it rains, it pours :)