Good interview with Matt Cutts by Philipp Lenssen at Google Blogoscoped. I particularly liked this bit about valid HTML and rewarding it in the rankings:
People sometimes ask whether Google should boost (or penalize) for valid (or invalid) HTML. There are plenty of clean, perfectly validating sites, but also lots of good information on sloppy, hand-coded pages that don’t validate. Google’s home page doesn’t validate and that’s mostly by design to save precious bytes. Will the world end because Google doesn’t put quotes around color attributes? No, and it makes the page load faster. :) Eric Brewer wrote a page while at Inktomi that claimed 40% of HTML pages had syntax errors. We can’t throw out 40% of the web on the principle that sites should validate; we have to take the web as it is and try to make it useful to searchers, so Google’s index parsing is pretty forgiving.
As much as folks in SEO distrust what Matt has to say, I think this quote is worth respecting. When you combine healthy skepticism with common sense and an understanding of motivation, you can glean useful information from Matt's writing. I still applaud the guy for having the chutzpah to get out there and take all the flak for writing a blog about the company and its search ranking practices. I don't envy him that position.
Mano70 - I agree. They claim to save bytes by not writing valid HTML, but it seems like they could save a whole lot more by using external CSS and JavaScript.
Matt claims leaving out the quotes around a color attribute saves bytes, but you'd think an external style sheet that defines a class for that particular color would be much more efficient than repeating that color=red attribute EVERY time you want an element to be red.
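To illustrate the commenter's point (a hypothetical sketch, not Google's actual markup): an inline color attribute has to be repeated on every element, while an external style sheet declares the color once and each element only carries a short class name.

```html
<!-- Inline styling: the color attribute is repeated on every element -->
<font color=red>Error</font>
<font color=red>Warning</font>

<!-- External CSS: the color is declared once, e.g. in style.css:
       .alert { color: red; }
     and each element just references the class -->
<span class="alert">Error</span>
<span class="alert">Warning</span>
```

The savings grow with every additional red element on the page, and the style sheet is cached by the browser across page views, so repeat visitors don't re-download it at all.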
Whether Google improves search engine placement as a result of a clean site is dubious at best. This argument could go on forever.
I like to think that valid HTML allows a page to load faster and gives my visitors a better experience. What good is placement if a web page doesn't load fast?
The 40% of pages that don't validate belong to people who are just being lazy and don't want to spend the money to fix them.
Come on :) Faster load times are not the reason (the difference would be on the order of milliseconds). The real reason is traffic. Google tries to save peanuts on one hand, while on the other it spreads invalid code across the web with its ads. Some validators validate all frames on a page, and on this point at least Google should look after customers who want to make their pages error-free. This behavior shows us the dark side of the monopoly. I hope in the future there will be more than one viable search engine. A good start is, for example, YaCy (https://www.yacy.net), a distributed P2P search engine, as well as AskJeeves and others. Of course, they are busier now with search algorithms than with validating their front pages. But if they grow strong enough, they will compete for users on the same level as Google, and one thing users value is clean, valid code.
Well, it should be possible to save even more precious bytes on Google's homepage by using valid code and external CSS and JavaScript, so that argument doesn't hold water.