It feels cruel to give the search engineers at Yahoo!, Google, MSN & Ask more work, but I know that sometimes they might not have the best project tracking or distribution software. So I figured I'd be kind and make a list of items that need attention here in the blog - they can refer to my post whenever they're wondering what to do with all their free time.
The following are requests made to engineers during SMX Seattle a couple weeks back that I personally felt were excellent suggestions.
- Parameter-Decapitator Instructions
I'm not sure that's the technical name, but this is certainly one of the items the entire webmaster community direly needs. The concept is simple - provide a method in robots.txt to indicate that specific parameter-style additions to a URL should be ignored by the engines. Thus, if Yahoo! were crawling along and saw the following URLs:
- seomoz.org/blog/entry-1
- seomoz.org/blog/entry-1?referrer=marketingpilgrim.com
- seomoz.org/blog/entry-1?referrer=marketingpilgrim.com&ID=44556
The engines could look at a robots.txt line indicating something like "ignoreparam = ID+referrer" and realize that the content at all three of the above URLs should be treated as the singular URL "seomoz.org/blog/entry-1" - they wouldn't index multiple versions and wouldn't encounter duplicate content problems from that indexing. It's really the best idea since sitemaps, and it's been a long time coming.
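A full robots.txt entry (with entirely made-up syntax, of course, since no engine supports anything like this yet) might look something like:
User-agent: *
# Hypothetical directive - treat URLs that differ only in these
# parameters as a single page:
ignoreparam = ID+referrer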
- Commercial API for Organic Data Requests
This one's desperately needed by anyone who runs tools that request data from the engines (ahem). Yahoo! Search Marketing has a commercial API, as do many of Google's services and Amazon's Alexa data. However, Yahoo! Site Explorer, Microsoft Live.com, Ask.com & Google web search all need to add this feature. It should mean a few million more in revenue for each over the next couple of years, and it gives legitimate marketers a way to grab data that's accurate and accounted for, without skewing the number of searches or ad impressions, etc.
- Webmaster Central for Yahoo!, MSN & Ask
Vanessa set the bar really, really high, so this is a tall order to fill, but I guarantee it's had positive returns for Google, not just in branding, but in financials, too. Personally, I'd nominate Laura Lippay to run the one at Yahoo!, and Rahul Lahiri would make a great webmaster interface head over at Ask. For Microsoft... Maybe Ken Headrick (granted, he's in Canada, but one stereotype that definitely sticks is the friendliness).
- Hyperlink Next to "Results" to Explain the Estimates
In my interview with Matt, I went off a bit on all the reporters who use Google's search count numbers as research in a story. We came to the conclusion that one good solution might be to place a "?" next to the results where visitors could hover and see a little box explaining the "roughness" of the measurement.
- Make it Easy to Sign Out of Google Personalized Results (without logging out)
You can use Joost's tool, but honestly, shouldn't it be more obvious when you're seeing personalized results, and easier to turn the feature off without appending stuff to the URL? No one outside the search world would even know where to start with that.
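(For the record, the URL trick is - if I recall correctly - appending the pws=0 parameter to the results URL, e.g. google.com/search?q=seo&pws=0. Hardly something a normal searcher would ever discover.)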
- MSN - Bring Back the Link Command
It's been gone a while, and if you take the commercial API suggestion above, you could actually make money off of it. The data's useful, it's available to you, go for it! Oh yeah - Eytan... You owe me a little sometin' sometin' :)
That's it for now. If you've got other requests, big or small, feel free to add them below.
FYI - This week finds me in Washington DC, helping the good folks at NPR with their SEO. Wish me luck! I've got 20+ hours of meetings and training over the next 3 days and around 400 slides to show (talk about death by PowerPoint). Hence, email (and posting, unless it's at 2:00am) will be very, very slow.
Rand,
I concur with all of your requests. They are all desperately needed.
Now, for those who can't wait for parameter-decapitator instructions, I will share a simple trick to address this issue (it works for Google).
Using Googlebot's support for wildcards, we can do this:
robots.txt (using wildcards):
User-agent: Googlebot
Disallow: /*?referrer=*
Disallow: /*&referrer=*
Disallow: /*?ID=*
Disallow: /*&ID=*
These patterns match any URL that includes ?referrer=, &referrer=, ?ID= or &ID=.
If Googlebot supported regular expressions, it would be rewritten like this:
robots.txt (using regular expressions):
Disallow: /.*[\?\&](referrer|ID)=
This says: "match a ? or a &, followed by referrer or ID, followed by an equals sign."
Far shorter.
Hamlet - we just need to make sure that the engines interpret these instructions as "pretend URLs with this appended data are the same as the root URL without it" rather than "don't crawl or index these." We need all those links to count, but have the engines ignore the multiple URLs for the same content.
Rand,
I understand your valid point. You are concerned that by blocking access to those URLs we might lose the potential link juice. Right?
I think this strategy will have the desired effect.
Google doesn't drop inbound links just because the destination content is inaccessible. When they cannot crawl a page, they can still use the anchor text as a qualifier.
Please check Section 2.2 of Google's original paper to confirm this.
Those pages appear in the listings as URLs only.
The problem is that they're still assigning the links to the page with the appended parameters. Our desire here is to see that the original page (the one with no parameters) receives the weight of all the links (internal or external) pointing to any page with those parameters attached.
I understand your point, but I think that the execution has to be slightly different in order to reach 100% of the goals with this suggestion.
You are right in your assessment. Please note that I am not trying to discourage the search engineers from implementing this.
I am trying to find an immediate solution for this critical problem.
If the purpose is to have the original page carry all the weight, what do you think of using permanent redirects for this purpose?
seomoz.org/blog/entry-1?referrer=marketingpilgrim.com and seomoz.org/blog/entry-1?referrer=marketingpilgrim.com&ID=44556 would redirect (HTTP 301) to seomoz.org/blog/entry-1
This can be implemented via .htaccess.
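A rough sketch of the rules (assuming Apache with mod_rewrite enabled; the parameter names come from Rand's examples, and note that this version drops the entire query string, not just those two parameters):
RewriteEngine On
# If the query string contains a referrer or ID parameter...
RewriteCond %{QUERY_STRING} (^|&)(referrer|ID)= [NC]
# ...301 to the same path with the query string stripped
# (the trailing "?" with nothing after it clears it):
RewriteRule ^(.*)$ /$1? [R=301,L]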
This is similar to the use of permanent redirects for non-www to www URLs. That one is no longer necessary, as Google now makes it possible to specify the preferred version via Webmaster Central.
Hamlet - yes, that's the current solution that we tell companies to go with. What we're seeking from the engines is the elimination of that programmatic step, since it applies to so many sites.
Random idea - couldn't you do a rewrite in .htaccess?
I'll have a play around with that sometime and see what I can come up with...
Good stuff. Sounds like you are going to be hard at work. That's a lot of PowerPoint!
Another request, on a slightly different (i.e. PPC) note, is for Yahoo! and MSN to release something like Google's AdWords Editor.
Great call on the desktop editor... as usual, Google leads and the others follow.
Matt Cutts asked about what would be useful during his Q&A.
I mentioned that some examples of successful reinclusion requests would be great to have, or at least some way to know your voice was heard.
Even a blanket email like
"we have read and researched your reinclusion request"
would be nice for a lot of people.
Fantastic suggestion, Pat. I totally agree that a "your voice has been heard" response would be valuable. A signal in Webmaster Central saying "we think you're doing something manipulative" would also be a great addition, and would certainly help many sites that don't know any better to pay attention.
And aside from anything else, it'd just be nice common courtesy - especially for a company whose motto is "Don't be evil." Kind of an extension to "Do lots of good."
Wouldn't this also signal to spammers that they've been found out - and that, as long as they get no message, their tactics are working? If you leave people in the dark, they will have more trouble manipulating the SERPs. Regretfully, the 'bad guys' (yes, the quotes are intentional) screwed things up for the mom-and-pop sites.
I like the whole list of items. Did you send a link to your blog post to them? So when are we going to send out a professional lobbyist? ;-)
I especially like your first idea ("Parameter-Decapitator Instructions"). It would make things a lot easier than redirecting everything to its main page, and it's also less "risky" - there's a slight chance that redirecting things like ?ref=whatever to the "plain" page could have negative effects (IMO).
All great suggestions, but the best one is definitely the Parameter-Decapitator instructions in robots.txt. This makes SO much sense and would save a whole load of hassle for both SEOs and the search engines.
Commercial APIs for search - can't wait. I've been wondering for years why we can't have a few servers dedicated to this. Do you think it's coming? If so, how long?
Great suggestions, Rand. You're right...these are direly needed. I'm blown away that it's 2007 and these aren't in place already.
Sometimes the obvious takes a long time to happen.
Solid suggestions. I would love a version of Webmaster Central for Yahoo! and MSN.
Don't really care about Ask.com. :)
Ask.com - proving that no matter how much you polish a turd, it's still just a shiny shit.
Rubbish results can't be disguised by a pretty UI.
You've said this one before, Rand. I wish all search engines would limit each site to two results per search query; sites like eBay can sometimes hold up to 15 of the top 20 search results for certain queries because they have so many different sub-domains. I know it is pretty much impossible to pull off because of sites like WordPress.com etc., but I can dream.