Del.icio.us Cloaking to Combat Spam

Everyone's favorite social bookmarking site, Del.icio.us, appears to be rendering different content to search engines than to its users. Every page on del.icio.us appears to have the following directive in place:
<meta name="robots" content="noarchive,nofollow,noindex"/>
This should technically remove all of their pages from Google's index. Performing a site:del.icio.us query at Google, however, returns several million pages. I checked Google's cache and didn't see a meta noindex tag. Examining robots.txt reveals the same pattern: a standard user agent is served "Disallow: /," but a Googlebot user agent is not.
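If you want to check this kind of user-agent-based cloaking yourself, a minimal sketch (Python standard library only; the Googlebot string below is the commonly published one, and this only observes whatever the server chooses to send) would be:

import urllib.request

URL = "http://del.icio.us/robots.txt"
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

def fetch(url, user_agent):
    # Request the same URL while identifying as a different client.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

bodies = {name: fetch(URL, ua) for name, ua in USER_AGENTS.items()}
if bodies["browser"] != bodies["googlebot"]:
    print("robots.txt differs by user agent -- likely cloaking")
else:
    print("robots.txt is identical for both user agents")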
They also nofollow all the outbound links on their site, but the nofollow attributes don't appear to be cloaked (so del.icio.us won't pass any link value).
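For reference, a nofollowed outbound link on a bookmark page looks something like this (the URL here is just a placeholder):

<a href="http://example.com/bookmarked-page" rel="nofollow">bookmarked page</a>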
My guess is that this is a method of combatting spam. They're probably hoping that having the meta tag will deter potential spammers from saturating their site with crap. It might also weed out robots that are scouring the web for valuable places to inject spammy links. It doesn't seem like a particularly effective tactic, but I suppose every little bit helps.
Thanks to Emil Stenström for pointing this out. Apparently this was covered on SEO Speedwagon in August, but I hadn't heard mention of it until now.
An attempt to stop or slow down site scraping.
See the transcription of presentations from Dan Thies and Bill Atchison on this seroundtable post:
The Bot Obedience Course.
Dan mentions inserting the meta tag for visitors that aren't approved user agents. incrediBill discusses dynamically serving robots.txt based on whether a robot is approved or not; a rough sketch of that approach follows below.
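As a rough illustration of the opt-in serving Dan and incrediBill describe (not del.icio.us' actual code, which isn't public), a small WSGI handler could vary both robots.txt and the meta tag by user agent. The allowlist and the substring matching here are assumptions for the sketch; a real deployment would verify crawlers more carefully, since user-agent strings are trivial to fake:

from wsgiref.simple_server import make_server

# Hypothetical allowlist of "approved" crawlers.
APPROVED_BOTS = ("googlebot", "slurp", "msnbot")

PERMISSIVE_ROBOTS = b"User-agent: *\nDisallow:\n"      # approved bots: crawl everything
RESTRICTIVE_ROBOTS = b"User-agent: *\nDisallow: /\n"   # everyone else: stay out

def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT", "").lower()
    approved = any(bot in ua for bot in APPROVED_BOTS)

    if environ.get("PATH_INFO") == "/robots.txt":
        body = PERMISSIVE_ROBOTS if approved else RESTRICTIVE_ROBOTS
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]

    # For ordinary pages, unapproved agents get the restrictive meta tag.
    meta = b"" if approved else b'<meta name="robots" content="noarchive,nofollow,noindex"/>'
    page = b"<html><head>" + meta + b"</head><body>bookmarks here</body></html>"
    start_response("200 OK", [("Content-Type", "text/html")])
    return [page]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()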
I'd say that explanation fits very well with what's going on at del.icio.us, and matches up with Matt's hypothesis (though from the other side). The cloaking both deters unauthorized agents (though anyone "unauthorized" is also likely to ignore robot directives) and discourages potential spammers from tossing their links on the site.
Great insight, Bill.
The vast majority of scrapers use the search engines to find their targets. Furthermore, it is much easier to piggy-back scrape (i.e., grab content from the Google cache) than to spider a site and find the content yourself. I would be hugely surprised if scrapers are responsible for more than a small fraction of the sites struggling for rankings.
Maybe that is correct... but it could be that the most success comes from doing something the rest of the spammers are not into: finding new blogs and scraping their new posts daily.
EGOL - I suspect you are right to at least some degree, but I'd add to rjones' comments that many scrapers also "re-purpose" content by sending it through Markov-chaining programs that jumble word order, sentence structure, and grammar to produce "new" content. The search engines have to work even harder to uncover content that's "real" versus useless mumbo-jumbo.
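For anyone unfamiliar with the technique, a toy word-level Markov chain (an illustrative sketch, not any particular spam tool) shows why such "re-purposed" text tends to read as grammatical-looking nonsense:

import random
from collections import defaultdict

def build_chain(text):
    # Map each word to the list of words observed to follow it.
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def scramble(text, length=20, seed=None):
    rng = random.Random(seed)
    chain = build_chain(text)
    word = rng.choice(list(chain))
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:                     # dead end: jump to a random word
            word = rng.choice(list(chain))
        else:
            word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

source = "scrapers take your content and post it then they compete with you in the serps"
print(scramble(source, length=20, seed=42))

The output preserves local word-pair statistics, so it can slip past naive duplicate-content checks while being worthless to a human reader.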
Because people use it to get high-PR backlinks and improve their rankings on Google. Del.icio.us is preventing its page PR from leaking out to scammers.
OK... I'd like to hear some opinions on this...
*Why would del.icio.us give a damn about scrapers?* What do YOU think? I don't know the answer.
However, the personal reason I don't like scrapers grabbing and reposting my content is that it can result in a loss of traffic on long-tail search terms. Scrapers take your content and post it; then they are in competition with you in the SERPs. I believe that legit sites would get more traffic if the scrapers were not around. In fact, I believe that all of the scraper traffic really belongs to the sites they scrape from.
Imagine this scenario... Mr. Jones starts working on a brand new site. At the start he has very few links and very little content. If a scraper learns about his new site and visits daily to grab new content, the pages on the scraper site could actually outrank the legit site the content was taken from. Why? The scraper might get spidered more often and indexed sooner, giving it an advantage over the newer, weaker site. I wonder if this might be at least part of the reason some new sites struggle to get traction in the search engine rankings.
Maybe I am full of bull... let me know if this makes sense to you.
So - what's the bottom line here? Will the links that Delicious users add pass any link juice? If not, then its linkbait value just dropped.
Del.icio.us links do not pass link value. However, there are many blogs and aggregator sites that mirror what's on the del.icio.us popular page, so getting into those results is still very valuable from an SEO perspective.
Actually, there is a concerted effort in the spam community to target content blocked by robots.txt and meta tags. What better way to generate unique content than to scrape sites that block Googlebot?
Thanks.
I gave a presentation on this subject recently. :)
When you switch from an opt-out approach to robots.txt to an opt-in approach, you're cloaking (perhaps reverse-cloaking is a better term) for those unapproved robots. A human visitor isn't going to see any different content than what Google or Yahoo! might have indexed, even though there is an extra meta tag in there.
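To make the opt-out vs. opt-in distinction concrete (these are generic examples, not del.icio.us' actual files), an opt-out robots.txt names the agents you want to keep out and lets everything else crawl:

User-agent: BadBot
Disallow: /

User-agent: *
Disallow:

while an opt-in robots.txt shuts everyone out by default and whitelists specific crawlers:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Serving the second kind dynamically, per user agent, is what the thread above describes.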
I won't presume to guess what they are doing this for; I have no idea.
However, I think that "My guess is that this is a method of combatting spam" is a silly conclusion to reach. It makes no sense that anyone who sat down and thought about the issue would believe this action would reduce spam.
If you have another explanation, I'm all ears.
Yeah, they are cloaking. I used this header request simulator and went to my del.icio.us account. The meta tags are completely different. But the nofollows on all my external links are still in place (damn nofollow!).
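You can reproduce what that header-request simulator shows without any special tool. A small sketch (again assuming only Python's standard library, with the popular page as a placeholder URL) pulls out the robots meta tag as seen under each user agent:

import re
import urllib.request

def robots_meta(url, user_agent):
    # Fetch the page as the given user agent and extract the robots meta tag, if any.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Naive regex; assumes the name attribute precedes content, as in the tag quoted above.
    match = re.search(r'<meta\s+name="robots"\s+content="([^"]*)"', html, re.IGNORECASE)
    return match.group(1) if match else "(no robots meta tag)"

url = "http://del.icio.us/popular"   # placeholder; any bookmark page works
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print("As a browser:  ", robots_meta(url, browser_ua))
print("As Googlebot:  ", robots_meta(url, googlebot_ua))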
Spammers don't look at those things. They don't care... but they do love the fact that the pages are in the Google cache. The only thing it stops is legitimate bots.
Del.icio.us still sends a decent amount of traffic if you make it onto the main page, and since many users have completely replaced their regular bookmarks with social bookmarking services, it's a great metric for measuring the quality and "addictive" effect of your website content.