Del.icio.us Cloaking to Combat Spam

Everyone's favorite social bookmarking site, Del.icio.us, appears to be rendering different content to search engines than to its users. Every page on del.icio.us appears to have the following directive in place:
<meta name="robots" content="noarchive,nofollow,noindex"/>
This should technically remove all of their pages from Google's index. Performing a site:del.icio.us query at Google, however, returns several million pages. I checked Google's cache and didn't see a meta noindex tag. Examining robots.txt reveals the same pattern: a standard user agent is served "Disallow: /," but a Googlebot user agent is not.
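If you want to check this kind of user-agent-based cloaking yourself, a minimal sketch (Python standard library only; the Googlebot string below is the commonly published one, and this only observes whatever the server chooses to send) would be:

import urllib.request

URL = "http://del.icio.us/robots.txt"
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

def fetch(url, user_agent):
    # Request the same URL while identifying as a different client.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

bodies = {name: fetch(URL, ua) for name, ua in USER_AGENTS.items()}
if bodies["browser"] != bodies["googlebot"]:
    print("robots.txt differs by user agent -- likely cloaking")
else:
    print("robots.txt is identical for both user agents")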
They also nofollow all the outbound links on their site, but the nofollow attributes don't appear to be cloaked (so del.icio.us won't pass any link value).
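For reference, a nofollowed outbound link on a bookmark page looks something like this (the URL here is just a placeholder):

<a href="http://example.com/bookmarked-page" rel="nofollow">bookmarked page</a>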
My guess is that this is a method of combatting spam. They're probably hoping that having the meta tag will deter potential spammers from saturating their site with crap. It might also weed out robots that are scouring the web for valuable places to inject spammy links. It doesn't seem like a particularly effective tactic, but I suppose every little bit helps.
Thanks to Emil Stenström for pointing this out. Apparently this was covered on SEO Speedwagon in August, but I hadn't heard mention of it until now.
An attempt to stop or slow down site scraping.
See the transcription of presentations from Dan Thies and Bill Atchison on this seroundtable post:
The Bot Obedience Course.
Dan mentions inserting the meta tag for visitors that aren't approved user agents. incrediBill discusses dynamically serving robots.txt based on whether a robot is approved or not; a rough sketch of that approach follows below.
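As a rough illustration of the opt-in serving Dan and incrediBill describe (not del.icio.us' actual code, which isn't public), a small WSGI handler could vary both robots.txt and the meta tag by user agent. The allowlist and the substring matching here are assumptions for the sketch; a real deployment would verify crawlers more carefully, since user-agent strings are trivial to fake:

from wsgiref.simple_server import make_server

# Hypothetical allowlist of "approved" crawlers.
APPROVED_BOTS = ("googlebot", "slurp", "msnbot")

PERMISSIVE_ROBOTS = b"User-agent: *\nDisallow:\n"      # approved bots: crawl everything
RESTRICTIVE_ROBOTS = b"User-agent: *\nDisallow: /\n"   # everyone else: stay out

def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT", "").lower()
    approved = any(bot in ua for bot in APPROVED_BOTS)

    if environ.get("PATH_INFO") == "/robots.txt":
        body = PERMISSIVE_ROBOTS if approved else RESTRICTIVE_ROBOTS
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]

    # For ordinary pages, unapproved agents get the restrictive meta tag.
    meta = b"" if approved else b'<meta name="robots" content="noarchive,nofollow,noindex"/>'
    page = b"<html><head>" + meta + b"</head><body>bookmarks here</body></html>"
    start_response("200 OK", [("Content-Type", "text/html")])
    return [page]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()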
I'd say that explanation fits very well with what's going on at del.icio.us, and matches up with Matt's hypothesis (though from the other side). The cloaking both deters unauthorized agents (though anyone "unauthorized" is also likely to ignore robot directives) and discourages potential spammers from tossing their links on the site.
Great insight, Bill.
The vast majority of scrapers use the search engines to find their targets. Furthermore, it is much easier to piggy-back scrape (i.e., grab content from the Google cache) than to spider a site and find the content yourself. I would be hugely surprised if scrapers are responsible for more than a small fraction of the sites struggling for rankings.
Maybe that is correct... but it could be that the most success comes from doing something the rest of the spammers are not into: finding new blogs and scraping their new posts daily.
EGOL - I suspect you are right to at least some degree, but I'd add to rjones' comments that many scrapers also "re-purpose" content by sending it through Markov-chaining programs that jumble word order, sentence structure, and grammar to produce "new" content. The search engines have to work even harder to uncover content that's "real" versus useless mumbo-jumbo.
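For anyone unfamiliar with the technique, a toy word-level Markov chain (an illustrative sketch, not any particular spam tool) shows why such "re-purposed" text tends to read as grammatical-looking nonsense:

import random
from collections import defaultdict

def build_chain(text):
    # Map each word to the list of words observed to follow it.
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def scramble(text, length=20, seed=None):
    rng = random.Random(seed)
    chain = build_chain(text)
    word = rng.choice(list(chain))
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:                     # dead end: jump to a random word
            word = rng.choice(list(chain))
        else:
            word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

source = "scrapers take your content and post it then they compete with you in the serps"
print(scramble(source, length=20, seed=42))

The output preserves local word-pair statistics, so it can slip past naive duplicate-content checks while being worthless to a human reader.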
Because people use it to get high-PR backlinks and improve their rankings on Google. Del.icio.us is preventing its page PR from leaking out to scammers.
OK... I'd like to hear some opinions on this...
*Why would del.icio.us give a damn about scrapers?* What do YOU think? I don't know the answer.
However, the personal reason I don't like scrapers grabbing and reposting my content is that it can result in a loss of traffic on long-tail search terms. Scrapers take your content and post it; then they are in competition with you in the SERPs. I believe that legit sites would get more traffic if the scrapers were not around. In fact, I believe that all of the scraper traffic really belongs to the sites they scrape from.
Imagine this scenario... Mr. Jones starts working on a brand new site. At the start he has very few links and very little content. If a scraper learns about his new site and visits daily to grab new content, the pages on the scraper site could actually outrank the legit site the content was taken from. Why? The scraper might get spidered more often and indexed sooner, giving it an advantage over the newer, weaker site. I wonder if this might be at least part of the reason some new sites struggle to get traction in the search engine rankings.
Maybe I am full of bull... let me know if this makes sense to you.
So - what's the bottom line here? Will the links that Delicious users add pass any link juice? If not, then its linkbait value just dropped.
Del.icio.us links do not pass link value. However, there are many blogs and aggregator sites that mirror what's on the del.icio.us popular page, so getting into those results is still very valuable from an SEO perspective.
Actually, there is a concerted effort in the spam community to target content blocked by robots.txt and meta tags. What better way to generate unique content than to scrape sites that block Googlebot?
Thanks.
I gave a presentation on this subject recently. :)
When you switch from an opt-out approach to robots.txt to an opt-in approach, you're cloaking (perhaps reverse-cloaking is a better term) for those unapproved robots. A human visitor isn't going to see any different content than what Google or Yahoo! might have indexed, even though there is an extra meta tag in there.
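To make the opt-out vs. opt-in distinction concrete (these are generic examples, not del.icio.us' actual files), an opt-out robots.txt names the agents you want to keep out and lets everything else crawl:

User-agent: BadBot
Disallow: /

User-agent: *
Disallow:

while an opt-in robots.txt shuts everyone out by default and whitelists specific crawlers:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

Serving the second kind dynamically, per user agent, is what the thread above describes.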
I won't presume to guess what they are doing this for; I have no idea.
However, I think that "My guess is that this is a method of combatting spam" is a silly conclusion to reach. It makes no sense that anyone who sat down and thought about the issue would believe this action would reduce spam.
If you have another explanation, I'm all ears.
Yeah, they are cloaking. I used this header request simulator and went to my del.icio.us account. The meta tags are completely different. But the nofollows on all my external links are still in place (damn nofollow!).
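You can reproduce what that header-request simulator shows without any special tool. A small sketch (again assuming only Python's standard library, with the popular page as a placeholder URL) pulls out the robots meta tag as seen under each user agent:

import re
import urllib.request

def robots_meta(url, user_agent):
    # Fetch the page as the given user agent and extract the robots meta tag, if any.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Naive regex; assumes the name attribute precedes content, as in the tag quoted above.
    match = re.search(r'<meta\s+name="robots"\s+content="([^"]*)"', html, re.IGNORECASE)
    return match.group(1) if match else "(no robots meta tag)"

url = "http://del.icio.us/popular"   # placeholder; any bookmark page works
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print("As a browser:  ", robots_meta(url, browser_ua))
print("As Googlebot:  ", robots_meta(url, googlebot_ua))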
Spammers don't look at those things. They don't care... but they do love the fact that the pages are in the Google cache. The only thing it stops is legitimate bots.
Del.icio.us still sends a decent amount of traffic if you make it onto the main page, and since many users have completely replaced their regular bookmarks with social bookmarking services, it's a great metric for measuring the quality and "addictive" effect of your website content.