
stopping a feed scraper (Animenano.com Issues and Requests)

jpmeyer
Lvl: 4
Posts: 32
05/11/2007 09:56 AM EDT

This "tubeless" site is using the Elliot Back plugin to scrape people's blog feeds and it's really annoying.  Anyone know a good way to stop this?

DiGiKerot
Lvl: 2
Posts: 9
05/11/2007 12:46 PM EDT

God, this has been annoying me too. I think I'm just going to try blocking the IP address at the .htaccess level and see what happens...

Kurisu
Lvl: 5
Posts: 52
05/11/2007 12:58 PM EDT

Search for their IP address and add something like this to your .htaccess file to block them:

RewriteEngine On
# Return 403 Forbidden to every request from the scraper's IP (replace XX.XX.XX.XX).
RewriteCond %{REMOTE_ADDR} ^XX\.XX\.XX\.XX$
RewriteRule ^.*$ - [F]

You could also redirect them to a special feed you set up (e.g. saying that they stole your content):

RewriteEngine On
# Send the scraper's requests to a substitute feed instead of the real one.
RewriteCond %{REMOTE_ADDR} ^XX\.XX\.XX\.XX$
RewriteRule ^(.*)$ http://newfeedurl.com/feed [R=302,L]

Or you could redirect them to their own feed. 
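
A minimal sketch of that last option. It assumes your feed is served at /feed and that these lines sit above any blog-software rewrite rules in the same .htaccess file. The address `http://scraper.example/feed` is a made-up placeholder for the scraper's own feed URL, not something taken from this thread:

```apache
RewriteEngine On
# Placeholder address; replace XX.XX.XX.XX with the scraper's IP.
RewriteCond %{REMOTE_ADDR} ^XX\.XX\.XX\.XX$
# Hypothetical target: swap in the scraper site's real feed URL.
RewriteRule ^feed/?$ http://scraper.example/feed [R=302,L]
```

Only requests for the feed are redirected here, so the rest of the site behaves normally even for that IP.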


Additionally, you should contact the hoster and send them a cease and desist letter.

You can also try sending Google a DMCA notice.

owen
Lvl: 3
Posts: 21
05/11/2007 01:00 PM EDT

It happened to my last post, but Akismet marked it as spam so I didn't see it at all -- consider installing it on your blog or something. It works wonders for spam, and lolikit's comments. Especially the latter.

Kurisu
Lvl: 5
Posts: 52
05/11/2007 01:03 PM EDT

"and lolikit's comments."
lol 

Kabitzin
Lvl: 4
Posts: 41
05/11/2007 01:15 PM EDT

It depends on how serious the scraper is.  Certain programs will strip out all code from feeds, which makes it difficult to use various plugins to stop the scrapers.  Other scrapers will actually download the posts and images from your site, bypassing hotlinking protection.

As for .htaccess banning IPs, I'm not personally a huge fan of IP banning as a long-term solution.  It's pretty trivial to change IPs, and then you often just end up blocking someone who is a legitimate user.
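
One way to limit the collateral damage, as a rough sketch: apply the ban only to the feed URL, so a wrongly blocked visitor can still read the site itself. This assumes a WordPress-style feed path beginning with `feed`, which is a guess about your setup; adjust the pattern to match where your feed actually lives.

```apache
RewriteEngine On
# Placeholder address; replace XX.XX.XX.XX with the scraper's IP.
RewriteCond %{REMOTE_ADDR} ^XX\.XX\.XX\.XX$
# Assumed feed path: matches feed, feed/, feed/atom/ and so on.
RewriteRule ^feed(/|$) - [F]
```

This does nothing about the IP-changing problem; it only narrows what a mistaken ban can break.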

I'm sure some people hate this, but on our website, I only allow excerpts in the RSS feed.  Some other tips include contacting Google (if they use AdSense) and reporting the scraping, or perhaps routing your feeds through Feedburner.  There are also a number of WordPress plugins that will append text or a "digital fingerprint" to your RSS feeds.  This may help to hijack the scraper or at least tell you who is scraping your RSS.  However, I am not completely convinced that these plugins will stop the problem.

Kabitzin
Lvl: 4
Posts: 41
05/11/2007 01:21 PM EDT

BTW, Hung posted a funny trick you can play on scrapers, at least for a little bit.

hung
Lvl: 12
Posts: 462
05/11/2007 03:04 PM EDT

I think there's some kind of plugin that lets you block people more easily than with an .htaccess file. I'll have to see if it's still around, and if it actually works.

Edit: Found it. Does anyone want to be the guinea pig who verifies if it works or not? ;) 

Xebek
Lvl: 2
Posts: 4
05/11/2007 04:13 PM EDT

I will, I'm one of the sites Tubeless is scraping.

Although, I need to figure out how to work the plugin first.

Kabitzin
Lvl: 4
Posts: 41
05/11/2007 05:44 PM EDT

Tubeless whois:

Current Registrar: TUCOWS INC.
IP Address: 76.162.58.250
IP Location: US (United States), Kentucky, Hopkinsville
Lock Status: clientTransferProhibited
DMOZ: no listings
Data as of: 14-Jun-2005

 

Xebek
Lvl: 2
Posts: 4
05/12/2007 08:35 AM EDT

The plugin doesn't seem to work. So far my posts are being displayed by Tubeless the same as before. I've set it to look for two IPs, as the one Kabitzin mentioned is different from the one I found using the site/tool mentioned in Hung's trick.

Alternatively, with the plugin you can set it to look for different user agents instead of IPs, but I don't really know what theirs is, as there was no readme with the plugin, and I had to figure out a lot of stuff by trial and error.
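
For comparison, the same idea can be sketched directly in .htaccess, without the plugin: match either of two IPs or a user-agent substring. The first address is the one from the whois lookup above, `XX.XX.XX.XX` is a placeholder for the second, and `ExampleScraper` is a purely hypothetical user-agent string (the real one would have to come from your access log). This only helps if the scraper fetches the feed from your own server.

```apache
RewriteEngine On
# IP from the whois lookup posted earlier in the thread.
RewriteCond %{REMOTE_ADDR} ^76\.162\.58\.250$ [OR]
# Placeholder for the second address; replace XX.XX.XX.XX.
RewriteCond %{REMOTE_ADDR} ^XX\.XX\.XX\.XX$ [OR]
# Hypothetical user-agent substring, matched case-insensitively.
RewriteCond %{HTTP_USER_AGENT} ExampleScraper [NC]
# Any one matching condition is enough to return 403 Forbidden.
RewriteRule ^ - [F]
```

The `[OR]` flags matter: without them all three conditions would have to match at once, and the rule would never fire for a request coming from just one of the addresses.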

Zyl
Lvl: 2
Posts: 10
05/14/2007 10:39 AM EDT

I've also been scraped by tubeless. Have tried the AntiLeech plugin, but like Xebek, no joy. Apparently if the feed is being grabbed via Feedburner, AntiLeech doesn't work. Now I'm half considering whether or not to delete my Feedburner feed!

In any case, the Google Ads seem to have vanished from tubeless.

hung
Lvl: 12
Posts: 462
05/14/2007 11:55 AM EDT

Hmm. It looks like they're scraping via a secondary RSS feed. So a local change wouldn't stop them. I dunno which feed they're using though. It's weird because the thing says "Original post by AnimeNation News" instead of, say, Basugasubakuhatsu Anime Blog.

Plus they don't have ads, so I don't see what their point is. I guess at some point I disabled trackbacks, so they don't get through to me, which is why I didn't really care about Tubeless. It's still sorta annoying, but they're not really harming me in any way... 

Kurisu
Lvl: 5
Posts: 52
05/14/2007 01:29 PM EDT

I always wondered why you disabled trackbacks. I even think you're the only one who has disabled them.

Isn't there a plugin that blacklists trackbacks from certain sites?

hung
Lvl: 12
Posts: 462
05/14/2007 01:39 PM EDT

Yeah, too much trackback spam.

Um, I dunno if there's a plugin for that. Akismet blocks the obvious ones, but not the less obvious ones.
