robots.txt Tutorial - Block Bad Bots

Some bots ignore robots.txt files. Honoring the file is voluntary, and these bots don't care whether you want them on your web site.

They can be blocked with a .htaccess file instead, which the web server enforces on every request.

1. Block robots via .htaccess

We can't block by robot name here. Instead, we block robots by matching the beginning of their User-Agent string.

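# Mark any request whose User-Agent begins with one of these names.
# SetEnvIfNoCase compares without regard to case.
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^EmailWolf" bad_bot
SetEnvIfNoCase User-Agent "^ExtractorPro" bad_bot
SetEnvIfNoCase User-Agent "^CherryPicker" bad_bot
SetEnvIfNoCase User-Agent "^NICErsPRO" bad_bot
SetEnvIfNoCase User-Agent "^Teleport" bad_bot
SetEnvIfNoCase User-Agent "^EmailCollector" bad_bot

# Allow everyone, then deny marked requests. <Limit GET POST> covers GET, HEAD,
# and POST only. Order/Allow/Deny is Apache 2.2 syntax; Apache 2.4 needs
# mod_access_compat for it.
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>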

This example bans robots on our list of spambots.

To block another robot, add a SetEnvIfNoCase line for it alongside the others, above the <Limit> block.

SetEnvIfNoCase User-Agent "^User-Agent" bad_bot

Replace the User-Agent inside the quotes (after the ^) with the start of this robot's User-Agent string, as found in your log files. Here's a sample log entry.

xyz.net - - [07/Mar/2003:11:28:35] "GET / HTTP/1.0" 403 - "-" "Teleport 1.28"

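Here, the User-Agent is Teleport 1.28, the last quoted field. The ^ character in the SetEnvIfNoCase lines anchors the pattern to the start of the User-Agent string, so the rule matches anything starting with the string we provide. The pattern is a regular expression, so put a backslash before characters such as . if you want them matched literally.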
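Pulling the User-Agent out of a log entry like the sample can be sketched in a few lines of Python. This is a minimal sketch, assuming the User-Agent is the last double-quoted field on the line (as in the sample) and contains no embedded quotes; extract_user_agent is a hypothetical helper name, and your server's log format may differ.

```python
import re

# The sample log entry from this tutorial.
LOG_LINE = 'xyz.net - - [07/Mar/2003:11:28:35] "GET / HTTP/1.0" 403 - "-" "Teleport 1.28"'


def extract_user_agent(line):
    """Return the last double-quoted field of a log line, or None if there is none.

    Assumption: the User-Agent is the final quoted field, as in the sample.
    """
    quoted_fields = re.findall(r'"([^"]*)"', line)
    return quoted_fields[-1] if quoted_fields else None


print(extract_user_agent(LOG_LINE))  # Teleport 1.28
```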

Any User-Agent starting directly with Teleport would be blocked, regardless of version number, added text, or capitalization.
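To check which User-Agent strings the rules would catch before editing a live site, the matching can be approximated in Python. This is only an illustration, not how Apache evaluates the directives internally: it assumes Python's re module treats these simple anchored patterns the same way Apache's regular expression engine does, and is_bad_bot is a hypothetical helper name.

```python
import re

# Prefixes copied from the SetEnvIfNoCase lines in this tutorial.
BAD_BOT_PREFIXES = [
    "EmailSiphon",
    "EmailWolf",
    "ExtractorPro",
    "CherryPicker",
    "NICErsPRO",
    "Teleport",
    "EmailCollector",
]

# One case-insensitive pattern per prefix, anchored with ^ like the directives.
BAD_BOT_PATTERNS = [re.compile("^" + prefix, re.IGNORECASE) for prefix in BAD_BOT_PREFIXES]


def is_bad_bot(user_agent):
    """Return True if any anchored pattern matches the User-Agent string."""
    return any(pattern.search(user_agent) for pattern in BAD_BOT_PATTERNS)


print(is_bad_bot("Teleport 1.28"))           # True
print(is_bad_bot("teleport pro/1.29"))       # True
print(is_bad_bot("Mozilla/4.0 (Teleport)"))  # False
```

The last call returns False because Teleport does not appear at the very start of the string, which is exactly what the ^ anchor requires.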

2. Tool to create the .htaccess file

This tool can create .htaccess files for you, blocking some of the robots discussed in this tutorial.

You can also enter up to six custom User-Agent strings to block from your site. Enter one per box.
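If the tool is not available, a file in the same shape can be generated with a short script. The following is a rough sketch under stated assumptions, not the tool's actual implementation: build_htaccess, escape_prefix, and the ExampleBot name are hypothetical, the six-string limit simply mirrors the description above, and strings containing quotes or backslashes are rejected rather than quoted.

```python
MAX_CUSTOM = 6  # mirrors the "up to six custom User-Agent strings" limit
REGEX_META = set(".^$*+?()[]{}|")  # characters with special meaning in a regex

# Access rules copied from the example in this tutorial.
ACCESS_BLOCK = """<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>"""


def escape_prefix(text):
    """Backslash-escape regex metacharacters so the text is matched literally."""
    return "".join("\\" + ch if ch in REGEX_META else ch for ch in text)


def build_htaccess(user_agents):
    """Return .htaccess text denying requests whose User-Agent starts with any given string."""
    agents = [ua.strip() for ua in user_agents if ua.strip()]
    if len(agents) > MAX_CUSTOM:
        raise ValueError("at most %d custom User-Agent strings" % MAX_CUSTOM)
    rules = []
    for ua in agents:
        if '"' in ua or "\\" in ua:
            # Quotes and backslashes need config-level quoting that this sketch skips.
            raise ValueError("unsupported character in %r" % ua)
        rules.append('SetEnvIfNoCase User-Agent "^%s" bad_bot' % escape_prefix(ua))
    return "\n".join(rules) + "\n\n" + ACCESS_BLOCK + "\n"


# "ExampleBot 2.0" is a made-up User-Agent used only for illustration.
print(build_htaccess(["Teleport", "ExampleBot 2.0"]))
```

Escaping the dot in a version number keeps the generated pattern from matching unintended strings, since an unescaped . matches any character.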

The form for creating a custom .htaccess file offers these options:

Block robots used by spammers
Block link directory builder bots
Enter User-Agent strings (optional)

Copyright© 1996 - 2024 Clockwatchers, Inc.