 
 
ROBOTS.TXT File for Search Engines
 

A robots.txt is a file placed on your server to tell the various search engine spiders not to crawl or index certain sections or pages of your site. You can use it to prevent indexing entirely, to keep certain areas of your site from being indexed, or to give individual indexing instructions to specific search engines.

 

The file itself is a simple text file, which can be created in Notepad. It needs to be saved in the root directory of your site, that is, the directory where your home page or index page is.
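Crawlers only look for robots.txt at the root of a site, so a copy placed in a subfolder such as /blog/robots.txt is not read. The short Python sketch below shows where that root address is for any page on a site. The function name robots_url and the example.com addresses are made up for illustration; urljoin is the standard-library function from urllib.parse.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the address where crawlers look for robots.txt for the site hosting page_url.

    Hypothetical helper for illustration only.
    """
    # Joining with a path that starts with "/" replaces the whole path and drops
    # any query string, so the result keeps only the scheme, host and port.
    return urljoin(page_url, "/robots.txt")


print(robots_url("https://www.example.com/blog/2010/post.html"))
print(robots_url("http://www.example.com:8080/shop/?item=3"))
```

Whatever page a spider starts from, it asks for the same root-level file, which is why the file belongs next to your home page.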

 

All search engines, or at least all the important ones, now look for a robots.txt file as soon as their spiders or bots arrive on your site. So, even if you do not currently need to exclude the spiders from any part of your site, having a robots.txt file is still a good idea; it can act as a sort of invitation into your site.

 

The very fact that search engines are looking for them is reason enough to put one on your site. Have you looked at your site statistics recently? If your stats include a section on 'files not found', you are sure to see many entries where search engine spiders looked for, and failed to find, a robots.txt file on your site.

 

Create a new file with Notepad and name it robots.txt, all in lowercase.

 
The two directives used in a robots.txt file are User-agent: and Disallow:
 

User-agent: * By using the * or wildcard you are addressing ALL robots. If you wish to address individual robots, you need to list each robot separately with its own User-agent: statement. Each robot must be listed by its specific user-agent name (for example, Googlebot), followed by one or more Disallow: statements listing the folders and files you DO NOT want that robot to index.
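To see how a robot chooses between a named entry and the * entry, the sketch below feeds a small robots.txt to RobotFileParser from Python's standard library, which follows the usual rule: a robot that finds an entry with its own name obeys only that entry, and every other robot falls back to the * entry. The folder names are invented for the example, and Googlebot and Bingbot are used simply as two robot names.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt with one named entry and one wildcard entry.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Bingbot"):
    for path in ("/drafts/page.htm", "/cgi-bin/script.pl"):
        # can_fetch() answers: may this robot crawl this path?
        print(agent, path, parser.can_fetch(agent, path))
```

Note that Googlebot is allowed into /cgi-bin/ here: because it has its own entry, it ignores the * entry completely. If you want a rule to apply to a named robot as well, repeat it in that robot's entry.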

 

Tip: Use the * wildcard to address all robots; it is the safest way.
Disallow: List each folder that you do not want robots to index, one folder per Disallow: line.

Warning: Disallow: / used without any folder name tells robots not to index ANY page of the website.
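The difference between a lone slash and an empty value is easy to check with Python's standard-library robots.txt parser, as in the sketch below. The helper allowed() and the robot name ExampleBot are hypothetical, chosen only for the demonstration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may crawl `path` according to `robots_txt` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"  # a lone slash matches every path
ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"    # an empty value blocks nothing

for path in ("/", "/index.html", "/products/widget.htm"):
    print(path, allowed(BLOCK_EVERYTHING, path), allowed(ALLOW_EVERYTHING, path))
```

One stray slash is the difference between shutting every robot out and letting them all in, so check that line carefully before uploading the file.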
 
All files and folders in the directory named in a Disallow: statement, as well as everything in the subfolders beneath it, will NOT be indexed by robots.
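Robots decide this by comparing the start of each page's path with the Disallow: value, so everything below a blocked folder is covered automatically. The sketch below checks a few example paths with Python's standard-library parser; the folder names and the robot name ExampleBot are illustrative only.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

paths = [
    "/private/",                  # the folder itself
    "/private/report.htm",        # a file in the folder
    "/private/archive/2010.htm",  # a file in a subfolder
    "/public/index.htm",          # a page outside the folder
]
for path in paths:
    print(path, parser.can_fetch("ExampleBot", path))

# Without the trailing slash the rule is a plain prefix, so it also
# catches any path that merely starts with "/private".
loose = RobotFileParser()
loose.parse(["User-agent: *", "Disallow: /private"])
print(loose.can_fetch("ExampleBot", "/private-notes.htm"))
```

Because the match is on the beginning of the path, Disallow: /private (no trailing slash) also blocks pages such as /private-notes.htm. Adding the trailing slash limits the rule to the folder and its contents.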
 

There is nothing difficult about creating a basic robots.txt file. It can be created using Notepad or whatever your favorite text editor is. Each entry has just two lines:

 

User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]

 

The Disallow: line can be repeated for each directory or file you want to exclude, and the whole entry can be repeated for each spider or bot you want to exclude.
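If you manage several rules, it can help to generate the file rather than type it. The sketch below is a hypothetical helper, build_robots_txt, that writes one User-agent: line per robot followed by one Disallow: line per path, with a blank line between entries, and then double-checks the result with Python's standard-library parser. The robot names and paths are examples only; the code assumes Python 3.9 or later for the dict[str, list[str]] type hint.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Build robots.txt text from {robot name or "*": [paths to exclude]}.

    Hypothetical helper: one User-agent: line per robot, one Disallow: line
    per path. An empty path list produces an empty Disallow: line, which
    blocks nothing.
    """
    entries = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths] or ["Disallow:"]
        entries.append("\n".join(lines))
    # A blank line separates one entry from the next.
    return "\n\n".join(entries) + "\n"


text = build_robots_txt({
    "Googlebot": ["/private/"],
    "*": ["/cgi-bin/", "/tmp/"],
})
print(text)

# Double-check the generated file with Python's own robots.txt parser.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("Googlebot", "/private/page.htm"))
print(parser.can_fetch("Bingbot", "/tmp/cache.htm"))
```

Save the printed text as robots.txt in your root directory and it is ready to use.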

 
Exclude a file from an individual Search Engine
 

You have a file, privatefile.htm, in a directory called 'private' that you do not wish to be indexed by Google. You know that the spider that Google sends out is called 'Googlebot'. You would add these lines to your robots.txt file:

 

User-Agent: Googlebot
Disallow: /private/privatefile.htm
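You can confirm that these two lines do what you expect before uploading them. The sketch below runs them through RobotFileParser from Python's standard library; www.example.com stands in for your own domain, other.htm is an invented second file, and Bingbot is used only as an example of a robot without its own entry.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /private/privatefile.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked_page = "https://www.example.com/private/privatefile.htm"
other_page = "https://www.example.com/private/other.htm"

print(parser.can_fetch("Googlebot", blocked_page))  # Googlebot is kept out of this one file
print(parser.can_fetch("Bingbot", blocked_page))    # robots with no entry are not affected
print(parser.can_fetch("Googlebot", other_page))    # the rest of /private/ stays open to Googlebot
```

Only the one file is excluded, and only for Googlebot. To keep Googlebot out of the whole folder, use Disallow: /private/ instead.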