Currently, our web crawlers are designed to visit each web site no more than
once (the intention is to collect as many unique web sites as possible). Once the
repository becomes larger, we will change this behavior so that web sites are also
revisited. The web crawler gathers only those web sites that other sites reference
through HREF tags. To reduce load on servers, the crawler also visits a given server
no more than once every few seconds. If you experience problems caused by our
crawlers, please contact us at contact@inetbot.com.
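The crawling policy described above (each page queued at most once, only links found in HREF attributes followed, and a minimum delay between requests to the same server) can be sketched in Python as follows. This is a hypothetical illustration, not inetbot's actual code: the names `extract_links` and `PoliteFrontier` are made up for this example, and the 5-second `min_delay` is an assumed stand-in for "a few seconds".

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefCollector(HTMLParser):
    """Collects the href targets of <a> tags in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the absolute URLs of all <a href="..."> links in html."""
    parser = HrefCollector(base_url)
    parser.feed(html)
    return parser.links


class PoliteFrontier:
    """Hypothetical crawl frontier: each URL is queued at most once, and
    each host is handed out at most once per min_delay seconds."""

    def __init__(self, min_delay=5.0, clock=time.monotonic):
        self.min_delay = min_delay  # assumed value for "a few seconds"
        self.clock = clock          # injectable so the delay can be tested
        self.seen = set()           # every URL ever queued
        self.queue = []             # URLs waiting to be fetched
        self.last_fetch = {}        # host -> time of the last request

    def add(self, url):
        # Skip URLs we have already queued, so no page is visited twice.
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append(url)

    def next_ready(self):
        """Return the first queued URL whose host may be contacted now,
        or None if every queued host is still inside its delay window."""
        now = self.clock()
        for i, url in enumerate(self.queue):
            host = urlparse(url).netloc
            last = self.last_fetch.get(host)
            if last is None or now - last >= self.min_delay:
                self.last_fetch[host] = now
                return self.queue.pop(i)
        return None
```

A crawl loop would call `next_ready()`, fetch the returned page, and pass every result of `extract_links` back to `add()`. Keeping the per-host timestamps in the frontier, rather than sleeping after each request, lets the crawler keep working on other servers while one server is still inside its delay window.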
To prevent inetbot from crawling your web site or parts of it, you can use
the robots exclusion standard. Simply add an entry to your robots.txt file
that sets the user agent to inetbot, as follows:

    User-agent: inetbot
    Disallow: path

Use * as the user agent to address all robots, and / as the path to exclude
the complete web site.
robots.txt is the standard file for robot exclusion. For more details on the
standard, see
http://www.robotstxt.org/wc/exclusion.html#robotstxt.
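Before deploying a robots.txt entry, you can check how a standards-following parser interprets it. The sketch below uses Python's standard `urllib.robotparser` module; the robots.txt contents, the `/private/` path, the example.com URLs, and the helper name `inetbot_may_fetch` are assumptions for illustration, and inetbot's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps inetbot out of /private/ only.
PARTIAL = """\
User-agent: inetbot
Disallow: /private/
"""

# Hypothetical robots.txt that keeps inetbot out of the complete web site.
COMPLETE = """\
User-agent: inetbot
Disallow: /
"""


def inetbot_may_fetch(robots_txt, url, agent="inetbot"):
    """Return True if the given robots.txt text allows agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for text in (PARTIAL, COMPLETE):
    for url in ("http://example.com/index.html",
                "http://example.com/private/a.html"):
        print(url, inetbot_may_fetch(text, url))
```

Because both entries name inetbot explicitly, other robots are unaffected; replacing `inetbot` with `*` would apply the same rule to all robots.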
The latest version of inetbot is 0.0alpha.