inetbot web crawler
inetbot - The Robot

Currently, our web crawlers visit each web site not more than once; the intention is to collect as many unique web sites as possible. We will change this behavior once the repository has grown, so that web sites are also revisited. The crawler only gathers web sites that are referenced from other sites via HREF tags. In addition, to reduce the load on servers, the same server is not visited more than once within a few seconds. If you experience problems caused by our crawlers, please contact us at contact@inetbot.com.
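The behavior above translates into a short crawl loop. The following is a minimal sketch in Python, not inetbot's actual code: the 3-second delay (the page only says "a few seconds"), the use of hostnames to identify web sites, the per-IP throttling, and the crawl() and HrefCollector names are all assumptions made for illustration.

import socket
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import Request, urlopen

CRAWL_DELAY = 3.0  # assumed politeness interval; the real value is "a few seconds"

class HrefCollector(HTMLParser):
    """Collect the targets of <a href="..."> tags; the crawler only
    discovers web sites that other pages reference via HREF."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_urls, max_sites=100):
    frontier = deque(seed_urls)
    seen_hosts = set()   # one visit per web site (identified by hostname here)
    last_hit = {}        # server IP -> time of the previous request to it
    collected = []
    while frontier and len(collected) < max_sites:
        url = frontier.popleft()
        host = urlsplit(url).hostname
        if host is None or host in seen_hosts:
            continue  # each web site is visited not more than once
        try:
            server_ip = socket.gethostbyname(host)
        except OSError:
            continue
        # Several hostnames can share one server, so throttle per IP address:
        prev = last_hit.get(server_ip)
        if prev is not None:
            wait = CRAWL_DELAY - (time.monotonic() - prev)
            if wait > 0:
                time.sleep(wait)
        try:
            req = Request(url, headers={"User-Agent": "inetbot"})
            with urlopen(req, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue
        last_hit[server_ip] = time.monotonic()
        seen_hosts.add(host)
        collected.append(url)
        parser = HrefCollector()
        parser.feed(html)
        frontier.extend(urljoin(url, href) for href in parser.links)
    return collected

A call such as crawl(["http://example.com/"]) then returns the first batch of unique sites reached from the seed, with at least CRAWL_DELAY seconds between requests to the same server.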

To prevent inetbot from crawling your web site, or parts of it, you can use the robots exclusion standard. Simply add an entry to your robots.txt that names inetbot as the user agent, as follows:

User-agent: inetbot    (use * to address all robots)
Disallow: /path        (use / to exclude the complete web site)
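For example, a concrete robots.txt that keeps inetbot out of a (hypothetical) /private/ directory while leaving the rest of the site open would read:

User-agent: inetbot
Disallow: /private/

Robots not matched by any User-agent record in the file remain unrestricted.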

robots.txt is the standard mechanism for robot exclusion. For more details on the standard, see http://www.robotstxt.org/wc/exclusion.html#robotstxt.
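As an illustration of how a crawler can honor this standard, the sketch below uses the RobotFileParser class from Python's standard urllib.robotparser module; the example.com URL and the /private/ path are placeholders, not anything taken from inetbot:

from urllib.robotparser import RobotFileParser

# Placeholder site; substitute the host you are about to crawl.
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

# True only if robots.txt allows the inetbot user agent to fetch this URL.
print(rp.can_fetch("inetbot", "https://example.com/private/page.html"))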

The latest version of inetbot is 0.0alpha.
Copyright © 2017 inetbot   -   All rights reserved