inetbot web crawler
inetbot - Introduction

Welcome to inetbot, an experimental, research-oriented web and usenet repository. Our robots continuously crawl the world wide web and the usenet groups to fill our repository with web sites and new newsgroup postings. The repository is used to test the scalability of new algorithms, mainly from the research areas of information retrieval, search engine development, clustering, machine learning, text processing, and data mining.

Currently, due to a lack of the required infrastructure, we are not able to provide full access to the repository. We therefore depend on paid advertisement to extend the current infrastructure and keep it running. We are also looking for sponsors who would like to help us build one of the largest web repositories, open to all researchers, by providing us with bandwidth, servers, and storage.

So far we have developed the following components:
  • a centralized crawler system to crawl the world wide web and usenet
  • a scalable repository to store web sites
  • a distributed filesystem
For the future we plan to develop:

All components are (and will be) written in C/C++ for efficiency and run on standard Linux systems.

Copyright © 2017 inetbot   -   All rights reserved