Welcome to inetbot, a fully experimental, research-oriented web and Usenet repository. Our robots
continuously crawl the World Wide Web and Usenet groups to fill the repository with web sites and new
newsgroup postings. The repository is used to test the scalability of new algorithms, mainly from the
research areas of information retrieval, search engine development, clustering, machine learning, text processing,
and data mining.
Currently, because we lack the required infrastructure, we are not able to provide full access to the repository.
We therefore depend on paid advertising to keep the current infrastructure running and to extend it. We are also
looking for sponsors who would like to help us build one of the largest web repositories, open to all
researchers, by providing us with bandwidth, servers, and storage.
So far we have developed the following components:
- a centralized crawler system to crawl the world wide web and usenet
- a scalable repository to store web sites
- a distributed filesystem
For the future we plan to develop:
All components are (and will be) written in C/C++ for efficiency and run on standard Linux systems.