
Test the blog spider prototype

December 13, 2011 in Blog, News

The first software delivery of the Blogforever project is finally available: the prototype of a blogosphere spider.

The spider crawls and monitors lists of identified blogs as well as new, unknown blogs. Any new blog posts or comments from each blog are added to the feed through the spider.
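As a rough illustration of the monitoring idea, here is a minimal Python sketch that picks out posts it has not seen before from an RSS feed. It is not the Blogforever spider's code: the function name `new_items`, the sample feed and the use of `guid`/`link` as the post identifier are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def new_items(rss_xml, seen):
    """Return RSS items whose identifier is not yet in `seen`, and record them."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer <guid>; fall back to <link> when a feed omits it.
        ident = item.findtext("guid") or item.findtext("link")
        if ident is None or ident in seen:
            continue
        seen.add(ident)
        fresh.append({"id": ident, "title": item.findtext("title", default="")})
    return fresh


# Hypothetical feed content, inlined so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example blog</title>
<item><title>First post</title><link>http://example.org/1</link><guid>http://example.org/1</guid></item>
<item><title>Second post</title><link>http://example.org/2</link></item>
</channel></rss>"""

seen_ids = set()
first_pass = new_items(SAMPLE_FEED, seen_ids)   # both posts are new
second_pass = new_items(SAMPLE_FEED, seen_ids)  # nothing new on a re-crawl
```

Keeping the set of seen identifiers between crawls is what lets a repeated visit to the same blog report only the posts added since the previous visit.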

The spider can be downloaded and run on a single server, and it is managed through a web portal interface, as seen in the figure below.






Figure 1 – Spider portal: The details linked from any of the indexed sources display the crawled XML and link to the actual HTML.

Although this is only the first prototype of the research project, we have tested it by crawling 36,000 distinct blogs and extracting approximately 1 GB of blog data.

The prototype spider can be downloaded and run from:

Minimum server requirements to run the crawler:

  1. Operating System: Windows Server 2008 (64-bit)
  2. CPU: 2 Xeon CPUs, 2.5 GHz
  3. RAM: 4 GB
  4. Hard Disk: 20 GB (SAS) plus 60 GB (SATA)

Anyone interested in blog crawling is welcome to test the spider and to contact us to discuss further requirements and usage.






Figure 2 – The downloader file: Contains download and installation instructions for the Blogforever Prototype Spider