BlogForever platform released

October 8, 2013 in Blog, News

The BlogForever platform is one of the major results of the BlogForever project. It is a simple weblog digital archiving platform to preserve weblogs and ensure their authenticity, integrity, completeness, usability, and long-term accessibility as a valuable cultural, social, and intellectual resource.

This release consists of the BlogForever repository and two blog spiders: a free version based on .NET and an OSS version based on Python.

BlogForever Repository Component source code

http://invenio-software.org/repo/blogforever/ or

https://github.com/vbanos/blogforever

deployment instructions (PDF)

BlogForever Free Spider binaries

Spider, SpiderBackendApp, SpiderWebApp, (RAR files), deployment instructions (PDF)

BlogForever OSS Spider source code

bfspider.tar.bzip2 (82MB), README

BlogForever papers at iPRES2013

August 22, 2013 in Blog, News

Two full papers and a poster stemming from the BlogForever project will be presented at the 10th International Conference on Preservation of Digital Objects (iPRES’2013), which will take place in Lisbon, Portugal, during 2-6 September 2013.

Interoperability of web archives and digital libraries: A Delphi study

Hendrik Kalb, Paraskevi Lazaridou, Ed Pinsent and Matthias Trier

The interoperability of web archives and digital libraries is crucial to avoid silos of preserved data and content. While various studies focus on specific facets of the challenge to interoperate, there is a lack of empirical work about the overall situation of actual challenges. We conduct a Delphi study to survey and reveal the insights of experts in the field. Results of our study are presented in this paper to enhance further research and development efforts for interoperability.

CLEAR: a credible method to evaluate website archivability

Vangelis Banos, Yunhyong Kim, Seamus Ross and Yannis Manolopoulos

Web archiving is crucial to ensure that cultural, scientific and social heritage on the web remains accessible and usable over time. A key aspect of the web archiving process is optimal data extraction from target websites. This procedure is difficult for reasons such as website complexity, the plethora of underlying technologies and ultimately the open-ended nature of the web. The purpose of this work is to establish the notion of Website Archivability (WA) and to introduce the Credible Live Evaluation of Archive Readiness (CLEAR) method to measure WA for any website. Website Archivability captures the core aspects of a website crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. An appreciation of the archivability of a website should provide archivists with a valuable tool when assessing the possibilities of archiving material and influence web design professionals to consider the implications of their design decisions on the likelihood that a website could be archived. A prototype application, archiveready.com, has been established to demonstrate the viability of the proposed method for assessing Website Archivability.

Diverse approaches to blog preservation: a comparative study

Richard M. Davis, Edward Pinsent, Silvia Arango-Docio

This poster presents highlights of a comparative study of three distinct approaches to preserving the content of blogs, to consider the relative benefits of each approach in meeting the requirements for blog preservation in different contexts. Assessment criteria are drawn from key publications and frameworks on digital preservation as well as practical considerations derived from the authors’ experience as users and designers of digital archiving tools and systems.

Technological foundations of the current Blogosphere paper accepted at WIMS’12

February 15, 2012 in Blog, News, Publications

A paper titled “Technological foundations of the current Blogosphere” has been accepted at the International Conference on Web Intelligence, Mining and Semantics (WIMS’12), to be held on 13-15 June 2012 in Craiova, Romania.

Authors: Vangelis Banos, Karen Stepanyan, Yannis Manolopoulos, Mike Joy and Alexandra Cristea

Abstract: In this paper, we review the technological foundations of the current Blogosphere. The review is primarily based on a large-scale evaluation of active blogs. The extensive list of examined technologies enables commenting on a range of widely adopted standards and potential trends in the Blogosphere. The evaluation has been conducted in the following stages:

  1. Retrieving and parsing a large set of blogs.
  2. Identifying and quantifying the use of technologies such as web standards, adopted services, file formats and platforms.
  3. Analysing collected data and reporting the results.
  4. Comparing the results with existing findings from the generic Web to identify similarities and differences in the Blogosphere.
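
As an illustration of stage 2, the sketch below counts blog platforms by reading the `generator` meta tag that many blog engines emit. It is not the code used in the paper: the function names `detect_generator` and `count_platforms` are hypothetical, and relying on the `generator` tag alone is an assumption made to keep the example short.

```python
from collections import Counter
from html.parser import HTMLParser


class GeneratorParser(HTMLParser):
    """Collects the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values keep their original case, so normalise the name.
        if (attributes.get("name") or "").lower() == "generator":
            content = attributes.get("content")
            if content:
                self.generators.append(content.strip())


def detect_generator(html):
    """Return the first generator string found in the page, or None."""
    parser = GeneratorParser()
    parser.feed(html)
    return parser.generators[0] if parser.generators else None


def count_platforms(pages):
    """Count platforms across an iterable of HTML strings (hypothetical helper)."""
    counts = Counter()
    for html in pages:
        generator = detect_generator(html)
        # Keep only the platform name, dropping any version suffix.
        platform = generator.split()[0] if generator else "unknown"
        counts[platform] += 1
    return counts
```

A real evaluation would combine several signals (feed formats, script includes, HTTP headers), since many blogs omit or alter the `generator` tag.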

The presented work was performed as part of BlogForever (ICT No. 269963), an EC-funded research project aiming to aggregate, preserve, manage and disseminate blogs. The results of this study are relevant within the context of weblog preservation and weblog data extraction.

Test the blog spider prototype

December 13, 2011 in Blog, News

Finally, the first software delivery of the BlogForever project is available: the prototype of a blogosphere spider.

The spider enables crawling and monitoring lists of identified blogs as well as new, unknown blogs. Any new blog posts or comments from each blog will be added to the feed through the spider.
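
The monitoring step can be pictured with a minimal sketch that reports only the feed items not seen on earlier polls. This is an illustration, not the prototype's implementation (the prototype is a Windows application whose internals are not described here); it assumes blogs expose an RSS 2.0 feed, and `new_feed_items` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def new_feed_items(rss_xml, seen_ids):
    """Return items from an RSS 2.0 document whose id is not in seen_ids.

    seen_ids is a set that is updated in place, so repeated polls of the
    same feed only report posts or comments that appeared since last time.
    """
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer the guid; fall back to the link when no guid is given.
        item_id = item.findtext("guid") or item.findtext("link")
        if not item_id or item_id in seen_ids:
            continue
        seen_ids.add(item_id)
        fresh.append({"id": item_id, "title": item.findtext("title", default="")})
    return fresh
```

Calling the function on each poll with the same `seen_ids` set yields only the newly published entries, which a crawler could then fetch and store.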

The spider can be downloaded and run from a single server and managed through a web portal interface, as seen in the figure below.

Figure 1 – Spider portal: Details linked from any of the indexed sources can display the crawled XML and link to the actual HTML.

Although this is only the first prototype of the research project, we have tested it by crawling 36,000 distinct blogs and extracting approximately 1 GB of blog data.

The prototype spider can be downloaded and run from: http://bf2.csd.auth.gr/BFCrawler.rar.

Minimum server requirements to run the crawler:

  1. Operating System: Windows 2008 Server 64-bit
  2. CPU: 2 Xeon CPUs, 2.5 GHz
  3. RAM: 4 GB
  4. Hard Disk: 20 GB (SAS) plus 60 GB (SATA)

Anyone interested in blog crawling should test this and contact us to discuss further requirements and usage.

Figure 2 – The downloader file: Contains download and installation instructions for the BlogForever Prototype Spider

2nd BlogForever Consortium Meeting

September 16, 2011 in Blog, News

The 2nd BlogForever Consortium Meeting took place during 8-9 September in Thessaloniki, Greece. Nineteen participants from twelve institutions came to Thessaloniki to discuss BlogForever. Current progress was evaluated and the project roadmap was laid down.

The meeting was organized in sessions covering all aspects of the project:

  • Weblog Structure and Semantics (WP2) was one of the main sessions of the meeting, covering the recently submitted BlogForever Survey and the pending Blog Data Model.
  • The BlogForever Policies (WP3) section of the meeting covered work on Risk Management as well as the Preservation Policy.
  • In the BlogForever software platform (WP4) session, work on User Requirements & Platform Specifications was evaluated. Additionally, a special technical session explored possible ways of designing & developing the BlogForever Platform.
  • Last but not least, the dissemination plan & associated activities were presented in the Dissemination & Exploitation (WP6) session.

Besides BlogForever partners, Carolyn Hank was also invited to present her work on Blog Preservation and contribute to expanding the spectrum of the project.

 

The BlogForever survey is live!

July 11, 2011 in Blog, News

After weeks of design work, the BlogForever survey is live, available in 6 languages and running for 28 days. The results of the survey, available at the end of the summer, will help us to develop digital preservation, management and dissemination facilities for weblogs within the BlogForever project. Hence, we are keen to gather information from you about the content, context and usage patterns of current weblogs, so that we can identify your views on the long-term preservation, management, analysis, access and future use of the BlogForever Archive. We would appreciate it if you could take part in the survey using the following link:

Thanks for participating!

Kick off Meeting

February 20, 2011 in News

The kick-off meeting will be held in Warwick, UK on the 22nd and 23rd of March 2011 and the administrative workshop (for those who need it) will be held on the 21st of March.

About BlogForever

November 20, 2010 in News

BLOGFOREVER will develop robust digital preservation, management and dissemination facilities for weblogs. These facilities will be able to capture the dynamic and continuously evolving nature of weblogs, their network and social structure, and the exchange of concepts and ideas that they foster; pieces of information omitted by current Web Archiving methods and solutions.