BlogForever platform released

October 8, 2013 in Blog, News

The BlogForever platform is one of the major results of the BlogForever project. It is a simple digital archiving platform designed to preserve weblogs and ensure their authenticity, integrity, completeness, usability, and long-term accessibility as a valuable cultural, social, and intellectual resource.

This release consists of the BlogForever repository and two blog spiders: a free version based on .NET and an OSS version based on Python.

BlogForever Repository Component source code

http://invenio-software.org/repo/blogforever/ or

https://github.com/vbanos/blogforever

deployment instructions (PDF)

BlogForever Free Spider binaries

Spider, SpiderBackendApp, SpiderWebApp (RAR files), deployment instructions (PDF)

BlogForever OSS Spider source code

bfspider.tar.bzip2 (82MB), README

BlogForever papers at iPRES2013

August 22, 2013 in Blog, News

Two full papers and a poster stemming from the BlogForever project will be presented at the 10th International Conference on Preservation of Digital Objects (iPRES 2013), which will take place in Lisbon, Portugal, from 2 to 6 September 2013.

Interoperability of web archives and digital libraries: A Delphi study

Hendrik Kalb, Paraskevi Lazaridou, Ed Pinsent and Matthias Trier

The interoperability of web archives and digital libraries is crucial to avoid silos of preserved data and content. While various researches focus on specific facets of the challenge to interoperate, there is a lack of empirical work about the overall situation of actual challenges. We conduct a Delphi study to survey and reveal the insights of experts in the field. Results of our study are presented in this paper to enhance further research and development efforts for interoperability.

CLEAR: a credible method to evaluate website archivability

Vangelis Banos, Yunhyong Kim, Seamus Ross and Yannis Manolopoulos

Web archiving is crucial to ensure that cultural, scientific and social heritage on the web remains accessible and usable over time. A key aspect of the web archiving process is optimal data extraction from target websites. This procedure is difficult for reasons such as website complexity, the plethora of underlying technologies and, ultimately, the open-ended nature of the web. The purpose of this work is to establish the notion of Website Archivability (WA) and to introduce the Credible Live Evaluation of Archive Readiness (CLEAR) method to measure WA for any website. Website Archivability captures the core aspects of a website crucial in diagnosing whether it has the potential to be archived with completeness and accuracy. An appreciation of the archivability of a website should provide archivists with a valuable tool when assessing the possibilities of archiving material, and should influence web design professionals to consider the implications of their design decisions on the likelihood that a site could be archived. A prototype application, archiveready.com, has been established to demonstrate the viability of the proposed method for assessing Website Archivability.

Diverse approaches to blog preservation: a comparative study

Richard M. Davis, Edward Pinsent, Silvia Arango-Docio

This poster presents highlights of a comparative study of three distinct approaches to preserving the content of blogs, considering the relative benefits of each approach in meeting the requirements for blog preservation in different contexts. Assessment criteria are drawn from key publications and frameworks on digital preservation, as well as practical considerations derived from the authors’ experience as users and designers of digital archiving tools and systems.

BlogForever at the Long Night of Science 2013

June 16, 2013 in Blog, Events

On Saturday, June 9, BlogForever once again kept its appointment with science, participating for the second year in the Long Night of Science in Berlin.

The Long Night of Science is an annual event in Berlin and other German cities where large scientific institutions demonstrate their research topics and accomplishments to the public.

TUB was there to present the project to visitors and to answer questions about the objectives and importance of blog preservation. Students, as well as other non-academic visitors, were particularly interested in what BlogForever is about.

«IN/SIDE/OUT» BlogForever meets bloggers from all over the world at re:publica 2013

May 13, 2013 in Blog, Events

Where can one personally meet bloggers from 50 countries and discuss the urgent issues of digital society in Europe? Right in the heart of Berlin, Germany.

This year, BlogForever once again presented our consortium’s efforts at re:publica, Germany’s biggest conference for social media issues and innovation (http://www.re-publica.de/). The conference took place for the seventh year in Berlin, with about 5,000 visitors. From the 6th to the 8th of May, 450 speakers presented workshops, discussions and talks about the latest developments in social media all over the world. In such an environment of active bloggers and fresh ideas, it was a great pleasure to be included and to have the chance to discuss our project with the people whose content we hope to preserve.

It was both interesting and promising to see that the content of those discussions had changed from the previous year at re:publica. Many of our booth visitors were already informed and convinced that weblogs should be acknowledged as an important piece of our cultural heritage. Additionally, they were supportive of more robust preservation, one that can better accommodate future research and improve accessibility for the public. The current situations in Syria and in crisis regions in North African countries were noted in particular as underlining the necessity and importance of independent blogging and of access to such information for journalists, researchers and the general public.

While bloggers wanted to learn about ways to preserve their own blogs more completely and securely, they also asked for ways to disseminate their blogs more efficiently. Some individuals indicated interest in a large-scale solution for European weblogs, similar to the Internet Archive (http://archive.org) in the United States. BlogForever also met with researchers at several European universities who represented communities of expert bloggers focusing on specific thematic areas, such as science or technology. Several of these individuals expressed the need for a blog preservation solution that could strengthen research networks.

We have come a long way in spreading the message about the importance of digital preservation, in particular of weblogs. The public is now ready to dream about the ways in which preservation can be valuable in the present as well as the future.

5th BlogForever Consortium Meeting & Workshop

February 26, 2013 in Blog

The fifth BlogForever Consortium Meeting took place on 11-12 February 2013 in Istanbul, Turkey.

In addition to the meeting, technical partners participated in a full-day workshop focused on the development of the BlogForever platform.

 

by Hendrik

Visit BlogForever at the CeBIT exhibition

February 4, 2013 in Blog, Events

The BlogForever project will be present at the CeBIT exhibition in Hannover, Germany, from 5 to 9 March 2013.

CeBIT (http://www.cebit.de/en/about-the-trade-show/facts-figures/about-cebit-2013) is the world’s largest trade fair showcasing digital IT and telecommunications solutions for home and work environments. The key target groups are users from industry, the wholesale/retail sector, skilled trades, banks, the services sector, government agencies, science and all users passionate about technology. CeBIT offers an international platform for comparing notes on current industry trends, networking, and product presentations.

The BlogForever project will participate as part of the TU Berlin pavilion. You can find us in Hall 9, Stand C20.

BlogForever presented in the event “AUTH at NOESIS”

January 20, 2013 in Blog

BlogForever was presented at the event “AUTH at NOESIS” (Greek site only). The event was organized by the Research Committee and the Employment and Career Service of the Aristotle University of Thessaloniki from 18 to 20 January 2013 on the premises of the NOESIS Science Center and Technology Museum in Thessaloniki. The aim of the event was to showcase the research activities of the Aristotle University to the public.

2nd BlogForever review meeting

November 9, 2012 in Blog

The 2nd BlogForever review meeting was held in Berlin on the 6th of November 2012, hosted excellently by mokono (Populis).

The coordinator (AUTH) presented the status and overall achievements of the project, which are summarized in the project statement. The review continued with partners from UW and TUB presenting the results of the project's study of Weblog Structure and Semantics (WP2). UG then analysed the development of the Preservation Strategy for blogs (WP3) and the advances this task brings to the state of the art. AUTH and mokono (Populis) also presented initial thoughts on the Interoperability Prospects and on the Digital Rights Management Policy development, respectively. The review meeting continued with a combined presentation of the BlogForever software infrastructure (WP4) and the BlogForever case studies (WP5). CERN presented the status of the development, while UL presented the plan for the implementation and validation of the case studies. CERN demonstrated the BlogForever repository, while Cyberwatcher presented the interface and functionality of the BlogForever spider. Finally, Tero presented the dissemination activities of the last six months of the project, followed by the plan for exploiting the project results: the results of the market analysis study were presented together with the initial plan for developing the BlogForever business model.

The meeting ended with feedback from the two reviewers and the project officer to the consortium. The reviewers congratulated the BlogForever team, highlighting the exemplary project management and the conceptual work that furthers the state of the art, and expressed their satisfaction with the technical work and the project overall.

4th BlogForever Consortium Meeting

September 14, 2012 in Blog

The fourth BlogForever Consortium Meeting took place on 10-11 September 2012 in Oslo, Norway. All project partners came together to discuss current progress and plan our next steps. The main topics of the meeting were:

Preservation in BlogForever: an alternative view

July 23, 2012 in Blog

I’d like to propose an alternative digital preservation view for the BF partners to consider.

The preservation problem is undoubtedly going to look complicated if we concentrate on the live blogosphere. It’s an environment that is full of complex behaviours and mixed content. Capturing it and replaying it presents many challenges.

But what type of content is going into the BF repository? Not the live blogosphere. What’s going in is material generated by the spider: it’s no longer the live web. It’s structured content, pre-processed, and parsed, fit to be read by the databases that form the heart of the BF system. If you like, the spider creates a “rendition” of the live web, recast into the form of a structured XML file.
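To make the idea of a “rendition” concrete, here is a minimal sketch in Python of recasting one crawled post as structured XML. It is an illustration only: the element names (post, title, author, published, content) and the `render_post` helper are hypothetical and are not taken from the BlogForever spider or its data model.

```python
import xml.etree.ElementTree as ET


def render_post(post: dict) -> str:
    """Recast one crawled post (a plain dict) as a structured XML string.

    The element names are hypothetical, chosen only for this illustration.
    """
    root = ET.Element("post", attrib={"id": post["id"]})
    ET.SubElement(root, "title").text = post["title"]
    ET.SubElement(root, "author").text = post["author"]
    ET.SubElement(root, "published").text = post["published"]
    # Markup inside the body is escaped, so the rendition stays well-formed XML.
    ET.SubElement(root, "content").text = post["content"]
    return ET.tostring(root, encoding="unicode")


# A made-up post standing in for what a spider might extract from a live page.
sample = {
    "id": "42",
    "title": "Hello",
    "author": "Example Author",
    "published": "2012-07-23T09:00:00",
    "content": "<p>First paragraph</p>",
}
xml_text = render_post(sample)
```

The point of the sketch is that the XML string, not the live page it came from, is what would enter the repository and become the object of preservation.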

What I propose is that these renditions of blogs should become the target of preservation. This way, we would potentially have a much more manageable preservation task ahead of us, with a limited range of content and behaviours to preserve and reproduce.

If these blog renditions are preservable, then the preservation performance we would like to replicate is the behaviour of the Invenio database, and not live web behaviour. All the preservation strategy needs to do is to guarantee that our normalised objects, and the database itself, conform to the performance model.

When I say “normalised”, I mean the crawled blogs that will be recast in XML. As I’ve suggested previously, XML is already known to be a robust preservation format. We anticipate that all the non-XML content is going to be images, stylesheets, multimedia, and attachments. Preservation strategies for this type of content are already well understood in the digital preservation world, and we can adapt them.

There is already a strand of the project that is concerned with migration of the database, to ensure future access and replay on applications and platforms of the future. This in itself could feasibly form the basis of the long-term preservation strategy.

The preservation promise in our case should not be to recreate the live web, but rather to recreate the contents of the BF repository and to replicate the behaviour of the BF database. After all, that is the real value of what the project is offering: searchability, retrievability, and creating structure (parsed XML files) where there is little or no structure (the live blogosphere).

Likewise it’s important that the original order and arrangement of the blogs be supported. I would anticipate that this will be one of the possible views of the harvested content. If it’s possible for an Invenio database query to “rebuild” a blog in its original order, that would be a test of whether preservation has succeeded.
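The “rebuild” test could look roughly like the sketch below. It uses an in-memory SQLite database purely as a stand-in for the Invenio database, and the `posts` table, its columns and the `rebuild_blog` helper are all assumptions made for the illustration, not part of Invenio or the BF data model.

```python
import sqlite3


def rebuild_blog(conn: sqlite3.Connection, blog_url: str) -> list:
    """Return the post titles of one blog in publication order."""
    rows = conn.execute(
        "SELECT title FROM posts WHERE blog_url = ? ORDER BY published, id",
        (blog_url,),
    ).fetchall()
    return [title for (title,) in rows]


# SQLite stands in for the repository database (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, blog_url TEXT, title TEXT, published TEXT)"
)
# Posts are stored in harvest order, which need not match publication order.
conn.executemany(
    "INSERT INTO posts (blog_url, title, published) VALUES (?, ?, ?)",
    [
        ("http://example.org/blog", "Third", "2012-07-23T09:00:00"),
        ("http://example.org/blog", "First", "2012-07-01T09:00:00"),
        ("http://example.org/other", "Elsewhere", "2012-07-10T09:00:00"),
        ("http://example.org/blog", "Second", "2012-07-15T09:00:00"),
    ],
)

# The order the posts had on the live blog, recorded at harvest time.
original_order = ["First", "Second", "Third"]
rebuilt = rebuild_blog(conn, "http://example.org/blog")
preserved = rebuilt == original_order
```

Here the check passes when the query alone can reproduce the original sequence, which is the kind of evidence the paragraph above suggests as a test of preservation.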

As to PREMIS metadata: in this alternative scenario the live data in the database and the preserved data are one and the same thing. In theory, we should be able to manipulate the database to devise a PREMIS “view” of the data, with any additional fields needed to record our preservation actions on the files.
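As a rough sketch of what such a “view” might mean in practice, the Python function below maps one hypothetical database row, describing a preservation action, onto a dictionary whose keys loosely follow PREMIS event semantic units. The row fields (action_id, action, performed_at, record_id) are invented for the example, and the output is not validated against the PREMIS schema.

```python
def premis_event_view(row: dict) -> dict:
    """Present one preservation-action row as a PREMIS-style event.

    Keys loosely follow PREMIS event semantic units; the input fields
    are hypothetical and nothing here is checked against the schema.
    """
    return {
        "eventIdentifier": {
            "eventIdentifierType": "local",
            "eventIdentifierValue": str(row["action_id"]),
        },
        "eventType": row["action"],
        "eventDateTime": row["performed_at"],
        "linkingObjectIdentifier": {
            "linkingObjectIdentifierType": "local",
            "linkingObjectIdentifierValue": row["record_id"],
        },
    }


# A made-up row recording that one stored file was migrated.
row = {
    "action_id": 7,
    "action": "migration",
    "performed_at": "2012-07-23T10:15:00",
    "record_id": "post-42",
}
view = premis_event_view(row)
```

Nothing is copied or exported in this sketch: the view is computed from the same data the database already holds, which is the sense in which live and preserved data would be one and the same.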

In short, I wonder whether the project is really doing “web archiving” at all? And does it matter if we aren’t?

In summary I would suggest:

  • We consider the target of preservation to be crawled blogs which have been transformed into parsed XML (I anticipate that this would not invalidate the data model).
  • We regard the spidering action as a form of “normalisation”, which is an important step in transforming unmanaged blog content into a preservable package.
  • Following the performance model proposed by the National Archives of Australia, we declare that the performance we wish to replicate is that of normalised files in the Invenio database, rather than the behaviours of individual blogs. This approach potentially makes it simpler to define “significant properties”: instead of trying to define the significant properties of millions of blogs and their objects, we could concentrate on the significant properties of our normalised files, and of Invenio.