by admin

5th BlogForever Consortium Meeting & Workshop

February 26, 2013 in Blog

The fifth BlogForever Consortium Meeting took place on 11-12 February 2013 in Istanbul, Turkey.

In addition to the meeting, the technical partners took part in a full-day workshop focused on the development of the BlogForever platform.


by Hendrik

Visit BlogForever at the CeBIT exhibition

February 4, 2013 in Blog, Events

The BlogForever project will be present at the CeBIT exhibition in Hannover, Germany, from 5 to 9 March 2013.

CeBIT is the world’s largest trade fair showcasing digital IT and telecommunications solutions for home and work environments. The key target groups are users from industry, the wholesale/retail sector, skilled trades, banks, the services sector, government agencies, science and all users passionate about technology. CeBIT offers an international platform for comparing notes on current industry trends, networking, and product presentations.

The BlogForever project will participate as part of the TU Berlin pavilion. You can find us in Hall 9, Stand C20.

by admin

BlogForever presented in the event “AUTH at NOESIS”

January 20, 2013 in Blog

BlogForever was presented at the event “AUTH at NOESIS” (Greek site only). The event was organized by the Research Committee and the Employment and Career Service of the Aristotle University of Thessaloniki from 18 to 20 January 2013 on the premises of the NOESIS Science Center and Technology Museum in Thessaloniki. The aim of the event was to showcase the research activities of the Aristotle University to the public.

2nd BlogForever review meeting

November 9, 2012 in Blog

The 2nd BlogForever review meeting was held in Berlin on the 6th of November 2012, hosted excellently by mokono (Populis).

The coordinator (AUTH) presented the status and overall achievements of the project, which are summarized in the project statement. The review continued with partners from UW and TUB presenting the results of the project's study of Weblog Structure and Semantics (WP2). UG followed with an analysis of the development of the Preservation Strategy for blogs (WP3) and the advances this task brings to the state of the art. AUTH and mokono (Populis) then presented initial thoughts on the Interoperability Prospects and on the development of the Digital Rights Management Policy, respectively. The review meeting continued with a combined presentation of the BlogForever software infrastructure (WP4) and the BlogForever case studies (WP5). CERN presented the status of the development, while UL presented the plan for the implementation and validation of the case studies. CERN demonstrated the BlogForever repository, and Cyberwatcher presented the interface and functionality of the BlogForever spider. Finally, Tero presented the dissemination activities of the last six months of the project, followed by the plan for the exploitation of the project results; the results of the market analysis study were presented together with the initial plan for the development of the BlogForever business model.

The meeting ended with feedback to the consortium from the two reviewers and the project officer. The reviewers congratulated the BlogForever team, praising the exemplary project management and the conceptual work that furthers the state of the art, and expressed their satisfaction with the technical work and the project overall.

by admin

4th BlogForever Consortium Meeting

September 14, 2012 in Blog

The fourth BlogForever Consortium Meeting took place on 10-11 September 2012 in Oslo, Norway. All project partners came together to discuss current progress and plan our next steps. The main topics of the meeting were:

Preservation in BlogForever: an alternative view

July 23, 2012 in Blog

I’d like to propose an alternative digital preservation view for the BF partners to consider.

The preservation problem is undoubtedly going to look complicated if we concentrate on the live blogosphere. It’s an environment that is full of complex behaviours and mixed content. Capturing it and replaying it presents many challenges.

But what type of content is going into the BF repository? Not the live blogosphere. What’s going in is material generated by the spider: it’s no longer the live web. It’s structured content, pre-processed, and parsed, fit to be read by the databases that form the heart of the BF system. If you like, the spider creates a “rendition” of the live web, recast into the form of a structured XML file.
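To make the idea of a “rendition” concrete, here is a minimal sketch of how one crawled post might be recast as a structured XML record. The element names, the `render_post` helper and the sample data are all hypothetical illustrations; they are not the BF data model or the actual output of the spider.

```python
import xml.etree.ElementTree as ET


def render_post(post: dict) -> str:
    """Recast one crawled post (a plain dict) as a small XML record.

    The field names below are assumptions chosen for illustration only.
    """
    root = ET.Element("post", attrib={"id": post["id"]})
    for field in ("title", "author", "published", "content"):
        # Each field becomes a child element holding plain text.
        ET.SubElement(root, field).text = post[field]
    return ET.tostring(root, encoding="unicode")


# Hypothetical crawled data standing in for what a spider might extract.
crawled = {
    "id": "post-1",
    "title": "Hello",
    "author": "admin",
    "published": "2012-07-23",
    "content": "First entry.",
}

xml_rendition = render_post(crawled)
print(xml_rendition)
```

The point of the sketch is only that the record is self-describing and can be parsed back without reference to the live web page it came from.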

What I propose is that these renditions of blogs should become the target of preservation. This way, we would potentially have a much more manageable preservation task ahead of us, with a limited range of content and behaviours to preserve and reproduce.

If these blog renditions are preservable, then the preservation performance we would like to replicate is the behaviour of the Invenio database, and not live web behaviour. All the preservation strategy needs to do is to guarantee that our normalised objects, and the database itself, conform to the performance model.

When I say “normalised”, I mean the crawled blogs that will be recast in XML. As I’ve suggested previously, XML is already known to be a robust preservation format. We anticipate that all the non-XML content is going to be images, stylesheets, multimedia, and attachments. Preservation strategies for this type of content are already well understood in the digital preservation world, and we can adapt them.

There is already a strand of the project that is concerned with migration of the database, to ensure future access and replay on applications and platforms of the future. This in itself could feasibly form the basis of the long-term preservation strategy.

The preservation promise in our case should not be to recreate the live web, but rather to recreate the contents of the BF repository and to replicate the behaviour of the BF database. After all, that is the real value of what the project is offering: searchability, retrievability, and creating structure (parsed XML files) where there is little or no structure (the live blogosphere).

Likewise it’s important that the original order and arrangement of the blogs be supported. I would anticipate that this will be one of the possible views of the harvested content. If it’s possible for an Invenio database query to “rebuild” a blog in its original order, that would be a test of whether preservation has succeeded.
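The “rebuild” test could be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the Invenio database, the `posts` schema and the `rebuild_blog` helper are hypothetical, and “original order” is taken to mean chronological order by publication date.

```python
import sqlite3


def rebuild_blog(conn: sqlite3.Connection, blog_id: str) -> list:
    """Return the post ids of one blog in chronological order."""
    rows = conn.execute(
        "SELECT post_id FROM posts WHERE blog_id = ? "
        "ORDER BY published, post_id",
        (blog_id,),
    )
    return [row[0] for row in rows]


# SQLite is used here purely as a stand-in; this is not the Invenio API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id TEXT, blog_id TEXT, published TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        # Rows are deliberately stored out of order.
        ("p3", "blog-a", "2012-07-23"),
        ("p1", "blog-a", "2012-05-04"),
        ("p2", "blog-a", "2012-06-17"),
        ("x1", "blog-b", "2012-01-01"),
    ],
)

# The order recorded at crawl time (hypothetical sample data).
original_order = ["p1", "p2", "p3"]

rebuilt = rebuild_blog(conn, "blog-a")
preserved = rebuilt == original_order
print(rebuilt, preserved)
```

If the query returns the same sequence that was recorded at harvest time, this particular property has survived; a mismatch would flag the blog for inspection.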

As to PREMIS metadata: in this alternative scenario the live data in the database and the preserved data are one and the same thing. In theory, we should be able to manipulate the database to devise a PREMIS “view” of the data, with any additional fields needed to record our preservation actions on the files.

In short, I wonder whether the project is really doing “web archiving” at all? And does it matter if we aren’t?

In summary I would suggest:

  • We consider the target of preservation to be crawled blogs which have been transformed into parsed XML (I anticipate that this would not invalidate the data model).
  • We regard the spidering action as a form of “normalisation” which is an important step to transforming unmanaged blog content into a preservable package.
  • Following the performance model proposed by the National Archives of Australia, we declare that the performance we wish to replicate is that of normalised files in the Invenio database, rather than the behaviours of individual blogs. This approach potentially makes it simpler to define “significant properties”; instead of trying to define the significant properties of millions of blogs and their objects, we could concentrate on the significant properties of our normalised files, and of Invenio.

by admin

Call for Papers – World Wide Web Journal (Springer)

June 17, 2012 in Blog

Special Issue on Social Media Preservation and Applications (pdf)

The rise of the blogosphere and the subsequent explosive growth of social media applications and communities have greatly affected our culture and communications. The research community is aware of the need to preserve social media records for future generations. Thus, social media archiving and long-term digital preservation have become highly relevant. The key challenges of social media preservation span many fields, including Web archiving, semantic Web, digital preservation, social computing and open access.


The primary goal of this special issue is to exchange the latest fundamental advances in the state of the art and practice of social media preservation and related areas. We are interested not only in papers with algorithmic innovations, but also in leading work on applications, experimental implementations and evaluations. Areas of interest include, but are not limited to:

  • Social media modeling & analysis
  • Current state of social media and trends
  • Social Web archiving
  • Web digital preservation
  • Social network analysis
  • Web 2.0 and semantic Web
  • Cultural patterns and representations
  • Spam detection
  • Social media content classification
  • Interoperability
  • Resource adaptation, allocation and delivery
  • Blogs, micro-blogs, internet forums
  • Topic detection
  • Case studies
  • Social media preservation case studies
  • Archiving applications and systems
  • Blog preservation technologies within applications and services
  • Content refreshing, migration, replication, emulation
  • Information retrieval
  • Metadata & metadata schemas
  • Preservation policies
  • Web digital preservation strategies
  • Digital rights management

Manuscript submission

Authors are encouraged to submit high-quality, original work that has neither appeared in, nor is under consideration by, other journals. Springer offers authors, editors and reviewers of World Wide Web Journal a Web-enabled online manuscript submission and review system. Manuscripts should be submitted to: under the article type ‘Social Media Preservation’. All submissions will be reviewed based on technical merit and relevance.


Deadline for paper submission: November 1, 2012
First round notification: February 1, 2013
Revised version due: April 1, 2013

Guest Editors

Yannis Manolopoulos
Aristotle University of Thessaloniki, Greece
manolopo at

Alexandra Cristea
University of Warwick, UK
A.I.Cristea at

Dimitrios Katsaros
University of Thessaly, Greece
dkatsar at

by Hendrik

BlogForever at the Long Night of Science

June 4, 2012 in Blog, Events

Last Saturday, BlogForever was present at the Long Night of Science in Berlin and explained to the general public the necessity and challenges of blog preservation, as well as the objectives and status of the project. A lot of people were interested in the project and had not realized before that information can get lost on the internet.

We shared our booth with the EU project Cyberemotions that examines and visualizes sentiments in e-communities.

by admin

“Trends in Blog Preservation” keynote speech at ICEIS2012

May 9, 2012 in Blog

The paper “Trends in Blog Preservation” will be presented in a keynote speech at the 14th International Conference on Enterprise Information Systems, to be held from 28 June to 1 July 2012 in Craiova, Romania.

Authors: Vangelis Banos, Nikos Baltas, Yannis Manolopoulos

Abstract: Blogging is yet another popular and prominent application in the era of Web 2.0. According to recent measurements often considered as conservative, as of now worldwide there are more than 152 million blogs with content spanning over every aspect of life and science, necessitating long term blog preservation and knowledge management. In this talk, we will present a range of issues that arise when facing the task of blog preservation. We argue that current web archiving solutions are not able to capture the dynamic and continuously evolving nature of blogs, their network and social structure as well as the exchange of concepts and ideas that they foster. Furthermore, we provide directions and objectives that could be reached to realize robust digital preservation, management and dissemination facilities for blogs. Finally, we will introduce the BlogForever EC funded project, its main motivation and findings towards widening the scope of blog preservation.

BlogForever at re:publica 2012

May 4, 2012 in Blog, Events

BlogForever hosted a small session yesterday afternoon at re:publica, Germany’s largest blogging and social media conference, with more than 4,000 visitors from over 30 countries. Our session was related to the urgency of blog preservation and other relevant themes in digital archiving. As we get closer and closer to our final software, we keep in mind the purpose of taking on such a task: to prevent the loss of social, cultural and historical artifacts contained within the Blogosphere and to preserve them for future generations.

We asked our visitors: what if Anne Frank’s diary had been a blog? What if Leonardo da Vinci had kept his notes and sketches in blog format? What if Martin Luther had presented his critiques of the church on his blog instead of nailing them to the church doors? Even though it is difficult to know what should be preserved, or what will be important in the future, now is the time to think of solutions and implement them – if we want to avoid losses. We have enough examples of how failure to act has cost us valuable insights into our past and even our current understanding of the world.

During our session, we heard several stories about “lost blogs”: blogs that are no longer accessible because their authors have passed away or have moved on from the blogging community. Some blogs were removed because the authors no longer had the rights to the content (as with some government or business blogs upon staff changes). We also shared some examples of blogs that simply disappeared under somewhat dubious circumstances involving third parties. Our discussion of digital loss was not limited to blogs, but also included some of the early online communities that wound up scattered into the digital wind due to reductions in software support or format changes (see the closing of GeoCities in 2009). All of these stories remind us that our digital heritage, on both a personal and a cultural level, is every bit as valuable as the many collections of letters, diaries, first editions, newspapers and other artifacts we preserve in physical form. Blogs, as one of the more complex and multifaceted forms of digital media, require quite a bit more attention to ensure their authentic and robust preservation.

To preserve blogs, one needs a software solution that will be able to respond quickly to dynamic shifts in world politics, science and culture – a solution that will allow us to develop blog archives that are valuable now. The solution must also be able to manage the development of technology and software redundancies – a solution that will ensure the safe preservation of digital artifacts well into the future. Curators, as well, will need certain tools and policy guidelines to help them manage the specific challenges associated with blog preservation.

We presented BlogForever as the software solution that will meet all of those criteria and do so in a way that is efficient, effective and user friendly.

Those individuals that we spoke with are excited about the project and see its value, not only for digital libraries and academic institutions but also for the general public.