Call for Papers – World Wide Web Journal (Springer)

June 17, 2012 in Blog

Special Issue on Social Media Preservation and Applications (pdf)

The rise of the blogosphere and the subsequent explosive growth of social media applications and communities have greatly affected our culture and communications. The research community is aware of the need to preserve social media records for future generations. Thus, social media archiving and long-term digital preservation have become highly relevant. The key challenges of social media preservation span many fields, including Web archiving, the semantic Web, digital preservation, social computing and open access.

Scope

The primary goal of this special issue is to exchange the latest fundamental advances in the state of the art and practice of social media preservation and related areas. We are interested not only in papers with algorithmic innovations, but also in leading work on applications, experimental implementations and evaluations. Areas of interest include, but are not limited to:

  • Social media modeling & analysis
  • Current state of social media and trends
  • Social Web archiving
  • Web digital preservation
  • Social network analysis
  • Web 2.0 and semantic Web
  • Cultural patterns and representations
  • Spam detection
  • Social media content classification
  • Interoperability
  • Resource adaptation, allocation and delivery
  • Blogs, micro-blogs, internet forums
  • Topic detection
  • Case studies
  • Social media preservation case studies
  • Archiving applications and systems
  • Blog preservation technologies within applications and services
  • Content refreshing, migration, replication, emulation
  • Information retrieval
  • Metadata & metadata schemas
  • Preservation policies
  • Web digital preservation strategies
  • Digital rights management

Manuscript submission

Authors are encouraged to submit high-quality, original work that has neither appeared in, nor is under consideration by, other journals. Springer offers authors, editors and reviewers of World Wide Web Journal a Web-enabled online manuscript submission and review system. Manuscripts should be submitted to: http://WWWJ.edmgr.com under the article type ‘Social Media Preservation’. All submissions will be reviewed based on technical merit and relevance.

Schedule

Deadline for paper submission: November 1, 2012
First round notification: February 1, 2013
Revised version due: April 1, 2013

Guest Editors

Yannis Manolopoulos
Aristotle University of Thessaloniki, Greece
manolopo at csd.auth.gr

Alexandra Cristea
University of Warwick, UK
A.I.Cristea at warwick.ac.uk

Dimitrios Katsaros
University of Thessaly, Greece
dkatsar at inf.uth.gr

by Hendrik

BlogForever at the Long Night of Science

June 4, 2012 in Blog, Events

Last Saturday, BlogForever was present at the Long Night of Science in Berlin and explained the necessity and challenges of blog preservation, as well as the objectives and status of the project, to the general public. Many people were interested in the project and had not realized before that information can get lost on the Internet.

We shared our booth with the EU project Cyberemotions that examines and visualizes sentiments in e-communities.

“Trends in Blog Preservation” keynote speech at ICEIS2012

May 9, 2012 in Blog

The paper “Trends in Blog Preservation” will be presented in a keynote speech at the 14th International Conference on Enterprise Information Systems, to be held on 28 June-1 July 2012 in Craiova, Romania.

Authors: Vangelis Banos, Nikos Baltas, Yannis Manolopoulos

Abstract: Blogging is yet another popular and prominent application in the era of Web 2.0. According to recent measurements often considered as conservative, as of now worldwide there are more than 152 million blogs with content spanning over every aspect of life and science, necessitating long term blog preservation and knowledge management. In this talk, we will present a range of issues that arise when facing the task of blog preservation. We argue that current web archiving solutions are not able to capture the dynamic and continuously evolving nature of blogs, their network and social structure as well as the exchange of concepts and ideas that they foster. Furthermore, we provide directions and objectives that could be reached to realize robust digital preservation, management and dissemination facilities for blogs. Finally, we will introduce the BlogForever EC funded project, its main motivation and findings towards widening the scope of blog preservation.

BlogForever at re:publica 2012

May 4, 2012 in Blog, Events

BlogForever hosted a small session yesterday afternoon at re:publica, Germany’s largest blogging and social media conference with more than 4,000 visitors from over 30 countries (http://re-publica.de/12/).  Our session was related to the urgency of blog preservation and other relevant themes in digital archiving. As we get closer and closer to our final software, we keep in mind the purpose of taking on such a task: to prevent the loss of social, cultural and historical artifacts contained within the Blogosphere and to preserve them for future generations.

We asked our visitors: what if Anne Frank’s diary had been a blog? What if Leonardo Da Vinci had kept his notes and sketches in blog format? What if Martin Luther had presented his critiques of the church on his blog instead of nailing them to the church doors? It is difficult to know what should be preserved, or what will be important in the future, but the time to think of solutions and implement them is now if we want to avoid losses. We have enough examples of how failure to act has cost us valuable insights into our past and even our current understanding of the world.

During our session, we heard several stories about “lost blogs”, blogs that are no longer accessible because their authors have passed away or have moved on from the blogging community. Some blogs were removed because the authors no longer had the rights to the content (as with some government or business blogs upon staff changes). We also shared some examples of blogs that simply disappeared under somewhat dubious circumstances involving third parties (see one example here: http://techcrunch.com/2011/08/23/fukushima-robot-operators-blog-deleted-internet-steps-in/). Our discussion of digital loss was not limited to blogs, but also included some of the early online communities that wound up scattered into the digital wind due to reductions in software support or format changes (see the closing of GeoCities in 2009: http://techcrunch.com/2009/04/23/yahoo-quietly-pulls-the-plug-on-geocities/). All of these stories remind us that our digital heritage, on both a personal and a cultural level, is every bit as valuable as the many collections of letters, diaries, first editions, newspapers and other artifacts we preserve in physical form. Blogs, as one of the more complex and multifaceted forms of digital media, require quite a bit more attention to ensure their authentic and robust preservation.

To preserve blogs, one needs a software solution that will be able to respond quickly to dynamic shifts in world politics, science and culture – a solution that will allow us to develop blog archives that are valuable now. The solution must also be able to manage the development of technology and software redundancies – a solution that will ensure the safe preservation of digital artifacts well into the future. Curators, as well, will need certain tools and policy guidelines to help them manage the specific challenges associated with blog preservation.

We presented BlogForever as the software solution that will meet all of those criteria and do so in a way that is efficient, effective and user friendly.

Those individuals that we spoke with are excited about the project and see its value, not only for digital libraries and academic institutions but also for the general public.

by Hendrik

The Blogosphere as Œuvre – Paper accepted at ECIS 2012

April 10, 2012 in Blog, Publications

A paper with the title “The Blogosphere as Œuvre: Individual and collective influences on bloggers” has been accepted at the European Conference on Information Systems, to be held on 11-13 June 2012 in Barcelona, Spain.

Authors: Hendrik Kalb, Matthias Trier

Abstract: Blogging has often been described as writing an online diary but, nowadays, it is more diverse and a considerable amount of blogs forms a common interconnected resource – the blogosphere – with comprehensive societal impact. While various studies have inquired social-psychological influences on the intention to contribute to an individual blog, the perceptions related to creating such a common valuable good have not yet been focussed. Therefore, we introduce a new construct – the œuvre of blogging – to better account for the notion of the blogosphere as a collective outcome. Furthermore, we propose a research model to inquire the influence of individual and collective beliefs on the œuvre in comparison to short-term blogging activity. We conducted an online survey with 509 international distributed bloggers to test our model. The results of our study provide support for the importance of an œuvre construct to explain influences on bloggers and blogging.

Keywords: Blog, Blogosphere, Collective benefit, PLS, Knowledge Sharing

BlogForever and migration

April 2, 2012 in Blog

Recently I have been putting together my report on the extent to which the BlogForever platform operates within the framework of the OAIS model. Inevitably, I have thought a bit about migration as one of the potential approaches we could use to preserve blog content.

Migration is the process whereby we preserve data by shifting it from one file format to another. We usually do this when the “old” format is in danger of obsolescence for a variety of reasons, while the “target” format is something we think we can depend on for a longer period of time. This strategy works well for relatively static document-like content, such as format-shifting a text file to PDF.

The problem with blogs, and indeed all web content, arises when we start thinking of the content exclusively in terms of file formats. The content of a blog could be said to reside in multiple formats, not just one; and even if we format-shift all the files we gather, does that really constitute preservation?

With BlogForever, we’re going for an approach to capture and ingest which seems to have two discrete strands to it.

(1) We will be gathering and keeping the content in its “original” native formats, such as HTML, image files, CSS etc. At the time of writing, the plan is that we will have a repository record for each ingested blog post, and all its associated files (original images, CSS, PDF, etc.) will be connected with this record. These separate files will be preserved and presumably migrated over time, if some of these native formats acquire “at risk” status.

(2) We are also going to create an XML file (complete with all detected Blog Data Model elements) from each blog post we are aggregating. What interests me here is that in this strand, an archived blog is being captured and submitted as a stream of data, rather than a file format. It so happens the format for storing that data-stream is going to be XML. The CyberWatcher spider is capable of harvesting blog content by harnessing the RSS feed from a blog, and by using blog-specific monitoring technologies like blog pings; and it also performs a complex parsing of the data it finds. The end result is a large chunk of “live” blog content, stored in an XML file.
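To make the idea of "a stream of data stored as XML" concrete, here is a minimal sketch of turning the items of an RSS 2.0 feed into one XML document per post. It is not the CyberWatcher spider or the Blog Data Model: the sample feed, the function name `posts_to_xml` and the output element names (`blog_post`, `published`, and so on) are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real harvester would fetch this over HTTP.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://blog.example.org/</link>
    <item>
      <title>First post</title>
      <link>http://blog.example.org/first</link>
      <pubDate>Mon, 02 Apr 2012 10:00:00 GMT</pubDate>
      <description>Hello world</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.org/second</link>
      <pubDate>Tue, 03 Apr 2012 09:30:00 GMT</pubDate>
      <description>More text</description>
    </item>
  </channel>
</rss>"""


def posts_to_xml(rss_text):
    """Return one XML document (as a string) per <item> in an RSS 2.0 feed."""
    channel = ET.fromstring(rss_text).find("channel")
    blog_title = channel.findtext("title", default="")
    documents = []
    for item in channel.findall("item"):
        # Element names below are illustrative, not the Blog Data Model.
        post = ET.Element("blog_post")
        ET.SubElement(post, "blog_title").text = blog_title
        ET.SubElement(post, "title").text = item.findtext("title", default="")
        ET.SubElement(post, "url").text = item.findtext("link", default="")
        ET.SubElement(post, "published").text = item.findtext("pubDate", default="")
        ET.SubElement(post, "content").text = item.findtext("description", default="")
        documents.append(ET.tostring(post, encoding="unicode"))
    return documents


if __name__ == "__main__":
    for document in posts_to_xml(SAMPLE_RSS):
        print(document)
```

Even in this toy form, the harvest step already reshapes the feed into a new structure, which is the sense in which capture and transformation happen together.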

Two things are of interest here. The first is that the spider is already performing a form of migration, or transformation, simply by the action of harvesting the blog. The second is that it’s migrating to XML, which is something we already know to be a very robust and versatile preservation format, more so even than a non-proprietary tabular format such as CSV. The added value of XML is the possibility of easily storing more complex data structures and multiple values.

If that assumption about the spider is correct, perhaps we need to start thinking of it as a transformation / validation tool. The more familiar digital preservation workflow assumes that migration will probably happen some time after the content has been ingested; what if migration is happening before ingest? We’re already actively considering the use of the preservation metadata standard PREMIS to document our preservation actions. Maybe the first place to use PREMIS is on the spider itself, picking up some technical metadata and logs on the way the spider is performing. Indeed, some of the D4.1 user requirements refer to this: DR6 ‘Metadata for captured Contents’ and DR17 ‘Metadata for Blogs’.
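As a rough illustration of what documenting the spider's actions might look like, the sketch below records one harvest as a small event record. The field names are loosely modeled on PREMIS event semantic units (identifier, type, date/time, outcome, linked object), but this is not a valid PREMIS serialization, and the class `HarvestEvent`, the function `record_harvest` and the values used are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class HarvestEvent:
    """One spider action, with fields loosely inspired by PREMIS events."""
    event_identifier: str
    event_type: str
    event_date_time: str
    event_outcome: str
    linking_object_identifier: str


def record_harvest(event_id, post_url, succeeded, when=None):
    """Build an event record for a single harvested blog post."""
    when = when or datetime.now(timezone.utc)
    return HarvestEvent(
        event_identifier=event_id,
        event_type="capture",  # illustrative choice of event type
        event_date_time=when.isoformat(),
        event_outcome="success" if succeeded else "failure",
        linking_object_identifier=post_url,
    )


if __name__ == "__main__":
    event = record_harvest("evt-0001", "http://blog.example.org/first", True)
    # Serialize the record so it can travel with the submitted XML.
    print(json.dumps(asdict(event), indent=2))
```

A record like this could be written at capture time and submitted alongside the harvested content, so that the transformation performed before ingest is itself documented.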

We anticipate the submitted XML is going to be further transformed in the Invenio repository via its databases, and various metadata additions and modifications will transform it from a Submission Information Package into an Archival Information Package and a Dissemination Information Package. As far as I can see though, the XML format remains in use throughout these processes. It feels as though the BlogForever workflow could have a credible preservation process hard-wired into it, and that (apart from making Archival Information Packages, backing-up and keeping the databases free from corruption) very little is needed from us in the way of migration interventions.

It also feels as though it would be much easier to test this methodology; the focus of the testing becomes the spider>XML>repository>database workflow, rather than a question of juggling multiple strategies and testing them against file formats and/or significant properties. Of course, migration would still need to apply to the original native file formats we have captured, and this would probably need to be part of our preservation strategy. But it’s the XML renditions which most users of BlogForever will be experiencing.

Blogs and the evolutionary prospects of ERP 10 years from now

March 22, 2012 in Blog

by Prof. David Olson

ERP has undergone a rapid and dramatic evolution. Since SAP started working on their accounting product in the early 1970s, five major and many smaller vendors dominated large organizational computing in the 1990s. Most large organizations took advantage of the integrative opportunities of this form of software to rely on COTS products to downsize their very large information technology staffs. Y2K created a boom period for both ERP vendors and for IT individuals, but after the world somehow survived COBOL’s minor limitations, the demand for BOPSE ERP systems dropped, and IT individuals found a very depressing job market. The first decade of the 21st Century has seen highly cyclical IT employment, demonstrating the need to be mobile. The five BOPSE ERP vendors have collapsed to two (although both SAP and Oracle are very large and prosperous). There has been stronger emphasis on industry systems, support to small businesses, and country-specific software products.

Technologically, there have been many interesting developments in software-as-a-service and open source software projects. SourceForge.net contains hundreds of thousands of open source software projects, including about 1,000 classified as ERP. These projects tend to be utilities in nature, similar to what software-as-a-service offers for a fee over the Internet. Countries such as Brazil are publicly supporting development of open source ideas. There are a growing number of ERP vendors offering their software for free (installation help and training available at a fee), such as Compiere and Nexedi. There even are a few efforts to develop a completely open source ERP product.

Software development is clearly evolving. My perception is that globally, there is a strong preference for the open ideas of Richard Stallman over the US-dominated proprietorial model of Bill Gates and Larry Ellison. The Internet enables enormous potential in linking active minds around the globe, enabling their collaboration in developing new and better things.

Looking more closely at the Blogforever.eu project and ALTEC’s interest in commercial uptake and adoption, according to which the blogosphere could be a means to develop organizational computing, taking advantage of database technologies, my first reaction is a disconnect in my own mind between ERP and database issues. However, I think that there is potential there: last fall I taught a database class, and had students report on databases of their selection. There were a number of very interesting reports of databases for various Web businesses, such as Facebook, Craig’s List, Amazon, etc. The database systems to support things in the blogosphere certainly exist. Objects such as SaaS, open source software, and the apps that the latest generation love can be assembled to accomplish things that people and organizations need. You could conceivably assemble the software you need from a free and open Internet platform.

This is going to take a long time to evolve. But five years ago I thought an open source ERP software was impossible. It demonstrably is possible. There is a danger in forecasting, as anyone who reads the forecasts of the 19th Century, or Aldous Huxley or George Orwell knows. It is interesting to review any forecast over 20 years old. They inevitably miss many important factors. But I do feel confident that the means for collaborative software product development can support such a free-form software environment.

A key issue is how the benefits will be shared, and how contributors will be rewarded. I think that recent times have been dominated by US ideas about such matters. I perceive that is changing radically, and for the better. After all, Bill Gates is looking for ways to redistribute all of his gains, isn’t he?

David L. Olson is the James & H.K. Stuart Professor in MIS and Chancellor’s Professor at the University of Nebraska. He has published research in over 100 refereed journal articles, primarily on the topic of multiple objective decision-making and information technology. He teaches in the management information systems, management science, and operations management areas. He has authored 18 books, including Decision Aids for Selection Problems, Introduction to Information Systems Project Management, Managerial Issues of Enterprise Resource Planning Systems, Supply Chain Risk Management, and Supply Chain Information Technology. Additionally, he has co-authored the books Introduction to Business Data Mining, Enterprise Risk Management, Advanced Data Mining Techniques, New Frontiers in Enterprise Risk Management, Enterprise Information Systems, and Enterprise Risk Management Models. He is associate editor of Service Business and co-editor in chief of International Journal of Services Sciences. He has made over 100 presentations at international and national conferences on research topics. He is a member of the Decision Sciences Institute, the Institute for Operations Research and Management Sciences, and the Multiple Criteria Decision Making Society. He was a Lowry Mays endowed Professor at Texas A&M University from 1999 to 2001. He received the Raymond E. Miles Distinguished Scholar award for 2002, and was a James C. and Rhonda Seacrest Fellow from 2005 to 2006. He was named Best Enterprise Information Systems Educator by IFIP in 2006. He is a Fellow of the Decision Sciences Institute.

Technological foundations of the current Blogosphere paper accepted at WIMS’12

February 15, 2012 in Blog, News, Publications

A paper named “Technological foundations of the current Blogosphere” has been accepted at the International Conference on Web Intelligence, Mining and Semantics (WIMS’12), to be held on 13-15 June 2012 in Craiova, Romania.

Authors: Vangelis Banos, Karen Stepanyan, Yannis Manolopoulos, Mike Joy and Alexandra Cristea

Abstract: In this paper, we review the technological foundations of the current Blogosphere. The review is primarily based on a large-scale evaluation of active blogs. The extensive list of examined technologies enables commenting on a range of widely adopted standards and potential trends in the Blogosphere. The evaluation has been conducted in the following stages:

  1. Retrieving and parsing a large set of blogs
  2. Identifying and quantifying the use of technologies such as web standards, adopted services, file formats and platforms.
  3. Analysing collected data and reporting the results
  4. Comparing the results with existing findings from the generic Web to identify similarities and differences in the Blogosphere.

The presented work was performed as part of BlogForever (ICT No. 269963), an EC funded research project aiming to aggregate, preserve, manage and disseminate blogs. The results of this study are relevant within the context of weblog preservation and weblog data extraction.
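The first two stages listed above (parsing pages and quantifying the technologies they use) can be sketched roughly as follows. This is not the method used in the paper: it only looks at two signals chosen for illustration, the `generator` meta tag and feed autodiscovery links, and the names `TechnologyProbe` and `survey` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TechnologyProbe(HTMLParser):
    """Collect the generator meta tag and advertised feed types of one page."""

    def __init__(self):
        super().__init__()
        self.generator = None
        self.feed_types = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "generator":
            self.generator = attrs.get("content")
        elif tag == "link" and (attrs.get("rel") or "").lower() == "alternate":
            feed_type = attrs.get("type") or ""
            if feed_type in ("application/rss+xml", "application/atom+xml"):
                self.feed_types.append(feed_type)


def survey(pages):
    """Count generators and feed formats across an iterable of HTML strings."""
    generators = Counter()
    feeds = Counter()
    for html in pages:
        probe = TechnologyProbe()
        probe.feed(html)
        generators[probe.generator or "unknown"] += 1
        # Count each feed format at most once per page.
        feeds.update(set(probe.feed_types))
    return generators, feeds


if __name__ == "__main__":
    sample = [
        '<html><head><meta name="generator" content="WordPress 3.3">'
        '<link rel="alternate" type="application/rss+xml" href="/feed">'
        "</head></html>",
    ]
    print(survey(sample))
```

A real evaluation would cover many more signals (file formats, embedded services, markup validity) and would need to handle malformed pages, which this sketch does not attempt.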

3rd BlogForever Consortium Meeting

January 17, 2012 in Blog

The third BlogForever Consortium Meeting will take place on 22-23 February 2012 in Berlin. All project partners will come together to discuss current progress and plan our next steps.

The main meeting topics will be:

Herbert van de Sompel will be our guest for this meeting. Herbert is the team leader of the Prototyping Team at the Research Library of the Los Alamos National Laboratory. Currently, he works with his team on the Open Annotation and Memento (time travel for the Web) projects.

 

Test the blog spider prototype

December 13, 2011 in Blog, News

Finally, the first software deliverable of the BlogForever project is available: the prototype of a blogosphere spider.

The spider enables crawling and monitoring lists of identified blogs as well as new, unknown blogs. Any new blog posts or comments from each blog will be added to the feed through the spider.

The spider can be downloaded and run from a single server, and managed through a web portal interface, as seen in the figure below.

Figure 1 – Spider portal: details linked from any of the indexed sources can display the crawled XML and link to the actual HTML.

Although this is only the first prototype of the research project, we have tested it by crawling 36,000 distinct blogs and extracting approximately 1 GB of blog data.

The prototype spider can be downloaded and run from: http://bf2.csd.auth.gr/BFCrawler.rar.

Minimum server requirements to run the crawler:

  1. Operating System: Windows Server 2008 (64-bit)
  2. CPU: 2 Xeon CPUs, 2.5 GHz
  3. RAM: 4 GB
  4. Hard Disk: 20 GB (SAS) plus 60 GB (SATA)

Anyone interested in blog crawling is invited to test the spider and contact us to discuss further requirements and usage.

Figure 2 – The downloader file: contains download and installation instructions for the BlogForever prototype spider.