Asynchronicities in blog structure
At an atomic level, a “blog” comprises “blog posts”, which are continually added to the blog corpus: that is the dynamic essence of a blog, and distinguishes it from old-fashioned, largely static Websites and hypertexts in which little content changed between major update iterations, which process was probably more akin to “publishing a new edition” in the world of non-digital publications.
The blog also displays, as part of its frame, other graphical and functional elements (sidebars, widgets, “blogrolls”, etc.) which may themselves contain dynamically updated, constantly changing information. These can be added, removed, amended and rearranged at will by the blog author/editor. Blog posts that were “published” in the context of one set of framing elements will persist through subsequent versions of that framework.
The same is true of design (layout, colours, mastheads, etc.). Although a design tends to persist for longer than a widget arrangement, the informal nature of blogs means that it may be easily changed by the blog editor/author, and is thus more volatile than that of a typical “corporate” website. Again, blog posts may persist, unchanged in themselves, through many iterations of the blog site design and layout.
This very simplified visualisation suggests where we might start conceptualising the key elements of a blog. It indicates that they iterate over time but, in the cases of Design, Posts and Widgets (as we’ll call them for brevity), according to independent schedules. While Posts and Comments persist in the online view of a blog, designs and widget arrangements are overwritten.
With my earlier ArchivePress project we deliberately overlooked preservation of the blog’s framing elements, and (given the much smaller scope of that project) established an acceptable rationale for doing so. The challenge for BlogForever is to find a solution to precisely these issues. Unless we were simply to adopt the snapshot approach of Heritrix-based web archiving initiatives (e.g. Wayback/archive.org, UK Web Archive), we need to ensure the BlogForever repository supports a degree of granularity that can capture, describe and preserve atomic blog objects in a way that reflects their particular interdependencies. Only then can we understand and preserve them authentically, and permit the many possible authentic and valid “time slice” views and analyses that users of the archive will need.
(I appreciate, by the way, that these objects are themselves compound objects, so not strictly “atomic”: but the same is also true of atoms, as our CERN colleagues can attest!)
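To make the “time slice” idea a little more concrete, here is a minimal sketch in Python. It is purely illustrative and not the BlogForever data model: the names Timeline, BlogArchive and time_slice, and the sample themes, widgets and post titles, are all hypothetical. It assumes that each design or widget arrangement is valid from a start date until the next one replaces it, while posts simply accumulate.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Timeline:
    """Versions of one kind of framing element, each valid from a start date."""
    starts: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def record(self, start, value):
        # Keep versions sorted by start date so lookups can use bisect.
        i = bisect_right(self.starts, start)
        self.starts.insert(i, start)
        self.values.insert(i, value)

    def at(self, when):
        # Latest version whose start date is on or before `when`, if any.
        i = bisect_right(self.starts, when)
        return self.values[i - 1] if i else None


@dataclass
class BlogArchive:
    """Hypothetical archive: designs and widgets iterate independently."""
    designs: Timeline = field(default_factory=Timeline)
    widgets: Timeline = field(default_factory=Timeline)
    posts: list = field(default_factory=list)  # (published, title) pairs

    def time_slice(self, when):
        # Posts persist once published; framing elements are looked up
        # by whichever version was in effect on the requested date.
        return {
            "design": self.designs.at(when),
            "widgets": self.widgets.at(when),
            "posts": [title for day, title in sorted(self.posts) if day <= when],
        }


archive = BlogArchive()
archive.designs.record(date(2010, 1, 1), "theme-a")
archive.designs.record(date(2011, 6, 1), "theme-b")
archive.widgets.record(date(2010, 1, 1), ["blogroll"])
archive.widgets.record(date(2010, 9, 1), ["blogroll", "tag cloud"])
archive.posts.append((date(2010, 3, 5), "First post"))
archive.posts.append((date(2011, 7, 2), "Second post"))

view = archive.time_slice(date(2010, 12, 31))
print(view)
```

Here the view for the end of 2010 pairs the first post with the first design and the second widget arrangement, which is the combination a reader would have seen on that date. Keeping a separate timeline per element type is one possible design choice; it lets each type change on its own schedule without forcing a full snapshot of the blog at every change.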



Totally well said, Richard.
The way I see it, it all comes down to what we define as content. This was a topic we discussed a lot at Warwick, though I am not really sure we got anywhere solid.
Just by handling comments as content, the whole thing is going to change dramatically, both in terms of the updates needed and in terms of parsing/indexing.
Also, there are many possibilities and options for what we could do with this stream of ever-changing content (comments).
If we go for widgets too – though they will provide a handy way to relate blogs and so on – this will make the whole thing even trickier.
It’s clear that each definition (aka path) has its pros and cons. A decision will have to be made as to which one should be followed.
Thanks George. This is the kind of question where we’d welcome others’ thoughts, even people not directly involved in the project.
There is an extra complexity related to comments: the case where a blog post has zero or few comments on the blog itself, while there is a longer and/or more interesting discussion about the post on another site, sometimes with the participation of the post author.
As a task, it is less trivial to include comments from social news sites that refer to a particular blog post than to include only comments that are situated on the same site. Functionally, though, does it make sense to discriminate between comment threads according to where they take place?