Future-Proofing the Web: What We Can Do Today: Presentation - iPRES 2005 - Göttingen

Abstract

All we know about predicting our digital future is based on our past, a review of which reveals a remarkable truth dating from the beginning of the digital era: plain text is a versatile and lossless format that is just as readable with today's computers as it was 30 years ago. Compared to the fonts, colors, point sizes, and graphics available in contemporary formats, plain text may look dull and dry, but in fact this "desiccated data" successfully represents all the protocols that built the Internet. Moreover, it is hard to imagine any other current format being nominated as the most likely to remain readable 30 years from now. A proposed strategy for preserving today's web formats is to go ahead and save the original format, but also to automatically derive and save various "desiccated" versions that, while failing to capture all the original format's richness, nonetheless capture its essential nutrient value. In the case of a document, saving a plain text version alongside the original would provide a fallback in case the original format fails. One never knows if there will ever be money enough to touch a preserved object again, let alone migrate its format. The effort and storage for a derived plain text version are often needed anyway to support search indexing. Generalizing, the lesson appears to be that the simpler the technological intermediation required to render the digital object for the user, the easier it is to reproduce that intermediation, and hence to carry the object forward. Along these lines, the image-format analog of a plain text file might be a basic raster file, in which the array of pixels (picture elements) could be seen to mimic ancient weaving technology. It may be that adding the complication of a simple run-length encoding compression would be worth the space savings.
A strategy for deriving and saving raster images of original documents rendered with today's software has two advantages: we will never have better rendering tools for today's formats than we have today (with all the features and error-compensation that make malformed format instances, which are very common, renderable), and it provides an additional fallback in case both the original and the plain text fail. This is something we can do for preservation today that we may never have the money or the knowledge to do in the future.
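
To make the run-length idea in the abstract concrete, here is a minimal sketch of how a basic raster might be compressed and still stored as plain text. It is an illustration only, not the method of the paper: the function names, the in-memory layout (a list of rows of integer pixel values), and the `value*count` text notation are all assumptions chosen for the example.

```python
from itertools import groupby


def rle_encode_row(row):
    """Collapse a row of pixel values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode_row(pairs):
    """Expand (value, run_length) pairs back into the original row."""
    row = []
    for value, count in pairs:
        row.extend([value] * count)
    return row


def raster_to_text(raster):
    """Serialize a raster as plain text, one line per row.

    Each run is written as 'value*count' (a hypothetical notation
    invented for this sketch), with runs separated by spaces.
    """
    lines = []
    for row in raster:
        lines.append(" ".join(f"{v}*{n}" for v, n in rle_encode_row(row)))
    return "\n".join(lines)


def text_to_raster(text):
    """Parse the plain text form back into a list of pixel rows."""
    raster = []
    for line in text.split("\n"):
        pairs = []
        for token in line.split():
            value, count = token.split("*")
            pairs.append((int(value), int(count)))
        raster.append(rle_decode_row(pairs))
    return raster


if __name__ == "__main__":
    # A tiny two-row bilevel image: 0 = white, 1 = black.
    image = [
        [0, 0, 0, 1, 1, 0],
        [1, 1, 1, 1, 1, 1],
    ]
    encoded = raster_to_text(image)
    print(encoded)
    assert text_to_raster(encoded) == image
```

The decoder needs nothing beyond splitting strings and repeating values, which is the point of the abstract's argument: the less intermediation a format requires, the easier that intermediation is to rebuild later.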

Details

Creators
John Kunze
Institutions
Date
Keywords
göttingen
Publication Type
paper
License
CC BY-SA 3.0 AT