PDF Hybrid Preservation on Paper: Combining Digital and Analog to Preserve Critical Documents for Centuries in a Radioactive Waste Management Context

Abstract

Many organizations preserve content in PDF form. Andra, the French national radioactive waste management agency, documents its nuclear waste repositories mostly in this format and must preserve some of this documentation over timescales of at least several centuries. The impermanence of computing and the complexity of the PDF format make PDF unsuitable for such timescales, so Andra has instead chosen to print these documents on permanent paper. Printing, however, not only produces a high page count but also loses digital integrity. This jeopardizes automated processing of the text (e.g. search, translation, sorting), which then depends on the hypothetical future availability of perfect OCR. To benefit from the best of both digital and analog, Eupalia's Micr’Olonys solution, already tested at Andra on a database, was extended with a preparer utility called Sumetar. Sumetar converts PDF files into plain UTF-8-encoded text files accompanied by images, which are either stored in the BMP format or printed in analog form. Plain text and BMP are both very simple yet widespread formats, and are therefore suited to long-term as well as current accessibility. Sumetar packages multiple files into a single uncompressed tar file, another very simple format. Micr’Olonys then transcribes this digital file into 2D barcodes printed on paper for passive preservation. Tests carried out in late 2023 show that this strategy typically reduces the page count four-fold for a document with, on average, roughly one image every two pages.
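The abstract does not describe Sumetar's internal archive layout. As an illustration only, the following Python sketch shows how a UTF-8 text file and its BMP images could be bundled into a single uncompressed tar archive using the standard `tarfile` module. The function name `package_for_preservation`, the member name `document.txt`, the choice of the USTAR tar variant and the `BM` signature check are all assumptions made for this example, not Sumetar's actual behaviour.

```python
import io
import tarfile


def package_for_preservation(text: str, images: dict[str, bytes],
                             text_name: str = "document.txt") -> bytes:
    """Hypothetical sketch: bundle UTF-8 text and BMP images into one uncompressed tar."""
    for name, data in images.items():
        # Every BMP file starts with the two-byte signature "BM".
        if not data.startswith(b"BM"):
            raise ValueError(f"{name} does not look like a BMP file")

    buf = io.BytesIO()
    # Mode "w" writes a plain tar with no compression layer; the USTAR variant
    # (an assumption here) uses only fixed-size 512-byte headers.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Text first, then images in sorted order so the output is deterministic.
        members = [(text_name, text.encode("utf-8"))] + sorted(images.items())
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp: identical inputs give identical archives
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Leaving the archive uncompressed keeps every file's bytes directly readable at a known offset. One plausible benefit, which is our interpretation rather than a claim from the paper, is that an unreadable barcode region would then damage only the bytes it encodes instead of an entire compressed stream.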

Details

Creators
Florence Poidevin; Vincent Joguin
2024-09-17 13:30:00 +0100
Keywords
information technology for dp; start 2 preserve
Publication Type
paper
License
Creative Commons Attribution Share-Alike 4.0 (CC-BY-SA-4.0)