Harvester results in a digital preservation system

Abstract

In the last few years libraries from all around the world have build up OAIS compliant archival systems. The information packages in these systems are often based on METS and the contents are mainly e-journals and scientific publications. On the other hand Web archiving is becoming more and more important for libraries. Most of the member institutions of the International Internet Preservation Consortium (IIPC) use the software Heritrix to harvest selected Web pages or complete domains. The results are stored in the container format ARC or the successor WARC. The files’ quantity and the sizes of these archival packages are significantly different than those of the other publications in the existing archiving systems. This challenges the way the archival packages are defined and handled in current OAIS compliant systems. This paper compares existing approaches to use METS and Web harvesting results in archival systems. It describes the advantages and disadvantages of treating Web harvests in the same way as other digital publications in dedicated preservation systems. Containers based on METS are set side by side with WARC and its possibilities.

Details

Creators
Tobias Steinke
Institutions
Date
Keywords
london
Publication Type
paper
License
CC BY-SA 3.0 AT
Download
31536 bytes

View This Publication