A METS BASED INFORMATION PACKAGE FOR LONG TERM ACCESSIBILITY OF WEB ARCHIVES

Abstract

The British Library’s web archive comprises several terabytes of harvested websites. Like other content streams, this data should be ingested into the library’s central preservation repository, which requires standardized Submission and Archival Information Packages. Harvested websites are stored in Archival Information Packages (AIPs), each described by a METS file. Operational metadata for resource discovery and archival metadata are normalized and embedded in the METS descriptor using common metadata standards such as PREMIS and MODS. The British Library’s METS profile for web archiving addresses both dissemination and preservation use cases and ensures the authenticity of the data. Its underlying, complex content model disaggregates websites into web pages, associated objects and their actual digital manifestations. This additional abstraction layer ensures accessibility over the long term and the ability to carry out preservation actions such as migrations. As a result, the library-wide preservation policies and principles become applicable to web content as well.
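To make the layered content model more concrete, the following is a minimal sketch of how a METS descriptor of this kind might be assembled with Python's standard-library xml.etree.ElementTree. It is not the British Library's actual profile. The element names and namespaces are taken from the public METS, MODS, PREMIS (version 3) and XLink schemas. Everything else is a hypothetical choice made for illustration: the function name build_mets, the structMap TYPE values (website, webpage, associatedObject), the ID scheme, the fileGrp USE value and the PREMIS eventType "capture". Each logical division points to a file, which plays the role of the digital manifestation.

```python
import itertools
import xml.etree.ElementTree as ET

# Namespaces of the public METS, MODS, PREMIS v3 and XLink schemas.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
    "premis": "http://www.loc.gov/premis/v3",
    "xlink": "http://www.w3.org/1999/xlink",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return a Clark-notation name such as '{http://www.loc.gov/METS/}div'."""
    return f"{{{NS[prefix]}}}{tag}"


def build_mets(site_title, harvest_date, pages):
    """Build a minimal METS tree for one harvested website (hypothetical layout).

    pages: list of (page_url, [associated_object_urls]).
    """
    mets = ET.Element(q("mets", "mets"))

    # Descriptive metadata: a MODS record wrapped in a dmdSec.
    dmd = ET.SubElement(mets, q("mets", "dmdSec"), ID="DMD1")
    dmd_wrap = ET.SubElement(dmd, q("mets", "mdWrap"), MDTYPE="MODS")
    mods = ET.SubElement(ET.SubElement(dmd_wrap, q("mets", "xmlData")), q("mods", "mods"))
    title_info = ET.SubElement(mods, q("mods", "titleInfo"))
    ET.SubElement(title_info, q("mods", "title")).text = site_title

    # Provenance metadata: a PREMIS event recording the harvest.
    amd = ET.SubElement(mets, q("mets", "amdSec"), ID="AMD1")
    prov = ET.SubElement(amd, q("mets", "digiprovMD"), ID="PROV1")
    prov_wrap = ET.SubElement(prov, q("mets", "mdWrap"), MDTYPE="PREMIS:EVENT")
    event = ET.SubElement(ET.SubElement(prov_wrap, q("mets", "xmlData")), q("premis", "event"))
    event_id = ET.SubElement(event, q("premis", "eventIdentifier"))
    ET.SubElement(event_id, q("premis", "eventIdentifierType")).text = "local"
    ET.SubElement(event_id, q("premis", "eventIdentifierValue")).text = "EVENT1"
    ET.SubElement(event, q("premis", "eventType")).text = "capture"  # illustrative value
    ET.SubElement(event, q("premis", "eventDateTime")).text = harvest_date

    # fileSec precedes structMap, as the METS schema requires.
    file_sec = ET.SubElement(mets, q("mets", "fileSec"))
    file_grp = ET.SubElement(file_sec, q("mets", "fileGrp"), USE="harvested")
    struct_map = ET.SubElement(mets, q("mets", "structMap"), TYPE="logical")
    site_div = ET.SubElement(struct_map, q("mets", "div"),
                             TYPE="website", LABEL=site_title, DMDID="DMD1")
    ids = itertools.count(1)

    def add_manifestation(href, div):
        # Register the file in fileSec and point the logical div at it.
        file_id = f"FILE{next(ids)}"
        f = ET.SubElement(file_grp, q("mets", "file"), ID=file_id, ADMID="PROV1")
        ET.SubElement(f, q("mets", "FLocat"), {"LOCTYPE": "URL", q("xlink", "href"): href})
        ET.SubElement(div, q("mets", "fptr"), FILEID=file_id)

    for page_url, object_urls in pages:
        page_div = ET.SubElement(site_div, q("mets", "div"), TYPE="webpage", LABEL=page_url)
        add_manifestation(page_url, page_div)  # fptr must precede child divs
        for obj_url in object_urls:
            obj_div = ET.SubElement(page_div, q("mets", "div"),
                                    TYPE="associatedObject", LABEL=obj_url)
            add_manifestation(obj_url, obj_div)
    return mets


if __name__ == "__main__":
    doc = build_mets("Example site", "2010-01-01T00:00:00Z",
                     [("http://www.example.org/", ["http://www.example.org/logo.png"])])
    print(ET.tostring(doc, encoding="unicode"))
```

In this sketch the logical structure (website, web page, associated object) is kept separate from the files that manifest it. A migration could therefore, in principle, add or replace a file entry and its FLocat without changing how the website is described. This is one plausible reading of the abstraction layer mentioned above, not a description of the library's implementation.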

Details

Creators
Markus Enders
Publication Type
paper
License
GPLv3