Considerations for High Throughput Digital Preservation

Abstract

In partnership with Tessella, FamilySearch is developing an automated approach to large-scale digitization, ingest, and long-term preservation of electronic content. The set of proposed processes and underlying architecture must support required ingest rates in excess of 20 TB a day. Significant effort has been placed on examining the preservation architecture and processes for potential bottlenecks. Digital preservation requires computationally intensive capabilities to provide functionality such as fixity checking, format identification, and characterization of content. When operating at very large scale there is also a real need for large network bandwidth and high-speed storage systems. By minimizing the need for human interaction and employing software parallelization, our initial findings indicate that the primary bottleneck is not processor bound but is directly associated with the movement of digital files into and within the application. In short, the scalability problem is really a systems engineering problem and not necessarily an issue for digital preservation per se.
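
To put the ingest target in perspective, 20 TB per day corresponds to a sustained rate of roughly 20 x 10^12 bytes / 86,400 s, or about 231 MB/s. The following is a minimal sketch, not the authors' implementation, of the kind of parallel fixity checking the abstract describes; the directory path and worker count are illustrative assumptions. It shows why hashing tends to be storage- and network-bound rather than CPU-bound at this scale.

    # Minimal sketch (illustrative only): parallel fixity checking over a
    # directory tree of ingested files. Path and pool size are assumptions.
    import hashlib
    import os
    from concurrent.futures import ThreadPoolExecutor

    def sha256_of(path, chunk_size=1024 * 1024):
        """Stream a file through SHA-256; the hash cost is small next to disk I/O."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return path, digest.hexdigest()

    def fixity_check(root, workers=8):
        """Hash every file under `root` in parallel. At the ~231 MB/s sustained
        rate implied by 20 TB/day, throughput is limited by how fast bytes can
        be read from storage or the network, not by the hashing itself."""
        paths = [os.path.join(d, name)
                 for d, _, names in os.walk(root) for name in names]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(pool.map(sha256_of, paths))

    if __name__ == "__main__":
        # "/data/ingest" is a hypothetical staging area for newly digitized content.
        for path, digest in fixity_check("/data/ingest").items():
            print(digest, path)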

Details

Creators
Jason Pierson; Robert Sharpe; James Carr; Mark Evans
Keywords
Singapore; digital preservation; digital archiving; scalability; automation
Publication Type
paper
License
CC BY-SA 3.0 AT