BagIt Fixer-Upper: Scaling BagIt Tools to Manage the Ingest of Petabytes of Digitization Work


The New York Public Library has created over 1.5 PB of files by digitizing more than 50,000 audio and video items to preserve their content for the long term. This paper details the Library's use of the BagIt File Packaging Format during the Quality Assurance and Audit Submission functions defined by OAIS. It also describes extensions to the bagit-python library for repairing bags that fail those functions. Working with thousands of terabytes stored in hundreds of thousands of bags requires ingest approaches that scale accordingly. Common changes, such as the accidental creation of system files inside a bag or intentional edits to its metadata files, invalidate the entire bag. Noting and responding to these errors is critical for improving workflows, but responding manually at this scale is impossible. Using the bagit-python library, NYPL has created tools to selectively clean system files from bag directories and manifests, update or add checksums, and create event logs of repairs.
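
The abstract does not include NYPL's code. To make the three repairs it lists more concrete, the following is a minimal Python sketch of the idea. It works on a bag's files directly with the standard library rather than through bagit-python, so none of its names are bagit-python API. The system-file patterns, the `edited` parameter for accepting intentional edits, and the event-record fields are all hypothetical choices. The sketch also assumes a single `sha256` manifest, UTF-8 tag files, and no percent-encoded filenames.

```python
import hashlib
import re
from datetime import datetime, timezone
from pathlib import Path

# HYPOTHETICAL list of OS-generated "system files" to strip from bags.
SYSTEM_FILE_RE = re.compile(r"\.DS_Store|Thumbs\.db|desktop\.ini|\._.*")


def file_digest(path, algorithm="sha256"):
    """Stream a file through hashlib and return its hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def read_manifest(path):
    """Parse BagIt manifest lines ('<checksum> <path>') into {path: checksum}."""
    entries = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, relpath = line.split(None, 1)
            entries[relpath] = checksum
    return entries


def write_manifest(path, entries):
    """Write {path: checksum} back out, one sorted line per file."""
    text = "".join(f"{c}  {r}\n" for r, c in sorted(entries.items()))
    Path(path).write_text(text, encoding="utf-8")


def log(events, action, relpath, detail=""):
    """Append one repair event (HYPOTHETICAL record layout)."""
    events.append({"time": datetime.now(timezone.utc).isoformat(),
                   "action": action, "path": relpath, "detail": detail})


def repair_bag(bag_dir, algorithm="sha256", edited=()):
    """Clean system files, add/update checksums, and return repair events.

    `edited` (HYPOTHETICAL parameter) lists payload paths, relative to the bag
    root, whose changed content is intentional; other mismatches are untouched.
    """
    bag = Path(bag_dir)
    events = []
    manifest_path = bag / f"manifest-{algorithm}.txt"
    manifest = read_manifest(manifest_path)
    original = dict(manifest)

    # 1. Delete system files anywhere in the bag and drop their manifest lines.
    for path in sorted(bag.rglob("*")):
        if path.is_file() and SYSTEM_FILE_RE.fullmatch(path.name):
            path.unlink()
            log(events, "deleted system file", path.relative_to(bag).as_posix())
    for relpath in [r for r in manifest if SYSTEM_FILE_RE.fullmatch(r.split("/")[-1])]:
        del manifest[relpath]
        log(events, "removed manifest entry", relpath)

    # 2. Add checksums for unlisted payload files; update ones named in `edited`.
    octets = streams = 0
    for path in sorted((bag / "data").rglob("*")):
        if not path.is_file():
            continue
        relpath = path.relative_to(bag).as_posix()
        octets += path.stat().st_size
        streams += 1
        if relpath not in manifest or relpath in edited:
            digest = file_digest(path, algorithm)
            if manifest.get(relpath) != digest:
                action = "updated checksum" if relpath in manifest else "added checksum"
                manifest[relpath] = digest
                log(events, action, relpath, digest)
    if manifest != original:
        write_manifest(manifest_path, manifest)

    # 3. Keep Payload-Oxum ("octets.streams") in bag-info.txt in step.
    info_path = bag / "bag-info.txt"
    if info_path.exists():
        oxum = f"Payload-Oxum: {octets}.{streams}"
        lines = info_path.read_text(encoding="utf-8").splitlines()
        fixed = [oxum if line.startswith("Payload-Oxum:") else line for line in lines]
        if fixed != lines:
            info_path.write_text("\n".join(fixed) + "\n", encoding="utf-8")
            log(events, "updated Payload-Oxum", "bag-info.txt", oxum)

    # 4. Refresh the tag manifest, whose checksums cover the files edited above.
    tag_path = bag / f"tagmanifest-{algorithm}.txt"
    if tag_path.exists():
        tags = read_manifest(tag_path)
        fresh = {r: file_digest(bag / r, algorithm) for r in tags if (bag / r).is_file()}
        if fresh != tags:
            write_manifest(tag_path, fresh)
            log(events, "updated tag manifest", tag_path.name)
    return events
```

Steps 3 and 4 are included because removing a stray `.DS_Store` changes the payload size and the manifest itself, so a bag fixed only in `manifest-sha256.txt` would likely still fail validation on `Payload-Oxum` or the tag manifest. Only files named in `edited` receive new checksums. This reflects the assumption that an unexplained change to payload content should be reviewed rather than silently accepted. After a repair, re-validating with bagit-python (`bagit.Bag(path).validate()`) is a reasonable check that the returned events describe a bag that is now valid.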


Krabbenhoeft, Nick
CC BY-SA 4.0 International