Email Preservation at Scale: Preliminary Findings Supporting the Use of Predictive Coding

Abstract

Email provides a rich history of an organization yet poses unique challenges to archivists. It is difficult to acquire and process because of its sensitive content and diverse topics and formats, which in turn inhibits access and research. Predictive coding alleviates these challenges by using supervised machine learning to augment appraisal decisions, identify and prioritize sensitive content for review and redaction, and generate descriptive metadata of themes and trends. Building on the authors’ previous work, which described the project at its inception, this paper reports preliminary findings that support the use of predictive coding as an effective tool to enable digital preservation at scale. Specific tools, methodologies, and the human factors that affect their success are discussed.

Details

Creators
Joanne Kaczmarek; Brent West
Keywords
boston
Publication Type
paper
License
CC BY 4.0 International