Born Broken: Fonts and Information Loss in Legacy Digital Documents

Abstract

For millions of legacy documents, correct rendering depends on resources, such as fonts, that are generally not embedded within the document itself. As a result, there is significant risk of information loss when fonts are missing or incorrectly substituted. In this paper we use a collection of 230,000 Word documents to assess the difficulty of matching font requirements against a database of fonts. We describe the identifying information contained in common font formats, the font requirements stored in Word documents, the API that Windows provides to support font requests by applications, the documented substitution algorithms that Windows uses when requested fonts are not available, and the ways in which support software might be used to control font substitution in a preservation environment.

Details

Creators
Brown, Geoffrey; Woods, Kam
Institutions
Date
Keywords
san francisco
Publication Type
paper
License
CC BY-SA 3.0 AT
Direct Download
1351125 bytes
