Evaluation of Research Data File Errors - First Results

Abstract

The digital preservation community is well acquainted with text publications and with file format identification and validation for these file types. Research data publications, by contrast, are becoming more frequent and bring with them more, or different kinds of, format error messages. The PUBLISSO – Repository for Life Sciences at ZB MED – Information Centre for Life Sciences contains both types of publication. In the process of transferring these to the digital preservation system, differences in file error messages and error handling emerged. This lightning talk will give a first look at the error types prominent for research data, comparing them to typical errors for text publications. It will focus specifically on those types of format validation and identification error messages that often indicate severe file errors. It will also introduce a possible approach to researching these error messages and their causes, as well as an example of resolving errors post-publication. Furthermore, it will raise various questions, such as how to approach preservation actions that result in slight file changes and whether these actions may already affect research data integrity and technical compatibility with research software. It will also touch upon the question of handling relevant context material, such as standards, which is not published alongside research data but may be necessary for its long-term reusability. Finally, it will give an overview of changes to our repository publication policy that have been introduced as a consequence of discussing these topics.

Details

Creators
Katharina Markus
Institutions
Date
2024-09-17 15:35:00 +0100
Keywords
approaches to preservation; from document to data
Publication Type
lightning talk
License
Creative Commons Attribution 4.0 (CC-BY-4.0)