Meemoo's Metadata Bake-Off: How AI Bakes Metadata into Digital Collections

Abstract

Meemoo digitises images and audio for its content partners in the cultural, media, and government sectors. We also manage the ingest of their existing digital collections into our archive system, where the content is sustainably preserved. However, most of this material has almost no metadata attached, which makes it effectively unfindable. We have therefore developed an AI metadata pipeline that automatically adds metadata to video content. This AI-driven approach makes content findable and accessible, unlocking it for future generations.

Our AI metadata pipeline has a microservice-based architecture capable of processing large datasets while keeping processing costs low. To date, we have processed more than 150,000 hours of video material. Our facial recognition pipeline combines several open-source models for face detection (YuNet, MediaPipe) and face recognition (MagFace). For speech-to-text, we use Speechmatics, a high-quality, multilingual SaaS speech engine. For entity recognition and entity linking on the transcripts, we use TextRazor, a multilingual SaaS engine.

We will demonstrate the power of our pipeline by processing material from the Video Person-Clustering Dataset and the YouCook2 Dataset. The demonstration applies facial recognition, speech-to-text, entity recognition, and entity linking models to create AI-generated metadata and illustrates the resulting improvement in searchability. While whisking in new insights and sprinkling some Generative AI (GenAI) on top, we will also show how the pipeline can support content creation and transformation, baking up the future potential of our technology.

## References

- Video Person-Clustering Dataset: https://www.robots.ox.ac.uk/~vgg/data/Video_Person_Clustering/
- YouCook2 Dataset: http://youcook2.eecs.umich.edu/download
- YuNet: Wu et al. (2023). YuNet: A tiny millisecond-level face detector.
- MediaPipe: Lugaresi et al. (2019). MediaPipe: A framework for building perception pipelines.
- MagFace: Meng et al. (2021). MagFace: A universal representation for face recognition and quality assessment.
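
To make the face recognition step in the abstract more concrete, here is a minimal Python sketch of how a face embedding (such as one produced by a model like MagFace) might be matched against a gallery of known persons. The abstract does not describe the matching method, so cosine similarity, the `identify` function, the 0.5 threshold, and the toy three-dimensional vectors are all illustrative assumptions rather than Meemoo's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def identify(embedding, gallery, threshold=0.5):
    """Return the best-matching gallery name, or None if no score reaches the threshold.

    `gallery` maps a person name to a reference embedding.
    The threshold is a hypothetical value chosen for this example only.
    """
    best_name, best_score = None, threshold
    for name, reference in gallery.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy gallery with made-up 3-dimensional embeddings (real embeddings are much larger).
gallery = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.0, 1.0, 0.0],
}

match = identify([0.9, 0.1, 0.0], gallery)
no_match = identify([0.0, 0.0, 1.0], gallery)
```

In this sketch, a detected face that matches no reference closely enough yields `None`, so an unknown person would be left unlabeled rather than assigned a wrong name in the generated metadata.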

Details

Creators
Alec Hantson; Matthias Priem; Peter Vanden Berghe
Institutions
Date
2024-09-18 15:45:00 +0100
Keywords
metadata standards and implementation; from document to data
Publication Type
tool demo
License
Creative Commons Zero (CC0-1.0)