Introduction

The Audiovisual Metadata Platform (AMP) is a software platform that aims to generate metadata for digitized and born-digital audiovisual materials using a combination of machine learning models and human intervention. It was created through a collaboration among Indiana University, AVP, the University of Texas at Austin, and the New York Public Library, and it is funded by a grant from the Mellon Foundation.

The creation of mass digitization projects is one of the most important shifts in library and archival practice, from both a preservation and an access standpoint. As analog media continue to deteriorate, digitization is the best hope for their long-term storage. However, the sheer volume of materials produced by digitization makes it infeasible for catalogers to catalog everything by hand. Making matters worse, many of these materials come with only scant metadata, so a cataloger would need to watch or listen to each item to generate metadata for it. Using machine learning to automate metadata generation is therefore a promising solution. However, while machine learning algorithms have improved significantly over the years, particularly in the last decade with the sudden explosion of deep neural networks, they still have difficulties and biases. These pose problems for metadata in terms of usefulness, accuracy, and fairness. The goal of AMP is to introduce human intervention into its machine learning pipelines to circumvent some of these limitations, so that the metadata it generates is more correct, or at least more useful, to collection managers and catalogers.

This guide explains AMP's front-end features and functionality, and how to navigate the user interface. It is designed for users of AMP: collection managers, catalogers, and anyone else using the platform. The guide reflects AMP as it exists in its pilot stage; it is very likely that some of this information will change over time.

Definitions

These definitions are provided to make reading this guide easier.

Collection: A set of items.
Bundle: A set of items from one or more collections. A bundle gathers the items that the user wants to submit through a workflow at the same time.
Item: A bibliographic item. It contains metadata and A/V binary content. Items belong to a collection.
Content File: A media file (sound recording, moving image) that is part of an item. Multiple related files can exist in one item.
Unit: Collections belong to a unit; roles and permissions in AMP are tied to a unit.
Workflow: A representation of a graph that describes the routing rules for a set of MGMs. The input of a workflow may be an item or a group of items.
Job: One execution of a workflow for a particular content file.
Metadata Generation Mechanism (MGM): A machine-learning tool or a human interaction tool that can be added to a workflow to create or edit metadata for a content file.
Supplemental File: Any file that is provided to supplement the information about a collection, an item, or a primary file.
Intermediary File: A file that is produced as part of a workflow and can be re-used as input to another workflow (for example, an amp_transcript file produced by the AWS Transcribe MGM).
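
The relationships among these terms can be easier to see as a small data model. The following Python sketch is illustrative only: the class names, fields, and the submit_bundle helper are assumptions made for this example and do not reflect AMP's actual schema or API. It also simplifies a workflow to an ordered list of MGM names, whereas the definition above describes a graph.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentFile:
    """A media file (sound recording or moving image) that is part of an item."""
    name: str


@dataclass
class Item:
    """A bibliographic item holding one or more content files."""
    title: str
    content_files: List[ContentFile] = field(default_factory=list)


@dataclass
class Collection:
    """A set of items."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Unit:
    """Owns collections; roles and permissions are tied to the unit."""
    name: str
    collections: List[Collection] = field(default_factory=list)


@dataclass
class Bundle:
    """Items, possibly from several collections, submitted to a workflow together."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Workflow:
    """Simplified to an ordered list of MGM names (hypothetical representation)."""
    name: str
    mgms: List[str] = field(default_factory=list)


@dataclass
class Job:
    """One execution of a workflow for a particular content file."""
    workflow: Workflow
    content_file: ContentFile


def submit_bundle(bundle: Bundle, workflow: Workflow) -> List[Job]:
    """Hypothetical helper: create one job per content file in the bundle."""
    return [
        Job(workflow=workflow, content_file=content_file)
        for item in bundle.items
        for content_file in item.content_files
    ]
```

Under these assumptions, a bundle of two items holding two content files and one content file would produce three jobs, since a job is defined per content file rather than per item.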

Document generated by Confluence on Feb 25, 2025 10:39
