Linked open vocabularies provide a base for standardized metadata, as required for interoperable, AI-based media applications.
Media Asset Management (MAM) depends on technical metadata carried in a container, be it IMF, MXF, or AVI. Yet the real benefits of MAM come when both contextual and technical metadata are available, linked by a URL within the container's data structure.
While asset management in the production and distribution of moving pictures for non-theatrical consumption depends on technical metadata, it is the contextual metadata that makes fine-grained repurposing of existing content possible. In today's world of multichannel distribution and asset repurposing, this capability is often key to expanding revenue.
With IMF we seem to be moving, at last, to a machine-readable container format capable of providing the information required for automated technical postproduction. In this regard, the importance of generating metadata at the point of origin cannot be emphasized enough. IMF provides for a universally unique identifier (UUID): a 128-bit number that can identify a unique digital object, derived from the time of capture and the MAC address of the capture system. This UUID is essential for all future metadata implementations. IMF, however, is aimed only at distribution. Unlike the Advanced Authoring Format (AAF), it is not capable of reproducing a finished asset from stored information and a set of original assets.
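The time-plus-MAC identifier described above corresponds to a version-1 UUID. As a minimal sketch, using Python's standard `uuid` module rather than an actual IMF tool, such an identifier can be generated like this:

```python
import uuid

# A version-1 UUID combines a 60-bit timestamp with the host's 48-bit
# MAC address, the time-plus-MAC scheme described above. (Illustrative
# only; the exact identifier scheme an IMF implementation uses is
# defined by the SMPTE standards, not by this sketch.)
asset_id = uuid.uuid1()

print(asset_id)                          # e.g. 1b4e28ba-2fa1-11d2-883f-0016d3cca427
print(asset_id.version)                  # 1
print(asset_id.int.bit_length() <= 128)  # True: the identifier fits in 128 bits
```

Because the timestamp and MAC address are baked into the number at the point of origin, two capture systems can generate identifiers independently with no realistic risk of collision.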
While one goal of AAF is to automatically conform a set of original assets into a finished product, no implementation is currently (2018) capable of doing this, especially with regard to grading. That said, IMF is a useful and practical tool for repurposing a set of master assets for various delivery scenarios.
The state of standards for contextual metadata is still in flux, despite attempts at standardization since at least 1995 (the Dublin Core Metadata Initiative). Some contextual metadata, such as time and capture location, may be generated automatically, but capturing descriptive metadata in a machine-readable form is of greater value.
What are the tools available to do this and how is it organized?
Taxonomy is the classification of things in a hierarchical manner, and curation is the process of classifying them. Ontology describes the relationships between different categories of classification; or, as Tom Gruber, co-creator of Siri, put it, "An ontology is a formal specification of a shared conceptualization." Bringing this all together at the highest level is the Resource Description Framework (RDF), itself part of the Semantic Web.
RDF describes things in such a way that a machine can find and manipulate them. Its syntax is based upon triples of subject → predicate → object, which is where .ttl (Terse RDF Triple Language, or Turtle) files get their name. A .ttl file serves a similar role to an .xml file, but Turtle does not rely on XML and is generally recognized as more readable and easier to edit manually than its XML counterpart. RDF does not describe the semantics, or meaning, behind the data; rather, it provides the structure upon which this meaning is built.
The simplest Turtle statement is a sequence of (subject, predicate, object) terms separated by whitespace, with each triple terminated by a '.'. Here is how a statement might be written in order to describe the Green Goblin:
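A minimal Turtle statement expressing that relationship, adapted from the introductory example in the W3C Turtle specification (the example.org and perceive.net IRIs come from that specification, not from any real media vocabulary), looks like this:

```turtle
<http://example.org/#spiderman>
    <http://www.perceive.net/schemas/relationship/enemyOf>
    <http://example.org/#green-goblin> .
```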
The subject, Spiderman, is related to the object, green-goblin, as "enemy of." This kind of connectedness does not translate easily into either a relational or a hierarchical database; instead, a new kind of database is required.
A graph database (GDB) uses graph structures, with nodes, edges, and properties, to represent and store data for semantic queries. This allows simple queries based upon the relationships between objects. The language used to query such a database is SPARQL (pronounced "sparkle").
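As a sketch, a SPARQL query against a store holding the Spiderman triple described above might look like the following (the IRIs are the same illustrative ones, not a real media vocabulary):

```sparql
PREFIX rel: <http://www.perceive.net/schemas/relationship/>

# Find every resource that Spiderman stands in the "enemy of" relation to.
SELECT ?enemy
WHERE {
  <http://example.org/#spiderman> rel:enemyOf ?enemy .
}
```

The query is itself a triple with a variable in the object position; the store returns every binding of `?enemy` that completes a matching triple.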
Taxonomy and Curation of Moving Pictures
Any taxonomy of moving images has to be considered a superset of a taxonomy of still images. Whereas the content of a single frame may be defined technically by the color information in a histogram, this kind of automatically generated information can only be a starting point because of the semantic-gap problem: a purely technical description of an image produces many false positives. However, a structured classification of still images based upon object recognition using neural networks is now a real possibility.
This is one tool used in the "search by image" feature of modern search engines. The larger problem is defining a "scene" or "chapter" (as on a DVD) based upon the repetition or movement of objects over a number of frames. The taxonomy of a moving image therefore includes the relationships between any objects that are relevant across a series of still images. Manual curation is still used (2018) for most of these tasks.
This ontology of city sounds was developed by Edward Brown et al., Soundscape of the Modern City, New York Department of Health, 1930.
To understand the taxonomy of sound, we need to look a little deeper at what an ontology is. An ontology defines a common set of terms used to describe and represent a specific domain of knowledge, together with the relationships among those terms. Figure 3 above represents an ontology of city noise.
One way to classify audio would be to distinguish natural sounds from man-made sounds. Unfortunately, as of 2018, there is no generally accepted ontology covering all sound. Regardless, there are tools available that can help with the more common sounds.
Many of us are comfortable issuing voice commands to Siri, Alexa, or Google Assistant. These tools translate voice into actionable information based upon a library of sounds. Google uses AudioSet, a dataset containing more than two million 10-second sound samples. AudioSet is still only a starting point for a comprehensive vocabulary of sound events.
To make all this work, we need controlled vocabularies, such as the Library of Congress Subject Headings. Controlled vocabularies provide a common description framework, as well as a common notation for the relationships between objects in an RDF schema. See Linked Open Vocabularies.
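As a sketch of how a controlled vocabulary plugs into RDF, here is a hypothetical media asset described in Turtle using Dublin Core terms (the UUID, title, and all other literal values are invented for illustration):

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical asset, keyed by its capture-time UUID.
<urn:uuid:1b4e28ba-2fa1-11d2-883f-0016d3cca427>
    dcterms:title   "Harbor At Dawn, Take 3" ;
    dcterms:creator "Example Pictures" ;
    dcterms:created "2018-05-14T06:12:00Z"^^xsd:dateTime ;
    dcterms:spatial "New York Harbor" .
```

Because dcterms:title, dcterms:creator, and the other properties come from a published vocabulary, any RDF-aware system can interpret these fields without a bespoke mapping.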
The Wizard of MAMs and WMS
All of the above represents just a brief look behind the curtains at what makes media asset management systems work and workflow management systems (WMS) possible. If your vendor has followed best practices, you may not have to worry about it. However, if the goal is to monetize older assets, be sure the vendor can properly interface the MAM with data created by an external AI conversion platform.
It is said that 90% of available multimedia assets have still not been digitized. Monetizing these assets is not possible without AI, and AI depends upon access to data in a standardized form. The Semantic Web is that standard.
Christopher Walker, SYNERGIST