Metadata Grinds Towards Unification and Automation

The arcane world of metadata has been enlivened by automation with the promise of efficiency savings in asset management and much richer labelling of content to enhance discovery. At the same time, there are hopes at last of the field being unified behind common standards, which is essential as video services become increasingly global through online distribution with wider access to premium content.

However, there is still progress to be made on both fronts, with automatic metadata generation in particular remaining in its relative infancy, as was clear at the recent annual Metadata Developer Network Workshop held at the EBU (European Broadcasting Union) headquarters in Geneva, Switzerland.

The EBU’s status as the global epicenter for metadata standards was emphasized by record attendance, with nearly fifty participants joining either in person or online from around the globe, including Australia, Canada and the USA. The Metadata Developer Network has been instrumental in drawing together the two leading global standards groups for audio-visual (AV) metadata: Europe’s EBUCore and PBCore, developed by public broadcasters in the USA.

Since early 2015 the two groups have worked together to align the standards more closely for complete interoperability and ultimate convergence, with a common vocabulary for describing key AV attributes. The PBCore group conceded that EBUCore was further ahead on this count, so it made sense for the former largely to adopt the latter’s descriptions. In particular, US broadcasters could start taking advantage of the EBU’s work in integrating with semantic web applications and RDF (Resource Description Framework), a set of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. RDF has gained wide use for conceptual modelling of information implemented in web resources. It provides the framework for the semantic web, which is supposed to allow data to be shared and reused across different online sectors and services. As AV services increasingly overlap with other domains in communications and the Internet of Things, it makes sense to adopt common underlying standards for metadata, which EBUCore and PBCore are moving towards, perhaps to be unified under the ITU (International Telecommunication Union).
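At heart, RDF’s data model is simple: every statement is a subject–predicate–object triple, and a resource is described by the set of triples about it. The sketch below illustrates that model applied to an AV asset in plain Python; the namespace and property names are illustrative placeholders, not the official EBUCore RDF ontology terms.

```python
# Minimal sketch of RDF's triple data model applied to an AV asset.
# The namespace and property names are illustrative placeholders,
# not terms from the actual EBUCore/RDF vocabularies.

EX = "http://example.org/"  # hypothetical namespace for this sketch

triples = [
    (EX + "programme/42", EX + "title",   "Alpine Wildlife"),
    (EX + "programme/42", EX + "genre",   "documentary"),
    (EX + "programme/42", EX + "creator", EX + "person/7"),
    (EX + "person/7",     EX + "name",    "A. Producer"),
]

def objects(subject, predicate):
    """Return all objects asserted for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(EX + "programme/42", EX + "genre"))  # ['documentary']
```

Because objects can themselves be resources (the programme’s creator points to a person resource with its own triples), graphs from different providers that share a vocabulary can be merged directly, which is the interoperability the EBUCore/PBCore alignment is aiming at.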

Broadcasting needs automatic metadata generation to streamline asset management and improve content discovery (click to enlarge).

An essential condition for metadata convergence is agreement over what categories of description it covers. Originally the USA’s National Information Standards Organization (NISO) identified three categories: descriptive, structural and administrative metadata. Descriptive metadata describes aspects of an asset to aid discovery and identification, including title, abstract, author, actors involved and keywords specifying genre, as well as ideally drilling down deeper into the content. Structural metadata indicates how the media objects are constructed, which could highlight scenes or episodes. Finally, administrative metadata provides management information such as time and method of creation, file type and who can access it.

However, NISO subsequently split two sub-divisions out of this last category: rights management metadata, dealing with intellectual property rights, and preservation metadata, comprising the information needed to archive and conserve a resource. More recently these last two have been recognized as categories fundamentally separate from administrative metadata as far as AV assets are concerned. Rights management metadata, for example, deals with the protection of individual content assets during distribution and includes copyright as well as encryption policies, while administrative metadata covers higher-level usage and access rights.
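One way to make the five-category split concrete is to partition an asset’s metadata fields accordingly. The sketch below is a hedged illustration of that partition; the field names are invented for the example and are not drawn from the EBUCore or PBCore schemas themselves.

```python
# Hedged sketch: partitioning one AV asset's metadata into the five
# categories discussed above. Field names are illustrative, not taken
# from the actual EBUCore or PBCore schemas.

asset_metadata = {
    "descriptive": {              # aids discovery and identification
        "title": "Alpine Wildlife",
        "genre": "documentary",
        "keywords": ["mountains", "ibex"],
    },
    "structural": {               # how the media object is constructed
        "scenes": [{"start": "00:00:00", "end": "00:12:30", "label": "dawn"}],
    },
    "administrative": {           # management information
        "created": "2017-03-01T10:00:00Z",
        "file_type": "video/mp4",
    },
    "rights_management": {        # protection during distribution
        "copyright": "Example Broadcaster 2017",
        "encryption_policy": "AES-128",
    },
    "preservation": {             # archiving and conservation
        "archive_format": "FFV1/MKV",
    },
}

print(sorted(asset_metadata))
```

Note how the split mirrors the distinction drawn above: copyright and encryption policy sit with the individual asset’s rights management, while who may access the file is administrative.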

The models on which EBUCore and PBCore are based can therefore be split into those five categories. It is worth noting that of these, structural, administrative, rights management and preservation metadata by their nature tend to be generated at source. However, descriptive metadata in practice has to be added during the content’s lifecycle, because the field is constantly evolving with rising expectations over the ability to search at ever more detailed levels of granularity. Increasingly, users want to be able to select videos with specific combinations of actors, genres and scenes, for example, requiring more detailed descriptive metadata than was available at the time the content was produced. With content proliferation, it is impractical to generate all this metadata manually, so there has been increased focus on automation. Inevitably the field has been led by the big Internet players with their great resources, especially Google’s YouTube, which could not otherwise hope to keep its metadata in step with the thousands of new videos uploaded every day.
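The kind of granular selection described above, picking out videos that match a specific combination of actor and genre, can be sketched as a simple filter over descriptive metadata records. This is a toy illustration with invented records; a real catalogue would sit behind an indexed search engine rather than a linear scan.

```python
# Toy sketch of granular search over descriptive metadata: select
# videos matching a specific combination of genre and actor. Records
# are invented; a real catalogue would use an indexed search engine.

catalogue = [
    {"title": "Film A", "genre": "thriller", "actors": {"X. Lee", "Y. Kim"}},
    {"title": "Film B", "genre": "comedy",   "actors": {"X. Lee"}},
    {"title": "Film C", "genre": "thriller", "actors": {"Y. Kim"}},
]

def find(genre, actor):
    """All titles in the given genre featuring the given actor."""
    return [v["title"] for v in catalogue
            if v["genre"] == genre and actor in v["actors"]]

print(find("thriller", "Y. Kim"))  # ['Film A', 'Film C']
```

The point of the sketch is that such queries are only as good as the descriptive metadata behind them: if actors or scenes were never labelled, no amount of query sophistication recovers them, which is why automated enrichment has become the focus.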

Indeed, in March 2017 Google announced at its Cloud Next conference in San Francisco a machine learning API for automatically recognizing objects in videos and making them searchable. Google demonstrated how its new Video Intelligence API allowed developers to build applications that can automatically extract entities from a video, such as dogs or flowers, with the potential to drill down further into breeds of the former or species of the latter. This built on earlier work on recognition of such objects in still images and has great potential, even if a lot more work is needed to streamline automated metadata generation and integrate it effectively with search and discovery engines.
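The output of such a video-analysis service is typically a list of detected entities with time segments and confidence scores, which then has to be folded into the asset’s descriptive metadata. The sketch below shows that last step; the response structure here is a simplified stand-in invented for illustration, not the actual Video Intelligence API schema.

```python
# Hedged sketch: folding automatically detected labels into descriptive
# metadata keywords. The "detected" structure is a simplified stand-in
# for what a video-analysis API might return, not Google's actual schema.

detected = [
    {"entity": "dog",    "confidence": 0.92, "segment": (0.0, 12.5)},
    {"entity": "flower", "confidence": 0.81, "segment": (30.0, 41.0)},
    {"entity": "dog",    "confidence": 0.40, "segment": (55.0, 58.0)},
]

def to_keywords(labels, threshold=0.7):
    """Keep high-confidence entities as deduplicated, sorted keywords."""
    return sorted({l["entity"] for l in labels if l["confidence"] >= threshold})

print(to_keywords(detected))  # ['dog', 'flower']
```

A confidence threshold of this kind is one reason automated generation still needs human oversight: set it too low and spurious labels pollute discovery, too high and genuine content goes unlabelled.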
