MPEG-H Audio: The One Mix to Rule Them All (Part 2)

In addition to providing technological solutions for audio description, VIDEO TO VOICE is actively involved in other fields surrounding accessibility for digital media. For some time, VIDEO TO VOICE has been working in close partnership with the Fraunhofer Institute for Integrated Circuits (IIS) on topics such as automated mixing and MPEG-H Audio. The following article provides a detailed analysis of the tools that make up MPEG-H Audio, a technology developed by Fraunhofer IIS to improve accessibility and deliver the best sound experience possible.

This is the second part of our series exploring the object-based approach to audio production and Fraunhofer IIS's involvement in developing the MPEG-H Audio system. Part one focused on the fundamentals of traditional channel-based audio, the accessibility problems this approach causes, and the solutions that object-based audio can provide.

With the basics covered, it's now time to take a closer look at the ground-breaking technology behind MPEG-H Audio and the various tools content creators can take advantage of. Let's start with the MPEG-H Authoring Suite...

What does the MPEG-H Authoring Suite involve?

Authoring in this sense is the step where metadata is created to enable playback devices to deliver on the three key principles of interactivity, immersive sound, and universal delivery.

Metadata is information that describes an audio object's existence, position, and function.

The MPEG-H Authoring Suite (MAS) is where content creators create this metadata: setting up presets, enabling interactivity options for viewers, and defining the positions and properties of audio objects.

In the MAS, content creators can also monitor how their mix will be rendered in different layouts and make any necessary changes.
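
To make the idea of metadata more concrete, here is a minimal Python sketch of how an object-based scene could be described. It is an illustration only, not the MPEG-H metadata format or any Fraunhofer IIS API: every name in it (AudioObject, Preset, effective_gain_db, min_user_db, max_user_db) is hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AudioObject:
    """One element of the mix, described by metadata (hypothetical model)."""
    name: str                    # label shown to the viewer, e.g. "Dialogue"
    azimuth_deg: float = 0.0     # horizontal position; 0 = front centre
    elevation_deg: float = 0.0   # vertical position; 0 = ear level
    gain_db: float = 0.0         # default level set by the content creator
    min_user_db: float = 0.0     # largest cut the viewer may apply
    max_user_db: float = 0.0     # largest boost the viewer may apply


@dataclass
class Preset:
    """A named selection of objects with optional per-object level offsets."""
    name: str
    active_objects: list[str]
    gain_offsets_db: dict[str, float] = field(default_factory=dict)


def effective_gain_db(obj, preset, user_offset_db=0.0):
    """Return the playback gain in dB, or None if the preset leaves the object out."""
    if obj.name not in preset.active_objects:
        return None
    # The viewer's adjustment is limited to the range the creator allowed.
    allowed = max(obj.min_user_db, min(obj.max_user_db, user_offset_db))
    return obj.gain_db + preset.gain_offsets_db.get(obj.name, 0.0) + allowed


# Example scene: dialogue is adjustable, the music and effects bed is fixed.
dialogue = AudioObject("Dialogue", min_user_db=-6.0, max_user_db=9.0)
bed = AudioObject("Music and effects")
description = AudioObject("Audio description", azimuth_deg=-30.0)

default = Preset("Default", ["Dialogue", "Music and effects"])
dialogue_plus = Preset(
    "Dialogue+",
    ["Dialogue", "Music and effects"],
    gain_offsets_db={"Music and effects": -6.0},
)

# A viewer asking for +20 dB of dialogue only gets the permitted +9 dB.
print(effective_gain_db(dialogue, dialogue_plus, user_offset_db=20.0))
print(effective_gain_db(bed, dialogue_plus))
print(effective_gain_db(description, default))
```

The point of the sketch is that nothing is baked into the audio itself: the same objects can serve a default preset, a dialogue-enhanced preset, or a preset with audio description, and the playback device works out the final levels from the metadata.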

A man sitting at a computer wearing headphones.

Is it a complex process?

It depends on the end user's requirements, so some projects may be more intricate than others.

Content creators choose the required degree of interactivity. For example, they may want to enable accessibility features or audio labels in multiple languages.

Fraunhofer IIS has developed tools such as the MPEG-H Authoring Plug-in to make it as easy as possible to create complex features.

What can you do with the MPEG-H Authoring Plug-in?

The MPEG-H Authoring Plug-in (MHAPi) takes the user through all the steps of creating object- and channel-based MPEG-H Audio productions inside any VST3- or AAX-enabled digital audio workstation.

Content creators can export their immersive and interactive MPEG-H Audio scenes to MPEG-H master files that can be distributed via MPEG-H-enabled channels.

Screenshot of the MPEG-H Authoring Plug-in showing various meters and settings.

What if the content creator doesn't want to use a digital audio workstation?

Content creators can use the MPEG-H Authoring Tool (MHAT) to take care of the authoring step independently of the mixing process.

It is a stand-alone tool for Windows and macOS, which provides most of MHAPi's functionality; the only features missing are those that rely on timeline-based automation.

Content creators can use the MHAT to export their music, effects, and multiple dialogue mixes in different languages, before merging everything together into a single interactive delivery.

The MHAT can also be used as a player and to edit existing MPEG-H Audio masters.

Screenshot of the MPEG-H Authoring Tool. On the left, there are various audio objects including one for each of the audio description, commentary, German audio, French audio, and Italian audio.

Speaking of players, what is the best way to perform quality control checks before broadcast?

The MPEG-H Production Format Player is the best option.

It can play back MPEG-H Audio masters, preview interactivity features, render multiple layouts, and play back video in various formats such as H.264 and H.265.

Screenshot of the MPEG-H Production Format Player. There is a preview of the video top centre showing an apple on a tree. Below there are various meters and parameters in the tool.

There are a number of next-generation audio (NGA) systems and production tools out there. Do compatibility problems arise?

Several NGA systems and production tools now exist, resulting in different standards and proprietary file types. This can cause a lot of headaches for broadcasters and content creators.

As a result, the BWF/ADM format was created: an open standard, based on Broadcast Wave Format files that carry Audio Definition Model metadata, that serves as an archive and exchange format.

However, there are also different ADM file types, which is why Fraunhofer IIS developed the MPEG-H Conversion Tool (MCO) – a hub that converts ADM files from different tool sets into MPEG-H master files.
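
For readers curious about what a BWF/ADM file looks like under the hood: the ADM metadata is XML, stored in a chunk with the ID "axml" alongside the audio data. The Python sketch below shows one way to locate that chunk. It rests on several assumptions: it only handles plain RIFF/WAVE files (not the RF64/BW64 variants often used for large files), the function name read_axml_chunk is made up for this example, and the tiny in-memory file with an empty audioFormatExtended element is a placeholder rather than real ADM content. It says nothing about how the MPEG-H Conversion Tool works internally.

```python
import io
import struct


def read_axml_chunk(stream):
    """Return the bytes of the first 'axml' chunk in a RIFF/WAVE stream, or None."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file (RF64/BW64 is not handled here)")
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            return None  # reached the end without finding ADM metadata
        chunk_id, size = struct.unpack("<4sI", chunk_header)
        if chunk_id == b"axml":
            return stream.read(size)
        # Skip this chunk; chunks are padded to an even number of bytes.
        stream.seek(size + (size & 1), io.SEEK_CUR)


def make_chunk(chunk_id, payload):
    """Build one RIFF chunk: 4-byte ID, little-endian size, payload, padding."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Build a tiny in-memory file so the sketch runs without a real master file.
xml = b"<audioFormatExtended/>"  # placeholder, not real ADM content
body = b"WAVE" + make_chunk(b"fmt ", b"\x00" * 16) + make_chunk(b"axml", xml)
demo_bytes = b"RIFF" + struct.pack("<I", len(body)) + body

print(read_axml_chunk(io.BytesIO(demo_bytes)))
```

Finding the XML is the easy part. The compatibility problems described above come from what different tool sets put inside it, which is the gap a conversion hub is meant to close.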

Screenshot of the MPEG-H Conversion Tool. On the left is the source file with technical information listed. On the right is the target format with frame rate information.

Sounds great. Where can you get hold of these tools?

All the tools are available on Fraunhofer IIS's website free of charge.

VIDEO TO VOICE has been working closely with Fraunhofer IIS to integrate an evaluation tool for MPEG-H into its system. Feel free to contact us for more information.

Summary – what have we learnt?

MPEG-H Audio is an innovative object-based audio system that removes accessibility hurdles for the end user. Benefits include improved dialogue intelligibility through enhanced speech, multiple language versions and audio description within the same stream, accessible presets as part of the regular broadcast, and options to customize audio object levels and positions. In establishing the principles of interactivity, immersive sound, and universal delivery, MPEG-H Audio is certain to have a major impact on broadcast delivery services over the next few years.