This discrete 30-channel test system at McGill University served as a reference for identifying 30 different positions of a single immersive sound.
The candidate standard for the next generation of US broadcast television is on its way to becoming the new TV standard in homes worldwide by default. Engineers need to understand the details.
When Part 2 of this ATSC 3.0 series of reports concluded, the virtual mic was about to be handed to Skip Pizzi, NAB Senior Director of New Media Technologies. He presented the second half of the SMPTE-ATSC webinar, focusing on the essence formats carried by ATSC 3.0: video, audio and accessibility. The following story consists primarily of Mr. Pizzi's own words, in an edited-for-print version of his discussion.
ATSC 3.0 uses HEVC, the most efficient video compression currently available, to control broadcast bandwidth. Better systems may be specified later, but HEVC is the sole system specified to date; no other systems competed. HEVC is officially known as “MPEG-H Part 2” and also as “ITU-T H.265.” By any name, some of the specifics and constraints around HEVC are still under discussion, particularly how Ultra HD will be applied. Much is tentatively decided, but HDR, WCG and 3D support are still being specified. HEVC in ATSC 3.0 is being defined primarily for sending UHD to fixed-receiver large screens; for mobile TV reception environments, ATSC 3.0 is intended to provide HD 1080p.
ATSC 3.0 also has hybrid capabilities, meaning some elements are sent by broadcast and others by broadband, with the two used together as discussed in ATSC 3.0 Details Explained, Part 2. The format also supports a high frame rate (HFR) element intended for high-action content such as sports, as well as legacy interlaced and progressive formats at SD and HD resolutions.
UHD formats are progressive-only, and the resolutions aren’t necessarily fixed. Work is focused on 4K today, but perhaps it may ultimately move to 8K. Right now, HEVC capabilities seem to be limited to 4K for practical distribution on a 6 MHz wide US television channel.
ATSC 3.0 is designed to work in 6, 7, or 8 MHz wide channels for international compatibility. It also supports both fractional and integer frame rates, even at the higher frame rates.
There is temporal scalability as well, which would allow a 120 fps content element to be used at 60 Hz. The specifics of how to accomplish this dual frame rate, so that viewers at either rate get a good viewing experience, are being explored. Right now, new shutter-technique experiments are underway to help accomplish that goal.
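The idea of temporal scalability can be sketched as a base layer plus an enhancement layer. The following Python sketch is illustrative only; the frame-to-layer mapping (even frames as the 60 fps base) is an assumption for demonstration, not taken from the HEVC or ATSC specifications.

```python
# Sketch of temporal scalability: a 120 fps stream is conceptually split into
# a 60 fps base layer (here, even-numbered frames) plus an enhancement layer
# (odd frames). A receiver decodes only up to the layer it supports.

def temporal_layer(frame_index: int) -> int:
    """Layer 0 = base (60 fps); layer 1 = enhancement (completes 120 fps)."""
    return frame_index % 2

def frames_for_display(total_frames: int, max_layer: int) -> list[int]:
    """Return the frame indices a decoder keeps for its supported layer."""
    return [i for i in range(total_frames) if temporal_layer(i) <= max_layer]

# A 60 Hz receiver keeps half the frames; a 120 Hz receiver keeps them all.
base = frames_for_display(8, max_layer=0)   # every other frame
full = frames_for_display(8, max_layer=1)   # all frames
```

The open question the experiments address is shutter angle: frames captured for 120 fps playback can look strobed when only every other frame is shown at 60 Hz.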
Figure 1: The actual broadcast transmission of 4K takes significantly more bandwidth than all of the other Ultra HD elements identified.
To appreciate the spatial resolution of 4K at today's common viewing distances, viewers need a display larger than today's average home HDTV. They will either need a bigger screen or to sit closer to the screen than usual to get the value of 4K resolution. On the other hand, the other UHD improvements, such as HFR, HDR, WCG and 10-bit depth, can be seen and appreciated at any distance.
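The screen-size argument can be put in rough numbers. This back-of-envelope sketch assumes a visual acuity of about one arcminute per pixel (roughly 60 pixels per degree); the screen sizes and the 16:9 aspect ratio are assumptions for illustration, not figures from the webinar.

```python
import math

# Farthest viewing distance at which adjacent pixels are still resolvable,
# assuming ~1 arcminute of visual acuity. Beyond this distance, extra
# spatial resolution (e.g. 4K vs 2K) is invisible to the viewer.

def max_useful_distance_m(diagonal_inches: float, horiz_pixels: int,
                          aspect: float = 16 / 9) -> float:
    """Distance in meters where one pixel subtends one arcminute."""
    width_m = diagonal_inches * 0.0254 * aspect / math.hypot(aspect, 1)
    pixel_m = width_m / horiz_pixels
    one_arcmin = math.radians(1 / 60)
    return pixel_m / math.tan(one_arcmin)

# On a hypothetical 65-inch screen, 4K detail fades beyond roughly 1.3 m,
# while 2K detail persists out to roughly 2.6 m.
d_4k = max_useful_distance_m(65, 3840)
d_2k = max_useful_distance_m(65, 1920)
```

This is why the other UHD elements may matter more than pixel count: typical living-room distances are well beyond the range where 4K resolution is visible on common screen sizes.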
The 2K-to-4K upconversion in today's 4K TV sets is good, and 4K scaling itself is a relatively mechanical process. Recent experiments reveal it is difficult to see the difference between a true 4K signal and one downconverted to 2K for transmission, then upconverted back to 4K at the receiver.
The other UHD elements require professional human intervention such as color toning, shutter and frame rate choices. Right now, some broadcasters are thinking of taking advantage of UHD’s features at a 1080p frame rate, conserving bandwidth for more services. Ultimately, broadcasters can choose which elements they want to use and which they would rather not.
HDR/WCG is currently a candidate standard undergoing serious evaluation and examination.
An HDR/WCG spec could be complete by 3Q16. For now, it is included in the published ATSC candidate specification with “TBAs” and blanks in place of the upcoming HDR/WCG details.
The NAB hopes that ATSC 3.0 becomes the international standard for the totality of Ultra HD, which is much more than 4K. In fact, the 4K spatial resolution may become the least important UHD element.
ATSC 3.0 was intended to provide and improve audio in two areas: enhanced or immersive audio, and personalized audio. Both are enabled under the proposed candidate standard. So-called 3D audio is now described as immersive audio, which is at least 7.1+4 channels.
The +4 means 7.1 channels on the listener's plane plus four channels overhead, as shown in Figure 2. The system can optionally handle up to the 22.2-channel system NHK developed for its 8K format. Immersive audio also includes “higher-order ambisonics,” a technology that scales audio to the immersive capabilities the reproduction system can support, such as directionality or the number of channels being locally reproduced.
Immersive 7.1+4 audio has an amazing impact on the suspension of disbelief and engagement.
A more flexible approach to audio channels is “object-based audio,” where a sound is not dedicated to a particular speaker but instead exists in something of a virtual channel, with its level and location carried in metadata. When no longer needed, the object goes away, increasing the efficiency of immersive audio content delivery.
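The object-plus-metadata idea can be sketched in a few lines. This is an illustrative model only, not the ATSC 3.0 bitstream format: the field names, the unit listening cube, and the inverse-distance panning law are all assumptions chosen for clarity.

```python
from dataclasses import dataclass

# An audio "object" carries its identity, level and position as metadata;
# a renderer at the receiver maps it onto whatever speakers are present.

@dataclass
class AudioObject:
    name: str
    gain_db: float
    position: tuple[float, float, float]  # x, y, z in a unit listening cube
    active: bool = True                   # objects can appear and disappear

def render_gains(obj: AudioObject,
                 speakers: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Distribute an object's level across speakers, weighted by proximity."""
    if not obj.active:
        return {name: 0.0 for name in speakers}
    linear = 10 ** (obj.gain_db / 20)
    weights = {
        name: 1.0 / (0.01 + sum((a - b) ** 2 for a, b in zip(obj.position, pos)))
        for name, pos in speakers.items()
    }
    total = sum(weights.values())
    return {name: linear * w / total for name, w in weights.items()}
```

Because rendering happens at the receiver, the same object metadata can drive a 7.1+4 room, a soundbar, or headphones, which is the scalability argument for the approach.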
Figure 2: Darker blue speakers are lower, on the listener’s plane. The four lighter blue speakers are up above, providing a layer of height.
Monaural sound defines a point. Stereo defines a line. 5.1 surround defines a plane. Immersive audio defines the whole cubic 3-dimensional space and allows a sound to be put at any point in that space. Immersive audio adds to content enjoyment and engagement in a very convincing manner. It is also designed to work well with headphones, in anticipation of engaging mobile television.
Personalization allows broadcasters to provide a number of options to the viewer/listener typically not available in broadcast beyond the second audio program (SAP) channel. SAP (and audio description) switches the entire audio output from one program audio source to another.
Object-based audio allows broadcasters to treat each piece of content, such as dialog, effects and music, separately. Changing the dialog language, adding video descriptions or listening to the director’s commentary can be as simple as inserting a new audio track into the program while the multi-channel music and effects tracks remain unchanged.
Dialog enhancement is another feature; it allows listeners to adjust the volume of the chosen voice track relative to other elements such as music, effects or sports crowd noise. Some viewers may wish to eliminate the dialog altogether. It also allows broadcasters to add surround sound to a mix without obliterating the dialog, and broadcasters can choose to limit how much viewers are allowed to do with the audio.
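A minimal sketch of receiver-side dialog enhancement, assuming object-based delivery: the dialog object is scaled independently of music and effects, within limits set by the broadcaster. The track names, gain range and clamping behavior here are illustrative assumptions, not values from the standard.

```python
# Mix separately delivered audio objects, applying a user-chosen dialog gain
# that is clamped to broadcaster-imposed limits before mixing.

def mix(tracks: dict[str, list[float]], dialog_gain_db: float,
        min_db: float = -6.0, max_db: float = 12.0) -> list[float]:
    """Sum all tracks sample by sample; only "dialog" gets the user gain."""
    gain_db = max(min_db, min(max_db, dialog_gain_db))
    dialog_gain = 10 ** (gain_db / 20)          # dB to linear
    length = len(next(iter(tracks.values())))
    out = [0.0] * length
    for name, samples in tracks.items():
        g = dialog_gain if name == "dialog" else 1.0
        for i, s in enumerate(samples):
            out[i] += g * s
    return out
```

Setting `min_db` low enough effectively lets a viewer remove dialog, while a broadcaster who wants dialog always audible simply raises that floor.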
Hybrid content allows broadcasters to make dialog tracks in multiple languages available. When an audio track in a less common language is received online, the video and other audio tracks are tightly synchronized with the internet audio feed, down to the sample or sub-sample level.
ATSC 3.0 audio provides many new possibilities for content creators, broadcasters, viewers and listeners.
The ATSC 3.0 audio coding system is two generations beyond the Dolby AC3 system used in ATSC 1.0 today. ATSC tests verify higher audio quality in ATSC 3.0, and it maintains support for stereo, mono and 5.1 surround, as well as loudness control and today’s 1.0 accessibility features. ATSC 3.0 improves lip-sync so that it is maintained to a very tight tolerance in any kind of delivery scenario. The system is also low-latency, which is necessary for live content.
Because multiple competitive proposals were evaluated, the audio bitrates tested are all lower than Dolby AC3’s, reflecting higher codec efficiency. Last year, subjective tests of bitrates and features were performed around the world, as were tests of immersive headphone systems. Technically and acoustically, all performed well. Recommendations based on the results were made earlier this spring.
Audio remains a candidate standard. It sets up a common framework for the immersive and personalized audio systems, with extensibility for the future, and it allows multiple current systems to be used. Two systems performed well enough to be adopted. Part 2 of the standard specifies Dolby AC4 and a subset of MPEG-H Part 3, the so-called 3D audio system MPEG developed, as presented by the MPEG-H Audio Alliance, a group formed by Qualcomm, Technicolor and Fraunhofer IIS. Either system can be used within the common framework the standard proposes, but the two are not interoperable without conversion. Usage will later be recommended in a separate document.
The audio candidate standard is open until the end of November. Initial comment resolution was recently completed, revisions are expected and work continues.
ATSC 3.0 has Video Description for the Visually Impaired (VI), Closed Caption, Closed Signing and Dialog Intelligibility features. It also provides for emergency crawls and audio tracks, and emergency information beyond the pure alerting process. Needs are always expanding to do better.
Captioning is not necessarily embedded in the video; it is its own data stream. Optionally, it can remain embedded in the video as well, to make its way through MVPDs. The mandatory element is a web-like approach to captioning. Other languages could also be sent through broadband.
Closed Signing is not a legal requirement like captioning, but sign language can be much more beneficial for some profoundly deaf people. It can be added in a one-to-many scenario, as a picture-in-picture window large enough to see the expressions of the signer.
Emergency alerts and messaging technologies are completely different from anything used before, analog or digital.
The discrete broadcast and broadband transmission of the various elements of ATSC 3.0 gives broadcasters significant discretion in what they send, reducing worry about annoying those who don’t care about, or aren’t affected by, a particular emergency. Those who want more information can continue to display it while others can turn it off.
ATSC 3.0 can also wake up a receiver for the initial alert if the user opts in to that particular capability. The standard has been written to require a very low trickle of current to keep that part of the system awake. Work towards Energy Star compliance is in progress.
The few bits used to do the emergency wakeup are exposed in the so-called “bootstrap” at the very top edge of the signal. The system doesn’t have to unpack or decode much to get its initial wake-up call. When the receiver wakes up, it can then find and present more details.
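The wake-up logic can be sketched very simply. Note this is a conceptual model only: the real bootstrap is a physical-layer structure, and the two-bit field name and toggle semantics below are assumptions for illustration.

```python
# A dormant receiver's trickle-powered front end inspects a couple of
# easily reached bits in the bootstrap; only a change of state in those
# bits causes it to power up the full decoder for details.

EA_WAKE_MASK = 0b11  # hypothetical 2-bit emergency wake-up field

def should_wake(bootstrap_flags: int, previous_flags: int) -> bool:
    """Wake the receiver when the wake-up bits differ from the last value seen."""
    return (bootstrap_flags & EA_WAKE_MASK) != (previous_flags & EA_WAKE_MASK)
```

Keying the wake-up on a state change, rather than a set bit, means a receiver that was off during an ongoing alert still wakes when it next sees the changed value, while a receiver that already woke is not re-triggered.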
In the upcoming Part 4 finale of ATSC 3.0 Details Explained, Mark Corl of Triveni Digital will complete the ATSC 3.0 walk-through with a technical discussion of bootstrap, signaling and announcement.