The bewildering number of video and audio compression formats available is difficult for those new to the industry to come to terms with. For broadcast engineers and IT engineers to work effectively together, IT engineers must understand the formats used, the legacy systems still in place, and the reasoning behind their existence. In this article, from the series Broadcast for IT, we investigate compression formats.
A compressed video or audio television program is either stored on tape or in a file, or distributed over a network for live transmission. The video, audio, and metadata elements are called “essence” files or streams. To link them together, they rely on a “wrapper”.
In the specification for MPEG-2, part 2 describes the video essence, and part 3 describes the audio essence. MPEG-2 part 1 describes the transport layer and shows how the video, audio, and metadata are linked together in a transport stream to facilitate live transmission.
Each essence stream is divided into 184-octet payloads, and each payload is prefixed with a 4-octet header containing the Packet Identifier (PID) to form a 188-octet packet. Each essence stream has its own unique PID. A video PID might have a value of 26, and the audio might be 27. More than one program can exist within a transport stream, resulting in hundreds of PID streams in a highly optimized system.
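The packet layout described above can be sketched in a few lines of Python. This is a minimal illustration rather than a compliant multiplexer: the function names are hypothetical, the essence is treated as raw bytes with no PES layer, and short payloads are simply padded with 0xFF where a real system would use an adaptation field for stuffing.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 octets
SYNC_BYTE = 0x47  # every transport stream packet starts with this value


def make_ts_packet(pid, payload, continuity_counter, payload_unit_start=False):
    """Build one 188-octet packet: 4-octet header plus 184-octet payload."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    if len(payload) > TS_PAYLOAD_SIZE:
        raise ValueError("payload longer than 184 octets")
    # Simplification (assumption): pad short payloads with 0xFF.
    payload = payload.ljust(TS_PAYLOAD_SIZE, b"\xff")
    byte1 = (0x40 if payload_unit_start else 0x00) | (pid >> 8)  # top 5 PID bits
    byte2 = pid & 0xFF                                           # low 8 PID bits
    byte3 = 0x10 | (continuity_counter & 0x0F)  # 0x10 = payload only, no adaptation field
    return bytes([SYNC_BYTE, byte1, byte2, byte3]) + payload


def packetize(pid, essence):
    """Split an essence byte string into a list of 188-octet packets."""
    packets = []
    for i, offset in enumerate(range(0, len(essence), TS_PAYLOAD_SIZE)):
        chunk = essence[offset:offset + TS_PAYLOAD_SIZE]
        packets.append(make_ts_packet(pid, chunk, i, payload_unit_start=(i == 0)))
    return packets


def parse_pid(packet):
    """Read the 13-bit PID back out of a packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a transport stream packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]
```

Calling packetize(26, video_bytes) and packetize(27, audio_bytes) would give two lists of packets that a receiver can tell apart with parse_pid alone, which is what allows many essence streams to share one transport stream.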
MPEG-2 didn’t lend itself well to recording on computer hard disk drives, as the entire transport stream had to be streamed to the disk drive. Typically, this would be a stream of 270 Mbit/s.
Diagram 1 – Video, audio, and metadata essence files are divided into chunks, carried in 188-octet packets, and multiplexed together to form a single transport stream for broadcast.
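The multiplexing shown in Diagram 1, and the reverse operation at the receiver, might be sketched as follows. The round-robin interleave is an assumption made for brevity; a real multiplexer schedules packets according to each stream's bit rate and also inserts program tables, which are omitted here. The fake_packet helper is hypothetical and exists only to produce test data.

```python
from collections import defaultdict
from itertools import zip_longest

PACKET_SIZE = 188
HEADER_SIZE = 4
SYNC_BYTE = 0x47


def pid_of(packet):
    """Extract the 13-bit PID from octets 1 and 2 of the header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def multiplex(*streams):
    """Interleave packets from several essence streams, round-robin."""
    muxed = bytearray()
    for group in zip_longest(*streams):
        for packet in group:
            if packet is not None:  # shorter streams simply run out
                muxed += packet
    return bytes(muxed)


def demultiplex(transport_stream):
    """Split a transport stream back into per-PID lists of payloads."""
    by_pid = defaultdict(list)
    for offset in range(0, len(transport_stream), PACKET_SIZE):
        packet = transport_stream[offset:offset + PACKET_SIZE]
        if len(packet) != PACKET_SIZE or packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at offset {offset}")
        by_pid[pid_of(packet)].append(packet[HEADER_SIZE:])
    return dict(by_pid)


def fake_packet(pid, fill):
    """Hypothetical helper: a packet whose payload is one repeated value."""
    header = bytes([SYNC_BYTE, pid >> 8, pid & 0xFF, 0x10])
    return header + bytes([fill]) * (PACKET_SIZE - HEADER_SIZE)
```

Note that demultiplex has to walk the whole stream to find every packet for a given PID, which is the same property that made transport streams awkward to use as a disk recording format.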
Some vendors specified their own variants to facilitate disk drive recording. These included recording essence data into individual files and referencing them from one file directory. Audio and video were stored in separate files, making it possible to lose one of them or even to pair the wrong audio with the video.
As we moved from tape to file-based storage, exchanging media between broadcasters and post houses also started to become an issue. Tape formats are well-defined storage media. If a program was stored on a Betacam tape, then it would always play the video, audio, and timecode correctly on a compatible player. But the same was not true of file-based storage formats as alternatives to tape distribution were sought.
The MXF (Material Exchange Format) wrapper was a SMPTE (Society of Motion Picture and Television Engineers) initiative designed to remedy the issues of disk storage and media exchange. The MXF wrapper describes the codec used and how the data is formatted within the stream. MXF was designed to be future-proof and can be incredibly complex. A series of Operational Patterns were specified to categorize and simplify its operation.
OP1a defines the layout options for a minimal, simple MXF file. To assist editing, it’s important that the video, audio, timecode, and metadata tracks are all temporally close to each other within the disk storage system.
Files Must Be Segmented
If the video is at the start of a file stream, and the audio at the end, the edit or media processing system will have to ingest the entire file to be able to retrieve the audio associated with the video. This may work well for short video segments, but a one-hour program at 40 Mbit/s would need about 144 Gbit (18 GB) of RAM to hold the entire file before processing. Clearly this is inefficient and hinders the use of IT infrastructures. MXF addressed this by interleaving the video and audio together so only shorter, more manageable chunks of data had to be stored in RAM for processing.
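The buffering argument comes down to simple arithmetic, which can be checked with a short sketch. It assumes decimal units, a constant 40 Mbit/s stream, and a hypothetical one-second interleave interval; the interval is chosen only for illustration and is not a figure from the MXF specification.

```python
def buffer_bits(bitrate_bits_per_s, duration_s):
    """Bits that must be held in memory to buffer this much of the stream."""
    return bitrate_bits_per_s * duration_s


def bits_to_gigabytes(bits):
    """Convert bits to decimal gigabytes (8 bits per byte, 10**9 bytes per GB)."""
    return bits / 8 / 1e9


BITRATE = 40_000_000           # 40 Mbit/s, as in the text
ONE_HOUR = 60 * 60             # seconds
INTERLEAVE_INTERVAL = 1        # seconds (assumption for illustration)

whole_program_bits = buffer_bits(BITRATE, ONE_HOUR)
interleaved_chunk_bits = buffer_bits(BITRATE, INTERLEAVE_INTERVAL)

print(f"whole file:        {bits_to_gigabytes(whole_program_bits):.3f} GB")
print(f"interleaved chunk: {bits_to_gigabytes(interleaved_chunk_bits):.3f} GB")
```

For the essence alone, one hour works out at 144 Gbit, and any wrapper overhead adds to that. With one-second interleaving the working buffer falls to 40 Mbit, a reduction by a factor of 3,600.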
Various SMPTE specifications define how particular codecs are mapped into MXF, such as SMPTE 383M for DV and SMPTE 386M for D-10.
Diagram 2 – Individual audio and video essence, caption, timecode, and other metadata components are mapped to the wrapper to form an MXF file.
QuickTime MOV is a media format wrapper created by Apple and is very closely related to the MPEG-4 wrapper, or MP4, which was derived from it. Both can carry many different codecs. MOV is commonly used with Apple’s proprietary codecs, whereas MP4 is typically used with standardized codecs, including the MPEG-4 video and audio codecs.
Wrappers are sometimes called containers. Both terms describe the same task, which is to present and define where the essence data exists in the media stream or file.
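Because each wrapper lays out its data differently, software can often identify one from the first few octets of a file. The sketch below is a rough heuristic, not a validator. It relies on signatures that are well known for each format, but the function name is hypothetical and several edge cases are ignored: older MOV files may have no "ftyp" box, and an MXF file may begin with a run-in of up to 64 KiB before its header partition key.

```python
# First 13 octets of the SMPTE key that opens an MXF partition pack.
MXF_PARTITION_KEY = bytes.fromhex("060e2b34020501010d01020101")
MXF_MAX_RUN_IN = 65536  # octets that may precede the key
TS_PACKET_SIZE = 188
TS_SYNC_BYTE = 0x47


def sniff_container(data):
    """Guess the wrapper type from the leading bytes of a file."""
    # AVI is a RIFF file whose form type is "AVI ".
    if data[:4] == b"RIFF" and data[8:12] == b"AVI ":
        return "AVI"
    # MP4 and modern MOV files start with a box of type "ftyp";
    # the major brand "qt  " marks a QuickTime file.
    if data[4:8] == b"ftyp":
        return "QuickTime MOV" if data[8:12] == b"qt  " else "MP4"
    # MXF: look for the partition pack key, allowing for a run-in.
    if MXF_PARTITION_KEY in data[:MXF_MAX_RUN_IN + len(MXF_PARTITION_KEY)]:
        return "MXF"
    # Transport stream: a sync byte at the start of consecutive packets.
    if (len(data) > TS_PACKET_SIZE
            and data[0] == TS_SYNC_BYTE
            and data[TS_PACKET_SIZE] == TS_SYNC_BYTE):
        return "MPEG-2 transport stream"
    return "unknown"
```

In use, the first 64 KiB or so of a file would be read and passed to sniff_container. Identifying the wrapper says nothing about which codec is inside it; that has to be read from the wrapper’s own metadata.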
DV Started as Tape
DV was standardized within the IEC 61834 family of standards and launched in 1995 as a tape format. It uses lossy video compression but uncompressed audio. Video is intra-frame coded using 4:1:1 color subsampling in 525-line (NTSC) systems and 4:2:0 in 625-line (PAL) systems, and is packetized into Digital Interface Format (DIF) blocks. DIF blocks can be stored as individual computer files or wrapped using QuickTime (QT), Audio Video Interleave (AVI), or MXF.
Both Panasonic and Sony improved on DV to provide their own versions. DVCPRO was introduced by Panasonic in 1995 specifically for electronic news gathering. It uses 4:1:1 chroma subsampling in both 525-line and 625-line systems and locks the audio to the video to remove the risk of lip-sync errors that were seen during multiple transfers.
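To give a sense of scale for DIF blocks, the sketch below works out the size of one 25 Mbit/s DV frame. The constants are not taken from this article; they are the commonly documented DV figures (80-octet DIF blocks, 150 blocks per DIF sequence, and 10 or 12 sequences per frame), and the function names are hypothetical.

```python
DIF_BLOCK_SIZE = 80             # octets per DIF block
DIF_BLOCKS_PER_SEQUENCE = 150   # blocks in one DIF sequence
SEQUENCES_PER_FRAME = {"525/60": 10, "625/50": 12}


def frame_size_bytes(system):
    """Size in octets of one DV frame for the given television system."""
    return DIF_BLOCK_SIZE * DIF_BLOCKS_PER_SEQUENCE * SEQUENCES_PER_FRAME[system]


def split_dif_blocks(frame):
    """Cut a DV frame into its individual 80-octet DIF blocks."""
    if len(frame) % DIF_BLOCK_SIZE:
        raise ValueError("frame length is not a whole number of DIF blocks")
    return [frame[i:i + DIF_BLOCK_SIZE]
            for i in range(0, len(frame), DIF_BLOCK_SIZE)]
```

Under these assumptions a 525/60 frame occupies 120,000 octets and a 625/50 frame 144,000 octets. Because every frame is the same size, a raw DV file can be indexed by frame number with a simple multiplication.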
DVCAM is Sony’s variant and increases the width of the track recorded on tape to facilitate frame-accurate insert editing.
DV can also be recorded on computer storage, and so both DVCPRO and DVCAM can record onto disk drives.
Advanced Video Coding High Definition (AVCHD) was developed by Sony and Panasonic in 2006 and released into the professional market in 2008. Video is compressed using MPEG-4 AVC (H.264), and the audio is either uncompressed or uses Dolby’s AC-3. The file layout is derived from Blu-ray and can be used as a file distribution format.
MPEG-4 was introduced in 1998 and uses the concept of multimedia objects. MPEG-4 is an evolving standard currently consisting of 33 parts. To be truly compliant, a vendor must specify which parts they are compliant with. This rarely happens, resulting in some playout-record combinations not working.
MPEG-4 was designed to be incredibly versatile and the vendors are left to decide which parts they include in their product. Profiles and levels are defined within the standard, so vendors can best define which parts they include. For example, profile SP-L5 describes D1-NTSC or D1-PAL.
The Interoperable Master Format (IMF) is the latest addition to the formats family. It is designed for file-based distribution with localization and multi-versioning. Essentially, IMF defines a framework to allow programs and films to be distributed with regional variations.
When a film or television program aimed at multi-national markets is released, many versions of the same program will need to be made. There might be different language versions, or scenes acceptable in one culture that will not be acceptable in another. Using IMF, distributors make one common release for all broadcasters, but include the different shots needed in the same package. The broadcaster requiring the different shots, subtitles, or language tracks, will select them and the playout system will automatically insert them as appropriate.
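The idea of one package serving many versions can be illustrated with a toy data model. Everything here is hypothetical: the class, field, and file names are invented for the example and do not reflect the actual structure or file formats used by IMF.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical stand-in for a single multi-version release."""
    common_shots: list                                  # shot IDs, in playout order
    replacements: dict = field(default_factory=dict)    # version -> {shot: alternate}
    audio_tracks: dict = field(default_factory=dict)    # language -> track file


def build_version(package, version, language):
    """Assemble the shot list and audio track for one regional version."""
    swaps = package.replacements.get(version, {})
    # Use the alternate shot where one is defined, else the common one.
    shots = [swaps.get(shot, shot) for shot in package.common_shots]
    return {"shots": shots, "audio": package.audio_tracks[language]}


package = Package(
    common_shots=["shot1", "shot2", "shot3"],
    replacements={"region_b": {"shot2": "shot2_alt"}},
    audio_tracks={"en": "audio_en.wav", "fr": "audio_fr.wav"},
)

print(build_version(package, "region_a", "en"))
print(build_version(package, "region_b", "fr"))
```

The common material is stored once, and each version is described only by its differences, which is what keeps a multi-version package smaller than a set of separately mastered programs.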
IMF is still in its infancy but has potential to greatly simplify film and program distribution between production houses and broadcasters.
The proliferation of differing vendor-specific compression formats has led to many variants of seemingly compatible formats which should work together, but do not always. An entire industry has grown up around interfacing different versions of similar standards, so video and audio can be easily exchanged between broadcasters and post houses.