Compression: Part 1 - Defining Terms

Compression is the ultimate enabling technology behind broadcasting. Without it, life would be very difficult indeed. In this new series, the whole topic will be explored at some depth.



Compression introduces a whole slew of buzzwords and acronyms, and these will be defined as we go along. The first of these is codec, the series combination of an encoder and a decoder. The sole purpose of an encoder is to reduce the bandwidth or bit rate of a video signal, for either practical or economic reasons. Anything the decoder can predict for itself is called redundancy, and the encoder need not send it. In order to obtain the benefit of a lower bit rate we have to put up with a number of other characteristics that may or may not become drawbacks, depending on the application.

At one level, digital video is just data, so any data compression technique could be used on it. Many data codecs are lossless, in that what comes out is bit-identical to what went in. That's a basic requirement to compress computer instructions and bank statements, for example, but the compression factors achieved are not enough to meet the demands of television. Practical codecs used in broadcasting are lossy, which means that what comes out is not identical to what went in. To be precise, the loss takes place in the encoder and the decoder is essentially blameless.
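
As a minimal sketch of what lossless means, the following uses Python's standard zlib module as a stand-in for a general-purpose data codec. The payload is a made-up, highly repetitive text chosen for illustration; the factor it achieves says nothing about what such a codec would achieve on video.

```python
import zlib

# Hypothetical payload: repetitive text, so it compresses very easily.
original = b"ACCOUNT 0001 BALANCE 100.00\n" * 200

compressed = zlib.compress(original, 9)   # encoder
restored = zlib.decompress(compressed)    # decoder

# Lossless: what comes out is bit-identical to what went in.
assert restored == original

compression_factor = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes "
      f"(factor {compression_factor:.1f})")
```

A lossy codec would fail the equality check by design; the question then becomes how visible the differences are.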

If we were to subtract a codec output frame from the original frame, pixel by pixel, we would have an objective record of the differences. The differences represent a distortion of the original. Distortions may also be called artifacts. Some authors refer to distortion as coding noise, which is a misnomer. Noise is an unwanted random signal added to a wanted signal. Coding distortion is not necessarily random. The common compression artifact known as blocking is highly structured and correlated with the signal. The concern here is not simply one of terminological pedantry. Assuming something is random when it isn't can lead to incorrect technical conclusions.
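
The pixel-by-pixel subtraction described above can be sketched as follows. The two tiny frames are invented values, represented as lists of rows of luma samples; a real comparison would use full decoded frames.

```python
def difference_frame(original, decoded):
    """Subtract the decoded frame from the original, pixel by pixel."""
    return [
        [o - d for o, d in zip(row_original, row_decoded)]
        for row_original, row_decoded in zip(original, decoded)
    ]

# Hypothetical 2 x 4 frames (illustrative values only).
original = [[10, 12, 14, 16],
            [10, 12, 14, 16]]
decoded = [[11, 11, 15, 15],
           [11, 11, 15, 15]]

diff = difference_frame(original, decoded)
largest_error = max(abs(value) for row in diff for value in row)

for row in diff:
    print(row)
print("largest absolute error:", largest_error)
```

In this made-up case the differences form a regular alternating pattern rather than anything random, which is the distinction the text draws between distortion and noise.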

Given that the human visual system (HVS) has finite powers, a well-designed codec might arrange the distortions in such a way that the HVS is essentially unaware of them, rather like the actions of a magician. This suggests two important conclusions. The first is that a well-designed encoder must rely on an intimate knowledge of the HVS. The second is that assessment of the quality of an encoder is non-trivial. Ultimately television is designed for human consumption and the HVS must be the quality arbiter. Any piece of equipment seeking to replicate the human quality assessment must also contain a model of the HVS.

Our hypothetical codec in which the distortion is invisible will achieve a certain compression factor, the ratio of the input bit rate to the output bit rate, but it is important to realize that the same performance will only be achieved on material of similar difficulty. In the real world, video material differs quite alarmingly in the amount of information, that is, the amount of unpredictability, in the signal. A cartoon, for example, is compressor heaven because, by definition, the images are simplified. The leaves of trees blowing in the wind and light reflecting from water waves are both difficult for an encoder because of the complex motion combined with significant detail.
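
The compression factor is simple arithmetic, sketched below. The source format (1920 x 1080, 25 frames per second, 8-bit 4:2:0) and the 8 Mbit/s output rate are assumptions chosen for illustration, not figures from this article.

```python
def uncompressed_bit_rate(width, height, frames_per_second,
                          bits_per_sample, samples_per_pixel):
    """Bit rate of the active picture before compression, in bits per second."""
    return (width * height * frames_per_second
            * bits_per_sample * samples_per_pixel)

def compression_factor(input_bit_rate, output_bit_rate):
    """Ratio of the input bit rate to the output bit rate."""
    return input_bit_rate / output_bit_rate

# Assumed source: 4:2:0 sampling averages 1.5 samples per pixel.
source_rate = uncompressed_bit_rate(1920, 1080, 25, 8, 1.5)
coded_rate = 8_000_000  # assumed encoder output, bits per second

factor = compression_factor(source_rate, coded_rate)
print(f"{source_rate / 1e6:.2f} Mbit/s in, {coded_rate / 1e6:.2f} Mbit/s out, "
      f"factor {factor:.2f}")
```

The number on its own says nothing about quality: the same factor could be invisible on a cartoon and objectionable on rippling water.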

Fig.1 - The complexity of a codec rises with the compression factor

This suggests that testing or demonstrating a codec on unknown material is meaningless because if the material is easy the codec will appear to perform well. Codecs need to be compared on the same material to avoid that. A metric for the performance of a coder needs to be based on the compression factor obtained with respect to the difficulty of the material. Measuring that is non-trivial.

Another result is that we can have two types of compression system. In one, the compression factor and the output bit rate are constant and the distortion is a function of the difficulty. In the other, the amount of distortion is constant but the compression factor and the bit rate must vary with the difficulty. In applications such as video disks, it is easy to adjust the data rate at the player, so a variable compression factor is the obvious choice. The high quality achieved by video disks is partly down to that.

However, in broadcasting and data transmission, variable bit rate causes all kinds of practical problems. One solution is available when a number of TV channels share the same multiplex. The overall bit rate of the multiplex can be constant, but the way it is divided between the channels can change dynamically using statistical multiplexing. This assumes that the channels are uncorrelated, so that a difficult segment in one channel will occur when the other channels are having it easy and can give up bandwidth.
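
The idea of dividing a constant multiplex between channels can be sketched as below. Sharing in proportion to a per-channel difficulty score, with a guaranteed floor for each channel, is an assumption made for illustration; the function name, the scores and the bit rates are all hypothetical, and practical statistical multiplexers use their own allocation rules.

```python
def share_multiplex(total_bit_rate, difficulties, minimum_bit_rate=0):
    """Split a constant total bit rate between channels.

    Each channel gets minimum_bit_rate, and the remainder is divided
    in proportion to its current difficulty score (assumed allocation rule).
    """
    channels = len(difficulties)
    reserved = minimum_bit_rate * channels
    if reserved > total_bit_rate:
        raise ValueError("minimum bit rates exceed the multiplex capacity")
    spare = total_bit_rate - reserved
    total_difficulty = sum(difficulties)
    if total_difficulty == 0:
        # Nothing is difficult: share the multiplex equally.
        return [total_bit_rate / channels] * channels
    return [minimum_bit_rate + spare * d / total_difficulty
            for d in difficulties]

# Hypothetical 24 Mbit/s multiplex carrying three channels.
# The third channel is currently showing difficult material.
shares = share_multiplex(24, [1, 1, 4], minimum_bit_rate=2)
print(shares, "total:", sum(shares))
```

The total stays fixed however the scores change. If all channels became difficult at once, the shares would even out and every channel would suffer, which is why the uncorrelated assumption matters.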

Although there are many types of codec, they must all abide by the same rules that follow from information theory. One of these rules is shown in Fig.1, which is that the complexity of an encoder increases exponentially as a function of the required performance. Complexity can be measured by the number of computational operations required per second or per pixel. Early compression techniques such as interlace and color difference working were simple enough to be implemented with vacuum tubes in the analog domain.

Fig.2 - The coding delay rises with the compression factor.

Going further had to wait for digital techniques, which then brought the rules of microelectronics into the game. One of these is Moore's Law, which predicts the way digital hardware gets faster and cheaper with time. Clever compression techniques remain academic to the broadcaster if they cannot be implemented at consumer prices, so the complexity of television codecs follows Moore's Law quite closely.

Another fundamental truth is shown in Fig.2, which is that the codec delay increases with performance. This also follows from information theory, because finding more redundancy through time requires a larger number of frames to be considered. Those of us who can remember NTSC know that the addition of the color subcarrier caused the sequence length of the signal to increase from two fields to four (eight in the case of PAL). That was the harbinger of the GOP (group of pictures) of MPEG (officially the Moving Picture Experts Group, less reverently moving pictures by educated guesswork). When digital television was new and wonderful, one could wander into a TV shop and see that the digital channels were significantly delayed relative to the analog versions.

Most codecs are asymmetrical, which means that the encoder does more work than the decoder. This follows from the fact that the decoder is deterministic: it does exactly what the compressed bitstream tells it to do. In contrast, the encoder has an array of compression tools to choose from and has to figure out which ones work best as the incoming video changes. Practical encoders actually contain a decoder, so the encoder knows exactly what the decoder knows. Asymmetry is a good thing for a broadcaster because there will always be more decoders than encoders, and it is better to have a few expensive encoders driving a lot of cheap decoders. As consumers increasingly watch on portable devices, reduced decoder complexity also helps battery life.
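
Why an encoder contains a decoder can be illustrated with a deliberately simple one-dimensional predictive coder. This is a sketch under stated assumptions, not how any particular video codec works: the prediction is just the previously decoded sample, the loss comes from a uniform quantizer, and the class names, step size and sample values are hypothetical.

```python
def quantize(value, step):
    """Round to the nearest multiple of step. This is where the loss occurs."""
    return step * round(value / step)

class Decoder:
    """Deterministic: adds each received residual to its last output."""
    def __init__(self):
        self.previous = 0

    def decode(self, residual):
        self.previous += residual
        return self.previous

class Encoder:
    """Predicts from what the decoder will have, not from the original."""
    def __init__(self, step):
        self.step = step
        self.local_decoder = Decoder()  # the decoder inside the encoder

    def encode(self, sample):
        prediction = self.local_decoder.previous
        residual = quantize(sample - prediction, self.step)
        self.local_decoder.decode(residual)  # keep in step with the far end
        return residual

samples = [10, 13, 19, 18, 25]  # hypothetical input values
encoder = Encoder(step=4)
decoder = Decoder()

encoder_view = []  # what the encoder believes the decoder has produced
decoded = []       # what the remote decoder actually produces
for sample in samples:
    residual = encoder.encode(sample)
    encoder_view.append(encoder.local_decoder.previous)
    decoded.append(decoder.decode(residual))

print("decoded:     ", decoded)
print("encoder view:", encoder_view)
```

Because the encoder predicts from its own copy of the decoder state, quantizing errors do not accumulate from sample to sample: each decoded value stays within half a quantizer step of the original. The decoder's work is one addition per sample, while the encoder does everything else.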

Composite video was about the last format in which the encoder was defined. In the digital domain it is the encoded signal that is defined and the standards documents are practically silent on how the encoding should be done. This is as it should be because manufacturers invest thousands of man-hours in encoder design and naturally wish it to be proprietary. When it is not possible to tell from the bit stream how the encoder works, the intellectual property of the manufacturer is safe.

The encoded signal has some of the attributes of a language, in that it has a vocabulary, which the decoder must understand if it is to work properly. However, the encoder is not compelled to use the entire vocabulary if the objective is to deliver a cheaper product. In order to offer flexibility, the coding standard may be divided into levels, each of which sets a maximum frame size, and within each level there may be more than one profile, which allows some variation in complexity and cost.

Coders can also be hierarchical, which means that two or more qualities are available depending on the decoder. A base signal is sent to all decoders, and an optional enhancement signal can be sent to decoders that can use it. The enhancement signal might improve the picture resolution, or it might reduce the level of coding distortion.
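
A minimal sketch of the base-plus-enhancement idea follows, for the case where the enhancement reduces coding distortion rather than raising resolution. It assumes a coarse uniform quantizer for the base signal and a finer one for the leftover error; the function names, step sizes and sample values are hypothetical.

```python
def encode_layers(samples, base_step, enhancement_step):
    """Return a coarse base layer and a finer correction layer."""
    base = [base_step * round(s / base_step) for s in samples]
    enhancement = [
        enhancement_step * round((s - b) / enhancement_step)
        for s, b in zip(samples, base)
    ]
    return base, enhancement

def decode(base, enhancement=None):
    """A basic decoder ignores the enhancement; a better one adds it."""
    if enhancement is None:
        return list(base)
    return [b + e for b, e in zip(base, enhancement)]

def total_error(original, decoded):
    return sum(abs(o - d) for o, d in zip(original, decoded))

samples = [10, 13, 19, 18, 25]  # hypothetical input values
base, enhancement = encode_layers(samples, base_step=8, enhancement_step=2)

basic_output = decode(base)
enhanced_output = decode(base, enhancement)

print("base only:    ", basic_output, "error", total_error(samples, basic_output))
print("with enhance.:", enhanced_output, "error", total_error(samples, enhanced_output))
```

Both decoders receive the same base signal, so a simple decoder still produces a usable result, while a decoder that understands the enhancement signal gets closer to the original.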

Having come this far, it should be clear what the major topics for discussion are. These will include the concepts of prediction, both within and between pictures, the importance of motion compensation, why and how transform coding is used, how pictures are assembled into groups, how the picture is divided into blocks, how the bit rate is controlled, the differences between various popular codecs, and how codecs can be tested and compared. Pre-processing, the treatment of video signals to make them suitable for compression, will also be considered.
