Compression is the ultimate enabling technology behind broadcasting. Without it, life would be very difficult indeed. In this new series, the whole topic will be explored at some depth.
Compression introduces a whole slew of buzzwords and acronyms and these will be defined as we go along. The first of these is codec, the series combination of an encoder and a decoder. The sole purpose of an encoder is to reduce the bandwidth/bit rate of a video signal either for practical or economic reasons. Anything the decoder can predict for itself is called redundancy. The encoder need not send it. In order to obtain that benefit we have to put up with a lot of other characteristics that may or may not become drawbacks, dependent on the application.
At one level, digital video is just data, so any data compression technique could be used on it. Many data codecs are lossless, in that what comes out is bit-identical to what went in. That's a basic requirement to compress computer instructions and bank statements, for example, but the compression factors achieved are not enough to meet the demands of television. Practical codecs used in broadcasting are lossy, which means that what comes out is not identical to what went in. To be precise, the loss takes place in the encoder and the decoder is essentially blameless.
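The distinction can be sketched in a few lines. The following is only an illustration: zlib stands in for a generic lossless data codec, and the lossy "codec" is a hypothetical toy quantizer with an assumed step size, not anything used in broadcasting.

```python
import zlib

def lossless_round_trip(data: bytes) -> bytes:
    """Compress then decompress with zlib (DEFLATE), a lossless data codec."""
    return zlib.decompress(zlib.compress(data))

def lossy_round_trip(samples, step=8):
    """Toy lossy codec (hypothetical): quantize samples to multiples of `step`."""
    # Encoder: rounding discards information, so the loss happens here.
    codes = [round(s / step) for s in samples]
    # Decoder: faithfully rebuilds exactly what the codes describe.
    return [c * step for c in codes]

statement = b"balance: 1024.37\n" * 100
restored = lossless_round_trip(statement)
print(restored == statement)          # bit-identical output

pixels = [16, 19, 23, 200, 203, 77]   # assumed sample values
decoded = lossy_round_trip(pixels)
print(decoded)                        # close to the input, but not identical
```

In the toy quantizer the error on any sample cannot exceed half the step size, which is the kind of bounded, controlled loss a lossy encoder trades for a higher compression factor.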
If we were to subtract a codec output frame from the original frame, pixel by pixel, we would have an objective record of the differences. The differences represent a distortion of the original. Distortions may also be called artifacts. Some authors refer to distortion as coding noise, which is a misnomer. Noise is an unwanted random signal added to a wanted signal. Coding distortion is not necessarily random. The common compression artifact known as blocking is highly structured and correlated with the signal. The concern here is not simply one of etymological pedantry. Assuming something is random when it isn't can lead to incorrect technical conclusions.
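As a minimal sketch of that subtraction, assuming frames are held as lists of rows of luma values (the function names and the tiny frames are purely illustrative):

```python
def difference_frame(original, decoded):
    """Subtract the original from the codec output, pixel by pixel."""
    return [
        [d - o for o, d in zip(row_o, row_d)]
        for row_o, row_d in zip(original, decoded)
    ]

def mean_squared_error(diff):
    """Collapse a difference frame into a single objective figure."""
    values = [v for row in diff for v in row]
    return sum(v * v for v in values) / len(values)

# Assumed 2 x 4 example frames.
original = [[10, 12, 14, 16], [20, 22, 24, 26]]
decoded = [[10, 12, 16, 16], [20, 20, 24, 28]]

diff = difference_frame(original, decoded)
print(diff)
print(mean_squared_error(diff))
```

A single figure such as mean squared error says how much distortion there is, but not whether it is random or structured, which is one reason treating it as noise can mislead.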
Given that the human visual system (HVS) has finite powers, a well-designed codec might arrange the distortions in such a way that the HVS is essentially unaware of them, rather like the actions of a magician. This suggests two important conclusions. The first is that a well-designed encoder must rely on an intimate knowledge of the HVS. The second is that assessment of the quality of an encoder is non-trivial. Ultimately television is designed for human consumption and the HVS must be the quality arbiter. Any piece of equipment seeking to replicate the human quality assessment must also contain a model of the HVS.
Our hypothetical codec in which the distortion is invisible will achieve a certain compression factor, the ratio of the input bit rate to the output bit rate, but it is important to realize that such performance is only going to be achieved on similar video material. In the real world, video material differs quite alarmingly in the amount of information, that is, the amount of unpredictability, in the signal. A cartoon, for example, is compressor heaven because, by definition, the images are simplified. The leaves of trees blowing in the wind and light reflecting from water waves are both difficult for an encoder because of the complex motion combined with significant detail.
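The compression factor is simple arithmetic. In the sketch below the picture format (1920 x 1080, 25 frames per second, 20 bits per pixel) and the 10 Mbit/s output rate are assumed figures chosen only for illustration, and the function names are hypothetical.

```python
def uncompressed_bit_rate(width, height, frames_per_second, bits_per_pixel):
    """Bit rate of the active picture before compression, in bits per second."""
    return width * height * frames_per_second * bits_per_pixel

def compression_factor(input_bit_rate, output_bit_rate):
    """Ratio of the input bit rate to the output bit rate."""
    return input_bit_rate / output_bit_rate

# Assumed source format and an assumed delivery bit rate.
source_rate = uncompressed_bit_rate(1920, 1080, 25, 20)
factor = compression_factor(source_rate, 10_000_000)

print(source_rate)
print(factor)
```

The same codec at the same factor may look transparent on a cartoon and visibly distorted on rippling water, so the figure means little without knowing the material.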
This suggests that testing or demonstrating a codec on unknown material is meaningless because if the material is easy the codec will appear to perform well. Codecs need to be compared on the same material to avoid that. A metric for the performance of a coder needs to be based on the compression factor obtained with respect to the difficulty of the material. Measuring that is non-trivial.
Another result is that we can have two types of compression system. In one, the compression factor and the output bit rate are constant and the distortion is a function of the difficulty. In the other, the amount of distortion is constant but the compression factor and the bit rate must vary with the difficulty. In applications such as video disks, it is easy to adjust the data rate at the player, so adopting a variable compression factor is the obvious choice. The high quality achieved by video disks is partly down to that.
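The two approaches can be contrasted with a deliberately crude model. The assumption that distortion equals difficulty divided by the bits spent is invented here purely for illustration; real rate-distortion behavior is far more complicated, and the function names are hypothetical.

```python
def constant_bit_rate(difficulties, bits_per_frame):
    """Fixed bits per frame: distortion rises and falls with difficulty."""
    # Toy model (assumption): distortion = difficulty / bits spent.
    return [(bits_per_frame, d / bits_per_frame) for d in difficulties]

def constant_quality(difficulties, target_distortion):
    """Fixed distortion: bits per frame must track the difficulty."""
    return [(d / target_distortion, target_distortion) for d in difficulties]

# Assumed per-frame difficulty scores in arbitrary units.
difficulties = [100, 400, 200]

cbr = constant_bit_rate(difficulties, bits_per_frame=100)
vbr = constant_quality(difficulties, target_distortion=2.0)

print(cbr)  # (bits, distortion) pairs with constant bits
print(vbr)  # (bits, distortion) pairs with constant distortion
```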
However, in broadcasting and data transmission, variable bit rate causes all kinds of practical problems. One solution is available where a number of TV channels share the same multiplex. The overall bit rate of the multiplex can be constant, but the way it is divided between the channels can change dynamically using statistical multiplexing. This assumes that the channels are uncorrelated and that a difficult segment in one channel will occur when the other channels are having it easy and can give up bandwidth.
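A minimal sketch of the idea follows, assuming each channel reports a difficulty score and the multiplex is shared in proportion to it. Proportional sharing is an assumption made here for simplicity; practical statistical multiplexers use more elaborate rules, and the function name and numbers are hypothetical.

```python
def stat_mux_allocate(total_bit_rate, difficulties):
    """Share a constant multiplex bit rate in proportion to channel difficulty."""
    total_difficulty = sum(difficulties)
    if total_difficulty == 0:
        # Nothing to choose between the channels: split the rate equally.
        return [total_bit_rate / len(difficulties)] * len(difficulties)
    return [total_bit_rate * d / total_difficulty for d in difficulties]

# Assumed 40 Mbit/s multiplex carrying four channels.
easy_moment = stat_mux_allocate(40, [1, 3, 4, 2])
later_moment = stat_mux_allocate(40, [4, 1, 1, 2])

print(easy_moment)
print(later_moment)
print(sum(easy_moment), sum(later_moment))  # the multiplex total stays constant
```

If every channel becomes difficult at once, the shares even out and nobody gains, which is why the scheme depends on the channels being uncorrelated.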
Although there are many types of codec, they must all abide by the same rules that follow from information theory. One of these rules is shown in Fig.1, which is that the complexity of an encoder increases exponentially as a function of the required performance. Complexity can be measured by the number of computational operations required per second or per pixel. Early compression techniques such as interlace and color difference working were simple enough to be implemented with vacuum tubes in the analog domain.
Going further had to wait for digital techniques, which then brought the rules of microelectronics into the game. One of these is Moore's Law, which predicts the way digital hardware gets faster and cheaper with time. Clever compression techniques remain academic to the broadcaster if they cannot be implemented at consumer prices, so the complexity of television codecs follows Moore's Law quite closely.
Another fundamental truth is shown in Fig.2, which is that the codec delay increases with performance. This also follows from information theory, because finding more redundancy through time requires a larger number of frames to be considered. Those of us who can remember NTSC know that the addition of the color subcarrier caused the sequence length of the signal to increase from two fields to four (eight in the case of PAL). That was the harbinger of the GOP (group of pictures) of MPEG (moving pictures by educated guesswork). When digital television was new and wonderful, one could wander into a TV shop and see that the digital channels were significantly delayed relative to the analog versions.
Most codecs are asymmetrical, which means that the encoder does more work than the decoder. This follows from the fact that the decoder is deterministic: it does exactly what the compressed bitstream tells it to do. In contrast the encoder has an array of compression tools to choose from and has to figure out which ones work best as the incoming video changes. Practical encoders actually contain a decoder, so the encoder knows exactly what the decoder knows. Asymmetry is a good thing for a broadcaster because there will always be more decoders than encoders, and it is better to have a few expensive encoders driving a lot of cheap decoders. As increasingly the consumer is watching on portable devices, reduced decoder complexity also helps battery life.
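The point about the encoder containing a decoder can be illustrated with a toy predictive coder. This is a sketch under stated assumptions: one-dimensional samples, a predictor that simply reuses the previous reconstructed value, and a fixed quantizer step, none of which comes from any particular standard.

```python
def decode_step(prediction, code, step):
    """Decoder rule: add the dequantized residual to the prediction."""
    return prediction + code * step

def encode(samples, step=4):
    """Toy closed-loop predictive encoder (hypothetical)."""
    codes = []
    reconstruction = 0  # encoder and decoder start from the same known state
    for sample in samples:
        # Predict from what the decoder will have, not from the true input.
        code = round((sample - reconstruction) / step)
        codes.append(code)
        # Local decoder: keeps the encoder's state identical to the decoder's.
        reconstruction = decode_step(reconstruction, code, step)
    return codes

def decode(codes, step=4):
    """Deterministic decoder: does exactly what the codes tell it to do."""
    output = []
    reconstruction = 0
    for code in codes:
        reconstruction = decode_step(reconstruction, code, step)
        output.append(reconstruction)
    return output

samples = [10, 12, 15, 15, 40]  # assumed input values
codes = encode(samples)
decoded = decode(codes)
print(codes)
print(decoded)
```

Because the encoder predicts from the decoder's reconstruction rather than from the original, quantizing errors do not accumulate: each output sample stays within half a step of the input.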
Composite video was about the last format in which the encoder was defined. In the digital domain it is the encoded signal that is defined and the standards documents are practically silent on how the encoding should be done. This is as it should be because manufacturers invest thousands of man-hours in encoder design and naturally wish it to be proprietary. When it is not possible to tell from the bit stream how the encoder works, the intellectual property of the manufacturer is safe.
The encoded signal has some of the attributes of a language, in that it has a vocabulary, which the decoder must understand if it is to work properly. However, the encoder is not compelled to use the entire vocabulary if the objective is to deliver a cheaper product. In order to offer flexibility, the coding standard may be divided into levels, each of which represents a maximum frame size, and within each level there may be more than one profile, which is a way of allowing some variation in complexity and cost.
Coders can also be hierarchical, which means that two or more qualities are available depending on the decoder. A base signal is sent to all decoders, and an optional enhancement signal can be sent to decoders that can use it. The enhancement signal might improve the picture resolution, or it might reduce the level of coding distortion.
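A hierarchical scheme of the second kind, where the enhancement reduces coding distortion, might be sketched as follows. The coarse step size and the choice to send the exact residual as the enhancement are assumptions made for illustration only.

```python
def encode_layers(samples, coarse_step=16):
    """Split samples into a coarse base layer and an enhancement residual."""
    base = [s // coarse_step for s in samples]
    enhancement = [s - b * coarse_step for s, b in zip(samples, base)]
    return base, enhancement

def decode_base(base, coarse_step=16):
    """Basic decoder: uses the base layer only."""
    return [b * coarse_step for b in base]

def decode_enhanced(base, enhancement, coarse_step=16):
    """Capable decoder: adds the enhancement to the base picture."""
    return [b * coarse_step + e for b, e in zip(base, enhancement)]

samples = [17, 130, 255, 64]  # assumed 8-bit sample values
base, enhancement = encode_layers(samples)

print(decode_base(base))                   # coarser reconstruction
print(decode_enhanced(base, enhancement))  # matches the input in this toy case
```

Both decoders receive the same base signal; only the one that can use the enhancement recovers the finer values.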
Having come this far, it should be clear what the major topics are that need to be discussed. These will include the concepts of prediction, both within and between pictures, the importance of motion compensation, why and how transform coding is used, how pictures are assembled into groups, how the picture is divided into blocks, how the bit rate is controlled, the differences between various popular codecs, and how codecs can be tested and compared. Pre-processing, the treatment of video signals to make them suitable for compression, will also be considered.