Compression: Part 7 - Macro Blocks

Here we continue the story of motion compensated compression using macroblocks.

Compression relies heavily on motion compensation to locate temporal redundancy or, more precisely, redundancy along optic flow axes. This in itself works extremely well and, as has been noted, a lossless codec could be made using just temporal coding and transmitting the full residual. In MPEG-2 the macroblocks on which motion compensation is based were always the same size and this led to prediction errors when a moving object boundary cut through the middle of a macroblock. When two parts of the block are moving differently, no single vector can be correct. In AVC the 16 x 16 macroblock can be divided into two 8 x 16 blocks, two 16 x 8 blocks or four 8 x 8 blocks, each with its own vector. This allows the boundary of a moving object to be more closely approximated so the prediction error becomes smaller. The number of vectors to be sent rises, but these are easily coded because vectors within a moving object are similar or identical, as are vectors outside it.
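The benefit of splitting a macroblock can be illustrated with a toy experiment. The sketch below is not taken from any codec: the picture size, search range, random texture and function names are all assumptions made for illustration. The left half of a 16 x 16 macroblock contains an object that has moved two pixels to the right while the right half is static background, and an exhaustive search compares the best single vector with the best pair of vectors for two 8 x 16 partitions.

```python
import random

SIZE, MB, X0, Y0, SEARCH = 32, 16, 8, 8, 3   # hypothetical test geometry

# Reference (anchor) picture filled with random texture so that matches are unambiguous.
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]

# Current picture: the left half of the macroblock shows an object that has
# moved 2 pixels to the right; the right half is unchanged background.
cur = [row[:] for row in ref]
for y in range(Y0, Y0 + MB):
    for x in range(X0, X0 + MB // 2):
        cur[y][x] = ref[y][x - 2]

def sad(x0, y0, w, h, dx, dy):
    """Sum of absolute differences between a block and its displaced reference."""
    return sum(abs(cur[y][x] - ref[y + dy][x + dx])
               for y in range(y0, y0 + h) for x in range(x0, x0 + w))

def best_vector(x0, y0, w, h):
    """Exhaustive search; returns (smallest SAD, (dx, dy))."""
    return min((sad(x0, y0, w, h, dx, dy), (dx, dy))
               for dy in range(-SEARCH, SEARCH + 1)
               for dx in range(-SEARCH, SEARCH + 1))

whole_sad, whole_mv = best_vector(X0, Y0, MB, MB)
left_sad, left_mv = best_vector(X0, Y0, MB // 2, MB)
right_sad, right_mv = best_vector(X0 + MB // 2, Y0, MB // 2, MB)

print("one 16 x 16 vector:", whole_mv, "residual SAD", whole_sad)
print("two 8 x 16 vectors:", left_mv, right_mv, "residual SAD", left_sad + right_sad)
```

In this contrived case the two partitions are predicted perfectly, whereas the single vector leaves a substantial residual in whichever half it does not fit. Real pictures are never this tidy, but the tendency is the same.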

In practice, higher compression factors are demanded and the residual is not sent in full, but is itself compressed and that must result in a loss of quality. The use of macroblocks of constant size means that the macroblock boundaries fall on a horizontal and vertical grid in the picture. When lossy coding is used on the residual data, the resultant errors mean that the boundary between a pair of macroblocks may not be properly portrayed. If, for example, the luma at the edge of one block is slightly too high and the luma at the mating edge of the next block is too low because of coding errors, an unwanted transient will be introduced at the boundary.
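How a smooth picture acquires a transient at a block boundary can be shown with a deliberately crude model. The sketch below assumes that each block of eight luma samples is reduced to its mean value, quantized with a hypothetical step size of 8. A real codec quantizes many transform coefficients rather than just the mean, so this is only a stand-in for the mechanism.

```python
STEP = 8  # hypothetical quantizer step size

def code_block(samples, step=STEP):
    """Crude lossy coder: keep only the block mean, quantized to a multiple of step."""
    mean = sum(samples) / len(samples)
    dc = round(mean / step) * step
    return [dc] * len(samples)

# A gentle luma ramp spanning two adjacent blocks of eight samples.
ramp = [100 + i for i in range(16)]
decoded = code_block(ramp[:8]) + code_block(ramp[8:])

step_before = ramp[8] - ramp[7]        # difference across the boundary in the source
step_after = decoded[8] - decoded[7]   # difference across the boundary after coding

print("source :", ramp)
print("decoded:", decoded)
print("boundary step before:", step_before, "after:", step_after)
```

The source changes by one level across the boundary, but after coding the edge of the first block is too low and the mating edge of the second is too high, leaving a step of eight levels exactly on the block grid.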

The human visual system (HVS) is very good at spotting regularity in images and even relatively low levels of these transients can be seen when they always occur on a regular grid. The phenomenon is known as blocking and was one of the weaknesses of MPEG-2. The solution sometimes adopted with MPEG-2 was to employ a post filter to minimize the blocking. The filter was not part of the standard but could be added to the later stages of a decoder where it would be able to access information about the position of macroblock boundaries and the type of picture (I, B or P). Any transient found at a macroblock boundary would either be genuine and left alone, or a blocking artefact that would benefit from filtering.

Although post filtering works, and can be applied to any codec, it has limitations. For example, if an anchor picture contains blocking errors, after motion compensation the errors will be moved into the body of the macroblock where a post filter that knows only the block boundaries will not find them. Higher performance is obtained if the de-blocking filter is built into the codec. Built-in de-blocking allows a higher compression factor to be used for the same quality, but it does increase the complexity of the codec, as coding theory would predict. The complexity of the AVC decoder may be from a quarter to a third greater than that of an MPEG-2 decoder because of the use of de-blocking. There are also implications for power consumption.

Fig.1 shows how de-blocking is built into a codec. As the encoder must contain a decoder, the encoder must contain a de-blocker so it knows at all times what the decoder will do. If this is not done, the encoder and the decoder will suffer drift instead of tracking one another. With a de-blocker in the encoder, it follows that the de-blocker must be specified by the standard so that encoders and decoders remain compatible. AVC differs from prior codecs in adopting such an approach. The de-blocker is a compulsory part of the standard and all compliant decoders must be able to perform de-blocking. It is, however, legal for a compliant encoder not to use de-blocking.

Fig.1 - The encoder and decoder must track, and this is achieved by putting a de-blocker in the encoder that is the same as the de-blocker in the decoder. That means the de-blocker must be part of the coding standard.

The de-blocker has access to the entire transmitted bitstream and can make useful decisions from that. For example, the wordlength to which coefficients are truncated in lossy coding allows the magnitude of quantizing error to be estimated. It is those quantizing errors that cause blocking. By finding macroblock boundaries and estimating the quantizing error, the de-blocker knows where the errors are likely to be and how big they will be. Appropriate filtering actually reduces the quantizing error.

The advantage of built-in de-blocking is that decoded pictures become more accurate. When they are used for prediction that prediction will also be more accurate and fewer residual data will need to be sent. Built-in de-blocking makes the codec more efficient which post-filtering cannot do. Much of the enhanced performance of AVC over its predecessors is down to the de-blocking.

One interesting fact is that the use of de-blocking barely shows up in signal-to-noise measurements. An improvement of only a fraction of a dB is measured. One reason is that de-blocking only works on block boundaries, leaving the rest of the picture unaffected. The main reason is that the HVS finds blocking subjectively very annoying because it is sensitive to the grid structure. The same errors randomly located, instead of on a grid, would probably not be seen. We also learn from this that codec errors are not actually noise because they are not random and not de-correlated from the signal. Signal to noise ratio is not a useful metric when the problem is not noise and when the subjectivity of the HVS is involved.
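The blindness of signal-to-noise measurements to the position of errors is easy to demonstrate. In the sketch below, which uses an invented 64 x 64 error pattern, the same set of error values is placed first on a regular grid of columns and then scattered at random. The peak signal-to-noise ratio is computed in the usual way from the mean squared error, assuming 8-bit video.

```python
import math
import random

W = H = 64  # hypothetical picture size

def psnr(err):
    """PSNR in dB for 8-bit video, given a picture of per-pixel errors."""
    mse = sum(e * e for row in err for e in row) / (W * H)
    return 10 * math.log10(255 ** 2 / mse)

# Errors of 4 levels on every sixteenth column: a regular vertical grid.
grid = [[4 if x % 16 == 0 else 0 for x in range(W)] for _ in range(H)]

# Exactly the same error values, moved to random positions.
flat = [e for row in grid for e in row]
random.Random(1).shuffle(flat)
scattered = [flat[y * W:(y + 1) * W] for y in range(H)]

print("PSNR, errors on a grid:", round(psnr(grid), 2), "dB")
print("PSNR, errors scattered:", round(psnr(scattered), 2), "dB")
```

The two figures are identical because the measurement depends only on the error energy, yet the HVS would find the grid version far more objectionable.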

De-blocking filters are complex because they must be adaptive. It is not possible to use a fixed filter that is always active because pixel differences across a macroblock boundary can be genuine as well as being caused by blocking. For example, if the macroblock boundary happens to coincide with the edge of an object, a steep change in luminance is to be expected as the brightness of the object transitions to the brightness of the background. A blocking artefact will be masked by the transition. Filtering such a transition will impair the picture instead of improving it. The steepness of the luminance change can be estimated by considering a number of pixels in a row or column on each side of the boundary and if these exceed a threshold the filter is turned off.
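The threshold test can be sketched as follows. The names p1, p0, q0 and q1 denote samples on a line crossing the boundary, with p0 and q0 adjacent to it, and the thresholds alpha and beta are given arbitrary fixed values here. In a real codec the thresholds would depend on how coarsely the data were quantized, so the numbers below are assumptions for illustration only.

```python
def should_filter(p1, p0, q0, q1, alpha=10, beta=3):
    """Enable filtering only if the step across the boundary is small (likely an
    artefact) and the picture on each side of the boundary is locally smooth."""
    small_step = abs(p0 - q0) < alpha
    smooth_left = abs(p1 - p0) < beta
    smooth_right = abs(q1 - q0) < beta
    return small_step and smooth_left and smooth_right

# Flat sky with a small step at the boundary: probably blocking, so filter.
print(should_filter(100, 100, 104, 104))
# Genuine object edge coinciding with the boundary: leave it alone.
print(should_filter(60, 62, 180, 182))
# Detailed texture next to the boundary: leave it alone.
print(should_filter(90, 100, 103, 115))
```

Only the first case enables the filter. In the other two the transition is either too steep to be a coding error or is surrounded by genuine detail that would mask a blocking artefact anyway.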

Blocking is most visible in areas of relatively constant brightness and hue, such as the sky. Such areas have few steep changes, so it is possible to set the threshold to enable the filter only when steep changes are absent.

Blocking artefacts lie on a grid and can appear in vertical and/or horizontal lines. These are processed separately. In AVC, vertical edges are filtered first, followed by horizontal edges. As the chroma signal is subsampled, chroma filtering is done independently, once more with vertical edges first.
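The order of processing can be sketched with a stand-in filter. The code below makes two passes over a whole picture for simplicity, whereas a real codec works through the picture a macroblock at a time, and smooth_pair is a hypothetical placeholder rather than the AVC filter. The point is only that the horizontal-edge pass operates on samples already modified by the vertical-edge pass, which is why the order has to be fixed if encoder and decoder are to produce identical results.

```python
def smooth_pair(a, b):
    """Hypothetical stand-in filter: move two edge samples a quarter of the way together."""
    d = (b - a) // 4
    return a + d, b - d

def deblock(pic, n):
    """Filter vertical block edges first, then horizontal ones, for n x n blocks."""
    h, w = len(pic), len(pic[0])
    out = [row[:] for row in pic]
    # Pass 1: vertical edges, lying between columns x-1 and x.
    for x in range(n, w, n):
        for y in range(h):
            out[y][x - 1], out[y][x] = smooth_pair(out[y][x - 1], out[y][x])
    # Pass 2: horizontal edges, using the output of pass 1.
    for y in range(n, h, n):
        for x in range(w):
            out[y - 1][x], out[y][x] = smooth_pair(out[y - 1][x], out[y][x])
    return out

# Four flat 4 x 4 blocks with slightly different levels.
pic = ([[100] * 4 + [108] * 4 for _ in range(4)] +
       [[108] * 4 + [116] * 4 for _ in range(4)])
out = deblock(pic, 4)
for row in out:
    print(row)
```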

In practice the filter will have a variable window, which means that the number of pixels it affects on either side of the boundary between a pair of blocks can change. The decision making regarding what filter window to use and whether to filter at all requires more processing power than the actual filtering. The decision-making process uses a combination of feed forward and feedback and takes into account a number of factors. The feed forward aspect tries to predict how bad the error could be based on the characteristics of the two macroblocks concerned. The prediction is known as the boundary strength and has a value of 0 to 4, where 0 means the filter is off and 4 means the strongest filter should be used.

Intra-coded macroblocks are given the highest boundary strengths because intra coding tends to leave the largest errors at block edges. If either block of the pair has been intra coded, the boundary strength takes one of its two highest values, with 4 reserved for edges that are also macroblock boundaries. If neither block has been intra coded but at least one of them carries coded residual coefficients, those coefficients contain quantizing error and an intermediate strength is used. In motion compensated coding, the motion estimation has finite accuracy and so do the motion vectors. This means that motion compensated data do not always fit perfectly and errors at block boundaries may result. In bidirectional coding adjacent macroblocks could have come from different anchor pictures and the chances of a boundary error are higher. AVC increases the boundary strength for macroblocks that have come from different anchors. Only where neither block carries residual data and both have been predicted from the same anchor picture with essentially the same vector is the boundary strength set to 0 and the filtering disabled. The feedback aspect of the filter control looks at actual pixel values to see if the transient at the boundary is genuine or an artefact and to allow or disable the filter. If the filter is allowed, it will use the strength calculated by the feed forward.
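The feed forward decision can be sketched as a short chain of tests. The following is a simplified reading of the boundary strength rules as they are commonly described for AVC, ignoring interlace and other special cases. The Block structure and its field names are invented for the example, and motion vectors are assumed to be held in quarter-sample units.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Hypothetical summary of what the bitstream says about one block."""
    intra: bool        # block was intra coded
    has_coeffs: bool   # block carries non-zero residual coefficients
    ref: int           # index of the anchor picture used for prediction
    mv: tuple          # motion vector (x, y) in quarter-sample units

def boundary_strength(p, q, macroblock_edge):
    """Return 0 (filter off) to 4 (strongest) for the edge between blocks p and q."""
    if p.intra or q.intra:
        return 4 if macroblock_edge else 3
    if p.has_coeffs or q.has_coeffs:
        return 2
    if p.ref != q.ref:
        return 1
    # Vectors differing by a whole sample or more in either direction.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0

intra = Block(True, True, 0, (0, 0))
inter_a = Block(False, False, 0, (8, 0))
inter_b = Block(False, False, 0, (9, 0))
inter_c = Block(False, False, 1, (8, 0))
coded = Block(False, True, 0, (8, 0))

print(boundary_strength(intra, inter_a, True))     # intra at a macroblock edge
print(boundary_strength(coded, inter_a, True))     # residual data present
print(boundary_strength(inter_a, inter_c, True))   # different anchor pictures
print(boundary_strength(inter_a, inter_b, True))   # same anchor, similar vectors
```

The value returned here only sets the upper limit on what the filter may do. Whether a particular line of pixels is actually filtered still depends on the feedback test on the pixel values themselves.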

Chroma filtering can simply follow the decisions made for luma filtering. The filters themselves are quite simple finite impulse response structures that essentially implement a low-pass response. The lower the cut-off frequency of the filter, the stronger the effect and the more important it becomes not to use the filter where it is not appropriate, as it can cause loss of resolution.
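The relationship between window size and filtering strength can be seen in a pair of small integer filters. The tap values below are illustrative: they resemble the short filters used for the samples nearest the edge in AVC, but they are simplified and should be checked against the standard before being relied upon. Samples are named p0, p1, p2 on one side of the boundary and q0, q1, q2 on the other, counting away from the edge.

```python
def weak_filter(p, q):
    """3-tap low-pass (weights 2, 1, 1 over 4) for the two samples at the edge."""
    p0, p1, _ = p
    q0, q1, _ = q
    new_p0 = (2 * p1 + p0 + q1 + 2) >> 2
    new_q0 = (2 * q1 + q0 + p1 + 2) >> 2
    return new_p0, new_q0

def strong_filter(p, q):
    """5-tap low-pass (weights 1, 2, 2, 2, 1 over 8): wider window, lower cut-off."""
    p0, p1, p2 = p
    q0, q1, q2 = q
    new_p0 = (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3
    new_q0 = (q2 + 2 * q1 + 2 * q0 + 2 * p0 + p1 + 4) >> 3
    return new_p0, new_q0

# Two flat blocks meeting with a blocking step of 8 levels.
p = [100, 100, 100]
q = [108, 108, 108]
print("weak  :", weak_filter(p, q))
print("strong:", strong_filter(p, q))
```

With these inputs the weak filter halves the step at the boundary from eight levels to four, while the strong filter reduces it to two. The same strong filter applied to a genuine edge would smear it just as effectively, which is why the decision logic matters more than the filter itself.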
