Compression: Part 7 - Macro Blocks

Here we continue the story of motion compensated compression using macroblocks.

Compression relies heavily on motion compensation to locate temporal redundancy or, more precisely, redundancy along optic flow axes. This in itself works extremely well and, as has been noted, a lossless codec could be made using just temporal coding and transmitting the full residual. In MPEG-2 the macroblocks on which motion compensation is based were always the same size and this led to prediction errors when a moving object boundary cut through the middle of a macroblock. When two parts of the block are moving differently, no single vector can be correct. In AVC the 16 x 16 macroblock can be divided into two 8 x 16 blocks or two 16 x 8 blocks, or into four 8 x 8 blocks, each having its own vector. This allows the boundary of a moving object to be more closely approximated so the prediction error becomes smaller. The number of vectors to be sent rises, but this is easily coded because vectors within a moving object are similar or identical, as are vectors outside it.
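The benefit of splitting a macroblock can be seen in a toy one-dimensional sketch. Everything in it is assumed for illustration: a 16-pixel row stands in for a macroblock, the cost measure is the sum of absolute differences, the search is exhaustive, and the names `sad` and `best_vector` are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_vector(ref, cur, start, size, search=4):
    """Try every displacement in [-search, search] for cur[start:start+size].

    Returns (lowest SAD, displacement that achieved it).
    """
    block = cur[start:start + size]
    candidates = []
    for v in range(-search, search + 1):
        lo = start + v
        if lo < 0 or lo + size > len(ref):
            continue  # candidate would fall outside the reference row
        candidates.append((sad(ref[lo:lo + size], block), v))
    return min(candidates)


# Reference row with a non-repeating texture (all 32 values are distinct).
ref = [(i * 37) % 101 for i in range(32)]

# Current row: pixels 8..15 have moved by two pixels, pixels 16..23 are static.
cur = list(ref)
cur[8:16] = ref[10:18]

whole = best_vector(ref, cur, start=8, size=16)  # one vector for the whole block
left = best_vector(ref, cur, start=8, size=8)    # separate vector, left half
right = best_vector(ref, cur, start=16, size=8)  # separate vector, right half
```

With one vector for the whole block, some prediction error always remains, because the two halves need different displacements. With a vector per half, each half is predicted exactly.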

In practice, higher compression factors are demanded and the residual is not sent in full, but is itself compressed and that must result in a loss of quality. The use of macroblocks of constant size means that the macroblock boundaries fall on a horizontal and vertical grid in the picture. When lossy coding is used on the residual data, the resultant errors mean that the boundary between a pair of macroblocks may not be properly portrayed. If, for example, the luma at the edge of one block is slightly too high and the luma at the mating edge of the next block is too low because of coding errors, an unwanted transient will be introduced at the boundary.
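A crude numerical sketch shows how independent lossy coding of two adjacent blocks can create a transient at their boundary. The coding model is an assumption chosen for simplicity: each 8-pixel block is represented only by its mean, quantized to a multiple of a step size. This is not the AVC transform, and `code_block_dc_only` is a hypothetical name.

```python
def code_block_dc_only(pixels, step):
    """Crude stand-in for lossy coding: keep only the block mean,
    quantized to the nearest multiple of `step`."""
    mean = sum(pixels) / len(pixels)
    level = round(mean / step) * step
    return [level] * len(pixels)


row = list(range(96, 112))  # gentle luma ramp across two 8-pixel blocks
left = code_block_dc_only(row[:8], step=8)
right = code_block_dc_only(row[8:], step=8)
decoded = left + right

original_step = row[8] - row[7]         # change across the boundary before coding
decoded_step = decoded[8] - decoded[7]  # change across the boundary after coding
```

The ramp changes by one level per pixel before coding. After coding, the whole change is concentrated at the block boundary as a step of eight levels.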

The HVS is very good at spotting regularity in images and even relatively low levels of these transients can be seen when they always occur on a regular grid. The phenomenon is known as blocking and was one of the weaknesses of MPEG-2. The solution sometimes adopted with MPEG-2 was to employ a post filter to minimize the blocking. The filter was not part of the standard but could be added to the later stages of a decoder where it would be able to access information about the position of macroblock boundaries and the type of picture (I, B or P). Any transient found at a macroblock boundary would either be genuine and left alone, or a blocking artefact that would benefit from filtering.

Although post filtering works, and can be applied to any codec, it has limitations. For example if an anchor picture contains blocking errors, after motion compensation the errors will be moved into the body of the macroblock where a post filter that knows only the block boundaries will not find them. Higher performance is obtained if the de-blocking filter is built into the codec. Built in de-blocking allows a higher compression factor to be used for the same quality, but it does increase the complexity of the codec as coding theory would predict. The complexity of the AVC decoder may be from a quarter to a third greater than an MPEG-2 decoder because of the use of de-blocking. There are also implications for power consumption.
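The limitation of a boundary-only post filter can be sketched in one dimension. The block size of 8, the vector of 3 pixels and all function names are assumptions made for the illustration.

```python
BLOCK = 8  # assumed block size for this one-dimensional illustration


def motion_compensate(ref, vector):
    """Predict a row by displacing the reference by `vector` pixels,
    repeating the edge pixel where the reference runs out."""
    n = len(ref)
    return [ref[min(max(i - vector, 0), n - 1)] for i in range(n)]


def step_positions(row):
    """Indices i where row[i] differs from row[i - 1]."""
    return [i for i in range(1, len(row)) if row[i] != row[i - 1]]


def on_grid(positions):
    """Keep only the positions a boundary-only post filter would inspect."""
    return [p for p in positions if p % BLOCK == 0]


anchor = [100] * 8 + [104] * 16  # blocking step at index 8, on the grid
predicted = motion_compensate(anchor, vector=3)

found_in_anchor = on_grid(step_positions(anchor))
found_in_predicted = on_grid(step_positions(predicted))
```

In the anchor picture the step sits on the grid and would be found. After motion compensation it has moved to index 11, inside a block, where a filter that inspects only block boundaries does not look.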

Fig.1 shows how de-blocking is built into a codec. As the encoder must contain a decoder, the encoder must contain a de-blocker so it knows at all times what the decoder will do. If this is not done, the encoder and the decoder will suffer drift instead of tracking one another. With a de-blocker in the encoder, it follows that the de-blocker must be specified by the standard so that encoders and decoders remain compatible. AVC differs from prior codecs in adopting such an approach. The de-blocker is a compulsory part of the standard and all compliant decoders must be able to perform de-blocking. It is, however, legal for a compliant encoder not to use de-blocking.

Fig.1 - The encoder and decoder must track, and this is achieved by putting a de-blocker in the encoder that is the same as the de-blocker in the decoder. That means the de-blocker must be part of the coding standard.
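The need for tracking can be demonstrated with a small simulation. It is a sketch under stated assumptions: the picture is a single row of pixels, the residual is sent losslessly, there is no motion, and a simple three-tap smoother named `smooth` stands in for the real de-blocker.

```python
def smooth(row):
    """Stand-in loop filter: 3-tap [1, 2, 1] / 4 with edge replication."""
    n = len(row)
    return [(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)] + 2) >> 2
            for i in range(n)]


def simulate(frames, encoder_filters, decoder_filters=True):
    """Return per-frame drift: sum of |encoder picture - decoder picture|."""
    n = len(frames[0])
    ref_enc, ref_dec = [0] * n, [0] * n
    drift = []
    for src in frames:
        # The encoder forms the residual against its own reference.
        residual = [s - r for s, r in zip(src, ref_enc)]
        rec_enc = [r + d for r, d in zip(ref_enc, residual)]
        # The decoder adds the same residual to its own reference.
        rec_dec = [r + d for r, d in zip(ref_dec, residual)]
        drift.append(sum(abs(a - b) for a, b in zip(rec_enc, rec_dec)))
        # Each side updates its reference, with or without the loop filter.
        ref_enc = smooth(rec_enc) if encoder_filters else rec_enc
        ref_dec = smooth(rec_dec) if decoder_filters else rec_dec
    return drift


frames = [[10, 10, 10, 10, 50, 50, 50, 50]] * 4  # static scene with one edge
matched = simulate(frames, encoder_filters=True)
mismatched = simulate(frames, encoder_filters=False)
```

When both sides filter their references identically the drift stays at zero. When only the decoder filters, its pictures depart further from the encoder's with every frame.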


The de-blocker has access to the entire transmitted bitstream and can make useful decisions from that. For example, the wordlength to which coefficients are truncated in lossy coding allows the magnitude of quantizing error to be estimated. It is those quantizing errors that cause blocking. By finding macroblock boundaries and estimating the quantizing error, the de-blocker knows where the errors are likely to be and how big they will be. Appropriate filtering actually reduces the quantizing error.
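As a rough sketch of how the likely size of the quantizing error can be read from the bitstream, the code below maps the AVC quantizer parameter (QP) to a step size. The base values are those usually tabulated for 8-bit AVC, with the step doubling for every increase of 6 in QP, and should be checked against the standard before being relied upon. The bound of half a step assumes a plain rounding quantizer, which real encoders need not use.

```python
# Step sizes usually tabulated for QP 0..5; the step doubles every 6 QP.
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


def max_rounding_error(qp):
    """Largest coefficient error, assuming a plain rounding quantizer."""
    return qstep(qp) / 2
```

For example, `qstep(28)` is sixteen times `qstep(4)`, so a boundary between blocks coded at QP 28 can be expected to need far stronger filtering than one coded at QP 4.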

The advantage of built-in de-blocking is that decoded pictures become more accurate. When they are used for prediction that prediction will also be more accurate and fewer residual data will need to be sent. Built-in de-blocking makes the codec more efficient which post-filtering cannot do. Much of the enhanced performance of AVC over its predecessors is down to the de-blocking.

One interesting fact is that the use of de-blocking barely shows up in signal-to-noise measurements. An improvement of only a fraction of a dB is measured. One reason is that de-blocking only works on block boundaries, leaving the rest of the picture unaffected. The main reason is that the HVS finds blocking subjectively very annoying because it is sensitive to the grid structure. The same errors randomly located, instead of on a grid, would probably not be seen. We also learn from this that codec errors are not actually noise because they are not random and not de-correlated from the signal. Signal to noise ratio is not a useful metric when the problem is not noise and when the subjectivity of the HVS is involved.
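The blindness of a signal-to-noise figure to the location of errors is easy to show. In this sketch, which uses an assumed 64-pixel row and arbitrary error positions, the same eight errors are placed first on a regular grid and then at irregular positions.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two pixel lists."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


original = [128] * 64

grid_errors = list(original)
for i in range(0, 64, 8):  # one error at the start of every 8-pixel block
    grid_errors[i] += 4

scattered_errors = list(original)
for i in (3, 5, 14, 22, 31, 37, 50, 61):  # same errors, irregular positions
    scattered_errors[i] += 4

psnr_grid = psnr(original, grid_errors)
psnr_scattered = psnr(original, scattered_errors)
```

The two figures are identical because the measurement depends only on the total error energy, not on where the errors fall, whereas the HVS responds very differently to the two cases.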

De-blocking filters are complex because they must be adaptive. It is not possible to use a fixed filter that is always active because pixel differences across a macroblock boundary can be genuine as well as being caused by blocking. For example, if the macroblock boundary happens to coincide with the edge of an object, a steep change in luminance is to be expected as the brightness of the object transitions to the brightness of the background. A blocking artefact will be masked by the transition. Filtering such a transition will impair the picture instead of improving it. The steepness of the luminance change can be estimated by considering a number of pixels in a row or column on each side of the boundary, and if the differences between them exceed a threshold the filter is turned off.

Blocking is most visible in areas of relatively constant brightness and hue, such as the sky. Such areas have few steep changes, so it is possible to set the threshold to enable the filter only when steep changes are absent.
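The threshold test can be sketched as follows. The pixel naming (p on one side of the boundary, q on the other, index 0 nearest the boundary) and the threshold names `alpha` and `beta` follow the usual descriptions of the AVC filter, but the numerical values used here are arbitrary assumptions.

```python
def filter_enabled(p, q, alpha, beta):
    """Decide whether to filter one line of pixels crossing a boundary.

    p and q list the pixels on each side, ordered outward from the
    boundary, so p[0] and q[0] are the pair that face each other.
    """
    across = abs(p[0] - q[0])    # step across the boundary itself
    inside_p = abs(p[1] - p[0])  # activity just inside the p block
    inside_q = abs(q[1] - q[0])  # activity just inside the q block
    return across < alpha and inside_p < beta and inside_q < beta


sky = filter_enabled(p=[100, 100], q=[103, 103], alpha=10, beta=4)
object_edge = filter_enabled(p=[60, 61], q=[180, 181], alpha=10, beta=4)
texture = filter_enabled(p=[100, 120], q=[102, 90], alpha=10, beta=4)
```

Only the small step in the flat area is filtered. The object edge fails the test across the boundary, and the textured area fails the test inside the blocks.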

Blocking artefacts lie on a grid and can appear in vertical and/or horizontal lines. These are processed separately. In AVC, vertical edges are filtered first, followed by horizontal edges. As the chroma signal is subsampled, chroma filtering is done independently, once more with vertical edges first.
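The two-pass ordering can be sketched on a small luma array. The block size of 4 and the toy `soften` operation are assumptions. A real filter would also apply the adaptive decisions to each edge.

```python
BLOCK = 4  # assumed block size for this illustration


def soften(a, b):
    """Toy edge filter: move each pixel halfway toward the pair's average."""
    avg = (a + b + 1) >> 1
    return (a + avg + 1) >> 1, (b + avg + 1) >> 1


def deblock_picture(pic):
    """Filter vertical edges first, then horizontal edges."""
    h, w = len(pic), len(pic[0])
    out = [row[:] for row in pic]
    # Pass 1: vertical edges, i.e. boundaries between columns.
    for y in range(h):
        for x in range(BLOCK, w, BLOCK):
            out[y][x - 1], out[y][x] = soften(out[y][x - 1], out[y][x])
    # Pass 2: horizontal edges, working on the output of pass 1.
    for y in range(BLOCK, h, BLOCK):
        for x in range(w):
            out[y - 1][x], out[y][x] = soften(out[y - 1][x], out[y][x])
    return out


picture = [[100] * 4 + [108] * 4 for _ in range(8)]  # one vertical block edge
result = deblock_picture(picture)
```

The step of eight levels at the vertical edge is halved, and pixels away from the edges are untouched.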

In practice the filter will have a variable window, which means that the number of pixels it affects on either side of the boundary between a pair of blocks can change. The decision making regarding what filter window to use and whether to filter at all requires more processing power than the actual filtering. The decision-making process uses a combination of feed forward and feedback and takes into account a number of factors. The feed forward aspect tries to predict how bad the error could be based on the characteristics of the two macroblocks concerned. The prediction is known as the boundary strength and has a value of 0 to 4, where 0 means the filter is off and 4 means the strongest filter should be used.

If neither of the blocks concerned has been intra coded, neither carries coded residual coefficients, and both are predicted from the same anchor picture with near-identical vectors, no fresh quantizing error has been introduced at the boundary and the filtering will be disabled. Intra coded macroblocks receive the highest boundary strengths, because all of their content has passed through the quantizer, and the strongest setting is reserved for boundaries between macroblocks rather than between blocks inside one. In motion compensated coding, the motion estimation has finite accuracy and so do the motion vectors. This means that motion compensated data do not always fit perfectly and errors at block boundaries may result. In bidirectional coding adjacent macroblocks could have come from different anchor pictures and the chances of a boundary error are higher. AVC increases the boundary strength for macroblocks that have come from different anchors. The feedback aspect of the filter control looks at actual pixel values to see if the transient at the boundary is genuine or an artefact and to allow or disable the filter. If the filter is allowed, it will use the strength calculated by the feed forward.
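The feed forward decision can be sketched as a short chain of tests. The sketch follows the rule order usually quoted for AVC and should be treated as an approximation: the standard contains further cases, for interlaced material among others. The `BlockInfo` fields are hypothetical names, and vectors are assumed to be held in quarter-pixel units.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    intra: bool             # block was intra coded
    has_coefficients: bool  # block carries coded residual coefficients
    ref: int                # identifier of the anchor picture used
    mv: tuple               # (x, y) vector, assumed quarter-pixel units


def boundary_strength(p, q, macroblock_edge):
    """Approximate boundary strength (0..4) for the edge between p and q."""
    if p.intra or q.intra:
        return 4 if macroblock_edge else 3
    if p.has_coefficients or q.has_coefficients:
        return 2
    if p.ref != q.ref:
        return 1  # predicted from different anchor pictures
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1  # vectors differ by a whole pixel or more
    return 0      # nothing suggests an error here, so the filter is off


intra = BlockInfo(intra=True, has_coefficients=True, ref=0, mv=(0, 0))
plain = BlockInfo(intra=False, has_coefficients=False, ref=0, mv=(0, 0))
coded = BlockInfo(intra=False, has_coefficients=True, ref=0, mv=(0, 0))
other_anchor = BlockInfo(intra=False, has_coefficients=False, ref=1, mv=(0, 0))
```

The strength then selects the filter, subject to the feedback test on the actual pixel values.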

Chroma filtering can simply follow the decisions made for luma filtering. The filters themselves are quite simple finite impulse response structures that implement essentially a low pass filtering effect. The lower the frequency response of the filter the stronger the effect and the more important it becomes not to use the filter when it is not appropriate as it can cause loss of resolution.
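A minimal sketch of two such filters shows the difference in strength and window. The tap sets `WEAK` and `STRONG` are illustrative assumptions and are not claimed to be the exact AVC filters. The output is computed from the unfiltered input so that each result depends only on original pixels.

```python
WEAK = [1, 2, 1]          # taps sum to 4
STRONG = [1, 2, 2, 2, 1]  # taps sum to 8, lower cut-off frequency


def fir_at(row, i, taps):
    """Rounded FIR output at index i, replicating pixels at the row ends."""
    half = len(taps) // 2
    total = sum(taps)
    acc = 0
    for k, t in enumerate(taps):
        j = min(max(i + k - half, 0), len(row) - 1)
        acc += t * row[j]
    return (acc + total // 2) // total


def filter_boundary(row, boundary, taps, window):
    """Filter `window` pixels on each side of `boundary`, which is the
    index of the first pixel of the right-hand block."""
    out = list(row)
    for i in range(boundary - window, boundary + window):
        if 0 <= i < len(row):
            out[i] = fir_at(row, i, taps)
    return out


row = [100] * 8 + [108] * 8  # blocking step of eight levels at index 8
weak = filter_boundary(row, boundary=8, taps=WEAK, window=1)
strong = filter_boundary(row, boundary=8, taps=STRONG, window=2)
```

The weak filter alters one pixel on each side and leaves a largest step of four levels. The strong filter alters two pixels on each side and spreads the transition so that no step exceeds two levels, which is also why it would soften a genuine edge more.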
