Understanding Compression Technology.  Part 1.

Over the decades since our industry’s move from analog to digital technology, codecs (encoders and decoders) have played a key role in the capture, editing, and distribution of video media. The MPEG-2 codec has supported all three tasks. Camera-oriented codecs have moved from MPEG-2 to MPEG-4 (H.264). Distribution has followed two paths. Television program distribution continues to employ MPEG-2, while internet distribution quickly adopted progressive video encoded with H.264 because that codec is roughly twice as efficient as MPEG-2. The newest distribution format is H.265, also known as HEVC (High Efficiency Video Coding).

While the details of compression change across successive implementations, seven aspects of MPEG-2 and H.264/H.265 technology are fundamental to compression: macroblocks, DCT, quantization, lossless compression, motion estimation, predicted frames, and difference frames. Part 1 of this article covers macroblocks, DCT, and quantization. Lossless compression and motion estimation will be covered in Part 2. Part 3 will detail the critical role of predicted frames and difference frames.


MPEG-2, H.264, and H.265 video can be compressed (encoded) in two ways. The simplest is intra-frame encoding, in which each video frame is compressed without any dependence on any other frame. (The DV codec and the AVC-Intra/HEVC-Intra codecs use intra-frame compression.) The more typical form is inter-frame encoding because it is far more efficient. Inter-frame encoding employs an “I-frame” plus “P-frames” and usually “B-frames.” An I-frame is compressed using the same process as is used by intra-frame encoding.


Video frames may need to be chroma down-sampled to 4:2:0 prior to encoding. Low-pass filtering then reduces noise within each video frame. Both steps are mild forms of lossy compression.
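The chroma down-sampling step can be sketched as follows. This is a minimal illustration, not a production resampler: the function name is hypothetical, and it assumes full-resolution chroma planes whose dimensions are even, averaging each 2x2 neighborhood to halve the chroma resolution in both directions.

```python
import numpy as np

def subsample_420(cr_full, cb_full):
    """Average each 2x2 neighborhood to halve chroma resolution
    horizontally and vertically (4:4:4 -> 4:2:0). Illustrative only."""
    def pool(plane):
        h, w = plane.shape
        return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return pool(cr_full), pool(cb_full)
```

Averaging (rather than simply dropping samples) acts as a crude low-pass filter, which is why the article treats this step as a mild form of lossy compression.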

MPEG-2 encoding starts with a video frame partitioned into non-overlapping 16x16-pixel blocks called “macroblocks.” Encoding then begins with the macroblock in the upper-left corner. Assuming 4:2:0 color-sampling, the amount of Red chroma (Cr) data is one-quarter the amount of luminance (Y) data, so one 8x8 block of Cr data is pulled from a macroblock. Likewise, one 8x8 block of Blue chroma (Cb) data is pulled from a macroblock. Each luminance macroblock is subdivided into four 8x8 blocks, thereby yielding a total of six 8x8 blocks. (H.264 employs an adaptive scheme: small blocks are used where there is extensive detail, while large blocks are used where there are few details—for example, an area of blue sky.)
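The macroblock-to-blocks split described above can be sketched in a few lines. The function name is hypothetical; it assumes a 16x16 luma array and the already-subsampled 8x8 Cr and Cb arrays, and returns the six 8x8 blocks in the four-luma-then-chroma order the article describes.

```python
import numpy as np

def macroblock_to_blocks(y_mb, cr_mb, cb_mb):
    """Split one 4:2:0 macroblock into six 8x8 blocks:
    four luma (Y) blocks plus one Cr and one Cb block."""
    blocks = [y_mb[r:r + 8, c:c + 8] for r in (0, 8) for c in (0, 8)]
    blocks += [cr_mb, cb_mb]
    return blocks
```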

A Discrete Cosine Transform (DCT) is then applied to each of the six 8x8 data blocks. The DCT is a two-dimensional transform, closely related to the Fourier transform, that takes an 8x8 pixel block and converts it from the spatial domain to the frequency domain. In other words, pixel information is converted to a mathematical representation (cosine coefficients), shown as grayscale values in the figure below.

Image 1: Pixel information is converted into a mathematical representation of cosine coefficients, shown here as grayscale values.

The 64 coefficients come from an 8x8 block of pixels, with the upper-left cell representing the DC (zero-frequency) value. From left to right and top to bottom, the values represent increasing signal frequency, which corresponds to increasing image detail. To this point, encoding has been mathematically lossless.
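For the curious, the 8x8 DCT can be computed as two matrix multiplications with a cosine basis matrix. This is a sketch of the standard orthonormal DCT-II, not the exact fixed-point arithmetic any particular codec uses; the function name is hypothetical.

```python
import numpy as np

def dct2(block):
    """2D DCT-II of an 8x8 block via separable matrix multiplication."""
    N = 8
    n = np.arange(N)
    k = n.reshape(-1, 1)
    # Orthonormal DCT-II basis matrix: row k samples a cosine of frequency k.
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    # Transform rows, then columns (the 2D DCT is separable).
    return C @ block @ C.T
```

A flat block (for example, all pixels equal to 128) transforms to a single non-zero DC coefficient, which is exactly why smooth image areas compress so well in later stages.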

The next encoding stage, quantization, performs lossy compression. During quantization each coefficient from the DCT is divided by the corresponding entry in a pre-defined “quantization matrix,” and the result is rounded to an integer. (The quantization matrix determines the amount of compression to be applied.) The matrix favors coefficients toward the upper-left of the coefficient matrix. Those coefficients that are out of favor typically become zero and thus generate very little data.

Image 2: The quantization stage, illustrated above, performs lossy compression: each DCT coefficient is divided by the corresponding entry in a pre-defined “quantization matrix” and rounded.
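Quantization can be sketched as below. The matrix used here is illustrative only, not any standard table: its step sizes simply grow with frequency so that the upper-left (low-frequency) coefficients survive while high-frequency coefficients round to zero.

```python
import numpy as np

def quantize(coeffs, Q):
    """Divide each DCT coefficient by its quantizer step and round;
    large steps at high frequencies drive those coefficients to zero."""
    return np.round(coeffs / Q).astype(int)

# Illustrative (non-standard) matrix: step sizes grow with frequency,
# favoring the low-frequency coefficients in the upper-left corner.
i, j = np.indices((8, 8))
Q = 16 + 8 * (i + j)
```

With this matrix a DC coefficient of 1024 quantizes to 64, while a small high-frequency coefficient of 30 in the lower-right corner quantizes to zero, producing the long runs of zeros that the lossless stages exploit.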

Two lossless data-reduction steps, Variable Length Coding (VLC) and Run Length Coding (RLC), follow quantization. Variable length coding identifies the most frequent patterns in the quantized coefficients and represents them with codes only a few bits long. Less frequent patterns, conversely, are represented by codes that require more bits.
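The frequent-patterns-get-short-codes idea is classically realized with Huffman coding, which is the family of variable-length codes MPEG-style codecs draw on. The following is a generic Huffman table builder for illustration, not the specific code tables defined in any standard; the function name is hypothetical.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build a Huffman code table: frequent symbols get short codes."""
    freq = Counter(symbols)
    # Heap entries: (weight, tiebreak, [(symbol, partial_code), ...]).
    heap = [(w, n, [(s, "")]) for n, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2][0][0]: "0"}
    n = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing 0/1 to their codes.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = [(s, "0" + c) for s, c in c1] + [(s, "1" + c) for s, c in c2]
        n += 1
        heapq.heappush(heap, (w1 + w2, n, merged))
    return dict(heap[0][2])
```

Feeding it a stream dominated by zeros (as quantized coefficients are) assigns the zero symbol the shortest code, which is precisely the efficiency gain the article describes.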

Next, the set of numeric values resulting from VLC is run length encoded. RLC generates a unique code that represents a repeating pattern found in the output of the VLC encoding stage. For example, a “run” of 31 zero values—as seen in the figure above—becomes a unique code to which a repeat count is appended.
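A bare-bones run-length encoder looks like this. It is a sketch of the general technique, not the exact run/level syntax of any standard: each run of identical values collapses to a (value, count) pair.

```python
def run_length_encode(values):
    """Replace each run of repeated values with a (value, count) pair."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1          # extend the current run
        else:
            out.append([v, 1])       # start a new run
    return [tuple(pair) for pair in out]
```

The article's example of a run of 31 zeros thus shrinks from 31 values to the single pair (0, 31).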

All the data from the RLC stage are then stored. Remember that DCT, quantization, VLC, and RLC must be completed for all six 8x8 pixel blocks before a new macroblock can be processed.
