Understanding Compression Technology: Part 1

Over the decades of our industry's move from analog to digital technology, codecs (encoders and decoders) have played a key role in the capture, editing, and distribution of video media. The MPEG-2 codec has supported all three tasks. Camera-oriented codecs have moved from MPEG-2 to MPEG-4 (H.264). Distribution has followed two paths. Television program distribution continues to employ MPEG-2. Internet distribution immediately went to progressive video encoded with H.264 because the codec is roughly twice as efficient as MPEG-2. The newest distribution format is H.265, also known as HEVC (High Efficiency Video Coding).

While the details of compression change across successive implementations, seven aspects of MPEG-2 and H.264/H.265 technology are fundamental to compression: macroblocks, DCT, quantization, lossless compression, motion estimation, predicted frames, and difference frames. Part 1 of this article covers macroblocks, DCT, and quantization. Lossless compression and motion estimation will be covered in Part 2. Part 3 will detail the critical role of predicted frames and difference frames.


MPEG-2, H.264 & H265 can be compressed (encoded) two ways. The simplest, is intra-frame encoding. Here each video frame is compressed without any dependence on any other frame. (Both the DV and HEVC/AVC-Intra codecs use intra-frame compression.) The more typical form of compression is inter-frame encoding because it is far more efficient. Inter-frame encoding employs an “I-frame” plus “P-frames” and usually “B-frames.” An I-frame is compressed using the same process as is use by intra-frame encoding.


Video frames may need to be chroma down-sampled to 4:2:0 prior to encoding. Low-pass filtering then reduces noise within each video frame. Both operations are mild forms of lossy compression.
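
As a simplified illustration of 4:2:0 down-sampling, the Python sketch below averages each 2x2 block of a full-resolution chroma plane. Real encoders use more carefully designed filters and must respect chroma siting, so treat this box filter as an approximation.

```python
import numpy as np

def downsample_420(chroma):
    """Average each 2x2 block of a chroma plane (4:4:4 -> 4:2:0).

    A simple box filter, assuming the plane is a 2D NumPy array;
    production encoders use better-designed low-pass filters.
    """
    h, w = chroma.shape
    blocks = chroma[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))  # half resolution in each dimension
```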

MPEG-2 encoding starts with a video frame partitioned into non-overlapping 16x16 pixel blocks called "macroblocks"; encoding begins with the macroblock in the upper-left corner. Assuming 4:2:0 color sampling, the amount of red chroma (Cr) data is one-quarter the amount of luminance (Y) data, so one 8x8 block of Cr data is pulled from a macroblock. Likewise, one 8x8 block of blue chroma (Cb) data is pulled from a macroblock. Each luminance macroblock is subdivided into four 8x8 blocks, yielding a total of six 8x8 blocks. (H.264 employs an adaptive scheme: small blocks are employed where there is extensive detail, while large blocks are used where there are few details, for example an area of blue sky.)
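
The sketch below, written in Python with NumPy and hypothetical plane names, pulls the six 8x8 blocks for one 16x16 macroblock, assuming the chroma planes have already been down-sampled to 4:2:0.

```python
import numpy as np

def split_macroblock(y_plane, cr_plane, cb_plane, mb_row, mb_col):
    """Pull the six 8x8 blocks for one 16x16 macroblock (4:2:0).

    y_plane is full resolution; cr_plane and cb_plane are assumed
    to be half resolution in each dimension.
    """
    y0, x0 = mb_row * 16, mb_col * 16
    mb = y_plane[y0:y0 + 16, x0:x0 + 16]
    y_blocks = [mb[r:r + 8, c:c + 8]             # four 8x8 luminance blocks
                for r in (0, 8) for c in (0, 8)]
    cy0, cx0 = mb_row * 8, mb_col * 8
    cr_block = cr_plane[cy0:cy0 + 8, cx0:cx0 + 8]  # one 8x8 Cr block
    cb_block = cb_plane[cy0:cy0 + 8, cx0:cx0 + 8]  # one 8x8 Cb block
    return y_blocks + [cr_block, cb_block]         # six 8x8 blocks total
```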

A Discrete Cosine Transform (DCT) is then applied to each of the six 8x8 data blocks. The DCT is a two-dimensional transform, closely related to the Fast Fourier Transform (FFT), that takes an 8x8 pixel block and converts it from the spatial domain to the frequency domain. In other words, pixel information is converted to a mathematical representation (cosine coefficients), as shown by the grayscale in the figure below.

Image 1: Pixel information is converted into a mathematical representation of cosine coefficients, shown here as grayscale values.

The 64 coefficients come from an 8x8 block of pixels; the upper-left cell represents the DC (zero-frequency) value. From left to right and top to bottom, the values represent increasing signal frequency, which corresponds to increasing image detail. To this point, encoding has been mathematically lossless.
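
As a quick illustration, the following Python sketch (using SciPy's DCT routine) applies a type-II 2D DCT to a flat 8x8 block; all of the block's energy lands in the single DC coefficient, and the 63 higher-frequency coefficients are zero.

```python
import numpy as np
from scipy.fft import dct

def dct2(block):
    """Type-II 2D DCT of an 8x8 pixel block (orthonormal scaling)."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

# A flat block has no detail: all energy concentrates in the
# DC (upper-left) coefficient after the transform.
flat = np.full((8, 8), 128.0)
coeffs = dct2(flat)
print(round(coeffs[0, 0]))  # 1024; every other coefficient is ~0
```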

The next encoding stage, quantization, performs lossy compression. During quantization, each coefficient from the DCT is divided by the corresponding entry in a pre-defined "quantization matrix" and rounded to an integer. (The quantization matrix determines the amount of compression to be applied.) The matrix favors coefficients toward the upper-left of the coefficient matrix; out-of-favor coefficients typically become zero and thus generate very little data.

Image 2: The quantization stage, illustrated above, performs lossy compression: each DCT coefficient is divided by the corresponding entry in a pre-defined quantization matrix and rounded.
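
To make the divide-and-round step concrete, here is a short Python sketch. The quantization matrix below is purely illustrative (not a table from any standard); its step sizes grow toward the lower-right, so high-frequency coefficients tend to round to zero.

```python
import numpy as np

# Illustrative matrix: step sizes grow toward the lower-right, so
# high-frequency coefficients are divided by larger values and tend
# to become zero. Real encoders use standard or custom tables.
q_matrix = 8 + 4 * (np.arange(8)[:, None] + np.arange(8)[None, :])

def quantize(coeffs, q_matrix):
    """Divide DCT coefficients element-wise by the matrix and round."""
    return np.round(coeffs / q_matrix).astype(int)

def dequantize(levels, q_matrix):
    """Approximate reconstruction; the rounding loss is permanent."""
    return levels * q_matrix

coeffs = np.zeros((8, 8))
coeffs[0, 0], coeffs[7, 7] = 1024.0, 30.0
levels = quantize(coeffs, q_matrix)
print(levels[0, 0], levels[7, 7])  # 128 0 -- the small high-frequency
                                   # coefficient is rounded away
```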

Lossless data reduction, Variable Length Coding (VLC) and Run Length Coding (RLC), follows quantization. Variable length coding identifies the most frequent patterns in the quantized coefficients and represents them with codes only a few bits long. Less frequent patterns, conversely, are represented by codes that require more bits.
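
Huffman coding is the classic technique for building such variable-length codes (MPEG-2 itself uses fixed, pre-computed VLC tables). The Python sketch below computes only the code lengths, which is enough to show that frequent symbols receive short codes.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return a code length per symbol: frequent symbols get short codes."""
    heap = [(count, i, {sym: 0}) for i, (sym, count)
            in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    tie = len(heap)  # unique tiebreaker so tuples stay comparable
    while len(heap) > 1:
        c1, _, a = heapq.heappop(heap)   # two least-frequent subtrees
        c2, _, b = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]

print(huffman_code_lengths("aaaaaaabbbccd"))
# 'a' (most frequent) gets a 1-bit code; rare 'c' and 'd' get 3 bits
```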

Next, the numeric values resulting from VLC are run length encoded. RLC generates a unique code that represents a repeating pattern found in the output of the VLC stage. For example, a "run" of 31 zero values, as seen in the figure above, becomes a single code to which a repeat count is appended.
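
A minimal run-length encoder in Python shows the principle: a long run of identical values collapses into a single (value, count) pair.

```python
def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([v, 1])    # start a new run
    return [(v, n) for v, n in runs]

# A run of zeros, like the 31 zero values mentioned above, becomes
# a single (0, 31) pair instead of 31 separate symbols.
print(run_length_encode([5, 0, 0, 0, 2] + [0] * 31))
# -> [(5, 1), (0, 3), (2, 1), (0, 31)]
```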

All the data from the RLC stage are then stored. Remember that DCT, quantization, VLC, and RLC must be completed for all six 8x8 pixel blocks before a new macroblock can be processed.
