Understanding Compression Technology. Part 1.

Over the decades from our industry’s move from analog to digital based technology, codecs (encoders and decoders) have played a key role in the capture, editing, and distribution of video media. The MPEG-2 codec has supported all three tasks. Camera oriented codecs have moved from MPEG-2 to MPEG-4 (H.264). Distribution has followed two paths. Television program distribution continues to employ MPEG-2. Internet distribution immediately went to progressive video encoded using H.264 because the codec is about 2X more efficient than MPEG-2. The newest distribution format is H.265 or HEVC (high efficiency video codec).

While the details of compression change over sequential implementations, there are seven aspects of MPEG-2 and H.264/H.265 technology that are fundamental to compression: Macroblocks, DCT, Quantization, Lossless Compression, Motion Estimation, Predicted Frames, and Difference frames. Part 1 of this article will cover macroblocks, DCT, and quantization. Lossless compression and motion estimation and will be covered in Part 2. Part 3 will detail the critical role of predicted frames and difference frames.

MPEG-2, H.264 & H265 can be compressed (encoded) two ways. The simplest, is intra-frame encoding. Here each video frame is compressed without any dependence on any other frame. (Both the DV and HEVC/AVC-Intra codecs use intra-frame compression.) The more typical form of compression is inter-frame encoding because it is far more efficient. Inter-frame encoding employs an “I-frame” plus “P-frames” and usually “B-frames.” An I-frame is compressed using the same process as is use by intra-frame encoding.

Video frames may need to be chroma down-sampled to 4:2:0 prior to encoding. Low-pass filtering then reduces noise within each video frame. Both types of filtering are mild forms of lossy compression.

Encoding a video frame begins with the macroblock in the upper-left corner. MPEG-2 encoding starts with a video frame partitioned into non-overlapping 16x16 pixel blocks called macroblocks” Assuming 4:2:0 color-sampling, the amount of Red chroma (Cr) data is one-quarter the amount of luminance (Y) data—thus one 8x8 block of Cr data is pulled from a macroblock. Likewise, one 8x8 block of Blue chroma (Cb) data are pulled from a macroblock. Each luminance macroblock is subdivided into four 8x8 blocks thereby yielding a total of six 8x8 blocks. (H.264 employs an adaptive scheme. Small blocks are employed when there is extensive detail while large blocks are used when there are few details—for example an area of blue sky).

A Discrete Cosine Transform (DCT) is then applied to the each of the six 8x8 data blocks. A DCT is 2D Fast Fourier Transform (FFT) that inputs an 8x8 pixel-block and transforms it from the spatial domain to the frequency domain. In other words, pixel information is converted to a mathematical (cosine coefficients) as represented by the grayscale in the figure below.

Image 1: Pixel information is converted into a mathematical representation of cosine coefficients as represented by the grayscale in the figure below.

The 64-coefficients come from an 8x8 block of pixels where the upper-left cell represents the DC (zero-frequency) value. From left-to-right and top-to-bottom these values represent an increase in signal frequency, which we understand to represent an increase in image detail. To this point, encoding has been mathematically lossless.

The next encoding stage, quantization, performs lossy compression. During quantization the coefficient matrix from a DCT is multiplied by a pre-defined “quantization matrix.” (A quantization matrix determines the amount of compression to be applied.) The matrix favors coefficients toward the upper-left of the coefficient matrix. Those coefficients that are out of favor typically become zero and thus generate very little data.

Image 2: The next encoding stage, illustrated above, performs a lossy compression. During quantization the coefficient matrix from a DCT is multiplied by a pre-defined “quantization matrix.

Lossless data reduction, Variable Length Coding (VLC) and Run Length Coding (RLC), follow quantization. Variable length coding identifies the most frequent patterns in the quantized coefficients. They are then represented by codes defined by only a few bits. Less frequent patterns, conversely, are represented by codes that require more bits.

Next, the set of numeric values resulting from VLC are run length encoded. RLC generates a unique code that represents a repeating pattern found in the output from the VLC encoding stage. For example, a “run” of 31 zero values—as seen in the figure above—becomes a unique code to which a repeat count is appended.

All the data from the RLC stage are then stored. Remember that the DCT, quantization, VLC, and RLC must be completed for all six 8x8 pixel-blocks before a new video-block can be processed.

You might also like...

Live Sports Production: Camera To Truck

Much of the OB production infrastructure has moved to IP, but has the connectivity between the cameras and the OB or backhaul also migrated to IP?

Live Sports Production: Exploring The Evolving OB

The first of our three articles is focused on comparing what technology is required in OBs and other venue systems to support the various approaches to live sports production.

Cloud Compute Infrastructure At IBC 2025

In celebration of the 2025 IBC Show, this article focuses on the key theme of cloud compute infrastructure and what exhibitors at the show are doing in this key area of technological enablement.

Navigating Streaming Networks For Live Sports

With the relentless rise of consumers moving from OTA to live streaming of big-ticket sports, this series shares insight into what happens after content leaves production during a live stream. It is a subject broadcasters cannot afford to regard as…

Mobile Broadcasting Opportunities

Broadcasters have been catering for mobile viewing in various ways for many years but are now entering a new era as devices become more capable, with increasing scope for interactivity and greater immersiveness through Extended Reality (XR). But with connected…