Understanding Compression Technology: Motion Estimation Part 2

Compare the two images. The right, which is much clearer, used motion estimation in an NVIDIA GeForce 6600 GT graphics card. Image courtesy

Part 1 of this article covered multiple aspects of compression technology: macroblocks, DCT, quantization, and lossless compression. Part 2 will focus on motion estimation for P- and B-frames—after a review of lossless data compacting. Part 3 (next month) will detail the critical role of Predicted frames and Difference frames in maintaining image quality.

Compression, defined as the removal of information that an encoder’s designer deems not essential to a successful transfer of visual information, can be performed in multiple ways: reduction of chroma information (e.g., 4:2:2 to 4:2:0 chroma subsampling); noise reduction; quantization; and lossless data compacting. (The definition of “successful” is based upon an encoder’s design specification.)

The quantization process is the primary locus of compression. During quantization, each cell of the 64-cell coefficient matrix from a DCT is divided by the corresponding cell of a pre-defined quantization matrix and the result is rounded. (Pre-defined here means based upon an encoder’s design.) How the matrix is applied, however, depends on whether Variable Bit Rate (VBR) or Constant Bit Rate (CBR) encoding is used. When the compression factor, represented by a quantization matrix, is kept constant under varying image complexity, the result is VBR encoding. Alternatively, by monitoring the output bit rate and dynamically altering the compression factor, the data output is smoothed, thereby yielding CBR encoding.
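
As a rough illustration of the quantization step, the sketch below divides each DCT coefficient by (equivalently, multiplies it by the reciprocal of) the matching quantization-matrix entry and rounds the result. The function name `quantize`, the `scale` parameter standing in for the compression factor, and the 2x2 matrices are all assumptions chosen for brevity; an actual encoder works on 8x8 blocks with matrices set by its design.

```python
def quantize(coeffs, qmatrix, scale=1):
    """Divide each coefficient by its quantizer step and round to an integer."""
    return [
        [round(c / (q * scale)) for c, q in zip(crow, qrow)]
        for crow, qrow in zip(coeffs, qmatrix)
    ]

# Hypothetical 2x2 corner of a DCT coefficient block and of a quantization matrix.
coeffs = [[240, -33], [20, 5]]
qmatrix = [[16, 11], [12, 14]]

fine = quantize(coeffs, qmatrix, scale=1)    # [[15, -3], [2, 0]]
coarse = quantize(coeffs, qmatrix, scale=4)  # [[4, -1], [0, 0]]
```

A larger `scale` pushes more coefficients to zero. That is the lever a CBR encoder could pull when the output bit rate runs too high, while a VBR encoder would leave it fixed.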

Lossless data reduction follows quantization and employs Run Length Coding (RLC) and then Variable Length Coding (VLC). Both processes reduce the amount of data, not the information it carries, by compacting the output of quantization. As with ZIP file compression, no information is lost. Once the luminance (Y) blocks have been compressed, the Cb and Cr blocks are compressed. All compressed blocks are then stored.
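
The run-length idea can be sketched in a few lines. This is a simplified illustration, not the coding tables of any particular standard: the `(zero_run, value)` pair format, the `"EOB"` end-of-block marker, and the function names are assumptions, and the variable-length coding stage that would assign short codes to common pairs is left out.

```python
def run_length_encode(values):
    """Encode a coefficient list as (zero_run, value) pairs plus an end marker."""
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs

def run_length_decode(pairs, length):
    """Rebuild the original list; `length` restores the trailing zeros."""
    values = []
    for item in pairs:
        if item == "EOB":
            break
        run, v = item
        values.extend([0] * run)
        values.append(v)
    values.extend([0] * (length - len(values)))
    return values

# Hypothetical quantized coefficients, already scanned into a flat list.
coeffs = [15, -3, 0, 0, 2, 0, 0, 0]
encoded = run_length_encode(coeffs)   # [(0, 15), (0, -3), (2, 2), "EOB"]
decoded = run_length_decode(encoded, len(coeffs))
```

Decoding returns exactly the list that was encoded, which is what makes this stage lossless.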

This compression process is the same for intra-frame encoding and for generating an I-frame as part of inter-frame encoding, in which a series of video frames is compressed into a sequence of several types of data frames. One frame type, of course, is the “I” (intra) frame. Two other types of frames found in a GOP (Group of Pictures) data stream are “P” (predictive) frames and “B” (bi-predictive) frames.

A P-frame contains the information needed to recreate a video frame in conjunction with the information from a previous closest I-frame or a previous closest P-frame. A B-frame contains the information needed to reconstitute a video frame when combined with information from: a previous closest I-frame; a previous closest P-frame; a future closest I-frame (open GOPs only); and a future closest P-frame. (A closed GOP is never dependent on information from another GOP.) Frame dependencies for 15-frame open and closed GOPs are shown below.
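
The dependency rules in this paragraph can be written down as a small lookup. The sketch below assumes a GOP given as a display-order string that starts with an I-frame; the function name `references` and the convention of using `len(gop)` to stand for the I-frame of the following GOP are hypothetical choices made for illustration.

```python
def references(gop, i, closed=True):
    """Return the display-order indices of the frames that frame i depends on.

    An index equal to len(gop) stands for the I-frame that starts the
    following GOP, which only an open GOP may reference.
    """
    kind = gop[i]
    if kind == "I":
        return []
    anchors = [k for k, f in enumerate(gop) if f in "IP"]
    prev = max(k for k in anchors if k < i)  # closest previous I- or P-frame
    if kind == "P":
        return [prev]
    later = [k for k in anchors if k > i]
    if later:
        return [prev, min(later)]            # closest future I- or P-frame
    return [prev] if closed else [prev, len(gop)]

gop = "IBBPBBPBBPBBPBB"  # a 15-frame GOP in display order

references(gop, 3)                  # P-frame: [0]
references(gop, 1)                  # B-frame: [0, 3]
references(gop, 14, closed=True)    # last B-frame, closed GOP: [12]
references(gop, 14, closed=False)   # last B-frame, open GOP: [12, 15]
```

Only the trailing B-frames differ between the two cases: in an open GOP they reach forward into the next GOP, while in a closed GOP they do not.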

(H.264 introduced the concept of slices—segments of a picture bigger than a macroblock but smaller than a frame. A P-slice depends only on a single link (motion vector) to another slice. A B-slice depends on two links. Unlike MPEG-2, “bi” here means two links—not two dependency directions.)

Motion estimation may be one of the most interesting digital technologies developed because, in a way, it “sees” the movement of objects over time in a sequence of video frames. One application of this technology is the creation of intermediate frames when 60fps video is converted to 120fps or 240fps video to avoid LCD display motion blur. Motion estimation is also the first stage of creating P-frames and B-frames.

Video frames that will become P- and B-frames are partitioned into macroblocks in the same way as for an I-frame. Starting with the first (uppermost, leftmost) macroblock, which is contained in what is called the Present Image, a search is made to determine where its content can be found in the next video frame, which is called the Adjacent Image.

The first comparison is made in the Adjacent Image at X=0 and Y=0 coordinates to determine if the Present Image’s first macroblock remains at its initial location. To determine whether or not a macroblock has moved, a content match is made between the Present Image and the Adjacent Image. (To measure the strength of a match, a correlation technique is used. The correlation must be above a defined threshold to be a match.) When the contents of a macroblock have not moved, the macroblock’s Motion Vector is set to zero.

When a match is not found at X=0 and Y=0, the Present Image’s comparison macroblock is moved in a methodical pattern, at an increasing but limited distance from its origin, until there is a match or, ultimately, no match. The step size is typically one PEL (Picture Element, i.e., a pixel), although a step size of ½ PEL can be employed. The maximum number of X and Y steps allowed defines the search window, shown by a red square below.
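
The search just described can be sketched as follows. Several things here are assumptions rather than the method of any particular encoder: the match measure is the sum of absolute differences (SAD), where a lower score is a better match, in place of the correlation measure mentioned above; steps are whole PELs only; and the names, the 4x4 block size, and the 8x8 test images are hypothetical.

```python
def sad(present, adjacent, px, py, ax, ay, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(present[py + r][px + c] - adjacent[ay + r][ax + c])
        for r in range(size)
        for c in range(size)
    )

def find_motion_vector(present, adjacent, px, py, size=4, window=2, threshold=0):
    """Search outward from (px, py); return (dx, dy) of the first match, or None."""
    height, width = len(adjacent), len(adjacent[0])
    offsets = [
        (dx, dy)
        for dy in range(-window, window + 1)
        for dx in range(-window, window + 1)
    ]
    # Try the zero offset first, then candidates at increasing distance.
    offsets.sort(key=lambda o: max(abs(o[0]), abs(o[1])))
    for dx, dy in offsets:
        ax, ay = px + dx, py + dy
        if ax < 0 or ay < 0 or ax + size > width or ay + size > height:
            continue  # candidate block would fall outside the image
        if sad(present, adjacent, px, py, ax, ay, size) <= threshold:
            return (dx, dy)
    return None  # no match inside the search window

def make_image(x0, y0, size=8, block=4):
    """Hypothetical test image: a bright block at (x0, y0) on a dark background."""
    return [
        [200 if x0 <= x < x0 + block and y0 <= y < y0 + block else 16
         for x in range(size)]
        for y in range(size)
    ]

present = make_image(2, 2)
adjacent = make_image(3, 2)  # the same block, moved one PEL to the right
vector = find_motion_vector(present, adjacent, px=2, py=2)  # (1, 0)
```

Because the zero offset is tested first, a macroblock that has not moved returns `(0, 0)` after a single comparison, and `window` plays the role of the maximum number of X and Y steps.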

The displacement (direction and distance) moved until a match is made determines a macroblock’s motion vector. A motion vector (the small arrow) is shown below.

Once a search is made for the first macroblock, additional searches are made for every macroblock within the Present Image. In this manner every macroblock within a Present Image is assigned a motion vector. A Present Image’s motion vectors are stored in a Motion Estimation block. Although an estimation block will ultimately be stored, it is first used to generate a Predicted Frame. This process will be detailed in Part 3.
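
Repeating the search for every macroblock yields one motion vector per block. The sketch below stores them in a dictionary keyed by each macroblock's origin, as a stand-in for the Motion Estimation block. It is an illustration under assumptions: it picks the lowest-cost offset by sum of absolute differences instead of applying a correlation threshold, and the names, the 2x2 block size, and the 4x4 test images are hypothetical.

```python
def block_cost(present, adjacent, px, py, dx, dy, size):
    """Sum of absolute differences for one candidate displacement."""
    return sum(
        abs(present[py + r][px + c] - adjacent[py + dy + r][px + dx + c])
        for r in range(size)
        for c in range(size)
    )

def motion_field(present, adjacent, size=2, window=1):
    """Map each macroblock origin (x, y) to its lowest-cost (dx, dy)."""
    height, width = len(present), len(present[0])
    field = {}
    for py in range(0, height - size + 1, size):
        for px in range(0, width - size + 1, size):
            candidates = [
                (dx, dy)
                for dy in range(-window, window + 1)
                for dx in range(-window, window + 1)
                if 0 <= px + dx <= width - size and 0 <= py + dy <= height - size
            ]
            # Ties favour the shortest vector, so static content maps to (0, 0).
            field[(px, py)] = min(
                candidates,
                key=lambda v: (
                    block_cost(present, adjacent, px, py, v[0], v[1], size),
                    abs(v[0]) + abs(v[1]),
                ),
            )
    return field

# Hypothetical 4x4 frames: a small pattern that shifts one PEL to the right.
present = [[9, 8, 0, 0], [7, 6, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
adjacent = [[0, 9, 8, 0], [0, 7, 6, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

static = motion_field(present, present)   # every vector is (0, 0)
moved = motion_field(present, adjacent)   # moved[(0, 0)] is (1, 0)
```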

The success of inter-frame compression depends on the contents of most macroblocks not moving from frame to frame. Therefore, the motion vector for most macroblocks is zero. When this assumption is violated, for example when an explosion fills the screen, you are likely to see a screen filled with ugly “macroblocking.”
