Understanding Compression Technology: Motion Estimation Part 2

Part 1 of this article covered multiple aspects of compression technology: macroblocks, DCT, quantization, and lossless compression. Part 2 will focus on motion estimation for P- and B-frames—after a review of lossless data compacting. Part 3 (next month) will detail the critical role of Predicted frames and Difference frames in maintaining image quality.

Compression, defined as the removal of information that an encoder’s designer deems not essential to a successful transfer of visual information, can be performed in multiple ways: reduction of chroma information (e.g., 4:2:2 to 4:2:0 chroma sampling); noise reduction; quantization; and lossless data compacting. (The definition of “successful” is based upon an encoder’s design specification.)

The quantization process is the primary locus of compression. During quantization, each coefficient in the 64-cell matrix produced by a DCT is divided by the corresponding entry of a pre-defined quantization matrix, and the result is rounded. (Pre-defined here means based upon an encoder’s design.) What pre-defined means in practice, however, depends on whether Variable Bit Rate (VBR) or Constant Bit Rate (CBR) encoding is applied. When the compression factor, represented by a quantization matrix, is kept constant under varying image complexity, the result is VBR encoding. Alternatively, by monitoring the output bit rate and dynamically altering the compression factor, data output is smoothed, thereby yielding CBR encoding.
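As a rough illustration of the quantization step, the sketch below divides each coefficient by its quantization-matrix entry and rounds the result. The function names, the `scale` parameter, and the sample values are assumptions made for this example (a 2x2 corner stands in for the full 64-cell matrix); they are not taken from any particular encoder.

```python
def quantize(coeffs, qmatrix, scale=1):
    """Divide each coefficient by its (scaled) step size and round to an integer."""
    return [
        [round(c / (q * scale)) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coeffs, qmatrix)
    ]


def dequantize(levels, qmatrix, scale=1):
    """Approximate the original coefficients; the rounding loss is not recoverable."""
    return [
        [lvl * q * scale for lvl, q in zip(l_row, q_row)]
        for l_row, q_row in zip(levels, qmatrix)
    ]


# Hypothetical 2x2 corner of a coefficient block and of a quantization matrix.
coeffs = [[624, -31], [12, 3]]
qmatrix = [[8, 16], [16, 32]]

levels = quantize(coeffs, qmatrix)           # constant compression factor
coarse = quantize(coeffs, qmatrix, scale=2)  # stronger compression, more zeros
print(levels, coarse)
```

Holding `scale` fixed corresponds to the constant compression factor described above. A rate-controlled encoder would instead raise or lower it as the output bit rate drifts from its target.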

Lossless data reduction follows quantization and employs Run Length Coding (RLC) and then Variable Length Coding (VLC). Both processes reduce the amount of data by compacting the output of quantization. As with ZIP file compression, no information is lost. Once the luminance (Y) blocks have been compressed, the Cb and Cr blocks are compressed. All compressed blocks are then stored.
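The lossless property can be shown with a generic run-length coder. This is a minimal sketch under simplifying assumptions: it stores plain (value, run length) pairs rather than the run/level format and code tables that MPEG encoders use, and the input sequence is hypothetical.

```python
def run_length_encode(values):
    """Collapse a sequence into [value, run_length] pairs."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][0] == v:
            pairs[-1][1] += 1      # extend the current run
        else:
            pairs.append([v, 1])   # start a new run
    return pairs


def run_length_decode(pairs):
    """Expand [value, run_length] pairs back into the original sequence."""
    out = []
    for v, n in pairs:
        out.extend([v] * n)
    return out


# Hypothetical quantized coefficients: mostly zeros after quantization.
scanned = [78, -2, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
pairs = run_length_encode(scanned)
print(pairs)
print(run_length_decode(pairs) == scanned)
```

Decoding returns exactly the input sequence, which is what distinguishes this stage from quantization.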

This compression process is the same for intra-frame encoding and for generating an I-frame as part of inter-frame encoding, in which a series of video frames is compressed into a sequence of several types of data frames. One frame type, of course, is the “I” (intra-coded) frame. Two other types of frames that can be found in a GOP (Group of Pictures) data stream are “P” (predictive) frames and “B” (bi-predictive) frames.

A P-frame contains the information needed to recreate a video frame in conjunction with the information from a previous closest I-frame or a previous closest P-frame. A B-frame contains the information needed to reconstitute a video frame when combined with information from: a previous closest I-frame; a previous closest P-frame; a future closest I-frame (open GOPs only); and a future closest P-frame. (A closed GOP is never dependent on information from another GOP.) Frame dependencies for 15-frame open and closed GOPs are shown below.

Compression relies on three types of frames: the “I” (intra-coded) frame, the “P” (predictive) frame, and the “B” (bi-predictive) frame.
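The dependency rules above can be expressed as a small lookup. The sketch below assumes a 15-frame GOP written in display order as `IBBPBBPBBPBBPBB`, and it assumes that in a closed GOP the trailing B-frames reference only the previous P-frame. The function name and that closed-GOP simplification are illustrative choices, not a description of any specific encoder.

```python
def gop_dependencies(pattern, open_gop):
    """Map each frame index (display order) to the anchor frames it references.

    An index equal to len(pattern) stands for the next GOP's I-frame.
    """
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    deps = {}
    for i, frame_type in enumerate(pattern):
        previous = [a for a in anchors if a < i]
        following = [a for a in anchors if a > i]
        if frame_type == "I":
            deps[i] = []                       # self-contained
        elif frame_type == "P":
            deps[i] = [previous[-1]]           # closest previous I or P
        else:                                  # B-frame
            refs = [previous[-1]] if previous else []
            if following:
                refs.append(following[0])      # closest future I or P
            elif open_gop:
                refs.append(len(pattern))      # next GOP's I-frame
            deps[i] = refs
    return deps


pattern = "IBBPBBPBBPBBPBB"
open_deps = gop_dependencies(pattern, open_gop=True)
closed_deps = gop_dependencies(pattern, open_gop=False)
print(open_deps[13], closed_deps[13])
```

Only the last two B-frames differ between the two cases: in the open GOP they reach into the next GOP, while in the closed GOP they do not.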

(H.264 applies the I, P, and B types to slices: segments of a picture bigger than a macroblock but smaller than a frame. Each predicted block in a P-slice uses a single link (motion vector) to a reference picture. Each predicted block in a B-slice can use two links. Unlike MPEG-2, “bi” here means two links, not two dependency directions.)

Motion estimation may be one of the most interesting digital technologies developed because in a way it “sees” the movement of objects over time in a sequence of video frames. One application of this technology is the creation of intermediate frames when, to avoid LCD display motion blur, 60fps video is converted to 120fps or 240fps video. Motion estimation is also employed when generating P-frames and B-frames. It is the first stage of creating P-frames and B-frames.

Video frames that will become P- and B-frames are partitioned into macroblocks in the same way as for an I-frame. Starting with the first (uppermost, leftmost) macroblock, which is contained in what is called the Present Image, a search is made to determine where its content can be found in the next video frame, which is called the Adjacent Image.

The first comparison is made in the Adjacent Image at X=0 and Y=0 coordinates to determine if the Present Image’s first macroblock remains at its initial location. To determine whether or not a macroblock has moved, a content match is made between the Present Image and the Adjacent Image. (To measure the strength of a match, a correlation technique is used. The correlation must be above a defined threshold to be a match.) When the contents of a macroblock have not moved, the macroblock’s Motion Vector is set to zero.
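The zero-displacement check can be sketched as follows. Note the substitution: instead of a correlation score that must exceed a threshold, this example uses the sum of absolute differences (SAD), a widely used block-matching measure where a lower value means a better match and the threshold is an upper bound. The helper names, the 2x2 block size, and the `max_sad` value are assumptions for illustration.

```python
def get_block(image, x, y, size):
    """Cut a size-by-size block whose top-left corner is at (x, y)."""
    return [row[x:x + size] for row in image[y:y + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def is_static(present, adjacent, x, y, size, max_sad):
    """True when the block at (x, y) matches the same location in the Adjacent Image."""
    difference = sad(get_block(present, x, y, size), get_block(adjacent, x, y, size))
    return difference <= max_sad


# Hypothetical 4x4 luminance images; the bright 2x2 patch moves one pixel right.
present = [[10, 10, 0, 0], [10, 10, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
shifted = [[0, 10, 10, 0], [0, 10, 10, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

print(is_static(present, present, 0, 0, 2, max_sad=4))  # unchanged content
print(is_static(present, shifted, 0, 0, 2, max_sad=4))  # content has moved
```

When the check succeeds, the motion vector is simply (0, 0) and no further searching is needed for that macroblock.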

When a match is not found at X=0 and Y=0, the Present Image’s comparison macroblock is moved in a methodical pattern, at an increasing but limited distance from its origin, until there is a match or, ultimately, no match. Step size is typically one PEL (Picture Element, i.e., a pixel), although a step size of ½ PEL can be employed. The maximum number of X and Y steps allowed defines the search window, shown by a red square below.

Motion estimation can be thought of as “seeing” the movement of objects over time in a sequence of video frames. One use is frame-rate conversion, in which 60fps video is converted to 120fps or 240fps video to avoid LCD display motion blur. It is also the first stage of creating P-frames and B-frames.

The displacement (direction and distance) from the origin to the position where a match is made determines a macroblock’s motion vector. A motion vector (the small arrow) is shown below.

Video frames that will become P- and B-frames are partitioned into macroblocks. Beginning with the first (uppermost, leftmost) macroblock, which is contained in what is called the Present Image, a search is made to determine where its content can be found in the next video frame, which is called the Adjacent Image.
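The search itself might look like the sketch below, which tries whole-pixel displacements in order of increasing distance from the origin and stops at the first acceptable match. It is an illustration under stated assumptions: SAD stands in for the correlation measure, the candidate ordering is one possible "methodical pattern", half-PEL steps are omitted, and the names `find_motion_vector`, `search_range`, `max_sad`, and `make_image` are hypothetical.

```python
def get_block(image, x, y, size):
    """Cut a size-by-size block whose top-left corner is at (x, y)."""
    return [row[x:x + size] for row in image[y:y + size]]


def sad(block_a, block_b):
    """Sum of absolute differences; lower means a closer match."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def find_motion_vector(present, adjacent, x, y, size, search_range, max_sad):
    """Return (dx, dy) of the first match inside the search window, or None."""
    height, width = len(adjacent), len(adjacent[0])
    target = get_block(present, x, y, size)
    offsets = [
        (dx, dy)
        for dy in range(-search_range, search_range + 1)
        for dx in range(-search_range, search_range + 1)
    ]
    offsets.sort(key=lambda o: o[0] ** 2 + o[1] ** 2)  # nearest candidates first
    for dx, dy in offsets:
        cx, cy = x + dx, y + dy
        if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
            continue  # candidate block would fall outside the image
        if sad(target, get_block(adjacent, cx, cy, size)) <= max_sad:
            return (dx, dy)
    return None  # no match within the search window


def make_image(width, height, bx, by, size, value=10):
    """Black image with one bright size-by-size patch at (bx, by)."""
    image = [[0] * width for _ in range(height)]
    for yy in range(by, by + size):
        for xx in range(bx, bx + size):
            image[yy][xx] = value
    return image


present = make_image(6, 6, 2, 2, 2)
adjacent = make_image(6, 6, 3, 2, 2)   # patch moved one pixel to the right
moved = find_motion_vector(present, adjacent, 2, 2, 2, search_range=2, max_sad=0)
still = find_motion_vector(present, adjacent, 0, 0, 2, search_range=2, max_sad=0)
print(moved, still)
```

Because (0, 0) sorts first, an unmoved macroblock is resolved with a single comparison, which is the common case in most footage.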

Once a search is made for the first macroblock, additional searches are made for every macroblock within the Present Image. In this manner every macroblock within a Present Image is assigned a motion vector. A Present Image’s motion vectors are stored in a Motion Estimation block. Although an estimation block will ultimately be stored, it is first used to generate a Predicted Frame. This process will be detailed in Part 3.

The success of inter-frame compression depends on the contents of most macroblocks not moving from frame to frame. Therefore, the motion vector for most macroblocks is zero. When this assumption is violated, for example when an explosion fills the screen, you are likely to see a screen filled with ugly “macroblocking.”
