Image courtesy of XIPH.ORG
In Part 2 of this series on Compression Technology we learned how Motion Vectors are generated when motion estimation is employed as the first step of creating P-frames and B-frames. In Part 3 we’ll learn how these motion vectors are used to generate Predicted Frames.
Let’s review the nature of P- and B-frames by first looking at forward dependencies. Two types of frames serve as references for other frames: an I-frame can support a future P-frame and/or B-frame, and a P-frame can likewise support a future P-frame and/or B-frame. Put differently, a P-frame or a B-frame can depend on a previous I-frame or a previous P-frame. Arrows that point leftward in the Closed GOP diagram below show such dependencies.
Dependencies among I-, P-, and B-Frames (Apple).
Video frames that will become P- and B-frames are partitioned into macroblocks in the same way as an I-frame. Starting with the first macroblock in the Present Image (current video frame), a search is made to determine where its content can be found in the Adjacent Image (next video frame). When the contents of a macroblock have not moved, the macroblock’s motion vector is set to zero.
When a match is not found at X=0 and Y=0, the Present Image’s comparison macroblock is moved an increasing distance from its origin until there is a match—or ultimately no match. After the first macroblock has been searched, the process is repeated for every remaining macroblock, so every macroblock within a Present Image is assigned a motion vector. A Present Image’s motion vectors are stored in a Motion Estimation Block. Although an estimation block will ultimately be stored, it is first used to generate a Predicted Frame.
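The search described above can be sketched as an exhaustive block-matching loop. This is a minimal illustration, not any particular codec's algorithm: the function names, the sum-of-absolute-differences (SAD) cost, and the small search window are all assumptions for the example.

```python
def sad(a, b):
    # Sum of absolute differences between two equal-sized blocks.
    return sum(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def get_block(frame, top, left, size):
    # Extract a size-by-size macroblock whose upper-left corner is (top, left).
    return [row[left:left + size] for row in frame[top:top + size]]

def motion_vector(present, adjacent, top, left, size, search_range):
    """Find where the Present Image's macroblock went in the Adjacent Image:
    start at offset (0, 0) and try increasing displacements, keeping the
    offset with the lowest SAD. A still macroblock keeps vector (0, 0)."""
    block = get_block(present, top, left, size)
    best = (0, 0)
    best_cost = sad(block, get_block(adjacent, top, left, size))
    h, w = len(adjacent), len(adjacent[0])
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            t, l = top + dy, left + dx
            if 0 <= t and t + size <= h and 0 <= l and l + size <= w:
                cost = sad(block, get_block(adjacent, t, l, size))
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx)
    return best
```

For example, if a bright 2x2 block moves down and right by one pixel between frames, the search returns the vector (1, 1); if the two frames are identical, it returns (0, 0). Real encoders use smarter search patterns than this brute-force scan, but the principle is the same.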
Below, the upper-left image is the Present Image. The upper-right image is the Adjacent Image (next video frame). One difference that has occurred between the capture of the Present Image and the capture of the Adjacent Image is obvious – the person has opened their eyes.
Steps to Generate a Predicted Frame (Wang).
The lower-left image is the Adjacent Image with the calculated motion vectors superimposed. These motion vectors are applied to the Present Image (current video frame) to construct a Predicted Frame. Simply put, the vectors move macroblocks in the Present Image to new locations. The lower-right image shows the generated Predicted Frame.
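A sketch of this motion-compensation step follows. Note one practical detail: rather than pushing Present Image blocks to new positions (which can leave holes and overlaps), real decoders do the reverse lookup, filling each macroblock of the Predicted Frame from the reference image displaced by that block's vector. The function name and the vector dictionary layout are illustrative assumptions.

```python
def predict_frame(reference, vectors, size):
    """Build a Predicted Frame from a reference (Present) image.
    vectors[(top, left)] = (dy, dx) means the content of the block at
    (top, left) lies at (top + dy, left + dx) in the reference; blocks
    without an entry default to zero motion."""
    h, w = len(reference), len(reference[0])
    predicted = [[0] * w for _ in range(h)]
    for top in range(0, h, size):
        for left in range(0, w, size):
            dy, dx = vectors.get((top, left), (0, 0))
            for r in range(size):
                for c in range(size):
                    # Clamp the source position to the frame edges.
                    sr = min(max(top + dy + r, 0), h - 1)
                    sc = min(max(left + dx + c, 0), w - 1)
                    predicted[top + r][left + c] = reference[sr][sc]
    return predicted
```

With a vector per macroblock, a bright patch in the reference can be relocated to approximate the next frame; any macroblock whose vector is imperfect contributes errors that the Difference Frame must later correct.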
Ideally, these vectors would move pixels exactly to their new locations. However, as shown, the Predicted Frame has errors. To eliminate motion estimation errors, a Difference Frame is created.
A Difference Frame is generated by subtracting the Predicted Frame from the Adjacent Image (the next video frame). Were the motion vectors able to create a perfect Predicted Frame, the Predicted Frame would match the Adjacent Image and the Difference Frame would be empty. With motion video, there will likely be information in the Difference Frame – as shown below.
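The subtraction is a simple per-pixel operation, sketched below with illustrative function names. The second function shows the decoder's side of the same relationship: adding the residual back to the prediction recovers the image exactly.

```python
def difference_frame(adjacent, predicted):
    # Residual: the actual next image minus its motion-compensated prediction.
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(adjacent, predicted)]

def apply_difference(predicted, diff):
    # Decoder side: adding the residual back corrects every prediction error.
    return [[p + d for p, d in zip(row_p, row_d)]
            for row_p, row_d in zip(predicted, diff)]
```

A perfect prediction yields an all-zero Difference Frame; every mis-predicted pixel shows up as a nonzero residual value, and applying the Difference Frame restores the Adjacent Image exactly.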
Difference Frame (Wang).
The Difference Frame is compressed with the DCT, after which lossless data reduction (VLC and RLC) is applied. This is the same process used to compress an I-frame. The motion estimation blocks are also VLC and RLC compressed. The compressed Difference Frame and the compressed motion estimation block are then stored.
To summarize the compression process: each I-frame is intra-frame compressed and stored in a long-GOP stream. Each compressed P-frame includes two types of information: a motion estimation block and a Difference Frame. (Each compressed B-frame has two motion estimation blocks and two Difference Frames.)
As a stream is decompressed, an I-frame is re-created by reversing its lossless compression and then performing an Inverse DCT. This yields a Present Image that is output as a video picture. (A Present Image can also be obtained from a previous I- or P-frame.) When a P-frame is encountered in a long-GOP stream, its motion estimation block is decompressed and its vectors are applied to the Present Image to create a Predicted Frame.
Next, the P-frame’s Difference Frame is re-created by reversing its lossless compression and then performing an Inverse DCT. With both a Predicted Frame and Difference Frame available, an Adjacent Image – output as a video picture – is generated by using the Difference Frame to correct errors in the Predicted Frame. (A B-frame’s single Adjacent Image is obtained from a previous I- or P-frame by appropriately employing two Difference Frames to correct errors in two Predicted Frames.)
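The decode loop just described can be sketched as follows. The stream layout here is a hypothetical simplification: the VLC/RLC and DCT stages are assumed already reversed, and the motion-compensation step is passed in as a function so the loop stays focused on the I-frame/P-frame logic.

```python
def decode_gop(stream, motion_compensate):
    """Decoder sketch for a simplified long-GOP stream. `stream` holds
    ('I', image) entries and ('P', vectors, diff) entries. For each P-frame,
    `motion_compensate(reference, vectors)` returns the Predicted Frame, and
    the Difference Frame then corrects its prediction errors."""
    reference = None
    decoded = []
    for entry in stream:
        if entry[0] == 'I':
            reference = entry[1]  # intra frame: output directly
        else:
            _, vectors, diff = entry
            predicted = motion_compensate(reference, vectors)
            # Predicted Frame + Difference Frame = exact next image,
            # which also becomes the reference for the following frame.
            reference = [[p + d for p, d in zip(row_p, row_d)]
                         for row_p, row_d in zip(predicted, diff)]
        decoded.append(reference)
    return decoded
```

For instance, with a trivial zero-motion compensator (`lambda ref, vecs: ref`), a stream of one I-frame followed by one P-frame decodes to the I-frame image and then that image corrected by the P-frame's Difference Frame.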
This process is repeated for the remaining frames in each GOP, and begins again when the next I-frame is encountered. Although P- and B-frames are more efficient than I-frames (they require less stored data), the use of Difference Frames gives them the same visual quality.