Motion Pictures: Part 2 - Optical Flow Axis

There is no motion in the static frames of a movie. The motion is purely in the imagination of the viewer. But how does it work?

The human visual system (HVS), in common with the visual systems of other forms of life, is pretty remarkable. Light entering the eyes is converted into some kind of time-variant image, which is somehow communicated to the brain down a set of nerves. It is well known to those who work in the field that the bandwidth of nerves is very low indeed.

Yet even the old-fashioned standard definition (525/625) television signal required 270 megabits per second to send down an SDI cable and four or five megahertz of bandwidth to broadcast, although it was using tricks like interlace, color difference working and gamma to reduce the information rate.
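The 270 megabit figure can be checked with a little arithmetic. The sketch below assumes the usual standard definition component sampling structure (luma sampled at 13.5 MHz, two color difference signals at 6.75 MHz each, 10-bit words). Those parameters are not stated above and are brought in purely for illustration.

```python
# Assumed SD component (4:2:2) sampling parameters; not given in the article.
LUMA_RATE_HZ = 13_500_000      # luma samples per second
CHROMA_RATE_HZ = 6_750_000     # samples per second for EACH color difference signal
BITS_PER_WORD = 10             # word length on the serial interface


def sdi_bit_rate(luma_hz: int = LUMA_RATE_HZ,
                 chroma_hz: int = CHROMA_RATE_HZ,
                 bits: int = BITS_PER_WORD) -> int:
    """Return the serial bit rate for one luma and two color difference channels."""
    words_per_second = luma_hz + 2 * chroma_hz
    return words_per_second * bits


print(sdi_bit_rate() / 1e6, "Mbit/s")  # prints: 270.0 Mbit/s
```

Note that color difference working already appears here: each color difference signal is sampled at half the luma rate, which is one of the information-saving tricks mentioned above.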

There is no possible way a nervous system can handle bandwidth like that, yet the HVS can discern shortcomings in an SDTV picture and appreciate recent developments such as high definition, high dynamic range, and extended color gamut, all of which drive the information rate higher.

Clearly there must be something clever going on in the HVS: very clever indeed. We are often told the human eye is like a camera, but that is a misleading statement. It has a lens and an iris and an image forming sensor, but there the similarity ends. I am here writing this because my genes evolved to survive, and the visual system is an important consequence of that evolution.

Evolution reacted to the fact that in the typical field of view most of what can be seen is of no consequence and is static and neither a threat nor an opportunity. Threats in particular are seldom static. What evolved was a system in which most of what we think we see comes from a kind of mental frame store that will be updated if any movement is detected.

This is where the eye departs from being a camera. Most of the area of the retina is a motion detector. It is color blind and has poor resolution. Right in the center of the retina is a small area called the fovea that has high acuity and color vision. The eye has the ability to swivel left and right, up and down, so that if the motion detection spots something, the eye can turn to place it on the fovea, where it can be seen in detail.

The eye scans a scene to fill up the frame store and from then on motion detection allows the frame store to be updated. All of this cuts dramatically the amount of information the HVS has to handle. A further and highly significant consequence of eyes that can move is that objects in motion can be tracked and brought to rest on the retina, eliminating motion smear and cutting information content. This is just as well, because the temporal response of the HVS is quite slow, whereas moving detail produces high frequencies.

Essentially, living visual systems anticipated MPEG-2 in having motion detection that allows them to look along an optic flow axis. The motion vectors control the eyeballs, and holding the image static on the retina means it doesn’t change and can be described with lower bandwidth. The key point to remember is that in all visual situations the HVS will attempt to use eye tracking. In real life it usually succeeds. When viewing artificially reproduced moving images, the success may be only partial because of shortcomings in the system.

Fig.1 - The x, y plane is the plane of the image and of course the time axis t is orthogonal to that. However, the optic flow axis is not orthogonal when anything moves and it can be projected on the image plane.

The traditional view that moving pictures can be described with three mutually orthogonal axes, x, y and t, is incomplete. In the presence of eye tracking, x and y still exist in the image plane, but the important third axis is the axis of optic flow. As Fig.1 shows, the optic flow axis is typically not at right angles to the image plane and so it is not orthogonal. Put more plainly, things that are done on the time axis, which is orthogonal to the image plane, can still affect the image because actions on the time axis reflect off the optic flow axis and into the image plane.

If that sounds a bit academic, consider a fixed still camera shooting a moving object. The film lies in x and y, and the shutter works in t; all three are mutually orthogonal. But the moving object isn’t simply moving along the t axis. Photographers soon learn that shooting moving objects requires fast shutter speeds, that is, short exposure times. If x, y and t were the whole story, the shutter speed wouldn’t make any difference to sharpness. But it does. The object in motion has a component of movement across the image plane, so the exposure time controls how far its image moves while the shutter is open.
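The still camera example reduces to a single multiplication, sketched below. The function name, the pixel units and the example speed are hypothetical, chosen only to make the point concrete for a fixed camera and constant motion.

```python
def blur_pixels(speed_px_per_s: float, exposure_s: float) -> float:
    """Distance the image of a moving object travels across a fixed sensor
    while the shutter is open (illustrative model: constant speed)."""
    return abs(speed_px_per_s) * exposure_s


# Hypothetical object crossing the image plane at 2000 pixels per second.
speed = 2000.0
for denominator in (30, 250, 1000):
    exposure = 1 / denominator
    print(f"1/{denominator} s exposure -> {blur_pixels(speed, exposure):.1f} px of blur")
```

The blur is the projection of the optic flow axis onto the image plane over the exposure time: zero when nothing moves, and growing in proportion to both speed and exposure.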

In real life, optic flow axes tend to be straight for constant motion, or curved when objects change course or speed. The motion portrayal ability of imaging systems is basically the accuracy with which these axes are reproduced. None of today’s moving picture systems have good motion portrayal; it is just that 24Hz movies are even worse.

The reason that motion portrayal is so important is eye tracking. In real life, a moving object is tracked by the eye which can then extract detail from it. If in a motion picture system, the eye cannot track well, the detail seen by the viewer will be reduced.

When movies were in their infancy, test procedures simply copied what had been done in photography and the static resolution of the system was measured. Unfortunately, static resolution turned out to be virtually useless as a way of comparing moving images. Static resolution can be thought of as a kind of bound that cannot be exceeded. It is the resolution obtained when nothing moves.

Clearly movies and TV programs in which nothing moves are a long way from reality. When things do move, as in the real world, the resolution actually obtained will always be less than the static resolution. The amount by which resolution is reduced depends upon the system and how it is used. It follows that the worse the system itself is, the more carefully it has to be used. One source of the film look is that it reflects that extra care.

Fig.2 - A traditional film projector has four phases within the 1/24 sec. frame. In phases 2 and 4 the screen is dark, but in phases 1 and 3 the same picture is on the screen. Relative to the tracking eye (shown dotted) there will be double images above a certain speed.

A motion picture system based on frames is a sampling system. It is now common knowledge that sampling systems need to be preceded by a low-pass filter to prevent aliasing and followed by a further filter to return the sampled data to the continuous domain. Whilst that would be true for systems such as digital audio, in motion pictures it is not done.
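What aliasing means on the time axis can be illustrated with conventional sampling theory. The sketch below is a generic textbook illustration, not a method taken from this article: it folds the true repetition rate of a periodic event (say, a flashing light or a rotating spoke pattern) into the range that a given frame rate can represent. The function name is hypothetical.

```python
import math


def apparent_frequency(true_hz: float, frame_rate_hz: float) -> float:
    """Frequency that survives sampling at frame_rate_hz, folded into
    the range [-frame_rate_hz / 2, frame_rate_hz / 2).
    A negative result means the motion appears to run backwards."""
    return true_hz - frame_rate_hz * math.floor(true_hz / frame_rate_hz + 0.5)


# With no filter ahead of a 24 Hz camera, events faster than 12 Hz alias.
for f in (10.0, 23.0, 25.0):
    print(f"{f} Hz event sampled at 24 Hz appears as {apparent_frequency(f, 24.0)} Hz")
```

An event repeating 23 times a second comes out as -1 Hz, which is the familiar wagon wheel turning slowly backwards. This model takes no account of a tracking eye.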

Although a movie or TV camera is a sampling device, there is no technology that allows something to be placed in front of the camera to prevent temporal aliasing. It’s impossible. In fact, Nature is doing us a favor, because such a filter will be seen in Part 3 to be undesirable. Equally there is no optical device known that can be fitted to a display to smooth the frames/samples into a continuum. Again, this doesn’t matter because the HVS contains such a filter, which makes any display filter undesirable.

Movies and television simply don’t seem to adhere to sampling theory and they run with no temporal filters at all except for the filtering of the HVS. It is not that conventional sampling theory is wrong, it is that a more sophisticated form of sampling theory is needed to deal with the motion of the eye and the existence of optic flow axes.

Fig.2 shows that one frame period in a classical film projector is split into four parts of about 1/96 of a second. In the first part the film frame is projected on to the screen. In the second part the shutter blocks the light path, and the screen goes dark. In the third part the shutter opens again, and the same film frame is projected a second time. In the fourth part the shutter closes, and the film is pulled down to the next frame so the cycle can repeat.

Fig.3 - Electronic projectors don’t have pull down, and don’t need to double project. Nevertheless, motion is still limited to low speeds as the images are smeared with respect to a tracking eye.

The projector does this in order to raise the flicker frequency to 48Hz. However, Fig.2 shows the unintended consequence, which is that the reproduced optic flow axis compares rather badly with the original. At low motion speeds the two projected versions of the frame arrive on the retina with a small displacement that causes defocusing. At high speeds, the retina sees a double image.
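The timing of the four phases, and the size of the double image, can be put into numbers. This is a minimal sketch assuming equal phases, an eye that tracks at the constant speed of the object, and speed measured in screen pixels per second. The names and units are hypothetical.

```python
FRAME_RATE_HZ = 24
PHASES_PER_FRAME = 4     # show, dark, show again, dark with pull-down
FLASHES_PER_FRAME = 2    # each film frame is projected twice

phase_s = 1 / (FRAME_RATE_HZ * PHASES_PER_FRAME)        # about 1/96 s
flash_rate_hz = FRAME_RATE_HZ * FLASHES_PER_FRAME       # 48 Hz flicker
flash_spacing_s = 1 / flash_rate_hz                     # 1/48 s between the two showings


def double_image_offset(speed_px_per_s: float) -> float:
    """Separation on the retina, in screen pixels, between the two showings
    of one frame for an eye tracking motion at the given speed."""
    return abs(speed_px_per_s) * flash_spacing_s


for speed in (48.0, 480.0, 1920.0):
    print(f"{speed:7.1f} px/s -> images {double_image_offset(speed):.1f} px apart")
```

Both showings put the object at the same place on the screen, but the tracking eye has moved on by the time of the second one, so the separation grows in proportion to speed. Small separations read as softening, large ones as two distinct images.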

Fig.3 shows the same images handled by an electronic projector. These need no dark period to pull the film down, so the image is displayed for the whole 1/24 second. To the tracking eye the image moves across the retina a distance proportional to motion speed, once more causing loss of resolution.
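The smear in the electronic case follows from the same reasoning. The sketch below assumes the image is held steadily for a stated fraction of the frame period while the eye tracks at constant speed. The `hold_fraction` parameter is a hypothetical addition for experimentation; the case described above corresponds to a value of 1.0.

```python
def hold_smear(speed_px_per_s: float,
               frame_rate_hz: float = 24.0,
               hold_fraction: float = 1.0) -> float:
    """Length of the smear on the retina, in screen pixels, when a static
    image is held on screen while the eye tracks at the given speed."""
    hold_time_s = hold_fraction / frame_rate_hz
    return abs(speed_px_per_s) * hold_time_s


for speed in (48.0, 480.0, 1920.0):
    print(f"{speed:7.1f} px/s -> smear of {hold_smear(speed):.1f} px")
```

Under these assumptions the smear spans the distance the eye travels during the whole 1/24 second, so removing the dark periods trades a double image for a continuous blur of twice the extent.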

Paradoxically, the last thing that motion pictures can do is to portray motion, and much of the grammar of movies results from an effort to get rid of it. The tracking shot is the perfect example. Moving the camera with the action renders the action still relative to the camera and pushes any funny stuff into the background. Fancy a trip to the barely movies?
