Camera Lenses Part 6: Tracking Motion and Its Effects

In the previous article in this series on lenses, the concept of the optic flow axis was introduced. This is the all-important fourth dimension of moving pictures because the human eye is capable of following, or tracking, motion. Let us look closer at how optic flow affects key aspects of imagery.

The concept is well proven, as it is used in motion-compensated standards convertors to locate moving objects in output frames in such a way that they do not judder, and in motion-compensated compressors, where the greatest inter-frame redundancy is found along the optic flow axis. These complex applications have benefited enormously from the use of this theory, which makes its neglect elsewhere harder to understand.

Optic flow theory

It is useful to understand what optic flow theory predicts. 24Hz movies cannot, of course, be watched at 24Hz because the flicker is intolerable. Instead each frame is projected twice. Figure 1 shows that the effect of frame repeat is that pairs of images are the same and then all motion is condensed into a jump to the next pair. The optic flow axis on the screen gets rather clunky. What the tracking eye sees in the presence of motion is loss of resolution at low speed and a double image at higher speed. This is what is behind the moves to use increased frame rates, such as 48Hz, in the cinema.

Figure 1: Frame repeat allows 24Hz material a) to be seen with reduced flicker, but b) causes the tracking eye to see double images.

Optic flow theory can predict what will come out of real cameras. In the presence of motion, the optic flow axis is angled between the time axis and the image plane. A finite exposure time reflects off the sloping optic flow axis to produce a rectangular impulse on the image plane whose width is the amount by which the image will be smeared. That rectangle is the impulse response of the motion-induced low-pass filter that will reduce the resolution of the image. Self evidently, raising the frame rate shortens the shutter time and reduces the smear.
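That rectangular impulse response can be modelled in a few lines. The sketch below is mine, not the article's: it gives the smear width for a given pan speed and exposure, and the resulting sinc-shaped MTF of the motion-induced low-pass filter, with its first null at the reciprocal of the smear width.

```python
import math

# Minimal sketch (names and figures are mine, not the article's):
# motion smear acts as a rectangular low-pass filter on the image.
def smear_pixels(pan_speed_px_per_s: float, exposure_s: float) -> float:
    """Width of the rectangular smear impulse on the sensor, in pixels."""
    return pan_speed_px_per_s * exposure_s

def smear_mtf(spatial_freq_cpp: float, smear_px: float) -> float:
    """MTF of a rectangular impulse of width smear_px, at a spatial
    frequency in cycles per pixel: |sinc(f * w)|, first zero at f = 1/w."""
    x = spatial_freq_cpp * smear_px
    if x == 0:
        return 1.0
    return abs(math.sin(math.pi * x) / (math.pi * x))

# Halving the exposure halves the smear, exactly as the text says:
full = smear_pixels(120.0, 1 / 120)   # one pixel of smear
half = smear_pixels(120.0, 1 / 240)   # half a pixel of smear
```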

However, some 48Hz movies have been shot using electronic cameras with an exposure time practically equal to the frame period. The reason is that if alternate frames are discarded, the result is a 24Hz movie that looks as if it was shot with a 180-degree (50 percent) shutter, so there is backward compatibility. However, the exposure time in the 48Hz movie should have been half that of the 24Hz movie; as it wasn't, there was no reduction in motion smear and the benefits of 48Hz capture were not fully realised. Little wonder such movies met with a mixed reception.
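The backward-compatibility arithmetic is easy to verify; this is a quick sketch of the numbers in the paragraph above, nothing more:

```python
# A quick check of the 48Hz-to-24Hz backward-compatibility arithmetic:
exposure_s = 1.0 / 48.0        # shutter open for essentially the whole 48Hz frame
period_24_s = 1.0 / 24.0       # frame period after discarding alternate frames
duty_cycle = exposure_s / period_24_s
# A 50 percent duty cycle is a 180-degree shutter at 24Hz:
shutter_angle_deg = duty_cycle * 360.0
```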

Television images

Returning to television, let’s take an example of a true 4K camera having 4096 pixels across the frame and a lens that can handle it. Suppose the 60Hz camera exposes for half the frame period. Now let’s pan the camera, incredibly slowly at 34 seconds per picture width. I chose that speed because it means that the image moves across the sensor by exactly one pixel width while the shutter is open. In other words every pixel coming from the chip is the average of two pixels in the image. Your 4K camera has become a 2K camera. Pan at 17 seconds per picture width and you have a 1K camera. That 4K capable lens you bought is a bit over-specified, isn’t it?
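That pan arithmetic can be sketched directly; the helper name is mine. One pixel of smear means every output pixel is the average of two image pixels, which is what halves the resolution:

```python
# Sketch of the article's pan arithmetic; the helper name is mine.
def smear_px(sensor_px: int, frame_rate_hz: float,
             shutter_fraction: float, secs_per_picture_width: float) -> float:
    """Image smear, in pixel widths, while the shutter is open."""
    exposure_s = shutter_fraction / frame_rate_hz
    pan_speed = sensor_px / secs_per_picture_width   # pixels per second on the sensor
    return pan_speed * exposure_s

# 4096-pixel sensor, 60Hz frame rate, 50 percent shutter:
slow = smear_px(4096, 60, 0.5, 4096 / 120)  # ~34 s per width: 1 pixel of smear, "2K"
fast = smear_px(4096, 60, 0.5, 4096 / 240)  # ~17 s per width: 2 pixels of smear, "1K"
```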

This schoolboy calculation shows beyond doubt that static resolution (SR) is not a meaningful metric for moving picture quality. In the real world with real motion there are no cameras delivering 4K performance, there aren't even any cameras delivering HD performance and there never will be as long as we adhere to these miserable frame rates based, without any science, on the local power-line frequency. 4K will never be broadcast widely as no one will be able to tell much difference on real material between native 4K and up-converted 720p. 4K is an oversampling display technology and possibly a capture technology.

The clunky motion portrayal of imaging systems remains the obvious clue that an artificial image is being watched and the transition from SD to HD did not remove that clue, nor will the adoption of 4K or HDR or wide colour gamut.

Use a spectrum analyser on any TV frame at random and the chances of finding frequencies corresponding to full resolution are practically nil. Now you know why compression works so well. Frames are divided into different spatial frequency bands and only those containing energy are coded. An MPEG bitstream analyzer will tell you the same story. Look for the highest spatial frequency coefficients in a block, and chances are you won’t find them.

Numbers can fool you

4K coders are claiming fantastically low bit rates. That is only possible because there is no such thing as a frame containing frequencies corresponding to 4K resolution. 4K video is just ordinary video massively oversampled. It has more pixels but no more information.

Figure 2: This image was down-converted from the original 60-megabyte UHD frame. The camera was tripod-mounted and the shot was timed for when the breeze dropped.

Interestingly, the fact that gamma correction works proves that TV signals never contain full resolution. Gamma is a strongly non-linear transfer function applied to the video signal at source. Strong non-linearity causes harmonics. The spectrum of video is doubled in bandwidth because of gamma. The fact that it still fits in the available channel is because high video frequencies simply aren’t present at the input to the gamma process.
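The harmonic argument can be checked numerically: pass a pure tone through a power-law curve and look at the spectrum. A minimal NumPy sketch follows; the 0.45 exponent is a typical camera gamma chosen for illustration, and the figures are mine:

```python
import numpy as np

# Illustrative sketch: a power-law (gamma-like) nonlinearity creates harmonics.
n = 4096
t = np.arange(n) / n
tone = 0.5 + 0.5 * np.sin(2 * np.pi * 8 * t)   # pure 8-cycle tone in the range 0..1
gamma_out = tone ** 0.45                        # typical camera gamma exponent

in_spec = np.abs(np.fft.rfft(tone - tone.mean()))
out_spec = np.abs(np.fft.rfft(gamma_out - gamma_out.mean()))

# The input has no energy at twice the tone frequency (bin 16);
# the gamma-corrected output does: its bandwidth has grown.
assert in_spec[16] < 1e-9 * in_spec[8]
assert out_spec[16] > 1e-3 * out_spec[8]
```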

I know from my own experience of still photography with a UHD sensor that the best results are only obtained with the camera on a tripod to prevent inadvertent motion. Figure 2 shows a down-converted frame, and Figure 3 shows a small part of the frame to reveal the resolution. No real TV camera will ever be able to do that at present frame rates.

Figure 3: A selected area of Figure 2 above. The 4K resolution shown here can only be obtained with great care taken to avoid motion, so no 4K TV camera will ever produce anything like this under real conditions.

So in the case of pixel count, more isn’t necessarily better. The Dynamic Resolution Function (DRF) describes how resolution falls as a function of motion. A DRF that falls steeply from a high SR figure will give worse results in practice than a DRF that starts out with less SR but maintains it better with motion.
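As a toy illustration of that trade-off, here is a hypothetical DRF in which resolution is limited only by motion smear; every parameter below is invented for the example, not taken from any real camera:

```python
# Hypothetical sketch of a DRF where resolution is limited only by motion
# smear; all parameters are invented for illustration.
def drf(static_res_px: float, exposure_s: float, secs_per_picture_width: float) -> float:
    """Effective resolution in pixels at a given pan speed."""
    smear = (static_res_px / secs_per_picture_width) * exposure_s
    return static_res_px / (1.0 + smear)

# A 4K camera with a long exposure loses to an HD camera with a short one
# once the scene moves (both panning at 5 s per picture width):
res_4k = drf(4096, 1 / 60, 5.0)
res_hd = drf(1920, 1 / 240, 5.0)
```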

It may be interesting to compare the DRFs of the now-obsolete PAL and NTSC systems to see what can be learned. These were both called interlaced systems, even though they only interlaced rarely, such as when nothing moved. Figure 4 shows that in any interlaced format, there is a vertical motion speed that causes the optic flow axis to join up the lines in adjacent fields. When that happens, the resolution has halved because, to the HVS (Human Visual System), the picture now has the number of lines in a field, not a frame.

Figure 4: All interlaced formats a) have a motion speed at which they no longer interlace, and b) at that speed their resolution is based on the number of lines in a field.

That situation arises when the vertical motion is one frame line per field period. In NTSC that happens when the vertical motion speed is 8 seconds per picture height because in 8 seconds there are 480 fields and that is also the number of active lines. In PAL, the half-resolution point occurs at 12 seconds per picture height, so the dynamic resolution of NTSC is 50 percent better than that of PAL, even though it has fewer lines.

Being from the old world, I had been brought up to believe European television with its greater number of lines was somehow better, yet the first time I went to America, I discovered that it wasn’t. It took a few years before I understood why. Given that PAL and NTSC have nearly the same line rate, what NTSC does is to put fewer lines in more frames and that is the better way to reproduce moving pictures. Trying to improve moving pictures simply by increasing static resolution is like installing an upgraded parking brake and wondering why the car doesn’t perform any better. The problem is not the static resolution; it’s the way it falls when anything moves. The solution must be to control whatever makes it fall.

Optic flow analysis also demonstrates clearly that the effective resolution of interlaced television is the number of lines in the field, not in the frame. Thus by definition interlace is not a high definition technology and the inclusion of interlaced formats in the ATSC HD standard was a mistake. In 1080i, which, by the way, ought to be described as 540i, the half-resolution speed is a miserable 18 seconds per picture height, which is ideal for snail racing and watching paint dry but not much else. The pressure to move to 4K came rather soon after the introduction of HD, which suggests HD wasn't good enough. Note that 4K formats do not support interlace, so something has been learned.
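All of the half-resolution speeds quoted above come from the same formula: active frame lines divided by field rate. A quick check, using nominal field rates for brevity:

```python
# The half-resolution speeds follow from one formula:
# active frame lines divided by field rate (nominal rates used for brevity).
def half_res_speed_s(active_lines: int, field_rate_hz: float) -> float:
    """Seconds per picture height at which an interlaced format halves."""
    return active_lines / field_rate_hz

ntsc = half_res_speed_s(480, 60)       # 8.0 s per picture height
pal = half_res_speed_s(576, 50)        # 11.52 s, quoted as ~12 s in the text
hd_1080i = half_res_speed_s(1080, 60)  # 18.0 s
```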

To summarise, in real cameras shooting real scenes, there are three mechanisms by which resolution can be limited. If we want to know how this will be interpreted by the tracking eye of a human viewer, the resolution we consider has to be dynamic resolution.

The first limit is the lens. Lenses today aren't a problem, especially as the dynamic resolution of all lenses is the same as the static resolution. That's right: lenses portray motion perfectly and we rely on them to conceal the problems introduced by parts of the system that can't. The second limit is the sensor construction and the number of pixels it has. The third, and dominant, limit is motion smear, which is proportional to the exposure time.

In the next part of this series of tutorials on lenses and imagery, we will look at how to go about specifying a moving picture format that is optimised for motion.
