To be able to see a moving object reasonably well, the eye tracks the object and the smart camera operator will pan the camera to keep it still in the frame. Eyes and cameras have in common that they can only do this for one moving object and everything else will be seen less well.
Here’s an experiment that is desperately simple but illustrates the point. Best performed seated in a swivelling office chair, but failing that, standing up will have to suffice. Extend an arm forward and make a “thumbs up” gesture. Focus on the extended thumb. Whilst continuing to study the thumb, turn your office chair or swivel your body from side to side.
Your thumb looks great. Now continue to track the thumb, but consider the background. Note how the background goes by smoothly. It will be out of focus, because you are focussed on your thumb, and it will have some blur because of the motion, but the motion will be smooth. Next, try panning a moving object using a reflex camera having an optical viewfinder, or a pair of binoculars, and you will see the smooth background again.
However, when a camera is driving some display, it doesn’t matter whether it’s a movie camera using film, an electronic movie camera, a TV camera or even a still digital camera with an electronic viewfinder: you will not see the smooth background motion of the thumb experiment, because these systems cannot portray motion.
Why do the human eye, the reflex camera and the binoculars allow a smooth background when the other stuff doesn’t? Those three work in real time and relay the scene continuously, whereas the film and electronic cameras are sampling in the time axis. For reasons which will become clearer, sampling along the time axis only works when the time axis is the only axis, as in digital audio, where it works beautifully. In a multi-axial scenario such as imaging, it cannot and does not work.
The key fact to remember is that the only temporal filtering we have in the entire system is the finite response of the HVS. Everything else follows from that. The only way in which the filtering of the HVS can work properly is if the temporally changing image is spatially static on the retina. That means the changes are along the eye’s time axis only.
Imagine an ideal tracking shot where the camera is expertly propelled at the same speed as the actors and the cameraman keeps them essentially static in the viewfinder. When reproduced on a screen, the eye doesn’t have to move to see the near-static “action”. The finite temporal response of the eye reconstructs the sampled images and the tracked actors look great.
However, as I mentioned earlier, the camera and the eye can only follow one object at a time. The background and any other object moving differently will not appear on the eye’s time axis. Imagine a simple example of a tracking shot of a moving car in which there is a static post in the background. The car is tracked well and looks great, but with a short exposure the post does not move smoothly; instead it appears at a number of discrete places in the background. This is known as background strobing. The camera operator cannot use a short exposure to arrest motion.
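The geometry of strobing can be sketched with a little arithmetic. In the coordinates of the tracked frame, the static post appears to move, and a short exposure freezes it at one discrete position per frame. All numbers here are invented for illustration:

```python
# Background strobing sketch: the camera tracks the car, so in frame
# coordinates the static post appears to move. A short exposure freezes
# it at one discrete position per frame. Numbers are illustrative.
frame_rate = 24.0         # frames per second (assumed)
background_speed = 480.0  # apparent background speed, pixels/second (assumed)

# Post position in each of the first five frames, in pixels
positions = [n * background_speed / frame_rate for n in range(5)]
print(positions)  # jumps of 20 pixels, perceived as a row of ghost posts
```

Instead of a smooth sweep, the eye is offered a series of 20-pixel jumps, which is exactly what the viewer reports as strobing.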
Now consider the effect of a longer exposure. The car is not moving with respect to the camera and continues to look great, but the post now has motion blur, which makes it less obvious. If in addition we use shallow depth of focus, the post is out of focus as well as blurred.
In practice, the camera shutter is open for a significant part of the frame period, so there must be an aperture effect (yes: shutters can have aperture effects, but apertures don’t have shutter effects). The aperture is near rectangular, and convolving a rectangular aperture with a signal waveform is equivalent to filtering with a sin(x)/x frequency response that removes the high frequencies that carry detail. Any part of the image that moves with respect to the film or sensor will suffer that aperture effect. When watching the movie, any part of the image that the eye cannot or does not track will suffer a second aperture effect on the retina.
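The loss can be quantified. If the image moves a distance d across the sensor while the shutter is open, the exposure convolves the image with a rectangular aperture of width d, whose magnitude response is |sin(πfd)/(πfd)|. A minimal sketch, with an assumed blur length:

```python
import math

def aperture_response(freq, blur_len):
    """Magnitude of the sin(x)/x response of a rectangular aperture of
    width blur_len (distance moved during the exposure), evaluated at
    spatial frequency freq (cycles per pixel)."""
    x = math.pi * freq * blur_len
    return 1.0 if x == 0.0 else abs(math.sin(x) / x)

# Assumed 4-pixel motion blur: detail at 1/8 cycle/pixel is attenuated
# to about 64%, and the first null falls at 1/4 cycle/pixel.
print(round(aperture_response(0.125, 4.0), 3))  # 0.637
print(round(aperture_response(0.25, 4.0), 3))   # 0.0
```

The longer the smear, the lower the first null, so the aperture effect eats progressively coarser detail as motion gets faster.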
Fig.1 - These are two-dimensional spectra showing horizontal spatial frequencies against temporal frequencies. At a) a static real-life picture such as a photograph has spatial frequencies, but the temporal spectrum collapses to zero. At b) the picture of a) is sampled by a movie camera. The multiples of the frame rate can be seen in the temporal spectrum. At c) the spectrum of b) as seen by the HVS, which filters out the frame rate. Movies work best when nothing moves!
The avoidance of strobing almost defines a movie camera. It will need a large sensor and a lens with a wide aperture, both of which cost money, but which will allow shallow depth of focus, and a focus puller, who needs to be paid, to ensure the lens is focussed where the storytelling wants it.
In contrast, the ENG videographer needs a camera with a small sensor, whose deep depth of focus means constant focussing isn’t needed; it also happens to make the camera lighter. The videographer has the further advantage of a higher picture rate: 50 or 60 Hz.
With certain exceptions, most of the temporal frequencies in moving pictures come from moving detail. For simplicity, Fig.1 shows a number of two-dimensional y,t spectra that consider only spatial frequencies across an image, with the temporal frequencies at right angles. Fig.1a) shows a static picture in which the spatial spectrum extends horizontally, due to detail in the picture, and the temporal spectrum has collapsed to zero because there is no motion.
Fig.1b) shows the picture of 1a) being sampled by a movie camera. The spectrum of Fig.1a) now repeats at the frame rate. Fig.1c) shows that if this is presented to the HVS, frame rate multiples cannot be seen because they exceed the critical flicker frequency and the HVS acts like a classical Shannon reconstruction filter.
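The repetition can be written down directly: sampling at the frame rate places a copy of the baseband spectrum at every multiple of that rate. A sketch with an assumed 24 Hz frame rate:

```python
frame_rate = 24.0  # Hz (assumed)

# A static picture has all its temporal energy at 0 Hz; temporal
# sampling replicates that spectrum at every multiple of the frame rate.
replica_centres = [k * frame_rate for k in range(-2, 3)]
print(replica_centres)  # [-48.0, -24.0, 0.0, 24.0, 48.0]
```

It is those non-zero replicas that the temporal filtering of the HVS must remove for the picture to look continuous.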
Fig.2 - At a) the picture of Fig.1a) is being panned, which rotates the optic flow axis and the spectrum. At b) the spectrum of a) is sampled by the frame rate of a camera. At c) when the eye tracks, the passband of the eye is also rotated so the spectrum of b) is correctly reconstructed by the filtering of the HVS. At d) the use of a temporal anti-aliasing filter is detrimental as it removes wanted information.
Fig.2 shows what happens in the case of simple movement of the image, such as a pan, relative to the camera. Fig.2a) shows that the spectrum tilts, because moving detail produces temporal frequencies that are given by the product of the spatial frequency and the speed of motion. In the x,y,t domain, motion implies the existence of an optic flow axis that is rotated with respect to the time axis. Transform duality tells us not to be surprised that a corresponding rotation is seen in the frequency domain. If this spectrum is then sampled by a movie camera, the repeating spectrum is now also tilted as Fig.2b) shows.
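The product relationship in that paragraph is simple to state: detail of spatial frequency k swept past at speed v produces a temporal frequency of kv, which is why the amount of tilt is set by the speed. A sketch with invented numbers:

```python
def temporal_freq(spatial_freq, speed):
    """Temporal frequency (Hz) produced by detail of the given spatial
    frequency (cycles/degree) moving at the given speed (degrees/s)."""
    return spatial_freq * speed

# Assumed values: detail at 5 cycles/degree panned at 10 degrees/second
print(temporal_freq(5.0, 10.0))  # 50.0 Hz
```

Doubling either the fineness of the detail or the speed of the pan doubles the temporal frequency, i.e. tilts the spectrum further from the spatial axis.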
If the HVS is able to track the motion, that motion will tilt the pass band of the eye as in Fig.2c) and the sampling spectrum is filtered out. Figs.1 and 2 are both spectra of the picture. One could also consider that if the eye can track, the spectrum falling on the moving retina would be that of Fig.1b) and the HVS would filter it as in Fig.1c).
Fig.2d) shows what would happen if we could install a temporal anti-aliasing filter in front of the camera. Information would be lost, so even if such a device were possible, we would not want to use it.
Figs. 1 and 2 show ideal sampling, according to Shannon, which requires the sampling instants to be infinitesimally short. However, we have already seen that the movie camera in many cases cannot use such sampling, firstly because curtailing the amount of light would elevate the noise floor and secondly because it would produce background strobing. As was mentioned earlier, the result of a non-zero sampling aperture is loss of resolution. The direct result is that if a movie camera must be panned, the pan must be extremely slow and the speed must change imperceptibly, otherwise the results will be unsatisfactory. That is another reason for the film look.
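How slow “extremely slow” has to be can be estimated. Assuming an illustrative 180° shutter at 24 fps, the exposure lasts 1/48 s, and any untracked detail smears by the distance it crosses in that time:

```python
def pan_blur(speed_px_s, frame_rate=24.0, shutter_deg=180.0):
    """Smear length in pixels of untracked detail during one exposure.
    The frame rate and shutter angle are assumed illustrative defaults."""
    exposure = (shutter_deg / 360.0) / frame_rate  # seconds open per frame
    return speed_px_s * exposure

# Even a modest 96 pixels/second pan smears untracked detail by 2 pixels
print(pan_blur(96.0))  # 2.0
```

To keep the smear below a pixel, the pan in this sketch would have to crawl along at under 48 pixels per second, which is why movie pans are taken so gently.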
Fig.2 is an extremely simple example. In reality, although the frame may contain a moving object of interest that the movie camera and the HVS are tracking, it may also contain other objects and a background that are moving in different ways. For a less simple example, consider two objects moving in opposite directions, with the camera and HVS tracking one of them.
Fig.3 - Two objects are moving here, one tracked and one not. Note how the sampling spectrum of the untracked object has a component in the spatial domain of the tracked object. That is where background strobing comes from.
Fig.3 shows the result, which is that the y,t spectra of the moving objects rotate in opposite directions. When the spectrum of the non-tracked object is seen through the filtering action of the tracking eye, it will be seen that the information content of the non-tracked object has been dramatically reduced. Furthermore, the temporal sampling spectrum of the non-tracked object now has a component in the spatial axis of the tracking eye. That is where background strobing comes from.
The 24Hz movie is an unpromising and unforgiving medium that places severe restrictions on what can be filmed and how it should be done. Looking on the bright side, those restrictions soon weed out the fools who don’t know how to do it. The restrictions of 24Hz also get some respite because the system is not used in real time. Any problem will be seen in the rushes and can be re-taken.