Camera Lenses Part 3: MTF And Depth Of Field

All lenses suffer from various imperfections that reduce sharpness. However, even if the lens elements were ideal and caused no loss of resolution, lenses still can’t focus to a perfect point.


This article was first published in 2015. It and the rest of this mini series have been immensely popular, so we are re-publishing it for those who missed it first time around.


Camera lens elements are made to finite tolerance from glass of finite uniformity and assembled to finite accuracy, and all those little imperfections add together. The manufacturers try very hard to minimise them without getting too far into diminishing returns. Users can further reduce these imperfections by stopping down, but only up to a point. That is because of diffraction: the wave nature of light limits what imaging can do.

Imaging of detail is about distinguishing between or separating points in the picture. Figure 1 shows what happens with a fixed wavelength, like the green in the middle of human vision. Bring points in the picture closer together, and the wavefronts describing the spacing of the points get more oblique to the optical axis. Lenses act like spectrum analyzers: the average brightness information travels up the optical axis, and as the spatial frequency rises, the information travels through a larger radius in the lens. The lens must have a large enough aperture to capture the oblique wavefronts carrying the detail. Light from any finer detail passes outside the lens, which therefore acts as a low-pass filter.
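
To put rough numbers on Figure 1, here is a minimal sketch assuming the simple grating relation sin θ = λ/d for the angle at which detail of pitch d diffracts, and approximating the acceptance half-angle of an f/N lens as arctan(1/2N). The 550 nm wavelength, the pitch values and the f/2.8 aperture are illustrative assumptions, not figures from this article.

```python
import math

WAVELENGTH_UM = 0.55   # green light, roughly the middle of human vision

def first_order_angle_deg(pitch_um):
    """Angle at which light carrying detail of a given pitch leaves the
    subject (simple grating relation: sin(theta) = wavelength / pitch)."""
    return math.degrees(math.asin(WAVELENGTH_UM / pitch_um))

def acceptance_half_angle_deg(f_number):
    """Rough half-angle of the cone of light an f/N lens can accept."""
    return math.degrees(math.atan(1.0 / (2.0 * f_number)))

limit = acceptance_half_angle_deg(2.8)
for pitch in (10, 5, 3, 2):   # detail pitch in micrometres
    theta = first_order_angle_deg(pitch)
    verdict = "passes" if theta <= limit else "is cut off"
    print(f"{pitch} um detail leaves at {theta:.1f} deg and {verdict} at f/2.8")
```

The finer the detail, the steeper the angle, until the wavefront misses the aperture altogether: exactly the low-pass behaviour described above.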

Figure 1. Here, light from two closely spaced points forms a diffraction pattern where the wavefront leaves at an angle θ (theta) which only just fits in the aperture of the lens. Move the points closer together and the angle increases. The lens won't see the detail.

Lenses have a frequency response, just like analog video signal paths, except the frequency is spatial and the response is called a modulation transfer function (MTF). The modulation concerned is black and white stripes, either on a zebra or a test chart. As the stripes get narrower and closer together, there comes a point where the lens can't tell them apart and the result is gray because there is no modulation left. In all lenses, the MTF is controlled by the aperture selected. The smaller the aperture, the fewer high-frequency wavefronts get through and the greater the diffraction loss. An ideal point becomes enlarged into a point-spread function, which is a fancy term for a blob.
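
For a hypothetical diffraction-limited lens in incoherent light, the textbook figures are a cutoff spatial frequency of about 1/(λN) cycles per mm and a point-spread (Airy disk) diameter of about 2.44λN. The sketch below simply tabulates those two quantities; the wavelength and the apertures chosen are illustrative.

```python
WAVELENGTH_MM = 550e-6   # 550 nm green light, expressed in millimetres

def mtf_cutoff_cycles_per_mm(f_number):
    """Spatial frequency at which a diffraction-limited MTF falls to zero."""
    return 1.0 / (WAVELENGTH_MM * f_number)

def point_spread_diameter_um(f_number):
    """Approximate Airy-disk diameter: the 'blob' an ideal point becomes."""
    return 2.44 * WAVELENGTH_MM * f_number * 1000.0

for n in (2.8, 5.6, 11, 22):
    print(f"f/{n}: MTF cutoff ~{mtf_cutoff_cycles_per_mm(n):.0f} cycles/mm, "
          f"point spread ~{point_spread_diameter_um(n):.1f} um")
```

Even with perfect glass, stopping down from f/2.8 to f/22 shrinks the cutoff frequency by a factor of about eight and grows the blob accordingly.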

There are two opposing sharpness mechanisms in real lenses. There are losses due to tolerances, which can be minimised by stopping down, and there is loss due to diffraction, which can be reduced by opening up the aperture. It follows that all real lenses have an optimum or sweet spot aperture where the overall loss of sharpness is minimal. Balanced camera design allows the lens to be used at that aperture in real life situations. Neutral density (ND) filters help, because if the scene is too bright, instead of stopping down we can pull in an ND filter and keep a bigger aperture if we want. Low noise electronics helps too, because if we want to keep a small aperture when the light is poor, we can turn up the gain, or the apparent speed of the sensor, without the picture getting grainy.
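
As a worked example with made-up numbers: suppose the exposure calls for f/11 but the lens performs best near f/4. The difference in stops is 2·log2(11/4), and an ND filter rated at roughly that many stops (using the common convention of about 0.3 optical density per stop) keeps the exposure while the aperture stays at the sweet spot.

```python
import math

def stops_between(f_wide, f_narrow):
    """Exposure difference, in stops, between two f-numbers."""
    return 2.0 * math.log2(f_narrow / f_wide)

sweet_spot, metered = 4, 11        # hypothetical sweet spot and metered aperture
stops = stops_between(sweet_spot, metered)
print(f"f/{metered} -> f/{sweet_spot} admits {stops:.1f} stops more light")
print(f"so an ND of about {0.3 * stops:.1f} density (~{stops:.0f} stops) keeps the exposure")
```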

In photography we can also change the exposure time, whereas in videography there is rather less freedom because of the fixed frame rate and the effects of motion which will have to wait for another time.

The subject of lenses cannot be treated without looking at depth of focus, because that’s where the theory meets the practice. No skill at all is needed to know a picture is out of focus. Knowing why and what to do is a little harder.

Lenses are mapping devices and in theory they can map ideally only from one plane to another. One of those planes will be the sensor; the other plane typically isn't a plane at all, but a three dimensional scene. The only imaging technology that can map an arbitrary three dimensional scene onto a plane is the pin-hole, and whilst it worked for Canaletto, it isn't sensitive enough for photography, let alone videography.

Recall that we use a lens because it can capture light over a greater solid angle and give imaging devices useable sensitivity. The down side is that with a lens only one plane at one distance in front of the camera can be in perfect focus at any one time, according to how we set the focussing ring.

In a perfect system having infinite resolution, everywhere else would be out of focus. In the real world, as we just saw above, lenses have point spread functions. Resolution is further lost due to other problems like pixel size and motion. If there is a fixed loss of resolution in the system, then small losses due to lack of focus will be disguised and there will be a range of distances from the camera where the quality of focus appears the same. That is what is meant by depth of focus (DOF), also called depth of field: it is the range of distance from the camera over which errors other than those due to focus are dominant. Outside that range, focus losses dominate the perceived sharpness.

One thing that should follow is that the depth of focus that can be achieved in practice is a function of the resolution of the camera. If the resolution is doubled, the amount of focus error that can be disguised is halved, and with it the depth of field. That is immediately important for videographers making the transition from SD to HD to 4K shooting.

The mechanism of depth of focus is easy to follow. Light entering the lens aperture is focused into a cone whose point should fall on the sensor. Beyond that point, the light diverges in a second cone. Imagine moving a screen around the area of the focus. At the focus there would be a point on the screen, but too close or too far and one of the cones intersects the screen in a circle: the circle of confusion. It should be obvious that if we narrow down the cones by closing the aperture of the lens, the screen can move further for the same-sized circle of confusion. Or in plain English, the smaller the aperture, the greater the depth of focus, except that going to an extremely small aperture will raise diffraction loss.
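
The cone geometry reduces to similar triangles: the base of the cone is the aperture diameter and its height is the image distance, so the circle of confusion grows in proportion to how far the sensor sits from the cone's apex. A minimal sketch, assuming an illustrative 50 mm lens focused on a distant subject (image distance taken as the focal length) and an arbitrary 0.05 mm of defocus:

```python
def circle_of_confusion_um(f_number, focal_length_mm, defocus_mm):
    """Blur-circle diameter from similar triangles: aperture diameter
    scaled by (defocus / image distance). A distant subject is assumed,
    so the image distance is taken as the focal length."""
    aperture_mm = focal_length_mm / f_number
    return aperture_mm * defocus_mm / focal_length_mm * 1000.0

for n in (2.8, 8, 22):
    c = circle_of_confusion_um(n, 50, 0.05)
    print(f"f/{n}: 0.05 mm of defocus -> {c:.1f} um circle of confusion")
```

Halving the aperture diameter halves the blur for the same defocus, which is exactly the narrower-cone argument above.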

Depth of field isn't a binary thing. The picture isn't suddenly out of focus as soon as a magic distance is exceeded. Instead it's a gradual thing. In photography and cinematography it is common to use selective focus to draw the viewer's attention to a specific subject by throwing the background (or foreground) out of focus. In the out-of-focus areas the image is convolved with the shape of the iris, so that point highlights show up as little iris-shaped spots. The subjective smoothness of the out-of-focus area is referred to as boke, usually written bokeh in English, a Japanese word for blur.

Figure 2. Here is an example of using DOF to isolate a subject. Note the boke is good and the background has lost detail.

Figure 3 shows another shot with limited DOF, but this time the boke is poor because a mirror lens was used, and mirror lenses have an annular aperture. Mirror lenses are shorter and lighter for the same focal length because they fold the light path. However, the centre of the front element is opaque because there is an internal mirror behind it, so the aperture is annular, like a doughnut, hence the doughnut-shaped highlights in the background.

The existence of depth of focus means that there is no point in focusing at infinity. The hyper-focal distance H is the distance at which a lens can be focused such that objects at infinity are not reduced in resolution more than they are due to other effects. The same holds for objects as close as H/2. This is very nearly the same as saying that a lens focused on infinity retains resolution down to H. As mentioned above, all of this changes with system resolution. A lens with better MTF performance will have a longer H.
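
A numeric sketch of the hyper-focal distance, using the common approximation H ≈ f²/(N·c) + f. The 35 mm focal length, f/8 aperture and the two circle-of-confusion values are illustrative assumptions; the tighter value stands in for a higher-resolution system, which is why its H comes out longer.

```python
def hyperfocal_m(focal_length_mm, f_number, coc_mm):
    """Hyperfocal distance: focus here and everything from roughly H/2 to
    infinity stays within the chosen circle of confusion."""
    h_mm = focal_length_mm ** 2 / (f_number * coc_mm) + focal_length_mm
    return h_mm / 1000.0

for coc in (0.03, 0.015):          # looser versus tighter blur criterion
    h = hyperfocal_m(35, 8, coc)
    print(f"c = {coc} mm: H = {h:.1f} m, acceptable focus from ~{h / 2:.1f} m to infinity")
```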

Figure 3. DOF isolation with poor boke. The critter (a coypu) looks good enough, but the background is confused rather than out of focus. This is a characteristic of mirror lenses.

Format or sensor size also makes a difference to DOF. In a television application, we assume the viewer is always watching the same size TV screen at the same distance, and seeing the same field of view. If we change to a camera having a larger sensor, and keep the same f-number but increase the focal length in proportion to keep the field of view the same, the DOF will go down more or less in proportion to the format size. In short, what the picture looks like depends on the size of the sensor. There is no such thing as a definitive picture, and people who say the camera never lies are simply telling us they know nothing about cameras.
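
To see the scaling with made-up numbers: take two hypothetical formats framing the same shot at the same f-number, one with a 12 mm lens and a 0.008 mm circle of confusion, the other scaled up four times (48 mm, 0.032 mm). The standard thin-lens DOF limits, near = s(H−f)/(H+s−2f) and far = s(H−f)/(H−s), show the larger format's depth of field shrinking roughly in proportion to the scale factor.

```python
def dof_limits_m(focal_mm, f_number, coc_mm, subject_m):
    """Near and far limits of acceptable focus from the standard DOF formulas."""
    f, s = focal_mm, subject_m * 1000.0
    h = f * f / (f_number * coc_mm) + f          # hyperfocal distance in mm
    near = s * (h - f) / (h + s - 2 * f)
    far = s * (h - f) / (h - s) if s < h else float("inf")
    return near / 1000.0, far / 1000.0

for label, f, c in (("small chip", 12, 0.008), ("larger chip", 48, 0.032)):
    near, far = dof_limits_m(f, 4, c, 2.0)       # subject 2 m away at f/4
    print(f"{label}: sharp from {near:.2f} m to {far:.2f} m ({far - near:.2f} m of DOF)")
```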

So here we have another reason why ENG cameras have small chips. Typically ENG shooting requires plenty of DOF, so the reporter in front and the action behind are both in focus. Not only is a small chip cheaper, it allows the camera and lens to be smaller and lighter. Such cameras are not suitable for selective focus work and in any case it is unlikely the ENG videographer has time for such things. These shooters do not worry about boke. 

The everything-in-focus syndrome is part of the television look, just as the large sensor, selective focus approach is part of the cinematographic and professional photographic look where the boke of a lens matters.

The tiny sensors and lenses built into iPhones and tablets display the ultimate everything-in-focus look and produce images of such flatness and lifelessness that they have very nearly achieved the death of photography and have created a generation of people who hate having their photograph taken.
