HDR - Part 2 - Brightness Encoding

Dealing with brightness in camera systems sounds simple. Increase the light going into the lens, and the signal level coming out of the camera increases, and in turn so does the amount of light coming out of the display. In reality, it’s always been more complicated than that. Camera, display and postproduction technologies have been chasing each other for most of the last century, especially since a period in the late 1990s or early 2000s, when electronic cameras started to become good enough for serious single-camera drama work.

When the cameras became more capable, better recording systems were needed, and more recently, with high dynamic range pictures, we’ve started to ensure that more and more of that picture information makes its way from the scene to the viewer’s eye. It’s never been the intent of television to make the display literally match the real-world scene; to do that, we’d need monitors capable of emitting as much light as the sun in case someone shoots a sunset. As such, there’s always been a creative need to modify the way the image represents brightness, simply to produce something that looks reasonable to most people while not requiring those sun-bright displays.

So brightness in television has always been an artifice. Even more than that, the technology of early TV, the radio transmitters and cathode ray tubes, just didn’t work very well when the signal levels were directly proportional to the light hitting the lens. Human eyes tend to compress the range of brightness that exists in the real world, so that bright highlights don’t look quite as bright as we’d expect given the actual number of photons involved. Open up the lens on a camera by, say, three stops, and it looks like the light has increased in three equal steps. But the actual number of photons hitting the sensor has increased eightfold, because each stop doubles the light. What looks fairly bright to us is, in reality, incredibly bright.
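
The arithmetic behind stops can be sketched in a few lines of Python. The function name here is purely illustrative and not part of any camera API.

```python
def light_ratio(stops):
    """Return the factor by which light increases when opening up by `stops` stops.

    Each stop doubles the light, so the ratio is 2 raised to the number of stops.
    """
    return 2.0 ** stops


# What looks like equal visual steps is a doubling of light each time.
for stops in (1, 2, 3):
    print(f"{stops} stop(s): {light_ratio(stops):.0f}x the light")
```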

That means most of the average scene is comparatively dark, producing a low signal level from a camera. Early television pioneers needed to send that information over a radio link, with the result that the low-level signal, and thus most of the picture, would be ruined by noise. The solution was gamma processing – essentially, boosting the brightness of darker parts of the image at the camera, to keep them away from the noise. At the receiving TV the process could be reversed, to achieve a normal picture with low noise. Audio signals on magnetic tape have often used a similar process and called it companding, for compressing then expanding the volume range of the sound.

In one of the great coincidences of engineering, the behavior of cathode ray tube TV receivers was more or less ideal by default. CRTs are, proportionally, less responsive to low level signals than they are to high level signals. That would darken already-darker areas of the image, and more or less automatically reverse the brightening of dark image areas which took place at the transmitter. That’s how early television worked, and the signals produced by cameras have used gamma processing of more or less that type all the way to the present day. That approach certainly lasted long beyond the popular use of CRT technology and wasn’t even formalized in a written standard until the International Telecommunication Union’s Recommendation BT.1886 in 2011.
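
As a rough illustration of that round trip, the sketch below uses a pure power law with an exponent of 2.4, the figure BT.1886 gives for its reference display. Treating the camera side as the exact inverse is a simplifying assumption made here for clarity; real camera curves are not a perfect mirror of the display, and the function names are illustrative only.

```python
GAMMA = 2.4  # display exponent from BT.1886; used for both directions as a simplification


def camera_encode(linear_light):
    """Brighten dark tones: linear scene light (0..1) -> signal level (0..1)."""
    return linear_light ** (1.0 / GAMMA)


def display_decode(signal):
    """Mimic the CRT-like response: signal level (0..1) -> displayed light (0..1)."""
    return signal ** GAMMA


# Dark tones are lifted well clear of the noise floor in transit,
# then pushed back down again by the display.
for light in (0.01, 0.05, 0.18, 0.5, 1.0):
    signal = camera_encode(light)
    restored = display_decode(signal)
    print(f"light {light:.2f} -> signal {signal:.3f} -> displayed {restored:.3f}")
```

The point to notice is how far the darkest values are raised in the transmitted signal: a scene value of one percent travels at well over ten percent of full signal level.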

Long before then, cameras had developed to the point where they could capture a huge range of contrast. That might include a sunset and the shadow side of a person watching it, or a person being illuminated by a candle flame and the dark room beyond. Whatever the actual light levels involved, the difference between them can be huge with modern cameras, but the standards specify much more limited capabilities for displays. Take a camera’s image, with hugely bright highlights, and display that image on a much dimmer display and the resulting image looks flat and dull. To solve that problem, image processing circuitry flattens out the highlights, boosting contrast in the middle greys and creating a watchable image.
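
One simple way to picture highlight flattening is a "knee": below a chosen level the signal passes through untouched, and above it the slope is reduced so that much brighter highlights still fit under the maximum. The sketch below is a hypothetical hard-cornered knee with invented parameter values, not any manufacturer's processing, which is usually a smoother curve.

```python
def apply_knee(level, knee_point=0.8, max_input=4.0):
    """Compress highlights above knee_point so that max_input maps to 1.0.

    `level` is scene brightness relative to nominal white (1.0 = white).
    Below the knee the value passes through unchanged; above it the slope
    is reduced so highlights up to max_input still fit under 1.0.
    Anything brighter than max_input clips at 1.0.
    """
    if level <= knee_point:
        return level
    slope = (1.0 - knee_point) / (max_input - knee_point)
    return min(1.0, knee_point + (level - knee_point) * slope)


for level in (0.18, 0.8, 1.0, 2.0, 4.0, 8.0):
    print(f"scene {level:.2f} -> signal {apply_knee(level):.3f}")
```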

Diagram 1 – the red line shows the gamma correction added to the signal to correct for the non-linear response of the monitor (green line). This also helps boost the dark regions during transmission to reduce the apparent visible noise.

The result is a camera that performs according to ITU Recommendations 709 and 1886. More accurately, it’s a camera designed to drive monitors conforming to those standards, because the paperwork describes displays not cameras. Either way, the result is a signal from camera to monitor that includes gamma processing designed to create a viewable image. It’s a matter of opinion as to what a viewable image looks like, and certainly a literal interpretation of the standards tends to create a rather too contrasty, harsh and over-saturated image that wouldn’t be very popular in 2019. Because of this, there’s a lot of engineering opinion involved, and most manufacturers simply aim to create a pleasing image on standard displays without being slavishly dedicated to implementing the standards literally. Most often, highlights are treated much more gently than the standard suggests, avoiding the clipped harshness that could afflict bright areas in many legacy video formats.

This all worked fine until the turn of the millennium, when it became reasonable to consider the best electronic cameras for work that would once have used film. Even the most carefully-designed image processing in a camera targeting Rec. 709 inevitably involves heavily altering the image, particularly turning many of the brightest parts of the frame into fields of plain white. Trying to fix this with color grading is difficult or impossible, simply because some of the highlight detail that the camera might have seen has already been thrown away by the image processing circuitry.

By the mid-2000s, cameras were becoming available which recorded logarithmically encoded images. View a log image on a monitor, as many people will have done, and it simply looks low contrast. It’s instinctively true that a lower-contrast image offers more grading options than a higher-contrast one, and it’s reasonable to think of a log image as simply very low in contrast, albeit according to some fairly specific mathematics. Log shooting stores the complete range of contrast seen by the camera. Yes, that’s what we were trying to avoid doing by processing the picture to look right on a conventional monitor, and as a result, log pictures don’t look right on a conventional monitor.
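
The "fairly specific mathematics" can be approximated with a pure logarithm in which every stop of scene brightness receives an equal share of the signal range. The sketch below is a generic illustration under assumed values (mid grey at 0.18, seven stops either side); it is not any particular camera's log curve, and real curves typically differ near black.

```python
import math

MID_GREY = 0.18     # assumed linear value of mid grey
STOPS_BELOW = 7.0   # assumed range captured below mid grey
STOPS_ABOVE = 7.0   # assumed range captured above mid grey


def log_encode(linear):
    """Map linear light to a 0..1 signal where every stop gets an equal share."""
    if linear <= 0.0:
        return 0.0
    stops_from_grey = math.log2(linear / MID_GREY)
    signal = (stops_from_grey + STOPS_BELOW) / (STOPS_BELOW + STOPS_ABOVE)
    return min(1.0, max(0.0, signal))


# Each doubling of light moves the signal by the same amount.
for linear in (0.045, 0.09, 0.18, 0.36, 0.72, 1.44):
    print(f"linear {linear:.3f} -> log signal {log_encode(linear):.3f}")
```

Because mid grey lands in the middle of the signal range and highlights are spread evenly above it, the picture looks flat on a conventional monitor, as described above.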

To solve that, many people will be familiar with the idea of uploading a lookup table into a monitor, or selecting one for a special monitoring output on a camera, to process a log image for viewability. The key thing to realize is that it represents very much the same process that a camera might normally have done internally before recording the image to tape. The difference is that with a log workflow that processed image is used for viewing only; the recorded image still contains all of the brightness information and a colorist now has plenty to work with. 
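
A one-dimensional lookup table for viewing can be sketched as follows. The log curve, the hard clip at white and the 2.4 display exponent are all assumptions chosen for illustration; production LUTs are often three-dimensional and treat highlights far more gently.

```python
MID_GREY = 0.18     # assumed linear value of mid grey
STOPS_BELOW = 7.0   # assumed log range below mid grey
STOPS_ABOVE = 7.0   # assumed log range above mid grey
LUT_SIZE = 33       # number of table entries (an arbitrary choice)


def log_decode(signal):
    """Undo the assumed log curve: 0..1 log signal -> linear light."""
    stops_from_grey = signal * (STOPS_BELOW + STOPS_ABOVE) - STOPS_BELOW
    return MID_GREY * 2.0 ** stops_from_grey


def viewing_transform(signal):
    """Log signal -> linear light clipped at white -> gamma-encoded monitor signal."""
    linear = min(1.0, log_decode(signal))
    return linear ** (1.0 / 2.4)


def build_lut(size=LUT_SIZE):
    """Sample the viewing transform at evenly spaced input levels."""
    return [viewing_transform(i / (size - 1)) for i in range(size)]


def apply_lut(lut, signal):
    """Look up a signal value, interpolating linearly between table entries."""
    position = min(1.0, max(0.0, signal)) * (len(lut) - 1)
    lower = int(position)
    upper = min(lower + 1, len(lut) - 1)
    fraction = position - lower
    return lut[lower] + (lut[upper] - lut[lower]) * fraction


lut = build_lut()
for signal in (0.0, 0.25, 0.5, 0.6, 0.75, 1.0):
    print(f"log signal {signal:.2f} -> monitor signal {apply_lut(lut, signal):.3f}")
```

The recorded log values are never altered; the table is applied only on the way to the monitor.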

Diagram 2 – non-linear response of a camera, which compresses as much brightness information as possible into the signal. This also assists with grading.

In some sense, then, nirvana had been achieved, but then it became clear that we could do better yet.

As we’ve said, it was never the intent of TVs to match reality, and reality certainly contains much brighter highlights than most displays can achieve. The actual brightness of a display is measured in candela per square meter, which for some reason is abbreviated nits, and Rec. 709 and 1886 say that a conventional video display should achieve a bit more than a hundred nits. In practice, very few displays, especially consumer displays, are anything like that dim: the specification assumes a dark viewing environment, whereas most real-world lounges are at least somewhat lit.

Common TVs can exceed 300 nits, while computer displays are often brighter still, sometimes reaching 400, on the assumption that offices may be brightly lit. On that basis, better displays were clearly possible, and might close the gap between cameras which can handle huge ranges of contrast, and displays that can’t. Improve the displays and it becomes less necessary to process bright highlights out of the image, and what’s more, people liked it. Early tests were done using a huge digital cinema projector back-projected onto a small, TV-sized screen, a configuration capable of thousands of nits, and test audiences instinctively preferred somewhat brighter images. This was the birth of high dynamic range (HDR), where the most common standards specify displays at up to a thousand nits, ten times the brightness of a conventional display, while maintaining low black levels.

Carrying pictures of such a high contrast range to consumers might need some new engineering. Schemes somewhat like log encoding in cameras can be used.

This would be an easy solution, except that log camera images are invariably stored in files capable of at least ten-bit resolution – that is, 1024 levels of brightness per RGB (or other) component. Distribution systems are generally eight bit, and the resulting 256 levels of brightness encoding might produce visible banding when stretched out to cover such a wide range of brightness.
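
To put rough numbers on the banding risk, the sketch below counts how many code values are available per stop if a signal's levels were spread evenly across a given number of stops. The even spread and the 14-stop range are assumptions for illustration only; real encodings distribute levels less uniformly.

```python
def levels(bits):
    """Number of distinct code values available at a given bit depth."""
    return 2 ** bits


def levels_per_stop(bits, stops):
    """Average code values per stop, assuming levels are spread evenly across stops."""
    return levels(bits) / stops


ASSUMED_STOPS = 14  # hypothetical scene range carried by the signal

for bits in (8, 10):
    print(
        f"{bits}-bit: {levels(bits)} levels, "
        f"about {levels_per_stop(bits, ASSUMED_STOPS):.1f} per stop over {ASSUMED_STOPS} stops"
    )
```

Under these assumptions an eight-bit signal has a quarter of the steps per stop that a ten-bit signal has, which is why smooth gradients such as skies are where banding tends to show first.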

So, HDR might end up requiring more bandwidth (to carry more data) or more expensive equipment (to handle better video compression), neither of which the industry wants. There are several different potential solutions and, as of mid-2019, no obvious leading contender.

It’s a format war, but regardless of the winner, all these things – standard dynamic range, log encoding in cameras, and now high dynamic range – concern broadly the same thing. Between scene and screen may be several layers of processing designed to reduce noise, allow for monitoring and grading, or to prepare the image for any of several kinds of display. It can get complicated, but in the end they are, at least in part, simply new ways of expressing the relationship between light going into a lens and light coming out of a display.
