HDR: Part 2 - Brightness Encoding

Dealing with brightness in camera systems sounds simple. Increase the light going into the lens; increase the signal level coming out of the camera, and in turn increase the amount of light coming out of the display. In reality, it’s always been more complicated than that. Camera, display and postproduction technologies have been chasing each other for most of the last century, especially since a period in the late 1990s or early 2000s, when electronic cameras started to become good enough for serious single-camera drama work.

This article was first published in 2019. It and the entire 'HDR Series' have been immensely popular, so we are re-publishing it for those who missed it first time around.

When the cameras became more capable, better recording systems were needed. More recently, with high dynamic range pictures, we’ve started to ensure that more and more of that picture information makes its way from the scene to the viewer’s eye. It’s never been the intent of television to make the display literally match the real-world scene; to do that, we’d need monitors capable of emitting as much light as the sun in case someone shoots a sunset. As such there’s always been a creative need to modify the way the image represents brightness, simply to produce something that looks reasonable to most people while not requiring those sun-bright displays.

So brightness in television has always been an artifice. Even more than that, the technology of early TV, the radio transmitters and cathode ray tubes, just didn’t work very well when the signal levels were directly proportional to the light hitting the lens. Human eyes tend to compress the range of brightness that exists in the real world, so that bright highlights don’t look quite as bright as we’d expect given the actual number of photons involved. Open up the lens on a camera by, say, three stops, and it looks like the light has increased in three equal steps. But each stop doubles the light, so the actual number of photons hitting the sensor has increased eight times. What looks fairly bright to us is, in reality, incredibly bright.
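The arithmetic behind stops is simple enough to sketch in a few lines of Python. The function name here is ours, chosen purely for illustration:

```python
def stops_to_light_factor(stops: float) -> float:
    """Return the linear multiplier in light for a change of `stops` stops."""
    # Each stop is a doubling of the light (or a halving, if negative).
    return 2.0 ** stops


if __name__ == "__main__":
    for stops in (1, 2, 3):
        factor = stops_to_light_factor(stops)
        print(f"{stops} stop(s) open: {factor:.0f}x the light")
```

Equal-looking steps to the eye correspond to repeated doublings of the light, which is why a perceptually modest change can represent a very large physical one.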

That means most of the average scene is comparatively dark, producing a low signal level from a camera. Early television pioneers needed to send that information over a radio link, with the result that the low-level signal, and thus most of the picture, would be ruined by noise. The solution was gamma processing – essentially, boosting the brightness of darker parts of the image at the camera, to keep them away from the noise. At the receiving TV the process could be reversed, to achieve a normal picture with low noise. Audio signals on magnetic tape have often used a similar process and called it companding, for compressing then expanding the volume range of the sound.

In one of the great coincidences of engineering, the behavior of cathode ray tube TV receivers was more or less ideal by default. CRTs are, proportionally, less responsive to low level signals than they are to high level signals. That would darken already-darker areas of the image, and more or less automatically reverse the brightening of dark image areas which took place at the transmitter. That’s how early television worked, and the signals produced by cameras have used gamma processing of more or less that type all the way to the present day. That approach certainly lasted long beyond the popular use of CRT technology and wasn’t even formalized in a written standard until the International Telecommunication Union’s Recommendation BT.1886 in 2011.
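To see why boosting the shadows before transmission helps, here is a minimal Python sketch. It assumes a pure power law with an exponent of 2.4 (the figure BT.1886 uses for its reference display) and models link noise as a fixed offset; real camera curves also include a linear segment near black, which is left out here. All function names are illustrative.

```python
GAMMA = 2.4  # assumed display exponent, as in the BT.1886 reference display


def camera_encode(linear: float) -> float:
    """Brighten dark values before transmission (simplified pure power law)."""
    return linear ** (1.0 / GAMMA)


def display_decode(signal: float) -> float:
    """Model the display response, which darkens low-level signals again."""
    return signal ** GAMMA


def received_light(scene: float, noise: float, use_gamma: bool) -> float:
    """Send one scene value over a link that adds a fixed noise offset."""
    if use_gamma:
        return display_decode(camera_encode(scene) + noise)
    return scene + noise


if __name__ == "__main__":
    scene, noise = 0.01, 0.01  # a dark pixel at 1% of peak, and a 1% noise offset
    for use_gamma in (False, True):
        shown = received_light(scene, noise, use_gamma)
        error = abs(shown - scene) / scene
        print(f"gamma={use_gamma}: displayed {shown:.4f}, relative error {error:.0%}")
```

With the same noise added to both links, the gamma-processed path should show a far smaller error in the dark pixel, because the noise was added to a boosted signal and then pushed back down by the display.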

Long before then, cameras had developed to the point where they could capture a huge range of contrast. That might include a sunset and the shadow side of a person watching it, or a person being illuminated by a candle flame and the dark room beyond. Whatever the actual light levels involved, the difference between them can be huge with modern cameras, but the standards specify much more limited capabilities for displays. Take a camera’s image, with hugely bright highlights, and display that image on a much dimmer display and the resulting image looks flat and dull. To solve that problem, image processing circuitry flattens out the highlights, boosting contrast in the middle greys and creating a watchable image.

Diagram 1 – the red line shows the gamma correction added to the signal to correct for the non-linear response of the monitor (green line). This also helps boost the dark regions during transmission to reduce the apparent visible noise.

The result is a camera that performs according to ITU Recommendations 709 and 1886. More accurately, it’s a camera designed to drive monitors conforming to those standards, because the paperwork describes displays not cameras. Either way, the result is a signal from camera to monitor that includes gamma processing designed to create a viewable image. It’s a matter of opinion as to what a viewable image looks like, and certainly a literal interpretation of the standards tends to create a rather too contrasty, harsh and over-saturated image that wouldn’t be very popular in 2019. Because of this, there’s a lot of engineering opinion involved, and most manufacturers simply aim to create a pleasing image on standard displays without being slavishly dedicated to implementing the standards literally. Most often, highlights are treated much more gently than the standard suggests, avoiding the clipped harshness that could afflict bright areas in many legacy video formats.

This all worked fine until the turn of the millennium, when it became reasonable to consider the best electronic cameras for work that would once have used film. Even the most carefully-designed image processing in a camera targeting Rec. 709 inevitably involves heavily altering the image, particularly turning many of the brightest parts of the frame into fields of plain white. Trying to fix this with color grading is difficult or impossible, simply because some of the highlight detail that the camera might have seen has already been thrown away by the image processing circuitry.
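The reason grading can’t bring those highlights back is easy to show with a toy example. This Python sketch assumes the crudest possible processing, a hard clip at peak white, and a hypothetical grade that simply reduces gain; real cameras are gentler, but the principle is the same.

```python
def hard_clip(scene: float) -> float:
    """Record anything brighter than peak white (1.0) as plain white."""
    return min(scene, 1.0)


def grade_down(recorded: float, gain: float = 0.25) -> float:
    """A hypothetical grade that tries to pull highlights back by reducing gain."""
    return recorded * gain


if __name__ == "__main__":
    # Two highlights of different brightness, both above peak white.
    highlights = {"bright cloud": 2.0, "sun glint": 6.0}
    for name, scene in highlights.items():
        recorded = hard_clip(scene)
        print(f"{name}: scene {scene}, recorded {recorded}, graded {grade_down(recorded)}")
```

Both highlights are recorded as the same value, so no later adjustment can tell them apart again.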

By the mid-2000s, cameras were becoming available which recorded logarithmically encoded images. View a log image on a monitor, as many people will have done, and it simply looks low contrast. It’s instinctively true that a lower-contrast image offers more grading options than a higher-contrast one, and it’s reasonable to think of a log image as simply very low in contrast, albeit according to some fairly specific mathematics. Log shooting stores the complete range of contrast seen by the camera. Yes, that’s what we were trying to avoid doing by processing the picture to look right on a conventional monitor, and as a result, log pictures don’t look right on a conventional monitor.
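As a rough illustration of that mathematics, the following Python sketch uses a made-up log curve that gives every stop an equal share of the signal. It is not any manufacturer’s actual curve; the 14-stop range and its split around 18% grey are assumptions chosen for the example.

```python
import math

MID_GREY = 0.18   # scene-linear value of an 18% grey card
STOPS_BELOW = 8   # assumed range below mid grey
STOPS_ABOVE = 6   # assumed range above mid grey
TOTAL_STOPS = STOPS_BELOW + STOPS_ABOVE
DARKEST = MID_GREY * 2.0 ** -STOPS_BELOW
BRIGHTEST = MID_GREY * 2.0 ** STOPS_ABOVE


def log_encode(linear: float) -> float:
    """Map scene-linear light to a 0..1 code value, one equal slice per stop."""
    linear = min(max(linear, DARKEST), BRIGHTEST)
    return math.log2(linear / DARKEST) / TOTAL_STOPS


def log_decode(code: float) -> float:
    """Recover scene-linear light from a 0..1 code value."""
    return DARKEST * 2.0 ** (code * TOTAL_STOPS)


if __name__ == "__main__":
    for stops in (-4, 0, 4):
        scene = MID_GREY * 2.0 ** stops
        print(f"{stops:+d} stops from mid grey: code value {log_encode(scene):.3f}")
```

Because a highlight several stops over mid grey lands only a little higher in the signal than mid grey itself, the encoded picture looks flat when viewed directly, yet the full range can be recovered with `log_decode`.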

To solve that, many people will be familiar with the idea of uploading a lookup table into a monitor, or selecting one for a special monitoring output on a camera, to process a log image for viewability. The key thing to realize is that it represents very much the same process that a camera might normally have done internally before recording the image to tape. The difference is that with a log workflow that processed image is used for viewing only; the recorded image still contains all of the brightness information and a colorist now has plenty to work with.
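A viewing lookup table can be sketched in Python as follows. This is a one-dimensional simplification (production LUTs are often three-dimensional and handle color as well), and it assumes a hypothetical log curve spanning 14 stops, a hard highlight clip and a 2.4 display gamma; none of these values come from a real camera or monitor.

```python
TOTAL_STOPS = 14            # hypothetical log curve: 8 stops below mid grey, 6 above
DARKEST = 0.18 * 2.0 ** -8  # scene-linear value stored as code value 0
GAMMA = 2.4                 # assumed display gamma
LUT_SIZE = 33               # number of table entries, an arbitrary choice


def log_to_display(code: float) -> float:
    """Turn one log code value into a display-ready signal."""
    linear = DARKEST * 2.0 ** (code * TOTAL_STOPS)  # undo the log curve
    linear = min(linear, 1.0)                       # crude highlight clip
    return linear ** (1.0 / GAMMA)                  # gamma-encode for the display


# Precompute the table once; the monitor then only needs to do lookups.
VIEWING_LUT = [log_to_display(i / (LUT_SIZE - 1)) for i in range(LUT_SIZE)]


def apply_lut(code: float, lut: list[float] = VIEWING_LUT) -> float:
    """Look up a code value, interpolating linearly between table entries."""
    position = min(max(code, 0.0), 1.0) * (len(lut) - 1)
    lower = int(position)
    upper = min(lower + 1, len(lut) - 1)
    fraction = position - lower
    return lut[lower] * (1.0 - fraction) + lut[upper] * fraction


if __name__ == "__main__":
    recorded = [0.2, 0.5, 0.8]                    # log code values as recorded
    viewed = [apply_lut(c) for c in recorded]     # what the monitor shows
    print("recorded (untouched):", recorded)
    print("viewed through LUT:  ", [round(v, 3) for v in viewed])
```

The recorded values are never modified; the table only affects the copy sent to the screen, which is the whole point of the log workflow.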

Diagram 2 – non-linear response of a camera to compress as much brightness information as possible into the signal; this also assists with grading.

In some sense, then, nirvana had been achieved, but then it became clear that we could do better yet.

As we’ve said, it was never the intent of TVs to match reality, and reality certainly contains much brighter highlights than most displays can achieve. The actual brightness of a display is measured in candelas per square meter, which for some reason are nicknamed nits, and Rec. 709 and 1886 say that a conventional video display should achieve a bit more than a hundred nits. In practice, very few displays, especially consumer displays, are anything like that dim: the specification assumes a dark viewing environment, whereas most real-world lounges are at least somewhat lit.

Common TVs can exceed 300 nits, while computer displays are often brighter still, at up to 400 nits, on the assumption that offices may be brightly lit. On that basis, better displays were clearly possible, and might close the gap between cameras which can handle huge ranges of contrast, and displays that can’t. Improve the displays and it becomes less necessary to process bright highlights out of the image, and what’s more, people liked it. Early tests were done using a huge digital cinema projector back-projected onto a small, TV-sized screen, a configuration capable of thousands of nits, and test audiences instinctively preferred somewhat brighter images. This was the birth of high dynamic range (HDR), where the most common standards specify displays at up to a thousand nits, ten times the brightness of a conventional display, while maintaining low black levels.

Carrying pictures of such a high contrast range to consumers might need some new engineering. Schemes somewhat like log encoding in cameras can be used.

This would be an easy solution were it not for the fact that log camera images are invariably stored in files capable of ten-bit resolution – that is, 1024 levels of brightness per RGB (or other) component. Distribution systems are generally eight bit, and the resulting 256 levels of brightness encoding might produce visible banding when stretched out to cover such a wide range of brightness.
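The numbers can be checked with a short Python sketch. It assumes, purely for illustration, that the code values are spread evenly across a 14-stop range and that every code value is usable; real systems reserve some values and distribute the rest less evenly.

```python
def code_levels(bits: int) -> int:
    """Number of distinct brightness levels available at a given bit depth."""
    return 2 ** bits


def levels_per_stop(bits: int, stops: float) -> float:
    """Average code values per stop, assuming an even spread across all stops."""
    return code_levels(bits) / stops


if __name__ == "__main__":
    assumed_stops = 14  # hypothetical scene range, not taken from any standard
    for bits in (8, 10):
        print(
            f"{bits}-bit: {code_levels(bits)} levels, "
            f"about {levels_per_stop(bits, assumed_stops):.0f} per stop"
        )
```

With a quarter as many steps available per stop, smooth gradients such as skies are more likely to show visible bands at eight bits than at ten.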

So, HDR might end up requiring more bandwidth (to carry more data) or more expensive equipment (to handle better video compression), neither of which the industry wants. There are several different potential solutions and, as of mid-2019, no obvious leading contender.

It’s a format war, but regardless of the winner, all these things – standard dynamic range, log encoding in cameras, and now high dynamic range – concern broadly the same thing. Between scene and screen may be several layers of processing designed to reduce noise, allow for monitoring and grading, or to prepare the image for any of several kinds of display. It can get complicated, but in the end they are, at least in part, simply new ways of expressing the relationship between light going into a lens and light coming out of a display.
