The techniques of 35mm film are seductively simple. The process is the same no matter which camera is in use, or how the film will be cut. By contrast, every digital camera might have its own way of approaching different parts of the process, creating a forest of terminology – gamma, gamut, subsampling – that’s easily confused. Let’s follow a picture from the sensor to the recorded file and figure out exactly what all this means.
Most people understand that color pictures are made up of red, green and blue components. There are two main ways that’s done in modern cameras, but for now let’s assume that we have three pictures of the scene as seen through red, green and blue filters, which is how, for instance, TV studio cameras work. The word “gamut” means the same as ever: the complete range of something, the total space something can occupy. The range of colors available is controlled by the specific color filters. In terms of a camera, gamut just means what shade of red, what shade of green and what shade of blue we use.
This has far-reaching consequences. A deep red might be formed with a powerful red signal, with green and blue signals at low level. Once the red is fully on and the green and blue are off, it’s impossible to achieve a deeper red; the deepest available red is the red used in the filters in the camera (or in the monitor or projector). This affects every color the device can handle; if we have a pale, feeble red, it limits our ability to create deep, saturated yellows or magentas when we mix that red with green or blue.
Fig 1 – The wider gamut of the image on the right gives more vibrant, intense colors. Gamut isn’t just a function of the camera; it is also influenced by the file storage format, processing and display.
File formats have a gamut too, which might not match the one used in the camera. If we store uncompressed pictures as a series of numbers representing red, green and blue values, we still need to know which shades of those colors we’re using. We can’t just ignore the difference; if the sensor has very deep primaries and the monitor has less saturated ones, the picture will look dull and colorless, because full red on the display is simply less red than full red on the sensor. Pictures can be converted from one gamut to another, and that’s quite a common requirement.
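To make gamut conversion concrete, here is a minimal Python sketch. It assumes linear (not gamma-encoded) RGB and uses the published CIE 1931 xy chromaticities of the BT.709 and BT.2020 primaries with a D65 white point. Each gamut’s RGB-to-XYZ matrix is built from its primaries, and the two are chained. The function names are our own, for illustration only, and real pipelines also have to decide how to handle the out-of-range values this produces.

```python
# Linear-light gamut conversion built from primary chromaticities.
# Primaries and white point are the CIE 1931 (x, y) values from BT.709 and BT.2020.
REC709 = {"r": (0.640, 0.330), "g": (0.300, 0.600), "b": (0.150, 0.060)}
REC2020 = {"r": (0.708, 0.292), "g": (0.170, 0.797), "b": (0.131, 0.046)}
D65 = (0.3127, 0.3290)


def xy_to_xyz(x, y):
    """Chromaticity (x, y) to CIE XYZ, normalised so that Y = 1."""
    return [x / y, 1.0, (1.0 - x - y) / y]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    d = det3(m)
    return [
        [(m[1][1] * m[2][2] - m[1][2] * m[2][1]) / d,
         (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / d,
         (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / d],
        [(m[1][2] * m[2][0] - m[1][0] * m[2][2]) / d,
         (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / d,
         (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / d],
        [(m[1][0] * m[2][1] - m[1][1] * m[2][0]) / d,
         (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / d,
         (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / d],
    ]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def rgb_to_xyz_matrix(primaries, white):
    # Column j holds the XYZ of primary j, before scaling.
    cols = [xy_to_xyz(*primaries[c]) for c in "rgb"]
    p = [[cols[j][i] for j in range(3)] for i in range(3)]
    # Scale each primary so that R = G = B = 1 lands exactly on the white point.
    s = matvec(inv3(p), xy_to_xyz(*white))
    return [[p[i][j] * s[j] for j in range(3)] for i in range(3)]


def conversion_matrix(src, dst, white=D65):
    """Matrix taking linear RGB in gamut `src` to linear RGB in gamut `dst`."""
    return matmul(inv3(rgb_to_xyz_matrix(dst, white)), rgb_to_xyz_matrix(src, white))


m = conversion_matrix(REC2020, REC709)
# Full BT.2020 red expressed in BT.709 primaries: red above 1.0, green and blue negative.
print([round(c, 4) for c in matvec(m, [1.0, 0.0, 0.0])])
```

Going the other way, full BT.709 red lands comfortably inside the 0 to 1 range of BT.2020. The deepest BT.2020 red, though, simply has no equivalent on a BT.709 display, which is the situation described above: something has to be clipped or compressed.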
Common gamuts include Adobe RGB and sRGB on computers, the ITU’s Recommendation BT.709 for high definition TV, and wider gamuts such as the one given in Rec. 2020. Camera manufacturers often define their own, sometimes several, and cinema projectors use DCI-P3.
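One rough way to compare these gamuts is to plot each set of primaries on the CIE 1931 xy chromaticity diagram and measure the triangle they enclose. The Python sketch below does that with the published primaries for three of them (sRGB shares BT.709’s primaries). Treat the result as a coarse comparison only: the xy diagram is not perceptually uniform, so area ratios don’t map directly onto how different the gamuts look.

```python
# CIE 1931 (x, y) chromaticities of the red, green and blue primaries.
PRIMARIES = {
    "Rec. 709": [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)],
    "DCI-P3": [(0.680, 0.320), (0.265, 0.690), (0.150, 0.060)],
    "Rec. 2020": [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)],
}


def triangle_area(pts):
    """Area of the triangle formed by three (x, y) points (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = pts
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2


areas = {name: triangle_area(p) for name, p in PRIMARIES.items()}
for name, a in areas.items():
    print(f"{name:10s} area {a:.4f}  ({a / areas['Rec. 709']:.2f}x Rec. 709)")
```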
Gamut describes the range of available colors in terms of the primary colors used.
Straight from the sensor, the picture represents, more or less, the number of photons that were coming out of the scene. We’d usually call that linear, because the value recorded depends directly on the amount of light.
Storing and transmitting linear data works, but it requires a lot of precision. We’ve looked at how brightness encoding works before, but it can be summarized simply: human eyes are not linear at all. Take a dark scene and increase the amount of light coming out of it by a given amount, and it looks appreciably brighter. Take a scene that’s already bright and increase the light by the same amount, and the difference isn’t as obvious. That means we need more precision at the shadow end of the scale, and less at the highlight end. Gamma encoding boosts the level of the darkest areas before the picture is stored, so that more of the available numbers describe the shadows; decoding reduces those levels again before the picture is displayed.
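A quick way to see why gamma encoding helps is to count how many 8-bit code values end up describing the darkest 1% of linear light. This Python sketch compares plain linear storage with a simple power-law curve; the exponent of 2.4 is an illustrative assumption, not the exact curve of any particular standard.

```python
BITS = 8
MAX_CODE = 2 ** BITS - 1   # 255
GAMMA = 2.4                # illustrative pure power law, not a specific standard's curve


def decode_linear(code):
    """Linear storage: light is directly proportional to the code value."""
    return code / MAX_CODE


def decode_gamma(code):
    """Power-law storage: decoding raises the normalised code to GAMMA."""
    return (code / MAX_CODE) ** GAMMA


def codes_at_or_below(decode, threshold):
    """How many code values decode to light at or below `threshold`."""
    return sum(1 for code in range(MAX_CODE + 1) if decode(code) <= threshold)


print("linear:", codes_at_or_below(decode_linear, 0.01))
print("gamma: ", codes_at_or_below(decode_gamma, 0.01))
```

Linear storage spends just three codes (0, 1 and 2) on that darkest 1%, while the power law spends 38, at the cost of coarser steps in the highlights, where the eye is less sensitive to small changes.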
Fig 2 – Greater precision is needed in the shadows because the human eye’s response to light is non-linear.
Different approaches are used for different technologies. ITU Recommendation BT.1886 specifies the display gamma used for HD television, while sRGB is used for computer displays. Camera manufacturers often use a variant on a mathematical logarithm, which is where log comes in, but the idea is the same.
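For reference, here is a Python sketch of two of the curves mentioned: the piecewise sRGB encoding and decoding functions from IEC 61966-2-1, and the BT.1886 reference display EOTF, which maps a signal to screen luminance using the display’s white and black levels. The function names are our own, and the 100 cd/m² white and the black levels used below are example values, not requirements.

```python
def srgb_encode(linear):
    """sRGB (IEC 61966-2-1): linear light 0..1 -> encoded signal 0..1."""
    if linear <= 0.0031308:
        return 12.92 * linear                      # short linear segment near black
    return 1.055 * linear ** (1 / 2.4) - 0.055     # power-law segment


def srgb_decode(signal):
    """Inverse of srgb_encode: encoded signal 0..1 -> linear light 0..1."""
    if signal <= 0.04045:
        return signal / 12.92
    return ((signal + 0.055) / 1.055) ** 2.4


def bt1886_eotf(signal, lw=100.0, lb=0.0, gamma=2.4):
    """ITU-R BT.1886 reference EOTF: signal 0..1 -> screen luminance in cd/m^2.
    lw and lb are the display's white and black luminance (example defaults)."""
    root_w, root_b = lw ** (1 / gamma), lb ** (1 / gamma)
    a = (root_w - root_b) ** gamma        # gain
    b = root_b / (root_w - root_b)        # black-level lift
    return a * max(signal + b, 0.0) ** gamma


mid_gray = 0.18
print(srgb_encode(mid_gray))                 # 18% gray sits near the middle of the sRGB scale
print(bt1886_eotf(0.5), bt1886_eotf(0.0, lb=0.1))
```

With a black level of zero, BT.1886 reduces to a plain 2.4 power law scaled to the display’s white; a non-zero black level lifts the bottom of the curve so that a zero signal produces exactly that black.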
Gamma defines the relationship between the brightness of the scene and the signal level recorded.
Analog television describes each line of the picture as a varying voltage, a graph of brightness in each of the red, green and blue channels. It’s no great stretch to record that digitally, as a series of numbers. Larger numbers allow brightness to be divided into finer levels, and bit depth controls how large those numbers can get.
An 8-bit number can count up to 255, while a 10-bit number can count up to 1023. 8-bit numbers have long been considered a minimum for storing photographic images, with 10 bits preferred for professional applications. As we discussed above, images stored without gamma encoding need much finer gradations, and therefore more bits: often 12, 14 or 16. A 16-bit number can count up to 65,535. The huge difference between that and the 255 of an 8-bit number shows the sort of difference gamma encoding can make; without it, we need far more numbers.
So, bit depth controls the number of gradations used to store each red, green or blue channel of the picture, from black to maximum value.
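The link between bit depth and banding can be sketched by quantizing a smooth black-to-white ramp, one 1920-pixel line wide, at several bit depths and counting how many distinct steps survive. The ramp width and the simple rounding quantizer are assumptions for illustration.

```python
WIDTH = 1920  # one line of an HD picture, ramping from black to white


def distinct_levels(bits, width=WIDTH):
    """Quantize a 0..1 ramp to `bits` bits and count the distinct code values used."""
    top = 2 ** bits - 1  # largest number a `bits`-bit value can hold
    ramp = [x / (width - 1) for x in range(width)]
    return len({round(v * top) for v in ramp})


for bits in (4, 8, 10):
    levels = distinct_levels(bits)
    print(f"{bits}-bit: {levels} steps, each about {WIDTH / levels:.1f} pixels wide")
```

At 4 bits each step is 120 pixels wide, which is the kind of visible banding shown in Fig 4; at 10 bits the steps are narrower than two pixels.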
Fig 4 – The image on the right shows the banding seen when bit depth is reduced; the image on the left has a higher bit depth.
Now we have a digital image with a known RGB color gamut, gamma encoding, and bit depth. A BMP still image file works like this: it’s more or less the first approach ever taken to digital imaging, and it’s quite possible to store a sequence of files like that and call it a video sequence. That’s uncompressed video, and it is huge; nearly 180 megabytes per second (nearly 1500 megabits per second) for 10-bit 1080p images at 24 frames per second.
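The arithmetic behind that figure is easy to check. This Python sketch assumes 1920 by 1080 pixels, three 10-bit samples per pixel (uncompressed RGB) and 24 frames per second, and ignores blanking, padding and file-format overhead.

```python
def uncompressed_bits_per_second(width, height, samples_per_pixel, bits_per_sample, fps):
    """Raw picture data rate; ignores blanking, padding and container overhead."""
    return width * height * samples_per_pixel * bits_per_sample * fps


bps = uncompressed_bits_per_second(1920, 1080, 3, 10, 24)   # 10-bit RGB 1080p24
bytes_per_second = bps / 8
print(f"{bps / 1e6:.0f} megabits per second")
print(f"{bytes_per_second / 1e6:.1f} million bytes per second")
print(f"{bytes_per_second / 2 ** 20:.1f} binary megabytes (MiB) per second")
```

That works out to about 1493 megabits, or 186.6 million bytes, per second. Counted in binary megabytes of 2^20 bytes it is about 178, which matches the “nearly 180 megabytes” figure.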
The first way that most systems reduce the amount of data to be stored is color subsampling. Subsampling is really a type of image compression, but it does so little damage to the image, and has so little overhead in processing power or problems with re-compression, that it’s usually considered separately from more complex, mathematical compression techniques.
Subsampling takes advantage of the fact that the human eye sees brightness with much more detail than it sees color. There are ninety-plus million light-sensitive rod cells in a human retina; there are only around four and a half million color-sensitive cone cells. In an electronic image, though, the brightness and color information are mixed together in the form of the red, green and blue channels.
The solution is to use three different components, called, for complicated reasons, Y’, CR and CB (often called YUV, though that’s not really correct). The Y’ channel approximates brightness (properly, luma), while the CR and CB channels modify that effectively black-and-white image to add color. The R stands for red and the B for blue; a high value in the CR channel pulls the color toward red, while a low value pushes it toward cyan. A high CB value pulls the color toward blue, while a low value pushes it toward yellow. Between these three components, we can create a full-color image.
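As a concrete example, here is a minimal Python sketch of the R’G’B’ to Y’CBCR conversion using the luma coefficients from ITU-R BT.709 (0.2126, 0.7152, 0.0722). It works on gamma-encoded values from 0 to 1, with CB and CR centred on zero. Integer scaling and offsets for storage are left out, and the function name is our own.

```python
# Luma coefficients from ITU-R BT.709; other standards (e.g. BT.601, BT.2020) use different ones.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB  # 0.7152


def rgb_to_ycbcr(r, g, b):
    """Gamma-encoded R'G'B' in 0..1 -> (Y' in 0..1, Cb and Cr in -0.5..0.5)."""
    y = KR * r + KG * g + KB * b          # luma: weighted sum, green counts most
    cb = (b - y) / (2 * (1 - KB))         # blue-difference, scaled to +/-0.5
    cr = (r - y) / (2 * (1 - KR))         # red-difference, scaled to +/-0.5
    return y, cb, cr


for name, rgb in {"white": (1, 1, 1), "red": (1, 0, 0), "cyan": (0, 1, 1),
                  "blue": (0, 0, 1), "yellow": (1, 1, 0)}.items():
    y, cb, cr = rgb_to_ycbcr(*rgb)
    print(f"{name:6s} Y'={y:.4f} Cb={cb:+.4f} Cr={cr:+.4f}")
```

Red gives CR = +0.5 and cyan CR = -0.5, while blue and yellow give CB = +0.5 and -0.5, matching the directions described above; white has Y’ = 1 and no color difference at all.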
Fig 5 – The Y, Cr, and Cb signals in the lower three images demonstrate the low pass filtering of the chroma part of the signal to take advantage of the eye’s lower frequency response to color.
Crucially, we can now store the CR and CB channels at a lower resolution than the Y’ channel. That’s the source of the familiar ratios, such as 4:2:2, meaning that for every four Y’ samples along a line, the image stores two samples each of CR and CB. In other words, the color channels are stored at half the horizontal resolution of the Y’ channel. In 4:1:1 the color channels are stored at one quarter of the horizontal resolution. There are only two oddities: 4:2:0 describes a situation where the color channels are stored at half horizontal and half vertical resolution, and 3:1:1, as used on Sony’s HDCAM tape format, indicates that even the Y’ channel is stored at three-quarters of normal horizontal resolution.
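The bookkeeping behind those ratios can be sketched in Python. This version reduces chroma resolution with a plain box average over each block of pixels, which is an assumption for illustration; real equipment uses better filters and standard-specific sample positions. It also counts the samples per frame for each scheme.

```python
def subsample(plane, fx, fy):
    """Box-average each fx-by-fy block of a chroma plane into a single sample.
    Assumes the plane's width and height are multiples of fx and fy."""
    h, w = len(plane), len(plane[0])
    return [
        [sum(plane[y + dy][x + dx] for dy in range(fy) for dx in range(fx)) / (fx * fy)
         for x in range(0, w, fx)]
        for y in range(0, h, fy)
    ]


# Horizontal and vertical chroma reduction factors for each scheme.
SCHEMES = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2), "4:1:1": (4, 1)}


def samples_per_frame(width, height, scheme):
    """One full-resolution Y' plane plus two reduced chroma planes (CB and CR)."""
    fx, fy = SCHEMES[scheme]
    return width * height + 2 * (width // fx) * (height // fy)


cr = [[0, 2, 4, 6],
      [2, 4, 6, 8]]
print(subsample(cr, 2, 1))  # 4:2:2-style: halve horizontally
print(subsample(cr, 2, 2))  # 4:2:0-style: halve both ways
for scheme in SCHEMES:
    print(scheme, samples_per_frame(1920, 1080, scheme))
```

For a 1920 by 1080 frame this gives 6,220,800 samples for 4:4:4, 4,147,200 for 4:2:2 and 3,110,400 for both 4:2:0 and 4:1:1, so those last two save the same amount of data, one by halving chroma in both directions and the other by quartering it horizontally.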
The conversion from RGB to Y’CRCB is not perfectly reversible in practice: rounding to a limited number of code values loses a little precision, and many combinations of Y’, CR and CB describe no valid RGB color at all. The concept of a gamut still applies, though, because Y’CRCB is derived from a particular set of RGB primaries.
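The limits of the RGB to Y’CBCR conversion can be shown with a short Python sketch, again assuming BT.709 luma coefficients and, for the 8-bit case, the common video-range scaling (Y’ from 16 to 235, CB and CR centred on 128 over a range of 224). It shows three things: in floating point a valid color survives the round trip exactly enough; an arbitrary Y’CBCR triple can decode to a negative RGB component; and 8-bit storage cannot keep all 256 gray levels apart, because video-range Y’ has only 220 codes. The function names are our own.

```python
KR, KB = 0.2126, 0.0722   # BT.709 luma coefficients
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r, g, b):
    y = KR * r + KG * g + KB * b
    return y, (b - y) / (2 * (1 - KB)), (r - y) / (2 * (1 - KR))


def ycbcr_to_rgb(y, cb, cr):
    """Exact algebraic inverse of rgb_to_ycbcr."""
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def is_valid_rgb(rgb):
    return all(0.0 <= c <= 1.0 for c in rgb)


def gray_roundtrip_8bit(v):
    """8-bit full-range gray v -> 8-bit video-range Y'CbCr codes -> 8-bit RGB again."""
    y, cb, cr = rgb_to_ycbcr(v / 255, v / 255, v / 255)
    y_code = round(16 + 219 * y)       # video-range luma codes 16..235
    cb_code = round(128 + 224 * cb)    # chroma centred on 128
    cr_code = round(128 + 224 * cr)
    rgb = ycbcr_to_rgb((y_code - 16) / 219, (cb_code - 128) / 224, (cr_code - 128) / 224)
    return tuple(round(c * 255) for c in rgb)


print(ycbcr_to_rgb(*rgb_to_ycbcr(0.2, 0.5, 0.8)))   # back to (0.2, 0.5, 0.8), within float error
print(ycbcr_to_rgb(0.0, 0.0, 0.5))                  # zero luma, strong red-difference: green < 0
mismatches = [v for v in range(256) if gray_roundtrip_8bit(v) != (v, v, v)]
print(len(mismatches), "of 256 grays change after an 8-bit round trip")
```

With 256 gray inputs but only 220 possible Y’ codes, at least 36 of the grays must come back changed, however carefully the rounding is done.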
So, color subsampling is a way to isolate brightness and reduce the resolution of color information.
After subsampling, the image is in the format that comes out of the camera’s SDI output. Storing that image usually requires further compression, and there are as many ways to do that as there are cameras in the world. More detail is a subject for another day, but suffice it to say for now that a codec, a matched encoder and decoder for images, is a way of throwing away picture information the viewer won’t notice. That can be done well or badly, harshly or gently, and it is difficult to evaluate. In the end, though, the codec used by a camera (and by the edit equipment that uses its files) is something separate from color subsampling, gamma, gamut or bit depth.
There are many different color gamuts, gamma encodings, bit depths and color subsampling options that can be combined in modern equipment. One of the most reassuring things about it, though, is that almost any imaginable new imaging technology – high dynamic range, high color depth, or anything else – involves a variation on one or more of them. Knowing what each of them does makes new developments much easier to follow, and with several manufacturers already fighting over how home HDR is going to work, that’s a very useful thing.