Quality Control is one of the many areas where IT and broadcast use similar terms, but the meaning is quite different. Whereas IT focuses on guaranteeing bit rates and packet delivery to improve quality of service and hence quality of experience, video and audio quality is based on satisfying the demands of the human visual and auditory systems. In this article, we investigate how we quantify and measure video and audio streams.
Our peripheral vision is tuned to detect movement at the edges of our sight, especially in low light. In our hunter-gatherer days this protected us from attackers, especially at night. Our hearing is also acutely sensitive to short, high-energy bursts of noise, such as a twig snapping, which again protected us from an assailant creeping up from behind.
Flickering images and crackling or distorted sound stimulate our “fight or flight” response. In the comfort of our homes we know it is very unlikely that a bear is lurking behind us, ready to pounce, so we suppress the urge to fight or run away. But the ancient part of the brain that governs self-protection doesn’t know this and primes our muscles for a sprint or a fight. The result is stress and irritation for the viewer.
Broadcast engineers spend most of their time making sure that pictures do not flicker or break up and that audio does not distort or crackle. Our protection systems are more sensitive to sound than to images, probably so that we can detect danger even when we are asleep with our eyes closed. Consequently, when bandwidth is limited, audio streams are prioritized over video.
All broadcasters use some form of quality control. This may be automated quality control (AQC) performed by computer, a human viewer watching the programs, or a combination of the two. Because of the massive amount of programming now being made, and the need to archive videotape material, most broadcasters have opted for AQC and use humans to check any borderline cases.
Bandwidth Not Guaranteed
Before IP, broadcasters had the luxury of guaranteed-bandwidth audio and video circuits. Vision and audio took separate paths, so each transmission was independent of the other. As we move to IP, however, the interplay of latency and jitter on the network can significantly affect the quality of audio and video.
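Jitter can be quantified. As one illustration, RTP (RFC 3550) defines a running interarrival jitter estimate that smooths the variation in packet transit time with a gain of 1/16. The sketch below assumes per-packet send and arrival times are already available in a common unit (RTP itself works in timestamp units), and the function name is hypothetical.

```python
def interarrival_jitter(send_times, arrival_times):
    """Running jitter estimate in the style of RFC 3550.

    send_times and arrival_times are hypothetical per-packet timestamps,
    in any common unit; the result is in the same unit.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Transit time of this packet and of the previous one
        transit_prev = arrival_times[i - 1] - send_times[i - 1]
        transit = arrival_times[i] - send_times[i]
        d = transit - transit_prev
        # Exponential smoothing with gain 1/16, as RFC 3550 specifies
        jitter += (abs(d) - jitter) / 16
    return jitter


# Times in milliseconds: a constant 500 ms delay produces no jitter
print(interarrival_jitter([0, 20, 40], [500, 520, 540]))
```

Latency alone, however large, does not raise the estimate; only variation in transit time does.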
Diagram 1 – the signal at “1” is the original, it then travels a long distance and at “2” has suffered attenuation and frequency loss. “3” is an equalizing amplifier that boosts the signal to “4”, to be the same as the original.
Before SDI, video and audio were transported over analogue circuits. Transmission theory dictates that a signal suffers frequency loss and is susceptible to noise as it travels along a cable or through the ether. Equalizing amplifiers were placed at the receiver to counteract significant frequency loss in the video and audio.
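The equalization in Diagram 1 can be sketched numerically. The model below is an assumption for illustration: it treats cable loss as growing with the square root of frequency, a common approximation for skin-effect loss in coaxial cable, and the loss coefficient is a made-up parameter. The equalizer simply applies the inverse gain, restoring the signal at point 4 to its original amplitude.

```python
import math


def cable_loss_db(freq_hz, length_m, loss_db_per_100m_at_1mhz):
    """Hypothetical cable model: loss in dB grows with length and sqrt(frequency)."""
    return loss_db_per_100m_at_1mhz * (length_m / 100) * math.sqrt(freq_hz / 1e6)


def send_over_link(amplitude, freq_hz, length_m, loss_db_per_100m_at_1mhz):
    """Return (received, equalized) amplitudes for one frequency component."""
    loss_db = cable_loss_db(freq_hz, length_m, loss_db_per_100m_at_1mhz)
    received = amplitude * 10 ** (-loss_db / 20)  # point 2: attenuated
    equalized = received * 10 ** (loss_db / 20)   # point 4: equalizer boost undoes the loss
    return received, equalized


for f in (100e3, 1e6, 5e6):
    rx, eq = send_over_link(1.0, f, length_m=300, loss_db_per_100m_at_1mhz=1.0)
    print(f"{f / 1e6:4.1f} MHz: received {rx:.3f}, equalized {eq:.3f}")
```

Higher frequencies are attenuated more, which is why the equalizer's boost must rise with frequency rather than being a flat gain.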
Video peak-to-peak is the measurement that defines the voltage of the video signal from the bottom of the sync pulse to the top of peak white. The sync pulses were measured separately from the active picture so that each could be checked independently. The peak-to-peak video signal is 1V, made up of 0.7V of active video and 0.3V of sync pulses. All these levels were established in the 1930s during the design of television.
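Measuring sync and active video separately can be sketched as follows. The functions assume three hypothetical voltage readings taken from a waveform (sync tip, blanking level, and peak white) and compare the derived amplitudes with the nominal 0.3V and 0.7V; this is an illustration, not a waveform-monitor algorithm.

```python
NOMINAL_SYNC_V = 0.3
NOMINAL_ACTIVE_V = 0.7


def video_levels(sync_tip_v, blanking_v, peak_white_v):
    """Derive sync, active-video and peak-to-peak amplitudes from three readings."""
    return {
        "sync": blanking_v - sync_tip_v,            # bottom of sync up to blanking
        "active": peak_white_v - blanking_v,        # blanking up to peak white
        "peak_to_peak": peak_white_v - sync_tip_v,  # the full signal, nominally 1V
    }


def level_errors_percent(levels):
    """Percentage deviation of sync and active video from their nominal values."""
    return {
        "sync": 100 * (levels["sync"] - NOMINAL_SYNC_V) / NOMINAL_SYNC_V,
        "active": 100 * (levels["active"] - NOMINAL_ACTIVE_V) / NOMINAL_ACTIVE_V,
    }


# A signal whose picture is about 10% low while its sync is correct
levels = video_levels(sync_tip_v=-0.3, blanking_v=0.0, peak_white_v=0.63)
print(level_errors_percent(levels))
```

Because the two parts are checked independently, a fault that reduces only the picture amplitude is caught even when the sync pulses are perfect.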
Long Distance Loss
Audio bandwidth is limited to 20kHz, much narrower than video’s 5.5MHz. However, audio still suffers amplitude and frequency loss over long transmission paths. Instead of using voltage levels to describe audio measurements, broadcast engineers use the decibel: a logarithmic ratio of the power of the signal being measured to a known reference power.
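As a small worked example of the power ratio, the sketch below computes decibels from two powers. The 1 mW reference used in `dbm` (giving the dBm unit) is shown only as an example of a fixed reference; the function names are hypothetical.

```python
import math


def power_ratio_db(power_w, reference_w):
    """Decibels: ten times the base-10 logarithm of the power ratio."""
    return 10 * math.log10(power_w / reference_w)


def dbm(power_w):
    """Power relative to a 1 mW reference (dBm)."""
    return power_ratio_db(power_w, 1e-3)


print(round(power_ratio_db(2.0, 1.0), 2))  # doubling the power: about +3 dB
print(round(power_ratio_db(0.5, 1.0), 2))  # halving the power: about -3 dB
print(dbm(1e-3))                           # 1 mW is 0 dBm
```

The logarithm keeps enormous ranges manageable: a factor of a million in power is only 60 dB.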
Because power is proportional to the square of voltage, it is easy to convert between power and voltage decibel measurements, so voltages are also expressed in dBu and dBV. Power measurements were used for long transmission paths stretching many miles or kilometers, but voltage measurements have now largely superseded them.
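The conversion between power and voltage decibels follows from P = V²/R. Assuming the measured and reference voltages appear across the same resistance R, the resistance cancels and the factor of 10 becomes 20. The second line gives the two voltage references: dBu uses the voltage that dissipates 1 mW in 600 Ω, and dBV uses 1 V rms.

```latex
\begin{aligned}
L_{\mathrm{dB}} &= 10\log_{10}\frac{P}{P_{\mathrm{ref}}}
  = 10\log_{10}\frac{V^{2}/R}{V_{\mathrm{ref}}^{2}/R}
  = 20\log_{10}\frac{V}{V_{\mathrm{ref}}} \\
V_{\mathrm{ref}}^{\mathrm{dBu}} &= \sqrt{10^{-3}\,\mathrm{W} \times 600\,\Omega} \approx 0.775\,\mathrm{V_{rms}},
\qquad V_{\mathrm{ref}}^{\mathrm{dBV}} = 1\,\mathrm{V_{rms}}
\end{aligned}
```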
For the decibel system to provide a meaningful measurement, it must be referenced to an absolute voltage or power. There are many different audio reference levels adopted by broadcasters throughout the world. Consequently, audio measurement can be incredibly complicated.
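To see why the reference matters, the sketch below expresses the same voltage against two of the references in use: dBu (about 0.775V rms, the voltage that dissipates 1mW in 600 ohms) and dBV (1V rms). The function names are hypothetical.

```python
import math

DBU_REF_V = math.sqrt(1e-3 * 600)  # 1 mW into 600 ohms, about 0.7746 V rms
DBV_REF_V = 1.0                    # 1 V rms


def volts_to_db(v_rms, reference_v):
    """Voltage level in dB relative to reference_v (both in V rms)."""
    return 20 * math.log10(v_rms / reference_v)


def db_to_volts(level_db, reference_v):
    """Inverse of volts_to_db."""
    return reference_v * 10 ** (level_db / 20)


v = db_to_volts(4.0, DBU_REF_V)  # a +4dBu line-up level
print(f"{v:.3f} V rms = {volts_to_db(v, DBU_REF_V):+.2f} dBu = {volts_to_db(v, DBV_REF_V):+.2f} dBV")
```

The same voltage reads about 2.2dB lower in dBV than in dBu, so a level quoted without its reference is ambiguous.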
In an 8-bit SDI system, reference white has the value 235 and reference black the value 16. To leave headroom and footroom for overshoots and undershoots from analogue systems, peak white is 254 and peak black is 1; the values 0 and 255 are reserved for the SDI timing reference signals. Color must be limited to the gamut of the broadcast format being used: SD uses REC-601, HD uses REC-709, and UHD uses REC-2020. Other parameters, such as the number of lines and the frame rate, must also be checked.
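These checks translate directly into code. The sketch below is a minimal example under stated assumptions: 8-bit values, the REC-709 luma coefficients (Kr = 0.2126, Kb = 0.0722) for the gamut test, and a 0 to 700mV analogue picture range for the voltage mapping. Function names are hypothetical, and a real AQC system would apply tolerances and evaluate whole frames.

```python
def luma_is_legal(y):
    """8-bit codes 0 and 255 are reserved for SDI timing references."""
    return 1 <= y <= 254


def luma_code_to_millivolts(y):
    """Map code 16 (black) to 0mV and code 235 (white) to 700mV."""
    return (y - 16) / 219 * 700


def rec709_rgb(y, cb, cr):
    """Convert 8-bit REC-709 Y'CbCr to normalized R'G'B' (0-1 is in gamut)."""
    kr, kb = 0.2126, 0.0722
    kg = 1 - kr - kb
    yn = (y - 16) / 219   # luma, 0-1
    pb = (cb - 128) / 224  # colour difference, -0.5 to +0.5
    pr = (cr - 128) / 224
    r = yn + 2 * (1 - kr) * pr
    b = yn + 2 * (1 - kb) * pb
    g = (yn - kr * r - kb * b) / kg
    return r, g, b


def in_gamut(y, cb, cr, tolerance=1e-6):
    return all(-tolerance <= c <= 1 + tolerance for c in rec709_rgb(y, cb, cr))


print(luma_is_legal(254), luma_is_legal(255))
print(luma_code_to_millivolts(235))
print(in_gamut(235, 128, 128), in_gamut(235, 128, 240))
```

The last case shows why a luma check alone is not enough: each component value is legal on its own, but the combination maps to a red level well above 100%.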
Diagram 2 – the bars on the left show the relationship between the analogue measurement on the far left and the full-scale digital measurement next to it. The equations on the right show the conversion between power and voltage decibel measurements using logarithms.
Certain flash repetition rates are now known to cause seizures in some viewers with photosensitive epilepsy. Typical sources are camera flashes at press conferences and lines with high luma transients moving across the screen. Automated methods for detecting images likely to induce seizures are well established and are regularly used by broadcasters.
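A heavily simplified version of such a check is sketched below. It assumes a hypothetical list of mean frame luminance values normalized to 0-1, treats a swing of at least `threshold` in alternating directions as a transition and two transitions as one flash, and flags any one-second window holding more than `max_flashes` flashes. The default of three follows the commonly cited limit of three flashes per second, but the threshold is a placeholder, not a regulatory value; published guidance such as ITU-R BT.1702 also considers the screen area affected, saturated red flashes, and regular patterns, all of which this sketch ignores.

```python
def transition_frames(luma, threshold):
    """Frames where mean luminance swings by >= threshold against the
    previous swing direction (the first large swing also counts)."""
    transitions = []
    anchor = luma[0]  # luminance at the last turning point
    direction = 0     # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(luma)):
        delta = luma[i] - anchor
        if direction * delta > 0:
            anchor = luma[i]       # still moving the same way: follow the swing
        elif abs(delta) >= threshold:
            transitions.append(i)  # large swing in the opposite direction
            direction = 1 if delta > 0 else -1
            anchor = luma[i]
    return transitions


def exceeds_flash_limit(luma, fps, threshold=0.1, max_flashes=3):
    """True if any one-second window holds more than max_flashes flashes.

    threshold=0.1 is a hypothetical placeholder on a 0-1 scale.
    """
    transitions = transition_frames(luma, threshold)
    worst = 0
    for start in transitions:
        in_window = sum(1 for t in transitions if start <= t < start + fps)
        worst = max(worst, in_window // 2)  # two opposing transitions = one flash
    return worst > max_flashes


steady = [0.5] * 50
strobe = [0.0, 1.0] * 25  # full-range flicker on every frame at 25 fps
print(exceeds_flash_limit(steady, fps=25), exceeds_flash_limit(strobe, fps=25))
```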
There is no defined maximum analogue audio level, but best practice assumes that a signal will clip at +24dBu (about 12.3V rms). Digital audio, however, has a hard maximum: full scale (FS), the largest value a sample can represent. Levels are measured in decibels relative to full scale, written dBFS. 0dBFS is the highest level attainable before clipping occurs and, in this alignment, corresponds to the analogue level of +24dBu. The line-up level of +4dBu is therefore -20dBFS, giving 20dB of headroom before clipping occurs.
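The alignment in this paragraph is a fixed offset, which makes the conversion simple. The sketch below assumes 0dBFS = +24dBu as stated here (other alignments exist, so the offset is a parameter) and floating-point samples where full scale is 1.0; the function names are hypothetical.

```python
import math

DBU_AT_FULL_SCALE = 24.0           # alignment assumed in the text: 0dBFS = +24dBu
DBU_REF_V = math.sqrt(1e-3 * 600)  # 0dBu, about 0.775 V rms


def dbu_to_dbfs(level_dbu, dbu_at_full_scale=DBU_AT_FULL_SCALE):
    return level_dbu - dbu_at_full_scale


def dbfs_to_dbu(level_dbfs, dbu_at_full_scale=DBU_AT_FULL_SCALE):
    return level_dbfs + dbu_at_full_scale


def dbu_to_volts_rms(level_dbu):
    return DBU_REF_V * 10 ** (level_dbu / 20)


def sample_peak_dbfs(samples):
    """Highest absolute sample value in dBFS (float samples, full scale = 1.0)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


line_up = dbu_to_dbfs(4.0)
print(f"+4dBu line-up = {line_up:.0f}dBFS, headroom {0 - line_up:.0f}dB")
print(f"clip point +24dBu = {dbu_to_volts_rms(24.0):.2f} V rms")
```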
In recent years, governments throughout the world, responding to viewer complaints, have introduced regulation based on another measurement called “loudness”. The fundamental job of an advertiser is to grab our attention with a view to selling a product or service. Permitted maximum audio levels vary between broadcasters, but a typical example is +8dBu. Some advertisers found that they could keep the maximum signal level at +8dBu yet significantly increase the perceived loudness by boosting the energy at certain frequencies.
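The effect can be demonstrated with two test signals that share the same peak level but carry different energy. The sketch below uses RMS level as a crude, unweighted stand-in for loudness (real loudness meters apply frequency weighting); the signal parameters are arbitrary choices for illustration.

```python
import math


def peak_dbfs(samples):
    return 20 * math.log10(max(abs(s) for s in samples))


def rms_dbfs(samples):
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


RATE, FREQ, PEAK = 48000, 1000, 0.5
sine = [PEAK * math.sin(2 * math.pi * FREQ * n / RATE) for n in range(RATE)]
# Same peak, but every sample sits at the peak: an extreme case of
# raising the average level without raising the maximum level
square = [PEAK if s >= 0 else -PEAK for s in sine]

for name, signal in (("sine", sine), ("square", square)):
    print(f"{name:6s} peak {peak_dbfs(signal):6.2f} dBFS, RMS {rms_dbfs(signal):6.2f} dBFS")
```

Both signals would pass a peak-level check, yet the square wave is about 3dB higher in RMS level.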
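EBU R128 is one of the standards that address this. Rather than relying on peak level, it measures loudness with the ITU-R BS.1770 algorithm, which weights the signal’s energy by frequency to approximate human hearing and integrates it over the whole program, and it sets a target program loudness of -23 LUFS. This helps sound engineers produce a mix that is more pleasing to the human auditory system.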
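Once a program's integrated loudness has been measured, bringing it onto the target is a single gain. The sketch below assumes the loudness value comes from an external BS.1770 meter (it is not computed here), uses the EBU R128 target of -23 LUFS and the R128 maximum true-peak level of -1dBTP, and, as a simplification, checks the sample peak, which can under-read the true peak. Function names are hypothetical.

```python
import math

TARGET_LUFS = -23.0  # EBU R128 target programme loudness
PEAK_CEILING = -1.0  # EBU R128 maximum true-peak level (dBTP)


def normalization_gain_db(measured_lufs, target_lufs=TARGET_LUFS):
    """Gain that moves a programme's integrated loudness onto the target."""
    return target_lufs - measured_lufs


def apply_gain(samples, gain_db):
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def peak_within_ceiling(samples, ceiling_dbfs=PEAK_CEILING):
    """Simplification: checks the sample peak, which can under-read the true peak."""
    peak = max(abs(s) for s in samples)
    return peak == 0 or 20 * math.log10(peak) <= ceiling_dbfs


# A hypothetical advert measured at -16 LUFS by an external meter
advert = [0.9, -0.7, 0.8]
gain = normalization_gain_db(-16.0)
normalized = apply_gain(advert, gain)
print(f"apply {gain:+.1f} dB", peak_within_ceiling(advert), peak_within_ceiling(normalized))
```

Because the gain comes from measured loudness rather than peak level, a loud advert is turned down even if its peaks were already within limits.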
The measurements described so far are objective: they are precisely specified and can be measured, which makes them straightforward to automate. Software scans the video and audio streams, determines the absolute digital values, compares them with the broadcaster’s specification, and then passes or rejects the material.
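The pass/reject step can be sketched as a table of limits compared against measured values. Every metric name and limit below is hypothetical, loosely based on figures quoted in this article, and a real broadcaster's specification would be far longer.

```python
# Hypothetical delivery specification: metric name -> (minimum, maximum)
SPEC = {
    "min_luma_code": (1, 254),
    "max_luma_code": (1, 254),
    "integrated_loudness_lufs": (-24.0, -22.0),
    "sample_peak_dbfs": (float("-inf"), -1.0),
}


def check_against_spec(measurements, spec=SPEC):
    """Return a list of failure messages; an empty list means the material passes."""
    failures = []
    for name, (low, high) in spec.items():
        value = measurements.get(name)
        if value is None:
            failures.append(f"{name}: not measured")
        elif not low <= value <= high:
            failures.append(f"{name}: {value} outside [{low}, {high}]")
    return failures


report = check_against_spec({
    "min_luma_code": 16,
    "max_luma_code": 235,
    "integrated_loudness_lufs": -16.5,
    "sample_peak_dbfs": -1.5,
})
print("PASS" if not report else "REJECT: " + "; ".join(report))
```

Treating a missing measurement as a failure rather than a pass is a deliberate choice here, so that unmeasured or borderline items can be routed to a human checker.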
Science and Art
The much more difficult measurements are the subjective ones. Television is where science meets the arts: science because of the amount of technology needed to deliver high-quality, reliable video and audio to viewers in their homes, and art because of the creative work of making television programs.
Program makers regularly introduce creative effects into their productions that fall within technical specifications but could easily cause an automated system to fail. For example, a program might have bars-and-tone deliberately inserted, or video noise added to simulate film grain. Although artificial intelligence is making strong inroads into quality control, the only reliable way to check such material is still a human operator.
Broadcasters are moving to automated quality control, and artificial intelligence is playing an increasingly important role, but engineers still need to understand the basics of what is being measured and why. Broadcasters throughout the world have their own technical specifications for program makers, and programs must comply with them to pass quality control.