In this new series we look at what transforms are, how they work and what they can be used for.
Transforms form a large topic that is multi-dimensional, meaning that there isn't a single path through the subject that might form a story. There is more than one place to start and the place I have chosen is certainly not unique.
Transforms are traditionally the stamping ground of the mathematician, who has his own arcane language that the rest of us do not speak. The result is that one mathematician can explain a transform to another mathematician using only a handful of mysterious symbols that might equally well have decorated a Pharaoh's tomb. The rest of us stay in the dark.
The reader will be relieved to know that I have no plans to go down that route. Since it is well trodden, there is nothing I can add. Instead I'm hoping to shed some light on the subject whilst going easy on the hieroglyphs.
Mankind's understanding of the world comes largely by creating models and using them to predict what might happen under various circumstances. Of course anyone with an imagination can create a model, whereas only a few people test their model by performing experiments to see if the predictions hold.
One gets tired of predictions that the world will end because of this, that or the other, usually followed by instructions on how we must behave to avoid it, preferably by buying something. Stubbornly, the world hasn't ended yet, whereas if it were as fragile as is often made out it would have ended years ago. Perhaps instead it is the predictions that are fragile.
Once the underlying mechanism is understood, the mathematicians can turn it into equations from which some quantitative predictions can be made. If the underlying mechanism has not been understood, the predictions will not turn out well. It is the underlying mechanism that is more important and that is the aspect of transforms that will be considered here.
Those involved with broadcasting and music usually have some idea of what a spectrum is. The good news is that a spectrum is one result of using a transform, although there are many others.
In all cases we begin with information of some kind. Information cannot exist on its own and as a practical matter it has to be either stored on some medium or transmitted down a communication channel. Some systems exist only to deliver that information to another place, whereas others seek to analyze the information in order to learn something from it or to be selective about what is sent. When choosing a piece of fruit at the grocery shop, the natural thing is to turn it round and look from different directions. Why should we not do that with information as well?
If we are given a box and told to measure it, we come up with the length, width and height. We have, without thinking, turned the box until the sides align with our coordinate system. Why? Because it's much simpler than doing it any other way. Imagine that we balanced the box on one corner so that a diagonal was vertical and then started measuring. Our description of the box would take a lot longer as we noted the position of each corner in three dimensions. Clearly, looking at something from some directions can be more efficient than others.
At one level, a transform is just such a way of looking at information from another direction. Perhaps that's on a very simple level, but it's still correct. Equally, examining your piece of fruit or your box from different directions doesn't change it, and that can also be true of transforms. With appropriate care we can get back from the transform domain without necessarily losing any information.
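As a minimal sketch of that round trip, assume the transform is the discrete Fourier transform (DFT); the text above does not name a particular one. The Python below takes a handful of samples into the frequency domain and back again. The function names `dft` and `idft` and the sample values are purely illustrative.

```python
import cmath

def dft(samples):
    """Forward discrete Fourier transform: time domain to frequency domain."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse discrete Fourier transform: frequency domain back to time domain."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# Arbitrary sample values, chosen only for illustration.
original = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.3, -0.6]

# Go into the transform domain and straight back out again.
recovered = [value.real for value in idft(dft(original))]

# Any difference is down to floating-point rounding, not to the transform.
max_error = max(abs(a - b) for a, b in zip(original, recovered))
print(max_error)
```

The only discrepancy between `original` and `recovered` is rounding error in the arithmetic, which is the "appropriate care" part: the transform itself discards nothing.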
In audio-visual systems, the domains that matter are frequency, space and time.
In a still image there are spatial frequencies, defined as the number of cycles of change per unit distance. When a still image is scanned, as in television, spatial frequencies are converted to temporal frequencies. In the case of moving pictures, the frequencies created become much higher as well as being multi-dimensional. Some of the high frequencies alias and with care can be recovered. Some are lost and the only high definition we are left with is the wording on the TV set.
Looking at television signals from moving pictures in the frequency domain, which we shall do in due course, helps us to see why traditional frame rates based on the avoidance of flicker are inadequate. Looking at multi-dimensional spectra in video signals suddenly becomes simpler when we look down different axes using concepts such as optic flow.
Not only will we find similarities between how birds manage to alight on branches and how self-driving vehicles work, but we will also find that much of the art of video compression relies on optic flow. In the case of moving pictures, moving objects cause increased differences between successive pictures, making compression harder. However, if we compensate for the motion, successive pictures become more alike again and are easier to compress. We may use transforms to measure the motion. Compression simply cannot be understood without some grasp of transforms.
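One transform-based way of measuring motion is phase correlation; the text above does not say which method is meant, so treat this as an assumed example rather than a description of any particular codec. The sketch works on a single line of samples instead of a whole picture, and it assumes the content wraps around at the ends (a circular shift). The names `measure_shift`, `previous_line` and `current_line` are hypothetical.

```python
import cmath

def dft(samples):
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def measure_shift(previous, current):
    """Estimate the circular shift, in samples, that turns `previous` into `current`."""
    cross = []
    for fa, fb in zip(dft(previous), dft(current)):
        product = fb * fa.conjugate()
        magnitude = abs(product)
        # Keep only the phase; bins with no energy carry no motion information.
        cross.append(product / magnitude if magnitude > 1e-12 else 0)
    correlation = [value.real for value in idft(cross)]
    # The position of the peak is the displacement.
    return correlation.index(max(correlation))

# A small "object" on an otherwise blank line of 16 samples.
previous_line = [0, 0, 1, 3, 7, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# The same object moved five samples to the right in the next picture.
current_line = previous_line[-5:] + previous_line[:-5]

estimated = measure_shift(previous_line, current_line)

# Difference between successive lines without and with motion compensation.
raw_difference = sum(abs(c - p) for c, p in zip(current_line, previous_line))
compensated = current_line[estimated:] + current_line[:estimated]
residual = sum(abs(c - p) for c, p in zip(compensated, previous_line))
print(estimated, raw_difference, residual)
```

Once the measured shift has been undone, the two lines match and there is nothing left to send except the shift itself, which is the sense in which compensated pictures are easier to compress.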
The art of compression is not to send again something that is already known. Something that is already known is redundant and practically every real world message contains a mixture of novel or unpredictable matters and those that are redundant. When some aspect of a moving picture doesn't change, it can be assumed to be the same in the next picture. The trick is to identify what didn't change, the redundancy, and we do that by looking along seemingly odd axes with the help of transforms.
Music notation works in the frequency domain and in the time domain. Unsurprisingly the human auditory system (HAS) also works in both domains and the realistic reproduction of sound requires attention to be paid to both domains. Equally the compression of audio requires attention to both domains according to how the HAS is working.
Now that the majority of audio-visual signals are captured, produced and delivered in the digital domain, they of course need to be sampled at the outset and converted back to the continuous domain just before presentation. These conversions cannot fully be understood without knowledge of transforms.
Reading hi-fi magazines it becomes painfully obvious that an understanding of transforms is completely absent, and many of the conclusions drawn are without any foundation at all. Pointing out why they are wrong is fruitless, as a technical argument will not be understood. It is far better to leave well alone.
If we pass a series of discrete samples into a transform, it will analyze them according to the amount of signal that is present at a discrete set of spot frequencies. If we correctly low-pass-filtered those samples, we would obtain an analog waveform of finite bandwidth. If we low-pass-filtered the spot frequencies in the same way, we would obtain a continuous spectrum of finite frequency resolution.
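The idea of spot frequencies can be sketched with a discrete Fourier transform written out longhand. The sampling rate of 48 kHz and the block length of 16 samples are assumptions made only to give round numbers, and the function name `dft` is illustrative.

```python
import cmath
import math

SAMPLE_RATE = 48000  # Hz, assumed for illustration
N = 16               # number of samples in the block, assumed

def dft(samples):
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# The spot frequencies are spaced SAMPLE_RATE / N apart.
bin_spacing = SAMPLE_RATE / N        # 3000 Hz
tone_bin = 3
tone_freq = tone_bin * bin_spacing   # 9000 Hz, exactly on a spot frequency

samples = [math.sin(2 * math.pi * tone_freq * t / SAMPLE_RATE) for t in range(N)]
magnitudes = [abs(c) for c in dft(samples)]

# Only the bins up to half the sampling rate are listed;
# for real-valued input the upper half mirrors them.
for k in range(N // 2 + 1):
    print(f"{k * bin_spacing:7.0f} Hz  {magnitudes[k]:.3f}")
```

With the tone placed exactly on a spot frequency, all of the signal appears in that one bin. A tone lying between two spot frequencies would instead be shared among several bins, which is one reason the frequency resolution of the result is finite.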
Transforms also come in handy for testing purposes as they allow the results of our tests to be looked at from different directions. When lenses are tested, the modulation transfer function (MTF) is essentially the result of a transform. In fact if we follow the path of light through a lens we find that the lens itself is carrying out a transform, as diffraction is a process that is strongly dependent on spatial frequency.
One of the most common measurements made in audio is frequency response, not least because it is easy to do. As will be seen, an audio system can have what seems to be an excellent frequency response, yet sound completely different to another system having an excellent response. This is because there are other aspects of audio that need to be measured and which are commonly overlooked. Using transforms we can make those measurements.
Simulating The Real World
In the same way that the HAS can reject reflected sounds and concentrate on the direct sound, we can simulate that in audio testing and perform some tests that are meaningful even though we are not working in anechoic conditions.
Provided a transform doesn't lose information, certain characteristics of the information before the transform can be identified afterwards. This is known as transform duality, which we will look at in some detail in due course. In many cases, transform duality appears to work on opposites: something large before the transform becomes small afterwards.
In digital audio the smallest item we have is a single sample, which can be located precisely on the time axis. But a single sample cannot tell us anything about the waveform to which it belongs, so we cannot say what frequency it represents.
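A short sketch, again assuming the discrete Fourier transform, shows both halves of that trade-off. A single sample in an otherwise silent block is compared with a level that never changes across the block. The block length of eight and the variable names are illustrative.

```python
import cmath

def dft(samples):
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

N = 8

# One sample, located precisely at time index 2, silence everywhere else.
single_sample = [0.0] * N
single_sample[2] = 1.0

# The opposite case: a level that is the same at every instant in the block.
steady_level = [1.0] * N

single_sample_spectrum = [abs(c) for c in dft(single_sample)]
steady_level_spectrum = [abs(c) for c in dft(steady_level)]

print([round(m, 6) for m in single_sample_spectrum])
print([round(m, 6) for m in steady_level_spectrum])
```

The single sample spreads equally across every spot frequency, so it points to no frequency in particular. The steady level, which says nothing about when anything happened, lands entirely in one bin.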
This trade-off between precision in one domain and precision in the other first showed up in quantum physics. Werner Heisenberg found that if he worked in the spatial domain, he could treat a photon as a point and calculate where it was, but then he couldn't calculate the frequency, which is proportional to the energy. We will probably say more about the uncertainty principle in due course.