Timing: Part 11 - Isochronous Systems
The traditional analog video signal contained all of the timing information needed by the receiver. Then it all changed.
Shortly after Noah's Ark ran aground, the tube-based television system became established. Based on scanning, the system relied on the electron beam in the receivers being at the same place as in the camera. The voltage gamut of the signal was split so that the range below black was interpreted as synchronizing information and the range above black was interpreted as picture information.
It all worked in real time and, for simplicity, the line frequency was constant, the frame frequency was constant and there was an integer relationship between them. The line and frame scans in the receiver were driven by oscillators that locked to the sync pulses. If an individual sync pulse were damaged by noise, the oscillator would keep going and fill in the gap. The arrangement was called a flywheel circuit. Later on it would be called a phase-locked loop.
Initially, the arrival of digital video did not change things much. The first digital video formats, such as 601, were basically the good old analog TV signal expressed as numbers. The digital signals ran in real time, complete with gaps in the data corresponding to blanking. As the timing structure was identical, 601 could be converted back to analog video very simply.
The bandwidth needed to carry 601 digital video was far beyond what was available in a broadcast channel, so use of 601 was limited to production. Actual digital broadcasting would have to wait for the development of compression algorithms.
It is fundamental to efficient compression that what happens is determined by the picture content. Some pictures compress easily, requiring little data; others are difficult and need more bits. Add to that the different types of picture used in MPEG, each needing different amounts of data, and the traditional approach of sending the same amount of data for every frame fell down. Throw in bidirectional coding and the pictures are not even sent in the correct order.
If statistical multiplexing is used, the bit rate of an individual video channel is not even constant, but rises and falls with picture content and motion. The traditional synchronizing systems of analog television had completely vanished and it would be necessary to come up with a completely new system that would handle the new conditions.
In point-to-point transmission, and in the replay of recordings, it is possible for the destination to control the timing, such that the source is asked for new data as needed to keep the destination timebase corrector centered. Clearly that can only work with one destination of the data as the source cannot lock itself to more than one reference at once.
For broadcast purposes the source has to provide the timing and the data and all destinations must lock to the source. The viewing of a TV program does not require precise synchronization with the TV station. Instead it is good enough if the picture rate is correct and constant and the sound is locked to the picture.
This is just as well, because delay is inherent in compression. As everything has to be causal, sending pictures in the wrong order, as MPEG does, has to be done using delay. Re-ordering them again needs more delay. Strictly speaking digital television is not synchronous because of the inherent delays. The correct term is isochronous, meaning that the rate is correct even if the absolute timing is out.
Before analog TV transmissions were switched off, it was possible to compare analog and digital transmissions side by side, perhaps in a TV shop. It was clear that the digital picture was always somewhat behind the analog picture because of the codec delay.
Fig.1 - The encoder master clock is recreated at the receiver using a phase locked loop that has the same long-term frequency, if not the same phase.
Some stations broadcast a picture of a clock from time to time and on the digital service it would be wrong because of the delay. This was soon dealt with by removing the seconds hand from the clock.
In an isochronous system, the overall delay needs to be constant, even though the instantaneous delay varies from picture to picture. As usual, the difference is absorbed by time base correction or buffering. As the average codec delay is relatively constant, the final say over the amount of delay needed would be the requirement to center the memory buffers.
A wide range of frame rates is used in television and movies, so a common timing system was needed that would support them all. This was done by having a constant frequency clock at the source, which is sampled by each incoming TV frame to create a time stamp. Each receiver would have to re-create the source clock and then place the decoded pictures on the time line of that clock according to the time stamps.
As the frame rate of a TV program is constant, it follows that the number of cycles of the master clock in each frame would be the same. Consequently it is not necessary to send a time stamp with every frame as the time stamp for any frame can be calculated from frames that do have stamps.
The constant frequency clock is known as PCR (Program Clock Reference) and the frequency of 27MHz was chosen as it was already present as the sampling rate of the 601 multiplex and the master clock of MPEG. An isochronous system requires the 27MHz clock to be recreated at every receiver.
A 27MHz clock has a constant frequency. Having no bandwidth it carries no information and is highly redundant. This means the clock frequency can be locked using very little data. As can be seen in Fig.1, the master 27MHz clock at the source drives a counter, which periodically overflows and restarts. From time to time, the state of the count is sampled and multiplexed into the Transport Stream, where each sample occupies a 48-bit field, of which 42 bits carry the count.
Each receiver contains a 27MHz crystal oscillator that drives an identical counter to the one in the master clock. The frequency of the receiver oscillator can be pulled slightly above or below its nominal frequency by a control voltage. The samples of the master oscillator are compared with the state of the counter driven by the receiver oscillator. The state of the difference between the two counts allows the frequency of the local oscillator to be adjusted.
In an ideal world, once the system locked, the local count would always be the same as the received PCR value. In practice the transmission of packets between master and receiver is subject to jitter and that would show up as noise on the comparison.
The noise is dealt with by the loop filter, which ensures that the receiver's voltage-controlled crystal oscillator (VCXO) is controlled by the average of a large number of phase comparisons. Heavier filtering makes the clock more stable, but it also means that lock takes longer to achieve. In some systems the local counter is jammed to the same value as the first received PCR on start up. The time constant of the loop filter may also be changed during start up.
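The behavior of such a loop can be explored with a small simulation. The following is only a sketch under stated assumptions: the proportional-plus-integral loop filter, the gains `kp` and `ki`, the 30 ppm crystal error, the 40 ms spacing between PCR samples and the jitter figure are all illustrative choices, not values taken from any standard or real receiver.

```python
import random

NOMINAL_HZ = 27_000_000  # master clock frequency from the text


def simulate_clock_recovery(offset_ppm=30.0, jitter_cycles=27.0,
                            pcr_interval=0.04, duration=200.0,
                            kp=0.4, ki=0.04, seed=1):
    """Return the receiver frequency error in Hz after each PCR sample.

    All default values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    free_run_hz = NOMINAL_HZ * (1 + offset_ppm * 1e-6)  # uncorrected crystal
    source_count = 0.0
    local_count = 0.0   # local counter jammed to the first PCR at start up
    integral = 0.0      # integral term of the loop filter
    control_hz = 0.0    # frequency pull applied to the oscillator
    history = []
    for _ in range(int(duration / pcr_interval)):
        # Both counters advance until the next PCR sample arrives.
        source_count += NOMINAL_HZ * pcr_interval
        local_count += (free_run_hz + control_hz) * pcr_interval
        # Packet jitter shows up as noise on the received sample.
        received = source_count + rng.gauss(0.0, jitter_cycles)
        error = received - local_count
        # Loop filter: the integral term averages many comparisons.
        integral += error * pcr_interval
        control_hz = kp * error + ki * integral
        history.append(free_run_hz + control_hz - NOMINAL_HZ)
    return history


if __name__ == "__main__":
    errors = simulate_clock_recovery()
    print(f"frequency error after first sample: {errors[0]:.1f} Hz")
    print(f"mean error over last 1000 samples: {sum(errors[-1000:]) / 1000:.3f} Hz")
```

Reducing `kp` and `ki` in this model gives a steadier clock in the presence of jitter at the cost of a longer pull-in time, which is the trade-off described above.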
Once the receiver has recreated the isochronous 27MHz clock, it is possible to use time stamps. The time stamp clock runs at 90kHz, obtained by dividing 27MHz by 300. This frequency drives a 33-bit counter that is sampled to obtain a time stamp.
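The arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and the modulo subtraction is simply one common way of comparing two stamps when the counter has overflowed between them.

```python
MASTER_HZ = 27_000_000   # recreated master clock
DIVIDER = 300            # 27MHz / 300 = 90kHz
STAMP_BITS = 33          # width of the time stamp counter

STAMP_HZ = MASTER_HZ // DIVIDER
STAMP_MODULUS = 1 << STAMP_BITS


def stamp_from_master_count(master_count):
    """Time stamp counter value after a given number of 27MHz cycles."""
    return (master_count // DIVIDER) % STAMP_MODULUS


def wrap_period_hours():
    """Time taken for the 33-bit counter to overflow and restart."""
    return STAMP_MODULUS / STAMP_HZ / 3600


def stamp_difference(later, earlier):
    """Ticks from `earlier` to `later`, allowing for one counter wrap."""
    return (later - earlier) % STAMP_MODULUS


if __name__ == "__main__":
    print(STAMP_HZ)                             # time stamp clock in Hz
    print(stamp_from_master_count(MASTER_HZ))   # count after one second
    print(round(wrap_period_hours(), 2))        # hours between overflows
```

With these figures the counter overflows roughly once every 26.5 hours, so any comparison of time stamps has to tolerate a wrap.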
Fig.2 - The use of decode time stamps helps the decoder get the pictures in the right order when bidirectional coding is used.
In the same transport stream, it is quite possible that different frame rates exist. There may be TV material at 50 or 60Hz, and there may be film-based material at 24Hz. Provided the time stamp counter is correctly sampled by each process, the differing frame rates can co-exist.
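To illustrate how different frame rates can share one time stamp clock, and how a stamp can be calculated for a frame that was sent without one, here is a small sketch. The function names are hypothetical, and exact fractions are used so that rates such as 30000/1001 Hz do not accumulate rounding error.

```python
from fractions import Fraction

STAMP_HZ = 90_000          # time stamp clock
STAMP_MODULUS = 1 << 33    # the 33-bit counter wraps at this value


def ticks_per_frame(frame_rate):
    """Number of 90kHz ticks in one frame period, as an exact fraction."""
    return Fraction(STAMP_HZ) / Fraction(frame_rate)


def infer_stamp(known_stamp, frames_later, frame_rate):
    """Stamp of a frame `frames_later` frames after a stamped frame."""
    offset = frames_later * ticks_per_frame(frame_rate)
    return (known_stamp + round(offset)) % STAMP_MODULUS


if __name__ == "__main__":
    for rate in (24, 50, 60, Fraction(30000, 1001)):
        print(rate, ticks_per_frame(rate))
    # A film frame ten frames after one stamped with 1000:
    print(infer_stamp(1000, 10, 24))
```

Each of the rates mentioned above gives a whole number of ticks per frame, which is why a constant frame rate lets the decoder fill in the stamps that were not sent.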
If the same count is sampled to create audio and video time stamps, then the lip sync at the receiver will be the same as it is at the source. There is, however, a further use for time stamps, and this comes about because of the use of bidirectional coding. When pictures are sent in the wrong order, the decoder needs some help to decide which picture to decode next. It may be necessary to decode a future picture because the contents will be needed for an earlier picture.
There are thus two types of time stamp. The presentation time stamp, PTS, determines the time when the picture should appear on the screen and the decode time stamp, DTS, determines when a picture should be decoded.
Fig.2 shows an example of PTS/DTS in action. When bidirectional coding is used, the correct picture sequence may be IBBP, but it will be transmitted as IPBB because the B pictures may need data from the P picture. Fig.2 shows that the DTS tells the decoder to decode the P picture immediately after the I picture, so that the B pictures can be decoded. The decoded P picture is kept in memory. Only after the B pictures have been presented will the P picture be presented.
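The re-ordering and the two kinds of time stamp can be sketched in a few lines. This is a simplified model, not the behavior of any particular encoder: it assumes one picture is decoded per frame period, that each run of B pictures is sent after the I or P picture that follows it, and it uses 3600 ticks per frame (25Hz on a 90kHz clock) purely as an example. The function names are hypothetical.

```python
def to_transmission_order(display_types):
    """Move each run of B pictures after the I or P picture that follows it.

    Returns (display_index, picture_type) pairs in transmission order.
    """
    sent = []
    pending_b = []
    for index, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append((index, ptype))   # must wait for its anchor
        else:
            sent.append((index, ptype))        # anchor picture goes first
            sent.extend(pending_b)
            pending_b = []
    sent.extend(pending_b)  # trailing B pictures with no later anchor
    return sent


def assign_time_stamps(display_types, ticks_per_frame=3600):
    """Give each transmitted picture a DTS and a PTS (simplified model)."""
    sent = to_transmission_order(display_types)
    # Smallest whole-frame delay that keeps every PTS at or after its DTS.
    delay = max((pos - index for pos, (index, _) in enumerate(sent)), default=0)
    return [
        {"type": ptype,
         "dts": pos * ticks_per_frame,
         "pts": (index + delay) * ticks_per_frame}
        for pos, (index, ptype) in enumerate(sent)
    ]


if __name__ == "__main__":
    for picture in assign_time_stamps("IBBP"):
        print(picture)
```

In this model the P picture is decoded second but presented last, while each B picture is presented as soon as it is decoded, and the one-frame `delay` is the price of putting the pictures back in order.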
It should also be clear from Fig.2 how putting pictures back in the correct order causes delay. Re-ordering the pictures in the first place also causes delay.