Data Recording and Transmission: Part 19 - Cyclic Redundancy Checks

The CRC (cyclic redundancy check) was primarily an error detector, but it did allow some early error correction systems to be implemented.

There are many different CRCs but they all work in much the same way, which is that the data to be protected are divided by the CRC polynomial. This was done serially by shifting the data bits into the CRC generator at the same time as they were being written to a medium such as disk or tape or sent into a serial transmission line.

When the last data bit had been shifted in, the state of the twisted ring counter would represent the remainder of the polynomial division. A small re-arrangement of the generator logic, switching off the feedback, would turn it into a regular shift register so the remainder could be shifted out and recorded immediately after the data. The data and the remainder, also known as the check bits or checksum, would together form a code word.

The definition of the code word in this case is that irrespective of the data, if the entire code word is divided by the generator polynomial on being read, the remainder should be zero. Detection of the all zero's condition in the decoder is very simple and gives a binary go/no-go output almost immediately after the data have been read.

A large number of CRCs exist, of various lengths. 16 bit CRCs are popular in data storage and 32 bit CRCs are found in Ethernet and MPEG.

The twisted ring and its control logic were an obvious target for integration and one of the first CRC chips was the Fairchild 9401 of 1975. A simple block of the device is shown in Fig.1. It could be configured to generate a number of different CRCs including CRC-16 and CRC-CCITT. These two could also be calculated in reverse as some tape decks of the day could read data in either direction.

Fig.1 - Block diagram of an early CRC chip, which could generate and check several different polynomials.

Today the CRC calculation could equally well be implemented in software or, for higher speed, in a gate array. The complexity of CRC logic is very low.

Whilst the size of the code word could not exceed the maximum sequence length, in most cases the code word was shortened, or punctured, so that data blocks of any convenient size could be used. Puncturing is shown in Fig.2 and effectively the wanted data are preceded by a string of zeros that don't need to be recorded. In fact if the twisted ring is reset to zero, inputting a string of leading zeros would have no effect whatsoever, so the punctured data block could be input directly on both write and read.

Reducing the size of the data block also reduces the probability that it will contain an error and makes the CRC more reliable.

Fig.2 - In a punctured code, not all of the sequence length is used. Mathematically, the actual data are considered to be preceded by zeros in which errors of course cannot occur. This makes the CRC stronger against errors in the actual data.

Simple but effective error correcting systems could be made using the CRC as an error detector. In the IBM open reel data tape formats, that used 7 and later 9 tracks on 1/2 inch tape, one of the tracks was a parity track that recorded the result of a parity calculation across the tracks. A transverse parity failure on replay would indicate but not locate an error, nor could it tell if two or three tracks were in error.

Fig.3 shows that the data along the tracks were formed into CRC code words. At the end of a block, the results of the CRC checks were analyzed. In the event that all CRCs gave zero remainders, there was considered to be no error and no action was taken. If only one track contained a non-zero CRC, the errors could be corrected. If more than one track had a non-zero CRC remainder, the block was unrecoverable, but might revert to a single track error on retry if the problem had been due to noise or dirt.

In the case of the single-track error, the most likely cause would be a tape dropout and the same thing would happen on a second pass. The drive would need to read the block again, but this time whenever a transverse parity error was found, the good data tracks and the parity track could correct any failing bit in the error track. If that process worked, the CRC of the error track, after correction, would give a zero remainder.

Fig.3 - Early error correcting scheme using transverse parity and longitudinal CRC. The CRC identifies the failing track so that the parity can correct it. (IRG = Inter-record gap).

For a digital audio recorder, the need to read a block again would cause an intolerable interruption in the sound, so forward error correction (FEC) was necessary. Forward error correction simply means that the source of the data plays no further part if an error is found. Retries or retransmission is not used and the decoder alone has to deal with the one version of the data it receives.

In the first equipment used for mastering Compact Discs, the storage medium was U-matic tape, which had sufficient bandwidth to record stereo PCM audio, and more than enough signal to noise ratio to record binary as black and white patches in the picture. The recording density was low as was the raw error rate, so a simple FEC system was adequate.

Fig.4 shows the structure of the PCM-1610 error correction system. Pairs of audio samples generated a parity symbol and both samples and parity symbols were protected by CRCs. In the event of an error, if only one CRC in the block failed in a row of samples, those samples could be corrected using the remaining samples and the parity. If the parity bit CRC alone failed no action was taken. If two or more CRCs failed, correction was not possible, and concealment of the missing samples would be used.

Fig.4 - The correction structure of an early CD mastering recorder: the PCM-1610. The failure is located by CRC and corrected by parity.

If the principle of parity is considered in a three-symbol system, then the bit-by-bit modulo-2 sum of the three symbols should all be, for example, odd. If one symbol is known to be in error and its value is set to all zeros, then the bit-by-bit modulo-2 sum of the other two symbols and zero will be the missing symbol. This is known as correction by erasure

The PCM-1610 system had a 100% overhead, but the bandwidth of a U-matic video recorder was such that it didn't matter and the system worked surprisingly well.

The digital audio stationary head (DASH) format was an open-reel digital audio multi-track format intended for music production. FEC was a practical necessity, but the overhead used in the CD mastering format was not acceptable. Instead a two-dimensional parity structure was used along with a CRC error detector.

Fig.5 - In cross-interleaving, data coded in columns are subject to differing delays so that codes can also be created on diagonals. The example here is from the DASH format.

A two-dimensional parity structure could be created using the rows and columns of an interleave block, where the code words are at right angles. However, less memory is required if the interleave is convolutional. Fig. 5 shows that if data are arranged into columns, parity symbols P can be calculated on each column. If each row is then subject to a different delay before another parity symbol Q is calculated, the two parity code words would form columns and diagonals in the data.

The system is known as cross interleaving. Following the two-dimensional parity calculation, a further interleave is used to create data blocks to put on the tape. The result is that every symbol in a data block appears in a different P and Q code word, so that a defective block would result in single error in a large number of code words. The blocks finally recorded on the tape were coded with a CRC, which is the only error-detecting mechanism. In the event of a CRC failure in a tape block, possibly due to a dropout, every symbol in the block would be flagged as possibly in error and the flags would accompany the data through the first de-interleave to P.

Fig.6 - Multiple errors can be corrected in a cross interleave by re-interleaving as the example here shows.

In the best case, there would only be one flagged symbol in each P parity check, so that the symbol in error could be corrected by erasure. However, on occasions there could be random errors in the vicinity of the dropout, such that more than one symbol would be flagged in a P code word, which the P code could not correct. In that case if the data are re-interleaved to Q, there is a good chance there would be only a single error in the Q code words.

Fig.6 shows how this works. By re-interleaving to Q and de-interleaving again to P a practical system that could handle a mix of dropouts and random errors was achieved before it became economic to employ Reed-Solomon coding.

In the event of errors being uncorrectable, a further odd-even interleave was present that allowed concealment of odd samples in error using even samples and vice versa. This process was also invoked to allow DASH format tapes to be physically spliced!

The most successful DASH format recorder was the Sony PCM-3324, and users soon found that it produced a lower error rate if it was used with Ampex tape. That produced an interesting situation, because Ampex and Sony were deadly rivals in the broadcast video recorder market. The Ampex tape division actually bought a PCM-3324 to test their tapes.

You might also like...