Data Recording and Transmission: Part 22 - Reed Solomon Codes

The Reed Solomon codes are defined by what the decoder expects to see and the encoder has to be configured to suit that.

The simplest of Reed Solomon codes requires the redundant symbols P and Q to be such that in the error free case two checks will both result in a syndrome of all zeros. It follows that two expressions used to calculate the syndromes are equal to one another.

Fig.1 shows the two expressions set equal and then solved for P and Q. Once again we can do the math involved by adding powers, or we can get a twisted ring counter to do it for us.

Some observations about the arcane math of modulo-2 may make Fig.1 easier to follow. Firstly the addition sign in a circle represents XOR here. Galois Field symbols are multiplied by adding the powers. For example a³ x a² is a⁵. However, as we are in a finite field, the power of a can not exceed 7 so the sum of the powers has to be expressed modulo-7. For example a⁵ x a⁶ = a^(11-7) = a^4.

On writing data, the expressions derived in Fig.1 are used to calculate the redundancy P and Q, which is appended to the data A-E to make the codeword.

Fig.2 shows an example of a complete encode and decode in which the encoding expressions are those derived in Fig.1.

The simple Reed Solomon code, having two redundant symbols, can locate and correct one symbol in error, or correct two symbols by erasure if the location of the errors is known by some other means. One of the strengths of the Reed-Solomon codes is that further pairs of equations can be created throughout the data to increase the number of symbols that can be corrected.

Fig.1. The expressions used to calculate P and Q in the encoder are determined by what the decoder wants to see.

The examples here use three-bit symbols and it doesn’t matter how many of those three bits are wrong. However, it does matter if a three-bit burst error is not aligned with the symbol boundaries. If two of the bits are in one symbol and a third is in another, or if there is a random error in a different symbol, the error is uncorrectable and if correction is attempted the result will be to make the error worse.

However, if a data block had four R-S equations running through it, instead of attempting to correct twice as much, the correction could be made more reliable. For example, if there truly was a single symbol in error, both R-S codes would agree on the location and the nature of the error and it could be corrected with high reliability. However, in the event that the two R-S codes didn’t agree, the error could be declared uncorrectable.

In a real data recorder, the raw playback bit stream can only be described statistically. It will have some random error rate, where odd bits will be flipped due to noise, and it will have some distribution of burst errors, where the size of the burst and its probability are connected. None of that tells us what will actually happen when a particular data block is replayed on a particular occasion. All that can be said is that there will be some combination of random and burst errors.

The practical solution is to combine the Reed Solomon coding with a suitable interleave structure. Many actual formats use 8-bit R-S symbols and the symbol becomes the quantum used in the interleaving.

Block coding is a form of interleaving in which a memory is made into a two dimensional array by fooling with the addressing. For example if a 2Kbyte memory is to be arranged as 32 bytes by 64 bytes, where the rows are scanned in the natural order of addressing, it is possible to scan the first column by using addresses that are integer multiples of 32, such as 64, 96 and so on. The second column can be scanned by adding 1 to the integer multiples.

Prior to recording, data bytes are loaded into this memory in the normal sequence of memory addresses, so that they form rows. The memory is then read out in columns and written to the medium. On replay, the raw data are written into a memory in columns using the same addressing sequence. The natural memory sequence is then used to read out the memory, and bytes will emerge in their original sequence and the effect of the interleaving has been cancelled out.

However, if there were a large burst error on the medium, it would occur in the raw data domain and would affect sequential data bytes down a column in the memory. Such a burst could start in the middle of a symbol and wrap into others.

Fig.2. A complete R-S encode and decode according to the equations of Fig.1.

When the memory is read in columns, the burst error has been broken up into single symbols in error where the burst error is constrained to one symbol and cannot wrap around. Any R-S decoder working on columns of the de-interleave memory is protected from wrap around as the errors are always constrained to a symbol.

If the R-S code word working on a column has no other information, it has to work out the position and the nature of the error from scratch. If a random error occurred elsewhere in the column, it might miscorrect. If, however the column R-S code could be told the location of errors, it could correct twice as many and if some independent means could be found to fix random errors the R-S column code could be protected from miscorrection.

That is the reasoning behind the product code. Fig. 3 shows how a product code is implemented. There is an R-S coder on the input to the interleave memory and a corresponding decoder on the output of the de-interleave memory. In addition, there is a second R-S encoder between the interleave memory and the medium that makes data blocks on the medium into code words. There is a corresponding decoder between the medium and the de-interleave memory. For obvious reasons, these two R-S coding systems are called the inner code and the outer code. The de-interleave memory is modified so that it can convey status between the inner and outer codes in the form of flags.

The two-dimensional interleave memories have code words running down columns and across rows and it becomes possible to play a high-tech game of noughts and crosses with noughts and ones.

The inner code is exposed to whatever comes from the medium and will typically have at least four equations running through the data block. It has a number of functions. In the best case, all four checks produce a zero syndrome and the probability there has been an undetected error is very small. The data can be written into the de-interleave memory without any change.

Fig.3. The structure of a product coding system, showing how the inner and outer codes get their names.

The next worst thing that can happen is that a random error occurs, or a small burst that is wholly within one symbol. If the four equations all agree, the error can be corrected with a low probability of miscorrection.

The absolute worst thing that can happen is that there are non-zero syndromes and the four equations can’t agree on a single error being the cause. This could be due to anything from the signal being replaced by random noise or corrupted by some gross defect in the medium. In this case the data block is not meaningful and instead zeros are written into the de-interleave memory, along with flags attached to each byte indicating the data are in error.

Once the de-interleave block is full, the outer code takes a look at the columns. In the absence of the inner code, the outer code would be completely reliant on interleave to break up burst errors. The outer code would need two equations to locate and correct any symbol in error.

However, the inner code has placed erasure flags in the de-interleave memory and with the location of the faulty symbol known, only one equation is needed to correct it. The immediate result is that the inner code has made the outer code twice as powerful.

There is, however, more to it than that. Random errors are due to noise whereas burst errors are due to media defects. The two mechanisms are independent. With only an outer code a random error of only one bit would use up the correcting power that could correct a whole symbol. The inner code corrects random errors and prevents them from diminishing the power of the outer code.

The product code is correctly described as a synergistic process, where the performance is more than the sum of the parts.

The outer code looks at the flags. If there are more flags than outer equations, correction is not possible and in audio or video some form of concealment will have to be used. Otherwise the flagged symbols are all corrected.

In the same way that opinion polls give better results when the sample size is increased, the reliability of error correction systems increases with the size of the interleave block as there is more chance the error statistics are typical. The direct result of large interleave blocks is that there is a delay when coding and a further delay when decoding.

You might also like...

Navigating Streaming Networks For Live Sports: Broadcaster OTT & Streaming Delivery Networks

With the ongoing growth of OTT content consumption, and the drive from live sports broadcasters to provide high-scale and high-quality Direct to Consumer OTT services, Streamers and their customers now demand streaming services that operate at the scale and quality…

Designing IP Broadcast Systems - The Book

Designing IP Broadcast Systems is another massive body of research driven work - with over 27,000 words in 18 articles, in a free 84 page eBook. It provides extensive insight into the technology and engineering methodology required to create practical IP based broadcast…

The Big Guide To OTT - The Book

The Big Guide To OTT ‘The Book’ provides deep insights into the technology that is enabling a new media industry. The Book is a huge collection of technical reference content. It contains 31 articles (216 pages… 64,000 words!) that exhaustively explore the technology and…

An Introduction To Network Observability

The more complex and intricate IP networks and cloud infrastructures become, the greater the potential for unwelcome dynamics in the system, and the greater the need for rich, reliable, real-time data about performance and error rates.

Next-Gen 5G Contribution: Part 2 - MEC & The Disruptive Potential Of 5G

The migration of the core network functionality of 5G to virtualized or cloud-native infrastructure opens up new capabilities like MEC which have the potential to disrupt current approaches to remote production contribution networks.