Data Recording and Transmission: Part 21 - Reed-Solomon Error Correcting Codes

The explosion in digital technology that led to Compact Discs, DVD, personal computers, digital cameras, the Internet and digital television broadcasting relies heavily on a small number of enabling technologies, one of which is the use of Reed-Solomon error correcting codes.

The Reed-Solomon codes are fundamentally burst-correcting codes and have the potential to correct more than one burst in a code word. In order to work at high speed, they adopt modulo-2 arithmetic, needing no borrows or carries. This was introduced earlier in the series as the basis for the Galois Field, which is a finite range of numbers whose size is determined by the number of bits in the symbol.

It is important to appreciate that the data quantum the Reed-Solomon codes deal with and the size of the burst that can be corrected, is equal in size to the number of bits in the Galois field symbol, in many cases eight bits. Binary data to be protected are actually treated as symbols in a Galois Field.

The operation of a Reed-Solomon scheme can be followed here in some worked examples that use three-bit symbols and short code words for simplicity. First of all, it is necessary to recap what a Galois Field looks like and this is shown in Fig.1. By defining a starting combination in the cyclic field, every combination of three data bits can be considered to be the starting combination raised to some power.

Like any other error-correcting scheme, the Reed-Solomon codes add redundancy to the data. The redundant symbols are also symbols in a Galois Field. The minimum number of redundant symbols is two. On writing the data, the value of these two symbols is calculated mathematically in two different ways.

On reading the code word, two calculations are performed. One of these is a simple parity check whereby all of the symbols are added together modulo-2. The encoding is designed to make this sum zero. In the event of an error that is confined to a single symbol, the sum will not be zero and the bit pattern in the syndrome will be the bit pattern of the error.

Fig.1 - The twisted ring counter steps through a Galois Field each time it is clocked. The bit patterns of each power of a are shown.

The second calculation is another modulo-2 sum, except that before summing, each symbol is raised to a different power. The power is given by the position in the code word. The encoding again is designed so that the sum of these products should be zero. If it is not, but instead there is a single symbol in error, the syndrome represents the bit pattern of the error raised to the power of its position.

Fig.2 shows the example of an error-free read. The calculation of S₀ is simple parity. The calculation of S₁ is more complicated because the symbols need to be raised to different powers. For example, symbol B needs to be multiplied by a⁶. It has a value of 100, which will be seen in Fig.1 to correspond to a². We can multiply a² by a⁶ simply by adding the powers, which gives a⁸, but as we work in a finite field, the answer must be a^{8 Mod.7} which is a, or 010 from Fig.1.

Fig.2 - An example of an error free read check, in which both syndromes are all zero.

Alternatively, if the value of symbol B, 100, is loaded into the circuit of Fig. 1 and clocked six times, then the result again is 010. The process can be repeated for all of the symbols and the products are added to obtain S₁.

Fig.3 shows an example of a correction. The three-bit symbols allow a code word of seven symbols. Two of these, P and Q, are redundancy and five of them, A - E, are data. Symbol C' has deliberately been corrupted. It will be seen in Fig.3 that the read checking gives non-zero syndromes. S₀ is the error and S₁ is the error raised to an unknown power. That power can be found by dividing S₁ by S₀, and the result is a⁵, which is the power to which symbol C was raised in the checking sum.

If C' and S₀ are added, the corrected value of 010 for symbol C will be obtained and confirmed with reference to Fig.2.

The key point to take from this example is that seemingly complex math can be performed for us by deceptively simple circuitry. Fig 4. shows a circuit that calculates S₀ from successive symbols. Fig.5 shows how S₁ can be calculated. This circuit is a variation on that of Fig.1 such that there is a Galois Field multiplication by a in the feedback loop such that successive inputs are raised to higher and higher powers.

Fig.3 - A read check, which discovers an error, then locates and corrects it by dividing the syndromes.

These simple circuits are not only economical, but can also work at extremely high speed. It is all very well having a mathematically elegant correction system, but if it can't work in real time on digital video it's not useful.

The examples of Figs. 2 or 3 can be emulated using Figs. 4 and 5. It's a good alternative to sudoku during lockdown.

It should be clear that if there are errors in more than one symbol in the above examples, the system is taken outside its capability. One or both of the syndromes will still be non-zero, but if an assumption is made that there has been a single error the result will be a mis-correction.

If there is a single three-bit error, which corrupts all three bits in one symbol, that is correctable, but a three-bit error shifted one place could corrupt one bit in one symbol and two bits in the next, which is uncorrectable. One solution to that problem is to interleave the symbols. As will be seen, the Reed-Solomon codes are commonly used with interleaves to make them more reliable.

Fig.4 - The simple parity check used to calculate S0.

Essentially the error correction system has caused the error to change two simultaneous equations. When these equations are solved, one solution is the location of the error and the other is the error itself. Reed-Solomon codes are not restricted to one pair of equations. Any number of pairs can be used and each pair can locate and correct one symbol.

Using two pairs of equations, if there appears to be a single error both pairs should agree on the error and the location. If they don't there is a risk of mis-correction. In that case the coding could simply flag the block as uncorrectable and rely on some other means to correct it.

Multiple pairs of equations would not be useful in the simple examples shown here, but in many applications the symbol size is eight bits, allowing a maximum code word length of 255 bytes, providing enough space for a considerable amount of redundancy. The code word can, of course, be punctured to create a smaller data block of convenient size.

Fig.5 - The calculation of S1 has a multiplication by a in the feedback so successive symbols are multiplied by higher and higher powers of a.

Considering any pair of Reed-Solomon equations, they can be used in three ways. One is to produce a reliable error detection mechanism where failure to obtain two zero syndromes is simply used to flag the existence of an error to some other correcting mechanism and no effort is made to solve the equations. Another is to solve the two equations in order to locate and correct a single symbol in error. There is, however, a third possibility which is extremely powerful and that is to solve the two equations to correct two symbols if their location is known by outside means.

Whilst it would be perfectly possible to calculate the error symbols from the syndromes in the usual way, there is a simpler method that can be used if the location of the erroneous symbols is known. In that case the value of the erroneous symbols is ignored and they are simply set to zero. Fig. 6 shows that when this is done, the two syndromes form two different expressions containing both errors. S₀ is the sum Mod. 2 of the two symbols and S1 is the sum Mod.2 of the symbols that have been raised to two different powers according to their location.

Fig.6 - An example of correction by erasure is shown, in which the location of two error symbols is already known.

When the expressions for S₀ and S₁ are solved, the solutions are not the corrections, but actually the correct data. That is because the data were set to zero, so adding zero to the correction doesn't change it. The act of setting the erroneous symbols to zero is known as erasure and the act of telling a Reed-Solomon code the location of errors by external means is called correction by erasure.

There is rather more to an error correction system that simply the codes employed. The codes must be combined with a rationally designed interleave structure in order to produce a correction system that is robust against the statistics of errors found in the real channel.

The code word of the R-S examples above has been defined in terms of what happens during readout, but for writing we need to know how to encode the data. That will have to wait until next time.

You might also like...

Navigating Streaming Networks For Live Sports: Broadcaster OTT & Streaming Delivery Networks

With the ongoing growth of OTT content consumption, and the drive from live sports broadcasters to provide high-scale and high-quality Direct to Consumer OTT services, Streamers and their customers now demand streaming services that operate at the scale and quality…

Designing IP Broadcast Systems - The Book

Designing IP Broadcast Systems is another massive body of research driven work - with over 27,000 words in 18 articles, in a free 84 page eBook. It provides extensive insight into the technology and engineering methodology required to create practical IP based broadcast…

The Big Guide To OTT - The Book

The Big Guide To OTT ‘The Book’ provides deep insights into the technology that is enabling a new media industry. The Book is a huge collection of technical reference content. It contains 31 articles (216 pages… 64,000 words!) that exhaustively explore the technology and…

An Introduction To Network Observability

The more complex and intricate IP networks and cloud infrastructures become, the greater the potential for unwelcome dynamics in the system, and the greater the need for rich, reliable, real-time data about performance and error rates.

Next-Gen 5G Contribution: Part 2 - MEC & The Disruptive Potential Of 5G

The migration of the core network functionality of 5G to virtualized or cloud-native infrastructure opens up new capabilities like MEC which have the potential to disrupt current approaches to remote production contribution networks.