Data Recording and Transmission: Part 24 - Message Integrity

Once upon a time, the cause of data corruption would be accidental. A dropout on a tape or interference picked up on a cable would damage a few bits. Error correction was designed to deal with that.

But what if the data were important and it would benefit some dishonest or malicious person if it were changed? If just the data were changed, the error checking at the receiver would obviously fail, so the recipient would know something had happened to the data.

The problem with error correction is that it's not intended to detect dishonesty. In order to use error correction to transmit data, the sender and the recipient must agree on the error checking standard to be used, and that standard will be in the public domain. The genuine message could be intercepted and tampered with, and new error correcting check bits could be calculated according to the standard, so that the message would be declared error free at the destination. An error free message results in error checking syndromes that are all zero, irrespective of the message content.
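The weakness can be shown in a few lines of Python. This is only a sketch: CRC-32 from the standard zlib module stands in for whatever public check code a real system would use, and the frame layout and the messages are invented for the illustration.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload, as a sender would."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(framed: bytes) -> bool:
    """Receiver's check: recompute the CRC and compare it with the one received."""
    payload, received = framed[:-4], framed[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

genuine = frame(b"PAY ALICE 100")

# Accidental damage: a single bit flipped in transit is detected.
damaged = bytes([genuine[0] ^ 0x01]) + genuine[1:]

# Deliberate tampering: the attacker changes the payload and simply
# recomputes the check bits using the same public algorithm.
forged = frame(b"PAY MALLORY 900")

print(check(genuine), check(damaged), check(forged))
```

The forged frame passes the check just as the genuine one does, because nothing in the check depends on who computed it.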

A different type of redundancy and a different systems approach is required to combat tampering. As with error correction, a redundant check symbol variously called a message digest, a cryptographic hash, or just a hash, is calculated according to some algorithm. The recipient runs the same algorithm to determine if the same check symbol results. Whereas the error correction strategy is based on the error statistics of the channel, the cryptographic hash cannot make any assumptions about what might have been changed.

If the hash is sent with the message, it could be intercepted, the hash algorithm identified, the data changed and a new hash calculated. Either the recipient needs to obtain the hash via a different route, perhaps by accessing the sender's web site, or the hash needs to be encrypted, which means that sender and recipient must share a secret key.
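One common way of binding a shared secret into the check symbol is a keyed hash such as HMAC, which is a slightly different technique from encrypting a plain hash but serves the same purpose here. The sketch below uses Python's standard hmac module; the key and the messages are made up for the example.

```python
import hashlib
import hmac

# Hypothetical secret known only to the sender and the recipient.
SHARED_KEY = b"example-shared-secret"

def make_tag(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute a keyed hash (HMAC-SHA256) of the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, received_tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(message, key), received_tag)

message = b"PAY ALICE 100"
tag = make_tag(message)

# The genuine message verifies.
print(verify(message, tag))

# An attacker who changes the data but does not know the key cannot
# produce a matching tag, even though the algorithm is public.
forged_tag = make_tag(b"PAY MALLORY 900", key=b"attacker-guess")
print(verify(b"PAY MALLORY 900", forged_tag))
```

The first check succeeds and the second fails, because the tag depends on the key as well as on the data.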

It is also possible to cause disruption if a malicious party intercepts a message without understanding it, but simply repeats the same block endlessly. For that reason a time code may be included in the hashed data block that would allow repeated messages to be rejected.
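A minimal sketch of replay rejection follows, assuming the time code is a simple integer that always increases and that it is covered by a keyed hash so it cannot be altered. The key, the class name and the field sizes are all hypothetical.

```python
import hashlib
import hmac

KEY = b"example-shared-secret"  # hypothetical shared secret

def seal(timestamp: int, payload: bytes) -> tuple[int, bytes, bytes]:
    """Hash the time code together with the payload, so neither can be changed."""
    body = timestamp.to_bytes(8, "big") + payload
    return timestamp, payload, hmac.new(KEY, body, hashlib.sha256).digest()

class Receiver:
    """Accepts a message only if it is authentic and newer than the last one."""

    def __init__(self) -> None:
        self.last_accepted = -1

    def accept(self, timestamp: int, payload: bytes, tag: bytes) -> bool:
        body = timestamp.to_bytes(8, "big") + payload
        expected = hmac.new(KEY, body, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False  # data or time code has been altered
        if timestamp <= self.last_accepted:
            return False  # a repeat of something already seen
        self.last_accepted = timestamp
        return True

rx = Receiver()
first = seal(1000, b"OPEN VALVE")
print(rx.accept(*first))                     # genuine message
print(rx.accept(*first))                     # the same block replayed
print(rx.accept(*seal(1001, b"OPEN VALVE"))) # a fresh message with the same content
```

The replayed block is rejected even though its hash is perfectly valid, whereas a later genuine message carrying the same instruction is accepted.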

Fig.1 - In a block chain, the hash of the previous block is included in the hash calculation of the next block and so on. Any alteration of any data in the chain will affect the latest hash. If multiple copies of the chain exist, it will be obvious which one has been changed.

A good hash function will create a hash that will change dramatically if any alteration is made across the range of possibilities from one bit to every bit in the data block. The chances of two different data blocks producing the same hash, an event known as a collision, must be vanishingly small, which means that two blocks having the same hash can be assumed to be the same. A further requirement to prevent tampering is that it should be substantially impossible to create a data block having a specific hash.
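The dramatic change is easy to observe. The following sketch uses SHA-256 from Python's standard hashlib as an example of a good hash function; the message is invented, and one bit of it is flipped before hashing again.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")

original = b"Transfer 100 units to account 12345"
altered = bytes([original[0] ^ 0x01]) + original[1:]  # flip one input bit

h1 = hashlib.sha256(original).digest()
h2 = hashlib.sha256(altered).digest()
changed = differing_bits(h1, h2)

print(h1.hex())
print(h2.hex())
print(f"{changed} of {len(h1) * 8} hash bits changed")
```

For a good hash, about half of the output bits would be expected to change for a one-bit change in the input, so the two hashes look completely unrelated.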

This requirement is interpreted as it being computationally infeasible to try out different messages until a specific hash emerges. It follows immediately that hashes need to be long, so that the number of different possible hashes is vast. The number of combinations in a 32-byte (256-bit) hash is two raised to the 256th power, a number with 78 decimal digits, which is fantastic. This makes it possible for the hash-calculating algorithm to be made public, because knowing the algorithm doesn't help to create a given hash.
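The arithmetic can be checked directly. In the sketch below, SHA-256 is taken as the example of a 256-bit hash, and the guessing rate of a million million hashes per second is purely an assumption chosen to give a sense of scale.

```python
import hashlib

digest = hashlib.sha256(b"example").digest()
bits = len(digest) * 8        # 256 bits
combinations = 2 ** bits      # number of different possible hashes

# Assumed attacker speed: 10**12 trial hashes per second (illustrative only).
rate = 10 ** 12
seconds_per_year = 365 * 24 * 3600
years_to_try_all = combinations // (rate * seconds_per_year)

print(len(digest), "bytes,", len(digest.hex()), "hex characters,", bits, "bits")
print(f"possible hashes: {combinations:.3e}")
print("decimal digits in years needed:", len(str(years_to_try_all)))
```

A 256-bit hash occupies 32 bytes, or 64 characters when written in hexadecimal, and even at the assumed rate an exhaustive search would take vastly longer than the age of the universe.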

Cryptographic hash based coding therefore makes it obvious if there has been any tampering with the data. There is one exception to that, which is where only one private copy of the data and its hash exists. The holder of the one copy, or someone in his employ, could change the data and calculate a new hash and no-one else would know.

It follows immediately that one solution to render hashed data strong against tampering is to make it as public as possible so that many others could repeat the hash algorithm on the data. There must be a mechanism for the various recipients to compare hashes. If any hash is found to differ from the rest, the relevant block has been corrupted and must be replaced with a block from the recipients whose hashes compare. In order to change a block a wrongdoer would need to have control over more than half of the recipients.
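The comparison mechanism might look something like the sketch below, which assumes every recipient holds a copy of the same block and that a simple majority decides which version is genuine. The function names and the recipient names are hypothetical.

```python
import hashlib
from collections import Counter

def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def find_corrupt(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority hash and the holders whose copies disagree with it."""
    hashes = {holder: digest(block) for holder, block in copies.items()}
    majority_hash, votes = Counter(hashes.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no version is held by more than half of the recipients")
    dissenters = sorted(h for h, d in hashes.items() if d != majority_hash)
    return majority_hash, dissenters

copies = {
    "alice": b"ledger entry 42: 100 units",
    "bob":   b"ledger entry 42: 100 units",
    "carol": b"ledger entry 42: 900 units",  # altered copy
    "dave":  b"ledger entry 42: 100 units",
}
good_hash, bad_holders = find_corrupt(copies)
print(bad_holders)

# Repair: replace each bad copy with one whose hash matches the majority.
good_block = next(b for b in copies.values() if digest(b) == good_hash)
for holder in bad_holders:
    copies[holder] = good_block
```

With four recipients, a wrongdoer would need to alter at least three copies before the altered version became the majority.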

As a hashed block is only trustworthy if the hash is unchanged, it follows that the block is effectively set in stone for all time. The next stage in the tamper proofing is to extend it to allow a strong database to grow as new information arrives. This is done by adding new data blocks having hashes. In a block chain, each new block also contains the hash of the previous block, which forms part of the data used to create the hash of the new block as is shown in Fig.1.

All of the blocks are logically linked so that if one bit in any block is changed, its hash and all subsequent hashes will change. Logically a block chain acts like one block because the hash of the last block is a function of every bit in every block.
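The linking can be sketched in a few lines. The block layout here, a dictionary holding the previous hash, the data and the block's own hash, and the all-zero starting value are assumptions made for the illustration and do not follow any particular real block chain.

```python
import hashlib

GENESIS = "00" * 32  # assumed placeholder for the hash "before" the first block

def block_hash(prev_hash: str, data: bytes) -> str:
    """Hash the previous block's hash together with this block's data."""
    return hashlib.sha256(bytes.fromhex(prev_hash) + data).hexdigest()

def build_chain(records: list[bytes]) -> list[dict]:
    chain, prev = [], GENESIS
    for data in records:
        h = block_hash(prev, data)
        chain.append({"prev": prev, "data": data, "hash": h})
        prev = h
    return chain

def first_bad_block(chain: list[dict]):
    """Return the index of the first block that fails verification, or None."""
    prev = GENESIS
    for i, block in enumerate(chain):
        if block["prev"] != prev or block_hash(block["prev"], block["data"]) != block["hash"]:
            return i
        prev = block["hash"]
    return None

records = [b"block A", b"block B", b"block C"]
chain = build_chain(records)
print(first_bad_block(chain))   # None: the chain verifies

chain[1]["data"] = b"block B (altered)"
print(first_bad_block(chain))   # 1: the stored hash no longer matches

# Recomputing the chain from the altered data changes every later hash too.
rebuilt = build_chain([b"block A", b"block B (altered)", b"block C"])
print(rebuilt[-1]["hash"] != build_chain(records)[-1]["hash"])
```

Because the final hash depends on every earlier block, comparing just that one value between two copies of the chain reveals whether they are identical.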

There has been a lot of hype about block chain technology changing the world and so on. Most of that has come from writers who have no clue how it works or why it needs to be the way it is. In practice the way it needs to be rules it out for a huge number of applications.

Whilst there is no doubt that block chain technology works and allows a large tamper proof database to be set up, it only works if a) the data are immutable for all time and b) the data are made public.

Most real databases cannot use it, as one or both of those requirements may not be acceptable. For example a block chain would not be suitable to maintain a customer database, as if one customer moved house, or changed telephone number, every version of the entire chain after the block containing that entry would have to be re-computed after updating. Nor would it be acceptable for customer contact details to be made public.

Most databases need to be confidential such that the data they contain is available only to select users. On the other hand, in a private block chain, there is diminished protection against tampering. Nevertheless private data needs to be just that, and the greater threat to private data comes from hacking, where outsiders try to gain access. The hacking problem and the tampering problem are quite different and require completely different forms of defense. A public block chain does not need privacy by definition.

Block chain technology is a bit like a hovercraft. It works and it's very clever, but there is an extremely limited market. It's very difficult to think up genuine applications for publicly broadcast immutable data. One possibility is where the database forms a historical record of activities. As what happened in the past cannot change, there is no need to change the record.

Fig.2 - The data block contains a nonce, which is changed until the hash meets a certain requirement, such as a specific number of leading zeros. This is the proof-of-work concept that deters fraudsters.

When the block chain is public and distributed, anyone can create a new block and it therefore has to be made difficult to do so otherwise anyone with a computer could churn out bogus blocks and ruin the system. One solution to the creation of bogus data, which has also been used to combat spam, is the concept of proof of work.

Proof of work is a system whereby significant computing time is needed to generate a data block that is acceptable to the system, so that blocks cannot be mass-produced. One common system relies on the irreversibility of the cryptographic hash. As it is not possible to create a data block having a specific hash, one proof of work idea is that the hash must have certain characteristics that can only be established by repeated experiment.

In Bitcoin, for example, the data block contains a word called a nonce, shown in Fig.2, which exists only to change the cryptographic hash without changing the wanted data. The proof of work consists of finding a nonce that causes the hash to have a certain number of leading zeros. There is no deterministic way of doing that, so a lot of computing time is needed to try out various nonces until a hash having the appropriate number of leading zeros is found.
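The nonce search can be sketched as follows. This is a simplified illustration of the principle and not Bitcoin's actual block format or hashing procedure: it applies a single SHA-256 to the data followed by an 8-byte nonce, and the difficulty is expressed as a number of leading zero bits.

```python
import hashlib
from itertools import count

def meets_target(data: bytes, nonce: int, zero_bits: int) -> bool:
    """Verification needs only one hash: does it start with enough zero bits?"""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - zero_bits))

def mine(data: bytes, zero_bits: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash has the required leading zeros."""
    for nonce in count():
        if meets_target(data, nonce, zero_bits):
            digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).hexdigest()
            return nonce, digest

# 16 leading zero bits is four leading zeros when written in hexadecimal.
nonce, digest_hex = mine(b"example block data", zero_bits=16)
print("nonce tried last:", nonce)
print("hash:", digest_hex)
print("verified:", meets_target(b"example block data", nonce, 16))
```

Finding the nonce takes tens of thousands of attempts on average at this setting, whereas checking it takes a single hash calculation. Each additional zero bit demanded doubles the expected number of attempts.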

As the amount of computation needed increases exponentially with the number of leading zeros, the amount of work needed for proof can be adjusted. In Bitcoin the creator of the first new block that meets the proof of work criterion is rewarded, in bitcoins, of course. That led to the search for the nonce that would satisfy the hash criterion being called mining, as in digging for gold.

The fundamental problem with proof of work is that it is wasteful of resources and energy. The amount of energy used is difficult to assess, but some estimates suggest that Bitcoin uses each year the same amount of energy, with corresponding carbon emissions, as a small country. Perhaps it is fortunate that the number of applications suitable for block chain is so small.
