# Is Gamma Still Needed?: Part 9 - Processing In Floating Point

Floating-point notation and gamma are both techniques that trade precision for dynamic range, but they differ fundamentally. Gamma is a non-linear function, whereas floating point remains linear. Mathematical manipulations carried out on floating-point encoded data will therefore be correct, to within rounding, whereas the same manipulations carried out directly on gamma-encoded luma cannot be. Gamma was intended to linearize a cathode ray tube, whereas floating-point encoding was designed from the outset for mathematical manipulation.
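
The difference can be illustrated with a simple 50/50 mix of black and white. The sketch below assumes a pure power law with an exponent of 2.2, which is a simplification of real transfer functions; the function names are hypothetical and chosen only for this example.

```python
GAMMA = 2.2  # assumed pure power-law exponent, for illustration only


def encode(linear):
    """Convert linear light (0..1) to a gamma-encoded signal."""
    return linear ** (1.0 / GAMMA)


def decode(signal):
    """Convert a gamma-encoded signal back to linear light."""
    return signal ** GAMMA


def mix_linear(a, b):
    """Average two linear-light values directly."""
    return (a + b) / 2.0


def mix_gamma(a, b):
    """Average the gamma-encoded signals, then decode the result."""
    return decode((encode(a) + encode(b)) / 2.0)


black, white = 0.0, 1.0
print(mix_linear(black, white))  # 0.5, half the light
print(mix_gamma(black, white))   # about 0.218, noticeably too dark
```

Averaging the linear values gives half the light, as expected. Averaging the encoded signals gives a result that decodes to roughly 22 percent, because the arithmetic was performed on a non-linear scale.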

Fig.1 The transfer function of a 16-bit quantizer is shown at a). If a 16-bit number is expressed to 12-bit accuracy, the four least significant bits are lost and the steps b) become sixteen times larger. In floating point this can only happen in the presence of a signal, so the approximation is masked.
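
The way the step size follows the signal level can be checked numerically. The sketch below assumes the IEEE 754 half-precision format (five-bit exponent, ten-bit stored mantissa), which may not be the exact format the figure has in mind; `half_step` is a hypothetical helper that works for positive finite values only.

```python
import struct


def half_step(x):
    """Gap between x (rounded to half precision) and the next larger value."""
    # Pack as a half-precision float, then reinterpret the bits as an integer.
    bits = struct.unpack('<H', struct.pack('<e', x))[0]
    this_value = struct.unpack('<e', struct.pack('<H', bits))[0]
    # For positive finite values, adding 1 to the bit pattern gives the next value up.
    next_value = struct.unpack('<e', struct.pack('<H', bits + 1))[0]
    return next_value - this_value


for level in (0.001, 0.01, 0.1, 1.0):
    step = half_step(level)
    print(f"level {level:<6} step {step:.3e}  step/level {step / level:.2e}")
```

The absolute step grows with the level, while the step relative to the level stays within a factor of two. Large steps therefore only occur alongside large signal values.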

Fig.2 A five-bit exponent has 32 combinations. The ends of the scale denote zero and infinity, leaving fourteen negative values, zero and fifteen positive values for the exponent.
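
The count of usable exponents can be verified with a few lines of code. This sketch assumes the IEEE 754 half-precision convention of a bias of 15, in which the all-zeros field is reserved for zero (and subnormals) and the all-ones field for infinity (and NaN); `classify_exponent` is a hypothetical name.

```python
BIAS = 15  # half-precision exponent bias (assumed format)


def classify_exponent(field):
    """Interpret a 5-bit exponent field in the range 0..31."""
    if field == 0:
        return "zero/subnormal"   # reserved: bottom of the scale
    if field == 31:
        return "infinity/NaN"     # reserved: top of the scale
    return field - BIAS           # ordinary exponent


values = [classify_exponent(f) for f in range(32)]
usable = [v for v in values if isinstance(v, int)]
negative = sum(1 for e in usable if e < 0)
positive = sum(1 for e in usable if e > 0)
print(negative, usable.count(0), positive)  # 14 1 15
```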

Fig.3 Addition first requires the exponents to be the same. Then the mantissae are added and the result is re-normalized.
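
The three steps of addition can be sketched as follows. This is a simplified model, not a full implementation: it handles positive numbers only, represents each value as a hypothetical `(mantissa, exponent)` pair meaning `mantissa * 2**exponent`, assumes an 11-bit mantissa, and truncates rather than rounds.

```python
PRECISION = 11  # mantissa bits including the leading 1 (assumed)


def normalize(mantissa, exponent):
    """Shift until the mantissa occupies exactly PRECISION bits."""
    if mantissa == 0:
        return 0, 0
    while mantissa >= 1 << PRECISION:
        mantissa >>= 1          # low bit is discarded: no rounding in this sketch
        exponent += 1
    while mantissa < 1 << (PRECISION - 1):
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent


def fp_add(a, b):
    """Add two positive values given as (mantissa, exponent) pairs."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:
        # Make a the operand with the larger exponent.
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb >>= (ea - eb)            # step 1: make the exponents the same
    total = ma + mb             # step 2: add the mantissae
    return normalize(total, ea)  # step 3: re-normalize


one_and_a_half = (1536, -10)    # 1536 * 2**-10 = 1.5
three_quarters = (1536, -11)    # 1536 * 2**-11 = 0.75
m, e = fp_add(one_and_a_half, three_quarters)
print(m, e, m * 2.0 ** e)       # 1152 -9 2.25
```

The right shift in step 1 is where bits of the smaller operand can be lost, which is why real hardware keeps a few extra bits during the shift.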

Fig.4 When a mantissa is shifted right, the three bits shifted off the end are considered. These are the guard, round and sticky bits, which are used to optimize the normalization.
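
One common use of these bits is round-to-nearest-even, sketched below. The rounding mode is an assumption here, since other modes use the same three bits differently, and the function names are hypothetical. The guard bit is the first bit shifted out, the round bit is the second, and the sticky bit is the logical OR of everything beyond that.

```python
def shift_right_grs(mantissa, shift):
    """Shift right and return (kept, guard, round_bit, sticky)."""
    kept = mantissa >> shift
    lost = mantissa & ((1 << shift) - 1)          # the bits shifted off the end
    guard = (lost >> (shift - 1)) & 1 if shift >= 1 else 0
    round_bit = (lost >> (shift - 2)) & 1 if shift >= 2 else 0
    # Sticky is 1 if any bit below the round bit was set.
    sticky = 1 if lost & ((1 << max(shift - 2, 0)) - 1) else 0
    return kept, guard, round_bit, sticky


def round_nearest_even(kept, guard, round_bit, sticky):
    """Round up when above halfway, or exactly halfway with an odd result."""
    if guard and (round_bit or sticky or (kept & 1)):
        kept += 1   # may carry into an extra bit; the caller must re-normalize
    return kept


for mantissa in (0b10110110, 0b10111000, 0b10101000, 0b10101001):
    grs = shift_right_grs(mantissa, 4)
    print(f"{mantissa / 16:7.4f} -> {round_nearest_even(*grs)}")
```

The four test values are 11.375, 11.5, 10.5 and 10.5625 when read with four fractional bits, and they round to 11, 12, 10 and 11 respectively. The two exact ties go to the even neighbor, which avoids a systematic upward bias.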

