rounding with floating point numbers

banker's rounding (round to even)

Banker's rounding may seem unintuitive at first, but it is not difficult. It rounds deterministically, according to the following rules:

fraction > 0.5 -> round up
fraction < 0.5 -> round down
fraction $\equiv$ 0.5 -> round up/down to the nearest even number

Let's try this with a few decimal numbers:

decimal number	rounded	direction	reason
3.4	3	down	fraction < 0.5
3.6	4	up	fraction > 0.5
3.5	4	up	fraction $\equiv$ 0.5, 4 even, 3 not even
6.5	6	down	fraction $\equiv$ 0.5, 6 nearest even number

The problem with always rounding fractions $\geq 0.5$ up (like you probably learnt in school) is that for 9 numbers (1.1, 1.2, ..., 1.9) we would round 5 numbers up and 4 down. This introduces a bias, which becomes visible in large data sets. With round to even, ties go up about half of the time and down the other half, so the bias cancels out on average.
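
The rules and the bias argument above can be checked with a few lines of Python. This is only a minimal sketch: the helper names `round_half_up` and `round_half_even` are made up for this illustration, and the test data (1000 exact ties of the form k + 0.5) is an assumption chosen so that every input is exactly representable as a float.

```python
import math

def round_half_up(x: float) -> int:
    """School rounding: fractions >= 0.5 always go up."""
    lower = math.floor(x)
    return lower + 1 if x - lower >= 0.5 else lower

def round_half_even(x: float) -> int:
    """Banker's rounding, following the three rules listed above."""
    lower = math.floor(x)
    fraction = x - lower
    if fraction > 0.5:
        return lower + 1
    if fraction < 0.5:
        return lower
    # Exact tie: pick whichever neighbour is even.
    return lower if lower % 2 == 0 else lower + 1

# The four rows of the table above.
table = {x: round_half_even(x) for x in (3.4, 3.6, 3.5, 6.5)}

# Accumulated error over many ties (0.5, 1.5, ..., 999.5).
ties = [k + 0.5 for k in range(1000)]
true_sum = sum(ties)
bias_half_up = sum(round_half_up(t) for t in ties) - true_sum
bias_half_even = sum(round_half_even(t) for t in ties) - true_sum
```

With these inputs, the half-up sum drifts away from the true sum by 0.5 per tie, while the half-even sum does not drift at all. Python's built-in `round()` applies the same tie rule, so `round(6.5)` gives 6.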

(short) introduction to binary numbers

A floating point number consists of a sign (+/-), an exponent (how large is the number?) and a mantissa, which represents the significant digits of the number.

The number is then calculated in the following way:

$\text{sign} \cdot 1.[\text{mantissa}] \cdot 2^{\text{exponent}}$

#todo explain normalized and denormalized

In reality, the format is a bit more complicated than the formula suggests: the exponent is stored with a bias, there are subnormals (numbers close to 0), and there are special values ( $-\infty$ , NaN, $+\infty$ ).
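
To make the three fields concrete, here is a small sketch that takes a number apart. It assumes the IEEE 754 single precision layout (1 sign bit, 8 exponent bits stored with a bias of 127, 23 mantissa bits) and only handles normal numbers, not subnormals or the special values. The function name `decode_float32` is made up for this example.

```python
import struct

def decode_float32(x: float):
    """Split x (stored as IEEE 754 single precision) into its three fields."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = -1 if bits >> 31 else 1
    stored_exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias
    mantissa = bits & 0x7FFFFF              # 23 bits after the implicit "1."
    return sign, stored_exponent - 127, mantissa

sign, exponent, mantissa = decode_float32(6.5)

# Rebuild the value with the formula: sign * 1.[mantissa] * 2^exponent
value = sign * (1 + mantissa / 2**23) * 2.0**exponent
```

For 6.5 the mantissa bits start with `101` because 6.5 is $1.101_{2} \cdot 2^{2}$.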

For a more detailed explanation of floating point numbers, the Wikipedia article on the topic describes the format quite well.

Floating point numbers are discrete, and if more significant bits are required than the mantissa can hold, rounding has to occur.
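
A quick way to watch this happen, sketched in Python under the assumption that `float` is an IEEE 754 double (53 significant bits including the implicit leading 1, which is the case on common platforms): integers just above $2^{53}$ need one bit more than the mantissa can hold.

```python
# 2**53 + 1 needs 54 significant bits, one more than a double can hold.
needs_54_bits = 2**53 + 1

as_float = float(needs_54_bits)              # conversion has to round
lost_bit = int(as_float) != needs_54_bits    # True: the value changed

# In this range, neighbouring representable doubles are 2 apart.
spacing = int(float(2**53 + 2)) - int(float(2**53))

# Both of these are exact ties between two neighbours.
tie_low = int(float(2**53 + 1))    # lands on 2**53
tie_high = int(float(2**53 + 3))   # lands on 2**53 + 4
```

Both ties end up on the neighbour whose last mantissa bit is 0, one by going down and one by going up.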

rounding to even with binary numbers

With binary floating point numbers, specifically IEEE 754 floating point numbers, we round in the following way:

Assume after an operation (multiplication/addition/conversion) you have a number that no longer fits into the mantissa:

number:   01 0110 1011 (10 bits)
mantissa: 00 0000      (6 bits)

To round, we will look at three bits:

G guard bit: the last bit that fits in the mantissa (LSB)
R round bit: first discarded bit
S sticky bit: bitwise OR of all remaining discarded bits

$S = \begin{cases} 1 & \text{if any remaining bit is 1} \\ 0 & \text{otherwise} \end{cases}$

number:                01 0110 1011 (10 bits)
                               (R) round bit
                               ↓
prepared for rounding: 01 0110 11
                             ↑  ↑
                 (G) guard bit  (S) sticky bit
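
The three bits can be pulled out of the number with shifts and masks. The following is a sketch only: `grs_bits` and its parameter `discard` (how many low bits do not fit into the mantissa) are names invented for this example.

```python
def grs_bits(number: int, discard: int):
    """Return (guard, round, sticky) when the lowest `discard` bits are dropped.

    Assumes discard >= 1.
    """
    guard = (number >> discard) & 1             # last bit that is kept (LSB)
    round_bit = (number >> (discard - 1)) & 1   # first discarded bit
    remaining = number & ((1 << (discard - 1)) - 1)
    sticky = 1 if remaining != 0 else 0         # OR of all remaining bits
    return guard, round_bit, sticky

# The number from above: 10 bits, of which 6 fit into the mantissa.
g, r, s = grs_bits(0b01_0110_1011, discard=4)
```

For this number the result is G = 0, R = 1, S = 1, matching the diagram.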

To determine how to round, we can imagine that the last two bits (R and S) are the digits of a tiny binary fraction:

last 2 bits:    11   (R, S)
float number: 0.11   (0.RS)
in decimal:   0.75

It becomes immediately clear what to do:
The fraction is $0.75 > 0.5$ , so we will round up:

number:       01 0110 1011
rounding:     01 0110 11 (last two bits are the round and sticky bit)
new mantissa: 01 0111    (rounded up)

Viewed this way, the rule is quite intuitive. The guard bit only matters in a tie (fraction $\equiv 0.5$ ), where it decides whether rounding to even means rounding up or down:

G	R	S	fraction ( $0.RS_{2}$ )	Decimal	Action
x	0	0	$0.00_{2}$	0.0	do nothing
x	0	1	$0.01_{2}$	0.25	round down (truncate)
0	1	0	$0.10_{2}$	0.5	round down (already even)
1	1	0	$0.10_{2}$	0.5	round up (to make even)
x	1	1	$0.11_{2}$	0.75	round up
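
The table translates almost line by line into code. This is a minimal sketch, not a full IEEE 754 implementation: the name `round_to_even` and the parameter `discard` are assumptions of this example, and a carry out of the mantissa (which in a real float would bump the exponent) is not handled.

```python
def round_to_even(number: int, discard: int) -> int:
    """Drop the lowest `discard` bits of `number`, rounding by the G/R/S table.

    Assumes discard >= 1.
    """
    kept = number >> discard                     # truncated mantissa
    guard = kept & 1
    round_bit = (number >> (discard - 1)) & 1
    sticky = 1 if (number & ((1 << (discard - 1)) - 1)) != 0 else 0

    if round_bit == 0:
        return kept        # fraction 0.00 or 0.01: do nothing / truncate
    if sticky == 1:
        return kept + 1    # fraction 0.11: round up
    # Tie (fraction 0.10): round up only if the kept part is odd.
    return kept + 1 if guard == 1 else kept

# Every G/R/S combination, with just R and S as the discarded bits.
all_rows = {grs: round_to_even(grs, discard=2) for grs in range(8)}
```

Here `all_rows` maps each 3-bit pattern `GRS` to the rounded result, so `0b110` (odd, tie) becomes 2 while `0b010` (even, tie) stays 0.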

Examples

round up to even:

number:       01 1101 1000    ->    guard: 1, round: 1, sticky: 0
truncated:    01 1101
fraction:     0.10            ->    0.5, round up (guard bit 1 -> number odd)
new mantissa: 01 1110

round down to even:

number:       10 1100 1000    ->    guard: 0, round: 1, sticky: 0
truncated:    10 1100
fraction:     0.10            ->    0.5, round down (guard bit 0 -> number even)
new mantissa: 10 1100

round up:

number:       01 1001 1001    ->    guard: 1, round: 1, sticky: 1
truncated:    01 1001
fraction:     0.11            ->    0.75, round up
new mantissa: 01 1010
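
As a cross-check, the three examples can be reproduced without touching G, R and S at all. The sketch below relies on two facts: dividing these small integers by $2^{4}$ is exact in a Python float, and Python's built-in `round()` rounds ties to even. The dictionary `examples` simply restates the numbers from above.

```python
DISCARDED_BITS = 4  # 10-bit number, 6-bit mantissa

# number -> expected new mantissa, copied from the three examples above
examples = {
    0b01_1101_1000: 0b01_1110,   # tie, odd  -> round up to even
    0b10_1100_1000: 0b10_1100,   # tie, even -> round down to even
    0b01_1001_1001: 0b01_1010,   # fraction 0.75 -> round up
}

# Shifting the binary point by 4 places turns the discarded bits into
# the fractional part, which round() then handles with round-to-even.
results = {n: round(n / 2**DISCARDED_BITS) for n in examples}
```

All three results agree with the mantissas derived by hand.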