rounding with floating point numbers
banker's rounding (round to even)
Banker's rounding may seem unintuitive at first, but it is not difficult. It rounds deterministically, according to the following rules:
- fraction > 0.5 -> round up
- fraction < 0.5 -> round down
- fraction = 0.5 -> round up or down to the nearest even number
Let's try this with a few decimal numbers:
| decimal number | rounded | direction | reason |
|---|---|---|---|
| 3.4 | 3 | down | fraction < 0.5 |
| 3.6 | 4 | up | fraction > 0.5 |
| 3.5 | 4 | up | fraction = 0.5, 4 is even |
| 6.5 | 6 | down | fraction = 0.5, 6 is even |
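The three rules translate directly into code. Below is a minimal Python sketch, assuming exact inputs: it uses `fractions.Fraction` so that a decimal like 3.4 is represented exactly rather than as a nearby binary float. `round_half_even` is a hypothetical helper name. Python 3's built-in `round()` also rounds exact halves to even, so the two columns it prints should agree with each other and with the table.

```python
import math
from fractions import Fraction


def round_half_even(x: Fraction) -> int:
    """Round to the nearest integer; exact halves go to the even neighbour."""
    lower = math.floor(x)        # integer part (towards -infinity)
    fraction = x - lower         # always in [0, 1)
    if fraction > Fraction(1, 2):
        return lower + 1         # fraction > 0.5 -> round up
    if fraction < Fraction(1, 2):
        return lower             # fraction < 0.5 -> round down
    # fraction == 0.5 -> pick whichever neighbour is even
    return lower if lower % 2 == 0 else lower + 1


for text in ["3.4", "3.6", "3.5", "6.5"]:
    exact = Fraction(text)       # parse the decimal string exactly
    print(text, round_half_even(exact), round(float(text)))
```

Using `math.floor` for the integer part also makes negative ties behave: -2.5 has integer part -3 (odd), so it rounds to -2.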
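The problem with always rounding fractions of 0.5 up
If every exact half is rounded up, the rounding errors of these cases all point in the same direction, so they accumulate when many rounded values are summed. Rounding to even sends roughly half of these cases up and half of them down, so on average the errors cancel out.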
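To make the bias concrete, here is a small sketch with Python's standard `decimal` module, which provides both `ROUND_HALF_UP` and `ROUND_HALF_EVEN`. It rounds the ten ties 0.5, 1.5, ..., 9.5 to integers and compares the sums. The choice of values is only an illustration.

```python
from decimal import ROUND_HALF_EVEN, ROUND_HALF_UP, Decimal

# The ten values 0.5, 1.5, ..., 9.5: every one of them is an exact tie.
values = [Decimal(n) + Decimal("0.5") for n in range(10)]

exact = sum(values)
# quantize(Decimal("1")) rounds to an integer using the given rounding mode.
half_up = sum(v.quantize(Decimal("1"), rounding=ROUND_HALF_UP) for v in values)
half_even = sum(v.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN) for v in values)

print(exact, half_up, half_even)  # 50.0 55 50
```

Rounding half up overshoots the exact sum by 0.5 per value, while rounding to even alternates (0.5 -> 0, 1.5 -> 2, 2.5 -> 2, ...) and lands exactly on the true total here.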
(short) introduction to binary numbers
Floating point numbers consist of a sign (+/-), an exponent (how large is the number?) and a mantissa, which represents the significant digits of the number.
The number is then calculated in the following way:
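A simplified version of the formula, assuming a sign bit $s$, an exponent $e$ and a mantissa $m$ read as a binary number with the point after its first digit (for example $1.101_2 = 1.625$), and ignoring the complications listed below:

$$
x = (-1)^{s} \cdot m \cdot 2^{e}
$$

For example, $-6.5 = (-1)^{1} \cdot 1.101_2 \cdot 2^{2}$.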
In reality, the format is a bit more complicated than this formula: the exponent is stored with a bias, and there are subnormal numbers (numbers very close to 0) and special values (±infinity and NaN).
For a more detailed explanation of floating point numbers, the Wikipedia article describes the format quite well.
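To see the three fields and the bias in an actual number, here is a minimal Python sketch using the standard `struct` module. It assumes Python floats are IEEE 754 double precision values (1 sign bit, 11 exponent bits with a bias of 1023, 52 stored mantissa bits with an implicit leading 1), which is the case on all common platforms. `decompose` is a hypothetical helper name, and the reconstruction below only covers normal numbers, not subnormals or special values.

```python
import struct


def decompose(x: float) -> tuple[int, int, int]:
    """Split a double into its raw sign, exponent and mantissa fields."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)    # 52 bits, the leading 1 is implicit
    return sign, exponent, mantissa


s, e, m = decompose(-6.5)
print(s, e, m)  # 1 1025 2814749767106560

# Rebuild the value of a normal number: add the implicit 1, remove the bias.
value = (-1) ** s * (1 + m / 2**52) * 2 ** (e - 1023)
print(value)  # -6.5
```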
Floating point numbers are discrete, and if more significant bits are required than the mantissa can hold, rounding has to occur.
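Rounding to even is easy to observe with Python floats (again assuming IEEE 754 doubles, which have 53 significant bits). Above $2^{53}$ the gap between neighbouring doubles is 2, so odd integers cannot be stored and converting them from `int` to `float` forces a tie:

```python
big = 2**53  # 9007199254740992; from here on, doubles are spaced 2 apart

# 2**53 + 1 lies exactly halfway between 2**53 and 2**53 + 2.
# The mantissa of 2**53 ends in 0 (even), so the tie is rounded down.
down = float(big + 1)

# 2**53 + 3 lies exactly halfway between 2**53 + 2 and 2**53 + 4.
# The mantissa of 2**53 + 2 ends in 1 (odd), so the tie is rounded up.
up = float(big + 3)

print(down == big, up == big + 4)  # True True
```

Both inputs are exact ties, yet one is rounded down and the other up, which is exactly the round-to-even rule.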
rounding to even with binary numbers
For binary floating point numbers, specifically IEEE 754 floating point numbers, round to even ("round to nearest, ties to even") is the default rounding mode. It works in the following way:
Assume after an operation (multiplication/addition/conversion) you have a number that no longer fits into the mantissa:
number: 01 0110 1011 (10 bits)
mantissa: 00 0000 (6 bits)
To round, we will look at three bits:
- G guard bit: the last bit that fits in the mantissa (LSB)
- R round bit: first discarded bit
- S sticky bit: bitwise OR of all remaining discarded bits
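The three definitions can be sketched in Python, assuming the number is given as a string of '0'/'1' characters that is longer than the mantissa. This string representation and the name `guard_round_sticky` are illustrative choices, not part of IEEE 754.

```python
def guard_round_sticky(bits: str, mantissa_len: int) -> tuple[int, int, int]:
    """Return (G, R, S) for rounding `bits` to `mantissa_len` bits.

    Assumes len(bits) > mantissa_len.
    """
    guard = int(bits[mantissa_len - 1])             # last bit that fits (LSB)
    round_bit = int(bits[mantissa_len])             # first discarded bit
    sticky = int("1" in bits[mantissa_len + 1:])    # OR of all remaining bits
    return guard, round_bit, sticky


# The 10-bit number from above, rounded to a 6-bit mantissa.
print(guard_round_sticky("0101101011", 6))  # (0, 1, 1)
```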
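number:                01 0110 1011 (10 bits)
prepared for rounding: 01 0110 11
G (guard bit):  0, the last bit of the mantissa 01 0110
R (round bit):  1, the first discarded bit
S (sticky bit): 1, the OR of the remaining discarded bits 011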
To determine how to round, we can read the last two bits (R and S) as a tiny binary fraction:
last 2 bits: 11 (R, S)
float number: 0.11 (0.RS)
in decimal: 0.75
It becomes immediately clear what to do:
The fraction is 0.75 > 0.5, so we round up:
number: 01 0110 1011
rounding: 01 0110 11 (last two bits are the round and sticky bit)
new mantissa: 01 0111 (rounded up)
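Seen this way, rounding is quite intuitive. The guard bit only matters when the fraction is exactly 0.5: it tells us whether the truncated mantissa is already even (G = 0, round down) or odd (G = 1, round up to make it even).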
| G | R | S | fraction (0.RS, binary) | Decimal | Action |
|---|---|---|---|---|---|
| x | 0 | 0 | 0.00 | 0.0 | do nothing |
| x | 0 | 1 | 0.01 | 0.25 | round down (truncate) |
| 0 | 1 | 0 | 0.10 | 0.5 | round down (already even) |
| 1 | 1 | 0 | 0.10 | 0.5 | round up (to make even) |
| x | 1 | 1 | 0.11 | 0.75 | round up |

(x: the bit does not affect the decision)
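The table collapses into a single condition: round up if R is set and either S or G is set. A minimal Python sketch of the whole procedure, again on a '0'/'1' string; `round_to_even` is a hypothetical name, and the case where rounding up overflows the mantissa (which in a real float would bump the exponent) is deliberately not handled.

```python
def round_to_even(bits: str, mantissa_len: int) -> str:
    """Round a bit string to `mantissa_len` bits, ties to even.

    Overflow (e.g. 111111 rounding up to 1000000) is not handled:
    the result would simply be one bit longer.
    """
    kept = bits[:mantissa_len]
    guard = kept[-1] == "1"
    round_bit = len(bits) > mantissa_len and bits[mantissa_len] == "1"
    sticky = "1" in bits[mantissa_len + 1:]

    # Table rows that round up: fraction 0.11 (R and S),
    # or fraction 0.10 with an odd truncated mantissa (R and G).
    round_up = round_bit and (sticky or guard)

    value = int(kept, 2) + (1 if round_up else 0)
    return format(value, f"0{mantissa_len}b")


print(round_to_even("0101101011", 6))  # 010111
```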
Examples
round up to even:
number: 01 1101 1000 -> guard: 1, round: 1, sticky: 0
truncated: 01 1101
fraction: 0.10 -> 0.5, round up (guard bit 1 -> number odd)
new mantissa: 01 1110
round down to even:
number: 10 1100 1000 -> guard: 0, round: 1, sticky: 0
truncated: 10 1100
fraction: 0.10 -> 0.5, round down (guard bit 0 -> number even)
new mantissa: 10 1100
round up:
number: 01 1001 1001 -> guard: 1, round: 1, sticky: 1
truncated: 01 1001
fraction: 0.11 -> 0.75, round up
new mantissa: 01 1010
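As a cross-check, the three examples can also be reproduced without G/R/S bits: treat the number as an integer, shift the discarded bits away, and compare them directly with the bit pattern that represents exactly one half. This is an equivalent integer formulation, not the bit-level procedure described above, and `round_half_even_shift` is a hypothetical name.

```python
def round_half_even_shift(value: int, drop: int) -> int:
    """Drop the lowest `drop` bits (drop >= 1) of a non-negative integer, ties to even."""
    kept = value >> drop
    remainder = value & ((1 << drop) - 1)   # the discarded bits
    half = 1 << (drop - 1)                  # discarded bits worth exactly 0.5
    if remainder > half or (remainder == half and kept & 1):
        kept += 1                           # above half, or a tie with an odd result
    return kept


examples = [
    (0b0111011000, 0b011110),  # round up to even
    (0b1011001000, 0b101100),  # round down to even
    (0b0110011001, 0b011010),  # round up
]
for number, expected in examples:
    result = round_half_even_shift(number, 4)
    print(f"{number:010b} -> {result:06b}")
```

The G/R/S bits carry exactly the information this comparison needs: R tells whether the remainder is at least one half, and S whether anything lies beyond it.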