Range reduction in glibc with reduce_sincos
This post is about the reduce_sincos range reduction function from the glibc math library. You can find the source here, starting on line 151.
Introduction
Any floating point number x can be written as \(q(\pi/2) + r\) where \(q\) is an integer and \(-\pi/4 < r < \pi/4\). In fact, \(q = [(2/\pi)x]\), where the square brackets denote rounding to the nearest integer. This process is called range reduction. The reason to do it is that sin or cos of x equals, up to a sign we can determine from q mod 4, either sin or cos of r, and evaluating a trigonometric function at r is easier since its magnitude is guaranteed to be small.
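As a quick illustration in plain Python doubles (a naive version of the idea; the loss of accuracy at the end is exactly why the careful routine below exists):

import math

x = 100.0
q = round(x * (2 / math.pi))   # q = 64, and 64 mod 4 == 0, so sin(x) = sin(r)
r = x - q * (math.pi / 2)      # r is about -0.531, inside (-pi/4, pi/4)
print(math.sin(x))             # -0.5063656411097588
print(math.sin(r))             # agrees to only ~13 digits: math.pi is just
                               # the nearest double to pi, and the error is
                               # amplified by the factor q = 64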
The reduce_sincos
function does range reduction for moderately sized floating point numbers x
. There’s a different routine for larger ones. Here’s what the comment in the source says:
/* Reduce range of x to within PI/2 with abs (x) < 105414350. The high part
is written to *a, the low part to *da. Range reduction is accurate to 136
bits so that when x is large and *a very close to zero, all 53 bits of *a
are correct. */
One of the inputs to reduce_sincos is a double x with absolute value less than 105,414,350 = 0x1.921fb38000000p+26. The function aims to set the doubles a and da so that a + da approximates the remainder \(r\), and to return \(n\), the value of \(q\) mod 4 (unless x is negative, in which case it is minus this). It doesn’t quite do this in all cases (see the end of this post) but the “error” doesn’t actually matter.
If you’re wondering where the figure 105,414,350 comes from, note that \((2/\pi) \times 105{,}414{,}350 \approx 2^{26}\) (the approximation is very close). Error estimates in the function rely on multiplication by \(\mathtt{xn} = [(2/\pi)x]\) (square brackets again denoting rounding to the nearest integer) being exact, which holds when the other factor is a double the final half of whose fraction bits are 0.
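This kind of exactness claim is easy to test. Here is a sketch using Python's fractions module (which reappears later in this post); the value of mp1 is the constant from the table further down, and the xn chosen here is just an arbitrary 26-bit integer:

from fractions import Fraction
fff = Fraction.from_float

mp1 = float.fromhex("0x1.921FB58p0")   # 26 significant bits, the rest zero
xn = 67108863.0                        # 2**26 - 1, a 26-bit integer
# The double product equals the exact rational product, i.e. no rounding:
assert fff(xn * mp1) == fff(xn) * fff(mp1)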
Outline of how reduce_sincos
works
The parameters are a double x to be reduced and the addresses of two variables a and da which will hold a multiple-precision approximation to the range reduction of x. These are kept distinct because da will be a lot smaller than a, so the floating point addition a + da would lose accuracy. The return value is the remainder mod 4 of the multiple of \(\pi/2\) which will be subtracted from x.
First, reduce_sincos
evaluates xn = [hpinv * x]
where hpinv
is the nearest double to \(2/\pi\) and the square brackets denote rounding to the nearest integer. Then it computes an approximation to the remainder \(x - (\pi/2)\mathtt{xn}\) by doing the subtraction in four steps, equivalent to
(((x - xn * mp1) - xn * mp2) - xn * pp3) - xn * pp4
where mp1
, mp2
, pp3
, and pp4
are doubles forming a multiple-precision approximation to \(\pi/2\). The result of this subtraction is stored in the variable a
. All of the multiplications except the last are exact, as are the first two subtractions, but the final two floating point subtractions may not be. The sum of the two subtraction errors is stored in da
. Finally the function returns the remainder of xn
mod 4.
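Here is the whole outline transcribed into Python as a sketch (my own code, not glibc's; the constants are read off the table in the next section, and the masking of xn's low bits matches the two's complement behaviour described below):

hpinv = float.fromhex("0x1.45F306DC9C883p-1")  # nearest double to 2/pi
toint = float.fromhex("0x1.8p52")              # 1.5 * 2**52
mp1 = float.fromhex("0x1.921FB58p0")
mp2 = -float.fromhex("0x0.0000003BBD2E78p0")
pp3 = -float.fromhex("0x0.00000000000003967673p0")
pp4 = -float.fromhex("0x0.000000000000000000003AE8FE47C65DAEp0")

def reduce_sincos_py(x):
    t = x * hpinv + toint
    xn = t - toint                   # [x * hpinv], rounded to nearest
    n = int(xn) & 3                  # the C code reads this out of t's mantissa
    y = (x - xn * mp1) - xn * mp2    # exact, as shown later
    t1 = xn * pp3
    t2 = y - t1
    da = (y - t2) - t1               # rounding error of the first inexact step
    t1 = xn * pp4
    a = t2 - t1
    da += (t2 - a) - t1              # plus the error of the final subtraction
    return n, a, da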
Details of the reduce_sincos
code
First, the code does a float trick (lines 156-157):
double t = x * hpinv + toint;
double xn = t - toint;
The two constants here are
static const double hpinv = 0x1.45F306DC9C883p-1;
static const double toint = 0x1.8000000000000p52; // 6 755 399 441 055 744
hpinv is the nearest double to \(2/\pi\); it is slightly larger than \(2/\pi\), by about 3.9e-17, and its value is approximately 0.63661977236758138. toint is a special double value with the property that (a + toint) - toint is the double equal to a rounded to the nearest integer, at least when |a| is well below \(2^{51}\) (always the case here). Therefore after the first two lines, \(\mathtt{xn} = [\mathtt{x} * \mathtt{hpinv}]\), where the square brackets denote rounding to the nearest integer (using round-to-even to break ties).
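A quick demonstration of the trick (Python floats are IEEE doubles, and this assumes the default round-to-nearest-even mode):

toint = float.fromhex("0x1.8p52")   # 1.5 * 2**52
for a in (2.3, 2.5, 3.5, -2.3):
    print(a, (a + toint) - toint)   # -> 2.0, 2.0, 4.0, -2.0 (halves go to even)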
Lines 158-160 are as follows. v here is a mynumber, a union of a double and an int[2] which lets us access the two halves of a double as two 32-bit ints v.i[0] and v.i[1]. int4 is a synonym for int.
v.x = t;
double y = (x - xn * mp1) - xn * mp2;
int4 n = v.i[LOW_HALF] & 3;
&-ing with 3 sets all bits to 0 except for the 2s and 1s bits, so the effect is to reduce mod 4 (so long as the number was positive). The last line therefore sets n to the value of \(\mathtt{xn} = [\mathtt{x} * \mathtt{hpinv}]\) mod 4, at least if x is positive. The return value of the function has now been found, but it still has to calculate the remainder \(r = \mathtt{x} - [\mathtt{x} * \mathtt{hpinv}](\pi/2)\) and store it in a and da. The double y will be a first approximation to this value.
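The union trick can be imitated in Python with the struct module; a sketch, assuming a little-endian machine (so that LOW_HALF is index 0):

import struct

toint = float.fromhex("0x1.8p52")
t = 39128390.0 + toint                 # t as it would be for some x with xn = 39128390
lo, hi = struct.unpack("<II", struct.pack("<d", t))
print(lo & 3, 39128390 % 4)            # both 2: the low mantissa bits hold xn mod 4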
The constants appearing in the middle line of the glibc excerpt above (the definition of y) are shown in the following table, along with some others used later. Most 0s are omitted, and the first two and last two lines are just column numbers (hex digit positions after the point).

      0          1         2         3
     0.1234567890123456789012345678901234567...
pi/2 1.921FB54442D18469898CC51701B839A252049...
hp0  1.921FB54442D18
hp1   .             469898CC51701C
mp1  1.921FB5800000
mp2 - .      3BBD2E78000000
pp3 - .             39676730000000
pp4 - .                    3AE8FE47C65DAE
     0.1234567890123456789012345678901234567...
      0          1         2         3
Note that mp1 + mp2 is extremely close to hp0, the nearest double to \(\pi/2\): the two differ by only \(2^{-53}\), half an ulp of hp0.
The reason that y is defined using two subtractions, rather than just as x - hp0 * xn, is that this latter floating point subtraction may not be exact (e.g. when x = 100000.0), while the two subtractions actually used can be shown to be exact. The first is exact by the Sterbenz lemma (a - b is exact so long as a/2 <= b <= 2a): x and mp1 * xn are pretty close because xn is approximately \((2/\pi)\mathtt{x}\) and mp1 is approximately \(\pi/2\). (The product mp1 * xn is itself exact, since mp1 has only 26 significant bits and \(|\mathtt{xn}| \leqslant 2^{26}\).) To show that the second is exact we need to show that \(x - \mathtt{mp1}\,\mathtt{xn} - \mathtt{mp2}\,\mathtt{xn}\) (mathematical subtraction, not floating point subtraction) is a floating point number (that is, one of the possible values of a 64 bit double).
Note that \(|\pi/2 - \mathtt{mp1} - \mathtt{mp2}| < 2^{-52}\), \(|\mathtt{xn}| < 2^{26}\), and that we can write \(\mathtt{mp1} = s_1 2^{-25}\) and \(\mathtt{mp2} = -s_2 2^{-53}\) for whole numbers \(s_1, s_2\). Now \(x - \mathtt{mp1}\,\mathtt{xn} - \mathtt{mp2}\,\mathtt{xn} = 2^{-53}(2^{53}x - \mathtt{xn}(2^{28}s_1 - s_2))\). (When \(\mathtt{xn} \neq 0\) we have \(|x| > 1/2\), so \(2^{53}x\) really is an integer; when \(\mathtt{xn} = 0\) both subtractions are trivially exact.) It’s enough (because denormal numbers are allowed) to show that the integer \(2^{53}x - \mathtt{xn}(2^{28}s_1 - s_2)\) is less than \(2^{53}\) in magnitude, which is equivalent to \(|x - \mathtt{mp1}\,\mathtt{xn} - \mathtt{mp2}\,\mathtt{xn}| < 1\). This is true because it’s at most \(|x - (\pi/2)\mathtt{xn}| + |\mathtt{xn}||\pi/2 - \mathtt{mp1} - \mathtt{mp2}|\). The second term is very small because \(|\mathtt{xn}| < 2^{26}\) while \(|\pi/2 - \mathtt{mp1} - \mathtt{mp2}|\) is less than \(2^{-52}\). For the first, \(|(2/\pi)x - \mathtt{xn}| \leqslant 1/2\), so \(|x - \mathtt{xn}(\pi/2)| < \pi/4 \approx 0.785\).
Incidentally, studying exactness of floating point arithmetic operations is much easier when you have a convenient way to check your work. Python’s fractions
module is really useful here, e.g.
from fractions import Fraction
fff = Fraction.from_float

# The constants from the glibc source (values as in the table above).
hpinv = float.fromhex("0x1.45F306DC9C883p-1")
toint = float.fromhex("0x1.8p52")
mp1 = float.fromhex("0x1.921FB58p0")
mp2 = -float.fromhex("0x0.0000003BBD2E78p0")

def sub_error(f1, f2):
    # Compute the error in the fp subtraction f1 - f2
    f1_frac = fff(f1)
    f2_frac = fff(f2)
    diff_frac = fff(f1 - f2)
    return diff_frac - (f1_frac - f2_frac)

def test_exactness(x):
    # Zero iff the subtraction (x - xn*mp1) - xn*mp2 is exact
    xn = (x * hpinv + toint) - toint
    return sub_error(x - xn * mp1, xn * mp2)
Back to reduce_sincos
, lines 163-165 are:
t1 = xn * pp3;
t2 = y - t1;
db = (y - t2) - t1;
To increase the accuracy of the approximation y
to the remainder \(r = x -
[(2/\pi)x](\pi/2)\) we subtract off pp3 * xn
where pp3
is as above. The
floating point subtraction is no longer exact - in fact, the rounding error is
exactly db
. This follows from the section of The Art of Computer
Programming on floating point arithmetic: 4.2.2 eq. (3) says \(u \ominus v = u
\oplus -v\). Using this in 4.2.2 Theorem C gives (for \(|u|>|v|\)) \(u-v - (u
\ominus v) = (u \ominus (u \ominus v))\ominus v\). For us, using the convention
that circled operations are floating point arithmetic and uncircled ones are real number arithmetic, db
is \((y \ominus (y \ominus t1)) \ominus t1 = y-t1 - (y \ominus t1)\) by Theorem C.
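The fractions module gives a direct way to spot-check this identity (a sketch with arbitrary values satisfying \(|y| > |\mathtt{t1}|\)):

from fractions import Fraction
fff = Fraction.from_float

y, t1 = 0.7853981633974483, 3.141592653589793e-09
t2 = y - t1
db = (y - t2) - t1
# db is exactly the rounding error: t2 + db == y - t1 as rational numbers.
assert fff(t2) + fff(db) == fff(y) - fff(t1)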
The next block of three lines sets b to (y - xn * pp3) - xn * pp4 and adds to db the rounding error of the new subtraction, t2 minus xn * pp4, so that in the end the value of b is (((x - xn * mp1) - xn * mp2) - xn * pp3) - xn * pp4 and db is the sum of the rounding errors of the last two floating point subtractions (assuming no underflow).
Taken together, the four constants mp1, mp2, pp3, and pp4 give us a 136-bit approximation \(p\) to \(\pi/2\). Ignoring the subtraction errors, which we know exactly, the difference between our computed remainder and the true remainder \(\mathtt{x}-(\pi/2)\mathtt{xn}\) is \(|p\,\mathtt{xn} - (\pi/2)\mathtt{xn}| \leqslant 2^{-136}|\mathtt{xn}| \leqslant 2^{-110}\), using the restriction on the size of x. To decide whether this is good enough for all 53 significant bits of the result to be correct, we need to know how small the remainder could theoretically be: if it were tiny, this error might make the estimate useless. In other words, we need to know how close a floating point number x in our range can be to an integer multiple of \(\pi/2\). If every floating point number in the range is at least \(2^{-110+53} = 2^{-57}\) from an integer multiple of \(\pi/2\), we’re OK. I haven’t checked this, but it is certainly plausible: in Argument reduction for huge arguments: good to the last bit by K. C. Ng it is claimed that the minimum distance to a multiple of \(\pi/2\) over all 64-bit doubles is larger than \(2^{-61}\), and we are only dealing with a very small subset of the possible double values.
It will be useful to have an idea of the size of the sum of the two subtraction errors stored in da. Floating point subtractions are required to be correctly rounded, so they must be within half an ulp of the true answer. The final subtraction gives us the reduced value b, so the second subtraction error is at most half an ulp of b. The correct result of the penultimate subtraction could in principle be \(2^{26}\) times bigger, so the order of magnitude of the sum of the subtraction errors is at most \(2^{-27}\) times the result. That is, da will be at most about \(2^{-27}\) times a.
[hpinv * x]
versus [2x/π]
Range reduction modulo \(\pi/2\) usually refers to the process of taking a possibly very large floating point number x
and finding an integer q
such that \(x = q(\pi/2) + r\) where \(r \in [-\pi/4, \pi/4]\). The reason to do this is that up to a sign, which you can work out from the residue class of q
mod 4, sin or cos of x
equals sin or cos of r
while r
is small enough that calculating its sine or cosine is relatively easy. The true value of q
in this equation is \((2/\pi)x\) rounded to the nearest integer (no need for any tiebreaking: this number is irrational, so it cannot be halfway between two integers).
The reduce_sincos
function doesn’t quite do this. The integer q
, named xn
in the code, equals [hpinv * x]
where hpinv
is the nearest double to \(2/\pi\). This doesn’t have enough precision to agree with [2x/π]
for all x
in our range. Experimenting with sympy.ntheory.continued_fraction
produces a concrete example of 61,462,730.5. The true value of 2x/π is 39128389.499999999026, so [2x/π] = 39,128,389, while hpinv * x is exactly 39128389.5, which rounds (the tie going to the even neighbour) to 39,128,390. This means the reduced value a computed by reduce_sincos will be strictly less than \(-\pi/4\): it is -0.7853981649269647, over \(10^{-9}\) outside the range, and this isn’t compensated for by the subtraction error da, which is of much smaller magnitude.
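This is straightforward to verify numerically; a sketch, with \(2/\pi\) hard-coded to 32 decimal places (far more precision than the comparison needs):

from fractions import Fraction

TWO_OVER_PI = Fraction("0.63661977236758134307553505349006")

x = 61462730.5
hpinv = float.fromhex("0x1.45F306DC9C883p-1")
toint = float.fromhex("0x1.8p52")

xn = (x * hpinv + toint) - toint           # 39128390.0, what reduce_sincos uses
true_q = round(Fraction(x) * TWO_OVER_PI)  # 39128389, the nearest integer
print(xn, true_q)                          # they disagree by one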
While this means reduce_sincos is not always calculating the canonical range reduction, I don’t believe it will ever affect the sin and cos calculations: sin and cos of x can still be calculated from the computed values of a, da, and n. If the a-value produced were very large then some of the assumptions used in calculating its sin or cosine by other functions might be broken, but this doesn’t occur.
-value produced was very large then some of the assumptions used in calculating its sin or cosine by other functions might be broken, but this doesn’t occur.